#VS2019 – ML.NET Model Builder training using GPU, CPU and … Azure!


Hi!

In my previous posts on Model Builder, I showed the differences between using the CPU and the GPU to train a model. There is a third option, which uses an Azure ML Experiment and performs the training in the cloud.

It took me some time to set up this environment, mostly because I tried to use an existing Azure Compute Instance that I already had, and Model Builder needs a Compute Cluster.

It is also important to note that you need to create a dedicated, GPU-based Compute Cluster. There are costs associated with these resources, so do the math before you start.

And here we go: now we can move forward with the Model Builder assistant.

I ran some tests using a small image dataset, and it was awesome. Training on a 24-image dataset took between 8 and 9 minutes, and the results were very good. A nice bonus here is the option to get more details directly in the Azure Machine Learning portal.

We can drill down into each experiment and look at metrics such as F1 score, precision, recall and more.
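To make these metrics a bit more concrete, here is a minimal Python sketch of the standard per-class (one-vs-rest) definitions of precision, recall and F1, plus a macro-averaged F1 for multi-class image classification. This is not the code Azure ML runs internally; the function names and the sample labels are hypothetical and only illustrate the formulas.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall and F1 for one label (one-vs-rest)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly predicted as label
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly predicted as label
    fn = sum(1 for t, p in pairs if t == label and p != label)  # label missed by the model
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted average of the per-class F1 scores over every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(precision_recall_f1(y_true, y_pred, l)[2] for l in labels) / len(labels)
```

For example, with the hypothetical labels `y_true = ["cat", "cat", "dog", "dog"]` and `y_pred = ["cat", "dog", "dog", "dog"]`, the "cat" class has precision 1.0 and recall 0.5 (F1 ≈ 0.67), the "dog" class has F1 0.8, and the macro F1 is about 0.73.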

Each Model Builder image classification project triggers several Azure ML experiments:

  • Automated ML
  • HyperDrive
  • Preparation
  • Script

The Script experiment is the one to open to access the detailed logs, and also the ONNX model.

So I decided to go big and test this using the image set from a Kaggle challenge [State Farm Distracted Driver Detection] (see references). This is a 1 GB image set with 22,424 images in ten categories.

The first step is to upload the 22,424 images to an Azure resource, which took some time.
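Before a long upload like this, it may be worth sanity-checking the local dataset. Model Builder's image classification scenario, as I understand it, expects a folder with one sub-folder per category, using the sub-folder name as the label. Assuming that layout, this small Python sketch counts the images in each category folder so you can confirm the totals (for example 22,424 images in ten categories) before uploading. The function name and the set of accepted file extensions are assumptions, not part of Model Builder.

```python
from pathlib import Path

# Assumption: these are the image file types present in the dataset
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def count_images_per_category(root):
    """Return {category_folder_name: image_count} for each direct sub-folder of root.

    Files sitting directly in root, and non-image files inside a category
    folder, are ignored.
    """
    counts = {}
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[folder.name] = sum(
            1
            for f in folder.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS
        )
    return counts
```

Calling it on a hypothetical local training folder, then printing `sum(counts.values())` and `len(counts)`, gives the total image count and the number of categories to compare against what you expect to upload.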

Then I started tracking the progress in the Azure Machine Learning portal.

After some time, the process hit a timeout.

The details of the 4 experiments suggest that some limit was exceeded. I'm not sure whether it was on the IDE side or on the Azure side.

However, the experiment in charge of training the model [Run 12] produced some successful models. Accuracy, F1 and precision were improving.

Reading the logs, I can see that the error was triggered at epoch 8. I need to spend more time here to figure out what happened!

Note: I already reported the issue to the GitHub Repo.

As a final thought: using Azure as the training environment in Model Builder is an amazing option. A big dataset may be a problem, or maybe my quota is the problem. Anyway, with smaller datasets it worked great. I'll keep an eye on this issue and update the blog with any news.

Happy coding!


El Bruno

More posts in my blog ElBruno.com.

More info in https://beacons.ai/elbruno



