#Event – Resources used on the Azure AI Fundamentals, intro to #AzureML @conosurtech


Hi !

I had an amazing session yesterday with my friends from ConoSurTech: Fernando, Ivana and Pablo. We gave a high-level overview of Azure Machine Learning as part of the “Diplomado de AI Fundamentals 2020”:

Diplomado de AI Fundamentals 2020

The diploma program consists of online classes starting on October 20, 2020. Classes take place on Tuesdays and Thursdays from 18:00 (GMT-3), along with interviews with product and community leaders. The program covers the topics of the AI-900 Artificial Intelligence Fundamentals certification, plus the latest releases announced by Microsoft in September.

So, as usual, it is time to share the slides and links. There is no code this time, so you get to skip the copy and paste.

Slides

Links

A very long list here…

Personal Note

It was a fast session, with more than 100 viewers, and … so many topics to talk about! I had prepared a full MLOps pipeline to show, I was planning to code a Jupyter Notebook from scratch, and more … but we didn’t have time. This comment sums it all up:

Recording

Resources

#MLNet – AutoML for ranking scenarios


Hi !

This is a cool one: Machine Learning .Net (ML.Net) now supports ranking scenarios in AutoML. As a Machine Learning aficionado, I find this amazing. I can now run my problems through AutoML, and then learn the specifics of the best models it produces.

Note: a couple of weeks ago, someone asked me a question about ranking scenarios. My knowledge here is limited, so I kindly shared a couple of starting points. With AutoML now supporting ranking scenarios, my answer would be completely different!

So I took the sample for the current version (1.5.1), the sample for standard ranking scenarios from ML.Net, and a data source based on a public dataset provided by Microsoft, originally from Microsoft Bing (see references), and I created this sample:

using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

namespace ConsoleApp1
{
    public class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Start ...");
            Run();
            Console.WriteLine("End");
        }

        private static string TrainDataPath = @"data\train.txt";
        private static string TestDataPath = @"data\test.txt";
        private static string ModelPath = @"Model.zip";
        private static string LabelColumnName = "Label";
        private static string GroupColumnName = "GroupId";
        // Time budget for the AutoML experiment, in seconds (10 minutes)
        private static uint ExperimentTime = 600;

        public static void Run()
        {
            var mlContext = new MLContext();

            // STEP 1: Load the tab-separated train and test data
            var trainDataView = mlContext.Data.LoadFromTextFile<SearchData>(TrainDataPath, hasHeader: false, separatorChar: '\t');
            var testDataView = mlContext.Data.LoadFromTextFile<SearchData>(TestDataPath, hasHeader: false, separatorChar: '\t');

            // STEP 2: Run the AutoML ranking experiment
            Console.WriteLine($"Running AutoML recommendation experiment for {ExperimentTime} seconds...");
            var experimentResult = mlContext.Auto()
                .CreateRankingExperiment(new RankingExperimentSettings() { MaxExperimentTimeInSeconds = ExperimentTime })
                .Execute(trainDataView, testDataView,
                    new ColumnInformation()
                    {
                        LabelColumnName = LabelColumnName,
                        GroupIdColumnName = GroupColumnName
                    });

            // STEP 3: Print every model tried, then the best one
            var bestRun = experimentResult.BestRun;
            Console.WriteLine("=====================================================");
            Console.WriteLine($"Total models produced: {experimentResult.RunDetails.Count()}");
            var i = 0;
            foreach (var experimentResultRunDetail in experimentResult.RunDetails)
            {
                i++;
                Console.WriteLine($"  {i} - TrainerName: {experimentResultRunDetail.TrainerName}");
                Console.WriteLine($"      Runtime In Seconds: {experimentResultRunDetail.RuntimeInSeconds}");
                Console.WriteLine();
                //PrintMetrics(experimentResultRunDetail.ValidationMetrics);
            }
            Console.WriteLine();
            Console.WriteLine("=====================================================");
            Console.WriteLine($"Best model's trainer: {bestRun.TrainerName}");

            // STEP 4: Evaluate the best model on the test data
            var testDataViewWithBestScore = bestRun.Model.Transform(testDataView);
            var testMetrics = mlContext.Ranking.Evaluate(testDataViewWithBestScore, labelColumnName: LabelColumnName);
            Console.WriteLine("Metrics of best model on test data ---");
            PrintMetrics(testMetrics);

            // STEP 5: Save the best model for later deployment and inferencing
            mlContext.Model.Save(bestRun.Model, trainDataView.Schema, ModelPath);

            // STEP 6: Create a prediction engine from the best trained model
            var predictionEngine = mlContext.Model.CreatePredictionEngine<SearchData, SearchDataPrediction>(bestRun.Model);

            // STEP 7: Initialize a new test page, and get the prediction
            var testPage = new SearchData
            {
                GroupId = "1",
                Features = 9,
                Label = 1
            };
            var prediction = predictionEngine.Predict(testPage);
            Console.WriteLine($"Predicted: {prediction.Prediction}");

            // New page
            testPage = new SearchData
            {
                GroupId = "2",
                Features = 2,
                Label = 9
            };
            prediction = predictionEngine.Predict(testPage);
            Console.WriteLine($"Predicted: {prediction.Prediction}");

            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }

        private static void PrintMetrics(RankingMetrics metrics)
        {
            if (metrics is null)
            {
                Console.WriteLine("  No metrics");
                return;
            }
            // One value per truncation level, separated by commas so they stay readable
            var ndcg = string.Join(", ", metrics.NormalizedDiscountedCumulativeGains);
            var dcg = string.Join(", ", metrics.DiscountedCumulativeGains);
            Console.WriteLine($"  Normalized Discounted Cumulative Gains: {ndcg}");
            Console.WriteLine($"  Discounted Cumulative Gains: {dcg}");
        }
    }

    // One row of the input files: group (query) id, feature value, relevance label
    class SearchData
    {
        [LoadColumn(0)]
        public string GroupId;
        [LoadColumn(1)]
        public float Features;
        [LoadColumn(2)]
        public float Label;
    }

    class SearchDataPrediction
    {
        [ColumnName("PredictedLabel")]
        public float Prediction;
        public float Score { get; set; }
    }
}

The sample runs for 10 minutes and evaluates 33 models.

The test file is 266 MB and the train data file is 799 MB.

At the end, the best trainer is [FastTreeRanking].

The output is also very clear about the tested models and the best one. (I trimmed this to make it clearer).

Start ...
Running AutoML recommendation experiment for 600 seconds...
=====================================================
Total models produced: 33
  1 - TrainerName: LightGbmRanking
      Runtime In Seconds: 10.6167636

  2 - TrainerName: FastTreeRanking
      Runtime In Seconds: 11.1055165

  3 - TrainerName: FastTreeRanking
      Runtime In Seconds: 35.0196598

  4 - TrainerName: FastTreeRanking
      Runtime In Seconds: 6.0401781
  ...
=====================================================
Best model's trainer: FastTreeRanking

Press any key to continue...
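
If you are wondering what the DCG and NDCG numbers printed by PrintMetrics actually measure, here is a minimal sketch of the textbook formulas. It assumes a gain of 2^label - 1 and a log base 2 position discount. RankingMath, Dcg and Ndcg are hypothetical names made up for this illustration; they are not part of ML.Net, and the ML.Net evaluator may use different gain and discount conventions, so do not expect the values to match the ones from the sample.

```csharp
using System;
using System.Linq;

// Illustration only: textbook DCG / NDCG, not the ML.Net implementation.
public static class RankingMath
{
    // DCG@k: sum of (2^label - 1) / log2(position + 1) over the first k results,
    // where labels are given in the order the model ranked them (position is 1-based).
    public static double Dcg(double[] labelsInRankedOrder, int k)
    {
        double dcg = 0;
        int n = Math.Min(k, labelsInRankedOrder.Length);
        for (int i = 0; i < n; i++)
        {
            double gain = Math.Pow(2, labelsInRankedOrder[i]) - 1;
            double discount = Math.Log(i + 2, 2); // index i is position i + 1
            dcg += gain / discount;
        }
        return dcg;
    }

    // NDCG@k: DCG of the given order divided by the DCG of the ideal order
    // (labels sorted from most to least relevant). Returns 0 when nothing is relevant.
    public static double Ndcg(double[] labelsInRankedOrder, int k)
    {
        double[] ideal = labelsInRankedOrder.OrderByDescending(label => label).ToArray();
        double idealDcg = Dcg(ideal, k);
        return idealDcg == 0 ? 0 : Dcg(labelsInRankedOrder, k) / idealDcg;
    }
}
```

For example, RankingMath.Ndcg(new double[] { 3, 2, 0, 1 }, 4) is slightly below 1 because the last two results are swapped compared to the ideal order, while a perfectly ordered list gives exactly 1.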

Super cool feature !

Happy coding!

Greetings

El Bruno

References

#Event – Machine Learning.Net and AutoML, this time in Spanish!


Hi!

We are still in StayAtHome mode, and an excellent way to connect with the communities is to take part in events, either as a speaker or as an attendee.

This time I have the chance to speak at NetCoreConf:

NetCoreConf 2020

The latest in Microsoft technologies and much more, with the best experts. A place to learn, share and network, attending a range of conferences and workshops. We will talk about NetCore, Azure, Xamarin, AI and Big Data. What are you waiting for?

NetCoreConf 2020 will hold the first global virtual event dedicated exclusively to the development and consulting sector, aiming to discover and showcase new cutting-edge technologies and to create strategic ties that generate synergies among the sector’s professionals, companies and institutions.

NetCoreConf 2020

More information: NetCoreConf Virtual 2020

The agenda is impressive, and I will talk about one of the most interesting products Microsoft has released in recent years: Machine Learning.Net. In my session I will cover a bit of the product’s history and some examples, plus a bit about a very interesting tool for non-programmers: AutoML.

Finally, thanks to the great team behind this event:

Happy coding!

Greetings

El Bruno

#Event – Resources used during Getting started with #MachineLearning.net with @TheDataGeeks

Hi!

It was a pleasure to share some amazing (and early) time with the Data Platform Geeks in a webinar about Machine Learning.Net.

Slides

Source Code

https://github.com/elbruno/events/tree/master/2020%2001%2021%20DPG%20MLNet

Resources

Event information

https://www.linkedin.com/feed/update/urn:li:activity:6625330471557529600


Happy Coding!

Greetings @ Burlington

El Bruno

#Event – Resources used during the #MachineLearning Galore at @MississaugaNetU

Hi!

It was a pleasure to share some amazing time with the Mississauga .Net User Group last night, in my last session of the decade. It was a full night focused on Artificial Intelligence and Machine Learning and, as usual, it is time to share the resources used in the session.

Slides

Source Code

https://github.com/elbruno/events/tree/master/2019%2009%2005%20Global%20AI%20Night

Resources

Event information

Happy Coding!

Greetings @ Burlington

El Bruno

#Event – Resources used during the #GlobalAINight at @MarsDD

Hi!

It was a pleasure to share some amazing time with the Metro Toronto .Net User Group. Last night was also a special one: we hosted the event at the amazing @MarsDD, and it was great to have such a large group interested in Artificial Intelligence.

As usual, it’s time to share the resources of the event

Official Resources https://aka.ms/AA60hn1

This includes Workshops like

  • Creating applications that can see, hear, speak or understand – using Microsoft Cognitive Services
  • Learn how to train high accuracy machine learning models using automated machine learning
  • Crash course on building and accelerating deep learning solutions
  • And more.

It also includes a set of materials around Automated Machine Learning (AutoML).

And of course, my materials.

Slides

Source Code: https://github.com/elbruno/events/tree/master/2019%2009%2005%20Global%20AI%20Night

Happy Coding!

Greetings @ Toronto

El Bruno

Resources

Tweets

#MLNET – How to use the AutoML API in a Console App

Hi !

In my previous posts I tested AutoML using Model Builder inside Visual Studio and also the CLI commands. There is also an API to use this in a .Net app, and the usage is very simple.

It all starts, of course, by adding the [Microsoft.ML.AutoML] NuGet package.

I read the documentation in [How to use the ML.NET automated machine learning API], and I created the following sample using the same data as in my previous posts.

using System;
using Microsoft.ML;
using Microsoft.ML.AutoML;
using Microsoft.ML.Data;

// Note: ConsoleHelper, MulticlassExperimentProgressHandler and PrintTopModels are omitted here;
// they are helpers from the complete source code, some reused from the ML.Net samples.
class Program
{
    // Time budget for the AutoML experiment, in seconds
    private const uint ExperimentTime = 180;

    static void Main(string[] args)
    {
        var mlContext = new MLContext();
        Train(mlContext);
        Console.WriteLine("Process complete! Press any key to close the app.");
        Console.ReadKey();
    }

    public static void Train(MLContext mlContext)
    {
        try
        {
            // STEP 1: Load the data (CSV with a header row: Age, Gender, Label)
            var trainData = mlContext.Data.LoadFromTextFile(path: "AgeRangeData03_AgeGenderLabelEncodedMoreData.csv",
                columns: new[]
                {
                    new TextLoader.Column("Age", DataKind.Single, 0),
                    new TextLoader.Column("Gender", DataKind.Single, 1),
                    new TextLoader.Column("Label", DataKind.Single, 2)
                },
                hasHeader: true,
                separatorChar: ','
            );

            // STEP 2: Run the AutoML experiment, reporting progress after each model
            var progressHandler = new MulticlassExperimentProgressHandler();
            ConsoleHelper.ConsoleWriteHeader("=============== Running AutoML experiment ===============");
            Console.WriteLine($"Running AutoML multiclass classification experiment for {ExperimentTime} seconds...");
            ExperimentResult<MulticlassClassificationMetrics> experimentResult = mlContext.Auto()
                .CreateMulticlassClassificationExperiment(ExperimentTime)
                .Execute(trainData, "Label", progressHandler: progressHandler);

            // STEP 3: Print top models found by AutoML
            Console.WriteLine();
            PrintTopModels(experimentResult);
            Console.WriteLine();
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex);
        }
    }
}

The final output displays the results of each test and showcases the top 3 ranked models. Once again, the LightGBM trainer is the best one to choose.
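
As a rough idea of what a "top 3" report like this involves, here is a minimal sketch that sorts a list of runs by accuracy and keeps the best ones. RunSummary and TopModels are hypothetical names made up for this illustration, not ML.Net types, and this is not the code of the PrintTopModels helper. In the real API, each run in the experiment result exposes a TrainerName and its ValidationMetrics, which could be mapped to something like this.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one AutoML run: trainer name plus validation accuracy.
public sealed class RunSummary
{
    public string TrainerName { get; }
    public double Accuracy { get; }

    public RunSummary(string trainerName, double accuracy)
    {
        TrainerName = trainerName;
        Accuracy = accuracy;
    }
}

public static class TopModels
{
    // Returns up to `count` runs with the highest accuracy, best first.
    // Runs whose accuracy is NaN (for example, a run that failed) are skipped.
    public static List<RunSummary> Top(IEnumerable<RunSummary> runs, int count = 3)
    {
        return runs
            .Where(run => !double.IsNaN(run.Accuracy))
            .OrderByDescending(run => run.Accuracy)
            .Take(count)
            .ToList();
    }
}
```

Calling TopModels.Top(runs) on the list of runs returns the three best ones, ready to be printed in order.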

There is a full set of samples in the Machine Learning .Net Samples repository. I’ve reused some classes from the Common folder.

The complete source code is available at https://github.com/elbruno/Blog/tree/master/20190516%20MLNET%20AutoML%20API

Happy Coding!

Greetings @ Toronto

El Bruno

References

#MLNET – Are you a Command line user? MLNet CLI is great for some AutoML train tasks!

Hi !

Yesterday I wrote about how easy it is to use Model Builder to create Machine Learning models directly from data inside Visual Studio.

If you prefer to work with command line interfaces, Machine Learning.Net AutoML also has a CLI and, with a couple of commands, you can get some amazing results.

So, for this test I followed the tutorial [Auto generate a binary classifier using the CLI] and made some changes to the original command:

> mlnet auto-train --task binary-classification --dataset "yelp_labelled.txt" --label-column-index 1 --has-header false --max-exploration-time 10

I’m using the same set of data I used yesterday, and my command is:

mlnet auto-train --task regression --dataset "AgeRangeData03_AgeGenderLabelEncodedMoreData.csv" --label-column-index 2 --has-header true --max-exploration-time 60

The output is also interesting: it suggests using a FastTree regression trainer.

Yesterday’s test using the IDE suggested a LightGBM regression trainer.

So, I decided to run the CLI one more time with some more processing time. This time the result was also a FastTree regression trainer.

Unless you need to use Visual Studio, this option is amazing for fast tests and you can also use the generated projects!

Happy Coding!

Greetings @ Toronto

El Bruno

References

#AutoML – Automated Machine Learning, AKA #Skynet

Hi!

IMHO one of the most important announcements presented last week at Ignite was the Azure preview of AutoML: Automated Machine Learning.

I’m not going to get into the details of AutoML; the best option is to read the official post from the Azure Machine Learning team (see references). In short, the goal of this new tool is to automatically identify the best pipeline for a given machine learning scenario.

A pipeline comprises the basic steps of an ML process:

  • Working with data, this means sorting, filtering, check for nulls, labeling, etc.
  • Select a learning algorithm, SVM, Fast Tree, etc.
  • Define features and Labels, adjust parameters, etc

The [try / error / learn] cycle in each of these steps helps us improve our model and get better results (better accuracy).

AutoML proposes an automatic service that identifies the best combination to create a pipeline with the best possible accuracy. As always, an image helps the explanation:

01 AutoML process.png
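
To make the idea a bit more concrete, here is a toy sketch of the try / error / learn loop run automatically: it tries every combination of a preprocessing step and a threshold rule, scores each one on validation data and keeps the best. This is only an illustration under my own assumptions. TinyAutoMl and FindBestPipeline are hypothetical names, the "pipeline" is a trivial one-feature classifier, and the real service explores far richer pipelines with smarter search strategies.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration of automated pipeline selection; not how Azure AutoML is implemented.
public static class TinyAutoMl
{
    // Tries every (preprocessing, threshold) combination on the validation rows and
    // returns the name and accuracy of the best one (Name is null if nothing was tried).
    public static (string Name, double Accuracy) FindBestPipeline(
        IReadOnlyDictionary<string, Func<double, double>> preprocessors,
        IReadOnlyList<double> thresholds,
        IReadOnlyList<(double Feature, int Label)> validation)
    {
        string bestName = null;
        double bestAccuracy = double.NegativeInfinity;

        foreach (var prep in preprocessors)
        {
            foreach (var threshold in thresholds)
            {
                int correct = 0;
                foreach (var row in validation)
                {
                    // Pipeline = transform the feature, then classify with a threshold rule.
                    int predicted = prep.Value(row.Feature) > threshold ? 1 : 0;
                    if (predicted == row.Label) correct++;
                }

                double accuracy = (double)correct / validation.Count;
                if (accuracy > bestAccuracy)
                {
                    bestAccuracy = accuracy;
                    bestName = $"{prep.Key} + threshold {threshold}";
                }
            }
        }

        return (bestName, bestAccuracy);
    }
}
```

With two preprocessors (identity and absolute value) and two thresholds, that is four candidate pipelines. The loop is the same one we would run by hand, just without the waiting in between.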

Official description

Automated ML is available to try in the preview of Azure Machine Learning. We currently support classification and regression ML model recommendation on numeric and text data, with support for automatic feature generation (including missing values imputations, encoding, normalizations and heuristics-based features), feature transformations and selection. Data scientists can use automated ML through the Azure Machine Learning Python SDK and Jupyter notebook experience. Training can be performed on a local machine or by leveraging the scale and performance of Azure by running it on Azure Machine Learning managed compute. Customers have the flexibility to pick a pipeline from automated ML and customize it before deployment. Model explainability, ensemble models, full support for Azure Databricks and improvements to automated feature engineering will be coming soon.

From here, I strongly recommend reading the official documentation, where AutoML is explained in detail. Also, if you are familiar with Jupyter Notebooks, in a few seconds you can clone a repository with a tutorial to try AutoML from scratch. You need to clone the repo from https://github.com/Azure/MachineLearningNotebooks

02 AutoML Jupyter Notebooks

The tutorial is pretty straightforward, and with few Azure resources you can see how a classification model is optimized with AutoML.

03 AzureML Local tutorial

Although only classification and regression models are supported for now, AutoML is a tool to keep in mind when you start working in ML.

See you at the event, Happy Coding!

Greetings @ Toronto

El Bruno

References

#AutoML – Automated Machine Learning, #MachineLearning models that learn to optimize themselves! (in the movies it is called #skynet)

Hi!

One of the most important announcements at Ignite last week was the preview of Azure AutoML: Automated Machine Learning.

The best way to get into the details of AutoML is to read the official post from the Azure Machine Learning team (see references). I will try to sum it up as a new framework that automatically identifies the best pipeline for working with data.

A pipeline comprises the basic steps of an ML process:

  • Working with data, which means sorting it, removing nulls, labeling it, etc.
  • Selecting a learning algorithm: SVM, Fast Tree, etc.
  • Defining Features and Labels, adjusting parameters, etc.

The trial / error / learning cycle in each of these steps defines the accuracy our final model will have.

AutoML proposes an automatic service that identifies the best combination to create a pipeline with the best possible accuracy. As always, an image helps the explanation:

01 AutoML process.png

And the official description

Automated ML is available to try in the preview of Azure Machine Learning. We currently support classification and regression ML model recommendation on numeric and text data, with support for automatic feature generation (including missing values imputations, encoding, normalizations and heuristics-based features), feature transformations and selection. Data scientists can use automated ML through the Azure Machine Learning Python SDK and Jupyter notebook experience. Training can be performed on a local machine or by leveraging the scale and performance of Azure by running it on Azure Machine Learning managed compute. Customers have the flexibility to pick a pipeline from automated ML and customize it before deployment. Model explainability, ensemble models, full support for Azure Databricks and improvements to automated feature engineering will be coming soon.

From here, I recommend reading the official documentation, where AutoML is explained in detail.

If you are familiar with Jupyter notebooks, in a few seconds you can get access to a very complete tutorial just by cloning a repository from https://github.com/Azure/MachineLearningNotebooks

02 AutoML Jupyter Notebooks

The tutorial is fairly simple, and with few Azure resources you can see how a classification model is optimized with AutoML.

03 AzureML Local tutorial

Although only classification and regression models are supported for now, AutoML is a tool to keep in mind when you start working in ML.

See you at the event, happy coding!

Greetings @ Toronto

El Bruno

References