El Bruno

#MLNET – Analyzing pipeline data in Machine Learning.Net using the new API 0.6.0 (thanks LINQ!)

Hi!

The change in how pipelines work in version 0.6.0 of Machine Learning.Net also requires some changes in our code if we want to see how the data is processed at each step of the pipeline. Using the example from my previous post, I will work with the following data structure.

[Image: age range data]

After defining the pipeline, let's create a temporary data view (the block marked "// create temp view of data" in the following code):

	static void Main(string[] args)
	{
	    var dataPath = "AgeRangeData.csv";
	    var env = new LocalEnvironment();

	    // describe the CSV columns: Name, Age, Gender, Label
	    var reader = TextLoader.CreateReader(env, ctx => (
	            Name: ctx.LoadText(0),
	            Age: ctx.LoadFloat(1),
	            Gender: ctx.LoadText(2),
	            Label: ctx.LoadText(3)),
	        separator: ',', hasHeader: true);
	    var trainData = reader.Read(new MultiFileSource(dataPath));

	    // multiclass SDCA trainer that predicts Label from Age
	    var classification = new MulticlassClassificationContext(env);
	    var learningPipeline = reader.MakeNewEstimator()
	        .Append(r => (
	            r.Label,
	            Predictions: classification.Trainers.Sdca(
	                label: r.Label.ToKey(),
	                features: r.Age.AsVector())));

	    // create temp view of data
	    var data = reader.Read(new MultiFileSource(dataPath));
	    var tempData = learningPipeline.Fit(data).Transform(data);
	    var tempRows = tempData.AsDynamic
	        .AsEnumerable<AgeRange>(env, reuseRowObject: false).ToArray();

	    // Append returns a new estimator, so keep the result instead of discarding it
	    var predictionPipeline = learningPipeline
	        .Append(r => r.Predictions.predictedLabel.ToValue());
	}

That allows me to analyze the data directly in the IDE in debugging mode, or even save the dataset to a temporary file.

[Image: temp view of pipeline data]

This may be enough. However, because I am working with a class that has all 4 fields of my dataset, I am forcing my app to load and map the data for every column.

In this example, the model only needs the fields Label and Age to generate a prediction, so we can remove the definitions of the columns Name and Gender. If we do, we get the following error:

	System.InvalidOperationException
	HResult=0x80131509
	Message=Column 'Name' not found in the data view
	Source=Microsoft.ML.Api

By default, the load process attempts to map all columns. The solution is to enable the option to ignore missing columns, as the following example shows.

[Image: ignore missing columns]
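A minimal sketch of that option, continuing from the listing above (it reuses `env`, `tempData` and the `AgeRange` class). It assumes that the `AsEnumerable` extension in 0.6.0 exposes an optional `ignoreMissingColumns` parameter; check the signature in your version before relying on it.

```csharp
// Fragment: goes inside Main, after tempData has been created.
// Assumption: AsEnumerable<T> accepts an optional ignoreMissingColumns flag.
// With the flag set, fields of AgeRange that have no matching column
// (here Name and Gender) keep their default values instead of throwing.
var tempRows = tempData.AsDynamic
    .AsEnumerable<AgeRange>(env,
        reuseRowObject: false,
        ignoreMissingColumns: true)
    .ToArray();
```

The alternative would be a smaller class that only declares Age and Label, which avoids the flag entirely.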

Finally, if we work with a lot of data, we can also leverage some LINQ features to work with just a small set of rows.

[Image: take only 3 elements]
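The idea can be tried without ML.Net at all. The sketch below is a stand-alone illustration: `RowSampler`, `ReadRows` and `FirstRows` are hypothetical names, and `ReadRows` is only a stand-in for the lazy sequence that `AsEnumerable<AgeRange>(...)` returns. It shows that `Take` stops reading from the source as soon as it has the rows it needs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type with only the two fields the model needs.
public class AgeRange
{
    public float Age;
    public string Label;
}

public static class RowSampler
{
    // Stand-in for a lazy row source: rows are produced one at a time,
    // and onRowRead is invoked each time a row is actually read.
    public static IEnumerable<AgeRange> ReadRows(int total, Action onRowRead)
    {
        for (var i = 0; i < total; i++)
        {
            onRowRead();
            yield return new AgeRange { Age = i, Label = "row" + i };
        }
    }

    // Take(count) stops pulling from the source once it has enough rows.
    public static AgeRange[] FirstRows(IEnumerable<AgeRange> rows, int count)
    {
        return rows.Take(count).ToArray();
    }

    public static void Main()
    {
        var rowsRead = 0;
        var sample = FirstRows(ReadRows(1000, () => rowsRead++), 3);
        Console.WriteLine($"kept {sample.Length} rows, read {rowsRead} rows");
    }
}
```

In the listing from this post, the same pattern would be to place `.Take(3)` between the `AsEnumerable<AgeRange>(env, reuseRowObject: false)` call and `.ToArray()`.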

Happy coding!

Greetings @ Toronto

El Bruno

12 Oct 2018
