#MLNET – Analyzing pipeline data in Machine Learning.Net using the new API 0.6.0 (thanks LINQ!)

Hi!

The change in the way pipelines work in version 0.6.0 of Machine Learning.Net also requires some changes in our code if we want to see how the data is processed at each step of the pipeline. Using the example from my previous post, I will work with the following data structure.

[Image 01: ML.NET age range data]

Right after the pipeline definition (at the comment "create temp view of data"), let’s create a temporary data view with the following code:


static void Main(string[] args)
{
    var dataPath = "AgeRangeData.csv";
    var env = new LocalEnvironment();

    // describe the columns of the CSV file
    var reader = TextLoader.CreateReader(env, ctx => (
            Name: ctx.LoadText(0),
            Age: ctx.LoadFloat(1),
            Gender: ctx.LoadText(2),
            Label: ctx.LoadText(3)),
        separator: ',', hasHeader: true);
    var trainData = reader.Read(new MultiFileSource(dataPath));

    // define the pipeline: a multiclass SDCA trainer that uses Age as its only feature
    var classification = new MulticlassClassificationContext(env);
    var learningPipeline = reader.MakeNewEstimator()
        .Append(r => (
            r.Label,
            Predictions: classification.Trainers.Sdca
                (label: r.Label.ToKey(),
                 features: r.Age.AsVector())));

    // create temp view of data
    var data = reader.Read(new MultiFileSource(dataPath));
    var tempData = learningPipeline.Fit(data).Transform(data);
    // map every row of the transformed view to an AgeRange object
    var tempRows = tempData.AsDynamic
        .AsEnumerable<AgeRange>(env, reuseRowObject: false).ToArray();

    learningPipeline.Append(r => r.Predictions.predictedLabel.ToValue());
}

That allows me to analyze the data directly in the IDE in debugging mode, or even to save the dataset in a temporary file.

[Image 02: ML.NET temp view of pipeline data]

This may be enough. However, because I am working with a class that has all 4 fields of my dataset, I am forcing my app to load and map the data for every column.

In this example, the model only needs the fields Label and Age to generate a prediction, so we can remove the definitions of the columns Name and Gender when loading the data. If we do that and run the code again, we get the following error:

[Image 03: ML.NET missing columns while loading data]

System.InvalidOperationException
  HResult=0x80131509
  Message=Column ‘Name’ not found in the data view
  Source=Microsoft.ML.Api

By default, the load process attempts to map all columns. The solution is to enable the option to ignore missing columns, as the following example shows.

[Image 04: ML.NET ignore missing columns]
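The screenshot for this step is not reproduced here, so the following is only a sketch of what the change might look like. It is a fragment, not a full program: it continues the Main method shown earlier (env, tempData and the AgeRange class come from that listing), and it assumes that the AsEnumerable extension in 0.6.0 accepts an optional ignoreMissingColumns argument. Check the signature in your version before relying on it.

```csharp
// Fragment: continues the Main method above, after tempData has been created.
// Assumption: AsEnumerable<T> in ML.NET 0.6.0 has an optional
// ignoreMissingColumns parameter (false by default).
var tempRows = tempData.AsDynamic
    .AsEnumerable<AgeRange>(env, reuseRowObject: false, ignoreMissingColumns: true)
    .ToArray();
```

With this option enabled, the fields of AgeRange that have no matching column in the data view (here Name and Gender) should simply keep their default values instead of raising the exception.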

Finally, if we work with a lot of data, we can also leverage some LINQ features to work with just a small set of rows:

[Image 05: ML.NET take only 3 elements]
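To illustrate why Take helps here, this is a small self-contained sketch that uses only standard LINQ, with no ML.NET dependency. The AgeRangeRow class, the ReadRows method and the RowsProduced counter are all hypothetical stand-ins for the AgeRange class and the sequence that AsEnumerable returns, which I am assuming is evaluated lazily.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for the post's AgeRange class.
public class AgeRangeRow
{
    public float Age;
    public string Label;
}

public static class TakeDemo
{
    // Counts how many rows the lazy source actually produced.
    public static int RowsProduced;

    // Stand-in for a lazily evaluated row sequence:
    // each row is created only when the consumer asks for it.
    public static IEnumerable<AgeRangeRow> ReadRows(int total)
    {
        for (var i = 0; i < total; i++)
        {
            RowsProduced++;
            yield return new AgeRangeRow { Age = i, Label = "row " + i };
        }
    }

    public static void Main()
    {
        // Take(3) comes before ToArray(), so enumeration stops after three rows.
        var firstRows = ReadRows(1000).Take(3).ToArray();

        foreach (var row in firstRows)
            Console.WriteLine($"{row.Age} -> {row.Label}");

        Console.WriteLine($"Rows produced: {RowsProduced}");
    }
}
```

Because Take(3) is applied before ToArray(), the counter should show that only three of the 1000 possible rows were ever created. In the code from this post, the same idea would mean placing Take(3) between the AsEnumerable call and ToArray().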

Happy coding!

Greetings @ Toronto

El Bruno
