Hi!
The change in how pipelines work in version 0.6.0 of Machine Learning.Net also requires some changes in our code if we want to see how the data is processed at each step of the pipeline. Using the example from my previous post, I will work with the following data structure.
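As a minimal sketch, the row class could look like this. It is reconstructed from the four columns the reader defines in the listing (Name, Age, Gender, Label), so the exact field declarations, and any attributes the original class carried, are an assumption.

```csharp
// Row class used to map the data view into plain objects.
// Assumption: field names and types mirror the reader's columns
// (text, float, text, text); the original class may differ.
public class AgeRange
{
    public string Name;
    public float Age;
    public string Gender;
    public string Label;
}
```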
Right after the pipeline definition (at the `// create temp view of data` comment in the listing), let’s create a temporary data view with the following code:
static void Main(string[] args)
{
    var dataPath = "AgeRangeData.csv";
    var env = new LocalEnvironment();
    var reader = TextLoader.CreateReader(env, ctx => (
            Name: ctx.LoadText(0),
            Age: ctx.LoadFloat(1),
            Gender: ctx.LoadText(2),
            Label: ctx.LoadText(3)),
        separator: ',', hasHeader: true);
    var trainData = reader.Read(new MultiFileSource(dataPath));
    var classification = new MulticlassClassificationContext(env);
    var learningPipeline = reader.MakeNewEstimator()
        .Append(r => (
            r.Label,
            Predictions: classification.Trainers.Sdca
                (label: r.Label.ToKey(),
                 features: r.Age.AsVector())));

    // create temp view of data
    var data = reader.Read(new MultiFileSource(dataPath));
    var tempData = learningPipeline.Fit(data).Transform(data);
    var tempRows = tempData.AsDynamic
        .AsEnumerable<AgeRange>(env, reuseRowObject: false).ToArray();

    learningPipeline.Append(r => r.Predictions.predictedLabel.ToValue());
That allows me to analyze the data directly in the IDE in debugging mode, or even save the dataset to a temporary file.
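As a rough sketch of the "save to a temporary file" idea, the snippet below writes an array of rows to a CSV file in the temp folder. The `AgeRange` fields mirror the four columns of the reader. The `TempDump` class, the `SaveToTempCsv` helper, the `AgeRangeTemp.csv` file name and the sample row are all made up for illustration, and values are not escaped, so this only suits data without commas or quotes.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;

// Assumed row shape, mirroring the reader's columns.
public class AgeRange
{
    public string Name;
    public float Age;
    public string Gender;
    public string Label;
}

public static class TempDump
{
    // Hypothetical helper: writes the rows as CSV into the temp folder
    // and returns the full path of the file it created.
    public static string SaveToTempCsv(AgeRange[] rows)
    {
        var path = Path.Combine(Path.GetTempPath(), "AgeRangeTemp.csv");
        var header = new[] { "Name,Age,Gender,Label" };
        var lines = header.Concat(rows.Select(r => string.Join(",",
            r.Name,
            r.Age.ToString(CultureInfo.InvariantCulture), // culture-independent decimals
            r.Gender,
            r.Label)));
        File.WriteAllLines(path, lines);
        return path;
    }

    public static void Main()
    {
        // Placeholder row, not taken from the real dataset.
        var rows = new[]
        {
            new AgeRange { Name = "Sample", Age = 10f, Gender = "F", Label = "kid" }
        };
        Console.WriteLine(SaveToTempCsv(rows));
    }
}
```

In the listing above, `tempRows` could be passed to `SaveToTempCsv` right after it is created.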
This may be enough. However, when I work with a class that has all 4 fields of my dataset, I am forcing my app to load and map the data for every column.
In this example, the generated model only needs the fields Label and Age to make a prediction, so we can remove the definitions of the columns Name and Gender from the reader. If we do, we get the following error:
System.InvalidOperationException
HResult=0x80131509
Message=Column ‘Name’ not found in the data view
Source=Microsoft.ML.Api
By default, the load process attempts to map all columns. The solution is to enable the option to ignore missing columns, as the following example shows.
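A hedged sketch of that option, assuming the `AsEnumerable` extension in 0.6.0 accepts an optional `ignoreMissingColumns` argument. Treat the parameter name as an assumption and check it against the package version you reference. The fragment is not standalone: it replaces the `tempRows` statement in the listing above and reuses `env`, `tempData` and the `AgeRange` class from it.

```csharp
// Fragment, meant to sit inside Main from the listing above.
// Assumes `using System.Linq;` for ToArray().
// Assumption: `ignoreMissingColumns` is the name of the optional flag.
var tempRows = tempData.AsDynamic
    .AsEnumerable<AgeRange>(env,
        reuseRowObject: false,
        ignoreMissingColumns: true) // skip class fields with no matching column
    .ToArray();
```

With the flag set, the fields of `AgeRange` that have no matching column (here `Name` and `Gender`) should simply keep their default values instead of triggering the exception.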
Finally, if we work with a lot of data, we can also use LINQ to work with just a small set of rows.
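The idea can be sketched without ML.Net at all: `Take` is lazy, so placing it before `ToArray` materializes only the first few rows. In the self-contained example below, `LazyRowsDemo`, `ReadRows` and the reduced `AgeRange` class are hypothetical stand-ins for the row enumeration of the post, and the generated values are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reduced row shape, just enough for the demo.
public class AgeRange
{
    public float Age;
    public string Label;
}

public static class LazyRowsDemo
{
    // Counts how many rows the source actually produced.
    public static int RowsProduced;

    // Hypothetical stand-in for a lazy row source: yields rows one at a time.
    public static IEnumerable<AgeRange> ReadRows(int total)
    {
        for (var i = 0; i < total; i++)
        {
            RowsProduced++;
            yield return new AgeRange { Age = i, Label = "row" + i };
        }
    }

    public static void Main()
    {
        // Only the first 3 rows are requested, although the source could yield a million.
        var firstRows = ReadRows(1000000).Take(3).ToArray();

        Console.WriteLine(firstRows.Length); // number of rows kept
        Console.WriteLine(RowsProduced);     // number of rows actually pulled from the source
    }
}
```

Applied to the listing above, the same `.Take(n)` call would go between `AsEnumerable<AgeRange>(...)` and `.ToArray()`.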
Happy coding!
Greetings @ Toronto
El Bruno
References
My Posts
- API improvements in the new 0.6.0 version
- Fix the error [System.InvalidOperationException, Entry Point ‘ Not found] when you train a pipeline
- Adding NuGet Packages in Preview mode from MyGet, ie: Microsoft.ML-0.6.0 Version
- ML.Net 0.5 initial support for TensorFlow
- New version 0.4, news Improvements in Text analysis using Word Embedding
- Error ‘Entry point ‘Trainers.LightGbmClassifier’ not found’ and how to fix it
- Machine Learning Glossary of terms
- Export Machine Learning.Net models to ONNX format
- Loading Data In our Learning Pipeline With List (Lists for ever!)
- What’s new in version 0.2.0
- What’s a Machine Learning model? A 7 minute video as the best possible explanation
- Write and Load models using Machine Learning .Net
- Understanding the step by step of Hello World
- Hello World in ML.Net, Machine Learning for .Net !