#MLNET – New version 0.4, new improvements in text analysis using Word Embeddings

Hi!

Yesterday the Machine Learning.Net team announced a new version, 0.4. There are several interesting novelties, but the one that caught my attention is Word Embeddings. I started reading a little about it, and the truth is that the ability to use existing pretrained text-processing models and build our own models on top of them is something to be appreciated.

In the original announcement post the team shared tons of reference material about Word Embeddings. I decided to take the sample Console App from the repository and test the differences between classic text processing and what we can do with WE.

After a quick test, using the same data set for training and evaluation, the classic model works with an Accuracy of 66.60%, while using WE the Accuracy goes up to 72.30%.


The code for the test is shown below; the main difference is between the TrainModel() and TrainModelWordEmbeddings() functions.
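The core idea behind the WordEmbeddings approach is that each token is mapped to a pretrained vector and the vectors are pooled (for example, averaged) into one fixed-length feature for the whole sentence, instead of relying on sparse bag-of-words counts. A minimal conceptual sketch in Python (the toy vectors below are made up for illustration, not taken from any real pretrained model, and this is not the ML.NET API itself):

```python
import numpy as np

# Toy pretrained embeddings: word -> vector. Real models map hundreds of
# thousands of words to 50-300 dimensional vectors; these 3-d vectors
# are made up purely for illustration.
embeddings = {
    "good":  np.array([0.9, 0.1, 0.3]),
    "bad":   np.array([-0.8, 0.2, 0.1]),
    "movie": np.array([0.1, 0.7, 0.5]),
}

def featurize(sentence, dim=3):
    """Average the embedding vectors of known tokens into one fixed-length feature."""
    vectors = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vectors:
        return np.zeros(dim)  # no known tokens: fall back to a zero vector
    return np.mean(vectors, axis=0)

print(featurize("good movie"))  # average of the "good" and "movie" vectors
print(featurize("bad movie"))
```

The resulting dense vector is what the downstream classifier trains on, which is why sentences with similar words end up with similar features even when they share no exact tokens.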

As I mentioned at the beginning, the interesting thing about this new release is the ability to use various pretrained models. The MSDN post talks about GloVe, fastText and SSWE. The next step is to see how the trained models behave when using some of these embeddings.
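For context on what these pretrained models look like: GloVe vectors, for instance, are distributed as plain text, one word per line followed by its vector components, so a subset is easy to load by hand. A minimal sketch (the two sample lines below are hypothetical, not from a real GloVe file):

```python
import numpy as np

def load_glove(lines):
    """Parse GloVe-style lines ("word v1 v2 ... vN") into a dict of vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.array(parts[1:], dtype=float)
    return vectors

# Hypothetical lines in the GloVe text format; real files hold hundreds of
# thousands of words with 50 to 300 components each (hence the ~GB downloads).
sample = [
    "good 0.9 0.1 0.3",
    "bad -0.8 0.2 0.1",
]
vocab = load_glove(sample)
print(vocab["good"])  # [0.9 0.1 0.3]
```

The vector dimensionality (50D, 300D, etc. in the model names below) is simply the number of components per word, which is also why the higher-dimensional files are so much larger to download.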


There are already several models available; I'll only run the test with some of them because, during the training process, the pretrained models are downloaded on demand, and downloading ~6 GB per test is, to say the least, interesting.


Well, the results are pretty interesting. Note that the plain WordEmbeddings run and the Sswe run report identical metrics, which suggests SSWE is the default pretrained model:

=============== Evaluating model normal ===============
Accuracy: 66.60%
Auc: 73.97%
F1Score: 61.78%
=============== End evaluating ===============

=============== Evaluating model using WordEmbeddings ===============
Accuracy: 72.30%
Auc: 81.19%
F1Score: 70.50%
=============== End evaluating ===============

=============== Evaluating model using WordEmbeddings GloVe50D ===============
Accuracy: 66.10%
Auc: 69.32%
F1Score: 64.28%
=============== End evaluating ===============

=============== Evaluating model using WordEmbeddings GloVe300D ===============
Accuracy: 67.80%
Auc: 73.23%
F1Score: 66.60%
=============== End evaluating ===============

=============== Evaluating model using WordEmbeddings GloVeTwitter50D ===============
Accuracy: 65.30%
Auc: 70.06%
F1Score: 64.26%
=============== End evaluating ===============

=============== Evaluating model using WordEmbeddings GloVeTwitter200D ===============
Accuracy: 65.40%
Auc: 72.63%
F1Score: 64.69%
=============== End evaluating ===============

=============== Evaluating model using WordEmbeddings Sswe ===============
Accuracy: 72.30%
Auc: 81.19%
F1Score: 70.50%
=============== End evaluating ===============

 

The complete code for the app can be downloaded from https://github.com/elbruno/Blog/tree/master/20180808%20MLNET%200.4%20WordEmbeddings

Happy Coding!

Greetings @ Toronto

El Bruno

