Another huge and amazing event from my friends at the @netcoreconf, and a perfect excuse to talk about Python, machine learning, computer vision and more. This one was also tricky: no slides, just code for 50 minutes, so here are some related resources.
So after yesterday's post where I used YoloV3 and MobileNetSSD, I also remembered that YoloV4 was released in April. I managed to make my code work with YoloV4, with some poor FPS results.
If you are interested in the code, let me know and I'll be happy to share it. It's still a mess: working, but a mess.
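In the meantime, here is a minimal sketch of the kind of loading code involved, assuming the yolov4.cfg and yolov4.weights files from the official Darknet repository and a recent OpenCV build (4.4+) that understands the Mish activation. The file names are placeholders, not my exact demo code.

```python
# Minimal sketch: loading YoloV4 with OpenCV's DNN module (requires OpenCV 4.4+).
# File names are placeholders for the official Darknet cfg/weights.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

frame = cv2.imread("sample.jpg")  # placeholder test image
class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
for class_id, score, box in zip(class_ids, scores, boxes):
    x, y, w, h = box
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("sample_out.jpg", frame)
```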
Abstract: There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100.
However, I also learned part of the story behind YoloV4, and it is very relevant these days. The next 10-minute video really nails the explanation of how YoloV4 works.
YOLO was developed by Joseph Redmon. It was first presented in 2016 and was key for object recognition research, leading to better and faster Computer Vision algorithms.
The latest version, YOLO v4, is currently developed by three developers:
Alexey Bochkovskiy
Chien-Yao Wang
Hong-Yuan Mark Liao
No Joseph Redmon in YOLOv4?
Joseph Redmon quit developing YOLO because of the potential misuse of his tech. He recently announced that he would stop doing computer vision research because of the military applications and ethical issues.
So, why is this important? It's all about how we use this technology. There are amazing advances in the Computer Vision area, but we are also lacking regulation about how to use it.
IBM announced that they will no longer offer facial recognition software
Two days ago, IBM announced that they will no longer offer facial recognition software. The Verge wrote an amazing article about this (see references). These sentences really hit the point regarding ethics and more:
IBM will no longer offer general purpose facial recognition or analysis software, IBM CEO Arvind Krishna said in a letter to Congress today. The company will also no longer develop or research the technology, IBM tells The Verge. Krishna addressed the letter to Sens. Cory Booker (D-NJ) and Kamala Harris (D-CA) and Reps. Karen Bass (D-CA), Hakeem Jeffries (D-NY), and Jerrold Nadler (D-NY).
“IBM firmly opposes and will not condone uses of any [facial recognition] technology, including facial recognition technology offered by other vendors, for mass surveillance, racial profiling, violations of basic human rights and freedoms, or any purpose which is not consistent with our values and Principles of Trust and Transparency,” Krishna said in the letter. “We believe now is the time to begin a national dialogue on whether and how facial recognition technology should be employed by domestic law enforcement agencies.”
Facial recognition software has come under scrutiny for issues with racial bias and privacy concerns.
Facial recognition software has improved greatly over the last decade thanks to advances in artificial intelligence. At the same time, the technology — because it is often provided by private companies with little regulation or federal oversight — has been shown to suffer from bias along lines of age, race, and ethnicity, which can make the tools unreliable for law enforcement and security and ripe for potential civil rights abuses.
I have a ToDo on my list to add some new drone demos. To do this, I was planning to run some tests with pretrained models and use them. The first two on my list are Yolo and MobileNetSSD (see references).
Let’s start with one of the most popular object detection tools, YOLOV3. The official definition:
YOLO (You Only Look Once) is a real-time object detection algorithm: a single deep convolutional neural network that splits the input image into a set of grid cells. Unlike image classification or face detection, each grid cell in the YOLO algorithm has an associated vector in the output that tells us:
If an object exists in that grid cell.
The class of that object (i.e label).
The predicted bounding box for that object (location).
I picked up some sample code from GitHub repositories and, as usual, from PyImageSearch (see references), and created a real-time object detection scenario using my webcam as the input feed for YoloV3.
The final demo works great; we can use the 80 classes that YoloV3 supports, and it runs at ~2 FPS.
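As a rough, hedged sketch of what that webcam loop looks like with OpenCV's DNN module (the file names and thresholds are placeholders, not the exact code from the demo), the grid-cell output vectors described above are parsed like this:

```python
# Minimal YoloV3 webcam sketch, assuming yolov3.cfg, yolov3.weights and
# coco.names are available locally (placeholder paths).
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()
classes = open("coco.names").read().strip().split("\n")

cap = cv2.VideoCapture(0)  # webcam as the input feed
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)

    boxes, confidences, class_ids = [], [], []
    for output in outputs:
        for detection in output:  # one vector per grid cell / anchor
            scores = detection[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > 0.5:
                cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(confidence)
                class_ids.append(class_id)

    # non-maxima suppression to remove overlapping boxes
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    for i in np.array(idxs).flatten():
        x, y, bw, bh = boxes[i]
        cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
        cv2.putText(frame, classes[class_ids[i]], (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    cv2.imshow("YoloV3", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```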
Another very popular object detection tool is MobileNetSSD. The important part here is SSD, Single Shot Detection. Let's go to the definition:
Single Shot object detection, or SSD, takes one single shot to detect multiple objects within the image. In the example image, coffee, an iPhone, a notebook, a laptop and glasses are detected at the same time.
It is composed of two parts:
– Extract feature maps, and
– Apply convolution filter to detect objects
SSD was developed by Google research teams to maintain a balance between the two object detection methods, YOLO and R-CNN.
There are specifically two SSD models available:
– SSD300: In this model the input size is fixed to 300×300. It is used for lower resolution images, gives faster processing speed, and is less accurate than SSD512.
– SSD512: In this model the input size is fixed to 512×512. It is used for higher resolution images and is more accurate than SSD300.
SSD is faster than R-CNN because R-CNN needs two shots, one for generating region proposals and one for detecting objects, whereas SSD does it in a single shot.
The MobileNet SSD method was first trained on the COCO dataset and was then fine-tuned on PASCAL VOC reaching 72.7% mAP (mean average precision).
For this demo, I'll use the SSD300 model. Even if the drone supports better quality images and the SSD512 model works with bigger images, SSD300 is a good fit for this scenario.
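For reference, here is a minimal sketch of how this kind of MobileNetSSD (Caffe) model is typically wired up with OpenCV's DNN module; the prototxt/caffemodel file names and the input image are placeholders, not my exact demo code.

```python
# Minimal MobileNetSSD (SSD300) sketch using OpenCV's DNN module.
# Assumes the widely shared MobileNetSSD_deploy Caffe files (placeholder names).
import cv2
import numpy as np

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
           "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse",
           "motorbike", "person", "pottedplant", "sheep", "sofa", "train",
           "tvmonitor"]

net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt",
                               "MobileNetSSD_deploy.caffemodel")

frame = cv2.imread("drone_frame.jpg")  # placeholder input image
h, w = frame.shape[:2]
# SSD300: the network expects a 300x300 input blob
blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843,
                             (300, 300), 127.5)
net.setInput(blob)
detections = net.forward()

for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:
        class_id = int(detections[0, 0, i, 1])
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        x1, y1, x2, y2 = box.astype(int)
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, CLASSES[class_id], (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
cv2.imwrite("drone_frame_out.jpg", frame)
```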
This sample works at ~20 FPS, which triggered my curiosity to learn more about the second one. I started to read a lot about this and found some amazing articles and papers. In the end, if you are interested in my personal take, I really enjoyed this 30-minute video comparing the different detectors side by side.
When you start to research the amazing world of Computer Vision, you find that there are plenty of courses, tutorials, videos and more resources. Sometimes it's kind of “too much”, and it's not easy to choose where to start.
That's why, when you arrive at one of Adrian Rosebrock's tutorials or articles, it ends up in your favorite bookmarks. He has amazingly detailed step-by-step tutorials, and I learned a lot about Raspberry Pi and OpenCV from his website.
A couple of weeks ago, Adrian released an amazing resource for Computer Vision enthusiasts:
Again, it's time to write about the topics that most caught my attention in the news presented during Microsoft Build 2018. In this case I will only comment on some news related to Vision and Speech.
Computer Vision now supports Object Detection: we have the ability to detect objects in an image. I have to look more in depth at how far we can exploit this capability in Custom Vision.
Custom Vision, new formats to export models. Until now we had the ability to export Custom Vision models to CoreML and TensorFlow.
Now we have two new options that are really impressive:
Export to ONNX. I already wrote about this; now we can use these models natively as part of our UWP Apps in Windows 10 (a quick Python sanity check is sketched after this list).
Export to Docker file. Especially designed for mixed scenarios with Azure Functions and Azure IoT Edge.
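The UWP scenario uses Windows ML from C#, but as a quick way to sanity-check an exported model from Python, here is a minimal, hedged sketch using onnxruntime. The model file name, input shape and preprocessing are assumptions that depend on each exported project.

```python
# Minimal sketch for sanity-checking a Custom Vision ONNX export.
# "model.onnx" and the dummy input shape are placeholders; query the real
# input metadata at runtime.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")
input_meta = session.get_inputs()[0]
print("input name/shape:", input_meta.name, input_meta.shape)

# fake image tensor just to exercise the model; replace with a real
# preprocessed image matching the reported input shape
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_meta.name: dummy})
print("raw output:", outputs[0])
```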
The first thing to comment on is a big but very necessary change.
We now have a single service that handles: Speech to Text, Text to Speech and Speech Intent Recognition.
The second point to note is that we now have the ability to create our own Voice Models. This means that we could create Alexa- or Cortana-style assistants using our own voice. Ideal to give to your partner, your mother or your worst enemy.
And with this I put pause for today. Happy coding!
Today I put my posts on Project Malmo and Minecraft on hold, because thanks to some new connectors in Microsoft Flow, I was able to create an image analysis mobile app in a matter of minutes.
When we create a Flow triggered using a button, we have a new data type [File] for input data. If we use a File as the start of a Flow, and a Computer Vision activity, we can create a simple 3-step process for analyzing photos.
As I commented earlier, in the Button we created an input field of the type File.
Then we used the File Content of the previous step in a Computer Vision action to get the image description.
And finally we show the description returned by the Cognitive Services action in a notification to the mobile device.
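For context, the Computer Vision action in step two wraps the Cognitive Services "describe image" REST call. Here is a minimal Python sketch of the equivalent request; the region, API version, subscription key and file name are placeholders, not values from the actual Flow.

```python
# Minimal sketch of the Computer Vision "describe image" REST call that the
# Flow action wraps; region, key and file name are placeholders.
import requests

SUBSCRIPTION_KEY = "<your-computer-vision-key>"
ENDPOINT = "https://westus.api.cognitive.microsoft.com"
url = ENDPOINT + "/vision/v2.0/describe"

with open("photo.jpg", "rb") as f:
    image_data = f.read()

headers = {
    "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
    "Content-Type": "application/octet-stream",
}
response = requests.post(url, headers=headers, data=image_data)
response.raise_for_status()

analysis = response.json()
# the first caption is what ends up in the mobile notification
print(analysis["description"]["captions"][0]["text"])
```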
If we launch our Flow, we will see that when selecting a file we can take a photo using our smartphone or select one from the photo gallery.
For this sample, I'll use a simple one: a couple of toys on my desk.
The process begins to work, uploading the photo to a temporary location so the Computer Vision process can analyze it.
And a few seconds later we have the result, which in this case is 100% correct!
Last Saturday the Microsoft Canada team invited me to give a session on Cognitive Services to a group of Microsoft Student Partners during the MSP Summit 2017. I was lucky to give the session with Sabrina (@sabrina_smai) and, as always, it was a great moment.
We ended with an example where HoloLens used some Computer Vision services to describe the environment around us (holograms included!).