#ComputerVision – Object Detection with #YoloV4 (work in progress …) and let’s think about ethics in Computer Vision


Hi!

So, after yesterday’s post where I used YoloV3 and MobileNetSSD, I also remembered that YoloV4 was released in April. I managed to make my code work with YoloV4, although with some poor FPS results.

If you are interested in the code, let me know and I’ll be happy to share it. It’s still a mess: working, but a mess.
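
While I clean it up, here is a minimal sketch of the approach, assuming OpenCV 4.4 or later (the first release that ships the Mish activation YoloV4 needs) and the standard yolov4.cfg / yolov4.weights files from the official repository. The file names, thresholds and CUDA lines are assumptions, not my exact code:

```python
# Minimal sketch: load YoloV4 with OpenCV's DNN module and run a single detection.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")

# On CPU the FPS is low; these two lines only help if your OpenCV build includes CUDA support
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

# The DetectionModel wrapper handles blob creation, output decoding and NMS for us
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

frame = cv2.imread("sample.jpg")
class_ids, scores, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
print(class_ids, scores, boxes)
```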

Abstract: There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100.

However, I also learned part of the story behind YoloV4, and it is very relevant these days. The next 10-minute video really nails the explanation of how YoloV4 works.

YOLO History

YOLO was developed by Joseph Redmon. It was first presented in 2016, and it was key for object recognition research, leading to better and faster Computer Vision algorithms.

The latest version, YOLO v4, is currently developed by three developers:

  • Alexey Bochkovskiy
  • Chien-Yao Wang
  • Hong-Yuan Mark Liao

No Joseph Redmon in YOLOv4?

Joseph Redmon quit developing YOLO before v4 because of the potential misuse of his tech. He recently announced that he would stop doing computer vision research because of military applications and ethical issues…

So, why is this important? It’s all about how we use this technology. There are amazing advances in the Computer Vision area, but we are also lacking regulation about how to use them.

IBM announced that they will no longer offer facial recognition software

Two days ago, IBM announced that they will no longer offer facial recognition software. The Verge wrote an amazing article about this (see references). These sentences really hit a point regarding ethics and more:

IBM will no longer offer general purpose facial recognition or analysis software, IBM CEO Arvind Krishna said in a letter to Congress today. The company will also no longer develop or research the technology, IBM tells The Verge. Krishna addressed the letter to Sens. Cory Booker (D-NJ) and Kamala Harris (D-CA) and Reps. Karen Bass (D-CA), Hakeem Jeffries (D-NY), and Jerrold Nadler (D-NY).

“IBM firmly opposes and will not condone uses of any [facial recognition] technology, including facial recognition technology offered by other vendors, for mass surveillance, racial profiling, violations of basic human rights and freedoms, or any purpose which is not consistent with our values and Principles of Trust and Transparency,” Krishna said in the letter. “We believe now is the time to begin a national dialogue on whether and how facial recognition technology should be employed by domestic law enforcement agencies.” Facial recognition software has come under scrutiny for issues with racial bias and privacy concerns.

Facial recognition software has improved greatly over the last decade thanks to advances in artificial intelligence. At the same time, the technology — because it is often provided by private companies with little regulation or federal oversight — has been shown to suffer from bias along lines of age, race, and ethnicity, which can make the tools unreliable for law enforcement and security and ripe for potential civil rights abuses.

The Verge, IBM will no longer offer, develop, or research facial recognition technology

There it is, think about this.

Happy coding!

Greetings

El Bruno

Resources

#ComputerVision – Object Detection with #YoloV3 and #MobileNetSSD


Hi!

I have a ToDo on my list to add some new drone demos. In order to do this, I was planning to perform some tests with pretrained models and use them. The first two on my list are Yolo and MobileNetSSD (see references).

YoloV3

Let’s start with one of the most popular object detection tools, YOLOV3. The official definition:

YOLO (You Only Look Once) is a real-time object detection algorithm: a single deep convolutional neural network that splits the input image into a set of grid cells. So, unlike image classification or face detection, each grid cell in the YOLO algorithm will have an associated vector in the output that tells us:

  • If an object exists in that grid cell.
  • The class of that object (i.e., its label).
  • The predicted bounding box for that object (its location).

YoloV3

I picked up some sample code from GitHub repositories and, as usual, from PyImageSearch (see references), and I created a real-time object detection scenario using my webcam as the input feed for YoloV3.
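
The demo follows the usual OpenCV DNN recipe: read a frame, build a blob, run a forward pass and filter the detections. Below is a minimal sketch of that loop; the file names (yolov3.cfg, yolov3.weights, coco.names) and the thresholds are assumptions, not the exact code from those repositories:

```python
# Minimal sketch: YoloV3 object detection on a webcam feed with OpenCV's DNN module.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
layer_names = net.getUnconnectedOutLayersNames()
classes = open("coco.names").read().strip().split("\n")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    # YoloV3 expects a square, normalized blob (416x416 is the usual input size)
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(layer_names)

    boxes, confidences, class_ids = [], [], []
    for output in outputs:
        for detection in output:
            scores = detection[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > 0.5:
                cx, cy, bw, bh = detection[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                confidences.append(confidence)
                class_ids.append(class_id)

    # Non-maximum suppression removes overlapping boxes for the same object
    idxs = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
    for i in np.array(idxs).flatten():
        x, y, bw, bh = boxes[i]
        cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
        cv2.putText(frame, classes[class_ids[i]], (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    cv2.imshow("YoloV3", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```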

Object Detection live sample with Yolo V3

The final demo works great; we can use the 80 classes that YoloV3 supports, and it runs at ~2 FPS.

MobileNetSSD

Another very popular object detection tool is MobileNetSSD. The important part here is SSD, Single Shot Detector. Let’s go to the definition:

Single Shot object detection, or SSD, takes one single shot to detect multiple objects within the image. As you can see in the above image, we are detecting coffee, iPhone, notebook, laptop and glasses at the same time.

It is composed of two parts:

– Extract feature maps, and

– Apply convolution filters to detect objects

SSD was developed by Google research teams to maintain the balance between the two object detection methods, YOLO and R-CNN.

There are specifically two models of SSD available:

– SSD300: In this model the input size is fixed to 300×300. It is used with lower resolution images, has a faster processing speed, and is less accurate than SSD512.

– SSD512: In this model the input size is fixed to 512×512. It is used with higher resolution images and is more accurate than other models.

SSD is faster than R-CNN because in R-CNN we need two shots, one for generating region proposals and one for detecting objects, whereas in SSD it can be done in a single shot.

The MobileNet SSD method was first trained on the COCO dataset and was then fine-tuned on PASCAL VOC reaching 72.7% mAP (mean average precision).

For this demo, I’ll use the SSD300 model. Even if the drone supports better quality images and the SSD512 model works with bigger images, SSD300 is a good fit for this.
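
For reference, below is a minimal sketch of the MobileNetSSD webcam loop with OpenCV’s DNN module, assuming the Caffe MobileNetSSD_deploy.prototxt / MobileNetSSD_deploy.caffemodel files are available locally (the file names and the 0.5 confidence threshold are assumptions):

```python
# Minimal sketch: MobileNetSSD (Caffe, 300x300 input) on a webcam feed with OpenCV's DNN module.
import cv2

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle",
           "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse",
           "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt", "MobileNetSSD_deploy.caffemodel")

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    # SSD300: resize to 300x300 and apply the scale/mean values the model was trained with
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()

    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.5:
            class_id = int(detections[0, 0, i, 1])
            x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * [w, h, w, h]).astype(int)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
            cv2.putText(frame, f"{CLASSES[class_id]}: {confidence:.2f}", (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

    cv2.imshow("MobileNetSSD", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```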

Object Detection with MobileNetSSD

This sample works at ~20 FPS, and this triggered my curiosity to learn more about the second one. I started to read a lot about this and found some amazing articles and papers. In the end, if you are interested in my personal take, I really enjoyed this 30-minute video comparing the different detectors side by side.

Source Code

YoloV3 webcam live object detection

MobileNetSSD webcam live object detection

Happy coding!

Greetings

El Bruno

Resources

#AI – Getting started with #ComputerVision, #DeepLearning, and #OpenCV by Adrian Rosebrock @pyimagesearch


Hi!

When you start to research the amazing world of Computer Vision, you find that there are plenty of courses, tutorials, videos and other resources. Sometimes it’s kind of “too much”, and it’s not easy to choose where to start.

That’s why, when you arrive at one of Adrian Rosebrock’s tutorials or articles, it ends up in your favorite bookmarks. He has amazingly detailed step-by-step tutorials, and I learned a lot about Raspberry Pi and OpenCV from his website.

A couple of weeks ago, Adrian released an amazing resource for Computer Vision enthusiasts:

Need help getting started with Computer Vision, Deep Learning, and OpenCV?

No matter if you are starting from zero, have some knowledge, or are already an expert; you should take a look at this amazing compilation of resources. I’ll copy and paste the main topics.

And I can’t thank Adrian enough for his amazing work and also for sharing all of this!

Happy coding!

Greetings @ Toronto

El Bruno

#AI – Some news in Cognitive Services presented at #MSBuild 2018

Hi!

Again, it’s time to write about the topics that most caught my attention in the news presented during Microsoft Build 2018. In this case I will only comment on some news related to Vision and Speech.

Vision

  • Computer Vision now supports Object Detection. We have the ability to detect objects in an image. I still have to look more in depth at how much we can exploit this capability in Custom Vision.
  • Custom Vision, new formats to export models. Until now we had the ability to export Custom Vision models to CoreML and TensorFlow.
    Now we have 2 new options that are really impressive:

    • Export to ONNX. I already wrote about this. Now we can use these models natively as part of our UWP Apps in Windows 10.
    • Export to Dockerfile. Especially designed for mixed scenarios with Azure Functions and Azure IoT Edge.


Speech

The first thing to comment is a big but very necessary change.

We now have a single service that handles: Speech to Text, Text to Speech and Speech Intent Recognition.

The second point to note is that we now have the ability to create our own Voice Models. This means that we could create Alexa- or Cortana-style assistants using our own voice. Ideal to give to your partner, your mother or your worst enemy.

And with this, I’ll press pause for today. Happy coding!

Greetings @ Toronto

El Bruno

 


#Flow – Analyzing images in 3 steps with Microsoft Flow and Computer Vision #CognitiveServices

Hi!

Today I put my posts on Project Malmo and Minecraft on hold, because thanks to some new connectors in Microsoft Flow, I was able to create an image analysis mobile app in a matter of minutes.

When we create a Flow triggered using a button, we have a new data type [File] for input data. If we use a File as the start of a Flow, and a Computer Vision activity, we can create a simple 3-step process for analyzing photos.


As I commented earlier, in the Button we created an input field of the type File.


Then we used the File Content of the previous step in a Computer Vision action to get the image description.
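
Under the hood, that connector calls the “Describe Image” REST endpoint of the Cognitive Services Computer Vision API. Here is a minimal Python sketch of the equivalent call, assuming a resource in the westus region, the v2.0 API version, and a local photo.jpg; the key and file name are placeholders:

```python
# Minimal sketch: send raw image bytes to the Computer Vision "describe" endpoint
# and print the caption(s) that come back.
import requests

ENDPOINT = "https://westus.api.cognitive.microsoft.com/vision/v2.0/describe"
KEY = "<your-computer-vision-key>"

with open("photo.jpg", "rb") as f:
    image_bytes = f.read()

response = requests.post(
    ENDPOINT,
    headers={
        "Ocp-Apim-Subscription-Key": KEY,
        "Content-Type": "application/octet-stream",
    },
    data=image_bytes,
)
response.raise_for_status()

# The JSON payload contains a "description" node with one or more captions
for caption in response.json()["description"]["captions"]:
    print(f"{caption['text']} (confidence: {caption['confidence']:.2f})")
```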


And finally we show the description returned by the Cognitive Services action in a notification to the mobile device.


If we launch our Flow, we will see that when selecting a file we can take a photo using our smartphone or select one from the photo gallery.


For this sample, I’ll use a simple one: a couple of toys on my desk.


The process begins to work, uploading the photo to a temporary location so the Computer Vision process can analyze it.


And a few seconds later we have the result, which in this case is 100% correct!


 

Happy Coding!

Greetings @ Burlington

El Bruno

References

My posts on Flow


#Event – Materials used during the #MSPSummit 2017 in the #CognitiveServices session

Hello!

Last Saturday the Microsoft Canada team invited me to give a session on Cognitive Services for a group of Microsoft Student Partners during the MSP Summit 2017. I was lucky to give the session with Sabrina (@sabrina_smai) and, as always, it was a great moment. We ended with an example where HoloLens used some Computer Vision services to describe the environment around us (holograms included!)

The slides of the session can be seen here

 

And the source code for some of the examples can be downloaded from Github (link)

Happy coding!

Greetings @ Burlington

El Bruno

References


#ComputerVision – How to create a 3D model of a face using a 2D photo (Amazing !)

Hello!

The advances in Computer Vision are becoming more and more impressive. The suite I know best and use the most is Azure Cognitive Services; however, there are surprises and advances that leave me with my mouth open.

This is the case of the work published by Aaron S. Jackson, Adrian Bulat, Vasileios Argyriou and Georgios Tzimiropoulos, where they explain how to create a 3D model from a 2D photo. The best thing is to see it in action.

(Animation: 3D face reconstructed from a 2D photo)

I recommend you also watch the video where they apply the algorithm in real time to faces in a video.

Now it’s time to try to explain, in my five-year-old’s words, how this works. Behind this algorithm is a Convolutional Neural Network (CNN) that has been trained on 2D images paired with the expected 3D results. The interesting thing about this model is that it has reached such a level of sophistication that it does not need specific reference points on a face; it works on any face.

With the 2D image information, it is possible to rebuild elements of the face, including parts that are not visible in the 2D image. In this way, and after a lot of CNN training, they achieve the results that can be seen in the live demo!

Maybe it’s better to hear this in their own words

Greetings @ Toronto

El Bruno

References