#VS2019 – ML.NET Model Builder for Object Detection, be careful with file names and VoTT


Hi!

This is a follow-up to yesterday’s post on the new Model Builder Object Detection scenario. I ran into the error below, and it took me some time to figure out the reason.

As usual, Model Builder reports the error with very few details and suggests checking the Azure ML Portal for more information.

On the Azure ML Portal, we can see that 3 experiments failed. Now it is time to dig into the log files and try to figure out what happened.

mlnet 02 azure ml portal error

On the main experiment, after reading the logs, I found a [File missing for image] error, along with other file-related issues.

mlnet 03 azure portal error file missing

And, after some research (test, fail, and test again), I realized that some of my original file names contain a space, for example [apple 04.png].

When we create the labeling project using VoTT, the exported project URL-encodes the file names, so [apple 04.png] becomes [apple%2004.png]. Here is a part of the exported project:

{
    "name": "mlnet bad file name",
    "securityToken": "mlnet bad file name Token",
    "videoSettings": {
        "frameExtractionRate": 15
    },
    "tags": [
        {
            "name": "apple",
            "color": "#5db300"
        }
    ],
    "id": "aajYgiNHg",
    "activeLearningSettings": {
        "autoDetect": false,
        "predictTag": true,
        "modelPathType": "coco"
    },
    "version": "2.2.0",
    "lastVisitedAssetId": "3d30adce06faf9bf8c6540ec6435ef61",
    "assets": {
        "3d30adce06faf9bf8c6540ec6435ef61": {
            "asset": {
                "format": "png",
                "id": "3d30adce06faf9bf8c6540ec6435ef61",
                "name": "apple%2004.png",
                "path": "file:C:/ML/FreshFood/mlnet%20bad%20file%20names/apple%2004.png",
                "size": {
                    "width": 584,
                    "height": 510
                },
                "state": 2,
                "type": 1
            },
            "regions": [
                {
                    "id": "4BDOAk0Xq",
                    "type": "RECTANGLE",
                    "tags": [
                        "apple"
                    ],
                    "boundingBox": {
                        "height": 505.0856531049251,
                        "width": 579.0878504672897,
                        "left": 1.782359813084112,
                        "top": 1.6381156316916488
                    },

The solution was to use a tool to remove all the spaces and non-standard characters from the file names in my image data set. I then did the labeling process again, and when I relaunched my Object Detection scenario from Model Builder, everything worked fine!
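
Just as a reference, a tiny Python script like the one below could do that cleanup. This is a minimal sketch, not the exact tool I used, and the folder path is an assumption, so adjust it to your own data set:

import re
from pathlib import Path

# hypothetical folder with the source images used for the VoTT project
image_folder = Path(r'C:\ML\FreshFood\mlnet bad file names')

for image_path in image_folder.glob('*.png'):
    # keep letters, digits, dots, dashes and underscores; drop anything else (including spaces)
    clean_name = re.sub(r'[^A-Za-z0-9._-]', '', image_path.name)
    if clean_name != image_path.name:
        print(f'renaming [{image_path.name}] to [{clean_name}]')
        image_path.rename(image_path.with_name(clean_name))

With clean names like [apple04.png], VoTT has nothing to escape, and the exported JSON matches the files uploaded for training.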

Happy coding!

Greetings

El Bruno

Resources

#VS2019 – ML.NET Model Builder for Object Detection using #Azure Compute #mlnet #objectdetection


Hi!

There is a new scenario available in ML.NET Model Builder for Visual Studio 2019: Object Detection. This scenario goes beyond image tagging: it allows us to detect objects in an image and get the specific coordinates and size of each detected object. As usual, it requires a starting data set with images and labels.

Model Builder Object Detection

This one is super helpful and also very easy to use. Let’s start with the first step:

Add a new Machine Learning element to a Visual Studio project, and select the Object Detection scenario.

select object detection scenario

This scenario only supports the Azure training environment. Lucky me, I already have an Azure ML environment up and running (see references for my previous posts).

The next step is to define the data source for the scenario. To do this, we need an image labeling file. The suggested tool for this is VoTT (see references); you need to install the local version of VoTT. As far as I understand, this Model Builder scenario only supports VoTT projects that use local files.

Define Source Images and Labels

For this post, I created a sample project using Fresh Food labels. Just 4:

  • apple
  • apple rotten
  • banana
  • banana rotten

Important: the source connection defines the path to the source images that will be used for labeling. The target connection defines the project location and also the path for the exported project file.

vott project settings

Once we have defined the project properties, we can start creating tags and labeling the objects in the project images. This is a manual process, and it can be tedious.

Important: VoTT supports custom detailed areas like the ones you see in the image below. For the Model Builder scenario, these custom areas will be transformed into standard rectangles.

At any moment, you can save and export your project. This generates a new folder named [vott-json-export] in the [Target Connection] location. In this folder we can find all the labeled images and also the exported JSON file that we will use in Visual Studio.

This is a sample of the exported file’s content:

{
    "name": "MLNet Fresh Food",
    "securityToken": "MLNet Fresh Food Token",
    "videoSettings": {
        "frameExtractionRate": 15
    },
    "tags": [
        {
            "name": "apple_rotten",
            "color": "#5db300"
        },
        {
            "name": "apple",
            "color": "#e81123"
        },
        {
            "name": "banana",
            "color": "#6917aa"
        },
        {
            "name": "banana_rotten",
            "color": "#015cda"
        }
    ],
    "id": "uF6EQo2eX",
    "activeLearningSettings": {
        "autoDetect": false,
        "predictTag": true,
        "modelPathType": "coco"
    },
    "version": "2.2.0",
    "lastVisitedAssetId": "27f36f1bbeb5abc8a505d677931d8e1d",
    "assets": {
        "27f36f1bbeb5abc8a505d677931d8e1d": {
            "asset": {
                "format": "png",
                "id": "27f36f1bbeb5abc8a505d677931d8e1d",
                "name": "bananarotten08.png",
                "path": "file:C:/ML/FreshFood/mlnet/bananarotten08.png",
                "size": {
                    "width": 568,
                    "height": 298
                },
                "state": 2,
                "type": 1
            },
            "regions": [
                {
                    "id": "VVg0KLN6d",
                    "type": "RECTANGLE",
                    "tags": [
                        "banana_rotten"
                    ],
                    "boundingBox": {
                        "height": 158.57173447537474,
                        "width": 470.0359550561798,
                        "left": 21.26011235955056,
                        "top": 24.248394004282655
                    },
                    "points": [
                        {
                            "x": 21.26011235955056,
                            "y": 24.248394004282655
                        },
                        {
                            "x": 491.29606741573036,
                            "y": 24.248394004282655
                        },
                        {
                            "x": 491.29606741573036,
                            "y": 182.8201284796574
                        },
                        {
                            "x": 21.26011235955056,
                            "y": 182.8201284796574
                        }
                    ]
                }
            ],
            "version": "2.2.0"
        },

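If you are curious about how the export maps images to labeled regions, a small script like this one can walk the file. The file name here is just an example; point it to the JSON that VoTT generates in your own vott-json-export folder:

import json

# hypothetical path to the VoTT export; use your own vott-json-export JSON file
export_path = r'C:\ML\FreshFood\mlnet\vott-json-export\MLNet-Fresh-Food-export.json'

with open(export_path, encoding='utf-8') as export_file:
    project = json.load(export_file)

# each asset is an image, each region is a labeled bounding box
for asset_id, entry in project['assets'].items():
    image_name = entry['asset']['name']
    for region in entry['regions']:
        box = region['boundingBox']
        tags = ', '.join(region['tags'])
        print(f"{image_name}: [{tags}] left={box['left']:.0f} top={box['top']:.0f} "
              f"width={box['width']:.0f} height={box['height']:.0f}")
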
Back in Visual Studio, we can select this file as the input JSON file for our scenario.

mlnet object detection definition using the exported file

Train the Model

Everything is ready, so let’s train the model using Azure Compute ;D

The process is similar to the Image Classification scenario: it starts by uploading the source images to an Azure Blob, and then launches 4 Azure ML experiments:

  • AutoML
  • Preparation
  • Script
  • HyperDrive
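
If you want to peek at these runs from code instead of the portal, a few lines with the azureml-core SDK can list them. This is a sketch under assumptions: it expects a config.json for your workspace next to the script, and it simply prints every run it finds:

from azureml.core import Workspace

# assumes a config.json downloaded from the Azure ML Portal sits next to this script
ws = Workspace.from_config()

# list every experiment in the workspace and the status of its runs
for experiment_name, experiment in ws.experiments.items():
    for run in experiment.get_runs():
        print(f'{experiment_name}: run {run.number} - {run.status}')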

Once the process is complete, we can also check output metrics like precision and accuracy. This is not the best model; however, I consider it a great starting point for my demo blog.

If you want to know more about Azure ML, this is a great opportunity to play around with these experiments. The Script experiment (Run 16 in the image) has a lot to dig into, like the complete output log, some training Python files, the generated ONNX model, and more.
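
As a side note, if you download the generated ONNX model from the experiment outputs, you can inspect it locally with a few lines of Python and onnxruntime. The model file name below is an assumption; the script simply asks the model for its input and output definitions:

import onnxruntime as ort

# hypothetical file name for the ONNX model downloaded from the Azure ML experiment outputs
session = ort.InferenceSession('model.onnx')

# list the inputs and outputs the model actually expects, before trying to run it
for model_input in session.get_inputs():
    print(f'input:  {model_input.name} {model_input.shape} {model_input.type}')
for model_output in session.get_outputs():
    print(f'output: {model_output.name} {model_output.shape} {model_output.type}')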

mlnet 11 object detection azure track 01

Test and consume the generated Model

Back in Visual Studio, we can check the model metrics and also the best model (I have so many questions here!).

And we can test the generated model. As you can see in the next images, using a threshold value of 50, the results are good. It still does not detect the rotten apple among the good ones, so I may need to figure out some strategies to improve my model.

sample test detecting apples
sample test detecting bananas
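
By the way, the threshold works as a simple cut on the prediction score: only detections with a score at or above the value are kept. A minimal sketch of the idea, with made-up detections:

# minimal sketch of score-threshold filtering; the detections are made-up values
threshold = 0.50  # the threshold value of 50 from the Model Builder test page, as a fraction

detections = [
    {'label': 'apple', 'score': 0.91, 'box': (12, 18, 205, 210)},
    {'label': 'apple_rotten', 'score': 0.37, 'box': (220, 25, 400, 200)},
    {'label': 'banana', 'score': 0.78, 'box': (30, 240, 330, 350)},
]

kept = [d for d in detections if d['score'] >= threshold]
for d in kept:
    print(f"{d['label']} ({d['score']:.2f}) at {d['box']}")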

Remember that all these features are still in preview; however, they are an amazing starting point and a great tool to use and learn from.

Happy coding!

Greetings

El Bruno

Resources

#Coding4Fun – How to control your #drone with 20 lines of code! (11/N)


Hi!

Today’s code objective is very simple:

The drone is flying happily, but if the camera detects a banana, the drone must land!

Let’s take a look at the program in action:

drone flying and landing when it detects a banana

And a couple of notes regarding the app:

  • I still use Haar cascades for object detection. I found an article with an XML file to detect bananas, so I’m working with that one (see references).
  • Haar cascades are not the best technique for object detection. During testing, I found a lot of false positives, mostly small portions of the frame that were detected as bananas. One solution was to limit the size of the detected objects using OpenCV; you can see this in the minSize parameter in the code below (I’ll write more about this in the future).
  • As you can see in the animation, when the drone is a few meters away, the video feed becomes messy. And because the object detection is performed locally, it takes some time to detect the banana.
  • I also implemented some code to take off the drone when the user presses the ‘T’ key, and land the drone when the user presses the ‘L’ key.
  • The code is starting to become a mess, so some refactoring is needed.

Here is the code:

# Bruno Capuano
# detect faces using haar cascades from https://github.com/opencv/opencv/tree/master/data/haarcascades
# enable drone video camera
# display video camera using OpenCV
# display FPS
# detect faces and bananas
# launch the drone with key T, and land with key L
# if the drone is flying, and a banana is detected, land the drone
import cv2
import socket
import time
import threading


def receiveData():
    # read command responses from the drone
    global response
    while True:
        try:
            response, _ = clientSocket.recvfrom(1024)
        except:
            break


def readStates():
    # read the drone state feed and keep the battery level updated
    global battery
    while True:
        try:
            response_state, _ = stateSocket.recvfrom(256)
            if response_state != 'ok':
                response_state = response_state.decode('ASCII')
                list = response_state.replace(';', ':').split(':')
                battery = int(list[21])
        except:
            break


def sendCommand(command):
    # send a command and wait up to 5 seconds for a response
    global response
    timestamp = int(time.time() * 1000)
    clientSocket.sendto(command.encode('utf-8'), address)
    while response is None:
        if (time.time() * 1000) - timestamp > 5 * 1000:
            return False
    return response


def sendReadCommand(command):
    response = sendCommand(command)
    try:
        response = str(response)
    except:
        pass
    return response


def sendControlCommand(command):
    # retry a control command up to 5 times until the drone answers 'ok'
    response = None
    for i in range(0, 5):
        response = sendCommand(command)
        if response == 'OK' or response == 'ok':
            return True
    return False


# -----------------------------------------------
# Main program
# -----------------------------------------------
# connection info
UDP_IP = '192.168.10.1'
UDP_PORT = 8889
last_received_command = time.time()
STATE_UDP_PORT = 8890
address = (UDP_IP, UDP_PORT)
response = None
response_state = None
clientSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
clientSocket.bind(('', UDP_PORT))
stateSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
stateSocket.bind(('', STATE_UDP_PORT))

# start threads
recThread = threading.Thread(target=receiveData)
recThread.daemon = True
recThread.start()
stateThread = threading.Thread(target=readStates)
stateThread.daemon = True
stateThread.start()

# connect to drone
response = sendControlCommand("command")
print(f'command response: {response}')
response = sendControlCommand("streamon")
print(f'streamon response: {response}')

# drone information
battery = 0

# enable face and banana detection
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
banana_cascade = cv2.CascadeClassifier('banana_classifier.xml')

# open UDP video feed
print(f'opening UDP video feed, wait 2 seconds ')
videoUDP = 'udp://192.168.10.1:11111'
cap = cv2.VideoCapture(videoUDP)
time.sleep(2)

# main loop
banana_detected = False
drone_flying = False
fpsInfo = ''
i = 0
while True:
    i = i + 1
    start_time = time.time()
    try:
        _, frameOrig = cap.read()
        frame = cv2.resize(frameOrig, (480, 360))
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # detect faces
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        for (x, y, w, h) in faces:
            cv2.rectangle(frame, (x, y), ((x + w), (y + h)), (0, 0, 255), 2)
            font = cv2.FONT_HERSHEY_COMPLEX_SMALL
            cv2.putText(frame, 'face', (x + 6, y - 6), font, 0.7, (255, 255, 255), 1)

        # detect bananas, limiting the minimum size to reduce false positives
        bananas = banana_cascade.detectMultiScale(gray,
                                                  scaleFactor=1.3,
                                                  minNeighbors=5,
                                                  minSize=(150, 50))
        for (x, y, w, h) in bananas:
            cv2.rectangle(frame, (x, y), ((x + w), (y + h)), (0, 255, 0), 2)
            font = cv2.FONT_HERSHEY_COMPLEX_SMALL
            cv2.putText(frame, 'bananas', (x + 6, y - 6), font, 0.7, (255, 255, 255), 1)
        banana_detected = len(bananas) > 0

        # fly logic: if the drone is flying and a banana is detected, land
        if drone_flying and banana_detected:
            drone_flying = False
            msg = "land"
            sendCommand(msg)
            time.sleep(5)
            break

        # display fps
        if (time.time() - start_time) > 0:
            fpsInfo = "FPS: " + str(1.0 / (time.time() - start_time))  # FPS = 1 / time to process loop
            font = cv2.FONT_HERSHEY_DUPLEX
            cv2.putText(frame, fpsInfo, (10, 20), font, 0.4, (255, 255, 255), 1)
        cv2.imshow('@elbruno - DJI Tello Camera', frame)
        sendReadCommand('battery?')
        print(f'banana: {banana_detected} - flying: {drone_flying} - battery: {battery} % - i: {i} - {fpsInfo}')
    except Exception as e:
        print(f'exc: {e}')
        pass
        # raise e

    # keyboard controls: T = take off, L = land, Q = quit
    if cv2.waitKey(1) & 0xFF == ord('t'):
        drone_flying = True
        detection_started = True
        msg = "takeoff"
        sendCommand(msg)
    if cv2.waitKey(1) & 0xFF == ord('l'):
        drone_flying = False
        msg = "land"
        sendCommand(msg)
        time.sleep(5)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# land (just in case) and stop the video stream
msg = "land"
sendCommand(msg)  # land
response = sendControlCommand("streamoff")
print(f'streamoff response: {response}')

In the next posts, I’ll analyze in more detail how this works, and a couple of improvements that I can implement.

Happy coding!

Greetings

El Bruno

References

My Posts

#Windows10 – Windows Vision Skills (Preview), an amazing set of AI APIs to run in the edge!

Hi!

Today’s announcement is a big one if you are interested in moving AI capabilities to the edge. The Windows team made the preview of the Windows Vision Skills framework public:

Windows Vision Skills framework is meant to standardize the way AI and CV is put to use within a WinRT application running on the edge. It aims to abstract away the complexity of AI and CV techniques by simply defining the concept of skills which are modular pieces of code that process input and produce output. The implementation that contains the complex details is encapsulated by an extensible WinRT API that inherits the base class present in this namespace, which leverages built-in Windows primitives which in-turn eases interop with built-in acceleration frameworks or external 3rd party ones.

The official blog explains the basic features of the framework and describes a set of scenarios like Object Detector, Skeletal Detector, and Emotion Recognizer.

There are UWP apps in the repo samples, and it only took 1 minute to set everything up and get the app running. In the following image, it smoothly detects a person and a chair.

The next image is the sample for the Skeletal Detector (as an old Kinect dev, this really makes me happy!).

This is a big announcement, because all of these APIs are native, and that means we can easily use them in our WinRT applications.

Greetings @ Toronto

El Bruno

References