This is simple; however, the sensor does not support constant requests, and it may return a “too many requests” response when called directly. So the idea of reading the sensor information directly inside the web request was not valid from day zero.
I asked for support / guidance, and my amazing and smart friends showed me the concept of OVER ENGINEERING. Docker, Compose, queues, coordination and more were part of some of the proposals. However, they also showed me the easiest and simplest way to solve this: multi-threading.
Thread 1, where an infinite loop requests information from the sensor and stores the latest value to be shared.
Thread 2, where a web server processes requests and shares the latest sensor information.
Easy! And after a couple of tests, I managed to create a single file implementing this pattern.
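Here is a minimal sketch of that pattern, not the original file: the read_sensor() helper, the Flask endpoint and the port number are illustrative assumptions.

# Sketch: thread 1 polls the sensor in a loop, thread 2 (Flask) serves the latest value
import threading
import time
from flask import Flask, jsonify

latest_value = None  # latest sensor reading, shared between both threads

def read_sensor():
    # placeholder for the real sensor call (assumption for this sketch)
    return 42

def sensor_loop():
    global latest_value
    while True:
        latest_value = read_sensor()
        time.sleep(2)  # throttle calls to avoid the "too many requests" response

app = Flask(__name__)

@app.route("/")
def get_value():
    # the web server never touches the sensor, it only shares the latest stored value
    return jsonify(value=latest_value)

if __name__ == "__main__":
    threading.Thread(target=sensor_loop, daemon=True).start()
    app.run(host="0.0.0.0", port=8080)

The key design point is that only the sensor thread talks to the hardware, so the web endpoint can be called as often as needed without hitting the sensor's rate limit.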
I have a ToDo on my list to add some new drone demos. In order to do this, I was planning to run some tests with pretrained models and use them. The first two on my list are YOLO and MobileNetSSD (see references).
Let’s start with one of the most popular object detection tools, YOLOv3. The official definition:
YOLO (You Only Look Once) is a real-time object detection algorithm: a single deep convolutional neural network that splits the input image into a set of grid cells. So, unlike image classification or face detection, each grid cell in the YOLO algorithm will have an associated vector in the output that tells us:
If an object exists in that grid cell.
The class of that object (i.e label).
The predicted bounding box for that object (location).
I picked up some sample code from GitHub repositories and, as usual, from PyImageSearch (see references), and I created a real-time object detection scenario using my webcam as the input feed for YOLOv3.
The final demo works great; we can use the 80 classes that YOLOv3 supports, and it runs at ~2 FPS.
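For reference, a trimmed-down sketch of that kind of loop with OpenCV's DNN module looks roughly like this; the yolov3.cfg, yolov3.weights and coco.names file names are assumptions about where the model files live, and the 0.5 threshold is just a sensible default.

# Sketch: real-time YOLOv3 detection on the webcam using OpenCV's DNN module
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
with open("coco.names") as f:
    classes = [line.strip() for line in f]
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers().flatten()]

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    for output in net.forward(output_layers):
        for detection in output:
            scores = detection[5:]
            class_id = int(np.argmax(scores))
            if scores[class_id] > 0.5:
                cx, cy, bw, bh = (detection[:4] * np.array([w, h, w, h])).astype(int)
                x, y = int(cx - bw / 2), int(cy - bh / 2)
                cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
                cv2.putText(frame, classes[class_id], (x, y - 5),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("YOLOv3", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()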
Another very popular object detection tool is MobileNetSSD, and the important part here is SSD: Single Shot Detection. Let’s go to the definition:
Single Shot object detection, or SSD, takes one single shot to detect multiple objects within the image; in the example image from the original article, coffee, an iPhone, a notebook, a laptop and glasses are detected at the same time.
It is composed of two parts:
– Extract feature maps, and
– Apply convolution filters to detect objects.
SSD was developed by Google researchers to keep a balance between the two main object detection approaches, YOLO and R-CNN.
There are two SSD models available:
– SSD300: the input size is fixed at 300×300. It is used with lower resolution images, offers faster processing, and is less accurate than SSD512.
– SSD512: the input size is fixed at 512×512. It is used with higher resolution images and is more accurate than the other model.
SSD is faster than R-CNN because R-CNN needs two shots, one to generate region proposals and one to detect objects, whereas SSD does both in a single shot.
The MobileNet SSD method was first trained on the COCO dataset and was then fine-tuned on PASCAL VOC reaching 72.7% mAP (mean average precision).
For this demo, I’ll use the SSD300 model. Even if the drone supports better quality images and the SSD512 model works with bigger images, SSD300 is a good fit here.
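As with YOLO, here is a minimal sketch of that kind of demo, using the popular Caffe version of MobileNet SSD through OpenCV's DNN module; the MobileNetSSD_deploy.prototxt / MobileNetSSD_deploy.caffemodel file names and the 0.5 confidence threshold are assumptions.

# Sketch: MobileNet SSD (300x300 input, Caffe model) detection on the webcam with OpenCV DNN
import cv2
import numpy as np

CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus",
           "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike",
           "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]

net = cv2.dnn.readNetFromCaffe("MobileNetSSD_deploy.prototxt", "MobileNetSSD_deploy.caffemodel")

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > 0.5:
            idx = int(detections[0, 0, i, 1])
            x1, y1, x2, y2 = (detections[0, 0, i, 3:7] * np.array([w, h, w, h])).astype(int)
            cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 2)
            cv2.putText(frame, CLASSES[idx], (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 0, 0), 2)
    cv2.imshow("MobileNetSSD", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()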
This sample works at ~20 FPS, and that triggered my curiosity to learn more about this second approach. I started to read a lot about it and found some amazing articles and papers. In the end, if you are interested in my personal take, I really enjoyed this 30 min video about the different detectors side-by-side.
In my post series I already wrote about how to detect faces. We can do this with a camera and OpenCV. However, a drone can also be moved on command, so let’s write some lines to detect a face and calculate the orientation and distance of the detected face from the center of the camera frame.
In order to do this, first let’s draw a grid on the camera frame, and once a face is detected, show its distance and orientation from the center.
Let’s start with the grid. The idea is to create a 3×3 grid on the camera frame and use the center cell as the reference for detected objects. The code to create the 3×3 grid is simple:
We use the line() function in OpenCV and do some calculations to get the start and end points for the 4 lines of the grid: 2 vertical lines and 2 horizontal lines. For this demo, I’ll implement this with my main webcam.
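A minimal sketch of that grid-drawing routine (the frame size constants, color and thickness values are assumptions; the camera_Width / camera_Heigth names match the variables used in the function below):

# Sketch: draw a 3x3 grid over the camera frame using cv2.line()
import cv2

camera_Width = 640    # frame width (assumed value)
camera_Heigth = 480   # frame height (assumed value, name as used in the rest of this post)
colorWhite = (255, 255, 255)
lineThickness = 1

def displayGrid(frame):
    # 2 vertical lines at 1/3 and 2/3 of the width
    cv2.line(frame, (int(camera_Width / 3), 0), (int(camera_Width / 3), camera_Heigth), colorWhite, lineThickness)
    cv2.line(frame, (int(2 * camera_Width / 3), 0), (int(2 * camera_Width / 3), camera_Heigth), colorWhite, lineThickness)
    # 2 horizontal lines at 1/3 and 2/3 of the height
    cv2.line(frame, (0, int(camera_Heigth / 3)), (camera_Width, int(camera_Heigth / 3)), colorWhite, lineThickness)
    cv2.line(frame, (0, int(2 * camera_Heigth / 3)), (camera_Width, int(2 * camera_Heigth / 3)), colorWhite, lineThickness)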
Based on my face detection samples and other samples on GitHub (see references), I’ll now calculate the position of the detected face (given its x, y, w, h) relative to the center of the camera:
def calculatePositionForDetectedFace(frame, x, y, h, w):
    # calculate direction and relative position of the face
    # centerZone, colorGreen, colorRed, colorBlue and messageThickness are module-level constants
    cx = int(x + (w / 2))  # center X of the face
    cy = int(y + (h / 2))  # center Y of the face
    dir = 0  # 0 = centered, 1 = left, 2 = right, 3 = up, 4 = down
    if cx < int(camera_Width / 2) - centerZone:
        cv2.putText(frame, " LEFT ", (20, 50), cv2.FONT_HERSHEY_COMPLEX, 1, colorGreen, 2)
        dir = 1
    elif cx > int(camera_Width / 2) + centerZone:
        cv2.putText(frame, " RIGHT ", (20, 50), cv2.FONT_HERSHEY_COMPLEX, 1, colorGreen, 3)
        dir = 2
    elif cy < int(camera_Heigth / 2) - centerZone:
        cv2.putText(frame, " UP ", (20, 50), cv2.FONT_HERSHEY_COMPLEX, 1, colorGreen, 3)
        dir = 3
    elif cy > int(camera_Heigth / 2) + centerZone:
        cv2.putText(frame, " DOWN ", (20, 50), cv2.FONT_HERSHEY_COMPLEX, 1, colorGreen, 3)
        dir = 4

    # display detected face frame, line from center and direction to go
    cv2.line(frame, (int(camera_Width / 2), int(camera_Heigth / 2)), (cx, cy), colorRed, messageThickness)
    cv2.rectangle(frame, (x, y), (x + w, y + h), colorBlue, messageThickness)
    cv2.putText(frame, str(int(x)) + " " + str(int(y)), (x - 20, y - 45), cv2.FONT_HERSHEY_COMPLEX, 0.7, colorRed, messageThickness)
    return dir
The output is similar to this one:
And now, with the base code completed, it’s time to add this logic to the drone samples!
Bonus: the complete code.
# Bruno Capuano 2020
# display the camera feed using OpenCV
# display a 3×3 Grid
# detect faces using openCV and haar cascades
# calculate the relative position for the face from the center of the camera
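The complete file isn’t reproduced here; a condensed sketch of its main loop, assuming OpenCV’s bundled Haar cascade for frontal faces and the helper functions shown above (the constants below are assumed values, not the originals):

# Sketch of the main loop: webcam feed + 3x3 grid + Haar cascade face detection + relative position
import cv2

centerZone = 50        # dead zone around the center (assumed value)
colorGreen = (0, 255, 0)
colorRed = (0, 0, 255)
colorBlue = (255, 0, 0)
messageThickness = 2

face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, camera_Width)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, camera_Heigth)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    displayGrid(frame)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        calculatePositionForDetectedFace(frame, x, y, h, w)
    cv2.imshow("Face position", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()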
We already have the drone camera feed ready to process, so let’s do some Image Segmentation today. As usual, let’s start with the formal definition of Image Segmentation:
In digital image processing and computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as image objects). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.
The result of image segmentation is a set of segments that collectively cover the entire image, or a set of contours extracted from the image (see edge detection). Each of the pixels in a region are similar with respect to some characteristic or computed property, such as color, intensity, or texture. Adjacent regions are significantly different with respect to the same characteristic(s). When applied to a stack of images, typical in medical imaging, the resulting contours after image segmentation can be used to create 3D reconstructions with the help of interpolation algorithms like marching cubes.
The technique is amazing, and once it is attached to the drone camera, we can get something like this:
I used a Python library that does most of the work: PixelLib. It was created by an amazing set of colleagues, so please check the references and take a look at the project description.
PixelLib: is a library built for an easy implementation of Image Segmentation in real life problems. PixelLib is a flexible library that can be integrated into software solutions that require the application of Image Segmentation.
Once I had all the pieces together, I submitted a Pull Request with a single change to allow the use of OpenCV webcam camera frames, and I got a basic demo up and running.
Let’s review the code
Line 147. That’s it, a single line that performs the instance segmentation and also displays the bounding boxes.
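For reference, the heart of that demo looks roughly like the sketch below; the mask_rcnn_coco.h5 weights file name and the segmentFrame() call reflect the PixelLib documentation at the time I wrote this, so check the project for the current API.

# Sketch: instance segmentation on camera frames with PixelLib + OpenCV
# Assumes the Mask R-CNN COCO weights file (mask_rcnn_coco.h5) was downloaded locally
import cv2
from pixellib.instance import instance_segmentation

segmenter = instance_segmentation()
segmenter.load_model("mask_rcnn_coco.h5")

cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # single call that segments the frame and draws the bounding boxes
    segmask, output = segmenter.segmentFrame(frame, show_bboxes=True)
    cv2.imshow("Instance Segmentation", output)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()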
# Bruno Capuano
# enable drone video camera
# display video camera using OpenCV
# display FPS
# add a bottom image overlay, using a background image
# key D enable / disable instance segmentation detection
Today I’ll step back a couple of posts and add 2 simple lines that allow me to save a video file from the drone camera. This was a request, and it makes a lot of sense to have a recorded file from the drone camera.
The video will later contain detected objects and more, so let’s go with the code. All the magic happens here:
Lines 97-103. Open the drone camera stream, and also open a video output stream to save the video file.
Lines 123-124. Display the camera feed and add the camera frame to the output video file.
Lines 136-139. Dispose of the objects and close the video output file.
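Stripped of the drone-specific details, the pattern is the standard OpenCV VideoWriter flow; in this sketch the UDP stream URL, output file name, FPS and frame size are assumptions.

# Sketch: read the camera stream, display it, and write every frame to a video file
import cv2

cap = cv2.VideoCapture("udp://0.0.0.0:11111")  # drone video stream (assumed URL for this sketch)
fourcc = cv2.VideoWriter_fourcc(*"XVID")
video_out = cv2.VideoWriter("drone_output.avi", fourcc, 30.0, (960, 720))  # assumed FPS and size

while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.resize(frame, (960, 720))  # make sure the frame matches the writer size
    cv2.imshow("Drone camera", frame)      # display the camera feed
    video_out.write(frame)                 # add the camera frame to the output video file
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()          # dispose objects
video_out.release()    # close the video output file
cv2.destroyAllWindows()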