#WinML – How to create a #Windows10 App using #YOLO for object detection (3 of 4)

Windows 10 and YOLOV2 for Object Detection Series


Today I'll start by showing the final UWP app running, because most of this post is code and more code. The expected output of a Windows 10 object recognition app with YoloV2 is similar to the following image:


Picking up where my previous post left off, we had the result of processing our webcam feed with YoloV2: an array of 21125 float values. This number is not arbitrary. Rereading the YOLOv2 documentation, we see that YOLO divides the image into a 13-by-13 cell grid:


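Each of these cells is responsible for predicting 5 bounding boxes. A bounding box describes the rectangle that contains an object. Each bounding box is described by 25 values (x, y, width, height, a confidence score, and the scores for the 20 classes this model recognizes), so each cell produces 5 * 25 = 125 values. That is where the number comes from:

13 * 13 * 125 = 21125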
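To make the arithmetic concrete, here is a minimal sketch of how the 21125 values break down. It assumes a 20-class Tiny YOLOv2 model and a channel-first layout for the flat output array (125 channels of 13 x 13 values each). The `YoloGridLayout` class and its members are hypothetical names used only for illustration; they are not part of the parser used later in this post.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class YoloGridLayout
{
    public const int GridSize = 13;     // the image is split into 13 x 13 cells
    public const int BoxesPerCell = 5;  // each cell predicts 5 bounding boxes
    public const int ClassCount = 20;   // assumption: 20-class Tiny YOLOv2 model

    // Each box carries x, y, width, height, confidence, plus one score per class.
    public const int ValuesPerBox = 5 + ClassCount;                     // 25
    public const int ChannelCount = BoxesPerCell * ValuesPerBox;        // 125
    public const int TotalValues = GridSize * GridSize * ChannelCount;  // 21125

    // Assumption: channel-first layout, so the 169 values of channel 0 come
    // first, then the 169 values of channel 1, and so on.
    public static int GetOffset(int column, int row, int channel)
    {
        if (column < 0 || column >= GridSize) throw new ArgumentOutOfRangeException(nameof(column));
        if (row < 0 || row >= GridSize) throw new ArgumentOutOfRangeException(nameof(row));
        if (channel < 0 || channel >= ChannelCount) throw new ArgumentOutOfRangeException(nameof(channel));

        return channel * GridSize * GridSize + row * GridSize + column;
    }
}
```

With these assumptions, `YoloGridLayout.GetOffset(12, 12, 124)` points at the last element of the array, index 21124.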

There are many posts that describe how YOLO works internally; I have left some in the references for anyone interested in the details.

In this scenario, the next step was to translate that Grid[21125] into C# objects to work with. The internet being a vast source of knowledge, instead of translating some of the Python classes that already exist, I found that Rene Schulte had, among his GitHub repositories, a fork of another repo that contains the following classes:

  • YoloWinMLParser.cs
    • This class is a parser that converts the Grid into a collection of frames (bounding boxes) with the size and location coordinates of the objects detected in the image.
  • YoloBoundingBox.cs
    • This class represents the frame of a detected object.
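To give an idea of what such a parser has to do for each box, here is a minimal sketch of the usual YOLOv2 decoding step. It is not the code from YoloWinMLParser.cs. The `DecodedBox` and `YoloDecodeSketch` names are hypothetical, the anchor size is passed in by the caller rather than hard-coded, and the cell size of 32 pixels assumes a 416 x 416 input split into 13 x 13 cells.

```csharp
using System;

// Hypothetical result type, for illustration only.
public struct DecodedBox
{
    public float X, Y, Width, Height, Confidence;
}

// Hypothetical helper showing the standard YOLOv2 box decoding.
public static class YoloDecodeSketch
{
    // Assumption: 416-pixel input divided into 13 cells per side.
    public const int CellSize = 32;

    // Squashes a raw network value into the 0..1 range.
    public static float Sigmoid(float value)
    {
        return 1f / (1f + (float)Math.Exp(-value));
    }

    // tx, ty, tw, th, tc are the five raw values the network emits for one box.
    public static DecodedBox Decode(int column, int row,
        float tx, float ty, float tw, float th, float tc,
        float anchorWidth, float anchorHeight)
    {
        // The center is predicted relative to the cell that owns the box.
        float centerX = (column + Sigmoid(tx)) * CellSize;
        float centerY = (row + Sigmoid(ty)) * CellSize;

        // The size is predicted as a scale factor applied to an anchor box.
        float width = (float)Math.Exp(tw) * anchorWidth * CellSize;
        float height = (float)Math.Exp(th) * anchorHeight * CellSize;

        return new DecodedBox
        {
            X = centerX - width / 2f,   // top-left corner, in input pixels
            Y = centerY - height / 2f,
            Width = width,
            Height = height,
            Confidence = Sigmoid(tc)
        };
    }
}
```

A real parser would also turn the remaining class scores of each box into probabilities and pick the most likely label; that part is left out here to keep the sketch short.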

To show these frames, we add a Canvas on top of the control that shows the camera feed.


The following code completes the example, with these considerations:

  • There are a number of private variables to work with the model, the collection of frames, and the visual styles used to paint them.
  • The frames of people are painted in green, and all other objects in yellow.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Windows.Media;
    using Windows.UI.Core;
    using Windows.UI.Text;
    using Windows.UI.Xaml;
    using Windows.UI.Xaml.Controls;
    using Windows.UI.Xaml.Media;
    using Windows.UI.Xaml.Navigation;
    using UwpAppYolo01.Yolo9000;

    namespace UwpAppYolo01
    {
        public sealed partial class MainPage : Page
        {
            // Model, detected frames, and the parser that converts the model output.
            private TinyYoloV2Model _model;
            private IList<YoloBoundingBox> _boxes = new List<YoloBoundingBox>();
            private readonly YoloWinMlParser _parser = new YoloWinMlParser();

            // Visual styles used to paint the frames.
            private readonly SolidColorBrush _lineBrushYellow = new SolidColorBrush(Windows.UI.Colors.Yellow);
            private readonly SolidColorBrush _lineBrushGreen = new SolidColorBrush(Windows.UI.Colors.Green);
            private readonly SolidColorBrush _fillBrush = new SolidColorBrush(Windows.UI.Colors.Transparent);
            private readonly double _lineThickness = 2.0;

            public MainPage()
            {
                InitializeComponent();
            }

            protected override async void OnNavigatedTo(NavigationEventArgs e)
            {
                LoadYoloOnnxModel();
                await CameraPreview.StartAsync();
                CameraPreview.CameraHelper.FrameArrived += CameraHelper_FrameArrived;
            }

            private async void LoadYoloOnnxModel()
            {
                var file = await Windows.Storage.StorageFile.GetFileFromApplicationUriAsync(new Uri("ms-appx:///Tiny-YOLOv2.onnx"));
                _model = await TinyYoloV2Model.CreateTinyYoloV2Model(file);
            }

            private async void CameraHelper_FrameArrived(object sender, Microsoft.Toolkit.Uwp.Helpers.FrameEventArgs e)
            {
                if (e?.VideoFrame?.SoftwareBitmap == null) return;
                // The model loads asynchronously and may not be ready for the first frames.
                if (_model == null) return;

                await Dispatcher.RunAsync(CoreDispatcherPriority.Normal, async () =>
                {
                    var input = new TinyYoloV2ModelInput { Image = e.VideoFrame };
                    var output = await _model.EvaluateAsync(input);
                    _boxes = _parser.ParseOutputs(output.Grid.ToArray());
                    DrawOverlays(e.VideoFrame);
                });
            }

            private void DrawOverlays(VideoFrame inputImage)
            {
                // Remove the frames painted for the previous video frame.
                YoloCanvas.Children.Clear();
                if (_boxes.Count <= 0) return;

                var filteredBoxes = _parser.NonMaxSuppress(_boxes, 5, .5F);
                foreach (var box in filteredBoxes)
                    DrawYoloBoundingBox(box, YoloCanvas);
            }

            private void DrawYoloBoundingBox(YoloBoundingBox box, Canvas overlayCanvas)
            {
                // Clamp the frame so that it stays inside the canvas.
                var x = (uint)Math.Max(box.X, 0);
                var y = (uint)Math.Max(box.Y, 0);
                var w = (uint)Math.Min(overlayCanvas.ActualWidth - x, box.Width);
                var h = (uint)Math.Min(overlayCanvas.ActualHeight - y, box.Height);

                var rectStroke = box.Label == "person" ? _lineBrushGreen : _lineBrushYellow;

                var r = new Windows.UI.Xaml.Shapes.Rectangle
                {
                    Tag = box,
                    Width = w,
                    Height = h,
                    Fill = _fillBrush,
                    Stroke = rectStroke,
                    StrokeThickness = _lineThickness,
                    Margin = new Thickness(x, y, 0, 0)
                };

                var tb = new TextBlock
                {
                    Margin = new Thickness(x + 4, y + 4, 0, 0),
                    Text = $"{box.Label} ({Math.Round(box.Confidence, 4)})",
                    FontWeight = FontWeights.Bold,
                    Width = 126,
                    Height = 21,
                    HorizontalTextAlignment = TextAlignment.Center
                };

                var textBack = new Windows.UI.Xaml.Shapes.Rectangle
                {
                    Width = 134,
                    Height = 29,
                    Fill = rectStroke,
                    Margin = new Thickness(x, y, 0, 0)
                };

                // Label background first, then the label text, then the frame itself.
                overlayCanvas.Children.Add(textBack);
                overlayCanvas.Children.Add(tb);
                overlayCanvas.Children.Add(r);
            }
        }
    }
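The call to NonMaxSuppress in the code above deserves a short illustration, because the model usually returns several overlapping frames for the same object. The following is a minimal sketch of greedy non-maximum suppression, not the implementation inside the parser. It assumes that the two numeric arguments are the maximum number of frames to keep and an overlap (intersection over union) threshold; `DetectedBox` and `NmsSketch` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical box type, for illustration only.
public class DetectedBox
{
    public float X, Y, Width, Height, Confidence;
}

public static class NmsSketch
{
    // Intersection over union of two axis-aligned rectangles (0 = disjoint, 1 = identical).
    public static float IntersectionOverUnion(DetectedBox a, DetectedBox b)
    {
        float left = Math.Max(a.X, b.X);
        float top = Math.Max(a.Y, b.Y);
        float right = Math.Min(a.X + a.Width, b.X + b.Width);
        float bottom = Math.Min(a.Y + a.Height, b.Y + b.Height);

        float intersection = Math.Max(0f, right - left) * Math.Max(0f, bottom - top);
        float union = a.Width * a.Height + b.Width * b.Height - intersection;
        return union <= 0f ? 0f : intersection / union;
    }

    // Keeps the most confident boxes and drops any box that overlaps
    // an already kept box by more than the threshold.
    public static List<DetectedBox> NonMaxSuppress(IEnumerable<DetectedBox> boxes, int limit, float threshold)
    {
        var kept = new List<DetectedBox>();
        foreach (var candidate in boxes.OrderByDescending(b => b.Confidence))
        {
            if (kept.Count >= limit) break;
            if (kept.All(k => IntersectionOverUnion(k, candidate) <= threshold))
                kept.Add(candidate);
        }
        return kept;
    }
}
```

With a limit of 5 and a threshold of 0.5, two frames that cover almost the same area collapse into the more confident one, while a frame elsewhere in the image is kept.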

The only detail left to mention is that YoloV2 is designed to work with images of 416 x 416 pixels. In this case, you have to resize the webcam control and the Canvas to that size so that the frames are displayed in the correct position.

In the next post I will share the final example and also add some rescaling work to support resolutions other than 416 x 416.
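As a rough idea of what that rescaling could look like, here is a minimal sketch that maps a frame from the 416 x 416 model space to a canvas of any size. It assumes the camera image is stretched to fill the canvas (no letterboxing), and `BoxScalingSketch` and `ScaleToCanvas` are hypothetical names, not code from the final example.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class BoxScalingSketch
{
    // YoloV2 works with 416 x 416 images.
    public const double ModelSize = 416.0;

    // Maps a frame expressed in model coordinates to canvas coordinates.
    // Assumption: the image is stretched to fill the whole canvas.
    public static (double X, double Y, double Width, double Height) ScaleToCanvas(
        double x, double y, double width, double height,
        double canvasWidth, double canvasHeight)
    {
        double scaleX = canvasWidth / ModelSize;
        double scaleY = canvasHeight / ModelSize;
        return (x * scaleX, y * scaleY, width * scaleX, height * scaleY);
    }
}
```

For example, on a canvas of 832 x 416 the horizontal values double and the vertical values stay the same.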

Happy Coding!

Greetings @ Toronto

El Bruno



    1. HL will be tricky because you analyze a 2D camera photo and then you want to create a frame around an object in a 3D world. So, if you don’t move (I mean really don’t move), you can draw the frame with the coordinates of the camera. I’m guessing some math is needed to calculate size and distance. And I’m not sure how to anchor the frame in a 3D world.

      That sounds like an amazing challenge, because you can also use an ONNX model directly in a UWP app on HoloLens. Keep me informed!

