During August, I’ll be participating in and supporting a couple of hackathons and work events (check my next events section!).
I’m happy to share that on September 5th I’ll be part of the Global AI Night in Toronto.
The Global AI Night is a free evening event organized by 88 communities all over the world that are passionate about Artificial Intelligence on Microsoft Azure. During this AI Night you will get inspired through sessions and get your hands dirty during the workshops. By the end of the night you will be able to infuse AI into your applications.
During the past months, I’ve been playing around with several Image Analysis tools. And ImageAI (see references) is one that deserves a full series of posts. Please take a look at the product and the source code on GitHub, and also please thank the person behind it: Moses Olafenwa (@OlafenwaMoses).
And now, my 2 cents. I’ve started to test ImageAI to create my own image detection models. Most of the time this is a hard path to walk, however ImageAI showed me an interesting alternative:
… with the latest release of ImageAI v2.1.0, support for training your custom YOLOv3 models to detect literally any kind and number of objects is now fully supported, …
This means that I can pick up my own image dataset, train a custom YOLOv3 model on top of it, and use it as a trained model. Again, this is amazing.
So, I started to read the article [Train Object Detection AI with 6 lines of code, see references] where Olafenwa explains how to do this using a data set with almost 500 images of Hololens and Oculus Rift devices.
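As a quick reference, the training part of the article boils down to something like the following sketch. The dataset folder name, class list and training parameters below are placeholders of mine; the original article has the exact values.

```python
# Sketch of training a custom YOLOv3 model with ImageAI 2.1.0 (placeholder values).
# Assumes a Pascal VOC style dataset folder and a pre-trained YOLOv3 file to
# transfer-learn from, as described in the original article.
from imageai.Detection.Custom import DetectionModelTrainer

trainer = DetectionModelTrainer()
trainer.setModelTypeAsYOLOv3()
trainer.setDataDirectory(data_directory="hololens")
trainer.setTrainConfig(
    object_names_array=["hololens"],                      # class names in the dataset
    batch_size=4,
    num_experiments=100,                                  # number of training epochs
    train_from_pretrained_model="pretrained-yolov3.h5",   # transfer learning starting point
)
trainer.trainModel()
```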
The code is very simple and easy to read. There are also examples on how to analyze a single file, a video, or even a camera feed. The output of the analysis can also be a new file, a processed video, or even a full log file with the detection details.
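For the single-file scenario, a minimal sketch with the trained custom model could look like this (the model and config paths are placeholders; training writes them under the dataset folder):

```python
# Sketch: run the trained custom model against a single image (placeholder paths).
from imageai.Detection.Custom import CustomObjectDetection

detector = CustomObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath("hololens/models/detection_model-ex-060--loss-0002.h5")
detector.setJsonPath("hololens/json/detection_config.json")
detector.loadModel()

detections = detector.detectObjectsFromImage(
    input_image="holo1.jpg", output_image_path="holo1-detected.jpg")
for detection in detections:
    print(detection["name"], detection["percentage_probability"], detection["box_points"])
```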
So I took some time to read the code samples and realized that I was missing a scenario: take a real-time feed from a webcam, analyze each frame, and, if a device is found, draw a frame on the live feed to display it.
I used OpenCV to access my camera, and it took me some time to figure out how to convert my OpenCV2 camera frame to the format needed by ImageAI. In the end, thanks to the GitHub code, I managed to create this (very slow but working) demo.
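The core of the demo is just OpenCV for the camera plus the custom detector applied to every frame. Here is a simplified sketch of the idea; I’m assuming the custom detector accepts and returns numpy arrays via input_type/output_type="array" (as the standard detector does), I’m reusing the placeholder model paths from above, and depending on the ImageAI version you may also need a BGR to RGB conversion:

```python
# Sketch: analyze a live webcam feed frame by frame with the custom detector.
# Assumes detectObjectsFromImage accepts/returns numpy arrays (input_type/output_type="array").
import cv2
from imageai.Detection.Custom import CustomObjectDetection

detector = CustomObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath("hololens/models/detection_model-ex-060--loss-0002.h5")  # placeholder
detector.setJsonPath("hololens/json/detection_config.json")                    # placeholder
detector.loadModel()

camera = cv2.VideoCapture(0)
while True:
    ok, frame = camera.read()
    if not ok:
        break
    # If colors look wrong, convert the OpenCV BGR frame to RGB first:
    # frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    annotated, detections = detector.detectObjectsFromImage(
        input_image=frame, input_type="array", output_type="array",
        minimum_percentage_probability=40)
    cv2.imshow("ImageAI webcam demo", annotated)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

camera.release()
cv2.destroyAllWindows()
```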
As usual in these scenarios, now it’s time to improve the performance and start testing some tweaks to get a decent app up and running.
I usually use the live subtitle demo feature in PowerPoint to showcase how amazing the current state of AI is, and how we can use it in our daily lives. And now, after the official announcement from Microsoft, we can also use the live subtitle feature in Microsoft Teams.
As you can expect, the way to use it is very easy: just enable the Live Subtitles feature and Microsoft Teams will automatically start to
- Listen to every audio conversation
- Convert the audio to text
- Present the text live as subtitles in the MS Teams window
In the official announcement there is a nice animation showing this.
We may also expect some extra features, like language translation and more. That would also be so cool!
This one is, one more time, related to some amazing Artificial Intelligence features embedded in Microsoft Office. And it is very helpful if you work in an organization with tons of acronyms. I’m sure you have your own set of acronyms at different levels: team, group, and organization.
If you are new to these acronyms, it is very hard to get up to date with all of them. That’s why the Acronyms feature in Word is very important: it may help us and save us a lot of time. The way to open the Acronyms pane is the [References] tab in the Ribbon, or you can just search for it.
Once you enable the pane, it will analyze the text of your Word document and also the definitions most used in your organization to get a sense of what could be an acronym. It leverages the Microsoft Graph to surface definitions of terms that have been previously defined across emails and documents.
This is another amazing example of AI in our day-to-day use.
I’ve written a couple of times about Project Malmo and Minecraft, so if you like Minecraft and Artificial Intelligence, MineRL will make your day. Let’s start with some basics:
MineRL is a large-scale dataset on Minecraft of seven different tasks, which highlight a variety of research challenges including open-world multi-agent interactions, long-term planning, vision, control, navigation, and explicit and implicit subtask hierarchies.
There are 2 main ways to get involved with MineRL: entering the AI (DL) competition, or playing Minecraft (to create more source data to train and test models!).
On the playing side, MineRL wants to solve Minecraft using state-of-the-art Machine Learning! To do so, MineRL is creating one of the largest datasets of recorded human player data. The dataset includes a set of tasks which highlight many of the hardest problems in modern-day Reinforcement Learning: sparse rewards and hierarchical policies.
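To make the task side a bit more concrete, interacting with one of the MineRL environments follows the usual gym pattern. A quick sketch, based on the environment and observation names published by the project (details may vary between releases):

```python
# Sketch: a hard-coded agent in a MineRL navigation task.
# Requires the minerl package (which also needs Java/Minecraft to run the simulator).
import gym
import minerl  # importing minerl registers the MineRL environments with gym

env = gym.make("MineRLNavigateDense-v0")
obs = env.reset()

done = False
while not done:
    action = env.action_space.noop()                     # start from a "do nothing" action dict
    action["forward"] = 1                                 # always walk forward
    action["jump"] = 1                                     # jump over obstacles
    action["camera"] = [0, 0.03 * obs["compassAngle"]]     # steer towards the compass target
    obs, reward, done, info = env.step(action)

env.close()
```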
There is plenty of information and details on the main website, and as soon as I finish some of my current work and personal projects, I’ll for sure spend more time here!
A couple of days ago, Google presented Translatotron. The name is maybe not the best one, however the idea is amazing:
Google researchers trained a neural network to map audio “voiceprints” from one language to another. After the tool translates an original audio, Translatotron retains the voice and tone of the original speaker. It converts audio input directly to audio output without any intermediary steps.
As usual, the best way to understand this is to see Translatotron in action. Let’s take a look at the following audio samples.
This is an amazing technology, and also a great starting point for scenarios where it’s important to keep the original speaker’s vocal characteristics. And let me be honest, it’s also scary if you think about fake voice scenarios.
I’m very lucky to be at the next Chicago CodeCamp with another session around Custom Vision:
How a PoC at home can scale to Enterprise Level using Custom Vision APIs
It all started with a DIY project to use Computer Vision for security cameras at home. A custom Machine Learning model is the core component used to analyze pictures to detect people, animals and more in a house environment. The AI processing is performed at the edge, in dedicated hardware and the collected information is stored in the cloud. The same idea can be applied to several CCTV scenarios, like parking lots, train stations, malls and more. However, moving this into enterprise scale brings a set of challenges, which are going to be described and explained in this session.
Today’s announcement is a big one if you are interested in moving AI capabilities to the Edge. The Windows team made public the preview of the Windows Vision Skills framework:
Vision Skills framework is meant to standardize the way AI and CV is put to use
within a WinRT application running on the edge. It aims to abstract away the
complexity of AI and CV techniques by simply defining the concept of skills
which are modular pieces of code that process input and produce output. The
implementation that contains the complex details is encapsulated by an
extensible WinRT API that inherits the base class present in this namespace, which
leverages built-in Windows primitives which in-turn eases interop with built-in
acceleration frameworks or external 3rd party ones.
The official blog explains the basic features of the framework and describes a set of scenarios like Object Detector, Skeletal Detector, and Emotion Recognizer.
There are UWP apps in the repo samples, and it only took 1 minute to set everything up and get the app up and running. In the following image, it smoothly detects a person and another object. The next image is the sample for the Skeletal Detector (as an old Kinect dev, this really makes me happy!).
This is a big announcement, because all of these APIs are native, and that means we can easily use them in our Windows applications.