Hi!
Some people have asked me a couple of questions about the What’s There App, so I’m writing this post to explain the basic features of the app.
I explained it to my kid with a simple schema of the main steps of the app:
- The phone will take a photo with the camera
- The photo will be sent to the Vision API for processing
- The Vision API will return information about the stuff discovered in the photo
- The phone will speak up with the description
And after this my kid got the idea. So let’s review each one of the steps and the required code in a Windows Universal App.
Let’s start with the prerequisites. In the package manifest we need to enable the following capabilities: Internet, Microphone and WebCam.
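In the manifest XML, these capabilities usually look like the fragment below. This is only a sketch using the standard UWP capability names; I am assuming that "Internet" means the client-only `internetClient` capability, and the rest of the manifest is omitted.

```xml
<!-- Fragment of Package.appxmanifest; Capability entries must come before DeviceCapability entries -->
<Capabilities>
  <!-- Outbound internet access, assumed to be enough to call the online Vision service -->
  <Capability Name="internetClient" />
  <!-- Device access for audio and for the camera -->
  <DeviceCapability Name="microphone" />
  <DeviceCapability Name="webcam" />
</Capabilities>
```

The same three items can also be ticked in the Capabilities tab of the manifest designer in Visual Studio.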
To take the picture, I’ve reused this code which takes a picture, saves the picture and returns the file. It also supports a “silent mode”, which means the picture is taken without user interaction, and a “normal mode”, which displays the camera UI to take a photo.
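To give an idea of how the two modes can work, here is a minimal sketch. It is not the reused code mentioned above: `PhotoCapture`, `TakePhotoAsync` and the `photo.jpg` file name are hypothetical names I picked for illustration. The sketch assumes `CameraCaptureUI` for normal mode and `MediaCapture` for silent mode.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.Capture;
using Windows.Media.MediaProperties;
using Windows.Storage;

// Hypothetical helper for illustration, not the code linked in the post.
public static class PhotoCapture
{
    public static async Task<StorageFile> TakePhotoAsync(bool silentMode)
    {
        if (!silentMode)
        {
            // Normal mode: show the built-in camera UI.
            // CaptureFileAsync returns null when the user cancels.
            var cameraUi = new CameraCaptureUI();
            cameraUi.PhotoSettings.Format = CameraCaptureUIPhotoFormat.Jpeg;
            return await cameraUi.CaptureFileAsync(CameraCaptureUIMode.Photo);
        }

        // Silent mode: capture directly from the default camera, with no UI.
        using (var capture = new MediaCapture())
        {
            await capture.InitializeAsync();

            // Save into the app's local folder; the file name is an assumption.
            StorageFile file = await ApplicationData.Current.LocalFolder.CreateFileAsync(
                "photo.jpg", CreationCollisionOption.GenerateUniqueName);

            await capture.CapturePhotoToStorageFileAsync(
                ImageEncodingProperties.CreateJpeg(), file);
            return file;
        }
    }
}
```

Callers should check for a null result in normal mode. On some devices the silent path may need a running preview before the capture succeeds, so treat this as a starting point.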
The next step is to analyze the photo using the Vision API. I added the Microsoft.ProjectOxford.Vision NuGet package to the project.
Note: You need a couple of keys to use the online service; take a look at this post.
Now we process the photo using the Vision API. The following lines are a good sample of how to do this. An important note here is that the Captions collection will have all the information you need. Each caption is a complete sentence.
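A minimal sketch of such a call could look like this. It is my reconstruction, not necessarily the app's exact code: `PhotoAnalyzer` and `DescribeAsync` are hypothetical names, and I am assuming the `VisionServiceClient.AnalyzeImageAsync` overload that takes a stream and a list of `VisualFeature` values, which may differ between versions of the package.

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;
using Windows.Storage;

// Hypothetical helper for illustration.
public static class PhotoAnalyzer
{
    public static async Task<string> DescribeAsync(StorageFile photo, string subscriptionKey)
    {
        var client = new VisionServiceClient(subscriptionKey);

        // Ask for the description (captions) and the tags.
        var features = new[] { VisualFeature.Description, VisualFeature.Tags };

        using (Stream stream = await photo.OpenStreamForReadAsync())
        {
            AnalysisResult result = await client.AnalyzeImageAsync(stream, features);

            // Each caption is a full sentence; keep the most confident one.
            Caption best = result.Description?.Captions
                ?.OrderByDescending(c => c.Confidence)
                .FirstOrDefault();

            // Null when the service returned no caption.
            return best?.Text;
        }
    }
}
```

The returned sentence can be passed directly to the speech step, as long as the null case is handled first.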
If you want to create your own description phrases, you can use the Tags collection for this.
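As an example of building your own phrase from tags, here is a small sketch. `DescriptionBuilder`, `FromTags` and the "I see ..." wording are hypothetical choices of mine; the only assumption about the Vision API is that the tags are available as plain strings.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration.
public static class DescriptionBuilder
{
    // Builds a short sentence from the first few tags.
    public static string FromTags(IEnumerable<string> tags, int maxTags = 3)
    {
        var top = (tags ?? Enumerable.Empty<string>())
            .Where(t => !string.IsNullOrWhiteSpace(t))
            .Take(maxTags)
            .ToList();

        if (top.Count == 0) return "I am not sure what I see.";
        if (top.Count == 1) return "I see " + top[0] + ".";

        // Join all but the last tag with commas, then add "and <last>".
        return "I see " + string.Join(", ", top.Take(top.Count - 1))
             + " and " + top[top.Count - 1] + ".";
    }
}
```

For example, `DescriptionBuilder.FromTags(new[] { "dog", "grass", "ball" })` returns "I see dog, grass and ball.".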
And finally, to perform the speech, I use the SpeechSynthesizer class. With it I get an audio stream from the text, and then I play that audio stream using a MediaElement.
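A minimal sketch of this last step, assuming a `MediaElement` already declared in the page's XAML. `Speaker` and `SpeakAsync` are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml.Controls;

// Hypothetical helper for illustration.
public static class Speaker
{
    // Call from the UI thread, because MediaElement is a XAML control.
    public static async Task SpeakAsync(MediaElement player, string text)
    {
        var synthesizer = new SpeechSynthesizer();

        // Convert the text into an audio stream.
        SpeechSynthesisStream stream = await synthesizer.SynthesizeTextToStreamAsync(text);

        // Hand the stream to the MediaElement and start playback.
        player.SetSource(stream, stream.ContentType);
        player.Play();
    }
}
```

The method returns once playback has started, not when the speech has finished.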
And that’s it! 4 easy steps to create a fun sample app.
Greetings @ Toronto
-El Bruno
References
- Channel9, What’s There? App video
- El Bruno, What’s There? App post
- GitHub, What’s There? App source code