El Bruno

#CognitiveServices – Tutorial to create and publish a complete model in Custom Speech Service (#CRIS)

Hi !

This is my 3rd or 4th time doing this, so I'd better write it down so I won’t forget next time. So, let’s start from the beginning, with the definition of Custom Speech Service (we used to know it as CRIS).

The Custom Speech Service lets you create custom speech-to-text models, tailored to your application’s environment, user population, and vocabulary.

So, I’ll bypass the deep technical details and share the steps needed to build and publish a model that performs speech-to-text conversion.

We need to start in a well-known place: the Azure portal, where we generate a key for CRIS. We select New, filter using Cognitive Services, and then select the option to add a new “Custom Speech Service”.

Important: this is where we need to copy and store the key values.

Now we can go to the CRIS homepage and start creating our model. The main menu, “Custom Speech”, lets us work with the components of the service.

Let’s start with the files that we will use as the basis for creating the acoustic model. Select the option “Adaptation data”. In this section we can create 3 types of data elements:
- Acoustic Datasets
- Language Datasets
- Pronunciation Datasets
The first 2 are the minimum needed for a functional model. Let’s start by uploading a couple of files into “Acoustic Datasets”. In this section we must define the name and description of the dataset and also upload 2 files:
- The transcription file is a plain text file, where we specify each audio file name (WAV) and the text spoken in it
- The audio files are uploaded as a zip file containing all the audio samples (WAV)
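
As a rough illustration of the two uploads described above, here is a small Python sketch that builds the transcription text and the zip archive in memory. The tab between the file name and the text is an assumption on my side (check the CRIS documentation for the exact layout), and the file names, phrases and empty audio payloads are hypothetical placeholders.

```python
import io
import zipfile


def build_transcription(entries):
    """Return the transcription text: one '<wav name><TAB><spoken text>' line per file.

    The tab separator is an assumption, not something stated in this post.
    """
    return "".join(f"{name}\t{text}\n" for name, text in entries)


def build_audio_zip(wav_files):
    """Return the bytes of a zip archive holding the given {name: wav_bytes} files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in wav_files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


# Hypothetical sample data; real WAV bytes would come from your own recordings.
entries = [
    ("audio01.wav", "turn on the lights"),
    ("audio02.wav", "turn off the lights"),
]
transcription = build_transcription(entries)
archive_bytes = build_audio_zip({name: b"" for name, _ in entries})
```

Keeping the names in the transcription identical to the names inside the zip is the important part, since that is how each audio file is matched with its text.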

In this step it is important to read and understand the supported formats for the audio files. All the specifications are detailed in the CRIS documentation.

Property	Value
File Format	RIFF (WAV)
Sampling Rate	8000 Hz or 16000 Hz
Channels	1 (mono)
Sample Format	PCM, 16 bit integers
File Duration	0.1 seconds < duration < 60 seconds
Silence Collar	> 0.1 seconds
Archive Format	Zip
Maximum Archive Size	2 GB
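
Before zipping the audio, it can save an upload round trip to check each file locally. The sketch below uses Python's standard `wave` module to test the sampling rate, channel count, sample width and duration from the table above. It does not check the silence collar or the archive size, and the helper names (`check_wav`, `make_silent_wav`) are my own, not part of any CRIS tooling.

```python
import io
import wave

ALLOWED_RATES = (8000, 16000)


def check_wav(source):
    """Return a list of problems found; an empty list means all checks passed.

    `source` is a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() not in ALLOWED_RATES:
            problems.append(f"sampling rate {wav.getframerate()} Hz")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels (expected mono)")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples (expected 16-bit)")
        duration = wav.getnframes() / wav.getframerate()
        if not 0.1 < duration < 60:
            problems.append(f"duration {duration:.2f} s")
    return problems


def make_silent_wav(rate, channels, seconds):
    """Build an in-memory 16-bit PCM WAV of silence, only to try the checker."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buffer.seek(0)
    return buffer


print(check_wav(make_silent_wav(16000, 1, 1.0)))
print(check_wav(make_silent_wav(44100, 2, 1.0)))
```

The `wave` module only reads uncompressed PCM data, so a file in another encoding will raise `wave.Error` instead of returning a list of problems.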

The next step is to upload, as a language dataset, a text file with samples of the phrases we expect the model to recognize. This is similar to the utterances we define for an intent in LUIS. It is a plain text file, and it must have one expected phrase per line.
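
A minimal sketch of preparing that text file in Python, assuming only what is described above (plain text, one phrase per line). The phrases are hypothetical, and the whitespace clean-up is my own choice, not a documented requirement.

```python
def build_language_data(phrases):
    """Return text with one phrase per line, skipping empty entries."""
    cleaned = (" ".join(phrase.split()) for phrase in phrases)  # collapse stray whitespace
    return "".join(line + "\n" for line in cleaned if line)


# Hypothetical phrases for a home automation scenario.
language_data = build_language_data([
    "turn on  the lights",
    "",
    " turn off the lights\n",
])
```

The result can then be saved with `open("language.txt", "w", encoding="utf-8")` before uploading it.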

And now that we have the basic files, we can create an Acoustic Model. For that we select the option “Acoustic models” from the top menu and create a new one using the elements that we uploaded previously.

Important: Depending on the size of the files that we upload as initial data, the complete process may take a couple of minutes.

Once the acoustic model has been created from the base files, you can see some of its details. In my case, the accuracy obtained with the audio and the model is about 88% (an error rate of 11.68% was detected).

In the “Accuracy Tests” section we can see, for each audio file, the recognized text next to the expected text.
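
To get a feel for how such an error figure can come out of a recognized/expected pair, here is a small Python sketch of a word error rate: the word-level edit distance divided by the number of expected words. I am assuming the percentage reported by the service is this kind of metric; the exact computation CRIS uses is not covered in this post.

```python
def word_error_rate(expected, recognized):
    """Word-level edit distance between the two texts, divided by the expected word count."""
    ref = expected.lower().split()
    hyp = recognized.lower().split()
    if not ref:
        raise ValueError("expected text must contain at least one word")
    # previous[j] holds the edit distance between the ref words seen so far and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1] / len(ref)


# Hypothetical pair: one wrong word out of four.
print(word_error_rate("turn on the lights", "turn on the light"))  # 0.25
```

With this definition, accuracy is simply 1 minus the error rate, which matches the way 11.68% of errors corresponds to roughly 88% accuracy.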

And now it’s time to deploy the model so we can use it in production. We can create a deployment in the “Deployments” menu option. Once it is created, we can see the URLs needed to use it from an App (I will write about this in another post). We also have the option to upload an audio file to test our model. Yes, with the specific format and all the picky requirements that CRIS demands!

In the next post, a bit of C# code to show how we can use this service from an application.

Happy Coding ! 🙂

Greetings @ Toronto (-6!)

El Bruno

25 May 2017
