#CognitiveServices – How to create audio files for Custom Speech Service (#CRIS)


A few days ago I was asked about an easy way to create audio files to use as datasets in Custom Speech Service (CRIS). As I mentioned in a previous post, the audio files must meet specific format requirements, so it is important to create them correctly.

Note: the files must be WAV files, mono, and a couple of other details make it hard to create them in a single step.
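To make the note above concrete, here is a minimal Python sketch that checks a WAV file before uploading it. The post only names WAV and mono; the other values checked here (16-bit PCM samples and a sample rate of 8000 or 16000 Hz) are assumptions based on the commonly documented Custom Speech audio requirements, so verify them against the current service documentation. The function name `check_wav` is hypothetical.

```python
import wave

# Assumed requirements (only "WAV" and "mono" come from the post itself):
# 1 channel, 16-bit samples, 8 kHz or 16 kHz sample rate.
ALLOWED_SAMPLE_RATES = (8000, 16000)


def check_wav(path):
    """Return a list of problems found; an empty list means the file passed."""
    try:
        with wave.open(path, "rb") as wav:
            channels = wav.getnchannels()
            sample_width = wav.getsampwidth()  # bytes per sample
            sample_rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # The wave module only reads uncompressed PCM WAV files.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        problems.append(f"unexpected sample rate: {sample_rate} Hz")
    return problems
```

Calling `check_wav("Recording.wav")` returns an empty list when the file matches the assumed format, and a list of readable messages otherwise.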

Although there are several ways to create these files, this is the one I use and it works.

  • To record the audio I use an app that ships with Windows: Voice Recorder


  • I guess I don’t need to explain how the app works: just press the microphone button. Don’t expect many options in the Settings section either.


  • Once we have recorded a session, we can access the list of recordings. If we open the file location of a recording, we will see that it was saved as “Recording.m4a”.


  • Now it is time to convert the M4A file to WAV. For this I use VLC. The software is well known, so I will not write a lot about it. In VLC, select the option “Media // Convert / Save …”


  • Select a file and press the option “Convert”


  • In this step we must create a profile with the settings needed to produce CRIS-compatible files.
  • I created a profile called “WAV Cris 02” with the following configurations
  • Encapsulation: WAV


  • Audio codec with the values required by CRIS


  • Now we can use this profile to convert our M4A file to WAV


  • Ready! We have a WAV file which is compatible with CRIS requirements and we can use the file for our data models.
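For anyone who prefers to script the conversion instead of clicking through VLC, the same idea can be sketched in Python. Note that this swaps in a different tool: it calls the `ffmpeg` command-line program, not VLC, and it assumes `ffmpeg` is installed and on the PATH. The target format used here (mono, 16 kHz, 16-bit PCM) is an assumption about what CRIS expects, not something taken from the VLC profile above, and the helper names are hypothetical.

```python
import shutil
import subprocess


def build_ffmpeg_command(source, target, sample_rate=16000):
    """Build the ffmpeg argument list for a mono, 16-bit PCM WAV conversion."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the target file if it exists
        "-i", source,             # input file, e.g. the M4A from Voice Recorder
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate in Hz (assumed value)
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM samples
        target,                   # the .wav extension selects the WAV container
    ]


def convert_to_wav(source, target, sample_rate=16000):
    """Run ffmpeg; raises if ffmpeg is missing or the conversion fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(source, target, sample_rate), check=True)
```

A call such as `convert_to_wav("Recording.m4a", "Recording.wav")` would then produce the WAV file in one step. Keeping the command builder separate from the runner makes it easy to inspect the exact arguments before executing anything.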

Happy coding! 😀

Greetings @ Burlington

El Bruno



  1. Hello Bruno, thank you for the descriptive and excellent example on using CRIS. Is there any way to get an audio stream using a custom app and send it to CRIS, though? I know there is a Speech SDK, but it looks too low level, or maybe this is just a misunderstanding on my part. Cortana does not have hooks to catch audio, so what do you think is the best way to capture audio from the user and use CRIS for TTS?

    Thank you,


    1. Hi Vlad,
      Thanks! I’m guessing you are looking for a simple way to record audio to a file in a UWP app. I’ll share an example later this week.

