#CognitiveServices – Sample Console App to perform audio analysis using Custom Speech Service (#CRIS)

Hi !

Yesterday I wrote a post on how to create and publish an Acoustic Model in Custom Speech Service to perform a speech-to-text process (STT). The next step is to add some C# code in an App to use this service. For this sample I will use a sample wav file with a single sentence. When I try this file in the CRIS test console I get the following result:

Clipboard07

So, it’s working. Let’s create a Console App and add the NuGet package for our platform target.

Clipboard01

Important: By default the platform configuration is set to “Any CPU”. We need to change this to x86 or x64 so we can use the Speech NuGet package without any issues.

Clipboard02
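If you prefer to edit the project file instead of using the Configuration Manager, the same change can be sketched as the fragment below. This is only an illustration: it assumes the x64 flavor of the package is installed (use x86 otherwise), and in many project files the property lives inside a per-configuration PropertyGroup rather than a global one.

```xml
<!-- Illustrative fragment of the .csproj file; pick x86 or x64 to match the installed NuGet package. -->
<PropertyGroup>
  <PlatformTarget>x64</PlatformTarget>
</PropertyGroup>
```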

So, big surprise: this package is the same for CRIS and for Bing Speech recognition (thanks to Victor for this tip!). There is a sample WPF implementation in the GitHub repo which uses the Bing keys and architecture, but I’ll continue with my CRIS sample.

Let’s work on the sample Console App. There are 3 main sections here:

  • Initialize the STT client
  • Process the wav file
  • Get and process CRIS result

The next pieces of code are part of the sample App.

  • In the Main section I create and initialize the STT client using the information from my previous post
  • To process the wav file, we open it and send it to CRIS in small chunks
  • Then we need to subscribe to the client events
  • In these events we show some of the information received by the client in the Console App


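// Namespaces used by the snippets in this post. The last one is the Speech client
// library namespace; adjust it if your package version uses a different one.
using System;
using System.IO;
using System.Linq;
using Microsoft.CognitiveServices.SpeechRecognition;

private static DataRecognitionClient _dataClient;

static void Main()
{
    var mode = SpeechRecognitionMode.LongDictation;
    var language = "en-US";
    var authenticationUri = "https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken";
    // Subscription key and endpoint URL of the CRIS deployment from the previous post
    var crisSubscriptionKey = Config.CrisSubscriptionKey;
    var crisUri = Config.CrisUri;

    // The same key is passed as both primary and secondary key
    _dataClient = SpeechRecognitionServiceFactory.CreateDataClient(mode, language, crisSubscriptionKey, crisSubscriptionKey, crisUri);
    _dataClient.AuthenticationUri = authenticationUri;

    // Subscribe to the client events before sending any audio
    _dataClient.OnResponseReceived += OnDataDictationResponseReceivedHandler;
    _dataClient.OnConversationError += OnConversationErrorHandler;
    _dataClient.OnIntent += OnIntentHandler;

    // start process
    SendAudioHelper("sample01.wav");
    Console.WriteLine("Process started, wait for results …");
    Console.ReadLine();
}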


private static void OnIntentHandler(object sender, SpeechIntentEventArgs e)
{
    Console.WriteLine($"OnIntentHandler – Payload: {e.Payload}");
}

private static void OnConversationErrorHandler(object sender, SpeechErrorEventArgs e)
{
    // Print the error details instead of the event args type name
    Console.WriteLine($"Error {e.SpeechErrorCode}: {e.SpeechErrorText}");
}

private static void OnDataDictationResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
{
    // Nothing to show when the response has no recognized phrases
    if (!e.PhraseResponse.Results.Any()) return;
    foreach (var phraseResponseResult in e.PhraseResponse.Results)
    {
        Console.WriteLine($@"Response result
– Confidence: {phraseResponseResult.Confidence}
– Display Text: {phraseResponseResult.DisplayText}
– InverseTextNormalizationResult: {phraseResponseResult.InverseTextNormalizationResult}
– LexicalForm: {phraseResponseResult.LexicalForm}
– MaskedInverseTextNormalizationResult: {phraseResponseResult.MaskedInverseTextNormalizationResult}");
    }
}


private static void SendAudioHelper(string wavFileName)
{
    using (FileStream fileStream = new FileStream(wavFileName, FileMode.Open, FileAccess.Read))
    {
        byte[] buffer = new byte[1024];
        try
        {
            int bytesRead;
            // Send the file in 1 KB chunks; stop at end of file so no empty chunk is sent
            while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                _dataClient.SendAudio(buffer, bytesRead);
            }
        }
        finally
        {
            // Always tell the service that the audio stream is complete
            _dataClient.EndAudio();
        }
    }
}
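
The chunked read can also be pulled out into a small helper that knows nothing about the Speech client, which makes it easy to try against any stream. This is only a sketch: ChunkedSender and SendInChunks are hypothetical names that are not part of the sample App or the Speech package. The loop stops as soon as Read returns 0, so an empty chunk is never forwarded.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of the Speech client library.
static class ChunkedSender
{
    // Reads `source` in chunks of up to `chunkSize` bytes and passes each
    // non-empty chunk to `send`. Returns the total number of bytes forwarded.
    public static long SendInChunks(Stream source, int chunkSize, Action<byte[], int> send)
    {
        var buffer = new byte[chunkSize];
        long total = 0;
        int bytesRead;
        // Read returns 0 at end of stream, which ends the loop before sending
        while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            send(buffer, bytesRead);
            total += bytesRead;
        }
        return total;
    }
}
```

In the sample App it could be called as ChunkedSender.SendInChunks(fileStream, 1024, _dataClient.SendAudio), followed by _dataClient.EndAudio().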

We get the following result from the running app.

Clipboard04

You can download the source code from GitHub (link)

Greetings @ Toronto (-5!)

El Bruno
