Hi!
Yesterday I wrote a post on how to create and publish an Acoustic Model in Custom Speech Service to perform a speech-to-text (STT) process. The next step is to add some C# code to an app to use this service. For this sample I will use a wav file with a single sentence. When I try this file in the CRIS test console I get the following result:
So, it’s working. Let’s create a Console App and add the NuGet package for our platform target.
Important: By default the platform configuration is set to “Any CPU”. We need to change this to x86 or x64 so we can use the Speech NuGet package without any issues.
So, big surprise: this package is the same for CRIS and for Bing Speech recognition (thanks to Victor for this tip!). There is a sample WPF implementation in the GitHub repo which uses the Bing keys and architecture; I’ll continue with my CRIS sample.
Let’s work in the sample Console App. There are 3 main sections here:
- Initialize the STT client
- Process the wav file
- Get and process CRIS result
The next pieces of code are part of the sample app.
- In the Main section I create and init the STT client using the information from my previous post
- To process the wav file, we open it and send it to CRIS in small chunks
- Then we need to subscribe to the client events
- In these events we show some of the information received by the client in the Console App
    private static DataRecognitionClient _dataClient;

    static void Main()
    {
        var mode = SpeechRecognitionMode.LongDictation;
        var language = "en-US";
        var authenticationUri = "https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken";
        var crisSubscriptionKey = Config.CrisSubscriptionKey;
        var crisUri = Config.CrisUri;

        _dataClient = SpeechRecognitionServiceFactory.CreateDataClient(mode, language, crisSubscriptionKey, crisSubscriptionKey, crisUri);
        _dataClient.AuthenticationUri = authenticationUri;
        _dataClient.OnResponseReceived += OnDataDictationResponseReceivedHandler;
        _dataClient.OnConversationError += OnConversationErrorHandler;
        _dataClient.OnIntent += OnIntentHandler;

        // start process
        SendAudioHelper("sample01.wav");
        Console.WriteLine("Process started, wait for results ...");
        Console.ReadLine();
    }
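Main keeps the process alive with Console.ReadLine() while the handlers run on other threads. If you would rather let the app exit once a final result arrives, one option is to have the handler signal a TaskCompletionSource and wait on it with a timeout. This is only a sketch of that idea: FakeRecognizer, OnFinalResult and Start are hypothetical stand-ins so the snippet runs on its own, they are not part of the Speech package.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a client that reports its result on a background thread.
// It is NOT part of the Speech NuGet package; it only makes this sketch runnable.
public sealed class FakeRecognizer
{
    public event EventHandler<string> OnFinalResult;

    public void Start(string text)
    {
        Task.Run(() =>
        {
            Thread.Sleep(100); // simulate service latency
            OnFinalResult?.Invoke(this, text);
        });
    }
}

public static class Program
{
    public static void Main()
    {
        var done = new TaskCompletionSource<string>();
        var recognizer = new FakeRecognizer();

        // Subscribe before starting, so the result cannot be missed.
        recognizer.OnFinalResult += (sender, text) => done.TrySetResult(text);
        recognizer.Start("hello world");

        // Block for at most 5 seconds instead of waiting for a key press.
        if (done.Task.Wait(TimeSpan.FromSeconds(5)))
            Console.WriteLine($"Final result: {done.Task.Result}");
        else
            Console.WriteLine("Timed out waiting for a result.");
    }
}
```

In the real app the same TrySetResult call would go inside the response handler, and Console.ReadLine() would be replaced by the timed wait.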
    private static void OnIntentHandler(object sender, SpeechIntentEventArgs e)
    {
        Console.WriteLine($"OnIntentHandler - Payload: {e.Payload}");
    }

    private static void OnConversationErrorHandler(object sender, SpeechErrorEventArgs e)
    {
        // Print the error code and text instead of the event args type name
        Console.WriteLine($"Exception: {e.SpeechErrorCode} - {e.SpeechErrorText}");
    }

    private static void OnDataDictationResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
    {
        // Any() requires "using System.Linq;"
        if (!e.PhraseResponse.Results.Any()) return;

        foreach (var phraseResponseResult in e.PhraseResponse.Results)
        {
            Console.WriteLine($@"Response result
    - Confidence: {phraseResponseResult.Confidence}
    - Display Text: {phraseResponseResult.DisplayText}
    - InverseTextNormalizationResult: {phraseResponseResult.InverseTextNormalizationResult}
    - LexicalForm: {phraseResponseResult.LexicalForm}
    - MaskedInverseTextNormalizationResult: {phraseResponseResult.MaskedInverseTextNormalizationResult}");
        }
    }
    private static void SendAudioHelper(string wavFileName)
    {
        using (FileStream fileStream = new FileStream(wavFileName, FileMode.Open, FileAccess.Read))
        {
            byte[] buffer = new byte[1024];
            try
            {
                int bytesRead;
                do
                {
                    bytesRead = fileStream.Read(buffer, 0, buffer.Length);
                    _dataClient.SendAudio(buffer, bytesRead);
                }
                while (bytesRead > 0);
            }
            finally
            {
                _dataClient.EndAudio();
            }
        }
    }
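The chunked send in SendAudioHelper can also be sketched without the SDK, which makes the loop easy to try in isolation. Note that the do/while above calls SendAudio one last time with zero bytes at the end of the file; the variant below uses a while loop so only non-empty chunks are sent. ChunkedSender, SendInChunks and the two callbacks are names I made up for this sketch, and a MemoryStream with dummy bytes stands in for the wav file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ChunkedSender
{
    // Reads the stream in chunks of at most chunkSize bytes and passes each
    // non-empty chunk to sendChunk. endOfStream is called exactly once,
    // even if reading or sending throws. Returns the number of chunks sent.
    public static int SendInChunks(Stream source, int chunkSize,
        Action<byte[], int> sendChunk, Action endOfStream)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var buffer = new byte[chunkSize];
        var chunks = 0;
        try
        {
            int bytesRead;
            while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                sendChunk(buffer, bytesRead);
                chunks++;
            }
        }
        finally
        {
            endOfStream();
        }
        return chunks;
    }
}

public static class Program
{
    public static void Main()
    {
        // 2500 dummy bytes stand in for the wav file.
        using (var source = new MemoryStream(new byte[2500]))
        {
            var sizes = new List<int>();
            var ended = false;

            var chunks = ChunkedSender.SendInChunks(
                source, 1024, (buffer, count) => sizes.Add(count), () => ended = true);

            // prints "3 chunks: 1024, 1024, 452; ended: True"
            Console.WriteLine($"{chunks} chunks: {string.Join(", ", sizes)}; ended: {ended}");
        }
    }
}
```

With the client from this post, the two callbacks would be _dataClient.SendAudio and _dataClient.EndAudio, since their signatures match the ones used in SendAudioHelper.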
We get the following result from the running app.
You can download the source code from GitHub (link).
Greetings @ Toronto (-5!)
El Bruno