#CognitiveServices – Sample Console App to analyze audio files with Custom Speech Service (#CRIS)

Hi!

Yesterday I published a step-by-step guide on how to create a speech-to-text recognition model with Custom Speech Service. The next step is a code sample that shows how to use that model. For this sample I use a wav file that contains a single short paragraph. From the CRIS test console I can see that the model works fine with it.

Clipboard07

The next step is to create a Console App and add the NuGet package that matches our target architecture.

Clipboard01

Important: the app's platform target must be changed to x86 or x64 in order to use the package without problems.

Clipboard02

The package is the general speech recognition one that uses Bing (thanks to Victor for the tip!). If you want to see a WPF implementation, there is one in the package's GitHub repo.

Back to the Console App, the next step is to shape the app itself. It is divided into 3 main parts:

Initialization of the STT (speech-to-text) client
Processing of the wav file
Processing of the result

The following code is the sample app. In it we can see that:

Main initializes the STT client with the CRIS information we created in the previous post
We subscribe to the processing events
In the handlers for those events we write the information to the console
We open a stream from the wav file and send it in chunks for CRIS to process

	// Needed at the top of the file:
	// using System;
	// using Microsoft.CognitiveServices.SpeechRecognition; // SDK namespace (may vary with the package version)

	private static DataRecognitionClient _dataClient;

	static void Main()
	{
	    var mode = SpeechRecognitionMode.LongDictation;
	    var language = "en-US";
	    var authenticationUri = "https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken";
	    var crisSubscriptionKey = Config.CrisSubscriptionKey;
	    var crisUri = Config.CrisUri;

	    // The same CRIS key is passed as both primary and secondary key
	    _dataClient = SpeechRecognitionServiceFactory.CreateDataClient(mode, language, crisSubscriptionKey, crisSubscriptionKey, crisUri);
	    _dataClient.AuthenticationUri = authenticationUri;

	    // Subscribe to the processing events
	    _dataClient.OnResponseReceived += OnDataDictationResponseReceivedHandler;
	    _dataClient.OnConversationError += OnConversationErrorHandler;
	    _dataClient.OnIntent += OnIntentHandler;

	    // start process
	    SendAudioHelper("sample01.wav");
	    Console.WriteLine("Process started, wait for results ...");

	    // Keep the app alive while the results arrive through the events
	    Console.ReadLine();
	}
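
The Config class that Main reads the subscription key and the CRIS endpoint from is not shown in this post. As a minimal sketch, assuming the values are stored in environment variables, it could look like the code below. The variable names CRIS_SUBSCRIPTION_KEY and CRIS_URI and the fail-fast behavior are hypothetical choices for illustration, not necessarily what the downloadable sample does.

```csharp
using System;

// Hypothetical helper: one possible shape for the Config class used in Main.
public static class Config
{
    // Assumed environment variable names, chosen only for this sketch
    public static string CrisSubscriptionKey => Read("CRIS_SUBSCRIPTION_KEY");
    public static string CrisUri => Read("CRIS_URI");

    // Reads a setting and fails early with a clear message if it is missing
    private static string Read(string name)
    {
        var value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException($"Environment variable '{name}' is not set.");
        }
        return value;
    }
}
```

Keeping the key outside the source code also makes it harder to publish it to GitHub by accident together with the sample.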

	// Needs "using System.Linq;" at the top of the file for Any()

	private static void OnIntentHandler(object sender, SpeechIntentEventArgs e)
	{
	    Console.WriteLine($"OnIntentHandler - Payload: {e.Payload}");
	}

	private static void OnConversationErrorHandler(object sender, SpeechErrorEventArgs e)
	{
	    // Print the error details instead of the event args type name
	    Console.WriteLine($"Error code: {e.SpeechErrorCode} - Error text: {e.SpeechErrorText}");
	}

	private static void OnDataDictationResponseReceivedHandler(object sender, SpeechResponseEventArgs e)
	{
	    // Nothing to show when the response has no recognized phrases
	    if (!e.PhraseResponse.Results.Any()) return;
	    foreach (var phraseResponseResult in e.PhraseResponse.Results)
	    {
	        Console.WriteLine($@"Response result
	- Confidence: {phraseResponseResult.Confidence}
	- Display Text: {phraseResponseResult.DisplayText}
	- InverseTextNormalizationResult: {phraseResponseResult.InverseTextNormalizationResult}
	- LexicalForm: {phraseResponseResult.LexicalForm}
	- MaskedInverseTextNormalizationResult: {phraseResponseResult.MaskedInverseTextNormalizationResult}");
	    }
	}

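	// Needs "using System.IO;" at the top of the file for FileStream

	private static void SendAudioHelper(string wavFileName)
	{
	    using (FileStream fileStream = new FileStream(wavFileName, FileMode.Open, FileAccess.Read))
	    {
	        // Send the file in 1 KB chunks
	        byte[] buffer = new byte[1024];

	        try
	        {
	            int bytesRead;
	            // Read returns 0 at the end of the file, so no empty chunk is sent
	            while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) > 0)
	            {
	                _dataClient.SendAudio(buffer, bytesRead);
	            }
	        }
	        finally
	        {
	            // Always tell the service that the audio is complete
	            _dataClient.EndAudio();
	        }
	    }
	}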

The running app shows the following result.

Clipboard04

The source code can be downloaded from GitHub (link)

Greetings @ Toronto (-5!)

El Bruno

References

El Bruno, Tutorial to create and publish a complete model in Custom Speech Service (#CRIS)
GitHub, Cognitive Speech STT Windows
Azure, Use a custom speech-to-text endpoint
