When the app uses different PhraseList elements for the voice commands, the speech recognition is pretty accurate. You can also use lists to constrain the text against which the speech recognizer must match. This significantly improves accuracy. For example, each recipe requires a specific skill level and you don't want the user to have to select the skill level from a dropdown list. You can provide the list to the speech recognizer in a similar way to how you defined the elements for the PhraseList.
The following lines constrain the possible results of speech recognition to the values included in the skillLevels List<string> by calling the skillLevelRecognizer.Recognizer.Grammars.AddGrammarFromList method:
var skillLevels = new List<string>() { "rookie", "good cooker", "chef", "great chef" };
using (var skillLevelRecognizer = new SpeechRecognizerUI())
{
    // Prompt and example text shown in the recognition UI
    skillLevelRecognizer.Settings.ListenText = "Which is the skill level required for this recipe?";
    skillLevelRecognizer.Settings.ExampleText = string.Join(", ", skillLevels);
    // Constrain recognition to the values in the list
    skillLevelRecognizer.Recognizer.Grammars.AddGrammarFromList("skillLevel", skillLevels);
    var result = await skillLevelRecognizer.RecognizeWithUIAsync();
    // Accept the result unless the session failed or the recognizer rejected the text
    if ((result.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
        && (result.RecognitionResult.TextConfidence != SpeechRecognitionConfidence.Rejected))
    {
        var skillLevel = result.RecognitionResult.Text;
    }
}
The code creates a new instance of Windows.Phone.Speech.Recognition.SpeechRecognizerUI, and initializes the settings to display "Which is the skill level required for this recipe?" and provide sample text by joining the strings in the list (Figure 9). This way, the user knows what to say.
Figure 9: A speech recognition session to ask the user which is the skill level required for the recipe.
If the speech recognizer understands what you said, it displays the recognized text and the phone's voice reads it back to you (Figure 10). You will notice that constraining the recognizer to the list makes recognition noticeably more accurate.
Figure 10: The speech recognition results provide feedback to the user.
You can also add your own grammar definitions to the speech recognizer by using an XML file that conforms to the Speech Recognition Grammar Specification (SRGS) W3C standard. With SRGS, you can improve accuracy for speech recognition required in complex scenarios. If you want to dive deeper into SRGS, you should check out the SRGS 1.0 specification.
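As a rough sketch of how an SRGS grammar might be wired in, the following code loads a grammar file with the AddGrammarFromUri method instead of AddGrammarFromList. The file name SkillLevelGrammar.grxml, its contents (shown in the comment), the grammar key, and the SkillLevelSrgs helper class are all assumptions made for illustration; they are not part of the original sample app.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Phone.Speech.Recognition;

public static class SkillLevelSrgs
{
    /* Assumed contents of SkillLevelGrammar.grxml, a hypothetical file
       added to the app package as content:

       <?xml version="1.0" encoding="utf-8"?>
       <grammar version="1.0" xml:lang="en-US" root="skillLevel"
                xmlns="http://www.w3.org/2001/06/grammar">
         <rule id="skillLevel" scope="public">
           <one-of>
             <item>rookie</item>
             <item>good cooker</item>
             <item>chef</item>
             <item>great chef</item>
           </one-of>
         </rule>
       </grammar>
    */

    // Returns the recognized text, or null when recognition fails or is rejected.
    public static async Task<string> AskSkillLevelAsync()
    {
        using (var recognizer = new SpeechRecognizerUI())
        {
            // ms-appx:/// points to a file in the app's installation folder.
            var grammarUri = new Uri("ms-appx:///SkillLevelGrammar.grxml", UriKind.Absolute);
            recognizer.Recognizer.Grammars.AddGrammarFromUri("skillLevelSrgs", grammarUri);

            var result = await recognizer.RecognizeWithUIAsync();
            if (result.ResultStatus == SpeechRecognitionUIStatus.Succeeded
                && result.RecognitionResult.TextConfidence != SpeechRecognitionConfidence.Rejected)
            {
                return result.RecognitionResult.Text;
            }
            return null;
        }
    }
}
```

For a flat list like this one, AddGrammarFromList is simpler. An SRGS file starts to pay off when the grammar needs optional words, nested rules, or alternative phrasings.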
Providing a Voice Response with Text-to-Speech
If you want to have an app that provides a voice-driven UX, you must use Text-to-Speech, also known as TTS, in order to turn text into spoken words. If the user is speaking to the phone, he won't want to read the output on the screen. Instead, he will expect the phone to provide voice feedback for each interaction.
The basic use of TTS is pretty simple. Add the following using statement to your code:
using Windows.Phone.Speech.Synthesis;
Now, you only need to create a new instance of Windows.Phone.Speech.Synthesis.SpeechSynthesizer and await its asynchronous SpeakTextAsync method, passing the text that the phone's voice must read back to the user. The following lines show an example of TTS informing the user that the recipe with a specific main element has been added to his wish list:
var mainRecipeElement = "tomatoes";
var speechSynthesizer = new SpeechSynthesizer();
await speechSynthesizer.SpeakTextAsync(
    string.Format("I've added the new recipe with {0} to your wish list.", mainRecipeElement));
The SpeakTextAsync method is useful when you want the phone's voice to read one sentence. However, if you want the phone to read all the necessary steps for a recipe, you probably want to introduce breaks between each step. The speech synthesizer supports the W3C Speech Synthesis Markup Language (SSML) standard with minor differences. You can use SSML to provide hints to the synthesizer on how to read the text.
The following lines show a simple example of three recipe steps that the code uses to generate an SSML string, which the synthesizer will read:
var recipeSteps = new List<string>()
{
    "Cut one tomato into 5 pieces",
    "Add olive oil to the tomato's pieces",
    "Cut three small potatoes"
};
var recipeSSMLBuilder = new System.Text.StringBuilder();
recipeSSMLBuilder.Append("<speak version=\"1.0\" xml:lang=\"en-us\">");
foreach (var step in recipeSteps)
{
    recipeSSMLBuilder.Append(string.Format("{0}{1}", step, "<break time=\"1s\" />"));
}
recipeSSMLBuilder.Append("</speak>");
var recipeSSML = recipeSSMLBuilder.ToString();
var speechSynthesizer = new SpeechSynthesizer();
await speechSynthesizer.SpeakSsmlAsync(recipeSSML);
I've used a StringBuilder that will produce the following SSML XML when converted to a string (shown here with line breaks added for readability; the generated string contains none):
<speak version="1.0" xml:lang="en-us">
Cut one tomato into 5 pieces<break time="1s" />
Add olive oil to the tomato's pieces<break time="1s" />
Cut three small potatoes<break time="1s" />
</speak>
This way, the speech synthesizer will pause for one second after reading each recipe step. Once the SSML XML is built, the code creates a new instance of Windows.Phone.Speech.Synthesis.SpeechSynthesizer and awaits its asynchronous SpeakSsmlAsync method, passing the SSML XML. SSML allows you to further customize the speech output. If you want to dive deeper into SSML, consult the SSML 1.0 specification.
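One thing to watch with the StringBuilder approach is that step text containing characters such as & or < would produce invalid XML. As a minimal sketch, assuming System.Xml.Linq is available to the app, the SSML can be built with XElement so that the text is escaped automatically. The RecipeSsml class and its Build method are hypothetical helpers introduced here for illustration; they are not part of the Windows Phone SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical helper (not part of the SDK) that builds the same SSML shape
// as the StringBuilder version, with XML escaping handled by LINQ to XML.
public static class RecipeSsml
{
    public static string Build(IEnumerable<string> steps, string pause = "1s", string language = "en-US")
    {
        var speak = new XElement("speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", language));

        foreach (var step in steps)
        {
            // XText escapes reserved characters such as & and < when serialized.
            speak.Add(new XText(step));
            // Pause after each step, as in the original listing.
            speak.Add(new XElement("break", new XAttribute("time", pause)));
        }

        // Serialize on a single line, without added indentation.
        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

The resulting string can then be passed to the synthesizer in the same way as before, for example with await speechSynthesizer.SpeakSsmlAsync(RecipeSsml.Build(recipeSteps)).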
By using voice commands, speech recognition, and TTS capabilities, you can provide a complete speech-driven UX in Windows Phone 8 apps. Because many Windows Phone 8 apps take advantage of the speech features by default, users are expecting more apps that provide similar experiences. Give 'em what they want, and they'll be happy customers!
Gaston Hillar is a frequent contributor to Dr. Dobb's.