Using Speech APIs in Windows Phone 8

In the last article, I defined the SearchRecipe voice command that used three PhraseList elements: recipeTypes, recipeConnectors, and recipeElements. The phone navigated to the page as a result of the execution of the SearchRecipe voice command, so the values for the three PhraseList elements were included as additional parameters.

Our code sample uses an if statement to check whether NavigationContext.QueryString is not null and contains the "voiceCommandName" key. If it does, a voice command made the app navigate to this page. If the value of the "voiceCommandName" key is "SearchRecipe", you need to retrieve the additional parameters: the values for the three PhraseList elements defined for the voice command. The code makes three calls to this.NavigationContext.QueryString.TryGetValue to try to retrieve each of the values, then displays a message box with the retrieved search parameters (Figure 6).

Windows Phone 8 App Development Part 3
Figure 6: The app displays a dialog box with a message built with the parameters retrieved from the voice command.

In this case, the code doesn't check the Boolean result of each call to this.NavigationContext.QueryString.TryGetValue. Depending on your voice commands, you might check the result to make sure you have all the necessary parameter values and, if some are missing, provide feedback to the user. Keep in mind, however, that a PhraseList token in a voice command is not optional, so the values for those tokens cannot be empty.
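The checked variant described above might look like the following minimal sketch. It assumes the query-string keys match the PhraseList labels (recipeTypes, recipeConnectors, recipeElements). The VoiceCommandReader class and its member names are hypothetical and not part of the sample app, and a plain IDictionary<string, string> stands in for this.NavigationContext.QueryString so the logic can be exercised outside a page.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: not part of the article's sample code.
public static class VoiceCommandReader
{
    // Assumed to match the labels of the PhraseList elements in the VCD file.
    private static readonly string[] RequiredKeys =
        { "recipeTypes", "recipeConnectors", "recipeElements" };

    // Returns true when the page was opened by the SearchRecipe voice command
    // and every PhraseList value could be retrieved. Keys that are missing or
    // empty are collected in 'missing' so the caller can give feedback.
    public static bool TryReadSearchRecipe(
        IDictionary<string, string> queryString,
        out Dictionary<string, string> values,
        out List<string> missing)
    {
        values = new Dictionary<string, string>();
        missing = new List<string>();

        string commandName;
        if (queryString == null
            || !queryString.TryGetValue("voiceCommandName", out commandName)
            || commandName != "SearchRecipe")
        {
            // Not launched by this voice command; 'missing' stays empty.
            return false;
        }

        foreach (var key in RequiredKeys)
        {
            string value;
            // Check the Boolean result instead of ignoring it.
            if (queryString.TryGetValue(key, out value) && !string.IsNullOrEmpty(value))
            {
                values[key] = value;
            }
            else
            {
                missing.Add(key);
            }
        }

        return missing.Count == 0;
    }
}
```

In a page, you would call this helper from OnNavigatedTo with this.NavigationContext.QueryString. When it returns false and missing is not empty, the app can show a message that names the parameters it could not retrieve.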

Using Speech Recognition and Updating Phrase Lists within the App

Suppose that, instead of having a recipeElements PhraseList with just four items, I want to dynamically add items to the PhraseList within the app. It is possible to take advantage of the speech-recognition feature to allow the user to say the main element for a new recipe. This way, the user will be able to search recipes with more elements than the initial elements included in the VoiceCommandDefinition.xml VCD file. Note that speech recognition without any kind of grammar isn't as accurate as you might expect. However, I will also provide an example of using grammar to improve accuracy.

To keep things simple and stay focused on speech topics, the App class, defined in App.xaml.cs, has the following public static property:

  public static List<string> RecipeElements { get; set; }

In addition, the App class initializes RecipeElements with the following code in its constructor:

  RecipeElements = new List<string>()

This way, you can access App.RecipeElements and it includes the same contents that you had in the recipeElements PhraseList. Of course, in a real-life app, you would have additional classes to manage shared resources within your app. The following code uses the standard speech recognition UI to ask the user for the main element for the new recipe. The code includes a line that is commented out because I'll use it later:

using (var mainElementNameRecognizer = new SpeechRecognizerUI())
{
    mainElementNameRecognizer.Settings.ListenText = "Which is the main element?";
    mainElementNameRecognizer.Settings.ExampleText = "e.g. chicken, turkey";
    //// The following line would use a more appropriate predefined grammar
    ////mainElementNameRecognizer.Recognizer.Grammars.AddGrammarFromPredefinedType("websearch", SpeechPredefinedGrammar.WebSearch);
    var result = await mainElementNameRecognizer.RecognizeWithUIAsync();

    if ((result.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
        && (result.RecognitionResult.TextConfidence != SpeechRecognitionConfidence.Rejected))
    {
        var mainElementName = result.RecognitionResult.Text.ToLower().TrimEnd('.');

        if (!App.RecipeElements.Contains(mainElementName))
        {
            //// The element is new, so add it to the shared list
            App.RecipeElements.Add(mainElementName);
        }

        //// Now, I have to update the PhraseList
        var recipesVoiceCommands = VoiceCommandService.InstalledCommandSets["RecipesVoiceCommands"];
        await recipesVoiceCommands.UpdatePhraseListAsync(
            "recipeElements",
            App.RecipeElements);
    }
}

It is necessary to add the following using directives:

  • using Windows.Phone.Speech.Recognition;
  • using Windows.Phone.Speech.VoiceCommands;

This code creates a new instance of Windows.Phone.Speech.Recognition.SpeechRecognizerUI, then initializes its settings to display "Which is the main element?" as the prompt and "e.g. chicken, turkey" as the sample text. Next, the code awaits the asynchronous RecognizeWithUIAsync method, which begins a speech recognition session with the default UI (Figure 7), and saves the outcome in the result local variable. Execution continues after either the speech recognition finishes or the user cancels it.

Figure 7: A speech recognition session asks the user for the main element of the recipe.

If cloud-based speech recognition can hear what you said, it displays the results of the recognition and the phone's voice tells you what you said (Figure 8). Don't be surprised if the recognition has nothing to do with what you said. However, with some minor improvements to the code, you can really increase accuracy.

Figure 8: The speech recognition results provide feedback to the user.

Because the code didn't specify any grammar details, the speech recognition uses a default grammar that is prepared to listen for sentences. Thus, the recognition results will start with an uppercase letter and will end with a period (see Figure 8). Because the speech recognizer UI displays a Cancel button, the user might cancel the speech recognition, so you must always check the value for result.ResultStatus before processing the speech recognition results.

The code makes sure that the following conditions are met:

  • result.ResultStatus is equal to SpeechRecognitionUIStatus.Succeeded
  • result.RecognitionResult.TextConfidence is not equal to SpeechRecognitionConfidence.Rejected

Again, even if the value for TextConfidence is not equal to SpeechRecognitionConfidence.Rejected, the recognition results might not be as accurate as expected. If the conditions are met, the code converts the recognized text available in result.RecognitionResult.Text to lower case, removes the final period, and checks whether the App.RecipeElements list contains the recognized element.
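The normalization and membership check described above can be isolated from the speech APIs, which makes them easy to try on their own. The following is a minimal sketch under that assumption; the RecipeElementHelper class and its method names are hypothetical and do not appear in the sample app.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: not part of the article's sample code.
public static class RecipeElementHelper
{
    // Mirrors the article's expression: convert the recognized text to
    // lower case and drop any trailing periods.
    public static string Normalize(string recognizedText)
    {
        return recognizedText.ToLower().TrimEnd('.');
    }

    // Adds the normalized element when the list does not contain it yet.
    // Returns true when the list changed, false when the element was known.
    public static bool AddIfNew(IList<string> elements, string recognizedText)
    {
        var name = Normalize(recognizedText);
        if (elements.Contains(name))
        {
            return false;
        }

        elements.Add(name);
        return true;
    }
}
```

For example, RecipeElementHelper.Normalize("Octopus.") returns "octopus", the form the default dictation grammar's result takes after clean-up. Like the original expression, the sketch uses the culture-sensitive ToLower; ToLowerInvariant may be the safer choice if the app runs under cultures with special casing rules.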

When the element is new, it is necessary to update the PhraseList. The following line retrieves a VoiceCommandSet instance from a dictionary with all the installed command sets:

  var recipesVoiceCommands = VoiceCommandService.InstalledCommandSets["RecipesVoiceCommands"];

Then, the code awaits the asynchronous recipesVoiceCommands.UpdatePhraseListAsync method, passing two arguments: the name of the PhraseList to update and the entire list that defines it. This way, the PhraseList will include all the elements in App.RecipeElements:

  await recipesVoiceCommands.UpdatePhraseListAsync("recipeElements", App.RecipeElements);

It is easy to update an existing PhraseList. In this example, I'm updating the PhraseList by using the results of speech recognition. Obviously, it isn't the best idea for a real-life app in which you want the PhraseList elements to be extremely accurate. I just wanted to use a short example to explain these features: You can use the ideas in more complex scenarios.

Working with Predefined Grammar and Lists in Speech Recognition

When an app needs to recognize words instead of sentences, the speech recognition engine provides a more suitable predefined grammar: the Web search grammar. If you uncomment the following line in the previously shown code, the recognizer will use the Web search grammar instead of the default dictation grammar. This way, when you say "octopus", the result will be "octopus" instead of "Octopus." because the Web search grammar doesn't capitalize the first letter and doesn't append the end-of-sentence period.

mainElementNameRecognizer.Recognizer.Grammars.AddGrammarFromPredefinedType("websearch", SpeechPredefinedGrammar.WebSearch);
