Windows Phone 8 App Development: Using Voice Commands


In the first article in this three-part series on Windows Phone 8 app development, I explained how to prepare Visual Studio 2012 with the updates needed to develop apps for Windows Phone 8. In this article, I explain how to develop an app that takes advantage of one of the new features of the Windows Phone 8 speech APIs: Voice Commands.

The Evolution of Voice Commands and the Global Speech Experience

Windows Phone 7.x allowed users to press and hold the hardware Start button to launch the speech dialog known as the Global Speech Experience (GSE). The user could open Store apps, make phone calls, generate and execute Bing search queries, and send text messages with voice commands.

Windows Phone 8 goes a step further and allows app developers to register voice commands that the user can invoke from the GSE. Thus, you can use voice commands to let users perform actions in your app by speaking to the phone. Voice commands are the extensibility point that Windows Phone 8 provides for the built-in Speech experience. More generally, Windows Phone 8 lets app developers extend more built-in experiences than Windows Phone 7.x did. The following extensibility points are new in Windows Phone 8 (they weren't available in Windows Phone 7.x):

  • Camera: Lenses.
  • Lock screen: Background photo, Detailed status, and Quick status.
  • Maps: Navigation.
  • People: Custom contact stores.
  • Photo viewer: Edit photos.
  • Speech: Voice commands.
  • Wallet: Wallet items.

These extensibility points were possible with Windows Phone 7.x and are still available with Windows Phone 8:

  • Music & Videos: New list, Now playing tile, and History list.
  • Photos: App pivot.
  • Photo viewer: Share and apps.
  • Search: Search quick cards.

In this article, I'll focus on voice commands, but you should consider both of the preceding lists when you design your apps. Extensibility points are very important when creating apps that comply with the Windows Phone 8 UX.

The easiest way to understand the GSE is to use it on a Windows Phone 8 device or in the emulator. If you haven't played with the new GSE yet, you can learn its ins and outs by creating a new app that targets Windows Phone 8 and running it in the emulator. In Visual Studio 2012, follow these steps to create a new Windows Phone 8 app based on the simplest App template:

  • Select File | New | Project… The New Project dialog box will appear.
  • Select Templates | Visual C# | Windows Phone in the left pane. The templates list will display eleven templates for the selected target.
  • Select Windows Phone App (Figure 1). The template generates a single-page project for a Windows Phone app with a XAML-based UI and navigation features. In this example, I want to explain how the app can take advantage of the new extensibility points for Speech, and this template provides me with a very simple app that I can easily customize.

Windows Phone 8 App Development Part 2
Figure 1: Creating a new Windows Phone app based on the simplest XAML-based UI template.

  • Enter the desired name in the Name textbox. I'll use "Recipes" because my sample app will receive voice commands related to cooking recipes.
  • Click OK and the New Windows Phone Application dialog box will display a dropdown list with the different Windows Phone OS versions that you can choose as the target for your new app. In this case, I want to take full advantage of all the new speech-related features available in Windows Phone 8, so I select Windows Phone OS 8.0 in the dropdown list (see Figure 2).

Figure 2: The new Windows Phone Application dialog box allows you to select the target Windows Phone OS version for the new project. When you select Windows Phone OS 8.0, it won't be possible to execute the app on previous Windows Phone OS versions.

  • Click OK and the IDE will create a new solution with a Windows Phone app.

If you take a look at the references included in the new project, you will notice that there are just two items (see Figure 3):

  • .NET for Windows Phone
  • Windows Phone

Figure 3: The XAML preview of the default page for the Windows Phone app (left) and the project structure (right).

You will also notice there are two well-known members for XAML-based UI apps: App.xaml and MainPage.xaml. MainPage.xaml is the only page that the selected template generates for this app and it is the default navigation page in the app manifest. To use voice commands, you must activate the following capabilities in the app manifest file:

  • ID_CAP_MICROPHONE
  • ID_CAP_NETWORKING
  • ID_CAP_SPEECH_RECOGNITION

You will find a WMAppManifest.xml file within the Properties folder for the Windows Phone app project (Recipes). If you double-click on WMAppManifest.xml, Visual Studio will display the Windows Phone app manifest designer and allow you to change the values for many properties for your app. Click on the Capabilities tab and make sure that the aforementioned capabilities are activated (Figure 4).

Figure 4: The Windows Phone App Manifest Designer displaying the capabilities that the Recipes app requires. Windows Phone 8 apps must declare every capability they need in order to run without problems on the device.
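
If you prefer to check the result in the XML rather than in the designer, the capabilities section of WMAppManifest.xml should look roughly like the following sketch. It shows only the Capabilities element with the three capabilities listed above. The rest of the generated manifest is omitted, and the template may already declare additional capabilities that you should leave in place.

```xml
<!-- Fragment of Properties/WMAppManifest.xml (sketch; surrounding elements omitted). -->
<Capabilities>
  <!-- Access to the microphone, needed to capture the user's speech. -->
  <Capability Name="ID_CAP_MICROPHONE" />
  <!-- Network access, used by the speech recognition service. -->
  <Capability Name="ID_CAP_NETWORKING" />
  <!-- Access to the speech recognition features, including voice commands. -->
  <Capability Name="ID_CAP_SPEECH_RECOGNITION" />
</Capabilities>
```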

Now, select the desired emulator from the Debug Target dropdown list (Figure 5) and start debugging (F5). Visual Studio 2012 will build the Recipes app, launch the emulator, deploy the app to it, and run the app (Figure 6).

Figure 5: Selecting the WVGA 512MB emulator as the debug target in order to test the GSE in the emulator.

Figure 6: The emulator executing the Recipes app that displays the contents of MainPage.xaml.

Once the app is running in the emulator, you can activate the GSE, provided your computer has a microphone. Press and hold the hardware Start button in the emulator or press F2. The first time you do this, the "What can I say?" helper page will appear, displaying many examples of the available voice commands and asking whether you want to send Microsoft the words you speak in order to improve the speech recognition service (Figure 7).

Figure 7: The first time the "What can I say?" helper page appears in the emulator with the question about sharing information with Microsoft.

Once you accept or decline the offer, the GSE dialog will appear and will display a random example of an available voice command below the "Listening…" title (Figure 8).

Figure 8: The GSE dialog listening to your voice commands.

If you tap the help button (the question mark icon in the upper-right corner), the "What can I say?" helper page will appear again, displaying some examples of common voice commands in the "common" pivot (see Figure 9).

Figure 9: The "What can I say?" helper page displaying the "common" pivot.

Tap "speak" and say "Open Calendar." The GSE will recognize the voice command "Open" and the app name "Calendar." Windows Phone will display the app icon, its name, and feedback from the voice command. In this case, the feedback is just "Starting…" and you will hear the phone's voice saying "Starting Calendar" (see Figure 10).

Figure 10: Windows Phone starting the Calendar via the "Open Calendar" voice command.

The GSE detects when you have finished speaking, and at that point, it tries to match the verbal request with a valid voice command for the phone or for any installed app that has registered voice commands. If the GSE recognizes a command but can't complete it, it provides feedback. For example, if you say "Open Unknownapp," Windows Phone will understand the "Open" voice command, but it won't find an installed app with the name "Unknownapp" and will notify the user. If the GSE doesn't recognize any valid command at all, it performs a Bing search using the recognized phrase.

Registering Voice Command Definitions

You can also use GSE to provide voice commands for your app. For example, I want the Recipes app to allow users to activate the app and navigate to the page that allows them to create a new recipe with a voice command. In addition, I want the user to know the voice commands available for the Recipes app, so I want to provide valuable information on the "Apps" pivot of the "What can I say?" helper page.

The first step is to create a Voice Command Definition file, known as a VCD file, for the Windows Phone App project. The VCD file uses an XML schema to describe the voice commands that an app accepts for one or more specific languages, the feedback provided to the user when the GSE recognizes the command, and the page to which the app must navigate when the command is recognized.
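
To make the structure of a VCD file more concrete, here is a minimal sketch of what one might look like for the Recipes app. It is an illustration under stated assumptions rather than the definitive file for this series: the command name "NewRecipe", the spoken phrases, the feedback text, and the target page "/NewRecipePage.xaml" are hypothetical names I chose for the example. The element names and the namespace follow the Windows Phone 8 VCD schema.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch of a Voice Command Definition file for the Recipes app. -->
<!-- Command names, phrases, and the target page are hypothetical. -->
<VoiceCommands xmlns="http://schemas.microsoft.com/voicecommands/1.0">
  <!-- One CommandSet per supported language. -->
  <CommandSet xml:lang="en-US">
    <!-- The word the user says to address the app from the GSE. -->
    <CommandPrefix>Recipes</CommandPrefix>
    <!-- Example shown for the app on the "What can I say?" helper page. -->
    <Example> new recipe </Example>

    <Command Name="NewRecipe">
      <!-- Example shown for this specific command. -->
      <Example> new recipe </Example>
      <!-- Phrases the GSE listens for; words in square brackets are optional. -->
      <ListenFor> new recipe </ListenFor>
      <ListenFor> [create] [a] new recipe </ListenFor>
      <!-- Text displayed and spoken when the command is recognized. -->
      <Feedback> Creating a new recipe... </Feedback>
      <!-- Page the app navigates to (hypothetical page name). -->
      <Navigate Target="/NewRecipePage.xaml" />
    </Command>
  </CommandSet>
</VoiceCommands>
```

With a file along these lines, the user would say "Recipes, new recipe" in the GSE. The three pieces described above map directly to the ListenFor, Feedback, and Navigate elements, and each additional language gets its own CommandSet.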