Voice: It's The New UI


Gaston Hillar is an IT consultant and author of more than 40 books on topics ranging from systems programming to IT project management.


Voice has yet to gain a foothold with developers, but Windows 7 is poised to change that. Exchange Server 2010 already ships Voice Mail Preview, so the push is on to build voice and speech awareness and interaction into applications.

"Voice is the new touch," says Zig Serafin, GM of Microsoft's speech group. "It's the natural evolution from keyboards and touch screens."

Speech-aware apps recognize human speech and react to commands. They talk back to users instead of displaying text, letting people interact with their computers in the same way they interact with other people. These apps have two components:

  • Speech recognition to convert spoken words and sentences to text. Windows 7 lets users train the speech-recognition system, effectively turning it into a voice-recognition system: the engine improves its accuracy based on the user's unique vocal characteristics. (A minimal recognition sketch appears after this list.)
  • Speech synthesis to artificially produce human speech and talk to users. Windows 7's Text-To-Speech (TTS) engine converts text in a specific language into speech.
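The synthesis side is the focus of the rest of this article, but for reference, here is a minimal sketch of the recognition side using the System.Speech.Recognition wrapper. The three voice commands are just illustrative assumptions; a real application would define its own grammar:


using System;
using System.Speech.Recognition;

// Create an in-process recognizer listening on the default microphone.
var recognizer = new SpeechRecognitionEngine();
recognizer.SetInputToDefaultAudioDevice();

// Load a simple grammar built from a few illustrative commands.
var commands = new Choices("open", "close", "save");
recognizer.LoadGrammar(new Grammar(new GrammarBuilder(commands)));

// Print whatever the engine recognizes.
recognizer.SpeechRecognized += (sender, e) =>
    Console.WriteLine("Recognized: " + e.Result.Text);

// Keep recognizing until the application exits.
recognizer.RecognizeAsync(RecognizeMode.Multiple);
Console.ReadLine();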

The Speech Recognition application in Control Panel (Figure 1), found under the Ease of Access category, offers everything you need to configure your microphone and train your computer to better understand you. It also provides access to the Text to Speech tab in the Speech Properties dialog box, where you select the default voice for the TTS engine, preview it with sample text, and configure its speed and output settings (see Figure 2).

Figure 1: The Speech Recognition options in Windows 7 Control Panel.

Figure 2: The default configuration for the TTS engine in Windows 7.
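The list of voices shown in the Text to Speech tab can also be inspected from code. As a rough sketch, the SpeechSynthesizer class discussed below exposes the installed voices and their properties:


using System;
using System.Speech.Synthesis;

var synthesizer = new SpeechSynthesizer();

// Enumerate every voice registered with the TTS engine;
// this is the same list that appears in the Text to Speech tab.
foreach (var voice in synthesizer.GetInstalledVoices())
{
    var info = voice.VoiceInfo;
    Console.WriteLine("{0} ({1}, {2})", info.Name, info.Culture, info.Gender);
}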

However, one of the big problems with the speech-related services in Windows 7 is that the APIs and the managed-code wrappers are somewhat complex and poorly documented. In this article, I present example C# programs to help you create speech-aware applications.

Talking to People

To get an app to talk to users, you use the speech synthesis services through the wrappers provided by both .NET Framework 3.5 and .NET Framework 4 (Release Candidate). First, add the System.Speech.dll assembly as a reference to an existing C# project, then include the System.Speech.Synthesis namespace to access the classes, types, and enumerations offered by the speech synthesis wrapper:


using System.Speech.Synthesis;

Next, create a new instance of the SpeechSynthesizer class and call its Speak method with the text to speak. The TTS engine uses the default voice, its parameter values, and the default audio output to turn the received text into human speech:


var synthesizer = new SpeechSynthesizer();
synthesizer.Speak("Hello! How are you?");
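If the defaults don't suit your application, the settings shown in Figure 2 can also be overridden in code before calling Speak. The following sketch continues with the same synthesizer instance; the output path is just a hypothetical example:


// Pick a voice by gender instead of relying on the default voice.
synthesizer.SelectVoiceByHints(VoiceGender.Female);

// Rate ranges from -10 (slowest) to 10 (fastest); Volume from 0 to 100.
synthesizer.Rate = 0;
synthesizer.Volume = 100;

// Send the audio to a WAV file (hypothetical path) instead of the speakers.
synthesizer.SetOutputToWaveFile(@"C:\temp\greeting.wav");
synthesizer.Speak("Hello! How are you?");

// Restore the default audio output device.
synthesizer.SetOutputToDefaultAudioDevice();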

The statement after the call to the Speak method isn't executed until the TTS engine finishes saying "Hello! How are you?" To create a more responsive speech-aware application, call the SpeakAsync method, which produces the same effect as Speak but continues to the next statement after scheduling an asynchronous operation to transform the received text to speech:


synthesizer.SpeakAsync("Good morning!");
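Because SpeakAsync returns immediately, you often want to know when the engine has actually finished. The SpeechSynthesizer class raises a SpeakCompleted event for this purpose; here is a brief sketch, continuing the same synthesizer instance:


// Get notified when an asynchronous speak operation finishes or is canceled.
synthesizer.SpeakCompleted += (sender, e) =>
{
    if (!e.Cancelled)
        Console.WriteLine("Finished speaking.");
};

synthesizer.SpeakAsync("Good morning!");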

You may need to cancel a scheduled asynchronous speak operation. To do so, create a Prompt instance for each text to speak, then call the SpeakAsyncCancel method on the SpeechSynthesizer instance, passing the Prompt instance to be canceled as a parameter. This way, you can cancel a specific text as needed. The following lines show an example that cancels "How are you?":


var prompt1 = new Prompt("Good morning!", SynthesisTextFormat.Text);
var prompt2 = new Prompt("How are you?", SynthesisTextFormat.Text); 

synthesizer.SpeakAsync(prompt1);
synthesizer.SpeakAsync(prompt2);

// Cancel prompt2 -> "How are you?"
synthesizer.SpeakAsyncCancel(prompt2);

You can also cancel all the scheduled asynchronous speak operations by calling the SpeakAsyncCancelAll method on the SpeechSynthesizer instance, without parameters.
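For example, the following call discards every prompt still waiting in the queue, again continuing the same synthesizer instance:


// Cancel every asynchronous speak operation that is still scheduled.
synthesizer.SpeakAsyncCancelAll();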

