Gaston Hillar is an IT consultant and author of more than 40 books on topics ranging from systems programming to IT project management.
Voice has yet to gain a foothold with developers, but Windows 7 is about to change that. Exchange Server 2010 includes Voice Mail Preview, so the push is on to include voice and speech awareness and interaction in applications.
"Voice is the new touch," says Zig Serafin, GM of Microsoft's speech group. "It's the natural evolution from keyboards and touch screens."
Speech-aware apps recognize human speech and react to commands. They talk back to users instead of displaying text, letting people interact with their computers in the same way they interact with other people. These apps have two components:
- Speech recognition to convert spoken words and sentences to text. Windows 7 lets users train the speech-recognition system to transform it into a voice-recognition system. This way, the speech-recognition engine improves its accuracy based on the user's unique vocal sounds.
- Speech synthesis to artificially produce human speech and talk to users. Windows 7's Text-To-Speech (TTS) engine converts text in a specific language into speech.
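As a taste of the recognition side, here is a minimal sketch (assuming the System.Speech.Recognition wrapper and a default microphone; the command words are illustrative) that listens for one of a few fixed commands:

using System;
using System.Speech.Recognition;

var recognizer = new SpeechRecognitionEngine();
// Restrict recognition to a small set of commands via a grammar.
var commands = new Choices("open", "close", "save");
recognizer.LoadGrammar(new Grammar(new GrammarBuilder(commands)));
recognizer.SetInputToDefaultAudioDevice();
// Recognize a single utterance synchronously.
RecognitionResult result = recognizer.Recognize();
if (result != null)
    Console.WriteLine("You said: " + result.Text);

Constraining the engine with a grammar, rather than using free dictation, keeps accuracy high even before the user has trained the recognizer.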
The Speech Recognition Control Panel application (Figure 1) offers everything you need to configure your microphone and train your computer to better understand you. You can find this application in the Ease of Access category. It offers access to the Text to Speech tab in the Speech Properties dialog box. This tab lets you select the default voice to use for the TTS engine. You can use a text to preview the voice and configure its speed and output settings (see Figure 2).
However, one of the big problems with speech-related services in Windows 7 is that the APIs and the managed-code wrappers are somewhat complex and poorly documented. In this article, I present example C# programs to help you create speech-aware applications.
Talking to People
To get an app to talk to users, you use the speech synthesis services through the wrappers provided by both .NET Framework 3.5 and .NET Framework 4 (Release Candidate). First, add the System.Speech.dll assembly as a reference to an existing C# project, then include the System.Speech.Synthesis namespace to access the classes, types, and enumerations offered by the speech synthesis wrapper. Create a new instance of the SpeechSynthesizer class and call its Speak method with the text to speak:

var synthesizer = new SpeechSynthesizer();
synthesizer.Speak("Hello! How are you?");

This way, the TTS engine uses the default voice, its parameter values, and audio output to turn the received text into human speech.
The statement after the call to the Speak method isn't executed until the TTS engine finishes saying "Hello! How are you?" To create a more responsive speech-aware application, call the SpeakAsync method, which produces the same effect as Speak but continues to the next statement after scheduling an asynchronous operation to transform the received text to speech.
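For example, the asynchronous version is a one-line change (the Console.WriteLine call is just an illustrative stand-in for whatever work your application does next):

var synthesizer = new SpeechSynthesizer();
synthesizer.SpeakAsync("Hello! How are you?");
// Execution continues immediately; the TTS engine speaks in the background.
Console.WriteLine("This line runs while the synthesizer is still speaking.");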
You may need to cancel an asynchronously scheduled speak operation. Create a Prompt instance for each text to speak, then call SpeakAsyncCancel on the SpeechSynthesizer instance, passing the Prompt instance to be canceled as a parameter. This way, you can cancel a specific text as needed. The following lines show an example that cancels "How are you?":
var prompt1 = new Prompt("Good morning!", SynthesisTextFormat.Text);
var prompt2 = new Prompt("How are you?", SynthesisTextFormat.Text);
synthesizer.SpeakAsync(prompt1);
synthesizer.SpeakAsync(prompt2);
// Cancel prompt2 -> "How are you?"
synthesizer.SpeakAsyncCancel(prompt2);
You can also cancel all the scheduled asynchronous speak operations by calling the SpeakAsyncCancelAll method of the SpeechSynthesizer instance, which takes no parameters.
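For instance, this sketch (the queued phrases are illustrative) discards an entire backlog of pending speech at once:

var synthesizer = new SpeechSynthesizer();
synthesizer.SpeakAsync("One moment, please.");
synthesizer.SpeakAsync("Your files are being processed.");
// The situation changed; drop everything still waiting to be spoken.
synthesizer.SpeakAsyncCancelAll();

This is handy when the application's state changes and the queued prompts are no longer relevant, such as when the user navigates to a different screen.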