George is a software engineer in the System Design and Verification group at Cadence Design Systems. He can be reached at georgefrazier@yahoo.com.
Text-to-Speech with Microsoft SAPI
by Boris Eligulashvili
beligulashvili@hotmail.com
The Microsoft Speech API (SAPI) is a software layer that sits between an application and a speech engine. In this tip, I show how to programmatically invoke SAPI's text-to-speech (TTS) feature. The examples show the basic use of SAPI 4.0 and 5.1 from C++, and of SAPI 5.1 from C#. The applications were tested on Windows NT 4.0 and XP.
To install SAPI 4.0 you need sapi4sdksuite.exe (http://www.microsoft.com/speech/download/old/sdk40a.asp). To run the applications developed with SAPI, the SAPI core components should be installed on the user's machine. To redistribute SAPI 4.0, include these files in your installation program:
- The self-extracting spchapi.exe file, which installs all API files, registry entries, and the synthesized TTS engine.
- The concatenated voice engine, if your application is based on it. Include mstts.dll in your installation program and install it in the %SystemRoot%\Speech directory unless a newer version already exists there.
Also set the following registry entries:
HKEY_LOCAL_MACHINE\Software\Voice\TextToSpeech\Engine
    MSTTS = {2A46E4C0-4EDA-101B-931A-00AA0047BA4F}
HKEY_CLASSES_ROOT\CLSID\{2A46E4C0-4EDA-101B-931A-00AA0047BA4F}
    (Default) = "MSTTS"
HKEY_CLASSES_ROOT\CLSID\{2A46E4C0-4EDA-101B-931A-00AA0047BA4F}\InprocServer32
    "ThreadingModel" = "Apartment"
HKEY_CLASSES_ROOT\CLSID\{2A46E4C0-4EDA-101B-931A-00AA0047BA4F}\InprocServer32
    (Default) = "%SystemRoot%\Speech\mstts.dll"
Use the REG_EXPAND_SZ data type for the last value (the path to mstts.dll). When you read the value, expand it by calling ExpandEnvironmentStrings() in Visual C++ 6.0 or Environment.ExpandEnvironmentVariables() in C++ .NET or C#.
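A REG_EXPAND_SZ value is stored with %SystemRoot% still in it, so the reader has to perform the substitution. What that step amounts to can be sketched in portable C++. This is only an illustration of the idea, not a replacement for ExpandEnvironmentStrings(); the function name expandPercentVars and the use of std::getenv are assumptions of the sketch.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: replaces each %NAME% in 'input' with the value of the
// environment variable NAME. A reference to an undefined variable is copied
// through unchanged.
std::string expandPercentVars(const std::string& input)
{
    std::string result;
    std::string::size_type pos = 0;
    while (pos < input.size()) {
        const std::string::size_type open = input.find('%', pos);
        const std::string::size_type close =
            (open == std::string::npos) ? std::string::npos
                                        : input.find('%', open + 1);
        if (close == std::string::npos) {
            // No complete %...% pair is left: copy the remainder as is.
            result.append(input, pos, std::string::npos);
            break;
        }
        // Copy the literal text that precedes the opening '%'.
        result.append(input, pos, open - pos);
        const std::string name = input.substr(open + 1, close - open - 1);
        const char* value = name.empty() ? nullptr : std::getenv(name.c_str());
        if (value != nullptr) {
            result += value;      // known variable: substitute its value
            pos = close + 1;
        } else {
            result += '%';        // unknown: keep the '%' and rescan after it
            pos = open + 1;
        }
    }
    return result;
}
```

With SystemRoot set to C:\WINNT, passing "%SystemRoot%\Speech\mstts.dll" to this helper would yield C:\WINNT\Speech\mstts.dll.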
To install SAPI 5.1, you need speechsdk51.exe from http://www.microsoft.com/speech/download/sdk51. To redistribute SAPI 5.1, include the following Microsoft Windows Installer .MSM merge modules in your installation program:
- sp5.msm, includes sapi.dll and some other files.
- sp5intl.msm, includes the localized resource.dll needed for the Control Panel.
- sp5ttint.msm, includes the TTS engine and data files for the English language.
- spcommon.msm, includes the files that are common to the TTS and SR engines.
SAPI 4.0 and SAPI 5.1 can coexist on a single machine. They use different GUIDs, have different names for their DLLs, and register their keys in different registry locations.
Microsoft has also released beta versions of its Speech Application SDK (SASDK) 1.0, which uses SAPI 5.2 as its recognized platform. SASDK is a set of developer tools for adding speech interfaces to ASP.NET web applications (http://www.microsoft.com/speech/download/). It lets you build applications based on the Speech Application Language Tags (SALT) specification, an extension to HTML, using web server controls.
The SAPI 4.0 example in C++ uses the CVoiceText class defined in the spchwrap.h file provided with the SDK. The class wraps the COM objects, so you work with it instead of calling the COM interfaces directly.
After COM is initialized, you can create an instance of the CVoiceText class and register the application with the TTS engine:
PCVoiceText pTalk = new CVoiceText();
HRESULT hRes = pTalk->Init(L"TypeHereYourApplName");
You can now call the Speak() method. Pass it stringToSpeak, a Unicode string that contains the raw text (convert to LPWSTR if you use CString), flags that set the type and priority of the text, and tags that set the pronunciation:
hRes = pTalk->Speak (stringToSpeak, flags, tags);
Calls to Speak() are asynchronous, so a message dispatch loop must run after the call. For a quick test, you can call MessageBox() immediately after Speak(), because the message box runs its own message loop.
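The conversion to a wide string that Speak() expects can be sketched in portable C++ as follows. On Windows you would more likely use MultiByteToWideChar() or the ATL conversion helpers; the helper name toWide and the choice of std::mbstowcs, which converts according to the current C locale, are assumptions of this sketch.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: converts a narrow (multibyte) string to a wide string
// using the current C locale. Returns an empty string if the input is empty
// or contains a byte sequence that is invalid in that locale.
std::wstring toWide(const std::string& narrow)
{
    if (narrow.empty()) {
        return std::wstring();
    }
    // The number of wide characters never exceeds the number of input bytes;
    // one extra slot leaves room for the terminating null.
    std::vector<wchar_t> buffer(narrow.size() + 1);
    const std::size_t written =
        std::mbstowcs(&buffer[0], narrow.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();   // conversion failed
    }
    return std::wstring(&buffer[0], written);
}
```

The wide buffer produced this way is what would be handed to Speak() as stringToSpeak.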
To compile the code that uses the CVoiceText class, you need the following header files:
#include <mmsystem.h>
#include <initguid.h>
#include <spchwrap.h>
#include <atlbase.h>
You also need the release or debug version of spchwrap.lib.
To use SAPI 5.1 with C++, use the #import directive to extract information about the needed interfaces and enumerations from the DLL that holds the type library: #import <c:\...\Speech\sapi.dll>. The compiler creates the sapi.tlh and sapi.tli header files in the output directory. These files reconstruct the type library contents as C++ source code, contain all the necessary definitions, and are included in the project automatically. After that, you can create an object of the smart pointer class: ISpVoicePtr pVoice(__uuidof(SpVoice));. The _COM_SMARTPTR_TYPEDEF macro in sapi.tlh declares ISpVoicePtr as a smart pointer specialized for the ISpVoice interface and its IID.
Now you can call Speak() and pass it stringToSpeak (the Unicode string to be spoken), the SpeechVoiceSpeakFlags flags, and the address of a ULONG streamNumber variable:
pVoice->Speak(stringToSpeak, flags, &streamNumber);
To illustrate how to use SAPI 5.1 with C#, this example incorporates COM code into a managed application. In the Microsoft Development Environment 2003, add a reference to the Microsoft Speech Object Library (5.0), and the conversion is done automatically. To see the generated managed wrapper, right-click SpeechLib under the References node in the Solution Explorer window and click View in Object Browser.
In C#, the COM classes are represented as classes with a parameterless constructor: SpeechLib.SpVoice voice = new SpeechLib.SpVoice();. After that, you can explicitly cast the voice pointer to the desired ISpVoice interface: SpeechLib.ISpVoice sv = (SpeechLib.ISpVoice) voice;. You can call Speak now:
string s = "textToSpeak";
SpeechLib.SpeechVoiceSpeakFlags f =
    SpeechLib.SpeechVoiceSpeakFlags.SVSFDefault;
uint u;
sv.Speak(s, (uint)f, out u);
The common language runtime marshals the parameters and return values to and from the COM object.
Synthesized speech isn't perfect, and a few points need attention:
- The male or female voice should suit the application.
- Names are often expected to be pronounced differently from what the standard reading rules produce.
- The text should be well punctuated, so that the flow of spoken words sounds natural.
- Spelling, abbreviations, and similarly spelled words should be tested. Misspelled text sounds goofy.
The differences in pronunciation between SAPI 4.0 and 5.1 may cause additional work when you migrate from one SDK to another.
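One way to deal with names and abbreviations that the engine mispronounces is to respell them before the text is handed to Speak(). The following is a minimal sketch of that preprocessing step in portable C++. The helper name applyRespellings and the sample respellings are hypothetical, and the matching is deliberately simple: whole alphanumeric words, case sensitive.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: replaces every whole word that appears in
// 'respellings' with its phonetic respelling. Punctuation, spacing, and
// unlisted words are copied through unchanged.
std::string applyRespellings(
    const std::string& text,
    const std::map<std::string, std::string>& respellings)
{
    std::string result;
    std::string word;
    // Iterate one position past the end so the last word is flushed too.
    for (std::string::size_type i = 0; i <= text.size(); ++i) {
        const bool inWord = i < text.size() &&
            std::isalnum(static_cast<unsigned char>(text[i])) != 0;
        if (inWord) {
            word += text[i];
            continue;
        }
        // A word has just ended: emit it, respelled if it is listed.
        if (!word.empty()) {
            const auto it = respellings.find(word);
            result += (it != respellings.end()) ? it->second : word;
            word.clear();
        }
        if (i < text.size()) {
            result += text[i];   // keep the separator character
        }
    }
    return result;
}
```

A table that maps, say, "DDJ" to "D D J" would turn "Read DDJ today." into "Read D D J today." before the text is spoken. Because pronunciation differs between SAPI 4.0 and 5.1, such a table would probably have to be kept per engine.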
DDJ
Listing One
<html>
<head>
<meta http-equiv="Refresh" content="60">
<!-- MeadCo ScriptX -->
<object id="factory" viewastext style="display:none"
    classid="clsid:1663ed61-23eb-11d2-b92f-008048fdd814"
    codebase="c:\scriptx\ScriptX.cab#Version=6,1,430,5">
</object>
<script defer>
function onload(){
    if (!factory.object){
        alert("Onload cannot find ScriptX Control.");
        return;
    }
    factory.printing.header = "Test Header";
    factory.printing.footer = "Test Footer";
    PrintThis();
}
function PrintThis(){
    if (!factory.object){
        alert("PrintThis cannot find ScriptX Control.");
        return;
    }
    factory.printing.print(false);
}
</script>
</head>
<frameset name="frameset" rows="*" onload="onload()">
    <frame name="frame_for_web_site" src="http://www.google.com">
</frameset>
</html>
Listing Two
import java.io.*;
import java.net.*;
import java.util.*;

public class AutoPrint{
    public static void main(String arg[]){
        // params for calling printHtml.bat
        String args[] = new String[3];
        args[0] = new String("c:\\autoprint\\printHtml.bat");
        args[1] = new String("c:\\autoprint\\testpage.html");
        args[2] = new String("\\\\printer_server\\a_printer");
        String testSite = "http://www.google.com";
        // the main loop
        while (true) {
            // retrieve the web page
            StringBuffer sb = new StringBuffer(100000);
            try {
                URL url = new URL(testSite);
                InputStream in = url.openStream();
                in = new BufferedInputStream(in);
                Reader reader = new InputStreamReader(in);
                int c;
                while ((c = reader.read()) != -1){
                    sb.append((char)c);
                }
                in.close();
            } catch (MalformedURLException e) {
                System.err.println("Malformed: " + e);
            } catch (IOException e) {
                System.err.println("I/O Exception: " + e);
            }
            // save to a file
            File file = new File(args[1]);
            try{
                FileWriter fw = new FileWriter(file);
                fw.write(sb.toString());
                fw.close();
            } catch (Exception e){
                System.out.println(e.toString());
            }
            // is the file readable?
            if (!file.canRead()){
                System.out.println("Cannot read the file.");
                return;
            }
            // call printHtml.bat
            Process p = null;
            try {
                p = Runtime.getRuntime().exec(args);
            } catch (Exception e) {
                System.out.println(e.toString());
            }
            // wait for printHtml.bat to finish
            try {
                p.waitFor();
            } catch (InterruptedException e) {
                // need to do nothing.
            }
            // make a log
            Date date = new Date();
            System.out.println(date.toString() + " printHtml.bat called.");
            // be ready for next iteration
            try {
                Thread.sleep(600000);
                if(!file.delete()) {
                    System.out.println("Cannot delete the file.");
                    return;
                }
            } catch (Exception e) {
                System.out.println(e.toString());
            }
        }
    }
}
Listing Three
ECHO OFF
SETLOCAL
SET File="%1"
SET Printer="%2"
IF NOT DEFINED File GOTO End
IF NOT DEFINED Printer GOTO End
:: Create a temporary Kix file to "press" Print button
> %TEMP%.\%~n0.kix ECHO.; Wait a few seconds for the Print dialog to appear
>>%TEMP%.\%~n0.kix ECHO SLEEP 2
>>%TEMP%.\%~n0.kix ECHO.; Press "Print" (Enter) using SendKeys function
>>%TEMP%.\%~n0.kix ECHO IF SETFOCUS("Print") = 0
>>%TEMP%.\%~n0.kix ECHO $RC = SENDKEYS("{ENTER}")
>>%TEMP%.\%~n0.kix ECHO ENDIF
:: Actual print command
START RUNDLL32.EXE c:\winnt\system32\MSHTML.DLL,PrintHTML %File% %Printer%
:: Call the temporary Kix file to "press" Print button, then delete it
START /WAIT KIX32.EXE %TEMP%.\%~n0.kix
DEL %TEMP%.\%~n0.kix
:End
ENDLOCAL
Listing Four
/* StaticLibraryRoutineOne.c */
#include <ntddk.h>
#include "StaticLibraryTemplate.h"

VOID FirstRoutine()
{
    DbgPrint("First routine entered in Static Library\n");
}

/* StaticLibraryRoutineTwo.c */
#include <ntddk.h>
#include "StaticLibraryTemplate.h"

VOID SecondRoutine()
{
    DbgPrint("Second routine entered in Static Library\n");
}

/* StaticLibraryTemplate.h */
// Function prototypes for the two routines in the static library.
VOID FirstRoutine();
VOID SecondRoutine();