June, 2004: Tech Tips

George is a software engineer in the System Design and Verification group at Cadence Design Systems. He can be reached at georgefrazier@yahoo.com.


Text-to-Speech with Microsoft SAPI

by Boris Eligulashvili

beligulashvili@hotmail.com

The Microsoft Speech API (SAPI) is a software layer that sits between an application and a speech engine. In this tip, I show how to invoke SAPI's text-to-speech (TTS) feature programmatically. The examples show the basic use of SAPI 4.0 and 5.1 in C++ and of SAPI 5.1 in C#. The applications were tested on Windows NT 4.0 and XP.

To install SAPI 4.0, you need sapi4sdksuite.exe (http://www.microsoft.com/speech/download/old/sdk40a.asp). To run applications developed with SAPI, the SAPI core components must be installed on the user's machine. To redistribute SAPI 4.0, include these files in your installation program:

  • The self-extracting spchapi.exe file, which installs all API files, registry entries, and the synthesized TTS engine.
  • If your application is based on the concatenated voice engine, redistribute it too. Include mstts.dll in your installation program. It should be installed in the %SystemRoot%\speech directory if a newer version does not already exist there.

Also set the following registry entries:

HKEY_LOCAL_MACHINE\Software\Voice\TextToSpeech\Engine
    MSTTS = {2A46E4C0-4EDA-101B-931A-00AA0047BA4F}
HKEY_CLASSES_ROOT\CLSID\{2A46E4C0-4EDA-101B-931A-00AA0047BA4F}
    (Default) = "MSTTS"
HKEY_CLASSES_ROOT\CLSID\{2A46E4C0-4EDA-101B-931A-00AA0047BA4F}\InprocServer32
    "ThreadingModel" = "Apartment"
HKEY_CLASSES_ROOT\CLSID\{2A46E4C0-4EDA-101B-931A-00AA0047BA4F}\InprocServer32
    (Default) = "%SystemRoot%\Speech\mstts.dll"

Use the REG_EXPAND_SZ data type for the last value, which contains %SystemRoot%, and expand it when you read the value by calling ExpandEnvironmentStrings() in Visual C++ 6.0 or Environment.ExpandEnvironmentVariables() in C++ .NET or C#.
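To make the expansion step concrete, here is a portable C++ sketch of what happens to a value such as "%SystemRoot%\Speech\mstts.dll". It is a stand-in for illustration, not the Windows API: on Windows you would call ExpandEnvironmentStrings() itself. The names expandEnvRefs and EnvLookup are hypothetical, and unlike Windows the sketch matches variable names case-sensitively.

```cpp
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical callback: returns the value of a variable, or nullptr if undefined.
using EnvLookup = std::function<const char*(const std::string&)>;

// Replace each %NAME% reference with the value supplied by lookup.
// As with ExpandEnvironmentStrings(), a reference to an undefined
// variable is left in the output unchanged, percent signs included.
std::string expandEnvRefs(const std::string& input, const EnvLookup& lookup)
{
    std::string out;
    std::string::size_type pos = 0;
    while (pos < input.size()) {
        const std::string::size_type open = input.find('%', pos);
        if (open == std::string::npos) {          // no more references
            out.append(input, pos, std::string::npos);
            break;
        }
        const std::string::size_type close = input.find('%', open + 1);
        if (close == std::string::npos) {         // unmatched '%': copy the rest
            out.append(input, pos, std::string::npos);
            break;
        }
        out.append(input, pos, open - pos);       // text before the reference
        const std::string name = input.substr(open + 1, close - open - 1);
        const char* value = name.empty() ? nullptr : lookup(name);
        if (value) {
            out += value;
            pos = close + 1;
        } else {
            // Undefined: keep "%NAME" and resume scanning at the closing '%'.
            out.append(input, open, close - open);
            pos = close;
        }
    }
    return out;
}

// Convenience overload that reads the process environment.
std::string expandEnvRefs(const std::string& input)
{
    return expandEnvRefs(input, [](const std::string& name) {
        return static_cast<const char*>(std::getenv(name.c_str()));
    });
}
```

Passing the lookup as a parameter is only a convenience for trying the function with a fixed set of variables; the one-argument overload reads the real environment.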

To install SAPI 5.1, you need speechsdk51.exe from http://www.microsoft.com/speech/download/sdk51. To redistribute SAPI 5.1, include the following Microsoft Windows Installer .MSM merge modules in your installation program:

  • sp5.msm, which includes sapi.dll and some other files.
  • sp5intl.msm, which includes the localized resource .dll needed for the Control Panel.
  • sp5ttint.msm, which includes the TTS engine and data files for the English language.
  • spcommon.msm, which includes the files that are common to the TTS and SR engines in your installation.

SAPI 4.0 and SAPI 5.1 can coexist on a single machine. They use different GUIDs, have different names for their DLLs, and register their keys in different registry locations.

Microsoft has also released beta versions of its Speech Application SDK (SASDK) 1.0, which uses SAPI 5.2 as its underlying platform. SASDK is a set of developer tools for adding speech interfaces to ASP.NET web applications (http://www.microsoft.com/speech/download/). It lets you build applications based on the Speech Application Language Tags (SALT) specification, an extension to HTML, using web server controls.

This SAPI 4.0 example uses the CVoiceText class defined in the spchwrap.h file provided with the SDK. The class wraps the COM objects, so you use it instead of working with the COM interfaces directly.

After COM is initialized, you can create an instance of the CVoiceText class and register the application with the TTS engine:

PCVoiceText pTalk = new CVoiceText();
HRESULT hRes = pTalk->Init(L"TypeHereYourApplName");

You can now call the Speak() method and pass it a Unicode string, stringToSpeak, that contains the raw text (convert to LPWSTR if you use CString), flags that set the type and priority of the text, and tags that set the pronunciation:

hRes = pTalk->Speak (stringToSpeak, flags, tags);
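If the text starts out as a narrow string, it has to be widened before it can be handed to Speak(). In a Windows project you would typically use MultiByteToWideChar() or the conversion helpers that come with your string class. The sketch below uses only the standard library so that the step can be tried anywhere; the function name widen is hypothetical, and the conversion follows the current C locale.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>
#include <vector>

// Convert a narrow (multibyte) string to a wide string using the
// current C locale. Throws if the bytes are not valid in that locale.
std::wstring widen(const std::string& narrow)
{
    if (narrow.empty()) {
        return std::wstring();
    }
    // A multibyte string never yields more wide characters than it has
    // bytes, so size() + 1 leaves room for the terminating null.
    std::vector<wchar_t> buffer(narrow.size() + 1);
    const std::size_t written =
        std::mbstowcs(buffer.data(), narrow.c_str(), buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        throw std::runtime_error("text is not valid in the current locale");
    }
    return std::wstring(buffer.data(), written);
}
```

The returned std::wstring owns its buffer, so keep it alive for as long as anything holds the pointer obtained from c_str().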

Calls to Speak are asynchronous and require a message dispatch loop afterward. For example, you can call MessageBox immediately after the call to Speak, because a message box runs its own message loop while it is displayed.

To compile the code that uses CVoiceText class, you need the following .H files:

#include <mmsystem.h>
#include <initguid.h>
#include <spchwrap.h>
#include <atlbase.h>

You also need the release or debug version of spchwrap.lib.

To use SAPI 5.1 with C++, use the #import directive to extract information about the needed interfaces and enumerations from the DLL that contains the type library: #import <c:\...\Speech\sapi.dll>. The compiler creates the sapi.tlh and sapi.tli header files in the output directory. These files reconstruct the type library contents as C++ source code, contain all the necessary definitions, and are included in the project automatically. After that, you can create an object of the smart pointer class: ISpVoicePtr pVoice(__uuidof(SpVoice));. The _COM_SMARTPTR_TYPEDEF macro in sapi.tlh declares ISpVoicePtr as the smart pointer type specialized for the ISpVoice interface and its IID.

Now you can call Speak and pass it a Unicode stringToSpeak string that should be spoken, SpeechVoiceSpeakFlags flags, and the address of a ulong streamNumber variable: pVoice->Speak(stringToSpeak, flags, &streamNumber);

To illustrate how to use SAPI 5.1 with C#, this example incorporates COM code into a managed application. In the Microsoft Development Environment 2003, add a reference to the Microsoft Speech Object Library (5.0); the conversion to a managed wrapper is done automatically. To see the generated wrapper, right-click SpeechLib under the References node in the Solution Explorer window and click View in Object Browser.

In C#, the COM classes are represented as classes with a parameterless constructor: SpeechLib.SpVoice voice = new SpeechLib.SpVoice();. After that, you can explicitly cast the voice reference to the desired ISpVoice interface: SpeechLib.ISpVoice sv = (SpeechLib.ISpVoice) voice;. You can call Speak now:

string s = "textToSpeak";
SpeechLib.SpeechVoiceSpeakFlags f =
    SpeechLib.SpeechVoiceSpeakFlags.SVSFDefault;
uint u;  // receives the stream number
sv.Speak(s, (uint)f, out u);  // call through the ISpVoice interface

The common language runtime marshals the parameters to the COM object and the returned value back to the managed caller.

Synthesized speech isn't perfect, and the results need attention. For example:

  • The choice of a male or female voice should match the application.
  • Names are often expected to be pronounced differently from the way the engine reads them according to its reading rules.
  • The text should be well punctuated, so that the flow of spoken words is natural.
  • Spelling, abbreviations, and similarly spelled words should be tested. Misspelled text sounds goofy.
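One possible way to deal with names and abbreviations, sketched here as an assumption rather than as anything SAPI provides, is to rewrite the text before it is passed to Speak. The names SpokenForms and applySpokenForms are hypothetical, and the table entries you supply are whatever respellings you find by listening to the engine. The sketch is standard C++ and does not touch the speech API.

```cpp
#include <cctype>
#include <map>
#include <string>

// Hypothetical table: written form -> respelling that the engine reads
// the way you want to hear it.
using SpokenForms = std::map<std::string, std::string>;

// Replace whole words found in the table. Everything else, including
// punctuation and spacing, is copied through unchanged. Matching is
// case-sensitive, and a word is a run of letters and digits.
std::string applySpokenForms(const std::string& text, const SpokenForms& table)
{
    std::string out;
    std::string word;
    auto flush = [&]() {
        if (word.empty()) {
            return;
        }
        const auto hit = table.find(word);
        out += (hit != table.end()) ? hit->second : word;
        word.clear();
    };
    for (const char ch : text) {
        if (std::isalnum(static_cast<unsigned char>(ch))) {
            word += ch;      // still inside a word
        } else {
            flush();         // word ended: emit it, then the separator
            out += ch;
        }
    }
    flush();                 // last word, if the text ends with one
    return out;
}
```

Because only whole words are replaced, an entry for "TTS" leaves a longer word such as "TTSX" alone. Keeping the respellings in a table also makes them easy to retest when the engine changes.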

The differences in pronunciation between SAPI 4.0 and 5.1 may cause additional work when you migrate from one SDK to another.

DDJ



Listing One

<html>
<head>
<meta http-equiv="Refresh" content="60">
<!-- MeadCo ScriptX -->
<object id="factory" viewastext  style="display:none"
  classid="clsid:1663ed61-23eb-11d2-b92f-008048fdd814"
  codebase="c:\scriptx\ScriptX.cab#Version=6,1,430,5">
</object>
<script defer>
function onload(){
  if (!factory.object){
    alert("Onload cannot find ScriptX Control.");
    return;
  }
  factory.printing.header = "Test Header";
  factory.printing.footer = "Test Footer";
  PrintThis();
}
function PrintThis(){
  if (!factory.object){
    alert("PrintThis cannot find ScriptX Control.");
    return;
  }
  factory.printing.print(false);
}
</script>
</head>
<frameset name="frameset" rows="*" onload="onload()">
  <frame name="frame_for_web_site" src="http://www.google.com">
</frameset>
</html>


Listing Two
import java.io.*;
import java.net.*;
import java.util.*;

public class AutoPrint{
  public static void main(String arg[]){
    // params for calling printHtml.bat
    String args[] = new String[3];
    args[0] = "c:\\autoprint\\printHtml.bat";
    args[1] = "c:\\autoprint\\testpage.html";
    args[2] = "\\\\printer_server\\a_printer";
    String testSite = "http://www.google.com";
    // the main loop
    while (true) {
      // retrieve the web page
      StringBuffer sb = new StringBuffer(100000);
      try {
        URL url = new URL(testSite);
        InputStream in = url.openStream();
        in = new BufferedInputStream(in);
        Reader reader = new InputStreamReader(in);
        int c;
        while ((c = reader.read()) != -1){
          sb.append((char)c);
        }
        in.close();
      } catch (MalformedURLException e) {
        System.err.println("Malformed: " + e);
      } catch (IOException e) {
        System.err.println("I/O Exception: " + e);
      }
      // save to a file
      File file = new File(args[1]);
      try{
        FileWriter fw = new FileWriter(file);
        fw.write(sb.toString());
        fw.close();
      } catch (Exception e){
        System.out.println(e.toString());
      }
      // is the file readable?
      if (!file.canRead()){
        System.out.println("Cannot read the file.");
        return;
      }
      // call printHtml.bat
      Process p = null;
      try {
        p = Runtime.getRuntime().exec(args);
      } catch (Exception e) {
        System.out.println(e.toString());
      }
      // wait for printHtml.bat to finish (p is null if exec failed)
      if (p != null) {
        try {
          p.waitFor();
        } catch (InterruptedException e) {
          // need to do nothing.
        }
      }
      // make a log
      Date date = new Date();
      System.out.println(date.toString() + " printHtml.bat called.");
      // be ready for next iteration
      try {
        Thread.sleep(600000);
        if(!file.delete()) {
          System.out.println("Cannot delete the file.");
          return;
        }
      } catch (Exception e) {
        System.out.println(e.toString());
      }
    }
  }
}


Listing Three
@ECHO OFF
SETLOCAL
SET File="%1"
SET Printer="%2"
IF NOT DEFINED File GOTO End
IF NOT DEFINED Printer GOTO End
:: Create a temporary Kix file to "press" Print button
> %TEMP%.\%~n0.kix ECHO.; Wait a few seconds for the Print dialog to appear
>>%TEMP%.\%~n0.kix ECHO SLEEP 2
>>%TEMP%.\%~n0.kix ECHO.; Press "Print" (Enter) using SendKeys function
>>%TEMP%.\%~n0.kix ECHO IF SETFOCUS("Print") = 0
>>%TEMP%.\%~n0.kix ECHO   $RC = SENDKEYS("{ENTER}")
>>%TEMP%.\%~n0.kix ECHO ENDIF

:: Actual print command
START RUNDLL32.EXE c:\winnt\system32\MSHTML.DLL,PrintHTML %File% %Printer%
:: Call the temporary Kix file to "press" Print button, then delete it
START /WAIT KIX32.EXE %TEMP%.\%~n0.kix
DEL %TEMP%.\%~n0.kix
:End
ENDLOCAL


Listing Four
/* StaticLibraryRoutineOne.c */
#include <ntddk.h>
#include "StaticLibraryTemplate.h"

VOID
FirstRoutine()
{
    DbgPrint("First routine entered in Static Library\n");
}
/* StaticLibraryRoutineTwo.c */
#include <ntddk.h>
#include "StaticLibraryTemplate.h"

VOID
SecondRoutine()
{
    DbgPrint("Second routine entered in Static Library\n");
}
/* StaticLibraryTemplate.h */
// Function prototypes for the two routines in the static library.

VOID FirstRoutine();
VOID SecondRoutine();

