Speech Research with WAVE-GL

WAVE-GL, short for "Wave Generation Language," is a system designed for generating new sounds from mathematical descriptions. Our authors describe the tools they developed to create and manipulate WAV-format sound files.


November 01, 1996
URL:http://www.drdobbs.com/database/speech-research-with-wave-gl/184409996



Software tools for creating and manipulating sound files

Petra Lutter, Michael Muller-Wernhart, Jurgen Ramharter, Frank Rattay, and Peter Slowik

The authors are researchers at the Technical University of Vienna. They can be contacted at [email protected].


To better understand how humans process speech, our research group at the Technical University of Vienna has taken an unusual approach. Previous attempts to characterize the features of speech signals started with recorded speech samples and used Fourier analysis to identify the most prominent frequencies in the sound. Unfortunately, speech contains short, highly aperiodic structures that this technique doesn't capture.

We've taken the opposite approach, starting with mathematical descriptions of sound waveforms and asking listeners to identify them. By comparing these mathematically defined sounds, we've been able to better understand how humans process speech. Our hypothesis is that the time differences between peaks may be more important to human speech recognition than the frequency spectrum.

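To carry out this research, we faced two practical problems: playing our synthesized signals back with adequate quality, and storing and manipulating them in a convenient form.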

We originally considered building our own digital-to-analog converter, but the custom software it would have required was daunting. We eventually settled on a SoundBlaster card, which provides CD-quality mono and stereo outputs and supports several standard data formats. We also decided to use the widely supported WAV file format to store and manipulate our synthesized sounds. In this article, we'll describe the software tools we developed to easily create and manipulate WAV-format sound files.

WAVE-GL

The system we developed to create new sounds from mathematical descriptions is called "Wave Generation Language" (WAVE-GL for short). WAVE-GL is a two-layer programming system. The inner layer specifies a mathematical function that is used by the outer layer to create a sound file. This outer layer is responsible for selecting the sampling rate, normalizing the output amplitude, and selecting the left or right channel for stereo output. (The complete source code to the WAVE-GL system is available electronically; see "Availability," page 3.)
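To make the division of labor concrete, here is a minimal C++ sketch (not the WAVE-GL source) in which the inner layer is any callable of t and f, and the outer layer scales its result to a 16-bit sample and stores it in the left or right slot of an interleaved stereo buffer. The names InnerFunction, toPcm16, writeSample, and renderFrame are hypothetical, and treating t as the sample index is our assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the inner layer: a formula of time t and parameter f.
using InnerFunction = std::function<double(double t, double f)>;

enum class Channel { Left = 0, Right = 1 };

// Outer-layer duty: map a formula value in [-1, 1] to a 16-bit sample,
// clamping anything outside that range instead of letting it wrap around.
std::int16_t toPcm16(double value) {
    double clamped = std::clamp(value, -1.0, 1.0);
    return static_cast<std::int16_t>(std::lround(clamped * 32767.0));
}

// Outer-layer duty: store one sample in the chosen channel of an
// interleaved stereo buffer laid out as L, R, L, R, ...
void writeSample(std::vector<std::int16_t>& stereo, std::size_t frame,
                 Channel ch, double value) {
    std::size_t index = 2 * frame + static_cast<std::size_t>(ch);
    assert(index < stereo.size());
    stereo[index] = toPcm16(value);
}

// Evaluate the inner formula for one frame (t = frame index, by assumption)
// and hand the result to the outer layer.
void renderFrame(const InnerFunction& inner, std::vector<std::int16_t>& stereo,
                 std::size_t frame, Channel ch, double f) {
    writeSample(stereo, frame, ch, inner(static_cast<double>(frame), f));
}
```

Clamping is a deliberate choice in this sketch: a formula that overshoots clips instead of wrapping around to the opposite extreme.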

Example 1 is a short WAVE-GL program, and Figure 1 is a graph of its output. The program demonstrates the DEFFUNC command, which defines the inner-layer function, and the GENERATE TONE command, which uses that function definition to produce the output .WAV file. Note that the string argument to DEFFUNC is written in a different language; the outer and inner layers of our system are handled by different parsers with different grammars.

The waveform in Figure 1 has more maxima on the positive side than on the negative side. This feature, needed for one of our tests, could not easily have been produced with a more traditional system that simply combines sine waves, and it is why our formula language supports conditionals and other flow-control operators.
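To see how the conditionals in Example 1 create this asymmetry, the following C++ sketch transcribes Example 1's formula (with the amplitude a taken as 1.0) and counts local maxima above zero and local minima below zero over the 400 samples where the signal is nonzero. It assumes, as the listings suggest, that t counts samples and that f is the period of the 220.5-Hz tone in samples (44100/220.5 = 200); the function names are ours. Over those two periods it finds four positive maxima (two in each period's boosted positive half) but only two negative minima.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// C++ transcription of the formula in Example 1, with amplitude a = 1.0.
double example1(double t, double f) {
    const double pi2 = 6.28318530718;
    double b = 0.0;
    if (t < 400.0)
        b = std::sin(pi2 / f * t);                   // basic sine, whole signal
    if (t < 100.0)
        b = b + std::sin(pi2 / f * t * 3.0) / 3.0;   // 3rd harmonic, section 1
    if (t > 199.9 && t < 300.0)
        b = b + std::sin(pi2 / f * t * 3.0) / 3.0;   // 3rd harmonic, section 3
    return b * 0.749;                                // keep b within [-1, 1]
}

struct ExtremaCount {
    int positiveMaxima = 0;
    int negativeMinima = 0;
};

// Count strict local maxima above zero and strict local minima below zero.
ExtremaCount countExtrema(const std::vector<double>& s) {
    ExtremaCount c;
    for (std::size_t i = 1; i + 1 < s.size(); ++i) {
        if (s[i] > s[i - 1] && s[i] > s[i + 1] && s[i] > 0.0) ++c.positiveMaxima;
        if (s[i] < s[i - 1] && s[i] < s[i + 1] && s[i] < 0.0) ++c.negativeMinima;
    }
    return c;
}
```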

Listings One and Two show a more typical example. These files are the mathematical descriptions we developed for the syllable "da": Listing One describes the initial consonant, and Listing Two the vowel that follows it. (The corresponding descriptions for "ga" are available electronically.) From these, we were able to identify a small "chirp" at the beginning that seems to characterize the difference between these consonants. These four descriptions (the two listed here and the two for "ga") are inner-layer mathematical descriptions only; to use them with WAVE-GL, an outer-layer program must include them with DEFFUNC.
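Listing One produces this chirp with a small piece of bookkeeping for each of its three sine components: the frequency multiplier (u1, u2, u3) is updated only when a full cycle (of length l1, l2, l3 samples) has elapsed, so every new cycle starts at phase zero and the waveform has no jumps. The C++ class below is our sketch of that bookkeeping for one component; the class and member names are hypothetical. With Listing One's constants, the components glide from 500, 1580, and 2760 Hz toward 700, 1200, and 2400 Hz by the end of the 0.05-second consonant, which are exactly the frequencies Listing Two uses for the vowel (for example, 1580 * (1 - (380/79) * 0.05) = 1200).

```cpp
#include <cassert>
#include <cmath>

// One chirping sine component, modeled on suml/l/u in Listing One.
// The multiplier u changes only after a full cycle has finished, so each
// new cycle begins at phase zero and the waveform has no phase jumps.
class PeriodChirp {
public:
    // baseHz: starting frequency; ratePerSecond: change of u per second of
    // elapsed time (8.0, -380.0/79.0, and -60.0/23.0 in Listing One).
    PeriodChirp(double baseHz, double ratePerSecond, double sampleRate)
        : period_(sampleRate / baseHz), rate_(ratePerSecond), sam_(sampleRate),
          cycleStart_(0.0), cycleLength_(period_), u_(1.0) {
        assert(baseHz > 0.0 && sampleRate > 0.0);
    }

    // Value at sample index t; call with t = 0, 1, 2, ... in order.
    double next(double t) {
        if (t > cycleStart_ + cycleLength_) {
            cycleStart_ += cycleLength_;              // begin a new cycle
            u_ = 1.0 + rate_ * cycleStart_ / sam_;    // new frequency multiplier
            cycleLength_ = period_ / u_;              // new cycle length
        }
        const double pi2 = 6.28318530718;
        return std::sin(pi2 / period_ * (t - cycleStart_) * u_);
    }

    // Frequency of the cycle currently being generated.
    double currentHz() const { return sam_ / cycleLength_; }

private:
    double period_, rate_, sam_, cycleStart_, cycleLength_, u_;
};
```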

Listing Three is just such a main program. It demonstrates DEFFUNC's ability to read a mathematical description from a separate file (DEFFUNC FROM FILE), and it combines the sounds described above into "da da da ga," imitating the opening of Beethoven's Fifth Symphony.
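Listing Three places each consonant and vowel back to back by hand. A quick way to check such a schedule is to convert the start times and durations to sample indices and confirm that each segment begins where the previous one ends and that the last one ends at the declared file length. The helper below is a hypothetical checker, not part of WAVE-GL; it rounds to whole samples so that floating-point sums like 0.05 + 0.25 still match 0.3.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Segment {
    double startSec;
    double durationSec;
};

long toSamples(double seconds, double sampleRate) {
    return std::lround(seconds * sampleRate);
}

// True if the first segment starts at 0, every later segment starts exactly
// where the previous one ends (in whole samples), and the last one ends at
// the file length.
bool isContiguous(const std::vector<Segment>& segs, double fileSec, double rate) {
    assert(rate > 0.0);
    long expected = 0;
    for (const Segment& s : segs) {
        if (toSamples(s.startSec, rate) != expected) return false;
        expected = toSamples(s.startSec + s.durationSec, rate);
    }
    return expected == toSamples(fileSec, rate);
}
```

With the eight GENERATE TONE statements of Listing Three, each "da" occupies 0.3 second and the final "ga" 0.9 second, so the schedule ends at exactly the 1.8 seconds declared in NEW FILE.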

An Example

To illustrate some of the capabilities of WAVE-GL, we'll explain Example 1 in more detail. This example produces a stereo signal with two tooth-shaped curves on each channel, differing in phase.

The outer layer of Example 1 has only six statements. The initial VERBOSE ON enables tracing of all statements. The NEW FILE statement, which must appear near the beginning of the program, allocates a file to hold the result. The filename is followed by four comma-separated parameters: the mode (MONO or STEREO), the sample rate, the sample resolution in bits, and the total length of the file. WAVE-GL writes data to the file as it is generated, so the length of the output file is limited only by the available disk space.
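The NEW FILE parameters map directly onto the fields of a WAV header. The following sketch builds the canonical 44-byte PCM header (a RIFF chunk containing one "fmt " and one "data" chunk) from them. The function names are hypothetical, and we are assuming, since the article does not say, that WAVE-GL writes this simple layout. Because the declared length fixes the data size in advance, a header like this can be written first and the samples streamed after it.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Append an unsigned integer in little-endian order, as RIFF requires.
void putLE(std::vector<std::uint8_t>& out, std::uint32_t value, int bytes) {
    for (int i = 0; i < bytes; ++i)
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
}

// Append a four-character chunk tag such as "RIFF".
void putTag(std::vector<std::uint8_t>& out, const char* tag) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(tag[i]));
}

// Build a 44-byte PCM WAV header from NEW FILE-style parameters:
// channels (1 = MONO, 2 = STEREO), sample rate, resolution, length.
std::vector<std::uint8_t> wavHeader(std::uint16_t channels, std::uint32_t rate,
                                    std::uint16_t bits, double seconds) {
    assert(bits % 8 == 0);
    const std::uint32_t blockAlign = channels * (bits / 8u);  // bytes per frame
    const std::uint32_t frames =
        static_cast<std::uint32_t>(std::lround(seconds * rate));
    const std::uint32_t dataSize = frames * blockAlign;

    std::vector<std::uint8_t> h;
    putTag(h, "RIFF"); putLE(h, 36 + dataSize, 4); putTag(h, "WAVE");
    putTag(h, "fmt "); putLE(h, 16, 4);   // size of the fmt chunk body
    putLE(h, 1, 2);                        // format 1 = integer PCM
    putLE(h, channels, 2);
    putLE(h, rate, 4);
    putLE(h, rate * blockAlign, 4);        // bytes per second
    putLE(h, blockAlign, 2);
    putLE(h, bits, 2);
    putTag(h, "data"); putLE(h, dataSize, 4);
    return h;
}
```

For Example 1 (STEREO, 44100 HZ, 16 BIT, 0.022 SEC), this gives 970 frames of 4 bytes each, or 3880 bytes of sample data.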

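The third statement in the outer layer of Example 1 is the DEFFUNC statement, which takes a single string argument defining the function. This string is written in a C-like formula language with flow-control extensions such as IF and WHILE. The formula is evaluated repeatedly for different values of the parameters t (the current time) and f (the current frequency). Judging from the listings, both are expressed in samples: Listing Two converts t to seconds with s = t/sam, and expressions such as sin(pi2/f*t) treat f as the length of one period of the basic frequency in samples (44100/220.5 = 200 in Example 1).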

The remaining three statements in the outer layer of Example 1 are the two GENERATE TONE statements (one for each channel) and the END statement. The five arguments to GENERATE TONE are the channel (LEFT or RIGHT, or MONO for a mono file, as in Listing Three); the basic frequency (used to set the f variable); the amplitude (used to scale the result); the starting time of the signal; and its duration. GENERATE TONE works by repeatedly evaluating the function with suitable values for t and f, scaling the result, and storing the resulting data in the output WAV file.
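A GENERATE TONE statement can be pictured as the loop below: convert the start time and duration to sample indices, evaluate the formula once per sample, scale by the amplitude, and store the result in one channel of an interleaved buffer. This is a hypothetical sketch, not WAVE-GL's code. In particular, we assume from the listings that t restarts at 0 for each tone (Listing One reinitializes its state when t < 2) and that f is the period of the basic frequency in samples.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Formula = std::function<double(double t, double f)>;

// Sketch of GENERATE TONE writing into an interleaved buffer of doubles
// with `channels` channels. t counts samples from the start of the tone,
// and f = sampleRate / frequencyHz (both assumptions).
void generateTone(std::vector<double>& buffer, int channels, int channel,
                  double sampleRate, double frequencyHz, double amplitude,
                  double startSec, double durationSec, const Formula& formula) {
    assert(channels > 0 && channel >= 0 && channel < channels);
    const double f = sampleRate / frequencyHz;
    const long first = std::lround(startSec * sampleRate);
    const long count = std::lround(durationSec * sampleRate);
    for (long t = 0; t < count; ++t) {
        const std::size_t index =
            static_cast<std::size_t>(first + t) * channels + channel;
        if (index >= buffer.size()) break;   // stop at the end of the file
        buffer[index] = amplitude * formula(static_cast<double>(t), f);
    }
}
```

With Example 1's values, the right-channel tone would begin at frame lround(0.002 * 44100) = 88 while its own t still starts at 0, which is what shifts its phase relative to the left channel.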

A variety of sounds can be added to the file by repeatedly using DEFFUNC to define a new formula and GENERATE TONE to create a sound using that new formula, as in Listing Three.

About WAVE-GL

WAVE-GL is implemented in C++. The scanners and parsers for the outer layer (WAVEGEN) and the inner formula layer (FUNCGEN) were built with FLEX++ and BISON++, C++ versions of FLEX and BISON, the free implementations of LEX and YACC. These tools generate scanners and parsers as C++ classes.

WAVEGEN and FUNCGEN work quite differently. WAVEGEN executes each statement as soon as it is read. FUNCGEN is a bit more sophisticated: when WAVEGEN executes a DEFFUNC statement, the source enclosed in double quotes is parsed and stored as a parse tree, so the function source need not be reread and reparsed each time the formula is evaluated.
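The benefit of a stored parse tree is that parsing happens once, at DEFFUNC time, and each sample only walks the tree. The following is a minimal sketch of such a tree for arithmetic on t and f; the node types and names are ours, not FUNCGEN's, and a full implementation would also need nodes for variables, assignments, IF, and WHILE.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <utility>

// Minimal expression tree: built once, evaluated once per sample.
struct Node {
    virtual ~Node() = default;
    virtual double eval(double t, double f) const = 0;
};
using NodePtr = std::unique_ptr<Node>;

struct Constant : Node {
    double value;
    explicit Constant(double v) : value(v) {}
    double eval(double, double) const override { return value; }
};

struct Param : Node {   // the built-in parameters t and f
    char name;
    explicit Param(char n) : name(n) { assert(n == 't' || n == 'f'); }
    double eval(double t, double f) const override { return name == 't' ? t : f; }
};

struct Binary : Node {
    char op;   // one of + - * /
    NodePtr lhs, rhs;
    Binary(char o, NodePtr l, NodePtr r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {}
    double eval(double t, double f) const override {
        double a = lhs->eval(t, f), b = rhs->eval(t, f);
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            default:  return a / b;
        }
    }
};

struct Sin : Node {
    NodePtr arg;
    explicit Sin(NodePtr a) : arg(std::move(a)) {}
    double eval(double t, double f) const override { return std::sin(arg->eval(t, f)); }
};
```

For example, sin(pi2/f*t) from Example 1 becomes a Sin node over Binary('*', Binary('/', Constant(pi2), Param('f')), Param('t')), which can then be evaluated for every t without touching the source text again.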

The FUNCGEN object also contains two symbol tables. One is a simple array that holds variables with single-character names. The other symbol table uses a hash structure. Variables with single-character names are called "fast variables." They are not declared and can be preset before the GENERATE TONE statement appears. Other variables, stored in the hash table, must be declared before use.
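The two symbol tables can be sketched as a fixed array indexed by character for the undeclared single-character "fast" variables and a hash map for declared, longer names. The class below is our illustration of that split (all names hypothetical); it assumes fast variables are the lowercase letters a to z, start at zero, and that using an undeclared long name is an error.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

class SymbolTables {
public:
    // Fast variables: one array slot per lowercase letter, no declaration,
    // so a lookup is a single index computation.
    double& fast(char name) {
        assert(name >= 'a' && name <= 'z');
        return fast_[static_cast<std::size_t>(name - 'a')];
    }

    // Corresponds to DECLARE name [= value];
    void declare(const std::string& name, double initial = 0.0) {
        named_[name] = initial;
    }

    // Hash-table lookup for declared variables; undeclared names are rejected.
    double& named(const std::string& name) {
        auto it = named_.find(name);
        if (it == named_.end())
            throw std::runtime_error("undeclared variable: " + name);
        return it->second;
    }

private:
    std::array<double, 26> fast_{};   // zero-initialized
    std::unordered_map<std::string, double> named_;
};
```

Because fast variables live in a fixed array, an outer layer could preset one (such as the amplitude a used in the listings) before evaluation without any declaration step.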

Our next version of WAVE-GL will treat formulas as functions so that complex descriptions can be decomposed; see Example 2. We're also working on a parse-tree optimizer.
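The article does not say what the planned parse-tree optimizer will do. One common optimization it could include is constant folding: the expression 1.6/0.3*x/f in Listing One parses as ((1.6/0.3)*x)/f, so the subtree 1.6/0.3 could be computed once at DEFFUNC time instead of at every sample. The sketch below folds constant subtrees in a small expression tree of our own design; the types and helper names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Tiny expression tree: a number, a variable, or a binary operation.
struct Expr {
    char op = 0;        // '+', '-', '*', '/' for operations; 0 for leaves
    double value = 0.0; // used when the node is a constant
    std::string var;    // non-empty when the node is a variable
    std::unique_ptr<Expr> lhs, rhs;
    bool isConstant() const { return op == 0 && var.empty(); }
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr num(double v) { auto e = std::make_unique<Expr>(); e->value = v; return e; }
ExprPtr variable(const std::string& name) {
    auto e = std::make_unique<Expr>();
    e->var = name;
    return e;
}
ExprPtr bin(char op, ExprPtr l, ExprPtr r) {
    auto e = std::make_unique<Expr>();
    e->op = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Constant folding, bottom-up: an operation whose operands are both
// constants is replaced by a single constant node.
ExprPtr fold(ExprPtr e) {
    if (e->op == 0) return e;
    e->lhs = fold(std::move(e->lhs));
    e->rhs = fold(std::move(e->rhs));
    if (e->lhs->isConstant() && e->rhs->isConstant()) {
        double a = e->lhs->value, b = e->rhs->value;
        switch (e->op) {
            case '+': return num(a + b);
            case '-': return num(a - b);
            case '*': return num(a * b);
            case '/': return num(a / b);
        }
    }
    return e;
}
```

Folding leaves subtrees that depend on x or f untouched, so ((1.6/0.3)*x)/f still divides by f at run time but no longer recomputes 1.6/0.3 for every sample.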

Future Perspectives

The next step in our research is to investigate how much a speech signal can be distorted and remain understandable. These investigations will help us to better understand how the brain decodes sound, and will also provide insight into better ways to compress speech.

To facilitate these studies, we intend to expand our software tools. A planned WAV-TOOLBOX will include a variety of filters, sound editing tools, and a tool to graphically display WAV files.

WAVE-GL's ability to describe arbitrary signals opens the door to a variety of applications. For example, we've used it to reproduce Diana Deutsch's tritone experiment (see Scientific American, October 1992). By basing our system on the portable WAV sound file format and easily obtainable hardware, we hope that this software system will be of use to many researchers.

Example 1: Simple WAVE-GL Program.

VERBOSE ON;
NEW FILE "show.wav"  STEREO, 44100 HZ, 16 BIT, 0.022 SEC;
DEFFUNC "main
 DECLARE pi2 = 6.28318530718;
 DECLARE sam = 44100.0;
 {  /* The generated function consists of four sections of equal length  */
  b = 0.0;                          /* This block generates the basic    */
  if (t < 400.0)                    /* sine wave for the whole           */
    b = sin(pi2/f*t);               /* function (220.5 Hz).              */
  if (t < 100.0)                    /* This block adds a sine with the   */
    b = b + sin(pi2/f*t*3.0)/3.0;   /* 3-fold frequency to the basic     */
                                    /* sine in the first section.        */
  if (t > 199.9)                    /* This block adds a sine with the   */
    if (t < 300.0)                  /* 3-fold frequency to the basic     */
      b = b + sin(pi2/f*t*3.0)/3.0; /* sine in the third section.        */
  b = b*0.749;                      /* b has to be between -1 and 1      */
  RETURN a*b;
 }";
GENERATE TONE LEFT, 220.5 Hz, 1.0, 0.0 SEC, 0.02 SEC;
GENERATE TONE RIGHT, 220.5 Hz, 1.0, 0.002 SEC, 0.02 SEC;
END

Example 2: Treating formulas as functions.

DEFFUNC "f1 { ..... }";
DEFFUNC "f2 { ..... }";
DEFFUNC "f3 { return (f1+f2); }";

Figure 1: Output of Example 1.

Listing One


/* BEETD.WSP */

main

declare pi2 = 6.28318530718;
declare sam = 44100.0;
declare ff1 = 500.0;
declare ff2 = 1580.0;
declare ff3 = 2760.0;
declare suml1;
declare suml2;
declare suml3;
declare l1;
declare l2;
declare l3;
declare u1;
declare u2;
declare u3;
declare ln2;
declare f1;
declare f2;
declare f3;
declare amp;
{
  f1 = sam/ff1;
  f2 = sam/ff2;
  f3 = sam/ff3;
  ln2 = ln(2.0);

  if (t < 2)
  {
    suml1 = 0.0;
    l1 = f1;
    u1 = 1.0;
  }
  if (t > suml1 + l1)
  {
    suml1 = suml1 + l1;
    u1 = 1.0 + 8.0*suml1/sam;
    l1 = f1/u1;
  }
  b = sin(pi2/f1*(t - suml1)*u1);
  /*  print suml1;
      print t;
  */
  if (t < 2)
  {
    suml2 = 0.0;
    l2 = f2;
    u2 = 1.0;
  }
  if (t > suml2 + l2)
  {
    suml2 = suml2 + l2;
    u2 = 1.0 - 380.0/79.0*suml2/sam;
    l2 = f2/u2;
  }
  c = sin(pi2/f2*(t - suml2)*u2);
  if (t < 2)
  {
    suml3 = 0.0;
    l3 = f3;
    u3 = 1.0;
  }
  if (t > suml3 + l3)
  {
    suml3 = suml3 + l3;
    u3 = 1.0 - 60.0/23.0*suml3/sam;
    l3 = f3/u3;
  }
  d = sin(pi2/f3*(t - suml3)*u3);
  if (t < 2.0)
    x = t;
  else
    x = x + 1.0;
  if (x/f > 1.0)
    x = x - f;
  if (x < 0.15*f)
    amp = 0.2 + 1.6/0.3*x/f;
  else
    amp = 9.7/8.5 - 1.6/1.7*x/f;
  /* amp = amp*exp(t/sam*12.5*ln(3.3))/3.3;
     print suml2;
     print suml3;
  */
  RETURN a*amp*(b + c*0.5 + d*0.25)/1.77;
}


Listing Two


/* BEETDA.WSP */

main

declare pi2   = 6.28318530718;
declare aus   = 0.19;
declare sam   = 44100.0;
declare corr1 = 52.735844;
declare corr2 = 23.319477;
declare corr3 = 1.282506;
declare ff1   = 700.0;
declare ff2   = 1200.0;
declare ff3   = 2400.0;
declare f1;
declare f2;
declare f3;
declare amp;
{
  f1 = sam/ff1;
  f2 = sam/ff2;
  f3 = sam/ff3;

  c = sin(pi2/f1*(t + corr1));
  d = sin(pi2/f2*(t + corr2));
  e = sin(pi2/f3*(t + corr3));
  b = (c + d*0.5 + e*0.25)/1.77;

  if (t < 2.0)
    x = f/2.0;
  else
    x = x + 1.0;
  if (x/f > 1.0)
    x = x - f;
  if (x < 0.15*f)
    amp = 0.2 + 1.6/0.3*x/f;
  else
    amp = 9.7/8.5 - 1.6/1.7*x/f;
  s = t/sam;
  u = aus-s;
  if (s > aus)
    amp = amp*exp(39.0*u);
  RETURN a*amp*b;
}


Listing Three


/*  BEET.WSP DA-DA-DA-GA REALIZATION OF BEETHOVEN #5 */

VERBOSE ON;
NEW FILE "beet.wav"  MONO, 44100 HZ, 16 BIT, 1.8 SEC;

DEFFUNC FROM FILE "beetd.wsp";
GENERATE TONE MONO, 150.0 Hz, 1.0, 0.0 SEC,  0.05 SEC;
DEFFUNC FROM FILE "beetda.wsp";
GENERATE TONE MONO, 150.0 Hz, 1.0, 0.05 SEC,  0.25 SEC;

DEFFUNC FROM FILE "beetd.wsp";
GENERATE TONE MONO, 150.0 Hz, 1.0, 0.3 SEC,  0.05 SEC;
DEFFUNC FROM FILE "beetda.wsp";
GENERATE TONE MONO, 150.0 Hz, 1.0, 0.35 SEC,  0.25 SEC;

DEFFUNC FROM FILE "beetd.wsp";
GENERATE TONE MONO, 150.0 Hz, 1.0, 0.6 SEC,  0.05 SEC;
DEFFUNC FROM FILE "beetda.wsp";
GENERATE TONE MONO, 150.0 Hz, 1.0, 0.65 SEC,  0.25 SEC;

DEFFUNC FROM FILE "beetg.wsp";
GENERATE TONE MONO, 120.0 Hz, 1.0, 0.9 SEC,  0.05 SEC;
DEFFUNC FROM FILE "beetga.wsp";
GENERATE TONE MONO, 120.0 Hz, 1.0, 0.95 SEC,  0.85 SEC;

END
