
Sounding Off with the RSX Library


Dr. Dobb's Journal, July 1997

Steve, aka Bull, a senior design engineer for Codesign, can be reached at [email protected].



If you close your eyes and have a friend walk around you snapping their fingers, you can still pinpoint the sound, whether it comes from behind you, above you, or below you. Working with your ears, your brain determines where the sound is coming from because sounds phase shift and attenuate differently depending on the location of the source.

Three-dimensional sound attempts to synthesize this effect in software or hardware. This is analogous to a graphical rendering performed by 3-D graphics libraries.

Intel's Realistic Sound Experience (RSX) library was initially developed to enhance the silent 3-D world of VRML. (The VRML 2.0-compliant RSX 2.0 SDK, which can be downloaded at http://developer.intel.com/ial/rsx/, requires Windows 95/NT, a Pentium or 486DX/33, and a stereo PC sound card. MMX is not required, but it significantly enhances performance.) As the RSX project developed, however, it was expanded to support more demanding applications of 3-D audio, such as 3-D games. These capabilities include HRTF-based (Head-Related Transfer Function) localization, Doppler, reverberation, and real-time streaming.

RSX uses a set of high-level COM-based interfaces for describing a 3-D audio world that follows a model similar to that used by high-level graphical rendering engines. A typical 3-D graphical application describes a scene with objects and defines a view of the scene by specifying a camera. When components in the scene change, the application informs the rendering engine, which renders a new image from the defined camera's perspective.

Similarly, an application describes an audio scene with sound-source objects called "emitters" and defines an audio "point-of-view" by specifying a "listener." When emitters change position or orientation, or when the listener moves, the application renders a new audio "view" from the listener's perspective.

As with graphical objects, you place and orient sounds in 3-D space, and you can adjust other properties, such as the ambient and directional sound regions, to achieve the desired rendering effect. When you finish defining listeners and emitters, the RSX library can begin an audio rendering of the scene. For best results, keep the audio environment consistent with the graphical environment: when objects move, both the graphical and audio rendering engines need updating, and when the view of the scene changes, the camera and listener positions need updating.

Components of 3-D Sound

Several components contribute to the rendering of 3-D sounds. "Spatialization" is the process of placing a sound in 3-D space. It is the audio counterpart to positioning an object, but rather than rendering with perspective and shading cues, audio is rendered so that sounds appear to originate from a specific location in space. RSX implements several localization algorithms, the simplest being a variation of left/right panning. The panning algorithm provides left and right placement of sound, but lacks what most would consider 3-D localization (the ability to simulate left/right, up/down, and front/back cues). To achieve true 3-D localization, RSX uses an HRTF-based algorithm.

"Reverberation" is an effect for simulating acoustic environments, ranging from small chambers to open canyons. This is the audio equivalent of environmental effects such as fog. Like fog rendering, reverberation has a global effect on the entire scene. The RSX reverberator is based on the Schroeder reverb algorithm, which can simulate a wide range of acoustical environments. Some parameters are user adjustable, such as intensity and delay. A minimal level of reverberation greatly improves the perceived spatialization.

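To make the structure concrete, here is a minimal sketch of a Schroeder-style reverberator: several feedback comb filters in parallel, followed by an allpass filter in series, with the wet/dry mix standing in for an intensity control. This is a textbook illustration of the algorithm family, not RSX's actual implementation, whose internals are not published.

#include <vector>
#include <cstddef>

// Feedback comb filter: y[n] = x[n] + g * y[n - D]
class CombFilter
{
public:
    CombFilter(size_t delay, float gain) : m_buf(delay, 0.0f), m_pos(0), m_gain(gain) {}
    float Process(float x)
    {
        float y = x + m_gain * m_buf[m_pos];
        m_buf[m_pos] = y;
        m_pos = (m_pos + 1) % m_buf.size();
        return y;
    }
private:
    std::vector<float> m_buf;
    size_t m_pos;
    float m_gain;
};

// Allpass filter: y[n] = -g*x[n] + x[n-D] + g*y[n-D]; thickens the echo
// density without coloring the overall frequency response.
class AllpassFilter
{
public:
    AllpassFilter(size_t delay, float gain) : m_buf(delay, 0.0f), m_pos(0), m_gain(gain) {}
    float Process(float x)
    {
        float delayed = m_buf[m_pos];
        float y = delayed - m_gain * x;
        m_buf[m_pos] = x + m_gain * y;
        m_pos = (m_pos + 1) % m_buf.size();
        return y;
    }
private:
    std::vector<float> m_buf;
    size_t m_pos;
    float m_gain;
};

// Four parallel combs (use mutually prime delays), one series allpass;
// 'intensity' is the wet mix, analogous to RSX's adjustable parameter.
float SchroederReverb(float dry, CombFilter combs[4], AllpassFilter& allpass,
                      float intensity)
{
    float wet = 0.0f;
    for (int i = 0; i < 4; ++i)
        wet += combs[i].Process(dry);
    wet = allpass.Process(wet * 0.25f);
    return dry + intensity * wet;
}
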
"Doppler" is used to simulate sounds in relative motion. The graphical counterpart to Doppler is motion blur. Rather than blurring an image, an audio signal is "blurred" through pitch shifting. As the velocity of the moving object dictates the degree of motion blur, so does it dictate the degree of pitch shifting. The classic experiment with Doppler involves a horn sounding on a moving train. As the train travels toward you, the sound waves are compressed, effectively increasing the pitch. As the train travels away from you, the sound waves are rarefied, correspondingly decreasing the pitch.

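The underlying arithmetic is the classic Doppler equation. A minimal sketch, assuming a stationary listener (the article does not spell out the exact form RSX uses internally):

// Pitch factor for a source approaching a stationary listener at
// 'approachVelocity' (negative when receding):  f' = f * c / (c - v).
float DopplerPitchFactor(float speedOfSound, float approachVelocity)
{
    float denom = speedOfSound - approachVelocity;
    if (denom <= 0.0f)
        return 1.0f;   // at or beyond the speed of sound: clamp rather than blow up
    return speedOfSound / denom;
}
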
"Pitch shifting" is commonly used to model engines, where the engine's pitch is proportional to the engine RPM. Explicit pitch adjustment is the equivalent of tweaking the color of light sources in the graphical world. The Doppler effect can be viewed as an automatic pitch shifter which translates the relative velocity between an emitter and the listener. RSX implements two pitch-shifting algorithms, a low-quality/low-cost drop-and-dupe algorithm and a high-quality/higher-cost pitch shifter without the artifacts of the low-cost implementation.

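Here is a sketch of what a drop-and-dupe shifter amounts to, assuming it simply resamples by stepping through the input at the pitch ratio; the jumps and repeats this produces are exactly the artifacts the higher-quality shifter avoids.

#include <vector>
#include <cstddef>

// pitch > 1.0 skips (drops) samples, raising the pitch;
// pitch < 1.0 revisits (dupes) samples, lowering it. Assumes pitch > 0.
std::vector<short> DropAndDupe(const short* in, size_t inCount, float pitch)
{
    std::vector<short> out;
    for (float srcPos = 0.0f; (size_t)srcPos < inCount; srcPos += pitch)
        out.push_back(in[(size_t)srcPos]);
    return out;
}
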
Sound Emitter Model

RSX models an emitter in terms of two concentric elliptical regions. These two ellipses describe an ambient region and an attenuation region. The shape of these ellipses can be adjusted to vary the directionality of the emitter. Figure 1 illustrates the elliptical sound model.

The inner ellipse, identified by a minimum front range and minimum back range, defines an ambient region. The ambient region maintains a constant, maximum intensity relative to the given sound. The outer ellipse, identified by the maximum front range and maximum back range, defines the attenuation region. Within this region, audio is localized, and the intensity decreases logarithmically. Beyond the outer ellipse, the intensity is zero.
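
As a sketch of the intensity rule, consider the simplified case of a listener on the emitter's front axis, where the two ellipses reduce to the minimum and maximum front ranges. The exact falloff curve below is an assumption; the article says only that it is logarithmic.

#include <cmath>

float EmitterGain(float dist, float minFront, float maxFront)
{
    if (dist <= minFront)
        return 1.0f;   // ambient region: constant, maximum intensity
    if (dist >= maxFront)
        return 0.0f;   // beyond the outer ellipse: silence
    // Attenuation region: logarithmic falloff from 1 to 0 (assumed shape).
    float t = (dist - minFront) / (maxFront - minFront);
    return 1.0f - (float)log10(1.0 + 9.0 * t);
}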

Audio Streaming

RSX provides interfaces to define two types of listeners and two types of emitters: direct and streaming listeners, and cached and streaming emitters.

A direct listener lets RSX output sound to a standard audio interface. A streaming listener provides access to processed buffers from RSX. Both allow you to define a listener and control its position and orientation in the audio environment.

Cached emitters are file-based sound sources. Streaming emitters require buffers and let your application process real-time data. When you create an emitter, you assign it a specific sound, much as you would assign lighting and texture properties to a polygon. You bring sounds to life by controlling when they start and stop and by adjusting their volume and tone.

Coordinate Systems and Units

RSX supports both right- and left-handed coordinate systems. The application simply informs RSX which format to use. The left-handed coordinate system can be presented in the following orientation with respect to the display: the x-y plane is the screen, with x increasing to the right, y increasing upward, and z increasing into the screen. The right-handed coordinate system uses this same x-y orientation, but specifies z increasing away from the screen.
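
Converting a point between the two conventions is just a sign flip on z, for example:

struct Point3 { float x, y, z; };

// Map a point between the right- and left-handed conventions described
// above; only the direction of the z-axis differs.
Point3 FlipHandedness(Point3 p)
{
    p.z = -p.z;
    return p;
}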

RSX defines all of its audio using an absolute coordinate system, known to many graphics libraries as "world" or "scene" coordinates. There is no support for specifying the position or orientation of one object relative to another object. When working with high-level graphics libraries, coordinates passed to RSX should be in world or scene coordinates; that is, you must specify an audio object's absolute x-, y-, and z-coordinates.

RSX calculates the relative relationships between objects from their positions and attributes, and uses this information for rendering. In this model, distance has no units.

RSX uses the speed of sound for Doppler effects, and all time is specified in seconds. You must therefore supply the speed of sound in generic units per second.

COM Interfaces

The RSX library consists of five COM interfaces and two abstract base classes. The abstract base classes group together the methods that are common to all emitters or all listeners. The emitters or listeners provide the abstract base class functionality through their respective interfaces. Table 1 lists the RSX interfaces.

WaveSpinner Application

While working on the RSX library, I often found myself creating example .WAV files that were spatialized in different ways to demonstrate different effects. I grew tired of trying to slap together an application just to move the sounds around. I decided to write a tool that would accept a script that described the movement and characteristics of sounds in a 3-D world. The tool, called "WaveSpinner," reads such a script, animates and spatializes all the sounds using RSX, and writes the result to a .WAV file. This tool is very useful for creating static sequences such as background noise, missile fly-bys, and other sounds that need to be spatialized, but will not change. The missile fly-by, for example, can be used many times in the course of a game, but will require minimal CPU time, as it does not need to be spatialized at run time. (The complete source code and related files for WaveSpinner are available electronically; see "Availability," page 3.)

Example Script

Listing One is a WaveSpinner script that animates some .WAV files commonly found in the Windows 95 C:\Windows\Media directory. If you have a default installation of Windows 95, these input files are already available.

To simplify parsing of the script file, I adopted a pseudo-Visual Basic scripting syntax. There are four types of objects: listener, environment, output (which are predeclared), and emitter (which you declare in the script). There can be any number of emitters and only one instance of each of the other objects. For each object, you can set certain variables or call a method. Variables are set with an equal sign, and these settings hold throughout the entire processing cycle; if you try to set a variable twice, the last setting is used. Methods, of course, can be called as many times as necessary. The methods closely reflect the actual RSX API, except that they have an additional first parameter that specifies time. This time parameter adds the animation element to the generated .WAV file. For example, if you specify Listing Two, WaveSpinner will smoothly move the sound from the first position to the second in three seconds. The granularity of time is 40 milliseconds, which is about the smallest practical time interval because RSX needs sufficient buffer length to "smooth" one buffer into the next. If the buffer size is too small, the buffers cannot be blended sufficiently, and the result is unwanted pops and clicks in the output.
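
For a feel of the numbers, here is the granularity applied to the Listing Two example (a sketch; the constants mirror the text):

// Moving from (0,0,0) at t = 2.0 s to (4,4,4) at t = 5.0 s spans
// 3.0 s / 0.040 s = 75 rendering steps of 40 ms each.
const float kGranularity = 0.040f;                      // seconds per step
const int   kSteps       = (int)(3.0f / kGranularity);  // 75 steps
const float kPerStep     = 4.0f / kSteps;               // ~0.053 units/axis/step
// (With spline interpolation the per-step motion is smooth, not constant.)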

A script file will usually declare the output settings up front, although they can be anywhere in the file since WaveSpinner collects all settings before generating output. In the example, I create a 30-second .WAV file and write it to c:\noises.wav, with the output spatialized for headphones.

One of the output settings is PeripheralType. The possible values for this setting are Headphones, Speakers, and Transaural. It is important to know what the output type is because the HRTFs and postprocessing of the .WAV file change depending on the type. If the peripheral type is set to Speakers, then the sound is not spatialized using HRTFs, but Doppler, reverb, and attenuation effects still apply. If the output type is Headphones, then HRTFs are used. If the peripheral type is Transaural, then HRTFs will be applied and the output will be fed through a cross-canceling filter, which is the necessary adjustment to get the 3-D effect from two open speakers.

Listing Three summarizes the scripting API. The README.TXT file included with WaveSpinner describes the methods in greater detail, and shows an example of each.

For the environment object, you can set the reverb volume and the speed of sound (which affects Doppler). The listener methods allow you to position and orient the listener. Emitters are declared by specifying the source file; for example, emitter MyEmitter1=c:\tidal.wav. This emitter can then be used in the script. The order of statements in the file is unimportant except in two cases: setting a parameter twice, in which case the second setting is used; and two statements falling inside the same 40-ms time period, in which case both are called (in order).

Code Overview

WaveSpinner uses three main classes: CEmitterObject, CListenerObject, and CEnvironObject. There probably should be an output class as well, and to its nonexistence I can plead no excuse other than laziness and the fact that this code has changed several times before settling into its present form. Anyhow, the existing classes wrap the actual RSX interfaces, and incorporate additional methods to parse commands from the input script file and to animate the sounds as time marches on.

WaveSpinner makes an initial pass through the script file to gather the output settings and build a linked list of declared emitters. It then runs a second pass for environment, listener, and emitter statements.

To parse the script file, the main parsing loop reads a line from the script file, determines which object the line belongs to, and then asks that object to parse the line itself. For example, if an incoming line starts with "listener.Position(...)", the line is handed to the listener object via the CListenerObject::ParseCommand() method. If the listener has a problem parsing it, it merely returns false; the line is then reported as an error to the WaveSpinner main window and ignored.

Likewise, statements prefixed with "environment." or starting with the name of an emitter (determined by looking through a CObList of emitter objects) are passed to the ParseCommand() method of the correct object instance. Internally, the objects examine the information and construct lists (using the CKey objects described later) or tables (for calculating splines).
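
In outline, the dispatch looks something like the following sketch. It assumes the article's class names; ReadScriptLine(), FindEmitter(), and ReportError() are hypothetical helpers standing in for WaveSpinner's actual routines.

CString line;
while (ReadScriptLine(script, line))      // hypothetical line reader
{
    BOOL bOK = TRUE;
    if (line.Find("listener.") == 0)
        bOK = m_Listener.ParseCommand(line);
    else if (line.Find("environment.") == 0)
        bOK = m_Environ.ParseCommand(line);
    else
    {
        // hypothetical lookup that walks the CObList of emitters
        CEmitterObject* pEmitter = FindEmitter(line);
        if (pEmitter != NULL)
            bOK = pEmitter->ParseCommand(line);
    }
    if (!bOK)
        ReportError(line);   // echoed to the main window; the line is skipped
}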

After parsing the script file, WaveSpinner adjusts the RSX peripheral type. RSX relies on a registry setting to know what the peripheral type is, so WaveSpinner saves the old setting from the registry, writes the new value, runs the script, and then restores the old setting.
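
The pattern uses the standard Win32 registry calls. In this sketch the key path and value name are placeholders; the real location is defined by the RSX SDK, not given in the article.

#include <windows.h>

HKEY  hKey;
DWORD dwOld = 0, dwSize = sizeof(dwOld);
if (RegOpenKeyEx(HKEY_CURRENT_USER, "Software\\Intel\\RSX",   // placeholder path
                 0, KEY_READ | KEY_WRITE, &hKey) == ERROR_SUCCESS)
{
    // Remember the current peripheral type.
    RegQueryValueEx(hKey, "PeripheralType", NULL, NULL, (LPBYTE)&dwOld, &dwSize);

    // Write the type requested by the script.
    DWORD dwRequested = 1;   // placeholder value
    RegSetValueEx(hKey, "PeripheralType", 0, REG_DWORD,
                  (LPBYTE)&dwRequested, sizeof(dwRequested));

    // ...run the script and generate the output .WAV file...

    // Restore the original setting.
    RegSetValueEx(hKey, "PeripheralType", 0, REG_DWORD,
                  (LPBYTE)&dwOld, sizeof(dwOld));
    RegCloseKey(hKey);
}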

WaveSpinner then enters the main play loop. During each iteration of the play loop, WaveSpinner calls the SetTime() method of each object. In response to SetTime(), each object refers to its internal keys or tables and makes any changes necessary to the underlying RSX object. Once all the objects have been updated, WaveSpinner requests a buffer from the RSX streaming listener. Inside RSX, the streaming listener gets its input data from the RSX emitters and generates an output buffer. WaveSpinner then takes the buffer and writes it to the output .WAV file.

The play loop then repeats, incrementing the time by 40 ms each iteration. The loop stops when the time exceeds output.TotalPlayTime. WaveSpinner then deletes all RSX objects, closes the output file, and restores the peripheral setting in the registry.
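
Condensed, the loop looks something like this sketch; RequestBuffer() and WriteWave() are hypothetical wrappers around the RSX streaming-listener call and the .WAV writer.

for (float fTime = 0.0f; fTime <= m_fTotalPlayTime; fTime += 0.040f)
{
    // Let every object catch up to the current time.
    POSITION pos = m_EmitterList.GetHeadPosition();
    while (pos != NULL)
        ((CEmitterObject*)m_EmitterList.GetNext(pos))->SetTime(fTime);
    m_Listener.SetTime(fTime);
    m_Environ.SetTime(fTime);

    // Pull 40 ms of rendered audio and append it to the output file.
    BYTE* pBuffer = RequestBuffer();
    WriteWave(pBuffer);
}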

Keys versus Splines

The listener and emitter classes both use CKey and CSplineArray objects; the environment object uses only CKey objects. While the script runs, two types of settings are adjusted as time marches on -- those that need to be interpolated and those that don't. The only settings that need interpolation are position and orientation. Others, such as the emitter model and reverb, can change abruptly without adverse effect and therefore do not need to be interpolated.

Keys are used to perform the correct action at a specific time. Each key only knows how to do one thing -- Run() -- and each has an m_fTime associated with it. The idea is that a large collection of them can be hung in a linked list, then iterated through and executed at the proper time. When WaveSpinner calls SetTime() on the emitter object, for example, the emitter walks through its linked list of keys and calls Run() on every key whose m_fTime value is between the previous time and the current time.
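
A sketch of the scheme (illustrative; the real hierarchy has one CKey subclass per scriptable action):

class CKey
{
public:
    float m_fTime;              // when this key should fire
    virtual void Run() = 0;     // perform the action (e.g., call ControlMedia)
    virtual ~CKey() {}
};

// Inside SetTime(): fire every key in (previous time, current time].
POSITION pos = m_KeyList.GetHeadPosition();
while (pos != NULL)
{
    CKey* pKey = (CKey*)m_KeyList.GetNext(pos);
    if (pKey->m_fTime > m_fPrevTime && pKey->m_fTime <= fTime)
        pKey->Run();
}
m_fPrevTime = fTime;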

I use the CSplineArray class to handle interpolation. To refresh your memory, a spline is an algorithm that allows you to smoothly interpolate a curve between a set of points. The CSplineArray class encapsulates a one-dimensional spline, and you can combine these to handle splines of higher dimensions. I have thoughtfully provided a 3DSpline class that contains -- you guessed it -- three instances of the CSplineArray class. The 3-D spline is used for position and orientation of emitters. Orientation of the listener uses a 6-D spline set, since a listener requires a 3-D up direction as well as a 3-D forward direction in order to fully specify orientation. Combining the dimensions this way is possible because you can interpolate a point in n-dimensional space by interpolating each of the dimensions independently (I don't know if this is a theoretically correct statement, but it does seem to work in practice).
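
The observation is in fact sound: interpolating each coordinate independently against a common parameter is the standard construction of a parametric curve. A sketch of the composition follows; AddPoint() and Evaluate() are illustrative method names.

class C3DSpline
{
public:
    void AddPoint(float t, float x, float y, float z)
    {
        m_x.AddPoint(t, x); m_y.AddPoint(t, y); m_z.AddPoint(t, z);
    }
    void Evaluate(float t, float& x, float& y, float& z)
    {
        x = m_x.Evaluate(t); y = m_y.Evaluate(t); z = m_z.Evaluate(t);
    }
private:
    CSplineArray m_x, m_y, m_z;   // one independent 1-D spline per axis
};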

I appropriated (stole) the spline algorithm from Numerical Recipes in C, Second Edition, by William H. Press and others (Cambridge University Press, 1996). I chose this implementation because it is a 2-D spline, as I will be feeding it several 2-element values, namely time and a value. Because the times will not be on nice integer boundaries, I needed a spline that could handle several widely scattered points. For example, the user may give me points like (0.0, 2.0), (0.25, 1.0), and (3.75, 0.0). With the spline algorithm I chose, I can hand in an arbitrary time and it will magically figure out what the output value should be.

Upon examining the spline code, you will see that it closely follows the code in the book. I have added code to handle some special cases and boundary conditions. Note that before the spline can be used, some tables need to be set up that contain all the input points. The information for these tables was gleaned from the information passed in through the ParseCommand() method.
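
For reference, here is a compact natural cubic spline in the spirit of the Numerical Recipes spline/splint pair: Setup() computes the second-derivative table once from the (time, value) points, and Eval() then interpolates any time. This is a sketch, not the book's code; knots must be sorted, distinct, and at least two in number.

#include <vector>

struct Spline1D
{
    std::vector<float> t, v, d2;   // knots (times), values, 2nd derivatives

    void Setup()                   // natural boundary conditions: d2 = 0 at ends
    {
        size_t n = t.size();
        d2.assign(n, 0.0f);
        std::vector<float> u(n, 0.0f);
        for (size_t i = 1; i + 1 < n; ++i)     // forward sweep of tridiagonal solve
        {
            float sig = (t[i] - t[i-1]) / (t[i+1] - t[i-1]);
            float p   = sig * d2[i-1] + 2.0f;
            d2[i] = (sig - 1.0f) / p;
            u[i]  = (v[i+1] - v[i]) / (t[i+1] - t[i])
                  - (v[i] - v[i-1]) / (t[i] - t[i-1]);
            u[i]  = (6.0f * u[i] / (t[i+1] - t[i-1]) - sig * u[i-1]) / p;
        }
        for (size_t k = n - 1; k-- > 0; )      // back substitution
            d2[k] = d2[k] * d2[k+1] + u[k];
    }

    float Eval(float x) const                  // interpolate an arbitrary time
    {
        size_t lo = 0, hi = t.size() - 1;
        while (hi - lo > 1)                    // binary search for the bracket
        {
            size_t mid = (lo + hi) / 2;
            if (t[mid] > x) hi = mid; else lo = mid;
        }
        float h = t[hi] - t[lo];
        float a = (t[hi] - x) / h, b = (x - t[lo]) / h;
        return a * v[lo] + b * v[hi]
             + ((a*a*a - a) * d2[lo] + (b*b*b - b) * d2[hi]) * h * h / 6.0f;
    }
};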

Final Notes

Note that to build WaveSpinner, you need to have RSX installed (available at http://www.intel.com/ial/rsx/). A good addition to WaveSpinner would be a GUI that lets you drag and drop sounds into position and plot their animation sequences. Writing the script by hand is quicker than writing a program to animate the sounds, but it still becomes tedious. Another enhancement would be animation loops. Currently, if you want a sound to rotate every five seconds over a period of 30 seconds, you must repeat the orientation sequence six times in the script; it would be easier to enter it once and have WaveSpinner take care of the rest.


Listing One

// Set up output
output.FileName=c:\noises.wav
output.ChannelCount=2
output.SamplesPerSec=44100
output.BitsPerSample=16
output.TotalPlayTime=30.0f
output.PeripheralType=Headphones
environment.Reverb(0.0f, TRUE, 1.0f, 0.01f)
environment.SpeedOfSound(0.0f, 525.0f)


emitter n1=c:\windows\media\Robotz Windows Start.wav
emitter n2=c:\windows\media\Utopia Recycle.wav
emitter n3=c:\windows\media\Utopia Windows Start.wav
emitter n4=c:\windows\media\Utopia Windows Exit.wav


n1.Model(0.0f, 1.0f,  30.0f, 1.0f,  30.0f, 1.0f)
n2.Model(0.0f, 1.0f,  22.0f, 1.0f,  22.0f, 1.0f)
n3.Model(0.0f, 1.0f,  85.0f, 1.0f,  85.0f, 1.0f)
n4.Model(0.0f, 1.0f, 100.0f, 1.0f, 100.0f, 1.0f)


n1.ControlMedia(0.0f, RSX_PLAY, 0, 0.0f)
n2.ControlMedia(0.0f, RSX_PLAY, 0, 0.0f)
n3.ControlMedia(0.0f, RSX_PLAY, 0, 0.0f)
n4.ControlMedia(0.0f, RSX_PLAY, 0, 0.0f)


// Initial positions -------------------------------------
n1.Position(    0.0f,  50.0f, 0.0f, 0.0f)
n2.Position(    0.0f, 100.0f, 0.0f, 0.0f)


n4.Position( 0.0f,   0.0f, 5.0f,  25.0f)
n4.Position( 3.0f, 150.0f, 5.0f,  25.0f)
n4.Position( 4.0f, 150.0f, 5.0f, -25.0f)
n4.Position( 7.0f,   0.0f, 5.0f, -25.0f)
n4.Position( 8.0f,   0.0f, 5.0f,  25.0f)
n4.Position(11.0f, 150.0f, 5.0f,  25.0f)
n4.Position(12.0f, 150.0f, 5.0f, -25.0f)
n4.Position(15.0f,   0.0f, 5.0f, -25.0f)
n4.Position(16.0f,   0.0f, 5.0f,  25.0f)
environment.Reverb(17.0f, TRUE, 1.0f, 0.1f)
n4.Position(19.0f, 150.0f, 5.0f,  25.0f)
n4.Position(20.0f, 150.0f, 5.0f, -25.0f)
n4.Position(23.0f,   0.0f, 5.0f, -25.0f)
n4.Position(24.0f,   0.0f, 5.0f,  25.0f)
n4.Position(27.0f, 150.0f, 5.0f,  25.0f)
n4.Position(28.0f, 150.0f, 5.0f, -25.0f)


n3.Position( 0.0f,   0.0f, 0.0f,  25.0f)
n3.Position( 5.0f, 150.0f, 0.0f,  25.0f)
n3.Position( 6.0f, 150.0f, 0.0f, -25.0f)
n3.Position(11.0f,   0.0f, 0.0f, -25.0f)


n3.Position(12.0f,   0.0f, 0.0f,  25.0f)
n3.Position(17.0f, 150.0f, 0.0f,  25.0f)
n3.Position(18.0f, 150.0f, 0.0f, -25.0f)


n3.Position(23.0f,   0.0f, 0.0f, -25.0f)
n3.Position(24.0f,   0.0f, 0.0f,  25.0f)
n3.Position(29.0f, 150.0f, 0.0f,  25.0f)


// Start-em up!---------------------------------------------


listener.Orientation(0.0f, 1.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f)


// Move the listener around
listener.Position(0.0f, 10.0f, 5.0f, 0.0f)
listener.Position(2.0f, 40.0f, 5.0f, 0.0f)


// Circle once
listener.Position(4.0f, 50.0f, 5.0f, -10.0f)
listener.Position(6.0f, 60.0f, 5.0f, 0.0f)
listener.Position(8.0f, 50.0f, 5.0f, 10.0f)
listener.Position(10.0f, 40.0f, 5.0f, 0.0f)


// Circle twice
listener.Position(12.0f, 50.0f, 5.0f, -10.0f)
listener.Position(14.0f, 60.0f, 5.0f, 0.0f)
listener.Position(16.0f, 50.0f, 5.0f, 10.0f)
listener.Position(18.0f, 40.0f, 5.0f, 0.0f)


// Circle thrice
listener.Position(20.0f, 50.0f, 5.0f, -10.0f)
listener.Position(22.0f, 60.0f, 5.0f, 0.0f)
listener.Position(24.0f, 50.0f, 5.0f, 10.0f)
listener.Position(26.0f, 40.0f, 5.0f, 0.0f)


listener.Position(28.0f, 90.0f, 0.0f, 0.0f)
listener.Orientation(27.0f, 1.0f, 0.0f, 0.0f, 0.0f, 1.0f, 0.0f)
listener.Orientation(28.0f, 0.0f, 0.0f, 1.0f, 0.0f, 1.0f, 0.0f)


Listing Two

MyEmitter1.Position(2.0f, 0.0f, 0.0f, 0.0f)
MyEmitter1.Position(5.0f, 4.0f, 4.0f, 4.0f)


Listing Three

// Output Settings
output.FileName=c:\mywave.wav
output.ChannelCount=2
output.SamplesPerSec=22050
output.BitsPerSample=16
output.TotalPlayTime=30.0f
output.PeripheralType=Headphones 


// RSX environment
environment.Reverb(time, true/false, delay, intensity) 
environment.SpeedOfSound(time, speed) 


// Listener
listener.Position(time, x, y, z) 
listener.Orientation(time, x, y, z, xUp, yUp, zUp) 


// Emitters
emitter Emitter0=c:\sounds\bird.wav


Emitter0.Position(time, x, y, z)
Emitter0.Orientation(time, x, y, z) 
Emitter0.Pitch(time, pitch) 
Emitter0.Model(time, MaxFront, MinFront, MaxBack, MinBack, Intensity) 
Emitter0.ControlMedia(time, action, loops, startPos) 
Emitter0.SetMuteState(time, true/false) 
Emitter0.SetMarkPosition(time, startpos, endpos) 


DDJ


Copyright © 1997, Dr. Dobb's Journal

