Dr. Dobb's is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them. Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.



Feb02: C Programming



Al is DDJ's senior contributing editor. He can be contacted at [email protected].


Over the past several months, I've described a C++ project that involves playback of music on a PC. The project is not just another media player. It has some special requirements related to mixing and merging audio tracks in real time. Earlier columns contain the details of the project itself. This month, I'll address one part of it — waveform playback and recording on the Win32 platform.

Up until now, my project used homegrown and imported Winamp plug-ins to experiment with various file formats and playback techniques. I discussed that approach last month. The time came, however, when I had to insert audio playback into my own application. I toyed with the notion of using existing plug-ins, but since I need recording as well as playback, I decided to bite the bullet and learn how waveform processing works with Win32.

First some basics: A digital waveform is represented in memory as a sequence of numbers. Each number is a sample. Each sample represents the relative amplitude of the waveform at a specific point in time. The more samples per second, the more accurately the digital waveform represents the original analog waveform. Each sample is represented by an integer. (In Win32 PCM data, 16-bit samples are signed, and 8-bit samples are unsigned with 128 representing silence.) The more bits in the integer, the more accurately the digital waveform represents the dynamic range of the original waveform signal. Multiple-channel waveforms typically store the samples for each channel in adjacent integers. So, for example, a stereo signal has a left-channel sample followed by a right-channel sample to represent the two signals' amplitudes at the time of that sample. Consequently, three data values must accompany a sequence of samples for it to be correctly played through an audio system: the number of samples per second; the number of bits per sample (typically 8 or 16), also called the "sample resolution"; and the number of channels.
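To make the stereo layout concrete, here is a small sketch that interleaves separate left and right channels into one buffer of 16-bit samples and computes the data rate implied by the three format values. The function names are hypothetical helpers for illustration, not part of the Win32 API or of the column's classes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleave separate left and right channels into the L, R, L, R, ...
// order that a stereo PCM buffer uses. Extra samples in the longer
// channel are ignored.
std::vector<std::int16_t> Interleave(const std::vector<std::int16_t>& left,
                                     const std::vector<std::int16_t>& right)
{
    std::size_t frames = left.size() < right.size() ? left.size() : right.size();
    std::vector<std::int16_t> out;
    out.reserve(frames * 2);
    for (std::size_t i = 0; i < frames; ++i) {
        out.push_back(left[i]);   // left-channel sample at time i
        out.push_back(right[i]);  // right-channel sample at the same instant
    }
    return out;
}

// Bytes of sample data per second implied by the three format values.
unsigned long BytesPerSecond(unsigned long samplesPerSecond,
                             unsigned bitsPerSample, unsigned channels)
{
    return samplesPerSecond * (bitsPerSample / 8) * channels;
}
```

For CD-quality sound, BytesPerSecond(44100, 16, 2) works out to 176,400 bytes per second, which gives a feel for how quickly the playback buffers drain.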

If you need to know any more than that, including graphs and pictures that make it clearer, I recommend A Programmer's Guide To Sound, by Tim Kientzle (Addison-Wesley, 1998). I keep plugging Tim's book. I like it. I wish some columnist liked one of my books as much as I like Tim's.

There's more to know if you are dealing with files of waveforms. There are many data formats and compression algorithms for representing waveforms in data files. I'm not discussing these issues this month; Tim's book is an excellent resource for them. My project uses a raw pulse-code modulation (PCM) format, which is simply the samples recorded in the order in which they are to be played, with fixed values for sample rate, sample resolution, and number of channels. File formats are not important to this work at this time.

Several hours poring through the Win32 programming books in my library turned up only two that explain waveform processing. One is Tim's book and the other is the venerable Programming Windows, Fifth Edition, by Charles Petzold (Microsoft Press, 1999). Petzold shows how to write C programs that record and play back waveforms. Tim presents a complete waveform playback program in C++ that integrates most file formats and thoroughly queries the installed audio system to find a device that matches the requirements of the current waveform. Neither of these solutions is quite what I needed, but both were instrumental in getting me started toward my goal, which is to use C++ classes that encapsulate waveform recording and playback.

By looking at Charles's and Tim's code, I learned how to apply the Win32 wave API. After that, the Win32 API reference documentation itself was all I needed.

The code involved is surprisingly small, but it drives a complex mechanism. By building this mechanism into C++ classes, I can enjoy its benefits long after I have forgotten its details. Other programmers can use the classes without ever needing to understand the details. This is what encapsulation is supposed to deliver, after all.

These classes let an application integrate waveform processing. The Win32 API includes high- and low-level waveform functions from which you choose depending on your requirements. If your program records and plays back only standard .WAV files, the high-level Media Control Interface (MCI) functions are your best bet. I used them in a simple voice recorder/playback application called "Storch" (DDJ, September 1999). For more complex audio processing, the low-level audio services are better. That's what this project uses.

The Wave Class

Playback and recording have several things in common, which are encapsulated in a common abstract base class named Wave, defined and implemented in wave.h and wave.cpp (available electronically; see "Resource Center," page 5). The Wave class encapsulates the three values that describe how to record or play back a waveform. It contains two data buffers for overlapped wave data processing, an integer that defines the length of the buffers in bytes, two WAVEHDR structures that the API uses to describe the two buffers, and a WAVEFORMATEX structure that the playback and recording APIs use to describe the waveform's characteristics.

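I use std::auto_ptr<char> template objects to contain the buffers, which are dynamically allocated. This usage permits the class member functions to throw exceptions without worrying about undeleted resources. (Strictly speaking, auto_ptr releases its object with delete, not delete[], so using it to own an array allocated with new[] is undefined behavior. A std::vector<char> gives the same exception safety without that mismatch.)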

The only Wave class member function is its constructor, which initializes the waveform-defining data members and the API structures and allocates memory for two buffers of samples. The constructor's samples parameter is a count of the number of samples to be stored in each buffer. (If you make this number too small, playback makes funny noises. 4096 seems to work well.) A sample might be 1 or 2 bytes, depending on the b parameter, which is the number of bits per sample, typically 8 or 16. A logical sample might occupy one or two sample positions in the buffer depending on the nc parameter, which is the number of channels in the waveform. The constructor computes the buflen data member value from arguments passed in these parameters by derived classes. The rate parameter defines the number of samples per second that are recorded to or played back from the waveform.
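The following sketch shows one way a Wave-style constructor could derive the buffer length and format values from its four arguments. It is not the column's wave.cpp: PcmFormat is a hypothetical portable stand-in for the PCM fields of WAVEFORMATEX (the real structure also carries wFormatTag and cbSize), buflen is computed on my reading of the text as samples times bytes per sample times channels, and std::unique_ptr<char[]> (modern C++) replaces auto_ptr because it releases arrays with delete[].

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>

// Portable stand-in for the WAVEFORMATEX fields that describe PCM data.
// (Hypothetical; the real structure is declared in the Win32 headers.)
struct PcmFormat {
    unsigned short channels;
    unsigned long  samplesPerSec;
    unsigned long  avgBytesPerSec;
    unsigned short blockAlign;      // bytes per logical sample, all channels
    unsigned short bitsPerSample;
};

// Sketch of a Wave-style base class. The constructor and destructor are
// protected, so only derived classes can create or destroy one.
class WaveBase {
protected:
    WaveBase(std::size_t samples, unsigned short bits,
             unsigned short channels, unsigned long rate)
    {
        if (bits != 8 && bits != 16)
            throw std::runtime_error("unsupported sample resolution");
        if (channels != 1 && channels != 2)
            throw std::runtime_error("unsupported channel count");
        format.channels = channels;
        format.samplesPerSec = rate;
        format.bitsPerSample = bits;
        format.blockAlign = static_cast<unsigned short>(channels * bits / 8);
        format.avgBytesPerSec = rate * format.blockAlign;
        // Each buffer holds 'samples' logical samples across all channels.
        buflen = samples * format.blockAlign;
        // Fully constructed members are destroyed if anything later throws,
        // so the buffers cannot leak.
        buffer1.reset(new char[buflen]);
        buffer2.reset(new char[buflen]);
    }
    ~WaveBase() {}

    PcmFormat format;
    std::size_t buflen;
    std::unique_ptr<char[]> buffer1;
    std::unique_ptr<char[]> buffer2;
};
```

With the 4096-sample buffers mentioned above and CD-quality settings, each buffer comes to 4096 × 4 = 16,384 bytes.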

The Wave class is an abstract base class: its constructor and (empty) destructor are protected, so only derived classes can create and destroy Wave objects.

Program Notes

Observe that in the Wave class auto_ptr and runtime_error are not declared in the std:: namespace. For some reason, Visual C++ does not put a lot of Standard C++ stuff in std:: by default. I rummaged through the VC++ headers and concluded that you can probably enable the feature by #defining something, but I wonder why they don't set the standard way as the default instead of the other way around. The VC++ headers come from HP and SGI, where STL was created and where a widely used version of these standard library classes and functions is maintained. The headers make extensive and effective use of compile-time conditional #define statements to fit one source-code resource to the parameters of many compilers. That warms my heart. I've always been a fan of the preprocessor and am always put off when C++ gurus denigrate it and declare it no longer necessary. Yet these #define statements keep showing up in code from respected sources, code that has to live in the real world, not just in the rarefied laboratories of gurus.

A word about how I use exceptions in these classes. A more conventional implementation would derive exception classes from the standard ones and throw objects of them. I simply throw objects of std::runtime_error and include a text argument that the catcher can display by using the std::exception::what() member function. If I were going to use this library in a larger application where many kinds of exceptions needed to be caught, I'd probably build distinct exception classes for this library to throw.
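The throw-and-catch pattern described above looks like this in miniature. OpenDevice is a hypothetical stand-in for a constructor step that fails; a flag simulates the failure rather than calling the real Win32 function.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a device-opening step that can fail.
void OpenDevice(bool deviceAvailable)
{
    if (!deviceAvailable)
        throw std::runtime_error("Cannot open waveform output device");
}

// The catcher needs no knowledge of the library's internals: what()
// returns the text supplied when the exception was thrown.
std::string TryOpen(bool deviceAvailable)
{
    try {
        OpenDevice(deviceAvailable);
        return "opened";
    }
    catch (const std::exception& e) {
        return e.what();
    }
}
```

Catching std::exception by const reference picks up runtime_error and any other standard exception, which is what makes the single-class approach workable in a small program.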

The WavePlayer Class

Waveplayer.h and waveplayer.cpp (available electronically) define and implement the WavePlayer class, which is derived from Wave to implement waveform playback with the Win32 API. An application derives a class from WavePlayer and instantiates an object of the derived class to play back waveforms on the PC's audio system.

WavePlayer includes an HWAVEOUT data member, which is the Win32 handle to the open waveform output device. It also includes data members that keep track of the object's current playback mode.

WavePlayer is an abstract base class. The derived class provides an implementation of the FillBuffer pure virtual function, which supplies buffers of samples for playback. WavePlayer itself provides Play, Pause, Resume, and Stop public member functions that the application calls to perform those operations. Once playback begins, the derived class's FillBuffer implementation must be prepared to put samples in the buffer whenever it is called. FillBuffer inserts into the buffer pointed to by its first parameter up to the number of samples specified by its second parameter, and returns the number of samples inserted.
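Here is a portable sketch of the FillBuffer contract. The parameter types (short* and int) are assumptions, since the text specifies only a buffer pointer, a maximum sample count, and a returned count; PlayerSketch and VectorPlayer are hypothetical names, and the public Pull wrapper exists only so the sketch can be exercised without the Win32 machinery.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Portable stand-in for the abstract playback class.
class PlayerSketch {
public:
    virtual ~PlayerSketch() {}
protected:
    // Put up to maxSamples samples into buffer; return how many were put.
    virtual int FillBuffer(short* buffer, int maxSamples) = 0;
};

// Hypothetical derived class that plays samples held in memory.
class VectorPlayer : public PlayerSketch {
public:
    explicit VectorPlayer(std::vector<short> samples)
        : samples_(std::move(samples)), pos_(0) {}
    // Lets a test or simulated driver invoke the protected override.
    int Pull(short* buffer, int maxSamples) { return FillBuffer(buffer, maxSamples); }
protected:
    int FillBuffer(short* buffer, int maxSamples) override
    {
        std::size_t remaining = samples_.size() - pos_;
        std::size_t wanted = maxSamples > 0 ? static_cast<std::size_t>(maxSamples) : 0;
        std::size_t n = std::min(remaining, wanted);
        auto first = samples_.begin() + static_cast<std::ptrdiff_t>(pos_);
        std::copy(first, first + static_cast<std::ptrdiff_t>(n), buffer);
        pos_ += n;
        return static_cast<int>(n);  // fewer than requested near the end
    }
private:
    std::vector<short> samples_;
    std::size_t pos_;
};
```

The derived class owns the source of samples; the base class only decides when to ask for more, which is the division of labor the column describes.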

An application can implement digital signal processing on the buffer of samples by overriding WavePlayer::DSP, which has the same parameters and return value as WavePlayer::FillBuffer but may return up to twice the number of samples specified in the second parameter. WavePlayer provides an empty DSP function implementation.
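As an illustration of the kind of processing a DSP override might do, this hypothetical gain stage scales 16-bit samples in place and clips them to the legal range. To keep the sketch short it is a free function with an extra gain parameter; in a class derived from WavePlayer, the override would take FillBuffer's two parameters and could read the gain from a data member.

```cpp
// Hypothetical DSP step: scale 'count' 16-bit samples in place by 'gain',
// clipping to the 16-bit range, and return the number of samples the
// buffer now holds.
int GainDSP(short* buffer, int count, double gain)
{
    for (int i = 0; i < count; ++i) {
        double v = buffer[i] * gain;
        if (v > 32767.0)  v = 32767.0;   // clip rather than wrap around
        if (v < -32768.0) v = -32768.0;
        buffer[i] = static_cast<short>(v);
    }
    return count;  // a gain stage neither adds nor removes samples
}
```

Clipping matters because a plain cast of an out-of-range value would wrap around and produce a loud click instead of mild distortion.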

The WavePlayer class's constructor depends on the API's waveOutOpen function to find an audio playback device that matches the characteristics of the waveform. If it can't find one or can't initialize the one it finds (perhaps another application has the device), the constructor throws an exception. Most contemporary PCs have sound cards that support CD-quality waveforms (two channels, 44,100 samples per second, 16 bits per sample). Some older sound cards, particularly the 8-bit cards of yore, can't do this. If you try to instantiate an object of a class derived from WavePlayer for CD-quality sound on a PC with an older sound card, the WavePlayer constructor throws an exception. Your program should catch that exception and instantiate the object with lower quality values, which means your program must convert the samples before passing them for playback via the FillBuffer function. Converting 16-bit samples to 8-bit samples means keeping the high-order byte of each sample (in effect, dividing by 256) and, because 8-bit PCM samples are unsigned, adding 128 to the result.
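A sketch of that conversion follows. To13 stay clear of shifting negative values, it adds 32768 first, mapping the signed range onto 0 through 65535, and then shifts right by 8; the result is the same as dividing by 256 and adding 128. The function name is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Convert signed 16-bit PCM samples to unsigned 8-bit PCM samples.
// Adding 32768 maps -32768..32767 onto 0..65535; shifting right by 8
// keeps the high-order byte, giving 0..255 with 128 as silence.
std::vector<std::uint8_t> To8Bit(const std::vector<std::int16_t>& in)
{
    std::vector<std::uint8_t> out;
    out.reserve(in.size());
    for (std::int16_t s : in)
        out.push_back(static_cast<std::uint8_t>((s + 32768) >> 8));
    return out;
}
```

If the fallback format also drops from stereo to mono, the program must combine each left/right pair as well; this sketch handles only the sample resolution.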

The WavePlayer constructor passes to waveOutOpen the address of a callback function to be called from the API whenever a buffer has been fully written to the audio system. The callback function, WavePlayer::waveOutProc, is a static member function of the WavePlayer class.
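A static member function can serve as the callback because it has no hidden this parameter, so its address fits a plain C function-pointer type. A common idiom, sketched below on my own assumption rather than taken from the column's source, is to pass the object's address as the callback's instance value and cast it back inside the static function. In the real API, the callback also receives the HWAVEOUT handle and the message, and for WOM_DONE its dwParam1 argument points to the finished WAVEHDR; FakeDevice and DoneCallback here are hypothetical portable stand-ins.

```cpp
#include <cstdint>

// Hypothetical C-style callback type: an opaque instance value plus the
// index of the buffer that finished.
typedef void (*DoneCallback)(std::uintptr_t instance, int bufferIndex);

// Stand-in for the audio system, which stores the callback and the
// instance value it was given when the device was opened.
struct FakeDevice {
    DoneCallback callback;
    std::uintptr_t instance;
    void FinishBuffer(int index) { callback(instance, index); }
};

class Player {
public:
    explicit Player(FakeDevice& dev) : completed_(0)
    {
        // Register the static trampoline and pass 'this' as the instance value.
        dev.callback = &Player::DoneProc;
        dev.instance = reinterpret_cast<std::uintptr_t>(this);
    }
    int Completed() const { return completed_; }
private:
    // Recover the object from the instance value, then forward the event.
    static void DoneProc(std::uintptr_t instance, int bufferIndex)
    {
        Player* self = reinterpret_cast<Player*>(instance);
        self->OnBufferDone(bufferIndex);
    }
    void OnBufferDone(int /*bufferIndex*/) { ++completed_; }
    int completed_;
};
```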

When the application calls WavePlayer::Play(), the function makes two calls to the class's private FillBufferAndPlay function, passing the addresses of the two WAVEHDR structures. FillBufferAndPlay calls the derived class's implementations of FillBuffer and DSP, updates the WAVEHDR structure, and calls the API's waveOutWrite function.

When the API calls WavePlayer::waveOutProc, the callback function intercepts the WOM_DONE message and calls FillBufferAndPlay for whichever buffer has completed its playback.
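The two-buffer cycle is easier to see in a simulation. In this portable sketch, Header stands in for WAVEHDR, a queue stands in for the device's list of pending buffers, and popping the front of the queue plays the part of WOM_DONE; all of the names are hypothetical. One caution for a real implementation: Microsoft's documentation for waveOutProc warns that calling other waveform functions from inside the callback can deadlock, so a more cautious design signals a worker thread (for example, with SetEvent) and does the refill and waveOutWrite there.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <vector>

// Stand-in for WAVEHDR: storage plus the number of samples in use.
struct Header {
    std::vector<short> data;
    std::size_t used = 0;
};

class DoubleBufferSim {
public:
    DoubleBufferSim(const std::vector<short>& source, std::size_t bufSamples)
        : source_(source), pos_(0)
    {
        a_.data.resize(bufSamples);
        b_.data.resize(bufSamples);
    }
    // Like Play(): prime both buffers, then let the "device" run.
    void Play()
    {
        FillBufferAndPlay(a_);
        FillBufferAndPlay(b_);
        while (!queue_.empty()) {
            Header* done = queue_.front();   // oldest buffer finishes first
            queue_.pop_front();
            played_.insert(played_.end(), done->data.begin(),
                           done->data.begin() + static_cast<std::ptrdiff_t>(done->used));
            FillBufferAndPlay(*done);        // what the WOM_DONE handler does
        }
    }
    const std::vector<short>& Played() const { return played_; }
private:
    void FillBufferAndPlay(Header& h)
    {
        std::size_t n = std::min(h.data.size(), source_.size() - pos_);
        auto first = source_.begin() + static_cast<std::ptrdiff_t>(pos_);
        std::copy(first, first + static_cast<std::ptrdiff_t>(n), h.data.begin());
        pos_ += n;
        h.used = n;
        if (n > 0)
            queue_.push_back(&h);            // like waveOutWrite
    }
    std::vector<short> source_;
    std::size_t pos_;
    Header a_, b_;
    std::deque<Header*> queue_;
    std::vector<short> played_;
};
```

Because one buffer is always queued while the other is being refilled, the device never waits for the program as long as FillBuffer keeps up.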

The WaveRecorder Class

Recording is similar to playback. Waverecorder.h and waverecorder.cpp (available electronically) define and implement the abstract WaveRecorder base class, which is derived from Wave. The application derives a class from WaveRecorder to implement the pure virtual StoreData member function. The WaveRecorder class calls this function when there are data in the input buffer that need to be stored. The application's derived class takes care of doing that.

As with WavePlayer, the WaveRecorder constructor depends on the Win32 audio system to find an appropriate audio recording device on the PC. If one cannot be found or initiated, the constructor throws an exception.

WaveRecorder::Record is the only interface function other than the constructor and destructor. An application instantiates an object of its class derived from WaveRecorder and calls Record to begin recording. This function calls the API's waveInAddBuffer twice, once for each of the WAVEHDR structures. Then it calls waveInStart to begin the recording process.

WaveRecorder::waveInProc is the callback function that the Win32 API calls when a buffer is filled and ready to be stored. It calls the private SaveBuffer member function, which calls the derived class's implementation of StoreData and then calls waveInAddBuffer to return the buffer to the device so that recording can continue into it.

The derived class's StoreData function accepts a buffer of samples and does whatever the application needs to do with them. My application writes them to disk in raw PCM format. To stop recording, the application simply destroys the derived WaveRecorder object.
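The recording side can be simulated the same way. In this hypothetical sketch, RecorderSim queues two buffers as Record does, its SaveBuffer hands each full buffer to StoreData and requeues it in place of waveInAddBuffer, and RawPcmRecorder is a derived class whose StoreData appends the bytes unchanged to a binary stream, which is all that raw PCM requires. A std::ofstream opened with std::ios::binary would serve for a real file; the simulation stops when its input runs out rather than when the object is destroyed.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Simulated recording cycle; 'input' plays the role of the microphone.
class RecorderSim {
public:
    explicit RecorderSim(std::size_t bufBytes) : a_(bufBytes), b_(bufBytes) {}
    virtual ~RecorderSim() {}

    void Record(const std::vector<char>& input)
    {
        queue_.push_back(&a_);   // like waveInAddBuffer for the first header
        queue_.push_back(&b_);   // ...and for the second
        std::size_t pos = 0;
        while (pos < input.size() && !queue_.empty()) {
            std::vector<char>* buf = queue_.front();
            queue_.pop_front();
            std::size_t n = std::min(buf->size(), input.size() - pos);
            auto first = input.begin() + static_cast<std::ptrdiff_t>(pos);
            std::copy(first, first + static_cast<std::ptrdiff_t>(n), buf->begin());
            pos += n;
            SaveBuffer(*buf, n);  // what the callback does with a full buffer
        }
    }
protected:
    virtual void StoreData(const char* data, std::size_t length) = 0;
private:
    void SaveBuffer(std::vector<char>& buf, std::size_t length)
    {
        StoreData(buf.data(), length);
        queue_.push_back(&buf);  // hand the buffer back for more recording
    }
    std::vector<char> a_, b_;
    std::deque<std::vector<char>*> queue_;
};

// Hypothetical derived class: write each buffer, unchanged, as raw PCM.
class RawPcmRecorder : public RecorderSim {
public:
    RawPcmRecorder(std::size_t bufBytes, std::ostream& out)
        : RecorderSim(bufBytes), out_(out) {}
protected:
    void StoreData(const char* data, std::size_t length) override
    {
        out_.write(data, static_cast<std::streamsize>(length));
    }
private:
    std::ostream& out_;
};

// Usage helper: capture a simulated recording in memory instead of a file.
std::string RecordToMemory(const std::vector<char>& input, std::size_t bufBytes)
{
    std::ostringstream out;
    RawPcmRecorder recorder(bufBytes, out);
    recorder.Record(input);
    return out.str();
}
```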

Why Not DirectSound?

I could have used the DirectSound component of DirectX for this project. I chose not to for three reasons.

  • First, the project does not need the performance benefits that DirectSound offers.

  • Second, users would be required to install DirectX and the appropriate sound card drivers to use my program.

  • Third, I'd have to download the enormous DirectX SDK from Microsoft's web site, and I don't think my slow, fragile, boonies-bound dial-up connection would get it here before the next millennium.

If you are interested in DirectSound programming, which you should be if you are developing games, a good start is Windows 98 Programming Secrets, by Clayton Walnum (IDG Books, 1998).

Sound a Retreat

My audio project will eventually become a commercial product. That is unfortunate in one respect because it means I must temporarily abandon my voyage into the world of Linux software development and retreat to the Win32 platform. Linux users traditionally expect their applications to be free, and there aren't enough of them in my targeted marketplace to make it worth the time and effort. However, my retreat is personally fortunate in another respect because I won't be writing about Linux programming for a while and will be spared the flames and arrows of outraged Linux devotees whenever I criticize something. Nobody minds if you fire volleys at Redmond, but one mustn't take potshots at the lovable little penguin. I can already hear the huge gasps of relief around the Linux development community; for a time, at least, they are free from the critical eye and unfettered pen of at least one old curmudgeon.

One criticism I made generated several responses. I referred to the sndconfig program that came with my Linux distribution as an evil program. Readers didn't mind that I called it evil; they mostly agreed. They objected because I failed to report that sndconfig ships only with Red Hat and derivative distributions. My apologies and congratulations to those other Linux distributions that were wise enough not to include sndconfig.

DDJ

