Oct98: Real-Time Sound Processing

Randall is the coauthor of Autoscore, a pitch-to-MIDI converter for Mac OS and Windows published by Wildcat Canyon Software. Randall can be contacted at randall@wildcat.com.


Sidebar: DirectSound: The Future of Recording on Windows
Sidebar: Compensating for Poor Hardware Clipping
Sidebar: Digital Audio Basics


Most general-purpose operating systems provide a low-level recording API that makes sophisticated audio applications possible. Such applications typically record and process an audio signal from a microphone or other source in real time. In this article, I'll discuss general issues in real-time sound processing, how to do real-time audio recording for Macintosh and Windows, and how to encapsulate the operating system differences behind a cross-platform layer. Although I have been writing sound-processing programs since 1985, most of my experience for this article comes from developing Autoscore, a pitch-to-MIDI converter program for Macintosh and Windows. If you are unfamiliar with digital audio, see the accompanying sidebar entitled "Digital Audio Basics."

Conceptually, real-time audio processing is simple. You tell the device what type of signal you want (sample size, sample rate, mono or stereo) and how it should notify you whenever a new sample comes in. Then you tell it to begin recording and you start processing the samples. In practice, however, real-time sound processing is not this easy. It exposes limitations in the underlying operating system, hardware, and driver software that are not apparent from simply reading the documentation.

A General Discussion of Real-Time Sound Processing

Real-time sound processing always involves some sort of hardware device, controlled by its driver software, and hidden behind an operating-system abstraction layer. Like any system resource, this device must be opened before use and closed when finished. Once open, you can query the device to determine what it can do, and you can configure it to record samples in the format that you need. Prior to actually recording, you give the device a pointer to a function it can call whenever new samples have been digitized. This function is called an "interrupt routine," because it is called in response to an interrupt created by the sound-input hardware.

The interrupt routine does the actual sound processing. It always receives (directly or indirectly):

  • A pointer to a structure describing the current recording.
  • A pointer to the new samples (the input buffer).
  • The input buffer size.
  • A user-specified value.

Because the overhead for calling a function for each and every sample is so high, a device usually collects samples in an input buffer until it is full, and then it calls the interrupt routine to process all of the samples at once. How many samples does it collect? That depends. Does it call the interrupt routine regularly? Sometimes.

Part of the difficulty of real-time sound processing has to do with the interrupt routine. Because it is called at interrupt time, it cannot use the operating system like other code. Among other things, it typically cannot do graphics or allocate memory. Thus a mechanism must be developed for it to communicate with the noninterrupt-time main thread of the program. Generally the interrupt routine stores its samples or results in some shared buffer and signals the main thread that new data is available. This is multithreading, and all the synchronization issues contained therein apply here as well. Also, the operating system may make assumptions about how quickly the interrupt routine should execute. These assumptions may or may not be documented. To be safe, the interrupt routine should copy the samples to private buffers where another thread can process them.

Unless you know ahead of time the exact versions of the operating system, hardware, and driver software, do not assume that the interrupt routine will be called regularly. It is often called several times in rapid succession and then not at all for a while. While this shouldn't interfere with recording, it can prevent smooth data flow through a more sophisticated sound processing system.

Although real-time audio recording is generally straightforward on any system, certain insufficiently documented issues can affect how good a recording you get. Perhaps the biggest problem is quirky hardware. Some microphones may work better than others. You may get different levels for the same signal depending on whether you record it in mono or stereo. Some hardware cannot record and play simultaneously. Some distorts loud signals severely. Some introduces excessive noise or other artifacts. (See the accompanying text box "Compensating for Poor Hardware Clipping" for other hardware anomalies.)

Such tight coupling to the hardware is an unfortunate reality of real-time sound processing, even on modern device-independent operating systems. Often, however, the hardware is flexible enough to work around some of its limitations. For instance, if you get a stronger signal when recording in stereo than mono, and you want a mono signal, then record in stereo and throw out one of the channels. Also, some systems let you programmatically increase the input gain to boost a weak signal. Since any two systems are rarely identical, the best way to achieve consistent performance is to test on as wide an array of machines as you can, and write your program to be as robust as possible.

Recording on the Macintosh

Real-time sound processing on the Macintosh is close to the ideal just discussed. You open a device with SPBOpenDevice, make several calls to SPBGetDeviceInfo and SPBSetDeviceInfo to configure the hardware, and finally call SPBRecord to begin recording, passing it a pointer to your interrupt routine. When you are finished recording, call SPBStopRecording and then SPBCloseDevice. A Macintosh program that does real-time sound processing is available electronically; see "Resource Center," page 3.

Sound Processing on Windows

Unlike the Macintosh, opening and configuring a device in Windows is done in one step, through a call to waveInOpen, which takes six parameters:

  • A pointer to a device handle (HWAVEIN) that waveInOpen will initialize for you.
  • The index of the device you want to record from (use the constant WAVE_MAPPER (-1) to let Windows pick a device for you).
  • A pointer to a WAVEFORMATEX structure that contains the sample size, sample rate, and number of channels.
  • A flag indicating how you want to be notified when new samples have been recorded (for example, through a window, thread, or callback function).
  • A reference to the object that is to be notified (a window handle, thread ID, or function pointer).
  • A 32-bit user-specified value that is passed to the notification object.

Windows gives you the option of using a window as the notification object, thereby not having an interrupt routine at all. While this can be easier to program, such a mechanism is not responsive enough for high-performance applications. In any case, if waveInOpen succeeds, then you are ready to record -- almost. Unlike Mac OS, which provides the input buffer for you, Windows requires you to supply the input buffer yourself. This is the tricky part. If you want to record for, say, five seconds and then stop, you must:

  1. Allocate five seconds worth of sample memory (a buffer).

  2. Save the address and size of the buffer in a WAVEHDR structure.

  3. Let waveInPrepareHeader lock down the buffer's physical address so the sound card can write to it.

  4. Call waveInAddBuffer to tell Windows it can use the buffer.

  5. Finally, call waveInStart to actually begin recording.

When the buffer becomes full, Windows notifies you and you can save or process the buffer as you please. This is sufficient if you want a simple one-shot recording. What if you want to record indefinitely? Since you can't provide an infinite buffer, you have to recycle buffers once they have been filled. Windows maintains a pool of available buffers that it uses to store samples when the current buffer becomes full. You can allocate several buffers and add them to the pool prior to recording (steps 1 through 4). When a buffer gets full, Windows notifies the object specified in waveInOpen, and you can process the buffer's samples while Windows fills another buffer that you have already allocated.

One of the things you can do when processing the buffer's samples is recycle the buffer. You simply call waveInPrepareHeader and waveInAddBuffer again. This is essentially a double-buffering scheme. Unfortunately, double-buffering is not good enough for Windows real-time sound processing. The problem is that you have no idea when Windows will notify you that your buffer has been filled. If you have two 10-ms buffers and Windows notifies you 30 ms after the first buffer is filled, then you will lose data, and the only way you can tell is by noticing that you have recorded less data than you should have.

The solution is to give Windows enough buffers so that it can safely record for a long time if it doesn't give you a chance to recycle your buffers. How long? I have seen 133-MHz Pentium systems running 32-bit apps under Windows 95 not receive notifications for at least 0.75 seconds. To be safe, I'd allocate at least 1.5-2 seconds of buffer.

Another issue is how you go about recycling the buffers. If you are being notified through a window function, you can simply call waveInPrepareHeader and waveInAddBuffer again. Of course, how often your window function is called is up to Windows. If you use a callback function, Windows can call it more promptly, but since it is called at interrupt time, you cannot use most of the operating system, including waveInAddBuffer! This means you can process the samples promptly, but recycling will have to wait until a window function can be called. (Using a thread may be your best bet, but I have little experience with this since Autoscore has had to remain compatible with 16-bit systems, where threads are not available.)

Another question is how big the buffers should be. The smaller the buffer, the more quickly it will be filled and the more often you can process its samples and, in theory, the more responsive your program will be. Experiments done at Wildcat Canyon Software indicated that going from 64 four-KB buffers to 256 one-KB buffers on one machine resulted in the interrupt routine being called more frequently but still regularly. However, using even smaller buffers did not help. Rather than deliver the data in an even rhythm, Windows would call the interrupt routine several times in rapid succession, and then not call it for a long time, and then repeat. It was as if there were a hardware input buffer that was filling on its own, and when it was filled, Windows would call our interrupt routine until it was empty, and then wait until it filled again. In light of this behavior and the lack of an API describing it, you must write your programs to either not be affected by it or to measure and compensate for it at run time.

Listings One and Two present the C++ WindowsRecorder class, which can be used for recording under Windows. WindowsRecorder can use either a function or a window as the notification object. A window function is still needed for recycling the buffers. The InputBuffer class (available electronically) makes Windows input buffers easier to use.

Windows Caveats

The biggest problems facing Windows audio programs are configuring the sound card and dealing with its limitations. Windows 3.1 provides no API for selecting the audio source on the sound card or its input level; you must rely on the mixer software that (hopefully) comes with the sound card. Windows 95 provides an API through its Audio Mixer services, but unless you know exactly how the sound card will be used, it's better to let users adjust these settings manually through the user interface provided by the operating system. A simple test to determine if your system is configured properly is this: If Sound Recorder (SOUNDREC.EXE or SNDREC32.EXE) won't record, your code won't either.

The main limitation in sound cards is that certain subsystems cannot be used simultaneously. For example, if the card is half-duplex (as many of them are), it cannot record and play digital audio at the same time. These cards generally do not announce this on their packaging. Some cards are capable of full-duplex operation, but their default installation is half-duplex to conserve IRQ and DMA channels. The only "API" for determining whether a card can record and play simultaneously is to try it and see if you get an error. If your application is only concerned with wave input, this shouldn't be a problem, right? Wrong. Most applications begin wave input in response to a menu command. If the user has turned on menu sound effects in Windows, a sound is usually still playing when your command processor tries to open the wave input device. You get an error. Which error? I have observed the following errors in this scenario: MMSYSERR_NOTENABLED (3), MMSYSERR_ALLOCATED (4), MMSYSERR_NOTSUPPORTED (8), and WAVERR_BADFORMAT (32). Because the wave device will be free once the menu sound stops playing, you shouldn't give up if you get one of these errors. Wait a few milliseconds and try again, as my function TryToOpenWaveInDevice does (see Listing Two). Also, see the accompanying text box "DirectSound: The Future of Recording on Windows" for a snapshot of what lies ahead.

Creating a Cross-Platform Layer

Complex operating-system APIs arise to satisfy the needs of many programmers working on diverse applications. Though this complexity is unavoidable, it can be hidden behind a well-designed façade. My goal with Autoscore was to encapsulate the low-level audio recording APIs of Mac OS and Windows in an object with a simple interface designed specifically for real-time sound processing. The result is XWaveInputDevice, a C++ class whose interface consists of three main routines: IsHardwareAvailable, Start, and Stop; the complete source is available electronically. XWaveInputDevice is an abstract class with two derived classes that implement it on Mac OS and Windows. The Macintosh code is straightforward. A subtle aspect of the Windows code is that it creates a hidden window so that it can do its buffer recycling itself, without requiring modifications to the main application's window function.

Conclusion

Though it takes some work to get there, cross-platform real-time sound processing can be made easy to program. Because they deal with actual hardware devices, real-time sound processing programs are unfortunately tightly coupled to those devices, even though they may be "device independent" by the standards of the operating system. Nevertheless, armed with knowledge of what to expect from the myriad sound input configurations out there, it is possible to create interesting, useful applications that feature real-time sound processing on consumer-oriented computers.

DDJ

Listing One

// WindowsRecorder.h -- A class for recording continuously on Windows.
// by Randall Cook. Copyright (C) 1998 Randall Cook. All Rights Reserved.


#ifndef WINDOWSRECORDER_H
#define WINDOWSRECORDER_H


#include <windows.h>    // must be included before <mmsystem.h>
#include <mmsystem.h>
#include "InputBuffer.h"


// interface for sample processing class
class SampleProcessor {
    public:
    virtual void ProcessSamples(unsigned char* buffer, long length) = 0;
};
// the sound recording class
class WindowsRecorder {
    friend void CALLBACK MyInterruptRoutine(HWAVEIN hwi, UINT uMsg, 
                DWORD dwInstance, DWORD dwParam1, DWORD dwParam2);
    private:
    HWND hostWindow;            // the window that gets the recycle messages
    HWAVEIN device;             // the wave input device
    bool recording;             // a flag indicating recording status
    bool useInterruptRoutine;   // a flag indicating recording mode
    InputBuffer** bufferList;   // an array of InputBuffer*
    int bufferCount;            // number of buffers in bufferList
    UINT recycleBufferMessage;  // the recycle buffer message
    SampleProcessor* processor; // an object that can process samples
    
    bool ErrorSuggestsBusyDevice(MMRESULT err);
    MMRESULT TryToOpenWaveInDevice(int devIndex, WAVEFORMATEX& wf,
                    DWORD callbackObject, DWORD callbackObjectType);
    void PrepareBuffers();
    void UnprepareBuffers();
    void ProcessBufferData(InputBuffer* ib);
    void RecycleBuffer(InputBuffer* ib);
    
    protected:
    void SetRecordingParameters(WAVEFORMATEX& wf);
    
    public:
    WindowsRecorder(int bufCount,int bufSize,SampleProcessor* sp,HWND host);
    ~WindowsRecorder();
    MMRESULT Start(bool interrupt);
    void Stop();
    bool IsRecycleBufferMessage(UINT message);
    LRESULT ProcessRecycleBufferMessage(LPARAM lParam);
    bool IsRecording() { return recording; }
};
#endif




Listing Two

// WindowsRecorder.cpp -- A class for recording continuously on Windows.
// by Randall Cook. Copyright (C) 1998 Randall Cook. All Rights Reserved.


#include "WindowsRecorder.h"


void CALLBACK MyInterruptRoutine(HWAVEIN hwi, UINT uMsg, 
    DWORD dwInstance, DWORD dwParam1, DWORD dwParam2)
// Process the samples, but don't recycle the buffer.
{
    if (uMsg == WIM_DATA) {
        WindowsRecorder* winRec = (WindowsRecorder*)dwInstance;
        // Extract the InputBuffer pointer from the WAVEHDR structure.
        LPWAVEHDR hdr = (LPWAVEHDR)dwParam1;
        InputBuffer* ib = (InputBuffer*)hdr->dwUser;
        if (ib)
            winRec->ProcessBufferData(ib);
        // Post a recycle buffer message to the main window. Put the WAVEHDR 
        // pointer in lParam so the message can be handled like MM_WIM_DATA.
        PostMessage(winRec->hostWindow, winRec->recycleBufferMessage, 
                                                              0, dwParam1);
    }
}
WindowsRecorder::WindowsRecorder(int bufCount, int bufSize,
                                       SampleProcessor* sp, HWND host)
{
    hostWindow = host;
    device = 0;
    useInterruptRoutine = false;
    processor = sp;
    recording = false;
    
    bufferList = new InputBuffer*[bufCount];
    bufferCount = bufCount;
    
    for (int i = 0; i < bufferCount; i++)
        bufferList[i] = new InputBuffer(bufSize);
        
    recycleBufferMessage = RegisterWindowMessage("WindowsRecorderRecycle");
}
WindowsRecorder::~WindowsRecorder()
{
    if (IsRecording())
        Stop();
    for (int i = 0; i < bufferCount; i++)
        delete bufferList[i];
    delete[] bufferList;
}
void WindowsRecorder::SetRecordingParameters(WAVEFORMATEX& wf)
{
    // In production code, use waveInGetDevCaps to query the device to ensure 
    // it can do what we want it to. Here, only set the sample rate, sample 
    // size, and number of channels to 44100 Hz, 16 bit, mono.
    
    wf.wFormatTag = WAVE_FORMAT_PCM;    // Standard sample format.
    wf.nSamplesPerSec = 44100L;         // 44100 Hz.
    wf.wBitsPerSample = 16;             // 16 bits.
    wf.nChannels = 1;                   // 1 channel (mono).
    wf.nBlockAlign = 2;                 // 2 bytes per sample.
    wf.nAvgBytesPerSec = wf.nSamplesPerSec
                         * wf.nBlockAlign
                         * wf.nChannels;
    wf.cbSize = 0;
}
bool WindowsRecorder::ErrorSuggestsBusyDevice(MMRESULT err)
// these errors (3, 4, 8, and 32) often come when the device is busy
{
    return err == MMSYSERR_NOTENABLED || err == MMSYSERR_ALLOCATED ||
           err == MMSYSERR_NOTSUPPORTED || err == WAVERR_BADFORMAT;
}
MMRESULT WindowsRecorder::TryToOpenWaveInDevice(int devIndex, 
            WAVEFORMATEX& wf, DWORD callbackObject, DWORD callbackObjectType)
// Prepare to try to open the device several times if necessary. 
// Sets device, and returns the error code from the open call.
{
    MMRESULT err = 0;
    // Prepare to try to open device several times if necessary. If it works
    // the first time, we'll break out. We try for 3.5 seconds since that is 
    // a reasonable amount of time to try to open device. 0.25 seconds is a 
    // reasonable time to wait between tries, since user will have to wait at 
    // most this amount of time for things to start. The 14 below is 3.5/0.25.
    for (int count = 0; count < 14; count++) {
        err = waveInOpen(&device, devIndex, &wf, callbackObject, 
                                       (DWORD)this, callbackObjectType);
        if (ErrorSuggestsBusyDevice(err))
            Sleep(250);
        else
            break;
    }
    return err;
}
MMRESULT WindowsRecorder::Start(bool interrupt)
{
    MMRESULT err;
    // Describe the desired recording format.
    WAVEFORMATEX wf;
    SetRecordingParameters(wf);
    // Open the default device.
    useInterruptRoutine = interrupt;    // Remember which mode we are using.
    if (useInterruptRoutine) {
        err = TryToOpenWaveInDevice(WAVE_MAPPER, wf,
              (DWORD)MyInterruptRoutine, CALLBACK_FUNCTION);
    } else {
        err = TryToOpenWaveInDevice(WAVE_MAPPER, wf,
              (DWORD)hostWindow, CALLBACK_WINDOW);
    }
    if (err == 0) {
        PrepareBuffers();               // Prepare supply of input buffers.
        err = waveInStart(device);      // Actually begin recording.
        if (err == 0) {                 // No errors: we are recording.
            recording = true;
        } else {                        // Clean up on errors.
            UnprepareBuffers();
            waveInClose(device);
            device = 0;
        }
    }
    return err;
}
void WindowsRecorder::Stop()
{
    // Since the goal of this function is to stop recording and
    // close the device, it is reasonable to ignore errors.
    if (IsRecording()) {
        waveInReset(device);
        waveInStop(device);
        UnprepareBuffers();
        waveInClose(device);
        device = 0;
        recording = false;
    }
}
void WindowsRecorder::PrepareBuffers()
{
    for (int i = 0; i < bufferCount; i++)
        bufferList[i]->Prepare(device);
}
void WindowsRecorder::UnprepareBuffers()
{
    for (int i = 0; i < bufferCount; i++)
        bufferList[i]->Unprepare();
}
void WindowsRecorder::ProcessBufferData(InputBuffer* ib)
{
    if (ib->GetDataLen() && processor) {
        // Process the samples here. There are ib->GetDataLen() bytes
        // of data at ib->GetData().
        processor->ProcessSamples(ib->GetData(), ib->GetDataLen());
    }
}
void WindowsRecorder::RecycleBuffer(InputBuffer* ib)
{
    // Only recycle buffers that have been filled and are not being 
    // flushed out.
    if (ib->ContainsValidData() && device != 0) {
        // If using a window as the notification object, you must process the
        // samples now. If using a function as notification object, samples 
        // have already been processed in interrupt routine and don't need to 
        // be processed again here.
        if (!useInterruptRoutine)
            ProcessBufferData(ib);
        // This does the actual recycling.
        ib->Prepare(device);
    }
}
bool WindowsRecorder::IsRecycleBufferMessage(UINT message)
{
    return message == MM_WIM_DATA || message == recycleBufferMessage;
}
LRESULT WindowsRecorder::ProcessRecycleBufferMessage(LPARAM lParam)
{
    // Extract the InputBuffer pointer from the WAVEHDR structure.
    LPWAVEHDR hdr = (LPWAVEHDR)lParam;
    InputBuffer* ib = (InputBuffer*)hdr->dwUser;
    if (ib)
        RecycleBuffer(ib);
    return 0;
}
/*
// modify your main window procedure thus:
WindowsRecorder* gRecorder;     // This doesn't have to be a global...
LRESULT CALLBACK MainWindowProc(HWND w, UINT message, WPARAM wParam,
                                                             LPARAM lParam)
{
    // ...
    if (gRecorder->IsRecycleBufferMessage(message)) {
        return gRecorder->ProcessRecycleBufferMessage(lParam);
    }
    // ...
}
*/



Copyright © 1998, Dr. Dobb's Journal
