There's a lot of noise being made about the need for software support for sound. John examines the options available to PC programmers, then presents a sound driver that produces high-quality digitized sound without requiring any extra hardware.
March 01, 1993
URL:http://www.drdobbs.com/architecture-and-design/examining-pc-audio/184408962
Copyright © 1993, Dr. Dobb's Journal
This article contains the following executables: AUDIO.ARC
John is the president of THE Audio Solution and an independent software developer. He produces interactive multimedia education products for Milliken Publishing Company, is the author of 688 Attack Sub for Electronic Arts, and is establishing an industry standard for digitized sound on MS-DOS machines. John can be contacted at 747 Napa Lane, St. Charles, MO 63303.
Over the past few years, PCs have made extensive use of graphical user interfaces. With Windows 3.1, which provides an API for producing digitized sound and music output at system level, there's now a standardized way to incorporate sound and music into the user interface. But in their base configuration, IBM-compatible PCs don't offer sound capabilities, other than being able to produce a square wave on a dinky speaker. Consequently, a plethora of third-party sound boards, like those described in Table 1, have stepped into the gap. Overall there are more than two million third-party sound solutions from over 20 different vendors--each with its own hardware specification. And therein lies the problem.
Programmers who want to add broad-based sound support to their software really have only two choices: use a third-party API to talk to the wide array of hardware, or purchase every single third-party audio device they want to support and write hardware-specific code for each one. The second option is not only impractical, it isn't even possible under Windows. Talking directly to a piece of sound hardware under Windows generates an exception error. Under Windows, you pretty much have to use the provided system calls. Under DOS, however, a variety of practical solutions are available.
Generally speaking, there are three ways to make sound or music with a computer: synthesizing, digitizing, and using MIDI. Synthesized effects are created by modulating a waveform like a sine wave to produce unique sounds. The results range from simple little ditties to sophisticated music. Still, producing sound in this fashion is extremely complex and very difficult, if not impossible, for programmers who don't have musical backgrounds.
Digitized sound is the same as that found on a compact disc. It's simply recorded audio from a microphone which is analog-to-digital (A/D) converted at some fixed sampling rate. The quality of the sound is controlled by the bit resolution of the samples (16 bit for compact disc) and the recording frequency (44 KHz for compact disc). Most third-party audio boards capable of dealing with digitized sound use 8-bit resolution, and audio is generally recorded anywhere between 5 and 22 KHz. The advantage of digitized sound is that you can simply record whatever you want. Among its disadvantages: Not all third-party boards support digitized sound; digitized sound takes up huge amounts of data storage (often impractical for anything other than CD-ROM); and digitized-sound playback is CPU and memory intensive.
The third way to deal with audio is through MIDI, a pseudostandardized file format for music. MIDI allows you to specify multiple channels of sound simultaneously, each with its own instrument. Each channel contains events such as "play this note at this time." For programmers, this is great. You hire a musician to compose some music, and tell the black box to play it. Still, there are problems. First of all, MIDI isn't really a standard, because each MIDI device responds differently to channel assignments and patch changes. (On one MIDI device a flute might come out sounding like a violin, or an entire channel might not be played because that device doesn't support that particular channel.) Roland Corp.'s Sound Canvas (a "general" MIDI device) addresses this problem by providing a base patch set (description of available instruments) that works across a wide variety of MIDI systems. With it, you can orchestrate a single piece of MIDI music as general MIDI and expect it to play properly under many different hardware configurations.
Is MIDI the final solution? Unfortunately not, since few people actually own MIDI devices at this time. This seems to be changing, however, as a number of third-party manufacturers are readying sound devices that provide general MIDI support.
Another approach, MIDI emulation, is seen in Windows 3.1, where MIDI commands are emulated in software to produce music on non-MIDI devices like the SoundBlaster or Adlib card, utilizing their on-board FM synthesis chip. There are a number of other emulation solutions, some better than others in overall musical quality.
What API solutions are available to you if you want to add sound and music to your application? Under Windows, just use the system calls provided under Windows 3.1. Another option is Miles Design's WAIL (Windows Audio Interface Library), which provides MIDI emulation for Adlib-compatible devices through a Windows DLL.
Under DOS, there are a number of options. The first is to purchase the SDKs sold by each sound-board manufacturer. If you want to program the audio hardware directly, this is a must. However, the API libraries provided are often impractical for commercial products, and, of course, only support their respective hardware platforms.
Looking again at Table 1, note that some manufacturers provide MIDI pass-through on their sound cards. This allows the user to connect an external MIDI device through this MIDI port. I've only noted sound cards that provide on-board MIDI support. In the MIDI API section of the table, most sound devices do not have on-board MIDI capability, but do support FM synthesis (through YM3812 or YM OPL3 chipsets). A number of third-party APIs allow software emulation of MIDI commands for these systems:
Having said all this, I now turn to a sound driver that will produce high-quality digitized sound on any PC without requiring any extra hardware. This driver installs as a TSR and has a simple API to produce digitized sound output on the internal PC speaker. You simply fill a data structure that describes where in memory the desired sound effect is, the length of the sound sample, and the frequency that you want played back. The AX register contains a 688H, the DS:SI registers point to this data structure, and you perform an INT 66H. Your application should first check to see if the sound driver has been loaded in memory. You can do this via a call to CheckIn, which is located in DIGPLAY.ASM (see Listing One, page 107). The sound passed is simply raw 8-bit PCM data. A number of audio file formats (WAVE and VOC, for example) typically contain a header followed by the raw data itself. The utility SNDCONV (provided electronically--see "Availability," page 5) strips unnecessary header information off of sound files to be played back with the IBM-SND driver.
Most IBM compatibles come with an extremely poor-quality speaker that's positioned ineffectively and has no volume control. Some PC manufacturers don't even provide a speaker, merely a peizeolectric wafer that produces a barely audible sound. Originally, the only sound produced on an IBM computer was "beep." Later, music composed of simple tones was incorporated into some entertainment products. Recently, relatively good-quality digitized sound has been produced out of the IBM speaker. However, none of these approaches can overcome the poor quality of the speaker itself.
There's only one way to make sound on the IBM. Under software control, you can apply 5 volts to the internal speaker, then turn the speaker off. Turning the speaker on causes it to travel out until it hits its maximum position; turning it off causes it to come back to rest. There's no way to directly control volume; this would involve causing the speaker to rest in intermediate positions by applying different voltage levels.
There are two ways to turn the IBM speaker on and off: either directly toggling bit 1 at port address 61H, or tying the speaker to Timer channel 2 of the 8253 timer chip. This is enabled using bit 0 of port address 61H. With this method, the speaker is turned on and off at whatever frequency Timer 2 has been set, which causes a square wave tone to be generated. This is the method by which most simple sounds are created on the IBM.
A digitized sound sample is generally recorded as an 8-bit value (0-255), where each data sample represents a voltage level. Many samples are taken to approximate a continuous waveform. The sampling rate for the human voice is generally above 5 KHz. Good-quality recordings of music are often at 22 KHz, and a compact disc records data at 16 bits of resolution with a 44-KHz sampling rate. The simplest way to do digitized sound on the IBM speaker is as 1-bit sound. If your 8-bit data sample is less than 128, keep the speaker position off; if it is greater than 127, turn the speaker on. This works fine, except that 1-bit digitized sound sounds awful. To get higher-quality sound we need to find a way to hold the speaker in intermediate positions.
There's a simple way to do this. The speaker is a physical device. Even though you can apply voltage to the speaker at an effective, instantaneous rate (the speed of light is still pretty fast), the speaker cannot travel from its minimum to its maximum position instantaneously. How long does it take? The answer isn't exact because it depends upon the speaker inside of the particular machine, but it is roughly 60 millionths of a second. That's pretty fast, but since the 8253 clocks at roughly 1 MHz, we can control the rate at which we toggle the speaker with about six bits of resolution.
By sending a digitized sound sample to Timer 2 of the 8253, we effectively hold the speaker at intermediate positions that correlate with different voltage levels. The drawback is that we're doing this at a particular carrier frequency. That means that if you send out 9-KHz audio data, you also get a 9-KHz tone as the carrier frequency. This carrier will actually drown out the digitized sound data you are sending, and cause you and your friends to run screaming from the room in pain as your ears quickly begin to burn from a sound that could only appeal to a bat. My solution is to send the audio data out at a carrier frequency out of the range of human hearing. By playing back a 9-KHz sound sample with a carrier that is double the playback rate, we achieve an 18-KHz carrier that can't be heard. What's left is the digitized sound data itself, which is of surprisingly good quality.
As you might guess, the intermediate positions that the speaker holds are nonlinear. Therefore, when you transform your 8-bit source data down to six bits, the data needs to be modified to form a distribution that fills the bandwidth within which we can modulate the speaker.
There are some technical challenges in pushing data out of the speaker at frequencies above 18 KHz. This is why the audio driver presented in this article does not operate in the background. The foreground driver, IBMSND, effectively shuts the machine down while playing the sound out of the internal speaker, thus providing maximum clarity during playback. You can play digitized sound in the background, but it requires setting up an 18-KHz interrupt on the machine, which runs a very high risk of causing conflicts under DOS. (For example, protected-mode programs that intercept hardware interrupts cause a lot of problems for code with a very high interrupt rate.)
The IBMSND sound driver presented here simply takes the input 8-bit audio sample and reduces it to 6-bit values that are sent to the Timer 2 channel of the 8253. The timer interrupt is revved up to run at twice the sampling rate of the audio sample to get the carrier frequency out of the range of human hearing. All other interrupts in the machine are shut down while the digitized sound is pushed out of the speaker. The Timer 2 channel is reprogrammed to accept a 1-byte countdown timer value rather than the normal 2-byte value. At each interrupt, we simply execute the routine in Example 1 with DS:SI set up to point at the audio data (already in 6-bit format). In the foreground, the program waits for the SI register to reach the end of the audio data. The interrupt is used to achieve very precise timing on all hardware. This code worked even on 4.7-MHz 8088 machines, at 18 KHz! Notice that the hardware interrupt expects certain registers to be set up and does not save and restore the AX register. This is an especially good reason to shut off all other interrupts in the machine, because it is doubtful that it will run effectively for very long with the hardware interrupt installed.
TimerInterrupt: lodsb ; Get a data sample (DS:SI points to audio data) out 42h,al ; Send 1 byte timer countdown to timer 2. mov al,20h ; non-specific EOI out 20h,al ; Send it to interrupt controller. iret
The file DIGPLAY.DOC (available electronically) documents the API interface to the IBMSND digitized sound driver. A callable link layer into the digitized sound driver has been provided through DIGPLAY.ASM. This assembly language file provides C function-call hooks into the digitized sound driver as well as the ability to detect the presence of a resident sound driver through the call CheckIn. Simply include the C header file DIGPLAY.H in your code (see Listing Two, page 107) and link, regardless of memory model, to the object module DIGPLAY.OBJ. The assembly source is written taking advantage of Borland's Turbo Assembler IDEAL mode, which provides a powerful 8086 syntax. A sample C program that reads sound effects into memory and plays them back is enclosed in the file SPLAY.C (also available electronically).
Adding sound to your software can be very rewarding. It's an excellent method of communicating information, mood, and emotion to the user. It is also entertaining, and makes computers more fun to use. But developers should be cautioned to use sound wisely. Just as we have learned to hire professional graphic artists to create attractive graphics for GUI-based software, we need to look to professional musicians to obtain quality sound effects and music for our products.
With multimedia raging about us, it begs the question, "Why, when, and how should music and sound be used in computer programs?" For answers, I turned to George Alistair Sanger, aka "The Fat Man." From his home in Austin, Texas, the Fat Man works on audio production and composition for interactive entertainment and multimedia products, including award-winning projects like Wing Commander, Ultima Underworld, and The 7th Guest. --J.R.
Why should music and sound be used in computer programs? For the same reasons we've all surrendered to using icons--because it's the next logical step in computer-software development. Icons and graphics save viewers from having to figure out what they're supposed to be seeing; music helps them know what they're supposed to be feeling.
I don't intend to demean the intelligence of the average user, but there's a little bit of analytical work (which side of the brain does that again? Brains ought to be color coded to match the big companies; one side blue and the other rainbow) that goes into reading. Reading is slow, somewhat detached from our more natural experiences, and the user doesn't really like to do it. That's what makes icons work. It's already been established that icons speed productivity by giving us images from the "real world" (whatever that is) and letting us relate our feelings about, say, a file cabinet or a wizard, to the items represented on the screen. Music and sound take this to the next logical step.
Users associate with graphics the feelings appropriate to the real-world object represented. Music and sound are even more direct: They expose users directly to those feelings. (They can now experience an angry wizard or an efficient filing cabinet.) And feelings can be very useful tools.
Sound and music can increase throughput, enhance the pleasantness of computer experience, and increase entertainment value of a program.
When should music and sound be used? Not just in games.
Although computers and film are very different media, the relative maturity of the latter makes it useful for developers to look at film as a model for the future of some aspects of computer software. Especially with the importance of multimedia, we can look for examples not only in feature films, but all kinds of video applications: educational and industrial films, training, advertising, and news programs--anything that's been on a film or videotape.
When do they use music and sound? To enhance emotion where it already exists. Some folks like happy faces or other graphics when their computers start up--a happy tune can triple the happy effect.
A business logo can be enhanced by an audio swirl. Action games can be scored like mysteries or documentaries. Music and sound also inject emotion where it is desired, but doesn't already exist. If the attention of the user is really required, consider how much more effective a klaxon is than the word "Warning!" in a dialog box. Clicking keystrokes gives a sense of security to some typists. Any place an interesting graphic is used to change boredom to interest, an interesting piece of sound can enhance that change.
Sound also manipulates or changes an emotion that might already exist. A child-safety multimedia presentation might show a picture of a cute baby near a swimming pool, and something like the Jaws theme might keep the user from focusing on the cuteness of the baby. A short, simple title tune might make a database program seem less complex and frightening. In the case of the "bad news" dialog box, consider how much more palatable a warning a pleasant "ping" is than an explosive sound (or a picture of a bomb, for that matter). With audio, the developer, like a film director, is able to control the degree of emotion the user feels.
Moreover, it's a mistake to use music or sound when an emotional response or a reaction is inappropriate. When music is just thrown in as filler, or is so poorly composed that it doesn't adequately support the purpose of the program, it annoys the user and cheapens the software. And it gives music itself a bad name--"Musak."
How should music and sound be used in computer programs? Like graphics, they have to be done right. Filling up space with spectacular graphics done by an "artist friend" (everybody's got one) is simply not the way to create an effective program, and the same applies to music. There's as much an art to using music and sound in software as there is in film. And of course it's too big a subject to address here.
By way of cheap conclusion, here's a good rule of thumb. To judge the quality of your sound, regardless of the technical limitations of your platform, ask yourself these three questions:
--Fat
If you've been making noise about adding audio support to your applications, Media Vision's Sound Programming Contest is an opportunity for you to really sound off.
The programming contest, sponsored by sound-board manufacturer Media Vision, seeks to reward outstanding use of sound in software. Sound-supported applications must: use 16-bit digital sound and run on a Pro AudioSpectrum 16 or compatible sound card; be shareware or freeware; and be reasonably bug free.
Entries will be rated on a scale of one to ten, with four points being allocated for use of sound, four more for the "fun factor," and two for ease of use. A panel of judges will nominate 53 finalists, from which first, second, and third places will be selected.
Prizes will include CompuServe Electronic Mall shopping sprees (sponsored by Media Vision and Computer Express) worth $5000.00, $2000.00, and $1000.00 for first, second, and third places, respectively, and items worth $50.00-$100.00 for each of the 50 finalists. As a special bonus, ten additional prizes consisting of $100.00 shopping spress each will be awarded to the best "early bird" entries received before midnight, March 31, 1993. All entries will receive a Media Vision T-shirt and free CompuServe time.
To enter, either download the entry kit from CompuServe or Media Vision's BBS 510-770-0968, or dial 800-356-7886 or 408-655-6014 x211 to ask for the kit. Send your entry form to Media Vision and post the program in the PAS16 Shareware contest section on CompuServe (GO PAS16CONTEST). Media Vision will use your CompuServe ID to verify the author names on the application and the entry form. You must provide periodic technical support for your application.
Anyone can try out the applications on CompuServe (GO PAS16CONTEST) or the Media Vision BBS (shareware contest area).
The deadline for entries is midnight on July 15, 1993.
--Editors
Activision Digispeech P.O. Box 67001 2464 Embarcadero Way Los Angeles, CA 90049 Palo Alto, CA 94303 310-207-4500 415-494-8086 Adlib Corp. Media Vision 20020 Grande Allee East, #850 47221 Fremont Blvd. Quebec City, PQ Fremont, CA 94538 Canada GIR 2J1 510-770-8600 418-529-9676 Miles Design Inc. Advanced Gravis 10926 Jollyville, #308 #111, 7400 MacPherson Ave. Austin, Texas 78759 Burnaby, BC 512-345-2642 Canada V5J 5B6 604-434-7274 Roland Corp. 7200 Dominion Circle Advanced Strategies Corp. Los Angeles, CA 90040-3647 60 Cutter Mill Road, Suite 502 213-685-5141 Great Neck, NY 11021 516-482-0088 Sequoia Development Group 12517 Cascade Canyon Drive AMD Granada Hills, CA 91344 5204 E. Ben White Blvd. MS 56 818-368-7221 Austin, TX 78741 800-292-9263 or 512-462-5651 Sequioa Systems Inc. 400 Nickarson Rd. Artisoft Malboro, MA 01752 691 East River Rd. 800-562-0011 Tucson, AZ 85704 800-846-9726 Street Electronic Corp. 6420 Via Real ASC Computer Systems Carpentina, CA 93013 26401 Harper Ave. 805-684-4593 St. Clair Shores, MI 48080 313-882-1133 THE Audio Solution P.O. Box 11688 ATI Clayton, MO 63105 3761 Victoria Park Ave. 314-567-0267 Scarborough, ON Canada MlW 3S2 Turtle Beach Systems 416-756-0718 Cybercenter Unit 33 1600 Pennsylvania Ave. Covox Inc. York, PA 17404 675 Conger St. 717-843-6916 Eugene, OR 97402 503-342-1271 Voyetra Technologies 333 Fifth Ave. Creative Labs Inc. Pelham, NY 10803 1901 McCarthy Blvd. 914-738-4500 Milpitas, CA 95035 408-428-6600 Walt Disney software P.O. Box 290 Buffalo, NY 14207-0290
_EXAMINING PC AUDIO_ by John W. Ratcliff[LISTING ONE]
;; DIGPLAY.ASM John W. Ratciff ;; ;; This piece of source provides C procedure call hooks down into ;; the resident TSR sound driver. Use the call CheckIn to find out ;; if the sound driver is in memory. See the C header file DIGPLAY.H ;; for prototype information. ;; ;; This file is in the format for Turbo Assembler's IDEAL mode. The ;; IDEAL mode syntax makes a lot more sense for 8086 than the old ;; MASM format. MASM has recently been updated to provide some of the ;; functions that Turbo Assembler has had for a number of years. I prefer ;; to consider Turbo Assembler the standard for 8086 assemblers. ;; IDEAL mode functionality includes true local labels, real data structures, ;; typecasting, automatic argument passing and local memory. ;; Converting any of this code into MASM format is an exercise left for ;; the student. LOCALS ;; Enable local labels IDEAL ;; Use Turbo Assembler's IDEAL mode JUMPS INCLUDE "PROLOGUE.MAC" ;; common prologue SMALL_MODEL equ 0 ; True if wanting to assemble near procs. SEGMENT _TEXT BYTE PUBLIC 'CODE' ;; Set up _TEXT segment ENDS ASSUME CS: _TEXT, DS: _TEXT, SS: NOTHING, ES: NOTHING SEGMENT _TEXT Macro CPROC name ; Macro to establish a C callable procedure. public _&name IF SMALL_MODEL Proc _&name near ELSE Proc _&name far ENDIF endm ;; int DigPlay(SNDSTRUC far *sndplay); // 688h -> Play 8 bit digitized sound. CPROC DigPlay ARG DATA:DWORD PENTER 0 push ds push si call CheckIn ; Is sound driver in memory? or ax,ax ; no-> don't invoke interupt... jz @@EXT ; mov ax,0688h ; Function #1, DigPlay lds si,[DATA] ; Data structure. int 66h ; Do sound interupt. mov ax,1 ; Return sound played. @@EXT: pop si pop ds PLEAVE ret endp ;; int SoundStatus(void); // 689h -> Report sound driver status. CPROC SoundStatus mov ax,0689h ; Check sound status. int 66h ; Sound driver interrupt. ret endp ;; void MassageAudio(SNDSTRUC far *sndplay);// 68Ah -> Preformat 8 bit digitized sound. CPROC MassageAudio ARG DATA:DWORD PENTER 0 push ds push si mov ax,068Ah ; Identity lds si,[DATA] ; Data structure. int 66h ; Do sound interupt. pop si pop ds PLEAVE ret endp ;; int DigPlay2(SNDSTRUC far *sndplay); // 68Bh -> Play preformatted data. CPROC DigPlay2 ARG DATA:DWORD PENTER 0 push ds push si mov ax,068Bh ; Identity lds si,[DATA] ; Data structure. int 66h ; Do sound interupt. pop si pop ds PLEAVE ret endp ;; int AudioCapabilities(void); // 68Ch -> Report audio driver capabilities. CPROC AudioCapabilities mov ax,068Ch ; Check sound status. int 66h ret endp ;; int ReportSample(void); // 68Dh -> Report current sample address. CPROC ReportSample mov ax,068Dh ; Report audio sample. int 66h ret endp ;; void SetCallBackAddress(void far *proc); // 68Eh -> Set procedure callback address. CPROC SetCallBackAddress ARG COFF:WORD,CSEG:WORD PENTER 0 mov bx,[COFF] mov dx,[CSEG] mov ax,68Eh int 66h PLEAVE ret endp ;; void StopSound(void); // 68Fh -> Stop current sound from playing. CPROC StopSound mov ax,68Fh int 66h ret endp CPROC ReportCallbackAddress mov ax,691h int 66h ret endp CPROC WaitSound @@WS: mov ax,689h int 66h or ax,ax jnz @@WS ret endp ;; int CheckIn(void); // Is sound driver available? CPROC CheckIn call CheckIn ret endp Proc CheckIn near push ds ; Save ds register. push si mov si,66h*4h ; get vector number xor ax,ax ; zero mov ds,ax ; point it there lds si,[ds:si] ; get address of interupt vector or si,si ; zero? jz @@CIOUT ; exit if zero sub si,6 ; point back to identifier cmp [word si],'IM' ; Midi driver? jne @@NEX cmp [word si+2],'ID' ; full midi driver identity string? jne @@NEX ;; Ok, a MIDI driver is loaded at this address. mov ax,701h ; Digitized Sound capabilities request. int 66h ; Request. or ax,ax ; digitized sound driver available? jnz @@OK ; yes, report that to the caller. jz @@CIOUT ; exit, sound driver not available. @@NEX: cmp [word si],454Bh ; equal? jne @@CIOUT ; exit if not equal cmp [word si+2],4E52h ; equal? jne @@CIOUT @@OK: mov ax,1 @@EXT: pop si pop ds ret @@CIOUT: xor ax,ax ; Zero return code. jmp short @@EXT endp ends end[LISTING TWO]
/* Bit flags to denote audio driver capabilities. */ /* returned by the AudioCapabilities call. */ #define PLAYBACK 1 // Bit zero true if can play audio in the background. #define MASSAGE 2 // Bit one is true if data is massaged. #define FIXEDFREQ 4 // Bit two is true if driver plays at fixed frequency. #define USESTIMER 8 // Bit three is true, if driver uses timer. typedef struct { char far *sound; // address of audio data. unsigned int sndlen; // Length of audio sample. int far *IsPlaying; // Address of play status flag. int frequency; // Playback frequency. } SNDSTRUC; int far DigPlay(SNDSTRUC far *sndplay); // 688h -> Play 8 bit digitized sound. int far SoundStatus(void); // 689h -> Report sound driver status. void far MassageAudio(SNDSTRUC far *sndplay); // 68Ah -> Preformat 8 bit digitized sound. void far DigPlay2(SNDSTRUC far *sndplay); // 68Bh -> Play preformatted data. int far AudioCapabilities(void); // 68Ch -> Report audio driver capabilities. int far ReportSample(void); // 68Dh -> Report current sample address. void far SetCallBackAddress(void far *proc); // 68Eh -> Set procedure callback address. void far StopSound(void); // 68Fh -> Stop current sound from playing. void far *far ReportCallbackAddress(void); // 691h -> report current callback address. /* Support routines */ void far WaitSound(void); // Wait until sound playback completed. int far CheckIn(void); // Is sound driver available? 0 no, 1 yes.
Copyright © 1993, Dr. Dobb's Journal
Terms of Service | Privacy Statement | Copyright © 2024 UBM Tech, All rights reserved.