C Programming: More Music Minus Whatever, Failures and Successes

Al continues development of his Music Minus Whatever project, which lets you encode three discrete logical channels of music into one stereo audio file.


December 01, 2001
URL: http://www.drdobbs.com/cpp/c-programming-more-music-minus-whatever/184404893


Al is DDJ's senior contributing editor. He can be contacted at [email protected].


When a magazine article describes a software project, it usually presents a final solution — the approach the author used that got everything working. Projects typically run for many months or years, and developers make mistakes and correct them, go down blind alleys and retreat from them, try new technologies and abandon them, and so on. You don't get to read about all the failures because the purpose of a technical paper is to explain what works, and authors usually write them after the work is completed.

I don't have that luxury. I have to write something every month. A trip down the "C Programming" column memory lane of the past 13 years illustrates how my projects often get started, developed, changed, and restarted.

What you are about to read is the account of yet another failed experiment. Or, to put it more positively, an experiment that eliminates proposed solutions to a problem. In that context, every experiment is a success: by identifying what does not work, or at least what I have not yet gotten to work, it points the project toward other, hopefully successful, approaches.

I'll also tell you about what I finally decided to do, which, I hope, will work. But I don't know yet. It's called research.

MMW

I'm not sure I can call my current project "Music Minus Whatever" without stirring up the lawyers at http://www.musicminusone.com/, but so far, it's only a research project and not really a product. Last month I described a process that encodes three discrete logical channels of music into one stereo audio file, which, of course, has only two physical channels. I developed this approach to support MMW, a project that produces training files from which music students can suppress the instrument they are studying and play along with professional studio musicians recorded in the other channels.

Using the three logical channel approach, if I want to record a six-piece band, I'll need two audio files and a way to play them both at once, eliminating the specific logical channel of the instrument to be suppressed. That's not much more efficient than simply storing one-channel files for each of the instruments and mixing in only the ones the student wants to hear. Furthermore, the mix could be a mess. Each instrument is either extreme left, extreme right, or dead center. Twiddling the samples lets you eliminate one of three such logical channels, but so far I've found no way to isolate each channel and remix them all across the stereo spectrum. Instead, MMW simply mixes all the remaining logical channels into a single monaural signal.
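That mono mixdown is trivial in code. Here is a hypothetical sketch of my own, not the project's actual code; it assumes the usual interleaved buffer of 16-bit samples and that the two physical channels are already in phase:

// Mix both physical channels to mono and write the result to both
// outputs. Averaging rather than summing keeps the result within
// the 16-bit sample range.
for (int i = 0; i < numSamples; i += 2)  {
    int mono = (*(buffer+i) + *(buffer+i+1)) / 2;
    *(buffer+i) = *(buffer+i+1) = (short)mono;
}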

Not completely happy with that solution, I set out to develop a process for encoding more than three logical channels into two physical channels. I almost got it working.

I said last month that I need to test these theories with real musicians rather than simple "test, one, two, three" voice files. Since I am a real musician and play four instruments (piano, trumpet, trombone, and bass), I can be my own quartet. If I need a quintet, I can add a MIDI-generated drum track. Any more than that, and I'll have to phone a friend.

As with last month's project, I recorded each channel independently as a monaural channel with no cross-over audio from the other instruments, except that this time I made five recordings instead of only three. I recorded each one by using a microphone plugged into my laptop. In a studio, you would either record all the players at once, isolated in booths with headphones, or record them separately at different times, as I did. Each successive recording artist plays while listening through headphones to what was previously recorded. The first one in listens only to an electronic metronome.

Recall also from last month that the algorithm for eliminating the logical center channel uses the values in each sample from the two physical channels to cancel out those parts of the two signals that are equal. Listing One illustrates that algorithm. The right and left samples are adjacent in most audio file formats. Zeroing out only the samples that are exactly equal in both channels does not work. Because every sample is the sum of all the waveforms in the mix, enough of the original center channel signal shares its sample values with the other waveforms — sorry, but I don't know a more scientific way to say that — that you don't really eliminate much of the center channel at all.
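For the record, here is roughly what that naive approach looks like. It's a hypothetical sketch, not code from the project, using the same interleaved buffer conventions as this month's listings:

// Naive center removal: zero only the sample pairs that match
// exactly. This does NOT work; because every sample is a sum of
// all the waveforms in the mix, the two channels rarely match
// exactly even where the center channel dominates.
for (int i = 0; i < numSamples; i += 2)  {
    if (*(buffer+i) == *(buffer+i+1))
        *(buffer+i) = *(buffer+i+1) = 0;
}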

Listing One works fine when both channels are sent to independent stereo outputs. But in a recent demonstration of vocal elimination, I had only a single monaural amplifier in a typical PA system setup, so I used a Y-connector to connect both sides to the amplifier's line input. This created a problem. After Listing One runs, the two channels are exact inverses of one another, so the remaining signals cancel one another out and the output volume is, consequently, far too low. To correct that condition, I added a line of code that inverts one of the channels so that the two are back in phase. Listing Two shows this process. Curiously, none of the center channel removal programs I downloaded and tested take this behavior into account.

More Than Three Logical Channels?

If you can eliminate a center channel by using Listing One's algorithm, perhaps you can eliminate an off-center channel by panning the whole waveform to put the off-center channel into the center. Makes sense, but does it work? Let's see.

To identify multiple logical channels in a stereo mix, I mixed each individual instrument's channel to position it at a specific location in the stereo spectrum. I used Cool Edit Pro (http://www.syntrillium.com/) for this process. It is a great program that any audio engineer should have in the toolbox. Too bad it works only under Windows. I wish there were a Linux version.

With five unique mono audio files to mix, I opened each one in Cool Edit, copied its waveform to the clipboard, created a new, empty stereo audio file, and pasted the one-channel signal into the new two-channel file. Then I used Cool Edit's amplitude adjustment feature to increase one stereo channel's amplitude and decrease the other stereo channel's amplitude in each file by a percentage unique to the file. I increased the far left logical channel's left physical channel amplitude by 50 percent and decreased its right physical channel amplitude by 50 percent. The middle left channel is panned to +30 and -30 percent, the center channel is unchanged, the middle right channel is -30 and +30, and the far right channel is -50 and +50.
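In code, that amplitude adjustment amounts to scaling each physical channel by a fixed percentage. Here is a hypothetical sketch of the encoding step; the loop and its pan parameter are mine, not Cool Edit's, and they assume a stereo buffer holding two identical copies of the mono signal:

// Position a mono instrument in the stereo spectrum. pan is a
// percentage: +50 boosts the left channel by 50 percent and cuts
// the right by 50 percent (far left); -50 is far right; 0 leaves
// the instrument in the center.
for (int i = 0; i < numSamples; i += 2)  {
    int lf = *(buffer+i);
    int rt = *(buffer+i+1);
    *(buffer+i)   = min(max((lf * (100+pan))/100, -32768), 32767);
    *(buffer+i+1) = min(max((rt * (100-pan))/100, -32768), 32767);
}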

Cool Edit Pro supports multiple track sessions, so I added each of my panned files to a session and mixed the five-track session down to a single stereo file. When you play that file, it sounds like a well-balanced stereo performance. The difference between this recording and a typical stereo recording from a studio is that the instruments are precisely mixed at specific places in the stereo spectrum without any audio bleed-over to other logical channels. In theory, the difference between the right and left channels identifies those parts of the mixed waveform that belong to each instrument or logical channel. (Acoustic engineers and DSP experts are probably laughing their nether regions off at this point. I am not deterred. They laughed at Thomas Edison, Alexander Graham Bell, and J.D. Hildebrand, too.)

By the way, all these audio files consume a lot of disk space. A good investment is the MP3 input/output plug-in filter for Cool Edit Pro. It'll pay for itself in hard drive savings.

Eliminating a specific logical channel involves first panning the waveform such that the logical channel to be eliminated is in the center, and then applying the center channel removal algorithm. Listing Three is the algorithm that is supposed to do that. Except that it doesn't work very well. Center channel removal is as good as it ever was, but removing any other logical channel leaves echoes of the eliminated instrument and degrades the audio quality of the remaining channel on the same side, left or right.

As with last month's project, I implemented the channel eliminator as a Winamp plug-in. I added a slider control to tweak the panning values during playback. The results were mostly the same when I tweaked each logical channel. You can download this month's plug-in project (see "Resource Center," page 5). The zipped file includes a test MP3 — a short recording of me singing the five notes of a C9 chord in succession with lyrics that identify the channel. Kind of sounds like a bad barbershop quintet.

What About FFT?

I don't know much about the Fast Fourier Transform (FFT) or even the slow one. What I do know I learned from DDJ contributing editor Tim Kientzle's book, A Programmer's Guide to Sound (Addison-Wesley, 1997). I do understand audio samples, which you need to grasp before you can see how FFT works with them. A digital audio file consists of a bunch of numbers called "samples." The samples combine to represent a waveform. An audio waveform is sampled at a fixed rate, and the procedure stores signed integers representing the waveform's value at the time of each sample. A positive number represents a positive excursion of the waveform, and a negative number a negative one. Samples for the two stereo channels are stored in adjacent integers. A CD-quality recording uses two channels of 16-bit samples sampled at 44,100 samples per second.
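To make that layout concrete, here is how a program typically walks a buffer of CD-quality samples. The fragment is mine, using the same names as the listings at the end of this column:

// 16-bit signed samples, 44,100 per second per channel, with each
// left sample immediately followed by its right sample.
for (int i = 0; i < numSamples; i += 2)  {
    short left  = *(buffer+i);      // left channel sample
    short right = *(buffer+i+1);    // adjacent right channel sample
    // ... process one stereo sample pair ...
}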

FFT converts a digitally represented waveform to a list of frequencies (expressed as cycles per second) and their amplitudes. You pass to FFT a sequence of samples, and FFT does the transformation. Inverse FFT converts the list of frequencies and amplitudes back into samples. You use sequences of a small number of samples for FFT to preserve the timing of where the frequency occurs in the overall waveform, which enables Inverse FFT to reconstruct the original waveform from the frequencies. It's all very mysterious, but it works.

FFT enables audio playback programs to implement EQ controls and to display the signal's frequency content in a real-time animated bar chart like the ones on the front panels of contemporary stereo systems.

It seemed to me that a frequency list produced by FFT could also more precisely identify the center channel. Those frequencies that are equal in Hz and amplitude during a given time period must belong to the center channel. Or so it would seem. I installed code in a Winamp plug-in that uses FFT to convert samples to frequency lists, zeroes the amplitude of frequencies that are equal in both channels and thus seem to belong to the center channel, and then uses Inverse FFT to convert the frequencies back to samples. So far, mostly what I get is a lot of bubbly sound in the audio output, even when I apply a tolerance range to test for pseudoequal frequencies.
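To make the idea concrete, here is a sketch of the frequency-domain approach. The FFT is a textbook recursive radix-2 version of my own rather than Tim's code, and the window size and tolerance are guesses you would tune by ear:

#include <complex>
#include <vector>

typedef std::complex<double> cplx;
const double PI = 3.14159265358979;

// Textbook recursive radix-2 FFT. The length must be a power of
// two. The inverse transform omits the 1/n scaling; the caller
// applies it after the top-level call.
void fft(std::vector<cplx> &a, bool inverse)
{
    size_t n = a.size();
    if (n <= 1)
        return;
    std::vector<cplx> even(n/2), odd(n/2);
    for (size_t i = 0; i < n/2; i++)  {
        even[i] = a[2*i];
        odd[i]  = a[2*i+1];
    }
    fft(even, inverse);
    fft(odd, inverse);
    double sign = inverse ? 1.0 : -1.0;
    for (size_t k = 0; k < n/2; k++)  {
        cplx w = std::polar(1.0, sign * 2.0 * PI * k / n) * odd[k];
        a[k]     = even[k] + w;
        a[k+n/2] = even[k] - w;
    }
}

// Transform a window of each channel, zero the bins whose magnitudes
// are pseudoequal in both channels (they presumably belong to the
// center channel), and transform back. Any samples left over after
// the last full window are not processed.
void removeCenter(short *buffer, int numSamples)
{
    const size_t N = 1024;          // samples per window: a guess
    const double tolerance = 0.05;  // 5 percent magnitude margin: a guess
    for (int base = 0; base + (int)(2*N) <= numSamples; base += 2*N)  {
        std::vector<cplx> lf(N), rt(N);
        for (size_t i = 0; i < N; i++)  {
            lf[i] = buffer[base + 2*i];         // interleaved left samples
            rt[i] = buffer[base + 2*i + 1];     // interleaved right samples
        }
        fft(lf, false);
        fft(rt, false);
        for (size_t k = 0; k < N; k++)  {
            double ml = std::abs(lf[k]);
            double mr = std::abs(rt[k]);
            double big = ml > mr ? ml : mr;
            if (big > 0.0 && ml - mr < tolerance * big
                          && mr - ml < tolerance * big)
                lf[k] = rt[k] = 0.0;            // treat the bin as center
        }
        fft(lf, true);                          // back to the time domain
        fft(rt, true);
        for (size_t i = 0; i < N; i++)  {
            buffer[base + 2*i]     = (short)(lf[i].real() / N);
            buffer[base + 2*i + 1] = (short)(rt[i].real() / N);
        }
    }
}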

Part of the problem might be the tools. The FFT that Tim implements uses Standard C++'s std::complex<double> template class and lots of floating-point math, so it is not really a fast FFT. Kind of half-fast (rim shot). Not fast enough to do its conversion in real time on a mainstream computer, at least. Further experimentation means I need to find a faster, perhaps an integer-based, implementation of FFT in source code. They're all over the Internet, and I'm poring over several of them now, but the results so far are not encouraging.

But maybe I don't really need to do all this after all. Remember that the MMW project's objective is to produce ensemble recordings from which users can selectively eliminate one instrument at a time. An MP3 of acceptable quality (for these purposes) can represent a one-channel, three-minute performance in about half a megabyte. Given six instruments in the ensemble, you can cram over 200 such selections onto a CD-ROM if you record each instrument on its own track for each selection. But with last month's technique of putting three logical channels into two physical channels, you increase that number to about 300 selections. Either way, it's far more capacity than a teaching tool needs or can reasonably afford. Most commercial audio CDs don't use the full 74 minutes of recording space because the production costs are prohibitive when compared to the return. Studio time, musicians' pay, and royalties keep the performances down to the number of tunes typically found on the old-time traditional 33 1/3 RPM vinyl LP.
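To check those capacity numbers, assume a 650MB disc and that a stereo MP3 runs about twice the size of a mono one: six half-megabyte mono files per selection is 3MB, or about 216 selections; two stereo files per selection is about 2MB, or roughly 325.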

The Final Solution?

Using discrete physical channels to record six instruments sounds good, but will it work? Six discrete channels means three stereo files, and decoding and mixing three stereo MP3s for real-time playback might be a little taxing for a mainstream PC. Such a solution might have to wait a year or two for double-digit GHz CPUs. However, single-file MP3 decoding works fine on old Pentiums, and contemporary machines are a lot faster, so mixing two files in real time will probably work. Consequently, the MMW experiments have retreated to last month's approach. To solve the mixing problem, I'll mix both files after channel elimination down to a mono signal, then mix the two mono signals into a simple stereo spread, which ought to sound pleasing enough. Listing Four is the code for the mixing algorithm.

In Search of an MP3 Decoder

Of course, I need to be able to decode MP3s. Until now, I've written plug-in modules for media players to test my DSP algorithms. But the players have done the decoding for me and, of course, none of them know how to decode and mix two MP3 files because nobody has ever wanted to do such a nutty thing before.

My search for a Win32 MP3 decoder in source code turned up a plug-in for Cool Edit, written before Syntrillium offered its MP3 file filter plug-in as a commercial product. I originally found the filter source code at Cool Edit's web site, but it is no longer available there. The Syntrillium commercial MP3 filter is the better choice for day-to-day use because it is a lot faster and also does encoding, but the earlier one is better for my needs: not only is it available in source code, but it is written in C++, which makes it easier to work with than the other plug-ins I found, most of which are written in C.

One other advantage to the free decoder: The commercial one won't process some MP3 formats, particularly MP3 files built by RealJukebox (http://www.real.com/). The commercial filter fails to find the file header, so Cool Edit assumes the file is raw PCM, which produces mostly noise. The problem is in Syntrillium's commercial plug-in itself and not in the file format, because every other MP3 decoder, including Syntrillium's older free one, properly processes these files. Syntrillium is aware of the problem and will fix it in the next release.

Syntrillium's free plug-in is based on an earlier program called "MAPLAY," an open-source generic MP3 playback program by Jeff Tsay, who based his work on a decoder published by the Moving Picture Experts Group (ftp://ftp.tnt.unihannover.de/pub/MPEG/audio/mpeg2/software/technical_report/). The original decoder and encoder are written in C with absolutely no optimization whatsoever, and they are slow. The decoder does not produce audio; it produces a file of raw samples. Anyone wanting to use this code in a player has to optimize it to decode fast enough to play the samples in real time and has to put audio-playing code on the back end. That's what MAPLAY does.

I went looking for the MAPLAY source code, but for a while I couldn't find it. Many MP3-oriented Internet sites link to MAPLAY's web site, but they all point to the same dead page at berkeley.edu. Either Jeff graduated and moved on, or the proliferation of media player programs made his program unnecessary and he shut down its web site. After an extended search, I found the source code at another site. It compiles with Borland's C++ 5.x compiler, so I had to search my archives for an old copy of that compiler and install it. MAPLAY compiles and runs fine. The distribution includes a GUI version with a lot of extra features I don't need (WAV and CD-ROM playback, for example) and a command-line version with minimal features.

The MPEG code carries no license announcements about its reuse, but there are restrictions on how you may use it. MAPLAY was released under the GNU GPL. I haven't really looked into all the legalities, so I don't yet know what anyone can and cannot do with this code. For now, it supports only my research project, so I won't worry about licenses for a while.

sndconfig: Just Say No

Every now and then, you run into a widely distributed and widely used program that is so stupid it defies belief. Linux includes one such configuration tool, available only to the root user. This program probes the hardware for a sound card and configures the card when it finds one it recognizes. The program, named "sndconfig," is evil; here is why.

My laptop includes sound circuitry identified as Cirrus Logic Crystal CS4281 PCI Audio. Mandrake 8.0's installation procedure properly identifies and configures this device. Recently I decided to upgrade my desktop's sound card, and I wanted something that Linux 2.4 supports, so I ran sndconfig on my laptop just to view the list of supported cards and know which one to buy. The probe found the CS4281 but decided there was something wrong with the hardware and could not complete the configuration of the sound card, which, up until then, had been working just fine. Apparently, during its probe, sndconfig modified some internal system configuration files that describe the sound system. Here's the first stupid, evil part: There is no way to tell sndconfig to ignore everything it's done and return the system to its original state. Once you run it, the darn program is going to have its way with your system whether you like it or not. It decided that I do not have a configurable sound card. The procedure effectively disabled audio under Linux. A subsequent reboot restored it, and the next reboot lost it again. If I want sound, I have to reboot at least twice, sometimes more often.

Harddrake, the Mandrake GUI hardware configuration utility, is no help. It senses the CS4281 but doesn't include it in the list of devices it configures. The sndconfig man pages name the files it mungs (although one of them does not exist on my system), and I banged around on them a bit, but their cryptic formats and contents are undocumented, and the problem persists.

There is a --noprobe option that supposedly suppresses the probe, but guess what. It probes anyway. If it finds an unsupported sound card, it exits without letting you see the list of supported sound cards. Useless.

It will be argued that because sndconfig and the files it modifies are available only to those with root privileges, others should keep out. Those who make that argument forget that Linux is marketed in pretty boxes as a mainstream desktop single-user system. Even though it is a UNIX knock-off, Linux was originally developed to be run on systems that are typically single-user — the PC x86 platform. Now with mature KDE and GNOME GUIs, Linux strains to be a Windows wanna-be. Single user. Desktop. The only user of a typical Linux system can have root privileges whenever he or she wants them. You can buy Mandrake and RedHat Linux distributions at most Staples office supply stores where anyone is allowed inside. The boxes' lists of system requirements do not include an entry saying you must have a system administrator on staff, and typical users won't understand those arcane configuration files found in /etc.

Every time I read where someone fervently believes that Linux is on the brink of becoming a mainstream desktop OS, I have to laugh.

The moral? Don't write stupid programs like sndconfig. Don't let your programs do permanent things your users cannot easily undo. And, above all else, don't use sndconfig. Unless, of course, you want a really quiet computer. Or, if you do use sndconfig, use its --noprobe command-line option, which I learned about too late to prevent sndconfig from doing its dastardly irreversible deeds.

Next Month, Plug It In

All this noodling with MP3 players, plug-ins, and source code taught me that I don't know what I'm doing when it comes to digital signal processing. But I learned how to write plug-in modules for audio programs, and next month's column discusses those experiments.

DDJ

Listing One

// Center channel elimination: subtract each channel from the other.
// Afterward, the left channel holds L-R and the right holds R-L.
for (int i = 0; i < numSamples; i += 2)  {
    short int s = *(buffer+i);          // save the original left sample
    *(buffer+i) -= *(buffer+i+1);       // left = L - R
    *(buffer+i+1) -= s;                 // right = R - L
}


Listing Two

// Listing One plus a final inversion that puts the right channel
// back in phase with the left, so a mono sum does not cancel.
for (int i = 0; i < numSamples; i += 2)  {
    short int s = *(buffer+i);          // save the original left sample
    *(buffer+i) -= *(buffer+i+1);       // left = L - R
    *(buffer+i+1) -= s;                 // right = R - L
    *(buffer+i+1) *= -1;                // invert: right = L - R
}


Listing Three

// pan is a percentage expressed as an int (e.g., 10 = 10%). It
// increases the left side by pan and decreases the right side by
// pan; pan is positive for rightward panning, negative for leftward.
for (int i = 0; i < numSamples; i += 2)  {
    int lf = *(buffer+i);
    int rt = *(buffer+i+1);
    lf = (lf * (100+pan))/100;          // pan the target channel to center
    rt = (rt * (100-pan))/100;
    *(buffer+i)   = min(max((rt-lf), -32768), 32767);       // clamp to short range
    *(buffer+i+1) = (min(max((lf-rt), -32768), 32767)) * -1;
}


Listing Four

for (int i = 0; i < numSamples; i += 2)  {
    int lf1 = *(buf1+i);            // first file's left and right samples
    int rt1 = *(buf1+i+1);
    int lf2 = *(buf2+i);            // second file's left and right samples
    int rt2 = *(buf2+i+1);
    // do the channel elimination
    // ...
    // mix the two files' left channels and right channels
    int mixleft  = lf1 + lf2;
    int mixright = rt1 + rt2;
    // cross-feed a fraction of each side into the other to build
    // the stereo spread with a center channel
    const int balance = 4;          // experiment for best balance
    int outleft  = mixleft + mixright / balance;
    int outright = mixright + mixleft / balance;
    *(audiobuf+i)   = min(max(outleft,  -32768), 32767);    // clamp to short range
    *(audiobuf+i+1) = min(max(outright, -32768), 32767);
}

