Slow Fourier Transform
I promised to examine the Raspberry Pi as a digital signal processing platform. Thanks to the Portaudio library, getting audio data in and out is super easy. The challenge is that the Pi isn't exactly a super fast computer. I started out reading audio data from a microphone and last time you saw code that generates TouchTone sounds programmatically. Soon, I want to try to decode the same tones.
White PapersMore >>
- Architecting Private and Hybrid Cloud Solutions: Best Practices Revealed
- Developing a User-Centric Secure Mobile Strategy: It's in Reach
Forget the Pi for a minute. How do you decode tones? Portaudio is great for reading and producing audio data, but it doesn't do much for detecting tones. There are two common (and related) techniques for detecting tones in audio data. The Fourier transform takes time domain data and converts it to frequency domain data.
What exactly does that mean? When you read a signal you typically sample its amplitude at different times. That is a time domain data set. A frequency domain data set represents the amount of energy in the signal at different frequencies. In simple terms, a 1kHz tone looks like a sine wave in the time domain, but looks like a single spike (at 1kHz) in the frequency domain. The Fourier transform can convert time domain signals to frequency domain signals and vice versa.
The math behind this is a bit complex (in fact, it uses complex math, but the pun was unintentional). You can find whole books on the subject (I suggest Steven Smith's book, Digital Signal Processing for Engineers and Scientists). The math may be tough, but translating it into practice is even worse. In practice, you don't have a nice continuous signal. A real piece of software gets a list of measurements at discrete times. This requires a specialized form of the Fourier transform known as the discrete Fourier transform or DFT. The other problem is you want an algorithm that is computationally efficient. A common way to do this is to employ the algorithm known as the fast Fourier transform (FFT).
There is an even faster method if you only need to determine the presence of a few frequencies and you don't care about the entire spectrum of frequency data. This method is the Goertzel algorithm. It is a simplified version of the Fourier transform.
In truth, I didn't want to write either algorithm — I wanted to use something off the shelf. There are several options for FFT libraries, including the amusingly named Fastest FFT in the West. However, I decided to go with the GNU Scientific Library (GSL). This is available in the Pi's package manager (install libgsl0-dev using apt-get).
The library is very easy to use. The version of the FFT that this project will use is gsl_fft_real_radix2_transform. This version of the FFT takes an array of float values that has to be a power of two in size. When complete, the buffer will contain frequency domain data where the real part is in the first half of the buffer and the imaginary part is in the second half (the result of the operation is a complex number with a real and imaginary part).
For a given sampling frequency, the only meaningful data will be at half that frequency. That's why the library routine places its output in half of the buffer. The data that would be in the top half of the buffer is meaningless. Each element in the array (upon output) represents a frequency and is sometimes known as a bucket. Since there are only discrete frequencies, the bucket really represents a range of frequencies.
Suppose you have a sampling rate of 10,000 Hz and there are 1,000 buckets. Each bucket will represent 10 Hz. Of course, in the time domain, the entire array represents 100 milliseconds. You can see there is a trade-off between sample rate, number of samples, response time, and frequency resolution. In addition, longer buffers will take more processing time.
Modern C compilers have support for complex numbers (that is, numbers with real and imaginary parts). If you include complex.h, you can define variables as complex (for example, complex double). To initialize these variables, you can use the predefined I constant (I is traditionally used to represent the square root of negative 1). For example:
complex double c0=fftbuf[tonemap[i]]+fftbuf[tonemap[i]+BUFSIZE/2]*I;
The job should be simple: Grab audio data from Portaudio, call the FFT routine, and examine the results for the tone frequencies of interest. There are a few wrinkles, of course. Portaudio only returns floats, but the GSL routine expects an array of doubles. But the basic scheme is just that easy.
To help manage the trades of sample rate and buffer sizes, I created a simple spreadsheet. There are several key points to the spreadsheet. The GSL FFT requires a power of two for the FFT size. That means it has to be a number like 256, 512, or 2048. The sound card won't have an endless supply of sampling rates, either. A common one is 44100 Hz. However, this is overkill for reading just a few tones, all under 1500Hz. Even worse, it means that the frequency resolution will be reduced for a given buffer size.
Realistically, you want to make sure the bucket column for each tone is more than one away from its neighbors. If you have two buckets that are the same, the frequency resolution is too coarse to be usable.
Of course, it is easy to pick a large buffer size to get good resolution. The problem then becomes the total time calculation. To detect tones that last, say, 100 milliseconds, the total time for each packet needs to be less than 50 milliseconds. So you have to balance.
A few random experiments showed that with my hardware, a sample rate of 8000 and a sample size of 256 would probably be my best bet. The frequency resolution is adequate (although not great). The time each data set represents would be a little more than 30 milliseconds.
I've made it sound all rather simple, and it isn't too hard. But, as always, the devil is in the details. Next time, I'll show you some practical code that uses these ideas to decode tones.