Channels ▼

Al Williams

Dr. Dobb's Bloggers

Slow Fourier Transform

July 29, 2013

I promised to examine the Raspberry Pi as a digital signal processing platform. Thanks to the Portaudio library, getting audio data in and out is super easy. The challenge is that the Pi isn't exactly a super fast computer. I started out reading audio data from a microphone and last time you saw code that generates TouchTone sounds programmatically. Soon, I want to try to decode the same tones.

More Insights

White Papers

More >>

Reports

More >>

Webcasts

More >>

Forget the Pi for a minute. How do you decode tones? Portaudio is great for reading and producing audio data, but it doesn't do much for detecting tones. There are two common (and related) techniques for detecting tones in audio data. The Fourier transform takes time domain data and converts it to frequency domain data.

What exactly does that mean? When you read a signal you typically sample its amplitude at different times. That is a time domain data set. A frequency domain data set represents the amount of energy in the signal at different frequencies. In simple terms, a 1kHz tone looks like a sine wave in the time domain, but looks like a single spike (at 1kHz) in the frequency domain. The Fourier transform can convert time domain signals to frequency domain signals and vice versa.

The math behind this is a bit complex (in fact, it uses complex math, but the pun was unintentional). You can find whole books on the subject (I suggest Steven Smith's book, Digital Signal Processing for Engineers and Scientists). The math may be tough, but translating it into practice is even worse. In practice, you don't have a nice continuous signal. A real piece of software gets a list of measurements at discrete times. This requires a specialized form of the Fourier transform known as the discrete Fourier transform or DFT. The other problem is you want an algorithm that is computationally efficient. A common way to do this is to employ the algorithm known as the fast Fourier transform (FFT).

There is an even faster method if you only need to determine the presence of a few frequencies and you don't care about the entire spectrum of frequency data. This method is the Goertzel algorithm. It is a simplified version of the Fourier transform.

In truth, I didn't want to write either algorithm — I wanted to use something off the shelf. There are several options for FFT libraries, including the amusingly named Fastest FFT in the West. However, I decided to go with the GNU Scientific Library (GSL). This is available in the Pi's package manager (install libgsl0-dev using apt-get).

The library is very easy to use. The version of the FFT that this project will use is gsl_fft_real_radix2_transform. This version of the FFT takes an array of float values that has to be a power of two in size. When complete, the buffer will contain frequency domain data where the real part is in the first half of the buffer and the imaginary part is in the second half (the result of the operation is a complex number with a real and imaginary part).

For a given sampling frequency, the only meaningful data will be at half that frequency. That's why the library routine places its output in half of the buffer. The data that would be in the top half of the buffer is meaningless. Each element in the array (upon output) represents a frequency and is sometimes known as a bucket. Since there are only discrete frequencies, the bucket really represents a range of frequencies.

Suppose you have a sampling rate of 10,000 Hz and there are 1,000 buckets. Each bucket will represent 10 Hz. Of course, in the time domain, the entire array represents 100 milliseconds. You can see there is a trade-off between sample rate, number of samples, response time, and frequency resolution. In addition, longer buffers will take more processing time.

Modern C compilers have support for complex numbers (that is, numbers with real and imaginary parts). If you include complex.h, you can define variables as complex (for example, complex double). To initialize these variables, you can use the predefined I constant (I is traditionally used to represent the square root of negative 1). For example:

complex double c0=fftbuf[tonemap[i]]+fftbuf[tonemap[i]+BUFSIZE/2]*I;

The job should be simple: Grab audio data from Portaudio, call the FFT routine, and examine the results for the tone frequencies of interest. There are a few wrinkles, of course. Portaudio only returns floats, but the GSL routine expects an array of doubles. But the basic scheme is just that easy.

To help manage the trades of sample rate and buffer sizes, I created a simple spreadsheet. There are several key points to the spreadsheet. The GSL FFT requires a power of two for the FFT size. That means it has to be a number like 256, 512, or 2048. The sound card won't have an endless supply of sampling rates, either. A common one is 44100 Hz. However, this is overkill for reading just a few tones, all under 1500Hz. Even worse, it means that the frequency resolution will be reduced for a given buffer size.

Realistically, you want to make sure the bucket column for each tone is more than one away from its neighbors. If you have two buckets that are the same, the frequency resolution is too coarse to be usable.

Of course, it is easy to pick a large buffer size to get good resolution. The problem then becomes the total time calculation. To detect tones that last, say, 100 milliseconds, the total time for each packet needs to be less than 50 milliseconds. So you have to balance.

A few random experiments showed that with my hardware, a sample rate of 8000 and a sample size of 256 would probably be my best bet. The frequency resolution is adequate (although not great). The time each data set represents would be a little more than 30 milliseconds.

I've made it sound all rather simple, and it isn't too hard. But, as always, the devil is in the details. Next time, I'll show you some practical code that uses these ideas to decode tones.

Related Reading






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Video