Real-Time Enough

Robert presents a real-time caller-ID implementation that is part of a voice-activated call monitoring system he built.


June 08, 2007
URL: http://www.drdobbs.com/embedded-systems/real-time-enough/199902685

Robert is an independent consultant specializing in real-time and embedded systems. He can be contacted at [email protected] or http://www.parse.com/.


Caller ID

Caller ID, the feature that tells you who is calling on the telephone, arrives as a Frequency Shift Keying (FSK) signal between the first and second ring. It operates at 1200 baud, using the Bell-202 standard, which is based on 1200 Hz (a "mark," representing a 1) and 2200 Hz (a "space," representing a 0) tones. It's a serial transmission, with a start bit, 8 data bits, no parity, and a stop bit. At the data level, it consists of packets of information with checksums.

In this article, I discuss a real-time caller-ID implementation that is part of a voice-activated call monitoring system I built. In particular, I discuss how the caller-ID part of the solution has evolved from a hardware FSK modem in the early 1990s to the software FSK modem in use today. While my particular implementation is based on FreeBSD 5.3, there is nothing operating-system specific here.

My hardware consists of an adapter for a phone line (Figure 1) that converts the telephone signal into something that I can feed into a sound card's input. Because I have two lines, I have two such circuits in my system. I've conveniently used the sound card's left channel for one line, and the right channel for the other.


Figure 1: Telephone line adapter.

Requirements

My original requirement for the hardware was to use it as a simple, voice-activated recorder to record phone calls. Later, I realized that I could do the caller-ID decoding as part of the voice recorder's processing, thus freeing up two serial ports (used by the hardware caller-ID boxes). (Soon, I will be doing call-progress tone detection and DTMF decoding, but that's currently left as an exercise for the reader.)

Figure 2 presents the big picture of the system. Both lines go into a sound card and are sampled at 8 kHz. The samples then go into the voice-activated recorder, which listens for activity. When it detects activity (as defined by the signal levels going above a certain threshold, for a certain period of time), it begins recording. While it's recording, it feeds samples into the software caller-ID FSK modem (there's no point in feeding samples through the modem if there's no signal).

Figure 2: The big picture.

Each sample is run through the FSK modem, and if a zero or a one is detected, the bit is then sent into a software UART ("Universal Asynchronous Receiver Transmitter," but we're using just the receiver part). The software UART looks for the start bit, and when it gets it, accumulates eight more bits, and constructs the byte. The bytes are then accumulated in a buffer. When a sufficient number of bytes have been accumulated, the buffer is passed to the caller-ID event decoder, which analyzes the buffer for caller-ID information (date, time, phone number, caller's name, message waiting, and so on), and stores the information. A logger is waiting for information from the caller-ID event decoder, and logs the information to a text file.
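As an aside, the kind of sanity check the caller-ID event decoder applies to an accumulated buffer looks roughly like the sketch below. It assumes the Bellcore packet layout (a message-type byte, a length byte, the data, and then a checksum byte chosen so that the whole packet sums to zero modulo 256); the function name and buffer handling are illustrative, not the actual decoder:

// rough validity check for an accumulated caller-ID packet (illustrative)
static int cid_packet_ok (const unsigned char *buf, int nbytes)
{
    unsigned char   sum = 0;
    int             i;

    // buf [0] is the message type, buf [1] the number of data bytes;
    // type + length + data + checksum makes buf [1] + 3 bytes in total
    if (nbytes < 3 || nbytes != buf [1] + 3) {
        return (0);
    }

    // the checksum byte is chosen so the whole packet sums to zero (mod 256)
    for (i = 0; i < nbytes; i++) {
        sum += buf [i];
    }
    return (sum == 0);
}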

Now turn to the FSK modem part of the system, keeping in mind the constraints affecting the real-time processing of the samples.

A sample rate of 8 kHz means that a sample arrives every 125 microseconds. In the past, before soundcards, an A/D (analog-to-digital) converter would simply present the digital value of the sample when the conversion was finished. This meant that something had to happen every 125 microseconds. The constraint here is that the system had to provide a reliable, preemptive scheduler that would allow a task to be scheduled every 125 microseconds.

Coupled with the fact that something had to happen fairly frequently, the next issue was how much actual work had to happen. Do you perform the FSK calculations or do you simply buffer the data? If you buffered the data, meaning you've deferred the FSK calculations until later, then the only way you could perform the FSK work would be with a reliable preemptive scheduler that interrupts your FSK calculations in order to store subsequent samples.

However, times have changed. Today's sound cards buffer the samples. This means that we don't need to worry about incurring scheduling overhead (context switches) for each and every sample. The hardware simply informs the kernel (via an interrupt) that a buffer is full and should be emptied. This means that you can access blocks of samples, rather than being forced to get the samples one at a time from the hardware. The advantage is that you invoke a single kernel call to transfer an entire block of samples, (and then process them individually), rather than invoking kernel calls for each and every sample.
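To make that concrete, a minimal sketch of block-oriented capture using the OSS interface on FreeBSD might look like the following. The device path, block size, and the missing error checks on the ioctls are all simplifications; this is not the recorder's actual code:

#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/soundcard.h>

int main (void)
{
    int     fd, fmt = AFMT_S16_LE, chans = 2, rate = 8000;
    short   block [2048];       // interleaved left/right 16-bit samples
    ssize_t n;

    fd = open ("/dev/dsp", O_RDONLY);
    if (fd < 0) {
        return (1);
    }

    // configure the card: 16-bit samples, two channels, 8 kHz
    ioctl (fd, SNDCTL_DSP_SETFMT, &fmt);
    ioctl (fd, SNDCTL_DSP_CHANNELS, &chans);
    ioctl (fd, SNDCTL_DSP_SPEED, &rate);

    // one kernel call transfers an entire block of samples
    while ((n = read (fd, block, sizeof (block))) > 0) {
        // block [0], block [2], ... are line 1 (left channel);
        // block [1], block [3], ... are line 2 (right channel);
        // feed them into the recorder / FSK modem here
    }

    close (fd);
    return (0);
}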

Another constraint is the actual amount of processing required to perform the FSK algorithm. If you are right on the edge in terms of CPU processing power to perform the calculations, then adding context-switching overhead is not going to help. By eliminating (or vastly reducing) one constraint, you've bought yourself more time to perform the FSK work.

The FSK Algorithm

How much work is it? Look at the FSK algorithm itself. Mathematically speaking, you are computing the formula in Figure 3. The gist of this formula is that you take each sample and multiply it by the sine at the mark frequency, the cosine at the mark frequency, the sine at the space frequency, and the cosine at the space frequency. You add up the products in each set and square the sums. Whichever set turns out to be bigger indicates which tone you've decoded.


Figure 3: The formula.
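Based on that description, the decision rule amounts to the following (a reconstruction, since the figure itself is an image; here x_i are the buffered samples, N is the number of samples in the correlation template, and θm and θs are the per-sample phase increments of the mark and space tones, 2π·1200/8000 and 2π·2200/8000 respectively):

$$\left(\sum_{i=0}^{N-1} x_i \sin(i\theta_m)\right)^2 + \left(\sum_{i=0}^{N-1} x_i \cos(i\theta_m)\right)^2 \;\gtrless\; \left(\sum_{i=0}^{N-1} x_i \sin(i\theta_s)\right)^2 + \left(\sum_{i=0}^{N-1} x_i \cos(i\theta_s)\right)^2$$

If the left-hand (mark) side is larger, a 1 was detected; otherwise, a 0.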

To detect 1200 Hz and 2200 Hz tones, you construct a "correlation template"—a table of four normalized (meaning that they are between -1 and +1) sets of values, giving the sine and cosine of a 1200 Hz and 2200 Hz waveform sampled at the sampling rate of 8 kHz. I only require enough samples to match the slowest waveform, which is 1200 Hz in this case. One complete waveform of a 1200 Hz sine wave, sampled at 8 kHz, will require 6 2/3 samples (8000/1200=6.666...), so I can get away with just six samples. The correlation tables look like Table 1.

         MARK                  SPACE
         sine     cosine       sine     cosine
  [0]  +0.0000  +1.0000      +0.0000  +1.0000
  [1]  +0.8090  +0.5878      +0.9877  -0.1564
  [2]  +0.9511  -0.3090      -0.3090  -0.9511
  [3]  +0.3090  -0.9511      -0.8910  +0.4540
  [4]  -0.5878  -0.8090      +0.5878  +0.8090
  [5]  -1.0000  -0.0000      +0.7071  -0.7071

Table 1: Correlation tables.
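A template like this can be generated once at startup with something along these lines (a sketch; the constants and array layout match the correlates[0..3] indexing used in Listing One):

#include <math.h>

#define SAMPLE_RATE 8000.0
#define MARK_FREQ   1200.0
#define SPACE_FREQ  2200.0
#define CORRSIZE    6

static double correlates [4][CORRSIZE];

// fill in the four correlation tables (Table 1)
static void build_correlates (void)
{
    int     i;
    double  thetam, thetas;

    for (i = 0; i < CORRSIZE; i++) {
        thetam = 2.0 * M_PI * MARK_FREQ  * i / SAMPLE_RATE;
        thetas = 2.0 * M_PI * SPACE_FREQ * i / SAMPLE_RATE;
        correlates [0][i] = sin (thetam);   // mark sine
        correlates [1][i] = cos (thetam);   // mark cosine
        correlates [2][i] = sin (thetas);   // space sine
        correlates [3][i] = cos (thetas);   // space cosine
    }
}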

In Listing One, the mark waveform's sine table is at index 0 (corresponding to sin(θm) in Figure 3), its cosine at index 1 (cos(θm)), the space waveform's sine table at index 2 (sin(θs)), and its cosine at index 3 (cos(θs)).

// clear out intermediate sums
factors [0] = factors [1] = factors [2] = factors [3] = 0;

// get to the beginning of the samples
j = handle -> ringstart;

// do the multiply-accumulates
for (i = 0; i < handle -> corrsize; i++) {
    if (j >= handle -> corrsize) {
        j = 0;
    }
    val = handle -> buffer [j];
    factors [0] += handle -> correlates [0][i] * val;
    factors [1] += handle -> correlates [1][i] * val;
    factors [2] += handle -> correlates [2][i] * val;
    factors [3] += handle -> correlates [3][i] * val;
    j++;
}

Listing One

The algorithm assumes that samples are placed into the ring buffer handle -> buffer, with the variable handle -> ringstart pointing to the beginning of the buffer.

Then, each sample in the buffer is multiplied by each of the four correlates, and the values are accumulated into the factors array. (The logic dealing with j simply makes sure that j loops around the ring buffer correctly.) The four factors correspond to the four summations in Figure 3.

So far, the total amount of work is 24 multiply-accumulates, which is followed by a comparison and some buffer housekeeping.

At this point, it's fair to point out a small optimization.

Notice that the correlates at index zero are always zero (for the sine components) and one (for the cosine components), so the first pass through the loop doesn't actually need any multiplies.
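A minimal sketch of that optimization, following the structure of Listing One (same handle fields, with the zeroth sample handled before the loop), might look like this:

// seed the intermediate sums using the zeroth correlates (0 and 1)
j = handle -> ringstart;
if (j >= handle -> corrsize) {
    j = 0;
}
val = handle -> buffer [j];
factors [0] = factors [2] = 0;      // sine sums start at 0 * val
factors [1] = factors [3] = val;    // cosine sums start at 1 * val
j++;

// do the remaining multiply-accumulates
for (i = 1; i < handle -> corrsize; i++) {
    if (j >= handle -> corrsize) {
        j = 0;
    }
    val = handle -> buffer [j];
    factors [0] += handle -> correlates [0][i] * val;
    factors [1] += handle -> correlates [1][i] * val;
    factors [2] += handle -> correlates [2][i] * val;
    factors [3] += handle -> correlates [3][i] * val;
    j++;
}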

This would cut the work by four multiply-accumulates (a 17 percent savings). (Alternatively, you could do the aforementioned optimization, but keep the same number of multiply-accumulates and just extend the table. Remember, I'm using six samples where my calculations indicated 6 2/3 samples, so seven samples is okay, too.)

Finally, once all the multiply-accumulates are done for the four factors, you square and add the respective mark and space factors, and see which one is bigger:


handle -> current_bit = (factors [0] * factors [0] + factors [1] * factors [1] >
                         factors [2] * factors [2] + factors [3] * factors [3]);


Whichever set of factors is bigger indicates which frequency was detected; therefore, if you get a true value, it means you detected a mark; otherwise, you detected a space.

The algorithm just described is basically a pair of matched filters at the mark and space frequencies; the result is obtained by comparing the amplitude of the output of the two filters.

Bit-Cell Management

At a slightly higher level, you need to perform bit-cell management. What I mean by this is that we need to know where one bit ends and another bit starts. Obviously, if the bits have different values, this is easy: When the value changes, one bit has ended and another has started. However, what if there are several bits all with the same value, one after another? This is the topic of "clock recovery."

Because the data stream is arriving at 1200 baud, you know that each bit cell is 1/1200 of a second long, or 833.333... microseconds. Next you calculate how many samples (at 8 kHz) 1/1200 of a second is. Well, this is the same number I came up with earlier when I calculated how many samples I needed to store one complete waveform at 1200 Hz. It's the value 6 and 2/3. This means that every 6 2/3 samples we have the end of one bit, and the beginning of another. However, since I am working with integral samples, it means that I have to compromise a little bit, and potentially introduce drift. (Drift is introduced not only by rounding errors, but also because your sound card will not be sampling at exactly 8 kHz, nor will the phone company be transmitting at exactly 1200 baud; there will be some variation.)

The saving grace here, though, is that you are dealing with a serial protocol that has a start bit and a stop bit, which are different from each other. This means that you are guaranteed to have at least two bit reversals during an 8-bit data byte. Recall that a bit reversal (a change from a mark to a space or vice-versa) tells you the definitive location of a bit-cell boundary. You therefore use those for synchronization. Listing Two is the code that does that part. The value celladj is a floating-point value that indicates the fraction of a bitcell that elapses with each sample. (Yes, this could have been done with integers, but I got lazy.)

// store previous bit in preparation
handle -> previous_bit = handle -> current_bit;

// compute current bit (as above)
handle -> current_bit = (factors [0] * factors [0] + factors [1] * 
                       factors [1] > factors [2] * factors [2] + 
                       factors [3] * factors [3]);
// if there's a bit reversal
if (handle -> previous_bit != handle -> current_bit) {
    // adjust cell position to be in the middle
    handle -> cellpos = 0.5;
}

// walk the cell along
handle -> cellpos += handle -> celladj;

// gone past the end of the bitcell?
if (handle -> cellpos > 1.0) {

  // compensate
  handle -> cellpos -= 1.0;

  // tell the higher level application that we have a new bit value
  bit_outcall (handle -> current_bit);
}

Listing Two
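For what it's worth, at 1200 baud and an 8 kHz sample rate the adjustment would be initialized along these lines (an illustration, not a line from the library):

handle -> celladj = 1200.0 / 8000.0;    // 0.15 of a bit cell elapses per sample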

So far, you've accumulated a sequence of bits, and called bit_outcall with each bit as it arrived from the software modem. You now need to construct a software UART; something that looks for a start bit, accumulates 8 bits (ignores the stop bit), and calls a function to indicate that a completed data byte has arrived; see Listing Three. This implements a trivial state machine.

// if waiting for a start bit (0)
if (!handle -> have_start) {
    if (bit) {
        // ignore it, it's not a start bit
        return;
    }
    // got a start bit, reset
    handle -> have_start = 1;
    handle -> data = 0;
    handle -> nbits = 0;
    return;
}

// here, we have a start bit

// accumulate data
handle -> data >>= 1;
handle -> data |= 0x80 * !!bit;

handle -> nbits++;

// if we have enough...
if (handle -> nbits == 8) {
    // ship it to the higher level
    byte_outcall (handle -> data);

    // and reset for next time
    handle -> have_start = 0;
}
Listing Three

The state machine begins the operation with have_start set to zero, indicating that you have not yet seen the start bit. A start bit is defined as a zero, so while you receive 1s, you simply ignore them. Once you do get a zero, you set have_start to a 1, clear the data byte that you'll accumulate, and reset the number of bits that you've received to zero. As new bits come in, you shift them into place (using my favorite bit of obfuscated C, the "Boolean typecast operator," or "!!"), and bump the count of received bits. When the count reaches eight, you've received all of the bits you need to declare a completed byte, and you ship the completed data off to a higher level function (byte_outcall) and reset the have_start state variable back to zero to indicate that you are once again looking for a start bit.

If you are into such things, you could check to see that there is indeed a stop bit present. If there isn't a stop bit, you should declare a frame error in your serial transmission. I didn't bother for this application, but there is some commented-out code in the library that serves as a starting point. At this point, you have bytes coming from a sound card—pure magic.
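A sketch of what such a check might look like (this is not the library's commented-out code; frame_error_outcall is a hypothetical callback), replacing the accumulate-and-ship portion of Listing Three:

// here, we have a start bit

if (handle -> nbits < 8) {
    // accumulate data
    handle -> data >>= 1;
    handle -> data |= 0x80 * !!bit;
    handle -> nbits++;
    return;
}

// the ninth bit should be the stop bit (a 1)
if (bit) {
    // ship it to the higher level
    byte_outcall (handle -> data);
} else {
    // no stop bit -- declare a framing error (hypothetical callback)
    frame_error_outcall ();
}

// and reset for next time
handle -> have_start = 0;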

What does this have to do with real time? Well, there's a lot of computation happening:

8000 x (24 x (multiply + accumulate) + 4 multiplies + 2 accumulates) = 432,000 math operations per second

(to say nothing of the buffer management and housekeeping functions required).

So while I'd like to believe that I could do this on a late 1960s vintage PDP-8 minicomputer, the truth is that it would probably take at least an 8086-class processor to come close to the CPU requirements.

This ties in to the "real time enough" aspect. Just as certain tasks were completely out of the picture in terms of being done in software, so has the landscape changed in terms of "real-time" requirements. In 1993, I implemented the higher level caller-ID software, but relied on hardware FSK modems because it was inconceivable (to me, at least) that I could do this in software. The 1993 implementation relied on a real-time OS (QNX 4) to make sure that the CPU got allocated to the serial port handler when data arrived, and then got allocated to the high-level caller-ID software to ensure timely processing.

Today, I run the full software caller-ID package as described in this article on a free OS (FreeBSD 5.3) and I didn't even bother using the real-time scheduling features of the kernel. It barely uses enough CPU to show up as more than 0.00 percent on the CPU monitor. In short, it's "fast enough."
