Embedded Systems

Device Drivers & Real-Time Systems

By Robert Krten, October 01, 1998

Robert examines two radically different device drivers and their implementation under QNX Software's QNX 4 real-time operating system.


Robert is a consultant specializing in real-time systems design and development work. You can contact him at http://www.parse.com/ or [email protected].

What makes a real-time operating system "real time" is its ability to respond to events in a deterministic (and, hopefully, fast) manner. The amount of time this response takes is referred to as "latency" and is fundamental to any operating system that claims to be real time. Alongside latency, real-time systems need timing services from the operating system that are accurate enough for the job at hand. Timing accuracy is often taken for granted, and neglecting it can come back to haunt you once the initial design has been completed. This is particularly true when it comes to writing device drivers for real-time operating systems.

In this article, I'll examine two radically different device drivers and their implementation under QNX Software's QNX 4 real-time operating system. QNX is a real-time, extensible, POSIX-certified operating system based on a microkernel and optional cooperating processes. This architecture lets you scale QNX down for embedded systems, or up for large-scale systems with hundreds of processors. The QNX microkernel handles process creation, memory management, and timer control.

The first driver I'll examine communicates with a BSR TW-523 X-10 controller to provide access to various X-10 modules I have around the house. These modules let me perform functions such as controlling lights and appliances, by using existing 110-VAC wiring in the house. The second driver I'll examine is for a home-built PC soundcard, which has a unique architecture. (The source code for both device drivers is available electronically; see "Resource Center," page 3.) My focus here is to illustrate real-time device driver issues that come up in the real world. As you'll see, writing device drivers presents a number of development challenges, requiring timing and latency analysis during the design phase.

Latencies

Although there are several types of latencies, the two I'll focus on here are:

Interrupt latency, which is the amount of time that elapses from the hardware raising an interrupt, to the execution of the first instruction of the ISR.
Scheduling latency, which is the amount of time that elapses from a particular process being made ready to execute, to the execution of that process.

While both are important in real-time systems, interrupt latency is crucial because the source of the interrupt -- the hardware -- usually has no buffering. If you miss the interrupt, the data is gone. If the scheduling latency is long, however, this can (to a degree) be compensated for in the ISR. In such cases, the ISR is more complex, because it has to effectively buffer the data. In extreme cases, the ISR can get very complex, because it may have to respond to the ultimate source of the interrupt. (For example, a network card ISR may need to send back acknowledgments within a short amount of time; otherwise, the other end will time out. This means that the ISR must be intimately aware of the protocol, and perhaps have access to a lot of data structures from the corresponding process.) With a short scheduling latency, you can defer processing to the controlling process, rather than doing it in the ISR.

The X-10 Driver

The BSR TW-523 controller (X-10 controller) presents a simple interface to the PC -- it has four wires (common, TX, RX, and Zero Cross) and a 110-VAC plug. The idea is that whenever the 110-VAC changes polarity, the zero-cross line will change state. Effectively, this presents the 110-VAC line via an optically isolated square wave. According to X-10 protocol, data must be transmitted immediately after a zero crossing of the AC has been detected. This data is transmitted by asserting the TX pin for one millisecond (if transmitting a "1"), or doing nothing (if transmitting a "0"). When the TX pin is asserted, the X-10 controller generates a 120-kHz carrier on the AC line. Other devices listening on the AC line synchronize their reception of this carrier to the zero crossing.

There are two main software challenges in interfacing with X-10 devices -- the response time required upon detection of a zero crossing, and the accuracy of the one millisecond pulse that needs to be generated. When I built the hardware interface for the controller, I chose to use a standard RS-232 serial port. This was the easiest way I could think of to get interrupts from the zero crossing line. I then tied the DTR line to the TX pin, so that I could raise and lower it via software control. (I'll ignore the TW-523's RX pin in this article.)

So how much work should I do in the ISR versus the process level? This is a common tradeoff. The ISR, while it runs with minimal latency after the hardware asserts the interrupt, is generally a much more "sensitive" environment. There are a number of reasons for this:

ISRs generally have access to all of the I/O ports (on x86 processors) and can wreak havoc with other hardware devices.
The amount of time spent inside of the ISR has a direct, negative impact on process scheduling.
Since the ISR isn't a real process, it is generally limited in the number of kernel calls that it can use.

On the other hand, deferring processing until "process" time, while avoiding the pitfalls of the ISR, can lead to unacceptable latencies under some operating systems.

So, what to do?

On the surface, the actual work that needs to be done in the ISR for writing an X-10 controller device driver looks to be minimal. After all, when the interrupt hits, you jump into the ISR, look at a circular buffer (containing the data that some client process wants us to transmit), and, if there is a "1" to be sent, you assert DTR. That part is no problem. I'd be amazed if this took more than five lines of C. However, once you turn on DTR, you need to be able to turn it off one millisecond later. Depending upon what type of operating system you are using, this may range from a few to a few dozen lines of C -- somehow you tell the operating system to schedule a process to run, and the process starts a one millisecond timer. When the timer fires, the process deasserts DTR. Under QNX, this is done by returning a nonzero value from the ISR itself. The kernel picks up the ISR's return value, and affects the scheduling queue.

By doing the work at process time, rather than within the ISR, the only thing that's changed is when/where you do the circular-buffer management.

Since this example is interrupt driven, you still need an ISR, and you still need to clear the source of the interrupt (on the serial chip I'm using, this involves two I/O port reads). Then, you need to tell the kernel to schedule a process as a result of the ISR.

Doing the work in the ISR directly is more efficient, because the ISR (having access to the circular buffer) can determine whether or not it needs to tell the kernel to schedule a process. The "process" method requires that the ISR schedule the process every time, since the ISR has no idea of what data should or should not be sent.

So why would I do it in the process? Simple: Because I can. It is much easier to debug things in the process. I can use source-level debugging and profiling tools (though my personal favorite is the printf() debugger). Also, in this case, I'm only getting interrupted at a low rate.

The real issue involves how long it takes to get there. Under QNX 4 on a 100-MHz Pentium, it takes 1.8 microseconds to run the first line of the ISR, and another 4.7 microseconds after the ISR has exited to run the first line of the process. These numbers were obtained directly from QNX Software Systems, and are stated as being "typical."

Can I afford 6.5 microseconds of delay? Well, where's the 110-VAC sine wave, 6.5 microseconds after the zero crossing? Only 417 mV higher (or lower), or about 0.12 percent of the range. Probably not significant.

"Hard" and "Soft": What do You Mean?

Recall that I quoted two numbers -- the 1.8 microsecond ISR latency, and the 4.7 microsecond "scheduling latency." While both numbers are equal in importance, one number is a little more equal than the other. The ISR latency time will only be affected by a process or ISR that has interrupts disabled, or by an ISR of higher hardware priority. Since most real-time architectures (and programmers, for that matter) try to disable interrupts for the smallest possible amounts of time, and run the ISRs for the least amount of time, statistically speaking you should have a good success rate with this 1.8 microsecond number.

What about the 4.7 microsecond number? This number is, first of all, after the ISR has completed execution, and secondly, is affected by the priority of the process. The ultimate decision as to whether this is good or bad depends on whoever decided at what priority things should run. If you need to attain the 4.7 microsecond number, then the process should run at a higher priority than other processes; period.

The One-Millisecond Issue

Regardless of where the DTR pin was asserted (in the ISR or in the process), most operating systems require you to do timing functions within a process. (You certainly don't want to spend one millisecond in an ISR.)

There are, of course, some design issues associated with this. Since the kernel receives periodic interrupts from some hardware clock, and indeed bases all of its timing on those interrupts, you cannot delay for a period of time whose granularity is finer than the base clock-tick rate. For example, if the kernel gets periodic interrupts every 10 milliseconds, you cannot reliably delay for anything less than 10 milliseconds. However, it's not as simple as boosting hardware-clock rates. Even if you boosted the rate to, say, one millisecond, the hardware clock is asynchronous to the process. If the hardware clock has just interrupted the kernel and you tell the kernel you want to sleep for one millisecond, you'll get pretty close to a one-millisecond delay. However, if the kernel clock is about to interrupt the kernel and you schedule the one-millisecond delay, you will be awakened much too early. The best that you can do in this case is to boost the hardware-clock rate so that the "jitter" (the amount of variability in the delay time) is "acceptable." Even if you boosted the clock rate to 100 microseconds, you'd still only be able to reliably sleep for between just over 900 microseconds and just under 1000 microseconds (one millisecond). And, of course, you can't just boost the hardware clock to an arbitrary rate (like one microsecond) because the kernel wouldn't be able to handle the interrupts at that rate. (I've found a hardware-clock rate of 500 microseconds works fine for the X-10 application -- as it turns out, the timing length isn't that sensitive.)

In this case, however, there is a more elegant solution. Since what you want is a time source that is synchronous with the assertion of the DTR pin, why not use the serial-port chip's TX pin instead? You could program the serial port for 9000 baud, and send it one byte with all of the data bits at the same level as the start bit. The serial-port chip will send out one start bit and eight data bits (nine bits in all) before the stop bit returns the line to idle, which at 9000 baud will be extremely close to one millisecond! Tying the serial port's TX pin to the TW-523's TX pin means that the hardware has effectively generated a clean one-millisecond pulse. (Of course, this occurred to me after I built and got the hardware running.)

Can you rely on this as an "external" synchronous timing source? It depends on your willingness to modify the hardware such that the TX pin is looped back to a modem status pin that can generate an interrupt (such as CD).

The Audio Driver

To further examine timing, I'll now turn to the soundcard driver. About six years ago, when soundcards weren't very good (not to mention pricey), I managed to wangle some sample digital-audio quality A/D and D/A parts. Since these parts worked with a serial data stream, I designed an AT-compatible ISA card with four FIFO chips on it, along with some serial/parallel and parallel/serial conversion logic. I wasn't quite sure how to work with the hardware interrupt system, so I left it off for what I thought was the initial test. To my surprise, the board worked (and the interrupt circuitry has stayed off of the board). So what does this have to do with real time?

Let's examine how a FIFO chip works. A FIFO chip has two "sides." In my case (for the D/A portion), one side is connected to the ISA bus (the writer side), and the other side is connected to the parallel/serial conversion logic (the reader side). The reader side is driven by a steady 44.1-kHz clock -- that's the sampling rate that the card and D/A converter operate at. This means that the parallel/serial conversion logic is reading data out of the FIFO at a fixed rate (44.1 kHz). Since the FIFOs are 512 bytes (and there are two of them, to make, effectively, a single 512-word FIFO), this means that the FIFO will go from full to empty in 512/44100 seconds (11.6 milliseconds).

At that point, I realized that I didn't need interrupts, and could get away with just polling the FIFO's FULL/EMPTY flag. By filling the FIFO completely, I had 11.6 milliseconds where I could do whatever other processing was required. Example 1 (the main polling loop in the audio driver) illustrates what I mean. Note that Example 1 is a code excerpt; the actual code (available electronically) has multiple buffers for buf that are fetched from disk during the time when the FIFO is full.

I made the decision to call delay(1) (which sleeps for one millisecond), as opposed to calling it with a number closer to 11.6, for two reasons. First of all, I didn't feel comfortable with sleeping until the FIFO was almost empty -- what if something caused me to oversleep? Then there would be a "click" in the audio stream as the parallel/serial logic sucked 0s out of the FIFO. Also, and more importantly, I didn't want to be hogging the CPU at a high priority for the entire time that it took to fill the FIFO from a near-empty state. It's much better to fill it in tiny bursts, as this lets lower-priority processes run more often. This second point may appear moot -- except that I contemplated buying 16-KB FIFOs instead of the wimpy 512-byte FIFOs I had, until I found out that they were about $50 each. Another consequence of not using an interrupt is that I avoid the context switch of entering an ISR and scheduling a process.

Summary

In conclusion, the key questions to ask when designing and implementing drivers for real-time devices are:

How good is your kernel's clock granularity?
How fast are the context switch times (both into the ISR, the interrupt latency number, and from the ISR to the process, the scheduling latency number)?
Are there any good tricks that you can do in the hardware to offload the software's timing burden?

Acknowledgment

QNX Software Systems programmer and DDJ contributor Dan Hildebrand, who recently passed away, inspired my work with X-10 devices. This article is dedicated to his memory. Donations in Dan's name may be made to the Manitoba Cancer Treatment and Research Foundation, 100 Olivia Street, Winnipeg, Manitoba, Canada, R3E 0V9.

For More Information

QNX Software Systems Ltd.
175 Terence Matthews Crescent
Kanata, ON
Canada K2M 1W8
613-591-0931
http://www.qnx.com/
