Rating Real Time: Hard and Brittle


July 2001: Embedded Space

Ed is an EE, PE, and author in Poughkeepsie, New York. You can contact him at [email protected].


Some years ago, I picked up an old Nicolet Model 800 logic analyzer, mostly because I couldn't let it go to the dump. It's an incredible hulk: three feet across and 43 pounds. According to the sticker on the top, its next calibration was due in 1985.

It can record 4×1000 logic-state bits at 20 MHz, 1×2000 timing bits at 100 MHz, and even digitize an analog waveform at 50 megasamples per second.

The front panel features a character CRT display, a fold-down keyboard, and a half-high 5-1/4-inch dual-sided 720-KB diskette drive. Inside throbs a 3.3-MHz 8085 microprocessor that can run the analyzer from ROM or boot CP/M from the floppy into 48 KB of RAM.

If you're looking for a classic hard real-time embedded system, look no further. This is so hard, it's brittle!

Say It with Silicon

The current practice of implementing what are called "hard real-time" systems using little more than software became possible after the collision between Moore's Law and human-scale physics. Only recently has hardware become fast, dense, and cheap enough to permit nontrivial, submillisecond software response, which is entirely sufficient for many embedded control applications.

In point of fact, though, achieving true hard real-time performance requires more than just fast response. It also requires finishing the job on time and keeping up with the incoming data, both of which are much, well, harder to ensure with software alone.

As an example, that logic analyzer certainly couldn't capture a 100-MHz data stream using any combination of software, interrupts, and programmed I/O. Even an 8085, which in its day had a reputation as a blinding flash and a deafening report, doesn't have enough raw speed.

The entire 8085 required 6500 transistors, used 3.0-micron technology, had no instruction pipelining, and lacked any trace of cache. It required about four clock cycles to complete each instruction and, fed by a 3.3-MHz clock, trundled along at about 800 KIPS.

What it could do, however, was implement control algorithms for the entire logic analyzer in software rather than hardware. The 8085 handled the user interface, translated user selections into control signals, and set up the back-end machinery for the high-speed part of the job. During the actual data-capture sequence, it simply waited for a capture-complete status flag to pop up.
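
In C terms, the 8085's part of the bargain looks something like the sketch below. The register name, address, and bit assignment are my inventions, not the Nicolet's actual memory map, but the shape of the loop is the point: set up the hardware, then spin.

    /* Hypothetical sketch: the CPU arms the capture hardware, then polls
     * a status flag.  CAPTURE_STATUS and CAPTURE_DONE are invented names,
     * not the Nicolet's real register map. */
    #include <stdint.h>

    #define CAPTURE_STATUS ((volatile uint8_t *)0xE000) /* assumed status port */
    #define CAPTURE_DONE   0x01                         /* assumed done bit    */

    void wait_for_capture(void)
    {
        /* Nothing time-critical happens here; the trigger and capture
         * hardware run on their own clock. */
        while ((*CAPTURE_STATUS & CAPTURE_DONE) == 0)
            ;   /* spin until the hardware raises the flag */
    }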

You'll find the capture machinery lined up in the left rear quarter of the chassis in Figure 1. The three identical cards near the middle of the cage hold the capture memory and trigger logic, with the CPU, RAM, and EPROM on three more cards near the front. The remainder holds the stuff that goes into making a microcomputer system: video display, storage interface, serial I/O, what have you. In the early 1980s, you didn't expect an Ethernet jack.

Nowadays, one corner of a system-on-a-chip such as the ZFx86 (née Mach-Z) from ZF Micro Devices (née ZFLinux Devices) has room to spare for nearly all that circuitry. It would be an interesting exercise to reimplement the whole thing using a contemporary SOC, a little LCD panel, and a handful of parts, wouldn't it?

Timing Is Everything

A key part of a logic analyzer's job involves triggering — recognizing when to start storing data. It can also stop storing data at the trigger, which lets a desperate engineer travel back in time to see what caused the error. Can't be beat!

Generally, you're waiting for a particular pattern of bits, perhaps occurring at a specific time after another pattern, all preceding or following a blip from an external source. While it may appear that all this happens sequentially, the actual trigger logic must run at the full capture speed of the analyzer to deliver meaningful results; you cannot wait while a CPU chugs through a function to read input data and evaluate a series of inputs against some Boolean masks.

When the triggering condition occurs, the capture logic either enables or disables the clock that latches data from the probes attached to whatever you're monitoring. That must occur within one sample clock so the data stored in the capture memory represents the actual conditions at the moment the trigger event happens. As a rule of thumb, logic-analyzer trigger design requires the bleeding edge of whatever high-speed circuit family is available at the time.

Given a trigger, data capture proceeds mechanically: Digitize the incoming voltages at a preset threshold and store them in memory, one bit per channel per clock cycle. This is simple enough to be done with a few gates, although they may be obscured by the surrounding clock logic.
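
Spelled out in C, the per-sample work looks like this sketch, with every name invented. The catch is that at 100 MHz the loop body must cost essentially nothing, which is exactly why the real version lives in gates rather than code.

    /* What the trigger/capture hardware does on every sample clock,
     * written as C for clarity.  Illustrative only: no software loop
     * like this runs at 100 MHz. */
    #include <stdint.h>

    #define DEPTH 2000

    static uint8_t capture_ram[DEPTH];

    void capture(volatile uint8_t *probes, uint8_t mask, uint8_t pattern)
    {
        int triggered = 0;
        int n = 0;

        while (n < DEPTH) {
            uint8_t sample = *probes;          /* latch the probe inputs  */

            if (!triggered && (sample & mask) == pattern)
                triggered = 1;                 /* trigger: enable storage */

            if (triggered)
                capture_ram[n++] = sample;     /* one bit per channel per clock */
        }
    }

Invert the sense of the trigger test and you get the stop-at-trigger, travel-back-in-time mode instead.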

Note carefully how the real-time part of this job remains bounded in time. This analyzer will record 2000 samples in 20 μs at 100 MHz, less time than the vertical refresh time of the CRT display. The slowest capture rate may require a few seconds overall, but that's determined by the analyzer setup and completely understandable.

(Raise your hand if you've ever stared at an analyzer or oscilloscope, wondering why it just locked up, only to heave a sigh of relief when you see the sample rate set to 1 s per sample rather than 1 μs per sample. Do you still wear that T-shirt occasionally?)

The trigger and capture timings go beyond "hard real time" into what we might call "brittle real time." The latency and jitter specifications use nanoseconds, not microseconds, in a region obviously not suited for software. Not this decade, anyway.

After the real-time machinery has finished its work, the 8085 may take whatever time it needs to extract, format, and display the results. In practical terms, the display pops up instantaneously, but we know it might take tens or even hundreds of milliseconds to go from raw bits to characters on the CRT.

Then, of course, you spend half an hour pondering the waveforms on the screen, examining the schematics, tracing the firmware, and wondering just exactly how the [deleted] your gadget could possibly do that.

Unreal Time

Although a logic analyzer works in brittle real time, the task remains bounded and readily divided between hardware and software (or, if you prefer, firmware). That is not always the case, and when it's not, it makes doing hard real-time tasks really hard.

Consider, for example, the situation in an IP router with incoming data at T-1 rates: 1.55 Mbps. Those 1.5-Kb (1500-bit) IP packets arrive every millisecond, pretty much like clockwork, and shorter ones may arrive even more often. The router must examine each packet header, apply some filtering and routing rules, then send it to the appropriate output network.

What's not critical here is the response time to any individual packet. The NIC receives the packet, stores it in a buffer, posts an interrupt to the CPU, and begins processing the next packet. The CPU responds to the interrupt when it gets around to it, reads the packet, and does what's needed to send it on its way.

(Grant me an interrupt per packet, please. We both know it doesn't work quite that way, but it makes the story simpler. In any case, NICs convert a brittle real-time problem into something far less demanding.)
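
Here's a hedged sketch of that story in C, with every name invented: the interrupt handler does nothing but move the packet into a ring buffer, and the forwarding code drains the ring whenever the scheduler gets around to it.

    /* One-interrupt-per-packet, simplified: the ISR copies the frame
     * into a ring buffer and returns; routing happens later.  In real
     * life you'd also worry about locking and drop counters. */
    #include <stdint.h>
    #include <string.h>

    #define RING_SLOTS 64
    #define MAX_PKT    1536

    struct pkt { uint16_t len; uint8_t data[MAX_PKT]; };

    static struct pkt ring[RING_SLOTS];
    static volatile unsigned head, tail;  /* ISR writes head, router reads tail */

    void nic_isr(const uint8_t *frame, uint16_t len)
    {
        unsigned next = (head + 1) % RING_SLOTS;

        if (next == tail)
            return;                       /* ring full: drop the packet */

        ring[head].len = len;
        memcpy(ring[head].data, frame, len);
        head = next;                      /* publish only after the copy */
    }

    /* The forwarding path drains the ring whenever it gets scheduled. */
    void router_poll(void)
    {
        while (tail != head) {
            /* examine header, apply filter/route rules, queue for output */
            tail = (tail + 1) % RING_SLOTS;
        }
    }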

The router's average outgoing packet rate must at least equal its average incoming packet rate, lest packets pile up at the input port. However, given adequate buffering, there's no requirement that each packet be handled instantly as it arrives, without delay, in anything close to real time.

It turns out that a plain Linux distribution, set up as a router right off the CD, can keep up with T-1 data on a magnificently obsolete box. You'll recall that unbounded delays in the dispatcher keep stock Linux out of the real-time arena, but it works fine in this application.

Now, let's upgrade that incoming line to OC-3 at 155 Mbps. Suddenly, IP packets come popping out of the NIC (well, fiber modem) every 10 microseconds or so. Again, there's no need for real-time processing, but something's significantly different: You now have 99 percent less time to finish each packet. I suspect even recently obsolete hardware can't keep up with that rate. (Let me know if I'm wrong, then repeat the exercise with OC-12.)
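
The arithmetic is worth a moment. Assuming the 1500-bit packets from above, here's the back-of-envelope interarrival calculation for all three line rates:

    /* Back-of-envelope packet interarrival times for T-1, OC-3, and
     * OC-12, using the 1500-bit packet size assumed in the text. */
    #include <stdio.h>

    int main(void)
    {
        const double pkt_bits = 1500.0;
        const double rates[]  = { 1.55e6, 155e6, 622e6 };  /* bits/second */
        const char  *names[]  = { "T-1", "OC-3", "OC-12" };

        for (int i = 0; i < 3; i++)
            printf("%-6s %8.2f us per packet\n",
                   names[i], pkt_bits / rates[i] * 1e6);
        return 0;
    }
    /* Prints roughly 967.74, 9.68, and 2.41 microseconds respectively. */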

The issue isn't response time or interrupt latency. Raw throughput matters more than timeliness, and that's a much harder thing to measure.

Real Thumb Time

When you read of hard real-time systems done purely in software, you can infer several facts about their applications. A few rules of thumb may be in order here; let me know how far off I am from your own experience.

Most obviously, the overall specifications must allow enough timing jitter to accommodate the worst-case (not average) interrupt latency. That will typically be in the low tens of microseconds on current hardware and depends critically on the I/O gear as well as the CPU.

As you tighten that spec, the degree-of-difficulty rating for the whole project rises dramatically. At some point, probably just under 10 μs these days, you enter brittle territory and should stop kidding yourself about this software stuff; it ain't gonna get the job done.

I suggest budgeting an order of magnitude more time than the bare minimum spec to accomplish anything useful. If the latency spec is 15 μs, allow 150 μs to actually handle the I/O and finish working with the data, which gives you an estimate of the overall handler run time. If you have better numbers, use them instead, of course.

Then figure how much of the CPU that handler requires. If you get an interrupt every millisecond and the handler runs for 150 μs, it uses 15 percent of the CPU's capacity. That's not negligible, but I've seen similar burn rates ignored in first-cut estimates. It'll come back and eat you!
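
The whole rule of thumb fits in a few lines of C; run it with your own numbers instead of my assumed 15-μs spec and 1-ms period:

    /* The rule-of-thumb budget as arithmetic: pad the latency spec by
     * 10x for the full handler, then divide by the interrupt period. */
    #include <stdio.h>

    int main(void)
    {
        double latency_us = 15.0;               /* worst-case latency spec   */
        double handler_us = 10.0 * latency_us;  /* order-of-magnitude budget */
        double period_us  = 1000.0;             /* one interrupt per ms      */

        printf("handler budget:  %.0f us\n", handler_us);
        printf("CPU utilization: %.0f%%\n", 100.0 * handler_us / period_us);
        return 0;   /* 150 us and 15%, matching the text */
    }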

Next, while the entire system must have enough throughput for the long-term average data rate, the buffers must handle the worst-case mismatch. This can quickly get into complex simulations, but again, a quick estimate may save the day.

Cobble up a routine that does something trivial for each input or output data value it finds in the interrupt-handler queue. Measure the elapsed time for that routine, multiply by 10, and divide by the interrupt period. Presto — the fraction of the CPU time devoted to handling your data. Add that to the handler utilization and you may also have grounds to worry.
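
Here's a minimal harness along those lines, assuming a POSIX box with clock_gettime(). The process_one() routine is a stand-in for your real per-datum work, and the 1-ms interrupt period is my assumption:

    /* Time a trivial per-datum routine over many iterations, scale by
     * 10, and divide by the assumed interrupt period. */
    #include <stdio.h>
    #include <time.h>

    static volatile int sink;   /* keeps the compiler from optimizing away */

    static void process_one(int value)
    {
        sink = value * 3;       /* trivial stand-in for per-datum work */
    }

    int main(void)
    {
        enum { N = 1000000 };
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)
            process_one(i);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double per_item_us = ((t1.tv_sec - t0.tv_sec) * 1e9 +
                              (t1.tv_nsec - t0.tv_nsec)) / N / 1e3;
        double period_us = 1000.0;             /* assumed interrupt period */

        printf("per-item: %.3f us, est. CPU share: %.1f%%\n",
               per_item_us, 100.0 * 10.0 * per_item_us / period_us);
        return 0;
    }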

If, at this point, you find the CPU utilization nudging 25 percent, you are definitely in a heap o' trouble, because you haven't allowed for any of the other processing that must go on in the background. Remember, there's a user interface, network I/O, all that stuff that we tend to ignore while concentrating on the big picture.

When you're trying to size the system, be very, very conservative at first. You can always spec a slower and cheaper CPU, but paring firmware to suit the fastest silicon in your supplier's current lineup is agonizingly difficult.

Finally, a very simple simulation can often reveal overlooked issues. If you can characterize your real-time system with a few interrupts, a few data streams, and some processing, the common parallel printer port can be your friend. Mock up a single interrupt handler, shovel dummy data between it and a simple user-level program, then run some imitation processing. Add marker outputs that toggle parallel port bits at key points in the code, feed a square wave into the port's ACK pin to fire up the interrupts, and watch your scope.
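
On x86 Linux, the marker outputs take just a few lines. This sketch assumes the traditional LPT1 base address of 0x378 and needs root privileges for ioperm():

    /* Parallel-port scope markers: grab the data port with ioperm(),
     * then toggle bits around the code under test.  The pulse width
     * on the scope is your elapsed time. */
    #include <stdio.h>
    #include <sys/io.h>     /* outb(), ioperm(); x86 Linux, run as root */

    #define LPT_DATA 0x378  /* traditional LPT1 base address */

    int main(void)
    {
        if (ioperm(LPT_DATA, 1, 1) < 0) {
            perror("ioperm");
            return 1;
        }

        outb(0x01, LPT_DATA);   /* marker up: entering code under test */
        /* ... imitation processing goes here ... */
        outb(0x00, LPT_DATA);   /* marker down: done */

        return 0;
    }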

You'll see a direct measure of just how long your code takes. Apply my rules of thumb to your results and decide how hard or brittle your timing looks.

Don't have a scope? Just find yourself an engineer with a (modern) logic analyzer!

Reentry Checklist

Nowadays, you can dissolve many brittle real-time problems in a sea of Field-Programmable Gate Array (FPGA) silicon. All the trigger and capture circuitry on those logic analyzer cards would fit neatly into a single, high-speed chip, with interconnections reduced to a bit pattern stored in EEPROM. Who says things aren't getting better every day?

Look at http://www.altera.com/ and http://www.xilinx.com/ for ideas. When microseconds matter, hardware does it best. Honest!

Rules of thumb are what you use in the absence of better information. Tom Parker collected at least two volumes of them in his Rules of Thumb (1983, ISBN 0-395-34642-8) and Rules of Thumb 2 (1987, ISBN 0-395-42955-2). They're out of print, but well worth tracking down.

Not much to my surprise, I found Y2K fixes available for CP/M. If I ever boot that logic analyzer into CP/M, I should be ready. Get that and more at http://www.cpm.z80.de/, then become completely lost at http://www.seasip.demon.co.uk/Cpm/index.html. I doubt anybody's developing CP/M code any longer, but its remnants will live forever on the Internet.

DDJ

