By the Numbers


Feb04: Embedded Space

Ed is an EE, PE, and author in Poughkeepsie, New York. You can contact him at [email protected].


The cause of the acceleration of the motion of falling bodies is not a necessary part of the investigation.

—Galileo Galilei, Two New Sciences

Galileo was the first human to realize that we could understand the universe by measuring physical properties and analyzing the results. Having invented modern science, he went on to revolutionize optics, astronomy, mathematics, and physics. Imagine what he could have done with computer support!

One unfortunate side effect of Galileo's influence is that we often confuse measurement with knowledge. After you collect the numbers, you must extract their underlying principles before you know anything about the subject. It's easy for raw data to overwhelm careful analysis, particularly when data collection doesn't require much effort.

Programming and software, being inherently digital, produce a torrent of data that changes with each revision. Simply keeping track of the numbers can be a full-time job that's often mistaken for keeping track of the software itself.

Worse, software does not operate in a Galilean universe where measurement and analysis can always lead to understanding. Our intuitions, honed by our experience in a physical universe, tend to be completely wrong. But we can make some headway if we're careful about what we measure.

Software Measurements

Software measurement techniques can be either static or dynamic, with a very small gray area in between. This is in keeping with the geek T-shirt slogan "There are only 10 types of people in the world: Those who understand binary and those who don't."

Static measurements form the basis of most management decisions: lines of code, number of modules, length of functions, object file size, and so forth and so on. In essence, anything you can derive by looking at the files on the hard drive is fair game for static analysis.

Human-based static analysis—the domain of code reviews, inspections, walkthroughs, "many eyes" teams, and Extreme Programming—attempts to weed out errors before actually running the stuff. Simply looking at the code one more time works wonderfully well, as does sleeping on it for a while.

Pop Quiz: How often have you snapped awake in the middle of the night, realizing your code simply won't work? Essay: Discuss why this sometimes happens years after the code went read-only.

Because humans have relatively short attention spans and limited memory capacity, we depend on programs to analyze source code and produce summary reports. Compilers, of course, verify that a particular source module conforms to the language requirements. Lint, that old standby, ensures that you said the same thing the same way in every module. Similar syntactic parsers gnaw through other languages with similar results, although C and C++ facilitate particularly infelicitous slips of the keyboard.

Even a program with no syntactic or semantic errors is not correct when it does exactly what you said, not what you meant. The next level of static measurement compares the source code with the specifications and highlights the mismatches.

Problems may still go unseen in places where the spec says nothing about weird error conditions that "can't happen here." As we raise the level of abstraction with which we specify programs, we're also raising the level at which errors occur. This may be a good thing, but it can result in some truly baffling errors.

Beyond that, we move into synthetic execution, the gray area between static and dynamic testing where Lint-like programs trace boundary-condition values through program variables without actually running the program. As we saw last month, this technique can expose problems that simply can't be found by any other method.

In contrast to all that parsing, dynamic analysis involves actually firing up the program, applying some data, and measuring what happens. Dynamic analysis collects data describing which logic paths the code traversed, which functions were called, and so forth.

Given access to the source code and compilation tools, you can build probes right into the structure of your program. Lacking that, operating-system hooks can trace the program's ins and outs. If all else fails, shim programs ooze into the OS interfaces with varying degrees of compatibility and overhead.
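When you do have the source, a probe can be as little as an inline function that stamps an event ID and a timestamp into a ring buffer for later extraction over a serial port or debugger link. Here's a minimal sketch; the clock() call is just a portable stand-in for whatever free-running hardware timer your target provides, and the event IDs are whatever you care to assign:

#include <stdint.h>
#include <time.h>

#define TRACE_DEPTH 256                     /* must be a power of two */

struct trace_entry {
    uint16_t event;                         /* caller-assigned event ID */
    uint32_t timestamp;                     /* raw timer ticks */
};

static struct trace_entry trace_buf[TRACE_DEPTH];
static volatile unsigned trace_head;

static inline uint32_t read_timestamp(void)
{
    return (uint32_t)clock();               /* stand-in for a hardware timer read */
}

static inline void trace_event(uint16_t event)
{
    /* Not interrupt-safe as written; assumes a single recording context. */
    unsigned slot = trace_head++ & (TRACE_DEPTH - 1u);
    trace_buf[slot].event = event;
    trace_buf[slot].timestamp = read_timestamp();
}

Sprinkle trace_event() calls at the interesting points, then dump trace_buf when the dust settles.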

With that degree of control over the program's execution, you can both observe and manipulate the program's dynamic flow. James Whittaker used Security Innovations' Holodeck in his talk at the SD Best Practices conference last September, much to the amusement of the audience. There's more about Holodeck and security testing at http://www.sisecure.com/ and in "Red-Team Security Testing," by Herbert H. Thompson and Scott Chase (DDJ, November 2003).

Dynamic analysis also turns out to be useful for those folks devoted to software copy protection. The arms race between protection and piracy has lately produced Byzantine schemes that encode usage permission in sequences of function calls. The July/August 2003 issue of the IEEE Security & Privacy journal (http://csdl.computer.org/comp/mags/sp/2003/04/j4toc.htm) described the Sandmark tool (http://cgi.cs.arizona.edu/~sandmark/sandmark.html). This stuff is so weird that I'm left speechless, but it's right at the thin edge of the DMCA wedge.

Human-Scale Time

Timing data is conspicuous by its absence in all this, as it simply doesn't count for much in the ordinary application-programming domain. The only timing data of interest to management is how late the current release might be, with smaller details being trumped by whether this "final" build works well enough to ship. Raw performance simply doesn't appear in the release checklist of most applications.

When it does, you'll find that human-usable programs generally measure time in wall-clock units: seconds and minutes. Batch programs (batch lives—a retired friend babysits a bank's overnight batch jobs) have runtimes measured in hours. Getting performance numbers requires nothing more than a wristwatch with, perhaps, a stopwatch function.

Programs that actually interact with humans have three order-of-magnitude time constants. Any user input must produce a response within 0.1 second, the program must either complete the request or slap up a progress bar within 1 second, and it better not take more than 10 seconds to complete most actions. All of those times are easy enough to measure using nothing more than the usual operating system timing facilities.
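For what it's worth, a host-side check against those limits needs nothing more than a pair of clock_gettime() calls. In this sketch, do_work() is a stand-in for whatever request your program is actually servicing, and the thresholds are the three constants above:

#include <stdio.h>
#include <time.h>

static void do_work(void)
{
    /* stand-in for the real request handling */
    for (volatile long i = 0; i < 10000000L; i++)
        ;
}

static double elapsed_s(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
}

int main(void)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    do_work();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double dt = elapsed_s(&t0, &t1);
    if (dt > 10.0)
        printf("%.3f s: too long for most actions (10 s limit)\n", dt);
    else if (dt > 1.0)
        printf("%.3f s: needs a progress indicator (1 s limit)\n", dt);
    else if (dt > 0.1)
        printf("%.3f s: perceptible; instant feedback means 0.1 s\n", dt);
    else
        printf("%.3f s: feels instantaneous\n", dt);
    return 0;
}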

In the firmware domain, however, particularly in the real-time neighborhood, milliseconds and sometimes even microseconds matter. The overlooked minutiae of execution can make or break a project.

Time Parameters

The simplest definition of real-time programming I've heard yet is "The right answer at the wrong time is wrong." Although we can quibble about whether an accounting program qualifies as "real time" (because a late paycheck is definitely a failure), the specs for most programs don't have much to say about timing. Heck, the specs usually don't say much about bizarre error conditions.

That's also true of most embedded programs, particularly on the low end where 4- and 8-bit microcontrollers reign. Remember that low-horsepower chips constitute something like 3/4 of the total number of processors shipped these days, so a huge chunk of the market basically runs as fast as it can, polling switches and updating displays in tight little loops.

A small and growing number of larger projects, though, do care about time. These are not necessarily engine controllers that twiddle valves and trigger sparks precisely on time, but programs that work with audio data, network traffic, and similar information streams. They're typically deeply embedded programs with a minimal human interface, so getting data out requires more effort than you might expect.

The bulk of those projects do not have true real-time constraints, in that the failure to deliver results on schedule generally won't cause a catastrophic failure. Because the code must cope with a steady stream of data, it cannot stall for a protracted time without either losing input data or starving a downstream consumer. The actual durations depend on the application, but tend to be in the millisecond range.

The simplest timing parameter is the latency from an interrupt request to the start of the user's interrupt handler. Contemporary systems have latencies on the order of a few microseconds, quite difficult to measure using only software without introducing significant errors. Given an oscilloscope or logic analyzer, some dexterity, and a line or two of code, however, you can collect numbers with gleeful abandon. The problem, as always, lies with the analysis.
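The line or two of code usually just wiggles a spare output pin, so the scope measures from the interrupt-request edge to the pin's rising edge, and the pulse width gives you the handler's execution time for free. Here's a sketch; the register names and addresses are made up, so substitute whatever your target's GPIO ports actually use:

#include <stdint.h>

#define GPIO_OUT_SET (*(volatile uint32_t *)0x40020018u)  /* hypothetical set register */
#define GPIO_OUT_CLR (*(volatile uint32_t *)0x4002001Cu)  /* hypothetical clear register */
#define TEST_PIN     (1u << 5)                            /* spare pin wired to the scope */

void timer_irq_handler(void)
{
    GPIO_OUT_SET = TEST_PIN;   /* scope measures IRQ edge to this edge: latency */

    /* ... normal interrupt service work ... */

    GPIO_OUT_CLR = TEST_PIN;   /* pulse width = handler execution time */
}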

Operating-system kernel spec sheets often give the typical interrupt latency, the average or mean value of a large number of samples. Most spec sheets tout the minimum latency, the smallest input-to-output delay, which results when nothing gets in the way of the code and the planets are correctly aligned. Few spec sheets mention the maximum latency, the only number you really need to know because it determines the worst delay your code must contend with.

In actual practice, however, interrupt latency forms a very small part of the overall problem. Your code will surely perform some nontrivial processing on aggregates of incoming data, because otherwise you wouldn't need the processor in the first place. The latency from interrupt to processed output can be, and usually is, orders of magnitude longer than the raw interrupt latency. Worse, it's completely under your control, so there's no place to hide.

Sampled Data

If you admit that your code generates most of the latency, then measuring the actual time with software becomes easier. After all, what are a few [dozen|hundred|thousand] more instructions among friends?
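In that case, one timestamp when the sample arrives and another when the processed result leaves will do, keeping the worst case as you go. In this sketch, timer_ticks() is a stand-in for a free-running hardware counter and TICKS_PER_US is an assumed 48 MHz timer clock:

#include <stdint.h>

#define TICKS_PER_US 48u                 /* assumed 48 MHz timer clock */

extern uint32_t timer_ticks(void);       /* hypothetical hardware timer read */

static volatile uint32_t t_arrival;      /* set in the input interrupt handler */
static uint32_t worst_latency_us;

void input_irq_handler(void)
{
    t_arrival = timer_ticks();
    /* ... queue the sample for processing ... */
}

void output_complete(void)
{
    uint32_t latency = (timer_ticks() - t_arrival) / TICKS_PER_US;
    if (latency > worst_latency_us)
        worst_latency_us = latency;      /* the maximum, not the mean, is what matters */
}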

Real-time operating-system kernels typically have runtime profiling and timing facilities, either built in or available as add-on packages. Blending the measurement code with the kernel allows fine-grained data collection from interrupt handlers, schedulers, and user tasks, recording event times at key points within the kernel.

A good timing system can generate data without modifying your application firmware, record times only for specific tasks or programs, and generally throttle the torrent of data down to a manageable level. Extracting that data from the target system presumes an external connection that may appear only in a development board, but projects of sufficient complexity to require execution profiling tend to sport network taps.

Tom Barret of Quadros Systems (http://www.quadros.com/) diagrammed the Quadros RTXC OS internals for me one morning at ESC. The product supports engineers comfortable with the notion of fitting an application into raw hardware with the aid of a tailored real-time kernel that doesn't tote all the baggage inherent in, say, a Linux distribution. In fact, even the file system and communication stacks are optional, which shows a delightfully minimalist approach to the problem.

RTXC includes two schedulers, one for threads and another for tasks. In fact, you can implement a system with either scheduler or both, as you see fit. The overall system can handle uniprocessor, multiprocessor, and DSP scheduling, with size and complexity increasing in step with your own code.

Quadros recently added execution profiling that feeds data up through its CodeWarrior (http://www.metrowerks.com/) kernel awareness module. Basically, you can now figure out what's going on inside your software without leaving the confines of the debugger.

After covering a sheet of paper with diagrams, Tom pointed out that mean latency is meaningless, even if it's easy to measure, and that your test cases might not induce the maximum latency. He proposes that the variance and its square root—the standard deviation—provide insight into your code's behavior.

Galileo measured the relation between acceleration and time by rolling balls down inclined planes. He repeated the experiments hundreds of times to ensure that he had accurate and stable numbers, measured to the limits of mid-1600s technology. After years of recording careful measurements, he figured out the square-law relationship governing uniformly accelerated motion.

Software doesn't have that same level of repeatability, at least for code complex enough to require a real-time operating system. Nondeterministic effects such as memory allocation, garbage collection, interface locks, and even spin loops change the path length for every pass through what's nominally the same code.

Suppose that a routine has a mean latency of 1 millisecond during a million samples with varying system loads. If the standard deviation of those samples is 10 microseconds, you can be fairly sure that your code doesn't have any major exposures.

On the other hand, a standard deviation of 500 microseconds tells you that your code occasionally wanders off the beaten track. Some careful static analysis of your source might be in order, plus some additional tracing to determine where it's spending its time. It may well be that your code is faultless and the processor is getting bogged down in Other People's Code, a fact not revealed by any possible static examination.

As I understand it, no execution profilers compute standard deviations, so you must collect and massage the raw numbers yourself. Who knows? You may discover something interesting.
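Rolling your own takes only a few lines. This sketch keeps a running mean and variance with Welford's online algorithm, so a million samples need no storage; the microsecond values at the bottom are made-up stand-ins for whatever your profiler or trace buffer actually delivers:

#include <math.h>
#include <stdio.h>

struct latency_stats {
    unsigned long n;
    double mean;
    double m2;      /* sum of squared deviations from the running mean */
    double max;
};

static void stats_add(struct latency_stats *s, double x)
{
    double delta = x - s->mean;
    s->n++;
    s->mean += delta / s->n;
    s->m2   += delta * (x - s->mean);
    if (x > s->max)
        s->max = x;
}

static double stats_stddev(const struct latency_stats *s)
{
    return (s->n > 1) ? sqrt(s->m2 / (s->n - 1)) : 0.0;
}

int main(void)
{
    struct latency_stats s = {0};
    double samples[] = { 980.0, 1005.0, 1010.0, 995.0, 1500.0 };  /* stand-in data */

    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        stats_add(&s, samples[i]);

    printf("n=%lu  mean=%.1f us  stddev=%.1f us  max=%.1f us\n",
           s.n, s.mean, stats_stddev(&s), s.max);
    return 0;
}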

Although I used Quadros and CodeWarrior as examples, many of the same features are available from other folks. Take your pick, get some meaningful numbers, and decide if your project meets its specs. Even better, make sure those specs match up with what your users expect.

Reentry Checklist

Feed various combinations of "static analysis," "dynamic analysis," "software tools," and "source code" into your favorite search engine to unearth oodles of products and services that may help you produce better code. Add terms like "timing analysis" and "profiler" to focus on embedded stuff.

GUI Bloopers (Academic Press, 2000; ISBN 1-55860-582-7), by Jeff Johnson, has a relevant chapter on user-interface responsiveness. Even if you're not doing real-time system design, his guidelines, including the three time constants I mentioned, will improve your programming.

If you can finish Galileo's Daughter without a tear in your eye and a catch in your breath, you need an empathy tuneup. It's by Dava Sobel (Walker & Co., ISBN 0-8027-1343-2) and worth every penny. Read it and weep. The Galileo Project at http://es.rice.edu/ES/humsoc/Galileo/ includes the full text of his daughter's letters. In view of his reputation with the Vatican, Sister Maria Celeste's convent destroyed his letters after her death.

DDJ

