
Debugging Real-time Systems


Dr. Dobb's Journal, July 1993

Informative breakpoints, in-core event traces, and timer dividing get the job done

Dr. Joseph M. Newcomer received his PhD in the area of compiler optimization from Carnegie Mellon University in 1975. He has done real-time programming for 28 years and can be contacted at 610 Kirtland St., Pittsburgh, PA 15208.


A number of analysis and programming techniques guarantee correct performance when building complex real-time systems. Many such techniques, however, don't "scale down" to relatively trivial systems. For instance, techniques such as "rate monotonic analysis" can tell you how to schedule and prioritize tasks in a multithreaded, real-time control system (a nuclear-reactor control system or an avionics system, for example), but they don't help much when you're writing a single-threaded device driver.

Moreover, a lot of real-time code does not have a debugger interface. Debug, Symdeb, and later CodeView were of little use for real-time debugging on early PCs because they interface to DOS for I/O. Consequently, if a breakpoint set in a real-time interrupt handler was taken while your program was inside DOS, these debuggers crashed the system or even corrupted the disk because of DOS's non-reentrancy. I finally ended up using a program called the IBM Professional Debug Facility, which went to the bare metal for I/O. I later switched to Nu-Mega's Soft-Ice, even though the first version didn't support symbolic debugging.

This article presents key techniques for debugging real-time, embedded, or device-driver code: informative breakpoints, in-core event traces, and timer dividing. You'll also find these techniques useful when using sophisticated debuggers such as Soft-Ice.

The Breakpoint

An obvious starting point is the simple breakpoint. The first thing a breakpoint can tell you is whether or not your program got to a specific piece of code. For example, if you set a breakpoint at the basic interrupt handler for a device, and the interrupt isn't taken when there's input pending, you know you've failed to set up some basic property, such as enabling the interrupt, or even that you have another board interfering with the interrupt line.

Debuggers implement breakpoints by replacing the instruction at the desired address with an instruction that transfers control to the debugger, such as the INT 3 instruction on 80x86 machines. When the breakpoint is taken, the original contents of the location are restored, the instruction is executed with the trace-trap bit enabled so that a trap is taken immediately after it executes, and the breakpoint instruction is then put back for the next time. Many simple monitor programs (like those on a number of "bare boards") aren't sophisticated enough to handle this full resume-from-breakpoint protocol and execute the instruction the breakpoint replaced. You can usually examine and deposit memory, and many will even report that a trap has occurred and continue at the instruction following the trap, but they can't do real breakpoint handling. Consequently, I've developed a technique for setting breakpoints in spite of these limitations.
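For context, here is a minimal sketch in C of the planting step a full debugger performs, assuming a real-mode environment where code segments are writable; the function names are hypothetical, and the save/single-step/replant sequence described above is deliberately not shown.

#define INT3_OPCODE 0xCC                  /* the 80x86 one-byte breakpoint */

static unsigned char saved_byte;          /* original opcode under the breakpoint */

void plant_breakpoint(unsigned char far *addr)
    {
     saved_byte = *addr;                  /* remember the original byte */
     *addr = INT3_OPCODE;                 /* INT 3 traps to the debugger */
    }

void remove_breakpoint(unsigned char far *addr)
    {
     *addr = saved_byte;                  /* put the original instruction back */
    }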

I introduce a NOP instruction at the place I want the breakpoint, then reassemble, relink, and (on embedded systems) download the new code to the target. Using the monitor program, I then deposit the monitor's trap instruction over whichever NOPs I want as breakpoints. There is no need to restore the original opcode of a (possibly) multibyte instruction and single-step across it.

Without a symbolic debugger, however, you don't know where that instruction is until you convert the NOP address based on the relocation of the code and the offset of the instruction, which also means you need a full assembly listing. This isn't so bad until you're doing something like writing a DOS device driver. In that case your only (machine-readable) assembly listing is on the machine you are debugging on, so you can't examine it while the interrupt handler is running. Printing the listing in its entirety every time would consume a lot of time and paper. My approach is the equivalent (in whatever system I'm using) of the MASM construct in Example 1(a). To determine the breakpoint's location, I print out the link map. If the link map is large, I print only the segment of the link map containing the names starting with B_P_, which is certainly smaller than the entire listing or map.

This solves half of the problem--it tells you the location of the NOP that you're going to change into a breakpoint instruction--but it isn't sufficient. What I really need to know when the breakpoint is taken is exactly which breakpoint I hit. As things stand, that means I have to write down the (absolute) breakpoint address each time I set it, which for a segmented architecture can be a real pain.

I modified the code sequence in Example 1(a) to save a register, load the breakpoint number into that register, do the NOP, and finally restore the register. This does introduce some overhead; if the breakpoint is never taken, we still have to execute the instructions. Example 1(b) shows the new code, which can be made into a macro. When the breakpoint is taken, I only have to examine the AX register to see the breakpoint number. This means that once I've set the breakpoints, the forebrain is no longer involved in the recognition process. Since the breakpoint number remains the same on each compilation, I don't need to remember a large, complex, and different number each time; once I've remembered that breakpoint 17H is the input-interrupt entry breakpoint, I can retain that single fact.

Now suppose I want to look at some variable or set of variables when a breakpoint is taken. Rather than continuing to decode the variable addresses from the link map, I can just push a few more registers and load either the values, or pointers to the values, into those registers. Typically, on an 80x86 machine these pointers are DS-relative, so I only need the 16-bit offset for a pointer. For certain breakpoints in the queue manager of the device driver I was working on, for example, I loaded the BX register with the base of the active input or output queue (circular character buffers). Thus, by displaying the contents of memory at DS:BX, I could see what was in the queues.

Moving Toward Real-time

The breakpoint technique works only when you are trying to debug the basic logic flow; because each breakpoint can interrupt that flow, it drastically changes the real-time behavior of the program. Eventually, you're convinced that the logic flow is correct, but when running without the debugger, the program still crashes in some way. What now?

Anyone who's suffered through an intermediate-level college software course knows the terms "invariant" and "output condition." An invariant is, in the formal-proof sense, a condition that's true before and after a segment of code executes. An output condition is a condition that must be true when a piece of code completes. Both are useful when applied correctly--even to something as grubby as an interrupt handler.

My problem was that the interrupt routine I was working on had several exit points; I was sure that one of them was not restoring the interrupts, but I couldn't see how this was failing. So I added an assembler routine that simply checked the status of the interrupt-enable flag; if interrupts were disabled, it re-enabled them and took a breakpoint. I then called this from the main event loop in the C code. Sure enough, about five seconds into the processing, I trapped with AX=0xEEEE. After careful study, I found the serial-line interrupt routine had been "optimized" for performance; if, just before returning to the user code, it detected that another character had arrived, the routine called itself recursively. I'd seen this, including the comment, "Hope the stack doesn't overflow," but because of the bursty nature of the incoming data, I suspected this warning did not indicate the real cause. It turns out that the operative term here is "called." It did a far call of the entry point. It even set a variable to tell it whether to return with a RETF or an IRET. (Why it didn't fake an INT or do a PUSHF first is a mystery known only to the original author.) The bug became obvious: the return-by-RETF flag wasn't cleared on the return, so the eventual return to the main code was via RETF, and the flags were not restored! Of course, this meant the stack was out of sync, and the next return done at the higher level would crash.
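The author's assembler routine isn't shown in the listings; as a rough equivalent, here is a minimal sketch of the same invariant check written as Microsoft C with inline assembly, called from the main event loop. The 0xEEEE marker value comes from the text above; the routine name is hypothetical.

#pragma check_stack(off)

void check_int_invariant(void)
    {
     _asm {
           pushf                      /* capture the current flags word       */
           pop   ax
           test  ax,0200h             /* IF is bit 9 of the flags word        */
           jnz   inv_ok               /* enabled -- the invariant holds       */
           sti                        /* re-enable so the system stays alive  */
           mov   ax,0EEEEh            /* marker value visible at the trap     */
           int   3                    /* take the breakpoint                  */
     inv_ok: nop                      /* fall through: nothing more to do     */
          }
    }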

This crash did not occur, however, because the higher-level code was testing to see if there was a key struck, if a Ctrl-Break had occurred, or if an entire input packet had been received from the interface. Because interrupts were disabled, a keypress would never be seen, and a packet could never be fully received, so it effectively locked up. (Actually, if this race condition had occurred on exactly the last byte of a packet, the return would have occurred and the C code would have crashed, but the condition always seemed to occur in the middle of a packet.)

On a PC, some of this state is easily monitored. For example, you can put in conditional code that writes directly to the display memory, with no DOS or BIOS interaction at all: writing a status character to location B800:9E and a display-attribute byte to B800:9F will display that character in the top-right corner of your VGA/EGA screen. This works fine for character-mode displays; you have to do more work if your screen is graphical. Thus, if you set up the assembly-code routine called from C to display a blue-background E for interrupts enabled and a red-background D for interrupts disabled, you can instantly see your last state when the system hangs. If you see a D, you've violated the interrupts-enabled invariant and know where to start looking.
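A minimal sketch of that status display, assuming a color text-mode adapter with its buffer at segment B800h (a monochrome adapter uses B000h); the routine name is hypothetical, and 1Fh/4Fh are the white-on-blue and white-on-red attribute values.

void show_int_state(int enabled)
    {
     unsigned char far *cell = (unsigned char far *)0xB800009EL;
     cell[0] = enabled ? 'E' : 'D';        /* character at B800:009E */
     cell[1] = enabled ? 0x1F : 0x4F;      /* attribute at B800:009F */
    }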

Event Traces

One of the most useful debugging techniques is an event trace, sometimes known as the "debug print" statements scattered throughout our code. But how do you do an event trace when you can't print? The reasons for not printing are many: DOS is not re-entrant, you may not have a printer, using the printer is not possible in real time, and so on.

I keep the event trace in a circular buffer in memory. I first fill the buffer with 0s, then guarantee that the last entry is always followed by a 0 entry (making it easy to locate the end of the buffer). I've found that 256 events are more than enough for most problems. This technique is akin to using a logic analyzer, but cheaper and easier to analyze.

To put this in context, I was interfacing a new real-time data-collection and external-device-controller board to an existing data-analysis/control package. I had to find all the places where things were done to the old card and add code to handle the new card. The cards were radically different: the old card had a packet buffer and interrupted once per packet, while the new card interrupted on every byte, and I had to handle the packet decode myself. So I had to retrofit this to what was already a very convoluted driver, consisting of about ten thousand lines of assembler.

It took three weeks to figure out how to insert the new board driver into the old code. We should have written a real DEVICE= style driver for each of the boards, or at least a TSR driver, but the company insisted that, for business and product-compatibility reasons, the device driver(s) be embedded in the application program and be able to determine which of the two boards was installed in the machine.

When I finally tested the integrated old-and-new code, it worked fine for the old board but failed for the new one. After painstaking debugging using the settable-breakpoint technique, I knew the basic control flow was correct. I needed to figure out what was going wrong under full-speed operating conditions.

I eventually added event logging for the following events: input-interrupt routine entry; input-interrupt routine exit; output-interrupt routine entry; output-interrupt routine exit; input-interrupt routine (places character in input queue); output-interrupt routine (removes character from output queue); set interrupt enable (all places STI instructions were executed); set interrupt disable (all places where CLI instructions were executed); device-input interrupt enable (set the IE bit in the device register); device-input interrupt disable (clear the IE bit in the device register); device-output interrupt enable (set the OE bit in the device register); device-output interrupt disable (clear the OE bit in the device register); timer-interrupt routine entry; timer-interrupt routine exit; timer-interrupt enable; timer-interrupt disable; load timer; output-queue entry insertion (called from the C code); output-queue entry removal (called from the interrupt routine); input-queue entry insertion (called from the interrupt routine); and input-queue entry removal (called from the C code).

Each event had 32 bits of data: an 8-bit event code, an 8-bit status value, and a 16-bit value. The event code was just a unique integer representing each of the aforementioned events. The 8-bit status was dependent on the event, but in the case of the card interrupts was one of the device registers. The 16-bit value was also dependent on the type of the event; for example, for queue management it was the 8-bit count of items in the queue, and the 8-bit character value either inserted or removed. For the load-timer operation, it was the 16-bit value loaded into the timer, and the status value was the timer-control register bits.
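As a sketch only, here is what such a 32-bit event record might look like in C. The author's actual trace was assembler (a reduced version appears in Listing One); the structure, names, and logging routine below are hypothetical, and a version called from interrupt level would also need #pragma check_stack(off) and care about reentrancy.

#define EVENT_LEN 256                     /* 256 events is usually plenty */

struct event {
    unsigned char  code;                  /* which of the events listed above          */
    unsigned char  status;                /* event-dependent, e.g. a device register   */
    unsigned short value;                 /* event-dependent, e.g. count and character */
};

static struct event events[EVENT_LEN];    /* zero-filled at startup */
static int event_ptr = 0;

void log_event(unsigned char code, unsigned char status, unsigned short value)
    {
     events[event_ptr].code   = code;
     events[event_ptr].status = status;
     events[event_ptr].value  = value;
     event_ptr = (event_ptr + 1) % EVENT_LEN;
     events[event_ptr].code = 0;          /* keep a zero entry after the newest one */
    }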

With this technique, I could detect that I was failing to enable output interrupts when a certain control command went out. It was a classic oversight, but I might have looked at the code for days without finding it. Instead, the trace showed it immediately, and I spent only four hours writing and inserting the trace-event code. A version of the newevent macro that records only an 8-bit event code is in Listing One (page 98). Note the use of conditional compilation: once these probe points are inserted, a single conditional flag determines whether the probes are generated. Thus the debugging tool is always available when you need it.

All names in Listing One are declared PUBLIC using C naming conventions. I did this because I got tired of decoding the bytes "by hand" using the debugger. It turns out I needed to use this technique many times as we enhanced the functionality of the interrupt handler to deal with more sophisticated control and more complex input packets. I finally spent an hour and modified the top-level event loop in the C code to use a piece of the screen real estate and display the last ten events in symbolic form. Once I did this, debugging went even faster, because the screen update was interruptible and didn't interfere with the real-time performance at all.
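A sketch of that last-ten-events display, called from the top-level C event loop (not from interrupt level). It assumes the trace buffer from Listing One and a color text-mode screen at B800h; the abbreviated name table and the column placement are hypothetical.

extern unsigned char  event_events[];     /* _event_events in Listing One (C naming) */
extern unsigned short event_ptr;          /* _event_ptr: index of the newest entry   */

static const char *event_names[] = {
    "idle    ", "in-enter", "in-exit ", "out-entr", "out-exit"   /* ... */
};
#define N_NAMES (sizeof(event_names) / sizeof(event_names[0]))

void show_recent_events(void)
    {
     int i, j;
     unsigned slot = event_ptr;
     for(i = 0; i < 10; i++)              /* one screen row per event */
        {
         const char *name = (event_events[slot] < N_NAMES)
                                ? event_names[event_events[slot]] : "????????";
         unsigned char far *cell =
             (unsigned char far *)0xB8000000L + (i + 1) * 160 + 140;
         for(j = 0; name[j] != '\0'; j++)
            {
             cell[2 * j]     = name[j];   /* character cell          */
             cell[2 * j + 1] = 0x70;      /* reverse-video attribute */
            }
         slot = (slot == 0) ? 255 : slot - 1;    /* walk backward through the ring */
        }
    }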

Timers

I once hit a nasty situation due to a bug in the IBM Professional Debug Facility (PDF). It seems that the PDF restores the flags (and effectively does an STI) sometime before actually doing the IRET that returns from the breakpoint, and it is not re-entrant. When I started debugging with the timer enabled, I discovered that if I took any breakpoint while a breakpoint was also set in the timer routine, the system would crash when I said "proceed," because the timer interrupt was pending and was taken as soon as interrupts were enabled. The breakpoint in the timer routine caused the debugger to be entered before it had fully exited, and it was not prepared to deal with this. It took some careful analysis to determine the cause, but I finally isolated it to the situation just described. (Unfortunately, I got no vendor support for this bug; there didn't even seem to be a mechanism for reporting it!)

The problem caused by this debugger bug was that if I took a breakpoint once the timer was running, I was doomed. What to do? The answer was simple: Turn the timer off before the breakpoint is taken and restore it afterwards. So I modified the macro in Example 1(b) to that in Example 1(c).

The routines DBT_OFF and DBT_ON simply stopped the timer and restarted it if it had been on. Again, notice that this costs executed instructions whether or not the breakpoint is taken. For super-time-critical applications, this may be unacceptable. For many applications, even the stop/start of the timer may be unacceptable unless the breakpoint is actually taken. These cases complicate life somewhat.
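The DBT_OFF/DBT_ON routines themselves aren't shown. As a sketch only, assuming "the timer" is the PC's IRQ 0, they could mask and unmask the timer at the 8259A interrupt controller's mask register (port 21h); the author's actual routines may have programmed the timer chip or the device's own timer instead, and a version callable from the assembler macro would need matching naming and register-preservation conventions.

#include <conio.h>                   /* inp() and outp() in Microsoft C */

static int timer_was_on;

void DBT_OFF(void)
    {
     int mask = inp(0x21);           /* current 8259A interrupt mask   */
     timer_was_on = !(mask & 1);     /* bit 0 masks IRQ 0 (the timer)  */
     outp(0x21, mask | 1);           /* hold off timer interrupts      */
    }

void DBT_ON(void)
    {
     if(timer_was_on)
         outp(0x21, inp(0x21) & ~1); /* restart only if it had been on */
    }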

The principle, "Don't design dynamically stable systems; your system should function at any clock speed, including 0 Hz," is harder to accomplish in hardware in an era of dynamic RAMs, but we can apply it to software. If we can safely ignore the impact of slow performance on the external device, we should be able to build into our "real-time" system the ability to run the clock at any speed. I've done this in a number of systems. For one thing, it slows down the interactions to where you can see them; instead of characters coming out at full 9600 baud, I might emit one character every five seconds, giving me enough time to actually see the effect. By slowing the clock speed, you can often take time to do screen displays of what is going on for analysis.

The technique of slowing the clock down is quite simple, and is shown in Listing Two (page 98). The variable _cdiv can be set from a high-level debugger, a command to the program, a command-line argument, and so on. For example, in one debugging scenario, I loaded the clock divisor into the D0 register before taking a breakpoint, and stored the contents of the D0 register back into the divisor after the breakpoint. So to set the clock divisor, all I had to do from the monitor was change the value in the D0 register!

With _cdiv set to 0 or 1, the clock interrupts are handled at full speed; set to any larger value, the clock is effectively divided by that amount. Of course, you could accomplish the same effect by changing the clock-time value itself.

But I usually make an error in doing this and don't get the expected result; or, if I want a 100:1 slowdown, I can't get a large enough value into the 16-bit clock register (such as the one found on 80x86 boxes). Slowing the clock down by the divisor technique is simple and lets me get a 65,535:1 slowdown if I need it. Debugging techniques should not be so complicated that they themselves may contain bugs.

As a further consideration, the program itself should be impervious to timing. For example, in doing certain RS-232 packet protocols, time-out becomes an important consideration for robustness. But if you're debugging the sender, the receiver will almost certainly time out. In this case, you need to have an option that allows you to disable time-out at the receiver, either by disabling it entirely, or slowing it down enough to allow for debugging of the transmitter. In addition, you need an option that will force a time-out condition at the receiver, even if it is a lie. Why? So you can check that the time-out recovery actually works!
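As a sketch under assumed names, all three options can hang off a single time-out test; the tick counter, the flag variables, and the routine below are hypothetical, not part of the author's driver.

extern unsigned long ticks;          /* incremented by the timer interrupt         */

int debug_no_timeout    = 0;         /* 1: never time out                          */
int debug_timeout_scale = 1;         /* stretch factor for the time-out interval   */
int debug_force_timeout = 0;         /* 1: report a time-out even if none occurred */

int packet_timed_out(unsigned long started, unsigned long limit)
    {
     if(debug_force_timeout)
         return 1;                   /* lie, so the recovery path gets exercised */
     if(debug_no_timeout)
         return 0;                   /* let the other end be debugged in peace   */
     return (ticks - started) >= limit * debug_timeout_scale;
    }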

Using High-level Language Interrupt Handlers and ROM Code

With a number of extensions to C compilers (Microsoft and Borland C, for example), you can declare a C procedure to be of type interrupt and dispense with any assembly-code interface at all. Such a system means that all the code is compiler generated, and the NOP trick can't be done.

Assume that we have a low-level debugger that runs on the bare metal and intercepts the breakpoint interrupt--the IBM Professional Debug Facility, for example (obviously you want to use Soft-Ice if you have a 386 or higher, but for an 8088/186/286 you need the help)--or perhaps an embedded system with all of the code in C and no real debugger at all.

What I did in this case was write a procedure very similar to that in Listing Three (page 98), which is written in Microsoft C. I set a value in the bpts array to determine which of the 50 breakpoints I want enabled; if a breakpoint is enabled, the routine simply loads its index into the AX register and takes a debugger breakpoint. For example, to take the breakpoint identified as breakpoint number 7, the call is simply bpt(7).

Note the use of #pragma to suppress the generation of stack-overflow tests. When you're writing interrupt-level code, you don't want the program calling DOS to exit because of stack overflow! You might find yourself trying to re-enter DOS, with the expected serious consequences. (No, you don't want to know how I found this out. It did involve a full reformatting of the disk and a tape restore.)

For ROM code, the same technique applies. However, it may be necessary to explicitly zero out the breakpoint vector kept in RAM, or to use an initialization clause with the declaration, to guarantee that no unwanted breakpoints are taken because of random trash left in memory from the last download.
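A minimal illustration of that precaution, assuming the bpts array from Listing Three; the startup routine name is hypothetical.

#include <string.h>

extern char bpts[50];                /* breakpoint-enable vector from Listing Three */

void init_bpts(void)
    {
     memset(bpts, 0, sizeof(bpts));  /* don't trust RAM left over from the last download */
    }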

To be able to quickly find the base of the bpts vector, I use the same technique of taking a breakpoint with a desired value in some register. When the program starts, I call show_bpts (Listing Four) and write down the value of DS:AX, which gives me the base of the table. Thus, when I'm at interrupt level with no symbols available, I can readily compute the address of any breakpoint-enable flag from its index; I simply equate symbolic breakpoint names to their hex index values. At no time do I need an assembler listing or even a segment of the link map.

Conclusion

While nothing substitutes for careful design and coding, the complexity of our systems--and in particular the complexity of either retrofitting to an existing system or interfacing to a complex (and poorly documented) interface protocol--is a trap for even the most careful. When the full power of tools like source-level debuggers is unavailable, the techniques described here have significantly improved productivity.


Example 1: (a) Determining the location of the NOP that you're going to change to a breakpoint instruction; (b) this modified version of (a) saves a register, loads the breakpoint number into that register, does the NOP, and restores the register; (c) this modified version of (b) turns the timer off before the breakpoint can be taken and restores it afterwards.

(a)

       PUBLIC B_P_n
B_P_n: NOP


(b)

       PUBLIC B_P_n
       PUSH AX            ; preserve AX across the probe
       MOV  AX,n          ; breakpoint number, visible when the trap is taken
B_P_n: NOP                ; overwrite this NOP with the trap instruction
       POP  AX


(c)

       PUBLIC B_P_n
       PUSH AX
       MOV  AX,n          ; breakpoint number
       CALL DBT_OFF       ; stop the timer before a possible trap (preserves AX)
B_P_n: NOP                ; overwrite this NOP with the trap instruction
       CALL DBT_ON        ; restart the timer if it had been running
       POP  AX

[LISTING ONE]


if debug_events
evlen=256                                  ; number of events kept in the ring
        public  _event_len
        public  _event_ptr
        public  _event_last
        public  _event_events
_event_len      dw      evlen              ; buffer length, readable from C
_event_ptr      dw      0                  ; index of the newest event
_event_last     db      0                  ; most recent event code
_event_events   db      evlen+1 dup(0)     ; ring buffer, plus one guard byte so the
                                           ; newest entry is always followed by a zero
endif

newevent        macro   ev                 ; record one event (note: alters the flags)
                local   L1
if debug_events
                inc     _event_ptr         ; advance to the next slot
                cmp     _event_ptr,evlen
                jl      L1
                mov     _event_ptr,0       ; wrap around
L1:             push    bx
                mov     bx,_event_ptr
                mov     _event_events[bx],ev      ; record the event code
                mov     _event_events[bx+1],0     ; zero entry after the newest
                mov     _event_last,ev
                pop     bx
endif
                endm

[LISTING TWO]


        PUBLIC _cdiv
_cdiv   dw     0          ; clock divisor; 0 or 1 means full speed
cval    dw     0          ; countdown of interrupts remaining to skip

_clock  proc far          ; timer interrupt entry point
        ...               ; interrupt prolog and setup
        dec  cval         ; count down the divisor
        jle  ok           ; reached zero: take this tick
        jmp  cexit        ; otherwise ignore it
ok:     mov  ax,_cdiv     ; reload the countdown for next time
        mov  cval,ax      ; ...
        ...               ; body of clock interrupt handler
cexit:  ...               ; epilog code
        iret              ; return from interrupt
_clock  endp

[LISTING THREE]

#pragma check_stack(off)        /* no stack-overflow probes in this code */
char bpts[50];                  /* nonzero entry enables that breakpoint */
void bpt(int n)
    {
     if(bpts[n])
         _asm{
              mov ax,n          /* breakpoint number visible in AX */
              int 3             /* trap to the debugger */
             }
    }
[LISTING FOUR]

void show_bpts()
   {
    register char * where = bpts;
    _asm {
          mov ax,where          /* offset of bpts returned in AX */
          int 3                 /* breakpoint: note DS:AX at the debugger */
         }
   }

End Listings


Copyright © 1993, Dr. Dobb's Journal


