
Embedded Systems

Using Trace to Debug Real-Time Systems


Michael is a software engineer at Green Hills Software. Michael can be reached at [email protected]

Imagine that your embedded software project is progressing nicely through the final stages of testing when, suddenly, the software unexpectedly crashes. The engineers who found this bug cannot give you a reproducible test case, yet they are able to get the product to crash periodically. Sound familiar? As an engineer, you want to solve this problem, yet there is no reliable way to reproduce it. And to add to the stress of the situation, your manager is pressuring you to release the product because you need to get to market as quickly as possible. Software teams encounter this situation all the time and are forced to release products even though they know that there are serious bugs lurking in the code.

Now imagine that there is a tool that gives you nearly perfect visibility into what your software was doing when the system crashed, making it easy to find and fix these intermittent and difficult bugs. The technology that makes this dream a reality is trace data generated by embedded microprocessors, and the tools that collect and analyze this trace data make it possible for you to analyze what went wrong and find the cause of the undesirable behavior.

Hardware trace data has been around for years, yet it has traditionally only been used by hardware and low-level firmware engineers to track down board-level problems and initialization failures. However, the concept behind hardware trace data is familiar to software engineers because they have been using a similar technique—printf() logging and debugging—since the advent of modern software development.

Hardware trace data provides a complete log of the instructions executed by the processor, essentially giving you full printf() debugging in a completely nonintrusive fashion. This lets you easily look at interactions between multiple tasks and interrupts in your system. However, hardware trace data can also be used in many embedded systems where normal code instrumentation is less than ideal. For instance, if you have a system that has tight timing margins, you may not be able to insert enough extra statements to track the flow of your program reliably. Additionally, you may have bugs that occur only because of sensitive timing problems and inserting extra code may make the bugs disappear.

Because hardware trace provides a full history of the instructions executed by the processor, there is far too much data to analyze efficiently by hand. Fortunately, extensive software tools exist that analyze trace data and provide high-level information to help solve a variety of problems. Basic trace-analysis software lets you reconstruct the flow of your program and possibly see the memory locations that it read or wrote. Other software can generate profiling information from trace data, provide additional high-level views of your program execution, or even let you virtually step backwards through your code so that you can see exactly what happened. These tools deliver the main value of trace data: you can collect and analyze all of this information without changing any of the runtime characteristics of your embedded software.

Hardware Versus Software Trace

Much has been written about software trace and instrumenting code to provide software trace logs (see, for example, "Tracing Program Execution & NUnit" by Paul Kimmel, DDJ, August 2004). While this type of instrumentation is great for certain applications, hardware trace has several advantages over software trace, especially for embedded systems.

First of all, software trace requires you to instrument your code. If you have an existing codebase that does not have trace instrumentation, it could take months to insert trace statements in all of the interesting code fragments. And if you overlook some of the interesting code or neglect to instrument some functions, you may spend a good deal of time investigating the wrong cause of your bug.

A second advantage of hardware trace data is that it often has the ability to provide data values along with each instruction that is executed. This lets you see the values that caused the program to take the incorrect execution path, which gives you more insight into what caused the problem and how to fix it. While you can certainly log values with software trace, it is impractical to log more than a few, and if a bug occurs and you did not instrument one necessary value then you may not be able to find the cause of your bug.

Finally, the most important advantage that hardware trace data has over instrumentation—especially for embedded systems—is that it does not involve changing any aspect of your program to collect the data. This is vital for applications that may involve timing errors, interrupt latency issues, or other bugs that only appear when a specific set of circumstances arises. Additionally, it is often impractical to instrument code in interrupt handlers or other timing-sensitive code. In nearly all high-reliability applications, you must certify and test the final code without any instrumentation, and software trace almost certainly cannot be enabled in production code, making it impossible to use software trace to track down bugs that appear in the production version of the code.

What is Hardware Trace Data?

Hardware trace data is generated and output by microprocessors and collected using a Trace Port Analyzer (TPA), a device that records trace data from a microprocessor and stores it in an on-board trace buffer. This data is output over a dedicated trace port and consists of a stream of highly compressed packets. Some examples of embedded microprocessors that currently support trace are ARM processors such as the ARM7, ARM9, and ARM11; the IBM PowerPC 4xx series; and the MPC5554 and MAC7100 family from Motorola. Figure 1 shows a trace collection system with a microprocessor outputting trace data that is collected by a TPA and a PC that software engineers use to debug software.

There are several Trace Port Analyzers available, including the Agilent E5904B Trace Port Analyzer, the RealView MultiTrace from ARM, and the SuperTrace Probe from Green Hills Software (the company I work for). The trace buffers on TPAs range from a few megabytes (the Agilent E5904B has 4 MB of trace storage) to 1 GB (available on the SuperTrace Probe). Figure 2 shows a SuperTrace Probe ready to collect trace from an ARM920 target.

Trace data essentially consists of a log of each instruction that is executed by the processor. This is often referred to as "PC Trace" because it is simply a list of PCs executed by the processor. Many trace implementations also contain information about data that is read from and written to memory (called "Data Trace"), as well as information about when context switches and interrupts occur. This information combines to give you a complete view of exactly what the processor was doing as your system was running. Trace data lets you debug either a single application or the entire system by showing you the interactions between various tasks and interrupts. This is essential for finding the most difficult bugs in complicated systems.

As you can imagine, collecting a full log of the PCs executed and data values accessed involves a lot of data. For instance, if you have a processor running at 200 MHz and executing roughly one instruction per cycle, storing a full trace log of 32-bit PCs involves storing roughly 800 MB of trace data per second. On top of this, if 20 percent of the instructions are data accesses (assuming data trace consists of 32 bits of address and 32 bits of data), then that is another 320 MB of data per second. That's well over 1 GB of data per second for a relatively slow processor! Due to this high volume of data, the designers of trace-enabled microprocessors have implemented efficient compression systems to minimize the amount of data that is streaming out of the processor. With these compression schemes, trace ports generally output between 4 and 20 bits per cycle, depending on the compression scheme, whether data trace is available, and how many pins the chip designers are willing to dedicate to trace.

Even with a high rate of compression, trace data still uses very high bandwidth. The 200-MHz processor just mentioned still outputs trace data at a rate of 500 MB/second with a 20-bit trace port. Thus, even a very large trace buffer on a TPA fills up in a few seconds or less. To allow tracing over longer periods of time, the devices that collect trace data generally use a circular buffer to constantly collect data, always throwing away the oldest data once the buffer is full.

Because a circular trace buffer wraps around so quickly, it is important to be able to stop trace collection precisely when a problem occurs. This problem is solved by "trace triggers," which are special signals from the trace hardware that are sent through the trace port when a condition defined by the user occurs. Potential trigger conditions include executing a specific instruction and reading from a selected memory location. When the TPA receives the trigger signal, it collects a user-defined portion of the buffer and then stops trace collection. For instance, you can configure the TPA to trigger in the center of the trace buffer, which collects half of your trace buffer before and half after the trigger occurs. Likewise, you could put the trigger at the beginning or end of the buffer, or anywhere in between, depending on what information you are trying to collect.

For example, if you are trying to debug an error message that occurs intermittently, you could set a trigger to occur when the error message is printed. Further, you could configure the trigger to occur at the end of the trace buffer. Once the error message is printed with trace collection enabled, your trace buffer will contain information about what instructions were executed prior to the error message. You will now be able to examine in detail the code paths that were executed, looking at memory and register values to help you determine what caused the error message you are interested in.

Trace-Analysis Software

Once the trace data is collected, the analysis software uploads the data from the TPA and decompresses it. After decompression, the trace-analysis software displays the list of instructions that were executed. However, because most software is written at a much higher level than assembly code, a raw list of instructions executed by the processor is not all that useful to software engineers. A major benefit that trace data brings to embedded software development is software that analyzes this information and presents it in an easy-to-understand format, letting developers see exactly what is happening in their system. There are many trace-analysis software packages available, including the ARM RealView Developer Suite and the TimeMachine tools from Green Hills Software.

An Example

One of the best examples of a situation where trace data is particularly useful is a system that suffers from resource contention problems. For instance, if two tasks try to access a shared memory-mapped bus without some sort of locking mechanism, one or both tasks will get incorrect values. This could cause any number of system glitches and errors. This type of problem is often not obvious from looking at the source code for either task because the bug only occurs due to the interaction between two tasks. Additionally, in complicated systems, this type of problem may occur infrequently, making it even harder to track down the root cause. With trace, as long as you catch the bug occurring one time, you have a good chance at discovering the root cause and fixing the bug.

Listing One is a simplified example of two tasks incorrectly sharing a resource. In this case, two tasks read from the I2C bus on a processor. Roughly, an I2C bus is a serial bus requiring multiple transactions to read a single piece of data, including selecting a device and then reading from the selected device. Task A periodically reads from a temperature sensor on the I2C bus and task B reads from an EEPROM that is also connected to the I2C bus. Task A checks for an invalid reading from the temperature sensor and prints an error when it gets an invalid reading. You can set a trigger to occur when the error condition (printf() statement) is executed. Then enable trace when you start running the program and wait for the trigger to be hit. Once the trigger is hit, the trace data will be loaded into the trace-analysis software and you will be able to look at what happened in your system prior to the error.

Many trace-analysis software packages convert the list of instructions executed by the processor into a list of functions that were executed and what task was running at each point in time. This is a relatively elementary type of analysis that can give useful information to programmers debugging a system. For this example, Figure 3 shows a sequence of functions and context switches that indicate that a context switch occurred after selecting one device on the bus but before reading the data from that device.

From examining this series of function calls, it is clear that one task initiated an I2C bus transaction while another transaction was still in progress. Especially since the two devices seem unrelated at first glance, traditional debugging techniques would probably not uncover this bug nearly as quickly as the trace tools.

Performance Analysis

Trace data also makes performance analysis easy by processing a trace log and displaying profiling information. This lets you easily see how much processor time each part of your system uses without requiring any code instrumentation or other overhead. Additionally, because it does not require any code modifications, profiling information generated from trace data lets you see how long various interrupt handlers take when they run and lets you identify situations where various components will not meet their real-time deadlines. Another significant benefit of using trace data for profiling is that you can get very accurate performance numbers without incurring any system overhead or using on-board sampling timers, which lets you collect all instructions for a region of code rather than a relatively small number of samples per second.

Many TPAs are able to generate either a timestamp or cycle count information along with each incoming trace packet. A timestamp is the time at which each incoming trace packet is collected, often with a resolution of a few nanoseconds, and a cycle count is the number of processor cycles that each instruction takes to execute. Both of these pieces of information provide accurate timing information for individual instructions, memory accesses, and other processor events. You can even see individual cache misses by finding memory accesses that take longer than others! This information provides very detailed performance analysis, letting you see detailed performance information that is unavailable using typical debugging methods.

One great advantage of this profiling information being readily available when tracing a microprocessor is that you can often find problems that you may not even be specifically looking for. Since you do not need to recompile the code to collect profiling data, it is almost effortless to collect trace data and convert it to profiling information. You can then easily see what functions executed and find any potential performance problems before they start negatively affecting your system.

Another specific application of trace-based performance analysis is driver optimization. If you have ever had to squeeze every bit of performance out of an Ethernet driver and/or TCP/IP stack, you know that a large part of the battle is determining exactly what sequence of events occurs and where the bottlenecks are. Using trace data coupled with the generated profiling information allows you to first find the functions or segments of code that take the most time, then examine the trace data to determine exactly what code was executed and why it took so long. Whether your code copies the data too many times or uses an inefficient algorithm, trace tools can quickly help you find the bottlenecks in your system. You can then examine these areas in more detail using traditional debugging techniques to eliminate inefficiencies in your code.


While trace is currently supported on relatively few embedded processors, the advantages of devoting some portion of the silicon to generating trace data greatly outweigh the extra costs in many applications. Indeed, since more and more new processor designs include trace implementations and trace-analysis tools are improving, trace data has the potential to fundamentally change the way software engineers debug their software. From simple resource management errors to complicated memory bugs within a single application, trace data and analysis make it possible to correct otherwise difficult bugs in a relatively short time.

How many of us have had bugs reported to us that we were unable to reproduce and had to write off as random glitches? Imagine if it were suddenly possible to solve many of these problems and remove most if not all of the glitches from your system. With trace data, this possibility is becoming a reality. And as more processors support trace and the trace-analysis tools improve, it will only become easier in the future.


Listing One

#include <stdio.h>

#define TEMPERATURE_SENSOR 1
#define EEPROM             2

// Provided by the bus driver elsewhere in the system.
extern int read_i2c_register(void);

// This function is obviously not correct for actually reading
// from an I2C bus, but it is an example of a function that could
// read from a shared resource.
int read_i2c_value(int device_id)
{
    // Select the I2C device
    // Select the register on this device that we want to read
    // Actually read the register
    return read_i2c_register();
}

void TaskA_entry()
{
    while (1) {
        int temp_c;
        temp_c = read_i2c_value(TEMPERATURE_SENSOR);
        if (temp_c > 100) {
            // Error.  The temperature cannot possibly be
            // above boiling!
            printf("Error reading temperature sensor!\n");
        }
    }
}

void TaskB_entry()
{
    while (1) {
        int eeprom_value;
        eeprom_value = read_i2c_value(EEPROM);
        (void)eeprom_value;   // the value would be used here
    }
}
