Multithreaded Debugging Techniques

Debugging multithreaded applications can be a challenging task.


April 23, 2007
URL: http://www.drdobbs.com/tools/multithreaded-debugging-techniques/199200938

Debugging multithreaded applications can be a challenging task. The increased complexity of multithreaded programs results in a large number of possible states that the program may be in at any given time. Determining the state of the program at the time of failure can be difficult; understanding why a particular state is troublesome can be even more difficult. Multithreaded programs often fail in unexpected ways, and often in a nondeterministic fashion. Bugs may manifest themselves in a sporadic fashion, frustrating developers who are accustomed to troubleshooting issues that are consistently reproducible and predictable. Finally, multithreaded applications can fail in a drastic fashion: deadlocks can cause an application, or worse yet, the entire system, to hang. Users tend to find these types of failures unacceptable.

General Debug Techniques

Regardless of which library or platform you are developing on, several general principles can be applied to debugging multithreaded software applications.

The first technique for eliminating bugs in multithreaded code is to avoid introducing the bug in the first place. Many software defects can be prevented by using proper software development practices. The later a problem is found in the product development lifecycle, the more expensive it is to fix. Given the complexity of multithreaded programs, it is critical that multithreaded applications are properly designed up front.

How often have you, as a software developer, experienced the following situation? Someone on the team you're working on gets a great idea for a new product or feature. A quick prototype that illustrates the idea is implemented and a quick demo, using a trivial use-case, is presented to management. Management loves the idea and immediately informs sales and marketing of the new product or feature. Marketing then informs the customer of the feature and, in order to make a sale, promises the customer the feature in the next release. Meanwhile, the engineering team, whose original intent in presenting the idea was to get resources to properly implement the product or feature sometime in the future, is now faced with the task of delivering on a customer commitment immediately. As a result of time constraints, it is often the case that the only option is to take the prototype and try to turn it into production code.

While this example illustrates a case where marketing and management may be to blame for not following an appropriate process, software developers are often at fault in this regard as well. For many developers, writing software is the most interesting part of the job. There's a sense of instant gratification when you finish writing your application and press the run button. The results of all the effort and hard work appear instantly. In addition, modern debuggers provide a wide range of tools that allow developers to quickly identify and fix simple bugs. As a result, many programmers fall into the trap of coding now, deferring design and testing work to a later time. Taking this approach on a multithreaded application is a recipe for disaster.

Code Reviews

Many software processes suggest frequent code reviews as a means of improving software quality. The complexity of parallel programming makes this task challenging. While not a replacement for using well established parallel programming design patterns, code reviews may, in many cases, help catch bugs in the early stages of development.

One technique for these types of code reviews is to have individual reviewers examine the code from the perspective of one of the threads in the system. During the review, each reviewer steps through the sequence of events as the actual thread would. Have objects that represent the shared resources of the system available and have the individual reviewers (threads) take and release these resources. This technique will help you visualize the interaction between different threads in your system and hopefully help you find bugs before they manifest themselves in code.

As a developer, when you get the urge to immediately jump into coding and disregard any preplanning or preparation, you should consider the following scenarios and ask yourself which situation you'd rather be in. Would you rather spend a few weeks of work up front to validate and verify the design and architecture of your application, or would you rather deal with having to redesign your product when you find it doesn't scale? Would you rather hold code reviews during development or deal with the stress of trying to solve mysterious, unpredictable showstopper bugs a week before your scheduled ship date? Good software engineering practices are the key to writing reliable software applications. Nothing is new, mysterious, or magical about writing multithreaded applications. The complexity of this class of applications means that developers must be conscious of these fundamental software engineering principles and be diligent in following them.

Extending your Application: Using Trace Buffers

Two categories of bugs are often found in multithreaded applications: synchronization bugs and performance bugs. Synchronization bugs include race conditions and deadlocks that cause unexpected and incorrect behavior. Performance bugs arise from unnecessary thread overhead due to thread creation or context switching, and from memory access patterns that are suboptimal for a given processor's memory hierarchy. The application returns the correct results, but often takes too long to be usable. This article focuses on debugging synchronization bugs that cause applications to fail.

In order to find the cause of these types of bugs, two pieces of information are needed:

  1. Which threads are accessing the shared resource at the time of the failure.
  2. When the access to the shared resource took place.

In many cases, finding and fixing synchronization bugs involves code inspection. A log or trace of the different threads in the application and the pattern in which they accessed the shared resources of the code helps narrow down the problematic code sections. One simple data structure that collects this information is the trace buffer.

A trace buffer is simply a mechanism for logging events that the developer is interested in monitoring. It uses an atomic counter that keeps track of the current empty slot in the array of event records. The type of information that each event stores is largely up to the developer. A sample implementation of a trace buffer, using the Win32 threading APIs, is shown in Listing One. (In the interest of making the code more readable, Listing One uses the time() system call to record system time. Due to the coarse granularity of this timer, most applications should use a high-performance counter instead to keep track of when events occurred.)

Listing One: Sample Implementation of a Trace Buffer.

 // Circular 1K Trace buffer
 #define TRACE_BUFFER_SIZE 1024

 typedef struct traceBufferElement
 {
    DWORD threadId;
    time_t timestamp;
    const char *msg;
 } traceBufferElement;
 
 static LONG m_TraceBufferIdx = -1;
 static traceBufferElement traceBuffer[TRACE_BUFFER_SIZE]; 
 
 void InitializeTraceBuffer()
 {
    m_TraceBufferIdx = -1;
    
    /* initialize all entries to {0, 0, NULL} */
    memset(traceBuffer, 0,    
           TRACE_BUFFER_SIZE*sizeof(traceBufferElement));
 }
 
 void AddEntryToTraceBuffer(const char *msg)
 {
    LONG idx = 0;
 
    // Get the index into the trace buffer that this 
    // thread should use
    idx = InterlockedIncrement(&m_TraceBufferIdx) % 
                                  TRACE_BUFFER_SIZE;
 
    // Enter the data into the Trace Buffer
    traceBuffer[idx].threadId = GetCurrentThreadId();                    
    traceBuffer[idx].timestamp = time(NULL);
    traceBuffer[idx].msg = msg;
 }
 
 void PrintTraceBuffer()
 {
    int i;
    printf("Thread ID  Timestamp   Msg\n");
    printf("----------|----------|----------------------"
           "-----------------\n");
 
    // sort by timestamp before printing
    // (SortTraceBufferByTimestamp() is assumed to be defined elsewhere)
    SortTraceBufferByTimestamp();
    for (i = 0; i < TRACE_BUFFER_SIZE; i++)
    {
       if (traceBuffer[i].timestamp == 0)
       {
          break;
       }
       printf("0x%8.8x|0x%8.8x| %s\n", 
              traceBuffer[i].threadId, 
              traceBuffer[i].timestamp, 
           traceBuffer[i].msg);
    }
 }

Listing One creates a trace buffer that can store 1,024 events. It stores these events in a circular buffer. As you'll see shortly, once the circular buffer is full, the atomic index wraps around and the oldest event is replaced. This simplifies the implementation because it doesn't require dynamically resizing the trace buffer or storing the data to disk. In some instances, those operations may be desirable, but in general, a circular buffer should suffice.

The listing begins by defining the data structures used in this implementation. The event descriptor, traceBufferElement, contains three fields: a field to store the thread ID, a timestamp value that indicates when the event occurred, and a generic message string that is associated with the event. This structure could include a number of additional parameters, such as the name of the thread.

The trace buffer in Listing One defines three operations. The first function, InitializeTraceBuffer(), initializes the resources used by the trace buffer. The atomic counter is initialized to -1 because adding a new entry to the trace buffer first increments the counter; the first entry is therefore stored in slot 0. Once the trace buffer is initialized, threads may call AddEntryToTraceBuffer() to update the trace buffer with events as they occur. PrintTraceBuffer() dumps a listing of all the events that the trace buffer has logged to the screen. This function is very useful when combined with a debugger that allows users to execute code at a breakpoint; both Microsoft Visual Studio and GDB support this capability. With a single command, the developer can see a log of all the recent events being monitored, without having to parse a data structure using the command line or a watch window.
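
If finer-grained timestamps are needed, the time() call can be swapped for a high-resolution counter. The following is a minimal sketch of that variant (it is not part of the original listing, and it assumes the timestamp field of traceBufferElement is changed to a LARGE_INTEGER):

 // Sketch: variant of AddEntryToTraceBuffer() using a high-resolution counter.
 // Assumes traceBufferElement.timestamp is declared as a LARGE_INTEGER.
 void AddEntryToTraceBufferHiRes(const char *msg)
 {
    LONG idx = InterlockedIncrement(&m_TraceBufferIdx) % TRACE_BUFFER_SIZE;

    traceBuffer[idx].threadId = GetCurrentThreadId();
    // QueryPerformanceCounter offers much finer granularity than time()
    QueryPerformanceCounter(&traceBuffer[idx].timestamp);
    traceBuffer[idx].msg = msg;
 }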

Note that the implementation of the trace buffer in Listing One logs events as they are passed into the buffer. This doesn't necessarily guarantee that the trace buffer will log events exactly as they occur in time. To illustrate this point, consider the two threads shown in Listing Two.

Listing Two: Two Threads Logging Events to a Trace Buffer.

unsigned __stdcall Thread1(void *)
{
   // ... thread initialization
   // write global data
   m_global = do_work();
   AddEntryToTraceBuffer(msg);
   // ... finish thread
}
unsigned __stdcall Thread2(void *)
{
   // ... thread initialization
   // read global data
   Thread_local_data = m_global;
   AddEntryToTraceBuffer(msg);
   // ... finish thread
}

By now it should be clear what the problem is. A race condition exists between the two threads and the access to the trace buffer. Thread1 may write to the global data value and then start logging that write event in the trace buffer. Meanwhile, Thread2 may read that same global value after the write, but log this read event before the write event. Thus, the data in the buffer may not be an accurate reflection of the actual sequence of events as they occurred in the system.

One potential solution to this problem is to protect the operation that you want to log and the subsequent trace buffer access with a synchronization object. A thread, when logging the event, could request exclusive access to the trace buffer. Once the thread has completed logging the event, it would then unlock the trace buffer, allowing other threads to access the buffer. This is shown in Listing Three.

Listing Three: Incorrectly Synchronizing Access to the Trace Buffer.

// This is NOT RECOMMENDED
unsigned __stdcall Thread1(void *)
{
   // ... thread initialization
   // write global data
   LockTraceBuffer();
   m_global = do_work();
   AddEntryToTraceBuffer(msg);
   UnlockTraceBuffer();
   // ... finish thread
}
unsigned __stdcall Thread2(void *)
{
   // ... thread initialization
   // read global data
   LockTraceBuffer();
   Thread_local_data = m_global;
   AddEntryToTraceBuffer(msg);
   UnlockTraceBuffer();
   // ... finish thread
}
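
The LockTraceBuffer() and UnlockTraceBuffer() helpers assumed by Listing Three are not shown; a minimal sketch would simply wrap a critical section dedicated to the trace buffer:

 // Hypothetical helpers assumed by Listing Three
 static CRITICAL_SECTION traceBufferLock; // initialized once at startup

 void LockTraceBuffer()   { EnterCriticalSection(&traceBufferLock); }
 void UnlockTraceBuffer() { LeaveCriticalSection(&traceBufferLock); }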

There are a number of drawbacks to this technique. Using a synchronization primitive to protect access to a trace buffer may actually mask bugs in the code, defeating the purpose of using the trace buffer for debugging. Assume that the bug the developer is tracking down is related to a missing lock around the read or write access in the thread. By locking access to the trace buffer, the developer is protecting a critical section of code that may be incorrectly unprotected. Generally speaking, when tracking down a race condition, the programmer should avoid synchronizing access to the trace buffer. If you synchronize access and your application starts working, it's a clue that there may be a problem in the synchronization mechanism between those threads.

The preferred method of overcoming this limitation is to log a message before and after the event occurs. This is demonstrated in Listing Four.

Listing Four: Preferred Method of Logging Messages with a Trace Buffer.

unsigned __stdcall Thread1(void *)
{
   // ... thread initialization
   // write global data
   AddEntryToTraceBuffer(before_msg);
   m_global = do_work();
   AddEntryToTraceBuffer(after_msg);
   // ... finish thread
}
unsigned __stdcall Thread2(void *)
{
   // ... thread initialization
   // read global data
   AddEntryToTraceBuffer(before_msg2);
   Thread_local_data = m_global;
   AddEntryToTraceBuffer(after_msg2);
   // ... finish thread
}

By logging a before and after message, a programmer can determine whether or not the events occurred as expected. If the before and after messages between the two threads occur in sequence, then the developer can safely assume that the event was ordered. If the before and after messages are interleaved, then the order of events is indeterminate; the events may have happened in either order.

A trace buffer can be used to gather useful data about the sequence of operations occurring in a multithreaded application. For other more difficult problems, more advanced threading debug tools may be required.

Debugging Multithreaded Applications in Windows

Most Windows programmers use Microsoft Visual Studio as their primary integrated development environment (IDE). As part of the IDE, Microsoft includes a debugger with multithreaded debug support. This section examines the different multithreaded debug capabilities of Visual Studio, and then demonstrates how they are used.

Threads Window

As part of the debugger, Visual Studio provides a "Threads" window that lists all of the current threads in the system.

The Threads window acts as the command center for examining and controlling the different threads in an application.

Tracepoints

As previously discussed, determining the sequence of events that lead to a race condition or deadlock is critical in determining the root cause of any multithreading-related bug. In order to facilitate the logging of events, Microsoft has implemented tracepoints as part of the debugger for Visual Studio 2005.

Most developers are familiar with the concept of a breakpoint. A tracepoint is similar to a breakpoint except that instead of stopping program execution when the application's program counter reaches that point, the debugger takes some other action. This action can be printing a message or running a Visual Studio macro.

Enabling tracepoints can be done in one of two ways. To create a new tracepoint, set the cursor to the source line of code and select "Insert Tracepoint." If you want to convert an existing breakpoint to a tracepoint, simply select the breakpoint and pick the "When Hit" option from the Breakpoint submenu. At this point, the tracepoint dialog appears.

When a tracepoint is hit, one of two actions is taken based on the information specified by the user. The simplest action is to print a message. The programmer may customize the message based on a set of predefined keywords. These keywords, along with a synopsis of what gets printed, are shown in Table 1. All values are taken at the time the tracepoint is hit.

Table 1: Tracepoint Keywords

In addition to the predefined values in Table 1, tracepoints also give you the ability to evaluate expressions inside the message. In order to do this, simply enclose the variable or expression in curly braces. For example, assume your thread has a local variable threadLocalVar that you'd like to have displayed when a tracepoint is hit. The expression you'd use might look something like this:

Thread: $TNAME local variable's value is {threadLocalVar}.

Breakpoint Filters

Breakpoint filters allow developers to trigger breakpoints only when certain conditions are met. Breakpoints may be filtered by machine name, process, and thread. The list of different breakpoint filters is shown in Table 2.

Table 2: Breakpoint Filter Options

Breakpoint filters can be combined to form compound statements. Three logical operators are supported: ! (NOT), & (AND), and || (OR).
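
For example, a compound filter along the following lines (the thread and process names here are hypothetical) restricts a breakpoint to a single named thread within a single process:

ThreadName = "Producer" & ProcessName = "DataAcq.exe"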

Naming Threads

When debugging a multithreaded application, it is often useful to assign unique names to the threads that are used in the application. Assigning a name to a thread in a managed application is as simple as setting a property on the thread object. In this environment, it is highly recommended that you set the name field when creating the thread, because managed code provides no way to identify a thread by its ID.

In native Windows code, a thread ID can be directly matched to an individual thread. Nonetheless, keeping track of the different thread IDs makes the job of debugging more difficult. You may have noticed the conspicuous absence of any sort of name parameter in the functions used to create threads. In addition, there is no function provided to get or set a thread name. It turns out that the standard thread APIs in Win32 lack the ability to associate a name with a thread. As a result, this association must be made by an external debugging tool.

Microsoft has enabled this capability through predefined exceptions built into their debugging tools. Applications that want to see a thread referred to by name need to implement a small function that raises an exception. The exception is caught by the debugger, which then takes the specified name and assigns it to the associated ID. Once the exception handler completes, the debugger will use the user-supplied name from then on.

The implementation of this function can be found on the Microsoft Developer Network (MSDN) Web site at msdn.microsoft.com by searching for: "setting a thread name (unmanaged)." The function, named SetThreadName(), takes two arguments. The first argument is the thread ID. The recommended way of specifying the thread ID is to send the value -1, indicating that the ID of the calling thread should be used. The second parameter is the name of the thread. The SetThreadName() function calls RaiseException(), passing in a special 'thread exception' code and a structure that includes the thread ID and name parameters specified by the programmer.
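
For reference, the helper follows the pattern sketched below; the structure layout and the 0x406D1388 exception code are the convention documented by Microsoft, though the authoritative version is the one on MSDN:

 // Sketch of an MSDN-style SetThreadName() helper.
 const DWORD MS_VC_EXCEPTION = 0x406D1388;

 #pragma pack(push, 8)
 typedef struct tagTHREADNAME_INFO
 {
    DWORD dwType;     // must be 0x1000
    LPCSTR szName;    // pointer to the thread name
    DWORD dwThreadID; // thread ID (-1 means the calling thread)
    DWORD dwFlags;    // reserved, must be zero
 } THREADNAME_INFO;
 #pragma pack(pop)

 void SetThreadName(DWORD dwThreadID, const char *threadName)
 {
    THREADNAME_INFO info;
    info.dwType = 0x1000;
    info.szName = threadName;
    info.dwThreadID = dwThreadID;
    info.dwFlags = 0;

    __try
    {
       // The debugger catches this exception and records the thread name
       RaiseException(MS_VC_EXCEPTION, 0,
                      sizeof(info) / sizeof(ULONG_PTR), (ULONG_PTR *)&info);
    }
    __except (EXCEPTION_EXECUTE_HANDLER)
    {
       // No debugger attached; nothing to do
    }
 }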

Once the application has the SetThreadName() function defined, the developer may call the function to name a thread. This is shown in Listing Five. The function Thread1 is given the name Producer, indicating that it produces data for a consumer. (Admittedly, the function Thread1 should be renamed to Producer as well, but it is left somewhat ambiguous here for illustration purposes.) Note that the function is called at the start of the thread, and that the thread ID is specified as -1, which tells the debugger to associate the name with the calling thread.

Listing Five: Using SetThreadName to Name a Thread.

unsigned __stdcall Thread1(void *)
{
   int i, x = 0; // arbitrary local variable declarations
   SetThreadName(-1, "Producer");
   // Thread logic follows
}

Naming a thread in this fashion has a couple of limitations. This technique is a debugger construct; the OS is not in any way aware of the name of the thread. Therefore, the thread name is not available to anyone other than the debugger. You cannot programmatically query a thread for its name using this mechanism. Assigning a name to a thread using this technique requires a debugger that supports exception number 0x406D1388. Both Microsoft's Visual Studio and WinDbg debuggers support this exception. Despite these limitations, it is generally advisable to use this technique where supported as it makes using the debugger and tracking down multithreaded bugs much easier.

Putting It All Together

Let's stop for a minute and take a look at applying the previously discussed principles to a simplified real-world example. Assume that you are writing a data acquisition application. Your design calls for a producer thread that samples data from a device every second and stores the reading in a global variable for subsequent processing. A consumer thread periodically runs and processes the data from the producer. In order to prevent data corruption, the global variable shared by the producer and consumer is protected with a Critical Section. An example of a simple implementation of the producer and consumer threads is shown in Listing Six. Note that error handling is omitted for readability.

Listing Six: Simple Data Acquisition Device.

 static int m_global = 0;
 static CRITICAL_SECTION hLock; // protect m_global
  
 // Simple simulation of data acquisition
 void sample_data()
 {
    EnterCriticalSection(&hLock);      
    m_global = rand();
    LeaveCriticalSection(&hLock);      
 } 
 
 // This function is an example 
 // of what can be done to data
 // after collection
 // In this case, you update the display
 // in real time
 void process_data()
 {
    EnterCriticalSection(&hLock);      
    printf("m_global = 0x%x\n", m_global);
    LeaveCriticalSection(&hLock);      
 }
  
 // Producer thread to simulate real time 
 // data acquisition. Collect 30 s 
 // worth of data
 unsigned __stdcall Thread1(void *)
 {
    int count = 0;
    SetThreadName(-1, "Producer");
    while (1)
    {
       // update the data
       sample_data();
 
       Sleep(1000);
       count++;
       if (count > 30)
          break;
    }
    return 0;
 }
 
 // Consumer thread
 // Collect data when scheduled and 
 // process it. Read 30 s worth of data
 unsigned __stdcall Thread2(void *)
 {
    int count = 0;
    SetThreadName(-1, "Consumer");
    while (1)
    {
       process_data();
 
       Sleep(1000);
       count++;
       if (count > 30)
          break;
    }
    return 0;
 }

The producer samples data in sample_data() and the consumer processes the data in process_data(). Given this relatively simple situation, it is easy to verify that the program is correct and free of race conditions and deadlocks. Now assume that the programmer wants to take advantage of an error detection mechanism on the data acquisition device that indicates to the user that the data sample collected has a problem. The changes made to the producer thread by the programmer are shown in Listing Seven.

Listing Seven: Sampling Data with Error Checking.

void sample_data()
{
   EnterCriticalSection(&hLock);      
   m_global = rand();
   if ((m_global % 0xC5F) == 0)
   {
      // handle error
      return;
   }
   LeaveCriticalSection(&hLock);      
} 

After making these changes and rebuilding, the application becomes unstable. In most instances, the application runs without any problems. However, in certain circumstances, the application stops printing data. How do you determine what's going on?

The key to isolating the problem is capturing a trace of the sequence of events that occurred prior to the system hanging. This can be done with a custom trace buffer manager or with tracepoints. This example uses the trace buffer implemented in Listing One.

Now armed with a logging mechanism, you are ready to run the program until the error case is triggered. Once the system fails, you can stop the debugger and examine the state of the system. To do this, run the application until the point of failure. Then, using the debugger, stop the program from executing. At this point, you'll be able to bring up the Threads window to see the state information for each thread, such as the one shown in Figure 1.

Figure 1: Examining Thread State Information Using Visual Studio 2005

When you examine the state of the application, you can see that the consumer thread is blocked, waiting for the process_data() call to return. To see what occurred prior to this failure, access the trace buffer. With the application stopped, call the PrintTraceBuffer() method directly from Visual Studio's debugger. The output of this call in this sample run is shown in Figure 2.

Figure 2: Output from trace buffer after Error Condition Occurs

Examination of the trace buffer log shows that the producer thread is still making forward progress. However, no data values after the first two make it to the consumer. This, coupled with the fact that the thread state for the consumer thread indicates that the thread is stuck, points to an error where the critical section is not properly released. Upon closer inspection, it appears that the data value in line 7 of the trace buffer log is an error value. This leads back to the new error handling code, which handles the error but forgets to release the critical section. As a result, the consumer thread blocks indefinitely and is starved. Technically this isn't a deadlock, as the producer thread is not waiting on a resource that the consumer thread holds.
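
The fix is straightforward: release the critical section on the error path before returning. A corrected sample_data() might look like this:

 void sample_data()
 {
    EnterCriticalSection(&hLock);
    m_global = rand();
    if ((m_global % 0xC5F) == 0)
    {
       // handle error, but release the lock before returning
       LeaveCriticalSection(&hLock);
       return;
    }
    LeaveCriticalSection(&hLock);
 }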

The complete data acquisition sample application is provided on this book's Web site, www.intel.com/intelpress/mcp.

Multithreaded Debugging Using GDB

For POSIX threads, debugging is generally accomplished using the GNU Project Debugger (GDB). GDB provides a number of capabilities for debugging threads, including:

  • Automatic notification when new threads are created
  • Listing of all threads in the system
  • Thread-specific breakpoints
  • The ability to switch between threads
  • The ability to apply commands to a group of threads

Not all GDB implementations support all of the features outlined here. Please refer to your system's manual pages for a complete list of supported features.

Notification on Thread Creation

When GDB detects that a new thread is created, it displays a message specifying the thread's identification on the current system. This identification, known as the systag, varies from platform to platform. Here is an example of this notification:

Starting program: /home/user/threads
[Thread debugging using libthread_db enabled]
[New Thread -151132480 (LWP 4445)]
[New Thread -151135312 (LWP 4446)]

Keep in mind that the systag is the operating system's identification for a thread, not GDB's. GDB assigns each thread a unique number that identifies it for debugging purposes.

Getting a List of All Threads in the Application

GDB provides the generic info command to get a wide variety of information about the program being debugged. It is no surprise that a subcommand of info would be info threads. This command prints a list of threads running in the system:

(gdb) info threads
2 Thread -151135312 (LWP 4448)  0x00905f80 in vfprintf ()   from /lib/tls/libc.so.6
* 1 Thread -151132480 (LWP 4447)  main () at threads.c:27

The info threads command displays a table that lists three properties of the threads in the system: the thread number assigned to the thread by GDB, the systag value, and the current stack frame for that thread. The currently active thread is denoted by GDB with the * symbol. The thread number is what you use to refer to the thread in all other GDB commands.

Setting Thread-Specific Breakpoints

GDB allows users that are debugging multithreaded applications to choose whether to set a breakpoint on all threads or on a particular thread. Much like the info command, this capability is enabled via an extended parameter specified in the break command. The general form of this instruction is:

break linespec thread threadnum

where linespec is the standard GDB syntax for specifying a breakpoint, and threadnum is the thread number obtained from the info threads command. If the thread threadnum arguments are omitted, the breakpoint applies to all threads in your program. Thread-specific breakpoints can be combined with conditional breakpoints:

(gdb) break buffer.c:33 thread 7 if level > watermark

Note that stopping on a breakpoint stops all threads in your program. Generally speaking, this is a desirable effect: it allows a developer to examine the entire state of the application and to switch the current thread.

Developers should keep certain behaviors in mind, however, when using breakpoints from within GDB. The first issue is related to how system calls behave when they are interrupted by the debugger. To illustrate this point, consider a system with two threads. The first thread is in the middle of a system call when the second thread reaches a breakpoint. When the breakpoint is triggered, the system call may return early. The reason is that GDB uses signals to manage breakpoints, and a signal may cause a system call to return prematurely. For example, say that thread 1 is executing the system call sleep(30). When the breakpoint in thread 2 is hit, the sleep call returns, regardless of how long the thread has actually slept. To avoid unexpected behavior due to system calls returning prematurely, it is advisable to check the return values of all system calls and handle this case. In this example, sleep() returns the number of seconds left to sleep. This call can be placed inside a loop to guarantee that the thread sleeps for the specified amount of time, as shown in Listing Eight.

Listing Eight: Proper Error Handling of System Calls.

int sleep_duration = 30;
do
{
   sleep_duration = sleep(sleep_duration);
} while (sleep_duration > 0);

The second point to keep in mind is that GDB does not single step all threads in lockstep. Therefore, when single-stepping a line of code in one thread, you may end up executing a lot of code in other threads prior to returning to the thread that you are debugging. If you have breakpoints in other threads, you may suddenly jump to those code sections. On some OSs, GDB supports a scheduler locking mode via the set scheduler-locking command. This allows a developer to specify that the current thread is the only thread that should be allowed to run.
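
For example, on systems where scheduler locking is supported, the following command keeps all threads except the current one stopped while you step (set it back to off to restore normal behavior):

(gdb) set scheduler-locking on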

Switching Between Threads

In GDB, the thread command may be used to switch between threads. It takes a single parameter, the thread number returned by the info threads command. Here is an example of the thread command:

(gdb) thread 2
[Switching to thread 2 (Thread -151135312 (LWP 4549))]#0  PrintThreads (num=0xf6fddbb0) at threads.c:39
39      { 
(gdb) info threads
* 2 Thread -151135312 (LWP 4549)  PrintThreads (num=0xf6fddbb0) at threads.c:39
  1 Thread -151132480 (LWP 4548)  main () at threads.c:27
(gdb)

In this example, the thread command makes thread number 2 the active thread.

Applying a Command to a Group of Threads

The thread command supports a single subcommand apply that can be used to apply a command to one or more threads in the application. The thread numbers can be supplied individually, or the special keyword all may be used to apply the command to all threads in the process, as illustrated in the following example:

(gdb) thread apply all bt
Thread 2 (Thread -151135312 (LWP 4549)):
#0  PrintThreads (num=0xf6fddbb0) at threads.c:39
#1  0x00b001d5 in start_thread () from /lib/tls/libpthread.so.0
#2  0x009912da in clone () from /lib/tls/libc.so.6
Thread 1 (Thread -151132480 (LWP 4548)):
#0  main () at threads.c:27
(gdb)

The GDB backtrace (bt) command is applied to all threads in the system. In this scenario, this command is functionally equivalent to: thread apply 2 1 bt.

Key Points

This article described a number of general-purpose debugging techniques for multithreaded applications. To sum up:

  • Proper software engineering principles should be followed when writing and developing robust multithreaded applications.
  • When trying to isolate a bug in a multithreaded application, it is useful to have a log of the sequence of events that led up to the failure. A trace buffer is a simple mechanism that allows programmers to store this event information.
  • Bracket events that are logged in the trace buffer with "before" and "after" messages to determine the order in which the events occurred.
  • Running the application in the debugger may alter the timing conditions of your runtime application, masking potential race conditions in your application.
  • Tracepoints can be a useful way to log or record the sequence of events as they occur.
  • For advanced debugging, consider using the Intel software tools, specifically, the Intel Debugger, the Intel Thread Checker, and the Intel Thread Profiler.


Shameem Akhter is a platform architect at Intel Corporation, focusing on single socket multicore architecture and performance analysis. Jason Roberts is a senior software engineer at Intel, and has worked on a number of different multithreaded software products that span a wide range of applications targeting desktop, handheld, and embedded DSP platforms. This article was excerpted from their book Multi-Core Programming. Copyright (c) 2006 Intel Corporation. All rights reserved.
