Tools

Debugging Multithreaded Applications

Peter Horwood, Shlomo Wygodny, and Martin Zardecki

March 01, 2000

It is often significantly harder to locate and test for bugs in multithreaded and multiprocess applications than in nonthreaded, single-process situations. Our authors describe some of the problems with multithreaded applications and discuss common debugging techniques.


Peter is vice president of development for aRenDeeco Inc., and can be contacted at [email protected]. Shlomo is the CTO of MuTek Solutions, and can be reached at [email protected]. Martin is a team leader at Trimac Information Services and can be contacted at mzardeck@trimac.com.

Modern operating systems and programming tools make it easier to develop multithreaded applications. Those of us who implemented background printing and background cleanup routines on DOS systems in the 1980s are aware of the work it took to implement these routines correctly and effectively with tools that had no support for threading. Today, multithreading for the masses has become practical, thanks to built-in multithreading in tools such as Java and the Win32 API. Even tools like Visual Basic have support for multithreading.

However, it is often significantly harder to locate and test for bugs in multithreaded and multiprocess applications than it is in nonthreaded, single-process situations. One problem is the variability of the execution order. If the instructions in the threads always executed in the same order, or in a fully predictable order, debugging these applications would be easy. That would also defeat the purpose of threads. Another difficulty is that multiple threads may be simulated on a single CPU, where the operating system's scheduler ultimately decides when each thread is allowed to run. A small thread that is supposed to finish before a larger one may not actually do so.

The difficulties with debugging multithreaded programs are intensified in COM-based applications. With the COM architecture, programmers often write code that is run by other executables. Dozens of such processes may be activated semisporadically in the system, for example, by the Microsoft Transaction Server, which activates COM processes. It is difficult to catch the activation of these processes under a debugger, as typically the run time of each component is fairly short.

In this article, we describe some of the specific problems with multithreaded applications and discuss some common debugging techniques by stepping through the process of debugging a sample multithreaded application.

Sample Application

We wrote a sample application that simulates a communications program with numerous threads to handle I/O, using Microsoft Visual C++ 6.0 and MFC. I/O processing is handed off to threads so that the main process thread can keep processing Windows messages without delays or screen/keyboard lock-ups.

The program uses five threads. The main process thread represents the main execution path of the program and also creates additional threads as required. The InputThread thread represents a communications thread that interfaces with Windows sockets and pumps appropriate data to the MasterThread for processing. The MasterThread reads data written to the buffer by InputThread and is synchronized by checking a buffer length variable. If the variable is greater than 0, data is available for reading. At least, that is what MasterThread would do if the program were fully debugged. For our purposes, MasterThread doesn't work properly. MasterThread also accesses a variable in the InputThread using a pointer. If InputThread terminates before MasterThread, we will get an error.

Multithreading Problems

The most obvious problem with multithreaded applications is an access violation. An access violation occurs when two or more threads attempt to access the same memory at the same time or when shared memory has been released or resized by one thread without informing the other thread.

In our sample program, one thread places characters into a buffer and increments a counter while another thread removes characters and decrements the counter, as in Listing One. When it works correctly, the ReadBuf thread runs after the InputThreadProc thread has completed all the FillBuf calls for IncrementBufLen, as in Figure 1. However, because we forgot to synchronize these threads, they incorrectly change each other's positions in the shared buffer. While this does not cause a GPF, it does invalidate the data displayed. This failure can be clearly seen in Figure 2, where there are FillBufs occurring before and after the ReadBuf thread runs. ReadBuf should have been locked until the FillBufs were complete.
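The missing synchronization described above could be added in several ways. The following is only a minimal sketch, not the authors' code: it uses standard C++ threading facilities instead of the Win32/MFC calls in Listing One, and the names `SharedBuffer`, `put`, `finish`, `take`, and `runFillAndRead` are hypothetical. The reader is held back until the writer has finished filling the buffer, which is the behavior the article says ReadBuf should have had.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for the dialog's shared buffer and length counter.
class SharedBuffer {
public:
    // Writer side: append one simulated keystroke while holding the lock.
    void put(char c) {
        std::lock_guard<std::mutex> guard(mutex_);
        data_.push_back(c);
    }

    // Writer side: mark the message complete and wake the waiting reader.
    void finish() {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            complete_ = true;
        }
        ready_.notify_one();
    }

    // Reader side: block until a complete message exists, then take it and
    // reset the buffer in one step while still holding the lock.
    std::string take() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return complete_; });
        std::string message;
        message.swap(data_);   // data_ is now empty again
        complete_ = false;
        return message;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::string data_;
    bool complete_ = false;
};

// Runs one writer thread and one reader thread; returns what the reader saw.
std::string runFillAndRead(const std::string& keystrokes) {
    SharedBuffer buffer;
    std::string received;
    std::thread reader([&] { received = buffer.take(); });
    std::thread writer([&] {
        for (char c : keystrokes) buffer.put(c);
        buffer.finish();
    });
    writer.join();
    reader.join();
    return received;
}
```

Calling `runFillAndRead("Hello World!!!")` should hand the reader the whole string rather than a partial one, because the reader cannot proceed until `finish` has been called.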

Another access violation is caused when the MasterThread tries to access a variable in InputThread that no longer exists (see Listing Two), which causes a GPF. This problem is common in C++ programs, where complex structures may be destructed automatically when the thread exits.
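One way to avoid this kind of lifetime error is to make sure the owner of the variable outlives every thread that touches it. The sketch below assumes standard C++ threads rather than AfxBeginThread, and the function names `helperWrite` and `ownerWaitsForHelper` are hypothetical. It shows the simplest remedy, which is to wait for the helper before the owning function returns.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: writes through a pointer supplied by its creator,
// much as HelperThreadProc does in Listing Two.
void helperWrite(unsigned* target) {
    *target = 5;
}

// The owner keeps its local variable alive by joining the helper thread
// before returning, so the pointer never outlives the variable.
unsigned ownerWaitsForHelper() {
    unsigned value = 0;
    std::thread helper(helperWrite, &value);
    helper.join();   // 'value' is still in scope for every access
    return value;
}
```

When the owner cannot wait, the usual alternative is to move the shared data to storage that both threads keep alive, instead of pointing into one thread's stack.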

A second common problem is deadlock. One way a deadlock can occur is when Thread1 locks ResourceA while Thread2 locks ResourceB. Then, Thread1 attempts to lock ResourceB and waits patiently (keeps trying) until ResourceB is available. Meanwhile, Thread2 attempts to lock ResourceA and waits patiently for ResourceA to be made available. If both threads typically lock one, then the other in quick succession, the bug may rarely show up, and the problem may be blamed on something else, such as hardware or the operating system. You can imagine how this problem becomes complicated with a simulation program where, say, 50 threads are running in parallel.

A second source of deadlock occurs when one thread is waiting for a flag to be set using a blocking call from the Win32 API, such as WaitForSingleObject, but the thread that was supposed to set the flag no longer exists. The thread using the WaitForSingleObject call will wait forever.
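A common defense is to wait with a timeout, so that a missing signaler shows up as a reportable failure instead of a hang. WaitForSingleObject accepts a timeout in milliseconds for this purpose. The sketch below illustrates the same idea with a standard C++ condition variable rather than a Win32 event; the `Flag` type and the function names are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical flag that one thread sets and another thread waits on.
struct Flag {
    std::mutex mutex;
    std::condition_variable changed;
    bool set = false;
};

// Waits for the flag but gives up after 'timeout' instead of blocking forever.
// Returns true if the flag was set, false if the wait timed out.
bool waitForFlag(Flag& flag, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(flag.mutex);
    return flag.changed.wait_for(lock, timeout, [&flag] { return flag.set; });
}

// Sets the flag and wakes any waiting threads.
void setFlag(Flag& flag) {
    {
        std::lock_guard<std::mutex> guard(flag.mutex);
        flag.set = true;
    }
    flag.changed.notify_all();
}

// Case 1: the setter thread exists, so the wait succeeds.
bool waitWithLiveSetter() {
    Flag flag;
    std::thread setter([&flag] { setFlag(flag); });
    bool ok = waitForFlag(flag, std::chrono::milliseconds(2000));
    setter.join();
    return ok;
}

// Case 2: nobody ever sets the flag, so the wait reports a timeout.
bool waitWithMissingSetter() {
    Flag flag;
    return waitForFlag(flag, std::chrono::milliseconds(50));
}
```

The timeout values here are arbitrary. In a real program the timed-out branch is a natural place to write a log entry naming the thread that was expected to set the flag.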

A third source of deadlock is when multiple threads simply attempt to lock the same resource at the same time. Nowadays, the operating system or the DBMS is designed to handle this situation. However, we remember some fun times back in DOS days when two threads or two processes both tried printing to the printer at the same time.

Debugging Techniques

There are a number of techniques for trying to debug multithreaded problems. The first and foremost solution is to not let the bugs happen in the first place. Make sure you are synchronizing all shared-memory accesses, and when you need to use more than one resource, if at all possible, lock them in the same order in all threads.
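The lock-ordering advice can be illustrated with a small sketch. It assumes standard C++ mutexes rather than Win32 critical sections, and the `Resources` type and function names are hypothetical. Both threads go through one helper that always takes ResourceA before ResourceB, so neither can end up holding one lock while waiting for the other in the opposite order.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical pair of shared resources, each guarded by its own lock.
struct Resources {
    std::mutex a;   // guards valueA (ResourceA)
    std::mutex b;   // guards valueB (ResourceB)
    int valueA = 0;
    int valueB = 0;
};

// Every thread takes A first and B second, whatever it does afterwards.
void updateBoth(Resources& r, int deltaA, int deltaB) {
    std::lock_guard<std::mutex> lockA(r.a);
    std::lock_guard<std::mutex> lockB(r.b);
    r.valueA += deltaA;
    r.valueB += deltaB;
}

// Two threads repeatedly need both resources; returns the combined total.
int runTwoThreads(int iterations) {
    Resources r;
    std::thread t1([&r, iterations] {
        for (int i = 0; i < iterations; ++i) updateBoth(r, 1, 0);
    });
    std::thread t2([&r, iterations] {
        for (int i = 0; i < iterations; ++i) updateBoth(r, 0, 1);
    });
    t1.join();
    t2.join();
    return r.valueA + r.valueB;
}
```

Funnelling every two-lock acquisition through a single function is one way to keep the order from drifting as the code grows. Standard C++ also offers `std::lock`, which acquires several mutexes together without deadlocking, for cases where a fixed order is hard to enforce.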

Sometimes, it's helpful for one or more developers to walk through the code, reading and looking for problems. With role playing, you may be able to determine the cause of the problem. When doing this, remember that a very short thread can be running after a very long thread has gone away and has cleaned up after itself. A walkthrough is even more effective if, when your program has crashed, you were given an indication of what the conflict was about.

Logging is a way of letting the problem speak for itself, by saving information to a file or some form of static storage for postmortem analysis. This method is effective, but involves a lot of work to set it up and then to analyze it. The information from many threads may not be easily saved to a single file, as it will have the same problems as multiple threads trying to access shared memory at the same time (overwriting each other's data). In practice, it is often easier to maintain multiple log files and then combine them after the fact for interpretation. Remember to print in the log the thread ID, using GetCurrentThreadId, so you can distinguish between different threads. Be aware that the resolution of a timestamp may only be to the closest 1/18th of a second, which may make it difficult to combine log files.
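The per-thread logging approach might look like the following sketch. It is an illustration under several assumptions: it keeps each thread's log in memory rather than in a file, it uses `std::this_thread::get_id` where a Win32 program would call GetCurrentThreadId, and it stamps entries with a shared sequence counter instead of a clock so that merging does not depend on timestamp resolution. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// One log record: a global sequence number, the writing thread, and the text.
struct LogEntry {
    unsigned long sequence;
    std::string threadId;
    std::string text;
};

// Shared counter that gives every entry a unique position in the merged log.
std::atomic<unsigned long> g_nextSequence{0};

// Each thread appends only to its own log, so the logs need no locking.
void logTo(std::vector<LogEntry>& ownLog, const std::string& text) {
    std::ostringstream id;
    id << std::this_thread::get_id();
    ownLog.push_back({g_nextSequence.fetch_add(1), id.str(), text});
}

// After the threads have finished, combine the logs in sequence order.
std::vector<LogEntry> mergeLogs(const std::vector<std::vector<LogEntry>>& logs) {
    std::vector<LogEntry> merged;
    for (const auto& log : logs)
        merged.insert(merged.end(), log.begin(), log.end());
    std::sort(merged.begin(), merged.end(),
              [](const LogEntry& x, const LogEntry& y) {
                  return x.sequence < y.sequence;
              });
    return merged;
}

// Starts several worker threads, lets each write to its own log, then merges.
std::vector<LogEntry> runLoggingDemo(std::size_t threadCount,
                                     std::size_t entriesPerThread) {
    std::vector<std::vector<LogEntry>> logs(threadCount);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < threadCount; ++t) {
        workers.emplace_back([&logs, t, entriesPerThread] {
            for (std::size_t i = 0; i < entriesPerThread; ++i)
                logTo(logs[t], "step " + std::to_string(i));
        });
    }
    for (auto& worker : workers) worker.join();
    return mergeLogs(logs);
}
```

The shared counter is itself a small point of synchronization between threads, so even a scheme like this can change the timing of the program being observed.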

Although logging is easily the most effective traditional debugging technique, it is time consuming and difficult to use. Logging requires lots of extra code, and frequently leaves artifacts in the final program. Who hasn't seen a shipping program that suddenly pops up a dialog with a message as useful as the one in Figure 3? Beyond these problems, one of the most frustrating situations can be when the bug occurs with the logging turned off, but disappears when logging is turned on. Also, because it is impractical to put a log statement after each line of code, you have to create multiple versions of your code that log different things until the problem is isolated. If you are sending the program to an end user, this can result in many frustrating repetitions before they see any progress.

While we would use debuggers over logging in single-threaded applications and when solving single-threaded bugs in a multithreaded application, debuggers easily rank second when trying to solve multithreading problems. If you are debugging a multithreaded application, and it magically starts to work when you switch to the debugger, you should take that as an indication that the problem may be in the relationships between threads. When debugging multiple threads, the program can suddenly start to work because the threads are artificially synchronized (or unsynchronized).

With traditional debuggers, you get little support for following multiple threads. Typically, when one thread is stopped, depending on the debugger, others may stop, continue to run, or run behind while you are stepping through the one thread or when you step to a spot that is waiting for user actions. Placing stops on all threads lets you control the execution sequence completely, although it can be difficult. It would be nice to be able to define conditional breakpoints, such as: Break when switching to thread x or when starting threads. It is disappointing that these are not available in popular modern tools, especially when you consider that the DEC/VMS debugger for Ada had this ability 10 years ago.

Knowing the features and limitations of traditional debuggers can help you debug multithreaded errors that are occurring routinely, but it will seldom be of any value for bugs that are showing up occasionally and unpredictably. Debugging multiprocess contention can be even worse. In addition to all of the problems for multithreaded applications, you need to have multiple debuggers running -- one for each process. This is not something you'd want to do even on a 1280×1024 monitor, let alone a lower resolution.

Message spies are programs such as Microsoft's Spy++, DDE Spy, or Kogosoft's OLEspy (http://www.kogosoft.com/). These programs intercept and display various messages transmitted between programs or within a given program. For example, Spy++ can be used to monitor Windows messages that a program receives, whereas DDE Spy can be used to view and display DDE messages passing between different programs. Such tools do give a general overview of messages, but not necessarily the level of detail required to tell which thread is doing what when.

One technique we have used from time to time can hardly be considered traditional, but it has periodically helped us find bugs over the past 10 years in programs with a lot of user interaction. Set up a video camera that videotapes part or all of the screen, and possibly the keyboard as well. When the bug occurs, the user turns off the videotape and gives it to the debugger. The person doing the debugging can go back and replay the video tape, frame by frame, if necessary, to see the order of execution visible on the screen as well as the actions of the user.

In many programs, the sequence of the thread starting and execution is directly or closely affected by the user interaction. Some of these bugs only show up with fast typists; in one case, a thread was started every time the user moved from one row to the next in a table. For most testers, the speed at which they moved was slow enough that the thread had finished executing for the current row before the user successfully moved to the next row. But we were able to see on the tape that faster typists were able to move to the new row before the thread for the previous row had finished executing. This caused the thread to act partly or wholly on the wrong row.

In other cases, this technique has shown bugs that are caused when users perform actions that were not in the sequence anticipated by developers, causing threads to fire in orders not anticipated. We've also used the technique for debugging traditional programs, including those times when a message flashes by on the screen too quickly for the human eye to read, but the camera has no problem catching it.

In recent years, there have been some new tools that specifically address the problems with debugging multithreaded applications. Building on the traditional debugger, NuMega (http://www.numega.com/) has a multiprocess, multithreaded debugger, SoftICE, that can simultaneously monitor events in your program, other programs, and the operating system. This can be used to see how your processes and threads are interacting at run time. There are also multithreaded debuggers for Java, such as NuMega's JCheck and AverStar's JWatch (http://www.jwatch.com/), both of which give very visible views of the threads and where each starts and stops. For UNIX developers, there are a number of multithreaded debuggers dedicated to specific environments such as the Digital Ladebug debugger, SUN's SPARCworks Debugger MT (MT stands for multithreaded), and Etnus's TotalView (http://www.etnus.com/).

MuTek (http://www.mutek.com/), one of our employers, makes a tool called "BugTrapper," which is what we call an "automatic tracer." It is a greatly enhanced version of logging. BugTrapper automatically traces your program without any modifications to your code, and shows you the execution path and context switches that led to a crash. In Figure 4, you can see that the crash is caused by accessing the pointer pID from the MasterThread (Thread 281, in this case). The area to which it points was deallocated before, when InputThread (280) exited. One use of an automatic tracing tool is to let users run it as a black box, waiting patiently for a bug to occur. Later, they can then send you the logging info via e-mail, and you can look to see how the bug occurs. BugTrapper uses a circular buffer of a size you determine to store the logging information, so the program will not chew up all available hard disk.
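The idea of a fixed-size circular buffer for trace data can be sketched in a few lines. This is not BugTrapper's implementation, which the article does not describe. It is a generic illustration in standard C++ with hypothetical names, and it assumes a capacity greater than zero. Once the buffer is full, each new event overwrites the oldest one, so storage stays bounded however long the program runs.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical bounded trace log: keeps only the most recent events.
class TraceRing {
public:
    // Assumes capacity > 0.
    explicit TraceRing(std::size_t capacity) : slots_(capacity) {}

    // Record an event, overwriting the oldest one once the ring is full.
    void record(const std::string& event) {
        std::lock_guard<std::mutex> guard(mutex_);
        slots_[next_ % slots_.size()] = event;
        ++next_;
    }

    // Return the retained events, oldest first.
    std::vector<std::string> snapshot() const {
        std::lock_guard<std::mutex> guard(mutex_);
        const std::size_t count = next_ < slots_.size() ? next_ : slots_.size();
        std::vector<std::string> events;
        for (std::size_t i = next_ - count; i < next_; ++i)
            events.push_back(slots_[i % slots_.size()]);
        return events;
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::string> slots_;
    std::size_t next_ = 0;   // total number of events recorded so far
};
```

With a capacity of 3 and five recorded events, `snapshot` returns only the last three, which is the trade-off of this design: the events leading up to a crash are kept, while older history is discarded.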

Conclusion

Multithreading can be a very effective way to reduce the total complexity of a program, while at the same time increasing the functionality and quality from a user's perspective. This applies to the whole range of applications from business software to games.

There is no way to guarantee that you have found every bug in most serious programs, and multithreaded programs are typically an order of magnitude or two further away from that goal. Using good techniques and being careful with your design will go a long way toward preventing bugs from occurring. Traditional debugging techniques can be helpful for finding many of the bugs, and new multithreading tools such as multithreaded debuggers and dynamic logging are making it even easier.

DDJ

Listing One

// Master Thread created earlier in the program
UINT MasterThreadProc(LPVOID pParam)
{
  CString     cs_Out;
  CThreadzDlg &MainDlg = *(CThreadzDlg *)pParam;

  for (;;) {
    /* Loop continuously checking to see if there is data in the input
       buffer */
    if (MainDlg.m_iInput1Bufferlen > 0) {
      /* Critical region here. While we use the buffer and the buffer length
         indicator, Input1Thread could still be making changes to them;
         accordingly, we will get an inconsistent buffer and/or buffer
         length indicator */
      MainDlg.InTextBox().GetWindowText(cs_Out);
      cs_Out = cs_Out + CString(MainDlg.m_cInput1PseudoBuffer) + "\r\n";
      MainDlg.InTextBox().SetWindowText(cs_Out);
      MainDlg.m_iInput1Bufferlen = 0;
      // end of critical region
    }
  }
  return 0;
}
// This is a simulated keyboard monitoring thread
UINT Input1ThreadProc(LPVOID pParam)
{
  char          cKbdBuffer[MAX_PATH];
  CThreadzDlg   &MainDlg = *(CThreadzDlg *)pParam;
  // Assign arbitrary string to simulated keyboard
  strcpy(cKbdBuffer, "Hello World!!!");
  for (;;) {
    /* Fill up the pseudo-buffer at random intervals, but wait for the buffer
       length variable to be zeroed by the master thread */
    if ((rand()/(float)RAND_MAX * 100.0) < 70.0 &&
        MainDlg.m_iInput1Bufferlen == 0) {
      /* MainDlg.m_iInput1Bufferlen should have been zeroed out by the master
         thread. Copy characters one at a time to simulate keystrokes */
      for (; MainDlg.m_iInput1Bufferlen <= strlen(cKbdBuffer);
             MainDlg.m_iInput1Bufferlen++) {
        /* While we copy characters, the buffer length indicator is greater
           than 0, so MasterThread will start using the buffer and reset the
           buffer length indicator to 0 again, creating an inconsistent
           buffer and/or buffer length indicator. */
        MainDlg.m_cInput1PseudoBuffer[MainDlg.m_iInput1Bufferlen]
          = cKbdBuffer[MainDlg.m_iInput1Bufferlen];
        Sleep(200);
      }
    }
    Sleep(2000);
  }
  return 0;
}


Listing Two

// This is a simulated comm port/socket monitoring thread
UINT Input2ThreadProc(LPVOID pParam)
{
  UINT uiX;
  ...
  /* This should actually cause a GPF: the helper thread will access
     uiX after this function has terminated */
  m_pHelperThread = AfxBeginThread(HelperThreadProc, &uiX,
                    THREAD_PRIORITY_NORMAL, 0, 0, NULL);
  break;
  ...
  return 0;
}
UINT HelperThreadProc(LPVOID pParam)
{
  // uiX no longer exists
  // access violation/GPF here
  *((UINT *)pParam) = (UINT)5;
  return 0;
}

