Debugging Threaded Applications
The goal of parallel programming is to accelerate program performance by running multiple threads of execution in parallel. By definition, this effort requires every thread to have an impact on at least one other thread (generally, at minimum, the thread that launched it). As long as these interactions are orderly, well-designed, and predictable, improved performance ensues. However, if the threads interfere with each other or distort the algorithm's logic, defects specific to parallel programming creep into the program.
These defects can be difficult to locate and equally hard to resolve once identified. As a result, in parallel programming, it pays to program defensively. That is, know ahead of time where the traps are and make sure you write your code to guard against them. Even then, an unexpected interaction between two threads will cause a bug to show up now and again. In this article, I examine the primary places where threading bugs lurk and explain how to avoid them. I also look at Intel Thread Checker, which is one of the few threading tools on the market that can automate the discovery of potential trouble spots. I assume that you're already familiar with threading basics and that you have used some form of mutual exclusion (a mutex, semaphore, or critical section) at some point in the recent past. My comments apply equally to threads on all platforms — Windows, Linux, UNIX, Java, etc.
Threading bugs most often occur where two threads interact or share a common data variable. The most common violations fall into three categories: data races, deadlocks, and a rarely discussed third category, live locks. Scrupulously avoid them and your life as a parallel programmer will be much smoother.
Data races occur when two threads share a variable but don't guard against simultaneous access. Suppose, for example, that you've made reservations online for a flight to your first gaming convention. Once the reservation is paid for, you log in online to choose your seats. At the same time, someone else on your flight is doing the same thing. You see that there is one aisle seat left and you click on it to reserve it for yourself. The other fellow sees the same seat and clicks on it at the same time you do. Due to poor program design, you both see your screens flash with the information that you now have the seat. So, who really gets the seat?
This situation is a data race — two threads unaware of each other are both trying to update the same variable at the same time.
Data races should be suspected anytime you get inconsistent results when repeatedly running the same program against the same data. The solution to data races is mutual exclusion (a mutex). Using a mutex, one thread puts a lock on the code that updates the variable, so that no other thread can update it until the lock is released. In the airline reservation system, when a click occurs, a lock is placed on the seat-status update code. Now, if someone else clicks on the seat, they will have to wait until the lock releases, at which time the program will discover that the seat is now taken and the user will be told to select another seat.
Notice the inconsistency of the results: If the two clicks arrive milliseconds apart and no mutual exclusion is used, the second person gets the seat, because his choice overwrites his rival's. In the second scenario, where mutual exclusion is employed, the first click locks the other thread out and claims the seat.
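The seat-reservation fix can be sketched in C with Pthreads. This is a minimal illustration, not code from any real reservation system; the Seat type and reserve_seat function are hypothetical names chosen for the example. The key point is that the check ("is the seat free?") and the update ("take it") must happen inside the same critical section.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical seat record -- names are illustrative only. */
typedef struct {
    bool taken;
    pthread_mutex_t lock;
} Seat;

/* Returns true if the caller won the seat, false if it was already
   taken. The mutex makes the check-then-set a single atomic step,
   so two simultaneous clicks cannot both succeed. */
bool reserve_seat(Seat *seat) {
    bool won = false;
    pthread_mutex_lock(&seat->lock);
    if (!seat->taken) {
        seat->taken = true;
        won = true;
    }
    pthread_mutex_unlock(&seat->lock);
    return won;
}
```

Without the lock, both threads could read taken as false before either writes true, and both would believe they got the seat.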
Many data races are straightforward. You look where two threads share a data field or resource and you impose sequential access via mutual exclusion. However, some forms of data races can be fairly subtle. Libraries, in particular, can create unexpected race conditions if they have not been thoroughly tested for thread safety. In addition, resources can sometimes create data races. Consider multiple threads running printf() at the same time. Without mutual exclusion, they can easily step on each other's output and produce a screen or a log filled with nonsensical data.
So make sure your libraries are truly thread-safe, and be vigilant about putting mutual exclusion around all data items and system resources that are shared between threads.
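The shared-resource case can be handled the same way as shared data: wrap the resource in a lock. Here is one possible sketch of a thread-safe logging wrapper around printf(); the log_line name and the single process-wide lock are assumptions for the example, not a prescribed design.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>

/* One process-wide lock serializing all log output (illustrative). */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread-safe wrapper: only one thread writes at a time, so lines
   from different threads cannot interleave mid-message. Returns the
   vprintf result (negative on error). */
int log_line(const char *fmt, ...) {
    va_list ap;
    int rc;
    va_start(ap, fmt);
    pthread_mutex_lock(&log_lock);
    rc = vprintf(fmt, ap);
    putchar('\n');
    pthread_mutex_unlock(&log_lock);
    va_end(ap);
    return rc;
}
```

Every thread must go through the wrapper for the lock to help; one thread calling raw printf() reintroduces the race.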
Deadlock is a situation in which two threads are blocked because each of them is waiting on a lock held by the other. For example, you're designing a combat game in which the principal objective of the current stage is mopping up pockets of snipers. You come down a street and, by the rules of the game, you cannot advance because the street has not been cleared of snipers. But by an error in logic, the snipers cannot be cleared until a unit that's stuck behind you clears the street first. You now have deadlock: each thread is waiting on the other (your advance is waiting on the sniper-clearing unit, and that unit is waiting for you to move forward). The result is that your thread waits for an event that will never occur. It appears to you that your thread or the game is hung.
The symptoms of deadlocks often involve two threads suddenly not advancing. If one of those threads is holding a lock that many other threads need, the program may appear to freeze completely.
There are several ways of avoiding deadlocks. The first is to track every place where a thread uses mutual exclusion and determine whether a lock has any dependence at all on any other thread that could be waiting for it.
Even this diligent inventory can overlook a subtle deadlock possibility: acquiring locks in the wrong order. Suppose, for example, that your game flashes an announcement on all players' screens whenever someone breaks the all-time record for disabling snipers. The way the logic is currently written, two locks are involved. One lock covers the code that checks your score against the old record and, if you've beaten it, writes your score to the record book. Because of the possibility of a data race (as discussed previously), the code carefully uses mutual exclusion to record the score — so that it can't be updated simultaneously by two different players. The next step is to acquire a lock to broadcast the new all-time high to all the players. The code uses a lock to make sure only one announcement can be made at a time. The code maintains the original lock on the score update, so that the point total in the announcement is not suddenly updated by someone else. So, at announcement time, the routine is holding two locks that enable the program to accurately record and broadcast the new all-time high.
What could possibly go wrong? As long as all threads acquire the locks in the same order, all is well. But a summer intern at your company decides that if a rookie-level player sets the all-time record, this should be broadcast immediately. So he acquires the broadcast lock first, does the broadcast, and only then acquires the lock to record the result. As we know, this will cause a data race if two players, one of whom is a rookie, break the record at the same time. It can also cause a deadlock. Suppose the rookie procedure grabs the broadcast lock and is waiting on the recording lock, while the other player has the recording lock and is waiting on the broadcast lock. Now both players are waiting on each other in a deadly embrace. So, a crucially important rule is that if multiple locks are required for a transaction, they must always be acquired in the same order.
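The fixed-order rule can be made concrete with a small Pthreads sketch of the record-and-broadcast scenario. The lock names, the all_time_high variable, and the report_score function are illustrative inventions for this example; the point is that every code path takes the locks in one agreed order.

```c
#include <pthread.h>

/* Illustrative locks: one guarding the record book, one guarding
   the broadcast channel. */
static pthread_mutex_t record_lock    = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t broadcast_lock = PTHREAD_MUTEX_INITIALIZER;

static int all_time_high = 0;   /* shared record, guarded by record_lock */

/* Every path takes record_lock before broadcast_lock. This single
   fixed order makes the deadly embrace impossible: no thread can
   hold broadcast_lock while waiting for record_lock. Returns 1 if
   a new record was set and announced, 0 otherwise. */
int report_score(int score) {
    int announced = 0;
    pthread_mutex_lock(&record_lock);        /* always first  */
    if (score > all_time_high) {
        all_time_high = score;
        pthread_mutex_lock(&broadcast_lock); /* always second */
        /* ... broadcast the new all-time high here ... */
        announced = 1;
        pthread_mutex_unlock(&broadcast_lock);
    }
    pthread_mutex_unlock(&record_lock);
    return announced;
}
```

The intern's bug is exactly a second function that takes broadcast_lock first; once two orders exist in the codebase, a deadlock is possible whenever both paths run concurrently.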
One helpful precaution against deadlock is to avoid holding a lock if a second lock cannot be acquired. Almost all thread libraries have some form of threading call — in Pthreads, for example, it's called pthread_mutex_trylock() — that enables a developer to try a lock and, if it can't be acquired, return immediately with an error code rather than wait forever for the lock. This call allows the developer to release a previously held lock and retry the operation later.
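The back-off pattern looks roughly like this in Pthreads. The try_transaction name and the retry contract (return 0 to mean "try again later") are assumptions made for the sketch; pthread_mutex_trylock() itself is the real POSIX call, which returns 0 on success and EBUSY when the mutex is already held.

```c
#include <pthread.h>

/* Sketch of the back-off pattern: if the second lock can't be had,
   release the first instead of waiting forever while holding it.
   Returns 1 if the transaction completed, 0 if the caller should
   retry later. */
int try_transaction(pthread_mutex_t *first, pthread_mutex_t *second) {
    pthread_mutex_lock(first);
    if (pthread_mutex_trylock(second) != 0) {
        /* Second lock is busy: back off by giving up the first lock,
           so no other thread is stuck waiting on us. */
        pthread_mutex_unlock(first);
        return 0;
    }
    /* ... perform the two-lock transaction here ... */
    pthread_mutex_unlock(second);
    pthread_mutex_unlock(first);
    return 1;
}
```

Because neither thread ever waits on one lock while holding the other, the circular wait that defines deadlock cannot form — though, as the next section shows, an unlucky retry rhythm can produce a different problem.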
Live lock is a situation involving multiple threads in which no thread can advance even though they are actively working. The quintessential example is the Dining Philosophers problem. In this classic story, five philosophers are in a room meditating on various ideas. In the room is a table with five plates of noodles and five chopsticks — one chopstick placed between each pair of plates. To eat, a philosopher must pick up two chopsticks, one on either side of his or her plate. The way the algorithm is written, the philosopher sits down and first attempts to pick up the left chopstick. If he's successful, he attempts to pick up the right chopstick. If that's successful too, he eats. If the right chopstick is unavailable, the philosopher puts down the left chopstick, waits five seconds, and tries again — until both chopsticks are available.
The live lock problem occurs if all five philosophers decide to eat at the same time. Each one picks up the left chopstick successfully, then notices the right chopstick is not available. In sync, they all put down the left chopstick, wait five seconds, and repeat the process. Again, they all find the left chopstick but discover the right chopstick is not available, so they put down their left chopstick and wait again. As long as they are perfectly in sync, they will repeat the pattern ad infinitum.
Live locks are nearly always the result of an algorithmic flaw, rather than an implementation error. They are very difficult to identify, because it can be nearly impossible to reproduce them. Consider, for example, that on a quad-core machine, the problem cannot occur, because with only four cores running in parallel, the five philosophers cannot act simultaneously. One will be swapped out briefly to enable the fifth philosopher's action. As a result, one philosopher will always get to eat — and the live lock disappears. However, on processors with five or more cores the problem can indeed occur. So, if the QA engineer or the help desk technicians have quad-core machines, they will never be able to reproduce a bug reported by multiple users.
The best way to prevent live locks is through vigilance and defensive programming. Always consider what would happen if all possible threads performed the same action at the same time.
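One common way to break a live lock of this kind is to destroy the symmetry that sustains it — for instance, by making each philosopher's retry delay random rather than a fixed five seconds, so the threads drift out of lockstep. Here is a possible sketch; the try_to_eat function and its retry contract are inventions for this example, while rand_r() and nanosleep() are standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Illustrative: attempt to take both chopsticks. On failure, back
   off for a RANDOM interval so the philosophers cannot stay
   perfectly synchronized. Returns 1 if the philosopher ate. */
int try_to_eat(pthread_mutex_t *left, pthread_mutex_t *right,
               unsigned *seed) {
    if (pthread_mutex_trylock(left) != 0)
        return 0;
    if (pthread_mutex_trylock(right) != 0) {
        pthread_mutex_unlock(left);
        /* Random jitter (0-4999 microseconds) breaks the lockstep
           retry pattern that causes the live lock. */
        struct timespec ts = { 0, (rand_r(seed) % 5000) * 1000L };
        nanosleep(&ts, NULL);
        return 0;
    }
    /* ... eat ... */
    pthread_mutex_unlock(right);
    pthread_mutex_unlock(left);
    return 1;
}
```

Randomized back-off makes the pathological schedule vanishingly unlikely rather than impossible; a stronger fix is to remove the symmetry in the algorithm itself, such as having one philosopher reach for the right chopstick first.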
Locating Problem Code
As seen in the previous example, the biggest challenges in debugging parallel code are reproducing and locating the problem. Frequently, though, if you can reproduce the problem, you can locate the error. Most debuggers and IDEs today enable you to break on specific threads, so you can see what each one is doing. Parallel debugging, however, becomes highly complex when dozens of threads are in flight. Not only is the debugging difficult, but the defensive programming work requires patient, careful thought and thorough analysis of design.
Unfortunately, there are very few tools on the market to help locate sources of threading conflicts and of parallel trouble spots. One of the few is Intel Thread Checker, which is part of a line of thread-oriented development tools from Intel that includes Thread Profiler and Intel Threading Building Blocks (TBB). In its simplest use case, Thread Checker runs a threaded program and monitors the various threads looking for problematic interactions between them. It is capable of identifying race conditions and deadlocks, as well as suspicious threading practices. Figure 1 shows a screenshot from Thread Checker when it discovers a deadlock in a small sample program.
Notice how it is able to trace the deadlock statements to specific lines of code in each of the two threads. This can be an invaluable time saver, especially on massively parallel code such as that in MPGs or other complex games.
Figure 2 shows the display of a data race found when running a different program. In the central panel are shown three places where data races were detected (marked with red circles). Below them are various points of information (marked with blue) that highlight events of possible interest. If you click on any of these events, you are brought to the line of source code in a panel that looks much like Figure 1. This design makes it easy to identify the unprotected variable and determine which thread is modifying it.
Intel Thread Checker can certainly be used as a debugging tool, but this approach tends to underutilize it. A more plenary use case is to run it during integration tests. The reason for this is that parallel bugs are elusive creatures. A program that has a data race might run fine and give accurate results for weeks before one run suddenly delivers an incorrect result. Running Thread Checker during the testing phase can automate the discovery of latent defects that remain hidden only because the right parallel conditions have not yet occurred to expose them. When a defect is found, Thread Checker's debugging facility brings added lift. (Thread Checker also has thread performance analysis capabilities, which frequently can pinpoint places where threads are interacting suboptimally.)
In conclusion, you want to do all you can to avoid having to debug threaded code. The best approach is to write threading apps defensively, test extensively, and rely on automated defect detection tools such as Intel Thread Checker. All in all, this path will save you considerable frustration.
— Andrew Binstock is the editor in chief of Dr. Dobb's. A modified version of this post appeared recently on Intel.com, which owns the copyright.