Kirk Krauss is a software developer working on IBM Rational PurifyPlus.
"There are two kinds of spurs, my friend. Those that come in by the door, and those that come in by the window."
-- Eli Wallach in The Good, The Bad, and the Ugly
In Deadlock-Proof Your Code: Part 1, I described a way for multithreaded software to automatically detect and work around deadlocks involving live threads. Sometimes a deadlock can involve a dead thread. Given the complexity of modern software, particularly where routines can call back and forth between one another, you have to be very careful to avoid the situation where threads exit while holding locks. If a thread takes ownership of a lock and is then terminated, and another thread waits for the lock held by the terminated thread, the wait never ends. It's a type of deadlock that can occur if the locks involved are not aware of whether they're held by live or dead threads.
The following scenario illustrates this type of deadlock:
Once a thread has gotten stuck waiting on a lock held by a thread that has terminated, the hung thread will not execute further code, other than such code as may be involved in waiting on the lock. If other threads require either the lock held by the terminated thread or any other lock held by the hung thread, then the entire process can hang as those threads all endlessly wait.
You can solve the problem of a thread waiting on a lock held by a terminated thread if you ensure that no threads exit while holding locks. To detect this condition, you can set up watchdog methods to perform the following tracking as your program works with its threads and locks:
- Track all threads and their states (i.e., running or waiting and, if waiting, whether the wait is on a lock and on which lock);
- For each thread, also track a reference to which locks it's holding at any given time;
- Track all lock creation, acquisition (by which thread), and release.
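The bookkeeping in the list above could be sketched roughly as follows. This is a minimal illustration, not the code from the Part 1 listings: the names `LockTracker`, `ThreadRecord`, `ThreadId`, `LockId`, and `kNoOwner` are all hypothetical, threads are identified by plain integers, locks by their addresses, and each lock is assumed to be held at most once at a time (recursive acquisition, which Windows critical sections allow, would need a per-lock count).

```cpp
#include <cstdint>
#include <map>
#include <mutex>
#include <set>

// Hypothetical names throughout; none of this comes from the Part 1 listings.
using ThreadId = std::uint32_t;
using LockId = const void*;          // address of the lock object being tracked
constexpr ThreadId kNoOwner = 0;     // assumption: 0 is never a real thread ID

enum class ThreadState { Running, WaitingOnLock };

struct ThreadRecord {
    ThreadState state = ThreadState::Running;
    LockId waitingOn = nullptr;      // meaningful only while WaitingOnLock
    std::set<LockId> held;           // locks this thread currently owns
};

class LockTracker {
public:
    // Record that a lock exists and is currently unowned.
    void lockCreated(LockId lock) {
        std::lock_guard<std::mutex> g(guard_);
        owners_[lock] = kNoOwner;
    }
    // Record that thread t is about to block on the lock.
    void waitBegun(ThreadId t, LockId lock) {
        std::lock_guard<std::mutex> g(guard_);
        ThreadRecord& r = threads_[t];
        r.state = ThreadState::WaitingOnLock;
        r.waitingOn = lock;
    }
    // Record that thread t now owns the lock and is running again.
    void acquired(ThreadId t, LockId lock) {
        std::lock_guard<std::mutex> g(guard_);
        ThreadRecord& r = threads_[t];
        r.state = ThreadState::Running;
        r.waitingOn = nullptr;
        r.held.insert(lock);
        owners_[lock] = t;
    }
    // Record that thread t no longer owns the lock.
    void released(ThreadId t, LockId lock) {
        std::lock_guard<std::mutex> g(guard_);
        threads_[t].held.erase(lock);
        owners_[lock] = kNoOwner;
    }
    // Snapshot of the locks currently tracked as held by thread t.
    std::set<LockId> heldBy(ThreadId t) const {
        std::lock_guard<std::mutex> g(guard_);
        auto it = threads_.find(t);
        return it == threads_.end() ? std::set<LockId>() : it->second.held;
    }
    bool isWaiting(ThreadId t) const {
        std::lock_guard<std::mutex> g(guard_);
        auto it = threads_.find(t);
        return it != threads_.end() &&
               it->second.state == ThreadState::WaitingOnLock;
    }
    ThreadId ownerOf(LockId lock) const {
        std::lock_guard<std::mutex> g(guard_);
        auto it = owners_.find(lock);
        return it == owners_.end() ? kNoOwner : it->second;
    }

private:
    mutable std::mutex guard_;                  // protects both maps
    std::map<ThreadId, ThreadRecord> threads_;  // per-thread state and held locks
    std::map<LockId, ThreadId> owners_;         // per-lock current owner
};
```

A single internal mutex keeps the sketch simple. A real watchdog would need to make sure this bookkeeping lock cannot itself take part in a deadlock.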
When a thread acquires a lock, your watchdog can intercept the lock acquisition (via the same lock acquisition method described in Part 1 of this article, or via a simplified method that skips the surrogate lock mechanism described in Part 1). That way, your watchdog's lock acquisition method can record that the lock is held. When the thread releases a lock, another of your watchdog's methods can record that as well, so the watchdog maintains a current list of all the locks held by each thread. You can then check your code for all the places where a thread is about to exit or be terminated. At each of those spots, you can add a call to a routine that programmatically releases all of the exiting thread's locks, based on the tracking performed by your watchdog's methods. This ensures that the thread releases every lock your watchdog is tracking before it exits. If your thread and lock tracking is comprehensive, a thread that has exited can no longer leave other threads hanging on locks it held.
Your watchdog's methods can be implemented as a set of functions to be explicitly called in lieu of standard OS-provided API functions (as in Listing 2 in Part 1). Alternatively, you can achieve the same wrapper effect in production code if you arrange dynamic interception of the necessary API functions via object code runtime patching or other means. Your intercepts can be set up at or near the start of the run or when a relevant component is loaded into the process.
Listing 1 (Part 1) is example Windows code that successfully deadlocks. It would deadlock over and over, recursively, if it didn't hang from a deadlock at the outset of its recursion. And if it didn't hang because of that, it would hang because threads exit while holding locks required by other threads.
Listing 2 (Part 1) is functionally identical to the deadlocking code of Listing 1, except that it adds thread and lock tracking as described above, makes use of surrogate locks (as described in Part 1), and also makes use of a routine that releases all locks tracked as being owned by each thread that exits. That routine is called ReleaseThreadsLocks(). It appears as the last function in Listing 2 (Part 1).
Listing 2 (Part 1) is intended only as a proof of concept and does not exhaustively cover all available Windows synchronization API functions. You will almost certainly have to modify this code to meet your needs, rather than simply deploy it. The watchdog methods described here, or something similar to them, can fit virtually any platform that supports multithreaded applications. Though the code in Listing 2 is oriented toward native-code applications, the same techniques can be applied to Java or managed applications too. The locks used in the listings are critical sections, but other types of synchronization objects may benefit from similar protection to prevent hangs that occur when threads are terminated while holding locks.