In this article, I explain the many problems associated with concurrent access to hierarchical data structures. I explore several possible solutions, and describe a generic construct that encapsulates much of the complexity associated with locking hierarchical data structures in C#. The design, though, can be implemented in any programming language.
Let's start with a description of a problem with similar characteristics to my real project at work. I'm working on a scheduling service for online classes. The main data structure is a tree consisting of lectures. For each lecture, there are multiple class sessions (which can be represented as branches). Each class session contains slots that students can occupy. The invariants are: If the number of students exceeds a threshold, a new class session is added; and students can only attend one class at a time. (If a student joins another class, he is logged out of his current class.) So, the data structure that needs to be managed is a tree:
- Lectures
  - Classes
    - Students
The code needs to support a small set of operations:
- Attend lecture (student wants to attend a lecture)
- Expel student (student is removed from current class)
- Cancel class (stop a particular running class)
- Cancel lecture (stop all running classes of this lecture)
- Get statistics (number of classes running and number of students overall)
When this service is being accessed by more than one user, clearly there will be a contention issue that needs intelligent management of mutual exclusion and locking. Let's quickly look at the locking strategies available to us:
Coarse-Grained Locking. Coarse-grained locking is relatively simple to implement. In every thread that needs to access some shared state, you lock the entire state, read/write to your heart's content, and then release the lock. Access to the shared state is serialized and only one thread can touch it at a time. The problem with coarse-grained locking is that it's slow. If you have hundreds of threads that need to access the shared state, they will all have to wait until the current thread that holds the lock finishes its work, and only then will another thread be allowed to do its thing. Such a system will most likely perform much worse than a single-threaded system. Coarse-grained locking is only useful if your threads execute quickly and, as a result, don't create a lot of contention.
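Applied to our lecture tree, coarse-grained locking might look like the following sketch. The `SchedulingService` and `Lecture` shapes here are hypothetical stand-ins for the article's data structures, not actual code from the service:

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for a node in the article's lecture tree.
public class Lecture
{
    public List<string> Students { get; } = new List<string>();
}

public class SchedulingService
{
    private readonly object _globalLock = new object();
    private readonly Dictionary<int, Lecture> _lectures = new Dictionary<int, Lecture>();

    public void AttendLecture(int lectureId, string student)
    {
        // Coarse-grained: one lock serializes every operation on the whole tree.
        lock (_globalLock)
        {
            Lecture lecture;
            if (!_lectures.TryGetValue(lectureId, out lecture))
            {
                lecture = new Lecture();
                _lectures[lectureId] = lecture;
            }
            lecture.Students.Add(student);
        }
    }

    public int StudentCount(int lectureId)
    {
        // Even a read-only query must take the same global lock.
        lock (_globalLock)
        {
            Lecture lecture;
            return _lectures.TryGetValue(lectureId, out lecture) ? lecture.Students.Count : 0;
        }
    }
}
```

Note that `StudentCount()` blocks every writer for the duration of the query, which is exactly the throughput problem described above.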
Fine-Grained Locking. With fine-grained locking, multiple locks are used in a set sequence to lock the smallest possible part of the data structure that the current thread needs to operate on. This gives other threads the opportunity to work in parallel on other parts of the data. Fine-grained locking is where most concurrency problems emerge: race conditions, deadlocks, livelocks, lock convoys, and so on.
Lock-Free Algorithms and Data Structures. Lock-free algorithms address the issues raised by locks, but bring their own set of problems. Their use in the industry is still fairly new. At their core, they rely on atomic operations at the hardware level. It is very hard to design and implement lock-free algorithms properly because the building blocks are very small; when you compose them, the emerging behavior is not trivial to analyze.
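To make "atomic operations at the hardware level" concrete, here is a minimal sketch of the compare-and-swap (CAS) retry loop that most lock-free structures are built from, using .NET's `Interlocked.CompareExchange`. The counter itself is an illustration, not part of the scheduling service:

```csharp
using System.Threading;

public class LockFreeCounter
{
    private int _count;

    // Classic CAS retry loop: read, compute, attempt to publish;
    // retry if another thread changed the value in between.
    public int Increment()
    {
        while (true)
        {
            int observed = _count;
            int desired = observed + 1;
            // CompareExchange returns the value it saw; if that equals
            // 'observed', our write won the race and took effect.
            if (Interlocked.CompareExchange(ref _count, desired, observed) == observed)
            {
                return desired;
            }
        }
    }

    public int Value
    {
        get { return Volatile.Read(ref _count); }
    }
}
```

Even this tiny example hints at the difficulty: correctness depends on reasoning about what every other thread may do between the read and the CAS.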
ChainLocker
In this article, I'll examine the most common solution: fine-grained locking. The approach I use here, called ChainLocker, encapsulates the concept of sequenced or hand-over-hand locking of a chain of locks. Hand-over-hand locking refers to locking a top-level item (lectures tree root), looking for a particular branch (a specific lecture), locking the lecture, and only then releasing the lectures tree root so other threads can access other lectures in parallel. Once you have the lecture locked, you can look for a particular class, lock it, then release the lecture lock, so other threads can operate on other classes of this lecture. Here is pseudocode that demonstrates the concept:
lock(root)
lecture = root.find_lecture(...)
lock(lecture)
unlock(root)
class = lecture.find_class(...)
lock(class)
unlock(lecture)
process(class)
unlock(class)
To remove an element from the tree (such as a lecture), it's necessary to make sure nobody else has any class locked. You can do this by trying to lock all the classes in addition to the lecture, and only then remove the lecture; or you can mark the lecture as CLOSED and remove it later in a background task after a short delay to make sure all current threads finish processing (new requests for a CLOSED lecture will fail as though it had never existed).
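The second option (mark as CLOSED, remove later) might look like the sketch below. The `Closed` flag, the grace period, and the catalog shape are assumptions for illustration, not the article's actual code:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public class Lecture
{
    public bool Closed;  // new requests check this flag and fail fast
}

public class LectureCatalog
{
    private readonly Dictionary<int, Lecture> _lectures = new Dictionary<int, Lecture>();

    public void Add(int id, Lecture lecture)
    {
        lock (_lectures) { _lectures[id] = lecture; }
    }

    public bool TryAttend(int id)
    {
        lock (_lectures)
        {
            Lecture lecture;
            // A CLOSED lecture behaves as though it had never existed.
            return _lectures.TryGetValue(id, out lecture) && !lecture.Closed;
        }
    }

    public bool CancelLecture(int id, TimeSpan gracePeriod)
    {
        lock (_lectures)
        {
            Lecture lecture;
            if (!_lectures.TryGetValue(id, out lecture))
            {
                return false;
            }
            lecture.Closed = true;  // step 1: reject new requests immediately
        }
        // Step 2: physically remove later, once in-flight threads are done.
        // The closure holds a reference to the timer so it isn't collected early.
        Timer cleanup = null;
        cleanup = new Timer(_ =>
        {
            lock (_lectures) { _lectures.Remove(id); }
            cleanup.Dispose();
        }, null, gracePeriod, Timeout.InfiniteTimeSpan);
        return true;
    }
}
```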
Design and Implementation
ChainLocker is a generic class that can be parameterized by the types of objects you want to lock. In our example, the types are IDictionary<int, Lecture>, Lecture, and Class. A ChainLocker can lock arbitrarily long chains of objects. Here, I present a three-level-deep ChainLocker, but I'll later introduce a ChainLocker generator that can generate any size ChainLocker you need. Listing One presents the code for the three-level ChainLocker.
Listing One
using System;
using System.Linq;
using System.Threading;

namespace ChainLocker
{
    public class ChainLocker<T0, T1, T2>
        where T0 : class
        where T1 : class
        where T2 : class
    {
        public void Do(T0 state0, Func<T0, T1> stage0, Func<T1, T2> stage1, Action<T2> stage2)
        {
            if (state0 == null)
            {
                throw new ArgumentNullException("state0");
            }
            var takenLocks = Enumerable.Repeat(false, 3).ToArray();
            var states = Enumerable.Repeat<object>(null, 3).ToArray();
            states[0] = state0;
            try
            {
                // Lock state0
                Monitor.Enter(states[0], ref takenLocks[0]);
                // Execute stage0
                states[1] = stage0((T0)(states[0]));
                // Bail out if it returned null
                if (states[1] == null)
                {
                    return;
                }
                // Lock state1
                Monitor.Enter(states[1], ref takenLocks[1]);
                // Release the state0 lock so other threads can work on it
                Monitor.Exit(states[0]);
                takenLocks[0] = false;
                // Execute stage1
                states[2] = stage1((T1)(states[1]));
                // Bail out if it returned null
                if (states[2] == null)
                {
                    return;
                }
                // Lock state2
                Monitor.Enter(states[2], ref takenLocks[2]);
                // Release the state1 lock so other threads can work on it
                Monitor.Exit(states[1]);
                takenLocks[1] = false;
                // Execute stage2
                stage2((T2)(states[2]));
            }
            finally
            {
                // Release any locks still held (if an exception was thrown)
                for (int i = 0; i < 3; ++i)
                {
                    if (states[i] != null && takenLocks[i])
                    {
                        Monitor.Exit(states[i]);
                    }
                }
            }
        }
    }
}
Here is the class definition:
public class ChainLocker<T0, T1, T2>
    where T0 : class
    where T1 : class
    where T2 : class
{
    public void Do(T0 state0, Func<T0, T1> stage0, Func<T1, T2> stage1, Action<T2> stage2)
    {
        ...
    }
}
Before diving into the implementation, let's make sense of the class definition. T0, T1, and T2 are the parameterized types in the order they should be locked. Note the constraint "where Tx : class".
In the scheduling service example we are running, T0 is the root (IDictionary<int, Lecture>), T1 is Lecture, and T2 is Class. ChainLocker has a single method called Do(). Do() has an interesting signature. It accepts an initial state of type T0 called state0, and then three stages (stage0, stage1, and stage2). The initial state state0 is the root, which must be locked first. The stage parameters are delegates that specify what processing happens at each stage once the appropriate state is locked properly. If you are not familiar with the C#/.NET Func and Action delegates, they encapsulate an anonymous method with an arbitrary signature. Action is for methods that don't return anything (void), and Func is for methods with a return type. Both can take any number and type of arguments.
For example, stage0 is defined as Func<T0, T1>, which means it's a method that takes an object of type T0 and returns an object of type T1. The last argument to Do(), stage2, is Action<T2>, which translates as a method that takes a T2 object and returns nothing (void).
What is the purpose of this strange signature and how does ChainLocker work? First off, an important concept of ChainLocker is quick exit. Remember that ChainLocker keeps scarce resources locked. It is often the case that a particular workflow will not go through the entire chain of root, lecture, class. For example, consider a student trying to join a nonexistent lecture. Once in stage0, it becomes clear that there is no such lecture and there is no point in going on to stage1 (processing the lecture) and stage2 (processing the class). ChainLocker will bail out immediately at the end of stage0. How does the generic ChainLocker figure out that it should bail out early? That is the responsibility of the caller who provides the processing delegates. ChainLocker relies on an unwritten contract that if a stage delegate returns null, it will bail out immediately. Here is the relevant snippet for stage0:
// Execute stage0
states[1] = stage0((T0)(states[0]));
// Bail out if it returned null
if (states[1] == null)
{
    return;
}
The last stage (Action<T2> stage2) returns nothing because there is no bailing out early after the last stage. An interesting consequence of this design is that the entire Do() method returns nothing. As part of processing, you can assign a result to a property on one of the objects you process, but ChainLocker.Do() itself is always void. It is possible to extend ChainLocker to support a return type for Do(), but I decided against it because it can make the semantics very confusing in cases of early bail out: you could only signal failure by returning null, yet null may also be a valid result when processing runs all the way through, or when the processing launches asynchronous operations. I decided to keep it simple and let the caller manage result processing.
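To close the loop, here is a sketch of how a caller might use the three-level ChainLocker and carry a result out through a captured local. The Lecture and ClassSession shapes are illustrative (ClassSession stands in for the article's Class type), and Listing One's logic is restated in compact form only so the sketch is self-contained:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Compact restatement of Listing One so this sketch stands alone.
public class ChainLocker<T0, T1, T2> where T0 : class where T1 : class where T2 : class
{
    public void Do(T0 state0, Func<T0, T1> stage0, Func<T1, T2> stage1, Action<T2> stage2)
    {
        var taken = new bool[3];
        var states = new object[] { state0, null, null };
        try
        {
            Monitor.Enter(states[0], ref taken[0]);
            if ((states[1] = stage0((T0)states[0])) == null) return;  // bail out early
            Monitor.Enter(states[1], ref taken[1]);
            Monitor.Exit(states[0]); taken[0] = false;                // hand-over-hand
            if ((states[2] = stage1((T1)states[1])) == null) return;  // bail out early
            Monitor.Enter(states[2], ref taken[2]);
            Monitor.Exit(states[1]); taken[1] = false;
            stage2((T2)states[2]);
        }
        finally
        {
            for (int i = 0; i < 3; ++i)
                if (states[i] != null && taken[i]) Monitor.Exit(states[i]);
        }
    }
}

// Illustrative stand-ins for the article's tree.
public class ClassSession { public readonly List<string> Students = new List<string>(); }
public class Lecture { public readonly List<ClassSession> Classes = new List<ClassSession>(); }

public static class ChainLockerDemo
{
    public static int CountStudents(IDictionary<int, Lecture> root, int lectureId)
    {
        var locker = new ChainLocker<IDictionary<int, Lecture>, Lecture, ClassSession>();
        int count = -1;  // stays -1 if any stage bails out early
        locker.Do(root,
            r => { Lecture l; return r.TryGetValue(lectureId, out l) ? l : null; },  // stage0
            l => l.Classes.Count > 0 ? l.Classes[0] : null,                          // stage1
            c => count = c.Students.Count);  // stage2: the closure carries the result out
        return count;
    }
}
```

The captured `count` local is one way the caller can manage results while Do() itself stays void; the sentinel value distinguishes an early bail-out from a successful run.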