August 26, 2009
Practical Lock-Free BuffersSarmad Asgher
Lock-free systems enjoy freedom from deadlocks
Sarmad is a Senior Software Engineer at PalmChip Pakistan (Pvt.) Limited and an M.Phil of FAST-NU Pakistan. He can be contacted at m_sarmad_asgher@yahoo.com.
Lock-free programming frees you from system failure by tolerating individual process failures. How can a process failure cause system failure? When a process fails while holding a lock, other processes in the system requiring that lock have to wait indefinitely. On the other hand, lock-freedom (or "non-blocking behavior") guarantees that the system makes progress when some process takes few steps [1]. By definition, lock-free systems enjoy freedom from deadlocks. This property of lock-free systems can be handy in applications like API monitoring. Many malware analysis tools monitor Win32 APIs. These tools [2] work by injecting code into monitored process and use Interprocess communication (IPC) to log API calls [3]. If this IPC mechanism uses locks, then it can result in deadlock. In this case, the probability of deadlock is real because execution of monitored process can halt while some thread may be logging an API call. Scenarios which increase the likelihood of deadlock include:
If lock-free data structures are used for IPC, then the system can tolerate process/thread halts. In this article, I present a multi-producer/single-consumer lock-free buffer which can be used for IPC. (The complete source code and related files are available here.) In this particular situation, the consumer is the monitoring tool which logs API calls, whereas producers are the monitored processes which produce API call records. To simplify matters, I assume consumer does not fail. This lock-free buffer is different from usual circular buffers commonly used. A circular buffer uses in and out pointers to track free and used items, whereas this lock-free buffer uses a special value to track free items. This special value is never used as data. This simplifies the problem because you don't have to worry about updating in and out pointers atomically. The declaration of the buffer is:
This declares a buffer of integer items. You might be wondering how integer slots can store API call records. I build on this buffer of integers and construct a buffer which can hold strings. Returning to the integer buffer, an item with a zero value indicates a free slot. All the items in the buffer are initialized to zero. Insertion is straightforward. The insert operation iterates through the items to find an empty one, then tries to put an integer value using atomic primitive CAS. (For description of CAS, see the sidebar "CAS and ABA".) Note that writing lock-free code requires care to avoid potential pitfalls see [4] for details.
There is another handy insert operation on the buffer which tries to put the integer value at a particular index:
Similarly, Delete operation iterates through the buffer to find an inserted item and tries to take it out using CAS:
This implementation is lock-free because no process owns any slots in the buffer which means a process failure does not result in loss of any slots. As the buffer slots remain intact it is always possible to put an item if the buffer is not full. The ABA problem (see the sidebar titled "CAS and ABA") commonly faced by lock-free data structures miraculously does not cause any problems. Consider a scenario: First process reads the buffer slot to be empty, but is preempted by second process which inserts an item in that slot. This is followed by a third process which takes out that item. This in turn is followed by the resumption of the first process which finds the slot to be empty. But it does not matter if first process inserts an item in that slot. It shows that ABA problem is not an issue in this case.
How you can build on this integer buffer to put strings? You can put individual characters one at a time into the integer buffer. The immediate question is how to separate a character of one string from a character of another string. A simple answer is to use unique Message Number for a string. The next question is how the consumer can link these characters to form the string again. This is tricky. A simple approach can be to put index of the character in the buffer along with message number and actual character. However, this approach has a serious drawback. It will limit the length of string which can be transferred through the buffer. The approach used to overcome this limitation is to find a free slot to put the first special character along with the relative offset of the second character in the lock-free buffer. This is followed by putting the second character at the announced index along with offset of the third character and so on. This allows unlimited size strings to be transferred through the buffer. See the TransferString function in Listing 1 for complete implementation.
Listing 1
This TransferString function implementation builds on integer buffer. This implementation assumes that integer size is 4 byte on the target platform. The 4 bytes of an integer item which is put into the buffer holds the following information (see Figure 1):
Figure 1: Composition of an item of the Integer buffer.
Again, there is a single consumer. This simplifies the problem because concurrency is not an issue. I am going to discuss a demo consumer implementation here. The consumer calls Delete function on the integer lock-free buffer in an infinite loop. The consumer remembers the index from which the item was taken out of the buffer as well as the integer data itself. In the infinite loop it calls RearrangeString function which loops through those remembered items and checks three conditions:
In the first case, consumer calculates the index of the next character in the buffer and remembers it to check later on that an item with the same message number is really the next item in the sequence. In case 2 consumer prints out the string. In all three cases the consumer forgets this remembered item so that the same character is not repeated again. This approach works even in case when two nonconsecutive characters of the same string are inserted at the same index of the buffer because it is not possible to insert an item at an in use index. This means an item inserted first at the index is removed first. This result in RearrangeString function correctly linking the item inserted first at the same index. For a complete implementation of RearrangeString function, see Listing 1.
There are 0xFFFF possible message numbers which is very limiting for this implementation to be used for IPC in malware analysis tool. To make this implementation practical the same message number can be used again by the same thread in a process to communicate as many API calls as it makes. This implies that the resulting implementation can have 0xFFFF threads communicating API calls simultaneously. In other words, this lock-free buffer can tolerate failure of 0xFFFF threads. The system is not in dead-lock even if all message numbers are consumed. The new threads can continue their execution with out logging the API calls. No requirement of memory management is another advantage of this implementation. Although the demo implementation of consumer assumes maximum number of threads at compile time, however this technique has no such limitation. There can be any number of threads in the system.
Lock-freedom has the potential to make a difference. It invites you to think about new ideas and change the status-co in lock-free paradigm. You can start by thinking about ways to avoid the limitation on number of thread failures this implementation of lock-free buffer tolerates.
Acknowledgment
Thanks to Shafiq-ur-Rehman and Bilal Hashmi for guiding me on concurrency and lock-freedom related issues.
References
[1]M. Herlihy, "Wait-free synchronization," ACM Transactions on Programming Languages and Systems, vol. 11, no. 1, pp. 124"149, Jan. 1991.
[2]FireEye Anti-Malware and Anti-Botnet Security tool is an example. (http://www.fireeye.com/products/index.html).
[3]Details of API monitoring are out of scope of this article. See Detours for an example. (hhttp://research.microsoft.com/en-us/projects/detours/)
[4]H. Sutter. "Lock-Free Code: A False Sense of Security" (Dr. Dobb's, September 2008). (www.ddj.com/cpp/210600279)
[5]Michael, M.M. "Hazard pointers: Safe memory reclamation for lock-free objects," IEEE Transactions on Parallel and Distributed Systems, vol. 15, no. 8, Aug. 2004.
|
|
|||||||||||||||||||||||||||||
|
|
|
|