DCE pthreads versus NT Threads

Michael ports PTF, a C++ class library for DCE pthreads, from HP-UX System 9 to Windows NT. In doing so, he examines the differences between pthreads and NT threads, and describes the porting experience.

December 01, 1996
URL:http://www.drdobbs.com/cpp/dce-pthreads-versus-nt-threads/184410012

December 1996: Yam

DCE pthreads versus NT Threads

Porting a C++ class library to Windows NT

Michael Yam

Michael is the founder of Y Technology, a consulting company that serves New York's financial district. He can be reached at [email protected].

One measure of a good class library is that it can be used by many programmers on multiple platforms. To that end, I've ported PTF, a C++ class library for DCE pthreads (see my article "A C++ Framework for DCE Threads," Dr. Dobb's Sourcebook, July/August 1995), from HP-UX System 9 to Windows NT. While NT conforms to the POSIX standard level 1, it needs to reach level 4 before it can qualify for POSIX thread compatibility. The possibility of level 4 conformance is remote, however, because Microsoft has developed an alternative threading model for NT. In this article, I'll discuss the differences between pthreads and NT threads, and describe my porting experience. Most of the discussion about NT threads also applies to Windows 95 threads; I'll point out the differences along the way. Also, some designers make a subtle distinction between a "class library" (which you send messages to) and a "framework" (which sends messages to you). Since PTF does a little of both, I use the terms interchangeably. Regardless, the most important aspects of frameworks and class libraries are that they not only promote code reuse, but design reuse as well.

Under UNIX, pthreads are typically used in servers. They behave as automated workers and don't require user input. NT threads, however, offer two models: a worker thread similar to the type used under UNIX, and a user-interface (UI) thread. A UI thread has an event loop and can respond to user input and events independent of other threads. In theory, UI threads allow for highly responsive applications. In practice, however, UI threads are tricky to manage and can slow down an application when misused. PTF only supports worker threads.

Creating Threads

There are three ways to create a thread under NT:

Use MFC.
Use the Win32 API call.
Use a variation of the Win32 API call for programs that use the standard C run-time library.

If you develop solely in MFC and need UI threads, you're better off creating a thread using MFC (CWinThread or AfxBeginThread, for example). Threads created any other way (with the Win32 API, for instance) cannot safely access any MFC objects or use the MFC API. Even with MFC-created threads, be aware that MFC 4.0 objects are not multithread safe. For example, while one thread can safely access a CString object many times (it's reentrant), multiple threads cannot safely access the same CString object, because MFC objects are not protected by synchronization mechanisms. This isn't an oversight; Microsoft compromised here to reduce overhead. This means you'll have to do a little work and wrap your shared MFC objects in critical sections or mutexes.

With the Win32 API, you would create a thread using CreateThread(). A thread created this way works fine as long as it restricts itself to Win32 API calls. Any call to the standard C run-time library will not be thread safe, because the C library was developed for single-threaded environments and the functions are not guaranteed to be reentrant.

That brings us to _beginThreadex(), which does the necessary housekeeping for safe access to the standard C library and eventually calls CreateThread(). Although the housekeeping adds some overhead, I prefer this method because I don't have to be vigilant about restricting my thread to use only Win32 API calls. For the performance minded, I've written PTF to work with either call; by default, PTF is built using CreateThread(). Compile with _MT defined to use the safer _beginThreadex().

NT thread creation has one nicety that pthread creation does not: a CREATE_SUSPENDED option. One problem with creating a thread in a C++ constructor is that the thread can conceivably start as soon as it's created, even before the constructor has had a chance to complete. With NT, I can create threads in a state of "suspended animation," and when the constructor has completed, start the thread with a call to resumeThread(). Since there is no such option with pthreads, I had to move pthread creation out of the constructor and into a start() member function. This allows the constructor to complete its initialization before a thread starts to run. Despite this difference, the public interface to PTF remains the same in NT and UNIX: You create a thread by deriving from the class PTObject and explicitly start the thread with the start() member function. Figure 1 summarizes PTF's public interface.

Schedules and Priorities

NT threads use one scheduling policy: round-robin (RR). With pthreads, there are five, but the primary scheduling policies are RR and first-in first-out (FIFO). RR guarantees that all threads eventually get a quantum (time-slice), but at the cost of context switching. Context switching can be especially expensive on a single-CPU machine. On a multiple-CPU machine, however, the OS can assign each of your threads to an available CPU, reducing context switching and improving speed. But this capability requires that the kernel be threaded. The NT kernel is threaded; the HP-UX System 9 kernel is not. HP, however, does plan to have kernel-space threads, possibly by System 11. (For more details on scheduling, see my aforementioned article.)

pthreads use manifest constants to define minimum and maximum priority levels. To get any priority level in between, you would just do the appropriate arithmetic. For example, a mediumpriority thread would be calculated as: ((max-min)/2) + min. By default, pthreads are created with a medium priority.

NT threads use a more-complex priority scheme. On an absolute scale, there are 32 priority levels, some of which you can't or don't want to use. The lowest priority level, 0, is reserved for the system and cannot be assigned. The highest priority level (31) is for real-time use and has a higher priority than many system functions, such as mouse events and disk activity. Consequently, a real-time thread can hang your system and should be used rarely and briefly, if at all.

NT's priority levels are grouped into four priority classes: idle, normal, high, and real time. By default, when a thread is created, it inherits the same priority class as its parent process. That is, a high-priority process will create a high-priority class thread.

These four priority classes are further divided into seven relative priority levels: time critical, highest, above normal, normal, below normal, lowest, and idle. Thus, the aforementioned high-priority process can create threads with seven different relative levels. By default, threads are created with a normal relative priority.

The goal of these groupings is to support responsive UI threads. As a further refinement, NT offers dynamic boosting of thread priorities; that is, NT will increase the relative priority of a UI thread by one or two levels if there is user input. Neither Windows 95 nor pthreads support dynamic boosting.

With either pthreads or NT threads, PTF uses the default schedule and priority levels. In general, a simple priority scheme is preferable to a clever one. I try to make all my threads the same level: medium. Even for a thread that could be created with a low priority, such as one that monitors system activity, I would make it medium and have it sleep a lot. The asynchronous nature of threads makes events difficult to foresee and debug. And if you work with UI threads, the unpredictable behavior of the user only compounds the difficulty.

Data and Thread Synchronization

Both pthreads and NT threads support mutexes to protect access to data. To lock and unlock a mutex, pthreads use pthread_mutex_lock() and pthread_mutex_unlock(), while NT threads use WaitForSingleObject() and ReleaseMutex(). In this case, the calls map one to one, but in general, the NT thread API is richer and more complex than the pthread API. For example, WaitForSingleObject() is used not only to wait on a mutex, but to wait on other objects, such as threads, events, or semaphores. Table 1 equates the calls among PTF, NT thread API, and pthread API.

Mapping the mutex calls was easy. Not so with pthreads' condition variables, or in NT parlance, event objects. With pthreads, condition variables synchronize threads with other threads. Threads with a publisher/subscriber relationship, where one thread produces data and the other consumes it, coordinate using condition variables. The subscriber waits on a condition variable until notified by the publisher that data is available. The publisher notifies the subscriber by signaling the condition variable (using pthread_signal()). Another variation might use a single publisher and multiple subscribers. When data becomes available, the publisher would broadcast to the condition variable (using pthread_broadcast()), thereby notifying all waiting subscribers.

Under NT threads, a condition variable roughly translates to an event object. Similar to the pthreads model, the subscriber waits on this event until notified by the publisher. But here's where the two models diverge: While a condition variable can be either signaled or broadcast to (that is, there are two ways to notify it), there is just one way to notify an event object-- PulseEvent(). PulseEvent() distinguishes between notifying a single subscriber and multiple subscribers by recognizing automatic and manual event objects.

When creating an event object, you must specify its flavor. An automatic event is so named because, after it is pulsed, it automatically resets itself to a nonsignaled state, and, in effect, allows only one waiting thread to proceed. This is equivalent to pthread_signal(). A manual event does not reset itself, and by remaining in a signaled state, allows all waiting threads to proceed. This is equivalent to pthread_broadcast(). After all threads proceed, PulseEvent() resets the manual event object to nonsignaled. A pair of related NT thread calls, SetEvent() and ResetEvent(), does the same job as PulseEvent(), but in two steps. SetEvent() sets the event to signaled, and ResetEvent() resets an event to nonsignaled. PulseEvent() is more convenient, but Microsoft probably anticipated a need to leave an event object in the signaled state.

Because NT threads have two types of event objects, you must arrange them carefully in your code. Know when to notify one waiting thread, and when to notify all waiting threads. For example, if you have several threads waiting on an automatic event and your intent is to notify all waiting threads, you're out of luck. PulseEvent() on an automatic event will only permit one thread to proceed. In contrast, pthreads are more flexible because you can use a signal or a broadcast as it suits your needs.

Reconciling the two models for PTF caused me some consternation, but I was able to produce a compromise. The constructor for the PTCondVar class now accepts an optional argument that specifies the event type: PTAUTOEVENT or PTMANUALEVENT. In the absence of an argument, the constructor defaults to PTAUTOEVENT. Under UNIX, PTF ignores this argument. Under NT, however, PTF uses it to create the correct type of event object. Since you could accidentally signal a manual event or broadcast to an automatic event, PTCondVar::signal() and PTCondVar::broadcast() will first check the event type using assert.h's assert() macro. If you've matched your event object with the incorrect notification method, the framework will abort. When you're done debugging and testing, you can turn off the assert() macros by rebuilding PTF with #define NDEBUG.

Interlocked Functions versus PTTSafeType

In multithreaded applications, many threads may need to safely read and write to a common variable. For example, business transactions often update a sequence number. NT threads offer a family of interlocked functions for such purposes--interlockedIncrement(), interlockedDecrement(), and interlockedExchange()--that operate on a 32-bit long int, either adding or subtracting one from it, or exchanging two values. These functions are also atomic, so there's no need to wrap the calls in mutexes or critical sections. Handy as these routines are, they aren't portable to pthreads.

PTF addresses this point with a template named "PTTSafeType," which preserves that C++ syntax, using "++" and "--" to increment and decrement a native type. The operators are thread safe because they're overloaded to use an internal mutex; see Figure 2. PTTSafeType does not exchange the values of two variables, but an exchange() member function can be written easily enough.

The Test Program

Listing One is a program that exercises some of PTF. It involves four threads: an increment thread, two decrement threads created in suspended mode, and a watch thread. The increment thread continuously adds one to a global variable. The decrement threads continuously subtract one from the global variable. The watch thread monitors the variable.

The increment and watch threads start first. When the increment thread causes the global variable to exceed a specified threshold, the watch thread wakes one decrement thread. The increment and decrement threads compete to a stalemate until the watch thread signals the second decrement thread. The program ends when the global variable goes to zero.

The output consists of printf() statements identifying the thread id (tid) and the value of the global variable. Thread ids look different in NT than in Windows 95. While both operating systems store tids as DWORDS, Windows 95 assigns huge values such as 4294954371, whereas NT might assign values starting with 130. pthreads under HP-UX assign values beginning with 4. As long as the values are unique, you can identify the thread. One caveat: Under NT and Windows 95, a tid will be reused if the thread belonging to it terminates. In contrast, HP's implementation of pthreads does not reuse a tid.

Some C++ developers might take exception to my use of printf() statements over streams. I do find printf() easier to use, but the main reason I use it is because printf() is thread safe, whereas streams are not. Multiple threads streaming to cout will produce scrambled output. Of course I could have wrapped cout with a mutex, but here I traded technical correctness for expediency. Besides, printf() statements were used only for debugging.

When building PTF for pthreads, you need to #define either "_POSIX_SOURCE" or "_HPUX_SOURCE". When using Visual C++ to build PTF for NT, the target operating system is automatically detected. Specifically, the IDE defines "Win32," and PTF is built using the Win32 API. Should you also need to make calls to the standard C library, remember to #define "_MT". Finally, the test program under NT must be created as a "console" application. The complete source code to PTF is available electronically.

Conclusion

Looking back, the port was mostly straightforward, mainly because PTF is small. For in-house corporate development, I find small class libraries have a better chance of being reused than large ones. In-house software requirements tend to be vertical and can differ greatly between departments. Thus, a broad and deep framework that neatly captures the essence of one department would be difficult to use in another. Small also means easier to learn, maintain, and possibly replace when a commercial equivalent or an official standard appears. Until then, PTF will continue to serve me well. I hope you find it just as useful.

Table1: Map resolving calls among pthreads, NT threads, and PTF.

 
Pthreads                NT Threads           PTF

pthread_create          CreateThread or      PTObject::ctor
                        _beginThreadex       PTObject:start
pthread_cancel          deleteThread or      PTObject::dtor
pthread_destroy         _endThread
pthread_join            WaitForSingleObject  PTObject::join
pthread_yield           Sleep                PTObject:yield
pthread_mutex_init      CreateEvent          PTMutex::ctor
pthread_mutex_destroy   CloseHandle          PTMutex::dtor
pthread_mutex_lock      WaitForSingleObject  PTMutex::lock
pthread_mutex_unlock    ReleaseMutex         PTMutex::unlock
pthread_mutex_trylock   WaitForSingleObject  PTMutex::trylock
pthread_cond__init      CreateEvent          PTOondVar::ctor
pthread_cond_destroy    CloseHandle          PTCondVar::dtor
pthread_cond_broadcast  PulseEvent           PTCondVar::broadcast
pthread_cond_signal     PulseEvent           PTCondVar::signal
pthread_cond_wait       WaitForSingleObject  PTCondVar::timedWait

Figure 1: PTF's public interface.

Figure 2: Aggregation for PTThreadSafeObject.

Listing One


/*---- testptf.cpp exercise some of PTFs classes -----*/

#include "pt\ptf.h"
#include "pt\pttf.h"

#include <stdio.h>
#include <assert.h>

// Global condition variable and counter. Program ends when gCounter <= 0.
PTTSafeType <int> gCounter;
PTCondVar gWatchEvent;

// Simple classes: increment thread, decrement thread
// and watch thread.  Uses PTObjects ctor and dtor.
class IncThread : public PTObject
{
public:
    int runPThread();
};
class DecThread : public PTObject
{
public:
    int runPThread();
};
class WatchThread : public PTObject
{
public:
    int runPThread();
};
// Implement virtual runPThread() member for all three thread classes.
int
IncThread::runPThread()
{
    while(gCounter > 0)
    {
        ++gCounter;     // thread-safe increment
        printf ("Inc Thread tid: %u  gCounter = %d\n", tid(), (int)gCounter);
#if defined (_HPUX_SOURCE) || defined (_POSIX_SOURCE)
        yield();    // HP-UX quantums can be large
#endif
    }
    return gCounter;
}
int
DecThread::runPThread()
{
    // wait for signal from watch thread before starting.
    gWatchEvent.timedWait();
    while((int)gCounter > 0)
    {
        --gCounter; // thread-safe decrement
        printf ("Dec Thread tid: %u  gCounter = %d\n", tid(), (int)gCounter);
#if defined (_HPUX_SOURCE) || defined (_POSIX_SOURCE)
        yield();    // HP-UX quantums can be large
#endif
    }
    return gCounter;
}
int
WatchThread::runPThread()
{
    PTCondVar sleepEvent;
    DecThread decThread1;
    DecThread decThread2;
    // threads are started but not yet signaled.
    decThread1.start();
    decThread2.start();

    while((int)gCounter > 0)
    {
        printf ("Watch Thread tid: %u  gCounter = %d\n", tid(), (int)gCounter);
        if (gCounter > 1000)
        {
            // wake one dec thread if counter is over arbitrary limit.
            gWatchEvent.signal();
        }
        sleepEvent.timedWait (3);   // nap for 3 secs
    }
    decThread1.join();
    decThread2.join();

    return gCounter;
}
int main()
{
    gCounter = 500;     // arbitrary start count;

    IncThread incThread;
    WatchThread watchThread;

    if (watchThread.isValid())
        watchThread.start();
    if (incThread.isValid())
        incThread.start();
    watchThread.join();
    incThread.join();

    printf ("gCounter = %d\n", (int)gCounter);

    return 0;
}