
Thread Programming in UnixWare 2.0


Dr. Dobb's Journal, June 1995

Just say "no" to fork()

John is an independent consultant in Cambridge, MA. He can be contacted at [email protected].


With the advent of UnixWare 2.0, threads have made their way to the UNIX desktop. UnixWare's threads implementation, a superset of the thread specification in the draft POSIX standard P1003.1c, has the potential to liberate UnixWare developers from the limitations of the age-old fork() model. Furthermore, threads let you exploit the capabilities of multiprocessing hardware.

Before Version 2.0 (POSIX 1003.1c and SVR4.2 MP), UnixWare provided two ways to create new processes: fork and fork-exec. The fork system call creates an exact copy of the calling process and sets it running at the return from the fork call. The new process is a child of the old; it gets a copy of the parent's data space and valid file descriptors for all files opened by the parent. To start a different process, the child process calls exec right after the return from fork.

With fork(), creating a new process consisted of a few lines of code, such as those in Example 1. To start another process, a process had to clone itself, then ask the operating system which of the two copies it was. Until recently, fork/exec was the only avenue for concurrent programming.

Lightweight Processes

Pre-2.0 UnixWare kernels had only one type of process, which I call a "heavyweight process" (HWP); it is the object of such calls as ps, kill(), and getpid(). HWPs still exist, but only as collections of lightweight processes (LWPs), which are the only schedulable entities in UW 2.0. An HWP consists of anywhere from one to MAXULWP LWPs. If you run a nonthreaded application in UW 2.0, you get an HWP that consists of a single LWP. In effect, instead of being a pointer to a piece of executable code, the HWP is now a pointer to a list of pieces of executable code.

In multiprocessor systems, separate LWPs from a single HWP can run on different processors, allowing them to achieve true concurrency. The best example of the need for this is a print function. You want to hit the print button, then move on--not sit watching a dialog box that says, "Now formatting page n. Please wait."

Since the HWP concept is still supported, old process-specific calls such as getpid, kill, and nice work much as they did before, but they operate on the HWP as a whole. To control threads and their LWPs the way you've always controlled HWPs, you need thread-level analogues to those calls. Table 1 lists some process-control calls and their threads-library analogues.

Threads

Threads are not LWPs. The kernel itself knows nothing about threads; it only schedules LWPs. Each running LWP makes calls to the dynamic threads library, which schedules threads to run on LWPs. So you now have two levels of scheduling: kernel scheduling of LWPs on processors, and thread-library scheduling of threads on LWPs. A single instance of a thread can run, at different points in its life, on different processors and different LWPs. To really get this, you have to view the scheduled entity as something completely independent of the lines of code that will run when it gets scheduled.

Think of the processor as a field, of each LWP as someone who has signed up to use the field, and of each thread as a particular activity such as baseball, soccer, or football. When the kernel schedules someone to use the field, that person can play football for the entire time (a bound thread), or football for five minutes and baseball for ten. The kernel doesn't care. The person using the field (the threads library, through any of its LWPs) has to keep track of the games being played (thread instances) within that person's time on the field. Thus, a thread is simply a series of logical statements, independent of the process, LWP, or processor upon which it might be executed.

There are two basic kinds of threads: bound and multiplexed. A bound thread gets its own dedicated LWP. A multiplexed thread (muxthread) does not: each HWP has a pool of LWPs, and the threads library can run a particular muxthread on any LWP in that pool at any given point.

The major consideration in choosing between bound and multiplexed threads is the trade-off between performance and concurrency. On a uniprocessor, bound threads can have up to five times the context-switching overhead of muxthreads. Bound threads, though, enjoy the most concurrency. Five bound threads on a five-processor system could be running physically concurrently, one thread to a processor, while five muxthreads on the same system might end up running on a single processor.

Concurrency

Concurrency is easiest to understand in the multiprocessor model. In a multiprocessing machine, an LWP can be farmed out to another processor. Two LWPs, or two threads bound to different LWPs, running on different processors at the same time are running truly concurrently. If two muxthreads run on the same LWP, they can never run on separate physical processors, and can thus never be truly concurrent.

Thus you can see the two extremes of concurrency: the maximum being one LWP per thread, and the minimum being one LWP for all threads. In reality, the threads library will not let you pile a large number of threads onto a single LWP. UW 2.0 lets you set the concurrency level through the thr_setconcurrency call. Listing One is a program that creates six additional multiplexed threads, each of which only prints out its process ID, LWP ID, and thread ID. Figure 1 shows the output from a run of the program with concurrency set to 1 (the minimum). Even at that setting, the threads library created two new LWPs (2 and 4) to run our spawned threads (2--7), proof that the concurrency level set with thr_setconcurrency is a hint, not an order. Figure 2 shows the output when we increase the concurrency level by 1. A new LWP (5) appears. Notice also that from one iteration of the thread's main loop to the next, a thread can run on different LWPs.

Another anomaly that leaps out when running Listing One is that the first run creates three LWPs: the one running thread 1 (LWP1) and those running threads 2--7 (LWP2 and 4). Logically, there must have been an LWP3. In this case, the thread library's wrapper for the sleep function created its own bound thread and thus a new LWP, so you don't have absolute control over the number of LWPs in a process.

Down in the details, scheduling and concurrency are even more complicated, but the bottom line is that two bound threads have the maximum probability of achieving true concurrency, while two muxthreads with concurrency level 1 have the minimum.

What's in a Thread?

The first decision in threading an application is which lumps of code should get their own threads. Table 2 lists categories of code granularity for threading. You need to mark medium- and coarse-grain functions for possible threading. A good example of a medium-grain function is a signal handler. Typically, a signal handler is a single function that does all its work within that function, or with calls to one or two other small functions. A typical coarse-grain function would be the serial I/O handler of a communications package. While it contains a huge amount of functionality, and correspondingly huge amounts of code, it needs user input to make a complete program.

You also have to decide whether making two threads of execution concurrent yields any real-time gain to the user. If you have three functions--A, B, and C--where B can't start until A is done and C can't start until B is done, then making A and B concurrent gets you nowhere. If, however, B can start without A being done, then putting B in a separate thread could be a real-time win.

Creation

Creating a thread is as simple as making a call to thr_create with the address of the function that will be the "main" for that thread. Creating the new thread in a suspended state (THR_SUSPENDED) lets you specify exactly when the new thread begins to run: when you call thr_continue, the new thread begins processing at the first line of the function passed to thr_create. You can call thr_suspend at any time to pause your new thread.

Note that with thr_create you can no longer rely on your stack to autogrow. The kernel supports autogrowth of a stack when you run out of stack space, but since the kernel isn't handling threads, it doesn't know anything about the threads' stacks. Thus, you have to allocate a big-enough stack right from the thr_create call.
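To make this concrete, here is a minimal sketch of the pattern. It assumes the usual UI-threads signature thr_create(stack_base, stack_size, start_func, arg, flags, &tid) along with the THR_SUSPENDED flag named above; the stack size and the worker function are illustrative, and error handling is abbreviated.

#include <stdio.h>
#include <stdlib.h>
#include <thread.h>

#define WORKER_STACK_SIZE  (64 * 1024)   /* sized by hand; thread stacks do not autogrow */

void *worker( void *arg )                /* the "main" for the new thread */
{
    printf( "worker running, arg = %s\n", (char *)arg );
    thr_exit( NULL );
    return NULL;                         /* not reached */
}

int main( void )
{
    thread_t tid;
    void *stack = malloc( WORKER_STACK_SIZE );   /* allocate a big-enough stack up front */

    /* Create the thread suspended so we control exactly when it starts running. */
    if( thr_create( stack, WORKER_STACK_SIZE, worker, "hello",
                    THR_SUSPENDED, &tid ) != 0 )
        return 1;

    thr_continue( tid );                 /* the thread now begins at worker()'s first line */
    thr_join( tid, NULL, NULL );         /* wait for it to finish */
    return 0;
}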

Threads and Signals

You can set up separate signal masks for each thread in a process. A signal sent to a UNIX process from another UNIX process via kill(process_id,signal_id), however, will only go to a thread enabled to catch that signal. If more than one thread is accepting a particular signal, the signal may be delivered to any accepting thread.

For this and other reasons, Novell recommends that instead of dealing with signals on a thread-by-thread basis, applications mask all signals in all threads and dedicate a single thread to wait on incoming signals via sigwait. An "object thread" program that adds a signal-handler thread to Listing One is available electronically (see "Availability," page 3). As always, it pays to build an appropriately limited signal set.

Two new signals have been defined in UW 2.0 to support the threads library: SIGWAITING and SIGLWP. SIGWAITING is raised when all LWPs in the process's LWP pool are blocked interruptibly. In thread8, this occurs when thread 1 is in gets(), thread 2 is sitting in a sigwait(), and all the other threads are either suspended or sleeping. If you add SIGWAITING to an object-thread program's signal set, the process will stop accepting user input.
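Here is a sketch of that recommendation (mask everything, wait in one dedicated thread). It assumes the single-argument form of sigwait() that returns the caught signal number, as the UI-style threads library of this era used (the POSIX draft passes the result back through a second argument instead), and the THR_DAEMON creation flag; the handler body is purely illustrative.

#include <signal.h>
#include <stdio.h>
#include <thread.h>

void *signal_thread( void *arg )
{
    sigset_t set;
    int sig;

    sigfillset( &set );                  /* wait on everything ...              */
    sigdelset( &set, SIGWAITING );       /* ... except SIGWAITING, or the       */
                                         /* process stops accepting user input  */
    for( ;; )
        {
        sig = sigwait( &set );           /* block until a signal arrives        */
        printf( "signal thread caught signal %d\n", sig );
        if( sig == SIGTERM )
            thr_exit( NULL );
        }
    return NULL;                         /* not reached */
}

int main( void )
{
    sigset_t all;
    thread_t tid;

    /* Mask everything in thread 0; threads created afterward inherit the
       mask, so only the dedicated thread ever fields a signal. */
    sigfillset( &all );
    thr_sigsetmask( SIG_BLOCK, &all, NULL );

    /* THR_DAEMON keeps this thread from holding the process open
       (see "Thread Termination," below). */
    thr_create( NULL, 0, signal_thread, NULL, THR_DAEMON, &tid );

    /* ... the rest of the application ... */
    return 0;
}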

Shared Data

To share data among HWPs, you have to use the System V shared-memory IPC. Threads, on the other hand, automatically share all global and static data. You can see this in Listing One, where the variable ulIterations is a static in the thread-start function. Each thread increments ulIterations each time through the loop, and you get output like that in Figures 1 and 2.

If you made ulIterations an automatic variable, it would go on the stack, which is separate for each thread; each thread would then get its own private copy, giving you output such as this:

Thread1 Iteration 1
Thread2 Iteration 1
Thread1 Iteration 2
Thread1 Iteration 3
Thread2 Iteration 2
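
A minimal sketch of the two cases (the names here are illustrative, not taken from Listing One): ulShared, a static, is one variable shared by every thread started on thread_start(), while ulPrivate, an automatic, lives on each thread's own stack.

#include <stdio.h>
#include <thread.h>
#include <unistd.h>

void *thread_start( void *arg )
{
    static unsigned long ulShared = 0;   /* one copy, visible to all threads      */
    unsigned long ulPrivate = 0;         /* one copy per thread, on its own stack */
    int i;

    for( i = 0; i < 3; i++ )
        {
        ulShared++;                      /* counts iterations across every thread */
        ulPrivate++;                     /* counts only this thread's iterations  */
        printf( "P%ld thread %u shared %lu private %lu\n",
                (long)getpid(), (unsigned)thr_self(), ulShared, ulPrivate );
        }
    return NULL;
}

(A real program would protect ulShared with one of the coordination mechanisms described next; the unguarded increment here simply mirrors the iteration counter in Listing One.)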

Interthread Coordination

UW 2.0 supports a number of mechanisms for coordinating the activity of threads within a single HWP: locks, semaphores, and conditions.

Mutual-exclusion locks are used to restrict resource access to a single thread. Lock the resource by calling mutex_lock. Any other thread calling mutex_lock for that mutex blocks until you call mutex_unlock. All the mutex calls take a pointer to a mutex_t structure as their first arg in order to identify the mutex. Under the rules of shared data, this mutex_t struct must be either global or static in order to be available to all threads.
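
Here is a bare-bones sketch of that sequence. I'm assuming the UI-threads initialization call mutex_init() with the USYNC_THREAD flag and the <synch.h> header; the article itself only names the lock and unlock calls, and the variable names are illustrative.

#include <synch.h>
#include <thread.h>

mutex_t ResourceMutex;                   /* global, so every thread can see it */
long    lSharedCounter = 0;              /* the resource being protected       */

void init_locks( void )
{
    mutex_init( &ResourceMutex, USYNC_THREAD, NULL );   /* intra-process mutex */
}

void bump_counter( void )
{
    mutex_lock( &ResourceMutex );        /* any other thread calling mutex_lock */
    lSharedCounter++;                    /* on this mutex now blocks            */
    mutex_unlock( &ResourceMutex );      /* lets one blocked thread proceed     */
}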

Reader-writer locks are a variation of mutex locks. They allow the application to place two different types of lock on the same resource. When performing a nondestructive operation on the resource (a read), the app calls rw_rdlock to put a read lock on it. Any number of threads can put read locks on a resource. If a thread attempts a write lock on the resource, it will block until the readers unlock. When a thread acquires a write lock, all other readers and writers block until the single writer unlocks. In file-system terms, putting a read lock on a resource is the equivalent of doing a chmod 444 on a file (everyone can read, none can write), while putting a write lock on is more like a chmod 600 (one can read/write, no others can read or write).
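
A sketch along the same lines, with the same caveat that rwlock_init(), USYNC_THREAD, and <synch.h> are assumed initialization details not spelled out in the article:

#include <synch.h>
#include <thread.h>

rwlock_t TableLock;                      /* global, shared by all threads */
int      iTable[100];                    /* the protected resource        */

void init_table_lock( void )
{
    rwlock_init( &TableLock, USYNC_THREAD, NULL );
}

int read_entry( int i )                  /* nondestructive: take a read lock */
{
    int val;
    rw_rdlock( &TableLock );             /* shared with any other readers    */
    val = iTable[i];
    rw_unlock( &TableLock );
    return val;
}

void write_entry( int i, int val )       /* destructive: take the write lock    */
{
    rw_wrlock( &TableLock );             /* exclusive; readers and writers wait */
    iTable[i] = val;
    rw_unlock( &TableLock );
}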

Conditions provide a way for threads to wait on specific conditions without having to "acquire" a semaphore or a mutex. The pseudocode in Example 2 demonstrates this. cond_wait blocks until some thread validates the condition (sets bLineIn True) and calls cond_signal or cond_broadcast. We only sit in this loop retesting the condition because the condition could have been invalidated again by another thread that was also blocked on this condition and got scheduled before us.

Thread Termination

Terminating a thread is very similar to terminating a UNIX process. From inside the thread, you call thr_exit (which is called implicitly if the start function returns). From outside the thread, you have to send the thread a SIGTERM signal. To clean up, you can catch the signal, then call thr_exit. Suspended threads do not terminate until they are restarted.
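
As a sketch of the outside-the-thread case: direct the SIGTERM at the target with thr_kill() (its process-level analogue is kill; see Table 1) and install a process-wide handler that simply calls thr_exit(). The helper names are illustrative, and I'm assuming the UI-threads behavior that the handler runs in the thread the signal was directed to.

#include <signal.h>
#include <thread.h>

void term_handler( int sig )
{
    thr_exit( NULL );                    /* ends only the thread that caught SIGTERM */
}

void install_term_handler( void )        /* disposition is process-wide; do this once */
{
    struct sigaction sa;
    sa.sa_handler = term_handler;
    sigemptyset( &sa.sa_mask );
    sa.sa_flags = 0;
    sigaction( SIGTERM, &sa, NULL );
}

void end_thread( thread_t tid )
{
    thr_kill( tid, SIGTERM );            /* thread-level analogue of kill() */
}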

A process terminates when all nondaemon threads have terminated. A call to exit() or a return from main() (which implies exit) forces termination of all threads. A program that lets you interactively create and control a command-line-specifiable number of threads is available electronically. In the program, I precede the call return(0) at the end of main() with a call to thr_exit(). If you run this program and start up the threads, they'll start printing their output. While they're running, hit q to exit the main user-input loop. The spawned threads keep running, but thread 0 exits (the return(0) never gets executed). Run thread9 again with the -d flag so that all spawned threads are daemon threads, and you'll see that the process (and all its daemon threads) terminates as soon as the nondaemon threads (here, just thread 0) have terminated.

If, as Novell suggests, you create a separate signal-handling thread, either make it a daemon thread or make sure you have some way of killing it so that your process doesn't hang waiting for that endless thread to die.

Threads and Libraries

Those of us who suffered through the combination of OS/2 1.0 and Microsoft C 5.1 know all about the misery of non-reentrant libraries in a multithreaded environment--traps, mysterious hangs, crazy values. According to Novell, all the libraries delivered with 2.0 and the new SDK are thread safe. Third-party libraries are another story altogether. As usual, there's only one way to know for sure.

File I/O

Sharing open file descriptors introduces an atomicity problem that is almost certain to blow up any pre-SVR4.2 MP third-party library that does file I/O. Consider two threads, X and Y, which share an open file descriptor. X wants to do a simple seek/read on that file, but lseek() and read() are separate calls, so X could be preempted between the two. During that preemption, Y could also call lseek on that file descriptor, leaving the descriptor's internal file pointer someplace other than where X wanted it. When X regains control, it will read at the offset Y sought to, not the one X wanted. You could get around this by locking the file or surrounding all file operations with a semaphore, but those are pretty big hammers to use on such a small problem.

To deal with this, UW 2.0 introduces pread() and pwrite(), which are atomic combinations of lseek/read and lseek/write. The calls are identical to read and write except that they take an extra argument--the offset from the beginning of the file to seek to. These calls do not change the file descriptor's internal file pointer as lseek would.
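
For instance, a positioned read collapses the two steps into one call (error handling abbreviated; the helper name is illustrative):

#include <stdio.h>
#include <unistd.h>

ssize_t read_record( int fd, off_t offset, char *buf, size_t len )
{
    /* One call replaces the lseek()/read() pair, so another thread's lseek()
       on the same descriptor can't slip in between. The descriptor's own
       file pointer is left untouched. */
    ssize_t n = pread( fd, buf, len, offset );
    if( n < 0 )
        perror( "pread" );
    return n;
}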

Other Considerations

Now that you're free of fork/exec, the temptation is to go out and write a new thread for everything (16 million threads!), but you should check that impulse just a little. There is a kernel-enforced limit on the number of LWPs that one user id can have. This is a kernel tunable called "MAXULWP." It has a range of 1--65000 and defaults to 200, which should suffice for all but the most esoteric programs. Listing One uses a kludgy method for obtaining MAXULWP. According to Novell, there is no supported way for a nonroot user to obtain MAXULWP.

The Bottom Line

UW 2.0 threads are easy to get running, and once you get used to them, they're a much more natural way of viewing problems than the old sequential model. Keeping in mind a few of the concepts and caveats I've discussed here should put you well on your way to writing thoroughly multithreaded programs.

Example 1: Calling a new process.

if(( child_pid = fork()) == 0 )
    {
    // child process: overlay this clone with a new executable
    execlp( "new_program", "new_program", (char *)0 );
    }
else
    {
    // parent process: continue doing parent-process stuff
    }

Example 2: Pseudocode for conditions.

cond_t MyCondition;              // All threads agree that this global condition
                                 // indicates that a line has arrived from the user.
mutex_t MyConditionsMutex;       // All threads agree that this mutex is
                                 // associated with MyCondition.
BOOL bLineIn = FALSE;            // the condition itself: global, shared by all threads

// this is thread0
cond_init( &MyCondition ... );
// spawn thread1
gets();
mutex_lock( &MyConditionsMutex );
bLineIn = TRUE;                  // validate the condition while holding its mutex
cond_signal( &MyCondition );
mutex_unlock( &MyConditionsMutex );

// this is thread1
mutex_lock( &MyConditionsMutex );
while( bLineIn == FALSE ) {      // retest: the condition may already be true, or may
                                 // have been invalidated by another waiter
    iRet = cond_wait( &MyCondition, &MyConditionsMutex );
}
mutex_unlock( &MyConditionsMutex );

Figure 1: Listing One output at concurrency level 1.

P1688 LWP2 - Thread 2 iteration 0
P1688 LWP2 - Thread 3 iteration 1
P1688 LWP2 - Thread 4 iteration 2
P1688 LWP4 - Thread 5 iteration 3
P1688 LWP4 - Thread 6 iteration 4
P1688 LWP4 - Thread 7 iteration 5
P1688 LWP2 - Thread 2 iteration 6
P1688 LWP2 - Thread 3 iteration 7
P1688 LWP2 - Thread 4 iteration 8
P1688 LWP2 - Thread 5 iteration 9
P1688 LWP4 - Thread 6 iteration 10
P1688 LWP4 - Thread 7 iteration 11
 ....

Figure 2: Listing One output at concurrency level 2.

P1688 LWP2 - Thread 2 iteration 0
P1688 LWP2 - Thread 3 iteration 1
P1688 LWP2 - Thread 4 iteration 2
P1688 LWP4 - Thread 5 iteration 3
P1688 LWP4 - Thread 6 iteration 4
P1688 LWP4 - Thread 7 iteration 5
P1688 LWP5 - Thread 2 iteration 6
P1688 LWP2 - Thread 3 iteration 7
P1688 LWP2 - Thread 4 iteration 8
P1688 LWP5 - Thread 5 iteration 9
P1688 LWP5 - Thread 6 iteration 10
P1688 LWP4 - Thread 7 iteration 11
 ....

Table 1: Thread-specific calls and their process-specific analogues.

Thread-Specific    Process-Specific   
Call               Analogue            

thr_create         fork/exec
thr_exit           exit
thr_join           wait
thr_kill           kill
thr_setprio        nice
thr_sigsetmask     sigsetmask (BSD)
pread              lseek/read
pwrite             lseek/write
thr_self           getpid

Table 2: Code granularity.

Code Granularity Level   Code Item                Comments

Fine                     Loop                     May be threaded by a
                                                  parallelizing compiler
Medium                   Standard one-page        Thread
                         function
Coarse                   Background serial I/O    Thread
                         communications handler
Super-coarse/gross       Program                  Separate heavyweight
                                                  process

Listing One


// A program to create and control a command line specifiable
// number of threads interactively.
//      command line arguments:
//         -b                 Create BOUND threads, defaults to multiplexed
//         -nthreads <number> Create number threads
// see code for explanation of interactive commands.

#include "defines.h"
#include <sys/types.h>
#include <ctype.h>
#include <stdio.h>
#include <unistd.h>
#include <mt.h>
#include <sys/signal.h>
#include <thread.h>
#include <stdlib.h>
#include <sys/lwp.h>
#include "listing1.hpp"

#define MAX_THREADOBJECTS   65000
#define IDTUNE_CMD "grep MAXULWP /etc/conf/cf.d/mtune | awk '{print $2}'"
#define BUFSIZE 80

bool bBound;    // are we using bound or multiplexed threads??
pid_t getpid(), child1_pid, child2_pid;

int GetMAXULWP();
int flagset( int argc, char *argv[], char *flag );

// Main - create the number of threads specified on the command line, then sit
// in a loop accepting and executing interactive commands from the user.
int main( int argc, char *argv[] )
{
int i, k, iNumThreads;
Thread *pt[MAX_THREADOBJECTS];
int iMaxThreads;
int maxulwp;
int iNumRequestedThreads;
int thread_index;
int thread_id;
char kar;
char buffer[80];
int iConcurrencyLevel = 1;
int iRet;

maxulwp = GetMAXULWP();
if(( k = flagset( argc, argv, "-nthreads" )) > 0 )
    iNumRequestedThreads = atoi( argv[k+1] );
else
    iNumRequestedThreads = MAX_THREADOBJECTS;
if( flagset( argc, argv, "-b" ) > 0 )
    {
    bBound = TRUE;
    iMaxThreads = maxulwp + 1;
    }
else
    {
    bBound = FALSE;
    iMaxThreads = (maxulwp*4) + 1;
    if(( iRet = thr_setconcurrency( iConcurrencyLevel )) != 0 )
         printf( "Error: thr_setconcurrency(%d) = %d\n", iConcurrencyLevel, iRet );
    }
printf( "P%d LWP%d - Creating %d %s threads\n", getpid(), _lwp_self(), iNumRequestedThreads, 
         bBound?"bound":"multiplexed" );
for( i = 0, iNumThreads = 0; i < MAX_THREADOBJECTS && i < iNumRequestedThreads ; i++ )
    {
    if( bBound )
        pt[i] = new BoundThread();
    else
        pt[i] = new MultiplexedThread();
    if( !pt[i] )
        break;
    if( pt[i]->iCreateError != 0 )
        {
        printf( "P%d - Thread create error %d\n", getpid(), pt[i]->iCreateError );
        delete pt[i];
        break;
        }
    iNumThreads++;
    if( iNumThreads == iMaxThreads)
        printf( "P%d -\tNext thread will exceed MAXULWP (%d)\n", getpid(), maxulwp );
    }
printf( "Following thread commands are available:\n" );
printf( "\ti - shows the status of all the threads\n" );
printf( "\ta - increments concurrency level\n" );
printf( "\tc - continues all the threads\n" );
printf( "\tc <thread#> - continues the specified thread\n" );
printf( "\ts - suspends all the threads\n" );
printf( "\ts <thread#> - suspends the specified thread\n" );
printf( "\tk <thread#> - sends SIGTERM to the specified thread\n" );
printf( "\tv - turns iteration printing on/off\n" );
printf( "\tq - ends the program\n" );

sigignore( SIGTERM );
bool bKeepRunning = TRUE;
while( bKeepRunning )
    {   
    thread_id = -1;
    thread_index = -1;
    gets( buffer );
    kar = toupper( buffer[0] );
    for( i = 1; buffer[i] != '\0'; i++ )
        {
            if( !isspace( buffer[i] ))
            {
            thread_id = atoi( &buffer[i] );
            for( i = 0; i < iNumThreads; i++ )
                {
                if( thread_id == pt[i]->tid )
                    {
                    thread_index = i;
                    break;
                    }
                }
            break;
            }   
        }
    switch( kar )
        {
        case 'A':
            iConcurrencyLevel++;
            // iConcurr... is 1 based, while iNumThreads is 0 based
            if( iConcurrencyLevel > (iNumThreads+1))
                printf( "Error: would have more LWPs than threads!\n" );
            else
                {
                if(( iRet = thr_setconcurrency( iConcurrencyLevel )) != 0 )
                    printf( "Error: thr_setconcurrency(%d) = %d\n", iConcurrencyLevel, iRet );
                }
            break;
        case 'I':
            for( i = 0; i < iNumThreads; i++ )
                printf( "\tThread id %d - %s\n", 
                    pt[i]->tid, pt[i]->Ended?"GONE":"STILL RUNNING" );
            break;
        case 'C':
            if( thread_id >= 0 )
                pt[thread_index]->Continue();
            else
                for( i = 0; i < iNumThreads; i++ )
                    pt[i]->Continue();
            break;
        case 'S':
            if( thread_id >= 0 )
                pt[thread_index]->Suspend();
            else
                for( i = 0; i < iNumThreads; i++ )
                    pt[i]->Suspend();
            break;
        case 'K':
            if( thread_id >= 0 )
                pt[thread_index]->Kill( SIGTERM );
            else
                for( i = 0; i < iNumThreads; i++ )
                    pt[i]->Kill(SIGTERM);
            break;
        case 'V':
            for( i = 0; i < iNumThreads; i++ )
                pt[i]->bVerbose = pt[i]->bVerbose^0x1;
            break;
        case 'Q':
            bKeepRunning = FALSE;
            break;
        default:
            printf( "Unknown command (%c) (%s)\n", kar, buffer );
            break;
        }
    }
// We really don't have to call End, because the return kills the 
// threads anyway, but cleanliness counts.
for( i = 0; i < iNumThreads; i++ )
    pt[i]->End();
printf( "P%d - Ending thread 0\n", getpid() );
return( 0 );
}
// flagset - tells whether a command-line flag was set. returns an index into
// argv where flag was detected. Use return val+1 to get arg following a flag
int flagset( int argc, char *argv[], char *flag )
{
for( int i = 1; i < argc; i++ )
    {
    if( strcmp( argv[i], flag ) == 0 )
        return( i );
    }
return( -1 );
}
// This function greps MAXULWP out of mtune so that we can tell 
// when we're about to exceed the maximum allowable number of LWPs per user id.
int GetMAXULWP()
{
int maxulwp = MAX_THREADOBJECTS, i;   // if mtune can't be read, effectively skip the check
FILE *fp;
char buf[BUFSIZE];

if(( fp = popen( IDTUNE_CMD, "r" )) == NULL )
    printf( "P%d - Couldn't exec %s - skipping MAXULWP check\n", getpid(), IDTUNE_CMD );
else
    {
    i = 0;
    while( fgets( buf, BUFSIZE, fp ) != NULL )
        {
        maxulwp = atoi( buf );
        printf( "P%d - Got MAXULWP value of %d\n", getpid(), maxulwp );
        i++;
        }
    if( i > 1 )
        printf( "P%d - ambiguous value for MAXULWP, skipping check\n", getpid() );
    pclose( fp );
    }
return( maxulwp );
}



Copyright © 1995, Dr. Dobb's Journal

