Richard Gerber has worked on numerous multimedia projects, 3D libraries, and computer games for Intel. Andrew Binstock is the principal analyst at Pacific Data Works and author of "Practical Algorithms for Programmers". They are the authors of Programming with Hyper-Threading Technology.
Parallel programming makes use of threads to enable two or more operations to proceed in parallel. The entire concept of parallel programming centers on the design, development, and deployment of threads within an application and the coordination between threads and their respective operations. This article examines how to break up traditional programming tasks into chunks that are suitable for threading. It then demonstrates how to create threads, how these threads are typically run on multiprocessing systems, and how threads are run using HT Technology.
Designing for Threads
Developers unacquainted with parallel programming are generally comfortable with traditional programming models such as single-threaded procedural programming and object-oriented programming (OOP). In both cases, a program begins at a defined point -- such as main() -- and works through a series of tasks in succession. If the program relies on user interaction, the main processing instrument is a loop in which user events are handled. From each allowed event -- a button click, for example -- an established sequence of actions is performed that ultimately ends with a wait for the next user action.
When designing such programs, developers enjoy a relatively simple programming world because only one thing is happening at any given moment. If program tasks must be scheduled in a specific way, it is because the developer chooses a certain order for the activities, which themselves are designed to flow naturally into one another. At any point in the process, one step generally flows into the next, leading to a predictable conclusion determined either by predetermined parameters -- the job completes -- or by user actions.
Moving from this model to parallel programming requires designers to rethink the idea of process flow. Now, they must try to identify which activities can be executed in parallel. To do so, they must begin to see their programs as a series of discrete tasks with specific dependencies between them. The process of breaking programs down into these individual tasks is known as decomposition. Decomposition comes in three flavors: functional, data, and a variant of functional decomposition, called producer/consumer. As you shall see shortly, these different forms of decomposition mirror different types of programming activities.
Decomposing a program by the functions it performs is called "functional decomposition", also called "task-level parallelism". It is one of the most common ways to achieve parallel execution. Using this approach, individual tasks are catalogued. If two of them can run concurrently, they are scheduled to do so by the developer. Running tasks in parallel this way usually requires slight modifications to the individual functions to avoid conflicts and reflect that these tasks are no longer sequential.
To use a gardening analogy, functional decomposition would suggest that gardeners be assigned tasks based on the nature of the activity: if two gardeners arrived at a client's home, one might mow the lawn while the other weeded. Mowing and weeding are separate functions broken out as such. To accomplish them, the gardeners would need some coordination between them, so that the weeder is not working in the middle of a lawn that is about to be mowed.
In programming terms, a good example of functional decomposition is word processing software, such as Microsoft Word. When a very long document is opened, the user can begin entering text right away. While the user is doing this, document pagination occurs in the background, as can readily be seen by the quickly increasing page count that appears in the status bar. Text entry and pagination are two separate tasks that, in the case of Word, Microsoft has broken out by function and run in parallel. Had it not done this, the user would be obliged to wait for the entire document to be paginated before being able to enter any text. Many of you will recall that this wait was common on early PC word processors.
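In code, functional decomposition of this kind amounts to giving each task its own thread. The following Python sketch is purely illustrative -- the two task functions are hypothetical stand-ins for text entry and background pagination, not Word's actual implementation:

```python
import threading

def count_pages(lines, results):
    # Background task: "paginate" by counting 50-line pages.
    results["pages"] = max(1, (len(lines) + 49) // 50)

def accept_input(new_lines, document):
    # Foreground task: append newly "typed" lines to the document.
    document.extend(new_lines)

document = ["line"] * 120        # an already-open document
results = {}

# Functional decomposition: each independent task runs on its own thread.
paginator = threading.Thread(target=count_pages,
                             args=(list(document), results))
editor = threading.Thread(target=accept_input,
                          args=(["typed line"], document))
paginator.start()
editor.start()
paginator.join()
editor.join()

print(results["pages"], len(document))
```

Note that the paginator works from a snapshot of the document taken before the threads start; that small modification is exactly the sort of change mentioned earlier, made so the two no-longer-sequential tasks do not conflict over shared data.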
Producer/consumer (P/C) is so common a form of functional decomposition that it is best examined by itself. Here, the output of one task, the producer, becomes the input to another, the consumer. The important aspects of P/C are that both tasks are performed by different threads and the second one -- the consumer -- cannot start until the producer finishes some portion of its work.
Using the gardening example, one gardener prepares the tools -- puts gas in the mower, cleans the shears, and other similar tasks -- for both gardeners to use. No gardening can occur until this step is mostly finished, at which point the true gardening work can begin. The delay caused by the first task creates a pause for the second task, after which both tasks can continue in parallel. In computer terms, this particular model occurs frequently.
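The "prepare first, then work in parallel" pattern maps naturally onto a signaling primitive. In this illustrative Python sketch (the names tools_ready, prepare, and work are hypothetical), the consumer thread blocks until the producer signals that the setup step has finished:

```python
import threading

tools_ready = threading.Event()   # signals that preparation is done
shared = {"prepared": False}

def prepare():
    # Producer: set up shared state, then signal that work may begin.
    shared["prepared"] = True
    tools_ready.set()

def work(log):
    # Consumer: block until the preparation step has finished.
    tools_ready.wait()
    log.append(shared["prepared"])

log = []
consumer = threading.Thread(target=work, args=(log,))
consumer.start()
prepare()            # runs on the main thread here, for brevity
consumer.join()
print(log)
```

The pause the article describes is visible in tools_ready.wait(): the consumer is created early but does no work until the producer's initial step completes.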
In common programming tasks, P/C occurs in several typical scenarios. For example, programs that rely on reading a file are inherently in a P/C scenario: the results of the file I/O become the input to the next step, which might well be threaded. However, that step cannot begin until the reading is either complete or has progressed far enough for other processing to kick off. Another common programming example of P/C is parsing: an input file must be parsed, or analyzed syntactically, before the back-end activities, such as code generation in a compiler, can begin.
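A minimal P/C pipeline can be sketched with a thread-safe queue. In this hypothetical Python example, the producer stands in for file I/O, the consumer begins processing lines as soon as the first one arrives, and a sentinel object marks the end of the input:

```python
import threading
import queue

lines_q = queue.Queue()
DONE = object()    # sentinel marking the end of the producer's output

def producer(text):
    # Stands in for file reading: each "read" line becomes consumer input.
    for line in text.splitlines():
        lines_q.put(line)
    lines_q.put(DONE)

def consumer(word_counts):
    # The consumer starts as soon as the first line is available;
    # it does not wait for the whole "file" to be read.
    while True:
        line = lines_q.get()
        if line is DONE:
            break
        word_counts.append(len(line.split()))

counts = []
t_prod = threading.Thread(target=producer, args=("one two\nthree four five\n",))
t_cons = threading.Thread(target=consumer, args=(counts,))
t_cons.start()
t_prod.start()
t_prod.join()
t_cons.join()
print(counts)
```

The queue is what makes the hand-off clean in the sense described below: the consumer needs to know nothing about the producer except the format of the items it receives.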
The P/C model has several interesting dimensions absent in the other decompositions:
- The dependence created between consumer and producer can cause formidable delays if this model is not implemented correctly. A performance-sensitive design seeks to understand the exact nature of the dependence and diminish the delay it imposes. It also aims to avoid situations in which consumer threads are idling while waiting for producer threads.
- In the ideal scenario, the hand-off between producer and consumer is completely clean, as in the example of the file parser. The output is context-independent and the consumer has no need to know anything about the producer. Many times, however, the producer and consumer components do not enjoy such a clean division of labor, and scheduling their interaction requires careful planning.
- If the producer completes its work while the consumer is still finishing up, one thread remains idle while the other is busy working away. This imbalance violates an important objective of parallel processing, which is to balance loads so that all available threads are kept busy. Because of the logical relationship between these threads, it can be very difficult to keep threads equally occupied in a P/C model.
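One common way to limit idling and balance the load is a bounded queue feeding several consumer threads: a fast producer blocks when the queue fills, and idle consumers block on the queue rather than spinning. This Python sketch is illustrative only -- the queue size, item count, and worker count are arbitrary:

```python
import threading
import queue

work_q = queue.Queue(maxsize=4)   # bounded: a fast producer must wait
DONE = object()                   # sentinel telling a consumer to stop
NUM_CONSUMERS = 2

def producer(n):
    for i in range(n):
        work_q.put(i)             # blocks when the queue is full
    for _ in range(NUM_CONSUMERS):
        work_q.put(DONE)          # one sentinel per consumer

def consumer(results, lock):
    while True:
        item = work_q.get()       # blocks instead of spinning while idle
        if item is DONE:
            break
        with lock:
            results.append(item * item)

results, lock = [], threading.Lock()
workers = [threading.Thread(target=consumer, args=(results, lock))
           for _ in range(NUM_CONSUMERS)]
for t in workers:
    t.start()
producer(8)                       # producer runs on the main thread here
for t in workers:
    t.join()
print(sorted(results))
```

The bounded queue does not eliminate the dependence between producer and consumer, but it keeps the gap between them small, so no thread sits idle for long stretches.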