Multithreading, Fthreads, & Visual Fortran

By Dan Nagle, July 01, 2001

Dan presents a Fortran module that helps you write multithreaded programs for Windows-based applications.

Dan is a computer consultant. He can be contacted at http://users.erols.com/dnagle/.

Fortran programs often require great execution speed to process large numbers of floating-point operations in an acceptable time. Beyond simply using a high-quality compiler and a fast processor, one way of obtaining high processing rates is to run programs on multiprocessor systems. If you are using Compaq Visual Fortran (CVF; http://www.compaq.com/fortran/) under Microsoft Windows NT/2000 on multiprocessor hardware, you can take advantage of the operating system's multithreading capability, because Windows NT/2000 lets you (via system calls) create, synchronize, and terminate multiple threads per process.

Execution time aside, you might want to use a multithreaded solution when a program must support different tasks that would be difficult or inefficient to code together. For example, a program might need to update a graphic output screen while also computing the data being displayed. Mixing calls to graphic routines throughout the computation can result in a difficult-to-follow path of execution. Separating the tasks, with the computation in one thread and the graphic output in another, makes for more readable and reliable code. Also, placing input/output statements in their own thread lets you overlap input/output processing and computation, which can greatly speed up some calculations.

In this article, I'll describe Windows-based multithreading and present an fthreads module (available electronically from DDJ, see "Resource Center," page 5, and at http://users.erols.com/dnagle/fthreads.html) that can help you write multithreaded programs. Additionally, I'll present a program (also available electronically) that illustrates multithreading using fthreads.

Some Background

A Windows process consists of some memory, a state of execution, and at least one sequence of instructions to be executed. The state of execution may be something like "executing," "suspended awaiting completion of a system call," or "terminating due to an arithmetic exception." The state of execution together with the sequence of instructions is called a "thread." Every process has at least one thread; a process with several threads is called a "multithreaded process."

Conceptually, it's easy to program a multithreaded process. The first thread to run in a process is called the "primary thread," which starts additional threads called "worker threads." These threads must then be synchronized, which is done via system calls that manipulate synchronization objects. The Windows API provides system calls to accomplish all of these tasks.
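
The primary/worker pattern itself is independent of Windows and Fortran. As a rough illustration only, here is a minimal sketch using Python's standard threading module rather than the Windows API or fthreads; the function name run_workers and the per-worker result slots are assumptions made for this example.

```python
import threading

def run_workers(n_workers):
    """Primary thread: start n_workers worker threads, then wait for all of them."""
    results = [None] * n_workers          # one slot per worker, so no locking is needed

    def worker(index):
        # Each worker writes only to its own slot.
        results[index] = threading.current_thread().name

    workers = [threading.Thread(target=worker, args=(i,), name=f"worker-{i}")
               for i in range(n_workers)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()                          # the primary thread blocks until t has finished
    return results

if __name__ == "__main__":
    print(run_workers(3))
```

Here the only synchronization is the join at the end; workers that shared data while running would need the synchronization objects described later.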

When it comes to actually making the system calls, nothing is quite as simple as it sounds, especially for Fortran programmers. In general, Windows is meant to be programmed from Visual C++, and the binding to the system calls is written for that language. Further, the Windows API requires that programs maintain a handle for each thread and synchronization object. Each thread and synchronization object may optionally have a name, and each has a system ID (an integer uniquely identifying each thread or synchronization object known to Windows). A program using multiple threads must keep all of this bookkeeping straight. Confusing a thread handle with the handle of a synchronization object can result in a bad system call, which terminates the program.

These complications are why you might want to consider using fthreads in place of the raw Windows system calls when writing multithreaded programs for Windows. The fthreads module, which is available under the GNU GPL, provides a binding to the Windows API thread system calls. The fthreads procedures manage the handles, system IDs, object names, and the Windows API calling sequence. This means you can concentrate on the Fortran.

The fthreads module defines a separate type for each category of system call, so the compiler can check that each routine is being called with arguments of the correct type. Optional arguments are used so you won't have to remember to add actual arguments for unused capabilities. And fthreads adds a trace capability to help you debug a multithreaded program or follow the path taken by the program's execution. Figure 1 presents the general flow of an fthreads program.

The Trace Facility

One of the fundamental experiences of programming is running a new program — that you know is right — for the first time and getting a core dump as the result. Multithreaded programs multiply the opportunities for such disappointments. You not only have the correctness of each thread to consider, but also the interaction of the threads with each other.

The fthreads module contains a trace facility to help you follow the course of an fthreads program. The facility consists of a trace type and procedures for manipulating variables of the trace type. You initialize a trace variable, then pass it to most of the fthreads procedures. Each procedure leaves a message in the trace variable, including a timestamp, indicating the action performed. You can also place your own messages in the trace variable. The trace variable contains a circular buffer holding a number of messages; the number is determined when the variable is initialized.
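
The idea of a circular buffer of timestamped messages can be sketched in a few lines. The following Python class is a hypothetical stand-in, not the fthreads trace type: the names Trace, record, and messages are invented for this example, and the layout of a real trace entry is not given in the article.

```python
import time
from collections import deque

class Trace:
    """Hypothetical trace variable: a fixed-size circular buffer of timestamped messages."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently discards the oldest entry once it is full.
        self._messages = deque(maxlen=capacity)

    def record(self, text):
        """Store a message together with the time it was recorded."""
        self._messages.append((time.time(), text))

    def messages(self):
        """Return the retained message texts, oldest first."""
        return [text for _, text in self._messages]

trace = Trace(capacity=3)
for step in ["init", "create thread 1", "create thread 2", "wait thread 1"]:
    trace.record(step)
print(trace.messages())   # "init" was the oldest entry and has been overwritten
```

Because the capacity is fixed when the buffer is created, a long run keeps only the most recent messages, which is usually what matters when working out why a program stopped.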

Starting and Ending fthreads

The fthreads module uses allocatable data structures to manage the program's thread data. This lets you select a large or small number of threads and synchronization objects without wasting memory. You call fthreads_init() to set the number of threads, teams, and synchronization objects the program uses. Procedure fthreads_init() must be called before any threads or synchronization objects are created. The data structures are deallocated by calling fthreads_end(). After the call to fthreads_end(), the program may not call any of the statistics-gathering procedures because fthreads_end() has deallocated the data structures. After fthreads_end() has been called, another call to fthreads_init() is necessary before further use of fthreads (excepting the trace facility and some basic inquiries) may be made.

Creating and Waiting for Threads

A program starts additional threads by calling thread_create(). One required argument to thread_create() is the task, which is the procedure to be executed as the newly created thread. According to the Win32 API, this procedure must be a function of a single argument that returns a default integer. The thread_create() procedure returns a thread variable to the calling procedure. The thread variable is also passed to the task procedure as its single argument. This variable identifies the thread and is passed to any procedure that refers to this thread. Its value may be determined by calling thread_id() (perhaps to index into an array).

One thread waits for another thread to complete by calling the thread_wait() procedure. The thread_wait() procedure will not return until the thread being awaited has returned. When the thread completes, it returns an integer value to the thread_wait() procedure. This value may signal anything determined by the program. If teams are being used, a thread may wait for a team to complete by calling thread_waitall().
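
The create-then-wait pattern, with an integer handed back to the waiter, looks roughly like this in Python. This is an analogy only: WorkerThread, wait_for, and task are names made up for this sketch, and nothing here reflects the actual argument lists of thread_create() or thread_wait().

```python
import threading

class WorkerThread(threading.Thread):
    """Thread that remembers the integer returned by its task function."""

    def __init__(self, task, argument):
        super().__init__()
        self._task = task
        self._argument = argument
        self.exit_code = None             # filled in when the task returns

    def run(self):
        self.exit_code = self._task(self._argument)

def wait_for(thread):
    """Block until the thread finishes, then hand back its integer result."""
    thread.join()
    return thread.exit_code

def task(n):
    # The meaning of the return value is up to the program; here 0 means success.
    return 0 if sum(range(n)) == n * (n - 1) // 2 else 1

worker = WorkerThread(task, 1000)
worker.start()
status = wait_for(worker)
print(status)
```

The waiting thread learns nothing about the worker except the single integer, so the program has to decide in advance what each value means.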

Synchronization Objects

There are three types of synchronization objects: barriers, events, and mutexes. Each object is represented by a variable of its own type that must be created before it is used. Each object's variable may be deleted when no longer needed. The synchronization procedures have explicit interfaces, so some incorrect use may be detected by the compiler. An important conceptual point is that threads are synchronized only where the program explicitly calls the synchronization procedures; everywhere else, the operating system schedules the threads to run independently. If you catch yourself thinking, "It should be safe here because this thread has only gone a little way, and that thread has to go a long way before reaching the critical point," you have laid a trap for yourself. Worse, the program may complete successfully many times before the lack of synchronization silently gives wrong results.

A barrier marks a point in the program where each thread waits until all threads have arrived. A barrier may be thought of as two counters called height and current. When the barrier is created, height is permanently set to the total number of worker threads (or the number of threads on the team, if the barrier is associated with a team), and current is set to zero. Each time a thread calls barrier_sync(), current is incremented by one and then tested against height. When current equals height, current is reset to zero, and all threads proceed through the barrier. For example, suppose each thread is operating on a portion of a large array in an iterative loop. The iteration is supposed to exit when a convergence criterion is met. You place a barrier call at the point in each iteration after all the computation is completed, but before the test for the convergence criterion is made. Thus, all threads wait until the entire array has been processed for each iteration before contributing to the convergence test. This way, each thread is testing the array on the same iteration, and the test is conceptually the same as in the single-threaded case.

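
The two-counter description translates almost directly into code. The Python sketch below is illustrative and is not how fthreads implements barrier_sync(); the class name CountingBarrier and the extra generation counter (added so that a fast thread re-entering the barrier cannot be confused with a slow thread still leaving it) are assumptions of this example.

```python
import threading

class CountingBarrier:
    """Barrier built from the two counters described above: height and current."""

    def __init__(self, height):
        self.height = height          # fixed number of threads that must arrive
        self.current = 0              # threads that have arrived in this round
        self._generation = 0          # assumption: lets the barrier be reused safely
        self._cond = threading.Condition()

    def sync(self):
        with self._cond:
            generation = self._generation
            self.current += 1
            if self.current == self.height:
                # Last arrival: reset the counter and release everyone.
                self.current = 0
                self._generation += 1
                self._cond.notify_all()
            else:
                # Wait until the last thread of this round has arrived.
                while generation == self._generation:
                    self._cond.wait()

def run_iterations(n_threads, n_iterations):
    """Each thread logs the iteration it is on, then waits at the barrier."""
    barrier = CountingBarrier(n_threads)
    log = []                          # list.append is thread-safe in CPython

    def worker():
        for it in range(n_iterations):
            log.append(it)            # stands in for the compute phase of iteration it
            barrier.sync()            # nobody starts it + 1 until all have finished it

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the returned log, every entry for one iteration precedes every entry for the next, which is exactly the guarantee the convergence test relies on.
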
Events are simple variables with two states, much like a logical variable has two states (.true. and .false.). In the case of an event, these two states are called "clear" and "posted." When an event is created, it is clear. A thread calling an event wait procedure waits while the event is clear, until the event is posted by another thread. Events are often used in pairs. For example, suppose a program has one thread that computes and another that writes buffers to an unformatted file. One event might signal to the writer thread "ready to write the next buffer to the file," and another event might signal back "buffer written successfully, ready to reuse the buffer."

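
A paired-event handoff of a single buffer can be sketched with Python's threading.Event, whose set() and clear() correspond loosely to posting and clearing. The names pipeline, ready_to_write, and buffer_free are invented for this example, a plain list stands in for the unformatted file, and the second event is posted once at the start so that the first buffer can be filled.

```python
import threading

def pipeline(n_buffers):
    """Hand n_buffers values from a compute thread to a writer thread, one at a time."""
    buffer = [None]                       # the single shared buffer
    ready_to_write = threading.Event()    # posted by the compute thread
    buffer_free = threading.Event()       # posted by the writer thread
    buffer_free.set()                     # assumption: the buffer starts out reusable
    written = []                          # stands in for the unformatted file

    def compute():
        for i in range(n_buffers):
            buffer_free.wait()            # wait until the writer is done with the buffer
            buffer_free.clear()
            buffer[0] = i * i             # "compute" the next buffer
            ready_to_write.set()          # post: ready to write the next buffer

    def writer():
        for _ in range(n_buffers):
            ready_to_write.wait()         # wait until a buffer has been filled
            ready_to_write.clear()
            written.append(buffer[0])     # "write" the buffer
            buffer_free.set()             # post: buffer written, ready to reuse

    threads = [threading.Thread(target=compute), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written
```

Each thread clears the event it has just waited on before posting the other one, so neither side can run ahead by more than one buffer.
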
Mutex stands for "mutual exclusion." Mutexes, therefore, are used to allow only one thread at a time to execute a certain section of code. Like an event, a mutex has two states, called "locked" and "unlocked." When a mutex is created, it is unlocked. A thread calling the mutex_lock() procedure waits while the mutex is locked; once the mutex is unlocked, the caller locks it and proceeds. For example, suppose a chemistry program has several threads, each computing a different contribution to the total energy of a molecule (rotational energy, vibrational energy, and so on). Each thread must first fetch the variable containing the total energy to a register, then add its contribution, then store the total energy variable back to memory so it may be fetched by another thread running on another processor. Each thread must surround the statement containing the fetch, addition, and store sequence with mutex lock and mutex unlock calls to ensure that each thread accurately updates the total energy variable.
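
The total-energy example can be mimicked in Python with threading.Lock. This is a sketch under the assumption that each thread adds its contribution many times, which makes a lost update likely if the lock is left out; the function name total_energy and its parameters are hypothetical.

```python
import threading

def total_energy(contributions, repeats=10000):
    """Sum each contribution `repeats` times, one thread per contribution."""
    total = [0.0]                         # shared accumulator
    lock = threading.Lock()               # the mutex guarding the accumulator

    def worker(amount):
        for _ in range(repeats):
            with lock:                    # lock, fetch-add-store, unlock
                total[0] += amount

    threads = [threading.Thread(target=worker, args=(c,)) for c in contributions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]

if __name__ == "__main__":
    print(total_energy([1.0, 2.0, 3.0]))
```

The with statement releases the lock even if the protected statement raises an exception, which removes one common source of deadlock.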

Critical Operations

Critical operations are a set of procedures for computing quick updates to scalar variables where efficiency is a concern. They are all of the form result = result OP operand, where OP may be +, *, /, and, or, eor, max, min, max-and-copy, or min-and-copy. The max-and-copy and min-and-copy operations are useful when, for example, the array index of the largest or smallest element of an array is needed. There are especially fast operations to increment or decrement default integers by one.
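
One plausible reading of max-and-copy is "if the operand exceeds the current result, replace the result and copy an accompanying value, such as an array index, along with it." The Python sketch below follows that reading, which is an assumption, and uses an ordinary lock rather than whatever fast mechanism fthreads uses; MaxAndCopy and argmax_parallel are hypothetical names.

```python
import threading

class MaxAndCopy:
    """Running maximum plus a value copied alongside it (here, an array index)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = float("-inf")
        self.index = None

    def update(self, candidate, candidate_index):
        # The comparison and both stores must happen as one indivisible step.
        with self._lock:
            if candidate > self.value:
                self.value = candidate
                self.index = candidate_index

def argmax_parallel(data, n_threads):
    """Find the largest element of data and its index using n_threads workers."""
    best = MaxAndCopy()

    def worker(rank):
        # Each worker examines every n_threads-th element, starting at its rank.
        for i in range(rank, len(data), n_threads):
            best.update(data[i], i)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return best.value, best.index
```

If the largest value occurs more than once, which index is reported depends on thread timing, so a program that needs a particular tie-breaking rule has to add one.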

An Example fthreads Program

The example fthreads program I present here (available electronically) has two main parts. The first part consists of two procedures. One, called producer(), fills a buffer with random numbers. This is to mimic a procedure producing results, for example, by inverting a matrix or other computation. The other, called consumer(), uses the buffer (it simply computes the average of the numbers in the buffer). This is to mimic a procedure using the results of an earlier step in a calculation. For example, if the first procedure is inverting a matrix, the second may be using the inverse matrix for further calculation. These two procedures illustrate a concept called a "software pipeline," in which data flows through steps in a calculation. The transfer of data between each step in the pipeline requires two event variables to signal the state of the buffer. The second part of the example is a Poisson solver in which the worker threads cooperate by dividing the array, with each thread processing a portion.
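
The article's Poisson solver is not reproduced here, so the following Python sketch only illustrates the general idea of workers sharing one array. It assumes a one-dimensional problem solved by Jacobi iteration, with Python's threading.Barrier standing in for an fthreads barrier; the function name solve_poisson_1d and all of its parameters are invented for this example.

```python
import threading

def solve_poisson_1d(f, left, right, n_threads, tol=1e-10, max_iter=100_000):
    """Jacobi iteration for -u'' = f on (0, 1) with u(0) = left and u(1) = right."""
    n = len(f)                                  # number of interior grid points
    h = 1.0 / (n + 1)
    grids = [[left] + [0.0] * n + [right] for _ in range(2)]
    change = [0.0] * n_threads                  # largest update seen by each worker
    sweeps = [0]                                # completed sweeps, recorded by worker 0
    barrier = threading.Barrier(n_threads)

    def worker(rank):
        lo = 1 + rank * n // n_threads          # this worker owns grid points lo..hi-1
        hi = 1 + (rank + 1) * n // n_threads
        for it in range(max_iter):
            src, dst = grids[it % 2], grids[(it + 1) % 2]
            local = 0.0
            for i in range(lo, hi):
                dst[i] = 0.5 * (src[i - 1] + src[i + 1] + h * h * f[i - 1])
                local = max(local, abs(dst[i] - src[i]))
            change[rank] = local
            if rank == 0:
                sweeps[0] = it + 1
            barrier.wait()                      # every portion is updated for this sweep
            converged = max(change) < tol       # same data, so same answer in every worker
            barrier.wait()                      # nobody overwrites change until all have read it
            if converged:
                break

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return grids[sweeps[0] % 2][1:-1]           # interior values from the latest sweep

if __name__ == "__main__":
    print(solve_poisson_1d([0.0] * 9, 0.0, 1.0, n_threads=3))
```

Every worker evaluates the convergence test on the same data between the two barrier calls, so all workers leave the loop on the same sweep, which mirrors the barrier placement described in the Synchronization Objects section.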

DDJ
