The advent of multicore processors is forcing software developers to explore multithreading. A critical decision is whether to "do it yourself" using POSIX threads (Pthreads) or WinAPI threadswhich allow developers to directly manipulate operating-system threadsor to build atop a concurrency platforma layer of software that coordinates, schedules, and manages the multicore resources. Concurrency platforms include any of various thread-pool libraries, such as .NET's ThreadPool class; message-passing libraries, such as MPI; data-parallel programming languages, such as NESL, RapidMind, and variants of Fortran since Fortran 90; task-parallel libraries, such as Intel's Threading Building Blocks (TBB) and Microsoft's Task Parallel Library (TPL); and parallel linguistic extensions, such as OpenMP and Cilk++. I submit that most applications would do better by adopting a concurrency platform than by using DIY multithreading.
Over 100 interviews of prospective customers have led us at Cilk Arts to identify three key evaluation criteria on which customers refuse to compromise. We call them the "multicore-software triad":
- Development time: Any solution must offer programmer productivity. Dramatic restructuring of massive codebases to enable parallelism is out of the question. In addition, organizations must be able to train or acquire developers with the skill sets necessary to multithread.
- Application performance: The ostensible reason for multicore-enabling an application in the first place. In particular, good response time is a priority for human users. Moreover, as the number of processing cores per chip increases in accordance with Moore's Law, companies cannot afford to tailor their applications to the specific number of cores provided by each processor generation. Their programs must scale.
- Software reliability. In a word, races. Software with races can behave nondeterministically depending on how concurrent tasks are scheduled. These pernicious bugs are extremely hard to find by testing.
Although concurrency platforms have advantages over DIY multithreading with respect to application performance and software reliability, to a large extent, it boils down to the third leg of the triaddevelopment time.
Let's start with performance. DIY multithreading may seem convenient if you can functionally decompose your application in a natural wayfor example, into a user-interface thread, a compute thread, a render thread, and so on. This approach does not scale, however, because most applications contain only a few functional components suitable for implementation as separate threads. Moreover, if the threads' computational needs vary, you may find that you're significantly underutilizing processor cores. Even if you can break your application into many tasks, implementing each task as a thread may be impractical due to the large overheads for creating, switching, and synchronizing threads. One alternative is to write a scheduler to balance load across the processing cores, but that's tantamount to implementing your own concurrency platforma big hit on development time. In contrast, concurrency platforms allow you to express the parallelism in your application, and they take the responsibility of automatically managing tasks, load balancing, and synchronizing.
DIY multithreaded applications are notoriously unreliable due to race conditions. A race occurs when concurrent threads access a shared-memory location and at least one of the threads stores a value into the location. Depending on how instructions are scheduled, the software may behave differently. DIY multithreading involves writing race-prone protocols for communication and synchronization, and regression testing is generally ineffective at detecting races. The alternative is manual code inspections that can dramatically increase development time. In contrast, concurrency platforms provide structured communication among logically parallel tasks, allowing programmers to avoid direct programming of protocols. Where protocols are necessary, some concurrency platforms offer race-detection tools. Although some such tools exist for DIY multithreading, they tend to offer less code coverage and run far more slowly than tools provided by concurrency platforms, adversely impacting guess what? Developer time.
Parallel programming has earned a reputation for being difficult, and a good deal of that credit owes itself to applications written without a concurrency platform. So, if you don't mind that your multicore software is a mess, if you don't mind giving corner offices to the few engineers who have at least some clue as to what's going on, if you don't mind the occasional segment fault when your software is in the field, then DIY multithreading might be for you. But, if that prospect sounds at least mildly unpleasant, check out one of the concurrency platforms mentioned above.