Toward Fault Tolerant Parallel Programs
The goal of testing software is to make sure that the software does what we want and that we want what it does.
An application is often requested as a list of desired features. This list can be as elaborate as a formal specification running to hundreds of pages or as simple as a verbal request from an employer detailing a dozen or so requirements.
Regardless of how the list of requirements for a piece of software is generated, the testing process must make sure that the software meets those requirements and that the requirements meet the user's expectations. In many cases where parallel computers or multiprocessors are involved, the user's expectations include performance speedup or a certain level of high-performance throughput. The kinds of software errors increase when multithreading or multiprocessing is added in an attempt to meet those expectations. When the software does not perform according to the specification, the software is in error, even if the violation is only that the system performs too slowly. Software that requires parallel programming can be notoriously difficult to test and debug, and testing and debugging rank among the top 10 challenges of parallel programming. Some of the main issues that are unique to testing and debugging multithreaded or multiprocessing programs are:
- Simulating minimum-to-maximum volume loads
- Duplicating the exact flow of control during debugging
- Duplicating race condition errors during debugging
- Duplicating system-wide process and thread contention
- Finding hidden unsafe thread functions
- Testing and debugging non-deterministic algorithms
- Proving there is no possibility of deadlock or data race in the software
- Simulating boundary and average workload mixes
- Checking intermediate results when hundreds or thousands of threads and processes are in execution
- Identifying the right number of threads or processes that will generate acceptable performance
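Several of the issues above (volume loads, thread counts, and the difficulty of reproducing races) can be made concrete with a small stress-test sketch. The code below is not from the book; `run_counter_workload` and `invariant_holds_across_loads` are hypothetical names chosen for illustration, and the thread counts are arbitrary. It runs the same workload under several thread counts and checks one invariant: the final counter equals the number of threads times the increments per thread.

```cpp
#include <atomic>
#include <initializer_list>
#include <thread>
#include <vector>

// Hypothetical workload: each of `num_threads` threads increments a shared
// counter `increments_per_thread` times. Returns the final counter value.
long run_counter_workload(unsigned num_threads, long increments_per_thread) {
    std::atomic<long> counter{0};  // atomic, so concurrent increments do not race
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned i = 0; i < num_threads; ++i) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (long n = 0; n < increments_per_thread; ++n) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();  // all increments are visible once every thread is joined
    }
    return counter.load();
}

// Repeats the workload for a small range of thread counts (assumed values)
// and reports whether the invariant held for every one of them.
bool invariant_holds_across_loads(long increments_per_thread) {
    for (unsigned num_threads : {1u, 2u, 8u, 64u}) {
        const long expected =
            static_cast<long>(num_threads) * increments_per_thread;
        if (run_counter_workload(num_threads, increments_per_thread) != expected) {
            return false;
        }
    }
    return true;
}
```

If the counter were a plain `long`, the program would contain a data race and the check might still pass on most runs, which is exactly why such errors are hard to duplicate. A passing stress test is evidence that the invariant held under the loads tried, not proof that no race exists.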
The PADL (Parallel Application Design Layers) and PBS (Predicate Breakdown Structure) analyses create the primary concurrency infrastructure and implementation models that the testing phases will validate and verify. Declarative and predicate-based approaches lend themselves to more automated forms of model checking and testing. Declarative designs lead to declarative implementations, and declarative implementations bring the testing complexity of medium- to large-scale parallel programs within reach of the software developer. Regardless of the complexity of the testing or debugging process, the goal is to deploy software that is error free and fault tolerant. The testing process must find each error and software defect and remove it.
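This excerpt does not show the PBS notation itself, so the following is only a loose illustration of the general idea that requirements stated as predicates can be checked mechanically. It is a minimal sketch under stated assumptions: `RunResult`, `NamedPredicate`, `parallel_sum`, `sum_predicates`, and `failed_predicates` are all hypothetical names, and the parallel sum is a stand-in workload rather than anything taken from the book.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <string>
#include <thread>
#include <vector>

// Hypothetical record of what one parallel run produced.
struct RunResult {
    std::vector<long> partial_sums;  // one slot per worker
    long expected_total;             // sequential result used as an oracle
};

// Hypothetical named predicate: a requirement expressed as a testable condition.
struct NamedPredicate {
    std::string name;
    std::function<bool(const RunResult&)> holds;
};

// Sums `data` with `num_workers` threads (num_workers must be at least 1).
RunResult parallel_sum(const std::vector<long>& data, std::size_t num_workers) {
    RunResult result{std::vector<long>(num_workers, 0),
                     std::accumulate(data.begin(), data.end(), 0L)};
    const std::size_t chunk = (data.size() + num_workers - 1) / num_workers;
    std::vector<std::thread> workers;
    workers.reserve(num_workers);
    for (std::size_t w = 0; w < num_workers; ++w) {
        workers.emplace_back([&data, &result, w, chunk] {
            const std::size_t first = std::min(w * chunk, data.size());
            const std::size_t last = std::min(first + chunk, data.size());
            // Each worker writes only its own slot, so no lock is needed.
            result.partial_sums[w] = std::accumulate(
                data.begin() + static_cast<std::ptrdiff_t>(first),
                data.begin() + static_cast<std::ptrdiff_t>(last), 0L);
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return result;
}

// The requirements for this workload, written as predicates.
std::vector<NamedPredicate> sum_predicates(std::size_t num_workers) {
    return {
        {"one partial sum per worker",
         [num_workers](const RunResult& r) {
             return r.partial_sums.size() == num_workers;
         }},
        {"partial sums add up to the sequential total",
         [](const RunResult& r) {
             return std::accumulate(r.partial_sums.begin(),
                                    r.partial_sums.end(), 0L) == r.expected_total;
         }},
    };
}

// Evaluates every predicate and returns the names of those that failed.
std::vector<std::string> failed_predicates(
    const RunResult& result, const std::vector<NamedPredicate>& predicates) {
    std::vector<std::string> failures;
    for (const auto& predicate : predicates) {
        if (!predicate.holds(result)) {
            failures.push_back(predicate.name);
        }
    }
    return failures;
}
```

Because each requirement is data rather than an ad hoc assertion, the same list of predicates can be evaluated after every run and for every worker count, and a failure is reported by name.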
(This is an excerpt from our book "Professional Multicore Programming: Design and Implementation for C++ Developers".)