Testing and Logical Fault Tolerance for Parallel Programs
The goal of testing software is to make sure that the software does what we want and that we want what it does. An application is often requested as a list of desired features. This list may be a formal specification running to hundreds of pages, or it may be as simple as a dozen or so requirements stated verbally by an employer.
Regardless of how the list of requirements for a piece of software is generated, the testing process must make sure that the software meets those requirements and that the requirements meet the user's expectations. In many cases where parallel computers or multiprocessors are involved, the user's expectations include speed gains or a certain level of throughput. Adding multithreading or multiprocessing to meet those expectations increases the kinds of software errors that can occur.
When the software does not perform according to the specifications, the software is in error, even if the only violation is that the system performs too slowly. Testing and debugging are among the top ten challenges for software that has a concurrency requirement. Software that requires parallel programming can be notoriously difficult to test and debug.
We use exception handling to provide a kind of logical fault tolerance for declarative architectures. That is, if our application, for unknown and uncontrollable reasons, violates statements, assertions, rules, predicates, or constraints from our PBS (Predicate Breakdown Structure), we want to throw an exception and exit gracefully, because once our predicates have been violated, the correctness, reliability, and meaning of the application have been compromised. The journey towards fault tolerance in our software begins by recognizing that:
- No amount of exception handling can rescue a flawed or inappropriate software architecture
- The fault tolerance of a piece of software is directly related to the quality of its architecture
- The exception handling architecture cannot replace the testing stages
In our approach toward declarative interpretations of parallel programming, we are moving more toward logical models. Ultimately we want non-logical models or irrational program behavior to be treated as an exception. The exception handling strategy therefore flows from Layer 5 of the PADL (Parallel Application Design Layers) and from the PBS. It is a fundamental part of the software architecture. User-defined C++ predicates form the application's logical argument. If one of the assertions or predicates turns out to be false, then the application is irrational at that point. The PBS of an application clearly defines the possible worlds within which an agent or knowledge source will operate. For every world that is possible for the agent, there is a set of acceptable code that the agent or knowledge source can execute.
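The idea of treating a violated predicate as an exception can be illustrated with a minimal sketch. This is not the book's implementation: the names predicate_violation, predicate, enforce, and run_guarded are hypothetical, and representing a PBS as a flat vector of named callables is an assumption made only to keep the example short.

```cpp
#include <cassert>
#include <functional>
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type: thrown when a PBS predicate no longer holds.
class predicate_violation : public std::logic_error {
public:
    explicit predicate_violation(const std::string& name)
        : std::logic_error("predicate violated: " + name) {}
};

// Hypothetical PBS entry: a name plus a callable that reports whether
// the predicate currently holds.
struct predicate {
    std::string name;
    std::function<bool()> holds;
};

// Evaluates every predicate in order and throws on the first false one.
void enforce(const std::vector<predicate>& pbs) {
    for (const auto& p : pbs) {
        if (!p.holds()) {
            throw predicate_violation(p.name);
        }
    }
}

// Runs one unit of work with the PBS checked before and after it.
// Returns 0 on success and 1 if a predicate was violated, so the caller
// can shut down gracefully instead of continuing in an irrational state.
int run_guarded(const std::vector<predicate>& pbs,
                const std::function<void()>& work) {
    assert(work && "run_guarded requires a non-empty callable");
    try {
        enforce(pbs);  // the world must be valid before the work starts
        work();
        enforce(pbs);  // and still valid once the work has finished
        return 0;
    } catch (const predicate_violation& e) {
        std::cerr << e.what() << '\n';
        return 1;
    }
}
```

A caller would fill the vector with the predicates that define an agent's possible world, pass each task through run_guarded, and treat a nonzero return value as the signal to stop. In a multithreaded program each predicate would also need to read shared state safely, which this sketch does not address.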
As software developers, we produce applications in the fields of medicine, manufacturing, homeland security, transportation, finance, education, scientific research, and all areas of business. We have an ethical and moral responsibility to produce software that is safe, correct, reliable, and fault tolerant. Anything less is malpractice.
This is an excerpt from our book, Professional Multicore Programming: Design and Implementation for C++ Developers.

