Cameron and Tracey Hughes

Dr. Dobb's Bloggers

It's Not My Fault ...

May 19, 2010

Failures are the result of a defect in hardware, software, or human operation. If the software is not running, then it cannot encounter defects.

Although this is an obvious statement, it is important for understanding the distinction between the responsibilities and activities of the testing phase and those of the exception handler.

The more defects found and removed during testing, the fewer defects the software encounters at runtime. Defects encountered at runtime lead to failures in the software. Failures in the software produce exceptional conditions for the software to operate under. The exceptional conditions require exception handlers. So the balancing act is between defect removal during the testing stages and defect survival during exception handling.

When we choose defect survival over defect removal, the problem is that exception handling code can become so complex that it introduces defects into the software. Instead of providing a mechanism to help achieve fault tolerance, the exception handler becomes a source of failure. Defect survival reduces the software's chance to operate properly. Extensive and thorough testing removes defects, which reduces the strain on the exception handlers. It is also important to note that exception handlers do not occur as freestanding pieces of code. They occur within the context of the overall software architecture. The journey towards fault tolerance in our software begins by recognizing that:

  • No amount of exception handling can rescue a flawed or inappropriate software architecture.
  • The fault tolerance of a piece of software is directly related to the quality of its architecture.
  • The exception handling architecture cannot replace the testing stages.
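As an illustration of a handler that stays too small to become a source of failure itself, here is a minimal sketch. The `PortResult` type and the `parse_port` function are hypothetical names invented for this example; they do not come from the book. The handlers only translate an exception into a result and attempt no recovery.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical result type: either a valid port or a short reason for failure.
struct PortResult {
    bool ok;
    int port;
    std::string reason;
};

// Hypothetical helper: converts text to a TCP port number.
// Each handler does one thing only: it translates the exception into a result.
// Nothing is retried and no shared state is touched, so the handlers leave
// very little room for new defects.
PortResult parse_port(const std::string& text) {
    try {
        std::size_t used = 0;
        int value = std::stoi(text, &used);  // throws on bad or oversized input
        if (used != text.size()) {
            return {false, 0, "trailing characters"};
        }
        if (value < 1 || value > 65535) {
            return {false, 0, "out of range"};
        }
        return {true, value, ""};
    } catch (const std::invalid_argument&) {
        return {false, 0, "not a number"};
    } catch (const std::out_of_range&) {
        return {false, 0, "out of range"};
    }
}
```

A caller inspects `ok` and decides what to do next. Input such as `"abc"` or `"80x"` is reported with a reason rather than being repaired inside the handler, which keeps the decision about recovery in the surrounding architecture.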

To make a discussion about exception handling clear and meaningful, it is important to understand that the exception handling architecture occurs within the context of the software architecture as a whole. This means that exceptions are identified by the PBS (Predicate Breakdown Structure) and PADL (Parallel Application Design Layers) analysis. The solution model has a PBS. When we have an unavoidable, uncontrollable, unexplainable deviation from the application architecture's PBS, we have an exception. So the exception is defined by clearly articulated architectures.

If the software architecture is inappropriate, incomplete, or poorly thought out, then any attempt at after-the-fact exception handling is highly questionable. Further, if shortcuts have been taken during the testing stages (e.g., incomplete stress testing, incomplete integration testing, incomplete glass box testing, and so on), then the exception handling code will have to be perpetually added to and will become increasingly complex, ultimately detracting from the software's fault tolerance and the declarative architecture of the application.

On the other hand, if the software architecture is sound and the exception handling architecture is compatible and consistent with the PBS and Layers 3, 4, and 5 of the PADL (see blog) analysis, then a high degree of fault tolerance can be achieved for our parallel programs. If we approach our goal of context failure resilience with an understanding of the roles that software application architecture and testing play, then it is clear that we should choose defect removal over defect survival. Defect removal takes place during testing.
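One way to picture an exception as a deviation from stated predicates is the sketch below. It is only an illustration under assumptions: `Predicate`, `PredicateDeviation`, and `check_predicates` are hypothetical names, and the code reduces the PBS to a flat list of runtime checks, which is much simpler than the analysis described in the book.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: one statement that the architecture says must hold at runtime.
struct Predicate {
    std::string name;
    std::function<bool()> holds;
};

// Hypothetical exception type: signals a deviation from a stated predicate.
class PredicateDeviation : public std::runtime_error {
public:
    explicit PredicateDeviation(const std::string& name)
        : std::runtime_error("deviation from predicate: " + name) {}
};

// Evaluates each predicate in order and throws on the first one that fails.
void check_predicates(const std::vector<Predicate>& pbs) {
    for (const Predicate& p : pbs) {
        if (!p.holds()) {
            throw PredicateDeviation(p.name);
        }
    }
}
```

Because each exception names the predicate that was violated, a handler can only ever be written against something the architecture has already stated. If the list of predicates is incomplete or poorly thought out, the sketch has nothing meaningful to throw, which mirrors the point that the exception is defined by the architecture.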

So what about parallel systems? Parallel systems require even more effort during the testing phase, so we make use of our PADL analysis and PBS breakdown in our test plan. We break the testing goals for parallel programs into three fundamental questions:

  1. Do the design models and PBS correctly and completely represent the solution model? (This assumes that the solution model solves the original problem.)
  2. Does the implementation model map correctly to the design models and the PBS (Layers 4 and 5 of the PADL)?
  3. Have all of the challenges to concurrency in the implementation model been addressed?
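The third question can be probed with a small stress test. The following is a minimal sketch, assuming a C++11 or later compiler. `SharedCounter` and `run_counter_stress` are hypothetical names chosen for this example, and a mutex-guarded counter stands in for whatever shared state the implementation model actually contains.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared component under test: a counter guarded by a mutex.
class SharedCounter {
public:
    void increment() {
        std::lock_guard<std::mutex> lock(mutex_);
        ++value_;
    }
    long value() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }
private:
    mutable std::mutex mutex_;
    long value_ = 0;
};

// Hypothetical stress helper: starts `threads` workers, has each one
// increment the counter `per_thread` times, joins them all, and returns
// the final count.
long run_counter_stress(int threads, int per_thread) {
    SharedCounter counter;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                counter.increment();
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return counter.value();
}
```

A test would compare `run_counter_stress(8, 10000)` with the expected total of 80000. A passing run does not prove the absence of data races, so a check like this complements the design-level review asked for by the first two questions rather than replacing it. On some toolchains the program must be built with a threading flag such as `-pthread`.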

(This is an excerpt from Chapter 10, "Testing and Logical Fault Tolerance for Parallel Programs," of our book "Professional Multicore Programming: Design and Implementation for C++ Developers.")
