Channels ▼

Walter Bright

Dr. Dobb's Bloggers

Safe Systems from Unreliable Parts

October 29, 2009

A recent article in Wired [1] piqued my interest because it touched on an interesting subject. In it, a medical "gamma knife" used to treat cancer patients had
suffered from a software glitch. The emergency shutoff switch had no effect, causing the staff to have to run in and extract the patient before he was killed.

How does one create a safe machine? The first thought one has is to make the machine perfect so it cannot fail. Evidently, the designers of that gamma knife followed this principle, hoping their software is perfect and following up on imperfections with bug fixes.

The problem with that approach, of course, is it is impossible to create a component that cannot fail. As programmers, we all know it is impossible to create bug-free software. Even if no expense was spared, perfection can only be asymptotically approached, at exponentially increasing costs. Even if the software was perfect, there could be a hardware failure that corrupts the software.

How can a safe system be created using flawed, unreliable parts, at a practical cost? I used to work for Boeing on flight critical systems design, and received quite an education on this. Boeing is, of course, spectacularly successful at making an inherently unsafe activity, flying, astonishingly safe. They do this by acknowledging the principle:

Any component can fail, at any time, in the worst conceivable manner.

Now, what to do about that?

Any critical compononent or system must have a backup.

Let's illustrate this with a bit of math. Given component A that has a 10% failure rate, we need to get it down to 1%. Improving the quality of that component by a factor of 10 will get us there, but at a cost explosion of 10 times the price. But suppose we add in a backup component B, that also has a 10% failure rate. The odds of A and B both simultaneously failing are 10% of 10%, or 1%. This is achieved by a mere doubling of the cost instead of an order of magnitude increase. The reliability can be further improved to 0.1% by adding another backup component C with 3 times the cost, instead of a hundred times.

In order for this to work, though, A and B (and C) must be completely independent of each other. A failure in one cannot propagate to the other, and the circumstances that cause one to fail must also not cause the other to fail. This independence is where the hard work comes in.

In the gamma knife example (I am inferring based on the scanty information in the article) the emergency shutoff switch relied on the same software as the rest of the system did, so when the software crashed, the failure propagated to the shutoff system, and that didn't work either.

A shutoff system that was not coupled to failures in the software could be one that simply turned off all power to the machine. The gamma radiation source would be automatically blocked upon power failure by shields normally held back when power is on with electromagnets.

Unfortunately, for the makers of the gamma knife and in resolutions of earlier problems with similar machines [3] the focus instead was on rolling out bug fixes and similar attempts to make the software perfect. Reading the comments on the article [1] also show a focus on trying to make the software perfect. While it is worthwhile to try to make the software better and less buggy, the point is it is not the path to making safe systems, because it is impossible to make perfect software.

Conclusion

Reliance on writing perfect software and rolling out bug fixes to correct any imperfections is a fundamentally unsound and unsafe approach. A safe system is one with a backup or failsafe shutdown that is independent and completely decoupled from the primary system. Redundancy is not enough, decoupling is required as well in order to prevent the failure of one system propagating to the backup.

In the next installment, I'll talk about incorporating some of these ideas into software to improve the reliability of that software.

References

[1] 'Known Software Bug' Disrupts Brain-Tumor Zapping
http://www.wired.com/threatlevel/2009/10/gamma/

[2] History's Worst Software Bugs
http://www.wired.com/software/coolapps/news/2005/11/69355?currentPage=all

[3] An Investigation of the Therac-25 Accidents
http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Therac_1.html

Thanks to Bartosz Milewski, Brad Roberts, David Held, and Jason House for reviewing this.

Related Reading


More Insights






Currently we allow the following HTML tags in comments:

Single tags

These tags can be used alone and don't need an ending tag.

<br> Defines a single line break

<hr> Defines a horizontal line

Matching tags

These require an ending tag - e.g. <i>italic text</i>

<a> Defines an anchor

<b> Defines bold text

<big> Defines big text

<blockquote> Defines a long quotation

<caption> Defines a table caption

<cite> Defines a citation

<code> Defines computer code text

<em> Defines emphasized text

<fieldset> Defines a border around elements in a form

<h1> This is heading 1

<h2> This is heading 2

<h3> This is heading 3

<h4> This is heading 4

<h5> This is heading 5

<h6> This is heading 6

<i> Defines italic text

<p> Defines a paragraph

<pre> Defines preformatted text

<q> Defines a short quotation

<samp> Defines sample computer code text

<small> Defines small text

<span> Defines a section in a document

<s> Defines strikethrough text

<strike> Defines strikethrough text

<strong> Defines strong text

<sub> Defines subscripted text

<sup> Defines superscripted text

<u> Defines underlined text

Dr. Dobb's encourages readers to engage in spirited, healthy debate, including taking us to task. However, Dr. Dobb's moderates all comments posted to our site, and reserves the right to modify or remove any content that it determines to be derogatory, offensive, inflammatory, vulgar, irrelevant/off-topic, racist or obvious marketing or spam. Dr. Dobb's further reserves the right to disable the profile of any commenter participating in said activities.

 
Disqus Tips To upload an avatar photo, first complete your Disqus profile. | View the list of supported HTML tags you can use to style comments. | Please read our commenting policy.
 


Dr. Dobb's TV