Safety through Feedback
Coming back to Earth, consider the problem of mitigating the effects of accidents. As technology becomes more advanced, the possible consequences of an accident increase -- compare the Bhopal chemical accident with the sinking of the Titanic. The scientific question is: why do such accidents happen?
MIT professor Nancy Leveson believes the problem has less to do with the failure of individual components than with the absence of control and feedback loops that can compensate for those failures. Here is her analysis of the Bhopal accident.
In December 1984, the release of methyl isocyanate (MIC) from the Union Carbide chemical plant in Bhopal, India, caused at least 10,000 deaths and more than 200,000 injuries, making it the worst industrial accident in history. The proximate cause was assigning a relatively new worker to wash out some pipes and filters. The worker neglected to insert a special disk to prevent water from leaking into the MIC tank. Water flooded in and caused an explosion. A relief valve opened, and 40 tons of MIC was released into the air above a populated area.
What really caused the accident? Was it an inadvertent error by an inexperienced maintenance worker, or was it part of a sabotage plot, as Union Carbide later claimed? Leveson believes that neither explanation holds. The pipe-washing operation should have been supervised by the second-shift supervisor, but that position had been eliminated in a cost-cutting effort. Inserting the disk was not the job of the worker washing the pipes, because he was a low-level employee and should not have been entrusted with this important task. Further, contaminating MIC with water was bad, but it should not have caused an explosion if the MIC were being kept refrigerated as called for by the design. The refrigerator, however, was not working. MIC itself should not have escaped up the flue, because a scrubber should have stopped it. But the scrubber was out of service.
In Leveson's analysis, the maintenance worker was only a minor and somewhat irrelevant player in the accident. Instead, safety degradation had occurred over time and without any particular single decision. A series of decisions had moved the plant slowly toward a situation in which any slight error would lead to a major accident. Although the Indian government blamed the accident on human error, Leveson would say the whole system was unsafe. Mistakes happen. The question is how to design and operate the system so that errors and failures do not lead to a catastrophic accident.
Systems theory treats safety as a control problem rather than a failure problem. You control safety by enforcing constraints on the behavior of the system and its components. Control, in turn, depends on the familiar notion of feedback. Think about how you deal with an unfamiliar hotel shower. You rotate the temperature dial to about the right level and then feel the water, adjust the control, feel the water some more, and so on. Even if the dial is very different from any you have ever seen, you are able to set the shower to a comfortable temperature in a short time. The single most important feature of feedback-based control is that the sensor (your hand) must be accurate while the control need only be reasonably smooth and monotonic -- turning the dial a little clockwise always has one outcome: making the water a little cooler.
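The shower loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Shower` class, its temperature curve, and the `settle` function are hypothetical names invented here, not anything from Leveson's work. The controller never sees the dial's formula. It relies only on an accurate reading and on the fact that clockwise always means cooler.

```python
import math


class Shower:
    """Hypothetical shower with an unfamiliar dial (an assumption for illustration)."""

    def __init__(self):
        self.dial = 0.0  # arbitrary dial position; positive means clockwise

    def turn(self, delta):
        self.dial += delta

    def temperature(self):
        # Hidden from the controller: a nonlinear but monotonic response.
        # Turning clockwise (larger dial value) always makes the water cooler.
        return 20.0 + 35.0 * math.exp(-self.dial / 20.0)


def settle(target, read_temperature, turn_dial, step=4.0, tolerance=0.5, max_moves=100):
    """Feel the water, adjust the dial, and repeat until the water is comfortable.

    Returns the number of dial adjustments made.
    """
    previous_sign = 0
    for moves in range(max_moves + 1):
        error = read_temperature() - target  # the sensor: your hand
        if abs(error) <= tolerance:
            return moves
        sign = 1 if error > 0 else -1
        if previous_sign and sign != previous_sign:
            step /= 2  # overshot the target, so make smaller adjustments
        turn_dial(sign * step)  # too hot: turn clockwise; too cold: turn back
        previous_sign = sign
    raise RuntimeError("water never reached a comfortable temperature")


if __name__ == "__main__":
    shower = Shower()
    moves = settle(38.0, shower.temperature, shower.turn)
    print(f"settled at {shower.temperature():.1f} C after {moves} adjustments")
```

Swapping in a different monotonic curve inside `Shower.temperature` leaves `settle` unchanged, which is the point of the example: the loop depends on the sensor, not on knowing how the control works.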
The Bhopal plant had inadequate feedback and control over the pipe-washing operation -- the supervisor, had he been there, could have checked for safety and corrected any mistake. The scrubbers, had they been operational, could have detected and corrected the effect of the released MIC by changing its chemical composition. Proper control means detection followed by corrective feedback.
Leveson's philosophy of safety-driven design follows from these observations. She argues for designing safety into the system from the beginning. In her latest book, Leveson offers a simple example of the design for a train's automated door system. The first step in the design process is to identify the system hazards. Some of the hazards include:
- A person being hit by closing doors;
- Someone falling from a moving train; and
- Passengers and staff being unable to escape from a harmful condition in the train compartment, such as a fire.
These conditions are translated into constraints for the design of the automated doors and their control system:
- An obstructed door must reopen to permit the removal of an obstruction and then automatically reclose.
- A door should open only when the train is stopped.
- Means must be provided to open doors for emergency evacuation anywhere when the train is stopped.
These constraints may lead to various design decisions. For example, the first constraint could suggest the use of a sensor in a door similar to the one in an elevator that prevents the door from crushing a passenger. The second constraint might entail an interlock between a motion sensor and the door. The third constraint might combine the interlock with a "door open" instruction when an alarm is sounded. In effect, hazards give rise to constraints, which then give rise to design guidelines. The idea is to build in safety right from the start.
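As a rough sketch of how the three constraints might turn into control logic, the Python below encodes each one as a rule in a single decision function. The names (`TrainState`, `Command`, `door_command`) and the order of the checks are assumptions made for illustration; they are not taken from Leveson's design.

```python
from dataclasses import dataclass
from enum import Enum


class Command(Enum):
    OPEN = "open"
    CLOSE = "close"


@dataclass
class TrainState:
    """Sensor readings the door controller depends on (hypothetical fields)."""
    moving: bool
    obstruction_detected: bool = False
    emergency_alarm: bool = False


def door_command(state: TrainState, open_requested: bool) -> Command:
    """Decide what a door should do, one rule per constraint."""
    # Constraint 2: the interlock. A moving train keeps its doors closed,
    # whatever else is requested.
    if state.moving:
        return Command.CLOSE
    # Constraint 3: once the train is stopped, an alarm opens the doors.
    if state.emergency_alarm:
        return Command.OPEN
    # Constraint 1: an obstructed door reopens. When the sensor clears and
    # nobody is requesting "open", the final rule closes the door again.
    if state.obstruction_detected:
        return Command.OPEN
    return Command.OPEN if open_requested else Command.CLOSE


if __name__ == "__main__":
    # An alarm on a moving train does not open the doors until the train stops.
    print(door_command(TrainState(moving=True, emergency_alarm=True), False))
    print(door_command(TrainState(moving=False, emergency_alarm=True), False))
```

Each rule carries a comment naming the constraint it enforces, so the rationale travels with the code. This sketch assumes the sensors are trustworthy and ignores cases such as an obstruction detected while the train is moving, which a real design would have to analyze.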
Making safety issues understandable at every level of the design, from the most abstract to the most concrete, may discourage operational management from optimizing costs at the risk of reducing safety. When the operations manual gives a rationale for every safety measure, it will be more difficult for management to let those measures lapse.
Leveson has applied system safety theory in the service of the U.S. Missile Defense Agency, which has constructed a giant interconnected system of systems. Some of the missile defense systems date back to the 1960s, including the Cold War era's North American Aerospace Defense Command (NORAD). Others are more modern, such as newer radar and platforms. Leveson was called in to assess not just the reliability of the individual parts, but the safety of the integrated whole. The major hazard in this case was inadvertent launch of an anti-missile missile. Not only is each missile very expensive, but an unscheduled launch could provoke an overreaction from a potential adversary. Leveson's collaborators found so many instances of missing feedback and control that the "deployment and testing phase of the new U.S. missile defense system was delayed for six months while they fixed them," she says.
Positive and negative feedback control loops can be found throughout nature. They keep organisms healthy and ecosystems in balance. System safety puts feedback control loops at the center of engineering design and operation. It's a simple enough idea to work.