CSC 379: Ethics in Computing  
  Summer II 2006  
 
 
 
 
   
   
   
   
  COURSE OVERVIEW  
  This course is a survey of the ethical issues involved in computing. It discusses the way that computers and software pose new ethical questions or pose new versions of standard moral problems and dilemmas. It stresses case studies that relate to ethical theory.  
     
  INSTRUCTOR  
  Edward F. Gehringer
Office: 2301 Partners I
(919) 515-2066
Office hours:
MW 2:45-3:45
efg@ncsu.edu
 
     
  TEACHING ASSISTANT  
  Ahmed Bakir
abakir@ncsu.edu
919-641-6642
 
     
 
   
Lecture 13: Software Safety
 
   

The great blackout of 2003. At about 4:00 PM Eastern time on August 14, 2003, the lights went out from Michigan to as far east as Massachusetts, as well as in the Canadian province of Ontario. Major cities without power included Cleveland, Columbus, Detroit, Toronto, and Ottawa, as well as much of the New York City area. Approximately 50 million people were affected by the outage, making it the largest blackout in American history. Power in some areas was not restored until two days later.

Inadequate software played a major role in the outage, reported the government task force charged with investigating the causes. Electric companies monitor the condition of their transmission lines and use a state estimator to process the data mathematically and make it consistent with the electrical system model. At 12:15 PM, the state-estimator software of the Indiana-based Midwest Independent System Operator (MISO) produced a solution that was obviously in error. The error was caused by a transmission line that had been taken out of service. The analyst found the problem in about 45 minutes, and then left to go to lunch. Unfortunately, while looking for the problem, he had turned off the automatic trigger that runs the state estimator every five minutes. No one noticed that it was off until 2:40 PM. When turned back on, the estimator again failed to reach a solution. This was because in the interim, a 597 MW generating station near Lake Erie had gone out of service, and a Dayton Power & Light high-voltage transmission line had tripped due to contact with trees. In the meantime, power was being transmitted over the wrong lines by operators who were not aware of these outages. By the time they figured out what was wrong at 4:04 PM, the cascade of outages was about to begin.

Meanwhile, over in Northeast Ohio, FirstEnergy (FE) was having trouble with its computer systems. At 2:14 PM, FE's control-room operators lost the alarm function that provided visual and audible indications of trouble with equipment. Soon thereafter, they lost a number of remote consoles. Their main server went down, followed quickly by the backup server at 2:54 PM. However, for more than an hour, the control room had no indication that anything was wrong. Although FE's IT staff was working to bring the systems up, they didn't notify the control room. The lack of alarms left the controllers with no clue as to the extent of the problems facing them, and led them to discount phone calls from outside that warned of their growing crisis.

The damage might have been averted if software had been more persistent in issuing warnings. The systems should have been programmed to make it obvious that the state estimator was not running, and that the alarm system was inoperative. Only three people died in the great blackout. But in Panama in 2001, lack of software-issued warnings may have cost up to 21 lives. In that case, 28 patients at the National Cancer Institute in Panama City received massive overdoses of gamma rays because a user interface did not inform the operator of a radiation-therapy machine of problematic input data.
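
One way to make a silent failure obvious is a staleness check: each monitored component records when it last produced a result, and anything that has gone quiet is flagged. The sketch below is illustrative only. The class name, the component labels, and the 300-second limit are hypothetical and are not drawn from the actual grid-control software.

```python
class HeartbeatMonitor:
    """Tracks when each component last reported in (hypothetical sketch)."""

    def __init__(self, max_silence_s):
        self.max_silence_s = max_silence_s
        self.last_seen = {}

    def beat(self, component, now_s):
        """Record that `component` produced a result at time `now_s`."""
        self.last_seen[component] = now_s

    def stale(self, now_s):
        """Return the components that have been silent for too long."""
        return sorted(
            name for name, seen in self.last_seen.items()
            if now_s - seen > self.max_silence_s
        )


# Assumed limit: the estimator is expected to run every five minutes.
monitor = HeartbeatMonitor(max_silence_s=300)
monitor.beat("state_estimator", now_s=0)
monitor.beat("alarm_system", now_s=0)
monitor.beat("alarm_system", now_s=600)   # alarms are still reporting at t = 600 s

for name in monitor.stale(now_s=900):
    print(f"WARNING: {name} has not reported for over 300 s")
```

The point of the design is that silence itself becomes an alarm condition, so that an operator does not have to notice that something is missing.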

Radiation overdoses. In the Panamanian case, a doctor would place metal shields, called "blocks," above the area where the tumor was located. Their purpose is to protect normal or sensitive tissue from radiation. The software for the radiation machine assumed that at most four blocks would be placed over the patient. However, the doctor was trying to be more cautious by adding a fifth block. The technician found a way to enter data for five blocks, and the system displayed a diagram that seemed to confirm their location. However, if the data for these blocks was entered in a certain order, the software would miscalculate the treatment times by a large amount. This led to overdoses of 20% to 100%, which burned the patients and left some disabled. In the forty months after the overdoses, 21 of these 28 patients died. The case has led to a trial of the machine's operator for second-degree murder, and a lawsuit against the machine's manufacturer, Multidata Systems International, that threatens to bankrupt the company.
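
A program can refuse input that falls outside what it was designed to handle, rather than accepting it and computing a wrong answer. The following is a minimal sketch of that idea. The function and exception names are hypothetical, and only the four-block limit comes from the account above.

```python
MAX_BLOCKS = 4  # the design limit described above


class UnsupportedInputError(ValueError):
    """Raised when input falls outside what the software was designed for."""


def validate_blocks(blocks):
    """Return the blocks unchanged, or refuse loudly if there are too many.

    `blocks` is a hypothetical list of block outlines; only its length
    matters in this sketch.
    """
    if len(blocks) > MAX_BLOCKS:
        raise UnsupportedInputError(
            f"{len(blocks)} blocks entered, but at most {MAX_BLOCKS} are "
            "supported; treatment time cannot be calculated safely"
        )
    return blocks


try:
    validate_blocks(["b1", "b2", "b3", "b4", "b5"])
except UnsupportedInputError as err:
    print(f"REFUSED: {err}")
```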

The London Ambulance fiasco. Health-related software has frequently been involved in fatal malfunctions. Prior to 1991, the London Ambulance Service was using an uncomputerized dispatching system. Staff would write the details of calls on pieces of paper and pass them to an allocator. The allocator chose the nearest available ambulance and forwarded details to a dispatcher, who telephoned or radioed directions. Sixty-five percent of ambulances reached their destination within fifteen minutes. A new system was ordered from two different vendors, Datatrak and Systems Options. Initially, it was configured as a semicomputerized system--paper reports of calls were still printed out and passed to dispatchers. It crashed during its first training session. In February 1992, an operator inadvertently switched off a screen and lost four emergency calls.

In October 1992, a new computer system did away with direct human responses to telephone calls. On its first two days, fewer than 20% of ambulances reached their destination within 15 minutes. Some callers were put on hold for up to 30 minutes. This was not because of an unusually high volume of calls, but because worried callers would call back. Dispatching of ambulances was delayed by up to 11 hours. Up to 20 people may have died as a result. Calls were lost. One disabled woman was trapped in her chair by the body of her collapsed husband. She called the LAS every 30 minutes. On each call, she was told that there was no record of her earlier call. An ambulance eventually arrived 2 hours and 45 minutes after the initial call. By this time, her husband had died.

Design flaws. The system manifested some basic design flaws. Following cutover to the automated system, there was no backup procedure. The user interface gave the operator no way to scroll through the list of pending calls. The effects of system overload had not been properly anticipated; the exception list kept growing. Requirements were not met. Robin Bloomfield, a government consultant on this system, said it was a fundamental requirement to have several layers of defense against failure. "With about a million calls a year," he explained, "the system has to be more reliable than a nuclear reactor backup system." As put into operation, the only backup it seemed to have was that people would make their own arrangements if the system failed.

Forewarnings of disaster. Senior officials of the London Ambulance Service were warned that the system would be an "expensive disaster" by Michael Page, whose company had submitted a competing bid that was rejected a year before. Page wrote a series of memoranda to LAS in June and July 1991, warning that the tracking subsystem, which tracks ambulances and dispatches the nearest one to a call, was not up to the requirements. "The rule-based analytical approach used by the LAS cannot," he wrote, "deal as well as an experienced operator with the small minority of difficult cases. The system wrongly reduces the influence of operators."

Why we can't be sure that software is correct. It is hard to establish the correctness of software. First of all, we can't ignore design failures. In physical systems, the hardware failure rate probably greatly exceeds the probability of failure from design errors, so in practice, design-failure probabilities can be ignored in safety and reliability calculations. Software does not wear out, so design errors are the only kind of failure it has. Secondly, redundancy is no solution. In physical systems, redundancy reduces the impact of random hardware failures. Unfortunately, software does not exhibit random failures, and even redundant systems are subject to "common-mode" failures. For example, consider the hydraulic system on United Airlines Flight 232, July 19, 1989. This was not digital technology, but it was a complex system. The three hydraulic systems had been put there for the sake of redundancy. However, lines for all three converged in one small area near the tail. An engine failed, and the resulting explosion near that site disabled all three systems. After that, the jet could not be controlled. Now on the Boeing 777, wiring for the digital flight-control system is routed all over the plane, so no single explosion can disable it.
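
The effect of a common-mode failure on a redundancy calculation can be seen with a little arithmetic. The sketch below uses made-up probabilities and a hypothetical function name. It shows only the shape of the calculation, not figures for any real aircraft.

```python
def p_total_loss(p_channel, p_common_cause, channels=3):
    """Probability that every redundant channel is lost.

    Either a single common cause takes out all channels at once, or each
    channel fails independently. All figures passed in are hypothetical.
    """
    return p_common_cause + (1 - p_common_cause) * p_channel ** channels


independent_only = p_total_loss(p_channel=1e-3, p_common_cause=0.0)
with_common_mode = p_total_loss(p_channel=1e-3, p_common_cause=1e-5)

print(f"independent failures only: {independent_only:.1e}")
print(f"with a common-mode path:   {with_common_mode:.1e}")
```

With these invented numbers, the common-mode term dominates the result by about four orders of magnitude, which is why adding more identical channels does little once a shared failure path exists.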

Decision paths in software are complex. Even in a program of a few hundred lines, there are dozens of branches and thousands of paths. Not all of these paths can be tested. A failure may lurk on an untested path because the designer misunderstood the situation that produces the inputs leading to it, or failed to take that situation into account at all. For example, a Patriot missile battery failed to intercept a Scud during the first Gulf War. This was due to the cumulative effect of inaccuracies in timekeeping by a computer. The system was meant to be turned off and restarted often enough for accumulated error never to become dangerous.
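
The way a tiny timekeeping error accumulates can be worked out directly. The sketch below assumes the figures commonly cited for the Patriot case, none of which are stated above: a clock that counts tenths of a second, a 24-bit fixed-point register holding 0.1 with 23 fractional bits (truncated, not rounded), and about 100 hours of continuous operation.

```python
from fractions import Fraction

FRACTION_BITS = 23      # assumed: 23 fractional bits in a 24-bit register
TICKS_PER_SECOND = 10   # assumed: the clock counts tenths of a second

tenth = Fraction(1, 10)
# Truncate 0.1 to the bits the register can hold (int() drops the remainder).
stored = Fraction(int(tenth * 2**FRACTION_BITS), 2**FRACTION_BITS)
error_per_tick = tenth - stored


def drift_seconds(hours_up):
    """Accumulated clock error after `hours_up` hours without a restart."""
    ticks = hours_up * 3600 * TICKS_PER_SECOND
    return float(error_per_tick * ticks)


for hours in (1, 8, 100):
    print(f"{hours:>3} h uptime: clock error {drift_seconds(hours):.4f} s")
```

Under these assumptions the error per tick is less than a ten-millionth of a second, yet after 100 hours it has grown to roughly a third of a second, a long time when tracking a target moving faster than a kilometer per second.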

A lot of oversights, it seems, involve time. The Y2K problem arose because programmers who used two-digit date fields never suspected that their code would live on till the year 2000. Maybe we shouldn't blame them for that. But surely we'd expect programmers to anticipate the beginning of daylight-saving time next spring! Yet sometimes they haven't. At the end of April 1993, when Germany went on summer time, the computer clock of a German steel producer went from 1:59 AM to 3:00 AM in one minute. As a result, a production line allowed molten ingots to cool for one hour less than normal. When the process controller thought the cooling time had expired, his actions splattered still-molten steel, damaging part of the facility. Similarly, latitude and longitude have sabotaged several computerized systems. Early American air-traffic control software could not be used in Britain because it was unable to cope with longitudes less than 0. A bug in the simulator for the F-16 fighter aircraft caused it to flip over whenever it crossed the equator.
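
The steel-plant error amounts to measuring elapsed time by subtracting wall-clock readings. The sketch below contrasts that with subtracting absolute times. The date, the clock readings, and the fixed offsets standing in for a real time-zone database are all illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for a real time-zone database (illustrative only).
WINTER = timezone(timedelta(hours=1))   # standard time
SUMMER = timezone(timedelta(hours=2))   # summer time

# Hypothetical process: cooling starts at 01:30 and is checked at 03:30 local
# time, with the switch to summer time (01:59 -> 03:00) falling in between.
start = datetime(1993, 4, 25, 1, 30, tzinfo=WINTER)
check = datetime(1993, 4, 25, 3, 30, tzinfo=SUMMER)

# Wrong: compare the local clock readings as if no switch had happened.
wall_clock_elapsed = check.replace(tzinfo=None) - start.replace(tzinfo=None)
# Right: offset-aware datetimes subtract in absolute time.
true_elapsed = check - start

print("elapsed by wall clock:", wall_clock_elapsed)
print("elapsed in reality:   ", true_elapsed)
```

For timing a physical process, a clock that never jumps (such as Python's time.monotonic()) avoids the problem altogether.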

Small changes in programs make big differences. In many physical systems, a small change to stimuli will produce only a small change in response. But in programs, changing a single bit from 0 to 1 can radically change the operation of a program. The destruction of the Mariner 1 probe is an example. This was the first US interplanetary probe, meant to explore Venus. There was a bug in the control program for the Atlas rocket that launched it. On July 22, 1962, a single incorrect character caused the rocket to veer off course. An equation was missing a "bar" that told the program to use a set of averaged values instead of raw data. This led the computer to decide the rocket was behaving erratically, although it was not. When the computer tried to correct the situation, it did cause erratic behavior. For the safety of people on the ground, both rocket and spacecraft had to be destroyed shortly after launch.

In 1992, a Dutch chemical plant exploded due to a single mistyped character. Three firemen were killed and eleven workers injured. Fragments were found at a distance of 1 km. The damage was estimated at several tens of millions of guilders (or about half that much in dollar terms). A lab-worker trainee typed tank 634 instead of 632 as the source tank for a chemical used in a reaction. Instead of resin feed classic, he put in dicyclopentadiene. He failed to check whether the tank contents were consistent with the recipe. When the reactor overheated, he called the plant fire department. The fire department expected the reactor contents to be released via a safety valve. So they were connecting deluge guns to prevent the fire from spreading. They did not have safety equipment on. Instead, the reactor ruptured and exploded. They called in the city fire department, but it had to let the fire burn out to prevent environmental damage from polluted firefighting water.
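
A consistency check of the kind the trainee skipped can be automated, so that a one-character slip is caught before any material moves. The sketch below is hypothetical: the inventory table, the recipe format, and the function name are invented for illustration, with only the tank numbers and chemical names taken from the account above.

```python
# Hypothetical plant inventory and recipe; tank numbers echo the incident.
TANK_CONTENTS = {632: "resin feed classic", 634: "dicyclopentadiene"}
RECIPE = {"source_chemical": "resin feed classic"}


def check_source_tank(tank_id, recipe=RECIPE, inventory=TANK_CONTENTS):
    """Refuse to proceed unless the chosen tank holds what the recipe needs."""
    actual = inventory.get(tank_id)
    expected = recipe["source_chemical"]
    if actual != expected:
        raise ValueError(
            f"tank {tank_id} holds {actual!r}, but the recipe needs {expected!r}"
        )
    return tank_id


check_source_tank(632)          # correct tank: passes silently
try:
    check_source_tank(634)      # the one-character typo
except ValueError as err:
    print(f"TRANSFER BLOCKED: {err}")
```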

We can't test software "long enough." In June and July 1991, a series of outages affected telephone users in San Francisco, Washington, D.C., Virginia, West Virginia, Baltimore, and Greensboro. The problems were in a switching program written by DSC Communications. The program consisted of several million lines of code. The developers ran it through a 13-week series of tests, and it passed. Then they changed three lines. They didn't think it necessary to repeat the 13-week series of tests, because they were confident they understood the effects of that change. The program crashed repeatedly.

Overconfidence was the culprit in the most widely known case of unsafe software. It involved the Therac-25, a radiation-therapy machine. Between June 1985 and January 1987, six known accidents occurred, involving patients being given massive overdoses of radiation. Two patients in Tyler, TX died. A machine like the Therac-25 can deliver two kinds of radiation, X-rays and electrons. Electrons work well for irradiating cancers near the surface. For cancers deeper in the body, X-rays are used. To get the X-rays, a tungsten shield is placed over the patient's body. A very powerful stream of electrons is directed at it, which causes it to emit X-rays. The powerful stream must be emitted only when the shield is in place. Here is the error that caused the problem: The operator prepared to send X-rays, then realized she had made a mistake. She switched the machine over to electrons, and the shield retracted. But it retracted before the intensity of the beam was lowered. Patients felt a severe burning sensation, and this was only the start of their problems. Radiation sickness followed.
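
The failure described here is an inconsistent machine state that nothing checked before firing. The toy model below sketches a final state check of that kind. Its class, fields, and methods are hypothetical simplifications and do not represent the Therac-25's actual design.

```python
class InterlockError(RuntimeError):
    """Raised when the machine state is unsafe for firing."""


class TherapyMachine:
    """Toy model with hypothetical fields; not the real machine's design."""

    def __init__(self):
        self.beam_intensity = "low"    # "low" for electrons, "high" for X-rays
        self.shield_in_place = False

    def select_xray(self):
        self.shield_in_place = True
        self.beam_intensity = "high"

    def select_electron_buggy(self):
        # Models the fault: the shield retracts while the beam
        # intensity has not yet been lowered.
        self.shield_in_place = False

    def fire(self):
        # Check the machine's actual state immediately before firing,
        # regardless of what sequence of commands led here.
        if self.beam_intensity == "high" and not self.shield_in_place:
            raise InterlockError("high-intensity beam with shield retracted")
        return f"fired at {self.beam_intensity} intensity"


machine = TherapyMachine()
machine.select_xray()
machine.select_electron_buggy()   # the operator corrects her mistake
try:
    machine.fire()
except InterlockError as err:
    print(f"BEAM INHIBITED: {err}")
```

A software check like this is only one layer of defense; it shares the weakness of all software in that it can itself contain a bug.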

Some of the software from the Therac-20, a previous version, was reused in the Therac-25. The Therac-20 had included a mechanical interlock. But the Therac-20 had been run for years without any software problems being noticed, so the mechanical interlock was removed on the Therac-25. It turned out that when the bug occurred in the Therac-20, the machine blew a fuse before any harmful radiation could be emitted.

Achieving greater confidence. Even if we can't ensure that software is perfectly correct, how can we achieve greater confidence? One approach is formal specification. Natural language is a wondrous instrument. It allows us to say things like, "Do you feel more like you do now than when you came in?" The requirements for a complex system will almost inevitably be complex. If expressed in English, they can easily be incomplete or self-contradictory. In a large document, these contradictions can be far from obvious.

Writing formal specs in a specification language is difficult and time-consuming. Therefore, large, complex systems are virtually the only ones for which it is undertaken. Unfortunately, specifications must also be correct. If the specification is contradictory, we should be able to detect it with analysis techniques. If the specification is incomplete, design may reveal this incompleteness. But if it is functionally wrong, this can only be determined by human review of the functioning of the system. The problem is that our "domain experts" are unfamiliar with specification languages and mathematical logic. For example, Telectronics Pacing Systems develops software for pacemakers. Its engineers wrote their specs in a formal specification language. But the cardiologists couldn't understand them. So the specs had to be rewritten in English.

We can attempt to verify our systems mathematically. This approach relies on assertions: preconditions, which state what has to be true on entry to routines, and postconditions, which state what should be true when they are exited. However, a program can only be proved correct with respect to something else, e.g., a specification. So if the specification is wrong, the proof is useless. You prove you have correctly implemented the wrong thing.
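
Preconditions and postconditions can be written directly into code as assertions. The routine below is a small illustrative example chosen for this purpose and is not taken from any system discussed here. Note that the assertions check the conditions at run time for the inputs actually supplied, which is weaker than a proof that they hold for all inputs.

```python
def integer_sqrt(n):
    """Return the largest integer r such that r*r <= n."""
    # Precondition: what must be true on entry.
    assert isinstance(n, int) and n >= 0, "precondition: n is a non-negative int"

    r = 0
    while (r + 1) * (r + 1) <= n:
        # Loop invariant: r*r <= n holds each time the loop body starts.
        r += 1

    # Postcondition: what must be true on exit.
    assert r * r <= n < (r + 1) * (r + 1), "postcondition violated"
    return r


print(integer_sqrt(15), integer_sqrt(16))
```

If the specification itself were wrong (say, the requirement was really to round to the nearest integer), every assertion would still pass.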

Large programs are too tedious to be proved correct "by hand." Therefore, they must be proved by theorem-proving software. But suppose this software has bugs ... Besides, programs can't be checked the way mathematical proofs are checked. In math, some proofs are complicated. There are even cases where two groups of mathematicians have proved contradictory things, and neither was able to disprove the other's work. So, our certainty about math relies upon mathematicians checking the work of others. Prove them wrong, and you get to publish the results. But program proofs are too long and mechanical for anyone to verify by reading them. John Rushby, a proponent of formal verification methods, says, "No one reads this stuff with a lot of care--it's too boring."

One experiment was to prove the correctness of the emergency shutdown system for the Darlington nuclear plant on Lake Ontario. The system was put in software to save money on specialized hardware and to achieve greater safety by providing better information to the operators. The programmers were divided into four teams. One team converted the programs into logic. Another did the same with the specs. A third team verified that the programs implemented the specs. A fourth team checked the work of the third. This delayed the project by six months, and in the meantime, the plant could not be opened. The plant cost its investors $20 million for each month it was idle. So the delay cost $120M. But at least it saved $1M of hardware.

n-version programming. OK, if it's too hard to prove that one program is correct, maybe we can have different teams of programmers write entirely separate programs and see if their results agree. If, say, three versions of the program are written, on a critical decision, the majority can rule. The idea is that independently developed programs will have independent bugs, and won't fail at the same time. A test conducted by Nancy Leveson attempted to validate this approach experimentally. Students at different grad schools wrote many versions of the same program and tested them extensively. The same inputs caused many different versions to fail. It turns out that the hard parts of programs are just hard, no matter who writes them.
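
The voting scheme, and the way a shared bug defeats it, can be sketched in a few lines. The three "versions" below are hypothetical stand-ins for independently written programs, and the voter is a minimal illustration, not a production design.

```python
from collections import Counter


def majority_vote(results):
    """Return the value most versions agree on, or raise if none has a majority."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError(f"no majority among {results}")
    return value


# Three hypothetical versions of "absolute value".
def version_a(x):
    return x if x >= 0 else -x


def version_b(x):
    return abs(x)


def version_c(x):
    return x            # buggy: wrong for negative inputs


# An independent bug in one version is outvoted.
print(majority_vote([version_a(-5), version_b(-5), version_c(-5)]))

# If two versions share the same bug, the voter confidently returns
# the wrong answer.
print(majority_vote([version_a(-5), version_c(-5), version_c(-5)]))
```

The second call is the situation the experiment uncovered: when different teams stumble on the same hard case, the majority is wrong and voting offers no protection.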

Living with unreliable software. Let's face facts: We'll have to learn to live with software that is at least a little bit unreliable. Here are four stratagems we can use to make the situation more palatable. First, improve programming practices. Use safer programming languages. As Nancy Leveson says, the easiest way to find out what features are bad is just to look at C; it has all of them. Boeing developed systems in both C and Ada. The Ada systems were not only cheaper to produce, but had many fewer errors. C++ with its dangling pointers and memory leaks isn't much better. For a safety-critical system, Java or Ada 95 would be a much better choice.

Second, program more safely. Any memory that's unused should be initialized to a pattern that will leave the system in a safe state, should it be referenced by accident. Set critical flags and conditions as close as possible to the code they protect. Decouple systems. A highly coupled system is one that's highly dependent: A failure in one component rapidly affects others. Grade separations decrease accidents because they decouple traffic. This principle seems lost on the engineers who rebuilt the Raleigh Beltline; they took out grade separations at three exits and put up traffic lights. You may write code differently if you're opting for safety. For example, you can put code in ROM or write-restricted RAM. But there's a tradeoff--less tightly coupled systems are less efficient.

Third, make the role of software "not too critical." In computer science, we are under a handicap compared to other engineering disciplines. In other fields, they know that things will wear out and fail. But, theoretically, we can get software "right." Nancy Leveson opines, "Maybe after 40 years of trying, it is time to be a little more realistic."

Reliability requirements should be modest enough so that reliability can be demonstrated before a system is deployed. One option is relying on non-computer-controlled backup systems. A safety backup usually performs simpler functions than the main system, so it can be built more reliably. It could be built with different technology, or use other sensors, actuators and power sources. For example, a system can use gravity or another physical property to ensure that a vent valve opens when pressure is excessive. Then, the probability that both systems will fail simultaneously is minimized.

Finally, we can learn to live with more modest overall system safety. Software may still be safer than the non-software system it replaces. In a fly-by-wire aircraft, pilots control the aircraft through a computer that may modify their commands before acting on them. The Airbus A320 and Boeing 777 are examples. The possibility that software may cause accidents has to be weighed against the likelihood that it may avoid mishaps that would otherwise be caused by pilot error. Drive-by-wire systems will avert many auto accidents, but will be more of a challenge than fly-by-wire. Pilots have to be highly trained and physically fit, but anyone can drive a car. Suppose a drive-by-wire system is in control of a car that has just had a blowout, and the driver attempts to swerve to avoid an obstacle; what should it do? Robotic surgery will soon be feasible. There may be bugs, but human surgeons also have fairly high failure rates. Anyway, robots will soon be able to perform surgery that is beyond the capabilities of humans.

Summary. To end our consideration of software safety, let's review a few of the main points. Safety problems are likely if software is tested inadequately or deployed too rapidly. Other problems arise when the programmer ignores relevant conditions, like the equator, daylight-saving time, or leap years. Software is logically more complex than other engineering systems, so it is all the more necessary to get the design right. In practical terms, it's impossible to know that a program is correct, so one must at least follow good design principles and write code in languages without well-known safety vulnerabilities. If these principles are adhered to assiduously, it should be possible to build computerized systems that are at least safer than the non-computerized systems they replace.