Week 15: AI and Expert Systems

Programmed trading and the '87 crash. October 19, 1987 was a historic day on Wall Street. The Dow Jones Industrial Average plunged 508 points, losing 22% of its value in a single day. This was far worse than any day during the 1929 crash. The stock market recovered the next year, but several investigations ensued. Much of the blame fell on "programmed trading."

The idea of programmed trading is to make money from small differences between the trading prices of stocks on the New York exchange and futures traded in Chicago. This is called "arbitrage." "Index arbitrage" was the predominant form in 1987: prices of stocks were compared with stock-index futures, and when one price moved above the other, traders would sell the higher-priced one and buy the lower-priced one. The profit per share was minute, but by buying and selling millions of shares, traders could make a worthwhile profit. This strategy would not have been viable without computers. It couldn't be done by trading just a few stocks; that would have meant buying many shares of individual companies, causing a short-term imbalance in supply and demand that would depress the price of the stocks being sold and inflate the price of the stocks being bought. To employ arbitrage successfully, hundreds of trades had to be made in a matter of minutes, and computers were the only way to accomplish this.
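To make the mechanics concrete, here is a minimal sketch in Python of how such a decision rule might look. The prices, the threshold, and the function name are hypothetical illustrations, not a reconstruction of any actual 1987 trading program.

    # A minimal sketch of an index-arbitrage decision rule (hypothetical
    # prices and threshold; not a reconstruction of any 1987 system).

    def index_arbitrage_signal(basket_price, futures_price, threshold=0.25):
        """Compare the cash value of the stock basket underlying an index
        with the price of the corresponding futures contract; if they
        diverge by more than the threshold, sell the expensive side and
        buy the cheap side."""
        spread = futures_price - basket_price
        if spread > threshold:
            return "sell futures in Chicago, buy the stock basket in New York"
        if spread < -threshold:
            return "buy futures in Chicago, sell the stock basket in New York"
        return "no trade"

    # The per-share edge is tiny, so the strategy pays off only when the
    # whole basket (hundreds of stocks) is traded within minutes,
    # which is why computers are essential.
    print(index_arbitrage_signal(basket_price=2246.50, futures_price=2247.10))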

Just as in last week's cases of the Therac-25 and the London Ambulance Service, these programs had serious and unforeseen consequences. Unlike last week's scenarios, these programs were not malfunctioning. They were doing exactly what they were supposed to. But so many programs were looking at the same data at the same time, and coming to similar conclusions, that the effects were devastating.

The investigations following the crash led to several circuit-breakers being put into place. The most important was the rule that after the Dow goes up or down by 50 points in one day, all index arbitrage must be "stabilizing." That is, once the Dow has moved 50 points, sell orders can only be entered when stocks tick back upward, and buy orders only when stocks are falling. This prevents a program from selling a stock that is falling and driving it down still further. The circuit-breakers have been quite successful. The 50-point rule was tripped 44 times in its first two years. Now that the market average is much higher, 50 points represents less than a 0.5% change in the Dow, and the circuit-breakers come into play on at least two-thirds of all trading days.
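The stabilizing requirement amounts to a simple tick test, sketched below. This is a simplification for illustration; the function and its arguments are assumptions of this sketch, not the exchange's actual rule text.

    # A sketch of the stabilizing tick test described above.  The function
    # and arguments are illustrative; they simplify, rather than reproduce,
    # the actual exchange rule.

    def arbitrage_order_allowed(dow_change_today, side, last_tick):
        """Once the Dow has moved 50 points in either direction, index-
        arbitrage orders must be stabilizing: sell orders only after an
        uptick, buy orders only after a downtick."""
        if abs(dow_change_today) < 50:
            return True                   # circuit-breaker not tripped
        if side == "sell":
            return last_tick == "up"      # must not add to selling pressure
        if side == "buy":
            return last_tick == "down"    # must not add to buying pressure
        return False

    print(arbitrage_order_allowed(-63, "sell", last_tick="down"))  # False: blocked
    print(arbitrage_order_allowed(-63, "sell", last_tick="up"))    # True: stabilizing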

Risks of correct programs: During this course, we've studied several kinds of risks from programs--intentional misuse, as in hacking; unforeseen consequences of misuse, as in the Internet worm; and incorrect inputs, as in the Dutch chemical plant. In all of these cases the unexpected consequences were severe, but they were all due to errors or misuse. The 1987 stock-market crash shows that even a correct program is not necessarily safe. The programs were supposed to sell losing stocks, and they did. The more the stocks fell, the more quickly they were sold, and the selling frenzy fed on itself. The programs were not breaking the law or the rules of the stock exchange; they were trying to earn money. Who's at fault? Not the programmers. But in old-fashioned trading, there was a limit to how fast people could move.

It's not only stock traders who want to make decisions quickly; the military does too. The armed forces want to make decisions faster and bring more information to bear on them, so there will be pressure to put more decision-making authority in the hands of a computer. But any such program incorporates many assumptions that may not be subjected to scrutiny unless they become critical. It might be a good idea to re-evaluate them at the time of battle in light of new evidence, but there won't be time to do that.

In any program, you can make assumptions that you don't really test, as in the American air-traffic control software that couldn't be used at Heathrow because it couldn't deal with planes crossing 0° longitude. Suppose AI programs were used to decide when to initiate an attack. We want to strike first if and only if the enemy is going to attack us. If computers are not used, people will be making the decision with apprehension about the consequences of an attack. A program, though, may decide that the risk of going to war tomorrow is greater than the risk today--because, say, the number of lives we would lose would be greater, or the number the enemy would lose would be less. It will make up its mind based on decision rules thought up years earlier, before all of the present circumstances were appreciated.

Life-and-death medical decisions. No computer program has started a war yet, but every day computers are making life-and-death medical decisions. In the best of circumstances, these programs save lives. Abbott-Northwestern Hospital in Minneapolis used diagnostic software to flag 1800 medication orders from doctors as inappropriate. LDS Hospital in Salt Lake City used a new computerized drug-interaction system for administering post-surgical antibiotics. It decreased the incidence of adverse drug reactions from 5% to 0.2%, and reduced the post-surgical infection rate from 1.8% to 0.4%.

More problematic is the use of software to decide who gets treatment and who doesn't. APACHE stands for Acute Physiology and Chronic Health Evaluation; APACHE III, developed at George Washington University and introduced in 1991, is a program that predicts the probability that a severely ill patient will die in the hospital. It has proved much better than physicians at predicting who will die. Researchers compared doctors' mortality estimates with APACHE's on 850 patients in intensive care units. Doctors predicted that 25.5% would die; APACHE predicted 19.7%; 20.7% actually did die. Physicians identified 46 patients as having a 90% risk of dying, but only 62% of them died. APACHE rated 16 patients as having a 90% chance of dying in the hospital, and all of them died.
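As a quick check of these calibration figures, the following back-of-the-envelope Python sketch uses only the numbers quoted above.

    # A back-of-the-envelope check of the calibration figures quoted above,
    # using only the numbers in the text (850 ICU patients).

    actual_mortality = 0.207              # 20.7% actually died

    predictions = {
        "physicians": 0.255,              # predicted 25.5% would die
        "APACHE III": 0.197,              # predicted 19.7% would die
    }

    for who, predicted in predictions.items():
        error = (predicted - actual_mortality) * 100
        print(f"{who}: predicted {predicted:.1%} vs {actual_mortality:.1%} actual "
              f"({error:+.1f} percentage points)")

    # physicians: predicted 25.5% vs 20.7% actual (+4.8 percentage points)
    # APACHE III: predicted 19.7% vs 20.7% actual (-1.0 percentage points)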

What is the advantage of using a program like APACHE? Well, resources are limited, and it is better to direct them to people who have a good chance of living. It can help avoid performing expensive procedures on patients who are likely to die anyway. And it can help decide when a patient no longer needs intensive care, which costs $2,500 a day. Dr. Charles Watts, head of the critical-care unit at the University of Michigan Medical Center, says he saved $2.5 million in 1994 because the APACHE system's predictions let him move some recovering patients out of intensive care earlier. For these patients, it is actually better to leave intensive care, where the risk of infection is high. If a patient is very likely to die, the prediction makes it easier for families to decide to withdraw life-support systems. One 77-year-old woman was predicted seven days in a row to have a 99% chance of dying; her husband decided to take her off the ventilator, and she died almost immediately. Finally, the program can save lives by affecting treatment. If the mortality risk rises from 50% to 60%, the doctor knows he must consider a different treatment, and upon seeing the number change, he may be more likely to listen to a nurse who tells him so.

Ethical considerations in computerized decision-making. But should computers be used to make life-or-death decisions? They may incorporate untested assumptions--remember 0° longitude. On the other hand, a program doesn't discriminate against people because of age, race, or ability to pay. Doctors often weigh age heavily in treatment decisions, yet age doesn't have as much to do with survival as most doctors think. But suppose the program recommends the wrong thing? If the doctor disregards the program's advice because of intuition, and the patient dies, the doctor may be sued. So once a program is in place, it will be hard for doctors to use their own judgment.

When should AI systems be used in fields where ethics is a concern? Here are two proposed rules: (i) computers should never make any decisions that humans want to make, and (ii) computers should never make any decisions that humans cannot override. Suppose it were true that a computer had a lower error rate than a doctor. If a computer always does something, over time we lose the skill to do it ourselves--a loss of competence. We'd want doctors to keep up their skills, and learn new skills so that they could teach them to computers. But if computers do something very well, humans might no longer need those skills. For example, it is not clear that students need to know all the "tricks" of integration; they can leave that to the mathematicians, who can advise the programmers who write symbolic-algebra programs. Take the case of Mathematica: we might rather have it, and not humans, do the calculations for a new bridge, because it is more reliable. Suppose that when computers drive cars, the accident rate plummets. At what point do we stop giving humans drivers' licenses? At what point do we say, "Your freedom to drive isn't worth it; we want to be safe on the highways"?

The second proposed rule is that computers should never make any decisions that humans cannot override. This asks: should a person always be able to "pull the plug"? You can imagine systems that tend to get screwed up when humans override them. Some commercial airliners, like the Boeing 777, already fly "by wire": a computer may modify the commands given by the pilot in a way that the pilot can't override. Is this good? How sure do we have to be that a program is correct before denying the operator the chance to override its decisions? And how can we ever be that sure of its correctness?

Guidelines for the use of expert systems. Here are four guidelines about when an area is suitable for an expert system. First, the knowledge required to make a decision should be fairly well circumscribed. For example, analyzing Swan-Ganz catheter readings is easier than managing a patient in cardiogenic shock, which in turn is easier than managing a patient with multi-organ failure. Second, people who are experts in the field can reach decisions much more rapidly than non-experts. Third, there is considerable value in reaching accurate solutions. If not, it's not cost-effective to write software to do it. Finally, the data required as input to the decision can be described objectively. If they can't be, we don't have any assurance that the program would reach the right conclusion.

What is the ethical responsibility for decisions made by programs? Attributes of a moral decision-maker include knowledge of all relevant facts, freedom from bias, freedom from disturbing passion, and the ability to vividly imagine the feelings and circumstances of the parties involved. Computers might be able to achieve the first three of these, but the last is problematic.

Intelligence. Can a program know all the relevant facts? A program can embody a lot of knowledge, but it will never know everything the expert knows. In particular, it can't include the unconscious aspects of the expert's mind. Nor will its domain of knowledge be as broad as the expert's, since a program concentrates on a small area of knowledge.

Unbiasedness. A program generally does not have prejudices, although it might be written to incorporate bias--for example, the biases of the knowledge engineer in the decision of whether to go to war. But at least we can check a program for bias more easily than we can check a human.

Suppose a program tries to capture the diagnostic ability of a physician. The knowledge engineer must break down the doctor's intuitive thinking into a series of logical steps. But moral judgment resists that kind of decomposition: one moral rule can take precedence over another in certain circumstances, and no fixed set of steps anticipates them all. Since there is no "artificial ethics," humans should always be in control.

Legal responsibility for decisions made by programs. If the decision of an expert system is wrong and leads to serious injury or loss, who is responsible? The domain expert? The knowledge engineer? The vendor? The end user of the system? Some cases are easy. If the knowledge engineer introduces bugs because he doesn't understand the domain expert, then he is clearly responsible. In some cases, the user may misinterpret the output of the program. Some cases are harder. Experts may disagree; suppose the domain expert's opinion is not held by other domain experts. The vendor must assume at least some responsibility if the experts or knowledge engineer are proved wrong. The vendor is only immune from liability if the user is clearly negligent, careless, or untrained.

But there have not yet been many cases in this area. One vendor, Medical Software Consortium, a St. Louis supplier of medical systems, balked at developing an automated stretcher with diagnostic capabilities. No one wants to be the test case for an expert-system product-liability suit. Strict liability holds the vendor liable if it sells a product in a defective condition. However, strict liability does not apply to services, which fall under malpractice provisions. So strict liability might not apply when the injured party is not an end-user of a system. Physicians clearly cannot abdicate autonomy and responsibility by delegating it to an expert system. They must be more than passive users. The law would hold physicians ultimately accountable for their judgments. Vendors, too, should take precautions. They should communicate reasonable expectations, avoid hyperbole, provide a clear statement of the risks and limitations, provide extensive user training, disseminate bug fixes promptly, and recall a product if it causes major problems.

Artificial intelligence opens new vistas for computers to improve our lives. But given the complexity of these systems, it is not possible to foretell their effect with confidence. Therefore we should go slowly in introducing these systems, making sure we understand their impact before deploying them on a large scale.