


Monte Carlo Methods


Apr00: Monte Carlo Methods

Matt is a member of the University of Oregon's Computational Intelligence Research Laboratory, which he founded in 1992 and directed until 1996. He can be reached at [email protected].


Bridge, Poker, and Go make up that handful of classic games of mental skill that have thus far eluded competent computer play. Bridge programs have been especially weak, leading perennial world Bridge champion Bob Hamman (six-time winner of the Bermuda Bowl) to accurately comment that "they would have to improve to be hopeless."

In general, performance at Bridge is measured by playing the same deal twice or more, with the cards held by one pair of players being given to another pair during the replay and the results then being compared. A "team" in a Bridge match thus typically consists of two pairs, with one pair playing the North/South (N/S) cards at one table and the other pair playing the East/West (E/W) cards at the other table. The results obtained by the two pairs are added; if the sum is positive, the team wins this particular deal, and if negative, they lose it.

Typically, the numeric sum of the results obtained by the two pairs is converted to International Match Points (IMPs). The purpose of the conversion is to diminish the impact of single deals on the total, so that an abnormal result on one particular deal does not have an unduly large impact on the result of an entire match.
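To give a feel for how compressive this conversion is, here is a small C sketch of an IMP-conversion routine. The breakpoints below approximate the standard IMP scale and are included only to illustrate the shape of the compression; consult a current scoring reference for the authoritative table.

/* Sketch: convert a raw point swing on one deal to IMPs.
   The band boundaries approximate the standard IMP scale;
   treat them as illustrative rather than authoritative. */
#include <stdio.h>
#include <stdlib.h>

static int imps(int swing)
{
    /* upper bound of each band of point swings; index = IMPs awarded */
    static const int bound[] = { 10, 40, 80, 120, 160, 210, 260, 310, 360,
                                 420, 490, 590, 740, 890, 1090, 1290, 1490,
                                 1740, 1990, 2240, 2490, 2990, 3490, 3990 };
    int sign = swing < 0 ? -1 : 1;
    int mag  = abs(swing);
    int i;
    for (i = 0; i < (int)(sizeof bound / sizeof bound[0]); i++)
        if (mag <= bound[i])
            return sign * i;
    return sign * 24;           /* very large swings are capped at 24 IMPs */
}

int main(void)
{
    printf("+450 point swing -> %+d IMPs\n", imps(450));
    printf("-1100 point swing -> %+d IMPs\n", imps(-1100));
    return 0;
}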

One of my colleagues (Jeff Goldsmith) reports that the standard deviation on a single deal in Bridge is about 5.5 IMPs, so that if two roughly equal pairs were to play the deal, it would not be surprising if one team beat the other by about this amount. It also appears that the difference between an average club player and a world class expert is about 2 IMPs (per deal played). The strongest Bridge playing programs thus far appear to be slightly weaker than average club players.

Before I started working on GIB, the Bridge program I discuss here, other programmers had attempted to duplicate human Bridge-playing methodology in that their goal had been to recognize the class into which any particular deal falls: finesse, end play, squeeze, and so on. In retrospect, perhaps we should have expected this approach to have limited success; certainly chess-playing programs that have attempted to mimic human methodology (such as Paradise) have fared poorly (see D.E. Wilkins, "Using Patterns and Plans in Chess," Artificial Intelligence, 1980).

GIB works differently. Instead of modeling its play on techniques used by humans, GIB uses a brute-force search to analyze the situation in which it finds itself. More specifically, GIB reduces Bridge to its perfect-information variant by dealing a variety of hands for the opponents and then analyzing each resulting deal with the complete layout visible. GIB then selects the move that works out best, on average, over the sample of deals that was created.

This general approach is known as a "Monte Carlo technique" because of its gambling flavor: If the deal sample created accurately reflects the distributions of hidden cards that GIB is likely to encounter, the play suggested is likely to be a good one. If not, the recommended play is likely to be fairly weak. The general idea of applying Monte Carlo techniques to Bridge card play appears to have been first suggested by D. Levy (see "The Million Pound Bridge Program," edited by D. Levy and D. Beal, Heuristic Programming in Artificial Intelligence, Ellis Horwood, 1989).

Card play is only half of Bridge; there is bidding as well. It is possible to use search-based techniques here also, although there is no escaping the fact that a large database of bids and their meanings is needed by the program. Partly because this work is so dependent on the bidding database (and partly because GIB's success as a bidder has been more modest), I will focus here on aspects of card play alone.

Monte Carlo Methods and Card Play

To understand the card play phase of a Bridge deal, consider first Bridge's perfect information variant, the game where all of the players are playing double dummy, in that they can see which cards the other players hold. In this case, analysis of the game tree is fairly straightforward, although the tree is fairly large because the raw branching factor (the average number of legal plays in any given position) is about four, leading to an overall tree size of approximately 4^52, on the order of 10^30 positions. Standard techniques that can be found in any introductory artificial intelligence textbook (such as alpha-beta pruning and the introduction of a transposition table) bring the branching factor down to about 1.7; techniques that have been described in the literature but are generally less well known (such as narrowness, partition search, and the use of the killer heuristic) bring it down to about 1.2, leading to search spaces of approximately 18,000 nodes per deal on average (see M.L. Ginsberg, "Partition Search," and A. Plaat, J. Schaeffer, W. Pijls, and A. de Bruin, "Exploiting Graph Properties of Game Trees," both in Proceedings of the Thirteenth National Conference on Artificial Intelligence, 1996).
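For readers who want the flavor of that search machinery, the sketch below shows a fixed-depth alpha-beta search with a small transposition table over a made-up toy game (a "position" is just an integer with two successors, and leaf values come from a hash). It is meant only to illustrate the basic shape, not GIB's actual engine, which layers partition search and the other refinements mentioned above on top of it.

/* Sketch: fixed-depth alpha-beta (negamax form) with a small transposition
   table, over a toy game in which a position is an integer with two
   successors and hashed leaf values. */
#include <stdio.h>

#define TT_SIZE 4096
enum { EXACT, LOWER, UPPER };

typedef struct { unsigned pos; int depth, value, flag, valid; } TTEntry;
static TTEntry tt[TT_SIZE];

/* toy leaf evaluation: any deterministic function will do for the sketch */
static int evaluate(unsigned pos) { return (int)((pos * 2654435761u) >> 28) - 8; }

static int alphabeta(unsigned pos, int depth, int alpha, int beta)
{
    TTEntry *e = &tt[pos % TT_SIZE];
    if (e->valid && e->pos == pos && e->depth >= depth) {
        if (e->flag == EXACT) return e->value;                 /* cached   */
        if (e->flag == LOWER && e->value >= beta)  return e->value;
        if (e->flag == UPPER && e->value <= alpha) return e->value;
    }
    if (depth == 0) return evaluate(pos);

    int alpha0 = alpha, best = -1000;
    for (unsigned m = 0; m < 2; m++) {                         /* two moves */
        int v = -alphabeta(pos * 2 + 1 + m, depth - 1, -beta, -alpha);
        if (v > best)  best = v;
        if (best > alpha) alpha = best;
        if (alpha >= beta) break;                              /* cutoff    */
    }
    e->pos = pos; e->depth = depth; e->value = best; e->valid = 1;
    e->flag = best <= alpha0 ? UPPER : best >= beta ? LOWER : EXACT;
    return best;
}

int main(void)
{
    printf("root value: %d\n", alphabeta(0, 12, -1000, 1000));
    return 0;
}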

One way we might now proceed in a realistic situation would be to deal the unseen cards at random, biasing the deal so that it was consistent both with the bidding and with the cards played thus far. We could then analyze the resulting deal double dummy, and decide which of our possible plays was the strongest. Averaging over a large number of such Monte Carlo samples is one possible way of dealing with the imperfect nature of Bridge information. Example 1 presents an algorithm based on this approach.
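In outline, the loop looks something like the following C sketch. The Deal type, the constrained deal generator, and the double-dummy solver are stand-in stubs here rather than GIB's actual routines; the sketch shows only the structure of the sampling and averaging, and the interesting work is of course hidden inside the two stubs.

/* Minimal sketch of the Monte Carlo card-play loop: deal the unseen cards
   many times, score each deal double dummy, and pick the candidate play
   with the best average result. */
#include <stdio.h>
#include <stdlib.h>

#define SAMPLES   100
#define NUM_MOVES 4                 /* candidate plays in the current position */

typedef struct { unsigned seed; } Deal;   /* placeholder for a full deal */

/* stub: deal unseen cards consistent with the bidding and play so far */
static Deal random_consistent_deal(void) {
    Deal d = { (unsigned)rand() };
    return d;
}

/* stub: double-dummy score of making move m on deal d (higher is better) */
static int double_dummy_score(int m, Deal d) {
    return (int)((d.seed >> m) % 14);
}

int main(void)
{
    double total[NUM_MOVES] = { 0 };
    int i, m, best = 0;

    srand(2000);
    for (i = 0; i < SAMPLES; i++) {
        Deal d = random_consistent_deal();
        for (m = 0; m < NUM_MOVES; m++)
            total[m] += double_dummy_score(m, d);
    }
    for (m = 1; m < NUM_MOVES; m++)
        if (total[m] > total[best])
            best = m;

    printf("best candidate play: %d (average score %.2f)\n",
           best, total[best] / SAMPLES);
    return 0;
}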

There are a variety of known problems with this method; most obvious is that the approach never suggests making an information-gathering play. After all, the perfect-information variant on which the decision is based invariably assumes that the information will be available by the time the next decision must be made. Instead, the tendency is for the approach to simply defer important decisions; in many situations this may lead to information gathering inadvertently, but the amount of information acquired will generally be far less than other approaches might provide.

In spite of this limitation, GIB's card play is at the level of a human expert. This was demonstrated fairly dramatically in an invitational event at the 1998 world Bridge championship in France, where GIB joined a field of 34 of the best card players in the world to face 12 problems in card play over the course of two days. GIB was leading at the halfway mark, but played poorly on the second day (perhaps the pressure was too much for it) and finished 12th. The human participants were given 90 minutes to play each deal, although they were penalized slightly for playing slowly. GIB played each deal in about 10 minutes, using a Monte Carlo sample size of 500.

Since its performance in France, GIB has become a commercial product, running under both Windows and Linux. The GIB package includes binaries for the Bridge engine (written in C), and source code for the user interface (written in Java). GIB's bidding database, written in a specialized language, is available free of charge at http://www.gibware.com/bidding/, although no technical support is available for the bidding database in isolation. For more information, see http://www.gibware.com/.

There are two important technical remarks that must be made about the Monte Carlo algorithm before proceeding. First, note that I was cavalier in simply saying, "Construct a set D of deals consistent with both the bidding and play of the deal thus far." To construct deals consistent with the bidding, you first simplify the auction as observed, building constraints describing each of the hands around the table. You then deal hands consistent with the constraints using a deal generator that deals unbiased hands given restrictions on the number of cards held by each player in each suit. This set of deals is then tested to remove elements that do not satisfy the remaining constraints, and each of the remaining deals is passed to the bidding module to identify those for which the observed bids would have been made by the players in question. This process typically takes one or two seconds to generate the full set of deals needed by the algorithm.
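The sketch below illustrates only the generate-and-test shape of that process. The particular constraints (five or more cards in a nominated suit and 6-10 high-card points for one defender) are invented for the example, and GIB's dealer is considerably smarter, constructing hands that satisfy the length constraints directly rather than rejecting random ones.

/* Sketch of generate-and-test deal construction for the two unseen hands.
   All constraints and card assignments here are invented for illustration. */
#include <stdio.h>
#include <stdlib.h>

#define UNSEEN 26                   /* cards held by the two hidden hands */

/* card i: suit = i / 13 (0 = spades .. 3 = clubs), rank = i % 13 (0 = ace) */
static int unseen[UNSEEN];

static void shuffle(int *a, int n) {
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1), t = a[i]; a[i] = a[j]; a[j] = t;
    }
}

/* invented constraint: first 13 shuffled cards are "West"; require
   five or more hearts and 6-10 HCP (A=4, K=3, Q=2, J=1) */
static int consistent_with_auction(const int *west) {
    int hearts = 0, hcp = 0;
    for (int i = 0; i < 13; i++) {
        if (west[i] / 13 == 1) hearts++;
        int rank = west[i] % 13;
        if (rank < 4) hcp += 4 - rank;
    }
    return hearts >= 5 && hcp >= 6 && hcp <= 10;
}

int main(void)
{
    int kept = 0, tries = 0;
    for (int i = 0; i < UNSEEN; i++) unseen[i] = i * 2;  /* pretend these are unseen */

    srand(2000);
    while (kept < 50 && tries < 100000) {   /* gather a sample of 50 deals */
        shuffle(unseen, UNSEEN);
        tries++;
        if (consistent_with_auction(unseen))
            kept++;                          /* pass this deal on to the solver */
    }
    printf("kept %d of %d candidate deals\n", kept, tries);
    return 0;
}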

To conform to the card play thus far, it is impractical to test each hypothetical decision against the card play module itself. Instead, GIB uses its existing analyses to identify mistakes that the opponents might make. As an example, suppose GIB plays a low card, say the five of a suit. The analysis indicates that 80 percent of the time the next player (say West) holds the king of that suit, and that it is a mistake for West not to play it. If West in fact does not play the king, Bayes's rule is used to adjust the probability that West holds the king at all. The probabilities are then modified further to include information revealed by defensive signaling (if any), and the adjusted probabilities are finally used to bias the Monte Carlo sample, replacing the evaluation Σ_d s(m,d) with Σ_d w_d s(m,d), where w_d is the weight assigned to deal d. More heavily weighted deals thus have a larger impact on GIB's eventual decision.
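The following sketch walks through both steps with invented numbers: Bayes's rule first revises the 80 percent figure after West declines to play the king, and the revised probability is then used to weight the sampled deals in the sum Σ_d w_d s(m,d). The deal scores and the assumed 10 percent chance of a defender overlooking the king are made up purely for illustration.

/* Sketch of the two weighting steps described above, with invented numbers. */
#include <stdio.h>

#define SAMPLES 4
#define MOVES   2

int main(void)
{
    /* Bayes step: prior 0.80 that West holds the king; assume (invented)
       that a player holding it fails to play it only 10% of the time. */
    double prior            = 0.80;
    double miss_given_holds = 0.10;     /* didn't play it, although held */
    double miss_given_lacks = 1.00;     /* can't play a card one lacks   */
    double posterior = miss_given_holds * prior /
        (miss_given_holds * prior + miss_given_lacks * (1.0 - prior));
    printf("P(West holds the king) drops from %.2f to %.2f\n", prior, posterior);

    /* Weighting step: deals where West holds the king get weight equal
       to the posterior, the others 1 - posterior (all values invented). */
    int    west_has_king[SAMPLES] = { 1, 0, 0, 1 };
    double score[MOVES][SAMPLES]  = { { 2, 1, 1, 2 },    /* tricks for move 0 */
                                      { 1, 2, 2, 1 } };  /* tricks for move 1 */
    for (int m = 0; m < MOVES; m++) {
        double total = 0, weight_sum = 0;
        for (int d = 0; d < SAMPLES; d++) {
            double w = west_has_king[d] ? posterior : 1.0 - posterior;
            total      += w * score[m][d];
            weight_sum += w;
        }
        printf("move %d: weighted average %.2f tricks\n", m, total / weight_sum);
    }
    return 0;
}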

The second technical point regarding the algorithm itself involves the fact that it needs to run quickly and that it may need to be terminated before the analysis is complete. For the former, there are a variety of greedy techniques that can be used to ensure that a move m is not considered if we can show that Σ_d s(m,d) ≤ Σ_d s(m',d) for some other move m'. The algorithm also uses preliminary searches that artificially reduce the branching factor of the search to ensure that a low-width (and possibly incorrect) answer is available if a high-width search fails to terminate in time. Results from the low- and high-width searches are combined when time expires.
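One simple greedy test of this kind is a per-deal dominance check: if some other candidate scores at least as well on every deal examined so far, the dominated move can never overtake it and may be discarded. A sketch, with invented scores standing in for the double-dummy results, follows.

/* Sketch of a per-deal dominance check for pruning candidate moves. */
#include <stdio.h>

#define MOVES 3
#define DEALS 4

int main(void)
{
    int score[MOVES][DEALS] = { { 2, 1, 1, 2 },
                                { 2, 2, 1, 2 },    /* dominates move 0 */
                                { 3, 0, 1, 2 } };  /* incomparable     */
    int pruned[MOVES] = { 0 };

    for (int m = 0; m < MOVES; m++)
        for (int mp = 0; mp < MOVES && !pruned[m]; mp++) {
            if (mp == m || pruned[mp]) continue;
            int dominated = 1;                     /* is mp >= m on every deal? */
            for (int d = 0; d < DEALS; d++)
                if (score[m][d] > score[mp][d]) { dominated = 0; break; }
            if (dominated) pruned[m] = 1;
        }

    for (int m = 0; m < MOVES; m++)
        printf("move %d: %s\n", m, pruned[m] ? "pruned" : "kept");
    return 0;
}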

An Example

It is difficult to know exactly what to include by way of an example, since any interesting example will necessarily require a significant amount of Bridge knowledge on the part of the reader. Let me include just one example, with South to lead needing two additional tricks at a no-trump contract, as in Figure 1.

The positions of one key queen and a jack are known, but the position of a second queen is not. The solution is to lead a diamond from hand, discarding the 10 from dummy. The defenders are then helpless to take more than two tricks, independent of the location of the missing queen.

GIB arrives at this conclusion by examining a sample of hands: in the deals where West holds the missing queen, it suffices to play either the club or a diamond, while if East holds it, the diamond play and the 10 pitch are required. GIB concludes from this that the diamond lead and the 10 discard are the correct play in the given situation.

There are two interesting things to note about this deal. First, the ending itself, while not overwhelmingly complex, has not (to the best of my knowledge) appeared in the Bridge literature. This is a legitimate discovery of something new by a Bridge-playing program. Second, GIB was aiming for this situation from quite early in the hand. Figure 2 shows the full deal.

South (GIB) dealt at unfavorable vulnerability, and the auction went P-2 (weak)-X-P-3NT-all pass. The opening lead was the K, ducked by GIB, and West now switched to a small heart. East won the ace and returned a spade, GIB winning and eventually reaching the position described earlier.

Other Games

This has been very much an article about Bridge; I have left essentially untouched the question: "To what extent can the basic Monte Carlo technique be applied to other games of imperfect information?" Although I can make educated guesses in this area, the experimental work on which this article is based deals with Bridge exclusively.

The primary drawback of the Monte Carlo approach appears to be that it does not encourage information-gathering actions, instead tending to defer decisions on the grounds that perfect information will be available later. This leads to small but noticeable errors in GIB's card play. Hearts appears to be similar to Bridge in this area, and I would expect it to be possible to translate GIB's success from one game to the other.

The Monte Carlo approach is known to be successful in both Backgammon and Scrabble, where the strongest machine players simulate possible dice rolls or tile draws several moves ahead in order to select a move. These games clearly meet the criteria of the previous paragraph, since it is impossible to gather information in advance about the stochastic processes underlying the game. For other games, however, the problems may be more severe. Poker, for example, depends heavily on the ability to make information gathering maneuvers. How effective Monte Carlo techniques are in cases such as this remains to be seen.

Acknowledgment

The GIB work has been supported by Just Write Inc. During its development, I have received invaluable help from members of both the Bridge and computer science communities. I am especially indebted to Chip Martel, Rod Ludwig, Alan Jaffray, Hans Kuijf, and Fred Gitelman, but also to Bob Hamman and Eric Rodwell, to David Etherington, Bart Massey, and the other members of CIRL, to Jonathan Schaeffer and Rich Korf, and to Jeff Goldsmith, Thomas Andrews, and many other members of the rec.games.bridge community.

DDJ

