Saturday, 31 January 2015

Poker Perfection

On June 8, the paper "Heads-up Limit Hold’em Poker is Solved" by Michael Bowling, Neil Burch, Michael Johanson (University of Alberta), and Oskari Tammelin (independent consultant from Finland) appeared in the prestigious journal Science. This paper reports on an outstanding research result: a popular version of two-player poker has been solved by computers. The University of Alberta program, named Cepheus (poker.srv.ualberta.ca), will not lose in the long run against any opponent (and likely will win since humans make mistakes). Computer perfection. This was the culmination of research into computer poker at the University of Alberta that began in 1995!
Many games have been solved before. Tic-tac-toe is a trivial game for which the perfect-play result (draw) and a strategy for achieving that result are known. The most challenging popular game that has been solved is 8x8 checkers (also called draughts). My team started looking at checkers in 1989, became the strongest checker-playing entity in the world in 1994, and achieved perfection in 2007. Perfect play leads to a draw. One of the challenges of checkers is the sheer size of the search space – the roughly 500,000,000,000,000,000,000 (10 to the 21st power, or 500 billion billion) possible positions.
On one dimension, the checkers problem is harder to solve than poker. Two player Texas Hold-em poker has a search space of "only" 10 to the 18th power (a billion billion scenarios). However, poker positions can be classified into so-called information sets, of which there are only 10 to the 13th power to solve.
On another dimension, poker is harder to solve. Poker has to deal with imperfect information (the opponent's cards are hidden) and randomness (the deal of the cards), whereas checkers is a perfect information game (all information about the state of the game is available to both players) without randomness. This distinction has consequences for the computation of the perfect strategy. In checkers, each position could take on only one of three values: win, loss or draw.  In poker, betting decisions are necessarily based on probabilities (because of the unknown information and the random scenarios that can occur). The Cepheus calculations had to iterate on each betting decision, gradually refining the probability of folding, calling, and/or raising. 
From the impact point of view, solving poker has greater potential to change the world. Poker is a microcosm of many real-world problems, including negotiation and strategy applications. Many domains can be couched as "games", including politics, environment, epidemics, and military. In fact, after we solved checkers in 2007, the U.S. government sponsored two workshops on Solving Games of National Importance. As John von Neuman, the father of the modern computer, once wrote: "Real life is not like [checkers]. Real life consists of bluffing, of little tactics of deception, of asking yourself what is the other man going to think I mean to do. And that is what games are about in my theory."
From left to right: Jonathan Schaeffer, Murray Campbell, Michael Bowling, and Gerry Tesauro.
Photo taken at the annual AAAI conference (Austin, Texas) on July 27, 2014.
The picture shows the lead authors of the programs that have solved (real or perceived) popular games. All four have a University of Alberta connection!
  • Jonathan Schaeffer -- led the creation of the checkers program Chinook, the first world champion at any game (1994). Solved checkers in 2007. At the University of Alberta since 1984.
  • Murray Campbell -- one of the three lead authors of Deep Blue, which in 1997 defeated World Chess Champion Garry Kasparov. Received his B.Sc. (1979) and M.Sc. (1981) from the University of Alberta.
  • Michael Bowling -- led the creation of Cepheus, computer perfection at two-player limit Texas Hold'em poker. Joined the University of Alberta in 2003.
  • Gerry Tesauro -- developed a landmark machine learning program, TD-Gammon, for backgammon and achieved super-human status in the later 1990s. His program is based on University of Alberta professor Rich Sutton's temporal difference learning algorithm.

And of the future? No-limit poker is harder to solve because of the wide range of possible betting scenarios. And if this is not hard enough, playing strong three-player poker is difficult. So, despite the crowning achievement of almost 20 years of research, the University of Alberta's Computer Poker Research Group has another 20 years of research ahead of it.