The games that helped AI evolve
Two early game-playing programs, Samuel Checkers and TD-Gammon, led to breakthroughs in artificial intelligence
Arthur Samuel at a computer console with a checkers board on the desk in front of him

Genetics researchers have fruit flies. Oncologists have white mice. For pioneering computer scientists studying artificial intelligence, the model organism was games: rules-based systems with defined criteria for success and failure that nonetheless demanded nuance and complex decision-making. During the second half of the 20th century, researchers at IBM used games to train some of the earliest neural networks, developing technologies that would become the basis for 21st-century AI.

Two particularly influential programs, Samuel Checkers and TD-Gammon, used checkers and backgammon to study strategy and improve their play through trial and error, much the same way human minds learn. Eventually, IBM researchers developed neural networks sophisticated enough to compete with human experts — spectacularly so in the case of Deep Blue, a chess program that became the first machine to beat a world champion, Garry Kasparov, in 1997.

In the process, these early programs taught researchers about both machine learning and the workings of organic brains. And while the leaders of the TD-Gammon and Deep Blue teams were adamant that their programs did not “think” in the human sense, they appreciated the potential of games to open up new frontiers in artificial intelligence. “Games are a nice, convenient test domain. They present us with closed or trivial problems,” said Gerald Tesauro, a researcher at IBM Hawthorne who led the TD-Gammon team.

During the 1990s, those trivial problems led to big breakthroughs.

 

The father of machine learning

When Arthur Samuel joined IBM in 1949, he was a professor and electrical engineer best known for his work with vacuum tubes. At Poughkeepsie, he began working with the 700 series of IBM computers, which he put to an unusual pursuit: checkers.

Although the IBM 701 machine on which he developed his Samuel Checkers program was among the most powerful computers of its time, its memory was not sufficient to game out every possible outcome of each move. Samuel got around this limitation by pairing a heuristic scoring function, which estimated the likelihood of winning from a given position without playing the game out to the end, with a search technique now known as “alpha-beta pruning,” which discards lines of play that cannot affect the final choice of move. Like a human player, Samuel Checkers looked as many moves ahead as it could and made its decisions from there.
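The limited look-ahead described above can be sketched as minimax search with alpha-beta pruning. The toy game tree and scores below are hypothetical stand-ins; Samuel’s program evaluated real checkers positions with a checkers-specific scoring function.

```python
# A minimal sketch of minimax search with alpha-beta pruning. Internal nodes
# alternate between the program (maximizing) and its opponent (minimizing);
# leaves carry heuristic scores instead of being played out to the end.

def alphabeta(node, depth, alpha, beta, maximizing):
    """Return the minimax value of `node`, skipping branches that
    cannot change the final decision."""
    if depth == 0 or not node.get("children"):
        return node["score"]  # depth limit or leaf: use the heuristic score
    if maximizing:
        value = float("-inf")
        for child in node["children"]:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:   # the opponent will never allow this line
                break           # prune the remaining siblings
        return value
    else:
        value = float("inf")
        for child in node["children"]:
            value = min(value, alphabeta(child, depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# A tiny hypothetical game tree: two candidate moves, each with two replies.
leaf = lambda s: {"score": s}
tree = {"children": [
    {"children": [leaf(3), leaf(5)]},   # move A: opponent picks the min -> 3
    {"children": [leaf(2), leaf(9)]},   # move B: opponent picks the min -> 2
]}
best = alphabeta(tree, 2, float("-inf"), float("inf"), True)
print(best)  # -> 3: move A is preferred
```

Note how the second branch is cut short: once the opponent can hold move B to a score of 2, worse than move A’s guaranteed 3, the remaining reply never needs to be scored at all.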

Most important, Samuel introduced mechanisms by which his checkers program could learn from games it had already played. Samuel Checkers recorded each position it saw and whether that position eventually led to a win or a loss; by incorporating these values into its subsequent decisions, the program got better the more games it played. Samuel called this process “machine learning,” a term he coined that remains central to artificial intelligence today. In 1962, after it had played thousands of games against itself to develop its skill, Samuel Checkers defeated self-described “checkers master” Robert Nealey. Its subsequent record against human opponents was mixed, but the principles Samuel developed laid the groundwork for a series of breakthroughs in artificial intelligence at IBM during the 1990s.
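The record-and-reuse scheme described above can be sketched in a few lines. The position labels and two-game history below are hypothetical stand-ins for the actual checkers boards Samuel’s program stored.

```python
# A toy sketch of rote learning: record every position seen in play along
# with whether its game was eventually won, then rate positions by their
# historical win rate so later decisions can favor them.

from collections import defaultdict

class PositionMemory:
    def __init__(self):
        # position -> [wins, games] tallies
        self.stats = defaultdict(lambda: [0, 0])

    def record_game(self, positions, won):
        """After a game ends, credit every position that occurred in it."""
        for pos in positions:
            self.stats[pos][0] += 1 if won else 0
            self.stats[pos][1] += 1

    def value(self, pos):
        """Estimated win rate of a position; neutral 0.5 if never seen."""
        wins, games = self.stats[pos]
        return wins / games if games else 0.5

memory = PositionMemory()
memory.record_game(["P1", "P2"], won=True)   # hypothetical winning game
memory.record_game(["P2", "P3"], won=False)  # hypothetical losing game
print(memory.value("P1"))  # -> 1.0: seen only in a win
print(memory.value("P2"))  # -> 0.5: one win, one loss
```

Because the estimates sharpen with every game recorded, a program built this way gets better the more it plays, which is exactly the behavior Samuel observed.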

The neural network that learned to play backgammon

In 1992, IBM announced another major step in developing artificial intelligence through games: A program written by Tesauro had taught itself to play backgammon well enough to compete with professional players. That year, TD-Gammon, as it was known, went 19–19 in 38 games at a World Cup of Backgammon event — a far better performance than any backgammon program up to that point.

In some ways, TD-Gammon was an electronic brain as much as it was a computer program. It represented an early example of a neural network, a computer application comprising nodes and connections, modeled after the human brain’s neurons and synapses. The TD-Gammon network “learned” through a temporal difference algorithm, which used a delayed reinforcement approach to reward the system for a successful game. 

The TD algorithm, which was designed to mimic the way humans learn, was conceived by computer scientist Richard Sutton of GTE Laboratories. But Tesauro was the first to apply it on such a large scale. He chose backgammon with the mindset that the game’s clean-cut rules and criteria for success would help the network learn through trial and error. Running on an IBM RS/6000 workstation, TD-Gammon played approximately 300,000 games against itself over the course of a month — about three times as many as most backgammon masters play in a lifetime.
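The delayed-reinforcement idea behind the TD algorithm can be illustrated with a tiny sketch. Here a plain lookup table stands in for the neural network, and the fixed “game” A → B → C → WIN is hypothetical; TD-Gammon itself adjusted thousands of network weights using Sutton’s TD(λ) rule.

```python
# A minimal sketch of temporal-difference learning, TD(0) form: after each
# step, nudge a state's value toward the value of the state that followed
# it, so the final outcome gradually propagates backward through the game.

def td0_episode(values, episode, alpha=0.1):
    """Apply V(s) += alpha * (V(s') - V(s)) along one completed episode."""
    for state, next_state in zip(episode, episode[1:]):
        values[state] += alpha * (values[next_state] - values[state])
    return values

# Every position starts at a neutral 0.5 estimate; a win is worth 1.0.
values = {"A": 0.5, "B": 0.5, "C": 0.5, "WIN": 1.0}
for _ in range(10):  # replay the same hypothetical game ten times
    td0_episode(values, ["A", "B", "C", "WIN"])

# Positions closer to the win receive credit sooner and more strongly.
print(values["A"] < values["B"] < values["C"] < values["WIN"])  # -> True
```

The reward is delayed in exactly the sense described above: only the final position touches the true outcome directly, and earlier positions inherit credit one step at a time over repeated games.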

After each turn’s roll of the dice, TD-Gammon considered every legal move and estimated the probability that it would lead to a win. These estimates were based on the connection-strength values stored in the synapses of the neural network; if a move or series of moves had previously featured in a winning game, it got more weight in the probability estimate. These weights were adjusted after each game, enabling TD-Gammon to “learn” from wins and losses.
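The per-move loop described above amounts to greedy selection over value estimates. The position names and probabilities below are hypothetical; TD-Gammon computed its estimates with the neural network rather than a lookup table.

```python
# A toy version of TD-Gammon's move selection: score the position reached
# by every legal move with an estimated win probability, then play the
# move whose resulting position scores highest.

def pick_move(legal_positions, estimate_win_prob):
    """Greedy one-ply selection over the positions reachable this turn."""
    return max(legal_positions, key=estimate_win_prob)

# Hypothetical positions reachable after this roll, with value estimates.
estimates = {"pos1": 0.42, "pos2": 0.57, "pos3": 0.51}
print(pick_move(estimates, estimates.get))  # -> pos2
```

Combined with the weight updates after each game, this loop closes the learning cycle: better estimates produce better moves, and the outcomes of those moves refine the estimates.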

With 30,000 artificial synapses, TD-Gammon had roughly the brainpower of a sea slug. That was enough to compete at backgammon, but its abilities were dwarfed by the game-learning system that came after it: Deep Blue.

 

300,000: games TD-Gammon played against itself over the course of a month

100,000: games played in a lifetime by backgammon masters

In 1997, the IBM computer Deep Blue won a six-game series against Garry Kasparov, becoming the first machine in history to defeat a world chess champion. Since Deep Blue’s upset of Kasparov, computer chess players have become competitive with, if not superior to, human grandmasters — and the frontiers of AI have moved on to more subtle games.

In 2011, IBM’s Watson beat two of the most successful Jeopardy! champions of all time in an exhibition match. Many AI researchers now use Go, a millennia-old strategy game considered more abstract than chess, to train the next generation of neural networks. And at its Learn and Play website, IBM is training a new generation of game, chat and security AIs to interact with human players. By opening these developing AI systems to visitors on the internet, IBM’s researchers hope that machines and humans can learn from each other.
