The machines have raised the stakes again. Pluribus, a superhuman poker-playing bot, has beaten the world's best players at six-player no-limit Texas hold'em, the most popular variant of the game. It is the first time that an artificial-intelligence (AI) program has beaten elite human players at a major game with more than two players [1].
"Although going from two to six players might seem incremental, it's actually a big deal," says Julian Togelius of New York University, who studies games and AI. "The multiplayer aspect is something that is not present at all in the other games currently being studied."
The team behind Pluribus had already built an AI, called Libratus, that beat professionals at two-player poker. They built Pluribus by updating Libratus, creating a bot that needs far less computing power to play. Over 12 days and more than 10,000 hands, it beat 15 top human players. "A lot of AI researchers didn't think it was possible to do this" with our techniques, says Noam Brown of Carnegie Mellon University in Pittsburgh, Pennsylvania, and Facebook AI Research in New York, who developed Pluribus with his Carnegie Mellon colleague Tuomas Sandholm.
Other AIs that have mastered human games – such as Libratus and DeepMind's game-playing bots – have shown they are unbeatable in two-player zero-sum games. In those scenarios there is always one winner and one loser, and game theory offers a well-defined best strategy.
But game theory is less helpful for scenarios involving multiple parties with competing interests and no clear win/lose conditions – which reflect most real-life challenges. By solving multiplayer poker, Pluribus lays the foundation for future AIs to tackle such complex problems, says Brown. He thinks their success is a step towards applications such as automated negotiation, better fraud detection and self-driving cars.
To tackle six-player poker, Brown and Sandholm radically overhauled Libratus's search algorithm. Most game-playing AIs search decision trees to work out the best move in a given situation. Libratus searched to the end of a game before choosing an action.
But the complexity introduced by the extra players makes that tactic impractical. Poker requires reasoning with hidden information – players must work out a strategy by considering which cards their opponents might have and what their opponents might guess about their own hand from previous bets. More players also complicate the choice of an action at any given moment, because many more possibilities have to be evaluated.
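One way to see the blow-up is a back-of-the-envelope count (an illustration, not a figure from the research): a player holding 2 cards faces 50 unseen cards, so a single opponent can hold any of 1,225 possible hands, while five opponents can jointly hold over 10^15 distinct hand assignments.

```python
from math import comb

def opponent_deals(num_opponents, unseen=50, hole_cards=2):
    """Count the ways to deal hole cards to each opponent from the unseen deck."""
    total = 1
    for i in range(num_opponents):
        # Each successive opponent draws from a deck shrunk by the previous deals.
        total *= comb(unseen - hole_cards * i, hole_cards)
    return total

print(opponent_deals(1))  # 1225 possible hands for one opponent
print(opponent_deals(5))  # over 10**15 joint possibilities for five opponents
```

This is why a bot reasoning about what every opponent might hold cannot simply enumerate the end of the game.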
The key advance was a method that lets Pluribus make good choices by searching only a few moves ahead rather than to the end of the game.
Pluribus taught itself from scratch using a form of reinforcement learning similar to that used by DeepMind's Go-playing AI, AlphaZero. It began by playing poker at random, and improved as it worked out which actions win the most money. After each hand, it reviews its play and checks whether it could have made more money with different actions, such as raising rather than calling a bet. If the alternatives lead to better outcomes, it becomes more likely to choose them in future.
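That "what if I had played differently?" bookkeeping is the idea behind regret matching, the core of the counterfactual-regret-minimization family of algorithms used to train poker bots like Pluribus. A minimal sketch of the idea, using rock-paper-scissors rather than poker (a toy illustration, not Pluribus's actual training code):

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]  # PAYOFF[mine][theirs]: win = 1, loss = -1

def strategy(regrets):
    """Play each action in proportion to its accumulated positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1 / ACTIONS] * ACTIONS

def self_play(iterations=50000, seed=0):
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strats = [strategy(regrets[p]) for p in range(2)]
        moves = [rng.choices(range(ACTIONS), weights=strats[p])[0]
                 for p in range(2)]
        for p in range(2):
            played, other = moves[p], moves[1 - p]
            for a in range(ACTIONS):
                # Regret = how much better action `a` would have done
                # than the action actually played.
                regrets[p][a] += PAYOFF[a][other] - PAYOFF[played][other]
                strategy_sums[p][a] += strats[p][a]
    # The *average* strategy converges towards an equilibrium
    # (uniform 1/3 each for rock-paper-scissors).
    return [[s / iterations for s in strategy_sums[p]] for p in range(2)]

avg = self_play()
print(avg[0])  # each probability close to 1/3
```

Poker replaces the three actions here with betting decisions at millions of situations, but the learning signal is the same: raise the probability of actions you regret not having taken.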
By playing trillions of hands of poker against itself, Pluribus created a baseline "blueprint" strategy that it falls back on during matches. At each decision point, it compares the state of the game with its blueprint and searches a few moves ahead to see how the action might play out. It then decides whether it can improve on the blueprint. And because it taught itself to play without human input, the AI settled on some strategies that human players tend not to use.
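The blueprint-plus-lookahead loop can be sketched generically. The toy below is a hypothetical illustration on a simple perfect-information subtraction game (players alternately take 1-3 chips from a pile; taking the last chip wins), not Pluribus's actual imperfect-information algorithm: search a few plies ahead, and when the depth limit is hit, substitute a blueprint-style value estimate instead of playing out the rest of the game.

```python
def blueprint_value(pile):
    # Stand-in for a learned blueprint estimate. Here it is the exact
    # game value: piles that are a multiple of 4 are lost for the
    # player to move, everything else is won.
    return 1.0 if pile % 4 != 0 else -1.0

def search(pile, depth):
    """Value of the position for the player to move, searching `depth` plies."""
    if pile == 0:
        return -1.0  # the previous player took the last chip and won
    if depth == 0:
        return blueprint_value(pile)  # stop searching; trust the blueprint
    # Try each legal move; the opponent's value is the negation of ours.
    return max(-search(pile - take, depth - 1)
               for take in (1, 2, 3) if take <= pile)

def best_move(pile, depth=4):
    """Pick the move whose shallow search value is highest."""
    return max((t for t in (1, 2, 3) if t <= pile),
               key=lambda t: -search(pile - t, depth - 1))

print(best_move(10))  # take 2, leaving a pile of 8 (a losing position)
```

The design point the paragraph describes is the depth cut-off: the search never reaches the end of the game, yet still plays well because the blueprint fills in values at the frontier.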
AI playground
Pluribus's success relies heavily on its efficiency. When playing, it runs on just two CPUs. By contrast, DeepMind's original Go bot used nearly 2,000 CPUs, and Libratus used 100, when they first took on top professionals. Playing against itself, Pluribus plays a hand in about 20 seconds – roughly twice as fast as professional humans.
Games have proved a great way to measure progress in AI, because bots can be benchmarked against the best humans – and objectively hailed as superhuman if they triumph. But Brown thinks AIs have now outgrown their playground. "This was the last remaining poker challenge," he says.
But Togelius thinks researchers in AI and games still have plenty to do. "There's a lot of unexplored territory," he says. Few AIs have mastered more than one game, which requires general skill rather than a niche ability. And there is more to games than playing them, says Togelius: "There is also their design. That's a big challenge for AI, too."