Facebook AI Pluribus beats top poker professionals in six-player Texas Hold 'em

This video shows example hands from the Pluribus experiment against professional poker players. The cards are turned face up to make Pluribus's strategy easier to follow. Courtesy of Carnegie Mellon University.

Poker-playing AIs generally perform well against human opponents when the game is limited to two players. Now researchers at Carnegie Mellon University and Facebook AI have raised the bar with an AI dubbed Pluribus, which took on 15 professional human players at six-player no-limit Texas Hold 'em and won. The researchers describe how they achieved this feat in a new paper in Science.

Over more than 5,000 hands each, five copies of the AI took on two of the top professional players, one at a time: Chris "Jesus" Ferguson, a six-time World Series of Poker event winner, and Darren Elias, who currently holds the record for the most World Poker Tour titles. Pluribus beat them both. The same held in a second experiment, in which Pluribus played against five human players at a time, drawn from a pool of 13 pros, over 10,000 hands.

Co-author Tuomas Sandholm of Carnegie Mellon University has been grappling with the unique challenges poker poses for AI for 16 years. No-limit Texas Hold 'em is a so-called "imperfect information" game, because there are hidden cards (the ones opponents hold in their hands) and no restrictions on the size of the bets a player can make. By contrast, in chess and Go, the state of the board and the position of every piece are known to all players. Poker players can (and do) bluff on occasion, so it is also a game of deception.

Claudico spawned Libratus

In 2015, Claudico, Sandholm's first poker-playing AI, took on four professional players at heads-up Texas Hold 'em (where there are only two players in the hand) at the Brains vs. Artificial Intelligence tournament at Rivers Casino in Pittsburgh. After 80,000 hands played over two weeks, Claudico had not reached the statistical threshold for declaring victory: the margin must be large enough that there is 99.98% certainty that the AI's win is not due to chance.
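
The article does not say exactly how the 99.98% threshold was tested. As a rough illustration only, the sketch below assumes a simple one-sided z-test on a list of per-hand winnings; the function name `is_significant_win` and the choice of test are assumptions, not the event's published methodology. Because poker results are so noisy, the standard error shrinks only with the square root of the number of hands, which is why even tens of thousands of hands can fall short.

```python
import math
from statistics import NormalDist, mean, stdev


def is_significant_win(per_hand_winnings, confidence=0.9998):
    """Hypothetical one-sided z-test: is the true average win per hand above zero?

    Returns (significant, z, threshold). Assumes hands are independent samples;
    real evaluations may use more sophisticated variance-reduction methods.
    """
    n = len(per_hand_winnings)
    if n < 2:
        raise ValueError("need at least two hands")
    avg = mean(per_hand_winnings)
    std_err = stdev(per_hand_winnings) / math.sqrt(n)
    # Critical value of the standard normal distribution: about 3.54 for 99.98%.
    threshold = NormalDist().inv_cdf(confidence)
    if std_err == 0:
        # Every hand had the same result; the sign of the average decides.
        z = math.copysign(math.inf, avg) if avg != 0 else 0.0
    else:
        z = avg / std_err
    return z > threshold, z, threshold
```

Doubling the average margin doubles `z`, but so does quadrupling the number of hands, which is one reason later matches played more hands.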

Sandholm et al. followed up in 2017 with another AI, called Libratus. This time, rather than focusing on exploiting its opponents' mistakes, the AI focused on improving its own play, an approach that appears more reliable. "We considered fixing the shortcomings of our own strategy, because it makes our play safer and safer," Sandholm told IEEE Spectrum at the time. "When you exploit opponents, you open yourself up more and more to exploitation." The researchers also increased the number of hands played to 120,000.

The AI prevailed, even though the four players tried to collude against it, coordinating with one another to make strange bets to confuse Libratus. Ars' Sam Machkovech wrote at the time: "Libratus came out victorious after 120,000 combined hands of poker against four online poker professionals. Libratus' margin of $1.7 million, combined with the sheer number of hands, clears the key bar: victory with statistical significance."

Online poker pro Dong Kim took on an AI program called Claudico in 2015. He lost to an updated program, Libratus, in the 2017 rematch.

Carnegie Mellon University

But Libratus was still playing against a single opponent, heads-up. Playing poker against multiple opponents is a much tougher problem. Pluribus builds on the team's earlier work on Libratus, with several key innovations that let it produce winning strategies in multiplayer games.

Sandholm and his graduate student Noam Brown, who is pursuing his PhD while working with Facebook AI Research (FAIR), used "action abstraction" and "information abstraction" approaches to reduce the number of distinct actions the AI must consider when devising its strategy. Whenever Pluribus reaches a point in the game where it has to act, it forms a subgame: a representation that provides a finer-grained abstraction of the actual game, akin to a blueprint, according to Sandholm.
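
To make "action abstraction" concrete: rather than reasoning about every possible bet size, a program can map any bet onto a small menu of sizes. The sketch below is hypothetical; the menu of pot fractions (`ALLOWED_BET_FRACTIONS`) and the nearest-size rule are illustrative assumptions, not Pluribus's actual abstraction, which the article does not detail.

```python
# Hypothetical menu of bet sizes, expressed as fractions of the current pot.
ALLOWED_BET_FRACTIONS = (0.5, 1.0, 2.0)


def abstract_bet(bet: float, pot: float, allowed=ALLOWED_BET_FRACTIONS) -> float:
    """Map an arbitrary bet onto the nearest allowed pot fraction.

    The strategy then only needs entries for the few sizes in `allowed`:
    a 70-chip bet into a 100-chip pot is treated like a half-pot bet.
    """
    if pot <= 0:
        raise ValueError("pot must be positive")
    fraction = bet / pot
    return min(allowed, key=lambda f: abs(f - fraction))


if __name__ == "__main__":
    for bet in (70, 90, 500):
        print(bet, "->", abstract_bet(bet, pot=100))
```

Shrinking the action space this way keeps the strategy small enough to compute; information abstraction applies the same idea on the card side by grouping similar hands together.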

"It looks a few moves ahead and does a kind of game-theoretic reasoning," he said. Each time, Pluribus must come up with four continuation strategies for each of the five human players, using a new limited-lookahead search algorithm. This amounts to "four to the sixth power different continuation strategies," according to Sandholm.

Like Libratus, Pluribus does not use poker-specific algorithms; it simply learns the rules of this imperfect-information game and then plays against itself to craft its own winning strategy. Pluribus thus worked out that the best approach was a mixed, unpredictable strategy, which is the conventional wisdom among today's top human players. "We didn't even tell it, 'The strategy should be randomized,'" Sandholm said. "The algorithm automatically figured out that it had to randomize, in what way, and with what probabilities in which situations."
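
The article does not name the self-play algorithm. Regret matching is a building block of counterfactual regret minimization (CFR), the family of self-play methods widely used in poker AIs: each player shifts probability toward the actions it regrets not having played. The sketch below applies it to rock-paper-scissors rather than poker (a deliberate simplification) to show how a randomized strategy can emerge without anyone telling the program to randomize. Nothing in it comes from Pluribus's code, which was not released; all names are illustrative.

```python
# Regret-matching self-play on rock-paper-scissors (a toy stand-in for poker).
# PAYOFF[a][b] is the payoff to a player choosing action a against action b.
ACTIONS = ("rock", "paper", "scissors")
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]
N = len(ACTIONS)


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N  # no positive regret yet: play uniformly


def self_play(iterations=10_000):
    """Two copies of the same learner play each other; returns their average strategies."""
    # Seed player 0 with a bias toward rock so play does not start at equilibrium.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * N for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in range(2):
            mine, theirs = strategies[player], strategies[1 - player]
            # Expected payoff of each pure action against the opponent's current mix.
            values = [sum(PAYOFF[a][b] * theirs[b] for b in range(N)) for a in range(N)]
            current = sum(mine[a] * values[a] for a in range(N))
            for a in range(N):
                regrets[player][a] += values[a] - current
                strategy_sums[player][a] += mine[a]
    # The *average* strategy is what converges toward equilibrium.
    return [[s / iterations for s in sums] for sums in strategy_sums]


if __name__ == "__main__":
    for player, avg in enumerate(self_play()):
        print(player, {name: round(p, 3) for name, p in zip(ACTIONS, avg)})
```

Each averaged probability ends up close to 1/3. No randomization is hard-coded: any predictable bias gets punished by the other learner's regret updates, so the mixed strategy emerges on its own.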

No limping

Pluribus actually confirmed a bit of conventional poker wisdom: it's just not a good idea to "limp" into a hand, that is, to call the big blind rather than fold or raise. The exception, of course, is if you are in the small blind, where calling costs you only half as much as it does the other players. But while human players generally avoid so-called "donk bets," in which a player ends one betting round with a call but starts the next round with a bet, Pluribus placed donk bets far more often than its human opponents.
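
The two betting patterns described above can be stated precisely as checks on an action history. This is a hypothetical sketch: the `(player, action)` tuple format, the action names, and the assumption that posted blinds are not listed as actions are all illustrative choices. The limp check follows the article's definition (calling the big blind preflop instead of folding or raising) and does not special-case the small blind.

```python
from typing import List, Optional, Tuple

# Hypothetical history format: ordered (player, action) pairs for one betting round.
# Actions: "fold", "check", "call", "bet", "raise". Posted blinds are not listed.
Action = Tuple[str, str]


def is_limp(player: str, preflop: List[Action]) -> bool:
    """True if the player's first preflop action just calls the big blind
    (nobody raised before them) instead of folding or raising."""
    raised = False
    for who, what in preflop:
        if who == player:
            return what == "call" and not raised
        if what == "raise":
            raised = True
    return False


def _last_action(player: str, actions: List[Action]) -> Optional[str]:
    mine = [what for who, what in actions if who == player]
    return mine[-1] if mine else None


def _first_action(player: str, actions: List[Action]) -> Optional[str]:
    for who, what in actions:
        if who == player:
            return what
    return None


def is_donk_bet(player: str, previous_round: List[Action], next_round: List[Action]) -> bool:
    """True if the player ended the previous round with a call and then
    opened the next round with a bet (the article's definition)."""
    return (_last_action(player, previous_round) == "call"
            and _first_action(player, next_round) == "bet")
```

For example, if Alice checks and calls Bob's bet on the flop, then leads out with a bet on the turn, `is_donk_bet("alice", flop, turn)` is true.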

So "in some ways, Pluribus plays the same way as humans," Sandholm said. "In other ways, it plays completely Martian strategies." Specifically, Pluribus uses unusual bet sizes and is better at randomizing.

"Its main strength is its ability to use mixed strategies," said Elias, one of the professional players who took part in the Pluribus experiment. "That's the same thing humans are trying to do. For humans, it's a matter of execution: doing it in a perfectly random way, and doing it consistently. Most people just can't."

"It was incredibly fascinating to play against the poker bot and see some of the strategies it chose," said Michael "Gags" Gagliano, another of the participating poker players. "There are several plays that humans simply don't make at all, especially when it comes to bet sizing. Bots and AIs are an important part of the evolution of poker, and it was amazing to get a first-hand look at the future."

This type of AI could be used to design drugs to fight antibiotic-resistant bacteria, for example, or to improve cybersecurity or military robotic systems. Sandholm cites multiparty negotiation and pricing, such as Amazon, Walmart, and Target each trying to offer the most competitive prices, as one specific application. Optimal media spending for political campaigns is another example, as are auction bidding strategies. Sandholm has already licensed much of the poker technology developed in his lab to two startups: Strategic Machine and Strategy Robot. The first focuses on gaming and other entertainment applications; Strategy Robot focuses on defense and intelligence applications.

Potential for fraud

When Libratus beat human players in 2017, some wondered whether poker could still be considered a game of skill, and whether online games in particular would soon be dominated by disguised bots. Some noted that Libratus needed major hardware to analyze its play and work out how to improve: 15 million core hours of computation and 1,400 processor cores during live play. Pluribus needs far less computing power, computing its blueprint strategy in eight days using just 12,400 core hours, and using just 28 cores during live play. So is this the death knell for poker as a game of skill?

The algorithm was so successful that the researchers decided not to release its code, fearing it could be used to drain the coffers of online poker companies. "It could be very dangerous for the poker community," Noam Brown, a former CMU student who helped develop the algorithm, told Technology Review.

Sandholm acknowledges that sophisticated bots are rife in online poker, but destroying poker was never his goal, and he still considers it a game of skill. "I've come to love the game, because these AIs really showed that there was extra depth in the game that humans did not understand, even the brilliant professional players who have played millions of hands," he said. "So I hope this will add to the enthusiasm for poker as a recreational game."

DOI: Science, 2019. 10.1126/science.aay2400.

Listing image by Steve Grayson / WireImage / Getty Images
