
An AI poker program built by Facebook and CMU beats the best players in the world



Artificial intelligence has definitively beaten humans at another of our favorite games. A program designed by researchers at Facebook's artificial intelligence lab and Carnegie Mellon University has beaten some of the world's best poker players in a series of six-player no-limit Texas Hold'em games.

Over 12 days and 10,000 hands, the AI system, named Pluribus, faced 12 pros in two different settings. In one, the AI played alongside five human players; in the other, five copies of the AI played with a single human player (the computer programs were unable to collaborate in this scenario). Pluribus won an average of $5 per hand, with hourly winnings of about $1,000, a "decisive margin of victory," according to the researchers.

"It is safe to say that we are at a superhuman level and that will not change," Noam Brown, a researcher at Facebook AI Research and co-creator of Pluribus, told The Verge.

"Pluribus is a very hard opponent to play against. It's really hard to pin him down on any kind of hand," said Chris Ferguson, a six-time World Series of Poker champion and one of the 12 professionals pitted against the AI, in a press release.

In a paper published in Science, the scientists behind Pluribus say the victory is a milestone in AI research. Although machine learning has already reached superhuman levels in board games like chess and Go, and in computer games like Starcraft II and Dota, six-player no-limit Texas Hold'em is, by some measures, a harder benchmark.

Not only is the information needed to win hidden from the players (which makes poker what is known as an "imperfect-information game"), but the game also involves multiple players and complex winning outcomes. The game of Go famously has more possible board positions than there are atoms in the observable universe, which makes it a daunting challenge for an AI to work out which move to make next. But all the information is visible, and the game has only two possible outcomes for the players: win or lose. In some ways, this makes it easier to train an AI.



Chart of Pluribus's training progress. "Limping" is a strategy used by some human players that the AI ultimately rejected.
Credit: Facebook

Back in 2015, a machine learning system beat human pros at two-player Texas Hold'em, but raising the number of opponents to five increases the complexity dramatically. To create a program that could meet this challenge, Brown and his colleague Tuomas Sandholm, a professor at CMU, deployed a few crucial strategies.

First, they taught Pluribus to play poker by having it play against copies of itself, a process known as self-play. This is a common technique for AI training: the system learns the game by trial and error, playing hundreds of thousands of hands against itself. The training process was also remarkably efficient: Pluribus was trained in eight days using a 64-core server with less than 512 GB of RAM. Training this program on cloud servers would cost only about $150, which makes it a bargain compared to the hundred-thousand-dollar price tag of other state-of-the-art systems.
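To make the idea of self-play concrete, here is a minimal sketch in Python. It is not the Pluribus training code: it assumes a toy game (rock-paper-scissors standing in for poker) and a simple learning rule called regret matching, and every name in it (`payoff`, `strategy_from`, `self_play`) is hypothetical. Two copies of the same learner play each other repeatedly, and each shifts toward the actions it regrets not having played.

```python
import random

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors (toy stand-in for poker actions)

def payoff(a, b):
    """Return +1 if action a beats b, -1 if it loses, 0 on a tie."""
    return [0, 1, -1][(a - b) % 3]

def strategy_from(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1.0 / ACTIONS] * ACTIONS  # no regrets yet: play uniformly
    return [p / total for p in positive]

def self_play(iterations, seed=0):
    """Let two copies of the learner play each other; return their average strategies."""
    rng = random.Random(seed)
    regrets = [[0.0] * ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategies = [strategy_from(r) for r in regrets]
        moves = [rng.choices(range(ACTIONS), weights=s)[0] for s in strategies]
        for player in range(2):
            opponent_move = moves[1 - player]
            actual = payoff(moves[player], opponent_move)
            for a in range(ACTIONS):
                # Regret: how much better action a would have done than the move played.
                regrets[player][a] += payoff(a, opponent_move) - actual
                strategy_sums[player][a] += strategies[player][a]
    return [[s / iterations for s in sums] for sums in strategy_sums]

if __name__ == "__main__":
    for player, avg in enumerate(self_play(20000)):
        print(player, [round(p, 3) for p in avg])
```

In this toy game the average strategies drift toward playing each action about a third of the time, which is the unexploitable mix for rock-paper-scissors. Poker is vastly larger, but the principle of improving by playing against oneself is the same.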

Then, to deal with the added complexity of six players, Brown and Sandholm came up with an efficient way for the AI to look ahead in the game and decide which move to make, a mechanism known as its "search function." Rather than trying to predict how its opponents would play all the way to the end of the game (a calculation that would become incredibly complex in just a few steps), Pluribus was designed to look only two or three moves ahead. This truncated approach was the "real breakthrough," Brown said.
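The general idea of a truncated lookahead can be sketched in a few lines of Python. This is an illustration only, not Pluribus's search: it assumes a simple perfect-information game (players alternately take one to three stones from a pile, and whoever takes the last stone wins), and the function names `value` and `best_move` are hypothetical. The search stops after a fixed number of moves and falls back on a placeholder estimate instead of playing out the whole game.

```python
def value(stones, depth):
    """Estimate the position for the player to move: +1 win, -1 loss, 0 unknown."""
    if stones == 0:
        return -1.0  # the previous player took the last stone, so the mover has lost
    if depth == 0:
        return 0.0   # search cutoff: placeholder estimate instead of looking deeper
    # Negamax: a position is as good for us as the best reply is bad for the opponent.
    return max(-value(stones - take, depth - 1)
               for take in (1, 2, 3) if take <= stones)

def best_move(stones, depth=3):
    """Pick the number of stones to take by looking only `depth` moves ahead."""
    moves = [take for take in (1, 2, 3) if take <= stones]
    return max(moves, key=lambda take: -value(stones - take, depth - 1))

if __name__ == "__main__":
    for pile in (5, 6, 7):
        print(pile, "->", best_move(pile))
```

Even with a three-move horizon, the search finds the winning move from small piles. In poker the cutoff estimate has to account for hidden cards and several opponents, which is the hard part that this sketch leaves out.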

You might think that Pluribus is sacrificing long-term strategy for short-term gain here, but in poker, it turns out that short-term incisiveness is all you need.

For example, Pluribus was remarkably good at bluffing its opponents, with the pros who faced it praising its "relentless consistency" and the way it squeezed profits out of relatively thin hands. It was predictably unpredictable: a fantastic quality in a poker player.

Brown says that this is only natural. We often think of bluffing as a purely human trait, something that relies on our ability to lie and deceive. But it's an art that can still be reduced to mathematically optimal strategies, he says. "The AI doesn't see bluffing as deceptive. It just sees the decision that will make it the most money in that particular situation," he says. "What we are showing is that an AI can bluff, and it can bluff better than any human."
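A textbook toy model, which does not come from the Pluribus paper, shows how bluffing reduces to arithmetic. Assume a single final bet: the bettor wagers `bet` into a pot of size `pot`, holding either a winning hand or a bluff. If a fraction f of those bets are bluffs, the caller risks `bet` to win `pot + bet`, so calling earns f * (pot + bet) - (1 - f) * bet. Setting this to zero gives f = bet / (pot + 2 * bet), the bluffing frequency at which the caller cannot profit by calling or folding. The function names below are hypothetical.

```python
def indifference_bluff_fraction(pot, bet):
    """Fraction of bets that should be bluffs so the caller is indifferent."""
    return bet / (pot + 2 * bet)

def caller_ev(pot, bet, bluff_fraction):
    """Caller's expected value of calling: wins pot + bet against a bluff, loses bet otherwise."""
    return bluff_fraction * (pot + bet) - (1 - bluff_fraction) * bet

if __name__ == "__main__":
    pot, bet = 100, 100
    f = indifference_bluff_fraction(pot, bet)
    print("bluff fraction:", f)
    print("caller EV at that fraction:", caller_ev(pot, bet, f))
    print("caller EV if the bettor bluffs half the time:", caller_ev(pot, bet, 0.5))
```

With a pot-sized bet, one bet in three should be a bluff in this model. Bluff more often and calling becomes profitable for the opponent; bluff less often and folding does.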

What does it mean, then, that an AI has definitively beaten humans at the most popular poker game in the world? Well, as we've seen with past AI wins, humans can certainly learn from computers. Some strategies that players are generally wary of (such as "donk betting") were adopted by the AI, suggesting that they might be more useful than previously thought. "Every time I play the bot, I feel like I'm discovering something new to incorporate into my game," said poker pro Jimmy Chou.

It is also hoped that the techniques used to create Pluribus will be transferable to other situations. Many real-world scenarios resemble Texas Hold'em Poker in the broadest sense of the term: they involve multiple players, hidden information and many win-win outcomes.

Brown and Sandholm hope that the methods they have demonstrated could therefore be applied in areas such as cybersecurity, fraud prevention and financial negotiations. "Even something like helping to manage traffic with autonomous cars," says Brown.

So, can we now consider poker a "beaten" game?

Brown does not answer the question directly, but he says it's worth noting that Pluribus is a static program. After its initial eight-day training period, the AI was never updated or improved to better match its opponents' strategies. And over the 12 days it spent with the pros, they never managed to find a consistent weakness in its game. There was nothing to exploit. From the moment it started betting, Pluribus was on top.

