Machines have proven their superiority in head-to-head games like chess and even poker, but in the card game's complex multiplayer variants, humans have retained their edge ... until now. An evolution of the latest artificial intelligence agent to flummox professional poker players is now beating them decisively in six-player games.
As described in a paper published in the journal Science, the CMU/Facebook collaboration they call Pluribus reliably beat five professional poker players in the same game, and also one professional facing five independent copies of itself. It's a major step forward in machine capability, and the system is also far more efficient than previous agents.
One-on-one poker is a strange game, and not a simple one, but its zero-sum nature (whatever you lose, the other player wins) means that a computer able to calculate far enough ahead can gain an edge through certain strategies. Add four more players to the mix, though, and things get very complex, very quickly.
With six players, the possible hands, bets and outcomes are so numerous that it's effectively impossible to account for them all, especially not in a minute or less. It would be like trying to exhaustively document every grain of sand on a beach between waves.
Yet over more than 10,000 hands played against champions, Pluribus managed to win money at a steady rate, revealing no weakness or habit its opponents could exploit. What's its secret? Consistent randomness.
Even computers have regrets
Pluribus was trained, like many game-playing AI agents these days, not by studying how humans play, but by playing against itself. At first this is probably like watching kids (or, frankly, me) play poker: constant mistakes. But the AI, like the kids, learns from them.
The training program used something called Monte Carlo counterfactual regret minimization. That might sound like the breakfast whiskey you order after losing your shirt at the casino, but it is in fact a style of machine learning.
Regret minimization simply means that when the system finishes a hand (against itself, remember), it replays that hand in different ways, exploring what might have happened had it checked here instead of raising, folded instead of calling, and so on. (Since those plays didn't really happen, they're counterfactual.)
A Monte Carlo tree is a way of organizing and evaluating many possibilities. It amounts to climbing a tree branch by branch, noting the quality of every leaf you find, then choosing the best leaf once you feel you've climbed enough.
If you do this ahead of time, as is done in chess for instance, you're looking for the best move to make next. But combine it with the regret function and you're instead combing through a catalog of ways the game could have gone, observing which would have produced the best outcome.
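To make the "climb the tree and score each leaf" idea concrete, here is a toy sketch in Python. The game fragment and its payoff numbers are invented for illustration, and real Monte Carlo tree search samples the tree rather than exhausting it; this is the exhaustive flavor, which is enough to show the shape of the idea.

```python
# A node is either a leaf holding a payoff, or a dict mapping moves to subtrees.
def best_leaf(tree, path=()):
    """Visit every leaf under `tree`; return (value, path) of the best one."""
    if not isinstance(tree, dict):            # a leaf: its value is its score
        return tree, path
    return max(best_leaf(sub, path + (move,)) for move, sub in tree.items())

# Hypothetical hand fragment: betting and getting a fold is worth 3, etc.
game = {
    "check": {"check": 0, "bet": {"fold": -1, "call": -2}},
    "bet":   {"fold": 3, "call": {"win": 5, "lose": -4}},
}

value, path = best_leaf(game)
print(value, path)  # 5 ('bet', 'call', 'win')
```

Swapping `max` for a sampled rollout at each node would turn this exhaustive walk into the Monte Carlo version.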
Monte Carlo counterfactual regret minimization, then, is just a systematic way of investigating what might have happened had the computer acted differently, and adjusting its model of the game accordingly.
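A minimal self-play sketch of the regret idea, using the classic regret-matching demo on rock-paper-scissors rather than anything from Pluribus itself (the iteration count and structure are my own choices): after each sampled round, every alternative action is credited with how much better it would have done against what the opponent actually played, and future play is weighted toward actions with positive accumulated regret.

```python
import random

random.seed(0)
ACTIONS = ["rock", "paper", "scissors"]

def payoff(a, b):
    """Payoff to the first player: +1 win, 0 tie, -1 loss."""
    if a == b:
        return 0
    wins = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}
    return 1 if (a, b) in wins else -1

def strategy_from_regrets(regrets):
    """Regret matching: play each action in proportion to its positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total == 0:
        return [1 / len(ACTIONS)] * len(ACTIONS)   # no regret yet: play uniformly
    return [p / total for p in positive]

def train(iterations=20000):
    regrets = [0.0] * 3
    opp_regrets = [0.0] * 3
    strategy_sum = [0.0] * 3
    for _ in range(iterations):
        strat = strategy_from_regrets(regrets)
        opp_strat = strategy_from_regrets(opp_regrets)
        a = random.choices(ACTIONS, strat)[0]
        b = random.choices(ACTIONS, opp_strat)[0]
        # Counterfactual regret: how much better would each alternative action
        # have done against what the opponent actually played?
        for i, alt in enumerate(ACTIONS):
            regrets[i] += payoff(alt, b) - payoff(a, b)
            opp_regrets[i] += payoff(alt, a) - payoff(b, a)
        for i, p in enumerate(strat):
            strategy_sum[i] += p
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]   # time-averaged strategy

avg = train()
```

The time-averaged strategy drifts toward the game's equilibrium, which for rock-paper-scissors is playing each move about a third of the time.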
Of course, the number of possible games is nearly infinite if you want to know what would have happened had you bet $101 instead of $100, or whether you'd have won that big hand with an eight kicker instead of a seven. Down that path lies a nearly infinite regret of another kind, the sort that keeps you in bed in your hotel room until lunchtime.
The truth is that these minor variations matter so little that they can essentially be ignored. It is never truly important whether you bet one dollar more; any bet between, say, 70 and 130 can be treated as identical by the computer. The same goes for cards: whether the jack is a heart or a spade doesn't matter except in very specific (and usually obvious) situations, so 99.999% of hands can be considered equivalent.
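A hypothetical sketch of that abstraction, with bucket values I've made up for illustration: nearby bet sizes collapse into one representative bucket, and a card's suit is dropped unless a flush is in play, so the strategy table the bot has to learn stays tractable.

```python
# Assumed representative bet sizes; the real abstraction is far finer-grained.
BET_BUCKETS = [50, 100, 200, 400, 800]

def abstract_bet(amount):
    """Map a concrete bet to the nearest representative bucket."""
    return min(BET_BUCKETS, key=lambda b: abs(b - amount))

def abstract_card(rank, suit, flush_possible=False):
    """Drop the suit unless a flush is possible: a jack of hearts and a
    jack of spades are strategically identical in most situations."""
    return (rank, suit) if flush_possible else (rank, None)

# A $101 bet lands in the same bucket as a $100 bet: one decision, not two.
assert abstract_bet(101) == abstract_bet(100)
# Suits only matter when the board makes them matter.
assert abstract_card("J", "hearts") == abstract_card("J", "spades")
```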
This "abstraction" of gameplay and pruning of possibilities vastly reduces what Pluribus has to take into account. It also keeps the computing load low: Pluribus was trained in about a week on a relatively ordinary 64-core server rack, whereas other models can take processor-years on high-powered clusters. It even plays on a (admittedly beefy) machine with two CPUs and 128 GB of RAM.
Random like a fox
The training produces what the team calls a "blueprint" for how to play that is fundamentally strong and would probably beat plenty of players. But one weakness of AI models is that they develop tendencies that can be detected and exploited.
In Facebook's write-up of Pluribus, the researchers give the example of two computers playing rock-paper-scissors. One chooses randomly while the other always chooses rock. In theory they would win the same number of games. But if the always-rock computer tried that strategy against a human, it would quickly start losing and never stop.
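A quick simulation, assuming a simple frequency-tracking opponent of my own devising, shows why: against anyone who adapts, the predictable player collapses while the random one breaks even.

```python
import random
from collections import Counter

random.seed(1)
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def payoff(a, b):
    """+1 if a beats b, -1 if it loses, 0 on a tie."""
    if a == b:
        return 0
    return 1 if BEATS[b] == a else -1

def play_match(player, rounds=1000):
    """Pit `player` against an opponent that best-responds to the
    player's most frequent move so far (random on the first round)."""
    seen = Counter()
    score = 0
    for _ in range(rounds):
        move = player()
        opp = BEATS[seen.most_common(1)[0][0]] if seen else random.choice(list(BEATS))
        score += payoff(move, opp)
        seen[move] += 1
    return score

rock_score = play_match(lambda: "rock")                      # the exploitable rut
rand_score = play_match(lambda: random.choice(list(BEATS)))  # nothing to latch onto
```

After a single round the adaptive opponent locks onto paper and the always-rock player loses essentially every hand, while the random player's score stays near zero.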
As a simple poker example, perhaps a particular sequence of bets always causes the computer to go all in regardless of its hand. If a player can spot that sequence, they can take the computer to town whenever they like. Finding and preventing ruts like these is important for building a game-playing agent that can defeat inventive, observant humans.
To do this, Pluribus does several things. First, it trained modified versions of its blueprint model to deploy when the game skews toward folding, calling or raising. Different strategies for different games make it less predictable, and it can switch in a moment if the betting habits change and a hand goes from a confident call to a bluff.
It also engages in a brief but thorough introspective search, considering how it would play if it held every other possible hand, from a whole lot of nothing up to a straight flush, and how it would bet with each. It then chooses its bet in the context of all of those, taking care to do so in a way that doesn't point to any one of them. Given the same hand in the same game situation again, Pluribus wouldn't necessarily choose the same bet, but would vary it to remain unpredictable.
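A toy sketch of that range-balancing idea, with a hand list and probability table that are entirely invented for illustration: instead of mapping "my hand" directly to "my bet", each hand plays a mixed strategy over bet sizes, so any given bet could come from many different hands.

```python
import random

random.seed(2)
HANDS = ["junk", "middle pair", "top pair", "straight", "flush"]

# Assumed per-hand betting tendencies (probabilities over actions).
STRATEGY = {
    "junk":        {"check": 0.7, "small": 0.2, "big": 0.1},   # occasional bluff
    "middle pair": {"check": 0.5, "small": 0.4, "big": 0.1},
    "top pair":    {"check": 0.2, "small": 0.5, "big": 0.3},
    "straight":    {"check": 0.1, "small": 0.4, "big": 0.5},
    "flush":       {"check": 0.1, "small": 0.3, "big": 0.6},   # sometimes slow-played
}

def choose_bet(hand):
    """Sample an action from the hand's mixed strategy; the same hand in
    the same spot need not produce the same bet twice."""
    dist = STRATEGY[hand]
    return random.choices(list(dist), list(dist.values()))[0]

# The same hand does not always produce the same action.
actions = {choose_bet("flush") for _ in range(50)}

# From the opponent's seat: given a "big" bet and a uniform prior over
# hands, what could the bot be holding? Every hand stays in the picture.
posterior = {h: STRATEGY[h]["big"] for h in HANDS}
z = sum(posterior.values())
posterior = {h: p / z for h, p in posterior.items()}
```

Because every hand bets big with some probability, observing a big bet never pins the bot to a single holding, which is the point of balancing the whole range rather than one hand.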
These strategies contribute to the "consistent randomness" I referred to earlier, and they were part of the model's ability to slowly but surely grind down some of the best players in the world.
The human lament
With so many hands played, no single one can be pointed to as demonstrating Pluribus's power over its opponents. Poker is a game of skill, luck and determination, in which winners emerge only over dozens or hundreds of hands.
And here it must be said that the experimental setup doesn't quite reflect an ordinary six-player poker game. Unlike a real game, chip counts weren't carried over as a running total: for each hand, every player was given 10,000 chips to use as they pleased, and winner or loser, they got 10,000 again for the next hand.
Obviously, this limits the possible long-term strategies, and indeed "the bot was not trying to exploit the weaknesses of its opponents," said Noam Brown, an AI researcher at Facebook. Truly, Pluribus was living in the moment in a way few humans can.
But simply not basing its play on long-term observation of opponents' individual habits or styles doesn't mean its strategy was shallow. On the contrary, it is arguably more impressive, and a different way of looking at the game, that there exists a winning strategy relying neither on behavioral cues nor on exploiting individual weaknesses.
The pros who had their lunch money taken by the implacable Pluribus were good sports, however. They praised the system's high-level play, its validation of existing techniques and its inventive use of new ones. Here is a selection of laments from the defeated humans:
I was one of the earliest players to test the bot, so I got to see its previous versions. The bot went from a mediocre player to competing with the best players in the world in a matter of weeks. Its major strength is its ability to use mixed strategies. That's the same thing humans try to do; for us it's a matter of execution, of doing it in a perfectly random and consistent way. It was also satisfying to see that many of the strategies the bot employs are things we already do in poker at the highest level. Having your strategies more or less confirmed as correct by a supercomputer is a good feeling. -Darren Elias
It was incredibly fascinating to play against the poker bot and see some of the strategies it chose. There were several plays that humans simply are not making at all, especially relating to its bet sizing. -Michael 'Gags' Gagliano
Whenever I play the bot, I feel like I pick up something new to incorporate into my game. As humans, I think we tend to oversimplify the game for ourselves, making strategies easier to adopt and remember. The bot doesn't take any of these shortcuts; it has an immensely complex, balanced game tree for every decision. -Jimmy Chou
In a game that, more often than not, rewards you when you show mental discipline, focus and consistency, and certainly punishes you when you lack any of the three, competing for hours on end against an AI bot that obviously doesn't have to worry about those shortcomings is a grueling task. The technical skill and deep intricacies of the bot's poker ability were remarkable, but what I had most underestimated was its most transparent strength: its relentless consistency. -Sean Ruane
Beating humans at poker is only the beginning. As good a player as it is, Pluribus matters more as a demonstration that an AI agent can achieve superhuman performance in something as complicated as six-player poker.
"Many real-world interactions, such as financial markets, auctions and navigation, can also be modeled as multi-agent interactions with limited communication and collusion among participants," Facebook wrote on its blog.
Yes, and war.