Facebook and CMU's 'superhuman' poker AI beats human pros

AI has definitively defeated humans at one of our favorite games. A program designed by researchers from Facebook's AI lab and Carnegie Mellon University has beaten some of the best poker players in the world in a series of six-person no-limit Texas Hold'em games.


Over 12 days and 10,000 hands, the AI system, named Pluribus, faced 12 professionals in two different settings. In one, the AI played alongside five human players; in the other, five copies of the AI played with one human player (the programs could not collude in this scenario). Pluribus won an average of $5 per hand, with hourly winnings of around $1,000 – a "decisive profit margin," the researchers said.

"It is safe to say that we are at a superhuman level and that will not change," said Noam Brown, a research scientist at Facebook AI Research and co-creator of Pluribus, The edge.

"Pluribus is a very tough opponent to play against. It is very difficult to pin him down on any kind of hand," said Chris Ferguson, a six-time World Series of Poker champion and one of the twelve pros who competed against the AI , in a press statement.

In a paper published in Science, the scientists behind Pluribus say that the victory is an important milestone in AI research. Although machine learning has already reached superhuman levels in board games such as chess and Go, and in computer games such as StarCraft II and Dota, six-person no-limit Texas Hold'em represents, by some measures, a higher level of difficulty.

Not only is the information needed to win hidden from players (making it an "imperfect-information game"), it also involves multiple players and complex winning outcomes. The game of Go famously has more possible board configurations than there are atoms in the observable universe, making it a huge challenge for an AI to work out its next move. But all of that information is visible to both players, and the game has only two possible outcomes: winning or losing. That makes it easier, in some ways, to train an AI on.


A timeline of the Pluribus training regime. "Limping" is a strategy used by some human players, which the AI eventually discarded.
Credit: Facebook

In 2015, a machine-learning system beat human pros at two-player Texas Hold'em, but increasing the number of opponents to five significantly increases the complexity. To create a program capable of meeting this challenge, Brown and his colleague Tuomas Sandholm, a professor at CMU, deployed a number of crucial strategies.


First, they taught Pluribus to play poker by having it play against copies of itself – a process known as self-play. This is a commonly used technique for AI training, in which the system learns the game by trial and error, playing hundreds of thousands of hands against itself. The training process was also remarkably efficient: Pluribus was created in just eight days on a 64-core server equipped with less than 512 GB of RAM. Training the program on cloud servers would cost only about $150, making it a bargain compared with the hundred-thousand-dollar price tags of other cutting-edge systems.
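To make the idea concrete, here is a minimal self-play sketch in Python. It is a hypothetical illustration, not Pluribus's actual training code (the paper describes a form of counterfactual regret minimization over poker's vastly larger game tree): two copies of the same regret-matching learner play a toy game, rock-paper-scissors, against each other and converge toward an unexploitable strategy purely by trial and error.

    import random

    # Toy self-play loop (hypothetical sketch, not Pluribus's code).
    # Two identical regret-matching learners play rock-paper-scissors
    # against each other and learn only from their own mistakes.

    ACTIONS = ["rock", "paper", "scissors"]
    BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

    def payoff(mine, theirs):
        """+1 if my action wins, -1 if it loses, 0 on a tie."""
        if mine == theirs:
            return 0
        return 1 if BEATS[mine] == theirs else -1

    class RegretMatcher:
        def __init__(self):
            self.regret_sum = {a: 0.0 for a in ACTIONS}
            self.strategy_sum = {a: 0.0 for a in ACTIONS}

        def strategy(self):
            # Play actions in proportion to how much we regret not having played them.
            positive = {a: max(r, 0.0) for a, r in self.regret_sum.items()}
            total = sum(positive.values())
            if total == 0:
                return {a: 1.0 / len(ACTIONS) for a in ACTIONS}
            return {a: p / total for a, p in positive.items()}

        def observe(self, mine, theirs, strat):
            # Regret = what each action would have earned minus what we actually got.
            actual = payoff(mine, theirs)
            for a in ACTIONS:
                self.regret_sum[a] += payoff(a, theirs) - actual
                self.strategy_sum[a] += strat[a]

        def average_strategy(self):
            total = sum(self.strategy_sum.values())
            return {a: s / total for a, s in self.strategy_sum.items()}

    p1, p2 = RegretMatcher(), RegretMatcher()
    for _ in range(100_000):
        s1, s2 = p1.strategy(), p2.strategy()
        a1 = random.choices(ACTIONS, weights=[s1[a] for a in ACTIONS])[0]
        a2 = random.choices(ACTIONS, weights=[s2[a] for a in ACTIONS])[0]
        p1.observe(a1, a2, s1)
        p2.observe(a2, a1, s2)

    print(p1.average_strategy())  # approaches the 1/3, 1/3, 1/3 equilibrium

In this toy game the learned strategy converges to picking each action about a third of the time; the same trial-and-error principle, scaled up enormously, is what lets a poker bot teach itself without human game data.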

To deal with the additional complexity of six players, Brown and Sandholm came up with an efficient way for the AI to look ahead in the game and decide which action to take, a mechanism known as its search function. Instead of trying to predict how its opponents would play all the way to the end of the game (a calculation that becomes incredibly complicated after just a few steps), Pluribus was designed to look only two or three moves ahead. This truncated approach was the "real breakthrough," says Brown.
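Here is a rough sketch of what "looking only a few moves ahead" means, under toy assumptions. This is not the Pluribus search algorithm, and the game, payoff values, and estimate function below are invented for illustration: the searcher expands the tree only to a fixed depth and substitutes a cheap value estimate for everything beyond that horizon, instead of playing the game out to the end.

    # Hypothetical depth-limited lookahead on a toy single-player game:
    # the agent walks along a row of payoffs, moving 1 or 2 steps at a time
    # and collecting the value it lands on. Rather than searching to the end
    # of the row, it looks only `depth` moves ahead and falls back on a crude
    # estimate for the rest, which is the idea behind a truncated search.

    PAYOFFS = [0, 3, -1, 4, 2, -2, 5, 1]   # invented values, not poker

    def estimate(position):
        """Cheap stand-in for a learned value function: average of what's left."""
        remaining = PAYOFFS[position + 1:]
        return sum(remaining) / len(remaining) if remaining else 0.0

    def lookahead_value(position, depth):
        if position >= len(PAYOFFS) - 1:     # reached the end of the game
            return 0.0
        if depth == 0:                        # horizon reached: estimate, don't expand
            return estimate(position)
        moves = [m for m in (1, 2) if position + m < len(PAYOFFS)]
        return max(PAYOFFS[position + m] + lookahead_value(position + m, depth - 1)
                   for m in moves)

    def best_move(position, depth=3):
        """Pick the move with the highest value under a short lookahead."""
        moves = [m for m in (1, 2) if position + m < len(PAYOFFS)]
        return max(moves, key=lambda m: PAYOFFS[position + m]
                                        + lookahead_value(position + m, depth - 1))

    print(best_move(0))   # chooses a step using only a 3-move horizon

Cutting the search off early keeps the computation tractable; the quality of the decision then hinges on how good the value estimate at the horizon is.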

You might think that Pluribus is sacrificing long-term strategy for short-term profit here, but in poker, short-sighted cunning is actually all you need.

Pluribus, for example, was remarkably good at bluffing its opponents, with the pros who played against it praising its "ruthless consistency" and the way it took advantage of relatively thin hands. It was predictably unpredictable: a fantastic quality in a poker player.

Brown says that this is only natural. We often think of bluffing as a uniquely human trait, something that relies on our ability to lie and mislead, but it is an art that can still be reduced to mathematically optimal strategies, he says. "The AI does not see bluffing as misleading. It only sees the decision that will make it the most money in that particular situation," he says. "What we show is that an AI can bluff, and it can bluff better than any human."
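Seen that way, a bluff is just an expected-value calculation. As a hedged illustration (the numbers and function names here are invented, not from the paper): a pure bluff that never wins when called makes money whenever the opponent folds often enough relative to the size of the bet and the pot.

    # Toy expected-value arithmetic for a bluff (illustrative numbers only).

    def bluff_ev(pot, bet, fold_probability):
        """Expected profit of a bluff that never wins when it gets called."""
        win_when_folds = fold_probability * pot          # opponent folds: we take the pot
        lose_when_called = (1 - fold_probability) * bet  # opponent calls: we lose our bet
        return win_when_folds - lose_when_called

    def breakeven_fold_rate(pot, bet):
        """Fold frequency at which the bluff breaks even: bet / (pot + bet)."""
        return bet / (pot + bet)

    print(bluff_ev(pot=100, bet=75, fold_probability=0.5))   # 12.5: profitable
    print(breakeven_fold_rate(pot=100, bet=75))              # ~0.43

On this simplified model, if the opponent folds more than about 43 percent of the time to a three-quarter-pot bet, bluffing with any two cards is profitable, which is the sense in which the decision simply "makes the most money" in that situation regardless of whether a human would call it deception.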

What does it mean, then, that an AI has definitively outplayed humans at the world's most popular poker game? Well, as we have seen with previous AI victories, humans can certainly learn from the computers. Some strategies that players are generally suspicious of (such as "donk betting") were embraced by the AI, suggesting they could be more useful than previously thought. "When I play the bot, I feel like I'm picking up something new to incorporate into my game," said poker pro Jimmy Chou.

There is also hope that the techniques used to build Pluribus will transfer to other situations. Many real-world scenarios resemble Texas Hold'em poker in the broadest sense, in that they involve multiple players, hidden information, and many possible outcomes.


Brown and Sandholm hope that the methods they have demonstrated can therefore be applied to areas such as cyber security, fraud prevention and financial negotiations. "Even something like helping navigate through traffic with self-driving cars," says Brown.

Can we now consider poker as a "beaten" game?

Brown does not answer the question directly, but he does note that Pluribus is a static program. After its initial eight-day training period, the AI was never updated or upgraded to better match its opponents' strategies. And during the 12 days it spent playing the pros, they could never find a consistent weakness in its game. There was nothing to exploit. From the moment it started playing, Pluribus was on top.