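The AI world has not only Go kingpins such as AlphaGo and Master but also a poker heavyweight, Claudico. Although Claudico lost to human players in the 2015 competition, researchers have now brought an upgraded successor, Libratus, which uses more efficient algorithms and can "bluff" in specific situations to disrupt its opponents' strategies. Libratus will challenge four of the world's top poker players at the Rivers Casino in Pittsburgh starting on 11 January. Let's wait and see.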
Meet the New AI Challenging Human Poker Pros
In 2015, several of the world’s top poker players faced down a supercomputer-powered artificial intelligence named Claudico during a grueling 80,000 hands of no-limit Texas Hold’em. Beginning tomorrow, a rematch of humans versus AI will test whether humanity can hold its own against an even more capable challenger.
The human margin of victory from the past event was not large enough to statistically prove whether humans or the Claudico AI were really the better poker players. This year’s rematch features four human poker pros playing for a prize pot of $200,000 against an AI called Libratus in the “Brains Vs. Artificial Intelligence: Upping the Ante” event being held at the Rivers Casino in Pittsburgh starting on 11 January. One of the researchers who helped build both Claudico and Libratus believes that AI will beat the best human players sometime within the next several years—if not sooner.
“I still think it will happen within the next five years, but it could be within the next month,” says Tuomas Sandholm, a computer scientist at Carnegie Mellon University. “It’s quite possible that humans will win this event, but it’s also possible that we’ll pull an upset.”
Game-playing AI has found solutions to some versions of poker. But heads-up, no-limit Texas Hold’em represents an especially complex challenge with 10^160 possible plays at different stages of the game (possibly more than the number of atoms in the universe). Such complexity exists because this two-player version of poker allows for unrestricted bet sizes.
To deal with such a game, many AIs rely on a technique called counterfactual regret minimization (CFR). Typical CFR algorithms try to solve games such as poker through several steps at each decision point. First, they come up with counterfactual values representing different game outcomes. Second, they apply a regret minimization approach to see which strategy leads to the best outcome. And third, they typically average the most recent strategy with all past strategies.
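As a minimal sketch of the regret-minimization and averaging steps described above, the Python below runs regret matching in self-play on a tiny matrix game. The game (a rock-paper-scissors variant in which winning with rock pays double), the function names, and the iteration count are illustrative assumptions, not details of Claudico or Libratus. Full CFR applies a similar update at every decision point of a game tree, weighted by counterfactual reach probabilities, which this one-shot example omits.

```python
# Row player's payoff in a hypothetical rock-paper-scissors variant where
# winning with rock pays double. Rows and columns are rock, paper, scissors.
PAYOFF = [
    [0, -1, 2],
    [1, 0, -1],
    [-2, 1, 0],
]
N_ACTIONS = 3


def strategy_from_regrets(regrets):
    """Regret matching: play actions in proportion to their positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # No action has positive regret yet, so fall back to a uniform mix.
    return [1.0 / len(regrets)] * len(regrets)


def action_values(payoff, opponent_strategy):
    """Expected payoff of each pure action against the opponent's mix."""
    return [
        sum(row[b] * opponent_strategy[b] for b in range(len(row)))
        for row in payoff
    ]


def train(iterations):
    """Run self-play and return both players' average strategies."""
    # The game is zero-sum, so the column player's payoff matrix is the
    # negated transpose of the row player's.
    column_payoff = [
        [-PAYOFF[a][b] for a in range(N_ACTIONS)] for b in range(N_ACTIONS)
    ]
    payoffs = [PAYOFF, column_payoff]
    regrets = [[0.0] * N_ACTIONS for _ in range(2)]
    strategy_sums = [[0.0] * N_ACTIONS for _ in range(2)]

    for _ in range(iterations):
        # Both players pick their current strategy before either one updates.
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            values = action_values(payoffs[player], strategies[1 - player])
            expected = sum(s * v for s, v in zip(strategies[player], values))
            for a in range(N_ACTIONS):
                # Regret: how much better action a would have done.
                regrets[player][a] += values[a] - expected
                # Accumulate the strategy so it can be averaged at the end.
                strategy_sums[player][a] += strategies[player][a]

    return [[s / iterations for s in sums] for sums in strategy_sums]


def exploitability(row_strategy):
    """Best payoff the column player can get against a fixed row strategy."""
    return max(
        -sum(row_strategy[a] * PAYOFF[a][b] for a in range(N_ACTIONS))
        for b in range(N_ACTIONS)
    )


if __name__ == "__main__":
    average = train(20000)
    print("average row strategy:", [round(p, 3) for p in average[0]])
    print("exploitability:", round(exploitability(average[0]), 4))
```

In this toy game the equilibrium mix is rock 1/4, paper 1/2, scissors 1/4, and the averaged strategy should approach it as the iteration count grows. The strategy played on any single iteration can swing widely, which is why the averaging step matters.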
The challenge with the CFR approach is that no supercomputer could solve for all the different game outcomes at any given point in heads-up, no-limit Texas Hold’em. Instead, CFR algorithms usually solve simplified versions of poker and use the resulting strategies to imperfectly play the full versions of poker games. Even these simplified “game trees” must map out many different paths branching out from each decision point.
But Sandholm and his Ph.D. student Noam Brown built Libratus from the ground up with more efficient algorithms. Their new variant of CFR can prune certain paths over time and effectively create a smaller game tree, which reduces the computational load and leads to speedups in calculation times. The efficiency of the Libratus algorithms also mitigates the problem of something called imperfect recall abstraction, which arises when CFR algorithms must “forget” part of the game tree history so that they can computationally focus on more refined models of the present.
Libratus has also improved on its predecessor by taking a safer approach to endgame strategy. Unlike Claudico, Libratus can calculate how much it has already benefited from past mistakes of opponents in the current hand and then balance that against how much it can afford to risk for the remainder of the hand. At certain points throughout its poker play, Libratus will stop and calculate what it should do for the endgame.
Last but not least, Libratus has evolved its poker-playing strategy by running for 15 million processor-core-hours on a new supercomputer called Bridges prior to the upcoming competition. By comparison, Claudico’s algorithms ran for only 2 to 3 million core-hours on an older (and now retired) supercomputer called Blacklight. During the competition, Libratus will also take time each night to perform offline calculations and improve before the next day of poker play.
AI still has much to prove next to the best human poker players. But Libratus does have a qualitative advantage based on its ability to take a perfectly balanced approach to the game. For example, it will be able to bluff during certain hands with precisely calculated values to balance risk and reward. It will also be able to randomize its moves in a way that human players would have great difficulty doing. That means it could baffle human opponents with unusual strategies such as making tiny bets or massive overbets in certain situations.
The human poker pros (Jason Les, Dong Kim, Daniel McAulay, and Jimmy Chou) will have gained their own advantages since the first “Brains Vs. Artificial Intelligence” event in 2015. They have had the opportunity to study many of the strategies of the previous Claudico AI. Furthermore, the human competition has improved overall since many top players began using game theory tools several years ago.
“Human poker pros are playing each other and getting better and better, and they’re using computational tools to get better at the game,” Sandholm says. “It’s not like we’re facing the same opposition from a year and eight months [ago].”
In the 2015 competition, the four poker pros managed to win more money than the Claudico AI over the course of 80,000 hands, but not enough to prove human dominance with statistical significance. To improve the chances of a statistically significant outcome, organizers increased the total number of hands to 120,000 and relaxed the level of statistical significance needed. Last time the threshold was 95 percent statistical significance (which is the standard used in the Annual Computer Poker Competition), but this time it is one standard deviation.
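To put the two thresholds on a common scale, the short Python sketch below converts between standard deviations and confidence levels and estimates how much the extra hands shrink the uncertainty in the average result. It rests on assumptions the article does not state: that the 95 percent figure is a two-sided confidence level, that the average winnings per hand are approximately normally distributed, and that hands are independent with the same variance. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_confidence(z):
    """Probability that a normal value lies within z standard deviations."""
    return STANDARD_NORMAL.cdf(z) - STANDARD_NORMAL.cdf(-z)


def z_for_two_sided_confidence(confidence):
    """Number of standard deviations matching a two-sided confidence level."""
    return STANDARD_NORMAL.inv_cdf(0.5 + confidence / 2)


def standard_error_ratio(old_hands, new_hands):
    """Relative standard error of the mean after changing the sample size.

    Assumes independent hands with equal variance, so the standard error
    scales with 1 / sqrt(number of hands).
    """
    return sqrt(old_hands / new_hands)


if __name__ == "__main__":
    print("one standard deviation, two-sided:", two_sided_confidence(1.0))
    print("z needed for 95 percent:", z_for_two_sided_confidence(0.95))
    print("standard error, 120,000 vs 80,000 hands:",
          standard_error_ratio(80_000, 120_000))
```

Under these assumptions, one standard deviation corresponds to roughly 68 percent two-sided confidence, against about 1.96 standard deviations for 95 percent, and moving from 80,000 to 120,000 hands cuts the standard error by about 18 percent. Both changes make a statistically significant verdict more likely for a given margin of victory.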
The four human poker pros agreed to play two hands at the same time against Libratus to squeeze in the increased number of hands. But they will be playing just seven hours per day during the 20-day event this year, which is less than the average of 8 to 10 hours per day during the 2015 competition.
Perhaps the greatest human advantage during the previous event was the poker players’ ability to adapt to the Claudico AI’s unusual strategies. By comparison, the Claudico AI generally did not adapt to its human opponents’ strategies. “If we’re talking about the absolute top human players from last time, I was very impressed by their quick adaptation,” Sandholm says. “They learned very quickly from a very small number of hands.”
Could Libratus have gained the ability that Claudico lacked? Sandholm declined to confirm or deny whether the new Libratus AI can adapt to its human opponents. To see how the new and improved AI performs, anyone is free to watch the competition through the online streaming service Twitch.