Libratus Paper
hand possibilities on the third round are grouped into 2.5 million buckets, and the 2.4 billion different possibilities on the fourth round are grouped into 1.25 million buckets. The idea is that solving this abstraction gives a detailed strategy for the first two betting rounds and a blueprint for the remaining two betting rounds; the subgame solver, discussed in the next subsection, will then refine the blueprint into a detailed strategy. The card abstraction algorithm was similar to that used in Baby Tartanian8 [Brown and Sandholm, 2016a] (the winner of the 2016 ACPC) and Tartanian7 [Brown et al., 2015] (the winner of the 2014 ACPC). The abstraction algorithm took the game size from 10^161 decision points down to 10^12.
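The bucketing procedure itself is inherited from the cited abstraction algorithms and is not detailed here. Purely as an illustration of what "grouping hands into buckets" means, the sketch below clusters hypothetical hands by a single equity feature with a tiny 1-D k-means; the equity feature, the use of k-means, and all numbers are assumptions, not the algorithm used by Libratus.

```python
# Illustration only: group hands into buckets by clustering a single strength
# feature with a tiny 1-D k-means. Libratus's actual card abstraction (from the
# Baby Tartanian8 / Tartanian7 algorithms) is more sophisticated; the equity
# feature and k-means here are assumptions.
import random
from collections import defaultdict

def kmeans_1d(values, k, iters=20):
    """Cluster scalar values into k buckets with plain 1-D k-means."""
    centers = random.sample(values, k)
    assignments = [0] * len(values)
    for _ in range(iters):
        # Assign every value to the nearest bucket center.
        assignments = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Move each center to the mean of its assigned members.
        for c in range(k):
            members = [v for v, a in zip(values, assignments) if a == c]
            if members:
                centers[c] = sum(members) / len(members)
    return assignments

# 10,000 hypothetical hands, each summarized by an equity estimate in [0, 1].
hand_equity = [random.random() for _ in range(10_000)]
bucket_of = kmeans_1d(hand_equity, k=50)

buckets = defaultdict(list)
for hand, b in zip(hand_equity, bucket_of):
    buckets[b].append(hand)
print(len(buckets), "buckets; largest bucket holds", max(len(v) for v in buckets.values()), "hands")
```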
We solved the abstract game via a distributed version of an improvement over Monte Carlo Counterfactual Regret Minimization (MCCFR) [Zinkevich et al., 2007; Lanctot et al., 2009; Brown et al., 2015]. MCCFR is an iterative algorithm which independently minimizes regret at every decision point. If both players play according to MCCFR in a two-player zero-sum game, then their average strategies provably converge to a Nash equilibrium. Libratus improves over vanilla MCCFR through a sampled form of Regret-Based Pruning (RBP) [Brown and Sandholm, 2015] (which we also used in our Baby Tartanian8 agent [Brown and Sandholm, 2016a]). At a high level, our improvement is that paths in the tree that have very negative regret (for the player being updated on the current iteration) are visited less often. This leads to a significant speed improvement, thereby enabling a large fine-grained abstraction to be solved. It also mitigates the downsides of imperfect-recall abstraction (which is the state of the art) because the effective in-degree of abstract states decreases as some paths to them get de-emphasized.
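As a rough illustration of the bookkeeping MCCFR performs at each decision point, and of the pruning idea just described, here is a minimal sketch. It is not the distributed implementation used by Libratus, and the pruning threshold is an arbitrary placeholder.

```python
# Illustrative bookkeeping for one decision point under MCCFR-style updates, with
# the regret-based-pruning idea noted above. Not Libratus's distributed
# implementation; the pruning threshold is an arbitrary placeholder.
class DecisionPoint:
    def __init__(self, num_actions):
        self.regret = [0.0] * num_actions        # cumulative counterfactual regret
        self.strategy_sum = [0.0] * num_actions  # accumulator for the average strategy

    def current_strategy(self):
        """Regret matching: play actions in proportion to their positive regret."""
        positive = [max(r, 0.0) for r in self.regret]
        total = sum(positive)
        if total == 0.0:
            return [1.0 / len(self.regret)] * len(self.regret)
        return [p / total for p in positive]

    def update(self, action_values, reach_prob):
        """Accumulate regrets and the reach-weighted strategy after a sampled traversal.
        action_values[a] is the sampled counterfactual value of taking action a here."""
        strategy = self.current_strategy()
        node_value = sum(s * v for s, v in zip(strategy, action_values))
        for a, v in enumerate(action_values):
            self.regret[a] += v - node_value
            self.strategy_sum[a] += reach_prob * strategy[a]

    def prunable(self, action, threshold=-1e5):
        """Pruning idea: a branch with very negative regret (for the player being
        updated) can be visited less often on later iterations."""
        return self.regret[action] < threshold

    def average_strategy(self):
        """The average strategy is what converges to a Nash equilibrium
        in two-player zero-sum games."""
        total = sum(self.strategy_sum)
        if total == 0.0:
            return [1.0 / len(self.strategy_sum)] * len(self.strategy_sum)
        return [s / total for s in self.strategy_sum]
```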
2.2 Nested Safe Subgame Solving
When the third betting round is reached, the second module solves a finer-grained abstraction of the remaining game, taking into account the blueprint of the strategy for the entire game. Unlike a subgame of a perfect-information game, an imperfect-information subgame cannot be solved in isolation: the Nash equilibrium strategy in other subgames affects the optimal strategy in the subgame that is reached during play. Nevertheless, we can approximate a good strategy in the subgame that is reached if we have a good estimate of the value of reaching the subgame in an equilibrium. The first module estimated this value for every subgame. Using these subgame values as input, subgame solving creates and solves a finer-grained abstraction in the subgame that is reached.
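To make "creates and solves a finer-grained abstraction in the subgame" concrete, the toy below solves a small two-player zero-sum matrix game, standing in for a subgame, by regret-matching self-play; the average strategies approach an equilibrium. The payoff matrix and the choice of solver are illustrative assumptions; Libratus's actual subgame solver is the one described in [Brown and Sandholm, 2017b].

```python
# Toy stand-in for "solve the subgame": regret-matching self-play on a small
# zero-sum matrix game. The payoff matrix is made up for illustration; it is
# not a real poker subgame.
def regret_matching(regrets):
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1.0 / len(regrets)] * len(regrets)

def solve_matrix_game(payoff, iters=20_000):
    """Approximate equilibrium (row strategy, column strategy) of a zero-sum game
    where payoff[i][j] is the row player's payoff."""
    n, m = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n, [0.0] * m
    row_sum, col_sum = [0.0] * n, [0.0] * m
    for _ in range(iters):
        x, y = regret_matching(row_regret), regret_matching(col_regret)
        # Expected payoff of each pure action against the opponent's current strategy.
        row_vals = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        col_vals = [-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        value_row = sum(x[i] * row_vals[i] for i in range(n))
        value_col = sum(y[j] * col_vals[j] for j in range(m))
        for i in range(n):
            row_regret[i] += row_vals[i] - value_row
            row_sum[i] += x[i]
        for j in range(m):
            col_regret[j] += col_vals[j] - value_col
            col_sum[j] += y[j]
    normalize = lambda v: [a / sum(v) for a in v]
    return normalize(row_sum), normalize(col_sum)

# 2x2 example: the equilibrium mixes both actions at (0.4, 0.6) for each player.
subgame = [[2.0, -1.0], [-1.0, 1.0]]
x, y = solve_matrix_game(subgame)
print("row strategy:", [round(p, 3) for p in x], " column strategy:", [round(p, 3) for p in y])
```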
This finer-grained abstraction does not use any card abstraction and uses a dense action abstraction. Also, rather than applying action translation, as is done on the first two rounds and as has been done in prior poker AIs, Libratus constructs and solves a new subgame every time an opponent chooses an action that is not in the finer-grained abstraction (in practice, it constructs a new subgame every time the opponent bets). This allows it to avoid the rounding error due to action translation and leads to dramatically lower exploitability [Brown and Sandholm, 2017b].
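The contrast between rounding an off-tree bet into the abstraction (action translation, used on the first two rounds) and triggering a freshly solved subgame (nested subgame solving, used afterward) can be sketched as follows. The bet sizes, the pot-fraction abstraction, and the function names are illustrative assumptions, not Libratus's actual configuration.

```python
# Sketch of the two ways an off-tree opponent bet is handled, as described above.
# Bet sizes, the abstraction, and all names are illustrative assumptions.
ABSTRACT_BET_FRACTIONS = [0.5, 1.0, 2.0, 4.0]  # bets as fractions of the pot (made up)

def translate_action(bet, pot):
    """Round an off-tree bet to the nearest size in the abstraction (first two
    betting rounds). This is what introduces rounding error."""
    fraction = bet / pot
    return min(ABSTRACT_BET_FRACTIONS, key=lambda f: abs(f - fraction)) * pot

def respond_to_bet(bet, pot, betting_round):
    """On rounds 1-2, translate the bet into the blueprint abstraction; on rounds
    3-4, construct and solve a new subgame for the exact bet size instead."""
    if betting_round <= 2:
        return ("use_blueprint", translate_action(bet, pot))
    # Later rounds: nested subgame solving avoids the rounding error entirely.
    return ("solve_new_subgame", bet)

print(respond_to_bet(bet=130, pot=100, betting_round=2))  # ('use_blueprint', 100.0)
print(respond_to_bet(bet=130, pot=100, betting_round=3))  # ('solve_new_subgame', 130)
```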
Another novel aspect of the subgame solver is that it guarantees that the solution is no worse than the precomputed equilibrium approximation, taking into account the magnitude of the opponent's mistakes in the hand so far to enlarge the strategy polytope that can be safely optimized over. This begets better strategies [Brown and Sandholm, 2017b] than prior subgame-solving techniques [Ganzfried and Sandholm, 2015; Burch et al., 2014; Jackson, 2014; Moravcik et al., 2016].

A further novel aspect is that Libratus changes its action abstraction in each subgame. Thus the opponents must adapt to new bet sizes in each subgame.

2.3 Self-Improvement
As described in Section 2.1, Libratus uses a dense action abstraction on the first two betting rounds. If an opponent does not bet an amount that is in the abstraction, the bet is rounded to a nearby size that is in the abstraction. This causes the AI's strategy, and its estimates of the values of reaching subgames, to be slightly off. To improve upon this, the AI determined in the background a small number of actions to add to the abstraction that would reduce this rounding error as much as possible. The choice of actions was based on a combination of which actions the opponents were choosing most frequently and how far those actions were from their nearest actions in the abstraction. Once an action was selected, a strategy was calculated for it in a similar manner to the subgame solving described in Section 2.2. From that point on, if that action (or a nearby one) was chosen by an opponent, the newly solved subgame strategy would be used.
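A minimal sketch of that action-selection idea: score each off-tree bet size by combining how often opponents chose it with its distance from the nearest size already in the abstraction, then add the top-scoring candidates. The product score and the data format are assumptions for illustration; the paper does not specify the exact rule.

```python
# Illustrative scoring of candidate bet sizes to add to the abstraction, combining
# how often opponents used each size with its distance from the nearest abstract
# size. The product score and data format are assumptions, not Libratus's exact rule.
from collections import Counter

def rank_candidate_actions(observed_bets, abstract_sizes, top_k=3):
    counts = Counter(observed_bets)
    def score(bet):
        distance = min(abs(bet - a) for a in abstract_sizes)
        return counts[bet] * distance  # frequent AND far from the abstraction = high priority
    candidates = [b for b in counts if b not in abstract_sizes]
    return sorted(candidates, key=score, reverse=True)[:top_k]

# Example: pot-fraction bets seen from opponents, against an abstraction of {0.5, 1.0, 2.0}.
seen = [0.75, 0.75, 0.75, 1.4, 1.4, 3.0, 0.55]
print(rank_candidate_actions(seen, abstract_sizes=[0.5, 1.0, 2.0]))
```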
3 Agent Construction
In total, Libratus used about 25 million core hours. Of those, about 13 million core hours were used for exploratory experiments and evaluation. About 6 million core hours were spent on the initial abstraction and equilibrium-finding component, another 3 million were used for nested subgame solving, and about 3 million were used on the self-improvement algorithm.

The equilibrium-finding and self-improvement algorithms used 196 nodes on the Bridges supercomputer at the Pittsburgh Supercomputing Center. Each node has 128 GB of memory and 28 cores, but only 14 cores are used by the agent. An asymmetric abstraction was used that had more actions for the opponent, to better reduce the error resulting from action translation [Bard et al., 2014].

The subgame solver used 50 nodes per game. Here we used CFR+ [Tammelin et al., 2015] combined with specialized optimizations [Johanson et al., 2011] for equilibrium finding. Recent work suggests that subgame solving could be even faster by leveraging warm starting [Brown and Sandholm, 2016b], new pruning techniques [Brown and Sandholm, 2017a], or first-order methods [Nesterov, 2005; Kroer et al., 2017].
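CFR+ differs from vanilla CFR chiefly in that cumulative regrets are floored at zero after each update (regret matching+) and the strategy average is weighted toward later iterations. The sketch below shows that per-decision-point update in simplified form; the specialized optimizations of [Johanson et al., 2011] and the distributed setup are not shown.

```python
# Minimal sketch of the regret-matching+ update behind CFR+ at one decision point:
# cumulative regrets are floored at zero and the strategy average is weighted
# linearly toward later iterations. Simplified for illustration.
class CFRPlusNode:
    def __init__(self, num_actions):
        self.regret_plus = [0.0] * num_actions   # nonnegative cumulative regrets
        self.strategy_sum = [0.0] * num_actions  # linearly weighted average strategy

    def current_strategy(self):
        total = sum(self.regret_plus)
        if total == 0.0:
            return [1.0 / len(self.regret_plus)] * len(self.regret_plus)
        return [r / total for r in self.regret_plus]

    def update(self, action_values, iteration):
        strategy = self.current_strategy()
        node_value = sum(s * v for s, v in zip(strategy, action_values))
        for a, v in enumerate(action_values):
            # Regret matching+: never let cumulative regret go below zero.
            self.regret_plus[a] = max(self.regret_plus[a] + (v - node_value), 0.0)
            # Linear averaging: later iterations count more.
            self.strategy_sum[a] += iteration * strategy[a]
```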
Acknowledgments
The research was supported by NSF grant 1617590, ARO award W911NF-17-1-0082, and XSEDE computing resources provided by the Pittsburgh Supercomputing Center. The Brains vs. AI match was sponsored by CMU, Rivers Casino, GreatPoint Ventures, Avenue4Analytics, TNG Technology Consulting, Artificial Intelligence, Intel, and Optimized Markets. Ben Clayman computed statistics of the play of our AIs.
References
[Bard et al., 2014] Nolan Bard, Michael Johanson, and Michael Bowling. Asymmetric abstractions for adversarial settings. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 501–508, 2014.

[Bowling et al., 2015] Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up limit hold'em poker is solved. Science, 347(6218):145–149, January 2015.

[Brown and Sandholm, 2014] Noam Brown and Tuomas Sandholm. Regret transfer and parameter optimization. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pages 594–601. AAAI Press, 2014.

[Brown and Sandholm, 2015] Noam Brown and Tuomas Sandholm. Regret-based pruning in extensive-form games. In Advances in Neural Information Processing Systems, pages 1972–1980, 2015.

[Brown and Sandholm, 2016a] Noam Brown and Tuomas Sandholm. Baby Tartanian8: Winning agent from the 2016 annual computer poker competition. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16), pages 4238–4239, 2016.

[Brown and Sandholm, 2016b] Noam Brown and Tuomas Sandholm. Strategy-based warm starting for regret minimization in games. In AAAI Conference on Artificial Intelligence (AAAI), 2016.

[Brown and Sandholm, 2017a] Noam Brown and Tuomas Sandholm. Reduced space and faster convergence in imperfect-information games via pruning. In International Conference on Machine Learning, 2017.

[Brown and Sandholm, 2017b] Noam Brown and Tuomas Sandholm. Safe and nested endgame solving for imperfect-information games. In AAAI-17 Workshop on Computer Poker and Imperfect Information Games, 2017.

[Brown et al., 2015] Noam Brown, Sam Ganzfried, and Tuomas Sandholm. Hierarchical abstraction, distributed equilibrium computation, and post-processing, with application to a champion no-limit Texas hold'em agent. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pages 7–15. International Foundation for Autonomous Agents and Multiagent Systems, 2015.

[Burch et al., 2014] Neil Burch, Michael Johanson, and Michael Bowling. Solving imperfect information games using decomposition. In AAAI Conference on Artificial Intelligence (AAAI), pages 602–608, 2014.

[Campbell et al., 2002] Murray Campbell, A. Joseph Hoane, and Feng-Hsiung Hsu. Deep Blue. Artificial Intelligence, 134(1-2):57–83, 2002.

[Ganzfried and Sandholm, 2015] Sam Ganzfried and Tuomas Sandholm. Endgame solving in large imperfect-information games. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 37–45, 2015.

[Jackson, 2014] Eric Jackson. A time and space efficient algorithm for approximately solving large imperfect information games. In AAAI Workshop on Computer Poker and Imperfect Information, 2014.

[Johanson et al., 2011] Michael Johanson, Kevin Waugh, Michael Bowling, and Martin Zinkevich. Accelerating best response calculation in large extensive games. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 258–265, 2011.

[Kroer et al., 2017] Christian Kroer, Kevin Waugh, Fatma Kılınç-Karzan, and Tuomas Sandholm. Theoretical and practical advances on smoothing for extensive-form games. In Proceedings of the ACM Conference on Economics and Computation (EC), 2017.

[Lanctot et al., 2009] Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. Monte Carlo sampling for regret minimization in extensive games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 1078–1086, 2009.

[Moravcik et al., 2016] Matej Moravcik, Martin Schmid, Karel Ha, Milan Hladik, and Stephen Gaukrodger. Refining subgames in large imperfect information games. In AAAI Conference on Artificial Intelligence (AAAI), 2016.

[Nesterov, 2005] Yurii Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization, 16(1):235–249, 2005.

[Schaeffer et al., 2007] Jonathan Schaeffer, Neil Burch, Yngvi Björnsson, Akihiro Kishimoto, Martin Müller, Robert Lake, Paul Lu, and Steve Sutphen. Checkers is solved. Science, 317(5844):1518–1522, 2007.

[Silver et al., 2016] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.

[Tammelin et al., 2015] Oskari Tammelin, Neil Burch, Michael Johanson, and Michael Bowling. Solving heads-up limit Texas hold'em. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 645–652, 2015.

[Zinkevich et al., 2007] Martin Zinkevich, Michael Johanson, Michael H. Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 1729–1736, 2007.