
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17)

Libratus: The Superhuman AI for No-Limit Poker
(Demonstration)

Noam Brown
Computer Science Department, Carnegie Mellon University

Tuomas Sandholm*
Computer Science Department, Carnegie Mellon University, and Strategic Machine, Inc.

*Corresponding author. Email: sandholm@cs.cmu.edu
Abstract

No-limit Texas Hold'em is the most popular variant of poker in the world. Heads-up no-limit Texas Hold'em is the main benchmark challenge for AI in imperfect-information games. We present Libratus, the first (and so far only) AI to defeat top human professionals in that game. Libratus's architecture features three main modules, each of which has new algorithms: pre-computing a solution to an abstraction of the game, which provides a high-level blueprint for the strategy of the AI; a new nested subgame-solving algorithm, which repeatedly calculates a more detailed strategy as play progresses; and a self-improving module, which augments the pre-computed blueprint over time.

1 Introduction

Recreational games have long been used in AI as benchmarks to evaluate the progress of the field. AIs have beaten top humans in chess [Campbell et al., 2002] and Go [Silver et al., 2016]. Checkers was even completely solved [Schaeffer et al., 2007]. However, these are perfect-information games: both players know the exact state of the game at every point. In contrast, poker is an imperfect-information game: part of the state is hidden from a player because the other player has private information. Many real-world applications can be modeled as imperfect-information games, such as negotiations, business strategy, security interactions, and auctions. Indeed, imperfect information is common in the real world, while perfect information is rare, making imperfect-information games particularly suitable for modeling real-world strategic interactions. Dealing with hidden information requires drastically different AI techniques. Heads-up no-limit Texas Hold'em has long been the primary benchmark challenge for imperfect-information games.

In January 2017 Libratus beat a team of four top-10 heads-up no-limit specialist professionals in a 120,000-hand, 20-day Brains vs. AI challenge match. That is the first time an AI has beaten top humans in this game. Libratus beat the humans by a large margin (147 mbb/hand), with 99.98% statistical significance. It also beat each of the humans individually.
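For readers unfamiliar with the unit, mbb/hand is milli-big-blinds won per hand. A minimal sketch of the conversion is below; the chip total and blind size are illustrative assumptions, not figures from this paper.

```python
def mbb_per_hand(chips_won: float, big_blind: float, hands: int) -> float:
    """Convert total chip winnings into milli-big-blinds won per hand."""
    return chips_won / big_blind / hands * 1000

# Illustrative only: at an assumed $100 big blind, a 147 mbb/hand rate over
# 120,000 hands corresponds to 0.147 * 100 * 120,000 = $1,764,000 in winnings.
print(mbb_per_hand(1_764_000, 100, 120_000))  # -> 147.0
```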
2 Architecture of Libratus

Libratus's strategy was not programmed in, but rather generated algorithmically. The algorithms are domain-independent and have applicability to a variety of imperfect-information games. Libratus features three main modules, and is powered by new algorithms in each of the three:

1. Computing approximate Nash equilibrium strategies before the event.
2. Subgame solving during play.
3. Improving Libratus's own strategy to play even closer to equilibrium, based on what holes the opponents have been able to identify and exploit.

The next three subsections discuss these, respectively.

2.1 Abstraction and Equilibrium Finding

It is infeasible to pre-compute a strategy for each of the 10^161 different decision points in heads-up no-limit Texas hold'em.[1,2] However, many situations are strategically similar and can be treated identically at only a small cost. For example, there is little difference between a bet of $500 and a bet of $501. Rather than come up with a unique strategy for each of those situations, it is standard to group them together and treat them identically, so that only one strategy is generated for them. There are two kinds of such abstraction: action abstraction and card abstraction.

[1] The standard version of the game has 10^161 decision points because both players have $20,000 and are limited to dollar-increment bets.
[2] Heads-up limit Texas Hold'em, a significantly smaller game with 10^13 decision points, was essentially solved in 2015 [Bowling et al., 2015; Tammelin et al., 2015].

In action abstraction, only a few of the nearly 20,000 possible actions available at any point in the game are included in the abstraction, for both the agent and the opponent. If, during actual play, the opponent chooses an action that is not in the abstraction, then it is standard to map that action to a nearby action that is in the abstraction. The actions that we included in the abstraction were determined by analyzing the most common actions taken by prior top AIs in the Annual Computer Poker Competition (ACPC). For the first few actions of the game, the actions to include in the abstraction (i.e., bet sizes) were determined by a parameter optimization algorithm which converged to a locally optimal set of bet sizes [Brown and Sandholm, 2014].
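The mapping of off-tree bets to in-abstraction actions is described above only in prose. The sketch below shows the basic idea with a simple nearest-size rule and a made-up bet-size set; Libratus's actual translation rule is not specified in this paper, so treat both the rule and the sizes as illustrative assumptions.

```python
def translate_action(bet: float, abstraction: list[float]) -> float:
    """Map an off-tree bet to the nearest bet size in the action abstraction.

    A minimal nearest-neighbor rule for illustration; real action-translation
    schemes are more careful (e.g., randomized mappings that limit how much
    an opponent can exploit the rounding).
    """
    return min(abstraction, key=lambda a: abs(a - bet))

# Hypothetical abstraction containing a handful of bet sizes (in dollars).
sizes = [100, 250, 500, 1000, 2000, 5000, 20000]
print(translate_action(501, sizes))  # -> 500: a $501 bet is treated like $500
```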


In card abstraction, similar poker hands are bucketed together and treated identically. Libratus does not use any card abstraction on the first (preflop) and second (flop) betting rounds. The last two betting rounds, which are exponentially larger, are more coarsely abstracted. The 55 million different hand possibilities on the third round are grouped into 2.5 million buckets, and the 2.4 billion different possibilities on the fourth round are grouped into 1.25 million buckets. The idea is that solving this abstraction gives a detailed strategy for the first two betting rounds and a blueprint for the remaining two betting rounds; the subgame solver, discussed in the next subsection, will then refine the blueprint into a detailed strategy. The card abstraction algorithm was similar to that used in Baby Tartanian8 [Brown and Sandholm, 2016a] (the winner of the 2016 ACPC) and Tartanian7 [Brown et al., 2015] (the winner of the 2014 ACPC). The abstraction algorithm took the game size from 10^161 decision points down to 10^12.
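The paper does not spell out the bucketing algorithm beyond citing the Baby Tartanian8 and Tartanian7 abstractions. As a flavor of how card bucketing can work, the sketch below clusters hands by a single scalar "equity" feature with a tiny 1-D k-means; real poker abstractions typically cluster richer features (e.g., equity distributions), so the feature, cluster count, and method here are all illustrative assumptions.

```python
import random

def kmeans_1d(values: list[float], k: int, iters: int = 20) -> list[int]:
    """Tiny 1-D k-means: return a bucket index for each input value."""
    centers = random.sample(values, k)
    assign = [0] * len(values)
    for _ in range(iters):
        # Assign each hand to its nearest bucket center.
        assign = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Move each center to the mean of its assigned hands.
        for j in range(k):
            members = [v for v, a in zip(values, assign) if a == j]
            if members:
                centers[j] = sum(members) / len(members)
    return assign

# Illustrative: bucket 1,000 hypothetical hands by a scalar equity in [0, 1],
# so strategically similar hands share one strategy.
equities = [random.random() for _ in range(1000)]
buckets = kmeans_1d(equities, k=50)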
We solved the abstract game via a distributed version of an improvement over Monte Carlo Counterfactual Regret Minimization (MCCFR) [Zinkevich et al., 2007; Lanctot et al., 2009; Brown et al., 2015]. MCCFR is an iterative algorithm which independently minimizes regret at every decision point. If both players play according to MCCFR in a two-player zero-sum game, then their average strategies provably converge to a Nash equilibrium. Libratus improves over vanilla MCCFR through a sampled form of Regret-Based Pruning (RBP) [Brown and Sandholm, 2015] (which we also used in our Baby Tartanian8 agent [Brown and Sandholm, 2016a]). At a high level, our improvement is that paths in the tree that have very negative regret (for the player that is being updated on the current iteration) are visited less often. This leads to a significant speed improvement, thereby enabling a large fine-grained abstraction to be solved. It also mitigates the downsides of imperfect-recall abstraction (which is the state of the art) because the effective in-degree of abstract states decreases as some paths to them get de-emphasized.
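The building block of CFR-family algorithms at each decision point is regret matching, shown below in a minimal form. This is the standard textbook update, not Libratus's distributed sampled implementation.

```python
def regret_matching(cum_regret: list[float]) -> list[float]:
    """Compute the current strategy from cumulative regrets at one decision point.

    Actions are played in proportion to their positive cumulative regret; if no
    action has positive regret, play uniformly at random.
    """
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    n = len(cum_regret)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

# After each iteration, regrets are updated by how much better each action would
# have done than the current strategy; MCCFR estimates those values by sampling,
# and RBP (per the paper) visits paths with very negative regret less often.
print(regret_matching([10.0, -4.0, 30.0]))  # -> [0.25, 0.0, 0.75]
```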
ments and evaluation. About 6 million core hours were spent
2.2 Nested Safe Subgame Solving on the initial abstraction and equilibrium finding component,
another 3 million were used for nested subgame solving, and
The second module solves a finer-grained abstraction of the about 3 million were used on the self-improvement algorithm.
remaining game, taking into account the blueprint of the strat- The equilibrium finding and self-improvement algorithms
egy for the entire game, when the third round is reached. used 196 nodes on the Bridges supercomputer at the Pitts-
Unlike perfect-information games, an imperfect-information burgh Supercomputing Center. Each node has 128 GB of
subgame cannot be solved in isolation. The Nash equilibrium memory and 28 cores, but only 14 cores are used by the agent.
strategy in other subgames affects the optimal strategy in the An asymmetric abstraction was used that had more actions for
subgame that is reached during play. Nevertheless, we can the opponent, to better reduce the error resulting from action
approximate a good strategy in the subgame that is reached if translation [Bard et al., 2014].
we have a good estimate of the value of reaching the subgame
The subgame solver used 50 nodes per game. Here we
in an equilibrium. The first module estimated this value for
used CFR+ [Tammelin et al., 2015] combined with spe-
every subgame. Using these subgame values as input, sub-
cialized optimizations [Johanson et al., 2011] for equilib-
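A minimal skeleton of the nested subgame-solving control flow described above is sketched below. All of the names here are hypothetical: `build_subgame` and `solve_with_value_constraints` stand in for machinery this paper covers only in prose, so they are left as stubs rather than presented as Libratus's implementation.

```python
def build_subgame(state):
    """Stub: construct a fine-grained subgame rooted at `state`
    (dense action abstraction, no card abstraction)."""
    raise NotImplementedError

def solve_with_value_constraints(subgame, value_estimates):
    """Stub: solve the subgame so the resulting strategy is no worse than
    the blueprint's estimated values of reaching it (the safety guarantee)."""
    raise NotImplementedError

def choose_action(state, blueprint, value_estimates):
    """Hypothetical per-decision control flow for nested subgame solving."""
    if state.round < 3:                      # rounds 1-2: play from the blueprint
        return blueprint.action(state)
    if state.opponent_just_bet:              # in practice, every opponent bet
        subgame = build_subgame(state)       # fresh subgame with new bet sizes
        state.strategy = solve_with_value_constraints(subgame, value_estimates)
    return state.strategy.action(state)
```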
2.3 Self-Improvement

As described in Section 2.1, Libratus uses a dense action abstraction on the first two betting rounds. If an opponent bets an amount that is not in the abstraction, the bet is rounded to a nearby size that is in the abstraction. This causes the AI's strategy, and its estimates of the values of reaching subgames, to be slightly off. To improve upon this, in the background the AI determined a small number of actions to add to the abstraction that would reduce this rounding error as much as possible. The choice of actions was based on a combination of which actions the opponents were choosing most frequently and how far those actions were from their nearest actions in the abstraction. Once an action was selected, a strategy was calculated for it in a manner similar to subgame solving, described in Section 2.2. From that point on, if that action (or a nearby one) was chosen by an opponent, the newly solved subgame strategy would be used.
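The paper names the two ingredients of the action-selection criterion (opponent frequency and distance to the nearest in-abstraction action) but not how they are combined; the frequency-times-distance score below is an assumed combination, for illustration only.

```python
from collections import Counter

def pick_action_to_add(observed_bets: list[float], abstraction: list[float]) -> float:
    """Choose one off-tree bet size to add to the action abstraction.

    Scores each observed opponent bet by how often it occurred times how far it
    sits from its nearest in-abstraction action. The product is an assumed
    scoring rule; the paper only names the two ingredients.
    """
    counts = Counter(observed_bets)
    def score(bet: float) -> float:
        gap = min(abs(bet - a) for a in abstraction)
        return counts[bet] * gap
    return max(counts, key=score)

# Hypothetical example: opponents repeatedly bet $750, which is far from both
# $500 and $1000, so it is the most valuable size to add.
print(pick_action_to_add([750, 750, 750, 510], [100, 250, 500, 1000]))  # -> 750
```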
3 Agent Construction

In total, Libratus used about 25 million core hours. Of those, about 13 million core hours were used for exploratory experiments and evaluation. About 6 million core hours were spent on the initial abstraction and equilibrium-finding component, another 3 million were used for nested subgame solving, and about 3 million were used on the self-improvement algorithm.

The equilibrium-finding and self-improvement algorithms used 196 nodes on the Bridges supercomputer at the Pittsburgh Supercomputing Center. Each node has 128 GB of memory and 28 cores, but only 14 cores are used by the agent. An asymmetric abstraction was used that had more actions for the opponent, to better reduce the error resulting from action translation [Bard et al., 2014].

The subgame solver used 50 nodes per game. Here we used CFR+ [Tammelin et al., 2015] combined with specialized optimizations [Johanson et al., 2011] for equilibrium finding. Recent work suggests that subgame solving could be even faster by leveraging warm starting [Brown and Sandholm, 2016b], new pruning techniques [Brown and Sandholm, 2017a], or first-order methods [Nesterov, 2005; Kroer et al., 2017].

Acknowledgments

The research was supported by NSF grant 1617590, ARO award W911NF-17-1-0082, and XSEDE computing resources provided by the Pittsburgh Supercomputing Center. The Brains vs. AI match was sponsored by CMU, Rivers Casino, GreatPoint Ventures, Avenue4Analytics, TNG Technology Consulting, Artificial Intelligence, Intel, and Optimized Markets. Ben Clayman computed statistics of the play of our AIs.


References

[Bard et al., 2014] Nolan Bard, Michael Johanson, and Michael Bowling. Asymmetric abstractions for adversarial settings. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 501–508, 2014.

[Bowling et al., 2015] Michael Bowling, Neil Burch, Michael Johanson, and Oskari Tammelin. Heads-up limit hold'em poker is solved. Science, 347(6218):145–149, January 2015.

[Brown and Sandholm, 2014] Noam Brown and Tuomas Sandholm. Regret transfer and parameter optimization. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, pages 594–601. AAAI Press, 2014.

[Brown and Sandholm, 2015] Noam Brown and Tuomas Sandholm. Regret-based pruning in extensive-form games. In Advances in Neural Information Processing Systems, pages 1972–1980, 2015.

[Brown and Sandholm, 2016a] Noam Brown and Tuomas Sandholm. Baby Tartanian8: Winning agent from the 2016 Annual Computer Poker Competition. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16), pages 4238–4239, 2016.

[Brown and Sandholm, 2016b] Noam Brown and Tuomas Sandholm. Strategy-based warm starting for regret minimization in games. In AAAI Conference on Artificial Intelligence (AAAI), 2016.

[Brown and Sandholm, 2017a] Noam Brown and Tuomas Sandholm. Reduced space and faster convergence in imperfect-information games via pruning. In International Conference on Machine Learning, 2017.

[Brown and Sandholm, 2017b] Noam Brown and Tuomas Sandholm. Safe and nested endgame solving for imperfect-information games. In AAAI-17 Workshop on Computer Poker and Imperfect Information Games, 2017.

[Brown et al., 2015] Noam Brown, Sam Ganzfried, and Tuomas Sandholm. Hierarchical abstraction, distributed equilibrium computation, and post-processing, with application to a champion no-limit Texas hold'em agent. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems, pages 7–15, 2015.

[Burch et al., 2014] Neil Burch, Michael Johanson, and Michael Bowling. Solving imperfect information games using decomposition. In AAAI Conference on Artificial Intelligence (AAAI), pages 602–608, 2014.

[Campbell et al., 2002] Murray Campbell, A. Joseph Hoane, and Feng-Hsiung Hsu. Deep Blue. Artificial Intelligence, 134(1-2):57–83, 2002.

[Ganzfried and Sandholm, 2015] Sam Ganzfried and Tuomas Sandholm. Endgame solving in large imperfect-information games. In International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pages 37–45, 2015.

[Jackson, 2014] Eric Jackson. A time and space efficient algorithm for approximately solving large imperfect information games. In AAAI Workshop on Computer Poker and Imperfect Information, 2014.

[Johanson et al., 2011] Michael Johanson, Kevin Waugh, Michael Bowling, and Martin Zinkevich. Accelerating best response calculation in large extensive games. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 258–265, 2011.

[Kroer et al., 2017] Christian Kroer, Kevin Waugh, Fatma Kılınç-Karzan, and Tuomas Sandholm. Theoretical and practical advances on smoothing for extensive-form games. In Proceedings of the ACM Conference on Economics and Computation (EC), 2017.

[Lanctot et al., 2009] Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. Monte Carlo sampling for regret minimization in extensive games. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 1078–1086, 2009.

[Moravcik et al., 2016] Matej Moravcik, Martin Schmid, Karel Ha, Milan Hladik, and Stephen Gaukrodger. Refining subgames in large imperfect information games. In AAAI Conference on Artificial Intelligence (AAAI), 2016.

[Nesterov, 2005] Yurii Nesterov. Excessive gap technique in nonsmooth convex minimization. SIAM Journal on Optimization, 16(1):235–249, 2005.

[Schaeffer et al., 2007] Jonathan Schaeffer, Neil Burch, Yngvi Björnsson, Akihiro Kishimoto, Martin Müller, Robert Lake, Paul Lu, and Steve Sutphen. Checkers is solved. Science, 317(5844):1518–1522, 2007.

[Silver et al., 2016] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.

[Tammelin et al., 2015] Oskari Tammelin, Neil Burch, Michael Johanson, and Michael Bowling. Solving heads-up limit Texas hold'em. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 645–652, 2015.

[Zinkevich et al., 2007] Martin Zinkevich, Michael Johanson, Michael H. Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), pages 1729–1736, 2007.

