05 Games
Artificial Intelligence
Adversarial Search
and Games
AIMA Chapter 5
s0: Empty board.
Actions(s): Play any empty square.
Result(s, a): The mover's symbol (x/o) is placed on the chosen empty square.
Terminal(s): Did a player win, or is the game a draw?
Utility(s): +1 if x wins, -1 if o wins, and 0 for a draw. Utility is only defined for terminal states.
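As a concrete illustration, here is a minimal sketch of this formulation in Python (the board encoding and helper names are my own, not from the slides):

# Board: tuple of 9 cells, each 'x', 'o', or None (assumed encoding).
S0 = (None,) * 9  # s0: the empty board

def player(s):
    # x moves first, so x is to move whenever the counts are equal.
    return 'x' if s.count('x') == s.count('o') else 'o'

def actions(s):
    # Actions(s): the indices of all empty squares.
    return [i for i in range(9) if s[i] is None]

def result(s, a):
    # Result(s, a): the mover's symbol is placed on empty square a.
    b = list(s)
    b[a] = player(s)
    return tuple(b)

LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

def winner(s):
    for i, j, k in LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal(s):
    # Terminal(s): a player won, or the board is full (draw).
    return winner(s) is not None or all(c is not None for c in s)

def utility(s):
    # Utility(s): +1 if x wins, -1 if o wins, 0 for a draw.
    # Only defined for terminal states.
    return {'x': 1, 'o': -1, None: 0}[winner(s)]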
[Figure: the first plies of the tic-tac-toe game tree; each edge is an action/result step (9 moves from the empty board, then 9×8 boards), and different move orders can reach the same board via redundant paths.]
The state space size (the number of possible boards) is much smaller than 3^9 = 19,683 states (each of the 9 squares is x, o, or empty).
[Figure: the game tree as AND-OR search. Max nodes represent OR search; Min nodes represent AND search. The minimax values (MV) of the leaves are propagated upward using a bottom-up strategy: min at Min nodes, max at Max nodes.]
Objective: Find a subtree that has only win leaf nodes (utility +1). We can abandon a subtree as soon as we find a single loss (utility -1).
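The bottom-up minimax-value computation is only a few lines of code; the sketch below reuses the tic-tac-toe interface from above (x is the Max player):

def minimax_value(s):
    # MV of a terminal state is its utility; otherwise Max (x) takes
    # the highest child MV (OR search) and Min (o) the lowest (AND search).
    if terminal(s):
        return utility(s)
    values = [minimax_value(result(s, a)) for a in actions(s)]
    return max(values) if player(s) == 'x' else min(values)

def minimax_decision(s):
    # Pick the action whose resulting state has the best MV for the mover.
    best = max if player(s) == 'x' else min
    return best(actions(s), key=lambda a: minimax_value(result(s, a)))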
Exercise: Simple 2-Ply Game
[Figure: a two-ply game tree. Max chooses among moves a1, a2, a3 at the root; Min replies with a1, a2, a3 at each child node. Determine the minimax values (MV) bottom-up.]
Space complexity: O(bm)
Time complexity: O(b^m)
• A fast solution is only feasible for very simple games with a small branching factor!
• Example: Tic-tac-toe
  b = 9, m = 9 → O(9^9) = O(387,420,489)
  Since b decreases from 9 to 8, 7, …, the actual tree is smaller: 9 + 9×8 + 9×8×7 + ⋯ + 9! = 986,409 nodes.
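This node count can be checked with a couple of lines of Python (depth k of the tree has 9·8·⋯·(9−k+1) = 9!/(9−k)! nodes):

import math

total = sum(math.factorial(9) // math.factorial(9 - k) for k in range(1, 10))
print(total)  # 986409 nodes, excluding the root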
• Observations:
• min(3, 𝑥, 𝑦) can never be more than 3
• max(5, min(3, 𝑥, 𝑦, … )) does not depend on the values of 𝑥 or 𝑦.
• Minimax search applies alternating min and max.
[Figure: alpha-beta pruning on the two-ply tree. After the first Min subtree evaluates to v = 3, the Max root's value lies in [3, +∞]. In the second Min subtree, v ≤ 2 after its first leaf: utility cannot be more than 2 in that subtree, but we can already get 3 from the first subtree, so the rest of it is pruned. Once a subtree is fully evaluated, its interval has length 0 (α = β).]
Alpha-beta search = minimax search + pruning
• Move ordering for DFS: check good moves for Min and Max first.
[Figure: the two-ply exercise tree revisited; with alpha-beta, examining the moves a1, a2, a3 in a good order lets more branches be pruned.]
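A sketch of alpha-beta pruning as a small modification of the minimax sketch above (same hypothetical game interface; x is Max):

import math

def alphabeta_value(s, alpha=-math.inf, beta=math.inf):
    # alpha: best value Max can guarantee so far; beta: best for Min.
    if terminal(s):
        return utility(s)
    if player(s) == 'x':                 # Max node
        v = -math.inf
        for a in actions(s):             # good move ordering prunes more
            v = max(v, alphabeta_value(result(s, a), alpha, beta))
            alpha = max(alpha, v)
            if alpha >= beta:            # Min will never let play reach here
                break                    # prune the remaining siblings
        return v
    else:                                # Min node
        v = math.inf
        for a in actions(s):
            v = min(v, alphabeta_value(result(s, a), alpha, beta))
            beta = min(beta, v)
            if alpha >= beta:            # Max already has a better option
                break
        return v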
Evaluation Functions
Examples:
1. A weighted linear function
   Eval(s) = w1·f1(s) + w2·f2(s) + ⋯ + wn·fn(s)
   where fi is a feature of the state (e.g., the number of pieces captured in chess).
2. A deep neural network trained on complete games.
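A sketch of a weighted linear evaluation function for the tic-tac-toe interface above; the two features and the weights are made-up illustrations, not from the slides:

def f_lines(s):
    # Feature 1: lines still open for x minus lines still open for o.
    x_open = sum(1 for line in LINES if all(s[i] != 'o' for i in line))
    o_open = sum(1 for line in LINES if all(s[i] != 'x' for i in line))
    return x_open - o_open

def f_center(s):
    # Feature 2: who holds the center square (+1 for x, -1 for o).
    return {'x': 1, 'o': -1, None: 0}[s[4]]

def eval_fn(s, w=(0.1, 0.2)):
    # Eval(s) = w1*f1(s) + w2*f2(s); weights chosen arbitrarily so the
    # estimate stays roughly within the true utility range [-1, +1].
    return w[0] * f_lines(s) + w[1] * f_center(s)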
Heuristic Alpha-Beta Tree Search: Cutting Off Search
[Figure: the search tree is cut off at a fixed depth (ply); HMV = heuristic minimax value. At depth 0 (the root), pick the action with the highest HMV.]
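Cutting off search replaces the terminal test with a depth test and Utility with Eval. A compact sketch (alpha-beta pruning omitted for brevity; eval_fn as sketched above):

def hmv(s, depth):
    # Heuristic minimax value: search to a fixed ply, then estimate
    # the value of the cutoff state with the evaluation function.
    if terminal(s):
        return utility(s)
    if depth == 0:
        return eval_fn(s)
    values = [hmv(result(s, a), depth - 1) for a in actions(s)]
    return max(values) if player(s) == 'x' else min(values)

def best_action(s, depth=4):
    # At the root, pick the action with the highest HMV for the mover.
    best = max if player(s) == 'x' else min
    return best(actions(s), key=lambda a: hmv(result(s, a), depth - 1))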
Pure Monte Carlo Search
• Method
1. Simulate 𝑁 playouts from the current state.
2. Select the move that results in the highest win percentage.
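A sketch of this method, again using the tic-tac-toe interface above (the uniformly random playout policy and the even split of playouts across legal moves are my assumptions):

import random

def playout(s):
    # Play uniformly random moves to the end; return the final utility.
    while not terminal(s):
        s = result(s, random.choice(actions(s)))
    return utility(s)

def pure_monte_carlo(s, n_per_move=500):
    # Average playout outcome per move, from the mover's point of view.
    sign = 1 if player(s) == 'x' else -1
    def win_rate(a):
        s2 = result(s, a)
        return sign * sum(playout(s2) for _ in range(n_per_move)) / n_per_move
    return max(actions(s), key=win_rate)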
Issue: Pure Monte Carlo Search spends a lot of time creating playouts for bad moves.
Better: Select the starting state for playouts to focus on important parts of the
game tree (i.e., good moves).
This presents the following tradeoff:
Exploration: perform more playouts from states that currently have no or few playouts.
Exploitation: perform more playouts from states that have produced good results so far.
UCB1(n) = U(n)/N(n) + C · sqrt( log N(Parent(n)) / N(n) )

The first term, U(n)/N(n), is the average utility (= exploitation). The second term is high for nodes with few playouts relative to their parent node (= exploration) and goes to 0 for large N(n).
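In code, with a node object that stores its playout statistics (the Node fields U, N, and parent are assumptions), UCB1 is:

import math

def ucb1(n, C=math.sqrt(2)):
    # UCB1(n) = U(n)/N(n) + C * sqrt(log N(Parent(n)) / N(n)).
    # Unvisited nodes get +infinity so every child is tried at least once.
    if n.N == 0:
        return math.inf
    exploitation = n.U / n.N                             # average utility
    exploration = math.sqrt(math.log(n.parent.N) / n.N)  # shrinks as N(n) grows
    return exploitation + C * exploration

C = sqrt(2) is a common, theoretically motivated choice for the exploration constant.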
Important considerations:
• We can use UCB1 as the selection strategy to decide what
part of the tree we should focus on for the next playout.
This balances exploration and exploitation.
• We typically can only store a small part of the game tree, so
we do not store the complete playout runs.
[Figure: MCTS iterations on a game tree with alternating White and Black moves. Each node stores wins/playouts; selection repeatedly follows the child with the highest UCB1 score; after each playout, the counts along the selected path are updated. After a move is played, search continues in the corresponding subtree, and UCB1 selection favors win percentage more and more as playout counts grow.]
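Putting the pieces together, one full MCTS loop (selection, expansion, simulation, back-propagation) could look like this sketch, building on ucb1 and playout above (the Node class and the win-crediting rule are my assumptions):

import random

class Node:
    def __init__(self, state, parent=None, move=None):
        self.state, self.parent, self.move = state, parent, move
        self.children, self.U, self.N = [], 0.0, 0

def mcts(root_state, iterations=10_000):
    root = Node(root_state)
    for _ in range(iterations):
        # 1. Selection: descend via highest UCB1 score to a leaf.
        node = root
        while node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: add one child per legal move, pick one to simulate.
        if not terminal(node.state):
            node.children = [Node(result(node.state, a), node, a)
                             for a in actions(node.state)]
            node = random.choice(node.children)
        # 3. Simulation: random playout from the selected node.
        outcome = playout(node.state)    # +1 x wins, -1 o wins, 0 draw
        # 4. Back-propagation: update counts along the path to the root.
        while node is not None:
            node.N += 1
            mover = 'o' if player(node.state) == 'x' else 'x'  # who moved in
            if (outcome == 1 and mover == 'x') or (outcome == -1 and mover == 'o'):
                node.U += 1              # credit the win to that player
            node = node.parent
    # Play the most robust move: the root child with the most playouts.
    return max(root.children, key=lambda c: c.N).move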
Stochastic Games: Games With Random Events
• Game includes a “random action” 𝑟 (e.g., dice, dealt cards)
• Add chance nodes that calculate the expected value.
[Figure: Backgammon board; dice rolls are the random events.]
Expectiminimax
• Game includes a “random action” 𝑟 (e.g., dice, dealt cards).
• For chance nodes we calculate the expected minimax value.
Expectiminimax(s) =
  Utility(s)                                            if Terminal(s)
  max_{a ∈ Actions(s)} Expectiminimax(Result(s, a))     if move = Max
  min_{a ∈ Actions(s)} Expectiminimax(Result(s, a))     if move = Min
  Σ_r P(r) · Expectiminimax(Result(s, r))               if move = Chance
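A sketch of this recursion, assuming the game interface also labels whose turn it is and exposes the random outcomes with their probabilities (to_move and chance_outcomes are hypothetical helpers):

def expectiminimax(s):
    if terminal(s):
        return utility(s)
    mover = to_move(s)  # assumed to return 'max', 'min', or 'chance'
    if mover == 'max':
        return max(expectiminimax(result(s, a)) for a in actions(s))
    if mover == 'min':
        return min(expectiminimax(result(s, a)) for a in actions(s))
    # Chance node: probability-weighted average over random outcomes r.
    return sum(p * expectiminimax(result(s, r)) for r, p in chance_outcomes(s))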
• Options:
  • Use the Minimax algorithm. Issue: the search tree size explodes if the number of "random actions" is large (think of drawing cards in poker!), so exact search scales only to tiny problems.
  • Cut off search and approximate Expectiminimax with an evaluation function.
  • Perform Monte Carlo Tree Search.
Nondeterministic actions:
• The opponent is seen as part of an environment with nondeterministic actions. Non-determinism is the result of the unknown moves by the opponent; all possible moves are considered.
Optimal decisions:
• Minimax search and Alpha-Beta pruning, where each player plays optimally to the end of the game.
• Chance nodes and Expectiminimax for stochastic games.