
CS 5/7320 Artificial Intelligence

Adversarial Search and Games
AIMA Chapter 5

Slides by Michael Hahsler, with figures from the AIMA textbook.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. "Reflected Chess pieces" by Adrian Askew.
Games
• Games typically confront the agent with a competitive (adversarial) environment affected by an opponent (strategic environment).
• Games are episodic.
• We will focus on planning for
  • two-player zero-sum games with
  • deterministic game mechanics and
  • perfect information (i.e., a fully observable environment).
• We call the two players:
  1) Max, who tries to maximize his utility.
  2) Min, who tries to minimize Max's utility since it is a zero-sum game.
Definition of a Game
• Definition:
  𝑠0 … the initial state (position, board, hand).
  𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠) … legal moves in state 𝑠.
  𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎) … transition model.
  𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙(𝑠) … test for terminal states.
  𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑠) … utility for player Max for terminal states.
• State space: a graph defined by the initial state and the transition function; it contains all reachable states (e.g., chess positions).
• Game tree: a search tree superimposed on the state space. A complete game tree follows every sequence of moves from the current state to a terminal state (where the game ends).
Example: Tic-tac-toe

𝑠0 … empty board.
𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠) … play any empty square.
𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎) … the player's symbol (x/o) is placed on the chosen empty square.
𝑇𝑒𝑟𝑚𝑖𝑛𝑎𝑙(𝑠) … did a player win, or is the game a draw?
𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑠) … +1 if x wins, -1 if o wins, and 0 for a draw. Utility is only defined for terminal states.

Here player x is Max and player o is Min.
Note: This game still uses a goal-based agent that plans actions to reach a winning terminal state!
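To make this formulation concrete, here is a minimal Python sketch of the tic-tac-toe definition above. This is my own illustration under the stated definitions (not code from the slides or the AIMA repository); the board encoding and the helper names WIN_LINES and to_move are assumptions. The later code sketches in this deck reuse this small interface.

```python
# Minimal sketch of the game definition for tic-tac-toe (assumed encoding):
# the board is a tuple of 9 cells holding 'x', 'o', or None; 'x' (Max) moves first.

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def initial_state():
    return (None,) * 9                       # s0: empty board

def to_move(s):
    return 'x' if s.count('x') == s.count('o') else 'o'

def actions(s):
    return [i for i, cell in enumerate(s) if cell is None]   # play any empty square

def result(s, a):
    board = list(s)
    board[a] = to_move(s)                    # place the mover's symbol
    return tuple(board)

def winner(s):
    for i, j, k in WIN_LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal(s):
    return winner(s) is not None or all(cell is not None for cell in s)

def utility(s):
    # Only meaningful for terminal states: +1 if x (Max) wins, -1 if o wins, 0 for a draw.
    return {None: 0, 'x': +1, 'o': -1}[winner(s)]
```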
Tic-tac-toe: Partial Game Tree

[Figure: partial game tree for tic-tac-toe showing states (nodes), actions (results), and redundant paths; the number of nodes per level grows as 1, 9, 9×8, …]

Note: This game tree has no cycles.

The state space size (number of possible boards) is much smaller than
3⁹ = 19,683 states.

Terminal states have a known utility. However, the complete game tree is much larger than the state space because the same state (board) can be reached in different subtrees (redundant paths). The game tree here is a little smaller than
9 + 9×8 + 9×8×7 + ⋯ + 9! = 986,409 nodes.
Methods for Adversarial Games

Exact Methods
• Model as nondeterministic actions: The opponent is seen as part of an environment with nondeterministic actions. Non-determinism is the result of the unknown moves by the opponent. We consider all possible moves by the opponent.
• Find optimal decisions: Minimax search and Alpha-Beta pruning, where each player plays optimally to the end of the game.

Heuristic Methods (game tree is too large or the search takes too long)
• Heuristic Alpha-Beta Tree Search:
  a. Cut off the game tree and use a heuristic for utility.
  b. Forward Pruning: ignore poor moves.
• Monte Carlo Tree Search: Estimate the utility of a state by simulating complete games and averaging the utility.
Nondeterministic Actions

Recall AND-OR Search from AIMA Chapter 4


Recall: Nondeterministic Actions
For planning, we do not know what the opponent's moves will be. We have already modeled this issue using nondeterministic actions.

The outcome of actions in the environment is nondeterministic, so the transition model needs to describe the uncertainty about the opponent's behavior. Each action consists of the move by the player and all possible (i.e., nondeterministic) responses by the opponent.

Example transition:
𝑅𝑒𝑠𝑢𝑙𝑡𝑠(𝑠1, 𝑎) = {𝑠2, 𝑠4, 𝑠5}
i.e., action 𝑎 in 𝑠1 can lead to one of several states (which is called a belief state of the agent).
Recall: AND-OR DFS Search Algorithm

[Figure: the AND-OR graph search pseudocode from AIMA Chapter 4, annotated. The returned conditional plan corresponds to nested if-then-else statements.]

Annotations on the algorithm:
• OR search: check all possible actions (my moves); don't follow loops.
• AND search: go through all states that can result from the opponent's moves; abandon the subtree if a loss is found.
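Since the pseudocode figure is not reproduced here, the following is a hedged Python sketch of AND-OR depth-first search adapted to a game, assuming the tic-tac-toe interface above. The results(s, a) helper (the set of states the opponent's replies can lead to) and the win-only success criterion follow the annotations, but this is not the textbook's exact algorithm.

```python
def results(s, a):
    # Belief state: all states the opponent's replies to action a can lead to.
    s_after = result(s, a)
    if terminal(s_after):
        return [s_after]
    return [result(s_after, b) for b in actions(s_after)]

def or_search(s, path=()):
    # OR node: we pick one action; don't follow loops.
    if terminal(s):
        return 'win' if utility(s) == +1 else None
    if s in path:
        return None
    for a in actions(s):                     # check all possible actions (my moves)
        if and_search(results(s, a), path + (s,)):
            return a                         # first action of a winning conditional plan
    return None

def and_search(states, path):
    # AND node: every state the opponent's moves can produce must still be winnable;
    # abandon the subtree as soon as one of them cannot be won.
    return all(or_search(s, path) is not None for s in states)
```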
Tic-tac-toe: AND-OR Search

We play MAX and decide on our actions (OR). MIN's actions introduce non-determinism (AND).

[Figure: AND-OR tree for tic-tac-toe with alternating OR (MAX) and AND (MIN) levels at depths (plies) 0-3.]

Pick an action that leads to a subtree with only win leaves.

Objective: Find a subtree that has only win leaf nodes (utility +1). We can abandon a subtree if we find a single loss (utility -1).

We call always playing the best move playing optimally. Since we consider all of the opponent's moves in the AND stage, we also include MIN's best move. This means we consider MIN playing optimally.
Optimal Decisions
Minimax Search and Alpha-Beta Pruning
Idea: Minimax Decision
• Assign each state 𝑠 a minimax value that reflects the utility realized if both players play optimally from 𝑠 to the end of the game:

  𝑀𝑖𝑛𝑖𝑚𝑎𝑥(𝑠) =
    𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑠)                                       if 𝑡𝑒𝑟𝑚𝑖𝑛𝑎𝑙(𝑠)
    max_{𝑎 ∈ 𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠)} 𝑀𝑖𝑛𝑖𝑚𝑎𝑥(𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎))       if 𝑚𝑜𝑣𝑒 = 𝑀𝑎𝑥
    min_{𝑎 ∈ 𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠)} 𝑀𝑖𝑛𝑖𝑚𝑎𝑥(𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎))       if 𝑚𝑜𝑣𝑒 = 𝑀𝑖𝑛

• This is a recursive definition which can be solved from the terminal states backwards.
• The optimal decision for Max is the action that leads to the state with the largest minimax value, that is, the largest possible utility if both players play optimally.
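The recursive definition translates almost directly into code. A minimal sketch under the assumed tic-tac-toe interface from above:

```python
# Minimax search (depth-first, to terminal states). max_value/min_value back up
# minimax values; minimax_decision returns the optimal action for Max.

def max_value(s):
    if terminal(s):
        return utility(s)
    return max(min_value(result(s, a)) for a in actions(s))

def min_value(s):
    if terminal(s):
        return utility(s)
    return min(max_value(result(s, a)) for a in actions(s))

def minimax_decision(s):
    # Assumes it is Max's ('x') turn in state s.
    return max(actions(s), key=lambda a: min_value(result(s, a)))
```

For example, minimax_decision(initial_state()) would return an optimal opening move for x, though it searches the full game tree to do so.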
Minimax Search: Back-up Minimax Values

[Figure: a game tree with alternating max and min levels; each node is annotated with its minimax value (MV). The max levels represent the OR search (find the action that leads to the best value); the min levels represent the AND search.]

Pick the action that leads to the largest MV.

Determine the MVs using a bottom-up strategy:
• Max always picks the action that has the largest value.
• Min always picks the action that has the smallest value.

Approach: Follow the tree to each terminal node and back up the minimax values.

Note: This is just a generalization of the AND-OR Tree Search and returns the first action of the conditional plan.
Exercise: Simple 2-Ply Game

[Figure: a two-ply game tree. Max chooses among actions 𝑎1, 𝑎2, 𝑎3; at each resulting Min node, Min chooses among actions 𝑎1, 𝑎2, 𝑎3. The terminal utilities for Max are, left to right: 2, 0, 5, -5, -2, 7, 5, -7, 4. A row "Utility for Min" is left blank to be filled in.]

• What are the terminal state utilities for Min?
• Compute all MVs (minimax values).
• How do we traverse the game tree? What is the Big-O notation for time and space?
  (b: max branching factor, m: max depth of the tree)
• What is the optimal action for Max?
Issue: Game Tree Size
• Minimax search traverses the complete game tree using DFS!
  Space complexity: 𝑂(𝑏𝑚)
  Time complexity: 𝑂(𝑏^𝑚)
  (b: max branching factor, m: max depth of the tree)

• A fast solution is only feasible for very simple games with a small branching factor!

• Example: Tic-tac-toe
  𝑏 = 9, 𝑚 = 9 → 𝑂(9⁹) = 𝑂(387,420,489)
  Since 𝑏 decreases from 9 to 8, 7, …, the actual size is smaller:
  9 + 9×8 + 9×8×7 + ⋯ + 9! = 986,409 nodes

• We need to reduce the search space! → Game tree pruning

Alpha-Beta Pruning
• Idea: Do not search parts of the tree if they do not make a difference to the outcome.

• Observations:
  • min(3, 𝑥, 𝑦) can never be more than 3.
  • max(5, min(3, 𝑥, 𝑦, …)) does not depend on the values of 𝑥 or 𝑦.
  • Minimax search applies alternating min and max.

• Approach: maintain bounds [𝛼, 𝛽] for the minimax value and prune subtrees (i.e., don't follow actions) that do not affect the current minimax value bounds.
  • Alpha is used by Max and means "𝑀𝑖𝑛𝑖𝑚𝑎𝑥(𝑠) is at least 𝛼."
  • Beta is used by Min and means "𝑀𝑖𝑛𝑖𝑚𝑎𝑥(𝑠) is at most 𝛽."
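A sketch of alpha-beta search under the same assumed interface: 𝛼 tracks the value Max can already guarantee on the current path and 𝛽 the value Min can already guarantee; branches that fall outside [𝛼, 𝛽] are pruned.

```python
import math

def ab_max_value(s, alpha, beta):
    if terminal(s):
        return utility(s)
    v = -math.inf
    for a in actions(s):
        v = max(v, ab_min_value(result(s, a), alpha, beta))
        if v >= beta:          # Min already has a better option elsewhere
            return v           # prune the remaining actions
        alpha = max(alpha, v)
    return v

def ab_min_value(s, alpha, beta):
    if terminal(s):
        return utility(s)
    v = math.inf
    for a in actions(s):
        v = min(v, ab_max_value(result(s, a), alpha, beta))
        if v <= alpha:         # Max already has a better option elsewhere
            return v           # prune the remaining actions
        beta = min(beta, v)
    return v

def alpha_beta_decision(s):
    # Assumes it is Max's turn in state s.
    return max(actions(s),
               key=lambda a: ab_min_value(result(s, a), -math.inf, math.inf))
```

Alpha-beta returns the same decision and value as plain minimax; it only avoids exploring subtrees that cannot change them.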
Example: Alpha-Beta Search

[Figure: a game tree annotated with [𝛼, 𝛽] intervals. Max updates 𝛼 ("the utility is at least 𝛼"); Min updates 𝛽 ("the utility is at most 𝛽"). The first subtree is fully evaluated with 𝑣 = 3, so Max's bound becomes [3, +∞]. In the second subtree, Min finds a move with value 𝑣 = 2, so 𝑣 ≤ 2 there: the utility of that subtree cannot be more than 2, but we can already get 3 from the first subtree, so the rest of that subtree is pruned. Once a subtree is fully evaluated, its interval has length 0 (𝛼 = 𝛽).]

Alpha-Beta search = minimax search + pruning (𝑣 is the backed-up minimax value):
• Found a better action? Abandon a subtree if Max finds an action that has more value than the best-known move Min has in another subtree.
• Found a better action? Abandon a subtree if Min finds an action that has less value than the best-known move Max has in another subtree.
Move Ordering for Alpha-Beta Search
• Idea: Pruning is more effective if good alpha-beta bounds can be found in the first few checked subtrees.

• Move ordering for DFS = check good moves for Min and Max first.

• We need expert knowledge or some heuristic to determine what a good move is.

• Issue: Optimal decision algorithms still scale poorly even when using alpha-beta pruning with move ordering.
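As a tiny illustration (an assumed expert heuristic for tic-tac-toe, not from the slides): prefer the center, then the corners, then the edges, and feed the sorted actions to alpha-beta search so that good bounds are found early.

```python
# Assumed move-ordering heuristic for tic-tac-toe: center > corners > edges.
SQUARE_PRIORITY = {4: 2, 0: 1, 2: 1, 6: 1, 8: 1, 1: 0, 3: 0, 5: 0, 7: 0}

def ordered_actions(s):
    # Use in place of actions(s) inside alpha-beta search (best squares first).
    return sorted(actions(s), key=lambda a: SQUARE_PRIORITY[a], reverse=True)
```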
Exercise: Simple 2-Ply Game

[Figure: a two-ply game tree with an [𝛼, 𝛽] interval at the Max root and at each of the three Min nodes reached by actions 𝑎1, 𝑎2, 𝑎3; each Min node again has actions 𝑎1, 𝑎2, 𝑎3. The terminal utilities for Max are, left to right: 2, -5, 5, 7, 0, 2, 5, -7, -4.]

• Find the [𝛼, 𝛽] intervals for all nodes.
• What part of the tree can be pruned?
• What would be the optimal move ordering?
Heuristic Alpha-Beta Tree Search
Cutting off search
Reduce the search cost by restricting the search depth:
1. Stop the search at a non-terminal node.
2. Use a heuristic evaluation function 𝐸𝑣𝑎𝑙(𝑠) to approximate the utility for that node/state.

Needed properties of the evaluation function:
▪ Fast to compute.
▪ 𝐸𝑣𝑎𝑙(𝑠) ∈ [𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑙𝑜𝑠𝑠), 𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑤𝑖𝑛)]
▪ Correlated with the actual chance of winning (e.g., using features of the state).

Examples:
1. A weighted linear function
   𝐸𝑣𝑎𝑙(𝑠) = 𝑤1𝑓1(𝑠) + 𝑤2𝑓2(𝑠) + ⋯ + 𝑤𝑛𝑓𝑛(𝑠)
   where 𝑓𝑖 is a feature of the state (e.g., # of pieces captured in chess).
2. A deep neural network trained on complete games.
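A sketch of heuristic alpha-beta search with a depth cutoff ("look ahead"). The evaluation function below is an assumed weighted linear feature function for tic-tac-toe (win lines still open for x minus win lines still open for o), included only so the example is self-contained; it is not from the slides.

```python
import math

def eval_state(s):
    # Weighted linear evaluation: f1 = win lines still open for x, f2 = for o.
    open_x = sum(1 for line in WIN_LINES if all(s[i] != 'o' for i in line))
    open_o = sum(1 for line in WIN_LINES if all(s[i] != 'x' for i in line))
    return 0.1 * open_x - 0.1 * open_o       # stays inside (Utility(loss), Utility(win))

def h_alphabeta(s, depth, alpha, beta, maximizing):
    if terminal(s):
        return utility(s)
    if depth == 0:
        return eval_state(s)                 # cut off: approximate with the heuristic
    if maximizing:
        v = -math.inf
        for a in actions(s):
            v = max(v, h_alphabeta(result(s, a), depth - 1, alpha, beta, False))
            if v >= beta:
                return v                     # prune
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for a in actions(s):
            v = min(v, h_alphabeta(result(s, a), depth - 1, alpha, beta, True))
            if v <= alpha:
                return v                     # prune
            beta = min(beta, v)
        return v
```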
Heuristic Alpha-Beta Tree Search: Cutting off search

[Figure: a game tree cut off at depth (ply) 2. HMV = heuristic minimax value. The cut-off nodes at depth 2 are assigned Eval values, the HMVs at depth 1 are backed up from them, and at depth 0 we pick the action with the highest HMV.]

Eval = heuristic to estimate the minimax value/utility of the state.

Cutting the search off at depth 2 is also called searching with a "look ahead" of 2.
Forward pruning

To save time, we can prune moves that appear bad.

There are many ways move quality can be evaluated:
• Low heuristic value.
• Low evaluation value after a shallow search (cut-off search).
• Past experience.

Issue: We may prune important moves.

Heuristic Alpha-Beta Tree Search: Example for Forward Pruning

[Figure: a game tree whose root actions are scored with a shallow cut-off search (depth 2, HMVs backed up from Eval values); actions with a low HMV are pruned (marked x), and complete alpha-beta search is performed only on the remaining actions.]

1. Perform cut-off search.
2. Choose the n best actions using the heuristic minimax value and prune the rest.
3. Explore the chosen actions using regular Alpha-Beta Tree Search with move ordering. (A code sketch follows below.)
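A sketch of the three steps above, reusing h_alphabeta from the previous sketch; the number of kept moves n and the two search depths are illustrative assumptions, not values from the slides.

```python
import math

def forward_pruned_decision(s, n=3, shallow_depth=2, deep_depth=8):
    # Step 1: shallow cut-off search to score every root action.
    scored = [(h_alphabeta(result(s, a), shallow_depth - 1, -math.inf, math.inf, False), a)
              for a in actions(s)]
    # Step 2: keep only the n best actions and prune the rest.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    candidates = [a for _, a in scored[:n]]
    # Step 3: deeper alpha-beta search on the remaining candidates only.
    return max(candidates,
               key=lambda a: h_alphabeta(result(s, a), deep_depth - 1,
                                         -math.inf, math.inf, False))
```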
Monte Carlo Tree Search (MCTS)
Idea
• Approximate 𝑬𝒗𝒂𝒍(𝒔) as the average utility of several simulation runs to a terminal state (called playouts).

• Playout policy: How do we choose moves during the simulation runs? Example playout policies:
  • Random.
  • Heuristics for good moves developed by experts.
  • Learn good moves from self-play (e.g., with deep neural networks). We will talk about this when we talk about "Learning from Examples."

• Typically used for problems with
  • a high branching factor (many possible moves make the tree very wide), or
  • unknown or hard-to-define evaluation functions.
Pure Monte Carlo Search
Find the next best move.

• Method:
  1. Simulate 𝑁 playouts from the current state.
  2. Select the move that results in the highest win percentage.

• Optimality guarantee: Converges to optimal play for stochastic games as 𝑁 increases.

• Typical strategy for 𝑁: Do as many playouts as you can given the available time budget for the move.
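A sketch of pure Monte Carlo search with a random playout policy, reusing the assumed interface; utilities are counted from Max's point of view and it is assumed that Max is to move.

```python
import random

def random_playout(s):
    # Play both sides randomly until a terminal state is reached.
    while not terminal(s):
        s = result(s, random.choice(actions(s)))
    return utility(s)                        # +1 win, 0 draw, -1 loss (for Max)

def pure_monte_carlo_decision(s, n_playouts=100):
    def average_utility(a):
        s_after = result(s, a)
        return sum(random_playout(s_after) for _ in range(n_playouts)) / n_playouts
    return max(actions(s), key=average_utility)
```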
Playout Selection Strategy

[Figure: a partial game tree; Max could start a playout at any of several frontier states. Which one should it choose?]

Issue: Pure Monte Carlo Search spends a lot of time creating playouts for bad moves.
Better: Select the starting state for playouts so that we focus on important parts of the game tree (i.e., good moves).

This presents the following tradeoff:
• Exploration: perform more playouts from states that currently have no or few playouts.
• Exploitation: perform more playouts from states that have done well, to get more accurate estimates.
Selection using Upper Confidence Bounds (UCB1)

𝑈𝐶𝐵1(𝑛) = 𝑈(𝑛)/𝑁(𝑛) + 𝐶 · √( log 𝑁(𝑃𝑎𝑟𝑒𝑛𝑡(𝑛)) / 𝑁(𝑛) )

where
• 𝑛 … node in the game tree
• 𝑈(𝑛) … total utility of all playouts going through node 𝑛
• 𝑁(𝑛) … number of playouts through 𝑛
• 𝐶 … tradeoff constant ≈ 2 (can be optimized using experiments)

The first term is the average utility (= exploitation). The second term is high for nodes with few playouts relative to the parent node (= exploration) and goes to 0 for large 𝑁(𝑛).

Selection strategy: Select the node with the highest UCB1 score.
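The same score as a Python function. The node fields (U, N, parent, children) are my assumption; a matching Node class appears in the MCTS sketch further below. The default C = 1.4 is an arbitrary starting point that, as the slide notes, should be tuned experimentally.

```python
import math

def ucb1(node, C=1.4):
    if node.N == 0:
        return math.inf                      # always try unvisited nodes first
    exploitation = node.U / node.N           # average utility of playouts through node
    exploration = C * math.sqrt(math.log(node.parent.N) / node.N)
    return exploitation + exploration

def select_child(node):
    # Selection strategy: descend to the child with the highest UCB1 score.
    return max(node.children, key=ucb1)
```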


Monte Carlo Tree Search (MCTS)
Pure Monte Carlo search always starts playouts from a given state. Monte Carlo Tree Search builds a partial game tree and can start playouts from any state (node) in that tree.

Important considerations:
• We can use UCB1 as the selection strategy to decide what part of the tree we should focus on for the next playout. This balances exploration and exploitation.
• We typically can only store a small part of the game tree, so we do not store the complete playout runs.
[Figure: MCTS on a partial game tree whose levels alternate between White and Black; each node is labeled with its wins/playouts counts. Select the leaf with the highest UCB1 score (UCB1 selection favors the win percentage more and more as the playout counts grow), run a playout from it, and update the counts along the path back to the root. Note: the simulation path itself is not recorded, to preserve memory!]
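Below is a compact, hedged sketch of the full MCTS loop (selection with UCB1, expansion, random simulation, back-propagation of the counts), reusing the game interface, random_playout, and select_child defined above. Storing each node's utility from the viewpoint of the player who moved into it is my own simplification for this two-player zero-sum setting.

```python
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []
        self.untried = [] if terminal(state) else list(actions(state))
        self.U, self.N = 0.0, 0              # total playout utility and playout count

def mcts_decision(root_state, n_iterations=1000):
    root = Node(root_state)
    for _ in range(n_iterations):
        node = root
        # 1. Selection: follow the highest UCB1 scores while fully expanded.
        while not node.untried and node.children:
            node = select_child(node)
        # 2. Expansion: add one new child node if possible.
        if node.untried:
            a = node.untried.pop(random.randrange(len(node.untried)))
            child = Node(result(node.state, a), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node (path itself not stored).
        value = random_playout(node.state)   # utility for Max: +1 / 0 / -1
        # 4. Back-propagation: update counts along the path back to the root,
        #    crediting each node from the viewpoint of the player who moved into it.
        while node is not None:
            node.N += 1
            node.U += value if to_move(node.state) == 'o' else -value
            node = node.parent
    # Play the most-explored move (a common, robust choice).
    best = max(root.children, key=lambda c: c.N)
    return next(a for a in actions(root_state) if result(root_state, a) == best.state)
```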
Online Play Using MCTS
• Search and update a partial tree to use up the time budget for the move.
• Keep the relevant subtree from move to move and expand from there.

[Figure: a partial game tree with wins/playouts counts on alternating White and Black levels. Do the move with the highest playout count; after the move, keep the relevant subtree and continue to explore/exploit from it.]
Stochastic Games
Games With Random Events
Stochastic Games
• The game includes a "random action" 𝑟 (e.g., dice, dealt cards).
• Add chance nodes that calculate the expected value.

Example: Backgammon.
Expectiminimax
• The game includes a "random action" 𝑟 (e.g., dice, dealt cards).
• For chance nodes we calculate the expected minimax value.

  𝐸𝑥𝑝𝑒𝑐𝑡𝑖𝑚𝑖𝑛𝑖𝑚𝑎𝑥(𝑠) =
    𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑠)                                              if 𝑡𝑒𝑟𝑚𝑖𝑛𝑎𝑙(𝑠)
    max_{𝑎 ∈ 𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠)} 𝐸𝑥𝑝𝑒𝑐𝑡𝑖𝑚𝑖𝑛𝑖𝑚𝑎𝑥(𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎))        if 𝑚𝑜𝑣𝑒 = 𝑀𝑎𝑥
    min_{𝑎 ∈ 𝐴𝑐𝑡𝑖𝑜𝑛𝑠(𝑠)} 𝐸𝑥𝑝𝑒𝑐𝑡𝑖𝑚𝑖𝑛𝑖𝑚𝑎𝑥(𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑎))        if 𝑚𝑜𝑣𝑒 = 𝑀𝑖𝑛
    Σ_𝑟 𝑃(𝑟) 𝐸𝑥𝑝𝑒𝑐𝑡𝑖𝑚𝑖𝑛𝑖𝑚𝑎𝑥(𝑅𝑒𝑠𝑢𝑙𝑡(𝑠, 𝑟))                    if 𝑚𝑜𝑣𝑒 = 𝐶ℎ𝑎𝑛𝑐𝑒

• Options:
  • Use the Minimax algorithm. Issue: this scales only for tiny problems! The search tree size explodes if the number of "random actions" is large. Think of drawing cards for poker!
  • Cut off the search and approximate Expectiminimax with an evaluation function.
  • Perform Monte Carlo Tree Search.
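A hedged sketch of Expectiminimax. The interface here is hypothetical: node_type(s) classifies a state as a 'max', 'min', or 'chance' node, and chance_outcomes(s) yields (probability, resulting state) pairs for the random action 𝑟; terminal, utility, actions, and result are as before.

```python
def expectiminimax(s):
    if terminal(s):
        return utility(s)
    kind = node_type(s)                      # hypothetical: 'max', 'min', or 'chance'
    if kind == 'max':
        return max(expectiminimax(result(s, a)) for a in actions(s))
    if kind == 'min':
        return min(expectiminimax(result(s, a)) for a in actions(s))
    # Chance node: expected value over the random outcomes r with probability P(r).
    return sum(p * expectiminimax(s_next) for p, s_next in chance_outcomes(s))
```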
Conclusion

Nondeterministic actions:
• The opponent is seen as part of an environment with nondeterministic actions. Non-determinism is the result of the unknown moves by the opponent. All possible moves are considered.

Optimal decisions:
• Minimax search and Alpha-Beta pruning, where each player plays optimally to the end of the game.
• Chance nodes and Expectiminimax for stochastic games.

Heuristic Alpha-Beta Tree Search:
• Cut off the game tree and use a heuristic evaluation function for utility (based on state features).
• Forward Pruning: ignore poor moves.
• State of the art: learn the heuristic from data using MCTS.

Monte Carlo Tree Search:
• Simulate complete games and calculate the proportion of wins.
• Use modified UCB1 scores to expand the partial game tree.
• Learn the playout policy using self-play and deep learning.
