IA c06 NoAnim
(some slides adapted from A. Dragan, N. Kitaev, N. Lambert, S. Levine, S. Rao, S. Russell)
Outline
Adversarial Search
Minimax algorithm
Examples
I games: chess, Go, poker
I economics: actors can increase demand, supply, etc.
Two-player zero-sum games
Note on terminology
I move = action
I ply = one move by one player
I position = state
I players: MAX, MIN (MAX moves first, then alternate)
Formal definition of a game
Elements
I S0 : initial state (game setup at start)
I TO-MOVE(s): The player whose turn it is to move in state s
I ACTIONS(s): The set of legal moves in state s
I RESULT(s,a): The transition model (defines the state
resulting from taking action a in state s)
I IS-TERMINAL(s): A test which is true when in a terminal
state (the game is over) and false otherwise
I UTILITY(s,p): A utility function (aka objective function, aka
payoff function), defining the final numeric value to player p
when the game ends in terminal state s
E.g., chess: 1, 0, or 1/2 (and chess is a zero-sum game!)
Tic-Tac-Toe Game tree
Minimax values
MINIMAX(s) =
  UTILITY(s, MAX)  if IS-TERMINAL(s)
  max_{a∈ACTIONS(s)} MINIMAX(RESULT(s, a))  if TO-MOVE(s) = MAX
  min_{a∈ACTIONS(s)} MINIMAX(RESULT(s, a))  if TO-MOVE(s) = MIN
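The recursion above can be sketched directly in Python. The explicit game tree, its leaf values, and the `minimax` helper below are illustrative assumptions, not from the slides:

```python
# Minimax over an explicit game tree: a terminal node is a number
# (the utility for MAX), an internal node is a list of child nodes.
# Players alternate, MAX moving first, as in the slides.

def minimax(state, to_move="MAX"):
    if isinstance(state, (int, float)):   # IS-TERMINAL: utility for MAX
        return state
    next_player = "MIN" if to_move == "MAX" else "MAX"
    values = [minimax(child, next_player) for child in state]
    return max(values) if to_move == "MAX" else min(values)

# Three MIN branches with MIN-values 3, 2, 2; MAX picks the first.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # 3
```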
Minimax and optimality
Iterative deepening
I search one ply deep and record a ranking of the moves
based on their evaluations
I then search one ply deeper, using the previous ranking to
inform move ordering; and so on
This also makes it easy to respect time limits: when time runs
out, return the best move from the last completed iteration
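A minimal sketch of this control loop, assuming a placeholder `evaluate(move, depth)` that stands in for a real fixed-depth search:

```python
import time

def iterative_deepening(moves, evaluate, time_budget=1.0, max_depth=8):
    """Search depth 1, 2, 3, ... until time runs out (or max_depth),
    reordering the root moves by the previous iteration's scores."""
    deadline = time.monotonic() + time_budget
    ordering = list(moves)
    best = ordering[0]
    for depth in range(1, max_depth + 1):
        if time.monotonic() >= deadline:
            break                          # keep the last completed result
        # evaluate(move, depth) stands in for a fixed-depth minimax search
        scored = sorted(((evaluate(m, depth), m) for m in ordering),
                        reverse=True)
        ordering = [m for _, m in scored]  # best moves first next iteration
        best = ordering[0]
    return best
```

With a toy evaluator that prefers moves close to 5, `iterative_deepening([1, 5, 9], lambda m, d: -abs(m - 5))` returns 5.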
Killer moves
Killer moves = moves that proved best elsewhere at the same
depth (e.g., caused a beta cutoff).
These moves should be tried first
Effectiveness of alpha-beta pruning
Transpositions
Different permutations of a move sequence that end up in the
same position
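Alpha-beta pruning with a simple transposition table can be sketched as below. The explicit-tree representation (terminals are numbers, internal nodes are tuples of children) is an illustrative assumption, and a real table would also record whether a stored value is exact or only a bound:

```python
# Alpha-beta with a transposition table. Here the (hashable) state
# itself serves as the table key; a real engine would use a position
# hash (e.g., Zobrist) and store exact/bound flags with each value.

def alphabeta(state, alpha=float("-inf"), beta=float("inf"),
              maximizing=True, table=None):
    if table is None:
        table = {}
    if isinstance(state, (int, float)):    # terminal: utility for MAX
        return state
    if state in table:                     # transposition: seen before
        return table[state]
    if maximizing:
        value = float("-inf")
        for child in state:
            value = max(value, alphabeta(child, alpha, beta, False, table))
            alpha = max(alpha, value)
            if alpha >= beta:              # beta cutoff: MIN avoids this line
                break
    else:
        value = float("inf")
        for child in state:
            value = min(value, alphabeta(child, alpha, beta, True, table))
            beta = min(beta, value)
            if alpha >= beta:              # alpha cutoff
                break
    table[state] = value
    return value

tree = ((3, 12, 8), (2, 4, 6), (14, 5, 2))
print(alphabeta(tree))  # 3, as with plain minimax, but with fewer visits
```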
Types of strategies
I Type A: consider all possible moves to a certain depth,
then use a heuristic evaluation function to estimate the
utility of states at that depth. It explores a wide but shallow
portion of the tree.
I Type B: ignore moves that look bad and follow promising
lines "as far as possible". It explores a deep but narrow
portion of the tree.
Heuristic Alpha-Beta Tree Search
Terminal states:
EVAL(s,p) = UTILITY(s,p)
Non-terminal states:
UTILITY(loss,p) ≤ EVAL(s,p) ≤ UTILITY(win,p)
Notes:
I the weights w_i of the evaluation function can be estimated
via Machine Learning
I the correlation between evaluation and chances of winning
need not be linear: if s is twice as likely to win as s',
we only require EVAL(s) > EVAL(s'),
not EVAL(s) = 2*EVAL(s')
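The w_i here are typically the weights of a weighted linear evaluation, EVAL(s) = Σ_i w_i · f_i(s) — a standard form, assumed here. A toy material-count sketch (piece values and features are assumptions for illustration):

```python
# Weighted linear evaluation: EVAL(s) = sum_i w_i * f_i(s).
# Features f_i are material differences (White count minus Black count);
# the weights are the textbook piece values, an assumption.
WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def evaluate(features):
    """features[piece] = (white count) - (black count)."""
    return sum(WEIGHTS[piece] * diff for piece, diff in features.items())

# White up a rook, Black up two pawns: 5 - 2 = +3 for White.
print(evaluate({"rook": 1, "pawn": -2}))  # 3
```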
Cutting off search
Quiescent positions
Apply the evaluation function only to quiescent positions:
positions with no pending move (e.g., capturing the queen)
that might swing the evaluation
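A common sketch of quiescence search: at the depth cutoff, instead of calling EVAL immediately, keep expanding only the "noisy" moves (captures) until the position is quiet. `evaluate` and `captures` are hypothetical placeholders for a real evaluator and capture generator:

```python
def quiescence(state, evaluate, captures, maximizing=True):
    """Extend the search through capture moves only; either side may
    also 'stand pat', i.e. accept the static evaluation as-is."""
    stand_pat = evaluate(state)            # value if we stop here
    best = stand_pat
    for child in captures(state):          # only pending tactical moves
        value = quiescence(child, evaluate, captures, not maximizing)
        best = max(best, value) if maximizing else min(best, value)
    return best

# Toy example: position 10 has two captures leading to values 4 and 7.
caps = {10: [4, 7]}
print(quiescence(10, lambda s: s, lambda s: caps.get(s, []), True))   # 10
print(quiescence(10, lambda s: s, lambda s: caps.get(s, []), False))  # 4
```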
Cutting off search
Quiescence
In (b), Black is ahead by K+2P, but the queen capture will
change this
Cutting off search
Horizon effect
The black bishop is doomed, but pawn sacrifices can push the
loss over the horizon (→ so the search considers them good)
Cutting off search
Horizon effect
Chess
I Branching factor: 35 → 35^5 ≈ 5 ∗ 10^7
I minimax search: 5 ply, not more → average human player
I alpha–beta search + large transposition table → 14 ply
(expert level)
I for grandmaster status, we need: an extensively tuned
evaluation function + a large database of endgame moves
I STOCKFISH: all of the above → 30 ply (> the ability of any
human player)
Search v. lookup