
Artificial Intelligence

Radu Răzvan Slăvescu

Technical University of Cluj-Napoca


Department of Computer Science

(some slides adapted from A. Dragan, N. Kitaev, N. Lambert, S. Levine, S. Rao, S. Russell)
Outline

Adversarial Search

Minimax algorithm

Alpha-beta pruning

Cutting off Search


Competitive environments

What is a competitive environment?


One in which two or more agents have conflicting goals
(and each of them can take actions in its own interest)

Examples
- games: chess, Go, poker
- economy: actors can increase demand, supply etc.
Two-player zero-sum games

Common characteristics of such games

- deterministic
- two-player
- turn-taking
- perfect information: we can see all moves and the environment
- zero-sum: if player 1 gains amount a, player 2 loses a

Note on terminology
- move = action
- ply = one move by one player
- position = state
- players: MAX, MIN (MAX moves first, then they alternate)
Formal definition of a game

Elements
- S0: initial state (game setup at start)
- TO-MOVE(s): the player whose turn it is to move in state s
- ACTIONS(s): the set of legal moves in state s
- RESULT(s,a): the transition model (defines the state resulting from taking action a in state s)
- IS-TERMINAL(s): a test which is true when s is a terminal state (the game is over) and false otherwise
- UTILITY(s,p): a utility function (aka objective function, aka payoff function), defining the final numeric value to player p when the game ends in terminal state s
E.g., chess: 1, 0, or 1/2 (and it is a zero-sum game!)
Tic-Tac-Toe Game tree

MAX: X; MIN: O; number on leaves: utility values for MAX


Game trees and values

Minimax values

∆: MAX’s turn (aims to maximize utility); ∇: MIN’s turn


Numbers on leaves: MAX's utility values
Numbers on non-leaves: minimax values
Best move sequence: at A, MAX plays a1; at B, MIN replies b1
Game trees and values
Minimax values

- complete depth-first exploration of the game tree
- time: O(b^m) (m = tree depth, b = branching factor)
- space: O(bm) if all actions are generated at once
- approximations needed for practical use
Game trees and values
Minimax values

MINIMAX(s) =
  UTILITY(s, MAX)                                 if IS-TERMINAL(s)
  max_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))      if TO-MOVE(s) = MAX
  min_{a ∈ ACTIONS(s)} MINIMAX(RESULT(s, a))      if TO-MOVE(s) = MIN
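The recursion above translates almost directly into code. Below is a minimal Python sketch, assuming a hypothetical game object exposing the interface from the formal definition (to_move, actions, result, is_terminal, utility); an illustration, not a production implementation.

def minimax(game, s):
    # minimax value of state s = utility from MAX's point of view
    if game.is_terminal(s):
        return game.utility(s, "MAX")
    values = [minimax(game, game.result(s, a)) for a in game.actions(s)]
    # MAX picks the highest backed-up value, MIN the lowest
    return max(values) if game.to_move(s) == "MAX" else min(values)

def minimax_decision(game, s):
    # the move at s that optimizes the mover's minimax value
    best = max if game.to_move(s) == "MAX" else min
    return best(game.actions(s), key=lambda a: minimax(game, game.result(s, a)))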

Minimax and optimality

Non-optimal choice for MIN


What if MIN does not play optimally?
MAX will do at least as well as against an optimal player,
possibly better.

Non-optimal choice for MAX

Can MAX take a risk? Suppose optimal play leads to a draw, while a
suboptimal choice for MAX can lead to a situation where MIN has 5 options,
4 leading to its defeat and 1 to its victory, and MAX believes that MIN
does not have the resources to find the best option. Then the risky move
may be worth playing.
General Principle for alpha-beta pruning
Bounds for the values in the path

α and β get updated, and branches at a node are pruned (no
more recursive calls) as soon as the value of the current node
is known to be worse than the current α (for MAX) or β (for MIN)
General Principle for alpha-beta pruning
General case

The player could move to n, but has a better choice (m or m′). He
will never move to n; once he knows enough about n (by
examining some of its descendants) to reach this conclusion,
he can prune it (it has no impact on the outcome).
General Principle for alpha-beta pruning
Bounds for the values in the path

- α = the value of the best (i.e., highest-value) choice so far at any choice point along the path for MAX. α = "at least"
- β = the value of the best (i.e., lowest-value) choice so far at any choice point along the path for MIN. β = "at most"
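A minimal Python sketch of these bounds in action, using the same hypothetical game interface as the minimax sketch above; α and β are maintained exactly as described, and a branch is abandoned as soon as its value falls outside the window.

import math

def alphabeta(game, s, alpha=-math.inf, beta=math.inf):
    if game.is_terminal(s):
        return game.utility(s, "MAX")
    if game.to_move(s) == "MAX":
        v = -math.inf
        for a in game.actions(s):
            v = max(v, alphabeta(game, game.result(s, a), alpha, beta))
            if v >= beta:              # MIN above already has "at most" beta
                return v               # beta cutoff: prune remaining actions
            alpha = max(alpha, v)      # raise MAX's "at least" bound
        return v
    else:
        v = math.inf
        for a in game.actions(s):
            v = min(v, alphabeta(game, game.result(s, a), alpha, beta))
            if v <= alpha:             # MAX above already has "at least" alpha
                return v               # alpha cutoff
            beta = min(beta, v)        # lower MIN's "at most" bound
        return v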
Effectiveness of alpha-beta pruning

Performance with perfect pruning

O(b^{m/2}) vs. O(b^m)

A simple ordering function for exploring moves


1. captures
2. threats
3. forward moves
4. backward moves

Dynamic move-ordering schemes

Try first the moves that were best in the past → gets close to the theoretical limit
Effectiveness of alpha-beta pruning

Iterative deepening
- search one ply deep and record the ranking of the moves based on their evaluations
- search one ply deeper, using the previous ranking to inform move ordering; and so on
This also makes it easy to respect time limits

Killer moves
Killer moves = moves that proved best earlier in the search (e.g., moves that caused a beta cutoff).
These moves should be tried first (see the sketch below)
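The following sketch combines the two ideas, assuming a hypothetical depth_limited_value(game, s, d) (e.g., the alpha-beta sketch above stopped after d plies) and hashable moves; all names are illustrative.

def order_moves(game, s, previous_ranking):
    # try first the moves the previous (shallower) iteration liked best;
    # moves not seen before default to 0
    return sorted(game.actions(s),
                  key=lambda a: previous_ranking.get(a, 0), reverse=True)

def iterative_deepening(game, s, max_depth):
    ranking = {}
    for d in range(1, max_depth + 1):      # 1 ply, then 2, ... (easy to stop on a timer)
        ranking = {a: depth_limited_value(game, game.result(s, a), d - 1)
                   for a in order_moves(game, s, ranking)}
    return max(ranking, key=ranking.get)   # best root move for MAX

# killer-move bookkeeping: remember moves that caused a beta cutoff at a
# given depth, so sibling positions can try them first
killers = {}                               # depth -> set of killer moves
def record_killer(depth, move):
    killers.setdefault(depth, set()).add(move)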
Effectiveness of alpha-beta pruning

Transpositions
Permutations of a move sequence ending up in the same
position

E.g., sequence [w1, b1, w2, b2] leads to state s; by exploring the
tree under s, we get its backed-up value and cache it.

If we ever get sequence [w2, b2, w1, b1], we know it also leads
to s. We look up the value rather than recomputing it.
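A sketch of this caching, wrapped around the hypothetical depth_limited_value from the previous sketches; game.key(s) stands for some canonical, hashable encoding of the position (a real engine would use Zobrist hashing).

transposition_table = {}   # (position key, remaining depth) -> backed-up value

def value_with_tt(game, s, d):
    key = (game.key(s), d)
    if key not in transposition_table:          # first time we reach s at depth d
        transposition_table[key] = depth_limited_value(game, s, d)
    return transposition_table[key]             # otherwise: just look it up

Note that under alpha-beta the backed-up value depends on the (α, β) window, so a real table must also record whether the cached value is exact or only a lower/upper bound.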
Effectiveness of alpha-beta pruning

Types of strategies
- Type A: consider all possible moves to a certain depth, then use a heuristic evaluation function to estimate the utility of states at that depth. It explores a wide but shallow portion of the tree.
- Type B: ignore moves that look bad and follow promising lines "as far as possible". It explores a deep but narrow portion of the tree.
Heuristic Alpha-Beta Tree Search

Cutting off search


Limited time → cut off search and apply a heuristic evaluation
function to states

EVAL replaces UTILITY


H-MINIMAX(s, d) =
  EVAL(s, MAX)                                          if IS-CUTOFF(s, d)
  max_{a ∈ ACTIONS(s)} H-MINIMAX(RESULT(s, a), d+1)     if TO-MOVE(s) = MAX
  min_{a ∈ ACTIONS(s)} H-MINIMAX(RESULT(s, a), d+1)     if TO-MOVE(s) = MIN
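The definition transcribes directly; the sketch below assumes hypothetical hooks game.is_cutoff(s, d) and game.eval(s, p) for IS-CUTOFF and EVAL, plus the game interface used in the earlier sketches.

def h_minimax(game, s, d):
    if game.is_cutoff(s, d):
        return game.eval(s, "MAX")     # heuristic estimate instead of UTILITY
    children = [h_minimax(game, game.result(s, a), d + 1)
                for a in game.actions(s)]
    return max(children) if game.to_move(s) == "MAX" else min(children)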

Evaluation functions

Heuristic function EVAL(s, p)


Returns an estimate of the expected utility of state s to player p

Terminal states:
EVAL(s,p) = UTILITY(s,p)
Non-terminal states:
UTILITY(loss,p) ≤ EVAL(s,p) ≤ UTILITY(win,p)

Desirable properties for evaluation functions

- can be computed fast
- strongly correlated with the actual chance of winning
Evaluation functions

Building evaluation functions


Usually based on features (e.g., how many pawns, queens
etc.), defining categories (equivalence classes) (e.g., a
category is ”all endgames with 2 pawns vs. 1 pawn”)

Experience says 82% of the states in this category lead to a
win (utility 1), 2% to a loss (0), and 16% to a draw (1/2).
The evaluation for states in the category is the expected value:
(0.82 × 1) + (0.02 × 0) + (0.16 × 1/2) = 0.90

However, estimating such probabilities reliably requires too much experience (too many recorded games)

Evaluation functions

Building evaluation functions


Combine features:
- material piece value estimation: P=1, B=3, Q=9
- "good pawn structure" and "king safety" = 1/2

We can use a linear combination of features:

EVAL(s) = Σ_{i=1}^{n} w_i · f_i(s), with the w_i normalized
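As a small illustration of the weighted sum, with made-up feature values and the material weights from this slide (the feature set and the numbers are illustrative only):

def eval_linear(features, weights):
    # EVAL(s) = sum_i w_i * f_i(s), for features already extracted from s
    return sum(w * f for w, f in zip(weights, features))

# features: (pawn diff, bishop diff, queen diff, pawn structure, king safety)
print(eval_linear((2, 0, 1, 1, 0), (1, 3, 9, 0.5, 0.5)))   # -> 11.5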
Evaluation functions

Building evaluation functions


EVAL(s) = Σ_{i=1}^{n} w_i · f_i(s), with the w_i normalized

Notes:
- the w_i can be estimated via Machine Learning
- the correlation between the function and the chances of winning is not necessarily linear: if s is twice as likely to win as s′, we only require EVAL(s) > EVAL(s′), not necessarily EVAL(s) = 2 · EVAL(s′)
Cutting off search

When to cut off the search?

Cut off at a fixed depth limit d, BUT:

Quiescent positions
Apply the evaluation function only to quiescent positions, i.e., positions
where there is no pending move (e.g., capturing the queen) which
might swing the evaluation
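A sketch of how the cutoff test can respect quiescence, assuming a hypothetical game.pending_captures(s) that lists the capture moves available in s:

def is_quiescent(game, s):
    # quiet = no pending capture that could swing the evaluation
    return not game.pending_captures(s)

def is_cutoff(game, s, d, depth_limit):
    # stop at the depth limit only in quiet positions; keep searching
    # forcing (capture) lines past it
    return game.is_terminal(s) or (d >= depth_limit and is_quiescent(game, s))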
Cutting off search

Quiescence

In (b), Black is ahead by K+2P in material, but the pending queen capture will change this
Cutting off search

Horizon effect

Damage that is unavoidable, but which can be delayed for a while.

The bishop is doomed, but pawn sacrifices can push its loss over the
search horizon (→ so they are mistakenly considered good moves)
Cutting off search

Horizon effect

Mitigation: singular extensions, i.e., moves "clearly better" than all the
alternatives are searched even beyond the cutoff depth.
E.g., the rook moves h1 → a1 → a2 are clearly better, so they are given a
chance to extend the search.
Forward Pruning

Forward pruning as a Type B strategy


Prune moves that appear poor, even if they might prove good.
- Beam search: on each ply, consider only the "top n" best moves (according to the evaluation function)
- Late move reduction: reduce the search depth for the moves in the last part of the list of possible moves
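A sketch of beam search over moves, reusing the hypothetical game.eval hook from earlier; the discarded moves are never searched at all, which is exactly the risk of a Type B strategy.

def beam(game, s, n):
    # rank moves by the static evaluation of their resulting positions
    moves = sorted(game.actions(s),
                   key=lambda a: game.eval(game.result(s, a), "MAX"),
                   reverse=(game.to_move(s) == "MAX"))
    return moves[:n]       # keep only the top-n moves on this ply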
Performance of techniques on chess

Chess
- branching factor: 35 → 35^5 ≈ 5 × 10^7 positions at depth 5
- minimax search: 5 ply, not more → average human player
- alpha-beta search + large transposition table → 14 ply (expert level)
- for grandmaster status, we need: an extensively tuned evaluation function + a large database of endgame moves
- STOCKFISH: all of the above → 30 ply (beyond the ability of any human player)
Search v. lookup

Reusing chess openings


Typically, rely on human experience for the first 10–15 moves
(then reach a rare position and switch to search)

Near the end of the game (in chess)

Computers can completely solve the endgame → a policy
mapping each state to the best move in it; store it in a lookup table
Search v. lookup

Retrograde minimax search for building the KBNK table

1. start with all possible positions
2. mark the ones where White wins
3. generate the ones from which White gets to the winning positions no matter what Black does
4. mark them as White wins
5. repeat → perfect lookup table for KBNK
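A sketch of this retrograde loop, with hypothetical helpers all_positions(), is_immediate_white_win(p), white_to_move(p), predecessors(p), and successors(p); drawing rules and symmetry reductions are ignored.

def build_tablebase():
    wins = {p for p in all_positions() if is_immediate_white_win(p)}  # step 2
    frontier = set(wins)
    while frontier:                                                   # step 5
        candidates = {q for p in frontier for q in predecessors(p)}
        new_wins = set()
        for q in candidates - wins:
            if white_to_move(q):
                # White to move: a win if SOME move reaches a won position
                if any(r in wins for r in successors(q)):
                    new_wins.add(q)
            # Black to move: a win only if EVERY reply leads to a won position
            elif all(r in wins for r in successors(q)):
                new_wins.add(q)
        wins |= new_wins                                              # steps 3-4
        frontier = new_wins
    return wins                                                       # the lookup table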

Where are we now?

Endings for up to 7 pieces have been solved (over 400 × 10^12 positions)
Endings for up to 8 pieces would need 40 × 10^15 positions
That’s all, folks!

Thanks for your attention... Questions?
