L4 Optimization
Scheduling McGill’s Final Exams
• It’s not so hard to find a solution (though it might be
terrible).
Overview
Uninformed search
• Assumes no knowledge about the problem.
• BFS, DFS, Iterative deepening
Informed search
• Use knowledge about the problem, in the form of a heuristic.
• Best-first search, heuristic search, A* (and extensions)
Search for optimization problems:
• Search over large-dimensional spaces
• Iterative improvement algorithms:
1. Hill climbing Today!
2. Simulated annealing
3. Parallelism and beam search
4. Genetic algorithms
Search and Optimization
• Search so far:
• Finding the solution with the minimal cost, where the cost is the
sum of edge weights (e.g., A* search)
• Today:
• Case where solution cost is some arbitrary function (Eval(X))
• Want to find best solution (X*) – optimization problem
Optimization problems are everywhere
Scheduling
• Given: a set of tasks to be completed, with durations and mutual
constraints (e.g. task ordering, joint resources).
• Goal: generate the shortest schedule (an assignment of start times to tasks).
Digital circuit layout
• Given: a board, components and connections.
• Goal: place each component on the board so as to maximize
energy efficiency, minimize production cost, …
User customization
• Given: customers described by characteristics (age, gender, location,
etc.) and previous purchases.
• Goal: find a function from characteristics to products that maximizes
expected gain.
Types of search for optimization problems
1. Constructive methods: Start from scratch and build up a solution.
• This is the type of method we have seen so far.
2. Iterative improvement (local search): Start with a complete candidate solution and repeatedly modify it to improve it.
• This is the type of method we consider today.
Characteristics
• An optimization problem is described by:
• A set of states (= configurations)
• An evaluation function
• For interesting optimization problems, the state space is
too big to enumerate all states, or the evaluation function
is too expensive to compute for all states.
• In iterative improvement, a state is a candidate solution,
not a description of the world!
• It might be a partial or incorrect solution; if so, this should be reflected
in the evaluation function.
Travelling Salesman Problem (TSP)
• Given: a set of vertices and the distances between pairs of vertices.
• Goal: construct the shortest path that visits every vertex
exactly once (a tour).
• e.g., consider these seven cities on a map
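For concreteness, the evaluation function here is just the total length of the tour. A minimal sketch, using a made-up 4-city distance matrix (the distances are invented for illustration):

```python
# Evaluate a TSP tour: sum of distances along the cycle.
# The 4-city distance matrix below is made up for illustration.
dist = [
    [0, 10, 15, 20],
    [10, 0, 35, 25],
    [15, 35, 0, 30],
    [20, 25, 30, 0],
]

def tour_length(tour, dist):
    """Total length of a tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

print(tour_length([0, 1, 3, 2], dist))  # 10 + 25 + 30 + 15 = 80
```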
Visualizing iterative improvement
• Intuition: Consider all possible solutions laid out on a
landscape. We want to find the highest (or lowest) point.
A generic local search algorithm
• Start from an initial configuration X0.
• Repeat until satisfied:
• Generate the set of neighbours of Xi and evaluate them.
• Select one of the neighbours, Xi+1.
• The selected neighbor becomes the current configuration.
Important questions:
1. How do we choose the set of neighbours to consider?
2. How do we select one of the neighbours?
• Defining the set of neighbours is a design choice (like
choosing the heuristic for A*) and has crucial impact on
performance.
What moves should we consider?
• Case 1: Search for high ground
• Start with initial state = random position.
• Move to an adjacent position.
• Terminate when goal is reached.
Hill climbing
• Also called greedy local search
• In continuous state space, related to gradient ascent
Start from an initial configuration X0 with value E(X0):
let X ← X0 and E ← E(X0).
Repeat until satisfied:
Generate the set of neighbours Xi of X and their values E(Xi).
Let Emax = maxi E(Xi) be the value of the best successor,
i* = argmaxi E(Xi) be the index of the best successor.
If Emax ≤ E:
return X (we are at a local optimum)
else:
let X ← Xi* and E ← Emax (take a greedy step)
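The pseudocode above can be sketched in Python as follows (the toy problem and neighbour function are made up for illustration):

```python
def hill_climb(x0, neighbours, evaluate):
    """Greedy local search: move to the best neighbour until none improves.
    `neighbours(x)` returns candidate successors; `evaluate(x)` is E(x)."""
    x, e = x0, evaluate(x0)
    while True:
        succ = neighbours(x)
        if not succ:
            return x, e
        best = max(succ, key=evaluate)
        e_max = evaluate(best)
        if e_max <= e:          # no neighbour is strictly better
            return x, e         # we are at a (local) optimum
        x, e = best, e_max      # take a greedy step

# Toy example (made up for illustration): maximize E(x) = -(x - 3)^2
# over the integers, with neighbours x-1 and x+1.
x_star, e_star = hill_climb(
    x0=0,
    neighbours=lambda x: [x - 1, x + 1],
    evaluate=lambda x: -(x - 3) ** 2,
)
print(x_star, e_star)  # 3 0
```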
Properties of hill climbing
• Very popular in AI:
• Trivial to program!
• Requires no memory of where we’ve been (no backtracking).
• Can handle very large problems.
Local vs. Global Optimum
• Global optimum: The optimal point over the full space of
possible configurations.
• Local optimum: A point that is better than all of its neighbours,
but not necessarily better than every other configuration.
Example: TSP
Example: TSP swapping 2 nodes
Example: TSP swapping 3 nodes
Problems with hill climbing
• Can get stuck in a local maximum or in a plateau
[Figure: objective function over the state space, showing the global maximum, a local maximum, a “flat” local maximum, a shoulder, and the current state.]
• Relies heavily on having a good evaluation function.
Improvements to hill climbing
• Quick fix:
• When stuck in a plateau or local maximum, use random re-starts.
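A minimal sketch of random restarts, assuming a simple greedy climber and a made-up toy landscape with one local and one global maximum:

```python
import random

def climb(x, neighbours, evaluate):
    """One greedy ascent from x (minimal inline hill climber)."""
    while True:
        best = max(neighbours(x), key=evaluate)
        if evaluate(best) <= evaluate(x):
            return x
        x = best

def random_restart(n_restarts, random_state, neighbours, evaluate):
    """Run hill climbing from several random starts; keep the best result."""
    results = [climb(random_state(), neighbours, evaluate)
               for _ in range(n_restarts)]
    return max(results, key=evaluate)

# Toy landscape (made up): local maximum at x=0 (E=5), global at x=10 (E=6),
# with a valley in between, over the integers 0..10.
E = {0: 5, 1: 4, 2: 3, 3: 2, 4: 1, 5: 0, 6: 1, 7: 2, 8: 3, 9: 4, 10: 6}
random.seed(0)
best = random_restart(
    n_restarts=5,
    random_state=lambda: random.randrange(11),
    neighbours=lambda x: [y for y in (x - 1, x + 1) if y in E],
    evaluate=lambda x: E[x],
)
print(best, E[best])  # best of the restarts (0 is the local max, 10 the global)
```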
Simulated annealing
Similar to hill climbing, but:
• allows some “bad moves” to try to escape local maxima.
• decrease size and frequency of “bad moves” over time.
Algorithm:
• Start from an initial configuration X0; let X ← X0 and E ← E(X0).
• Keep track of the best solution so far: Xi* ← X0 and Emax ← E(X0).
• Repeat until satisfied:
• Let Xi be a random neighbour of X, with value Ei = E(Xi).
• If Ei > Emax, let Xi* ← Xi and Emax ← Ei (we found a new better sol’n).
• If Ei > E, then X ← Xi and E ← Ei (accept the improving move).
• Else, with some probability p, still accept the move: X ← Xi and E ← Ei.
• Return Xi*.
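The loop above can be sketched in Python. The geometric cooling schedule and the Boltzmann acceptance probability exp((Ei − E)/T) below are common choices, assumed here rather than prescribed by the slide:

```python
import math
import random

def simulated_annealing(x0, neighbour, evaluate,
                        t0=1.0, cooling=0.995, steps=5000):
    """Simulated annealing sketch. Accepts worse moves with probability
    exp((E_i - E) / T); T decays geometrically (an assumed schedule)."""
    x, e = x0, evaluate(x0)
    best_x, best_e = x, e
    t = t0
    for _ in range(steps):
        xi = neighbour(x)
        ei = evaluate(xi)
        if ei > best_e:                      # track the best solution seen
            best_x, best_e = xi, ei
        if ei > e or random.random() < math.exp((ei - e) / t):
            x, e = xi, ei                    # accept (possibly bad) move
        t *= cooling                         # cool down
    return best_x, best_e

# Toy example (made up): maximize a bumpy 1-D function.
random.seed(0)
f = lambda x: -(x ** 2) + 10 * math.cos(x)
x_best, e_best = simulated_annealing(
    x0=8.0,
    neighbour=lambda x: x + random.uniform(-0.5, 0.5),
    evaluate=f,
)
print(round(x_best, 2), round(e_best, 2))
```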
What value should we use for p?
• Many possible choices:
• A given fixed value.
• A value that decays to 0 over time.
• A value that decays to 0, and gives similar chance to “similarly
bad” moves.
• A value that depends on how much worse the bad move is.
What value should we use for p?
• If the new value Ei is better than the old value E, always move to Xi.
• Otherwise, accept the move with a probability that depends on how much
worse the move is, e.g. p = exp((Ei − E)/T), where T > 0 is a temperature
parameter.
Properties of simulated annealing
• What happens when T is high?
• Algorithm is in an exploratory phase (even bad moves have a high
chance of being picked).
• What happens when T is low?
• Algorithm is in an exploitation phase (the “bad” moves have very
low probability).
Properties of simulated annealing
• If T decreases slowly enough, simulated annealing is
guaranteed to reach the optimal solution (i.e., find the
global maximum).
• But it may take an infinite number of moves! This result is
not practically useful.
Example
TSP example: Searching configurations
TSP example: Energy
Simulated annealing in practice
• Very useful algorithm, used to solve hard optimization
problems.
• E.g. Protein design, scheduling large transportation fleets.
• The temperature annealing schedule is crucial (design
choice!)
• Cool too fast: converge to a sub-optimal solution.
• Cool too slowly: convergence takes very long.
Mitigating the local optimum problem
• Even simulated annealing can get stuck in local maxima!
• More strategies to find a good solution:
• Parallel search
• Beam search
• Genetic algorithms
Parallel search
• Run many separate searches (hill climbing or simulated
annealing) in parallel.
Local beam search
• Similar to parallel search: run k instances of local search or simulated
annealing at the same time.
• But the searches are not independent: at each step, pool the successors
of all k current states and keep only the k best.
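A sketch of the pooling idea, on a made-up toy problem; the key difference from parallel search is the single shared selection of the k best candidates:

```python
import heapq

def beam_search(starts, neighbours, evaluate, k=3, iters=20):
    """Local beam search sketch: from the pooled successors of all k current
    states, keep the k best (so candidates compete across searches)."""
    beam = heapq.nlargest(k, starts, key=evaluate)
    for _ in range(iters):
        pool = set(beam)
        for x in beam:
            pool.update(neighbours(x))
        new_beam = heapq.nlargest(k, pool, key=evaluate)
        if new_beam == beam:      # no improvement anywhere
            break
        beam = new_beam
    return beam[0], evaluate(beam[0])

# Toy example (made up): maximize E(x) = -(x - 7)^2 over the integers.
x_best, e_best = beam_search(
    starts=[0, 20, -5],
    neighbours=lambda x: [x - 1, x + 1],
    evaluate=lambda x: -(x - 7) ** 2,
)
print(x_best, e_best)  # 7 0
```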
Local beam search schematic
Evolutionary computing
• Refers generally to computational procedures patterned
after biological evolution
• Many solutions (individuals) exist in parallel
• Nature looks for the best individual (i.e., the fittest)
Genetic algorithms
• A candidate solution is called an individual.
• In a traveling salesman problem, an individual is a tour
• Each individual has a fitness
• fitness = numerical value proportional to quality of that solution
• A set of individuals is called a population.
• Populations change over generations, by applying operations
to individuals.
• operations = {mutation, crossover, selection}
• Individuals with higher fitness are more likely to survive &
reproduce.
• Individual typically represented by a binary string:
• allows operations to be carried out easily.
Mutation
• A way to generate desirable features that are not present in
the original population by injecting random change.
• Typically, mutation just means changing a 0 to a 1 (and vice versa).
• The mutation rate controls the probability of a mutation occurring.
• We can allow mutation in all individuals, or just in the
offspring.
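A minimal bit-flip mutation sketch (the rate values below are illustrative):

```python
import random

def mutate(individual, rate=0.01):
    """Bit-flip mutation: each bit flips (0 <-> 1) independently with
    probability `rate` (the mutation rate)."""
    return "".join(
        ("1" if bit == "0" else "0") if random.random() < rate else bit
        for bit in individual
    )

random.seed(1)
print(mutate("1010101", rate=0.5))  # roughly half the bits flipped
```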
Crossover
• Combine parts of individuals to create new individuals
• Single-point crossover:
• Choose a crossover point, cut individuals there, swap the pieces.
E.g. 101|0101 → 101|1110
011|1110 → 011|0101
• Implementation:
• Use a crossover mask, which is a binary string
E.g. mask = 1110000
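A sketch of mask-based crossover; the mask "1110000" reproduces the single-point example above:

```python
def crossover(parent1, parent2, mask):
    """Crossover via a binary mask: where the mask is 1 the first child takes
    from parent1, where it is 0 from parent2 (and vice versa for child 2)."""
    child1 = "".join(a if m == "1" else b
                     for a, b, m in zip(parent1, parent2, mask))
    child2 = "".join(b if m == "1" else a
                     for a, b, m in zip(parent1, parent2, mask))
    return child1, child2

# The slide's example: single-point crossover after the 3rd bit.
print(crossover("1010101", "0111110", mask="1110000"))
# ('1011110', '0110101')
```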
Encoding operators as binary masks
Typical genetic algorithm
GA(Fitness, threshold, p, r, m)
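Only the signature survives on this slide, so the following is a hedged reconstruction: `p` is assumed to be the population size, `r` the fraction of the population replaced by crossover each generation, `m` the mutation rate, and `threshold` the stopping fitness. A sketch on a toy one-max problem (maximize the number of 1 bits):

```python
import random

def ga(fitness, threshold, p, r, m, length=8, generations=200):
    """Sketch of GA(Fitness, threshold, p, r, m). Parameter meanings are
    assumed: p = population size, r = crossover fraction, m = mutation rate,
    threshold = fitness at which to stop."""
    pop = ["".join(random.choice("01") for _ in range(length))
           for _ in range(p)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) >= threshold:
            break
        survivors = pop[: int((1 - r) * p)]         # fittest survive unchanged
        children = []
        while len(survivors) + len(children) < p:   # fill the rest by crossover
            a, b = random.sample(pop[: p // 2], 2)  # parents from fitter half
            cut = random.randrange(1, length)       # single-point crossover
            children.append(a[:cut] + b[cut:])
        pop = survivors + children
        pop = [
            "".join(c if random.random() >= m else "10"[int(c)] for c in x)
            for x in pop
        ]  # bit-flip mutation with rate m, applied to all individuals
    return max(pop, key=fitness)

# Toy "one-max" problem: fitness = number of 1 bits.
random.seed(0)
best = ga(fitness=lambda x: x.count("1"), threshold=8, p=20, r=0.6, m=0.02)
print(best, best.count("1"))
```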
Selection: Survival of the fittest
• As in natural evolution, fittest individuals are more likely to
survive.
• Several ways to implement this idea:
1. Fitness proportionate selection:
Probability of selection is proportional to fitness.
Can lead to crowding (multiple copies of the same individual being propagated).
2. Tournament selection:
Pick i, j at random with uniform probability. With prob p select the fitter
one. Only requires comparing two individuals.
3. Rank selection:
Sort all hypotheses by fitness. Probability of selection is proportional to
rank.
4. Softmax (Boltzmann) selection:
Probability of selecting individual i is proportional to exp(Fitness(i)/T),
for a temperature parameter T.
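Tournament selection (option 2 above) is the simplest to sketch; the population and fitness function below are made up for illustration:

```python
import random

def tournament_select(population, fitness, p=0.8):
    """Tournament selection: pick two individuals uniformly at random and
    select the fitter one with probability p (else the less fit one)."""
    i, j = random.sample(population, 2)
    fitter, weaker = (i, j) if fitness(i) >= fitness(j) else (j, i)
    return fitter if random.random() < p else weaker

# Toy population of bit strings, fitness = number of 1 bits.
random.seed(0)
pop = ["0000", "0011", "0111", "1111"]
picks = [tournament_select(pop, lambda x: x.count("1")) for _ in range(1000)]
print(picks.count("1111") / 1000)  # the fittest individual is picked most often
```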
Genetic algorithms as search
• States: possible solutions
• Search operators: mutation, crossover, selection
• Relation to previous search algorithms:
• Parallel search, since several solutions are maintained in parallel
• Hill-climbing on the fitness function, but without following the
gradient
• Mutation and crossover should allow us to get out of local optima.
• Closely related to simulated annealing.
Example: Solving TSP with a GA
• Each individual is a tour.
• Mutation swaps a pair of edges (many other operations are possible
and have been tried in the literature).
• Crossover cuts the parents in two and swaps the pieces. Reject
any invalid offspring.
• Fitness is the (negated) length of the tour, so shorter tours are fitter.
• Note that the GA operations (crossover and mutation) described here
are fancier than the simple binary examples given before.
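The edge-swap mutation can be sketched as a 2-opt-style segment reversal, which always yields a valid tour:

```python
import random

def two_opt_mutate(tour):
    """Edge-swap ("2-opt") mutation for TSP: remove two edges and reconnect
    by reversing the segment between them; the result is always a valid tour."""
    i, j = sorted(random.sample(range(len(tour)), 2))
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

random.seed(0)
tour = [0, 1, 2, 3, 4, 5, 6]
mutated = two_opt_mutate(tour)
print(mutated)                          # same cities, one segment reversed
assert sorted(mutated) == sorted(tour)  # still visits every city exactly once
```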
Example: Solving TSP with a GA
TSP example: Initial generation
TSP example: Generation 15
TSP example: Generation 30
The good and bad of GAs
• Good:
• Intuitively appealing, due to evolution analogy.
• If tuned right, can be very effective (finds a good solution in few steps).
• Bad:
• Performance depends crucially on the problem encoding. Good
encodings are difficult to find!
• Many parameters to tweak! Bad parameter settings can result in very
slow progress, or in the algorithm getting stuck in local optima.
• If the mutation rate is too low, you can get overcrowding (many copies
of identical individuals in the population).
Summary
• Optimization problems are widespread and important.
• It is typically infeasible to enumerate all candidate solutions.
• Goal is to get a reasonable (not necessarily optimal) solution.