1 Greedy
Analysis of Algorithms
Lecture 2
Greedy Algorithms
Graph-Theoretic Formulation:
Node = Computer
Edge = Pair of computers
Edge Cost(u,v) = Distance(u,v)
The heaviest edge in a cycle is the maximum cost edge in the cycle
Property 2. The heaviest edge in a cycle never belongs to an MST (assuming that edge is strictly heavier than every other edge in the cycle)
X = { }
While there is a cut (S, V\S) s.t. X has no edges across it:
    X = X + {e}, where e is the lightest edge across (S, V\S)
X = { }
For each edge e in increasing order of weight:
    If the end-points of e lie in different components in X,
        Add e to X
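The edge-by-edge loop above (commonly known as Kruskal's algorithm) can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: the function name `kruskal`, the `(weight, u, v)` edge format, and the small example graph are assumptions made here, and components are tracked with a bare parent array.

```python
def kruskal(num_nodes, edges):
    """Return the edge set X built by scanning edges in increasing weight.

    edges is a list of (weight, u, v) tuples over nodes 0..num_nodes-1
    (a hypothetical input format chosen for this sketch).
    """
    parent = list(range(num_nodes))  # parent[x] == x marks a component root

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    chosen = []
    for weight, u, v in sorted(edges):      # increasing order of weight
        root_u, root_v = find(u), find(v)
        if root_u != root_v:                # end-points in different components
            parent[root_u] = root_v         # merge the two components
            chosen.append((weight, u, v))
    return chosen


# Illustrative 4-node graph (made up for this example).
edges = [(1, 0, 1), (2, 1, 2), (2, 0, 2), (3, 2, 3), (4, 1, 3)]
tree = kruskal(4, edges)
```

On a connected graph with n nodes the returned list has n - 1 edges.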
Interval I_k = [k+1, k+2, ..., 2^k]    #intervals = log* n
Break up 1..n into intervals I_k = [k+1, k+2, ..., 2^k]
Charging Scheme: If rank[x] is in I_k, set t(x) = 2^k
Total time on m find operations <= m log* n + Σ_x t(x)
Therefore, we need to estimate Σ_x t(x)
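To make the interval structure concrete, the sketch below lists the intervals [k+1, 2^k] that cover 1..n and looks up the charge 2^k for a given rank. The helper names `rank_intervals` and `charge`, and the choice to start from k = 0, are assumptions of this illustration.

```python
def rank_intervals(n):
    """List the intervals (k+1, 2^k) covering 1..n, starting from k = 0."""
    intervals = []
    k = 0
    while k < n:
        upper = 2 ** k
        intervals.append((k + 1, upper))
        k = upper                      # the next interval starts right after 2^k
    return intervals


def charge(rank, intervals):
    """Return t(x) = 2^k for the interval I_k that contains the given rank."""
    for lower, upper in intervals:
        if lower <= rank <= upper:
            return upper
    raise ValueError("rank is outside every interval")


small = rank_intervals(16)
large = rank_intervals(65536)
```

Each interval's upper end is 2 raised to the previous upper end, so the number of intervals grows extremely slowly with n, in line with the log* n count.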
The Union-Find Data Structure
Property 1: If x is not a root, then rank[p[x]] > rank[x]
Property 2: For root x, if rank[x] = k, then subtree at x has size >= 2^k
Property 3: There are at most n/2^k nodes of rank k
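A minimal union-find sketch with union by rank and path compression is shown below, so the rank properties can be checked on a small instance. The class and method names are assumptions of this example; the lecture's own implementation may differ in detail.

```python
class UnionFind:
    """Disjoint sets over 0..n-1 with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))   # p[x]; a root is its own parent
        self.rank = [0] * n

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, x, y):
        """Merge the sets of x and y; return False if already merged."""
        root_x, root_y = self.find(x), self.find(y)
        if root_x == root_y:
            return False
        if self.rank[root_x] < self.rank[root_y]:
            root_x, root_y = root_y, root_x
        self.parent[root_y] = root_x            # lower rank goes under higher
        if self.rank[root_x] == self.rank[root_y]:
            self.rank[root_x] += 1              # rank grows only on ties
        return True


uf = UnionFind(8)
for a, b in [(0, 1), (2, 3), (0, 2), (4, 5), (6, 7), (4, 6), (0, 4)]:
    uf.union(a, b)
root = uf.find(7)
```

In this run the final root has rank 3 and its tree holds all 8 = 2^3 nodes, matching Property 2 with equality.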
Prim’s Algorithm
X = { }, S = {r}
Repeat until S has n nodes:
    Pick the lightest edge e in the cut (S, V - S)
    Add e to X
    Add v, the end-point of e in V - S, to S
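One way to realize the loop above is with a binary heap of candidate cut edges, as in the hedged sketch below. The adjacency format (`node -> list of (weight, neighbor)`), the function name `prim`, and the example graph are assumptions of this illustration; stale heap entries are skipped lazily rather than updated in place.

```python
import heapq


def prim(adj, root):
    """Grow S from root, always adding the lightest edge leaving S.

    adj maps each node to a list of (weight, neighbor) pairs
    (a hypothetical input format chosen for this sketch).
    """
    in_s = {root}
    chosen = []                                   # the edge set X
    heap = [(w, root, v) for w, v in adj[root]]   # candidate edges out of S
    heapq.heapify(heap)
    while heap and len(in_s) < len(adj):
        weight, u, v = heapq.heappop(heap)
        if v in in_s:
            continue                              # edge no longer crosses the cut
        in_s.add(v)
        chosen.append((u, v, weight))
        for next_weight, x in adj[v]:
            if x not in in_s:
                heapq.heappush(heap, (next_weight, v, x))
    return chosen


# Illustrative 4-node graph (made up for this example).
adj = {
    0: [(1, 1), (2, 2)],
    1: [(1, 0), (2, 2), (4, 3)],
    2: [(2, 1), (2, 0), (3, 3)],
    3: [(3, 2), (4, 1)],
}
tree = prim(adj, 0)
```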
[Figure sequence: key(x) values at successive steps]
Procedure:
    Initialize: each node is a cluster
    Until we have one cluster:
        Pick two closest clusters C, C*
        Merge S = C U C*
Distance between two clusters:
    d(C, C*) = min over x in C, y in C* of d(x, y)
Can you recognize this algorithm?
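The procedure can be tried out directly with the deliberately naive sketch below. The function name `closest_cluster_merging`, the `target` parameter (stop early at a chosen number of clusters instead of one), and the 1-D example points are assumptions added for illustration.

```python
from itertools import combinations


def closest_cluster_merging(points, dist, target=1):
    """Repeatedly merge the two closest clusters until `target` remain."""
    clusters = [[p] for p in points]          # each node starts as a cluster
    while len(clusters) > target:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # d(C, C*) = smallest distance between a point of C and one of C*
            gap = min(dist(x, y) for x in clusters[i] for y in clusters[j])
            if best is None or gap < best[0]:
                best = (gap, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge S = C U C*
        del clusters[j]                           # j > i, so index i is unaffected
    return clusters


points = [0, 1, 2, 10, 11, 20]
result = closest_cluster_merging(points, lambda x, y: abs(x - y), target=3)
```

Recomputing every pairwise cluster distance on each round is slow, but it keeps the code a line-for-line match with the procedure.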
S1    a a a b b b b b    Cache Contents
      b b c c c a a a
E     - - b a - c - -    Evicted items
S1    a a c c c b b b    Cache Contents
      b b b b b c a a
E     - - a - - - c -    Evicted items
M     a b c b c b a a    Requests
S1    a a a b b b b b    Non-reduced
      b b c c c a a a
S2    a a a b b b b b    Reduced
      b b c c c c a a
Caching: FF Schedules
Theorem: Suppose a reduced schedule S_j makes the same decisions as S_FF from t=1 to t=j. Then there exists a reduced schedule S_{j+1} s.t.:
1. S_{j+1} makes the same decisions as S_FF from t=1 to t=j+1
2. #fetches(S_{j+1}) <= #fetches(S_j)

S_j       a, b    a, b
S_FF      a, b    a, b
S_{j+1}   a, b    a, b

S_j       a, b    a, c
S_FF      a, b    a, c
S_{j+1}   a, b    a, c
Case 2: Cache miss at t=j+1, S_j and S_FF evict the same item. S_{j+1} = S_j
S_j       c, b    d, c
S_FF      a, c    ...
S_{j+1}   a, c    d, c

S_j       c, b    c, a
S_FF      a, c    ...
S_{j+1}   a, c    c, a

S_j       c, b    d, b
S_FF      a, c    ...
S_{j+1}   a, c    d, b
Case 3d: Cache miss at t=j+1. S_j evicts a, S_FF evicts b. S_{j+1} also evicts b.
Next there is a request to b. This cannot happen: S_FF evicts the item needed farthest in the future, so a is accessed before b!
Summary: Optimal Caching
Suppose you claim a magic schedule S_M makes fewer fetches than S_FF.
Then, we can construct a sequence of schedules:
S_M = S_0, S_1, S_2, ..., S_n = S_FF such that:
(1) S_j agrees with S_FF from t=1 to t=j
(2) #fetches(S_{j+1}) <= #fetches(S_j)
Chaining (2) gives #fetches(S_FF) <= #fetches(S_M), contradicting the claim.
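The farthest-in-future rule behind S_FF can be sketched as below. The function names and the choice to start from an empty cache (so the first requests also count as fetches) are assumptions of this example; the traces on the slides start with a and b already cached.

```python
def next_use(requests, start, item):
    """Index of the first request for item at or after start, else infinity."""
    for t in range(start, len(requests)):
        if requests[t] == item:
            return t
    return float("inf")


def farthest_in_future(requests, cache_size):
    """Serve requests, evicting the item whose next request is farthest away.

    Returns (number of fetches, per-step evicted item or None).
    Assumes the cache starts empty.
    """
    cache, evicted, fetches = [], [], 0
    for t, item in enumerate(requests):
        victim = None
        if item not in cache:                      # cache miss
            fetches += 1
            if len(cache) < cache_size:
                cache.append(item)                 # free slot, nothing evicted
            else:
                victim = max(cache, key=lambda c: next_use(requests, t + 1, c))
                cache[cache.index(victim)] = item  # replace the victim in place
        evicted.append(victim)
    return fetches, evicted


fetches, evicted = farthest_in_future("abcbcbaa", 2)
```

On the request sequence a b c b c b a a with two cache slots, the first eviction (at the request for c) removes a, because b is needed again sooner.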
Approximation Algorithms
• k-Center
• Set Cover
Metric Space:
Point set with distance function d
Properties of d:
• d(x, y) >= 0
• d(x, y) = d(y, x)
• d(x, y) <= d(x, z) + d(y, z)
k-Center: NP-hard in general
Greedy Algorithm: Furthest-first traversal
1. Pick C = {x}, for an arbitrary point x
2. Repeat until C has k centers:
       Let y maximize d(y, C), where d(y, C) = min over x in C of d(x, y)
       C = C U {y}
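The two steps above translate almost directly into code. The sketch below is an illustration under assumptions: the function name `farthest_first`, the `start` index used as the arbitrary first point, and the 1-D example are not from the lecture, and k is assumed to be at most the number of points.

```python
def farthest_first(points, k, dist, start=0):
    """Pick k centers by furthest-first traversal.

    Returns (centers, r) where r = max over x of d(x, C).
    """
    centers = [points[start]]                       # step 1: arbitrary first center
    # to_c[i] holds d(points[i], C) for the current center set C.
    to_c = [dist(p, centers[0]) for p in points]
    while len(centers) < k:                         # step 2
        far = max(range(len(points)), key=lambda i: to_c[i])
        centers.append(points[far])                 # C = C U {y}
        to_c = [min(d, dist(p, points[far])) for d, p in zip(to_c, points)]
    return centers, max(to_c)


points = [0, 1, 2, 10, 11, 20]
centers, r = farthest_first(points, 3, lambda x, y: abs(x - y))
```

Keeping the running distances `to_c` means each new center costs one pass over the points, for roughly n times k distance evaluations overall.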
Furthest-first (FF) Traversal
Theorem: Approx. ratio of FF-traversal is 2
For a set S, d(x, S) = min over y in S of d(x, y)
Define, for any instance: r = max over x of d(x, C), where C is the set of k centers picked by FF-traversal
Property 1. Solution value of FF-traversal = r
Property 2. There is a set S of at least k+1 points s.t. each pair has distance >= r (the k centers plus a point at distance r from C)
Property 3. Any k-center solution must assign at least two points x, y in S to the same center c
What is max(d(x, c), d(y, c))? From the triangle inequality,
    d(x, c) + d(y, c) >= d(x, y)
    max(d(x, c), d(y, c)) >= d(x, y)/2 >= r/2
Property 4. Any other solution has value >= r/2
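The factor-2 guarantee can be checked numerically on a small instance by comparing FF-traversal against brute force. This is only a sanity check under assumptions: the helper names and the example points are made up here, and the brute-force optimum is restricted to centers chosen among the input points.

```python
from itertools import combinations
from math import dist


def radius(points, centers):
    """Solution value: the largest distance from a point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)


def ff_centers(points, k):
    """Furthest-first traversal starting from the first point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    return centers


def best_radius(points, k):
    """Brute force over all k-subsets of the points (tiny inputs only)."""
    return min(radius(points, subset) for subset in combinations(points, k))


points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (10, 0), (10, 1)]
ff_value = radius(points, ff_centers(points, 3))
opt_value = best_radius(points, 3)
```

The brute-force search enumerates every k-subset, so it is only practical for a handful of points.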
Applications:
• Facility-location problems
• Clustering
• Initialization step in clustering problems, e.g., k-means++
Greedy Approximation Algorithms
• k-Center
• Set Cover
Set Cover Problem
Given:
• Universe U with n elements
• Collection C of sets of elements of U
Find the smallest subset C* of C that covers all of U
NP-hard in general
Applications
A Greedy Set-Cover Algorithm
C* = { }
Repeat until all of U is covered:
    Pick the set S in C with highest # of uncovered elements
    Add S to C*
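A short sketch of the greedy set-cover rule follows. The function name `greedy_set_cover`, the representation of C as a list of Python sets, and the small instance are assumptions of this example; the instance is chosen so that greedy uses one set more than the optimum.

```python
def greedy_set_cover(universe, collection):
    """Repeatedly pick the set covering the most still-uncovered elements."""
    uncovered = set(universe)
    chosen = []                                   # C*
    while uncovered:
        best = max(collection, key=lambda s: len(uncovered & s))
        if not uncovered & best:
            raise ValueError("collection does not cover the universe")
        chosen.append(best)
        uncovered -= best
    return chosen


# Illustrative instance (made up): the optimum uses the last two sets only.
universe = {1, 2, 3, 4, 5, 6}
collection = [{1, 2, 3, 4}, {1, 3, 5}, {2, 4, 6}]
cover = greedy_set_cover(universe, collection)
```

Greedy first takes {1, 2, 3, 4} because it covers four elements, and then needs both remaining sets to reach 5 and 6.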
Greedy: #sets=7
OPT: #sets=5
Greedy Set-Cover Algorithm
Theorem: If optimal set cover has k sets, then greedy selects <= k ln n sets
Greedy Algorithm:
C* = { }
Repeat until U is covered:
    Pick S in C with highest # of uncovered elements
    Add S to C*
Define:
n(t) = #uncovered elements after step t in greedy, with n(0) = n
Property 1: There is some S that covers at least n(t)/k of the uncovered elements (the k sets of the optimal cover together cover all of them)
Property 2: n(t+1) <= n(t)(1 - 1/k)
Property 3: n(T) <= n(1 - 1/k)^T < 1, when T = k ln n
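The step from Property 2 to Property 3 can be written out as follows. This is a sketch of the standard calculation, using n(0) = n and the inequality 1 - x < e^{-x} for x > 0.

```latex
\begin{align*}
n(t+1) &\le n(t) - \frac{n(t)}{k} = n(t)\left(1 - \frac{1}{k}\right), \\
n(T)   &\le n\left(1 - \frac{1}{k}\right)^{T}
        < n\, e^{-T/k} = n\, e^{-\ln n} = 1
        \qquad \text{for } T = k \ln n .
\end{align*}
```

Since n(T) is a non-negative integer smaller than 1, no element is left uncovered after T steps.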