1 Greedy Algorithms: 1.1 Activity Selection Problem
Scribe: Virginia Williams, Sam Kim (2016), Mary Wootters (2017) Date: May 22, 2017
1 Greedy Algorithms
Suppose we want to solve a problem, and we’re able to come up with some recursive formulation of the
problem that would give us a nice dynamic programming algorithm. But then, upon further inspection, we
notice that any optimal solution only depends on looking up the optimal solution to one other subproblem.
A greedy algorithm is an algorithm which exploits such a structure, ignoring other possible choices. Greedy
algorithms can be seen as a refinement of dynamic programming; in order to prove that a greedy algorithm
is correct, we must prove that to compute an entry in our table, it is sufficient to consider at most one
other table entry; that is, at each point in the algorithm, we can make a “greedy”, locally-optimal choice,
and guarantee that a globally-optimal solution still exists. Instead of considering multiple choices to solve a
subproblem, greedy algorithms only consider a single subproblem, so they run extremely quickly – generally,
linear or close-to-linear in the problem size.
Unfortunately, greedy algorithms do not always give the optimal solution, but they frequently give good
(approximate) solutions. To give a correct greedy algorithm one must first identify optimal substructure (as
in dynamic programming), and then argue that at each step, you only need to consider one subproblem. That
is, even though there may be many possible subproblems to recurse on, given our selection of subproblem,
there is always an optimal solution that contains the optimal solution to the selected subproblem.
Recall the setup: we have $n$ activities $a_1, \ldots, a_n$, where activity $a_i$ has start time $s_i$ and finish time $f_i$, and we want a maximum-size set of non-conflicting activities. Let $S_{i,j}$ be the set of activities that start after $a_i$ finishes and finish before $a_j$ starts, and let $A_{i,j}$ be an optimal solution for $S_{i,j}$. This problem requires us to fill in a table of size $n^2$, so the dynamic programming algorithm will run in $\Omega(n^2)$ time. The actual runtime is $O(n^3)$, since filling in a single entry might take $O(n)$ time.
But we can do better! We will show that we only need to consider the ak with the smallest finishing
time, which immediately allows us to search for the optimal activity selection in linear time.
Claim 1. For each $S_{i,j}$, there is an optimal solution $A_{i,j}$ containing the activity $a_k \in S_{i,j}$ of minimum finishing time $f_k$.
Note that if the claim is true, then since $a_k$ has the minimum finishing time in $S_{i,j}$, the set $A_{i,k}$ is empty: no activity can finish before $a_k$ does. Thus, our optimal solution depends on only one other subproblem, $A_{k,j}$, giving us a linear-time algorithm. We now prove the claim.
Proof. Let $a_k$ be the activity of minimum finishing time in $S_{i,j}$. Let $A_{i,j}$ be some maximum-size set of non-conflicting activities, and let $a_l$ be the activity of minimum finishing time in $A_{i,j}$. Consider $A'_{i,j} = (A_{i,j} \setminus \{a_l\}) \cup \{a_k\}$. It is clear that $|A'_{i,j}| = |A_{i,j}|$, so we only need to show that $A'_{i,j}$ does not have conflicting activities. We know $a_l \in A_{i,j} \subseteq S_{i,j}$, which implies $f_l \geq f_k$, since $a_k$ has the minimum finishing time in $S_{i,j}$. Every $a_t \in A_{i,j} \setminus \{a_l\}$ does not conflict with $a_l$, which means $s_t \geq f_l \geq f_k$, so no activity in $A_{i,j} \setminus \{a_l\}$ can conflict with $a_k$. Thus, $A'_{i,j}$ is an optimal solution.
Due to the above claim, the expression for $A_{i,j}$ from before simplifies to the following, in terms of $a_k \in S_{i,j}$, the activity with minimum finishing time $f_k$:
$$|A_{i,j}| = 1 + |A_{k,j}|, \qquad A_{i,j} = A_{k,j} \cup \{a_k\}.$$
Algorithm Greedy-AS assumes that the activities are presorted in nondecreasing order of their finishing
time, so that if i < j, fi ≤ fj .
Algorithm 1: Greedy-AS(a)
    A ← {a_1}    // activity of minimum f_i
    k ← 1
    for m = 2 → n do
        if s_m ≥ f_k then
            // a_m starts after the last activity in A finishes
            A ← A ∪ {a_m}
            k ← m
    return A
By the above claim, this algorithm will produce a legal, optimal solution via a greedy selection of activities.
There may be multiple optimal solutions, but there always exists a solution that includes ak with the
minimum finishing time. The algorithm does a single pass over the activities, and thus only requires O(n)
time – a dramatic improvement from the trivial dynamic programming solution. If the algorithm also needed
to sort the activities by fi , then its runtime would be O(n log n) which is still better than the original dynamic
programming solution.
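To make the pseudocode concrete, here is a minimal Python sketch of the same greedy rule. The function name and the representation of activities as (start, finish) tuples are assumptions made for this example, not notation from the notes.

    # A hypothetical Python rendering of Greedy-AS.
    def greedy_activity_selection(activities):
        """Return a maximum-size set of non-conflicting (start, finish) activities."""
        # Sorting by finishing time is the O(n log n) step; if the input is
        # presorted, only the single O(n) pass below remains.
        activities = sorted(activities, key=lambda a: a[1])
        selected = []
        last_finish = float("-inf")
        for start, finish in activities:
            if start >= last_finish:  # a_m starts after the last selected activity ends
                selected.append((start, finish))
                last_finish = finish
        return selected

    print(greedy_activity_selection([(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (8, 11)]))
    # -> [(1, 4), (5, 7), (8, 11)]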
1.2 Scheduling
Consider another problem that can be solved greedily. We are given $n$ jobs which all need a common resource. Let $w_j$ be the weight (or importance) and $l_j$ be the length (time required) of job $j$. Our output is an ordering of the jobs. We define the completion time $c_j$ of job $j$ to be the sum of the lengths of the jobs in the ordering up to and including job $j$. Our goal is to output an ordering of the jobs that minimizes the weighted sum of completion times $\sum_j w_j c_j$.
1.2.1 Intuition
Our intuition tells us that if all jobs have the same length, then we prefer larger weighted jobs to appear
earlier in the order. If jobs all have equal weights, then we prefer shorter length jobs in the order.
For example, consider three jobs with lengths $l_1 = 1$, $l_2 = 2$, $l_3 = 3$, all with equal weights of 1. Scheduling them in the order $1, 2, 3$ gives $\sum_{i=1}^{3} w_i c_i = 1 + 3 + 6 = 10$, while the reverse order $3, 2, 1$ gives $\sum_{i=1}^{3} w_i c_i = 3 + 5 + 6 = 14$.
Now suppose we are given an optimal ordering, and consider swapping two adjacent jobs $i$ and $j$, where $i$ immediately precedes $j$. Note that swapping jobs $i$ and $j$ does not alter the completion time of any other job, and only changes the completion times of $i$ and $j$: $c_i$ increases by $l_j$ and $c_j$ decreases by $l_i$. This means that our objective function $\sum_i w_i c_i$ changes by $w_i l_j - w_j l_i$. Since we assumed our ordering was optimal, the objective function cannot decrease after swapping the jobs. This means
$$w_i l_j - w_j l_i \geq 0,$$
which implies
$$\frac{l_j}{w_j} \geq \frac{l_i}{w_i}.$$
Therefore, we want to process jobs in increasing order of $l_i / w_i$, the ratio of the length to the weight of each job. The algorithm then does a single pass over the jobs, and thus only requires $O(n)$ time, assuming the jobs are already ordered by $l_i / w_i$. As before, if the algorithm also needs to sort the jobs by this ratio, then its runtime is $O(n \log n)$.
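A minimal Python sketch of this rule follows; representing each job as a (weight, length) pair is an assumption made for this example.

    def schedule_by_ratio(jobs):
        """Order (weight, length) jobs by increasing length/weight and
        return the ordering together with the weighted sum of completion times."""
        order = sorted(jobs, key=lambda j: j[1] / j[0])  # sort by l_i / w_i
        elapsed, cost = 0, 0
        for w, l in order:
            elapsed += l          # completion time c_j of this job
            cost += w * elapsed   # its contribution w_j * c_j
        return order, cost

    # The unit-weight example from above: the order 1, 2, 3 achieves cost 10.
    print(schedule_by_ratio([(1, 3), (1, 1), (1, 2)]))
    # -> ([(1, 1), (1, 2), (1, 3)], 10)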
1.3 Huffman Codes
1.3.1 Tree Representation
We may think of representing our code in a tree structure, where the leaves of the tree correspond to the codewords. An example is shown below:
[Figure: a coding tree in which every left branch is labeled 0 and every right branch is labeled 1. The leaves are the characters a: .45, c: .12, b: .13, f: .05, e: .09, and d: .16; each internal node carries the sum of the frequencies below it (.14, .25, .30, .55, and 1 at the root).]
Above, in addition to the characters $\{a, b, c, d, e, f\}$, we've included frequency information. That is, $f(a) = 0.45$ means that the probability of a random character in this language being equal to $a$ is $0.45$.
The code for each character can be found by concatenating the bits of the path from the root to the
leaves. By convention, every left branch is given the bit 0 and every right branch is given the bit 1.
As long as the characters are on the leaves of this tree, the corresponding code will be prefix-free. This
is because one string is a prefix of another if and only if the node corresponding to the first is an ancestor
of the node corresponding to the second. No leaf is an ancestor of any other leaf, so the code is prefix-free.
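To illustrate, here is a small Python sketch that reads the codewords off such a tree. The tuple encoding, where a leaf is a character and an internal node is a (left, right) pair, is an assumption made for this example.

    def codewords(tree, prefix=""):
        """Map each leaf character to its bit string: left edges append a 0,
        right edges append a 1 along the root-to-leaf path."""
        if isinstance(tree, str):  # leaf: the accumulated path is the codeword
            return {tree: prefix}
        left, right = tree
        codes = codewords(left, prefix + "0")
        codes.update(codewords(right, prefix + "1"))
        return codes

    # A small tree with 'a' at depth 1, and 'b' and 'c' at depth 2.
    print(codewords(("a", ("b", "c"))))  # -> {'a': '0', 'b': '10', 'c': '11'}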
The cost of encoding a character $c$ is its depth $d_T(c)$ in the tree, so the expected cost of encoding a random character is $B(T) = \sum_{c \in C} f(c) \, d_T(c)$. We say that a tree $T$ is optimal if this expected cost $B(T)$ is as small as possible.
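Under the same assumed tuple representation as above, $B(T)$ can be computed by a short recursion, since the depth $d_T(c)$ of a character equals the length of its codeword:

    def expected_cost(tree, freqs, depth=0):
        """Compute B(T) = sum over characters c of f(c) * d_T(c)."""
        if isinstance(tree, str):  # leaf: contributes f(c) times its depth
            return freqs[tree] * depth
        left, right = tree
        return (expected_cost(left, freqs, depth + 1)
                + expected_cost(right, freqs, depth + 1))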
Input: Set of characters $C = \{c_1, c_2, \ldots, c_n\}$ of size $n$, and $F = \{f(c_1), f(c_2), \ldots, f(c_n)\}$, a set of frequencies.
• Create a node $N_k$ for each character $c_k$, with key $f(c_k)$, and let current be the set of all these nodes.
• While current contains more than one node, remove the two nodes $x, y$ of minimum key from current, create a new node $z$ with key $x.\mathrm{key} + y.\mathrm{key}$ and children $x$ and $y$, and add $z$ to current.
• Return the one remaining node in current as the root of the coding tree.
The tree shown above results from running this algorithm on the letters with those frequencies; see the slides for an illustration of this process.
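A heap-based Python sketch of this procedure is below; the function name and the tuple tree representation are assumptions made for this example. With a binary heap, the $n - 1$ merges take $O(n \log n)$ time in total.

    import heapq
    from itertools import count

    def build_coding_tree(freqs):
        """Greedily merge the two nodes of minimum key until one remains.
        Leaves are characters; internal nodes are (left, right) pairs."""
        tie = count()  # tiebreaker so the heap never compares trees directly
        current = [(f, next(tie), c) for c, f in freqs.items()]
        heapq.heapify(current)
        while len(current) > 1:
            f1, _, t1 = heapq.heappop(current)  # node of smallest key
            f2, _, t2 = heapq.heappop(current)  # node of second-smallest key
            heapq.heappush(current, (f1 + f2, next(tie), (t1, t2)))
        return current[0][2]

    # The frequencies from the figure above.
    tree = build_coding_tree({'a': .45, 'b': .13, 'c': .12, 'd': .16, 'e': .09, 'f': .05})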
In the proof of Claim 2, $x$ and $y$ are two characters of minimum frequency, and $a$ and $b$ are sibling leaves of maximum depth in an optimal tree $T$. Swapping $a$ with $x$ changes the cost $B(T)$ by
$$f(x) d_T(a) + f(a) d_T(x) - f(x) d_T(x) - f(a) d_T(a) = (f(x) - f(a))(d_T(a) - d_T(x)) \leq 0,$$
since $f(x) \leq f(a)$ and $d_T(a) \geq d_T(x)$.
1 For simplicity, we ignore the case where a, b, x, y are not distinct. For more details, see Lemma 16.2 in CLRS.
Therefore, swapping $a, b$ with $x, y$ will not increase our objective function $B(T)$. Hence, there exists an optimal coding tree where $x$ and $y$ are siblings in the tree.
Claim 2 shows that there exists an optimal coding tree where x and y are sibling leaves, that is, there
is an optimal code that makes the same greedy choice as the algorithm. However, this is only immediately
helpful for the first iteration of the inductive step, when all of the elements of current are indeed leaves. In
order to make this idea work for all t, we need one more claim.
Claim 3. Let $C$ be a set of characters, and let $T$ be an optimal coding tree for $C$. Imagine creating $C'$ from $C$ by collapsing all the characters in a subtree rooted at a node $N$ with key $k = N.\mathrm{key}$ into a single character $c'$ with frequency $k$. Then the corresponding tree $T'$ is optimal for $C'$.
Conversely, suppose that a tree $T'$ is an optimal coding tree for an alphabet $C'$. Let $c' \in C'$ be a character with frequency $f(c')$. Introduce new characters $c''_1, \ldots, c''_r$ with total frequency $\sum_{i=1}^{r} f(c''_i) = f(c')$. Let $T''$ be an optimal coding tree on $c''_1, \ldots, c''_r$. Then the tree $T'''$ on the alphabet $C'' = (C' \setminus \{c'\}) \cup \{c''_1, \ldots, c''_r\}$, obtained from $T'$ by replacing the leaf $c'$ with the tree $T''$, is optimal for $C''$.