Sample of Size 1 Mega
Ahmed Kosba
Activity Selection [CLRS 16.1]
Given a set of n activities, S = {a1, a2, …, an}, where each
activity ai has a start time si and a finish time fi, find a
maximum-size subset of mutually compatible activities.
• Each activity ai takes place during the half-open interval [si, fi).
• Two activities ai and aj are compatible iff
[si, fi) and [sj, fj) do not overlap.
• Assume that the activities are already sorted by their finish
times.
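As a minimal sketch of the compatibility test, two half-open intervals do not overlap exactly when one of them finishes no later than the other starts. The function name `compatible` and the (start, finish) tuple representation are assumptions made for illustration; they are not part of the slides.

```python
def compatible(a, b):
    """Return True if the half-open intervals a = [s_i, f_i) and b = [s_j, f_j) do not overlap."""
    (s_i, f_i), (s_j, f_j) = a, b
    # Because the intervals are half-open, touching end points (f_i == s_j) are allowed.
    return f_i <= s_j or f_j <= s_i


print(compatible((1, 3), (3, 5)))  # True: the first finishes exactly when the second starts
print(compatible((1, 3), (2, 5)))  # False: the intervals share the time range [2, 3)
```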
Activity Selection
• Example [CLRS notes]:
i 1 2 3 4 5 6 7 8 9
si 1 2 4 1 5 8 9 11 13
fi 3 5 7 8 9 10 11 14 16
[Figure: timeline from 0 to 16 showing activities a1 to a9 as intervals.]
[Figures: Solutions 1, 2, and 3, each highlighting a different solution on the same timeline.]
There are more solutions.
Activity Selection
• To solve the previous problem, we can use dynamic
programming.
• However, we will discover a simpler greedy
algorithm.
• We will start with the DP solution as a review, then
we will discuss the greedy one.
• Note: The DP solution presented next is not the most
efficient DP solution to the problem.
• The goal is to illustrate the difference between DP and
greedy algorithms.
• We use the formalization used in CLRS.
Activity Selection
• Examining the structure of the problem
• Suppose some activity ak is part of the optimal solution
for the set S, i.e., ak belongs to the maximum subset of
mutually compatible activities.
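The optimal solution in this case must include optimal solutions to the two subproblems that ak leaves: the activities that finish before ak starts, and the activities that start after ak finishes.
As in the examples covered previously, we don't know which ak belongs to the optimal solution, so we have to consider all options when writing the recursive definition.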
Activity Selection
• The problem has optimal substructure.
• The optimal solution of the original problem includes
optimal solutions to the subproblems.
Activity Selection
Some notation:
• 𝑆𝑖𝑗 is the set of activities that start after activity 𝑎𝑖
finishes and that finish before activity 𝑎𝑗 starts.
[Figure: timeline with ai finishing at 𝑓𝑖 and aj starting at 𝑠𝑗.]
All activities that start and finish in the interval between 𝑓𝑖 and 𝑠𝑗 belong to 𝑆𝑖𝑗.
Activity Selection
[Figure: timeline with ai, ak, and aj. Activity ak lies between 𝑓𝑖 and 𝑠𝑗 and splits 𝑆𝑖𝑗 into two shaded intervals: 𝑆𝑖𝑘 (between 𝑓𝑖 and 𝑠𝑘) and 𝑆𝑘𝑗 (between 𝑓𝑘 and 𝑠𝑗).]
Suppose the optimal solution for 𝑆𝑖𝑗 includes activity ak. Then it must include the optimal solutions for the shaded intervals as well.
|𝐴𝑖𝑗| = |𝐴𝑖𝑘| + |𝐴𝑘𝑗| + 1
Here 𝐴𝑖𝑗 is the optimal solution for the set 𝑆𝑖𝑗, and the + 1 counts ak.
Check your understanding
Exercise: Is 𝑆𝑖𝑗 = 𝑆𝑖𝑘 ∪ 𝑆𝑘𝑗 ∪ {ak}?
Activity Selection – DP Solution
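• Recursive definition:
$|A_{ij}| = 0$ if $S_{ij} = \emptyset$; otherwise $|A_{ij}| = \max_{a_k \in S_{ij}} \{ |A_{ik}| + |A_{kj}| + 1 \}$
As before, we don't know which k would lead to the
optimal solution, so we loop over all 𝑎𝑘 in 𝑆𝑖𝑗 and select
the one that leads to the maximum.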
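The recursive definition can be sketched as a memoized function. This is an illustrative sketch rather than the slides' own code: the name `dp_activity_selection` and the use of two sentinel activities (one finishing at minus infinity, one starting at plus infinity) are assumptions that let a single call cover the whole set S. It assumes the activities are sorted by finish time and that each start time is strictly less than its finish time.

```python
from functools import lru_cache


def dp_activity_selection(s, f):
    """Return the size of a maximum set of mutually compatible activities.

    s[i], f[i] are the start and finish times of activity i (0-based here),
    sorted by finish time.
    """
    n = len(s)
    inf = float("inf")
    # Sentinels (assumption of this sketch): index 0 finishes before everything,
    # index n + 1 starts after everything, so S_{0, n+1} is the whole input.
    start = [-inf] + list(s) + [inf]
    finish = [-inf] + list(f) + [inf]

    @lru_cache(maxsize=None)
    def size(i, j):
        # |A_ij|: try every a_k in S_ij, i.e. f_i <= s_k and f_k <= s_j.
        best = 0
        for k in range(i + 1, j):
            if finish[i] <= start[k] and finish[k] <= start[j]:
                best = max(best, size(i, k) + size(k, j) + 1)
        return best

    return size(0, n + 1)


s = [1, 2, 4, 1, 5, 8, 9, 11, 13]
f = [3, 5, 7, 8, 9, 10, 11, 14, 16]
print(dp_activity_selection(s, f))  # 4
```

There are O(n^2) subproblems and each one loops over up to n choices of k, so this sketch runs in O(n^3) time, which is consistent with the note that this is not the most efficient DP solution.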
Activity Selection
A simpler solution
• Greedy strategy:
• Instead of solving all the subproblems for each possible
ak, choose the activity ak in a greedy way (before solving
any of the subproblems!)
Activity Selection – Greedy Solution
Back to our example:
[Figure: timeline from 0 to 16 showing activities a1 to a9; an x marks each activity that has been discarded.]
1- Select a1, as it has the earliest finish time.
Now solve the subproblem in the interval 3 to 16.
2- Discard a2 (marked x), as its start time < 3.
3- Select a3, as it has the next earliest finish time and its start time >= 3.
Now solve the subproblem in the interval 7 to 16.
4- Discard a4 (marked x), as its start time < 7.
5- Discard a5 (marked x), as its start time < 7.
6- Select a6, as it has the next earliest finish time and its start time >= 7.
And so on.
Other options for the greedy choice?
• In the previous example, we used the "earliest
finish time" criterion.
• We will prove shortly that it leads to an optimal
solution.
Other Greedy Criteria
Choosing shortest activity first?
• Choosing the shortest activity first will not
necessarily lead to an optimal solution.
[Figure: a counterexample for shortest-first, with the optimal solution highlighted.]
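A small counterexample makes this concrete. The three activities below are a hypothetical instance chosen for illustration (they are not taken from the slides), and `greedy_select` is an assumed helper name. The shortest activity overlaps both of the others, so picking it first blocks the two-activity solution that earliest-finish-first finds.

```python
def greedy_select(activities, key):
    """Scan activities in the order given by `key`, keeping each one that is
    compatible with everything chosen so far."""
    chosen = []
    for s, f in sorted(activities, key=key):
        # Half-open intervals [s, f) and [cs, cf) are compatible iff one ends
        # no later than the other starts.
        if all(f <= cs or cf <= s for cs, cf in chosen):
            chosen.append((s, f))
    return sorted(chosen)


activities = [(1, 5), (4, 7), (6, 10)]  # hypothetical instance
shortest_first = greedy_select(activities, key=lambda a: a[1] - a[0])
earliest_finish = greedy_select(activities, key=lambda a: a[1])
print(shortest_first)   # [(4, 7)]
print(earliest_finish)  # [(1, 5), (6, 10)]
```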
Activity Selection – Greedy
Solution
• As we now consider one subproblem only, the notation can
be simplified.
• Let 𝑆𝑘 be the set of activities that start after activity 𝑎𝑘
finishes.
𝑆𝑘 = {𝑎𝑖 ∈ 𝑆 ∶ 𝑠𝑖 ≥ 𝑓𝑘 }
Activity Selection – Greedy Solution
Proof of Optimality (Intuition)
Proof illustration via an example:
[Figure: timeline of activities a1 to a9, with a2 swapped for a1.]
S = {a1, …, a9}
Given an optimal solution A = {a2, a5, a7, a8}, show that we can construct an optimal solution that uses a1, the activity with the earliest finish time.
As a2 has the earliest finish time in A, let A' = (A – {a2}) ∪ {a1} = {a1, a5, a7, a8}.
Note that because a1 finishes earliest, it was possible to swap it with a2 without creating overlaps with any other activity. This is generalized in the proof.
Activity Selection – Greedy Solution
Implementation
• Assuming the activities are already sorted by finish
time, the running time of the greedy
solution is Θ(n).
Activity Selection – Greedy Solution
Implementation
• Iterative implementation [CLRS]
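The iterative procedure can be sketched as follows. This is a Python rendering written for these notes in the spirit of the CLRS procedure, not a copy of the original pseudocode. It assumes 0-based lists `s` and `f` sorted by finish time and returns 1-based activity indices to match the slides' numbering.

```python
def greedy_activity_selector(s, f):
    """Return the 1-based indices of the activities chosen by earliest finish time.

    s and f hold the start and finish times, already sorted by finish time.
    """
    n = len(s)
    if n == 0:
        return []
    selected = [1]  # a1 has the earliest finish time, so it is always chosen
    k = 1           # index of the most recently selected activity
    for m in range(2, n + 1):
        # a_m is compatible if it starts no earlier than a_k finishes.
        if s[m - 1] >= f[k - 1]:
            selected.append(m)
            k = m
    return selected


s = [1, 2, 4, 1, 5, 8, 9, 11, 13]
f = [3, 5, 7, 8, 9, 10, 11, 14, 16]
print(greedy_activity_selector(s, f))  # [1, 3, 6, 8]
```

Each activity is examined exactly once, which gives the Θ(n) running time once the input is sorted.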
Activity Selection – Greedy Solution
Implementation
• Recursive implementation [CLRS]
Assume there is a dummy activity a0 with f0 = 0.
First call: Recursive-Activity-Selector(s, f, 0, n)
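A recursive version can be sketched in the same way. Again this is a rendering for these notes rather than the original pseudocode. The lists are 1-based by convention: index 0 holds the dummy activity a0, and the value stored in `s[0]` is an unused placeholder.

```python
def recursive_activity_selector(s, f, k, n):
    """Return the 1-based indices of the activities selected from S_k.

    s[0..n] and f[0..n] are sorted by finish time; index 0 is the dummy
    activity a0 with f[0] = 0.
    """
    m = k + 1
    # Skip every activity that starts before a_k finishes.
    while m <= n and s[m] < f[k]:
        m += 1
    if m <= n:
        # a_m is the first compatible activity; recurse on S_m.
        return [m] + recursive_activity_selector(s, f, m, n)
    return []


s = [0, 1, 2, 4, 1, 5, 8, 9, 11, 13]   # s[0] is a placeholder for a0
f = [0, 3, 5, 7, 8, 9, 10, 11, 14, 16]  # f[0] = 0 for the dummy activity
print(recursive_activity_selector(s, f, 0, 9))  # [1, 3, 6, 8]
```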
The Greedy Strategy – Summary
[CLRS]
• Express the optimization problem as a problem in
which we can make a choice, then solve one
subproblem.
The Greedy Strategy
• Optimal substructure
• Recall: A problem has optimal substructure if the
optimal solution incorporates optimal solutions to
subproblems.
Knapsack Problem
• To see the difference between greedy algorithms
and dynamic programming, we will revisit the
knapsack problem covered previously.
Knapsack Problem
Given a knapsack (bag) that can hold a weight of at most W, and
n items to pick from.
Each item has weight wi kg and is worth vi dollars.
How to choose items to put in the knapsack, such that the total
value of the items in the knapsack is maximized?
Fractional Knapsack – Correctness Proof Sketch [informal]
• Assume this is the optimal solution, with total value V.
• Item 1 has the highest value per kg, and some of it has not been included in the knapsack.
• Suppose we replace x kgs of some item j in the optimal solution with x kgs of item 1 (j ≠ 1).
[Figure: knapsack in which x kgs of item j are swapped for x kgs of item 1.]
New value of knapsack: $V' = V - x \cdot \frac{v_j}{w_j} + x \cdot \frac{v_1}{w_1}$
As $\frac{v_1}{w_1} > \frac{v_j}{w_j}$, then $V' > V$.
This is a contradiction. All of item 1 must be in the optimal solution.
Fractional Knapsack – Correctness
Proof Sketch [informal]
• Therefore, we can reach the optimal solution by
combining the greedy choice and the optimal
solution to the subproblem that we have to solve
after making the greedy choice.
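The greedy choice in the proof sketch (take as much as possible of the item with the highest value per kg, then solve the remaining subproblem) can be sketched as a loop. The function name `fractional_knapsack`, the (weight, value) tuple layout, and the sample capacity of 50 are assumptions made for illustration.

```python
def fractional_knapsack(items, capacity):
    """Return the maximum total value when fractions of items may be taken.

    items is a list of (weight, value) pairs with positive weights.
    """
    total = 0.0
    remaining = capacity
    # Consider items in decreasing order of value per unit weight.
    for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if remaining <= 0:
            break
        take = min(w, remaining)  # whole item if it fits, otherwise a fraction
        total += v * take / w
        remaining -= take
    return total


print(fractional_knapsack([(10, 60), (20, 100), (30, 120)], 50))  # 240.0
```

In this instance the first two items are taken whole and the last 20 kg of capacity is filled with two thirds of the third item.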
Greedy Strategy for 0-1 Knapsack?
• Note that the greedy approach won't work for the
0-1 knapsack.
i wi vi vi / wi
1 10 60 6
2 20 100 5
3 30 120 4
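To see the failure on the items in this table, the sketch below compares a value-per-kg greedy against a standard 0-1 knapsack DP. The capacity W = 50 is an assumption (the table does not state it), and the function names `greedy_01` and `dp_01` are illustrative.

```python
def greedy_01(items, capacity):
    """Greedy by value per kg, taking whole items only."""
    total, remaining = 0, capacity
    for w, v in sorted(items, key=lambda it: it[1] / it[0], reverse=True):
        if w <= remaining:
            total += v
            remaining -= w
    return total


def dp_01(items, capacity):
    """0-1 knapsack DP over integer capacities; best[c] is the best value within weight c."""
    best = [0] * (capacity + 1)
    for w, v in items:
        # Go downwards so that each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


items = [(10, 60), (20, 100), (30, 120)]  # (wi, vi) from the table
W = 50  # assumed capacity
print(greedy_01(items, W))  # 160: items 1 and 2, leaving 20 kg unused
print(dp_01(items, W))      # 220: items 2 and 3
```

The greedy choice commits to item 1 and then cannot fit item 3, whereas the optimal solution leaves item 1 out entirely.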
Data Compression
• Needed for many applications in practice
• Two categories:
• Lossless data compression
• Allows reconstructing the original data completely from the
compressed version without any loss of information
• Used when changes in the uncompressed data are not
tolerable, e.g., text, programs, etc.
• Lossy data compression
• Allows reconstructing an approximation of the original data.
• Used to compress audio, video and images.
Huffman Codes
• A technique for lossless data compression
• According to CLRS, it can achieve savings between
20% and 90%, depending on the characteristics of
the input data.
• Uses a greedy method to find an optimal way for
representing characters.
Designing a Binary Code
• How can characters be represented in binary?
• Fixed-length codes
• Each character is represented by a unique binary string
(codeword) of a fixed length.
• Example: ASCII, UTF-32 (a fixed-length encoding of Unicode)
• Variable-length codes
• The codewords representing the characters vary in their length.
• Can be utilized in compression, by assigning short codewords
to the characters that appear frequently.
Character   Fixed-length code   Variable-length code
A           00                  0
B           01                  111
C           10                  110
D           11                  10
Designing a Binary Code
• How to decode when using variable-length codes?
• While encoding is straightforward, decoding might
not yield a unique result if the code is not designed
carefully.
• Example
Assume the codewords representing {A, B, C, D} are {1, 10, 110,
111}, how to decode the string 1110?
Both AAB and AC are possible (ambiguity).
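The ambiguity can be checked by enumerating every way to split the bit string into codewords. The helper name `all_decodings` is an assumption of this sketch, and the exhaustive recursion is only meant for short strings like the one in the example.

```python
def all_decodings(bits, code):
    """Return every string whose encoding under `code` (char -> codeword) equals `bits`."""
    if not bits:
        return [""]
    results = []
    for ch, word in code.items():
        if bits.startswith(word):
            # Use this codeword first, then decode the rest in every possible way.
            results += [ch + rest for rest in all_decodings(bits[len(word):], code)]
    return results


code = {"A": "1", "B": "10", "C": "110", "D": "111"}
print(sorted(all_decodings("1110", code)))  # ['AAB', 'AC']
```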
Prefix Codes
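• A prefix code is a code in which no codeword is a prefix of another codeword.
• Can be represented by a binary tree.
• Each path from the root to a leaf generates a
codeword.
• Helps during the decoding process.
Character   Codeword
A           0
B           111
C           110
D           10
[Figure: binary tree for this code. From the root, edge 0 leads to leaf A and edge 1 to an internal node. From that node, edge 0 leads to leaf D and edge 1 to a second internal node, whose edges 0 and 1 lead to leaves C and B.]
Example: 001010111110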
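Decoding the example string can be sketched with a single left-to-right pass. For brevity this sketch looks codewords up in a dictionary instead of walking the tree node by node; that substitution, and the name `decode_prefix`, are choices made here for illustration. Because no codeword is a prefix of another, the first codeword that matches is the only possible one.

```python
def decode_prefix(bits, code):
    """Decode `bits` using a prefix code given as a char -> codeword mapping."""
    by_word = {word: ch for ch, word in code.items()}
    out = []
    current = ""
    for b in bits:
        current += b
        if current in by_word:
            # Equivalent to reaching a leaf in the code tree: emit and restart at the root.
            out.append(by_word[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


code = {"A": "0", "B": "111", "C": "110", "D": "10"}
print(decode_prefix("001010111110", code))  # AADDBC
```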
Prefix Codes
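Given a prefix code tree T, the number of bits needed
to encode a file can be calculated as:
$B(T) = \sum_{c \in C} d_T(c) \cdot c.freq$
where $d_T(c)$ is the depth of the leaf of character c in T (the length of its codeword) and c.freq is the frequency of c in the file.
Goal: Find a prefix code that would minimize the above for a given file.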
Trees of Optimal Prefix Codes
• Binary trees corresponding to optimal prefix codes must be full binary trees.
• The number of leaves should be equal to the alphabet size |C|, and the number of internal nodes should be |C| - 1.
[Figure: example binary code tree with edges labeled 0 and 1 and leaves including A and D.]
Huffman Codes
Example from CLRS
        a    b    c    d    e    f
Freq.   45   13   12   16   9    5
[Figures: step-by-step construction of the tree, merging the two least frequent nodes at each step; edges are labeled 0 (left) and 1 (right).]
• Merge f:5 and e:9 into a node with frequency 14.
• Merge c:12 and b:13 into a node with frequency 25.
• Merge 14 and d:16 into a node with frequency 30.
• Merge 25 and 30 into a node with frequency 55.
• Merge a:45 and 55 into the root, with frequency 100.
Huffman Codes
• How to implement the previous algorithm
efficiently?
• Need a data structure that supports minimum extraction
and insertion.
• Use a priority queue.
• For simplicity, we will assume a binary min-heap is used for
implementing the priority queue.
• Advanced data structures could be used to achieve a better
cost. See discussion in CLRS. This is extracurricular.
Huffman Codes
• Pseudocode [CLRS]:
[Pseudocode figure from CLRS, with the following annotations:]
• n is the size of the alphabet.
• All characters are added to the queue.
• Note: each character has an attribute freq.
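The procedure can be sketched in Python using the standard `heapq` module as the binary min-heap. This is an illustrative rendering, not the CLRS pseudocode itself. The tuple-based tree representation, the tie-breaking counter, and the function name `huffman` are assumptions of this sketch, which also assumes a non-empty alphabet whose characters are strings.

```python
import heapq
from itertools import count


def huffman(freq):
    """Build a Huffman code for `freq` (char -> frequency); return char -> codeword."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    # Heap entries are (frequency, tie, tree); a tree is a char or a (left, right) pair.
    heap = [(fr, next(tie), ch) for ch, fr in freq.items()]
    heapq.heapify(heap)  # all characters are added to the queue
    for _step in range(len(freq) - 1):  # n - 1 merges for an alphabet of size n
        f1, _t1, left = heapq.heappop(heap)   # least frequent tree
        f2, _t2, right = heapq.heappop(heap)  # second least frequent tree
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    root = heap[0][2]

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # left edge is labeled 0
            walk(node[1], prefix + "1")  # right edge is labeled 1
        else:
            codes[node] = prefix or "0"  # a one-character alphabet still needs one bit

    walk(root, "")
    return codes


freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
codes = huffman(freq)
cost = sum(freq[ch] * len(codes[ch]) for ch in freq)  # B(T) for the resulting tree
print(cost)  # 224
```

With a binary heap, building the queue and performing the n - 1 merges takes O(n log n) time, since each extraction and insertion costs O(log n).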