
Fall 2024 - Analysis and Design of Algorithms

Lecture 7: Greedy Algorithms

Ahmed Kosba

Department of Computer and Systems Engineering


Faculty of Engineering
Alexandria University
Greedy Algorithms
• Greedy Strategy: At any step, when we have multiple options to choose from, choose the best option at the moment, i.e., the option that offers the highest immediate benefit.
• This certainly does not lead to optimal solutions to all problems.
  • We have seen several examples in the dynamic programming lectures where the greedy strategy fails.
• In this lecture, we will see several examples where the greedy strategy works.
Outline
• Activity selection
• Fractional knapsack
• Huffman codes

Activity Selection [CLRS 16.1]
Given a set of n activities, S = {a1, a2, …, an}, where each
activity ai has a start time si and a finish time fi, find a
maximum-size subset of mutually compatible activities.

• Each activity ai takes place during the half-open interval [si, fi).
• Two activities ai and aj are compatible iff
[si, fi) and [sj, fj) do not overlap.
• Assume that the activities are already sorted by their finish
times.

Activity Selection
• Example [CLRS notes]:
i 1 2 3 4 5 6 7 8 9
si 1 2 4 1 5 8 9 11 13
fi 3 5 7 8 9 10 11 14 16

[Figure: the nine activities drawn as intervals on a time axis from 0 to 16.]
Activity Selection
• Example [CLRS notes], continued:
[Figures: the same timeline with three different maximum-size compatible subsets highlighted as Solution 1, Solution 2, and Solution 3. There are more solutions.]
Activity Selection
• To solve the previous problem, we can use dynamic
programming.
• However, we will discover a simpler greedy
algorithm.
• We will start with the DP solution as a review, then we will discuss the greedy one.
  • Note: The DP solution presented next is not the most efficient DP solution to the problem.
  • The goal is to illustrate the difference between DP and greedy algorithms.
• We use the formalization used in CLRS.
Activity Selection
• Examining the structure of the problem
• Suppose some activity ak is part of the optimal solution
for the set S, i.e., ak belongs to the maximum subset of
mutually compatible activities.
The optimal solution in this case must include: the optimal solution for the set of activities that finish before ak, then ak itself, then the optimal solution for the set of activities that start after ak.

As in the examples covered previously, we don't know which ak belongs to the optimal solution, so we have to consider all options when writing the recursive definition.
Activity Selection
• The problem has optimal substructure.
  • The optimal solution of the original problem includes optimal solutions to the subproblems.
• Following the DP paradigm of the previous lecture:
  • Next step: find a recursive definition.
Activity Selection
Some notation:
• 𝑆𝑖𝑗 is the set of activities that start after activity 𝑎𝑖
finishes and that finish before activity 𝑎𝑗 starts.

𝑆𝑖𝑗 = {𝑎𝑘 ∈ 𝑆 ∶ 𝑓𝑖 ≤ 𝑠𝑘 < 𝑓𝑘 ≤ 𝑠𝑗 }

• 𝐴𝑖𝑗 is a maximum-size subset of mutually compatible activities in 𝑆𝑖𝑗.
• |𝐴𝑖𝑗| is the size of the set 𝐴𝑖𝑗.
Activity Selection
[Figure: activities ai and aj on a time axis, with the interval between 𝑓𝑖 (when ai finishes) and 𝑠𝑗 (when aj starts) highlighted.]
𝑆𝑖𝑗 is the set of activities that start after activity 𝑎𝑖 finishes and that finish before activity 𝑎𝑗 starts. All activities that start and finish in this interval belong to 𝑆𝑖𝑗.
Activity Selection
[Figure: the same interval, now with an activity ak inside it. ak splits the interval into 𝑆𝑖𝑘 (activities between 𝑓𝑖 and 𝑠𝑘) and 𝑆𝑘𝑗 (activities between 𝑓𝑘 and 𝑠𝑗).]
Suppose the optimal solution for 𝑆𝑖𝑗 includes activity ak. Then it must include the optimal solutions for the shaded intervals (the sets 𝑆𝑖𝑘 and 𝑆𝑘𝑗) as well.
Activity Selection
|𝐴𝑖𝑗| = |𝐴𝑖𝑘| + |𝐴𝑘𝑗| + 1
The left side is the size of the optimal solution for the set 𝑆𝑖𝑗; |𝐴𝑖𝑘| and |𝐴𝑘𝑗| are the sizes of the optimal solutions for the subproblems (the sets 𝑆𝑖𝑘 and 𝑆𝑘𝑗), and the +1 counts ak.
Activity Selection – Check your understanding
Is 𝑆𝑖𝑗 = 𝑆𝑖𝑘 ∪ 𝑆𝑘𝑗 ∪ {ak}?
Not necessarily. There could be activities (the dark blue ones in the figure) which belong to 𝑆𝑖𝑗 but cannot belong to 𝑆𝑖𝑘 ∪ 𝑆𝑘𝑗, namely activities that overlap ak. When considering an optimal solution that has ak, we do not consider the dark blue activities.
[Figure: the 𝑆𝑖𝑗 interval with ak inside it; the dark blue activities overlap ak, so they are in 𝑆𝑖𝑗 but in neither 𝑆𝑖𝑘 nor 𝑆𝑘𝑗.]
Activity Selection – DP Solution
• Recursive definition: As before, we don't know which k would lead to the optimal solution, so we loop over all 𝑎𝑘 in 𝑆𝑖𝑗 and select what leads to the maximum.

Let 𝑐(𝑖, 𝑗) be the size of the optimal solution for 𝑆𝑖𝑗:

𝑐(𝑖, 𝑗) = 0   if 𝑆𝑖𝑗 = ∅
𝑐(𝑖, 𝑗) = max over 𝑎𝑘 ∈ 𝑆𝑖𝑗 of { 𝑐(𝑖, 𝑘) + 𝑐(𝑘, 𝑗) + 1 }   if 𝑆𝑖𝑗 ≠ ∅

• As in the DP lecture, we could implement this by either a bottom-up approach or a top-down approach with memoization.
• However, is this the best we can do?
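To make the recurrence concrete, here is a minimal top-down (memoized) Python sketch of this DP. It is my own illustrative rendering, not code from the lecture; the dummy activities a0 and a_{n+1} that bound S_{0, n+1} are an assumption borrowed from the CLRS formalization.

```python
from functools import lru_cache

def max_activities_dp(s, f):
    """Top-down DP for activity selection following the c(i, j)
    recurrence above (illustrative; not the most efficient DP)."""
    n = len(s)
    # Dummy activities: a0 finishes at time 0, a_{n+1} starts at infinity,
    # so that S_{0, n+1} contains all real activities.
    s = [0] + list(s) + [float("inf")]
    f = [0] + list(f) + [float("inf")]

    @lru_cache(maxsize=None)
    def c(i, j):
        # Size of a maximum-size compatible subset of
        # S_ij = {a_k : f_i <= s_k < f_k <= s_j}.
        best = 0
        for k in range(i + 1, j):
            if f[i] <= s[k] and f[k] <= s[j]:  # a_k belongs to S_ij
                best = max(best, c(i, k) + c(k, j) + 1)
        return best

    return c(0, n + 1)

# The example from the slides; the answer is 4 (e.g., {a1, a3, a6, a8}).
print(max_activities_dp([1, 2, 4, 1, 5, 8, 9, 11, 13],
                        [3, 5, 7, 8, 9, 10, 11, 14, 16]))  # -> 4
```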
Activity Selection – DP Solution
• Notes
• The previous DP formalization can be simplified.
• Furthermore, there is a more efficient DP solution than
the previous one.

• Exercise: Can you find a simplified recursive definition that won't require solving two subproblems?
Activity Selection
A simpler solution
• Greedy strategy:
  • Instead of solving all the subproblems for each possible ak, choose the activity ak in a greedy way (before solving any of the subproblems!)
  • In this context, a possible greedy choice is to select an activity that would leave more space for the other activities.
    • The greedy choice we will use is based on the earliest finish time.
    • We will show other choices that do not work.
  • Note that the choice is made without considering the future choices, i.e., before solving any of the next subproblems.
Activity Selection – Greedy Solution
• Greedy strategy:
  • Choose the first activity to finish.
  • As the activities are sorted by finish time, this means that we select the first activity in the interval we are considering.
  • When the first activity is selected for the optimal solution, note that only one subproblem remains.
Activity Selection – Greedy Solution
Back to our example:
[Figures: the timeline of a1–a9; at each step, the selected activity is kept and skipped activities are crossed out.]
1- Select a1, as it has the earliest finish time. Now solve the subproblem in the interval 3 to 16.
2- The next activity with the earliest finish time is a2, but we will skip it, as its start time < 3.
3- Select a3, as it has the next earliest finish time and its start time ≥ 3. Now solve the subproblem in the interval 7 to 16.
4- The next activity with the earliest finish time is a4, but we will skip it, as its start time < 7.
5- The next activity with the earliest finish time is a5, but we will skip it, as its start time < 7.
6- Select a6, as it has the next earliest finish time and its start time ≥ 7.
And so on.
Other options for the greedy choice?
• In the previous example, we used the "earliest finish time" criterion.
  • We will prove shortly that it leads to an optimal solution.
• This does not mean that any greedy criterion will work. For example, think about the following criteria:
  • Choose the shortest activity first.
  • Choose the activity which has the minimum number of conflicts.
Both of them won't work.
Other Greedy Criteria
Choosing the shortest activity first?
• Choosing the shortest activity first will not necessarily lead to an optimal solution.
• Consider this counterexample:
[Figure: a counterexample in which a short activity overlaps two compatible longer activities; choosing the short one first selects 1 activity, while the optimal solution contains 2.]
Choosing the activity with the shortest duration will not lead to an optimal solution.
Other Greedy Criteria
Choosing the activity with the minimum number of conflicts?
• This will not necessarily lead to an optimal solution either. Counterexample:
[Figure: a counterexample in which the activity with the fewest conflicts is not part of any optimal solution.]
Choosing the activity with the minimum number of conflicts will not lead to an optimal solution.
Greedy Criteria
• This implies that not every criterion that just makes sense will work.
• We need to prove that the strategy will lead to an optimal solution.
• In greedy algorithms, we prove two properties:
  • The greedy-choice property
    • The greedy choice will lead to an optimal solution.
  • The optimal substructure property
    • Combining the greedy choice with the optimal solution of the subproblem will lead to an optimal solution of the problem.
Activity Selection – Greedy Solution
• As we now consider one subproblem only, the notation can be simplified.
• Let 𝑆𝑘 be the set of activities that start after activity 𝑎𝑘 finishes:
  𝑆𝑘 = {𝑎𝑖 ∈ 𝑆 ∶ 𝑠𝑖 ≥ 𝑓𝑘}
• Optimal substructure property:
  • If a1 is part of the optimal solution, then the optimal solution must contain the optimal solution of S1. (This can be proven by a simple contradiction.)
• Next, we will prove that the greedy choice will lead to an optimal solution.
Activity Selection – Greedy Solution
Proof of Optimality [CLRS]
Theorem:
If Sk is nonempty and am has the earliest finish time in Sk, then am is included in some optimal solution for Sk.
Proof:
• Let Ak be an optimal solution to Sk, i.e., Ak is a maximum-size subset of mutually compatible activities in Sk.
• Assume aj is the activity with the earliest finish time in Ak.
• If aj = am, we are done.
• If aj ≠ am:
  • Let A′k = Ak – {aj} ∪ {am}  // include am instead of aj
  • The activities in A′k must be non-overlapping, as the activities of the optimal solution Ak are non-overlapping, and fm ≤ fj.
  • Therefore |A′k| = |Ak| = the size of a maximum-size subset of mutually compatible activities in Sk, i.e., am is part of a maximum-size subset.
Activity Selection – Greedy Solution
Proof of Optimality (Intuition)
Proof illustration via an example:
Note: you cannot use examples to prove a claim. This is for illustration.
[Figure: the timeline of a1–a9, with the optimal solution A = {a2, a5, a7, a8} highlighted and a2 marked to be swapped with a1.]
S = {a1, …, a9}
Given an optimal solution A = {a2, a5, a7, a8}, show that we can construct an optimal solution using the activity with the earliest finish time, a1.
As a2 has the earliest finish time in A, let A′ = A – {a2} ∪ {a1} = {a1, a5, a7, a8}.
A′ = {a1, a5, a7, a8} is an optimal solution as well.
Note that because a1 finishes earliest, it was possible to swap it with a2 without creating overlaps with any other activity. This is generalized in the proof.
Activity Selection – Greedy Solution
Implementation
• Assuming the activities are already sorted by their finish times, the running time of the greedy solution is Θ(n).
• If the activities are not sorted, the cost is O(n lg n).
Activity Selection – Greedy Solution
Implementation
• Iterative implementation [CLRS]. The annotations from the slide:
  • Add the first activity a1 to A.
  • Recall that the activities are sorted by finish times.
  • In the loop, find the first activity that starts after f[k].
A sketch of this procedure is shown below.
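A minimal Python sketch in the spirit of GREEDY-ACTIVITY-SELECTOR from CLRS (my own 0-indexed rendering; the function name is mine):

```python
def greedy_activity_selector(s, f):
    """Earliest-finish-first selection. Assumes the activities are
    already sorted by finish time; returns the selected indices."""
    A = [0]      # greedy choice: a1, the first activity to finish
    k = 0        # index of the activity added most recently
    for m in range(1, len(s)):
        if s[m] >= f[k]:   # first activity that starts after f[k]
            A.append(m)
            k = m
    return A

# The running example (0-based indices, so 0 is a1, 2 is a3, ...):
print(greedy_activity_selector([1, 2, 4, 1, 5, 8, 9, 11, 13],
                               [3, 5, 7, 8, 9, 10, 11, 14, 16]))
# -> [0, 2, 5, 7], i.e., {a1, a3, a6, a8}
```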
Activity Selection – Greedy Solution
Implementation
• Recursive implementation [CLRS]:
  • Assume a dummy activity a0 with f0 = 0.
  • First call: Recursive-Activity-Selector(s, f, 0, n).
  • The procedure makes the greedy choice, leaving one subproblem to solve.
A sketch follows.
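A corresponding recursive sketch, following the structure of RECURSIVE-ACTIVITY-SELECTOR (again my own rendering; it places the dummy activity a0 with f0 = 0 at index 0):

```python
def recursive_activity_selector(s, f, k, n):
    """Greedy choice followed by one subproblem. s and f are 1-based
    with a dummy a0 at index 0 (f[0] = 0), sorted by finish time."""
    m = k + 1
    while m <= n and s[m] < f[k]:   # skip activities that start too early
        m += 1
    if m <= n:
        # greedy choice a_m; only the subproblem S_m remains
        return [m] + recursive_activity_selector(s, f, m, n)
    return []

s = [0, 1, 2, 4, 1, 5, 8, 9, 11, 13]   # dummy a0, then a1..a9
f = [0, 3, 5, 7, 8, 9, 10, 11, 14, 16]
print(recursive_activity_selector(s, f, 0, 9))  # -> [1, 3, 6, 8]
```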
The Greedy Strategy – Summary [CLRS]
• Express the optimization problem as a problem in which we can make a choice, then solve one subproblem.
• Show that there is always an optimal solution that includes the greedy choice.
• Show that combining the optimal solution of the subproblem with the greedy choice leads to an optimal solution.
The Greedy Strategy
• Greedy-choice property:
  • A globally optimal solution is reached by making locally optimal (greedy) choices (without considering solutions to subproblems).
• In dynamic programming, the situation was different. We make a choice after finding the solutions to subproblems.
  • This is why the solutions of DP can be built in a bottom-up manner.
• The greedy approach usually works in a top-down manner, as the subproblem is solved after making a choice.
The Greedy Strategy
• Optimal substructure:
  • Recall: A problem has optimal substructure if the optimal solution incorporates optimal solutions to subproblems.
• In the context of greedy algorithms, we show that combining
  • the greedy choice, and
  • the optimal solution to the subproblem that we have to solve after making the greedy choice
  will lead to an optimal solution to the problem.
Knapsack Problem
• To see the difference between greedy algorithms
and dynamic programming, we will revisit the
knapsack problem covered previously.

• We discussed dynamic programming solutions to 0-1 knapsack and unbounded knapsack last time.
• In this lecture, we will see a variant of this problem that can be solved by a greedy algorithm.
Knapsack Problem
Given a knapsack (bag) that can hold a weight of at most W, and
n items to pick from.
Each item has weight wi kg and is worth vi dollars.
How to choose items to put in the knapsack, such that the total
value of the items in the knapsack is maximized?

Different versions of this problem:
• Knapsack with repetition (Unbounded Knapsack)
  • There is no limit on the quantity of each item. An item can appear 0, 1 or more times.
• Knapsack without repetition (0-1 Knapsack)
  • Each item can appear at most once.
• Fractional Knapsack
  • We can take a fraction of an item.
Knapsack Example
Example [CLRS], with W = 50:

i | wi | vi
1 | 10 | 60
2 | 20 | 100
3 | 30 | 120

• For 0-1 knapsack, the optimal solution will be items 2 and 3.
  • The max total value will be 220.
• What if we are allowed to take fractions of items?
  • The optimal solution will be items 1 and 2, and 2/3 of item 3.
  • The max total value will be 240.
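As a quick sanity check of the numbers above, here is a tiny brute-force enumeration of the 0-1 case (illustrative only; it is exponential in n, unlike the DP from the previous lecture):

```python
from itertools import combinations

items = [(10, 60), (20, 100), (30, 120)]   # (wi, vi), with W = 50
best = max(sum(v for _, v in combo)
           for r in range(len(items) + 1)
           for combo in combinations(items, r)
           if sum(w for w, _ in combo) <= 50)
print(best)  # -> 220 (items 2 and 3)
```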
Fractional Knapsack
• Can be solved using a greedy strategy.
• Compute the value per kg of each item and sort the items accordingly.

i | wi | vi | vi / wi
1 | 10 | 60 | 6
2 | 20 | 100 | 5
3 | 30 | 120 | 4

[Figure: the knapsack after each step — Step 1: item 1; Step 2: items 1 and 2; Step 3: items 1, 2, and 2/3 of item 3.]

• Take as much as possible from the item with the highest value per kg, until its supply ends or the knapsack is full.
• If there is room in the knapsack, move to the item with the 2nd highest value per kg, and repeat. A sketch of this strategy follows.
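A minimal Python sketch of this greedy strategy, assuming positive item weights (the function name and the return format are my own):

```python
def fractional_knapsack(weights, values, W):
    """Sort by value per kg; take as much as possible of each item."""
    order = sorted(range(len(weights)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    remaining, total = W, 0.0
    fractions = [0.0] * len(weights)
    for i in order:
        if remaining == 0:
            break
        take = min(weights[i], remaining)   # as much as possible of item i
        fractions[i] = take / weights[i]
        total += values[i] * fractions[i]
        remaining -= take
    return total, fractions

# The example above: W = 50 takes all of items 1 and 2, plus 2/3 of item 3.
print(fractional_knapsack([10, 20, 30], [60, 100, 120], 50))
# -> (240.0, [1.0, 1.0, 0.666...])
```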
Fractional Knapsack – Correctness Proof Sketch [informal]
• Prove that the greedy choice can lead to an optimal solution, then prove the optimal substructure.
• We will discuss the greedy choice first.
• Assume we have an optimal solution with total value V, in which we did not include as much as possible from item 1 (the item that has the highest vi / wi).
[Figure: the knapsack holding the assumed optimal solution; item 1 has the highest value per kg, and some of it has not been included in the knapsack.]
• Consider first the case where items have distinct vi / wi.
Fractional Knapsack – Correctness Proof Sketch [informal]
• If we replace x kg of some item j in the optimal solution with x kg of item 1 (j ≠ 1):
[Figure: swap x kg of item j inside the knapsack with x kg of the leftover part of item 1.]
New value of the knapsack: V′ = V – x·(vj/wj) + x·(v1/w1).
As v1/w1 > vj/wj, we get V′ > V. This is a contradiction. All of item 1 must be in the optimal solution.
• If we don't assume distinct vi / wi, then the same swap shows that V′ ≥ V, since v1/w1 ≥ vj/wj. Including item 1 will never make the solution worse.
Fractional Knapsack – Correctness Proof Sketch [informal]
Optimal substructure [CLRS]:
• If the optimal solution for weight W contains (some of) item i, then if we remove x kg of item i, what remains in the knapsack is the optimal solution for weight W – x using the other n – 1 items and wi – x kg of item i.
• The optimal substructure can be proven by contradiction:
  • Suppose the optimal solution for weight W includes x kg of item i and some non-optimal solution for weight W – x.
  • Then the solution of the problem for weight W cannot be optimal, because if we made the subproblem solution better, we could use it to make the total value of the knapsack for weight W higher.
Fractional Knapsack – Correctness Proof Sketch [informal]
• Therefore, we can reach the optimal solution by combining the greedy choice and the optimal solution to the subproblem that we have to solve after making the greedy choice.
Greedy Strategy for 0-1 Knapsack?
• Note that the greedy approach won't work for the 0-1 knapsack.

i | wi | vi | vi / wi
1 | 10 | 60 | 6
2 | 20 | 100 | 5
3 | 30 | 120 | 4

• The greedy strategy based on vi / wi for the 0-1 knapsack in the above example will lead to items 1 and 2 only (total value 160), which is not optimal (the optimum is 220).
Huffman Codes

Data Compression
• Needed for many applications in practice
• Two categories:
• Lossless data compression
• Allows reconstructing the original data completely from the
compressed version without any loss of information
• Used when changes in the uncompressed data are not
tolerable, e.g., text, programs, etc.
• Lossy data compression
• Allows reconstructing an approximation of the original data.
• Used to compress audio, video and images.

Huffman Codes
• A technique for lossless data compression
• According to CLRS, it can achieve savings between
20% and 90%, depending on the characteristics of
the input data.
• Uses a greedy method to find an optimal way for
representing characters.

Designing a Binary Code
• How can characters be represented in binary?
• Fixed-length codes
• Each character is represented by a unique binary string
(codeword) of a fixed length.
• Example: ASCII, Unicode
• Variable-length codes
• The codewords representing the characters vary in their length.
• Can be utilized in compression, by assigning short codewords
to the characters that appear frequently.
Character | Fixed-length code | Variable-length code
A | 00 | 0
B | 01 | 111
C | 10 | 110
D | 11 | 10
Designing a Binary Code
• How to decode when using variable-length codes?
  • While encoding is straightforward, decoding might not yield a unique result if the code is not designed carefully.
• Example: Assume the codewords representing {A, B, C, D} are {1, 10, 110, 111}. How to decode the string 1110? Both AAB and AC are possible (ambiguity).
• To avoid this, we use prefix codes, in which no codeword is a prefix of any other codeword.
  • These are also known as prefix-free codes.
Prefix Codes
• No codeword can be a prefix of any other codeword.
• Example codewords: {0, 10, 110, 111}
• Given a string 001010111110, it's straightforward to decode it, one codeword at a time:
  001010111110 → 0 | 0 | 10 | 10 | 111 | 110
Prefix Codes
• Can be represented by a binary tree.
• Each path from the root to a leaf generates a codeword.
• Helps during the decoding process.

Character | Codeword
A | 0
B | 111
C | 110
D | 10

[Figure: the code tree — 0 goes left, 1 goes right; A is a leaf at depth 1, D at depth 2, C and B at depth 3.]

Example: 001010111110 decodes to AADDBC. A decoding sketch follows.
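A small decoding sketch that walks the code tree bit by bit (the tree-as-nested-dicts representation and the helper name are my own, not from the lecture):

```python
def decode(bits, code):
    """Decode a prefix-coded bit string given a char -> codeword map."""
    # Build the tree: internal nodes are dicts keyed by '0'/'1';
    # a leaf is the decoded character itself.
    root = {}
    for ch, word in code.items():
        node = root
        for b in word[:-1]:
            node = node.setdefault(b, {})
        node[word[-1]] = ch
    out, node = [], root
    for b in bits:
        node = node[b]
        if isinstance(node, str):   # reached a leaf: emit and restart
            out.append(node)
            node = root
    return "".join(out)

code = {"A": "0", "B": "111", "C": "110", "D": "10"}
print(decode("001010111110", code))  # -> "AADDBC"
```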
Prefix Codes
Given a prefix code tree T, the number of bits needed to encode a file can be calculated as:

B(T) = Σ (over c ∈ C) d_T(c) · c.freq

where d_T(c) is the length of the codeword representing c (its depth in T), and c.freq is the frequency of character c.

Goal: Find a prefix code that minimizes the above for a given file.
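For instance, B(T) can be computed directly from the codeword lengths. The snippet below uses the CLRS frequencies and the Huffman code derived later in this lecture, so treat the concrete numbers as a forward reference:

```python
def b_cost(code, freq):
    """B(T) = sum over characters c of d_T(c) * c.freq,
    where d_T(c) is just the codeword length of c."""
    return sum(len(code[c]) * freq[c] for c in freq)

freq = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5}
code = {"a": "0", "b": "101", "c": "100",
        "d": "111", "e": "1101", "f": "1100"}
print(b_cost(code, freq))  # -> 224, vs. 300 for a 3-bit fixed-length code
```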
Trees of Optimal Prefix Codes
• Binary trees corresponding to optimal prefix codes must be full binary trees (every internal node has two children).
• The number of leaves should be equal to the alphabet size |C| and the number of internal nodes should be |C| – 1.
• These properties can be proven formally.
• These are necessary conditions but not sufficient for optimality. We will use Huffman's algorithm to get an optimal solution.
[Figure: a code tree with an internal node that has only one child. For example, this cannot correspond to an optimal code. Why?]
Huffman Codes
• Huffman codes: Optimal prefix codes.
• To illustrate the algorithm, we will trace an example first.
• The following example is from CLRS:

Character | a | b | c | d | e | f
Freq. | 45 | 13 | 12 | 16 | 9 | 5

[Figures: the forest of trees after each merge step; in each merge, the first child hangs on the 0-branch and the second on the 1-branch.]

1- Initial forest, ordered by frequency: f:5, e:9, c:12, b:13, d:16, a:45.
2- Merge the two lowest-frequency nodes, f:5 and e:9, into a node 14. Forest: c:12, b:13, 14, d:16, a:45.
3- Merge c:12 and b:13 into a node 25. Forest: 14, d:16, 25, a:45.
4- Merge 14 and d:16 into a node 30. Forest: 25, 30, a:45.
5- Merge 25 and 30 into a node 55. Forest: a:45, 55.
6- Merge a:45 and 55 into the root, 100.

The resulting codewords: a = 0, c = 100, b = 101, d = 111, f = 1100, e = 1101.
Huffman Codes
• How to implement the previous algorithm
efficiently?
• Need a data structure that supports minimum extraction
and insertion.
• Use a priority queue.
• For simplicity, we will assume a binary min-heap is used for
implementing the priority queue.
• Advanced data structures could be used to achieve a better
cost. See discussion in CLRS. This is extracurricular.

Huffman Codes
• Pseudocode [CLRS]. The annotations from the slide:
  • n is the size of the alphabet; all characters are added to the queue.
  • Note: each character has an attribute freq.
  • In each iteration, get the two nodes with the lowest frequencies, and combine them.
• Running time: O(n lg n).
A sketch of the algorithm is shown below.
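A minimal Python sketch mirroring the CLRS pseudocode, using heapq as the binary min-heap (the tuple-based tree representation and the tie-breaking counter are my own choices; with ties broken differently, an equally optimal but different code may come out):

```python
import heapq
import itertools

def huffman(freqs):
    """Build a Huffman tree for a char -> frequency map. Leaves are
    characters; internal nodes are (left, right) tuples."""
    tie = itertools.count()   # prevents heapq from comparing trees
    heap = [(f, next(tie), ch) for ch, f in freqs.items()]
    heapq.heapify(heap)
    for _ in range(len(freqs) - 1):          # n - 1 merges
        f1, _, left = heapq.heappop(heap)    # two lowest frequencies
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    return heap[0][2]

def codewords(node, prefix=""):
    """Left edge = 0, right edge = 1."""
    if isinstance(node, str):
        return {node: prefix or "0"}
    left, right = node
    table = codewords(left, prefix + "0")
    table.update(codewords(right, prefix + "1"))
    return table

tree = huffman({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
print(codewords(tree))   # e.g. {'a': '0', 'c': '100', 'b': '101', ...}
```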
Huffman Codes – Correctness Proof
• Need to prove that the problem exhibits both the greedy-choice and optimal substructure properties.
