
CSC373

Week 2: Greedy Algorithms

Nisarg Shah



Recap
• Divide & Conquer
➢ Master theorem
➢ Counting inversions in 𝑂(𝑛 log 𝑛)
➢ Finding the closest pair of points in ℝ² in 𝑂(𝑛 log 𝑛)
➢ Fast integer multiplication in 𝑂(𝑛^(log₂ 3))
➢ Fast matrix multiplication in 𝑂(𝑛^(log₂ 7))
➢ Finding the 𝑘-th smallest element (in particular, the median) in 𝑂(𝑛)


Greedy Algorithms
• Greedy (also known as myopic) algorithm outline
➢ We want to find a solution 𝑥 that maximizes some
objective function 𝑓
➢ But the space of possible solutions 𝑥 is too large
➢ The solution 𝑥 is typically composed of several parts (e.g.
𝑥 may be a set, composed of its elements)
➢ Instead of directly computing 𝑥…
o Compute it one part at a time
o Select the next part “greedily” to get maximum immediate benefit
(this needs to be defined carefully for each problem)
o May not be optimal because there is no foresight
o But sometimes this can be optimal too!
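To make the outline concrete, here is a minimal Python sketch of the template (not from the slides; the names `greedy`, `key`, and `compatible` are illustrative). Each problem below instantiates the ordering and the compatibility check differently.

```python
# A sketch of the generic greedy template: `key` defines the "natural"
# order, and `compatible` checks feasibility of the next part.
def greedy(parts, key, compatible):
    solution = []
    for part in sorted(parts, key=key):   # consider parts in greedy order
        if compatible(solution, part):    # take it if it still fits
            solution.append(part)
    return solution
```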



Interval Scheduling
• Problem
➢ Job 𝑗 starts at time 𝑠𝑗 and finishes at time 𝑓𝑗
➢ Two jobs are compatible if they don’t overlap
➢ Goal: find maximum-size subset of mutually compatible jobs



Interval Scheduling
• Greedy template
➢ Consider jobs in some “natural” order
➢ Take each job if it’s compatible with the ones already
chosen

• What order?
➢ Earliest start time: ascending order of 𝑠𝑗
➢ Earliest finish time: ascending order of 𝑓𝑗

➢ Shortest interval: ascending order of 𝑓𝑗 − 𝑠𝑗

➢ Fewest conflicts: ascending order of 𝑐𝑗, where 𝑐𝑗 is the number of remaining jobs that conflict with 𝑗


Example
• Earliest start time: ascending order of 𝑠𝑗
• Earliest finish time: ascending order of 𝑓𝑗
• Shortest interval: ascending order of 𝑓𝑗 − 𝑠𝑗
• Fewest conflicts: ascending order of 𝑐𝑗 , where 𝑐𝑗 is the number of
remaining jobs that conflict with 𝑗



Interval Scheduling
• Does it work? There are counterexamples (shown in figures) for:
➢ earliest start time
➢ shortest interval
➢ fewest conflicts


Interval Scheduling
• Implementing greedy with earliest finish time (EFT)
➢ Sort jobs by finish time. Say 𝑓1 ≤ 𝑓2 ≤ ⋯ ≤ 𝑓𝑛
➢ When deciding whether job 𝑗 should be included, we
need to check whether it’s compatible with all previously
added jobs
o We only need to check if 𝑠𝑗 ≥ 𝑓𝑖∗, where 𝑖∗ is the last added job
o This is because for any job 𝑖 added before 𝑖∗, 𝑓𝑖 ≤ 𝑓𝑖∗
o So we can simply store and maintain the finish time of the last
added job

➢ Running time: 𝑂(𝑛 log 𝑛)
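As a sketch (assuming jobs are given as (start, finish) pairs; the function name is illustrative), the whole algorithm fits in a few lines of Python:

```python
def interval_scheduling(jobs):
    """Greedy with earliest finish time on a list of (s, f) pairs."""
    selected = []
    last_finish = float("-inf")    # finish time of the last added job
    for s, f in sorted(jobs, key=lambda job: job[1]):  # sort by finish time
        if s >= last_finish:       # compatible with all chosen jobs
            selected.append((s, f))
            last_finish = f
    return selected

# Example: interval_scheduling([(0, 10), (1, 3), (2, 5), (4, 7)])
# returns [(1, 3), (4, 7)]
```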



Interval Scheduling
• Optimality of greedy with EFT
➢ Suppose for contradiction that greedy is not optimal
➢ Say greedy selects jobs 𝑖1 , 𝑖2 , … , 𝑖𝑘 sorted by finish time
➢ Consider the optimal solution 𝑗1 , 𝑗2 , … , 𝑗𝑚 (also sorted by
finish time) which matches greedy for as long as possible
o That is, we want 𝑗1 = 𝑖1 , … , 𝑗𝑟 = 𝑖𝑟 for greatest possible 𝑟



Interval Scheduling
(Another standard proof method is induction.)

• Optimality of greedy with EFT


➢ Both 𝑖𝑟+1 and 𝑗𝑟+1 were compatible with the previous
selection (𝑖1 = 𝑗1 , … , 𝑖𝑟 = 𝑗𝑟 )
➢ Consider the solution 𝑖1 , 𝑖2 , … , 𝑖𝑟 , 𝑖𝑟+1 , 𝑗𝑟+2 , … , 𝑗𝑚
o It should still be feasible (since 𝑓𝑖𝑟+1 ≤ 𝑓𝑗𝑟+1 )
o It is still optimal
o And it matches with greedy for one more step (contradiction!)



Interval Partitioning
• Problem
➢ Job 𝑗 starts at time 𝑠𝑗 and finishes at time 𝑓𝑗
➢ Two jobs are compatible if they don’t overlap
➢ Goal: group jobs into fewest partitions such that jobs in
the same partition are compatible

• One idea
➢ Find the maximum compatible set using the previous
greedy EFT algorithm, call it one partition, recurse on the
remaining jobs.
➢ This doesn’t work (check for yourselves)



Interval Partitioning
• Think of scheduling lectures for various courses
into as few classrooms as possible

• This schedule uses 4 classrooms for scheduling 10 lectures


Interval Partitioning
• Think of scheduling lectures for various courses
into as few classrooms as possible

• This schedule uses 3 classrooms for scheduling 10 lectures


Interval Partitioning
• Let’s go back to the greedy template!
➢ Go through lectures in some “natural” order
➢ Assign each lecture to a compatible classroom (which?),
and create a new classroom if the lecture conflicts with
every existing classroom

• Order of lectures?
➢ Earliest start time: ascending order of 𝑠𝑗
➢ Earliest finish time: ascending order of 𝑓𝑗

➢ Shortest interval: ascending order of 𝑓𝑗 − 𝑠𝑗

➢ Fewest conflicts: ascending order of 𝑐𝑗, where 𝑐𝑗 is the number of remaining jobs that conflict with 𝑗

Interval Partitioning
• At least when you assign each lecture to an arbitrary feasible classroom, three of these heuristics do not work.

• The remaining one, earliest start time, works! (next slide)


Interval Partitioning

(Figure: the earliest-start-time greedy schedule on the example.)


Interval Partitioning
• Running time
➢ Key step: check if the next lecture can be scheduled at
some classroom
➢ Store classrooms in a priority queue
o key = finish time of its last lecture
➢ Is lecture 𝑗 compatible with some classroom?
o Same as “Is 𝑠𝑗 at least as large as the minimum key?”
o If yes: add lecture 𝑗 to classroom 𝑘 with minimum key, and
increase its key to 𝑓𝑗
o Otherwise: create a new classroom, add lecture 𝑗, set key to 𝑓𝑗
➢ 𝑂(𝑛) priority queue operations, 𝑂(𝑛 log 𝑛) time
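A minimal Python sketch of this algorithm, using heapq as the priority queue (representing lectures as (start, finish) pairs is an assumption, not from the slides):

```python
import heapq

def interval_partitioning(lectures):
    """Greedy with earliest start time on a list of (s, f) pairs.
    Returns a list of classrooms, each a list of compatible lectures."""
    classrooms = []   # classrooms[k] = lectures assigned to classroom k
    heap = []         # entries (finish time of last lecture in room k, k)
    for s, f in sorted(lectures):          # ascending start time
        if heap and heap[0][0] <= s:       # is s_j >= the minimum key?
            _, k = heapq.heappop(heap)     # reuse that classroom
            classrooms[k].append((s, f))
        else:                              # conflicts with every room
            k = len(classrooms)
            classrooms.append([(s, f)])
        heapq.heappush(heap, (f, k))       # (re)set the room's key to f_j
    return classrooms
```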



Interval Partitioning
• Proof of optimality (lower bound)
➢ # classrooms needed ≥ maximum “depth” at any point
o depth = number of lectures running at that time
➢ We now show that our greedy algorithm uses only this many classrooms!
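The depth itself is easy to compute with a sweep over start/end events; a sketch (assuming, as above, that back-to-back lectures are compatible):

```python
def depth(lectures):
    """Maximum number of lectures running at any single point in time."""
    events = []
    for s, f in lectures:
        events.append((s, 1))    # a lecture starts
        events.append((f, -1))   # a lecture ends
    events.sort()                # at equal times, ends (-1) sort first
    running = best = 0
    for _, delta in events:
        running += delta
        best = max(best, running)
    return best
```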



Interval Partitioning
• Proof of optimality (upper bound)
➢ Let 𝑑 = # classrooms used by greedy
➢ Classroom 𝑑 was opened because some lecture 𝑗 was incompatible with a lecture already scheduled in each of the other 𝑑 − 1 classrooms
➢ All these 𝑑 − 1 lectures end after 𝑠𝑗
➢ Since we sorted by start time, they all start at/before 𝑠𝑗
➢ So at time 𝑠𝑗, we have 𝑑 overlapping lectures (those 𝑑 − 1 plus lecture 𝑗 itself)
➢ Hence, depth ≥ 𝑑
➢ So every schedule uses ≥ 𝑑 classrooms.
➢ QED!


Interval Graphs
• Interval scheduling and interval partitioning can be
seen as graph problems

• Input
➢ Graph 𝐺 = (𝑉, 𝐸)
➢ Vertices 𝑉 = jobs/lectures
➢ Edge (𝑖, 𝑗) ∈ 𝐸 if jobs 𝑖 and 𝑗 are incompatible

• Interval scheduling = maximum independent set (MIS)
• Interval partitioning = graph colouring
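For illustration, a small sketch (names assumed) that builds this conflict graph from (start, finish) pairs:

```python
def conflict_graph(jobs):
    """Adjacency sets: edge (i, j) iff intervals i and j overlap."""
    n = len(jobs)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            (si, fi), (sj, fj) = jobs[i], jobs[j]
            if si < fj and sj < fi:   # the two intervals overlap
                adj[i].add(j)
                adj[j].add(i)
    return adj
```
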
Interval Graphs
• MIS and graph colouring are NP-hard for general
graphs
• But they’re efficiently solvable for interval graphs
➢ Interval graphs = graphs which can be obtained from
incompatibility of intervals
➢ In fact, this holds even when we are not given an interval
representation of the graph
• Can we extend this result further?
➢ Yes! Chordal graphs
o Every cycle with 4 or more vertices has a chord



Minimizing Lateness
• Problem
➢ We have a single machine
➢ Each job 𝑗 requires 𝑡𝑗 units of time and is due by time 𝑑𝑗

➢ If it’s scheduled to start at 𝑠𝑗 , it will finish at 𝑓𝑗 = 𝑠𝑗 + 𝑡𝑗

➢ Lateness: ℓ𝑗 = max(0, 𝑓𝑗 − 𝑑𝑗)

➢ Goal: minimize the maximum lateness, 𝐿 = max𝑗 ℓ𝑗

• Contrast with interval scheduling


➢ We can decide the start time
➢ All jobs must be scheduled on a single machine



Minimizing Lateness
• Example

(Figures: an input instance and one example schedule.)


Minimizing Lateness
• Let’s go back to greedy template
➢ Consider jobs one-by-one in some “natural” order
➢ Schedule jobs in this order (nothing special to do here,
since we have to schedule all jobs and there is only one
machine available)

• Natural orders?
➢ Shortest processing time first: ascending order of
processing time 𝑡𝑗
➢ Earliest deadline first: ascending order of due time 𝑑𝑗

➢ Smallest slack first: ascending order of 𝑑𝑗 − 𝑡𝑗



Minimizing Lateness
• Counterexamples

➢ Shortest processing time first: ascending order of processing time 𝑡𝑗
o E.g., jobs (𝑡1, 𝑑1) = (1, 100) and (𝑡2, 𝑑2) = (10, 10): this order yields maximum lateness 1, while scheduling job 2 first yields 0

➢ Smallest slack first: ascending order of 𝑑𝑗 − 𝑡𝑗
o E.g., jobs (𝑡1, 𝑑1) = (1, 2) and (𝑡2, 𝑑2) = (10, 10): the slacks are 1 and 0, so job 2 runs first and job 1 gets lateness 9, while the reverse order has maximum lateness 1


Minimizing Lateness
• By now, you should know what’s coming…

• We’ll prove that earliest deadline first works!
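Since all jobs must be scheduled back to back on one machine, earliest deadline first is essentially just a sort; a sketch (jobs as (t, d) pairs; names are illustrative):

```python
def edf_max_lateness(jobs):
    """Schedule jobs (t_j, d_j) in ascending order of deadline,
    starting at time 0, and return the maximum lateness L."""
    time, max_lateness = 0, 0
    for t, d in sorted(jobs, key=lambda job: job[1]):   # by deadline d_j
        time += t                                   # f_j = s_j + t_j
        max_lateness = max(max_lateness, time - d)  # l_j = max(0, f_j - d_j)
    return max_lateness
```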



Minimizing Lateness
• Observation 1
➢ There is an optimal schedule with no idle time



Minimizing Lateness
• Observation 2
➢ Earliest deadline first has no idle time

• Let us define an “inversion”
➢ A pair (𝑖, 𝑗) such that 𝑑𝑖 < 𝑑𝑗 but 𝑗 is scheduled before 𝑖

• Observation 3
➢ By definition, earliest deadline first has no inversions

• Observation 4
➢ If a schedule with no idle time has an inversion, it has a
pair of inverted jobs scheduled consecutively



Minimizing Lateness
• Claim
➢ Swapping adjacently scheduled inverted jobs doesn’t
increase lateness but reduces #inversions by one
• Proof
➢ Let ℓ and ℓ′ denote lateness before/after the swap
➢ Clearly, ℓ′𝑘 = ℓ𝑘 for all 𝑘 ≠ 𝑖, 𝑗
➢ Also, clearly, ℓ′𝑖 ≤ ℓ𝑖, since job 𝑖 is moved earlier



Minimizing Lateness
• Claim
➢ Swapping adjacently scheduled inverted jobs doesn’t
increase lateness but reduces #inversions by one
• Proof
➢ ℓ′𝑗 = 𝑓′𝑗 − 𝑑𝑗 = 𝑓𝑖 − 𝑑𝑗 ≤ 𝑓𝑖 − 𝑑𝑖 = ℓ𝑖 (using 𝑑𝑖 < 𝑑𝑗)
➢ 𝐿′ = max(ℓ′𝑖, ℓ′𝑗, max𝑘≠𝑖,𝑗 ℓ′𝑘) ≤ max(ℓ𝑖, ℓ𝑖, max𝑘≠𝑖,𝑗 ℓ𝑘) ≤ 𝐿



Minimizing Lateness
• Proof of optimality of earliest deadline first
➢ Suppose for contradiction that it’s not optimal

➢ Consider an optimal schedule 𝑆∗ which has the fewest inversions among all optimal schedules
o We can assume it has no idle time
o If 𝑆∗ has zero inversions, it’s exactly earliest deadline first
o So assume 𝑆∗ has at least one inversion
o Then it must have an adjacent inversion (𝑖, 𝑗)
o But swapping these jobs doesn’t increase lateness (so the new schedule stays optimal) and reduces the number of inversions by 1
o Contradiction, given that 𝑆∗ has the fewest inversions among all optimal schedules.
o QED!



Lossless Compression
• Problem
➢ We have a document that is written using 𝑛 distinct labels
➢ Naïve encoding: represent each label using 𝑘 = ⌈log₂ 𝑛⌉ bits
➢ If the document has length 𝑚, this uses 𝑚 ⋅ ⌈log₂ 𝑛⌉ bits
➢ Say for English documents with no punctuation etc., we have 𝑛 = 26, so we can use 5 bits per letter:
o 𝑎 = 00000
o 𝑏 = 00001
o 𝑐 = 00010
o 𝑑 = 00011
o…



Lossless Compression
• Is this optimal?
➢ What if 𝑎, 𝑒, 𝑟, 𝑠 are much more frequent in the
document than 𝑥, 𝑞, 𝑧?
➢ Can we assign shorter codes to more frequent letters?

• Say we assign…
➢ 𝑎 = 0, 𝑏 = 1, 𝑐 = 01, …
➢ See a problem?
o What if we observe the encoding ‘01’?
o Is it ‘ab’? Or is it ‘c’?



Lossless Compression
• To avoid conflicts, we need prefix-free encoding
➢ Map each label 𝑥 to a bit-string 𝑐(𝑥) such that for all
distinct labels 𝑥 and 𝑦, 𝑐(𝑥) is not a prefix of 𝑐 𝑦
➢ Then, while reading an encoded document, it’s impossible to run into a scenario where the bits of 𝑐(𝑥) form the beginning of 𝑐(𝑦)

➢ So we can read left to right, find the first point where the bits read so far form a valid codeword, decode that label, and continue
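A sketch of this left-to-right decoder (the dictionary-based representation of the code is an assumption, not from the slides):

```python
def decode(bits, code):
    """Decode a bit-string under a prefix-free code {symbol: codeword}."""
    inverse = {word: symbol for symbol, word in code.items()}
    decoded, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:   # first point it's a valid codeword
            decoded.append(inverse[current])
            current = ""
    assert current == "", "trailing bits do not form a codeword"
    return "".join(decoded)

# decode("0100", {"a": "0", "b": "10"}) returns "aba"
```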



Lossless Compression
• Formal problem
➢ Given 𝑛 symbols and their frequencies (𝑤1, … , 𝑤𝑛), find a prefix-free encoding assigning codeword lengths (ℓ1, … , ℓ𝑛) to the symbols which minimizes ∑𝑖 𝑤𝑖 ⋅ ℓ𝑖
o Note that ∑𝑖 𝑤𝑖 ⋅ ℓ𝑖 is the length of the compressed document

• Example
➢ (𝑤𝑎 , 𝑤𝑏 , 𝑤𝑐 , 𝑤𝑑 , 𝑤𝑒 , 𝑤𝑓 ) = (42,20,5,10,11,12)
➢ No need to remember the numbers ☺



Lossless Compression
• Observation: prefix-free encoding = tree

𝑎 → 0, 𝑒 → 100,
𝑓 → 101, 𝑐 → 1100,
𝑑 → 1101, 𝑏 → 111



Lossless Compression
• Huffman Coding
➢ Build a priority queue by adding 𝑥, 𝑤𝑥 for each symbol 𝑥
➢ While |queue| ≥ 2
o Take the two symbols with the lowest weight (𝑥, 𝑤𝑥 ) and (𝑦, 𝑤𝑦 )
o Merge them into one symbol with weight 𝑤𝑥 + 𝑤𝑦

• Let’s see this on the previous example
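Here is a runnable Python sketch along these lines (the tie-breaking counter and the dict-based code representation are implementation choices, not from the slides):

```python
import heapq

def huffman(weights):
    """weights: dict symbol -> frequency. Returns dict symbol -> codeword."""
    heap = [(w, i, {x: ""}) for i, (x, w) in enumerate(weights.items())]
    heapq.heapify(heap)           # the counter i breaks ties in weight
    counter = len(heap)
    while len(heap) >= 2:
        w1, _, c1 = heapq.heappop(heap)   # two lowest-weight "symbols"
        w2, _, c2 = heapq.heappop(heap)
        merged = {x: "0" + code for x, code in c1.items()}
        merged.update({x: "1" + code for x, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))  # weight w1 + w2
        counter += 1
    return heap[0][2]

# On the example, huffman({'a': 42, 'b': 20, 'c': 5, 'd': 10, 'e': 11, 'f': 12})
# yields a -> 0, e -> 100, f -> 101, c -> 1100, d -> 1101, b -> 111
```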



(Figures: the step-by-step merges of Huffman coding on the example, one merge per slide.)


Lossless Compression
• Final Outcome
𝑎 → 0, 𝑒 → 100,
𝑓 → 101, 𝑐 → 1100,
𝑑 → 1101, 𝑏 → 111



Lossless Compression
• Running time
➢ 𝑂(𝑛 log 𝑛)
➢ Can be made 𝑂(𝑛) if the labels are given to you sorted by
their frequencies
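One way to get the 𝑂(𝑛) bound, sketched below, is the classic two-queue trick: merged weights are produced in nondecreasing order, so a FIFO queue of merged nodes can replace the heap. As an illustrative simplification, this version returns only the total encoded length ∑𝑖 𝑤𝑖 ⋅ ℓ𝑖 rather than the codewords:

```python
from collections import deque

def huffman_length(sorted_weights):
    """O(n) given frequencies sorted ascending; returns sum of w_i * l_i."""
    leaves = deque(sorted_weights)   # original symbols, ascending weight
    merged = deque()                 # merged weights, created in order
    def pop_min():
        if not merged or (leaves and leaves[0] <= merged[0]):
            return leaves.popleft()
        return merged.popleft()
    total = 0
    while len(leaves) + len(merged) >= 2:
        a, b = pop_min(), pop_min()
        merged.append(a + b)
        total += a + b   # each merge adds (w_x + w_y) to the total length
    return total

# huffman_length([5, 10, 11, 12, 20, 42]) returns 231
```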

• Proof of optimality
➢ Induction on the number of symbols 𝑛
➢ Base case: For 𝑛 = 2, there are only two possible
encodings, both are optimal, assign 1 bit to each symbol
➢ Hypothesis: Assume it returns an optimal encoding for any set of 𝑛 − 1 symbols



Lossless Compression
• Proof of optimality
➢ Consider the case of 𝑛 symbols
➢ Lemma 1: If 𝑤𝑥 < 𝑤𝑦 , then ℓ𝑥 ≥ ℓ𝑦 in any optimal tree.
o Proof sketch: Otherwise, swapping 𝑥 and 𝑦 would strictly reduce
the overall length (exercise!).
➢ Lemma 2: There is an optimal tree 𝑇 in which the two
least frequent symbols are siblings.
o Proof sketch: First prove that both are assigned codewords of the maximum length. Then, if they’re not siblings, chop and rearrange the tree to make them siblings (exercise!).
➢ Now, we can compare the tree 𝐻 produced by Huffman
vs such an optimal tree 𝑇



Lossless Compression
• Proof of optimality
➢ Let 𝑥 and 𝑦 be the two least frequent symbols
➢ In Huffman, we combine them in the first step into one symbol “𝑥𝑦”
➢ Let 𝐻′ and 𝑇′ be the trees obtained from 𝐻 and 𝑇 by treating 𝑥𝑦 as one symbol with frequency 𝑤𝑥 + 𝑤𝑦
➢ Use the induction hypothesis: 𝐿𝑒𝑛𝑔𝑡ℎ(𝐻′) ≤ 𝐿𝑒𝑛𝑔𝑡ℎ(𝑇′)
➢ 𝐿𝑒𝑛𝑔𝑡ℎ(𝐻) = 𝐿𝑒𝑛𝑔𝑡ℎ(𝐻′) + (𝑤𝑥 + 𝑤𝑦) ⋅ 1
➢ 𝐿𝑒𝑛𝑔𝑡ℎ(𝑇) = 𝐿𝑒𝑛𝑔𝑡ℎ(𝑇′) + (𝑤𝑥 + 𝑤𝑦) ⋅ 1
➢ So 𝐿𝑒𝑛𝑔𝑡ℎ(𝐻) ≤ 𝐿𝑒𝑛𝑔𝑡ℎ(𝑇), i.e., Huffman is optimal
➢ QED!



Other Greedy Algorithms
• If you aren’t familiar with the following algorithms,
spend some time checking them out!
➢ Dijkstra’s shortest path algorithm
➢ Kruskal’s and Prim’s minimum spanning tree algorithms

