
ELCE 305
Data Structures and Algorithms

LECTURE 1
ANALYSIS OF ALGORITHMS

• Definition of algorithm
• Features of algorithms
• Definition of data structures
• Order of growth
• Analysis of Insertion sort
• Analysis of Merge sort
• Chapters 1 and 2
Definition of Algorithm (1)
• Origin:
• Latinization of the name of the mathematician Al-Khwarizmi.
• He wrote a book on the Hindu-Arabic numeral system, which was translated into Latin as "Algoritmi de numero Indorum" ("Algoritmi on the numbers of the Indians").
• "Algoritmi" was the translator's Latinization of Al-Khwarizmi's name.

• Definition:
• A sequence of computational steps that transform the input into the output.
• A procedural approach to solving "computational problems".
• The theoretical study of computer program performance and resource usage.
Definition of Algorithm (2)
• Example 1: Find the shortest path from NU to MIT.
• (Map figure with candidate routes not reproduced.)
• Is the shortest path always the fastest path?
• Consider a lifeguard running towards a drowning person: the straight line to the swimmer is shortest, but running farther along the beach (fast) and swimming less (slow) gets there sooner.
Definition of Algorithm (4)
• Example 2: Data compression
• The process of encoding information using fewer bits than the original representation.
• Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression.
• Examples: ZIP, PNG, GIF
• Lossy compression reduces bits by removing unnecessary or less important information.
• Examples: JPEG, MPEG, MP3
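
As a toy illustration of eliminating statistical redundancy, here is a minimal Python sketch of run-length encoding. This is far simpler than what ZIP actually does, but it shows the defining lossless property: the original is fully recoverable.

    def rle_encode(s: str) -> list:
        """Run-length encode: 'aaab' -> [('a', 3), ('b', 1)]."""
        pairs = []
        for ch in s:
            if pairs and pairs[-1][0] == ch:
                pairs[-1] = (ch, pairs[-1][1] + 1)  # extend the current run
            else:
                pairs.append((ch, 1))               # start a new run
        return pairs

    def rle_decode(pairs: list) -> str:
        return "".join(ch * count for ch, count in pairs)

    data = "aaaaaaaabbbbcc"
    assert rle_decode(rle_encode(data)) == data  # lossless: fully recoverable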

Definition of Algorithm (5)
• Example 3: Find the most relevant web page for a query (Google Search)
• PageRank (PR): a way of measuring the importance of website pages
• What is PR?
• PR is a "vote" by all other pages on the web about how important a page is.
• A link to a page counts as a vote of support.
• If there's no link, then there is no support.
• Assume page A has pages T1, …, Tn pointing to it.
• Damping factor d: taken to be 0.85.
• C(A): number of links going out of page A.
• PR equation: PR(A) = (1 − d) + d(PR(T1)/C(T1) + ⋯ + PR(Tn)/C(Tn))
• The PR of each page depends on the PR of the pages pointing to it.
• But we won't know what PR those pages have until the pages pointing to them have their PR calculated, and so on…
• We can, however, just go ahead and calculate a page's PR iteratively, without knowing the final PR values of the other pages.
Definition of Algorithm (6)
• Let's look at a simple example: two pages, A and B, each linking only to the other.
• Each page has 1 outgoing link (the outgoing count is 1), i.e. C(A) = 1 and C(B) = 1.
• Guess 1: We don't know what their PR should be to begin with, so let's take a guess at 1.0 and do some calculations:
• PR(A) = (1 − 0.85) + 0.85 × 1.0 = 1.0, and likewise PR(B) = 1.0. The values don't change. This is a lucky guess!
Definition of Algorithm (7)
• Guess 2: Let's start the guess at 0 instead and re-calculate:
• (Calculation figure not reproduced.) The numbers just keep going up. But will the numbers stop increasing when they get to 1.0? What if a calculation overshoots and goes above 1.0?
Definition of Algorithm (8)
• Guess 3: Let's start the guess at 40 each and do a few cycles:
• (Calculation figure not reproduced; the first cycle gives 34.15 for each page.) The numbers are heading down alright! It sure looks like the numbers will get to 1.0 and stop.
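
A minimal Python sketch of this iteration for the two-page example above (assumptions: pages A and B link only to each other, so C(A) = C(B) = 1, and d = 0.85):

    d = 0.85  # damping factor

    def iterate_pr(pr_a, pr_b, cycles=50):
        """Repeatedly apply PR(A) = (1 - d) + d * PR(B)/C(B) and the
        symmetric update for B; here C(A) = C(B) = 1."""
        for _ in range(cycles):
            # Update both pages simultaneously from the previous values.
            pr_a, pr_b = (1 - d) + d * pr_b, (1 - d) + d * pr_a
        return pr_a, pr_b

    print(iterate_pr(1.0, 1.0))    # guess 1: stays at (1.0, 1.0) -- a fixed point
    print(iterate_pr(0.0, 0.0))    # guess 2: climbs toward 1.0, never overshooting
    print(iterate_pr(40.0, 40.0))  # guess 3: first cycle gives 34.15, then decays toward 1.0

Whatever the starting guess, each cycle contracts the values toward the fixed point PR(A) = PR(B) = 1.0, which is why the calculation can proceed without knowing the final values of the other pages in advance.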
Definition of Algorithm (9)
• To be able to design an algorithm, we first need to learn the tools required to analyze it.
• Algorithm analysis is all about efficiency: time vs. space.
• Time complexity: developing a formula for predicting how fast an algorithm is, based on input size.
• Space complexity: developing a formula for predicting how much memory an algorithm requires, based on input size.
• Memory is extensible, time is not!
• Multiple algorithms can be designed to solve the same problem.
• The algorithm that provides the maximum efficiency should be used for solving the problem.
Example of Efficiency (1)
• We will learn two famous sorting algorithms, namely Insertion Sort (IS) and Merge Sort (MS).
• IS has a running time proportional to c₁n², whereas MS has a running time proportional to c₂n log₂ n. (Take my word for it!)
• Here, n is the input size and c₁, c₂ are constant factors.
• Constant factors do not depend on n. They have less of an impact on the running time.
• IS has a factor of n in its running time (n × n), while MS has a factor of log₂ n.
• Example: when n = 1000, log₂ 1000 ≈ 10; when n = 10⁶, log₂ 1000000 ≈ 20.
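
A quick check of these logarithms, plus a comparison of the two growth factors (a sketch; c₁ and c₂ are set to 1 here purely for illustration):

    import math

    for n in (1_000, 1_000_000):
        print(n, math.log2(n))  # about 10 and about 20, as stated above

    # Growth factors with c1 = c2 = 1: the n-factor of IS dwarfs the
    # log2(n)-factor of MS long before n reaches a million.
    n = 1_000_000
    print(n * n)             # 10**12 for IS
    print(n * math.log2(n))  # about 2 * 10**7 for MS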
Example of Efficiency (2)
• For small values of n, IS may run faster. But as n increases, MS's slower growth more than compensates for the difference between c₁ and c₂.
• A more concrete example:
• Sort an array of n = 10 million numbers. (How many megabytes? 10⁷ numbers × 8 bytes = 80 MB.)
• Each number is an 8-byte integer.

Computer A (fast):
• Running IS
• 10 GIPS (10¹⁰ instructions per second)
• Best machine-language IS coder
• Executes 2n² instructions

Computer B (slow):
• Running MS
• 10 MIPS (10⁷ instructions per second)
• Average high-level-language MS programmer
• Executes 50 n log₂ n instructions

Which one is faster?
Example of Efficiency (3)
• To sort 10 million numbers:
• Computer A takes (2 × (10⁷)² instructions) / (10¹⁰ instructions/sec) = 20,000 seconds ≈ 5.5 hrs
• Computer B takes (50 × 10⁷ × log₂ 10⁷ instructions) / (10⁷ instructions/sec) ≈ 1,163 seconds < 20 mins
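
The same arithmetic in a few lines of Python (a sketch; IPS = instructions per second):

    import math

    n = 10**7                                # ten million numbers
    time_a = 2 * n**2 / 10**10               # Computer A: 2n^2 instructions at 10 GIPS
    time_b = 50 * n * math.log2(n) / 10**7   # Computer B: 50 n log2(n) at 10 MIPS

    print(time_a / 3600)    # about 5.56 hours
    print(time_b / 60)      # about 19.4 minutes
    print(time_a / time_b)  # Computer B is roughly 17x faster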

• Using an algorithm whose running time grows more slowly, even with a poor compiler, Computer B runs more than 17 times faster than Computer A!
Insertion Sort (1)
• The sorting problem:
• Input: A sequence of n numbers ⟨a₁, a₂, …, aₙ⟩.
• Output: A permutation ⟨a′₁, a′₂, …, a′ₙ⟩ of the input such that a′₁ ≤ a′₂ ≤ ⋯ ≤ a′ₙ.
• Example:
• Input: 1, 9, 2, 5, 7, 9
• Output: 1, 2, 5, 7, 9, 9
• Insertion Sort trace (each row shows the array after one shift or insertion):
  Input: 4 3 1 6 2 5
         3 4 1 6 2 5
         3 1 4 6 2 5
         1 3 4 6 2 5
         1 3 4 6 2 5
         1 3 4 2 6 5
         1 3 2 4 6 5
         1 2 3 4 6 5
         1 2 3 4 6 5
         1 2 3 4 5 6
Insertion Sort (2)
• Pseudocode:
• Input: An array of n numbers, A[1 … n]
• (Pseudocode figure not reproduced; see the Python sketch below.)
• Insertion sort is an in-place sorting algorithm:
• It rearranges the numbers within the array A itself.
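
A Python sketch of the algorithm (0-indexed, unlike the 1-indexed pseudocode; the loop-invariant comment matches the property stated on the next slide):

    def insertion_sort(a):
        """Sort the list a in place."""
        for j in range(1, len(a)):
            # Loop invariant: a[0..j-1] holds the original first j
            # elements, but in sorted order.
            key = a[j]          # the element to insert
            i = j - 1
            # Shift larger elements of the sorted prefix one step right.
            while i >= 0 and a[i] > key:
                a[i + 1] = a[i]
                i -= 1
            a[i + 1] = key      # drop key into its proper place

    a = [4, 3, 1, 6, 2, 5]
    insertion_sort(a)
    print(a)  # [1, 2, 3, 4, 5, 6]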
Insertion Sort (3)
• Another example (figure not reproduced):
• j: current index
• A[j]: key indexed at j
• A[1..j−1]: currently sorted subarray
• A[j+1..n]: remaining subarray of unsorted numbers
• Loop invariant property:
• At the start of each iteration of the for loop, the subarray A[1..j−1] consists of the elements originally in A[1..j−1], but in sorted order.
Analysis of Insertion Sort (1)
• Predicting the resources that the algorithm requires:
• Memory, communication bandwidth, computer hardware
• But most often it is computational time that we want to measure.
• We use the random-access machine (RAM) model to analyze our algorithms.
• Assumptions:
• When a for or while loop exits the usual way (the test fails), the test executes one additional time beyond the loop body iterations.
• Every instruction i (i.e. every line in the pseudocode) takes some constant time, say cᵢ, to execute.
• The running time of an algorithm depends on the input and grows with the size of the input → describe running time as a function of input size.
Analysis of Insertion Sort (2)
• T(n): running time of insertion sort on an input of n values.
• To compute T(n), sum the products of the cost and time columns.
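
The cost/time table itself is an image and is not reproduced; the standard line-by-line costing for insertion sort (this follows CLRS Chapter 2, on which the lecture is based; tⱼ is the number of times the while-loop test executes for a given j, and c₃ belongs to a comment line of cost 0, so it is omitted):

    Pseudocode line                     cost   times executed
    for j = 2 to n                      c1     n
        key = A[j]                      c2     n − 1
        i = j − 1                       c4     n − 1
        while i > 0 and A[i] > key      c5     Σ_{j=2..n} t_j
            A[i+1] = A[i]               c6     Σ_{j=2..n} (t_j − 1)
            i = i − 1                   c7     Σ_{j=2..n} (t_j − 1)
        A[i+1] = key                    c8     n − 1

Summing cost × times over all lines gives T(n) = c₁n + (c₂ + c₄ + c₈)(n − 1) + c₅ Σtⱼ + (c₆ + c₇) Σ(tⱼ − 1).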
Analysis of Insertion Sort (3)
• Best-case running time (a linear function of n): the array is already sorted.
  T(n) = an + b
• Worst-case running time (a quadratic function of n): the array is in reverse order.
  T(n) = an² + bn + c
Analysis of Insertion Sort (4)
• We will mainly concentrate on the worst-case running time: the longest running time for any input of size n. Why?
• The worst-case running time of an algorithm gives us an upper bound on the running time for any input.
• It provides a guarantee that the algorithm will never take any longer.
• For some algorithms, the worst case occurs fairly often. In some applications, searches for absent information may be frequent.
• The "average case" is often roughly as bad as the worst case.
• For instance, in insertion sort, the average-case running time turns out to be a quadratic function of the input size, just like the worst-case running time.
Order of Growth (1)
• We expressed the worst-case running time as an² + bn + c for some constants a, b, and c that depend on the statement costs cᵢ.
• Since the lower-order terms are relatively insignificant for large values of n, we consider only the leading term of the formula, i.e. an².
• We also ignore the leading term's constant coefficient, since constant factors are less significant than the rate of growth in determining computational efficiency for large inputs.
• We say that insertion sort has a worst-case running time of Θ(n²) (pronounced "theta of n-squared").
Order of Growth (2)
• When is one algorithm considered to be more efficient than another?
• If the worst-case running time of one has a lower order of growth than the other.
• For large enough inputs, a Θ(n²) algorithm, for example, will run more quickly in the worst case than a Θ(n³) algorithm.
• Example:
• Express the function below in Θ-notation:
  n³/1000 − 100n² − 100n + 3
• Answer: Θ(n³). The leading term n³/1000 dominates for large n (e.g. 100n² divided by n³/1000 is 10⁵/n, which vanishes as n grows), and the constant coefficient 1/1000 is dropped.
Algorithm Design
• Incremental approach:
• Used in insertion sort.
• Having sorted the subarray A[1..j−1], we insert the single element A[j] into its proper place, yielding the sorted subarray A[1..j].
• Divide-and-conquer approach:
• Will be used in designing merge sort.
• Gives a much lower worst-case running time than insertion sort.
• Divide-and-conquer algorithms are naturally recursive:
• Divide the task into several subtasks that are similar to the original task but smaller in size.
• Conquer the subtasks by solving them recursively, and then
• Combine the solutions of the subtasks to create a solution to the original task.
Merge Sort (1)
• (Figure not reproduced: array A split into subarrays A[p..q] and A[q+1..r].)
• The procedure MERGE-SORT sorts the elements in the subarray A[p..r].
• If p ≥ r, the subarray has at most one element and is thus already sorted! Otherwise, q is computed to divide A[p..r] into two subarrays: A[p..q] with ⌈n/2⌉ elements and A[q+1..r] with ⌊n/2⌋ elements.
Merge Sort (2)
• (Pseudocode figure not reproduced.)
• Let's see an example for this part.
Merge Sort (3)
• Example: MERGE(A, 9, 12, 16) merges the sorted subarrays A[9..12] and A[13..16].
• (Step-by-step figure, panels (a)–(d), not reproduced.)
Merge Sort (4)
• (Step-by-step figure continued, panels (e)–(i), not reproduced.)
Merge Sort (5)
• Further examples (figure not reproduced).
Merge Sort (6)
• What is the running time of the MERGE procedure? (Line numbers below refer to the pseudocode figure, not reproduced.)
• Lines 1–3 and 8–11 take constant time.
• The for loops take Θ(n₁ + n₂) = Θ(n) time, where n₁ and n₂ are the lengths of the two subarrays.
• There are n iterations of the for loops in lines 12–17, each taking constant time.
• Total: Θ(n).
Analysis of Divide-and-Conquer
• When an algorithm contains a recursive call to itself, its running time is often described by a recurrence equation.
• We can then use mathematical tools to solve the recurrence and provide bounds on the performance of the algorithm.
• If the problem size is small enough, say n ≤ c for some constant c, the solution takes constant time Θ(1).
• Assume the problem is divided into a subproblems, each of which is 1/b the size of the original. (In merge sort, a = b = 2.)
• Let D(n) denote the time to divide the problem into subproblems and C(n) the time to combine the solutions of the subproblems into the solution to the original problem. Then:

  T(n) = Θ(1)                    if n ≤ c
  T(n) = aT(n/b) + D(n) + C(n)   otherwise
Analysis of Merge Sort (1)
• For the analysis, we assume that the input size is a power of 2, though the algorithm works correctly for other input sizes too.
• Each divide step then yields two subsequences of size exactly n/2.
• Merge sort on just one element takes constant time.
• What if n > 1?
Analysis of Merge Sort (2)
• Let us break the running time down as follows:
• Divide: Compute the middle of the array, which takes constant time. Thus, D(n) = Θ(1).
• Conquer: Recursively solve two subproblems, each of size n/2. This contributes 2T(n/2) to the running time.
• Combine: We already know that the MERGE procedure on n elements takes Θ(n) time. So, C(n) = Θ(n).
• Adding them all up gives:

  T(n) = Θ(1)                     if n = 1
  T(n) = 2T(n/2) + Θ(n) + Θ(1)    if n > 1

  which simplifies to:

  T(n) = Θ(1)              if n = 1
  T(n) = 2T(n/2) + Θ(n)    if n > 1
Analysis of Merge Sort (3)
• The solution to the recurrence equation is T(n) = Θ(n log₂ n). How?
• Let us rewrite the equation, replacing Θ(1) by a representative constant c and Θ(n) by cn:

  T(n) = c              if n = 1
  T(n) = 2T(n/2) + cn   if n > 1

  where the constant c represents the time required to solve a problem of size 1.
• We again assume that n is an exact power of 2.
• We now construct the recursion tree of the recurrence equation above.
• Then we add the costs across each level of the tree to compute the total cost T(n).
Analysis of Merge Sort (4)
• (Recursion tree figure, panels (a)–(d), not reproduced. It shows the cost at each level: level 0 at the top, then level 1, level 2, and so on down to level log₂ n.)
Analysis of Merge Sort (5)
• The top level (level 0) has a total cost of cn.
• The next level (level 1) has a total cost of c(n/2) + c(n/2) = cn.
• The next level (level 2) has a total cost of c(n/4) + c(n/4) + c(n/4) + c(n/4) = cn.
• In general, level i has 2ⁱ nodes, each having a cost of c(n/2ⁱ). Thus, level i has a total cost of 2ⁱ · c(n/2ⁱ) = cn.
• The bottom level has n nodes, each contributing a cost of c, giving a total of c + c + ⋯ + c = cn.
• The total number of levels of the recursion tree is log₂ n + 1. (Prove this using induction.)
• There are log₂ n + 1 levels, each costing cn. Thus, the total cost is:
  T(n) = cn(log₂ n + 1) = cn log₂ n + cn = Θ(n log₂ n)
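
A quick numerical sanity check of this closed form (a sketch with c = 1; it evaluates the recurrence directly for powers of 2):

    import math

    def t(n, c=1.0):
        """T(n) = 2*T(n/2) + c*n with T(1) = c, for n a power of 2."""
        if n == 1:
            return c
        return 2 * t(n // 2, c) + c * n

    for n in (2, 8, 1024):
        assert t(n) == n * math.log2(n) + n  # equals c*n*log2(n) + c*n exactly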
