Homework Assignment 2: Total Points 80

This homework assignment covers four topics from chapter 4 of Mining Massive Datasets: sampling from data streams to answer queries approximately, calculating Bloom filter false-positive rates, estimating the number of distinct elements in a stream with the Flajolet-Martin (FM) algorithm, and estimating the number of 1s in a sliding window with the Datar-Gionis-Indyk-Motwani (DGIM) algorithm. The exercises involve choosing key attributes for a sample, calculating false-positive rates, determining tail lengths, and estimating counts of 1s. Solutions must show all steps and explain them briefly in the student's own words. Copied work receives no points.

Uploaded by

samriddhi

Homework Assignment 2

From the course book Mining Massive Datasets, chapter 4.

http://infolab.stanford.edu/~ullman/mmds/ch4.pdf
Use your own words. No cut-and-paste from the web or from classmates. Copying from other
sources will be detected and result in 0 points. If assignments by multiple students seem too
similar to be independent work, all students will receive 0 points.
It is great to work on solutions in groups! Just prepare the homework report in your own words.
Show all steps of your solution and calculations and explain them briefly in your own words. If
you write just the answer to the question without solution details, it will receive 0 points.
Long answers are not required, just brief and clear explanations for each step of your solution.

Total points 80
1. Sampling

(20 points) Exercise 4.2.1 : Suppose we have a stream of tuples with the schema Grades(university,
courseID, studentID, grade). Assume universities are unique, but a courseID is unique only within a
university (i.e., different universities may have different courses with the same ID, e.g., “CS101”) and
likewise, studentIDs are unique only within a university (different universities may assign the same ID to
different students). Suppose we want to answer certain queries approximately from a 1/15th sample of
the data. For each of the queries below, indicate how you would construct the sample. That is, tell what
the key attributes should be.

(a) Estimate the average number of courses per university.

(b) Estimate the fraction of students who have a GPA of 3.7 or more.

Explain briefly but clearly how you will create the sample and why.
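
The mechanics of key-based sampling can be sketched as follows. This is a minimal illustration, not a solution: the helper names `in_sample` and `sample_by_key`, the use of SHA-256, and the example tuples are all assumptions made for the sketch, and the key columns shown are an arbitrary choice. Deciding which attributes belong in the key for each query is the exercise itself.

```python
import hashlib

def in_sample(key, buckets=15):
    """Hash the key into `buckets` buckets and keep only bucket 0 (a 1/15th sample)."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets == 0

def sample_by_key(stream, key_columns, buckets=15):
    """Keep a tuple only if the hash of its key columns falls in bucket 0.

    All tuples that share a key are kept or dropped together.
    """
    return [t for t in stream
            if in_sample(tuple(t[i] for i in key_columns), buckets)]

# Hypothetical Grades tuples: (university, courseID, studentID, grade)
stream = [
    ("U1", "CS101", "s1", 3.9),
    ("U1", "CS101", "s2", 3.1),
    ("U2", "CS101", "s1", 2.8),
]

# Arbitrary key choice (columns 0 and 1), for illustration only.
kept = sample_by_key(stream, key_columns=(0, 1))
```

Because the decision depends only on the key, every tuple with the same key value lands on the same side of the sample, which is what makes per-key statistics computable from the sample.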

2. Bloom Filter

(15 points) Exercise 4.3.1 : For the situation of our running example (8 billion bits, 1 billion members of
the set S), calculate the false-positive rate if we use 3 and 5 hash functions. Briefly explain each step in
your solution.
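
As a sketch of the calculation involved, the function below evaluates the standard Bloom filter approximation (1 - e^(-km/n))^k for n bits, m set members, and k hash functions. The function name is an assumption, and the demonstration uses k = 1 and k = 2 rather than the values asked for in the exercise.

```python
import math

def bloom_false_positive_rate(n_bits, m_members, k_hashes):
    """Approximate false-positive rate of a Bloom filter.

    After inserting m members with k hash functions, the fraction of bits
    set to 1 is about 1 - e^(-k*m/n). A non-member is a false positive
    only if all k of its hashed positions are 1.
    """
    fraction_of_ones = 1.0 - math.exp(-k_hashes * m_members / n_bits)
    return fraction_of_ones ** k_hashes

N_BITS = 8_000_000_000      # running example: 8 billion bits
M_MEMBERS = 1_000_000_000   # running example: 1 billion members of S

for k in (1, 2):
    rate = bloom_false_positive_rate(N_BITS, M_MEMBERS, k)
    print(f"k={k}: false-positive rate is about {rate:.4f}")
```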

3. FM Algorithm

(10 points) Exercise 4.4.1 : Suppose our stream consists of the integers 3, 1, 4, 1, 5, 9, 2, 6, 5. Our hash
functions will all be of the form h(x) = ax + b mod 32 for some a and b. You should treat the result as a
5-bit binary integer. Determine the tail length for each stream element and the resulting estimate of the
number of distinct elements if the hash function is:

(a) h(x) = 2x + 1 mod 32.

(b) h(x) = 3x + 7 mod 32.

Briefly explain each step in your solution.
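
The procedure can be sketched in a few lines. The function names are assumptions, as is the convention that a hash value of 0 has tail length equal to the full bit width. The demonstration deliberately uses h(x) = 4x mod 32, which is not one of the hash functions assigned above.

```python
def tail_length(value, bits=5):
    """Number of trailing 0s in the `bits`-bit binary form of `value`."""
    if value == 0:
        return bits  # assumption: an all-zero value counts as `bits` trailing 0s
    # value & -value isolates the lowest set bit; its index is the tail length.
    return (value & -value).bit_length() - 1

def fm_estimate(stream, a, b, modulus=32, bits=5):
    """Return per-element tail lengths and the estimate 2^R, R = max tail length."""
    tails = [tail_length((a * x + b) % modulus, bits) for x in stream]
    return tails, 2 ** max(tails)

stream = [3, 1, 4, 1, 5, 9, 2, 6, 5]

# Illustrative hash h(x) = 4x mod 32 (not one of the assigned parts).
tails, estimate = fm_estimate(stream, a=4, b=0)
print(tails, estimate)
```

For this illustrative hash, the element 4 maps to 16 (binary 10000), which has the longest tail in the stream and therefore determines the estimate.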

4. DGIM Algorithm

1) (15 points) Exercise 4.6.1 : Suppose the window is as shown in Fig. 4.2. Estimate the number of 1’s in the
last k positions, for k =

(a) 5

(b) 15

In each case, how far off the correct value is your estimate?
2) (20 points) Study the example in Section 4.6.7, Extensions to the Counting of Ones. Use the technique of
Section 4.6.6 to estimate the total error. Show that if each c_i has fractional error at most e, then the
estimate of the true sum has error at most e.
Briefly explain each step in your solution.
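
Two pieces of machinery used in this section can be sketched as follows. The first function applies the DGIM query rule: add the sizes of all buckets whose most recent end lies in the last k positions, counting only half of the oldest such bucket. The second shows the bit-position decomposition behind the extension to sums of integers, using exact counts c_i. The bucket list is hypothetical and does not reproduce Fig. 4.2, and the function names and the (age, size) representation are assumptions made for the sketch.

```python
def dgim_estimate(buckets, k):
    """Estimate the number of 1s among the last k bits.

    buckets: list of (age_of_right_end, size), where age 1 is the most
    recent bit. The oldest bucket reaching into the window counts as half.
    """
    in_window = [(age, size) for age, size in buckets if age <= k]
    if not in_window:
        return 0.0
    oldest_size = max(in_window)[1]  # bucket with the largest age
    return sum(size for _, size in in_window) - oldest_size / 2

# Hypothetical buckets (NOT Fig. 4.2), most recent first.
buckets = [(2, 1), (4, 1), (6, 2), (10, 4), (17, 4)]
print(dgim_estimate(buckets, 7), dgim_estimate(buckets, 12))

def sum_from_bit_counts(values, m_bits):
    """Rebuild the sum of integers from c_i, the count of 1s in bit position i."""
    counts = [sum((v >> i) & 1 for v in values) for i in range(m_bits)]
    return counts, sum(c * 2 ** i for i, c in enumerate(counts))

counts, total = sum_from_bit_counts([5, 3, 7, 0, 12], m_bits=4)
print(counts, total)
```

In the streaming setting each c_i would itself be a DGIM estimate over the stream of bits at position i, which is where the per-count fractional error in the exercise comes from.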
