Flajolet-Martin Algorithm

Uploaded by

jessjohn2209

Don Bosco Institute of Technology, Mumbai

Computer Department

Module 4
Mining Data Streams

Faculty: Ms. Sana Shaikh


Quick Recap

● Counting Distinct Elements in a Stream, Count-Distinct Problem


Flajolet-Martin Algorithm

Learning Outcomes:
● Understand and explain how the Flajolet-Martin algorithm works
● Implement the FM algorithm
● Estimate the number of distinct elements in a data stream
● Apply the algorithm in various applications

Counting Distinct Elements in a Stream

● How many different elements have appeared in the stream?
● Suppose stream elements are chosen from some universal set
● Counting starts either from the beginning of the stream or from some known time in the past

● Constraints on data stream algorithms:
○ Elements in the stream are presented sequentially and only a single pass is allowed
○ The entire stream cannot be stored; only limited space is available to operate
○ An approximate answer is acceptable, provided the probability of error stays below 1
Counting Distinct Elements in a Stream

● Applications
○ A website gathering statistics on how many unique users it has seen in each given month
○ Google: counting distinct search queries

Counting Distinct Elements in a Stream

Problem: A data stream consists of elements chosen from a set of size n.


Maintain a count of the number of distinct elements seen so far.

Real Problem: What if we do not have space to store the complete set?
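
For contrast, the exact approach can be sketched as follows. This is a baseline for illustration, not part of the slides: the function name count_distinct_exact and the sample stream are assumptions. It remembers every distinct element, so its memory grows with the number of distinct elements, which is exactly what the space constraint rules out.

```python
def count_distinct_exact(stream):
    """Exact distinct count: remembers every distinct element seen so far."""
    seen = set()  # grows with the number of distinct elements
    for element in stream:
        seen.add(element)
    return len(seen)


# Illustrative stream (an assumption for this sketch).
sample_stream = [1, 4, 2, 1, 2, 4, 4, 4, 1, 2, 4, 1, 7]
print(count_distinct_exact(sample_stream))  # 4 distinct values: 1, 2, 4, 7
```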

Solution: The Flajolet-Martin Algorithm [FM Algorithm]


● It approximates the number of unique objects in a stream or a database in one pass
● If the stream contains n elements with m of them unique, this algorithm runs in O(n) time and needs O(log(m)) memory.
The Flajolet-Martin Algorithm

1. Pick a hash function h that maps each of the n elements to at least log2 n bits.
2. For each stream element a:
a. Calculate the hash value h(a)
b. Write h(a) in binary
c. Count the number of trailing zeros in the binary form. Let r(a) be this count.
3. Record the maximum number of trailing zeros in R,
i.e. R = max (r(a))
4. Estimate the number of distinct elements E,
i.e. E = 2^R
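
The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the lecturer's implementation: the SHA-256 based hash_value, the 32-bit width, and the rule that a hash value of 0 counts as a full-width run of zeros are all choices made for illustration.

```python
import hashlib

HASH_BITS = 32  # assumed hash width; must be at least log2(n) bits


def hash_value(element, bits=HASH_BITS):
    """Step 1: map an element to a `bits`-bit integer (illustrative choice of hash)."""
    digest = hashlib.sha256(str(element).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)


def trailing_zeros(value, bits=HASH_BITS):
    """Step 2c: count trailing zeros in the binary form of `value`."""
    if value == 0:
        return bits  # assumption: all-zero hash counts as `bits` trailing zeros
    # value & -value isolates the lowest set bit; its position is the tail length.
    return (value & -value).bit_length() - 1


def fm_estimate(stream, hash_fn=hash_value, bits=HASH_BITS):
    """Steps 2-4: one pass over the stream, keeping only the maximum tail length R."""
    max_tail = 0  # R
    for element in stream:
        max_tail = max(max_tail, trailing_zeros(hash_fn(element), bits))
    return 2 ** max_tail  # E = 2^R


print(fm_estimate(["a", "b", "a", "c", "b"]))
```

Only the single integer R is kept between elements, and repeated elements hash to the same value, so duplicates never change the estimate. The result is always a power of two, which is one reason a single hash function gives only a coarse estimate.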
The Flajolet-Martin Algorithm - Example

[Worked example slides; figures not recoverable. Steps shown: calculate the hash function h(x), binary-bit calculation, trailing zeros calculation, distinct elements.]

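
As background that is not taken from the slides, a short argument suggests why 2^R is a plausible estimate. It assumes that h spreads elements uniformly over bit strings, writes m for the number of distinct elements, and uses r for a candidate tail length.

```latex
\[
\Pr\bigl[r(a) \ge r\bigr] = 2^{-r},
\qquad
\Pr\bigl[\text{no distinct element has } r(a) \ge r\bigr]
  = \bigl(1 - 2^{-r}\bigr)^{m} \approx e^{-m/2^{r}} .
\]
```

When m is much larger than 2^r, the second probability is close to 0, so some element almost surely reaches tail length r. When m is much smaller than 2^r, it is close to 1, so such a tail is unlikely. The maximum tail length R therefore tends to sit near log2 m, which makes 2^R a rough estimate of m.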
The Flajolet-Martin Algorithm - Solve

Suppose our stream consists of the integers:
1, 4, 2, 1, 2, 4, 4, 4, 1, 2, 4, 1, 7.
Our hash function is h(x) = (x + 6) mod 32. Treat the result as a 5-bit binary integer.

Determine the tail length (number of trailing zeros) for each stream element and the resulting estimate of the number of distinct elements.
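
One way to check a hand-worked answer is the short script below. It is a sketch that reads the hash as (x + 6) mod 32 and treats a hash value of 0 as five trailing zeros; both readings are assumptions, and the names h, tail_length, and STREAM are introduced here for illustration.

```python
STREAM = [1, 4, 2, 1, 2, 4, 4, 4, 1, 2, 4, 1, 7]
BITS = 5  # results are treated as 5-bit binary integers


def h(x):
    """Hash function from the exercise, read as (x + 6) mod 32."""
    return (x + 6) % 32


def tail_length(value, bits=BITS):
    """Number of trailing zeros in the binary form of `value`."""
    if value == 0:
        return bits  # assumption: 00000 counts as `bits` trailing zeros
    count = 0
    while value % 2 == 0:
        value //= 2
        count += 1
    return count


max_tail = 0
# Repeated elements hash identically, so one table row per distinct value suffices.
for x in sorted(set(STREAM)):
    hashed = h(x)
    r = tail_length(hashed)
    max_tail = max(max_tail, r)
    print(f"x={x}  h(x)={hashed:2d}  binary={hashed:05b}  tail length={r}")

estimate = 2 ** max_tail
print("R =", max_tail, " estimate =", estimate)
```

Under these assumptions the tail lengths for 1, 2, 4, 7 are 0, 3, 1, 0, so R = 3 and the estimate is 2^3 = 8, while the stream actually contains 4 distinct values.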
The Flajolet-Martin Algorithm - Advantages

● Scalability: The algorithm scales to large datasets because it estimates the number of unique elements without storing the entire dataset in memory.
● Memory efficiency: The algorithm needs only a small amount of memory to estimate the number of unique elements. It achieves this by using hash functions and bit manipulation to keep a compact summary of the data.
● Speed: The algorithm suits real-time applications because it is computationally cheap: each element requires only a hash computation and a trailing-zero count.

The Flajolet-Martin Algorithm - Disadvantages

● Accuracy: The algorithm returns an estimate of the number of distinct items rather than an exact count. The accuracy depends on factors such as the number of hash functions used and the length of the binary hash values. Applications that require a precise count cannot rely on it.
● Sensitivity to the dataset: The accuracy is also influenced by the distribution and characteristics of the dataset. It tends to be more accurate on uniformly or randomly distributed data and less accurate on skewed data or data with specific patterns.
● Hash function selection: Both performance and accuracy depend on the hash functions used, so they must be chosen to balance accuracy against efficiency.
● Limited applicability: The algorithm is designed only for estimating the number of unique elements. It gives no information about which elements occurred or how often, so it cannot be used for other data analysis tasks.
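
The dependence on the number of hash functions can be illustrated with a commonly used combining strategy: run several hash functions in the same pass, average the estimates within small groups, and take the median of the group averages. The slides do not describe this strategy; it is shown here as a sketch, and the salted SHA-256 hash family, the group sizes, and all function names are illustrative assumptions.

```python
import hashlib
import statistics


def salted_hash(element, salt, bits=32):
    """One member of an assumed hash family: SHA-256 over 'salt:element'."""
    data = f"{salt}:{element}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % (1 << bits)


def trailing_zeros(value, bits=32):
    if value == 0:
        return bits  # assumption: all-zero hash counts as `bits` trailing zeros
    return (value & -value).bit_length() - 1


def fm_combined(stream, num_groups=5, group_size=4, bits=32):
    """Median of group means over num_groups * group_size hash functions."""
    k = num_groups * group_size
    max_tails = [0] * k  # one running maximum R per hash function
    for element in stream:  # still a single pass over the stream
        for salt in range(k):
            r = trailing_zeros(salted_hash(element, salt, bits), bits)
            if r > max_tails[salt]:
                max_tails[salt] = r
    estimates = [2 ** r for r in max_tails]
    group_means = [
        statistics.mean(estimates[g * group_size:(g + 1) * group_size])
        for g in range(num_groups)
    ]
    return statistics.median(group_means)


print(fm_combined(range(1000)))
```

Averaging within a group lets the result fall between powers of two, and the median across groups limits the effect of an occasional very large single estimate. The cost is k hash computations per element and k small counters instead of one.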
Test Yourself - Question No 2
Estimate the number of different elements appearing in a stream.
Stream: 4, 2, 5, 9, 1, 6, 3, 7
Hash function: h(x) = (3x + 1) mod 32

Test Yourself - Question No 1

Estimate the number of different elements appearing in a stream.
Stream: 4, 2, 5, 9, 1, 6, 3, 7
Hash function: h(x) = (x + 6) mod 32
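
To check answers to both Test Yourself questions, and to see how the choice of hash function affects the result, the two hash functions can be run over the same stream. This is a sketch that reads the hashes as (x + 6) mod 32 and (3x + 1) mod 32 with 5-bit results; the helper names are assumptions introduced for illustration.

```python
STREAM = [4, 2, 5, 9, 1, 6, 3, 7]


def tail_length(value, bits=5):
    """Trailing zeros of `value` viewed as a `bits`-bit integer."""
    if value == 0:
        return bits  # assumption: all-zero hash counts as `bits` trailing zeros
    return (value & -value).bit_length() - 1


def fm_estimate(stream, hash_fn):
    """Flajolet-Martin estimate 2^R for a single hash function."""
    return 2 ** max(tail_length(hash_fn(x)) for x in stream)


hash_functions = {
    "(x + 6) mod 32": lambda x: (x + 6) % 32,
    "(3x + 1) mod 32": lambda x: (3 * x + 1) % 32,
}

results = {name: fm_estimate(STREAM, fn) for name, fn in hash_functions.items()}
true_count = len(set(STREAM))
for name, estimate in results.items():
    print(f"h(x) = {name}: estimate {estimate}, true distinct count {true_count}")
```

Running it after working the questions by hand gives a check on the answers, and shows that the two hash functions do not yield the same estimate for this stream.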

Thank You
