Flajolet-Martin Algorithm
Computer Department
Module 4
Mining Data Streams
Learning Outcomes:
● Understand and explain how the Flajolet-Martin algorithm works
● Implement the FM algorithm
● Estimate the number of distinct elements in any data stream
● Apply the algorithm in various applications
Counting Distinct Elements in a Stream
● Applications
○ A website gathering statistics on how many unique users it has seen in each month
○ Google: counting distinct search queries received in a given period
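For contrast, the straightforward exact approach simply remembers every distinct element it has seen. The minimal Python sketch below (the function name and sample user IDs are illustrative, not from the slides) shows why this becomes a problem: the set grows with the number of distinct users, so memory is proportional to the answer itself.

```python
def exact_distinct_count(stream):
    """Exact count of distinct elements; memory grows with the number of distinct items."""
    seen = set()
    for element in stream:
        seen.add(element)  # every distinct element is stored
    return len(seen)


# Hypothetical stream of user IDs for one month
visits = ["alice", "bob", "alice", "carol", "bob", "alice"]
print(exact_distinct_count(visits))  # prints 3
```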
Real problem: what if we do not have enough memory to store the complete set of elements seen so far?
● If the stream contains n elements, m of them distinct, the Flajolet-Martin algorithm estimates m in O(n) time using only O(log m) memory.
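In the standard formulation, each element is hashed to a bit string, its tail length (the number of trailing zeros) is computed, the maximum tail length R seen so far is tracked, and 2^R is reported as the estimate. The Python class below is a minimal sketch of that idea, assuming integer elements and a hypothetical linear hash h(x) = (a·x + b) mod 2^k with illustrative defaults a = 3, b = 1, k = 32; practical implementations use hash functions whose outputs look uniformly random. Note that the only state retained between elements is the single integer R.

```python
class FlajoletMartin:
    """Single-hash Flajolet-Martin estimator (illustrative sketch).

    Assumes integer elements and a hypothetical linear hash
    h(x) = (a*x + b) mod 2**num_bits.
    """

    def __init__(self, a=3, b=1, num_bits=32):
        self.a = a
        self.b = b
        self.num_bits = num_bits
        self.max_tail = 0  # R: the only state kept between elements

    def _hash(self, x):
        return (self.a * x + self.b) % (2 ** self.num_bits)

    def _tail_length(self, value):
        # Count trailing 0 bits; a hash of 0 is treated as num_bits zeros
        # (one possible convention).
        if value == 0:
            return self.num_bits
        return (value & -value).bit_length() - 1

    def add(self, x):
        """Process one stream element and update R."""
        self.max_tail = max(self.max_tail, self._tail_length(self._hash(x)))

    def estimate(self):
        """Estimated number of distinct elements seen so far: 2**R."""
        return 2 ** self.max_tail


fm = FlajoletMartin()
for x in [1, 2, 3, 2, 1, 4]:  # 4 distinct values, some repeated
    fm.add(x)
print(fm.estimate())  # prints 4
```

Repeated elements hash to the same value, so they never change R; the estimate depends only on which distinct values appeared.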
The Flajolet-Martin Algorithm
The Flajolet-Martin Algorithm - Example
Binary-bit Calculation
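The binary-bit step converts each hash value to a fixed-width bit string and counts its trailing zeros (the tail length). A small Python sketch, assuming 5-bit hash values (as produced by a hash taken mod 32); the helper names are illustrative:

```python
def to_binary(value, num_bits):
    """Fixed-width binary string of a hash value, e.g. 12 -> '01100' for 5 bits."""
    return format(value, f"0{num_bits}b")


def tail_length(bits):
    """Number of trailing '0' characters; an all-zero string counts every bit."""
    return len(bits) - len(bits.rstrip("0"))


for h in [12, 7, 16]:
    bits = to_binary(h, 5)
    print(h, bits, tail_length(bits))
```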
Distinct Elements
The Flajolet-Martin Algorithm - Solve
Determine the tail length for each stream element and the resulting
estimate of the number of distinct elements.
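This kind of exercise can be checked with a small helper that tabulates each element's hash value, binary form, and tail length, then reports R and the estimate 2^R. The function name `fm_table`, the stream, and the hash h(x) = (5x + 3) mod 16 in the example call are hypothetical placeholders, not the ones used on the slide.

```python
def fm_table(stream, hash_fn, num_bits):
    """Return per-element rows (x, h(x), bits, tail), R, and the FM estimate 2**R."""
    rows = []
    max_tail = 0
    for x in stream:
        h = hash_fn(x) % (2 ** num_bits)
        bits = format(h, f"0{num_bits}b")
        tail = len(bits) - len(bits.rstrip("0"))  # all-zero hash -> num_bits
        rows.append((x, h, bits, tail))
        max_tail = max(max_tail, tail)
    return rows, max_tail, 2 ** max_tail


# Placeholder stream and hash function; substitute those from the exercise.
rows, R, estimate = fm_table([3, 1, 4, 1, 5, 2, 6], lambda x: 5 * x + 3, num_bits=4)
for x, h, bits, tail in rows:
    print(f"x={x}  h(x)={h:2d}  binary={bits}  tail={tail}")
print(f"R = {R}, estimate = 2^R = {estimate}")
```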
The Flajolet-Martin Algorithm - Advantages
The Flajolet-Martin Algorithm - Disadvantages
● Accuracy: Instead of an exact result, this algorithm provides an estimate of the number of distinct items. Its accuracy depends on factors such as the number of hash functions used and the length of the binary string representation. For applications that require a precise count, it is not accurate enough.
● Sensitive to dataset: Because repeated elements always hash to the same value, the estimate depends only on the set of distinct elements, not on how often each one occurs. However, when the hash function does not spread values uniformly, datasets with specific patterns (for example, evenly spaced integers under a simple linear hash) can produce hash values with unusually many or few trailing zeros, which degrades accuracy.
● Hash function selection: The performance and accuracy of this algorithm depend on the hash functions used, which should map elements to values that behave as if uniformly random. Selecting appropriate hash functions is important for balancing accuracy and efficiency.
● Limited applicability: The Flajolet-Martin algorithm is designed specifically to estimate the number of distinct elements and is not suited to other data analysis tasks. It provides no information about which elements occurred or how often each one occurred.
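The accuracy point above notes that results depend on the number of hash functions. One common way to use many hash functions is to split their estimates into small groups, average within each group, and take the median of the group averages: the median limits the effect of a single very large 2^R, while averaging smooths estimates that would otherwise always be powers of two. The Python sketch below assumes a hypothetical hash family built from SHA-256 with a per-function seed, which trades speed for well-spread hash values (the hash-selection trade-off above); the group counts are illustrative.

```python
import hashlib
import statistics


def seeded_hash(x, seed):
    """Hypothetical hash family: first 64 bits of SHA-256("seed:x")."""
    digest = hashlib.sha256(f"{seed}:{x}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def tail_length(value, num_bits=64):
    """Trailing zero bits of value; an all-zero hash counts as num_bits."""
    if value == 0:
        return num_bits
    return (value & -value).bit_length() - 1


def fm_combined(stream, num_groups=5, group_size=4):
    """Combine many FM estimates: average within groups, then take the median."""
    num_hashes = num_groups * group_size
    max_tails = [0] * num_hashes  # one R per hash function
    for x in stream:
        for i in range(num_hashes):
            max_tails[i] = max(max_tails[i], tail_length(seeded_hash(x, i)))
    estimates = [2 ** r for r in max_tails]
    group_means = [
        statistics.mean(estimates[g * group_size:(g + 1) * group_size])
        for g in range(num_groups)
    ]
    return statistics.median(group_means)


stream = list(range(1000)) * 3  # 1000 distinct values, each repeated 3 times
print(fm_combined(stream))
```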
Test Yourself - Question No 2
Estimate the number of distinct elements appearing in the following stream.
Stream: 4, 2, 5, 9, 1, 6, 3, 7
Hash function: h(x) = (3x + 1) mod 32
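To check an answer, the calculation can be scripted. A minimal Python sketch, assuming the hash is read as h(x) = (3x + 1) mod 32 and that hash values are written as 5-bit strings (enough for values below 32):

```python
# Stream and hash function from Question No 2.
stream = [4, 2, 5, 9, 1, 6, 3, 7]
NUM_BITS = 5  # values mod 32 fit in 5 bits


def h(x):
    return (3 * x + 1) % 32


def tail_length(value):
    """Trailing zeros in the 5-bit representation of value."""
    bits = format(value, f"0{NUM_BITS}b")
    return len(bits) - len(bits.rstrip("0"))


R = 0
for x in stream:
    value = h(x)
    r = tail_length(value)
    R = max(R, r)
    print(f"x={x}  h(x)={value:2d}  binary={format(value, '05b')}  tail={r}")

print(f"R = {R}, estimated distinct elements = 2^R = {2 ** R}")
```

Compare the printed estimate with the true count: all 8 elements in this stream are distinct.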
Test Yourself - Question No 1
Thank You