
Flajolet

The document discusses the Flajolet-Martin algorithm for estimating cardinality and its limitations, particularly the impact of outliers on averaging results. It suggests using a combination of mean and median for better accuracy by employing multiple hash functions. Additionally, it introduces Markov Chains as a model for dynamic systems, explaining their components and providing examples of their application in real-world scenarios like customer arrival at a billing counter.

Uploaded by

MANGAL KALE
Copyright
© All Rights Reserved

UNIT II

Mathematical Foundations of Big Data


Limitations Flajolet Martin

Problem: for the stream 4, 2, 5, 9, 1, 6, 3, 7, estimate the
number of distinct elements using h(x) = (3x + 7) mod 32.
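The problem above can be worked through with a minimal Python sketch. It assumes the usual Flajolet-Martin procedure (hash each element, count the trailing zero bits, keep the maximum R, and estimate 2^R) and leaves out any correction factor. The function names are illustrative, not from the slides.

```python
def trailing_zeros(v: int) -> int:
    """Number of trailing zero bits in v (taken as 0 for v == 0 in this sketch)."""
    if v == 0:
        return 0
    return (v & -v).bit_length() - 1


def h(x: int) -> int:
    """Hash function from the problem statement."""
    return (3 * x + 7) % 32


def flajolet_martin(stream):
    """Return (R, estimate), where R is the longest tail of zeros seen."""
    r_max = max(trailing_zeros(h(x)) for x in stream)
    return r_max, 2 ** r_max


stream = [4, 2, 5, 9, 1, 6, 3, 7]
for x in stream:
    # element, hash value, 5-bit binary form, tail length
    print(x, h(x), format(h(x), "05b"), trailing_zeros(h(x)))

R, estimate = flajolet_martin(stream)
print("R =", R, "estimate =", estimate)  # R = 4, estimate = 16
```

Here h(3) = 16 = 10000 in binary gives the longest tail, R = 4, so the estimate is 2^4 = 16, while the stream actually holds 8 distinct elements.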
Limitations Flajolet Martin

A problem with the Flajolet–Martin algorithm in the above form
is that the results vary significantly.
Limitations Flajolet Martin

The hash of any element ends in a tail of at least i zeros with probability $2^{-i}$.
Limitations Flajolet Martin

A common solution is to run the algorithm multiple
times with k different hash functions.

Take the average of the results from the different runs to obtain a
single estimate of the cardinality.
Substream 0: R0=3
Substream 1: R1=4
Substream 2: R2=2
Substream 3: R3=3
Limitations Flajolet Martin

The problem with this is that averaging is very susceptible to
outliers (which are likely here).

A different idea is to use the median, which is less prone to being
influenced by outliers.

The problem with the median is that the results can only take the form
$2^R / \phi$, where R is an integer.
Divide into L groups with k hash fun

A common solution is to combine both the mean and the
median: create k ⋅ l hash functions and

split them into k distinct groups (each of size l).

Within each group, use the mean to aggregate the l
results.

Finally, take the median of the k group estimates as the final
estimate.
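The grouping scheme can be sketched as follows. This is a minimal illustration, not the slides' own code: the per-hash-function estimates are hypothetical values (one of them a deliberate outlier), and `median_of_means` is a name introduced here.

```python
from statistics import mean, median


def median_of_means(estimates, k):
    """Split the estimates into k equal groups, average each group, return the median."""
    if len(estimates) % k != 0:
        raise ValueError("number of estimates must be divisible by k")
    l = len(estimates) // k  # group size
    group_means = [mean(estimates[g * l:(g + 1) * l]) for g in range(k)]
    return median(group_means)


# Hypothetical estimates from k * l = 3 * 4 hash functions; 1024 is an outlier.
estimates = [8, 16, 4, 8,   8, 8, 16, 1024,   4, 8, 8, 16]

print(mean(estimates))                # 94, pulled up by the outlier
print(median_of_means(estimates, 3))  # 9, group means are 9, 264, 9
```

The single outlier dominates the plain mean, while the median over group means ignores the one group it contaminates and can still take values that are not powers of 2.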
Proof Flajolet

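The probability that the hash of a given element ends in a tail of at least i zeros is $2^{-i}$.

The probability that a given element does not end in i zeros is $1 - 2^{-i}$.

The probability that none of the m elements ends in i zeros is $(1 - 2^{-i})^m$.

Taking the complement, the probability that at least one element
ends in a tail of at least i zeros is $1 - (1 - 2^{-i})^m$.

For $2^i \gg m$, this probability tends to 0.
For $2^i \ll m$, this probability tends to 1.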
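The two limiting cases can be checked numerically. The sketch below simply evaluates the probability from the proof for a hypothetical m = 1000 distinct elements, assuming hash values are uniform and independent; the function name is illustrative.

```python
def prob_tail_at_least(i: int, m: int) -> float:
    """Probability that at least one of m distinct elements hashes to a tail of >= i zeros."""
    return 1.0 - (1.0 - 2.0 ** (-i)) ** m


m = 1000  # hypothetical number of distinct elements
for i in (2, 5, 10, 15, 20):
    print(i, round(prob_tail_at_least(i, m), 4))
```

For small i (2^i much smaller than m) the value is essentially 1, for large i (2^i much larger than m) it is essentially 0, and the switch happens around 2^i ≈ m, which is why 2^R is a reasonable estimate of m.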
Proof Flajolet

Our max element comes from the wrong region.

If we take $m = 2^R$, then it comes from the right region.
Bloom limitation, Application and formula

Reservoir sampling and approximate median

Markov Chain

Markov Chain: used to model the states of a dynamic
system.

It is a model used for predicting the future state of a dynamic
system.

What is a dynamic system?
Assumptions

We have a discrete time counter,

one which counts 1, 2, 3, ...

A time step may be a day, a year, a minute, or a second.

We start from some state, say X0.

Then after n time steps we may reach any one of several possible states.
How system behaves

At every time tick the system jumps to a random state.

The probabilities of the system moving to a certain state from a
given state are known.

Our interest is then to find the state of the system after n time
steps,

i.e. whether the system will be in state 1, state 2, ... or state n.
Real Example : Billing Counter

Consider example of billing counter at some supermarket

Assumptions to fit this example into Markov Model

Let's assume that customers arrive at time ticks 1, 2, 3, ...

where each time tick is a 5-minute interval.

A customer arrives with a fixed probability.

A customer leaves with some probability.

Further, we assume the queue size goes up to 10.
Probability implementation

At each time tick we flip a coin with bias p to decide the
customer arrival.

Likewise for customer departure.


Then after 5 time ticks, what will be the state of the system?


How do we find this?
Transition Diagram

Diagrammatically this will be as follows


TO FIND: P = [p0, p1, ..., p10]
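One way to compute the vector P for the billing counter is sketched below. The modelling details are assumptions made for illustration: the arrival and departure coins are independent, the queue grows by one when only the arrival coin succeeds, shrinks by one when only the departure coin succeeds, cannot drop below 0 or exceed 10, and the biases p = 0.5 and q = 0.3 are hypothetical.

```python
def transition_matrix(p: float, q: float, capacity: int = 10):
    """Build the (capacity + 1) x (capacity + 1) matrix; row = current queue length."""
    n = capacity + 1
    A = [[0.0] * n for _ in range(n)]
    for s in range(n):
        up = p * (1 - q) if s < capacity else 0.0  # arrival only
        down = q * (1 - p) if s > 0 else 0.0       # departure only
        if s < capacity:
            A[s][s + 1] = up
        if s > 0:
            A[s][s - 1] = down
        A[s][s] = 1.0 - up - down                  # queue length unchanged
    return A


def step(dist, A):
    """One time tick: row vector times transition matrix."""
    n = len(dist)
    return [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]


def distribution_after(dist, A, ticks):
    """Apply the transition matrix `ticks` times to the starting distribution."""
    for _ in range(ticks):
        dist = step(dist, A)
    return dist


A = transition_matrix(p=0.5, q=0.3)  # hypothetical arrival and departure biases
p0 = [1.0] + [0.0] * 10              # assumption: the queue starts empty
p5 = distribution_after(p0, A, 5)
print([round(x, 4) for x in p5])     # [p0, p1, ..., p10] after 5 ticks
```

Starting from an empty queue, no state above 5 can be reached in 5 ticks, so the last five entries of the result are zero.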
Components of Dynamic System

Set of possible states

Probabilities of going into any state from a given state

Initial state
How Markov Model helps : Simple Example

Consider a simple weather model:

Two states : sunny and rainy

GIVEN TRANSITION PROBABILITIES

Sunny to sunny 0.9

Sunny to rainy 0.1

Rainy to Rainy 0.5

Rainy to sunny 0.5
Transition Matrix Conventions
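Rows index the current state and columns index the next state:

                   Next state 1    Next state 2
Current state 1    P11             P12
Current state 2    P21             P22

Here Pij is the probability of moving from state i to state j, so each row sums to 1.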
Formula to calculate the state after n time steps

The following formula gives the state of the system after n
transitions:

$P_n = P_0 A^n$, where $P_0$ is the initial distribution in row form and A is the transition matrix
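The formula can be tried on the sunny/rainy weather model given earlier. The sketch below assumes the row-vector convention (rows are the current state) and, as an illustrative choice, that day 0 is sunny; the helper names are not from the slides.

```python
STATES = ["sunny", "rainy"]
A = [
    [0.9, 0.1],  # from sunny: to sunny, to rainy
    [0.5, 0.5],  # from rainy: to sunny, to rainy
]


def step(dist, A):
    """One transition: row vector times transition matrix."""
    n = len(dist)
    return [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]


def state_after(p0, A, n):
    """Compute P_n = P_0 * A^n by applying n single steps."""
    dist = list(p0)
    for _ in range(n):
        dist = step(dist, A)
    return dist


p0 = [1.0, 0.0]  # assumption: today is sunny
for n in range(4):
    print(n, dict(zip(STATES, (round(x, 4) for x in state_after(p0, A, n)))))
```

After two transitions the distribution is 0.86 sunny and 0.14 rainy, since 0.9 × 0.9 + 0.1 × 0.5 = 0.86.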
Example Pepsi and Coke
Example Harvard and Yale
Random Walk
Example stationary Distribution
Calculations
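$r_{\text{current},\text{end}} = \sum_{\text{intermediate}} r_{\text{current},\text{intermediate}} \cdot P_{\text{intermediate},\text{end}}$


$r_{11}$ = ?

Current c = 1

End e = 1

Intermediate i = all given states: 1, 2
In the formula, only the intermediate state varies.
So $r_{11} = r_{11} P_{11} + r_{12} P_{21}$, where the r values on the right are those from the previous step.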
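As a worked instance of this rule, assume states 1 and 2 are the sunny and rainy states of the weather example, and that r currently holds the one-step probabilities. The superscripts marking the number of steps are added here for clarity and are not part of the original notation.

```latex
\begin{aligned}
r^{(2)}_{11} &= r^{(1)}_{11} P_{11} + r^{(1)}_{12} P_{21} \\
             &= 0.9 \times 0.9 + 0.1 \times 0.5 \\
             &= 0.86
\end{aligned}
```

So the probability of being sunny two steps after a sunny day is 0.86 under those assumed values.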
Stationary distribution

A stationary distribution of a Markov chain is a probability
distribution that remains unchanged in the Markov chain as
time progresses.

Typically it is represented as a row vector $\pi$ whose entries are
probabilities summing to 1. Given the transition matrix A, the
following is satisfied, where $\pi$ is unique if the chain is irreducible:


$\pi = \pi A$
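A stationary distribution can be approximated by applying the transition matrix repeatedly until the distribution stops changing. The sketch below does this for the sunny/rainy matrix from the weather example; repeated multiplication is one possible method (it relies on the chain being irreducible and aperiodic), and the function names and tolerance are illustrative.

```python
def step(dist, A):
    """One transition: row vector times transition matrix."""
    n = len(dist)
    return [sum(dist[i] * A[i][j] for i in range(n)) for j in range(n)]


def stationary(A, tol=1e-12, max_iter=10_000):
    """Iterate dist <- dist * A from the uniform distribution until it settles."""
    n = len(A)
    dist = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = step(dist, A)
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt
        dist = nxt
    raise RuntimeError("did not converge")


A = [
    [0.9, 0.1],  # from sunny
    [0.5, 0.5],  # from rainy
]
pi = stationary(A)
print([round(x, 4) for x in pi])  # close to [5/6, 1/6]
```

The result agrees with solving $\pi = \pi A$ by hand: 0.1 × π_sunny = 0.5 × π_rainy together with π_sunny + π_rainy = 1 gives π = (5/6, 1/6).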
Example : Computer system
Variance and standard Deviation

Find the variance and standard deviation of the following data set:
70, 60, 72, 42, 86
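The exercise can be checked with Python's standard `statistics` module. The slide does not say whether the population or the sample form is intended, so this sketch prints both.

```python
from statistics import mean, pstdev, pvariance, stdev, variance

data = [70, 60, 72, 42, 86]

# Mean is 330 / 5 = 66; squared deviations sum to 16 + 36 + 36 + 576 + 400 = 1064.
print("mean:", mean(data))
print("population variance:", pvariance(data))  # 1064 / 5 = 212.8
print("population std dev:", pstdev(data))      # square root of 212.8
print("sample variance:", variance(data))       # 1064 / 4 = 266
print("sample std dev:", stdev(data))           # square root of 266
```

The population form divides by n = 5 and the sample form by n - 1 = 4, which is the only difference between the two pairs of results.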
