Flajolet

The document discusses the Flajolet-Martin algorithm for estimating cardinality and its limitations, particularly the impact of outliers on averaging results. It suggests using a combination of mean and median for better accuracy by employing multiple hash functions. Additionally, it introduces Markov Chains as a model for dynamic systems, explaining their components and providing examples of their application in real-world scenarios like customer arrival at a billing counter.

Uploaded by

MANGAL KALE


UNIT II

Mathematical Foundations of Big Data

Limitations of Flajolet-Martin

Problem: estimate the number of distinct elements in the stream 4, 2, 5, 9, 1, 6, 3, 7 using the hash function h(x) = (3x + 7) mod 32.
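As a worked illustration of this problem, the following Python sketch hashes each stream element with h(x) = (3x + 7) mod 32, counts the trailing zeros of each hash value, and reports 2^R for the largest count R. The function names are illustrative and not taken from the slides, and treating a hash value of 0 as having tail length 0 is an assumption (conventions differ).

```python
def trailing_zeros(value: int) -> int:
    """Number of trailing zero bits in value (assumed 0 when value is 0)."""
    if value == 0:
        return 0
    count = 0
    while value % 2 == 0:
        value //= 2
        count += 1
    return count


def flajolet_martin(stream, hash_fn):
    """Return (R, 2**R), where R is the longest tail of zeros seen."""
    max_tail = 0
    for element in stream:
        max_tail = max(max_tail, trailing_zeros(hash_fn(element)))
    return max_tail, 2 ** max_tail


stream = [4, 2, 5, 9, 1, 6, 3, 7]
max_tail, estimate = flajolet_martin(stream, lambda x: (3 * x + 7) % 32)
print(max_tail, estimate)  # 4 16
```

Here the element 3 hashes to 16 (binary 10000), so R = 4 and the estimate is 16 although the stream holds only 8 distinct values. A single hash value can dominate the result.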

A problem with the Flajolet–Martin algorithm in the above form
is that the results vary significantly.

A common solution is to run the algorithm multiple times with k different hash functions,

then take the average of the results from the different runs to obtain a single estimate of the cardinality.
Substream 0: R0=3
Substream 1: R1=4
Substream 2: R2=2
Substream 3: R3=3

The problem with this is that averaging is very susceptible to
outliers (which are likely here).

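A different idea is to use the median, which is less prone to being influenced by outliers.

The problem with the median is that the results can only take the form $2^R / \phi$, where R is an integer and $\phi \approx 0.77351$ is the Flajolet-Martin correction factor.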
Divide into k groups with l hash functions each

A common solution is to combine both the mean and the median: create k ⋅ l hash functions

and split them into k distinct groups (each of size l).

Within each group, use the mean to aggregate the l results.

Finally, take the median of the k group estimates as the final estimate.
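The grouping step can be sketched in Python as follows. This is a minimal illustration, assuming the individual estimates have already been computed and are listed group by group; the function name `median_of_means` and the sample values are hypothetical.

```python
from statistics import mean, median


def median_of_means(estimates, group_size):
    """Average each consecutive group of group_size estimates,
    then return the median of the group averages."""
    if group_size <= 0 or len(estimates) % group_size != 0:
        raise ValueError("estimates must split evenly into groups")
    group_means = [
        mean(estimates[start:start + group_size])
        for start in range(0, len(estimates), group_size)
    ]
    return median(group_means)


# Hypothetical estimates 2**R from k * l = 3 * 4 = 12 hash functions;
# the value 1024 plays the role of an outlier.
estimates = [8, 16, 8, 16, 8, 4, 16, 8, 16, 1024, 8, 16]
combined = median_of_means(estimates, group_size=4)
print(combined)
```

With these sample values the group means are 12, 9 and 266, so the combined estimate is 12, while a plain mean over all twelve values would be about 95.7 because of the single outlier.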
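Proof: Flajolet-Martin

The probability that the hash of a given element has a tail of at least i zeros is $2^{-i}$.

The probability that a given element does not end in i zeros is $1 - 2^{-i}$.

The probability that none of the m distinct elements ends in i zeros is $(1 - 2^{-i})^m$.

Taking the complement, the probability that at least one element has a tail of at least i zeros is
$1 - (1 - 2^{-i})^m$.

For $2^i \gg m$, this probability tends to 0.
For $2^i \ll m$, this probability tends to 1.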
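To see the two limits numerically, the short Python sketch below evaluates 1 - (1 - 2^(-i))^m for a fixed m and several tail lengths i. The value m = 1000 is a hypothetical choice, and the formula assumes hash values are uniform and independent.

```python
def prob_tail_at_least(i: int, m: int) -> float:
    """P(at least one of m distinct elements hashes to a tail of >= i zeros),
    assuming hash values are uniform and independent."""
    return 1.0 - (1.0 - 2.0 ** (-i)) ** m


m = 1000  # hypothetical number of distinct elements
for i in (2, 5, 10, 15, 20):
    print(i, 2 ** i, round(prob_tail_at_least(i, m), 4))
```

The probability stays close to 1 while 2^i is much smaller than m and drops toward 0 once 2^i is much larger than m; the change happens around 2^i ≈ m.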
If $2^R$ is far from m, the maximum tail length R comes from the wrong region, where the probability is close to 0 or close to 1.

If we take m = $2^R$, then it comes from the right region.
Bloom filter: limitations, applications and formula
Reservoir sampling and approximate median
Markov Chain

A Markov chain is used to model the states of a dynamic system.

It is a model used for predicting the future state of a dynamic system.

What is a dynamic system?
Assumptions

We have a discrete time counter,

one that counts 1, 2, 3, ...

A time step may be a day, a year, a minute, or a second.

We start from some state, say X0.

After n time steps, the system may have reached any one of several states.
How the system behaves

At every time tick the system jumps to a random state.

The probabilities of the system moving to a certain state from a given state are known.

Our interest is then to find the state of the system after n time steps,

i.e. whether the system will be in state 1, state 2, ..., or state n.
Real Example: Billing Counter

Consider the example of a billing counter at a supermarket.

Assumptions to fit this example into the Markov model:

Let us assume that customers arrive at time ticks 1, 2, 3, ...

where each time tick is a 5-minute interval.

A customer arrives with a fixed probability.

A customer leaves with some probability.

Further, we assume the queue size goes up to 10.
Probability implementation

At each time tick we flip a coin with bias p to decide whether a customer arrives.

Likewise for customer departure.


Then, after 5 time ticks, what will be the state of the system?


How do we find this?
Transition Diagram

Diagrammatically, this will be as follows:


FIND: P = [p0, p1, ..., p10]
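A minimal Python sketch of the billing counter follows. The slides do not give the coin biases or the boundary rules, so several things here are assumptions: arrival bias p = 0.5, departure bias q = 0.4, independent coins, no departure from an empty queue, no arrival at a full queue, and an empty queue at the start. The function names are illustrative.

```python
def queue_transition_matrix(p: float, q: float, capacity: int = 10):
    """Build the (capacity + 1) x (capacity + 1) matrix A where A[i][j] is
    the probability of moving from queue length i to j in one tick.

    Assumed rules (not stated in the slides): arrival and departure coins
    are independent; no departure from an empty queue; no arrival at a
    full queue.
    """
    size = capacity + 1
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        arrive = p if i < capacity else 0.0
        depart = q if i > 0 else 0.0
        up = arrive * (1.0 - depart)      # arrival only
        down = depart * (1.0 - arrive)    # departure only
        if i < capacity:
            matrix[i][i + 1] = up
        if i > 0:
            matrix[i][i - 1] = down
        matrix[i][i] = 1.0 - up - down    # both or neither
    return matrix


def step(distribution, matrix):
    """One tick: row vector times matrix."""
    size = len(matrix)
    return [
        sum(distribution[i] * matrix[i][j] for i in range(size))
        for j in range(size)
    ]


p, q = 0.5, 0.4                    # hypothetical coin biases
A = queue_transition_matrix(p, q)
dist = [1.0] + [0.0] * 10          # assumed start: empty queue
for _ in range(5):
    dist = step(dist, A)
print([round(x, 4) for x in dist])  # [p0, p1, ..., p10] after 5 ticks
```

Because the queue can grow by at most one customer per tick, the entries for queue lengths 6 to 10 are still zero after 5 ticks when starting from an empty queue.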
Components of a Dynamic System

Set of possible states

Probabilities of going into any state from a given state

Initial state
How a Markov Model helps: Simple Example

Consider a simple weather model:

Two states : sunny and rainy

GIVEN TRANSITION PROBABILITIES

Sunny to sunny 0.9

Sunny to rainy 0.1

Rainy to Rainy 0.5

Rainy to sunny 0.5
Transition Matrix Conventions

                   Next state 1    Next state 2
Current state 1        P11             P12
Current state 2        P21             P22

Pij is the probability of moving from current state i to next state j.
Formula to calculate the state after n time steps

The following formula gives the state of the system after n transitions:

$P_n = P_0 A^n$, where $P_0$ is the initial distribution written as a row vector and A is the transition matrix.
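Applied to the sunny/rainy model, the formula can be sketched in Python as below. The starting distribution (a sunny day with certainty) is an assumption made for illustration, and the helper names are hypothetical.

```python
def vec_mat(row, matrix):
    """Multiply a row vector by a matrix."""
    return [
        sum(row[i] * matrix[i][j] for i in range(len(row)))
        for j in range(len(matrix[0]))
    ]


def distribution_after(p0, matrix, n):
    """Compute P_n = P_0 * A^n by applying A to the row vector n times."""
    dist = list(p0)
    for _ in range(n):
        dist = vec_mat(dist, matrix)
    return dist


# Rows = current state, columns = next state, order: [sunny, rainy].
A = [[0.9, 0.1],
     [0.5, 0.5]]
p0 = [1.0, 0.0]   # assumed start: today is sunny

p1 = distribution_after(p0, A, 1)
p2 = distribution_after(p0, A, 2)
print(p1, p2)
```

After one step the distribution is (0.9, 0.1); after two steps it is (0.86, 0.14), since 0.9 × 0.9 + 0.1 × 0.5 = 0.86.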
Example Pepsi and Coke
Example Harvard and Yale
Random Walk
Example stationary Distribution
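Calculations
$r_{ce}(n) = \sum_{I} r_{cI}(n-1) \, P_{Ie}$, where c is the current state, e the end state, I an intermediate state, and n the number of steps.


$r_{11}$ = ?

Current c = 1

End e = 1

Intermediate I = all given states, 1 and 2.
In the formula only the intermediate state varies,
so $r_{11}(n) = r_{11}(n-1) \, P_{11} + r_{12}(n-1) \, P_{21}$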
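The calculation can be sketched in Python as follows, using the sunny/rainy transition probabilities as sample data. The function name is illustrative, and the code uses 0-based indices, so state 1 of the slides is index 0.

```python
def n_step_probabilities(P, n):
    """Return r where r[c][e] is the probability of going from state c to
    state e in exactly n steps, using
    r_ce(n) = sum over I of r_cI(n - 1) * P[I][e]."""
    size = len(P)
    # r(0) is the identity: zero steps leave the state unchanged.
    r = [[1.0 if c == e else 0.0 for e in range(size)] for c in range(size)]
    for _ in range(n):
        r = [
            [sum(r[c][i] * P[i][e] for i in range(size)) for e in range(size)]
            for c in range(size)
        ]
    return r


# Weather chain from the earlier example: index 0 = sunny, 1 = rainy.
P = [[0.9, 0.1],
     [0.5, 0.5]]
r2 = n_step_probabilities(P, 2)
# r_11(2) = r_11(1) * P_11 + r_12(1) * P_21 = 0.9 * 0.9 + 0.1 * 0.5
print(round(r2[0][0], 4))  # 0.86
```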
Stationary distribution

A stationary distribution of a Markov chain is a probability distribution that remains unchanged in the Markov chain as time progresses.

Typically it is represented as a row vector $\pi$ whose entries are probabilities summing to 1. Given the transition matrix A, the following is satisfied, where $\pi$ is unique for a finite, irreducible chain:


$\pi = \pi A$
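One simple way to approximate the stationary distribution is to multiply a starting distribution by A repeatedly until it stops changing. The Python sketch below does this for the sunny/rainy chain; the function name, tolerance, and uniform starting vector are assumptions, and the method is not guaranteed to converge for every chain.

```python
def stationary_distribution(A, tolerance=1e-12, max_steps=10_000):
    """Approximate pi with pi = pi * A by repeated multiplication.

    Assumes the chain converges from a uniform start (true for the
    example below; not guaranteed for every chain, e.g. periodic ones).
    """
    size = len(A)
    pi = [1.0 / size] * size
    for _ in range(max_steps):
        nxt = [sum(pi[i] * A[i][j] for i in range(size)) for j in range(size)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tolerance:
            return nxt
        pi = nxt
    return pi


A = [[0.9, 0.1],   # sunny -> sunny, sunny -> rainy
     [0.5, 0.5]]   # rainy -> sunny, rainy -> rainy
pi = stationary_distribution(A)
print([round(x, 4) for x in pi])  # [0.8333, 0.1667]
```

The result matches the exact solution of $\pi = \pi A$ for this chain, $\pi = (5/6, 1/6)$, which follows from $0.1\,\pi_{sunny} = 0.5\,\pi_{rainy}$ together with the entries summing to 1.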
Example : Computer system
Variance and Standard Deviation

Find the variance and standard deviation of the following data set:
70, 60, 72, 42, 86
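The exercise does not say whether the data is a whole population or a sample, so the Python sketch below computes both versions with the standard library `statistics` module. The mean is 66 and the squared deviations sum to 1064.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [70, 60, 72, 42, 86]

population_variance = pvariance(data)   # divides by n
population_std = pstdev(data)
sample_variance = variance(data)        # divides by n - 1
sample_std = stdev(data)

print(population_variance, round(population_std, 4))
print(sample_variance, round(sample_std, 4))
```

Treated as a population, the variance is 1064 / 5 = 212.8 and the standard deviation is about 14.59. Treated as a sample, the variance is 1064 / 4 = 266 and the standard deviation is about 16.31.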
