
Data Analysis

Dr. Rahul Pandya

https://www.probabilitycourse.com/
Introduction

▶ Discuss limit theorems and convergence modes for random variables.
▶ Limit theorems are among the most fundamental results in
probability theory.
▶ Two important limit theorems: the law of large numbers
(LLN) and the central limit theorem (CLT).
▶ Importance of these theorems in practice.
▶ Discuss the convergence of sequences of random variables.
Limit Theorems

▶ Discuss two important theorems in probability: the law of large numbers (LLN) and the central limit theorem (CLT).
▶ The LLN states that the average of a large number of i.i.d.
random variables converges to the expected value.
▶ The CLT states that, under some conditions, the sum of a
large number of random variables has an approximately
normal distribution.
Law of Large Numbers

▶ The law of large numbers has a central role in probability and statistics.
▶ States that if you repeat an experiment independently a large
number of times and average the result, what you obtain
should be close to the expected value.
▶ Two main versions: the weak and strong laws of large numbers.
▶ Focus on the weak law of large numbers (WLLN).
▶ Define the sample mean:

X̄ = (X1 + X2 + ... + Xn) / n

Sample Mean

▶ A common notation for the sample mean is Mn.
▶ If the Xi have CDF FX(x), we may write the sample mean as Mn(X) to indicate the distribution of the Xi's.
▶ The sample mean X̄ = Mn (X ) is also a random variable.
▶ Expectation:

E[X̄] = E[X1 + X2 + ... + Xn]/n = nE[X]/n = E[X]

▶ Variance:
Var(X̄) = Var(X1 + X2 + ... + Xn)/n² = Var(X)/n

Weak Law of Large Numbers (WLLN)

▶ Let X1, X2, ..., Xn be i.i.d. random variables with a finite expected value E[Xi] = µ < ∞.
▶ Then, for any ϵ > 0:

lim(n→∞) P(|X̄ − µ| ≥ ϵ) = 0.

▶ Proof using Chebyshev's inequality (the proof additionally assumes Var(X) < ∞, while the WLLN itself only requires a finite mean):

P(|X̄ − µ| ≥ ϵ) ≤ Var(X)/(nϵ²)
▶ This goes to zero as n → ∞.
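
A quick simulation makes this concrete. Below is a minimal Python sketch (assuming NumPy is available; the Exponential distribution with mean µ = 3 is an arbitrary illustrative choice, not from the slides): the estimated probability shrinks as n grows.

import numpy as np

# Minimal WLLN simulation sketch (assumes NumPy; Exponential(mean=3)
# is an arbitrary example distribution).
rng = np.random.default_rng(0)
mu, eps, reps = 3.0, 0.1, 1000

for n in [10, 100, 1000, 10000]:
    # Draw `reps` independent sample means, each based on n observations
    sample_means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)
    # Empirical estimate of P(|X_bar - mu| >= eps); it shrinks as n grows
    print(n, np.mean(np.abs(sample_means - mu) >= eps))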

Central Limit Theorem (Slide 1)
▶ The central limit theorem (CLT) is one of the most important
results in probability theory.

▶ It states that, under certain conditions, the sum of a large number of random variables is approximately normal.

▶ Here, we state a version of the CLT that applies to i.i.d. random variables.

▶ Suppose that X1, X2, . . . , Xn are i.i.d. random variables with expected value E[Xi] = µ < ∞ and variance Var(Xi) = σ² < ∞.

▶ The sample mean is given by:

X̄ = (X1 + X2 + . . . + Xn) / n

Central Limit Theorem (Slide 2)

▶ The sample mean has mean E[X̄] = µ and variance Var(X̄) = σ²/n.

▶ Thus, the normalized random variable

Zn = (X̄ − µ)/(σ/√n) = (X1 + X2 + . . . + Xn − nµ)/(√n σ)

▶ Zn has mean E[Zn] = 0 and variance Var(Zn) = 1.

▶ The central limit theorem states that the CDF of Zn


converges to the standard normal CDF.

Central Limit Theorem
▶ The Central Limit Theorem (CLT)

▶ Let X1, X2, . . . , Xn be i.i.d. random variables with expected value E[Xi] = µ < ∞ and variance 0 < Var(Xi) = σ² < ∞.

▶ Then, the random variable

Zn = (X̄ − µ)/(σ/√n) = (X1 + X2 + . . . + Xn − nµ)/(√n σ)

▶ converges in distribution to the standard normal random variable as n goes to infinity; that is,

lim(n→∞) P(Zn ≤ x) = Φ(x),  for all x ∈ R,

▶ where Φ(x) is the standard normal CDF.
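
To see this convergence concretely, here is a minimal simulation sketch (assuming NumPy and SciPy; Exponential(1), with µ = σ = 1, is an arbitrary and deliberately skewed choice) that compares the empirical CDF of Zn with Φ:

import numpy as np
from scipy.stats import norm

# Sketch: empirical CDF of Zn vs. the standard normal CDF.
# Xi ~ Exponential(1), so mu = sigma = 1 (an arbitrary skewed example).
rng = np.random.default_rng(1)
n, reps, mu, sigma = 100, 50_000, 1.0, 1.0

sums = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
zn = (sums - n * mu) / (np.sqrt(n) * sigma)

for x in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(x, np.mean(zn <= x), norm.cdf(x))  # empirical P(Zn <= x) vs Phi(x)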


Understanding the Central Limit Theorem

▶ An interesting aspect of the CLT is that the distribution of the Xi's does not matter.
▶ The Xi ’s can be:
▶ Discrete
▶ Continuous
▶ Mixed random variables
▶ Let’s assume that Xi ’s are Bernoulli(p):
▶ E [Xi ] = p
▶ Var (Xi ) = p(1 − p)
Understanding the Central Limit Theorem

▶ Yn = X1 + X2 + . . . + Xn has a Binomial(n, p) distribution, and its normalized version is

Zn = (Yn − np)/√(np(1 − p))


▶ Figure 7.1 shows the PMF of Zn for different values of n:
▶ The shape of the PMF approaches a normal PDF curve as n
increases.
▶ Zn is a discrete random variable with a PMF, not a PDF.
▶ The CLT states that the CDF of Zn converges to the standard
normal CDF.
▶ Although Zn has a PMF rather than a PDF, the shape of the PMF traces the limiting normal density, so the figure is useful for visualizing convergence to the normal distribution.
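
A small numerical check (a sketch assuming SciPy is available) makes the same point without a plot: the maximum gap between the CDF of Zn and Φ shrinks as n grows.

import numpy as np
from scipy.stats import binom, norm

# Sketch: how far is the CDF of Zn from Phi? The gap shrinks with n.
p = 0.5
for n in [5, 30, 100]:
    k = np.arange(n + 1)                              # support of Yn
    z = (k - n * p) / np.sqrt(n * p * (1 - p))        # support of Zn
    gap = np.max(np.abs(binom.cdf(k, n, p) - norm.cdf(z)))
    print(n, gap)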

Understanding the Central Limit Theorem

Figure: Zn is the normalized sum of n independent Bernoulli(p) random variables. The shape of its PMF, PZn(z), resembles the normal curve as n increases.
Understanding the Central Limit Theorem

▶ As another example, let’s assume that Xi ’s are Uniform(0,1):


▶ E[Xi] = 1/2
▶ Var(Xi) = 1/12
▶ In this case, we have:

Zn = (X1 + X2 + . . . + Xn − n/2)/√(n/12)
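
The same kind of simulation sketch (assuming NumPy and SciPy; the values of n below are arbitrary) works here, and even modest n gives a good match to Φ:

import numpy as np
from scipy.stats import norm

# Sketch: Zn for sums of Uniform(0,1) variables vs. the normal CDF.
rng = np.random.default_rng(2)
reps = 100_000
for n in [2, 5, 30]:
    s = rng.uniform(size=(reps, n)).sum(axis=1)
    zn = (s - n / 2) / np.sqrt(n / 12)
    print(n, np.mean(zn <= 1.0), norm.cdf(1.0))  # empirical vs. Phi(1)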

Understanding the Central Limit Theorem

Figure: Zn is the normalized sum of n independent Uniform(0,1) random variables. The shape of its PDF, fZn(z), gets closer to the normal curve as n increases.
Normalization of Random Variables

▶ We could have directly looked at Yn = X1 + X2 + ... + Xn.
▶ Why do we normalize it first and say that the normalized version (Zn) becomes approximately normal?
▶ This is because E [Yn ] = nE [Xi ] and Var (Yn ) = nσ 2 go to
infinity as n goes to infinity.
▶ We normalize Yn in order to have a finite mean and variance
(E [Zn ] = 0, Var (Zn ) = 1).
▶ The CDF of Zn is obtained by scaling and shifting the CDF of
Yn .
Importance of the Central Limit Theorem

▶ The importance of the CLT stems from the fact that, in many
real applications, a certain random variable of interest is a
sum of a large number of independent random variables.
▶ Examples include:
▶ Laboratory measurement errors modeled by normal random
variables.
▶ Gaussian noise in communication and signal processing.
▶ Percentage changes in asset prices modeled by normal random
variables.
▶ Random sampling from a population to obtain statistical
knowledge.
▶ The CLT simplifies computations significantly, especially when
dealing with sums of a large number of i.i.d. random variables.
▶ A common rule of thumb is that for n ≥ 30 the normal approximation is very good, though the accuracy depends on the underlying distribution.
Applying the Central Limit Theorem (CLT)

▶ Write the random variable of interest Y as the sum of n i.i.d. random variables Xi:

Y = X1 + X2 + . . . + Xn

Finding Mean and Variance

▶ Find E[Y] and Var(Y):

E[Y] = nµ,  Var(Y) = nσ²

▶ where µ = E[Xi] and σ² = Var(Xi).


Conclusion of CLT

▶ According to the CLT, we conclude that:

(Y − E[Y])/√Var(Y) = (Y − nµ)/(√n σ)

is approximately standard normal.

Finding Probability Using CLT

▶ To find P(y1 ≤ Y ≤ y2 ), write:


 
P(y1 ≤ Y ≤ y2) = P((y1 − nµ)/(√n σ) ≤ (Y − nµ)/(√n σ) ≤ (y2 − nµ)/(√n σ))
              ≈ Φ((y2 − nµ)/(√n σ)) − Φ((y1 − nµ)/(√n σ))
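
This recipe translates directly into code. Below is a minimal helper (a sketch assuming SciPy; the function name clt_interval_prob is our own, not from the slides) implementing the approximation above:

from math import sqrt
from scipy.stats import norm

def clt_interval_prob(y1, y2, n, mu, sigma):
    """CLT approximation to P(y1 <= Y <= y2) for Y = X1 + ... + Xn,
    where the Xi are i.i.d. with mean mu and standard deviation sigma."""
    scale = sqrt(n) * sigma
    return norm.cdf((y2 - n * mu) / scale) - norm.cdf((y1 - n * mu) / scale)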

Problem Statement

A bank teller serves customers standing in the queue one by one. Suppose that the service time Xi for customer i has mean E[Xi] = 2 minutes and Var(Xi) = 1. Assume that service times for different customers are independent. Let Y be the total time the bank teller spends serving 50 customers.
Find P(90 < Y < 110).

Solution

▶ Let Y = X1 + X2 + · · · + X50, the total service time for 50 customers.
▶ The mean and variance of Y can be found as:

E[Y] = 50 · E[Xi] = 50 · 2 = 100

Var(Y) = 50 · Var(Xi) = 50 · 1 = 50


▶ By the Central Limit Theorem (CLT), Y is approximately
normal for large n.

Finding P(90 < Y < 110)

▶ Standardize the variable:

Z = (Y − E[Y])/√Var(Y) = (Y − 100)/√50

▶ Now, calculate P(90 < Y < 110):

P(90 < Y < 110) = P((90 − 100)/√50 < Z < (110 − 100)/√50)
               = P(−10/√50 < Z < 10/√50)

▶ Simplifying:

= P(−1.414 < Z < 1.414)

Conclusion

▶ Using standard normal distribution tables,

P(−1.414 < Z < 1.414) ≈ 0.8427
▶ Therefore, the probability that the total service time is
between 90 and 110 minutes is approximately 84.27%.
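
Instead of reading a table, the value can be checked numerically (a sketch assuming SciPy; clt_interval_prob is the hypothetical helper sketched earlier):

from scipy.stats import norm

# P(-1.414 < Z < 1.414), as read from a Z-table
print(norm.cdf(1.414) - norm.cdf(-1.414))            # about 0.8427
# Equivalently, via the helper sketched earlier:
# clt_interval_prob(90, 110, n=50, mu=2, sigma=1)    # about 0.8427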
Z-Table or Standard Normal Table

Figure: Z-Table or Standard Normal Table


Problem Statement

In a communication system, each data packet consists of 1000 bits. Due to noise, each bit may be received in error with probability 0.1. It is assumed that bit errors occur independently.
Find the probability that there are more than 120 errors in a
certain data packet.
Solution: Step 1 - Binomial Distribution

Let n = 1000 be the number of bits in a packet, and p = 0.1 be the probability of error. The total number of errors Y then follows a binomial distribution:

Y ∼ Binomial(n = 1000, p = 0.1)

We need to find P(Y > 120).

Defining the Problem

Let us define Xi as the indicator random variable for the i-th bit in
the packet. That is,

Xi = 1 if the i-th bit is received in error, Xi = 0 otherwise.

The Xi ’s are i.i.d. and

Xi ∼ Bernoulli(p = 0.1).

If Y is the total number of bit errors in the packet, then

Y = X1 + X2 + · · · + Xn .
Mean and Variance of Xi

Since Xi ∼ Bernoulli(p = 0.1), we have:

E[Xi] = µ = p = 0.1,  Var(Xi) = σ² = p(1 − p) = 0.09.

Using the Central Limit Theorem

Using the Central Limit Theorem (CLT), we can estimate:


 
P(Y > 120) = P((Y − nµ)/(√n σ) > (120 − nµ)/(√n σ))

Substituting the values:

P(Y > 120) = P((Y − 100)/√90 > (120 − 100)/√90) ≈ 1 − Φ(20/√90).

Final Probability

From the standard normal distribution table:

P(Y > 120) ≈ 1 − Φ(2.11) ≈ 0.0175.

Therefore, the probability that there are more than 120 errors in
the data packet is approximately 1.75%.
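
Since Y is binomial, the CLT answer can also be compared against the exact probability (a minimal sketch assuming SciPy is available):

from math import sqrt
from scipy.stats import binom, norm

n, p = 1000, 0.1
exact = binom.sf(120, n, p)                                   # exact P(Y > 120)
approx = 1 - norm.cdf((120 - n * p) / sqrt(n * p * (1 - p)))  # CLT estimate
print(exact, approx)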

Z-Table or Standard Normal Table

Figure: Z-Table or Standard Normal Table


Continuity Correction

Let us assume that Y ∼ Binomial(n = 20, p = 1/2), and suppose that we are interested in P(8 ≤ Y ≤ 10).
We know that a Binomial(n = 20, p = 1/2) random variable can be written as the sum of n i.i.d. Bernoulli(p) random variables:

Y = X1 + X2 + . . . + Xn .
Expectation and Variance

Since Xi ∼ Bernoulli(p = 1/2), we have

E[Xi] = µ = p = 1/2,  Var(Xi) = σ² = p(1 − p) = 1/4.

Thus, we may want to apply the CLT to write

P(8 ≤ Y ≤ 10) = P((8 − nµ)/(√n σ) < (Y − nµ)/(√n σ) < (10 − nµ)/(√n σ))
             = P((8 − 10)/√5 < (Y − nµ)/(√n σ) < (10 − 10)/√5)
             ≈ Φ(0) − Φ(−2/√5) = 0.3145.

Exact Probability Calculation

Since here, n = 20 is relatively small, we can actually find P(8 ≤ Y ≤ 10) accurately. We have

P(8 ≤ Y ≤ 10) = Σ_{k=8}^{10} (20 choose k) p^k (1 − p)^(n−k)
             = [(20 choose 8) + (20 choose 9) + (20 choose 10)] (1/2)^20 = 0.4565.
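
The exact value can be confirmed with a one-liner (a sketch assuming SciPy is available):

from scipy.stats import binom

# Exact P(8 <= Y <= 10) for Y ~ Binomial(20, 1/2)
print(binom.cdf(10, 20, 0.5) - binom.cdf(7, 20, 0.5))   # about 0.4565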

Approximation Error

We notice that our approximation is not so good. Part of the error is due to the fact that Y is a discrete random variable and we are using a continuous distribution to find P(8 ≤ Y ≤ 10).
Here is a trick to get a better approximation, called continuity correction. Since Y can only take integer values, we can write

P(8 ≤ Y ≤ 10) = P(7.5 < Y < 10.5)


Applying Continuity Correction

We can express this as:

P(7.5 < Y < 10.5) = P((7.5 − nµ)/(√n σ) < (Y − nµ)/(√n σ) < (10.5 − nµ)/(√n σ))
                 = P((7.5 − 10)/√5 < (Y − nµ)/(√n σ) < (10.5 − 10)/√5)
                 ≈ Φ(0.5/√5) − Φ(−2.5/√5)
                 = 0.4567.

As we see, using continuity correction, our approximation improved significantly.
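
Side by side, the two approximations look like this (a minimal sketch assuming SciPy is available):

from math import sqrt
from scipy.stats import norm

n, p = 20, 0.5
mu, sd = n * p, sqrt(n * p * (1 - p))   # mu = 10, sd = sqrt(5)

plain = norm.cdf((10 - mu) / sd) - norm.cdf((8 - mu) / sd)          # about 0.3145
corrected = norm.cdf((10.5 - mu) / sd) - norm.cdf((7.5 - mu) / sd)  # about 0.4567
print(plain, corrected)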

Application of Continuity Correction

The continuity correction is particularly useful when we would like to find P(y1 ≤ Y ≤ y2), where Y is binomial and y1 and y2 are close to each other.

Continuity Correction for Discrete Random Variables
Let X1 , X2 , . . . , Xn be independent discrete random variables and
let

Y = X1 + X2 + . . . + Xn .

Finding Probability Using CLT

Suppose that we are interested in finding P(A) = P(l ≤ Y ≤ u) using the CLT, where l and u are integers. Since Y is an integer-valued random variable, we can write

P(A) = P(l − 1/2 ≤ Y ≤ u + 1/2).

It turns out that the above expression sometimes provides a better approximation for P(A) when applying the CLT. This is called the continuity correction, and it is particularly useful when the Xi's are Bernoulli (i.e., Y is binomial).
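
As a general utility, the corrected approximation can be wrapped in a small helper (a sketch assuming SciPy; the function name cc_prob is our own, not from the slides):

from math import sqrt
from scipy.stats import norm

def cc_prob(l, u, n, mu, sigma):
    """Continuity-corrected CLT approximation to P(l <= Y <= u)
    for integer-valued Y = X1 + ... + Xn with i.i.d. Xi."""
    scale = sqrt(n) * sigma
    return norm.cdf((u + 0.5 - n * mu) / scale) - norm.cdf((l - 0.5 - n * mu) / scale)

# Example: Y ~ Binomial(20, 1/2) gives about 0.4567, matching the slide above
print(cc_prob(8, 10, n=20, mu=0.5, sigma=0.5))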

