
W7PS

1. The document discusses concepts from statistics including normal distributions, sampling distributions, central limit theorem, and weak law of large numbers. 2. Several practice problems and their solutions are provided to illustrate calculating probabilities and variances for random variables, finding population parameters from sample statistics, and determining sample sizes needed based on desired probability bounds. 3. Key concepts demonstrated include properties of normal distributions, how sample means converge in distribution to normal as sample size increases, and using weak law of large numbers to bound probabilities based on population variance and sample size.

Uploaded by

polar neckson
Statistics for Data Science - 2

Week 7 practice Assignment


Statistics from samples and Limit theorems

1. If $X, Y \sim$ i.i.d. Normal(0, 4), what will be the variance of $\frac{X}{Y}$?
(a) 4
(b) 2
(c) 1
(d) Undefined

Solution:
We know that if $X, Y \sim$ i.i.d. Normal$(0, \sigma^2)$, then $\frac{X}{Y} \sim$ Cauchy(0, 1), and the variance of the Cauchy distribution is undefined.
Therefore, option (d) is correct.

2. A population has mean 60 and standard deviation 6. Random samples of size 100 from
this population are collected independently. Find the expected value of the sample mean.
Solution:
We know that the expected value of the sample mean $\bar{X}$ is given by

$$E[\bar{X}] = \mu = 60$$

3. Let $X_1, X_2, X_3, X_4$ and $X_5 \sim$ i.i.d. Normal(2, 25). Calculate $P(2X_1 + X_2 + 3X_3 + X_4 + X_5 \geq 10)$.

1. $F_Z(0.3)$
2. $1 - F_Z(0.3)$
3. $F_Z(-0.3)$
4. $1 - F_Z(-0.3)$

Solution:

We know that a linear combination of independent normal random variables is again normally distributed.
Hence, $2X_1 + X_2 + 3X_3 + X_4 + X_5$ will follow a normal distribution.
Let $Y = 2X_1 + X_2 + 3X_3 + X_4 + X_5$.

$$E[Y] = E[2X_1 + X_2 + 3X_3 + X_4 + X_5] = (2 + 1 + 3 + 1 + 1)E[X] = 8 \times 2 = 16$$

$$\mathrm{Var}(Y) = \mathrm{Var}(2X_1 + X_2 + 3X_3 + X_4 + X_5) = (4 + 1 + 9 + 1 + 1)\mathrm{Var}(X) = 16 \times 25 = 400$$

It implies that $Y \sim$ Normal$(16, 20^2)$.

To find: $P(Y \geq 10)$

Now,

$$
\begin{aligned}
P(Y \geq 10) &= P(Y - 16 \geq -6) \\
&= P\left(\frac{Y - 16}{20} \geq \frac{-6}{20}\right) \\
&= P\left(\frac{Y - 16}{20} \geq -0.3\right) \\
&= P(Z \geq -0.3) \\
&= 1 - P(Z < -0.3) \\
&= 1 - F_Z(-0.3)
\end{aligned}
$$
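As a numerical check of this result, here is a minimal Python sketch using only the standard library. The helper name `linear_combination` is hypothetical (it is not part of the assignment); it builds the distribution of a weighted sum of i.i.d. normal variables from the coefficients, assuming independence as in the problem.

```python
from math import sqrt
from statistics import NormalDist


def linear_combination(coeffs, mean, variance):
    """Distribution of sum(c_i * X_i) for i.i.d. X_i ~ Normal(mean, variance)."""
    combo_mean = sum(coeffs) * mean                      # E[Y] = (sum of c_i) * E[X]
    combo_var = sum(c * c for c in coeffs) * variance    # Var(Y) = (sum of c_i^2) * Var(X)
    return NormalDist(combo_mean, sqrt(combo_var))


# Y = 2*X1 + X2 + 3*X3 + X4 + X5 with X_i ~ Normal(2, 25)
y = linear_combination([2, 1, 3, 1, 1], mean=2, variance=25)
prob = 1 - y.cdf(10)  # P(Y >= 10) = 1 - F_Z(-0.3)
print(y.mean, y.stdev, round(prob, 4))
```

The printed probability is the numerical value of $1 - F_Z(-0.3)$, roughly 0.618.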

4. Random samples of size 100 are collected from a population of unknown parameters. If
the variance of the sample mean is 36, what will be the standard deviation of the actual
population?
Solution:
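We know that the variance of the sample mean is given by $\frac{\sigma^2}{n}$, where $\sigma$ is the standard deviation of the actual population and $n$ is the sample size.

By the given information, we have

$$
\begin{aligned}
\frac{\sigma^2}{n} &= 36 \\
\Rightarrow \frac{\sigma^2}{100} &= 36 \\
\Rightarrow \sigma^2 &= 3600 \\
\Rightarrow \sigma &= 60
\end{aligned}
$$

Therefore, the standard deviation of the actual population is 60.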

5. A random sample of size 50 is collected from a population with a standard deviation of 5.
Find the upper bound on the probability that the sample mean will be at least 10
away from the actual mean using the weak law of large numbers. Write your answer
correct to three decimal places.

Solution:
Given: standard deviation of the population, σ = 5
Sample size, n = 50

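To find: upper bound on $P(|\bar{X} - \mu| \geq 10)$, where $\bar{X}$ and $\mu$ are the sample mean and the population mean, respectively.

Now, by the weak law of large numbers, we have

$$
\begin{aligned}
P(|\bar{X} - \mu| \geq \delta) &\leq \frac{\sigma^2}{n\delta^2} \\
\Rightarrow P(|\bar{X} - \mu| \geq 10) &\leq \frac{25}{100 \times 50} \\
\Rightarrow P(|\bar{X} - \mu| \geq 10) &\leq 0.005
\end{aligned}
$$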
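The bound used here is simple enough to wrap in a small function. The following is a sketch only: the name `wlln_upper_bound` is made up for illustration, and capping the result at 1 is an added convenience, since a probability bound above 1 carries no information.

```python
def wlln_upper_bound(sigma, n, delta):
    """Upper bound sigma^2 / (n * delta^2) on P(|sample mean - mu| >= delta)."""
    return min(1.0, sigma ** 2 / (n * delta ** 2))


# Problem 5: sigma = 5, n = 50, delta = 10
bound = wlln_upper_bound(sigma=5, n=50, delta=10)
print(f"{bound:.3f}")
```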

6. A study shows that the average daily sleeping hours of teenagers is ten hours with a
standard deviation of two hours. If a sample of 100 teenagers is collected, what will be
the probability that the mean of the sleeping hours of these 100 teenagers is at least 0.4
hours away from the population mean? Assume that each observation in the sample is
independent. Assume that FZ denotes the CDF of standard normal distribution.

(a) $1 + F_Z(-2) - F_Z(2)$
(b) $1 - F_Z(-2) + F_Z(2)$
(c) $F_Z(2) - F_Z(-2)$
(d) $F_Z(2)$

Solution:
Let X denote the average daily sleeping hours of teenagers.
Given: standard deviation of X, $\sigma = 2$
Sample size, $n = 100$

To find: $P(|\bar{X} - \mu| \geq 0.4)$, where $\bar{X}$ and $\mu$ are the sample mean and the population mean, respectively.

Let $S = X_1 + X_2 + \ldots + X_{100}$, where $X_i$ denotes the $i$th sample.

By CLT, we know that

$$\frac{S - n\mu}{\sigma\sqrt{n}} \sim \text{Normal}(0, 1) \Rightarrow \frac{S - 100\mu}{20} \sim Z \text{ (Standard Normal)}$$

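Now,

$$
\begin{aligned}
P(|\bar{X} - \mu| \geq 0.4) &= P\left(\left|\frac{S}{n} - \mu\right| \geq 0.4\right) \\
&= P\left(\left|\frac{S - n\mu}{n}\right| \geq 0.4\right) \\
&= P\left(\left|\frac{S - n\mu}{\sigma\sqrt{n}}\right| \geq \frac{0.4\sqrt{n}}{\sigma}\right) \\
&= P(|Z| \geq 2) \\
&= P(Z \geq 2) + P(Z \leq -2) \\
&= 1 - P(Z \leq 2) + P(Z \leq -2) \\
&= 1 - F_Z(2) + F_Z(-2)
\end{aligned}
$$

This matches option (a).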
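To put a number on this expression, the sketch below evaluates the two-sided tail with Python's standard-library `statistics.NormalDist`. The function name `two_sided_tail` is an illustrative choice, and the result relies on the same CLT approximation as the solution.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_tail(sigma, n, delta):
    """CLT approximation of P(|sample mean - mu| >= delta)."""
    z = delta * sqrt(n) / sigma          # standardised distance, here 0.4 * 10 / 2 = 2
    std_normal = NormalDist()            # standard normal, mean 0 and sd 1
    return 1 - std_normal.cdf(z) + std_normal.cdf(-z)


p = two_sided_tail(sigma=2, n=100, delta=0.4)
print(round(p, 4))
```

The value comes out at about 0.0455, so a deviation of 0.4 hours or more is expected in fewer than 5% of such samples.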

7. What is the fourth moment of the Normal(0, 4) distribution?


Solution:
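$$
\begin{aligned}
M_X(\lambda) = E[e^{\lambda X}] &= E\left[1 + \lambda X + \frac{\lambda^2 X^2}{2!} + \frac{\lambda^3 X^3}{3!} + \ldots\right] \\
&= 1 + \lambda E[X] + \frac{\lambda^2 E[X^2]}{2!} + \frac{\lambda^3 E[X^3]}{3!} + \ldots
\end{aligned}
$$

In the moment generating function, the coefficient of $\lambda$ gives the first moment ($E[X]$), the coefficient of $\frac{\lambda^2}{2!}$ gives the second moment ($E[X^2]$) and, similarly, the coefficient of $\frac{\lambda^k}{k!}$ gives the $k$th moment ($E[X^k]$).

The moment generating function of Normal$(0, \sigma^2)$ is given by $e^{\lambda^2\sigma^2/2}$.
Let $N \sim$ Normal$(0, 2^2)$.

$$
\begin{aligned}
M_N(\lambda) &= e^{\lambda^2 2^2/2} \\
&= 1 + \frac{\lambda^2 2^2}{2} + \frac{\lambda^4 2^4}{2!(4)} + \ldots \\
&= 1 + \frac{\lambda^2 2^2}{2} + 48\,\frac{\lambda^4}{4!} + \ldots
\end{aligned}
$$

Therefore, the 4th moment of Normal$(0, 2^2)$ = coefficient of $\frac{\lambda^4}{4!}$ = 48.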
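The coefficient extraction can be checked with exact arithmetic. In the series of $e^{\lambda^2\sigma^2/2}$, the coefficient of $\lambda^{2j}$ is $(\sigma^2/2)^j / j!$, so the moment of order $k = 2j$ is that coefficient multiplied by $k!$. The sketch below encodes exactly this; the function name `normal_zero_mean_moment` is hypothetical and it applies only to zero-mean normal distributions.

```python
from fractions import Fraction
from math import factorial


def normal_zero_mean_moment(sigma_sq, k):
    """k-th moment of Normal(0, sigma_sq), read off the MGF series."""
    if k % 2 == 1:
        return Fraction(0)               # odd powers of lambda do not appear in the series
    j = k // 2
    coeff = (Fraction(sigma_sq) / 2) ** j / factorial(j)   # coefficient of lambda^(2j)
    return coeff * factorial(k)          # moment = k! * coefficient of lambda^k


print(normal_zero_mean_moment(4, 4))  # fourth moment of Normal(0, 4)
```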

8. Let $X \sim$ Gamma$(2, \frac{1}{2})$ and $Y \sim$ Gamma$(5, \frac{1}{2})$ be two independent random variables.
What will be the expected value of $\frac{X}{X + Y}$? Write your answer correct to two decimal places.
Solution:
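We know that if $X \sim$ Gamma$(\alpha, k)$ and $Y \sim$ Gamma$(\beta, k)$ are two independent random variables, then $\frac{X}{X + Y} \sim$ Beta$(\alpha, \beta)$.

Given that $X \sim$ Gamma$(2, \frac{1}{2})$ and $Y \sim$ Gamma$(5, \frac{1}{2})$ are two independent random variables, it implies that

$$\frac{X}{X + Y} \sim \text{Beta}(2, 5)$$

Therefore, $E\left[\frac{X}{X + Y}\right] = \frac{2}{2 + 5} = \frac{2}{7} \approx 0.29$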

9. A study says that the delivery time of pizzas has a standard deviation of 10 minutes. A
pizza shop collected the data of some deliveries and their delivery time. The probability
that the mean delivery time of this sample is at least $\sqrt{5}$ minutes away from the actual
mean delivery time is at most $\frac{1}{5}$ as per the weak law of large numbers. What is the size
of the sample?
Solution:
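Let X denote the delivery time of pizzas.
Given that $\sigma = 10$
To find: size of the sample such that $P(|\bar{X} - \mu| \geq \sqrt{5}) \leq \frac{1}{5}$ ...(1)
By the weak law of large numbers, we have

$$
\begin{aligned}
P(|\bar{X} - \mu| \geq \delta) &\leq \frac{\sigma^2}{n\delta^2} \\
\Rightarrow P(|\bar{X} - \mu| \geq \sqrt{5}) &\leq \frac{100}{n \times 5} \quad \ldots(2)
\end{aligned}
$$

By equations (1) and (2), we have

$$
\begin{aligned}
\frac{1}{5} &= \frac{100}{5n} \\
\Rightarrow n &= 100
\end{aligned}
$$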
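Solving the bound for $n$ gives $n = \frac{\sigma^2}{\delta^2 \cdot p}$, where $p$ is the stated probability bound. A minimal sketch of that rearrangement follows; the function name `wlln_sample_size` is hypothetical, and it takes $\delta^2$ directly so that $\sqrt{5}$ never has to be represented as a float.

```python
from fractions import Fraction


def wlln_sample_size(variance, delta_squared, bound):
    """Sample size n at which sigma^2 / (n * delta^2) equals the given bound."""
    return Fraction(variance) / (Fraction(delta_squared) * Fraction(bound))


# Problem 9: sigma^2 = 100, delta^2 = 5, bound = 1/5
n = wlln_sample_size(variance=100, delta_squared=5, bound=Fraction(1, 5))
print(n)
```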

10. A company sells eggs whose weights are normally distributed with a mean of 70g and a
standard deviation of 2g. Suppose that these eggs are sold in packages that each contain
four eggs. Assume that the weight of each egg is independent. What is the probability
that the mean weight of the four eggs in a package is greater than 68.5g? Write your
answer correct to two decimal places.
(Hint: Use the fact that linear combination of normal distributions is again a normal
distribution. FZ (−1.5) = 0.066)

Solution:
Let X denote the weight of an egg.
Given that $E[X] = \mu = 70$
$SD(X) = \sigma = 2$
$X \sim$ Normal$(70, 2^2)$

Let $X_1, X_2, X_3$ and $X_4$ denote the weights of the four eggs in a package.

Suppose that

$$\bar{X} = \frac{X_1 + X_2 + X_3 + X_4}{4}$$

To find: $P(\bar{X} > 68.5)$

We know that a linear combination of independent normal random variables is again normally distributed.
It implies that $\bar{X}$ is normally distributed.

$$E[\bar{X}] = \mu = 70 \quad \text{and} \quad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n} = \frac{4}{4} = 1$$

It implies that $\bar{X} \sim$ Normal$(70, 1) \Rightarrow \bar{X} - 70 \sim$ Normal$(0, 1)$

Now,

$$
\begin{aligned}
P(\bar{X} > 68.5) &= P(\bar{X} - 70 > -1.5) \\
&= P(Z > -1.5) \\
&= 1 - F_Z(-1.5) \\
&= 1 - 0.066 = 0.934 \approx 0.93
\end{aligned}
$$

11. Let $X_1, X_2, X_3, \ldots, X_n$ be i.i.d. Poisson(4). What should be the value of $n$ such that
$P(3.8 \leq \bar{X} \leq 4.2) \geq 0.95$? [2 marks]
(Hint: Use FZ (1.96) = 0.975)

1. at least 200
2. at least 385
3. at least 450
4. at least 585

Solution:
Given that X1 , X2 , X3 , . . . Xn ∼ i.i.d. Poisson(4)

Mean of the distribution = µ = 4
Variance of the distribution = σ 2 = 4
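Let $S = X_1 + X_2 + \ldots + X_n$ and

$$\bar{X} = \frac{X_1 + X_2 + \ldots + X_n}{n}$$

To find: value of $n$ such that $P(3.8 \leq \bar{X} \leq 4.2) \geq 0.95$

By CLT, we know that

$$
\begin{aligned}
\frac{S - n\mu}{\sigma\sqrt{n}} &\sim \text{Normal}(0, 1) \\
\Rightarrow \frac{S - 4n}{2\sqrt{n}} &\sim \text{Normal}(0, 1) \quad \ldots(1)
\end{aligned}
$$

$$
\begin{aligned}
& P(3.8 \leq \bar{X} \leq 4.2) \geq 0.95 \\
\Rightarrow\; & P\left(3.8 \leq \frac{S}{n} \leq 4.2\right) \geq 0.95 \\
\Rightarrow\; & P\left(-0.2 \leq \frac{S}{n} - 4 \leq 0.2\right) \geq 0.95 \\
\Rightarrow\; & P\left(-0.2 \leq \frac{S - 4n}{n} \leq 0.2\right) \geq 0.95 \\
\Rightarrow\; & P\left(-0.1 \leq \frac{S - 4n}{2n} \leq 0.1\right) \geq 0.95 \\
\Rightarrow\; & P\left(-0.1\sqrt{n} \leq \frac{S - 4n}{2\sqrt{n}} \leq 0.1\sqrt{n}\right) \geq 0.95 \\
\Rightarrow\; & F_Z(0.1\sqrt{n}) - F_Z(-0.1\sqrt{n}) \geq 0.95 \\
\Rightarrow\; & F_Z(0.1\sqrt{n}) - (1 - F_Z(0.1\sqrt{n})) \geq 0.95 \\
\Rightarrow\; & 2F_Z(0.1\sqrt{n}) - 1 \geq 0.95 \\
\Rightarrow\; & F_Z(0.1\sqrt{n}) \geq 0.975 \\
\Rightarrow\; & 0.1\sqrt{n} \geq 1.96 \\
\Rightarrow\; & n \geq 384.16
\end{aligned}
$$

Since $n$ must be an integer, $n$ should be at least 385.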
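The same rearrangement can be carried out numerically. The sketch below is an illustration under the CLT approximation used in the solution; `clt_sample_size` is a made-up name, and it obtains the 0.975 quantile from `statistics.NormalDist.inv_cdf` instead of the rounded hint value 1.96.

```python
from math import ceil
from statistics import NormalDist


def clt_sample_size(sigma, half_width, confidence):
    """Smallest n with P(|sample mean - mu| <= half_width) >= confidence, by CLT."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)   # two-sided critical value
    return ceil((z * sigma / half_width) ** 2)       # from half_width * sqrt(n) / sigma >= z


# Poisson(4): sigma = 2, interval 4 +/- 0.2, confidence 0.95
n = clt_sample_size(sigma=2.0, half_width=0.2, confidence=0.95)
print(n)
```

Using the exact quantile (about 1.95996) instead of 1.96 changes the threshold only slightly, and the rounded-up sample size is the same.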

12. Let the moment generating function of a random variable X be given by

$$M_X(\lambda) = \frac{1}{8}e^{-4\lambda} + \frac{1}{6}e^{-2\lambda} + \frac{1}{6}e^{2\lambda} + \frac{1}{8}e^{4\lambda} + \frac{5}{12}$$

Find the distribution of X. [1 mark]

1.
X           −4     −2     0      2      4
P(X = x)    1/8    1/6    1/6    1/8    5/12

2.
X           −4     −2     0      2      4
P(X = x)    5/12   1/8    1/6    1/6    1/8

3.
X           −4     −2     0      2      4
P(X = x)    1/8    1/6    5/12   1/6    1/8

4.
X           −4     −2     0      2      4
P(X = x)    1/8    1/6    5/12   1/8    1/6

Solution:
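The MGF of a discrete random variable X with the PMF $f_X(x) = P(X = x)$, $x \in T_X$
is given by

$$M_X(\lambda) = E[e^{\lambda X}] = \sum_{x \in T_X} P(X = x)\,e^{\lambda x}$$

Now, the MGF of the random variable X is given by

$$M_X(\lambda) = \frac{1}{8}e^{-4\lambda} + \frac{1}{6}e^{-2\lambda} + \frac{1}{6}e^{2\lambda} + \frac{1}{8}e^{4\lambda} + \frac{5}{12}$$

Comparing the two expressions term by term (the constant $\frac{5}{12}$ is the coefficient of $e^{0\lambda}$), the distribution of X is given by

X           −4     −2     0      2      4
P(X = x)    1/8    1/6    5/12   1/6    1/8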
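One way to sanity-check a PMF read off from an MGF is to rebuild the MGF from the PMF and compare the two at a few values of $\lambda$. The sketch below does this for the table above; the names `mgf_from_pmf` and `mgf_given` are illustrative, and agreement at a handful of points is a check, not a proof.

```python
from fractions import Fraction
from math import exp, isclose

# PMF read off from the MGF: value -> probability
pmf = {
    -4: Fraction(1, 8),
    -2: Fraction(1, 6),
    0: Fraction(5, 12),
    2: Fraction(1, 6),
    4: Fraction(1, 8),
}


def mgf_from_pmf(pmf, lam):
    """E[exp(lam * X)] computed as a sum over the support of X."""
    return sum(float(p) * exp(lam * x) for x, p in pmf.items())


def mgf_given(lam):
    """The MGF exactly as stated in the problem."""
    return exp(-4 * lam) / 8 + exp(-2 * lam) / 6 + exp(2 * lam) / 6 + exp(4 * lam) / 8 + 5 / 12


print(sum(pmf.values()))  # probabilities should add up to 1
print(all(isclose(mgf_from_pmf(pmf, lam), mgf_given(lam)) for lam in (-1.0, 0.0, 0.5, 1.0)))
```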

13. A fair die is rolled 3600 times. Use CLT to compute the probability that six appears at
most 630 times. Enter the answer correct to two decimal places.
(Hint: Use FZ (1.341) = 0.91)
Solution:
Define a random variable X such that

$$
X = \begin{cases} 1 & \text{if six appears on rolling a fair die} \\ 0 & \text{otherwise} \end{cases}
$$

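Therefore, $E[X] = \mu = \frac{1}{6}$ and

$$\mathrm{Var}(X) = \sigma^2 = \frac{1}{6} \cdot \frac{5}{6} = \frac{5}{36}$$

Let $X_1, X_2, \ldots, X_{3600}$ be the outcomes on rolling the fair die 3600 times.

Notice that $X_1 + X_2 + \ldots + X_{3600}$ will denote the number of times six appears in 3600 rolls.

Let $S = X_1 + X_2 + \ldots + X_{3600}$

To find: $P(S \leq 630)$

By CLT, with $n = 3600$, we know that

$$
\begin{aligned}
\frac{S - 3600\mu}{\sigma\sqrt{n}} &\sim \text{Normal}(0, 1) \\
\Rightarrow \frac{S - 600}{10\sqrt{5}} &\sim \text{Normal}(0, 1)
\end{aligned}
$$

Now,

$$
\begin{aligned}
P(S \leq 630) &= P(S - 600 \leq 30) \\
&= P\left(\frac{S - 600}{10\sqrt{5}} \leq \frac{30}{10\sqrt{5}}\right) \\
&= P(Z \leq 1.34) \\
&= 0.91
\end{aligned}
$$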
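The CLT step for a count of successes can be sketched in a few lines of Python. The helper `clt_count_cdf` is a hypothetical name; like the solution, it applies the plain normal approximation without a continuity correction.

```python
from math import sqrt
from statistics import NormalDist


def clt_count_cdf(n, p, k):
    """Normal approximation of P(S <= k) for S = number of successes in n trials."""
    mean = n * p                      # n * mu, here 3600 / 6 = 600
    sd = sqrt(n * p * (1 - p))        # sigma * sqrt(n), here 10 * sqrt(5)
    return NormalDist(mean, sd).cdf(k)


prob = clt_count_cdf(n=3600, p=1 / 6, k=630)
print(round(prob, 2))
```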

14. A fair die is rolled 1000 times. Let X denote the number of times six is obtained. Find
a bound for the probability that $\frac{X}{1000}$ differs from $\frac{1}{6}$ by more than 0.2 using the weak law
of large numbers.

1. at least $\frac{5}{1440}$
2. at least $\frac{1436}{1440}$
3. at most $\frac{5}{1440}$
4. at most $\frac{1436}{1440}$
Solution:
X denotes the number of times six is obtained on rolling the die 1000 times.
Let $X_1, X_2, \ldots, X_{1000}$ be 1000 i.i.d. samples such that

$$
X_i = \begin{cases} 1 & \text{if six appears on rolling a fair die} \\ 0 & \text{otherwise} \end{cases}
$$

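$E[X_i] = \mu = \frac{1}{6}$ and $\mathrm{Var}(X_i) = \sigma^2 = \frac{5}{36}$

Notice that $X = X_1 + X_2 + X_3 + \ldots + X_{1000}$, so $\frac{X}{1000}$ is the sample mean $\bar{X}$.

To find: Bound on $P\left(\left|\frac{X}{1000} - \frac{1}{6}\right| > 0.2\right)$.

By the weak law of large numbers, we have

$$
\begin{aligned}
P(|\bar{X} - \mu| > \delta) &\leq \frac{\sigma^2}{n\delta^2} \\
\Rightarrow P\left(\left|\frac{X}{1000} - \frac{1}{6}\right| > 0.2\right) &\leq \frac{5}{36 \times 1000 \times 0.04} \\
\Rightarrow P\left(\left|\frac{X}{1000} - \frac{1}{6}\right| > 0.2\right) &\leq \frac{5}{1440}
\end{aligned}
$$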

15. Consider the following PDF curves and match them with the correct distribution. [1 mark]

[Figure: four PDF curves, labelled Graph 1, Graph 2, Graph 3 and Graph 4]

(a) Graph 1 → Gamma, Graph 2 → Normal, Graph 3 → Gamma, Graph 4 → Beta.
(b) Graph 1 → Beta, Graph 2 → Gamma, Graph 3 → Normal, Graph 4 → Gamma.
(c) Graph 1 → Beta, Graph 2 → Normal, Graph 3 → Normal, Graph 4 → Gamma.
(d) Graph 1 → Gamma, Graph 2 → Normal, Graph 3 → Normal, Graph 4 → Beta.

Solution:
Graph 1: The range of the distribution is [0, 1] and the shape of the graph resembles the Beta
distribution.

Graph 2: The PDF curve is not symmetric about the mean and the shape of the graph resembles
the Gamma distribution.

Graph 3: The PDF curve is symmetric about the mean and the shape of the graph resembles the
Normal distribution.

Graph 4: The PDF curve is not symmetric about the mean and the shape of the graph resembles
the Gamma distribution.
Therefore, Graph 1 → Beta, Graph 2 → Gamma, Graph 3 → Normal, Graph 4 →
Gamma, which is option (b).

16. Let $X_1, X_2$ and $X_3 \sim$ i.i.d. X where X has the following probability mass function:

x           −1     2
f_X(x)      2/3    1/3

Table 7.1.P: PMF of X

Find the distribution of $Y = X_1 + X_2 + X_3$. [1 mark]

(a)
Y           −3     0      3      6
P(Y = y)    1/6    1/6    1/3    1/3

(b)
Y           −3     0      3      6
P(Y = y)    8/27   4/9    2/9    1/27

(c)
Y           −3     0      3      6
P(Y = y)    8/27   1/27   4/9    2/9

(d)
Y           −3     0      3      6
P(Y = y)    2/9    8/27   1/27   4/9

Solution:
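The PMF of X is given by

x           −1     2
f_X(x)      2/3    1/3

Given that $Y = X_1 + X_2 + X_3$ where $X_1, X_2$ and $X_3 \sim$ i.i.d. X.

To find: Distribution of Y.

We will find the distribution of Y by finding the MGF of Y.

$$
\begin{aligned}
M_Y(\lambda) &= E[e^{\lambda Y}] \\
&= E[e^{\lambda(X_1 + X_2 + X_3)}] \\
&= E[e^{\lambda X_1} e^{\lambda X_2} e^{\lambda X_3}] \\
&= E[e^{\lambda X_1}]E[e^{\lambda X_2}]E[e^{\lambda X_3}] \quad \text{(since } X_1, X_2 \text{ and } X_3 \text{ are independent)} \\
&= E[e^{\lambda X}]E[e^{\lambda X}]E[e^{\lambda X}] \quad \text{(since } X_1, X_2 \text{ and } X_3 \sim \text{i.i.d. } X\text{)} \\
&= [M_X(\lambda)]^3 \quad \ldots(1)
\end{aligned}
$$

Now,

$$
\begin{aligned}
M_X(\lambda) &= E[e^{\lambda X}] \\
&= e^{-\lambda} P(X = -1) + e^{2\lambda} P(X = 2) \\
&= \frac{2e^{-\lambda}}{3} + \frac{e^{2\lambda}}{3} \quad \ldots(2)
\end{aligned}
$$

From equations (1) and (2), we have

$$
\begin{aligned}
M_Y(\lambda) &= \left(\frac{2e^{-\lambda}}{3} + \frac{e^{2\lambda}}{3}\right)^3 \\
&= \frac{1}{27}(2e^{-\lambda} + e^{2\lambda})^3 \\
&= \frac{1}{27}(8e^{-3\lambda} + e^{6\lambda} + 12e^{-2\lambda}e^{2\lambda} + 6e^{-\lambda}e^{4\lambda}) \quad \text{(since } (a + b)^3 = a^3 + b^3 + 3a^2b + 3ab^2\text{)} \\
&= \frac{8}{27}e^{-3\lambda} + \frac{1}{27}e^{6\lambda} + \frac{4}{9} + \frac{2}{9}e^{3\lambda}
\end{aligned}
$$

Therefore, the distribution of Y is given by

Y           −3     0      3      6
P(Y = y)    8/27   4/9    2/9    1/27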
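The same table can be obtained without the MGF by convolving the PMF of X with itself twice, since the PMF of a sum of independent discrete variables is the convolution of their PMFs. This is a different route from the one in the solution, shown here only as a cross-check; the function name `convolve` is an illustrative choice.

```python
from collections import defaultdict
from fractions import Fraction


def convolve(pmf_a, pmf_b):
    """PMF of A + B for independent discrete A and B given as value -> probability."""
    out = defaultdict(Fraction)          # missing keys start at Fraction(0)
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] += pa * pb        # P(A = a) * P(B = b) contributes to a + b
    return dict(out)


pmf_x = {-1: Fraction(2, 3), 2: Fraction(1, 3)}
pmf_y = convolve(convolve(pmf_x, pmf_x), pmf_x)   # Y = X1 + X2 + X3

for value in sorted(pmf_y):
    print(value, pmf_y[value])
```

Exact fractions are used so the result can be compared with the options directly, with no rounding involved.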
