Applied Statistics Lecture 11


MA2540/MA4240: Applied Statistics

Dr. Sameen Naqvi


Department of Mathematics, IIT Hyderabad
Email id: sameen@math.iith.ac.in
Example 6

Suppose we have conducted a study on the acidity levels of 61 soil
samples collected from a particular agricultural region. The sample
mean acidity level is 5.8 pH units, and the sample standard
deviation is 0.6 pH units.

(a) Calculate the 95% confidence interval for the mean acidity
level in the soil of this agricultural region.

(b) Calculate the 99% confidence interval for the mean acidity
level in the soil of this agricultural region.
Solution

(a) Given: $n = 61$, $\mathrm{df} = 60$, $\bar{X} = 5.8$, $S = 0.6$, $t_{0.025,\,n-1} = 2.000$.

The interval is:
$$5.8 \pm 2.000 \cdot \frac{0.6}{\sqrt{61}} = 5.8 \pm 0.154, \quad \text{i.e., } (5.646,\ 5.954).$$

Thus, we are 95% confident that the mean acidity level in the
soil is between 5.646 pH and 5.954 pH.

Width of interval = 0.308.


Solution

(b) Given: $n = 61$, $\mathrm{df} = 60$, $\bar{X} = 5.8$, $S = 0.6$, $t_{0.005,\,n-1} = 2.660$.

The interval is:
$$5.8 \pm 2.660 \cdot \frac{0.6}{\sqrt{61}} = 5.8 \pm 0.204, \quad \text{i.e., } (5.596,\ 6.004).$$

Thus, we are 99% confident that the mean acidity level in the
soil is between 5.596 pH and 6.004 pH.

Width of interval = 0.408.
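
The arithmetic in Example 6 can be checked with a short Python sketch. The function name `t_interval` is an illustrative choice, and the critical values are passed in as t-table look-ups (2.000 and 2.660 for 60 degrees of freedom) rather than computed.

```python
import math

def t_interval(mean, sd, n, t_crit):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Example 6: n = 61, sample mean 5.8, sample standard deviation 0.6.
# The critical values are t-table look-ups for df = 60.
ci95 = t_interval(5.8, 0.6, 61, t_crit=2.000)
ci99 = t_interval(5.8, 0.6, 61, t_crit=2.660)

for label, (lo, hi) in (("95%", ci95), ("99%", ci99)):
    print(f"{label} CI: ({lo:.3f}, {hi:.3f}), width = {hi - lo:.3f}")
```

Printing the width next to each interval makes the trade-off between confidence level and precision visible at a glance.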


Confidence and Precision

- Wider intervals have poorer precision.

- The higher the confidence level, the wider the interval, and thus the lower the precision. In Example 6, moving from 95% to 99% confidence widens the interval from 0.308 to 0.408.
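Determining Sample Size: t-interval

- Since the margin of error is $E = t_{\alpha/2,\,n-1} \dfrac{S}{\sqrt{n}}$, we determine the sample size by solving the equation for $n$:
$$n = \frac{(t_{\alpha/2,\,n-1})^2 S^2}{E^2}.$$

- Here, an approximate value of $S$ is $S \approx \dfrac{\text{range}}{4}$.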
Determining Sample Size: t-interval contd.

- Note that the t-value on the right-hand side depends on $n$.

- Crude method: Simply replace the t-value, which depends on $n$, with a z-value, which does not, because as $n$ increases the t-distribution approaches the standard normal distribution. Thus,
$$n \approx \frac{(z_{\alpha/2})^2 S^2}{E^2}.$$

- Iterative method: Start with an initial guess for $n$, plug it into the right-hand side of the formula to obtain a new $n$, and repeat until $n$ stops changing.
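
Both methods can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the values $S = 0.6$ and $E = 0.2$ are made up for the demonstration, `T_TABLE_0025` is a four-row excerpt of a standard t-table for $\alpha/2 = 0.025$, and `t_crit_from_table` is a hypothetical helper that picks the largest tabulated df not exceeding the requested one (a conservative choice). With a statistics library available, an exact t-quantile function could be passed in instead.

```python
import math
from statistics import NormalDist

def crude_sample_size(s, e, conf=0.95):
    """Crude method: use z_{alpha/2} in place of the t-value."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil((z * s / e) ** 2)

# Coarse excerpt of a t-table for alpha/2 = 0.025 (df -> critical value).
T_TABLE_0025 = {30: 2.042, 40: 2.021, 60: 2.000, 120: 1.980}

def t_crit_from_table(df):
    """Hypothetical helper: largest tabulated df <= df (conservative)."""
    usable = [d for d in T_TABLE_0025 if d <= df]
    if not usable:
        raise ValueError("df is below the range of this small table")
    return T_TABLE_0025[max(usable)]

def iterative_sample_size(s, e, t_crit, n0, max_iter=50):
    """Iterative method: repeat n <- ceil((t_{n-1} * s / e)^2) until stable."""
    n = n0
    for _ in range(max_iter):
        new_n = math.ceil((t_crit(n - 1) * s / e) ** 2)
        if new_n == n:
            break
        n = new_n
    return n

s, e = 0.6, 0.2  # illustrative values, not taken from the lecture
n_crude = crude_sample_size(s, e)
n_iter = iterative_sample_size(s, e, t_crit_from_table, n0=n_crude)
print(n_crude, n_iter)
```

The crude z-based answer serves as the starting guess for the iteration, which then settles on a slightly larger $n$ because t-values exceed the corresponding z-value.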
CI for population variance
(C). CI for population variance

Theorem 3

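If $X_1, X_2, \ldots, X_n$ are normally distributed and $a = \chi^2_{1-\alpha/2,\,n-1}$ and $b = \chi^2_{\alpha/2,\,n-1}$, then a $(1-\alpha)100\%$ CI for the population variance $\sigma^2$ is:
$$\left( \frac{(n-1)S^2}{b},\ \frac{(n-1)S^2}{a} \right)$$
and a $(1-\alpha)100\%$ CI for the population standard deviation $\sigma$ is:
$$\left( \frac{\sqrt{n-1}\,S}{\sqrt{b}},\ \frac{\sqrt{n-1}\,S}{\sqrt{a}} \right).$$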
(C). CI for population variance contd.
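Proof
It is known that if $X_1, X_2, \ldots, X_n$ are normally distributed with mean $\mu$ and population variance $\sigma^2$, then
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}.$$

With $a = \chi^2_{1-\alpha/2,\,n-1}$ and $b = \chi^2_{\alpha/2,\,n-1}$, which cut off probability $\alpha/2$ in the lower and upper tails of the $\chi^2_{n-1}$ distribution respectively, we have
$$P\left[ a \le \frac{(n-1)S^2}{\sigma^2} \le b \right] = 1 - \alpha.$$

Considering
$$a \le \frac{(n-1)S^2}{\sigma^2} \le b,$$
and simplifying (take reciprocals, which reverses the inequalities, then multiply through by $(n-1)S^2$), we get
$$\frac{(n-1)S^2}{b} \le \sigma^2 \le \frac{(n-1)S^2}{a}.$$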
Example 7

- A pharmaceutical company produces pills with an intended active ingredient concentration of 20 milligrams per tablet. A quality control analyst at the company is concerned about the variation in the actual active ingredient concentrations and wants to estimate the population standard deviation ($\sigma$) of the concentrations.
- To do this, the analyst randomly selects a sample of $n = 15$ pills from a production batch and measures their active ingredient concentrations. The sample yields a sample variance of 3.6.
- Use this random sample data to calculate a 95% confidence interval for $\sigma$ of the active ingredient concentrations in these pills.
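Solution
Here, with $n = 15$, $S^2 = 3.6$ and $\alpha = 0.05$,
$$a = \chi^2_{1-\alpha/2,\,n-1} = \chi^2_{0.975,\,14} = 5.629$$
and
$$b = \chi^2_{\alpha/2,\,n-1} = \chi^2_{0.025,\,14} = 26.119.$$
Substituting in the formula,
$$\frac{14 \times 3.6}{26.119} \le \sigma^2 \le \frac{14 \times 3.6}{5.629},$$
and simplifying, we get the 95% confidence interval for $\sigma^2$:
$$1.93 \le \sigma^2 \le 8.95.$$
Taking square roots leads to the 95% confidence interval for $\sigma$:
$$1.39 \le \sigma \le 2.99.$$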
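
As a quick check of Example 7, the interval can be reproduced in a few lines of Python. The function name `variance_ci` is illustrative, and the two chi-square critical values are supplied as table look-ups rather than computed.

```python
import math

def variance_ci(s2, n, chi2_lower, chi2_upper):
    """CI for sigma^2: ((n-1)*s2/b, (n-1)*s2/a), with a = chi2_lower, b = chi2_upper."""
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

# Example 7: n = 15, sample variance 3.6.
# chi-square table values for 14 df: 5.629 (lower) and 26.119 (upper).
var_lo, var_hi = variance_ci(3.6, 15, chi2_lower=5.629, chi2_upper=26.119)
sd_lo, sd_hi = math.sqrt(var_lo), math.sqrt(var_hi)

print(f"sigma^2: ({var_lo:.2f}, {var_hi:.2f})")
print(f"sigma:   ({sd_lo:.2f}, {sd_hi:.2f})")
```

Note that the larger critical value produces the lower endpoint, which is why the arguments are named rather than positional.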
CI for population proportion
(D). CI for Population Proportion
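Theorem 4
For large random samples, a $100(1-\alpha)\%$ CI for the population proportion $p$ is:
$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

- Proof. We know that, for large $n$, approximately
$$Z = \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} \sim N(0, 1).$$

Now,
$$P\left[ -z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} \le z_{\alpha/2} \right] \approx 1 - \alpha.$$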
(D). CI for Population Proportion contd.
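Now, consider the inequality inside the brackets:
$$-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{\dfrac{p(1-p)}{n}}} \le z_{\alpha/2}$$
$$-z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \le \hat{p} - p \le z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$$
$$-\hat{p} - z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \le -p \le -\hat{p} + z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$$
$$\hat{p} - z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \le p \le \hat{p} + z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}}$$
Replace the population proportion ($p$) that appears at the endpoints of the interval with the sample proportion ($\hat{p}$) to get an (approximate) $100(1-\alpha)\%$ CI for $p$:
$$\hat{p} - z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le p \le \hat{p} + z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$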
Example 8

- A marketing agency conducted a survey to investigate the preference for eco-friendly packaging among consumers in a city.

- Out of 600 respondents, 420 expressed a preference for eco-friendly packaging.

- Using this sample proportion, the marketing agency wants to estimate, with 95% confidence, the parameter $p$, which is the proportion of all consumers in the city who prefer eco-friendly packaging. What is the confidence interval for $p$ based on this sample proportion?
Solution

Given: $n = 600$, sample proportion $\hat{p} = \dfrac{420}{600} = 0.70$, and $z_{0.025} = 1.96$. Substituting in the formula for the CI for $p$, we get:
$$0.70 \pm 1.96 \sqrt{\frac{0.70\,(1 - 0.70)}{600}},$$
i.e.,
$$0.70 \pm 0.037 = (0.663,\ 0.737).$$
Thus, we can be 95% confident that between 66.3% and 73.7% of
the population in the city prefer eco-friendly packaging.
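
The calculation in Example 8 can be sketched in Python using only the standard library. The function name `proportion_ci` is an illustrative choice; `NormalDist().inv_cdf` supplies $z_{\alpha/2}$ (about 1.96 for 95% confidence).

```python
import math
from statistics import NormalDist

def proportion_ci(successes, n, conf=0.95):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example 8: 420 of 600 respondents prefer eco-friendly packaging.
lo, hi = proportion_ci(420, 600)
print(f"95% CI for p: ({lo:.3f}, {hi:.3f})")
```

Changing `conf` to 0.99 shows how much wider the interval becomes for the same data.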
CIs for difference of two population means
CIs for µ1 − µ2

(A.) when the populations are independent and normally distributed with unknown common variance $\sigma^2$: Two-sample pooled t-interval.

(B.) when the populations are independent and normally distributed with unknown and unequal variances: Welch's t-interval.

(C.) when the populations are dependent and normally distributed: Paired t-interval.
(A). Two-Sample Pooled t-interval
Theorem 1
If $X_1, X_2, \ldots, X_n \sim N(\mu_1, \sigma^2)$ and $Y_1, Y_2, \ldots, Y_m \sim N(\mu_2, \sigma^2)$ are independent random samples, then a $(1-\alpha)100\%$ CI for the difference in the population means, $\mu_1 - \mu_2$, is:
$$(\bar{X} - \bar{Y}) \pm t_{\alpha/2,\,n+m-2}\, S_p \sqrt{\frac{1}{n} + \frac{1}{m}},$$
where $S_p^2$, the "pooled sample variance",
$$S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2},$$
is an unbiased estimator (UE) of the common variance $\sigma^2$.

- Note: See Theorem 4, Week 5.


(A). Two-Sample Pooled t-interval contd.
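Proof:
It is known that
$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} \sim t_{n+m-2}.$$

Also,
$$P\left[ -t_{\alpha/2,\,n+m-2} \le \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{S_p \sqrt{\dfrac{1}{n} + \dfrac{1}{m}}} \le t_{\alpha/2,\,n+m-2} \right] = 1 - \alpha.$$

Consider the inequality within the bracket:
$$-t_{\alpha/2,\,n+m-2}\, S_p \sqrt{\frac{1}{n} + \frac{1}{m}} \le (\bar{X} - \bar{Y}) - (\mu_1 - \mu_2) \le t_{\alpha/2,\,n+m-2}\, S_p \sqrt{\frac{1}{n} + \frac{1}{m}}.$$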
(A). Two-Sample Pooled t-interval contd.

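On simplification, we get
$$(\bar{X} - \bar{Y}) - t_{\alpha/2,\,n+m-2}\, S_p \sqrt{\frac{1}{n} + \frac{1}{m}} \le \mu_1 - \mu_2 \le (\bar{X} - \bar{Y}) + t_{\alpha/2,\,n+m-2}\, S_p \sqrt{\frac{1}{n} + \frac{1}{m}}.$$

Thus, the $(1-\alpha)100\%$ CI for the difference in the population means is
$$(\bar{X} - \bar{Y}) \pm t_{\alpha/2,\,n+m-2}\, S_p \sqrt{\frac{1}{n} + \frac{1}{m}}.$$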
Example 1

Suppose the weekly numbers of products sold by two sales teams, A and B, are as follows:

Team A: 28, 35, 30, 32, 29, 34, 31, 33, 27, 36, 30, 32, 28, 35, 31, 33, 29, 34, 30, 32, 31
Team B: 24, 29, 26, 31, 27, 30, 28, 32, 25, 33, 29, 31, 24, 29, 28, 32, 26, 30, 27, 31, 28

Is there statistically significant evidence to conclude that there is a difference in the average number of products sold between the two sales teams?
Solution

- Let $X_i$ and $Y_i$ be the number of products sold by Team A and Team B in the $i$th week, respectively.

- Since the sample variances $S_X^2 = 6.36$ and $S_Y^2 = 6.96$ (computed from the data with divisor $n - 1$) are not that different, we can assume the population variances are similar.

- The pooled sample variance is
$$S_p^2 = \frac{(21-1)\,6.36 + (21-1)\,6.96}{21 + 21 - 2} = 6.66,$$
which implies $S_p = 2.58$.

Solution contd.

- For $m = n = 21$, if we calculate a 95% CI, we have $t_{0.025,\,21+21-2} = t_{0.025,\,40} = 2.021$.

- Also, $\bar{x} = 31.43$ and $\bar{y} = 28.57$. Thus, the 95% CI for the difference in population means is
$$(31.43 - 28.57) \pm 2.021\,(2.58) \sqrt{\frac{1}{21} + \frac{1}{21}} = 2.86 \pm 1.61 = (1.25,\ 4.47).$$

- Since the interval does not contain the value 0, we can conclude that the population means differ.
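
The pooled interval can be recomputed directly from the raw Example 1 data with the sketch below. The function name `pooled_t_interval` is illustrative, `statistics.variance` uses the divisor $n - 1$, and the critical value 2.021 is a t-table look-up for 40 degrees of freedom. Because the code works with unrounded means and variances, its endpoints may differ from hand calculations in the third decimal place.

```python
import math
from statistics import mean, variance

team_a = [28, 35, 30, 32, 29, 34, 31, 33, 27, 36, 30,
          32, 28, 35, 31, 33, 29, 34, 30, 32, 31]
team_b = [24, 29, 26, 31, 27, 30, 28, 32, 25, 33, 29,
          31, 24, 29, 28, 32, 26, 30, 27, 31, 28]

def pooled_t_interval(x, y, t_crit):
    """Return (lower, upper, pooled variance) for mu_x - mu_y."""
    n, m = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    margin = t_crit * math.sqrt(sp2) * math.sqrt(1 / n + 1 / m)
    diff = mean(x) - mean(y)
    return diff - margin, diff + margin, sp2

# t_{0.025, 40} = 2.021 is taken from a t-table.
lo, hi, sp2 = pooled_t_interval(team_a, team_b, t_crit=2.021)
print(f"Sp^2 = {sp2:.2f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Working from the raw data avoids carrying rounding error from intermediate summary statistics into the final interval.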
(B). Welch's t-interval (if $\sigma_X^2 \ne \sigma_Y^2$)
Theorem 2
If the data are normally distributed and the population variances $\sigma_X^2$ and $\sigma_Y^2$ can't be assumed to be equal, then a $(1-\alpha)100\%$ CI for the difference in the population means, $\mu_X - \mu_Y$, is:
$$(\bar{X} - \bar{Y}) \pm t_{\alpha/2,\,r} \sqrt{\frac{S_X^2}{n} + \frac{S_Y^2}{m}},$$
where the $r$ degrees of freedom are approximated by:
$$r = \frac{\left( \dfrac{S_X^2}{n} + \dfrac{S_Y^2}{m} \right)^2}{\dfrac{(S_X^2/n)^2}{n-1} + \dfrac{(S_Y^2/m)^2}{m-1}}.$$

- Note: See Theorem 3, Week 5.


Example 2

- From the data in Example 1, the summary statistics (sample variances computed with divisor $n - 1$) are:

$n = 21$, $\bar{x} = 31.43$, $S_X^2 = 6.36$
$m = 21$, $\bar{y} = 28.57$, $S_Y^2 = 6.96$.

- What is the difference, if any, in the mean number of products sold by the sales teams (Team A and Team B)?

Solution

- Here,
$$r = \frac{\left( \dfrac{S_X^2}{n} + \dfrac{S_Y^2}{m} \right)^2}{\dfrac{(S_X^2/n)^2}{n-1} + \dfrac{(S_Y^2/m)^2}{m-1}} = \frac{\left( \dfrac{6.36}{21} + \dfrac{6.96}{21} \right)^2}{\dfrac{(6.36/21)^2}{20} + \dfrac{(6.96/21)^2}{20}} \approx 39.92.$$

- So, $\lceil r \rceil = 40$.

- Using a t-table, we get $t_{0.025,\,40} = 2.021$.

- Thus, Welch's interval is
$$(31.43 - 28.57) \pm 2.021 \sqrt{\frac{6.36}{21} + \frac{6.96}{21}} = 2.86 \pm 1.61.$$

Solution contd.

- Thus, the 95% CI for $\mu_X - \mu_Y$ is $(1.25,\ 4.47)$.

- Recall from Example 1 that the two-sample pooled t-interval was $(1.25,\ 4.47)$.

- Comparing the two intervals, we note that they agree to two decimal places. The reason is that the sample sizes are equal and the sample variances aren't really all that different.

- Rule of thumb: Use Welch's interval if
$$\frac{S_X^2}{S_Y^2} > 4 \quad \text{or} \quad \frac{S_Y^2}{S_X^2} > 4.$$
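
A minimal Python sketch of Welch's degrees of freedom and interval, computed from the raw Example 1 data, is given below. The function names `welch_df` and `welch_interval` are illustrative, the sample variances use the divisor $n - 1$, and the critical value 2.021 is a t-table look-up for 40 degrees of freedom.

```python
import math
from statistics import mean, variance

team_a = [28, 35, 30, 32, 29, 34, 31, 33, 27, 36, 30,
          32, 28, 35, 31, 33, 29, 34, 30, 32, 31]
team_b = [24, 29, 26, 31, 27, 30, 28, 32, 25, 33, 29,
          31, 24, 29, 28, 32, 26, 30, 27, 31, 28]

def welch_df(s2x, n, s2y, m):
    """Approximate degrees of freedom r for Welch's t-interval."""
    vx, vy = s2x / n, s2y / m
    return (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))

def welch_interval(xbar, s2x, n, ybar, s2y, m, t_crit):
    """Return (lower, upper) for mu_x - mu_y without pooling the variances."""
    margin = t_crit * math.sqrt(s2x / n + s2y / m)
    diff = xbar - ybar
    return diff - margin, diff + margin

n, m = len(team_a), len(team_b)
s2x, s2y = variance(team_a), variance(team_b)  # divisor n - 1
r = welch_df(s2x, n, s2y, m)
# t_{0.025, 40} = 2.021 is a t-table look-up for ceil(r) degrees of freedom.
lo, hi = welch_interval(mean(team_a), s2x, n, mean(team_b), s2y, m, t_crit=2.021)
print(f"r = {r:.2f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

With equal sample sizes the Welch and pooled standard errors coincide, so any difference between the two intervals comes only from the degrees of freedom.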
(C.) Paired t-interval

Theorem 3

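When dealing with pairs of dependent measurements, the sample mean difference $\bar{D}$ should be used to estimate the population mean difference $\mu_D$. The $(1-\alpha)100\%$ t-interval is
$$\bar{D} \pm t_{\alpha/2,\,n-1} \left( \frac{S_D}{\sqrt{n}} \right),$$
where $S_D$ is the sample standard deviation of the differences and $n$ is the number of pairs.

- Note: See Theorem 6, Week 5.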


Example 3

- Suppose you want to investigate whether the installation of a new air filtration system in a factory has had an impact on the level of a specific air pollutant (e.g., particulate matter) in the factory environment.

- The collected data on the concentration of particulate matter (in micrograms per cubic meter) before and after the installation of the filtration system, for ten different days, are as follows:

Day   Before installation   After installation
1     45                    38
2     50                    42
3     48                    40
4     55                    48
5     42                    35
6     47                    41
7     53                    45
8     52                    40
9     49                    39
10    46                    37
Solution

- $X_i$: concentration of particulate matter before installation of the filtration system.

- $Y_i$: concentration of particulate matter after installation of the filtration system.

- $X_i$ and $Y_i$ are measured on the same day and are therefore dependent. Working with the differences $D_i = X_i - Y_i$ reduces the paired data to a single sample, and the $D_i$'s can be treated as independent across days.
Solution contd.

Day Xi Yi Di = Xi − Yi
1 45 38 7
2 50 42 8
3 48 40 8
4 55 48 7
5 42 35 7
6 47 41 6
7 53 45 8
8 52 40 12
9 49 39 10
10 46 37 9
Solution contd.

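- Thus, the 95% CI for $\mu_D$ is
$$\bar{D} \pm t_{0.025,\,9} \frac{S_D}{\sqrt{n}}.$$
- From the given data, $\bar{D} = 8.2$ and $S_D = 1.751$, so we get
$$8.2 \pm 2.262 \cdot \frac{1.751}{\sqrt{10}} = 8.2 \pm 1.253 = (6.947,\ 9.453).$$
- Since the 95% confidence interval does not include 0 and lies entirely above it, we can conclude that installation of the filtration system has a significant effect in reducing the particulate matter.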
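
The paired interval can be computed from the before/after table with the following sketch. The function name `paired_t_interval` is illustrative, `statistics.stdev` uses the divisor $n - 1$, and the critical value 2.262 is a t-table look-up for 9 degrees of freedom.

```python
import math
from statistics import mean, stdev

before = [45, 50, 48, 55, 42, 47, 53, 52, 49, 46]
after = [38, 42, 40, 48, 35, 41, 45, 40, 39, 37]

def paired_t_interval(x, y, t_crit):
    """Return (lower, upper, mean difference, sd of differences) for mu_D."""
    d = [xi - yi for xi, yi in zip(x, y)]  # D_i = X_i - Y_i
    d_bar, s_d = mean(d), stdev(d)
    margin = t_crit * s_d / math.sqrt(len(d))
    return d_bar - margin, d_bar + margin, d_bar, s_d

# t_{0.025, 9} = 2.262 is taken from a t-table.
lo, hi, d_bar, s_d = paired_t_interval(before, after, t_crit=2.262)
print(f"D-bar = {d_bar:.1f}, S_D = {s_d:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The differences are formed first, after which the problem is an ordinary one-sample t-interval on the $D_i$.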
CIs for ratio of two population variances

Theorem 4
If $X_1, X_2, \ldots, X_n \sim N(\mu_X, \sigma_X^2)$ and $Y_1, Y_2, \ldots, Y_m \sim N(\mu_Y, \sigma_Y^2)$ are independent samples, then a $(1-\alpha)100\%$ CI for $\sigma_X^2/\sigma_Y^2$ is:
$$\left( \frac{1}{F_{\alpha/2}(n-1,\, m-1)} \cdot \frac{S_X^2}{S_Y^2},\ \ F_{\alpha/2}(m-1,\, n-1) \cdot \frac{S_X^2}{S_Y^2} \right).$$
CIs for ratio of two population variances contd.

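Proof
We know that
$$\frac{(n-1)S_X^2}{\sigma_X^2} \sim \chi^2_{n-1} \quad \text{and} \quad \frac{(m-1)S_Y^2}{\sigma_Y^2} \sim \chi^2_{m-1}.$$
Also, by the independence of the two samples,
$$F = \frac{\dfrac{(m-1)S_Y^2}{\sigma_Y^2} \Big/ (m-1)}{\dfrac{(n-1)S_X^2}{\sigma_X^2} \Big/ (n-1)} = \frac{\sigma_X^2}{\sigma_Y^2} \cdot \frac{S_Y^2}{S_X^2} \sim F(m-1,\, n-1).$$

Therefore,
$$P\left[ F_{1-\alpha/2}(m-1,\, n-1) \le \frac{\sigma_X^2}{\sigma_Y^2} \cdot \frac{S_Y^2}{S_X^2} \le F_{\alpha/2}(m-1,\, n-1) \right] = 1 - \alpha.$$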
CIs for ratio of two population variances contd.

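Simplifying the quantity within the bracket and using the fact that
$$F_{1-\alpha/2}(m-1,\, n-1) = \frac{1}{F_{\alpha/2}(n-1,\, m-1)},$$
the $(1-\alpha)100\%$ CI for $\sigma_X^2/\sigma_Y^2$ is:
$$\frac{1}{F_{\alpha/2}(n-1,\, m-1)} \cdot \frac{S_X^2}{S_Y^2} \le \frac{\sigma_X^2}{\sigma_Y^2} \le F_{\alpha/2}(m-1,\, n-1) \cdot \frac{S_X^2}{S_Y^2}.$$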
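Example 4

- From the data in Example 1, the summary statistics (sample variances computed with divisor $n - 1$) are:

$n = 21$, $\bar{x} = 31.43$, $S_X^2 = 6.36$
$m = 21$, $\bar{y} = 28.57$, $S_Y^2 = 6.96$.

- Estimate, with 95% confidence, the ratio of the two population variances.

Solution

- From the F-table,
$$F_{0.025}(20, 20) = 2.46 \quad \text{and} \quad F_{0.975}(20, 20) = \frac{1}{F_{0.025}(20, 20)} = \frac{1}{2.46}.$$

- Then, the 95% CI for $\sigma_X^2/\sigma_Y^2$ is
$$\frac{1}{2.46} \left( \frac{6.36}{6.96} \right) \le \frac{\sigma_X^2}{\sigma_Y^2} \le 2.46 \left( \frac{6.36}{6.96} \right).$$

- Simplifying, we get the 95% CI as $(0.371,\ 2.248)$. Since the interval contains 1, the data are consistent with equal population variances.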
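
The variance-ratio interval can be sketched in Python from the raw Example 1 data. The function name `variance_ratio_ci` is illustrative, the sample variances use the divisor $n - 1$, and the critical value $F_{0.025}(20, 20) \approx 2.46$ is supplied as an F-table look-up (both critical values coincide here because $n = m$).

```python
from statistics import variance

team_a = [28, 35, 30, 32, 29, 34, 31, 33, 27, 36, 30,
          32, 28, 35, 31, 33, 29, 34, 30, 32, 31]
team_b = [24, 29, 26, 31, 27, 30, 28, 32, 25, 33, 29,
          31, 24, 29, 28, 32, 26, 30, 27, 31, 28]

def variance_ratio_ci(s2x, s2y, f_upper_nm, f_upper_mn):
    """CI for sigma_x^2 / sigma_y^2.

    f_upper_nm = F_{alpha/2}(n-1, m-1); f_upper_mn = F_{alpha/2}(m-1, n-1).
    """
    ratio = s2x / s2y
    return ratio / f_upper_nm, ratio * f_upper_mn

# F_{0.025}(20, 20) = 2.46 is taken from an F-table.
lo, hi = variance_ratio_ci(variance(team_a), variance(team_b),
                           f_upper_nm=2.46, f_upper_mn=2.46)
print(f"95% CI for the variance ratio: ({lo:.3f}, {hi:.3f})")
```

Keeping the two critical values as separate arguments matters when $n \ne m$, since the degrees of freedom swap between the lower and upper endpoints.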


CIs for difference of two population proportions

Theorem 5
For large random samples, an approximate $100(1-\alpha)\%$ CI for the difference in two population proportions $p_1 - p_2$ is:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}.$$

Proof. We know that, for large samples, approximately
$$\hat{p}_1 = \frac{Y_1}{n_1} \sim N\left( p_1,\ \frac{p_1(1-p_1)}{n_1} \right)$$
and
$$\hat{p}_2 = \frac{Y_2}{n_2} \sim N\left( p_2,\ \frac{p_2(1-p_2)}{n_2} \right).$$
CIs for difference of two population proportions contd.
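By independence,
$$(\hat{p}_1 - \hat{p}_2) \sim N\left( p_1 - p_2,\ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} \right).$$

Now,
$$P\left[ -z_{\alpha/2} \le \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}} \le z_{\alpha/2} \right] \approx 1 - \alpha.$$

Simplifying the quantity within the bracket, and replacing $p_1$ and $p_2$ in the standard error with $\hat{p}_1$ and $\hat{p}_2$, we get the approximate $100(1-\alpha)\%$ CI for $p_1 - p_2$:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}.$$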
Example 5

- A marketing research company conducted a study to compare the effectiveness of two advertising campaigns, Campaign X and Campaign Y, in attracting new customers to a retail store.

- It was found that in a sample of 400 people who were exposed to Campaign X, 200 of them visited the store. For Campaign Y, in a sample of 250 people, 100 of them visited the store.

- Calculate a 95% confidence interval for the difference in the proportions of people who visited the store as a result of the two advertising campaigns (Campaign X and Campaign Y).
Solution

Data                            Campaign X   Campaign Y
Sample size                     400          250
# of people visited the store   200          100
Sample proportion               0.50         0.40

- Substituting in the formula, we get
$$(0.50 - 0.40) \pm 1.96 \sqrt{\frac{0.50 \times 0.50}{400} + \frac{0.40 \times 0.60}{250}},$$
which simplifies to
$$0.10 \pm 0.078 = (0.022,\ 0.178).$$

- We can be 95% confident that the proportion of people who visit the store is between 2.2 and 17.8 percentage points higher under Campaign X than under Campaign Y.
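
The two-proportion interval of Example 5 can be checked with a short Python sketch. The function name `two_proportion_ci` is an illustrative choice; `NormalDist().inv_cdf` supplies $z_{\alpha/2}$.

```python
import math
from statistics import NormalDist

def two_proportion_ci(y1, n1, y2, n2, conf=0.95):
    """Large-sample CI for p1 - p2 from success counts y1, y2."""
    p1, p2 = y1 / n1, y2 / n2
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z * math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - margin, diff + margin

# Example 5: 200 of 400 visited under Campaign X, 100 of 250 under Campaign Y.
lo, hi = two_proportion_ci(200, 400, 100, 250)
print(f"95% CI for p1 - p2: ({lo:.3f}, {hi:.3f})")
```

Because both endpoints are positive, the interval excludes 0, which matches the conclusion that Campaign X attracted a higher proportion of visitors.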
Thank you for listening!
