
Lecture09 Variance Inference Inf Stats FA24

The document discusses inferential statistics focusing on variance estimation and hypothesis testing using Chi-square and F-distributions. It covers confidence intervals for population variance, comparison of two variances, and the application of Chi-square tests for statistical hypotheses. Examples illustrate the construction of confidence intervals and the testing of variance differences between populations.

Inferential Statistics and

Applied Probability
Inference about Variance
Ch-9 from Michael Baron’s Book (Section 9.5)
Chi-square Test
Ch-10 from Michael Baron’s Book (Section 10.1)

1
Introduction
• We discuss confidence intervals for the population variance σ² and for the comparison of two variances σ_X² and σ_Y², i.e., testing the null hypothesis Var(X) = Var(Y) against the alternative Var(X) ≠ Var(Y)
• This is a different kind of inference, since
o Variance is a scale, not a location, parameter
o The distribution of its estimator (the sample variance) is not symmetric
• Variance often needs to be estimated or tested for quality control, in order to assess stability and accuracy, evaluate various risks, and also for tests and confidence intervals for the population means when the variance is unknown
2
Variance Estimator and Chi-square
distribution

3
Variance Estimation
• We start by estimating the population variance σ² from an observed sample X = (X₁, X₂, …, Xₙ); σ² can be estimated without bias by the sample variance
s² = Σᵢ (Xᵢ − X̄)² / (n − 1)
• For large samples, s² is approximately Normal under mild conditions. With small to moderate samples, however, its distribution is neither Normal nor symmetric, since s² is always non-negative
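As a minimal sketch of the estimator above (with hypothetical data, since the slide's numbers are not reproduced here), the unbiased sample variance can be computed directly:

```python
import statistics

# Hypothetical sample of n = 6 measurements (illustrative only)
x = [8.9, 9.1, 9.4, 8.7, 9.0, 9.3]
n = len(x)

x_bar = sum(x) / n                                  # sample mean
s2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)   # unbiased sample variance

# The standard library's statistics.variance uses the same (n - 1) divisor
print(round(s2, 4), round(statistics.variance(x), 4))
```

Note the (n − 1) divisor: dividing by n would give a biased (too small) estimate of σ².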
4
Distribution of Sample Variance

5
Chi-square Distribution
• Chi-square is a special case of the Gamma distribution: Chi-square(ν) = Gamma(ν/2, 1/2)
• In particular, a chi-square variable with ν = 2 degrees of freedom is Exponential(1/2)

6
Confidence Interval for Population
Variance

7
(1-α)100% Confidence Interval:
Population Variance
• We want to build a confidence interval for σ², based on a sample of size n
• We start with s², whose distribution is not symmetric, so the confidence interval will not have the form estimator ± margin as before
• Instead, we use the chi-square table to find the critical values χ²₍₁₋α/₂₎ and χ²₍α/₂₎ of the chi-square distribution with ν = n − 1 degrees of freedom
• These values cut off areas of α/2 on the right and on the left of the density curve, much like ±z₍α/₂₎ and ±t₍α/₂₎
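The construction can be sketched numerically. The sample variance below is hypothetical; the critical values are read from a standard chi-square table for 5 degrees of freedom:

```python
import math

# Sketch: 90% CI for sigma^2 with n = 6 (nu = 5 degrees of freedom).
# Table values for chi-square with 5 d.f.:
#   chi2_{0.05} = 11.07 (area 0.05 on the right), chi2_{0.95} = 1.145 (area 0.05 on the left)
n, s2 = 6, 0.0667          # s2 is a hypothetical sample variance
chi2_right, chi2_left = 11.07, 1.145

lower = (n - 1) * s2 / chi2_right   # dividing by the LARGER critical value gives the lower bound
upper = (n - 1) * s2 / chi2_left

print((round(lower, 4), round(upper, 4)),                        # CI for the variance
      (round(math.sqrt(lower), 4), round(math.sqrt(upper), 4)))  # CI for the std deviation
```

Taking square roots of the endpoints converts the variance interval into an interval for σ itself.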

8
Critical Values of Chi-square
Distribution
Recall that χ²₍α/₂₎ denotes the (1 − α/2)-quantile, q₍₁₋α/₂₎

9
Conf Intervals: Variance and std
deviation
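The interval formulas on this slide did not survive extraction; in the notation of the critical values defined above, the standard (1 − α)100% intervals (consistent with Baron §9.5) are:

```latex
\left[\,\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\,\right]
\ \text{for } \sigma^2,
\qquad
\left[\,\sqrt{\frac{(n-1)s^2}{\chi^2_{\alpha/2}}},\ \sqrt{\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}}\,\right]
\ \text{for } \sigma
```

Dividing by the larger critical value χ²₍α/₂₎ produces the lower endpoint, which is why the bounds appear "swapped" relative to the quantile order.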

10
Example 9.40: M. Baron
• Suppose a measurement device gives the data below (n = 6 measurements). Construct a 90% confidence interval for the standard deviation.

11
Example 9.40: Solution

12
Chi-square
distribution
table

13
Comparison of two variances: F-
distribution

14
Comparison of Two Variances
• Now we deal with two populations whose variances need to be compared
• Such inference is used to compare accuracy, uncertainty, or risks arising in two populations
• To compare variances or standard deviations, two independent samples X = (X₁, …, Xₙ) and Y = (Y₁, …, Y_m), one from each population, are collected, and the parameter of interest is the ratio θ = σ_X²/σ_Y²
• A natural estimator of the ratio of population variances is the ratio of sample variances,
θ̂ = s_X²/s_Y² = [Σ(Xᵢ − X̄)²/(n − 1)] / [Σ(Yᵢ − Ȳ)²/(m − 1)],
whose distribution is called the F-distribution, named after its developer, Sir Ronald Fisher
15
Sample Variances Ratio Distribution

• For Normal data, both (n − 1)s_X²/σ_X² and (m − 1)s_Y²/σ_Y² follow χ² distributions
• The ratio of two independent χ² variables, each divided by its degrees of freedom, has an F-distribution
• Being a ratio of two non-negative continuous random variables, any F-distributed variable is also non-negative and continuous
16
Sample Variances Ratio Distribution
• The F-distribution has two parameters, the numerator and denominator degrees of freedom, i.e., the degrees of freedom of the sample variances in the numerator and denominator of the F-ratio
• We use tables of critical values of the F-distribution to construct confidence intervals and test hypotheses comparing two variances
• When comparing σ_X² and σ_Y², we can divide s_X² by s_Y² or the other way around, keeping in mind that the first ratio has the F(n − 1, m − 1) distribution while the second has the F(m − 1, n − 1) distribution
• This leads to an important general conclusion:
o If F has the F(ν₁, ν₂) distribution, then 1/F has the F(ν₂, ν₁) distribution
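Both facts, that the F-ratio is a ratio of mean chi-squares and that 1/F swaps the degrees of freedom, can be checked with a small Monte Carlo sketch. The comparison targets (3.33 and 4.74) are standard table values for F₀.₀₅(5, 10) and F₀.₀₅(10, 5):

```python
import random

random.seed(1)

def chi2_sample(df):
    """One chi-square draw: the sum of df squared standard normals."""
    return sum(random.gauss(0.0, 1.0) ** 2 for _ in range(df))

# F(5, 10) variables as ratios of chi-square variables over their degrees of freedom
N = 100_000
f_draws = [(chi2_sample(5) / 5) / (chi2_sample(10) / 10) for _ in range(N)]

q95_F = sorted(f_draws)[int(0.95 * N)]                      # ~ F_{0.05}(5, 10) ≈ 3.33
q95_recip = sorted(1 / f for f in f_draws)[int(0.95 * N)]   # ~ F_{0.05}(10, 5) ≈ 4.74

print(round(q95_F, 2), round(q95_recip, 2))
```

The empirical 0.95-quantile of 1/F matches the table value for the reversed degrees of freedom, illustrating the reciprocal property.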

17
Critical Values: F-distribution and
reciprocal

18
Table A-7:
F-distribution
table

19
F-distribution
table

20
F-distribution
table

21
F-distribution
table

22
Conf Interval: Ratio of Population
Variances

23
Conf Interval: Parameter θ = σ_X²/σ_Y²

24
Confidence Interval: Ratio of
Variances (Final Equation)
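The final formula did not survive extraction; a reconstruction consistent with Baron §9.5 and the reciprocal property above gives the (1 − α)100% confidence interval for θ = σ_X²/σ_Y² as:

```latex
\left[\;\frac{s_X^2/s_Y^2}{F_{\alpha/2}(n-1,\,m-1)}\;,\ \ \frac{s_X^2}{s_Y^2}\,F_{\alpha/2}(m-1,\,n-1)\;\right]
```

The upper endpoint uses F₀ with reversed degrees of freedom because dividing by F₍₁₋α/₂₎(n − 1, m − 1) is the same as multiplying by F₍α/₂₎(m − 1, n − 1).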

25
Example 9.46

A data channel has an average speed of 180 Mbps. A hardware upgrade is supposed to improve the stability of data transfer while maintaining the same average speed. After the upgrade, the instantaneous speed of data transfer, measured at 16 random instants, yields a standard deviation of 14 Mbps. Records show that the standard deviation was 22 Mbps before the upgrade, based on 27 measurements at random times. Construct a 90% confidence interval for the relative change in the standard deviation (assume a Normal distribution of the speed).
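A computational sketch of this example follows. The two F critical values are assumptions, read approximately (by interpolation) from a standard F-table; the book's own table may give slightly different readings:

```python
import math

# After the upgrade: n = 16 measurements, s_X = 14 Mbps
# Before the upgrade: m = 27 measurements, s_Y = 22 Mbps
n, s_x = 16, 14.0
m, s_y = 27, 22.0
theta_hat = s_x**2 / s_y**2                 # estimated ratio of variances

# Assumed approximate table values:
#   F_{0.05}(15, 26) ≈ 2.07 and F_{0.05}(26, 15) ≈ 2.27
F_right, F_left = 2.07, 2.27

lo_var = theta_hat / F_right                # 90% CI for sigma_X^2 / sigma_Y^2
hi_var = theta_hat * F_left
lo_sd, hi_sd = math.sqrt(lo_var), math.sqrt(hi_var)   # CI for sigma_X / sigma_Y

# Relative change in the standard deviation, (sigma_X - sigma_Y) / sigma_Y:
print(round(lo_sd - 1, 3), round(hi_sd - 1, 3))
```

Since the whole interval for σ_X/σ_Y lies below 1, the upgrade appears to reduce the standard deviation, by very roughly 4% to 56% under the assumed table values.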

26
Example 9.46: Solution

27
F-distribution
table: Ex 9.46

28
F-tests comparing two variances

29
Hypothesis Testing
• Now we see how to test the null hypothesis H₀: σ_X²/σ_Y² = θ₀ against a two-sided alternative, using the ratio of variances
• Often we only need to know whether two variances are equal, i.e., we choose θ₀ = 1
• The F-test statistic for comparing variances is
F = (s_X²/s_Y²) / θ₀
• If X and Y are samples from Normal distributions, this F-statistic has the F-distribution with (n − 1) and (m − 1) degrees of freedom
• Just like χ², the F-statistic is non-negative, with a non-symmetric, right-skewed distribution
30
Hypothesis Testing
• For a given significance level α, the null hypothesis is rejected if the F-test statistic is either greater than F₍α/₂₎(ν₁, ν₂) or smaller than 1/F₍α/₂₎(ν₂, ν₁) [two-tailed F-test]

31
Example: F-Test
• Is there a difference between the variances of the number of weeks on the bestseller lists for nonfiction and fiction books? A sample of 15 New York Times bestselling fiction books had a standard deviation of 6.17 weeks on the list, while 16 New York Times bestselling nonfiction books had a standard deviation of 13.12 weeks. At the 10% significance level, can we conclude that there is a difference in the variances?
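One common way to carry out this two-tailed test is to put the larger sample variance in the numerator, so that only the upper critical value needs checking. The critical value below is an assumed approximate table reading for F₀.₀₅(15, 14):

```python
# Fiction: n = 15 books, s = 6.17 weeks; Nonfiction: m = 16 books, s = 13.12 weeks
n_f, s_f = 15, 6.17
n_nf, s_nf = 16, 13.12

# Larger sample variance in the numerator: F ~ F(15, 14) under H0 (equal variances)
F_stat = s_nf**2 / s_f**2

# Assumed approximate table value: F_{0.05}(15, 14) ≈ 2.46
F_crit = 2.46

print(round(F_stat, 2), F_stat > F_crit)   # reject H0 if the second value is True
```

Since the observed F-statistic (about 4.5) exceeds the critical value, we reject H₀ and conclude the variances differ at the 10% level.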

32
Solution

33
Solution

34
Solution

35
36
Chi-Square Tests

37
Chi-square Tests
• Several important tests of statistical hypotheses are
based on the Chi-square distribution, which we
already used to study the population variance
• Now we will develop tests based on the counts of
our sampling units that fall in various categories
• The general principle is to compare the observed counts against the expected counts via the chi-square statistic
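The comparison of observed versus expected counts can be sketched directly (the category counts below are hypothetical, for illustration only):

```python
# Chi-square statistic: sum over categories of (Obs - Exp)^2 / Exp
def chi_square_stat(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts over N = 4 categories, 100 observations expected uniformly
obs = [18, 22, 30, 30]
exp = [25, 25, 25, 25]

print(round(chi_square_stat(obs, exp), 2))
```

Large values of the statistic indicate that the observed counts deviate from the expected ones more than chance alone would explain.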

38
Chi-square Statistic and P-value

39
Sample Size
• Under the null hypothesis (no significant difference between observed and expected counts), the chi-square statistic follows a chi-square distribution with (N − 1) degrees of freedom, where N is the number of categories
• Large sample size requirement:
o The chi-square test works properly only for large enough sample sizes
o This is due to the CLT, which ensures that the chi-square statistic behaves as expected (i.e., follows a chi-square distribution) when the sample size is large
o A rule of thumb: each category should have an expected count of at least 5; otherwise the results may not be reliable:
Exp(k) ≥ 5 for all k = 1, 2, …, N
• What to do if counts are too small:
o If any category has an expected count less than 5, merge small categories together to increase their counts
o Once the categories are merged, the chi-square statistic is recalculated and the test is applied again
40
Application: Testing a distribution
• We may want to test whether a sample comes from the Normal distribution, whether inter-arrival times are Exponential and counts are Poisson, whether a random number generator returns high-quality Standard Uniform values, or whether a die is unbiased
• Steps to do so:
o Observe a sample (𝑋1,.. , 𝑋𝑛 ) of size n from distribution F and
test 𝐻0 : 𝐹 = 𝐹0 𝑣𝑠 𝐻𝐴 : 𝐹 ≠ 𝐹0 for some given distribution 𝐹0
o This is done by evaluating the χ2 statistic after ensuring that
all 𝐸𝑥𝑝(𝑘) ≥ 5 and then finding the corresponding P-value to
either accept or reject the null hypothesis
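The steps above can be sketched for the die-fairness case. The roll counts are hypothetical (the textbook's data are not reproduced here); the critical value 11.07 is the standard table entry for χ²₀.₀₅ with 5 degrees of freedom:

```python
# Hypothetical die experiment: 90 rolls, observed counts per face.
# Under H0 (fair die), each face is expected 90/6 = 15 times, so Exp(k) >= 5 holds.
obs = [20, 15, 12, 17, 14, 12]
exp = [sum(obs) / 6] * 6

chi2 = sum((o - e) ** 2 / e for o, e in zip(obs, exp))

# Table value: chi2_{0.05} = 11.07 with N - 1 = 5 degrees of freedom
reject = chi2 > 11.07
print(round(chi2, 2), reject)
```

Here the statistic stays well below the critical value, so at the 5% level there is no evidence that the die is biased.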

41
Example 10.1: Is the die biased?

42
Example 10.1: Baron (Solution)

43
Chi-square
table

44
Ex 7.9: Forsyth

Try!

45
Ex 7.9: Solution

46
Thanks!

47
