Lecture09 Variance Inference Inf Stats FA24
Applied Probability
Inference about Variance
Ch-9 from Michael Baron’s Book (Section 9.5)
Chi-square Test
Ch-10 from Michael Baron’s Book (Section 10.1)
1
Introduction
• We discuss confidence intervals for the population
variance σ² and the comparison of two variances
σ_X² and σ_Y², i.e., testing the null hypothesis
Var(X) = Var(Y) against the alternative Var(X) ≠ Var(Y)
• This inference is different because
o variance is a scale parameter, not a location parameter,
o the distribution of its estimator (the sample variance) is not symmetric
• Variance often needs to be estimated or tested for
quality control, in order to assess stability and accuracy,
evaluate various risks, and also for tests and confidence
intervals for population means when the variance is unknown
2
Variance Estimator and Chi-square
distribution
3
Variance Estimation
• We start by estimating the population variance σ² from
an observed sample X = (X₁, X₂, …, Xₙ); σ² is
estimated unbiasedly by the sample variance
s² = Σᵢ (Xᵢ − X̄)² / (n − 1)
5
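As a minimal sketch of the estimator above (with purely hypothetical data), the unbiased sample variance divides the sum of squared deviations by n − 1, which is exactly what `statistics.variance` computes:

```python
import statistics

# Hypothetical sample of n = 6 measurements (illustrative values only)
x = [2.3, 2.9, 2.5, 3.1, 2.7, 2.8]
n = len(x)

xbar = sum(x) / n
# Unbiased sample variance: divide by (n - 1), not n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)

# statistics.variance uses the same (n - 1) denominator
assert abs(s2 - statistics.variance(x)) < 1e-9
print(s2)
```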
Chi-square Distribution
• Chi-square is a special case of the Gamma distribution:
Chi-square(ν) = Gamma(ν/2, 1/2)
• Also, chi-square with ν = 2 degrees of freedom is Exponential(1/2)
6
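The ν = 2 special case can be checked numerically: plugging ν = 2 into the Gamma(ν/2, 1/2) density collapses it to the Exponential density with rate 1/2. A small stdlib-only sketch:

```python
import math

# Chi-square(v) density is the Gamma(v/2, 1/2) density (rate parameterization):
# f(x) = (1/2)^(v/2) / Gamma(v/2) * x^(v/2 - 1) * exp(-x/2)
def chi2_pdf(x, v):
    return (0.5 ** (v / 2)) / math.gamma(v / 2) * x ** (v / 2 - 1) * math.exp(-x / 2)

# Exponential(rate = 1/2) density: (1/2) * exp(-x/2)
def exp_pdf(x, lam=0.5):
    return lam * math.exp(-lam * x)

# For v = 2 the two densities coincide at every x > 0
for x in (0.5, 1.0, 3.0, 7.0):
    assert abs(chi2_pdf(x, 2) - exp_pdf(x)) < 1e-12
```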
Confidence Interval for Population
Variance
7
(1-α)100% Confidence Interval:
Population Variance
• We want to build a confidence interval for σ², based on a
sample of size n
• We start with s², whose distribution is not symmetric, so our
confidence interval won't have the shape estimator ± margin
as before
• Instead, we use the chi-square table to find the critical
values χ²_{1−α/2} and χ²_{α/2} of the Chi-square distribution
with ν = n − 1 degrees of freedom
• These values chop off areas of α/2 on the right and left
tails of the density curve, much like ±z_{α/2} and ±t_{α/2}
8
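The steps above can be sketched in code. This is a minimal illustration with hypothetical data for n = 6; the two critical values are the standard chi-square table entries for ν = 5 and α = 0.10 (a 90% interval):

```python
import math

# Hypothetical n = 6 measurements (illustrative values only)
x = [2.3, 2.9, 2.5, 3.1, 2.7, 2.8]
n = len(x)
xbar = sum(x) / n
s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)

chi2_upper = 11.070   # chi^2_{alpha/2}   with nu = 5 (right-tail area 0.05)
chi2_lower = 1.145    # chi^2_{1-alpha/2} with nu = 5 (left-tail area 0.05)

# (1 - alpha)100% CI for sigma^2: [(n-1)s^2 / chi^2_{a/2}, (n-1)s^2 / chi^2_{1-a/2}]
lo = (n - 1) * s2 / chi2_upper
hi = (n - 1) * s2 / chi2_lower
# CI for sigma itself: take square roots of both endpoints
print(lo, hi, math.sqrt(lo), math.sqrt(hi))
```

Note that the larger critical value produces the lower endpoint, because it sits in the denominator.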
Critical Values of Chi-square
Distribution
Recall that χ²_{α/2} denotes the (1 − α/2)-quantile, q_{1−α/2}
9
Conf Intervals: Variance and std
deviation
10
Example 9.40: M. Baron
• Suppose a measurement device gives the data below
(n = 6 measurements). Construct a
90% confidence interval for the standard deviation.
11
Example 9.40: Solution
12
Chi-square
distribution
table
13
Comparison of two variances: F-
distribution
14
Comparison of Two Variances
• Now, we deal with populations whose variances need to be
compared
• Such inference used for comparison of accuracy,
uncertainty, or risks arising in two populations
• To compare variances or std deviations, two independent
samples 𝑿 = 𝑋1 , . . 𝑋𝑛 𝑎𝑛𝑑 𝒀 = 𝑌1 , . . 𝑌𝑚 , one from each
population collected and their variances compared i.e. 𝜃 =
𝜎𝑋2 /𝜎𝑌2
• A natural estimator of the ratio of population variances is the
ratio of sample variances:
θ̂ = s_X²/s_Y² = [Σ(Xᵢ − X̄)²/(n − 1)] / [Σ(Yᵢ − Ȳ)²/(m − 1)],
whose distribution is called the F-distribution, named after its
developer, Sir Ronald Fisher
15
Sample Variances Ratio Distribution
• For Normal data, (n − 1)s_X²/σ_X² and (m − 1)s_Y²/σ_Y² follow χ² distributions
• The ratio of two independent χ² variables, each divided by its degrees of freedom, has an F-distribution
• Being a ratio of two non-negative continuous random variables, any
F-distributed variable is also non-negative and continuous
16
Sample Variances Ratio Distribution
• F-distribution has two parameters, the numerator and
denominator degrees of freedom i.e. the degrees of
freedom of sample variances in the numerator and
denominator of the F-ratio
• We use tables of critical values of F-distribution to
construct confidence intervals and test hypotheses
comparing two variances
• When comparing σ_X² and σ_Y², we can divide either s_X² by s_Y²
or the other way around, but we have to keep in mind that
the first ratio has an F(n − 1, m − 1) distribution while the second has an
F(m − 1, n − 1) distribution
• This leads us to an important general conclusion:
o If F has F(𝜈1 , 𝜈2 ) distribution, then the distribution of 1/F is
F(𝜈2 , 𝜈1 )
17
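The reciprocal property above can be checked by Monte Carlo simulation with the standard library alone, building F variates from sums of squared normals (all parameters below are illustrative): if F ~ F(ν₁, ν₂), then P(F < 1/t) should equal P(F′ > t) for F′ ~ F(ν₂, ν₁).

```python
import random

random.seed(0)

def chi2(v):
    # Chi-square(v) variate as a sum of v squared standard normals
    return sum(random.gauss(0, 1) ** 2 for _ in range(v))

def f_var(v1, v2):
    # F(v1, v2) variate as a ratio of scaled independent chi-squares
    return (chi2(v1) / v1) / (chi2(v2) / v2)

N = 50_000
v1, v2, t = 4, 6, 2.0
# If F ~ F(v1, v2), then 1/F ~ F(v2, v1), so these two tail
# probabilities agree up to Monte Carlo error
p_left = sum(f_var(v1, v2) < 1 / t for _ in range(N)) / N
p_right = sum(f_var(v2, v1) > t for _ in range(N)) / N
assert abs(p_left - p_right) < 0.02
```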
Critical Values: F-distribution and
reciprocal
18
Table A-7:
F-distribution
table
19
F-distribution
table
20
F-distribution
table
21
F-distribution
table
22
Conf Interval: Ratio of Population
Variances
23
Conf Interval: Parameter θ = σ_X²/σ_Y²
24
Confidence Interval: Ratio of
Variances (Final Equation)
25
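A minimal sketch of the resulting interval, [ (s_X²/s_Y²)/F_{α/2}(n−1, m−1), (s_X²/s_Y²)·F_{α/2}(m−1, n−1) ]: the two critical values are read from an F table, and all numbers in the example call below are hypothetical placeholders, not table entries.

```python
# (1 - alpha)100% CI for theta = sigma_X^2 / sigma_Y^2:
# [ (s_X^2/s_Y^2) / F_{a/2}(n-1, m-1),  (s_X^2/s_Y^2) * F_{a/2}(m-1, n-1) ]
def variance_ratio_ci(s2_x, s2_y, f_crit_nm, f_crit_mn):
    """f_crit_nm = F_{alpha/2}(n-1, m-1); f_crit_mn = F_{alpha/2}(m-1, n-1),
    both read from an F table (hypothetical values in the example below)."""
    ratio = s2_x / s2_y
    return ratio / f_crit_nm, ratio * f_crit_mn

# Hypothetical inputs, for illustration only
lo, hi = variance_ratio_ci(s2_x=4.0, s2_y=2.0, f_crit_nm=3.2, f_crit_mn=3.4)
print(lo, hi)  # → 0.625 6.8
```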
Example 9.46
26
Example 9.46: Solution
27
F-distribution
table: Ex 9.46
28
F-tests comparing two variances
29
Hypothesis Testing
• Now we see how to test the null hypothesis
H₀: σ_X²/σ_Y² = θ₀ against a two-sided alternative
• Often we only need to know whether the two variances are equal, i.e.,
we choose θ₀ = 1
• The F-test statistic for comparing variances
becomes
F = (s_X²/s_Y²) / θ₀
• If X and Y are samples from Normal distributions, this F-
statistic has an F-distribution with (n − 1) and (m − 1) degrees of
freedom
• Just like χ², the F-statistic is non-negative, with a non-
symmetric, right-skewed distribution
30
Hypothesis Testing
• For the given significance level 𝛼 , the null
hypothesis is rejected if the F-test statistic is either
greater than 𝐹𝛼/2 (𝜈1 , 𝜈2 ) or smaller than
1/𝐹𝛼/2 (𝜈2 , 𝜈1 ) [Two-tailed F-test]
31
Example: F-Test
• Is there a difference between variances of the
number of weeks on the best seller lists for
nonfiction and fiction books? 15 New York Times
bestselling fiction books had a standard deviation
of 6.17 weeks on the list. 16 New York Times
bestselling nonfiction books had a standard
deviation of 13.12 weeks. At the 10% significance
level, can we conclude there is a difference in the
variance?
32
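The arithmetic for this example can be sketched as follows, putting the larger sample variance (nonfiction) in the numerator so the observed statistic is compared with a right-tail critical value F_{α/2}(15, 14) from the table:

```python
# Book-list F-test: fiction s = 6.17 (n = 15),
# nonfiction s = 13.12 (m = 16), alpha = 0.10, theta_0 = 1
s_fiction, n = 6.17, 15
s_nonfiction, m = 13.12, 16

# Larger sample variance in the numerator; under H0 this ratio
# has an F(m - 1, n - 1) = F(15, 14) distribution
F = s_nonfiction ** 2 / s_fiction ** 2
print(round(F, 2))  # → 4.52
```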
Solution
33
Solution
34
Solution
35
36
Chi-Square Tests
37
Chi-square Tests
• Several important tests of statistical hypotheses are
based on the Chi-square distribution, which we
already used to study the population variance
• Now we will develop tests based on the counts of
our sampling units that fall in various categories
• The general principle is to compare the observed
counts against the expected counts via the chi-
square statistic
38
Chi-square Statistic and P-value
39
Sample Size
• Under the null hypothesis (no significant difference
between observed and expected values), the Chi-square
statistic follows a Chi-square distribution with (N-1) degrees
of freedom, where N is the number of categories.
• Large sample size requirement:
• Chi-square test works properly only for large enough sample sizes.
• This is due to the CLT, which ensures that the Chi-square statistic
behaves as expected (i.e., follows a Chi-square distribution) when
the sample size is large.
• A rule of thumb: Each category should have an expected count of
at least 5. If this is not the case, the results may not be reliable.
Exp(k) ≥ 5 for all k = 1, 2, …, N
• What to do if counts are too small:
• If any category has an expected count less than 5, the solution is to
merge small categories together to increase their counts.
• Once the categories are merged, the Chi-square statistic is
recalculated, and the test is applied again.
40
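The merging step can be sketched as a simple pass over the categories, folding each undersized category into its neighbour until every expected count reaches 5 (the counts below are hypothetical, for illustration only):

```python
# Sketch: merge adjacent categories until every expected count is >= 5
# (hypothetical observed/expected counts, illustrative only)
obs = [12, 18, 25, 3, 2]
exp = [11.0, 19.5, 24.5, 3.0, 2.0]

merged_obs, merged_exp = [], []
for o, e in zip(obs, exp):
    if merged_exp and merged_exp[-1] < 5:
        # Previous category still too small: fold this one into it
        merged_obs[-1] += o
        merged_exp[-1] += e
    else:
        merged_obs.append(o)
        merged_exp.append(e)

# If the last category is still below 5, fold it into its neighbour
if len(merged_exp) > 1 and merged_exp[-1] < 5:
    merged_obs[-2] += merged_obs.pop()
    merged_exp[-2] += merged_exp.pop()

print(merged_exp)  # → [11.0, 19.5, 24.5, 5.0]
```

After merging, the chi-square statistic is recomputed with the reduced number of categories (and correspondingly fewer degrees of freedom).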
Application: Testing a distribution
• We may want to test whether a sample comes from the
Normal distribution, whether inter-arrival times are
Exponential and counts are Poisson, whether a
random number generator returns high quality
Standard Uniform values, or whether a die
is unbiased
• Steps to do so:
o Observe a sample (𝑋1,.. , 𝑋𝑛 ) of size n from distribution F and
test 𝐻0 : 𝐹 = 𝐹0 𝑣𝑠 𝐻𝐴 : 𝐹 ≠ 𝐹0 for some given distribution 𝐹0
o This is done by evaluating the χ2 statistic after ensuring that
all 𝐸𝑥𝑝(𝑘) ≥ 5 and then finding the corresponding P-value to
either accept or reject the null hypothesis
41
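The steps above can be sketched for the fair-die case with hypothetical counts of n = 120 rolls; 11.07 is the standard chi-square critical value for 5 degrees of freedom at α = 0.05:

```python
# Sketch: chi-square test that a die is unbiased
obs = [20, 15, 12, 25, 28, 20]   # hypothetical observed counts per face
n = sum(obs)                      # 120 rolls
exp = [n / 6] * 6                 # 20 expected per face under H0 (fair die)

# All expected counts are >= 5, so the chi-square test applies directly
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
critical = 11.07                  # chi^2 with N - 1 = 5 df, right-tail 0.05

print(round(chi2_stat, 2), chi2_stat > critical)  # → 8.9 False
```

With these counts the statistic (8.9) falls below the critical value, so this hypothetical sample would not reject fairness at the 5% level.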
Example 10.1: Is the die biased?
42
Example 10.1: Baron (Solution)
43
Chi-square
table
44
Ex 7.9: Forsyth
Try!
45
Ex 7.9: Solution
46
Thanks!
47