7 Chi-Square and F Distributions
• There are many theoretical distributions, both continuous and discrete. Howell calls these test statistics.
• We use four test statistics a lot: z (unit normal), t, chi-square (χ²), and F.
Control of Dispersion
• Taguchi defined Quality as Deviation from
target
• Cost of quality is proportional to square of
deviation from target
• Quality can be achieved only by reducing
variation
• Recent developments in Quality Engineering
include topics like Tolerance Design and
Robust Design
Control of dispersion (Cont)
• When research is undertaken to control dispersion, its success must be measured.
• This requires estimating the population variance from the sample variance.
• If the sample variance is V (computed with divisor n), V is not an unbiased estimate of the population variance.
• (n / (n − 1)) · V is an unbiased point estimate of the population variance.
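This correction can be checked by simulation; a minimal sketch in Python (standard library only, with arbitrary illustrative numbers): averaging the divide-by-n variance V over many samples underestimates the population variance, while (n/(n − 1))·V does not.

```python
import random

random.seed(1)

# A large "population" with standard deviation 10, so variance near 100
population = [random.gauss(50, 10) for _ in range(100_000)]
mu = sum(population) / len(population)
pop_var = sum((x - mu) ** 2 for x in population) / len(population)

n, trials = 5, 20_000
avg_V = 0.0
for _ in range(trials):
    sample = random.sample(population, n)
    m = sum(sample) / n
    avg_V += sum((x - m) ** 2 for x in sample) / n  # V: divisor n (biased)
avg_V /= trials

avg_unbiased = avg_V * n / (n - 1)  # apply the n/(n-1) correction

print(pop_var, avg_V, avg_unbiased)
```

On average V comes out near ((n − 1)/n)·σ², i.e. about 20% too small for n = 5, while the corrected estimate lands close to the true population variance.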
Interval Estimate of Variance
• Further distributions, such as the chi-square distribution and the F distribution, are required for interval estimation of the population variance from the sample variance.
Chi-square
• The chi-square test is an important test amongst the several tests of significance developed by statisticians.
• Chi-square, symbolically written as χ² (pronounced "ki-square"), is a statistical measure used in the context of sampling analysis for comparing a variance to a theoretical variance.
• As a non-parametric test, it "can be used to determine if categorical data shows dependency or if the two classifications are independent. It can also be used to make comparisons between theoretical populations and actual data when categories are used."
• The test is, in fact, a technique through which researchers can
  (i) test the goodness of fit;
  (ii) test the significance of association between two attributes; and
  (iii) test the homogeneity or the significance of population variance.
Chi-square as a Test for Comparing Variance
• The chi-square value is often used to judge the significance of population variance, i.e., we can use the test to judge whether a random sample has been drawn from a normal population with mean μ and a specified variance σp².
• The test is based on the χ²-distribution, i.e., it deals with collections of values that involve adding up squares.
• If we take each one of a collection of sample variances, divide it by the known population variance and multiply the quotient by (n − 1), where n is the number of items in the sample, we shall obtain a χ²-distribution.
Chi-square
• Then, by comparing the calculated value of χ² with its table value for (n − 1) degrees of freedom at a given level of significance, we may either accept H0 or reject it.
• If the calculated value of χ² is equal to or less than the table value, the null hypothesis is accepted; otherwise it is rejected.
• This test is based on the chi-square distribution, which is not symmetrical; all its values are positive. One must simply know the degrees of freedom to use the distribution.
Chi-square Distribution
• The χ²-distribution is not symmetrical and all its values are positive.

[Figure: shape of the χ²-distribution]
Chi-square
The distribution of chi-square depends on one parameter, its degrees of freedom (df or v). As df gets large, the curve becomes less skewed and more normal.
HYPOTHESIS TESTING FOR COMPARING A VARIANCE TO SOME HYPOTHESISED POPULATION VARIANCE
• The test we use for comparing a sample variance to some theoretical or hypothesised population variance is different from the z-test or the t-test.
• The test we use for this purpose is known as the chi-square test, and the test statistic, symbolised as χ² and known as the chi-square value, is worked out.
• The chi-square value to test the null hypothesis H0: σs² = σp² is worked out as under:

χ² = σs²(n − 1) / σp²

where σs² is the sample variance, σp² the hypothesised population variance, and n the number of items in the sample.
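A minimal sketch of this test in Python; the numbers are hypothetical, not from the text. The statistic χ² = (n − 1)·σs²/σp² is compared against the chi-square table value for (n − 1) degrees of freedom.

```python
# Hypothetical illustration of the chi-square test for a variance.
n = 10          # sample size
s2 = 16.0       # sample variance (sigma_s^2)
sigma_p2 = 9.0  # hypothesised population variance under H0

chi_sq = (n - 1) * s2 / sigma_p2   # chi-square test statistic
table_value = 16.919               # chi-square table, 9 d.f., alpha = 0.05

print(chi_sq)                      # 16.0
decision = "accept H0" if chi_sq <= table_value else "reject H0"
print(decision)                    # accept H0
```

Here 16.0 does not exceed the table value 16.919, so by the decision rule above the null hypothesis stands.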
Test of Goodness of Fit
• As a test of goodness of fit, the χ² test enables us to see how well an assumed theoretical distribution (such as the binomial, Poisson, or normal distribution) fits the observed data.
• When some theoretical distribution is fitted to the given data, we are always interested in knowing how well this distribution fits the observed data. The chi-square test can answer this.
• If the calculated value of χ² is less than the table value at a certain level of significance, the fit is considered a good one, which means that the divergence between the observed and expected frequencies is attributable to fluctuations of sampling.
• But if the calculated value of χ² is greater than its table value, the fit is not considered a good one.
Multinomial Experiments
A multinomial experiment is a probability experiment consisting of
a fixed number of trials in which there are more than two possible
outcomes for each independent trial. (Unlike the binomial
experiment in which there were only two possible outcomes.)
Example:
A researcher claims that the distribution of favorite pizza toppings
among teenagers is as shown below.
Each outcome is classified into categories, and the probability for each possible outcome is fixed.

Topping      Frequency, f
Cheese           41%
Pepperoni        25%
Sausage          15%
Mushrooms        10%
Onions            9%
Chi-Square Goodness-of-Fit Test
A Chi-Square Goodness-of-Fit Test is used to test whether a frequency
distribution fits an expected distribution.
To calculate the test statistic for the chi-square goodness-of-fit test, the
observed frequencies and the expected frequencies are used.
Continued.
Chi-Square Goodness-of-Fit Test
Performing a Chi-Square Goodness-of-Fit Test
In Words In Symbols
6. Calculate the test statistic.    χ² = Σ (O − E)² / E
Test of Independence
• Proceed with the null hypothesis that the two attributes (viz., new medicine and control of fever) are independent, which means that the new medicine is not effective in controlling fever.
• First calculate the expected frequencies and then work out the value of χ². If the calculated value of χ² is less than the table value at a certain level of significance for the given degrees of freedom, we conclude that the null hypothesis stands, which means that the two attributes are independent, i.e., not associated (the new medicine is not effective in controlling fever).
• But if the calculated value of χ² is greater than its table value, our inference would be that the null hypothesis does not hold good, which means the two attributes are associated and the association is not due to some chance factor but exists in reality (i.e., the new medicine is effective in controlling fever and as such may be prescribed).
Continued.
Chi-Square Goodness-of-Fit Test
Example continued:

Topping      Observed Frequency   Expected Frequency
Cheese              78                   82
Pepperoni           52                   50
Sausage             30                   30
Mushrooms           25                   20
Onions              15                   18

[Figure: rejection region with α = 0.01; critical value χ²₀ = 13.277]

χ² = Σ (O − E)²/E
   = (78 − 82)²/82 + (52 − 50)²/50 + (30 − 30)²/30 + (25 − 20)²/20 + (15 − 18)²/18
   ≈ 2.025

Fail to reject H0. There is not enough evidence at the 1% level to reject the researcher's claim.
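The pizza-topping calculation can be reproduced in Python (standard library only), assuming 200 teenagers were surveyed, which is what the expected frequencies 82, 50, 30, 20, 18 imply:

```python
observed = {"Cheese": 78, "Pepperoni": 52, "Sausage": 30, "Mushrooms": 25, "Onions": 15}
claimed  = {"Cheese": 0.41, "Pepperoni": 0.25, "Sausage": 0.15, "Mushrooms": 0.10, "Onions": 0.09}

n = sum(observed.values())                        # 200 surveyed teenagers
expected = {t: claimed[t] * n for t in claimed}   # 82, 50, 30, 20, 18

# Goodness-of-fit statistic: sum of (O - E)^2 / E over all categories
chi_sq = sum((observed[t] - expected[t]) ** 2 / expected[t] for t in observed)
print(round(chi_sq, 3))                           # 2.025

critical = 13.277  # chi-square table, d.f. = 5 - 1 = 4, alpha = 0.01 (from the slide)
print("fail to reject H0" if chi_sq <= critical else "reject H0")
```

Since 2.025 is well below 13.277, the observed frequencies are consistent with the claimed distribution.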
Contingency Tables
An r × c contingency table shows the observed frequencies for two variables. The observed frequencies are arranged in r rows and c columns. The intersection of a row and a column is called a cell.
                              Age
Gender   16–20   21–30   31–40   41–50   51–60   61 and older
Male       32      51      52      43      28        10
Female     13      22      33      21      10         6
Expected Frequency
Assuming the two variables are independent, you can use the
contingency table to find the expected frequency for each cell.
Continued.
Expected Frequency
Example continued:
                              Age
Gender   16–20   21–30   31–40   41–50   51–60   61 and older   Total
Male       32      51      52      43      28        10          216
Female     13      22      33      21      10         6          105
Total      45      73      85      64      38        16          321

Expected frequency: E(r, c) = (Sum of row r) × (Sum of column c) / Sample size

E(1,1) = 216 × 45 / 321 ≈ 30.28
E(1,2) = 216 × 73 / 321 ≈ 49.12
E(1,3) = 216 × 85 / 321 ≈ 57.20
Continued.
Chi-Square Independence Test
Performing a Chi-Square Independence Test
In Words In Symbols
6. Calculate the test statistic.    χ² = Σ (O − E)² / E
Continued.
Chi-Square Independence Test
Example continued:

[Figure: rejection region with α = 0.05; critical value χ²₀ = 11.071]

 O      E      O − E    (O − E)²   (O − E)²/E
32    30.28     1.72     2.9584     0.0977
51    49.12     1.88     3.5344     0.0720
52    57.20    −5.20    27.0400     0.4727
43    43.07    −0.07     0.0049     0.0001
28    25.57     2.43     5.9049     0.2309
10    10.77    −0.77     0.5929     0.0551
13    14.72    −1.72     2.9584     0.2010
22    23.88    −1.88     3.5344     0.1480
33    27.80     5.20    27.0400     0.9727
21    20.93     0.07     0.0049     0.0002
10    12.43    −2.43     5.9049     0.4751
 6     5.23     0.77     0.5929     0.1134

χ² = Σ (O − E)²/E ≈ 2.84

Fail to reject H0.
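The full gender-by-age computation, from observed counts to expected frequencies to χ², can be reproduced in Python (standard library only):

```python
observed = [
    [32, 51, 52, 43, 28, 10],   # Male
    [13, 22, 33, 21, 10, 6],    # Female
]

row_tot = [sum(r) for r in observed]        # 216, 105
col_tot = [sum(c) for c in zip(*observed)]  # 45, 73, 85, 64, 38, 16
n = sum(row_tot)                            # 321

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, O in enumerate(row):
        E = row_tot[i] * col_tot[j] / n     # expected frequency E(r, c)
        chi_sq += (O - E) ** 2 / E

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2 - 1)(6 - 1) = 5
print(round(chi_sq, 2), df)                 # about 2.83 with unrounded E values

critical = 11.071  # chi-square table, 5 d.f., alpha = 0.05 (from the slide)
print("fail to reject H0" if chi_sq <= critical else "reject H0")
```

The slide's 2.84 uses expected frequencies rounded to two decimals; with unrounded E values the statistic is about 2.83. Either way it falls far short of 11.071, so gender and age appear independent.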
The F-Distribution
If s₁² and s₂² represent the sample variances of two normal populations with equal variances, the sampling distribution of F = s₁²/s₂² is called an F-distribution.
There are several properties of this distribution.

[Figure: F-distribution density curve]
Critical Values for the F-Distribution
Finding Critical Values for the F-Distribution
1. Specify the level of significance α.
2. Determine the degrees of freedom for the numerator, d.f. N.
3. Determine the degrees of freedom for the denominator, d.f. D.
4. Use Table 7 in Appendix B to find the critical value. If the hypothesis test is
   a. one-tailed, use the α F-table.
   b. two-tailed, use the ½α F-table.
Critical Values for the F-Distribution
Example:
Find the critical F-value for a right-tailed test when α = 0.05, d.f.N = 5 and d.f.D = 28.
Appendix B, Table 7: F-Distribution (α = 0.05)
d.f.N: degrees of freedom, numerator; d.f.D: degrees of freedom, denominator

d.f.D      1      2      3      4      5      6
  1    161.4  199.5  215.7  224.6  230.2  234.0
  2    18.51  19.00  19.16  19.25  19.30  19.33
  3    10.13   9.55   9.28   9.12   9.01   8.94
  4     7.71   6.94   6.59   6.39   6.26   6.16
  5     6.61   5.79   5.41   5.19   5.05   4.95
  6     5.99   5.14   4.76   4.53   4.39   4.28
  7     5.59   4.74   4.35   4.12   3.97   3.87
 27     4.21   3.35   2.96   2.73   2.57   2.46
 28     4.20   3.34   2.95   2.71   2.56   2.45
 29     4.18   3.33   2.93   2.70   2.55   2.43

The critical value is F0 = 2.56 (row d.f.D = 28, column d.f.N = 5).
(A two-tailed test uses the ½α table; e.g., at the 0.05 level shown here, d.f.N = 4 and d.f.D = 6 gives F0 = 4.53.)
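If SciPy is available, the same critical values can be read off programmatically instead of from Table 7, using the inverse CDF `scipy.stats.f.ppf`:

```python
from scipy.stats import f

# Right-tailed critical values at alpha = 0.05
F0 = f.ppf(1 - 0.05, dfn=5, dfd=28)   # d.f.N = 5, d.f.D = 28
F0b = f.ppf(1 - 0.05, dfn=4, dfd=6)   # d.f.N = 4, d.f.D = 6

print(round(F0, 2), round(F0b, 2))    # 2.56 4.53, matching the table
```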
Two-Sample F-Test for Variances
A two-sample F-test is used to compare two population variances σ₁² and σ₂² when a sample is randomly selected from each population. The populations must be independent and normally distributed.

The test statistic is

F = s₁² / s₂²

where s₁² and s₂² represent the sample variances with s₁² ≥ s₂².
The degrees of freedom for the numerator is d.f.N = n₁ − 1 and the degrees of freedom for the denominator is d.f.D = n₂ − 1, where n₁ is the size of the sample having variance s₁² and n₂ is the size of the sample having variance s₂².
Two-Sample F-Test for Variances
Using a Two-Sample F-Test to Compare σ₁² and σ₂²
In Words In Symbols
1. Identify the claim. State the null and State H0 and Ha.
alternative hypotheses.
Continued.
Two-Sample F-Test for Variances
Using a Two-Sample F-Test to Compare σ₁² and σ₂²
In Words In Symbols
5. Determine the rejection region.
6. Calculate the test statistic.    F = s₁² / s₂²
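A sketch of the procedure in Python; the sample variances and sizes below are hypothetical, invented purely for illustration:

```python
# Hypothetical samples: sample 1 has variance 9.8 (n1 = 16),
# sample 2 has variance 6.3 (n2 = 21)
s1_sq, n1 = 9.8, 16
s2_sq, n2 = 6.3, 21

# Put the larger variance in the numerator so that F >= 1
if s1_sq < s2_sq:
    (s1_sq, n1), (s2_sq, n2) = (s2_sq, n2), (s1_sq, n1)

F = s1_sq / s2_sq          # test statistic
dfn, dfd = n1 - 1, n2 - 1  # d.f.N and d.f.D

print(round(F, 3), dfn, dfd)
```

The resulting F is then compared with the critical value from the F-table for (d.f.N, d.f.D) at the chosen level of significance.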
One-Way ANOVA Summary Table

Variation   Sum of Squares   Degrees of Freedom   Mean Squares         F
Between          SSB              d.f.N           MSB = SSB / d.f.N    MSB / MSW
Within           SSW              d.f.D           MSW = SSW / d.f.D
Performing a One-Way ANOVA Test
Example:
The following table shows the salaries of randomly selected individuals from four large metropolitan areas. At α = 0.05, can you conclude that the mean salary is different in at least one of the areas? (Adapted from US Bureau of Economic Analysis)
Continued.
Performing a One-Way ANOVA Test
Example continued:
H0: μ1 = μ2 = μ3 = μ4
Ha: At least one mean is different from the others. (Claim)
d.f.N = k − 1 = 4 − 1 = 3        d.f.D = N − k = 20 − 4 = 16
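Since the salary table itself is not reproduced here, the ANOVA computation can be sketched with hypothetical data: four areas with five salaries each, so N = 20 and k = 4 as in the example. All numbers below are invented for illustration.

```python
# Hypothetical salaries (in thousands) for four metropolitan areas
groups = [
    [37.3, 33.8, 41.0, 35.5, 34.9],
    [41.9, 38.6, 44.1, 40.0, 42.3],
    [31.0, 33.4, 30.2, 29.9, 32.6],
    [39.2, 36.1, 38.4, 37.7, 40.5],
]

k = len(groups)                       # number of groups
N = sum(len(g) for g in groups)       # total observations
grand_mean = sum(sum(g) for g in groups) / N

# Between-group and within-group sums of squares
SSB = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
SSW = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)

dfn, dfd = k - 1, N - k               # d.f.N = 3, d.f.D = 16
MSB, MSW = SSB / dfn, SSW / dfd
F = MSB / MSW

print(dfn, dfd, round(F, 2))
```

The F statistic is then compared with the critical value from the F-table for (3, 16) degrees of freedom at α = 0.05.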
F Distribution (1)
• Usually v2 is larger than v1, and v2 will be larger than 2. In such a case, the mean of the F distribution (expected value) is v2/(v2 − 2).
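The v2/(v2 − 2) mean can be checked by simulation (standard library only): build F ratios from sums of squared standard normals, since a chi-square variate with df degrees of freedom is a sum of df squared standard normals.

```python
import random

random.seed(0)
v1, v2 = 4, 10
trials = 200_000

def chi_sq_sample(df):
    # Sum of df squared standard normals: chi-square with df degrees of freedom
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

# F = (chi-square/v1) / (chi-square/v2); average it over many trials
mean_F = sum(
    (chi_sq_sample(v1) / v1) / (chi_sq_sample(v2) / v2) for _ in range(trials)
) / trials

print(round(mean_F, 2))  # close to v2 / (v2 - 2) = 1.25
```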
F Distribution (2)
• F depends on two parameters: v1 and v2 (df1
and df2). The shape of F changes with these.
Range is 0 to infinity. Shaped a bit like chi-
square.
• F tables show critical values for df in the
numerator and df in the denominator.
• F tables are 1-tailed; can figure 2-tailed if you
need to (but you usually don’t).
F table – critical values
• F tables list the numerator df (dfB) across the columns and the denominator df (dfW) down the rows.
• Note the identity t²(v) = F(1, v): a squared t statistic with v degrees of freedom follows an F distribution with 1 and v degrees of freedom.
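If SciPy is available, the identity t²(v) = F(1, v) can be checked numerically on critical values: the square of the two-tailed t critical value equals the right-tailed F critical value with (1, v) degrees of freedom.

```python
from scipy.stats import t, f

v, alpha = 10, 0.05
t_crit = t.ppf(1 - alpha / 2, df=v)      # two-tailed t critical value
f_crit = f.ppf(1 - alpha, dfn=1, dfd=v)  # right-tailed F critical value

print(round(t_crit ** 2, 4), round(f_crit, 4))  # the two values agree
```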
Review
• How is F related to the Normal? To chi-
square?
• Suppose we have 2 samples and we want to
know whether they were drawn from
populations where the variances are equal.
Sample1: N=50, s2=25; Sample 2: N=60,
s2=30. How can we test? What is the best
conclusion for these data?
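The review question can be answered with the two-sample F-test described above; this sketch assumes SciPy is available for the p-value. The larger sample variance goes in the numerator.

```python
from scipy.stats import f

# Sample 1: N = 50, s^2 = 25; Sample 2: N = 60, s^2 = 30
n1, s1_sq = 50, 25.0
n2, s2_sq = 60, 30.0

# Larger variance in the numerator
F = s2_sq / s1_sq                     # 30 / 25 = 1.2
dfn, dfd = n2 - 1, n1 - 1             # 59, 49

p_two_tailed = 2 * f.sf(F, dfn, dfd)  # two-tailed p-value
print(round(F, 2), round(p_two_tailed, 3))
```

The p-value is well above 0.05, so we fail to reject H0: these data give no reason to conclude the two population variances differ.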