Chi Square Exercises
Recall Properties
Example: α = 0.05
Df = 7
2. Test of Independence
a. Based on frequency in a contingency table of two or
more variables of a single sample
b. Used to test if the two variables are associated or not;
alternatively tests if the outcome of one variable has
an effect on the outcome of the other variable(s)
II. Steps:
1. State the null and alternative hypotheses
Ho: The variable under consideration has the
specified distribution.
Ha: The variable under consideration does not have
the specified distribution.
Example
Observing that the proportion of blue M&Ms in his bowl of candy
appeared to be less than that of the other colors, R. Fricker, Jr. decided
to compare the color distribution in randomly chosen bags of M&Ms to
the theoretical distribution reported by M&M/MARS consumer affairs.
Fricker published his findings in the article, “The Mysterious Case of the
Blue M&Ms” (Chance, Vol 9(4), pp 19-22). For his experiment Fricker
bought three bags of M&Ms from local stores and counted the number
of each color. The average number of each color in the three bags was
distributed as shown in the “Observed Frequency” column.
Solution
We have n = 509
Df = (k-1) = 5
Critical Chi-Square value at α = 0.05 with df = 5 is 11.070
All expected frequencies are 1 or greater.
At most 20% of the expected frequencies are less than 5.
The value of the test statistic does not fall in the rejection region, so we do not
reject H0. The test results are not statistically significant at the 5% level.
The data do not provide sufficient evidence to conclude that the color distribution
of M&Ms differs from that reported by M&M / MARS consumer affairs.
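The goodness-of-fit steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the calculation from the notes: the function name chi_square_gof and the six counts and proportions are hypothetical (the M&M counts are not reproduced here). Only the critical value 11.070 for df = 5 comes from the solution above.

```python
def chi_square_gof(observed, expected_props):
    """Return (statistic, df) for a chi-square goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in expected_props]

    # Assumption 1: all expected frequencies are 1 or greater.
    if min(expected) < 1:
        raise ValueError("an expected frequency is below 1")
    # Assumption 2: at most 20% of the expected frequencies are less than 5.
    small = sum(1 for e in expected if e < 5)
    if small > 0.2 * len(expected):
        raise ValueError("more than 20% of expected frequencies are below 5")

    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # df = k - 1
    return statistic, df


# Hypothetical data for k = 6 categories (NOT the M&M counts).
observed = [25, 25, 20, 10, 10, 10]
expected_props = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]

statistic, df = chi_square_gof(observed, expected_props)
CRITICAL_5PCT_DF5 = 11.070  # critical value quoted in the solution above
reject_h0 = statistic > CRITICAL_5PCT_DF5
print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject_h0}")
```

With these made-up counts the statistic falls well below the critical value, so Ho would not be rejected, which mirrors the form of the conclusion in the solution.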
Chi-Square Procedures Page -5-
Class Notes – Updated Dec 2014
Prepared by: Nina Kajiji
Contingency Tables
Contingency tables are created by simultaneously grouping data
from two or more variables into a frequency distribution. That is,
contingency tables (n-way tables) represent the frequency
distribution of bivariate (n-variate) data.
Association (Independence)
Association is the relationship of two variables with each other.
In other words, there exists an association between two variables
of a population if knowing the value of one allows us to obtain
information about the other.
Steps:
1) State the null and alternative hypotheses
Ho: The two variables under consideration are not
associated.
Ha: The two variables under consideration are
associated.
Expected frequency: E = (R)(C) / n
Where:
R = Row total
C = Column total
n = sample size
Test statistic: χ² = Σ (O − E)² / E
Where:
k is
O is the observed frequencies
E is the expected frequencies
Ho: Siskel’s and Ebert’s ratings are not associated (no association) -OR-
independent
Ha: Siskel’s and Ebert’s ratings are associated -OR- not independent
Thumbs Up   Observed:   10            9             64            Total: 83
            Expected:   21.79 (E31)   15.56 (E32)   45.65 (E33)
Total                   42            30            88            160
For example: The expected frequency for Thumbs Down from both
Siskel and Ebert (E11) is: (45)(42) / 160 = 11.81
All other expected frequencies (shown in red in the original table) are
calculated in the same manner using the appropriate row and column totals.
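The expected-frequency calculation can be checked with a short Python sketch. The helper name expected_frequency is an assumption introduced for illustration. The row totals (45 and 83), the column totals (42, 30, 88), and n = 160 are taken from the example above.

```python
def expected_frequency(row_total, col_total, n):
    """Expected cell frequency under Ho: (row total)(column total) / n."""
    return row_total * col_total / n


n = 160
col_totals = [42, 30, 88]

# E11: Thumbs Down from both reviewers, row total 45 and column total 42.
e11 = expected_frequency(45, 42, n)

# E31, E32, E33: the Thumbs Up row, row total 83.
thumbs_up_expected = [expected_frequency(83, c, n) for c in col_totals]

print(f"E11 = {e11:.2f}")
print("Thumbs Up row:", ", ".join(f"{e:.2f}" for e in thumbs_up_expected))
```

The expected frequencies in a row always add up to that row's total (here 83), which is a quick check on the arithmetic.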
Since 45.365 lies in the rejection region we reject Ho. The test
results are statistically significant at the 1% level. We thus
conclude that at the 1% significance level, the data provide
sufficient evidence to conclude that there is an association
(relationship) between Siskel’s and Ebert’s ratings.
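The decision step can be sketched in Python as follows. The helper name independence_df is hypothetical. The 3 x 3 table size is inferred from the cell labels E11 through E33, and the critical value 13.277 (1% level, df = 4) is a standard chi-square table value that is not quoted in these notes.

```python
def independence_df(n_rows, n_cols):
    """Degrees of freedom for a test of independence: (r - 1)(c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


df = independence_df(3, 3)      # 3 x 3 table of ratings
test_statistic = 45.365         # value reported in the notes
CRITICAL_1PCT_DF4 = 13.277      # assumption: standard table value for df = 4

reject_h0 = test_statistic > CRITICAL_1PCT_DF4
print(f"df = {df}, reject H0 at the 1% level: {reject_h0}")
```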
Steps:
1. State the null and alternative hypotheses
Ho: p1 = p2 = … = ps (where s = # of samples)
Ha: At least one proportion is different
Expected frequency: E = (R)(C) / n
Where:
R = Row total
C = Column total
n = sample size
Test statistic: χ² = Σ (O − E)² / E
Where:
k is
O is the observed frequencies
E is the expected frequencies
Ho: p1 = p2 = p3 = p4
Ha: At least one proportion is different
No   Observed:   76         67         62         51         Total: 256
     Expected:   64 (E21)   64 (E22)   64 (E23)   64 (E24)
For example: The expected frequency for (E11) is: (144)(100) / 400 = 36
All other expected frequencies (shown in red in the original table) are
calculated in the same manner using the appropriate row and column totals.
Since 14.15 lies in the rejection region we reject Ho. The test
results are statistically significant at the 5% level. We thus
conclude that at the 5% significance level, the data provide
sufficient evidence to conclude that there is at least one
proportion that is different. Hence income level appears to make a
difference in the proportion of people who are happy.
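The whole homogeneity calculation can be reproduced with a short Python sketch. The "No" counts (76, 67, 62, 51), the row total 144, and n = 400 come from the example. The "Yes" counts are an assumption: they are inferred by taking every column total to be 100 (the notes show 100 only for the first column). The function name chi_square_statistic is hypothetical, and the critical value 7.815 (5% level, df = 3) is a standard table value not quoted in the notes.

```python
def chi_square_statistic(observed):
    """Chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # E = (R)(C) / n
            statistic += (o - e) ** 2 / e
    return statistic


no_counts = [76, 67, 62, 51]               # from the notes
assumed_col_totals = [100, 100, 100, 100]  # assumption: 100 per sample
yes_counts = [c - no for c, no in zip(assumed_col_totals, no_counts)]

observed = [yes_counts, no_counts]
statistic = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(no_counts) - 1)  # (r - 1)(c - 1)
CRITICAL_5PCT_DF3 = 7.815                  # assumption: standard table value

reject_h0 = statistic > CRITICAL_5PCT_DF3
print(f"chi-square = {statistic:.2f}, df = {df}, reject H0: {reject_h0}")
```

Under these assumptions the inferred "Yes" counts sum to the stated row total of 144, and the statistic agrees with the 14.15 reported above.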