Unit 20
Chi-Square Tests
Table of Contents
Introduction
Test Your Prerequisite Skills
Objectives
Lesson 1: Introduction to Chi-Square Tests
- Warm Up!
- Learn about It!
- Let’s Practice!
- Check Your Understanding!
Lesson 2: Chi-Square Distribution
- Warm Up!
- Learn about It!
- Let’s Practice!
- Check Your Understanding!
Lesson 3: Goodness of Fit Test
- Warm Up!
- Learn about It!
- Let’s Practice!
- Check Your Understanding!
Lesson 4: Chi-Square Test for Independence
- Warm Up!
- Learn about It!
- Let’s Practice!
In daily life, we often observe relationships between our status and our decisions. Our
perception of a particular topic or event may be influenced by the groups to which we
belong. For example, as we grow older, we tend to see things from a different
perspective than a younger individual does. These relationships between our choices
and our status are often subtle.
In this unit, we will explore how the chi-square test can be applied to examples such as
the ones stated above and what other kinds of problems it can solve.
Before you get started, answer the following items on a separate sheet of paper. This will
help you assess your prior knowledge and practice some skills that you will need in
studying the lessons in this unit. Show your complete solution.
Objectives
Warm Up!
Instructions:
1. This activity should be done by the whole class.
3. For each question, the teacher will note the number of responses using the
following table:
Male Female
Question a
Question b
Question c
Question d
In Warm Up!, you may have noticed that some questions got more responses from girls
and some got more responses from boys. In that sense, gender may be related to color
preference, for example. Such a relationship can be tested using a chi-square test.
Two main types of chi-square test are discussed later in this unit: the chi-square
goodness-of-fit test and the chi-square test for independence.
Determining whether there is a relationship between age bracket and preferred political
candidate is an example of a study in which a chi-square test can be used. Questions
such as “Does gender affect food preference?” are another example. If gender, in the
latter example, affects food preference, we say that the two categorical variables,
gender and food preference, are dependent. Otherwise, we conclude that the variables
are independent, which means that there is no relationship between them.
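The chi-square statistic is computed as

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

where $O_i$ and $E_i$ are the observed and expected frequencies of category $i$, respectively.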
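The formula translates directly into code. The Python sketch below is only an illustration (the function name `chi_square_statistic` is our own, not part of this unit); it sums $(O_i - E_i)^2 / E_i$ over the categories and checks the data from Example 1.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Example 1: expected frequencies 200 and 250, observed frequencies 195 and 255.
stat = chi_square_statistic([195, 255], [200, 250])
print(round(stat, 3))  # 0.225, that is, 25/200 + 25/250
```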
Let’s Practice!
Example 1: Suppose the expected frequencies of two given categories are 200 and 250.
During the study, the researcher observed 195 and 255 as the frequencies of
the two categories, respectively. What is the value of $\chi^2$?
Try It Yourself!
Suppose the expected frequencies of two given categories are 120 and 80. During
the study, the researcher observed 105 and 95 as the frequencies of the two
categories, respectively. What is the value of $\chi^2$?
Example 2: The following table shows the expected frequencies and actual observed
frequencies of three categories in a study. Calculate $\chi^2$.
Try It Yourself!
The following table shows the expected frequencies and actual observed
frequencies of three categories in a study. Calculate $\chi^2$.
Example 3: The following table shows the expected frequencies and actual observed
frequencies of four categories in a study. Calculate $\chi^2$.
Try It Yourself!
Suppose the expected frequencies of four different categories in a given study are
equal. If the experiment is conducted 400 times and the actual observed
frequencies in the study are indicated below, calculate the value of $\chi^2$.
Observed Frequency
Category 1 110
Category 2 70
Category 3 90
Category 4 130
Solution: Organize the given data using a table. Note that the expected frequency of a
category is equal to the percentage of the category multiplied by the sample
size.
Expected Frequency Observed Frequency
Textbooks
Reference Books
Inspirational Books
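Multiplying each proportion by the sample size, as described in the solution above, can be sketched in Python as follows. The helper name `expected_frequencies` is hypothetical, and the proportions 0.40, 0.10, 0.20, and 0.30 with a sample of 200 are borrowed from the blood-type exercise in Lesson 3 purely to show the arithmetic.

```python
def expected_frequencies(proportions, sample_size):
    """Expected frequency of each category = proportion x sample size."""
    if abs(sum(proportions) - 1) > 1e-9:
        raise ValueError("proportions must add up to 1")
    return [p * sample_size for p in proportions]


# Proportions 0.40, 0.10, 0.20, 0.30 applied to a sample of 200.
print([round(e, 2) for e in expected_frequencies([0.40, 0.10, 0.20, 0.30], 200)])
```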
Try It Yourself!
Suit Frequency
Spade 26
Heart 22
Diamond 24
Club 28
Calculate $\chi^2$.
b.
Expected Frequency Observed Frequency
Category 1 75 68
Category 2 100 102
Category 3 25 30
c.
Expected Frequency Observed Frequency
Category 1 200 150
Category 2 200 250
Category 3 200 212
Category 4 200 188
Warm Up!
Conduct a Poll
Instructions:
1. This activity should be done in groups of ten.
2. In each group, members respond to the following questions by saying Yes or No:
Do you play online games?
Can you play the guitar?
Can you sing?
Do you prefer studying late at night?
Do you take part in any fitness activity?
3. One group member takes note of the number of Yes and No responses to each
question and whether each response is from a male or a female. Then he/she
tabulates the frequencies. Compare results with other groups.
In Warm Up!, you may have observed that although all groups answered the same
questions, the proportions of boys and girls who responded Yes vary. In one group, boys
could make up the majority of those who responded Yes to a question, but in another
group, most of the Yes responses to the same question may have come from girls. Using a
chi-square test, we can determine whether gender and another qualitative variable are
independent. In this lesson, we will discuss the chi-square distribution.
Like the normal distribution, the chi-square distribution (or $\chi^2$ distribution) is a
continuous probability distribution. (The binomial distribution, by contrast, is discrete.)
In the chi-square distribution, the minimum value of the random variable is 0 and there is
no maximum value.
Each $\chi^2$ distribution is determined by its number of degrees of freedom. The following
table shows the critical values of the $\chi^2$ distribution with 1 to 15 degrees of freedom
for the given right-tail areas.
Right-Tail Area
df 0.995 0.975 0.9 0.5 0.1 0.05 0.025 0.01 0.005
1 0.000 0.001 0.016 0.455 2.706 3.841 5.024 6.635 7.879
2 0.010 0.051 0.211 1.386 4.605 5.991 7.378 9.210 10.597
3 0.072 0.216 0.584 2.366 6.251 7.815 9.348 11.345 12.838
4 0.207 0.484 1.064 3.357 7.779 9.488 11.143 13.277 14.860
5 0.412 0.831 1.610 4.351 9.236 11.070 12.833 15.086 16.750
6 0.676 1.237 2.204 5.348 10.645 12.592 14.449 16.812 18.548
7 0.989 1.690 2.833 6.346 12.017 14.067 16.013 18.475 20.278
8 1.344 2.180 3.490 7.344 13.362 15.507 17.535 20.090 21.955
9 1.735 2.700 4.168 8.343 14.684 16.919 19.023 21.666 23.589
10 2.156 3.247 4.865 9.342 15.987 18.307 20.483 23.209 25.188
11 2.603 3.816 5.578 10.341 17.275 19.675 21.920 24.725 26.757
12 3.074 4.404 6.304 11.340 18.549 21.026 23.337 26.217 28.300
13 3.565 5.009 7.042 12.340 19.812 22.362 24.736 27.688 29.819
14 4.075 5.629 7.790 13.339 21.064 23.685 26.119 29.141 31.319
15 4.601 6.262 8.547 14.339 22.307 24.996 27.488 30.578 32.801
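If no table is at hand, the critical values above can be approximated numerically. The following is a minimal Python sketch using only the standard library; the function names `chi2_sf` and `chi2_critical` are our own. It relies on the fact that the left-tail area of a $\chi^2$ distribution with $df$ degrees of freedom equals the regularized lower incomplete gamma function $P(df/2, x/2)$, evaluated here with a power series (one of several possible methods), and it finds a critical value by bisection. In practice, a statistics library would normally be used instead.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Right-tail area P(X > x) for a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Left-tail area = regularized lower incomplete gamma function
    # P(a, z) = z^a e^(-z) / Gamma(a) * sum_{n>=0} z^n / (a (a+1) ... (a+n)).
    term = 1.0 / a  # n = 0 term of the series
    total = term
    n = 0
    while term > 1e-16 * total:  # all terms are positive; stop once they are negligible
        n += 1
        term *= z / (a + n)
        total += term
    left_tail = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return min(1.0, max(0.0, 1.0 - left_tail))


def chi2_critical(alpha: float, df: int) -> float:
    """Critical value whose right-tail area equals alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until it contains the answer
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid  # right-tail area still too large: the critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(chi2_critical(0.05, 1), 3))   # 3.841, as in the table
print(round(chi2_critical(0.10, 11), 3))  # 17.275
```

As a further check, `chi2_sf(5.991, 2)` returns approximately 0.05, matching the df = 2 row of the table.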
To use the chi-square table to determine a critical value, we need the degrees of freedom
and the significance level. To estimate a p-value, we need the degrees of freedom and the
value of the $\chi^2$ statistic.
The $\chi^2$ statistic is never negative. This is evident in its formula, in which the
differences between the observed frequencies and the expected frequencies are squared.
Hence, $\chi^2$ can only be positive or zero.
The mean of a chi-square distribution is equal to its degrees of freedom while the
variance is twice its degrees of freedom.
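One way to see these two facts is by simulation. A chi-square random variable with $k$ degrees of freedom can be described as the sum of the squares of $k$ independent standard normal variables. The Python sketch below (the function name `simulate_chi_square`, the 50,000 trials, and the seed are arbitrary choices) draws such sums and compares their sample mean and variance with $k$ and $2k$.

```python
import random


def simulate_chi_square(df: int, trials: int = 50_000, seed: int = 1):
    """Sample mean and variance of `trials` sums of df squared standard normal draws."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    samples = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(trials)]
    mean = sum(samples) / trials
    variance = sum((s - mean) ** 2 for s in samples) / (trials - 1)
    return mean, variance


for df in (1, 4, 10):
    mean, variance = simulate_chi_square(df)
    # The sample mean should be close to df and the sample variance close to 2 * df.
    print(f"df = {df}: mean = {mean:.2f}, variance = {variance:.2f}")
```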
Let’s Practice!
Example 1: What is the critical value in a chi-square test if its distribution has 11
degrees of freedom and a 0.1 significance level?
Try It Yourself!
What is the critical value in a chi-square test if its distribution has 5 degrees of
freedom and a 0.025 significance level?
Try It Yourself!
Try It Yourself!
Real-World Problems
Suit Frequency
Club 524
Heart 520
Diamond 498
Spade 458
Calculate the chi-square statistic of the given data and its p-value. Note that
the number of degrees of freedom of the given data is equal to $k - 1$, where $k$ is the
number of categories.
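A sketch of this calculation in Python, assuming the null hypothesis that all four suits are equally likely (so each expected frequency is 2000 ÷ 4 = 500). Instead of computing an exact p-value, the helper `bracket_p_value` (our own name) locates the statistic in the df = 3 row of the table above, which is how a p-value is estimated from a table.

```python
# Row df = 3 of the chi-square table above: (right-tail area, critical value).
DF3_ROW = [(0.995, 0.072), (0.975, 0.216), (0.9, 0.584), (0.5, 2.366),
           (0.1, 6.251), (0.05, 7.815), (0.025, 9.348), (0.01, 11.345), (0.005, 12.838)]


def bracket_p_value(stat, row):
    """Return (lower, upper) bounds on the p-value using one row of the table."""
    lower, upper = 0.0, 1.0
    for area, critical in row:  # critical values increase as the tail area decreases
        if stat >= critical:
            upper = area  # statistic is at or beyond this value, so p <= area
        else:
            lower = area  # statistic falls short of this value, so p > area
            break
    return lower, upper


observed = {"Club": 524, "Heart": 520, "Diamond": 498, "Spade": 458}
n = sum(observed.values())  # 2000 observations in total
k = len(observed)           # 4 categories
expected = n / k            # assumption: every suit is equally likely, so 500 each
stat = sum((o - expected) ** 2 / expected for o in observed.values())
df = k - 1
low, high = bracket_p_value(stat, DF3_ROW)
print(f"chi-square = {stat:.3f}, df = {df}, {low} < p-value < {high}")
# chi-square = 5.488, df = 3, 0.1 < p-value < 0.5
```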
Try It Yourself!
Calculate the chi-square statistic of the given data and its p-value.
2. Determine the p-value given the following statistics and degrees of freedom.
a. ,
b. ,
c. ,
d. ,
e. ,
3. Determine the critical value in a chi-square test given the following significance
levels and degrees of freedom.
a.
b.
c.
d.
e.
Warm Up!
Toss It!
Instructions:
1. Divide the class into four groups.
2. For each group, toss a coin 50 times and record the outcomes.
3. Each group should briefly present their results.
In Warm Up!, each group may have recorded different sets of outcomes. If you compare
your outcomes with the theoretical probability of obtaining a head or a tail, that is, $\frac{1}{2}$,
your observed outcomes may not match this theoretical probability. This does not mean,
however, that the theoretical and experimental probabilities are significantly different. A
goodness-of-fit test can help us determine whether the outcomes that we observe match
the outcomes that we expect.
The chi-square test can be employed to determine whether the observed frequencies
“fit” the expected frequencies, that is, whether a claim is consistent with the data, or to
determine relationships between categorical variables.
In a chi-square goodness-of-fit test, the null hypothesis states that the data follow the
specified distribution, that is, the true proportion of each category equals its hypothesized
proportion, so each observed frequency should be close to its expected frequency. The
alternative hypothesis states that the true proportion of at least one category differs from
its hypothesized proportion.
Note that the chi-square goodness-of-fit test is appropriate when the sampling method is
simple random sampling, the variables are categorical, and the expected frequency in
each category is at least 5.
The chi-square goodness-of-fit test is a right-tailed (one-tailed) test: only large values of
$\chi^2$ indicate a poor fit, so the rejection region lies in the right tail of the distribution.
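The condition on expected frequencies and the right-tailed decision rule above can be combined into a small Python sketch. The function `goodness_of_fit` is hypothetical, the coin-toss counts (28 heads and 22 tails in 50 tosses) are made up for illustration, and the critical value 3.841 is read from the table for df = 2 − 1 = 1 and a 0.05 significance level.

```python
def goodness_of_fit(observed, proportions, critical_value):
    """Chi-square goodness-of-fit statistic and right-tailed decision."""
    n = sum(observed)
    expected = [p * n for p in proportions]  # expected frequency = proportion x sample size
    if min(expected) < 5:
        raise ValueError("each expected frequency should be at least 5")
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Right-tailed test: reject H0 only when the statistic exceeds the critical value.
    return stat, stat > critical_value


# Hypothetical Warm Up result: 28 heads and 22 tails in 50 tosses.
stat, reject = goodness_of_fit([28, 22], [0.5, 0.5], 3.841)
print(round(stat, 2), "reject H0" if reject else "do not reject H0")  # 0.72 do not reject H0
```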
Let’s Practice!
Solution: If rats have no preference, we expect that the three doors would be chosen
an equal number of times. Hence, the null hypothesis should state that the
probability of each door being chosen is $\frac{1}{3}$.
The alternative hypothesis should state that rats have a door preference,
that is, the probability of at least one of the three doors being chosen is
different from $\frac{1}{3}$.
Try It Yourself!
A man was observed buying breakfast at a food stall with four different choices. If
$p_i$ is the probability that the man chooses option $i$, what are the appropriate
hypotheses for a study that wants to determine whether the man has a food preference at the stall?
Try It Yourself!
Is there enough evidence to conclude that the rats have a door preference?
Try It Yourself!
A company claims that they produce equal numbers of the four different variations
of their product. If a random sample of 200 of their products was taken and the
following frequencies were recorded, test the claim of the company using a 0.01
level of significance.
Solution:
$H_0$: The percentages of all-star, rookie, and veteran cards are 40%, 10%, and
50%, respectively.
It is given that .
Step 3: Use the $\chi^2$ table to determine the critical value and rejection region.
There is not enough evidence to conclude that the company’s claim about the
percentages of basketball card designs is false.
Try It Yourself!
2. A company’s food product has three different flavors. The company claims that
they distribute equal numbers of each flavor to their partner stores. In a sample of
900 products from a partner store, the following frequencies are observed:
Is there enough evidence to say that the partner store does not receive all flavors
equally?
3. An old survey reported that the proportions of blood types A, B, AB, and O in the
country are 0.40, 0.10, 0.20, and 0.30, respectively. To determine whether the present
population still fits these reported proportions, a sample of 200 people is selected.
Given the following observed frequencies, test whether the reported proportions still
hold.
Warm Up!
Book Preference
Instructions:
1. This activity should be done as a class.
2. Each student should choose one book preference among manga, romantic
novel, and science fiction. Then tabulate the results using the following table:
In Warm Up!, you may have observed that there seems to be a relationship between
gender and book preference. However, we cannot draw a statistical conclusion merely by
looking at the table. There might be an observable difference between the responses of
males and females, but we have to determine whether the difference is statistically
significant. The chi-square test for independence allows us to arrive at such a conclusion.
Using the chi-square test for independence, we can determine whether two categorical
variables are dependent on or independent of each other. That is, we can answer the
question: “Does one variable affect the other?”
The chi-square test for independence can be used when there exist two categorical
variables from a single population. It is used to determine if there is a significant
association between these categorical variables.
In a chi-square test for independence, the null hypothesis states that the two variables
under study are independent. On the other hand, the alternative hypothesis states that
the two variables are dependent.
The chi-square test for independence can be used when the sampling method is simple
random sampling, the variables are categorical, and the expected frequency of each entry
in the table is at least 5. The number of degrees of freedom is given by

$$df = (r - 1)(c - 1)$$

where $r$ is the number of levels for the first variable and $c$ is the number of levels for the
second variable.
The expected frequencies are computed for every combination of a category of the first
variable and a category of the second variable. The expected frequency at category $i$ of the
first variable and category $j$ of the second variable is defined by

$$E_{ij} = \frac{R_i \times C_j}{n}$$

where $R_i$ is the total number of sample observations in category $i$ of the first variable, $C_j$ is
the total number of sample observations in category $j$ of the second variable, and $n$ is the
sample size.
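The formula for $E_{ij}$ can be applied to a whole table at once. In this Python sketch, the helper `expected_table` and the counts (rows: male, female; columns: manga, romantic novel, science fiction) are hypothetical, chosen only to show the arithmetic.

```python
def expected_table(observed):
    """Expected frequencies E_ij = R_i * C_j / n for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]        # R_i
    col_totals = [sum(col) for col in zip(*observed)]  # C_j
    n = sum(row_totals)                                # sample size
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical counts. Rows: male, female. Columns: manga, romantic novel, science fiction.
observed = [[12, 3, 9],
            [8, 11, 7]]
for row in expected_table(observed):
    print([round(e, 2) for e in row])
# [9.6, 6.72, 7.68]
# [10.4, 7.28, 8.32]
```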
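The chi-square statistic is then

$$\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

where $O_{ij}$ and $E_{ij}$ are the observed and expected frequencies, respectively, at category $i$
of the first variable and category $j$ of the second variable.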
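Putting the pieces together, a minimal Python sketch of the test statistic, its degrees of freedom, and the right-tailed decision might look like this. The function `independence_test` and the 2 × 3 table of counts are hypothetical (every expected frequency in this table is at least 5); the critical value 5.991 comes from the table for df = (2 − 1)(3 − 1) = 2 and a 0.05 significance level.

```python
def independence_test(observed, critical_value):
    """Chi-square statistic, degrees of freedom, and decision for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # E_ij = R_i * C_j / n
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (r - 1)(c - 1)
    # Right-tailed test: reject independence only when the statistic exceeds the critical value.
    return stat, df, stat > critical_value


# Hypothetical counts. Rows: male, female. Columns: manga, romantic novel, science fiction.
stat, df, reject = independence_test([[12, 3, 9], [8, 11, 7]], 5.991)
print(round(stat, 3), df, "reject H0" if reject else "do not reject H0")  # 5.55 2 do not reject H0
```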
Similar to the chi-square goodness-of-fit test, the following are the steps in performing a
chi-square test for independence:
Let’s Practice!
Example 1: Suppose a study about the relationship between gender and type of pet
owned is to be conducted. What are the appropriate hypotheses for this
study?
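Solution: Since the two variables under study are categorical, we can use a chi-square
test for independence. Thus, we have the following hypotheses:
$H_0$: Gender and type of pet owned are independent.
$H_a$: Gender and type of pet owned are dependent.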
Try It Yourself!
Suppose a study about the relationship between citizenship and food preference is
to be conducted. What are the appropriate hypotheses for this study?
Solution: Since there are $r = 2$ categories (flu, no flu) for the first variable and
$c = 3$ categories (no vaccine, one shot, two shots) for the second variable,
$df = (2 - 1)(3 - 1) = 2$.
Try It Yourself!
Example 3: In the given problem in Example 2, what is the value of the $\chi^2$ statistic and
what is the appropriate conclusion? Use .
Try It Yourself!
Using 0.05 level of significance, is there enough evidence to say that performance in
mathematics is dependent on gender?
Solution:
It is given that .
Step 3: Use the chi-square table to determine the critical value and rejection region.
Try It Yourself!
Challenge Yourself!
Performance Task
As one of the members of the organizing committee, you are tasked to survey 100
students about their preferred field in college (e.g. business, science, math, arts). In
general, you need to determine whether the field of interest of the students is dependent
on gender. This is to help the organizing committee in deciding which group to orient first.
You must organize your data and present it to the committee.
Organization of Data
- Data are not organized properly.
- Data are organized properly but some necessary parts are missing.
- Data are organized properly but some of the information needed in the analysis is present.
- Data are organized properly. All information needed in the analysis is present.
Mathematical Justifications and Presentation
- There is no observable comprehension of the problem and the method being presented. The solution covers none of the mathematical components required to solve the task.
- The presentation shows some observable understanding of the required mathematical knowledge. Some of the solutions used are incorrectly executed.
- The presentation shows an observable complete understanding of the required mathematical components.
- The presentation shows an in-depth understanding of all the required mathematical components of the task. All solutions and procedures are presented clearly and concisely.
Wrap-up
Key Terms/Formulas
Lesson 1
1. 4.688
2. 16.125
3. 20
4. 0.8
Lesson 2
1. 12.833
2. 0.5
3.
4. 16.88;
Lesson 3
1. $H_0$: $p_1 = p_2 = p_3 = p_4 = \frac{1}{4}$
$H_a$: At least one $p_i$ is not equal to $\frac{1}{4}$.
2. ; ; The rejection region contains all values of such
that .
3. Since , do not reject the null hypothesis. Thus, there is not enough
evidence to refute the claim of the company.
4. Since , do not reject the null hypothesis. Thus, there is not
enough evidence to refute the claim of the publishing company about the
percentages of their books. Also, its p-value is between 0.5 and 0.9. Hence, at 0.05
level of significance, we can conclude that the company’s claim is consistent with
the sample data.
Lesson 4
1. $H_0$: Citizenship and food preference are independent.
$H_a$: Citizenship and food preference are dependent.
2. ; , , , , , and
.147.34, 91.16130.66258.580.84
3. Since , we fail to reject the null hypothesis. Thus, there is NOT
enough evidence to say that performance in mathematics is dependent on gender.
4. Since , we reject the null hypothesis. Thus, there is sufficient
evidence to say that gender and sports preference are dependent.