Chi Square Exercises

The document discusses chi-square procedures, including properties of the chi-square distribution, finding chi-square values for specific areas under the curve, and three commonly used chi-square tests: goodness-of-fit tests, tests of independence, and tests of homogeneity of proportions. It also provides detailed steps for conducting goodness-of-fit tests and tests of independence using chi-square.

Chi-square Procedures

Recall Properties

1. The total area under 2 curve is equal to 1.

2. The 2 curve starts at 0 on the horizontal axis and extends


indefinitely to the right, approaching the horizontal axis as it
does so.

3. The 2 curve is not symmetrical.

4. As the number of degrees of freedom (df) gets larger, χ² curves look increasingly like normal curves.

Finding 2 for a Specified AUC

Example: α = 0.05, df = 7

Solution: The χ² value with an area of α = 0.05 to its right is 14.067 (χ²_R).
The χ² value with an area of 1 − α = 0.95 to its right (0.05 to its left) is 2.167 (χ²_L).
Note: To find the left value, look at the table in the book, not on the formula card!
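The notes read these values from a printed χ² table. As a cross-check, here is a minimal pure-Python sketch that computes them numerically, assuming the series expansion of the regularized incomplete gamma function is accurate enough for the moderate values used in these notes. The helper names (`chi2_cdf`, `chi2_quantile`, `critical_values`) are illustrative, not from any library; in practice a statistics package such as `scipy.stats.chi2.ppf` does the same job.

```python
import math


def chi2_cdf(x, df):
    """Area to the LEFT of x under the chi-square curve with df degrees of freedom.

    Sketch: series expansion of the regularized lower incomplete gamma
    function P(df/2, x/2), adequate for the moderate x values in these notes.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    shape = a
    for _ in range(10_000):
        shape += 1.0
        term *= z / shape
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_quantile(p, df):
    """Value c with area p to its left, found by bisection on chi2_cdf."""
    lo, hi = 0.0, max(1.0, float(df))
    while chi2_cdf(hi, df) < p:  # widen the bracket until it contains c
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def critical_values(alpha, df):
    """Return (chi2_L, chi2_R): area alpha lies to the left of chi2_L
    and area alpha lies to the right of chi2_R."""
    return chi2_quantile(alpha, df), chi2_quantile(1.0 - alpha, df)


chi2_left, chi2_right = critical_values(0.05, 7)
print(f"chi2_L = {chi2_left:.3f}, chi2_R = {chi2_right:.3f}")
# chi2_L = 2.167, chi2_R = 14.067
```

The same function reproduces the other table values used later in these notes, for example `critical_values(0.05, 5)[1]` is about 11.070.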

Chi-Square Procedures Page -1-


Class Notes – Updated Dec 2014
Prepared by: Nina Kajiji
Three Commonly Used Chi-Sq Tests
1. Goodness-Of-Fit Test
a. Based on a simple frequency distribution involving
one variable.
b. NOTE: If the frequency distribution of the variable is
reported in classes, then use the class mark to
represent the class.
c. Used to test whether the frequency distribution fits a theoretical distribution or some other specified pattern.

2. Test of Independence
a. Based on the frequencies in a contingency table of two or more variables measured on a single sample.
b. Used to test whether the variables are associated; equivalently, whether knowing the outcome of one variable changes the distribution of the other variable(s).

3. Test of Homogeneity of Proportions
a. Based on the proportions of a variable in a contingency table. A separate sample is drawn from each of several populations, and the same variable is recorded in each sample.
b. Used to test whether the proportions are the same across the populations.

Goodness-Of-Fit Test
I. Assumptions:
1. All expected frequencies [NOT observed frequencies (O) –
see step 2 for calculation of expected frequencies] are at
least 1
2. At most 20% of the expected frequencies are < 5.0
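Both assumptions can be checked mechanically. This is a minimal sketch, assuming the expected frequencies have already been computed (step 2 below); the function name `expected_frequencies_ok` is illustrative. The test of independence later in these notes states the same two conditions.

```python
from typing import Sequence


def expected_frequencies_ok(expected: Sequence[float]) -> bool:
    """Check the two chi-square assumptions on the EXPECTED frequencies.

    1. Every expected frequency is at least 1.
    2. At most 20% of the expected frequencies are below 5.
    """
    if not expected:
        raise ValueError("need at least one expected frequency")
    all_at_least_one = all(e >= 1 for e in expected)
    share_below_five = sum(1 for e in expected if e < 5) / len(expected)
    return all_at_least_one and share_below_five <= 0.20


# Expected frequencies from the M&M example later in these notes: all above 5.
print(expected_frequencies_ok([152.7, 101.8, 101.8, 50.9, 50.9, 50.9]))  # True
# Hypothetical expected frequencies: 2 of 5 (40%) are below 5, so the check fails.
print(expected_frequencies_ok([20.0, 15.0, 10.0, 4.0, 3.0]))  # False
```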

II. Steps:
1. State the null and alternative hypotheses
Ho: The variable under consideration has the
specified distribution.
Ha: The variable under consideration does not have
the specified distribution.

2. Calculate the expected frequencies using the formula

E = np

where n is the sample size and p is the probability for the category given in the null hypothesis.

3. Check whether the expected frequencies (from step 2) satisfy the assumptions listed above in I. If they do not, do not use this procedure; alternative methods are outside the scope of this class. If the assumptions are met, continue to step 4.

4. Prepare to compute the 2 test. Start by choosing the


significance level (generally provided to you; if not default is
generally α = 0.05)

5. Obtain the critical value of 2 with df=k-1, where k is the


number of categories in the distribution.
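6. Compute the test statistic: χ² = Σ (O − E)²/E, where the sum runs over all k categories, O is the observed frequency of a category, and E is its expected frequency.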

7. If the value of the test statistic falls in the rejection region (that is, it is greater than or equal to the critical value), reject Ho; otherwise, do not reject Ho.

8. State conclusion in words.
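Steps 2 and 5 through 7 translate directly into code. This is a minimal sketch, assuming the critical value is read from a χ² table as in step 5 (11.070 for α = 0.05 and df = 5 in the M&M example that follows); the helper names are illustrative.

```python
from typing import Sequence


def expected_counts(n: int, probs: Sequence[float]) -> list:
    """Step 2: E = n * p for each category (p from the null hypothesis)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("null-hypothesis probabilities must sum to 1")
    return [n * p for p in probs]


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Step 6: sum of (O - E)^2 / E over the k categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def reject_h0(statistic: float, critical_value: float) -> bool:
    """Step 7: the rejection region is the right tail at or beyond the critical value."""
    return statistic >= critical_value


# Data from the M&M example that follows in these notes.
observed = [152, 114, 106, 51, 43, 43]            # brown, yellow, red, orange, green, blue
probs = [0.30, 0.20, 0.20, 0.10, 0.10, 0.10]
expected = expected_counts(sum(observed), probs)  # n = 509
df = len(observed) - 1                            # step 5: k - 1 = 5
stat = chi_square_statistic(observed, expected)
print(f"df = {df}, chi2 = {stat:.3f}, reject H0: {reject_h0(stat, 11.070)}")
# df = 5, chi2 = 4.091, reject H0: False
```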

Example
Observing that the proportion of blue M&Ms in his bowl of candy
appeared to be less than that of the other colors, R. Fricker, Jr. decided
to compare the color distribution in randomly chosen bags of M&Ms to
the theoretical distribution reported by M&M/MARS consumer affairs.
Fricker published his findings in the article, “The Mysterious Case of the
Blue M&Ms” (Chance, Vol 9(4), pp 19-22). For his experiment Fricker
bought three bags of M&Ms from local stores and counted the number
of each color. The average number of each color in the three bags was
distributed as shown in the “Observed Frequency” column.

Q: Do the data provide sufficient evidence to conclude that the color distribution of M&Ms observed by Fricker differs from the distribution of colors reported by M&M/MARS consumer affairs? Use a 95% confidence level (α = 0.05).

M&M Color   Relative Frequency (p)   Observed Frequency (O)   Expected Frequency (E = np)
Brown       0.30                     152                      (509)(0.30) = 152.7
Yellow      0.20                     114                      (509)(0.20) = 101.8
Red         0.20                     106                      (509)(0.20) = 101.8
Orange      0.10                      51                      (509)(0.10) = 50.9
Green       0.10                      43                      (509)(0.10) = 50.9
Blue        0.10                      43                      (509)(0.10) = 50.9
Totals      1.00                     n = 509

The relative frequencies (p) are the theoretical distribution, given to you; in this example they are provided by M&M/MARS. The observed frequencies (O) are the results of the experiment.

Solution
We have n = 509 and df = k − 1 = 6 − 1 = 5.
The critical chi-square value at α = 0.05 with df = 5 is 11.070.
All expected frequencies are 1 or greater, and none is less than 5, so at most 20% of the expected frequencies are less than 5. The assumptions are met.

Ho: The color distribution of the M&Ms is as reported by the company.
Ha: The color distribution of the M&Ms is different from that reported by the company.

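The test statistic is:
χ² = Σ (O − E)²/E = (152 − 152.7)²/152.7 + (114 − 101.8)²/101.8 + (106 − 101.8)²/101.8 + (51 − 50.9)²/50.9 + (43 − 50.9)²/50.9 + (43 − 50.9)²/50.9 ≈ 4.091

Since 4.091 < 11.070, the value of the test statistic does not fall in the rejection region, so we do not reject Ho. The test results are not statistically significant at the 5% level.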

The data do not provide sufficient evidence to conclude that the color distribution
of M&Ms differs from that reported by M&M / MARS consumer affairs.
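To see which colors drive the statistic, the sketch below lists each category's contribution (O − E)²/E, using only the observed and expected frequencies from the M&M table above. The variable names are illustrative.

```python
# Per-color contributions (O - E)^2 / E for the M&M example.
colors = ["Brown", "Yellow", "Red", "Orange", "Green", "Blue"]
observed = [152, 114, 106, 51, 43, 43]
expected = [152.7, 101.8, 101.8, 50.9, 50.9, 50.9]

contributions = {
    color: (o - e) ** 2 / e for color, o, e in zip(colors, observed, expected)
}
for color, c in contributions.items():
    print(f"{color:<7} {c:.3f}")
total = sum(contributions.values())
print(f"Total   {total:.3f}")  # Total   4.091
```

Yellow, Green and Blue contribute the most, but even together they keep χ² well below the critical value 11.070, which matches the conclusion above.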
Contingency Tables
Contingency tables are created by simultaneously grouping data from two or more variables into a frequency distribution. That is, contingency tables (or n-way tables) represent the frequency distribution of bivariate (n-variate) data.

Association (Independence)
Association is the relationship of two variables with each other.
In other words, there exists an association between two variables
of a population if knowing the value of one allows us to obtain
information about the other.

Conditional Distribution: Each column of a contingency table, divided by its column total, provides the conditional distribution of the variable shown on the rows of the table for the class level represented by that column.

Marginal Distribution: The total column, divided by the sample size, provides the (unconditional) distribution of the variable shown on the rows of the contingency table. This is called the marginal distribution.

If there is no association between two variables, the conditional distributions will be the same as the marginal distribution for all class levels (in sample data, approximately the same).

The usual charting technique to show association is a segmented (stacked) bar chart.
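A short sketch of the conditional and marginal distributions described above, using the Siskel/Ebert counts from the example later in these notes (rows are Siskel's ratings, columns are Ebert's). The variable names are illustrative.

```python
# Siskel (rows) vs Ebert (columns) counts from the example later in these notes.
ratings = ["Thumbs Down", "Mixed", "Thumbs Up"]
table = [
    [24, 8, 13],   # Siskel: Thumbs Down
    [8, 13, 11],   # Siskel: Mixed
    [10, 9, 64],   # Siskel: Thumbs Up
]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Marginal distribution of Siskel's rating: the total column divided by n.
marginal = [r / n for r in row_totals]

# Conditional distribution of Siskel's rating for each of Ebert's ratings:
# each column divided by its own column total.
conditional = {
    ebert: [table[i][j] / col_totals[j] for i in range(len(table))]
    for j, ebert in enumerate(ratings)
}

print("Marginal (all reviews):  ", [f"{p:.2f}" for p in marginal])
for ebert, dist in conditional.items():
    print(f"Given Ebert {ebert:<11}:", [f"{p:.2f}" for p in dist])
```

If the ratings were unrelated, every column would look like the marginal distribution. Here they differ noticeably: when Ebert gives Thumbs Up, Siskel also gives Thumbs Up in 64/88 ≈ 0.73 of cases, compared with 83/160 ≈ 0.52 overall. These per-column proportions are what a segmented bar chart displays.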

Chi-Square Independence Test
Assumptions:
1) All expected frequencies (calculated in Step 2 below) are 1
or greater.
2) At most 20% of the expected frequencies are < 5.

Steps:
1) State the null and alternative hypotheses
Ho: The two variables under consideration are not
associated
Ha: The two variables under consideration are
associated.

2) Calculate the expected frequencies using the formula

E = (R × C) / n

Where:
R = Row total
C = Column total
n = sample size

Place each expected frequency below its corresponding observed frequency in the contingency table. You will have as many expected frequencies as you have cells in the table.

3) Check whether the expected frequencies satisfy the assumptions. If they do not, do not use this procedure; alternative methods are outside the scope of this class. If the assumptions are met, continue to step 4.
4) Prepare to compute the 2 test. Start by choosing the
significance level (generally provided to you; if not default
is generally α = 0.05)

5) Obtain the critical value of χ² with df = (r − 1)(c − 1), where r and c are the number of possible values for the two variables under consideration; equivalently, r is the number of rows and c is the number of columns in the contingency table.

6) Compute the test statistic: χ² = Σ (O − E)²/E

Where:
the sum runs over all k cells of the table (k = rc)
O is the observed frequency in a cell
E is the expected frequency in that cell

7) If the value of the test statistic falls in the rejection region, reject Ho; otherwise, do not reject Ho.

8) State conclusion in words.
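Steps 2, 5 and 6 for a contingency table can be sketched as one function. This is a minimal sketch, assuming the table holds raw counts (not percentages); the name `independence_test` is illustrative, and the critical value for step 5 still comes from the χ² table.

```python
def independence_test(observed):
    """Steps 2, 5 and 6 for an r x c contingency table of counts.

    Returns (expected, chi2, df), where expected[i][j] = R_i * C_j / n.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df


# Siskel (rows) vs Ebert (columns) counts from the example that follows.
observed = [[24, 8, 13], [8, 13, 11], [10, 9, 64]]
expected, chi2, df = independence_test(observed)
for row in expected:
    print([round(e, 2) for e in row])
print(f"chi2 = {chi2:.3f}, df = {df}")
# [11.81, 8.44, 24.75]
# [8.4, 6.0, 17.6]
# [21.79, 15.56, 45.65]
# chi2 = 45.357, df = 4
```

The printed expected frequencies match the Eij values in the Siskel/Ebert example. Because the function carries unrounded expected frequencies, the statistic comes out at about 45.357 rather than the 45.365 obtained from two-decimal table values.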

Example
Siskel’s \ Ebert’s   Thumbs Down   Mixed   Thumbs Up   Row Totals
Thumbs Down          24            8       13          45
Mixed                8             13      11          32
Thumbs Up            10            9       64          83
Column Totals        42            30      88          160

At the 1% significance level, do the data provide sufficient evidence to conclude that an association exists between the ratings of Siskel and Ebert?

Ho: Siskel’s and Ebert’s ratings are not associated (equivalently, they are independent)
Ha: Siskel’s and Ebert’s ratings are associated (equivalently, they are not independent)

Table of Observed and Expected Frequencies (observed frequency on top, expected frequency Eij below it)

Siskel’s \ Ebert’s   Thumbs Down   Mixed        Thumbs Up     Total
Thumbs Down          24            8            13            45
                     11.81 (E11)   8.44 (E12)   24.75 (E13)
Mixed                8             13           11            32
                     8.4 (E21)     6 (E22)      17.6 (E23)
Thumbs Up            10            9            64            83
                     21.79 (E31)   15.56 (E32)  45.65 (E33)
Total                42            30           88            160

For example, the expected frequency for Thumbs Down from both Siskel and Ebert (E11) is (45)(42)/160 = 11.81, and E21 is (32)(42)/160 = 8.4.

All other expected frequencies are calculated in the same manner, using the appropriate row and column totals.

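The test statistic will be:
χ² = Σ (O − E)²/E = (24 − 11.81)²/11.81 + (8 − 8.44)²/8.44 + … + (64 − 45.65)²/45.65 ≈ 45.365 (summed over all nine cells)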

Critical chi-square value at the 1% level with df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4 is 13.277.

Since 45.365 > 13.277, the test statistic lies in the rejection region, so we reject Ho. The test results are statistically significant at the 1% level. We thus conclude that, at the 1% significance level, the data provide sufficient evidence of an association (relationship) between Siskel’s and Ebert’s ratings.
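The 45.365 above is computed from expected frequencies rounded to two decimals. The sketch below recomputes the statistic both ways, using only the counts and table values from this example; the helper name `chi_square` is illustrative.

```python
def chi_square(obs, exp):
    """Sum of (O - E)^2 / E over every cell of the table."""
    return sum((o - e) ** 2 / e for orow, erow in zip(obs, exp) for o, e in zip(orow, erow))


observed = [[24, 8, 13], [8, 13, 11], [10, 9, 64]]
# Expected frequencies as printed (rounded to two decimals) in the table above.
expected_rounded = [[11.81, 8.44, 24.75], [8.4, 6, 17.6], [21.79, 15.56, 45.65]]

# Unrounded expected frequencies, E = R * C / n.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected_exact = [[r * c / n for c in col_totals] for r in row_totals]

CRITICAL_VALUE = 13.277  # chi-square table: alpha = 0.01, df = 4
for label, exp in [("rounded E", expected_rounded), ("exact E", expected_exact)]:
    stat = chi_square(observed, exp)
    print(f"{label}: chi2 = {stat:.3f}, reject H0: {stat >= CRITICAL_VALUE}")
# rounded E: chi2 = 45.365, reject H0: True
# exact E: chi2 = 45.357, reject H0: True
```

Rounding shifts the statistic only in the second decimal place, and both values are far beyond 13.277, so the decision to reject Ho does not change.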

Chi-Square Test of Homogeneity of Proportions
Assumptions:
1) All expected frequencies (calculated in Step 2 below) are 1 or greater.
2) At most 20% of the expected frequencies are < 5.

Steps:
1. State the null and alternative hypotheses
Ho: p1 = p2 = … = ps (where s = # of samples)
Ha: At least one proportion is different

2. Calculate the expected frequencies using the formula

E = (R × C) / n

Where:
R = Row total
C = Column total
n = sample size

Place each expected frequency below its corresponding observed frequency in the contingency table. You will have as many expected frequencies as you have cells in the table.

3. Check whether the expected frequencies satisfy the assumptions. If they do not, do not use this procedure; alternative methods are outside the scope of this class. If the assumptions are met, continue to step 4.

4. Prepare to compute the 2 test. Start by choosing the
significance level (generally provided to you; if not default
is generally α = 0.05)

5. Obtain the critical value of 2 with df = (r-1)(c-1), where r


and c are the number of possible values for the two
variables under consideration. Alternatively, r is the
number of rows; and c is the number of columns in the
contingency table.

6. Compute the test statistic: χ² = Σ (O − E)²/E

Where:
the sum runs over all k cells of the table (k = rc)
O is the observed frequency (count) in a cell
E is the expected frequency in that cell

7. If the value of the test statistic falls in the rejection region, reject Ho; otherwise, do not reject Ho.

8. State conclusion in words.

Example
A psychologist selected 100 people from each of four income groups and asked them whether they were “very happy”. The percentage of each group who responded “yes” and the resulting totals from the survey are shown below; because each group has 100 people, each percentage is also a count. At the 95% CL (α = 0.05), test the claim that there is no difference in the proportions.

Household Income   Less than $30,000   $30,000 to $74,999   $75,000 to $99,999   $100,000 or more   Row Totals
Yes                24%                 33%                  38%                  49%                144
No                 76%                 67%                  62%                  51%                256
Column Totals      100                 100                  100                  100                400

Ho: p1 = p2 = p3 = p4
Ha: At least one proportion is different

Table of Observed and Expected Frequencies (observed frequency on top, expected frequency Eij below it)

Household Income   Less than $30,000   $30,000 to $74,999   $75,000 to $99,999   $100,000 or more   Row Totals
Yes                24                  33                   38                   49                 144
                   36 (E11)            36 (E12)             36 (E13)             36 (E14)
No                 76                  67                   62                   51                 256
                   64 (E21)            64 (E22)             64 (E23)             64 (E24)
Total              100                 100                  100                  100                400

For example, the expected frequency E11 is (144)(100)/400 = 36, and E21 is (256)(100)/400 = 64.

All other expected frequencies are calculated in the same manner, using the appropriate row and column totals.

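The test statistic will be:
χ² = Σ (O − E)²/E = (24 − 36)²/36 + (33 − 36)²/36 + (38 − 36)²/36 + (49 − 36)²/36 + (76 − 64)²/64 + (67 − 64)²/64 + (62 − 64)²/64 + (51 − 64)²/64 ≈ 14.15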

Critical chi-square value at the 5% level with df = (r − 1)(c − 1) = (2 − 1)(4 − 1) = 3 is 7.815.

Since 14.15 > 7.815, the test statistic lies in the rejection region, so we reject Ho. The test results are statistically significant at the 5% level. We thus conclude that, at the 5% significance level, the data provide sufficient evidence that at least one proportion is different. Hence income level appears to be related to the proportion of people who say they are very happy.
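A minimal sketch of the homogeneity test for this survey, assuming (as the example states) that each income group has exactly 100 respondents, so the percentages can be used directly as counts. With different group sizes, each percentage would first be multiplied by its group size, because the statistic must be computed from counts, not proportions. Variable names are illustrative.

```python
# Observed counts: each income group has 100 respondents, so the
# percentages in the table are also the counts.
yes = [24, 33, 38, 49]  # < $30,000 | $30,000-$74,999 | $75,000-$99,999 | $100,000 or more
group_sizes = [100, 100, 100, 100]
no = [m - y for m, y in zip(group_sizes, yes)]  # [76, 67, 62, 51]
observed = [yes, no]
CRITICAL_VALUE = 7.815  # chi-square table: alpha = 0.05, df = 3

row_totals = [sum(row) for row in observed]        # [144, 256]
col_totals = [sum(col) for col in zip(*observed)]  # [100, 100, 100, 100]
n = sum(row_totals)                                # 400
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
print(expected)
print(f"chi2 = {chi2:.3f}, df = {df}, reject H0: {chi2 >= CRITICAL_VALUE}")
# [[36.0, 36.0, 36.0, 36.0], [64.0, 64.0, 64.0, 64.0]]
# chi2 = 14.149, df = 3, reject H0: True
```

Every Yes cell has an expected frequency of 36 and every No cell 64, and the statistic, about 14.149, agrees with the 14.15 reported above.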
