Chapter 12 - Chi-Squared Test - Send
Example
                          Occupation
Newspaper   Blue Collar   White Collar   Professional   Total
G&M              27            29             33           89
Post             18            43             51          112
Star             38            21             22           81
Sun              37            15             20           72
Total           120           108            126          354
Example
To convert the frequencies in each column to relative frequencies, divide
each cell by its column total:
                          Occupation
Newspaper   Blue Collar     White Collar    Professional
G&M         27/120 = .23    29/108 = .27    33/126 = .26
Post        18/120 = .15    43/108 = .40    51/126 = .40
Star        38/120 = .32    21/108 = .19    22/126 = .17
Sun         37/120 = .31    15/108 = .14    20/126 = .16
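The column-wise conversion can be reproduced in a few lines of Python. This is a minimal sketch: the dictionary layout and the names `counts` and `column_relative_frequencies` are illustrative assumptions, not part of the original example.

```python
# Observed counts from the newspaper-by-occupation table.
# Column order: Blue Collar, White Collar, Professional.
counts = {
    "G&M":  [27, 29, 33],
    "Post": [18, 43, 51],
    "Star": [38, 21, 22],
    "Sun":  [37, 15, 20],
}


def column_relative_frequencies(table):
    """Divide every cell by the total of its column."""
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    return {
        name: [row[j] / col_totals[j] for j in range(n_cols)]
        for name, row in table.items()
    }


rel = column_relative_frequencies(counts)
for name, row in rel.items():
    print(name, " ".join(f"{value:.3f}" for value in row))
```

Each column of `rel` sums to 1, which is a quick check that the division used column totals rather than row totals.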
Example
Interpretation: The relative frequencies in columns 2 and 3 (white collar and
professional) are similar, but there are large differences between columns 1
and 2 and between columns 1 and 3.
This tells us that blue collar workers tend to read different newspapers from both white
collar workers and professionals, and that white collar workers and professionals are
quite similar in their newspaper choice.
Graphing the Relationship Between Two Nominal Variables…
Use the data from the cross-classification table to create bar charts…
Professionals tend to read the Post more than twice as often as the Star or Sun.
Chapter 10
Chi-Squared Tests
Two Techniques…
The Multinomial Experiment
A multinomial experiment:
• Consists of a fixed number, n, of trials.
• Each trial can have one of k outcomes, called cells.
• The probability pi that the outcome falls into cell i remains constant from trial to trial.
• Our usual notion of probabilities holds, namely:
p1 + p2 + … + pk = 1, and
• Each trial is independent of the other trials.
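The properties listed above can be illustrated with a small simulation. This is only a sketch: the function name `run_multinomial_experiment`, the seed, and the cell probabilities (0.5, 0.3, 0.2) are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def run_multinomial_experiment(n, probabilities, seed=None):
    """Simulate n independent trials, each landing in one of k cells."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    rng = random.Random(seed)
    cells = range(1, len(probabilities) + 1)
    # The same weights are used on every draw, so each p_i stays constant.
    outcomes = rng.choices(cells, weights=probabilities, k=n)
    tally = Counter(outcomes)
    return [tally.get(cell, 0) for cell in cells]


# Hypothetical probabilities for k = 3 cells and n = 200 trials.
observed = run_multinomial_experiment(n=200, probabilities=[0.5, 0.3, 0.2], seed=1)
print(observed)
```

The returned list holds the observed frequency of each cell; the frequencies always add up to n.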
Example: Before the advertising campaigns began, the
market share of company A was 45%, whereas company
B had 40% of the market. Other competitors accounted
for the remaining 15%.
After the advertising campaigns, a marketing analyst
solicited the preferences of a random sample of 200
customers of fabric softener. Of the 200 customers, 102
indicated a preference for company A's product, 82
preferred company B's, and the remaining 16 preferred
the products of one of the competitors.
Can the analyst infer at the 5% significance level that
customer preferences have changed from their levels
before the advertising campaigns were launched?
Chi-squared Goodness-of-Fit Test…
We test whether there is sufficient evidence to reject a
specified set of values for pi.
Step 1: H0: p1 = a1, p2 = a2, …, pk = ak
H1: At least one pi is not equal to its specified value
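Step 2: $\alpha = 0.05$
Step 3: n = 200. The test statistic is
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
where fi is the observed frequency and ei = npi is the
expected frequency of cell i.
Required Conditions: The sample size must be large
enough that the expected frequency of each cell is 5 or
more (i.e. npi ≥ 5). If an expected frequency is less
than 5, combine that cell with other cells to satisfy the
condition. The statistic is then approximately chi-squared
distributed with k–1 degrees of freedom.
Step 4: The rejection region is
$\chi^2 > \chi^2_{\alpha,\,k-1} = \chi^2_{.05,\,2} = 5.99$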
If the null hypothesis is true, the expected frequency for
each cell is ei = npi. That is,
e1 = 200(.45) = 90
e2 = 200(.40) = 80
e3 = 200(.15) = 30
To calculate our test statistic, we lay out the data in
tabular form for easier calculation by hand:
Company   Observed Frequency   Expected Frequency   Delta       Summation Component
          fi                   ei = npi             (fi – ei)   (fi – ei)²/ei
A         102                  90                    12         1.60
B          82                  80                     2         0.05
Others     16                  30                   -14         6.53
Total     200                  200                              8.18
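The hand calculation can be checked with a short Python sketch. The variable names are illustrative assumptions. The critical value and p-value use the fact that a chi-squared variable with 2 degrees of freedom has the exact tail probability P(X > x) = exp(-x/2), so no statistics library is needed; this shortcut does not carry over to other degrees of freedom.

```python
import math

observed = {"A": 102, "B": 82, "Others": 16}           # sample of n = 200 customers
null_shares = {"A": 0.45, "B": 0.40, "Others": 0.15}   # H0: pre-campaign market shares
alpha = 0.05

n = sum(observed.values())
expected = {cell: n * p for cell, p in null_shares.items()}

# Required condition: every expected frequency is at least 5.
if any(e < 5 for e in expected.values()):
    raise ValueError("combine cells: an expected frequency is below 5")

chi_squared = sum(
    (observed[cell] - expected[cell]) ** 2 / expected[cell] for cell in observed
)

df = len(observed) - 1
if df != 2:
    raise ValueError("the closed-form tail used below only holds for 2 df")

critical_value = -2 * math.log(alpha)    # solves exp(-x / 2) = alpha
p_value = math.exp(-chi_squared / 2)     # P(chi-squared with 2 df > statistic)

print(f"chi-squared    = {chi_squared:.2f}")
print(f"critical value = {critical_value:.2f}")
print(f"p-value        = {p_value:.4f}")
print("reject H0" if chi_squared > critical_value else "do not reject H0")
```

The statistic of about 8.18 exceeds the critical value of about 5.99, so at the 5% significance level the data suggest that customer preferences have changed.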
Chi-squared Test of a Contingency Table
Example: A statistics professor took a random sample
of last year's MBA students and recorded the
undergraduate degree and the major selected in the
graduate program.
The undergraduate degrees were BA, BEng, BBA, and
several others.
There are three possible majors for the MBA students,
accounting, finance, and marketing.
Can the statistician conclude that the undergraduate
degree affects the choice of major?
Xm15-02 The data are stored in two columns.
The first column represents the undergraduate degree
where 1 = BA; 2 = BEng; 3 = BBA; 4 = other.
The second column lists the MBA major, where 1 =
Accounting; 2 = Finance; 3 = Marketing.
The problem objective is to determine whether two
variables (undergraduate degree and MBA major) are
related. Both variables are nominal.
The technique to use is the chi-squared test of a
contingency table.
Example 15.2 IDENTIFY
The rejection region is
$\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)} = \chi^2_{.05,\,6} = 12.6$
The test statistic is the same as the one used to test
proportions in the goodness-of-fit test. That is,
$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$
where ei = npi.
Because we don’t know the pi (they are not specified by
the null hypothesis), it is necessary to estimate the pi
from the data.
The first step is to count the number of students in each
of the 12 combinations. The result is called a cross-
classification table.
                            MBA Major
Undergrad Degree   Accounting   Finance   Marketing   Total
BA                     31          13         16        60
BEng                    8          16          7        31
BBA                    12          10         17        39
Other                  10           5          7        22
Total                  61          44         47       152
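The counting step can be sketched in Python as follows. The records shown are hypothetical stand-ins, not the Xm15-02 data, and the names `DEGREES`, `MAJORS`, and `cross_classify` are assumptions made for illustration; only the numeric coding follows the description of the two data columns.

```python
from collections import Counter

DEGREES = {1: "BA", 2: "BEng", 3: "BBA", 4: "Other"}
MAJORS = {1: "Accounting", 2: "Finance", 3: "Marketing"}

# Hypothetical records, NOT the Xm15-02 data: (degree code, major code).
records = [(1, 1), (1, 3), (2, 2), (3, 3), (4, 1), (1, 1), (2, 2), (3, 1)]


def cross_classify(pairs):
    """Count how many records fall in each (degree, major) combination."""
    tally = Counter(pairs)
    return {
        DEGREES[d]: {MAJORS[m]: tally.get((d, m), 0) for m in MAJORS}
        for d in DEGREES
    }


table = cross_classify(records)
for degree, row in table.items():
    print(degree, row, "total:", sum(row.values()))
```

Combinations that never occur in the data still appear in the table with a count of 0, so all 12 cells are always present.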
If the null hypothesis is true, the two nominal variables
are independent. Then, for example,
P(BA and Accounting) = [P(BA)] [P(Accounting)]
We need to use the data to estimate the probabilities:
$P(\text{Accounting}) \approx \frac{61}{152} = .401$ ; $P(\text{BA}) \approx \frac{60}{152} = .395$
If the null hypothesis is true,
$P(\text{BA and Accounting}) = P(\text{BA})\,P(\text{Accounting}) \approx \frac{60}{152} \cdot \frac{61}{152}$
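Multiplying an estimated joint probability by n gives the expected frequency of a cell, which simplifies to (row total × column total) / n. The sketch below applies this to all 12 cells of the cross-classification table and sums the components of the test statistic. The variable names are illustrative assumptions, and the critical value is the tabulated 12.6 for a 5% significance level with 6 degrees of freedom.

```python
# Observed counts; columns are Accounting, Finance, Marketing.
observed = {
    "BA":    [31, 13, 16],
    "BEng":  [8, 16, 7],
    "BBA":   [12, 10, 17],
    "Other": [10, 5, 7],
}

row_totals = {degree: sum(row) for degree, row in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
n = sum(row_totals.values())

# Under independence, P(row and column) is estimated by
# (row total / n) * (column total / n); multiplying by n gives e.
expected = {
    degree: [row_totals[degree] * c / n for c in col_totals]
    for degree in observed
}

chi_squared = sum(
    (f - e) ** 2 / e
    for degree in observed
    for f, e in zip(observed[degree], expected[degree])
)

df = (len(observed) - 1) * (len(col_totals) - 1)   # (r - 1)(c - 1)
critical_value = 12.6   # tabulated chi-squared value for alpha = .05, 6 df

print(f"expected BA/Accounting = {expected['BA'][0]:.2f}")
print(f"chi-squared = {chi_squared:.2f} with {df} degrees of freedom")
print("reject H0" if chi_squared > critical_value else "do not reject H0")
```

For these data the statistic comes to about 14.70, which exceeds 12.6, so the sample gives evidence that undergraduate degree and MBA major are related.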