
Chapter 12 - Chi-Squared Test - Send

The document describes how to use a chi-squared test to analyze the relationship between two nominal variables. It provides an example of using a contingency table to test whether an undergraduate degree and MBA major chosen are independent. The chi-squared test compares the observed frequencies in the contingency table to expected frequencies calculated under the assumption that the variables are independent. If the test statistic is greater than the critical value, the null hypothesis of independence can be rejected.

Uploaded by

Ha Uyen Nguyen

Relationship between Two Nominal Variables…

To describe the relationship between two nominal variables, a cross-classification table lists the frequency of each combination of the values of the two variables…

Example

                            Occupation
Newspaper   Blue Collar   White Collar   Professional   Total
G&M                  27             29             33      89
Post                 18             43             51     112
Star                 38             21             22      81
Sun                  37             15             20      72
Total               120            108            126     354

Example
To convert the frequencies in each column to relative frequencies within that column, divide each cell by its column total:

                             Occupation
Newspaper   Blue Collar     White Collar    Professional
G&M         27/120 = .23    29/108 = .27    33/126 = .26
Post        18/120 = .15    43/108 = .40    51/126 = .40
Star        38/120 = .32    21/108 = .19    22/126 = .17
Sun         37/120 = .31    15/108 = .14    20/126 = .16
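The column relative frequencies above can be reproduced with a few lines of Python. This is a minimal sketch using only the counts from the table; the variable names (`counts`, `column_totals`, `relative`) are illustrative and not part of the original slides.

```python
# Observed frequencies from the newspaper-by-occupation table.
occupations = ["Blue Collar", "White Collar", "Professional"]
counts = {
    "G&M":  [27, 29, 33],
    "Post": [18, 43, 51],
    "Star": [38, 21, 22],
    "Sun":  [37, 15, 20],
}

# Column totals (120, 108, 126 in the table).
column_totals = [
    sum(row[j] for row in counts.values()) for j in range(len(occupations))
]

# Divide each cell by its column total to get the share of each
# newspaper within an occupation group.
relative = {
    paper: [row[j] / column_totals[j] for j in range(len(occupations))]
    for paper, row in counts.items()
}

for paper, shares in relative.items():
    print(paper, " ".join(f"{s:.2f}" for s in shares))
```

Because each column is divided by its own total, the shares in every column sum to 1, which makes the occupation groups directly comparable even though their sizes differ.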

Example
Interpretation: The relative frequencies in columns 2 and 3 (white collar and professional) are similar, but there are large differences between columns 1 and 2 and between columns 1 and 3.
This tells us that blue collar workers tend to read different newspapers from both white collar workers and professionals, and that white collar workers and professionals are quite similar in their newspaper choice.

Graphing the Relationship Between Two Nominal Variables…

Use the data from the cross-classification table to create bar charts…

Professionals tend to read the Post more than twice as often as the Star or Sun…

Chapter 10

Chi-Squared Tests

Two Techniques…

The first is a goodness-of-fit test applied to data produced by a multinomial experiment (to see whether the data follow a particular distribution).
The second uses data arranged in a contingency table to determine whether two classifications of a population of nominal data are statistically independent (to see whether the two variables are independent).

In both cases, we use the chi-squared ($\chi^2$) distribution.

The Multinomial Experiment

A multinomial experiment:
• Consists of a fixed number, $n$, of trials.
• Each trial can have one of $k$ outcomes, called cells.
• Each probability $p_i$ remains constant from trial to trial.
• Our usual notion of probabilities holds, namely $p_1 + p_2 + \ldots + p_k = 1$.
• Each trial is independent of the other trials.
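The properties listed above can be illustrated by simulating a multinomial experiment. The sketch below is an assumption-laden illustration, not part of the original slides: the function name `simulate_multinomial`, the seed, and the choice of cell probabilities (borrowed from the market-share example that follows) are all hypothetical.

```python
import random
from collections import Counter

def simulate_multinomial(n, probs, seed=0):
    """Run n independent trials; each lands in cell i with probability probs[i]."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("cell probabilities must sum to 1")
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    cells = range(len(probs))      # k cells, indexed 0..k-1
    outcomes = rng.choices(cells, weights=probs, k=n)  # n independent draws
    tally = Counter(outcomes)
    return [tally[i] for i in cells]

# n = 200 trials, k = 3 cells with constant probabilities .45, .40, .15
counts = simulate_multinomial(200, [0.45, 0.40, 0.15])
print(counts, sum(counts))
```

The cell counts always add up to n, and repeated runs with different seeds scatter around the expected counts n·p_i.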

Example: Before the advertising campaigns began, the
market share of company A was 45%, whereas company
B had 40% of the market. Other competitors accounted
for the remaining 15%.
After the advertising campaigns, a marketing analyst
solicited the preferences of a random sample of 200
customers of fabric softener. Of the 200 customers, 102
indicated a preference for company A's product, 82
preferred company B's, and the remaining 16 preferred
the products of one of the competitors.
Can the analyst infer at the 5% significance level that
customer preferences have changed from their levels
before the advertising campaigns were launched?

Chi-squared Goodness-of-Fit Test…
We test whether there is sufficient evidence to reject a specified set of values for $p_i$.
Step 1: H0: $p_1 = a_1, p_2 = a_2, \ldots, p_k = a_k$ (here $a_1 = .45$, $a_2 = .40$, $a_3 = .15$)
H1: At least one $p_i$ is not equal to its specified value
Step 2: $\alpha = 0.05$
Step 3: $n = 200$
Required conditions: The sample size must be large enough that the expected value for each cell is 5 or more (i.e., $n p_i \ge 5$). If the expected frequency of a cell is less than five, combine it with other cells to satisfy the condition. The test statistic is then approximately chi-squared distributed with $k - 1$ degrees of freedom.
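Step 4: The rejection region is
$$\chi^2 > \chi^2_{\alpha,\,k-1} = \chi^2_{.05,\,2} = 5.99$$

Step 5: Our chi-squared goodness-of-fit test statistic
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
measures the similarity of the expected and observed frequencies, where $f_i$ is the observed frequency and $e_i$ is the expected frequency of cell $i$.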

If the null hypothesis is true, the expected frequency for
each cell is ei = npi. That is,
e1 = 200(.45) = 90
e2 = 200(.40) = 80
e3 = 200(.15) = 30
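The expected frequencies and the rule-of-five condition can be checked with a short sketch. The function names `expected_frequencies` and `satisfies_rule_of_five` are illustrative assumptions; the numbers come from the example.

```python
def expected_frequencies(n, probs):
    """Expected cell counts e_i = n * p_i under the null hypothesis."""
    return [n * p for p in probs]

def satisfies_rule_of_five(expected):
    """True when every expected cell count is at least 5."""
    return all(e >= 5 for e in expected)

n = 200
null_probs = [0.45, 0.40, 0.15]   # company A, company B, others
expected = expected_frequencies(n, null_probs)

print([round(e, 2) for e in expected])
print(satisfies_rule_of_five(expected))
```

If the check returned False, cells would need to be combined before running the test, which also reduces the degrees of freedom.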

To calculate our test statistic, we lay out the data in tabular form for easier calculation by hand:

           Observed    Expected                  Summation
           Frequency   Frequency   Delta         Component
Company    f_i         e_i         (f_i - e_i)   (f_i - e_i)^2 / e_i
A          102          90           12          1.60
B           82          80            2          0.05
Others      16          30          -14          6.53
Total      200         200                       8.18

Check that the observed and expected totals are equal (both are 200).
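The hand calculation in the table can be verified in code. This is a minimal sketch; the helper name `chi_squared_components` is an illustrative assumption.

```python
def chi_squared_components(observed, expected):
    """Per-cell contributions (f_i - e_i)^2 / e_i to the test statistic."""
    return [(f - e) ** 2 / e for f, e in zip(observed, expected)]

observed = [102, 82, 16]   # companies A, B, others
expected = [90, 80, 30]    # n * p_i under H0

# The two totals must agree, as noted under the table.
totals_match = sum(observed) == sum(expected)

components = chi_squared_components(observed, expected)
statistic = sum(components)

print(totals_match)
print([round(c, 2) for c in components], round(statistic, 2))
```

Most of the statistic comes from the "Others" cell, where the observed count of 16 falls well short of the expected 30.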

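Step 6: If the expected and observed frequencies are quite different, we would reject H0. Since our test statistic, 8.18, is greater than the critical value $\chi^2_{.05,\,2} = 5.99$, we reject H0 in favor of H1: there is sufficient evidence at the 5% significance level to infer that customer preferences have changed since the advertising campaigns were launched.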
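The decision in Step 6 can be sketched without a statistics library. With exactly 2 degrees of freedom, the chi-squared upper-tail probability has the closed form exp(-x/2). The helper names below are illustrative assumptions, and the shortcut applies only to this 2-degree-of-freedom case.

```python
import math

def chi2_upper_tail_df2(x):
    """P(X > x) for a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)

def chi2_critical_df2(alpha):
    """Value c with P(X > c) = alpha for 2 degrees of freedom."""
    return -2.0 * math.log(alpha)

statistic = 8.18      # test statistic from the table
alpha = 0.05

critical = chi2_critical_df2(alpha)       # roughly 5.99
p_value = chi2_upper_tail_df2(statistic)  # probability of a statistic this large under H0
reject_h0 = statistic > critical

print(round(critical, 2), round(p_value, 4), reject_h0)
```

Comparing the statistic with the critical value and comparing the p-value with alpha always lead to the same decision; for other degrees of freedom a table or a statistics package would be needed.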
Identifying Factors…

Factors that identify the chi-squared goodness-of-fit test:

$e_i = n p_i$

Chi-squared Test of a Contingency Table

The chi-squared test of a contingency table is used to:
• determine whether there is enough evidence to infer that two nominal variables are related, and
• infer that differences exist among two or more populations of nominal variables.

Example: A statistics professor took a random sample
of last year's MBA students and recorded the
undergraduate degree and the major selected in the
graduate program.
The undergraduate degrees were BA, BEng, BBA, and
several others.
There are three possible majors for the MBA students:
accounting, finance, and marketing.
Can the statistician conclude that the undergraduate
degree affects the choice of major?

Xm15-02 The data are stored in two columns.
The first column represents the undergraduate degree, where 1 = BA; 2 = BEng; 3 = BBA; 4 = other.
The second column lists the MBA major, where 1 = Accounting; 2 = Finance; 3 = Marketing.
The problem objective is to determine whether two variables (undergraduate degree and MBA major) are related. Both variables are nominal.
The technique to use is the chi-squared test of a contingency table.

Example 15.2 IDENTIFY

Step 1: H0: The two variables are independent.
H1: The two variables are dependent.
Note that the null hypothesis does not specify the proportions $p_i$ from which we would compute $e_i = n p_i$.
Step 2: $\alpha = .05$
Step 3: $n = 152$
In a contingency table where one or more cells have expected values of less than 5, we need to combine rows or columns to satisfy the rule of five. Doing so also changes the degrees of freedom.
Step 4: The rejection region is
$$\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$$
where $r = 4$ is the number of rows and $c = 3$ is the number of columns, so the critical value is
$$\chi^2_{.05,\,6} = 12.6$$
The test statistic is the same as the one used to test proportions in the goodness-of-fit test. That is,
$$\chi^2 = \sum_{i} \frac{(f_i - e_i)^2}{e_i}$$
where $e_i = n p_i$ and the sum runs over all cells of the table.
Because we don’t know the pi (they are not specified by
the null hypothesis), it is necessary to estimate the pi
from the data.

The first step is to count the number of students in each of the 12 combinations. The result is called a cross-classification table.

                              MBA Major
Undergrad Degree   Accounting   Finance   Marketing   Total
BA                         31        13          16      60
BEng                        8        16           7      31
BBA                        12        10          17      39
Other                      10         5           7      22
Total                      61        44          47     152
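Counting the combinations from two coded columns can be sketched as follows. The code-to-label mappings follow the coding described for the data file, but the `sample` records are hypothetical placeholders (the actual Xm15-02 records are not reproduced here), and the function name `cross_classify` is an illustrative assumption.

```python
from collections import Counter

# Coding scheme described for the two data columns.
DEGREES = {1: "BA", 2: "BEng", 3: "BBA", 4: "Other"}
MAJORS = {1: "Accounting", 2: "Finance", 3: "Marketing"}

def cross_classify(records):
    """Count how often each (degree code, major code) pair occurs."""
    pair_counts = Counter(records)
    return {
        DEGREES[d]: {MAJORS[m]: pair_counts[(d, m)] for m in MAJORS}
        for d in DEGREES
    }

# Hypothetical records for illustration only, NOT the real data set.
sample = [(1, 1), (1, 1), (2, 2), (3, 3), (4, 1), (1, 3), (2, 2)]

table = cross_classify(sample)
for degree, row in table.items():
    print(degree, row, sum(row.values()))
```

Applied to the full set of 152 records, the same counting would produce the cross-classification table shown above; combinations that never occur are reported as 0 rather than omitted.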

If the null hypothesis is true, the two nominal variables are independent, so that, for example,
$$P(\text{BA and Accounting}) = P(\text{BA}) \, P(\text{Accounting})$$
We need to use the data to estimate the probabilities:
$$P(\text{Accounting}) \approx \frac{61}{152} = .401, \qquad P(\text{BA}) \approx \frac{60}{152} = .395$$
If the null hypothesis is true,
$$P(\text{BA and Accounting}) = P(\text{BA}) \, P(\text{Accounting}) \approx \frac{61}{152} \cdot \frac{60}{152}$$
The expected number of BA and Accounting students out of 152 is
$$152 \cdot \frac{61}{152} \cdot \frac{60}{152} = 24.08$$
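The same calculation extends to every cell: the expected frequency is (row total × column total) / n. The sketch below applies it to the observed table; the function name `expected_table` is an illustrative assumption.

```python
# Observed counts: rows are undergraduate degrees,
# columns are Accounting, Finance, Marketing.
observed = {
    "BA":    [31, 13, 16],
    "BEng":  [8, 16, 7],
    "BBA":   [12, 10, 17],
    "Other": [10, 5, 7],
}

def expected_table(table):
    """Expected counts under independence: row total * column total / n."""
    row_totals = {name: sum(row) for name, row in table.items()}
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    n = sum(row_totals.values())
    return {
        name: [row_totals[name] * col_totals[j] / n for j in range(n_cols)]
        for name in table
    }

expected = expected_table(observed)
for degree, row in expected.items():
    print(degree, [round(e, 2) for e in row])

# Rule of five: every expected count should be at least 5.
rule_of_five_ok = min(min(row) for row in expected.values()) >= 5
print(rule_of_five_ok)
```

Every expected count here is above 5, so no rows or columns need to be combined for this example.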

We can now compare observed with expected frequencies…

                                          MBA Major
                   Accounting            Finance               Marketing
Undergrad Degree   Observed  Expected    Observed  Expected    Observed  Expected
BA                       31     24.08          13     17.37          16     18.55
BEng                      8     12.44          16      8.97           7      9.59
BBA                      12     15.65          10     11.29          17     12.06
Other                    10      8.83           5      6.37           7      6.80

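and calculate our test statistic:
$$\chi^2 = \sum_{i} \frac{(f_i - e_i)^2}{e_i} = 14.70$$

Step 6: Since 14.70 is greater than the critical value of 12.6, we reject H0. There is enough evidence to infer that the MBA major and the undergraduate degree are related.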
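The whole contingency-table test can be sketched in a few lines. The function name `contingency_chi_squared` is an illustrative assumption, and the critical value 12.6 is taken from the slides rather than computed.

```python
# Observed counts: rows BA, BEng, BBA, Other;
# columns Accounting, Finance, Marketing.
observed = [
    [31, 13, 16],
    [8, 16, 7],
    [12, 10, 17],
    [10, 5, 7],
]

def contingency_chi_squared(table):
    """Return (chi-squared statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected under independence
            statistic += (f - e) ** 2 / e
    df = (len(table) - 1) * (len(table[0]) - 1)     # (r - 1)(c - 1)
    return statistic, df

statistic, df = contingency_chi_squared(observed)
critical = 12.6   # chi-squared critical value for alpha = .05 and 6 df, from the slides
reject_h0 = statistic > critical

print(round(statistic, 2), df, reject_h0)
```

The largest single contribution comes from the BEng and Finance cell (16 observed against 8.97 expected), which is where the data depart most from independence.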
Factors that identify the Chi-squared test of a
contingency table:
