6.3 Chi-Square

The document provides an overview of the Chi-square distribution and its applications in biostatistics, including the Chi-square test of independence and goodness of fit. It outlines the properties of the Chi-square distribution, the difference between parametric and non-parametric tests, and the steps for conducting a Chi-square test, including hypothesis formulation and calculation of expected frequencies. Additionally, it discusses the use of Yates’s continuity correction and Fisher’s exact test when certain conditions are not met.

Uploaded by

sergekouassi065

CHI-SQUARE DISTRIBUTION (X2) & Chi-Square Test

Biostatistics Course 2021-2022 / Block 6

Ali Lateef Jasim, MBChB.
Learning objectives

❑ Describe cross-tabulation and assess the relationship between two categorical (nominal or ordinal level) variables with two or more categories.
❑ Understand the concept of observed and expected frequencies.
❑ Interpret the SPSS output for the Chi-square procedure.
❑ Understand the applications of Fisher’s exact test and Yates’s continuity correction.
CHI-SQUARE (X2) DISTRIBUTION
PROPERTIES:
1. It is one of the most widely used distributions in statistical applications.
2. This distribution may be derived from the normal distribution.
3. This distribution takes values from zero to + infinity.
4. X2 relates to the frequencies of occurrence of individuals (or events) in the categories of one or more variables.
5. The X2 test is used to test the agreement between the observed frequencies of certain characteristics and the frequencies expected under a given hypothesis.
CHI-SQUARE (X2) DISTRIBUTION
❖ The values of the test statistic in the Chi-square distribution lie between zero and + ∞. No negative values are present since the statistic is a sum of squared values.
❖ The Chi-square distribution has one tail only (positively skewed distribution).
❖ The higher the degrees of freedom (df), the flatter and more symmetric the curve.
❖ It includes three tests:
1. CHI-SQUARE (X2) test of Goodness of fit (non-parametric)
2. CHI-SQUARE (X2) test of homogeneity
3. CHI-SQUARE (X2) test of Independence
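
The link to the normal distribution and the shape properties listed above can be checked numerically: a Chi-square variable with df degrees of freedom is the sum of df squared independent standard normal values. The following is a minimal Python sketch, not part of the course material; the function name, the seed and the number of draws are illustrative assumptions.

```python
import random
import statistics

def simulate_chi_square(df, n_draws=20000, seed=1):
    """Draw n_draws values, each the sum of df squared standard normal values."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
            for _ in range(n_draws)]

for df in (1, 4, 10):
    draws = simulate_chi_square(df)
    # Values are never negative, the mean is close to df, and the mean
    # exceeds the median (a sign of positive skew).
    print(f"df={df}: min={min(draws):.3f} "
          f"mean={statistics.mean(draws):.2f} "
          f"median={statistics.median(draws):.2f}")
```

With a simulation like this the minimum stays at or above zero and the mean sits above the median for each df, which is what a positively skewed, one-tailed distribution looks like.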
What are parametric vs non-parametric tests?

❖ Parametric tests are those that make assumptions about the parameters of the population distribution from which the sample is drawn.
❖ This is often the assumption that the population data are normally distributed.
❖ Non-parametric tests are “distribution-free” and, as such, can be used for non-Normal variables.
What are parametric vs non-parametric tests?
❖ Non-parametric tests are valid for both non-Normally
distributed data and Normally distributed data, so why not
use them all the time?
1. We are rarely interested in a significance test alone; we
would like to say something about the population from
which the samples came, and this is best done with
estimates of parameters and confidence intervals.
2. Parametric tests usually have more statistical power than
their non-parametric equivalents. In other words, one is
more likely to detect significant differences when they truly
exist.
Reasons to Use Nonparametric Tests
1. The underlying data do not meet the assumptions made about the population.
Generally, the application of parametric tests requires various assumptions to be satisfied, for example that the data follow a normal distribution.
2. The sample size is too small.
Sample size is an important consideration in selecting the appropriate statistical method. If the sample size is reasonably large, the applicable parametric test can be used. However, if the sample size is too small, you may not be able to verify the distribution of the data. Thus, the application of nonparametric tests is the only suitable option.
3. The analyzed data are ordinal or nominal.
CHI-SQUARE(X2) test of Independence
❖ It is used to test the null hypothesis that two criteria of classification, when applied to the same set of entities, are independent (NO ASSOCIATION).
❖ Generally, a single sample of size (n) is drawn from a population, and the entities are cross-classified on the basis of the two variables of interest (X & Y). The cells are formed by the intersections of the rows (r) and the columns (c).
❖ The resulting table is called the ‘contingency table’.
❖ Calculation of the expected frequencies is based on probability theory.
CHI-SQUARE(X2) test of Independence
❖ The hypotheses and conclusions are stated in terms of the independence or lack of independence of the two variables.

X2 = ∑ (O − E)² / E
df = (r − 1)(c − 1)
where O is the observed frequency and E is the expected frequency of a cell.
Steps in constructing Chi square -test

1. Hypotheses
Ho: the 2 criteria are independent (no association)
HA: The 2 criteria are not independent (There is association)
2. Construct the contingency table
3. Calculate the expected frequency for each cell by multiplying the corresponding marginal totals (row total and column total) of that cell and dividing the product by the sample size.
E = (row total x column total) / grand total
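
The expected-frequency rule of step 3 can be sketched in a few lines of Python. This is only an illustration: the function name is an assumption, and the 2x2 counts are hypothetical numbers chosen to keep the arithmetic easy to follow.

```python
def expected_frequencies(observed):
    """Return expected counts: E = (row total x column total) / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2x2 counts, used only to illustrate the calculation.
observed = [[10, 20],
            [30, 40]]
for row in expected_frequencies(observed):
    print(row)
```

Here the row totals are 30 and 70, the column totals are 40 and 60, and the grand total is 100, so the first cell has E = 30 x 40 / 100 = 12.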
Steps in constructing Chi square -test
4. Calculate the X2 value (calculated X2, X2c)
✓ X2 = ∑ (O − E)² / E.
✓ For each cell we calculate its (O − E)² / E value.
✓ The values for all the cells of the contingency table are added together to give X2c.
5. Define the critical value (tabulated X2)
✓ This depends on the alpha level of significance and the degrees of freedom. The value is read from the X2 table.
✓ df = (r − 1)(c − 1), where r: no. of rows, c: no. of columns
Chi-Square Table (Tabulated X 2 )
Steps in constructing Chi square -test

6. Conclusion
✓ If the calculated X2 is less than the tabulated X2, we fail to reject (accept) Ho.
✓ If the calculated X2 is more than the tabulated X2, we reject Ho.
✓ The tabulated X2 for a 2x2 table with df = 1 and alpha error = 0.05 is equal to (1.96)² = 3.84.
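
The relation between 1.96 and 3.84, and the decision rule itself, can be reproduced with the Python standard library. The sketch below is an illustration under stated assumptions: the `decide` helper is a hypothetical name, and the df = 2 shortcut relies on the fact that for df = 2 the upper-tail probability of X2 equals exp(-x/2).

```python
import math
from statistics import NormalDist

alpha = 0.05

# df = 1: a Chi-square variable is a squared standard normal variable,
# so the critical value is the square of the two-sided z value.
z = NormalDist().inv_cdf(1 - alpha / 2)
critical_df1 = z ** 2

# df = 2: the upper-tail probability is exp(-x / 2),
# so the critical value is -2 * ln(alpha).
critical_df2 = -2 * math.log(alpha)

def decide(x2_calculated, x2_tabulated):
    """Apply the decision rule of step 6."""
    return "reject Ho" if x2_calculated > x2_tabulated else "do not reject Ho"

print(round(z, 2), round(critical_df1, 2))   # 1.96 3.84
print(round(critical_df2, 3))                # 5.991
```

For other degrees of freedom the critical value is normally read from the X2 table or obtained from statistical software.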
Notes

For the cross table (r×c), the X2 test is not applicable if:
A. The expected frequency of any cell is < 1.
B. More than 20% of the cells have an expected frequency < 5.

For a 2 × 2 table, the X2 test is not applicable if the expected frequency of any cell is < 5.
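
These applicability notes can be turned into a small check on a table of expected counts. The sketch below is hedged: the function name is hypothetical, the example tables are made-up numbers, and note B is read in its usual form (no more than 20% of the cells may have an expected count below 5), which is an assumption about the intended meaning.

```python
def chi_square_applicable(expected):
    """Return True if the expected counts satisfy the notes above."""
    cells = [e for row in expected for e in row]
    n_rows, n_cols = len(expected), len(expected[0])
    if n_rows == 2 and n_cols == 2:
        # 2 x 2 table: every expected count must be at least 5.
        return all(e >= 5 for e in cells)
    if any(e < 1 for e in cells):
        # Note A: no expected count below 1.
        return False
    # Note B (assumed reading): at most 20% of cells with expected count < 5.
    share_small = sum(e < 5 for e in cells) / len(cells)
    return share_small <= 0.20

# Hypothetical expected counts, for illustration only.
print(chi_square_applicable([[12.0, 18.0], [28.0, 42.0]]))             # True
print(chi_square_applicable([[3.0, 7.0], [9.0, 21.0]]))                # False
print(chi_square_applicable([[4.0, 10.0, 6.0], [8.0, 20.0, 12.0]]))    # True
```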
Example. 1
The table shows the distribution of individuals according to 3 categories of Socioeconomic Index Level (SEIL).

SEIL      No    %
Low       50    25
Average   110   55
High      40    20
Total     200   100

In the same sample, the location of residence was also classified into 3 sectors: south, center and north.
Example. 1
Sector    N     %
south     44    22
center    96    48
north     60    30
Total     200   100

The table shows the cross-tabulation of the 3 categories of Socioeconomic Index Level (SEIL) by sector of residence.

SEIL      South   Center   North   Total
Low       33      7        10      50
Average   9       81       20      110
High      2       8        30      40
Total     44      96       60      200
Example. 1

❖ The last one is the most important.
❖ It shows the calculated Chi square = 126.2
❖ The df = (3-1)(3-1) = 4 and P value < 0.001, so Ho is rejected: SEIL and sector of residence are associated.
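
The reported values for Example 1 can be reproduced by hand from the SEIL by sector cross-tabulation. The following Python sketch is an illustration rather than the SPSS procedure; the variable names are assumptions, and the p value uses the closed form that holds for df = 4 only.

```python
import math

# Observed counts from the SEIL x sector cross-tabulation (South, Center, North).
observed = [[33, 7, 10],    # Low
            [9, 81, 20],    # Average
            [2, 8, 30]]     # High

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

x2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected frequency
        x2 += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 4 the upper-tail probability is exp(-x/2) * (1 + x/2).
p_value = math.exp(-x2 / 2) * (1 + x2 / 2)

print(round(x2, 1), df, p_value < 0.001)   # 126.2 4 True
```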
Example. 2
A food services manager for a baseball park wants to know if there is a relationship between gender (male or female) and the preferred condiment on a hot dog. The following table summarizes the results. Test the hypothesis at a significance level of 0.05 (95% confidence level).

                 Condiment
Gender    Ketchup   Mustard   Relish   Total
Male      15        23        10       48
Female    25        19        8        52
Total     40        42        18       100
Example. 2
1. The hypotheses are:
Ho :Gender and condiments are independent.
HA : Gender and condiments are not independent.
2. Cross tabulation

                 Condiment
Gender    Ketchup   Mustard   Relish   Total
Male      15        23        10       48
Female    25        19        8        52
Total     40        42        18       100
Example. 2
3. Now we need to calculate the expected values for each cell (we have 6 cells, labelled C1 to C6):
E = (row total x column total) / grand total

                 Condiment
Gender    Ketchup   Mustard   Relish   Total
Male      C1        C2        C3       48
Female    C4        C5        C6       52
Total     40        42        18       100

E (C1) = 48*40 / 100 = 19.2
E (C2) = 48*42 / 100 = 20.16
E (C3) = 48*18 / 100 = 8.64
E (C4) = 52*40 / 100 = 20.8
E (C5) = 52*42 / 100 = 21.84
E (C6) = 52*18 / 100 = 9.36

None of the expected counts in the table are less than 5. Therefore, we can proceed with the Chi-Square test.
Example. 2
4. The test statistic (X2 value) is:
X2 = ∑ (O − E)² / E

X2 = 17.64/19.2 + 8.0656/20.16 + 1.85/8.64 + 17.64/20.8 + 8.0656/21.84 + 1.85/9.36
= 0.92 + 0.4 + 0.214 + 0.848 + 0.369 + 0.197 = 2.948
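
The same arithmetic can be checked cell by cell in Python. This is a minimal sketch with assumed variable names; it works from the unrounded expected values, so individual terms may differ from the hand calculation in the last decimal place.

```python
# Observed counts for Example 2 (Ketchup, Mustard, Relish).
observed = [[15, 23, 10],   # Male
            [25, 19, 8]]    # Female

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

contributions = []
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected frequency of the cell
        contributions.append((o - e) ** 2 / e)  # that cell's share of X2

x2 = sum(contributions)
print([round(c, 3) for c in contributions])
print(round(x2, 3))   # 2.948
```

Listing the per-cell contributions also shows which cells drive the statistic; here the two Ketchup cells contribute the most.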
Example. 2
5. Define the critical value (tabulated X2):
Degrees of freedom = (r-1)(c-1) = (2-1)(3-1) = 1*2 = 2

So, for alpha = 0.05, tabulated X2 = 5.991

# To calculate the p value we can use SPSS, or an alternative method such as this website:
https://www.socscistatistics.com/pvalues/chidistribution.aspx
Example. 2
So, the p value for this example (X2 = 2.948, df = 2) is 0.229.
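
As an alternative to SPSS or an online calculator, the p value can be computed with the Python standard library. The sketch below is an assumption-labelled illustration: `chi2_sf` is a hypothetical helper name, and it uses the known starting values for df = 1 and df = 2 together with a standard recurrence for the upper incomplete gamma function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X2 >= x) for df degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-half), 1.0                 # exact value for df = 2
    else:
        q, a = math.erfc(math.sqrt(half)), 0.5      # exact value for df = 1
    # Recurrence: Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q

print(round(chi2_sf(2.948, 2), 3))   # 0.229
```

Evaluating the same function at a tabulated critical value, for example 5.991 with df = 2, returns approximately 0.05, which is a convenient sanity check.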
Example. 2
6. Conclusion
As the calculated X2 is 2.948, which is less than the tabulated X2 of 5.991, we fail to reject (accept) the null hypothesis (Ho), which states that gender and condiments are independent (no relationship between the two variables).
Alternatively, the p-value for this example is 0.229, which is much greater than the accepted alpha error = 0.05, so we fail to reject the null hypothesis (Ho): there is no statistically significant relationship between the two variables.
Yates’s Continuity Correction

❖ Because Chi square is a continuous distribution and categorical data are discrete, some statisticians use a version of chi square called Yates’s corrected chi square.
❖ X² = ∑ ( |O – E| - 0.5)² / E
❖ The corrected version is more conservative than the uncorrected version.
Yates’s Continuity Correction

❖ For the example above we calculate Yates’s corrected chi square as follows:
( |15-19.2| - 0.5)² /19.2 + (|23-20.16| - 0.5)² /20.16 + (|10-8.64| - 0.5)² /8.64 + (|25-20.8| - 0.5)² /20.8 + (|19-21.84| - 0.5)² /21.84 + (|8-9.36| - 0.5)² /9.36
= 0.713 + 0.272 + 0.086 + 0.658 + 0.251 + 0.079 ≈ 2.058
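
The corrected statistic can be recomputed in Python from the observed counts of the condiment example. This is a minimal sketch with assumed variable names that simply applies the formula above to every cell; note that the correction is most commonly applied to 2 × 2 tables, and that summing unrounded terms can differ slightly from a hand calculation with rounded terms.

```python
# Observed counts for the condiment example (Ketchup, Mustard, Relish).
observed = [[15, 23, 10],   # Male
            [25, 19, 8]]    # Female

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

x2_plain = 0.0
x2_yates = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n
        x2_plain += (o - e) ** 2 / e
        x2_yates += (abs(o - e) - 0.5) ** 2 / e   # continuity-corrected term

print(round(x2_plain, 3), round(x2_yates, 3))   # 2.948 2.058
```

The corrected value is smaller than the uncorrected one, which is what "more conservative" means in practice.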
Fisher’s Exact Test

❖ This test is used if the chi square test is not applicable because of small expected values (when the expected frequency of any cell in a 2 X 2 table is less than 5).
❖ For tables in which the use of the chi square test is appropriate, the two tests give very similar results.

P = [ (a+b)! (c+d)! (a+c)! (b+d)! ] / ( n! a! b! c! d! )

where a, b, c and d are the four cell counts of the 2 X 2 table and n = a+b+c+d.
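
The factorial formula gives the probability of one particular 2 X 2 table with fixed marginal totals. The Python sketch below evaluates it and then, as one common way of obtaining a two-sided p value, sums the probabilities of all tables with the same margins that are no more probable than the observed one. The function names and the example counts are hypothetical, chosen only for illustration.

```python
from math import factorial

def fisher_table_probability(a, b, c, d):
    """Probability of the table [[a, b], [c, d]] given its marginal totals."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d) *
                 factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b) *
                   factorial(c) * factorial(d))
    return numerator / denominator

def fisher_two_sided(a, b, c, d):
    """Sum the probabilities of all tables at most as probable as the observed one."""
    p_obs = fisher_table_probability(a, b, c, d)
    row1, col1, n = a + b, a + c, a + b + c + d
    total = 0.0
    # x runs over every feasible count for the top-left cell.
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        p = fisher_table_probability(x, row1 - x, col1 - x, n - row1 - col1 + x)
        if p <= p_obs * (1 + 1e-9):   # small tolerance for floating-point ties
            total += p
    return total

# Hypothetical 2 X 2 counts: a=3, b=1, c=1, d=3.
print(round(fisher_table_probability(3, 1, 1, 3), 4))   # 0.2286
print(round(fisher_two_sided(3, 1, 1, 3), 4))           # 0.4857
```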
Fisher’s Exact Test

❖ To calculate the Fisher’s Exact Test for 2×2 tables use:
https://www.graphpad.com/quickcalcs/contingency1/

❖ To calculate the Fisher’s Exact Test for 2×3 tables use:
http://vassarstats.net/fisher2x3.html
Thank You
Ali Lateef Jasim
MBChB.
