1 - CA51018 - Chi Square - Introduction - Goodness of Fit Test - 2

The document discusses the chi-square distribution and chi-square tests. It defines the chi-square distribution and lists some of its characteristics and assumptions. It then describes different types of chi-square tests including goodness-of-fit tests and tests of independence.

Uploaded by

tzmjsn
Copyright
© © All Rights Reserved

The χ² test (the test value is never negative)
The Chi-square Distribution
The chi-square distribution is a continuous probability distribution. It
is the distribution of the sum of the squares of k independent standard
normal random variables, where k is the number of degrees of freedom.
As the degrees of freedom increase, the chi-square distribution
approaches a normal distribution.
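
The definition can be checked with a small simulation. The sketch below is an illustration only: the sample size, the seed, and the helper name `simulate_chi_square` are assumptions, and the theoretical mean k and variance 2k are standard properties of the distribution that the slides do not state.

```python
import random
import statistics

def simulate_chi_square(k, n_draws, seed=0):
    """Return n_draws values, each the sum of squares of k standard normal variates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(n_draws)]

k = 4  # degrees of freedom (chosen for illustration)
draws = simulate_chi_square(k, n_draws=20000)

sample_mean = statistics.mean(draws)
sample_var = statistics.variance(draws)

# Theory: mean = k and variance = 2k for a chi-square distribution with k df.
print(f"df = {k}: sample mean {sample_mean:.2f} (theory {k}), "
      f"sample variance {sample_var:.2f} (theory {2 * k})")
print("all draws are non-negative:", min(draws) >= 0)
```

Repeating the run with a larger k gives a histogram that looks less skewed, which is the behavior described above.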

https://tophat.com/marketplace/science-&-math/statistics/full-course/statistics-for-social-science-stephen-hayward/211/34398/
Characteristics of a Chi-square Distribution:
1. The chi-square distribution is a family of curves based on
the degrees of freedom.
2. The chi-square distributions are positively skewed.
3. All chi-square values are greater than or equal to zero.
4. The total area under each chi-square distribution is equal to
one.

Statistical Analysis with Software Applications, Mc Graw Hill


General Assumptions of a Chi-square Test:
1. The sample was chosen using a random sampling method.
2. The variables being analyzed are categorical (nominal or
ordinal).

https://tophat.com/marketplace/science-&-math/statistics/full-course/statistics-for-social-science-stephen-hayward/211/34398/
The chi-square distribution can be used to
• find a confidence interval for a variance or standard deviation;
• test a hypothesis about a single variance or standard deviation;
• conduct tests concerning frequency distributions;
• test goodness of fit;
• test for independence of two categorical variables;
• test the homogeneity of proportions;
• test the normality of a variable.

Statistical Analysis with Software Applications, Mc Graw Hill


Chi-square Test:
A chi-square test (or chi-squared test), denoted by χ², is a statistical
hypothesis test
● used to investigate whether distributions of categorical variables (at
the nominal or ordinal levels of measurement) differ significantly from
one another.
● commonly used to compare observed data (actual values, O) with the
data we would expect to obtain (expected values, E) according to a
specific hypothesis.
● used to test information about the proportion or percentage of people
or things that fit into a category.
https://tophat.com/marketplace/science-&-math/statistics/full-course/statistics-for-social-science-stephen-hayward/211/34398/

Williams et al. (2000): the goodness-of-fit test is based on a comparison
of the sample of observed results with those expected under the
assumption that the null hypothesis (H0) is true.
H0: O = E
H1: O ≠ E
Types of Chi-square Tests:
Chi-square Goodness-of-Fit Test
The chi-square goodness-of-fit test is used if we would like to see
whether the distribution of data follows a specific pattern.
For example:
• You would like to see whether the values obtained from actual
observations of the monthly dividend on stocks differ considerably from
the expected values.
• You may want to investigate whether the fluctuation in interest
rates on Sundays is higher than on the other days of the week.
Chi-square Test of Independence
A chi-square test
● can be used to test the independence of two variables. It serves as an
alternative to the Pearson r correlation test when the variables are
counts.
● is used when we would like to see
o whether or not two random variables take their values
independently;
o whether the value of one relates to the other;
o whether one variable is associated with another.
● uses the chi-square distribution and the contingency table.
Elementary Statistics by Bluman
For example,
• Based on the distribution of data, you want to see whether the success
of an individual in a chosen career is independent of or related to
academic performance in college. Here, the two variables involved are
the success of an individual in the chosen career and academic
performance in college.
• You may want to see whether the life in years of laptops is
independent of brand. Here, the two variables involved are the life in
years of the laptop and the brand of the laptop.
• A study aims to determine whether job satisfaction is associated with
income. The two variables are job satisfaction and income.
Chi-square Test for Homogeneity of Proportions

A chi-square test
● can also be used to test the homogeneity of proportions.
● is used to determine whether the proportions for a variable are
equal when several samples are selected from different populations.
● also uses the chi-square distribution and the contingency table.
For example,
• You would like to see if the proportions of students who play online
games are equal across programs, say the proportions of accountancy
students, engineering students, and architecture students who play
online games.
• You may want to see if the proportions of employees who are into the
stock market are equal across professions (IT, Medicine, Accounting,
Engineering).
The two main types of chi-square tests to be discussed
here are:
• Goodness-of-fit tests, which focus on one categorical
variable.
• Tests of independence, which focus on the relationship
between two categorical variables. Thus, the
contingency table (or cross-tabulation table) will be used
to present the data values.

https://tophat.com/marketplace/science-&-math/statistics/full-course/statistics-for-social-science-stephen-hayward/211/34398/
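
As a rough illustration of how a contingency table feeds a test of independence, the sketch below computes expected counts and the χ² value for a 2 × 2 table. The counts are hypothetical, and the expected-count rule (row total × column total ÷ grand total) and df = (rows − 1)(columns − 1) are the standard textbook formulas, which these slides do not spell out.

```python
# Hypothetical 2 x 2 contingency table (rows: job satisfaction yes/no,
# columns: income high/low). These counts are invented for illustration.
table = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(table) - 1) * (len(table[0]) - 1)

print(f"expected counts: {expected}")
print(f"chi-square = {chi_sq:.2f} with df = {df}")
```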
To illustrate the use of the chi-square test:
If, according to Mendel's laws, you expect 10 of 20 offspring to be male
and the actual observed number was 8 males, then you might want to
know about the "goodness of fit" between the observed and expected
data.
Were the deviations (differences between observed and expected values)
the result of chance, or were they due to other factors?
How much deviation can occur before we conclude that something other
than chance is at work, causing the observed values to differ from the
expected values?
The chi-square test always tests what scientists call the null
hypothesis, which states that there is no significant difference between the
expected and observed results.
Elementary Statistics by Bluman
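
The offspring illustration can be worked through numerically. This is a minimal sketch under stated assumptions: the 12 females are inferred from 20 − 8, the significance level 0.05 is chosen here for illustration (the slide gives none), and the closed-form tail probability `erfc(sqrt(x / 2))` holds only for one degree of freedom.

```python
import math

observed = [8, 12]    # males, females (12 is inferred as 20 - 8)
expected = [10, 10]   # 10 of each sex expected among 20 offspring

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# With two categories, df = 2 - 1 = 1. For df = 1 only, the right-tail
# probability is P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_sq / 2))

CRITICAL_VALUE = 3.841  # chi-square table, alpha = 0.05 (assumed), df = 1
decision = "reject H0" if chi_sq > CRITICAL_VALUE else "do not reject H0"

print(f"chi-square = {chi_sq:.2f}, p-value = {p_value:.3f}, decision: {decision}")
```

Under these assumptions the deviation of 2 from the expected count is well within what chance alone would produce.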
Test for Goodness-of-Fit
Definition:
The chi-square goodness-of-fit test is used to test the claim
that an observed frequency distribution fits some given
expected frequency distribution.

Assumptions of the Chi-square Goodness-of-Fit Test:
1. The data are obtained from a random sample.
2. The expected frequency for each category must be 5 or more.

Statistical Analysis with Software Applications, Mc Graw Hill


Test Of Goodness-of-Fit
• If the observed frequencies are close to the
corresponding expected frequencies, the χ²-value will
be small, indicating a good fit.
• If the observed frequencies differ considerably from the
expected frequencies, the χ²-value will be large and the
fit is poor.
• A good fit leads to not rejecting H0, whereas a poor
fit leads to its rejection.
Test Statistic
To test the null hypothesis, the following formula will be used:
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
Where: O = the observed frequency
E = the expected frequency
df = k – 1, degrees of freedom, where k is the number of categories
n = total number of observations

To calculate the expected frequencies, there are two rules to follow:
1. If all the expected frequencies are equal, the expected frequency E
can be calculated by using E = n/k, where n is the total number of
observations and k is the number of categories.
2. If the expected frequencies are not all equal, then the expected
frequency E can be calculated by E = n · p, where n is the total number
of observations and p is the probability for that category (that is, the
hypothesized proportion from the null hypothesis).
Statistical Analysis with Software Applications, Mc Graw Hill
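
The formula and the two rules for expected frequencies can be sketched in a few lines. The function names and the sample counts below are assumptions made for illustration; they do not come from the textbook.

```python
def expected_equal(n, k):
    """Rule 1: equal expected frequencies, E = n / k for each of k categories."""
    return [n / k] * k

def expected_from_proportions(n, proportions):
    """Rule 2: unequal expected frequencies, E = n * p for each category."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("hypothesized proportions must sum to 1")
    return [n * p for p in proportions]

def chi_square_statistic(observed, expected):
    """Test value: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for three categories (n = 60).
observed = [18, 22, 20]
n, k = sum(observed), len(observed)

stat_equal = chi_square_statistic(observed, expected_equal(n, k))
stat_props = chi_square_statistic(
    observed, expected_from_proportions(n, [0.5, 0.3, 0.2])
)
print(f"df = {k - 1}")
print(f"equal expected frequencies:   chi-square = {stat_equal:.2f}")
print(f"unequal expected frequencies: chi-square = {stat_props:.2f}")
```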
Consider, for example, a quality control officer of a laptop manufacturing company who
would like to see if there is a difference in the life span of laptop batteries among three
categories. A sample of 45 student laptop owners is selected. The table below shows
the distribution of the life span of laptop batteries in years. If there were no difference,
you would expect 45/3 = 15 laptops in each category.

Category             4 years and below   More than 4 years and below 10 years   Above 10 years
Observed frequency   12                  19                                     14

The observed frequencies will almost always differ from the expected frequencies due to
sampling error; that is, the values differ from sample to sample. But the question is: are these
differences significant (which would mean there is a difference in the life span of the batteries
among the categories), or are they due to chance only? Thus, two opposing statements, the null
and alternative hypotheses, are necessary before computing the test value. Here, the null
hypothesis indicates that there is no difference or change among the categories.
H0: There is no difference in the life span of laptop batteries among the three categories.
H1: There is a difference in the life span of laptop batteries among the three categories.
Summary of Procedures in Conducting the Chi-Square Goodness-of-Fit Test:

Step 1: State the hypotheses and identify the claim.
Step 2: Find the critical value from the chi-square table. The test is always
right-tailed.
Step 3: Compute the test value using the formula
\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]
Step 4: Make the decision.
Reject the null hypothesis if the test value is greater than the critical
value.
Do not reject the null hypothesis if the test value is less than or equal to
the critical value.
Step 5: Summarize the results.
Statistical Analysis with Software Applications, Mc Graw Hill
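
Steps 2 to 4 can be sketched as a small helper that takes the critical value read from the chi-square table. The function name and the die-roll counts are hypothetical; the check that every expected frequency is at least 5 mirrors the assumption listed for the goodness-of-fit test.

```python
def goodness_of_fit_decision(observed, expected, critical_value):
    """Compute the test value (Step 3) and compare it with the critical value (Step 4)."""
    if any(e < 5 for e in expected):
        raise ValueError("each expected frequency should be 5 or more")
    test_value = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    decision = "reject H0" if test_value > critical_value else "do not reject H0"
    return test_value, decision

# Hypothetical data: a die rolled 60 times; H0 says all six faces are equally likely.
rolls = [8, 12, 9, 11, 6, 14]
expected_rolls = [sum(rolls) / len(rolls)] * len(rolls)  # E = n / k = 10

# Step 2: critical value from the chi-square table, alpha = 0.05, df = 6 - 1 = 5.
CRITICAL_VALUE = 11.070

test_value, decision = goodness_of_fit_decision(rolls, expected_rolls, CRITICAL_VALUE)
print(f"test value = {test_value:.2f}, critical value = {CRITICAL_VALUE}: {decision}")
```

Passing the critical value in as an argument keeps the sketch free of third-party libraries; a statistics package could supply it instead of the printed table.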
Example 1:
A quality control officer of a laptop manufacturing company would like to see if
the life spans of laptop batteries are equally distributed among three categories. A
sample of 45 student laptop owners is selected. The table below shows the
distribution of the life span of laptop batteries in years. At α = 0.05, can it be
concluded that the life spans of laptop batteries are equally distributed among the
three categories?

Category             4 years and below   More than 4 years and below 10 years   Above 10 years
Observed frequency   12                  19                                     14

Note that this problem involves only one categorical variable, the life span of laptop batteries classified into
three categories (4 years and below, more than 4 years and below 10 years, above 10 years), so we use the
goodness-of-fit test.
Solution:
Step 1: State the hypotheses and identify the claim.
H0: The life spans of laptop batteries are equally distributed over the three
categories. (claim)
(Which is the same as saying, “There is no difference in the
life span of laptop batteries among the three categories.”)
H1: The life spans of laptop batteries are NOT equally distributed.
(Which is the same as saying, “There is a difference in the life span
of laptop batteries among the three categories.”)
Step 2: Find the critical value. At α = 0.05 and df = 3 – 1 = 2, locate the
critical value from the chi-square table. Thus, the critical value is
5.991.
Step 3: Compute the test value.
To compute the test value, we first solve for the expected value E = n/k = 45/3 = 15.

Category             4 years and below   More than 4 years and below 10 years   Above 10 years
Observed frequency   12                  19                                     14
Expected frequency   15                  15                                     15

Then the test value χ² is
\[ \chi^2 = \frac{(12-15)^2}{15} + \frac{(19-15)^2}{15} + \frac{(14-15)^2}{15} = \frac{26}{15} \approx 1.73 \]
χ² = 1.73 (test value, also called the computed value or test statistic)

Step 4: Make the decision. Do not reject the null hypothesis, since the test value
1.73 is less than the critical value 5.991 (1.73 < 5.991).

Step 5: Summarize the results. There is not enough evidence to reject the claim
that the life spans of laptop batteries are equally distributed over the three categories.
The data are consistent with an equal distribution.
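
The arithmetic of Example 1 can be reproduced as follows. One detail in this sketch goes beyond the slides: for df = 2 specifically, the right-tail probability is exp(−x/2), so the critical value is −2 ln α and no table lookup is needed. That shortcut does not apply to other degrees of freedom.

```python
import math

observed = [12, 19, 14]              # battery life-span categories, Example 1
n, k = sum(observed), len(observed)  # n = 45, k = 3
expected = [n / k] * k               # E = 45 / 3 = 15 per category

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

alpha = 0.05
# Closed forms valid for df = 2 only: tail probability exp(-x / 2).
critical_value = -2 * math.log(alpha)
p_value = math.exp(-chi_sq / 2)

decision = "reject H0" if chi_sq > critical_value else "do not reject H0"
print(f"chi-square = {chi_sq:.2f}, critical value = {critical_value:.3f}")
print(f"p-value = {p_value:.2f}, decision: {decision}")
```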
To illustrate the goodness-of-fit test, let us analyze charts showing the graphs of the
observed values and the expected values of different data sets. From such charts, you
can see whether the observed values and the expected values are close together or far
apart.

[Charts (A), (B), and (C): observed values versus expected values]

From (A), the observed values and the expected values are close together, indicating that
the chi-square test value will be small. The decision will be “do not reject the null
hypothesis”; hence, there is “a good fit”.
From (B), the observed values and the expected values are far apart, so the chi-square test
value will be large. Then “the null hypothesis will be rejected”; hence, there is “not a
good fit”.
From (C), the observed values and the expected values are far apart, so the chi-square test
value will be large. Then “the null hypothesis will be rejected”; hence, there is “not a
good fit”.

Statistical Analysis with Software Applications, Mc Graw Hill


Example 2:
A financial analyst wants to determine whether investors have any
preference regarding the type of investment. A sample of 93 investors was
interviewed and provided the information shown in the table below. At the
0.10 level of significance, is there a difference in investment preferences
among the investors?

Types of Investment   Frequency
Stocks                35
Mutual Funds          18
Bonds                 30
Index Funds           10

Note that this problem involves only one categorical variable, the type of investment classified into four
categories (stocks, mutual funds, bonds, index funds), so we use the goodness-of-fit test.
Solution:
Step 1: State the hypotheses and identify the claim.
H0: Investors show no preferences.
(Which is the same as saying, “There is no difference in the
preferences on the type of investment among investors.”)
H1: Investors show preferences. (claim)
(Which is the same as saying, “There is a difference in the
preferences on the type of investment among investors.”)
Step 2: Find the critical value. At α = 0.10 and df = 4-1 = 3, locate the
critical value from the chi-square table. Thus, the critical value is
6.251.
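Step 3: Compute the test value.
To compute the test value, we first solve for the expected value E. Since H0 assumes no
preference, E = n/k = 93/4 = 23.25 for each type of investment.

Types of Investment   Observed Frequency   Expected Frequency
Stocks                35                   23.25
Mutual Funds          18                   23.25
Bonds                 30                   23.25
Index Funds           10                   23.25

Then the test value χ² is
\[ \chi^2 = \frac{(35-23.25)^2 + (18-23.25)^2 + (30-23.25)^2 + (10-23.25)^2}{23.25} = \frac{386.75}{23.25} \approx 16.63 \]
χ² = 16.63 (test value, also called the computed value or test statistic)

Step 4: Make the decision. Reject the null hypothesis, since the test value 16.63 is
greater than the critical value 6.251 (16.63 > 6.251).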

Step 5: Summarize the results. There is enough evidence to reject the null
hypothesis that the investors show no preferences. The investors in fact show
preferences.
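
A quick recomputation of Example 2 from the tabulated frequencies is sketched below. The four frequencies sum to 93, so the exact expected frequency is 93/4 = 23.25; the sketch also evaluates the statistic with E rounded up to 24 to show that the rounding changes the test value slightly but not the decision. The variable names are illustrative.

```python
observed = {"Stocks": 35, "Mutual Funds": 18, "Bonds": 30, "Index Funds": 10}
n = sum(observed.values())   # 93 investors
k = len(observed)            # 4 types of investment

def chi_square_equal(counts, e):
    """Test value when every category shares the same expected frequency e."""
    return sum((o - e) ** 2 / e for o in counts)

chi_sq_exact = chi_square_equal(observed.values(), n / k)  # E = 23.25
chi_sq_rounded = chi_square_equal(observed.values(), 24)   # E rounded up to 24

CRITICAL_VALUE = 6.251  # chi-square table, alpha = 0.10, df = 4 - 1 = 3

for label, value in [("E = 23.25", chi_sq_exact), ("E = 24", chi_sq_rounded)]:
    decision = "reject H0" if value > CRITICAL_VALUE else "do not reject H0"
    print(f"{label}: chi-square = {value:.2f}, decision: {decision}")
```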
Example 3:
An article shows statistics of orders made online for a particular
product with different online stores within a city. The data are based
on the last six months of the previous year and give the following
proportions: July 17%, August 11%, September 8%, October 14%,
November 27%, and December 23%. The CECT online store manager wants to
compare the orders made with his store with the data
reported in the article. The manager listed the number of orders
in his store for the same product stated in the article. The table
below shows the data collected by the manager for the last six
months of the previous year.

Months      Number of Orders made with CECT store
July        27
August      17
September   22
October     45
November    30
December    59

At the 0.01 level of significance, can we support the claim that the
proportions of orders with the CECT online store are the same as those
of the rest of the online stores within the city?

Note that this problem involves only one categorical variable, months covered in a year, so we use the
goodness-of-fit-test.
Solution:
Step 1: State the hypotheses and identify the claim.
H0: The orders made for a particular product in different online stores within
the city for the last six months of the year are distributed as follows: July 17%,
August 11%, September 8%, October 14%, November 27%, and December
23%. (claim)
(or “There is no difference between the orders made with the CECT online
store and those made with the rest of the online stores within the city.”)
H1: The distribution is not the same as stated in the null hypothesis.
(or “There is a difference between the orders made with the CECT online
store and those made with the rest of the online stores within the city.”)
Step 2: Find the critical value. At α = 0.01 and df = 6 – 1 = 5, locate the
critical value from the chi-square table. Thus, the critical value is 15.086.
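Step 3: Compute the test value.

Months      Number of Orders made with CECT store (O)   p     E = np
July        27                                          17%   (200)(0.17) = 34
August      17                                          11%   (200)(0.11) = 22
September   22                                          8%    (200)(0.08) = 16
October     45                                          14%   (200)(0.14) = 28
November    30                                          27%   (200)(0.27) = 54
December    59                                          23%   (200)(0.23) = 46

Then the test value χ² is
\[ \chi^2 = \frac{(27-34)^2}{34} + \frac{(17-22)^2}{22} + \frac{(22-16)^2}{16} + \frac{(45-28)^2}{28} + \frac{(30-54)^2}{54} + \frac{(59-46)^2}{46} \approx 29.49 \]
χ² = 29.49 (test value, also called the computed value or test statistic)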

Step 4: Make the decision. Reject the null hypothesis, since the test value 29.49 is greater
than the critical value 15.086 (29.49 > 15.086).

[Figure: right-tailed chi-square curve with rejection region α = 0.01 beyond the critical value 15.086]

Step 5: Summarize the results. There is enough evidence to reject the claim that
there is no difference between the orders made with the CECT online store and those made
with the rest of the online stores. The store manager would conclude that the distribution of
orders for the same product made with the CECT online store is different from that of orders
made with the other online stores within the city.
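
Example 3 uses unequal expected frequencies (E = np), which the sketch below reproduces using the observed counts from the Step 3 table (they sum to 200). It also prints each month's contribution to χ², a breakdown that is not in the original solution but shows which months drive the rejection.

```python
observed = {"July": 27, "August": 17, "September": 22,
            "October": 45, "November": 30, "December": 59}
proportions = {"July": 0.17, "August": 0.11, "September": 0.08,
               "October": 0.14, "November": 0.27, "December": 0.23}

n = sum(observed.values())  # 200 orders in total

# Expected frequency for each month: E = n * p.
expected = {month: n * p for month, p in proportions.items()}

# Contribution of each month to the test value: (O - E)^2 / E.
contributions = {
    month: (observed[month] - expected[month]) ** 2 / expected[month]
    for month in observed
}
chi_sq = sum(contributions.values())

CRITICAL_VALUE = 15.086  # chi-square table, alpha = 0.01, df = 6 - 1 = 5

for month, value in contributions.items():
    print(f"{month:<10} O = {observed[month]:>3}  E = {expected[month]:5.1f}  "
          f"contribution = {value:5.2f}")
decision = "reject H0" if chi_sq > CRITICAL_VALUE else "do not reject H0"
print(f"chi-square = {chi_sq:.2f}, decision: {decision}")
```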
Exercise 1:

A chef of a fine dining restaurant wants to determine whether customers
have any preference among five flavors of ice cream as toppings in their
special dessert. A sample of 100 people provided the following data. At the
0.10 level of significance, is there a difference in the flavor preferences
among the customers?
Exercise 2:
An operations manager would like to see whether the daily production of the
different parts (A, B, C, D) of a certain piece of electronic equipment on different
machines (a laser designing machine for part A, a laser engraving machine for
part B, a solid filling machine for part C, and a pressing machine for part D) is
in the ratio 2:2:5:1. A randomly selected day is inspected to see if
the production of these parts is in the ratio 2:2:5:1. The manager
recorded that a total of 900 pieces of these parts consisted of 200
pieces of part A, 165 pieces of part B, 468 pieces of part C, and 67 pieces
of part D. At the 0.01 level of significance, test the hypothesis that the
machines have produced these parts in the ratio 2:2:5:1.
http://www.z-table.com/chi-square-table.html
