Unit 5 - PS

This document discusses hypothesis testing, which allows drawing conclusions about a population based on a sample. It defines key concepts like the null and alternative hypotheses, type I and II errors, and p-values. The steps of hypothesis testing are to state the hypotheses, form an analysis plan, analyze sample data to find a test statistic, and interpret results to reject or fail to reject the null hypothesis based on the p-value or critical values. Specific statistical tests covered include z-tests, t-tests, F-tests, and chi-square tests as they apply to samples from normal distributions.

Uploaded by

A53Tejas Sanap

Hypothesis Testing

Probability and Statistics (ES22201IT)


Dr. Laxmi Bewoor
Laxmi,bewoor@viit.ac.in
Department of Computer Engineering

BRACT’S, Vishwakarma Institute of Information Technology, Pune-48

(An Autonomous Institute affiliated to Savitribai Phule Pune University)


(NBA and NAAC accredited, ISO 9001:2015 certified)
Dr.L.A.Bewoor, Dept. of Computer Engg.VIIT Pune
Objective/s of this session

• Understand confidence intervals and perform statistical inference using hypothesis testing

Learning Outcome/Course Outcome

1. Understand null and alternate hypothesis testing
2. Understand how to infer from hypothesis
3. Understand ANOVA



Content (for example)
Part –I
Statistical hypothesis,
Null and Alternate hypothesis,
test of hypothesis and significance,
Type I and Type II errors,

Level of Significance ,
Part-II
Tests involving the Normal distribution,
One-Tailed and Two-Tailed tests,
P value.
Special tests of significance for large samples and small samples (F, chi-square, z, t-test),
ANOVA




Introduction
Hypothesis testing is a form of inferential statistics that allows us to draw
conclusions about an entire population based on a representative sample.
• In the real world, it is nearly impossible to deduce statistics about the entire
population & this huge amount of data needs interpretation to draw meaningful
conclusions.
• Hence, we take some random samples from the population, derive some statistical
measures (e.g. mean, standard deviation, variance), and draw conclusions about
relationships from the data collected.
• Data can be interpreted by assuming a specific outcome and using statistical methods
to confirm or reject the assumption. This assumption is called a hypothesis and the
statistical procedure used for this purpose is called hypothesis testing.
• In statistics, a hypothesis is a statement about a population that we want to verify
based on information contained in the sample data.
• Hypothesis testing quantifies an observation or outcome of an experiment under a
given assumption. The result of the test enables us to interpret whether the
assumption holds true or false. In other words, it signifies if the hypothesis can be
confirmed or rejected for the observation made.
• The value computed from an observation or outcome of an experiment is known as a test statistic: a
statistical measure or standardized value that is calculated from sample data of the
underlying population.
• E.g. Children watch an average of 3 hours of TV per week.

Hypothesis Testing
• The assumption of hypothesis testing is called the null hypothesis.
• A null hypothesis is a type of assumption used in statistics
indicating that there is no significant difference between the
samples from the underlying population. It is also known as the
default hypothesis, represented by H0.
• H0: children watch an average of 3 hours of TV per week.
• In contrast, there is the alternative hypothesis, represented by H1.
For every null hypothesis, there is an alternative hypothesis that is
opposite to what the null hypothesis states.
• The decision to retain or reject the null hypothesis is made
by interpreting the result of the test.
• The result of the hypothesis testing can be interpreted using
p-values or critical values.
• The p-value is the probability of obtaining a test statistic at least as extreme as
the observed value, given that the null hypothesis is true. On the other hand, critical values are
cut-off values that define regions where the test statistic is
unlikely to lie if the null hypothesis is true.

Hypothesis Testing Steps
• State the hypotheses. This involves stating the null and alternative
hypotheses. The hypotheses are stated in such a way that they are
mutually exclusive. That is, if one is true, the other must be false.
• Formulate an analysis plan. The analysis plan describes how to use sample
data to evaluate the null hypothesis. The evaluation often focuses around
a single test statistic.
• Analyze sample data. Find the value of the test statistic (mean score,
proportion, t statistic, z-score, etc.) described in the analysis plan.
• Interpret results. Apply the decision rule described in the analysis plan. If
the value of the test statistic is unlikely, based on the null hypothesis,
reject the null hypothesis.
• When the p value is less than 5% (p < .05), we reject the null hypothesis.
We will refer to p < .05 as the criterion for deciding to reject the null
hypothesis, although note that when p = .05, the decision is also to reject
the null hypothesis. When the p value is greater than 5% (p > .05), we
retain the null hypothesis. The decision to reject or retain the null
hypothesis is called significance.

Hypothesis Testing Summary

Decision Errors
• Two types of errors can result from a hypothesis test.
• Type I error (false positive): A Type I error occurs when the researcher rejects a null hypothesis
when it is true. This type of error is analogous to finding an innocent person guilty.
• The probability of committing a Type I error is called the significance level. This probability is also
called alpha, and is often denoted by α.
• Type II error (false negative): A Type II error occurs when the researcher fails to reject a null
hypothesis that is false. The probability of committing a Type II error is called beta, and is often
denoted by β. The probability of not committing a Type II error (1 − β) is called the power of the test.
• The correct decision is to reject a false null hypothesis. There is always some probability that we
decide that the null hypothesis is false when it is indeed false. This probability is called the power of the
decision-making process. It is called power because it is the decision we aim for.
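
The relationship between α, β and power can be illustrated numerically. The following is a minimal sketch for a two-tailed z test, assuming a known population standard deviation; the function name `two_tailed_z_power` and the numbers used (µ0 = 100, σ = 15, n = 25, true mean 106) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability of rejecting H0: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the true mean, the z statistic is normal with this mean and variance 1.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    upper = 1 - std_normal.cdf(z_crit - shift)   # P(z_obt > +z_crit)
    lower = std_normal.cdf(-z_crit - shift)      # P(z_obt < -z_crit)
    return upper + lower


# Hypothetical values, not taken from the slides.
type_i_rate = two_tailed_z_power(100, 100, 15, 25)  # H0 is true: rejection rate equals alpha
power = two_tailed_z_power(100, 106, 15, 25)        # H0 is false: rejection rate is the power
beta = 1 - power                                    # probability of a Type II error

print(f"alpha = {type_i_rate:.3f}, power = {power:.3f}, beta = {beta:.3f}")
```

When the true mean equals the hypothesized mean, the rejection probability is exactly the significance level; as the true mean moves away from µ0, the power rises and β falls.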

Making Decision: Types of Error

1. The decision to retain the null hypothesis could be correct.
2. The decision to retain the null hypothesis could be incorrect.
3. The decision to reject the null hypothesis could be correct.
4. The decision to reject the null hypothesis could be incorrect.

Testing Research Hypothesis
• The test statistic we use depends largely on what
we know about the population. When we know
the mean and standard deviation in a single
population, we can use the one–independent
sample z test.
• We can state one of three alternative hypotheses:
A population mean is greater than (>), less than
(<), or not equal (≠) to the value stated in a null
hypothesis. The alternative hypothesis determines
in which tail of a sampling distribution to place the
level of significance.
Nondirectional, Two-tailed Hypothesis Tests (H1: ≠)
• Nondirectional tests, or two-tailed tests, are
hypothesis tests where the alternative hypothesis
is stated as not equal to (≠).
Problem statement
• A survey reported that the population mean score
on the quantitative portion of the GRE between
1994 and 1997 was 558 ± 139 (µ ±σ). Suppose we
select a sample of 100 participants (n = 100). We
record a sample mean equal to 585 (M = 585).
Compute the one–independent sample z test for
whether or not we will retain the null hypothesis
(µ = 558) at a .05 level of significance (α = .05).

• Step 1: State the hypotheses.
– The population mean is 558, and we are testing
whether the null hypothesis is (=) or is not (≠)
correct:
– H0 : µ = 558 Mean test scores are equal to 558 in
the population.
– H1 : µ ≠ 558 Mean test scores are not equal to
558 in the population.
• Step 2: Set the criteria for a decision.
– The level of significance is .05, which makes the
alpha level α = .05.
– In a nondirectional two-tailed test, we divide the
alpha value in half so that an equal proportion of
area is placed in the upper and lower tail.
The critical values (±1.96) for a nondirectional (two-tailed) test with a .05 level of
significance.
The regions beyond the critical values are called the rejection
regions. If the value of the test statistic falls in these regions, then the decision is to
reject the null hypothesis; otherwise, we retain the null hypothesis.
Step 3: Compute the test statistic.
The test statistic for a one–independent sample z test is called the z statistic. The z
statistic converts any sampling distribution into a standard normal distribution.
The solution of the formula gives the number of standard deviations, or z-scores, that a
sample mean falls above or below the population mean stated in the null hypothesis.
We can then compare the value of the z statistic, called the obtained value, to the
critical values.
z obt = (M − µ)/σM, where σM = σ/√n = 139/√100 = 13.9
z obt = (585 − 558)/13.9 = 1.94
Step 4: Make a decision.
To make a decision, we compare the obtained value to the critical values. We reject the
null hypothesis if the obtained value exceeds a critical value. Figure shows that the
obtained value (Zobt = 1.94) is less than the critical value; it does not fall in the rejection
region. The decision is to retain the null hypothesis.
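
The four steps of this problem can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the function name `one_sample_z_test` is an illustrative choice and the numbers are those of the GRE problem statement.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z test with known population standard deviation."""
    standard_error = sigma / sqrt(n)                # sigma_M
    z_obt = (sample_mean - mu0) / standard_error    # obtained value
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # upper critical value
    reject = abs(z_obt) > z_crit                    # falls in a rejection region?
    return z_obt, z_crit, reject


z_obt, z_crit, reject = one_sample_z_test(sample_mean=585, mu0=558, sigma=139, n=100)
print(f"z_obt = {z_obt:.2f}, critical values = +/-{z_crit:.2f}, reject H0: {reject}")
```

Because the obtained value lies just inside the critical values, the script reports that the null hypothesis is retained, in agreement with Step 4.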

Problem 2:

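⚫ Testing hypothesis for the mean μ:
⚫ The test statistic depends on the sample size (n) and on whether σ is known:

( n ≥ 30 ), population is normal or not normal:
σ is known: $Z = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
σ is not known: $Z = \dfrac{\bar{X} - \mu_0}{S / \sqrt{n}}$

( n < 30 ), population is normal:
σ is known: $Z = \dfrac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$
σ is not known: $T = \dfrac{\bar{X} - \mu_0}{S / \sqrt{n}}$

Text Book : Basic Concepts and Methodology for the Health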
The Use of P-Values in Decision Making
⚫ 6. Decision:
⚫ If we reject H0, we can conclude that HA is true.
⚫ If, however, we do not reject H0, we may conclude that H0 may be true.

An Alternative Decision Rule Using the p-value
Definition
⚫ The p-value is defined as the smallest value of α for which the null hypothesis can be rejected.
⚫ If the p-value is less than or equal to α, we reject the null hypothesis (p ≤ α).
⚫ If the p-value is greater than α, we do not reject the null hypothesis (p > α).
The Use of P-Values in Decision Making
• Obtained z value = 2.15
• Area to the left of z = 2.15: 0.9842
• P-value = 1 − 0.9842 = 0.0158

P-Values for z Tests
• The calculation of the P-value depends on
whether the test is upper-, lower-, or
two-tailed.
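
The three cases can be sketched as a single helper. This is an illustrative sketch only: the function name `z_test_p_value` and the tail labels "upper", "lower" and "two" are assumptions made for this example, and the value 2.15 is taken from the preceding slide.

```python
from statistics import NormalDist


def z_test_p_value(z_obt, tail):
    """P-value of a z statistic for an upper-, lower-, or two-tailed test."""
    cdf = NormalDist().cdf
    if tail == "upper":   # H1: mu > mu0, area to the right of z_obt
        return 1 - cdf(z_obt)
    if tail == "lower":   # H1: mu < mu0, area to the left of z_obt
        return cdf(z_obt)
    if tail == "two":     # H1: mu != mu0, area in both tails beyond |z_obt|
        return 2 * (1 - cdf(abs(z_obt)))
    raise ValueError("tail must be 'upper', 'lower' or 'two'")


p_upper = z_test_p_value(2.15, "upper")
print(f"upper-tailed p-value for z = 2.15: {p_upper:.4f}")
```

For a symmetric distribution such as the standard normal, the two-tailed p-value is simply twice the one-tailed area beyond |z|.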

Researchers are interested in the mean age of a certain population. A random sample
of 10 individuals drawn from the population of interest has a mean of 27. Assuming
that the population is approximately normally distributed with variance 20, can we
conclude that the mean is different from 30 years? (α = 0.05). If the p-value is 0.0340,
how can we use it in making a decision?
Sol: variable is age, n = 10, xbar = 27, σ² = 20, α = 0.05
Hypotheses:
H0: μ = 30    HA: μ ≠ 30
Test statistic (population normally distributed, σ known):
z obt = (xbar − μ0)/(σ/√n) = (27 − 30)/(√20/√10) = −2.12
Critical value: cumulative area (1 − α) + α/2 = 0.95 + 0.05/2 = 0.975, which gives z = ±1.96
z obt = −2.12 < −1.96 is in the rejection region, so reject the null hypothesis.
Area to the left of z = −2.12: 0.0170
P-value = 2 × 0.0170 = 0.034
P-value < α, so reject the null hypothesis.

• Among 157 African-American men, the mean systolic blood pressure was 146 mm
Hg with a standard deviation of 27. We wish to know if, on the basis of these data,
we may conclude that the mean systolic blood pressure for a population of
African-American men is greater than 140. Use α = 0.01.
• Sol:
• Data: Variable is systolic blood pressure,
• n = 157, xbar = 146, s = 27, α = 0.01.
• Assumption: population is not normal, σ² is unknown; since n ≥ 30, the Z statistic with s is used
• H0: μ = 140
• HA: μ > 140
• Z = (xbar − μ0)/(s/√n) = (146 − 140)/(27/√157) = 2.78
• We reject H0 if Z > Z1−α = Z0.99 = 2.33
• Decision: Since 2.78 > 2.33, we reject H0.
• Hence we may conclude that the mean systolic blood pressure for a population of
African-American men is greater than 140.

• The purpose of a study by Luglie was to investigate the oral status of a
group of patients diagnosed with thalassemia major (TM). One of the
outcome measures was the decayed, missing, filled teeth index (DMFT).
In a sample of 18 patients, the mean DMFT index value was 10.3 with a
standard deviation of 7.3. Is this sufficient evidence to allow us to
conclude that the mean DMFT index is greater than 9 in a population of
similar subjects?
• Let α = 0.1. Take p = 0.22 and make a decision.
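
One possible way to set up this problem is sketched below. Since n < 30 and σ is unknown, the sketch assumes the T statistic with n − 1 degrees of freedom applies (which requires an approximately normal population); the p-value is not computed but taken as given in the problem statement, and the function name `one_sample_t_statistic` is hypothetical.

```python
from math import sqrt


def one_sample_t_statistic(sample_mean, mu0, s, n):
    """T = (xbar - mu0) / (s / sqrt(n)) for a small sample with unknown sigma."""
    return (sample_mean - mu0) / (s / sqrt(n))


t_obt = one_sample_t_statistic(sample_mean=10.3, mu0=9, s=7.3, n=18)
df = 18 - 1          # degrees of freedom for the t distribution

alpha = 0.1
p_value = 0.22       # given in the problem statement, not computed here
reject = p_value <= alpha

print(f"t = {t_obt:.2f} with {df} df; reject H0: {reject}")
```

Under these assumptions the p-value exceeds α, so the decision rule p ≤ α would not reject H0: µ = 9.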

• A monthly income investment scheme exists that promises variable monthly returns. An investor will
invest in it only if they are assured of an average $180 monthly income. The investor has a sample of 300
months’ returns which has a mean of $190 and a standard deviation of $75. Should they invest in this
scheme? Use α = 0.05 (95% confidence level).
• Sol: The investor will invest in the scheme if they are assured of the desired $180 average
return.
• H0: Null Hypothesis: mean = 180
• H1: Alternative Hypothesis: mean > 180
Method 1: Using p-value
z obt = (xbar − µ)/σM = (190 − 180)/(75/sqrt(300)) = 2.309
Use the Z table to find the area to the left of z = 2.31, which is 0.9896
P-value = 1 − 0.9896 = 0.0104 (right-tailed)
P-value < 0.05, so reject H0
Method 2: Using critical region method
Critical value: 1.64
z obt > critical value, so H0 is rejected

F Test
• The F-distribution, also known as the Fisher-Snedecor distribution, is extensively used to test for
equality of variances from two normal populations.
• The F-distribution is named after R.A. Fisher.
• The F-distribution is generally a skewed distribution and is related to the chi-squared
distribution.
• The F distribution is the distribution of the ratio of two independent chi-square random variables,
each divided by its degrees of freedom: F = (X1/ϑ1)/(X2/ϑ2), where X1 has ϑ1 degrees of freedom
and X2 has ϑ2 degrees of freedom.

• The F-test compares more than one level of an independent variable across multiple groups
using the F distribution. It is generally used in ANOVA calculations, where the
F-test compares the means of more than two groups.
Assumptions of F distribution
• Both populations are normally distributed.
• Both populations are independent of each other.
• The larger sample variance always goes in the numerator to make the test right-tailed;
right-tailed tests are easier to calculate.

• What is an F Test?
• An F test is used to find out whether two independent estimates of population variance
differ significantly. In this case the F ratio is F = σ1²/σ2².
• It is also used to find out whether two samples were drawn from normal populations having the
same variance. In this case the F ratio is F = S1²/S2².
In both cases σ1² > σ2² and S1² > S2²; in other words, the larger estimate of variance
is always in the numerator and the smaller estimate of variance in the denominator.

Degrees of freedom (ϑ)
DF of larger variance (i.e. numerator) = n1 − 1
DF of smaller variance (i.e. denominator) = n2 − 1


F Statistic
The F statistic, also known as the F value, is used in ANOVA and regression analysis to identify whether the means of
two or more populations are significantly different. The F statistic is a ratio of two variances
(variance is a measure of dispersion; it tells how far the data are dispersed from the mean). The F
statistic takes into account the degrees of freedom used to estimate each variance.
• The F statistic is similar to the t statistic. A t-test states whether a single variable is statistically significant,
whereas an F test states whether a group of variables is jointly statistically significant.
• F statistics are based on the ratio of mean squares. The F statistic is the ratio of the mean square for
treatment (between groups) to the mean square for error (within groups).

Steps to conduct F test
• Choose the test: Note down the independent variables and dependent
variable, and assume the samples are normally distributed
• Set up the F statistic: place the highest variance in the numerator and the
lowest variance in the denominator, each with degrees of freedom (n − 1)
• Determine the statistical hypothesis
• State the level of significance
• Find the critical F value from the F table (use α/2 for a two-tailed test)
• Calculate the test statistic
• Finally, draw the statistical conclusion: reject the null hypothesis if the
test statistic falls in the critical region.
When would you use an F Test?
• Different types of F tests exist for different purposes.
• In statistics, an F-test of equality of variances is a test for the null hypothesis that
two normal populations have the same variance.
• An F-test is used to test the equality of several means; ANOVA uses the F-test for this
purpose.
• The F-test for a linear regression model tests whether any of the independent variables in
a multiple linear regression are significant. A significant result indicates a linear
relationship between the dependent variable and at least one of the independent
variables.
Example 1: A botanical research team wants to study the growth of plants with the usage of urea. The team conducted 8
tests with a variance of 600 during the initial state, and after 6 months 6 tests were conducted with a variance of 400. The
purpose of the experiment is to know whether there is any improvement in plant growth after 6 months at the 95% confidence
level.
Sol:
Degrees of freedom ϑ1 = 8 − 1 = 7 (highest variance in numerator)
ϑ2 = 6 − 1 = 5
Statistical hypothesis:
Null hypothesis H0: σ1² ≤ σ2²
Alternative hypothesis H1: σ1² > σ2²
Since the team wants to see the improvement, it is a one-tailed (right) test
Level of significance α = 0.05
Critical F from table: F(0.05, 7, 5) = 4.88
Reject the null hypothesis if the calculated F value is more than or equal to 4.88
Calculate the F value: F = S1²/S2² = 600/400 = 1.5
F calc < F critical, hence fail to reject the null hypothesis
Interpret the results: Compare F calc to F critical. In hypothesis testing, a critical value is a point
on the test distribution that is compared to the test statistic to determine whether to reject the
null hypothesis. Since the F calc value (1.5) is less than the F critical value (4.88), it is not in the rejection region.
Hence we fail to reject the null hypothesis at the 95% confidence level.
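
The calculation in Example 1 can be sketched in a few lines. The critical value is not computed here; it is the table value 4.88 quoted in the example, and the helper name `variance_ratio` is a hypothetical choice for this illustration.

```python
def variance_ratio(var_a, n_a, var_b, n_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


f_calc, df1, df2 = variance_ratio(var_a=600, n_a=8, var_b=400, n_b=6)

f_critical = 4.88               # table value for alpha = 0.05 with (7, 5) df, from the example
reject = f_calc >= f_critical   # right-tailed decision rule

print(f"F = {f_calc}, df = ({df1}, {df2}), reject H0: {reject}")
```

Placing the larger variance in the numerator guarantees F ≥ 1, so only the right tail of the F table is needed.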
• Example 2: A toy manufacturer is planning to place a bulk order for
batteries for the toys. The quality team collected 21 samples from
supplier A, and the variance is 36 hours, and also collected 16 samples
from supplier B with a variance of 28. At the 95% confidence level, determine
whether there is a difference in variance between the two suppliers.
• Sol:
Degrees of freedom ϑ1 = 21 − 1 = 20 (highest variance in numerator)
ϑ2 = 16 − 1 = 15
Statistical hypothesis:
Null hypothesis H0: σ1² = σ2²
Alternative hypothesis H1: σ1² ≠ σ2²
Since the team wants to see whether there is a difference between the two suppliers, it is a
two-tailed test
Level of significance α = 0.05
α/2 = 0.025
Critical value for the right tail: F(0.025, 20, 15) = 2.7559
Critical value for the left tail: Since it is a left tail, we must switch the degrees of
freedom, then take the reciprocal of the final answer
• Reciprocal of F(0.025, 15, 20) = 1/F(0.025, 15, 20) = 1/2.57 = 0.389
• Calculate the F value: F = S1²/S2² = 36/28 = 1.286
Interpret the results: Compare F calc to F critical. In hypothesis testing, a critical
value is a point on the test distribution that is compared to the test statistic to
determine whether to reject the null hypothesis. Since 0.389 < 1.286 < 2.7559, the F calc value does not
lie in the rejection region. Hence we fail to reject the null hypothesis at the
95% confidence level.
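
The two-tailed decision in Example 2, including the reciprocal rule for the left critical value, can be sketched as follows. Both F-table values are the ones quoted in the example rather than computed, and the function name `two_tailed_f_decision` is an assumption made for this illustration.

```python
def two_tailed_f_decision(f_calc, f_table_upper, f_table_swapped):
    """Two-tailed F test using table values.

    f_table_upper   : F(alpha/2, df1, df2), the right-tail critical value
    f_table_swapped : F(alpha/2, df2, df1), looked up with the df switched
    """
    f_upper = f_table_upper
    f_lower = 1 / f_table_swapped          # reciprocal gives the left-tail critical value
    reject = f_calc > f_upper or f_calc < f_lower
    return f_lower, f_upper, reject


f_calc = 36 / 28                           # larger variance in the numerator
f_lower, f_upper, reject = two_tailed_f_decision(
    f_calc,
    f_table_upper=2.7559,                  # F(0.025, 20, 15) from the example
    f_table_swapped=2.57,                  # F(0.025, 15, 20) from the example
)

print(f"{f_lower:.3f} < {f_calc:.3f} < {f_upper:.4f}; reject H0: {reject}")
```

The reciprocal rule works because swapping the numerator and denominator of an F ratio inverts it, which maps the upper tail of one F distribution onto the lower tail of the other.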

• Example 3: XYZ is an e-commerce site that wants to test if the
variance of delivery times in A city is less than in B city at a 5%
significance level during the Thanksgiving holidays. 28 delivery
times are observed from A city, and the variance is 38 hours.
25 samples are collected from B city, and the variance is 83
hours.
Sol: Degrees of freedom ϑ1 = 28 − 1 = 27
ϑ2 = 25 − 1 = 24
Statistical hypothesis:
Null hypothesis H0: σ1² ≥ σ2²
Alternative hypothesis H1: σ1² < σ2²
Since the team wants to see whether the variance for A city is less than for B
city, it is a left-tailed test
Level of significance α = 0.05
Critical value for left tail: F(0.95, 27, 24). Since it is a left tail and we cannot find
0.95 in the F-table, we must switch the degrees of freedom, then take the reciprocal of the
final answer
• Reciprocal of F(0.05, 24, 27) = 1/F(0.05, 24, 27) = 1/1.93 = 0.5181
• Calculate the F value: F = S1²/S2²
• For a left-tailed test keep the lowest variance as the numerator and the highest
variance as the denominator: F = 38/83 = 0.4578
• Interpret the results: Compare F calc to F critical. In hypothesis testing, a critical
value is a point on the test distribution that is compared to the test statistic to
determine whether to reject the null hypothesis. Since the F calc value (0.4578) is less than the F
critical value (0.5181), it is in the rejection region. Hence we reject the null
hypothesis at the 95% confidence level.

ANOVA- Analysis of Variance
• Analysis of variance (ANOVA) is a statistical technique that is used to check if the
means of two or more groups are significantly different from each other. ANOVA
checks the impact of one or more factors by comparing the means of different
samples.
• ANOVA is also called the Fisher analysis of variance, and it is the extension of the t-
and z-tests
• When we have only two samples, t-test and ANOVA give the same results.
However, using repeated t-tests would not be reliable in cases where there are more than 2
samples, because each additional pairwise test increases the chance of a Type I error.
• Example:
▪ Evaluation of academic performance of students from different schools
▪ Assessment of customer satisfaction between two or more products
▪ Determining difference in quality of service among different branches of a company
▪ Comparing the average weight of individuals living in different countries or regions.

Terminologies related to ANOVA
• Grand Mean(µ): The grand mean is the mean of sample means or the
mean of all observations combined, irrespective of the sample.
• Hypothesis: H0: µ1 = µ2 = ... = µk (all sample means are equal);
H1: at least one sample mean differs from the others.
• Between Group Variability
Consider two samples whose distributions overlap. As these samples
overlap, their individual means won’t differ by a great margin. Hence the
difference between their individual means and grand mean won’t be
significant enough.

• If the samples differ from each other by a big margin, their individual
means would also differ. The difference between the individual means and
grand mean would therefore also be significant.
Such variability between the distributions is called
between-group variability.
• Each sample is looked at and the difference between its mean and the grand mean is calculated to
measure the variability. If the distributions overlap or are close, the grand mean will be similar to
the individual means, whereas if the distributions are far apart, the difference between the means and the
grand mean will be large. Given the sample means (x1bar, ..., xkbar), the sample sizes (n1, ..., nk) and the
grand mean (Xbar), the sum of squares for between-group variability is calculated as:

SSB = n1(x1bar − Xbar)² + n2(x2bar − Xbar)² + ... + nk(xkbar − Xbar)²

For between-group variability, we find each squared deviation, weigh them by
their sample size, sum them up, and divide by the degrees of freedom (dfbetween),
which in the case of between-group variability is the number of sample means minus one (k − 1):

MSB = SSB / (k − 1)

• Within Group Variability

• As the spread (variability) of each sample is increased, their distributions overlap
and they become part of a big population.
• Consider another distribution of the same three samples but with less variability.
Although the sample means are the same as in the previous case,
the samples now seem to belong to different populations.

Such variation within a sample is called within-group variation. It refers to
variations caused by differences within individual groups.
• Measure within-group variability by looking at how much each value in each
sample differs from its respective sample mean. So take the squared deviation of
each value from its respective sample mean and add them up. This is the sum of
squares for within-group variability (SSwithin). After that divide the sum of squared
deviations by the degrees of freedom.

The degrees of freedom here are the sum of the sample sizes (N) minus the number of samples (k).
So now:

MSwithin = SSwithin / (N − k)

• F-Statistic
The statistic which measures whether the means of different samples are significantly different
is called the F-ratio. The lower the F-ratio, the more similar the sample means are. In that
case, we cannot reject the null hypothesis.

F = Between group variability / Within group variability

The F-statistic calculated here is compared with the F-critical value to draw a
conclusion. The F-critical value must be looked up for each alpha/significance level, and for a
given alpha it is a function of two things: the degrees of freedom of the numerator (k − 1)
and the degrees of freedom of the denominator (N − k).

Numerical 1:
• Three types of fertilizers are used on three groups of plants for 5 weeks. We want to check if
there is a difference in the mean growth of each group. Using the data given below, apply a
one-way ANOVA test at the 0.05 significance level.

Fertilizer 1   Fertilizer 2   Fertilizer 3
6              8              13
8              12             9
4              9              11
5              11             8
3              6              7
4              8              12

Solution:
H0: μ1 = μ2 = μ3
H1: The means are not all equal
x1bar = 5, x2bar = 9, x3bar = 10, Xbar (Grand mean) = 8
n1 = n2 = n3 = 6, k = 3
SSB = 6(5 − 8)² + 6(9 − 8)² + 6(10 − 8)²
= 84
dfbetween = k − 1 = 2
Σ(x1i − 5)² = 16, Σ(x2i − 9)² = 24, Σ(x3i − 10)² = 28
SSwithin = 16 + 24 + 28 = 68
N = 18
dfwithin = N − k = 18 − 3 = 15
MSB = SSB / dfbetween = 84 / 2 = 42
MSwithin = SSwithin / dfwithin = 68 / 15 = 4.533
ANOVA test statistic, f = MSB / MSwithin = 42 / 4.533 = 9.26
Using the F table at α = 0.05, the critical value is given as F(0.05, 2, 15) = 3.68
As f > F, the null hypothesis is rejected and it can be concluded that there is a difference
in the mean growth of the plants.
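
The sums of squares and the F ratio for the fertilizer data can be checked with a short script. This is a minimal sketch of a one-way ANOVA in plain Python; the function name `one_way_anova` and the dictionary keys are illustrative choices, and the critical value 3.68 is the table value quoted in the solution. Computed without intermediate rounding, the F ratio comes out at about 9.26.

```python
def one_way_anova(groups):
    """Return the sums of squares, mean squares and F ratio for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group: squared deviation of each group mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group: squared deviation of each value from its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": k - 1,
        "df_within": n_total - k,
        "f": ms_between / ms_within,
    }


fertilizer_1 = [6, 8, 4, 5, 3, 4]
fertilizer_2 = [8, 12, 9, 11, 6, 8]
fertilizer_3 = [13, 9, 11, 8, 7, 12]

result = one_way_anova([fertilizer_1, fertilizer_2, fertilizer_3])
f_critical = 3.68                      # F(0.05, 2, 15) from the F table, as quoted above
reject = result["f"] > f_critical

print(f"SSB = {result['ss_between']}, SSwithin = {result['ss_within']}, "
      f"F = {result['f']:.2f}, reject H0: {reject}")
```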

One-way ANOVA
A recent study claims that using music in a class enhances concentration and
consequently helps students absorb more information.

Calculate the means and the grand mean.

Alpha = 0.05

The F-value is greater than the F-critical value for the alpha level selected (0.05), so reject the
null hypothesis.
Practice Problem
• Determine if there is a difference in the mean daily calcium intake for people with normal
bone density, osteopenia, and osteoporosis at a 0.05 alpha level. The data was recorded as
follows:

Normal Density   Osteopenia   Osteoporosis
1200             1000         890
1000             1100         650
980              700          1100
900              800          900
750              500          400
800              700          350

Chi-square test
• Chi-square is a statistical method that measures the difference
between observed and expected data values.
• It is used to find out how closely actual data fit with expected data.
• The value of chi-square helps us answer whether the
difference between expected and observed data is statistically significant.
• A small chi-square value tells us that any differences between actual and
expected data are due to chance, and hence the difference is not statistically
significant.
• A large value tells us that the difference is statistically significant and there is
something causing the differences in data. From there, a statistician may explore
factors that may be responsible for the differences.
• If the chi-square value exceeds the critical value, we reject the null hypothesis.
• Chi-square is one way to show the relationship between two categorical variables.
• Chi-square test of association between two variables: this is also called Pearson’s
chi-square test of association. You use this test when you have categorical data for
two variables, and you want to see if there is an association between
them.

Formula for the Chi-Square Test

The Chi-Square is denoted by χ² and the formula is:

χ² = ∑ (O − E)² / E
Where,
O: observed frequency
E: expected frequency
∑: summation over all cells
χ²: chi-square value
E = (row total × column total) / sample size

Numerical1
• Is gender independent of education level? A random sample of 395 people were
surveyed and each person was asked to report the highest education level they
obtained. The data that resulted from the survey is summarized in the following
table:

          High School   Bachelors   Masters   Ph.D.   Total
Female    60            54          46        41      201
Male      40            44          53        57      194
Total     100           98          99        98      395
Question: Are gender and education level dependent at 5% level of significance? In
other words, given the data collected above, is there a relationship between the gender
of an individual and the level of education that they have obtained?

Sol:
Expected frequencies, E = (row total × column total) / sample size:

          High School   Bachelors   Masters   Ph.D.    Total
Female    50.886        49.868      50.377    49.868   201
Male      49.114        48.132      48.623    48.132   194
Total     100           98          99        98       395

χ² = ∑ (O − E)² / E = 8.006
Degrees of freedom = (2 − 1) × (4 − 1) = 3
The critical value of χ² with 3 degrees of freedom is 7.815. Since 8.006 > 7.815, we reject
the null hypothesis and conclude that the education level depends on gender at a 5% level
of significance.
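
The expected frequencies and the χ² value for this table can be reproduced with a short script. This is a minimal sketch in plain Python; the function name `chi_square_independence` is an illustrative choice, and the critical value 7.815 is the table value quoted above rather than a computed one.

```python
def chi_square_independence(observed):
    """Chi-square statistic, degrees of freedom and expected counts for an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    # E = (row total * column total) / sample size for every cell.
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


observed = [
    [60, 54, 46, 41],   # Female: High School, Bachelors, Masters, Ph.D.
    [40, 44, 53, 57],   # Male
]

chi2, df, expected = chi_square_independence(observed)
critical_value = 7.815              # chi-square table, alpha = 0.05, 3 df (quoted above)
reject = chi2 > critical_value

print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject}")
```

The same function applies to any contingency table of counts, since the expected frequencies are built only from the row and column totals.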

Numerical 2
• The following pattern of driving test outcomes was observed; test at the 0.05 level of significance.

        Male   Female
Pass    7      11
Fail    13     9

It seems females have a higher pass rate than males. However, to test
whether this observed difference is significant, we need to look at the outcome of a
Chi-Square test. As with the one-variable Chi-Square test, our aim is to see if the pattern of
observed frequencies is significantly different from the pattern of frequencies which we
would expect to see by chance - i.e., what we would expect to obtain if there was no
relationship between the two variables in question. With respect to the example above, "no
relationship" would mean that the pattern of driving test performance for males was no
different to that for females.

Step 1: Add the row and column totals.

        Male   Female   Total
Pass    7      11       18
Fail    13     9        22
Total   20     20       40

Step 2: Calculate expected numbers for each individual cell (i.e. the frequencies we
would expect to obtain if there were no association between the two variables). You do
this by multiplying row sum by column sum and dividing by total number.
Expected Frequency = (Row Total × Column Total) / Grand Total
Pass: 18 × 20 / 40 = 9 for each gender; Fail: 22 × 20 / 40 = 11 for each gender.

Step 3: Calculate chi-square.
χ² = (7 − 9)²/9 + (11 − 9)²/9 + (13 − 11)²/11 + (9 − 11)²/11
= 0.4444 + 0.4444 + 0.3636 + 0.3636 = 1.6162

Step 4: Calculate degrees of freedom (df): (Number of Rows − 1) × (Number of Columns − 1)
= (2 − 1) × (2 − 1) = 1 df (degrees of freedom)
Conclusion:

The chi-square value calculated above was 1.6162. This number is less than the critical value
of 3.84, so in this case the null hypothesis cannot be rejected. In other words, there
does not appear to be a significant association between the two variables: males and
females have a statistically similar pattern of pass/fail rates on their driving tests.
