ANOVA PPT Explained PDF
Analysis of Variance (ANOVA)
Avjinder Singh Kaler and Kristi Mai
Estimating a Population Variance/Standard Deviation
• 𝜒² (Chi-Square) Distribution
Comparing Variation in Two Samples
• F Distribution
One-Way Analysis of Variance (ANOVA)
Multiple Comparison Tests
• Tukey Test
Two-Way Analysis of Variance (ANOVA)
Main Ideas:
• The sample variance is the best point estimate of the population
variance and the sample standard deviation is typically used to
estimate the population standard deviation
• We can use a sample variance to construct a C.I. to estimate the true
value of a population variance and we can also use a sample
standard deviation to construct a C.I. to estimate the true value of a
population standard deviation
• We can also test claims about a population variance or standard
deviation
If a population has a normal distribution, then the following formula describes
the 𝜒² distribution:
$$\chi^2 = \frac{(n-1)\,s^2}{\sigma^2}$$
This is a Chi-Square-score and is a measure of relative standing
We NEED degrees of freedom for the 𝜒² distribution
• 𝑑𝑓 = 𝑛 − 1 (in this situation)
• Although this value for degrees of freedom is common, 𝑑𝑓 are NOT always 𝑛 − 1
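As a small illustration of the formula above, here is a minimal Python sketch that computes the 𝜒² score and its degrees of freedom for a claimed population variance. The sample values and the claimed variance of 25 are hypothetical and do not come from the slides; the function name is also just an illustrative choice.

```python
import statistics

def chi_square_statistic(sample, sigma_squared):
    """Return ((n - 1) * s^2 / sigma^2, df) for a claimed population variance."""
    n = len(sample)
    s_squared = statistics.variance(sample)  # sample variance (divides by n - 1)
    df = n - 1                               # degrees of freedom in this situation
    return df * s_squared / sigma_squared, df

# Hypothetical data, not taken from the slides
sample = [98, 102, 95, 110, 104, 91]
chi2, df = chi_square_statistic(sample, sigma_squared=25.0)
print(f"chi-square = {chi2:.3f}, df = {df}")
```

The score would then be compared against critical values of the 𝜒² distribution with the same degrees of freedom.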
Properties of the Chi-Square Distribution:
• The 𝜒² distribution is NOT symmetric like the t-distribution or the Normal distribution
Note: Because the distribution is NOT symmetric, the C.I. will NOT be 𝑠² ± 𝐸
• The values of 𝜒² can be ≥ 0 but cannot be negative
• The 𝜒² distribution is different for different degrees of freedom
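Because the distribution is not symmetric, the interval is built from two separate critical values rather than a single margin of error. As a reference sketch, writing the left-tail and right-tail critical values with 𝑛 − 1 degrees of freedom as \chi^2_L and \chi^2_R (this notation is assumed here, not taken from the slides), the standard confidence intervals are:

```latex
\frac{(n-1)\,s^2}{\chi^2_R} < \sigma^2 < \frac{(n-1)\,s^2}{\chi^2_L}
\qquad\text{and}\qquad
\sqrt{\frac{(n-1)\,s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)\,s^2}{\chi^2_L}}
```

The larger critical value sits in the denominator of the lower limit, which is why the interval is not centered on 𝑠².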
Conclusion Cautions:
• Rejecting the null hypothesis does NOT tell us that all of the means are different!
• In fact, rejecting the null hypothesis cannot tell us which mean(s) is(are)
different
Use the performance IQ scores listed in Table 12-1 and a significance level
of α = 0.05 to test the claim that the three samples come from populations
with means that are all equal.
Here are summary statistics from the collected data:
Requirement Check:
1. The three samples appear to come from populations that are
approximately normal.
2. The three samples have standard deviations that are not dramatically
different.
3. We can treat the samples as simple random samples.
4. The samples are independent of each other and the IQ scores are not
matched in any way.
5. The three samples are categorized according to a single factor: low
lead, medium lead, and high lead.
The hypotheses are:
H0 : μ₁ = μ₂ = μ₃
H1 : At least one of the means is different from the others.
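Assuming that the populations have the same variance σ² (as required
for the test), the F test statistic is the ratio of two estimates of σ²,
the variance between samples and the variance within samples:
$$F = \frac{\text{variance between samples}}{\text{variance within samples}}$$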
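The ratio can be computed directly from raw samples. The following is a minimal sketch in plain Python, using the usual mean-square definitions (between-groups sum of squares over 𝑘 − 1, within-groups sum of squares over 𝑁 − 𝑘). The function name and the example data are hypothetical and are not the IQ scores from Table 12-1.

```python
from statistics import mean

def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)                                   # number of groups
    n_total = sum(len(g) for g in groups)             # total sample size N
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean (between samples)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean (within samples)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical data, not from the slides
f_stat = one_way_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f"F = {f_stat:.2f}")
```

A large F suggests the sample means differ by more than the within-sample variation would explain; the P-value still has to come from the F distribution with 𝑘 − 1 and 𝑁 − 𝑘 degrees of freedom.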
• The Tukey Test compares all 𝑘 means, two at a time. (Note: This can be
done here because a Tukey Test does control the overall significance
level for pairwise comparisons)
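To make "all 𝑘 means, two at a time" concrete, the sketch below enumerates the 𝑘(𝑘 − 1)/2 pairs and the difference in sample means for each. It only lists the comparisons; it does not compute the Tukey critical value or adjusted P-values, which require the studentized range distribution. The group labels follow the lead categories used in these slides, but the scores are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical scores; the labels match the slides, the numbers do not.
groups = {
    "low": [85, 90, 107, 85, 100],
    "medium": [78, 97, 107, 80, 90],
    "high": [93, 100, 97, 79, 97],
}

# Every unordered pair of groups: k * (k - 1) / 2 comparisons in total
pairs = list(combinations(groups, 2))
mean_diffs = {(a, b): mean(groups[a]) - mean(groups[b]) for a, b in pairs}

for (a, b), diff in mean_diffs.items():
    print(f"{a} vs {b}: difference in means = {diff:.2f}")
```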
We introduce the method of two-way analysis of variance, which is
used with data partitioned into categories according to two factors.
Notice the Two-Way ANOVA is a two-step procedure that performs one to three
separate 𝐹 tests
The data in the table are categorized with two factors:
1. Gender: Male or Female
2. Blood Lead Level: Low, Medium, or High
The subcategories are called cells, and
the response variable is IQ score.
Let’s explore the IQ data in the table by calculating the mean for each
cell and constructing an interaction graph.
An interaction effect is suggested if the line segments are far from
being parallel.
No interaction effect is suggested if the line segments are
approximately parallel.
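The parallel-lines idea can be checked numerically: if the gap between the two rows' cell means is about the same in every column, the line segments in the interaction graph are roughly parallel. Here is a minimal sketch with hypothetical cell data (not the values from the slides); the helper names are illustrative only.

```python
from statistics import mean

# Hypothetical scores keyed by (row level, column level); not the slide data.
cells = {
    ("male", "low"): [10, 12],
    ("male", "medium"): [20, 22],
    ("male", "high"): [30, 32],
    ("female", "low"): [5, 7],
    ("female", "medium"): [15, 17],
    ("female", "high"): [25, 27],
}

def cell_means(cells):
    """Mean of the response variable in each cell."""
    return {key: mean(values) for key, values in cells.items()}

def row_gaps(means, row_a, row_b, columns):
    """Difference between two rows' cell means, column by column."""
    return [means[(row_a, col)] - means[(row_b, col)] for col in columns]

means = cell_means(cells)
gaps = row_gaps(means, "male", "female", ["low", "medium", "high"])
print(gaps)  # equal gaps correspond to parallel line segments
```

Gaps that change a lot from column to column would correspond to line segments that are far from parallel, which is what suggests an interaction effect.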
For the IQ scores, it appears there is an interaction effect:
• Females with high lead exposure appear to have lower IQ scores, while
males with high lead exposure appear to have high IQ scores.
Step 1: Interaction Effect – test the null hypothesis that there is no
interaction
The test statistic is F = 0.43 and the P-value is 0.655, so we fail to reject the
null hypothesis.
For the row factor, F = 0.07 and the P-value is 0.791. We fail to reject the null
hypothesis: there is no evidence that IQ scores are affected by the gender of
the subject.
H0 : No Column Factor (blood lead level) Effect
H1 : There is a Column Factor (blood lead level) Effect
For the column factor, F = 0.10 and the P-value is 0.906. We fail to reject the null
hypothesis: there is no evidence that IQ scores are affected by the level of lead
exposure.
Interpretation:
Based on the sample data, we conclude that IQ scores do not appear
to be affected by gender or blood lead level.
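The decision logic used in this example can be sketched as a small function. It assumes the usual textbook convention that the row and column tests are only run when the interaction test is not significant, which is one way to read "one to three separate F tests"; the function name and return format are hypothetical. The P-values passed in are the ones reported above.

```python
def two_way_decisions(p_interaction, p_row, p_column, alpha=0.05):
    """Return which null hypotheses are rejected (True = reject H0)."""
    decisions = {"interaction": p_interaction <= alpha}
    if decisions["interaction"]:
        # Assumed convention: stop here, do not test row/column effects
        return decisions
    decisions["row"] = p_row <= alpha
    decisions["column"] = p_column <= alpha
    return decisions

# P-values from the example: interaction, row (gender), column (lead level)
result = two_way_decisions(0.655, 0.791, 0.906)
print(result)
```

With these inputs none of the three null hypotheses is rejected, matching the interpretation above.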
Caution:
• Two-way analysis of variance is not one-way analysis of variance done twice.
• Be sure to test for an interaction between the two factors.
To better understand the method of two-way analysis of variance, let’s
repeat Example 1 after adding 30 points to each of the performance IQ
scores of the females only. That is, in Table 12-3, add 30 points to each
of the listed scores for females.
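Before looking at the results, it helps to see what adding a constant does to one group: the mean shifts by that constant while the spread is unchanged. The following is a minimal sketch with hypothetical scores (not the values in Table 12-3).

```python
from statistics import mean, variance

# Hypothetical female scores, not taken from Table 12-3
female = [85, 90, 107, 85, 100]
shifted = [score + 30 for score in female]  # add 30 points to each score

mean_shift = mean(shifted) - mean(female)
variance_change = variance(shifted) - variance(female)

print(f"mean shift = {mean_shift:.1f}")
print(f"variance change = {variance_change:.1f}")
```

Since only the female means move, and they all move by the same amount, one would expect the gender (row) comparison to change while the within-cell variation stays as it was.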
Step 1:
• Interaction Effect: The display shows a p-value of 0.655 for an interaction
effect. Because that p-value is not less than or equal to 0.05, we fail to
reject the null hypothesis of no interaction effect. There does not appear to
be an interaction effect.
Step 2:
• Row Effect: The display shows a p-value less than 0.0001 for the row variable of
gender, so we reject the null hypothesis of no effect from the factor of gender. In
this case, the gender of the subject does appear to have an effect on
performance IQ scores.
• Column Effect: The display shows a p-value of 0.906 for the column variable of
blood lead level, so we fail to reject the null hypothesis of no effect from the
factor of blood lead level. The blood lead level does not appear to have an
effect on performance IQ scores.
Interpretation:
By adding 30 points to each score of the female subjects, we do
conclude that there is an effect due to the gender of the subject, but
there is no apparent effect from an interaction or from the blood lead
level.
If our sample data consist of only one observation per cell, there is no
variation within individual cells and sample variances cannot be
calculated for individual cells.
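This limitation is easy to see in code: a sample variance divides by 𝑛 − 1, so it is undefined for a single value. Here is a minimal sketch using Python's standard `statistics` module; the helper name and the scores are hypothetical.

```python
from statistics import StatisticsError, variance

def cell_variance(cell):
    """Return the sample variance of a cell, or None if it cannot be computed."""
    try:
        return variance(cell)
    except StatisticsError:
        # Raised when the cell has fewer than two observations
        return None

print(cell_variance([97]))       # one observation per cell
print(cell_variance([97, 103]))  # two observations per cell
```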