21st April Lecture: Chi-Square Tests and ANOVA
Dr. Nikita Mishra
nikita.mishra@spit.ac.i
Today's Specials
● Hypothesis Testing
● Alpha and Critical Values
● Errors in Hypothesis Testing
● Independent and dependent t-tests
● Chi-Square Tests
○ Goodness of Fit Test
○ Test of Independence
● ANOVA: one-way ANOVA, two-way ANOVA
Hypothesis Testing
5. Calculate p-value
● Alternative hypothesis → HA or H1
○ Complement of H0
○ The claim that the researcher believes in
Identifying the Test Statistic
● Test statistic: the standardized difference between the sample estimate of the parameter and the hypothesized value.
● Purpose: used to calculate the p-value, which measures the strength of the evidence against the null hypothesis.
● A large standardized distance indicates a lower p-value, suggesting
stronger evidence against the null hypothesis.
● Example: H0: μ ≤ 100,000 vs. X̄ (sample estimate) = 110,000.
● Standardized distance: (110,000 − 100,000) / 5,000 = 2 standard errors (standard deviations of the sampling distribution)
● p-value = P(test statistic at least as extreme as the observed value | null hypothesis is true)
● A smaller p-value implies stronger evidence against the null
hypothesis.
● In this case, p-value = P(Z ≥ 2) = 0.02275. At 𝛂 = 0.05 this is strong enough evidence to reject the null hypothesis (at 𝛂 = 0.01 it would not be).
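The p-value in this example is simply the right-tail area beyond Z = 2 under the standard normal curve. A minimal sketch using only Python's standard library (`statistics.NormalDist`, Python 3.8+); the function name is illustrative:

```python
from statistics import NormalDist

def right_tail_p_value(z: float) -> float:
    """P(Z >= z) under the standard normal distribution (right-tailed test)."""
    return 1.0 - NormalDist().cdf(z)

# Standardized distance from the example: (110,000 - 100,000) / 5,000
z = (110_000 - 100_000) / 5_000
p = right_tail_p_value(z)
print(f"z = {z:.1f}, p-value = {p:.5f}")  # z = 2.0, p-value = 0.02275
```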
Decision Criteria: Significance Level
● Significance level (𝛂): the criterion used to decide whether to reject the null hypothesis. 𝛂 = P(Rejecting the null hypothesis | null hypothesis is true)
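● Critical value: the value of the test statistic that marks the boundary of the rejection region; when H0 is true, the probability that the statistic falls in the rejection region equals 𝛂.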
One-Tailed Test: Right-sided
H0 : 𝛍m ≤ 100,000
HA : 𝛍m > 100,000
(Figure: right-tailed hypothesis test, with the rejection region in the right tail.)
One-Tailed Test: Left-sided
H0 : 𝛍w ≥ 30
HA : 𝛍w < 30
(Figure: left-tailed hypothesis test, with the rejection region in the left tail.)
Two-Tailed Test
Claim: The average annual salaries of male and female MBA students differ at the time of graduation
H0 : 𝛍m = 𝛍f
HA : 𝛍m ≠ 𝛍f
(Figure: two-tailed hypothesis test, with rejection regions in both tails.)
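For a Z-test, the boundaries of the right-tailed, left-tailed and two-tailed rejection regions come from the standard normal quantile function. A small sketch, assuming a standard normal test statistic; 𝛂 = 0.05 is only an example value and `z_critical_values` is an illustrative name:

```python
from statistics import NormalDist

def z_critical_values(alpha: float) -> dict:
    """Critical Z values bounding the rejection region for each test form."""
    std_normal = NormalDist()
    return {
        "right-tailed": std_normal.inv_cdf(1 - alpha),    # reject H0 if Z > value
        "left-tailed": std_normal.inv_cdf(alpha),         # reject H0 if Z < value
        "two-tailed": std_normal.inv_cdf(1 - alpha / 2),  # reject H0 if |Z| > value
    }

for form, value in z_critical_values(0.05).items():
    print(f"{form:>12}: {value:+.3f}")
# right-tailed: +1.645, left-tailed: -1.645, two-tailed: +1.960
```

Note that the two-tailed test splits 𝛂 across both tails, which is why its critical value is larger in magnitude than the one-tailed value at the same 𝛂.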
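Errors: Type I & Type II
● Type I error: rejecting H0 when H0 is true; its probability is 𝛂.
● Type II error: failing to reject H0 when H0 is false; its probability is 𝛃.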
● Since the hypothesis test is carried out with just one sample, this test is also known as a one-sample Z-test.
H0: 𝛍 ≤ 4200
HA: 𝛍 > 4200
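Since we know the population standard deviation, we can use the Z-test. The corresponding Z-statistic is given by
Z = (X̄ − 𝛍) / (σ / √n)
where 𝛍 = 4200 under H0, σ is the population standard deviation and n is the sample size.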
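The Z-statistic and the right-tailed decision can be sketched as follows. The sample mean, σ and n used below are hypothetical, since the problem's data are not reproduced in these notes; only 𝛍 = 4200 comes from the hypotheses above, and `one_sample_z_test` is an illustrative name.

```python
import math
from statistics import NormalDist

def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Right-tailed one-sample Z-test of H0: mu <= mu0 vs HA: mu > mu0."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    p_value = 1.0 - NormalDist().cdf(z)   # right-tail area beyond z
    reject = p_value < alpha
    return z, p_value, reject

# Hypothetical inputs (not from the lecture): sample mean 4350, sigma 900, n = 36
z, p, reject = one_sample_z_test(x_bar=4350, mu0=4200, sigma=900, n=36)
print(f"Z = {z:.2f}, p-value = {p:.4f}, reject H0: {reject}")
# Z = 1.00, p-value = 0.1587, reject H0: False
```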
(b) The Ministry of Education believes that the average IQ is more than 82. If the actual IQ (population mean) of Indians is 86, calculate the Type II error and the power of the hypothesis test.
Power of Test = 1 − 𝛃 = P(Rejecting H0 | H0 is false)
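For a right-tailed Z-test, 𝛃 is the probability that the sample mean stays below the rejection cutoff when the true mean is the alternative value. A sketch under stated assumptions: 𝛍0 = 82 and the true mean 86 come from the IQ problem, but σ = 10 and n = 25 are hypothetical placeholders because the problem's σ and n are not given here.

```python
import math
from statistics import NormalDist

def type_ii_error_right_tailed(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta for a right-tailed Z-test of H0: mu <= mu0 when the true mean is mu_true."""
    se = sigma / math.sqrt(n)
    # Sample-mean cutoff: H0 is rejected when x_bar exceeds this value
    x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # Beta = P(fail to reject H0 | mu = mu_true) = P(x_bar <= x_crit | mu_true)
    return NormalDist(mu=mu_true, sigma=se).cdf(x_crit)

# mu0 = 82 and true mean 86 are from the problem; sigma = 10 and n = 25 are hypothetical
beta = type_ii_error_right_tailed(mu0=82, mu_true=86, sigma=10, n=25)
print(f"Type II error = {beta:.4f}, power = {1 - beta:.4f}")
```

The further the true mean lies from 𝛍0 (or the larger n is), the smaller 𝛃 becomes and the higher the power.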
Z-Test for Proportion
CLT of proportions:
the sampling distribution of the sample proportion p̂ for a large sample follows an approximate normal distribution with mean p (the population proportion) and standard deviation √(p(1 − p) / n)
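Using this normal approximation, the Z-statistic for a hypothesized proportion p0 is Z = (p̂ − p0) / √(p0(1 − p0) / n). A minimal sketch; the counts below are hypothetical, and reporting a two-tailed p-value is an assumption (a one-tailed test would use a single tail):

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p0):
    """Z statistic and two-tailed p-value for H0: p = p0 (large-sample normal approximation)."""
    p_hat = successes / n
    # Under H0 the standard deviation of p_hat is sqrt(p0 * (1 - p0) / n)
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 "successes" in a sample of 400, testing p0 = 0.10
z, p = proportion_z_test(successes=60, n=400, p0=0.10)
print(f"Z = {z:.3f}, two-tailed p-value = {p:.5f}")
```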
Example
(b) Calculate the 95% confidence interval for the proportion of gift cards that are not used.
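A large-sample confidence interval for a proportion is p̂ ± z(𝛂/2) · √(p̂(1 − p̂) / n). The sketch below uses hypothetical counts (120 unused cards out of 500), since the example's data are not reproduced in these notes:

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical data: 120 of 500 sampled gift cards were not used
low, high = proportion_confidence_interval(successes=120, n=500)
print(f"95% CI: ({low:.4f}, {high:.4f})")  # 95% CI: (0.2026, 0.2774)
```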
Solution
t-Test: Population mean under unknown population variance
When the population follows a normal distribution with unknown variance, the statistic t = (X̄ − 𝛍) / (S / √n), where S is the sample standard deviation, follows a t-distribution with (n − 1) degrees of freedom. Since the population variance is unknown, we estimate it from the sample itself.
Note that this is a one-tailed (right-tailed) test, and the critical t-value at 𝛂 = 0.05 under a right-tailed test is tcritical = 1.6848 [in Excel, TINV(2𝛂, df) returns the right-tailed critical value].
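The t-statistic and the right-tailed decision can be sketched with the standard library. Python's `statistics` module has no t-distribution quantile, so the critical value is passed in (from a t-table, or Excel TINV(2𝛂, df) as noted above). The sample and hypothesized mean below are hypothetical, and the critical value 2.015 used in the usage line is the right-tailed 𝛂 = 0.05 value for df = 5, matching that sample rather than the lecture's problem.

```python
import math
from statistics import mean, stdev

def one_sample_t_statistic(sample, mu0):
    """t = (x_bar - mu0) / (S / sqrt(n)); returns (t, degrees of freedom)."""
    n = len(sample)
    s = stdev(sample)  # sample SD (divides by n - 1), estimating the unknown sigma
    return (mean(sample) - mu0) / (s / math.sqrt(n)), n - 1

def right_tailed_decision(t_stat, t_critical):
    """Reject H0 (mu <= mu0) when t falls in the right-tail rejection region."""
    return t_stat > t_critical

# Hypothetical sample, testing H0: mu <= 12
t, df = one_sample_t_statistic([12, 15, 11, 14, 13, 16], mu0=12)
print(f"t = {t:.4f}, df = {df}, reject H0: {right_tailed_decision(t, 2.015)}")
# t = 1.9640, df = 5, reject H0: False
```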
In a paired t-test, the data related to the parameter are captured twice from the same subject: once before the intervention and once after it.
Alternatively, the paired t-test can be used for comparing two different
interventions such as two different promotion strategies applied on the same
subject (price discount versus bundling of items).
Assume that the mean difference in the estimated parameter value before and after the treatment is D, and the corresponding standard deviation of the difference is Sd. Let 𝝻d be the hypothesized mean difference. Then the statistic
t = (D − 𝝻d) / (Sd / √n)
follows a t-distribution with (n − 1) degrees of freedom.
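Starting from raw before/after measurements, the paired statistic is computed on the per-subject differences. A minimal sketch; the function name and the before/after values are hypothetical:

```python
import math
from statistics import mean, stdev

def paired_t_statistic(before, after, mu_d=0.0):
    """t = (D - mu_d) / (S_d / sqrt(n)) on the paired differences (after - before)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    d_mean, s_d = mean(diffs), stdev(diffs)
    return (d_mean - mu_d) / (s_d / math.sqrt(n)), n - 1

# Hypothetical measurements on five subjects, before and after an intervention
t, df = paired_t_statistic([200, 220, 210, 190, 205], [215, 230, 212, 205, 215])
print(f"t = {t:.4f}, df = {df}")
```

Working on the differences removes subject-to-subject variation, which is why a paired design is used when the same subject is measured twice.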
Example
A researcher believes that people drink more coffee on Mondays than other days of the week.
Based on a sample of 50 coffee drinkers, the mean difference was estimated as 14 ml and the corresponding standard deviation was 8.5 ml. Conduct an appropriate hypothesis test at 𝛂 = 0.1 to check the claim that people drink on average 10 ml more coffee on Mondays.
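Plugging the example's summary statistics into the paired t-statistic gives the value below. Treating the test as right-tailed (H0: 𝝻d ≤ 10 vs HA: 𝝻d > 10) is an assumption about how the claim is framed.

```python
import math

# Summary statistics from the coffee example
n = 50        # coffee drinkers sampled
d_mean = 14.0 # mean Monday-minus-other-day difference (ml)
s_d = 8.5     # standard deviation of the differences (ml)
mu_d = 10.0   # hypothesized mean difference under H0 (ml)

# Right-tailed test (assumed framing), df = n - 1 = 49
t_stat = (d_mean - mu_d) / (s_d / math.sqrt(n))
print(f"t = {t_stat:.4f}")  # t = 3.3276
```

A t-table (or Excel TINV(0.2, 49)) gives a right-tailed critical value of roughly 1.30 at 𝛂 = 0.1 with 49 degrees of freedom, so t ≈ 3.33 falls in the rejection region under this framing.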
1. The sample sizes (say n1 and n2) of two samples drawn from two
check whether the difference in monthly salary is at least 5000 more for students with marketing specialization compared to operations specialization. Assume that the salaries of students with marketing specialization and operations specialization follow normal distributions.
Two-sample t-Test (equal variance)
2. We assume that the standard deviations of the two populations are equal (but unknown).
Then the sampling distribution of the difference in estimated means (X̄1 − X̄2) follows a t-distribution with (n1 + n2 − 2) degrees of freedom with mean (𝝻1 − 𝝻2) and SD Sp √(1/n1 + 1/n2), where Sp² = [(n1 − 1)S1² + (n2 − 1)S2²] / (n1 + n2 − 2) is the pooled variance.
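The equal-variance (pooled) statistic is t = (X̄1 − X̄2 − (𝝻1 − 𝝻2)) / (Sp √(1/n1 + 1/n2)). A minimal sketch; the function name and the small samples in the usage line are hypothetical:

```python
import math
from statistics import mean, variance

def pooled_two_sample_t(x1, x2, delta0=0.0):
    """Equal-variance two-sample t statistic for H0: mu1 - mu2 = delta0; returns (t, df)."""
    n1, n2 = len(x1), len(x2)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    t = (mean(x1) - mean(x2) - delta0) / se
    return t, n1 + n2 - 2

# Hypothetical samples
t, df = pooled_two_sample_t([5, 6, 7], [3, 4, 5])
print(f"t = {t:.4f}, df = {df}")  # t = 2.4495, df = 4
```

For a claim such as "the difference is at least 1.2 cm", the hypothesized difference would be passed as `delta0=1.2` and the result compared with a right-tailed critical value.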
Two-sample t-Test (EV): Problem
A company claims that children (in the age group between 7 and 12) who drink its health drink will grow taller than children who do not drink that health drink. Data in Table 6.10 show the average increase in height over a one-year period for two groups: one drinking the health drink and the other not drinking the health drink. At 𝛂 = 0.05, test whether the increase in height for the children who drink the health drink is at least 1.2 cm.
Two-sample t-Test (Unequal variance)
1. The standard deviations of the populations are unknown (and not assumed to be equal) and must be estimated from the samples drawn from these two populations.
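Then the sampling distribution of the difference in estimated means (X̄1 − X̄2) approximately follows a t-distribution with mean (𝝻1 − 𝝻2) and SD √(S1²/n1 + S2²/n2), where the corresponding degrees of freedom df are given by the Welch-Satterthwaite approximation:
df = (S1²/n1 + S2²/n2)² / [ (S1²/n1)² / (n1 − 1) + (S2²/n2)² / (n2 − 1) ]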
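The unequal-variance (Welch) statistic uses each sample's own variance and the Welch-Satterthwaite degrees of freedom. A minimal sketch; the function name and data are hypothetical:

```python
import math
from statistics import mean, variance

def welch_t(x1, x2, delta0=0.0):
    """Unequal-variance (Welch) t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    v1, v2 = variance(x1) / n1, variance(x2) / n2  # S1^2/n1 and S2^2/n2
    t = (mean(x1) - mean(x2) - delta0) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical samples with very different spreads
t, df = welch_t([1, 2, 3, 4, 5], [10, 20])
print(f"t = {t:.4f}, df = {df:.2f}")
```

The df is usually not an integer and is never larger than n1 + n2 − 2; with equal sample sizes and equal sample variances it reduces to exactly n1 + n2 − 2.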
Two-sample t-Test (UV): Problem
Two-sample Z-Test for proportions
Problem
Effect Size: Cohen's d
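One common form of Cohen's d divides the difference in sample means by the pooled standard deviation (other variants exist, for example using a single group's SD). A minimal sketch of the pooled-SD form; the function name and data are hypothetical:

```python
import math
from statistics import mean, variance

def cohens_d(x1, x2):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(x1), len(x2)
    sp = math.sqrt(((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2))
    return (mean(x1) - mean(x2)) / sp

d = cohens_d([5, 6, 7], [3, 4, 5])
print(f"d = {d:.2f}")  # d = 2.00
```

Unlike a p-value, d does not grow with sample size; Cohen's conventional benchmarks are 0.2 (small), 0.5 (medium) and 0.8 (large).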
ANOVA-BASED PROBLEMS
Problem 1
Problem
Grand Mean (μ) = Total Sum / Total Count = (960 + 1163 + 1392) / 90 = 3515 / 90 ≈ 39.06
Step 3: Between Group Variation (SSB)
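The grand mean and the between-group variation only need each group's total and size: SSB = Σ ni (group mean i − grand mean)². The sketch below uses the group totals from Problem 1; equal group sizes of 30 (90 / 3) are an assumption, since the group sizes are not stated here.

```python
# Group totals from Problem 1; equal group sizes of 30 (90 / 3) are an assumption
totals = [960, 1163, 1392]
sizes = [30, 30, 30]

grand_mean = sum(totals) / sum(sizes)
group_means = [t / n for t, n in zip(totals, sizes)]

# SSB = sum over groups of n_i * (group mean - grand mean)^2
ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, group_means))
print(f"grand mean = {grand_mean:.4f}, SSB = {ssb:.2f}")
# grand mean = 39.0556, SSB = 3114.16
```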
Step 4: Within Group Variation (SSW)
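With the raw observations available, SSB, SSW and the F-statistic F = [SSB / (k − 1)] / [SSW / (N − k)] follow directly, where k is the number of groups and N the total count. A minimal sketch; the function name and the toy groups are hypothetical, not the lecture's sales data:

```python
from statistics import mean

def one_way_anova(groups):
    """Return (SSB, SSW, F) for a one-way ANOVA over a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Between-group variation: spread of group means around the grand mean
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group variation: spread of observations around their own group mean
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # F = MSB / MSW with (k - 1, n_total - k) degrees of freedom
    f_stat = (ssb / (k - 1)) / (ssw / (n_total - k))
    return ssb, ssw, f_stat

ssb, ssw, f_stat = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, F = {f_stat:.2f}")
# SSB = 54.00, SSW = 6.00, F = 27.00
```

The computed F is then compared with the critical F-value for (k − 1, N − k) degrees of freedom, for example from an F-table or Excel's F.INV.RT(𝛂, k − 1, N − k).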
Since the calculated F-statistic is much higher than the critical F-value, we reject the null hypothesis and conclude that the mean sales quantity is not the same under all discount levels (at least one discount level has a different mean).
This example is an experimental study in which the marketer studies the impact of discounts on sales.
Problem 2
Two-way ANOVA
Two-way ANOVA is used to understand the impact of two factors simultaneously on a response variable by answering the following questions: