Unit 5 - PS
Level of Significance,
Part-II
Tests involving the Normal distribution,
One-Tailed and Two-Tailed tests,
P value.
Special tests of significance for large samples and small samples (F, chi-square, z, t-test),
ANOVA
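Test statistics for a hypothesized mean $\mu_0$ ($\bar{X}$ = sample mean, $n$ = sample size):
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \quad \text{(population } \sigma \text{ known)}$$
$$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \quad \text{(}\sigma \text{ unknown, large sample; } S = \text{sample standard deviation)}$$
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \quad \text{(small sample, } n - 1 \text{ degrees of freedom)}$$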
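The three statistics above can be computed directly. The following is a minimal sketch in Python using only the standard library; the function names `z_statistic` and `t_statistic` and the sample numbers are hypothetical, chosen only for illustration.

```python
import math
import statistics


def z_statistic(sample_mean, mu0, sigma, n):
    """Z = (X-bar - mu0) / (sigma / sqrt(n)); sigma is the population SD when known."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


def t_statistic(sample, mu0):
    """T = (X-bar - mu0) / (S / sqrt(n)), with S the sample SD (divisor n - 1)."""
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divides by n - 1
    return (mean - mu0) / (s / math.sqrt(n))


# Hypothetical numbers: X-bar = 52, mu0 = 50, sigma = 10, n = 100
print(z_statistic(52, 50, 10, 100))   # 2.0
print(t_statistic([5, 7, 9], 5))      # sqrt(3), about 1.732
```

For a large sample with σ unknown, the second Z form is obtained by passing the sample standard deviation S as `sigma`. The resulting T must be compared with a t table at n − 1 degrees of freedom, not with the normal table.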
Text Book: Basic Concepts and Methodology for the Health
The Use of P-Values in Decision Making
⚫ 6. Decision:
⚫ If we reject H0, we can conclude that HA is true.
⚫ If, however, we do not reject H0, we conclude only that H0 may be true: the data do not provide sufficient evidence against it.
An Alternative Decision Rule Using the p-Value
⚫ The p-value is defined as the smallest value of α for which the null hypothesis can be rejected. Equivalently, reject H0 when p ≤ α, and do not reject H0 when p > α.
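As an illustration of the p-value rule for tests involving the normal distribution, the sketch below computes one-tailed and two-tailed p-values for an observed z using `math.erfc` from the Python standard library, then applies the rule "reject H0 when p ≤ α". The helper names `z_p_value` and `decide` are hypothetical.

```python
import math


def z_p_value(z, tail="two"):
    """p-value of an observed standard normal test statistic z.

    tail="two"   -> H_A: mu != mu0, p = P(|Z| >= |z|)
    tail="right" -> H_A: mu >  mu0, p = P(Z >= z)
    tail="left"  -> H_A: mu <  mu0, p = P(Z <= z)
    """
    if tail == "two":
        return math.erfc(abs(z) / math.sqrt(2.0))
    if tail == "right":
        return 0.5 * math.erfc(z / math.sqrt(2.0))
    if tail == "left":
        return 0.5 * math.erfc(-z / math.sqrt(2.0))
    raise ValueError("tail must be 'two', 'right' or 'left'")


def decide(p_value, alpha=0.05):
    # Reject H0 exactly when alpha is at least the smallest alpha
    # at which H0 can be rejected, i.e. when p <= alpha.
    return "reject H0" if p_value <= alpha else "do not reject H0"


print(z_p_value(1.96))                   # about 0.05 (two-tailed)
print(z_p_value(1.645, "right"))         # about 0.05 (one-tailed)
print(decide(z_p_value(2.5), alpha=0.05))
```

The same observed z can lead to different decisions under one-tailed and two-tailed alternatives, because the two-tailed p-value is twice the one-tailed one.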
• The F-test uses the F distribution to compare groups defined by the levels of an independent variable. It is most commonly used in ANOVA, where it tests whether the means of more than two groups differ.
Assumptions of F distribution
• Both populations are assumed to be normally distributed.
• The two populations are independent of each other.
• The larger sample variance always goes in the numerator, so that F ≥ 1 and the test is right-tailed; only the right-tail critical values of the F table are then needed.
• To test whether two samples are drawn from normal populations having the same variance, the F ratio is
$$F = \frac{S_1^2}{S_2^2}$$
Whether the population variances ($\sigma_1^2 > \sigma_2^2$) or their sample estimates ($S_1^2 > S_2^2$) are used, the larger estimate of variance always goes in the numerator and the smaller estimate of variance in the denominator.
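A minimal sketch of the variance-ratio F statistic, assuming two independent samples from normal populations. It uses `statistics.variance` (divisor n − 1) and always puts the larger variance in the numerator, returning the matching numerator and denominator degrees of freedom. The function name is hypothetical, and the critical-value lookup is left out to keep the sketch standard-library only.

```python
import statistics


def variance_ratio_f(sample1, sample2):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    v1 = statistics.variance(sample1)  # S1^2, divisor n1 - 1
    v2 = statistics.variance(sample2)  # S2^2, divisor n2 - 1
    if v1 >= v2:
        return v1 / v2, len(sample1) - 1, len(sample2) - 1
    # Swap so that the larger estimate of variance is the numerator.
    return v2 / v1, len(sample2) - 1, len(sample1) - 1


f, df1, df2 = variance_ratio_f([2, 3, 4], [1, 3, 5, 7])
print(f, df1, df2)  # F about 6.667 with (3, 2) degrees of freedom
```

If SciPy is available, `scipy.stats.f.ppf(1 - alpha, df1, df2)` gives the right-tail critical value to compare against.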
• If the samples differ from each other by a large margin, their individual means will also differ, and so the differences between the individual means and the grand mean will also be large.
Such variability between the distributions is called between-group variability.
Dr.L.A.Bewoor, Dept. of Computer
Engg.VIIT Pune
• To measure this variability, the difference between each sample's mean and the grand mean is calculated. If the distributions overlap or are close, the grand mean will be similar to the individual means, whereas if the distributions are far apart, the differences between the individual means and the grand mean will be large. Given the sample means and the grand mean, the sum of squares for between-group variability is calculated as:
$$SS_B = \sum_{j=1}^{k} n_j (\bar{X}_j - \bar{X})^2$$
where $k$ is the number of groups, $n_j$ and $\bar{X}_j$ are the size and mean of group $j$, and $\bar{X}$ is the grand mean.
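The between-group sum of squares can be sketched in a few lines of Python. This assumes each group is given as a list of numbers; `between_group_ss` is a hypothetical name.

```python
import statistics


def between_group_ss(groups):
    """SS_B = sum over groups of n_j * (group mean - grand mean)^2."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    return sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)


print(between_group_ss([[1, 2, 3], [4, 5, 6]]))  # 13.5: means 2 and 5 vs grand mean 3.5
print(between_group_ss([[1, 2, 3], [1, 2, 3]]))  # 0: identical groups, no between-group variability
```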
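• F-Statistic
The statistic that measures whether the means of different samples are significantly different is called the F-ratio:
$$F = \frac{MS_B}{MS_W} = \frac{SS_B/(k-1)}{SS_W/(N-k)}$$
where $SS_B$ and $SS_W$ are the between-group and within-group sums of squares, $k$ is the number of groups and $N$ is the total number of observations. The lower the F-ratio, the more similar the sample means; in that case, we cannot reject the null hypothesis.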
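The F-statistic calculated here is compared with the F-critical value to reach a conclusion. A separate F-critical value must be looked up for each alpha (significance) level, because for a given alpha the F-critical value is a function of two things: the numerator degrees of freedom (number of groups − 1) and the denominator degrees of freedom (total number of observations − number of groups).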
Group 1   Group 2   Group 3
6         8         13
8         12        9
4         9         11
5         11        8
3         6         7
4         8         12
Alpha = 0.05
The F-value is greater than the F-critical value for the selected alpha level (0.05), so we reject the null hypothesis.
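The worked example above can be reproduced with a short one-way ANOVA sketch, treating each of the three columns as one group. It computes SS_B, SS_W, the mean squares and F = MS_B / MS_W. The critical value 3.68 is an assumption taken from a standard F table for α = 0.05 with (2, 15) degrees of freedom and is hard-coded here rather than computed.

```python
import statistics

# The three columns of the worked example, one list per group.
groups = [
    [6, 8, 4, 5, 3, 4],
    [8, 12, 9, 11, 6, 8],
    [13, 9, 11, 8, 7, 12],
]


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = statistics.mean(all_values)
    # Between-group sum of squares: sum of n_j * (group mean - grand mean)^2
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared deviations from each group's own mean
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


f_value, df1, df2 = one_way_anova(groups)
F_CRITICAL = 3.68  # assumed F-table value for alpha = 0.05, df = (2, 15)
print(f"F = {f_value:.4f}, df = ({df1}, {df2})")  # F = 9.2647, df = (2, 15)
print("reject H0" if f_value > F_CRITICAL else "do not reject H0")  # reject H0
```

With SciPy installed, `scipy.stats.f_oneway(*groups)` should give the same F statistic together with its p-value.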
Practice Problem
• Determine if there is a difference in the mean daily calcium intake for people with normal
bone density, osteopenia, and osteoporosis at a 0.05 alpha level. The data was recorded as
follows:
Normal Density    Osteopenia    Osteoporosis
          High School   Bachelors   Masters   Ph.D.   Total
Female    60            54          46        41      201
Male      40            44          53        57      194
Total     100           98          99        98      395
Question: Are gender and education level dependent at the 5% level of significance? In other words, given the data collected above, is there a relationship between an individual's gender and the level of education they have obtained?
Expected frequencies (row total × column total / grand total):
          High School   Bachelors   Masters   Ph.D.    Total
Female    50.886        49.868      50.377    49.868   201
Male      49.114        48.132      48.623    48.132   194
Total     100           98          99        98       395
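The chi-square statistic, χ² = Σ (O − E)²/E summed over all eight cells, is 8.006. The critical value of χ² with (2 − 1)(4 − 1) = 3 degrees of freedom at α = 0.05 is 7.815. Since 8.006 > 7.815, we reject the null hypothesis and conclude that education level depends on gender at the 5% level of significance.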
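The expected-frequency table and the χ² statistic for the gender/education example can be checked with the sketch below (Python, standard library only). The function name `chi_square_independence` is hypothetical; the critical value 7.815 is the one quoted in the text.

```python
def chi_square_independence(table):
    """Return (chi_square, dof, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, dof, expected


observed = [
    [60, 54, 46, 41],  # Female: High School, Bachelors, Masters, Ph.D.
    [40, 44, 53, 57],  # Male
]
chi2, dof, expected = chi_square_independence(observed)
CRITICAL = 7.815  # chi-square table, alpha = 0.05, 3 degrees of freedom
print(f"chi-square = {chi2:.3f}, dof = {dof}")  # chi-square = 8.006, dof = 3
print("reject H0" if chi2 > CRITICAL else "do not reject H0")  # reject H0
```

SciPy's `scipy.stats.chi2_contingency` performs the same test and also returns the p-value.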
        Male   Female
Pass    7      11
Fail    13     9
It seems that females have a higher pass rate than males. However, to test whether this observed difference is significant, we need to look at the outcome of a Chi-Square test. As with the one-variable Chi-Square test, our aim is to see whether the pattern of observed frequencies is significantly different from the pattern of frequencies we would expect to see by chance, i.e., what we would expect to obtain if there were no relationship between the two variables in question. In the example above, "no relationship" would mean that the pattern of driving test performance for males was no different from that for females.
Column totals: Male = 20, Female = 20 (row totals: Pass = 18, Fail = 22; grand total = 40).
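Step 2: Calculate the expected number for each individual cell (i.e., the frequencies we would expect to obtain if there were no association between the two variables). You do this by multiplying the row total by the column total and dividing by the grand total:
Expected Frequency = (Row Total × Column Total) / Grand Total
For the table above, the expected count is 18 × 20 / 40 = 9 for each Pass cell and 22 × 20 / 40 = 11 for each Fail cell.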
χ² = Σ (O − E)²/E = (7 − 9)²/9 + (11 − 9)²/9 + (13 − 11)²/11 + (9 − 11)²/11 ≈ 0.4444 + 0.4444 + 0.3636 + 0.3636 ≈ 1.6162
The chi-square value calculated above is 1.6162. This is less than the critical value of 3.84 (α = 0.05, 1 degree of freedom), so in this case the null hypothesis cannot be rejected. In other words, there does not appear to be a significant association between the two variables: males and females have a statistically similar pattern of pass/fail rates on their driving tests.
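The steps for the driving-test example (totals, expected counts, then the sum of (O − E)²/E) can be sketched as follows, using the Pass/Fail table above. The critical value 3.84 (α = 0.05, 1 degree of freedom) is the one quoted in the text.

```python
# Rows: Pass, Fail; columns: Male, Female (from the table above).
observed = [[7, 11], [13, 9]]

row_totals = [sum(r) for r in observed]        # [18, 22]
col_totals = [sum(c) for c in zip(*observed)]  # [20, 20]
n = sum(row_totals)                            # 40

contributions = []
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected count for this cell
        contributions.append((o - e) ** 2 / e)  # (O - E)^2 / E for this cell

chi2 = sum(contributions)
CRITICAL = 3.84  # chi-square table, alpha = 0.05, 1 degree of freedom
print([round(c, 4) for c in contributions])  # [0.4444, 0.4444, 0.3636, 0.3636]
print(f"chi-square = {chi2:.4f}")             # chi-square = 1.6162
print("reject H0" if chi2 > CRITICAL else "do not reject H0")
```

If `scipy.stats.chi2_contingency` is used to cross-check a 2 × 2 table like this one, note that it applies Yates' continuity correction by default; pass `correction=False` to obtain the uncorrected statistic computed here.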