Chapter Three
Inferential Statistics,
Estimation
Ahmed M (Assistant Professor of Epidemiology and
Biostatistics, PhD Candidate in Epidemiology)
2/20/2025 1
Learning objectives
At the end of this chapter the student will be able to:
Understand the concepts of sample statistics and
population parameters
Understand the principles of sampling distributions of
means and proportions and calculate their standard
errors
Understand the principles of estimation and
differentiate between point and interval estimations
Compute appropriate confidence intervals for
population means and proportions and interpret the
findings
Parameter and Statistic
Parameter
- Statistical constants computed for the
population, such as the mean (μ), variance
(σ²), correlation coefficient (ρ), and proportion
(P), are called parameters.
Statistic
- Statistical constants computed from samples,
corresponding to the parameters, namely the mean
(x̄), variance (S²), sample correlation
coefficient (r), proportion, etc., are called
statistics.
- Statistics are functions of the sample
observations.
In general, population parameters are unknown
and sample statistics are used as their estimates.
Sampling Distribution
The sample distribution is the distribution
resulting from the collection of actual data.
The distribution of all possible values that can
be assumed by some statistic, computed from
samples of the same size randomly drawn from
the same population is called the sampling
distribution of that statistic.
Properties of the Sampling Distribution
1. The mean of the sampling distribution of means is
the same as the population mean, μ.
2. The SD of the sampling distribution of means is
σ/√n (the standard error).
3. The shape of the sampling distribution of means is
approximately a normal curve, regardless of the
shape of the population distribution, provided
n is large enough (Central Limit Theorem).
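The three properties can be checked empirically with a small simulation. The sketch below is illustrative only: the exponential population, the sample size of 50, and the 5,000 repetitions are assumptions chosen for the demonstration and do not come from the slides.

```python
import math
import random
import statistics

def simulate_sample_means(n, repetitions, rate=1.0, seed=0):
    """Return the means of `repetitions` random samples of size n drawn
    from an exponential population (mean = SD = 1/rate)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(n)])
        for _ in range(repetitions)
    ]

# Assumed demonstration values: exponential population with mu = sigma = 1.
mu, sigma, n = 1.0, 1.0, 50
means = simulate_sample_means(n=n, repetitions=5000)
standard_error = sigma / math.sqrt(n)

# Share of sample means within 1.96 standard errors of mu; this should be
# close to 0.95 if the sampling distribution is approximately normal.
within = sum(abs(m - mu) <= 1.96 * standard_error for m in means) / len(means)

print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("sigma / sqrt(n):     ", round(standard_error, 3))
print("share within 1.96 SE:", round(within, 3))
```

Even though the assumed population is strongly skewed, the mean of the sample means should land near μ and their SD near σ/√n.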
The Central Limit Theorem
Regardless of the shape of the frequency distribution of a
characteristic in the parent population, the means of sufficiently
large random samples (independent observations) from the population
will approximately follow a normal distribution, with the mean of the
sample means approaching the population mean μ and a standard
deviation of σ/√n.
Sampling distributions of:
♣ Single mean
♣ Difference of means
♣ Single proportion
♣ Difference of proportions
Estimation of common population parameters using
confidence Intervals
Statistical inference: probability distributions
The standard normal distribution (Z-distribution) is used
in both point and interval estimation. It is also
used in both one- and two-tailed tests.
Statistical inference Cont…
As the degrees of freedom decrease, the t-distribution
becomes increasingly spread out compared with the
normal.
The sample standard deviation (S) is used as an estimate
of σ (the standard deviation of the population, which is
unknown) and is a logical substitute.
Statistical inference Cont.…
• The process of drawing conclusions about an entire
population based on the data in a sample is known as
statistical inference.
• Methods of inference usually fall into one of two broad
categories: estimation or hypothesis testing.
Estimation, Estimator & Estimate
♣ Estimation is the computation of a statistic from sample
data, often yielding a value that is an approximation
(guess) of its target, an unknown true population
parameter value.
1. Point Estimate
• A single numerical value used to estimate the
corresponding population parameter.
2. Interval Estimate
• A range (an interval) of values used to estimate the
true value of a population parameter, with a
specified degree of confidence.
Confidence Interval (CI) estimate of a parameter
Confidence Intervals
• Give a plausible range of values of the estimate likely
to include the “true” (population) value with a given
confidence level.
• An interval estimate provides more information about
a population characteristic than does a point estimate
• Such interval estimates are called confidence
intervals.
• CIs also give information about the precision of an
estimate.
• How much uncertainty is associated with a point
estimate of a population parameter?
• A CI in general:
– Takes into consideration variation in sample
statistics from sample to sample
– Based on observation from 1 sample
– Gives information about closeness to
unknown population parameters
– Stated in terms of level of confidence
• Never 100% sure
General Formula:
The general formula for all CIs is:
point estimate ± (confidence coefficient) × (SE of the point estimate)
where the point estimate is the value of the statistic in my sample
(e.g., mean, odds ratio, etc.)
Estimation for Single Population
1. CI for a Single Population Mean
(normally distributed)
A. Known variance (large sample size)
• There are 3 elements to a CI:
1. Point estimate
2. SE of the point estimate
3. Confidence coefficient
• Consider the task of computing a CI estimate of μ
for a population distribution that is normal with σ
known.
• Data are available from a random sample of size n.
Assumptions
Population standard deviation (σ) is known
Population is normally distributed
If the population is not normal, use a large sample
• A 100(1 − α)% CI for μ is: x̄ ± z(α/2) · σ/√n
Reliability Coefficient (z, t)
The standardized z or t value corresponding to
the given level of confidence.
Z = 1.64 if your confidence level is 90%.
Z = 1.96 if your confidence level is 95%.
Z = 2.58 if your confidence level is 99%.
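These reliability coefficients can be recovered from the standard normal distribution rather than read from a table. The following is a minimal sketch using Python's standard library; the helper name `z_coefficient` is hypothetical and introduced only for illustration.

```python
from statistics import NormalDist

def z_coefficient(confidence):
    """Two-sided reliability coefficient z for a given confidence level."""
    alpha = 1 - confidence
    # The upper alpha/2 tail is cut off, so look up the 1 - alpha/2 quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_coefficient(level):.3f}")
```

To three decimals the values are 1.645, 1.960 and 2.576, which the list above rounds to two decimals.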
Factors Affecting Margin of Error
a. 1.52 ± 1.96 √(2.25/20) = 1.52 ± 1.96(0.33)
= 1.52 ± 0.65 = (0.87, 2.17)
• We are 95% confident that the true mean waiting time is between 0.87 and 2.17
hrs.
• Although the true mean may or may not be in this interval, 95% of the intervals
formed in this manner will contain the true mean.
b. 1.52 ± 1.96 √(2.25/32) = 1.52 ± 1.96(0.27)
= 1.52 ± 0.53 = (0.99, 2.05)
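The two waiting-time intervals can be reproduced in a few lines. This is a sketch under one assumption: the value 2.25 is read as the population variance, which is the reading consistent with the standard errors of 0.33 and 0.27 shown. The function name `z_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(mean, variance, n, confidence=0.95):
    """CI for a population mean when the population variance is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(variance / n)  # z times the standard error
    return mean - margin, mean + margin

for n in (20, 32):
    low, high = z_interval(mean=1.52, variance=2.25, n=n)
    print(f"n = {n}: ({low:.2f}, {high:.2f})")
```

Because the code does not round the standard error before multiplying, its endpoints can differ from the hand-calculated ones by about 0.01. The larger sample gives the narrower interval.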
• When σ is unknown, the SD of the population can be replaced
by the SD of the sample (s) if the sample size is large
enough (n > 30). With a large sample size, we assume a
normal distribution.
• Example: In a sample of 35 patients, the patients were on average
17.2 minutes late for appointments, with an SD of 8
minutes. What is the 90% CI for µ?
17.2 ± 1.64 × 8/√35 = 17.2 ± 2.2
Ans: (15.0, 19.4).
• Since the sample size is fairly large (> 30) and the population SD
is unknown, we assume the sample mean is normally distributed
based on the CLT and use the sample SD in place of the
population SD, σ.
Student’s t Distribution
• The t is a family of distributions
• Bell Shaped
• Symmetric about zero (the mean)
• Flatter than the Normal (0,1). This means
– The variability of a t is greater than that of a Z that is
normal(0,1)
– Thus, there is more area under the tails and less at center
– Because variability is greater, resulting confidence
intervals will be wider.
Degrees of Freedom (df)
df = Number of observations that are free to vary after
sample mean has been calculated
df = n-1
Student’s t Table
t distribution values
• With comparison to the Z value
Example
• Standard error =
• t-value at 90% CL at 19 df =1.729
Exercise
• Compute a 95% CI for the mean birth weight
based on n = 10, sample mean = 116.9 and s
=21.70.
• From the t table, t(9, 0.975) = 2.262
• Ans: (101.4, 132.4)
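The exercise can be checked with a short calculation. This sketch takes the critical value from the t table as an input, because the Python standard library has no t-distribution quantile function; the function name `t_interval` is hypothetical.

```python
import math

def t_interval(mean, s, n, t_critical):
    """CI for a mean when sigma is unknown; t_critical is the table value
    for n - 1 degrees of freedom at the chosen confidence level."""
    margin = t_critical * s / math.sqrt(n)
    return mean - margin, mean + margin

low, high = t_interval(mean=116.9, s=21.70, n=10, t_critical=2.262)
print(f"95% CI: ({low:.1f}, {high:.1f})")
```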
2. CIs for single population proportion, p
Hence, a 100(1 − α)% CI for p is: p̂ ± z(α/2) · √(p̂(1 − p̂)/n)
Example 1
• A random sample of 100 people shows that 25
are left-handed. Form a 95% CI for the true
proportion of left-handers.
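One way to work this example is the usual large-sample (normal approximation) interval, p̂ ± z·√(p̂(1 − p̂)/n). The sketch below assumes that method applies here; the function name `proportion_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample (normal approximation) CI for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    return p_hat - z * se, p_hat + z * se

low, high = proportion_interval(successes=25, n=100)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

With 25 left-handers out of 100, the interval runs from roughly 0.165 to 0.335.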
Interpretation
Changing the sample size
Example 2
• It was found that 28.1% of 153 cervical-cancer cases had never
had a Pap smear prior to the time of case’s diagnosis. Calculate
a 95% CI for the percentage of cervical-cancer cases who never
had a Pap test.
Example 3
• Suppose that among 10,000 female operating-room nurses,
60 women have developed breast cancer over five years. Find
the 95% CI for p based on the point estimate.
• Point estimate = 60/10,000 = 0.006
• The 95% CI for p is given by the interval:
0.006 ± 1.96 √((0.006)(0.994)/10,000) = 0.006 ± 0.0015 = (0.0045, 0.0075)
Hypothesis Testing
• A hypothesis is a statement about one or more
populations and their parameters.
• Hypothesis testing is a type of statistical inference
which helps the researcher, clinician, or administrator
in reaching a decision or conclusion concerning a
population by examining a sample from that
population.
• Research hypothesis: is the speculation or
supposition that motivates the research.
• Statistical hypotheses: are hypotheses that are
stated in such a way that they may be evaluated by
appropriate statistical technique.
The Logic of Hypothesis Testing
• When you want to make statements about a
population, you usually draw samples
• How generalizable is your sample-based finding?
• Evidence has to be evaluated statistically before
arriving at a conclusion regarding the hypothesis
• The strength of the evidence depends on whether the
sample contains few or many observations
Types of Hypothesis
1. The Null Hypothesis, H0
Is a statement claiming that there is no difference between
the hypothesized value and the population value.
(The effect of interest is zero = no difference)
The null hypothesis, sometimes called hypothesis of no
difference, is the hypothesis to be tested.
Ho should contain a statement of equality: =, ≥ or ≤.
States the assumption (hypothesis) to be tested
2. The Alternative Hypothesis, HA
• Is a statement of what we hope or expect to be
able to conclude as a result of the test.
• Is generally the hypothesis that is believed (or
needs to be supported) by the researcher.
• Is a statement that disagrees (opposes) with
Ho
(The effect of interest is not zero)
Never contains the “=”, “≤” or “≥” sign
• May or may not be accepted
Rules for Stating Statistical Hypotheses
1. One population
• Indication of equality (either =, ≤ or ≥) must appear in
Ho.
Ho: μ = μo, HA: μ ≠ μo
Ho: P = Po, HA: P ≠ Po
• Can we conclude that a certain population mean is
– not 50?
Ho: μ = 50 and HA: μ ≠ 50
– greater than 50?
Ho: μ ≤ 50 HA: μ > 50
• Can we conclude that the proportion of patients with
leukemia who survive more than six years is not 60%?
Ho: P = 0.6 HA: P ≠ 0.6
2. Two populations
Ho: μ1 = μ2 HA: μ1 ≠ μ2
Ho: P1 = P2 HA: P1 ≠ P2
Decision cont.
• The test statistic is computed from the data of the sample
• The decision to reject or not to reject the Ho is based
on the magnitude of the test statistic.
• An example of a test statistic is the quantity
z = (x̄ − μo)/(σ/√n)
Hypothesis Testing Process
Types of Errors in Hypothesis Tests
• Whenever we reject or do not reject the Ho, we
may commit an error.
• Two types of errors can be committed.
– Type I Error
– Type II Error
Type I Error
• Rejecting the Ho when it is true. The probability of
making type I error is denoted by α
• Considered a serious type of error
• Called level of significance of the test
• Set by researcher in advance
Type II Error
• Not rejecting the Ho when it is actually false. The
probability of making type II error is denoted by β
• Usually unknown but larger than α
Action                 Reality
(Conclusion)           Ho True                Ho False
Do not reject Ho       Correct decision       Type II error (β)
Reject Ho              Type I error (α)       Correct decision
Power
• The probability of rejecting the Ho when it is
false.
Power = 1 – β = 1- probability of type II error
P – Value
Is the probability of getting a difference at least as large as
the one observed in the sample purely by chance from a
population where the true difference is zero.
P-values less than 0.05 are called statistically significant.
Values greater than 0.10 are usually considered
non-significant.
Values between 0.05 and 0.10 may be considered to
indicate weak evidence against the null hypothesis.
If the P-value is greater than α (like 0.05) then, by
convention, we conclude that the observed difference
could have occurred by chance and there is no
statistically significant evidence (at the α (like 5%) level
of significance) for a difference between the groups in
the population.
So, with a large p-value, we cannot rule out the effect
of chance.
If the p-value < α (like 0.05), then we say the
difference is significant and hence reject the null
hypothesis of no difference.
If the p-value > α (like 0.05), then the difference is
not significant and hence we do not reject the null
hypothesis.
Another way to state conclusion
• Reject Ho if P-value < α
• Do not reject Ho if P-value ≥ α
1. Hypothesis Testing of a Single Mean
(Normally Distributed)
• Large sample: Z-test
• Small sample: t-test
Basic Concepts of Hypothesis Testing
• The Null and Alternate hypothesis
• Choosing the relevant statistical test and appropriate
probability distribution. Depends on
- Size of the sample
- Whether the population standard deviation is
known or not
• Choosing the Critical Value. The three criteria used
are
- Significance Level
- Degrees of Freedom
- One or Two Tailed Test
One or Two-tail Test
• One-tailed Hypothesis Test
• Determines whether a particular population parameter is larger or
smaller than some predefined value
Example: Two-Tailed Test
1. A simple random sample of 10 people from a certain
population has a mean age of 27. Can we conclude that the
mean age of the population is not 30? The variance is
known to be 20. Let α = 0.05.
Step 1: Hypotheses
Ho: µ = 30
HA: µ ≠ 30
Step 2: Test statistic
As the population variance is known, we use Z as
the test statistic.
E. Decision Rule
• Reject Ho if the Z value falls in the rejection region.
• Don’t reject Ho if the Z value falls in the non-rejection region.
• Because of the structure of Ho it is a two tail test. Therefore, reject Ho
if Z ≤ -1.96 or Z ≥ 1.96.
F. Calculation of test statistic
Z = (x̄ − µo)/√(σ²/n) = (27 − 30)/√(20/10) = −3/1.4142 = −2.12
G. Statistical decision
We reject the Ho because Z = −2.12 is in the rejection region. The
value is significant at the 5% level (α = 0.05).
H. Conclusion
=> Z calculated (−2.12) < Z tabulated (−1.96), so reject Ho.
Or, using the P-value:
=> A Z value of −2.12 corresponds to a tail area of 1 − 0.9830 = 0.0170. Since
there are two parts to the rejection region in a two-tailed test, the P-value is
twice this: 2 × 0.0170 = 0.0340.
=> We conclude that µ is not 30, since P = 0.0340 < α = 0.05.
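The age example can be verified numerically. This is a minimal sketch of a two-tailed one-sample z test with known variance; the function name `z_test_mean` is hypothetical.

```python
import math
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, variance, n, alpha=0.05):
    """Two-tailed z test for a single mean with known population variance.
    Returns (z, p_value, reject_h0)."""
    z = (sample_mean - mu0) / math.sqrt(variance / n)
    # Two-tailed P-value: twice the area beyond |z| in one tail.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

z, p_value, reject_h0 = z_test_mean(sample_mean=27, mu0=30, variance=20, n=10)
print(f"z = {z:.2f}, P-value = {p_value:.4f}, reject Ho: {reject_h0}")
```

The unrounded P-value differs slightly in the fourth decimal from the table-based 0.0340 because z is not rounded to −2.12 first; the decision to reject Ho is the same.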
Test for single proportion
z = (p̂ − p)/√(pq/n), where q = 1 − p
Example: When Gregor Mendel conducted his famous
hybridization experiments with peas, one such experiment resulted in
offspring consisting of 428 peas with green pods and 152 peas with
yellow pods. According to Mendel’s theory, 1/4 of the offspring
peas should have yellow pods. Use a 0.05 significance level with the
P-value method to test the claim that the proportion of peas with
yellow pods is equal to 1/4.
H0: p = 0.25
H1: p ≠ 0.25
n = 428 + 152 = 580
α = 0.05
p̂ = 152/580 = 0.262

z = (p̂ − p)/√(pq/n) = (0.262 − 0.25)/√((0.25)(0.75)/580) = 0.67

Since this is a two-tailed test, the P-value is twice the area to the
right of the test statistic. Using Table A-2, the area to the right of
z = 0.67 is 1 – 0.7486 = 0.2514, so the P-value is 2 × 0.2514 = 0.5028.
Because 0.5028 > 0.05, we fail to reject H0: the data are consistent
with the claim that 1/4 of the peas have yellow pods.
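The Mendel calculation can be reproduced as a two-tailed one-sample z test for a proportion. This is a sketch under that assumption; the function name `z_test_proportion` is hypothetical.

```python
import math
from statistics import NormalDist

def z_test_proportion(successes, n, p0):
    """Two-tailed z test for a single proportion; returns (z, p_value).
    The standard error uses the hypothesized proportion p0."""
    p_hat = successes / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

yellow, green = 152, 428
z, p_value = z_test_proportion(successes=yellow, n=yellow + green, p0=0.25)
print(f"z = {z:.2f}, P-value = {p_value:.3f}")
```

The test statistic is about 0.67 and the two-tailed P-value is about 0.50, far above 0.05, so the null hypothesis p = 0.25 is not rejected.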
Hypothesis Testing About
a Single Mean - Example 1(2 tailed)
• Ho: μ = 5000 (hypothesized value of population)
• Ha: μ ≠ 5000 (alternative hypothesis)
• n = 100
• x̄ = 4960
• σ = 250
• α = 0.05
Hypothesis Testing About
a Single Mean - Example 2
• Ho: μ = 1000 (hypothesized value of population)
• Ha: μ ≠ 1000 (alternative hypothesis)
• n = 12
• x̄ = 1087.1
• s = 191.6
• α = 0.01
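Because n is small and only the sample SD is given, this example calls for a t test with n − 1 = 11 degrees of freedom. The sketch below assumes a two-tailed alternative (the slide does not label it) and uses 3.106, the t-table value for 11 df at a two-tailed α of 0.01; the function name `t_statistic` is hypothetical.

```python
import math

def t_statistic(sample_mean, mu0, s, n):
    """One-sample t statistic with n - 1 degrees of freedom."""
    return (sample_mean - mu0) / (s / math.sqrt(n))

t = t_statistic(sample_mean=1087.1, mu0=1000, s=191.6, n=12)

# Assumption: two-tailed test, so the table value for df = 11 and
# alpha = 0.01 (0.005 in each tail) is used.
t_critical = 3.106
reject_h0 = abs(t) >= t_critical

print(f"t = {t:.2f}, df = 11, critical value = ±{t_critical}, reject Ho: {reject_h0}")
```

The statistic, about 1.57, lies well inside the non-rejection region, so Ho is not rejected at the 0.01 level under these assumptions.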
Hypothesis Testing About
a Single Mean - Example 3(1 tailed)
• Ho: μ ≥ 5000 (hypothesized value of population)
• Ha: μ < 5000 (alternative hypothesis)
• n = 50
• x̄ = 4970
• σ = 250
• α = 0.01
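The one-tailed example differs from a two-tailed test only in how the P-value is taken from the normal curve. The sketch below assumes the value 250 is the known population SD; the function name `z_test` and its `alternative` parameter are hypothetical names introduced for illustration.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """One-sample z test with known sigma; returns (z, p_value).
    alternative: "two-sided", "less" (Ha: mu < mu0) or "greater"."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf(z)
    if alternative == "less":
        p_value = cdf                    # area in the left tail only
    elif alternative == "greater":
        p_value = 1 - cdf                # area in the right tail only
    elif alternative == "two-sided":
        p_value = 2 * min(cdf, 1 - cdf)  # both tails
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p_value

# Example 3 (one-tailed, alpha = 0.01)
z3, p3 = z_test(4970, 5000, sigma=250, n=50, alternative="less")
print(f"Example 3: z = {z3:.2f}, P-value = {p3:.3f}, reject Ho: {p3 < 0.01}")

# Example 1 (two-tailed, alpha = 0.05) uses the same function.
z1, p1 = z_test(4960, 5000, sigma=250, n=100)
print(f"Example 1: z = {z1:.2f}, P-value = {p1:.3f}, reject Ho: {p1 < 0.05}")
```

In both examples the P-value exceeds α, so Ho is not rejected.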
Hypothesis Testing of Proportion
• Quality control dept of a light bulb company
claims 95% of its products are defect free
• The CEO checks 225 bulbs and finds only 87% to
be defect free
• Is the claim of 95% true at the 0.05 level of significance?
• So we have hypothesized values and sample values
Hypothesis Testing of Proportion
• The limits of the acceptance region are
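One common way to obtain these limits is p0 ± z·√(p0(1 − p0)/n) for a two-tailed test. The sketch below assumes that approach for the light-bulb example; the function name `acceptance_limits` is hypothetical.

```python
import math
from statistics import NormalDist

def acceptance_limits(p0, n, alpha=0.05):
    """Limits of the two-tailed acceptance region for a sample proportion
    when the hypothesized population proportion is p0."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under Ho
    return p0 - z * se, p0 + z * se

lower, upper = acceptance_limits(p0=0.95, n=225)
sample_p = 0.87
reject_h0 = not (lower <= sample_p <= upper)

print(f"acceptance region: ({lower:.4f}, {upper:.4f})")
print(f"sample proportion {sample_p} -> reject Ho: {reject_h0}")
```

Under these assumptions the region is roughly 0.92 to 0.98. The observed 87% falls below the lower limit, so the claim of 95% defect-free bulbs is rejected at the 0.05 level.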