Chapter Three

Chapter Three covers inferential statistics and estimation, focusing on sample statistics, population parameters, and the principles of sampling distributions. It explains point and interval estimations, confidence intervals, and the Central Limit Theorem, emphasizing their importance in statistical inference. The chapter also discusses the use of Z and t distributions for estimating population parameters based on sample data.

CHAPTER THREE

Inferential Statistics,
Estimation
Ahmed M (Assistant Professor of Epidemiology and Biostatistics, PhD candidate in Epidemiology)

2/20/2025 1
Learning objectives
At the end of this chapter the student will be able to:
• Understand the concepts of sample statistics and population parameters
• Understand the principles of sampling distributions of means and proportions and calculate their standard errors
• Understand the principles of estimation and differentiate between point and interval estimation
• Compute appropriate confidence intervals for population means and proportions and interpret the findings

Parameter and Statistic
Parameter
- Statistical constants computed for the population, such as the mean (μ), variance (σ²), correlation coefficient (ρ), and proportion (P), are called parameters.
- Parameters are functions of the population values.

Statistic
- Statistical constants computed from samples and corresponding to the parameters, namely the mean (x̄), variance (S²), sample correlation coefficient (r), proportion, etc., are called statistics.
- Statistics are functions of the sample observations.
In general, population parameters are unknown, and sample statistics are used as their estimates.

Sampling Distribution
The sample distribution is the distribution
resulting from the collection of actual data.
The distribution of all possible values that can
be assumed by some statistic, computed from
samples of the same size randomly drawn from
the same population is called the sampling
distribution of that statistic.
Sampling Distribution…
How can you generate a sampling distribution?
1. Obtain a sample of n observations selected at random from a large finite population of size N.
2. Determine the value of the statistic of interest (e.g. the mean or the proportion) for that sample, and then return the observations to the population.
3. Repeat the sampling procedure until all possible different samples of size n have been drawn, calculating the statistic of interest (such as the sample mean or proportion) for each sample.

Sampling Distribution…
• If you compute the mean of a sample of 10 numbers, the value you obtain will not equal the population mean exactly; by chance it will be a little higher or a little lower. If you sampled sets of 10 numbers over and over again (computing the mean for each set), you would find that some sample means come much closer to the population mean than others. Some would be higher than the population mean and some would be lower.

Properties of Sampling Distribution…
1. The mean of the sampling distribution of means is the same as the population mean, μ.
2. The SD of the sampling distribution of means is σ/√n (the standard error).
3. The shape of the sampling distribution of means is approximately a normal curve, regardless of the shape of the population distribution, provided n is large enough (Central Limit Theorem).

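The three properties above can be checked with a small simulation. The sketch below is only an illustration: the skewed population, the sample size of 30, the number of repetitions, and the function name `simulate_sample_means` are all assumptions, not part of the slides. Each sample is drawn with replacement, which is a close approximation when the population is much larger than n.

```python
import random
import statistics

def simulate_sample_means(population, n, num_samples, seed=0):
    """Draw num_samples random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(num_samples)]

# Hypothetical, deliberately skewed population (illustration only, not data from the slides).
population_rng = random.Random(42)
population = [population_rng.expovariate(1.0) for _ in range(10_000)]

mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

n = 30
sample_means = simulate_sample_means(population, n, num_samples=5_000)

mean_of_means = statistics.fmean(sample_means)  # property 1: should be close to mu
sd_of_means = statistics.stdev(sample_means)    # property 2: should be close to sigma / sqrt(n)
theoretical_se = sigma / n ** 0.5

print(f"population mean = {mu:.3f}, mean of sample means = {mean_of_means:.3f}")
print(f"sigma/sqrt(n)   = {theoretical_se:.3f}, SD of sample means  = {sd_of_means:.3f}")
```

Plotting a histogram of `sample_means` would show the roughly bell-shaped curve of property 3 even though the population itself is strongly skewed.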
The Central Limit Theorem
• Regardless of the shape of the frequency distribution of a characteristic in the parent population, the means of a large number of samples (independent observations) from the population will follow a normal distribution, with the mean of the means approaching the population mean μ and a standard deviation of σ/√n.
• The central limit theorem states that the sampling distribution of the sample mean will be normal or nearly normal if the sample size is large enough.
• How large is "large enough"? As a rough rule of thumb, many statisticians say that a sample size of 30 is large enough.

The Central Limit Theorem cont’d…
Sampling distribution of the mean: Suppose we choose a random sample of size n. The sampling distribution of the sample mean x̄ possesses the following properties:
– The sample mean x̄ is an estimate of the population mean μ.
– The standard deviation of x̄ is σ/√n (called the standard error of the mean).
– Provided n is large enough, the shape of the sampling distribution of x̄ is normal.

Sampling distribution of …..

♣ Single mean

♣ Difference of means

♣ Single proportion

♣ Difference of proportion

Estimation of common population parameters using confidence intervals
Statistical inference: probability distributions
• The standard normal distribution (Z-distribution) is used in estimation, in particular for constructing interval estimates. It is also used in both one- and two-tailed tests.
• However, the Z-test is applied when the distribution is normal and the population standard deviation σ is known, or when the sample size n is large (n ≥ 30) and σ is unknown (taking S as an estimator of σ).
• But what happens when n < 30 and σ is unknown?

Statistical inference Cont…
• We use a t-distribution, which depends on the number of degrees of freedom (df).
• The t-distribution is a theoretical probability distribution (i.e. its total area is 100 percent) and is defined by a mathematical function.
• The distribution is symmetrical, bell-shaped, and similar to the normal but more spread out.
• For large sample sizes (n ≥ 30), the t and Z curves are so close together that it does not much matter which you use.

Statistical inference Cont…
• As the degrees of freedom decrease, the t-distribution becomes increasingly spread out compared with the normal.
• The sample standard deviation (S) is used as an estimate of σ (the standard deviation of the population, which is unknown) and appears to be a logical substitute.
• This substitution, however, necessitates an alteration in the underlying theory, an alteration that is especially important when the sample size, n, is small.

Statistical inference Cont.…
• The process of drawing conclusions about an entire
population based on the data in a sample is known as
statistical inference.
• Methods of inference usually fall into one of two broad
categories: estimation or hypothesis testing.

Estimation, Estimator & Estimate
♣ Estimation is the computation of a statistic from sample data, often yielding a value that is an approximation (guess) of its target, an unknown true population parameter value.
♣ The statistic itself is called an estimator and can be of two types: point or interval.
♣ The value or values that the estimator assumes are called estimates.

• Two methods of estimation are commonly used: point estimation and interval estimation.
1. Point estimation: a single numerical value used to estimate the corresponding population parameter.

1. Point Estimate
• A single numerical value used to estimate the corresponding population parameter.
Sample statistics are estimators of population parameters:

Sample statistic (estimator)         Population parameter
Sample mean, x̄                       µ
Sample variance, S²                  σ²
Sample proportion, p̂                 P or π
Sample odds ratio, OR̂                OR
Sample relative risk, RR̂             RR
Sample correlation coefficient, r    ρ

2. Interval estimation: a range (an interval) of values used to estimate the true value of a population parameter, with a specified degree of confidence.
Confidence Interval (CI) estimate of a parameter:
CI = Estimator ± (reliability coefficient) × (standard error)

Confidence Intervals
• Give a plausible range of values of the estimate likely
to include the “true” (population) value with a given
confidence level.
• An interval estimate provides more information about
a population characteristic than does a point estimate
• Such interval estimates are called confidence
intervals.
• CIs also give information about the precision of an
estimate.
• How much uncertainty is associated with a point
estimate of a population parameter?
• A CI in general:
– Takes into consideration variation in sample
statistics from sample to sample
– Based on observation from 1 sample
– Gives information about closeness to
unknown population parameters
– Stated in terms of level of confidence
• Never 100% sure

General Formula:
The general formula for all CIs is:
point estimate ± (measure of how confident we want to be) × (standard error)
where:
– point estimate: the value of the statistic in my sample (e.g., mean, odds ratio, etc.)
– measure of how confident we want to be: from a Z table or a t table, depending on the sampling distribution of the statistic
– standard error: the standard error of the statistic

Estimation for Single Population

1. CI for a Single Population Mean
(normally distributed)
A. Known variance (large sample size)
• There are 3 elements to a CI:
1. Point estimate
2. SE of the point estimate
3. Confidence coefficient
• Consider the task of computing a CI estimate of μ
for a population distribution that is normal with σ
known.
• Data are available from a random sample of size n.

Assumptions
• Population standard deviation (σ) is known
• Population is normally distributed
• If the population is not normal, use a large sample
• A 100(1-α)% C.I. for μ is:
$$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
• α is to be chosen by the researcher; the most common values of α are 0.05, 0.01 and 0.1.

3. Commonly used CLs are 90%, 95%, and
99%

Reliability Coefficient (z, t)
• The standardized z or t value corresponding to the given level of confidence.
Z = 1.645 (often rounded to 1.64) if your confidence level is 90%.
Z = 1.96 if your confidence level is 95%.
Z = 2.58 if your confidence level is 99%.

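The z coefficients listed above can be reproduced from the standard normal distribution instead of being read from a table. This is a minimal sketch using only Python's standard library; the function names `z_coefficient` and `confidence_interval` are illustrative choices, not names used in the slides.

```python
from statistics import NormalDist

def z_coefficient(confidence_level):
    """Two-sided reliability coefficient z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1.0 - confidence_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def confidence_interval(estimate, coefficient, standard_error):
    """CI = estimator +/- (reliability coefficient) x (standard error)."""
    margin = coefficient * standard_error
    return estimate - margin, estimate + margin

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level -> z = {z_coefficient(level):.3f}")
```

For small samples with unknown σ the coefficient would come from the t distribution instead; the standard library has no t quantile function, so that case is left to a t table.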
Factors Affecting Margin of Error
The width of the CI for the mean (its margin of error) is determined by n, s, and α.
– As n increases, the length of the CI decreases.
– As s increases, the length of the CI increases.
– As the confidence level increases (α decreases), the length of the CI increases.

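A short numerical check of these three effects, assuming the z-based margin of error z(α/2) · s/√n. The baseline values (n = 25, s = 10, α = 0.05) are arbitrary illustrative numbers, not data from the slides.

```python
from statistics import NormalDist

def margin_of_error(n, s, alpha):
    """Half-width z_{alpha/2} * s / sqrt(n) of a z-based CI for a mean."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * s / n ** 0.5

# Hypothetical baseline, then change one factor at a time.
baseline = margin_of_error(n=25, s=10.0, alpha=0.05)
larger_n = margin_of_error(n=100, s=10.0, alpha=0.05)          # larger sample
larger_s = margin_of_error(n=25, s=20.0, alpha=0.05)           # more variable data
higher_confidence = margin_of_error(n=25, s=10.0, alpha=0.01)  # 99% instead of 95%

print(f"baseline           : {baseline:.2f}")
print(f"n = 100            : {larger_n:.2f}  (narrower)")
print(f"s = 20             : {larger_s:.2f}  (wider)")
print(f"99% confidence     : {higher_confidence:.2f}  (wider)")
```

Because n enters through √n, quadrupling the sample size only halves the margin of error.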
Example:
1. Waiting times (in hours) at a particular hospital are believed to be approximately normally distributed with a variance of 2.25 hr².
a. A sample of 20 outpatients revealed a mean waiting time of 1.52 hours. Construct the 95% CI for the estimate of the population mean.
b. Suppose that the mean of 1.52 hours had resulted from a sample of 32 patients. Find the 95% CI.
c. What effect does a larger sample size have on the CI?

a.
$$1.52 \pm 1.96\sqrt{\frac{2.25}{20}} = 1.52 \pm 1.96(0.33) = 1.52 \pm 0.65 = (0.87,\ 2.17)$$
• We are 95% confident that the true mean waiting time is between 0.87 and 2.17 hrs.
• Although the true mean may or may not be in this interval, 95% of the intervals formed in this manner will contain the true mean.
• An incorrect interpretation is that there is a 95% probability that this interval contains the true population mean.

b.
$$1.52 \pm 1.96\sqrt{\frac{2.25}{32}} = 1.52 \pm 1.96(0.27) = 1.52 \pm 0.53 = (0.99,\ 2.05)$$
c. A larger sample size makes the CI narrower (more precise).

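Parts (a) to (c) of the waiting-time example can be recomputed as follows. This is a sketch under the slide's assumptions (normal population, known variance of 2.25); the function name `z_interval_for_mean` is illustrative. Because the code does not round the standard error to two decimals before multiplying, its limits differ from the slide's in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_for_mean(sample_mean, variance, n, confidence=0.95):
    """CI for a population mean when the population variance is known."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    standard_error = sqrt(variance / n)
    return sample_mean - z * standard_error, sample_mean + z * standard_error

ci_20 = z_interval_for_mean(1.52, 2.25, 20)  # part (a): n = 20
ci_32 = z_interval_for_mean(1.52, 2.25, 32)  # part (b): n = 32

print(f"n = 20: ({ci_20[0]:.2f}, {ci_20[1]:.2f})")  # about (0.86, 2.18)
print(f"n = 32: ({ci_32[0]:.2f}, {ci_32[1]:.2f})")  # about (1.00, 2.04)

# Part (c): the interval from the larger sample is narrower.
width_20 = ci_20[1] - ci_20[0]
width_32 = ci_32[1] - ci_32[0]
print(f"widths: {width_20:.2f} vs {width_32:.2f}")
```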
• When the population SD (σ) is unknown, it can be replaced by the sample SD (s) if the sample size is large enough (n > 30). With a large sample size, we assume a normal distribution.
• Example: In a sample of 35 patients, patients were on average 17.2 minutes late for appointments, with an SD of 8 minutes. What is the 90% CI for µ?
Ans: (15.0, 19.4).
• Since the sample size is fairly large (> 30) and the population SD is unknown, we assume the sample mean to be normally distributed based on the CLT and use the sample SD in place of the population σ.

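The stated answer can be checked by hand. The working below is a sketch that assumes the 90% reliability coefficient z = 1.645 and uses s in place of σ, as described above:

$$\bar{x} \pm z_{0.05}\,\frac{s}{\sqrt{n}} = 17.2 \pm 1.645 \times \frac{8}{\sqrt{35}} \approx 17.2 \pm 1.645(1.352) \approx 17.2 \pm 2.22 = (15.0,\ 19.4)$$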
Student’s t Distribution
• The t is a family of distributions
• Bell-shaped
• Symmetric about zero (the mean)
• Flatter than the Normal(0,1). This means:
– The variability of a t is greater than that of a Z, which is Normal(0,1)
– Thus, there is more area under the tails and less at the center
– Because variability is greater, the resulting confidence intervals will be wider.

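To see the "wider intervals" point numerically, the sketch below compares CI half-widths built from z with half-widths built from the two t critical values quoted elsewhere in these slides (1.729 for 19 df at 90%, and 2.262 for 9 df at 95%). The sample SD of 1.0 is an arbitrary assumption used only to put both half-widths on the same scale.

```python
from math import sqrt
from statistics import NormalDist

# t critical values as quoted in these slides: (df, confidence level) -> t
T_CRITICAL_FROM_SLIDES = {(19, 0.90): 1.729, (9, 0.95): 2.262}

def z_critical(confidence):
    """Two-sided standard normal critical value for the given confidence level."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

def half_width(critical_value, s, n):
    """Half-width of a CI for the mean: critical value times s / sqrt(n)."""
    return critical_value * s / sqrt(n)

s = 1.0  # arbitrary illustrative sample SD (assumption)
comparison = []
for (df, confidence), t_value in T_CRITICAL_FROM_SLIDES.items():
    n = df + 1  # df = n - 1
    z_width = half_width(z_critical(confidence), s, n)
    t_width = half_width(t_value, s, n)
    comparison.append((df, confidence, z_width, t_width))
    print(f"df={df}, {confidence:.0%} CI: z half-width={z_width:.3f}, t half-width={t_width:.3f}")
```

In both cases the t-based half-width is larger, and the gap is bigger for the smaller df.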
Degrees of Freedom (df)
df = Number of observations that are free to vary after
sample mean has been calculated
df = n-1

Student’s t Table

t distribution values
• With comparison to the Z value

Example

Example

• Standard error =
• t-value at 90% CL at 19 df =1.729

Exercise
• Compute a 95% CI for the mean birth weight based on n = 10, sample mean = 116.9 and s = 21.70.
• From the t table, t(9, 0.975) = 2.262
• Ans: (101.4, 132.4)

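A worked check of the exercise answer, sketched with the t value quoted on the slide and df = n - 1 = 9:

$$\bar{x} \pm t_{9,\,0.975}\,\frac{s}{\sqrt{n}} = 116.9 \pm 2.262 \times \frac{21.70}{\sqrt{10}} \approx 116.9 \pm 2.262(6.862) \approx 116.9 \pm 15.5 = (101.4,\ 132.4)$$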
2. CIs for single population proportion, p
• Is based on three elements of CI:
– Point estimate
– SE of point estimate
– Confidence coefficient

Lower limit = Point Estimate - (Critical Value) × (Standard Error of Estimate)
Upper limit = Point Estimate + (Critical Value) × (Standard Error of Estimate)
Hence,
$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
is an approximate 95% CI for the true proportion p.

Example 1
• A random sample of 100 people shows that 25
are left-handed. Form a 95% CI for the true
proportion of left-handers.
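Example 1 can be worked with the normal-approximation interval p̂ ± z·√(p̂(1 - p̂)/n). The sketch below assumes that approximation is adequate here (n·p̂ = 25 and n·(1 - p̂) = 75 are both large); the function name `proportion_ci` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation CI for a single population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    standard_error = sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * standard_error, p_hat + z * standard_error

lower, upper = proportion_ci(25, 100)  # 25 left-handers out of 100
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")  # (0.165, 0.335)
```

So the sample supports a true proportion of left-handers somewhere between roughly 16.5% and 33.5%.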

Interpretation

Changing the sample size

Example 2
• It was found that 28.1% of 153 cervical-cancer cases had never
had a Pap smear prior to the time of case’s diagnosis. Calculate
a 95% CI for the percentage of cervical-cancer cases who never
had a Pap test.
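One way to work this example, sketched with the normal-approximation interval for a proportion and taking p̂ = 0.281 and n = 153 from the problem statement:

$$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.281 \pm 1.96\sqrt{\frac{0.281 \times 0.719}{153}} \approx 0.281 \pm 1.96(0.0363) \approx 0.281 \pm 0.071 = (0.210,\ 0.352)$$

That is, roughly 21.0% to 35.2% of cervical-cancer cases had never had a Pap test.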

Example 3
• Suppose that among 10,000 female operating-room nurses, 60 women have developed breast cancer over five years. Find the 95% CI for p based on the point estimate.
• Point estimate = 60/10,000 = 0.006
• The 95% CI for p is given by the interval:
$$0.006 \pm 1.96\sqrt{\frac{0.006 \times 0.994}{10{,}000}} \approx 0.006 \pm 1.96(0.00077)$$
• The 95% CI for p is approximately (0.0045, 0.0075).

Hypothesis Testing
Ahmed M (Assistant Professor of Epidemiology and Biostatistics)

Hypothesis Testing
• A hypothesis is a statement about one or more populations and their parameters.
• Hypothesis testing is a type of statistical inference which helps the researcher, clinician, or administrator in reaching a decision or conclusion concerning a population by examining a sample from that population.
• Research hypothesis: the speculation or supposition that motivates the research.
• Statistical hypotheses: hypotheses that are stated in such a way that they may be evaluated by an appropriate statistical technique.

The Logic of Hypothesis Testing
• When you want to make statements about a population, you usually draw samples
• How generalizable is your sample-based finding?
• Evidence has to be evaluated statistically before arriving at a conclusion regarding the hypothesis
• The strength of the evidence depends on whether the information comes from a sample with fewer or more observations

Types of Hypothesis
1. The Null Hypothesis, H0
• Is a statement claiming that there is no difference between the hypothesized value and the population value.
• (The effect of interest is zero = no difference)
• The null hypothesis, sometimes called the hypothesis of no difference, is the hypothesis to be tested.
• Ho should contain the statement of equality: either =, ≥ or ≤.
• States the assumption (hypothesis) to be tested

2. The Alternative Hypothesis, HA
• Is a statement of what we hope or expect to be able to conclude as a result of the test.
• Is generally the hypothesis that is believed (or needs to be supported) by the researcher.
• Is a statement that disagrees with (opposes) Ho (the effect of interest is not zero)
• Never contains the “=”, “≤” or “≥” sign
• May or may not be accepted

Rules for Stating Statistical Hypotheses
1. One population
• Indication of equality (either =, ≤ or ≥) must appear in
Ho.
Ho: μ = μo, HA: μ ≠ μo
Ho: P = Po, HA: P ≠ Po
• Can we conclude that a certain population mean is
– not 50?
Ho: μ = 50 and HA: μ ≠ 50
– greater than 50?
Ho: μ ≤ 50 HA: μ > 50
• Can we conclude that the proportion of patients with
leukemia who survive more than six years is not 60%?
Ho: P = 0.6 HA: P ≠ 0.6
2. Two populations
Ho: μ1 = μ2 HA: μ1 ≠ μ2
Ho: P1 = P2 HA: P1 ≠ P2
Decision cont --
• The test statistic is computed from the data of the sample
• The decision to reject or not to reject the Ho is based on the magnitude of the test statistic.
• An example of a test statistic is the quantity
$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
• When the variance of the population is unknown, we use
$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

Hypothesis Testing Process

Types of Errors in Hypothesis Tests
• Whenever we reject or fail to reject the Ho, we may commit an error.
• Two types of errors can be committed.
– Type I Error
– Type II Error

Type I Error
• Rejecting the Ho when it is true. The probability of
making type I error is denoted by α
• Considered a serious type of error
• Called level of significance of the test
• Set by researcher in advance
Type II Error
• Not rejecting the Ho when it is actually false. The
probability of making type II error is denoted by β
• Usually unknown but larger than α

Action (Conclusion)  |  Reality: Ho True                            |  Reality: Ho False
Do not reject Ho     |  Correct action (Prob. = 1-α)                |  Type II error (β) (Prob. = β = 1-Power)
Reject Ho            |  Type I error (α) (Prob. = α = Sign. level)  |  Correct action (Prob. = Power = 1-β)

Power
• The probability of rejecting the Ho when it is false.
Power = 1 – β = 1 – probability of type II error
• We would like to maintain a low probability of a Type I error (α) and a low probability of a Type II error (β) [high power = 1 – β].

P – Value
Is the probability of getting a difference at least as large as the one observed in the sample purely by chance from a population where the true difference is zero.
P-values less than 0.05 are called statistically significant.
Values greater than 0.10 are usually considered non-significant.
Values between 0.05 and 0.1 may be considered to indicate weak evidence against the null hypothesis.
If the P-value is greater than α (e.g. 0.05) then, by convention, we conclude that the observed difference could have occurred by chance and there is no statistically significant evidence (at the α, e.g. 5%, level of significance) for a difference between the groups in the population.

So, with a large p-value, we cannot rule out the effect of chance.
If the p-value < α (e.g. 0.05), then we say the difference is significant and hence reject the null hypothesis of no difference.
If the p-value > α (e.g. 0.05), then the difference is not significant and hence we do not reject the null hypothesis.

Another way to state conclusion
• Reject Ho if P-value < α
• Do not reject Ho if P-value ≥ α
P-value is the probability of obtaining a test statistic as extreme as or more extreme than the actual test statistic obtained, if the Ho is true.
The larger the absolute value of the test statistic, the smaller the P-value; and the smaller the P-value, the stronger the evidence against the Ho.

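The decision rule above can be sketched in code for a test statistic that follows the standard normal distribution. The function names `p_value_from_z` and `decide`, the `tail` labels, and the three example z values are illustrative assumptions, not part of the slides.

```python
from statistics import NormalDist

def p_value_from_z(z, tail="two-sided"):
    """P-value for a z statistic: probability of a result at least this extreme under Ho."""
    std_normal = NormalDist()
    if tail == "two-sided":
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    if tail == "left":
        return std_normal.cdf(z)
    if tail == "right":
        return 1.0 - std_normal.cdf(z)
    raise ValueError("tail must be 'two-sided', 'left' or 'right'")

def decide(p_value, alpha=0.05):
    """Reject Ho if P-value < alpha; otherwise do not reject Ho."""
    return "reject Ho" if p_value < alpha else "do not reject Ho"

# Hypothetical test statistics: larger |z| gives a smaller P-value.
for z in (0.5, 1.5, 3.0):
    p = p_value_from_z(z)
    print(f"z = {z:.1f}: P-value = {p:.4f} -> {decide(p)}")
```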
Hypothesis Test for One Sample
• Test for single mean
• Test for single proportion

1. Hypothesis Testing of a Single Mean (Normally Distributed)
• Large sample: Z-test
• Small sample: t-test

Basic Concepts of Hypothesis Testing
• The Null and Alternate hypothesis
• Choosing the relevant statistical test and appropriate
probability distribution. Depends on
- Size of the sample
- Whether the population standard deviation is
known or not
• Choosing the Critical Value. The three criteria used
are
- Significance Level
- Degrees of Freedom
- One or Two Tailed Test

One or Two-tail Test
• One-tailed Hypothesis Test
– Determines whether a particular population parameter is larger or smaller than some predefined value
– Uses one critical value of the test statistic
• Two-tailed Hypothesis Test
– Determines whether a population parameter differs from a hypothesized value in either direction (i.e. whether it falls outside certain lower and upper bounds)
– Uses two critical values (±), one in each tail

Example: Two-Tailed Test
1. A simple random sample of 10 people from a certain population has a mean age of 27. Can we conclude that the mean age of the population is not 30? The variance is known to be 20. Let α = 0.05.
• Answer: "Yes we can, if we can reject the Ho that it is 30."
A. Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
B. Assumptions
Simple random sample
Normally distributed population

C. Hypotheses
Ho: µ = 30
HA: µ ≠ 30
D. Test statistic
As the population variance is known, we use Z as the test statistic.

E. Decision Rule
• Reject Ho if the Z value falls in the rejection region.
• Don’t reject Ho if the Z value falls in the non-rejection region.
• Because of the structure of Ho it is a two tail test. Therefore, reject Ho
if Z ≤ -1.96 or Z ≥ 1.96.

F. Calculation of test statistic
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20/10}} = \frac{-3}{1.414} = -2.12$$
G. Statistical decision
We reject the Ho because Z = -2.12 is in the rejection region. The value is significant at the 5% level (α = 0.05).
H. Conclusion
=> Z calculated = -2.12 < Z tabulated = -1.96, so reject Ho.
Or, using the P-value:
=> A Z value of -2.12 corresponds to a tail area of 1 - 0.9830 = 0.0170. Since there are two parts to the rejection region in a two-tail test, the P-value is twice this: 0.0170 × 2 = 0.0340.
=> We conclude that µ is not 30, since P = 0.0340 < α = 0.05.

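The whole two-tailed example (test statistic, critical value, P-value, decision) can be reproduced in a few lines. This is a sketch under the example's assumptions of a normal population with known variance; the function name `one_sample_z_test` is illustrative. The code keeps more decimals than the Z table, so it gives z ≈ -2.1213 and P ≈ 0.0339 instead of the rounded -2.12 and 0.0340.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, variance, n, alpha=0.05):
    """Two-tailed z test of Ho: mu = mu0 when the population variance is known."""
    std_normal = NormalDist()
    z = (sample_mean - mu0) / sqrt(variance / n)
    critical = std_normal.inv_cdf(1.0 - alpha / 2.0)  # 1.96 for alpha = 0.05
    p_value = 2.0 * std_normal.cdf(-abs(z))           # both tails
    reject = abs(z) >= critical
    return z, critical, p_value, reject

z, critical, p_value, reject = one_sample_z_test(sample_mean=27, mu0=30, variance=20, n=10)
print(f"z = {z:.2f}, critical value = ±{critical:.2f}, P-value = {p_value:.4f}")
print("reject Ho" if reject else "do not reject Ho")
```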
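Test for single proportion
$$z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}}$$
where p̂ is the sample proportion, p is the hypothesized proportion, and q = 1 - p.
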
Example: When Gregor Mendel conducted his famous hybridization experiments with peas, one such experiment resulted in offspring consisting of 428 peas with green pods and 152 peas with yellow pods. According to Mendel’s theory, 1/4 of the offspring peas should have yellow pods. Use a 0.05 significance level with the P-value method to test the claim that the proportion of peas with yellow pods is equal to 1/4.

We note that n = 428 + 152 = 580, so p̂ = 152/580 = 0.262, and p = 0.25.

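H0: p = 0.25
H1: p ≠ 0.25
n = 580
α = 0.05
p̂ = 0.262
$$z = \frac{\hat{p} - p}{\sqrt{\frac{pq}{n}}} = \frac{0.262 - 0.25}{\sqrt{\frac{(0.25)(0.75)}{580}}} = 0.67$$
Since this is a two-tailed test, the P-value is twice the area to the right of the test statistic. Using Table A-2, the area to the right of z = 0.67 is 1 – 0.7486 = 0.2514, so the P-value is 2 × 0.2514 = 0.5028. Because 0.5028 > 0.05, we fail to reject H0: the data are consistent with the claim that 1/4 of the offspring peas have yellow pods.
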
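The Mendel example can be recomputed directly from the counts. This sketch uses the single-proportion z statistic with the hypothesized p in the standard error; the function name `one_proportion_z_test` is illustrative. Working from the unrounded p̂ = 152/580, the P-value comes out near 0.50, slightly different from the table-based figure obtained with z rounded to 0.67.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-tailed z test of Ho: p = p0 using the normal approximation."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1.0 - p0) / n)
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return p_hat, z, p_value

p_hat, z, p_value = one_proportion_z_test(successes=152, n=428 + 152, p0=0.25)
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, P-value = {p_value:.2f}")

alpha = 0.05
print("reject Ho" if p_value < alpha else "fail to reject Ho")
```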
Hypothesis Testing About a Single Mean - Example 1 (2 tailed)
• Ho: μ = 5000 (hypothesized value of population)
• Ha: μ ≠ 5000 (alternative hypothesis)
• n = 100
• X̄ = 4960
• σ = 250
• α = 0.05
Rejection rule: if |zcalc| > zα/2 then reject Ho.

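Carrying the numbers through the rejection rule, as a sketch with z(α/2) = 1.96 for α = 0.05:

$$z_{calc} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{4960 - 5000}{250/\sqrt{100}} = \frac{-40}{25} = -1.60$$

Since |z_calc| = 1.60 < 1.96, Ho is not rejected at the 5% level.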
Hypothesis Testing About a Single Mean - Example 2
• Ho: μ = 1000 (hypothesized value of population)
• Ha: μ ≠ 1000 (alternative hypothesis)
• n = 12
• X̄ = 1087.1
• s = 191.6
• α = 0.01
Rejection rule: if |tcalc| > t(df, α/2) then reject Ho.

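A sketch of the calculation with df = n - 1 = 11. The critical value t(11, 0.005) ≈ 3.106 is taken from a standard t table and is not given in the slides:

$$t_{calc} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{1087.1 - 1000}{191.6/\sqrt{12}} \approx \frac{87.1}{55.31} \approx 1.57$$

Since |t_calc| ≈ 1.57 < 3.106, Ho is not rejected at the 1% level.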
Hypothesis Testing About a Single Mean - Example 3 (1 tailed)
• Ho: μ ≥ 5000 (hypothesized value of population)
• Ha: μ < 5000 (alternative hypothesis)
• n = 50
• X̄ = 4970
• σ = 250
• α = 0.01
Rejection rule: if zcalc < -zα then reject Ho.

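A sketch of the calculation for this left-tailed test, using the one-sided critical value z(0.01) ≈ 2.33:

$$z_{calc} = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{4970 - 5000}{250/\sqrt{50}} \approx \frac{-30}{35.36} \approx -0.85$$

Since z_calc ≈ -0.85 is not below -2.33, Ho is not rejected at the 1% level.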
Hypothesis Testing of Proportion
• The quality control department of a light bulb company claims 95% of its products are defect free
• The CEO checks 225 bulbs and finds only 87% to be defect free
• Is the claim of 95% true at the .05 level of significance?
• So we have hypothesized values and sample values:
$$\hat{p} = 0.87,\quad \hat{q} = 0.13,\quad p_0 = 0.95,\quad q_0 = 0.05$$

Hypothesis Testing of Proportion
• The null hypothesis is Ho: p = 0.95
• The alternate hypothesis is Ha: p ≠ 0.95
• First, calculate the standard error of the proportion using the hypothesized values:
$$\sigma_{\hat{p}} = \sqrt{\frac{p_0 q_0}{n}} = \sqrt{\frac{0.95 \times 0.05}{225}} = 0.0145$$
• Since np and nq are large, we can use the Z table. The appropriate z value is 1.96

Hypothesis Testing of Proportion
• The limits of the acceptance region are
$$p_0 \pm 1.96\,\sigma_{\hat{p}} = 0.95 \pm (1.96 \times 0.0145) = (0.922,\ 0.978)$$
• Since the sample proportion of 0.87 does not fall within the acceptance region, the CEO should reject the quality control department’s claim

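The acceptance-region approach used in the light bulb example can be sketched as follows. The function name `acceptance_region` is illustrative; the inputs (p0 = 0.95, n = 225, sample proportion 0.87, α = 0.05) come from the example above.

```python
from math import sqrt
from statistics import NormalDist

def acceptance_region(p0, n, alpha=0.05):
    """Limits p0 +/- z_{alpha/2} * sqrt(p0 * q0 / n) for a two-tailed test of a proportion."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    standard_error = sqrt(p0 * (1.0 - p0) / n)
    return p0 - z * standard_error, p0 + z * standard_error

lower, upper = acceptance_region(p0=0.95, n=225)
sample_proportion = 0.87

# Reject Ho when the sample proportion falls outside the acceptance region.
reject = not (lower <= sample_proportion <= upper)
print(f"acceptance region: ({lower:.3f}, {upper:.3f})")
print("reject the claim" if reject else "do not reject the claim")
```

The same conclusion follows from the test statistic: (0.87 - 0.95)/0.0145 is about -5.5, far beyond -1.96.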
