Statistical Inference 417
MLB 417
LEARNING OUTCOMES
• After studying this chapter, the student will be able to:
• 1. Explain the importance and basic principles of estimation.
• 2. Calculate interval estimates for a variety of parameters.
• 3. Interpret a confidence interval.
• 4. Identify the basic properties and uses of the t distribution, chi-square distribution, and F distribution.
Introduction
• In most cases, means and variances are calculated from samples drawn from populations.
• These statistics serve as estimates of the corresponding
population parameters.
• These estimates are expected to differ by some amount from
the parameters they estimate.
• Estimation procedures take these differences into account,
thereby providing a foundation for statistical inference.
• The two basic areas of statistical inference are estimation and
hypothesis testing.
Introduction
• Statistical inference procedures can be used to reach
conclusions about the target population only when the target
population and the sampled population are the same.
• For example, to assess the effectiveness of a method for
treating arthritis, the target population will consist of all
patients suffering from the disease.
• It is not practical to draw a sample from this population.
• Instead, we may take all arthritis patients seen at some specific clinic as the sampled population and select a sample from them.
• Inferences about this sampled population may then be drawn on the basis of the information in the sample.
Definitions
• Statistical inference is the procedure by which we reach a
conclusion about a population on the basis of the
information contained in a sample drawn from that
population.
• It includes methods like point estimation, interval estimation
and hypothesis testing which are all based on probability
theory.
• In short, statistical inference is decision-making in the face of uncertainty.
• Estimation entails calculating, from the data of a sample, some statistic that is offered as an approximation of the corresponding parameter of the population from which the sample was drawn.
Definitions
• An estimate is a single value computed from a sample.
• A point estimate is a single numerical value used to estimate the
corresponding population parameter.
• An interval estimate consists of two numerical values defining a
range of values that, with a specified degree of confidence, most
likely includes the parameter being estimated.
• An estimator is the rule used to compute this value, or estimate.
• The sampled population is the population from which one actually
draws a sample.
• The target population is the population about which one wishes to make an inference.
Point and Interval Estimates
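• A point estimate is a single number; a confidence interval provides additional information about the variability of the estimate.
[Figure: a confidence interval on a number line, showing the lower confidence limit, the point estimate, and the upper confidence limit; the distance between the two limits is the width of the confidence interval.]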
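Point Estimators – Sample values are most commonly used
• Sample mean $\bar{y}$ estimates the population mean $\mu$
• Sample standard deviation $\hat{\sigma} = s = \sqrt{\frac{\sum_i (y_i - \bar{y})^2}{n-1}}$ estimates the population standard deviation $\sigma$
• Sample proportion $\hat{p}$ estimates the population proportion $p$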
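As an illustration, the short Python sketch below computes the three point estimators on a small made-up sample (the values in `y` and `has_trait` are hypothetical). It uses the n − 1 divisor for s; the standard-library `statistics.stdev` uses the same divisor, so the two results should agree.

```python
import math
import statistics

# Hypothetical sample of 8 measurements (illustration only)
y = [12.1, 9.8, 11.4, 10.9, 12.6, 10.2, 11.8, 9.9]

n = len(y)
y_bar = sum(y) / n                                            # estimates mu
s = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))   # estimates sigma (n - 1 divisor)

# statistics.stdev also divides by n - 1, so it should agree with s
assert math.isclose(s, statistics.stdev(y))

# Hypothetical binary outcome: 1 = has the characteristic, 0 = does not
has_trait = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]
p_hat = sum(has_trait) / len(has_trait)                       # estimates p

print(f"y_bar = {y_bar:.4f}, s = {s:.3f}, p_hat = {p_hat:.2f}")
```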
Confidence Intervals
Confidence Interval Estimate
General Formula
• The general formula for all confidence intervals is:
Point Estimate ± (Critical Value)(Standard Error)
where:
• Point Estimate is the sample statistic estimating the population parameter of interest;
• Critical Value is a table value based on the sampling distribution of the point estimate and the desired confidence level;
• Standard Error is the standard deviation of the sampling distribution of the point estimate.
Confidence Intervals
• Population Mean
  • σ Known
  • σ Unknown
• Population Proportion
Confidence Interval for 𝜇
(𝜎 known)
• Assumptions
• Population standard deviation σ is
known
• Population is normally distributed
• If population is not normal, use large
sample
Confidence Interval for 𝜇
(𝜎 known)
• Confidence interval estimate:
$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Finding the Critical Value, Zα/2
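[Figure: standard normal curve for 95% confidence, with area α/2 = 0.025 in each tail beyond the critical values ±Zα/2 = ±1.96.]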
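The critical value Zα/2 is the standard normal quantile with area α/2 in the upper tail. A minimal sketch using Python's standard-library `statistics.NormalDist` (the helper name `z_critical` is our own, not a library function):

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Return Z_{alpha/2}: the standard normal value with area alpha/2 to its right."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: Z_alpha/2 = {z_critical(level):.3f}")
```

For 90%, 95%, and 99% confidence this gives approximately 1.645, 1.960, and 2.576.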
Solution
• $SE(\bar{x}) = \frac{24.00}{\sqrt{31}} = 4.31$
• $84.65 \pm (1.96)(4.31) = (76.2, 93.1)$
• We are 95% confident that the true mean saturation level lies between 76.2 and 93.1.
• Although this particular interval may or may not contain the true mean, 95% of intervals formed in this manner will contain it.
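The sketch below reproduces the interval above and then checks the "95% of intervals" interpretation by simulation. The population mean 85.0 used in the simulation, the random seed, and the number of repetitions are assumptions chosen only for illustration, and the helper `z_interval` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def z_interval(x_bar, sigma, n, confidence=0.95):
    """CI for mu when sigma is known: x_bar +/- Z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sigma / math.sqrt(n)
    return x_bar - z * se, x_bar + z * se

# Values from the solution above: x_bar = 84.65, sigma = 24.00, n = 31
lo, hi = z_interval(84.65, 24.00, 31)
print(f"95% CI: ({lo:.1f}, {hi:.1f})")

# Coverage check: repeatedly sample a known normal population (hypothetical
# parameters) and count how often the interval captures the true mean.
random.seed(1)
true_mu, true_sigma, n, reps = 85.0, 24.0, 31, 2000
hits = 0
for _ in range(reps):
    sample = [random.gauss(true_mu, true_sigma) for _ in range(n)]
    a, b = z_interval(fmean(sample), true_sigma, n)
    hits += a <= true_mu <= b        # True counts as 1
print(f"Simulated coverage: {hits / reps:.3f}")
```

The first line should print (76.2, 93.1); the simulated coverage should come out close to 0.95, varying slightly with the seed.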
Confidence Interval for 𝜇 (𝜎 Unknown)
• If the population standard deviation σ is unknown, the sample standard deviation S can be substituted for it, provided the population is normally distributed (or, if it is not, the sample is large).
• This introduces extra uncertainty, since S varies from sample to sample.
• So the t distribution with n − 1 degrees of freedom is used instead of the normal distribution.
• The degrees of freedom measure the amount of information available in the data that can be used to estimate σ²; hence they measure the reliability of s² as an estimate of σ².
Confidence Interval for 𝜇 (𝜎 Unknown)
• Confidence interval estimate: $\bar{X} \pm t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$
Student’s t Distribution
• Note: t → Z as n increases.
• t distributions are bell-shaped and symmetric, but have ‘fatter’ tails than the normal.
[Figure: density curves of the standard normal (t with df = ∞), t (df = 13), and t (df = 5), all centered at 0.]
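To see the ‘fatter’ tails numerically, this sketch evaluates the standard t density, f(x) = Γ((ν+1)/2) / (√(νπ) Γ(ν/2)) · (1 + x²/ν)^(−(ν+1)/2), alongside the standard normal density. `math.lgamma` is used instead of `math.gamma` so that large ν does not overflow; the function name `t_pdf` is our own.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    # Work on the log scale so gamma() does not overflow for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

normal = NormalDist()
for x in (0.0, 3.0):
    print(f"x = {x}: t(5) = {t_pdf(x, 5):.4f}, t(13) = {t_pdf(x, 13):.4f}, "
          f"normal = {normal.pdf(x):.4f}")
```

At x = 0 the t curves sit slightly below the normal peak; at x = 3 the t(5) density is roughly four times the normal density, with t(13) in between.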
Example
Example of t distribution confidence interval
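• With n = 25, $\bar{X} = 50$, S = 8 and $t_{\alpha/2} = 2.0639$ (95% confidence, n − 1 = 24 degrees of freedom):
$\bar{X} \pm t_{\alpha/2}\frac{S}{\sqrt{n}} = 50 \pm (2.0639)\frac{8}{\sqrt{25}} = 50 \pm 3.302$
$46.698 \le \mu \le 53.302$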
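A minimal Python sketch of the t interval. The critical value is passed in from a t table rather than computed, because the standard library has no t quantile function; if SciPy is available, `scipy.stats.t.ppf(0.975, 24)` should return approximately the same 2.0639. The helper name `t_interval` is hypothetical.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """CI for mu when sigma is unknown: x_bar +/- t_{alpha/2, n-1} * s / sqrt(n).

    t_crit must be looked up for n - 1 degrees of freedom.
    """
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Example values: x_bar = 50, s = 8, n = 25; 2.0639 is the tabled t for 24 df
# with 0.025 in the upper tail (95% confidence).
lo, hi = t_interval(50, 8, 25, 2.0639)
print(f"{lo:.3f} <= mu <= {hi:.3f}")
```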
Confidence Interval for a Proportion
• Estimating a population proportion p is similar to estimating a population mean.
• A sample is drawn from the population of interest, and the sample proportion $\hat{p}$ is computed. This sample proportion is used as the point estimator of the population proportion. A confidence interval is obtained by the general formula
• estimator ± (reliability coefficient) × (standard error of the estimator)
The standard error of the sample proportion is $\sqrt{\hat{p}(1-\hat{p})/n}$.
The 100(1 − α) percent confidence interval for p is given by
$\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
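A sketch of this large-sample interval in Python, applied to the numbers of Example 2 below ($\hat{p}$ = .18, n = 1220). The function name `proportion_ci` is hypothetical; it uses the unrounded z ≈ 1.95996 rather than 1.96, which does not change the result at three decimals.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Large-sample CI for p: p_hat +/- z_{1-alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

lo, hi = proportion_ci(0.18, 1220)
print(f"({lo:.3f}, {hi:.3f})")
```

This should print (0.158, 0.202).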
Example 1:
$se = \sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.213(0.787)/164} = 0.032$
Example 2
Of a sample of 1220 adult Internet users, 18 percent had used the Internet to search for information regarding medicines. Construct a 95 percent confidence interval for the proportion of Internet users in the sampled population who have searched for information on medicines.
Solution
$\hat{p} = .18$, and the reliability coefficient corresponding to a confidence level of .95 is 1.96.
The estimated standard error is $\sqrt{(.18)(.82)/1220} = .0110$.
The 95 percent confidence interval for p, based on these data, is
$.18 \pm 1.96(.0110) = .18 \pm .022$, that is, (.158, .202).
Example 2 cont.
We are 95 percent confident that the population proportion p is
between .158 and .202.
Thus, we expect, with 95 percent confidence, that somewhere between 15.8 percent and 20.2 percent of adult Internet users have used the Internet to search for information on medicines.
Example 3:
Example 3 (cont.)
• Solution
• Point estimate: $\hat{p} = \frac{28}{5000} = 0.0056$
• $SE(\hat{p}) = \sqrt{\frac{0.0056(1-0.0056)}{5000}} = 0.0011$
• 95% confidence interval: $0.0056 \pm 1.96(0.0011) = (0.0034, 0.0078)$
Exercise
• 1. A researcher found that of 472 mechanically ventilated patients,
63 had clinical evidence of ventilator-associated pneumonia.
Construct a 95 percent confidence interval for the proportion of all
mechanically ventilated patients at these hospitals who may be
expected to develop the disease.
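Determining Sample Size
• The margin of error e is the amount added to and subtracted from the point estimate in $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, so $e = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.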
Determining Sample Size - Cochran’s formula
(continued)
Determining Sample Size for the Mean
• Starting from $e = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, solve for n to get
$n = \frac{Z_{\alpha/2}^{2}\,\sigma^{2}}{e^{2}}$
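The formula translates directly into code. In this sketch (the helper name `sample_size_mean` is hypothetical) the result is rounded up with `math.ceil`, since rounding down would give a margin of error slightly larger than e. The two calls use the inputs of the sample-size examples that follow.

```python
import math

def sample_size_mean(z, sigma, e):
    """n = z^2 * sigma^2 / e^2, rounded up to the next whole subject."""
    return math.ceil((z * sigma / e) ** 2)

n_protein = sample_size_mean(1.96, 20, 5)    # 61.47 -> 62
n_second = sample_size_mean(1.645, 45, 5)    # 219.19 -> 220
print(n_protein, n_second)
```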
Determining Sample Size
• Thus, determining the sample size requires knowledge of σ², which is usually unknown. It has to be estimated:
• 1. from a pilot sample, using the sample standard deviation S, or
• 2. from previous or similar studies.
Example 1
• A biostatistician is asked to advise on the size of the sample to be taken in a survey of a population of teenage girls to determine their average daily protein intake (measured in grams).
• How should he provide this assistance?
• Three items of information are needed:
• (1) the desired width of the confidence interval,
• (2) the level of confidence desired, and
• (3) the magnitude of the population variance.
Example 1 (cont.)
• Assume an interval about 10 grams wide is desired (within 5 grams of the population mean in either direction, i.e., a margin of error of 5 grams).
• Also assume that a confidence coefficient of .95 is chosen and that, from past experience, the population standard deviation is probably about 20 grams.
• Thus, z = 1.96, σ = 20, e = 5, so
$n = \frac{(1.96)^2(20)^2}{5^2} = 61.47$
• A sample of size 62 is advised (n is always rounded up).
Sample Size Example 2
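• With 90% confidence (Z = 1.645), σ = 45 and e = 5:
$n = \frac{Z^2\sigma^2}{e^2} = \frac{(1.645)^2(45)^2}{5^2} = 219.19$, which is rounded up to n = 220.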
Exercise
• 1. To estimate the mean weight of babies born in her hospital, how large a sample of birth records should a researcher take if a 99 percent confidence interval 1 pound wide is desired? Assume that a reasonable estimate of the standard deviation is 1 pound.
• What sample size is required if the confidence coefficient is lowered to .95?
• 2. In order to estimate the mean age of persons bitten by dogs, a sample is to be drawn from the department’s records of reported dog bites. A 95 percent confidence interval is desired, a margin of error of 2.5 years is acceptable, and previous studies estimate the population standard deviation to be about 15 years.
• How large a sample should be drawn?
Determination Of Sample Size For Estimating Proportions -
Cochran’s formula
• This is essentially the same as that described for estimating a
population mean.
$n = \frac{z^2 pq}{e^2}$, where q = 1 − p
• p is the proportion in the population possessing the characteristic of interest. Since it is unknown, a pilot sample is taken and the estimate computed from it is used in place of p.
• If it is impossible to come up with a better estimate, one may set p equal to .5 and solve for n; because pq is largest when p = .5, this gives the most conservative (largest) sample size.
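A matching sketch for proportions (the helper name `sample_size_proportion` is hypothetical). The loop illustrates why p = .5 is the conservative choice: pq, and hence n, is largest there. For e = 0.03 and p = .5 the unrounded value is about 1067.1, so rounding up gives 1068.

```python
import math

def sample_size_proportion(z, p, e):
    """Cochran: n = z^2 * p * q / e^2 with q = 1 - p, rounded up."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / e ** 2)

# Same z and margin of error, different assumed values of p
for p in (0.1, 0.3, 0.5):
    print(p, sample_size_proportion(1.96, p, 0.03))
```

This prints 385, 897, and 1068 for p = 0.1, 0.3, and 0.5.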
Example
1. How large a sample do we need to estimate a population proportion to within 0.03 with probability 0.95? Assume the population proportion is 50%.
Solution
$n = \frac{(1.96)^2(0.50)(0.50)}{0.03^2} = \frac{0.9604}{0.0009} = 1067.1$, which is rounded up to n = 1068.
Practice 2
• An administrator at Alpha-Beta clinic wishes to know what proportion of discharged patients are unhappy with the care received during hospitalization. How large a sample should be drawn if the margin of error is 0.05, the confidence coefficient is .95, and no other information is available?
• How large should the sample be if p is approximated by 0.25?
Yamane’s or Slovin's formula
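Yamane’s (also called Slovin’s) formula is commonly written n = N / (1 + N e²), where N is the population size and e the desired precision level; it assumes a 95% confidence level and P = .5. A minimal sketch (the function name `yamane` is our own):

```python
def yamane(N, e):
    """Yamane/Slovin sample size for a finite population of size N at precision e."""
    return N / (1 + N * e ** 2)

for N in (100, 500, 1000, 10000):
    print(N, round(yamane(N, 0.05), 1), round(yamane(N, 0.10), 1))
```

These unrounded values (80.0, 222.2, 285.7, 384.6 at ±5%; 50.0, 83.3, 90.9, 99.0 at ±10%) agree with the entries of the published table that follows to within one unit; the table's exact rounding convention is not stated.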
Using Published Tables
• Table 1. Sample Size for ±5% and ±10% Precision Levels where Confidence Level is 95% and P = 0.5

| Size of population | Sample size (n), ±5% | Sample size (n), ±10% |
|---|---|---|
| 100 | 81 | 51 |
| 125 | 96 | 56 |
| 150 | 110 | 61 |
| 200 | 134 | 67 |
| 250 | 154 | 72 |
| 300 | 172 | 76 |
| 350 | 187 | 78 |
| 400 | 201 | 81 |
| 450 | 212 | 82 |
| 500 | 222 | 83 |
| 1000 | 286 | 91 |
| 2000 | 333 | 95 |
| 3000 | 353 | 97 |
| 4000 | 364 | 98 |
| 5000 | 370 | 98 |
| 7000 | 378 | 99 |
| 9000 | 383 | 99 |
| 10000 | 385 | 99 |
HYPOTHESIS TESTING
• Overview
• Basics of Hypothesis Testing
• Key Concepts in Hypothesis Testing
• Testing a Claim About a Mean: σ Known
• Testing a Claim About a Mean: σ Not Known
HYPOTHESIS TESTING
• Overview
• Hypothesis testing is the second of two general areas of
statistical inference. The main goal in many research studies
is to check whether the data collected support certain
statements or predictions.
• A hypothesis test involves collecting data from a sample and
evaluating the data. Then, a decision is made as to whether or
not there is sufficient evidence, based upon analyses of the
data, to reject the null hypothesis.
• Hypothesis testing consists of two contradictory hypotheses or statements, a decision based on the data, and a conclusion.
Basic Concepts
• A hypothesis may be defined simply as a statement about one or more populations.
• It is frequently concerned with the parameters of the
populations about which the statement is made.
• A hospital administrator may hypothesize that the average
length of stay of patients admitted to the hospital is 5 days;
• A public health nurse may hypothesize that a particular
educational program will result in improved communication
between nurse and patient;
• A physician may hypothesize that a certain drug will be effective in 90 percent of the cases for which it is used.
Basic Concepts
• The null hypothesis, H₀, is a statement that the value of a population parameter (such as a proportion, mean, or standard deviation) is equal to some claimed value.
• The null hypothesis states that the “null” condition exists; that is, there is nothing new happening, the old theory is still true, the old standard is correct, and the system is in control.
• The alternative hypothesis, H₁ or Hₐ, is the statement that the parameter has a value that somehow differs from the null hypothesis. It states that the new theory is true, there are new standards, the system is out of control, and/or something is happening.
Basic Concepts
• NOTE: new hypotheses that researchers want to “prove” are stated in
the alternative hypothesis.
• The alternative hypothesis must use one of these symbols: ≠, <, >.
Identifying Null and Alternative Hypotheses
• 1. In 2013, 70% of 18-year-old Ghanaians participated in volunteer work.
• i) A researcher believes that this percentage is different today.
• ii) A researcher believes that this percentage is lower today than in 2013.
• iii) A researcher believes that this percentage is higher today than in 2013.
Basic Concepts - Solution
• i) H₀: p = 0.70; the percentage of 18-year-old Ghanaians who participate in volunteer work is still 70% today, as it was in 2013.
• H₁: p ≠ 0.70; the percentage of 18-year-old Ghanaians who participate in volunteer work today is different from 70%.
Solution
• a) The proportion of drivers who admit to running red lights is greater than 0.5.
• H₀: p = 0.5
• H₁: p > 0.5
• b) The mean height of professional basketball players is at most 7 ft.
• H₀: µ = 7 (the claim µ ≤ 7 contains equality, so it belongs with the null hypothesis)
• H₁: µ > 7
• c) The standard deviation of IQ scores of actors is equal to 15.
• H₀: σ = 15
• H₁: σ ≠ 15
Test Statistic
• The test statistic is a value used in making a decision about
the null hypothesis, and is found by converting the sample
statistic to a score with the assumption that the null
hypothesis is true.
Critical Value
Right-tailed Test and Left-tailed Test
Significance Level
• The term level of significance reflects the fact that hypothesis tests are sometimes called significance tests, and a computed value of the test statistic that falls in the rejection region is said to be significant.
• The level of significance, α, specifies the area under the curve of the distribution of the test statistic that lies above the values on the horizontal axis constituting the rejection region.
• The most frequently encountered values of α are .01, .05, and .10.
Types of Errors
• 1. Type I error: the error committed when a true null hypothesis is rejected. Its probability is α, the level of significance.
• 2. Type II error: the error committed when a false null hypothesis is not rejected.
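Because α is the probability of rejecting a true H₀, a simulation in which H₀ is true should reject about 100α percent of the time. In this sketch the null mean 30, σ = 5, n = 20, the seed, and the number of repetitions are all assumed values for illustration, and a two-tailed z test is used.

```python
import math
import random
from statistics import NormalDist, fmean

# Hypothetical setting in which H0: mu = 30 is TRUE and sigma is known
random.seed(7)
mu0, sigma, n, alpha, reps = 30.0, 5.0, 20, 0.05, 4000
z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # two-tailed critical value

rejections = 0
for _ in range(reps):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
    rejections += abs(z) > z_crit                 # every rejection here is a Type I error

type1_rate = rejections / reps
print(f"Estimated P(Type I error) = {type1_rate:.3f} (alpha = {alpha})")
```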
HYPOTHESIS TESTING
• To perform a hypothesis test:
• 1. Set up two contradictory hypotheses (null hypothesis and
alternative hypothesis)
• 2. Collect sample data
• 3. Determine the correct distribution to perform the
hypothesis test.
• 4. Compute the test statistic:
$\text{test statistic} = \frac{\text{relevant statistic} - \text{hypothesized parameter}}{\text{standard error of the relevant statistic}}$
• Statistical decision
• Reject the null hypothesis, since −2.12 is in the rejection region; that is, the computed value of the test statistic is significant at the .05 level.
• Conclusion: the population mean is not equal to 30.
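For the decision above, a short sketch of the two-tailed z test at α = .05, assuming (as the conclusion "not equal to 30" suggests) that the test is two-sided. Only the computed statistic −2.12 is taken from the example.

```python
from statistics import NormalDist

z = -2.12            # computed test statistic from the example
alpha = 0.05
std_normal = NormalDist()

z_crit = std_normal.inv_cdf(1 - alpha / 2)      # about 1.96 for a two-tailed test
p_value = 2 * (1 - std_normal.cdf(abs(z)))      # two-tailed p-value

print(f"critical values: +/-{z_crit:.2f}, p-value = {p_value:.4f}")
print("reject H0" if abs(z) > z_crit else "do not reject H0")
```

The p-value comes out at about 0.034, below α = .05, which agrees with the decision to reject H₀.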
HYPOTHESIS TESTING
• Sampling from Normally Distributed Populations with
Population Variance unknown
• $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$, here with $\bar{x} = 98.217$ and $\mu_0 = 98.6$
WHEN TO USE ANOVA
• ANOVA is used in applications such as the following:
• 1. To determine whether there is sufficient evidence to support the claim that three groups have different mean blood pressure levels, when group 1 is treated with 2 tablets of aspirin each day, group 2 with 1 tablet each day, and group 3 with a placebo.
• 2. To test the claim that cereals on different supermarket shelves have the same mean sugar content, since it is believed that supermarkets place high-sugar cereals on shelves at eye level for children.
Assumptions of one way ANOVA
• Populations are normally distributed
• The data are randomly sampled and independently chosen from the populations.
• The populations have equal variances (the variances of the samples are assumed equal).
• It is based on a comparison of two different estimates of the common population variance: the variance among (between) samples and the variance within samples.
• It is called one-way because the sample data are separated into groups according to one characteristic, or factor.
One-Way ANOVA
• Hypotheses
• H₀: μ₁ = μ₂ = μ₃ = … = μₖ
• All population means are equal
• i.e., no factor effect (no variation in means among groups)
• H₁: Not all of the population means are the same
• At least one population mean is different
• i.e., there is a factor effect
• This does not mean that all population means are different (some pairs may be the same).
One-Way ANOVA
• ANOVA analyzes the variance among values.
• It measures variation by summing the squared differences between each value and the mean; this is called the sum of squares.
• When the data come from several groups, the variation has two components:
• 1. variation from differences among the group means (between-groups sum of squares)
• 2. variation from differences among the subjects within each group (within-groups sum of squares)
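The two components can be computed directly from raw data. This sketch uses three small hypothetical groups; `one_way_anova` is our own helper, not a library function (SciPy's `scipy.stats.f_oneway` performs the same test if it is available).

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (SSA, SSW, df_between, df_within, F) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Between-groups SS: group size times squared distance of group mean from grand mean
    ssa = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: squared distance of each value from its own group mean
    ssw = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n - k
    f_stat = (ssa / df_b) / (ssw / df_w)      # F = MSA / MSW
    return ssa, ssw, df_b, df_w, f_stat

# Hypothetical data for three groups
groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
ssa, ssw, df_b, df_w, f_stat = one_way_anova(groups)
print(f"SSA = {ssa:.2f}, SSW = {ssw:.2f}, df = ({df_b}, {df_w}), F = {f_stat:.2f}")
```

For these data SSA = 38, SSW = 6, and F = (38/2)/(6/6) = 19.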
Computing a one-way ANOVA
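• Here is the basic one-way ANOVA table:

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between (Among) | SSA | k − 1 | MSA = SSA/(k − 1) | F = MSA/MSW |
| Within | SSW | n − k | MSW = SSW/(n − k) | |
| Total | SST | n − 1 | | |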
Class exercise
• A study was conducted to test whether cigarette smoking is associated with reduced serum levels in men aged 35 to 45. The outcome is as follows:

| Source | SS | df | MS | F |
|---|---|---|---|---|
| Between (Among) | 0.7248 | 3 | 0.2416 | 9.152 |
| Within | 8.1516 | 309 | 0.0264 | |

a) What is the null hypothesis?
b) What is the alternative hypothesis?
c) Identify the value of the test statistic.
d) Find the critical value for a 0.05 significance level.
e) How many groups are there?
f) Based on the preceding results, what do you conclude about the equality of the population means?
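As a check on the table's arithmetic, the sketch below recomputes the mean squares and F from the SS and df columns. Using the unrounded MSW gives F ≈ 9.158, slightly above the tabled 9.152, which was obtained by dividing by the rounded 0.0264. If SciPy is available, `scipy.stats.f.ppf(0.95, 3, 309)` returns the critical value asked for in part d.

```python
# SS and df copied from the class-exercise table
ss_between, df_between = 0.7248, 3
ss_within, df_within = 8.1516, 309

ms_between = ss_between / df_between   # MSA = SSA / (k - 1)
ms_within = ss_within / df_within      # MSW = SSW / (n - k)
f_stat = ms_between / ms_within        # F = MSA / MSW

print(f"MSA = {ms_between:.4f}, MSW = {ms_within:.4f}, F = {f_stat:.3f}")
```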