
Statistical Inference 417

Good for students

Uploaded by

maxwell amponsah
Copyright © All Rights Reserved

Statistical Inference

MLB 417
LEARNING OUTCOMES
• After studying this chapter, the student will be able to:
• 1. Explain the importance and basic principles of
estimation.
• 2. Calculate interval estimates for a variety of
parameters.
• 3. Interpret a confidence interval.
• 4. Identify the basic properties and uses of the t
distribution, chi-square distribution, and F distribution.
Introduction
• Means and variances in most cases are calculated from
samples drawn from populations.
• These statistics serve as estimates of the corresponding
population parameters.
• These estimates are expected to differ by some amount from
the parameters they estimate.
• Estimation procedures take these differences into account,
thereby providing a foundation for statistical inference.
• The two basic areas of statistical inference are estimation and
hypothesis testing.
Introduction
• Statistical inference procedures can be used to reach
conclusions about the target population only when the target
population and the sampled population are the same.
• For example, to assess the effectiveness of a method for
treating arthritis, the target population will consist of all
patients suffering from the disease.
• It is not practical to draw a sample from this population.
• Instead, one might take all arthritis patients seen in some
specific clinic as the sampled population and select a sample from it.
• Inferences about this sampled population may be drawn on
the basis of the information in the sample.
Definitions
• Statistical inference is the procedure by which we reach a
conclusion about a population on the basis of the
information contained in a sample drawn from that
population.
• It includes methods like point estimation, interval estimation
and hypothesis testing which are all based on probability
theory.
• It is a way of making decisions in the face of uncertainty.
• Estimation entails calculating, from the data of a sample,
some statistic that is offered as an approximation of the
corresponding parameter of the population from which the
sample was drawn.
Definitions
• An estimate is a single value computed from a sample.
• A point estimate is a single numerical value used to estimate the
corresponding population parameter.
• An interval estimate consists of two numerical values defining a
range of values that, with a specified degree of confidence, most
likely includes the parameter being estimated.
• An estimator is the rule used to compute this value, or estimate.
• The sampled population is the population from which one actually
draws a sample.
• The target population is the population about which one wishes to
make an inference.
Point and Interval Estimates
• A point estimate is a single number;
• a confidence interval provides additional
information about the variability of the
estimate.

[Figure: a number line showing the lower confidence limit, the point estimate, and the upper confidence limit; the distance between the two limits is the width of the confidence interval.]
Point Estimators – Most common to use sample values

• Sample mean estimates population mean μ

$\hat{\mu} = \bar{y} = \frac{\sum y_i}{n}$

• Sample std. dev. estimates population std. dev. σ

$\hat{\sigma} = s = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}}$

• Sample proportion $\hat{\pi}$ estimates population
proportion π
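The three point estimators can be checked with Python's standard `statistics` module. This is a minimal sketch: the helper name `point_estimates` is our own, and the data simply reuse the 17 weight changes of Example 3 and the 35-of-164 count from the "very happy" example later in the chapter.

```python
import statistics

def point_estimates(values):
    """Return the sample mean and the sample standard deviation (n - 1 divisor)."""
    mean = statistics.mean(values)   # point estimate of the population mean
    sd = statistics.stdev(values)    # stdev() divides by n - 1, matching s above
    return mean, sd

# Weight changes from Example 3, used here only as illustrative data
weight_change = [11.4, 11.0, 5.5, 9.4, 13.6, -2.9, -0.1, 7.4, 21.5,
                 -5.3, -3.8, 13.4, 13.1, 9.0, 3.9, 5.7, 10.7]
mean, sd = point_estimates(weight_change)

# Sample proportion: 35 "very happy" respondents out of 164
p_hat = 35 / 164

print(f"mean = {mean:.3f}, s = {sd:.3f}, p_hat = {p_hat:.3f}")
```

Note that `statistics.stdev` (not `pstdev`) is the one that matches the n − 1 divisor in the formula for s.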
Confidence Intervals

• A confidence interval (CI) is an interval of numbers
believed to contain the parameter value.

• The probability that the method produces an interval that
contains the parameter is called the confidence level.
Most studies use a confidence level close to 1, such
as 0.95 or 0.99.
Confidence Interval Estimate

• An interval gives a range of values:
• Takes into consideration variation in sample statistics from
sample to sample
• Based on observations from 1 sample
• Gives information about closeness to unknown population
parameters
• Stated in terms of level of confidence
• e.g. 95% confident, 99% confident
• Can never be 100% confident
Confidence Interval

• In practice you only take one sample of size n
• In practice you do not know µ, so you do not know if the
interval actually contains µ
• However, you do know that 95% of the intervals formed in
this manner will contain µ
• Thus, based on the one sample you actually selected, you can
be 95% confident that your interval contains µ (this is a 95%
confidence interval)
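The "95% of the intervals" statement can be illustrated by simulation. The sketch below assumes a hypothetical normal population with μ = 100 and σ = 15 (σ known), draws many samples of size n = 30, forms X̄ ± 1.96σ/√n for each, and reports the fraction of intervals that contain μ. The function name `coverage` and all parameter values are illustrative choices, not part of these notes.

```python
import math
import random
import statistics

def coverage(mu=100.0, sigma=15.0, n=30, z=1.96, reps=10_000, seed=1):
    """Fraction of z-intervals (sigma known) that contain the true mean mu."""
    rng = random.Random(seed)                # fixed seed so the run is repeatable
    half_width = z * sigma / math.sqrt(n)    # margin of error, the same for every sample
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

print(coverage())   # should be close to 0.95
```

Any single interval either contains μ or it does not; it is the long-run proportion over repeated samples that is close to 0.95.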
General Formula
• The general formula for all confidence
intervals is:
Point Estimate ± (Critical Value)(Standard Error)
Where:
• Point Estimate is the sample statistic estimating the population
parameter of interest
• Critical Value is a table value based on the sampling distribution
of the point estimate and the desired confidence level
• Standard Error is the standard deviation of the sampling
distribution of the point estimate
Confidence Intervals
[Figure: tree diagram. Confidence intervals are divided into those for a population mean (σ known or σ unknown) and those for a population proportion.]
Confidence Interval for 𝜇
(𝜎 known)

• Assumptions
• Population standard deviation σ is
known
• Population is normally distributed
• If population is not normal, use large
sample

Confidence Interval for 𝜇
(𝜎 known)
• Confidence interval estimate:

$\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$

where X̄ is the point estimate,
Zα/2 is the normal distribution critical value for a
probability of α/2 in each tail, and
σ/√n is the standard error
Finding the Critical Value, Zα/2

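• Consider a 95% confidence interval:

1 − α = 0.95, so α = 0.05 and α/2 = 0.025 in each tail

[Figure: standard normal curve with an area of 0.025 in each tail. In Z units the cut-offs are −Zα/2 = −1.96 and Zα/2 = 1.96 around 0; in X units they correspond to the lower and upper confidence limits on either side of the point estimate.]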
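Instead of reading Zα/2 from a normal table, it can be computed from the inverse of the standard normal CDF. A small sketch using `statistics.NormalDist` from Python's standard library (the function name `z_critical` is our own):

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    # Leave an area of alpha/2 in the upper tail
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

This reproduces the familiar table values 1.645, 1.96 and 2.576.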
Example 1
• Data on percentage saturation of bile for 31 male
patients are as follows: x̄ = 84.65, s = 24.00
• Find the 95% confidence interval for the mean.
Solution

• SE(x̄) = s/√n = 24.00/√31 = 4.31
• 84.65 ± (1.96)(4.31) = (76.2; 93.1)
• We are 95% confident that the true mean saturation
level is between 76.2 and 93.1
• Although the true mean may or may not be in this
interval, 95% of intervals formed in this manner will
contain the true mean
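The arithmetic of this example can be reproduced in a few lines. As in the worked solution, the sample standard deviation 24.00 stands in for σ (reasonable here because n = 31 is fairly large); `z_interval` is a hypothetical helper name.

```python
import math

def z_interval(xbar, sd, n, z=1.96):
    """Confidence interval xbar ± z * sd / sqrt(n); also returns the standard error."""
    se = sd / math.sqrt(n)   # standard error of the mean
    margin = z * se          # margin of error
    return xbar - margin, xbar + margin, se

low, high, se = z_interval(84.65, 24.00, 31)
print(f"SE = {se:.2f}, 95% CI = ({low:.1f}, {high:.1f})")
```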
Confidence Interval for 𝜇 (𝜎 Unknown)
• If the population standard deviation σ is unknown, the sample
standard deviation, S, can be substituted for σ, provided the
population is normally distributed (or, if it is not, the sample is large).
• This introduces extra uncertainty, since S varies from sample
to sample.
• So the t distribution with n − 1 degrees of freedom is used instead of
the normal distribution.
• The degrees of freedom measure the amount of information
available in the data that can be used to estimate σ²; hence, they
measure the reliability of s² as an estimate of σ².
Confidence Interval for 𝜇
(𝜎 Unknown)

• Confidence Interval Estimate:

$\bar{X} \pm t_{\alpha/2}\frac{S}{\sqrt{n}}$

(where tα/2 is the critical value of the t distribution
with n − 1 degrees of freedom and an area of α/2
in each tail)
Student’s t Distribution
• The t is a family of distributions
• The tα/2 value depends on degrees of freedom
(d.f.)
• Number of observations that are free to vary
after the sample mean has been calculated
• d.f. = n - 1
Student’s t Distribution
Note: t → Z as n increases

[Figure: the standard normal curve (t with df = ∞) compared with t curves for df = 13 and df = 5.]

t-distributions are bell-shaped and symmetric, but
have ‘fatter’ tails than the normal.
Example

A random sample of n = 25 has X̄ = 50 and
S = 8. Form a 95% confidence interval for μ.

• d.f. = n – 1 = 24, so tα/2 = t0.025 = 2.0639

The confidence interval is
Example of t distribution confidence interval

$\bar{X} \pm t_{\alpha/2}\frac{S}{\sqrt{n}} = 50 \pm (2.0639)\frac{8}{\sqrt{25}}$

46.698 ≤ μ ≤ 53.302
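A sketch of the same calculation in Python. The standard library has no t-distribution quantile function, so the critical value is passed in from a t table (2.0639 for 24 d.f.); if SciPy is installed, `scipy.stats.t.ppf(0.975, 24)` should return essentially the same number. `t_interval` is our own helper name.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Confidence interval xbar ± t_crit * s / sqrt(n); t_crit comes from a t table with n - 1 d.f."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

low, high = t_interval(50, 8, 25, t_crit=2.0639)   # d.f. = 24, 95% confidence
print(f"{low:.3f} <= mu <= {high:.3f}")
```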
Confidence Interval for a Proportion
• Estimating a population proportion, p, is similar to
estimating a population mean.
• A sample is drawn from the population of interest, and the
sample proportion, p̂, is computed. This sample proportion is
used as the point estimator of the population proportion. A
confidence interval is obtained by the general formula
• estimator ± (reliability coefficient) × (standard error of the
estimator)
The standard error of the sample proportion is √(p̂(1 – p̂)/n).
The 100(1 – α) percent confidence interval for p is given by

$\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$
Example 1:

What percentage of 18-22 year-old Ghanaians report being
“very happy”?
Recent GSS data: 35 of n = 164 say they are “very happy”
(others report being “pretty happy” or “not too happy”)
p̂ = 35/164 = .213

$se = \sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.213(0.787)/164} = 0.032$
Example 1:

95% CI is 0.213 ± 1.96(0.032), or 0.213 ± 0.063,
(i.e., “margin of error” = 0.063)
which gives (0.15, 0.28). We’re 95% confident that the
population proportion who are “very happy” is between
0.15 and 0.28.
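The normal-approximation (Wald) interval for a proportion used in this example can be sketched as follows; the helper name `proportion_interval` is hypothetical, and z = 1.96 assumes 95% confidence.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Wald interval p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p_hat
    margin = z * se
    return p_hat, se, (p_hat - margin, p_hat + margin)

p_hat, se, (low, high) = proportion_interval(35, 164)
print(f"p_hat = {p_hat:.3f}, se = {se:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```

The same function applies to the other proportion examples by changing the count and the sample size.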
Example 2
In a sample of 1220 adult Internet users, 18 percent had used the Internet to search
for information regarding medicines. Construct a 95 percent confidence interval for
the proportion of Internet users in the sampled population who have searched for
information on medicines.
Solution
p̂ = .18,
the reliability coefficient corresponding to a confidence level of .95 is 1.96,
and the estimate of the standard error is √((.18)(.82)/1220) = .0110.
The 95 percent confidence interval for p, based on these data, is
0.18 ± 1.96(0.0110)
.18 ± .022, that is, (.158, .202)
Example 2 cont.
We are 95 percent confident that the population proportion p is
between .158 and .202.
Thus, we expect, with 95 percent confidence, to find
somewhere between 15.8 percent and 20.2 percent of adult
Internet users to have used it for information on medicine.

Example 3:

• Weight measured before and after a period of
treatment
• y = weight at end – weight at beginning
• The results for n = 17 girls receiving the treatment are
shown below.
y = 11.4, 11.0, 5.5, 9.4, 13.6, -2.9, -0.1, 7.4, 21.5,
-5.3, -3.8, 13.4, 13.1, 9.0, 3.9, 5.7, 10.7
SPSS: Analyze
---------------------------------------------------------------------------------------
Variable          N     Mean     Std.Dev.    Std. Error Mean
weight_change     17    7.265    7.157       1.736
----------------------------------------------------------------------------------------
se obtained as se = s/√n = 7.157/√17 = 1.736
Since n = 17, df = 16, and the t-score for 95% confidence is 2.12.
95% CI for population mean weight change is
ȳ ± t(se), which is 7.265 ± 2.12(1.736), or (3.58, 10.94)
We can conclude that the population mean weight change is positive
(i.e., the treatment is effective, on average), with a value between
about 3.6 and 10.9 pounds.
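The SPSS summary and the interval can be reproduced from the raw data with the standard library. This sketch uses the table value t = 2.12 quoted above rather than computing it, since Python's `statistics` module has no t quantile function.

```python
import math
import statistics

weight_change = [11.4, 11.0, 5.5, 9.4, 13.6, -2.9, -0.1, 7.4, 21.5,
                 -5.3, -3.8, 13.4, 13.1, 9.0, 3.9, 5.7, 10.7]

n = len(weight_change)
ybar = statistics.mean(weight_change)
s = statistics.stdev(weight_change)   # n - 1 divisor, as in the SPSS output
se = s / math.sqrt(n)                 # standard error of the mean
t_crit = 2.12                         # t table value for df = 16, 95% confidence
low, high = ybar - t_crit * se, ybar + t_crit * se

print(f"n = {n}, mean = {ybar:.3f}, sd = {s:.3f}, se = {se:.3f}, CI = ({low:.2f}, {high:.2f})")
```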
Example 4

• Consider the problem of estimating the prevalence of a disease
in 45 to 54 year-old women in Accra. Suppose that a random
sample of n = 5000 women is selected from this age group and
x = 28 are found to have the disease. Calculate the 95%
confidence interval.

Example 4
• Solution
• Point estimate, p̂ = 28/5000 = 0.0056
• $SE(\hat{p}) = \sqrt{\frac{0.0056(1-0.0056)}{5000}} = 0.0011$
• 95% confidence interval is given by
• 0.0056 ± 1.96(0.0011) = (0.0034; 0.0078)
Exercise
• 1. A researcher found that of 472 mechanically ventilated patients,
63 had clinical evidence of ventilator-associated pneumonia.
Construct a 95 percent confidence interval for the proportion of all
mechanically ventilated patients at these hospitals who may be
expected to develop the disease.

• 2. 125 unemployed male high-school dropouts between the ages of
16 and 21 were sampled. 88 stated that they were regular consumers
of alcoholic beverages. Construct a 95 percent confidence interval
for the population proportion.
DETERMINING SAMPLE SIZE
• In the early stages of planning a survey, deciding how large a
sample to take is important.
• There are different formulae for determining an appropriate
sample size when different sampling techniques are used.
• Under simple random sampling, every unit in the population has an
equal probability of being selected.
• Strategies for determining sample size include
• using a census for small populations,
• imitating a sample size of similar studies,
• using published tables, and
• applying formulas to calculate a sample size.
Determining Sample Size

[Figure: sample-size determination branches into two cases, for the mean and for the proportion.]

Sample size depends on:
• Size of the population standard deviation, σ
• The desired degree of reliability (confidence), z
• The desired margin of error (half-width of the interval), e
Determining Sample Size
• The objectives in interval estimation are to obtain narrow
intervals with high reliability. The width of the interval is
determined by the magnitude of the quantity
(reliability coefficient) × (standard error of the estimator)
= margin of error.
• For the mean, the interval is $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$,
so the sampling error (margin of error) is

$e = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Determining Sample Size - Cochran’s formula
(continued)
For the mean:

$e = Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$. Now solve for n to get

$n = \frac{Z_{\alpha/2}^2\,\sigma^2}{e^2}$
Determining Sample Size
• Thus, computing the sample size requires knowledge of σ², which
is usually not known. It has to be estimated by:
• 1. Selecting a pilot sample and estimating σ with the
sample standard deviation, S, or
• 2. Using values from previous or similar studies.
Example 1
• A biostatistician is asked to advise on the size of a sample to be
taken for a survey among a population of teenage girls to
determine their average daily protein intake (measured in grams).
• How will he provide this assistance?
• These three items of information should be provided:
• (1) the desired width of the confidence interval,
• (2) the level of confidence desired, and
• (3) the magnitude of the population variance.
Example 1 (continued)
• Assume an interval about 10 grams wide is desired (within 5
grams of the population mean in either direction, i.e., a margin of
error of 5 grams).
• Also assume that a confidence coefficient of .95 is chosen, and
that, from past experience, the population standard deviation is
probably about 20 grams.
• Thus, z = 1.96, σ = 20, e = 5

• $n = \frac{(1.96)^2(20)^2}{5^2} = 61.47$. A sample of size 62 is advised.
Sample Size Example 2

If σ = 45, what sample size is needed to
estimate the mean within ± 5 with 90%
confidence?

$n = \frac{Z^2\sigma^2}{e^2} = \frac{(1.645)^2(45)^2}{5^2} = 219.19$

So the required sample size is n = 220 (sample sizes are always rounded up).
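Both sample-size calculations above follow n = Z²σ²/e², rounded up to the next whole subject. A minimal sketch (the function name `sample_size_mean` is our own):

```python
import math

def sample_size_mean(z, sigma, e):
    """Cochran's n = z^2 * sigma^2 / e^2, rounded up to the next whole subject."""
    return math.ceil((z * sigma / e) ** 2)

print(sample_size_mean(1.96, 20, 5))    # protein-intake example, 95% confidence
print(sample_size_mean(1.645, 45, 5))   # sigma = 45, 90% confidence
```

Rounding up (rather than to the nearest integer) guarantees that the achieved margin of error is no larger than e.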
Exercise
• 1. To estimate the mean weight of babies born in her hospital, how
large a sample of birth records should she take if a 99 percent
confidence interval that is 1 pound wide is desired? Assume that
a reasonable estimate of the standard deviation is 1 pound.
• What sample size is required if the confidence coefficient is lowered
to .95?
• 2. In order to estimate the mean age of persons bitten by dogs, a
sample is to be drawn from the department’s records of dog bites
reported. A 95 percent confidence interval is desired, a margin of error
of 2.5 is acceptable, and previous studies suggest that the
population standard deviation is about 15 years.
• How large a sample should be drawn?
Determination Of Sample Size For Estimating Proportions -
Cochran’s formula
• This is essentially the same as that described for estimating a
population mean:

$n = \frac{z^2 pq}{e^2}$, where q = 1 − p

• p is the proportion in the population possessing the
characteristic of interest. This is usually unknown, so a pilot
sample is taken and an estimate is computed to be used in
place of p.
• If it is impossible to come up with a better estimate, one may
set p equal to .5 and solve for n (p = .5 makes pq as large as
possible, so it gives the most conservative sample size).
Example
1. How large a sample size do we need to estimate a population
proportion to within 0.03, with probability 0.95? Assume the
population proportion is 50%.
Solution

$n = \frac{(1.96)^2(0.50)(0.50)}{(0.03)^2} = \frac{0.9604}{0.0009} = 1067.1$, which is rounded up to n = 1068.

2. A researcher wants to determine what proportion of families in a
certain area are medically indigent, and it is believed that the
proportion cannot be greater than .35. A 95 percent confidence
interval is desired with e = 0.05. What size sample of families
should be selected?
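The proportion version n = z²pq/e² can be sketched the same way; the helper name `sample_size_proportion` is hypothetical, and the result is rounded up to the next whole unit. Only the worked Example 1 is computed here.

```python
import math

def sample_size_proportion(z, p, e):
    """Cochran's n = z^2 * p * q / e^2 with q = 1 - p, rounded up."""
    q = 1 - p
    return math.ceil(z ** 2 * p * q / e ** 2)

# Example 1: within 0.03 with 95% confidence, p assumed to be 0.5
print(sample_size_proportion(1.96, 0.5, 0.03))
```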
Practice 1
• Determine the sample size required to estimate, to within .03
with 95 percent confidence, the true proportion of adults living in
a large metropolitan area who have hepatitis B virus. In a similar
metropolitan area, the proportion of adults with the characteristic
is reported to be .20.
• If data from another metropolitan area were not available and a
pilot sample could not be drawn, what sample size would be
required?
Practice 2
• An administrator at Alpha-Beta clinic wishes to know what
proportion of discharged patients is unhappy with the care received
during hospitalization. How large a sample should be drawn if the
margin of error is assumed to be 0.05, the confidence coefficient is
.95, and no other information is available?
• How large should the sample be if p is approximated by 0.25?
Yamane’s or Slovin's formula

• This is an alternative to Cochran’s formula. According to Yamane,
for a 95% confidence level and p = 0.5, the size of the sample
should be

$n = \frac{N}{1 + N e^2}$

• where N is the population size and e is the level of
precision.
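A sketch of Yamane's formula with e = 0.05 (the function name `yamane` is our own). Because the result is rounded up here, values for some population sizes may differ by one or so from published tables that use other rounding conventions.

```python
import math

def yamane(N, e=0.05):
    """Yamane/Slovin sample size n = N / (1 + N * e^2), rounded up."""
    return math.ceil(N / (1 + N * e ** 2))

for N in (1000, 10000):
    print(N, yamane(N))
```

For large N the result levels off near 1/e² = 400, which is why the ±5% column of the published table barely changes beyond a few thousand.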
Using Published Tables
• Table 1. Sample Size for ±5% and ±10% Precision Levels where Confidence
Level is 95% and P=0.5

Size of population   Sample size (n), ±5%   Sample size (n), ±10%
100                  81                     51
125                  96                     56
150                  110                    61
200                  134                    67
250                  154                    72
300                  172                    76
350                  187                    78
400                  201                    81
450                  212                    82
500                  222                    83
1000                 286                    91
2000                 333                    95
3000                 353                    97
4000                 364                    98
5000                 370                    98
7000                 378                    99
9000                 383                    99
10000                385                    99
HYPOTHESIS TESTING
• Overview
• Basics of Hypothesis Testing
• Key Concepts in Hypothesis Testing
• Testing a Claim About a Mean: σ Known
• Testing a Claim About a Mean: σ Not Known

HYPOTHESIS TESTING
• Overview
• Hypothesis testing is the second of two general areas of
statistical inference. The main goal in many research studies
is to check whether the data collected support certain
statements or predictions.
• A hypothesis test involves collecting data from a sample and
evaluating the data. Then, a decision is made as to whether or
not there is sufficient evidence, based upon analyses of the
data, to reject the null hypothesis.
• Hypothesis testing consists of two contradictory hypotheses
or statements, a decision based on the data, and a conclusion.
Basic Concepts
• A hypothesis may be defined simply as a statement about one
or more populations
• It is frequently concerned with the parameters of the
populations about which the statement is made.
• A hospital administrator may hypothesize that the average
length of stay of patients admitted to the hospital is 5 days;
• A public health nurse may hypothesize that a particular
educational program will result in improved communication
between nurse and patient;
• A physician may hypothesize that a certain drug will be
effective in 90 percent of the cases for which it is used.
Basic Concepts
• The null hypothesis, H₀, is a statement that the value of a
population parameter (such as proportion, mean, or standard
deviation) is equal to some claimed value.
• The null hypothesis states that the “null” condition exists;
that is, there is nothing new happening, the old theory is still
true, the old standard is correct, and the system is in control.
• The alternative hypothesis, H₁ or HA, is the statement that
the parameter has a value that somehow differs from the null
hypothesis. It states that the new theory is true, there are new
standards, the system is out of control, and/or something is happening.
Basic Concepts
• NOTE: new hypotheses that researchers want to “prove” are stated in
the alternative hypothesis.
• Alternative hypothesis must use one of these symbols: ≠, <, >
Identifying Null and Alternative Hypothesis
• 1. In 2013, 70% of 18-year-old Ghanaians participated in volunteer
work.
• i) A researcher believes that this percentage is different today.
• ii) A researcher believes that this percentage is lower today than in
2013.
• iii) A researcher believes that this percentage is higher today than in
2013.
Basic Concepts - Solution
• i) H₀: p = 70% (the percentage today is still 70%, as in 2013)
• H₁: p ≠ 70% (the percentage today is different from 70%)

• ii) H₀: p = 70%; H₁: p < 70% (the percentage today is lower than
the 70% of 2013)

• iii) H₀: p = 70%; H₁: p > 70% (the percentage today is higher than
the 70% of 2013)
Class Exercise
• Identify the null and alternative hypotheses. Express the
corresponding null and alternative hypotheses in symbolic form.
• a) The proportion of drivers who admit to running red lights is
greater than 0.5.
• b) The mean height of professional basketball players is at most
7ft.
• c) The standard deviation of IQ scores of actors is equal to 15.
Solution
• a) The proportion of drivers who admit to running red lights is
greater than 0.5.
• H0 : p = 0.5.
• H1 : p > 0.5
• b) The mean height of professional basketball players is at most 7
ft. The claim contains equality, so it belongs in the null hypothesis:
• H0 : µ = 7 (i.e., µ ≤ 7)
• H1 : µ > 7
• c) The standard deviation of IQ scores of actors is equal to 15.
• H0 : σ = 15.
• H1 : σ ≠ 15
Test Statistic
• The test statistic is a value used in making a decision about
the null hypothesis, and is found by converting the sample
statistic to a score with the assumption that the null
hypothesis is true.

Critical Value

• A critical value is any value that separates the critical region
(where we reject the null hypothesis) from the values of the
test statistic that do not lead to rejection of the null
hypothesis. The critical values depend on the nature of the
null hypothesis, the sampling distribution that applies, and the
significance level α.
Critical Region

• The critical region (or rejection region) is the set of all
values of the test statistic that cause us to reject the null
hypothesis.

• The significance level (denoted by α) is the probability that
the test statistic will fall in the critical region when the null
hypothesis is actually true.
Two-tailed Test
• H0 : = ……

Right-tailed Test and Left-tailed Test
Significance Level
• The term “level of significance” reflects the fact that hypothesis tests
are sometimes called significance tests, and a computed value
of the test statistic that falls in the rejection region is said to
be significant.
• The level of significance, α, specifies the area under the curve
of the distribution of the test statistic that is above the values
on the horizontal axis constituting the rejection region.
• The most frequently encountered values of α are .01, .05, and
.10.
Types of Errors
• 1. Type I Error: the error committed when a true null
hypothesis is rejected.
• 2. Type II Error: the error committed when a false null
hypothesis is not rejected.
HYPOTHESIS TESTING
• To perform a hypothesis test:
• 1. Set up two contradictory hypotheses (null hypothesis and
alternative hypothesis)
• 2. Collect sample data
• 3. Determine the correct distribution to perform the
hypothesis test.
• 4. Compute the test statistic:
test statistic = (relevant statistic − hypothesized parameter) /
(standard error of the relevant statistic)
• 5. Make a decision and write a meaningful conclusion.
HYPOTHESIS TESTING
• Conclusion. If the null hypothesis is rejected, we conclude that the
alternative is true. If the null hypothesis is not rejected, the
conclusion is that the null hypothesis may be true.
• A p value is the probability that the computed value of a test
statistic is at least as extreme as a specified value of the test
statistic when the null hypothesis is true.
• Thus, the p value is the smallest value of α for which we can reject
a null hypothesis.
• Reject H0 if the P-value ≤ α (where α is the significance level, such
as 0.05).
• Fail to reject H0 if the P-value > α.
HYPOTHESIS TESTING
• The purpose of hypothesis testing is to assist in making
decisions.
• The administrative or clinical decision usually depends on
the statistical decision.
• If the null hypothesis is rejected, the administrative or
clinical decision is compatible with the alternative
hypothesis. The reverse is usually true if the null
hypothesis is not rejected. The administrative or clinical
decision, however, may take other forms, such as a
decision to gather more data.
Example

• Blood glucose levels for obese patients have a mean of 100
with a standard deviation of 15. A researcher thinks that a diet
high in raw cornstarch will have a positive or negative effect
on blood glucose levels. A sample of 30 patients who have
tried the raw cornstarch diet have a mean glucose level of
140. Test the hypothesis that the raw cornstarch had an effect.
Solution
• Step 1: H0 :μ=100
• Step 2: H1: μ ≠ 100
• Step 3: We’ll use α = 0.05 for this example. As this is a two-tailed
test, split alpha into two: 0.05/2 = 0.025.
• Step 4: The z-score that leaves 0.025 in the upper tail (an area of
0.5 − 0.025 = 0.475 between 0 and z) is 1.96.
• Step 5: Find the test statistic using this formula:
• z = (140 − 100)/(15/√30) = 14.61
• Step 6: If the Step 5 value is less than −1.96
or greater than 1.96 (Step 4), reject the null hypothesis.
In this case it is greater, so you can reject the null hypothesis.
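The six steps can be collected into one function. This is a sketch of a two-tailed one-sample z test with σ known, using `statistics.NormalDist` for both the critical value and the p value; the name `z_test` is hypothetical.

```python
import math
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed one-sample z test when the population standard deviation is known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))      # Step 5: test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # Step 4: 1.96 when alpha = 0.05
    p_value = 2 * NormalDist().cdf(-abs(z))        # area in both tails beyond |z|
    return z, z_crit, p_value, abs(z) >= z_crit    # Step 6: reject H0?

z, z_crit, p, reject = z_test(xbar=140, mu0=100, sigma=15, n=30)
print(f"z = {z:.2f}, critical values = ±{z_crit:.2f}, p = {p:.2g}, reject H0: {reject}")
```

With z this far into the tail, the p value underflows to essentially zero, so both the critical-value rule and the p-value rule lead to rejecting H₀.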
Example
• Researchers are interested in the mean age of a certain
population. The data available to the researchers are the ages
of a simple random sample of 10 individuals drawn from the
population of interest. From this sample a mean of 27 and a
variance of 20 have been computed. It is assumed that the
sample comes from a population whose ages are
approximately normally distributed. Can we conclude that the
mean age of this population is different from 30 years?
• Solution
• H₀: μ = 30
• H₁: μ ≠ 30
Solution
• Test statistic: the normal critical values ±1.96 treat σ² = 20 as
the known population variance, so
z = (27 − 30)/√(20/10) = −3/1.414 = −2.12.
• The decision rule: reject H₀ if the computed value of the test
statistic is either ≥ 1.96 or ≤ −1.96; otherwise do not reject
H₀.
• Statistical decision
• Reject the null hypothesis since −2.12 is in the rejection
region, that is, the computed value of the test statistic is
significant at the .05 level.
• The conclusion is that the population mean is not equal to 30.
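The same decision can be reached through the p value (reject H₀ if p ≤ α). A minimal sketch, treating σ² = 20 as known, which is what the ±1.96 decision rule implies:

```python
import math
from statistics import NormalDist

xbar, mu0, sigma2, n = 27, 30, 20, 10
alpha = 0.05

z = (xbar - mu0) / math.sqrt(sigma2 / n)   # sigma^2 = 20 treated as known
p_value = 2 * NormalDist().cdf(-abs(z))    # two-tailed p value
reject = p_value <= alpha

print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject}")
```

A p value of roughly 0.034 is below 0.05, consistent with the critical-value decision above.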
HYPOTHESIS TESTING
• Sampling from Normally Distributed Populations with
Population Variance unknown

• We wish to test whether mean normal body temperature is less
than 98.6 degrees. In a random sample of n = 18 individuals,
the sample mean was found to be 98.217 and the standard
deviation was .684. Assume the population is normally
distributed. Use alpha = 0.05.
• H0 : μ = 98.6
• HA: μ < 98.6
Solution
• Left-tailed, α = 0.05, df = 18 − 1 = 17

• t critical value = −1.740

• $t = \frac{98.217 - 98.6}{0.684/\sqrt{18}} = -2.375631 \approx -2.38$

Our test value is smaller than the critical value of −1.74, so we reject H0.

We have enough evidence to support the claim that average
body temperature is less than 98.6 degrees.
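A sketch of the calculation in Python; the critical value −1.740 is taken from a t table for 17 d.f. (with SciPy installed, `scipy.stats.t.ppf(0.05, 17)` should give about −1.740).

```python
import math

xbar, mu0, s, n = 98.217, 98.6, 0.684, 18
t_crit = -1.740   # left-tailed critical value from a t table, df = 17, alpha = 0.05

t = (xbar - mu0) / (s / math.sqrt(n))   # one-sample t statistic with df = n - 1
reject = t <= t_crit                     # left-tailed test: reject in the lower tail

print(f"t = {t:.2f}, reject H0: {reject}")
```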
Exercise
• 1. The ages (years) of 16 subjects with eye defects are: 62,
62, 68, 48, 51, 60, 51, 57, 57, 41, 62, 50, 53, 34, 62, 61. Can
we conclude that the mean age of the population from which
the sample may be presumed to have been drawn is less than
60 years? Let α = .05
• 2. A sample of 18 patients were investigated concerning their
oral status. The mean teeth index value was 10.3 with a
standard deviation of 7.3. Is this sufficient evidence to allow
us to conclude that the mean index is greater than 9.0 in a
population of similar subjects?
Exercise
• 3. A study was made of a sample of 25 records of patients seen on
an outpatient basis at a chronic disease hospital. The mean
number of outpatient visits per patient was 4.8, and the
sample standard deviation was 2. Can it be concluded from
these data that the population mean is greater than four visits
per patient? Let the probability of committing a type I error
be .05. What assumptions are necessary?
• 4. Forty-nine adolescents served as the subjects in a study.
The variable of interest was the diameter of skin test reaction
to an antigen. The sample mean and standard deviation were
21 and 11 mm, respectively. Can it be concluded from these
data that the population mean is less than 30?
Exercise
• 5. To find out whether the mean daily caloric intake in the adult
rural population of a developing country is less than 2000, a sample
of 500 adults was taken; it had a mean of 1985 and a standard
deviation of 210. Use a significance level of 5%.
• 6. A survey of 100 similar-sized hospitals revealed a mean
daily census in the pediatrics service of 27 with a standard
deviation of 6.5. Do these data provide sufficient evidence to
indicate that the population mean is greater than 25? Let α =
.10, then repeat with α = .05.
Analysis of Variance
• Analysis of variance (ANOVA) is a method of testing the equality of
three or more population means by analyzing sample variances
• Techniques for comparing the means of three or more different
populations or samples.
• The one-way analysis of variance is used to test the claim that three
or more population means are equal.
• This is an extension of the two independent samples t-test.
• When 3 or more means are being compared, statistical
significance could be assessed either with one statistical test,
ANOVA, or with repeated t-tests; however, repeated t-tests inflate
the overall probability of a Type I error, so a single ANOVA is preferred.
WHEN TO USE ANOVA
• ANOVA is used in applications such as the following:
• 1. To determine whether there is sufficient evidence to support the
claim that three groups have different mean blood pressure levels,
where group 1 is treated with 2 tablets of aspirin each day, group 2
with 1 tablet each day, and group 3 with a placebo.
• 2. To test the claim that cereals on different shelves have the
same mean sugar content, since it is believed that supermarkets
place high-sugar cereals on shelves that are at eye level for
children.
Assumptions of one way ANOVA
• Populations are normally distributed
• The data are randomly sampled and independently chosen from the
populations
• The populations have equal variances.
• It is based on a comparison of two different estimates of the
common population variance:
• the variance among (between) samples and the variance within
samples.
• It is one-way since the sample data are separated into groups
according to one characteristic, or factor.
One-Way ANOVA
• Hypotheses
• H₀: μ₁ = μ₂ = μ₃ = … = μₖ
• All population means are equal
• i.e., no factor effect (no variation in means among groups)
• H₁: Not all of the population means are the same
• At least one population mean is different
• i.e., there is a factor effect
• Does not mean that all population means are different (some
pairs may be the same)
One-Way ANOVA
• ANOVA analyzes the variance among values.
• It calculates the variance by summing the squares of the
differences between each value and the mean.
• This is called the sum of squares.
• The variance has two components when the data is from
several groups.
• 1. variation from differences among the group means
(between-groups sum of squares)
• 2. variation from differences among the subjects within
each group (within-groups sum of squares)
Computing a one-way ANOVA
• Here is the basic one-way ANOVA table (k groups, N observations in total):

Source             SS     df      MS                   F
Between (Among)    SSA    k − 1   MSA = SSA/(k − 1)    F = MSA/MSW
Within             SSW    N − k   MSW = SSW/(N − k)
Total              SST    N − 1

where SST = SSA + SSW.
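To make the table concrete, the sketch below computes SSA, SSW, the degrees of freedom and F from raw group data. The three small groups are hypothetical values chosen only so the arithmetic is easy to check by hand; `one_way_anova` is our own helper name.

```python
def one_way_anova(groups):
    """Return (SSA, SSW, df_between, df_within, F) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    N, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / N
    # Between-groups SS: group size times squared distance of the group mean from the grand mean
    ssa = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: squared distance of each value from its own group mean
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, N - k
    msa, msw = ssa / df_between, ssw / df_within
    return ssa, ssw, df_between, df_within, msa / msw

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]   # hypothetical data, k = 3, N = 9
ssa, ssw, dfb, dfw, F = one_way_anova(groups)
print(ssa, ssw, dfb, dfw, F)
```

Here MSA = 54/2 = 27 and MSW = 6/6 = 1, so F = 27; a large F relative to the critical value points to a factor effect.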
Decision Rule

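• Reject H₀ if the computed F is greater than or equal to the critical
value of F with k − 1 numerator and N − k denominator degrees of
freedom at the chosen significance level α; otherwise, do not reject H₀.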
Class exercise
• A study was conducted to test whether cigarette smoking
is associated with reduced serum levels in men aged 35 to 45. The outcome
is as follows:

Source             SS       df    MS       F
Between (Among)    0.7248   3     0.2416   9.152
Within             8.1516   309   0.0264
Total              8.8764   312

a. What is the null hypothesis?
b. What is the alternative hypothesis?
c. Identify the value of the test statistic.
d. Find the critical value for a 0.05 significance level.
e. How many groups are there?
f. Based on the preceding results, what do you conclude about equality
of the population means?
Chi-Square Test
• The most frequently employed statistical technique for the analysis of
count or frequency data.
• One may wish to know, for the population from which the sample
was drawn, if a certain variable differs according to gender.
• The data may consist of the frequencies of the categories of one
variable cross-classified by the categories of another variable.
• One might want to know if, in the population from which the sample
was drawn, there is a relationship between the variables of interest.
• The chi-square statistic takes values between 0 and infinity (it is
never negative).
• Chi-square is used for testing hypotheses where the data available for
analysis are in the form of frequencies.
Types of Chi-Square Tests
• Tests of Goodness-of-fit - is appropriate when one wishes to decide
if an observed distribution of frequencies is drawn from a
preconceived or hypothesized distribution (Normal, binomial).
• Tests of Independence - to test the null hypothesis that two criteria
of classification, when applied to the same set of entities, are
independent. For example, if socioeconomic status and area of
residence of the inhabitants of a certain city are independent, we
would expect to find the same proportion of families in the low,
medium, and high socioeconomic groups in all areas of the city.
• Tests of Homogeneity – to test the null hypothesis that samples are
drawn from populations that are homogeneous with respect to
some criterion of classification.
Chi-Square Tests
• The chi-square statistic is most appropriate for use with
categorical variables, such as marital status (married, single,
widowed, and divorced).
• The quantitative data used in the computation of the test statistic
are the frequencies associated with each category of the one or
more variables under study.
• There are two sets of frequencies with which we are concerned,
observed frequencies and expected frequencies. The observed
frequencies are the number of subjects or objects in our sample
that fall into the various categories of the variable of interest.
89
Chi-Square Tests
• The computed value of χ² is compared with the tabulated χ²
value with k − r degrees of freedom, where k is the number
of groups (categories) available and r is the number of
restrictions or constraints imposed.
• The decision rule, then, is: reject H₀ if the computed χ² is greater
than or equal to the tabulated χ² for the chosen value of α.
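A minimal sketch of the chi-square statistic, Σ(O − E)²/E, for a goodness-of-fit test with hypothetical counts in k = 4 equally likely categories. With one restriction (the observed and expected totals must agree), df = k − r = 3, and the tabulated 5% critical value for 3 d.f. is 7.815. The function name `chi_square_statistic` is our own.

```python
def chi_square_statistic(observed, expected):
    """Pearson chi-square: sum over categories of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 20, 40]   # hypothetical counts, total 100
expected = [25, 25, 25, 25]   # equal frequencies under H0
chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1        # k - r with r = 1 restriction
critical = 7.815              # chi-square table value, df = 3, alpha = 0.05

print(f"chi2 = {chi2:.2f}, df = {df}, reject H0: {chi2 >= critical}")
```

Because every term is a square divided by a positive expected count, the statistic can never be negative.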
