SEE5211 Chapter5 P2017
(SEE5211/SEE8212)
Chapter 5
Statistic
p̂ – sample proportion
4.5 5.4 10.3 7.9 8.5 6.6 11.7 8.9 2.2 9.8
6.3 4.3 9.6 8.7 13.3 4.6 10.7 13.4 7.7 5.6
Suppose we randomly catch a sample of 3 fish from this pond and measure their length. What
would the mean length of the sample be?
x̄ = 7.27 inches
2nd sample - 8.5, 4.6, and 5.6 inches.
x̄ = 6.23 inches
3rd sample - 10.3, 8.9, and 13.4 inches.
x̄ = 10.87 inches
The true mean μ = 8.
Notice that some sample means are closer to the true mean and some farther away; some are above and some below the mean.
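The sample-to-sample variation described above can be reproduced with a short Python sketch. It uses the 20 fish lengths listed earlier as the whole pond; the random seed and the number of extra samples are arbitrary choices made for illustration.

```python
import random
from statistics import mean

# Fish lengths (inches) for the whole pond, copied from the table above.
pond = [4.5, 5.4, 10.3, 7.9, 8.5, 6.6, 11.7, 8.9, 2.2, 9.8,
        6.3, 4.3, 9.6, 8.7, 13.3, 4.6, 10.7, 13.4, 7.7, 5.6]

mu = mean(pond)  # population mean

# The 2nd and 3rd samples quoted in the text.
second = mean([8.5, 4.6, 5.6])
third = mean([10.3, 8.9, 13.4])

# A few more random samples of 3 fish (the seed is an arbitrary assumption).
rng = random.Random(1)
more_means = [mean(rng.sample(pond, 3)) for _ in range(5)]

print(f"population mean: {mu:.2f}")
print(f"2nd sample mean: {second:.2f}")
print(f"3rd sample mean: {third:.2f}")
print("other sample means:", [round(m, 2) for m in more_means])
```

Each run of `rng.sample` gives a different sample mean, some above and some below the population mean.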
Suppose we wanted to estimate the proportion of blue candies in a
VERY large bowl.
How might we go about estimating this proportion?
Population characteristic
The paper reports the results of 7421 students at 40 colleges and
universities. (The sample was selected in such a way that it is
representative of the population of college students.)
The authors want to estimate the proportion (p) of college
students who spend more than 3 hours a day on the Internet.
2998 out of 7421 students reported using the Internet more than 3
hours a day.
p̂ = 2998/7421 = .404
A research paper “The Impact of Internet and Television Use on the Reading Habits and Practices of College Students”
investigates the reading habits of college students. The following observations represent the number of hours spent on
academic reading in 1 week by 20 college students.
If a point estimate of μ, the mean academic reading time per week for all
college students, is desired, an obvious choice of a statistic for estimating μ is
the sample mean x̄.
However, there are other possibilities – a trimmed mean or the sample median.
1.7 3.8 4.7 9.6 11.7 12.3 12.3 12.4 12.6 13.4
14.1 14.2 15.8 15.9 18.7 19.4 21.2 21.9 23.3 28.2
sample mean: x̄ = 287.2/20 = 14.36
sample median = (13.4 + 14.1)/2 = 13.75
10% trimmed mean = 230.2/16 = 14.39 (the 2 smallest and 2 largest observations are dropped)
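The three point estimates can be checked with a minimal Python sketch. The helper `trimmed_mean` is a hypothetical name introduced here for illustration; it drops the stated proportion of observations from each end before averaging.

```python
from statistics import mean, median

# Hours of academic reading in one week for the 20 students in the text.
hours = [1.7, 3.8, 4.7, 9.6, 11.7, 12.3, 12.3, 12.4, 12.6, 13.4,
         14.1, 14.2, 15.8, 15.9, 18.7, 19.4, 21.2, 21.9, 23.3, 28.2]

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the values from each end."""
    data = sorted(values)
    k = int(len(data) * proportion)  # observations trimmed from each end
    return mean(data[k:len(data) - k])

x_bar = mean(hours)
med = median(hours)
trimmed = trimmed_mean(hours, 0.10)

print(f"sample mean:      {x_bar:.2f}")
print(f"sample median:    {med:.2f}")
print(f"10% trimmed mean: {trimmed:.2f}")
```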
Computing an Estimate
. . . within 5 years?
. . . within 1 year?
What happened to your level of
confidence as the interval
became smaller?
Confidence level
The most common confidence levels are 90%, 95%, and 99% confidence.
General Properties for sampling distributions
1. $\mu_{\hat{p}} = p$
2. $\sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$, as long as the sample size is less than 10% of the population
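These two properties can be illustrated by simulation. The sketch below assumes an illustrative population proportion p = 0.4 and sample size n = 100 (neither value comes from the slides) and draws independent observations, which corresponds to a population large enough for the 10% condition to hold.

```python
import random
from math import sqrt
from statistics import mean, pstdev

p = 0.4       # assumed population proportion (illustrative)
n = 100       # assumed sample size (illustrative)
reps = 5000   # number of simulated samples

rng = random.Random(0)  # arbitrary seed so the run is repeatable

# Each simulated sample yields one sample proportion p-hat.
p_hats = [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

sim_mean = mean(p_hats)            # should be close to p
sim_sd = pstdev(p_hats)            # should be close to sqrt(p(1-p)/n)
theory_sd = sqrt(p * (1 - p) / n)

print(f"mean of p-hat: {sim_mean:.4f}  (p = {p})")
print(f"sd of p-hat:   {sim_sd:.4f}  (formula gives {theory_sd:.4f})")
```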
[Figure: standard normal curve centered at 0, with -1.96 and 1.96 marked]
Developing a Confidence Interval
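If p̂ is within $1.96\sqrt{\frac{p(1-p)}{n}}$ of p, then p is within the same distance of p̂, so the interval $\hat{p} \pm 1.96\sqrt{\frac{p(1-p)}{n}}$ captures p.
[Figure: sampling distribution of p̂ centered at p. One line marks $p - 1.96\sqrt{\frac{p(1-p)}{n}}$, 1.96 standard deviations below the mean; the other marks $p + 1.96\sqrt{\frac{p(1-p)}{n}}$, 1.96 standard deviations above the mean.]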
$$\hat{p} \pm (z\text{ critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
The term that follows the ± sign is called the bound on the error of estimation.
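What "95% confident" means for this interval can be sketched by simulation: build the interval from many random samples and count how often it captures the true proportion. The values p = 0.4 and n = 500 are assumptions chosen for illustration, and `proportion_ci` is a hypothetical helper name.

```python
import random
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    """Large-sample interval: p-hat +/- z * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = successes / n
    bound = z * sqrt(p_hat * (1 - p_hat) / n)  # bound on error of estimation
    return p_hat - bound, p_hat + bound

p, n, reps = 0.4, 500, 2000   # assumed values, for illustration only
rng = random.Random(0)        # arbitrary seed so the run is repeatable

hits = 0
for _ in range(reps):
    successes = sum(rng.random() < p for _ in range(n))
    low, high = proportion_ci(successes, n)
    hits += low <= p <= high  # True counts as 1 when the interval captures p

coverage = hits / reps
print(f"fraction of intervals capturing p: {coverage:.3f}")
```

The fraction should land near 0.95, though it varies a little from run to run.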
A survey of 1031 adult Americans: The survey was carried out by
the National Center for Public Policy and the sample was selected
in a way that makes it reasonable to regard the sample as
representative of adult Americans. Of those surveyed, 567 indicated
that they believe a college education is essential for success.
What is a 95% confidence interval for the population
proportion of adult Americans who believe that a college
education is essential for success?
Conditions:
1) np̂ = 1031(.55) = 567 and n(1-p̂) = 1031(.45) = 464;
since both of these are greater than 10, the sample
size is large enough to proceed.
2) The sample size of n = 1031 is much smaller than
10% of the population size (adults).
3) The sample was selected in a way designed to
produce a representative sample. So we can regard
the sample as a random sample from the population.
College Education Continued . . .
What is a 95% confidence interval for the
population proportion of adults who believe that a college education is
essential for success?
Calculation:
$$\hat{p} \pm (z\text{ critical value})\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
$$.55 \pm 1.96\sqrt{\frac{.55(.45)}{1031}} = (.521, .579)$$
Conclusion:
We are 95% confident that the population proportion of adults who
believe that a college education is essential for success is between
52.1% and 57.9%
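The condition check and the interval for this survey can be reproduced with a short Python sketch. It carries p̂ = 567/1031 at full precision, so the limits it prints can differ from the hand calculation in the third decimal place, presumably because the hand calculation rounds intermediate values.

```python
from math import sqrt

n = 1031          # adults surveyed
successes = 567   # said a college education is essential for success
z = 1.96          # z critical value for 95% confidence

p_hat = successes / n

# Large-sample condition: both expected counts should be at least 10.
large_enough = n * p_hat >= 10 and n * (1 - p_hat) >= 10

bound = z * sqrt(p_hat * (1 - p_hat) / n)  # bound on error of estimation
low, high = p_hat - bound, p_hat + bound

print(f"p-hat = {p_hat:.3f}, large enough: {large_enough}")
print(f"95% interval: ({low:.3f}, {high:.3f})")
```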
College Education Revisited . . .
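With the z critical value 1.645 (90% confidence):
$$.55 \pm 1.645\sqrt{\frac{.55(.45)}{1031}} = (.524, .575)$$
[Figure: number line marking 0.51, 0.521, 0.524, 0.575, 0.579, and 0.590]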
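How the interval widens as the confidence level rises can be sketched in Python. The z critical values are taken from the standard library's `statistics.NormalDist` rather than a table, and p̂ is kept at full precision, so third-decimal differences from hand-rounded results are possible.

```python
from math import sqrt
from statistics import NormalDist

n, successes = 1031, 567
p_hat = successes / n
se = sqrt(p_hat * (1 - p_hat) / n)  # estimated standard deviation of p-hat

intervals = {}
for level in (0.90, 0.95, 0.99):
    # Central area `level` leaves (1 - level)/2 in each tail.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    intervals[level] = (p_hat - z * se, p_hat + z * se)
    low, high = intervals[level]
    print(f"{level:.0%}: z = {z:.3f}, interval = ({low:.3f}, {high:.3f})")
```

Higher confidence requires a larger critical value and therefore a wider interval.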
The bound on error of estimation is
$$B = 1.96\sqrt{\frac{p(1-p)}{n}}$$
If we solve this for n . . .
$$n = p(1-p)\left(\frac{1.96}{B}\right)^2$$
What value should be used for p?
Why is the conservative estimate for p = 0.5?
$$n = p(1-p)\left(\frac{1.96}{B}\right)^2 = .25\left(\frac{1.96}{.03}\right)^2 = 1067.111$$
Always round the sample size up to the next whole number.
n = 1068 people
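The sample-size calculation can be sketched as a small Python function. The name `sample_size_for_proportion` is hypothetical. The loop also suggests why p = 0.5 is the conservative choice: the product p(1 - p) is largest at 0.5, so that value gives the largest required n.

```python
from math import ceil

def sample_size_for_proportion(bound, p=0.5, z=1.96):
    """Smallest n with z * sqrt(p(1-p)/n) <= bound, rounded up."""
    return ceil(p * (1 - p) * (z / bound) ** 2)

n_needed = sample_size_for_proportion(0.03)
print("n for B = .03 with p = 0.5:", n_needed)

# p(1 - p) peaks at p = 0.5, so any other guess gives a smaller n.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(p * (1 - p), 2), sample_size_for_proportion(0.03, p))
```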
Confidence intervals for μ when σ is known
The general formula for a confidence interval for a population mean μ when . . .
1) x̄ is the sample mean from a random sample,
2) the sample size n is large (n > 30), and
3) σ, the population standard deviation, is known
is
$$\bar{x} \pm (z\text{ critical value})\frac{\sigma}{\sqrt{n}}$$
Here x̄ is the point estimate and σ/√n is the standard deviation of the statistic.
Cosmic radiation levels rise with increasing altitude, prompting researchers to
consider how pilots and flight crews might be affected by increased
exposure to cosmic radiation. A study reported a mean annual cosmic radiation
dose of 219 mrems for a sample of flight personnel of Xinjiang Airlines.
Suppose this mean is based on a random sample of 100 flight crew members.
Let σ = 35 mrems.
Calculate and interpret a 95% confidence interval for the actual
mean annual cosmic radiation exposure for Xinjiang flight crew
members.
1) Data is from a random sample of crew members
2) Sample size n is large (n > 30)
3) σ is known
Cosmic Radiation Continued . . .
$$\bar{x} \pm (z\text{ critical value})\frac{\sigma}{\sqrt{n}}$$
$$219 \pm 1.96\frac{35}{\sqrt{100}} = (212.14, 225.86)$$
We are 95% confident that the actual mean annual cosmic radiation exposure
for Xinjiang flight crew members is between 212.14 mrems and 225.86 mrems.
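The same interval can be computed with a minimal Python sketch; `z_interval` is a hypothetical helper name, and the inputs are the values given in the example.

```python
from math import sqrt

def z_interval(x_bar, sigma, n, z=1.96):
    """Interval for a mean when sigma is known: x-bar +/- z * sigma/sqrt(n)."""
    bound = z * sigma / sqrt(n)
    return x_bar - bound, x_bar + bound

# Values from the cosmic radiation example: x-bar = 219, sigma = 35, n = 100.
low, high = z_interval(219, 35, 100)
print(f"95% interval: ({low:.2f}, {high:.2f}) mrems")
```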
Confidence intervals for μ when σ is unknown
[Figure: z curve and t curve for 2 df, both centered at 0]
Why is the z curve taller than the t curve for 2 df?
Important Properties of t Distributions
[Figure: t curve for 8 df and t curve for 2 df, both centered at 0]
Important Properties of t Distributions Continued . . .
[Figure: z curve, t curve for 2 df, and t curve for 5 df, all centered at 0]
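The comparison the figures make can be sketched numerically with the standard density formulas for the t and standard normal distributions. Because the t curve puts more area in its tails, it has less height at the center; as the degrees of freedom grow, it approaches the z curve. The function names are hypothetical.

```python
from math import exp, gamma, pi, sqrt

def t_pdf(x, df):
    """Density of the t distribution with `df` degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def z_pdf(x):
    """Density of the standard normal (z) distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

for x in (0, 3):
    heights = ", ".join(f"t({df} df) = {t_pdf(x, df):.4f}" for df in (2, 5, 8))
    print(f"x = {x}: {heights}, z = {z_pdf(x):.4f}")
```

At x = 0 the heights increase with the degrees of freedom toward the z curve, while at x = 3 the ordering reverses because the t curves have heavier tails.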
Confidence intervals for μ when σ is unknown
The confidence interval is
$$\bar{x} \pm (t\text{ critical value})\frac{s}{\sqrt{n}}$$
where the t critical value is based on df = n - 1.
In a study, chimpanzees learned to use an apparatus that dispensed food when either of
two ropes was pulled. When one of the ropes was pulled, only the chimp controlling the
apparatus received food. When the other rope was pulled, food was dispensed both to
the chimp controlling the apparatus and also a chimp in the adjoining cage. The
accompanying data represent the number of times out of 36 trials that each of seven
chimps chose the option that would provide food to both chimps (charitable response).
23 22 21 24 19 20 20
[Figure: normal probability plot of Normal Scores against Number of Charitable Responses]
The plot is reasonably straight, so it seems plausible that the population
distribution of number of charitable responses is approximately normal.
Chimps Continued . . .
23 22 21 24 19 20 20
x̄ = 21.29 and s = 1.80, df = 7 - 1 = 6
$$\bar{x} \pm (t\text{ critical value})\frac{s}{\sqrt{n}}$$
$$21.29 \pm 3.71\frac{1.80}{\sqrt{7}} = (18.77, 23.81)$$
We are 99% confident that the mean number of
charitable responses for the population of all
chimps is between 18.77 and 23.81.
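The chimpanzee interval can be checked with a short Python sketch. The t critical value 3.71 (99% confidence, df = 6) is entered by hand, as if read from a t table, because the standard library has no t quantile function. The code keeps x̄ and s at full precision, so the limits may differ from the hand calculation by about 0.01.

```python
from math import sqrt
from statistics import mean, stdev

# Charitable responses (out of 36 trials) for the seven chimps.
responses = [23, 22, 21, 24, 19, 20, 20]

t_crit = 3.71  # t critical value for 99% confidence with df = 6 (from a table)

n = len(responses)
x_bar = mean(responses)
s = stdev(responses)  # sample standard deviation (divides by n - 1)

bound = t_crit * s / sqrt(n)
low, high = x_bar - bound, x_bar + bound

print(f"x-bar = {x_bar:.2f}, s = {s:.2f}, df = {n - 1}")
print(f"99% interval: ({low:.2f}, {high:.2f})")
```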
Choosing a Sample Size
The bound on error of estimation is
$$B = 1.96\frac{\sigma}{\sqrt{n}}$$
Solve this for n:
$$n = \left(\frac{1.96\,\sigma}{B}\right)^2$$
We can use this to find the necessary sample size for a particular bound on error of estimation B.
This requires σ to be known, which is rarely the case!
When σ is unknown, a preliminary study can be performed to estimate σ
OR
make an educated guess of the value of σ.
A rough estimate for σ (used with distributions that are not too skewed) is the range divided by 4.
The financial aid office wishes to estimate the mean cost of textbooks
per quarter for students at a particular university. For the estimate to
be useful, it should be within $20 of the true population mean. How
large a sample should be used to be 95% confident of achieving this
level of accuracy?
The financial aid office believes that the amount spent on books
varies, with most values between $150 and $550.
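To estimate σ, use the range divided by 4 (by the Empirical Rule, most values lie within 2 standard deviations of the mean):
$$\sigma \approx \frac{550 - 150}{4} = \$100$$
$$n = \left(\frac{1.96(100)}{20}\right)^2 = 96.04$$
Always round the sample size up to the next whole number!
n = 97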
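The textbook-cost calculation can be sketched in Python, using the range-divided-by-4 guess for σ described above. The function name `sample_size_for_mean` is hypothetical.

```python
from math import ceil

def sample_size_for_mean(bound, sigma, z=1.96):
    """Smallest n with z * sigma/sqrt(n) <= bound, rounded up."""
    return ceil((z * sigma / bound) ** 2)

sigma_guess = (550 - 150) / 4  # rough estimate: range divided by 4
n_needed = sample_size_for_mean(bound=20, sigma=sigma_guess)

print(f"estimated sigma: ${sigma_guess:.0f}")
print("required sample size:", n_needed)
```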
Contour Plot
• Open littlepond.jmp
• Select Graph > Contour plot
• Select the X, Y coordinates and click X
• Select the depth Z and click Y (in a contour plot, the X1, X2 roles are used for the
X and Y axes)
• Red Triangle > Fill Areas
Nominal Logistic Regression
1. Open Penicillin.jmp.
2. Select Analyze > Fit Y by X.
3. Select Response and click Y, Response. (Categorical Variable)
4. Select ln(Dose) and click X, Factor. (Continuous Variable)
Notice that JMP automatically fills in Count for Freq. Count was previously
assigned the role of Freq.
5. Click OK.
Right-click, choose Marker Size
Correlations report
Covariance Matrix
6. Click Red Triangle, Scree Plot, Scatterplot 3D
• The report gives the eigenvalues and a bar chart of the percent of the
variation accounted for by each principal component. There is a Score Plot
and a Loadings Plot as well.
• The eigenvalues indicate the total number of components extracted based on
the amount of variance contributed by each component.
• The Score Plot graphs each component’s calculated values in relation to the
other, adjusting each value for the mean and standard deviation.
• The Loadings Plot graphs the unrotated loading matrix between the variables
and the components. The closer the value is to 1 the greater the effect of the
component on the variable.