SEE5211 Chapter5 P2017

This document outlines a lecture on data analysis in environmental applications. It discusses sampling variability and confidence intervals. Specifically, it defines key statistical concepts like statistics, point estimates, and sampling variability. It provides an example to illustrate how sampling variability causes different statistics to be obtained from different samples. The document then introduces confidence intervals as a way to convey more information than a single point estimate by providing a range of plausible values for a population characteristic. It discusses how to construct a 95% confidence interval for a population proportion based on a large random sample.

Data Analysis in Envir Application

(SEE5211/SEE8212)

Dr. Wen Zhou

School of Energy and Environment

Email: wenzhou@cityu.edu.hk ; Office: B5425, AC1

Outline

• The role of statistics and the data analysis process

• Numerical method of describing data
• Summarizing bivariate data
• Population distributions
• Sampling variability and Confidence interval
• Hypothesis Testing Using a Single Sample
• Comparing Two populations
• Regression Analysis
• Analysis of Variance
Sampling variability and Confidence interval

Chapter 5
Statistic

• A number that can be computed from sample data
• Some statistics we will use include
$\bar{x}$ – sample mean
s – sample standard deviation
$\hat{p}$ – sample proportion
• The observed value of the statistic depends on the particular sample selected from the population, and it will vary from sample to sample. This variability is called sampling variability.
An example of sampling variability

A fish pond: Suppose there are 20 fish in the pond. The lengths of the fish (in inches) are given below:

4.5 5.4 10.3 7.9 8.5 6.6 11.7 8.9 2.2 9.8
6.3 4.3 9.6 8.7 13.3 4.6 10.7 13.4 7.7 5.6

Suppose we randomly catch a sample of 3 fish from this pond and measure their lengths. What would the mean length of the sample be?

1st sample – 6.3, 2.2, and 13.3 inches. $\bar{x} = 7.27$ inches

Let’s catch two more samples and look at the sample means.

2nd sample – 8.5, 4.6, and 5.6 inches. $\bar{x} = 6.23$ inches
3rd sample – 10.3, 8.9, and 13.4 inches. $\bar{x} = 10.87$ inches

The true mean is $\mu = 8$ inches. Notice that some sample means are closer to the true mean and some farther away; some are above and some below the mean.
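The fish pond example can be replayed in a few lines of Python. This is only a minimal sketch: the pond data and the three samples come from the slide, while the random seed and the number of extra samples are arbitrary choices made for illustration.

```python
import random
from statistics import mean

# Lengths (inches) of all 20 fish in the pond, copied from the slide.
pond = [4.5, 5.4, 10.3, 7.9, 8.5, 6.6, 11.7, 8.9, 2.2, 9.8,
        6.3, 4.3, 9.6, 8.7, 13.3, 4.6, 10.7, 13.4, 7.7, 5.6]

true_mean = mean(pond)  # population mean of the whole pond

# The three samples of 3 fish reported on the slide.
slide_samples = [[6.3, 2.2, 13.3], [8.5, 4.6, 5.6], [10.3, 8.9, 13.4]]
slide_means = [round(mean(s), 2) for s in slide_samples]

# Draw a few more random samples of 3 fish (the seed is an assumption,
# fixed only so that the run is repeatable).
rng = random.Random(1)
new_means = [round(mean(rng.sample(pond, 3)), 2) for _ in range(5)]

print(f"true mean = {true_mean:.2f}")
print("slide sample means:", slide_means)
print("new sample means:  ", new_means)
```

Each run with a different seed gives a different set of sample means scattered around the true mean, which is exactly the sampling variability the slide describes.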
Suppose we wanted to estimate the proportion of blue candies in a
VERY large bowl.
How might we go about estimating this proportion?

We could take a sample of candies

and compute the proportion of blue
candies in our sample.

We would have a sample proportion

or a statistic – a single value for the
estimate.
Point Estimate

• A single number (a statistic) based on sample data that is used to estimate a population characteristic
• But it is not always equal to the population characteristic, because of sampling variation: different samples may produce different statistics.

“Point” refers to the single value on a number line.
The paper reports the results of 7421 students at 40 colleges and
universities. (The sample was selected in such a way that it is
representative of the population of college students.)
The authors want to estimate the proportion (p) of college
students who spend more than 3 hours a day on the Internet.
2998 out of 7421 students reported using the Internet more than 3
hours a day.

$\hat{p} = 2998/7421 = .404$

This is a point estimate for the population proportion of college students who spend more than 3 hours a day on the Internet.
A research paper, “The Impact of Internet and Television Use on the Reading Habits and Practices of College Students”, investigates the reading habits of college students. The following observations represent the number of hours spent on academic reading in 1 week by 20 college students.

1.7 3.8 4.7 9.6 11.7 12.3 12.3 12.4 12.6 13.4
14.1 14.2 15.8 15.9 18.7 19.4 21.2 21.9 23.3 28.2

If a point estimate of $\mu$, the mean academic reading time per week for all college students, is desired, an obvious choice of a statistic for estimating $\mu$ is the sample mean $\bar{x}$.
However, there are other possibilities – a trimmed mean or the sample median.

The dotplot suggests that these data are approximately symmetric.

College Reading Continued . . .

1.7 3.8 4.7 9.6 11.7 12.3 12.3 12.4 12.6 13.4
14.1 14.2 15.8 15.9 18.7 19.4 21.2 21.9 23.3 28.2

sample mean: $\bar{x} = \dfrac{287.2}{20} = 14.36$

sample median: $\dfrac{13.4 + 14.1}{2} = 13.75$

10% trimmed mean: $\dfrac{230.2}{16} = 14.39$
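The three point estimates for the reading-time data can be checked with a short Python sketch. The helper name `trimmed_mean` is hypothetical (it is not from the slides); it drops the stated proportion of observations from each end, which for 10% of 20 values means the 2 smallest and the 2 largest.

```python
from statistics import mean, median

# Hours of academic reading in one week for 20 students (from the slide).
hours = [1.7, 3.8, 4.7, 9.6, 11.7, 12.3, 12.3, 12.4, 12.6, 13.4,
         14.1, 14.2, 15.8, 15.9, 18.7, 19.4, 21.2, 21.9, 23.3, 28.2]

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    ordered = sorted(values)
    k = round(len(ordered) * proportion)  # observations trimmed per end
    return mean(ordered[k:len(ordered) - k])

print(f"sample mean      = {mean(hours):.2f}")
print(f"sample median    = {median(hours):.2f}")
print(f"10% trimmed mean = {trimmed_mean(hours, 0.10):.2f}")
```

Because the data are roughly symmetric, the three estimates land close to one another.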
Computing an Estimate

• Choose a statistic that is unbiased (accurate)

A statistic whose mean value is equal to the value of the population characteristic being estimated is said to be an unbiased statistic.

[Figure: three sampling distributions. Two are unbiased, since the distribution is centered at the true value; one is biased, since the distribution is NOT centered at the true value.]
Suppose we wanted to estimate the proportion of blue candies in a
VERY large bowl.

We could take a sample of candies and compute the proportion of blue

candies in our sample.

Would you have more confidence if your answer were an interval?

How much confidence do you have in the point estimate?
Confidence intervals

A confidence interval (CI) for a population characteristic is an

interval of plausible values for the characteristic.

It is constructed so that, with a chosen degree of confidence, the

actual value of the characteristic will be between the lower and
upper endpoints of the interval.

The primary goal of a confidence interval is to estimate an

unknown population characteristic.
Rate your confidence: 0 – 100%

What does it mean to be within 10 years?

How confident (%) are you that you can ...

Guess a person’s age within 10 years?

. . . within 5 years?

. . . within 1 year?
What happened to your level of
confidence as the interval
became smaller?
Confidence level

The confidence level associated with a confidence interval

estimate is the success rate of the method used to construct the
interval.

If this method was used to generate an interval estimate over and

over again from different samples, in the long run 95% of the
resulting intervals would include the actual value of the
characteristic being estimated.

The most common confidence levels are 90%, 95%, and 99% confidence.
General Properties for sampling distributions

1. $\mu_{\hat{p}} = p$

2. $\sigma_{\hat{p}} = \sqrt{\dfrac{p(1-p)}{n}}$, as long as the sample size is less than 10% of the population

3. As long as n is large (np > 10 and n(1 – p) > 10), the sampling distribution of $\hat{p}$ is approximately normal.

These are the conditions that must be true in order to calculate a large-sample confidence interval for p.
Large-sample confidence interval

To begin, we will use a 95% confidence level. Use the table of standard normal curve areas to determine the value of z* such that a central area of .95 falls between –z* and z*.

[Figure: standard normal curve centered at 0, with central area = .95 between -1.96 and 1.96, lower tail area = .025, and upper tail area = .025.]

95% of these values are within 1.96 of the mean.

We can generalize this to normal distributions other than the standard normal distribution: about 95% of the values are within 1.96 standard deviations of the mean.

For large random samples, the sampling distribution of $\hat{p}$ is approximately normal. So about 95% of the possible $\hat{p}$ will fall within $1.96\sqrt{\dfrac{p(1-p)}{n}}$ of p.
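Instead of reading z* from a table, the critical value can be computed from the inverse of the standard normal cumulative distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `z_critical` is an illustrative choice, not something defined in the lecture.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z* such that a central area of `confidence` lies between -z* and z*."""
    tail = (1 - confidence) / 2           # area in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_critical(level):.3f}")
```

For 95% confidence each tail holds .025, so the function looks up the point with cumulative area .975, which is the familiar 1.96.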
Developing a Confidence Interval

If $\hat{p}$ is within $1.96\sqrt{\dfrac{p(1-p)}{n}}$ of p, this means the interval

$\hat{p} - 1.96\sqrt{\dfrac{p(1-p)}{n}}$ to $\hat{p} + 1.96\sqrt{\dfrac{p(1-p)}{n}}$

will capture p.

And this will happen for 95% of all possible samples!

Developing a Confidence Interval

Notice that the length of each half of the interval equals $1.96\sqrt{\dfrac{p(1-p)}{n}}$.

[Figure: approximate sampling distribution of $\hat{p}$. The mean of the sampling distribution is p; one line marks 1.96 standard deviations below the mean, at $p - 1.96\sqrt{p(1-p)/n}$, and another marks 1.96 standard deviations above the mean, at $p + 1.96\sqrt{p(1-p)/n}$.]

A $\hat{p}$ that falls within 1.96 standard deviations of the mean has a confidence interval that “captures” p.

A $\hat{p}$ that doesn’t fall within 1.96 standard deviations of the mean has a confidence interval that does NOT “capture” p.

When n is large, a 95% confidence interval for p is

$\hat{p} \pm 1.96\sqrt{\dfrac{p(1-p)}{n}}$

Because p is unknown, the value of p under the square root is estimated by $\hat{p}$ when the interval is computed.
The diagram shows 100 confidence intervals for p computed from 100 different random samples.

Note that the ones with

asterisks do not capture p.

If we were to compute 100

more confidence intervals for
p from 100 different random
samples, would we get the
same results?
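One way to explore that question is to simulate it. The sketch below is an illustration under assumed values (a true proportion of 0.4, samples of size 500, and arbitrary seeds, none of which come from the slides): it builds many 95% intervals and reports the fraction that capture p.

```python
import random
from math import sqrt

def coverage(p, n, intervals, seed=0):
    """Fraction of simulated 95% intervals that capture the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(intervals):
        # Draw one random sample of size n and compute its sample proportion.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / intervals

# Two batches of 100 intervals each, from different random samples.
print(coverage(p=0.4, n=500, intervals=100, seed=1))
print(coverage(p=0.4, n=500, intervals=100, seed=2))
```

The two batches will generally not miss the same number of times, but over a long run the capture rate settles near 95%, which is what the confidence level describes.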
The Large-Sample Confidence Interval for p

The general formula for a confidence interval for a population proportion p when

• $\hat{p}$ is the sample proportion from a random sample,

• the sample size n is large ($n\hat{p} > 10$ and $n(1-\hat{p}) > 10$), and

• if the sample is selected without replacement, the sample size is small relative to the population size (at most 10% of the population)

is

$\hat{p} \pm (z \text{ critical value})\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

The term added to and subtracted from $\hat{p}$ is called the bound on the error of estimation.

The 95% confidence interval is based on the fact that, for approximately 95% of all random samples, $\hat{p}$ is within the bound on error of estimation of p.
A survey of 1031 adult Americans: The survey was carried out by
the National Center for Public Policy and the sample was selected
in a way that makes it reasonable to regard the sample as
representative of adult Americans. Of those surveyed, 567 indicated
that they believe a college education is essential for success.
What is a 95% confidence interval for the population
proportion of adult Americans who believe that a college
education is essential for success?

The point estimate is $\hat{p} = \dfrac{567}{1031} = .55$

Before computing the confidence interval, we need to verify the conditions.
College Education Continued . . .
What is a 95% confidence interval for the
population proportion of adults who believe that a
college education is essential for success?

Conditions:
1) $n\hat{p}$ = 1031(.55) = 567 and $n(1-\hat{p})$ = 1031(.45) = 464,
since both of these are greater than 10, the sample
size is large enough to proceed.
2) The sample size of n = 1031 is much smaller than
10% of the population size (adults).
3) The sample was selected in a way designed to
produce a representative sample. So we can regard
the sample as a random sample from the population.
College Education Continued . . .
What is a 95% confidence interval for the population proportion of adults who believe that a college education is essential for success?

Calculation:

$\hat{p} \pm (z \text{ critical value})\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

$.55 \pm 1.96\sqrt{\dfrac{.55(.45)}{1031}} = .55 \pm .030 = (.520, .580)$

Conclusion:
We are 95% confident that the population proportion of adults who believe that a college education is essential for success is between 52.0% and 58.0%.

College Education Revisited . . .

A 95% confidence interval for the population proportion of adults who believe that a college education is essential for success is:

$.55 \pm 1.96\sqrt{\dfrac{.55(.45)}{1031}} = (.520, .580)$

Compute a 90% confidence interval for this proportion.

$.55 \pm 1.645\sqrt{\dfrac{.55(.45)}{1031}} = .55 \pm .025 = (.525, .575)$

Compute a 99% confidence interval for this proportion.

$.55 \pm 2.58\sqrt{\dfrac{.55(.45)}{1031}} = .55 \pm .040 = (.510, .590)$

Placing the endpoints in order gives .510 < .520 < .525 < .575 < .580 < .590: the higher the confidence level, the wider the interval.
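The three intervals can be reproduced with a small helper. This is a sketch, not the course's own code: the function name `proportion_ci` is hypothetical, the sample proportion is entered as .55 (567/1031 rounded to two decimals, as on the slides), and the critical values are computed at full precision, so an endpoint can differ in the third decimal from a hand calculation that rounds intermediate values.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Large-sample interval: p_hat +/- z* * sqrt(p_hat(1 - p_hat)/n)."""
    # Large-sample condition from the slides: both counts must exceed 10.
    if n * p_hat <= 10 or n * (1 - p_hat) <= 10:
        raise ValueError("sample too small for the large-sample interval")
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)  # bound on error of estimation
    return p_hat - margin, p_hat + margin

p_hat, n = 0.55, 1031  # college education survey
for level in (0.90, 0.95, 0.99):
    low, high = proportion_ci(p_hat, n, level)
    print(f"{level:.0%}: ({low:.3f}, {high:.3f}), width = {high - low:.3f}")
```

The printed widths grow with the confidence level, which is the trade-off between confidence and precision.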
Choosing a Sample Size

Before collecting any data, an investigator may wish to determine a sample size needed to achieve a certain bound on error of estimation.

The bound on error of estimation for a 95% confidence interval is

$B = 1.96\sqrt{\dfrac{p(1-p)}{n}}$

If we solve this for n . . .

$n = p(1-p)\left(\dfrac{1.96}{B}\right)^2$

Sometimes, it is feasible to perform a preliminary study to estimate the value for p.

If there is no prior knowledge and a preliminary study is not feasible, then the conservative estimate for p is 0.5.
Why is the conservative estimate for p = 0.5?

.1(.9) = .09
.2(.8) = .16
.3(.7) = .21
.4(.6) = .24
.5(.5) = .25

By using .5 for p, we are using the largest value for p(1 – p) in our calculations.
In spite of the potential safety hazards, some people would like to have an internet connection in their car. Determine the sample size required to estimate the proportion of adults who would like an internet connection in their car to within 0.03 with 95% confidence.

$n = p(1-p)\left(\dfrac{1.96}{B}\right)^2$

What value should be used for p? With no prior estimate available, use the conservative value p = .5, so that p(1 – p) = .25:

$n = .25\left(\dfrac{1.96}{.03}\right)^2 = 1067.111$

Always round the sample size up to the next whole number.

n = 1068 people
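The sample-size formula for a proportion translates directly into code. In this minimal sketch the function name and its default arguments (the conservative p = 0.5 and z = 1.96 for 95% confidence) are illustrative assumptions; `math.ceil` does the "always round up" step.

```python
from math import ceil

def sample_size_for_proportion(bound, p=0.5, z=1.96):
    """Smallest n with z * sqrt(p(1 - p)/n) <= bound."""
    return ceil(p * (1 - p) * (z / bound) ** 2)

# Internet-in-the-car example: within 0.03 with 95% confidence.
print(sample_size_for_proportion(0.03))

# A preliminary estimate away from 0.5 lowers the required sample size.
print(sample_size_for_proportion(0.03, p=0.3))
```

Using p = 0.5 gives the largest n the formula can produce for a given bound, which is why it is the safe choice when nothing is known about p.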
Confidence intervals for $\mu$ when $\sigma$ is known

The general formula for a confidence interval for a population mean $\mu$ when . . .
1) $\bar{x}$ is the sample mean from a random sample,
2) the sample size n is large (n > 30), and
3) $\sigma$, the population standard deviation, is known
is

$\bar{x} \pm (z \text{ critical value})\left(\dfrac{\sigma}{\sqrt{n}}\right)$

Here $\bar{x}$ is the point estimate, $\sigma/\sqrt{n}$ is the standard deviation of the statistic, and $(z \text{ critical value})(\sigma/\sqrt{n})$ is the bound on error of estimation.

These are the properties of the sampling distribution of $\bar{x}$.
Cosmic radiation levels rise with increasing altitude, prompting researchers to consider how pilots and flight crews might be affected by increased exposure to cosmic radiation. A study reported a mean annual cosmic radiation dose of 219 mrems for a sample of flight personnel of Xinjiang Airlines. Suppose this mean is based on a random sample of 100 flight crew members. Let $\sigma$ = 35 mrems.
Calculate and interpret a 95% confidence interval for the actual mean annual cosmic radiation exposure for Xinjiang flight crew members.
1) Data is from a random sample of crew members
2) Sample size n is large (n > 30)
3) $\sigma$ is known
Cosmic Radiation Continued . . .

Let $\bar{x}$ = 219 mrems
n = 100 flight crew members
$\sigma$ = 35 mrems.
Calculate and interpret a 95% confidence interval for the actual mean annual cosmic radiation exposure for Xinjiang flight crew members.

$\bar{x} \pm (z \text{ critical value})\left(\dfrac{\sigma}{\sqrt{n}}\right)$

$219 \pm 1.96\left(\dfrac{35}{\sqrt{100}}\right) = (212.14, 225.86)$

We are 95% confident that the actual mean annual cosmic radiation exposure for Xinjiang flight crew members is between 212.14 mrems and 225.86 mrems.
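The same calculation can be sketched in Python. The function name `mean_ci_known_sigma` is an assumption made for this example; it applies the z interval with the population standard deviation treated as known.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci_known_sigma(x_bar, sigma, n, confidence=0.95):
    """z interval for a mean: x_bar +/- z* * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)  # bound on error of estimation
    return x_bar - margin, x_bar + margin

# Cosmic radiation example: x_bar = 219 mrems, sigma = 35 mrems, n = 100.
low, high = mean_ci_known_sigma(219, 35, 100)
print(f"95% CI: ({low:.2f}, {high:.2f}) mrems")
```

Quadrupling the sample size would halve the margin, since the standard deviation of the sample mean shrinks with the square root of n.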
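Confidence intervals for $\mu$ when $\sigma$ is unknown

When $\sigma$ is unknown, we use the sample standard deviation s to estimate $\sigma$. In place of z-scores, we must use the following to standardize the values:

$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$

The use of the value of s introduces extra variability. Therefore the distribution of t values has more variability than a standard normal curve.

For the cosmic radiation example, if 35 mrems is treated as the sample standard deviation s, the t critical value for 95% confidence with 99 df is 1.98, and the interval becomes (212.07, 225.93).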

Important Properties of t Distributions

1) The t distribution corresponding to any particular

number of degrees of freedom is bell shaped and
centered at zero (just like the standard normal (z)
distribution).
2) Each t distribution is more spread out than the
standard normal distribution.
t distributions are described by degrees of freedom (df).

[Figure: z curve and t curve for 2 df, both centered at 0.]

Why is the z curve taller than the t curve for 2 df?
Important Properties of t Distributions Continued . . .

3) As the number of degrees of freedom increases, the spread of the corresponding t distribution decreases.

[Figure: t curve for 8 df and t curve for 2 df, both centered at 0.]

4) As the number of degrees of freedom increases, the corresponding sequence of t distributions approaches the standard normal distribution.

[Figure: z curve, t curve for 2 df, and t curve for 5 df, all centered at 0.]
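The properties of t distributions can be checked numerically from the density formulas. The sketch below writes out the standard t density using `math.gamma`; the function names are illustrative. Since every curve has total area 1, a curve that puts more area in its tails must be lower at the center, which is one way to see why the z curve is taller than the t curve for 2 df.

```python
from math import exp, gamma, pi, sqrt

def t_pdf(t, df):
    """Density of the t distribution with `df` degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def z_pdf(z):
    """Density of the standard normal distribution."""
    return exp(-z * z / 2) / sqrt(2 * pi)

# Height at the center rises toward the z curve as df increases.
for df in (2, 5, 8, 30):
    print(f"t, {df:>2} df: height at 0 = {t_pdf(0, df):.4f}")
print(f"z curve : height at 0 = {z_pdf(0):.4f}")

# The tails behave the other way round: t has more density far from 0.
print(f"density at 3: t(2 df) = {t_pdf(3, 2):.4f}, z = {z_pdf(3):.4f}")
```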
Confidence intervals for $\mu$ when $\sigma$ is unknown

The general formula for a confidence interval for a population mean $\mu$ based on a sample of size n when . . .

1) $\bar{x}$ is the sample mean from a random sample,
2) the population distribution is normal, or the sample size n is large (n > 30), and
3) $\sigma$, the population standard deviation, is unknown

is

$\bar{x} \pm (t \text{ critical value})\left(\dfrac{s}{\sqrt{n}}\right)$

where the t critical value is based on df = n – 1.
In a study, chimpanzees learned to use an apparatus that dispensed food when either of
two ropes was pulled. When one of the ropes was pulled, only the chimp controlling the
apparatus received food. When the other rope was pulled, food was dispensed both to
the chimp controlling the apparatus and also a chimp in the adjoining cage. The
accompanying data represent the number of times out of 36 trials that each of seven
chimps chose the option that would provide food to both chimps (charitable response).

23 22 21 24 19 20 20

Compute a 99% confidence interval for the mean number of

charitable responses for the population of all chimps.
Chimps Continued . . .
23 22 21 24 19 20 20

[Figure: normal probability plot of normal scores against number of charitable responses.]

The plot is reasonably straight, so it seems plausible that the population distribution of the number of charitable responses is approximately normal.
Chimps Continued . . .
23 22 21 24 19 20 20
$\bar{x} = 21.29$ and $s = 1.80$; df = 7 – 1 = 6

$\bar{x} \pm (t \text{ critical value})\left(\dfrac{s}{\sqrt{n}}\right)$

$21.29 \pm 3.71\left(\dfrac{1.80}{\sqrt{7}}\right) = (18.77, 23.81)$

We are 99% confident that the mean number of charitable responses for the population of all chimps is between 18.77 and 23.81.
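The chimp interval can be recomputed from the raw data. This sketch stays within the standard library, so the t critical value is not computed: 3.71 is the value used on the slide for 99% confidence with 6 df, and it is hard-coded here as an assumption taken from a t table. Because the code carries the unrounded mean and standard deviation, its endpoints can differ in the last digit from the slide's hand calculation.

```python
from math import sqrt
from statistics import mean, stdev

# Charitable responses out of 36 trials for each of seven chimps.
responses = [23, 22, 21, 24, 19, 20, 20]

# Table value for 99% confidence with df = 7 - 1 = 6 (taken from the slide).
T_CRIT_99_DF6 = 3.71

n = len(responses)
x_bar = mean(responses)
s = stdev(responses)                   # sample standard deviation (n - 1)
margin = T_CRIT_99_DF6 * s / sqrt(n)   # bound on error of estimation
low, high = x_bar - margin, x_bar + margin

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, df = {n - 1}")
print(f"99% CI: ({low:.2f}, {high:.2f})")
```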
Choosing a Sample Size

The bound on error of estimation associated with a 95% confidence interval is

$B = 1.96\left(\dfrac{\sigma}{\sqrt{n}}\right)$

Solve this for n:

$n = \left(\dfrac{1.96\,\sigma}{B}\right)^2$

We can use this to find the necessary sample size for a particular bound on error of estimation.

This requires $\sigma$ to be known – which is rarely the case!

When $\sigma$ is unknown, a preliminary study can be performed to estimate $\sigma$
OR
make an educated guess of the value of $\sigma$.

A rough estimate for $\sigma$ (used with distributions that are not too skewed) is the range divided by 4.
The financial aid office wishes to estimate the mean cost of textbooks per quarter for students at a particular university. For the estimate to be useful, it should be within $20 of the true population mean. How large a sample should be used to be 95% confident of achieving this level of accuracy?
The financial aid office believes that the amount spent on books varies, with most values between $150 and $550.

To estimate $\sigma$:

$\sigma \approx \dfrac{550 - 150}{4} = \$100$
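Standard deviation

Empirical Rule:

• Approximately 68% of the observations are within 1 standard deviation of the mean

• Approximately 95.4% of the observations are within 2 standard deviations of the mean

• Approximately 99.7% of the observations are within 3 standard deviations of the mean

Because most observations lie within 2 standard deviations on either side of the mean, the range of most values spans about 4 standard deviations:

$\sigma \approx \dfrac{550 - 150}{4} = \$100$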
The financial aid office wishes to estimate the mean cost of textbooks per quarter for students at a particular university. For the estimate to be useful, it should be within $20 of the true population mean. How large a sample should be used to be 95% confident of achieving this level of accuracy?

$n = \left(\dfrac{1.96(100)}{20}\right)^2 = 96.04$

Always round sample size up to the next whole number!

$n = 97$
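The textbook-cost calculation can be sketched as follows. The function name `sample_size_for_mean` is hypothetical, and the standard deviation is the rough range-divided-by-4 guess from the slides rather than a measured value.

```python
from math import ceil

def sample_size_for_mean(bound, sigma, z=1.96):
    """Smallest n with z * sigma / sqrt(n) <= bound."""
    return ceil((z * sigma / bound) ** 2)

# Rough guess for sigma: most values fall between $150 and $550.
sigma_guess = (550 - 150) / 4

n = sample_size_for_mean(bound=20, sigma=sigma_guess)
print(f"sigma guess = ${sigma_guess:.0f}, required n = {n}")

# Halving the bound roughly quadruples the required sample size.
print(sample_size_for_mean(bound=10, sigma=sigma_guess))
```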
Contour Plot

• Open littlepond.jmp
• Select Graph > Contour plot
• Select the X, Y coordinates and click X
• Select the depth Z and click Y (in a contour plot, the X1, X2 roles are used for the X and Y axes)
• Red Triangle > Fill Areas
Nominal Logistic Regression

1. Open Penicillin.jmp.
2. Select Analyze > Fit Y by X.
3. Select Response and click Y, Response. (Categorical Variable)
4. Select ln(dose) and click X, Factor. (Continuous Variable)
Notice that JMP automatically fills in Count for Freq. Count was previously assigned the role of Freq.
5. Click OK.
Right-click, choose Marker Size

The plot shows the fitted model, which

is the predicted probability of being
cured, as a function of ln(dose). The
p-value is significant, indicating that
the dosage amounts have a significant
effect on whether the rabbits are
cured.
Principal Component Analysis

The purpose of principal component analysis is to derive a small number of

independent linear combinations (principal components) of a set of measured variables
that capture as much of the variability in the original variables as possible. Principal
component analysis is a dimension-reduction technique, as well as an exploratory data
analysis tool. Principal component analysis is also useful for constructing predictive
models, as in principal components analysis regression (PCA regression)
1. Open Solubility.jmp.
2. Select Analyze > Multivariate Methods > Principal Components.
The Principal Components launch window appears.
3. Select all of the continuous columns and click Y, Columns.
4. Keep the default Estimation Method.
5. Click OK. The Principal Components on Correlations report appears.

Correlations report

Covariance Matrix
6. Click the Red Triangle and select Scree Plot and Scatterplot 3D.
• The report gives the eigenvalues and a bar chart of the percent of the
variation accounted for by each principal component. There is a Score Plot
and a Loadings Plot as well.
• The eigenvalues indicate the total number of components extracted based on
the amount of variance contributed by each component.
• The Score Plot graphs each component’s calculated values in relation to the
other, adjusting each value for the mean and standard deviation.
• The Loadings Plot graphs the unrotated loading matrix between the variables
and the components. The closer the value is to 1 the greater the effect of the
component on the variable.
