Basic Univariate Statistics For Engineers 2019

The document discusses simple random sampling, emphasizing the challenges of obtaining true random samples due to population size and labeling issues. It covers various sampling methods, parameter estimation, and the importance of unbiased estimators, as well as the Central Limit Theorem and confidence intervals for estimating population parameters. Additionally, it introduces the t-distribution for small sample sizes and the chi-squared distribution for estimating standard deviation.

Simple Random Sampling

If a population consists of a finite number of units, these can be labelled and a random number
generator used to select a sample.

It is often not possible to label the members of a population.

• Insufficient information is available about the population.

• Population is too large.

In such cases it will usually not be possible to obtain a true random sample, but an effort should be made to
avoid bias and non-independence.

Biased selection occurs when some members of the population are more likely to be selected than others.
Biased sampling may result in observations that are systematically higher or lower than the population
average. This corresponds to a systematic error (a non-random, constant shift (bias) from the true value) and
concerns accuracy, which statistics alone cannot help you attack (!). Investigating or avoiding bias requires
experimental design. We can of course correct our measurements for bias iff (if and only if) we know
what the true value is.

Non-independence occurs when the probability of a unit being sampled depends on which other units are in
the sample. Non-independent sampling often leads to the sample observations being less variable than
the observations in the whole population.

It is worth noting that when sampling is without replacement, the sampled observations are not selected
independently of one another. However, the effect is negligible when the sample size is less than about
one-tenth of the population size sampled.

When non-random sampling has to be used, it is important that the method of sampling is reported carefully,
so that a judgement can be made about the wider applicability of any conclusions based on the sample.

Example: It is required that 20 students are selected at random from a college. Two methods of sampling
have been proposed:

(1) choose a class at random and select, at random 20 students from the class;
(2) ask the head of the college to select one student from each of 20 classes.

The first method gives a sample that is non-independent, since if one member of a class is selected, the other
members of the class have a greatly enhanced chance of selection.

The second method may be biased. Perhaps the head will select those students who excel in academic
studies or sporting activities.
Why Sample

• Cost

• Destructive testing

• Accessibility

How should we sample?

• Accessibility or haphazard sampling: selection with the prime stimulus of administrative convenience.

• Judgmental or purposive sampling: deliberate subjective choice in drawing what the sampler regards as a "representative" sample.

• Quota sampling: often judgement and accessibility are combined. This method is usually more structured than straight accessibility or judgmental sampling.

• Stratified simple random sampling

• Cluster and multi-stage sampling


Parameter Estimation

As indicated in the previous section, the parameters representing the parent population are often not available,
and instead it is necessary to make do with parameters estimated from a sample.

This poses two questions: first, how do we estimate these parameters, and secondly, how well do we know them?

Attention will focus on the two most important parameters (statistics), namely the mean and the variance. It is
common practice to use the symbols μ and σ² for the mean and variance, respectively, when referring to the
parent population, and the symbols x̄ and s² for the sample estimators.

Note that the square root of the variance is the standard deviation (s). While the standard deviation or s.d.
s is the more intuitive term (it has the same units as the mean), the variance s² is preferred for mathematical
manipulations. The variance is always a positive quantity, whereas the individual sample values lie above (+)
and below (−) the mean value. We use, however, the standard deviation as a characteristic measure of the
variability or spread of individual sample values about their mean.

Estimates for μ and σ²

Given a sample x1, x2, … xn from the population, the sample mean is given by:

    x̄ = (1/n) Σᵢ xᵢ   (i = 1 … n)

The sample variance and standard deviation are given by:

    s² = Σᵢ (xᵢ − x̄)² / (n − 1)        s = √[ Σᵢ (xᵢ − x̄)² / (n − 1) ]

Example #1:

    Mean = (4.1 + 4.3 + 4.4 + 4.2 + 4.3 + 3.9)/6 = 4.20 g

    s² = [(4.1 − 4.2)² + (4.3 − 4.2)² + (4.4 − 4.2)² + (4.2 − 4.2)² + (4.3 − 4.2)² + (3.9 − 4.2)²]/5
       = [(−0.1)² + 0.1² + 0.2² + 0² + 0.1² + (−0.3)²]/5 = 0.032 g²

    s = √s² = 0.18 g

The sample mean x̄ is a point estimator of the population mean μ and the sample variance s² is a point
estimator of the population variance σ².
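The calculation in Example #1 can be reproduced directly; the following is a short sketch in Python (the variable names are illustrative):

```python
import statistics

sample = [4.1, 4.3, 4.4, 4.2, 4.3, 3.9]   # masses in grams, from Example #1

n = len(sample)
mean = sum(sample) / n
# Sample variance divides the sum of squared deviations by n - 1
var = sum((x - mean) ** 2 for x in sample) / (n - 1)
sd = var ** 0.5

print(mean, var, sd)   # approx. 4.20 g, 0.032 g^2, 0.18 g

# The standard library uses the same n - 1 convention
assert abs(var - statistics.variance(sample)) < 1e-9
```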

Properties required of a good point estimator


• The point estimator should be unbiased. That is, the long-run average or expected value of the point
estimator should be the parameter that is being estimated. Although unbiasedness is desirable, this
property alone does not always make an estimator a good one.
• An unbiased estimator should have minimum variance. This property states that the minimum-variance
point estimator has a variance smaller than that of any other unbiased estimator of that parameter.

The sample variance is divided by n − 1 rather than n; this is required to obtain an unbiased estimate of
σ². An alternative argument for dividing by n − 1 is to note that n − 1 is the number of degrees of freedom (d.f.)
in the system. In this case, although there seem to be n independent values x1, x2, … xn, the condition
Σᵢ (xᵢ − x̄) = 0 reduces the number of independent values by 1: given all the other deviations of the sample
values from their mean, the final deviation is fixed and not independent, since the deviations must sum to zero.

Dividing by the number of degrees of freedom (d.f. or df) is common in statistics.
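The unbiasedness of the n − 1 divisor can be checked by simulation; a minimal sketch, assuming a N(0, 1) parent population so that the true σ² = 1:

```python
import random

random.seed(0)
n, trials = 5, 100_000
sum_var_n = sum_var_n1 = 0.0

for _ in range(trials):
    xs = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    sum_var_n += ss / n          # divides by n: biased low
    sum_var_n1 += ss / (n - 1)   # divides by n - 1: unbiased

# Long-run averages: ss/n tends to (n-1)/n * sigma^2 = 0.8, ss/(n-1) to 1.0
print(sum_var_n / trials, sum_var_n1 / trials)
```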


The Sampling Distribution of x̄

Having obtained estimates by taking a sample, it is possible to take another sample of similar size from the same
population, and slightly different values of x̄ and s² will result. If repeated samples are taken,
one obtains a sampling distribution for the sample mean and sample standard deviation. These sampling
distributions can be used to quantify how well we know the estimate.

If random samples of size n were taken from a distribution with mean μ and standard deviation σ, then the
sample means would form a distribution having the same mean μ but a smaller standard deviation given by
σ/√n. The quantity σ/√n, which is the standard deviation of the values of x̄, is often called the standard error of
the mean, σx̄, or s.e. The sampling distribution of x̄ will be approximately Normal (Gaussian), and this
approximation improves as n increases.

Example #2:

The percentage of copper in a certain chemical is to be estimated by taking a series of measurements on
small random quantities of the chemical and using the sample mean percentage to estimate the true
percentage. From previous experience, individual measurements of this type are known to have a
standard deviation of 2%. How many measurements must be made so that the standard error of the
estimated percentage is less than 0.6%?

    σ/√n < 0.6

    2/√n < 0.6

    n > (2/0.6)² = 11.1

At least 12 measurements must be made to achieve the required precision.
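The sample-size calculation in Example #2 amounts to inverting σ/√n < 0.6; as a quick check, using the values from the example:

```python
import math

sigma = 2.0        # known measurement standard deviation (%)
target_se = 0.6    # required standard error of the mean (%)

# sigma / sqrt(n) < target_se  =>  n > (sigma / target_se)**2
n_min = (sigma / target_se) ** 2
n = math.ceil(n_min)
print(n_min, n)    # 11.11..., 12
```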

We have seen that the mean of a sample provides an approximation to the true mean μ of the population. If we
were to take numerous samples, then the 'mean of the means' would be a more reliable estimate. However, this
is often unrealistic and only one sample from the population is taken, so how reliable is this single value? The
standard error provides the way of assessing the precision of the sample mean. It is convenient to express how
well we know the estimate in terms of confidence intervals (CI). This is the band in which we can say with x%
certainty (e.g. 95%) that the value of the mean occurs (i.e. a 95% CI).
Normal Distribution
The Normal distribution is one of the most important sampling distributions (and models many parent
populations as well). So when we say testing ‘Normal’ or ‘Normality’ we mean this distribution. An
important special case of the Normal distribution is the standard Normal distribution.

Standard Normal Distribution

[Figure: the standard Normal probability density f(z), plotted for z from −4 to 4; the curve peaks at about 0.4 at z = 0.]
A standard Normal distribution has mean 0 and standard deviation 1. A Normal distribution is denoted as
N(μ, σ²); therefore the standard Normal distribution is denoted as N(0, 1). A Normal distribution x ~ N(μ, σ²)
can be transformed to the standard Normal distribution z ~ N(0, 1) using the operation:

    z = (x − μ)/σ
The Central Limit Theorem (CLT)

If x1, x2, …. xn are n independent and identically distributed random variables (from the same parent
population) with E(xᵢ) = μ and E((xᵢ − μ)²) = σ², and w = x1 + x2 + …. + xn, then

    zₙ = (w − nμ)/√(nσ²)

has an approximate N(0, 1) distribution.

Letting x̄ = w/n gives

    zₙ = (x̄ − μ)/(σ/√n)

so x̄ is approximately Normally distributed with mean μ and variance σ²/n.
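The CLT can be illustrated by simulation: even for a markedly skewed parent population (here an Exp(1) distribution, chosen for this sketch because it has μ = σ² = 1), the standardised sample mean behaves approximately like N(0, 1):

```python
import math
import random

random.seed(1)
n, trials = 30, 20_000
mu, var = 1.0, 1.0            # mean and variance of the Exp(1) parent

inside = 0
for _ in range(trials):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z_n = (xbar - mu) / math.sqrt(var / n)
    if -1.96 < z_n < 1.96:    # for N(0,1) this region has probability 0.95
        inside += 1

coverage = inside / trials
print(coverage)               # close to 0.95
```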
Confidence Interval for μ with σ known

If the mean and standard deviation of the population are known, then the mean x̄ of a sample of size n will be
part of a Normal distribution

    x̄ ~ N(μ, σ²/n)

or z ~ N(0, 1), where z = (x̄ − μ)/(σ/√n).

Now 95% of the standard Normal distribution lies between ±1.96:

    Probability(−1.96 < Z < 1.96) = 0.95

    Probability(−1.96 < (x̄ − μ)/(σ/√n) < 1.96) = 0.95

Considering −1.96 < (x̄ − μ)/(σ/√n) < 1.96 and rearranging gives:

    x̄ − 1.96 σ/√n < μ < x̄ + 1.96 σ/√n

The interval between x̄ − 1.96σ/√n and x̄ + 1.96σ/√n is called the 95% confidence interval for μ.

Example #3:
If the sample mean x̄ of the twelve measurements, taken as the percentage of copper in a certain chemical
(previous example), was found to be 12.91%, give a 95% confidence interval for the true percentage μ.

    x̄ ± 1.96 σ/√n = 12.91 ± 1.96 × 2/√12
                  = 12.91 ± 1.13
Confidence Interval (CI) for μ with σ unknown

In practice we usually find the true standard deviation σ is unknown, and the sample standard deviation
s is then used to estimate σ. Then, from the CLT, if a sample of size n is taken, an estimate of the standard error
of the sample mean x̄ is given by s/√n.

We might expect the 95% confidence interval for μ to be x̄ ± 1.96 s/√n, and indeed this is correct for large
samples (n > 30). However, for small samples the true interval is wider, since s is no longer a good estimate for
σ and there will be appreciable variation in s from sample to sample.

Rather than the standard Normal distribution, a t-distribution (the basis of the t-test, also known as Student's
t-test) is used instead to compare means.

[Figure: comparison of the t-distribution with the Normal distribution; the t-distribution has the same bell shape but is more spread out, with heavier tails.]

The t-distribution is more spread out than the standard Normal distribution. The distribution depends on the
number of degrees of freedom ν (= n − 1, where n is the sample size).
The percentage point tα,ν is chosen so that a proportion α of the t-distribution with ν degrees of freedom lies
above it.

The 100(1 − α)% confidence interval for μ is given by

    x̄ − tα/2,n−1 s/√n < μ < x̄ + tα/2,n−1 s/√n

Example #4:
The compressive strength of concrete is being tested by a civil engineer, who tests 12 specimens and obtains
the following data.

15.56 15.56 15.47 15.17


15.36 15.38 15.43 15.50
16.08 15.38 15.61 15.47

Construct a 95% confidence interval on the mean strength.


    x̄ = 185.97/12 = 15.4975

    s² = [(0.0625)² + (−0.1375)² + … + (−0.0275)²]/11 = 0.04724

    s = 0.21735

    t0.025,11 = 2.201

    μ = x̄ ± t0.025,11 s/√n = 15.4975 ± 0.13810

(95% confidence interval for the mean)
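Example #4 can be reproduced with a few lines; the tabulated value t0.025,11 = 2.201 is taken from the text:

```python
import math
import statistics

data = [15.56, 15.56, 15.47, 15.17, 15.36, 15.38,
        15.43, 15.50, 16.08, 15.38, 15.61, 15.47]   # compressive strengths

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)       # n - 1 divisor
t = 2.201                        # t_{0.025,11} from the t table

half_width = t * s / math.sqrt(n)
print(xbar, s, (xbar - half_width, xbar + half_width))
```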


Inference about the Standard Deviation σ

A point estimator for σ was previously given by:

    s = √[ Σᵢ (xᵢ − x̄)² / (n − 1) ]

To estimate a confidence interval for this, a sampling distribution for s² is needed. This requires a distribution
called the χ² distribution (read "chi-squared distribution"), whose form depends on n − 1.

Chi-Squared Distribution χ²

If x1, x2, …. xk are k independent and identically Normally distributed random variables with μ = 0 and
σ² = 1, then w = x1² + x2² + …. + xk² follows the χ² distribution with k degrees of freedom.

[Figure: χ² probability densities f(x) for k = 1, 2 and 15, plotted for x from 0 to 30; the densities are skewed to the right.]

This distribution is asymmetric, or skewed, with μ = k and σ² = 2k.


If x1, x2, …. xn is a random sample from an N(μ, σ²) distribution, then:

    Σᵢ (xᵢ − x̄)²/σ² = SS/σ² ~ χ² with n − 1 degrees of freedom

The sample variance is s² = SS/(n − 1), and so the distribution of s² is

    [σ²/(n − 1)] × χ² with n − 1 degrees of freedom

SS here is the sum of squared differences. Thus the sampling distribution of the sample variance is a constant
times the chi-squared (χ²) distribution, provided the population is Normally distributed.

Example #5:
Find the upper 0.05 point of the χ² distribution with 17 degrees of freedom. Also find the lower 0.05 point.

The upper 0.05 point is read from the column of the χ² distribution table labelled 0.05, giving
χ²(0.05) = 27.59 for 17 d.f. The lower 0.05 point is read from the column labelled 0.95, giving
χ²(0.95) = 8.67.
The  2 distribution is the basic distribution for constructing confidence intervals for 2. To find the 95%
confidence level for 2 the probability  is divided equally between the two tails. From the  2 distribution:
 2
P  0.975 
 n  1 s 2  2   0.95
0.025 
 2 
Rearranging the inequality gives:
  n  1 s 2  n  1 s 2 
P  2
  0.95
 0.025 0.975
2 2

A confidence interval for  can be obtained by taking the square root. For a confidence level of 95% the
interval for  becomes:
  n  1  n  1 
s , s
 0.025
2
 0.975
2

 

Example:
The data of the computer anxiety score for a sample of students is given in the table below.

2.90 1.00 1.90 2.37 3.32 3.79 3.26 1.90
1.84 2.58 1.58 2.90 2.42 3.42 2.53

Find the 90% confidence interval for σ².

The sample mean and variance can be evaluated from the formulae:

    x̄ = (1/n) Σᵢ xᵢ = 37.71/15 = 2.514

    s² = Σᵢ (xᵢ − x̄)²/(n − 1) = 8.3602/14 = 0.5972

Here n = 15, so d.f. = n − 1 = 14. The χ² table gives χ²(0.95) = 6.57 and χ²(0.05) = 23.68. The 90%
confidence interval for σ² is

    ( 14 × 0.5972/23.68 , 14 × 0.5972/6.57 ) = (0.3531, 1.2726)
You can also calculate the probability (p-)values; see e.g.

https://www.youtube.com/watch?v=HwD7ekD5l0g
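The variance interval above can be reproduced numerically; the χ² percentage points for 14 d.f. (23.68 and 6.57) are taken from the table, as in the text:

```python
data = [2.90, 1.00, 1.90, 2.37, 3.32, 3.79, 3.26, 1.90,
        1.84, 2.58, 1.58, 2.90, 2.42, 3.42, 2.53]   # anxiety scores

n = len(data)
xbar = sum(data) / n
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)

chi2_upper, chi2_lower = 23.68, 6.57   # 0.05 and 0.95 points, 14 d.f.
ci_var = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)
print(xbar, s2, ci_var)
```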
Robustness of inference procedures

The small sample methods for both confidence interval estimation and hypothesis testing presuppose that
the sample is obtained from a Normal population. Users of these methods would naturally ask:

1. What method can be used to determine if the population distribution is nearly Normal?
2. What can go wrong if the population distribution is not Normal?
3. What procedures should be used if it is not Normal?
4. If the observations are not independent, is this serious?

1. To answer the first question, we could construct a dot diagram or Normal-scores plot. These may
indicate a wild observation or a striking departure from Normality. If neither is visible, the
investigator would feel more secure using the inference procedures. However, any plot based on a small
sample cannot provide convincing justification for Normality. Lacking sufficient observations to justify
or refute the Normal assumption, we are led to a consideration of the second question.

2. Confidence intervals and tests of hypotheses concerning μ are based on Student's t-distribution. If the
population is not Normal, the actual percentage points may differ substantially from the tabulated
values. Fortunately, the effects on inference about μ using the t-statistic are not too serious if the sample
size is at least moderately large (say 15). In larger samples, such disturbances tend to disappear due to
the Central Limit Theorem. We express this fact by saying that inferences about μ using the t-statistic
are reasonably robust. However, this qualitative discussion should not be considered a blanket
endorsement for t. When the sample size is small, a wild observation or a distribution with long tails
can produce misleading results. Unfortunately, inferences about σ² using the χ²-distribution may be
seriously affected by departures from Normality, even with large samples. We express this by saying
that inferences about σ² using the χ²-distribution are not robust against departures of the population
distribution from Normality.

3. We cannot give a specific answer to the third question without knowing something about the nature of
the non-Normality. Dot diagrams or histograms of the original data may suggest some transformation
that will bring the shape of the distribution closer to Normality. If it is possible to obtain a
transformation that leads to reasonably Normal data plots, the problem can then be recast in terms of
the transformed data. Otherwise, users could benefit from consulting with a statistician.

4. A basic assumption has been that the sample is drawn at random, so the observations are independent of
one another. If the sampling is made in such a manner that the observations are dependent, however, all
the inferential procedures for small as well as large samples may be seriously in error. Because
independence is the most crucial assumption, we must be constantly alert to detect such violations.
Prior to a formal analysis of the data, a close scrutiny of the sampling process is imperative.
Normality checks

Many of the inference procedures used have required the sample to be Normally distributed, and if this is the
case the assumption should be checked. Graphical methods can prove helpful in detecting serious departures
from Normality. Histograms can be inspected for lack of symmetry. The thickness of the tails can be checked
for conformance by comparing the proportions of observations in the intervals
(x̄ − s, x̄ + s), (x̄ − 2s, x̄ + 2s), (x̄ − 3s, x̄ + 3s) with those suggested by the Normal distribution.
A more effective way to check the plausibility of a Normal model is to construct a graph called a Normal-scores
plot of the sample. The Normal scores refer to an idealised sample taken from the standard Normal distribution.
For a sample of size n, the standard Normal distribution is divided into n + 1 equal proportions (in terms of
probability); the z values forming the boundaries between the zones form the n standard Normal scores. These
values can be obtained using the MS-Excel function NORMINV(i/(n+1), 0, 1) for i = 1, 2, … n. They provide
predicted z-score locations for the given data in the sample.

To construct a Normal-scores plot:

1. Order the sample data from smallest to largest.


2. Obtain the Normal scores.
3. Pair the i-th largest observation with the i-th largest Normal score and plot the pairs in a graph.

If the sample were indeed Normal, we should expect the plot of the observed values against the standard
Normal scores to produce a straight line, where the intercept of the line indicates the value of μ and the slope of
the line indicates σ.
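The Normal scores NORMINV(i/(n+1), 0, 1) can also be computed outside Excel. A sketch using only the Python standard library, with the inverse Normal CDF obtained by bisection on erf (`norm_inv` is an illustrative helper written for this sketch, not a library routine):

```python
import math

def norm_inv(p):
    """Inverse standard-Normal CDF, by bisection on Phi(z) = 0.5*(1 + erf(z/sqrt(2)))."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 10
scores = [norm_inv(i / (n + 1)) for i in range(1, n + 1)]
print([round(z, 5) for z in scores])
```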

Example #6:
Check visually the Normality of the samples given in the table below.

16.35 16.4 16.52 16.57 16.59 16.85 16.96 17.04 17.15 17.21

 i    xi      i/(n+1)     zi (NORMINV)
 1    16.35   0.090909    -1.335178
 2    16.40   0.181818    -0.908458
 3    16.52   0.272727    -0.604585
 4    16.57   0.363636    -0.348756
 5    16.59   0.454545    -0.114185
 6    16.85   0.545455     0.114185
 7    16.96   0.636364     0.348756
 8    17.04   0.727273     0.604585
 9    17.15   0.818182     0.908458
10    17.21   0.909091     1.335178

n = 10

Then plot zi against the ordered observations in an x-y plot, and fit a straight line.

[Figure: Normal-score plot of Normal score against ordered observations (16.2 to 17.4); the points lie close to a straight line.]

This is one way of graphing a Normality plot (effectively a Normal-score plot). If we change axes so that we
plot the observed values against the standard Normal scores, then the regression fit gives the mean
(y-intercept) and standard deviation (slope) directly.
Significance Tests

In this section we shall carry out significance tests or Hypothesis testing in order to test some theory about the
population.

Consider the example where the strength of steel wire made by an existing process is Normally distributed with
mean μ0 = 1250 and standard deviation (√σ² =) σ = 150. A batch of wire is made by a new process, and a
sample of 25 measurements gives an enhanced average strength of x̄ = 1312 (here we assume that the standard
deviation of the measurements does not change, so s = σ). The engineer must now decide whether the difference
between x̄ and μ0 is strong enough evidence (a statistically significant difference) to justify changing to the
new process.

The method of testing the data to see if the results are significantly better is called a significance
test. In any such situation the experimenter has to weigh the evidence and, if possible, decide between two rival
possibilities (hypotheses). The hypothesis which we want to test is called the null hypothesis and is denoted by
H0. Any other hypothesis is called the alternative hypothesis and is denoted by H1.

Returning to the example, the engineer must decide whether to accept or reject the hypothesis that there is no
difference between the new and existing process. This would be the null hypothesis

H0: μ = μ 0

The alternative hypothesis would be that the new process is better than the existing process, that is the
alternative hypothesis.
H1: μ > μ 0

The important point to remember here is that the null hypothesis must be assumed to be true until the data
indicates otherwise. The choice of the null hypothesis depends on a number of considerations and several
possibilities arise. For example if the existing process is reliable and changeover costs are high, the burden of
proof is on the new process to show that it really is better.

Then H0 : Existing process is as good as or better than the new process.

H1 : New process is an improvement.

This is the case outlined previously; however, if the existing process is unreliable and changeover costs are low,
the burden of proof is on the existing process.

Then H0 : New process is as good as or better than the previous.

H1 : Existing process is better than the new process.


Exemplar #7:
A new drug is tested to see if it is effective in curing a certain disease. It is vital that a drug is not put on
the market until it has been rigorously tested. Thus we must assume that the drug is not effective, or is actually
harmful, until the tests indicate otherwise. The null hypothesis is that the drug is not effective. The alternative
hypothesis is that the drug is effective.

H0 : μ ≤ μ 0

versus H1 : μ > μ 0

In addition, the null hypothesis is usually expressed precisely as

H0 : μ = μ 0

versus H1 : μ > μ 0

Test Statistics

Having decided on the null and alternative hypothesis, the next step is to calculate a test statistic which will
show up any departure from the null hypothesis.

Returning to the batch wire example of the two processes, a suitable test statistic would be:

    z = (x̄ − μ0)/(σ/√25)

From the Central Limit Theorem, we would expect this to follow a standard Normal distribution.

The conclusion that H0 is true does not actually mean that the sample mean must equal μ0 exactly, but that it
must be close enough to μ0, judged at a specified significance level (5% is often taken).

Considering the example:

    z0 = (1312 − 1250)/(150/√25) = 2.067

From a standard Normal (z) table:

    Prob(z ≤ 2.067) = 0.9808

    Prob(z > 2.067) = 1 − 0.9808 = 0.0192

Thus the result is significant at the 2% level (better than our 5% threshold). We are saying that a value greater
than this z-statistic would occur only about 2% of the time; therefore we reject the null hypothesis
and accept the alternative hypothesis that there has been an improvement.
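The wire-strength test can be sketched in code, using the erf-based Normal CDF from the standard library:

```python
import math

mu0, sigma, n, xbar = 1250.0, 150.0, 25, 1312.0   # wire-strength example

z0 = (xbar - mu0) / (sigma / math.sqrt(n))
# One-tailed p-value: P(Z > z0) for Z ~ N(0, 1)
p = 1 - 0.5 * (1 + math.erf(z0 / math.sqrt(2)))
print(z0, p)   # approx. 2.067 and 0.019
```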
In the previous example we were interested in values of z significantly higher (or it could have been lower) than
the mean. Any such test which only takes account of departures from the null hypothesis in one direction is called
a one-tailed test. However, other situations exist in which departures from H0 in two directions (either side of
the mean, higher and lower, i.e. not equal to) are of interest. This is then a two-tailed test.

Example #8: According to a certain chemical theory the percentage of iron in a certain compound should be
12.1%. It was decided to analyse 9 different samples to see if the measurements differ significantly from
12.1%.

Null hypothesis H0 μ = 12.1%


Alternative hypothesis H1 μ ≠ 12.1%

This is a two-tailed test. For tests based on the Normal distribution, the level of significance of a result in a
two-tailed test can be obtained by doubling the level of significance which would be obtained if a one-tailed test
were carried out. There is an additional complication when the value of σ is unknown and has to be
replaced by s, the sample standard deviation. The test statistic becomes:

    t = (x̄ − μ0)/(s/√n)

When n is large (> 30) the test statistic is well approximated by the standard Normal distribution; however, if
n is small the t-distribution should be used (this then is Student's t-test).

Exemplar #8:
If the results of the nine different samples of iron are as follows

11.7, 12.2, 10.9, 11.4, 11.3, 12.0, 11.1, 10.7, 11.6

Is the sample mean of these measurements significantly different from 12.1%?

Mean = 11.43 s2 = 0.24 s = 0.49

The population standard deviation (σ) is unknown here, therefore we use s:

    t0 = (x̄ − μ0)/(s/√n) = (11.43 − 12.1)/(0.49/√9) = −4.1
This is now a two-tailed test with n − 1 = 8 degrees of freedom. Therefore t0.025,8 = 2.31 (from table) and the
result is significant at the α = 5% level (0.025 is then α/2 in each tail).

Furthermore t0.005,8 = 3.36 (from table), therefore the result is still significant at the 1% level.

Our test statistic t0 is larger in absolute value, |t0|, than either of these numbers (it is negative because
the sample mean is smaller than the hypothesised mean here).

Thus we have fairly conclusive evidence that the percentage of iron is not 12.1% (|t0| > tα/2,n−1).

The above is an example of the critical value (CV) approach. This involves determining "likely" or "unlikely"
by determining whether or not the observed test statistic is more extreme than would be expected if the null
hypothesis were true. That is, it entails comparing the observed test statistic to some cutoff value, called the
"critical value." If the test statistic is more extreme than the critical value (as here), then the null hypothesis is
rejected in favor of the alternative hypothesis. If the test statistic is not as extreme as the critical value, then the
null hypothesis is not rejected. [The calculated t-statistic (t0) is compared against the t-value from the table
(tα/2,n−1; here for a 2-tail test) at the given significance level.] If |t0| < tα/2,n−1 then accept the null hypothesis
at that level of significance. If |t0| > tα/2,n−1 then reject the null hypothesis in favor of the alternative hypothesis.
There are also tables and calculations (e.g. software like MS-Excel) giving the p-value of, say, a t-statistic, i.e.
"the probability of observing a test statistic at least as large as the one calculated, assuming the null
hypothesis is true". Thus if the p-value is less than (or equal to) α, then the null hypothesis is rejected in favor of
the alternative hypothesis; and if the p-value is greater than α, then the null hypothesis is not rejected (cf. if α is
0.05 then this is at the 95% confidence level). https://onlinecourses.science.psu.edu/statprogram/node/138;
http://blog.minitab.com/blog/michelle-paret/alphas-p-values-confidence-intervals-oh-my. For practical
purposes, reject a null hypothesis if the p-value < α (generally 5% or 0.05).
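Exemplar #8 via the critical-value approach, as a sketch (the critical value 2.31 is t0.025,8 from the table):

```python
import math
import statistics

data = [11.7, 12.2, 10.9, 11.4, 11.3, 12.0, 11.1, 10.7, 11.6]   # % iron
mu0 = 12.1

n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)
t0 = (xbar - mu0) / (s / math.sqrt(n))

t_crit = 2.31                  # t_{0.025,8}: two-tailed, alpha = 0.05
reject = abs(t0) > t_crit
print(xbar, s, t0, reject)
```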
Simple Comparative Experiments

Exemplar 9: Tension Strength Data (N/mm2) for Portland Cement

Modified Mortar Unmodified Mortar


j x1j x2j
1 16.85 17.50
2 16.40 17.63
3 17.21 18.25
4 16.35 18.00
5 16.52 17.86
6 17.04 17.75
7 16.96 18.22
8 17.15 17.90
9 16.59 17.96
10 16.57 18.15

We are interested in comparing the strengths of the two mortars.


We could compare their means: x̄1 = 16.76 N/mm², x̄2 = 17.92 N/mm².

But is this (statistically) significant? (i.e. are their means really different or could they each derive from
the same population ?).

Examining box-whisker plots would suggest visually this is significant (as the extreme inter-quartile ranges do
not overlap) here.
[Figure: box-whisker plots of tension strength (16.0 to 18.5 N/mm²) for the modified and standard mortars, N = 10 each; the boxes do not overlap.]

But can we quantify this significance ?


Significance test between two samples (see also end section)

For example, in the Portland cement experiment, we want to test the hypothesis that the mean strengths of the
two mortars are equal. This can be formally stated as:

H0 : μ1 = μ2
H1 : μ1 ≠ μ2

where μ1 is the mean tension bond strength of the modified mortar and μ2 is the mean tension bond strength of
the unmodified mortar. The statement H0: μ1 = μ2 is called the null hypothesis and H1: μ1 ≠ μ2 is called the
alternative hypothesis. The alternative hypothesis specified here is called a two-sided or two-tailed alternative
hypothesis since it would be true either if μ1 < μ2 or if μ1 > μ2.

Two types of errors may generally be committed when testing hypotheses. If the null hypothesis is rejected
when it is true, then a type I error has occurred. If the null hypothesis is not rejected when it is false, then a type
II error has been made.

    α = P(type I error) = P(reject H0 | H0 is true)

    β = P(type II error) = P(fail to reject H0 | H0 is false)

The general procedure in hypothesis testing is to specify a value of the probability of type I error α, often called
the significance level of the test, and then design the test procedure so that the probability of type II error β
has a suitably small value. A test statistic is needed to explore the hypothesis; in the case of these
mortars, assuming that the variances of the two samples are in fact identical, the appropriate test statistic is
given by:

    t0 = (x̄1 − x̄2) / [ sp √(1/n1 + 1/n2) ]

where the pooled variance is

    sp² = [ (n1 − 1)s1² + (n2 − 1)s2² ] / (n1 + n2 − 2)
To determine whether to reject H0: μ1 = μ2, we compare t0 to the t-distribution with n1 + n2 − 2 degrees of
freedom. If |t0| > tα/2,n1+n2−2, where tα/2,n1+n2−2 is the upper α/2 percentage point of the t-distribution with
n1 + n2 − 2 degrees of freedom, we would reject H0 and conclude that the mean strengths of the two
formulations differ. Here,

Modified Mortar          Unmodified Mortar

x̄1 = 16.76               x̄2 = 17.92
s1² = 0.1                s2² = 0.061
s1 = 0.316               s2 = 0.247
n1 = 10                  n2 = 10

sp = 0.284,  t0 = -9.13  and  t0.025, 18 = 2.101

Since t0 = -9.13 < -t0.025, 18 = -2.101, we would reject H0 and conclude that the mean strengths of the two
formulations are indeed different.
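The calculation above can be reproduced from the summary statistics alone. As an illustrative sketch (not part of the original analysis), SciPy's `ttest_ind_from_stats` performs the pooled-variance two-sample t-test:

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics from the Portland cement experiment (n = 10 per group).
t0, p = ttest_ind_from_stats(
    mean1=16.76, std1=0.316, nobs1=10,   # modified mortar
    mean2=17.92, std2=0.247, nobs2=10,   # unmodified mortar
    equal_var=True,                      # pooled-variance (equal variances) t-test
)
print(t0, p)  # t0 ≈ -9.1; p well below 0.05, so reject H0
```

The small differences from the hand calculation (t0 = -9.13) come only from rounding the quoted standard deviations.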

The above assumed equal variances. If this is in doubt, we can also hypothesis test the variances themselves: the
equality of two variances (i.e. the hypothesis σ1² = σ2²; cf. samples from the same population) using an F-test, or
the equality of a variance to a particular value using the χ² distribution (cf. http://www.milefoot.com/math/stat/ht-
variance.htm). Generally, however, we are interested in significance testing the equality of means, as above.
Checking Assumptions in the t-Test
In using the t-test, we make the assumptions that both samples are drawn from independent populations that can
be described by a Normal distribution, that the standard deviations or variances of both populations are equal, and
that the observations are independent random variables. The assumption of independence is critical, and if the
run order is randomised this assumption will be satisfied. The equal-variance and Normality assumptions are
easy to check using a Normal score plot (Normal probability plot).

[Figure: Normal Q-Q plots (expected Normal score vs observed value) for the standard and modified mortar samples.]
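The raw strength observations are not reproduced here, so as an illustrative sketch only (simulated data standing in for the mortar measurements), a Normal probability plot and a companion Shapiro-Wilk test can be produced with SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical stand-in for the modified-mortar data (n = 10, mean/sd from the text).
sample = rng.normal(loc=16.76, scale=0.316, size=10)

# Normal probability (Q-Q) plot coordinates: theoretical quantiles vs ordered data.
(osm, osr), (slope, intercept, r) = stats.probplot(sample, dist="norm")
print(f"Q-Q plot correlation: {r:.3f}")  # close to 1 suggests Normality

# A formal companion check: Shapiro-Wilk test for Normality.
w, p = stats.shapiro(sample)
print(f"Shapiro-Wilk p-value: {p:.3f}")
```

A p-value well above α gives no evidence against the Normality assumption.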

Choice of Sample Size


Selection of an appropriate sample size is one of the most important aspects of any experimental design
problem. The choice of sample size and the probability of type II error β are closely connected. Suppose that
we are testing the hypothesis

H0 : μ1 = μ2
H1 : μ1 ≠ μ2

and that the means are not equal, so that δ = μ1 - μ2 ≠ 0. Since H0: μ1 = μ2 is not true, we are concerned about
wrongly failing to reject H0. The probability of type II error depends on the true difference in means δ. A graph
of β versus δ for a particular sample size is called the operating characteristic curve for the test. The β error is a
function of sample size.
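As a sketch of how sample size follows from α, β and δ (a standard Normal-approximation formula, n ≈ 2σ²(zα/2 + zβ)²/δ² per group, not taken from the text above; the helper name is hypothetical):

```python
import math
from scipy.stats import norm

def sample_size_two_means(delta, sigma, alpha=0.05, power=0.9):
    """Approximate n per group for a two-sided two-sample test of means
    (Normal approximation; hypothetical helper for illustration)."""
    z_a = norm.ppf(1 - alpha / 2)   # z_{alpha/2}
    z_b = norm.ppf(power)           # z_{beta}, where power = 1 - beta
    return math.ceil(2 * (sigma * (z_a + z_b) / delta) ** 2)

# e.g. to detect a difference of one standard deviation with 90% power:
print(sample_size_two_means(delta=1.0, sigma=1.0))  # -> 22
```

Halving the detectable difference δ roughly quadruples the required sample size, which is why the operating characteristic curves steepen quickly with n.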
F-Distribution
If χu² and χv² are two independent chi-square random variables with u and v degrees of freedom, then the ratio

Fu,v = (χu² / u) / (χv² / v)

follows the F distribution with u numerator degrees of freedom and v denominator degrees of freedom.

Given two independent Normal populations with common variance σ², where x11, x12, …, x1n1 is a random
sample of n1 observations from the first population and x21, x22, …, x2n2 is a random sample of n2
observations from the second, then

s1² / s2² ~ Fn1-1, n2-1

where s1² and s2² are the two sample variances.

[Figure: probability density f(x) of the F-distribution for (u, v) = (4, 10), (10, 10) and (10, 30).]

These are the distributions used for (significance) hypothesis testing of variances (cf.
http://www.milefoot.com/math/stat/ht-variance.htm; https://www.statlect.com/fundamentals-of-
statistics/hypothesis-testing-variance; http://www.itl.nist.gov/div898/handbook/eda/section3/eda358.htm).

As with the t-test (here two-tailed), if F1-α/2, n1-1, n2-1 < F0 < Fα/2, n1-1, n2-1 then we fail to reject the null
hypothesis at significance level α; if F0 > Fα/2, n1-1, n2-1 or F0 < F1-α/2, n1-1, n2-1 then we reject the null
hypothesis in favour of the alternative. Most software also reports a p-value: if the p-value is less than (or equal
to) α, the null hypothesis is rejected in favour of the alternative hypothesis; if the p-value is greater than α, the
null hypothesis is not rejected (cf. if α is 0.05 then this is at the 95% confidence level).
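For the mortar data this F-test can be sketched directly from the sample variances. SciPy has no built-in two-sample variance F-test, so as an illustration the ratio and critical values are computed by hand from the F distribution:

```python
from scipy.stats import f

s1_sq, n1 = 0.100, 10   # modified mortar (sample variance, sample size)
s2_sq, n2 = 0.061, 10   # unmodified mortar
alpha = 0.05

F0 = s1_sq / s2_sq                            # test statistic, ~ F(n1-1, n2-1) under H0
lower = f.ppf(alpha / 2, n1 - 1, n2 - 1)      # lower alpha/2 critical value
upper = f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)  # upper alpha/2 critical value
reject = F0 < lower or F0 > upper
print(F0, lower, upper, reject)  # F0 ≈ 1.64 lies inside the acceptance region
```

Here F0 falls between the two critical values, so the equal-variance assumption behind the earlier pooled t-test is not contradicted.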
The Analysis of Variance (ANOVA)
Experiments with a Single Factor

Take an experiment with a treatments, i.e. different levels of a single factor. The data can be represented by a
linear statistical model

xij = μ + τi + εij,   i = 1, 2, …, a;  j = 1, 2, …, n

where xij is the observation for the ith level and jth replication,
μ is the overall mean,
τi is the ith treatment effect, and
εij is a random error.

Format of data for analysis of variance

Treatment
(Level)   Observations          Totals   Averages
1         x11 x12 . . . x1n     x1.      x̄1.
2         x21 x22 . . . x2n     x2.      x̄2.
.         .   .   . . . .       .        .
a         xa1 xa2 . . . xan     xa.      x̄a.
                                x..      x̄..

where xi. = Σj=1..n xij,  x̄i. = xi. / n,  i = 1, 2, …, a

x.. = Σi=1..a Σj=1..n xij,  x̄.. = x.. / N,  N = an
Exemplar #10: Tensile strength of a new fibre

Weight Percent   Observed Tensile Strength   Totals     Averages
of Cotton        1    2    3    4    5       xi.        x̄i.
15               7    7    15   11   9       49         9.8
20               12   17   12   18   18      77         15.4
25               14   18   18   19   19      88         17.6
30               19   25   22   19   23      108        21.6
35               7    10   11   15   11      54         10.8
                                             x.. = 376  x̄.. = 15.04

A box-whisker plot provides useful visual information about the relative strengths.

[Figure: box-whisker plot of tensile strength for each cotton weight percent (15% to 35%, n = 5 per group).]


Analysis of the Statistical Model

The treatment effects τi for the model are usually defined as deviations from the overall mean, so

Σi=1..a τi = 0

The mean of the ith treatment is E(xij) = μi = μ + τi

Thus, the mean of the i-th treatment consists of the overall mean plus the i-th treatment effect. We are
interested in testing the equality of the a treatment means; that is

H0: 1 = 2 = . . . = a

H1: i  j for at least one pair (i, j)

An equivalent way to write the above hypotheses is in terms of the treatments effects i,

H0: 1 = 2 = . . . = a = 0

H1: i   for at least one pair i

Thus we may speak of testing the equality of treatment means or testing that the treatments effects are zero.
The appropriate procedure for testing the equality of a treatments means is the analysis of variance.

Decomposition of the Total Sum of Squares

The name analysis of variance is derived from a partitioning of the total variability into its component parts.

Consider the Analysis of Variance Identity

SST = SSTreatments + SSE

The total corrected sum of squares is given by

SST = Σi=1..a Σj=1..n (xij - x̄..)²

SSTreatments is called the sum of squares due to treatments (i.e. between treatments), and SSE is called the
sum of squares due to error (i.e. within treatments). There are an = N total observations; thus SST has
N-1 degrees of freedom. There are a levels of the factor, so SSTreatments has a-1 degrees of freedom. Finally,
within any treatment there are n replicates providing n-1 degrees of freedom with which to estimate the
experimental error. Since there are a treatments, we have a(n-1) = N - a degrees of freedom for error.
SSTreatments = n Σi=1..a (x̄i. - x̄..)²

with the mean square given by MSTreatments = SSTreatments / (a - 1)

and E(MSTreatments) = σ² + n Σi=1..a τi² / (a - 1)

SSE = Σi=1..a Σj=1..n (xij - x̄i.)²

with the mean square given by MSE = SSE / (N - a)

and E(MSE) = σ²
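These sums of squares can be checked numerically. As a sketch (not part of the original text), NumPy applied to the cotton-strength data from Exemplar #10:

```python
import numpy as np

# Tensile strength data: a = 5 levels of cotton weight percent, n = 5 replicates.
data = np.array([
    [ 7,  7, 15, 11,  9],   # 15%
    [12, 17, 12, 18, 18],   # 20%
    [14, 18, 18, 19, 19],   # 25%
    [19, 25, 22, 19, 23],   # 30%
    [ 7, 10, 11, 15, 11],   # 35%
], dtype=float)

a, n = data.shape
grand_mean = data.mean()            # x-bar..
level_means = data.mean(axis=1)     # x-bar_i.

SST  = ((data - grand_mean) ** 2).sum()                 # total corrected SS
SSTr = n * ((level_means - grand_mean) ** 2).sum()      # between treatments
SSE  = ((data - level_means[:, None]) ** 2).sum()       # within treatments

print(SST, SSTr, SSE)  # 636.96, 475.76, 161.20
```

The three values satisfy the ANOVA identity SST = SSTreatments + SSE and match the table given later in the text.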

SST/σ² is distributed as chi-square with N-1 degrees of freedom.

SSTreatments/σ² is distributed as chi-square with a-1 degrees of freedom if H0 is true.

SSE/σ² is distributed as chi-square with N - a degrees of freedom.

Also SSTreatments/σ² and SSE/σ² are independently distributed chi-square random variables (Cochran’s
Theorem).

Therefore if the null hypothesis of no difference in treatment means is true, the ratio

F0 = [SSTreatments / (a - 1)] / [SSE / (N - a)] = MSTreatments / MSE

is distributed as F with a - 1 and N - a degrees of freedom. F0 is the test statistic for the hypothesis of no
differences in treatment means. From the expected mean squares we see that, in general, MSE is an unbiased
estimator of σ². Also, under the null hypothesis, MSTreatments is an unbiased estimator of σ². However, if the
null hypothesis is false, then the expected value of MSTreatments is greater than σ². This implies an upper-tail,
one-tail critical region. Therefore, we should reject H0 and conclude that there are differences in the
treatment means if

F0 > Fα, a-1, N-a

Source of Variation        Sum of Squares   Degrees of Freedom   Mean Square    F0
Between Treatments         SSTreatments     a - 1                MSTreatments   F0 = MSTreatments / MSE
Error (Within Treatments)  SSE              N - a                MSE
Total                      SST              N - 1

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F0      p-Value
Between               475.76           4                    118.94        14.76   < 0.01
Error                 161.20           20                   8.06
Total                 636.96           24
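The ANOVA table above can be reproduced in one call with SciPy's one-way ANOVA; a sketch using the same cotton data:

```python
from scipy.stats import f_oneway

# Tensile strength replicates for each cotton weight percent.
g15 = [ 7,  7, 15, 11,  9]
g20 = [12, 17, 12, 18, 18]
g25 = [14, 18, 18, 19, 19]
g30 = [19, 25, 22, 19, 23]
g35 = [ 7, 10, 11, 15, 11]

F0, p = f_oneway(g15, g20, g25, g30, g35)
print(f"F0 = {F0:.2f}, p = {p:.2g}")  # F0 ≈ 14.76, p < 0.01: reject H0
```

Since p < 0.01 we reject H0 and conclude that cotton weight percent affects the mean tensile strength.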
Model Adequacy Checking

In the analysis of variance it is assumed that the errors are Normally and independently distributed with
mean zero and constant but unknown variance. If these assumptions are valid, then the analysis of variance
procedure is an exact test of the hypothesis of no difference in treatment means. In practice, however, these
assumptions will usually not hold exactly. Violations in the assumptions can be established from
considering the residuals.

eij = xij - x̄i.

Examination of the residuals should be an automatic part of an analysis. If the model is adequate, the
residuals should be structureless (i.e. no trends)

[Figure: plot of residuals against observation number (ID).]
Analysis of Categorical Data

The expression categorical data refers to observations that are only classified into categories so that the data
set consists of frequency counts for the categories. Such data occur abundantly in almost all fields of
quantitative study. For example in a survey of job compatibility, employed persons may be classified as
being satisfied, neutral or dissatisfied with their jobs; while in plant breeding, the offspring of cross-
fertilization may be grouped into several genotypes.

Exemplar #11: The offspring produced by a cross between given types of plants can be any of the three
genotypes denoted by A, B and C. A theoretical model of gene inheritance suggests that the offspring of
type A, B, and C should be in the ratio 1:2:1. For experimental verification, 100 plants are bred by crossing
the two given types. Their genetic classifications are recorded in the table below. Do these data contradict
the genetic model?

Genotype A B C Total
Observed frequency 18 55 27 100

Let us denote the population proportions or the probabilities of the genotypes A, B, and C by pA, pB and pC,
respectively. Since the genetic model states that these probabilities are in the ratio 1:2:1, our objective is to
test the null hypothesis

H0: pA = 1/4,  pB = 1/2,  pC = 1/4
Here the data consists of frequency counts of a random sample classified in three categories or cells, the null
hypothesis specifies the numerical value of the cell probabilities and we wish to examine if the observed
frequencies contradict the null hypothesis.

Pearson’s 2 test for goodness of fit

Our primary goal is to test if the model given by the null hypotheses fits the data, and this is appropriately
called testing the goodness of fit. For general discussion, suppose a random sample of size n is classified
into k categories or cells labelled 1, 2, … k and let n1, n2, … nk denote the respective cell frequencies. If we
denote the cell probabilities by p1, p2, … pk a null hypothesis that completely specifies the cell probabilities
is of the form

H0: p1 = p10, p2 = p20, … pk = pk0

where p10, p20, … pk0 are given numerical values that satisfy p10 + p20 + … + pk0 = 1.

The expected cell frequencies can be readily computed by multiplying the specified probabilities pi0 by the
sample size n. A goodness of fit test attempts to determine if a conspicuous discrepancy exists between the
observed cell frequencies and those expected under Ho.
Cells 1 2 … k Total
Observed Frequency O n1 n2 … nk n
Probability under H0 p10 p20 … pk0 1
Expected frequency E under H0 np10 np20 … npk0 n

A useful measure for the overall discrepancy between the observed and expected frequencies is given by the
χ² statistic

χ² = Σi=1..k (ni - npi0)² / (npi0) = Σ (O - E)² / E

where O and E symbolize an observed frequency and the corresponding expected frequency.

For the example at the start of the section test the goodness of fit of the genetic model to the experimental
data. Take  = 0.05.

Cells                            A      B      C      Total
Observed Frequency O             18     55     27     100
Probability under H0             0.25   0.5    0.25   1
Expected frequency E under H0    25     50     25     100
(O - E)² / E                     1.96   0.50   0.16   χ² = 2.62  (d.f. = 2)

We use the χ² statistic with rejection if χ² ≥ 5.99, since χ²0.05 = 5.99 with d.f. = 2. Because the observed χ² =
2.62 is smaller than this value, the null hypothesis is not rejected at α = 0.05. We conclude that the data
do not contradict the genetic model.
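This goodness-of-fit calculation is available directly as `scipy.stats.chisquare`; a sketch with the genotype counts:

```python
from scipy.stats import chisquare

observed = [18, 55, 27]
expected = [25, 50, 25]   # n = 100 times the 1:2:1 probabilities

chi2, p = chisquare(observed, f_exp=expected)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # chi2 = 2.62, p ≈ 0.27: do not reject H0
```

The p-value of about 0.27 exceeds α = 0.05, agreeing with the critical-value comparison above.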
Bibliography

Johnson, Richard A.
Miller & Freund's Probability and Statistics for Engineers. 7th Edition.
Prentice Hall College Division. 2005.

Montgomery, Douglas C. and Runger, George C.
Applied Statistics and Probability for Engineers. 3rd Edition.
Wiley, New York. 2003.

Johnson, Richard A. and Bhattacharyya, Gouri K.
Statistics: Principles and Methods.
Wiley, New York. 2005.

Montgomery, Douglas C., Runger, George C. and Hubele, Norma Faris.
Engineering Statistics. 5th Edition.
Wiley, New York. 2012.
