
STAT 408: ANALYSIS OF EXPERIMENTAL DESIGN LECTURE NOTES: ONE-WAY CLASSIFICATION

The One-Way Classification


One-Way Analysis of Variance

1.1 Observational and Experimental Studies


Research studies may often be classified as either observational or experimental, although
some are a mixture of the two.

1.1.1 Observational Studies


In an observational study, data are collected without any attempt to manipulate or
influence the outcome.

For example:
• Fish may be collected from three different regions of a lake, in order to compare their weights over the three locations.
• Children from three different schools may be compared for their performance on an achievement test.
• Households from three suburbs are surveyed to compare their incomes and political opinions.

1.1.2 Experimental Studies


In experiments, some manipulation is usually attempted in order to see whether the outcome is related to the factor being controlled.

For example:
• Twenty plots of carrots are grown in a field. Each plot is randomly allocated to one of five fertilizers, with four plots for each fertilizer. At the end of the experiment, the carrots from each plot are weighed. The yield of carrots with different fertilizers is being studied.

• Twenty children from a class are each randomly assigned to one of five different teaching methods, four children to each method. After three weeks of teaching, each child is tested for understanding of the material taught. The different teaching methods are being compared.

• People with a certain disease are randomly allocated to three different drugs. The drugs are being compared for their influence on the progress of the disease.

The goal of a study is to find out the relationships between certain explanatory factors
and response variables.

The nature of the study matters when it comes to interpretation of results.


• An experimental study aims to answer the question of whether there is a cause-and-effect relationship between the explanatory factor and the response variable.

• An observational study usually can only answer whether there is an association between the explanatory factor and the response variable. In general, external evidence is required to rule out alternative explanations before a cause-and-effect relationship can be claimed.

Regression and ANOVA Models


Regression models and ANOVA models can be used for both observational and experimental data.
– It is much easier to use regression methods for observational data, in particular when variable selection is an issue.
– In many ways an ANOVA framework is easier to utilize for experiments.

• Regression models can include both qualitative and quantitative explanatory variables.
– Regression models assume that there is some sort of linear relationship between the quantitative explanatory variables (or transformations of them) and the response.

• Analysis of variance (ANOVA) models assume all explanatory variables (quantitative and qualitative) enter the model as qualitative variables.
– Quantitative explanatory variables are normally converted to qualitative explanatory variables.
– There are no assumptions about the nature of the statistical relation between the explanatory variables and the response.

• Effectively, there is no difference between ANOVA models and regression models with qualitative explanatory variables.


Analysis of Variance
• We must consider the method of analysis when designing a study. The method of analysis depends on the nature of the data and the purpose of the study.

• Analysis of variance, ANOVA, is a statistical procedure for analyzing continuous data sampled from two or more populations, or from experiments in which two or more treatments are used. It extends the two-sample t-test to compare the means from more than two groups.

• ANOVA is typically used when the effects of one or more explanatory variables are of interest.

• The goal of ANOVA is to determine whether there is a difference between the mean responses associated with the factor levels or treatments and, if there is, to determine the nature of the difference.

Basic Concepts
• We shall start with a simple real-life problem that many of us face.
• Nowadays most of us use gas for cooking purposes. Most gas users are customers of gas companies.
• The customers get their refills (filled gas cylinders) through the agents of these companies.
• One of the customers, Mrs. Mensah, who buys her gas from an ABC gas agent, has faced a problem in the recent past.
• She observed that her cylinders were not lasting as long as they used to.
• So she suspected that the amount of gas in the refills was less than what she used to get in the past. She knew that she was supposed to get 14.2 kg of gas in every refill.
• She explained her problem to the customers' complaints section of the ABC gas company.
• Subsequently, the company made a surprise check on an ABC agent.
• They took 25 cylinders that were being supplied to customers from this agency and measured the amount of gas in each of these cylinders.
• The 25 observations were statistically analyzed, and through a simple test of hypothesis it was inferred that the mean amount of gas in the cylinders supplied by the ABC agent was significantly lower than 14.2 kg.


• On investigation, it was revealed that the agent was tapping gas from cylinders before they were supplied to the customers.
• There were five agents of the company in the town where Mrs. Mensah was living.
• To protect customers' interests, the company decided to carry out surprise checks on all the agents from time to time.
• During each check, they picked 7 cylinders at random from each of the five agents, resulting in the data given in the table below. Is it possible to test from these data whether the mean amount of gas per cylinder differs from agent to agent?
• It is possible to carry out a simple test of hypothesis for each of the agents separately, but there is a better statistical procedure that tests all of them simultaneously, as sketched below. We shall see how this can be done.

Source of Variation
• You know that variation is inevitable in almost all the variables (measurable characteristics) that we come across in practice.
• For example, the amount of gas in two refills is not the same, irrespective of whether the gas is tapped or not.
• Consider the data in the table below.
• We have the weights of gas in 35 cylinders taken at random, seven from each of the five agents.
• These 35 weights exhibit variation. You will agree that some of the possible reasons for this variation are one or more of the following:
  – The gas refilling machine at the company does not fill every cylinder with exactly the same amount of gas.
  – There may be a leakage problem in some of the cylinders.
  – The agents might have tapped gas from some of these cylinders.
  – Not all 35 cylinders were filled by the same filling machine.

• Thus, the variation in the 35 weights might have come from different sources.
• Though the variation is attributable to several sources, depending upon the situation we will be interested in analyzing whether most of this variation can be due to differences in one (or more) of the sources.


For instance, in the above example, the company will be interested in identifying whether there are any differences among the agents. So the source of variation of interest here is AGENTS. In other words, we are interested in one-factor, or one-way, analysis of variance.

• Now that you know what a source of variation is, you can think of different types of sources.
• In the gas company example, agents form one type of source.
• If the cylinders under consideration were refilled by different filling machines, then filling machines would be another type of source of variation.

When the data are classified with respect to only one type of source of variation, we say that we have one-way classification data.
In many situations, one conducts experiments to study the effect of a single factor on a variable under study. Such experiments, known as one-factor experiments, lead to one-way classification data.

Classification of Data
The process of arranging data into homogeneous groups or classes according to some common characteristic present in the data is called classification.
For example: in sorting letters at a post office, the letters are classified according to cities and further arranged according to streets.

Types of Classification:
(1) One-way Classification:
If we classify observed data keeping in view a single characteristic, this type of classification is known as one-way classification.
(2) Two-way Classification:
If we consider two characteristics at a time in order to classify the observed data, then we are doing a two-way classification.


(3) Multi-way Classification:
If we consider more than two characteristics at a time to classify the given or observed data, we are dealing with multi-way classification.
For example: the population of the world may be classified by religion, sex and literacy.

Single-Factor Experiments
• We generally classify scientific experiments into two broad categories, namely single-factor experiments and multi-factor experiments.

• Definition: Whenever an experimenter is concerned with comparing the means/effects of a single factor having at least 3 levels, whether the levels are (i) quantitative or qualitative, or (ii) fixed or random, the experiment is referred to as a single-factor experiment.
• In a single-factor experiment, only one factor varies while others are kept constant.
• In these experiments, the treatments consist solely of different levels of the single variable factor.
• If there is only one factor, and if the response variable is continuous and satisfies a few other conditions to be discussed later, then the statistical analysis of the experimental data is done by one-way analysis of variance.
• In multi-factor experiments (also referred to as factorial experiments), two or more factors vary simultaneously.

In single-factor experiments the response variable Y is continuous.

There are two key differences regarding the explanatory variable X:

1. It is a qualitative variable (e.g. gender, location, etc.). Instead of calling it an explanatory variable, we now refer to it as a factor.
2. No assumption (e.g. a linear relationship) is made about the nature of the relationship between X and Y. Rather, we attempt to determine whether the response differs significantly at different levels of X.

We will consider two single-factor ANOVA models:

• Model I: This is a model where the factor levels are fixed by the researcher. Conclusions will pertain only to the means associated with each of the fixed factor levels.


• Model II: This is a model where the factor levels are random; that is, the levels are randomly selected by the researcher from a population of factor levels. Conclusions will extend to the population of factor levels.

Fixed Factors Model (Model I)

There are two ways of parameterizing the model:


1. Cell means model
2. Factor effects model

Notation

$X$ (or $A$) is the qualitative factor.
• $r$ (or $a$, or $k$) is the number of levels.
• We often refer to these as groups or treatments.

$Y$ is the continuous response variable.
• $y_{ij}$ is the $j$th observation in the $i$th group.

$i = 1, 2, \ldots, k$ indexes the levels of the factor $X$.
$j = 1, 2, \ldots, n_i$ indexes the observations at factor level $i$.

The total number of observations is $N = \sum_{i=1}^{k} n_i$.

In general, we have a single factor with $k \geq 2$ levels (treatments) and $n_i$ replicates for each treatment.

Cell Means Model


$$ y_{ij} = \mu_i + \varepsilon_{ij} $$
where
• $y_{ij}$ is the $j$th observation on treatment $i$,
• $\mu_i$ is the theoretical mean of all observations at level $i$,
• $\varepsilon_{ij}$ is a random deviation of $y_{ij}$ about the $i$th mean $\mu_i$; $\varepsilon_{ij}$ is called the random error.

Model Assumptions
• $\varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$
• $y_{ij} \overset{iid}{\sim} N(\mu_i, \sigma^2)$


Parameters
The parameters of the model are $(\mu_1, \mu_2, \ldots, \mu_k, \sigma^2)$.

Estimates

For each level $i$, obtain an estimate of the variance:
$$ s_i^2 = \frac{\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2}{n_i - 1} $$

Estimate $\mu_i$ by the mean of the observations at level $i$. That is,
$$ \hat{\mu}_i = \bar{y}_{i\cdot} = \frac{\sum_{j=1}^{n_i} y_{ij}}{n_i} $$

We combine the $s_i^2$ to get an estimate of $\sigma^2$ in the following way.

Pooled Estimate of $\sigma^2$
The pooled estimate is
$$ s^2 = \frac{\sum_{i=1}^{k} (n_i - 1) s_i^2}{\sum_{i=1}^{k} (n_i - 1)} = \frac{\sum_{i=1}^{k} (n_i - 1) s_i^2}{N - k} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2}{N - k} = MSE $$

In the special case that there are an equal number of observations per group ($n_i = n$), then $N = nk$ and this becomes
$$ s^2 = \frac{(n-1)\sum_{i=1}^{k} s_i^2}{nk - k} = \frac{1}{k}\sum_{i=1}^{k} s_i^2, $$
a simple average of the $s_i^2$.
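A minimal Python sketch of the pooled estimate above, using the first three cotton-content groups from Example 1 (later in these notes) as illustrative data:

```python
# A small sketch of the pooled estimate of sigma^2 (the per-level sample
# variances combined with weights n_i - 1).
import numpy as np

groups = [np.array([7.0, 7, 15, 11, 9]),      # illustrative samples, one array per level
          np.array([12.0, 17, 12, 18, 18]),
          np.array([14.0, 18, 18, 19, 19])]

n_i  = np.array([g.size for g in groups])
s2_i = np.array([g.var(ddof=1) for g in groups])       # per-level sample variances s_i^2

pooled = np.sum((n_i - 1) * s2_i) / np.sum(n_i - 1)     # = MSE
print("per-level variances:", s2_i.round(3), " pooled s^2 =", round(pooled, 3))
```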

Hypothesis Tests
The hypothesis that all treatments are equally effective becomes:

$$ H_0: \mu_1 = \mu_2 = \cdots = \mu_k \quad \text{(all means are equal)} \quad \text{vs} $$
$$ H_1: \mu_i \neq \mu_j \text{ for at least one pair } i, j \quad \text{(not all the means are equal)} $$

Factor Effects Model


An equivalent form of the model:

Effects Model:
$$ y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, k; \; j = 1, 2, \ldots, n_i $$
where $\sum_{i=1}^{k} \tau_i = 0$ (balanced design) or $\sum_{i=1}^{k} n_i \tau_i = 0$ (unbalanced design).


• $\mu$ is the "weighted" or overall mean of the treatment means.

• $\tau_i$ is the treatment effect (deviation up or down from the grand mean) of the $i$th treatment and is defined to be $\tau_i = \mu_i - \mu$.
• $\tau_i$ can be thought of as the average effect that factor level $i$ has on the overall mean.

• Another interpretation is to think of $\tau_i$ as an adjustment that needs to be made to the overall mean, given that you know the data come from factor level $i$.

Parameters
The parameters of the factor effects model are $(\mu, \tau_1, \tau_2, \ldots, \tau_k, \sigma^2)$. There are $k + 2$ of these.

Estimation of Model Parameters


We now wish to estimate the model parameters, based on the effects model (P, W i , V ). The
2

most popular method of estimation is the method of least squares (LS) which determines the
estimators of P and W i by minimizing the sum of squares of the errors.
k ni k ni
L ∑∑ H
i 1 j 1
2
ij ∑∑ ( y
i 1 j 1
ij  P  W i )2

We use the “^” (hat) notation to represent least squares estimators, as well as, predicted (or
fitted) values.
k
Minimization of L via partial differentiation (with the zero-sum constraint ∑W
i 1
i 0 ) provides

the estimates:
k ni

∑∑ y ij
yyy
P̂ i 1 j 1
yyy
N N

Wˆi yiy  yyy for i=1,…,k,

Hˆij eij yij  yiy


yˆ ij Pˆ  Wˆi = ŷ ij y iy

Hˆij eij yij  yˆ ij yij  yi y

Pˆ i Pˆ  Wˆi yyy  yiy  yyy yiy
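A short sketch of these least squares estimates, assuming the same illustrative groups as in the pooled-variance sketch above (a list of NumPy arrays, one per factor level):

```python
# A sketch of the least squares estimates for the factor effects model.
import numpy as np

groups = [np.array([7.0, 7, 15, 11, 9]),
          np.array([12.0, 17, 12, 18, 18]),
          np.array([14.0, 18, 18, 19, 19])]

y_all   = np.concatenate(groups)
mu_hat  = y_all.mean()                                      # grand mean
tau_hat = np.array([g.mean() - mu_hat for g in groups])     # treatment effects tau_i_hat
fitted  = [np.full_like(g, g.mean()) for g in groups]       # y_hat_ij = ybar_i.
resid   = [g - g.mean() for g in groups]                    # e_ij = y_ij - ybar_i.

print("mu_hat =", round(mu_hat, 3), " tau_hat =", tau_hat.round(3))
```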


Proof
Consider the fixed effects one-way ANOVA model
$$ y_{ij} = \mu + \tau_i + \varepsilon_{ij} \qquad (i = 1, \ldots, k; \; j = 1, \ldots, n_i) $$
where $\mu$ and $\tau_i$ are fixed, but unknown, parameters and the $\varepsilon_{ij}$'s are independent random variables with $E(\varepsilon_{ij}) = 0$ and $Var(\varepsilon_{ij}) = \sigma^2$.

The least squares estimators, $\hat{\mu}$ and $\hat{\tau}_i$, of the parameters $\mu$ and $\tau_i$ are obtained by minimizing the sum of squares of the errors ($\varepsilon_{ij}$'s).

We have $\varepsilon_{ij} = y_{ij} - \mu - \tau_i$.
Let the sum of squared errors be
$$ L = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \varepsilon_{ij}^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \mu - \tau_i)^2 $$

Mathematically, we want to find $\hat{\mu}, \hat{\tau}_1, \ldots, \hat{\tau}_k$ that minimize
$$ L = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \hat{\varepsilon}_{ij}^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \hat{\mu} - \hat{\tau}_i)^2 $$

A solution can be found by using the normal equations, which are obtained by setting the partial derivatives equal to 0 and solving:
$$ \frac{\partial L}{\partial \hat{\mu}} = -2 \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \hat{\mu} - \hat{\tau}_i) \qquad (1) $$
$$ \frac{\partial L}{\partial \hat{\tau}_i} = -2 \sum_{j=1}^{n_i} (y_{ij} - \hat{\mu} - \hat{\tau}_i), \quad i = 1, \ldots, k \qquad (2) $$

Setting (1) equal to zero gives
$$ \frac{\partial L}{\partial \hat{\mu}} = -2 \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \hat{\mu} - \hat{\tau}_i) = 0 $$


$$ \Rightarrow \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \hat{\mu} + \sum_{i=1}^{k} \sum_{j=1}^{n_i} \hat{\tau}_i $$
$$ \Rightarrow y_{\cdot\cdot} = N\hat{\mu} + \sum_{i=1}^{k} n_i \hat{\tau}_i \qquad (3) $$
where $N = \sum_{i=1}^{k} n_i$.

Setting each of the equations in (2) equal to zero, the least squares estimators $\hat{\tau}_i$ for $i = 1, \ldots, k$ are given by
$$ \frac{\partial L}{\partial \hat{\tau}_i} = -2 \sum_{j=1}^{n_i} (y_{ij} - \hat{\mu} - \hat{\tau}_i) = 0, \quad i = 1, \ldots, k $$
$$ \Rightarrow \sum_{j=1}^{n_i} y_{ij} = \sum_{j=1}^{n_i} \hat{\mu} + \sum_{j=1}^{n_i} \hat{\tau}_i $$
$$ \Rightarrow y_{i\cdot} = n_i \hat{\mu} + n_i \hat{\tau}_i \quad \text{for } i = 1, \ldots, k \qquad (4) $$

There is no unique solution to these equations, as they are not linearly independent (summing the equations in (4) over $i$ gives equation (3)). To get unique solutions for $\hat{\mu}$ and $\hat{\tau}_i$ we impose the constraint
$$ \sum_{i=1}^{k} n_i \hat{\tau}_i = 0 $$

Substituting the constraint into (3) yields $y_{\cdot\cdot} = N\hat{\mu}$, or $\hat{\mu} = \dfrac{y_{\cdot\cdot}}{N} = \bar{y}_{\cdot\cdot}$.

Thus $y_{i\cdot} = n_i\hat{\mu} + n_i\hat{\tau}_i$ becomes $y_{i\cdot} = n_i\bar{y}_{\cdot\cdot} + n_i\hat{\tau}_i$.

Solving for $\hat{\tau}_i$ yields $\hat{\tau}_i = \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}$ for $i = 1, \ldots, k$.


Hypothesis Tests
• The cell means model hypotheses were
$$ H_0: \mu_1 = \mu_2 = \cdots = \mu_k $$
$$ H_1: \mu_i \neq \mu_j \text{ for at least one pair } i, j \text{ (not all the } \mu_i \text{ are equal)} $$
• For the factor effects model these translate to
$$ H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0 $$
$$ H_1: \tau_i \neq 0 \text{ for at least one } i $$

Thus, the one-way ANOVA for testing the equality of treatment effects is identical to the ANOVA for testing the equality of treatment means.

Sample Layout
The typical data layout for a one-way ANOVA is shown below:

Treatment 1:  $y_{11}, y_{12}, \ldots, y_{1n_1}$   (sum $y_{1\cdot}$, mean $\bar{y}_{1\cdot}$)
Treatment 2:  $y_{21}, y_{22}, \ldots, y_{2n_2}$   (sum $y_{2\cdot}$, mean $\bar{y}_{2\cdot}$)
  ⋮
Treatment k:  $y_{k1}, y_{k2}, \ldots, y_{kn_k}$   (sum $y_{k\cdot}$, mean $\bar{y}_{k\cdot}$)

Some more Notation


$$ y_{\cdot\cdot} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij} \quad \text{(grand sum of all observations)} $$
$$ \bar{y}_{\cdot\cdot} = \frac{y_{\cdot\cdot}}{\sum_{i=1}^{k} n_i} = \frac{y_{\cdot\cdot}}{N} \quad \text{(grand mean)} $$
$$ y_{i\cdot} = \sum_{j=1}^{n_i} y_{ij} \quad \text{($i$th treatment sample sum)} $$
$$ \bar{y}_{i\cdot} = \frac{y_{i\cdot}}{n_i} \quad \text{($i$th treatment mean)} $$


Decomposition of the Total Deviation


Decomposition of $y_{ij}$
For any observed value $y_{ij}$ we can write:
$$ y_{ij} = \bar{y}_{\cdot\cdot} + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) + (y_{ij} - \bar{y}_{i\cdot}) $$
or
$$ y_{ij} - \bar{y}_{\cdot\cdot} = (y_{ij} - \bar{y}_{i\cdot}) + (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}) $$

Decomposition of Total Sum of Squares (SST)


The total (corrected) sum of squares is given by
$$ SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\cdot\cdot})^2, $$
which is a measure of the total variability in the data.

Notice that the total sum of squares, SST, may be decomposed as
$$ SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot} + \bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 $$
$$ = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 + 2\underbrace{\sum_{i=1}^{k} \sum_{j=1}^{n_i} (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})(y_{ij} - \bar{y}_{i\cdot})}_{=\,0} $$
$$ = \sum_{i=1}^{k} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 + \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 $$

Expressing the above sums of squares symbolically, we have:

SST = SSTR + SSE

Breakdown of Degrees of freedom:

SST has $N - 1$ d.f.; SSTR has $k - 1$ d.f.; and SSE has $N - k$ d.f.; so we also have a decomposition of the total d.f.:

d.f. Total = d.f. Treatment + d.f. Error
$$ N - 1 = (k - 1) + (N - k) $$

The degrees of freedom (d.f.) for a sum of squares counts the number of independent pieces of information that go into that quantification of variability.


Notice that
$$ SSE = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2 = \sum_{i=1}^{k} (n_i - 1)s_i^2, $$
where $s_i^2$ is the sample variance within the $i$th treatment, so
$$ MSE = \frac{SSE}{\sum_{i=1}^{k} (n_i - 1)} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{(n_1 - 1) + \cdots + (n_k - 1)} = s_p^2, $$
the pooled estimate of $\sigma^2$ (which reduces to the familiar two-sample pooled variance when $k = 2$).

Computational Formulae
We have defined SST, SSTR and SSE as sums of squared deviations. Equivalent formulas for SST and SSTR for computational purposes are as follows:
$$ SST = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}^2 - \frac{y_{\cdot\cdot}^2}{N} $$
$$ SSTR = \sum_{i=1}^{k} n_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 = \sum_{i=1}^{k} \frac{y_{i\cdot}^2}{n_i} - \frac{y_{\cdot\cdot}^2}{N} $$
SSE is computed by subtraction: SSE = SST − SSTR.
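A sketch of these computational formulae as a small helper function; `sums_of_squares` is a hypothetical name introduced here, not part of any library.

```python
# A sketch of the shortcut formulae for SST, SSTR and SSE,
# assuming `groups` is a list of 1-D arrays, one per treatment.
import numpy as np

def sums_of_squares(groups):
    y = np.concatenate(groups)
    N = y.size
    correction = y.sum() ** 2 / N                         # y..^2 / N
    sst  = np.sum(y ** 2) - correction                    # total SS
    sstr = sum(g.sum() ** 2 / g.size for g in groups) - correction   # treatment SS
    sse  = sst - sstr                                     # error SS by subtraction
    return sst, sstr, sse
```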

Mean Squares
The ratios of sums of squares to their degrees of freedom result in mean squares.

• MSTR, the treatment mean square, is defined as follows: MSTR = SSTR/(k − 1)
• MSE, the mean square error, is defined as follows: MSE = SSE/(N − k)

Expected Mean Squares


If $\sigma^2$ represents the variance associated with the random errors $\varepsilon_{ij}$, then it can be shown that, in general,
$$ E(MSTR) = \sigma^2 + \frac{\sum_{i=1}^{k} n_i \tau_i^2}{k-1} \quad \text{or} \quad E(MSTR) = \sigma^2 + \frac{\sum_{i=1}^{k} n_i (\mu_i - \mu)^2}{k-1}, $$
where
$$ \mu = \frac{n_1\mu_1}{N} + \frac{n_2\mu_2}{N} + \cdots + \frac{n_k\mu_k}{N} = \sum_{i=1}^{k} \frac{n_i\mu_i}{N} \quad \text{and} \quad \tau_i = \mu_i - \mu, $$
$$ E(MSE) = \sigma^2. $$


The F-test

• Under $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$, or equivalently $H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$,
$$ E(MSE) = \sigma^2 = E(MSTR), $$
since
$$ E(MSTR) = \sigma^2 + \frac{\sum_{i=1}^{k} n_i \tau_i^2}{k-1} = \sigma^2 + 0 = \sigma^2. $$

• Therefore, if $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ (equivalently $H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$) is true, then MSE and MSTR both estimate $\sigma^2$.

• Therefore, under $H_0$,
$$ F = \frac{MSTR}{MSE} = \frac{SSTR/(k-1)}{SSE/(N-k)} \sim F_{k-1,\,N-k}, $$
and the test statistic becomes an F-test.

• We reject $H_0$ for large values of the F-ratio in comparison to an $F_{k-1,\,N-k}$ distribution.

Logic behind the F-test

If $H_0$ is true, $F = \dfrac{MSTR}{MSE}$ should be close to 1.
However, when $H_0$ is false it can be shown that MSTR estimates something larger than $\sigma^2$ (i.e. $E(MSTR) > E(MSE)$ when some treatment means are different, or if real treatment effects do exist).
• That is,
$$ \frac{MSTR}{MSE} = \begin{cases} \dfrac{\text{estimator of something larger than } \sigma^2}{\text{estimator of } \sigma^2}, & \text{if } H_0 \text{ is false} \\[2ex] \dfrac{\text{estimator of } \sigma^2}{\text{estimator of } \sigma^2}, & \text{if } H_0 \text{ is true} \end{cases} $$

• If $\dfrac{MSTR}{MSE} \gg 1$ then it makes sense to reject $H_0$.
• Therefore, to determine whether $H_0$ is true or not, we look at how much larger than 1 the ratio MSTR/MSE is.


The test procedure may be summarized in an ANOVA table as follows:

Source       Degrees of Freedom   Sum of Squares   Mean Squares            F
Treatment    k − 1                SSTR             MSTR = SSTR/(k − 1)     MSTR/MSE
Error        N − k                SSE              MSE = SSE/(N − k)
Total        N − 1                SST
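A sketch that assembles this table numerically from a list of arrays (one per treatment); `anova_table` is a hypothetical helper name introduced here for illustration.

```python
# A sketch of the one-way ANOVA table: SS, df, MS, F and p-value.
import numpy as np
from scipy import stats

def anova_table(groups):
    """Build the one-way ANOVA table for a list of 1-D arrays (one per level)."""
    y = np.concatenate(groups)
    k, N = len(groups), y.size
    correction = y.sum() ** 2 / N
    sst  = np.sum(y ** 2) - correction
    sstr = sum(g.sum() ** 2 / g.size for g in groups) - correction
    sse  = sst - sstr
    mstr, mse = sstr / (k - 1), sse / (N - k)
    f = mstr / mse
    p = stats.f.sf(f, k - 1, N - k)                # upper-tail p-value of F_{k-1, N-k}
    print(f"{'Source':<10}{'df':>5}{'SS':>12}{'MS':>12}{'F':>10}{'p':>10}")
    print(f"{'Treatment':<10}{k - 1:>5}{sstr:>12.3f}{mstr:>12.3f}{f:>10.3f}{p:>10.4f}")
    print(f"{'Error':<10}{N - k:>5}{sse:>12.3f}{mse:>12.3f}")
    print(f"{'Total':<10}{N - 1:>5}{sst:>12.3f}")
    return f, p
```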

Comparison of factor level means/effects


• A confidence interval for one mean $\mu_i$ is based on $\bar{y}_{i\cdot}$, whose variance is estimated by $MSE/n_i$. This results in:
$$ CI: \; \bar{y}_{i\cdot} \pm t_{\alpha/2,\,N-k}\sqrt{MSE/n_i} $$
• Similarly, a confidence interval for one difference $\mu_i - \mu_j = \tau_i - \tau_j$ is
$$ CI: \; \bar{y}_{i\cdot} - \bar{y}_{j\cdot} \pm t_{\alpha/2,\,N-k}\sqrt{MSE\left(\tfrac{1}{n_i} + \tfrac{1}{n_j}\right)} $$

Suppose that, following the ANOVA F test (for treatments) in which the null hypothesis
$$ H_0: \mu_1 = \mu_2 = \cdots = \mu_k $$
is rejected, we wish to determine which means can be considered significantly different from each other. That is, we wish to test
$$ H_0: \mu_i = \mu_j \quad \text{vs} \quad H_1: \mu_i \neq \mu_j \quad \text{for } 1 \leq i < j \leq k. $$

This could be done using the t statistic
$$ t = \frac{\bar{y}_{i\cdot} - \bar{y}_{j\cdot}}{\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}} $$
and comparing it to $t_{\alpha/2,\,N-k}$.

An equivalent test declares $\mu_i$ and $\mu_j$ to be significantly different if $|\bar{y}_{i\cdot} - \bar{y}_{j\cdot}| > LSD$,
where
$$ LSD = t_{\alpha/2,\,N-k}\sqrt{MSE\left(\frac{1}{n_i} + \frac{1}{n_j}\right)} $$
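A sketch of the LSD procedure for all pairs of factor levels, assuming the data are held as a list of arrays (one per level); `lsd_pairs` is a hypothetical helper name.

```python
# A sketch of least significant difference (LSD) pairwise comparisons.
from itertools import combinations
import numpy as np
from scipy import stats

def lsd_pairs(groups, alpha=0.05):
    k = len(groups)
    N = sum(g.size for g in groups)
    mse = sum((g.size - 1) * g.var(ddof=1) for g in groups) / (N - k)   # pooled MSE
    t_crit = stats.t.ppf(1 - alpha / 2, N - k)                          # t_{alpha/2, N-k}
    for i, j in combinations(range(k), 2):
        diff = abs(groups[i].mean() - groups[j].mean())
        lsd = t_crit * np.sqrt(mse * (1 / groups[i].size + 1 / groups[j].size))
        verdict = "significant" if diff > lsd else "not significant"
        print(f"levels {i + 1} vs {j + 1}: |diff| = {diff:.3f}, LSD = {lsd:.3f}, {verdict}")
```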


Random Effects Model for One-way ANOVA (ANOVA Model II)


• So far we have studied experiments and models with only fixed effect factors: factors whose levels have been specifically fixed (in advance) by the experimenter, and where the interest is in comparing the response for just these fixed levels.

• A random effect factor is one that has many possible levels, and where the interest is in the variability of the response over the entire population of levels, but we only include a random sample of levels in the experiment.

The factor levels are meant to be representative of a general population of possible levels. We are interested in whether that factor has a significant effect in explaining the response, but only in a general way. For example, we are not interested in a detailed comparison of level 2 vs. level 3, say.

Examples: Classify as fixed or random effect.


1. The purpose of the experiment is to compare the effects of three specific dosages of a
drug on response.
2. A textile mill has a large number of looms. Each loom is supposed to provide the
same output of cloth per minute. To check whether this is the case, five looms are
chosen at random and their output is noted at different times.
3. A manufacturer suspects that the batches of raw material furnished by his supplier
differ significantly in zinc content. Five batches are randomly selected from the
warehouse and the zinc content of each is measured.
4. Four different methods for mixing Portland cement are economical for a company to
use. The company wishes to determine if there are any differences in tensile strength
of the cement produced by the different mixing methods.
5. A drug company has its products manufactured in a large number of locations, and
suspects that the purity of the product might vary from one location to another.
Three locations are randomly chosen, and several samples of product from each are
selected and tested for purity.


Random effects model


Suppose, as before, that there are k treatments (factor levels) or groups, and that yij is the jth
observation in the ith group.

The mathematical representation of the model is the same as the fixed effects model:

$$ y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \ldots, k; \; j = 1, \ldots, n_i $$

where $y_{ij}$, $\tau_i$ and $\varepsilon_{ij}$ are random variables and $\mu$ is an unknown fixed parameter, the overall mean.

Model Assumptions
1. The $\varepsilon_{ij}$'s (random errors) come independently from a $N(0, \sigma^2)$ distribution [i.e. $\varepsilon_{ij} \overset{iid}{\sim} N(0, \sigma^2)$].

2. The random effects $\tau_i$'s are independent random variables with the same distribution $N(0, \sigma_\tau^2)$ [i.e. we assume that $\tau_1, \tau_2, \ldots, \tau_k \overset{iid}{\sim} N(0, \sigma_\tau^2)$].

3. $\tau_i$ and $\varepsilon_{ij}$ are independent of each other for all $i = 1, \ldots, k$ and $j = 1, \ldots, n_i$.

Variance components

• In the random effects model, the variance of $y_{ij}$ is no longer just $\sigma^2$. The equation for $y_{ij}$ now has two random variables on the right: the residual unexplained variability $\sigma^2$ as before, plus the variability from randomly selecting $\tau_i$ from a $N(0, \sigma_\tau^2)$ distribution.

That is:
$$ Var(y_{ij}) = Var(\mu + \tau_i + \varepsilon_{ij}) = Var(\tau_i) + Var(\varepsilon_{ij}) = \sigma_\tau^2 + \sigma^2 $$

The two variances $\sigma_\tau^2$ and $\sigma^2$ are called variance components (or components of variance), as the variance of one observation is equal to $\sigma_\tau^2 + \sigma^2$.

• Further, it can be shown that
$$ E(y_{ij}) = \mu, \quad Var(y_{ij}) = \sigma^2 + \sigma_\tau^2, \quad \text{i.e.} \quad y_{ij} \sim N(\mu, \sigma_\tau^2 + \sigma^2) $$

These two components may be estimated from the MS column of the ANOVA table.


Hypotheses
For the random-effects model, testing hypotheses about the individual treatment effects is meaningless. It is more appropriate to test hypotheses about the variance component $\sigma_\tau^2$. Since we are interested in the larger population of treatments, the hypotheses of interest associated with the random effects $\tau_i$ are:
$$ H_0: \sigma_\tau^2 = 0 \quad \text{vs} \quad H_1: \sigma_\tau^2 > 0 $$

• If $\sigma_\tau^2 = 0$, then all random treatment effects are identical, but
• if $\sigma_\tau^2 > 0$, significant variability exists among the randomly selected treatments (that is, the variability observed among the randomly selected treatments is significantly larger than the variability that can be attributed to random error).

Expected mean squares (EMS)


The expected values of the mean squares for treatments and error are somewhat different than in the fixed-effects case.

Balanced design
In the case of a balanced design, with k treatments and n observations per treatment (so N = kn), there are simple formulae for the expected mean squares.
• The expected value of MSE (mean square error) is $\sigma^2$. This holds regardless of the value of $\sigma_\tau^2$.
• Under the alternative hypothesis, $\sigma_\tau^2 > 0$, and for $n_i = n$ the expected value of MSTR (mean square for treatments) is $\sigma^2 + n\sigma_\tau^2$:
$$ E(MSTR) = E\left(\frac{SSTR}{k-1}\right) = \sigma^2 + n\sigma_\tau^2. $$

Unbalanced design
For unequal sample sizes (i.e. unequal $n_i$'s; an unbalanced design), $n$ is replaced by $n_0$, where
$$ n_0 = \frac{1}{k-1}\left[ \sum_{i=1}^{k} n_i - \frac{\sum_{i=1}^{k} n_i^2}{\sum_{i=1}^{k} n_i} \right] $$
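A one-line sketch of the $n_0$ adjustment, assuming only the list of group sizes is available; `n_zero` is a hypothetical helper name.

```python
# A sketch of the n0 adjustment used for unbalanced random-effects designs.
import numpy as np

def n_zero(n_i):
    n_i = np.asarray(n_i, dtype=float)
    k = n_i.size
    return (n_i.sum() - (n_i ** 2).sum() / n_i.sum()) / (k - 1)

print(n_zero([7, 7, 7, 7, 7]))   # balanced: n0 equals the common n (= 7)
print(n_zero([5, 7, 9]))         # unbalanced illustration
```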


ANOVA Decomposition
• The ANOVA decomposition of total variability is still valid;
• that is, the ANOVA identity is still SST = SSTR + SSE, as for the fixed effects model, and the formulae for computing the sums of squares remain unchanged.
• The computational procedure and construction of the ANOVA table for the random effects model are identical to the fixed-effects case.
The conclusions, however, are quite different because they apply to the entire population of treatments.

ANOVA Table (for $n_i = n$)

Source   d.f.    Sum of squares   Mean square             Expected MS
Model    k − 1   SSTR             SSTR/(k − 1) = MSTR     $\sigma^2 + n\sigma_\tau^2$
Error    N − k   SSE              SSE/(N − k) = MSE       $\sigma^2$
Total    N − 1   SST

Testing
Testing is performed using the same F statistic that we used for the fixed effects model:
$$ F^* = \frac{MSTR}{MSE} $$
If $F^* > F_{\alpha,\,k-1,\,N-k}$ then reject $H_0$; otherwise do not reject $H_0$.

If $H_0$ is true, then $\sigma_\tau^2 = 0$ and the expected F-value is 1. That is,
$$ E(MSTR) = \sigma^2 + n_0(0) = \sigma^2 + 0 = \sigma^2 \quad \text{and} \quad F^* = \frac{MSTR}{MSE} \approx \frac{\sigma^2}{\sigma^2} = 1. $$
However, when real variability among the random treatments does exist, that is $\sigma_\tau^2 > 0$, then
$$ E(MSTR) = \sigma^2 + (\text{some positive quantity}). $$

Therefore, the larger the variability among the random treatment effects $\tau_i$, the larger E(MSTR) becomes. This implies the ratio
$$ \frac{E(MSTR)}{E(MSE)} = \frac{\sigma^2 + n_0\sigma_\tau^2}{\sigma^2} = 1 + (\text{another positive quantity}) $$
becomes larger as the variability among the $\tau_i$'s increases.


Unbiased Estimators
The parameters of the one-way random effects model are $\mu$, $\sigma^2$ and $\sigma_\tau^2$.

Mean
As in the fixed effects case, we estimate $\mu$ by
$$ \hat{\mu} = \frac{\sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}}{N} = \frac{y_{\cdot\cdot}}{N} = \bar{y}_{\cdot\cdot} $$

Estimation of $\sigma^2$ and $\sigma_\tau^2$
Usually, we also want to estimate the variance components ($\sigma^2$ and $\sigma_\tau^2$) in the model. The procedure consists of equating the expected mean squares to their observed values in the ANOVA table and solving for the variance components.

Thus the estimates of the components of variance are:

• Since MSE is an unbiased estimator of its expected value $\sigma^2$,
$$ \hat{\sigma}^2 = MSE. $$
• Since $E(MSTR) = n_0\sigma_\tau^2 + \sigma^2$,
$$ E\left(\frac{MSTR - MSE}{n_0}\right) = \frac{n_0\sigma_\tau^2 + \sigma^2 - \sigma^2}{n_0} = \sigma_\tau^2, \quad \text{so} \quad \hat{\sigma}_\tau^2 = \frac{MSTR - MSE}{n_0}. $$
• Note that $\hat{\sigma}_\tau^2 \geq 0$ if and only if $MSTR \geq MSE$, which is equivalent to $F \geq 1$.
• Occasionally MSTR < MSE. In such a case we will get $\hat{\sigma}_\tau^2 < 0$.
• A negative variance estimate $\hat{\sigma}_\tau^2$ occurs only if the value of the F statistic is less than 1. Obviously the null hypothesis $H_0$ is not rejected when $F \leq 1$. Since a variance cannot be negative, a negative variance estimate is replaced by 0. This does not mean that $\sigma_\tau^2$ is zero; it simply means that there is not enough information in the data to get a good estimate of $\sigma_\tau^2$.
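A minimal sketch of these method-of-moments estimates, assuming MSTR, MSE and $n_0$ have already been computed; the function name is introduced here for illustration.

```python
# A sketch of the method-of-moments variance component estimates.
def variance_components(mstr, mse, n0):
    sigma2_hat = mse                                   # sigma^2_hat = MSE
    sigma_tau2_hat = max((mstr - mse) / n0, 0.0)       # truncate a negative estimate at 0
    return sigma2_hat, sigma_tau2_hat
```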

Confidence Intervals for Variance Components


Since we now have estimates of $\sigma_\tau^2$ and $\sigma^2$, the two components of variance in the response $Y$, we can estimate the percentage of the total variation due to the factor $\tau$, and the percentage due to the residual variation:
$$ \%\ \text{due to}\ \tau = \frac{\hat{\sigma}_\tau^2}{\hat{\sigma}_\tau^2 + \hat{\sigma}^2} \times 100 \quad \text{and} \quad \%\ \text{unexplained} = \frac{\hat{\sigma}^2}{\hat{\sigma}_\tau^2 + \hat{\sigma}^2} \times 100 $$

It is also possible to calculate approximate confidence intervals for $\sigma_\tau^2$ and $\sigma^2$.


Confidence Intervals for $\sigma^2$

Since $\dfrac{SSE}{\sigma^2} \sim \chi^2_{(N-k)}$, it must be true that
$$ \Pr\left( \chi^2_{1-\alpha/2,\,(N-k)} \leq \frac{SSE}{\sigma^2} \leq \chi^2_{\alpha/2,\,(N-k)} \right) = 1 - \alpha $$
Inverting all three terms in the inequality just reverses the $\leq$ signs to $\geq$'s:
$$ \Pr\left( \frac{1}{\chi^2_{1-\alpha/2,\,(N-k)}} \geq \frac{\sigma^2}{SSE} \geq \frac{1}{\chi^2_{\alpha/2,\,(N-k)}} \right) = 1 - \alpha $$
$$ \Rightarrow \Pr\left( \frac{SSE}{\chi^2_{1-\alpha/2,\,(N-k)}} \geq \sigma^2 \geq \frac{SSE}{\chi^2_{\alpha/2,\,(N-k)}} \right) = 1 - \alpha $$

Therefore, a $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
$$ \left( \frac{SSE}{\chi^2_{\alpha/2,\,(N-k)}}, \; \frac{SSE}{\chi^2_{1-\alpha/2,\,(N-k)}} \right) $$
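A sketch of this chi-square interval, assuming SSE and the error degrees of freedom $N - k$ are available; `ci_sigma2` is a hypothetical helper name.

```python
# A sketch of the chi-square confidence interval for sigma^2.
from scipy import stats

def ci_sigma2(sse, df_error, alpha=0.05):
    lower = sse / stats.chi2.ppf(1 - alpha / 2, df_error)   # divide by the upper chi-square quantile
    upper = sse / stats.chi2.ppf(alpha / 2, df_error)       # divide by the lower chi-square quantile
    return lower, upper
```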

It turns out to be a good bit more complicated to derive a confidence interval for $\sigma_\tau^2$. However, we can more easily find exact CIs for the intra-class correlation coefficient
$$ \rho = \frac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma^2} = \frac{\sigma_\tau^2}{\sigma_Y^2} $$
and for the ratio of the variance components $\theta = \dfrac{\sigma_\tau^2}{\sigma^2}$.

Confidence Interval for $\theta = \sigma_\tau^2 / \sigma^2$

Here $\theta$ represents the ratio of the between-treatment variance to the within-treatment (error) variance.

Since
$$ MSTR \sim (\sigma^2 + n_0\sigma_\tau^2)\,\frac{\chi^2_{(k-1)}}{k-1} \quad \text{and} \quad MSE \sim \sigma^2\,\frac{\chi^2_{(N-k)}}{N-k}, $$
and MSTR and MSE are independent,
$$ \frac{MSTR}{MSE} \sim \underbrace{\left(\frac{\sigma^2 + n_0\sigma_\tau^2}{\sigma^2}\right)}_{1 + n_0\theta} F(k-1,\,N-k) \;\Rightarrow\; \frac{MSTR/MSE}{1 + n_0\theta} \sim F(k-1,\,N-k). $$
Using an argument similar to the one we used to obtain our CI for $\sigma^2$, we get the $100(1-\alpha)\%$ interval [Lower, Upper] for $\theta$, where


$$ \text{Lower} = \left[\frac{MSTR}{MSE} \times \frac{1}{F_{\alpha/2,\,k-1,\,N-k}} - 1\right]\frac{1}{n_0} = L $$
$$ \text{Upper} = \left[\frac{MSTR}{MSE} \times F_{\alpha/2,\,N-k,\,k-1} - 1\right]\frac{1}{n_0} = \left[\frac{MSTR}{MSE} \times \frac{1}{F_{1-\alpha/2,\,k-1,\,N-k}} - 1\right]\frac{1}{n_0} = U $$

Confidence Intervals for $\rho = \dfrac{\sigma_\tau^2}{\sigma_\tau^2 + \sigma^2} = \dfrac{\sigma_\tau^2}{\sigma_Y^2}$

$\rho$ (the intra-class correlation coefficient) represents the proportion of the total variance that is the result of differences between treatments.

Since $\rho = \dfrac{\theta}{1 + \theta}$, we can transform the endpoints of the interval for $\theta$ to get an interval for $\rho$:
$$ 1 - \alpha = P\left[ L \leq \sigma_\tau^2/\sigma^2 \leq U \right] $$
$$ = P\left[ 1 + L \leq 1 + \sigma_\tau^2/\sigma^2 \leq 1 + U \right] $$
$$ = P\left[ 1 + L \leq \frac{\sigma^2 + \sigma_\tau^2}{\sigma^2} \leq 1 + U \right] $$
$$ = P\left[ \frac{1}{1+L} \geq \frac{\sigma^2}{\sigma^2 + \sigma_\tau^2} \geq \frac{1}{1+U} \right] $$
$$ = P\left[ 1 - \frac{1}{1+L} \leq 1 - \frac{\sigma^2}{\sigma^2 + \sigma_\tau^2} \leq 1 - \frac{1}{1+U} \right] $$
$$ = P\left[ \frac{L}{1+L} \leq \frac{\sigma_\tau^2}{\sigma^2 + \sigma_\tau^2} \leq \frac{U}{1+U} \right] $$

Thus, a $100(1-\alpha)\%$ confidence interval for $\rho$ is $\left[ \dfrac{\text{Lower}}{1 + \text{Lower}}, \; \dfrac{\text{Upper}}{1 + \text{Upper}} \right]$.
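A sketch of the intervals for $\theta$ and $\rho$, assuming MSTR, MSE, $k$, $N$ and $n_0$ are known; the function name is introduced here for illustration.

```python
# A sketch of the confidence intervals for theta = sigma_tau^2 / sigma^2
# and the intra-class correlation rho = theta / (1 + theta).
from scipy import stats

def ci_theta_rho(mstr, mse, k, N, n0, alpha=0.05):
    ratio = mstr / mse
    # F_{alpha/2, k-1, N-k} is the upper-tail critical value, i.e. ppf(1 - alpha/2, ...)
    L = (ratio / stats.f.ppf(1 - alpha / 2, k - 1, N - k) - 1) / n0
    U = (ratio / stats.f.ppf(alpha / 2, k - 1, N - k) - 1) / n0
    theta_ci = (L, U)
    rho_ci = (L / (1 + L), U / (1 + U))
    return theta_ci, rho_ci
```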


Example 1:
We are to investigate the formulation of a new synthetic fibre that will be used to
make cloth for shirts. The cotton content varies from 10% - 40% by weight (the one
factor is cotton content) and the experimenter chooses 5 levels of this factor: 15%,
20%, 25%, 30%, 35%. The response variable is Y = tensile strength (time to break
when subjected to a stress). There are 5 replicates (complete repetitions of the
experiment). In a replicate five shirts, each with different cotton content, are
randomly chosen from the five populations of shirts. The 25 tensile strengths are
measured, in random order.
Tensile Strength Data
Cotton Percentage
15% 20% 25% 30% 35%
7 12 14 19 7
7 17 18 25 10
15 12 18 22 11
11 18 19 19 15
9 18 19 23 11

Does changing the cotton content (level) change the mean strength?
Carry out an ‘Analysis of Variance’ (ANOVA) at α = 0.01.
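A possible analysis of Example 1 in Python, using the tensile strength data above and `scipy.stats.f_oneway`; this is a sketch, not a worked solution.

```python
# A sketch of the one-way (fixed effects) ANOVA for the tensile strength data.
import numpy as np
from scipy import stats

cotton = {
    "15%": np.array([7, 7, 15, 11, 9], dtype=float),
    "20%": np.array([12, 17, 12, 18, 18], dtype=float),
    "25%": np.array([14, 18, 18, 19, 19], dtype=float),
    "30%": np.array([19, 25, 22, 19, 23], dtype=float),
    "35%": np.array([7, 10, 11, 15, 11], dtype=float),
}
groups = list(cotton.values())
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.3f}, p = {p_value:.5f}")
print("Reject H0 at alpha = 0.01" if p_value < 0.01 else "Do not reject H0 at alpha = 0.01")
```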

Example 2
A textile company weaves a fabric on a large number of looms. They would like the looms to be
homogeneous so that they obtain a fabric of uniform strength. The process engineer suspects
that, in addition to the usual variation in strength within samples of fabric from the same loom,
there may also be significant variations in strength between looms. To investigate this, he selects
four looms at random and makes four strength determinations on the fabric manufactured on
each loom. The data are given in the following table:

Observations
Looms 1 2 3 4
1 98 97 99 96
2 91 90 93 92
3 96 95 97 95
4 95 96 99 98
Use α = 0.05.
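A possible analysis of Example 2 in Python: the F test together with the method-of-moments variance component estimates, using the loom data above; a sketch, not a worked solution.

```python
# A sketch of the random effects analysis for the loom data (balanced design, n_i = n).
import numpy as np
from scipy import stats

looms = np.array([[98, 97, 99, 96],
                  [91, 90, 93, 92],
                  [96, 95, 97, 95],
                  [95, 96, 99, 98]], dtype=float)

k, n = looms.shape
N = k * n
sse  = ((looms - looms.mean(axis=1, keepdims=True)) ** 2).sum()       # within-loom SS
sstr = (n * (looms.mean(axis=1) - looms.mean()) ** 2).sum()           # between-loom SS
mstr, mse = sstr / (k - 1), sse / (N - k)
f_stat = mstr / mse
p_value = stats.f.sf(f_stat, k - 1, N - k)

sigma2_hat = mse                                   # estimate of sigma^2
sigma_tau2_hat = max((mstr - mse) / n, 0.0)        # estimate of sigma_tau^2
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
print(f"sigma^2_hat = {sigma2_hat:.3f}, sigma_tau^2_hat = {sigma_tau2_hat:.3f}")
```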

