ANOVA - Blocked Design
If observations come from different days and/or locations what might happen to
our data?
Say each hospital location varies in population density – thus some hospitals are
in high traffic areas while others are not.
Might differences in traffic alter participants' anxiety levels before they arrive?
Could this influence the effectiveness of each treatment?
Recognizing and Dealing With Confounds
Suppose on the second day of testing a large attack occurs (e.g., another 9/11,
Mumbai, Nairobi, Vegas). Might such an occurrence impact participants' anxiety
levels, making it more difficult to reduce anxiety than on an average day when
nothing happened?
By recognizing the potential for such confounds we can in the best cases design
our experiment using a complete randomized block design.
Completely randomized design
What is a block? Often we are interested in the effects of specific conditions
(treatments, i.e., categorical predictors) on something (the dependent variable).
Sometimes we recognize commonalities among subjects that could be potential
confounds in the study.
A block is a subgroup of similar subjects/participants. Assigning treatments within
each block makes the design and its results more precise.
Blocking
For example, we want to know whether a new drug influences individuals'
stamina. We can simply have two groups, a control group (with placebo)
and a treatment group (the new drug). However, this design cannot tell us
whether other potential extraneous factors, such as gender or exercise habits,
make a difference.
In this case, we can create BLOCKs for gender, exercise habits, and/or other
extraneous variables. This design gives us a deeper understanding of the
data and a more precise result, one that is closer to the real situation.
Blocking
A researcher wants to study the effect of three different weight-loss drugs on hogs.
He has a number of hogs on which to experiment. However, he is concerned that the color of
the hogs might affect their susceptibility to the drugs. He notices that this specific kind of
hog comes in three colors: white, black, and grey. If he does not control for color,
differences in the shade of the hogs will potentially increase his error mean square and reduce his
chances of getting significant results. To perform the experiment, he selects hogs of each of the
three colors, randomly assigns the hogs within each color to the three drug conditions, administers
the drug for a three-month period, and then measures each hog's weight loss.
A study was conducted to determine the effect of word organization upon the number of words
recalled. Prior research had established a strong correlation between verbal ability and word
recall. Therefore, individual differences in verbal ability were controlled to produce a more
sensitive test of the treatment effect. (Miller, 2013)
Normality
Observations are drawn from a normally distributed population.
Independence of observations
Observations are randomly sampled from the population, or subjects are randomly
assigned to treatment groups, so that all observations are independent both within
and between groups.
Equal variance
The observations have equal variances across groups.
Additivity of Interaction
There is no interaction between treatment and block.
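The normality and equal-variance assumptions can be checked on a fitted model. The following is a minimal sketch in base R, not the course's own analysis: the data frame d and its columns rating, price, and color are simulated here purely for illustration.

```r
set.seed(1)
# Hypothetical balanced design: 3 price levels x 6 color blocks x 5 raters per cell
d <- expand.grid(price = c("low", "medium", "high"),
                 color = paste0("c", 1:6),
                 rep   = 1:5)
# Simulated ratings (assumption: a small price effect plus normal noise)
d$rating <- 50 + 5 * as.numeric(d$price) + rnorm(nrow(d), sd = 10)

# Blocked model: treatment (price) plus block (color)
fit <- aov(rating ~ price + color, data = d)

# Normality: Shapiro-Wilk test and Q-Q plot of the residuals
norm_check <- shapiro.test(resid(fit))
qqnorm(resid(fit)); qqline(resid(fit))

# Equal variance across treatment groups: Bartlett's test
var_check <- bartlett.test(rating ~ price, data = d)

print(norm_check)
print(var_check)
```

Large p-values in both tests give no evidence against the assumptions. Independence cannot be tested this way; it comes from how the data were sampled and assigned.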
Additivity of Interaction
1. The purpose of the blocking factor is to account for a nuisance factor and/or to
reduce the error term used in performing the test for the significance of the
treatment effect. For this reason, the significance of the block effect itself is not
tested.
2. If there is an interaction effect between block and IV, the blocked ANOVA
design will be analyzed as a factorial ANOVA.
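When each block contains more than one observation per treatment level, one way to check for a block-by-treatment interaction is to compare the additive model with the model that adds the interaction term. A sketch under that assumption, using simulated data with hypothetical column names (rating, price, color):

```r
set.seed(2)
# Hypothetical balanced design with 5 replicates per price-by-color cell
d <- expand.grid(price = c("low", "medium", "high"),
                 color = paste0("c", 1:6),
                 rep   = 1:5)
# Simulated ratings with additive price and color effects only
d$rating <- 50 + 5 * as.numeric(d$price) + 2 * as.numeric(d$color) +
  rnorm(nrow(d), sd = 10)

additive <- aov(rating ~ price + color, data = d)   # blocked design
with_int <- aov(rating ~ price * color, data = d)   # adds price:color

# F test for the price:color term; a small p-value argues against additivity
cmp <- anova(additive, with_int)
print(cmp)
```

If the interaction is not significant, the additive (blocked) model is retained; otherwise the data are treated as a factorial design, as point 2 states.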
Blocked Design
A study was conducted to determine the effect of price upon the rating of a particular
type of earphone. However, one of the supervisors is concerned that different colors
might affect consumers' attitudes. Therefore, differences in color are
controlled to produce a more sensitive test of the treatment effect.
In the study the IV is price, which has three levels: low, medium, and high.
The block is color, which has six levels.
The DV is the rating from consumers.
In our case we need 240 to 600 observations to run this study: 40 raters * 6 levels of color.
Randomized Block Design
We need to determine whether we can run 240 participants (focus groups) under the exact same
conditions (same day, time, location, and experimenter, etc.).
Perhaps we can, or we will employ a platform that will ensure enough random error that there
should be no systematic effect of these outside influences on the data (or there is no way to
measure/control for them).
If we are running in a lab setting (focus groups at a marketing firm, patients at various hospitals,
different days or experimenters), then we can attempt to control for their influence in our analysis
(coding them as dummy variables).
There are several approaches to this, and the first we will cover is a complete randomized block
design, which is probably the best practice.
Randomized Block Design
In a complete randomized block design, each block contains at least one full
replication of the experiment with the predictor levels randomly distributed
across it.
Blocks can be anything from the time of day, to the location where sessions are run, to where
(or from whom) the observations come: the variable must be categorical.
For our example, let's assume participants will be randomly assigned to rate 1 of
the 3 possible product price points for how likely they would be to buy the product. The
response scale goes from 0 [Wouldn’t Buy] to 100 [Would Buy for Sure].
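The random assignment within blocks can be sketched in base R. This is an illustration only: the block labels, the number of raters per cell (n_per_cell), and the data frame name design are all hypothetical.

```r
set.seed(3)
prices <- c("low", "medium", "high")
colors <- paste0("color", 1:6)
n_per_cell <- 2   # hypothetical: raters per price level within each block

# One row per participant, grouped by block
design <- data.frame(
  participant = seq_len(length(colors) * length(prices) * n_per_cell),
  color       = rep(colors, each = length(prices) * n_per_cell)
)

# Within each block, shuffle a full set of price levels
design$price <- unlist(lapply(colors, function(b) sample(rep(prices, n_per_cell))))

# Every block should contain every price level equally often
tab <- table(design$color, design$price)
print(tab)
```

Because each block receives a shuffled copy of the complete set of price levels, every block holds a full replication of the experiment, which is what makes the design complete.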
Blocked Design: Application
First of all, make sure the data are randomly selected and/or randomly assigned to
the groups.
Scatterplot of residuals
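# Fit the one-way model and extract its residuals
eruption.lm = lm(Variable ~ Group, data=data)
eruption.res = resid(eruption.lm)
# In a one-way model each fitted value is a group mean, so each vertical strip is one group
plot(fitted(eruption.lm), eruption.res, ylab="Residuals", xlab="Fitted value (group mean)", main="Title")
abline(0, 0)  # reference line at zero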
Price is significant.
Analyzing a Randomized Block Design in R
Look at color: we are generally not interested in whether our blocking factors are
significant, but note how much variance color explains (MS).
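To see why the block's mean square matters, compare the same data analyzed with and without the blocking factor. A minimal sketch on simulated data; the effect sizes, block shifts, and column names are assumptions made for illustration and are not the course data set.

```r
set.seed(4)
d <- expand.grid(price = c("low", "medium", "high"),
                 color = paste0("c", 1:6),
                 rep   = 1:5)
block_shift <- c(-15, -9, -3, 3, 9, 15)   # hypothetical color (block) effects
d$rating <- 50 + 4 * as.numeric(d$price) + block_shift[as.numeric(d$color)] +
  rnorm(nrow(d), sd = 8)

no_block   <- aov(rating ~ price, data = d)           # ignores color
with_block <- aov(rating ~ price + color, data = d)   # randomized block analysis

tab_no   <- summary(no_block)[[1]]
tab_with <- summary(with_block)[[1]]
print(tab_no)
print(tab_with)

# Error mean square is the last ("Residuals") row of each table
mse_no   <- tab_no[nrow(tab_no), "Mean Sq"]
mse_with <- tab_with[nrow(tab_with), "Mean Sq"]
```

In the blocked table, the variance attributed to color is removed from the residual row, so the error mean square shrinks and the F value for price grows.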
Bonferroni
# First argument is the response (DV); second is the grouping factor
pairwise.t.test(ANOVAExample$DV, ANOVAExample$Group, paired = FALSE,
p.adjust.method = "bonferroni")
Tukey
TukeyHSD(model)
Effect Size
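# Descriptive statistics per condition (stat.desc() is in the pastecs package)
by(ANOVAExample$V2, ANOVAExample$Condition, stat.desc)
# Standardized mean difference between two conditions (mes() is in the compute.es package)
mes(mean1, mean2, sd1, sd2, n1, n2)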
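For reference, the basic standardized mean difference (Cohen's d with a pooled standard deviation) can be computed by hand from the same six summary values. The helper cohens_d below is a hypothetical name, and the input values are made up for illustration.

```r
# Cohen's d from two group means, SDs, and sample sizes (pooled-SD version)
cohens_d <- function(m1, m2, sd1, sd2, n1, n2) {
  pooled_sd <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  (m1 - m2) / pooled_sd
}

# Made-up summary statistics for two price conditions
d_val <- cohens_d(m1 = 60, m2 = 50, sd1 = 10, sd2 = 10, n1 = 30, n2 = 30)
print(d_val)  # 1: the means differ by one pooled standard deviation
```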
Summary write up
We know we can account for one additional source of variance (an experimental
confound) in our data, giving ourselves more power to detect the effects we are
interested in.
Looking at the example data set we are working with we can see that we have two
potential experimental confounds – focus groups and color.
Colors are larger blocks with each containing 6 focus groups – thus the study was
run with larger sessions and within each session different focus groups broke off.
First we check that the blocking factors do not interact with our main effect of price.
model <- aov(rating ~ price*session*focusgroup, data = BlockingExample)
Notice there is a significant interaction effect, so what should we do?
model <- aov(rating ~ price + session + focusgroup, data = BlockingExample)
Not much changes because the impact of Focus Group is small – though price is
slightly more significant because more variance is removed from the error term.
We will say more next week (factorial ANOVA) about main effects, interactions,
and related topics.
Weekly Lab
What assumption must we test to include a variable as a
blocking factor?
Identify the IV, DV, and block, and create a table for the
following research statement.
“A company is planning to investigate the motor skills of the
elderly population. The company separates the target
population into three age categories (60 – 69, 70 – 79, and
above 80), then randomly assigns the participants in the study to
one of the three task conditions. After individuals have
completed the task, their performance will be compared.”