ANOVA - Blocked Design


Blocking: Design & Analysis

Instructor: Weikang Kao, Ph.D.


What are confounds?

 When performing an experiment, we first identify our target variables.


 Second? Eliminate some extraneous variables.
 What if we can’t?
Planning for Confounds

 The most important stage of experimental research is the design phase.


 One aspect of design often overlooked is the importance of recognizing potential
issues with how data is collected/generated: experimental design confounds.
 Experimental design confounds can lead to entanglement of outside contributors
of variance (e.g., data generated under different settings) with key predictors
(conditions).
 Our goal is to spot them early so we can account for any variance they contribute.
Limitations Lead to Confounds

 Good experiments typically have large sample sizes.


 Ideally all that data would be collected under identical conditions.
 Say we want to compare the effectiveness of two types of medication and a
placebo on reducing anxiety/irritation.
 What should we do?
Limitations Lead to Confounds

 FDA guidelines suggest 1000 observations should be in each treatment condition
for robust estimation of effects (i.e., the general population of anxiety sufferers in
our example).
 How likely is it that we will be able to obtain 1000 observations in one day of testing?
 We probably wouldn’t even be able to get 1000 observations on the same day
even if we collected observations at different hospitals (laboratories).
Recognizing and Dealing With Confounds

 If observations come from different days and/or locations what might happen to
our data?
 Say each hospital location varies in population density – thus some hospitals are
in high traffic areas while others are not.
 Might differences in traffic alter participants' anxiety levels before they arrive?
Could this influence the effectiveness of each treatment?
Recognizing and Dealing With Confounds

 Suppose that on the second day of testing a large attack occurs (e.g., another 9/11,
Mumbai, Nairobi, Vegas). Might such an occurrence affect participants' anxiety
levels, making it more difficult to reduce anxiety than on an average day when
nothing happened?
 By recognizing the potential for such confounds we can in the best cases design
our experiment using a complete randomized block design.
Completely randomized design

 Before moving to the complete randomized block design, what is a completely
randomized design?
 With the completely randomized design, subjects are randomly assigned to one of
two or more treatment conditions.
 Why randomized?
 A completely randomized design relies on randomization to control for the effects of
extraneous variables: on average, extraneous factors will affect the treatment
conditions equally. Under this assumption, significant differences between
conditions can fairly be attributed to the independent variable.
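As a minimal sketch of what "randomly assigned" can mean in practice, the R snippet below shuffles a balanced set of condition labels across subjects. The subject IDs, condition names, and group sizes are hypothetical and chosen only for illustration.

```r
# Completely randomized design: shuffle a balanced vector of condition labels.
# Subject IDs and condition names are hypothetical.
set.seed(1)                                   # reproducible illustration
subjects   <- paste0("S", 1:12)
conditions <- c("drugA", "drugB", "placebo")

# Repeat each label equally often, then permute the order at random.
assignment <- data.frame(
  subject   = subjects,
  condition = sample(rep(conditions, length.out = length(subjects)))
)

table(assignment$condition)                   # 4 subjects per condition
```

Because every subject has the same chance of landing in each condition, extraneous subject characteristics are expected to balance out across conditions on average.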
Blocking

 What is a block? Often we are interested in the effects of specific conditions
(treatments, i.e., categorical predictors) on something (the dependent variable).
 Sometimes we recognize commonalities among subjects that can be potential
confounds in the study.
 A block is a subgroup of similar subjects/participants. Assigning subjects to blocks
makes the design and the results more precise.
Blocking

 For example, we want to know if a new drug will influence individuals' stamina.
We can simply have two groups, a control group (with placebo)
and a treatment group (the new drug). However, this design cannot help us
recognize differences due to other potential extraneous
factors, such as gender, exercise habit, etc.
 In this case, we can create BLOCKs for gender, exercise habit, and/or other
extraneous variables. This design helps us get a deeper understanding of the
data and a more precise result, which is closer to the real situation.
Blocking

 What does blocking do when performing an ANOVA test?


 Recall: when we do one-way ANOVA, what is the total SS?
 In the one-way ANOVA example, SS_error is the variance that is not explained by the
model.
SS_total = SS_B + SS_W = SS_treatment + SS_error
 Blocking:
SS_total = SS_B + SS_W = SS_treatment + (SS_block + SS_error)
That means the block explains part of what would otherwise be SS_error (SS_W), so we
first explain more variance and second have less unexplained variance.
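The partition above can be checked numerically. The following is a sketch on simulated data, not the course data set: the treatment and block effects, the cell size, and all variable names are assumptions made for illustration.

```r
# Simulated data (hypothetical): 3 treatments x 4 blocks, 5 observations per cell.
set.seed(42)
d <- expand.grid(rep       = 1:5,
                 treatment = factor(c("t1", "t2", "t3")),
                 block     = factor(c("b1", "b2", "b3", "b4")))
treat_shift <- c(t1 = 0, t2 = 2, t3 = 4)             # assumed treatment effects
block_shift <- c(b1 = -6, b2 = -2, b3 = 2, b4 = 6)   # assumed block effects
d$y <- 50 +
  unname(treat_shift[as.character(d$treatment)]) +
  unname(block_shift[as.character(d$block)]) +
  rnorm(nrow(d), sd = 3)

# Pull the "Sum Sq" column out of each ANOVA table.
ss_oneway  <- summary(aov(y ~ treatment, data = d))[[1]][["Sum Sq"]]
ss_blocked <- summary(aov(y ~ treatment + block, data = d))[[1]][["Sum Sq"]]

ss_oneway    # SS_treatment, SS_error
ss_blocked   # SS_treatment, SS_block, SS_error
```

In this balanced design the one-way error term equals the block term plus the blocked error term, so adding the block leaves the treatment SS unchanged and only shrinks the unexplained part.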
Blocking: Example

A researcher wants to study the effect of three different weight-loss drugs on hogs.
He has a number of hogs on which to experiment. However, he is concerned that the color of
the hogs might affect their susceptibility to the drugs. He notices that this specific kind of
hog comes in three colors: white, black, and grey. If he does not control for color,
differences in the shade of the hogs will potentially increase his error mean square and reduce his
chances of getting significant results. To perform the experiment, he selects hogs of each of the three
colors and, within each color, randomly assigns hogs to the three drug conditions, administers the drug
for a three-month period, and then measures each hog’s weight loss.

 What are the IV and DV?


 What is the block?
Specification Table
Variable         # Levels
Drug             3
Color (block)    3
The design will look like this: each of the three color blocks contains all three drug conditions.
Blocking: Example II

 A study was conducted to determine the effect of word organization upon the number of words
recalled. Prior research had established a strong correlation between verbal ability and word
recall. Therefore, individual differences in verbal ability were controlled to produce a more
sensitive test of the treatment effect. (Miller, 2013)

 What are the IV and DV?


 What is the block?
 According to previous studies, verbal ability can be separated into different categories (high,
medium, low). Word organization can be categorized as “with an order” and “not ordered”.
Specification Table
Variable          # Levels
Order             2
Verbal (block)    3
The design will look like this: each of the three verbal-ability blocks contains both word-organization conditions.
Blocked Design: Assumptions

 Normality
Observations are drawn from a normally distributed population.
 Independence of observations
Observations are randomly sampled from the population, or subjects are randomly
assigned to treatment groups, so that observations are independent both within
and between groups.
 Equal variance
The observations have equal variances across groups.
 Additivity of Interaction
There is no interaction between treatment and block.
Additivity of Interaction

1. The purpose of the blocking factor is to account for a nuisance factor and/or to
reduce the error term used in performing the test for the significance of the
treatment effect. For this reason, the significance of the block effect itself is not
tested.
2. If there is an interaction effect between block and IV, the blocked ANOVA
design will be analyzed as a factorial ANOVA.
Blocked Design

As we are aware, we need more than one replication of an experiment to:
1. Have the power to detect an effect (e.g., a mean difference between conditions)
2. Get a robust estimate of the size of any effect.
As a result, we need multiple replications of an experiment to learn anything useful
from it.
A general rule of thumb in the social sciences is to aim for 40 to 100 observations
per cell (factorial combination). A better way to figure out the minimum amount of
data needed for a particular analysis is to perform a power analysis (see
G*Power for this).
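The slides point to G*Power for power analysis. As a rough alternative sketch, base R's power.anova.test() (in the stats package) solves for the per-group sample size of a balanced one-way ANOVA. The variance values below are placeholder guesses, not estimates from this study, and the function does not account for a blocking factor.

```r
# power.anova.test() ships with base R (stats package).
# between.var and within.var are placeholder guesses, not study values.
pw <- power.anova.test(groups      = 3,     # e.g., three treatment levels
                       between.var = 1,     # assumed variance of the group means
                       within.var  = 10,    # assumed error variance
                       sig.level   = 0.05,
                       power       = 0.80)

ceiling(pw$n)   # observations needed per group, rounded up
```

Leaving n unspecified tells the function to solve for it; supplying n and omitting power would instead return the achieved power.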
Application
Blocked Design: Case

 A study was conducted to determine the effect of price upon the rating of a particular
type of earphone. However, one of the supervisors is concerned that different colors
would have an impact on consumers’ attitudes. Therefore, differences in color are
controlled to produce a more sensitive test of the treatment effect.
 In the study the IV is price, which has three levels: low, medium, and high.
 Block is color, which has six levels.
 DV is the rating from consumers.
In our case we need 240 to 600 observations to run this study: 40 to 100 raters * 6 levels of color.
Randomized Block Design

 We need to determine whether we can run 240 participants (focus groups) under the exact same
conditions (same day, time, location, experimenter, etc.).
 Perhaps we can, or we will employ a platform that ensures enough random error that there
should be no systematic effect of these outside influences on the data (or there is no way to
measure/control for them).
 If we are running in a lab setting (focus groups at a marketing firm, patients at various hospitals,
different days or experimenters), then we can attempt to control for their influence in our analysis
(coding them as dummy variables).
 There are several approaches to this, and the first we will cover is a complete randomized block
design, probably the best practice.
Randomized Block Design

 In a complete randomized block design, each block contains at least one full
replication of the experiment with the predictor levels randomly distributed
across it.
 Blocks can be anything from the time of day, to the location where sessions are run, to where
(from whom) the observations come from. The variable must be categorical.
 For our example, let's assume participants will be randomly assigned to rate 1 of
the 3 possible product price points for how likely they would be to buy the product. The
response scale goes from 0 [Wouldn’t Buy] to 100 [Would Buy for Sure].
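To make "each block contains a full replication, randomly distributed" concrete, here is a small sketch that randomizes price levels separately inside every block. The layout (6 color blocks, 2 participants per price level in each block) and the level labels are assumptions for illustration, not the actual study layout.

```r
# Randomized complete block assignment (hypothetical layout):
# 6 color blocks, 3 price levels, 2 participants per price level in each block.
set.seed(7)
prices <- c("low", "medium", "high")
design <- expand.grid(slot = 1:6, color = factor(1:6))   # rows are ordered by block

# Within every block, shuffle a balanced set of price labels independently.
design$price <- unlist(lapply(
  split(design$slot, design$color),
  function(s) sample(rep(prices, length.out = length(s)))
), use.names = FALSE)

table(design$color, design$price)   # every block holds each price level twice
```

The key difference from a completely randomized design is that sample() is called once per block, so no block can end up missing a price level by chance.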
Blocked Design: Application

 Before treating an effect as a blocking variable, we have to ensure there is no interaction
between it and our predictor(s) of interest (if you only have 1 replication of an
experiment per block, you can’t test this).
 We will need to convert the categorical predictors that are numerically coded (color and
focus group) to factors using the factor() command.
 Next, we will check the assumptions of ANOVA.
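A minimal sketch of the factor() step, using a tiny made-up data frame in place of the real data. The column names mirror the ones used in this deck, but the values and the mapping of codes 1/2/3 to price labels are assumptions.

```r
# Hypothetical stand-in for the course data, with numerically coded predictors.
demo <- data.frame(
  rating     = c(62, 55, 48, 70, 58, 41),
  price      = c(1, 2, 3, 1, 2, 3),
  color      = c(1, 1, 1, 2, 2, 2),
  focusgroup = c(1, 2, 3, 1, 2, 3)
)

# Without factor(), aov() would treat these codes as continuous (1 df each).
demo$price      <- factor(demo$price, levels = 1:3,
                          labels = c("low", "medium", "high"))  # assumed coding
demo$color      <- factor(demo$color)
demo$focusgroup <- factor(demo$focusgroup)

str(demo)   # price, color, and focusgroup now display as Factor
```

Checking str() after the conversion is a cheap way to catch a predictor that is still numeric before it reaches the model.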
Blocked Design: Application

Research Questions and Hypothesis?


Specification Table
Variable         # Levels
Price            3
Color (block)    6
Practice

 We use price as IV and color as block.


 The research question is “Does the price of an earphone influence consumers’
ratings of the earphone?”
 Hypothesis:
 H0: There is no difference in rating between the different prices.
 H1: At least one price group differs from the others.
Assumptions: normality

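 Let’s walk through an example in R using the BlockingExample.xlsx data.
library(moments)  # provides anscombe.test() and agostino.test()
plot(density(BlockingExample$rating))
Looks like some negative skew and kurtosis, let’s confirm:
anscombe.test(BlockingExample$rating)
kurt = 1.9864, z = -4.3406, p-value = 1.421e-05
agostino.test(BlockingExample$rating)
skew = -0.16709, z = -0.78371, p-value = 0.4332
The kurtosis test is significant (kurtosis below 3, a flatter-than-normal distribution); the skew test is not.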
Assumptions: independence

 First of all, make sure the data is randomly selected and /or randomly assigned to
the groups.
 Scatterplot of residuals
# Variable, Group, and data are placeholders for your own DV, IV, and data frame
eruption.lm = lm(Variable ~ Group, data=data)
eruption.res = resid(eruption.lm)
plot(data$Variable, eruption.res, ylab="Residuals", xlab="Variable", main="Title")
abline(0, 0)

 What does the plot look like?


Assumptions: equality of variance

 We can use the Bartlett test for this assumption.


bartlett.test(BlockingExample$rating, BlockingExample$price)
 A general rule of thumb is that if we have balanced data (the # of observations in
each level is equal), ANOVA is robust to small violations (largest variance /
smallest variance < 3 or 4).
tapply(BlockingExample$rating, BlockingExample$price, var)
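The rule of thumb can be turned into a quick numeric check. This sketch uses simulated ratings (the means, SDs, and group size are made up) to show how the largest-to-smallest variance ratio and the balance of the design might be inspected together.

```r
# Hypothetical balanced data: 3 price levels, 40 ratings each.
set.seed(3)
demo <- data.frame(
  price  = factor(rep(c("low", "medium", "high"), each = 40)),
  rating = c(rnorm(40, 70, 10), rnorm(40, 60, 12), rnorm(40, 50, 11))
)

group_var <- tapply(demo$rating, demo$price, var)   # variance per price level
var_ratio <- max(group_var) / min(group_var)        # largest / smallest

table(demo$price)   # confirm the design is balanced
var_ratio           # compare against the rough cutoff of 3 to 4
```

The ratio is only a heuristic; it complements rather than replaces a formal test such as bartlett.test().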
Assumptions: Additivity of Interaction

 There is no interaction between treatment and block.


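 To check this assumption, we add an INTERACTION term to the model.
In this study, the interaction term is price:color (the formula price*color expands to price + color + price:color).
model <- aov(rating ~ price*color, data = BlockingExample)
model
summary(model)
If the price:color row is not significant, the additivity assumption holds.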
Analyzing a Randomized Block Design in R

Now we run the model and see what we get.


model <- aov(rating ~ price + color, data = BlockingExample)
summary(model)

Price is significant.
Analyzing a Randomized Block Design in R

Look at color – we are generally not interested in whether our blocking factors are
significant but note how much variance it explains (MS).

What if we didn’t include it?


Post-Hoc Test and Effect Size

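Bonferroni
# first argument is the response (DV), second is the grouping factor
pairwise.t.test(ANOVAExample$DV, ANOVAExample$Group, paired = FALSE,
p.adjust.method = "bonferroni")
Tukey
TukeyHSD(model)

Effect Size
by(ANOVAExample$V2, ANOVAExample$Condition, stat.desc)  # stat.desc() is from the pastecs package
mes(mean1, mean2, sd1, sd2, n1, n2)  # mes() is from the compute.es package; fill in the two groups' means, SDs, and ns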
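For readers who want to see what the effect-size step computes, here is a small base-R sketch of Cohen's d for two independent groups using the pooled standard deviation. The function name cohens_d and the summary statistics passed to it are hypothetical; this is not the compute.es implementation.

```r
# Cohen's d for two independent groups, using the pooled standard deviation.
cohens_d <- function(m1, m2, sd1, sd2, n1, n2) {
  pooled_sd <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  (m1 - m2) / pooled_sd
}

# Made-up summary statistics, for illustration only.
cohens_d(m1 = 70, m2 = 60, sd1 = 10, sd2 = 10, n1 = 40, n2 = 40)   # 1
```

With equal SDs of 10 the pooled SD is also 10, so a 10-point mean difference gives d = 1, which is commonly read as a large effect.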
Summary write up

Observations from the study were analyzed by conducting a one-way
analysis of variance with color as a blocking factor, using R version 3.6.1. All assumptions
were met, and no adjustments were made. Results suggest that, after controlling for color (F(5, 112) =
5.43, p = .001), the price of the earphone has a significant impact on participants’
attitudes toward the product (F(2, 112) = 26.89, p < .001).
To determine specifically which groups differed, a
Bonferroni post-hoc test was conducted. The results suggest a
significant difference between low price and high price (p < .001), and between medium price
and low price (p < .001), in terms of individuals’ attitudes toward the earphone. The
effects were large, Cohen’s d = 1.77 and .79.
In Class Practice

This time we use the focus group as another block.


Research Questions and Hypothesis?
Specification Table
Variable              # Levels
Price                 3
FocusGroup (block)    3
Practice: Results

 Check the ANOVA assumptions:


 Normality: Good
 Independence of observations: Normal
 Variance Equality: Good
 Additivity of Interaction: Violated

 Is there any significant result? NO.


Summary write up

Observations from the study were analyzed by conducting a one-way
analysis of variance using R version 3.6.1. All assumptions were met, and
no adjustments were made. Results suggest that the price of the earphone has a
significant impact on participants’ attitudes toward the product (F(2, 115) = 22.65, p
< .001).
To determine specifically which groups differed, a Tukey
post-hoc test was conducted. The results suggest a significant
difference between low price and high price (p < .001), and between medium price and low
price (p < .001), in terms of individuals’ attitudes toward the earphone. The effects
were large, Cohen’s d = 1.77 and .79.
Two Sources of Variance

We know we can account for one additional source of variance (an experimental
confound) in our data, giving ourselves more power to detect the effects we are
interested in.

What if we have two?

If each block contains a complete replication things are straightforward.


Two Sources of Variance

Looking at the example data set we are working with we can see that we have two
potential experimental confounds – focus groups and color.

Colors are larger blocks with each containing 6 focus groups – thus the study was
run with larger sessions and within each session different focus groups broke off.

We should control for this additional source of variance.


Two Sources of Variance

First we make sure we have no interactions with our main effect of price as well.
model <- aov(rating ~ price*session*focusgroup, data = BlockingExample)
Notice there is a significant interaction effect, so what should we do?
model <- aov(rating ~ price + session + focusgroup, data = BlockingExample)
Not much changes because the impact of Focus Group is small – though price is
slightly more significant because more variance is removed from the error term.
We will talk more next week (factorial ANOVA) about main effects, interactions,
and related topics.
Weekly Lab
 What assumption must we test to include a variable as a
blocking factor?
 Identify the IV, DV, and block, and create a specification table for the
following research statement.
“A company is planning to investigate the motor skills of the
elderly population. The company separates the target
population into three age categories: 60-69, 70-79, and
above 80, then randomly assigns the participants in the study to
one of the three task conditions. After individuals have
completed the task, their performance will be compared.”
 Use the data “Lab 3” with the research question to write
a full report.
*age “1”: 60-69, “2”: 70-79, and “3”: above 80.
