ANOVA

Samprit Chakrabarti
Topics for Discussion
• ANOVA
Example
• Gulfstream Aerospace Company produced three different prototypes
as candidates for mass production as the company’s newest large-
cabin business jet, the Gulfstream IV. Each of the three prototypes
has slightly different features, which may bring about differences in
performance. Therefore, as part of the decision-making process
concerning which model to produce, company engineers are
interested in determining whether the three proposed models have
about the same average flight range. Each of the models is assigned
a random choice of 10 flight routes and departure times, and the
flight range on a full standard fuel tank is measured (the planes carry
additional fuel on the test flights, to allow them to land safely at
certain destination points). Range data for the three prototypes, in
nautical miles (measured to the nearest 10 miles), are as follows.
Data

Prototype A Prototype B Prototype C

4,420 4,230 4,110

4,540 4,220 4,090

4,380 4,100 4,070

4,550 4,300 4,160

4,210 4,420 4,230

4,330 4,110 4,120

4,400 4,230 4,000

4,340 4,280 4,200

4,390 4,090 4,150

4,510 4,320 4,220

Do all three prototypes have the same average range?
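The question on this slide can be checked numerically with a one-way ANOVA. The following is a minimal sketch in plain Python, assuming the F ratio is computed as (between-group mean square) / (within-group mean square), as defined later in the deck. The helper name `one_way_f` is hypothetical. The 5% critical value of 3.35 for 2 and 27 degrees of freedom is the one quoted later in the deck for the promotion example, which has the same degrees of freedom.

```python
# Range data from the slide, in nautical miles.
ranges = {
    "A": [4420, 4540, 4380, 4550, 4210, 4330, 4400, 4340, 4390, 4510],
    "B": [4230, 4220, 4100, 4300, 4420, 4110, 4230, 4280, 4090, 4320],
    "C": [4110, 4090, 4070, 4160, 4230, 4120, 4000, 4200, 4150, 4220],
}


def one_way_f(groups):
    """Return the one-way ANOVA F ratio for a list of samples."""
    k = len(groups)                       # number of groups
    n = sum(len(g) for g in groups)       # total sample size
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group (treatment) sum of squares.
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group (error) sum of squares.
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (sst / (k - 1)) / (sse / (n - k))


f_ratio = one_way_f(list(ranges.values()))
f_crit = 3.35  # 5% critical value for (2, 27) df, as quoted later in the deck
print(f"F = {f_ratio:.2f}, reject H0: {f_ratio > f_crit}")
```

Under these assumptions the F ratio is far above the critical value, which suggests the three prototypes do not share the same average range.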


Introduction
• ANOVA is a technique for testing hypotheses
about differences among several population
means
• It was developed by R. A. Fisher
• The total variation in a data set is split into two
components: between groups and within groups
ANOVA – What does it tell us?

• ANOVA = Analysis of Variance


• ANOVA will tell us whether we have sufficient
evidence to say that measurements from at
least one sample differ significantly from at
least one other.
– It will not tell us which ones differ, or how many
differ
Relationship Amongst t Test, Analysis of Variance, Analysis
of Covariance, & Regression
Metric Dependent Variable
• One Independent Variable (binary): t Test
• One or More Independent Variables
– Categorical (factorial): Analysis of Variance
   – One factor: One-Way Analysis of Variance
   – More than one factor: N-Way Analysis of Variance
– Categorical and interval: Analysis of Covariance
– Interval: Regression
One-Way Analysis of Variance
Marketing researchers are often interested in
examining the differences in the mean values of the
dependent variable for several categories of a single
independent variable or factor. For example:

• Do the various segments differ in terms of their
volume of product consumption?
• Do the brand evaluations of groups exposed to
different commercials vary?
• What is the effect of consumers' familiarity with the
store (measured as high, medium, and low) on
preference for the store?
ANOVA vs. t-test
• ANOVA is like a t-test among multiple data
sets simultaneously
– t-tests can only be done between two data
sets, or between one set and a “true” value
• ANOVA uses the F distribution instead of
the t-distribution
• ANOVA assumes that all of the data sets
have equal variances
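The link between the two tests can be made concrete: with exactly two groups, the one-way ANOVA F ratio equals the square of the pooled-variance (equal-variance) t statistic. The sketch below illustrates this in plain Python, using two of the interviewer score samples that appear later in this deck. The helper names `mean` and `ss_within` are hypothetical.

```python
import math

# Two samples borrowed from the interviewer example in this deck.
a = [27, 22, 33, 25, 38, 29]
b = [48, 35, 46, 36, 28, 29]


def mean(xs):
    return sum(xs) / len(xs)


def ss_within(xs):
    """Sum of squared deviations from the sample mean."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


# Pooled-variance two-sample t statistic (equal variances assumed).
df_error = len(a) + len(b) - 2
pooled_var = (ss_within(a) + ss_within(b)) / df_error
t_stat = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))

# One-way ANOVA F ratio for the same two groups (k = 2, so k - 1 = 1).
grand = mean(a + b)
sst = len(a) * (mean(a) - grand) ** 2 + len(b) * (mean(b) - grand) ** 2
f_ratio = (sst / 1) / pooled_var  # the pooled variance is the ANOVA error mean square

print(f"t^2 = {t_stat ** 2:.4f}, F = {f_ratio:.4f}")
```

Both tests therefore lead to the same decision for two groups. ANOVA extends the comparison to three or more groups in a single test.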
Conducting One-Way ANOVA

Identify the Dependent and Independent Variables

Decompose the Total Variation

Measure the Effects

Test the Significance

Interpret the Results


Completely randomized design
Population 1              Population 2              …    Population k
Mean = $\mu_1$            Mean = $\mu_2$            …    Mean = $\mu_k$
Variance = $\sigma_1^2$   Variance = $\sigma_2^2$   …    Variance = $\sigma_k^2$

We want to know something about how the
populations compare. Do they have the same mean?
We can collect random samples from each
population, which gives us the following data.
Completely randomized design
Mean = $M_1$         Mean = $M_2$         …    Mean = $M_k$
Variance = $s_1^2$   Variance = $s_2^2$   …    Variance = $s_k^2$
$n_1$ cases          $n_2$ cases          …    $n_k$ cases

Suppose we want to compare the interview scores
given by three interviewers to the interviewees. We
collect the following data of scores based on random
surveys.
Completely randomized design
Interviewer 1 Interviewer 2 Interviewer 3
27 23 48
22 36 35
33 27 46
25 44 36
38 39 28
29 32 29
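The sample size, mean, and variance for each column of this table can be computed with Python's standard `statistics` module. This is a small sketch, assuming the third column belongs to Interviewer 3. Note that `variance` returns the sample variance, with divisor n - 1.

```python
from statistics import mean, variance

# Interview scores from the table, one list per interviewer.
scores = {
    "Interviewer 1": [27, 22, 33, 25, 38, 29],
    "Interviewer 2": [23, 36, 27, 44, 39, 32],
    "Interviewer 3": [48, 35, 46, 36, 28, 29],
}

# (sample size, sample mean, sample variance) for each interviewer.
summary = {name: (len(x), mean(x), variance(x)) for name, x in scores.items()}

for name, (n, m, s2) in summary.items():
    print(f"{name}: n={n}, mean={m:.2f}, variance={s2:.2f}")
```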
Completely randomized design
Can the HR Head conclude that there are
differences among the interviewers’ scores?
$H_0$: $\mu_1 = \mu_2 = \mu_3$
$H_A$: not all of $\mu_1, \mu_2, \mu_3$ are equal

In this problem we must take into account:
1) The variance between samples, or the actual
differences by interviewer. This is called the sum of
squares for treatment (SST) (columns).
Completely randomized design
2) The variance within samples, or the variance
of scores within a single interviewer. This is
called the sum of squares for error (SSE).
Recall that when we sample, there will always
be a chance of getting something different
than the population. We account for this
through #2, or the SSE.
F-Statistic
For this test, we will calculate an F statistic, which
is used to compare variances.
$F = \dfrac{SST/(k-1)}{SSE/(n-k)}$
SST = sum of squares for treatment (columns)
SSE = sum of squares for error
k = the number of populations
n = total sample size
F-statistic
Intuitively, the F statistic is:
$F = \dfrac{\text{explained variance}}{\text{unexplained variance}}$
Explained variance is the difference between
groups (here, interviewers)
Unexplained variance is the difference due to
random sampling within each group
Decision Based on ANOVA

• If F (calculated) > F (critical or theoretical):
reject H0
• Equivalently, if p < α (the chosen significance
level, e.g. 0.05): reject H0
Calculating SST
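$SST = \sum_{i=1}^{k} n_i (M_i - \bar{X})^2$
$\bar{X}$ = grand mean: the sum of all values for all
groups divided by the total sample size (this equals
$\sum M_i / k$ when all samples are the same size)
$M_i$ = mean for each sample
$n_i$ = size of each sample
$k$ = the number of populations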
Calculating SST
By interviewer
Interviewer 1: $M_1 = 29$, $n_1 = 6$
Interviewer 2: $M_2 = 33.5$, $n_2 = 6$
Interviewer 3: $M_3 = 37$, $n_3 = 6$
$\bar{X} = (29 + 33.5 + 37)/3 = 33.17$
$SST = 6(29 - 33.17)^2 + 6(33.5 - 33.17)^2 + 6(37 - 33.17)^2 = 193$
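The arithmetic on this slide can be reproduced in a few lines of plain Python. This is a sketch using the interviewer scores from the data table. The variable names are illustrative only.

```python
# Interview scores, one list per interviewer (from the data table).
scores = [
    [27, 22, 33, 25, 38, 29],
    [23, 36, 27, 44, 39, 32],
    [48, 35, 46, 36, 28, 29],
]

n_total = sum(len(g) for g in scores)
grand_mean = sum(sum(g) for g in scores) / n_total

# SST: each group's squared distance from the grand mean, weighted by its size.
sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in scores)

print(f"grand mean = {grand_mean:.2f}, SST = {sst:.1f}")
```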
Calculating SST
Note that when $M_1 = M_2 = M_3$, SST = 0, which
would support the null hypothesis.
In this example, the samples are of equal size,
but we can also run this analysis with samples
of varying size.
Calculating SSE
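$SSE = \sum_{i=1}^{k} \sum_{t=1}^{n_i} (X_{it} - M_i)^2$
In other words, it is the sum of squared deviations from the
sample mean, computed within each sample and then added together.
$SSE = \sum_t (X_{1t} - M_1)^2 + \sum_t (X_{2t} - M_2)^2 + \sum_t (X_{3t} - M_3)^2$
$SSE = [(27-29)^2 + (22-29)^2 + \dots + (29-29)^2]$
$\quad + [(23-33.5)^2 + (36-33.5)^2 + \dots]$
$\quad + [(48-37)^2 + (35-37)^2 + \dots + (29-37)^2]$
$SSE = 819.5$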
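The same sum can be checked in plain Python. This sketch computes the within-sample sum of squares for each interviewer separately and then adds them. The variable names are illustrative only.

```python
# Interview scores, one list per interviewer (from the data table).
scores = [
    [27, 22, 33, 25, 38, 29],
    [23, 36, 27, 44, 39, 32],
    [48, 35, 46, 36, 28, 29],
]

# Squared deviations from each sample's own mean, summed per sample.
sse_by_group = []
for group in scores:
    group_mean = sum(group) / len(group)
    sse_by_group.append(sum((x - group_mean) ** 2 for x in group))

sse = sum(sse_by_group)
print(sse_by_group, sse)
```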
Statistical Output
When you estimate this information in a computer
program, it will typically be presented in a table as
follows:
Source of Variation   df     Sum of squares    Mean squares       F-ratio
Treatment             k-1    SST               MST = SST/(k-1)    F = MST/MSE
Error                 n-k    SSE               MSE = SSE/(n-k)
Total                 n-1    SS = SST + SSE
Calculating F for our example
$F = \dfrac{193/2}{819.5/15} = 1.77$
Our calculated F is compared to the critical value
from the F-distribution, $F_{\alpha,\,k-1,\,n-k}$, with
k-1 (numerator df)
n-k (denominator df)
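The calculation and the comparison against the critical value can be sketched in Python as follows. The function name `f_ratio` is hypothetical. The critical value 3.68 for 2 and 15 degrees of freedom at the 5% level is the value used in this deck for the interviewer example.

```python
def f_ratio(sst, sse, k, n):
    """F = [SST / (k - 1)] / [SSE / (n - k)]."""
    mst = sst / (k - 1)   # mean square for treatment
    mse = sse / (n - k)   # mean square for error
    return mst / mse


# Interviewer example: 3 groups, 18 observations in total.
f_calc = f_ratio(sst=193, sse=819.5, k=3, n=18)
f_crit = 3.68  # 5% critical value for (2, 15) df, as used in this deck

reject_h0 = f_calc > f_crit
print(f"F = {f_calc:.2f}, reject H0: {reject_h0}")
```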
The Results
For 95% confidence ($\alpha = .05$), our critical F is 3.68.
In this case, 1.77 < 3.68, so we fail to reject the
null hypothesis.
The HR Head is puzzled by these results because,
just by eyeballing the data, it looks like
Interviewer 3 has given higher scores.
Two way ANOVA
Now SS(total) = SST + SSB + SSE
Where SSB = the variability among blocks,
where a block is a matched group of
observations from each of the populations
We can calculate a two-way ANOVA to test our
null hypothesis.
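For reference, the decomposition above carries the following degrees of freedom and F ratios in a randomized block design with one observation per treatment-block cell. The symbol $b$ for the number of blocks is an assumption introduced here; $k$ is the number of treatments, as before.

```latex
\begin{align*}
SS(\text{total}) &= SST + SSB + SSE \\
kb - 1 &= (k - 1) + (b - 1) + (k - 1)(b - 1) \\
F_{\text{treatments}} &= \frac{SST/(k-1)}{SSE/\big((k-1)(b-1)\big)} \\
F_{\text{blocks}} &= \frac{SSB/(b-1)}{SSE/\big((k-1)(b-1)\big)}
\end{align*}
```

Each F ratio is compared with its own critical value, using its own numerator degrees of freedom and the shared error degrees of freedom $(k-1)(b-1)$.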
Example: Solving with Excel
• V K Foods Ltd is a leading manufacturer of biscuits. The
company has launched a new brand in the four metros; A, B,
C and D. After one month, the company realizes that there is
a difference in the retail price per pack of biscuits across
cities. Before the launch, the company had promised its
employees and newly-appointed retailers that the biscuits
would be sold at a uniform price in the country. The
difference in price can tarnish the image of the company. In
order to make a quick inference, the company collected data
about the price from six randomly selected stores across the
four cities. Based on the sample information, the price per
pack of the biscuits, in rupees, is given in the table
Data
A B C D
22 19 18 21
22.5 19.5 17 20
21.5 19 18.5 21.5
22 20 17 20
22.5 19 18.5 21
21.5 21 17 20
Hypotheses
• Null hypothesis: the mean prices in all four cities are equal
• Alternative hypothesis: at least one city's mean
price differs from the others
Excel Path
• Select Data Analysis
• From Data Analysis dialogue box, select Anova:
Single Factor
• Click OK
• In the Anova: Single Factor dialogue box, enter
the location of the samples in the variable Input
Range box. Select Grouped by Columns.
• Enter the value of α in the Alpha box
• Click OK
Minitab Path
• Select Stat from the menu bar
• A pull down menu will appear- select ANOVA
• Another pull down menu will appear- select One Way
Unstacked
• One Way dialogue box will appear
• By using select, place samples in the Responses (in
separate columns) box and place the confidence level
• Click OK
• F and p values will appear in the output box
Output Sheet: Excel
Anova: Single Factor

SUMMARY
Groups      Count   Sum     Average    Variance
Column 1    6       132     22         0.2
Column 2    6       117.5   19.58333   0.641667
Column 3    6       106     17.66667   0.566667
Column 4    6       123.5   20.58333   0.441667

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        59.70833   3    19.90278   43.03303   6.54E-09   3.098391
Within Groups         9.25       20   0.4625

Total                 68.95833   23

df for SST is (4-1) = 3, df for SSE is (24-4) = 20, df for total is (24-1) =
23. F theoretical is 3.098, whereas F calculated is 43.03. As F
calculated is greater than F theoretical, F falls in the rejection region.
The null hypothesis is rejected.
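The sums of squares, mean squares, and F ratio in the Excel output can be cross-checked without Excel. The following is a plain-Python sketch using the price data from the table. The function name `one_way_anova` and the dictionary keys are hypothetical. The p-value and critical value are not recomputed here.

```python
# Price per pack (rupees) in six stores in each of the four cities.
prices = {
    "A": [22, 22.5, 21.5, 22, 22.5, 21.5],
    "B": [19, 19.5, 19, 20, 19, 21],
    "C": [18, 17, 18.5, 17, 18.5, 17],
    "D": [21, 20, 21.5, 20, 21, 20],
}


def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA table as a dict."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return {
        "ss_between": ss_between, "df_between": k - 1, "ms_between": ms_between,
        "ss_within": ss_within, "df_within": n - k, "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


table = one_way_anova(list(prices.values()))
for key, value in table.items():
    print(f"{key}: {value:.5f}")
```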
Conclusion
• There is enough evidence to believe that there
is a significant difference in the prices across
four cities
Two Way ANOVA
• A company which produces stationery items wants to diversify
into the photocopy paper manufacturing business. The company
has decided to first test market the product in three areas termed
as the north area, central area, and the south area. The company
takes a random sample of five salesmen S1, S2, S3, S4 and S5 for
this purpose. The sales volume generated by these five salesmen,
in thousand rupees, and total sales in different regions are given
in the table.
• Use a randomized block design analysis to examine:
– Whether the salesmen significantly differ in performance?
– Whether there is a significant difference in terms of sales capacity
between the regions?
– Take 95% confidence level
Data
Region             Salesmen
                   S1    S2    S3    S4    S5    Region’s Total
North              24    30    26    23    32    135
Central            22    32    27    25    31    137
South              23    28    25    22    32    130
Salesmen’s Total   69    90    78    70    95    402
Hypotheses
• Divided into two parts
• For treatments (columns)
• For blocks (rows)
• For treatments: Null hypothesis: all the
treatment means are equal; Alternative
hypothesis: not all the treatment means are equal
• For blocks: Null hypothesis: all the block means
are equal; Alternative hypothesis: not all the block
means are equal
Excel Path
• Select Data Analysis
• From Data Analysis dialogue box, select Anova:
Two Factor Without Replication
• Click OK
• In the Anova: Two Factor dialogue box, enter
the location of the samples in the variable Input
Range box
• Enter the value of α in the Alpha box
• Click OK
Output Sheet: Excel
Anova: Two-Factor Without Replication

SUMMARY     Count   Sum   Average    Variance
Row 1       5       135   27         15
Row 2       5       137   27.4       17.3
Row 3       5       130   26         16.5

Column 1    3       69    23         1
Column 2    3       90    30         4
Column 3    3       78    26         1
Column 4    3       70    23.33333   2.333333
Column 5    3       95    31.66667   0.333333

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Rows                  5.2        2    2.6        1.714286   0.2401     4.45897
Columns               183.0667   4    45.76667   30.17582   7.09E-05   3.837853
Error                 12.13333   8    1.516667

Total                 200.4      14

The F calculated for columns (salesmen) is 30.18, which is greater than the
F theoretical of 3.84, so the null hypothesis is rejected.
The F calculated for rows (regions) is 1.71, which is less than the F
theoretical of 4.46, so the null hypothesis is not rejected.
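The row, column, and error sums of squares in this output can be reproduced with a short plain-Python sketch of the two-factor analysis without replication, using the sales table (rows are regions, columns are salesmen). The variable names are illustrative. The p-values and critical values are not recomputed.

```python
# Sales (thousand rupees): rows = regions, columns = salesmen S1..S5.
sales = [
    [24, 30, 26, 23, 32],  # North
    [22, 32, 27, 25, 31],  # Central
    [23, 28, 25, 22, 32],  # South
]

rows, cols = len(sales), len(sales[0])
grand = sum(map(sum, sales)) / (rows * cols)

row_means = [sum(r) / cols for r in sales]
col_means = [sum(r[j] for r in sales) / rows for j in range(cols)]

# Sums of squares for rows, columns, total, and the residual error.
ss_rows = cols * sum((m - grand) ** 2 for m in row_means)
ss_cols = rows * sum((m - grand) ** 2 for m in col_means)
ss_total = sum((x - grand) ** 2 for r in sales for x in r)
ss_error = ss_total - ss_rows - ss_cols

df_rows, df_cols = rows - 1, cols - 1
df_error = df_rows * df_cols
ms_error = ss_error / df_error

f_rows = (ss_rows / df_rows) / ms_error
f_cols = (ss_cols / df_cols) / ms_error

print(f"SS rows = {ss_rows:.4f}, SS columns = {ss_cols:.4f}, SS error = {ss_error:.4f}")
print(f"F rows = {f_rows:.4f}, F columns = {f_cols:.4f}")
```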
Conclusion
• There is enough evidence to believe that there
is a significant difference in the performance
of the five salesmen in terms of sales
generated. On the other hand, there is no
significant difference in the capacity of
generating sales across the three regions
Illustrative Applications of One-Way
Analysis of Variance

We illustrate the concepts discussed in this chapter
using the data presented in the next table.

The department store is attempting to determine the
effect of in-store promotion (X) on sales (Y).

The null hypothesis is that the category means are
equal:
$H_0: \mu_1 = \mu_2 = \mu_3$
Effect of Promotion and Clientele on Sales

Store Number   Coupon Level   In-Store Promotion   Sales   Clientele Rating
1 1.00 1.00 10.00 9.00
2 1.00 1.00 9.00 10.00
3 1.00 1.00 10.00 8.00
4 1.00 1.00 8.00 4.00
5 1.00 1.00 9.00 6.00
6 1.00 2.00 8.00 8.00
7 1.00 2.00 8.00 4.00
8 1.00 2.00 7.00 10.00
9 1.00 2.00 9.00 6.00
10 1.00 2.00 6.00 9.00
11 1.00 3.00 5.00 8.00
12 1.00 3.00 7.00 9.00
13 1.00 3.00 6.00 6.00
14 1.00 3.00 4.00 10.00
15 1.00 3.00 5.00 4.00
16 2.00 1.00 8.00 10.00
17 2.00 1.00 9.00 6.00
18 2.00 1.00 7.00 8.00
19 2.00 1.00 7.00 4.00
20 2.00 1.00 6.00 9.00
21 2.00 2.00 4.00 6.00
22 2.00 2.00 5.00 8.00
23 2.00 2.00 5.00 10.00
24 2.00 2.00 6.00 4.00
25 2.00 2.00 4.00 9.00
26 2.00 3.00 2.00 4.00
27 2.00 3.00 3.00 6.00
28 2.00 3.00 2.00 10.00
29 2.00 3.00 1.00 9.00
30 2.00 3.00 2.00 8.00
One-Way ANOVA: Effect of In-store Promotion on
Store Sales

Source of Variation           Sum of squares   df   Mean square   F ratio   F crit (α = 0.05)
Between groups (Promotion)    106.067          2    53.033        17.944    3.35
Within groups (Error)         79.800           27   2.956
TOTAL                         185.867          29   6.409

Cell means

Level of Promotion   Count   Mean
High (1)             10      8.300
Medium (2)           10      6.200
Low (3)              10      3.700
TOTAL                30      6.067

Illustrative Applications of One-Way
Analysis of Variance

• From the table we see that for 2 and 27 degrees of freedom,
the critical value of F is 3.35 at the 95% level. Because the
calculated value of F (17.944) is greater than the critical value, we
reject the null hypothesis.
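The ANOVA table for this example can be rebuilt from the raw store data. The sketch below, in plain Python, starts from (promotion level, sales) pairs copied from the data table for stores 1 to 30, groups them by promotion level, and then computes the sums of squares and the F ratio. The variable names are illustrative, and the critical value 3.35 is the one quoted on this slide.

```python
from collections import defaultdict

# (in-store promotion level, sales) for stores 1..30, from the data table.
observations = [
    (1, 10), (1, 9), (1, 10), (1, 8), (1, 9),
    (2, 8), (2, 8), (2, 7), (2, 9), (2, 6),
    (3, 5), (3, 7), (3, 6), (3, 4), (3, 5),
    (1, 8), (1, 9), (1, 7), (1, 7), (1, 6),
    (2, 4), (2, 5), (2, 5), (2, 6), (2, 4),
    (3, 2), (3, 3), (3, 2), (3, 1), (3, 2),
]

# Group sales by promotion level (1 = high, 2 = medium, 3 = low).
by_level = defaultdict(list)
for level, sales in observations:
    by_level[level].append(sales)

n = len(observations)
k = len(by_level)
grand_mean = sum(s for _, s in observations) / n
cell_means = {lvl: sum(v) / len(v) for lvl, v in by_level.items()}

ss_between = sum(len(v) * (cell_means[lvl] - grand_mean) ** 2
                 for lvl, v in by_level.items())
ss_within = sum((x - cell_means[lvl]) ** 2
                for lvl, v in by_level.items() for x in v)
f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))

print(cell_means)
print(f"SS between = {ss_between:.3f}, SS within = {ss_within:.3f}, F = {f_ratio:.3f}")
print("reject H0:", f_ratio > 3.35)
```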
SPSS Windows
One-way ANOVA can be efficiently performed using the
program COMPARE MEANS and then One-way ANOVA.
To select this procedure using SPSS for Windows click:

Analyze>Compare Means>One-Way ANOVA …

N-way analysis of variance and analysis of covariance
can be performed using GENERAL LINEAR MODEL. To
select this procedure using SPSS for Windows click:

Analyze>General Linear Model>Univariate …
SPSS Windows: One-Way ANOVA
1. Select ANALYZE from the SPSS menu bar.
2. Click COMPARE MEANS and then ONE-WAY ANOVA.
3. Move “Sales [sales]” into the DEPENDENT LIST box.
4. Move “In-Store Promotion[promotion]” to the FACTOR
box.
5. Click OPTIONS
6. Click Descriptive.
7. Click CONTINUE.
8. Click OK.
• Thanks
