
CHAPTER 12 Analysis of Variance

This document provides an introduction to analysis of variance (ANOVA). It discusses how ANOVA can be used to test for differences between three or more population means. It defines key terms like factors, treatments, and response variables. It explains how ANOVA partitions total variation in data into variation among sample means and variation within samples. It notes the assumptions of ANOVA and discusses one-way classification of data where samples are grouped based on a single factor. The document provides an example and introduces the concepts needed to understand one-way ANOVA.

Uploaded by

Ayushi Jangpangi
Copyright
© All Rights Reserved

Chapter 12

Analysis of Variance

LEARNING OBJECTIVES

After studying this chapter, you should be able to

• understand how ‘analysis of variance’ (ANOVA) can be used to test for the equality of three or more population means.
• understand and use terms such as ‘response variable’, ‘factor’ and ‘treatment’ in the analysis of variance.
• learn how to summarize the F-ratio in the form of an ANOVA table.

There is no merit in equality, unless it be equality with the best.


—John Spalding

12.1 INTRODUCTION

In Chapter 10 we introduced hypothesis-testing procedures for the significance of the difference between two sample means, in order to decide whether the means of two populations are equal on the basis of two independent random samples. In all these cases the null hypothesis states that there is no significant difference between the population means, that is, H0 : µ1 = µ2. However, there may be situations where more than two populations are involved and we need to test the significance of differences among three or more sample means. We then need to test the null hypothesis that the three or more populations from which independent samples are drawn have equal (or homogeneous) means, against the alternative hypothesis that the population means are not all equal. Let µ1, µ2, …, µr be the mean values for populations 1, 2, …, r, respectively. Then from the sample data we intend to test the following hypotheses:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_r \qquad H_1: \text{not all } \mu_j \ (j = 1, 2, \ldots, r) \text{ are equal}$$

In other words, the null hypothesis should be rejected if at least one of the r population means differs from the others.

For example, the production levels of three shifts in a factory can be compared to answer questions such as: Is the production level higher or lower on any particular day of the week? Is the Wednesday morning shift's production better or worse than that of any other shift? Production levels can also be analysed for other combinations of days and shifts of the week.

The following are a few examples involving more than two populations where it is necessary to conduct a comparative study to arrive at a statistical inference:

• Effectiveness of different promotional devices in terms of sales
• Quality of a product produced by different manufacturers in terms of an attribute
• Production volume in different shifts in a factory
• Yield from plots of land due to varieties of seeds, fertilizers, and cultivation methods

In such situations we should not simply carry out repeated t-tests on every pair of samples. When many tests are carried out pairwise, the probability that all the combined results are correct falls sharply. Table 12.1 shows how the probability of being correct decreases when we compare the average marks of 2, 3, and 10 students at the end of an examination, each comparison being made at the 95 per cent confidence level (0.95 probability of being correct).

Table 12.1: Calculations of Probability of being Correct

It is clear from the calculations in Table 12.1 that as the number of students (and hence the number of pairwise comparisons) increases, the probability of an error somewhere in the statistical inference about the population means increases. Under certain assumptions, a method known as analysis of variance (ANOVA), developed by R. A. Fisher, is used instead to test the significance of the differences among several population means.
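A minimal sketch of the arithmetic behind Table 12.1 follows. It assumes that every pairwise t-test is made at the 95 per cent confidence level and, for simplicity, that the pairwise tests are independent (they are not, since they share samples, so this is only an approximation). With m means there are m(m − 1)/2 pairs, and the probability that all comparisons are correct is 0.95 raised to that power. The function name `prob_all_correct` is illustrative only.

```python
from math import comb


def prob_all_correct(num_means: int, confidence: float = 0.95) -> float:
    """Probability that every pairwise comparison is correct.

    Assumes the m(m - 1)/2 pairwise tests are independent, a simplification
    used only to show how quickly the probability falls.
    """
    num_pairs = comb(num_means, 2)  # number of pairwise t-tests needed
    return confidence ** num_pairs


for m in (2, 3, 10):
    print(f"{m} means: {comb(m, 2)} pairs, P(all correct) = {prob_all_correct(m):.4f}")
```

For 2, 3 and 10 means this gives about 0.95, 0.857 and 0.099, which is why a single F-test is preferred to many separate t-tests.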

Analysis of variance: A statistical procedure for determining whether the means of several different populations are equal.

The following are a few terms that will be used in the discussion of analysis of variance:

• A sampling plan or experimental design is the way that a sample is selected from the population under study; it determines the amount of information in the sample.
• An experimental unit is the object on which a measurement (or measurements) is taken. Any experimental conditions imposed on an experimental unit may affect the response.
• A factor or criterion is an independent variable whose values are controlled and varied by the researcher.
• A level is the intensity setting of a factor.
• A treatment or population is a specific combination of factor levels.
• The response is the dependent variable being measured by the researcher.

For example,

1. A tyre manufacturing company plans to conduct a tyre-quality study in which quality is the independent variable, called the factor or criterion, and the treatment levels or classifications are low, medium and high quality. The dependent (or response) variable might be the number of kilometres driven before the tyre is rejected for use. Similarly, a study of daily sales volumes may be carried out using a completely randomized design with demographic setting as the independent variable. The treatment levels or classifications would be inner-city stores, stores in metro cities, stores in state capitals, stores in small towns, etc. The dependent variable would be sales in rupees.
2. For production volume in three shifts in a factory, there are two variables: the day of the week and the volume of production in each shift. If one of the objectives is to determine whether mean production volume is the same on all days of the week, then the dependent (or response) variable of interest is production volume. The variables that are related to a response variable are called factors; here the day of the week is the independent variable, and the value assumed by a factor in an experiment is called a level. The combinations of levels of the factors for which the response will be observed are called treatments, i.e., the days of the week. These treatments define the populations or samples, which are differentiated in terms of production volume, and we may need to compare them with each other.

Factor: Another word for the independent variable of interest that is controlled in the analysis of variance.

Factor level: A value at which the factor is controlled.

12.2 ANALYSIS OF VARIANCE APPROACH

The first step in the analysis of variance is to partition the total variation in the
sample data into the following two component variations in such a way that it is
possible to estimate the contribution of factors that may cause variation.

1. The amount of variation among the sample means, or the variation attributable to differences among the sample means. This variation is due either to differences in treatments or to chance, and is measured by the sum of squares denoted by SSC (sum of squares between columns) or SSTR (sum of squares for treatments).
2. The amount of variation within the sample observations. This variation is considered to be due to chance causes or experimental (random) errors. The difference in the values of the various elements in a sample due to chance is called error, and its sum of squares is denoted by SSE.

The observations in the sample data may be classified according to one factor (criterion) or two factors (criteria). The classifications according to one factor and two factors are called one-way classification and two-way classification, respectively. The calculations for total variation and its components may be carried out in each of the two types of classification by (i) the direct method, (ii) the short-cut method, and (iii) the coding method.

Assumptions for Analysis of Variance

The following assumptions are required for analysis of variance:

1. Each population under study is normally distributed with mean µj (j = 1, 2, …, r); the means may or may not be equal, but all the populations have the same variance σ².
2. Each sample is drawn randomly and is independent of the other samples.

12.3 TESTING EQUALITY OF POPULATION (TREATMENT) MEANS: ONE-WAY CLASSIFICATION

Many business applications involve experiments in which different populations (or groups) are classified with respect to only one attribute of interest, such as (i) percentage of marks secured by students in a course, (ii) flavour preference of ice-cream by customers, (iii) yield of crop due to varieties of seeds, and so on. In all such cases the observations in the sample data are classified into several groups based on a single attribute; this is called one-way classification of sample data.

One-way analysis of variance: Analysis of variance in which only one criterion (variable) is used to analyse the difference between more than two population means.

As mentioned before, for all theoretical purposes we refer to populations (i.e., the groups into which the sample data are classified on the basis of a single factor or criterion) as treatments. For example, we may study the effect of a factor (criterion) such as flavour preference on the dependent variable (i.e., sales) across different groups (i.e., varieties of ice-cream). These groups are the treatments in this particular example.

Suppose our aim is to make inferences about r population means µ1, µ2, …, µr based on independent random samples of sizes n1, n2, …, nr from normal populations with a common variance σ². That is, all the normal populations have the same shape, but their locations might differ. The null hypothesis to be tested is stated as:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_r \qquad H_1: \text{not all } \mu_j \ (j = 1, 2, \ldots, r) \text{ are equal}$$

The observed values obtained for r independent samples based on a one-criterion classification can be arranged as shown in Table 12.2.

Table 12.2: One-Criterion Classification of Data

The values $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_r$ are called sample means, and $\bar{\bar{x}}$ is the grand mean of all observations (or measurements) in all the samples.

Since there are k rows and r columns in Table 12.2, the total number of observations is n = rk, provided each of the r samples contains the same number (k) of observations. If the number of observations varies from sample to sample, the total number of observations is n = n1 + n2 + ⋯ + nr.

Illustration: Three brands A, B, and C of tyres were tested for durability. A sample of four tyres of each brand was subjected to the same test, and the number of kilometres until wear-out was noted for each tyre. The data, in thousands of kilometres, are given in Table 12.3.

Table 12.3: Example of Data in ANOVA


 

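Since the same number of observations is obtained from each brand of tyres (population), the number of observations in the table is n = rk = 3 × 4 = 12.

• The sample mean of each of the three samples is given by
$$\bar{x}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} x_{ij}, \qquad j = 1, 2, 3,$$
where $x_{ij}$ is the ith observation in the jth sample.
• The grand mean for all samples is
$$\bar{\bar{x}} = \frac{1}{n}\sum_{j=1}^{r}\sum_{i=1}^{n_j} x_{ij}.$$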

12.3.1 Steps for Testing Null Hypothesis

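Step 1: State the null and alternative hypotheses to test the equality of population means as:

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_r \qquad H_1: \text{not all } \mu_j \ (j = 1, 2, \ldots, r) \text{ are equal}$$

α = level of significance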

Step 2: Calculate total variation. If a single sample of size n is taken from the population, then an estimate of the population variance is given by the sample variance

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

The numerator in s² is called the sum of squares of deviations of the sample values about the sample mean $\bar{x}$ and is denoted by SS. Consequently, a ‘sum of squares’ is a measure of variation. When SS is divided by its degrees of freedom (df), here n − 1, the result is called the mean square, which is an alternative term for the sample variance.

Total variation is represented by the ‘sum of squares total’ (SST) and is equal to the sum of the squared differences between each sample value and the grand mean $\bar{\bar{x}}$:

$$SST = \sum_{j=1}^{r}\sum_{i=1}^{n_j} \left(x_{ij} - \bar{\bar{x}}\right)^2$$

where r = number of samples (or treatment levels)

nj = size of the jth sample

The total variation is divided into two parts, the variation between samples (SSTR) and the variation within samples (SSE):

$$SST = SSTR + SSE$$

Step 3: Calculate variation between sample means. This is usually called the ‘sum of squares between’ and measures the variation between samples due to treatments. In statistical terms, the variation between sample means is also called the between-column variance. The procedure is as follows:

1. Calculate the mean values $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_r$ of all r samples.
2. Calculate the grand mean
$$\bar{\bar{x}} = \frac{T}{n}$$
where T = grand total of all observations
n = number of observations in all r samples.
3. Calculate the difference between the mean of each sample and the grand mean, $\bar{x}_1 - \bar{\bar{x}}, \bar{x}_2 - \bar{\bar{x}}, \ldots, \bar{x}_r - \bar{\bar{x}}$. Square each of these differences, multiply it by the number of observations in the corresponding sample, and add. The total gives the sum of squared differences between the sample means and the grand mean and is denoted by SSC or SSTR:
$$SSTR = \sum_{j=1}^{r} n_j \left(\bar{x}_j - \bar{\bar{x}}\right)^2$$

This sum is also called the sum of squares for treatments (SSTR).

Step 4: Calculate variation within samples. This is usually called the ‘sum of squares within’ and measures the differences within samples due to chance error. Such variation is also called within-sample variance. The procedure is as follows:

1. Calculate the mean values $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_r$ of all r samples.
2. Calculate the difference of each observation in the r samples from the mean of its respective sample.
3. Square all the differences obtained in step 2 and find the total of these squared differences. The total gives the sum of squares of differences within the samples and is denoted by SSE:
$$SSE = \sum_{j=1}^{r}\sum_{i=1}^{n_j} \left(x_{ij} - \bar{x}_j\right)^2$$

This sum is also called the sum of squares for error, SSE = SST − SSTR.
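The direct method of Steps 2 to 4 can be sketched in Python as follows. The data are the ethical-values scores given in Example 12.2 of this chapter; the function name `sums_of_squares` is illustrative, not part of any library.

```python
# Ethical-value scores from Example 12.2 (one list per group, i.e. treatment).
samples = {
    "Marketing Manager": [6, 5, 4, 5, 6, 4],
    "Marketing Research": [5, 5, 4, 4, 5, 4],
    "Advertising": [6, 7, 6, 5, 6, 6],
}


def sums_of_squares(groups):
    """Direct method: return (SST, SSTR, SSE) for a one-way classification."""
    all_obs = [x for g in groups for x in g]
    grand_mean = sum(all_obs) / len(all_obs)
    # SST: squared deviations of every observation from the grand mean
    sst = sum((x - grand_mean) ** 2 for x in all_obs)
    sstr = 0.0
    sse = 0.0
    for g in groups:
        mean_j = sum(g) / len(g)
        # SSTR: n_j times the squared deviation of the sample mean from the grand mean
        sstr += len(g) * (mean_j - grand_mean) ** 2
        # SSE: squared deviations of observations from their own sample mean
        sse += sum((x - mean_j) ** 2 for x in g)
    return sst, sstr, sse


sst, sstr, sse = sums_of_squares(list(samples.values()))
print(f"SST={sst:.2f} SSTR={sstr:.2f} SSE={sse:.2f}")
```

Running it prints SST=14.50 SSTR=7.00 SSE=7.50, and SST = SSTR + SSE holds, as the partition of total variation requires.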

Step 5: Calculate average variation between and within samples (mean squares). Since r independent samples are being compared, r − 1 degrees of freedom are associated with the sum of squares among samples. As each of the r samples contributes nj − 1 degrees of freedom within itself, there are n − r degrees of freedom associated with the sum of squares within samples. Thus the total degrees of freedom equal the sum of the degrees of freedom associated with SSC (or SSTR) and SSE. That is,

$$\text{df}(SST) = \text{df}(SSTR) + \text{df}(SSE): \qquad n - 1 = (r - 1) + (n - r)$$

When these sums of squares are divided by their associated degrees of freedom, we get the following variances, or mean square terms:

$$MSTR = \frac{SSTR}{r - 1}, \qquad MSE = \frac{SSE}{n - r}$$

It may be noted that the quantity MSE = SSE/(n − r) is a pooled estimate of σ² (a weighted average of all r sample variances), whether or not H0 is true.

Step 6: Apply the F-test statistic with r − 1 degrees of freedom for the numerator and n − r degrees of freedom for the denominator:

$$F = \frac{MSTR}{MSE} = \frac{SSTR/(r - 1)}{SSE/(n - r)}$$

Step 7: Make a decision regarding the null hypothesis. If the calculated value of the F-test statistic is more than its right-tail critical value Fα(r − 1, n − r) at the given level of significance α and degrees of freedom r − 1 and n − r, then reject the null hypothesis. In other words, as shown in Fig. 12.1, the decision rule is:

• Reject H0 if the calculated value of F > its critical value Fα(r − 1, n − r)
• Otherwise accept H0
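Steps 5 to 7 can be sketched as below. The inputs are the sums of squares for the ethical-values data of Example 12.2 (SSTR = 7, SSE = 7.5, r = 3, n = 18), and the critical value 3.68 is the table value quoted in that example; the function name `f_ratio` is hypothetical.

```python
def f_ratio(sstr: float, sse: float, r: int, n: int):
    """Return (df1, df2, MSTR, MSE, F) for a one-way ANOVA with r samples
    and n observations in total."""
    df1 = r - 1        # degrees of freedom between samples
    df2 = n - r        # degrees of freedom within samples
    mstr = sstr / df1  # mean square for treatments
    mse = sse / df2    # mean square for error (pooled estimate of sigma^2)
    return df1, df2, mstr, mse, mstr / mse


df1, df2, mstr, mse, f = f_ratio(sstr=7.0, sse=7.5, r=3, n=18)
F_TABLE = 3.68  # F_0.05(2, 15) from the F table, as quoted in Example 12.2
decision = "reject H0" if f > F_TABLE else "accept H0"
print(df1, df2, mstr, mse, f, decision)
```

This prints `2 15 3.5 0.5 7.0 reject H0`, the same conclusion reached in Example 12.2.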

The F-distribution is a family of distributions, each identified by a pair of degrees of freedom. The first number refers to the degrees of freedom in the numerator of the F-ratio, and the second refers to the degrees of freedom in the denominator. In the F table, the columns represent the degrees of freedom for the numerator and the rows represent the degrees of freedom for the denominator.
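Printed F tables are the usual source of critical values. As a cross-check, the sketch below computes the right-tail area of the F-distribution from the regularized incomplete beta function, evaluated with the standard continued-fraction method (as described, for example, in Numerical Recipes), and inverts it by bisection to obtain Fα(df1, df2). It uses only the Python standard library; in practice a statistics package such as SciPy (`scipy.stats.f`) would normally be used instead. All function names here are hypothetical.

```python
import math

_TINY = 1e-300  # guards against division by zero in the continued fraction


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > _TINY else _TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_sf(f, df1, df2):
    """Right-tail area P(F > f) for the F-distribution with (df1, df2) df."""
    if f <= 0.0:
        return 1.0
    return betainc(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def f_critical(alpha, df1, df2):
    """Critical value F_alpha(df1, df2), found by bisection on f_sf."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, df1, df2) > alpha:  # widen the bracket until the tail < alpha
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if f_sf(mid, df1, df2) > alpha:
            lo = mid  # tail area still too big: critical value lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


for df1, df2 in [(2, 9), (2, 15), (3, 8)]:
    print(df1, df2, round(f_critical(0.05, df1, df2), 2))
```

For (2, 9), (2, 15) and (3, 8) degrees of freedom at α = 0.05, the results should agree with the table values 4.26, 3.68 and 4.07 used in Examples 12.1 to 12.3.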

Figure 12.1 Rejection Region for Null Hypothesis using ANOVA


 

If the null hypothesis H0 is true, then the variance in the sample means, measured by MSTR = SSTR/(r − 1), provides an unbiased estimate of σ². But if H0 is false and the population means are different, then MSTR tends to be larger than σ², so the F-ratio tends to be large, as illustrated in Fig. 12.2.

Table 12.4 shows the general arrangement of the ANOVA table for one-factor
analysis of variance.

ANOVA table: A standard table used to summarize the analysis of variance calculations and results.

Figure 12.2 Sample Means Drawn from Identical Populations

Table 12.4: ANOVA Summary Table


 

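Short-Cut Method. The values of SST, SSTR and SSE can be calculated by applying the following short-cut method:

• Calculate the grand total of all observations in the samples,
$$T = \sum_{j=1}^{r}\sum_{i=1}^{n_j} x_{ij}$$
• Calculate the correction factor
$$CF = \frac{T^2}{n}$$
• Find the sum of the squares of all observations in the r samples and subtract CF from this sum to obtain the total sum of squares of deviations, SST:
$$SST = \sum_{j=1}^{r}\sum_{i=1}^{n_j} x_{ij}^2 - CF$$
• With $T_j$ denoting the total of the jth sample, the sums of squares between and within samples are
$$SSTR = \sum_{j=1}^{r}\frac{T_j^2}{n_j} - CF, \qquad SSE = SST - SSTR$$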
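The short-cut formulas can be checked on the ethical-values data of Example 12.2. This is a minimal sketch; the variable names simply mirror the symbols T, CF, SST, SSTR and SSE in the text.

```python
# Short-cut method on the Example 12.2 data (one list per sample).
groups = [
    [6, 5, 4, 5, 6, 4],  # Marketing Manager
    [5, 5, 4, 4, 5, 4],  # Marketing Research
    [6, 7, 6, 5, 6, 6],  # Advertising
]

n = sum(len(g) for g in groups)   # total number of observations
T = sum(sum(g) for g in groups)   # grand total of all observations
CF = T ** 2 / n                   # correction factor
# Total SS: sum of squares of all observations minus CF
sst = sum(x ** 2 for g in groups for x in g) - CF
# Between samples: sum of T_j^2 / n_j minus CF
sstr = sum(sum(g) ** 2 / len(g) for g in groups) - CF
sse = sst - sstr                  # within samples (error)

print(T, CF, sst, sstr, sse)
```

It prints `93 480.5 14.5 7.0 7.5`, the same sums of squares that the direct method gives.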

Coding Method. Sometimes the method explained above takes a lot of computational time because of the magnitude of the numerical values of the observations. The coding method is based on the fact that the F-test statistic used in the analysis of variance is a ratio of variances and has no unit of measurement. Thus its value does not change if an appropriate constant is added to or subtracted from each of the observations in the sample data, or if each observation is multiplied or divided by a nonzero constant. This adjustment reduces the magnitude of the numerical values in the sample data and reduces the computational time needed to calculate the F value, without changing it.
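The invariance that the coding method relies on can be demonstrated numerically. The sketch below, assuming the Example 12.2 data, computes F for the raw scores and for coded scores obtained by subtracting 5 and then multiplying by 10 (constants chosen only for illustration). The sums of squares change by a factor of 100, but their ratio does not. The function name `one_way_f` is hypothetical.

```python
def one_way_f(groups):
    """F = MSTR / MSE for a one-way classification (direct method)."""
    all_obs = [x for g in groups for x in g]
    n, r = len(all_obs), len(groups)
    grand_mean = sum(all_obs) / n
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (sstr / (r - 1)) / (sse / (n - r))


raw = [[6, 5, 4, 5, 6, 4], [5, 5, 4, 4, 5, 4], [6, 7, 6, 5, 6, 6]]
# Coding: subtract a constant from every observation, then multiply by another.
coded = [[(x - 5) * 10 for x in g] for g in raw]

print(round(one_way_f(raw), 6), round(one_way_f(coded), 6))
```

Both values print as 7.0.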

Example 12.1: To test the significance of variation in the retail prices of a commodity in three principal cities, Mumbai, Kolkata, and Delhi, four shops were chosen at random in each city, and the prices observed (in rupees) were as follows:

Do the data indicate that the prices in the three cities are significantly different?

[Jammu Univ., M.Com, 1997]


Solution: Let us take the null hypothesis that there is no significant difference in
the prices of a commodity in the three cities. Calculations for analysis of variance are
as under.

There are r = 3 treatments (samples) with n1 = 4, n2 = 4, n3 = 4, and n = 12.

Table 12.5: ANOVA Table


 

The table value of F for df1 = 2, df2 = 9, and α = 5 per cent level of significance is 4.26. Since the calculated value of F is less than its critical (or table) value, the null hypothesis is accepted. Hence we conclude that the prices of the commodity in the three cities do not differ significantly.

Example 12.2: A study investigated the perception of corporate ethical values among individuals specializing in marketing. Using α = 0.05 and the following data (higher scores indicate higher ethical values), test for significant differences in perception among three groups.

Marketing Manager | Marketing Research | Advertising
6 | 5 | 6
5 | 5 | 7
4 | 4 | 6
5 | 4 | 5
6 | 5 | 6
4 | 4 | 6

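Solution: Let us assume the null hypothesis that there is no significant difference in ethical values among individuals specializing in marketing. Calculations for analysis of variance are as under (the sum of the squares of all 18 observations is 495):

$$T_1 = 30,\quad T_2 = 27,\quad T_3 = 36,\quad T = 93,\quad CF = \frac{93^2}{18} = 480.5$$
$$SST = 495 - 480.5 = 14.5,\qquad SSTR = \frac{30^2 + 27^2 + 36^2}{6} - 480.5 = 7,\qquad SSE = 14.5 - 7 = 7.5$$

There are r = 3 treatments (samples) with n1 = n2 = n3 = 6 and n = 18.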

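Table 12.6: ANOVA Table

Source of variation | Sum of squares | df | Mean square | F
Between groups | 7.0 | 2 | 3.5 | 7.0
Within groups | 7.5 | 15 | 0.5 |
Total | 14.5 | 17 | |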

The table value of F for df1 = 2, df2 = 15, and α = 0.05 is 3.68. Since the calculated value F = 7 is more than its table value, the null hypothesis is rejected. Hence we conclude that there is a significant difference in ethical values among the three groups of individuals specializing in marketing.

Example 12.3: As head of the department of a consumers' research organization, you have the responsibility for testing and comparing the lifetimes of four brands of electric bulbs. Suppose you test the lifetime of three electric bulbs of each of the four brands. The data are shown below, each entry representing the lifetime of an electric bulb, measured in hundreds of hours:

Can we infer that the mean lifetimes of the four brands of electric bulbs are equal?

[Roorkee Univ., MBA, 2000]

Solution: Let us take the null hypothesis that the mean lifetimes of the four brands of electric bulbs are equal.

Subtracting a common figure, 20, from each observation, the calculations with the coded data are as under:

Degrees of freedom: df1 = r − 1 = 4 − 1 = 3; df2 = n − r = 12 − 4 = 8

Table 12.7: ANOVA Table

The table value of F for df1 = 3, df2 = 8, and α = 0.05 is 4.07. Since the calculated value F = 1.67 is less than its table value, the null hypothesis is accepted. Hence we conclude that the difference in the mean lifetimes of the four brands of bulbs is not significant, and we infer that the average lifetimes of the four brands of bulbs are equal.

12.4 INFERENCES ABOUT POPULATION (TREATMENT) MEANS

When the null hypothesis H0 is rejected, it implies that not all population means are equal. However, we may not be satisfied with this conclusion and may want to know which population means differ. The answer to this question comes from the construction of confidence intervals using small-sample procedures based on the t-distribution.

For a single population mean µ, the confidence interval is given by

$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$

where $\bar{x}$ is the sample mean from the population. Similarly, the confidence interval for the difference between two population means µ1 and µ2 is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$

where $\bar{x}_1$ and $\bar{x}_2$ = means of the samples from populations 1 and 2, respectively;

n1 and n2 = number of observations in samples 1 and 2, respectively.

To use these confidence intervals, we need to know

• how to calculate s or s², the best estimate of the common population variance;
• how many degrees of freedom to use for the critical value of the t-test statistic.

To answer these questions, recall that in the analysis of variance we assume that the population variances are equal for all populations. The best estimate of this common value is the mean square error, MSE = SSE/(n − r), which provides an unbiased estimate of σ² whether or not H0 is true. We therefore use s² = MSE (or s = √MSE) to estimate σ², together with tα/2 at the specified level of significance α and n − r degrees of freedom, where n = n1 + n2 + ⋯ + nr.
Mean square error (MSE): The mean of the squared errors used to
judge the quality of a set of errors.

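Illustration

From Example 12.2 we know that $\bar{x}_1 = 5$, $\bar{x}_3 = 6$, n1 = n3 = 6, n = n1 + n2 + n3 = 18, s² = MSE = 0.5, and α = 0.05 level of significance, so that t0.025 with n − r = 15 degrees of freedom is 2.131. The confidence interval for µ1 − µ3 is computed as:

$$(\bar{x}_1 - \bar{x}_3) \pm t_{\alpha/2}\sqrt{MSE\left(\frac{1}{n_1} + \frac{1}{n_3}\right)} = (5 - 6) \pm 2.131\sqrt{0.5\left(\frac{1}{6} + \frac{1}{6}\right)} = -1 \pm 0.870$$

that is, −1.870 ≤ µ1 − µ3 ≤ −0.130.

Since zero is not included in this interval (both end points are negative), we may conclude that there is a significant difference between the selected population means. That is, the ethical values of marketing managers and of advertising personnel differ significantly.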

Remark: If the end points of the confidence interval have the same sign, then we
may conclude that there is a significant difference between the selected population
means.
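The confidence interval for µ1 − µ3 in the illustration can be computed with a short sketch, assuming the Example 12.2 values. The t value 2.131 is the table value t0.025 with 15 degrees of freedom, passed in as a parameter rather than computed; the function name `diff_ci` is hypothetical.

```python
import math


def diff_ci(mean_i, mean_j, n_i, n_j, mse, t_crit):
    """Confidence interval for mu_i - mu_j using s^2 = MSE from the ANOVA.

    t_crit is the table value t_{alpha/2} with n - r degrees of freedom.
    """
    half_width = t_crit * math.sqrt(mse * (1.0 / n_i + 1.0 / n_j))
    diff = mean_i - mean_j
    return diff - half_width, diff + half_width


# Example 12.2: mean_1 = 5, mean_3 = 6, n_1 = n_3 = 6, MSE = 0.5, df = 18 - 3 = 15.
low, high = diff_ci(5, 6, 6, 6, mse=0.5, t_crit=2.131)
print(round(low, 3), round(high, 3))
# Same sign at both end points means zero is excluded from the interval.
print("significant" if low * high > 0 else "not significant")
```

It gives approximately (−1.870, −0.130); because both end points have the same sign, the difference is significant, in line with the Remark above.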

Self-Practice Problems 12A

12.1 Kerala Traders Co. Ltd. wishes to test whether its three salesmen A, B, and C tend to make sales of the same size or whether they differ in their selling ability as measured by the average size of their sales. During the last week there have been 14 sales calls: A made 5 calls, B made 4 calls, and C made 5 calls. Following are the weekly sales records of the three salesmen:

Perform the analysis of variance and draw your conclusions.

[Madras Univ., MCom, 1996; Madurai Univ., MCom, 1996]

12.2 There are three main brands of a certain powder. A sample of 120 packets sold
is examined and found to be allocated among four groups A, B, C, and D, and brands
I, II and III, as shown below:

 
 

Is there any significant difference in brand preferences?


12.3 An agricultural research organization wants to study the effect of four types of fertilizers on the yield of a crop. It divided the entire field into 24 plots of land and applied each fertilizer at random to 6 plots of land. Part of the calculations is shown below:

1. Fill in the blanks in the ANOVA table.
2. Test at α = 0.05 whether the fertilizers differ significantly.

12.4 A manufacturing company has purchased three new machines of different makes and wishes to determine whether one of them is faster than the others in producing a certain output. Five hourly production figures are observed at random from each machine, and the results are given below:

Use analysis of variance to determine whether the machines are significantly different in their mean speed.
12.5 The following figures relate to the number of units of a product sold in five different areas by four salesmen:

Is there a significant difference in the efficiency of these salesmen?


[Osmania Univ., MBA, 1998]

12.6 Four machines A, B, C, and D are used to produce a certain kind of cotton fabric. Samples of size 4, with each unit being 100 square metres, are selected at random from the outputs of the machines, and the number of flaws in each 100 square metres is counted, with the following results:

Do you think that there is a significant difference in the performance of the four machines?
[Kumaon Univ., MBA, 1998]

Hints and Answers


12.1 Let H0 : No difference in average sales of three salesmen.

Divide each observation by 100 and use the coded data for analysis of variance.

 
 

Since the calculated value of F = 1.83 is less than its table value F = 3.98 at df1 =
2, df2 = 11, and α = 0.05, the null hypothesis is accepted.
12.2 Let H0: There is no significant difference in brand preference.

Since calculated value of F = 3.69 is less than its table value F = 4.26 at df1 =
2, df2 = 9, and α = 0.05, the null hypothesis is accepted.
12.3 Given: total number of observations, n = 24; number of samples, r = 4.
Total df = n − 1 = 24 − 1 = 23
df1 = r − 1 = 4 − 1 = 3 (between the groups, i.e., fertilizers); df2 = n − r = 24 − 4 = 20 (within the groups)

 
 

Since the calculated value of F = 5.99 is more than its table value F = 3.10 at df1 =
3, df2 = 20, and α = 0.05, the null hypothesis is rejected.
12.4 Let H0 : Machines are not significantly different in their mean speed.

Since the calculated value of F = 7.50 is more than its table value F = 3.89
at df1 = 2, df2 = 12, and α = 0.05, the null hypothesis is rejected.
12.5 Let H0 : No significant difference in the performance of the four salesmen.

Since the calculated value of F = 10.61 is greater than its table value F = 3.24 at df1 = 3, df2 = 16, and α = 0.05, the null hypothesis is rejected.
12.6 Let H0 : Machines do not differ significantly in performance.

 
 

Since the calculated value of F = 25.207 is more than its table value F = 5.95 at df1 = 3, df2 = 12, and α = 0.01, the null hypothesis is rejected.

12.5 TESTING EQUALITY OF POPULATION (TREATMENT) MEANS: TWO-WAY CLASSIFICATION

In one-way ANOVA, the total variation in the sample data is partitioned into two components: (i) variation among the samples due to different samples (or treatments) and (ii) variation within the samples due to random error. However, some of the variation left in the random error of a one-way analysis of variance may not be due to random error or chance but to some other measurable factor. For instance, in Example 12.1 we might feel that part of the variation in price was due to shortcomings in data collection or condensation of data. If so, this accountable variation would have been included in the sum of squares for error (SSE) and would therefore have made the mean square for error (MSE) somewhat too large. Consequently, the F-value would be too small, and we could fail to reject the null hypothesis even when it is false.

The two-way analysis of variance can be used to

• explore one criterion (or factor) of interest while partitioning the sample data so as to remove accountable variation from the error term, and so arrive at a valid conclusion;
• investigate two criteria (factors) of interest when testing the differences between sample means;
• consider any interaction between two variables.

Two-way analysis of variance: Analysis of variance in which two criteria (or variables) are used to analyse the difference between more than two population means.

In two-way analysis of variance we introduce another term, the ‘blocking variable’, to remove the undesirable accountable variation. A blocking variable is a variable that the researcher wants to control but that is not the treatment variable of interest. The term ‘blocking’ comes from agriculture, where it refers to a block of land. The block of land might make some difference in a study of the growth pattern of varieties of seeds for a given type of land. R. A. Fisher designated several different plots of land as blocks, which he controlled as a second variable. Each of the seed varieties was planted on each of the blocks. The main aim of his study was to compare the seed varieties (independent variable); he wanted only to control the differences among the plots of land (blocking variable). For instance, in Example 12.1, each set of three prices in the three cities under a given condition would constitute a ‘block’ of sample data. ‘Blocking’ is an extension of the idea of pairing observations in hypothesis testing. Blocking provides the opportunity for one-to-one comparison of prices, where any observed difference cannot be due to differences among the blocks.

Blocking: The removal of a source of variation from the error term in the analysis of variance.

To ensure that the right conclusion is reached, the data for each sample (group) should be measured under the same conditions, with the variation due to these conditions removed by the use of a blocking factor.

The total variation in the sample data is partitioned into variation between columns (treatments), variation between rows (blocks) and residual (random) error.

The general ANOVA table for c columns (treatments) and r rows (blocks) is shown in Table 12.8.

Table 12.8: General ANOVA Table for Two-way Classification


 

As stated above, the total variation consists of three parts: (i) variation between columns, SSTR; (ii) variation between rows, SSR; and (iii) actual variation due to random error, SSE. That is,

SST = SSTR + (SSR + SSE)

The degrees of freedom associated with SST are cr − 1, where c and r are the numbers of columns and rows, respectively:

Degrees of freedom between columns = c − 1

Degrees of freedom between rows = r − 1

Degrees of freedom for residual error = (c − 1)(r − 1) = N − r − c + 1, where N = cr

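The test statistics F for the two-way analysis of variance are given by

$$F_{\text{treatment}} = \frac{MSTR}{MSE} = \frac{SSTR/(c - 1)}{SSE/[(c - 1)(r - 1)]}, \qquad F_{\text{block}} = \frac{MSR}{MSE} = \frac{SSR/(r - 1)}{SSE/[(c - 1)(r - 1)]}$$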

Decision rule

• If Fcal < Ftable, accept the null hypothesis H0.
• Otherwise reject H0.
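A minimal sketch of the two-way (randomized block) calculations follows, using the correction-factor approach with one observation per cell. The 3 × 4 table of numbers is hypothetical, chosen only to exercise the formulas; it is not the data of Example 12.5 or 12.6, which are not reproduced here. Columns are treatments and rows are blocks, so SSTR, SSR, SSE and the degrees of freedom follow the notation of Table 12.8. The function name `two_way_anova` is illustrative.

```python
def two_way_anova(table):
    """Two-way ANOVA without replication (randomized block design).

    table[i][j] is the observation in row (block) i and column (treatment) j.
    Returns a dict of sums of squares, degrees of freedom and F ratios.
    """
    r, c = len(table), len(table[0])
    N = r * c
    T = sum(sum(row) for row in table)
    CF = T ** 2 / N                                     # correction factor
    sst = sum(x ** 2 for row in table for x in row) - CF
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    sstr = sum(t ** 2 for t in col_totals) / r - CF     # between columns
    ssr = sum(sum(row) ** 2 for row in table) / c - CF  # between rows (blocks)
    sse = sst - sstr - ssr                              # residual error
    df_c, df_r, df_e = c - 1, r - 1, (c - 1) * (r - 1)
    mse = sse / df_e
    return {
        "SST": sst, "SSTR": sstr, "SSR": ssr, "SSE": sse,
        "df": (df_c, df_r, df_e),
        "F_treatment": (sstr / df_c) / mse,
        "F_block": (ssr / df_r) / mse,
    }


# Hypothetical 3 x 4 table (3 blocks, 4 treatments) used only for illustration.
data = [
    [5, 7, 6, 8],
    [4, 6, 6, 7],
    [6, 8, 7, 9],
]
result = two_way_anova(data)
print(result)
```

For this hypothetical table the residual sum of squares is 0.5 with (c − 1)(r − 1) = 6 degrees of freedom; each F ratio would then be compared with its table value as in the decision rule.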

Example 12.5: The following table gives the number of refrigerators sold by 4 salesmen in three months, May, June and July:

 
 

Is there a significant difference in the sales made by the four salesmen? Is there a
significant difference in the sales made during different months?

[Delhi Univ., MCom, 1998]

Solution: Let us take the null hypotheses that there is no significant difference between the sales made by the four salesmen, and no significant difference between the sales made during different months. The given data are coded by subtracting 40 from each observation. Calculations for a two-criteria (month and salesman) analysis of variance are shown in Table 12.9.

Table 12.9: Two-way ANOVA Table

T = Sum of all observations in three samples of months = 48

 
 

The ANOVA table is shown in Table 12.10.

Table 12.10: Two-way ANOVA Table

(a) The table value of F = 4.76 for df1 = 3, df2 = 6, and α = 0.05. Since the calculated value Ftreatment = 1.018 is less than its table value, the null hypothesis is accepted. Hence we conclude that the sales made by the salesmen do not differ significantly.

(b) The table value of F = 5.14 for df1 = 2, df2 = 6, and α = 0.05. Since the calculated value Fblock = 3.327 is less than its table value, the null hypothesis is accepted. Hence we conclude that the sales made during different months do not differ significantly.

Example 12.6: To study the performance of three detergents and three different
water temperatures, the following ‘whiteness’ readings were obtained with specially
designed equipment:

 
 

Perform a two-way analysis of variance, using 5 per cent level of significance.

[Osmania Univ., MBA, 1998]

Solution: Let us take the null hypotheses that there is no significant difference in
the performance of the three detergents and no significant difference due to water
temperature. The data are coded by subtracting 50 from each observation. The data in
coded form are shown in Table 12.11:

Table 12.11: Coded Data

Table 12.12: Two-way ANOVA Table



(a) Since the calculated value of Ftreatment = 9.847 at df1 = 2, df2 = 4 and α = 0.05 is greater
than its table value F = 6.94, the null hypothesis is rejected. Hence we conclude that
there is a significant difference between the performance of the three detergents.

(b) Since the calculated value of Fblock = 2.380 at df1 = 2, df2 = 4 and α = 0.05 is less
than its table value F = 6.94, the null hypothesis is accepted. Hence we conclude that
the water temperature does not make a significant difference in the performance of the
detergents.
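Examples 12.5 and 12.6 both code the data by subtracting a constant (40 and 50) before computing. The short Python check below shows why this is safe: subtracting the same constant leaves every sum of squares, and hence every F ratio, unchanged. The 3 × 3 readings are hypothetical illustrations, not the book's table, and `sums_of_squares` is a hypothetical helper.

```python
# Coding check: subtracting the same constant from every observation shifts each
# value, each total and the grand mean equally, so every deviation (and therefore
# every sum of squares and F ratio) is unchanged.


def sums_of_squares(table):
    """Return (SST, SS between columns, SS between rows) via the CF shortcut."""
    r, c = len(table), len(table[0])
    total = sum(sum(row) for row in table)
    cf = total ** 2 / (r * c)
    sst = sum(x * x for row in table for x in row) - cf
    ss_cols = sum(sum(row[j] for row in table) ** 2 for j in range(c)) / r - cf
    ss_rows = sum(sum(row) ** 2 for row in table) / c - cf
    return sst, ss_cols, ss_rows


# HYPOTHETICAL readings (rows: water temperatures; columns: detergents).
raw = [
    [57, 55, 67],
    [49, 52, 68],
    [54, 46, 58],
]
coded = [[x - 50 for x in row] for row in raw]  # code by subtracting 50

raw_ss = sums_of_squares(raw)
coded_ss = sums_of_squares(coded)
for a, b in zip(raw_ss, coded_ss):
    assert abs(a - b) < 1e-6  # equal apart from floating-point rounding
print([round(v, 3) for v in coded_ss])
```

Coding by subtraction therefore only keeps the hand arithmetic small; it cannot change the conclusions of the analysis.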

Conceptual Questions 12A


1. What are some of the criteria used in the selection of a particular hypothesis testing
procedure?
2. What are the major assumptions of ANOVA?
3. Under what conditions should the one-way ANOVA F-test be selected to examine the
possible difference in the means of independent populations?
4. How is the analysis of variance technique helpful in solving business problems? Illustrate
your answer with suitable examples.
[Kumaon Univ., MBA, 2000]

5. Distinguish between one-way and two-way classifications to test the equality of
population means.
6. What is meant by the term analysis of variance? What types of problems are solved using
ANOVA? Explain.
7. Describe the procedure for performing the test of hypothesis in the analysis of variance.
What is the basic assumption underlying this test?
8. What is meant by the critical value used in the analysis of variance? How is it found?
9. How is the F-distribution related to Student's t-distribution and the chi-square
distribution? What important hypothesis can be tested by the F-distribution?
10. Discuss the components of total variation when samples are selected in blocks.
11. Define the terms treatment, error, ‘within’ and ‘between’, and explain the context in which
these are used.
12. Explain the sum-of-squares principle.
13. Explain how the total deviation is partitioned into the treatment deviation and the error
deviation.
14. Does the quantity MSTR/MSE follow an F-distribution when the null hypothesis of
ANOVA is false? Explain.

Self-Practice Problems 12B


12.7 A tea company appoints four salesmen A, B, C, and D, and observes their sales
in three seasons—summer, winter and monsoon. The figures (in lakhs) are given in
the following table:

1. Do the salesmen differ significantly in performance?
2. Is there a significant difference between the seasons?

[Calcutta Univ., MCom, 1996; Calcutta Univ., MCom, 1998]

12.8 Perform a two-way ANOVA on the data given below:

Use the coding method, subtracting 40 from each of the given numbers.

[CA, May 1996]

12.9 The following data represent the production per day turned out by 5 different
workers using 4 different types of machines:


1. Test whether the mean productivity is the same for the different machine types.
2. Test whether the 5 men differ with respect to mean productivity.

[Madras Univ., MCom, 1997]

12.10 The following table gives the number of units of production per day turned
out by four different types of machines:

Using analysis of variance, (a) test the hypothesis that the mean production is the
same for the four machines, and (b) test the hypothesis that the employees do not
differ with respect to mean productivity.
[Osmania Univ., MCom, 1999]

12.11 In a certain factory, production can be accomplished by four different workers
on five different types of machines. A sample study, in the context of a two-way
design without repeated values, is being made with the twofold objective of examining
whether the four workers differ with respect to mean productivity and whether the
mean productivity is the same for the five different machines. The researcher
involved in this study reports the following results from the analysis of the gathered data:

1. Sum of squares for variance between machines = 35.2
2. Sum of squares for variance between workmen = 53.8
3. Sum of squares for total variance = 174.2

Set up the ANOVA table for the given information and draw the inference about variance
at the 5 per cent level of significance.

12.12 Apply the technique of analysis of variance to the following data, showing the
yields of 3 varieties of a crop each from 4 blocks, and test whether the average yields
of the varieties are equal or not. Also test the equality of the block means.

12.13 Three varieties of potato are planted, each on four plots of land of the same
size and type, and each variety is treated with four different fertilizers. The yields
(in tonnes) are as follows:

Perform an analysis of variance and show whether (a) there is any significant
difference between the average yields of potatoes due to the different fertilizers
used, and (b) there is any difference in the average yields of the different
varieties of potatoes.

Hints and Answers


12.7 Let H0: No significant difference between the sales made by the salesmen, or between
the sales in the three seasons.
Coding the data by subtracting 30 from each figure.


• Since F1 = 1.619 < F0.05(3, 6) = 4.76, accept the null hypothesis.
• Since F2 = 1.417 < F0.05(2, 6) = 5.14, accept the null hypothesis.

12.8

1. F1 = 1.312 < F0.05 (3, 6) = 4.76, accept null hypothesis.
2. F2 = 1.218 < F0.05 (2, 6) = 5.14, accept null hypothesis.

12.9 Let H0: (a) Mean productivity is the same for all machines; (b) Men do not differ
with respect to mean productivity. Coding the data by subtracting 40 from each
figure.


1. F0.05 = 3.49 at df1 = 3 and df2 = 12. Since the calculated value F1 = 18.387 is greater than the
table value, the null hypothesis is rejected.
2. F0.05 = 3.26 at df1 = 4 and df2 = 12. Since the calculated value F2 = 6.574 is greater than the
table value, the null hypothesis is rejected.

12.10 Let H0:

1. Mean production is the same for all machines
2. Employees do not differ with respect to mean productivity

Coding the data by subtracting 40 from each figure.

1. F0.05 = 3.86 at df1 = 3 and df2 = 9. Since the calculated value F1 = 9.72 is more than its table
value, reject the null hypothesis.
2. F0.05 = 3.86 at df1 = 3 and df2 = 9. Since the calculated value F2 = 8.27 is more than its table
value, reject the null hypothesis.

12.11 Let H0:

1. Workers do not differ with respect to their mean productivity
2. Mean productivity of all machines is the same

1. The calculated value F = 2.53 for workers is less than its table value F0.05 = 3.49 at df1 = 3
and df2 = 12, hence the null hypothesis is accepted.
2. The calculated value F = 1.24 for machines is less than its table value F0.05 = 3.26 at df1 = 4
and df2 = 12, hence the null hypothesis is accepted.
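For Problem 12.11 the ANOVA table can be assembled directly from the three reported sums of squares. A minimal Python sketch follows; the 5 per cent critical values are typed in from an F table rather than computed, and all variable names are illustrative.

```python
# ANOVA table for Problem 12.11 from the reported summary figures:
# 4 workers (rows) x 5 machines (columns), one observation per cell.
workers, machines = 4, 5
ss_machines, ss_workers, ss_total = 35.2, 53.8, 174.2

# The residual (error) sum of squares is what the two factors leave unexplained.
ss_error = ss_total - ss_machines - ss_workers

df_machines = machines - 1
df_workers = workers - 1
df_error = df_machines * df_workers

ms_machines = ss_machines / df_machines
ms_workers = ss_workers / df_workers
ms_error = ss_error / df_error

f_machines = ms_machines / ms_error
f_workers = ms_workers / ms_error

# 5% critical values read from a standard F table (typed-in inputs, not computed).
f_table = {"machines": 3.26, "workers": 3.49}

for name, f_cal, df1 in [("machines", f_machines, df_machines),
                         ("workers", f_workers, df_workers)]:
    verdict = "accept H0" if f_cal < f_table[name] else "reject H0"
    print(f"{name}: F = {f_cal:.2f} at df = ({df1}, {df_error}) -> {verdict}")
```

If SciPy is available, `scipy.stats.f.ppf(0.95, dfn, dfd)` returns the same 5 per cent critical values instead of reading them from a table.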

12.12 Let H0:

1. Mean yields of the varieties are equal
2. Block means are equal

Coding the data by subtracting 5 from each figure.


1. Since the calculated value F1 = 1.15 is less than its table value F0.05 = 4.757 at df = (3, 6), the
null hypothesis is accepted.
2. Since the calculated value F2 = 11.20 is less than its table value F0.05 = 19.33 at df = (6, 2),
the null hypothesis is accepted.

12.13 Let H0 :

1. No significant difference in the average yield of potatoes due to different fertilizers
2. No significant difference in the average yield of the three varieties of potatoes

Coding the data by subtracting 158 from each figure.

1. Since Fcal = 1.55 is less than its table value F0.05 = 5.14 at df = (2, 6), the null hypothesis is
accepted.
2. Since Fcal = 9.22 is more than its table value F0.05 = 4.76 at df = (3, 6), the null hypothesis is
rejected.

Formulae Used
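1. One-way analysis of variance (r samples, n_j observations in the j-th sample, n observations in all)
• Grand sample mean
  $\bar{\bar{x}} = \dfrac{1}{n}\sum_{j=1}^{r}\sum_{i=1}^{n_j} x_{ij}$
• Correction factor
  $CF = \dfrac{T^2}{n}$, where $T$ is the sum of all n observations
• Total sum of squares
  $SST = \sum_{j=1}^{r}\sum_{i=1}^{n_j} x_{ij}^2 - CF$
• Sum of squares of variations between samples due to treatment
  $SSTR = \sum_{j=1}^{r} \dfrac{T_j^2}{n_j} - CF$, where $T_j$ is the total of the j-th sample
• Sum of squares of variations within samples, or error sum of squares
  $SSE = SST - SSTR$
• Mean square between samples due to treatments
  $MSTR = \dfrac{SSTR}{r - 1}$
• Mean square within samples due to error
  $MSE = \dfrac{SSE}{n - r}$
• Test statistic for equality of r population means
  $F = \dfrac{MSTR}{MSE}$
• Degrees of freedom
  Total df (n − 1) = Treatment df (r − 1) + Random error df (n − r)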
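2. Two-way analysis of variance (r rows or blocks, c columns or treatments, one observation per cell, N = rc; $T_{i\cdot}$ and $T_{\cdot j}$ are the row and column totals, and $CF = T^2/N$)
• Total sum of squares
  $SST = \sum_{i=1}^{r}\sum_{j=1}^{c} x_{ij}^2 - CF$
• Sum of squares of variations between columns due to treatments
  $SSTR = \dfrac{1}{r}\sum_{j=1}^{c} T_{\cdot j}^2 - CF$
• Sum of squares between rows due to blocks
  $SSR = \dfrac{1}{c}\sum_{i=1}^{r} T_{i\cdot}^2 - CF$
• Sum of squares due to error
  SSE = SST − (SSTR + SSR)
• Degrees of freedom
  dfc = c − 1; dfr = r − 1
  df (residual error) = Total df − (Blocks df + Treatments df) = (rc − 1) − (r − 1) − (c − 1) = (r − 1)(c − 1)
• Mean square between columns due to treatments
  $MSTR = \dfrac{SSTR}{c - 1}$
• Mean square between rows due to blocks
  $MSR = \dfrac{SSR}{r - 1}$
• Mean square of residual error
  $MSE = \dfrac{SSE}{(r - 1)(c - 1)}$
• Test statistic
  $F_{\text{treatment}} = \dfrac{MSTR}{MSE}$ and $F_{\text{block}} = \dfrac{MSR}{MSE}$,
  provided the numerator is bigger than the denominator.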
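The two-way formulas obtain SSE by subtraction. As a sanity check, the Python sketch below computes every sum of squares in deviation form instead, including SSE directly as the sum of squared residuals $x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{\bar{x}}$, and confirms that SST = SSTR + SSR + SSE. The 3 × 4 table is hypothetical and `partition` is an illustrative helper name.

```python
# Deviation-form sums of squares for a two-way table (one observation per cell),
# used to confirm the partition SST = SSTR + SSR + SSE.


def partition(table):
    """Return (SST, SSTR, SSR, SSE) computed from means rather than totals."""
    r, c = len(table), len(table[0])
    grand_mean = sum(sum(row) for row in table) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(row[j] for row in table) / r for j in range(c)]

    sst = sum((x - grand_mean) ** 2 for row in table for x in row)
    sstr = r * sum((m - grand_mean) ** 2 for m in col_means)  # columns (treatments)
    ssr = c * sum((m - grand_mean) ** 2 for m in row_means)   # rows (blocks)
    # Residual of each cell after removing its row effect and its column effect.
    sse = sum((table[i][j] - row_means[i] - col_means[j] + grand_mean) ** 2
              for i in range(r) for j in range(c))
    return sst, sstr, ssr, sse


# HYPOTHETICAL 3 x 4 table (3 blocks, 4 treatments), for illustration only.
data = [
    [12, 15, 11, 14],
    [10, 13, 9, 16],
    [14, 17, 12, 15],
]
sst, sstr, ssr, sse = partition(data)
print(round(sst, 4), round(sstr + ssr + sse, 4))  # the two totals agree
```

Because the identity holds for any table, computing SSE as SST − (SSTR + SSR) is exact, not an approximation.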

Chapter Concepts Quiz

True or False
1. Analysis of variance serves as the basis for a variety of statistical models related to the
design of experiments.
(T/F)

2. Analysis of variance is used to test the hypothesis that the means of several populations
do not differ.
(T/F)

3. Analysis of variance is used to test the hypothesis that the variances of several
populations do not differ.
(T/F)

4. Analysis of variance cannot be used when samples are of unequal size.
(T/F)

5. For analysis of variance, samples drawn from populations need not be independent.
(T/F)

6. All populations need not be normally distributed to draw samples for analysis of variance.
(T/F)

7. If samples of size n each are drawn from k normal populations, the degrees of freedom for
variation within the samples is n (k – 1).
(T/F)

8. Analysis of variance is an extension of the tests for differences between two means.
(T/F)

9. In the analysis of variance the assumption that population variances are equal is called
homogeneity of variance.
(T/F)

10. The homogeneity of variance means that the test is concerned with the hypothesis that
the several means came from the same population.
(T/F)

Multiple Choice
11. The number of parts into which the total variance in a one-way analysis of variance
is partitioned is:

(a) 2
(b) 3
(c) 4
(d) none of these

12. The number of parts into which the total variance in a two-way analysis of variance
is partitioned is:

(a) 2
(b) 3
(c) 4
(d) none of these

13. Any difference among the population means in the analysis of variance will
inflate the expected value of:

(a) SSE
(b) MSTR
(c) MSE
(d) all of these

14. If data in a two-way classification are displayed in r rows and c columns, then the
degrees of freedom for residual error will be:

(a) r − 1
(b) c − 1
(c) (r − 1)
(d) (r − 1) (c − 1)

15. The degrees of freedom between samples for r samples of size n each will be:

(a) r − 1
(b) n − 1
(c) nr − 1
(d) none of these

16. The degrees of freedom associated with the denominator of the F-test in the analysis
of variance are:

(a) r (n − 1)
(b) n (r − 1)
(c) nr − 1
(d) none of these

17. The sum of squares within samples in a one-way analysis of variance is given by:

(a) SST + SSTR
(b) SSTR – SST
(c) SST – SSTR
(d) none of these

18. The total sum of squares in a two-way analysis of variance is given by:

(a)
(b)
(c)
(d) none of these

19. The error sum of squares can be obtained from the equation:

(a) SSE = SST + SSR + SSTR
(b) SSE = SSR + SSC – SST
(c) SSE = SST – SSR – SSTR
(d) none of these

20. The degrees of freedom for the error sum of squares are:

(a) dfe = dft – dfr – dfc
(b) dfe = dfr + dfc – dft
(c) dfe = dft + dfr dfc
(d) none of these

21. Which of these distributions has a pair of degrees of freedom?

(a) Binomial
(b) Chi-square
(c) Poisson
(d) none of these

22. Which of the following techniques is used to test the equality of proportions of more
than two populations?

(a) interval estimate
(b) analysis of variance
(c) chi-square test
(d) none of these

23. Which of the following assumptions of ANOVA can be relaxed when the
sample size is large?

(a) Each population has equal variance
(b) Samples are drawn from a normal population
(c) both (a) and (b)
(d) none of these

24. The test statistic for equality of r population means is:

(a) MSTR/MSE
(b) MSTR/MSR
(c) MSR/MSE
(d) none of these

Concepts Quiz Answers

1. T   2. T   3. F   4. F   5. F
6. F   7. F   8. T   9. T   10. T
11. (a)   12. (b)   13. (b)   14. (d)   15. (a)
16. (a)   17. (c)   18. (b)   19. (c)   20. (a)
21. (d)   22. (c)   23. (b)   24. (a)

Review Self-Practice Problems


12.14 Complete the ANOVA table and determine the extent to which this
information supports the claim that on an average there are no treatment
differences:

12.15 A manager obtained the following data on the time (in days) needed to do a
job. Use these data to test whether the mean time needed to complete a job differs
for four persons. Use α = 0.05.

12.16 A leading oil company claims that its engine oil improves engine efficiency. To
verify this claim, the company's brand A is compared with three competing
brands, B, C and D. The survey data, giving the mileage in km per litre
for a combination of city and highway travel, are as follows:

1. Is there any difference in the average mileage for these four brands?
2. Is there any difference in the average mileage for a combination of city and highway
travel?

12.17 A TV manufacturing company claims that the performance of its brand A TV
set is better than that of two other brands. To verify this claim, a sample of 5 TV sets is
selected from each brand and the number of repairs during the first year after
purchase is recorded. The results are as under:
            TV Brands
      A        B        C
      4        7        4
      6        4        6
      7        3        6
      5        6        3
      8        5        1

In view of these data, can it be concluded that there is a significant difference
between the three brands?

12.18 Three varieties of coal were tested for ash content by five different
laboratories. The results are as under:

In view of these data, can it be concluded that all three varieties of coal have an
equal amount of ash content?

12.19 An insurance company wants to test whether three of its field officers, A, B
and C, in a given territory meet an equal number of prospective customers during a
given period of time. A record of the previous four months showed the following
results for the number of customers contacted by each field officer in each month:


Is there any significant difference in the average number of contacts made by the
three field officers per month?

12.20 A department store chain is considering opening a new store at one of three
locations. An important factor in making such a decision is the household income in
these areas. If the average income per household is similar, then the management
can choose any one of these three locations. A random survey of households
in each location is undertaken and their combined annual income is recorded. The
data are as under:
     Annual Household Income (Rs ’000s)
   Area 1     Area 2     Area 3
     70        100         60
     72        110         65
     75        108         57
     80        112         84
     83        113         84
      –        120         70
      –        100          –

Can the average income per household in these areas be considered to be the
same?

12.21 Four types of advertising displays were set up in twelve retail outlets, with
three outlets randomly assigned to each of the displays. The data on product sales
according to the advertising displays are as under:


Does the type of advertising display used at the point of purchase affect the
average level of sales?

Hints and Answers


12.14 df (within) = total df − df (between) = 9 − 2 = 7, that is, k − 1 = 2 and n − k = 7

12.15 Let H0:

1. No significant difference in the efficiency of the four persons
2. Mean time needed to complete the job is equal


1. Since Fcal = 47.64 is more than its table value F0.05 = 4.76 at df = (3, 6), the null hypothesis is
rejected.
2. Since Fcal = 11.42 is more than its table value F0.05 = 5.14 at df = (2, 6), the null hypothesis is
rejected.

12.16 Let H0:

1. No significant difference in average mileage for the four brands of engine oil.
2. No significant difference in average mileage for a combination of city and highway travel.

1. Since Fcal = 156 is more than its table value F0.05 = 3.862 for df = (3, 9), the null hypothesis is
rejected.
2. Since Fcal = 1.64 is less than its table value F0.05 = 3.862 for df = (3, 9), the null hypothesis is
accepted.

12.17 Let H0: No significant difference in the performance of the three brands of TV
sets.


Since Fcal = 1.58 is less than its table value F0.05 = 3.89 at df = (2, 12), the null
hypothesis is accepted.
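The F ratio for Problem 12.17 can be checked with a short Python sketch of the one-way formulas listed under Formulae Used (correction-factor shortcut, r samples, n observations in all). The function `one_way_anova` is a hypothetical helper, not a library routine; it also accepts samples of unequal size, as in Problem 12.20.

```python
# One-way ANOVA via the correction-factor shortcut; samples may differ in size.


def one_way_anova(samples):
    """Return sums of squares, degrees of freedom and F for r samples."""
    r = len(samples)
    n = sum(len(s) for s in samples)
    grand_total = sum(sum(s) for s in samples)
    cf = grand_total ** 2 / n                               # CF = T^2 / n
    sst = sum(x * x for s in samples for x in s) - cf       # total
    sstr = sum(sum(s) ** 2 / len(s) for s in samples) - cf  # between samples
    sse = sst - sstr                                        # within samples
    mstr, mse = sstr / (r - 1), sse / (n - r)
    return {"SST": sst, "SSTR": sstr, "SSE": sse, "df": (r - 1, n - r), "F": mstr / mse}


# Repair counts for brands A, B and C, as given in Problem 12.17.
brands = [[4, 6, 7, 5, 8], [7, 4, 3, 6, 5], [4, 6, 6, 3, 1]]
res = one_way_anova(brands)
print(res["df"], round(res["F"], 2))  # (2, 12) 1.58
```

Here SSTR = 10 and SSE = 38, so F = (10/2)/(38/12) ≈ 1.58, below F0.05(2, 12) = 3.89.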
12.18 Let H0 : Ash content is equal in all varieties of coal.

Since Fcal = 2.69 is less than its table value F0.05 = 3.89 at df = (2, 12), the null
hypothesis is accepted.
12.19 Let H0 : All field officers met equal number of customers in the previous four
months.

Since Fcal = 3.95 is less than its table value F0.05 = 4.26 at df = (2, 9), the null
hypothesis is accepted.
12.20 Let H0 : No significant difference in the average income per household in all
the three areas.

Since Fcal = 38.55 is more than its table value F0.05 = 3.68 at df = (2, 15), the null
hypothesis is rejected.
12.21 Let H0 : Type of advertising display used at the point of purchase does not
affect the average level of sales.

Since Fcal = 4.53 is more than its table value F0.05 = 4.07 at df = (3, 8), the null
hypothesis is rejected.

Case Studies

Case 12.1: FMCG Company

An FMCG company wished to study the effects of four training programmes on the
sales abilities of its sales personnel. Thirty-two people were randomly divided into
four groups of equal size, and the groups were then subjected to the different sales
training programmes. Because there were some dropouts during the training
programmes due to illness, vacations and so on, the number of trainees completing
the programmes varied from group to group. At the end of the training programmes,
each salesperson was randomly assigned a sales area from a group of sales areas that
were judged to have equivalent sales potential. The sales made by each of the four
groups of salespeople during the first week after completing the training programme
are listed in the table:

Training Programme

Questions for Discussion

1. Analyse the experiment using the appropriate method.
2. Identify the treatments or factors of interest to the researcher and investigate any
significant effects.
3. What are the practical implications of this experiment?
4. Write a paragraph explaining the results of your analysis.
