07 Analysis of Variance
The Paired t-test is used to test whether the mean difference between pairs of measurements is
zero or not.
Example 7.1
A drug manufacturing company invented a drug to control the blood pressure of patients. A group of
patients was enrolled in an experiment, and their blood pressure was noted before taking the drug. All
patients were then administered the drug for a week, after which their blood pressure was
measured again.
Here, we have a pair of measurements for each person, so we can compute the differences. We then
test whether the mean difference is zero.
Example 7.2
Two different techniques are invented to type messages on a small hand-held device. Technique T1
uses a touch screen and technique T2 uses a keyboard. A group of users each possessing a specific
model of a hand-held device participated in an experiment, where each of them had to type a
particular text, and their text entry speed was recorded.
User ID    Text entry time with T1    Text entry time with T2    Difference in times
…          …                          …                          …
…          …                          …                          …
Steps
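1. Calculate the mean of the differences d_i (i = 1, …, n)
   \bar{X} = \frac{1}{n} \sum_i d_i
2. Calculate the variance of the differences
   S^2 = \frac{1}{n-1} \sum_i (d_i - \bar{X})^2 = \frac{\sum d_i^2 - (\sum d_i)^2 / n}{n-1}
3. Calculate the test statistic
   t = \frac{\text{Average difference}}{\text{Standard error}} = \frac{\bar{X}}{S / \sqrt{n}} = \frac{\sum d_i}{\sqrt{\frac{n \sum d_i^2 - (\sum d_i)^2}{n-1}}}
This t-value is compared with the critical value of the t-distribution with n − 1 degrees of freedom at the
chosen level of significance, to decide whether to reject or fail to reject the hypothesis that the mean
difference is zero.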
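The three steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `paired_t` and the blood pressure readings are hypothetical and are not taken from the slides.

```python
import math

def paired_t(before, after):
    """Return (mean difference, variance of differences, t statistic, df)."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    d = [a - b for a, b in zip(after, before)]            # differences d_i
    n = len(d)
    mean_d = sum(d) / n                                   # step 1
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)   # step 2 (divisor n - 1)
    t = mean_d / math.sqrt(var_d / n)                     # step 3: mean / standard error
    return mean_d, var_d, t, n - 1

# Hypothetical readings, invented purely for illustration
before = [150, 142, 160, 155, 148]
after = [144, 140, 151, 150, 147]
mean_d, var_d, t, df = paired_t(before, after)
print(mean_d, var_d, round(t, 3), df)
```

The returned t is then compared with the tabulated critical value for `df` degrees of freedom.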
CS 61061: Data Analytics 8
Example 7.3
An instructor wants to check if two exams are of equal difficulty level. For this, she
conducted two tests with a group of 15 students. The scores on the evaluation of two
tests are shown in Table.
\bar{x} = 1.31
3. t = \frac{\bar{x}}{S_{\bar{X}}} = \frac{\bar{x}}{S / \sqrt{n}}    // average difference divided by its standard error
     = \frac{1.31}{1.75} = 0.750
4. The critical t-value from the table with α = 0.05 and df = n − 1 = 16 − 1 = 15 is 2.131. (The example
statement mentions 15 students; with n = 15, df = 14 and the critical value is 2.145, which leads to the same decision.)
5. Decision: The test statistic is lower than the critical t-value at α = 0.05. Thus we fail to reject
the null hypothesis that the mean difference is zero. In other words, the data give no evidence that the two
tests differ in difficulty.
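p-value estimation: p = P(|T| ≥ 0.750) = 0.4650
This means that, when the underlying population mean difference is zero, the probability of seeing a sample
average difference at least as extreme as 1.31 (in either direction) is about 47%.
Since p is well above α = 0.05, we fail to reject the null hypothesis. Note that 1 − p = 53% should not be read as a confidence level for this decision.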
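The p-value quoted above can be checked numerically. The sketch below assumes a two-sided p-value and integrates the Student t density with Simpson's rule, using only the standard library; the helper names `t_pdf` and `two_sided_p` are hypothetical, and df = 15 is taken from step 4.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) via Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3            # P(0 <= T <= |t|)
    return 1.0 - 2.0 * central

p = two_sided_p(0.750, 15)
print(round(p, 3))
```

A statistics library would normally provide this tail probability directly; the hand-rolled integration is used here only to keep the example self-contained.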
Validity of the Paired t-test
1. Shapiro-Wilk test
2. Kolmogorov-Smirnov test (used for n ≥ 50)
3. Anderson-Darling test
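To test the hypothesis of normality, use the Shapiro-Wilk test statistic, which is given by
   w = \frac{\left( \sum_{i=1}^{n} a_i x_{(i)} \right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
Here, x_{(i)} are the ordered sample values (the i-th smallest value in the sample) and a_i are constants
that are generated by the expression
   (a_1, a_2, …, a_n) = \frac{m^T V^{-1}}{(m^T V^{-1} V^{-1} m)^{1/2}}
where m is the vector of expected values of the order statistics of a standard normal sample of size n and V is their covariance matrix.
(This vector can be obtained from the Shapiro-Wilk table available in statistics textbooks.)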
3. Calculate
   b = a_1 (x_{(n)} - x_{(1)}) + a_2 (x_{(n-1)} - x_{(2)}) + ⋯ + a_m (x_{(n-m+1)} - x_{(m)})
   w = \frac{b^2}{(n-1) S^2},  where S = standard deviation
5. Compare the test statistic with a critical value (from the Shapiro-Wilk table). Let this be w*
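The computation of b and w can be sketched as follows. This is an illustration only: the function name `shapiro_w` is hypothetical, and the coefficients a_1, …, a_m must be supplied by the caller from a Shapiro-Wilk table. The demo uses the tabulated value a_1 = 0.7071 for n = 3.

```python
import statistics

def shapiro_w(sample, coeffs):
    """w = b^2 / ((n - 1) * S^2), with tabulated coefficients a_1..a_m."""
    x = sorted(sample)                  # ordered values x_(1) <= ... <= x_(n)
    n = len(x)
    m = n // 2
    if len(coeffs) != m:
        raise ValueError("expected floor(n/2) coefficients from the table")
    # b = a_1 (x_(n) - x_(1)) + a_2 (x_(n-1) - x_(2)) + ...
    b = sum(a * (x[n - 1 - i] - x[i]) for i, a in enumerate(coeffs))
    s2 = statistics.variance(x)         # sample variance, divisor n - 1
    return b * b / ((n - 1) * s2)

# Three equally spaced values: w should be essentially 1
w = shapiro_w([2.0, 1.0, 3.0], [0.7071])
print(round(w, 4))
```

The resulting w is then compared with the tabulated critical value w* for the same n and α.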
Shapiro-Wilk Test
w = \frac{(39.7683)^2}{(9-1)(15.52)^2}
  = 0.8203
The critical value for n = 9 and α = 0.05 is
w* = 0.8293
Since w < w*, we reject the null hypothesis at the 5% significance level; that is, we conclude
that the given data are not from a normal distribution.
Pooled Two-Sampled t-test
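S_p^2 = \frac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}
      = \frac{n_1 - 1}{n_1 + n_2 - 2} S_1^2 + \frac{n_2 - 1}{n_1 + n_2 - 2} S_2^2
Special case: If n_1 = n_2 = n, then
S_p^2 = \frac{1}{2} (S_1^2 + S_2^2)
Note
The larger sample receives more weight.
The degrees of freedom are n_1 + n_2 − 2, and if n_1 = n_2 = n, then df = 2(n − 1).
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}}
  = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}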
The hypothesis test procedure will follow the same steps as the standard
hypothesis testing.
Sample 1 Sample 2
3.2 4.5
4.5 6.2
3.8 5.8
4.0 6.0
3.7 7.1
3.2 6.8
4.1 7.2
Test the hypothesis that the two agricultural areas have the same fertilization
rate. Assume the significance level α = 5%.
S_p^2 = 0.55
4. This is a one-sided test with 7 + 7 − 2 = 12 degrees of freedom. The critical value with 12
degrees of freedom and α = 0.05 is −1.782.
6. The conclusion is that the two agricultural lands do not have equal fertility rates.
p-value estimation:
p = P(T < −6.16), and we can check that p ≤ 0.05.
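Thus CI = (\bar{x}_1 - \bar{x}_2) ± t_{α/2} · S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
CI = (3.79 − 6.23) ± 1.782 \sqrt{0.55 \left( \frac{1}{7} + \frac{1}{7} \right)}
   = −2.44 ± 0.7064
Here 0.55 is S_p^2. The multiplier 1.782 is the tabulated value t_{0.05, 12}, so this interval has 90% two-sided coverage; with t_{0.025, 12} = 2.179 the half-width would be about 0.864.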
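The pooled variance, the t statistic and the interval half-width for the two samples listed earlier can be reproduced with a short sketch. The function name `pooled_t` is hypothetical; the data are the Sample 1 and Sample 2 columns, and 1.782 is the tabulated multiplier used in the slides.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Return (pooled variance, t statistic, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = statistics.variance(sample1)    # divisor n1 - 1
    s2_sq = statistics.variance(sample2)    # divisor n2 - 1
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    t = diff / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return sp_sq, t, n1 + n2 - 2

sample1 = [3.2, 4.5, 3.8, 4.0, 3.7, 3.2, 4.1]
sample2 = [4.5, 6.2, 5.8, 6.0, 7.1, 6.8, 7.2]
sp_sq, t, df = pooled_t(sample1, sample2)
half_width = 1.782 * math.sqrt(sp_sq * (1 / 7 + 1 / 7))
print(round(sp_sq, 2), round(t, 2), df, round(half_width, 4))
```

Running this gives a pooled variance of about 0.55 and t of about −6.16 with 12 degrees of freedom, in agreement with the worked values.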
Single population
Multiple population
ANOVA determines whether there is any difference between the means of different groups. In
other words, it tests for differences among the population means by examining the
variation within each sample relative to the amount of variation between the samples.
The analysis of variance tests the hypothesis that the means of two or more populations are
equal.
Example 7.7
Let us consider another simple example to understand an application of
the ANOVA test.
Suppose there are three different drugs A, B, and C available from three
drug manufacturing companies. We have to study the effectiveness of the
drugs in curing a disease. The data are given to us as three samples,
one from each drug's population (means 𝜇A, 𝜇B, 𝜇C; variances 𝜎A², 𝜎B², 𝜎C²).
Example 7.8
A typical example of sample data is shown below.
Given this data, the objective is to test at the 0.05 significance level
whether the mean times for the three drugs to cure the disease are equal (this is
the null hypothesis H0).
The Issue in Statistical Testing
A recent study claims that using music in a class enhances the concentration
and consequently helps students absorb more information.
Three different groups of ten randomly selected students from three different
classrooms were taken.
Each classroom was provided with a different environment for students to study.
Classroom A had constant music being played in the background
Classroom B had variable music being played in the background
Classroom C was a regular class with no music playing
A test was conducted after one month for all the three groups and their test scores were
collected.
Maybe it’s true, but there is also a slight chance that we happened to select the
best students from class A, which resulted in better test scores (remember, the
selection was done at random).
1. How do we decide that these three groups performed differently because of the
different situations and not merely by chance?
2. In a statistical sense, how different are these three samples from each other?
This technique was invented by Sir Ronald Aylmer Fisher (1921), and is
often referred to as Fisher’s ANOVA.
However, the t-test is not suitable for comparing the means of more than two populations.
Observations:
Looking only at the means, we can see that they are identical for the three populations in both
the sets.
Using the means alone, we would state that there is no difference between the two sets.
This observation is the basis for using the analysis of variance for making inferences about
differences among means
The analysis of variance is based on the comparison of the variance among the means of the
populations to the variance among sample observations within the individual populations.
The problems to be analyzed
Factor
A characteristic under consideration, thought to
influence the measured observations
Each sample is looked at and the difference between its mean and grand mean is
calculated to calculate the variability.
If the distributions overlap or are close, the grand mean will be similar to the individual
means, whereas if the distributions are far apart, difference between means and grand
mean would be large.
9/27/2023
DATA ANALYTICS
Debasis Samanta
Dependent variable:
It is a measure of some measurable quantity.
Independent variable:
The things under experiment. It is also called a factor.
Group:
It denotes the subcategory of independent variable(s).
Example 7.11
A sample data is shown below.
Drug A Drug B Drug C
100.07 90.54 108.00
90.60 105.05 107.25
103.45 84.15 92.46
95.70 83.18 105.31
110.00 92.35 83.27
125.28 100.00 100.48
121.32 88.45 80.24
114.46 77.33 97.08
Depending on the number of dependent variables and independent
variables (i.e., factors), the ANOVA test can be classified as shown below.
ANOVA
    Factorial ANOVA
        One-way ANOVA
        Two-way ANOVA
    MANOVA
Factorial ANOVA
A Factorial ANOVA is an analysis of variance test with one or more
independent variable(s), that is, factor(s).
Thus, with One-way ANOVA, there is only one factor, whereas Two-way
and Three-way refer to the number of factors 2 and 3,
respectively.
Four-way ANOVA and above are rarely used because the test results
are complex and difficult to interpret.
Variants of ANOVA
Based on the number of independent variables and dependent variables
considered for the study, there are different variants of ANOVA.
1. One-way ANOVA: Only one independent variable (factor) with greater than 2
levels.
2. Two-way ANOVA: Two independent variables (i.e., factors).
3. Three-way ANOVA: Three independent variables (i.e., factors).
Example:
Effectiveness of three drugs manufactured by three drug manufacturing companies to
cure a disease.
Drug A Drug B Drug C
…. …. ….
Case Study 1
Clearly identify the dependent, independent variables and group (level)
in the following situations.
Case 1: We want to test the effect of different types of tea (Black tea,
Green tea, No tea) on weight loss. For this purpose, a group of individuals
was randomly split into smaller groups, and each group was asked to
drink a specific tea for a certain period. The weight loss of
each individual in each group was recorded.
From a population, we selected a number of individuals who are categorized into
the following four categories (called weight groups): Underweight, Overweight,
Obese, and Normal. For each category of individuals, their sprinting skill (i.e.,
running time in a 100 m race) was recorded.
In this case, identify the dependent variable, independent variable (factor) and
groups (levels). Draw a table structure where data can be recorded for the ANOVA
test.
UW    OW    O    N
…     …     …    …
…     …     …    …
Dependent Variable: ??
Independent variable (factor): ??
Group (level): ??
Case Study 2
For the weight-group study above (table columns UW, OW, O, N):
Dependent Variable: Speed (Numeric quantity)
We want to find out if there is an interaction between Income
(Low (L), Medium (M) and High (H)) and Gender (Male (M),
Female (F) and Transgender (T)) for the performance score (e.g.,
performance in a competitive test).
Case Study 3
Dependent Variable:
Score in the competitive examination
Independent variable (factor):
Gender, Income
Gender    Income    Score
M/F/T     L/M/H     …
…         …         …
Note:
In this case, nine different means are to be analyzed:
{𝜇FL, 𝜇FM, 𝜇FH, 𝜇ML, 𝜇MM, 𝜇MH, 𝜇TL, 𝜇TM, 𝜇TH}
MANOVA
MANOVA test is just an ANOVA test with several dependent
variables.
Case Study 4
Suppose three video lectures of different durations, 30 minutes
(S), 60 minutes (M) and 90 minutes (L), have been prepared to teach the
Data Analytics course. A number of students are randomly split into smaller
groups, and each group follows either the 30-minute, 60-minute
or 90-minute video lecture. To test their competence in the subject, two
measures, namely long-term recall (LTR) and short-term recall (STR), are
measured by means of a quiz and a subjective test, respectively.
These performances of the students are recorded in a table for the ANOVA
test.
Identify:
a) Dependent variables
b) Independent variables
Videos    STR Score    LTR Score
S/M/L     …            …
…         …            …
…         …            …
Dependent Variable:
1. Long-term-recall measurement
2. Short-term-recall measurement
Independent variable (factor):
Video Lectures
Group (level): 3 [S, M, L]
What are the different means in this case whose variances are to be
analyzed?
Case Study 5
A study was conducted to see the impact of socio-economic class (Rich (R),
Middle (M) and Poor (P)) and gender (Male (M), Female (F)) on TV-hours/day
and Study hours/day. A sample of 24 people was collected as shown below.
Gender  S-E-C  TV-Hour  Study-Hour    Gender  S-E-C  TV-Hour  Study-Hour
M       R      5        3             F       R      2        3
M       R      4        6             F       R      3        5
M       R      3        4             F       R      5        3
M       R      2        4             F       R      4        2
M       M      4        6             F       M      9        8
M       M      3        6             F       M      6        5
M       M      5        4             F       M      7        6
M       M      5        5             F       M      8        9
M       P      7        5             F       P      8        9
M       P      4        3             F       P      9        8
M       P      3        1             F       P      3        7
M       P      7        2             F       P      5        7
Case Study 5
For the study of socio-economic class and gender versus TV-hours/day and study hours/day:
Dependent Variable:
1. TV-Hours
2. Study-Hours
Independent variable (factor):
1. Gender
2. Socio-Economic class
Group (level):
[M, F] x [R, M, P] = 2x3 = 6 levels
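The six group means (one per gender and class combination, for each dependent variable) can be tabulated with a short sketch. The function name `cell_means` is hypothetical; the rows are transcribed from the Case Study 5 table as (gender, class, TV-hours, study-hours).

```python
from collections import defaultdict

# (gender, socio-economic class, TV-hours, study-hours) from the Case Study 5 table
rows = [
    ("M", "R", 5, 3), ("M", "R", 4, 6), ("M", "R", 3, 4), ("M", "R", 2, 4),
    ("M", "M", 4, 6), ("M", "M", 3, 6), ("M", "M", 5, 4), ("M", "M", 5, 5),
    ("M", "P", 7, 5), ("M", "P", 4, 3), ("M", "P", 3, 1), ("M", "P", 7, 2),
    ("F", "R", 2, 3), ("F", "R", 3, 5), ("F", "R", 5, 3), ("F", "R", 4, 2),
    ("F", "M", 9, 8), ("F", "M", 6, 5), ("F", "M", 7, 6), ("F", "M", 8, 9),
    ("F", "P", 8, 9), ("F", "P", 9, 8), ("F", "P", 3, 7), ("F", "P", 5, 7),
]

def cell_means(rows):
    """Mean TV-hours and study-hours for each (gender, class) cell."""
    sums = defaultdict(lambda: [0, 0, 0])     # TV total, study total, count
    for gender, sec, tv, study in rows:
        cell = sums[(gender, sec)]
        cell[0] += tv
        cell[1] += study
        cell[2] += 1
    return {key: (tv / n, st / n) for key, (tv, st, n) in sums.items()}

means = cell_means(rows)
for key in sorted(means):
    print(key, means[key])
```

Each cell holds four observations, so the 2 x 3 layout yields six pairs of means to be compared.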
Statistical Analysis with ANOVA
Like the statistical tests learned previously, the ANOVA test is
also a form of hypothesis testing. A few cases are illustrated below.
Case Study 1: One-way ANOVA
(Drug A, Drug B, Drug C data of Example 7.11)
H0: The mean times for the three drugs to
cure the disease are equal.
H1: The mean times for the three drugs to
cure the disease are not equal.
Statistical Analysis with ANOVA
Case Study 4: Two-way ANOVA
1. The result from the two-way ANOVA calculates a main effect and an
interaction effect.
2. The main effect is similar to the One-way ANOVA: each factor’s effect
is considered separately. With the interaction effect, all factors are
considered at the same time.
H01 : All the income groups have equal mean performance score.
H02 : All the gender groups have equal mean performance score.
H03 : The factors are independent, that is, there is no interaction effect between gender and income
group on performance score.
Statistical Analysis with ANOVA
Case Study 5: MANOVA
1. … class effect on TV-hours and Study hours study),
2. What are the interactions among dependent variables?
3. What are the interactions among independent variables?
Assumptions for ANOVA Tests
1. The population must be close to a normal distribution.
2. Samples must be independent.
3. Population variance must be equal.
4. Groups should preferably have equal sample sizes.
The F-test is used in ANOVA tests.
One-way ANOVA
H_0 : \mu_i = \mu for all i = 1, 2, …, k
H_1 : \mu_i \neq \mu for some i = 1, 2, …, k
That is, at least one equality is not satisfied.
An entry in the table (e.g., y_{ij}) represents the j-th observation taken under the factor at
level i.
There will be, in general, n_i observations under the i-th level.
y_{i.} represents the total of the observations under the i-th level.
\bar{y}_{i.} represents the average of the observations under the i-th level.
y_{..} represents the grand total of all the observations under the factor.
\bar{y}_{..} represents the grand average of all the observations under the factor.
y_{i.} = \sum_{j=1}^{n_i} y_{ij},    \bar{y}_{i.} = \frac{y_{i.}}{n_i}
y_{..} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij},    \bar{y}_{..} = \frac{y_{..}}{N},  where N = \sum_{i=1}^{k} n_i
SS_i = \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i.})^2    for i = 1, 2, …, k
SS_p = \sum_{i=1}^{k} SS_i
s_p^2 = \frac{SS_p}{\text{pooled degrees of freedom}} = \frac{SS_p}{\sum n_i - k}
Note that if the individual variances are available, the same can be
computed as
s_p^2 = \frac{\sum_{i=1}^{k} (n_i - 1) s_i^2}{\sum n_i - k}
where s_i^2 are the variances for each sample. This is also called the variance
within samples and is also popularly denoted as \hat{\sigma}_W^2.
Example 5: Variance within Samples
The table below shows the lifetimes under controlled conditions, in hours in
excess of 1000 hours, of samples of 60𝑊 electric light bulbs of three
different brands.
Brand
1 2 3
16 18 26
15 22 31
13 20 24
21 16 30
15 24 24
The sample mean and variance (divisor (𝑛 − 1)) for each level are
as follows.
Brand
1 2 3
Sample Size 5 5 5
Sum 80 100 135
Sum of squares 1316 2040 3689
Mean 16 20 27
Variance 9 10 11
\hat{\sigma}_W^2 = \frac{(5-1) \times 9 + (5-1) \times 10 + (5-1) \times 11}{5 + 5 + 5 - 3} = 10
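The same pooled (within-samples) variance can be computed from the summary table with a small helper. This is a minimal sketch; the function name `pooled_variance` is hypothetical, and the sizes and variances are those listed for the three brands.

```python
def pooled_variance(sizes, variances):
    """Weighted average of sample variances with weights (n_i - 1)."""
    numerator = sum((n - 1) * s2 for n, s2 in zip(sizes, variances))
    return numerator / (sum(sizes) - len(sizes))     # divisor: sum(n_i) - k

sigma_w_sq = pooled_variance([5, 5, 5], [9, 10, 11])
print(sigma_w_sq)
```

Because the weights are (n_i − 1), the helper also covers groups of unequal size.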
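If the null hypothesis of equal means is true, then we can compute two
estimates of \sigma^2, namely
n\hat{\sigma}_B^2, where \hat{\sigma}_B^2 = \sum_i (\bar{y}_{i.} - \bar{y}_{..})^2 / (k - 1), and s_p^2, the pooled variance.
Therefore, the ratio \frac{n\hat{\sigma}_B^2}{s_p^2} has the F-distribution with degrees of freedom
(k − 1) and N − k, where n is the common sample size and N = nk is the total number of observations.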
H_0 : \mu_i = \mu for all i = 1, 2, …, k
H_1 : at least one equality is not satisfied
We reject H0 if the calculated value of F = \frac{n\hat{\sigma}_B^2}{s_p^2} exceeds the upper-α
critical value of the F-distribution with (k − 1) and N − k degrees of
freedom, where α is the significance level.
For both sets, the value of n𝜎ො𝐵2 is 101.67. However, for Set 1, 𝑠𝑝2 = 0.250
while for Set 2, 𝑠𝑝2 = 10.67. Thus for Set 1, F = 406.67 and for Set 2, F =
9.53.
This confirms that the relative magnitude of the two variances is the
important factor for detecting difference among means.
This variance (divisor (n − 1)), denoted by \hat{\sigma}_{\bar{B}}^2, is called the variance between
sample means. Since it is calculated using sample means, it is an estimate of
\sigma^2 / 5 (that is, \sigma^2 / n in general)
based upon (3 − 1) = 2 degrees of freedom, but only if the null hypothesis is true.
If H_0 is false, then the subsequent 'large' differences between the sample means will
result in 5\hat{\sigma}_{\bar{B}}^2 being an inflated estimate of \sigma^2.
Recall that the F-test requires the two variances to be independently distributed
(from independent samples). Although this is by no means obvious here (both
were calculated from the same data), \hat{\sigma}_W^2 and \hat{\sigma}_{\bar{B}}^2 are in fact independently
distributed.
The test is always one-sided, upper-tail, since if H_0 is false, 5\hat{\sigma}_{\bar{B}}^2 is inflated
whereas \hat{\sigma}_W^2 is unaffected.
o H_0 : \mu_i = \mu for all i = 1, 2, 3
o H_1 : \mu_i \neq \mu for some i = 1, 2, 3
o Degrees of freedom, \nu_1 = 2, \nu_2 = 12
o Test statistic is F = \frac{5\hat{\sigma}_{\bar{B}}^2}{\hat{\sigma}_W^2} = \frac{155}{10} = 15.5
This value does lie in the critical region. There is evidence, at the 1% significance
level, that the true mean lifetimes of the three brands of bulb do differ.
Note that F = \frac{MSB}{MSW} = \frac{155}{10} = 15.5 as previously.
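The whole one-way calculation for the light bulb data can be sketched as below. The function name `one_way_anova` is hypothetical; it computes the between-samples and within-samples sums of squares directly from the raw observations and allows unequal group sizes.

```python
def one_way_anova(groups):
    """Return (SSB, SSW, MSB, MSW, F) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # between samples: n_i * (group mean - grand mean)^2
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # within samples: squared deviations from each group's own mean
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)
    return ssb, ssw, msb, msw, msb / msw

brands = [
    [16, 15, 13, 21, 15],   # brand 1
    [18, 22, 20, 16, 24],   # brand 2
    [26, 31, 24, 30, 24],   # brand 3
]
ssb, ssw, msb, msw, f_stat = one_way_anova(brands)
print(ssb, ssw, msb, msw, f_stat)
```

For these data the sketch gives MSB = 155, MSW = 10 and F = 15.5, matching the values above; the F value still has to be compared with a tabulated critical value for (2, 12) degrees of freedom.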
Detergent
A B C D
77 74 73 76
81 66 78 85
61 58 57 77
76 69 64
69 63
Assuming all whiteness readings to be normally distributed with common variance, test
the hypothesis of no difference between the four brands as regards mean whiteness
readings after washing.
o H_1 : \mu_i \neq \mu for some i = 1, 2, 3, 4
o Degrees of freedom, \nu_1 = k − 1 = 3,
and \nu_2 = n − k = 17 − 4 = 13
\sum_i \sum_j y_{ij}^2 = 86362
SST = 86362 − \frac{1204^2}{17} = 1090.47
Total: sum of squares 1090.47, degrees of freedom 16
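The total sum of squares does not depend on which detergent a reading belongs to, so it can be checked from the 17 readings alone. The sketch below is illustrative; the function name `total_sum_of_squares` is hypothetical and the readings are listed row by row from the detergent table.

```python
def total_sum_of_squares(values):
    """Return (grand total, sum of squared values, SST)."""
    n = len(values)
    total = sum(values)
    sum_sq = sum(v * v for v in values)
    return total, sum_sq, sum_sq - total ** 2 / n    # SST = sum y^2 - T^2 / n

readings = [77, 74, 73, 76, 81, 66, 78, 85, 61, 58, 57, 77, 76, 69, 64, 69, 63]
total, sum_sq, sst = total_sum_of_squares(readings)
print(total, sum_sq, round(sst, 2))
```

The degrees of freedom for the total are n − 1 = 16.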
In (b), since all 5 programs are run on each compiler, differences between
programs should not affect the results. Indeed it may be advantageous to use 5
programs that differ markedly so that comparisons of compilation times are
more general. For this design, there are two factors; compiler (4 levels) and
program (5 levels). The factor of principal interest is compiler whereas the other
factor, program, may be considered as a blocking factor as it creates 5 blocks
each containing 4 copies of the same program.
Compiler
1 2 3 4
3. The effect of one factor is the same at all levels of the other
factor.
N(μij, σ²)
Assumption 3 is equivalent to stating that there is no interaction
between the two factors.
Stirring has no effect on sweetness if sugar is not added but certainly does have an
effect if sugar is added. Similarly, adding sugar has little effect on sweetness unless
the tea is stirred.
Interaction can only be assessed if more than one measurement is taken at each
combination of the factor levels. Since such situations are beyond the scope of this
text, it will always be assumed that interaction between the two factors does not
exist.
Notation, similar to that for the one factor case, is then as follows.
These lead to the following computational formulae which again are similar
to those for one-way ANOVA except that there is an additional sum of
squares, etc. for the second factor.
What are the degrees of freedom for SST , SSR and SSC when
there are 20 observations in a table of 5 rows and 4 columns?
What is the degrees of freedom of SSE ?
1 2 3 4
Program A 29.21 28.25 28.20 28.62
1 2 3 4 Row(totals) (𝑻𝑹𝒊 )
Program D 14 26 20 2 62
\sum x_{ij}^2 = 1757768
SST = 1757768 − \frac{4260^2}{20} = 850388
SSR = \frac{1}{4} (1428^2 + 398^2 + 2170^2 + 62^2 + 202^2) − \frac{4260^2}{20} = 830404
SSC = \frac{1}{5} (1260^2 + 985^2 + 1040^2 + 975^2) − \frac{4260^2}{20} = 10630
Total: sum of squares 850388, degrees of freedom 19
This value does not lie in the critical region. Thus there is no evidence, at
the 1% significance level, to suggest a difference in compilation times
between the four compilers.
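The two-way sums of squares and the F ratio for compilers can be recomputed from the totals quoted above. This is a sketch under stated assumptions: the function name `two_way_from_totals` is hypothetical, the row and column totals and the sum of squared values are the coded figures from the worked example, and no interaction term is modelled.

```python
def two_way_from_totals(sum_sq, row_totals, col_totals):
    """Two-way ANOVA (one observation per cell) from totals."""
    r, c = len(row_totals), len(col_totals)
    grand = sum(row_totals)
    if grand != sum(col_totals):
        raise ValueError("row and column totals disagree")
    correction = grand ** 2 / (r * c)                 # T^2 / n
    sst = sum_sq - correction
    ssr = sum(t * t for t in row_totals) / c - correction
    ssc = sum(t * t for t in col_totals) / r - correction
    sse = sst - ssr - ssc
    df = {"rows": r - 1, "cols": c - 1,
          "error": (r - 1) * (c - 1), "total": r * c - 1}
    f_cols = (ssc / df["cols"]) / (sse / df["error"])  # column factor vs error
    return sst, ssr, ssc, sse, df, f_cols

sst, ssr, ssc, sse, df, f_cols = two_way_from_totals(
    1757768,
    [1428, 398, 2170, 62, 202],      # program (row) totals
    [1260, 985, 1040, 975],          # compiler (column) totals
)
print(sst, ssr, ssc, sse, df, round(f_cols, 3))
```

With 5 rows and 4 columns the degrees of freedom come out as 4, 3, 12 and 19 for rows, columns, error and total, and the F ratio for compilers is about 4.55, which is then compared with the tabulated 1% critical value for (3, 12) degrees of freedom.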
Obtain the current prices of the items in three different shops; preferably a
small 'corner' shop, a small supermarket and a large supermarket or hyper
market.