Sst101-Lecture 3
Measures of central tendency are among the tools statisticians use most often, because they reduce the complexity of data and make data sets comparable. A whole set of data cannot be remembered, and analysing every individual value is impractical. To reduce this complexity and make the data comparable, we resort to averaging. The average must be representative of the whole data. The use of an average rests on the principle that, over a long period, the deviations of a large number of cases in one direction are generally offset by those in the other direction.
The average gives a single expression of the whole set of data. It is a value of the variable located in the middle of the distribution.
Definition
Any statistical measure which gives an idea about the position of the point around which other
observations cluster is called a measure of central tendency.
According to Prof. Yule the following are the characteristics to be satisfied by an ideal measure
of central tendency:
Let x_1, x_2, …, x_n denote the observed values and let x̄ denote their arithmetic mean. If the data are discrete (ungrouped), the arithmetic mean is defined as
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
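This is called the direct method of calculating the arithmetic mean. For a frequency distribution in which the value x_i occurs f_i times, the direct method gives
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i, \qquad N = \sum_{i=1}^{n} f_i$$
There is another method of calculating the arithmetic mean, called the indirect or short-cut method. In this method we make use of an assumed mean A, where A can be any value in the range of values taken by x_i. Using this method,
$$\bar{x} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i$$
where d_i = x_i − A. For ungrouped data every f_i = 1 and N = n.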
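As a quick check on the two methods, the following minimal Python sketch computes the mean of a frequency distribution both directly, x̄ = (1/N) Σ f_i x_i, and by the short-cut x̄ = A + (1/N) Σ f_i d_i. It uses Python's `fractions.Fraction` so the results are exact; the function names are illustrative, integer-valued data are assumed, and the data are the frequency table of Example 3.1.

```python
from fractions import Fraction

def mean_direct(xs, fs):
    """Direct method: x̄ = (1/N) * Σ f_i x_i, where N = Σ f_i."""
    N = sum(fs)
    return Fraction(sum(f * x for x, f in zip(xs, fs)), N)

def mean_shortcut(xs, fs, A):
    """Short-cut method: x̄ = A + (1/N) * Σ f_i d_i, where d_i = x_i - A."""
    N = sum(fs)
    return A + Fraction(sum(f * (x - A) for x, f in zip(xs, fs)), N)

# Frequency table of Example 3.1
xs = [1, 2, 3, 4, 5]
fs = [3, 5, 9, 6, 2]

print(mean_direct(xs, fs))          # 74/25
print(mean_shortcut(xs, fs, A=3))   # 74/25
# Any assumed mean A gives the same answer:
print(all(mean_shortcut(xs, fs, A) == mean_direct(xs, fs) for A in xs))  # True
```

Because the deviations are measured from an arbitrary origin, every choice of A returns the same mean (74/25 = 2.96); the short-cut method only makes the hand arithmetic smaller.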
Advantages of the Arithmetic Mean
3. It may fall at a point where there is no actual observation, e.g. an average of 15.3 eggs. Hence it may not be truly representative.
1. The sum of the deviations of the observations x_1, x_2, …, x_n from their arithmetic mean is equal to zero.
Proof:
$$\sum_{i=1}^{n}(x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n}\bar{x} = n\bar{x} - n\bar{x} = 0,$$
since $\sum_{i=1}^{n} x_i = n\bar{x}$.
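Property 1 can also be checked numerically. This small Python sketch (the data values are arbitrary illustrations, not taken from the lecture) computes the deviations from the mean with exact fractions and confirms that they sum to zero:

```python
from fractions import Fraction

def deviations_from_mean(xs):
    """Return the deviations x_i - x̄, computed exactly with Fraction."""
    mean = Fraction(sum(xs), len(xs))
    return [x - mean for x in xs]

data = [4, 6, 8, 9, 15]          # illustrative values only
devs = deviations_from_mean(data)
print(sum(devs))                 # 0
```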
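2. If z_i = x_i + y_i for i = 1, 2, …, n, and
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad \bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i,$$
then $\bar{z} = \bar{x} + \bar{y}$.
Proof:
By definition
$$\bar{z} = \frac{1}{n}\sum_{i=1}^{n} z_i = \frac{1}{n}\sum_{i=1}^{n}(x_i + y_i) = \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{1}{n}\sum_{i=1}^{n} y_i = \bar{x} + \bar{y},$$
since z_i = x_i + y_i.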
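A numerical illustration of property 2, using arbitrary illustrative values for x_i and y_i and a hypothetical helper `mean`:

```python
from fractions import Fraction

def mean(values):
    """Arithmetic mean x̄ = (1/n) Σ x_i, kept exact with Fraction."""
    return Fraction(sum(values), len(values))

xs = [2, 5, 7, 10]                      # illustrative values only
ys = [1, 3, 8, 4]
zs = [x + y for x, y in zip(xs, ys)]    # z_i = x_i + y_i

print(mean(xs), mean(ys), mean(zs))     # 6 4 10
print(mean(zs) == mean(xs) + mean(ys))  # True
```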
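3. If d_i = x_i − A, where A is any constant, then $\bar{x} = A + \frac{1}{n}\sum_{i=1}^{n} d_i$.
Proof:
By definition
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{n}\sum_{i=1}^{n}(d_i + A) = \frac{1}{n}\sum_{i=1}^{n} d_i + \frac{1}{n}\sum_{i=1}^{n} A = \frac{1}{n}\sum_{i=1}^{n} d_i + A,$$
since x_i = d_i + A and $\sum_{i=1}^{n} A = nA$.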
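4. If x̄_1 and x̄_2 are the means of two samples of sizes n_1 and n_2, then the combined mean is
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$
Proof:
By definition
$$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i} \quad\text{and}\quad \bar{x}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}.$$
The mean of the combined sample is thus given by
$$\bar{x} = \frac{1}{n_1 + n_2}\sum_{i=1}^{n_1 + n_2} x_i = \frac{1}{n_1 + n_2}\left\{\sum_{i=1}^{n_1} x_i + \sum_{i=n_1+1}^{n_1+n_2} x_i\right\} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2},$$
since the first n_1 observations of the combined sample sum to n_1 x̄_1 and the remaining n_2 observations sum to n_2 x̄_2.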
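The combined-mean formula can be checked against the mean of the pooled observations. In this sketch the two samples are illustrative values only, and `combined_mean` is a hypothetical helper name:

```python
from fractions import Fraction

def combined_mean(n1, mean1, n2, mean2):
    """Combined mean x̄ = (n1*x̄1 + n2*x̄2) / (n1 + n2)."""
    return (n1 * mean1 + n2 * mean2) / (n1 + n2)

sample1 = [3, 5, 7]                          # illustrative samples
sample2 = [10, 12, 14, 16]
m1 = Fraction(sum(sample1), len(sample1))    # 5
m2 = Fraction(sum(sample2), len(sample2))    # 13

pooled = sample1 + sample2
direct = Fraction(sum(pooled), len(pooled))  # mean of all 7 values

print(combined_mean(len(sample1), m1, len(sample2), m2))  # 67/7
print(direct)                                             # 67/7
```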
Example 3.1:
x 1 2 3 4 5
Frequency 3 5 9 6 2
Solution:
x frequency (f) fx
1 3 3
2 5 10
3 9 27
4 6 24
5 2 10
∑ f =25 ∑ f x=74
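(a) Direct method:
$$\bar{x} = \frac{\sum f x}{\sum f} = \frac{74}{25} = 2.96$$
(b) Short-cut method: taking the assumed mean A = 3, the deviations d = x − 3 are −2, −1, 0, 1, 2, so the products fd are −6, −5, 0, 6, 4 and
∑f = 25, ∑fd = −1.
The arithmetic mean is given by
$$\bar{x} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i = 3 - \frac{1}{25} = 2.96$$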
Example 3.2:
Compute the arithmetic mean from the following grouped data:
Weight frequency
6.5 – 7.5 5
7.5 – 8.5 12
8.5 – 9.5 25
9.5 – 10.5 48
10.5 – 11.5 32
11.5 – 12.5 6
12.5 – 13.5 1
Solution:
Let the assumed mean A=10
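The solution above stops after choosing A = 10. The sketch below shows how the short-cut method might be completed, assuming (as is usual for grouped data, though it is not stated here) that each class is represented by its mid-point, so the mid-points are 7, 8, …, 13 and d_i = m_i − 10. Under that assumption the mean comes out as 1273/129 ≈ 9.87. The function name is illustrative.

```python
from fractions import Fraction

def grouped_mean_shortcut(classes, freqs, A):
    """Short-cut method for grouped data, representing each class by its
    mid-point m_i: x̄ = A + (1/N) Σ f_i d_i, with d_i = m_i - A."""
    N = sum(freqs)
    mids = [(lo + hi) / 2 for lo, hi in classes]
    total = sum(f * (m - A) for f, m in zip(freqs, mids))   # Σ f_i d_i
    return A + total / N

# Example 3.2: boundaries 6.5, 7.5, ..., 13.5, kept exact with Fraction
bounds = [Fraction(13, 2) + k for k in range(8)]
classes = list(zip(bounds, bounds[1:]))
freqs = [5, 12, 25, 48, 32, 6, 1]

xbar = grouped_mean_shortcut(classes, freqs, A=10)
print(xbar, round(float(xbar), 2))   # 1273/129 9.87
```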
The geometric mean is the nth root of the product of all the observations comprising a group of items of a series. It is widely used in the calculation of index numbers. It is calculated by multiplying the values of the items and then taking the root of the product corresponding to the number of items. If one of the items is zero, the geometric mean of the items cannot be calculated.
Definition
If x_1, x_2, …, x_n are n observations, then the geometric mean (G.M.) is given by
$$G.M. = \sqrt[n]{x_1 x_2 \cdots x_n}$$
Thus, taking logarithms,
$$G.M. = \text{Antilog}\left(\frac{1}{n}\sum_{i=1}^{n}\log x_i\right)$$
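In the case of grouped data, if x_1, x_2, …, x_n are the observations with f_1, f_2, …, f_n as the corresponding frequencies, then
$$G.M. = \sqrt[N]{\underbrace{x_1 \cdots x_1}_{f_1\text{ times}} \cdot \underbrace{x_2 \cdots x_2}_{f_2\text{ times}} \cdots \underbrace{x_n \cdots x_n}_{f_n\text{ times}}} = \sqrt[N]{x_1^{f_1}\, x_2^{f_2} \cdots x_n^{f_n}}, \qquad N = \sum_{i=1}^{n} f_i$$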
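A minimal Python sketch of the logarithmic form of the geometric mean, G.M. = Antilog((1/N) Σ f_i log x_i), covering both the ungrouped case (all f_i = 1) and the grouped case. The function name is illustrative; it raises an error for zero or negative values because their logarithms are undefined.

```python
import math

def geometric_mean(xs, fs=None):
    """G.M. = Antilog((1/N) Σ f_i log10 x_i), with N = Σ f_i.

    With fs omitted every f_i is 1, giving the ungrouped formula.
    Logarithms require strictly positive observations."""
    if fs is None:
        fs = [1] * len(xs)
    if any(x <= 0 for x in xs):
        raise ValueError("geometric mean needs strictly positive values")
    N = sum(fs)
    mean_log = sum(f * math.log10(x) for x, f in zip(xs, fs)) / N
    return 10 ** mean_log            # antilog to base 10

# Example 3.3: the 4th root of 4 * 6 * 8 * 9 = 1728
print(round(geometric_mean([4, 6, 8, 9]), 2))   # 6.45
# Grouped form: value 2 twice and value 4 once, i.e. the cube root of 2*2*4 = 16
print(round(geometric_mean([2, 4], [2, 1]), 4))
```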
Example 3.3:
Find the geometric mean of the numbers 4,6,8,9.
Solution:
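By definition, $G.M. = \sqrt[n]{x_1 x_2 \cdots x_n}$. Here n = 4. Therefore
$$G.M. = \sqrt[4]{4 \times 6 \times 8 \times 9} = \sqrt[4]{1728}$$
Using logarithms, log G.M. = 0.25 × log(1728) = 0.25 × 3.2375 = 0.8094, so G.M. = Antilog(0.8094) ≈ 6.45.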
Example 3.4:
Solution:
= Antilog (1.350)
=22.39
Limitations
The geometric mean is;
1) not determinable if any of the items is zero or negative.
2) difficult to calculate and understand.
3) may not be identical with any of the items under review and therefore may not be
representative.
to give the greatest weight to the smallest items. It is rarely applied, except in special cases such as averaging rates and times.
If x_1, x_2, …, x_n are n observations, then we define the harmonic mean as
$$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{1}{\frac{1}{n}\sum_{i=1}^{n}\frac{1}{x_i}}$$
For grouped data, if x_1, x_2, …, x_n are the observations with f_1, f_2, …, f_n as the corresponding frequencies, then
$$H.M. = \frac{1}{\frac{1}{N}\sum_{i=1}^{n}\frac{f_i}{x_i}}, \qquad N = \sum_{i=1}^{n} f_i$$
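The ungrouped and grouped formulas can be sketched in Python with exact fractions as follows; with no frequencies supplied, every f_i is taken as 1 so the function reduces to the ungrouped definition. The name `harmonic_mean` is illustrative (Python's standard `statistics` module also provides a `harmonic_mean` function, which is not used here).

```python
from fractions import Fraction

def harmonic_mean(xs, fs=None):
    """H.M. = N / Σ (f_i / x_i), with N = Σ f_i (every f_i = 1 for ungrouped data).

    Equivalent to 1 / ((1/N) Σ f_i / x_i); no x_i may be zero."""
    if fs is None:
        fs = [1] * len(xs)
    N = sum(fs)
    reciprocal_sum = sum(Fraction(f) / Fraction(x) for x, f in zip(xs, fs))
    return N / reciprocal_sum

print(harmonic_mean([100, 200, 300, 400]))   # 192
print(harmonic_mean([2, 4], [1, 1]))         # 8/3
```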
Example 3.5:
An airplane flies around a square whose sides each measure 100 miles. It covers the first side at a speed of 100 miles per hour, the second side at 200 mph, the third side at 300 mph and the fourth side at 400 mph. What is the average speed?
Solution
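Since every side has the same length, the average speed is the harmonic mean of the four speeds:
$$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{4}{\frac{1}{100} + \frac{1}{200} + \frac{1}{300} + \frac{1}{400}} = \frac{4}{25/1200} = 192 \text{ mph}$$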
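Why the harmonic mean is the right average here can be seen by computing total distance over total time directly. This sketch uses only the figures stated in the example:

```python
from fractions import Fraction

side = 100                        # miles per side of the square
speeds = [100, 200, 300, 400]     # mph on sides 1 to 4

total_distance = side * len(speeds)                    # 400 miles
total_time = sum(Fraction(side, v) for v in speeds)    # hours
average_speed = total_distance / total_time

print(total_time)                  # 25/12
print(average_speed)               # 192
print(sum(speeds) / len(speeds))   # 250.0
```

The arithmetic mean of the speeds (250 mph) overstates the true average because the airplane spends more time on the slower sides.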
Example 3.6:
Find the harmonic mean for the following distribution:
Solution:
Brief summary of overall task: Watch the video on how to compute the arithmetic, geometric and harmonic means.
https://www.youtube.com/watch?v=n7V_rZMVnpY
https://www.youtube.com/watch?v=PKWVAIP17pw
https://www.youtube.com/watch?v=vglvKEroFhg
Spark:
$$d_i = x_i - A$$
$$G.M. = \text{Antilog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \log x_i\right)$$
$$H.M. = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} \quad\text{or}\quad H.M. = \frac{1}{\frac{1}{N}\sum_{i=1}^{n}\frac{f_i}{x_i}}$$
$$Q_k = L_1 + \frac{\left(\frac{kN}{4} - C\right)h}{f}, \qquad k = 1, 2, 3$$
where
L_1 = lower limit of the kth quartile class
N = total frequency
f = frequency of the quartile class
C = cumulative frequency of the class preceding the quartile class
h = width of the quartile class
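A minimal Python sketch of the quartile interpolation formula. `interpolate_quartile` evaluates Q_k from L_1, C, f and h directly; `grouped_quartile` additionally locates the quartile class as the first class whose cumulative frequency reaches kN/4. Both names are illustrative, and equal class widths are assumed. The first calls reuse the values quoted in Example 3.8; the last call applies the method to the weight table of Example 3.2, taking class lower limits 6.5, 7.5, …, 12.5.

```python
def interpolate_quartile(k, N, L1, C, f, h):
    """Q_k = L1 + (k*N/4 - C) * h / f."""
    return L1 + (k * N / 4 - C) * h / f

def grouped_quartile(k, lower_limits, freqs, h):
    """Locate the kth quartile class (the first class whose cumulative
    frequency reaches k*N/4) and interpolate within it.
    Assumes equal class widths h and classes in increasing order."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    N = sum(freqs)
    target = k * N / 4
    cum = 0                      # cumulative frequency before the current class
    for L1, f in zip(lower_limits, freqs):
        if cum + f >= target:
            return interpolate_quartile(k, N, L1, cum, f, h)
        cum += f
    raise ValueError("empty frequency table")

# Values quoted in Example 3.8 (N = 76, h = 10)
print(interpolate_quartile(2, 76, L1=30, C=24, f=32, h=10))   # 34.375
print(interpolate_quartile(1, 76, L1=20, C=12, f=12, h=10))
print(interpolate_quartile(3, 76, L1=40, C=56, f=20, h=10))   # 40.5

# Median of the Example 3.2 weights (lower limits 6.5, 7.5, ..., 12.5)
lowers = [6.5 + i for i in range(7)]
print(grouped_quartile(2, lowers, [5, 12, 25, 48, 32, 6, 1], h=1))
```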
(ii) Median
This is the second quartile, i.e. the case k = 2. It may be defined as the middlemost or central value of the variable when the values are arranged in increasing order of magnitude. In the case of grouped data, the median may be defined as the value of the variable that divides the area under the frequency curve into two equal parts.
Advantages of Quartiles
The quartiles
Limitations
The quartiles
1. are not amenable to further algebraic manipulation
2. require the data to be arranged in ascending or descending order of magnitude, which involves additional work.
3. are erratic if the number of items is small.
Example 3.7:
Using the data below compute the quartiles and the median.
Variable 5 7 9 11 13 15 17 19
frequency 1 2 7 9 11 8 5 4
Solution:
We first calculate the cumulative frequency in order to determine the value of the quartiles.
Variable (x)   Frequency (f)   Cumulative frequency (c.f.)
5              1               1
7              2               3
9              7               10
11             9               19
13             11              30
15             8               38
17             5               43
19             4               47
First quartile: Q_1 = size of the (N/4)th item = (47/4)th item = 11.75th item.
The first cumulative frequency that reaches 11.75 is c.f. = 19, where x = 11. Hence the first quartile Q_1 = 11.
Second quartile (median): Q_2 = size of the (N/2)th item = (47/2)th item = 23.5th item.
The first cumulative frequency that reaches 23.5 is c.f. = 30, where x = 13. Hence the second quartile Q_2 = 13.
Third quartile: Q_3 = size of the (3N/4)th item = (47 × 3/4)th item = 35.25th item.
The first cumulative frequency that reaches 35.25 is c.f. = 38, where x = 15. Hence the third quartile Q_3 = 15.
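The positional rule used in Example 3.7 (take the first value whose cumulative frequency reaches kN/4) can be sketched in Python as follows. `discrete_quartile` is an illustrative name, and the N/4 convention follows this lecture; some texts use (N + 1)/4 instead.

```python
from itertools import accumulate

def discrete_quartile(k, xs, fs):
    """Size of the (k*N/4)th item in a discrete frequency distribution:
    the first value whose cumulative frequency reaches k*N/4."""
    N = sum(fs)
    position = k * N / 4
    for x, cf in zip(xs, accumulate(fs)):
        if cf >= position:
            return x
    raise ValueError("k*N/4 exceeds the total frequency")

# Example 3.7
xs = [5, 7, 9, 11, 13, 15, 17, 19]
fs = [1, 2, 7, 9, 11, 8, 5, 4]
print([discrete_quartile(k, xs, fs) for k in (1, 2, 3)])   # [11, 13, 15]
```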
Example 3.8:
Find the median and the quartiles of the marks obtained by 76 students, given below.
Solution:
The median Q_2 = size of the (N/2)th item = (76/2)th item = 38th item.
This item lies in the class interval 30-40, whose cumulative frequency is c.f. = 56.
Applying the interpolation formula
$$M = L_1 + \frac{\left(\frac{N}{2} - C\right)h}{f}$$
Here L_1 = 30, f = 32, N/2 = 38, C = 24 and h = 10. Substituting these values in the formula above we get
$$M = 30 + \frac{(38 - 24) \times 10}{32} = 34.375 \text{ marks}$$
The first quartile Q_1 = size of the (N/4)th item = (76/4)th item = 19th item.
This item lies in the class interval 20-30, whose cumulative frequency is c.f. = 24.
Applying the interpolation formula
$$Q_1 = L_1 + \frac{\left(\frac{N}{4} - C\right)h}{f}$$
Here h = 10, L_1 = 20, f = 12, N/4 = 19 and C = 12. Substituting these values in the formula above we get
$$Q_1 = 20 + \frac{(19 - 12) \times 10}{12} = 25.83 \text{ marks}$$
The third quartile Q_3 = size of the (3N/4)th item = (76 × 3/4)th item = 57th item.
This item lies in the class interval 40-50; the cumulative frequency up to the preceding class is 56, so the c.f. of 40-50 is 56 + 20 = 76.
Applying the interpolation formula
$$Q_3 = L_1 + \frac{\left(\frac{3N}{4} - C\right)h}{f}$$
Here h = 10, L_1 = 40, f = 20, 3N/4 = 57 and C = 56. Substituting these values in the formula above we get
$$Q_3 = 40 + \frac{(57 - 56) \times 10}{20} = 40.5 \text{ marks}$$
(iii) Mode
In a given set of data, a certain item is often found to occur more frequently than any other, and this predominant item can easily be located. The value of the item that is most common is known as the mode; that is, the mode is the value that occurs most frequently. It follows that if items are selected at random, the item most likely to occur is the modal value. In the case of a discrete frequency distribution, the mode is the value of the variable corresponding to the maximum frequency. In the case of continuous data, the mode is given by the following interpolation formula:
$$\text{Mode} = L_1 + \frac{(f_m - f_1)h}{2f_m - f_1 - f_2}$$
where L_1 is the lower limit of the modal class, f_m is the frequency of the modal class, f_1 and f_2 are the frequencies of the classes immediately preceding and following it, and h is the class width.
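A sketch of the interpolation formula in Python. `interpolated_mode` evaluates the formula from L_1, f_m, f_1, f_2 and h; `grouped_mode` picks the class with the largest frequency as the modal class, which is only appropriate for a unimodal table (bimodal data need the method of grouping). Both names are illustrative, and treating a missing neighbouring class at either end of the table as having frequency 0 is an assumption of this sketch. The first call uses the figures from Example 3.11; the second applies the function to the weight table of Example 3.2.

```python
def interpolated_mode(L1, fm, f1, f2, h):
    """Mode = L1 + (fm - f1) * h / (2*fm - f1 - f2)."""
    return L1 + (fm - f1) * h / (2 * fm - f1 - f2)

def grouped_mode(lower_limits, freqs, h):
    """Take the class with the largest frequency as the modal class
    (only safe for a unimodal table) and interpolate. A missing
    neighbouring class is treated as frequency 0 (an assumption)."""
    i = max(range(len(freqs)), key=freqs.__getitem__)
    f1 = freqs[i - 1] if i > 0 else 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0
    return interpolated_mode(lower_limits[i], freqs[i], f1, f2, h)

# Figures used in Example 3.11
print(round(interpolated_mode(30, fm=37, f1=21, f2=31, h=10), 2))   # 37.27

# Example 3.2 weights: modal class 9.5-10.5
lowers = [6.5 + i for i in range(7)]
print(grouped_mode(lowers, [5, 12, 25, 48, 32, 6, 1], h=1))
```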
Solution:
If we locate the mode by inspection, we find that the values 7 and 10 both have the maximum frequency (9), so we cannot determine whether the mode is 7 or 10. This is a bimodal distribution. We can determine the mode of this distribution using the method of grouping.
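Variable  Frequency   I    II   III   IV    V
3         5           9         15
4         4                10         18
5         6           14                    23
6         8                17   24
7         9           16              21
8         7                12               21
9         5           14        18
10        9                13
11        4
(Each group total is written on the row of the first value in its group; e.g. the first entry in column I, 9, is 5 + 4 for the values 3 and 4.)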
Procedure:
In column I the frequencies are added in pairs. In column II we leave out the first item and add the rest in pairs. In column III the items are added in threes, and in column IV the first item is left out and the rest are added in threes. In column V the first two items are left out and the rest are added in threes. The maximum total in each column is picked out in the table below:
Column   Maximum total   Values combined
I        16              7, 8
II       17              6, 7
III      24              6, 7, 8
IV       21              7, 8, 9
V        23              5, 6, 7
We then construct a frequency table by counting the number of times each value occurs in the combinations above:
Frequency Table
Variable    5   6   7   8   9
Frequency   1   3   5   3   1
Since the value 7 has the maximum frequency, 7 is the mode. In the case of grouped data, we locate the modal class using the method of grouping and then apply the interpolation formula.
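The method of grouping lends itself to a short program. This Python sketch follows the five columns of the procedure, counts how often each value appears in a maximal group, and returns the most frequently counted value. When a column has two equal maximal totals the sketch counts both groups; ties are not covered in the lecture, so that rule is an assumption. `mode_by_grouping` is an illustrative name, and the data are the frequencies of the bimodal example above.

```python
from collections import Counter

def mode_by_grouping(values, freqs):
    """Method of grouping with five columns:
    I: pairs from the first item, II: pairs after leaving out one item,
    III: threes from the first item, IV: threes after leaving out one item,
    V: threes after leaving out two items. Each value in a maximal group
    is counted once per column; the most counted value is the mode."""
    n = len(freqs)
    columns = [(2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]   # (group size, items left out)
    counts = Counter()
    for size, skip in columns:
        groups = [(sum(freqs[s:s + size]), s) for s in range(skip, n - size + 1, size)]
        best = max(total for total, _ in groups)
        for total, s in groups:
            if total == best:            # ties: count every maximal group (assumption)
                counts.update(values[s:s + size])
    return max(counts, key=counts.get), counts

values = [3, 4, 5, 6, 7, 8, 9, 10, 11]
freqs = [5, 4, 6, 8, 9, 7, 5, 9, 4]
mode, counts = mode_by_grouping(values, freqs)
print(mode)                      # 7
print(sorted(counts.items()))    # [(5, 1), (6, 3), (7, 5), (8, 3), (9, 1)]
```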
Example 3.11:
Find the mode from the following data
Solution:
Since this is a unimodal distribution, we see that the modal class is 30-40. We can now use the interpolation formula:
$$\text{Mode} = L_1 + \frac{(f_m - f_1)h}{2f_m - f_1 - f_2} = 30 + \frac{(37 - 21) \times 10}{74 - 21 - 31} = 37.27 \text{ marks}$$
3.2.2.2 E-Tivity: Computation of the main positional averages, i.e. the median, mode and quartiles
Numbering, pacing and sequencing: 3.2.2.2
Title: Computation of the main positional averages, i.e. the median, mode and quartiles
Brief summary of overall task: Watch the video on how to compute the quartiles, median and mode.
https://www.youtube.com/watch?v=9zV3H-Dh0Sk
https://www.youtube.com/watch?v=heb3JvwSQZ4
Spark:
$$Q_k = L_1 + \frac{\left(\frac{kN}{4} - C\right)h}{f}$$
The median is the case k = 2.
$$\text{Mode} = L_1 + \frac{(f_m - f_1)h}{2f_m - f_1 - f_2}$$
Individual contribution:
1. Below is the frequency distribution that resulted when the weights (in kg) of 50 calves in a dairy farm were measured.
Weight (kg)  170  172.5  175  177.5  180  182.5  185  187.5  190  192.5  195
Frequency    1    2      4    6      8    9      7    6      3    2      2
Find:
a) the mode
b) the median
c) the interquartile range
netiquette in mind
E-moderator's interventions:
Focusing group discussion
Encouraging lurkers (quiet ones) to contribute
Providing feedback/teaching points
Closing the discussion
Age group 80-89 70-79 60-69 50-59 40-49 30-39 20-29 10-19
Frequency 2 2 6 20 56 40 42 32
Calculate the
(i) Arithmetic mean
(ii) Harmonic mean
(iii) Geometric mean
2) Find the average mark of the students from the following frequency table:
6) When the number of errors per page made by a copy typist was checked, the frequency distribution was as summarised below.
Number of errors per page   0   1   2   3   4   5   6   7   8
Frequency                   4  15  27  20  18  10   4   1   1
Find:
a) the mode
b) the median
c) the upper quartile
7) The grouped frequency distribution shown below gives the results of an IQ test taken by a group of 50 students.
IQ test marks  90-94  95-99  100-104  105-109  110-114  115-119  120-124  125-129
Frequency      2      7      9        14       9        4        3        2
Weight 60-69 70-79 80 -89 90-99 100-109 110-119 120-129 130-139 140-149
Boys 1 9 24 28 15 11 7 3 1
3.4 References