Statistics
Contents
1 Statistics
  1.1 Introduction
    1.1.1 Variable and Data
    1.1.2 Frequency Distribution
  1.2 Measures of Central Tendency
    1.2.1 Arithmetic Mean
    1.2.2 Median
    1.2.3 Mode
  1.3 Measures of Position
    1.3.1 Quartile
    1.3.2 Percentile
    1.3.3 Decile
  1.4 Measures of Dispersion
Dar es Salaam Institute of Technology-DIT 2022 By Mr. Mbitila A.S +255754629262
Symbol          Description                             Symbol       Description
$\mathbb{C}$    Complex numbers                         $=$          is equal to
$\mathbb{I}$    Imaginary numbers                       $\equiv$     is equivalent to
$\mathbb{N}$    Natural numbers                         $\approx$    is approximately equal to
$\mathbb{R}$    Real numbers                            $>$          is greater than
$\mathbb{R}^+$  Positive real numbers excluding zero    $<$          is less than
$\mathbb{Q}$    Rational numbers                        $\geq$       is greater than or equal to
$\mathbb{W}$    Whole numbers                           $\leq$       is less than or equal to
$\mathbb{Z}$    Integers                                $:$          such that
$\mathbb{Z}^+$  Positive integers excluding zero        $\in$        is an element of
                                                        $\exists$    There exist
Statistics
Objectives
After completing this chapter, you should be able to:
• Differentiate between the two branches of statistics.
• Identify types of data.
1.1 Introduction
Broadly speaking, applied statistics can be divided into two areas: descriptive statistics and
inferential statistics.
Descriptive Statistics Descriptive statistics consists of methods for organizing, displaying, and
Inferential Statistics Inferential statistics consists of methods that use sample results to help
To gain knowledge about seemingly haphazard situations, statisticians collect information for
Definition 1.2. A variable is a characteristic or attribute that can assume different values.
There are two types of variables: qualitative and quantitative. Qualitative variables are
variables that can be placed into distinct categories, according to some characteristic or attribute.
For example, if subjects are classified according to gender (male or female), then the variable
gender is qualitative. Quantitative variables are numerical and can be ordered or ranked. For
example, the variable age is numerical, and people can be ranked in order according to the
value of their ages. Quantitative variables can be further classified into two groups: discrete
and continuous.
Definition 1.3. Data are the values, measurements or observations that the variables
can assume.
A collection of data values forms a data set. Each value in the data set is called a data value or
a datum. Data may be discrete or continuous.
Definition 1.4. Discrete data assume values that can be counted, and they form a discrete
variable.
Examples of discrete data are the number of children in a family, the number of students in
a classroom, the number of cars in a workshop and so on. It is impractical to represent a
number of people with fractions or decimals, and the same applies to a number of cars: a fraction
of a car is practically not a car but spare parts of a car.
Definition 1.5. Continuous data assume an infinite number of values between any two
specific values. They form a continuous variable.
Continuous data are obtained by measuring. They often include fractions and decimals. Temperature,
for example, is a continuous variable, since the variable can assume an infinite number of values
between any two given temperatures. Other examples of continuous data are heights, weights,
and masses.
Variable information may be represented either as raw data or as grouped data. Raw data are collected
data which have not been organized. If raw data are arranged in either ascending or descending
order, the result is called an array. There are two types of frequency distributions: ungrouped and
grouped.
If the range of the data is small, say less than 10, the data may be grouped under single scores with
their respective frequencies.
From the frequency distribution table, we can generate the cumulative frequency distribution.
The table which contains scores and cumulative frequencies is known as the cumulative frequency
distribution. The sum of the frequency of a class and of all classes below it is called the cumulative
frequency. Hereunder is the procedure on how to obtain the cumulative frequency and generate
the distribution table of it.
From the procedure above, we can now establish the cumulative frequency distribution table
below.
Score (x)    Cumulative Frequency (F)
6            3
13           5
15           13
20           17
22           29
28           34
By using the procedure in Table 1.1.1, we deduce the formal definition of cumulative frequency
below.
Definition 1.6. Cumulative frequency of a class is the sum of the frequency of that class and
the frequencies of all classes preceding it.
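The running-total idea behind the cumulative frequency table can be sketched in Python. This is a minimal illustration, not the author's procedure: the individual frequencies are an assumption, recovered by differencing the cumulative column of the table above (3, 5, 13, 17, 29, 34), and the function name is hypothetical.

```python
from itertools import accumulate

# Scores from the cumulative frequency table above.
scores = [6, 13, 15, 20, 22, 28]
# Assumed frequencies: obtained by differencing the cumulative column.
frequencies = [3, 2, 8, 4, 12, 5]


def cumulative_frequencies(freqs):
    """Return the running totals of a list of frequencies."""
    return list(accumulate(freqs))


cumulative = cumulative_frequencies(frequencies)
for score, cum in zip(scores, cumulative):
    print(f"{score:>5} {cum:>5}")
```

The last cumulative frequency always equals the total number of observations, which gives a quick check on the table.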
Example 1.1. The numbers below are the weights in kilograms of rice bags packed
for sale in one local area located in Mbeya region.
21 23 23 22 21 23 20 22
25 14 20 23 22 20 20 24
25 21 22 20 24 22 22 20
22 23 21 25 25 24 23 22
Solution: By cancelling each number and tallying it against its value, we obtain the
frequency distribution below.
Weight (x)    Tally        Frequency
14            /            1
20            ///// /      6
21            ////         4
22            ///// ///    8
23            ///// /      6
24            ///          3
25            ////         4
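The tallying in Example 1.1 can be reproduced programmatically. The following is a minimal Python sketch using the standard library `collections.Counter`; the function name `frequency_table` is an illustrative choice, not something defined in the text.

```python
from collections import Counter

# Weights (kg) of the 32 rice bags from Example 1.1.
weights = [
    21, 23, 23, 22, 21, 23, 20, 22,
    25, 14, 20, 23, 22, 20, 20, 24,
    25, 21, 22, 20, 24, 22, 22, 20,
    22, 23, 21, 25, 25, 24, 23, 22,
]


def frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(values).items())


table = frequency_table(weights)
for value, freq in table:
    # One stroke per observation stands in for the hand tally.
    print(f"{value:>3}  {'/' * freq:<10} {freq}")
```

Summing the frequency column should return the number of raw observations (32 here), which is a useful check after tallying by hand.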
For data with a large range, the scores within a certain interval may be grouped into classes
with their respective frequencies. Class intervals may be either inclusive or exclusive.
In inclusive class intervals, the upper class limit of the preceding class interval and the lower
class limit of the succeeding class interval are different. The inclusive type of class interval is
used in the case of discrete data.
For example, the raw data set x below has a wide range, so we need to organize it in
inclusive classes.
48 47 42 67 73 50 76 47 44 44
57 58 54 45 58 56 66 67 45 43
71 48 64 52 42 54 62 32 49 34
35 46 89 37 47 54 45 60 64 44
Looking at the raw data, little information can be obtained; hence you need to organize
the data into grouped data in the so-called frequency distribution table, as shown in Table
1.1.2. Hereunder is the procedure to obtain the frequency distribution table of raw data:
Class (x)    Tally               Frequency
32-39        ////                4
40-47        ///// ///// ///     13
48-55        ///// ///           8
56-63        ///// //            7
64-71        /////               5
72-79        //                  2
80-87                            0
88-95        /                   1
The information above can be expressed in a more compact form, as shown in the table below.
Class (x)        32-39   40-47   48-55   56-63   64-71   72-79   80-87   88-95
Frequency (f)    4       13      8       7       5       2       0       1
The frequency distribution table above can be written in the form of inequalities, as shown
below.
Class (x)               Frequency (f)
$32 \leq x \leq 39$     4
$40 \leq x \leq 47$     13
$48 \leq x \leq 55$     8
$56 \leq x \leq 63$     7
$64 \leq x \leq 71$     5
$72 \leq x \leq 79$     2
$80 \leq x \leq 87$     0
$88 \leq x \leq 95$     1
Definition 1.7. Frequency distribution is the form of raw data summarized by distributing
The table which shows data classes and their corresponding frequencies is known as a
frequency distribution table, for example Table 1.1.2 above. The class or class interval is
bounded by two values, the lower and upper limits respectively; these limits are sometimes
known as apparent limits. For example, if we take the class interval 40-47 in Table 1.1.2,
the number 40 is called the lower limit and the number 47 is called the upper limit of the
class interval. In general, we can consider the frequency distribution in Table 1.1.3 below,
where $L_i$ and $U_i$ are the lower and upper limits respectively, for $i = 1, 2, 3, \cdots, n$.
Table 1.1.3
Class        $L_1$-$U_1$    $L_2$-$U_2$    $L_3$-$U_3$    $\cdots$    $L_n$-$U_n$
Frequency    $f_1$          $f_2$          $f_3$          $\cdots$    $f_n$
To be more specific, let us consider a class interval $L_i$-$U_i$ as shown in Figure 1.1.1 below.
The class mark or mid point of a class is the average of its upper and lower class limits.
That is,
$$x_i = \tfrac{1}{2}(L_i + U_i), \tag{1.1.1}$$
where $x_i$ is the class mark, and $L_i$ and $U_i$ are the lower and upper class limits respectively.
Figure 1.1.1
The lower class boundary or real lower limit of a class interval is the number $L$ defined by
$$L = \tfrac{1}{2}(U_{i-1} + L_i), \tag{1.1.2}$$
and the upper class boundary or real upper limit of a class interval is the number $U$ given
by
$$U = \tfrac{1}{2}(U_i + L_{i+1}). \tag{1.1.3}$$
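Example 1.2. Consider the data from Table 1.1.2. Taking the class interval 56-63,
find:
(a) The lower and upper limits of the class interval
(b) The class mark of the class interval
(c) The lower and upper class boundaries of the class interval
(d) The class size of the class interval
Solution:
(a) The lower and upper limits of the class are 56 and 63 respectively.
(b) The class mark or mid point $x$ of the class interval is given by
$$x = \tfrac{1}{2}(L_i + U_i) \Rightarrow x = \tfrac{1}{2}(56 + 63) = \tfrac{1}{2}(119) = 59.5.$$
(c) The lower class boundary of the class interval is given by
$$L = \tfrac{1}{2}(U_{i-1} + L_i) \Rightarrow L = \tfrac{1}{2}(55 + 56) = \tfrac{1}{2}(111) = 55.5,$$
and the upper class boundary of the class interval is given by
$$U = \tfrac{1}{2}(U_i + L_{i+1}) \Rightarrow U = \tfrac{1}{2}(63 + 64) = \tfrac{1}{2}(127) = 63.5.$$
(d) The class size of the class interval is given by
$$c = U - L \Rightarrow c = 63.5 - 55.5 = 8.$$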
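The quantities in Example 1.2 follow directly from the class limits, so they are easy to check numerically. Below is a minimal Python sketch; the function names are illustrative, and the neighbouring limits 55 and 64 are taken from the classes 48-55 and 64-71 of Table 1.1.2.

```python
def class_mark(lower, upper):
    """Mid point of a class: the average of its lower and upper limits."""
    return (lower + upper) / 2


def class_boundaries(prev_upper, lower, upper, next_lower):
    """Lower and upper class boundaries of an inclusive class interval.

    prev_upper is the upper limit of the preceding class and
    next_lower is the lower limit of the succeeding class.
    """
    lower_boundary = (prev_upper + lower) / 2
    upper_boundary = (upper + next_lower) / 2
    return lower_boundary, upper_boundary


# Class 56-63, with neighbours 48-55 and 64-71.
mark = class_mark(56, 63)
lower_b, upper_b = class_boundaries(55, 56, 63, 64)
size = upper_b - lower_b

print(mark, lower_b, upper_b, size)
```

For the first or last class of a table there is no neighbour on one side; a common convention, assumed here rather than stated in the text, is to extend the table by one imaginary class of the same width.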
Example 1.3. Consider the frequency distribution table of data given below.
Class        10.0-19.5    20.0-29.5    30.0-39.5    40.0-49.5
Frequency    12           8            10           14
Find the lower and upper class boundaries of the class interval 30.0-39.5; hence determine
the class size of the class interval.
Solution: To obtain the lower and upper class boundaries of the class interval 30.0-39.5,
consider part of the data as shown in the diagram below.
The lower class boundary of the class interval 30.0-39.5 is given by
$$L = \tfrac{1}{2}(U_{i-1} + L_i) \Rightarrow L = \tfrac{1}{2}(29.5 + 30) = \tfrac{1}{2}(59.5) = 29.75,$$
and the upper class boundary of the class interval 30.0-39.5 is given by
$$U = \tfrac{1}{2}(U_i + L_{i+1}) \Rightarrow U = \tfrac{1}{2}(39.5 + 40) = \tfrac{1}{2}(79.5) = 39.75.$$
The class size of the class interval is given by
$$c = U - L \Rightarrow c = 39.75 - 29.75 = 10.$$
Exclusive class intervals contain values less than the upper class limit, that is, the upper
class limit is excluded, or contain values greater than the lower class limit, that is, the
lower class limit is excluded. The exclusive type of class interval is used in the case of
continuous data. As a matter of generalization, from Figure 1.1.1, the upper class limit
of the preceding class interval and the lower class limit of the succeeding class interval are the same,
so $L_i = U_{i-1}$ and $U_i = L_{i+1}$. Hence the lower class boundary of the class interval $L_i$-$U_i$ is
given by $L = L_i$ or $L = U_{i-1}$, and the upper class boundary of the class interval is given by
$U = U_i$ or $U = L_{i+1}$. For example, the classes 30-40, 40-50, 50-60, $\cdots$ are exclusive class
intervals which refer to continuous data, where the lower and upper class boundaries of
the class 40-50 are 40 and 50 respectively. The class size or class interval size is the
value $c$, defined as the difference between the upper and lower class boundaries; hence it is
given by
$$c = U - L. \tag{1.1.4}$$
More details of exclusive frequency distributions are given in Tables 1.1.4 and 1.1.5 below.
Table 1.1.4: Less than classes
Class (x)    Description     Inequality Form    Similar Inequality Form
20-30        Less than 30    $< 30$             $20 \leq x < 30$
30-40        Less than 40    $< 40$             $30 \leq x < 40$
40-50        Less than 50    $< 50$             $40 \leq x < 50$
The less than classes above may be written in the form 0-, 10-, 20-, 30- and 40-. The
score $x = 20$ will be in the interval 20-30 and not in 10-20, and the score $x = 30$ will be
placed in the class 30-40 and not in 20-30.
Table 1.1.5: Greater than classes
Class (x)    Description        Inequality Form    Similar Inequality Form
0-10         Greater than 0     $> 0$              $0 < x \leq 10$
10-20        Greater than 10    $> 10$             $10 < x \leq 20$
20-30        Greater than 20    $> 20$             $20 < x \leq 30$
30-40        Greater than 30    $> 30$             $30 < x \leq 40$
40-50        Greater than 40    $> 40$             $40 < x \leq 50$
From the table above, the score $x = 20$ will be placed in the interval 10-20 and not in
20-30; also the score $x = 30$ will be in the interval 20-30 and not in 30-40. Note: the
class mark of less than or greater than classes is just the average of the lower and upper limits.
Exercise 1.1.2 (Answers on page 7.)
1. List the upper limits of the class intervals given in the table below.
   Less than 50    15
   Less than 60    6
   Less than 70    8
   Less than 80    21
2. List the lower limits of the class intervals given in the table below.
                                   12
   Greater than or equal to 50     3

   Less than 10    10
   Less than 20    12
   Less than 30    18
   Less than 40    9
   Less than 50    15
   Less than 60    16
   Less than 70    18
   Less than 80    20
1.2.1 Arithmetic Mean
The mean (arithmetic mean or average) of a set of data is found by adding up all the items and
then dividing by the number of items. The mean of a sample is denoted by $\bar{x}$ and the
mean of a complete population is denoted by $\mu$. At this level, a large sample will be considered
a good representation of a population; therefore, $\bar{x}$ will be considered as the mean of a
population. Within the scope of this book, the mean for raw data and for ungrouped and grouped
frequency distributions will be computed by three different methods: the direct method, the deviation
method and the coding method. However, the coding method applies only to grouped data.
(a) Raw data
(i) Direct method
The mean of $N$ data items $x_1, x_2, x_3, \cdots, x_N$ is given by the formula
$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N} \quad \text{or} \quad \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i. \tag{1.2.1}$$
Example 1.4. Find the mean score of students in a subject where they
scored 38, 40, 70, 48, 44, 46, 55, 54, 42, 63.
Solution: From the given scores, $N = 10$, and the mean $\bar{x}$ by the direct method is given by
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
$$= \frac{x_1 + x_2 + x_3 + \cdots + x_{10}}{10}$$
$$= \frac{38 + 40 + 70 + 48 + 44 + 46 + 55 + 54 + 42 + 63}{10}$$
$$= \frac{500}{10}$$
$$\therefore \bar{x} = 50.$$
(ii) Deviation method
When the numerical values of $x_i$ are large, finding $\sum_{i=1}^{N} x_i$ becomes tedious and
time consuming. So, for such situations, let us think of a method of reducing these
calculations. We change each $x_i$ to a smaller number so that our calculations become
easy, by subtracting a fixed number $A$, known as the assumed mean, from each of these
$x_i$'s. Also, to further reduce our calculation work, we may take $A$ at or near the centre
of the data values. The deviations $d_i = x_i - A$ are tabulated as follows.
$x_i$       $d_i = x_i - A$
$x_1$       $d_1 = x_1 - A$
$x_2$       $d_2 = x_2 - A$
$x_3$       $d_3 = x_3 - A$
$\vdots$    $\vdots$
$x_N$       $d_N = x_N - A$
            $\sum_{i=1}^{N} d_i$
The mean by the deviation or assumed mean method is then given by
$$\bar{x} = A + \frac{1}{N}\sum_{i=1}^{N} d_i. \tag{1.2.2}$$
Example 1.5. Ten students were asked to write a one or two digit number of their
choice. The following raw numbers were written by the students: 4, 5, 6, 8, 10, 15,
16, 18, 20, and 24. By using the direct and deviation methods, find the mean number.
Solution: Since there are 10 students, $N = 10$; hence by the direct method, the mean
is given by
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$
$$= \frac{x_1 + x_2 + x_3 + \cdots + x_N}{N}$$
$$= \frac{4 + 5 + 6 + 8 + 10 + 15 + 16 + 18 + 20 + 24}{10}$$
$$= \frac{126}{10}$$
$$= 12.6.$$
By the deviation method, we need to construct a table of $x_i$ and deviations $d_i$, taking $A = 10$, as shown below.
        $x_i$    $d_i = x_i - A$
        4        -6
        5        -5
        6        -4
        8        -2
A →     10       0
        15       5
        16       6
        18       8
        20       10
        24       14
                 $\sum_{i=1}^{10} d_i = 26$
From the table above, the mean by the deviation or assumed mean method is given by
$$\bar{x} = A + \frac{1}{N}\sum_{i=1}^{10} d_i$$
$$= 10 + \frac{1}{10}(26)$$
$$\therefore \bar{x} = 12.6.$$
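Both methods of Example 1.5 can be compared side by side in a few lines of Python. This is a minimal sketch with hypothetical function names; it only illustrates that the assumed mean $A$ changes the arithmetic, not the result.

```python
import math


def mean_direct(values):
    """Direct method: sum of the values divided by their count."""
    return sum(values) / len(values)


def mean_by_deviation(values, assumed_mean):
    """Deviation method: A plus the average deviation from A."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


numbers = [4, 5, 6, 8, 10, 15, 16, 18, 20, 24]

direct = mean_direct(numbers)
by_deviation = mean_by_deviation(numbers, assumed_mean=10)

# Floating-point results are compared with a tolerance, not with ==.
print(round(direct, 2), round(by_deviation, 2))
print(math.isclose(direct, by_deviation))
```

Any value of `assumed_mean` gives the same mean; choosing one near the centre of the data simply keeps the deviations small when working by hand.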
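Example 1.6. Prove that $\sum_{i=1}^{n}(x_i - \bar{x}) = 0$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
Solution: From $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \Rightarrow n\bar{x} = \sum_{i=1}^{n} x_i$. Now
$$\sum_{i=1}^{n}(x_i - \bar{x}) = (x_1 - \bar{x}) + (x_2 - \bar{x}) + \cdots + (x_n - \bar{x})$$
$$= (x_1 + x_2 + \cdots + x_n) - (\bar{x} + \bar{x} + \cdots + \bar{x})$$
$$= \sum_{i=1}^{n} x_i - n\bar{x}$$
$$= n\bar{x} - n\bar{x}$$
$$= 0 \quad \text{proved.}$$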
Example 1.7. The mean of $n$ terms is $\bar{x}$. If the first term is increased by 1, the second
by 2, and so on, what will be the new mean?
Solution: Recall that $\sum_{r=1}^{n} r = \frac{1}{2}n(n + 1)$ and proceed as follows:
1st term + 1
2nd term + 2
3rd term + 3
$\vdots$
$n$th term + $n$
The sum of the terms increases by $\sum_{r=1}^{n} r$, so the new mean $\bar{X}$ is
$$\bar{X} = \bar{x} + \frac{1}{n}\sum_{r=1}^{n} r$$
$$= \bar{x} + \frac{1}{n} \times \frac{1}{2}n(n + 1)$$
$$\therefore \bar{X} = \bar{x} + \frac{1}{2}(n + 1).$$
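$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}.$$
Solution: Let $n = n_1 + n_2$ be the total number of observations. From the means of the separate
observations,
$$\bar{x}_1 = \frac{a_1 + a_2 + \cdots + a_{n_1}}{n_1} \Rightarrow n_1\bar{x}_1 = a_1 + a_2 + \cdots + a_{n_1} \quad \text{and}$$
$$\bar{x}_2 = \frac{b_1 + b_2 + \cdots + b_{n_2}}{n_2} \Rightarrow n_2\bar{x}_2 = b_1 + b_2 + \cdots + b_{n_2}.$$
The mean of the combined observations is therefore
$$\bar{x} = \frac{(a_1 + a_2 + \cdots + a_{n_1}) + (b_1 + b_2 + \cdots + b_{n_2})}{n_1 + n_2}$$
$$= \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}.$$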
Example 1.9. The means of two samples of sizes 100 and 150 were found to be 25 and 10
respectively. Find the mean of the combined samples of the distribution.
Solution: Given that $n_1 = 100$, $n_2 = 150$, $\bar{x}_1 = 25$ and $\bar{x}_2 = 10$, if $\bar{x}$ is the combined mean,
then
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$$
$$= \frac{100 \times 25 + 150 \times 10}{100 + 150}$$
$$= \frac{2500 + 1500}{250}$$
$$= \frac{4000}{250}$$
$$\therefore \bar{x} = 16.$$
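The combined mean is a size-weighted average of the separate means, which is straightforward to express in code. The sketch below is illustrative only; `combined_mean` is a hypothetical helper that accepts any number of (size, mean) pairs, generalising the two-sample formula used in Example 1.9.

```python
def combined_mean(samples):
    """Mean of pooled samples, given (size, mean) pairs for each sample."""
    total_size = sum(size for size, _ in samples)
    total_sum = sum(size * mean for size, mean in samples)
    return total_sum / total_size


# Example 1.9: n1 = 100 with mean 25, n2 = 150 with mean 10.
result = combined_mean([(100, 25), (150, 10)])
print(result)
```

Note that the combined mean (16) is closer to 10 than to 25 because the second sample is larger; a plain average of the two means would give 17.5, which is wrong here.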
(b) Un-grouped frequency distribution
Each value $x_i$ may be recorded with its respective frequency $f_i$ to form the frequency
distribution shown in Table 1.2.1 below. The sum of frequencies, $\sum_{i=1}^{n} f_i$, or total number
of data values, will always be denoted by $N$ in this chapter.
Table 1.2.1
$x_i$       $f_i$                       $d_i = x_i - A$    $f_i x_i$                   $f_i d_i$
$x_1$       $f_1$                       $d_1 = x_1 - A$    $f_1 x_1$                   $f_1 d_1$
$x_2$       $f_2$                       $d_2 = x_2 - A$    $f_2 x_2$                   $f_2 d_2$
$x_3$       $f_3$                       $d_3 = x_3 - A$    $f_3 x_3$                   $f_3 d_3$
$\vdots$    $\vdots$                    $\vdots$           $\vdots$                    $\vdots$
$x_n$       $f_n$                       $d_n = x_n - A$    $f_n x_n$                   $f_n d_n$
            $N = \sum_{i=1}^{n} f_i$                       $\sum_{i=1}^{n} f_i x_i$    $\sum_{i=1}^{n} f_i d_i$
By the direct method, the mean is given by
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i. \tag{1.2.3}$$
By choosing an appropriate assumed mean $A$ among the $x_i$'s, the mean deviation $\bar{d}$
is given by
$$\bar{d} = \frac{1}{N}\sum_{i=1}^{n} f_i d_i \quad \text{but } d_i = x_i - A$$
$$= \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)$$
$$= \frac{1}{N}\left(\sum_{i=1}^{n} f_i x_i - A\sum_{i=1}^{n} f_i\right)$$
$$= \bar{x} - A \quad \because\ \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \bar{x} \text{ and } \sum_{i=1}^{n} f_i = N$$
$$\bar{x} = A + \bar{d} \quad \text{but } \bar{d} = \frac{1}{N}\sum_{i=1}^{n} f_i d_i$$
$$= A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i.$$
Hence, the mean by the deviation or assumed mean method is given by
$$\bar{x} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i. \tag{1.2.4}$$
Example 1.10. A total of 40 students of a certain school were grouped according to their
age in years and asked how many times per year each student could commit wrong arithmetic
in normal life applications of basic mathematics. The results were tabulated in the
table below.
Age in years (x)    14    15    16    19    21
Frequency (f)       6     7     8     9     10
Use the direct and deviation methods to find the mean age in years of the students interviewed.
Solution
        $x_i$    $f_i$       $d_i = x_i - A$    $f_i x_i$                       $f_i d_i$
        14       6           -2                 84                              -12
        15       7           -1                 105                             -7
A →     16       8           0                  128                             0
        19       9           3                  171                             27
        21       10          5                  210                             50
                 $N = 40$                       $\sum_{i=1}^{5} f_i x_i = 698$  $\sum_{i=1}^{5} f_i d_i = 58$
By the direct formula, the mean is given by
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \frac{1}{40}(698) = 17.45.$$
By the deviation formula, taking the assumed mean $A = 16$, the mean is given by
$$\bar{x} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i = 16 + \frac{1}{40}(58) = 17.45.$$
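Equations (1.2.3) and (1.2.4) translate directly into code. The following Python sketch applies them to the data of Example 1.10. It assumes the ages are 14, 15, 16, 19 and 21; the second age is inferred from the products in the solution table ($f_i x_i = 105$ with $f_i = 7$). The function names are hypothetical.

```python
import math


def mean_direct(values, freqs):
    """Equation (1.2.3): sum of f_i * x_i divided by N."""
    n = sum(freqs)
    return sum(f * x for x, f in zip(values, freqs)) / n


def mean_by_deviation(values, freqs, assumed_mean):
    """Equation (1.2.4): A plus the frequency-weighted mean deviation."""
    n = sum(freqs)
    weighted_dev = sum(f * (x - assumed_mean) for x, f in zip(values, freqs))
    return assumed_mean + weighted_dev / n


ages = [14, 15, 16, 19, 21]   # second value inferred, see lead-in
freqs = [6, 7, 8, 9, 10]

direct = mean_direct(ages, freqs)
by_deviation = mean_by_deviation(ages, freqs, assumed_mean=16)
print(round(direct, 2), round(by_deviation, 2))
```

The two results agree up to floating-point rounding, which is why the comparison in any automated check should use a tolerance.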
Example 1.11. The mean of the numbers $x$ with frequencies $f$ in the frequency distribution
table below is 3.66. Find the value of the constant $a$.
x    1    2    3    4     5    6
f    3    9    a    11    8    7
Solution: Given the mean $\bar{x} = 3.66$, and since the mean of the numbers $x$ with frequencies $f$ is
given by
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i,$$
we have
$$3.66 = \frac{(1 \times 3) + (2 \times 9) + (3 \times a) + (4 \times 11) + (5 \times 8) + (6 \times 7)}{3 + 9 + a + 11 + 8 + 7}$$
$$3.66 = \frac{3 + 18 + 3a + 44 + 40 + 42}{38 + a}$$
$$3.66(38 + a) = 147 + 3a$$
$$139.08 + 3.66a = 147 + 3a$$
$$0.66a = 7.92$$
$$\therefore a = 12.$$
(c) Grouped frequency distribution
Now, for each class interval, we require a point which would serve as the representative of
the whole class. It is assumed that the frequency of each class interval is centred around
its mid-point. So the mid-point or class mark of each class can be chosen to represent
the observations falling in the class. All three methods (direct, deviation and coding)
apply to grouped data. Refer to Table 1.2.2, where $L_i$-$U_i$, for $i = 1, 2, 3, \cdots, n$,
are the class intervals introduced earlier. In addition, we compute $u_i = \frac{d_i}{c}$ for the purpose
of establishing the coding method, where $c$ is the class size.
Table 1.2.2
Class          $f_i$                       $x_i = \frac{1}{2}(L_i + U_i)$    $d_i = x_i - A$    $f_i d_i$                   $u_i = \frac{d_i}{c}$    $f_i u_i$
$L_1$-$U_1$    $f_1$                       $x_1$                             $d_1 = x_1 - A$    $f_1 d_1$                   $u_1 = \frac{d_1}{c}$    $f_1 u_1$
$L_2$-$U_2$    $f_2$                       $x_2$                             $d_2 = x_2 - A$    $f_2 d_2$                   $u_2 = \frac{d_2}{c}$    $f_2 u_2$
$L_3$-$U_3$    $f_3$                       $x_3$                             $d_3 = x_3 - A$    $f_3 d_3$                   $u_3 = \frac{d_3}{c}$    $f_3 u_3$
$\vdots$       $\vdots$                    $\vdots$                          $\vdots$           $\vdots$                    $\vdots$                 $\vdots$
$L_n$-$U_n$    $f_n$                       $x_n$                             $d_n = x_n - A$    $f_n d_n$                   $u_n = \frac{d_n}{c}$    $f_n u_n$
               $N = \sum_{i=1}^{n} f_i$                                                         $\sum_{i=1}^{n} f_i d_i$                             $\sum_{i=1}^{n} f_i u_i$
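(i) Direct method
The direct method for grouped data uses the same formula as equation (1.2.3) for
ungrouped data; the only change is that the column of $x_i$ holds the class marks or mid points of
the groups.
(ii) Deviation method
Similarly, for grouped data, the same formula as in equation (1.2.4) is applied, with the slight
modification of using class marks in place of $x_i$.
(iii) Coding method
The mean $\bar{u}$ of $u_1, u_2, u_3, \cdots, u_n$ is given by
$$\bar{u} = \frac{1}{N}\sum_{i=1}^{n} f_i u_i \quad \text{but } u_i = \frac{x_i - A}{c}$$
$$= \frac{1}{N}\sum_{i=1}^{n} f_i\left(\frac{x_i - A}{c}\right)$$
$$= \frac{1}{c}\left[\frac{1}{N}\left(\sum_{i=1}^{n} f_i x_i - A\sum_{i=1}^{n} f_i\right)\right]$$
$$c\bar{u} = \bar{x} - A \quad \text{because } \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \bar{x} \text{ and } \sum_{i=1}^{n} f_i = N$$
$$\bar{x} = A + c\bar{u} \quad \text{but } \bar{u} = \frac{1}{N}\sum_{i=1}^{n} f_i u_i$$
$$= A + \frac{c}{N}\sum_{i=1}^{n} f_i u_i.$$
Hence, the mean by the coding method is given by
$$\bar{x} = A + \frac{c}{N}\sum_{i=1}^{n} f_i u_i. \tag{1.2.5}$$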
Example 1.12. The table below gives the percentage distribution of female teachers in the
secondary schools of rural areas of various wards of Manyara region. Find the mean percentage of
female teachers.
Solution: The class size is $c = 25 - 15 = 10$, and the percentages of female teachers form the classes.
(i) By direct method: The class marks $x_i$ are computed and tabulated as shown below.
Class    $f_i$                            $x_i$    $f_i x_i$
15-25    6                                20       120
25-35    11                               30       330
35-45    7                                40       280
45-55    4                                50       200
55-65    4                                60       240
65-75    2                                70       140
75-85    1                                80       80
         $N = \sum_{i=1}^{n} f_i = 35$             $\sum_{i=1}^{n} f_i x_i = 1390$
From the table above, the mean $\bar{x}$ is computed by the direct method as follows:
$$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$$
$$= \frac{1390}{35}$$
$$= 39.71$$
(ii) By deviation or assumed mean method: Let $A = 50$ be the assumed mean; thus the class
marks $x_i$ and deviations $d_i$ are computed and tabulated as shown below.
Class    $f_i$                            $x_i$       $d_i = x_i - A$    $f_i d_i$
15-25    6                                20          -30                -180
25-35    11                               30          -20                -220
35-45    7                                40          -10                -70
45-55    4                                A → 50      0                  0
55-65    4                                60          10                 40
65-75    2                                70          20                 40
75-85    1                                80          30                 30
         $N = \sum_{i=1}^{n} f_i = 35$                                   $\sum_{i=1}^{n} f_i d_i = -360$
From the table above, the mean $\bar{x}$ is computed by the deviation method as follows:
$$\bar{x} = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i$$
$$= 50 + \frac{(-360)}{35}$$
$$= 39.71$$
(iii) By coding method: If the deviations are $d_i = x_i - A$, where $A$ is the assumed mean, and the
class size is $c = 10$, the class marks $x_i$ and coded values $u_i$ are computed and tabulated as shown below.
Class    $f_i$                            $x_i$       $u_i = \frac{d_i}{c}$    $f_i u_i$
15-25    6                                20          -3                       -18
25-35    11                               30          -2                       -22
35-45    7                                40          -1                       -7
45-55    4                                A → 50      0                        0
55-65    4                                60          1                        4
65-75    2                                70          2                        4
75-85    1                                80          3                        3
         $N = \sum_{i=1}^{n} f_i = 35$                                         $\sum_{i=1}^{n} f_i u_i = -36$
From the table above, the mean $\bar{x}$ is computed by the coding method as follows:
$$\bar{x} = A + \frac{c}{N}\sum_{i=1}^{n} f_i u_i$$
$$= 50 + \frac{10}{35}(-36)$$
$$= 39.71$$
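The three methods of Example 1.12 can be checked against one another with a short Python sketch. The class limits and frequencies are those of the example (class marks 20, 30, ..., 80); the function names are illustrative and not part of the text.

```python
import math

classes = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75), (75, 85)]
freqs = [6, 11, 7, 4, 4, 2, 1]

# Class marks: mid points of each class interval.
marks = [(lower + upper) / 2 for lower, upper in classes]
n = sum(freqs)


def grouped_mean_direct():
    """Equation (1.2.3) with class marks in place of x_i."""
    return sum(f * x for f, x in zip(freqs, marks)) / n


def grouped_mean_deviation(assumed_mean):
    """Equation (1.2.4): deviations d_i = x_i - A."""
    return assumed_mean + sum(f * (x - assumed_mean) for f, x in zip(freqs, marks)) / n


def grouped_mean_coding(assumed_mean, class_size):
    """Equation (1.2.5): coded values u_i = (x_i - A) / c."""
    coded_sum = sum(f * (x - assumed_mean) / class_size for f, x in zip(freqs, marks))
    return assumed_mean + class_size * coded_sum / n


direct = grouped_mean_direct()
deviation = grouped_mean_deviation(50)
coding = grouped_mean_coding(50, 10)
print(round(direct, 2), round(deviation, 2), round(coding, 2))
```

The coding method only rescales the deviations by the class size, so it pays off when all classes share the same width, as they do here.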
Example 1.13. The mean score in a Mathematics test of the following students was 43.9, as
recorded by a marker. However, at the next stage of the marking process, the checker observed
some errors in the recorded scores of two students, as shown in the table below:
Student's Name           Checker's Observation
Ernest Mbuya             Correctly recorded
Revina Revelian          Correctly recorded
Petro C. Mwengo          Correctly recorded
Happy S. Henna           Correctly recorded
Mponda D. Ambrose        Correctly recorded
Rose B. Kiria            Wrongly recorded
Siyajali Charles         Correctly recorded
Robert L. Machenya       Correctly recorded
Diana Chawe              Correctly recorded
Bakari M. Mdoe           Correctly recorded
Irene Norbert            Correctly recorded
Johnbull Lusenga         Correctly recorded
Enatha Bugumile          Correctly recorded
Maziku M. Kaswende       Correctly recorded
Joyce P. Nyaulingo       Correctly recorded
Kilimu M. Chuma          Wrongly recorded
Matrida Y. Ngasoma       Correctly recorded
The score of Rose B. Kiria was recorded as 74 instead of 47, and the score
of Kilimu M. Chuma was recorded as 90 instead of 09. If the mistaken scores are removed
and replaced with the correct scores as suggested by the checker, find the correct
mean score.
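Solution: Let $\sum x$ be the sum of the scores of the 20 students as recorded by the marker, so
$$\bar{x} = \frac{\sum x}{N}$$
$$43.9 = \frac{\sum x}{20}$$
$$\sum x = 43.9 \times 20$$
$$= 878.$$
We need to remove 74 and 90 from the sum and then add 47 and 09 to obtain the correct
new sum $\sum x_{new}$ as follows:
$$\sum x_{new} = 878 - 74 - 90 + 47 + 9$$
$$= 770.$$
The correct mean is therefore
$$\bar{x}_{new} = \frac{\sum x_{new}}{N}$$
$$= \frac{770}{20}$$
$$\therefore \bar{x}_{new} = 38.5.$$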
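The correction in Example 1.13 needs only the recorded mean, the number of students and the wrongly recorded pairs, not the individual scores. A minimal Python sketch of that bookkeeping follows; `corrected_mean` is a hypothetical helper, and each correction is given as a (recorded, correct) pair.

```python
import math


def corrected_mean(recorded_mean, count, corrections):
    """Adjust a mean for wrongly recorded values.

    corrections is a list of (recorded_value, correct_value) pairs.
    """
    total = recorded_mean * count          # recover the recorded sum
    for recorded, correct in corrections:
        total += correct - recorded        # swap the wrong value for the right one
    return total / count


# Example 1.13: 74 should be 47, and 90 should be 09.
new_mean = corrected_mean(43.9, 20, [(74, 47), (90, 9)])
print(round(new_mean, 2))
```

Because 43.9 is not exactly representable in binary floating point, the recovered sum may differ from 878 by a tiny rounding error, so results are rounded for display.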
1. Use the direct and deviation methods to find the mean of the numbers 12, 16, 10, 20 and 25.
2. Find the mean of the following frequency distribution by both direct and deviation
methods:
   Size (x)         1    3    5    7     9    11    13    15
   Frequency (f)    3    3    4    14    7    4     3     4
3. Find the mean of the following frequency distribution of the variable x with its corresponding
frequency f.
   x    20    30    40    50    60
   f    1     3     20    2     4
4. The table below gives the scores of 100 candidates in a mathematics test:
   Scores        9     12    15    18    21    24    27    30    33    36
   Candidates    12    10    15    19    12    14    3     10    4     1
By using the assumed mean of 18, find the mean score correct to 2 decimal places.
5. The table below shows the distribution of marks in a final examination in Advanced level
mathematics
   Marks             90-99    80-89    70-79    60-69    50-59    40-49    30-39
   No of students    9        32       43       21       11       3        1
6.
   Score        0-4    5-9    10-14    15-19    20-24    25-29
   Frequency    5      4      7        7        5        2
7. The age of children from primary to secondary school levels in one of the private school
   Age (years)           5-9    9-13    13-17    17-21
   Number of children    16     32      37       25
Find the mean age of children by using direct, deviation and coding methods. Give your
8. The ranges of weight of children in a certain primary school were summarized in the table
below:
Find the mean weight of children by using direct, deviation and coding methods. Approx-
9. A random sample of 120 maize grains was collected. Each grain was weighed to the
nearest 0.01 gm and the results were summarised in the table below:
   1.10-1.29    7
   1.30-1.49    24
   1.50-1.69    33
   1.70-1.89    32
   1.90-2.09    14
   2.10-2.29    8
   2.30-2.49    1
   2.50-2.69    1
Find the mean weight in gm of the maize grains.
10. The mean of 10 different numbers is 8. What will be the new mean if each number is
increased by 7?
11. Find the mean of 30 different numbers given that the mean of 10 of them is 12 and the
mean of the remaining 20 is 9.
12. The mean of 5 different numbers is 9. If a number x is added to the 5 numbers, the new
mean is 10. Find the value of x.
13. The mean of the numbers 10, 15, 30, 8, m, 42 and 5 is 18. Find the value of m.
14. The mean of 5 different numbers is 12 and the mean of other 5 different numbers is 6.
15. The mean height of civil engineering students in the table below is 91 cm.
   Number of students    3m    21    34    52    5m    11
16. The mean of the data set $x_i$ with respective frequencies $f_i$ in the table below is 14.
   $x_i$    5        10       15    20    25
   $f_i$    $f_1$    $f_2$    8     4     5
18. Dar es Salaam is one of the most car-congested cities in Tanzania. The frequency distribution below
shows how much time in minutes employees normally took to travel to one of the working
stations.
   Employees    55    32    8    3    2

2. 8        5. 74      8. 21.168    11. 10    14. 9    17. 12.76
3. 41.67    6. 13.5    9. 1.687     12. 15    15. 4    18. 23
1.2.2 Median
The median is a location parameter which separates the higher half from the lower half of a
data sample, a population or a probability distribution. For a data set, it may be thought of as
the "middle" value.
Definition 1.8. The median is the middle value of a finite set of numbers arranged either
in ascending or descending order.
(a) Raw data
To obtain the median of raw data, we need to rearrange the data values in order of
magnitude, either from the smallest number to the largest or from the largest to the
smallest. Suppose we have $N$ numbers. If $N$ is an odd number, then there is a middle
value which is the median, and it is given by
$$\text{median} = \tfrac{1}{2}(N + 1)\text{th number}. \tag{1.2.6}$$
If $N$ is an even number, then there are two middle numbers, at the $\tfrac{1}{2}N$th and $\left(\tfrac{1}{2}N + 1\right)$th positions;
therefore,
$$\text{median} = \tfrac{1}{2}\left[\tfrac{1}{2}N\text{th number} + \left(\tfrac{1}{2}N + 1\right)\text{th number}\right]. \tag{1.2.7}$$
Example 1.14. Find the median of the data 23, 15, 87, 32, 22, 82, 53, 65, 72, 30, 45.
Solution: The numbers in ascending order are 15, 22, 23, 30, 32, 45, 53, 65, 72, 82, 87. For
$N = 11$, which is odd, the median is given by
$$\text{median} = \tfrac{1}{2}(N + 1)\text{th number}$$
$$= \tfrac{1}{2}(11 + 1)\text{th number}$$
$$= 6\text{th number}$$
$$= 45.$$
Example 1.15. Find the median of the data 15, 16, 10, 4, 9, 19, 12, 3, 7, 17, 23, 5.
Solution: The numbers in ascending order are 3, 4, 5, 7, 9, 10, 12, 15, 16, 17, 19, 23. Since
$N = 12$, which is even, the $\tfrac{1}{2}N$th number, which is the 6th number, is 10, and the $\left(\tfrac{1}{2}N + 1\right)$th number, which
is the 7th number, is 12. Therefore,
$$\text{median} = \tfrac{1}{2}(10 + 12)$$
$$= \tfrac{1}{2}(22)$$
$$= 11.$$
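The odd and even cases for raw data can be combined in one small function. This is a minimal Python sketch with an illustrative function name; Python's standard library also offers `statistics.median`, which follows the same rule and is used below only as a cross-check.

```python
import statistics


def median_raw(values):
    """Median of raw data: middle value, or mean of the two middle values."""
    ordered = sorted(values)           # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]            # the (N + 1)/2-th number (1-based)
    # N even: average the N/2-th and (N/2 + 1)-th numbers (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_data = [23, 15, 87, 32, 22, 82, 53, 65, 72, 30, 45]     # Example 1.14
even_data = [15, 16, 10, 4, 9, 19, 12, 3, 7, 17, 23, 5]     # Example 1.15

print(median_raw(odd_data), median_raw(even_data))
print(statistics.median(odd_data), statistics.median(even_data))
```

Sorting is done on a copy, so the caller's list keeps its original order.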
(b) Un-grouped frequency distribution
For data given by a frequency distribution table, the median can be found directly from the
cumulative frequency distribution. From either side of the distribution table, the median
is the value at which the cumulative frequency first reaches at least $\tfrac{1}{2}(N + 1)$ when $N$ is odd. When $N$ is even,
the median is the average of the values where the cumulative frequency is at least $\tfrac{1}{2}N$ and the next,
$\tfrac{1}{2}N + 1$, as depicted in equation (1.2.7).
Example 1.16. The table below shows the scores obtained when a die is thrown 63 times.
Find the median score of this experiment.
Score (x)        1     2     3     4     5    6
Frequency (f)    12    10    11    13    8    9
Solution: Since $N = 63$, the position of the median is at the $\tfrac{1}{2}(N + 1)$th value, which is computed
as the $\tfrac{1}{2}(63 + 1)$th = 32nd value. Adding frequencies from the left we get 33, which is at least 32, and
the corresponding value is 3; if we add frequencies from the right we get 41, which is
at least 32, and the corresponding value is also 3. Therefore, the median score is 3.
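The cumulative-frequency search used in Example 1.16 can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes the scores are listed in ascending order; positions are counted from 1, as in the text.

```python
def value_at_position(values, freqs, position):
    """Return the value whose cumulative frequency first reaches `position`."""
    running_total = 0
    for value, freq in zip(values, freqs):
        running_total += freq
        if running_total >= position:
            return value
    raise ValueError("position exceeds the total frequency")


def median_from_table(values, freqs):
    """Median of an ungrouped frequency distribution (values ascending)."""
    n = sum(freqs)
    if n % 2 == 1:
        return value_at_position(values, freqs, (n + 1) // 2)
    lower = value_at_position(values, freqs, n // 2)
    upper = value_at_position(values, freqs, n // 2 + 1)
    return (lower + upper) / 2


scores = [1, 2, 3, 4, 5, 6]
freqs = [12, 10, 11, 13, 8, 9]     # die thrown 63 times

median_score = median_from_table(scores, freqs)
print(median_score)
```

For the die data the 32nd observation falls where the cumulative frequency first reaches 33, so the function returns the score 3, in agreement with the worked solution.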
Example 1.17. The frequency distribution table below shows the body masses in kilograms
of 150 freshers who joined Arusha Technical College in the 2016/17 academic year. Use
the table to find the median mass of the students.
Mass (Kg)        42.92    43.12    46.54    47.66    49.88    58.44
Frequency (f)    18       42       15       18       32       25
Solution: Since the number of students is $N = 150$, which is even, the position of
the median is between the numbers where the cumulative frequency is at least $\tfrac{1}{2}N$ and
$\tfrac{1}{2}N + 1$. The position of the first number is at $\tfrac{1}{2}N = 75$th, which is 46.54, and the second
number is at $\tfrac{1}{2}N + 1 = 76$th, which is 47.66. By taking the cumulative frequency from
either side of the table, the position of the median is between 46.54 and 47.66. Therefore,
$$\text{median} = \tfrac{1}{2}(46.54 + 47.66) = 47.10 \text{ kg}.$$
x    1-10    11-20    21-30    31-40    41-50    51-60
f    3       5        7        6        5        5
lies in the 4th class. Since we have 15 observations in the first 3 classes, the median is the
Example 1.18. Gains in mass of Lion cabs for a certain period in a Zoological park were
recorded as follows:
.S
aA
Mass (Kg) 5 9 10 14 15 19 20 24 24 29 30 34
itil
Frequency 2 29 37 16 14 2
Mb
Determine the median class and calculate the median weight of cabs. Give your answer to
2 decimal places.
.S
aA
itil
Solution: The total number of observations is $N = 100$, which is an even number. Counting the cumulative frequency from either side of the table, the position $\frac{1}{2}N = 50$ falls in the same class, 15-19. This is the median class, that is, the class which contains the median, and it gives the following information:

$\sum f_b = 2 + 29 = 31$, the sum of the frequencies of the classes below the median class
$f_m = 37$, the frequency of the median class
$L = 14.5$, the lower class boundary of the median class
$c = 5$, the class interval or class size of the median class
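Now, we use the formula to obtain the median as below:
\[
\begin{aligned}
\text{Median} &= L + \left(\frac{\frac{1}{2}N - \sum f_b}{f_m}\right)c \\
&= 14.5 + \left(\frac{50 - 31}{37}\right)5 \\
&= 14.5 + \left(\frac{19}{37}\right)5 \\
&= 14.5 + 2.\dot{5}6\dot{7} \\
&= 17.067\dot{5}6\dot{7}
\end{aligned}
\]
$\therefore$ Median = 17.07 kg, correct to 2 decimal places.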
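The grouped-data median formula used above can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function name `grouped_median` and its parameter names are illustrative assumptions.

```python
def grouped_median(lower_boundary, class_size, freq_median_class,
                   cum_freq_below, total):
    """Median = L + ((N/2 - sum f_b) / f_m) * c for grouped data.

    All names here are hypothetical; the caller is assumed to have
    already identified the median class by cumulative frequency.
    """
    half = total / 2
    return lower_boundary + (half - cum_freq_below) / freq_median_class * class_size


# Example 1.18: L = 14.5, c = 5, f_m = 37, sum f_b = 31, N = 100
median_1_18 = grouped_median(14.5, 5, 37, 31, 100)
print(round(median_1_18, 2))
```

Rounding the result to 2 decimal places should agree with the hand calculation of 17.07 kg.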
Example 1.19. From the data in the table below, find the median.

Class        0-10  10-20  20-30  30-40  40-50  50-60
Frequency    6     8      14     16     4      2
information
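\[
\begin{aligned}
\text{Median} &= L + \left(\frac{\frac{N}{2} - \sum f_b}{f_m}\right)c \\
&= 20 + \left(\frac{25 - 14}{14}\right)10 \\
&= 20 + \left(\frac{11}{14}\right)10 \\
&= 20 + 7.857142857 \\
&= 27.857142857
\end{aligned}
\]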
Example 1.20. In a popular motel, 68 home appliances are installed. The weekly electricity consumption of these appliances is given in the table below:

Consumption (kWh)    Number of appliances
40-80                4
80-120               f1
120-140              13
140-160              20
180-220              14
220-240              f2
240-260              4

If the median consumption of electricity is 152 kWh per week, determine the missing numbers of home appliances $f_1$ and $f_2$.
Solution: Since the given number of home appliances is 68, then
\[
\begin{aligned}
4 + f_1 + 13 + 20 + 14 + f_2 + 4 &= 68 \\
f_1 + f_2 + 55 &= 68 \\
f_1 + f_2 &= 13.
\end{aligned}
\]
The class intervals are not uniform, but the median formula remains unaltered and the class interval of the median class is used. The median is 152, so it lies in the class 140-160, which is the median class. From the median class we obtain $L = 140$, $f_m = 20$, $\frac{1}{2}N = 34$, $\sum f_b = f_1 + 17$ and $c = 20$. So, from the median formula,
\[
\begin{aligned}
\text{Median} &= L + \left(\frac{\frac{1}{2}N - \sum f_b}{f_m}\right)c \\
152 &= 140 + \left(\frac{34 - (f_1 + 17)}{20}\right)20 \\
152 &= 140 + 34 - 17 - f_1 \\
152 &= 157 - f_1 \\
f_1 &= 5.
\end{aligned}
\]
But $f_1 + f_2 = 13 \Rightarrow 5 + f_2 = 13$,
$\therefore f_1 = 5$ and $f_2 = 8$.
Number of children    0  1  2   3  4  5
Frequency             3  5  12  9  4  2
3. Find the median of the given frequency distribution of a variable x with frequency f below.

x    10  12  15  18  24  28  32
f    4   7   2   14  9   6   12
4. The masses, measured to the nearest kg, of 49 boys are recorded in the distribution table below.

Mass in kg    60-64  65-69  70-74  75-79  80-84  85-89
Frequency     2      6      12     14     10     5

Determine the median class and hence find the median of the data.
5. The table below shows the distribution of weight, in grams, of a certain living species.

Weight interval (g)    Frequency
0.1-0.9                2
1.1-1.9                5
2.1-2.9                7
3.1-3.9                2
4.1-4.9                3
5.1-5.9                1

Calculate the median weight of the species.
6. The weights of a herd of 100 buffalos in Serengeti National Park are shown in the frequency distribution table below:

Weight (kg)    Number of buffalos
0-100          2
100-200        5
200-300        n1
300-400        12
400-500        17
500-600        20
600-700        n2
700-800        9
800-900        7
900-1000       4

Determine the missing numbers of buffalos $n_1$ and $n_2$ if the median is 525 kg.
7. The table below gives age in days for a certain baby care class, find the median of the age
6. 9; 15
7. 142
1.2.3 Mode
The mode is the number which appears more times than any other number; put simply, it is the number with the highest frequency.

Definition 1.9. The mode of a set of data is the value which occurs most often.

If all the values have the same frequency, the data set has no mode; if more than one value shares the highest frequency, list them all as modes of the data set.
(a) Raw data
For raw data, the mode is obtained by observation, but we advise re-arranging the numbers in order of magnitude to simplify the process of identifying the mode.

Example 1.21. Find the mode of the data 1, 5, 2, 0, 4, 2, 1, 2, 2, 3.
Solution: Put the numbers in order so it is easier to spot similar numbers; 0, 1, 1, 2, 2, 2,
2, 3, 4, 5. The mode is 2 because the number 2 appears four times.
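The counting rule in Definition 1.9, including the "no mode" and "several modes" cases, can be sketched in Python as follows. This is an illustrative sketch only; the function name `modes` is an assumption and is not taken from the notes.

```python
from collections import Counter


def modes(data):
    """Return the list of modes of raw data, following Definition 1.9.

    An empty list means the data set has no mode (all values equally
    frequent); several entries mean the data set has several modes.
    """
    if not data:
        return []
    counts = Counter(data)
    highest = max(counts.values())
    # If every distinct value occurs equally often, there is no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([1, 5, 2, 0, 4, 2, 1, 2, 2, 3]))  # data of Example 1.21
```

For the data of Example 1.21 the function returns a single mode, 2, matching the solution above.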
Example 1.22. Find the mode of the data 2, 5, 3, 4, 0, 1, 6.
Solution: Putting the numbers in order of magnitude gives 0, 1, 2, 3, 4, 5, 6, so it is easier to spot similar numbers. There is no mode because there is no number that appears more
The modes are 2 and 3 because the numbers 2 and 3 each appear twice, which is more than any
Example 1.24. The table below shows the marks obtained by a group of players in a game.

Marks        0   1   2   3
Frequency    11  10  19  10

Solution: The number with the highest frequency is 2, which appears 19 times; therefore the mode is 2.
class. The estimate of the mode is obtained within the modal class and not otherwise. For grouped data, the mode is estimated in three ways: by calculation (formula), from the histogram and from the cumulative frequency curve. Suppose we have a modal class with class interval size $c$ and lower class boundary $L$; the mode is then estimated by the formula
\[
\text{Mode} = L + \left(\frac{d_1}{d_1 + d_2}\right)c, \tag{1.2.8}
\]
where $d_1$ and $d_2$ are the excess frequency of the modal class frequency, $f_m$ to the next
We can eliminate $d_1$ and $d_2$ and re-write the mode formula given by (1.2.8) in terms of $f_1$, $f_2$ and $f_m$ as follows:
\[
\begin{aligned}
\text{Mode} &= L + \left(\frac{d_1}{d_1 + d_2}\right)c \\
&= L + \left(\frac{f_m - f_1}{f_m - f_1 + f_m - f_2}\right)c \\
&= L + \left(\frac{f_m - f_1}{2f_m - f_1 - f_2}\right)c.
\end{aligned}
\]
Therefore, the other form of the mode formula is given below,
\[
\text{Mode} = L + \left(\frac{f_m - f_1}{2f_m - f_1 - f_2}\right)c. \tag{1.2.9}
\]
Sometimes, $f_1$ is known as the frequency preceding the modal class and $f_2$ is the frequency
Example 1.25. From the data in the table below, find the estimated mode of the data by

Classes        0-10  10-20  20-30  30-40  40-50  50-60
Frequencies    6     8      14     16     4      2

Solution: The modal class is the class with the highest frequency, $f_m = 16$, which is 30-40,
\[
\begin{aligned}
\text{Mode} &= 30 + \left(\frac{2}{2 + 12}\right) \times 10 \\
&= 30 + \frac{2}{14} \times 10 \\
&= 31.428571429
\end{aligned}
\]
Example 1.26. Use the formula in equation (1.2.9) on page 25 to compute the mode from the data in the table below. Give your estimate to 4 decimal places.

Class    Frequency
1-10     4
11-20    12
21-30    24
31-40    36
41-50    20
51-60    16
61-70    8
71-80    5
Solution: The modal class is 31-40, where $L = 30.5$, $f_m = 36$, $f_1 = 24$, $f_2 = 20$ and $c = 10$. Thus, we find the mode from
\[
\begin{aligned}
\text{Mode} &= L + \left(\frac{f_m - f_1}{2f_m - f_1 - f_2}\right)c \\
&= 30.5 + \left(\frac{36 - 24}{2 \times 36 - 24 - 20}\right) \times 10 \\
&= 30.5 + \frac{12}{28} \times 10 \\
&= 34.7857 \text{ to 4 decimal places.}
\end{aligned}
\]
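The grouped-data mode formula can be evaluated with a few lines of Python. This is a hedged sketch; the function name `grouped_mode` and its argument names are assumptions introduced for illustration, and the modal class is assumed to have been identified already.

```python
def grouped_mode(lower_boundary, class_size, f_m, f_1, f_2):
    """Mode = L + (d1 / (d1 + d2)) * c, with d1 = f_m - f_1, d2 = f_m - f_2.

    f_1 and f_2 are the frequencies of the classes immediately before
    and after the modal class (names are hypothetical).
    """
    d1 = f_m - f_1
    d2 = f_m - f_2
    return lower_boundary + d1 / (d1 + d2) * class_size


# Example 1.26: L = 30.5, c = 10, f_m = 36, f_1 = 24, f_2 = 20
print(round(grouped_mode(30.5, 10, 36, 24, 20), 4))
```

Because $d_1 + d_2 = 2f_m - f_1 - f_2$, the same function covers both forms (1.2.8) and (1.2.9) of the formula.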
Example 1.27. The mode of the marks of 60 students in a Math test, given in the table below, is 32. Find the values of the missing frequencies denoted by $x$ and $y$.

Marks        1-10  11-20  21-30  31-40  41-50  51-60
Frequency    13    x      15     18     y      5
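Solution: Since the total number of students is 60, then
\[
\begin{aligned}
13 + x + 15 + 18 + y + 5 &= 60 \\
x + y + 51 &= 60 \\
x + y &= 9.
\end{aligned}
\]
We are given that the mode is 32, so it lies in the class 31-40, which is the modal class. From the modal class we obtain $L = 30.5$, $f_m = 18$, $f_1 = 15$, $f_2 = y$ and $c = 10$, so that $d_1 = f_m - f_1 = 3$ and $d_2 = f_m - f_2 = 18 - y$. So
\[
\begin{aligned}
\text{Mode} &= L + \left(\frac{d_1}{d_1 + d_2}\right)c \\
32 &= 30.5 + \left(\frac{3}{3 + 18 - y}\right)10 \\
32 &= 30.5 + \left(\frac{3}{21 - y}\right)10 \\
1.5(21 - y) &= 30 \\
1.5y &= 1.5 \Rightarrow y = 1.
\end{aligned}
\]
But $x + y = 9 \Rightarrow x + 1 = 9$,
$\therefore x = 8$ and $y = 1$.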
The mode can also be estimated from the histogram; the histogram is also an elementary tool for deriving the formula for the estimated mode, $M$, of grouped data. Consider Figure 1.2.1, which shows three rectangles of a histogram, where $d_1 = f_m - f_1$ is the excess of the modal class frequency, $f_m$, over the next lower frequency $f_1$ and $d_2 = f_m - f_2$ is the excess of the modal class frequency, $f_m$, over the next higher frequency $f_2$. Assuming that the class intervals have equal size, we define the mode $M$ as the abscissa of the point of intersection $P$ of the constructed lines $AC$ and $BD$.

Figure 1.2.1
Let $L$ and $U$ be the lower and upper class boundaries of the modal class respectively. Since $\triangle PAB \sim \triangle PCD$, we have
\[
\frac{EP}{AB} = \frac{PF}{CD} \Rightarrow \frac{M - L}{d_1} = \frac{U - M}{d_2}.
\]
Simplifying the above equation we obtain
\[
\begin{aligned}
d_2(M - L) &= d_1(U - M) \\
(d_1 + d_2)M &= d_1 U + d_2 L \\
M &= \frac{d_1 U + d_2 L}{d_1 + d_2} \quad \text{but } U = L + c \\
&= \frac{d_1(L + c) + d_2 L}{d_1 + d_2} \\
&= \frac{(d_1 + d_2)L + d_1 c}{d_1 + d_2} \\
\therefore M &= L + \left(\frac{d_1}{d_1 + d_2}\right)c.
\end{aligned}
\]
Estimation of the mode from the cumulative frequency curve. The gradient of the cumulative frequency curve is greatest at the point corresponding to the mode. Therefore, the mode is located at the point of inflexion of the cumulative frequency curve. To estimate the mode, use a ruler to find the point where the curve has maximum gradient, as shown in Figure 1.2.2 below.

Figure 1.2.2
Example 1.28. Use a histogram to estimate the mode of the data given in the table below:

Classes      3-7  8-12  13-17  18-22  23-27  28-32
Frequency    4    10    14     8      4      10

Solution: With a suitable scale, draw the histogram as shown in Figure 1.2.3, and construct dotted lines $AC$ and $BD$ joining the corners $A$, $B$, $C$ and $D$; these lines intersect at point $P$. Draw a vertical line from point $P$ down to the horizontal axis, as shown by the red dotted arrow, and read the mode from the class mark axis where the arrow hits. From Figure 1.2.3, the mode is approximately 14.5.

Figure 1.2.3
(a) 4, 3, 3, 7, 2, 8, 3, 5, 6, 3, 4, 3, 3, 5, 2, 3
2. In an experiment, a die was tossed 50 times and the numbers occurred in each event was

Number       1  2  3  4   5   6
Frequency    7  4  9  12  10  8

3. Estimate the mode of the following distribution which shows the scores of 60 students in

Score        1-10  11-20  21-30  31-40  41-50  51-60
Frequency    13    8      15     18     1      5

4. The mode of a test in Engineering drawing for 68 students is 61.5, the distribution of marks

x    30-39  40-49  50-59  60-69  70-79  80-89  90-99
f    m      12     14     16     n      6      6

Find the missing number of students m and n.

5. The ages of students in Leather Products Technology NTA 5 in a certain institute are shown in the table below:

Age          18-26  26-34  34-42  42-50  50-58
Frequency    8      12     30     25     7

6. Find the mode of the following data in a table below:

Length (cm)    0  5   10  15  20 and over
Frequency      8  12  6   3   1

Answers to Exercise 1.2.3 (On page 28.)
2. 4 3. 32 4. 6; 8 6. 7
In this section, we will learn how to use fractiles to specify the position of a data entry within a data set. Fractiles are numbers that partition, or divide, an ordered data set into equal parts. For instance, the median is a fractile because it divides an ordered data set into two equal parts. Percentiles are position measures used in educational and health-related fields to indicate the position of an individual in a group. The quartile, percentile and decile are the measures of
1.3.1 Quartile
The word quartile originates from the word quarter, so if you have an amount, divide by four to

Definition 1.10. A quartile is a number which divides a data distribution into four equal groups; the quartiles are denoted by $Q_1$, $Q_2$ and $Q_3$.

Note that $Q_1$ is the lower quartile and is the same as the 25th percentile; $Q_2$ is the same as the 50th percentile, or the median; $Q_3$ is the upper quartile and it corresponds to the
order, it is possible to find the quartiles. However, there are different methods for finding the quartiles, which sometimes give different results. The methods include the Moore and McCabe, Tukey, and Mendenhall and Sincich methods. Within each method, the procedure differs slightly depending on whether the data set contains an odd or even number of values. The Moore and McCabe (M & M) method is our choice here, because it conforms to the definition of quartile.
If $N$ is the number of values in the data set, whether even or odd, take the median as a reference point which divides the data set, arranged in ascending order, into two equal parts (excluding the median): a lower half and an upper half. If the lower and upper halves each contain an odd number of values, then the lower quartile $Q_1$ lies within the lower half and is given by
\[
Q_1 = \tfrac{1}{4}(N + 1)\text{th number}. \tag{1.3.1}
\]
Similarly, the upper quartile $Q_3$ lies within the upper half of the data set, and it is given by
\[
Q_3 = \tfrac{3}{4}(N + 1)\text{th number}. \tag{1.3.2}
\]
The second or middle quartile $Q_2$ is the median of the data set, and is thus obtained by using the formula in equation (1.2.6) on page 19. If the lower and upper halves each contain an even number of values, then the lower quartile $Q_1$ is the average of the two middle numbers of the lower half. These numbers are at the $\frac{1}{4}N$th and $\left(\frac{1}{4}N + 1\right)$th positions, therefore,
\[
Q_1 = \text{average of the } \tfrac{1}{4}N\text{th and } \left(\tfrac{1}{4}N + 1\right)\text{th numbers}. \tag{1.3.3}
\]
Similarly, for the upper quartile $Q_3$ there are two middle numbers, at the $\frac{3}{4}N$th and $\left(\frac{3}{4}N + 1\right)$th positions, in the upper half of the data set, therefore,
\[
Q_3 = \text{average of the } \tfrac{3}{4}N\text{th and } \left(\tfrac{3}{4}N + 1\right)\text{th numbers}. \tag{1.3.4}
\]
The second or middle quartile $Q_2$ is the median of the data set, and is thus obtained by using the formula in equation (1.2.7) on page 19. Note that, if the position of a quartile is not an integer, it is approximated to the nearest integer; for example, for $N = 7$, the $\frac{1}{4}N$th $\approx$ 2nd and the $\frac{3}{4}N$th $\approx$ 5th.
The Interquartile Range (IQR) is defined as the difference between $Q_3$ and $Q_1$, as given by the formula
\[
\text{IQR} = Q_3 - Q_1. \tag{1.3.5}
\]
The interquartile range is also known as the range of the middle 50% of the data. The semi-interquartile range (SIQR) is half of the interquartile range:
\[
\text{SIQR} = \frac{Q_3 - Q_1}{2}. \tag{1.3.6}
\]
Example 1.29. Find the lower and upper quartiles of the following data
18, 16, 13, 14, 12, 10, 11, 15, 17, 19.
Since $N = 10$, it is an even number, but the lower and upper halves contain odd number
\[
\begin{aligned}
Q_1 &= \tfrac{1}{4}(N + 1)\text{th number} \\
&\approx 3\text{rd number} \\
\therefore Q_1 &= 12.
\end{aligned}
\]
\[
\begin{aligned}
Q_3 &= \tfrac{3}{4}(11)\text{th number} \\
&\approx 8\text{th number} \\
\therefore Q_3 &= 17.
\end{aligned}
\]
Example 1.30. Find the lower quartile, upper quartile and the semi-interquartile range of the data set below
10, 15, 20, 4, 16, 17, 6, 18, 20, 14, 17.

Solution: The numbers in ascending order are
\[
\underbrace{4,\ 6,\ 10,\ 14,\ 15}_{\text{Lower half}},\ 16,\ \underbrace{17,\ 17,\ 18,\ 20,\ 20}_{\text{Upper half}}.
\]
Since $N = 11$ is an odd number and the lower and upper halves also contain an odd number of values, the lower quartile is
\[
\begin{aligned}
Q_1 &= \tfrac{1}{4}(N + 1)\text{th number} \\
&= \tfrac{1}{4}(11 + 1)\text{th number} \\
&= \tfrac{1}{4}(12)\text{th number} \\
&= 3\text{rd number} \\
\therefore Q_1 &= 10.
\end{aligned}
\]
The upper quartile is given by
\[
\begin{aligned}
Q_3 &= \tfrac{3}{4}(N + 1)\text{th number} \\
&= \tfrac{3}{4}(11 + 1)\text{th number} \\
&= \tfrac{3}{4}(12)\text{th number} \\
&= 9\text{th number} \\
\therefore Q_3 &= 18.
\end{aligned}
\]
\[
\begin{aligned}
\text{SIQR} &= \frac{Q_3 - Q_1}{2} \\
&= \frac{18 - 10}{2} \\
&= \frac{8}{2} \\
\therefore \text{SIQR} &= 4.
\end{aligned}
\]
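\[
\begin{aligned}
Q_1 &= \tfrac{1}{2}\left[\tfrac{12}{4}\text{th number} + \left(\tfrac{12}{4} + 1\right)\text{th number}\right] \\
&= \tfrac{1}{2}\left[3\text{rd number} + 4\text{th number}\right] \\
\therefore Q_1 &= 4.5.
\end{aligned}
\]
\[
\begin{aligned}
Q_2 &= \tfrac{1}{2}\left[\tfrac{12}{2}\text{th number} + \left(\tfrac{12}{2} + 1\right)\text{th number}\right] \\
&= \tfrac{1}{2}\left[6\text{th number} + 7\text{th number}\right] \\
&= \tfrac{1}{2}(6 + 7) \\
\therefore Q_2 &= 6.5.
\end{aligned}
\]
\[
\begin{aligned}
Q_3 &= \text{average of the } \tfrac{3}{4}N\text{th and } \left(\tfrac{3}{4}N + 1\right)\text{th numbers} \\
&= \tfrac{1}{2}\left[\tfrac{36}{4}\text{th number} + \left(\tfrac{36}{4} + 1\right)\text{th number}\right] \\
&= \tfrac{1}{2}\left[9\text{th number} + 10\text{th number}\right] \\
&= \tfrac{1}{2}(12 + 17) \\
\therefore Q_3 &= 14.5.
\end{aligned}
\]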
Deduction: From the examples above, we can simply say that the lower quartile is the median of the data in the lower half, and the upper quartile is the median of the data in the upper half.
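The deduction above suggests a direct way to compute quartiles of raw data: sort, split at the median (excluding it when $N$ is odd), and take the median of each half. The sketch below follows that reading; the function names `median` and `quartiles` are assumptions, and the data set is assumed to contain at least two values.

```python
def median(values):
    """Median of an already sorted, non-empty list."""
    n = len(values)
    mid = n // 2
    if n % 2:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) as medians of the lower half, whole set and upper half.

    The median itself is excluded from both halves when N is odd,
    as described in the text. Assumes len(data) >= 2.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles([10, 15, 20, 4, 16, 17, 6, 18, 20, 14, 17])  # Example 1.30
print(q1, q3, (q3 - q1) / 2)
```

For Example 1.30 this gives $Q_1 = 10$, $Q_3 = 18$ and a semi-interquartile range of 4, in agreement with the worked solution. Other quartile conventions (for example Tukey's) may give different values for the same data.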
Example 1.32. Given the following frequency distribution table.

Class Intervals    5-9  10-14  15-19  20-24  25-29  30-34
Frequency          4    9      16     12     6      3

Calculate the lower and upper quartiles, and also find the interquartile range.
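Solution: From the table, $N = 50$, so the position of the lower quartile is the $\frac{1}{4}N$th $= 12.5$th, which lies in the class 10-14, where $L = 9.5$, $\sum f_b = 4$ and $f_{Q_1} = 9$. Thus,
\[
\begin{aligned}
Q_1 &= L + \left(\frac{\frac{1}{4}N - \sum f_b}{f_{Q_1}}\right)c \\
&= 9.5 + \left(\frac{12.5 - 4}{9}\right)5 \\
&= 9.5 + 4.722 \\
\therefore Q_1 &= 14.22.
\end{aligned}
\]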
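The position of the upper quartile is the $\frac{3}{4}N$th $= 37.5$th, which lies in the class 20-24, where $L = 19.5$ and $f_{Q_3} = 12$. Thus,
\[
\begin{aligned}
Q_3 &= L + \left(\frac{\frac{3}{4}N - \sum f_b}{f_{Q_3}}\right)c \\
&= 19.5 + \left(\frac{37.5 - 29}{12}\right)5 \\
&= 19.5 + 3.54 \\
\therefore Q_3 &= 23.04.
\end{aligned}
\]
\[
\begin{aligned}
\text{IQR} &= Q_3 - Q_1 \\
&= 23.04 - 14.22 \\
\therefore \text{IQR} &= 8.82.
\end{aligned}
\]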
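The quartile calculation for grouped data follows the same pattern as the grouped median, with $\frac{1}{4}N$ or $\frac{3}{4}N$ in place of $\frac{1}{2}N$. The sketch below generalises this to any fraction of $N$; the function name `grouped_fractile` is a hypothetical helper, and equal class sizes are assumed.

```python
from itertools import accumulate


def grouped_fractile(lower_boundaries, class_size, freqs, fraction):
    """Estimate the value at position fraction * N in grouped data.

    Uses value = L + ((fraction * N - sum f_b) / f) * c, where the class
    is the first one whose cumulative frequency reaches the position.
    Assumes equal class sizes and fraction strictly between 0 and 1.
    """
    total = sum(freqs)
    position = fraction * total
    cumulative = list(accumulate(freqs))
    for k, cum in enumerate(cumulative):
        if cum >= position:
            below = cumulative[k - 1] if k > 0 else 0
            return lower_boundaries[k] + (position - below) / freqs[k] * class_size
    raise ValueError("fraction must lie between 0 and 1")


# Example 1.32: classes 5-9, ..., 30-34 with lower boundaries 4.5, 9.5, ...
boundaries = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5]
freqs = [4, 9, 16, 12, 6, 3]
q1 = grouped_fractile(boundaries, 5, freqs, 0.25)
q3 = grouped_fractile(boundaries, 5, freqs, 0.75)
print(round(q1, 2), round(q3, 2), round(q3 - q1, 2))
```

With `fraction = 0.5` the same function returns the grouped median, and fractions such as 0.1 or 0.9 give the corresponding deciles and percentiles under the same interpolation rule.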
Example 1.33. The masses of 45 apples measured to the nearest gram were noted and

(a) Construct a frequency distribution with a class size of 5, taking the lower limit as 84 for the first class interval.
(b) Calculate:
    (i) Median
    (ii) Upper and lower quartiles
    (iii) Semi-Interquartile Range

Solution:
(a) The frequency distribution table is given below

Class Interval    Tallies             Frequency
84-88             ||                  2
89-93             |||||               5
94-98             |||||               5
99-103            ||||| ||||| ||||    14
104-108           ||||| |||           8
109-113           |||||               5
114-118           ||||| |             6
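(b) (i) The median class is 99-103, where $L = 98.5$, $f_m = 14$, $i = 5$ and $\sum f_b = 12$. Also $\frac{N}{2} = \frac{45}{2} = 22.5$. Thus,
\[
\begin{aligned}
\text{Median} &= L + \left(\frac{\frac{N}{2} - \sum f_b}{f_m}\right)i \\
&= 98.5 + \left(\frac{22.5 - 12}{14}\right) \times 5 \\
&= 98.5 + 3.75 \\
&= 102.25 \\
\therefore \text{Median} &= 102.25.
\end{aligned}
\]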
(ii) The class containing the upper quartile is 104-108, from which we get $L = 103.5$, $\sum f_b = 26$, $f_{Q_3} = 8$ and $i = 5$. Also $\frac{3}{4}N = \frac{3}{4} \times 45 = 33.75$, thus the upper quartile ($Q_3$) is given by
\[
\begin{aligned}
Q_3 &= L + \left(\frac{\frac{3}{4}N - \sum f_b}{f_{Q_3}}\right)i \\
&= 103.5 + \left(\frac{33.75 - 26}{8}\right) \times 5 \\
&= 103.5 + 4.84 \\
&= 108.34 \\
\therefore Q_3 &= 108.34.
\end{aligned}
\]
The class containing the lower quartile is 94-98, from which we get $L = 93.5$, $\sum f_b = 7$, $f_{Q_1} = 5$ and $i = 5$. Also $\frac{1}{4}N = \frac{1}{4} \times 45 = 11.25$, thus the lower quartile ($Q_1$) is given by
\[
\begin{aligned}
Q_1 &= L + \left(\frac{\frac{1}{4}N - \sum f_b}{f_{Q_1}}\right)i \\
&= 93.5 + \left(\frac{11.25 - 7}{5}\right) \times 5 \\
&= 93.5 + 4.25 \\
&= 97.75 \\
\therefore Q_1 &= 97.75.
\end{aligned}
\]
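(iii) The semi-interquartile range is
\[
\begin{aligned}
\text{SIQR} &= \frac{Q_3 - Q_1}{2} \\
&= \frac{108.34 - 97.75}{2} \\
&= 5.295 \\
\therefore \text{SIQR} &= 5.295.
\end{aligned}
\]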
Exercise 1.3.1 (Answers on page 35.)

1. Find the three quartiles of each of the following data sets:
   (a) 11, 45, 47, 49, 50, 52, 53, 59, 60, 94, 117
   (b) 78, 79, 89, 79, 86, 79, 80, 82, 82, 85, 88, 92, 97
   (c) 50, 52, 64, 52, 75, 52, 53, 54, 56, 58, 61, 65, 69, 77
   (d) 5, 10, 11, 11, 14, 17, 19, 24, 30, 36, 40, 45

2. Find the lower quartile, upper quartile and interquartile range of each of the following data sets:
   (a) 9, 12, 14, 3, 5, 24, 6, 3, 20, 8, 19
   (b) 5, 10, 12, 2, 3, 6, 11, 13
   (c) 13, 16, 18, 14, 12, 10, 11, 15, 19, 17.

3. The lower quartile of a data set is the 8th data value. How many data values are there in the data set?

4. A sample of weights of 40 college students in kg yields the following sorted data below.
5. The value 18.2 is so small that it might have been recorded wrongly in question 5 above. If we delete this smallest observation, what is the interquartile range of the remaining observations?
6. The information in the table below shows the distribution of marks in a final economics examination.

Marks    No. of students
80-89    2
70-79    3
60-69    6
50-59    4
40-49    5
30-39    8

Scores       40-48  50-58  60-68  70-78  80-88  90-98  100-108
Frequency    6      4      5      5      4      3      1

Use the given information to find the lower quartile, upper quartile and semi-interquartile range.

8. Find the interquartile range of the university students' ages given in the table below.

Age (years)           15-20  20-25  25-30  30-35  35-40  40-45
Number of students    1      14     25     10     8      1

9. The weights, $x$, in kilograms (kg) of a random sample of female cows at one of the markets are given in the table below.

120 ≤ x < 140    6
140 ≤ x < 150    16
150 ≤ x < 160    30
160 ≤ x < 170    36
170 ≤ x < 180    30
180 ≤ x < 200    0

Find the lower and upper quartiles of the weight of the female cows, giving your answers correct to one decimal place.
1.
1.3.2 Percentile
Definition 1.11. A percentile is a number that divides the data set into 100 equal groups.
The Mendenhall and Sincich method is commonly used to find the percentiles of ungrouped data; it locates percentiles which are part of the data set. If $N$ is the total number of values in the data set, the percentile $P_k$ is given by
\[
P_k = \frac{k}{100}(N + 1)\text{th number}, \tag{1.3.7}
\]
1.3.3 Decile
Definition 1.12. A decile is a number that divides the data set into 10 equal parts.

Symbolically, deciles are represented by $D_1, D_2, D_3, \cdots, D_9$. $D_1$ is known as the lower decile or decile one, $D_5$ is the middle decile, which is the same as the median, and $D_9$ is the upper decile.

(a) Decile for Raw Data

Definition 1.13. Percentile is the number that divides the data set into 100 equal groups.

Percentiles are symbolized by $P_1, P_2, P_3, \cdots, P_{99}$.

(c) Decile for Grouped Data

Exercise 1.3 (Answers on page 36.)

1. Given the data set 2, 3, 5, 6, 8, 10, 12, 15, 18, 20. Find $P_{40}$, $P_{60}$ and $P_{75}$.
2. Given the data set 3, 4, 4, 6, 8, 10, 10, 12, 12, 12, 13, 15, 15, 15, 16, 17, 20, 22, 25, 27. Find $P_{10}$,
3. Find the 6th decile and the 30th percentile of the following data
4. Find the 25th percentile or $P_{25}$ of the following test scores of a random sample of ten students:
35, 42, 40, 28, 15, 23, 33, 20, 18 and 28. (04 Marks)
5. Find the semi-decile range from the following distribution; 5, 10, 12, 2, 3, 6, 11, 13

Scores       21-30  31-40  41-50  51-60  61-70  71-80  81-90
Frequency    18     16     9      20     24     1      12

Use the given information to find $D_1$, $D_9$, $P_{10}$ and $P_{90}$. Comment on the results obtained.

from DIT Mwanza campus and the scores were recorded as shown in the table below.

Scores              21-25  26-30  31-35  36-40  41-45  46-50
No. of graduates    6      12     9      11     8      4

Find the 7th decile of the scores. If the pass mark of the interview is above 40, how many passed the interview?
5.
Dispersion is a measure of the extent to which the individual items vary from a central value. Dispersion is used in two senses: the difference between the extreme items of the series, and the average deviation of the items from the mean. Measures of dispersion or variation are the range, mean deviation, variance and standard deviation.

(a) Range

Definition 1.14. The range of a set of numbers is the difference between the largest and the smallest number in the set.

The definition above holds for raw or ungrouped data, such that if $R$ is the range, $L$ is the smallest number and $H$ is the largest number, then
\[
R = H - L. \tag{1.4.1}
\]
For grouped data, $L$ is the lower class boundary of the lowest class and $H$ is the upper class boundary of the highest class.

Example 1.34. Find the range of the data set 10, 20, 30, 40, 50, 60 and 70.
Solution: The smallest number is $L = 10$ and the largest number is $H = 70$, thus the range is given by
\[
\begin{aligned}
R &= H - L \\
&= 70 - 10 \\
\therefore R &= 60.
\end{aligned}
\]
Example 1.35. Find the range of the data set given in the frequency distribution table below

Class        40-49  50-59  60-69  70-79  80-89
Frequency    18     16     12     10     15

Solution: The lowest class is 40-49 and its lower class boundary is $L = 39.5$. The highest class is 80-89 and its upper class boundary is $H = 89.5$, thus the range is given by
\[
\begin{aligned}
R &= H - L \\
&= 89.5 - 39.5 \\
\therefore R &= 50.
\end{aligned}
\]
Mean deviation is the mean of the absolute deviations of a set of observations, taken from the mean. The mean deviation, M.D, from the mean, $\bar{x}$, for raw data $x_1, x_2, x_3, \cdots, x_N$ is given by
\[
\text{M.D} = \frac{1}{N}\sum_{i=1}^{N} |x_i - \bar{x}|. \tag{1.4.2}
\]
deviation is given by
\[
\text{M.D} = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - \bar{x}|, \tag{1.4.3}
\]
where $N = \sum_{i=1}^{n} f_i$. This formula also applies to grouped data distributions, where $x_i$ is represented by the class mark or midpoint.
Example 1.36. The following data are the marks of 10 students in a mathematics examination: 38, 40, 70, 48, 44, 46, 55, 54, 42, 63. Find the mean deviation.

Solution: We tabulate the data as below in order to find the mean deviation

$x$    $x - \bar{x}$    $|x - \bar{x}|$    $|x - \bar{x}|^2$
38     -12              12                 144
40     -10              10                 100
70     20               20                 400
48     -2               2                  4
44     -6               6                  36
46     -4               4                  16
55     5                5                  25
54     4                4                  16
42     -8               8                  64
63     13               13                 169

From the table above, we have $\sum |x - \bar{x}| = 84$ and $\sum |x - \bar{x}|^2 = 974$, so the mean deviation (M.D) is
\[
\begin{aligned}
\text{M.D} &= \frac{\sum |x - \bar{x}|}{N} \\
&= \frac{84}{10} \\
\therefore \text{M.D} &= 8.4.
\end{aligned}
\]
Example 1.37. The following table shows the distribution of the weights, in kilograms, of College students.

Weight in kg          55  65  75  85  95
Number of students    8   6   12  11  13

Solution: Let $x$ be the weights in kg and the frequency, $f$, represent the number of students. We need the mean first, so that we can find the mean deviation; let us tabulate the data as below.

$x$    $f$    $fx$    $x - \bar{x}$    $|x - \bar{x}|$    $f|x - \bar{x}|$
55     8      440     -23              23                 184
65     6      390     -13              13                 78
75     12     900     -3               3                  36
85     11     935     7                7                  77
95     13     1235    17               17                 221

From the table above, $N = \sum f = 50$, $\sum fx = 3900$ and $\sum f|x - \bar{x}| = 596$. The mean weight is given by
\[
\begin{aligned}
\bar{x} &= \frac{1}{N}\sum_{i=1}^{n} f_i x_i \quad \text{or in simple form} \\
&= \frac{\sum fx}{N} \\
&= \frac{3900}{50} \\
&= 78.
\end{aligned}
\]
\[
\begin{aligned}
\text{M.D} &= \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - \bar{x}| \quad \text{or in simple form} \\
&= \frac{\sum f|x - \bar{x}|}{N} \\
&= \frac{596}{50} \\
\therefore \text{M.D} &= 11.92.
\end{aligned}
\]
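Both mean deviation formulas, (1.4.2) for raw data and (1.4.3) for data with frequencies, can be covered by one short function. This is a minimal sketch under the assumption that raw data is treated as having frequency 1 for every value; the name `mean_deviation` is illustrative.

```python
def mean_deviation(values, freqs=None):
    """Mean of the absolute deviations from the mean.

    If freqs is omitted, every value is given frequency 1, which
    reduces formula (1.4.3) to formula (1.4.2).
    """
    if freqs is None:
        freqs = [1] * len(values)
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    return sum(f * abs(x - mean) for x, f in zip(values, freqs)) / total


md_raw = mean_deviation([38, 40, 70, 48, 44, 46, 55, 54, 42, 63])        # Example 1.36
md_freq = mean_deviation([55, 65, 75, 85, 95], [8, 6, 12, 11, 13])       # Example 1.37
print(round(md_raw, 2), round(md_freq, 2))
```

For grouped data, the class marks would be passed as `values`, as noted after equation (1.4.3).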
(c) Variance ($\sigma^2$)
The variance of a set of observations $x_1, x_2, x_3, \cdots, x_N$ is the mean of the squares of the deviations of the observations from their mean $\bar{x}$. The variance is usually denoted by $\text{Var}(X)$ or $\sigma^2$. There are three things to note from the definition of variance: the deviations of the observations from the mean, $(x_i - \bar{x})$; the squares of these deviations, $(x_i - \bar{x})^2$; and the mean of the squares of the deviations, which is the variance. For raw data it is given by
\[
\text{Var}(X) = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} x_i\right)^2. \tag{1.4.4}
\]
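If $d_i = x_i - A$ are the deviations of $x_i$ from some arbitrary constant $A$, then the variance by the deviation approach, which is known as the short method, is given by
\[
\text{Var}(X) = \frac{1}{N}\sum_{i=1}^{N} (d_i - \bar{d})^2 = \frac{1}{N}\sum_{i=1}^{N} d_i^2 - \left(\frac{1}{N}\sum_{i=1}^{N} d_i\right)^2. \tag{1.4.5}
\]
For data with frequency distributions, where the values $x_1, x_2, x_3, \cdots, x_n$ have frequencies $f_1, f_2, f_3, \cdots, f_n$, the variance by the direct method is given by
\[
\text{Var}(X) = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i x_i\right)^2, \tag{1.4.6}
\]
and by the deviation method, with $d_i = x_i - A$, by
\[
\text{Var}(X) = \frac{1}{N}\sum_{i=1}^{n} f_i d_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i d_i\right)^2. \tag{1.4.7}
\]
The shortest method for computing the variance of grouped data is the coding method, which reduces the amount of calculation. When data are grouped into a frequency distribution whose class intervals have equal size $c$, we have $d_i = c u_i$, where $x_i = A + c u_i$, and the variance is
\[
\text{Var}(X) = c^2\left[\frac{1}{N}\sum_{i=1}^{n} f_i u_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i u_i\right)^2\right]. \tag{1.4.8}
\]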
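Example 1.38. Find the variance of the data 28.5, 73.6, 47.2, 31.5 and 64.8.

Solution: The mean of the data is
\[
\begin{aligned}
\bar{x} &= \frac{\sum x}{N} \\
&= \frac{28.5 + 73.6 + 47.2 + 31.5 + 64.8}{5} \\
&= \frac{245.6}{5} \\
&= 49.12
\end{aligned}
\]
\[
\begin{aligned}
\sum x^2 &= 28.5^2 + 73.6^2 + 47.2^2 + 31.5^2 + 64.8^2 \\
&= 13648.34.
\end{aligned}
\]
So, the variance, $\text{Var}(X)$, is given by
\[
\begin{aligned}
\text{Var}(X) &= \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2, \quad \text{or in simple form} \\
&= \frac{\sum x^2}{N} - (\bar{x})^2 \\
&= \frac{13648.34}{5} - (49.12)^2 \\
&= 316.89 \\
\therefore \text{Var}(X) &= 316.89.
\end{aligned}
\]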
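The two equivalent forms of the raw-data variance in equation (1.4.4) can be compared numerically. The sketch below computes both; the function name `population_variance` is an assumption, and the divisor is $N$ (not $N - 1$), as in the notes.

```python
def population_variance(values):
    """Return the variance computed two ways, both with divisor N.

    First by definition, the mean of squared deviations from the mean;
    second by the shortcut, mean of squares minus square of the mean.
    """
    n = len(values)
    mean = sum(values) / n
    by_definition = sum((x - mean) ** 2 for x in values) / n
    by_shortcut = sum(x * x for x in values) / n - mean ** 2
    return by_definition, by_shortcut


var_def, var_short = population_variance([28.5, 73.6, 47.2, 31.5, 64.8])  # Example 1.38
print(round(var_def, 2), round(var_short, 2))
```

The two results agree up to floating-point rounding. Python's standard library function `statistics.pvariance` uses the same divisor $N$ and could be used instead of the hand-written version.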
Example 1.39. The mean and variance of 7 observations are 8 and 16 respectively. If five of the observations are 2, 4, 10, 12 and 14, find the remaining two observations.

Solution: Let $x$ and $y$ be the other two observations, so the mean ($\bar{x}$) of the 7 observations is
\[
\begin{aligned}
\bar{x} &= \frac{\sum x}{N} \\
8 &= \frac{2 + 4 + 10 + 12 + 14 + x + y}{7} \\
56 &= 42 + x + y
\end{aligned}
\]
Simplifying the above we get the equation
\[
x + y = 14 \tag{1.4.9}
\]
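From the variance formula,
\[
\begin{aligned}
\text{Var}(X) = \sigma^2 &= \frac{\sum x^2}{N} - (\bar{x})^2 \\
16 &= \frac{\sum x^2}{7} - (8)^2 \\
16 + 64 &= \frac{\sum x^2}{7} \\
\sum x^2 &= 7 \times 80 \\
\sum x^2 &= 560.
\end{aligned}
\]
But $\sum x^2 = 2^2 + 4^2 + 10^2 + 12^2 + 14^2 + x^2 + y^2$, so
\[
560 = x^2 + y^2 + 460,
\]
that is, $x^2 + y^2 = 100$. Substituting $y = 14 - x$ from equation (1.4.9) gives $x^2 - 14x + 48 = 0$, so $x = 6$ or $x = 8$. Therefore, the remaining two observations are 6 and 8.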
Example 1.40. Use the coding method to find the variance of the grouped data in the table below.

Group        0-9  10-19  20-29  30-39  40-49  50-59  60-69
Frequency    1    5      12     22     17     9      4

Solution: By taking the assumed mean from the group 30-39, which is $A = 34.5$, we tabulate the information as below

Group    Class mark ($x$)    $d = x - A$    Frequency ($f$)    $u = \frac{d}{c}$    $fu$    $u^2$    $fu^2$
0-9      4.5                 -30            1                  -3                   -3      9        9
10-19    14.5                -20            5                  -2                   -10     4        20
20-29    24.5                -10            12                 -1                   -12     1        12
30-39    34.5                0              22                 0                    0       0        0
40-49    44.5                10             17                 1                    17      1        17
50-59    54.5                20             9                  2                    18      4        36
60-69    64.5                30             4                  3                    12      9        36
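From the table above, we obtain $c = 10$, $N = 70$, $\sum fu = 22$ and $\sum fu^2 = 130$. By using the coding method given in equation (1.4.8), the variance is given by
\[
\begin{aligned}
\text{Var}(X) &= c^2\left[\frac{1}{N}\sum_{i=1}^{n} f_i u_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i u_i\right)^2\right] \quad \text{or in simplified form} \\
&= c^2\left[\frac{\sum fu^2}{N} - \left(\frac{\sum fu}{N}\right)^2\right] \\
&= 100\left[\frac{130}{70} - \left(\frac{22}{70}\right)^2\right] \\
&= 175.8 \\
\therefore \text{Var}(X) &= 175.8.
\end{aligned}
\]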
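The coding method of equation (1.4.8) can be sketched in Python as below. The function name `coded_variance` and its parameters are illustrative assumptions; class marks are assumed to be equally spaced by the class size $c$.

```python
def coded_variance(class_marks, freqs, assumed_mean, class_size):
    """Variance by the coding method: c^2 * (sum(f u^2)/N - (sum(f u)/N)^2).

    Each class mark x is coded as u = (x - A) / c, where A is the
    assumed mean and c the common class size.
    """
    total = sum(freqs)
    codes = [(x - assumed_mean) / class_size for x in class_marks]
    sum_fu = sum(f * u for f, u in zip(freqs, codes))
    sum_fu2 = sum(f * u * u for f, u in zip(freqs, codes))
    return class_size ** 2 * (sum_fu2 / total - (sum_fu / total) ** 2)


# Example 1.40: class marks 4.5, 14.5, ..., 64.5 with A = 34.5 and c = 10
marks = [4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
freqs = [1, 5, 12, 22, 17, 9, 4]
variance_1_40 = coded_variance(marks, freqs, 34.5, 10)
print(round(variance_1_40, 1))
```

The choice of assumed mean does not change the result: calling the function with, say, `assumed_mean=24.5` gives the same variance, which is why any convenient class mark may be used as $A$.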
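Example 1.41. Prove that the variance, $\text{Var}(X) = \dfrac{1}{N}\sum (x - A)^2 - \left(\dfrac{\sum (x - A)}{N}\right)^2$, where $A$ is an assumed mean.
Solution: The variance, $\text{Var}(X)$, of ungrouped data is given by
\[
\begin{aligned}
\text{Var}(X) &= \frac{1}{N}\sum (x - \bar{x})^2 \\
&= \frac{1}{N}\sum (x - A + A - \bar{x})^2 \qquad \because\ x - A = d \text{ and } \bar{x} - A = \bar{d} \\
&= \frac{1}{N}\sum (d - \bar{d})^2 \\
&= \frac{1}{N}\sum \left(d^2 - 2d\,\bar{d} + \bar{d}^2\right) \\
&= \frac{1}{N}\sum d^2 - \frac{1}{N}\sum 2d\,\bar{d} + \frac{1}{N}\sum \bar{d}^2 \\
&= \frac{1}{N}\sum d^2 - 2\bar{d}\,\frac{\sum d}{N} + \bar{d}^2 \qquad \because\ \bar{d} \text{ is a constant number} \\
&= \frac{1}{N}\sum d^2 - 2\bar{d}\cdot\bar{d} + \bar{d}^2 \qquad \because\ \frac{\sum d}{N} = \bar{d} \\
&= \frac{1}{N}\sum d^2 - 2\bar{d}^2 + \bar{d}^2 \\
&= \frac{1}{N}\sum d^2 - \bar{d}^2 \qquad \text{but } d = x - A,\ \bar{d} = \frac{\sum d}{N} = \frac{\sum (x - A)}{N} \\
\therefore \text{Var}(X) &= \frac{1}{N}\sum (x - A)^2 - \left(\frac{\sum (x - A)}{N}\right)^2, \text{ hence proved.}
\end{aligned}
\]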
Standard deviation was introduced by Karl Pearson in 1893. It is the most important and widely used measure of dispersion, because algebraic signs are not ignored and it is algebraically correct. It also satisfies most of the properties of a good measure of dispersion. The standard deviation measures the concentration of the data about their mean: the smaller the standard deviation, the closer the data lie to the mean. The standard deviation ($\sigma$) is the positive square root of the variance, given by
\[
\sigma = \sqrt{\text{Variance}} = \sqrt{\text{Var}(X)}. \tag{1.4.11}
\]
Hence the standard deviation is the square root of any of the equations (1.4.4), (1.4.5), (1.4.6), (1.4.7) or (1.4.8), depending on the nature of the data we have.
Example 1.42. Find the variance and the standard deviation of the numbers 4, 5, 7, 9, 11, 14, 20.

Solution: We have the numbers 4, 5, 7, 9, 11, 14, 20; the mean is given by
\[
\begin{aligned}
\bar{x} &= \frac{\sum x}{N} \\
&= \frac{4 + 5 + 7 + 9 + 11 + 14 + 20}{7} \\
&= \frac{70}{7} \\
\therefore \bar{x} &= 10
\end{aligned}
\]
In order to find the variance, we construct the following table.

$x$    $x - \bar{x}$    $(x - \bar{x})^2$
4      -6               36
5      -5               25
7      -3               9
9      -1               1
11     1                1
14     4                16
20     10               100
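\[
\begin{aligned}
\text{Var}(X) &= \frac{\sum (x - \bar{x})^2}{N} \\
&= \frac{36 + 25 + 9 + 1 + 1 + 16 + 100}{7} \\
&= \frac{188}{7} \\
&= 26.86 \\
\therefore \text{Variance} &= 26.86.
\end{aligned}
\]
From the relationship between the variance and the standard deviation ($\sigma$), which is
\[
\begin{aligned}
\sigma &= \sqrt{\text{Var}(X)} \\
&= \sqrt{26.86} \\
&= 5.18 \\
\therefore \sigma &= 5.18.
\end{aligned}
\]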
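Example 1.43. Given that $N = 10$, $\sum x = 60$ and $\sum x^2 = 1000$, use the given information to find the standard deviation.

Solution:
\[
\begin{aligned}
\sigma &= \sqrt{\frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2} \\
&= \sqrt{\left(\frac{1000}{10}\right) - \left(\frac{60}{10}\right)^2} \\
&= \sqrt{(100 - 36)} \\
&= \sqrt{64} \\
\therefore \sigma &= 8.
\end{aligned}
\]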
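When only the summary totals $N$, $\sum x$ and $\sum x^2$ are known, as in the example above, the standard deviation follows directly from the shortcut form of the variance. A minimal sketch, with the hypothetical function name `std_from_sums`:

```python
import math


def std_from_sums(n, sum_x, sum_x2):
    """Standard deviation from N, sum of x and sum of x squared.

    Uses sigma = sqrt(sum_x2 / N - (sum_x / N)^2), i.e. divisor N.
    """
    return math.sqrt(sum_x2 / n - (sum_x / n) ** 2)


sigma_1_43 = std_from_sums(10, 60, 1000)  # Example 1.43
print(sigma_1_43)
```

The same function applies to frequency data if $\sum fx$ and $\sum fx^2$ are passed in place of $\sum x$ and $\sum x^2$.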
Example 1.44. In an agricultural experiment, seeds were planted in rows and the number of seeds that germinated in each row was tallied, and the data are summarised in the table below:

Number of seeds germinated    Frequency
0-4                           20
4-8                           45
8-12                          30
12-16                         5

Use the direct, deviation and coding methods to find the standard deviation of the seeds germinated per row and give your answer correct to 2 decimal places.

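Solution: The direct method is used by constructing the table of class mark ($x$), $fx$ and $fx^2$.

Class interval    $x$    $f$    $fx$    $fx^2$
0-4               2      20     40      80
4-8               6      45     270     1620
8-12              10     30     300     3000
12-16             14     5      70      980

From the table above, we have $N = 100$, $\sum fx = 680$ and $\sum fx^2 = 5680$. The standard deviation is
\[
\begin{aligned}
\sigma &= \sqrt{\frac{\sum fx^2}{N} - \left(\frac{\sum fx}{N}\right)^2} \\
&= \sqrt{\frac{5680}{100} - \left(\frac{680}{100}\right)^2} \\
&= \sqrt{56.8 - 46.24} \\
&= \sqrt{10.56} \\
\therefore \sigma &= 3.25 \text{ correct to 2 decimal places.}
\end{aligned}
\]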
of $d = x - A$, $fd$ and $fd^2$ as below:

Class interval    $x$      $f$    $d$    $fd$    $fd^2$
0-4               2        20     -4     -80     320
4-8               A → 6    45     0      0       0
8-12              10       30     4      120     480
12-16             14       5      8      40      320

From the table above, $N = 100$, $\sum fd = 80$ and $\sum fd^2 = 1120$. From the variance formula given in equation (1.4.7), the standard deviation is
\[
\begin{aligned}
\sigma &= \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i d_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i d_i\right)^2}, \quad \text{or in simple form} \\
&= \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \\
&= \sqrt{\frac{1120}{100} - \left(\frac{80}{100}\right)^2} \\
&= \sqrt{11.2 - 0.64} \\
&= \sqrt{10.56} \\
\therefore \sigma &= 3.25 \text{ correct to 2 decimal places.}
\end{aligned}
\]
To use the coding method, we let $A = 6$ be our assumed mean and construct the table of $u = \frac{x - A}{c}$, $fu$ and $fu^2$ as below:

Class interval    $x$      $f$    $u$    $fu$    $fu^2$
0-4               2        20     -1     -20     20
4-8               A → 6    45     0      0       0
8-12              10       30     1      30      30
12-16             14       5      2      10      20

From the table above, $c = 4$, $N = 100$, $\sum fu = 20$ and $\sum fu^2 = 70$. From the variance formula given in equation (1.4.8), the standard deviation is
\[
\begin{aligned}
\sigma &= c\sqrt{\frac{\sum fu^2}{N} - \left(\frac{\sum fu}{N}\right)^2} \\
&= 4\sqrt{\frac{70}{100} - \left(\frac{20}{100}\right)^2} \\
&= 4\sqrt{0.7 - 0.04} \\
&= 4\sqrt{0.66} \\
\therefore \sigma &= 3.25 \text{ correct to 2 decimal places.}
\end{aligned}
\]
Example 1.45. The weights of the pupils in a certain school are given in the table below.

Weight (kg)      33-36  37-40  41-44  45-48  49-52
No. of pupils    15     17     21     22     25

tabulate the information as below.

Weight (kg)    Class mark ($x$)    Frequency ($f$)    $u$    $fu$    $u^2$    $fu^2$
33-36          34.5                15                 -2     -30     4        60
37-40          38.5                17                 -1     -17     1        17
41-44          42.5                21                 0      0       0        0
45-48          46.5                22                 1      22      1        22
49-52          50.5                25                 2      50      4        100
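From the table above, we obtain $\sum fu = 25$ and $\sum fu^2 = 199$; thus, the standard deviation by the coding method is given by
\[
\begin{aligned}
\sigma &= \sqrt{c^2\left[\frac{1}{N}\sum_{i=1}^{n} f_i u_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i u_i\right)^2\right]}, \quad \text{or in simple form} \\
&= c\sqrt{\frac{\sum fu^2}{N} - \left(\frac{\sum fu}{N}\right)^2} \\
&= 4\sqrt{\frac{199}{100} - \left(\frac{25}{100}\right)^2} \\
&= 5.55 \\
\therefore \sigma &= 5.55.
\end{aligned}
\]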
Exercise 1.4 (Answers on page 46.)
.S
1. The table below shows the number of calls per day received by a fire station in a year.

Number of calls per day   100-119   120-139   140-159   160-179   180-199
Number of days            129       102       57        42        25

2. By using both deviation and coding methods, find the variance and standard deviation of

Class Interval   0-5   5-10   10-15   15-20   20-25
Frequency        5     4      7       7       5

3
and y.
4. The mean and variance of 5, 6, 9, 4, a, 10, 7, 8 and b are 5.5 and 8.25 respectively. Find the
values of a and b.
5. The mean and standard deviation of 6 observations are 8 and 4 respectively. If each
observation is multiplied by 3, find the new mean and new standard deviation of the resulting
observations.
6. The mean and variance of 8 observations are 9 and 9.25 respectively. If six of the
observations are 6, 7, 10, 12, 12 and 13, find the remaining observations.
7. The mean and standard deviation of a group of 100 observations were found to be 20
and 3 respectively. Later on, it was found that three observations, recorded as 21, 21 and 18,
were incorrect. Find the mean and standard deviation if the incorrect observations are omitted.
8. The mean and standard deviation of 20 observations are found to be 10 and 2 respectively.
On rechecking, it was found that an observation 8 was incorrect. Calculate the correct
(b) If it is replaced by 12
9. Prove that the standard deviation is given by
$$
\sigma = i\sqrt{\frac{\sum fu^2}{N} - \bar{u}^2}
$$
where $i$ = class interval, $u$ = coding number, $\bar{u} = \frac{\sum fu}{N}$ and $N$ = total number of items.
Answers to Exercise 1.4 (On page 45.)
2. 45.2487, 6.7267
Miscellaneous Exercise 1 (Answers on page 46.)
1. A total of 1690 malaria patients were given a 3-day dose of Artemisinin-based Combination
Therapy (ACT) in one of the district hospitals in tropical countries. After completing the dose,
their recovery times in days were as shown in the table below.

Days of Recovery     0-10   10-20   20-30   30-40   40-50
Number of Patients   460    730     260     150     90

Find the mean, median and mode of the recovery time. Give your answers correct to one
decimal place.

Temperature in °F   55   65   75   85   95
Time in days        8    6    12   11   13

Find the mean, mean deviation and standard deviation of the temperature scale.