Sma 160 Probability and Statistics 1
KENYATTA UNIVERSITY
INSTITUTE OF OPEN LEARNING
SCHOOL OF PURE AND APPLIED SCIENCES
E. G. Njenga
S. M. Karuku
Department of Mathematics
Preface
TABLE OF CONTENTS
Unit 1 Introduction to Statistics
Unit 2 Data Presentation
Unit 3 Measures of Central Tendency
Unit 4 Measures of Dispersion
Unit 5 Moments, Skewness and Kurtosis
Unit 6 Correlation
Unit 7 Regression
Unit 8 Probability
UNIT 1
INTRODUCTION TO STATISTICS
1.0 Introduction
In our everyday life, we come across different types of quantitative information in
newspapers and magazines and on radio and television. For example, we may hear or read
that the infant mortality rate decreased at the rate of 15% per annum during the period
1997-1998, that the population of Kenya increased at the rate of 3% per year during the
period 1989-1999, or that the number of admissions to public universities went up by,
say, 4% in 1998-99 compared with 1996-97. We would like to know what these figures
mean. Such quantitative information is called statistical data, or statistics. The unit
begins with a discussion of the meaning and scope of statistics. This is followed by
definitions of terms commonly encountered in statistics. We also outline the stages
involved in any statistical enquiry and discuss the different types of variables and levels
of measurement. We then introduce some mathematical symbols commonly used in
statistics. At the end of the unit, a revision exercise is provided to help the student
review the main ideas of the unit and practise using the mathematical symbols
introduced.
1.1 Objectives
By the end of this unit, you should be able to:
distinguish between statistical data and statistical methods.
classify statistical studies as either descriptive or inferential.
define the terms data, population, sample and variable.
identify the population and the sample in an inferential study.
explain what is meant by a representative sample.
(ii) In the second sense, statistics refers to a set of tools used to collect, organize,
present, analyze and interpret numerical facts or observations in order to make
decisions.
It is the second definition that constitutes the subject matter of this module. Within this
second definition, a distinction is often made between the two functions of the statistical
method.
The second function is beyond the scope of this module; we therefore deal only with
descriptive statistics.
Note: This implies that there is a third meaning of the term statistics, one that
distinguishes a statistic from a parameter.
(ii) Planning
The researcher designs the study or analysis and decides how the data will be collected.
To obtain an accurate and complete count of the population (a census), the statistician
needs to decide on the precise nature of the items to be enumerated or measured. In most
cases, however, constraints such as time and resources make a census unrealistic or
impossible. The only option left in that case is for the statistician to select an unbiased
(representative) sample from the population.
In this method, the investigator (also called the interviewer) studies the facts and collects
the required data.
(ii) Interview
In this method, the investigator or his assistants establish contact with the respondents.
An opportune time is agreed upon for a face-to-face interview with the respondents.
Alternatively, if the interview is short, it can be conducted by telephone.
(iii) Questionnaire method
In this method, a questionnaire (question booklet) is prepared and either posted to the
respondents or delivered to them personally.
Often, with some modifications, the same purpose may be served by using data already
collected by other persons or agencies (secondary data).
• Secondary data may provide validation for primary data, whereby the secondary
data allow us to assess the quality and consistency of the primary data.
• Secondary data may act as a substitute for primary data. In some situations we
may simply not be able to collect data, for reasons of access, cost, or time; or the
data have been collected once and to repeat the collection process would be
undesirable.
then there is no need for highly accurate estimates in order to make the investment
decision.
Reliability – The reliability of published statistics may vary over time. It is not
uncommon, for example, for the systems of collecting data to have changed over
time without any indication of this to the reader of the published statistics. The
government may change geographical or administrative boundaries, or the basis
for stratifying a sample may have altered. Other aspects of research methodology
that affect the reliability of secondary data are the sample size, response rate,
questionnaire design and modes of analysis.
Time frame – Most censuses take place at 10-year intervals, so data from these and
other published sources may be out of date by the time the researcher wants to
use them. The time period during which secondary data were first compiled may
have a substantial effect on the nature of the data.
Source bias – Researchers have to be aware of vested interests when they consult
secondary sources. Those responsible for compiling them may have reasons for
wishing to present a more optimistic or pessimistic set of results for their
organization. For example, officials responsible for estimating food shortages
may wish to exaggerate the figures before sending aid requests to potential donors.
Similarly, commercial organizations have been known to inflate estimates of their
market shares.
include weight, age, height and price. Quantitative variables are further classified as
discrete or continuous.
A continuous variable is one for which all values in some range are possible.
Examples of continuous variables
Time taken to cover a distance of 100 metres by an athlete.
Volume of water in a fish pond at any time.
Height of plants in a greenhouse.
Weight of newborn babies in a maternity ward.
A discrete variable, on the other hand, is one whose possible values form a finite (or
countably infinite) set of numbers; that is, a variable whose values can be counted.
[Figure: Classification of variables. A variable is either qualitative or quantitative, and a quantitative variable is either discrete or continuous.]
Exercise 1.1
1. Which of the following variables are qualitative and which are
quantitative?
(a) The nationality of personnel at the World Health Organization
headquarters.
(b) Number of days absent from school due to illness.
(c) The political party people vote for in an election.
(d) The dimensions of the altar.
(e) The lifestyle of a member of the royal family.
For data analysis, numbers are assigned to the attributes (for example, greatly dislike
= −2 , moderately dislike = −1 , indifferent to = 0 , moderately like = +1 , and greatly like
= +2 ), but the numbers are understood to indicate rank order and the “distance” between
the numbers has no meaning. Any other assignment of numbers that preserves the rank
order of the attributes would serve as well.
matter whether they are high or low. For example, consider the Fahrenheit scale of
temperature. The difference between 86° and 66° represents the same temperature
difference as that between 90° and 70°, because each 20° interval has the same physical
meaning (in terms of the kinetic energy of molecules). Interval scales, however, do not
have a true zero point: the zero is arbitrary. 0° Fahrenheit does not represent the complete
absence of heat (the absence of any molecular kinetic energy). Consequently, it does not
make sense to compute ratios of temperatures. For example, there is no sense in which the
ratio of 86° to 43° is the same as the ratio of 90° to 45°; no interesting physical property
is preserved across the two ratios. It does not make sense to say that 90° is "twice as hot"
as 45°.
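The claim that Fahrenheit differences are meaningful while ratios are not can be checked numerically. The sketch below is an illustration only: it converts Fahrenheit readings to kelvins (an absolute scale with a true zero, brought in purely for comparison), and the helper name `fahrenheit_to_kelvin` is our own.

```python
def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit reading to kelvins, a scale whose zero is absolute."""
    return (f - 32) * 5 / 9 + 273.15

# Equal 20-degree Fahrenheit intervals correspond to equal physical differences
# (each is 20 * 5/9, about 11.11 kelvins).
diff_86_66 = fahrenheit_to_kelvin(86) - fahrenheit_to_kelvin(66)
diff_90_70 = fahrenheit_to_kelvin(90) - fahrenheit_to_kelvin(70)

# Fahrenheit ratios are not physically meaningful: 90/45 is 2, but the ratio
# of the corresponding absolute temperatures is only about 1.09.
ratio_fahrenheit = 90 / 45
ratio_absolute = fahrenheit_to_kelvin(90) / fahrenheit_to_kelvin(45)

print(round(diff_86_66, 2), round(diff_90_70, 2))
print(ratio_fahrenheit, round(ratio_absolute, 2))
```

The same check fails for ratios on any interval scale, because shifting the zero point changes every ratio but leaves every difference intact.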
Another example of a variable measured on an interval scale is the calendar year. Its
arbitrary starting point was fixed at the traditional date of the birth of Christ, and years
before this are labelled 'BC'.
Exercise 1.2
What type of scale is being used in each of the following measurements?
(a) Altitude (height above sea level).
(b) The presidential candidate people vote for in an election.
(c) Pain level: None, Mild, Moderate, Severe.
(d) The dimensions of the altar.
(e) Age of an athlete taking part in the Youth Athletics Championship.
(f) Arrival time of a plane at an international airport.
1.8.1 Subscripts
As in other branches of mathematics, statistics uses literal symbols to represent variable
quantities. Subscripts are usually used to distinguish variables that are related in some
sense.
1.8.1.1 Single Subscript
Suppose we take X to be the height (in feet) of a first-year college student. We may
measure the value of X for several students, say 8 of them, obtaining the set of values
{4.5, 5.2, 5.0, 5.2, 5.4, 4.9, 5.7, 5.8}
These eight values are all values of X, but they correspond to different observations of the
variable X. To distinguish symbolically between such alternative values of a single
variable X, it is common to use the subscript notation $X_i$, where $i = 1$ for the first
value, $i = 2$ for the second, and so on. Thus, for this example,
$X_1 = 4.5$, $X_2 = 5.2$, $X_3 = 5.0$, $X_4 = 5.2$, $X_5 = 5.4$, $X_6 = 4.9$, $X_7 = 5.7$ and $X_8 = 5.8$.
These subscripts are viewed as numerical labels used to distinguish one of a set of values
from the others.
Triple subscripts can be used to indicate that a collection of variables differs along three
dimensions. For instance, suppose that instead of only one plot in the agricultural
experiment considered above, we had four plots, each with seven columns and nine rows
as in Table 1 above. In this case we would use triple subscripts: the first indicating the
plot number, the second the row number and the third the column number. That is,
$X_{ijk}$ represents the yield in the $i$th plot, $j$th row and $k$th column. Thus, in the layout
below, $X_{453}$ represents the yield in the fourth plot, fifth row and third column.
X111 X112 X113 X114 X115 X116 X117 X211 X212 X213 X214 X215 X216 X217
X121 X122 X123 X124 X125 X126 X127 X221 X222 X223 X224 X225 X226 X227
X131 X132 X133 X134 X135 X136 X137 X231 X232 X233 X234 X235 X236 X237
X141 X142 X143 X144 X145 X146 X147 X241 X242 X243 X244 X245 X246 X247
X151 X152 X153 X154 X155 X156 X157 X251 X252 X253 X254 X255 X256 X257
X161 X162 X163 X164 X165 X166 X167 X261 X262 X263 X264 X265 X266 X267
X171 X172 X173 X174 X175 X176 X177 X271 X272 X273 X274 X275 X276 X277
X181 X182 X183 X184 X185 X186 X187 X281 X282 X283 X284 X285 X286 X287
X191 X192 X193 X194 X195 X196 X197 X291 X292 X293 X294 X295 X296 X297
Plot1 Plot2
X311 X312 X313 X314 X315 X316 X317 X411 X412 X413 X414 X415 X416 X417
X321 X322 X323 X324 X325 X326 X327 X421 X422 X423 X424 X425 X426 X427
X331 X332 X333 X334 X335 X336 X337 X431 X432 X433 X434 X435 X436 X437
X341 X342 X343 X344 X345 X346 X347 X441 X442 X443 X444 X445 X446 X447
X351 X352 X353 X354 X355 X356 X357 X451 X452 X453 X454 X455 X456 X457
X361 X362 X363 X364 X365 X366 X367 X461 X462 X463 X464 X465 X466 X467
X371 X372 X373 X374 X375 X376 X377 X471 X472 X473 X474 X475 X476 X477
X381 X382 X383 X384 X385 X386 X387 X481 X482 X483 X484 X485 X486 X487
X391 X392 X393 X394 X395 X396 X397 X491 X492 X493 X494 X495 X496 X497
Plot3 Plot4
Figure 2: A generic experimental set-up with four plots, each plot having nine rows and seven columns
Exercise 1.3
1. What is a superscript? What is the difference between a superscript and an
exponent?
2. Give an example in which
(a) a double subscript can be used.
(b) a triple subscript can be used.
Here $i$ is the variable ranging over the integers 1, 2, 3, 4, 5, 6, 7 and 8. The symbol $i = 1$
below the $\Sigma$ sign indicates that 1 is the initial value taken on by $i$, and the 8 written
above the $\Sigma$ sign indicates that 8 is the last value of $i$.
We call $i$ the index of summation, while $X_i$ is referred to as the summand. The summand
is a function of $i$ which takes on the values $X_1, X_2, \ldots, X_8$ as $i$ takes on
successively the values 1, 2, ..., 8. The $\Sigma$ sign indicates that the values
$X_1, X_2, \ldots, X_8$ taken on by $X_i$ are to be added. The entire symbol $\sum_{i=1}^{8} X_i$ is read
"the summation of $X_i$ as $i$ ranges from 1 to 8". In general, for $n$ values we write
$$\sum_{i=1}^{n} X_i,$$
which is read "the sum of $X_i$ for $i = 1$ to $i = n$".
Note: Sometimes the indices below and above the summation sign are omitted and the
notation $\sum_{i=1}^{n} X_i$ is simply written as $\sum X$.
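The summation notation maps directly onto a loop over the index $i$. The sketch below uses the eight heights given earlier; because Python lists are indexed from 0, `heights[i - 1]` plays the role of $X_i$. The helper `summation` is a hypothetical name chosen for this illustration.

```python
# Heights (in feet) of the eight students; heights[i - 1] stands for X_i.
heights = [4.5, 5.2, 5.0, 5.2, 5.4, 4.9, 5.7, 5.8]

def summation(values, start, stop):
    """Return X_start + ... + X_stop, using 1-based indices as in the Sigma notation."""
    return sum(values[i - 1] for i in range(start, stop + 1))

total = summation(heights, 1, 8)    # sum_{i=1}^{8} X_i
partial = summation(heights, 3, 5)  # sum_{i=3}^{5} X_i = X_3 + X_4 + X_5
print(round(total, 2), round(partial, 2))  # 41.7 15.6
```

Changing `start` and `stop` corresponds to changing the indices written below and above the $\Sigma$ sign.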
Proof:
There are two parts in this proof: the sum rule and the difference rule. We begin with the
former. In this case
$$\begin{aligned}
\sum_{i=1}^{n}(X_i + Y_i) &= (X_1 + Y_1) + (X_2 + Y_2) + \cdots + (X_n + Y_n)\\
&= X_1 + X_2 + \cdots + X_n + Y_1 + Y_2 + \cdots + Y_n\\
&= \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i.
\end{aligned}$$
For the difference rule,
$$\begin{aligned}
\sum_{i=1}^{n}(X_i - Y_i) &= (X_1 - Y_1) + (X_2 - Y_2) + \cdots + (X_n - Y_n)\\
&= X_1 + X_2 + \cdots + X_n - Y_1 - Y_2 - \cdots - Y_n\\
&= X_1 + X_2 + \cdots + X_n - (Y_1 + Y_2 + \cdots + Y_n)\\
&= \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} Y_i.
\end{aligned}$$
Therefore, $\sum_{i=1}^{n}(X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i$.
2. $\sum_{i=1}^{n} k = nk$, where $k$ is a constant.
Proof:
Recall that for all $i \neq 0$, $i^0 = 1$. Thus we can write $k = k(i^0)$. This implies that
$$\begin{aligned}
\sum_{i=1}^{n} k &= \sum_{i=1}^{n} k(i^0) = k(1^0) + k(2^0) + \cdots + k(n^0)\\
&= k(1) + k(1) + \cdots + k(1)\\
&= \underbrace{k + k + \cdots + k}_{n \text{ times}} = nk.
\end{aligned}$$
3. $\sum_{i=1}^{n} kX_i = k\sum_{i=1}^{n} X_i$
Proof:
$$\sum_{i=1}^{n} kX_i = kX_1 + kX_2 + \cdots + kX_n = k(X_1 + X_2 + \cdots + X_n) = k\sum_{i=1}^{n} X_i.$$
4. In general, $\sum_{i=1}^{n} X_iY_i \neq \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i$.
Proof:
Suppose the equality holds; that is, for any given set of values of the variables X and Y,
$$\sum_{i=1}^{n} X_iY_i = \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i.$$
Now consider the following set of values for the variables X and Y:
$X_1 = 1$, $X_2 = 2$, $X_3 = 3$, $X_4 = 4$
$Y_1 = 3$, $Y_2 = 4$, $Y_3 = 6$, $Y_4 = 5$
Now,
$$\begin{aligned}
\sum_{i=1}^{4} X_iY_i &= X_1Y_1 + X_2Y_2 + X_3Y_3 + X_4Y_4\\
&= (1 \times 3) + (2 \times 4) + (3 \times 6) + (4 \times 5)\\
&= 3 + 8 + 18 + 20 = 49.
\end{aligned}$$
Next,
$$\sum_{i=1}^{4} X_i \sum_{i=1}^{4} Y_i = (1 + 2 + 3 + 4) \times (3 + 4 + 6 + 5) = 10 \times 18 = 180.$$
Clearly, $49 \neq 180$, which contradicts our supposition that the equality holds for any two
variables X and Y. Hence, in general,
$$\sum_{i=1}^{n} X_iY_i \neq \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i.$$
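In particular, taking $Y_i = X_i$, we have $\sum_{i=1}^{n}(X_iX_i) \neq \sum_{i=1}^{n} X_i \sum_{i=1}^{n} X_i$; that is,
$$\sum_{i=1}^{n} X_i^2 \neq \left(\sum_{i=1}^{n} X_i\right)^2.$$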
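The summation rules can be spot-checked on the values used in the proof of Rule 4. This is a numerical illustration only (the constant `k = 7` is an arbitrary choice), not a substitute for the proofs, since a finite check cannot establish an identity for all values.

```python
X = [1, 2, 3, 4]
Y = [3, 4, 6, 5]
k = 7          # an arbitrary constant
n = len(X)

# Rule 1: the sum of (X_i + Y_i), or of (X_i - Y_i), splits into separate sums.
assert sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)
assert sum(x - y for x, y in zip(X, Y)) == sum(X) - sum(Y)

# Rule 2: adding the constant k, n times, gives n*k.
assert sum(k for _ in range(n)) == n * k

# Rule 3: a constant factor can be taken outside the summation sign.
assert sum(k * x for x in X) == k * sum(X)

# Rule 4: the sum of products is not, in general, the product of the sums.
sum_of_products = sum(x * y for x, y in zip(X, Y))
product_of_sums = sum(X) * sum(Y)
print(sum_of_products, product_of_sums)  # 49 180

# Special case Y = X: the sum of squares differs from the square of the sum.
sum_of_squares = sum(x ** 2 for x in X)
square_of_sum = sum(X) ** 2
print(sum_of_squares, square_of_sum)  # 30 100
```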
An important property of the double summation is that
$$\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij} = \sum_{j=1}^{n}\sum_{i=1}^{m} X_{ij};$$
that is, the order of summation does not matter.
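Example 1.1:
Let $X_{ij}$ denote the biomass (in g) of a genetically modified maize plant in the $i$th row
and the $j$th column of an experimental plot with five rows and four columns, as shown
below:
          Col 1   Col 2   Col 3   Col 4
Row 1       3       7       2       4
Row 2       5       1       2       6
Row 3       4       2       8       9
Row 4       2       3       7       8
Row 5       4       1       3       5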
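Now,
$$\begin{aligned}
\sum_{i=1}^{5}\sum_{j=1}^{4} X_{ij} &= \sum_{i=1}^{5}\left(\sum_{j=1}^{4} X_{ij}\right) = \sum_{i=1}^{5}(X_{i1} + X_{i2} + X_{i3} + X_{i4})\\
&= \sum_{i=1}^{5} X_{i1} + \sum_{i=1}^{5} X_{i2} + \sum_{i=1}^{5} X_{i3} + \sum_{i=1}^{5} X_{i4}\\
&= (\text{sum of col 1}) + (\text{sum of col 2}) + (\text{sum of col 3}) + (\text{sum of col 4})\\
&= 18 + 14 + 22 + 32 = 86.
\end{aligned}$$
Next,
$$\begin{aligned}
\sum_{j=1}^{4}\sum_{i=1}^{5} X_{ij} &= \sum_{j=1}^{4}\left(\sum_{i=1}^{5} X_{ij}\right) = \sum_{j=1}^{4}(X_{1j} + X_{2j} + X_{3j} + X_{4j} + X_{5j})\\
&= \sum_{j=1}^{4} X_{1j} + \sum_{j=1}^{4} X_{2j} + \sum_{j=1}^{4} X_{3j} + \sum_{j=1}^{4} X_{4j} + \sum_{j=1}^{4} X_{5j}\\
&= (\text{sum of row 1}) + (\text{sum of row 2}) + (\text{sum of row 3}) + (\text{sum of row 4}) + (\text{sum of row 5})\\
&= 16 + 14 + 23 + 20 + 13 = 86.
\end{aligned}$$
Hence, $\sum_{i=1}^{m}\sum_{j=1}^{n} X_{ij} = \sum_{j=1}^{n}\sum_{i=1}^{m} X_{ij}$ (here with $m = 5$ and $n = 4$).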
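The two orders of summation in Example 1.1 correspond to totalling the table by rows or by columns first. The sketch below uses the biomass values of Example 1.1 as they are listed row by row in Example 1.2 (which uses the same data); `X[i][j]` with 0-based indices stands for $X_{i+1,\,j+1}$.

```python
# Biomass table (in g): rows i = 1..5, columns j = 1..4.
X = [
    [3, 7, 2, 4],
    [5, 1, 2, 6],
    [4, 2, 8, 9],
    [2, 3, 7, 8],
    [4, 1, 3, 5],
]
m, n = len(X), len(X[0])

# Summing over j inside gives the row totals; summing over i inside gives the column totals.
row_totals = [sum(X[i][j] for j in range(n)) for i in range(m)]
col_totals = [sum(X[i][j] for i in range(m)) for j in range(n)]

print(row_totals, sum(row_totals))  # [16, 14, 23, 20, 13] 86
print(col_totals, sum(col_totals))  # [18, 14, 22, 32] 86
```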
$$\prod_{i=1}^{8} X_i = X_1 \times X_2 \times X_3 \times X_4 \times X_5 \times X_6 \times X_7 \times X_8$$
Here, again, $i$ is the index of multiplication, whose range is indicated by the notation on
the $\Pi$ symbol, and $X_i$ is a function of $i$. The $\Pi$ sign indicates that the values
$X_1, X_2, \ldots, X_8$ taken on by $X_i$ are to be multiplied. The symbol $\prod_{i=1}^{8} X_i$ is read
"the product of $X_i$ as $i$ ranges from 1 to 8". In general, for $n$ values we write
$\prod_{i=1}^{n} X_i$, which is read "the product of $X_i$ for $i = 1$ to $i = n$".
Notes:
(1) Sometimes the indices below and above the multiplication sign are omitted and the
notation $\prod_{i=1}^{n} X_i$ is simply written as $\prod X$.
1. $\prod_{i=1}^{n} k = k^n$
Proof:
Recall that, for all real numbers $i \neq 0$, $i^0 = 1$. In our case $i$ ranges over the integers
$1, 2, \ldots, n$, and hence we can write $k = k \cdot (i^0)$. This implies that
$$\prod_{i=1}^{n} k \equiv \prod_{i=1}^{n} k \cdot (i^0) = k \cdot (1^0) \times k \cdot (2^0) \times k \cdot (3^0) \times \cdots \times k \cdot (n^0) = k^n.$$
2. $\prod_{i=1}^{n} kX_i = k^n \prod_{i=1}^{n} X_i$
Proof:
$$\begin{aligned}
\prod_{i=1}^{n} kX_i &= kX_1 \times kX_2 \times \cdots \times kX_n\\
&= (k \times k \times \cdots \times k) \cdot (X_1 \times X_2 \times \cdots \times X_n)\\
&= k^n \prod_{i=1}^{n} X_i.
\end{aligned}$$
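3. $\prod_{i=1}^{n} X_iY_i = \prod_{i=1}^{n} X_i \prod_{i=1}^{n} Y_i$
Proof:
$$\begin{aligned}
\prod_{i=1}^{n} X_iY_i &= (X_1Y_1) \times (X_2Y_2) \times \cdots \times (X_nY_n)\\
&= (X_1 \times X_2 \times \cdots \times X_n) \cdot (Y_1 \times Y_2 \times \cdots \times Y_n) = \prod_{i=1}^{n} X_i \cdot \prod_{i=1}^{n} Y_i.
\end{aligned}$$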
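$$\prod_{i=1}^{m}\prod_{j=1}^{n} X_{ij} = (X_{11} \times X_{12} \times \cdots \times X_{1n}) \times (X_{21} \times X_{22} \times \cdots \times X_{2n}) \times \cdots \times (X_{m1} \times X_{m2} \times \cdots \times X_{mn})$$
Analogous to the double summation property, we have that
$$\prod_{i=1}^{m}\prod_{j=1}^{n} X_{ij} = \prod_{j=1}^{n}\prod_{i=1}^{m} X_{ij}.$$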
Example 1.2:
Using the data of the above example on the biomass of genetically modified maize
plants, we have the following table, in which the row products are $\prod_{j=1}^{4} X_{ij}$ and the
column products are $\prod_{i=1}^{5} X_{ij}$.
                  Col 1   Col 2   Col 3   Col 4   Row product
Row 1               3       7       2       4         168
Row 2               5       1       2       6          60
Row 3               4       2       8       9         576
Row 4               2       3       7       8         336
Row 5               4       1       3       5          60
Column product    480      42     672    8640
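Now,
$$\begin{aligned}
\prod_{i=1}^{5}\prod_{j=1}^{4} X_{ij} &= \prod_{i=1}^{5}(X_{i1} \times X_{i2} \times X_{i3} \times X_{i4})\\
&= \prod_{i=1}^{5} X_{i1} \times \prod_{i=1}^{5} X_{i2} \times \prod_{i=1}^{5} X_{i3} \times \prod_{i=1}^{5} X_{i4}\\
&= (\text{product of col 1}) \cdot (\text{product of col 2}) \cdot (\text{product of col 3}) \cdot (\text{product of col 4})\\
&= 480 \times 42 \times 672 \times 8640 = 117{,}050{,}572{,}800.
\end{aligned}$$
Next,
$$\begin{aligned}
\prod_{j=1}^{4}\prod_{i=1}^{5} X_{ij} &= \prod_{j=1}^{4}(X_{1j} \times X_{2j} \times X_{3j} \times X_{4j} \times X_{5j})\\
&= \prod_{j=1}^{4} X_{1j} \cdot \prod_{j=1}^{4} X_{2j} \cdot \prod_{j=1}^{4} X_{3j} \cdot \prod_{j=1}^{4} X_{4j} \cdot \prod_{j=1}^{4} X_{5j}\\
&= (\text{product of row 1}) \cdots (\text{product of row 5})\\
&= 168 \times 60 \times 576 \times 336 \times 60 = 117{,}050{,}572{,}800.
\end{aligned}$$
Hence, $\prod_{i=1}^{5}\prod_{j=1}^{4} X_{ij} = \prod_{j=1}^{4}\prod_{i=1}^{5} X_{ij}$.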
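The double product of Example 1.2, and the single-product rules, can be checked with `math.prod` from Python's standard library (available from Python 3.8). This is a sketch on the example's data; the constant `k = 2` and the use of rows 1 and 2 as the "X" and "Y" values are arbitrary choices for illustration.

```python
from math import prod  # standard library, Python 3.8+

# Same 5 x 4 biomass table as in Examples 1.1 and 1.2 (rows i = 1..5, columns j = 1..4).
X = [
    [3, 7, 2, 4],
    [5, 1, 2, 6],
    [4, 2, 8, 9],
    [2, 3, 7, 8],
    [4, 1, 3, 5],
]
m, n = len(X), len(X[0])

row_products = [prod(X[i][j] for j in range(n)) for i in range(m)]
col_products = [prod(X[i][j] for i in range(m)) for j in range(n)]
print(row_products)  # [168, 60, 576, 336, 60]
print(col_products)  # [480, 42, 672, 8640]

# Both orders of multiplication give the same double product.
print(prod(row_products), prod(col_products))  # 117050572800 117050572800

# Single-product rules, checked on rows 1 and 2 with an arbitrary constant k.
k = 2
x_row, y_row = X[0], X[1]
assert prod(k for _ in x_row) == k ** len(x_row)                               # Rule 1
assert prod(k * x for x in x_row) == k ** len(x_row) * prod(x_row)             # Rule 2
assert prod(x * y for x, y in zip(x_row, y_row)) == prod(x_row) * prod(y_row)  # Rule 3
```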
2. Which of the following variables are continuous and which are discrete?
(i) Level of chemical pollutant in the air.
(ii) Number of children a woman has had.
(iii) The heights of students in a class.
(iv) The thickness of blood vessels in different species.
(v) Number of goals scored by a soccer team.
(vi) The length of skipping ropes used by sportspersons in the pitch.
(vii) The blood pressure of a recuperating patient in a hospital.
(viii) Number of clinic visits made in one year.
5. What type of scale is being used for each of the following measurements?
(i) The type of car driven by a university professor.
(ii) The number of cars passing a particular point on the highway.
(iii) The level of formal education attained by citizens of a country.
(iv) The types of occupations of men in the age bracket 30 – 50 years.
(v) The academic grade attained by college students in a particular course.
(vi) The cost of beef in different meat shops within the city centre.
(vii) The temperature (in °C) recorded at a given location.
(viii) The speed at which elephants move.
(ix) The weight of newborn babies in a maternity ward.
6. Consider the following set of values for the two variables X and Y:
X1 = 2 , X 2 = 3 , X 3 = 7 , X 4 = 8 , X 5 = 1 , X 6 = 10
Y1 = 4 , Y2 = 5 , Y3 = 12 , Y4 = 15 , Y5 = 9 , Y6 = 20
(d) $\sum_{i=1}^{6} 5Y_i$    (e) $\sum_{i=1}^{6} X_i^2$    (f) $\left(\sum_{i=1}^{6} X_i\right)^2$
(g) $\sum_{i=1}^{6} Y_i^2$    (h) $\left(\sum_{i=1}^{6} Y_i\right)^2$    (i) $\sum_{i=1}^{6}(X_i + Y_i)$
(j) $\sum_{i=1}^{6} X_iY_i$    (k) $\sum_{i=1}^{6} X_i \sum_{i=1}^{6} Y_i$    (l) $\sum_{i=1}^{6}(X_i^2 + Y_i^2)$
(m) $\sum_{i=1}^{6}(X_i - Y_i)$    (n) $\sum_{i=1}^{6}(X_i + Y_i)$    (o) $\sum_{i=1}^{6}(X_i^2 + 6)$
(p) $\sum_{i=1}^{6}(Y_i - 3)$
9. Prove that if $X_1, X_2, \ldots, X_n$ is a given set of values for a variable X, $c$ and $k$ are real
constants, and $n$ is a positive integer, then
$$\sum_{i=1}^{n}(k + cX_i) = nk + c\sum_{i=1}^{n} X_i.$$
10. Using the set of values for the variables X and Y in Exercise 6, find the value of
each of the following (you may use your calculator):
(a) $\prod_{i=1}^{6} X_i$    (b) $\prod_{i=1}^{4} Y_i$    (c) $\prod_{k=1}^{3} 3X_k$
(d) $\prod_{j=1}^{3} 2Y_j$    (e) $\prod_{j=1}^{3} Y_j^2$    (f) $\prod_{i=1}^{3} X_iY_i$
(g) $\prod_{i=1}^{3} X_i \prod_{j=1}^{3} Y_j$    (h) $\prod_{i=1}^{3}(X_i^2 - Y_i)$    (i) $\prod_{i=1}^{3}(X_i + Y_i)$
(j) $\prod_{i=1}^{3}(X_i + 6)$    (k) $\prod_{k=1}^{6}(Y_k - 3)$
11. The table below shows the physical measurements of some 6 cadets recruited into
the armed forces. The variable X represents the height (in ft.) and Y represents the
weight (in kg.)
Cadet serial number   X (ft)   Y (kg)
1                      5.6      60.9
2                      6.2      58.7
3                      5.9      62.3
4                      5.7      59.0
5                      5.3      67.8
6                      5.5      71.6
(a) If each of the cadets lost 1.4 kg after 3 months of rigorous training,
find the sum of their weights after the 3 months.
(b) Find the product of their heights after 1 year if each had gained 0.5 ft.
(c) Find the sum of the squares of their heights at the time of recruitment.
(d) Find the square of the sum of their heights at the time of recruitment.
(e) Find the product of the squares of their heights at the time of
recruitment.
(f) Find the sum of three times their weights at the time of recruitment.
(ii) $\prod_{i=1}^{m}\prod_{j=1}^{n} X_{ij} = \prod_{j=1}^{n}\prod_{i=1}^{m} X_{ij}$
UNIT 2
DATA PRESENTATION
2.0 Introduction
After the data have been collected, they are typically recorded as a tabulated set of
numbers. However, a list of numbers is not very helpful for:
• determining trends or relationships between different variables.
• presentation of data to demonstrate a relationship or support a hypothesis.
A more visually accessible form of the data is required in such cases. There is a wide
variety of ways to summarize and present data. In this Unit, we consider graphical
methods. Numerical methods of data summarization will be discussed in Units 3 and 4.
2.1 Objectives
By the end of this Unit, you should be able to:
Prepare grouped and ungrouped frequency distributions from a given
data set.
Graphically display grouped and ungrouped frequency distributions by
means of a histogram and an ogive from a given data set.
Present data in a variety of graphical forms, including the pie chart, bar
chart, column chart, stem-and-leaf plot and box-and-whisker diagram
(boxplot).
Construct and interpret bar graphs and pie charts using a given set of
data.
Construct and interpret a box-and-whisker diagram (boxplot) from a
given set of data.
Example 2.1:
The set of observations below shows the number of times that each of 30
public service vehicles plying a certain route was charged with a traffic
offence during the month of December in the year 1999.
3 0 1 6 0 5
6 2 1 3 6 3
4 0 6 2 3 5
6 0 6 6 5 1
1 5 2 4 0 0
This value indicates that 10% of the public service vehicles considered above
were charged twice for traffic offences during the period under
consideration.
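The ungrouped frequency distribution for these 30 observations, and the relative frequency of 3/30 = 0.1 for the value 2, can be reproduced by tallying. A minimal sketch using Python's standard `collections.Counter`; the column layout of the printout is our own choice.

```python
from collections import Counter

# Number of traffic-offence charges for each of the 30 vehicles (Example 2.1).
offences = [
    3, 0, 1, 6, 0, 5,
    6, 2, 1, 3, 6, 3,
    4, 0, 6, 2, 3, 5,
    6, 0, 6, 6, 5, 1,
    1, 5, 2, 4, 0, 0,
]
n = len(offences)
frequency = Counter(offences)

print("Charges  Frequency  Relative frequency")
for value in sorted(frequency):
    print(f"{value:>7}  {frequency[value]:>9}  {frequency[value] / n:>18.2f}")
```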
Example 2.2:
The set of numbers below shows the marks scored in an end-of-semester
Psychology examination by 150 students at a certain university in 1994.
88 53 29 36 58 56 78 90 68 59 35 65 54 87 44
66 45 87 63 53 52 48 38 48 61 80 46 70 54 67
58 65 32 39 60 57 81 92 68 90 27 68 84 83 56
42 50 67 90 80 88 93 92 51 93 87 75 59 68 79
78 76 89 86 91 50 49 89 38 76 45 46 73 49 91
70 86 89 80 90 41 53 86 43 49 82 76 72 58 80
90 87 51 43 76 70 85 81 46 79 86 81 79 80 78
84 64 59 63 75 74 89 61 92 77 87 42 65 93 87
79 41 33 57 86 88 91 83 73 31 44 51 55 62 70
32 68 47 29 54 50 43 51 88 91 52 76 88 90 63
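A grouped frequency distribution for these marks can be tallied mechanically. The sketch below assumes the classes 20 – 29, 30 – 39, ..., 90 – 99 that the module uses for these data in its histograms; any other choice of classes would be handled the same way by changing `classes`.

```python
# Marks of the 150 students in Example 2.2.
marks = [
    88, 53, 29, 36, 58, 56, 78, 90, 68, 59, 35, 65, 54, 87, 44,
    66, 45, 87, 63, 53, 52, 48, 38, 48, 61, 80, 46, 70, 54, 67,
    58, 65, 32, 39, 60, 57, 81, 92, 68, 90, 27, 68, 84, 83, 56,
    42, 50, 67, 90, 80, 88, 93, 92, 51, 93, 87, 75, 59, 68, 79,
    78, 76, 89, 86, 91, 50, 49, 89, 38, 76, 45, 46, 73, 49, 91,
    70, 86, 89, 80, 90, 41, 53, 86, 43, 49, 82, 76, 72, 58, 80,
    90, 87, 51, 43, 76, 70, 85, 81, 46, 79, 86, 81, 79, 80, 78,
    84, 64, 59, 63, 75, 74, 89, 61, 92, 77, 87, 42, 65, 93, 87,
    79, 41, 33, 57, 86, 88, 91, 83, 73, 31, 44, 51, 55, 62, 70,
    32, 68, 47, 29, 54, 50, 43, 51, 88, 91, 52, 76, 88, 90, 63,
]

# Classes 20 - 29, 30 - 39, ..., 90 - 99 (class width 10), given by their limits.
classes = [(lower, lower + 9) for lower in range(20, 100, 10)]
frequencies = [sum(lower <= x <= upper for x in marks) for lower, upper in classes]

for (lower, upper), f in zip(classes, frequencies):
    print(f"{lower} - {upper}: {f}")
print("Total:", sum(frequencies))
```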
Before moving on, let us revise the terminology of grouped frequency distributions.
• Upper and lower class limits are the largest and smallest values belonging to a
given class interval. The lower limit of the first class interval is any number
less than or equal to the lowest value in the data.
• Upper and lower class boundaries are the actual values that separate adjacent
classes. Class limits are usually converted to class boundaries by taking the
midpoint between the upper class limit of one class and the lower class limit of
the next class.
• Class mark (midpoint) is half the sum of the lower and upper class limits.
• Class width is the difference between the upper class boundary and the lower
class boundary.
To illustrate the above terminology, consider the class interval 70 – 79 in the above
grouped frequency table. Then
o The lower and the upper class limits are 70 and 79 respectively.
o The lower and the upper class boundaries are 69.5 and 79.5. Note that this is done
to ensure that the class intervals are continuous, mutually exclusive and
exhaustive. If the data under consideration were continuous, the class boundaries
would capture the value that falls between the upper class limit of one class
interval and the lower class limit of the subsequent class interval.
o The class midpoint is $\frac{70 + 79}{2} = 74.5$.
o The class width is $79.5 - 69.5 = 10$.
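The terminology above can be captured in a small helper. This is a sketch: the function name `class_summary` is hypothetical, and it assumes data recorded to the nearest whole unit, so that consecutive classes are separated by a gap of 1 (as with 70 – 79 and 80 – 89).

```python
def class_summary(lower_limit, upper_limit, gap=1):
    """Return (lower boundary, upper boundary, midpoint, width) for one class.

    `gap` is the distance between the upper limit of one class and the lower
    limit of the next; each boundary lies halfway across that gap.
    """
    lower_boundary = lower_limit - gap / 2
    upper_boundary = upper_limit + gap / 2
    midpoint = (lower_limit + upper_limit) / 2
    width = upper_boundary - lower_boundary
    return lower_boundary, upper_boundary, midpoint, width

print(class_summary(70, 79))  # (69.5, 79.5, 74.5, 10.0)
```

For data recorded to one decimal place, classes such as 7.0 – 7.9 and 8.0 – 8.9 would call for `gap=0.1` instead.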
Exercise 2.1
1. During the Christmas holiday the Department of Culture organized a music
extravaganza. The following day, a random sample of 70 of those who
attended was asked to rate the extravaganza on a five-point scale (1, 2, 3, 4, 5),
where 1 represents maximum enjoyment and 5 represents minimum
enjoyment. Their ratings are shown below.
1 5 3 4 2 1 4 2 4 3
2 4 3 2 1 3 5 2 1 1
1 4 3 2 3 2 4 1 1 2
1 2 3 3 2 2 4 5 5 2
4 2 1 1 4 4 2 3 3 4
58 47 85 47 63 51 40 70 80 73 72 90 84 42
56 67 63 70 54 76 49 81 75 80 75 46 60 71
70 79 84 72 54 55 61 82 70 47 40 77 81 76
66 59 81 66 48 43 87 55 70 60
category or simply the value of the variable. Bar charts can be displayed horizontally or
vertically, and the bars are separated rather than touching so that no continuity between
the categories is implied. Bar charts are visually appealing and make it easy for users to
see comparisons, patterns and trends in data.
Notes:
• It is reasonable to use a logarithmic scale for the frequency axis if the
values span more than two orders of magnitude (e.g., from 1 to 1,000).
• The first and the last bars may include the extremes; that is, we may
have open-ended intervals (for example, if we are dealing with age in
years, we may have, say, "under 30" and "over 70").
Example 2.3:
The table below shows the 1999 census of Kenya by province as at 24th August 1999.
Province Population
Nairobi 2,143,254
Nyanza 4,392,196
Coast 2,487,264
North Eastern 962,143
Eastern 4,631,779
Western 3,358,776
Central 3,724,159
Rift Valley 6,987,036
Source: Kenya, Central Bureau of Statistics
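Before drawing a bar chart, each bar's length has to be scaled to the data. The sketch below prints a rough text version of a horizontal bar chart for the census table, with the bars ordered from largest to smallest province; the 40-character maximum bar length is an arbitrary choice, and in practice a plotting package would be used to draw the actual chart.

```python
# Populations from the 1999 census table above.
population = {
    "Nairobi": 2_143_254,
    "Nyanza": 4_392_196,
    "Coast": 2_487_264,
    "North Eastern": 962_143,
    "Eastern": 4_631_779,
    "Western": 3_358_776,
    "Central": 3_724_159,
    "Rift Valley": 6_987_036,
}

# Scale every bar relative to the largest province (longest bar = 40 characters).
largest = max(population.values())
for province, count in sorted(population.items(), key=lambda item: item[1], reverse=True):
    bar = "#" * round(40 * count / largest)
    print(f"{province:<14} {bar} {count / 1_000_000:.2f} million")
```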
We can represent this information in a vertical or a horizontal bar chart as shown in the
figures below.
Fig 2.1: A vertical bar chart depicting the 1999 census of Kenya by province
Fig 2.2: A horizontal bar chart depicting the 1999 census of Kenya by province
Example 2.4:
The table below shows the 1930 Education Department Expenditure by race in Kenya.
Race        Pupils*   Total expenditure (US$)   Expenditure per pupil (US$)
African      6,948           232,293                     33.4
Asian        1,900            70,329                     37.0
European       776           140,041                    180.5
Total        9,624           442,663                     46.0
*Pupils in state and state-aided schools only.
Source: Kenya, Education Department Annual Report, 1930
We can represent the total expenditure information (third column) in a pie chart as shown
below.
Sector angle corresponding to 232,293 is $\frac{232{,}293}{442{,}663} \times 360^\circ = 188.91^\circ$
Sector angle corresponding to 70,329 is $\frac{70{,}329}{442{,}663} \times 360^\circ = 57.20^\circ$
Sector angle corresponding to 140,041 is $\frac{140{,}041}{442{,}663} \times 360^\circ = 113.89^\circ$
Fig 2.3: A pie chart depicting the 1930 educational expenditure by race in Kenya (African 52%, European 32%, Asian 16%)
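The percentage shown for each sector is its angle expressed as a fraction of 360°; for example,
$$\frac{113.89^\circ}{360^\circ} \equiv \frac{140{,}041}{442{,}663} \approx 31.64\%, \qquad \frac{188.91^\circ}{360^\circ} \equiv \frac{232{,}293}{442{,}663} \approx 52.48\%.$$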
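The sector angles and percentages for the pie chart follow from one proportion each. A minimal sketch that reproduces them from the total expenditure column of Example 2.4:

```python
# Total expenditure by race (US dollars), from the 1930 table above.
expenditure = {"African": 232_293, "Asian": 70_329, "European": 140_041}
total = sum(expenditure.values())

# Each sector's angle is its share of the total times 360 degrees.
angles = {race: amount / total * 360 for race, amount in expenditure.items()}
percentages = {race: amount / total * 100 for race, amount in expenditure.items()}

for race in expenditure:
    print(f"{race:<9} {angles[race]:7.2f} degrees {percentages[race]:6.2f}%")
```

Because the shares sum to the whole, the angles always add up to 360° (apart from rounding in the printed values).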
Example 2.5:
Using the grouped frequency distribution in Example 2.2, the following histograms are obtained.
Fig 2.4a: [Histogram of the psychology exam marks: frequency (vertical axis) against marks (horizontal axis), with the bars labelled by the class intervals 20 – 29, 30 – 39, ..., 90 – 99]
Fig 2.4b: [Histogram of the same marks, with the bars labelled by the class midpoints 24.5, 34.5, ..., 94.5]
Note:
When we construct a histogram based on the relative
Example 2.6:
Using the histogram in Fig. 2.4b, we obtain the following frequency polygon.
[Figure: frequency polygon of the psychology exam marks; frequency (vertical axis) plotted against the class midpoints from 14.5 to 94.5 (horizontal axis)]
Note:
When we construct a polygon based on the relative
Example 2.7:
The following table shows the class intervals, the upper
class boundaries and their corresponding cumulative
frequencies obtained from the grouped frequency
distribution for the data on psychology exam marks in
Example 2.2.
Fig 2.6: A cumulative frequency polygon for the psychology exam marks (cumulative frequency, 0 to 150, plotted against the upper class boundaries 19.5, 29.5, ..., 99.5)
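The points of a cumulative frequency polygon are (upper class boundary, cumulative frequency) pairs. A minimal sketch of how they can be computed for the psychology exam marks: the class frequencies below are our own tally of the raw marks in Example 2.2 for the classes 20 – 29, ..., 90 – 99, and the standard `itertools.accumulate` forms the running totals.

```python
from itertools import accumulate

# Class frequencies for classes 20 - 29, ..., 90 - 99, tallied from the raw marks of Example 2.2.
frequencies = [3, 9, 20, 26, 19, 23, 34, 16]
upper_boundaries = [29.5 + 10 * k for k in range(len(frequencies))]  # 29.5, 39.5, ..., 99.5
cumulative = list(accumulate(frequencies))

# The polygon starts at the lower boundary of the first class, where the cumulative frequency is 0.
points = [(19.5, 0)] + list(zip(upper_boundaries, cumulative))
for boundary, cf in points:
    print(f"{boundary:5.1f}  {cf:4d}")
```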
2: 799
3: 122356889
4: 11223334455666788999
5: 00011112233344456677888999
6: 0112333455567788888
7: 00002334556666678889999
8: 0000011123344566666777777888889999
9: 0000001111222333
Fig 2.8a: A stem-and-leaf plot for the psychology exam marks using a class width of 10
2:
2: 799
3: 1223
3: 56889
4: 112233344
4: 55666788999
5: 000111122333444
5: 56677888999
6: 01123334
6: 55567788888
7: 00002334
7: 556666678889999
8: 0000011123344
8: 566666777777888889999
9: 0000001111222333
9:
Fig 2.8b: A stem-and-leaf plot for the psychology exam marks using a class width of 5
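A stem-and-leaf plot with class width 10 takes the tens digit of each value as the stem and the units digit as the leaf. The sketch below, with the hypothetical helper name `stem_and_leaf`, assumes two-digit whole-number data such as the marks of Example 2.2; splitting each stem's leaves into 0 – 4 and 5 – 9 would give the class width 5 version.

```python
def stem_and_leaf(data):
    """Return {stem: leaves} for two-digit whole numbers, with the tens digit as the stem."""
    plot = {}
    for value in sorted(data):          # sorting puts the leaves of each stem in order
        stem, leaf = divmod(value, 10)
        plot[stem] = plot.get(stem, "") + str(leaf)
    return plot

marks = [
    88, 53, 29, 36, 58, 56, 78, 90, 68, 59, 35, 65, 54, 87, 44,
    66, 45, 87, 63, 53, 52, 48, 38, 48, 61, 80, 46, 70, 54, 67,
    58, 65, 32, 39, 60, 57, 81, 92, 68, 90, 27, 68, 84, 83, 56,
    42, 50, 67, 90, 80, 88, 93, 92, 51, 93, 87, 75, 59, 68, 79,
    78, 76, 89, 86, 91, 50, 49, 89, 38, 76, 45, 46, 73, 49, 91,
    70, 86, 89, 80, 90, 41, 53, 86, 43, 49, 82, 76, 72, 58, 80,
    90, 87, 51, 43, 76, 70, 85, 81, 46, 79, 86, 81, 79, 80, 78,
    84, 64, 59, 63, 75, 74, 89, 61, 92, 77, 87, 42, 65, 93, 87,
    79, 41, 33, 57, 86, 88, 91, 83, 73, 31, 44, 51, 55, 62, 70,
    32, 68, 47, 29, 54, 50, 43, 51, 88, 91, 52, 76, 88, 90, 63,
]

plot = stem_and_leaf(marks)
for stem, leaves in plot.items():
    print(f"{stem}: {leaves}")
```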
Note:
If we turn the stem –and –leaf plot sideways we note that it
Line graphs are used to summarize data measured on an interval or ratio scale and are
commonly used to present mathematical relationships. They are also used to represent
a time series, that is, statistical data arranged in the order in which they occur in time.
Example 2.8:
The table below shows the number of primary schools in Kenya from 1971 to 1988.
We can represent the above information in a line graph as shown in the figure below.
Fig 2.9: A line graph for the number of primary schools in Kenya from 1971 to 1988 (number of schools, in thousands, against year)
Exercise 2.2
1. The marks (out of 100) of 50 candidates in an examination are given below.
53 54 18 29 82 17 36 54 47 72
75 23 7 51 70 61 57 35 37 40
46 43 81 27 35 57 26 44 50 36
77 20 55 43 46 70 44 28 63 35
72 45 33 22 51 35 60 47 57 27
(i) Select suitable classes to prepare a grouped frequency distribution for
these data.
(ii) Use the grouped frequency distribution obtained in (i) to construct a
cumulative frequency distribution.
(iii) Prepare a relative frequency polygon using the grouped frequency
distribution obtained in (i).
2. The table below shows the number of health institutions in Kenya by province in
the year 1989.
(iv) Represent the entries of the last column in the above table on a bar chart.
3. The following table shows the distribution of ages of 150 persons who were
interviewed by a beverage manufacturing company to establish the number of
persons in each of the age groups who were users of a particular beverage.
74 45 84 17 51 46 34 31 56 70 93 11 67 85 65 94
54 10 14 20 17 31 53 57 69 47 33 91 52 68 87 13
59 60 43 27 81 44 25 84 59 37 92 50 97 80 11 30
90 58 20 18 37 74 34 79 70 54 31 44 64 52 88 90
70 88 37 45 28 31 40 41 60 62 13 45 92 70 81 98
A box plot is especially helpful for indicating whether there are any unusual observations
(outliers) in the data set. Box plots are also very useful when large numbers of
observations are involved and when two or more data sets are being compared. While the
maximum and minimum values of a data set are familiar, the median and the lower and
upper quartiles are new concepts, which will be discussed in Units 3 and 4. We shall
therefore revisit the box plot as a form of data summarization in Unit 4, after discussing
these measures.
1. During the months of March and April, a meteorologist noted the temperature
(in °C) each day at noon. The results are shown below.
24 21 22 21 22 20 24 19 21 18 23 22 19 18 21 24
25 18 20 22 18 19 22 18 25 19 24 27 17 22 22 27
18 20 19 23 24 26 21 23 21 26 22 21 21 23 24 18
18 22 25 20 22 20 23 19 24 19 26 20 20
2. A large number of people set out together on a Freedom from Hunger walk. The
time taken by each of a sample of 50 of these people to complete the walk is
recorded below. The times are given to the nearest hour.
63 74 88 79 82 67 66 84 77 92
75 83 67 72 70 61 67 85 77 60
76 83 81 67 75 67 66 74 80 76
77 80 75 83 96 70 94 78 63 75
72 65 83 72 91 85 60 77 67 77
5. The weights, to the nearest kilogram, of heifers on a dairy farm are summarized
below.
Weight (kg)      20–24  25–29  30–34  35–39  40–44  45–49  50–54  55–59  60–64  65–69
No. of heifers     13     24     37     43     28     17      8      5      4      1
8. The table below shows the number of students at Kenyatta University in Nairobi,
Kenya pursuing a Bachelor of Education (Home Economics) degree course in the
years 1984 –1989.
UNIT 3
3.1 Objectives
By the end of this Unit, you should be able to:
distinguish between the two main types of measures of central
tendency; i.e., mathematical and positional averages.
know when it is appropriate to use each of them.
calculate the various mathematical averages and positional averages
from given raw data.
compute and interpret quartiles for a set of observations.
Learning Style
To achieve what is expected of you…
Briefly revise Section 1.8 of Unit 1.
Allocate sufficient study time.
• be easy to understand.
• not be affected by extreme values.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad (3.1)$$
For grouped data, the arithmetic mean is given by
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}, \qquad (3.2)$$
where $f_i$ is the frequency of the $i$th class and $x_i$ is, in this case, the class midpoint.
This method is called the direct method of calculating the arithmetic mean.
There is another method of calculating the arithmetic mean for grouped data, called the indirect method. In this method we list the class midpoints, pick any one of them as the assumed (working) mean, $A$, and then calculate $d_i$, the deviation of the $i$th midpoint from the assumed mean. The mean is then given by
$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}, \qquad (3.3)$$
where $d_i = x_i - A$ and $f_i$ is the frequency of the $i$th class. The method is most convenient when the classes have equal widths, since the deviations $d_i$ are then multiples of the common class width.
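As a quick cross-check of formulas (3.2) and (3.3), the short Python sketch below computes the mean of a discrete frequency table both ways. The function names are illustrative, not from the text; the data are those of Example 3.1, and the two methods should agree.

```python
def mean_direct(xs, fs):
    """Direct method, formula (3.2): sum of f*x divided by sum of f."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_assumed(xs, fs, a):
    """Indirect method, formula (3.3): A + sum of f*d over sum of f, with d = x - A."""
    sum_fd = sum(f * (x - a) for x, f in zip(xs, fs))
    return a + sum_fd / sum(fs)


# Values and frequencies from Example 3.1.
xs = [1, 2, 3, 4, 5]
fs = [3, 5, 9, 6, 2]
print(mean_direct(xs, fs))      # direct method
print(mean_assumed(xs, fs, 3))  # assumed mean A = 3
```

Any choice of $A$ gives the same mean; choosing a central midpoint only keeps the deviations small for hand calculation.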
o It may not coincide with any of the actual observations. For instance, the numbers of eggs collected at eleven spots, 40, 10, 20, 24, 14, 13, 7, 17, 1, 3, 4, have an arithmetic mean of 153/11 ≈ 13.9 eggs, which is not a possible count. Hence it may not be truly representative.
o It cannot be computed for grouped data with open-ended classes.
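1. The sum of the deviations of the observations from their arithmetic mean is zero:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$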
This property implies that the arithmetic mean is a score or a potential score that
balances all the scores on either side of it. This explains why the arithmetic mean
is very sensitive to extreme values when these values are not balanced on both
sides of it.
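2. If $z_1 = x_1 + y_1,\ z_2 = x_2 + y_2,\ \dots,\ z_n = x_n + y_n$, then $\bar{z} = \bar{x} + \bar{y}$, where
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \quad \bar{z} = \frac{\sum_{i=1}^{n} z_i}{n}.$$
Proof:
By definition,
$$\bar{z} = \frac{\sum_{i=1}^{n} z_i}{n} = \frac{\sum_{i=1}^{n} (x_i + y_i)}{n} \quad \{\text{since } z_i = x_i + y_i\}$$
$$= \frac{\sum_{i=1}^{n} x_i}{n} + \frac{\sum_{i=1}^{n} y_i}{n} = \bar{x} + \bar{y}.$$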
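3. If $d_i = x_i - A$, $i = 1, 2, \dots, n$, for some constant $A$, then
$$\bar{x} = A + \frac{\sum_{i=1}^{n} d_i}{n} \quad \{A \text{ is then referred to as the assumed mean}\}$$
Proof:
By definition,
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{n} (d_i + A)}{n} \quad \{\text{since } x_i = d_i + A\}$$
$$= \frac{\sum_{i=1}^{n} d_i}{n} + \frac{\sum_{i=1}^{n} A}{n} = \frac{\sum_{i=1}^{n} d_i}{n} + A \quad \left\{\text{since } \frac{\sum_{i=1}^{n} A}{n} = \frac{nA}{n} = A\right\}$$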
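4. If $\bar{x}_1$ and $\bar{x}_2$ are the means of two samples of sizes $n_1$ and $n_2$, then the combined mean is given by
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}.$$
Proof:
Let $x_{11}, x_{12}, \dots, x_{1n_1}$ be a sample of $n_1$ observations and let $x_{21}, x_{22}, \dots, x_{2n_2}$ be a sample of $n_2$ observations, so that
$$\bar{x}_1 = \frac{\sum_{i=1}^{n_1} x_{1i}}{n_1} \quad\text{and}\quad \bar{x}_2 = \frac{\sum_{i=1}^{n_2} x_{2i}}{n_2}.$$
Label the $n_1 + n_2$ observations of the combined sample $x_1, x_2, \dots, x_{n_1+n_2}$, with the first sample listed first. The mean of the combined sample is thus given by
$$\bar{x} = \frac{\sum_{i=1}^{n_1+n_2} x_i}{n_1 + n_2} = \frac{\sum_{i=1}^{n_1} x_i + \sum_{i=n_1+1}^{n_1+n_2} x_i}{n_1 + n_2} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}.$$
More generally, the combined mean of $k$ samples of sizes $n_1, n_2, \dots, n_k$ with means $\bar{x}_1, \bar{x}_2, \dots, \bar{x}_k$ is
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + \dots + n_k\bar{x}_k}{n_1 + n_2 + \dots + n_k} = \frac{\sum_{i=1}^{k} n_i\bar{x}_i}{\sum_{i=1}^{k} n_i}.$$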
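The combined-mean formula can be checked against the mean of the pooled data. In the sketch below the two samples are made-up illustrative numbers and `combined_mean` is a hypothetical helper name; `statistics.fmean` (Python 3.8+) computes an ordinary arithmetic mean.

```python
from statistics import fmean


def combined_mean(sizes, means):
    """Combined mean of k samples: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Two small illustrative samples (hypothetical data, not from the text).
sample_1 = [2, 4, 6]
sample_2 = [1, 3, 5, 7, 9]

from_summaries = combined_mean([len(sample_1), len(sample_2)],
                               [fmean(sample_1), fmean(sample_2)])
pooled = fmean(sample_1 + sample_2)  # mean of all observations together
print(from_summaries, pooled)
```

Only the sizes and means of the samples are needed, which is why the formula is useful when the raw observations are no longer available.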
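5. The sum of squares of deviations from the arithmetic mean is less than the sum of squares of deviations from any other value. That is, the sum of squares of deviations is a minimum when the deviations are taken about the arithmetic mean.
Proof:
Consider a frequency distribution with class midpoints $x_i$ and frequencies $f_i$, $i = 1, 2, \dots, n$, and let $A$ be an arbitrary value. Define
$$S = \sum_{i=1}^{n} f_i d_i^2 = \sum_{i=1}^{n} f_i (x_i - A)^2, \quad\text{where } d_i = x_i - A,\ i = 1, 2, \dots, n.$$
For $S$ to be a minimum we need
$$\frac{\partial S}{\partial A} = -2\sum_{i=1}^{n} f_i (x_i - A) = 0 \;\Rightarrow\; \sum_{i=1}^{n} f_i (x_i - A) = 0. \qquad (3.4)$$
Next,
$$\frac{\partial^2 S}{\partial A^2} = \frac{\partial}{\partial A}\left[-2\left(\sum_{i=1}^{n} f_i x_i - A\sum_{i=1}^{n} f_i\right)\right] = 2\sum_{i=1}^{n} f_i > 0,$$
so the stationary point is a minimum. From (3.4),
$$\sum_{i=1}^{n} f_i x_i = A\sum_{i=1}^{n} f_i \;\Rightarrow\; A = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \bar{x}.$$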
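Property 5 can also be illustrated numerically: evaluate $S(A)$ for many trial values of $A$ and see where it is smallest. The sketch below uses the frequency table of Example 3.1 and a grid of step 0.01 (an arbitrary choice); the minimising grid point should coincide with the mean, 2.96.

```python
def sum_sq_dev(xs, fs, a):
    """S(A) = sum of f_i * (x_i - A)**2."""
    return sum(f * (x - a) ** 2 for x, f in zip(xs, fs))


xs = [1, 2, 3, 4, 5]
fs = [3, 5, 9, 6, 2]
mean = sum(f * x for x, f in zip(xs, fs)) / sum(fs)

# Evaluate S(A) on a grid of trial values A = 0.00, 0.01, ..., 6.00.
grid = [i / 100 for i in range(601)]
best_a = min(grid, key=lambda a: sum_sq_dev(xs, fs, a))
print(mean, best_a)
```

A grid search is used only for illustration; the proof above shows the minimum is exactly at $\bar{x}$.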
Let us now consider some examples to demonstrate how the arithmetic mean is
calculated.
Example 3.1:
Given the following data, calculate the arithmetic mean.
x 1 2 3 4 5
frequency 3 5 9 6 2
Solution:
(a) Direct Method
For grouped data,
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}.$$
Here $\sum_{i=1}^{5} f_i = 25$ and $\sum_{i=1}^{5} f_i x_i = 74$, so
$$\bar{x} = \frac{74}{25} = 2.96.$$
(b) Indirect (Assumed Mean) Method
Taking the assumed mean $A = 3$, we form the following table.
x frequency (f) d = x − 3 fd
1 3 −2 −6
2 5 −1 −5
3 9 0 0
4 6 1 6
5 2 2 4
Total: Σf = 25, Σfd = −1
Thus $\sum_{i=1}^{5} f_i = 25$ and $\sum_{i=1}^{5} f_i d_i = -1$, so
$$\bar{x} = A + \frac{\sum_{i=1}^{5} f_i d_i}{\sum_{i=1}^{5} f_i} = 3 + \frac{(-1)}{25} = 3 - \frac{1}{25} = 2.96.$$
Example 3.2:
The following grouped data shows the number of consignments and their weights (in kg)
received by a courier service for shipment.
weight frequency
6.5 – 7.5 5
7.5 – 8.5 12
8.5 – 9.5 25
9.5 – 10.5 48
10.5 – 11.5 32
11.5 – 12.5 6
12.5 – 13.5 1
Solution:
Let the assumed mean A=10. Then we have the following table.
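Thus $\sum_{i=1}^{7} f_i = 129$ and $\sum_{i=1}^{7} f_i d_i = -17$, so that
$$\bar{x} = A + \frac{\sum_{i=1}^{7} f_i d_i}{\sum_{i=1}^{7} f_i} = 10 + \frac{(-17)}{129} \approx 9.868.$$
(When the deviations are first divided by a common class width $h$, the same method gives $\bar{x} = A + \frac{\sum fd}{\sum f} \cdot h$; here $h = 1$.)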
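A minimal sketch of the assumed-mean method with step deviations $u = (x - A)/h$, applied to Example 3.2. The class midpoints 7, 8, …, 13 are worked out from the class limits (they are not printed in the text), and the function name is hypothetical. Because the class width here is 1, dividing by $h$ changes nothing; in any case the result does not depend on $h$, which can be checked by calling the function with another value.

```python
def mean_step_deviation(midpoints, fs, a, h):
    """x-bar = A + h * sum(f*u) / sum(f), where u = (x - A) / h."""
    sum_fu = sum(f * (x - a) / h for x, f in zip(midpoints, fs))
    return a + h * sum_fu / sum(fs)


# Example 3.2: midpoints of the classes 6.5-7.5, ..., 12.5-13.5 (width h = 1).
midpoints = [7, 8, 9, 10, 11, 12, 13]
freqs = [5, 12, 25, 48, 32, 6, 1]
print(round(mean_step_deviation(midpoints, freqs, a=10, h=1), 3))
```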
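Note:
In the computation of $\bar{x}$ by the method of assumed mean, we sometimes reduce the bulkiness of the deviations $x - A$ by dividing them by the common class width $h$; the mean of the scaled deviations is then multiplied by $h$ before adding $A$.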
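Using logarithms, the geometric mean $x_r$ of $n$ positive observations $x_1, x_2, \dots, x_n$ can be computed as
$$x_r = \text{antilog}\left[\frac{1}{n}\sum_{i=1}^{n} \log x_i\right].$$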
In the case of grouped data, if $x_1, x_2, \dots, x_n$ are the observations with $f_1, f_2, \dots, f_n$ as the corresponding frequencies, then
$$x_r = \Big(\underbrace{x_1 \cdots x_1}_{f_1\text{ times}} \cdot \underbrace{x_2 \cdots x_2}_{f_2\text{ times}} \cdots \underbrace{x_n \cdots x_n}_{f_n\text{ times}}\Big)^{1/\sum_{i=1}^{n} f_i} = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/\sum_{i=1}^{n} f_i},$$
so that
$$\log x_r = \frac{1}{\sum_{i=1}^{n} f_i}\log\left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right) = \frac{1}{\sum_{i=1}^{n} f_i}\sum_{i=1}^{n} f_i \log x_i.$$
Thus,
$$x_r = \text{antilog}\left[\frac{1}{\sum_{i=1}^{n} f_i}\sum_{i=1}^{n} f_i \log x_i\right].$$
The geometric mean has its own limitations. For example, it cannot be used for a data set
with negative values. Besides, if any observation in the data set is zero, the geometric
mean is equal to zero.
Example 3.3:
Find the geometric mean of the values 4,6,8,9.
Solution:
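By definition, $x_r = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$.
Here $n = 4$ and therefore
$$x_r = \sqrt[4]{4 \times 6 \times 8 \times 9} = \sqrt[4]{1728}.$$
Using logarithms,
$$\log x_r = \tfrac{1}{4}\log 1728 = \tfrac{1}{4}(3.2375) = 0.8094, \qquad x_r = \text{antilog}(0.8094) \approx 6.45.$$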
Example 3.4:
Given below is the frequency distribution of students’ performance in a Mathematics test
at the end of a school term. The marks are out of 50.
Marks Number of
students
0 – 10 5
10 – 20 9
20 – 30 10
30 – 40 16
40 – 50 4
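Then, taking the class midpoints 5, 15, 25, 35 and 45 as the $x_i$ (with $\sum f_i = 44$),
$$x_r = \text{antilog}\left[\frac{1}{\sum_{i=1}^{5} f_i}\sum_{i=1}^{5} f_i \log x_i\right] = \text{antilog}\left[\frac{59.377}{44}\right] = \text{antilog}(1.3495) \approx 22.36.$$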
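The logarithm route to the geometric mean translates directly into code. This is a sketch with illustrative function names; it rejects zero and negative values because their logarithms are undefined. The two calls reproduce Examples 3.3 and 3.4; the unrounded grouped result is about 22.36.

```python
import math


def geometric_mean(xs):
    """n-th root of the product, computed as antilog of the mean of log10(x)."""
    if any(x <= 0 for x in xs):
        raise ValueError("this sketch requires strictly positive values")
    return 10 ** (sum(math.log10(x) for x in xs) / len(xs))


def geometric_mean_grouped(xs, fs):
    """antilog[ sum(f_i * log10(x_i)) / sum(f_i) ]."""
    if any(x <= 0 for x in xs):
        raise ValueError("this sketch requires strictly positive values")
    log_sum = sum(f * math.log10(x) for x, f in zip(xs, fs))
    return 10 ** (log_sum / sum(fs))


print(round(geometric_mean([4, 6, 8, 9]), 2))                 # Example 3.3
print(round(geometric_mean_grouped([5, 15, 25, 35, 45],
                                   [5, 9, 10, 16, 4]), 2))    # Example 3.4
```

For ungrouped data, Python's `statistics.geometric_mean` (Python 3.8+) gives the same result.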
The harmonic mean of a set of $n$ observations $x_1, x_2, \dots, x_n$ is the ratio of $n$ to the sum of the reciprocals of the observations. That is, the harmonic mean is defined as
$$x_h = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}.$$
For grouped data, if $x_1, x_2, \dots, x_n$ are the observations with $f_1, f_2, \dots, f_n$ as the corresponding frequencies, then
$$x_h = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \frac{f_i}{x_i}}.$$
The harmonic mean is mainly used where it is desired to give the greatest weight to the smallest items. It is used, for example, in averaging rates such as speeds when equal distances are covered at the different rates. It is not, however, a popular measure of location.
Example 3.5:
An airplane flies around a square whose sides are each 100 miles long. It covers the first side at a speed of 100 miles per hour (mph), the second side at 200 mph, the third side at 300 mph and the fourth side at 400 mph. What is its average speed?
Solution
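We make use of the harmonic mean to calculate the average speed, because equal distances (100 miles each) are covered at the different speeds.
$$x_h = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \frac{4}{\frac{1}{100} + \frac{1}{200} + \frac{1}{300} + \frac{1}{400}} = \frac{4 \times 1200}{25} = 192 \text{ mph}$$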
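A minimal sketch of the harmonic mean (ungrouped, or grouped when frequencies are supplied), applied to Example 3.5. The function name is illustrative. The second calculation checks the answer from first principles as total distance divided by total time.

```python
def harmonic_mean(xs, fs=None):
    """x_h = sum(f) / sum(f / x); with no frequencies every f is 1."""
    if fs is None:
        fs = [1] * len(xs)
    if any(x == 0 for x in xs):
        raise ValueError("harmonic mean is undefined when an observation is 0")
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


speeds = [100, 200, 300, 400]  # mph on the four equal 100-mile sides (Example 3.5)
print(round(harmonic_mean(speeds), 6))

# Cross-check: total distance / total time.
total_time = sum(100 / v for v in speeds)
print(round(400 / total_time, 6))
```

Python's `statistics.harmonic_mean` gives the same ungrouped result.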
Example 3.6:
The table below shows the distribution of wages across the different age groups in Life Is Precious, a non-governmental organization whose role is to alleviate poverty in a certain community.
Solution:
Using the above frequency distribution, we form the following table.
Then,
$$x_h = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \frac{f_i}{x_i}} = \frac{65}{1.0314} \approx 63.021$$
Note:
In the computation of $\bar{x}$, $x_h$ and $x_r$, the
Exercise 3.2
1. Given the following data:
Age group 80-89 70-79 60-69 50-59 40-49 30-39 20-29 10-19
Frequency 2 2 6 20 56 40 42 32
Calculate the
(i) arithmetic mean
(ii) harmonic mean
(iii) geometric mean
2. Find the average mark of the students from the following frequency table:
Above 30 389
Above 40 309
Above 50 273
Above 60 250
Above 70 0
i. Quartiles
Quartiles are the values of the variate which divide the total frequency into four equal parts.
For grouped data, the $k$th quartile, denoted by $Q_k$, is given by
$$Q_k = L_1 + \frac{\frac{kN}{4} - C}{f} \times h,$$
where
$L_1$ = lower limit of the $k$th quartile class,
$N$ = total frequency,
$f$ = frequency of the quartile class,
$C$ = cumulative frequency of the class preceding the quartile class,
$h$ = width of the quartile class,
$k$ = 1, 2, 3; that is, the first, second and third quartiles.
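The quartile formula can be sketched in Python as below. `quartile_from_class` mirrors the formula term by term; `locate_and_interpolate` is a hypothetical helper that also finds the quartile class from a table of class lower limits and frequencies, assuming equal class widths. The printed values use the quantities read off Example 3.8 (N = 76, h = 10).

```python
def quartile_from_class(k, n_total, lower, cum_before, freq, width):
    """Q_k = L1 + (k*N/4 - C) * h / f, for k = 1, 2, 3."""
    return lower + (k * n_total / 4 - cum_before) * width / freq


def locate_and_interpolate(k, lowers, freqs, width):
    """Find the class containing the (kN/4)th item, then apply the formula."""
    n_total = sum(freqs)
    target = k * n_total / 4
    cum_before = 0
    for lower, f in zip(lowers, freqs):
        if cum_before + f >= target:
            return quartile_from_class(k, n_total, lower, cum_before, f, width)
        cum_before += f
    raise ValueError("k must be 1, 2 or 3")


# Quantities used in Example 3.8 (N = 76, h = 10).
print(round(quartile_from_class(1, 76, 20, 12, 12, 10), 2))  # Q1
print(round(quartile_from_class(2, 76, 30, 24, 32, 10), 2))  # median
print(round(quartile_from_class(3, 76, 40, 56, 20, 10), 2))  # Q3
```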
Advantages of Quartiles
The quartiles
o are very easy to calculate.
o are not affected by extreme values.
o can be used to treat qualitative data whose categories can be ordered.
o can be determined graphically using ogives.
Limitations
The quartiles
o are not amenable to further algebraic manipulation.
o require the data to be arranged in ascending or descending order of magnitude, which involves additional work.
o are erratic if the number of items is small.
Example 3.7:
Using the data below compute the quartiles and the median.
Variable 5 7 9 11 13 15 17 19
frequency 1 2 7 9 11 8 5 4
Solution:
We first calculate the cumulative frequencies in order to determine the values of the quartiles.
Variable (x) Frequency (f) Cumulative frequency (c.f.)
5 1 1
7 2 3
9 7 10
11 9 19
13 11 30
15 8 38
17 5 43
19 4 47
The first quartile $Q_1$ = size of the $(N/4)$th item $= \frac{47}{4}$ = 11.75th item.
This item falls in the class with cumulative frequency 19, that is, where $x = 11$. Hence the first quartile $Q_1 = 11$.
Example 3.8:
Find the median and the quartiles of the marks obtained by the 76 students given below.
Solution:
The median $Q_2$ = size of the $(N/2)$th item $= \frac{76}{2}$ = 38th item.
This item lies in the class interval 30–40, whose cumulative frequency is 56.
Applying the interpolation formula
$$M = L_1 + \frac{\frac{N}{2} - C}{f} \times h.$$
Here $h = 10$, $L_1 = 30$, $f = 32$, $N/2 = 38$ and $C = 24$. Substituting these values in the formula above, we get
$$M = 30 + \frac{(38 - 24) \times 10}{32} = 34.38 \text{ marks}.$$
The first quartile $Q_1$ = size of the $(N/4)$th item $= \frac{76}{4}$ = 19th item.
This item lies in the class interval 20–30, whose cumulative frequency is 24.
Applying the interpolation formula
$$Q_1 = L_1 + \frac{\frac{N}{4} - C}{f} \times h.$$
Here $h = 10$, $L_1 = 20$, $f = 12$, $N/4 = 19$ and $C = 12$. Substituting these values in the formula above, we get
$$Q_1 = 20 + \frac{(19 - 12) \times 10}{12} = 25.83 \text{ marks}.$$
The third quartile $Q_3$ = size of the $(3N/4)$th item $= \frac{3 \times 76}{4}$ = 57th item.
This item lies in the class interval 40–50; the cumulative frequency of the preceding class is 56.
Applying the interpolation formula
$$Q_3 = L_1 + \frac{\frac{3N}{4} - C}{f} \times h.$$
Here $h = 10$, $L_1 = 40$, $f = 20$, $3N/4 = 57$ and $C = 56$. Substituting these values in the formula above, we get
$$Q_3 = 40 + \frac{(57 - 56) \times 10}{20} = 40.5 \text{ marks}.$$
Exercise 3.3
Calculate the median, the first and the third quartiles for the following data.
Weight 60-69 70-79 80 -89 90-99 100-109 110-119 120-129 130-139 140-149
Boys 2 9 24 28 15 11 7 3 1
ii. Median
This is the second quartile, i.e. $Q_k$ with $k = 2$. It may be defined as the middlemost or central value of the variable when the values are arranged in increasing order of magnitude. In the case of grouped data, the median may be defined as the value of the variable that divides the area under the frequency curve into two equal parts.
The figure below shows a typical boxplot. It consists of a box from Q1 to Q3 with
whiskers extending to the minimum and maximum of the data set.
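$$\text{Mode} = L_1 + \frac{(f_m - f_1)\,h}{2f_m - f_1 - f_2},$$
where $L_1$ is the lower limit of the modal class, $f_m$ is the frequency of the modal class, $f_1$ and $f_2$ are the frequencies of the classes immediately preceding and following it, and $h$ is the class width.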
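A sketch of the interpolation formula for the mode. `grouped_mode` follows the formula directly; `mode_from_table` is a hypothetical convenience helper that simply takes the class with the largest frequency as the modal class, so it assumes a unimodal table with equal class widths, and it treats a missing neighbouring class as having frequency 0. For irregular or bimodal tables the method of grouping is needed instead. The printed call uses the figures from Example 3.11.

```python
def grouped_mode(lower, width, f_modal, f_before, f_after):
    """Mode = L1 + (f_m - f_1) * h / (2*f_m - f_1 - f_2)."""
    return lower + (f_modal - f_before) * width / (2 * f_modal - f_before - f_after)


def mode_from_table(lowers, width, freqs):
    """Take the class with the largest frequency as modal class, then interpolate."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])
    f_before = freqs[i - 1] if i > 0 else 0               # no class before: use 0
    f_after = freqs[i + 1] if i + 1 < len(freqs) else 0   # no class after: use 0
    return grouped_mode(lowers[i], width, freqs[i], f_before, f_after)


print(round(grouped_mode(30, 10, 37, 21, 31), 2))  # figures used in Example 3.11
```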
Advantages
The mode,
o can easily be calculated.
o is not affected by extreme values.
o can be determined graphically.
o can be used for qualitative data analysis.
Disadvantages
The mode
o is not amenable to further algebraic manipulation.
o is indeterminate when the distribution is irregular and there is no definite point of maximum density.
o is not meaningful when the frequency distribution does not include a large number of items.
In the case of discrete and continuous grouped data we locate the mode by the method of
grouping.
Example 3.9:
If seven men receive daily wages of Shs. 5, 6, 7, 7, 8, 9 and 10, find the modal wage.
Solution:
The modal wage is Shs. 7, because it occurs more often than any other wage.
Method of Grouping
The method of grouping is applied when:
(i) the maximum frequency is repeated, or
(ii) the distribution is irregular and deviates from normality.
Example 3.10:
Find the mode of the following distribution.
Variable 3 4 5 6 7 8 9 10 11
Frequency 5 4 6 8 9 7 5 9 4
Solution:
If we locate the mode by inspection, we find that the variables 7 and 10 both have the maximum frequency, 9, so we cannot decide whether the mode is 7 or 10. This is a case of a bimodal distribution. We can determine the mode of this distribution using the method of grouping.
Procedure:
In column I, the frequencies are added in pairs. In column II, we leave out the first item and add the rest in pairs. In column III, the items are added in threes, and in column IV, the first item is left out and the rest are added in threes. In column V, the first two items are left out and the rest are added in threes. The maximum frequency in each column is picked out in the table below:
Column Number Maximum Frequency Combination of Values
I 16 7,8
II 17 6,7
III 24 6,7,8
IV 21 7,8,9
V 23 5,6,7
Number of times each value occurs in a maximum combination:
Variable 5 6 7 8 9
Frequency 1 3 5 3 1
Since the value 7 occurs most often, the mode is 7. In the case of grouped data, we locate the modal class using the method of grouping and then apply the interpolation formula.
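The method of grouping is mechanical enough to sketch in code. The function below is a hypothetical illustration of the procedure as described in the text: columns I to V are pairs and threes with 0, 1 or 2 leading items left out, incomplete groups at the end of a column are ignored, and (as in the text) the column of ungrouped frequencies is not counted. Every group reaching its column's maximum adds one to the tally of each value it contains.

```python
from collections import Counter


def mode_by_grouping(values, freqs):
    """Return (mode, tally) using the method of grouping, columns I-V."""
    tally = Counter()
    # (group size, number of leading items left out) for columns I to V.
    for size, skip in [(2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = [
            (sum(freqs[start:start + size]), values[start:start + size])
            for start in range(skip, len(freqs) - size + 1, size)
        ]
        best = max(total for total, _ in groups)
        for total, members in groups:
            if total == best:
                tally.update(members)
    return max(tally, key=tally.get), tally


values = [3, 4, 5, 6, 7, 8, 9, 10, 11]   # Example 3.10
freqs = [5, 4, 6, 8, 9, 7, 5, 9, 4]
mode, tally = mode_by_grouping(values, freqs)
print(mode, dict(sorted(tally.items())))
```

The tally reproduces the frequency table above (5: 1, 6: 3, 7: 5, 8: 3, 9: 1).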
Example 3.11:
Find the mode from the following data
Solution:
Since this is a unimodal distribution, we see that the modal class is 30–40. We can now use the interpolation formula
$$\text{Mode} = L_1 + \frac{(f_m - f_1)\,h}{2f_m - f_1 - f_2} = 30 + \frac{(37 - 21) \times 10}{74 - 21 - 31} = 30 + \frac{160}{22} = 37.27 \text{ marks}.$$
3. Below is the frequency distribution that resulted when the weights (in kg) of 50 calves in a dairy farm were measured.
Weight (kg) 170 172.5 175 177.5 180 182.5 185 187.5 190 192.5 195
Frequency 1 2 4 6 8 9 7 6 3 2 2
Find:
a) the mode
b) the median
c) the interquartile range
4. When the number of errors per page made by a copy typist was checked, the frequency distribution was as summarised below.
Number of errors per page 0 1 2 3 4 5 6 7 8
Frequency 4 15 27 20 18 10 4 1 1
Find:
a) the mode
b) the median
c) the upper quartile
5. The grouped frequency distribution shown below gives the results of an IQ test performed on a group of 50 students.
IQ test marks 90–94 95–99 100–104 105–109 110–114 115–119 120–124 125–129
Frequency 2 7 9 14 9 4 3 2
6. A cellular phone dealer sells three different models made by the same
manufacturer. He sells
265 of Nokia 5210 at a mean price of Shs. 10 860,
352 of Nokia 8250 at a mean price of Shs. 12 580,
150 of Nokia 8310 at a mean price of Shs. 18 250.
Find the mean price of all the phones sold during this period.
9. The yields of grain (x tonnes) from 500 small plots are grouped in classes with a common class interval of 0.2 tonne in the table below, the values of x given being the midpoints of the classes.
x f x f x f x f x f
2.8 4 3.4 47 4.0 88 4.6 35 5.2 4
3.0 15 3.6 63 4.2 69 4.8 10 - -
3.2 20 3.8 78 4.4 59 5.0 8 - -
Show that
(i) the mean of the distribution is 3.95 tonnes,
(ii) the median of the distribution is 3.95 tonnes
(iii) the lower and the upper quartiles are 3.63 and 4.28 tonnes respectively.
UNIT 4
MEASURES OF DISPERSION
4.0 Introduction
So far we have been concerned with calculating or estimating a single value to represent
a set of data. Although you can quote a single number to represent data, the data itself
will be spread about that number. This unit discusses the various methods of measuring
the spread of data.
4.1 Objectives
By the end of this unit, the learner should be able to:
explain the purpose of measures of spread.
compute and interpret the range, variance and standard deviation for
quantitative variables using appropriate formulas.
know the basic properties of the standard deviation.
Learning Style
To achieve what is expected of you…
Briefly revise Unit 3.
Allocate sufficient study time.
Attempt most of the practice and revision exercises in this Unit.
will increase the accuracy of statistical analysis and interpretation and we can be
in a better position to draw more dependable inferences.
3. Measures of dispersion make it possible to compare different groups.
4. Measures of dispersion are very important in many economic and social problems. They make it possible to study inequalities in the distribution of income, wealth, land, etc., among different sections of the country. Similarly, social problems in different areas of the country can be compared, which helps in choosing effective steps to address them.
There are five commonly used measures of dispersion:
1) The Range
2) The Interquartile Range
3) Semi Interquartile Range
4) Mean deviation
5) Standard deviation
The first three are positional measures of dispersion, based on only some of the items of the series; the last two are based on all the items of the series.
4.4 Range
The range is the simplest measure of variation that we can use. It is simply the difference between the largest and smallest values in a set of data. That is, if $x_1, x_2, \dots, x_n$ are the observations arranged in ascending order, so that $x_1 \le x_2 \le \dots \le x_n$, then
$$\text{Range} = x_n - x_1.$$
In other words,
Range = Largest value − Smallest value
Example 4.1:
Given the observations 2, 4, 3, 5, 1, 3, 6, find the range.
Solution:
We first arrange the observations in ascending order: 1, 2, 3, 3, 4, 5, 6. Using the above definition, we have Range = 6 − 1 = 5.
The range is mainly used in those fields where the variation is not considerable e.g. in the
field of quality control of manufactured goods, measuring money rates and rate of
exchange fluctuations. However, it should be kept in mind that the range is a crude
measure of dispersion and is entirely unsuitable for precise and accurate studies.
This method of mean or average deviation removes a serious shortcoming of the previous three methods, namely that they are not based on all the observations. It gives us an idea of how the observations are spread around a central value. The mean deviation is the arithmetic mean of the deviations of a series from some measure of central tendency, ignoring the signs.
Note that the algebraic sum of the deviations of a group of observations from their own mean is always zero. To avoid this, we take the deviations ignoring the signs; i.e., we take the absolute values of these deviations.
If $x_1, x_2, \dots, x_n$ are the individual observations, then we define the mean deviation as
$$\text{M.D.} = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|,$$
where $A$ is any measure of central tendency.
If $f_1, f_2, \dots, f_n$ are the corresponding frequencies of the above observations, then the mean deviation is defined as
$$\text{M.D.} = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - A|, \quad\text{where } N = \sum_{i=1}^{n} f_i.$$
Advantages
The mean deviation is
o based on all observations and gives weight to items according to their size.
o easily computed and readily understood.
o less affected by the fluctuations of sampling and by extreme values than the standard deviation.
o rigidly defined.
Disadvantages
o It is not amenable to further algebraic manipulations because it ignores
signs.
o It is not very accurate.
o It may not be representative, particularly if it is calculated from the mode.
Example 4.2:
Find the mean deviation from the mean and from the median for the following data.
Frequency 2 4 6 8 10 12 8
Variable 5 7 9 11 13 15 17
Solution:
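Here $N = \sum f_i = 50$ and $\sum f_i x_i = 626$, so $\bar{x} = 626/50 = 12.52$, which is rounded to 12.5 in the deviations below. The median is the size of the 25th item; from the c.f. column this is $M_e = 13$.
xi fi c.f. d1 = |xi − x̄| fi·d1 d2 = |xi − Me| fi·d2
5 2 2 7.5 15 8 16
7 4 6 5.5 22 6 24
9 6 12 3.5 21 4 24
11 8 20 1.5 12 2 16
13 10 30 0.5 5 0 0
15 12 42 2.5 30 2 24
17 8 50 4.5 36 4 32
Totals: Σfi·d1 = 141, Σfi·d2 = 136
Mean deviation about the median:
$$\frac{\sum f_i d_{2i}}{\sum f_i} = \frac{136}{50} = 2.72$$
Mean deviation about the mean:
$$\frac{\sum f_i d_{1i}}{\sum f_i} = \frac{141}{50} = 2.82$$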
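The calculation in Example 4.2 can be reproduced with the short sketch below (function names are illustrative). `discrete_median` assumes the values are in ascending order and returns the first value whose cumulative frequency reaches $N/2$, a simple convention that is adequate here because the 25th and 26th items are both 13. Using the unrounded mean 12.52 gives 2.816, which rounds to the 2.82 obtained above.

```python
def mean_deviation(xs, fs, centre):
    """M.D. = sum(f * |x - centre|) / sum(f)."""
    return sum(f * abs(x - centre) for x, f in zip(xs, fs)) / sum(fs)


def discrete_median(xs, fs):
    """First value whose cumulative frequency reaches N/2 (xs in ascending order)."""
    half = sum(fs) / 2
    cum = 0
    for x, f in zip(xs, fs):
        cum += f
        if cum >= half:
            return x
    raise ValueError("empty frequency table")


xs = [5, 7, 9, 11, 13, 15, 17]
fs = [2, 4, 6, 8, 10, 12, 8]
mean = sum(f * x for x, f in zip(xs, fs)) / sum(fs)
median = discrete_median(xs, fs)
print(mean, median)
print(round(mean_deviation(xs, fs, mean), 3), round(mean_deviation(xs, fs, median), 3))
```

Note that the mean deviation about the median comes out smaller than that about the mean; this always holds, since the sum of absolute deviations is minimised at the median.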
Example 4.3:
Calculate the mean deviation of the following data.
Marks 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50
Number of students 10 25 30 20 15
Solution:
Since it is not specified here whether the deviations should be taken from the mean, median or mode, we take them from the mean, the most commonly used average. (Strictly, the mean deviation is smallest when the deviations are taken from the median.)
Exercise 4.1
1. From the following frequency distribution of the sale of the tickets, calculate the
mean deviation about the mean and the median.
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}$$
If the deviations are taken from some other measure of central tendency, such as the mode or the median, then the result is called the root mean square deviation.
That is,
$$\text{R.M.S.D.} = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2},$$
where $A$ is either the mode or the median.
The square of the standard deviation denoted by σ2 is called the variance.
The standard deviation has the fewest drawbacks and gives the most accurate results of the measures of dispersion we have considered. It removes the drawback of ignoring the algebraic signs of the deviations of the items from the average: instead of ignoring the signs, the deviations are squared, which makes them all non-negative, and the square root of their mean is then taken.
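The variance can also be computed from the formula
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i x_i\right)^2.$$
Proof:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^{n} f_i \left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right)$$
$$= \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - 2\bar{x}\cdot\frac{1}{N}\sum_{i=1}^{n} f_i x_i + \bar{x}^2\cdot\frac{1}{N}\sum_{i=1}^{n} f_i$$
$$= \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - 2\bar{x}^2 + \bar{x}^2 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i x_i\right)^2,$$
since $\frac{1}{N}\sum_{i=1}^{n} f_i x_i = \bar{x}$ and $\sum_{i=1}^{n} f_i = N$.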
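Both forms of the variance can be compared numerically. The sketch below (illustrative function names) applies them to the frequency table of Example 4.2; both should give 11.2896. The shortcut form is convenient by hand but, in floating-point arithmetic, it can lose accuracy when the values are large relative to their spread, because it subtracts two nearly equal quantities.

```python
def variance_by_definition(xs, fs):
    """sigma^2 = (1/N) * sum f (x - mean)^2."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n


def variance_by_shortcut(xs, fs):
    """sigma^2 = (1/N) * sum f x^2 - ((1/N) * sum f x)^2."""
    n = sum(fs)
    m1 = sum(f * x for x, f in zip(xs, fs)) / n
    m2 = sum(f * x * x for x, f in zip(xs, fs)) / n
    return m2 - m1 ** 2


xs = [5, 7, 9, 11, 13, 15, 17]   # data of Example 4.2
fs = [2, 4, 6, 8, 10, 12, 8]
print(variance_by_definition(xs, fs), variance_by_shortcut(xs, fs))
```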
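Change of origin: let $x'_i = x_i - a$ for a constant $a$, so that $\bar{x}' = \bar{x} - a$. Then the standard deviation satisfies
$$\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x'_i - \bar{x}')^2 = \sigma_{x'}^2,$$
so it is unaffected by a change of origin.
Change of scale: let $u_i = (x_i - a)/h$ with $h > 0$, so that $x_i - \bar{x} = h(u_i - \bar{u})$. Then
$$\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{n} f_i h^2 (u_i - \bar{u})^2 = h^2\sigma_u^2 \;\Rightarrow\; \sigma_x = h\sigma_u.$$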
Thus, if the data are scaled by dividing by $h$, the standard deviation of the original data is obtained by multiplying the standard deviation of the scaled data by $h$.
4) Let the standard deviations and means of two samples of sizes $n_1$ and $n_2$ be $\sigma_1, \sigma_2$ and $\bar{x}_1, \bar{x}_2$ respectively. If the two samples are combined to get one sample of size $n = n_1 + n_2$, then the variance of this combined sample, measured from its combined mean $\bar{x}$, is given by
$$\sigma^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} + \frac{n_1 n_2 (\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2}.$$
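Proof:
Define
$$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i}, \quad \bar{x}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} x_{2j}, \quad \sigma_1^2 = \frac{1}{n_1}\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2, \quad \sigma_2^2 = \frac{1}{n_2}\sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2.$$
The combined mean is
$$\bar{x} = \frac{1}{n_1 + n_2}\left[\sum_{i} x_{1i} + \sum_{j} x_{2j}\right] = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}.$$
The variance of the combined series is given by
$$\sigma^2 = \frac{1}{n_1 + n_2}\left\{\sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 + \sum_{j=1}^{n_2} (x_{2j} - \bar{x})^2\right\}. \qquad (4.1)$$
Consider
$$\sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 = \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1 + \bar{x}_1 - \bar{x})^2$$
$$= \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 + 2(\bar{x}_1 - \bar{x})\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1) + n_1(\bar{x}_1 - \bar{x})^2$$
$$= \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 + n_1(\bar{x}_1 - \bar{x})^2 \quad \left(\text{since } \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1) = 0\right)$$
$$= n_1\sigma_1^2 + n_1 d_1^2. \qquad (4.2)$$
Similarly,
$$\sum_{j=1}^{n_2} (x_{2j} - \bar{x})^2 = \sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2 + n_2(\bar{x}_2 - \bar{x})^2 = n_2\sigma_2^2 + n_2 d_2^2, \qquad (4.3)$$
where $d_1 = \bar{x}_1 - \bar{x}$ and $d_2 = \bar{x}_2 - \bar{x}$.
Now substituting (4.2) and (4.3) in (4.1), we get
$$\sigma^2 = \frac{1}{n_1 + n_2}\left\{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)\right\}. \qquad (4.4)$$
Now,
$$d_1 = \bar{x}_1 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{n_2(\bar{x}_1 - \bar{x}_2)}{n_1 + n_2}, \qquad d_2 = \bar{x}_2 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{n_1(\bar{x}_2 - \bar{x}_1)}{n_1 + n_2}.$$
Hence
$$\sigma^2 = \frac{1}{n_1 + n_2}\left\{n_1\sigma_1^2 + n_2\sigma_2^2 + \frac{n_1 n_2^2(\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2} + \frac{n_2 n_1^2(\bar{x}_2 - \bar{x}_1)^2}{(n_1 + n_2)^2}\right\}$$
$$= \frac{1}{n_1 + n_2}\left\{n_1\sigma_1^2 + n_2\sigma_2^2 + \frac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)^2\right\},$$
which is the required result.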
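The combined-variance result can be checked numerically: compute each sample's mean and variance, combine them with the formula, and compare with the variance of the pooled data. The two samples below are made-up illustrative numbers and `combined_variance` is a hypothetical name; `statistics.pvariance` is used because $\sigma^2$ here divides by $n$, not $n - 1$.

```python
from statistics import fmean, pvariance


def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """(n1*var1 + n2*var2)/(n1+n2) + n1*n2*(mean1-mean2)^2/(n1+n2)^2."""
    n = n1 + n2
    return (n1 * var1 + n2 * var2) / n + n1 * n2 * (mean1 - mean2) ** 2 / n ** 2


# Two illustrative samples (hypothetical data, not from the text).
sample_a = [2, 4, 4, 4, 5, 5, 7, 9]
sample_b = [1, 3, 8, 10]

pooled = combined_variance(len(sample_a), fmean(sample_a), pvariance(sample_a),
                           len(sample_b), fmean(sample_b), pvariance(sample_b))
print(pooled, pvariance(sample_a + sample_b))
```

Both printed values should agree (about 7.139); the second term of the formula accounts for the difference between the two sample means.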
Exercise 4.2
The first of the two samples has 100 items with mean 15 and standard deviation 3. If the
whole group has 250 items with mean 15.6 and variance 13.44, find the standard
deviation of the second group.
group B then we can conclude that group B is more consistent than group A i.e. there is
less variation in group B than in group A.
Example 4.4:
Calculate the standard deviation and the coefficient of variation for the following data.
Wages Number of workers
70-80 12
80-90 18
90-100 35
100-110 42
110-120 50
120-130 45
130-140 20
140-150 8
Solution:
Wages Midpoint (x) f u = (x − 105)/10 u² fu² fu
70–80 75 12 −3 9 108 −36
80–90 85 18 −2 4 72 −36
90–100 95 35 −1 1 35 −35
100–110 105 42 0 0 0 0
110–120 115 50 1 1 50 50
120–130 125 45 2 4 180 90
130–140 135 20 3 9 180 60
140–150 145 8 4 16 128 32
Totals: Σf = 230, Σfu² = 753, Σfu = 125
Note: We have changed the origin and the scale to make the data less bulky. Thus
$$u = \frac{x - A}{h},$$
where we take $A = 105$, a class midpoint near the centre of the distribution, and $h = 10$, the common class width. Since we have previously shown that the standard deviation is independent of a change of origin but depends on a change of scale, we have
$$\sigma = \sqrt{\frac{\sum fu^2}{N} - \left(\frac{\sum fu}{N}\right)^2} \times h = \sqrt{\frac{753}{230} - \left(\frac{125}{230}\right)^2} \times 10 \approx 17.3$$
$$\text{Mean } \bar{x} = A + h\,\frac{\sum fu}{N} = 105 + \frac{125}{230} \times 10 \approx 110.4$$
$$\text{Coefficient of variation} = \frac{\sigma}{\bar{x}} \times 100 = \frac{17.3}{110.4} \times 100 \approx 15.67$$
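The step-deviation calculation of Example 4.4 can be reproduced as follows. The function name is hypothetical; it uses the same origin $A = 105$ and scale $h = 10$ as the example. Without intermediate rounding the figures come out as about 110.43, 17.26 and 15.63; the 15.67 above comes from dividing the rounded values 17.3 and 110.4.

```python
import math


def mean_sd_step_deviation(midpoints, fs, a, h):
    """Mean and standard deviation via u = (x - A) / h (change of origin and scale)."""
    n = sum(fs)
    us = [(x - a) / h for x in midpoints]
    sum_fu = sum(f * u for u, f in zip(us, fs))
    sum_fu2 = sum(f * u * u for u, f in zip(us, fs))
    mean = a + h * sum_fu / n
    sd = h * math.sqrt(sum_fu2 / n - (sum_fu / n) ** 2)  # sigma_x = h * sigma_u
    return mean, sd


midpoints = [75, 85, 95, 105, 115, 125, 135, 145]
workers = [12, 18, 35, 42, 50, 45, 20, 8]
mean, sd = mean_sd_step_deviation(midpoints, workers, a=105, h=10)
cv = 100 * sd / mean
print(round(mean, 2), round(sd, 2), round(cv, 2))
```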
Exercise 4.3
1. The scores of two golfers for 24 rounds were as follows:
Golfer A: 74 75 78 78 72 77 79 78 81 76 72 72
77 74 70 78 79 80 81 74 80 75 71 73
Golfer B: 86 84 80 88 89 85 86 82 82 79 86 80
82 76 86 89 87 83 80 88 86 81 84 87
Find which golfer may be considered the more consistent (less variable) player.
2. In a certain test for which the pass mark is 30, the distribution of marks of the passing candidates, classified by sex, was as given below.
Marks Boys Girls
30 – 34 5 15
35 – 39 10 20
40 – 44 15 30
45-49 30 20
50 – 54 5 5
55 – 59 5 0
The overall mean and standard deviation of marks for boys, including the thirty boys who failed, were 38 and 10 respectively. The corresponding mean and standard deviation for girls, including the 10 who failed, were 35 and 9.
a) Find the mean and standard deviation of the 30 boys who failed in the test.
b) The moderation committee argued that the percentage of passes among girls is higher because the girls are very studious, and that if the intention is to pass those students who are really intelligent, a higher pass mark should be used for girls. Without questioning the propriety of this argument, suggest what the pass mark should be so as to allow only 70% of the girls to pass.
c) The prize committee decided to award prizes to the best 40 students, irrespective of sex, judged on the basis of marks obtained in the test. Estimate the number of girls who would receive a prize.
Estimate the mean and the standard deviation of these marks for this population.
11. A student visits a shop frequently. On 20 random occasions he recorded the queue length (the number of people queuing) at the check-out. The results were:
Queue length 4 5 6 7 8 9
Number of visits 2 3 7 6 0 2
By calculation, estimate
i. the mean
ii. the standard deviation
iii. the coefficient of variation
of the queue length at this shop.
12. The table below summarises the weights, to the nearest kilogram, of a random
sample of 40 Highland cattle.
Weight (Kg) 400-449 450-499 500-549 550-599 600-649 650-699
Frequency 1 2 4 6 8 9
Estimate the mean and the standard deviation of the weights for this distribution
13. A distribution consists of three components with frequencies of 200, 250 and 300,
having means of 25, 10 and 15, and standard deviations of 3, 4 and 5 respectively.
Show that the mean of the combined distribution is 16, and its standard deviation
7.2 approximately.
14. The yields of grain (x tonnes) from 500 small plots are grouped in classes with a common class interval of 0.2 tonne in the table below, the values of x given being the midpoints of the classes.
x f x f x f x f x f
2.8 4 3.4 47 4.0 88 4.6 35 5.2 4
3.0 15 3.6 63 4.2 69 4.8 10 - -
3.2 20 3.8 78 4.4 59 5.0 8 - -
UNIT 5
5.1 Objectives
By the end of this Unit, you should be able to:
recognize the following distributions: normal, skewed, platykurtic and
leptokurtic.
approximately locate the median (equal areas point) and the mean
(balance point) on a distribution.
know that both the mean and median lie at the centre of a symmetric
distribution and that the mean moves farther towards the long tail of a
skewed curve.
compute and interpret the coefficient of skewness.
compute and interpret the coefficient of kurtosis.
Learning Style
To achieve what is expected of you…
Allocate sufficient study time.
Attempt most of the practice and revision exercises in this Unit.
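If $x_1, x_2, \dots, x_n$ are the $n$ values assumed by the variable $x$, we define the $r$th moment about a point $a$ as
$$\mu'_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - a)^r \quad\text{for individual data, or}\quad \mu'_r = \frac{\sum_{i=1}^{n} f_i (x_i - a)^r}{\sum_{i=1}^{n} f_i} \quad\text{for grouped data.}$$
Taking $a = 0$ gives the moments about the origin,
$$\nu_r = \frac{1}{n}\sum_{i=1}^{n} x_i^r, \quad\text{or}\quad \nu_r = \frac{\sum_{i=1}^{n} f_i x_i^r}{\sum_{i=1}^{n} f_i} \quad\text{for grouped data,}$$
and taking $a = \bar{x}$ gives the moments about the mean,
$$\mu_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r, \quad\text{or}\quad \mu_r = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^r,\ N = \sum_{i=1}^{n} f_i, \quad\text{for grouped data.}$$
Note that $\mu_1 = 0$, and
$$\mu_2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
is the variance.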
5.2.1 Relationship between moments about the mean and moments about any other point
The $r$th moment about the mean is given by
$$\mu_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r.$$
Adding and subtracting $a$ inside the parentheses, we get
$$\mu_r = \frac{1}{n}\sum_{i=1}^{n} \left[(x_i - a) - (\bar{x} - a)\right]^r = \frac{1}{n}\sum_{i=1}^{n} (v_i - d)^r, \quad\text{where } v_i = x_i - a,\ d = \bar{x} - a.$$
We can use the binomial expansion
$$(p - q)^r = \binom{r}{0}p^r q^0 - \binom{r}{1}p^{r-1}q^1 + \binom{r}{2}p^{r-2}q^2 - \binom{r}{3}p^{r-3}q^3 + \dots + (-1)^r\binom{r}{r}p^0 q^r.$$
Using this expansion, we expand $\mu_r$ to get
$$\mu_r = \frac{1}{n}\sum_{i=1}^{n}\left[v_i^r - \binom{r}{1}d\,v_i^{r-1} + \binom{r}{2}d^2 v_i^{r-2} - \dots + (-d)^r\right]$$
$$= \frac{1}{n}\sum_{i=1}^{n} v_i^r - \binom{r}{1}d\,\frac{1}{n}\sum_{i=1}^{n} v_i^{r-1} + \binom{r}{2}d^2\,\frac{1}{n}\sum_{i=1}^{n} v_i^{r-2} - \dots + (-d)^r$$
$$= \mu'_r - \binom{r}{1}\mu'_{r-1}d + \binom{r}{2}\mu'_{r-2}d^2 - \dots + (-1)^{r-1}\binom{r}{r-1}\mu'_1 d^{r-1} + (-d)^r.$$
Here $d = \bar{x} - a = \frac{1}{n}\sum_{i=1}^{n}(x_i - a) = \mu'_1$ and $\mu'_0 = 1$. In particular,
$\mu_1 = 0$; i.e., the sum of deviations about the mean is zero.
$$\mu_2 = \mu'_2 - 2d\mu'_1 + d^2\mu'_0 = \mu'_2 - 2(\bar{x} - a)\mu'_1 + (\bar{x} - a)^2\mu'_0$$
$$= \mu'_2 - 2\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - a)\right]\mu'_1 + \left[\frac{1}{n}\sum_{i=1}^{n}(x_i - a)\right]^2\mu'_0$$
$$= \mu'_2 - 2(\mu'_1)^2 + (\mu'_1)^2 \quad (\text{since } \mu'_0 = 1)$$
$$= \mu'_2 - (\mu'_1)^2.$$
Similarly, we can get the relationships for the higher moments, i.e.
$$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3,$$
$$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4.$$
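The general relation between moments about a point and central moments can be checked in a few lines. The sketch below (function names are illustrative; `math.comb` needs Python 3.8+) computes the moments about $a = 4$ for the frequency table with $x = 1, 2, 3, 4, 5$ and frequencies 5, 2, 5, 4, 4 (the data of the worked example that follows), converts them with the binomial formula, and compares the result with moments computed directly about the mean.

```python
from math import comb


def moments_about(xs, fs, a, r_max=4):
    """mu'_r = sum f (x - a)^r / sum f, for r = 0, 1, ..., r_max."""
    n = sum(fs)
    return [sum(f * (x - a) ** r for x, f in zip(xs, fs)) / n for r in range(r_max + 1)]


def central_from_raw(raw):
    """mu_r = sum over k of (-1)^k C(r, k) mu'_{r-k} d^k, with d = mu'_1."""
    d = raw[1]
    return [
        sum((-1) ** k * comb(r, k) * raw[r - k] * d ** k for k in range(r + 1))
        for r in range(len(raw))
    ]


xs = [1, 2, 3, 4, 5]
fs = [5, 2, 5, 4, 4]
raw = moments_about(xs, fs, a=4)          # moments about the point a = 4
central = central_from_raw(raw)           # converted with the binomial relation
mean = sum(f * x for x, f in zip(xs, fs)) / sum(fs)
direct = moments_about(xs, fs, a=mean)    # moments about the mean, computed directly
print([round(m, 6) for m in raw])
print([round(m, 6) for m in central])
print([round(m, 6) for m in direct])
```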
Exercise 5.1
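Show that
$$\mu'_2 = \mu_2 + (\mu'_1)^2,$$
$$\mu'_3 = \mu_3 + 3\mu_2\mu'_1 + (\mu'_1)^3,$$
$$\mu'_4 = \mu_4 + 4\mu_3\mu'_1 + 6\mu_2(\mu'_1)^2 + (\mu'_1)^4,$$
where $\mu'_1 = \bar{x} - A = \frac{1}{n}\sum_{i=1}^{n} (x_i - A)$.
Hint:
$$\mu'_r = \frac{1}{n}\sum (x_i - A)^r = \frac{1}{n}\sum (x_i - \bar{x} + \bar{x} - A)^r = \frac{1}{n}\sum (z_i + \mu'_1)^r,$$
where $z_i = x_i - \bar{x}$ and $\mu'_1 = \bar{x} - A$.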
5.3 Sheppard’s Correction to Moments of Grouped Distributions
In computing the arithmetic mean, standard deviation, etc. for a grouped series, we use the midpoints of the class intervals to represent the classes. In doing so we assume that the items in each class are concentrated around its midpoint. This assumption holds well only when the number of observations is large; since not every data set to be analyzed is large, the assumption is not valid in every case. In the case of the arithmetic mean, the grouping errors on the two sides of the mean tend to cancel each other and give accurate results, so we do not correct the first moment, which is the mean. In the case of the second moment, the errors on both sides of the mean become positive after squaring, so this cancelling effect is absent. We therefore correct only the $r$th moments with $r$ even. Sheppard used the Euler–Maclaurin formula to evaluate these corrections for different moments:
$$\mu_1^{*} = \mu_1, \qquad \mu_2^{*} = \mu_2 - \frac{h^2}{12}, \qquad \mu_3^{*} = \mu_3, \qquad \mu_4^{*} = \mu_4 - \frac{h^2}{2}\mu_2 + \frac{7}{240}h^4,$$
where $\mu_r^{*}$ denotes the corrected $r$th moment and $h$ is the width of the class interval.
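Sheppard's corrections are simple to apply once the uncorrected central moments are known. The sketch below is illustrative: the function name and the input moments are hypothetical values for grouped data with class width $h = 2$. Note that the correction to $\mu_4$ uses the uncorrected $\mu_2$.

```python
def sheppard_corrected(mu2, mu4, h):
    """Sheppard's corrections for the even central moments of grouped data.

    mu2* = mu2 - h^2/12 and mu4* = mu4 - (h^2/2) * mu2 + (7/240) * h^4,
    where mu2 and mu4 are the uncorrected moments and h is the class width.
    The odd moments mu1 and mu3 are left unchanged.
    """
    mu2_star = mu2 - h ** 2 / 12
    mu4_star = mu4 - (h ** 2 / 2) * mu2 + 7 * h ** 4 / 240
    return mu2_star, mu4_star


# Hypothetical uncorrected moments for grouped data with class width h = 2.
print(sheppard_corrected(mu2=10.0, mu4=100.0, h=2))
```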
x 1 2 3 4 5
Frequency 5 2 5 4 4
Solution:
x f d = x − 4 d² d³ d⁴ fd fd² fd³ fd⁴
1 5 −3 9 −27 81 −15 45 −135 405
2 2 −2 4 −8 16 −4 8 −16 32
3 5 −1 1 −1 1 −5 5 −5 5
4 4 0 0 0 0 0 0 0 0
5 4 1 1 1 1 4 4 4 4
$\sum f = 20,\ \sum fd = -20,\ \sum fd^2 = 62,\ \sum fd^3 = -152,\ \sum fd^4 = 446.$
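First moment: $\mu'_1 = \frac{\sum fd}{N} = \frac{-20}{20} = -1$
Second moment: $\mu'_2 = \frac{\sum fd^2}{N} = \frac{62}{20} = 3.1$
Third moment: $\mu'_3 = \frac{\sum fd^3}{N} = \frac{-152}{20} = -7.6$
Fourth moment: $\mu'_4 = \frac{\sum fd^4}{N} = \frac{446}{20} = 22.3$
We use the relationship between the moments about a point and the central moments to calculate the central moments.
$\mu_1 = 0$
$\mu_2 = \mu'_2 - (\mu'_1)^2 = 3.1 - 1 = 2.1$
$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3 = -7.6 - 3(3.1)(-1) + 2(-1)^3 = -0.3$
$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4 = 22.3 - 4(-7.6)(-1) + 6(3.1)(1) - 3(1) = 7.5$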
Exercise 5.2
Calculate the first four moments about the mean and derive the corresponding moments
about the median using the data below.
Marks Number of students
5 -15 5
15- 25 20
25- 35 15
35- 45 45
45- 55 10
55- 65 5
5.4 Skewness
Measures of central tendency give information about the average, and measures of dispersion give information about the degree of variation that exists in the data. These measures do not indicate whether the values are spread symmetrically on either side of the measure of central tendency. Two series may have the same mean and standard deviation and yet differ in the symmetry of their distributions. The symmetry of a distribution is studied by measures of skewness. Skewness means lopsidedness, or lack of symmetry, in a frequency distribution. It should be noted that skewness relates to the shape of a frequency distribution and not to its size. In a symmetrical distribution, such as the normal curve, the mean, median and mode are equal.
[Fig. 5.1: A symmetrical distribution (mean, median and mode coincide)]
If the longer tail is on the left-hand side, the distribution is said to be negatively skewed. In this case, mean < median < mode.
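Karl Pearson's coefficient of skewness is
$$\text{Coefficient of Skewness (C.S.)} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{\bar{X} - M_o}{\sigma}$$
For moderately skewed distributions, the empirical relation Mode ≈ 3 Median − 2 Mean holds. Substituting it into the formula gives
$$\text{C.S.} = \frac{3(\bar{X} - M_d)}{\sigma}$$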
Therefore, if the coefficient of skewness is
= 0, the distribution is symmetrical;
> 0, the distribution is positively skewed;
< 0, the distribution is negatively skewed.
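Skewness can also be measured from the central moments by
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$
Since β1 is never negative, the direction of the skewness is read from the sign of µ3.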
Example 5.2:
Compute the coefficient of skewness for the given data. What can you say about the
symmetry of the distribution?
Wages Number of persons
0–5 3
5 – 10 10
10 – 15 16
15 – 20 25
20 – 25 16
25 – 30 10
30 – 35 3
Solution:
Wages (k) Midpoint (x) Frequency (f) d = (x − 17.5)/5 fd fd²
0-5 2.5 3 -3 -9 27
5-10 7.5 10 -2 -20 40
10-15 12.5 16 -1 -16 16
15-20 17.5 25 0 0 0
20-25 22.5 16 1 16 16
25-30 27.5 10 2 20 40
30-35 32.5 3 3 9 27
Total 83 0 166
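Using the table above and the interpolation formula for grouped data, we obtain
Mean = Median = 17.5; Q1 = 12.42; Q3 = 22.58; σ = √50 ≈ 7.07.
$$\text{Bowley's coefficient} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{22.58 + 12.42 - 35}{22.58 - 12.42} = 0$$
$$\text{Karl Pearson's coefficient of skewness} = \frac{3(\text{Mean} - \text{Median})}{\sigma} = \frac{3(17.5 - 17.5)}{\sigma} = 0$$
Since the coefficient of skewness is zero, we can conclude that the distribution is symmetrical.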
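The quartiles and both coefficients can be reproduced with a short Python sketch. It assumes the usual grouped-data interpolation formula Q = L + ((pN − cf)/f)·h, where L is the lower limit of the quartile class, cf the cumulative frequency before it, f its frequency and h the class width. The helper name `grouped_quantile` is our own, not a standard library function.

```python
import math

# Class lower limits and frequencies from the solution table of Example 5.2.
lower = [0, 5, 10, 15, 20, 25, 30]
f = [3, 10, 16, 25, 16, 10, 3]
h = 5  # common class width

N = sum(f)
mid = [L + h / 2 for L in lower]

def grouped_quantile(p):
    """Interpolate the p-th quantile (0 < p < 1) from the grouped table."""
    target = p * N          # position of the quantile, e.g. N/4 for Q1
    cum = 0                 # cumulative frequency before the current class
    for L, fi in zip(lower, f):
        if cum + fi >= target:
            return L + (target - cum) / fi * h
        cum += fi
    raise ValueError("p out of range")

mean = sum(fi * x for fi, x in zip(f, mid)) / N
sd = math.sqrt(sum(fi * (x - mean) ** 2 for fi, x in zip(f, mid)) / N)
q1, median, q3 = (grouped_quantile(p) for p in (0.25, 0.5, 0.75))

bowley = (q3 + q1 - 2 * median) / (q3 - q1)
pearson = 3 * (mean - median) / sd
print(round(q1, 2), round(median, 2), round(q3, 2), round(sd, 2))
print(bowley, pearson)
```

Both coefficients come out as zero, in agreement with the conclusion drawn above.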
Exercise 5.3
1. Use the moment method to confirm the result in the above example.
2. Calculate the coefficient of skewness of the two groups given below. Which of
the two is more skewed?
Group B 20 22 25 13 7
5.5 Kurtosis
Kurtosis means "bulginess" in Greek, but in statistics it refers to the degree of peakedness in the region about the mode of a frequency curve. Peakedness is another characteristic of a distribution that can be measured. Kurtosis measures it with reference to the normal distribution. Three types of kurtosis are distinguished:
leptokurtic: more peaked than the normal curve;
mesokurtic: as peaked as the normal curve;
platykurtic: flatter than the normal curve.
The usual moment measure of kurtosis is
$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$
which equals 3 for the normal curve. Thus β2 > 3 indicates a leptokurtic, β2 = 3 a mesokurtic and β2 < 3 a platykurtic distribution.
Figure: Leptokurtic, normal (mesokurtic) and platykurtic curves
The study of kurtosis is useful because it shows how the items in the middle of a series are distributed, which helps in choosing an appropriate average. When the distribution is mesokurtic, the arithmetic mean is the most appropriate average. The median is more appropriate for leptokurtic distributions, and the quartiles are more suitable for platykurtic distributions.
Example 5.3:
Using the data given below comment on the symmetry and the peakedness of the
distribution.
Solution:
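We first calculate the first four moments about the mean. These are
µ1 = 0, µ2 = 145, µ3 = 300, µ4 = 54625.
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(300)^2}{(145)^3} = 0.0295, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{54625}{21025} = 2.598$$
Since µ3 > 0 (so that β1 > 0), the distribution is positively skewed. Since β2 < 3, it is platykurtic.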
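The two moment measures and the classification rules above fit in a few lines. The sketch below is illustrative only; `beta_coefficients` and `describe` are hypothetical helper names. The sign of the skewness is taken from µ3, because β1 is a square and cannot be negative.

```python
def beta_coefficients(mu2, mu3, mu4):
    """Return Pearson's beta_1 = mu3^2 / mu2^3 and beta_2 = mu4 / mu2^2."""
    return mu3**2 / mu2**3, mu4 / mu2**2

def describe(mu2, mu3, mu4):
    b1, b2 = beta_coefficients(mu2, mu3, mu4)
    # Direction of skewness comes from the sign of mu3, not from beta_1.
    if mu3 == 0:
        skew = "symmetrical"
    else:
        skew = "positively skewed" if mu3 > 0 else "negatively skewed"
    # Compare beta_2 with 3, its value for the normal (mesokurtic) curve.
    if abs(b2 - 3) < 1e-9:
        kurt = "mesokurtic"
    else:
        kurt = "leptokurtic" if b2 > 3 else "platykurtic"
    return round(b1, 4), round(b2, 3), skew, kurt

print(describe(145, 300, 54625))   # moments of Example 5.3
```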
Exercise 5.4
Analyze the peakedness and the skewness of the following distribution.
17. Over a period of years, 570 students were examined in SMA 102-Basic
Mathematics at the end of the semester examinations of Kenyatta University. The
marks gained by students ranged from 0 to 99, all being integers. These were
grouped in 20 classes, with a class interval of 5, the class frequencies being as
shown in the table below.
Interval f
0-4 12
5-9 13
10-14 13
15-19 14
20-24 23
25-29 23
30-34 29
35-39 34
40-44 44
45-49 44
50-54 50
55-59 52
60-64 61
65-69 41
70-74 32
75-79 27
80-84 23
85-89 17
90-94 13
95-99 5
UNIT 6
CORRELATION ANALYSIS
6.0 Introduction
Two things correlate when they vary together. For example, we expect land values to fall
with distance from the city centre. In this unit, we address the question of the degree and
direction of the relationship that exists between two variables.
6.1 Objectives
By the end of this unit you should be able to:
know the meaning of bivariate data.
make a scatterplot to display the relationship between two quantitative
variables.
describe the form, direction and strength of the overall pattern of a
scatterplot. In particular, recognize positive or negative association and
linear (straight - line) patterns. Recognize outliers in a scatterplot.
know what correlation is, why correlation analysis is performed and
how to find (using appropriate formulas) the correlation coefficient.
describe the information provided by a correlation coefficient.
between body structure and choice of a partner. Such types of relationships are studied
using the analysis of correlation.
In short, correlation indicates a quantitative association between variables. For example, a fall in the price of a commodity may be accompanied by a rise in demand; thus demand and price move in opposite directions. Similarly, the price of a commodity and its supply move in the same direction. Correlation is therefore used to measure the degree and direction of the relationship between variables.
There are three methods of studying correlation;
(a) Scatter diagram
(b) Karl Pearson product moment correlation coefficient.
(c) Spearman’s Rank correlation coefficient.
(i) If the fitted line goes upward and this upward movement is from left to right then
the x and y variables are positively correlated.
(ii) If the fitted line moves downward and its direction is from left to right then the x
and y variables are negatively correlated.
(iii) If the plotted points are scattered haphazardly such that we cannot fit an
appropriate line then this indicates that there is no correlation between x and y
variables.
Fig. 6.1: (i) Positive Correlation, (ii) Negative correlation and, (iii) Zero correlation
Since a scatter diagram gives only a rough picture, it can show whether the correlation is positive or negative, but it cannot give the magnitude of the relationship.
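$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\;\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}$$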
If we denote
$$\mu_{11} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$
(this is known as the covariance), then we can write the coefficient of correlation as
$$r = \frac{\mu_{11}}{\sigma_x \sigma_y} \qquad (6.1)$$
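We write the regression coefficient of y on x as
$$b_{yx} = \frac{\mu_{11}}{\sigma_x^2} \qquad (6.2)$$
and the regression coefficient of x on y as
$$b_{xy} = \frac{\mu_{11}}{\sigma_y^2} \qquad (6.3)$$
From equation (6.1) we get
$$r\sigma_x\sigma_y = \mu_{11} \qquad (6.4)$$
Substituting (6.4) in (6.2) we get
$$b_{yx} = \frac{r\sigma_x\sigma_y}{\sigma_x^2} = \frac{r\sigma_y}{\sigma_x} \qquad (6.5)$$
Similarly, substituting (6.4) in (6.3) we get
$$b_{xy} = \frac{r\sigma_x\sigma_y}{\sigma_y^2} = \frac{r\sigma_x}{\sigma_y} \qquad (6.6)$$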
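Multiplying (6.5) and (6.6) we get
$$b_{yx}\, b_{xy} = \frac{r\sigma_y}{\sigma_x}\cdot\frac{r\sigma_x}{\sigma_y} = r^2 \quad\Rightarrow\quad r = \pm\sqrt{b_{xy}\, b_{yx}}$$
where r takes the common sign of the two regression coefficients.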
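The identity b_yx·b_xy = r² can be checked numerically on any data set. The sketch below reuses the data of Example 6.1 only as convenient numbers. Variances and the covariance use divisor n, as in the definitions above.

```python
import math

# Data of Example 6.1, reused here only to check the identity numerically.
x = [1, 3, 4, 6, 8, 9, 11, 14]
y = [1, 2, 4, 4, 5, 7, 8, 9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

mu11 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / n  # covariance
var_x = sum((a - xbar) ** 2 for a in x) / n
var_y = sum((b - ybar) ** 2 for b in y) / n

b_yx = mu11 / var_x                      # regression coefficient of y on x, (6.2)
b_xy = mu11 / var_y                      # regression coefficient of x on y, (6.3)
r = mu11 / math.sqrt(var_x * var_y)      # (6.1)

# r takes the common sign of the two regression coefficients
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(r, r_from_b)
```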
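Writing r in terms of deviations from the means,
$$r = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2}\;\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
Squaring both sides (the factors 1/n cancel), we get
$$r^2 = \frac{\left[\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right]^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}$$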
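The Cauchy–Schwarz inequality states that
$$\left(\sum_{i=1}^{n} a_i b_i\right)^2 \le \sum_{i=1}^{n} a_i^2 \sum_{i=1}^{n} b_i^2 \quad\Rightarrow\quad \frac{\left(\sum_{i=1}^{n} a_i b_i\right)^2}{\sum_{i=1}^{n} a_i^2\,\sum_{i=1}^{n} b_i^2} \le 1 \qquad (6.7)$$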
Define $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$. Substituting these values of $a_i$ and $b_i$ in (6.7), we find that
$$r^2 = \frac{\left[\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})\right]^2}{\left[\sum_{i=1}^{n}(x_i-\bar{x})^2\right]\left[\sum_{i=1}^{n}(y_i-\bar{y})^2\right]} \le 1$$
⇒ r² ≤ 1
Hence r ∈ [−1, +1].
The magnitude |r| of the correlation coefficient is never negative and never exceeds 1. The sign attached to r indicates the direction of the relationship. For example, r1 = 0.9 and r2 = −0.9 have equal magnitude but opposite directions.
Example 6.1:
Find the Pearson correlation coefficient for the following data;
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9
Solution:
x y xy x2 y2
1 1 1 1 1
3 2 6 9 4
4 4 16 16 16
6 4 24 36 16
8 5 40 64 25
9 7 63 81 49
11 8 88 121 64
14 9 126 196 81
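From the above table ∑x = 56, ∑y = 40, ∑xy = 364, ∑x² = 524 and ∑y² = 256.
Now
$$r = \frac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{n\sum x_i^2 - (\sum x_i)^2}\;\sqrt{n\sum y_i^2 - (\sum y_i)^2}} = \frac{8(364) - (56)(40)}{\sqrt{8(524) - 56^2}\;\sqrt{8(256) - 40^2}} = \frac{672}{\sqrt{1056}\,\sqrt{448}} \approx 0.977$$
There is a strong positive correlation between x and y.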
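The computational formula above uses only the five column totals, so it translates directly into code. The sketch below is a minimal illustration; `pearson_r` is our own helper name. It reproduces r ≈ 0.977 for the data of Example 6.1 and also confirms the corrected total ∑y² = 256.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation via the computational (raw-sum) formula."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx**2) * math.sqrt(n * syy - sy**2)
    return num / den

x = [1, 3, 4, 6, 8, 9, 11, 14]
y = [1, 2, 4, 4, 5, 7, 8, 9]
print(round(pearson_r(x, y), 3))
```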
Exercise 6.1
1. If ȳ = 10, find the missing value of y, then calculate the product-moment correlation coefficient using the data given below and interpret your result.
X 10 15 20 25 30 35 40 45
Y 7 9 15 ? 4 5 7 3
2. For each of the following data sets, plot a scatter diagram, and then calculate the
product - moment correlation coefficient:
X 3.2 4.4 4.6 3.4 3.2 4.2 5.2 3.4 3.0 3.2
Y 10.0 6.4 5.4 4.2 8.2 5.4 5.8 9.2 7.0 8.8
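For grouped (bivariate frequency) data, the product-moment correlation coefficient is
$$r = \frac{N\sum f\,x_i y_i - \left(\sum f_x x_i\right)\left(\sum f_y y_i\right)}{\sqrt{N\sum f_x x_i^2 - \left(\sum f_x x_i\right)^2}\;\sqrt{N\sum f_y y_i^2 - \left(\sum f_y y_i\right)^2}}$$
where N = ∑f = ∑f_x = ∑f_y is the total frequency.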
Example 6.2:
Given the following grouped data of the performances of students in both statistics and
calculus, investigate whether there is any relationship between the marks obtained in
calculus and those obtained in statistics. Comment on the result.
Statistics
40 - 49 50 - 59 60 - 69 70 - 79 80 - 89 90 - 99
Calcu
lus
90 - 99 - - - 2 4 4
80 - 89 - - 1 4 6 5
70 - 79 - - 5 10 8 1
60 - 69 1 4 9 5 2 -
50 - 59 3 6 6 2 - -
40 - 49 3 5 4 - - -
Solution:
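Let x denote the statistics mark and y the calculus mark (class mid-points), and put
$$U_x = \frac{x - 64.5}{10} \quad\text{and}\quad U_y = \frac{y - 74.5}{10}.$$
Then,
y Uy | Ux = -2 -1 0 1 2 3 | fy fyUy fyUy² ∑fUxUy
94.5 2 | - - - 2 4 4 | 10 20 40 44
84.5 1 | - - 1 4 6 5 | 16 16 16 31
74.5 0 | - - 5 10 8 1 | 24 0 0 0
64.5 -1 | 1 4 9 5 2 - | 21 -21 21 -3
54.5 -2 | 3 6 6 2 - - | 17 -34 68 20
44.5 -3 | 3 5 4 - - - | 12 -36 108 33
fx | 7 15 25 23 20 10 | 100 -55 253 125
fxUx | -14 -15 0 23 40 30 | 64
fxUx² | 28 15 0 23 80 90 | 236
fxUxUy | 32 31 0 -1 24 39 | 125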
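The totals in the table can be fed into the grouped formula for r. The Python sketch below rebuilds the totals from the cell frequencies, with each dash entered as 0, and evaluates r. It relies on the fact that r is unchanged by the change of origin and (positive) scale from x, y to Ux, Uy. The computation gives r = 16020/√(19504 × 22275) ≈ 0.77, which suggests a fairly strong positive relationship between the two sets of marks. That interpretation is ours; the text leaves the comment to the reader.

```python
import math

# Cell frequencies of Example 6.2: rows are calculus classes (y),
# columns are statistics classes (x); a dash in the table is entered as 0.
u_x = [-2, -1, 0, 1, 2, 3]          # (x - 64.5)/10 for x = 44.5, ..., 94.5
u_y = [2, 1, 0, -1, -2, -3]         # (y - 74.5)/10 for y = 94.5, ..., 44.5
table = [
    [0, 0, 0, 2, 4, 4],    # 90-99
    [0, 0, 1, 4, 6, 5],    # 80-89
    [0, 0, 5, 10, 8, 1],   # 70-79
    [1, 4, 9, 5, 2, 0],    # 60-69
    [3, 6, 6, 2, 0, 0],    # 50-59
    [3, 5, 4, 0, 0, 0],    # 40-49
]

N = sum(map(sum, table))
f_x = [sum(row[j] for row in table) for j in range(len(u_x))]  # column totals
f_y = [sum(row) for row in table]                               # row totals

s_x = sum(f * u for f, u in zip(f_x, u_x))
s_xx = sum(f * u * u for f, u in zip(f_x, u_x))
s_y = sum(f * u for f, u in zip(f_y, u_y))
s_yy = sum(f * u * u for f, u in zip(f_y, u_y))
s_xy = sum(table[i][j] * u_x[j] * u_y[i]
           for i in range(len(u_y)) for j in range(len(u_x)))

r = (N * s_xy - s_x * s_y) / math.sqrt((N * s_xx - s_x**2) * (N * s_yy - s_y**2))
print(N, s_x, s_xx, s_y, s_yy, s_xy, round(r, 3))
```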
Exercise 6.2
The following table gives the ages of husbands and wives living together on the census night of 1961. Investigate whether the age factor counts in the choice of partner.
Age of Husbands
Age of Wives 25 - 35 35 - 45 45 - 55 55 - 65 65 - 75
20 - 30 5 9 3 - -
30 - 40 - 10 25 2 -
40 - 50 - 1 12 2 -
50 - 60 - - 4 16 5
60 - 70 - - - 4 2
relating to abilities, honesty, beauty, etc. This serial order is known as the rank.
Suppose that a group of n individuals is arranged in order of merit or proficiency in two characteristics A and B. In general, the ranks in the two characteristics will differ. For example, if we consider two characteristics of an individual, intelligence and beauty, it does not necessarily follow that a beautiful lady will also be intelligent.
Theorem
Assume that no two individuals are bracketed equal in either classification, so that each of the variables X and Y takes the values 1, 2, …, n. Prove that Spearman's rank correlation coefficient r is given by
$$r = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}, \qquad\text{where } d_i = x_i - y_i.$$
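Proof:
Since each of X and Y takes the values 1, 2, …, n,
$$\bar{x} = \bar{y} = \frac{1}{n}(1 + 2 + 3 + \cdots + n) = \frac{n+1}{2}$$
and
$$\sigma_x^2 = \sigma_y^2 = \frac{1}{n}(1^2 + 2^2 + \cdots + n^2) - \bar{x}^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n^2-1}{12}$$
In general $x_i \ne y_i$. Let
$$d_i = x_i - y_i = (x_i - \bar{x}) - (y_i - \bar{y}) \qquad\text{since } \bar{x} = \bar{y}$$
Squaring and summing over i from 1 to n we get
$$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 + \sum_{i=1}^{n}(y_i-\bar{y})^2 - 2\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$$
Dividing by n,
$$\frac{1}{n}\sum_{i=1}^{n} d_i^2 = \sigma_x^2 + \sigma_y^2 - 2\mu_{11} \qquad (6.8)$$
We know that $\sigma_x^2 = \sigma_y^2$ and $r = \mu_{11}/(\sigma_x\sigma_y)$, so $\mu_{11} = r\sigma_x^2$. Substituting in (6.8),
$$\frac{1}{n}\sum_{i=1}^{n} d_i^2 = 2\sigma_x^2(1 - r) \;\Rightarrow\; r = 1 - \frac{\sum d_i^2}{2n\sigma_x^2} = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}$$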
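The two facts the proof depends on can be checked exactly with rational arithmetic. The first is that the ranks 1, …, n have variance (n² − 1)/12. The second is that, for untied rankings, Pearson's r computed on the ranks equals the Spearman formula. The sketch below is a numerical check for small n and all 120 rankings of five items, not a proof. All function names in it are our own.

```python
from fractions import Fraction
from itertools import permutations

def rank_variance(n):
    """Variance (divisor n) of the ranks 1, 2, ..., n, computed exactly."""
    mean = Fraction(n + 1, 2)
    return sum((Fraction(k) - mean) ** 2 for k in range(1, n + 1)) / n

def pearson_on_ranks(x, y):
    """Pearson r for two rankings of 1..n; sigma_x * sigma_y = rank variance."""
    n = len(x)
    mx, my = Fraction(sum(x), n), Fraction(sum(y), n)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return cov / rank_variance(n)

def spearman_formula(x, y):
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - Fraction(6 * d2, n * (n * n - 1))

# sigma^2 = (n^2 - 1)/12 for every n checked
assert all(rank_variance(n) == Fraction(n * n - 1, 12) for n in range(2, 20))

# For untied ranks, Pearson's r on the ranks equals Spearman's formula.
base = [1, 2, 3, 4, 5]
assert all(pearson_on_ranks(base, list(p)) == spearman_formula(base, list(p))
           for p in permutations(base))
print("checked")
```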
Example 6.3:
Compute the rank correlation coefficient for the following data.
X 70 83 90 65 55 75 80 45
Y 120 130 145 110 135 140 95 100
Solution:
We first arrange the X and Y series in descending order and denote the ranks of x and y
as Rx and Ry respectively.
x Rx y Ry d = R x − R y d2
70 5 120 5 0 0
83 2 130 4 -2 4
90 1 145 1 0 0
65 6 110 6 0 0
55 7 135 3 4 16
75 4 140 2 2 4
80 3 95 8 -5 25
45 8 100 7 1 1
∑d² = 50
Spearman's rank correlation coefficient is given by
$$r = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} = 1 - \frac{6 \times 50}{8(64 - 1)} = 0.405$$
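The ranking step and the formula can be combined in a short sketch. It ranks in descending order, as the example does. Tied values would receive the average of the ranks they occupy, although Example 6.3 has no ties. The helper names `ranks_descending` and `spearman` are our own.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; tied values share the average rank."""
    order = sorted(values, reverse=True)
    rank = {}
    for v in set(values):
        positions = [i + 1 for i, w in enumerate(order) if w == v]
        rank[v] = sum(positions) / len(positions)
    return [rank[v] for v in values]

def spearman(x, y):
    """Spearman's r = 1 - 6*sum(d^2) / (n(n^2 - 1)), assuming no ties."""
    rx, ry = ranks_descending(x), ranks_descending(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1)), d2

X = [70, 83, 90, 65, 55, 75, 80, 45]
Y = [120, 130, 145, 110, 135, 140, 95, 100]
r, d2 = spearman(X, Y)
print(d2, round(r, 3))
```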
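whose ranks are common. This correction factor is added for each group of repeated values, and the corrected Spearman's formula is
$$r = 1 - \frac{6\left[\sum_{i=1}^{n} d_i^2 + \sum_{i=1}^{k}\frac{m_i(m_i^2-1)}{12}\right]}{n(n^2-1)}$$
where k is the number of groups of repeated values and m_i is the number of times the i-th value is repeated.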
Example 6.4:
Calculate the rank correlation coefficient for the following data;
X 20 25 33 17 38 60 25 70
Y 35 30 45 30 20 10 30 50
Solution:
After arranging the X values in descending order, we find that there are two items with 25 as their value. To assign ranks to these equal-valued items, we calculate the average of the ranks they would occupy. If the items were not equal, we would have assigned them ranks 5 and 6. Since they are equal, we assign each of them 5.5, which is the average of 5 and 6. The next item after the two items with value 25 is 20. This item is assigned rank 7, not 6. We carry out a similar exercise for the Y values: the three items with value 30 would occupy ranks 4, 5 and 6, so each is assigned rank 5.
x y Rx Ry d² = (Rx − Ry)²
20 35 7 3 16
25 30 5.5 5 0.25
33 45 4 2 4
17 30 8 5 9
38 20 3 7 16
60 10 2 8 36
25 30 5.5 5 0.25
70 50 1 1 0
∑d² = 81.5
Two values are repeated, so k = 2. The item 25 is repeated two times in the X series, so m1 = 2. In the Y series, the item with value 30 is repeated 3 times, so m2 = 3. Substituting all these in the adjusted formula we get
$$r = 1 - \frac{6\left[81.5 + \frac{2(2^2-1)}{12} + \frac{3(3^2-1)}{12}\right]}{8(8^2-1)} = 1 - \frac{6[81.5 + 0.5 + 2]}{504} = 1 - \frac{504}{504} = 0$$
There is no rank correlation between X and Y.
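The tie-corrected formula can be sketched as follows. Average ranks are assigned to ties, and m(m² − 1)/12 is added for every group of m tied values, exactly as in the adjusted formula. This is an illustration with helper names of our own. It confirms ∑d² = 81.5, a total correction of 0.5 + 2 = 2.5 and r = 0 for the data of Example 6.4.

```python
from collections import Counter

def ranks_descending(values):
    """Rank 1 for the largest value; tied values share the average rank."""
    order = sorted(values, reverse=True)
    rank = {}
    for v in set(values):
        positions = [i + 1 for i, w in enumerate(order) if w == v]
        rank[v] = sum(positions) / len(positions)
    return [rank[v] for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every group of m tied values."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_tied(x, y):
    n = len(x)
    rx, ry = ranks_descending(x), ranks_descending(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    cf = tie_correction(x) + tie_correction(y)
    return d2, cf, 1 - 6 * (d2 + cf) / (n * (n * n - 1))

X = [20, 25, 33, 17, 38, 60, 25, 70]
Y = [35, 30, 45, 30, 20, 10, 30, 50]
print(spearman_tied(X, Y))
```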
Exercise 6.3
For each of the following data sets, compute the Spearman’s rank correlation coefficient
and comment on the result.
(i)
X 68 64 75 50 64 80 75 40 55 64
Y 62 58 68 45 81 60 68 48 50 70
(ii)
X 3.2 4.4 4.6 3.4 3.2 4.2 5.2 3.4 3.0 3.2
Y 10.0 6.4 5.4 4.2 8.2 5.4 5.8 9.2 7.0 8.8
(iii)
X 20 25 33 17 38 60 25 70
Y 35 30 45 30 20 10 30 50
(iv)
X 25 87.5 135 187.5 212.5 290 355 395 445 490
Y 0.90 0.60 1.00 0.50 0.50 0.60 0.40 0.30 0.50 0.425
(v)
X 2 4.6 7.2 4.2 9.6 6.2 8 1
Y 12.4 18.2 33.6 14 38.6 24.8 32 7.6
(vi)
X 1 1.5 2 2 2.5 2.5 3
Y 3.5 3 2.5 2 1.5 1 0.5
3. In a training scheme for young people, the times they took to reach a required
standard of proficiency were measured. The average training time in days for each
age was recorded and the results are shown.
Age, x (years) 16 17 18 19 20 21 22 23 24 25
Average training time, y (days) 8 6 7 9 8 11 9 10 12 11
4. The ranks of the same 15 students in Mathematics and French were as follows,
the two numbers within the brackets denoting the ranks of the same student:
(1,10), (2,7), (3,2), (4,6), (5,4), (6,8), (7,3), (8,1), (9,11), (10,15), (11,9),
(12,5), (13,14), (14,12), (15,13).
Show that Spearman's rank correlation coefficient is 0.51.
5. The marks, X and Y, gained by 1000 students for theory and laboratory work
respectively, are grouped with common class interval of 5 marks for each
variable, the frequencies for the various classes being shown in the table below.
The values of X and Y indicated are the mid-values of the classes. Show that the
product-moment coefficient of correlation is 0.68.
6.
x 42 47 52 57 62 67 72 77 82 Totals
y
52 3 9 19 4 - - - - - 35
57 9 26 37 25 6 - - - - 103
62 10 38 74 45 19 6 - - - 192
67 4 20 59 96 54 23 7 - - 263
72 - 4 30 54 74 43 9 - - 214
77 - - 7 18 31 50 19 5 - 130
82 - - - 2 5 13 15 8 3 46
87 - - - - - 2 5 8 2 17
Totals 26 97 226 244 189 137 55 21 5 1000
UNIT 7
REGRESSION
7.0 Introduction
We have seen that correlation gives us an idea of the magnitude and direction of the relationship between correlated variables. It is now natural to look for a method that helps us estimate the value of one variable when the other is known. This problem is addressed in this unit.
7.1 Objectives
By the end of this unit the learner should be able to:
explain what the slope m and the intercept c mean in the equation
y = mx + c of a straight line.
describe the information provided by a regression equation.
explain when it is appropriate to use the statistical techniques of
regression.
find (using appropriate formulas) the equation of the least squares
regression line and be able to sketch the regression line through the data
points.
interpret the least squares regression line and use it to make predictions.
make inferences about
• the slope and intercept of a simple regression line
• the predicted Y value corresponding to a given X value
the children of a group of short parents is greater than that of the parents.
Galton described this relationship as regression. Nowadays writers increasingly use the term for estimating the unknown values of one variable from the known values of the other variable.
While correlation analysis tests whether two or more phenomena co-vary, regression analysis measures the extent of this relationship, thus enabling us to make predictions. In other words, by regression we mean the average relationship between two or more variables. One of these variables is called the dependent variable, while the others are called independent variables. Regression analysis helps us to establish a functional relationship between two or more variables. Unless stated otherwise, the dependent variable is usually denoted by y and the independent variables by x.
For example, when determining the level of livelihood of families in the city of Nairobi, we know that it depends on several factors, e.g. income. But does income depend on the level of livelihood? If a person living in Mathare slums decides to change his livelihood by moving to the posh Muthaiga estates, will his income change because of this decision? The answer is no, because the level of livelihood depends on income but income does not depend on the level of livelihood. Therefore the level of livelihood is the dependent variable and income is the independent variable.
x and y, the scatter diagram will be more or less concentrated round a curve, which may be called the curve of regression. If the curve is a straight line, the regression is said to be linear. This line describes the average relationship between the two variables and is analogous to the concept of the mean of a series. The regression line is also known as the estimating line, and it gives the best fit, in the least-squares sense, to a given distribution. There are usually two lines of regression: 'x on y' and 'y on x'.
If the straight line is chosen so that the sum of squares of the deviations parallel to the y-axis is minimized, the line is called the line of regression of y on x. Thus, to get the regression of y on x, we minimize
$$S = \sum_{i=1}^{n}(y_i - Y_i)^2$$
where $y_i$ and $Y_i$ are the actual and estimated values respectively. The regression line of y on x gives the best estimate of y for any given value of x.
Figure: Actual and estimated values about the regression line (x-axis, y-axis)
Let S denote the sum of squares of the deviations of the actual values $y_i$ from the estimated values $Y_i$:
$$S = \sum_{i=1}^{n} f_i (y_i - Y_i)^2 = \sum_{i=1}^{n} f_i (y_i - a - b x_i)^2 \qquad\text{since } Y_i = a + b x_i \qquad (7.1)$$
$$\sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} y_i$$
Divide equation (7.6) by $N = \sum_{i=1}^{n} f_i$ to get
$$\frac{1}{N}\sum_{i=1}^{n} f_i x_i y_i - \frac{a}{N}\sum_{i=1}^{n} f_i x_i - \frac{b}{N}\sum_{i=1}^{n} f_i x_i^2 = 0 \qquad (7.7)$$
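We know that
$$\frac{1}{N}\sum_{i=1}^{n} f_i x_i = \bar{x}, \qquad \frac{1}{N}\sum_{i=1}^{n} f_i y_i = \bar{y},$$
$$\frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 = \sigma_x^2 + \bar{x}^2, \qquad \frac{1}{N}\sum_{i=1}^{n} f_i x_i y_i = \mu_{11} + \bar{x}\,\bar{y}.$$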
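Substituting these identities into (7.7) leads to b = µ11/σx². Combined with the first normal equation ȳ = a + b x̄ (not reproduced in this excerpt), this gives a = ȳ − b x̄. The Python sketch below fits the line that way, with optional frequencies f_i. The function `fit_line` and the data in the usage line are hypothetical: the points lie exactly on y = 2 + 3x, so the fit should return a = 2, b = 3.

```python
def fit_line(x, y, f=None):
    """Least-squares line Y = a + b*x with optional frequencies f_i,
    using b = mu11 / sigma_x^2 and a = ybar - b * xbar."""
    if f is None:
        f = [1] * len(x)
    N = sum(f)
    xbar = sum(fi * xi for fi, xi in zip(f, x)) / N
    ybar = sum(fi * yi for fi, yi in zip(f, y)) / N
    # (1/N) sum f x^2 = sigma_x^2 + xbar^2  and  (1/N) sum f x y = mu11 + xbar*ybar
    var_x = sum(fi * xi * xi for fi, xi in zip(f, x)) / N - xbar**2
    mu11 = sum(fi * xi * yi for fi, xi, yi in zip(f, x, y)) / N - xbar * ybar
    b = mu11 / var_x
    a = ybar - b * xbar
    return a, b

# Hypothetical points lying exactly on y = 2 + 3x, with unequal frequencies.
print(fit_line([0, 1, 2, 3], [2, 5, 8, 11], [1, 2, 3, 1]))
```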
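$$\frac{S}{n} = \frac{1}{n}\sum_{i=1}^{n}\left[(y_i-\bar{y}) - \frac{r\sigma_y}{\sigma_x}(x_i-\bar{x})\right]^2$$
Dividing and multiplying by $\sigma_y^2$,
$$\frac{S}{n} = \sigma_y^2\left[\frac{1}{n}\sum_{i=1}^{n}\frac{(y_i-\bar{y})^2}{\sigma_y^2} + \frac{r^2}{n}\sum_{i=1}^{n}\frac{(x_i-\bar{x})^2}{\sigma_x^2} - \frac{2r}{n}\sum_{i=1}^{n}\frac{(x_i-\bar{x})(y_i-\bar{y})}{\sigma_x\sigma_y}\right] = \sigma_y^2\,[1 + r^2 - 2r^2] = \sigma_y^2\,[1 - r^2]$$
so the standard error of the estimate of the regression line of y on x is
$$\text{S.E.}(y) = \sqrt{S/n} = \sigma_y\sqrt{1 - r^2}$$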
On similar lines, the standard error of the estimate of the regression line of x on y is
$$\text{S.E.}(x) = \sigma_x\sqrt{1 - r^2}$$
Example 7.1:
Given the data below, calculate the lines of regression and their standard errors
x 1 2 3 4 5 6 7
y 7 8 9 11 10 13 12
Solution:
Calculating the means of x and y, we find that x̄ = 4 and ȳ = 10.
x y dx = x − 4 dy = y − 10 dx² dy² dxdy
1 7 -3 -3 9 9 9
2 8 -2 -2 4 4 4
3 9 -1 -1 1 1 1
4 11 0 1 0 1 0
5 10 1 0 1 0 0
6 13 2 3 4 9 6
7 12 3 2 9 4 6
Total 28 28 26
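The correlation coefficient is
$$r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}} = \frac{26}{\sqrt{28 \times 28}} = 0.93$$
$$\sigma_x = \sqrt{\frac{\sum d_x^2}{n}} = \sqrt{\frac{28}{7}} = 2, \qquad \sigma_y = \sqrt{\frac{\sum d_y^2}{n}} = \sqrt{\frac{28}{7}} = 2$$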
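The estimated regression equation of x on y is given by
$$x = \bar{x} + \frac{r\sigma_x}{\sigma_y}(y - \bar{y}) = 4 + 0.93\left(\frac{2}{2}\right)(y - 10)$$
x = 0.93y − 5.3
The standard error of the estimate of x on y (using the unrounded r = 26/28) is given by
$$\text{S.E.}(x) = \sigma_x\sqrt{1 - r^2} = 2\sqrt{1 - (26/28)^2} \approx 0.742$$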
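The regression equation of y on x is given by
$$y = \bar{y} + \frac{r\sigma_y}{\sigma_x}(x - \bar{x}) = 10 + 0.93\left(\frac{2}{2}\right)(x - 4)$$
y = 0.93x + 6.28
The standard error of the estimate of y on x is given by
$$\text{S.E.}(y) = \sigma_y\sqrt{1 - r^2} \approx 0.742$$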
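The whole of Example 7.1 can be recomputed in a few lines. The sketch below keeps r unrounded (r = 26/28). This is why its coefficients are 0.9286 rather than 0.93, and both standard errors come out as 2√(27/196) ≈ 0.742.

```python
import math

x = [1, 2, 3, 4, 5, 6, 7]
y = [7, 8, 9, 11, 10, 13, 12]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((a - xbar) ** 2 for a in x)                      # sum of dx^2 = 28
syy = sum((b - ybar) ** 2 for b in y)                      # sum of dy^2 = 28
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))   # sum of dx*dy = 26

r = sxy / math.sqrt(sxx * syy)
sd_x, sd_y = math.sqrt(sxx / n), math.sqrt(syy / n)

# y on x:  y = ybar + r*(sd_y/sd_x)*(x - xbar)
b_yx = r * sd_y / sd_x
a_yx = ybar - b_yx * xbar
# x on y:  x = xbar + r*(sd_x/sd_y)*(y - ybar)
b_xy = r * sd_x / sd_y
a_xy = xbar - b_xy * ybar

se_y = sd_y * math.sqrt(1 - r**2)   # standard error of estimate, y on x
se_x = sd_x * math.sqrt(1 - r**2)   # standard error of estimate, x on y

print(f"r = {r:.4f}")
print(f"y = {b_yx:.4f} x {a_yx:+.4f},  S.E.(y) = {se_y:.4f}")
print(f"x = {b_xy:.4f} y {a_xy:+.4f},  S.E.(x) = {se_x:.4f}")
```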
Exercise 7.1
1. The figures given below depict the production in a sugar factory,
Year 1 3 4 5 6 7 10
Production 67 88 94 85 91 98 90
(i) Fit a linear regression line and tabulate the trend values.
(ii) Estimate the production in year 2,8 and 9.
2. Two random variables have regression lines with equations
3x + 2y = 26, and
6x + y = 31
Find the mean values of x and y and the correlation coefficient between x and y.
8. The table below shows the age A and the strontium ratio S for each of 10 basalt
rock samples, one sample of each age being specifically chosen for the exercise.
A (10² Ma) 1 2 3 4 5 6 7 8 9 10
S 0.710 0.723 0.738 0.751 0.765 0.780 0.793 0.808 0.824 0.840
10. The class teacher of Form IV South in Migwani High school needs to predict the
grades his students will get in the final examination. To do this he decides to look
at the marks gained in the mock examination. He thinks that in Mathematics there is
a linear relationship between these marks. To investigate this he looks at the
results of students from the past years. The mock examination and average final
examination marks are given in the following table.
Mock mark 18 26 28 34 36 42 48 52 54 60
Av. final mark 54 64 54 62 68 70 76 66 76 74
11. The marks, x and y, gained by 1000 students for theory and laboratory work
respectively, are grouped with common class interval of 5 marks for each
variable, the frequencies for the various classes being shown in the table below.
The values of x and y indicated are the mid-values of the classes.
12.
x 42 47 52 57 62 67 72 77 82 Totals
y
52 3 9 19 4 - - - - - 35
57 9 26 37 25 6 - - - - 103
62 10 38 74 45 19 6 - - - 192
67 4 20 59 96 54 23 7 - - 263
72 - 4 30 54 74 43 9 - - 214
77 - - 7 18 31 50 19 5 - 130
82 - - - 2 5 13 15 8 3 46
87 - - - - - 2 5 8 2 17
Totals 26 97 226 244 189 137 55 21 5 1000
UNIT 8
PROBABILITY
8.0 Introduction
Probability theory provides a measure of the occurrence of chance events. It is based on the belief that the same set of causes is always accompanied by the same effect, that is, that the future will be like the past. Strictly, it is a matter of belief, not certainty, to which we give expression in the concept of probability. The terms probability, chance and likely all convey the same message: the event is not certain to take place, or there is uncertainty about its happening. We make use of probability in our daily life; for example, somebody might say that "it will probably rain today" or "you are probably right". Such statements refer to probability. An individual's approach to probability depends upon the nature of his interest in the concept.
8.1 Objectives:
By the end of this unit you should be able to:
define probability, sample space, simple and compound event, mutually
exclusive events, independent events and conditional probability.
define probability of an event in classical and axiomatic approach.
calculate conditional probability.
apply addition and multiplication laws of probability.
state and prove Bayes' theorem on conditional probability.
4. Exhaustive Events
A set of events is said to be exhaustive when it includes all possible outcomes of a trial.
It means that all the possible events that can happen are included in the study of
probability. For example;
(a) In an experiment of tossing of a coin there are two possible
outcomes, head or tail. Thus the number of exhaustive cases here are two.
(b) In an experiment of throwing a die, there are six exhaustive
cases, since any of the six faces can appear.
(c) In an experiment of throwing two dice, the number of exhaustive cases is 6² = 36, since any of the six numbers on the first die can be associated with any of the six numbers on the second die. (The outcomes giving a total of five are shown in square brackets.)
(1,1) (1,2) (1,3) [1,4] (1,5) (1,6)
(2,1) (2,2) [2,3] (2,4) (2,5) (2,6)
(3,1) [3,2] (3,3) (3,4) (3,5) (3,6)
[4,1] (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
In general, in the experiment of throwing n dice the number of exhaustive cases is 6ⁿ.
5. Favourable Cases
The number of cases favorable to an event in a trial is
the number of outcomes, which entail the happening of
the event. For example in the experiment of throwing
two dice, the number of ways of getting a sum of five
are: [1,4] [4,1] [2,3] [3,2], i.e. four ways.
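Listing a sample space by machine is a useful check on counts like these. The sketch below enumerates the 36 ordered outcomes for two dice with `itertools.product` and keeps those whose total is five. The same call with `repeat=n` gives the 6ⁿ outcomes for n dice.

```python
from itertools import product

# All ordered outcomes (first die, second die) when two dice are thrown.
outcomes = list(product(range(1, 7), repeat=2))
favourable = [o for o in outcomes if sum(o) == 5]
print(len(outcomes), favourable)
```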
A compound event means the joint occurrence of two or more simple events. For
example, at least one head appears if three coins are tossed or drawing a red and
then a green ball in two draws from a bag containing six red and ten green balls.
However if the first card is not replaced, then the result of the second draw is
dependent on the first draw.
8 Problem of Cards
A pack of cards contains 52 cards, of which there are 4 cards each of Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Jack, Queen, King and Ace.
There are 4 suits of 13 cards each: Hearts, Spades, Diamonds and Clubs.
$$P[A] = \lim_{n \to \infty} \frac{n_A}{n}$$
For example, if a coin, fair or not, is tossed n times and heads shows $n_H$ times, then the probability of heads equals the limit
$$\lim_{n \to \infty} \frac{n_H}{n}$$
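A simulation can illustrate, though not prove, the relative-frequency idea: as n grows, n_H/n settles near the underlying probability of heads. The sketch below uses Python's `random.Random` with a fixed seed so that runs are repeatable. The function name and the probability 0.5 are assumptions made for the illustration, and a biased coin can be simulated by changing `p_heads`.

```python
import random

def relative_frequency_of_heads(n, p_heads=0.5, seed=1):
    """Simulate n tosses of a coin with P(heads) = p_heads; return n_H / n."""
    rng = random.Random(seed)            # fixed seed so the run is reproducible
    n_heads = sum(rng.random() < p_heads for _ in range(n))
    return n_heads / n

for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```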
Definition: If we have n items and we want to select or arrange them r at a time, we denote the total number of ways this is possible by ⁿCr for combinations (order ignored) and by ⁿPr for permutations (order counted). Then
$$^{n}C_{r} = \frac{n!}{(n-r)!\,r!} \qquad\text{and}\qquad ^{n}P_{r} = \frac{n!}{(n-r)!}$$
Example 8.1:
To arrange the letters A, B, C in pairs, n = 3 and r = 2. The total number of ways this can be done, ignoring order, is
$$^{3}C_{2} = \frac{3 \times 2 \times 1}{1 \times 2 \times 1} = 3$$
These are AB, AC and BC.
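The two counting formulas translate directly into factorial arithmetic. The sketch below implements them with integer division (the hypothetical helpers `n_c_r` and `n_p_r`). It compares them with Python's built-in `math.comb` and `math.perm`, which are available from Python 3.8. For Example 8.1 it gives 3 unordered pairs and 6 ordered ones.

```python
import math

def n_c_r(n, r):
    """Combinations: n! / ((n - r)! r!)."""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))

def n_p_r(n, r):
    """Permutations: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

print(n_c_r(3, 2), n_p_r(3, 2))          # pairs from A, B, C: unordered, ordered
print(math.comb(3, 2), math.perm(3, 2))  # built-in equivalents (Python 3.8+)
```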
Example 8.2:
A bag contains 7 white, 6 red and 5 black balls. A ball is drawn at random. Find the
probability that it will be red?
Solution:
Number of favorable cases of getting a red ball is equal to 6.
Total number of exhaustive cases =7+6+5 =18.
Let A be the event of getting a red ball then,
$$P(A) = \frac{\text{Number of favourable cases}}{\text{Total number of exhaustive cases}} = \frac{6}{18} = \frac{1}{3}$$
Example 8.3:
A bag contains 4 red and 3 blue balls. Two draws of two balls are made. Find the chance
that the first draw gives two red balls and the second draw two blue balls when the balls
are replaced after the first draw.
Solution:
Let A be the event of drawing two red balls and B be the event of drawing two blue balls.
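The bag contains a total of 7 balls, so 2 balls can be drawn in ⁷C₂ ways.
Total number of exhaustive ways = ⁷C₂.
Two red balls can be drawn from the 4 red balls in ⁴C₂ ways. Therefore the number of ways favourable to the event A of drawing two red balls is ⁴C₂.
$$P(A) = \frac{^{4}C_{2}}{^{7}C_{2}} = \frac{6}{21} = \frac{2}{7}$$
On similar lines, it can be shown that
$$P(B) = \frac{^{3}C_{2}}{^{7}C_{2}} = \frac{3}{21} = \frac{1}{7}$$
Since the balls are replaced after the first draw, the two draws are independent, and the required probability is
$$P(A)\,P(B) = \frac{2}{7} \times \frac{1}{7} = \frac{2}{49}$$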
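Exact fractions make the arithmetic of Example 8.3 easy to check. The sketch below uses `fractions.Fraction` and `math.comb`. It multiplies the two probabilities on the stated assumption that, with replacement, the draws are independent.

```python
from fractions import Fraction
from math import comb

p_two_red = Fraction(comb(4, 2), comb(7, 2))    # P(A)
p_two_blue = Fraction(comb(3, 2), comb(7, 2))   # P(B)
# With replacement the two draws are independent, so the probabilities multiply.
print(p_two_red, p_two_blue, p_two_red * p_two_blue)
```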
Example 8.4:
Five cards are drawn at random from a well-shuffled pack of cards. Find the probability
that;
(a) 4 are aces
(b) There are 4 aces and 1 king
Solution:
The total number of exhaustive cases = ⁵²C₅.
(a) 4 aces can be drawn in ⁴C₄ ways, and the remaining card can be drawn from the other 48 cards in ⁴⁸C₁ ways.
Let A be the event of getting 4 aces. Every way of drawing the 4 aces can be combined with every way of drawing the remaining card, so the two counts multiply. Hence
Total number of ways favourable to the event A
= {No. of ways of getting 4 aces} × {No. of ways of getting the remaining card}
= ⁴C₄ × ⁴⁸C₁
$$P(A) = \frac{^{4}C_{4} \times {}^{48}C_{1}}{^{52}C_{5}} = \frac{48}{2598960} = \frac{1}{54145}$$
(b) Let B be the event of getting 4 aces and 1 king.
Favourable ways of drawing the king = ⁴C₁
Favourable ways of drawing the 4 aces = ⁴C₄
Each way of drawing the king can be combined with each way of drawing the aces, so the number of favourable ways of drawing 1 king and 4 aces = ⁴C₄ × ⁴C₁.
$$P(B) = \frac{^{4}C_{4} \times {}^{4}C_{1}}{^{52}C_{5}} = \frac{4}{2598960} = \frac{1}{649740}$$
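The card counts in Example 8.4 can be verified the same way. The sketch below counts hands with `math.comb` and keeps the probabilities as exact fractions. The multiplication of counts reflects the multiplication principle of counting used in the solution.

```python
from fractions import Fraction
from math import comb

total = comb(52, 5)                                       # all five-card hands
p_four_aces = Fraction(comb(4, 4) * comb(48, 1), total)   # (a) 4 aces + any other card
p_four_aces_one_king = Fraction(comb(4, 4) * comb(4, 1), total)  # (b) 4 aces + 1 king
print(total, p_four_aces, p_four_aces_one_king)
```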
Exercise 8.1
1. A bag contains 6 white and 9 black balls. Two draws of 4 balls are made and the
balls replaced after the first draw. Find the probability of getting 4 white balls in the
first draw and 4 black balls in the second draw.
2. Find the probability that five cards drawn from a well-shuffled pack are:
(i) 3 Tens and 2 Jacks
(ii) 3 from one suit and 2 from another.
3. In a gambling den, a game of Bridge is being played. What is the probability that
player ‘A’ will hold all the four kings.
HINT: In a game of Bridge 13 cards are distributed to each of the four players.
⇒ P(S ∪ φ) = P(S)
P(S) + P(φ) = P(S)
⇒ P(φ) = 0
Theorem 2: If A denotes an event and Aᶜ its complement, then P(Aᶜ) = 1 − P(A).
Proof:
Events A and Aᶜ are mutually exclusive, and A ∪ Aᶜ = S. Therefore
$$P(A \cup A^c) = P(A) + P(A^c) = P(S) = 1$$
$$\Rightarrow P(A^c) = 1 - P(A)$$
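Theorem 3: The probability of the union of any two events A1 and A2 is given by
$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2)$$
where A1 and A2 need not be mutually exclusive.
Proof:
Figure: Venn diagram of A1 and A2 in S, showing the regions A1 ∩ A2ᶜ, A1 ∩ A2 and A1ᶜ ∩ A2.
We have $A_1 \cup A_2 = A_1 \cup (A_1^c \cap A_2)$, where $A_1$ and $A_1^c \cap A_2$ are mutually exclusive. Hence
$$P(A_1 \cup A_2) = P(A_1) + P(A_1^c \cap A_2) \qquad\text{(using Axiom 3)}$$
$$= P(A_1) + P(A_1^c \cap A_2) + P(A_1 \cap A_2) - P(A_1 \cap A_2)$$
$$= P(A_1) + P(A_2) - P(A_1 \cap A_2)$$
since $A_2 = (A_1^c \cap A_2) \cup (A_1 \cap A_2)$, the two parts being mutually exclusive.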
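Theorem 3 can be checked on a small, equally likely sample space. The sketch below uses one throw of a fair die. A1 ("an even number") and A2 ("a number greater than 3") are illustrative events of our own choosing; they overlap in {4, 6}. Both sides of the addition law come out as 2/3.

```python
from fractions import Fraction

S = set(range(1, 7))                 # sample space for one throw of a fair die
A1 = {x for x in S if x % 2 == 0}    # illustrative event: an even number
A2 = {x for x in S if x > 3}         # illustrative event: a number greater than 3

def P(event):
    """Classical probability: favourable cases / exhaustive cases."""
    return Fraction(len(event), len(S))

lhs = P(A1 | A2)                     # P(A1 union A2)
rhs = P(A1) + P(A2) - P(A1 & A2)     # addition law
print(lhs, rhs)
```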
$$\cdots + (-1)^{n-1} P(A_1 \cap A_2 \cap \cdots \cap A_n)$$
= P(A ) × P(B).
Now the conditional probability P(A/B) refers to the sample space of $N_B$ occurrences, of which $N_{AB}$ occurrences pertain to the occurrence of A when B has already occurred. Thus
$$P(A/B) = \frac{N_{AB}}{N_B}, \qquad\text{and similarly}\qquad P(B/A) = \frac{N_{AB}}{N_A}.$$
Now
$$P(A \cap B) = \frac{N_{AB}}{N} = \frac{N_{AB}}{N_A} \times \frac{N_A}{N} = P(B/A) \times P(A).$$
On similar lines
$$P(A \cap B) = \frac{N_{AB}}{N} = \frac{N_{AB}}{N_B} \times \frac{N_B}{N} = P(A/B) \times P(B)$$
$$\therefore\; P(B/A)\,P(A) = P(A/B)\,P(B)$$
$$P(H_i/E) = \frac{P(H_i \cap E)}{\sum_{i=1}^{n} P(H_i \cap E)}$$
where i = 1, 2, 3, …, n and $P(E) = \sum_{i=1}^{n} P(H_i \cap E)$.
Proof:
Consider the following diagram.
Figure: The sample space S partitioned into H1, H2, …, Hn, with the event E meeting each Hi in Hi ∩ E.
Since $H_1 \cup H_2 \cup \cdots \cup H_n = S$,
$$E = E \cap S = (H_1 \cap E) \cup (H_2 \cap E) \cup \cdots \cup (H_n \cap E)$$
Since the $H_i$ are mutually exclusive, so are the events $H_i \cap E$. Therefore
$$P(E) = P(H_1 \cap E) + P(H_2 \cap E) + \cdots + P(H_n \cap E)$$
Using the law of conditional probability we have:
$$P(H_i/E) = \frac{P(H_i \cap E)}{P(E)} \;\Rightarrow\; P(H_i/E) = \frac{P(H_i \cap E)}{\sum_{i=1}^{n} P(H_i \cap E)}$$
Example 8.5:
In a factory, machine A produces 30% of the output, machine B 25% and machine C the remaining 45%. 1% of the output of machine A is defective, as are 1.2% of B's output and 2% of C's. In a day's run the three machines produce 10,000 items. An item drawn at random from a day's output is defective. What is the probability that it was produced by
(i) Machine A?
(ii) Machine B?
(iii) Machine C?
Solution:
Let E: defective item.
H1: Event that A produces the item.
H2: Event that B produces the item.
H3: Event that C produces the item.
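Then P(H1/E) is the probability that the item was produced by machine A given that it is defective; P(E/H1) is the probability that an item produced by machine A is defective; and P(E ∩ H1) is the probability of the event that the item was produced by A and is defective.
P(H1) = 0.3, P(H2) = 0.25, P(H3) = 0.45
P(E/H1) = 0.01, P(E/H2) = 0.012, P(E/H3) = 0.02
Hence P(E ∩ H1) = 0.003, P(E ∩ H2) = 0.003, P(E ∩ H3) = 0.009, and P(E) = 0.015. By Bayes' theorem,
(i) P(H1/E) = 0.003/0.015 = 0.2
(ii) P(H2/E) = 0.003/0.015 = 0.2
(iii) P(H3/E) = 0.009/0.015 = 0.6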
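Bayes' theorem amounts to three steps: form the joint probabilities P(Hi ∩ E) = P(Hi)P(E/Hi), add them to get P(E), then divide. The sketch below follows those steps for Example 8.5 using exact fractions, so the posteriors 1/5, 1/5 and 3/5 appear without rounding. The dictionary keys are just labels for the machines.

```python
from fractions import Fraction

prior = {"A": Fraction("0.30"), "B": Fraction("0.25"), "C": Fraction("0.45")}          # P(H_i)
p_defective = {"A": Fraction("0.01"), "B": Fraction("0.012"), "C": Fraction("0.02")}   # P(E/H_i)

joint = {m: prior[m] * p_defective[m] for m in prior}   # P(H_i and E) = P(H_i) P(E/H_i)
p_E = sum(joint.values())                               # total probability of a defective
posterior = {m: joint[m] / p_E for m in prior}          # Bayes: P(H_i/E)

print(p_E, posterior)
```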
Exercise 8.2
4. If 10% of the rivets produced by a machine are defective, what is the probability that
out of 5 rivets chosen at random;
(i) None will be defective?
(ii) One will be defective?
(iii) At least two will be defective?
5. Kamau speaks the truth in 75% of cases and Otieno in 80%. In what percentage of cases are they likely to contradict each other in stating the same fact?
6. The contents of Urns I, II and III are as follows:
45 - 50 230
50 - 55 112
55 - 60 30
60 - 65 16
65 - 70 7
An individual is taken at random from the above group. Find the probability that his
wages;
(i) are under 40.
(ii) are 55 and over.
(iii) are either between 45 – 50 or 35 – 40.
14. A card is selected at random from a normal set of 52 playing cards. Let Q be the
event that the card is a queen and D the event that the card is a diamond. Find:
(i) P(Q ∩ D )
(ii) P(Q ∪ D )
(iii) P(Qᶜ ∪ D)
(iv) P(Q ∩ Dᶜ)
15. The events A and B are such that P(A ) = P(B) = 2P(A ∩ B) . Given that
P(A ∪ B) = 0.6 , find
(i) P(A ∩ B)
(ii) P(A )
(iii) P(Aᶜ ∩ Bᶜ)
(iv) P(A ∩ Bᶜ)
16. The motherboard for a particular computer is manufactured at one of the three
factories A, B, C and then delivered to the main assembly line. Factory A supplies
45% of the total number of motherboards to the line, factory B 30% and factory
C 25%. Of the motherboards manufactured at factory A, 2% are faulty and the
corresponding percentages for factories B and C are 4% and 3% respectively.
Let A, B and C represent the events that a motherboard
chosen at random from the assembly line was
manufactured at factory A, B or C respectively and let
F denote the event that this motherboard is faulty.
(i) Calculate P(A ∩ F) , P(B ∩ F) and P(C ∩ F)
(ii) Find the probability that a motherboard selected at random from the
main assembly line is faulty.
17. A basket contains 6 white and 4 black balls. A ball is picked from the basket at
random and retained and then a second ball is picked out. Find the probability
that:
(i) Both balls are white
(ii) The balls are of different colours
(iii) The second ball is white given that the first one is black.
18. In Form III of Kisumu High School 55% of the students are boys. Of the boys
80% come from Nyakach constituency but only 75% of the girls come from this
constituency. The area Member of Parliament wishes to meet 4 student
representatives from this school. The headmaster decides to select one of the
representatives from Form III.
(i) Find the probability that a randomly chosen student comes from
Nyakach constituency.
(ii) Find the probability that a randomly chosen student does not come
from Nyakach constituency.
19. In Form IV of Mtwapa High School, 60 students are studying at least one of the three subjects Geography, French and Accounting. Of these, 25 are studying
Geography, 26 are studying French, 44 are studying Accounting, 10 are studying
Geography and French, 15 are studying French and Accounting and 16 are
studying Geography and Accounting.
(i) Find the probability that a student chosen at random from those
studying Accounting is also studying French.
(ii) Are the events “studying Geography” and “studying French”
independent? Give reasons.
(iii) A student is chosen at random from all 60 students. Find the
probability that the chosen student is studying all the three subjects.
20. Two cards are drawn at random from a well-shuffled pack of 52. Show that the chance of drawing two aces is 1/221.
21. Show that the chance of throwing a 6 at least once in two throws of a die is 11/36.
22. A and B toss a coin alternately on the understanding that the first to obtain heads wins the toss. Show that their respective chances of winning are 2/3 and 1/3. Now suppose they embark on throwing two dice, the first to throw 9 being awarded the prize. Show that their chances of winning are in the ratio 9 : 8.
23. Eight coins are thrown simultaneously. Show that the chance of obtaining at least six heads is 37/256.
24. Three men toss in succession for a prize to be given to the one who first obtains heads. Show that their chances of winning are 4/7, 2/7 and 1/7.