SMA 160 Probability and Statistics 1

computer science (Kenyatta University)

KENYATTA UNIVERSITY
INSTITUTE OF OPEN LEARNING
SCHOOL OF PURE AND APPLIED SCIENCES

SMA 160: Probability & Statistics I

E. G. Njenga
S. M. Karuku
Department of Mathematics


Preface

This module introduces statistical methods as applied to collecting, tabulating, analyzing,
presenting, and interpreting data. Topics covered include frequency distributions,
measures of central tendency, measures of dispersion, elementary probability theory,
regression and correlation. This is a basic course for students in business, behavioral and
social science and is part of the foundation for all Probability and Statistics courses in
later years. The purpose of this module is to introduce the student to statistics in a way
that will make the student aware of the techniques of statistics as they apply to the
solutions of practical problems in various fields. The module will be presented with
particular attention to statistical vocabulary, problem solving, and point of view. The
module is made up of eight units. Unit one introduces the concept of statistics, justifies
the study of statistics and discusses some of the areas where statistics is applied. Unit two
explores the graphical and tabular representation of data. Various methods of data
summarization and interpretation of statistical graphs, charts and tables are considered.
Units three and four explore numerical methods for ordering and presenting quantitative
data. Whereas Unit three considers the measures of location, Unit four discusses the
spread or variation within data. The symmetry and the peakedness of the distribution of a
sample or population are considered in Unit five where description of the shape of the
distribution is looked at. Units six and seven attend to the problem of bivariate data, with
the former addressing the question of the degree and strength of relationship between two
data sets and the latter dealing with the issue of prediction of one variable given the other.
The last unit, Unit eight, considers the problem of chance/uncertainty
mathematically. Here the concept of probability and its application in real life is dealt
with. The revision exercises at the end of each of Units 2-8 help ensure that the student has
grasped the subject matter of the unit.

E. G. Njenga
S. M. Karuku
Department of Mathematics


TABLE OF CONTENTS
Unit 1: Introduction to Statistics
Unit 2: Data Presentation
Unit 3: Measures of Central Tendency
Unit 4: Measures of Dispersion
Unit 5: Moment, Skewness and Kurtosis
Unit 6: Correlation
Unit 7: Regression
Unit 8: Probability
Answers to Exercises


UNIT 1

INTRODUCTION TO STATISTICS
1.0 Introduction
In our everyday life, we come across different types of quantitative information in
newspapers, magazines, over radio and television. For example, we may hear or read that
the infant mortality rate had decreased at the rate of 15% per annum during the period
1997-1998, the population of Kenya had increased at the rate of 3% per year during the
period 1989-1999, the number of admissions in public universities had gone up by, say,
4% during 1998-99 as compared to 1996-97, etc. We would like to know what these
figures mean. This quantitative information or expression is called statistical data or
statistics. The unit begins with a discussion of the meaning and scope of statistics. This is
followed by definitions of terms commonly encountered in statistics. We also outline the
stages involved in any statistical enquiry. The different types of variables and levels of
measurement are discussed. We also introduce some mathematical symbols commonly
used in statistics. At the end of the unit, a revision exercise is provided to assist the
student in reviewing the main ideas of the unit and practising the use of the
mathematical symbols considered.

Prerequisite
Before starting this Unit you should have successfully completed SMA 102: Basic
Mathematics.

1.1 Objectives
By the end of this unit, you should be able to:
• distinguish between statistical data and statistical methods.
• classify statistical studies as either descriptive or inferential.
• define the terms data, population, sample and variable.
• identify the population and the sample in an inferential study.
• explain what is meant by a representative sample.
• explain the different stages of statistical enquiry.
• identify data as qualitative or quantitative, discrete or continuous, and
  either nominal, ordinal, interval or ratio.
• use the summation and multiplication symbols appropriately.
Learning Style
To achieve what is expected of you:
• allocate sufficient study time.
• briefly revise the prerequisite material.
• attempt most of the exercises given, including the revision exercises.

1.2 Need and Scope of Statistics


Although it would be impossible to have a unifying definition of the term statistics, the
following section will expound on the scope of the discipline and its relevance in our
day-to-day life.

1.2.1 Scope of Statistics


The term statistics can be used in at least two different ways.
(i) In one sense statistics refers to a collection of numerical facts summarizing
information that has been collected from several observations or from other
numerical data. For example, the number of accidents per week recorded by
the traffic department. In this sense, statistics are observations organized into
numerical form.

(ii) In the second sense, statistics refers to a set of tools for dealing with numerical
facts. Thus in this sense, statistics is a set of tools used to collect, organize,
present, analyze and interpret numerical facts or observations to make
decisions.

It is the second definition that constitutes the subject matter of this module. Within this
second definition, a distinction is often made between the two functions of the statistical
method.


1. Descriptive statistics – consists of methods for organizing, displaying and
describing information in a convenient, usable and understandable form. It displays
information in tables, graphs and summary measures so as to illustrate the main
features.

2. Inferential statistics – concerned with inferring characteristics about the general
population based upon the data from a portion of the population. It uses procedures
to arrive at broader generalizations or inferences from the portion of the population
to the entire population.

The second function is beyond the scope of this module. As such we will deal with
descriptive statistics.

1.2.2 Need for Statistical Data


There is need of statistical data in every walk of life. No field of study is complete
without the supporting quantitative information about that field. No government
department can function well without the support of statistical data. The following are
some of the uses of statistical data in our everyday life.
(i) Prediction
One of the most important uses of statistical data is prediction. The purpose is to
forecast the unknown value of some attribute of a system based on known values of other
attributes.
(ii) Planning
No economic planning is possible without the aid of statistical data. Targets of predictor
cannot be fixed unless we have data about available resources and requirements of, say, a
country.
(iii) Evaluation
We require statistical data not only to implement plans but also to know whether the
implementation has been proper or not.


(iv) Justification of an assertion
A conjecture or a supposition is first developed on the basis of what we observe in real
life. It will, however, remain just that, a hypothesis, unless evidence accumulates in its
favour. In most cases, the hypothesis will be supported or refuted by the statistical
data relative to the observations.

1.3 Definition of terms commonly used in Statistics


The following is a list of statistical terms that we shall frequently be using in this
course.
Variable – any characteristic (property or attribute) that differs or varies
from one observation to the next. A variable can assume different values.
Thus for example, weight, height and age are variables since they take on
different values when different individuals are observed.
Measurement – a system for assigning values to observations in a
consistent and reproducible way.

Note: In the plural sense, measurements refer to the values obtained as a
result of the act of measuring. Another term for measurements is observations.

Data - a set of numerical information obtained from enumeration or
measurement.

Note: The word data is a plural noun and always takes a plural verb, as in
“the data were analyzed”. The singular sense (datum) is rarely used.

Population – a collection (or set) of all individuals or items about which we
want information. For example, suppose we want to know the age of second
year university students and the amount of loan these students received last
year from the Higher Education Loans Board. The population in this case
would be all second year university students who received the loan.


Parameter – a measure of some population characteristic.


Sample – a subset/portion of a population on which observations are made
in order to draw conclusions about the population.
Statistic – a measure of some sample characteristic. The main purpose of
statistical analysis is to draw conclusions about the real world by computing
useful statistics.

Note: This implies that there is a third meaning for the term statistics,
which distinguishes a statistic from a parameter.

Random sample - a subset of a population selected in such a way that each
member of the population has an equal chance of being selected. Random
selection makes it likely that the selected objects are representative of the
population, which gives a basis for generalization about the population.
Census – when all members of a population are included in a sample. That
is, the data consist of measurements of variables taken on each and every
member of the population.
Inference – a generalization, prediction, decision, or conclusion about a
population characteristic based on sample data.
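
The distinction between a parameter, a statistic, a random sample and a census can be illustrated with a short Python sketch. The ages below are hypothetical values invented for this illustration, and the names population, sample and census are assumptions of the sketch, not part of the module.

```python
import random
import statistics

# Hypothetical population: ages (in years) of 12 second-year students.
population = [19, 20, 20, 21, 21, 21, 22, 22, 23, 24, 25, 26]

# Parameter: a measure computed from every member of the population.
population_mean = statistics.mean(population)

# Random sample: each member has an equal chance of being selected.
rng = random.Random(160)              # fixed seed so the run is repeatable
sample = rng.sample(population, k=4)

# Statistic: the same measure computed from the sample only.
sample_mean = statistics.mean(sample)

# Census: the "sample" includes every member of the population,
# so the statistic coincides with the parameter.
census = list(population)
census_mean = statistics.mean(census)

print("parameter (population mean):", population_mean)
print("random sample:", sample)
print("statistic (sample mean):", sample_mean)
print("census mean:", census_mean)
```

The sample mean will generally differ from the population mean; using the statistic to say something about the parameter is what the definition of inference above describes.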

1.4 Stages involved in any Statistical enquiry


The following are the main steps involved in any statistical investigation.

(i) Definition/Formulation of the Problem


In our everyday life, questions arise about a phenomenon. The role of a researcher is to
formulate the problem in clear statistical terms. At this stage, the purpose and the
population for study are determined.

(ii) Planning
The researcher creates a design of the study/analysis. A decision on how to collect the data
is made. To obtain an accurate and complete count of the population, the statistician
needs to decide on the precise nature of the items to be enumerated or measured. In most
cases, however, constraints like time and resources make a census unrealistic or
impossible. The only option left in that case is for the statistician to select an unbiased
(representative) sample from the population.

(iii) Data collection


Once a decision has been made on the type of data appropriate for the problem at hand,
the next stage is the actual collection of data.

(iv) Summarization and Analysis of Data


This step involves recording, organizing, summarizing and analyzing the data. It ensures
that the data are easily interpreted.
(v) Inference and Conclusion
On the basis of the data, practical conclusions about the population are drawn. The
researcher makes a decision concerning the problem formulated in Step (i).

1.5 Sources of Statistical Data


There is a basic distinction in data collection between primary and secondary data.
1.5.1 Primary Data
Primary data are data collected by the immediate user(s) of the data expressly for the
experiment or survey being conducted. That is, the investigator collects data directly from
the respondent (the source of answers to the investigator’s queries). Statistical
information thus collected is called primary data and the source of such information is
called a primary source. For example, if the investigator collects information about
the salaries of public university employees by approaching them directly, then these are
primary data for the investigator.
There are several methods for collecting primary data, including:
(i) Direct Observation


In this method, the investigator (also called the interviewer) studies the facts and collects the
required data.
(ii) Interview
In this method, the investigator or his or her assistants establish contact with the respondents.
An opportune time is agreed upon for a face-to-face interview with the respondents.
Alternatively, the interviewer can use telephone interviews if the duration of the interview
is short.
(iii) Questionnaire method
In this method, a question booklet is prepared and either sent to respondents by post
or delivered to them in person.

What are the advantages of:
(i) direct observation?
(ii) interview?
(iii) questionnaire method?

Qualities of a Good Research Question


• The question must be non-threatening.
• The question must be clear and unambiguous.
• The question must be simple and easy to understand. A question that asks for a
response on more than one dimension will not provide the information you are
seeking.
• It should not be answerable by a simple “Yes” or “No”.
• It should be reasonable and within the experience of the targeted respondents.
Advantages of Using Primary Data
• The investigator can collect the data according to his requirement.
• It is reliable and sufficient for the purpose of investigation.

Disadvantages of Using Primary Data


• It involves a lot of cost in terms of money, time and energy.

Often, with some modification, the same purpose can be served by using data
collected by other persons or agencies.

1.5.2 Secondary Data


Secondary data are data collected by a person or organization other than the
user(s) of the data. They may be useful to you in their original form, or you may have to
change their format to fit your needs. Statistical information thus obtained is called
secondary data. The source of such information is called a secondary source. For
example, if the investigator collects information about the salaries of employees of
public universities from the salary register maintained by the Commission for Higher
Education, then these are secondary data for the investigator. Secondary data are
increasingly available in electronic form.
In general there are two sources of secondary data: published and unpublished sources.

(i) Published sources


These are sources/agencies that collect the data and publish them in the form of either
regular journals or reports. For example, the Central Bureau of Statistics (CBS) in Kenya
collects, analyzes and disseminates socio-economic statistics needed for planning and policy
formulation in the country. The statistics published by this agency include
family expenditure, import/export statistics, production statistics, agricultural statistics,
and population censuses.

(ii) Unpublished sources


These are sources that collect the data for their own use and do not get them published.
For example, some records maintained by universities, research scholars, government and
private offices are generally not published.

Advantages of using secondary data


• Secondary data may provide a context (geographic, temporal, social, etc.) for
primary data. This allows us to see where our primary data 'fit in' to the larger
scheme of things.


• Secondary data may provide validation for primary data, whereby the secondary
data allow us to assess the quality and consistency of the primary data.
• Secondary data may act as a substitute for primary data. In some situations we
may simply not be able to collect data, for reasons of access, cost, or time; or the
data have been collected once and to repeat the collection process would be
undesirable.

The problems of secondary sources


Definitions – When making use of secondary data, the definitions used by those
responsible for its preparation may be different from the researchers’. Suppose,
for example, researchers are interested in rural communities and their average
family size. If published statistics are consulted then a check must be done on
how terms such as "family size" have been defined. They may refer only to the
nuclear family or include the extended family. Even apparently simple terms such
as 'farm size' need careful handling. Such figures may refer to any one of the
following: the land an individual owns, the land an individual owns plus any
additional land he/she rents, the land an individual owns minus any land he/she
rents out, all of his land or only that part of it which he actually cultivates.

Measurement errors – When a researcher conducts fieldwork she/he is possibly
able to estimate inaccuracies in measurement through the standard deviation and
standard error, but these are sometimes not published in secondary sources. The
only solution is to try to speak to the individuals involved in the collection of the
data to obtain some guidance on the level of accuracy of the data. The problem is
sometimes not so much 'error' but differences in levels of accuracy required by
decision makers. When the research has to do with large investments in, say, food
manufacturing, the management will want to set very tight margins of error in
making market demand estimates. In other cases, having a high level of accuracy
is not so critical. For instance, if a food manufacturer is merely assessing the
prospects for one more flavour for a snack food already produced by the company
then there is no need for highly accurate estimates in order to make the investment
decision.

Reliability – The reliability of published statistics may vary over time. It is not
uncommon, for example, for the systems of collecting data to have changed over
time but without any indication of this to the reader of published statistics. The
government may change geographical or administrative boundaries, or the basis
for stratifying a sample may have altered. Other aspects of research methodology
that affect the reliability of secondary data are the sample size, response rate,
questionnaire design and modes of analysis.

Time frame – Most censuses take place at 10-year intervals, so data from this and
other published sources may be out-of-date at the time the researcher wants to
make use of the statistics. The time period during which secondary data were first
compiled may have a substantial effect upon the nature of the data.

Source bias – Researchers have to be aware of vested interests when they consult
secondary sources. Those responsible for their compilation may have reasons for
wishing to present a more optimistic or pessimistic set of results for their
organization. For example, officials responsible for estimating food shortages
may be tempted to exaggerate figures before sending aid requests to potential donors.
Similarly, commercial organizations have been known
to inflate estimates of their market shares.

1.6 Types of Variables


Variables are classified according to their use in statistics. They are either quantitative
(numerical) or qualitative (categorical).

1.6.1 Quantitative (numerical) Variables


A quantitative variable is one for which the resulting observations can be measured
numerically, because they possess a natural order or ranking. Examples of quantitative
variables include weight, age, height and price. Quantitative variables are further
classified as discrete or continuous.
A continuous variable is one for which all values in some range are possible.
Examples of continuous variables
• Time taken to cover a distance of 100 metres by an athlete.
• Volume of water in a fish pond at any time.
• Height of plants in a greenhouse.
• Weight of newborn babies in a maternity ward.

A discrete variable on the other hand is one for which the possible values form a finite (or
countably infinite) set of numbers. That is, a variable that assumes values that can be
counted.

Examples of discrete variables
• Number of children per family.
• Number of deaths in a hospital per day.
• Number of patients in a doctor’s surgery.

Note: Typically, continuous variables arise from physical measurement,
while discrete variables arise from physical counting.

1.6.2 Qualitative (categorical) Variables


A qualitative variable is a variable that cannot assume numerical value but can be
classified into nonnumeric categories.

Examples of qualitative variables
• Gender – male or female
• Religious preference – Christianity, Islam or Hinduism
• Geographic location – East, West, North or South

The following chart illustrates the above classification of variables.


Variable
    Quantitative
        Discrete
        Continuous
    Qualitative

Figure 1: Schematic classification of variables

Exercise 1.1
1. Which of the following variables are qualitative and which are
quantitative?
(a) The nationality of personnel at the World Health Organization
headquarters.
(b) Number of days absent from school due to illness.
(c) The political party people vote for in an election.
(d) The dimensions of the altar.
(e) The lifestyle of a member of the royal family.

2. Which of the following variables are discrete and which are
continuous?
(a) Speed of light in m/s.
(b) Gravitational force.
(c) Number of students who fail in an examination.
(d) Intensity of an earthquake on the Richter scale.
(e) Temperature (in °C).
(f) Number of defective screws in an assembly line.
(g) Price of fuel in a filling station.

1.7 Levels (Scales) of Measurement


Before we can conduct a statistical analysis, we need to measure our variables. Exactly
how to carry out the measurement depends on the type of variable involved in the
analysis. Different types are measured differently. Although procedures for measurement
differ in many ways, they can be classified using a few fundamental categories. The
categories are called scales or levels of measurement. In a given scale all of the
procedures share some properties. There are four levels of measurement, and it is
important to know what level of measurement you are working with, as this partly
determines the arithmetic and statistical operations you can carry out on the data. The four
levels of measurement are nominal, ordinal, interval and ratio. They are described in
the following sections.

1.7.1 Nominal Scale


When measuring using a nominal scale, one simply labels or categorizes the variables.
Persons, things, and events characterized by a nominal variable are not ranked or ordered
by the variable. For example, gender is a nominal variable, in that being male is neither
better nor worse than being female. In the study of student loans, the type of institution is
a nominal variable with two attributes (private and public) to which we might assign
the numbers 0 and 1 or, if we wish, P and G. As another example, if you wanted to
classify a football team into left-footed and right-footed players, you could put all the
left-footed players into a group classified as 1 and all the right-footed players into a group
classified as 2. The numbers 1 and 2 are used for convenience; you could equally use the
letters L and R, or the words LEFT and RIGHT, to label the groups of players. Numbers
are often preferred because text takes longer to type out and takes up more space.

Note: For purposes of data analysis, we can assign numbers to the
attributes of a nominal variable but must remember that the numbers are
just labels and must not be interpreted as conveying the order of the
attributes.


1.7.2 Ordinal Scale


With an ordinal scale, the attributes are ordered. One example is shoe size. Shoes are
assigned a number to represent the size, and larger numbers mean bigger shoes. Unlike the
nominal scale, which just reflects a category or class, the numbers of an ordinal scale show
an ordered relationship between numbered items: we know that a shoe size of 6 is bigger
than a shoe size of 3. Similarly, observations about attitudes are often arrayed into five
classifications, such as greatly dislike, moderately dislike, indifferent to, moderately like,
greatly like. Although the ordinal level of measurement yields a ranking of attributes,
differences between adjacent scale values do not necessarily represent equal intervals on
the underlying scale giving rise to the measurements. That is, no assumptions are made
about the “distance” between the classifications. For example, you can’t say that a shoe
size of 6 is twice as big as a shoe size of 3. So numbers on an ordinal scale represent a
rough and ready ordering of measurements, but the differences or ratios between pairs of
measurements along the scale cannot be assumed to be comparable.
As for the nominal scale, with ordinal scales you can use textual labels instead of
numbers to represent the categories. The job groups of civil servants in some government
departments or the University degree classification – First Class, Second Class Upper
Division, Second Class Lower Division and Pass are classic examples. The Loans Board
could classify the students applying for study loan into social classes say I, II, III, and IV
based on their economic background or level of need.

For data analysis, numbers are assigned to the attributes (for example, greatly dislike
= −2, moderately dislike = −1, indifferent to = 0, moderately like = +1, and greatly like
= +2), but the numbers are understood to indicate rank order and the “distance” between
the numbers has no meaning. Any other assignment of numbers that preserves the rank
order of the attributes would serve as well.
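
The claim that any order-preserving assignment of numbers serves equally well can be checked with a small Python sketch. The responses and the second coding (coding_b) are hypothetical and chosen only for illustration. The sketch suggests that the median category is unchanged by an order-preserving recoding, whereas the arithmetic mean of the codes depends on the arbitrary numbers chosen.

```python
import statistics

# Attitude categories in rank order (from the text).
labels = ["greatly dislike", "moderately dislike", "indifferent to",
          "moderately like", "greatly like"]

# Two codings that preserve the same rank order.
coding_a = dict(zip(labels, [-2, -1, 0, 1, 2]))   # coding used in the text
coding_b = dict(zip(labels, [1, 2, 3, 10, 50]))   # hypothetical alternative

# Hypothetical responses from seven respondents.
responses = ["moderately like", "greatly dislike", "indifferent to",
             "greatly like", "moderately like", "moderately dislike",
             "moderately like"]

def median_label(coding):
    """Return the category whose code is the median of the coded responses."""
    codes = [coding[r] for r in responses]
    middle = statistics.median_low(codes)
    decode = {code: label for label, code in coding.items()}
    return decode[middle]

def mean_code(coding):
    """Return the arithmetic mean of the coded responses."""
    return statistics.mean([coding[r] for r in responses])

print("median category, coding A:", median_label(coding_a))
print("median category, coding B:", median_label(coding_b))
print("mean code, coding A:", mean_code(coding_a))
print("mean code, coding B:", mean_code(coding_b))
```

Under coding A the mean falls between "indifferent to" and "moderately like", while under coding B it falls between "moderately like" and "greatly like", so a conclusion based on the mean would depend on the labels rather than on the data.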

1.7.3 Interval Scale


On an interval scale, measurements are not only classified and ordered, but the distances
between adjacent points on the scale are equal right along the scale from the low end to the
high end. The same distance separates two points next to each other on the scale, no
matter whether they are high or low. For example, consider the Fahrenheit scale of
temperature. The difference between 86° and 66° represents the same temperature
difference as that between 90° and 70°. This is because each 20° interval has the same
physical meaning (in terms of the kinetic energy of molecules). Interval scales, however, do
not have a true zero point: the zero is arbitrary. 0° Fahrenheit does not represent the complete
absence of temperature (the absence of any molecular kinetic energy). Consequently, it
does not make sense to compute ratios of temperature. For example, there is no sense in
which the ratio of 86° and 43° is the same as the ratio of 90° and 45°; no interesting
physical property is preserved across the two ratios. It does not make sense to say that 90°
is “twice as hot” as 45°.

Another example of a variable measured on an interval scale is the calendar year. The
arbitrary zero was assigned to the birth of Christ, and time before this is labelled ‘BC’.

1.7.4 Ratio Scale


The ratio scale of measurement is an interval scale with the additional property that its
zero position indicates the absence of the quantity being measured. Thus measurements
expressed on a ratio scale have a true (absolute) zero point. Examples of ratio scale
include length, weight, age and speed. With ratio variables, it makes sense to form ratios
of observations and it is thus meaningful, for example, to say that a person of 60 years is
twice as old as one of 30.
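
The contrast between interval and ratio scales can be checked numerically. The following Python sketch converts the Fahrenheit temperatures used in Section 1.7.3 to Celsius with the standard conversion formula; the helper name fahrenheit_to_celsius is an assumption of this sketch. Equal intervals stay equal after the change of unit, but the ratio of two temperatures does not survive it, whereas the ratio of two ages does.

```python
import math

def fahrenheit_to_celsius(f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: equal Fahrenheit intervals remain equal after conversion.
diff_1 = fahrenheit_to_celsius(86) - fahrenheit_to_celsius(66)
diff_2 = fahrenheit_to_celsius(90) - fahrenheit_to_celsius(70)
print("intervals equal in Celsius:", math.isclose(diff_1, diff_2))

# Interval scale: the ratio depends on where the arbitrary zero sits.
ratio_f = 90 / 45
ratio_c = fahrenheit_to_celsius(90) / fahrenheit_to_celsius(45)
print("ratio in Fahrenheit:", ratio_f)
print("ratio in Celsius:", round(ratio_c, 2))

# Ratio scale: age has a true zero, so the ratio survives a change of unit.
ratio_years = 60 / 30
ratio_months = (60 * 12) / (30 * 12)
print("age ratio in years and in months:", ratio_years, ratio_months)
```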

Exercise 1.2
What type of scale is being used in each of the following measurements?
(a) Altitude (height above sea level).
(b) The presidential candidate people vote for in an election.
(c) Pain level: None, Mild, Moderate, Severe.
(d) The dimensions of the altar.
(e) Age of an athlete taking part in the Youth Athletics Championship.
(f) Arrival time of a plane at an international airport.

(g) Cause of death: Cancer, Heart attack, Accident, Other.
(h) Size of container: Small, Medium, Large.

1.8 Some Mathematical Symbols Commonly Found in Statistics


In this section, we review some commonly used symbols and notations, which we shall
extensively exploit in this course.

1.8.1 Subscripts
As in other branches of mathematics, statistics uses literal symbols to represent variable
quantities. Subscripts are usually used to distinguish variables that are related in some
sense.
1.8.1.1 Single Subscript
Suppose we take X to be the height (in feet) of a first-year college student. Now, we may
measure the value of X for several students, say 8 of them, getting the set of values
{4.5, 5.2, 5.0, 5.2, 5.4, 4.9, 5.7, 5.8}
These eight values are all values of X, but correspond to different observations of the
variable X. To distinguish symbolically between such alternative values of a single
variable X, it is common to use a subscript notation, Xi, where i = 1 for the first value,
i = 2 for the second, and so on. Thus, for this example,
X1 = 4.5, X2 = 5.2, X3 = 5.0, X4 = 5.2, X5 = 5.4, X6 = 4.9, X7 = 5.7, and X8 = 5.8.
These subscripts are viewed as numerical labels used to distinguish one of a set of values
from the others.
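
In a program, single-subscript notation corresponds to indexing into a list. The Python sketch below stores the eight heights from the example; the helper x is a hypothetical name introduced only to bridge the 1-based subscripts of the text and the 0-based indexing that Python uses.

```python
# The eight observed heights (in feet) from the text.
X = [4.5, 5.2, 5.0, 5.2, 5.4, 4.9, 5.7, 5.8]

def x(i):
    """Return the i-th observation, using the 1-based subscripts of the text."""
    if not 1 <= i <= len(X):
        raise IndexError(f"subscript {i} is outside 1..{len(X)}")
    return X[i - 1]   # Python lists are 0-based, so shift by one

print("first observation:", x(1))
print("third observation:", x(3))
print("eighth observation:", x(8))
```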

1.8.1.2 Multiple Subscripts


Double subscripts can be used to indicate that a collection of variables differs along two
dimensions. For instance in agricultural experiments, a double subscript can be used to
represent the yields in a plot with one subscript indicating the row number and the other
the column number within the same plot. Thus, for example, X5,3 is used to represent the
yield in the fifth row and the third column of the plot (Table 1).


Col1 Col2 Col3 Col4 Col5 Col6 Col7


Row1 X1,1 X1,2 X1,3 X1,4 X1,5 X1,6 X1,7
Row2 X2,1 X2,2 X2,3 X2,4 X2,5 X2,6 X2,7
Row3 X3,1 X3,2 X3,3 X3,4 X3,5 X3,6 X3,7
Row4 X4,1 X4,2 X4,3 X4,4 X4,5 X4,6 X4,7
Row5 X5,1 X5,2 X5,3 X5,4 X5,5 X5,6 X5,7
Row6 X6,1 X6,2 X6,3 X6,4 X6,5 X6,6 X6,7
Row7 X7,1 X7,2 X7,3 X7,4 X7,5 X7,6 X7,7
Row8 X8,1 X8,2 X8,3 X8,4 X8,5 X8,6 X8,7
Row9 X9,1 X9,2 X9,3 X9,4 X9,5 X9,6 X9,7
Table 1: A generic agricultural plot with nine rows and seven columns.

Triple subscripts can be used to indicate that a collection of variables differs along three
dimensions. For instance, suppose that instead of only one plot in the agricultural
experiment considered above, we had, say, four of them each with seven columns and
nine rows as in Table1 above. Then in this case we would use triple subscripts with one
subscript indicating the plot number, the second subscript indicating the row number and
the third subscript indicating the column number. That is, $X_{i,j,k}$ represents the yield in the
jth row and the kth column of the ith plot. Thus, in the figure below, $X_{4,5,3}$ represents the
yield in the fifth row and the third column of the fourth plot.

X111 X112 X113 X114 X115 X116 X117 X211 X212 X213 X214 X215 X216 X217
X121 X122 X123 X124 X125 X126 X127 X221 X222 X223 X224 X225 X226 X227
X131 X132 X133 X134 X135 X136 X137 X231 X232 X233 X234 X235 X236 X237
X141 X142 X143 X144 X145 X146 X147 X241 X242 X243 X244 X245 X246 X247
X151 X152 X153 X154 X155 X156 X157 X251 X252 X253 X254 X255 X256 X257
X161 X162 X163 X164 X165 X166 X167 X261 X262 X263 X264 X265 X266 X267
X171 X172 X173 X174 X175 X176 X177 X271 X272 X273 X274 X275 X276 X277
X181 X182 X183 X184 X185 X186 X187 X281 X282 X283 X284 X285 X286 X287
X191 X192 X193 X194 X195 X196 X197 X291 X292 X293 X294 X295 X296 X297

Plot1 Plot2


X311 X312 X313 X314 X315 X316 X317 X411 X412 X413 X414 X415 X416 X417
X321 X322 X323 X324 X325 X326 X327 X421 X422 X423 X424 X425 X426 X427
X331 X332 X333 X334 X335 X336 X337 X431 X432 X433 X434 X435 X436 X437
X341 X342 X343 X344 X345 X346 X347 X441 X442 X443 X444 X445 X446 X447
X351 X352 X353 X354 X355 X356 X357 X451 X452 X453 X454 X455 X456 X457
X361 X362 X363 X364 X365 X366 X367 X461 X462 X463 X464 X465 X466 X467
X371 X372 X373 X374 X375 X376 X377 X471 X472 X473 X474 X475 X476 X477
X381 X382 X383 X384 X385 X386 X387 X481 X482 X483 X484 X485 X486 X487
X391 X392 X393 X394 X395 X396 X397 X491 X492 X493 X494 X495 X496 X497

Plot3 Plot4
Figure 2: A generic experimental setup with four plots, each plot having nine rows and seven columns

Exercise 1.3
1. What is a superscript? What is the difference between a superscript and an
exponent?
2. Give an example in which
(a) a double subscript can be used.
(b) a triple subscript can be used.

1.8.2 The Summation Notation


Addition is a very common operation in statistics. To make it easy to indicate the
computation of a sum, a mathematical notation using the Greek letter sigma (Σ) is used.

1.8.2.1 Single Summation


Let $X_1, X_2, X_3, X_4, X_5, X_6, X_7$ and $X_8$ be variables. Then their sum can be written
using sigma as
\[ X_1 + X_2 + X_3 + X_4 + X_5 + X_6 + X_7 + X_8 = \sum_{i=1}^{8} X_i \]

Here i is a variable ranging over the integers 1, 2, 3, 4, 5, 6, 7 and 8. The symbol i = 1
below the Σ sign indicates that 1 is the initial value taken on by i, and the 8 written above
the Σ sign indicates that 8 is the last value of i.
We call i the index of summation, while $X_i$ is referred to as the summand. The summand
is a function of i which takes on the values $X_1, X_2, \ldots, X_8$ as i takes on successively
the values 1, 2, 3, 4, 5, 6, 7 and 8. The Σ sign indicates that the values
$X_1, X_2, \ldots, X_8$ taken on by $X_i$ are to be added. The entire symbol
$\sum_{i=1}^{8} X_i$ is read “the summation of $X_i$ as i ranges from 1 to 8”. In general,
suppose we want to add $X_1 + X_2 + \cdots + X_n$. The shorthand notation for this summation is
\[ \sum_{i=1}^{n} X_i , \]
which is read “the sum of $X_i$ for i = 1 to i = n”.

Note: Sometimes the indices below and above the summation sign are
omitted and the notation $\sum_{i=1}^{n} X_i$ is simply written as $\sum X$. This is
particularly so when there is no danger of ambiguity.
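
As a quick computational check of the sigma notation, the sketch below evaluates the sum of the eight heights listed in Section 1.8.1.1. The variable names are illustrative only. Python lists are indexed from 0, so the value $X_i$ is stored at list position i − 1.

```python
# Heights (in feet) of the eight students from Section 1.8.1.1.
X = [4.5, 5.2, 5.0, 5.2, 5.4, 4.9, 5.7, 5.8]
n = len(X)

# Explicit loop that mirrors the sigma notation: i runs from 1 to n.
loop_total = 0.0
for i in range(1, n + 1):
    loop_total += X[i - 1]  # X_i lives at list position i - 1

# The built-in sum() adds the same values in a single call.
builtin_total = sum(X)

print(f"n = {n}, sum of X_i = {loop_total:.1f}")
```

The loop and the built-in `sum()` give the same total; the loop is shown only because it follows the index of summation step by step.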

1.8.2.2 Properties of the Summation Operator


Let X, Y be variables and k a constant. Then

1. \[ \sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i \]

Proof:
There are two parts in this proof: the sum rule and the difference rule. We begin with the
former. In this case
\[
\begin{aligned}
\sum_{i=1}^{n} (X_i + Y_i) &= (X_1 + Y_1) + (X_2 + Y_2) + \cdots + (X_n + Y_n) \\
&= X_1 + X_2 + \cdots + X_n + Y_1 + Y_2 + \cdots + Y_n \\
&= \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} Y_i
\end{aligned}
\]

Next, we consider the difference rule:
\[
\begin{aligned}
\sum_{i=1}^{n} (X_i - Y_i) &= (X_1 - Y_1) + (X_2 - Y_2) + \cdots + (X_n - Y_n) \\
&= X_1 + X_2 + \cdots + X_n - Y_1 - Y_2 - \cdots - Y_n \\
&= X_1 + X_2 + \cdots + X_n - (Y_1 + Y_2 + \cdots + Y_n) \\
&= \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} Y_i
\end{aligned}
\]
Therefore,
\[ \sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i \]

2. \[ \sum_{i=1}^{n} k = nk, \quad k \neq 0 \]

Proof:
Recall that for all $i \neq 0$, $i^0 = 1$. Thus we can write $k = k(i^0)$.
This implies that
\[
\begin{aligned}
\sum_{i=1}^{n} k &= \sum_{i=1}^{n} k(i^0) \\
&= k(1^0) + k(2^0) + \cdots + k(n^0) \\
&= k(1) + k(1) + \cdots + k(1) \\
&= k + k + \cdots + k \quad \{n \text{ times}\} \\
&= nk
\end{aligned}
\]

3. \[ \sum_{i=1}^{n} kX_i = k \sum_{i=1}^{n} X_i \]

Proof:
\[
\begin{aligned}
\sum_{i=1}^{n} kX_i &= kX_1 + kX_2 + \cdots + kX_n \\
&= k(X_1 + X_2 + \cdots + X_n) = k \sum_{i=1}^{n} X_i
\end{aligned}
\]

4. In general,
\[ \sum_{i=1}^{n} (X_i Y_i) \neq \left( \sum_{i=1}^{n} X_i \right) \left( \sum_{i=1}^{n} Y_i \right) \]

Proof:
Suppose the equality holds. That is, for any given set of values for the variables X and Y,
\[ \sum_{i=1}^{n} (X_i Y_i) = \left( \sum_{i=1}^{n} X_i \right) \left( \sum_{i=1}^{n} Y_i \right). \]
Now consider the following set of values for the variables X and Y:
$X_1 = 1$, $X_2 = 2$, $X_3 = 3$, $X_4 = 4$

$Y_1 = 3$, $Y_2 = 4$, $Y_3 = 6$, $Y_4 = 5$

Now,
\[
\begin{aligned}
\sum_{i=1}^{4} (X_i Y_i) &= X_1 Y_1 + X_2 Y_2 + X_3 Y_3 + X_4 Y_4 \\
&= (1 \times 3) + (2 \times 4) + (3 \times 6) + (4 \times 5) \\
&= 3 + 8 + 18 + 20 = 49
\end{aligned}
\]
Next,
\[
\left( \sum_{i=1}^{4} X_i \right) \left( \sum_{i=1}^{4} Y_i \right) = (1 + 2 + 3 + 4) \times (3 + 4 + 6 + 5) = 10 \times 18 = 180
\]
Clearly, 49 ≠ 180, a contradiction to our supposition that the equality holds for any two
variables X and Y. Hence, in general,
\[ \sum_{i=1}^{n} (X_i Y_i) \neq \left( \sum_{i=1}^{n} X_i \right) \left( \sum_{i=1}^{n} Y_i \right) \]

Note: an immediate consequence of Rule 4 above is obtained when the two variables X and Y
are equal:
\[ \sum_{i=1}^{n} (X_i X_i) \neq \left( \sum_{i=1}^{n} X_i \right) \left( \sum_{i=1}^{n} X_i \right); \quad \text{that is,} \quad \sum_{i=1}^{n} X_i^2 \neq \left( \sum_{i=1}^{n} X_i \right)^2 \]
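
The four rules can also be checked numerically. The following minimal sketch uses the values of X and Y from the proof of Rule 4; the constant k = 7 is an arbitrary choice made here for illustration, and a numerical check of this kind illustrates the rules without replacing the proofs.

```python
# Values of X and Y taken from the proof of Rule 4.
X = [1, 2, 3, 4]
Y = [3, 4, 6, 5]
k = 7          # arbitrary constant, chosen only for this illustration
n = len(X)

# Rule 1: the sum of (X_i +/- Y_i) splits into two separate sums.
rule1_sum = sum(x + y for x, y in zip(X, Y)) == sum(X) + sum(Y)
rule1_diff = sum(x - y for x, y in zip(X, Y)) == sum(X) - sum(Y)

# Rule 2: adding the constant k a total of n times gives n * k.
rule2 = sum(k for _ in range(n)) == n * k

# Rule 3: a constant factor can be taken outside the summation.
rule3 = sum(k * x for x in X) == k * sum(X)

# Rule 4: the sum of products generally differs from the product of sums.
sum_of_products = sum(x * y for x, y in zip(X, Y))
product_of_sums = sum(X) * sum(Y)

# Consequence of Rule 4: sum of squares versus square of the sum.
sum_of_squares = sum(x ** 2 for x in X)
square_of_sum = sum(X) ** 2

print(rule1_sum, rule1_diff, rule2, rule3)
print(sum_of_products, product_of_sums)
print(sum_of_squares, square_of_sum)
```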

1.8.2.3 Double Summation


When a double summation sign appears in front of a variable with a double subscript, both
subscripts are set to their first values, and then the second (inner) summation is completed. Next,
the first subscript is incremented by 1, and the second summation is completed again, and
so on. Thus
\[
\sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij} = (X_{11} + X_{12} + \cdots + X_{1n}) + (X_{21} + X_{22} + \cdots + X_{2n}) + \cdots + (X_{m1} + X_{m2} + \cdots + X_{mn})
\]
An important property of the double summation is that
\[ \sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{m} X_{ij}. \]
We demonstrate this property in the following example.

Example 1.1:
Let $X_{ij}$ denote the biomass (in g) of a genetically modified maize plant in the ith row
and the jth column of an experimental plot with five rows and four columns as shown
below:

         Column1   Column2   Column3   Column4   Total
Row1        3         7         2         4       16
Row2        5         1         2         6       14
Row3        4         2         8         9       23
Row4        2         3         7         8       20
Row5        4         1         3         5       13
Total      18        14        22        32

The last column gives the row totals $\sum_{j=1}^{4} X_{ij}$ (i = 1, ..., 5) and the last row
gives the column totals $\sum_{i=1}^{5} X_{ij}$ (j = 1, ..., 4).

Now,
\[
\begin{aligned}
\sum_{i=1}^{5} \sum_{j=1}^{4} X_{ij} &= \sum_{i=1}^{5} \left( \sum_{j=1}^{4} X_{ij} \right) \\
&= \sum_{i=1}^{5} (X_{i1} + X_{i2} + X_{i3} + X_{i4}) \\
&= \sum_{i=1}^{5} X_{i1} + \sum_{i=1}^{5} X_{i2} + \sum_{i=1}^{5} X_{i3} + \sum_{i=1}^{5} X_{i4} \\
&= \text{(sum of col1)} + \text{(sum of col2)} + \text{(sum of col3)} + \text{(sum of col4)} \\
&= 18 + 14 + 22 + 32 = 86
\end{aligned}
\]

Next,
\[
\begin{aligned}
\sum_{j=1}^{4} \sum_{i=1}^{5} X_{ij} &= \sum_{j=1}^{4} \left( \sum_{i=1}^{5} X_{ij} \right) \\
&= \sum_{j=1}^{4} (X_{1j} + X_{2j} + X_{3j} + X_{4j} + X_{5j}) \\
&= \sum_{j=1}^{4} X_{1j} + \sum_{j=1}^{4} X_{2j} + \sum_{j=1}^{4} X_{3j} + \sum_{j=1}^{4} X_{4j} + \sum_{j=1}^{4} X_{5j} \\
&= \text{(sum of row1)} + \text{(sum of row2)} + \text{(sum of row3)} + \text{(sum of row4)} + \text{(sum of row5)} \\
&= 16 + 14 + 23 + 20 + 13 = 86
\end{aligned}
\]
Hence,
\[ \sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{m} X_{ij} \]
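
The calculation in Example 1.1 can be reproduced with nested sums. The sketch below stores the biomass table as a list of rows (the name `biomass` is an illustrative choice) and adds the entries in both orders.

```python
# Biomass table from Example 1.1: five rows, four columns.
biomass = [
    [3, 7, 2, 4],
    [5, 1, 2, 6],
    [4, 2, 8, 9],
    [2, 3, 7, 8],
    [4, 1, 3, 5],
]
m = len(biomass)       # number of rows
n = len(biomass[0])    # number of columns

# Row totals (inner sum over j) and column totals (inner sum over i).
row_totals = [sum(biomass[i][j] for j in range(n)) for i in range(m)]
col_totals = [sum(biomass[i][j] for i in range(m)) for j in range(n)]

# Outer sum over i of the inner sums over j, and the reverse order.
rows_first = sum(sum(biomass[i][j] for j in range(n)) for i in range(m))
cols_first = sum(sum(biomass[i][j] for i in range(m)) for j in range(n))

print(row_totals, col_totals)
print(rows_first, cols_first)
```

Both orders visit every cell exactly once, which is why the two totals agree.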

1.8.3 The Product Notation


Another shorthand notation in mathematics, though not frequently used in statistics, is the
product operator, which represents multiplication of variables. In the computation of a
product, a mathematical notation using the Greek capital letter pi (Π) is used.

1.8.3.1 Single Multiplication


If $X_1, X_2, X_3, X_4, X_5, X_6, X_7$ and $X_8$ are variables, then their product can be written
using pi as
\[ \prod_{i=1}^{8} X_i = X_1 \times X_2 \times X_3 \times X_4 \times X_5 \times X_6 \times X_7 \times X_8 \]

Here, again, i is the index of multiplication, whose range is indicated by the notations on
the Π symbol, and $X_i$ is a function of i. The Π sign indicates that the values
$X_1, X_2, \ldots, X_8$ taken on by $X_i$ are to be multiplied. The entire
symbol $\prod_{i=1}^{8} X_i$ is read “the product of $X_i$ as i ranges from 1 to 8”. In general, suppose
we want to multiply $X_1 \times X_2 \times \cdots \times X_n$. The shorthand notation for this multiplication is
\[ \prod_{i=1}^{n} X_i , \]
which is read “the product of $X_i$ for i = 1 to i = n”.

Notes:
(1) Sometimes the indices below and above the multiplication sign
are omitted and the notation $\prod_{i=1}^{n} X_i$ is simply written as $\prod X$.
This is particularly so when there is no danger of ambiguity.

(2) Sometimes the multiplication sign between the variables is
replaced with a mathematical dot. Thus $X_j \times X_k \equiv X_j \cdot X_k$. In
this course we shall be using the two notations interchangeably.

1.8.3.2 Properties of the Product Operator


Let X, Y be variables and k a constant. Then,

1. \[ \prod_{i=1}^{n} k = k^n \]

Proof:
Recall that, for all real numbers $i \neq 0$, $i^0 = 1$. In our case i ranges over integers, and
hence we can write $k = k \cdot (i^0)$.
This implies that
\[
\begin{aligned}
\prod_{i=1}^{n} k \equiv \prod_{i=1}^{n} k \cdot (i^0) &= k \cdot (1^0) \times k \cdot (2^0) \times k \cdot (3^0) \times \cdots \times k \cdot (n^0) \\
&= k \cdot (1) \times k \cdot (1) \times k \cdot (1) \times \cdots \times k \cdot (1) \quad \{n \text{ times}\} \\
&= k \times k \times k \times \cdots \times k \quad \{n \text{ times}\} \\
&= k^n
\end{aligned}
\]
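
2. \[ \prod_{i=1}^{n} k \cdot X_i = k^n \cdot \prod_{i=1}^{n} X_i \]

Proof:
\[
\begin{aligned}
\prod_{i=1}^{n} k \cdot X_i &= k \cdot X_1 \times k \cdot X_2 \times \cdots \times k \cdot X_n \\
&= (k \times k \times \cdots \times k) \cdot (X_1 \times X_2 \times \cdots \times X_n) \\
&= (k^n) \cdot \prod_{i=1}^{n} X_i = k^n \prod_{i=1}^{n} X_i
\end{aligned}
\]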

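3. \[ \prod_{i=1}^{n} X_i \cdot Y_i = \left( \prod_{i=1}^{n} X_i \right) \left( \prod_{i=1}^{n} Y_i \right) \]

Proof:
\[
\begin{aligned}
\prod_{i=1}^{n} X_i \cdot Y_i &= (X_1 \cdot Y_1) \times (X_2 \cdot Y_2) \times \cdots \times (X_n \cdot Y_n) \\
&= (X_1 \times X_2 \times \cdots \times X_n) \cdot (Y_1 \times Y_2 \times \cdots \times Y_n) = \left( \prod_{i=1}^{n} X_i \right) \cdot \left( \prod_{i=1}^{n} Y_i \right)
\end{aligned}
\]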

1.8.3.3 Double Multiplication


When a double multiplication sign appears in front of a variable with a double subscript,
both subscripts are set to their first values, and then the second (inner) multiplication is completed.
Next, the first subscript is incremented by 1, and the second multiplication is completed
again, and so on. Thus
\[
\prod_{i=1}^{m} \prod_{j=1}^{n} X_{ij} = (X_{11} \times X_{12} \times \cdots \times X_{1n}) \times (X_{21} \times X_{22} \times \cdots \times X_{2n}) \times \cdots \times (X_{m1} \times X_{m2} \times \cdots \times X_{mn})
\]
Analogous to the double summation property, we have that
\[ \prod_{i=1}^{m} \prod_{j=1}^{n} X_{ij} = \prod_{j=1}^{n} \prod_{i=1}^{m} X_{ij}. \]
This property is demonstrated in the following example.

Example 1.2:
Using the data of Example 1.1 on the biomass of genetically modified maize
plants, we have the following table.

          Column1   Column2   Column3   Column4   Product
Row1         3         7         2         4        168
Row2         5         1         2         6         60
Row3         4         2         8         9        576
Row4         2         3         7         8        336
Row5         4         1         3         5         60
Product    480        42       672      8640

The last column gives the row products $\prod_{j=1}^{4} X_{ij}$ (i = 1, ..., 5) and the last row
gives the column products $\prod_{i=1}^{5} X_{ij}$ (j = 1, ..., 4).

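Now,
\[
\begin{aligned}
\prod_{i=1}^{5} \prod_{j=1}^{4} X_{ij} &= \prod_{i=1}^{5} \left( \prod_{j=1}^{4} X_{ij} \right) \\
&= \prod_{i=1}^{5} (X_{i1} \times X_{i2} \times X_{i3} \times X_{i4}) \\
&= \left( \prod_{i=1}^{5} X_{i1} \right) \times \left( \prod_{i=1}^{5} X_{i2} \right) \times \left( \prod_{i=1}^{5} X_{i3} \right) \times \left( \prod_{i=1}^{5} X_{i4} \right) \\
&= \text{(product of col1)} \cdot \text{(product of col2)} \cdot \text{(product of col3)} \cdot \text{(product of col4)} \\
&= 480 \times 42 \times 672 \times 8640 = 1.1705 \times 10^{11} \quad \text{(5 significant figures).}
\end{aligned}
\]

Next,
\[
\begin{aligned}
\prod_{j=1}^{4} \prod_{i=1}^{5} X_{ij} &= \prod_{j=1}^{4} \left( \prod_{i=1}^{5} X_{ij} \right) \\
&= \prod_{j=1}^{4} (X_{1j} \times X_{2j} \times X_{3j} \times X_{4j} \times X_{5j}) \\
&= \left( \prod_{j=1}^{4} X_{1j} \right) \cdot \left( \prod_{j=1}^{4} X_{2j} \right) \cdot \left( \prod_{j=1}^{4} X_{3j} \right) \cdot \left( \prod_{j=1}^{4} X_{4j} \right) \cdot \left( \prod_{j=1}^{4} X_{5j} \right) \\
&= \text{(product of row1)} \times \text{(product of row2)} \times \text{(product of row3)} \times \text{(product of row4)} \times \text{(product of row5)} \\
&= 168 \times 60 \times 576 \times 336 \times 60 = 1.1705 \times 10^{11} \quad \text{(5 significant figures).}
\end{aligned}
\]

Hence,
\[ \prod_{i=1}^{m} \prod_{j=1}^{n} X_{ij} = \prod_{j=1}^{n} \prod_{i=1}^{m} X_{ij}. \]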
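
The product rules and Example 1.2 can be checked in the same way as the summation rules. The sketch below relies on `math.prod` from the Python standard library (available from Python 3.8). The values of X and Y are those used in the proof of Rule 4 of Section 1.8.2.2, and k = 2 is an arbitrary constant chosen here for illustration.

```python
from math import prod

X = [1, 2, 3, 4]
Y = [3, 4, 6, 5]
k = 2          # arbitrary constant, chosen only for this illustration
n = len(X)

# Property 1: multiplying the constant k by itself n times gives k**n.
property1 = prod(k for _ in range(n)) == k ** n
# Property 2: a constant factor comes out of the product as k**n.
property2 = prod(k * x for x in X) == k ** n * prod(X)
# Property 3: the product of X_i * Y_i equals (product of X) * (product of Y).
property3 = prod(x * y for x, y in zip(X, Y)) == prod(X) * prod(Y)

# Biomass table from Examples 1.1 and 1.2: five rows, four columns.
biomass = [
    [3, 7, 2, 4],
    [5, 1, 2, 6],
    [4, 2, 8, 9],
    [2, 3, 7, 8],
    [4, 1, 3, 5],
]
row_products = [prod(row) for row in biomass]
col_products = [prod(col) for col in zip(*biomass)]  # zip(*rows) yields columns

rows_first = prod(row_products)   # inner product over j, outer over i
cols_first = prod(col_products)   # inner product over i, outer over j

print(property1, property2, property3)
print(row_products, col_products)
print(rows_first, cols_first)
```

Because Python integers have arbitrary precision, the sketch gives the exact value 117,050,572,800, which agrees with 1.1705 × 10^11 to five significant figures.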

1.9 Revision Exercises


1. Which of the following would be called a statistic and which a parameter?
(i) The average height for 400 university students selected at random
from various campuses in Kenya.
(ii) The average height for University students in Kenya.

2. Which of the following variables are continuous and which are discrete?
(i) Level of chemical pollutant in the air.
(ii) Number of children a woman has had.
(iii) The heights of students in a class.
(iv) The thickness of blood vessels in different species.
(v) Number of goals scored by a soccer team.
(vi) The length of skipping ropes used by sportspersons in the pitch.
(vii) The blood pressure of a recuperating patient in a hospital.
(viii) Number of clinic visits made in one year.

3. Are the following variables categorical or numerical?


(i) Type of practitioner seen for prenatal care e.g., obstetrician, family
practitioner, or nurse/midwife.
(ii) The acceleration of a projectile from a point A to a point B with an
initial velocity of 10 m/s.
(iii) The heights of 3-week-old maize seedlings in a greenhouse.
(iv) The religious faiths of participants of an international conference.
(v) The speed of an aircraft from Oslo to Rome international airports.

(vi) The popularity of presidential candidates in different parts of the
country.
(vii) The model of car owned by a university professor.
4. Fill in the scales to classify the different measurements

Nominal Ordinal Interval Ratio


Sex
Height
Weight
Year of Birth
Temperature
Blood Group
Shoe size
Team Number

5. What type of scale is being used for each of the following measurements?
(i) The type of car driven by a university professor.
(ii) The number of cars passing a particular point on the highway.
(iii) The level of formal education attained by citizens of a country.
(iv) The types of occupations of men in the age bracket 30 – 50 years.
(v) The academic grade attained by college students in a particular course.
(vi) The cost of beef in different meat shops within the city centre.
(vii) The temperature (in °C) recorded at a given location.
(viii) The speed at which elephants move.
(ix) The weight of newborn babies in a maternity ward.

6. Consider the following set of values for the two variables X and Y:
X1 = 2 , X 2 = 3 , X 3 = 7 , X 4 = 8 , X 5 = 1 , X 6 = 10

Y1 = 4 , Y2 = 5 , Y3 = 12 , Y4 = 15 , Y5 = 9 , Y6 = 20

Find the value of each of the following expressions:


{You may use your calculator}.
(a) $\sum_{i=1}^{6} X_i$    (b) $\sum_{i=1}^{3} Y_i$    (c) $\sum_{i=1}^{6} 2X_i$

(d) $\sum_{i=1}^{6} 5Y_i$    (e) $\sum_{i=1}^{6} X_i^2$    (f) $\left( \sum_{i=1}^{6} X_i \right)^2$

(g) $\sum_{i=1}^{6} Y_i^2$    (h) $\left( \sum_{i=1}^{6} Y_i \right)^2$    (i) $\sum_{i=1}^{6} (X_i + Y_i)$

(j) $\sum_{i=1}^{6} X_i Y_i$    (k) $\left( \sum_{i=1}^{6} X_i \right) \left( \sum_{i=1}^{6} Y_i \right)$    (l) $\sum_{i=1}^{6} (X_i^2 + Y_i^2)$

(m) $\sum_{i=1}^{6} (X_i - Y_i)$    (n) $\sum_{i=1}^{6} (X_i + Y_i)^2$    (o) $\sum_{i=1}^{6} (X_i + 6)$

(p) $\sum_{i=1}^{6} (Y_i - 3)$

7. By considering a specific set of values for a variable X, demonstrate that
\[ \sum_{i=1}^{n} X_i^2 \neq \left( \sum_{i=1}^{n} X_i \right)^2 \]

8. Prove that if $X_1, X_2, \ldots, X_n$ is a given set of values for a variable X and k is a real
constant, then
\[ \sum_{i=1}^{n} (X_i + k) = \sum_{i=1}^{n} X_i + nk \]

9. Prove that if $X_1, X_2, \ldots, X_n$ is a given set of values for a variable X, c and k are real
constants, and n is an integer, then
\[ \sum_{i=1}^{n} (k + cX_i) = nk + c \sum_{i=1}^{n} X_i \]

10. Using the set of values for the variables X and Y in Exercise 6, find the value of
each of the following: {you may use your calculator}.

(a) $\prod_{i=1}^{6} X_i$    (b) $\prod_{i=1}^{4} Y_i$    (c) $\prod_{k=1}^{3} 3X_k$
(d) $\prod_{j=1}^{3} 2Y_j$    (e) $\left( \prod_{j=1}^{3} Y_j \right)^2$    (f) $\prod_{i=1}^{3} X_i Y_i$

(g) $\left( \prod_{i=1}^{3} X_i \right) \left( \prod_{j=1}^{3} Y_j \right)$    (h) $\prod_{i=1}^{3} (X_i - Y_i)$    (i) $\prod_{i=1}^{3} (X_i + Y_i)^2$

(j) $\prod_{i=1}^{3} (X_i + 6)$    (k) $\prod_{k=1}^{6} (Y_k - 3)$

11. The table below shows the physical measurements of some 6 cadets recruited into
the armed forces. The variable X represents the height (in ft.) and Y represents the
weight (in kg.)
Cadet Serial X Y
Number
1 5.6 60.9
2 6.2 58.7
3 5.9 62.3
4 5.7 59.0
5 5.3 67.8
6 5.5 71.6

(a) If each of the cadets lost 1.4 kg after 3 months of rigorous training,
find the sum of their weights after the 3 months.
(b) Find the product of their heights after 1 year if each had gained 0.5 ft.
(c) Find the sum of the squares of their heights at the time of recruitment.
(d) Find the square of the sum of their heights at the time of recruitment.
(e) Find the product of the squares of their heights at the time of
recruitment.
(f) Find the sum of three times their weights at the time of recruitment.

12. By considering a specific set of values for a two-dimensional variable X,
demonstrate that

(i) $\sum_{i=1}^{m} \sum_{j=1}^{n} X_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{m} X_{ij}$

(ii) $\prod_{i=1}^{m} \prod_{j=1}^{n} X_{ij} = \prod_{j=1}^{n} \prod_{i=1}^{m} X_{ij}$


UNIT 2

DATA PRESENTATION

2.0 Introduction
After the data have been collected, they are typically recorded as a tabulated set of
numbers. However, a list of numbers is not very helpful for:
• determining trends or relationships between different variables.
• presentation of data to demonstrate a relationship or support a hypothesis.
A more visually accessible form of the data is required in such cases. There are a wide
variety of ways to summarize and present data. In this Unit, we consider the graphical
method. The numerical method of data summarization will be discussed in Units 3 and 4.

Prerequisite
Before starting this Unit you should be conversant with the various types of data.

2.1 Objectives
By the end of this Unit, you should be able to:
• Prepare grouped and ungrouped frequency distributions from a given
data set.
• Graphically display grouped and ungrouped frequency distributions by
means of a histogram and an ogive from a given data set.
• Present data in a variety of graphical forms including the pie chart, bar
chart, column chart, stem-and-leaf plot and box-and-whisker diagram
(boxplot).
• Construct and interpret bar graphs and pie charts using a given set of
data.
• Construct and interpret a box-and-whisker diagram (boxplot) from a
given set of data.
• Construct and interpret a stem-and-leaf plot from a given set of data.


Learning Style
To achieve what is expected of you…
Briefly revise the previous Unit.
Allocate sufficient study time.
Attempt most of the practice and revision exercises.

2.2 Frequency Distributions


A common experimental situation involves the measurement of the same quantity for a
large range of subjects or equivalent situations. The raw data for such a single variable
measurement consists of a large collection of measurements, one for each subject or
situation. Generally it is very hard to observe any real pattern in the raw data. A standard
way to proceed is to construct a frequency table. This is done by listing each possible
experimental value (or appropriate range of values) in one column and recording the
corresponding frequencies in another column. The term frequency refers to the number of
times each value (or set of values) occurs in the original data.

Note: When we have more than one categorical variable in our data
set, a frequency table is sometimes referred to as a contingency
table because the figures found in the rows are contingent
(dependent) upon those found in the columns.

There are two types of frequency distributions: grouped and ungrouped.

2.2.1 Ungrouped Frequency Distribution


If each measurement can produce only one of a relatively small number of
discrete values, a frequency table can be constructed by simply recording the
frequency of each of the possible values.

Example 2.1:


The set of observations below shows the number of times that each of 30
public service vehicles plying a certain route was charged with a traffic
offence during the month of December in the year 1999.
3 0 1 6 0 5
6 2 1 3 6 3
4 0 6 2 3 5
6 0 6 6 5 1
1 5 2 4 0 0

The following table summarizes the above data.

Times      Tally of number   Frequency   Cumulative   Relative
charged    of vehicles                   frequency    frequency
0          ///// /               6            6          0.20
1          ////                  4           10          0.13
2          ///                   3           13          0.10
3          ////                  4           17          0.13
4          //                    2           19          0.07
5          ////                  4           23          0.13
6          ///// //              7           30          0.23
Total                           30

Each observation is represented by a tally (slash) mark placed against the
appropriate category. The marks are set apart in groups of five (traditionally
four strokes crossed through by a fifth), making it easier to
add the tallies. The tally marks for each category are totaled to give the
frequency for that category.
The cumulative frequency column is a running total of how many measurements have
values up to and including the given value. That is, we obtain the cumulative frequencies
by adding the frequencies as we go down the frequency column:
6, 6+4, 6+4+3, and so on, giving 6, 10, 13, …. These show, for example,
that 17 vehicles were charged at most 3 times for traffic offences.
The relative frequencies are obtained by expressing the frequencies as a proportion of the
total frequency. Thus, for example, the relative frequency for the value 2 in the above
data is
\[ \frac{\text{frequency of the value 2}}{\text{total frequency}} = \frac{3}{30} = 0.10 \]

This value indicates that 10% of the public service vehicles considered above
were charged twice for traffic offences during the period under
consideration.
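
The frequency, cumulative frequency and relative frequency columns of Example 2.1 can be produced from the raw observations. The following is a minimal sketch using `collections.Counter` from the Python standard library; the variable names are illustrative and the relative frequencies are rounded to two decimal places, as in the table.

```python
from collections import Counter

# Number of traffic charges for each of the 30 vehicles in Example 2.1.
charges = [
    3, 0, 1, 6, 0, 5,
    6, 2, 1, 3, 6, 3,
    4, 0, 6, 2, 3, 5,
    6, 0, 6, 6, 5, 1,
    1, 5, 2, 4, 0, 0,
]

counts = Counter(charges)      # maps each distinct value to its frequency
total = len(charges)

table = []                     # rows: (value, frequency, cumulative, relative)
running_total = 0
for value in sorted(counts):
    frequency = counts[value]
    running_total += frequency
    table.append((value, frequency, running_total, round(frequency / total, 2)))

print("Value  Freq  Cum.freq  Rel.freq")
for value, frequency, cumulative, relative in table:
    print(f"{value:>5}  {frequency:>4}  {cumulative:>8}  {relative:>8.2f}")
```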

2.2.2 Grouped Frequency Distribution


Listing the frequency of each possible observation is not sensible if
• the number of observations is very large.
• the observations are from a continuous variable such as height or weight.
In such cases, we use an alternative summary called a grouped frequency distribution.
This involves dividing the possible range of observations into classes, and recording the
frequencies associated with each class. Class frequencies can be converted into relative
frequencies or percentage frequencies.

Note: Exact measurements for each individual are no longer
recorded and so information is lost when a grouped frequency
distribution is formed.

Example 2.2:
The set of numbers below shows the marks scored in an end-of-semester
Psychology examination by 150 students in a certain university in 1994.

88 53 29 36 58 56 78 90 68 59 35 65 54 87 44
66 45 87 63 53 52 48 38 48 61 80 46 70 54 67
58 65 32 39 60 57 81 92 68 90 27 68 84 83 56
42 50 67 90 80 88 93 92 51 93 87 75 59 68 79
78 76 89 86 91 50 49 89 38 76 45 46 73 49 91
70 86 89 80 90 41 53 86 43 49 82 76 72 58 80
90 87 51 43 76 70 85 81 46 79 86 81 79 80 78
84 64 59 63 75 74 89 61 92 77 87 42 65 93 87
79 41 33 57 86 88 91 83 73 31 44 51 55 62 70


32 68 47 29 54 50 43 51 88 91 52 76 88 90 63

Unless we organize the above data in some systematic manner, it is very


difficult to make sense of them even after several hours of trying to study
them. Although we could construct an ungrouped frequency distribution for
the data, this would still not make the interpretation of the data any better. A
more sensible way of presenting this data is to divide the data into classes
and calculate the frequency with which each class occurs. Choosing class
intervals of 20 – 29, 30 – 39, 40 – 49, 50 – 59, 60 – 69, 70 – 79, 80 – 89, and
90 – 99, we can now form a grouped frequency distribution as in the
following table.

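Marks      Tally of marks                              Frequency   Cumulative   Relative    Percentage
                                                                   frequency    frequency   frequency
20 – 29    ///                                              3           3         0.02         2%
30 – 39    ///// ////                                       9          12         0.06         6%
40 – 49    ///// ///// ///// /////                         20          32         0.13        13%
50 – 59    ///// ///// ///// ///// ///// /                 26          58         0.17        17%
60 – 69    ///// ///// ///// ////                          19          77         0.13        13%
70 – 79    ///// ///// ///// ///// ///                     23         100         0.15        15%
80 – 89    ///// ///// ///// ///// ///// ///// ////        34         134         0.23        23%
90 – 99    ///// ///// ///// /                             16         150         0.11        11%
Total                                                     150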
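
The grouped table of Example 2.2 can be rebuilt from the 150 marks by counting how many marks fall between the lower and upper limit of each class. The sketch below is one way of doing this in Python; the names `marks` and `classes` are illustrative, and `itertools.accumulate` is used for the running totals.

```python
from itertools import accumulate

# The 150 examination marks of Example 2.2, row by row.
marks = [
    88, 53, 29, 36, 58, 56, 78, 90, 68, 59, 35, 65, 54, 87, 44,
    66, 45, 87, 63, 53, 52, 48, 38, 48, 61, 80, 46, 70, 54, 67,
    58, 65, 32, 39, 60, 57, 81, 92, 68, 90, 27, 68, 84, 83, 56,
    42, 50, 67, 90, 80, 88, 93, 92, 51, 93, 87, 75, 59, 68, 79,
    78, 76, 89, 86, 91, 50, 49, 89, 38, 76, 45, 46, 73, 49, 91,
    70, 86, 89, 80, 90, 41, 53, 86, 43, 49, 82, 76, 72, 58, 80,
    90, 87, 51, 43, 76, 70, 85, 81, 46, 79, 86, 81, 79, 80, 78,
    84, 64, 59, 63, 75, 74, 89, 61, 92, 77, 87, 42, 65, 93, 87,
    79, 41, 33, 57, 86, 88, 91, 83, 73, 31, 44, 51, 55, 62, 70,
    32, 68, 47, 29, 54, 50, 43, 51, 88, 91, 52, 76, 88, 90, 63,
]

# Class limits 20-29, 30-39, ..., 90-99 as (lower limit, upper limit) pairs.
classes = [(lower, lower + 9) for lower in range(20, 100, 10)]

# Frequency of a class = number of marks between its limits, inclusive.
frequencies = [
    sum(1 for mark in marks if lower <= mark <= upper)
    for lower, upper in classes
]
cumulative = list(accumulate(frequencies))
relative = [round(f / len(marks), 2) for f in frequencies]

for (lower, upper), f, c, r in zip(classes, frequencies, cumulative, relative):
    print(f"{lower} - {upper}  {f:>3}  {c:>3}  {r:.2f}  {r * 100:.0f}%")
```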

The following are the general guidelines for constructing a grouped


frequency distribution from a given data set:

Determine the number of classes


Before we can construct our frequency table we must determine how many
classes we should use. The choice is somewhat arbitrary, but too few or too many
classes will not provide as clear a picture as can be obtained with some
nearly optimum number. An empirical relationship, known as Sturges' rule,
may be used as a guide to determine a suitable number of classes:

c = smallest integer ≥ [1 + 3.322 ⋅ log10(n)],

where c is the number of classes and n is the total number of observations in
the data set. The constant 3.322 is approximately 1/log10(2), so the rule is
equivalent to c = smallest integer ≥ 1 + log2(n).
Alternatively:
• Find the difference between the largest and the smallest value for the
measured quantity. This quantity is referred to as the range and will be
considered in Unit 4.
• Divide the range into a convenient number of class intervals having the same
size. If this is not feasible, use class intervals of varying sizes or open intervals
(such as “below 30” or “80 and over”). As a rule of thumb, the number of
class intervals is taken to be between 5 and 20.
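
The two approaches above can be sketched as a pair of small helper functions. This is only an illustration: the function names are hypothetical, the constant 3.322 is used as an approximation to 1/log10(2), and the smallest and largest marks (27 and 93) are read off the data of Example 2.2.

```python
import math

def sturges_classes(n: int) -> int:
    """Smallest integer >= 1 + 3.322 * log10(n), i.e. Sturges' rule."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.ceil(1 + 3.322 * math.log10(n))

def suggested_class_width(smallest: float, largest: float, classes: int) -> int:
    """Range divided by the number of classes, rounded up to a whole number."""
    return math.ceil((largest - smallest) / classes)

# Example 2.2: 150 marks with smallest value 27 and largest value 93.
number_of_classes = sturges_classes(150)
class_width = suggested_class_width(27, 93, number_of_classes)

print(number_of_classes, class_width)
```

For the 150 marks the rule suggests 9 classes, whereas Example 2.2 uses 8 classes of width 10. This is consistent with the rule being a guide rather than a requirement: round class limits such as 20 – 29 are often more convenient to read.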

Find the corresponding class frequencies

Record the frequencies associated with each class.

Determine the class width

The class width (upper class boundary – lower class boundary) should
be an odd number. This ensures that the midpoint (half the sum of the lower
class limit and the upper class limit) has the same number of decimal places
as the original data. The reason for this requirement will become apparent
when we consider the cumulative frequency histogram in the next section.

The class intervals must be mutually exclusive


Each datum must fall into one and only one class interval.

The class intervals must be continuous


There should be no gaps in the number line even if a class interval has
no members. Again, this requirement will be appreciated when we consider
the construction of the cumulative frequency histogram and the computation of
some summary measures.


The class intervals must be exhaustive


All possible data must fit into one of the class intervals.

Before moving on, let us revise the terminology of grouped frequency distributions.
• Upper and lower class limits are the largest and smallest values belonging to a
given class interval. The lower limit of the first class interval is any number
that is less than or equal to the lowest value in the data.

• Upper and lower class boundaries are the largest and the smallest actual values
that separate classes. Class limits are usually converted to class boundaries by
finding the midpoint of the upper class limit of one class and the lower class
limit of the following class.

Note: Class boundaries are also referred to as real class limits.

• Class mark (midpoint) is half of the sum of the upper class limit and lower class
limit.

• Class width is the difference between the upper class boundary and the lower
class boundary.

To illustrate the above terminology, consider the class interval 70 – 79 in the above
grouped frequency table. Then
o The lower and the upper class limits are 70 and 79 respectively.
o The lower and the upper class boundaries are 69.5 and 79.5. Note that this is done
to ensure that the class intervals are continuous, mutually exclusive and
exhaustive. If the data under consideration were continuous, the class boundaries
would capture the value that falls between the upper class limit of one class
interval and the lower class limit of the subsequent class interval.

o The class midpoint is (70 + 79)/2 = 74.5.
o The class width is 79.5 − 69.5 = 10.
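
The terminology above can be collected into a small helper. The sketch below is an illustration under one stated assumption: the data are recorded to the nearest `unit` (1 mark in Example 2.2), so each boundary lies half a unit beyond the corresponding class limit. The function name `class_summary` is hypothetical.

```python
def class_summary(lower_limit: float, upper_limit: float, unit: float = 1.0) -> dict:
    """Boundaries, midpoint and width of a class whose data are recorded to the nearest `unit`."""
    lower_boundary = lower_limit - unit / 2   # halfway to the previous class's upper limit
    upper_boundary = upper_limit + unit / 2   # halfway to the next class's lower limit
    return {
        "lower_boundary": lower_boundary,
        "upper_boundary": upper_boundary,
        "midpoint": (lower_limit + upper_limit) / 2,
        "width": upper_boundary - lower_boundary,
    }

summary = class_summary(70, 79)
print(summary)
```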

Exercise 2.1
1. During the Christmas holiday the Department of Culture organized a music
extravaganza. The following day, a random sample of 70 of those who
attended were asked to rate the extravaganza on a five point scale 1, 2, 3, 4, 5
where 1 represents maximum enjoyment and 5 represents minimum
enjoyment. Their ratings are shown below.
1 5 3 4 2 1 4 2 4 3
2 4 3 2 1 3 5 2 1 1
1 4 3 2 3 2 4 1 1 2
1 2 3 3 2 2 4 5 5 2
4 2 1 1 4 4 2 3 3 4

Construct a frequency distribution for these ratings.

2. A hardware store recorded the number of bags of cement sold on 52


consecutive Saturdays. The results are as shown below.

58 47 85 47 63 51 40 70 80 73 72 90 84 42
56 67 63 70 54 76 49 81 75 80 75 46 60 71
70 79 84 72 54 55 61 82 70 47 40 77 81 76
66 59 81 66 48 43 87 55 70 60

Construct a grouped frequency distribution for these data.

2.3 Bar chart


A bar chart is a graphic device used to summarize nominal or ordinal data. It displays the
data using bars (rectangles) of the same width, each of which represents a particular
category. The length of each bar will represent the number of observations in that


category or simply the value of the variable. Bar charts are displayed horizontally or
vertically and they are separated rather than touching so that implications of continuity
among the categories are avoided. The bar charts are visually appealing and make it easy
for users to see comparisons, patterns, and trends in data.

Notes:
• It is reasonable to use a logarithmic scale for the frequency axis if the
range of values is greater than two orders of magnitude (e.g., 1 – 200).

• The first and the last bars may include the extremes; that is, we may
have open-ended intervals (for example, if we are dealing with age in
years, we may have, say, “under 30” and “over 70”).

Example 2.3:
The table below shows the population of Kenya by province according to the 1999 census (as at 24th August 1999).
Province Population
Nairobi 2,143,254
Nyanza 4,392,196
Coast 2,487,264
North Eastern 962,143
Eastern 4,631,779
Western 3,358,776
Central 3,724,159
Rift Valley 6,987,036
Source: Kenya, Central Bureau of Statistics

We can represent this information in a vertical or a horizontal bar chart as shown in the
figures below.

[Figure: vertical bar chart titled “1999 Kenyan population by Province”; vertical axis: population (in millions)]

Fig 2.1: A vertical bar chart depicting the 1999 census of Kenya by province

[Figure: horizontal bar chart titled “1999 Kenyan population by Province”; vertical axis: Province (Central, Coast, Eastern, Nairobi, North-Eastern, Nyanza, Rift-valley, Western); horizontal axis: Population (in millions), from 0 to 8]

Fig 2.2: A horizontal bar chart depicting the 1999 census of Kenya by province
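
A rough horizontal bar chart of the data in Example 2.3 can be printed as plain text. The sketch below is only an illustration: the function name `text_bar_chart` and the scale of one `#` per 250,000 people are arbitrary choices made here, and a plotting package would normally be used for a publication-quality chart.

```python
# Population by province from the table in Example 2.3 (1999 census).
population = {
    "Nairobi": 2_143_254,
    "Nyanza": 4_392_196,
    "Coast": 2_487_264,
    "North Eastern": 962_143,
    "Eastern": 4_631_779,
    "Western": 3_358_776,
    "Central": 3_724_159,
    "Rift Valley": 6_987_036,
}

def text_bar_chart(data: dict, people_per_symbol: int = 250_000) -> list:
    """Return one text line per category; the bar length is proportional to the value."""
    label_width = max(len(name) for name in data)
    lines = []
    for name, value in data.items():
        bar = "#" * round(value / people_per_symbol)
        lines.append(f"{name:<{label_width}} | {bar} {value / 1e6:.2f}")
    return lines

chart = text_bar_chart(population)
print("\n".join(chart))
```

The bars are separated by line breaks rather than touching, in keeping with the convention that the categories of a bar chart carry no implication of continuity.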

2.4 Pie chart


A pie chart is a circular plot that shows the amount in each category relative to the total
amount. The pie chart represents the 360 degrees of a full circle and is divided into
sectors by straight lines from its centre to its circumference. The sector angle is the same
proportion of 360 degrees as the category is of the total data.

Example 2.4:
The table below shows the 1930 Education Department Expenditure by race in Kenya.

Race       Pupils (in state and        Total expenditure   Expenditure per pupil
           state-aided schools only)   (in US dollars)     (in US dollars)
African    6,948                       232,293             33.4
Asian      1,900                       70,329              37.0
European   776                         140,041             180.5
Total      9,624                       442,663             46.0
Source: Kenya, Education Department Annual Report, 1930

We can represent the total expenditure information (third column) in a pie chart as shown
below.
Sector angle corresponding to 232,293 is $\frac{232{,}293}{442{,}663} \times 360^\circ = 188.91^\circ$
Sector angle corresponding to 70,329 is $\frac{70{,}329}{442{,}663} \times 360^\circ = 57.20^\circ$
Sector angle corresponding to 140,041 is $\frac{140{,}041}{442{,}663} \times 360^\circ = 113.89^\circ$
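
The sector-angle arithmetic for Example 2.4 can be checked with a short Python sketch. The function name `sector_angles` is an illustrative choice, not part of the module; the expenditure figures are those in the table above.

```python
# Total expenditure by race, taken from the table in Example 2.4.
expenditure = {"African": 232_293, "Asian": 70_329, "European": 140_041}


def sector_angles(amounts):
    """Return {category: (sector angle in degrees, percentage share)}."""
    total = sum(amounts.values())
    return {
        name: (value / total * 360, value / total * 100)
        for name, value in amounts.items()
    }


angles = sector_angles(expenditure)
for name, (angle, share) in angles.items():
    print(f"{name}: {angle:.2f} degrees, {share:.2f}% of the total")
```

Each category's share of the total is computed once and scaled by 360 for the angle and by 100 for the percentage, so the angles necessarily add up to 360 degrees.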

[Figure: pie chart titled “1930 Total educational expenditure by race”; African 52%, European 32%, Asian 16%]

Fig 2.3: A pie chart depicting the 1930 educational expenditure by race in Kenya

We note the following from the above pie chart:


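$$\frac{57.20^\circ}{360^\circ} \equiv \frac{70{,}329}{442{,}663} \approx 15.89\%$$

$$\frac{113.89^\circ}{360^\circ} \equiv \frac{140{,}041}{442{,}663} \approx 31.64\%$$

$$\frac{188.91^\circ}{360^\circ} \equiv \frac{232{,}293}{442{,}663} \approx 52.48\%$$

$$57.20^\circ + 113.89^\circ + 188.91^\circ = 360^\circ$$

$$70{,}329 + 140{,}041 + 232{,}293 = 442{,}663$$

$$15.89\% + 31.64\% + 52.48\% \approx 100\%$$
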
2.5 Histogram
A histogram is a graphic device that resembles a bar chart except that in a
histogram the bars touch one another; that is, there are no gaps between the
bars. A histogram is thus used for data that are measured on an interval scale.
We can construct a histogram from a grouped frequency distribution by plotting
the classes (or the class midpoints) along the abscissa and the corresponding
class frequencies along the ordinate. Rectangles (bars) are then drawn above
each class (class midpoint) in such a way that the area of each bar is
proportional to its corresponding class frequency. Thus the total area of the
bars is proportional to the total frequency.

Example 2.5:
Using the grouped frequency distribution in Example
2.2, the following histograms are obtained.

[Figure: histogram; horizontal axis: Marks (classes 20–29, 30–39, 40–49, 50–59, 60–69, 70–79, 80–89, 90–99); vertical axis: Frequencies]

Fig 2.4a: A histogram of the psychology exam marks

[Figure: histogram; horizontal axis: Marks (class midpoints 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5); vertical axis: Frequency]

Fig 2.4b: A histogram of the psychology exam marks

Note:
When we construct a histogram based on the relative frequencies rather than on
frequencies, the resulting diagram is referred to as a relative frequency histogram.

2.6 Frequency Polygon


A frequency polygon is constructed from a histogram
by joining the midpoints of the tops of the bars with
straight-line segments. The lines at either end of the
polygon should be extended to the abscissa.

Example 2.6:
Using the histogram in Fig. 2.4b we obtain the following
frequency polygon

[Figure: frequency polygon; horizontal axis: Marks (class midpoints); vertical axis: Frequency]

Fig 2.5: A frequency polygon for the psychology exam marks

Note:
When we construct a polygon based on the relative frequencies rather than on
frequencies, the resulting diagram is referred to as a relative frequency polygon.

2.7 Cumulative Frequency Polygon (Ogive)


The data in a grouped frequency distribution can be represented in yet another
graphic device; namely, the cumulative frequency polygon (or ogive). In this
graph, the cumulative frequencies are plotted against the upper class boundaries
of the corresponding classes. The maximum value on the ordinate of the ogive is
the sum total of the frequencies.

Example 2.7:
The following table shows the class intervals, the upper
class boundaries and their corresponding cumulative
frequencies obtained from the grouped frequency
distribution for the data on psychology exam marks in
Example 2.2.

Marks Upper class boundaries Frequency Cumulative frequency


20 – 29 29.5 3 3
30 – 39 39.5 9 12
40 – 49 49.5 20 32
50 – 59 59.5 26 58
60 – 69 69.5 19 77
70 – 79 79.5 23 100
80 – 89 89.5 34 134
90 – 99 99.5 16 150
Total 150
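
As a quick check of the table above, the cumulative frequencies can be obtained as running totals of the class frequencies. This is a minimal Python sketch; the variable names are illustrative only.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies from Example 2.7.
upper_boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5, 99.5]
frequencies = [3, 9, 20, 26, 19, 23, 34, 16]

# Each cumulative frequency is the running total of the frequencies so far.
cumulative = list(accumulate(frequencies))

# Points to plot for the ogive: (upper class boundary, cumulative frequency).
ogive_points = list(zip(upper_boundaries, cumulative))
print(ogive_points)
```

The last cumulative frequency equals the total frequency, which is the maximum value on the ordinate of the ogive.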

Fig 2.6 shown below is the cumulative frequency polygon for this distribution
and is obtained by plotting the points (29.5, 3), (39.5, 12), (49.5, 32),
(59.5, 58), (69.5, 77), (79.5, 100), (89.5, 134), and (99.5, 150).

[Figure: cumulative frequency polygon; horizontal axis: Upper class boundaries (19.5 to 99.5); vertical axis: Cumulative frequency (0 to 150)]

Fig 2.6: A cumulative frequency polygon for the psychology exam marks

At this point, it is noteworthy that some authors consider an alternative view
of an ogive. Instead of using the (less-than) cumulative frequencies as tabulated
above, we can also consider the greater-than cumulative frequencies; that is, for
each class, we consider the sum of the frequency of that class and the
frequencies of all the classes above it. This is illustrated in the following
table.
Marks     Upper class   Frequency   Less-than     Greater-than
          boundaries                cumulative    cumulative
                                    frequency     frequency
20 – 29   29.5          3           3             150
30 – 39   39.5          9           12            147
40 – 49   49.5          20          32            138
50 – 59   59.5          26          58            118
60 – 69   69.5          19          77            92
70 – 79   79.5          23          100           73
80 – 89   89.5          34          134           50
90 – 99   99.5          16          150           16
Total                   150

If we now plot the values in the last column against the lower class boundaries
of the corresponding classes (19.5, 29.5, …, 89.5), we obtain the greater-than
cumulative frequency polygon depicted below.
[Figure: greater-than cumulative frequency polygon; horizontal axis: class boundaries; vertical axis: Cumulative frequency (0 to 150)]

Fig 2.7: A greater-than cumulative frequency polygon for the psychology exam marks
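
The greater-than cumulative frequencies can be computed as running totals taken from the highest class downwards. The sketch below assumes that the greater-than cumulative frequency of a class is the frequency of that class plus the frequencies of all classes above it, paired with the lower class boundary of the class; the variable names are illustrative.

```python
from itertools import accumulate

# Lower class boundaries and class frequencies for the psychology exam marks.
lower_boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
frequencies = [3, 9, 20, 26, 19, 23, 34, 16]

# Accumulate from the top class downwards, then restore the original order.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

# Points to plot: (lower class boundary, greater-than cumulative frequency).
points = list(zip(lower_boundaries, greater_than))
print(points)
```

Under this assumption the first value equals the total frequency (150) and the last equals the frequency of the highest class (16).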

We shall appreciate the utility of an ogive in determining some measures of
location such as the median, quartiles, deciles and percentiles in Unit 3.

2.8 Stem-and-Leaf Plot


One of the drawbacks of using a histogram is that some information is inevitably
lost when the raw data are transformed into classes. If the number of
observations to be summarized is not too large, an alternative way of summarizing
the data is to use a stem-and-leaf plot, which gives a similar summary to a
histogram while preserving all the information in the raw data. A stem-and-leaf
plot for the psychology examination marks in Example 2.1 is shown below.
Stem : Leaf
2: 799
3: 122356889
4: 11223334455666788999
5: 00011112233344456677888999
6: 0112333455567788888
7: 00002334556666678889999
8: 0000011123344566666777777888889999
9: 0000001111222333
Fig 2.8a: A stem-and-leaf plot for the psychology exam marks using a class width of 10

The first digit in each row is the stem, and represents the “tens” part of the
psychology marks. The remaining digits are the leaves, and represent the “units”.
The result resembles the corresponding histogram in Fig 2.4a or Fig 2.4b. The
class width chosen here is 10, but a class width of 5 could also have been used
and would look like this:
Stem : Leaf
2:
2: 799
3: 1223
3: 56889
4: 112233344
4: 55666788999
5: 000111122333444
5: 56677888999
6: 01123334
6: 55567788888
7: 00002334
7: 556666678889999
8: 0000011123344
8: 566666777777888889999
9: 0000001111222333
9:
Fig 2.8b: A stem-and-leaf plot for the psychology exam marks using a class width of 5

To construct a stem-and-leaf plot from a given data set:
1. Find the range of the data set.
2. Based on the range, choose an appropriate class width to yield between 10 and
   20 classes.
3. Write down the stems.
4. Write down the leaf of each number next to the appropriate stem.
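
The construction steps above can be sketched in Python for the simplest case of a class width of 10, where the stem is the “tens” digit and the leaf is the “units” digit. The function name `stem_and_leaf` and the sample marks are hypothetical and are used only for illustration; non-negative whole-number data are assumed.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: leaves} for non-negative integers, using a class width of 10."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # tens part and units part
        rows[stem].append(str(leaf))
    # List every stem between the smallest and the largest, even if it is empty.
    return {
        stem: "".join(rows.get(stem, []))
        for stem in range(min(rows), max(rows) + 1)
    }


sample_marks = [27, 29, 29, 31, 32, 45, 58, 50]  # hypothetical marks
plot = stem_and_leaf(sample_marks)
for stem, leaves in plot.items():
    print(f"{stem}: {leaves}")
```

Sorting the data first means the leaves on each stem appear in increasing order, as in Fig 2.8a.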

Note:
If we turn the stem-and-leaf plot sideways we note that it has exactly the same
shape as the histogram of the corresponding class width for the data under
consideration.

2.9 Line graph

Line graphs are used for summarizing a set of data measured on an interval or
ratio scale, and are commonly used to present mathematical expressions. They are
also used to represent a time series, that is, statistical data arranged in order
of occurrence in time.

Example 2.8:
The table below shows the number of primary schools in Kenya from 1971 to 1988.

Year  Number of schools  Year  Number of schools
1971 6372 1980 10255
1972 6657 1981 11127
1973 6932 1982 11497
1974 7706 1983 11856
1975 8161 1984 12539
1976 8544 1985 12936
1977 8896 1986 13392
1978 9349 1987 13849
1979 9622 1988 14288
Source: Central Bureau of Statistics, Kenya

We can represent the above information in a line graph as shown in the figure below

[Figure: line graph titled “Primary school statistics in Kenya: 1971 - 1988”; horizontal axis: Year (1971 to 1988); vertical axis: Number of schools (in thousands)]

Fig 2.9: A line graph for the number of primary schools in Kenya from 1971 to 1988

Exercise 2.2
1. The marks (out of 100) of 50 candidates in an examination are given below.
53 54 18 29 82 17 36 54 47 72
75 23 7 51 70 61 57 35 37 40
46 43 81 27 35 57 26 44 50 36
77 20 55 43 46 70 44 28 63 35
72 45 33 22 51 35 60 47 57 27
(i) Select suitable classes to prepare a grouped frequency distribution for
these data.
(ii) Use the grouped frequency distribution obtained in (i) to construct a
cumulative frequency distribution.
(iii) Prepare a relative frequency polygon using the grouped frequency
distribution obtained in (i).

2. The table below shows the number of health institutions in Kenya by province in
the year 1989

Province        No. of      No. of health   No. of health sub-centres   Total
                hospitals   centres         and dispensaries
Nairobi         30          18              137                         185
Nyanza          26          32              162                         220
Coast           42          42              224                         308
North Eastern   3           6               31                          40
Eastern         43          46              232                         321
Western         60          63              455                         578
Central         42          48              252                         342
Rift Valley     18          39              62                          119
Total           264         294             1555                        2113
Source: Health Information System (MOH), Kenya

(i) Represent this information on a bar chart.
(ii) Represent the number of health institutions in Nairobi province on a pie chart.
(iii) Represent the entries of the last row in the above table on a pie chart.
(iv) Represent the entries of the last column in the above table on a bar chart.

3. The following table shows the distribution of ages of 150 persons who were
interviewed by a beverage manufacturing company to establish the number of
persons in each of the age groups who were users of a particular beverage.

Age (in years) No. of persons


0 – 10 8
10 – 20 16
20 – 30 30
30 – 40 38
40 – 50 30
50 – 60 18
60 – 70 10

Using this data, construct


(i) a histogram
(ii) a frequency polygon
(iii) an ogive
4. The data below shows the marks obtained by 80 candidates in a Geography
examination. The examination was marked out of 100.

74 45 84 17 51 46 34 31 56 70 93 11 67 85 65 94
54 10 14 20 17 31 53 57 69 47 33 91 52 68 87 13
59 60 43 27 81 44 25 84 59 37 92 50 97 80 11 30
90 58 20 18 37 74 34 79 70 54 31 44 64 52 88 90
70 88 37 45 28 31 40 41 60 62 13 45 92 70 81 98

Represent this information in a stem-and-leaf diagram.

2.10 Box and Whisker Plot (Box plot)


A box and whisker plot (or, simply, a box plot) is a way of summarizing a set of data
measured on an interval scale. It is used to show the shape of the distribution, its central
value, and variability. The picture produced consists of the most extreme points in the
data set (maximum and minimum values), the lower and upper quartiles, and the median.

A box plot is especially helpful for indicating whether there are any unusual observations
(outliers) in the data set. It is also very useful when large numbers of observations
are involved and when two or more data sets are being compared. While the maximum
and minimum values in a given data set are clear to us, the median, the lower and the
upper quartiles are new concepts, which will be discussed in Units 3 and 4. We shall thus
revisit the box plot as a form of data summarization in Unit 4 after discussing the
foregoing summarization measures.

2.11 Revision Exercises

1. During the months of March and April, a meteorologist noted the temperature
(in °C) each day at noon. The results are shown below.
24 21 22 21 22 20 24 19 21 18 23 22 19 18 21 24
25 18 20 22 18 19 22 18 25 19 24 27 17 22 22 27
18 20 19 23 24 26 21 23 21 26 22 21 21 23 24 18
18 22 25 20 22 20 23 19 24 19 26 20 20

(i) Construct a frequency distribution for these data.


(ii) Select appropriate class width and prepare a grouped frequency
distribution for these data.
(iii)Construct a histogram using the grouped frequency distribution obtained
in (ii).
(iv) Construct an ogive using the grouped frequency distribution obtained in
(ii).

2. A large number of people set out together on a Freedom from Hunger walk. The
time taken by each of a sample of 50 of these people to complete the walk is
recorded below. The times are given to the nearest hour.

63 74 88 79 82 67 66 84 77 92
75 83 67 72 70 61 67 85 77 60
76 83 81 67 75 67 66 74 80 76
77 80 75 83 96 70 94 78 63 75
72 65 83 72 91 85 60 77 67 77

i. Prepare a grouped frequency distribution for these data using equal
class widths, the first class having a lower class limit of 60 and
midpoint of 62.
ii. Use the grouped frequency distribution obtained in (i) to construct a histogram.
iii. Construct an ogive for the data using the grouped frequency
distribution in (i) above.
3. The following table shows the value of imports and exports in a given country
during the period 1952 - 1960

Year Value of Imports (in US$) Value of exports (in US$)


1952 185,000 125,000
1953 191,700 141,700
1954 191,700 183,000
1955 216,700 250,000
1956 167,000 245,000
1957 198,300 247,000
1958 180,000 261,700
1959 168,300 271,700
1960 161,700 285,000

(i) Represent this information on a bar chart.


(ii) Represent the imports on a line graph.
4. A random sample of 50 patients suffering from migraine headache were treated
with a new drug and their time to recovery, to the nearest second were as shown
below.
754 456 784 467 751 468 734 731 586 740
654 570 654 620 597 601 583 707 690 730
579 650 473 577 781 544 605 704 590 477
490 508 700 568 607 504 664 579 700 604
760 588 730 645 588 631 580 701 602 647

(i) Construct a grouped frequency distribution for these data.
(ii) Using the grouped frequency distribution, construct a
frequency polygon for these data.

5. The weights, to the nearest kilogram of heifers in a dairy farm are as summarized
below.

Weight (in kg)   170–179   180–189   190–199   200–209   210–219   220–229
No. of heifers   3         8         19        43        79        8

Draw a histogram to illustrate these data.

6. A group of 130 company employees were interviewed to establish their religious
affiliations. Five categories emerged and are tabulated below.

Religion No. of employees


Christianity 65
Islam 26
Hinduism 13
Buddhism 16
Atheism 10

Use the above data to prepare


(i) a bar chart
(ii) a pie chart
7. An archeologist was investigating the effect of time on the weight of fish bones.
The following table shows the distribution of the weights, to the nearest gram, for
a random sample of 180 fossils of fish around Kisumu city in Kenya.

Weight (in grams)   20–24   25–29   30–34   35–39   40–44   45–49   50–54   55–59   60–64   65–69
No. of fossils      13      24      37      43      28      17      8       5       4       1

Represent these data by means of a histogram.

8. The table below shows the number of students at Kenyatta University in Nairobi,
Kenya pursuing a Bachelor of Education (Home Economics) degree course in the
years 1984 –1989.

Year No. of students


1984 89
1985 185
1986 146
1987 257
1988 260
1989 260
Source: Kenyatta University, Kenya
Using this information, prepare
(i) A histogram
(ii) A pie chart

UNIT 3

MEASURES OF CENTRAL TENDENCY


3.0 Introduction
In the previous Unit, we have considered tabular and graphical methods of data
presentation and summarization. This unit discusses the measures of central tendency, a
numerical approach to the problem of data summarization. The Unit elucidates how to
choose an appropriate measure of central tendency for quantitative variables.

Prerequisite
Before starting this Unit you should be conversant with the common mathematical
symbols and notations, including the subscripts and the summation symbol
discussed in Unit 1.

3.1 Objectives
By the end of this Unit, you should be able to:
• distinguish between the two main types of measures of central
tendency; i.e., mathematical and positional averages.
• know when it is appropriate to use each of them.
• calculate the various mathematical averages and positional averages
from given raw data.
• compute and interpret quartiles for a set of observations.

Learning Style
To achieve what is expected of you:
• Briefly revise Section 1.8 of Unit 1.
• Allocate sufficient study time.
• Attempt most of the practice and revision exercises in this Unit.


3.2 What is a Measure of Central Tendency?
Measures of central tendency are among the most commonly used statistical
summaries because they reduce the complexity of data and make data sets easier to
compare. We cannot remember a whole set of data, and analysing it observation by
observation is impractical. We therefore seek a numerical measure that is, in a
way, representative of the entire data set. We frequently hear statements such
as “the average life span of a person in such and such a country has reduced by
5%” or “the average growth rate of the world economy has doubled”. Such an
average must be representative of the whole data set. It provides us with an
example of a measure of central tendency.

Definition: A measure of central tendency is any statistical measure that gives
an idea about the position of the point around which other observations cluster.

Note: Another term frequently used for a measure of central tendency is a
measure of location.

3.2.1 Requisites for an Ideal Measure of Central Tendency


The following are the characteristics of an ideal measure of central tendency. It should
• be rigidly defined – there should exist a definite formula so that the result does
not change from one individual to another.
• be based on all observations.
• be calculated with reasonable ease and speed.
• not be affected by sampling fluctuations.
• be amenable to further algebraic treatment; that is, it should be defined in such
a way that it can be easily used in further statistical analysis. For example, we
should be able to extend the formula for a measure of central tendency for one
series to obtain the formula for the measure of central tendency for two or more
series of a similar kind combined together.
• be easy to understand.
• not be affected by extreme values.

3.2.2 Categories of Measures of Central Tendency


Measures of central tendency can be grouped into two main categories:
I. Mathematical averages, and
II. Positional averages.

3.3 Mathematical Averages


The use of an average is based on the principle that, over a large number of
cases, deviations in one direction are generally offset by those in the other
direction. The average gives a single expression of the whole set of data; it is
a value of the variable located near the middle of the distribution. We shall
consider three types of mathematical averages; namely
a) Arithmetic mean
b) Geometric mean
c) Harmonic mean

3.3.1 Arithmetic mean


Let $x_1, x_2, \ldots, x_n$ be a set of n observations. Then the arithmetic mean for this set, denoted
by $\bar{x}$, is defined by
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \qquad (3.1)$$
For grouped data, the arithmetic mean is given by
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}, \qquad (3.2)$$
where $f_i$ is the frequency of the ith class and $x_i$ is, in this case, the class midpoint.
This method is called the direct method of calculating the arithmetic mean.

There is another method of calculating the arithmetic mean for grouped data, called the
indirect method. In this method we list the class midpoints, pick any of them as the
assumed (working) mean, A, and then calculate $d_i$, the deviation from the assumed mean
of the ith midpoint. Then the mean is given by
$$\bar{x} = A + \frac{\sum_{i=1}^{n} f_i d_i}{\sum_{i=1}^{n} f_i}, \qquad (3.3)$$
where $d_i = x_i - A$ and $f_i$ is the frequency of the ith class. This formula is suitable for
classes with equal class widths.

Advantages of the Arithmetic Mean


The arithmetic mean
o is uniquely defined and leaves no scope for deliberate prejudice or personal bias.
o can be easily calculated.
o is easy to understand.
o is amenable to further algebraic treatment; for example, we can calculate the
combined mean of a set of data, given the means of the subsets (sub-samples).
o takes into account every observation in the data set.
o is a more accurate and more reliable basis for comparison if the number of items
or observations are many.
o is least affected by fluctuations of the sample –at least compared to other
measures of location.

Disadvantages of the Arithmetic Mean


o It is highly affected by extreme values in a data set. The extreme value in a data
set “pulls away” the mean from the point of concentration. For example, the
arithmetic average of the observations 1, 2, 10, 12, 1000 is $\bar{x} = 205$. This average is
biased, since it has given more importance to bigger items and less importance to
smaller ones.
o It does not deal with qualitative data that cannot be expressed numerically.
o It may fall at a point where none of the actual observations are. For instance, we
may obtain 13.9 eggs as the arithmetic mean of the following numbers of eggs
collected at eleven spots: 40, 10, 20, 24, 14, 13, 7, 17, 1, 3, 4. Hence it may not
be truly representative.
o It cannot be computed for grouped data in the cases of open classes.

Properties of the Arithmetic Mean


1. The sum of the deviations of the observations $x_1, x_2, \ldots, x_n$ from their arithmetic
mean is equal to zero.
Proof:
$$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x} = n\bar{x} - n\bar{x} = 0,$$
since $\sum_{i=1}^{n} \bar{x} = n\bar{x}$ and, from equation (3.1), $\sum_{i=1}^{n} x_i = n\bar{x}$.
This property implies that the arithmetic mean is a score or a potential score that
balances all the scores on either side of it. This explains why the arithmetic mean
is very sensitive to extreme values when these values are not balanced on both
sides of it.

2. If $z_1 = x_1 + y_1,\ z_2 = x_2 + y_2,\ \ldots,\ z_n = x_n + y_n$ then $\bar{z} = \bar{x} + \bar{y}$,
where
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}, \quad \text{and} \quad \bar{z} = \frac{\sum_{i=1}^{n} z_i}{n}.$$
Proof:
By definition,
$$\bar{z} = \frac{\sum_{i=1}^{n} z_i}{n} = \frac{\sum_{i=1}^{n} (x_i + y_i)}{n} \quad \{\text{since } z_i = x_i + y_i\}$$
$$= \frac{\sum_{i=1}^{n} x_i}{n} + \frac{\sum_{i=1}^{n} y_i}{n} = \bar{x} + \bar{y}$$

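3. Let $d_1 = x_1 - A,\ d_2 = x_2 - A,\ \ldots,\ d_n = x_n - A$ be the deviations of the
observations $x_1, x_2, \ldots, x_n$ from any number A. Then
$$\bar{x} = A + \frac{\sum_{i=1}^{n} d_i}{n} \quad \{\text{A is then referred to as the assumed mean}\}$$
Proof:
By definition
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{n} (d_i + A)}{n} \quad \{\text{since } x_i = d_i + A\}$$
$$= \frac{\sum_{i=1}^{n} d_i}{n} + \frac{\sum_{i=1}^{n} A}{n} = \frac{\sum_{i=1}^{n} d_i}{n} + A \quad \left\{\text{since } \frac{\sum_{i=1}^{n} A}{n} = \frac{nA}{n} = A\right\}$$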

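4. If $\bar{x}_1$ and $\bar{x}_2$ are the means of two samples of sizes $n_1$ and $n_2$, then the combined
mean is given by
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
Proof:
Let $x_{11}, x_{12}, \ldots, x_{1n_1}$ be a sample of $n_1$ observations and let $x_{21}, x_{22}, \ldots, x_{2n_2}$ be
another sample of $n_2$ observations.
By definition, the means of these samples are, respectively,
$$\bar{x}_1 = \frac{\sum_{i=1}^{n_1} x_{1i}}{n_1} \quad \text{and} \quad \bar{x}_2 = \frac{\sum_{i=1}^{n_2} x_{2i}}{n_2}$$
Relabel the combined observations as $x_1, x_2, \ldots, x_{n_1 + n_2}$, with the first $n_1$ coming from the
first sample and the remaining $n_2$ from the second. The mean of the combined sample is thus given by
$$\bar{x} = \frac{\sum_{i=1}^{n_1 + n_2} x_i}{n_1 + n_2} = \frac{\sum_{i=1}^{n_1} x_i + \sum_{i=n_1+1}^{n_1+n_2} x_i}{n_1 + n_2} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$

In general, let $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ be the sample means of k samples of sizes $n_1, n_2, \ldots, n_k$,
respectively. Then the overall (combined) mean of these samples is given by
$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k} = \frac{\sum_{i=1}^{k} n_i \bar{x}_i}{\sum_{i=1}^{k} n_i}$$
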
5. The sum of squares of deviations from the arithmetic mean is less than the sum of
squares of deviations from any other arbitrary score. That is, the sum of squares of
deviations is minimum when taken about the arithmetic mean.
Proof:
Consider a frequency distribution with class midpoints $x_i$ and frequencies $f_i$,
i = 1, 2, …, n, and let A be an arbitrary value (an assumed mean).
Define
$$S = \sum_{i=1}^{n} f_i d_i^2 = \sum_{i=1}^{n} f_i (x_i - A)^2, \quad \text{where } d_i = x_i - A,\ i = 1, 2, \ldots, n.$$
We will now invoke some fundamental results in differential calculus.
At a stationary point of S,
$$\frac{\partial S}{\partial A} = -2\sum_{i=1}^{n} f_i (x_i - A) = 0 \;\Rightarrow\; \sum_{i=1}^{n} f_i (x_i - A) = 0 \qquad (3.4)$$
Next,
$$\frac{\partial^2 S}{\partial A^2} = -2\,\frac{\partial}{\partial A}\left(\sum_{i=1}^{n} f_i x_i - A\sum_{i=1}^{n} f_i\right) = 2\sum_{i=1}^{n} f_i > 0$$
Thus the stationary point is a relative minimum of S.
And from (3.4),
$$\sum_{i=1}^{n} f_i x_i - A\sum_{i=1}^{n} f_i = 0 \;\Rightarrow\; \sum_{i=1}^{n} f_i x_i = A\sum_{i=1}^{n} f_i \;\Rightarrow\; A = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \bar{x}$$
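
Three of the properties above (zero sum of deviations, the combined mean, and the minimum sum of squares) can be checked numerically. The following Python sketch uses two small hypothetical samples; the function names and the data are illustrative assumptions, not part of the module.

```python
def mean(xs):
    """Arithmetic mean of a list of observations."""
    return sum(xs) / len(xs)


def combined_mean(means, sizes):
    """Combined mean of several samples from their means and sizes (Property 4)."""
    return sum(n * m for m, n in zip(means, sizes)) / sum(sizes)


def sum_sq_dev(xs, a):
    """Sum of squared deviations of the observations from the value a."""
    return sum((x - a) ** 2 for x in xs)


sample_1 = [2, 4, 6, 8]          # hypothetical data, mean 5
sample_2 = [1, 3, 5, 7, 9, 11]   # hypothetical data, mean 6

# Property 1: deviations from the mean sum to zero.
deviation_sum = sum(x - mean(sample_1) for x in sample_1)

# Property 4: combined mean agrees with the mean of the pooled data.
pooled = mean(sample_1 + sample_2)
combined = combined_mean(
    [mean(sample_1), mean(sample_2)], [len(sample_1), len(sample_2)]
)

# Property 5: the sum of squares about the mean is smaller than about other values.
about_mean = sum_sq_dev(sample_1, mean(sample_1))
about_other = [sum_sq_dev(sample_1, a) for a in (4, 6)]

print(deviation_sum, pooled, combined, about_mean, about_other)
```

A numerical check of this kind illustrates the properties for one data set only; the proofs above are what establish them in general.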

Let us now consider some examples to demonstrate how the arithmetic mean is
calculated.
Example 3.1:
Given the following data calculate the arithmetic mean.
x 1 2 3 4 5
frequency 3 5 9 6 2

Solution:
(a) Direct Method
For grouped data,
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
From the given values, we form the following frequency distribution.

x       Frequency (f)   fx
1       3               3
2       5               10
3       9               27
4       6               24
5       2               10
Total   25              74

Thus, $\sum_{i=1}^{5} f_i = 25$ and $\sum_{i=1}^{5} f_i x_i = 74$.

Therefore the arithmetic mean is given by
$$\bar{x} = \frac{\sum_{i=1}^{5} f_i x_i}{\sum_{i=1}^{5} f_i} = \frac{74}{25} = 2.96$$

(b) Indirect Method
Let the assumed mean A = 3. Then, we obtain the following table.

x       Frequency (f)   d = x − 3   fd
1       3               −2          −6
2       5               −1          −5
3       9               0           0
4       6               1           6
5       2               2           4
Total   25                          −1

Thus, $\sum_{i=1}^{5} f_i = 25$ and $\sum_{i=1}^{5} f_i d_i = -1$.

The arithmetic mean is then given by
$$\bar{x} = A + \frac{\sum_{i=1}^{5} f_i d_i}{\sum_{i=1}^{5} f_i} = 3 + \frac{(-1)}{25} = 3 - \frac{1}{25} = 2.96$$
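
The two methods in Example 3.1 can also be compared in a few lines of Python. This is a minimal sketch; the function names `mean_direct` and `mean_assumed` are illustrative and are not notation used in the module.

```python
def mean_direct(xs, fs):
    """Direct method: sum of f*x divided by sum of f, as in formula (3.2)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_assumed(xs, fs, assumed):
    """Indirect method: A + (sum of f*d) / (sum of f), with d = x - A, as in (3.3)."""
    return assumed + sum(f * (x - assumed) for x, f in zip(xs, fs)) / sum(fs)


# Data from Example 3.1.
values = [1, 2, 3, 4, 5]
freqs = [3, 5, 9, 6, 2]

direct = mean_direct(values, freqs)
indirect = mean_assumed(values, freqs, assumed=3)
print(direct, indirect)
```

Any other choice of assumed mean gives the same result; choosing a value near the centre of the data simply keeps the deviations small when working by hand.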

Example 3.2:
The following grouped data shows the number of consignments and their weights (in kg)
received by a courier service for shipment.
weight frequency
6.5 – 7.5 5
7.5 – 8.5 12
8.5 – 9.5 25
9.5 – 10.5 48
10.5 – 11.5 32
11.5 – 12.5 6
12.5 – 13.5 1

Compute the arithmetic mean of the above weights.

Solution:
Let the assumed mean A = 10. Then we have the following table.

Weight        Frequency (f)   Midpoint (x)   d = x − 10   fd
6.5 – 7.5     5               7              −3           −15
7.5 – 8.5     12              8              −2           −24
8.5 – 9.5     25              9              −1           −25
9.5 – 10.5    48              10             0            0
10.5 – 11.5   32              11             1            32
11.5 – 12.5   6               12             2            12
12.5 – 13.5   1               13             3            3
Total         129                                         −17

Thus, $\sum_{i=1}^{7} f_i = 129$ and $\sum_{i=1}^{7} f_i d_i = -17$.

The arithmetic mean is then given by
$$\bar{x} = A + \frac{\sum_{i=1}^{7} f_i d_i}{\sum_{i=1}^{7} f_i} = 10 + \frac{(-17)}{129} \approx 9.868$$

Note:
In the computation of $\bar{x}$ by the method of assumed mean, we sometimes reduce the
bulkiness of the deviations x – A by dividing the deviations by a common factor,
h, say. In this case the arithmetic mean is given by
$$\bar{x} = A + \left(\frac{\sum f d}{\sum f}\right) \cdot h,$$
where $d = (x - A)/h$.

Exercise 3.1
Using the data in Example 3.2, calculate the arithmetic mean using the direct method.
Compare your answer with the one obtained using the indirect method. What do you
note?

3.3.2 Geometric mean


The geometric mean is commonly used in the calculation of index numbers. It is defined
as the nth root of the product of all the observations in a data set. That is, if $x_1, x_2, \ldots, x_n$ are
n observations then the geometric mean is given by
$$x_r = \sqrt[n]{x_1 \cdot x_2 \cdots x_n} = \sqrt[n]{\prod_{i=1}^{n} x_i}$$
We can introduce logarithms in this definition so that we have
$$\log(x_r) = \frac{1}{n}\log\left(\prod_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} \log(x_i)$$
Therefore,
$$x_r = \operatorname{antilog}\left(\frac{1}{n}\sum_{i=1}^{n} \log(x_i)\right)$$
In the case of grouped data; if $x_1, x_2, \ldots, x_n$ are the observations with $f_1, f_2, \ldots, f_n$ as the
corresponding frequencies then
$$x_r = \sqrt[\sum_{i=1}^{n} f_i]{\underbrace{x_1 \cdots x_1}_{f_1 \text{ times}} \cdot \underbrace{x_2 \cdots x_2}_{f_2 \text{ times}} \cdots \underbrace{x_n \cdots x_n}_{f_n \text{ times}}} = \sqrt[\sum_{i=1}^{n} f_i]{x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}}$$
Taking logarithms we get
$$\log(x_r) = \frac{1}{\sum_{i=1}^{n} f_i}\log\left(x_1^{f_1} \cdot x_2^{f_2} \cdots x_n^{f_n}\right) = \frac{1}{\sum_{i=1}^{n} f_i}\sum_{i=1}^{n} f_i \log(x_i)$$
Thus,
$$x_r = \operatorname{antilog}\left(\frac{1}{\sum_{i=1}^{n} f_i}\sum_{i=1}^{n} f_i \log(x_i)\right)$$
The geometric mean has its own limitations. For example, it cannot be used for a data set
with negative values. Besides, if any observation in the data set is zero, the geometric
mean is equal to zero.
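
The logarithmic form of the geometric mean translates directly into code. The Python sketch below is one possible implementation under the definitions above; the function name `geometric_mean`, its optional frequency argument, and the sample values are hypothetical and chosen for illustration.

```python
import math


def geometric_mean(xs, fs=None):
    """Geometric mean via logarithms; fs holds optional frequencies (default 1 each)."""
    if fs is None:
        fs = [1] * len(xs)
    if any(x < 0 for x in xs):
        raise ValueError("geometric mean is not used for negative values")
    if any(x == 0 for x in xs):
        return 0.0  # a single zero observation makes the product zero
    log_sum = sum(f * math.log10(x) for x, f in zip(xs, fs))
    return 10 ** (log_sum / sum(fs))  # the antilog step


print(geometric_mean([2, 8]))          # square root of 2 * 8
print(geometric_mean([2, 8], [3, 1]))  # fourth root of 2 * 2 * 2 * 8
```

Working with the sum of logarithms rather than the raw product avoids very large intermediate numbers when there are many observations.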

Example 3.3:
Find the geometric mean of the values 4,6,8,9.

Solution:
By definition, $x_r = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$.
Here n = 4 and therefore,
$$x_r = \sqrt[4]{4 \times 6 \times 8 \times 9} = \sqrt[4]{1728}$$
Using logarithms;
$$x_r = \operatorname{antilog}\left(0.25 \times \log(1728)\right) = \operatorname{antilog}(0.8094) \approx 6.447$$

Example 3.4:
Given below is the frequency distribution of students’ performance in a Mathematics test
at the end of a school term. The marks are out of 50.
Marks Number of
students
0 – 10 5
10 – 20 9
20 – 30 10
30 – 40 16
40 – 50 4

Find the geometric mean of the frequency distribution.


Solution:
Using the above frequency distribution, we form the following table.

Marks     Midpoint (x)   Frequency (f)   log(x)   f·log(x)
0 – 10    5              5               0.699    3.495
10 – 20   15             9               1.176    10.585
20 – 30   25             10              1.398    13.979
30 – 40   35             16              1.544    24.706
40 – 50   45             4               1.653    6.613
Total                    44              6.47     59.377

Then,
$$x_r = \operatorname{antilog}\left(\frac{1}{\sum_{i=1}^{5} f_i}\sum_{i=1}^{5} f_i \log(x_i)\right) = \operatorname{antilog}\left(\frac{1}{44}(59.377)\right) = \operatorname{antilog}(1.3495) \approx 22.36$$

3.3.3 Harmonic mean

The harmonic mean of a set of n observations $x_1, x_2, \ldots, x_n$ is the ratio of n to the sum of the
reciprocals of the observations. That is, the harmonic mean is defined as
$$x_h = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{x_i}}$$
For grouped data, if $x_1, x_2, \ldots, x_n$ are the observations with $f_1, f_2, \ldots, f_n$ as the
corresponding frequencies then
$$x_h = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \dfrac{f_i}{x_i}}$$

The harmonic mean is mainly used where it is desired to give the greatest weight to the
smallest items. It is used in such areas as in averaging rates and time. It is not, however,
a popular measure of location.
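
The harmonic mean formulas above can be sketched in Python as follows. The function name `harmonic_mean` and the two speeds used in the usage lines are hypothetical, chosen only to illustrate the averaging of rates over equal distances.

```python
def harmonic_mean(xs, fs=None):
    """Harmonic mean; fs holds optional frequencies (default 1 for each value)."""
    if fs is None:
        fs = [1] * len(xs)
    # Sum of frequencies divided by the sum of f/x, as in the grouped formula.
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))


# Hypothetical trip: the same distance covered at 40 km/h and then at 60 km/h.
speeds = [40, 60]
print(harmonic_mean(speeds))           # average speed over the whole trip
print(sum(speeds) / len(speeds))       # arithmetic mean, for comparison
```

For these values the harmonic mean (48) is smaller than the arithmetic mean (50), reflecting the greater weight it gives to the smaller item.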

Example 3.5:
An airplane flies around a square whose sides are each 100 miles long. It covers the
first side at a speed of 100 miles per hour (mph), the second side at 200 mph, the
third side at 300 mph and the fourth side at 400 mph. What is the average speed?

Solution:
Since the four sides are of equal length, the average speed (total distance divided
by total time) is the harmonic mean of the four speeds.
$$x_h = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}} = \frac{4}{\dfrac{1}{100} + \dfrac{1}{200} + \dfrac{1}{300} + \dfrac{1}{400}} = 192 \text{ mph}$$

Example 3.6:
The table below shows the distribution of wages among the workers of Life Is
Precious, a non-governmental organization whose role is to alleviate poverty in a
certain community.

Wages Number of workers


40 – 50 12
50 – 60 10
60 – 70 15
70 – 80 17
80 – 90 8
90 – 100 3
Find the harmonic mean for the distribution.

Solution:
Using the above frequency distribution, we form the following table.

Wages       Midpoint (x)   Number of workers (f)   Reciprocal (1/x)   f/x
40 – 50     45             12                      0.0222             0.2664
50 – 60     55             10                      0.0182             0.1820
60 – 70     65             15                      0.0154             0.2310
70 – 80     75             17                      0.0133             0.2261
80 – 90     85             8                       0.0118             0.0944
90 – 100    95             3                       0.0105             0.0315
Total                      65                                         1.0314

Then,
$$x_h = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \dfrac{f_i}{x_i}} = \frac{65}{1.0314} \approx 63.021$$
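The grouped formula can be evaluated in the same way. This is a minimal sketch using the midpoints and frequencies of Example 3.6; `grouped_harmonic_mean` is a hypothetical helper name. Without rounding the reciprocals the result is about 63.01, slightly below the 63.021 obtained above from reciprocals rounded to four decimal places.

```python
# Class midpoints and frequencies from Example 3.6
midpoints = [45, 55, 65, 75, 85, 95]
workers = [12, 10, 15, 17, 8, 3]

def grouped_harmonic_mean(xs, fs):
    """Total frequency divided by the sum of f/x."""
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))

hm = grouped_harmonic_mean(midpoints, workers)
print(round(hm, 2))
```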


Note:
In the computation of $\bar{x}$, $x_h$ and $x_r$, the results are unaffected by whether the
class intervals
o are continuous or not.
o have equal or unequal class widths.

Exercise 3.2
1. Given the following data:

Age group 80-89 70-79 60-69 50-59 40-49 30-39 20-29 10-19

Frequency 2 2 6 20 56 40 42 32

Calculate the
(i) arithmetic mean
(ii) harmonic mean
(iii) geometric mean
2. Find the average mark of the student from the following frequency table:

Marks Number of students


Below 10 15
Below 20 35
Below 30 60
Below 40 84
Below 50 96
Below 60 127
Below 70 198
Below 80 250

3. Find the average wage of a worker from the frequency table:

Wage        Number of workers
Above 0     685
Above 10    500
Above 20    423
Above 30    389
Above 40    309
Above 50    273
Above 60    250
Above 70    0

3.4 Positional Averages


There are three main types of positional averages, namely:
(i) Quartiles
(ii) Median
(iii) Mode

i. Quartiles
Quartiles are the values of the variate that divide the total frequency into four equal
parts.
The kth quartile, denoted by Qk, is given by
$$Q_k = L_1 + \frac{\left(\dfrac{kN}{4} - C\right)h}{f}$$
where
L1 = lower limit of the kth quartile class
N = total frequency
f = frequency of the quartile class
C = cumulative frequency of the class preceding the quartile class
h = width of the quartile class
k = 1, 2, 3; that is, the first, second and third quartile.

Advantages of Quartiles
The quartiles
o are very easy to calculate.
o are not affected by extreme values.
o can be used to treat qualitative data.
o can be determined graphically using ogives.


Limitations
The quartiles
o are not amenable to further algebraic manipulation.
o require the data to be arranged in ascending or descending order of
magnitude, which involves additional work.
o are erratic if the number of items is small.

Example 3.7:
Using the data below compute the quartiles and the median.

Variable 5 7 9 11 13 15 17 19
frequency 1 2 7 9 11 8 5 4

Solution:
We first calculate the cumulative frequency in order to determine the value of the
quartiles.

Variable (x) Frequency (f) Cumulative frequency (c.f.)

5 1 1

7 2 3

9 7 10

11 9 19

13 11 30

15 8 38

17 5 43

19 4 47

First quartile Q1 = size of the (N/4)th item
= 47/4
= 11.75th item.
This item falls within the cumulative frequency c.f. = 19, where x = 11. Hence the first
quartile Q1 = 11.

The second quartile (median) Q2 = size of the (N/2)th item
= 47/2
= 23.5th item.
This item falls within the cumulative frequency c.f. = 30, where x = 13. Hence the
second quartile Q2 = 13.

The third quartile Q3 = size of the (3N/4)th item
= (47 × 3)/4
= 35.25th item.
This item falls within the cumulative frequency c.f. = 38, where x = 15. Hence the third
quartile Q3 = 15.
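The rule used in Example 3.7 (take the first value whose cumulative frequency reaches kN/4) can be sketched in Python as follows. The function name `quartile` is illustrative, and the rule coded here is the one used in this example; other texts and software packages use slightly different conventions.

```python
from itertools import accumulate

# Values and frequencies from Example 3.7
values = [5, 7, 9, 11, 13, 15, 17, 19]
freqs = [1, 2, 7, 9, 11, 8, 5, 4]

def quartile(values, freqs, k):
    """Value of the (k*N/4)th item: first x whose cumulative frequency reaches k*N/4."""
    cumulative = list(accumulate(freqs))
    position = k * cumulative[-1] / 4
    for x, cf in zip(values, cumulative):
        if cf >= position:
            return x

q1, q2, q3 = (quartile(values, freqs, k) for k in (1, 2, 3))
print(q1, q2, q3)
```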

Example 3.8:
Find the median and the quartiles for the marks obtained by 76 students given below.

Marks       0 – 10   10 – 20   20 – 30   30 – 40   40 – 50
Frequency   4        8         12        32        20
c.f.        4        12        24        56        76

Solution:
The median Q2 = size of the (N/2)th item
= 76/2
= 38th item.
This item lies in the class interval 30-40, whose cumulative frequency is 56.
Applying the interpolation formula
$$M = L_1 + \frac{\left(\dfrac{N}{2} - C\right)h}{f}$$
with h = 10, L1 = 30, f = 32, N/2 = 38 and C = 24, we get
$$M = 30 + \frac{(38 - 24) \times 10}{32} = 34.375 \approx 34.38 \text{ marks}$$
The first quartile Q1 = size of the (N/4)th item
= 76/4
= 19th item.
This item lies in the class interval 20-30, whose cumulative frequency is 24.
Applying the interpolation formula
$$Q_1 = L_1 + \frac{\left(\dfrac{N}{4} - C\right)h}{f}$$
with h = 10, L1 = 20, f = 12, N/4 = 19 and C = 12, we get
$$Q_1 = 20 + \frac{(19 - 12) \times 10}{12} \approx 25.83 \text{ marks}$$
The third quartile Q3 = size of the (3N/4)th item
= (76 × 3)/4
= 57th item.
This item lies in the class interval 40-50, whose cumulative frequency is 76 (the
preceding class has c.f. = 56).
Applying the interpolation formula
$$Q_3 = L_1 + \frac{\left(\dfrac{3N}{4} - C\right)h}{f}$$
with h = 10, L1 = 40, f = 20, 3N/4 = 57 and C = 56, we get
$$Q_3 = 40 + \frac{(57 - 56) \times 10}{20} = 40.5 \text{ marks}$$
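The three interpolations of Example 3.8 follow one pattern, which the sketch below captures. It assumes equal class widths and classes listed in ascending order; `grouped_quartile` is a hypothetical helper name.

```python
def grouped_quartile(lower_limits, width, freqs, k):
    """Interpolated k-th quartile: L1 + (k*N/4 - C) * h / f."""
    target = k * sum(freqs) / 4
    cum_before = 0                      # C: cumulative frequency before the class
    for lower, f in zip(lower_limits, freqs):
        if cum_before + f >= target:    # the quartile class has been reached
            return lower + (target - cum_before) * width / f
        cum_before += f

# Data from Example 3.8
lower_limits = [0, 10, 20, 30, 40]
freqs = [4, 8, 12, 32, 20]

q1 = grouped_quartile(lower_limits, 10, freqs, 1)
q2 = grouped_quartile(lower_limits, 10, freqs, 2)   # the median
q3 = grouped_quartile(lower_limits, 10, freqs, 3)
print(round(q1, 2), q2, q3)
```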

Exercise 3.3
Calculate the median, the first and the third quartiles for the following data.

Weight 60-69 70-79 80 -89 90-99 100-109 110-119 120-129 130-139 140-149
Boys 2 9 24 28 15 11 7 3 1

ii. Median
This is the second quartile, i.e. when k=2. It may be defined as the middle most or central
value of the variable when the values are arranged in increasing order of magnitude. In
the case of grouped data, the median may be defined as that value of the variable that
divides the area of the curve into two equal parts.

At this point, let us revisit the box and whisker plot, which we could not discuss
exhaustively in Unit 2 because quartiles had not yet been introduced.

The Box and Whisker Plot (Boxplot)


A boxplot displays the five-number summary of a data set, which consists of:
• the most extreme values in the data set (minimum and maximum);
• the lower and the upper quartiles;
• the median.


The figure below shows a typical boxplot. It consists of a box from Q1 to Q3 with
whiskers extending to the minimum and maximum of the data set.

[Figure: a horizontal box running from Q1 to Q3 with a line at the Median, and whiskers extending to Min and Max]

Fig. 3.1: A Box and Whisker plot


iii. Mode
It is usually found that in a given data, a certain item will occur more frequently than any
other and this predominant item can easily be located. The value of the item, which is
most common, is known as the mode. The mode is the value that occurs most frequently.
It automatically follows that if the items are selected at random, the most likely item to
occur will be the modal value. In the case of discrete grouped frequency distribution, the
mode is the value of the variable corresponding to the maximum frequency. In the case of
continuous data the mode is given by the following interpolation formula:

$$\text{Mode} = L_1 + \frac{(f_m - f_1)h}{2f_m - f_1 - f_2}$$

where L1 = lower limit of the modal class,
h = width of the modal class,
fm = frequency of the modal class,
f1 = frequency of the class preceding the modal class,
f2 = frequency of the class succeeding the modal class.

Advantages
The mode,
o can easily be calculated.
o is not affected by extreme values.
o can be determined graphically.
o can be used for qualitative data analysis.
Disadvantages


The mode;
o is not amenable to further algebraic manipulation.
o is indeterminate when the distribution is irregular and there is no
definite point of maximum density.
o is not significant when the frequency distribution does not include
a large number of items.

In the case of discrete and continuous grouped data we locate the mode by the method of
grouping.

Example 3.9:
If seven men are receiving daily wages of Shs. 5,6,7,7,8,9,10 find the modal wage.

Solution:
The modal wage is Shs. 7, because it is the wage that occurs most frequently (twice).

Method of Grouping
The method of grouping is applied when;
(i) The maximum frequency is repeated
(ii) The distribution is irregular, deviates from normality.

Example 3.10:
Find the mode of the following distribution.
Variable 3 4 5 6 7 8 9 10 11
Frequency 5 4 6 8 9 7 5 9 4

Solution:
If we locate the mode by inspection, we find that the values 7 and 10 both have the maximum
frequency (9), hence we cannot determine whether the mode is 7 or 10. This is a case of a
bimodal distribution. We can determine the mode of this distribution using the method of
grouping.


Variable   Frequency   I     II    III   IV    V
3          5           9           15
4          4                 10          18
5          6           14                      23
6          8                 17    24
7          9           16                21
8          7                 12                21
9          5           14          18
10         9                 13
11         4

(Each sum in columns I to V is written on the row of the first value in its group.)

Procedure:
In column I the frequencies are added in pairs. In column II we leave out the first item and
add the rest in pairs. In column III the items are added in threes and in column IV the
first item is left out and the rest are added in threes. In column V the first two items are
left out and the rest are added in threes. The maximum frequency in each column is picked
out in the table below:

Column Number   Maximum Frequency   Combination of Values
I               16                  7, 8
II              17                  6, 7
III             24                  6, 7, 8
IV              21                  7, 8, 9
V               23                  5, 6, 7

We then construct a frequency table by counting the number of occurrences of each of
the items in the combinations above:

Frequency Table
Variable    5   6   7   8   9
Frequency   1   3   5   3   1

Since item 7 has the maximum frequency, 7 is the mode. In the case of grouped data
we locate the modal class using the method of grouping and then apply the interpolation
formula.
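The grouping procedure of Example 3.10 is mechanical enough to sketch in code. The version below follows the five columns described above (pairs and threes with different starting offsets, complete groups only) and tallies how often each value appears in a winning group. The function name `grouping_tally` and the tie handling (every group that attains a column's maximum is counted) are assumptions made for illustration.

```python
from collections import Counter

# Data from Example 3.10
values = [3, 4, 5, 6, 7, 8, 9, 10, 11]
freqs = [5, 4, 6, 8, 9, 7, 5, 9, 4]

def grouping_tally(values, freqs):
    """Count how often each value belongs to a maximum group in columns I to V."""
    # (group size, number of leading items left out) for columns I, II, III, IV, V
    columns = [(2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    tally = Counter()
    for size, skip in columns:
        groups = [
            (sum(freqs[i:i + size]), values[i:i + size])
            for i in range(skip, len(freqs) - size + 1, size)  # complete groups only
        ]
        best = max(total for total, _ in groups)
        for total, members in groups:
            if total == best:
                tally.update(members)
    return tally

tally = grouping_tally(values, freqs)
mode = max(tally, key=tally.get)
print(dict(sorted(tally.items())), mode)
```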

Example 3.11:
Find the mode from the following data

Marks 0-10 10-20 20-30 30-40 40-50 50-60 60-70


Frequency 5 13 21 37 31 24 3

Solution:
Since this is a unimodal distribution, we see that the modal class is 30 - 40. We can now
use the interpolation formula with L1 = 30, h = 10, fm = 37, f1 = 21 and f2 = 31:

$$\text{Mode} = L_1 + \frac{(f_m - f_1)h}{2f_m - f_1 - f_2} = 30 + \frac{(37 - 21) \times 10}{74 - 21 - 31} \approx 37.27 \text{ marks}$$
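The interpolation in Example 3.11 can be sketched as below. It assumes a single modal class and equal class widths; treating a missing neighbouring class as having frequency 0 is an extra assumption of this sketch, not something stated in the text.

```python
def grouped_mode(lower_limits, width, freqs):
    """Mode = L1 + (fm - f1) * h / (2*fm - f1 - f2) for the class with the largest frequency."""
    m = max(range(len(freqs)), key=freqs.__getitem__)    # index of the modal class
    f_m = freqs[m]
    f_1 = freqs[m - 1] if m > 0 else 0                   # preceding class (assumed 0 if absent)
    f_2 = freqs[m + 1] if m < len(freqs) - 1 else 0      # succeeding class (assumed 0 if absent)
    return lower_limits[m] + (f_m - f_1) * width / (2 * f_m - f_1 - f_2)

# Data from Example 3.11
lower_limits = [0, 10, 20, 30, 40, 50, 60]
freqs = [5, 13, 21, 37, 31, 24, 3]

mode = grouped_mode(lower_limits, 10, freqs)
print(round(mode, 2))
```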

3.5 Revision Exercises


1. Calculate the mode of the following data.
Class Interval 0-5 5-10 10-15 15-20 20-25 25-30
Frequency 12 30 18 40 10 6

2. Find the mode for the following data.


Marks 5-10 10-15 15-20 20-25 25-30 30-35 35-40 40-45 45-50
Frequency 7 9 7 11 13 3 4 13 1


3. Below is the frequency distribution which resulted when the weights (in kg) of 50
calves in a dairy farm were measured.
Weight (kg)   170   172.5   175   177.5   180   182.5   185   187.5   190   192.5   195
Frequency     1     2       4     6       8     9       7     6       3     2       2

Find:
a) the mode
b) the median
c) the interquartile range

4. When checking the number of errors per page by a copy typist the frequency
distribution was as summarised below.
Number of errors per page 0 1 2 3 4 5 6 7 8
Frequency 4 15 27 20 18 10 4 1 1

Find:
a) the mode
b) the median
c) the upper quartile
5. The grouped frequency distribution shown below gives the results of an IQ test
performed on a group of 50 students.
IQ test marks   90-94   95-99   100-104   105-109   110-114   115-119   120-124   125-129
Frequency       2       7       9         14        9         4         3         2

Estimate the median and the lower quartile of these data.

6. A cellular phone dealer sells three different models made by the same
manufacturer. He sells
265 of Nokia 5210 at a mean price of Shs. 10 860,
352 of Nokia 8250 at a mean price of Shs. 12 580,
150 of Nokia 8310 at a mean price of Shs. 18 250.
Find the mean price of all the three phones sold during this period.

7. A group of 100 kids participated in a certain company promotion's quiz. The
quiz consisted of several questions cutting across various issues. The following
frequency distribution shows the number of questions answered correctly by the
kids.
Number of correct answers   1    2    3    4    5    6
Frequency                   11   18   26   23   15   7
Calculate the mean number of correct answers.

8. The following table shows the number of deaths due to HIV/AIDS–related
complications recorded in a certain hospital by age.
Age             0-9   10-19   20-29   30-39   40-49   50-59   60-69   70-79
No. of deaths   2     12      55      95      71      42      16      7

Estimate the mean age of death.

9. The yields of grains (x tonnes) from 500 small plots are grouped in classes with a
common class interval (0.2 tonne) in the table below, the value of x given being
the mid-values of the classes.
x f x f x f x f x f
2.8 4 3.4 47 4.0 88 4.6 35 5.2 4
3.0 15 3.6 63 4.2 69 4.8 10 - -
3.2 20 3.8 78 4.4 59 5.0 8 - -

Show that
(i) the mean of the distribution is 3.95 tonnes,
(ii) the median of the distribution is 3.95 tonnes
(iii) the lower and the upper quartiles are 3.63 and 4.28 tonnes respectively.


UNIT 4

MEASURES OF DISPERSION
4.0 Introduction
So far we have been concerned with calculating or estimating a single value to represent
a set of data. Although you can quote a single number to represent data, the data itself
will be spread about that number. This unit discusses the various methods of measuring
the spread of data.

Prerequisite
Before starting this Unit you should be conversant with the measures of location,
particularly the arithmetic mean.

4.1 Objectives
By the end of this unit the learner should be able to;
explain the purpose of measures of spread.
compute and interpret the range, variance and standard deviation for
quantitative variables using appropriate formulas.
know the basic properties of the standard deviation.

Learning Style
To achieve what is expected of you…
Briefly revise Unit 3.
Allocate sufficient study time.
Attempt most of the practice and revision exercises in this Unit.


4.2 What is a measure of Dispersion?


In Unit 3, we discussed the measures of central tendency. These measures reduce the
whole set of data to one single figure called an average. However, an average cannot by
itself represent the whole group adequately. To illustrate this, suppose we have the marks
of three groups of students A, B, and C:
A B C
30 45 5
30 35 55
30 10 30
In this example the arithmetic mean for each of the three groups of students is 30. Can
we conclude that the three distributions or groups are homogeneous? Perhaps not. There
might be some differences between the three groups which have not been pointed out by
the average. Such differences in the distributions are measured by resorting to measures
of dispersion. The main difference between these three groups is the variability within
them. Thus there is a need to measure the variability within the samples, because the
arithmetic mean, mode and median may be the same in two or more distributions while the
composition of the individual items in the series varies widely. It would be misleading to
describe such a situation with any one measure of central tendency. Measures of central
tendency locate the centre of the distribution but tell us nothing about the variability in
the samples.
Definition: Dispersion is the variability around a measure of central tendency.
The requisites of a good or ideal measure of dispersion are the same as those of an ideal
measure of central tendency.

4.3 Importance of Dispersion


1. The objective of measuring dispersion is to ascertain the degree of deviation that
exists in the data and hence the limits within which the data will vary in some
measurable variate or attribute.
2. Measures of dispersion supplement the information given by the measures of
central tendency. These measures are also called averages of the second order, i.e.
they average, for a second time, the deviations from a measure of central tendency.
This increases the accuracy of statistical analysis and interpretation and puts us
in a better position to draw dependable inferences.
3. Measures of dispersion make it possible to compare different groups.
4. Measures of dispersion are very important in many economic and social
problems. Comparisons are made and this helps in studying inequalities in the
distribution of income, wealth, land, etc., among different sections of the country.
Similarly, social problems in different areas of the country can be compared and
then addressed by taking effective steps.
There are five measures of dispersion:
1) The Range
2) The Interquartile Range
3) Semi Interquartile Range
4) Mean deviation
5) Standard deviation
The first three are positional measures of dispersion based on only some items of the series,
while the last two are based on all the items of the series.

4.4 Range
The range is the simplest measure of variation that we can use. It is simply the difference
between the largest and smallest values in a set of data. That is, if x1, x2, …, xn are the
observations in ascending order such that $x_1 \le x_2 \le x_3 \le \dots \le x_n$,
then
$$\text{Range} = x_n - x_1$$
In other words,
Range = Largest value – Smallest value

Example 4.1:
Given the observations 2, 4, 3, 5, 1, 3, 6, find the range.

Solution:


We first arrange the observations in ascending order: 1, 2, 3, 3, 4, 5, 6. Then, using the
above definition, we have Range = 6 − 1 = 5.

The range is mainly used in those fields where the variation is not considerable e.g. in the
field of quality control of manufactured goods, measuring money rates and rate of
exchange fluctuations. However, it should be kept in mind that the range is a crude
measure of dispersion and is entirely unsuitable for precise and accurate studies.

4.5 Interquartile Range


Sometimes we are interested in knowing the range within which a certain proportion of
the items fall. One such measure is the interquartile range. In this method the
deviation is calculated not between the extreme items but between the upper and lower
quartiles; i.e.,
$$IQR = Q_3 - Q_1,$$
where Q3 and Q1 denote the third and the first quartile respectively.
The relative measure is obtained by dividing the difference of the upper and the lower
quartiles by the sum of the two quartiles:
$$\text{Coefficient of the interquartile range} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
This measure is not affected by extreme values, but it is not based on all the observations.

4.6 Semi -Interquartile Range (SIQR)


The dependence of the range on the two extreme items can be avoided by adopting this
measure. It is based on two quartiles, the lower and the upper
quartiles:
$$SIQR = \frac{Q_3 - Q_1}{2}$$
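The three positional measures introduced so far are simple arithmetic, as the following sketch shows. It reuses the observations of Example 4.1 for the range and, purely as an illustration, the quartiles found in Example 3.8 (Q1 ≈ 25.83, Q3 = 40.5) for the quartile-based measures.

```python
# Range, using the observations of Example 4.1
observations = [2, 4, 3, 5, 1, 3, 6]
data_range = max(observations) - min(observations)

# Quartile-based measures, using the quartiles of Example 3.8 as illustrative inputs
q1, q3 = 25.83, 40.5
iqr = q3 - q1                          # interquartile range
siqr = iqr / 2                         # semi-interquartile range
coefficient = (q3 - q1) / (q3 + q1)    # relative (unit-free) measure

print(data_range, round(iqr, 2), round(siqr, 3), round(coefficient, 4))
```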

4.7 Mean Deviation


The mean (or average) deviation removes a serious shortcoming of the previous three
measures, namely that they are not based on all the observations. It gives us an idea
of how far the observations are scattered around a central point. The mean deviation is
the arithmetic mean of the deviations of a series computed from some measure of central
tendency, ignoring the signs.
Note that the algebraic sum of the deviations of a group of observations
from their own mean is always zero. To avoid this we take the deviations ignoring the
signs; i.e., we take the absolute values of the deviations.
If x1, x2, …, xn are the individual observations, then we define the mean deviation as:
$$\text{Mean Deviation (M.D.)} = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|$$
where A is any measure of central tendency.
If f1, f2, …, fn are the corresponding frequencies of the above observations, then the mean
deviation is defined as
$$\text{M.D.} = \frac{1}{N}\sum_{i=1}^{n} f_i\,|x_i - A|, \qquad N = \sum_{i=1}^{n} f_i$$

Advantages
The mean deviation is
o based on all observations and gives weight to items according to their size.
o easily computed and readily understood.
o not affected by the fluctuations of sampling and by extreme values.
o rigidly defined.
Disadvantages
o It is not amenable to further algebraic manipulations because it ignores
signs.
o It is not very accurate.
o Sometimes the mean deviation may not be representative, particularly if
it is calculated from the mode.

Example 4.2:


Find out the mean deviation from the mean and the median for the following data.
Frequency 2 4 6 8 10 12 8
Variable 5 7 9 11 13 15 17

Solution:
Here N = Σfi = 50 and the mean is x̄ = Σfi xi / N = 626/50 = 12.52, which is rounded to 12.5
when computing the deviations below.

xi      fi   C.F.   d1i = |xi − x̄|   fi d1i   d2i = |xi − Me|   fi d2i
5       2    2      7.5              15       8                 16
7       4    6      5.5              22       6                 24
9       6    12     3.5              21       4                 24
11      8    20     1.5              12       2                 16
13      10   30     0.5              5        0                 0
15      12   42     2.5              30       2                 24
17      8    50     4.5              36       4                 32
Total   50                           141                        136

Median (Me) = size of the (N/2)th item
= 25th item
= 13
Mean deviation about the median (Me)
$$= \frac{\sum f_i d_{2i}}{\sum f_i} = \frac{136}{50} = 2.72$$

Mean deviation about the mean $\bar{x}$
$$= \frac{\sum f_i d_{1i}}{\sum f_i} = \frac{141}{50} = 2.82$$
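The two mean deviations of Example 4.2 can be verified with the sketch below. The helper names are illustrative. The code uses the unrounded mean 12.52, which gives 2.816; this agrees with the 2.82 obtained in the example to two decimal places.

```python
from itertools import accumulate

# Values and frequencies from Example 4.2
values = [5, 7, 9, 11, 13, 15, 17]
freqs = [2, 4, 6, 8, 10, 12, 8]

def mean_deviation(values, freqs, centre):
    """Frequency-weighted mean of the absolute deviations from `centre`."""
    return sum(f * abs(x - centre) for x, f in zip(values, freqs)) / sum(freqs)

def median_value(values, freqs):
    """Value of the (N/2)th item: first x whose cumulative frequency reaches N/2."""
    half = sum(freqs) / 2
    for x, cf in zip(values, accumulate(freqs)):
        if cf >= half:
            return x

mean = sum(f * x for x, f in zip(values, freqs)) / sum(freqs)
median = median_value(values, freqs)

md_mean = mean_deviation(values, freqs, mean)
md_median = mean_deviation(values, freqs, median)
print(mean, median, round(md_mean, 3), round(md_median, 2))
```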

Example 4.3:
Calculate the mean deviation of the following data.
Marks 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50
Number of students 10 25 30 20 15

Solution:


Since it is not specified whether the deviations should be taken from the mean,
median or mode, we take them from the mean, which is the usual choice. (Strictly, the
mean deviation is smallest when it is taken about the median.)

Marks      Midpoint (xi)   No. of students (fi)   fi xi   di = |xi − x̄|   fi di
0 – 10     5               10                     50      20.5            205.0
10 – 20    15              25                     375     10.5            262.5
20 – 30    25              30                     750     0.5             15.0
30 – 40    35              20                     700     9.5             190.0
40 – 50    45              15                     675     19.5            292.5
Total                      100                    2550                    965

Arithmetic mean:
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{2550}{100} = 25.5$$

Mean deviation about the mean:
$$\frac{\sum f_i d_i}{\sum f_i} = \frac{965}{100} = 9.65$$

Exercise 4.1
1. From the following frequency distribution of the sale of the tickets, calculate the
mean deviation about the mean and the median.

Sale in shillings Number of Tickets


0 – 2.99 2
3 – 5.99 10
6 – 8.99 26
9 – 11.99 32
12 – 14.99 8
15 – 17.99 2

2. Find the mean deviation of the data below.

Classes 0-6 6-12 12-18 18-24 24-30


Frequency 8 10 12 9 5

3. Find the mean deviation of the following data.


Marks       Number of students
Above 0     30
Above 10    26
Above 20    14
Above 40    10
Above 50    0

4.8 Variance and Standard Deviation


The variance and standard deviation are widely used measures of dispersion.
Definition: The standard deviation is the square root of the arithmetic mean of the
squares of all the deviations measured from the mean of the series. The mean is used
because the sum of squared deviations is smallest when taken about the mean. The standard
deviation is denoted by σ, where
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}, \qquad N = \sum_{i=1}^{n} f_i$$

If the deviations are taken from any other measure of central tendency, such as the mode or
median, the result is called the root mean square deviation.
That is,
$$\text{R.M.S.D.} = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2}$$
where A is either the mode or the median.
The square of the standard deviation, denoted by σ², is called the variance.
The standard deviation has the fewest drawbacks and gives more accurate results than the
other measures of dispersion we have considered. Its method of calculation removes the
drawback of ignoring the algebraic signs of the deviations of the items from the average.
Instead of ignoring the signs, the deviations are squared, which makes every term
non-negative, and the square root of the resulting mean is then taken.


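Properties of the variance and standard deviation

1) $$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i x_i\right)^2$$
Proof:
$$\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 \\
&= \frac{1}{N}\sum_{i=1}^{n} f_i \left(x_i^2 - 2x_i\bar{x} + \bar{x}^2\right) \\
&= \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - 2\bar{x}\,\frac{1}{N}\sum_{i=1}^{n} f_i x_i + \bar{x}^2\,\frac{1}{N}\sum_{i=1}^{n} f_i \\
&= \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - 2\bar{x}^2 + \bar{x}^2 \\
&= \frac{1}{N}\sum_{i=1}^{n} f_i x_i^2 - \left(\frac{1}{N}\sum_{i=1}^{n} f_i x_i\right)^2
\end{aligned}$$
since $\frac{1}{N}\sum f_i x_i = \bar{x}$ and $\frac{1}{N}\sum f_i = 1$.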
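Property 1 can be checked numerically. The sketch below borrows the midpoints and frequencies of Example 4.3 as illustrative data and evaluates the variance both from the definition and from the shortcut formula; the variable names are not from the text.

```python
# Midpoints and frequencies borrowed from Example 4.3 as illustrative data
midpoints = [5, 15, 25, 35, 45]
freqs = [10, 25, 30, 20, 15]

N = sum(freqs)
mean = sum(f * x for x, f in zip(midpoints, freqs)) / N

# Definition: mean of the squared deviations from the mean
var_definition = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / N

# Property 1: mean of the squares minus the square of the mean
var_shortcut = sum(f * x * x for x, f in zip(midpoints, freqs)) / N - mean ** 2

print(var_definition, var_shortcut)
```

The shortcut form needs only the running totals Σf, Σfx and Σfx², which is why it is convenient for hand calculation.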

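2) The variance is unaltered by a change of origin.

Proof:
Suppose we change the origin by defining
$$x'_i = x_i - a, \quad \text{so that} \quad \bar{x}' = \bar{x} - a.$$
Then $x'_i - \bar{x}' = x_i - \bar{x}$ for every i, and the variance of the shifted values is
$$\sigma_{x'}^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x'_i - \bar{x}')^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = \sigma_x^2$$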

3) The standard deviation (and hence the variance) is affected by a change of scale.

Proof:
Let a be the new origin and h the scale factor, so that
$$u_i = \frac{x_i - a}{h} \;\Rightarrow\; x_i = a + hu_i$$
and $\bar{x} = a + h\bar{u}$ (using properties of the arithmetic mean).
Therefore
$$\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2$$
Changing the scale and origin we get
$$\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{n} f_i \left(a + hu_i - (a + h\bar{u})\right)^2 = \frac{1}{N}\sum_{i=1}^{n} f_i h^2 (u_i - \bar{u})^2 = h^2\sigma_u^2$$
$$\Rightarrow \sigma_x = h\sigma_u \quad (h > 0)$$
Thus if the data are coded by dividing by h, the standard deviation of the coded values must
be multiplied by h to recover the standard deviation of the original values.
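Properties 2 and 3 can be illustrated together. In this sketch the data and the choices a = 25 and h = 10 are arbitrary illustrative values, and `variance` is a hypothetical helper implementing the grouped-data definition.

```python
def variance(xs, fs):
    """Grouped-data variance: frequency-weighted mean of squared deviations from the mean."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    return sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n

xs = [5, 15, 25, 35, 45]
fs = [10, 25, 30, 20, 15]
a, h = 25, 10                          # illustrative origin and scale

shifted = [x - a for x in xs]          # change of origin only
coded = [(x - a) / h for x in xs]      # change of origin and scale

var_x = variance(xs, fs)
var_shifted = variance(shifted, fs)    # equals var_x (property 2)
var_u = variance(coded, fs)            # equals var_x / h**2 (property 3)
print(var_x, var_shifted, h ** 2 * var_u)
```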

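4) Let the standard deviations and means of two samples of sizes n1 and n2 be σ1, σ2 and
$\bar{x}_1$, $\bar{x}_2$ respectively. If the two samples are combined to get one sample of size
n = n1 + n2, then the variance of this combined sample, measured from its
combined mean $\bar{x}$, is given by
$$\sigma^2 = \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} + \frac{n_1 n_2 (\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2}$$
Proof:
Define
$$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i}, \quad \bar{x}_2 = \frac{1}{n_2}\sum_{j=1}^{n_2} x_{2j}, \quad \sigma_1^2 = \frac{1}{n_1}\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2, \quad \sigma_2^2 = \frac{1}{n_2}\sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2$$
The combined mean is
$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}, \quad \text{since} \quad \bar{x} = \frac{1}{n_1 + n_2}\left[\sum_{i} x_{1i} + \sum_{j} x_{2j}\right] = \frac{1}{n_1 + n_2}\left[n_1\bar{x}_1 + n_2\bar{x}_2\right]$$
The variance of the combined series is given by
$$\sigma^2 = \frac{1}{n_1 + n_2}\left\{\sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 + \sum_{j=1}^{n_2} (x_{2j} - \bar{x})^2\right\} \tag{4.1}$$

Consider
$$\begin{aligned}
\sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 &= \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1 + \bar{x}_1 - \bar{x})^2 \\
&= \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 + 2(\bar{x}_1 - \bar{x})\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1) + n_1(\bar{x}_1 - \bar{x})^2 \\
&= \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 + n_1(\bar{x}_1 - \bar{x})^2 \\
&= n_1\sigma_1^2 + n_1 d_1^2 \qquad \left(\text{since } \sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1) = 0\right)
\end{aligned} \tag{4.2}$$

Similarly,
$$\sum_{j=1}^{n_2} (x_{2j} - \bar{x})^2 = \sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2 + n_2(\bar{x}_2 - \bar{x})^2 = n_2\sigma_2^2 + n_2 d_2^2 \tag{4.3}$$
where $d_1 = \bar{x}_1 - \bar{x}$ and $d_2 = \bar{x}_2 - \bar{x}$.
Now substituting (4.2) and (4.3) in (4.1) we get
$$\sigma^2 = \frac{1}{n_1 + n_2}\left\{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)\right\} \tag{4.4}$$

Now,
$$d_1 = \bar{x}_1 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{n_2(\bar{x}_1 - \bar{x}_2)}{n_1 + n_2}, \qquad d_2 = \bar{x}_2 - \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{n_1(\bar{x}_2 - \bar{x}_1)}{n_1 + n_2}$$

Substituting d1 and d2 in (4.4) we get
$$\begin{aligned}
\sigma^2 &= \frac{1}{n_1 + n_2}\left\{n_1\sigma_1^2 + n_2\sigma_2^2 + \frac{n_1 n_2^2 (\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2} + \frac{n_2 n_1^2 (\bar{x}_2 - \bar{x}_1)^2}{(n_1 + n_2)^2}\right\} \\
&= \frac{1}{n_1 + n_2}\left\{n_1\sigma_1^2 + n_2\sigma_2^2 + \frac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)^2\right\} \\
&= \frac{n_1\sigma_1^2 + n_2\sigma_2^2}{n_1 + n_2} + \frac{n_1 n_2 (\bar{x}_1 - \bar{x}_2)^2}{(n_1 + n_2)^2}
\end{aligned}$$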
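The combined-variance formula can be checked against a direct calculation. The two small samples below are made-up illustrative data, and `combined_variance` is a hypothetical helper; `mean` and `pvariance` (the population variance, dividing by n) come from Python's standard `statistics` module.

```python
from statistics import mean, pvariance

def combined_variance(n1, mean1, var1, n2, mean2, var2):
    """Variance of two pooled samples from their sizes, means and variances (property 4)."""
    n = n1 + n2
    return (n1 * var1 + n2 * var2) / n + n1 * n2 * (mean1 - mean2) ** 2 / n ** 2

# Two small illustrative samples (not taken from the text)
sample1 = [2, 4, 6]
sample2 = [1, 3, 5, 7, 9]

by_formula = combined_variance(
    len(sample1), mean(sample1), pvariance(sample1),
    len(sample2), mean(sample2), pvariance(sample2),
)
direct = pvariance(sample1 + sample2)   # variance of the pooled data
print(by_formula, direct)
```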

Exercise 4.2
The first of the two samples has 100 items with mean 15 and standard deviation 3. If the
whole group has 250 items with mean 15.6 and variance 13.44, find the standard
deviation of the second group.

4.9 Coefficient of Variation


This is a relative measure of dispersion and is more appropriate than the absolute
measures when distributions are to be compared. In cases where the unit of measurement is
the same for two or more distributions, the standard deviation is seen to be dependent on
the arithmetic mean of the distribution. If we wish to compare the variability of
distributions whose sizes, means and units of measurement differ, then we use the
coefficient of variation. If we have two groups A and B where the coefficient of variation
of group A is greater than that of group B, then we can conclude that group B is more
consistent than group A; i.e., there is less variation in group B than in group A.

Definition: The coefficient of variation is defined as the standard deviation expressed as
a percentage of the arithmetic mean. Thus,
$$\text{Coefficient of variation} = \frac{\sigma}{\bar{x}} \times 100$$

Example 4.4:
Calculate the standard deviation and the coefficient of variation for the following data.
Wages Number of workers
70-80 12
80-90 18
90-100 35
100-110 42
110-120 50
120-130 45
130-140 20
140-150 8

Solution:
Wages      Midpoint (x)   f     u = (x − 105)/10   u²    fu²   fu
70-80      75             12    -3                 9     108   -36
80-90      85             18    -2                 4     72    -36
90-100     95             35    -1                 1     35    -35
100-110    105            42    0                  0     0     0
110-120    115            50    1                  1     50    50
120-130    125            45    2                  4     180   90
130-140    135            20    3                  9     180   60
140-150    145            8     4                  16    128   32
Total                     230                            753   125

That is, Σf = 230, Σfu² = 753 and Σfu = 125.

Note: We have changed the scale and the origin to make the data less bulky.
Thus
$$u = \frac{x - A}{h}$$
where A is a convenient midpoint near the centre of the distribution (here A = 105) and h is
a convenient common divisor (here the class width, h = 10). Since we have
previously shown that the standard deviation is independent of a change of origin but
dependent on the scale, we have

$$\sigma = h \times \sqrt{\frac{\sum fu^2}{N} - \left(\frac{\sum fu}{N}\right)^2} = 10 \times \sqrt{\frac{753}{230} - \left(\frac{125}{230}\right)^2} \approx 17.3$$

$$\text{Mean } \bar{x} = A + h\,\frac{\sum fu}{N} = 105 + \frac{125}{230} \times 10 \approx 110.4$$

$$\text{Coefficient of variation} = \frac{17.3}{110.4} \times 100 \approx 15.67\%$$
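The coding method of Example 4.4 can be reproduced as follows. This is a minimal sketch using the example's midpoints, frequencies, origin A = 105 and divisor h = 10. With unrounded intermediate values the coefficient of variation comes to about 15.63; the example's 15.67 results from using the rounded values 17.3 and 110.4.

```python
import math

# Midpoints and frequencies from Example 4.4
midpoints = [75, 85, 95, 105, 115, 125, 135, 145]
freqs = [12, 18, 35, 42, 50, 45, 20, 8]
A, h = 105, 10                                   # assumed origin and common divisor

u = [(x - A) // h for x in midpoints]            # coded values -3, -2, ..., 4
N = sum(freqs)
sum_fu = sum(f * ui for ui, f in zip(u, freqs))
sum_fu2 = sum(f * ui * ui for ui, f in zip(u, freqs))

mean = A + h * sum_fu / N
sd = h * math.sqrt(sum_fu2 / N - (sum_fu / N) ** 2)
cv = sd / mean * 100                             # coefficient of variation, in percent

print(sum_fu, sum_fu2, round(mean, 2), round(sd, 2), round(cv, 2))
```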

Exercise 4.3
1. The scores of two golfers for 24 rounds were as follows:

Golfer A: 74 75 78 78 72 77 79 78 81 76 72 72
77 74 70 78 79 80 81 74 80 75 71 73
Golfer B: 86 84 80 88 89 85 86 82 82 79 86 80
82 76 86 89 87 83 80 88 86 81 84 87
Find which golfer may be considered to be the more consistent player (less variable).

2. In a certain test for which the pass mark is 30, the distributions of marks of the passing
candidates, classified by sex, are given below.
Marks Boys Girls
30 – 34 5 15
35 – 39 10 20
40 – 44 15 30
45-49 30 20
50 – 54 5 5
55 – 59 5 0

The overall mean and standard deviation of marks for boys, including the thirty boys who
failed, were 38 and 10 respectively. The corresponding mean and standard deviation for
girls, including the 10 who failed, were 35 and 9.
a) Find the mean and standard deviation of the marks of the 30 boys who failed the test.
b) The moderation committee argued that the percentage of passes among girls is
higher because the girls are very studious and that, if the intention is to pass those
students who are really intelligent, a higher pass mark should be used for girls.
Without questioning the propriety of this argument, suggest what the pass mark
should be to allow only 70% of the girls to pass.
c) The prize committee decided to award prizes to the best 40 students irrespective
of sex, judged on the basis of marks obtained in the test. Estimate the number of
girls who would receive a prize.

4.10 Revision Exercises


10. The following grouped frequency distribution shows the breakdown of the
performance of fourth-form students in Makhobe High School in the GCSE
examination of 1998.
Mark        1-20   21-30   31-40   41-50   51-60   61-70   71-80   81-90
Frequency   8      30      90      103     79      64      21      5

Estimate the mean and the standard deviation of these marks for this population.

11. A student visits a shop frequently. On 20 random occasions he recorded the queue
length (the number of people queuing) at the check-out. The results were:
Queue length 4 5 6 7 8 9
Number of visits 2 3 7 6 0 2

By calculation, estimate
i. the mean
ii. the standard deviation
iii. the coefficient of variation
of the queue length at this shop.

12. The table below summarises the weights, to the nearest kilogram, of a random
sample of 40 Highland cattle.
Weight (Kg) 400-449 450-499 500-549 550-599 600-649 650-699
Frequency 1 2 4 6 8 9

Estimate the mean and the standard deviation of the weights for this distribution


13. A distribution consists of three components with frequencies of 200, 250 and 300,
having means of 25, 10 and 15, and standard deviations of 3, 4 and 5 respectively.
Show that the mean of the combined distribution is 16, and its standard deviation
7.2 approximately.

14. The yields of grains (x tonnes) from 500 small plots are grouped in classes with a
common class interval (0.2 tonne) in the table below, the value of x given being
the mid-values of the classes.
x f x f x f x f x f
2.8 4 3.4 47 4.0 88 4.6 35 5.2 4
3.0 15 3.6 63 4.2 69 4.8 10 - -
3.2 20 3.8 78 4.4 59 5.0 8 - -

Show that the standard deviation of the distribution is 0.46.


UNIT 5

MOMENT, SKEWNESS AND KURTOSIS


5.0 Introduction
There are three major features of interest when describing the distribution of a sample or
population:
o its shape
o its central tendency
o its dispersion
This unit addresses the first feature, together with the closely related idea of the peakedness
of a distribution.

5.1 Objectives
By the end of this Unit, you should be able to:
o recognize the following distributions: normal, skewed, platykurtic and
  leptokurtic.
o approximately locate the median (equal-areas point) and the mean
  (balance point) on a distribution.
o know that both the mean and median lie at the centre of a symmetric
  distribution and that the mean moves farther towards the long tail of a
  skewed curve.
o compute and interpret the coefficient of skewness.
o compute and interpret the coefficient of kurtosis.

Learning Style
To achieve what is expected of you…
o Allocate sufficient study time.
o Attempt most of the practice and revision exercises in this Unit.


5.2 What are Moments?


In physics, the term moment refers to the turning (rotating) effect of a force about a point. In
statistics, moments are used to describe the features of a distribution. They play an important
role in comparing distributions and in testing for normality and skewness.
A moment is always taken relative to some reference point. The three reference points
commonly used in statistics are the origin, the mean, and an arbitrary point ‘a’.

If x1, x2, …, xn are the n values assumed by the variable x, we define the rth moment
about a point ‘a’ as

$$\mu'_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - a)^r \quad \text{for individual data, or}$$

$$\mu'_r = \frac{\sum_{i=1}^{n} f_i\,(x_i - a)^r}{\sum_{i=1}^{n} f_i} \quad \text{for grouped data.}$$

If a = 0, then we get the rth moment about the origin; i.e.,

$$\nu_r = \frac{1}{n}\sum_{i=1}^{n} x_i^r \quad \text{for individual data, or}$$

$$\nu_r = \frac{\sum_{i=1}^{n} f_i\,x_i^r}{\sum_{i=1}^{n} f_i} \quad \text{for grouped data.}$$

And if $a = \bar{x}$ we get the central moment, or moment about the mean,

$$\mu_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r \quad \text{for individual data, or}$$

$$\mu_r = \frac{\sum_{i=1}^{n} f_i\,(x_i - \bar{x})^r}{\sum_{i=1}^{n} f_i} \quad \text{for grouped data.}$$

Note that $\mu_1 = 0$ and that

$$\mu_2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

is the variance.
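The definitions above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `moment` and the small data set are assumptions made for the example and do not come from the module.

```python
def moment(values, freqs, r, about=0.0):
    """r-th moment of a frequency distribution about the point `about`.

    For individual (ungrouped) data, pass a frequency of 1 for every value.
    """
    n = sum(freqs)
    return sum(f * (x - about) ** r for x, f in zip(values, freqs)) / n


# Hypothetical illustrative data (not taken from the module).
values = [2, 4, 6, 8]
freqs = [1, 3, 3, 1]

mean = moment(values, freqs, 1)              # first moment about the origin
mu1 = moment(values, freqs, 1, about=mean)   # first central moment
mu2 = moment(values, freqs, 2, about=mean)   # second central moment (variance)

print(mean, mu1, mu2)
```

Setting `about=0` gives the moments about the origin, and setting `about` to the mean gives the central moments, so the first central moment comes out as zero and the second as the variance.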


5.2.1 Relationship between moments about the mean and moments about any other
point
The rth moment about the mean is given by

$$\mu_r = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^r$$

Adding and subtracting ‘a’ inside the parentheses we get

$$\mu_r = \frac{1}{n}\sum_{i=1}^{n}\left[(x_i - a) - (\bar{x} - a)\right]^r = \frac{1}{n}\sum_{i=1}^{n}(v_i - d)^r, \quad \text{where } v_i = x_i - a,\; d = \bar{x} - a$$

Note that $\frac{1}{n}\sum v_i^k = \mu'_k$ and that $d = \frac{1}{n}\sum (x_i - a) = \mu'_1$.
We can use the binomial expansion to expand $(v_i - d)^r$:

$$(v_i - d)^r = \binom{r}{0}v_i^r - \binom{r}{1}d\,v_i^{r-1} + \binom{r}{2}d^2 v_i^{r-2} - \binom{r}{3}d^3 v_i^{r-3} + \cdots + (-1)^r\binom{r}{r}d^r$$

Using this expansion we expand $\mu_r$ to get

$$\mu_r = \frac{1}{n}\sum_{i=1}^{n} v_i^r - \binom{r}{1}d\,\frac{1}{n}\sum_{i=1}^{n} v_i^{r-1} + \binom{r}{2}d^2\,\frac{1}{n}\sum_{i=1}^{n} v_i^{r-2} - \cdots + (-d)^r$$

$$= \mu'_r - \binom{r}{1}\mu'_{r-1}\,d + \binom{r}{2}\mu'_{r-2}\,d^2 - \cdots + (-1)^{r-1}\binom{r}{r-1}\mu'_1\,d^{r-1} + (-d)^r$$

In particular,
$\mu_1 = 0$; i.e., the sum of deviations about the mean is zero.

$$\mu_2 = \mu'_2 - 2d\mu'_1 + d^2\mu'_0$$

$$= \mu'_2 - 2(\bar{x} - a)\mu'_1 + (\bar{x} - a)^2\mu'_0$$

$$= \mu'_2 - 2(\mu'_1)^2 + (\mu'_1)^2 \quad \text{since } \bar{x} - a = \mu'_1 \text{ and } \mu'_0 = 1$$

$$= \mu'_2 - (\mu'_1)^2$$

Similarly we can get the relationship for the higher moments, i.e.

$$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3$$

$$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4$$

Exercise 5.1
Show that

$$\mu'_2 = \mu_2 + (\mu'_1)^2$$

$$\mu'_3 = \mu_3 + 3\mu_2\mu'_1 + (\mu'_1)^3$$

$$\mu'_4 = \mu_4 + 4\mu_3\mu'_1 + 6\mu_2(\mu'_1)^2 + (\mu'_1)^4$$

where $\mu'_1 = \bar{x} - A = \frac{1}{n}\sum_{i=1}^{n}(x_i - A)$.

Hint:

$$\mu'_r = \frac{1}{n}\sum (x_i - A)^r = \frac{1}{n}\sum (x_i - \bar{x} + \bar{x} - A)^r = \frac{1}{n}\sum (z_i + \mu'_1)^r$$

where $z_i = x_i - \bar{x}$ and $\mu'_1 = \bar{x} - A$.

5.3 Sheppard’s Correction to Moments of Grouped Distribution
In computing the arithmetic mean, standard deviation, etc. for a grouped series we use the
midpoints of the class intervals to represent the classes. In doing so we assume that the
items in each class are concentrated around its midpoint. This assumption is reasonable only
when the number of observations is large. Not every data set to be analyzed is large, so the
assumption cannot be taken as valid in every case. In the case of the arithmetic mean, the
errors on either side of the mean tend to cancel each other, so the result is accurate. We
thus do not make corrections to the first moment, which is the mean. In the case of the second
moment, the errors on both sides of the mean are positive after squaring; hence, the
cancelling effect is not there. We therefore make corrections to the rth moment only if r is
even. Sheppard used the Euler-Maclaurin formula to evaluate these corrections for the
different moments:

$$\mu_1^* \text{ (corrected)} = \mu_1$$

$$\mu_2^* \text{ (corrected)} = \mu_2 - \frac{h^2}{12}$$

$$\mu_3^* \text{ (corrected)} = \mu_3$$

$$\mu_4^* \text{ (corrected)} = \mu_4 - \frac{h^2}{2}\mu_2 + \frac{7}{240}h^4$$

where h is the width of the class interval.
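As a rough illustration, the corrections can be wrapped in a small Python helper. The function name `sheppard_corrected` and the input values are assumptions chosen for the example; only the correction terms themselves (h²/12 for the second moment, and (h²/2)µ2 and (7/240)h⁴ for the fourth) are the standard Sheppard corrections.

```python
def sheppard_corrected(mu2, mu3, mu4, h):
    """Apply Sheppard's corrections to central moments of grouped data.

    h is the common class width. Only the even moments change.
    """
    mu2_c = mu2 - h ** 2 / 12
    mu3_c = mu3  # odd moments are left uncorrected
    mu4_c = mu4 - (h ** 2 / 2) * mu2 + (7 / 240) * h ** 4
    return mu2_c, mu3_c, mu4_c


# Hypothetical central moments for a distribution with class width 10.
mu2_c, mu3_c, mu4_c = sheppard_corrected(mu2=145.0, mu3=-300.0, mu4=54625.0, h=10)
print(round(mu2_c, 4), mu3_c, round(mu4_c, 4))
```

Note that the uncorrected µ2 is used inside the correction for µ4, as in the formula.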

Conditions for Applying Sheppard Corrections


Sheppard’s corrections are to be applied only when certain conditions are fulfilled. These
conditions are;
o The frequency distribution should be continuous
o Corrections should not be applied when the total frequency is very large.
Example 5.1:
From the data given below, calculate the first four moments about the actual
mean.

x 1 2 3 4 5
Frequency 5 2 5 4 4
Solution:
x     f    d=x-4    d2     d3     d4     fd     fd2     fd3     fd4
1     5     -3       9    -27     81    -15      45    -135     405
2     2     -2       4     -8     16     -4       8     -16      32
3     5     -1       1     -1      1     -5       5      -5       5
4     4      0       0      0      0      0       0       0       0
5     4      1       1      1      1      4       4       4       4

From the table above

∑f = 20, ∑fd = −20, ∑fd2 = 62, ∑fd3 = −152, ∑fd4 = 446.


We first calculate the moments about the point 4, with N = ∑f = 20.

First moment: $\mu'_1 = \frac{1}{N}\sum fd = -20/20 = -1$

Second moment: $\mu'_2 = \frac{1}{N}\sum fd^2 = 62/20 = 3.1$

Third moment: $\mu'_3 = \frac{1}{N}\sum fd^3 = -152/20 = -7.6$

Fourth moment: $\mu'_4 = \frac{1}{N}\sum fd^4 = 446/20 = 22.3$

We use the relationship between the moments about a point and the central moments to
calculate the central moments.

$$\mu_1 = 0$$

$$\mu_2 = \mu'_2 - (\mu'_1)^2 = 3.1 - 1 = 2.1$$

$$\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2(\mu'_1)^3 = -7.6 - 3(3.1)(-1) + 2(-1)^3 = -0.3$$

$$\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2(\mu'_1)^2 - 3(\mu'_1)^4 = 22.3 - 4(-7.6)(-1) + 6(3.1)(-1)^2 - 3(-1)^4 = 22.3 - 30.4 + 18.6 - 3 = 7.5$$
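The conversion from moments about an arbitrary point to central moments can be checked numerically. The sketch below applies the general binomial relationship to the data of Example 5.1, taking a = 4; the function names `raw_moments` and `central_from_raw` are illustrative choices, not part of the module.

```python
from math import comb


def raw_moments(values, freqs, a, max_r=4):
    """Moments of order 0..max_r about the point a for a frequency table."""
    n = sum(freqs)
    return [sum(f * (x - a) ** r for x, f in zip(values, freqs)) / n
            for r in range(max_r + 1)]


def central_from_raw(raw):
    """Convert moments about a point into central moments.

    Uses mu_r = sum_k C(r, k) * (-d)^k * mu'_{r-k}, where d = mu'_1.
    """
    d = raw[1]
    return [sum(comb(r, k) * (-d) ** k * raw[r - k] for k in range(r + 1))
            for r in range(len(raw))]


# Data of Example 5.1, with moments taken about the point a = 4.
values = [1, 2, 3, 4, 5]
freqs = [5, 2, 5, 4, 4]

raw = raw_moments(values, freqs, a=4)
central = central_from_raw(raw)
print([round(m, 4) for m in raw])
print([round(m, 4) for m in central])
```

For this data the moments about 4 are −1, 3.1, −7.6 and 22.3, and the central moments come out as 0, 2.1, −0.3 and 7.5. The fourth central moment is an average of fourth powers, so it cannot be negative; this is a quick sanity check on any hand calculation.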


Exercise 5.2
Calculate the first four moments about the mean and derive the corresponding moments
about the median using the data below.
Marks      Number of students
 5-15        5
15-25       20
25-35       15
35-45       45
45-55       10
55-65        5

5.4 Skewness
Measures of central tendency give information about the average, and measures of
dispersion give information about the degree of variation that exists in the data. These
measures do not indicate whether the values are distributed symmetrically on either side of
the measure of central tendency. Two series may have the same mean
and standard deviation and yet differ in the symmetry of their distributions. The
symmetry of a distribution is studied by measures of skewness. Skewness means
lopsidedness or lack of symmetry in a frequency distribution. It should be noted that
skewness relates to the shape of a frequency distribution and not to its size. When a
unimodal distribution is symmetrical, as the normal curve is, the mean, median
and mode are equal.

5.4.1 Symmetrical distribution


When a distribution is symmetrical, its mean, median and mode coincide at the same
point as shown in the figure below. The two tails of the curve are mirror images of each
other about the centre.


[Figure: a symmetrical curve with the mean, mode and median marked at the centre]
Fig. 5.1: A symmetrical distribution

5.4.2 Positively skewed distribution


When symmetry is absent, we call the distribution skewed. The skewness is said to be in
the direction of the excess tail. Excess tail can be on the right or left side. If the excess
tail is on the right hand side of the distribution, then the distribution is said to be
positively skewed. In this case, Mode<Median<Arithmetic mean

[Figure: frequency plot with the long tail on the right-hand side]
Fig. 5.2: A positively skewed distribution

5.4.3 Negatively skewed distribution


If the excess tail is on the left hand side then the distribution is said to be negatively
skewed. In this case, Arithmetic mean< Median <Mode.

[Figure: frequency plot with the long tail on the left-hand side]
Fig. 5.3: A negatively skewed distribution

5.4.4 Measures of Skewness


Several measures have been devised for measuring skewness. These measures tell us
about the direction and the extent of lopsidedness or asymmetry in the series. They also
allow us to compare two or more series.

5.4.4.1 Karl Pearson’s measure of Skewness


This measure is based on the fact that in a skewed or asymmetrical distribution the values
of the arithmetic mean, median and mode are not the same. This being so, the difference
between any two of these measures of central tendency will show the extent of skewness
in a series. A relative measure known as the coefficient of skewness is used. This is
defined as follows;

$$\text{Coefficient of Skewness (C.S)} = \frac{\text{Mean} - \text{Mode}}{\text{Standard Deviation}} = \frac{\bar{X} - M_o}{\sigma}$$

For moderately skewed distributions the empirical relation Mode = 3 Median − 2 Mean holds
approximately, so that

$$\text{C.S} = \frac{\text{Mean} - (3\,\text{Median} - 2\,\text{Mean})}{\sigma} = \frac{3(\text{Mean} - \text{Median})}{\sigma}$$


Therefore if,
Coefficient of skewness = 0, then the distribution is symmetrical.
Coefficient of skewness > 0, then the distribution is positively skewed.
Coefficient of skewness < 0, then the distribution is negatively skewed.

5.4.4.2 Bowley’s Measure of Skewness


This measure is based on the principle that in a symmetrical distribution the distance from
the first quartile to the median is the same as that from the median to the third quartile;
i.e. Q3 − Q2 = Q2 − Q1.
Using this property we define

$$\text{Bowley's coefficient of skewness} = \frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2Q_2}{Q_3 - Q_1}$$

The coefficient = 0 for a symmetrical distribution,
> 0 for a positively skewed distribution,
< 0 for a negatively skewed distribution.

5.4.4.3 Coefficient of Skewness based on Moments


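Karl Pearson developed a method for the measurement of the coefficient of skewness
based on moments. If we denote this coefficient by β1, then

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$

β1 = 0 for a symmetrical distribution.
Because µ3 is squared, β1 is never negative. It measures the degree of skewness; the
direction of skewness is given by the sign of µ3:
µ3 > 0 for a positively skewed distribution.
µ3 < 0 for a negatively skewed distribution.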

Example 5.2:


Compute the coefficient of skewness for the given data. What can you say about the
symmetry of the distribution?
Wages       Number of persons
 0 – 5         3
 5 – 10       10
10 – 15       16
15 – 20       25
20 – 25       16
25 – 30       10
30 – 35        3

Solution:
Wages (k)   Midpoint (x)   Frequency (f)   d = (x − 17.5)/5    fd    fd2
 0-5           2.5              3              -3              -9     27
 5-10          7.5             10              -2             -20     40
10-15         12.5             16              -1             -16     16
15-20         17.5             25               0               0      0
20-25         22.5             16               1              16     16
25-30         27.5             10               2              20     40
30-35         32.5              3               3               9     27

Using the table above and the interpolation formulae we can compute
Mean = Median = 17.5; Q1 = 12.42; Q3 = 22.58

$$\text{Bowley's coefficient} = \frac{22.58 + 12.42 - 2(17.5)}{22.58 - 12.42} = 0$$

$$\text{Karl Pearson coefficient of skewness} = \frac{3(17.5 - 17.5)}{\sigma} = 0$$

Since the coefficient of skewness is zero, we can conclude that the distribution is
symmetrical.
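The quartiles in this example come from linear interpolation within the class that contains the required cumulative frequency. A minimal Python sketch of that calculation is given below. It assumes classes of equal width and uses the class list from the solution table (including the 30-35 class with frequency 3); the function name `grouped_quantile` is an illustrative choice.

```python
def grouped_quantile(lower_bounds, width, freqs, p):
    """Quantile at proportion p (0 < p < 1) of a grouped distribution.

    Interpolates linearly inside the class that contains the p*N-th item.
    """
    n = sum(freqs)
    target = p * n
    cum = 0
    for lower, f in zip(lower_bounds, freqs):
        if cum + f >= target:
            return lower + (target - cum) / f * width
        cum += f
    raise ValueError("p must lie strictly between 0 and 1")


# Classes 0-5, 5-10, ..., 30-35 as in the solution table of Example 5.2.
lower_bounds = [0, 5, 10, 15, 20, 25, 30]
freqs = [3, 10, 16, 25, 16, 10, 3]

q1 = grouped_quantile(lower_bounds, 5, freqs, 0.25)
q2 = grouped_quantile(lower_bounds, 5, freqs, 0.50)
q3 = grouped_quantile(lower_bounds, 5, freqs, 0.75)
bowley = (q3 + q1 - 2 * q2) / (q3 - q1)

print(round(q1, 2), round(q2, 2), round(q3, 2), bowley)
```

With these frequencies the sketch gives Q1 ≈ 12.42, median = 17.5 and Q3 ≈ 22.58, so Bowley's coefficient is zero, as expected for a symmetrical table.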

Exercise 5.3
1. Use the moment method to confirm the result in the above example.
2. Calculate the coefficient of skewness of the two groups given below. Which of
the two is more skewed?

Marks      55-58   58-61   61-64   64-67   67-70
Group A     12      17      23      18      11
Group B     20      22      25      13       7

5.5 Kurtosis
Kurtosis means “bulginess” in Greek but in statistics it refers to the degree
of peakedness in the region about the mode of a frequency curve. The
peakedness of a distribution is another characteristic, which can be
measured. Kurtosis measures the degree of peakedness of a distribution with
reference to the normal distribution. There are three types of kurtosis. These
types depend on the structure and the magnitude of the frequency
distribution and also on the peakedness of the curves. The three types are
leptokurtic, mesokurtic and platykurtic distributions.

5.5.1 Leptokurtic distribution


A distribution is said to be leptokurtic when its peak is higher than that of the normal
distribution. In such a distribution the items are more closely concentrated around the
mode.

5.5.2 Mesokurtic distribution


A distribution is said to be mesokurtic if it is identical to the normal distribution. That is a
normal distribution is a mesokurtic distribution.

5.5.3 Platykurtic distribution


A distribution is said to be platykurtic if its peak is lower (flatter) than that of the normal
distribution.

[Figure: leptokurtic, normal (mesokurtic) and platykurtic curves]
Fig. 5.4. The different forms of kurtosis

5.5.4 Measures of Kurtosis


1. The fourth moment about the mean is mainly used for the measurement of kurtosis. This
measure, denoted by β2, is given by

$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

If β2 = 3 the curve is mesokurtic (normal)

β2 > 3 the curve is leptokurtic.

β2 < 3 the curve is platykurtic.

2. An alternative measure of kurtosis, denoted by γ2, is obtained from β2 by subtracting
3; that is, γ2 = β2 − 3

Then, if γ2 = 0, the curve is normal

γ2 > 0, the curve is leptokurtic

γ2 < 0, the curve is platykurtic.


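3. Kurtosis can also be measured in terms of quartiles and percentiles. If we denote this
measure by k then,

$$k = \frac{0.5\,(Q_3 - Q_1)}{P_{90} - P_{10}}$$

For a normal distribution, k = 0.263;
if k < 0.263 the distribution is leptokurtic;
if k > 0.263 the distribution is platykurtic.
A leptokurtic distribution has relatively long tails, which widens P90 − P10 relative to the
interquartile range and so lowers k.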

The study of kurtosis is useful as it points at the nature of the distribution of the items in
the middle of a series and helps in the choice of appropriate averages. Thus when the
distribution is mesokurtic the arithmetic mean is the most appropriate, the median is more
appropriate for leptokurtic distributions, and for platykurtic distributions the quartiles are
more suitable.

Example 5.3:
Using the data given below comment on the symmetry and the peakedness of the
distribution.

Marks Number of students


5-15 5
15-25 20
25-35 15
35-45 45
45-55 10
55-65 5

Solution:
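We first calculate the first four moments about the mean. These are
µ1 = 0, µ2 = 145, µ3 = −300, µ4 = 54625.

The measures of symmetry (skewness) and peakedness (kurtosis) are

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-300)^2}{(145)^3} = 0.0295, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{54625}{21025} = 2.598$$

Since µ3 < 0 and β1 is close to zero, the distribution is slightly negatively skewed; since
β2 < 3, it is platykurtic.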
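The moments used in this example can be reproduced directly from the class midpoints. The following Python sketch does so; the helper name `central_moment` is an illustrative choice, and the midpoints 10, 20, …, 60 are taken from the class intervals 5-15, …, 55-65.

```python
def central_moment(mids, freqs, r):
    """r-th moment about the mean for grouped data, using class midpoints."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(mids, freqs)) / n


# Data of Example 5.3: class midpoints and number of students.
mids = [10, 20, 30, 40, 50, 60]
freqs = [5, 20, 15, 45, 10, 5]

mu2, mu3, mu4 = (central_moment(mids, freqs, r) for r in (2, 3, 4))
beta1 = mu3 ** 2 / mu2 ** 3
beta2 = mu4 / mu2 ** 2

print(mu2, mu3, mu4)
print(round(beta1, 4), round(beta2, 3))
```

The sign of `mu3` (here negative) gives the direction of the skewness, while `beta1` only measures its size because the third moment is squared.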

Exercise 5.4
Analyze the peakedness and the skewness of the following distribution.


Marks       50-60   60-70   70-80   80-90   90-100   100-110   110-120
Frequency     8      10      16      14      10         5         2

5.6 Revision Exercises


15. (i) Compute the coefficient of skewness for the following distribution, which
shows the ages of HIV/AIDS patients in a hospital ward.
58 39 30 48 27 16 56 56 65 63
(ii) Comment on the skewness of this distribution.
16. The number of birds visiting a bird table was counted each minute for 15 minutes.
The distribution was as follows:
9 4 7 5 3 8 6 5 8 10 5 6 5 8 10
Find the mode, mean and median of these data and use
them to comment on the skewness of this distribution.

17. Over a period of years, 570 students were examined in SMA 102-Basic
Mathematics at the end of the semester examinations of Kenyatta University. The
marks gained by students ranged from 0 to 99, all being integers. These were
grouped in 20 classes, with a class interval of 5, the class frequencies being as
shown in the table below.
Interval f


0-4 12
5-9 13
10-14 13
15-19 14
20-24 23
25-29 23
30-34 29
35-39 34
40-44 44
45-49 44
50-54 50
55-59 52
60-64 61
65-69 41
70-74 32
75-79 27
80-84 23
85-89 17
90-94 13
95-99 5

(i) Compute the coefficient of skewness based on moments, β1, for this
distribution.
(ii) Interpret your results.


UNIT 6

CORRELATION ANALYSIS
6.0 Introduction
Two things correlate when they vary together. For example, we expect land values to fall
with distance from the city centre. In this unit, we address the question of the degree and
direction of the relationship that exists between two variables.

6.1 Objectives
By the end of this unit you should be able to:
o know the meaning of bivariate data.
o make a scatterplot to display the relationship between two quantitative
  variables.
o describe the form, direction and strength of the overall pattern of a
  scatterplot. In particular, recognize positive or negative association and
  linear (straight-line) patterns. Recognize outliers in a scatterplot.
o know what correlation is, why correlation analysis is performed and
  how to find (using appropriate formulas) the correlation coefficient.
o describe the information provided by a correlation coefficient.

6.2 What is Correlation?


The theory of correlation is concerned with the study of the relationship between two or
more variables, whereas measures of central tendency and dispersion are concerned with
the variation in a single variable. For example, we may be interested in finding a
relationship between the prices of foodstuffs and rainfall, or between changes in wages and
standards of living. It is sometimes observed that a tall man tends to choose a tall wife and
a short man a short wife. An observation of this kind suggests some sort of relationship
between body structure and the choice of a partner. Such relationships are studied
using correlation analysis.
In short, correlation indicates the quantitative association between variables. For example,
a fall in the price of a commodity may be accompanied by a rise in demand; thus demand and
price move in opposite directions. Similarly, the price of a commodity and its supply move in
the same direction. Correlation is therefore used to measure the degree and direction of the
relationship between variables.
There are three methods of studying correlation;
(a) Scatter diagram
(b) Karl Pearson’s product-moment correlation coefficient.
(c) Spearman’s rank correlation coefficient.

6.3 Scatter Diagram


This method consists of plotting the pairs of data points (x1, y1), (x2, y2), …, (xn, yn) on
graph paper and then using the plot to study the relationship between x and y. Some authors
refer to the scatter diagram as a dot diagram. It is a simple but rough method of
studying correlation. The points are plotted on a graph using
convenient scales for the two series. The plotted points tend to concentrate in a band
whose width depends on the degree of correlation: the narrower the band, the stronger the
correlation. A line of best fit is drawn freehand; its direction reveals the nature of the
correlation between the variables, and the closeness of the points to the line indicates its
degree.

(i) If the fitted line goes upward and this upward movement is from left to right then
the x and y variables are positively correlated.

(ii) If the fitted line moves downward and its direction is from left to right then the x
and y variables are negatively correlated.

(iii) If the plotted points are scattered haphazardly such that we cannot fit an
appropriate line then this indicates that there is no correlation between x and y
variables.


[Figure: three scatter plots, panels (i), (ii) and (iii)]

Fig. 6.1: (i) Positive correlation, (ii) Negative correlation and (iii) Zero correlation

Since scatter diagrams are rough, they can only show whether the correlation is
negative or positive; they cannot give the magnitude of the relationship.

6.4 Karl Pearson’s Product - Moment Correlation Coefficient


The scatter plot method of correlation analysis does not give us a numerical measure of
the degree of correlation between two variables. This difficulty is overcome by using the
mathematical formula devised by Karl Pearson, commonly known as the product-moment
method. It is so called because it is based on the mean of the products of the deviations of
the two series from their respective means. This method is the most commonly used because it
gives a fairly accurate measure of the correlation existing between two variables. It
expresses the degree of correlation in numerical terms; its value lies between −1 and +1.
Karl Pearson’s coefficient of correlation is the arithmetic mean of the products of the
deviations of each pair of items from their respective means, divided by the product of their
standard deviations. The coefficient of correlation is denoted by r and is equal to

$$r = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]\left[\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2\right]}}$$

$$= \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{\sqrt{\left[n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2\right]\left[n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2\right]}}$$

If we denote $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) = \mu_{11}$ (this is known as the covariance), then we can
write the coefficient of correlation as

$$r = \frac{\mu_{11}}{\sigma_x \sigma_y}$$

where σx and σy are the standard deviations of x and y respectively.

Assumptions of the Pearson’s Coefficient of Correlation


Karl Pearson’s coefficient of correlation is based on certain assumptions. These are;
o that there exists a linear relationship between the two variables.
o that there exists a causal relationship between the two variables.

Properties of the Pearson’s Coefficient of Correlation


1. The coefficient of correlation is the geometric mean of the regression
coefficients.
Proof:
Let

$$r = \frac{\mu_{11}}{\sigma_x \sigma_y} \qquad (6.1)$$

We define the regression coefficient of y on x as

$$b_{yx} = \frac{\mu_{11}}{\sigma_x^2} \qquad (6.2)$$

and the regression coefficient of x on y as

$$b_{xy} = \frac{\mu_{11}}{\sigma_y^2} \qquad (6.3)$$

From equation (6.1) we get

$$r\sigma_x\sigma_y = \mu_{11} \qquad (6.4)$$

Substituting (6.4) in (6.2) we get

$$b_{yx} = \frac{r\sigma_x\sigma_y}{\sigma_x^2} = \frac{r\sigma_y}{\sigma_x} \qquad (6.5)$$

Similarly, substituting (6.4) in (6.3) we get

$$b_{xy} = \frac{r\sigma_x\sigma_y}{\sigma_y^2} = \frac{r\sigma_x}{\sigma_y} \qquad (6.6)$$

Multiplying (6.5) and (6.6) we get

$$b_{xy} \cdot b_{yx} = \frac{r\sigma_x}{\sigma_y} \cdot \frac{r\sigma_y}{\sigma_x} = r^2$$

$$\Rightarrow r = \pm\sqrt{b_{xy} \cdot b_{yx}}$$

2. The coefficient of correlation lies between −1 and +1; that is, r ∈ [−1, +1].

Proof:
Now

$$r = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right]\left[\frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2\right]}}$$

Squaring both sides we get

$$r^2 = \frac{\left[\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\right]^2}{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n}(y_i - \bar{y})^2\right]}$$

The Cauchy-Schwarz inequality states that

$$\left(\sum_{i=1}^{n} a_i b_i\right)^2 \le \left(\sum_{i=1}^{n} a_i^2\right)\left(\sum_{i=1}^{n} b_i^2\right)$$

$$\Rightarrow \frac{\left(\sum_{i=1}^{n} a_i b_i\right)^2}{\left(\sum_{i=1}^{n} a_i^2\right)\left(\sum_{i=1}^{n} b_i^2\right)} \le 1 \qquad (6.7)$$

Define $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$.
Substituting these values of $a_i$ and $b_i$ in (6.7), we find that

$$r^2 = \frac{\left[\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\right]^2}{\left[\sum_{i=1}^{n}(x_i - \bar{x})^2\right]\left[\sum_{i=1}^{n}(y_i - \bar{y})^2\right]} \le 1$$

⇒ r² ≤ 1
Hence r ∈ [−1, +1]

The magnitude of the correlation coefficient is never negative. The negative or positive
sign associated with it indicates the direction; i.e., if we have two correlation coefficients
r1 = 0.9 and r2 = −0.9, then these two coefficients have equal magnitude but opposite
directions.

Example 6.1:
Find the Pearson correlation coefficient for the following data;
x 1 3 4 6 8 9 11 14
y 1 2 4 4 5 7 8 9

Solution:
x     y     xy     x²     y²
1     1      1      1      1
3     2      6      9      4
4     4     16     16     16
6     4     24     36     16
8     5     40     64     25
9     7     63     81     49
11    8     88    121     64
14    9    126    196     81

From the above table ∑y = 40, ∑x = 56, ∑xy = 364, ∑x² = 524 and ∑y² = 256.

Now

$$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$$

$$= \frac{8 \times 364 - (56 \times 40)}{\sqrt{\left[8 \times 524 - (56)^2\right]\left[8 \times 256 - (40)^2\right]}} = \frac{672}{\sqrt{1056 \times 448}} = 0.977$$

Interpretation: r = 0.977 indicates that the variables x and y are highly
positively correlated.
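The computational form of the formula translates directly into code. The Python sketch below recomputes the coefficient for the data of Example 6.1; the function name `pearson_r` is an illustrative choice rather than anything defined in the module.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation coefficient from the raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


# Data of Example 6.1.
xs = [1, 3, 4, 6, 8, 9, 11, 14]
ys = [1, 2, 4, 4, 5, 7, 8, 9]

r = pearson_r(xs, ys)
print(round(r, 3))
```

For this data the intermediate sums are ∑x = 56, ∑y = 40, ∑xy = 364, ∑x² = 524 and ∑y² = 256, and the coefficient rounds to 0.977.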

Exercise 6.1
1. If the mean of y is $\bar{y} = 10$, find the missing value of y and then calculate the product-moment
correlation coefficient using the data given below and interpret your result.
X 10 15 20 25 30 35 40 45
Y 7 9 15 ? 4 5 7 3

2. For each of the following data sets, plot a scatter diagram, and then calculate the
product - moment correlation coefficient:

X 3.2 4.4 4.6 3.4 3.2 4.2 5.2 3.4 3.0 3.2
Y 10.0 6.4 5.4 4.2 8.2 5.4 5.8 9.2 7.0 8.8

X 2 4.6 7.2 4.2 9.6 6.2 8 1


Y 12.4 18.2 33.6 14 38.6 24.8 32 7.6

X 1 1.5 2 2 2.5 2.5 3


Y 3.5 3 2.5 2 1.5 1 0.5

6.5 Bivariate Frequency Distribution


In the case of grouped data where the frequencies fx and fy of the variables x and y
respectively are given, the Karl Pearson formula for computing the correlation
coefficient is given by

$$r = \frac{N\sum f\,x_i y_i - \left(\sum f_x x_i\right)\left(\sum f_y y_i\right)}{\sqrt{\left[N\sum f_x x_i^2 - \left(\sum f_x x_i\right)^2\right]\left[N\sum f_y y_i^2 - \left(\sum f_y y_i\right)^2\right]}}$$

where $N = \sum f = \sum f_x = \sum f_y$ is the total frequency, f is the frequency of the cell
with values $(x_i, y_i)$, and the sum $\sum f\,x_i y_i$ is taken over all cells of the table.


Example 6.2:
Given the following grouped data of the performances of students in both statistics and
calculus, investigate whether there is any relationship between the marks obtained in
calculus and those obtained in statistics. Comment on the result.

                              Statistics
Calculus    40-49   50-59   60-69   70-79   80-89   90-99
90-99         -       -       -       2       4       4
80-89         -       -       1       4       6       5
70-79         -       -       5      10       8       1
60-69         1       4       9       5       2       -
50-59         3       6       6       2       -       -
40-49         3       5       4       -       -       -

Solution:
Let x and y denote the class midpoints for statistics and calculus respectively, and let
Ux = (x − 64.5)/10 and Uy = (y − 74.5)/10.
Then,

                 x     44.5  54.5  64.5  74.5  84.5  94.5
                 Ux     -2    -1     0     1     2     3     fy   fyUy  fyUy²  fUxUy
   y      Uy
  94.5     2             -     -     -     2     4     4     10    20     40     44
  84.5     1             -     -     1     4     6     5     16    16     16     31
  74.5     0             -     -     5    10     8     1     24     0      0      0
  64.5    -1             1     4     9     5     2     -     21   -21     21     -3
  54.5    -2             3     6     6     2     -     -     17   -34     68     20
  44.5    -3             3     5     4     -     -     -     12   -36    108     33
  fx                     7    15    25    23    20    10    100   -55    253    125
  fxUx                 -14   -15     0    23    40    30     64
  fxUx²                 28    15     0    23    80    90    236
  fUxUy                 32    31     0    -1    24    39    125

Substituting these values in the formula we get,


$$r = \frac{100 \times 125 - 64 \times (-55)}{\sqrt{\left[100 \times 236 - (64)^2\right]\left[100 \times 253 - (-55)^2\right]}} = 0.7686$$

Since the correlation coefficient is positive and fairly large, we conclude that performance
in statistics and calculus is positively correlated. Thus a student who performs poorly in
statistics tends to perform poorly in calculus as well.
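The sums in the working table can be recomputed from the cell frequencies alone. The Python sketch below does this for Example 6.2 using the coded values Ux and Uy; the variable names are illustrative, and empty cells in the table are entered as 0.

```python
from math import sqrt

# Cell frequencies of Example 6.2. Rows are calculus classes (Uy = 2 down to -3);
# columns are statistics classes (Ux = -2 up to 3). Empty cells are entered as 0.
table = [
    [0, 0, 0, 2, 4, 4],
    [0, 0, 1, 4, 6, 5],
    [0, 0, 5, 10, 8, 1],
    [1, 4, 9, 5, 2, 0],
    [3, 6, 6, 2, 0, 0],
    [3, 5, 4, 0, 0, 0],
]
ux = [-2, -1, 0, 1, 2, 3]
uy = [2, 1, 0, -1, -2, -3]

n = sum(sum(row) for row in table)
fx = [sum(row[j] for row in table) for j in range(len(ux))]  # column totals
fy = [sum(row) for row in table]                             # row totals

s_x = sum(f * u for f, u in zip(fx, ux))
s_y = sum(f * u for f, u in zip(fy, uy))
s_xx = sum(f * u * u for f, u in zip(fx, ux))
s_yy = sum(f * u * u for f, u in zip(fy, uy))
s_xy = sum(table[i][j] * uy[i] * ux[j]
           for i in range(len(uy)) for j in range(len(ux)))

r = (n * s_xy - s_x * s_y) / sqrt((n * s_xx - s_x ** 2) * (n * s_yy - s_y ** 2))
print(n, s_x, s_y, s_xx, s_yy, s_xy, round(r, 4))
```

Because r is unchanged by a change of origin and scale, working with the coded values Ux and Uy gives the same coefficient as working with the midpoints themselves.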

Exercise 6.2
The following table gives the ages of husbands and wives living together on
the census night of 1961. Investigate whether age counts as a factor in the
choice of partner.

Age of Husbands
Age of Wives 25 - 35 35 - 45 45 - 55 55 - 65 65 - 75
20 - 30 5 9 3 - -
30 - 40 - 10 25 2 -
40 - 50 - 1 12 2 -
50 - 60 - - 4 16 5
60 - 70 - - - 4 2

6.6 Spearman’s Rank Correlation


Spearman’s rank correlation is used to measure correlation where a
quantitative or numerical expression of the data to be analyzed is not
possible but the data can be arranged in a serial order, for example data
relating to ability, honesty, beauty, etc. This serial order is known as the
rank.
Let us suppose that a group of n individuals is arranged in order of merit or
proficiency in two characteristics A and B. The ranks in the two
characteristics will in general be different. For example, if we consider two
characteristics of an individual, intelligence and beauty, it does not
necessarily follow that a beautiful lady will also be intelligent.

Theorem
Assuming that no two individuals are bracketed equal in either classification, so that each
of the variables X and Y takes the values 1, 2, …, n, prove that Spearman’s rank correlation
coefficient, denoted by r, is given by

$$r = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}, \quad \text{where } d_i = x_i - y_i$$

Proof:
Now

$$\bar{x} = \bar{y} = \frac{1}{n}(1 + 2 + 3 + \cdots + n) = \frac{n + 1}{2}$$

The variances of the ranks $x_i$ and $y_i$ are

$$\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 = \frac{1}{n}\left(1^2 + 2^2 + \cdots + n^2\right) - \left(\frac{n + 1}{2}\right)^2$$

$$= \frac{n(n + 1)(2n + 1)}{6n} - \left(\frac{n + 1}{2}\right)^2 = \frac{n^2 - 1}{12}$$

On similar lines we can show that

$$\sigma_y^2 = \frac{n^2 - 1}{12}$$

In general $x_i \ne y_i$.
Let $d_i = x_i - y_i = (x_i - \bar{x}) - (y_i - \bar{y})$, since $\bar{x} = \bar{y}$.
Squaring and summing over i from 1 to n we get

$$\sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n}\left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2 + \sum_{i=1}^{n}(y_i - \bar{y})^2 - 2\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$$

Dividing both sides by n we get

$$\frac{1}{n}\sum_{i=1}^{n} d_i^2 = \sigma_x^2 + \sigma_y^2 - 2\mu_{11} \qquad (6.8)$$

We know that $\sigma_x^2 = \sigma_y^2$ and $r = \frac{\mu_{11}}{\sigma_x\sigma_y}$, so that $\mu_{11} = r\sigma_x^2$.

Substituting these expressions in (6.8), we get

$$\frac{1}{n}\sum_{i=1}^{n} d_i^2 = 2\sigma_x^2(1 - r) = \frac{(n^2 - 1)(1 - r)}{6}$$

and hence

$$r = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

Example 6.3:
Compute the rank correlation coefficient for the following data.
X 70 83 90 65 55 75 80 45
Y 120 130 145 110 135 140 95 100

Solution:
We first rank the X and Y series in descending order (the largest value receives rank 1) and
denote the ranks of x and y as Rx and Ry respectively.

 x     Rx     y     Ry    d = Rx − Ry    d2
70     5     120    5         0           0
83     2     130    4        -2           4
90     1     145    1         0           0
65     6     110    6         0           0
55     7     135    3         4          16
75     4     140    2         2           4
80     3      95    8        -5          25
45     8     100    7         1           1
                                    ∑d2 = 50

The Spearman’s rank correlation coefficient is given by

$$r = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 50}{8(64 - 1)} = 0.405$$

6.6.1 Repeated Ranks


If any two individuals are bracketed equal in any classification with respect to
characteristic A or B, or if there is more than one item with the same value in a series,
then Spearman’s formula for calculating the rank correlation coefficient breaks
down.
In this case a common rank is given to the repeated items. This common rank is the
average of the ranks which these items would have assumed if they were slightly
different. The next item gets the rank following the ranks already used. As a result
of this, the following adjustment is made in the rank correlation formula: we add the factor

$$\sum_{i=1}^{k}\frac{m_i(m_i^2 - 1)}{12} \quad \text{to} \quad \sum_{i=1}^{n} d_i^2$$

where $m_i$ is the number of items sharing the ith common rank and k is the number of groups of
tied items, counted over both series. One such term is added for each repeated value. The
corrected Spearman’s formula is then given by

$$r = 1 - \frac{6\left[\sum_{i=1}^{n} d_i^2 + \sum_{i=1}^{k}\frac{m_i(m_i^2 - 1)}{12}\right]}{n(n^2 - 1)}$$

Example 6.4:
Calculate the rank correlation coefficient for the following data;
X 20 25 33 17 38 60 25 70
Y 35 30 45 30 20 10 30 50

Solution:
After arranging the X values in descending order we find that there are two items with
25 as their value. To assign the ranks to these equal-valued items, we calculate the
average of the ranks of the two items. If the items were not equal we would have assigned
them ranks 5 and 6. Since they are equal we assign each of them 5.5, which is the average of 5
and 6. The next item after the two items with value 25 is 20. This item is assigned rank 7
and not 6. We carry out a similar exercise for the Y values, where the value 30 occurs three
times: these items would have taken ranks 4, 5 and 6, so each is assigned the average rank 5.

 x     y     Rx     Ry    d2=(Rx-Ry)2
20    35     7      3       16
25    30     5.5    5        0.25
33    45     4      2        4
17    30     8      5        9
38    20     3      7       16
60    10     2      8       36
25    30     5.5    5        0.25
70    50     1      1        0
                      ∑d2 = 81.5

Using the adjusted Spearman’s formula, we get

$$r = 1 - \frac{6\left[\sum_{i=1}^{n} d_i^2 + \sum_{i=1}^{k}\frac{m_i(m_i^2 - 1)}{12}\right]}{n(n^2 - 1)}$$

Since two values are repeated, k = 2. The value 25 is repeated two times in the X series,
therefore m1 = 2 and its correction term is 2(2² − 1)/12 = 0.5. In the Y series the value 30 is
repeated three times, thus m2 = 3 and its correction term is 3(3² − 1)/12 = 2. Substituting all
these in the adjusted formula we get

$$r = 1 - \frac{6[81.5 + 0.5 + 2]}{8(8^2 - 1)} = 1 - \frac{504}{504} = 0$$

so the two sets of ranks show no correlation.
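The ranking and tie-correction steps can be sketched in Python as follows. The function names `average_ranks` and `spearman_tie_corrected` are illustrative choices; ranks are assigned in descending order with tied values sharing the average of the ranks they would otherwise occupy, and one correction term m(m² − 1)/12 is added for every group of tied values in either series.

```python
from collections import Counter


def average_ranks(data):
    """Rank in descending order (largest = 1); ties share the mean of their ranks."""
    ordered = sorted(data, reverse=True)
    first_pos = {}
    for pos, value in enumerate(ordered, start=1):
        first_pos.setdefault(value, pos)  # first rank occupied by this value
    counts = Counter(data)
    return [first_pos[v] + (counts[v] - 1) / 2 for v in data]


def spearman_tie_corrected(xs, ys):
    """Spearman's r with m(m^2 - 1)/12 added to sum(d^2) for each tie group."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum(m * (m * m - 1) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values())  # m = 1 contributes 0
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


# Data of Example 6.4, taking the sixth Y value as 10 as in the solution table.
r_ties = spearman_tie_corrected([20, 25, 33, 17, 38, 60, 25, 70],
                                [35, 30, 45, 30, 20, 10, 30, 50])
# Data of Example 6.3 (no ties, so the correction is zero).
r_plain = spearman_tie_corrected([70, 83, 90, 65, 55, 75, 80, 45],
                                 [120, 130, 145, 110, 135, 140, 95, 100])
print(r_ties, round(r_plain, 3))
```

On the data of Example 6.4 the three Y values equal to 30 each receive rank 5, which gives ∑d² = 81.5 and r = 0. On the untied data of Example 6.3 the same function reduces to the ordinary formula and gives 0.405.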

Exercise 6.3
For each of the following data sets, compute the Spearman’s rank correlation coefficient
and comment on the result.
(i)
X 68 64 75 50 64 80 75 40 55 64
Y 62 58 68 45 81 60 68 48 50 70
(ii)


X 3.2 4.4 4.6 3.4 3.2 4.2 5.2 3.4 3.0 3.2
Y 10.0 6.4 5.4 4.2 8.2 5.4 5.8 9.2 7.0 8.8
(iii)
X 20 25 33 17 38 60 25 70
Y 35 30 45 30 20 10 30 50
(iv)
X 25 87.5 135 187.5 212.5 290 355 395 445 490
Y 0.90 0.60 1.00 0.50 0.50 0.60 0.40 0.30 0.50 0.425
(v)
X 2 4.6 7.2 4.2 9.6 6.2 8 1
Y 12.4 18.2 33.6 14 38.6 24.8 32 7.6
(vi)
X 1 1.5 2 2 2.5 2.5 3
Y 3.5 3 2.5 2 1.5 1 0.5

6.7 Revision Exercises


1. For each of the following data sets plot a scatter diagram, and then calculate the
Pearson’s correlation coefficient.
i.
x 1 2.3 3.6 2.1 4.8 3.1 4 0.5
y 6.2 9.1 16.8 7 19.3 12.4 16 3.8
ii.
x 125 159 285 210 152 243 279 116 181 162 236
y 75 70 54 63 68 56 50 77 68 73 56

2. The following data relate to the percentage of unemployment and percentage
change in wages over several years.
% Unemployment (x) 1.6 2.2 2.3 1.7 1.6 2.1 2.6 1.7 1.5 1.6
% Change in wages (y) 5.0 3.2 2.7 2.1 4.1 2.7 2.9 4.6 3.5 4.4


(i) Calculate the Pearson’s product-moment correlation coefficient.


(ii) Interpret your result.

3. In a training scheme for young people, the times they took to reach a required
standard of proficiency were measured. The average training time in days for each
age was recorded and the results are shown.

Age, x (years) 16 17 18 19 20 21 22 23 24 25
Average training time, y (days) 8 6 7 9 8 11 9 10 12 11

(i) Find the product-moment correlation coefficient between average training
time and age of trainee.
(ii) What conclusions can you draw about the age of trainees vis-à-vis average
training time?

4. The ranks of the same 15 students in Mathematics and French were as follows,
the two numbers within the brackets denoting the ranks of the same student:
(1,10), (2,7), (3,2), (4,6), (5,4), (6,8), (7,3), (8,1), (9,11), (10,15), (11,9),
(12,5), (13,14), (14,12), (15,13).
Show that the Spearman’s rank correlation coefficient is 0.51.

5. The marks, X and Y, gained by 1000 students for theory and laboratory work
respectively, are grouped with common class interval of 5 marks for each
variable, the frequencies for the various classes being shown in the table below.
The values of X and Y indicated are the mid-values of the classes. Show that the
product-moment coefficient of correlation is 0.68.

y \ x 42 47 52 57 62 67 72 77 82 Totals

52 3 9 19 4 - - - - - 35
57 9 26 37 25 6 - - - - 103
62 10 38 74 45 19 6 - - - 192
67 4 20 59 96 54 23 7 - - 263
72 - 4 30 54 74 43 9 - - 214
77 - - 7 18 31 50 19 5 - 130
82 - - - 2 5 13 15 8 3 46
87 - - - - - 2 5 8 2 17
Totals 26 97 226 244 189 137 55 21 5 1000


UNIT 7

REGRESSION

7.0 Introduction
We have seen that correlation gives us an idea of the magnitude and direction of the
relationship between correlated variables. It is natural to look next for a method that
helps us estimate the value of one variable when the other is known. This problem is
addressed in this unit.

7.1 Objectives
By the end of this unit the learner should be able to:
• explain what the slope m and the intercept c mean in the equation
  y = mx + c of a straight line.
• describe the information provided by a regression equation.
• explain when it is appropriate to use the statistical techniques of
  regression.
• find (using appropriate formulas) the equation of the least squares
  regression line and be able to sketch the regression line through the data
  points.
• interpret the least squares regression line and use it to make predictions.
• make inferences about
  o the slope and intercept of a simple regression line
  o the predicted Y value corresponding to a given X value

7.2 What is regression?

The concept of regression was developed by Francis Galton towards the latter half
of the nineteenth century. His study revealed very interesting facts about heredity.
According to him, tall parents tend to have tall children and short parents tend to
have short children, but the average height of the children of a group of short parents
is greater than that of the parents. Galton described this relationship as regression.
These days there is a growing tendency among writers to use this term for estimating
the unknown values of one variable from the known values of the other variable.
While correlation analysis tests whether two or more phenomena co-vary, regression
analysis measures the extent of this relationship, thus enabling us to make
predictions. In other words, by regression we mean the average relationship between
two or more variables. One of these variables is called the dependent variable while
the others are called independent variables. Regression analysis helps us to
establish a functional relationship between two or more variables. Unless stated
otherwise, the dependent variable is usually denoted by y and the independent
variables by x. For example, when determining the level of livelihood of families in
the city of Nairobi, we know that it depends on several factors, e.g. income. But does
income depend on the level of livelihood? If a person living in the Mathare slums
decides to change his livelihood by shifting to the posh Muthaiga estate, will his
income change because of this decision? The answer is no. This is because the level
of livelihood depends on income but income does not depend on the level of
livelihood. Therefore the level of livelihood is a dependent variable and income is
an independent variable.

7.3 Regression Lines


Suppose the values of two series x and y are given. If for a particular value of x the
corresponding value of y is to be ascertained, then x is the independent variable and y is
the dependent variable. However, if for a given value of y the corresponding value of x
is to be ascertained, then y is the independent variable and x is the dependent variable. If
we plot all the original values of the two variables x and y on graph paper and obtain a
scatter diagram, we will note that if there exists an association between the two variables
x and y, the points of the scatter diagram will be more or less concentrated round a curve,
which may be called the curve of regression. If the curve is a straight line then the
regression is said to be linear. This line describes the average relationship between the
two variables and is analogous to the concept of the mean of a series. This regression line
is also known as the estimation line and gives the best fit in the least squares sense to a
given distribution. There are usually two lines of regression, ‘x on y’ and ‘y on x’.
If the straight line is so chosen that the sum of squares of deviations parallel to the y-axis
is minimized, then such a line is called a line of regression of y on x.

Thus we minimize $S = \sum_{i=1}^{n} (y_i - Y_i)^2$, where $Y_i$ and $y_i$ are the actual and estimated values
respectively, to get the regression of y on x. The regression line of y on x gives the best
estimate of y for any given value of x.
[Figure: actual values and the estimated line plotted against the x-axis and y-axis]

Fig.7.1: Regression line of y on x


On the other hand, if the sum of squares of deviations parallel to the x-axis is minimized,
the resulting straight line is the line of regression of x on y, which gives the best estimate
of x for any given value of y. Thus we minimize $S = \sum_{i=1}^{n} (x_i - X_i)^2$, where $X_i$ and $x_i$ are
the actual and the estimated values respectively, to get the regression line of x on y.


[Figure: actual values and the estimated line plotted against the x-axis and y-axis]

Fig.7.2: Regression line of x on y


Generally speaking the above two regression lines are different but if:
o there is either perfect positive or negative correlation between the two variables, the
two regression lines will coincide with each other; i.e., there will be one regression
line only.
o the degree of correlation is high; the two regression lines will be nearer to each other.
o the correlation is zero the two regression lines will be at right angles to each other.

7.4 Equations of Regression Lines and Coefficients

7.4.1 Regression equation of y on x


This regression equation can be expressed in the linear form as Y = a + bX, where Y and
X are observations in the population $Y = (Y_1, Y_2, \ldots, Y_N)$ and $X = (X_1, X_2, \ldots, X_N)$, a is a
constant and b is the regression coefficient. From this population we select a sample with
observations $y = (y_1, y_2, \ldots, y_n)$ and $x = (x_1, x_2, \ldots, x_n)$ and use the sample selected to
estimate the actual regression line; this gives the estimated regression line.

Let S denote the sum of squares of deviations of the estimated value $y_i$ from its actual
value $Y_i$:

$$ S = \sum_{i=1}^{n} f_i (y_i - Y_i)^2 = \sum_{i=1}^{n} f_i (y_i - a - bX_i)^2 \quad \text{since } Y_i = a + bX_i \qquad (7.1) $$

Since we are summing up only the observations in the sample, then

$$ \sum_{i=1}^{n} Y_i = \sum_{i=1}^{n} y_i $$

Hence $Y_i = a + bX_i$ is equivalent to $Y_i = a + bx_i$. Substituting this in equation (7.1) we
get

$$ S = \sum_{i=1}^{n} f_i (y_i - a - bx_i)^2 \qquad (7.2) $$
According to the principle of least squares, we choose a and b such that S is a
minimum, by taking the first derivatives and equating them to zero. Taking the first
derivative with respect to a we have

$$ \frac{\partial S}{\partial a} = \frac{\partial}{\partial a} \sum_{i=1}^{n} f_i (y_i - a - bx_i)^2 = -2 \sum_{i=1}^{n} f_i (y_i - a - bx_i) \qquad (7.3) $$

Equating (7.3) to zero we have,

$$ \sum_{i=1}^{n} f_i (y_i - a - bx_i) = 0 $$

Simplifying this equation (dividing through by $N = \sum_{i=1}^{n} f_i$) we get

$$ \bar{y} = a + b\bar{x} \qquad (7.4) $$

We now differentiate equation (7.2) with respect to b

$$ \frac{\partial S}{\partial b} = \frac{\partial}{\partial b} \sum_{i=1}^{n} f_i (y_i - a - bx_i)^2 = -2 \sum_{i=1}^{n} f_i (y_i - a - bx_i) x_i \qquad (7.5) $$

Equating (7.5) to zero we have,

$$ \sum_{i=1}^{n} f_i (y_i - a - bx_i) x_i = 0 \qquad (7.6) $$

Divide equation (7.6) by $N = \sum_{i=1}^{n} f_i$ to get

$$ \frac{1}{N} \sum_{i=1}^{n} f_i x_i y_i - \frac{a}{N} \sum_{i=1}^{n} f_i x_i - \frac{b}{N} \sum_{i=1}^{n} f_i x_i^2 = 0 \qquad (7.7) $$

We know that

$$ \frac{1}{N} \sum_{i=1}^{n} f_i x_i = \bar{x}, \qquad \frac{1}{N} \sum_{i=1}^{n} f_i y_i = \bar{y}, $$

$$ \frac{1}{N} \sum_{i=1}^{n} f_i x_i^2 = \sigma_x^2 + \bar{x}^2, \qquad \frac{1}{N} \sum_{i=1}^{n} f_i x_i y_i = \mu_{11} + \bar{x}\,\bar{y} $$

Substituting these results in equation (7.7) we get

$$ \mu_{11} + \bar{x}\,\bar{y} = a\bar{x} + b(\bar{x}^2 + \sigma_x^2) \qquad (7.8) $$

Multiply equation (7.4) by $\bar{x}$ to get

$$ \bar{y}\,\bar{x} = a\bar{x} + b\bar{x}^2 \qquad (7.9) $$

Subtract (7.9) from (7.8) to get

$$ b = \frac{\mu_{11}}{\sigma_x^2} \qquad (7.10) $$

Substitute (7.10) in (7.4) to get

$$ a = \bar{y} - \frac{\mu_{11}}{\sigma_x^2}\,\bar{x} \qquad (7.11) $$

Since the linear equation of y on x can be expressed as y = a + bx, substituting the values
of a and b from (7.10) and (7.11), we get

$$ y - \bar{y} = \frac{\mu_{11}}{\sigma_x^2} (x - \bar{x}) $$

7.4.2 Regression equation of x on y


By the same argument as the one above, the equation of the regression of x on y is
given by

$$ x - \bar{x} = \frac{\mu_{11}}{\sigma_y^2} (y - \bar{y}) $$

7.5 Standard Error of the Estimate or Residual Variance


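The estimated regression equation of y on x is given by,

$$ \hat{y} = \bar{y} + \frac{r\sigma_y}{\sigma_x} (x - \bar{x}) \quad \text{since } \mu_{11} = r\sigma_x\sigma_y \qquad (7.12) $$

The residual variance is

$$ \begin{aligned}
S_y^2 &= \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
&= \frac{1}{n} \sum_{i=1}^{n} \left[ y_i - \left( \bar{y} + \frac{r\sigma_y}{\sigma_x}(x_i - \bar{x}) \right) \right]^2 \quad \text{using (7.12)} \\
&= \frac{\sigma_y^2}{n} \sum_{i=1}^{n} \left[ \frac{y_i - \bar{y}}{\sigma_y} - r\,\frac{x_i - \bar{x}}{\sigma_x} \right]^2 \quad \text{dividing and multiplying by } \sigma_y^2 \\
&= \frac{\sigma_y^2}{n} \left[ \sum_{i=1}^{n} \frac{(y_i - \bar{y})^2}{\sigma_y^2} + r^2 \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{\sigma_x^2} - 2r \sum_{i=1}^{n} \frac{(x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y} \right] \\
&= \frac{\sigma_y^2}{n} \left[ n + nr^2 - 2nr^2 \right] \\
&= \sigma_y^2 [1 - r^2]
\end{aligned} $$

Thus the standard error of the estimate of the regression line y on x is

$$ S.E.(y) = \sigma_y \sqrt{1 - r^2} $$

On similar lines the standard error of the estimate of the regression line x on y is

$$ S.E.(x) = \sigma_x \sqrt{1 - r^2} $$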

Example 7.1:
Given the data below, calculate the lines of regression and their standard errors
x 1 2 3 4 5 6 7
y 7 8 9 11 10 13 12

Solution:
Calculating the means of x and y we find that $\bar{x} = 4$ and $\bar{y} = 10$.

x    y    dx = x − 4    dy = y − 10    dx²    dy²    dxdy
1    7    -3            -3             9      9      9
2    8    -2            -2             4      4      4
3    9    -1            -1             1      1      1
4    11   0             1              0      1      0
5    10   1             0              1      0      0
6    13   2             3              4      9      6
7    12   3             2              9      4      6
Total                                  28     28     26
The correlation coefficient is

$$ r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}} = \frac{26}{\sqrt{28 \times 28}} = 0.93 $$

The standard deviations are

$$ \sigma_x = \sqrt{\frac{\sum d_x^2}{n}} = \sqrt{\frac{28}{7}} = 2, \qquad \sigma_y = \sqrt{\frac{\sum d_y^2}{n}} = \sqrt{\frac{28}{7}} = 2 $$

The estimated regression equation of x on y is given by

$$ x = \bar{x} + \frac{r\sigma_x}{\sigma_y} (y - \bar{y}) = 4 + 0.93(2/2)(y - 10) $$

x = 0.93y − 5.3

The standard error of x on y is given by

$$ S.E.(x) = \sigma_x \sqrt{1 - r^2} \approx 0.74 $$

The regression equation of y on x is given by

$$ y = \bar{y} + \frac{r\sigma_y}{\sigma_x} (x - \bar{x}) $$

y = 0.93x + 6.28

The standard error of y on x is given by

$$ S.E.(y) = \sigma_y \sqrt{1 - r^2} \approx 0.74 $$
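
The steps of Example 7.1 can be reproduced with a short Python sketch. This is an illustration rather than part of the original module: the function name `regression_summary` and its dictionary keys are hypothetical, variances are taken as population variances (dividing by n, as in the text), and the standard error is taken as the square root of the residual variance $\sigma^2(1 - r^2)$. Because the code keeps r unrounded (26/28), its intercepts differ slightly in the second decimal place from values worked with r rounded to 0.93.

```python
import math

def regression_summary(x, y):
    """Return r, both least squares lines (intercept, slope) and their standard errors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # mu11 is the covariance; var_x and var_y are population variances (divide by n).
    mu11 = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    r = mu11 / math.sqrt(var_x * var_y)
    b_yx = mu11 / var_x   # slope of y on x, equation (7.10)
    b_xy = mu11 / var_y   # slope of x on y
    return {
        "r": r,
        "y_on_x": (mean_y - b_yx * mean_x, b_yx),
        "x_on_y": (mean_x - b_xy * mean_y, b_xy),
        "se_y": math.sqrt(var_y * (1 - r * r)),
        "se_x": math.sqrt(var_x * (1 - r * r)),
    }

# Data of Example 7.1
x = [1, 2, 3, 4, 5, 6, 7]
y = [7, 8, 9, 11, 10, 13, 12]
for name, value in regression_summary(x, y).items():
    print(name, value)
```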

Exercise 7.1
1. The figures given below depict the production in a sugar factory,

Year 1 3 4 5 6 7 10
Production 67 88 94 85 91 98 90

(i) Fit a linear regression line and tabulate the trend values.
(ii) Estimate the production in years 2, 8 and 9.
2. Two random variables have regression lines with equations
3x + 2y = 26, and
6x + y = 31
Find the mean values of x and y and the correlation coefficient between x and y.


3. Given the following population figures in China:

Census year 1911 1921 1932 1941 1951 1961 1971
Population (in millions) 250 251 279 319 361 439 547

Fit a regression equation and estimate the population figure in 1981.

7.6 Revision Exercises


7. For each of the following data sets plot a scatter diagram, and then find the
regression line of y on x in the form y = a + bx .
i.
x 2 4 5 8 10
y 3 7 8 13 17
ii.
x 21 39 48 24 72 75 15 35 62 81 12 56
y 40 58 67 45 89 96 37 53 83 102 35 75

8. The table below shows the age A and the strontium ratio S for each of 10 basalt
rock samples, one sample of each age being specifically chosen for the exercise.
A ($10^2$ Ma) 1 2 3 4 5 6 7 8 9 10
S 0.710 0.723 0.738 0.751 0.765 0.780 0.793 0.808 0.824 0.840

(i) Find the values of $\bar{A}$ and $\bar{S}$.
(ii) Obtain the equation of the regression line of strontium ratio on age.
(iii) Estimate the strontium ratio for a basalt rock sample of age
$3.5 \times 10^2$ Ma.

9. In an experiment, the atomic heat, H units, of an element was measured at various
temperatures, T degrees Kelvin. The results are shown in the table below:
T 100 150 200 250 300 350 400 450 500
H 0 0.38 0.8 1.22 1.6 2 2.42 2.8 3.18

(i) Draw a scatter diagram to exhibit the data.
(ii) Obtain the equation of the regression line of atomic heat on temperature.
(iii) From your equation, estimate the increase in the atomic heat of the element
when its temperature is increased by 100 degrees Kelvin.

10. The class teacher of Form IV South in Migwani High school needs to predict the
grades his students will get in the final examination. To do this he decides to look
at the marks gained in the mock examination. He thinks that in Mathematics there is
a linear relationship between these marks. To investigate this he looks at the
results of students from the past years. The mock examination and average final
examination marks are given in the following table.

Mock mark 18 26 28 34 36 42 48 52 54 60
Av. final mark 54 64 54 62 68 70 76 66 76 74

If x represents the mock mark and y the average final mark:


(i) Draw a scatter diagram to represent these data,
(ii) Calculate the regression line in the form y = a + bx .
(iii) What final marks might be expected for mock marks of : 30, 16, 50, 85?
(iv) Comment on the validity of these predicted final marks.

11. The marks, x and y, gained by 1000 students for theory and laboratory work
respectively, are grouped with common class interval of 5 marks for each
variable, the frequencies for the various classes being shown in the table below.
The values of x and y indicated are the mid-values of the classes.

y \ x 42 47 52 57 62 67 72 77 82 Totals

52 3 9 19 4 - - - - - 35
57 9 26 37 25 6 - - - - 103
62 10 38 74 45 19 6 - - - 192
67 4 20 59 96 54 23 7 - - 263
72 - 4 30 54 74 43 9 - - 214
77 - - 7 18 31 50 19 5 - 130
82 - - - 2 5 13 15 8 3 46
87 - - - - - 2 5 8 2 17
Totals 26 97 226 244 189 137 55 21 5 1000

Show that the regression equation of y on x is y = 29.7 + 0.656x


UNIT 8

PROBABILITY
8.0 Introduction
Probability theory refers to a measure of occurrence of a chance event and is based on the
belief that the same set of cases is always accompanied by the same effect, that the future
will be like the past. Actually, it is a matter of belief, not certainty, to which we give
expression in the concept of probability. The terms probability, chance and likely all
convey the same message: that the event is not certain to take place, or that there is
uncertainty about the happening of the event. We make use of probability in our daily life;
for example, somebody might say that “it will probably rain today” or “you are probably
right”. Such statements refer to probability. An individual’s approach to probability
depends upon the nature of his interest in the concept.

8.1 Objectives:
By the end of this unit you should be able to:
• define probability, sample space, simple and compound event, mutually
  exclusive events, independent events and conditional probability.
• define probability of an event in the classical and axiomatic approaches.
• calculate conditional probability.
• apply the addition and multiplication laws of probability.
• state and prove Bayes’ Theorem on conditional probability.

8.2 Definition of terms used in Probability

1. Trial and Event


If an experiment, such as tossing a coin, is repeated a number of times, then each
repetition of the experiment is called a trial and the possible outcomes are called
events.


2. Equally Likely Cases


Two or more outcomes of an experiment are said to be equally likely if any one of them
cannot be expected to occur in preference to the other e.g., in tossing a coin that cannot
stand on its edge, head or tail are equally likely.

3. Mutually Exclusive Events


Two events are said to be mutually exclusive if the occurrence of either precludes the
occurrence of the other; i.e., two mutually exclusive events cannot occur simultaneously
in the same trial. It means that if one of the events occurs, then the other event cannot
take place with the same subject at the same time. For example, if a coin is tossed, we can
get either a tail or a head but not both; hence the events of getting a head or a tail are
mutually exclusive.

4. Exhaustive Events
A set of events is said to be exhaustive when it includes all possible outcomes of a trial.
It means that all the possible events that can happen are included in the study of
probability. For example:
(a) In an experiment of tossing a coin there are two possible
outcomes, head or tail. Thus the number of exhaustive cases here is two.
(b) In an experiment of throwing a die, there are six exhaustive
cases, since any of the six faces can appear.
(c) In an experiment of throwing two dice, the number of
exhaustive cases is $6^2 = 36$, since any of the six numbers on the first die can
be associated with any of the six numbers on the second die. The 36 cases are
listed below; the bracketed pairs are those whose sum is five.

(1,1) (1,2) (1,3) [1,4] (1,5) (1,6)
(2,1) (2,2) [2,3] (2,4) (2,5) (2,6)
(3,1) [3,2] (3,3) (3,4) (3,5) (3,6)
[4,1] (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)

In general, in the experiment of throwing n dice the number of exhaustive cases is $6^n$.

5. Favourable Cases
The number of cases favourable to an event in a trial is the number of outcomes which
entail the happening of the event. For example, in the experiment of throwing two dice,
the cases favourable to getting a sum of five are [1,4], [4,1], [2,3] and [3,2], i.e. four ways.
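
The exhaustive and favourable cases for two dice can be listed by brute force. The Python sketch below is an illustration only (the variable names are arbitrary); it enumerates all ordered pairs, picks out those whose sum is five, and forms the ratio of favourable to exhaustive cases, anticipating the classical definition of probability given in Section 8.3.1.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes (first die, second die): the exhaustive cases.
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes whose faces add up to five: the favourable cases.
favourable = [pair for pair in outcomes if sum(pair) == 5]

probability = Fraction(len(favourable), len(outcomes))
print("exhaustive cases:", len(outcomes))
print("favourable cases:", favourable)
print("P(sum is five)  :", probability)
```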

6. Simple and Compound events


An event is said to be simple if it corresponds to a single possible outcome of an
experiment. For example, drawing a green ball from a bag containing six red and ten
green balls.

A compound event means the joint occurrence of two or more simple events. For
example, getting at least one head when three coins are tossed, or drawing a red and
then a green ball in two draws from a bag containing six red and ten green balls.

7. Independent and Dependent Events


Several events are said to be independent if the happening of one event is not
affected by the supplementary knowledge concerning the occurrence of the
remaining events.
(i) In the experiment of tossing a coin, the event of getting a head in the first
trial is independent of getting a head in the second and subsequent trials.
(ii) In the case of dependent events, the occurrence or non-occurrence of one
event in any one trial affects the occurrence of the other events in the other
trials.
(iii) If we draw a card from a well-shuffled pack and replace it before drawing the
second card, the result of the second draw is independent of the first draw.
However, if the first card is not replaced, then the result of the second draw is
dependent on the first draw.

8. Problem of Cards
A pack of cards contains 52 cards, made up as follows:
There are 4 cards each of Two, Three, Four, Five, Six, Seven, Eight, Nine, Ten, Jack,
Queen, King and Ace.
There are 4 suits of 13 cards each: Hearts, Spades, Diamonds and Clubs.

8.3 Approaches to the concept of Probability


There are three approaches to the concept of probability;
I. Classical approach
II. Axiomatic approach.
III. Relative frequency approach

8.3.1 Classical Approach


In this approach probability is thought of as the proportion of the number of times that a
certain event will occur if the experiment is repeated indefinitely.
Probability is thus an attempt to give a numerical expression to the possibility of a certain
outcome of the experiment in the face of uncertainty.
The classical approach to probability can be interpreted as ‘prior probability’. The
probability which can be specified by feeling, understanding or common sense, in
definite measurements, by considering the nature of the event, is known as prior
probability. In other words, if the probability is expressed on the basis of pure logic even
before the event takes place, it is a prior probability. For example, the probability of
getting 9 in rolling a die only once is zero and that of obtaining less than nine is one. It is
assumed that all possible outcomes of an experiment are mutually exclusive, exhaustive
and equally likely.

The classical approach describes the probability of the happening of an event as equal to
the number of ways favourable to the event divided by the total number of ways in which
the event can happen or fail. If A is the event then,

$$ P(A) = \frac{\text{Number of favourable cases}}{\text{Total number of possible cases}} $$
8.3.2 Axiomatic Approach
The probability of an event A is a number P(A) assigned to this event. This number obeys
the following three postulates (axioms) but is otherwise unspecified.
(i) P(A) is non-negative; that is, P(A) ≥ 0 .
(ii) The probability of the certain event S equals 1; that is P (S)=1.
(iii) If $A_1, A_2, \ldots$ is any sequence of mutually exclusive (disjoint) events then,

$$ P\left( \bigcup_{n=1}^{\infty} A_n \right) = \sum_{n=1}^{\infty} P(A_n) $$

! Note: Axiom (iii) is referred to as the law of additivity

8.3.3 Relative Frequency Approach


Under this approach, the experiment under consideration is repeated n times. If the event
A occurs $n_A$ times, then its probability P(A) is defined as the limit

$$ P(A) = \lim_{n \to \infty} \frac{n_A}{n} $$

For example, if a coin, fair or not, is tossed n times and heads shows $n_H$ times, then the
probability of heads equals the limit $\lim_{n \to \infty} \frac{n_H}{n}$.
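
The limiting behaviour of the relative frequency can be illustrated by simulation. The Python sketch below is an illustration under stated assumptions: the function name `relative_frequency_of_heads`, the seed value and the chosen numbers of tosses are arbitrary, and a pseudo-random generator stands in for a real coin. As n grows, the printed ratio $n_H/n$ should settle near the coin's probability of heads.

```python
import random

def relative_frequency_of_heads(n_tosses, p_heads=0.5, seed=160):
    """Toss a simulated coin n_tosses times and return n_H / n."""
    rng = random.Random(seed)   # fixed seed so that runs are repeatable
    heads = sum(1 for _ in range(n_tosses) if rng.random() < p_heads)
    return heads / n_tosses

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

Passing a different `p_heads` (for example 0.3) mimics a coin that is not fair; the ratio then settles near that value instead.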

8.4 Permutations and Combinations


If we have a set of three letters A, B and C and we want to arrange them in pairs, then
there are two ways of doing this. We can arrange them such that the order of
arrangement is taken or not taken into account. If the order of arrangement is ignored so
that AB=BA, then this is a combinatorial arrangement. If on the other hand the order is
not ignored then such an arrangement is called a permutation arrangement.

Definition: If we have n items and we want to arrange them r items at a time, then
denoting the total number of ways this is possible by $^nC_r$ for combinations and $^nP_r$ for
permutations, we have,

$$ ^nC_r = \frac{n!}{(n-r)!\,r!} \quad \text{and} \quad ^nP_r = \frac{n!}{(n-r)!} $$

Example 8.1:
To arrange the letters A, B, C in pairs we have n = 3 and r = 2. The total number of ways
this can be done ignoring order is

$$ ^3C_2 = \frac{3 \times 2 \times 1}{1 \times 2 \times 1} = 3 $$

These are AB, AC, and BC.
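
The two counting formulas can be checked against an explicit listing. The following Python sketch is an illustration, not part of the original module; it uses the standard library functions `math.comb` and `math.perm` (available from Python 3.8) for the counts and `itertools` for the listings.

```python
import math
from itertools import combinations, permutations

letters = "ABC"
n, r = len(letters), 2

# Order ignored (AB = BA): combinations.
pairs_ignoring_order = ["".join(c) for c in combinations(letters, r)]
# Order taken into account (AB != BA): permutations.
pairs_with_order = ["".join(p) for p in permutations(letters, r)]

print(math.comb(n, r), pairs_ignoring_order)
print(math.perm(n, r), pairs_with_order)
```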

Example 8.2:
A bag contains 7 white, 6 red and 5 black balls. A ball is drawn at random. Find the
probability that it will be red?
Solution:
Number of favourable cases of getting a red ball = 6.
Total number of exhaustive cases = 7 + 6 + 5 = 18.
Let A be the event of getting a red ball; then

$$ P(A) = \frac{\text{Number of favourable cases}}{\text{Total number of exhaustive cases}} = \frac{6}{18} = \frac{1}{3} $$

Example 8.3:
A bag contains 4 red and 3 blue balls. Two draws of two balls are made. Find the chance
that the first draw gives two red balls and the second draw two blue balls when the balls
are replaced after the first draw.

Solution:
Let A be the event of drawing two red balls and B be the event of drawing two blue balls.
The bag contains a total of 7 balls; it means that 2 balls can be drawn in $^7C_2$ ways.
Total number of exhaustive ways = $^7C_2$ = 21.
Two red balls can be drawn from the 4 red balls in $^4C_2$ = 6 ways.
Therefore the total number of ways favourable to the event A of drawing two red
balls = $^4C_2$.

$$ P(A) = \frac{^4C_2}{^7C_2} = \frac{6}{21} = \frac{2}{7} $$

On similar lines, it can be shown that

$$ P(B) = \frac{^3C_2}{^7C_2} = \frac{3}{21} = \frac{1}{7} $$

Since the balls are replaced after the first draw, the two draws are independent, so the
required chance is

$$ P(A) \times P(B) = \frac{2}{7} \times \frac{1}{7} = \frac{2}{49} $$

Example 8.4:
Five cards are drawn at random from a well-shuffled pack of cards. Find the probability
that;
(a) 4 are aces
(b) There are 4 aces and 1 king

Solution:
The total number of exhaustive cases = $^{52}C_5$
a) 4 aces can be drawn in $^4C_4$ ways and the remaining card can be drawn in $^{48}C_1$ ways.
Let A be the event of getting 4 aces. Each way of choosing the 4 aces can be combined
with each way of choosing the remaining card. Hence
Total number of ways favourable to the event A
= {No. of ways favourable to getting 4 aces} × {No. of ways favourable to getting the remaining card}
= $^4C_4 \times {}^{48}C_1$

$$ \text{Hence } P(A) = \frac{^4C_4 \times {}^{48}C_1}{^{52}C_5} $$

b) Let B be the event of getting 4 aces and 1 king.
Favourable ways of drawing a king = $^4C_1$
Favourable ways of drawing 4 aces = $^4C_4$
Each way of drawing the king can be combined with each way of drawing the aces,
therefore favourable ways of drawing 1 king and 4 aces = $^4C_4 \times {}^4C_1$

$$ P(B) = \frac{^4C_4 \times {}^4C_1}{^{52}C_5} $$
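
The combination counts in Examples 8.3 and 8.4 can be evaluated exactly. The Python sketch below is an illustration only (the variable names are arbitrary); it uses `math.comb` for the counts and `fractions.Fraction` so that the probabilities are kept as exact fractions rather than rounded decimals.

```python
from fractions import Fraction
from math import comb

# Example 8.3: bag with 4 red and 3 blue balls, two balls per draw, with replacement.
p_two_red = Fraction(comb(4, 2), comb(7, 2))
p_two_blue = Fraction(comb(3, 2), comb(7, 2))
p_red_then_blue = p_two_red * p_two_blue   # draws are independent after replacement

# Example 8.4: five cards from a pack of 52.
p_four_aces = Fraction(comb(4, 4) * comb(48, 1), comb(52, 5))
p_four_aces_one_king = Fraction(comb(4, 4) * comb(4, 1), comb(52, 5))

print(p_two_red, p_two_blue, p_red_then_blue)
print(p_four_aces, p_four_aces_one_king)
```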

Exercise 8.1
1. A bag contains 6 white and 9 black balls. Two draws of 4 balls are made and the
balls replaced after the first draw. Find the probability of getting 4 white balls in the
first draw and 4 black balls in the second draw.

2. Find the probability that five cards drawn from a well-shuffled pack are:
(i) 3 Tens and 2 Jacks
(ii) 3 are from any suit and 2 from the other.

3. In a gambling den, a game of Bridge is being played. What is the probability that
player ‘A’ will hold all the four kings?
HINT: In a game of Bridge 13 cards are distributed to each of the four players.

8.5 Theorems on Probability


Theorem 1: The probability of the impossible event is zero.
Proof:
The impossible event contains no sample point, and hence the certain event and the
impossible event are mutually exclusive.
Let S denote the whole sample space and φ the impossible event, i.e. the empty set; then
S ∪ φ = S
⇒ P(S ∪ φ) = P(S)
⇒ P(S) + P(φ) = P(S)
⇒ P(φ) = 0

Theorem 2: If A denotes an event and $A^c$ its complement then $P(A^c) = 1 - P(A)$.

Proof:
Events A and $A^c$ are mutually exclusive events and
$A \cup A^c = S$, therefore

$P(A \cup A^c) = P(S)$

$P(A) + P(A^c) = P(S)$, since A and $A^c$ are disjoint.

$P(A) + P(A^c) = 1$, since P(S) = 1.

$P(A^c) = 1 - P(A)$

Theorem 3: The probability of the union of any two events $A_1$ and $A_2$ is given by

$$ P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2) $$

where $A_1$ and $A_2$ are not mutually exclusive.
Proof:

[Venn diagram: overlapping events $A_1$ and $A_2$ inside the sample space S, showing the regions $A_1 \cap A_2^c$, $A_1 \cap A_2$ and $A_2 \cap A_1^c$]

We have $A_1 \cup A_2 = A_1 \cup (A_1^c \cap A_2)$, so that

$$ \begin{aligned}
P(A_1 \cup A_2) &= P(A_1 \cup (A_1^c \cap A_2)) \\
&= P(A_1) + P(A_1^c \cap A_2) \quad \text{using Axiom 3} \\
&= P(A_1) + P(A_1^c \cap A_2) + P(A_1 \cap A_2) - P(A_1 \cap A_2) \quad \text{adding and subtracting } P(A_1 \cap A_2) \\
&= P(A_1) + P((A_1^c \cap A_2) \cup (A_1 \cap A_2)) - P(A_1 \cap A_2) \quad \text{since } A_1^c \cap A_2 \text{ and } A_1 \cap A_2 \text{ are disjoint} \\
&= P(A_1) + P(A_2) - P(A_1 \cap A_2)
\end{aligned} $$

In general it can be shown that

$$ P\left( \bigcup_{i=1}^{n} A_i \right) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} P(A_1 \cap A_2 \cap \cdots \cap A_n) $$
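
The addition law can be verified on a small sample space by treating events as sets. The Python sketch below is an illustration only: the sample space (one throw of a fair die), the three events and the helper name `prob` are all assumptions chosen for the check, with probabilities computed by the classical ratio of favourable to possible cases.

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # one throw of a fair die

def prob(event):
    """Classical probability: favourable cases over possible cases."""
    return Fraction(len(event & sample_space), len(sample_space))

a1 = {2, 4, 6}   # an even number
a2 = {4, 5, 6}   # a number greater than 3
a3 = {1, 2}      # a number less than 3

# Two events: P(A1 u A2) = P(A1) + P(A2) - P(A1 n A2)
two_lhs = prob(a1 | a2)
two_rhs = prob(a1) + prob(a2) - prob(a1 & a2)

# Three events: singles - pairs + triple
three_lhs = prob(a1 | a2 | a3)
three_rhs = (prob(a1) + prob(a2) + prob(a3)
             - prob(a1 & a2) - prob(a1 & a3) - prob(a2 & a3)
             + prob(a1 & a2 & a3))

print(two_lhs, two_rhs)
print(three_lhs, three_rhs)
```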

8.6 Multiplication Law of Probability


8.6.1 When the events are independent
This law is used when one is interested in finding the probability of the combined
occurrence (simultaneous occurrence) of two or more independent events. The basic
condition for the multiplicative law is that the probabilities of the individual
independent events must be known. In such a situation the probability of the joint
occurrence of the independent events is equal to the product of their probabilities.

Thus if A and B are independent events then,

$$ P(AB) = P(A) \times P(B) \quad \text{or} \quad P(A \cap B) = P(A) \times P(B) $$

Proof:
If an event A can happen in $n_1$ ways of which $m_1$ are successful and event B can happen
in $n_2$ ways of which $m_2$ are successful, then

$$ P(A) = \frac{m_1}{n_1}, \qquad P(B) = \frac{m_2}{n_2} $$

$$ P(A \cap B) = \frac{\text{number of favourable cases}}{\text{number of exhaustive cases}} = \frac{m_1 \times m_2}{n_1 \times n_2} = P(A) \times P(B). $$

8.6.2 When the events are not independent


The multiplicative law stated for independent events is not applicable for
dependent events. Events are said to be dependent when occurrence of one
depends on the outcome of the other. The probability that the event B will
take place provided that event A has taken place is called conditional
probability of B relative to A and is written as P(B/A).
$$ P(B/A) = \frac{P(A \cap B)}{P(A)} \quad \text{and} \quad P(A/B) = \frac{P(A \cap B)}{P(B)} $$

Proof:
Suppose the sample space contains N occurrences of which $N_A$ belong to the event A and
$N_B$ belong to the event B. Let $N_{AB}$ be the occurrences favourable to both A and B; i.e.,
to $A \cap B$.
Then

$$ P(A) = \frac{N_A}{N}, \qquad P(B) = \frac{N_B}{N} \quad \text{and} \quad P(A \cap B) = \frac{N_{AB}}{N} $$

Now the conditional probability P(A/B) refers to the sample space of $N_B$ occurrences out
of which $N_{AB}$ occurrences pertain to the occurrence of A when B has already occurred.
Thus

$$ P(A/B) = \frac{N_{AB}}{N_B} \quad \text{and similarly} \quad P(B/A) = \frac{N_{AB}}{N_A} $$

Since

$$ P(A \cap B) = \frac{N_{AB}}{N} = \frac{N_{AB}}{N_A} \times \frac{N_A}{N} = P(B/A) \times P(A), $$

and on similar lines

$$ P(A \cap B) = \frac{N_{AB}}{N} = \frac{N_{AB}}{N_B} \times \frac{N_B}{N} = P(A/B) \times P(B), $$

$$ \therefore \; P(B/A)\,P(A) = P(A/B)\,P(B) $$
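
The counting argument in this proof can be carried out literally on a small experiment. The Python sketch below is an illustration under stated assumptions: it reuses the bag of six red and ten green balls mentioned earlier in this unit, draws two balls without replacement, and takes A as "the first ball is red" and B as "the second ball is green"; the variable names are arbitrary. The occurrences N, $N_A$, $N_B$ and $N_{AB}$ are counted directly.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R"] * 6 + ["G"] * 10

# Every ordered pair of distinct balls: two draws without replacement.
draws = list(permutations(range(len(balls)), 2))

n_total = len(draws)
n_a = sum(1 for i, j in draws if balls[i] == "R")                       # A: first ball red
n_b = sum(1 for i, j in draws if balls[j] == "G")                       # B: second ball green
n_ab = sum(1 for i, j in draws if balls[i] == "R" and balls[j] == "G")  # both A and B

p_a, p_b, p_ab = (Fraction(k, n_total) for k in (n_a, n_b, n_ab))
p_b_given_a = Fraction(n_ab, n_a)
p_a_given_b = Fraction(n_ab, n_b)

print(p_ab, p_b_given_a * p_a, p_a_given_b * p_b)
```

All three printed values agree, which is the identity P(B/A) P(A) = P(A/B) P(B) for this experiment.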

8.7 Bayes Theorem


Let $H_1, H_2, \ldots, H_n$ be mutually exclusive events whose union is the sample space S of
an experiment. Let E be an arbitrary event of S such that P(E) ≠ 0. Then

$$ P(H_i/E) = \frac{P(H_i \cap E)}{\sum_{j=1}^{n} P(H_j \cap E)} $$

where i = 1, 2, 3, ..., n and $P(E) = \sum_{j=1}^{n} P(H_j \cap E)$.

Proof:
Consider the following diagram.

[Diagram: the sample space S partitioned into $H_1, H_2, \ldots, H_n$, with the event E cut into the pieces $H_1 \cap E$, $H_2 \cap E$, …, $H_n \cap E$]

Since $H_1 \cup H_2 \cup \cdots \cup H_n = S$,

$$ E = E \cap S = (H_1 \cap E) \cup (H_2 \cap E) \cup \cdots \cup (H_n \cap E) $$

Since the $H_i$'s are mutually exclusive, the events $H_i \cap E$ are also mutually exclusive, so

$$ P(E) = P(H_1 \cap E) + P(H_2 \cap E) + \cdots + P(H_n \cap E) $$

The events $H_i$ and E are dependent events. Therefore, using the law of conditional
probability, we have:

$$ P(H_i/E) = \frac{P(H_i \cap E)}{P(E)} \;\Rightarrow\; P(H_i/E) = \frac{P(H_i \cap E)}{\sum_{j=1}^{n} P(H_j \cap E)} $$

Example 8.5:
In a factory, machine A produces 30% of the output, machine B 25% and
machine C the remaining 45%. 1% of the output of machine A is
defective, as is 1.2% of B's and 2% of C's. In a day's run the three machines
produce 10,000 items. An item drawn at random from a day's output is
defective. What is the probability that it was produced by
(i) Machine A?
(ii) Machine B?
(iii) Machine C?

Solution:
Let E: defective item.
H1: Event that A produces the item.
H2: Event that B produces the item.
H3: Event that C produces the item.

Then P(H1/E) is the probability that the item was produced by machine A given that it is
defective; P(E/H1) is the probability that the item is defective given that it was produced
by machine A; P(E ∩ H1) is the probability that the item was produced by A and is
defective.

P(H1) =0.3
P(H2)=0.25
P(H3)=0.45
P(E/H1) =0.01
P(E/H2)=0.012

P(E/H3)=0.02

Now by the multiplication law,
$$P(H_1 \cap E) = P(H_1) \times P(E/H_1) = 0.003$$
$$P(H_2 \cap E) = P(H_2) \times P(E/H_2) = 0.003$$
$$P(H_3 \cap E) = P(H_3) \times P(E/H_3) = 0.009$$
$$P(E) = P(H_1 \cap E) + P(H_2 \cap E) + P(H_3 \cap E) = 0.015$$
Using Bayes' Theorem:
$$P(H_1/E) = \frac{P(H_1 \cap E)}{P(E)} = 0.2; \qquad P(H_2/E) = \frac{P(H_2 \cap E)}{P(E)} = 0.2; \qquad P(H_3/E) = \frac{P(H_3 \cap E)}{P(E)} = 0.6$$
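The arithmetic in Example 8.5 can be reproduced with a short script. This is a sketch only: the helper name `bayes_posteriors` is hypothetical, and exact fractions are used so that the results are not affected by rounding.

```python
from fractions import Fraction


def bayes_posteriors(priors, likelihoods):
    """Return ([P(H_i/E)], P(E)) given the P(H_i) and the P(E/H_i).

    Hypothetical helper written for this illustration.
    """
    # Multiplication law: P(H_i ∩ E) = P(H_i) * P(E/H_i)
    joints = [p * l for p, l in zip(priors, likelihoods)]
    # P(E) is the sum of the joint probabilities.
    p_e = sum(joints)
    if p_e == 0:
        raise ValueError("P(E) must be non-zero")
    # Bayes' Theorem: P(H_i/E) = P(H_i ∩ E) / P(E)
    return [j / p_e for j in joints], p_e


# Data from Example 8.5: shares of output and defect rates of machines A, B, C.
priors = [Fraction(30, 100), Fraction(25, 100), Fraction(45, 100)]
likelihoods = [Fraction(1, 100), Fraction(12, 1000), Fraction(2, 100)]

posteriors, p_e = bayes_posteriors(priors, likelihoods)

print("P(E) =", float(p_e))
for name, post in zip("ABC", posteriors):
    print(f"P(machine {name} / defective) = {float(post)}")
```

The three posterior probabilities add up to 1, which is a quick check on any calculation of this kind.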

Exercise 8.2
4. If 10% of the rivets produced by a machine are defective, what is the probability that
out of 5 rivets chosen at random;
(i) None will be defective?
(ii) One will be defective?
(iii) At least two will be defective?
5. Kamau speaks the truth in 75% of cases and Otieno in 80% of cases. In what percentage
of cases are they likely to contradict each other in stating the same fact?
6. The contents of Urns I, II and III are as follows:

           White   Black   Red
Urn I        1       2      3
Urn II       2       1      1
Urn III      4       5      3
One urn is chosen at random and two balls are drawn. They happen to be
one white and one red. What is the probability that they come from urn I, II or III?

7. The following table gives a distribution of wages.

Weekly Wages    Number of Workers
30 - 35                  9
35 - 40                108
40 - 45                488
45 - 50                230
50 - 55                112
55 - 60                 30
60 - 65                 16
65 - 70                  7

An individual is taken at random from the above group. Find the probability that his
wages;
(i) are under 40.
(ii) are 55 and over.
(iii) are either between 45 – 50 or 35 – 40.

8.8 Revision Exercises


13. In a certain class of 30 Kisumu High School students, there are 16 girls. There are
7 girls and 6 boys with fair hair. A student is selected at random to be the class
monitor. Find the probability that the class monitor
(i) is a girl
(ii) is a boy with fair hair
(iii) has not got fair hair
(iv) is a girl and has not got fair hair.

14. A card is selected at random from a normal set of 52 playing cards. Let Q be the
event that the card is a queen and D the event that the card is a diamond. Find:
(i) $P(Q \cap D)$
(ii) $P(Q \cup D)$
(iii) $P(Q^C \cup D)$
(iv) $P(Q \cap D^C)$

15. The events A and B are such that $P(A) = P(B) = 2P(A \cap B)$. Given that
$P(A \cup B) = 0.6$, find
(i) $P(A \cap B)$
(ii) $P(A)$
(iii) $P(A^C \cap B^C)$
(iv) $P(A \cap B^C)$
16. The motherboard for a particular computer is manufactured at one of the three
factories A, B, C and then delivered to the main assembly line. Factory A supplies
45% of the total number of motherboards to the line, factory B 30% and factory
C 25%. Of the motherboards manufactured at factory A, 2% are faulty and the
corresponding percentages for factories B and C are 4% and 3% respectively.
Let A, B and C represent the events that a motherboard
chosen at random from the assembly line was
manufactured at factory A, B or C respectively and let
F denote the event that this motherboard is faulty.
(i) Calculate P(A ∩ F) , P(B ∩ F) and P(C ∩ F)
(ii) Find the probability that a motherboard selected at random from the
main assembly line is faulty.
17. A basket contains 6 white and 4 black balls. A ball is picked from the basket at
random and retained and then a second ball is picked out. Find the probability
that:
(i) Both balls are white
(ii) The balls are of different colours
(iii) The second ball is white given that the first one is black.

18. In Form III of Kisumu High School 55% of the students are boys. Of the boys
80% come from Nyakach constituency but only 75% of the girls come from this
constituency. The area Member of Parliament wishes to meet 4 student
representatives from this school. The headmaster decides to select one of the
representatives from Form III.
(i) Find the probability that a randomly chosen student comes from
Nyakach constituency.
(ii) Find the probability that a randomly chosen student does not come
from Nyakach constituency.

(iii) A randomly chosen student comes from Nyakach constituency.
Find the probability that the student is a girl.

19. In Form IV of Mtwapa High School 60 students are each studying at least one of the
three subjects Geography, French and Accounting. Of these 25 are studying
Geography, 26 are studying French, 44 are studying Accounting, 10 are studying
Geography and French, 15 are studying French and Accounting and 16 are
studying Geography and Accounting.
(i) Find the probability that a student chosen at random from those
studying Accounting is also studying French.
(ii) Are the events “studying Geography” and “studying French”
independent? Give reasons.
(iii) A student is chosen at random from all 60 students. Find the
probability that the chosen student is studying all the three subjects.

20. Two cards are drawn at random from a well-shuffled pack of 52. Show that the
chance of drawing two aces is 1/221.
21. Show that the chance of throwing a 6 at least once in two throws of a die is 11/36.
22. A and B toss a coin alternately on the understanding that the first to obtain heads
wins the toss. Show that their respective chances of winning are 2/3 and 1/3.
Now suppose they embark on throwing two dice, the first to throw 9 being
awarded the prize. Show that their chances of winning are in the ratio 9 : 8.
23. Eight coins are thrown simultaneously. Show that the chance of obtaining at least
six heads is 37/256.
24. Three men toss in succession for a prize to be given to the one who first obtains
heads. Show that their chances of winning are 4/7, 2/7 and 1/7.
