
DISTANCE LEARNING CENTRE

AHMADU BELLO UNIVERSITY


ZARIA, NIGERIA

COURSE MATERIAL

MASTER IN BUSINESS ADMINISTRATION (MBA)

BUAD 806: BUSINESS STATISTICS AND QUANTITATIVE


ANALYSIS
COPYRIGHT PAGE
© 2016 Distance Learning Centre, ABU Zaria, Nigeria

All rights reserved. No part of this publication may be reproduced in any form or by any means,
electronic, mechanical, photocopying, recording or otherwise without the prior permission of the
Director, Distance Learning Centre, Ahmadu Bello University, Zaria, Nigeria.

First published 2016 in Nigeria

ISBN:

Published and printed in Nigeria by:


Ahmadu Bello University Press Ltd.
Ahmadu Bello University,
Zaria, Nigeria.

Tel: +234

E-mail:
COURSE WRITERS/DEVELOPMENT TEAM
Mohammed Habibu Sabari (PhD) and Jamilu Abdulkadir (Subject Matter Experts)
Prof. Abiola Awosika and Halima Shuaibu (Subject Matter Reviewers)
Yusuf Musa (Language Reviewer)
Nasiru Tanko and Ibrahim Otukoya (Graphics)
Prof. Adamu Z. Hassan (Editor)
QUOTE
“It is the mark of a truly intelligent person to be moved by statistics”.
George Bernard Shaw
TABLE OF CONTENTS
Title Page ---------------------------------------------------------------------------------------------
Copyright----------------------------------------------------------------------------------------------
Quote--------------------------------------------------------------------------------------------------
Table of Contents------------------------------------------------------------------------------
1.0 Course Information---------------------------------------------------------------------------
2.0 Course Description----------------------------------------------------------------------------
3.0 Course Introduction--------------------------------------------------------------------------
4.0 Course Outcomes-----------------------------------------------------------------------------
5.0 Activities to Meet Course Objectives----------------------------------------------------
6.0 Grading Criteria and Scale------------------------------------------------------------------
7.0 Course Structure and Outline--------------------------------------------------------------
8.0 Discussion Forum------------------------------------------------------------------------------
8.1 Topical Discussions------------------------------------------------------------------------------
8.2 Discussion Questions---------------------------------------------------------------------------
9.0 Study Modules---------------------------------------------------------------------------------
9.1 Module 1: Basic Introduction to Statistics--------------------------------------
Introduction---------------------------------------------------------------------------------------
9.1.1 Objectives----------------------------------------------------------------------
9.1.2 Study Sessions--------------------------------------------------------------------
9.1.2.1 Study Session 1: Introduction--------------------------------------------------
9.1.2.2 Study Session 2: Presentations of Statistical Data---------------------------
9.1.2.3 Study Session 3: Measures of Central Tendency----------------------------
9.1.2.4 Study Session 4: Measures of Dispersion-------------------------------
9.2 Module 2: Skewness and Kurtosis, Probability Theory and Distribution---
Introduction----------------------------------------------------------------------------
9.2.1 Objectives------------------------------------------------------------------
9.2.2 Study Sessions------------------------------------------------------------------
9.2.2.1 Study Session 5: Skewness and Kurtosis---------------------------------------
9.2.2.2 Study Session 6: Probability I--------------------------------------------------
9.2.2.3 Study Session 7: Probability II----------------------------------------------------
9.2.2.4 Study Session 8: Probability III---------------------------------------------------
9.3 Module 3: Sampling and Sampling Methods------------------------------
Introduction----------------------------------------------------------------------------
9.3.1 Objectives----------------------------------------------------------------------
9.3.2 Study Sessions--------------------------------------------------------------------
9.3.2.1 Study Session 9: Sampling I-----------------------------------------------------
9.3.2.2 Study Session 10: Sampling II--------------------------------------------------
9.3.2.3 Study Session 11: Hypothesis--------------------------------------------------
9.3.2.4 Study Session 12: Correlation
9.3.2.5 Study Session 13: Regression-------------------------------------------------
10.0 Further Reading-------------------------------------------------------------------------------
11.0 Glossary-----------------------------------------------------------------------------------------
PREAMBLE
Welcome to Business Statistics and Quantitative Analysis. I am your instructor for
the semester. I assure you that if you put in your best, this will be a very interesting
course. I look forward to a fulfilling time with you.

1.0 COURSE INFORMATION


Course Code: BUAD 806
Course Title: Business Statistics and Quantitative Analysis
Credit Units: Three
Year of Study: Two
Semester: First

Lecturer’s Name: Dr. Mohammed Habibu Sabari

2.0 COURSE DESCRIPTION


This course is designed so that basic statistical concepts and methods are
presented explicitly and concisely, to enable you to comprehend the principles of
data gathering and analysis, together with the theoretical underpinnings of the
statistical techniques adopted. The outline is also devoted to applying these
statistical concepts to real-world situations.

3.0 COURSE INTRODUCTION


Statistics and quantitative techniques have become very crucial subjects in this
modern world, where decisions are made on scientific grounds. Through the science
of statistics, data can be collected and analysed, and the outcome can be interpreted
with some degree of accuracy. With advancements in computer and information
technology, the application of statistics now goes beyond the affairs of the state and
is recognised by all disciplines. These include Agriculture, Economics, Commerce,
Biology, Medicine, Industry, Planning and Education, among others.

The study of data has two aspects – descriptive and inferential. We use
descriptive statistics to study the features and characteristics of the data, while the
inferential aspect provides mathematical avenues to infer the properties of a
population from a randomly selected sample taken from it. The course is structured
to address both descriptive and inferential statistics.

The outline of the course is addressed by three modules. The first module dwells on
the introduction to statistics and covers four study sessions. Session 1 looks at
various definitions of the subject, coupled with its classifications. Session 2 addresses
presentation and tabulation of data. Session 3 explains and computes measures of
central tendency. And session 4 expounds measures of variability.

The second module deals with skewness, kurtosis and probability distribution. The
module is made up of four sessions and it starts from session 5 that addresses
skewness and kurtosis. Session 6 talks about probability and expected value.
Session 7 discusses discrete probability distribution and, session 8 explains
continuous probability distribution.

The third module is made up of a variety of topics that include sampling and its
methods, hypothesis testing, and correlation and regression analysis. The module
comprises five sessions, starting from session 9, which addresses probability and
non-probability sampling. Session 10 explains sampling distributions of means and
proportions. Hypotheses and their testing are studied in session 11. Finally, session
12 explains correlation analysis and session 13 covers regression analysis.
4.0 COURSE OUTCOMES
Upon the completion of this course, you are expected to be able to:
• Define and understand basic statistical terms.
• Compute measures of central tendency and also variability measures.
• Compute skewness and kurtosis of data and interpret their values in
describing the data distribution.
• Compute probabilities and apply the concepts of probability to confidence
intervals and hypothesis tests.
• Use hypothesis tests to make inferences concerning means and proportions.
• Use a scatter plot to visualise the relationship between variables, use the
correlation coefficient to measure the strength and direction of the
relationship, and use a linear function to describe the relationship between
variables and make estimations/predictions.

5.0 ACTIVITIES TO MEET COURSE OBJECTIVES


The key to success in statistics and quantitative methods is to study each topic
slowly, in order to ensure full comprehension before moving on to the next one. In
the event that any topic is not fully understood, you should go through it over and
over again, reworking the illustrations in the topic. This is based on the fact that
practice plays a significant role in understanding subjects that deal with
computational issues.

On this basis, there will be guiding lecture materials, written in a clear and
concise manner, that will aid you in gaining a better understanding of the
course. Video lectures that address ambiguous areas will be provided. Relevant
sites and three standard reference books will be used. There will be a series of group
and individual assignments that you are expected to do and submit within the
defined time limit. These also serve as part of your assessment.
The instructor's e-mail and telephone line(s) are provided to enable you to seek
clarification on things that are not clear to you. Also, tutorials will be arranged
during the two weeks of on-campus activities, in which questions will be clarified to
help you fully understand what you have learnt.

6.0 GRADING CRITERIA AND SCALE


6.1 Grading Criteria
Grades will be based on the following percentages:
Individual Assignments 10%
Group Assignment/Discussion Questions 10%
Discussion Topic Participation 10%
Quizzes/Other Assignments 10%
Semester Examination 60%
TOTAL 100%

6.2 Grading Scale


The following is the grading scale as recommended by the Board of Examiners of
the Centre:
A = 70 – 100
B = 60 – 69
C = 50 – 59
F = 0 – 49
As you work on your research in this course and throughout your
graduate programme, here are some examples of open education
resources that will serve you well:

Open education resources


OSS Watch provides tips for selecting open source, or for procuring free or open software.

SchoolForge and SourceForge are good places to find, create, and publish open software.
SourceForge, for one, has millions of downloads each day.

Open Source Education Foundation and Open Source Initiative, and other organisations like
these, help disseminate knowledge.

Creative Commons has a number of open projects from Khan Academy to Curriki where teachers
and parents can find educational materials for children or learn about Creative Commons licenses.
Also, they recently launched the School of Open that offers courses on the meaning, application, and
impact of "openness."

Numerous open or open educational resource databases and search engines exist. Some examples
include:

• OEDb: over 10,000 free courses from universities as well as reviews of colleges and rankings of
college degree programmes
• Open Tapestry: over 100,000 open licensed online learning resources for an academic and
general audience
• OER Commons: over 40,000 open educational resources from elementary school through to
higher education; many of the elementary, middle, and high school resources are aligned to the
Common Core State Standards
• Open Content: a blog, definition, and game of open source as well as a friendly search engine for
open educational resources from MIT, Stanford, and other universities with subject and
description listings
• Academic Earth: over 1,500 video lectures from MIT, Stanford, Berkeley, Harvard, Princeton, and
Yale
• JISC: Joint Information Systems Committee works on behalf of UK higher education and is
involved in many open resources and open projects including digitising British newspapers from
1620-1900!

Other sources for open education resources

Universities

• The University of Cambridge's guide on Open Educational Resources for Teacher Education
(ORBIT)
• OpenLearn from Open University in the UK

Global

• Unesco's searchable open database is a portal to worldwide courses and research initiatives
• African Virtual University (http://oer.avu.org/) has numerous modules on subjects in English,
French, and Portuguese
• https://code.google.com/p/course-builder/ is Google's open source software that is designed to let
anyone create online education courses
• Global Voices (http://globalvoicesonline.org/) is an international community of bloggers who
report on blogs and citizen media from around the world, including on open source and open
educational resources

Individuals (which include OERs)

• Librarian Chick: everything from books to quizzes and videos here, includes directories on open
source and open educational resources
• K-12 Tech Tools: OERs, from art to special education
• Web 2.0: Cool Tools for Schools: audio and video tools
• Web 2.0 Guru: animation and various collections of free open source software
• Livebinders: search, create, or organise digital information binders by age, grade, or subject (why
re-invent the wheel?)

Legal help

• New Media Rights is trying to help digital creators use public domain or open materials legally.
They have guides on how to use free and open software materials in various fields.
7.0 COURSE STRUCTURE AND OUTLINE
7.1 Course Structure:

The weekly schedule below sets out, for each week, the module, study session, activities, individual assignment and remarks.

Weeks 1 & 2: RESUMPTION & REVIEW OF COURSE SITE

Week 3 (Module 1) – Study Session 1
Topics: Meaning and Definitions of Statistics; Classification of Statistics; Significance of Statistics; Limitations of Statistics; Data Sources and Data Types; Key Statistical Concepts.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session. 4. Read chapters 1 to 4 of Gerald and Brian (2006), Statistics for Management and Economics, International Thomson Publishing, Southern Africa, 10th Edition.
Individual Assignment: Explain the scope and functions of Statistics.
Remarks: Discussion Topics shall be uploaded weekly, while Discussion Questions are presented in the appropriate sections.

Week 4 (Module 1) – Study Session 2
Topics: Tabulation Method; Charting Method; Graphical Techniques for Quantitative Data.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session. 4. Read chapters 1 to 4 of Lind, Marchal and Marson (2005), Statistical Techniques in Business and Economics, McGraw-Hill Companies, 12th Edition.
Individual Assignment: Briefly explain the following: 1. A bar chart. 2. A pie chart. 3. A Z chart.

Week 5 (Module 1) – Study Session 3
Topics: Meaning of Central Tendency; Measures of Central Tendency.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session. 4. Read chapters 1 to 4 of Lind, Marchal and Marson (2002), Statistical Techniques in Business and Economics, McGraw-Hill Companies, International Edition.
Individual Assignment: 1. Why is it that statistics calculated from raw data are more accurate than statistics calculated from frequency tables? 2. If the reasons advanced are the case, why then are statistics computed from frequency tables?

Week 6 (Module 1) – Study Session 4
Topics: Meaning of Variability; Significance and Properties of Measuring Variability; Measures of Variability; Interpretation of Standard Deviation.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session. 4. Review the posted response to last week's Discussion Question(s).

Week 7 (Module 2) – Study Session 5
Topics: Meaning of Skewness; Measures of Skewness; Meaning of Kurtosis; Measures of Kurtosis.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session.
Individual Assignment: 1. What do you understand by the term skewness, and what is the purpose of computing its value? 2. What do you understand by the term kurtosis, and what is the purpose of computing its value?

Week 8 (Module 2) – Study Session 6
Topics: Meaning of Probability; How to Assign Probabilities to Events; Computational Probability Rules; Bayes' Theorem.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session.
Individual Assignment: Explain the meaning of the following: 1. Independent events. 2. Mutually exclusive events. 3. Conditional probability. 4. Expected value.

Week 9 (Module 2) – Study Session 7
Topics: Meaning of Discrete Probability Distribution; Bernoulli Random Variable; The Binomial Distribution; The Poisson Distribution; The Hyper-geometric Distribution.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session.
Individual Assignment: Define discrete random variable and discrete probability distribution. What are the properties of a discrete probability distribution?

Week 10 (Module 2) – Study Session 8
Topics: Meaning of Continuous Probability Distribution; Normal Distribution; The Standard Normal Distribution; The Standard Normal Variable Transformation.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session.
Individual Assignment: 1. Define continuous random variable and continuous probability distribution. 2. What are the properties of a continuous probability distribution?

Week 11: MID SEMESTER BREAK

Week 12 (Module 3) – Study Session 9
Topics: Census versus Sampling Method; Probability versus Non-probability Samples; Probability Sampling Methods; Non-probability Sampling Methods; Determination of Sample Size.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session. 4. Study chapters 8 to 10 of Gerald and Brian (2006), Statistics for Management and Economics, International Thomson Publishing, Southern Africa, 10th Edition.

Week 13 (Module 3) – Study Session 10
Topics: Sampling Distribution of the Mean; Central Limit Theorem; Sampling Distribution of the Proportion; Sampling Distribution of the Difference of Sample Means; Sampling Distribution of the Difference of Proportions; Small Sampling Distributions.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session.
Individual Assignment: 1. Define sampling distribution of the mean and sampling distribution of the proportion. 2. Define the following: parameter; statistic; standard error.

Week 14 (Module 3) – Study Session 11
Topics: The Null and Alternative Hypotheses; Classical Approach to Testing Hypotheses; Statistical Decision Rules and their Applications; The Meaning and Interpretation of p-Value; Power of a Test and the Size Effect.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session.
Individual Assignment: 1. Define null and alternative hypothesis. 2. What is the difference between Type I and Type II errors? 3. What is the difference between one-tailed and two-tailed tests?

Week 15 (Module 3) – Study Session 12
Topics: Meaning of Correlation; Correlation Analysis; Limitations of Correlation Analysis.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session.

Week 16 (Module 3) – Study Session 13
Topics: Meaning of Regression; Simple Regression Equation; Predicting an Estimate and its Preciseness; Error of Estimate; Regression of X on Y; Properties of Regression Coefficients; Regression Lines and Coefficient of Correlation; Coefficient of Determination; Correlation Analysis versus Regression Analysis; Regression Diagnostics.
Activities: 1. Study the course material of this session. 2. Watch the video of this study session. 3. Listen to the audio of this study session.

Weeks 17 & 18: ON CAMPUS ACTIVITIES/TRAINING/TUTORIALS/PRACTICAL
Week 19: REVISION
Weeks 20 & 21: SEMESTER EXAMINATION
7.2 Course Outline
MODULE 1: Basic Introduction to Statistics
Study Session 1: Introduction
- Meaning and Definitions of Statistics
- Classification of Statistics
- Significance of Statistics
- Limitations of Statistics
- Data Sources and Data Types
- Key Statistical Concepts
Study Session 2: Presentations of Statistical Data
- Tabulation Method
- Charting Method
- Graphical Techniques for Quantitative Data
Study Session 3: Measures of Central Tendency
- Meaning of Central Tendency
- Measures of Central Tendency
Study Session 4: Measures of Dispersion
- Meaning of Variability
- Significance and Properties of Measuring Variability
- Measures of Variability
- Interpretation of Standard Deviation

MODULE 2: Skewness and Kurtosis, Probability Theory and Distribution


Study Session 5: Skewness & Kurtosis
- Meaning of Skewness
- Measures of Skewness
- Meaning of Kurtosis
- Measures of Kurtosis
Study Session 6: Probability I
- Meaning of Probability
- How to Assign Probabilities to Events
- Computational Probability Rules
- Bayes’ Theorem
Study Session 7: Probability II
- Meaning of Discrete Probability Distribution
- Bernoulli Random Variable
- The Binomial Distribution
- The Poisson Distribution
- The Hyper-geometric Distribution
Study Session 8: Probability III
- Meaning of Continuous Probability Distribution
- Normal Distribution
- The Standard Normal Distribution
- The Standard Normal Variable Transformation

MODULE 3: Sampling and Sampling Methods


Study Session 9: Sampling I
- Census versus Sampling Method
- Probability versus Non-probability Samples
- Probability Sampling Methods
- Non-probability Sampling Methods
- Determination of Sample Size
Study Session 10: Sampling II
- Sampling Distribution of the Mean
- Central Limit Theorem
- Sampling Distribution of the Proportion
- Sampling Distribution of the Difference of Proportions
- Small Sampling Distributions
Study Session 11: Hypothesis
- The Null and Alternative Hypotheses
- Classical Approach to Testing Hypotheses
- Statistical Decision Rules and their Applications
- The Meaning and Interpretation of p-Value
- Power of a Test and the Size Effect
Study Session 12: Correlation
- Meaning of Correlation
- Correlation Analysis
- Limitations of Correlation Analysis
Study Session 13: Regression
- Meaning of Regression
- Simple Regression Equation
- Predicting an Estimate and its Preciseness
- Error of Estimate
- Regression of X on Y
- Properties of Regression Coefficients
- Regression Lines and Coefficient of Correlation
- Coefficient of Determination
- Correlation Analysis versus Regression Analysis
- Regression Diagnostics

8.0 DISCUSSION FORUM


You will be required to participate in two types of discussions: topical
discussions and discussion questions. Uploads will be made weekly, in which you
are to answer the questions raised. It is compulsory for you to answer all the
questions. Some questions may take the form of a narrative in which you are
required to update a previous situation to the current one; others may take the
form of problem solving along with the interpretation of results.

8.1 Topical Discussions


These are weekly topics which I shall post on the MBA dashboard. The topics
will centre on theoretical issues, contemporary issues and/or emerging events in
the discipline at the time of posting. You are expected to study and contribute to
such discussions. Contribution at these sessions is necessary.

8.2 Discussion Questions


These are pre-determined topics assigned to groups of students, who will
develop a thesis around the topics and submit it to an assigned forum where
all participants will interact, pass comments, add to the knowledge and
move the conversation forward.
9.0 STUDY MODULES
9.1 MODULE 1: Basic Introduction to Statistics
Introduction
Business Statistics is a science that assists you in making business decisions under
uncertainty, based on numerical and measurable scales.

The main objective of Business Statistics is to make inferences (e.g., prediction,
decision making) about certain characteristics of a population based on
information contained in a random sample from the entire population. The
condition of randomness is essential to make sure that the sample is
representative of the population.
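As a quick illustration, the Python sketch below draws a simple random sample from a hypothetical population of 800 exam scores; all names and numbers are assumptions made for the example:

```python
import random

# Hypothetical population: exam scores of 800 students.
random.seed(42)                      # for a reproducible illustration
population = [random.gauss(55, 12) for _ in range(800)]

# A simple random sample gives every unit an equal chance of selection,
# which is what makes the sample representative of the population.
sample = random.sample(population, k=50)

print(len(sample))                   # 50 randomly chosen observations
```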

Business Statistics is used in many disciplines such as financial analysis,


econometrics, auditing, production and operations, and marketing research. It
provides knowledge and skills to interpret and use statistical techniques in a
variety of business applications. By means of statistical concepts and statistical
thinking, decision makers are able to solve problems in a diversity of
contexts, add value to decisions, and reduce reliance on guesswork in decision
making.

Statistics is segmented into two areas, namely descriptive statistics and


inferential statistics. The purpose of descriptive statistics techniques is to
extract useful information from unorganised data. Descriptive statistics uses just
a few numbers to capture the meaning of a much larger collection of
observations on many different cases. Descriptive statistics allows us to
describe groups of many numbers. One way to do this is by reducing them to a
few numbers that are typical of the groups, or describe their characteristics. In
descriptive statistics, numerical statistical data are presented clearly and
concisely, and in such a way that the decision maker can quickly obtain the
essential characteristics of the data in order to incorporate them into the decision
process. For example, the average is one kind of descriptive statistic; measures
of spread are another. Grouping numbers into frequency distributions and
drawing charts to illustrate those distributions are further examples of
descriptive statistics.
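As an illustration, the following Python sketch computes the descriptive statistics just mentioned – an average, a measure of spread and a frequency distribution – for a small, hypothetical data set:

```python
import statistics
from collections import Counter

# Hypothetical daily sales figures (in N'000) for two weeks.
sales = [12, 15, 11, 15, 18, 14, 15, 13, 17, 12, 15, 16, 14, 15]

print(statistics.mean(sales))    # average: a typical value for the group
print(statistics.median(sales))  # middle value when the data are ordered
print(statistics.stdev(sales))   # spread: sample standard deviation

# A frequency distribution groups the observations by value.
print(Counter(sales))            # e.g. Counter({15: 5, 12: 2, ...})
```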

Inferential statistics is concerned with making inferences from samples about
the populations from which they have been drawn. We use them to draw
inferences (informed guesses) about situations where we have only gathered part
of the information that exists. When inferential statistics is used for explaining a
phenomenon or checking the validity of a claim, it is in that context called
Exploratory Data Analysis, Confirmatory Data Analysis or Inductive Statistics.

9.1.1 Objectives
After this contact module, you should be able to:
1. comprehend the historical evolution of statistics;
2. understand the meaning of statistics and its classifications;
3. critique the significance and limitations of statistics;
4. identify data sources, data types, and key statistical concepts and symbols;
5. apply the mean, median, mode and other specialised averages to issues in your business environment;
6. understand the limitations of measures of central tendency;
7. compute measures of variability and their properties; and
8. calculate skewness and kurtosis and conduct their respective tests.
9.1.2 STUDY SESSIONS
9.1.2.1 STUDY SESSION 1: Introduction
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Meaning and Definitions of Statistics
(B) Classification of Statistics
(C) Significance of Statistics
(D) Scope/Uses of Statistics
(E) Limitations of Statistics
(F) Sources of Data
(G) Types of Data
(H) Key Statistical Concepts
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
In this study session, you will be introduced to Business Statistics &
Quantitative Analysis, which forms the basis of the discussions that follow. As
you will see, Business Statistics is used in many disciplines such as financial
analysis, econometrics, auditing, production & operations, and marketing
research. It provides knowledge and skills to interpret and use statistical
techniques in a variety of business applications. By means of statistical concepts
and statistical thinking, decision makers will be able to solve problems in a
diversity of contexts, add value to decisions and reduce reliance on guesswork in
decision making.
Learning Outcomes
At the end of this study session, you should be able to:
1. Explain the meaning and definition of Business Statistics.
2. Discuss the classifications of statistics.
3. Discuss the uses and limitations of Business Statistics.
4. Explain the significance of statistics.
5. Know the types of statistics.
6. Understand the key statistical concepts.

(A) Meaning and Definitions of Statistics


Students know statistics more intimately as a subject of study like economics,
mathematics, chemistry, physics, and others. It is a discipline, which
scientifically deals with data, and is often described as the science of data. In
dealing with statistics as data, statistics has developed appropriate methods of
collecting, presenting, summarising, and analysing data, and thus consists of a
body of these methods.

The word ‘statistics’ is used in both singular and plural forms. In the plural
form, it refers to a set of figures or data. While in the singular form, statistics
refers to the whole body of tools that are used to collect data, organise and
interpret them and, finally, to draw conclusions from them. It should be noted
that both the aspects of statistics are important if the quantitative data are to
serve their purpose.

According to Horace Secrist, Statistics may be defined as the aggregate of facts


affected to a marked extent by multiplicity of causes, numerically expressed,
enumerated or estimated according to a reasonable standard of accuracy,
collected in a systematic manner, for a predetermined purpose and placed in a
relation to each other. To Spiegel, statistics is concerned with the scientific method
for collecting, organising, summarising, presenting and analysing data as well
as drawing valid conclusions and making reasonable decisions on the basis of
such analysis.

Furthermore, W.I. King sees Statistics as the method of judging collective,


natural or social phenomena from the results obtained by the analysis or
enumeration or collection of estimates. Seligman explained that statistics is a
science that deals with the methods of collecting, classifying, presenting,
comparing and interpreting numerical data collected to throw some light on any
sphere of enquiry.

We encounter statistics in our everyday life and in a more common usage.


According to Mason, Lind and Marchal (1999), statistics can appear in
graphic form as well as in sentence form. A graph is often used to capture the
reader's attention and to portray a large amount of data over an extended period of
time.

In view of the complexity of modern society and the increasing number of
variables, statistics has now become part and parcel of every discipline. For
example, a person may decide to record the amount of water he drinks every
day, or a married man may decide to write in his diary each time he quarrels
with his wife. If such records are kept over a period of time, they become
statistics. From this information it is possible to calculate the average number of
times the couple quarrelled, either in one year or in five years. Such data can be
useful to sociologists and psychologists in analysing the relationship between
married couples. Therefore, statistics is concerned with abstracting data,
classifying them and then comparing them with data obtained from similar
sources so that plans and control mechanisms can be implemented.

ITQ: What is statistics?


ITA: Statistics is the mathematical science involved in the application of quantitative
principles to the collection, analysis and the presentation of numerical data.

(B) Classification of Statistics

There are two forms or classes of statistics: Descriptive and Inferential
Statistics.

1. Descriptive statistics deals with collecting, organising, summarising,
simplifying and presenting data, which are otherwise quite unwieldy and
voluminous in a meaningful form or usable format. It seeks to achieve this in a
manner that meaningful conclusions can be readily drawn from the data.

Descriptive statistics may thus be seen as comprising methods of bringing out
and highlighting the latent characteristics present in a set of numerical data. It
not only facilitates an understanding of the data and their systematic reporting,
but also makes them amenable to further discussion, analysis, and
interpretation. A well thought-out and sharp data classification
facilitates easy description of the hidden data characteristics by means of a
variety of summary measures. These include measures of central tendency,
dispersion, skewness, and kurtosis, which constitute the essential scope of
descriptive statistics.
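For a concrete taste of these summary measures, the minimal sketch below (hypothetical numbers, assuming SciPy is available) computes all four:

```python
import statistics
from scipy import stats

# Hypothetical monthly sales data (N'000).
data = [23, 25, 21, 30, 28, 24, 26, 55, 22, 27]

print(statistics.mean(data))     # central tendency
print(statistics.stdev(data))    # dispersion
print(stats.skew(data))          # skewness: the lone value 55 pulls it positive
print(stats.kurtosis(data))      # kurtosis (excess, relative to the normal)
```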
2. Inferential statistics consists of methods that are used for drawing
inferences, or making broad generalisations, about a totality of observations on
the basis of knowledge about a part of that totality. Inferential statistics, also
known as inductive statistics, goes beyond describing a given problem
situation by means of collecting, summarising, and meaningfully presenting the
related data. The totality of observations about which an inference may be
drawn, or a generalisation made, is called a population or a universe. The part
of totality, which is observed for data collection and analysis to gain knowledge
about the population, is called a sample.

The desired information about a given population of our interest may also be
collected by observing all the units comprising the population. This total
coverage is called census. Getting the desired value for the population through
census is not always feasible and practical for various reasons. Apart from time
and money considerations making the census operations prohibitive, observing
each individual unit of the population with reference to any data characteristic
may at times involve even destructive testing. In such cases, obviously, the only
recourse available is to employ the partial or incomplete information gathered
through a sample for the purpose. This is precisely what inferential statistics
does. Thus, obtaining a particular value from the sample information and using
it for drawing an inference about the entire population underlies the subject
matter of inferential statistics. An example of a population is all students of
Ahmadu Bello University (ABU) Zaria, Nigeria, while an example of a sample is
the MBA students of ABU Zaria.

ITQ: What are the two forms of statistics?


ITA: They are Descriptive Statistics and Inferential Statistics.
(C) Significance of Statistics
The significant roles or functions of statistics are many in the society. The
following are few important ones.

1. Compression: The word ‘compress’ means to reduce or to condense. This
method is applied to facilitate the understanding of a huge mass of data by
providing only a few summary observations. If, for a particular class of students
at Ahmadu Bello University (ABU) Zaria, Nigeria, only the individual marks in
an examination are given, little purpose is served; it would serve a better purpose
if we were given the average mark in that particular examination. Similarly, the
range of marks is another measure of the data. Thus, statistical measures help to
reduce the complexity of the data and, consequently, to understand any huge
mass of data.

2. Evaluation: The two main methods used in condensing data are
classification and tabulation. These help researchers to compare and
contrast data collected from different sources. Grand totals, measures of central
tendency, measures of dispersion, graphs and diagrams, coefficients of
correlation, etc., provide ample scope for comparison. This is another
important function statistics performs. For example, if the rice production (in
tonnes) by commercial farmers in Kebbi State of Nigeria is known, then we can
compare it with the production of the same commodity in Bida, Niger State, or
with the production of two different regions within Nigeria. Because statistics is
an aggregate of facts, comparison is always possible; and in fact, comparison
helps us to understand the data in a better way.

3. Forecasting: The word ‘forecasting’ means to predict, estimate or project
into the future. Given the rainfall data of the last ten years for a particular state
in Nigeria, it is possible to predict or forecast the rainfall for the near future. In
politics, forecasts are possible on voting patterns, election results, etc., just as in
business, where forecasting plays a dominant role in connection with
production, sales, profits, etc. The analysis of time series and regression
analysis, which are provided by statistics, play a significant role in such
exercises.

4. Estimation: One of the main objectives of statistics is drawing inferences
about a population from the analysis of a sample drawn from that population.
In estimation theory, we estimate the unknown value of a population parameter
based on the observed sample. Assuming we are given a sample of the heights
of one hundred students in the Faculty of Social Sciences of ABU, then, based
upon the heights of these 100 students, it is possible to estimate the average
height of all students in that Faculty.
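The height example can be sketched in code. The snippet below uses simulated heights (not real data from the Faculty) to estimate the population mean from a sample of 100 students and to attach a standard error to the estimate:

```python
import random
import statistics

# Simulate heights (in cm) of 100 sampled students -- hypothetical data.
random.seed(1)
heights = [random.gauss(168, 7) for _ in range(100)]

n = len(heights)
estimate = statistics.mean(heights)          # point estimate of the population mean
se = statistics.stdev(heights) / n ** 0.5    # standard error of the mean

# An approximate 95% confidence interval for the average height.
print(f"estimated mean: {estimate:.1f} cm")
print(f"95% CI: ({estimate - 1.96 * se:.1f}, {estimate + 1.96 * se:.1f}) cm")
```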

5. Tests of Hypothesis: A statistical hypothesis is a statement, postulation or
theory about the relationship between dependent and independent variables.
In the formulation and testing of hypotheses, statistical methods are extremely
useful. For instance, we may be interested in knowing whether a high rate of
unemployment affects the re-election of an incumbent President in Nigeria,
whether crop yields increase because of the application of a new fertilizer, or
whether the involvement of Emirs in the immunisation campaign is effective in
reducing or eliminating polio in the northern part of Nigeria. These are some
examples of statements of hypotheses, and they are tested with the proper
statistical tools.
ITQ: Mention the functions of statistics
ITA: Compression, Evaluation, Forecasting, Estimation and Tests of Hypothesis.
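As a purely illustrative sketch of such a test (all figures are assumptions), SciPy's one-sample t-test can check whether a new fertilizer changes the mean crop yield from a benchmark of 2.0 tonnes per hectare:

```python
from scipy import stats

# Hypothetical yields (tonnes/hectare) from 12 plots using the new fertilizer.
yields = [2.1, 2.4, 1.9, 2.6, 2.3, 2.2, 2.5, 2.0, 2.4, 2.7, 2.1, 2.3]

# H0: mean yield = 2.0 (the fertilizer has no effect)
# H1: mean yield != 2.0 (two-tailed alternative)
result = stats.ttest_1samp(yields, popmean=2.0)

print(f"t = {result.statistic:.2f}, p-value = {result.pvalue:.4f}")
# A small p-value (e.g. below 0.05) would lead us to reject H0.
```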

(D) Scope/Uses of Statistics


Statistics is not a mere device for collecting numerical data, but also a means of
developing sound techniques for their handling, analysing and drawing valid
inferences from them. It is applied in every sphere of human endeavour – social
as well as physical sciences – like Biology, Economics, Education, Planning,
Politics, Information Technology, etc. It is almost impossible to find a single
department of human activity where statistics is not applicable. We now discuss
briefly the applications of statistics in other disciplines.

1. Industry: In industries, control charts are widely used to maintain a certain
quality level. In production engineering, to find out whether the product
conforms to specifications or not, statistical tools, namely inspection plans,
control charts, etc., are of extreme importance. In inspection plans, we have to
resort to some kind of sampling – a very important aspect of Statistics.

2. Commerce: Statistical data are the lifeblood of successful trade and
commerce. No business can afford to ignore its inventory or sales records, or to
either over-stock or under-stock goods. At the outset, the businessperson has to
study and estimate the interplay of market forces (demand, supply and price)
for his/her goods and then take steps to adjust his/her output or purchases. Thus,
statistics is indispensable in business and commerce. The trend of business
adjusts to a number of economic factors. In this connection, market survey plays
an important role to exhibit the present conditions and to forecast the likely
changes in future.
3. Political Economy: Statistical methods are useful in measuring numerical
changes in complex groups and interpreting collective phenomena. Nowadays
the uses of statistics are abundantly made in addressing many economic and
political problems and it also plays important roles in economic/political theory
and practice. Alfred Marshall opines, “Statistics are the straw out of which I,
like every other economist, have to make the bricks.” Statistical tools are immensely useful
in solving many political-economic problems such as wages, prices, production,
distribution of income, wealth, population census, voting pattern, constituency
delimitations and so on.

4. Education: In the education sector, the usefulness of statistics cannot be
over-emphasised, since research has become a common feature in all branches of
educational activities. Statistics is necessary for the formulation of policies on
courses, budget estimation, consideration of facilities available and job creation
for the graduates. Many scholars are engaged in research work to test the past
knowledge and evolve new knowledge. These are possible only through
statistics.

5. Planning: Statistics is indispensable in planning in the modern world.


Almost all the ministries, departments and agencies of government seek
the help of planning for efficient operations and for the formulation and
implementation of policies. In order to achieve these goals, statistical data
relating to all sectors of the state economy and the society at large, such as
production, consumption, demand, supply, prices, investments, income and
expenditure, etc., are collected through statistical techniques for processing,
analysing and interpretation. In Nigeria, though accurate data are not always
available, the important roles played by statistics in planning at the federal,
state and local government levels cannot be over-emphasised.

ITQ: Mention the disciplines that use statistics


ITA: Industry, Commerce, Political economy, Education and Planning.

(E) Limitations of Statistics


Since there are no roses without thorns, Statistics, with all its seeming bed of
roses in every sphere of human activity, has its own limitations and drawbacks,
some of which are itemised below.

1. The Study of Qualitative Phenomena: Since statistics is a science that deals
with numerical data, it is applicable only to the study of those subjects of
enquiry that can be expressed in terms of quantitative measurements. In fact,
qualitative phenomena like honesty, poverty, beauty, intelligence, etc., cannot
be expressed in terms of numbers, and no statistical analysis can be directly
applied to these qualitative phenomena. However, statistical techniques may be
applied indirectly by first reducing the qualitative expressions to accurate
quantitative terms. For example, the intelligence of a group of students can be
studied based on their marks in a particular examination.

2. Individuality: Statistics does not attach any specific importance to
individual items; rather, it deals with aggregates of objects. Individual items,
taken individually, do not constitute statistical data and do not serve any
purpose for statistical enquiry.

3. Lack of Exactitude: It is well known that the mathematical and physical
sciences are exact, but statistical laws are only approximations. Statistical
conclusions may not have universal validity.
4. Misuse of Records: Statistics must be used only by experts; otherwise,
statistical methods are the most dangerous tools in the hands of inexperienced
people. The use of statistical tools by untrained persons might lead to wrong
conclusions. Statistics may easily be misused by quoting wrong figures to dress
lies in the gown of ‘fact’ in order to achieve a selfish interest.

ITQ: What are the limitations of statistics?


ITA: The study of qualitative phenomena, Individuality, Lack of exactitude and Misuse
of records.

(F) Sources of Data


Statistical data are the basic raw material of statistics. Data may relate to an
activity of our interest, a phenomenon, or a problem situation under study. They
derive from the process of measuring, counting and/or observing. Statistical
data, therefore, refer to those aspects of a problem situation that can be
measured, quantified, counted, or classified.

Data sources could be seen as being of two types, viz., primary and secondary.
The two can be defined as follows:

1. Primary data: Those data which do not already exist in any form, and thus
have to be collected for the first time from the primary source(s). By their very
nature, these data require fresh and first-time collection covering the whole
population or a sample drawn from it. The various methods of collecting
primary data include surveys, interview, observation, questionnaire,
experiments etc.

2. Secondary data: These already exist in some form, published or unpublished,
in an identifiable secondary source. They are generally available from
published source(s), though not necessarily in the form actually required.
Secondary data are typically collected from published sources, such as
textbooks, journals, newspapers, magazines, gazettes, etc.

ITQ: What are the sources of data?


ITA: They are: Primary sources (surveys, interview, observation, questionnaire,
experiments. etc) and secondary sources (Textbooks, Journals, Newspapers, Magazines,
Gazette.) etc

(G) Types of Data


In statistics, data are classified into two broad categories: quantitative data and
qualitative data. This classification is based on the kind of characteristics that
are measured.

1. Quantitative data: are those that can be quantified in definite units of
measurement. These refer to characteristics whose successive measurements
yield quantifiable observations. Depending on the nature of the variable
observed for measurement, quantitative data can be further categorised as
continuous and discrete data. A variable is a characteristic, number or quantity
that increases or decreases over time, or takes different values in different
situations. Obviously, a variable may be either a continuous variable or a
discrete variable.

i. Continuous data: represent the numerical values of a continuous variable. A
continuous variable is the one that can assume any value between any two
points on a line segment, thus representing an interval of values. The values are
points on a line segment, thus representing an interval of values. The values are
quite precise and close to each other, yet distinguishably different. All
characteristics such as weight, length, height, thickness, velocity, temperature,
tensile strength, etc., represent continuous variables. Thus, the data recorded on
these and similar other characteristics are called continuous data. It may be
noted that a continuous variable assumes the finest unit of measurement. Finest
in the sense that it enables measurements to the maximum degree of precision.

ii. Discrete data: are the values assumed by a discrete variable. A discrete
variable is the one whose outcomes are measured in fixed numbers. Such data
are essentially count data. These are derived from a process of counting, such as
the number of items possessing or not possessing a certain characteristic. The
number of customers visiting a departmental store every day, the incoming
flights at an airport, and the defective items in a consignment received for sale,
are all examples of discrete data.

2. Qualitative data: refer to qualitative characteristics of a subject or an object.
A characteristic is qualitative in nature when its observations are defined and
noted in terms of the presence or absence of a certain attribute in discrete
numbers. These data are further classified as nominal and rank data.

i. Nominal data: are the outcome of classification into two or more categories
of items or units comprising a sample or a population according to some quality
characteristic. Classification of students according to sex (as males and
females), of workers according to skill (as skilled, semi-skilled, and unskilled),
and of employees according to the level of education (as matriculates,
undergraduates, and post-graduates), all result in nominal data. Given any
such basis of classification, it is always possible to assign each item to a
particular class and make a summation of items belonging to each class. The
count data so obtained are called nominal data.

ii. Rank data: on the other hand, are the result of assigning ranks to specify
order in terms of the integers 1,2,3, ..., n. Ranks may be assigned according to
the level of performance in a test, a contest, a competition, an interview, or a
show. The candidates appearing in an interview, for example, may be assigned
ranks in integers ranging from 1 to n, depending on their performance in the
interview. Ranks so assigned can be viewed as the continuous values of a
variable involving performance as the quality characteristic.

ITQ: Mention the types of data and their examples


ITA: Quantitative data (continuous data, discrete data) and Qualitative data (nominal
data and rank data).
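A short pandas sketch (hypothetical values, assuming pandas is available) shows how these four kinds of data might be represented in practice:

```python
import pandas as pd

# Continuous quantitative data: measurements (weight in kg).
weight = pd.Series([61.3, 72.8, 58.05])

# Discrete quantitative data: counts (customers per day).
customers = pd.Series([34, 51, 40])

# Nominal qualitative data: unordered categories (sex).
sex = pd.Series(["male", "female", "female"], dtype="category")

# Rank (ordinal) data: ordered categories (skill level).
skill = pd.Series(
    pd.Categorical(["skilled", "unskilled", "semi-skilled"],
                   categories=["unskilled", "semi-skilled", "skilled"],
                   ordered=True)
)

print(weight.dtype, customers.dtype, sex.dtype, skill.dtype)
```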

(H) Key Statistical Concepts


Let us quickly define some key statistical concepts you will continue to come
across in this course.

i. Data: This could be defined as pieces of information that represent the
qualitative or quantitative attributes of a variable or set of variables. Data are
typically the results of measurements and can be the basis of graphs, images or
observations of a set of variables. Data are often viewed as the lowest level of
abstraction from which information and knowledge are derived for statistical
analysis.

ii. Variable: This is any quality that can have a number of values, which may
be either discrete or continuous. A variable is a property that can take on
different values. Individuals in a class may differ in sex, age, intelligence, height
etc. These properties are variables. Variables could vary in quality or in
quantity. Constants unlike variables do not assume different values.

iii. Dependent Variable: A variable whose values are influenced by the values
of another variable, so that a change in the latter will cause a change in the
former. E.g. in y = 3x, y is a dependent variable, as the value of x will cause a
change in the value of y. That is, changes in y depend on x.
iv. Independent Variable: The variable which exerts influence on another
variable in the previous section is the independent variable. The value of the
independent variable explains the value of the dependent variable. E.g. y = 3x − 2
is a functional relationship between x and y, where y is the dependent
variable and x is the independent variable.
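A one-line Python function makes this dependence concrete (purely illustrative):

```python
def y(x):
    # y depends on x through the functional relationship y = 3x - 2.
    return 3 * x - 2

print(y(1), y(2), y(3))  # -> 1 4 7: each y value is determined by x
```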

v. Quantitative Variable: This is a variable whose values are given as
numerical quantities. It is very easy to measure and to compare with others, e.g.
weight, height, age, distance, marks obtained in a test, etc.

vi. Qualitative Variable: This is a variable that is not measurable in numerical
form or that cannot be counted. Such variables can only be categorised, e.g. the
taste of some brands of biscuit, gender, nationality, socio-economic status,
academic qualifications, marital status, colours of fruits, etc.

vii. Discrete Variable: This is the variable that can only assume whole
numbers. Examples of these are the number of Local Government Council
Areas of the States in Nigeria, number of female students in the various
programmes in the Ahmadu Bello University.

A discrete variable has “interruptions” between the values it can assume. For
instance, between 1 and 2 there are an infinite number of values, such as 1.1,
1.11, 1.111, 1.1111 and so on, which a discrete variable skips over. These are
called interruptions.

viii. Continuous Variable: This is a variable that can assume both decimal and
non-decimal values. There is always a continuum of values that the continuous
variable can assume. The interruptions that characterize the discrete variable are
absent in the continuous variable. Weight, for example, can take either whole or
decimal values, such as 20 kilograms or 220.1752 kilograms.
ix. Distribution: This is the arrangement of a set of numbers classified
according to some properties or attributes such as age, height, weight, etc.

x. Population: This consists of the totality of the observations of a particular
group. For instance, if there are 800 farmers in a community that are engaged in
farming, we say the population size is 800. Also, in a study of how workers in
Nigeria spend their leisure hours, the number of workers in Nigeria constitutes
the population of the study. The population represents the aggregate of units to
be covered, which could be finite or infinite. When the population can easily be
counted, it is said to be finite, e.g. the number of contestants for a political post;
but if the population under consideration cannot be counted, e.g. the grains of
sand, we say it is infinite.

xi. Sample: This is the part of the population that is selected for a study. It is a
subset of a population. It is also a sub-group or sub-aggregate drawn from a
population; i.e. the portion appropriately selected out of the population by the
same statistical method for observation.

xii. Random Sample: This is a sample drawn from a population in such a way
that the results of its analysis may be used to generalize about the population
from which it was drawn.

xiii. Parameter: Any numerical value describing a characteristic of a
population is known as a parameter, for example the mean (or average),
standard deviation or variance of a population computed for statistical analysis.
xiv. Statistic: This refers to a descriptive measure of a sample, i.e. a numerical
value or function computed to describe a sample.

ITQ: Mention the key statistical concepts


ITA: Data, Variable, Dependent variable, Independent variable, Qualitative variable,
Quantitative variable, Discrete variable, Continuous variable, Distribution, Population,
Sample, Random sample, Parameter and Statistic.

Summary
In this study session, we have discussed the basics of Business Statistics &
Quantitative Analysis which form the basis of the subsequent discussions that
followed. As you have learnt, Business Statistics & Quantitative Analysis is
used in many disciplines such as financial analysis, econometrics, auditing,
production & operations, and marketing research. It provides knowledge and
skills to interpret and use statistical techniques in a variety of business
applications. By means of statistical concepts and statistical thinking, decision
makers will be able to solve problems in a diversity of contexts, add value to
decisions and reduce reliance on guesswork in decision making. We further
discussed the classification of statistics, its significance, scope, limitations,
sources, types and also, key statistical concept was defined.

(I) Discussion Questions


1. Compare and contrast the definitions of statistics given by Bowley,
Croxton and Cowden, and Secrist.
2. What are the rationales behind taking into consideration data source and
types before making statistical decisions?
3. Discuss the significance of statistics.
4. What are the limitations of statistics?
5. Discuss the sources and types of statistics.

References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K. and Brian, W. (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove, USA: Brooks/Cole-Thomson Learning.
3. Hooda, R. P. (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
4. Hein, L. W. (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.1.2.2 STUDY SESSION 2
Presentations of Statistical Data
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Tabulation Method
(B) Charting Method
1. Bar Chart
2. Simple Bar Chart
3. Multiple Bar Chart
4. Component Bar Chart
5. Percentage Component Bar Chart
6. Pie Chart
(C) Graphical Method
(D) Discussion Questions
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
You are welcome to study session two. In this session, you will come to understand
the role of tabulation, charting and graphical methods in Business Statistics.

Learning Outcomes
At the end of the study session, you should be able to:
1. Explain the role of tabulation in Business Statistics.
2. Discuss the concepts of charting and graphical methods in Business
Statistics.
Tabulation, Charting & Graphical Methods
Data in their numerical (quantitative), categorical (qualitative) or ranking
(ordinal) forms do not seem meaningful until they are presented in tables, charts
or graphs. For the purpose of this study, we will focus more on the use of tables,
charts as well as graphical techniques.

(A) Tabulation Method


A table is a form of presentation of data in rows and columns. When data are
collected in numerical form, you must find a way of reducing and simplifying
the details. You must try to put the data into a form that brings out the main
features and makes them easily understood. This is done by means of a
statistical table where the data are arranged in rows and columns. Tabulation
condenses data and facilitates comparison. Most published data are in tabular form.

Characteristics of a Table
1. A title which is a brief explanation of what the table is all about.
2. A column title or caption to show order of classification along the
columns.
3. A row title or sub title to show order of classification along the row.
4. A source note at the bottom which gives the source of the information
contained in the table.
5. An indication of the units in which the data in the table are given, usually
at the right-hand corner.
The table below illustrates these characteristics
Table: Values of Principal Export Crops of Nigeria (2000-2005), N000,000

Export Crops    2000    2001    2002    2003    2004    2005
Cocoa           73.6    67.4    66.6    64.8    80.2    85.4
Palm produce    55.6    66.8    55.6    60.4    63.4    80.2
Groundnut       59.6    78.4    82.2    91.8    94.0    106.2
Total           188.8   212.6   200.4   217.0   237.6   271.8

(B) Charting Method


This includes:
a. Bar Chart
A bar chart is a chart that uses rectangles. The height of each rectangle is
proportional to the magnitude of the variable. A bar chart can be drawn either
vertically or horizontally, depending on preference. Unlike in a histogram,
where the rectangles are joined together, the rectangles in a bar chart are not
joined together but are separated from each other.

Types of Bar chart


These include:
i) Simple
ii) Multiple
iii) Component
iv) Percentage component bar chart

i) Simple Bar Chart


A simple bar chart is used for a series of simple (single) variables. It is used to
represent non-numerical frequency distributions. A simple bar chart can be
drawn either vertically or horizontally.
Example:
The following data is the record of Postgraduate’s enrolment in the Department
of Business Administration in Ahmadu Bello University (ABU) Zaria-Nigeria.

Year Postgraduates
1990 300
1991 400
1992 500
1993 600
1994 750

Required: Draw a simple bar chart to represent the information.


Solution:
[Figure: vertically drawn simple bar chart; the vertical axis shows postgraduate enrolment (0-800) and the horizontal axis shows the years 1990-1994.]
[Figure: horizontally drawn simple bar chart; the horizontal axis shows postgraduate enrolment (0-800) and the vertical axis shows the years 1990-1994.]
ii) Multiple Bar Chart
In a multiple bar chart, two or more variables of interest are drawn side by
side for each category, represented here as A & B. If, for example, a
particular organization has 20 management staff (A) and 130 senior staff (B),
the information on the organization's management and senior staff can be
presented in a multiple bar chart as follows:

Solution:
[Figure: multiple bar chart; the vertical axis shows staff numbers (0-140), with one bar for management (20) and one for senior staff (130).]

Another Example:
A women council that wants to see the active involvement of women in private
sector employment conducted a research. The research revealed the data for
XYZ Company as follows:
Year Male Female
1990 150 50
1991 200 100
1992 300 150
1993 350 150
1994 500 300

Required: Present the above information in a multiple bar chart.


Solution:
[Figure: multiple bar chart; for each year 1990-1994, paired bars show male and female employment (vertical axis 0-600).]
iii) Component Bar Chart
In a component bar chart, each bar is split into its component parts, so that
the total height of the bar represents the sum of the variables of interest.

Example:
Using the above information of a women council that wants to see the active
involvement of women in private sector employment who conducted a research
that revealed the data for XYZ Company as follows:
Year Male Female Total
1990 150 50 200
1991 200 100 300
1992 300 150 450
1993 350 150 500
1994 500 300 800

Required: Present the information in a component bar chart


Solution:
[Figure: component bar chart; for each year 1990-1994, a single bar shows male and female employment stacked to the yearly total (vertical axis 0-1000).]
iv) Percentage Component Bar Chart
A percentage component bar chart expresses the component parts of each bar as
percentages, so that every bar is split into its constituent parts and totals 100%.

Example:
Refer to the above information of a women council which is reproduced below
and depict the information in a percentage component chart.
Year Male Female Total
1990 150 50 200
1991 200 100 300
1992 300 150 450
1993 350 150 500
1994 500 300 800

Solution:
Year    Male%    Female%    Total
1990    75       25         100
1991    66.67    33.33      100
1992    66.67    33.33      100
1993    70       30         100
1994    62.5     37.5       100

[Figure: percentage component bar chart; for each year 1990-1994, a bar of height 100% is split into male and female percentages.]
ITQ: What are the types of bar chart?
ITA: Simple bar chart, Multiple bar chart, Component bar chart and Percentage
component bar chart.

b. Pie Chart
A pie chart is a circular representation of data. The circle is divided into
the various variables of interest. A pie chart demonstrates the proportion of
each variable within a group of variables and relates it to the group as a
whole. It is a circular diagram divided into segments, and the area of each
segment is proportional to the magnitude of the variable it represents.

In the construction of a pie chart, you calculate the proportion of the total
that each frequency represents and multiply each proportion by 360 degrees.

Example:
The following information revealed the number of registered students of the
various departments in the Distance Learning Centre of Ahmadu Bello
University (ABU) Zaria-Nigeria.
Department Registered Students
Accounting 100
Business Administration 120
Economics 170
Geography 120
Political Science 190
Sociology 200

Required: Represent the above information on a pie chart


Solution:
Total students = 100 + 120 + 170 + 120 + 190 + 200 = 900

Accounting   = (100/900) × 360 = 40°
Business     = (120/900) × 360 = 48°
Economics    = (170/900) × 360 = 68°
Geography    = (120/900) × 360 = 48°
Pol. Science = (190/900) × 360 = 76°
Sociology    = (200/900) × 360 = 80°
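Because the angle computation is purely mechanical, it is easy to automate. The following is a minimal Python sketch (the variable names and data layout are ours, not part of the course text) that reproduces the angles above from the frequency table:

```python
# Pie-chart construction: each frequency's proportion of the total times 360.
students = {
    "Accounting": 100, "Business Admin.": 120, "Economics": 170,
    "Geography": 120, "Political Science": 190, "Sociology": 200,
}

total = sum(students.values())                       # 900
angles = {d: f / total * 360 for d, f in students.items()}

for dept, angle in angles.items():
    print(f"{dept}: {angle:.0f} degrees")            # 40, 48, 68, 48, 76, 80

# Sanity check: the segment angles of a pie chart must sum to 360 degrees.
assert round(sum(angles.values())) == 360
```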

[Figure: pie chart with segments Accounting 40°, Business 48°, Economics 68°, Geography 48°, Pol. Science 76° and Sociology 80°.]

ITQ: What is a pie chart?

ITA: A pie chart is a circular representation of data whereby a circle is
divided into the various variables of interest.

(C) Graphical Method


Histogram
A histogram is a graphical representation of a frequency distribution. It
consists of rectangular bars placed side by side. The vertical axis represents
the frequency while the horizontal axis represents the variable being
represented; the class boundaries are written on this axis.
The area of each rectangular bar of a histogram is proportional to the
corresponding frequency. The histogram differs from the bar chart in the
following ways:
i. No gaps between the bars.
ii. The histogram can be used for grouped data.
iii. The centre of the base of each rectangular bar corresponds to the
class mark of the variable being represented.

ITQ: What is a histogram?


ITA: A histogram is a graphical representation of a frequency distribution.

Exercise:
The following is the record of students offering various courses in the Business
Administration department of Ahmadu Bello University (ABU), Zaria-Nigeria.
Courses Scores
BUAD 101 90
BUAD 111 70
BUAD 113 40
BUAD 115 50
BUAD 117 30
BUAD 119 65

Required: Represent the above information in a histogram

Summary
In this study session, you have been made to understand the role of tabulation
method, charting method as well as graphical methods in Business Statistics.

(D) Discussion Questions


1. What do you understand by variability and what are the purposes of
measuring it?
2. Discuss the characteristics of good measure of dispersion.
3. Why is there the need to be cautious when interpreting graphs and
diagrams?
4. What are the advantages of presenting data in a table?
5. Why is it sometimes better to present data diagrammatically?
6. What are the essential features that are to appear on every table or
diagram?
7. With relevant examples, discuss the tabulation method
8. What is the difference between a bar chart and a pie chart?
9. Explain the types of bar chart that you know.

References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill Companies, Inc.
2. Gerald, K. and Brian, W. (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove, USA: Brooks/Cole-Thomson Learning.
3. Hooda, R. P. (2014). Statistics for Business and Economics. New Delhi: Macmillan.
4. Hein, L. W. (2015). Quantitative Approach to Managerial Decisions. New Jersey: Prentice Hall.
5. Lind, Marchal and Mason (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.1.2.3 STUDY SESSION 3
Measures of Central Tendency
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Meaning of Central Tendency
(B) Measures of Central Tendency
i) Arithmetic Mean or Mean
ii) Median
iii) Mode
(C) Empirical Relationship between Mean, Median and Mode
(D) Specialised Averages
a. Harmonic Mean
b. Geometric Mean
(E) Relationship between Arithmetic Mean, Harmonic and Geometric
Mean
(F) Discussion Questions
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
In this study session, you will understand the meaning of central tendency, the
measures of central tendency as well as the specialised averages.

Learning outcomes
At the end of the study, you should be able to:
1. Understand the meaning of central tendency.
2. Know the measures of central tendency.

3. Discuss the role of specialised averages.

(A) Meaning of Central Tendency


Measures of central tendency or location enable us to compare two or more
distributions pertaining to the same time period, or the same distribution
over time. For example, the average consumption of tea in two different
territories for the same period, or in one territory for two years, say 2014
and 2015, can be compared by means of an average. Measures of central tendency
are important in determining the average of numerical values, the value to be
expected of a typical or middle data point.
ITQ: What do measures of central tendency or location enable us to do?
ITA: They enable us to compare two or more distributions pertaining to the same
time period, or the same distribution over time.

(B) Measures of Central Tendency


The measures of central tendency or measures of location are:
i) Arithmetic mean
ii) Median
iii) Mode

i) Arithmetic Mean or Mean


One of the most important components of measures of central tendency is the
arithmetic mean. It is sometimes called mean or simple average. It is
calculated by taking the sum of the variables within a group and dividing the
sum by number of variables (observations) within that group.
Formula for Calculating the Arithmetic Mean (Ungrouped Data)
Suppose the items in a distribution are given as X1, X2, X3, X4, …, Xn. The
mean of the distribution is represented by the symbol X̄ (X-bar):

X̄ = (X1 + X2 + X3 + X4 + … + Xn)/n = (1/n) Σ Xi, summing over i = 1, …, n

Where: n = number of items in the distribution. The symbol Σ (sigma) is the
Greek summation sign. Removing the subscript from the above, we have:

X̄ = ΣX/n

Formula for Calculating the Arithmetic Mean (Grouped Data)

X̄ = Σfx / Σf

Where: x = class mark; Σf = total frequency.

Assumed Mean for Grouped Data:

X̄ = Xa + Σfd / Σf

Where: Xa = assumed mean; d = deviation of the class mark (x) about the
assumed mean, i.e. d = x − Xa.

Example:
Given the data below, compute the mean using the two methods.
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7
50 – 60 5
60 – 70 2
70 – 80 3

Solution:
Sales Class mark (x) Frequency (f) Fx
10 – 20 15 2 30
20 – 30 25 4 100
30 – 40 35 3 105
40 – 50 45 7 315
50 – 60 55 5 275
60 – 70 65 2 130
70 – 80 75 3 225
Total 26 1180
* The class mark (x) is calculated by adding the lower class limit and the
upper class limit and dividing by two, e.g. (10 + 20)/2 = 30/2 = 15.

X̄ = Σfx / Σf = 1180/26 = 45.38

Assumed mean solution:


X a = 45
Sales Class mark (x) Frequency (f) d = x - xa fd
10 – 20 15 2 -30 -60
20 – 30 25 4 -20 -80
30 – 40 35 3 -10 -30
40 – 50 45 7 0 0
50 – 60 55 5 10 50
60 – 70 65 2 20 40
70 – 80 75 3 30 90
Total 26 10
X̄ = Xa + Σfd / Σf
  = 45 + 10/26
  = 45 + 0.38 = 45.38
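As a cross-check on the two hand computations, here is a minimal Python sketch (the data layout and variable names are ours, not part of the course text) that reproduces both the direct and the assumed-mean calculations for the same grouped data:

```python
# Grouped sales data: (lower limit, upper limit, frequency).
classes = [(10, 20, 2), (20, 30, 4), (30, 40, 3), (40, 50, 7),
           (50, 60, 5), (60, 70, 2), (70, 80, 3)]

marks = [(lo + hi) / 2 for lo, hi, _ in classes]     # class marks x
freqs = [f for _, _, f in classes]
n = sum(freqs)                                       # 26

# Direct method: mean = sum(f*x) / sum(f).
mean_direct = sum(f * x for f, x in zip(freqs, marks)) / n

# Assumed-mean method: mean = Xa + sum(f*d)/sum(f), with d = x - Xa.
xa = 45
mean_assumed = xa + sum(f * (x - xa) for f, x in zip(freqs, marks)) / n

print(round(mean_direct, 2), round(mean_assumed, 2))  # 45.38 45.38
```

Both methods agree, as they must, since the assumed-mean formula is only an arithmetic rearrangement of the direct one.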
ii) Median
The median is one of the statistical measures of central tendency. There is
sometimes confusion as to what constitutes the median in a given distribution.
The median is said to be the middle item in a distribution. This is
straightforward when the number of items in the distribution is odd; when it
is even, a little calculation is needed to locate the median.

A median can simply be said, that value of observation which divides a data
into two equal parts. It may be defined as the value of the middle observation
(or the mean of the values of two middle observation) when the observations are
arranged in an ascending or descending order of magnitude.

ITQ: What is a median?


ITA: Median is one of the statistical measures of central tendency. It is the middle item
in a distribution.

Formula of Median for Ungrouped Data: Median = ((n + 1)/2)th item
Where: n = number of items in a distribution.

Example:
Calculate the median of 8, 12, 14, 19, and 11
Solution:
Median = ((n + 1)/2)th term = ((5 + 1)/2)th = 3rd item

By this formula, the 3rd item in the distribution, after arranging it in
ascending order, is the median:
8, 11, 12, 14, 19
So median = 12

Example:
Compute the median of 3, 5, 7, and 9.

Solution:
In the case of an even number of items, the same formula applies:
Median = ((n + 1)/2)th term = ((4 + 1)/2) = 2.5th position

The items in the 2nd and 3rd positions in the distribution are added and
divided by two:
(5 + 7)/2 = 12/2 = 6

So median = 6

Formula for Calculating Median for Grouped Data


Median = Lm + ((N/2 − (Σf)p) / fm) × C

Where: Lm = lower class boundary of the median class (i.e. the class in which
the median falls)
N = total frequency (i.e. number of items in the distribution)
(Σf)p = sum of all frequencies before the median class
fm = frequency of the median class
C = class size (class interval)

Example:
Given the information for the group data below:
Compute the median.
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7
50 – 60 5
60 – 70 2
70 – 80 3

Solution:
Sales Frequency Cumulative
10 – 20 2 2
20 – 30 4 6
30 – 40 3 9
40 – 50 7 16
50 – 60 5 21
60 – 70 2 23
70 – 80 3 26

To determine the median class: N/2 = 26/2 = 13.

The value 13 is then located within the cumulative frequencies: it falls in
the class whose cumulative frequency first reaches 16, i.e. the class 40 – 50,
so the lower class boundary of the median class is 40. (Since the classes here
are continuous, the lower class boundary coincides with the lower class limit.)

Lm = 40
N/2 = 26/2 = 13
(Σf)p = 9
fm = 7
C = 10

Median = Lm + ((N/2 − (Σf)p) / fm) × C
       = 40 + ((13 − 9)/7) × 10
       = 40 + (4/7) × 10 = 45.71
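The interpolation above generalises to any grouped distribution. Below is a minimal Python sketch (the helper name is ours, not part of the course text) of the grouped-data median formula:

```python
def grouped_median(classes):
    # classes: list of (lower boundary, upper boundary, frequency).
    n = sum(f for _, _, f in classes)
    half = n / 2
    cum = 0                                    # cumulative frequency so far
    for lo, hi, f in classes:
        if cum + f >= half:                    # this is the median class
            return lo + (half - cum) / f * (hi - lo)
        cum += f

sales = [(10, 20, 2), (20, 30, 4), (30, 40, 3), (40, 50, 7),
         (50, 60, 5), (60, 70, 2), (70, 80, 3)]
print(round(grouped_median(sales), 2))         # 45.71, as computed above
```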

Graphical Determination of Median


The median can also be determined from a less-than ogive. The cumulative
frequency is plotted on the vertical axis and the variable on the horizontal
axis. Half the total frequency (N/2) is located on the cumulative frequency
axis, a horizontal line is drawn from this point until it touches the
less-than ogive, and from that point a vertical line is drawn down to the
horizontal axis. The value at which it touches the horizontal axis is the median.

Exercise:
Given the data below:
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7
50 – 60 5
60 – 70 2
70 – 80 3
Determine the median using graphical method
Solution:
Sales      Frequency     Less-than ogive    Cumulative frequency
10 – 20    2             Less than 10       0
20 – 30    4             Less than 20       2
30 – 40    3             Less than 30       6
40 – 50    7             Less than 40       9
50 – 60    5             Less than 50       16
60 – 70    2             Less than 60       21
70 – 80    3             Less than 70       23
                         Less than 80       26

[Figure: graphical determination of the median from the less-than ogive.]

Advantages of Median
1. It eliminates the effect of extreme values
2. It often corresponds to a definite item in a distribution.
3. It is easy to calculate and understand
4. Only the values of the middle items need to be known, i.e. the median can
still be calculated even if the first and the last classes are open-ended and
their lower and upper limits are not known.

Disadvantages of Median
1. If the distribution is irregular, the median may not be clearly defined.
2. When the items are grouped, it may not be possible to locate the median
exactly.

iii) Mode
Mode is a French word meaning fashion. It is defined as the most frequent
(fashionable) value. In simple terms, for ungrouped data the mode is the item
with the highest frequency in a distribution; in other words, the item that
occurs most often in a distribution is the mode.

A distribution with two modes is referred to as bimodal, and one with three or
more modes as a multi-modal distribution.

Example:
Determine the mode of the following distribution:
2, 3, 4, 2, 6, 2

Solution:
Mode = 2
The mode is 2 because it occurs three times, more often than any other value.

Example:
Determine the mode for the following distribution,
3, 6, 2, 4, 3, 3, 2, 2, 5

Solution:
Mode = 2 and 3
Both 2 and 3 occur three times, more often than any other value; the
distribution is therefore bimodal.

Formula for Calculating Mode for Grouped Data

Mode = L + (d1 / (d1 + d2)) × C

Where:
L = lower class boundary of the modal class
d1 = excess of the modal frequency over the frequency of the class preceding
the modal class
d2 = excess of the modal frequency over the frequency of the class following
the modal class
C = class interval of the modal class

Example:
Given the data compute the mode
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7
50 – 60 5
60 – 70 2
70 – 80 3

Solution:
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7*
50 – 60 5
60 – 70 2
70 – 80 3

* The highest frequency = 7


L = 40
d1 = 7 − 3 = 4
d2 = 7 − 5 = 2
C = 10

Mode = L + (d1 / (d1 + d2)) × C
     = 40 + (4 / (4 + 2)) × 10
     = 40 + 40/6
     = 40 + 6.67 = 46.67
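The same mechanical steps can be expressed as a short Python sketch (the helper name is ours, not part of the course text), which locates the modal class and applies the formula:

```python
def grouped_mode(classes):
    # classes: list of (lower boundary, upper boundary, frequency).
    freqs = [f for _, _, f in classes]
    i = freqs.index(max(freqs))                            # modal class index
    lo, hi, fm = classes[i]
    d1 = fm - (freqs[i - 1] if i > 0 else 0)               # excess over preceding class
    d2 = fm - (freqs[i + 1] if i + 1 < len(freqs) else 0)  # excess over following class
    return lo + d1 / (d1 + d2) * (hi - lo)

sales = [(10, 20, 2), (20, 30, 4), (30, 40, 3), (40, 50, 7),
         (50, 60, 5), (60, 70, 2), (70, 80, 3)]
print(round(grouped_mode(sales), 2))                       # 46.67
```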

ITQ: What is a mode?


ITA: mode is the item with the highest frequency in a distribution for ungrouped data.

Graphical Determination of a Mode


A mode can be determined graphically with the help of a histogram. The bar
with the highest frequency fixes the modal class; lines are drawn from the top
corners of this bar to the tops of the adjacent bars, and the position of
their intersection on the horizontal axis gives the mode.

Exercise:
Given the data below:
Determine the mode using graphical method.
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7
50 – 60 5
60 – 70 2
70 – 80 3

Solution:
Mode, X = 46.67
[Figure: histogram of the data; lines drawn from the top corners of the tallest bar to the adjacent bars intersect, and a vertical line dropped from the point of intersection to the horizontal axis gives the mode.]
Advantages of Mode
1. Mode is very easy to calculate and understand.
2. Open-ended classes or extreme values do not affect it.
3. It is not necessary to know the values of all the items in the distribution in
order to calculate the mode.

Disadvantages of Mode
1. The mode is not a good measure of central tendency because it depends on
the arbitrary grouping of the data.
2. Because of its imprecision, its usefulness in calculations requiring a
high degree of accuracy is limited, particularly if the distribution is
bimodal or widely dispersed.

(C) Empirical Relationship between Mean, Median and Mode


For unimodal frequency curves that are moderately skewed (asymmetrical), the
empirical relationship between the mean, median and mode is given as follows:

Mean − Mode = 3(Mean − Median)

Many times, the mean, median and mode of a distribution are the same. Such a
coincidence gives what is termed a symmetrical curve: a symmetrical
distribution/curve is simply one where the mean, median and mode are the same.

[Figure: positions of the mean, median and mode for frequency curves skewed to the right and to the left.]
Example:
From the previous sales examples, the mean was calculated as 45.38 and the
median as 45.71. Use the empirical formula to calculate the mode.

Solution:
Mean = 45.38
Median = 45.71
Mean − Mode = 3(Mean − Median)
Mode = Mean − 3(Mean − Median)
     = 45.38 − 3(45.38 − 45.71)
     = 45.38 − 3(−0.33)
     = 45.38 + 0.99
     = 46.37

This empirical value of 46.37, compared with the 46.67 obtained from the
grouped-data formula, shows good agreement with the empirical formula.

(D) Specialised Averages


This includes:

a. Harmonic Mean
Harmonic mean is defined as the reciprocal of the mean of the reciprocals of
the items in a distribution. The harmonic mean is used to measure average
rates of change.

Suppose the items X1, X2, X3, …, Xn in a distribution. Then:

HM = n / (1/X1 + 1/X2 + … + 1/Xn) = n / Σ(1/Xi)

Examples:
Kunun-Zaki is one of the popular fermented beverage drinks in northern
Nigeria. The costs per kilogramme of the ingredients that make up the mixture
are as follows:

Items          N
Millet         9
Guinea Corn    6
Sugar          3
Spices         2

Calculate the cost per kilogramme of the mixture.


Solution:
HM = n / Σ(1/x)
   = 4 / (1/9 + 1/6 + 1/3 + 1/2)
   = 4 / ((2 + 3 + 6 + 9)/18)
   = 4 / (20/18)
   = (4 × 18)/20 = 72/20 = 3.60

The cost per kilogramme of the mixture is N3.60.
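For completeness, a short Python check of the harmonic mean (the variable names are ours, not part of the course text):

```python
# Harmonic mean: n divided by the sum of reciprocals.
costs = [9, 6, 3, 2]                    # Naira per kg of each ingredient
hm = len(costs) / sum(1 / x for x in costs)
print(round(hm, 2))                     # 3.6, i.e. N3.60 per kg as above
```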

Harmonic Mean for Grouped Data


Suppose X1, X2, X3, … represent the class marks in a distribution and
f1, f2, f3, … the corresponding class frequencies, with
f1 + f2 + f3 + … + fn = Σf = N. Then the harmonic mean is:

HM = N / (f1/X1 + f2/X2 + f3/X3 + … + fn/Xn) = N / Σ(f/X)

Example:
Given the data below:
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7
50 – 60 5
60 – 70 2
70 – 80 3

Calculate the Harmonic Mean (HM)


Solution:
Sales Class marks Frequency
10 – 20 15 2
20 – 30 25 4
30 – 40 35 3
40 – 50 45 7
50 – 60 55 5
60 – 70 65 2
70 – 80 75 3

1 ∑ f 
HM =  
n  X 
1 2 4 3 7 5 2 3
=  + + + + + + 
26 15 25 35 45 55 65 75 
= 0.44[0.13 + 0.16 + 0.09 + 0.16 + 0.09 + 0.03 + 0.04]
= 0.4 × 0.7
= 0.028
b. Geometric Mean
The geometric mean is used to measure the average rate of change or growth of
some quantity; it is computed by taking the nth root of the product of n
values representing the change. It is represented by the symbol G or GM.

For a set of n observations X1, X2, X3, …, Xn:

GM = ⁿ√(X1 · X2 · X3 · … · Xn) = (X1 · X2 · X3 · … · Xn)^(1/n)

Taking logarithms on both sides:

log GM = (1/n) log(X1 · X2 · … · Xn)
       = (1/n)(log X1 + log X2 + log X3 + … + log Xn)
       = (1/n) Σ log Xi

Taking the antilog of both sides, and removing the subscript, we have:

GM = antilog[(1/n) Σ log X]

Example:
Calculate the geometric mean of the following numbers 6, 4, 3, 7.

Solution:
GM = antilog[(1/n) Σ log X]
   = antilog[(1/4)(log 6 + log 4 + log 3 + log 7)]
   = antilog[(1/4)(0.7782 + 0.6021 + 0.4771 + 0.8451)]
   = antilog[(1/4)(2.7025)]
   = antilog 0.6756
GM = 4.7383
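The logarithm route above mirrors hand computation with four-figure tables; code can follow exactly the same steps. A minimal Python sketch (the variable names are ours, not part of the course text):

```python
import math

# Geometric mean via logarithms: GM = antilog((1/n) * sum(log X)).
values = [6, 4, 3, 7]
log_mean = sum(math.log10(x) for x in values) / len(values)
gm = 10 ** log_mean                     # the antilog step
print(round(gm, 4))                     # 4.7382 (4.7383 above, from 4-figure tables)
```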

ITQ: Mention examples of specialised averages


ITA: They are: Harmonic mean and Geometric mean.

(E) Relationship between Arithmetic, Harmonic and Geometric Mean

Statistically, the relationship between the arithmetic, harmonic and geometric
means of a set of positive numbers is one of "less than or equal to". It is
expressed as:

HM ≤ GM ≤ X̄

This simply means that the geometric mean is less than or equal to the
arithmetic mean but greater than or equal to the harmonic mean. The equalities
hold only when all the numbers are identical.

Example:
In a class of a nursery school, three children have the following ages: 5, 6, 7.
Calculate:
a) Arithmetic mean
b) Harmonic mean
c) Geometric mean
d) Comment on the relationship of (a), (b) and (c).

Solution:
a) X̄ = ΣX/n = (5 + 6 + 7)/3 = 18/3 = 6

b) HM = n / Σ(1/x)
      = 3 / (1/5 + 1/6 + 1/7)
      = 3 / (0.2 + 0.17 + 0.14)
      = 3 / 0.51
      = 5.88

c) GM = ⁿ√(X1 · X2 · … · Xn) = ³√(5 × 6 × 7) = ³√210 = 5.94

d) The relationship of the three is as follows:
   HM < GM < X̄
   5.88 < 5.94 < 6

Quadratic Mean
The quadratic mean is one of the measures of central tendency. It is sometimes
called the Root Mean Square (RMS). The quadratic mean is popular and useful in
physical applications.

Consider the items X1, X2, X3, …, Xn in a distribution. The quadratic mean is
the square root of the mean of the squared values:

QM = √(ΣX²/n)

Example:
Refer to the previous example and calculate the quadratic mean.
Solution:
QM = √(ΣX²/n)
   = √((5² + 6² + 7²)/3)
   = √((25 + 36 + 49)/3)
   = √(110/3)
   = √36.67
   = 6.06

The Relationship among Arithmetic, Harmonic, Geometric and Quadratic Means

The relationship is expressed as:
HM ≤ GM ≤ X̄ ≤ QM
5.88 ≤ 5.94 ≤ 6 ≤ 6.06
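The chain of inequalities is easy to verify numerically. A minimal Python sketch (the variable names are ours, not part of the course text) computes all four means for the ages 5, 6, 7 and asserts the ordering:

```python
import math

ages = [5, 6, 7]
n = len(ages)

am = sum(ages) / n                            # arithmetic mean = 6
hm = n / sum(1 / x for x in ages)             # harmonic mean = 5.89 (5.88 above, after rounding)
gm = math.prod(ages) ** (1 / n)               # geometric mean = 5.94
qm = math.sqrt(sum(x * x for x in ages) / n)  # quadratic mean = 6.06

assert hm <= gm <= am <= qm                   # HM <= GM <= AM <= QM
print(round(hm, 2), round(gm, 2), am, round(qm, 2))
```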

Summary

In this study session, we have discussed the meaning of central tendency, the
measures of central tendency as well as the specialised averages.

(F) Discussion Questions


1. Discuss the advantages and the disadvantages of standard deviation.
2. What are the limitations of the measures of variability?
3. Why is it that sometimes the value of the median is smaller than that of
the mean?
4. What are the properties of good measure of central tendency?
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill Companies, Inc.
2. Gerald, K. and Brian, W. (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove, USA: Brooks/Cole-Thomson Learning.
3. Hooda, R. P. (2014). Statistics for Business and Economics. New Delhi: Macmillan.
4. Hein, L. W. (2015). Quantitative Approach to Managerial Decisions. New Jersey: Prentice Hall.
5. Lind, Marchal and Mason (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.1.2.4 STUDY SESSION 4
Measures of Dispersion
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Meaning of Variability/Dispersion/Scatter/Spread
(B) Significance and Properties of Measuring Variability/Dispersion
(C) Measures of Variability/Dispersion
1. Range
2. Interquartile Range or Quartile Deviation
3. Mean Deviation (MD) or Mean Absolute Deviation (MAD)
4. Standard Deviation and Variance
5. Coefficient of Variation (CV or COV)
(D) Interpretation of Standard Deviation
(E) Discussion Questions
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
In this study session, you will be able to understand the meaning of variability,
significance and properties of variability as well as the measures of variability.

Learning Outcomes
At the end of the study session, you should be able to:
1. Discuss the meaning of variability.
2. Explain the significance and properties of variability.
3. Discuss the measures of variability.
(A) Meaning of Variability/Dispersion/Scatter/Spread
A measure of variation or dispersion is one that measures the extent to which
there are differences between individual observation and some central or
average value. In measuring variation we shall be interested in the amount of
the variation or its degree but not in the direction. For example, a measure of 6
inches below the mean has just as much dispersion as a measure of six inches
above the mean.

We have explained the measures of central tendency previously. It may be


noted that these measures do not indicate the extent of dispersion or variability
in a distribution. The dispersion or variability provides us one more step in
increasing our understanding of the pattern of the data. A high degree of
uniformity (i.e. a low degree of dispersion) is a desirable quality. If a
business experiences a high degree of variability in its raw material, it may
not find mass production economical.

As we have seen, the various measures of central value give us one single figure
that represents the entire data. But the average alone cannot adequately describe
a set of observations, unless all the observations are the same. It is necessary to
describe the variability or dispersion of the observations. In two or more
distributions, the central value may bathe same but still there
can be wide disparities in the formation of distribution.
Measures of dispersion help us in studying this important
characteristic of distribution. With the help of dispersion, we
have an idea about homogeneity or heterogeneity of the
distribution.

It is clear from above that dispersion (also known as variability, scatter or


spread) measures the extent to which the items vary from some central value.
Since measures of dispersion give an average of the differences of various items
from an average, they are also called Averages of the Second Order. An
average is more meaningful when it is examined in the light of dispersion. For
example, if the average wage of the workers of factory A is N3885 and that of
factory B N3900, we cannot necessarily conclude that the workers of factory B
are better off, because in factory B there may be much greater dispersion in the
distribution of wages. The study of dispersion is of great significance in
practice. Also, suppose an investor is looking for a suitable equity share for
investment. While examining the movement of share prices, he should avoid
those shares that are highly fluctuating-having sometimes very high prices and
at other times going very low. Such extreme fluctuations mean that there is a
high risk in the investment in shares. The investor should, therefore, prefer
those shares where risk is not so high.

ITQ: What does dispersion or variability provide us with?

ITA: It provides us with one more step in increasing our understanding of the
pattern of the data.

(B) Significance and Properties of Measuring Variability/Dispersion


Measures of variation/dispersion are needed for four basic purposes:

1. Measures of variation point out how far an average is representative of


the mass. When dispersion is small, the average is a typical value in the sense
that it closely represents the individual value and it is reliable in the sense that it
is a good estimate of the average in the corresponding universe. On the other
hand, when dispersion is large, the average is not so typical, and unless the
sample is very large, the average may be quite unreliable.

2. Another purpose of measuring dispersion is to determine the nature and
cause of variation in order to control the variation itself. In matters of
health, variations in body temperature, pulse beat and blood pressure are the
basic guides to diagnosis, and prescribed treatment is designed to control
their variation. In industrial production, efficient operation requires
control of quality variation, the causes of which are sought through
inspection and quality control programmes. In the social sciences, a special
problem requiring the measurement of variability is the measurement of
"inequality" in the distribution of income or wealth.

3. Measures of dispersion enable a comparison to be made of two or more series


with regard to their variability. The study of variation may also be looked upon
as a means of determining uniformity of consistency. A high degree of variation
would mean little uniformity or consistency whereas a low degree of variation
would mean great uniformity or consistency.

4. Many powerful analytical tools in statistics, such as correlation analysis,
the testing of hypotheses, analysis of variance, statistical quality control
and regression analysis, are based on measures of variation of one kind or
another.

A good measure of dispersion should possess the following properties:


1. It should be simple to understand.
2. It should be easy to compute.
3. It should be rigidly defined.
4. It should be based on each and every item of the
distribution.
5. It should be amenable to further algebraic treatment.
6. It should have sampling stability.
7. Extreme items should not unduly affect it.
(C) Measures of Variability/Dispersion
There are five measures of variability/dispersion. These are: Range,
Interquartile Range or Quartile Deviation, Mean Deviation, Standard Deviation
& Variance, and Coefficient of Variation. These are discussed in the ensuing
paragraphs with suitable examples.

1. Range
Range is the difference between the highest and the lowest observation in a
given set of data. Although the range is easy to determine and use, it is a
poor measure of variability because it makes use of only the two extreme
values in the given data.

Example:
The number of patients a medical consultant sees per day during the five
working days (Monday to Friday) in the ABU Teaching Hospital is as follows:
2, 4, 9, 5, 3
Determine the range

Solution:
Range = highest value – lowest value
9–2=7

2. Interquartile Range or Quartile Deviation


The interquartile range or the quartile deviation is a better measure of
variation in a distribution than the range. It uses the middle 50 percent of
the distribution, avoiding the 25 percent of the distribution at each end.

Quartile (Q): divides a distribution into four equal parts.


Interquartile Range
Interquartile range denotes the difference between the third quartile (i.e.
upper quartile or Q3) and the first quartile (i.e. lower quartile or Q1).
Symbolically, it is Q3 − Q1.

Quartile Deviation/Semi-Interquartile Range

It is half of the interquartile range. Symbolically, it is (Q3 − Q1)/2.

Many times the interquartile range is reduced to the form of the quartile
deviation/semi-interquartile range.

Lower Quartile (Q 1 )
Q 1 : divides a distribution into parts such that 25% of all items in the
distribution have a value less than Q 1 and 75% have a value more than Q 1 .

Upper Quartile (Q 3 )
Q 3 : divides a distribution into two parts, 75% of all items in the distribution
have a value less than Q 3 while the remaining 25% have a value more than Q 3 .

NB: When quartile deviation is small, it means that there is a small deviation in
the central 50 percent items. In contrast, if the quartile deviation is high, it
shows that the central 50 percent items have a large variation. It may be noted
that in a symmetrical distribution, the two quartiles, that is, Upper Quartile (Q 3 )
and Lower Quartile (Q 1 ) are equidistant from the Median (M).

Symbolically, M − Q1 = Q3 − M.
However, this is seldom the case as most of the business and economic data are
asymmetrical. But, one can assume that approximately 50 percent of the
observations are contained in the interquartile range. It may be noted that
interquartile range or the quartile deviation is an absolute measure of dispersion.
It can be changed into a relative measure of dispersion as follows:

Coefficient of Quartile Deviation (QD) = (Q3 − Q1) / (Q3 + Q1)

ITQ: What is a range?


ITA: Range is the difference between the highest and the lowest observation in a given set
of data.

The General Formula for Quartiles

Qj = Lj + ((jN/4 − (Σf)p) / fj) × Cj

Where:
Qj = jth quartile
Lj = lower class boundary of the jth quartile class
N = total frequency
(Σf)p = sum of the frequencies preceding the jth quartile class
fj = frequency of the jth quartile class
Cj = class size of the jth quartile class
j = 1, 2, 3 …

Example:
JAMEEL Marketing (Nig.) has recorded the following frequency distribution in
one of its filling stations.

Petrol per litre Frequency


35 – 39 2
40 – 44 3
45 – 49 2
50 – 54 8
55 – 59 7
60 – 64 6

Using the above grouped data, compute the following:

a) Lower quartile
b) Upper quartile
c) Interquartile range
d) Quartile deviation (semi-interquartile range)

Solution:
Petrol per litre Frequency Cumulative frequency
35 – 39 2 2
40 – 44 3 5
45 – 49 2 7
50 – 54 8 15
55 – 59 7 22
60 – 64 6 28

(a) Lower quartile

Q1 = L + ((N/4 − (Σf)p) / f) × C

N/4 = 28/4 = 7

To determine the lower quartile class, locate the position N/4 = 7 in the
cumulative frequencies; it falls in the class 45 – 49.

L = 44.5
(Σf)p = 5
C = 5
f = 2

Q1 = 44.5 + ((7 − 5)/2) × 5
   = 44.5 + 5
   = 49.5

(b) Upper quartile

Q3 = L + ((3N/4 − (Σf)p) / f) × C

3N/4 = 84/4 = 21; locating this position in the cumulative frequencies, Q3
falls in the class 55 – 59.

Q3 = 54.5 + ((21 − 15)/7) × 5
   = 54.5 + (6/7) × 5
   = 54.5 + 4.29
   = 58.79

(c) Interquartile range
= Q3 − Q1
= 58.79 − 49.5 = 9.29

(d) Quartile deviation (semi-interquartile range)
= (Q3 − Q1)/2
= (58.79 − 49.5)/2
= 9.29/2 = 4.65
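Since quartiles use the same interpolation as the median, one helper covers both Q1 and Q3. Below is a minimal Python sketch (the helper name is ours, not part of the course text); the discrete classes 35–39, 40–44, … are entered by their class boundaries 34.5–39.5, 39.5–44.5, …:

```python
def quantile_grouped(classes, pos):
    # Interpolated value at cumulative position `pos` (e.g. N/4 for Q1).
    # classes: list of (lower boundary, upper boundary, frequency).
    cum = 0
    for lo, hi, f in classes:
        if cum + f >= pos:
            return lo + (pos - cum) / f * (hi - lo)
        cum += f

petrol = [(34.5, 39.5, 2), (39.5, 44.5, 3), (44.5, 49.5, 2),
          (49.5, 54.5, 8), (54.5, 59.5, 7), (59.5, 64.5, 6)]
n = sum(f for _, _, f in petrol)                # 28

q1 = quantile_grouped(petrol, n / 4)            # 49.5
q3 = quantile_grouped(petrol, 3 * n / 4)        # 58.79
print(q1, round(q3, 2), round(q3 - q1, 2), round((q3 - q1) / 2, 2))
# Interquartile range = 9.29; quartile deviation = 4.64 (4.65 above, from rounded figures).
```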

3. Mean Deviation (MD) or Mean Absolute Deviation (MAD)


The mean deviation or mean of absolute deviation is also known as the average
deviation. As the name implies, it is the average of absolute amounts by which
the individual items deviate from the mean. Since the positive deviations from
the mean are equal to the negative deviations, while computing the mean
deviation, we ignore positive and negative signs.
Symbolically, MD = Σ|x| / n

Where:
MD = mean deviation
|x| = deviation of an item from the mean, ignoring positive and negative signs
n = the total number of observations

Example:
Calculate the mean deviation of the following distribution:

Size of Item    Frequency
2 – 4           20
4 – 6           40
6 – 8           30
8 – 10          10

Solution:
Size of Item   Mid-Points (m)   Frequency (f)   fm     d from X̄   f|d|
2 – 4          3                20              60     −2.6       52
4 – 6          5                40              200    −0.6       24
6 – 8          7                30              210    1.4        42
8 – 10         9                10              90     3.4        34
Total                           100             560               152

X̄ = Σfm/n = 560/100 = 5.6
MD(X̄) = Σf|d|/n = 152/100 = 1.52
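A minimal Python sketch (the variable names are ours, not part of the course text) reproducing the grouped mean deviation:

```python
# Mean (absolute) deviation for grouped data, as in the example above.
classes = [(2, 4, 20), (4, 6, 40), (6, 8, 30), (8, 10, 10)]

mids = [(lo + hi) / 2 for lo, hi, _ in classes]        # mid-points m
freqs = [f for _, _, f in classes]
n = sum(freqs)                                         # 100

mean = sum(f * m for f, m in zip(freqs, mids)) / n     # 5.6
md = sum(f * abs(m - mean) for f, m in zip(freqs, mids)) / n
print(mean, md)                                        # 5.6 1.52
```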

4. Standard Deviation and Variance


The standard deviation is similar to the mean deviation in that, here too the
deviations are measured from the mean. At the same time, the standard
deviation is preferred to the mean deviation or the quartile deviation or the
range because it has desirable mathematical properties.

Before defining the concept of the standard deviation, we introduce another


concept known as Variance.

Example:
Compute the variance of the following six observations.

x      x − µ           (x − µ)²
20     20 − 18 = 2     4
15     15 − 18 = −3    9
19     19 − 18 = 1     1
24     24 − 18 = 6     36
16     16 − 18 = −2    4
14     14 − 18 = −4    16
108    Total           70

Solution:

Mean = 108/6 = 18
The second column shows the deviations from the mean. The third (last) column
shows the squared deviations, the sum of which is 70. The arithmetic mean of
the squared deviations is:

Σ(x − µ)²/N = 70/6 = 11.67 approx. (This result is the variance.)

This mean of the squared deviations is known as the variance. It may be noted
that this variance is described by different terms that are used
interchangeably: the variance of the distribution X; the variance of X; the
variance of the distribution; and, just simply, the variance.

Symbolically, Var(X) = Σ(x − µ)²/N

It is also written as σ² = Σ(xᵢ − µ)²/N

where σ² (called sigma squared) is used to denote the variance.

Although the variance is a measure of dispersion, the unit of its measurement
is the square of the original unit. If a distribution relates to incomes of
families, then the variance is in (Naira)² and not Naira. Similarly, if
another distribution pertains to marks of students, then the unit of variance
is (marks)². To overcome this inadequacy, the square root of the variance is
taken, which yields a better measure of dispersion known as the Standard
Deviation (SD). Taking our earlier example of individual observations, we take
the square root of the variance:

SD or σ = √Variance = √11.67 = 3.42 points

Symbolically, σ = √(Σ(xᵢ − µ)²/N)

In applied statistics, the standard deviation is more frequently used than the
variance. The formula can also be written as:

σ = √((Σxᵢ² − (Σxᵢ)²/N) / N)

We use this formula to calculate the standard deviation from the individual
observations given earlier.

Example:
x x2
20 400
15 225
19 361
24 576
16 256
14 196
108 2014

Solution:
Σxᵢ² = 2014, Σxᵢ = 108, N = 6

σ = √((2014 − 108²/6) / 6)
  = √((2014 − 11664/6) / 6)
  = √((2014 − 1944) / 6)
  = √(70/6)
  = √11.67
σ = 3.42
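A minimal Python sketch (the variable names are ours, not part of the course text) confirming that the definitional and computational formulas give the same variance and standard deviation:

```python
import math

x = [20, 15, 19, 24, 16, 14]
n = len(x)
mu = sum(x) / n                                        # 18

var_def = sum((xi - mu) ** 2 for xi in x) / n          # 70/6 = 11.67
var_comp = (sum(xi * xi for xi in x) - sum(x) ** 2 / n) / n
sd = math.sqrt(var_def)

print(round(var_def, 2), round(var_comp, 2), round(sd, 2))  # 11.67 11.67 3.42
```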

Example:
The following distribution relates to marks obtained by students in an
examination. Compute the standard deviation:
Marks Number of Students
0-10 1
10-20 3
20-30 6
30-40 10
40-50 12
50-60 11
60-70 6
70-80 3
80-90 2
90-100 1

Solution:
Marks    Number of Students (f)   Mid-Points (m)   Deviations d = (m − 55)/10   fd     fd²
0-10     1                        5                −5                           −5     25
10-20    3                        15               −4                           −12    48
20-30    6                        25               −3                           −18    54
30-40    10                       35               −2                           −20    40
40-50    12                       45               −1                           −12    12
50-60    11                       55               0                            0      0
60-70    6                        65               1                            6      6
70-80    3                        75               2                            6      12
80-90    2                        85               3                            6      18
90-100   1                        95               4                            4      16
Total    55                                                                     −45    231

In the case of a frequency distribution where the individual values are not
known, we use the mid-points of the class intervals. Thus, the formula used
for calculating the standard deviation is:

σ = √( Σ fi(mi − µ)² / N ), summing over the K classes

where mi is the mid-point of the class interval, µ is the mean of the
distribution, fi is the frequency of each class, N is the total frequency and
K is the number of classes. This formula requires that the mean µ be
calculated and that deviations (mi − µ) be obtained for each class. To avoid
this inconvenience, the above formula can be modified as:

σ = C √( Σfidi²/N − (Σfidi/N)² )

where C is the class interval, fi is the frequency of the ith class, di is the
deviation of the item from an assumed origin, and N is the total number of
observations.

Applying this formula to the table given earlier:

σ = 10 √( 231/55 − (−45/55)² )
  = 10 √(4.2 − 0.669421)
  = 10 √3.5306
σ = 18.8 marks

When it becomes clear that the actual mean would turn out to be in fraction,
calculating deviations from the mean would be too cumbersome. In such cases,
an assumed mean is used and the deviations from it are calculated. While
midpoint of any class can be taken as an assumed mean, it is advisable to
choose the mid-point of that class that would make calculations least
cumbersome. Guided by this consideration, in the above example, we have
decided to choose 55 as the mid-point and, accordingly, deviations have been
taken from it. It will be seen from the calculations that they are considerably
simplified.
ITQ: Why is the standard deviation preferred to the mean deviation?
ITA: It is preferred because it has desirable mathematical properties.

5. Coefficient of Variation (CV or COV)


The standard deviation is an absolute measure of dispersion as it measures
variation in the same units as the original data. As such, it cannot be a suitable
measure while comparing two or more distributions. For this purpose, we
should use a relative measure of dispersion. One such measure of relative
dispersion is the coefficient of variation, which relates the standard deviation
and the mean such that the standard deviation is expressed as a percentage of
mean. Thus, the specific unit in which the standard deviation is measured is
done away with and the new unit becomes percent.

Symbolically, Coefficient of Variation (COV) = (σ/µ) × 100

Example:
In a small business firm, two typists are employed- typist A and typist B. Typist
A types out, on an average, 30 pages per day with a standard deviation of 6.
Typist B, on an average, types out 45 pages with a standard deviation of 10.
Which typist shows greater consistency in his output?

Solution:
Coefficient of variation for A = (σ/µ) × 100 = (6/30) × 100 = 20%

Coefficient of variation for B = (σ/µ) × 100 = (10/45) × 100 = 22.2%

These calculations clearly indicate that although typist B types out more
pages, there is a greater variation in his output as compared to that of
typist A. We can say this in a different way: though typist A's daily output
is much less, he is more consistent than typist B. The usefulness of the
coefficient of variation becomes clear in comparing two groups of data having
different means, as has been the case in the above example.
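A minimal Python sketch of the comparison (the helper name is ours, not part of the course text):

```python
# Coefficient of variation: standard deviation as a percentage of the mean.
def cov(sd, mean):
    return sd / mean * 100

print(round(cov(6, 30), 1))     # typist A: 20.0 %
print(round(cov(10, 45), 1))    # typist B: 22.2 % -> A is the more consistent
```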

(D) Interpretation of Standard Deviation


Knowing the mean and standard deviation allows the statistics practitioner to
extract useful bits of information. The information depends on the shape of
the histogram. If the histogram is bell shaped, we can use the Empirical Rule:
approximately 68% of the observations lie within one standard deviation of the
mean, about 95% within two, and about 99.7% within three.

A more general interpretation of the standard deviation is derived from
Chebysheff's theorem, which applies to histograms of all shapes: for any
k > 1, at least 1 − 1/k² of the observations lie within k standard deviations
of the mean.

Summary
In this study session, we have discussed the meaning of variability, significance
and properties of variability as well as the measures of variability.

(E) Discussion Questions


1. What do you understand by variability and what are the purposes of
measuring it?
2. Discuss the characteristics of good measure of dispersion.
3. Discuss the advantages and the disadvantages of standard deviation.
4. Explain the measures of variability/dispersion
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill Companies, Inc.
2. Hein, L. W. (2015). Quantitative Approach to Managerial Decisions. New Jersey: Prentice Hall.
3. Gerald, K. and Brian, W. (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove, USA: Brooks/Cole-Thomson Learning.
4. Hooda, R. P. (2014). Statistics for Business and Economics. New Delhi: Macmillan.
5. Lind, Marchal and Mason (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.2 MODULE 2
Skewness and Kurtosis, Probability Theory and Distribution
Introduction
The study of probability is based on the way we describe situations involving
uncertainty. Statistics is used to help make decisions in the face of
uncertainty. By means of probability theory, we can measure the uncertainty
surrounding a given situation.

There are three basic ways for the decision maker to assign a probability to
an event. These are as follows:
1. Relative Frequency Approach
2. Classical or Theoretical Interpretations
3. Subjective Approach.

According to the Relative Frequency Approach, the probability of an event is


equal to the relative frequency of that event based on an ever increasing number
of observations. In other words, if a situation is repeated a large number of
times N, and event A occurs n times, the probability of A is P (A) =n/N.

In the case of the Classical Approach, the probability is obtained using
theory rather than observation. Theoretical probability has several
characteristics, one of which is that it assumes symmetry of events. A second
characteristic is that it is based on abstract reasoning and does not depend
on experiment. It is sometimes called a priori probability.
The Subjective Approach refers to assigning probability numbers using
personal judgment. The subjective probability of an event A is a probability
that expresses the decision maker’s personal belief that A will occur. According
to the subjectivist view, a probability expresses the strength of person’s belief.

There exist various probability distributions that are very relevant in
computing probability problems, notable among which are the Uniform, Binomial,
Hyper-geometric, Poisson and Normal distributions.

9.2.1 Objectives
After studying this module, you will be able to:
1. calculate the probability of certain events occurring, based on previous
occurrences;
2. understand the link between probability and statistical frequencies;
3. combine probabilities;
4. calculate expected values for certain occurrences;
5. understand the limitations of probability as a tool for decision making;
6. compute discrete and continuous probabilities;
7. recognise the situations where the use of the
normal distribution is appropriate; and
8. solve problems using normal distribution table
arising from various situations.
9.2.2 STUDY SESSIONS
9.2.2.1 STUDY SESSION 5: Skewness and Kurtosis
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Meaning of Skewness
1. Symmetrical Distribution
2. Asymmetrical Distribution
(B) Tests of Skewness
(C) Measures of Skewness
(D) Meaning of Kurtosis
(E) Measures of Kurtosis
(F) Discussion Questions
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
You are welcome to study session five. In this study session, you will be able to
understand the meaning of skewness, kurtosis as well as their various measures.
In the preceding study sessions, we discussed frequency distributions in
detail. It may be repeated here that frequency distributions differ in three
ways: average value, variability or dispersion, and shape. Since the first
two, that is, average value and variability or dispersion, have already been
discussed, here our main spotlight will be on the shape of the frequency
distribution. Generally, there are two comparable characteristics called
skewness and kurtosis that help us to understand a distribution. Two
distributions may have the same mean and standard deviation but may differ
widely in their overall appearance, as can be seen from the diagram below:

[Figure: two frequency curves with the same mean and standard deviation (X̄ = 15, σ = 5), one symmetrical and one skewed.]

In both these distributions the values of the mean and standard deviation are
the same (X̄ = 15, σ = 5). But it does not imply that the distributions are
alike in nature.
The distribution on the left-hand side is a symmetrical one whereas the
distribution on the right-hand side is asymmetrical or skewed. Measures of
skewness help us to distinguish between different types of distribution.

Learning Outcomes
At the end of the study session, you should be able to:
1. Explain the meaning of skewness and its measures.
2. Explain the meaning of kurtosis and its measures.

(A) Meaning of Skewness


According to Croxton & Cowden “When a series is not symmetrical it is said
to be asymmetrical or skewed”. To Morris Hamburg “Skewness refers to the
asymmetry or lack of symmetry in the shape of a frequency distribution”.
But Garrett said: “A distribution is said to be ‘skewed’ when the mean and the
median fall at different points in the distribution, and the balance (or centre of
gravity) is shifted to one side or the other-to left or right”.

The above definitions show that the term 'skewness' refers to lack of
'symmetry', i.e. when a distribution is not symmetrical (or is asymmetrical)
it is called a skewed distribution.
The concept of skewness will be clear from the following three diagrams
showing a symmetrical distribution, asymmetrical distribution (i.e. a positively
skewed distribution and a negatively skewed distribution).

1. Symmetrical Distribution: It is clear from the top diagram below that in a


symmetrical distribution the values of mean, median and mode coincide. The
spread of the frequencies is the same on both sides of the centre point of the
curve.

2. Asymmetrical Distribution: A distribution, which is not symmetrical, is


called a skewed distribution. There are two types of asymmetrical (skewed)
distribution; it could either be positively skewed or negatively skewed as would
be clear from the below (middle and bottom diagrams).

i. Positively Skewed Distribution: In the positively skewed distribution, the


value of the mean is maximum and that of mode least-the median lies in
between the two as is clear from the middle diagram.

ii. Negatively Skewed Distribution. The bottom diagram below is the shape of
negatively skewed distribution. In a negatively skewed distribution the value of
mode is maximum and that of mean least-the median lies in between the two. In
the positively skewed distribution the frequencies are spread out over a greater
range of values on the high-value end of the curve (the right-hand side) than
they are on the low-value end. In the negatively skewed distribution the position
is reversed, i.e. the excess tail is on the left-hand side. It should be noted that in
moderately symmetrical distributions the interval between the mean and the
median is approximately one-third of the interval between the mean and the
mode. It is this relationship, which provides a means of measuring the degree of
skewness.
ITQ: What is skewness?
ITA: To Morris Hamburg “Skewness refers to the asymmetry or lack of symmetry in
the shape of a frequency distribution”.

(B) Tests of Skewness


In order to ascertain whether a distribution is skewed or not, the following tests
may be applied. Skewness is present if:
1. The values of mean, median and mode do not coincide.
2. When the data are plotted on a graph they do not give the normal bell-
shaped form i.e. when cut along a vertical line through the centre the two
halves are not equal.
3. The sum of the positive deviations from the median is not equal to the
sum of the negative deviations.
4. Quartiles are not equidistant from the median.
5. Frequencies are not equally distributed at points of equal deviation from
the mode.
On the contrary, when skewness is absent, i.e. in case of a symmetrical
distribution, the following conditions are satisfied:
1. The values of mean, median and mode coincide.
2. Data when plotted on a graph give the normal bell-shaped form.
3. Sum of the positive deviations from the median is equal to the sum of the
negative deviations.
4. Quartiles are equidistant from the median.
5. Frequencies are equally distributed at points of equal deviations from the
mode.

(C) Measures of Skewness


According to Simpson & Kafka “Measures of skewness tell us the direction
and the extent of skewness. In symmetrical distribution the mean, median and
mode are identical. The more the mean moves away from the mode, the larger
the asymmetry or skewness”.

There are various measures of skewness, each divided into absolute and relative
measures. The relative measure is known as the coefficient of skewness and is
more frequently used than the absolute measure of skewness. When a
comparison between two or more distributions is involved, it is the relative
measure of skewness, which is used. The measures of skewness are: Karl
Pearson’s measure, Bowley’s measure, Kelly’s measure and Moment’s
measure. Here, Karl Pearson’s measure is discussed briefly below:

Karl Pearson’s Measure


The formula for measuring skewness as given by Karl Pearson is as follows:
Skewness = Mean – Mode

Karl Pearson’s coefficient of skewness

1st coefficient of skewness = (Mean − Mode) / Standard deviation

2nd coefficient of skewness = 3(Mean − Median) / Standard deviation

The direction of skewness is determined by ascertaining whether the mean is


greater than the mode or less than the mode. If it is greater than the mode, then
skewness is positive. But when the mean is less than the mode, it is negative.
The difference between the mean and mode indicates the extent of departure
from symmetry. It is measured in standard deviation units, which provide a
measure independent of the unit of measurement. It may be recalled that this
observation was made in the preceding study session while discussing the
standard deviation. The value of the coefficient of skewness is zero when the
distribution is symmetrical. Normally, this coefficient of skewness lies
between −1 and +1. If the mean is greater than the mode, then the coefficient
of skewness will be positive, otherwise negative.

ITQ: What is the importance of measures of skewness?


ITA: Measures of skewness tell us the direction and the extent of skewness.

Example:
The information below shows the extent of patronage by female students in
buying jeans trousers in a particular supermarket in Zaria City.

Jeans No. of female students


0-5 5
5-10 2
10-15 3
15-20 8
20-25 4
Compute:
a) 1st coefficient of skewness
b) 2nd coefficient of skewness

Solution:
First, calculate the mean, mode and standard deviation.
Jeans Class mark(x) f fx
0–5 2.5 5 12.5
5 - 10 7.5 2 15
10 -15 12.5 3 37.5
15 -20 17.5 8 140
20 – 25 22.5 4 90
Total           ∑f = 22   ∑fx = 295

Mean (X̄) = Σfx/Σf = 295/22 = 13.41

Mode = L + (d1 / (d1 + d2)) × C
     = 15 + (5 / (5 + 4)) × 5
     = 15 + 25/9
     = 15 + 2.78
     = 17.78

Standard deviation (SD)

SD(σ) = √( Σf(X − X̄)² / Σf )

Mean (X̄) = 13.41
Jeans     Class mark (X)   Frequency (f)   X − X̄    (X − X̄)²   f(X − X̄)²
0 – 5     2.5              5               −10.91   119.03     595.14
5 – 10    7.5              2               −5.91    34.93      69.86
10 – 15   12.5             3               −0.91    0.83       2.48
15 – 20   17.5             8               4.09     16.73      133.83
20 – 25   22.5             4               9.09     82.63      330.51
Total                      ∑f = 22                             ∑f(X − X̄)² = 1131.82

σ = √( Σf(X − X̄)² / Σf )
  = √(1131.82/22)
  = √51.45
σ = 7.17

a) 1st coefficient of skewness = (Mean − Mode) / Standard deviation
   = (13.41 − 17.78) / 7.17
   = −0.61

b) 2nd coefficient of skewness = 3(Mean − Median) / Standard deviation

First, the median:
Jeans f cumulative frequency
0–5 5 5
5–10 2 7
10–15 3 10
15–20 8 18
20–25 4 22

Median = Lm + ((N/2 − (Σf)p) / fm) × C

N/2 = 22/2 = 11, so the median class is 15 – 20:

Median = 15 + ((11 − 10)/8) × 5
       = 15 + (1/8) × 5
       = 15 + 0.625
       = 15.63

2nd coefficient of skewness = 3(Mean − Median) / Standard deviation
   = 3(13.41 − 15.63) / 7.17
   = −6.66 / 7.17
   = −0.93
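Gathering the pieces, a minimal Python sketch (the variable names are ours, not part of the course text) reproduces both Pearson coefficients from the summary values computed above:

```python
# Karl Pearson's coefficients of skewness for the jeans data.
mean, mode, median, sd = 13.41, 17.78, 15.63, 7.17

sk1 = (mean - mode) / sd              # 1st coefficient: (mean - mode) / SD
sk2 = 3 * (mean - median) / sd        # 2nd coefficient: 3(mean - median) / SD
print(round(sk1, 2), round(sk2, 2))   # -0.61 -0.93
```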

Quartiles & Percentiles and their Application in the Measures of Skewness


Quartile coefficient of skewness (formula):

= ((Q3 − Q2) − (Q2 − Q1)) / (Q3 − Q1)
= (Q3 − 2Q2 + Q1) / (Q3 − Q1)

10–90 percentile coefficient of skewness (formula):

= ((P90 − P50) − (P50 − P10)) / (P90 − P10)
= (P90 − 2P50 + P10) / (P90 − P10)
Example:
The following information in the data below relates to the length of time spent
by car owners in filling station during a particular month.

Time (Hours) No. of Cars


Less than 2 1
2 and up to 4 4
4 and up to 6 6
6 and up to 8 8
8 and up to 10 10
10 and up to 12 12

Calculate:
a) Quartile coefficient of skewness
b) Percentile coefficient of skewness

Solution:
a) First calculate Q1, Q2 and Q3 using

Qj = Lj + {[jN/4 - (∑f)p] / fj} × Cj

Time (Hours)        f     Cumulative frequency
Less than 2         1              1
2 and up to 4       4              5
4 and up to 6       6             11
6 and up to 8       8             19
8 and up to 10     10             29
10 and up to 12    12             41

For Q1: N/4 = 41/4 = 10.25, so Q1 lies in the class 4 - 6:

Q1 = 4 + [(10.25 - 5)/6] × 2
   = 4 + (5.25/6) × 2
   = 4 + 10.5/6
   = 4 + 1.75
   = 5.75

For Q2: 2N/4 = (2 × 41)/4 = 20.5, so Q2 lies in the class 8 - 10:

Q2 = 8 + [(20.5 - 19)/10] × 2
   = 8 + (1.5/10) × 2
   = 8 + 3/10
   = 8 + 0.3
   = 8.3

(Note that Q2 is the median, and agrees with P50 computed below.)

For Q3: 3N/4 = (3 × 41)/4 = 30.75, so Q3 lies in the class 10 - 12:

Q3 = 10 + [(30.75 - 29)/12] × 2
   = 10 + (1.75/12) × 2
   = 10 + 0.29
   = 10.29

a) Quartile coefficient of skewness = (Q3 - 2Q2 + Q1) / (Q3 - Q1)
   = [10.29 - 2(8.3) + 5.75] / (10.29 - 5.75)
   = -0.56 / 4.54
   = -0.1233

b) Percentile coefficient of skewness = (P90 - 2P50 + P10) / (P90 - P10)

The percentiles are found from:

Pj = Lj + {[jN/100 - (∑f)p] / fj} × Cj

For P10: 10N/100 = (10 × 41)/100 = 4.1, so P10 lies in the class 2 - 4:

P10 = 2 + [(4.1 - 1)/4] × 2
    = 2 + (3.1/4) × 2
    = 2 + 6.2/4
    = 2 + 1.55
    = 3.55

For P50: 50N/100 = (50 × 41)/100 = 20.5, so P50 lies in the class 8 - 10:

P50 = 8 + [(20.5 - 19)/10] × 2
    = 8 + (1.5/10) × 2
    = 8 + 3/10
    = 8 + 0.3
    = 8.3

For P90: 90N/100 = (90 × 41)/100 = 36.9, so P90 lies in the class 10 - 12:

P90 = 10 + [(36.9 - 29)/12] × 2
    = 10 + (7.9/12) × 2
    = 10 + 1.316
    = 11.3

Therefore, percentile coefficient of skewness = (P90 - 2P50 + P10) / (P90 - P10)
   = [11.3 - 2(8.3) + 3.55] / (11.3 - 3.55)
   = (11.3 - 16.6 + 3.55) / 7.75
   = -1.75 / 7.75
   = -0.2258

The distribution is therefore slightly negatively skewed.
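
Since all the quantile-based quantities above come from the same interpolation
formula, L + [(position - cumulative frequency before the class)/f] × C, a
single helper function suffices. The following Python sketch (illustrative,
not part of the course text) recomputes both coefficients for the
filling-station data:

bounds = [0, 2, 4, 6, 8, 10, 12]    # class boundaries in hours
freqs  = [1, 4, 6, 8, 10, 12]       # number of cars per class
n = sum(freqs)                      # 41

def grouped_quantile(p):
    """Interpolated value below which a proportion p of the n cases lies."""
    pos, cum = p * n, 0
    for i, f in enumerate(freqs):
        if cum + f >= pos:          # the class containing the quantile
            width = bounds[i + 1] - bounds[i]
            return bounds[i] + (pos - cum) / f * width
        cum += f

q1, q2, q3 = (grouped_quantile(p) for p in (0.25, 0.50, 0.75))     # 5.75, 8.3, 10.29
p10, p50, p90 = (grouped_quantile(p) for p in (0.10, 0.50, 0.90))  # 3.55, 8.3, 11.32

quartile_sk   = (q3 - 2 * q2 + q1) / (q3 - q1)       # about -0.12
percentile_sk = (p90 - 2 * p50 + p10) / (p90 - p10)  # about -0.22
print(round(quartile_sk, 4), round(percentile_sk, 4))

(The small differences from the hand computation arise only from rounding the
quantiles to one or two decimal places.)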

(D) Meaning of Kurtosis


Kurtosis is another measure of the shape of a frequency curve. It is a Greek
word which means bulginess. While skewness signifies the extent of asymmetry,
kurtosis measures the degree of peakedness of a frequency distribution. Karl
Pearson classified curves into three types on the basis of the shape of their
peaks: Mesokurtic, Leptokurtic and Platykurtic.

The Mesokurtic curve is neither too flat nor too peaked; in fact, it is the
frequency curve of a normal distribution. A Leptokurtic curve is more peaked
than the normal curve. In contrast, a Platykurtic curve is relatively flat.
The coefficient of kurtosis as given by Karl Pearson is β₂ = μ₄/μ₂². In the
case of a normal distribution, that is, a Mesokurtic curve, β₂ = 3. If β₂
turns out to be greater than 3, the curve is called Leptokurtic and is more
peaked than the normal curve. Again, when β₂ < 3, the curve is called
Platykurtic and is less peaked than the normal curve. The measure of kurtosis
is very helpful in the selection of an appropriate average. For example, for a
normal distribution the mean is most appropriate; for a leptokurtic
distribution the median is most appropriate; and for a platykurtic
distribution the quartile range is most appropriate.

ITQ: What is the difference between skewness and kurtosis?


ITA: Skewness signifies the extent of asymmetry while Kurtosis measures the degree
of peakedness of a frequency distribution.

(E) Measures of Kurtosis


Kurtosis measures the degree of peakedness of a frequency distribution. The
measures of kurtosis are:

Moment coefficient of kurtosis: this uses the fourth moment about the mean,
expressed in dimensionless form:

β₂ = M₄/S⁴ = M₄/M₂²

Percentile coefficient of kurtosis: another measure of kurtosis, based on both
quartiles and percentiles, is expressed as follows:

K = Q / (P90 - P10)

Where: Q = ½(Q3 - Q1) is the semi-interquartile range.

Example:
The following data relate to the length of time spent by car owners at a
filling station during a particular month.

Time (Hours) No. of Cars


Less than 2 1
2 and up to 4 4
4 and up to 6 6
6 and up to 8 8
8 and up to 10 10
10 and up to 12 12
Calculate the percentile coefficient of kurtosis.
Solution:
Time (Hours) f No. of Cars
(Cumulative Frequency)
Less than 2 1 1
2 and up to 4 4 5
4 and up to 6 6 11
6 and up to 8 8 19
8 and up to 10 10 29
10 and up to 12 12 41

Using the quartiles and percentiles already computed for this data:

Q1 = 5.75
Q3 = 10.29
P10 = 3.55
P90 = 11.3

Semi-interquartile range:

Q = ½(Q3 - Q1)
  = ½(10.29 - 5.75)
  = ½(4.54)
  = 2.27

Percentile coefficient of kurtosis:

K = Q / (P90 - P10)
  = 2.27 / (11.3 - 3.55)
  = 2.27 / 7.75
  = 0.29
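
As a check, the same grouped-quantile interpolation can be reused to compute
the percentile coefficient of kurtosis. The following Python sketch
(illustrative, not from the course text) reproduces K ≈ 0.29:

bounds = [0, 2, 4, 6, 8, 10, 12]
freqs  = [1, 4, 6, 8, 10, 12]
n = sum(freqs)

def grouped_quantile(p):
    # quantile = L + (position - cumulative frequency before class)/f * width
    pos, cum = p * n, 0
    for i, f in enumerate(freqs):
        if cum + f >= pos:
            return bounds[i] + (pos - cum) / f * (bounds[i + 1] - bounds[i])
        cum += f

q1, q3 = grouped_quantile(0.25), grouped_quantile(0.75)     # 5.75, 10.29
p10, p90 = grouped_quantile(0.10), grouped_quantile(0.90)   # 3.55, 11.32

Q = (q3 - q1) / 2        # semi-interquartile range, about 2.27
K = Q / (p90 - p10)      # about 0.29 (0.263 for a normal distribution)
print(round(K, 2))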
Summary
In this study session, we have discussed the meaning of skewness and kurtosis,
as well as their various measures in Business Statistics.

(F) Discussion Questions


1. What is the difference between skewness and kurtosis, and what role do they
play in describing a frequency distribution?
2. Discuss the meaning and test of skewness
3. Explain the differences between skewness and kurtosis
4. What is the meaning of kurtosis? Explain its measures.

References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.2.2.2 STUDY SESSION 6
Probability I
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Meaning of Probability
(B) How to Assign Probabilities to Events
(C) Computational Probability Rule
(D) The Bayes’ Theorem
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
In this study session, you will be able to understand the meaning of probability,
some basic concepts, Bayes’ theorem and many more.

Learning Outcomes
At the end of the study session, you should be able to:
1. Explain the meaning of probability.
2. Discuss some basic concepts.
3. Explain Bayes' theorem and related results.

(A) Meaning of Probability


A probability is a quantitative measure of uncertainty - a number that conveys
the strength of our belief in the occurrence of an uncertain event.
The concept of probability is not a new phenomenon in human life. Life itself
is a fundamental example of probability, because human beings either survive
or die.

Every probability situation involves two complementary chances: to win or to
lose, occurrence or non-occurrence. The words most commonly used to express
probability include: chance, likely, probably and possibly.
Examples of probability statements are:
a) The patient will either survive or die
b) The business ventures will either make a profit or a loss
c) The man will either win or lose the game
d) There is a probability of success or a failure in a business

Some Basic Concepts


Event - a set consisting of a collection of sample points or experimental
outcomes.
Experiment - an action from which a result is expected.
Sample Space - the set of all possible outcomes of an experiment.

ITQ: What is a probability?


ITA: A probability is a quantitative measure of uncertainty

(B) How to Assign Probabilities to Events


Suppose an event happens in h ways and fails to happen in f ways, and that
each of the (h + f) ways is equally likely. Let p denote the probability of
success and q the probability of failure. Then the probability of success p
is:

p = h / (h + f)

while the probability of failure q is:

q = f / (h + f)

It follows that the probabilities of success and failure satisfy:

p + q = 1, so that q = 1 - p

Note: If an event is certain to happen, its probability is p = 1; if an event
cannot happen, its probability is p = 0. Thus p can only take values between 0
and 1 or, in percentage form, between 0% and 100%.

Example:
A student tosses a coin once. What is the probability that a head will appear?

Solution:
P (head) = ½

Example:
A student throws a die once. What is the probability that the face with
number 3 will appear?

Solution:
P (face with number 3) = 1/6

(C) Computational Probability Rule


1. If A and B are two independent events, then:
a) P(AB) = P(A) × P(B)                    (Multiplication or Product Rule)
b) P(A or B) = P(A) + P(B) - P(AB)        (Addition Rule)
2. If A and B are two mutually exclusive events, then:
P(A or B) = P(A) + P(B)                   (Addition Rule)

Example:
A carton of T–shirts contained 24 shirts in three different colours, 12 white, 8
yellow and 4 brown.
What is the probability of a customer willing to buy a T-shirt, selecting at
random, either a yellow or brown shirt?

Solution:
P(yellow) = Number of favourable outcomes / Total number of outcomes
          = 8/24 = 1/3
P(brown) = 4/24 = 1/6
(The probability of not selecting brown is 1 - 4/24 = 20/24 = 5/6.)
Since yellow and brown are mutually exclusive:
P(selecting either yellow or brown) = P(Y) + P(B)
= 8/24 + 4/24
= 12/24
= 1/2
Example:
Toss 2 fair coins. What is the sample space?

Solution:
S is described as:
S = {HH, HT, TH, TT}
Example:
A calculator manufacturing company manufactures two types of calculators. In
its advertisements, the company emphasises the durability and reliability of
its products. The two calculators are labelled X and Y. The probability that
calculator X will last for 30 years is 3/4, and the probability that
calculator Y will last for 30 years is 2/3.
Find the probability that:
a) Both calculators will last for 30 years;
b) Only calculator X will last for 30 years;
c) At least one calculator will last for 30 years.

Solution:
a) P(both calculators last) = P(XY) = P(X) × P(Y)
   = 3/4 × 2/3 = 6/12 = 1/2
b) P(only X lasts) = P(XY') = P(X) × P(Y')
   where P(X) = 3/4 and P(Y') = 1 - 2/3 = 1/3
   = 3/4 × 1/3 = 3/12 = 1/4
c) P(at least one calculator lasts)
   = P(XY') + P(X'Y) + P(XY)
   = P(X) × P(Y') + P(X') × P(Y) + P(X) × P(Y)
   = 3/4 × 1/3 + 1/4 × 2/3 + 3/4 × 2/3
   = 3/12 + 2/12 + 6/12
   = 11/12

(D) The Bayes’ Theorem


As we have already noted in the introduction, the basic objective behind
calculating probabilities is to help us in making decisions by quantifying the
uncertainties involved in the situations. Quite often, whether it is in our personal
life or our work life, decision-making is an ongoing process. Consider for
example, a seller of winter garments, who is interested in the demand of the
product. In deciding on the amount he should stock for this winter, he has
computed the probability of selling different quantities and has noted that the
chance of selling a large quantity is very high. Accordingly, he has taken the
decision to stock a large quantity of the product. Suppose that when the
winter finally comes and the season ends, he discovers that he is left with a
large quantity of stock. Assuming that he remains in this business, he will
feel that the earlier probability calculation should be updated, given the new
experience, to help him decide on the stock for the next winter.

Similar to the situation of the seller of winter garments, there are
situations where we are interested in an event on an ongoing basis. Every time
some new information is available, we revise our odds mentally. This revision
of probability with added information is formalised in probability theory with
the help of the famous Bayes' Theorem. The theorem, discovered in 1761 by the
English clergyman Thomas Bayes, has had a profound impact on the development
of statistics and is responsible for the emergence of a new philosophy of
science. Bayes himself is said to have been unsure of his extraordinary
result, which was presented to the Royal Society by a friend in 1763 - after
Bayes' death. We will first understand the Law of Total Probability, which is
helpful for the derivation of Bayes' Theorem.
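
Although the derivation is taken up after the Law of Total Probability, it is
worth recording the standard statement of the theorem here. For two events A
and B with P(B) > 0:

P(A|B) = [P(B|A) × P(A)] / P(B)

where, by the Law of Total Probability, P(B) = P(B|A) × P(A) + P(B|A') × P(A').
The left-hand side is the revised (posterior) probability of A once the new
information B is known.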

ITQ: What are the two complementary chances involved in every probability
situation?

ITA: Either to win or to lose; occurrence or non-occurrence.

Summary
In this study session, we have discussed the meaning of probability, some basic
concepts and Bayes’ theorem.
(E) Discussion Questions
1. Explain with examples the meaning of the following:
a. Independent events
b. Mutually exclusive events
c. Conditional probability
d. Expected value
2. What is the difference between mathematical and statistical probabilities?
3. State probability axioms and the multiplication theorem of probability.
4. Find the probability that:
• both machines will be operating in two years’ time.
• neither machine will be operating in two years’ time.
• at least one machine will be operating in two years’ time.
• Find the probability that:
• the ball is yellow, given that it is striped.
• the ball is striped, given that it is red.
• the ball is blue, given that it is solid-coloured.
5. Compute the Expected Value and Variance of the problem.
6. Explain Bayes’ Theorem.
7. Define discrete random variable and discrete probability distribution.

References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2ndEdition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.2.2.3 STUDY SESSION 7
Probability II
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Discrete Probability Distribution
(B) Bernoulli Random Variable
(C) The Binomial Distribution
(D) The Poisson Distribution
(E) Hyper-geometric Distribution
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
You are welcome to this study session. In this session, you will understand the
meaning of Discrete Probability Distribution, Bernoulli Random Variable,
Binomial Distribution, Poisson Distribution as well as Hyper-Geometric
Distribution.

Learning Outcomes
At the end of this study session, you should be able to:
1. Know the meaning of Discrete Probability Distribution.
2. Know the concepts of Bernoulli Random Variable, Binomial
Distribution, Poisson Distribution and many others.
(A) Discrete Probability Distribution
In many situations, our interest does not lie in the outcomes of an experiment
as such; we may find it more useful to describe a particular property or
attribute of the outcomes of an experiment in numerical terms. For example,
out of three births, our interest may be in the probabilities of the number of
boys. Consider the sample space of 8 equally likely sample points:

GGG GGB GBG BGG
GBB BGB BBG BBB

Now look at the variable "the number of boys out of three births". This number
varies among sample points in the sample space, can take the values 0, 1, 2
or 3, and it is random - governed by chance.

"A random variable is an uncertain quantity whose value depends on chance."
A random variable may be:

Discrete, if it takes only a countable number of values - for example, the
number of dots on two dice, the number of heads in three coin tosses, the
number of defective items, or the number of boys in three births.

Continuous, if it can take on any value in an interval of numbers (i.e. its
possible values are uncountably infinite) - for example, measured data on
heights, weights, temperature and time.

A random variable has a probability law - a rule that assigns probabilities to


different values of the random variable. This probability law - the probability
assignment is called the probability distribution of the random variable. We
usually denote the random variable by X.
The random variable X denoting “the number of boys out of three births”, we
introduced in the introduction of the lesson, is a discrete random variable; so it
will have a discrete probability distribution. It is easy to visualize that the
random variable X is a function of sample space. We can see the
correspondence of sample points with the values of the random variable as
follows:

(X = 0): GGG
(X = 1): GGB GBG BGG
(X = 2): GBB BGB BBG
(X = 3): BBB

The correspondence between sample points and the value of the random
variable allows us to determine the probability distribution of X as follows:

P(X=0) = 1/8 since one out of 8 equally likely points leads to X = 0


P(X=1) = 3/8 since three out of 8 equally likely points leads to X = 1
P(X=2) = 3/8 since three out of 8 equally likely points leads to X = 2
P(X=3) = 1/8 since one out of 8 equally likely points leads to X = 3

The above probability statements constitute the probability distribution of
the random variable X = number of boys in three births. We may appreciate how
this probability law is obtained simply by associating values of X with sets
in the sample space (for example, the set GGB, GBG, BGG leads to X = 1). We
may write down the probability distribution of X in table format, or we may
plot it graphically by means of a probability histogram or a line chart.
The probability distribution of a discrete random variable X must satisfy
the following two conditions:

1. P(X = x) ≥ 0 for all values x

2. ∑ P(X = x) = 1, where the sum runs over all possible values x

These conditions must hold because the P(X = x) values are probabilities. First
condition specifies that all probabilities must be greater than or equal to zero, as
we know from study session 6.

For the second condition, we note that for each value x, P(x) = P(X = x) is the
probability of the event that the random variable equals x. Since by definition all
x means all the values the random variable X may take, and since X may take
on only one value at a time, the occurrences of these values are mutually
exclusive events, and one of them must take place. Therefore, the sum of all the
probabilities P(X = x) must be 1.00.

In practice, however, assessing the probability of every possible value of a


random variable through actual experiment can be difficult, even impossible,
especially when the probabilities are very small. But we may be able to find out
what type of random variable the one at hand is by examining the causes that
make it random. Knowing the type, we can often approximate the random
variable to a standard one for which convenient formulae are available.

The proper identification of experiments with certain known processes in


Probability theory can help us in writing down the probability distribution
function. Two such processes are the Bernoulli Process and the Poisson
Process. The standard discrete probability distributions that are consequent to
these processes are the Binomial and the Poisson distribution. We will now
look into the conditions that characterize these processes, and examine the
standard distributions associated with the processes. This will enable us to
identify situations for which these distributions apply.

Let us first study the Bernoulli random variable, named so in honor of the
mathematician Jakob Bernoulli (1654-1705). It is the building block for other
random variables and the resulting distributions we will study here and other
study sessions.

(B) Bernoulli Random Variable


Suppose an operator uses a lathe to produce pins, and the lathe is not perfect in
the sense that it does not always produce a good pin. Rather, it has a probability
p of producing a good pin and (1 - p) of producing a defective one. Let us
denote a good pin as “success” and a defective pin as “failure”.

Just after the operator produces one pin, it is inspected; let X denote the
“number of good pins produced” i. e. “the number of successes”.

Now analysing the trial- “inspecting a pin “and our random variable X-
“number of successes”, we note two important points:

1. The trial-“inspecting a pin “has only two possible outcomes, which are
mutually exclusive. Such a trial, whose outcome can only be either a success or
a failure, is a Bernoulli trial. In other words, the sample space of a Bernoulli
trial is
S= {success, failure}
2. The random variable, X, that measures number of successes in one Bernoulli
trial, is a Bernoulli random variable. Clearly, X is 1 if the pin is good and 0 if
it is defective.

It is easy to derive the probability distribution of the Bernoulli random
variable:

X:      0        1
P(X):   1 - p    p

If X is a Bernoulli random variable, we may write

X ~ BER(p)

where ~ is read as "is distributed as" and BER stands for Bernoulli.

A Bernoulli random variable is too simple to be of immediate practical use. But


it forms the building block of the Binomial random variable, which is quite
useful in practice. The binomial random variable in turn is the basis for many
other useful cases, such as Poisson random variable.

(C) The Binomial Distribution


In the real world we often make several trials, not just one, to achieve one or
more successes. Let us consider such cases of several trials.

Consider n identically and independently distributed Bernoulli random
variables X1, X2, …, Xn. Here, identically means that they all have the same
p, and independently means that the value of one X does not in any way affect
the value of another. For example, the value of X2 does not affect the value
of X3 or X8, and so on. Such a sequence of identically and independently
distributed Bernoulli variables is called a Bernoulli Process.

Suppose an operator produces n pins, one by one, on a lathe that has
probability p of making a good pin at each trial. The sequence of numbers (1
or 0) denoting the good and defective pins produced in each of the n trials is
a Bernoulli process. For example, in the sequence of nine trials denoted by:

001011001

the third, fifth, sixth and ninth are good pins, or successes. The rest are
failures. In practice, we are usually interested in the total number of good
pins rather than the sequence of 1's and 0's. In the example above, four out
of nine are good. In the general case, let X denote the total number of good
pins produced in n trials. We then have:

X = X1 + X2 + … + Xn, where all Xi ~ BER(p) and are independent.
The random variable that counts the number of successes in many
independent, identical Bernoulli trials is called a Binomial Random
Variable.

Conditions for a Binomial Random Variable


We may appreciate that the condition to be satisfied for a binomial random
variable is that the experiment should be a Bernoulli Process.

Any uncertain situation or experiment that is marked by the following three


properties is known as a Bernoulli Process:

1. There are only two mutually exclusive and collectively exhaustive outcomes
in the experiment i.e. S= {success, failure}
2. In repeated trials of the experiment, the probabilities of occurrence of these
events remain constant
3. The outcomes of the trials are independent of one another.
The probability distribution of Binomial Random Variable is called the
“Binomial Distribution”

Binomial Probability Function


The binomial probability distribution is given as:

f(x) = nCx · p^x · (1 - p)^(n - x)

Where: n = sample size (number of trials)
       x = 0, 1, 2, …, n
       p = probability of a success
       q = (1 - p) = probability of failure

Note: x can only take the values 0, 1, …, n; published binomial tables
typically cover values of n only up to about 20.

Looking at the binomial distribution formula, it is clear that it has two
parameters, n and p. The mean of a binomial distribution is np, and its
standard deviation is given as:

√[np(1 - p)] = √(npq)

Binomial distributions are applicable in cases of repeated independent trials.

Examples of trials in binomial distribution:


1. Probability that 5 out of 15 students will pass the introductory statistics
course;
2. Probability that half the patients will die in the emergency ward;
3. Sampling from a finite population with replacement;
4. Sampling from infinite population without replacement.

Example:
A supermarket ordered a certain product from the manufacturer. For the goods
to be accepted by the supermarket, a team of senior managers has to inspect
the product, carton by carton. The managers finally arrived at a decision to
take a random sample of seven items from each carton to see whether they are
good or defective. If there is one or less defective item in the sample of
seven items from a carton, the carton is accepted; otherwise it is rejected.

Find the probability of accepting a carton that is one percent defective.

Solution:
Probability of a defective item: 1% = 0.01
Sample size, n = 7

Decision Rule:
Accept the carton when x = 0 or x = 1.
Probability of a defective item = 0.01
Probability of a good item = 0.99

P(X = x) = nCx · p^x · (1 - p)^(n - x)

P(no defective):
P(0) = 7C0 (0.01)^0 (0.99)^(7 - 0)
     = 1 × 1 × 0.9321
     = 0.9321

P(one defective):
P(1) = 7C1 (0.01)^1 (0.99)^(7 - 1)
     = 7 × 0.01 × 0.9415
     = 0.0659

P(carton is accepted) = P(0 defective) + P(1 defective)
     = 0.9321 + 0.0659
     = 0.998
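
For readers who wish to verify such calculations by machine, the following
minimal Python sketch (not part of the course text; it assumes the scipy
library is available) reproduces the acceptance probability:

from scipy.stats import binom

n, p = 7, 0.01                      # 7 items sampled, 1% defective
p_accept = binom.pmf(0, n, p) + binom.pmf(1, n, p)
print(round(p_accept, 4))           # 0.998

# Equivalently, using the cumulative form P(X <= 1):
print(round(binom.cdf(1, n, p), 4))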

(D) The Poisson Distribution


The Poisson distribution was developed by the French mathematician Simeon D.
Poisson (1781-1840). It is a discrete probability distribution, which is often
useful when dealing with the number of occurrences of an event over a
specified period of time or after a certain interval of time.

Characteristics of the Poisson Distribution

1. The probability of an occurrence of the event is the same for any interval
   of equal length.
2. The occurrence or non-occurrence of the event in any interval is
   independent of its occurrence or non-occurrence in any other interval.

Example of Poisson distribution:


a) Demand for a product or service
b) Arrivals, such as:
i) Cars arriving at a banquet hall
ii) Planes arriving at the airport
iii) Students arriving at a lecture hall, etc.

If a random variable X follows a Poisson distribution, then its probability
distribution is given by:

P(X = x) = (μ^x · e^(-μ)) / x!,   x = 0, 1, 2, 3, …

Where: x = the number of occurrences (successes)
       μ (mu) = the mean of the Poisson distribution
       e = 2.71828 (the base of natural logarithms)

The random variable X counts the number of successes in a Poisson Process.

A Poisson process corresponds to a Bernoulli process under the following
conditions:
1. The number of trials n is infinitely large, i.e. n → ∞
2. The constant probability of success p for each trial is infinitely small,
   i.e. p → 0 (obviously q → 1)
3. np = μ is finite.

We can derive the Poisson probability rule from the Binomial probability rule
under the above conditions.

ITQ: What is a Poisson distribution?


ITA: The Poisson distribution is a discrete probability distribution, often
useful when dealing with the number of occurrences of an event over a
specified period of time or after a certain interval of time.
Poisson distribution may be expected in situations where the chance of
occurrence of any event is small, and we are interested in the occurrence of the
event and not in its non-occurrence. For example, number of road accidents,
number of defective items, number of deaths in flood or because of snakebite or
because of a rare disease etc. In these situations, we know about the occurrence
of an event although its probability is very small, but we do not know how
many times it does not occur. For instance, we can say that two road accidents
took place today, but it is almost impossible to say as to how many times,
accident fails to take place. The reason is that the number of trials is very large
here and the nature of event is of rare type. The Poisson random variable X,
counts the number of times a rare event occurs during a fixed interval of time or
space.

Example:
A labour expert conducted research on the phenomenon of strikes. The
researcher found that the mean number of strikes in a manufacturing industry
was 3.4 per month.

What is the probability that during a given month there will be:
a. No strike at all
b. More than 2 strikes
c. Exactly 4 strikes

Solution:
f(x) = (μ^x · e^(-μ)) / x!,   with μ = 3.4

a. P(no strikes) = P(x = 0) = (3.4^0 · e^(-3.4)) / 0! = e^(-3.4) = 0.0334

b. P(more than 2 strikes) = P(x > 2) = 1 - P(x ≤ 2)
   Since the probabilities sum to one:
   P(x ≤ 2) = P(x = 0) + P(x = 1) + P(x = 2)
   P(x = 0) = (3.4^0 · e^(-3.4)) / 0! = e^(-3.4) = 0.0334
   P(x = 1) = (3.4^1 · e^(-3.4)) / 1! = 3.4 × 0.0334 = 0.1136
   P(x = 2) = (3.4^2 · e^(-3.4)) / 2! = 11.56 × 0.0334 / 2 = 0.1931

P (more than 2 strike) = 1 – [P (0) +P (1) +P (2)]


= 1 – [0.0334+0.1136+0.1931]
= 1 – 0.3401
= 0.6599

c. P(exactly 4 strikes) = f(x) = (μ^x · e^(-μ)) / x!
   P(x = 4) = (3.4^4 · e^(-3.4)) / 4!
            = 133.63 × e^(-3.4) / (4 × 3 × 2)
            = 133.63 × 0.0334 / 24
            = 4.4633 / 24
            = 0.1859
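
A quick machine check of the strikes example (a minimal Python sketch, again
assuming scipy is available; not part of the course text):

from scipy.stats import poisson

mu = 3.4
print(round(poisson.pmf(0, mu), 4))        # (a) 0.0334
print(round(1 - poisson.cdf(2, mu), 4))    # (b) 0.6603; the text's 0.6599
                                           #     comes from rounding each term
print(round(poisson.pmf(4, mu), 4))        # (c) 0.1858, i.e. about 0.1859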

(E) Hyper-geometric Distribution


Whereas the binomial distribution applies to sampling with replacement, the
hyper-geometric distribution is used to calculate probabilities when sampling
without replacement. For example, suppose you first randomly sample one card
from a deck of 52. Then, without putting the card back in the deck, you sample
a second and then (again without replacing cards) a third. Given this sampling
procedure, what is the probability that exactly two of the sampled cards will
be aces (4 of the 52 cards in the deck are aces)? You can calculate this
probability using the following formula based on the hyper-geometric
distribution:

p = P(X = x) = [kCx × (N - k)C(n - x)] / NCn

Where:
k is the number of "successes" in the population
x is the number of "successes" in the sample
N is the size of the population
n is the number sampled
p is the probability of obtaining exactly x successes
kCx is the number of combinations of k things taken x at a time

In this example, k = 4 because there are four aces in the deck, x = 2 because
the problem asks about the probability of getting two aces, N = 52 because
there are 52 cards in a deck, and n = 3 because 3 cards were sampled.
Therefore:

p = [4C2 × 48C1] / 52C3 = (6 × 48) / 22100 = 0.013

The mean and standard deviation of the hyper-geometric distribution, in the
notation above, are:

Mean = nk/N
Standard deviation = √[ nk(N - k)(N - n) / (N²(N - 1)) ]
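
The card example can be checked with scipy (a minimal sketch, not from the
text; note scipy's argument order is M = population size, n = successes in the
population, N = sample size):

from scipy.stats import hypergeom

M, k, n = 52, 4, 3                             # deck size, aces, cards drawn
print(round(hypergeom.pmf(2, M, k, n), 5))     # about 0.01303
print(hypergeom.mean(M, k, n))                 # 3*4/52, about 0.2308
print(round(hypergeom.std(M, k, n), 4))        # about 0.4524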

ITQ: When is the hyper-geometric distribution used?

ITA: When sampling without replacement.
Summary

In this study session, we have discussed the meaning of Discrete Probability


Distribution, Bernoulli Random Variable, Binomial Distribution, Poisson
Distribution as well as Hyper-Geometric Distribution.

(F) Discussion Questions


1. What are the properties of discrete probability distribution?
2. Define continuous random variable and continuous probability
distribution.
3. What are the properties of continuous probability distribution?
4. Construct probability distribution of the problem. What is the probability
that at least one car will be sold in a given day? More than four? Fewer
than five? Compute and interpret the mean of the distribution. Compute
the standard deviation

References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006).Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.2.2.4 STUDY SESSION 8
Probability III
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Continuous Probability Distribution
(B) The Normal Distribution
(C) The Standard Normal Distribution
(D) The Transformation of Normal Random Variables
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
You are welcome to this study session. In this session you will get to understand
the meaning of Continuous Probability Distribution, Normal Distribution,
Standard Normal Distribution as well as the Standard Normal Variable
Transformation.

Learning Outcomes
At the end of this study session, you should be able to:
1. Understand the meaning of Continuous Probability Distribution.
2. Explain the Normal Distribution, Standard Normal Distribution as
well as the Standard Normal Variable Transformation.

(A) Continuous Probability Distribution


We have learnt that a probability distribution is basically a convenient
representation of the different values a random variable may take, together with
their respective probabilities of occurrence. In the last session, we have
examined situations involving discrete random variables and the resulting
discrete probability distributions. Consider the following random variables that
we have taken up in the last lesson:

1. Number of successes (X1) in a Bernoulli process
2. Number of successes (X2) in a Poisson process

In the first case, the binomial random variable X1 could take only a finite
number of integer values: 0, 1, 2, …, n; whereas in the second case, the
Poisson random variable X2 could take an infinite number of integer values:
0, 1, 2, 3, … The random variables X1 and X2 are discrete, in the sense that
their values can be listed in a sequence, finite or infinite. In contrast to
these, let us consider a situation where
the variable of interest may take any value within a given range. Suppose we are
planning for measuring the variability of an automatic bottling process that fills
½ liter (500 cm3) bottles with cola. The variable, say X, indicating the deviation
of the actual volume from the normal (average) volume can take any real value -
positive or negative; integer or decimal. This type of random variable which can
take an infinite number of values in a given range, is called a continuous
random variable, and the probability distribution of such a variable is called a
continuous probability distribution. The concepts and assumption inherent in
the treatment of such distributions are quite different from those used in the
context of a discrete distribution. In this study session, after understanding the
basic concepts of continuous distributions, we will discuss Normal distribution -
an important continuous distribution that is applicable to many real-life
processes.

A continuous random variable is a random variable that can take on any


value in an interval of numbers.
The probabilities associated with a continuous random variable X are
determined by the probability density function of the random variable. The
function, denoted by f(x), has the following properties:
1. f(x) ≥ 0 for all x
2. The probability that X will be between two numbers a and b is equal to the
   area under f(x) between a and b:

   P(a < X < b) = ∫[a to b] f(x) dx

3. The total area under the entire curve of f(x) is equal to 1.00:

   P(-∞ < X < ∞) = ∫[-∞ to ∞] f(x) dx = 1.00

When the sample space is continuous, the probability of any single given value
is zero. For a continuous random variable, therefore, the probability of
occurrence of any given value is zero. We see this from property 2, noting that
the area under a curve between a point and itself is the area of a line, which is
zero. For a continuous random variable, non-zero probabilities are
associated only with intervals of numbers.

We define the cumulative distribution function F(x) for a continuous random


variable similarly to the way we defined it for a discrete random variable:
F(x) is the probability that X is less than or equal to x.

Thus, the cumulative distribution function of a continuous random variable is:

F(x) = P(X ≤ x) = area under f(x) between the smallest possible value of X
(often -∞) and the point x

     = ∫[-∞ to x] f(t) dt

The cumulative distribution function F(x) is a smooth, non-decreasing function
that increases from 0 to 1.00.

The expected value of a continuous random variable X, denoted by E(X), and its
variance, denoted by V(X), require the use of calculus for their computation.
Thus:

E(X) = ∫ x · f(x) dx

V(X) = ∫ [x - E(X)]² · f(x) dx

ITQ: The type of random variable which can take an infinite number of values in a
given range, is called ………
ITA: a continuous random variable

(B) The Normal Distribution


Many mathematicians have worked on the mathematics behind the normal
distribution and have made many independent discoveries. In the initial
stages, the normal distribution was developed by Abraham De Moivre
(1667-1754). His work was later taken up by Pierre S. Laplace (1749-1827). But
the discovery of the equation for the normal density function is attributed to
Carl Friedrich Gauss (1777-1855), who did much work with the formula. In
science books, this distribution is often called the Gaussian distribution.

The Normal Distribution is the most versatile of all the continuous probability
distributions. It is being widely used in all data-based research in the field of
agriculture, trade, business and industry. It is found to be useful in
characterizing uncertainties in many real-life processes, in statistical inferences,
and in approximating other probability distributions.

A large number of random variables occurring in practice can be approximated


to the normal distribution.
A random variable that is affected by many independent causes, where the
effect of each cause is not overwhelmingly large compared to the other
effects, closely follows a normal distribution.

The lengths of pins made by an automatic machine; the times taken by an


assembly worker to complete the assigned task repeatedly; the weights of
baseballs; the tensile strengths of a batch of bolts; and the volumes of cola in a
particular brand of canned cola - are good examples of normally distributed
random variables. All of these are affected by several independent causes where
the effect of each cause is small. This knowledge helps us in calculating the
probabilities of different events in varied situations, which in turn is useful for
decision-making.

In many real life situations, we face the problem of making statistical inferences
about processes based on limited data. Limited data is basically a sample from
the full body of data on the process. Irrespective of how the full body of data is
distributed, it has been found that the Normal Distribution can be used to
characterize the sampling distribution of many of the sample statistics. (We will
see it in next few study sessions). This helps considerably in Statistical
Inferences.

Finally, the Normal Distribution can be used to approximate certain probability


distributions. This helps considerably in simplifying the probability
calculations.

The normal distribution arises in several practical situations. As the
distribution of a continuous random variable, it has a "bell-shaped" curve,
with the mean, median and mode all coinciding at its center.

A normal distribution has the probability density function given as:

f(x) = [1 / (σ√(2π))] · e^(-½((x - μ)/σ)²)

Where: μ = mean of the distribution
       σ = standard deviation
       e = 2.71828 (the base of natural logarithms)

A normal distribution has two parameters, the mean and the standard deviation,
denoted by μ and σ respectively. The distribution is symmetrical and the total
area under the curve is one square unit; because of the symmetry, the area on
each side of the mean is 0.5.

To be able to use normal distribution tables, we convert the distribution to
the standard normal distribution, whose variable has no unit, whose mean is
zero (0) and whose standard deviation is one (1).

The transformation equation, which converts a value x to its standard score Z,
is:

Z = (x - μ) / σ

ITQ: Which distribution is the most versatile of all the continuous probability
distributions?
ITA: The Normal Distribution, which was initially developed by Abraham De
Moivre (1667-1754).

Example:
The attendance at Ababa's wedding is normally distributed with a mean of four
hundred and a standard deviation of one hundred participants. What is the
probability that the number of participants is:
a. Between 250 and 500
b. Less than 250
c. Between 500 and 600
d. More than 600

Solution:
μ = 400
σ = 100
a. Between 250 and 500
Where x = 250:
Z = (x - μ)/σ = (250 - 400)/100 = -1.5
From the table, the area between Z = -1.5 and 0 is 0.4332.

Where x = 500:
Z = (500 - 400)/100 = 1
The area between Z = 0 and Z = 1 is 0.3413.

P(-1.5 ≤ Z ≤ 1) = 0.4332 + 0.3413
                = 0.7745
b. Less than 250
Where x = 250:
Z = (250 - 400)/100 = -1.5
The area between Z = -1.5 and 0 is 0.4332, so

P(Z < -1.5) = 0.5 - 0.4332
            = 0.0668

The probability that the participants are less than 250 is 0.0668.

c. Between 500 and 600
Where x = 500:
Z = (500 - 400)/100 = 1, with table area 0.3413

Where x = 600:
Z = (600 - 400)/100 = 2, with table area 0.4772

P(1 ≤ Z ≤ 2) = 0.4772 - 0.3413
             = 0.1359

The probability that the participants are between 500 and 600 is 0.1359.

d. More than 600
Where x = 600:
Z = (600 - 400)/100 = 2, with table area 0.4772

P(Z > 2) = 0.5 - 0.4772
         = 0.0228

The probability that the participants are more than 600 is 0.0228.
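
All four answers can also be verified directly from the normal cumulative
distribution function rather than a printed Z-table. A minimal Python sketch
(scipy assumed; not part of the course text):

from scipy.stats import norm

mu, sigma = 400, 100
print(round(norm.cdf(500, mu, sigma) - norm.cdf(250, mu, sigma), 4))  # (a) 0.7745
print(round(norm.cdf(250, mu, sigma), 4))                             # (b) 0.0668
print(round(norm.cdf(600, mu, sigma) - norm.cdf(500, mu, sigma), 4))  # (c) 0.1359
print(round(1 - norm.cdf(600, mu, sigma), 4))                         # (d) 0.0228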

(C) The Standard Normal Distribution


There are infinitely many possible normal random variables and resulting
normal curves, one for each pair of values of μ and σ². So the range
probability P(a < X < b) will be different for different normal curves. We can
make use of integral calculus to compute the required range probability:

P(a < X < b) = ∫[a to b] f(x) dx

It may be appreciated that we can simplify this process of computing range
probabilities to a great extent by tabulating the range probabilities. Since
it is not practicable, and indeed impossible, to have separate probability
tables for each of the infinitely many possible normal curves, we select one
normal curve to serve as a standard. Probabilities associated with the range
of values of this standard normal random variable are tabulated. A special
transformation then allows us to apply the tabulated probabilities to any
normal random variable. The standard normal random variable is denoted by a
special name, Z (rather than the general name X we use for other random
variables).

We define the standard normal random variable Z as the normal random variable
with mean μ = 0 and standard deviation σ = 1.

We say:
Z ~ N(0, 1²)

(D) The Transformation of Normal Random Variables


The importance of the standard normal distribution derives from the fact that
any normal random variable may be transformed to the standard normal random
variable. If we want to transform X, where X ~ N(μ, σ²), into the standard
normal random variable Z ~ N(0, 1²), we can do this as follows:

Z = (X - μ) / σ

We move the distribution from its center of μ to a center of 0. This is done
by subtracting μ from all the values of X. Thus, we shift the distribution μ
units back so that its new center is 0. To make the standard deviation of the
distribution equal to 1, we divide the random variable by its standard
deviation σ. The area under the curve adjusts so that the total remains the
same. All probabilities (areas under the curve) adjust accordingly. Thus, the
transformation from X to Z is achieved by first subtracting μ from X and then
dividing the result by σ.

The transformation Z = (X - μ)/σ takes us from a random variable X with mean μ
and standard deviation σ to the standard normal random variable. We also have
an opposite, or inverse, transformation, which takes us from the standard
normal random variable Z to the random variable X with mean μ and standard
deviation σ. The inverse transformation is given as:

X = μ + Zσ

We use the inverse transformation when we want to get from a given


probability, the value or values of a normal random variable X.

We can summarise the procedure of obtaining values of a normal random


variable, given a probability, as:
1. Draw a picture of the normal distribution in question and the standard
normal distribution
2. In the picture, shade in the area corresponding to the probability
3. Use the table to find the z value (or values) that gives the required
probability
4. Use the transformation from Z to X to get the appropriate value (or
values) of the original normal random variable
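
As an illustration of this procedure (a hypothetical extension of the wedding
example above, with scipy assumed and not part of the course text), suppose we
want the attendance level exceeded with probability 0.10 when μ = 400 and
σ = 100:

from scipy.stats import norm

mu, sigma = 400, 100
z = norm.ppf(0.90)          # z such that P(Z <= z) = 0.90, about 1.2816
x = mu + z * sigma          # inverse transformation X = mu + Z*sigma
print(round(x, 1))          # about 528.2 participants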

Summary
In this study session, we have discussed the meaning of Continuous Probability
Distribution, Normal Distribution, Standard Normal Distribution as well as the
Standard Normal Variable Transformation.
(E) Discussion Questions
1. What is the probability that a baby born in the hospital will weigh more than
8kg? Less than 7kg?
2. Why is there a need to standardised variable by means of transformation?

References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.3 MODULE 3
Sampling and Sampling Methods
Introduction
A sample is a set of measurements taken from a process or series of
experiments. It must be regarded as having been drawn from a larger population
of measurements covering a large number of observations. If there is a limit
to the number of possible observations, we have a finite population. An
unlimited number of observations constitutes an infinite population.

A sample provides a small amount of information about the population and yet,
for various reasons, we rely on the information available from the sample. There
are two principal methods of drawing a sample from a population. These are
probability samples and non-probability samples. In a probability sample, each
observation in the population has a known (and, in simple random sampling,
equal) chance of being selected to become a part of the sample. In the case of
non-probability samples, there is no way of estimating the probability that
each individual will be included in the sample.

A sample taken from a population, provided it is representative, can be
subjected to hypothesis testing. A hypothesis is a tentative insight into a
concept that is not yet verified but that, if true, would explain certain
facts or phenomena. Many statistical tools exist for hypothesis testing, and
the choice of tool depends on the nature of the hypothesis at hand. If one is
dealing with the problem of establishing whether a relationship exists or not,
a correlational analysis will be used in testing the hypothesis. If one's
problem is to establish whether there is an effect or impact in a given
situation, then regression analysis will be used in testing the hypothesis.
9.3.1 Objectives
After this contact module, you would be able to:
1. understand what is meant by statistical inference
2. distinguish among various sampling methods including random,
systematic, stratified and quota among others
3. calculate and use the standard error of the mean
4. understand the principles of confidence limits
5. use the Finite Population Correction Factor
6. develop hypotheses
7. identify type I and type II errors
8. calculate significance level
9. distinguish between one tail and two tail tests
10. understand significance tests of means, proportions and the difference
between means and proportions
9.3.2 STUDY SESSIONS
9.3.2.1 STUDY SESSION 9: Sampling I
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Census versus Sampling Method
(B) Probability Samples versus Non-Probability Samples
(C) Probability Sampling Methods
(i) Simple random sampling method
(ii) Stratified random sampling method
(iii) Cluster sampling or multistage sampling
(iv) Systematic sampling
(D) Non-Probability Sampling Methods
(i) Convenience sampling
(ii) Quota sampling
(iii) Judgment sampling
(E) Determination of Sample Size
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
You are welcome to another study session. In this study session you will
understand the meaning of sampling, the concept of probability & non-
probability samples, determination of sample size and many more.

Learning Outcomes
In this study session, you should be able to:
1. Discuss the meaning of sampling.
2. Explain the concept of probability & non-probability samples,
determination of sample size and many more.

(A) Census versus Sampling Method


A sample is a part of the population from which it is selected. The process of
selecting a sample is known as sampling. Thus, the sampling theory is a study
of relationship that exists between the population and the samples drawn from
the population. The complete enumeration, popularly known as census, may not
be feasible either due to non-availability of time or because of high cost
involved. Therefore, it becomes essential to draw inferences for the population
on the basis of sample information. Thus, sampling helps us to get as much
information as possible of the whole universe. Sampling also helps us in
determining the reliability of the estimates. This can be done by drawing
samples from the same parent population and comparing the results obtained
from different samples.

In a survey of the entire population, data is collected from every elementary unit
of the population. Suppose, one is studying the wage structure of the coal
mining industry in the country, then one approach is to collect the data on
wages of every worker in the coal industry. From this data, one can calculate the
various characteristics of the population, such as the average wage, the range
and the variance, etc. This is referred to as a census survey.

ITQ: What is a sample?


ITA: A sample is a set of measurements taken from a process or series of experiments.

Although the census method has many advantages, the cost, effort and time
required to conduct a census survey are very large, unless the population is
very small, and in many cases they are so prohibitive that one rarely uses
this method in surveys.

Sampling involves an examination of a small portion of the elementary units in
a population. Although a census operation gives more reliable data, the
sampling method is preferred when the population is very large, i.e. infinite,
so that it would be impossible to conduct a census survey, and when quick
results are required, in which case it is appropriate to conduct sample
surveys rather than census surveys.

As sampling involves less time and money, it is possible to give attention to
different characteristics of the elementary units. With the
same money and time, a sample permits a more detailed
study of a smaller number of units. The process of sampling
involves selecting a sample, collecting all relevant information,
and finally drawing conclusions about the population from
which the sample has been drawn.

ITQ: What is sampling? What is its importance?


ITA: Sampling involves an examination of a small portion of the elementary units in a
population. Sampling helps us to get as much information as possible of the whole
universe, it also helps us in determining the reliability of the estimates.

(B) Probability Samples versus Non-Probability Samples


A probability sample is one for which the inclusion or exclusion of any
individual element of the population depends upon the application of probability
methods and not on a personal judgment. It is so designed and drawn that the
probability of inclusion of an element is known. The essential feature of
drawing such a sample is the randomness. As against the probability sample, we
have a variety of other samples, termed as judgment samples, purposive
samples, quota samples, etc. These samples have one common distinguishing
feature: personal judgment rather than the random procedure to determine the
composition of what is to be taken as a representative sample. The judgment
affects the choice of the individual elements. All such samples are non-random,
and no objective measure of precision may be attached to the results arrived at.

In probability sampling, it is possible to estimate the error in the
estimates, and this error can also be minimised. It is also possible to
evaluate the relative efficiency
of the various probability sampling designs. Probability sampling does not
depend upon the detailed information about population for its effectiveness.
However, probability sampling requires a high level of skill and experience for
its use. It also requires sufficient time and money to execute.

Non-probability sampling is a procedure of selecting a sample without the use


of probability or randomisation. It is based on convenience, judgment, etc. The
major difference between the two approaches is that it is possible to estimate the
sampling variability in the case of probability sampling while it is not possible
to estimate the same in the non-probability sampling.

ITQ: What are the requirements of a probability sampling?


ITA: Probability sampling requires a high level of skill and experience for its use. It
also requires sufficient time and money to execute.

The various probability and non-probability sampling methods are discussed
below:

(C) Probability Sampling Methods


The various probability sampling methods are described as under:
(i) Simple random sampling method
In simple random sampling, drawing of elements from the population is random
and the choice of an element is made in such a way that every element has the
same probability of being chosen. When the sample is so selected, every
possible set of elements has the same chance of being drawn. With the
population size N fairly large, the number of such possible sets of size n is
of course very large; this number is given by NCn (the number of combinations
of N things taken n at a time). Of course, it is unnecessary in a specific
case to compute the number of possible sets of stated size that might be drawn
from a given population, but the process of sample selection should be such
that the probability of selection is the same for every such set.

The objective is to achieve randomness in drawing the individual elements of a


sample for ensuring that all possible samples have the same chance of being
selected. If we are to draw from a population containing N elementary units, the
elementary unit also being a sampling unit, it is necessary that each of the N
units should be individually numbered or otherwise distinctively designed. One
of the approaches for drawing random sample of size n from a population of N
units is to draw n cards from N cards which are numbered from 1 to N and
mixed thoroughly. The sample size n, thus drawn, would constitute a simple
random sample (SRS). Another popular method of selecting a random sample is
by lottery method. In this method all the elements are named or numbered on a
small slip of paper of identical shape and size. These slips are folded identically
and mixed up well in a container. Number of slips of desired sample size is
selected blindly from this container.

Thus, the selection of elementary units depends purely on chance and no


personal bias exists. We shall illustrate this method of selection of a sample
with the following example: Suppose the warden of a students' hostel with 200
occupants wants to constitute a five-member welfare committee with the members
randomly selected. The lottery method of selecting these five members from a group of
200 would be first to prepare 200 slips of identical shape and size and write the
name of each student on a slip. Fold these 200 slips identically and mix them
well in a container. Then select five folded slips, from the container at random.
The five students so selected would constitute a welfare committee of the
hostel.

There are, however, some difficulties in these procedures. For, if N is large, the
task becomes physically difficult. So it is desirable to use better methods for
ensuring randomness. One such method is the use of random number tables.
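
In modern practice, a computer's pseudo-random number generator plays the role
of the random number table or the physical slips. A minimal Python sketch of
the warden's lottery draw (the student labels are invented for illustration;
any list of 200 distinct identifiers would do):

import random

occupants = [f"student_{i:03d}" for i in range(1, 201)]   # the 200 "slips"
committee = random.sample(occupants, k=5)                 # draw 5 without replacement
print(committee)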

This sampling procedure is quite satisfactory for a small population. With a large population, the process of assigning a number to each elementary sampling unit becomes prohibitive with respect to both time and money.
Moreover, the population is often geographically spread out or composed of
clearly identified strata possessing unique characteristics. Whenever any of the
above situations arise, alternative sampling schemes that are sophisticated
combinations of simple random sampling provide significantly better results for
the same expenditure and time. As a result, the simple random sampling method
is not very frequently used in practice. However, the simple random sampling
scheme is the basis of any other probabilistic sampling schemes.

ITQ: What is the difference between probability sampling and non-probability sampling?

ITA: A probability sample is one for which the inclusion or exclusion of any individual element of the population depends upon the application of probability methods and not on a personal judgment. Non-probability sampling, by contrast, is a procedure of selecting a sample without the use of probability or randomisation.
(ii) Stratified random sampling method
In simple random sampling, the population to be sampled is treated as
homogeneous and the individual elements are drawn at random from the whole
universe. However, it is often possible and desirable to classify the population
into distinctive classes or strata and then obtain a sample by drawing at random
the specified number of sampling units from each of the classes thus
constructed. This may be desirable because of our interest in the distinct classes
of the universe as a whole.

In stratified random sampling, the population is sub-divided into strata before the sample is drawn. Strata are so designed that they should not overlap. A
sample of specified size is drawn at random from the sampling units that make
up each stratum. If a given stratum is of our interest, the corresponding sub-
sample provides the basis for estimates concerning the attributes of the
population stratum, or sub-universe from which it is drawn. The total of
subsamples constitutes the aggregate sample on which estimates of attributes of
the entire population are based.

Stratified samples may be either proportional or non-proportional. In proportional stratified sampling, the number of elements to be drawn from each stratum is proportional to the size of that stratum compared with the population. For example, suppose a sample of 500 elementary units is to be drawn from a population of 10,000 units divided into four strata of 2,000, 3,000, 4,000 and 1,000 units respectively.

Thus, the elements to be drawn from each stratum would be 100, 150, 200 and
50 respectively. Proportional stratification yields a sample that represents the
population with respect to the proportion in each stratum in the population.
Proportional stratified sampling yields satisfactory results if the dispersion in
the various strata is of proportionately the same magnitude. If there is a
significant difference in dispersion from stratum to stratum, sample estimates
will be much more efficient if non-proportional stratified random sampling is
used. Here, equal numbers of elements are selected from each stratum
regardless of how the stratum is represented in the population. Thus, in the
earlier example, an equal number, i.e., 125, of elementary units will be drawn to
constitute the sample.
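The proportional allocation rule is n_h = n × (N_h / N) for each stratum h. A short Python sketch of the example's arithmetic, assuming the strata sizes implied above (2,000, 3,000, 4,000 and 1,000 units):

# Proportional stratified allocation: each stratum contributes in
# proportion to its share of the population.
strata_sizes = [2000, 3000, 4000, 1000]   # N_h for the four strata
N = sum(strata_sizes)                     # population size, 10,000
n = 500                                   # desired total sample size

allocation = [round(n * N_h / N) for N_h in strata_sizes]
print(allocation)   # [100, 150, 200, 50]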

A sample drawn by the stratified random sampling scheme ensures a representative sample, as the population is first divided into various strata and then a sample is
drawn from each stratum. Stratified random sampling also ensures greater
accuracy and it is maximum if each stratum is formed in such a way that it
consists of uniform or homogeneous items. Compared with a simple random
sample, a stratified sample can be more concentrated geographically, i.e., the
elementary units from different strata may be selected in such a way that all of
them are located in one geographical area. This would also reduce both time and
cost involved in data collection. However, care should be exercised in dividing
the population into various strata. Each stratum must contain, as far as possible,
homogeneous units, as otherwise the reliability of the results would be lost.

ITQ: What is done in stratified sampling?


ITA: The population is sub-divided into strata before the sample is drawn.

In conclusion, stratification is an effective sampling device to the extent that it creates classes that are more homogeneous than the total. When this can be done, the classes so distinguished differ among themselves in respect of a stated characteristic. Stratification may be futile if classes do not differ among themselves. Thus, there should be homogeneity within classes and heterogeneity between classes.
(iii) Cluster sampling or multistage sampling
Under this method, the random selection is made of primary, intermediate and
final (or the ultimate) units from a given population or stratum. There are
several stages in which the sampling process is carried out. At first, the first
stage units are sampled by some suitable method, such as simple random
sampling. Then, a sample of second stage units is selected from each of the selected first stage units, again by some suitable method, which may be the same as
or different from the method employed for the first stage units. Further stages
may be added as required. The procedure may be illustrated as follows:

Suppose we want to take a sample of 5,000 households from Kaduna State.
At the first stage, the state may be divided into a number of districts and a few
districts are selected at random. At the second stage, each district may be sub-
divided into a number of villages and a sample of villages may be taken at
random. At the third stage, a number of households may be selected from each of the villages selected at the second stage. To take another example, suppose in a particular survey we wish to take a sample of 10,000 students from Ahmadu Bello University, Zaria, Nigeria. We may take faculties at the first stage, then
draw departments at the second stage, and choose students as the third and last
stage.
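A rough Python sketch of the two-stage idea, using the university example; the faculty and department names below are invented purely for illustration:

import random

# First-stage units are faculties; second-stage units are departments.
university = {
    "Science":  ["Maths", "Physics", "Chemistry", "Biology"],
    "Arts":     ["History", "English", "Languages"],
    "Medicine": ["Anatomy", "Surgery", "Pharmacology"],
    "Law":      ["Public Law", "Private Law"],
}

first_stage = random.sample(list(university), k=2)   # sample of faculties
second_stage = {f: random.sample(university[f], k=2) for f in first_stage}
print(second_stage)   # students would then be sampled within each department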

(iv) Systematic sampling


Another sampling form, simple in design and execution, may be employed
when the members of the population to be sampled are arranged in order, the order
corresponding to consecutive numbers. The arrangements of names in a
telephone directory or income-tax returns in the income tax department are the
illustrations of such orderings. A sample of suitable size is obtained by taking, say, every seventh unit of the population: one of the first seven units in this ordered arrangement is chosen at random, and the sample is completed by selecting every seventh unit thereafter from the list. If the first unit selected is the fifth, the researcher will include in his sample the 12th, 19th, 26th, 33rd, etc.
We can generalise the approach as follows: if the requirements of the survey
call for the inclusion of one unit out of every m units in the population, a unit is
chosen at random from the first m units, thereafter, every mth unit in the
population when arranged in order, is included in the sample.

This mode of selection is called systematic sampling; m is generally referred to as the sampling ratio, i.e., the ratio of the population size to the sample size.

Symbolically: m = N / n

where N is the population size and n is the sample size. While calculating the value of m, we may get a fractional value; in such cases, it is rounded off to the nearest digit.
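A minimal Python sketch of systematic selection (the values of N and n are illustrative):

import random

N, n = 200, 25
m = round(N / n)                       # sampling ratio; rounded if fractional
start = random.randint(1, m)           # random unit among the first m units
sample = list(range(start, N + 1, m))  # start, start + m, start + 2m, ...
print(m, start, sample)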

ITQ: What are the probability sampling methods?


ITA: They are: simple random sampling, stratified random sampling, cluster sampling or multistage sampling, and systematic sampling.

(D) Non-Probability Sampling Methods


There are three methods of sampling in this category. These are explained as
follows:
(i) Convenience sampling
In this scheme, a sample is obtained by selecting ‘convenient’ population
elements. For example, a sample selected from the readily available sources or
lists such as telephone directory or a register of the small scale industrial units,
etc. will give us a convenient sample. In these cases, even if a random approach
is used for identifying the units, the scheme will not be considered as simple
random sampling. For example, if one studies the wage structure in a nearby textile industry by interviewing a few selected workers, then the scheme adopted here is convenience sampling. The results obtained by the convenience sampling method can hardly be said to be representative of the population parameters. Therefore, the results obtained are generally biased and unsatisfactory. However, the convenience sampling approach is generally used for making pilot studies, particularly for testing a questionnaire and to obtain preliminary information about the population.

(ii) Quota sampling


In this method of sampling, the basic parameters which describe the population are identified first. Then a sample is selected which conforms to these parameters. Thus, in a quota sample, quotas are fixed according to these parameters, and each field investigator is assigned a quota of the number of units to be interviewed. Within the pre-assigned quotas, the selection of the sample elements depends on personal judgment. For example, suppose one is studying the consumer preference for ice creams among children and college-going students, and it is fixed to interview 250 individuals from each category. If the city has five colleges, one may decide to fix a quota of 50 students to be interviewed from each college. It depends entirely upon the interviewer how to constitute this sub-sample of 50 students in a college: they may be the first 50 students who visit the ice cream parlour, or the 50 students who visit the parlour between 4 p.m. and 6 p.m., etc.

Quota sampling method has the advantage that the sample will conform to the
selected parameters of the population. The cost and time involved in getting
information from the sample will be relatively less for a quota sample but there
are many weaknesses too. Some of these include: the difficulty of validating the information gathered on the elementary units; the difficulty of specifying the characteristics of the population, which makes it hard to identify; and the possibility that, even when the sample does conform to the characteristics used in the quotas, it may be distorted on other factors of importance in the study.

The quota sampling method is generally used in public opinion studies and election forecast polls, as there is not sufficient time to adopt a probability sampling scheme.

(iii) Judgment sampling


The judgment sampling method can also be called sampling by opinion. In this
method, someone who is well acquainted with the population decides which
members (elementary units) in his or her judgment would constitute a proper
cross-section representing the parameters of relevance to the study. This method
of sampling is generally used in studies involving performance of personnel.
For example, if one is studying the performance of sales staff in a marketing
organisation, the people here are classified into top grade, medium grade and
low grade performers. Having specified qualities that are important in the study,
the expert (possibly here the Executive Director or Vice-President -Sales)
indicates the people who, in his or her knowledge, would be representative of
each of the three categories mentioned earlier. This, of course, is not a scientific
method, but in the absence of better evidence, such a judgment method may
have to be used.

ITQ: What are the non-probability sampling methods?


ITA: They are: convenience sampling, quota sampling and judgment sampling.

(E) Determination of Sample Size


We prefer samples to complete enumeration because of convenience and
reduced cost of data collection. However, in sampling, there is a likelihood of
missing some useful information about the population. For a high level of
precision, we need to take a larger sample. How large should the sample be, and what should be the level of precision? In specifying a sample size, care should
be taken such that:
(i) Neither so few are selected so as to render the risk of sampling error
intolerably large, nor
(ii) Too many units are included, which would raise the cost of the study, making it inefficient.

It is, therefore, necessary to make a trade-off between:
(i) Increasing sample size, which would reduce the sampling error but
increase the cost, and
(ii) Decreasing the sample size, which might increase the sampling error
while decreasing the cost.

Therefore, one has to make a compromise between obtaining data with greater precision and keeping the cost of data collection low. Several factors need to be considered before determining the sample size.

The first and the foremost is the size of the error that would be tolerable for the
purposes of decision-making. The second consideration would be the degree of
confidence in the results of the study, i.e., if one wants to be 100 per cent
confident of the results, the entire population must be studied. However, this is
generally too impractical and costly. Therefore, one must accept something less
than 100 per cent confidence. In practice, the confidence limits most often used
are 99 per cent, 95 per cent and 90 per cent. Most commonly used confidence
limit is 95 per cent. This means that there is a 5 per cent risk that the true
population parameter is outside the range of possible error specified by the
confidence interval. This 5 per cent risk appears to be acceptable in most of the
decisions. Thus, for 95 per cent level of confidence, Z value is 1.96. The Z value
can be obtained from normal probability distribution for a specified level of
confidence. For determining the sample size, we make use of the following
relationship:
Standard error of the estimate: σx̄ = σ / √n

σx̄ can be calculated if we know the upper and lower confidence limits. Let these limits be ±Y; then

Z σx̄ = Y

where Z is the value of the normal variate for a given confidence level. The procedure is explained using the example given below:

Example:
A state cooperative department is performing a survey to determine the annual salary earned by the 3,000 managers in the cooperative sector within the state. How large a sample should it take in order to estimate the mean annual earnings within plus and minus #1,000 at a 95 per cent confidence level? The standard deviation of annual earnings of the entire population is known to be #3,000.

Solution:
As the desired upper and lower limit is #1,000, i.e., we want to estimate the
annual earnings within plus and minus #1,000.
∴ Z σx̄ = 1,000

As the level of confidence is 95 per cent, the Z value is 1.96

∴ 1.96 σx̄ = 1,000
σx̄ = 1,000 / 1.96 = 510.20

The standard error σx̄ is given by σ / √n, where σ is the population standard deviation.

∴ σ / √n = 510.20
i.e., 3,000 / √n = 510.20
i.e., √n = 3,000 / 510.2 = 5.88

This gives n = (5.88)² = 34.57

Therefore, the desired sample size is about 35.
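The steps above collapse into the single formula n = (Zσ/E)², where E is the tolerable error (here #1,000). A quick Python check of the example's arithmetic:

import math

Z, sigma, E = 1.96, 3000, 1000
n = (Z * sigma / E) ** 2
print(n, math.ceil(n))   # 34.5744..., rounded up to 35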

Summary
In this study session, we have discussed the meaning of sampling, the concept
of probability & non-probability samples as well as determination of sample
size.

(F) Discussion Questions


1. Why is it important for a sample to be properly representative of its
parent population?
2. Explain the difference between sampling error and non-sampling error
and which of the errors is more serious and why.
3. Is it possible for a sample to yield better results than a census? Explain.
4. Define the following and illustrate with examples:
a. Parameter
b. Statistic
c. Standard error
5. What do you understand by central limit theorem?
6. Explain what you understand by probability sampling and non-probability
sampling.
7. Discuss the determinants of a sample size.

References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and
Economics (10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson
Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.3.2.2 STUDY SESSION 10
Sampling II
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Sampling Distribution
(B) Why We Study Sampling Distributions?
(C) Sampling Distribution of the Mean
(D) The Central Limit Theorem
(E) Sampling Distribution of the Proportion
(F) Sampling Distribution of the Difference of Sample Means
(G) Sampling Distribution of the Difference of Sample Proportions
(H) Small Sampling Distributions
(I) Sampling Distribution of the Variance
(i) The Sample Variance
(ii) The Chi-Square Distribution
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
You are welcome to another study session. In this session, you will understand
the meaning of Sampling Distribution, Central Limit Theorem, Small Sampling
Distributions, Sampling Distribution of the Variance.

Learning Outcomes
In this study session, you will be able to:
1. Explain the meaning of Sampling Distribution.
2. Discuss the Central Limit Theorem, Small Sampling Distributions,

Sampling Distribution of the Variance.

(A) Sampling Distribution


Having discussed the various methods available for picking up a sample from a
population, we would naturally be interested in drawing statistical inferences -
making generalisations about the population on the basis of a sample drawn
from it. The generalisations to be made about the population are usually either
by way of:
(i) Estimating the unknown population parameters, or
(ii) Testing appropriate hypotheses stated in relation to population parameters
in the light of sample data.

These generalisations, together with the measurement of their reliability, are made in terms of the relationship between the values of any sample statistic and those of the corresponding population parameters. A population parameter is any number computed (or estimated) for the entire population, viz. population mean, population median, population proportion, population variance and so on. A population parameter is unknown but fixed, and its value is to be estimated from the sample statistic, which is known but random. A sample statistic is any number computed from our sample data, viz. sample mean, sample median, sample proportion, sample variance and so on.

It may be appreciated that no single value of the sample statistic is likely to be equal to the corresponding population parameter. This owes to the fact that the sample statistic, being random, assumes different values in different samples of the same size drawn from the same population. Referring to our earlier discussion on the concept of a random variable in study sessions 6 and 7 on probability distributions, it is not difficult to see that any sample statistic is a random variable and, therefore, has a probability distribution, better known as the sampling distribution of the statistic.

The sampling distribution of a statistic is the probability distribution of all possible values the statistic may take when computed from random samples of the same size drawn from a specified population.

In reality, of course we do not have all possible samples and all possible values
of the statistic. We have only one sample and one value of the statistic. This
value is interpreted with respect to all other outcomes that might have
happened, as represented by the sampling distribution of the statistic. In this
lesson, we will refer to the sampling distributions of only the commonly used
sample statistics like sample mean, sample proportion, sample variance etc.,
which have a role in making inferences about the population.

(B) Why We Study Sampling Distributions?


Sample statistics form the basis of all inferences drawn about populations. Thus,
sampling distributions are of great value in inferential statistics. The sampling
distribution of a sample statistic possesses well-defined properties which help lay
down rules for making generalizations about a population on the basis of a
single sample drawn from it. The variations in the value of sample statistic not
only determine the shape of its sampling distribution, but also account for the
element of error in statistical inference. If we know the probability distribution
of the sample statistic, then we can calculate risks (error due to chance)
involved in making generalisations about the population. With the help of the
properties of sampling distribution of a sample statistic, we can calculate the
probability that the sample statistic assumes a particular value (if it is a discrete
random variable) or has a value in a given interval. This ability to calculate the
probability that the sample statistic lies in a particular interval is the most
important factor in all statistical inferences. We will demonstrate this by an
example.

Suppose we know that 40% of the population of all users of hair oil prefers our
brand to the next competing brand. A “new improved” version of our brand has
been developed and given to a random sample of 100 users for use. If 55 of
these prefer our “new improved” version to the next competing brand, what
should we conclude? For an answer, we would like to know the probability that
the sample proportion in a sample of size 100 is as large as 55% or higher when
the true population proportion is only 40%, i.e. assuming that the new version is
no better than the old. If this probability is quite large, say 0.5, we might
conclude that the high sample proportion viz. 55% is perhaps because of
sampling errors and the new version is not really superior to the old. On the
other hand, if this probability works out to a very small figure, say 0.001, then
rather than concluding that we have observed a rare event we might conclude
that the true population proportion is higher than 40%, i.e. the new version is
actually superior to the old one as perceived by members of the population. To
calculate this probability, we need to know the probability distribution of
sample proportion i.e. the sampling distribution of the proportion.

ITQ: What is a sampling distribution of a statistic?


ITA: The sampling distribution of a statistic is the probability distribution of all
possible values the statistic may take when computed from random samples of the same
size drawn from a specified population.
(C) Sampling Distribution of the Mean
Suppose we have a simple random sample of size n, picked up from a
population of size N. We take measurements on each sample member in the
characteristic of our interest and denote the observations as x1, x2, ..., xn respectively. The sample mean for this sample is defined as:

X̄ = (x1 + x2 + ..... + xn) / n

If we pick up another sample of size n from the same population, we might end
up with a totally different set of sample values and so a different sample mean.
Therefore, there are many (perhaps infinite) possible values of the sample mean
and the particular value that we obtain, if we pick up only one sample, is
determined only by chance. In other words, the sample mean is a random
variable. The possible values of this random variable depend on the possible values of the elements in the random sample from which the sample mean is to be computed. The random sample, in turn, depends on the distribution of the population from which it is drawn. As a random variable, X̄ has a probability distribution. This probability distribution is the sampling distribution of X̄.

The sampling distribution of X̄ is the probability distribution of all possible values the random variable X̄ may take when a sample of size n is taken from a specified population.
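This definition can be mimicked empirically. The Python sketch below (population, seed and sizes are illustrative assumptions) draws many samples of the same size from one population and inspects the resulting collection of sample means:

import random
import statistics

random.seed(1)
# An artificial population with mean about 50 and standard deviation about 10.
population = [random.gauss(50, 10) for _ in range(10_000)]
n = 25

# One sample mean per drawn sample; together they approximate the
# sampling distribution of the mean.
means = [statistics.mean(random.sample(population, n)) for _ in range(2_000)]

# The means cluster around the population mean, with spread close to
# sigma / sqrt(n), i.e. roughly 10 / 5 = 2 here.
print(statistics.mean(means), statistics.stdev(means))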

(D) The Central Limit Theorem


When sampling is done from a population with mean μ and standard deviation σ, the sampling distribution of the sample mean X̄ tends to a normal distribution with mean μ and standard deviation σ/√n as the sample size n increases.
The central limit theorem is remarkable because it states that the distribution of the sample mean X̄ tends to a normal distribution regardless of the distribution of the population from which the random sample is drawn. The theorem allows us to make probability statements about the possible range of values the sample mean may take. It allows us to compute probabilities of how far away X̄ may be from the population mean it estimates. We will extensively use the central limit theorem in the coming sessions on testing of hypotheses.

The central limit theorem says that, in the limit, as n goes to infinity (n → ∞), the distribution of X̄ becomes a normal distribution (regardless of the distribution of the population). The rate at which the distribution approaches a normal distribution does depend, however, on the shape of the distribution of the parent population.

In general, a sample of 30 or more elements is considered “large enough” for the central limit theorem to be applicable.
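A small simulation makes the theorem concrete. In the sketch below (illustrative only), the parent population is exponential and therefore heavily skewed, yet the means of samples of size 30 centre on μ with spread near σ/√n:

import random
import statistics

random.seed(2)
n = 30
# 5,000 sample means, each from a fresh sample of size n drawn from an
# exponential population with mu = 1 and sigma = 1.
means = [statistics.mean(random.expovariate(1.0) for _ in range(n))
         for _ in range(5_000)]

# The means centre on mu = 1 with spread close to 1 / sqrt(30), about 0.18,
# and their histogram would look roughly normal despite the skewed parent.
print(statistics.mean(means), statistics.stdev(means))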

ITQ: Why is the central limit theorem remarkable?


ITA: The central limit theorem is remarkable because it states that the distribution of the
sample mean X tends to a normal distribution regardless of the distribution of the
population from which the random sample is drawn.

(E) Sampling Distribution of the Proportion


Let us assume we have a binomial population, with a proportion p of the population possessing a particular attribute that is of interest to us. This also implies that a proportion q (= 1 − p) of the population does not possess the attribute of interest. If we pick up a sample of size n with replacement and find x successes in the sample, the sample proportion of successes (p̂) is given by

p̂ = x / n

x is a binomial random variable, and its possible values depend on the composition of the random sample from which p̂ is computed. The probability of x successes in a sample of size n is given by the binomial probability distribution, viz.

P(x) = nCx p^x q^(n−x)

Since p̂ = x/n and n is fixed (determined before the sampling), the distribution of the number of successes (x) leads to the distribution of p̂.

The sampling distribution of p̂ is the probability distribution of all possible values the random variable p̂ may take when a sample of size n is taken from a specified population.

The expected value and the variance of x, i.e. the number of successes in a sample of size n, are known to be:
E(x) = np
Var(x) = npq
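Dividing by n, it follows that E(p̂) = p and Var(p̂) = pq/n. A simulation sketch, reusing the figures of the hair oil example discussed earlier (p = 0.4, n = 100; purely illustrative):

import random
import statistics

random.seed(3)
p, n = 0.4, 100

# Each trial draws a sample of n Bernoulli(p) units and records p-hat.
p_hats = [sum(random.random() < p for _ in range(n)) / n
          for _ in range(5_000)]

# E(p-hat) = p and Var(p-hat) = p*q/n: the spread should be near
# sqrt(0.4 * 0.6 / 100), about 0.049.
print(statistics.mean(p_hats), statistics.stdev(p_hats))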

(F) Sampling Distribution of the Difference of Sample Means


In order to bring out the sampling distribution of the difference of sample means, let us assume we have two populations labelled 1 and 2, so that:
μ1 and μ2 denote the two population means,
σ1 and σ2 denote the two population standard deviations,
n1 and n2 denote the two sample sizes, and
X̄1 and X̄2 denote the two sample means.
Let us consider independent random sampling from the populations, so that the sample sizes need not be the same for both populations.

Since X̄1 and X̄2 are random variables, so is their difference X̄1 − X̄2. As a random variable, X̄1 − X̄2 has a probability distribution. This probability distribution is the sampling distribution of X̄1 − X̄2.

The sampling distribution of X̄1 − X̄2 is the probability distribution of all possible values the random variable X̄1 − X̄2 may take when independent samples of sizes n1 and n2 are taken from two specified populations.

(G) Sampling Distribution of the Difference of Sample Proportions

Let us assume we have two binomial populations labelled 1 and 2, so that:
p1 and p2 denote the two population proportions,
n1 and n2 denote the two sample sizes, and
p̂1 and p̂2 denote the two sample proportions.

Let us consider independent random sampling from the populations, so that the sample sizes need not be the same for both populations.

Since p̂1 and p̂2 are random variables, so is their difference p̂1 − p̂2. As a random variable, p̂1 − p̂2 has a probability distribution. This probability distribution is the sampling distribution of p̂1 − p̂2.

The sampling distribution of p̂1 − p̂2 is the probability distribution of all possible values the random variable p̂1 − p̂2 may take when independent samples of sizes n1 and n2 are taken from two specified binomial populations.

(H) Small Sampling Distributions


The Z-statistic is used in statistical inference when the sample size is large. It may, however, be appreciated that the sample size may be prohibited from being large either due to physical limitations or due to practical difficulties of sampling costs being too high. Consequently, for our statistical inferences, we may often have to content ourselves with a small sample size and limited information. The consequences of the sample being small, n < 30, are that:

(i) The central limit theorem ceases to operate, and
(ii) The sample variance S² fails to serve as an unbiased estimator of σ²

Thus, the basic difference which the sample size makes is that while the sampling distributions based on large samples are approximately normal and the sample variance S² is an unbiased estimator of σ², the same does not hold when the sample is small.

It may be appreciated that the small sampling distributions are also known as
exact sampling distributions, as the statistical inferences based on them are not
subject to approximation.

However, the assumption of the population being normal is the basic qualification underlying the application of small sampling distributions.

In the category of small sampling distributions, the binomial and Poisson distributions were already discussed in study session 7. Now we will discuss another important small sampling distribution – the chi-square. The purpose of discussing this distribution at this stage is limited only to understanding the variables which define it and its essential properties. The application of this distribution will be highlighted in the next two sub-topics.

The small sampling distributions are defined in terms of the concept of degrees
of freedom (df). The concept of degrees of freedom (df) is important for many
statistical calculations and probability distributions. We may define df
associated with a sample statistic as the number of observations contained in
a set of sample data which can be freely chosen. It refers to the number of
independent variables which vary freely without being influenced by the
restrictions imposed by the sample statistic(s) to be computed.

Sampling essentially consists of defining various sample statistics and making use of them in estimating the corresponding population parameters. In this respect, degrees of freedom may be defined as the number n of independent observations contained in a sample less the number m of parameters to be estimated on the basis of that sample information, i.e. df = n − m.

For example, when the population variance σ² is not known, it is to be estimated by a particular value of its estimator S², the sample variance. The number of observations in the sample being n, df = n − m = n − 1, because σ² is the only parameter (i.e. m = 1) to be estimated by the sample variance.

(I) Sampling Distribution of the Variance


We will now discuss the sampling distribution of the variance. We will first
introduce the concept of the sample variance as an unbiased estimator of
population variance and then present the chi-square distribution, which helps us
in working out probabilities for the sample variance.
(i) The Sample Variance
By now it is implicitly clear that we use the sample mean to estimate the
population mean and sample proportion to estimate the population proportion,
when those parameters are unknown. Similarly, we use a sample statistic called
the sample variance to estimate the population variance.

A sample statistic is an unbiased estimator of the population parameter when the expected value of the sample statistic is equal to the corresponding population parameter it estimates.

Thus, if we use the sample variance S² as an unbiased estimator of the population variance σ²,

then E(S²) = σ²

In other words, to get the unbiased estimator of the population variance σ², we divide the sum ∑(xᵢ − x̄)², taken over i = 1 to n, by the degrees of freedom n − 1.
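The effect of the n − 1 divisor can be checked by simulation. In the Python sketch below (population, seed and sizes are arbitrary), statistics.variance divides by n − 1 while statistics.pvariance divides by n; only the former averages out to σ²:

import random
import statistics

random.seed(4)
population = [random.gauss(0, 5) for _ in range(20_000)]   # sigma^2 = 25
n = 10

# Sample variance with the n - 1 divisor (unbiased).
s2 = [statistics.variance(random.sample(population, n)) for _ in range(3_000)]
# The same statistic computed with the n divisor (biased for a sample).
s2_n = [statistics.pvariance(random.sample(population, n)) for _ in range(3_000)]

print(statistics.mean(s2))     # close to 25
print(statistics.mean(s2_n))   # systematically low, near 25 * (n - 1) / n = 22.5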

(ii) The Chi-Square Distribution


The chi-square variable is denoted by χ². The chi-square random variable is the
sum of several independent, squared standard normal random variables.

The chi-square distribution is the probability distribution of the chi-square variable. So, the chi-square distribution is the probability distribution of the sum of several independent, squared standard normal random variables.

Properties of χ2 Distribution
1. A χ² distribution is completely defined by the number of degrees of freedom, df = n. So there are many χ² distributions, each with its own df.
2. χ² is a sample statistic having no corresponding parameter, which makes the χ² distribution a non-parametric distribution.
3. As a sum of squares, the χ² random variable cannot be negative and is, therefore, bounded on the left by zero.
4. The mean of a χ² distribution is equal to the degrees of freedom df, and the variance of the distribution is equal to twice the number of degrees of freedom:
E(χ²) = n, Var(χ²) = 2n
5. Unless the df is large, a χ² distribution is skewed to the right. As df increases, the χ² distribution looks more and more like a normal. Thus, for large df,
χ² ~ N(n, 2n) approximately, i.e. normal with mean n and variance 2n.
In general, for n ≥ 30, the probability of χ² taking a value greater than or less than a particular value can be approximated by using the normal area tables.
6. If χ1², χ2², χ3², ..., χk² are k independent χ² random variables with degrees of freedom n1, n2, n3, ..., nk, then their sum χ1² + χ2² + χ3² + ... + χk² also possesses a χ² distribution with df = n1 + n2 + n3 + ... + nk.
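The defining statement and properties 4 and 5 can be verified numerically. A small Python sketch (the df value and seed are arbitrary choices):

import random
import statistics

random.seed(5)
df = 10

# A chi-square variable with df = 10: the sum of 10 independent squared
# standard normal variables.
chi2 = [sum(random.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(10_000)]

# Mean should be near df and variance near 2 * df.
print(statistics.mean(chi2), statistics.variance(chi2))   # ~10 and ~20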

Summary
In this study session, we have discussed the meaning of Sampling Distribution,
the reasons why we study sampling distributions, Central Limit Theorem, Small
Sampling Distributions, Sampling Distribution of the Variance and many
others.
(J) Discussion Questions
1. What is sampling distribution?
2. Discuss the reasons why we study sampling distribution
3. What do you understand by central limit theorem?
4. Discuss the sampling distributions of the difference of sample proportions

References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and
Economics (10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson
Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.3.2.3 STUDY SESSION 11
Hypothesis
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Hypothesis
(B) The Null and the Alternative Hypothesis
(C) Approach to Testing Hypotheses
(D) Test Statistic and the Meaning & Interpretation of P-Value
(E) β and Power of the Test
(F) Sample Size Effect
(G) General Testing Procedure
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
You are welcome to study session eleven. In this study session, you will
understand the meaning of Hypothesis, the concept of Null and Alternative
Hypotheses, Interpretation of P-Value, Power of a Test & Size Effect and many
more.

Learning Outcomes
In this study session, you will be able to:
1. Explain the meaning of Hypothesis.
2. Discuss the concept of Null and Alternative Hypotheses.
3. Explain the meaning & Interpretation of P-Value, Power of a Test
& Size Effect and many more.
(A) Hypothesis
Testing of Hypotheses is one of the most important aspects of the theory of
decision-making. In this session, we will study a class of problems where the
decision made by a decision maker depends primarily on the strength of the
evidence thrown up by a random sample drawn from a population. We can
elaborate this by an example where the operations manager of a Cola company
has to decide whether the bottling operation is under statistical control or it has
gone out of control (and needs some corrective action). Imagine that the
company sells Cola in bottles labeled 1-liter, filled by an automatic bottling
machine. The implied claim that on the average each bottle contains 1,000
cm3 of cola may or may not be true.

(i) If the claim is true, the process is said to be under statistical control. It is
in the interest of the company to continue the bottling process
(ii) If the claim is not true i.e. the average is either more than or less than
1,000 cm3, the process is said to have gone out of control. It is in the
interest of the company to halt the bottling process and set right the error

Therefore, to decide about the status of the bottling operation, the operations
manager needs a tool, which allows him to test such a claim.

Testing of Hypotheses provides such a tool to the decision maker. If the operations manager were to use this tool, he would collect a sample of filled
bottles from the on-going bottling process. The sample of bottles will be
evaluated and based on the strength of the evidence produced by the sample; the
operations manager will accept or reject the implied claim and accordingly
make the decision. The implied claim (μ = 1,000 cm3) is a hypothesis that needs
to be tested and the statistical procedure, which allows us to perform such a test,
is called Hypothesis Testing or Testing of Hypotheses.
What is a Hypothesis?
A thesis is something that has been proven to be true. A hypothesis is something that has not yet been proven to be true. It is some statement about a population parameter or about a population distribution. Our hypothesis for the example of the bottling process could be:

“The average amount of cola in the bottles is equal to 1,000 cm3”

This statement is tentative as it implies some assumption, which may or may not
be found valid on verification. Hypothesis testing is the process of determining
whether or not a given hypothesis is true.

If the population is large, there is no way of analyzing the population or of testing the hypothesis directly. Instead, the hypothesis is tested on the basis of the outcome of a random sample.

ITQ: What is one of the most important aspects of the theory of decision making?
ITA: Testing of Hypotheses.

(B) The Null and the Alternative Hypothesis


As stated earlier, a hypothesis is a statement about a population parameter or
about a population distribution. In any testing of hypotheses problem, we are
faced with a pair of hypotheses such that one and only one of them is always
true. One of this pair is called the null hypothesis and the other one the
alternative hypothesis.

A null hypothesis is an assertion about the value of a population parameter. It is an assertion that we hold as true unless we have sufficient statistical evidence to conclude otherwise.

For example, a null hypothesis might assert that the population mean is equal to 1,000. Unless we obtain sufficient evidence that it is not 1,000, we will accept it as 1,000.

We write the null hypothesis compactly as:

H0: μ = 1,000

where the symbol H0 denotes the null hypothesis.

The alternative hypothesis is the negation of the null hypothesis.

For the null hypothesis H0: μ = 1,000, the alternative hypothesis is μ ≠ 1,000. We will write it as:

H1: μ ≠ 1,000

We use the symbol H1 (or Ha) to denote the alternative hypothesis.

The null and alternative hypotheses assert exactly opposite statements. Obviously, both H0 and H1 cannot be true, and one of them will always be true.
Thus, rejecting one is equivalent to accepting the other. At the end of our testing
procedure, if we come to the conclusion that H0 should be rejected, this also amounts to saying that H1 should be accepted, and vice versa. It is not difficult to identify the pair of hypotheses relevant in any decision situation. Can any one of the two be called the null hypothesis? The answer is a big “NO”, because the roles of H0 and H1 are not symmetrical.

The possible outcomes of a test can be summarized as:

Either: Accept H0 as a reasonable possibility – a weak conclusion, without any evidence in support of H0
Or: Reject H0 and accept H1 – a strong conclusion, with strong evidence against H0

To better understand the role of null and alternative hypotheses, we can compare the process of hypothesis testing with the process by which an accused person is judged to be innocent or guilty. The person before the bar is assumed to be “innocent until proven guilty”. So, using the language of hypothesis
H 0 : The person is innocent
H 1 : The person is guilty

The outcomes of the trial process may result in:

i. Accepting H0 of innocence: when there was not enough evidence to
convict. However, it does not prove that the person is truly innocent
ii. Rejecting H 0 and accepting H 1 of guilt: when there is enough evidence to
rule out innocence as a possibility and to strongly establish guilt

The jury acquitted Michael Jackson, on June 13, of all charges against him in
the child molestation case. In other words, using the language of hypothesis
testing the jury had to accept the null hypothesis

H 0 : Michael Jackson is innocent

because the prosecution could not prove their case against H0 of innocence.

In a trial case we do not have to rule out guilt in order to find someone innocent,
but we do have to rule out innocence in order to find someone guilty. On similar lines, we do not have to rule out H1 in order to accept H0; but we do
have to rule out H 0 in order to accept H 1 . Thus, it is clear that the two
hypotheses - null and alternative - are not interchangeable; each one plays a
different, a special role. So it becomes more important to be clear about what
the null and alternative hypotheses should be in a given situation, or else the test
is meaningless. One can conceptualize the whole procedure of testing of
hypothesis as trying to answer one basic question: Is the sample evidence
strong enough to enable us to reject H0? This means that H0 will be rejected
only when there is strong sample evidence against it. However, if the sample
evidence is not strong enough, we shall conclude that we cannot reject H 0 and so
we accept H 0 by default.

Thus, H0 is accepted even without any evidence in support of it, whereas it can be rejected only when there is overwhelming evidence against it. In other
words, the decision maker is somewhat biased towards the null hypothesis and
he does not mind accepting the null hypothesis. However, he would reject the
null hypothesis only when the sample evidence against the null hypothesis is too
strong to be ignored.

The null hypothesis is called by this name because in many situations,


acceptance of this hypothesis would lead to null action. Thus, one way to decide
what the null hypothesis should be is to note that…

…if the null hypothesis is true, then no corrective action would be necessary. If the alternative hypothesis is true, then some corrective action
would be necessary.

Recall our example of the Cola company in which an automatic bottling machine fills 1-liter bottles with Cola. Now consider three different situations:
Situation I: The operations manager wants to test the average amount filled, in
order to know whether the process is under statistical control.

In this situation, the operations manager will have to take corrective action
when the average is either more than or less than 1,000 cm3. Only when the
average equals 1,000 cm3, no corrective action is necessary. So we have:
H 0 : μ = 1,000 cm3
H 1 : μ ≠ 1,000 cm3

Situation II: A consumer advocate suspects that the average amount of Cola is
less than 1,000 cm3 and wants to test it.

In this situation, if the average amount of cola is greater than or equal to 1,000
cm3, no corrective action is needed, but if the average amount is less than 1,000
cm3, the company has to halt the bottling process and set right the error. So, in
this case, we have:
H 0 : μ ≥ 1,000 cm3
H 1 : μ <1,000 cm3

Situation III: The owner of the company suspects that the machine is wasting
cola by filling more than 1,000 cm3 on the average and wants to test it.

From the owner’s point of view, no corrective action is necessary if the average
is less than or equal to 1,000 cm3. And, therefore, in this case we have:
H 0 : μ ≤ 1,000 cm3
H 1 : μ >1,000 cm3

As the bottling example indicates, there are three possible cases for the null
hypothesis, involving ≥, ≤ and = relationships. The exact null hypothesis should
be finalised before any evidence is gathered, or the test will not be valid. Data
snooping - formulating the null and alternative hypotheses at one’s convenience
after collecting and looking at the evidence – is unethical.

ITQ: What is the difference between the null hypothesis and the alternative
hypothesis?
ITA: A null hypothesis is an assertion about the value of a population parameter. It is
an assertion that we hold as true unless we have sufficient statistical evidence to
conclude otherwise. The alternative hypothesis is the negation of the null hypothesis.

(C) Approach to Testing Hypotheses


We will now discuss some concepts, which are essential for setting up a
procedure for testing of hypotheses.

Type I and Type II Errors


After the null and alternative hypotheses are spelled out, the next step is to
gather evidence from a random sample of the population. An important
limitation of making inferences from the sample data is that we cannot be
100% confident about it. Since variations from one sample to another can
never be eliminated until the sample is as large as the population itself, it is
possible that the conclusion drawn is incorrect which leads to an error.

Type I Error
In the context of statistical testing, the wrong decision of rejecting a true null
hypothesis is known as Type I Error. If the operations manager rejects H0 and
conclude that the process has gone out of control, when in reality it is under
control, he would be making a type I error.
Type II Error
The wrong decision of accepting (not rejecting, to be more accurate) a false null
hypothesis is known as Type II Error. If the operations manager does not reject
H 0 and conclude that the process is under control, when in reality it has gone out
of control, he would be making a type II error.

Both the type I and type II errors are undesirable and should be reduced to the
minimum. Let us analyse how we can minimise the chances of type I and type II
errors. It may be easily realized that it is possible, even with imperfect sample
evidence, to reduce the probability of type I error all the way down to zero. Just
accept the null hypothesis, no matter what the evidence is. Since we will never
reject any null hypothesis, we will never reject a true null hypothesis and thus
we will never commit a type I error.

However, it is obvious that this would be foolish. If we always accept a null hypothesis, then given a false null hypothesis, no matter how wrong it is, we are
sure to accept it. In other words, our probability of committing a type II error
will be 1. Similarly, we find it foolish to reduce the probability of type II error all the way down to zero by always rejecting a null hypothesis, for we would then reject every true null hypothesis, no matter how right it is. Our probability of type I error will be 1. Therefore, we cannot and should not try to completely avoid either type of error. We should plan, organise, and settle for some small, optimal probability of each type of error. Before we discuss this issue, we need to learn a few more concepts.

ITQ: What is the difference between Type I Error and Type II Error?
ITA: The wrong decision of rejecting a true null hypothesis is known as Type I Error,
while the wrong decision of accepting (not rejecting, to be more accurate) a false null
hypothesis is known as Type II Error.
(D) Test Statistic and the Meaning & Interpretation of P-Value
Consider the case of owner’s suspicion related to our bottling process example.
The null and alternative hypotheses in this case are:
H 0 : μ ≤ 1,000
H 1 : μ >1,000

Suppose the population variance is 25 and a random sample of size 100 yields a
sample mean of 1,000.5. Because the sample mean is more than 1,000, the
evidence goes against the null hypothesis (H 0 ). Can we reject H 0 based on this
evidence?
i. If we reject it, there is some chance that we might be committing a type I
error, and
ii. If we accept it, there is some chance that we might be committing a type
II error.

Then what can we do? We should ask a natural question in this situation:
“What is the probability that H 0 can still be true despite the evidence?” The
question asks for the “credibility” of H 0 in light of unfavorable evidence.
However, due to mathematical complexities, it is not possible to compute the
probability that H 0 is true. We, therefore, settle for a question that comes very
close.

“When the actual μ = 1,000, and with sample size 100, what is the
probability of getting a sample mean that is more than or equal to 1000.5?”

The answer to this question is then taken as the “credibility rating” of H0. Analyzing the question carefully, we note an important aspect:
The condition assumed is μ = 1,000, although H0 states μ ≤ 1,000. The reason for assuming μ = 1,000 is that it gives the most benefit of doubt to H0. If we assume μ = 999, for instance, the probability of the sample mean being more than or equal to 1,000.5 will only be smaller, and H0 will only have less credibility. Thus, the assumption μ = 1,000 gives the maximum credibility to H0.

Now, using our knowledge of the sampling distribution of the sample mean, we can easily answer our question.

Since the population variance is known and the sample size is large enough, the Central Limit Theorem is applicable here, that is,

X̄ ~ N(μ, σ²/n)

and the standard normal variable Z = (X̄ − μ) / (σ/√n) is to be used to calculate the required probability P(X̄ ≥ 1000.5). With σ = 5 and n = 100:

P(X̄ ≥ 1000.5) = P(Z ≥ (1000.5 − 1000) / (5/√100))
= P(Z ≥ 1.00)
= 0.1587
≈ 0.16
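The same figure can be checked with Python's standard-library NormalDist (a tooling choice; a normal area table gives the same value):

from statistics import NormalDist

mu, sigma, n, x_bar = 1000, 5, 100, 1000.5

z = (x_bar - mu) / (sigma / n ** 0.5)   # = (1000.5 - 1000) / 0.5 = 1.00
p_value = 1 - NormalDist().cdf(z)       # P(Z >= 1.00), the right-tail area
print(z, round(p_value, 4))             # 1.0 0.1587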

So the answer to our question is 16%. That is, there is a 16% chance for a sample of size 100 to yield a sample mean more than or equal to 1000.5 when the actual μ = 1,000. Statisticians call this 16% the p-value. In other words, the p-value – the probability of observing a sample statistic as extreme as the one observed if the null hypothesis is true – is a kind of “credibility rating” of H0 in light of the evidence. A p-value of zero means H0 is certainly false and a p-value of 1 means that H0 is certainly true. A p-value of 16% means that there is roughly a 16% probability that H0 is true, despite the evidence. Conversely, we can be roughly 84% confident that H0 is false in light of the evidence. The implication is that if we reject H0, then there is about an 84% chance that we are doing the right thing, and about a 16% chance that we are committing a type I error. The formal definition of the p-value is as follows:

Given a null hypothesis and sample evidence with sample size n, the p-value is
the probability of getting a sample evidence with the same n that is equally or
more unfavorable to the null hypothesis while the null hypothesis is actually
true. The p-value is calculated giving the null hypothesis the maximum benefit
of doubt.

The random variable, as Z in this case, used to calculate the p-value is called the test statistic. The formal definition of the test statistic is as follows:

A test statistic is a random variable calculated from the sample evidence, which
follows a well-known distribution and thus can be used to calculate the p-value.

Most of the time, the test statistic we use will be Z, t, χ2 or F. The distributions
of these random variables are well known and we can calculate the p-value.

Up to this point it is very much clear that statistical hypothesis is always stated
with reference to a population parameter (mean, proportion or variance). The
appropriate random variable calculated from the sample evidence acts as a test
statistic and provides the means to decide whether the statistical hypothesis is to be
rejected or accepted.
The Significance Level-α
From our discussion on p-value, it becomes clear that the p-value of a test i.e.
the credibility of the null hypothesis varies with actual observed value of the
sample statistic. This fact necessitates having a policy for rejecting H0 based on
p-value.

The most common policy in statistical hypothesis testing is to establish a significance level, denoted by α, and to reject H0 when the p-value falls below
it. When this policy is followed, one can be sure that the maximum probability
of type I error is α.

Policy: When the p-value is less than α, reject H 0

In other words, we can say that the rejection region for H 0 is the area under the
curve where the p-value is less than α. This region is also called the critical region.

The standard values for α are 10%, 5%, and 1%. Suppose α is set at 5%. In the
preceding example, for a sample mean of 1,000.5 the p-value was 16%, and
H 0 will not be rejected. For a sample mean of 1001 the p-value will be 2.28%,
which is below α = 5%. Hence H 0 will be rejected.

Let us analyse in some detail the implications of using a significance level α for
rejecting a null hypothesis.

i. The first thing to note is that if we do not reject H 0 , this does not prove
that H 0 is true. For example, if α = 5% and the p-value = 6%, we will
not reject H 0 . But there is only about 6% chance that H 0 is true, which is
hardly proof that H 0 is true. It may be possible that H 0 is false and by not
rejecting it, we are committing a type II error. For this reason, we should
say “We cannot reject H0 at an α of 5%” rather than “We accept H0”.

ii. The second thing to note is that α is the maximum probability of type I
error we set for ourselves. Since α is the maximum p-value at which we
reject H 0 , it is the maximum probability of committing a type I error. In
other words, setting α = 5% means that we are willing to put up with up
to 5% chance of committing a type I error.
iii. The third thing to note is that the selected value of α indirectly determines
the probability of type II error as well. In general, other things
remaining the same, increasing the value of α will decrease the
probability of type II error. This should be intuitively obvious. For
example, increasing α from 5% to 10% means that in those instances with
p-value in the range 5% to 10% the H 0 that would not have been rejected
before would now be rejected. Thus, some cases of false H 0 that escaped
rejection before may not escape now. As a result, the probability of type
II error will decrease
iv. The fourth thing to note about α is the meaning of (1 - α). If we set α =
5%, then (1 - α) = 95% is the minimum confidence level that we set in
order to reject H0. In other words, we want to be at least 95% confident
that H 0 is false before we reject it.

One-Tailed and Two-Tailed Tests


Consider the null and alternative hypotheses:
H 0 : μ ≥ 1,000
H 1 : μ <1,000
In this case, we will reject H0 only when X̄ is significantly less than 1,000 or
only when Z falls significantly below zero. Thus the rejection occurs only when
Z takes a significantly low value in the left tail of its distribution.

Such a case where rejection occurs in the left tail of the distribution of the test
statistic is called a left-tailed test, as seen in the figure below.

[Figure: A left-tailed test]


In the case of a left-tailed test, the p-value is the area to the left of the
calculated value of the test statistic.

Now consider the case where the null and alternative hypotheses are:

H 0 : μ ≤ 1,000
H 1 : μ >1,000

In this case, we will reject H0 only when X̄ is significantly more than 1,000 or only when Z is significantly greater than
zero. Thus the rejection occurs only when Z takes a significantly high value in
the right tail of its distribution.

Such a case where rejection occurs in the right tail of the distribution of the test
statistic is called a right-tailed test, as seen in the figure below.
[Figure: A right-tailed test]
In the case of a right-tailed test, the p-value is the area to the right of the
calculated value of the test statistic.

In left-tailed and right-tailed tests, rejection occurs only on one tail. Hence
each of them is called a one-tailed test.

Finally, consider the case where the null and alternative hypotheses are:
H0: μ = 1,000
H1: μ ≠ 1,000

In this case, we have to reject H0 in both cases, that is, whether X̄ is significantly
less than or greater than 1,000. Thus, rejection occurs when Z is significantly
less than or greater than zero, which is to say that rejection occurs on both tails.
Therefore, this case is called a two-tailed test. See the figure below, where the
shaded areas are the rejection regions.

A Two-tailed Test figure


In the case of a two-tailed test, the p-value is twice the tail area. If the
calculated value of the test statistic falls on the left tail, then we take the area to
the left of the calculated value and multiply it by 2. If the calculated value of the
test statistic falls on the right tail, then we take the area to the right of the
calculated value and multiply it by 2. For example, if the calculated Z = +1.75,
the area to the right of it is 0.0401. Multiplying that by 2, we get the p-value
as 0.0802.
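The doubling of the tail area can be checked in a couple of lines of Python; norm.sf gives the standard normal right-tail area used above.

# Two-tailed p-value for an observed Z of +1.75.
from scipy.stats import norm

z = 1.75
tail_area = norm.sf(abs(z))  # area beyond the observed value: ~0.0401
p_value = 2 * tail_area      # doubled for a two-tailed test: ~0.0802
print(round(tail_area, 4), round(p_value, 4))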

Selecting Optimal α
All tests of hypotheses hinge upon this concept of the significance level and it is
possible that a null hypothesis can be rejected at α = 5% whereas the same
evidence is not strong enough to reject the null hypothesis at α = 1%. In other
words, the inference drawn can be sensitive to the significance level used. We
should note that selecting a value for α is a question of compromise between
type I and type II error probabilities. In practice, the significance level is
supposed to be arrived at after considering the cost consequences of type I error
and type II error. However, most of the time the costs are difficult to estimate
since they depend, among other things, on the unknown actual value of the
parameter being tested. Thus, arriving at a “calculated” optimal value for α is
impractical. Instead, we follow an intuitive approach of assigning one of the
three standard values, 1%, 5%, and 10%, to α.

In the intuitive approach, we try to estimate the relative costs of the two types of
errors. For example, suppose we are testing the average tensile strength of a
large batch of bolts produced by a machine to see if it is above the minimum
specified. Here type I error will result in rejecting a good batch of bolts and the
cost of the error is roughly equal to the cost of the batch of bolts. Type II error
will result in accepting a bad batch of bolts and its cost can be high or low
depending on how the bolts are used.

If the bolts are used to hold together a structure, then the cost is high because
defective bolts can result in the collapse of the structure, causing great damage.
In this case, we should strive to reduce the probability of type II error more than
that of type I error. In such cases where type II error is more costly, we keep
a large value for α, namely, 10%.

On the other hand, if the bolts are used to secure the lids on trash cans, then the
cost of type II error is not high and we should strive to reduce the probability of
type I error more than that of type II error. In such cases where type I error is
more costly, we keep a small value for α, namely, 1%.

Then there are cases where we are not able to determine which type of error is
more costly. If the costs are roughly equal, or if we have not much
knowledge about the relative costs of the two types of errors, then we keep
α = 5%.

(D) β and the Power of the Test


The probability of a type II error is denoted by β. A type II error is committed
when a wrong decision is taken by accepting a false null hypothesis; β is thus
the probability of accepting H0 when it should have been rejected for being
false. It should be noted that β depends on the actual value of the parameter
being tested, the sample size, and α. Let us see exactly how it depends on them.

Consider the null and alternative hypotheses:

H0: μ ≤ 1,000
H1: μ > 1,000

Type II Error figure: H0: μ ≤ 1,000 and actual μ = 1,002

Suppose the actual value of μ = μ1 (say 1,002), such that μ1 > 1,000. Obviously,
H0 is false. The cross-hatched area under the normal curve centered at μ1 in the
figure above is then the probability of accepting H0 when it is false. This area,
lying in the acceptance region of the normal curve centered at μ0 = 1,000,
represents the probability that the observed sample mean X̄ falls in the
acceptance region when μ = μ1 (1,002), that is, when H0 is false.

Given the acceptance region (1 − α) for the normal curve centered at μ = μ0 =
1,000, a careful analysis of the figure reveals the following:
i. The value of β decreases as μ1 moves away from μ0, displacing the entire
normal curve centered at μ1 farther and farther away from the normal
curve centered at μ0.
ii. The value of β tends to increase as μ1 moves nearer to μ0. A limit is
reached when μ1 coincides with μ0, and the entire acceptance region (1 −
α) for μ = μ0 will represent the value of β. This is an important conclusion
in the sense that when H0 is true for μ = μ0, the entire acceptance region
represents β. Hence when H0 is true, β = 1 − α and α = 1 − β.
iii. The un-shaded area under the normal curve centered at μ1, which falls
outside the acceptance region for μ = μ0, represents the probability of
rejecting H0 when it is false for μ = μ1. This complement of β, (1 − β), is
known as the power of the test.
The power of a test is the probability that a false null hypothesis will
be detected by the test.

iv. A change in the level of significance α means a change in the acceptance
region (1 − α), which obviously implies a change in the cross-hatched area,
i.e. β. In other words, the smaller the α, the larger the β, and vice-versa.
Type I and type II errors are, therefore, negatively related.

Type I error and the power of the test (1 − β) are, however, positively
related. Thus, the smaller the probability (α) of rejecting H0 when it is
true, the smaller is the probability (1 − β) of rejecting H0 when it is false.
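How β, and hence the power (1 − β), depends on the actual mean, on α and on the sample size can be computed directly. Below is a minimal Python sketch for the right-tailed test of H0: μ ≤ 1,000 against an assumed actual mean μ1 = 1,002; the values σ = 5 and n = 100 are hypothetical, chosen only for illustration.

# Computing beta and the power of a right-tailed Z-test.
import math
from scipy.stats import norm

mu0, mu1, sigma, n, alpha = 1000, 1002, 5, 100, 0.05
se = sigma / math.sqrt(n)

cutoff = mu0 + norm.ppf(1 - alpha) * se     # boundary of the acceptance region
beta = norm.cdf(cutoff, loc=mu1, scale=se)  # P(accept H0 | mu = mu1)
power = 1 - beta                            # P(reject H0 | mu = mu1)
print(f"cutoff = {cutoff:.3f}, beta = {beta:.4f}, power = {power:.4f}")

Raising α or raising n shrinks β, in line with the points above and the sample size discussion that follows.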

(E) Sample Size Effect


In the discussion above we said that we can keep α low or β low depending on
which type of error is more costly. What if both types of error are costly and we
want to have a low α as well as a low β? The only way to do this is to make our
evidence more reliable, which can be done only by increasing the sample size.
If the sample size increases, then the evidence becomes more reliable and the
probability of any error will decrease.

The figure below shows the relationship between α and β for various values of
the sample size n. As n increases, the curve shifts downwards, reducing both α and
β. Thus, when the costs of both types of error are high, the best policy is to have
a large sample and a low α, such as 1%.
β versus α for various values of n figure
After understanding the basic concepts of testing of hypotheses, we are now
able to develop tests concerning different population parameters. Under
different conditions the test procedures have to be developed differently, and
different test statistics are used for testing. Before proceeding further, let us
define the critical region in terms of the test statistic, which is often more helpful in
many situations.

Critical Region in Terms of Test Statistic


We have seen that the most common policy in statistical hypothesis testing is to
establish a significance level α. We decide to reject or not to reject the null
hypothesis H0 by comparing the p-value with the significance level. We define
the critical or rejection region as:

Critical Region: p-value < α


But in many situations we find it more useful to define the critical region in
terms of the test statistic. We then decide to reject or not to reject the null
hypothesis H0 by comparing the observed value of the test statistic with the cut-
off value, or critical value, of the test statistic.
Z-test
When, in the testing of hypotheses, we use the random variable Z for calculating
the p-value and for defining the critical region of the test, we call the test a Z-
test. The critical regions in terms of Z are summarized in the table below:

Critical Region of Z-test Table

Test           Critical Region
Left-tailed    Z < −Z_α
Right-tailed   Z > Z_α
Two-tailed     Z > Z_{α/2} or Z < −Z_{α/2}

t-test
When, in the testing of hypotheses, we use the random variable t for calculating
the p-value and for defining the critical region of the test, we call the test a t-
test. The critical regions in terms of t are summarized in the table below:

Critical Region of t-test Table

Test           Critical Region
Left-tailed    t < −t_α
Right-tailed   t > t_α
Two-tailed     t > t_{α/2} or t < −t_{α/2}

χ²-test
When, in the testing of hypotheses, we use the random variable χ² for calculating
the p-value and for defining the critical region of the test, we call the test a χ²-
test. The critical regions in terms of χ² are summarized in the table below:

Critical Region of χ²-test Table

Test           Critical Region
Left-tailed    χ² < χ²_{1−α}
Right-tailed   χ² > χ²_α
Two-tailed     χ² > χ²_{α/2} or χ² < χ²_{1−α/2}

F-test
When, in the testing of hypotheses, we use the random variable F for calculating
the p-value and for defining the critical region of the test, we call the test an F-
test. The critical regions in terms of F are summarized in the table below:

Critical Region of F-test Table

Test           Critical Region
Left-tailed    F < F_{1−α}(n1 − 1, n2 − 1), i.e. F < 1 / F_α(n2 − 1, n1 − 1)
Right-tailed   F > F_α(n1 − 1, n2 − 1)
Two-tailed     F > F_{α/2}(n1 − 1, n2 − 1) or F < F_{1−α/2}(n1 − 1, n2 − 1),
               i.e. F < 1 / F_{α/2}(n2 − 1, n1 − 1)
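Rather than reading printed tables, all the cut-off values in the four tables above can be obtained from the quantile (ppf) functions in scipy.stats. The degrees of freedom used below are hypothetical examples.

# Critical values for the Z-, t-, chi-square and F-tests at alpha = 0.05.
from scipy.stats import norm, t, chi2, f

alpha = 0.05

print(norm.ppf(1 - alpha))             # Z_alpha, right-tailed Z (~1.645)
print(norm.ppf(1 - alpha / 2))         # Z_{alpha/2}, two-tailed Z (~1.960)
print(t.ppf(1 - alpha, df=9))          # t_alpha with 9 degrees of freedom
print(chi2.ppf(1 - alpha, df=9))       # chi-square right-tail cut-off
print(chi2.ppf(alpha, df=9))           # chi-square_{1-alpha}, left-tail cut-off
print(f.ppf(1 - alpha, dfn=9, dfd=9))  # F_alpha(n1-1, n2-1) with n1 = n2 = 10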
(F) General Testing Procedure
We have learnt a number of important concepts about hypothesis testing. We
are now in a position to lay down a general testing procedure in a more
systematic way. By now it should be clear that there are basically two phases in
the testing of hypotheses: in the first phase, we design the test and set up the
conditions under which we shall reject the null hypothesis; in the second phase,
we use the sample evidence and draw our conclusion as to whether the null
hypothesis can be rejected. The detailed steps involved are as follows:

Step 1: State the null and the alternative hypotheses, i.e. H0 and H1.
Step 2: Specify a level of significance α.
Step 3: Choose the test statistic and define the critical region in terms of the test
statistic.
Step 4: Make the necessary computations:
i. Calculate the observed value of the test statistic.
ii. Find the p-value of the test.
Step 5: Decide to accept or reject the null hypothesis, either:
i. By comparing the p-value with α, or
ii. By comparing the observed value of the test statistic with the cut-off
value or the critical value of the test statistic.
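A minimal Python sketch of the five steps for a two-tailed Z-test follows; the sample results (x̄ = 1,001.2, σ = 5, n = 100) are assumed values used only for illustration. The two decision rules of Step 5 always agree.

# The five-step procedure for testing H0: mu = 1000 against H1: mu != 1000.
import math
from scipy.stats import norm

mu0 = 1000                        # Step 1: H0: mu = 1000, H1: mu != 1000
alpha = 0.05                      # Step 2: significance level
z_crit = norm.ppf(1 - alpha / 2)  # Step 3: critical region is |Z| > Z_{alpha/2}

x_bar, sigma, n = 1001.2, 5, 100  # Step 4: computations on the sample evidence
z = (x_bar - mu0) / (sigma / math.sqrt(n))
p_value = 2 * norm.sf(abs(z))

# Step 5: the decision, by either rule
print("reject H0" if p_value < alpha else "do not reject H0")
print("reject H0" if abs(z) > z_crit else "do not reject H0")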

Summary
In this study session, we have discussed the meaning of Hypothesis, the concept
of Null and Alternative Hypotheses, Interpretation of P-Value, sample size
effect and general testing procedure.

(G) Discussion Questions


1. What is the difference between regressing y on x and regressing x on y?
2. State the difference between correlation and regression.
3. Why is there a need to conduct diagnostic analysis?
4. Discuss the approaches to testing hypotheses.
5. What is the difference between Type I and Type II errors?
6. Discuss the differences between one-tailed and two-tailed tests.

References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K. and Brian, W. (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.3.2.4 STUDY SESSION 12
Correlation
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Correlation
(B) Correlation Analysis
(C) Limitations of Correlation Analysis
(D) Discussion Questions
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
You are welcome to another study session. In this study session, you will
understand the meaning of Correlation, Correlation Analysis, Pearson’s
Coefficient Correlation, Spearman’s Rank Coefficient Correlation, Concurrent
Deviation Coefficient as well as the Limitations of Correlation Analysis.

Learning Outcomes
In this study session, you will be able to:
1. Explain the meaning of Correlation and Correlation Analysis.
2. Discuss Pearson's Coefficient of Correlation and Spearman's
Rank Correlation Coefficient.
3. Discuss the Concurrent Deviation Coefficient as well as the
Limitations of Correlation Analysis.
(A) Correlation
Statistical methods of measures of central tendency, dispersion, skewness and
kurtosis are helpful for the purpose of comparison and analysis of distributions
involving only one variable i.e. univariate distributions. However, describing
the relationship between two or more variables is another important part of
statistics.

In many business research situations, the key to decision making lies in
understanding the relationships between two or more variables. For example, in
an effort to predict the behavior of the bond market, a broker might find it
useful to know whether the interest rate of bonds is related to the prime interest
rate. While studying the effect of advertising on sales, an account executive may
find it useful to know whether there is a strong relationship between advertising
dollars and sales dollars for a company.

The statistical methods of Correlation (to be discussed in this study session)
and Regression (to be discussed in the next study session) are helpful in
knowing the relationship between two or more variables which may be related
in some way, like the interest rate of bonds and the prime interest rate;
advertising expenditure and sales; income and consumption; crop yield and
fertilizer used; heights and weights; and so on.

In all these cases involving two or more variables, we may be interested in
seeing:
i. If there is any association between the variables;
ii. If there is an association, whether it is strong enough to be useful;
iii. If so, what form the relationship between the two variables takes;
iv. How we can make use of that relationship for predictive purposes, that is,
forecasting; and
v. How good such predictions will be.

ITQ: What is one of the important parts of statistics?


ITA: Describing the relationship between two or more variables. This is so because
the key to decision making often lies in understanding the relationships between two or
more variables. The statistical methods of Correlation and Regression are helpful
in knowing the relationship between two or more variables which may be related in
some way.

Since these issues are interrelated, correlation and regression analysis, as two
sides of a single process, consist of methods of examining the relationship
between two or more variables. If two (or more) variables are correlated, we can
use information about one (or more) variable(s) to predict the value of the other
variable(s), and can measure the error of estimation - a job of regression
analysis.

What is Correlation?
Correlation is a measure of association between two or more variables. When
two or more variables vary in sympathy, so that movement in one tends to be
accompanied by corresponding movements in the other variable(s), they are
said to be correlated.

"The correlation between variables is a measure of the nature and degree
of association between the variables".

As a measure of the degree of relatedness of two variables, correlation is
widely used in exploratory research when the objective is to locate variables
that might be related in some way to the variable of interest.
Types of Correlation
Correlation can be classified in several ways. The important ways of classifying
correlation are:
(i) Positive and negative,
(ii) Linear and non-linear (curvilinear) and
(iii) Simple, partial and multiple.

Positive and Negative Correlation


If both the variables move in the same direction, we say that there is a positive
correlation i.e., if one variable increases, the other variable also increases on an
average or if one variable decreases, the other variable also decreases on an
average.

On the other hand, if the variables are varying in opposite direction, we say that
it is a case of negative correlation; e.g., movements of demand and supply.

Linear and Non-linear (Curvilinear) Correlation


If the change in one variable is accompanied by change in another variable in a
constant ratio, it is a case of linear correlation. Observe the following data:
X: 10 20 30 40 50
Y: 25 50 75 100 125

The ratio of change in the above example is the same. It is, thus, a case of linear
correlation. If we plot these variables on graph paper, all the points will fall on
the same straight line.

On the other hand, if the amount of change in one variable does not follow a
constant ratio with the change in another variable, it is a case of non-linear or
curvilinear correlation. If a couple of figures in either series X or series Y are
changed, it would give a non-linear correlation.

Simple, Partial and Multiple Correlation


The distinction amongst these three types of correlation depends upon the
number of variables involved in a study. If only two variables are involved in a
study, then the correlation is said to be simple correlation. When three or more
variables are involved in a study, then it is a problem of either partial or
multiple correlation. In multiple correlation, three or more variables are studied
simultaneously. But in partial correlation we consider only two variables
influencing each other while the effect of other variable(s) is held constant.

Suppose we have a problem comprising three variables X, Y and Z. X is the


number of hours studied, Y is I.Q. and Z is the number of marks obtained in the
examination. In a multiple correlation, we will study the relationship between
the marks obtained (Z) and the two variables, number of hours studied (X) and
I.Q. (Y). In contrast, when we study the relationship between X and Z, keeping
an average I.Q. (Y) as constant, it is said to be a study involving partial
correlation.

In this study session, we will study linear correlation between two variables.

Correlation Does Not Necessarily Mean Causation


The correlation analysis, in discovering the nature and degree of relationship
between variables, does not necessarily imply any cause and effect relationship
between the variables. Two variables may be related to each other, but this does
not mean that one variable causes the other. For example, we may find that
logical reasoning and creativity are correlated, but that does not mean that if we
could increase people's logical reasoning ability, we would produce greater
creativity. We need to conduct an actual experiment to unequivocally
demonstrate a causal relationship. But if it is true that influencing someone's
logical reasoning ability does influence their creativity, then the two variables
must be correlated with each other. In other words, causation always implies
correlation; however, the converse is not true.

Let us see some situations-


1. The correlation may be due to chance particularly when the data pertain
to a small sample. A small sample bivariate series may show the
relationship but such a relationship may not exist in the universe.
2. It is possible that both the variables are influenced by one or more other
variables. For example, expenditure on food and entertainment for a
given number of households show a positive relationship because both
have increased over time. But, this is due to rise in family incomes over
the same period. In other words, the two variables have been influenced
by another variable - increase in family incomes.
3. There may be another situation where both the variables may be
influencing each other so that we cannot say which is the cause and
which is the effect. For example, take the case of price and demand. The
rise in price of a commodity may lead to a decline in the demand for it.
Here, price is the cause and the demand is the effect. In yet another
situation, an increase in demand may lead to a rise in price. Here, the
demand is the cause while price is the effect, which is just the reverse of
the earlier situation. In such situations, it is difficult to identify which
variable is causing the effect on which variable, as both are influencing
each other.
The foregoing discussion clearly shows that correlation does not indicate any
causation or functional relationship. The correlation coefficient is merely a
measure of mathematical relationship and has nothing to do with cause and
effect; it only reveals co-variation between two variables. Even when there is
no cause-and-effect relationship in a bivariate series and one interprets the
relationship as causal, such a correlation is called spurious or non-sense
correlation. Obviously, this will be misleading. As such, one has to be very
careful in correlation exercises and look into other relevant factors before
concluding a cause-and-effect relationship.

ITQ: What is correlation?


ITA: Correlation is a measure of association between two or more variables.

(B) Correlation Analysis


Correlation Analysis is a statistical technique used to indicate the nature and
degree of relationship existing between one variable and the other(s). It is also
used along with regression analysis to measure how well the regression line
explains the variations of the dependent variable with the independent variable.

The commonly used methods for studying the linear relationship between two
variables involve both graphic and algebraic methods. Some of the widely
used methods include:
1. Scatter Diagram
2. Correlation Graph
3. Pearson’s Coefficient of Correlation
4. Spearman’s Rank Correlation
5. Concurrent Deviation Method
1. Scatter Diagram
This method is also known as a Dotogram or Dot diagram. The scatter diagram is
one of the simplest methods of diagrammatic representation of a bivariate
distribution. Under this method, both variables are plotted on graph
paper by putting dots. The diagram so obtained is called a "Scatter Diagram". By
studying the diagram, we can have a rough idea about the nature and degree of
the relationship between the two variables. The term scatter refers to the spreading
of the dots on the graph. We should keep the following points in mind while
interpreting correlation:
i. If the plotted points are very close to each other, it indicates a high degree
of correlation. If the plotted points are far away from each other, it indicates
a low degree of correlation.
Scatter Diagrams Figure
ii. If the points on the diagram reveal any trend (either upward or
downward), the variables are said to be correlated and if no trend is
revealed, the variables are uncorrelated.
iii. If there is an upward trend rising from lower left hand corner and going
upward to the upper right hand corner, the correlation is positive since
this reveals that the values of the two variables move in the same
direction. If, on the other hand, the points depict a downward trend from
the upper left hand corner to the lower right hand corner, the correlation
is negative since in this case the values of the two variables move in the
opposite directions.
iv. In particular, if all the points lie on a straight line starting from the left
bottom and going up towards the right top, the correlation is perfect and
positive; and if all the points lie on a straight line starting from the left top
and coming down to the right bottom, the correlation is perfect and negative.

ITQ: What is one of the simplest methods of diagrammatic representation of a
bivariate distribution?
ITA: The scatter diagram is one of the simplest methods of diagrammatic
representation of a bivariate distribution.

The various diagrams of the scattered data in the above figure depict different
forms of correlation.

Example:
Given the following data on sales (in thousand units) and expenses (in thousand
naira) of a firm for 10 month:
Month:    J  F  M  A  M  J  J  A  S  O
Sales:    50 50 55 60 62 65 68 60 60 50
Expenses: 11 13 14 16 16 15 15 14 13 13
a) Make a Scatter Diagram
b) Do you think that there is a correlation between sales and expenses of the
firm? Is it positive or negative? Is it high or low?

Solution:
a) The Scatter Diagram of the given data is shown in the figure below:

Scatter Diagram Figure (Expenses plotted against Sales)


b) The figure above shows that the plotted points are close to each other and
reveal an upward trend. So there is a high degree of positive correlation
between sales and expenses of the firm.

2. Correlation Graph
This method, also known as a Correlogram, is very simple. The data pertaining to
the two series are plotted on a graph sheet. We can find out the correlation by
examining the direction and closeness of the two curves. If both curves drawn
on the graph are moving in the same direction, it is a case of positive
correlation. On the other hand, if both curves are moving in opposite
directions, the correlation is said to be negative. If the graph does not show any
definite pattern on account of erratic fluctuations in the curves, then it shows an
absence of correlation.
Example:
Find out graphically if there is any correlation between yield per plot
(qtls), denoted by Y, and quantity of fertilizer used (kg), denoted by X.

Plot No.: 1   2   3   4   5   6   7   8   9   10
Y:        3.5 4.3 5.2 5.8 6.4 7.3 7.2 7.5 7.8 8.3
X:        6   8   9   12  10  15  17  20  18  24

Solution:
The Correlogram of the given data is shown in the figure below:

Correlation Graph Figure (X and Y plotted against Plot Number)

The figure above shows that the two curves move in the same direction and,
moreover, are very close to each other, suggesting a close relationship
between yield per plot (qtls) and quantity of fertilizer used (kg).

Remark: Both the graphic methods - scatter diagram and correlation graph -
provide a 'feel' for the data by giving a visual representation of the
association between the variables. They are readily comprehensible and enable
us to form a fairly good, though rough, idea of the nature and degree of the
relationship between the two variables. However, these methods are unable to
quantify the relationship between them. To quantify the extent of correlation,
we make use of algebraic methods, which calculate a correlation coefficient.

3. Pearson's Coefficient of Correlation


A mathematical method for measuring the intensity or the magnitude of linear
relationship between two variables was suggested by Karl Pearson (1857-
1936), a great British biometrician and statistician, and it is by far the most
widely used method in practice. Karl Pearson's measure, known as the Pearsonian
correlation coefficient between two variables X and Y, usually denoted by
r(X,Y), or r_xy, or simply r, is a numerical measure of the linear relationship between
them and is defined as the ratio of the covariance between X and Y to the
product of the standard deviations of X and Y:

r_xy = Cov(X, Y) / (σ_X σ_Y)

So the Pearsonian correlation coefficient may be found as

r_xy = [N ΣXY − (ΣX)(ΣY)] / √{[N ΣX² − (ΣX)²][N ΣY² − (ΣY)²]}

and, in terms of deviations from the means (d_x = X − X̄, d_y = Y − Ȳ),

r_xy = Σd_x d_y / √(Σd_x² · Σd_y²)

Remark: If (i) X and Y are fractional, or (ii) X and Y assume large values,
the formula r_xy = [N ΣXY − (ΣX)(ΣY)] / √{[N ΣX² − (ΣX)²][N ΣY² − (ΣY)²]}
is not generally used for numerical problems. In such cases, the step deviation
method, where we take the deviations of the variables X and Y from arbitrary
points, is used. We will discuss this method in the properties of the correlation
coefficient.

Properties of the Pearsonian Correlation Coefficient


The following are important properties of the Pearsonian correlation coefficient:

1. The Pearsonian correlation coefficient cannot exceed 1 numerically. In
other words, it lies between −1 and +1. Symbolically,
−1 ≤ r ≤ 1
Remarks:
i. This property provides us a check on our calculations. If, in any problem,
the obtained value of r lies outside the limits ±1, this implies that there is
some mistake in our calculations.
ii. The sign of r indicates the nature of the correlation. A positive value of r
indicates positive correlation, whereas a negative value indicates negative
correlation; r = 0 indicates absence of correlation.
iii. The following table sums up the degrees of correlation corresponding to
various values of r:

Value of r           Degree of Correlation
±1                   Perfect correlation
±0.90 or more        Very high degree of correlation
±0.75 to ±0.90       Sufficiently high degree of correlation
±0.60 to ±0.75       Moderate degree of correlation
±0.30 to ±0.60       Only the possibility of a correlation
less than ±0.30      Possibly no correlation
0                    Absence of correlation

2. The Pearsonian correlation coefficient is independent of the change of
origin and scale. Mathematically, if the given variables X and Y are
transformed to new variables U and V by a change of origin and scale, i.e.

U = (X − A) / h and V = (Y − B) / k

where A, B, h and k are constants and h > 0, k > 0, then the correlation
coefficient between X and Y is the same as the correlation coefficient between U
and V, i.e.,
r(X,Y) = r(U,V), or r_xy = r_uv

Remark: This is one of the very important properties of the correlation
coefficient and is extremely helpful in the numerical computation of r. We have
already stated that the formula
r_xy = [N ΣXY − (ΣX)(ΣY)] / √{[N ΣX² − (ΣX)²][N ΣY² − (ΣY)²]}
becomes quite tedious to use in numerical problems if X and/or Y are in fractions
or if X and Y are large. In such cases, we can conveniently change the origin and
scale (if possible) in X and/or Y to get new variables U and V and compute the
correlation between U and V by the formula below:

r_xy = r_uv = [N ΣUV − (ΣU)(ΣV)] / √{[N ΣU² − (ΣU)²][N ΣV² − (ΣV)²]}

3. Two independent variables are uncorrelated, but the converse is not true.
If X and Y are independent variables, then
r_xy = 0

However, the converse of the theorem is not true, i.e., uncorrelated variables
need not necessarily be independent. As an illustration, consider the following
bivariate distribution:
X: 1 2 3 -3 -2 -1
Y: 1 4 9 9 4 1
For this distribution, the value of r will be 0.

Hence, in the above example, the variables X and Y are uncorrelated. But if we
examine the data carefully, we find that X and Y are not independent but are
connected by the relation Y = X². The above example illustrates that
uncorrelated variables need not be independent.

Remark: One should not confuse zero correlation with independence.
r_xy = 0, i.e., zero correlation between the variables X and Y, simply
implies the absence of any linear (straight-line) relationship between them. They
may, however, be related in some form other than a straight line, e.g.,
quadratic (as we have seen in the above example), logarithmic or trigonometric
form.

4. The Pearsonian coefficient of correlation is the geometric mean of the two
regression coefficients, i.e.

r_xy = ±√(b_xy · b_yx)

The signs of both regression coefficients are the same, and so the value of r
will also have the same sign.

This property will be dealt with in detail in the next study session on
Regression Analysis.

5. The square of the Pearsonian correlation coefficient is known as the
coefficient of determination.

The coefficient of determination, which measures the percentage variation in the
dependent variable that is accounted for by the independent variable, is a much
better and more useful measure for interpreting the value of r. This property will
also be dealt with in detail in the next study session.

Probable Error of the Correlation Coefficient


The correlation coefficient establishes the relationship between the two variables.
After ascertaining this level of relationship, we may be interested to find the
extent up to which this coefficient is dependable. The probable error of the
correlation coefficient is such a measure for testing the reliability of the observed
value of the correlation coefficient, when we consider it as satisfying the
conditions of random sampling.

If r is the observed value of the correlation coefficient in a sample of N pairs of
observations for the two variables under consideration, then the probable error,
denoted by PE(r), is expressed as

PE(r) = 0.6745 SE(r), i.e.

PE(r) = 0.6745 × (1 − r²) / √N

There are two main functions of the probable error:

a. Determination of limits: The limits of the population correlation
coefficient are r ± PE(r), implying that if we take another random sample
of size N from the same population, then the observed value of the
correlation coefficient in the second sample can be expected to lie within
the limits given above with probability 0.5. When the sample size N is small,
the concept or value of PE may lead to wrong conclusions; to use
the concept of PE effectively, the sample size N should be fairly large.
b. Interpretation of r:
i. If r < PE(r), there is no evidence of correlation, i.e. a case of
insignificant correlation.
ii. If r > 6 PE(r), the correlation is significant. If r < 6 PE(r), it is
insignificant.
iii. If the probable error is small, correlation exists where r > 0.5.
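As a quick numerical illustration, the following Python lines compute PE(r) and the implied limits for r = 0.79 observed from N = 10 pairs (the figures of the worked example that follows).

# Probable error of an observed correlation coefficient.
import math

r, N = 0.79, 10
pe = 0.6745 * (1 - r**2) / math.sqrt(N)        # PE(r), ~0.080
print(f"PE(r) = {pe:.4f}")
print(f"limits: {r - pe:.3f} to {r + pe:.3f}")  # r +/- PE(r)
print("significant" if r > 6 * pe else "insignificant")  # here r exceeds 6 PE(r)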

Example:
Find the Pearsonian correlation coefficient between sales (in thousand units)
and expenses (in thousand naira) of the following 10 firms:
Firm: 1 2 3 4 5 6 7 8 9 10
Sales: 50 50 55 60 65 65 65 60 60 50
Expenses: 11 13 14 16 16 15 15 14 13 13

Solution:
Let the sales of a firm be denoted by X and expenses by Y.

Calculations for Coefficient of Correlation


Firm   X    Y    dx = X − X̄   dy = Y − Ȳ   dx²   dy²   dx·dy
1      50   11   −8            −3            64    9     24
2      50   13   −8            −1            64    1     8
3      55   14   −3            0             9     0     0
4      60   16   2             2             4     4     4
5      65   16   7             2             49    4     14
6      65   15   7             1             49    1     7
7      65   15   7             1             49    1     7
8      60   14   2             0             4     0     0
9      60   13   2             −1            4     1     −2
10     50   13   −8            −1            64    1     8
Total  580  140                              360   22    70

So ΣX = 580, ΣY = 140, Σdx² = 360, Σdy² = 22 and Σdx·dy = 70, giving

X̄ = ΣX/N = 580/10 = 58 and Ȳ = ΣY/N = 140/10 = 14
Applying the formula below, we have the Pearsonian coefficient of correlation:

r_xy = Σd_x d_y / √(Σd_x² · Σd_y²)

r_xy = 70 / √(360 × 22)
r_xy = 70 / √7920
r_xy = 70 / 88.99
r_xy = 0.79

The value r_xy = 0.79 indicates a high degree of positive correlation
between sales and expenses.
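The hand computation can be verified in Python: the deviation formula above and scipy.stats.pearsonr give the same coefficient.

# Verifying the worked example for the Pearsonian coefficient.
import numpy as np
from scipy.stats import pearsonr

sales    = np.array([50, 50, 55, 60, 65, 65, 65, 60, 60, 50])
expenses = np.array([11, 13, 14, 16, 16, 15, 15, 14, 13, 13])

dx = sales - sales.mean()        # deviations from the mean of X
dy = expenses - expenses.mean()  # deviations from the mean of Y
r_manual = (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

r_scipy, _ = pearsonr(sales, expenses)
print(round(r_manual, 4), round(r_scipy, 4))  # both ~0.7866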

4. Spearman's Rank Correlation


Sometimes we come across statistical series in which the variables under
consideration are not capable of quantitative measurement but can be arranged
in serial order. This happens when we are dealing with qualitative
characteristics (attributes) such as honesty, beauty, character, morality, etc.,
which cannot be measured quantitatively but can be arranged serially. In such
situations, Karl Pearson's coefficient of correlation cannot be used as such.
Charles Edward Spearman, a British psychologist, developed a formula in 1904
which consists in obtaining the correlation coefficient between the ranks of N
individuals in the two attributes under study.

Suppose we want to find whether two characteristics A (say, intelligence) and B
(say, beauty) are related or not. Both characteristics are incapable of quantitative
measurement, but we can arrange a group of N individuals in order of merit
(ranks) with respect to proficiency in the two characteristics. Let the random
variables X and Y denote the ranks of the individuals in the characteristics A and B
respectively. If we assume that there is no tie, i.e., if no two individuals get the
same rank in a characteristic, then, obviously, X and Y assume numerical values
ranging from 1 to N.

The Pearsonian correlation coefficient between the ranks X and Y is called
the rank correlation coefficient between the characteristics A and B for the
group of individuals.

Spearman's rank correlation coefficient, usually denoted by ρ (rho), is given by
the equation

ρ = 1 − 6Σd² / [N(N² − 1)]

where d is the difference between the pair of ranks of the same individual in the
two characteristics and N is the number of pairs.

Spearman's rank correlation formula can also be used even if we are dealing
with variables which are measured quantitatively, i.e., when the actual data, and
not the ranks, relating to the two variables are given. In such a case, we shall
have to convert the data into ranks. The highest (or the smallest) observation is
given rank 1, the next highest (or the next lowest) observation is given rank 2,
and so on. It is immaterial in which way (descending or ascending) the ranks are
assigned; however, the same approach should be followed for all the variables
under consideration.

Example:
Calculate the rank coefficient of correlation from the following data:
X: 75 88 95 70 60 80 81 50
Y: 120 134 150 115 110 140 142 100
Solution:
Calculations for Coefficient of Rank Correlation

X    Rank Rx   Y     Rank Ry   d = Rx − Ry   d²
75   5         120   5         0             0
88   2         134   4         −2            4
95   1         150   1         0             0
70   6         115   6         0             0
60   7         110   7         0             0
80   4         140   3         +1            1
81   3         142   2         +1            1
50   8         100   8         0             0
                                             Σd² = 6

ρ = 1 − 6Σd² / [N(N² − 1)]
ρ = 1 − (6 × 6) / [8(8² − 1)]
ρ = 1 − 36/504
ρ = 1 − 0.0714
ρ = +0.93

Hence, there is a high degree of positive correlation between X and Y.
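The same result can be verified in Python with scipy.stats.spearmanr, which ranks the data and computes the rank correlation automatically.

# Verifying the rank correlation of the worked example.
from scipy.stats import spearmanr

x = [75, 88, 95, 70, 60, 80, 81, 50]
y = [120, 134, 150, 115, 110, 140, 142, 100]

rho, _ = spearmanr(x, y)
print(round(rho, 4))  # ~0.9286, i.e. +0.93 as computed above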

Repeated Ranks
In the case of attributes, if there is a tie, i.e., if any two or more individuals are
placed together in any classification with respect to an attribute, or if, in the case
of variable data, there is more than one item with the same value in either or both
of the series, then Spearman's formula ρ = 1 − 6Σd² / [N(N² − 1)] for calculating
the rank correlation coefficient breaks down, since in this case the variables X
[the ranks of individuals in characteristic A (1st series)] and Y [the ranks of
individuals in characteristic B (2nd series)] do not take the values from 1 to N.

In this case, common ranks are assigned to the repeated items. These common
ranks are the arithmetic mean of the ranks which these items would have got if
they were different from each other, and the next item will get the rank next to
the ranks already used in computing the common rank. For example, suppose an
item is repeated at rank 4. Then the common rank to be assigned to each item is
(4+5)/2, i.e., 4.5, which is the average of 4 and 5, the ranks which these
observations would have assumed if they were different. The next item will be
assigned rank 6. If an item is repeated thrice at rank 7, then the common
rank to be assigned to each value will be (7+8+9)/3, i.e., 8, which is the
arithmetic mean of 7, 8 and 9, viz., the ranks these observations would have got
if they were different from each other. The next rank to be assigned will be 10.

If only a small proportion of the ranks are tied, this technique may be applied
together with the formula ρ = 1 − 6Σd² / [N(N² − 1)]. If a large proportion of the
ranks are tied, it is advisable to apply an adjustment or correction factor to this
formula, as explained below:

"In the formula above, add the factor

m(m² − 1) / 12

to Σd², where m is the number of times an item is repeated. This
correction factor is to be added for each repeated value in both the series".
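The mean-rank rule for ties described above is exactly what scipy's ranking routine applies, as the short sketch below shows; the five data pairs are hypothetical, with the value 60 repeated in x.

# Tied observations receive the arithmetic mean of the ranks they occupy.
from scipy.stats import rankdata, spearmanr

x = [50, 60, 60, 70, 80]
y = [12, 15, 14, 18, 20]

print(rankdata(x))        # [1.  2.5 2.5 4.  5. ] -- the tied 60s share rank 2.5
rho, _ = spearmanr(x, y)  # uses these average ranks internally (~0.975 here)
print(round(rho, 3))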

Remarks on Spearman's Rank Correlation Coefficient


a. We always have Σd = 0, which provides a check on the numerical calculations.

b. Since Spearman's rank correlation coefficient, ρ, is nothing but Karl
Pearson's correlation coefficient, r, between the ranks, it can be interpreted in
the same way as Karl Pearson's correlation coefficient.

c. Karl Pearson's correlation coefficient assumes that the parent population
from which the sample observations are drawn is normal. If this assumption is
violated, then we need a measure which is distribution-free (or non-parametric).
Spearman's ρ is such a distribution-free measure, since no strict assumptions are
made about the form of the population from which the sample observations are
drawn.

d. Spearman's formula is easy to understand and apply as compared with Karl
Pearson's formula. The values obtained by the two formulae, viz. the Pearsonian r
and Spearman's ρ, are generally different. The difference arises from the fact
that when ranking is used instead of the full set of observations, there is always
some loss of information. Unless many ties exist, the coefficient of rank
correlation should be only slightly lower than the Pearsonian coefficient.

e. Spearman's formula is the only formula to be used for finding the correlation
coefficient if we are dealing with qualitative characteristics which cannot be
measured quantitatively but can be arranged serially. It can also be used where
actual data are given. In the case of extreme observations, Spearman's formula is
preferred to Pearson's formula.

f. Spearman's formula has its limitations also. It is not practicable in the case of a
bivariate frequency distribution. For N > 30, this formula should not be used
unless the ranks are given.
5. Concurrent Deviation Method
This is a casual method of determining the correlation between two series when
we are not very serious about its precision. It is based on the signs of the
deviations (i.e., the direction of the change) of the values of the variable from its
preceding value and does not take into account the exact magnitude of the values
of the variables. Thus we put a plus (+) sign, a minus (−) sign or an equality (=)
sign for the deviation according as the value of the variable is greater than, less
than or equal to the preceding value, respectively. The deviations in the values of
the two variables are said to be concurrent if they have the same sign (either both
deviations are positive, or both are negative, or both are equal). The formula used
for computing the correlation coefficient r_c by this method is given by:

r_c = ±√(±(2c − N) / N)

where c is the number of pairs of concurrent deviations and N is the number of
pairs of deviations. If (2c − N) is positive, we take the positive sign both inside
and outside the square root in the formula above, and if (2c − N) is negative, we
take the negative sign both inside and outside the square root.

Remarks:
i. It should be clearly noted that here N is not the number of pairs of
observations but the number of pairs of deviations; as such, it is
one less than the number of pairs of observations.
ii. The coefficient of concurrent deviations is primarily based on the following
principle:

"If the short-time fluctuations of the time series are positively correlated, or,
in other words, if their deviations are concurrent, their curves would move
in the same direction and would indicate positive correlation between
them".
Example:
Calculate coefficient of correlation by the concurrent deviation method
Supply: 112 125 126 118 118 121 125 125 131 135
Price: 106 102 102 104 98 96 97 97 95 90

Solution:
Calculations for Coefficient of Concurrent Deviations

Supply   Sign of Deviation from   Price   Sign of Deviation from   Concurrent
(X)      Preceding Value (X)      (Y)     Preceding Value (Y)      Deviations
112                               106
125      +                        102     −
126      +                        102     =
118      −                        104     +
118      =                        98      −
121      +                        96      −
125      +                        97      +                        + (c)
125      =                        97      =                        = (c)
131      +                        95      −
135      +                        90      −

We have:
Number of pairs of deviations, N = 10 − 1 = 9
c = number of concurrent deviations
  = number of deviations having like signs
  = 2

The coefficient of correlation by the method of concurrent deviations is given by:


r_c = ±√(±(2c − N) / N)

Here, (2c − N)/N = (4 − 9)/9 = −0.5556.

Since 2c − N = −5 (negative), we take the negative sign inside and outside the
square root:

r_c = −√(−(−0.5556))
r_c = −√0.5556
r_c = −0.75

Hence there is a fairly good degree of negative correlation between supply and
price.
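The whole procedure can be scripted; the Python sketch below recomputes c, N and r_c for the supply and price data of the example.

# Concurrent deviation method for the supply and price series above.
import math

supply = [112, 125, 126, 118, 118, 121, 125, 125, 131, 135]
price  = [106, 102, 102, 104, 98, 96, 97, 97, 95, 90]

def sign(curr, prev):
    return (curr > prev) - (curr < prev)  # +1, -1 or 0 (for '=')

signs_x = [sign(b, a) for a, b in zip(supply, supply[1:])]
signs_y = [sign(b, a) for a, b in zip(price, price[1:])]

N = len(signs_x)                                       # 9 pairs of deviations
c = sum(sx == sy for sx, sy in zip(signs_x, signs_y))  # 2 concurrent pairs
inner = (2 * c - N) / N                                # -5/9 here
r_c = math.copysign(math.sqrt(abs(inner)), inner)      # same sign in and out
print(N, c, round(r_c, 3))                             # 9 2 -0.745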

(C) Limitations of Correlation Analysis


As mentioned earlier, correlation analysis is a statistical tool which should be
properly used so that correct results can be obtained. Sometimes it is used
indiscriminately by management, resulting in misleading conclusions. We
give below some errors frequently made in the use of correlation analysis:
1. Correlation analysis cannot determine a cause-and-effect relationship. One
should not assume that a change in the Y variable is caused by a change in the X
variable unless one is reasonably sure that one variable is the cause while
the other is the effect. Let us take an example.

Suppose that we study the performance of students in their graduate
examination and their earnings after, say, three years of their graduation.
We may find that these two variables are highly and positively related. At
the same time, we must not forget that both variables might have been
influenced by some other factors such as the quality of teachers, the economic
and social status of parents, the effectiveness of the interviewing process and
so forth. If data on these factors are available, then it is worthwhile to
use multiple correlation analysis instead of the bivariate one.
2. Another mistake that occurs frequently is on account of misinterpretation
of the coefficient of correlation. Suppose in one case r = 0.7; it will be
wrong to interpret that the correlation explains 70 percent of the total
variation in Y. The error can be seen easily when we calculate the
coefficient of determination. Here, the coefficient of determination r² will
be 0.49. This means that only 49 percent of the total variation in Y is
explained.

Similarly, the coefficient of determination is misinterpreted if it is also
used to indicate a causal relationship, that is, that the percentage of the change
in one variable is due to the change in another variable.
3. Another mistake in the interpretation of the coefficient of correlation
occurs when one concludes a positive or negative relationship even
though the two variables are actually unrelated. For example, the age of
students and their score in the examination have no relation with each
other. The two variables may show similar movements but there does not
seem to be a common link between them.

To sum up, one has to be extremely careful while interpreting the coefficient of
correlation. Before one concludes a causal relationship, one has to consider
other relevant factors that might have any influence on the dependent variable
or on both variables. Such an approach will avoid many of the pitfalls in the
interpretation of the coefficient of correlation. It has been rightly said that the
coefficient of correlation is not only one of the most widely used, but also
one of the most widely abused, statistical measures.
Summary
In this study session, we have discussed the meaning of Correlation, Correlation
Analysis, Pearson's Coefficient of Correlation, Spearman's Rank Correlation
Coefficient and the Concurrent Deviation Coefficient, as well as the Limitations
of Correlation Analysis.

(D) Discussion Questions


1. What do you understand by the term correlation?
2. What are the differences between positive and negative correlations?
3. What is rank correlation?
4. State the merits and demerits of rank correlation
5. Discuss the limitations of correlation analysis

References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K. and Brian, W. (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.3.2.5 STUDY SESSION 13
Regression
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Meaning of Simple Regression Analysis
(B) Simple Regression Equation
(C) Predicting an Estimate and its Preciseness
(D) Regression of x on y
(E) Properties of Regression Coefficients
(F) Correlation Analysis versus Regression Analysis
(G) Regression Diagnostics-I
Summary
Self-assessment questions and Answers
References/Further Readings

Introduction
You are welcome to the last study session of this course. In this session, we
discuss how to measure the relationship between variables, with regression
analysis considered as one of the approaches. The session deals with simple
regression analysis; it explains how the intercept and the gradient of the simple
regression equation are determined, and also how to carry out some diagnostic
analysis on simple regression.

Learning Outcomes
In this study session, you will:
1. Be able to describe regression analysis.
2. Understand ways of expressing simple regression equation.
3. Be able to compute intercept, gradient and error term.
4. Know various properties of regression coefficient.
5. Be able to make a decision using regression analysis.
6. Understand some regression diagnostics.

(A) Meaning of Simple Regression Analysis


Simple regression analysis is used to express a relationship between two
variables and to estimate the value of the dependent variable y based on a selected
value of the independent variable x. The essence of the analysis is to make use
of its results in estimating or predicting the value of y. The analysis enables
one to find out the causal effect of one variable on another.

(B) Simple Regression Equation


A simple regression equation (or first-order linear model) can be represented by
means of an equation of a straight line. The model can be deterministic or
stochastic (also called probabilistic), depending on the nature of the problem
under study. The deterministic model is used in a situation where the estimation
or prediction can be made with a low degree of error. For instance, the equations
Profit = Revenue − Cost, or Total cost = Fixed cost + (Variable cost × Number
of units produced), can be considered deterministic models, because in
both cases it is possible to predict or estimate the dependent variable with a low
degree of error.

ITQ: What is the essence of regression analysis?


ITA: The essence of the analysis is to make use of its results in estimating or
predicting the value of y.

In a situation with a higher degree of error, a probabilistic model is utilised. For
instance, if you are interested in predicting the failure of a particular bank,
there are many variables that ought to be taken into consideration, and your
study may not necessarily take all of them into account; hence there is a greater
chance of error.

The distinction between the deterministic model and the stochastic model is the
inclusion of an error term (or random term) in the case of the stochastic model. The
error term is included in order to take care of all those variables that are not
captured by the model. It is worth noting that the error term has certain conditions
that are expected to be met. The required conditions, and how to test whether they
are met, will be discussed later in the session under diagnostic analysis.

The equation used to indicate the deterministic model can be represented as
follows:

y = a + bx, or ŷ = β̂_0 + β̂_i x_i

The above equations are used when we are regressing using sample data. y (or
ŷ) is the dependent variable; a (or β̂_0) is the intercept, that is, the value of y when
x (or x_i), the independent variable, is equal to zero; b (or β̂_i) is called the gradient
(or the slope, or the regression coefficient of y on x). It describes the rate of
change of y as a result of a change in x.

When dealing with population data, α_i (alpha) and β_i (beta) are used in the
regression equation, represented thus:

y = α_i + β_i x_i

The α_i is the intercept and the β_i is the gradient of the equation.

In the case of the stochastic model, the equation for sampled data is written
in the following forms:

y = a + bx + ε_i, or ŷ = β̂_0 + β̂_i x_i + ε_i

The equation that addresses population data is captured as follows:

y = α_i + β_i x_i + ε_i

The definitions of the variables of the stochastic models above are similar to
those stated under the deterministic models; the only distinction, as earlier stated,
is the inclusion of the error term that accounts for the variables not captured by the
models.

ITQ: What is the distinction between deterministic model and stochastic model?
ITA: The distinction between deterministic model and stochastic model is the inclusion
of error term (or random term) in the case of stochastic model

(C) Predicting an Estimate and its Preciseness


Prediction or estimation involving two variables, one dependent and the other
independent, can be done in one of two ways: by using a scatter diagram after
drawing a line of best fit, or by means of the regression equation earlier described.

The line of best fit is a line on a scatter diagram that can be drawn near the
points to show more clearly the trend between the two sets of data. Data points
that do not appear close to the line of best fit are considered outliers. The fit line
can be drawn freehand using the highest, lowest and mean values of x. This
method is simple and quick, but the result it yields is rough and subjective.

On drawing the line of best fit of the scatter diagram of illustration 16.1 of
chapter 16 at the intercept level of 8, the graph will appear as follows:

Line of Best Fit Figure

Suppose that, using the scatter diagram above, we want to estimate the value of y
when x has a value of 3. All we need do is draw a vertical line at x = 3 up to the
line of best fit. From the point where it touches the line, we then draw a
horizontal line towards the y-axis. The value at the point where the horizontal
line touches the y-axis is the predicted or estimated value of y. The value is
roughly 7.

In an effort to overcome the estimation problems of using a line of fit determined
by freehand drawing, alternative objective methods based on mathematical
approaches have been developed. In these approaches, both the intercept and the
gradient values are mathematically established, either by using normal
equations or by direct interpolation formulae.

Normal equations are two simultaneous equations that are solved to determine
the values of the intercept and the gradient of a simple linear regression. The
equations are represented as follows:
an + bΣx = Σy ............. (i)
aΣx + bΣx² = Σxy .......... (ii)

Alternatively, the equations can be written in this form:

b̂_0 n + b̂_i Σx_i = Σy_i ............. (1)
b̂_0 Σx_i + b̂_i Σx_i² = Σx_i y_i ..... (2)

On solving the two equations, the sign of the coefficient of the gradient will
automatically emerge. The letter n in the first equation stands for the sample size.

Alternatively, the normal equations can be transformed to determine the values of
the intercept and the gradient directly. After the transformation, the intercept
and the gradient are represented respectively by the following formulae:

b̂_0 = (Σy_i − b̂_i Σx_i) / n

b̂_i = [n Σx_i y_i − Σx_i Σy_i] / [n Σx_i² − (Σx_i)²]

Another set of formulae that could be used are of the form b̂_0 = ȳ − b̂_i x̄
and b̂_i = [Σx_i y_i − n x̄ ȳ] / [Σx_i² − n(x̄)²].

In each of the above cases, the value of b̂_i has to be determined first, before
that of b̂_0, because the b̂_0 formula has two unknowns, b̂_0 and b̂_i.

The values of b̂_0 and b̂_i can also be computed by using sums of squared
differences. This approach leads to the minimization of the sum of the squares of
the errors made in the results of every single equation. It is also called the least
squares method, and the line of best fit determined using the method is called the
least squares line.

The value of b̂_0 = ȳ − b̂_i x̄, as in the case of the second formula above, while
that of

b̂_i = SS_xy / SS_x,

where SS_xy is the sum of products of deviations of x and y, and SS_x is the sum
of squared deviations of x. The two respective values are computed using the
following formulae:

SS_xy = Σ(x_i − x̄)(y_i − ȳ) and SS_x = Σ(x_i − x̄)²

where

ȳ = Σy_i / n and x̄ = Σx_i / n

Note:
SS_xy is the numerator that is used in the computation of the covariance and the
correlation coefficient. SS_x is the numerator that is used in the computation of
the sample standard deviation computed earlier.

There are shortcut formulae that can be used in computing SS_xy and SS_x.
These formulae are presented thus:

SS_xy = Σx_i y_i − (Σx_i Σy_i) / n

SS_x = Σx_i² − (Σx_i)² / n
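The following Python sketch applies the shortcut formulae for SS_xy and SS_x to a small set of hypothetical (x, y) pairs and recovers the intercept and the gradient.

# Least squares estimates of the intercept b0 and gradient b1.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])
n = len(x)

ss_xy = (x * y).sum() - x.sum() * y.sum() / n  # shortcut formula for SS_xy
ss_x  = (x**2).sum() - x.sum()**2 / n          # shortcut formula for SS_x

b1 = ss_xy / ss_x              # gradient, determined first
b0 = y.mean() - b1 * x.mean()  # intercept, from b0 = y-bar - b1 * x-bar
print(round(b0, 3), round(b1, 3))  # fitted line: y-hat = b0 + b1 x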

(D) Regression of x on y
There are two lines of regression: one of y on x and the other of x on y. When y
is regressed on x, y is the dependent variable and x is the independent. The
reverse is the case when x is regressed on y. The value of the intercept and the
gradient when x is regressed on y can be found using the following interpolation
formulae:
∑ 𝑥𝑥𝑥𝑥 − 𝑛𝑛𝑥𝑥̅ 𝑦𝑦�
𝑏𝑏�𝑖𝑖′ =
∑ 𝑦𝑦𝑖𝑖2 − 𝑛𝑛𝑦𝑦� 2
𝑏𝑏�0′ = 𝑥𝑥̅ − 𝑏𝑏�𝑖𝑖′ 𝑦𝑦�
Alternatively, the values of the intercept and the gradient can be determined
by means of the following normal equations:

Σxᵢ = na′ + b′Σyᵢ …… (1)
Σxᵢyᵢ = a′Σyᵢ + b′Σyᵢ² …… (2)

After the values have been determined, they can be substituted into the
regression equation below:

x = a′ + b′y
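By way of illustration (the function name and data below are our own, not from the course text), the x-on-y coefficients can be computed directly from the interpolation formulae:

```python
import numpy as np

def regress_x_on_y(x, y):
    """Intercept a' and gradient b' when x is regressed on y."""
    n = len(x)
    b1p = (np.sum(x * y) - n * x.mean() * y.mean()) / \
          (np.sum(y ** 2) - n * y.mean() ** 2)
    b0p = x.mean() - b1p * y.mean()
    return b0p, b1p

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
a_prime, b_prime = regress_x_on_y(x, y)
print(f"x = {a_prime:.4f} + {b_prime:.4f}y")
```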

(E) Properties of Regression Coefficients


Some of the basic properties or features possessed by regression coefficients are:
1. A regression coefficient represents the increment in the value of the
dependent variable corresponding to a unit change in the value of the
independent variable. This means that whenever the independent variable
changes by one unit, the corresponding change in the dependent variable
is given by the regression coefficient.
2. The sign of r depends on the sign of the regression coefficients. If the
regression coefficients are positive, r will be positive, and if the
regression coefficients are negative, r will be negative. Hence, it is
possible from the sign of the regression coefficients to tell the nature
of the relationship among the variables under study.
3. If one of the regression coefficients is greater than unity, the other must
be less than unity, since the product of the two coefficients equals r²,
which cannot exceed 1. That is, if b_yx > 1 then b_xy < 1, and if
b_xy > 1 then b_yx < 1.
4. The arithmetic mean of the regression coefficients is always greater than
or equal to the correlation coefficient, i.e. ½(b_xy + b_yx) ≥ r.
5. The two regression equations are not reversible or interchangeable because
the bases and assumptions for deriving them are quite different. The
regression equation of y on x is obtained by minimising the sum of squares
of the errors parallel to the y-axis, while the regression equation of x on
y is obtained by minimising the sum of squares of the errors parallel to
the x-axis.
6. The point of intersection of the two regression lines is (x̄, ȳ).
Properties 2 to 4 can be checked numerically, as the sketch below shows.
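A minimal numerical check (simulated data; nothing here comes from the course text):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)      # simulated, positively related data

s_xy = np.cov(x, y, ddof=1)[0, 1]       # sample covariance of x and y
b_yx = s_xy / np.var(x, ddof=1)         # gradient of y on x
b_xy = s_xy / np.var(y, ddof=1)         # gradient of x on y
r = np.corrcoef(x, y)[0, 1]

print(np.isclose(b_yx * b_xy, r ** 2))  # product of the gradients equals r^2
print(0.5 * (b_yx + b_xy) >= r)         # their mean is at least r (property 4)
```

The first check relies on the identity b_yx · b_xy = r², which also underlies property 3.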

(F) Correlation Analysis versus Regression Analysis


Despite the common features that correlation and regression analysis share,
there is a need to set out the differences between the two analyses. The
following table highlights the major differences.

Table 4: Differences between Correlation and Regression Analysis

1. Correlation analysis is used in measuring the degree of relationship
between variables; the relationship can move in the same or the opposite
direction, and it does not indicate cause and effect. Regression analysis is
used in measuring the nature of the relationship between variables; it serves
as a mathematical measure of the average relationship between two variables
and does indicate cause and effect.
2. Correlation analysis is not used for estimation or prediction. Regression
analysis is basically used for estimation and prediction.
3. There may be spurious or nonsense correlation between two variables; there
is no spurious or nonsense regression.
4. The issue of classifying variables into dependent and independent does not
arise before conducting correlation analysis. In regression analysis,
variables must be classified into dependent and independent.
5. In correlation analysis, both variables x and y are considered random. In
regression analysis, x is treated as a fixed variable and y as a random
variable.
6. The correlation coefficient is a relative measure that ranges between -1
and +1. The regression coefficient is an absolute figure that can be used for
estimation or prediction.
7. Correlation analysis has limited application, as it is restricted to linear
relationships between variables; hence it is not very amenable to further
mathematical treatment. Regression analysis has a wider spectrum of
application, as it can address both linear and non-linear relationships; hence
it is very amenable to further mathematical treatment.

(G) Regression Diagnostics-I


After a regression model has been constructed, there is a need to test the
fitness of the model. One way of doing this is to check whether the following
required conditions on the error term hold:
i. The error term is normally distributed.
ii. The expected value of the error term is zero.
iii. The variance (or standard deviation) of the error term is constant for
all values of the independent variable, x.
iv. The errors associated with any two values of y are independent; that is,
the error from one observation (εᵢ) is independent of the error from any
other observation (εⱼ).

Ascertaining whether there is any deviation from the above conditions demands
examining the error terms (residuals). The examination is done by first
standardising the residuals: the mean is subtracted and the result divided by
the standard deviation. The mean of the residuals is zero, and because their
standard deviation is unknown, it is estimated using the standard error of
estimate. Hence, the formula for computing the standardised residual (SR) is:
SR = r / SE = (yᵢ − ŷᵢ) / SE

Where:
SR = standardised residual
r = residual = yᵢ − ŷᵢ
SE = standard error of estimate = √( Σ(yᵢ − ŷᵢ)² / (n − 2) )
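A minimal sketch of this diagnostic in Python, assuming a simple linear fit via NumPy's polyfit (the function name is ours and the data hypothetical):

```python
import numpy as np

def standardised_residuals(x, y):
    """Residuals divided by the standard error of estimate."""
    n = len(x)
    b1, b0 = np.polyfit(x, y, 1)        # gradient, intercept (least squares)
    residuals = y - (b0 + b1 * x)
    se = np.sqrt(np.sum(residuals ** 2) / (n - 2))  # standard error of estimate
    return residuals / se

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
print(standardised_residuals(x, y))
```

Standardised residuals larger than about 2 in absolute value are commonly treated as a signal of possible outliers or of a violated condition.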

Summary
This study session dealt with measuring relationships using regression
analysis. Explanation was offered on how to make predictions or estimates
using the regression equation. The properties of regression coefficients, the
comparison between correlation analysis and regression analysis, and
regression diagnostics were also discussed.

(H) Discussion Questions


1. State what we use regression analysis for.
2. What are the properties of regression coefficients?
3. What is the difference between regressing y on x and regressing x on y?
4. State the difference between correlation and regression.
5. Why is there a need to conduct diagnostic analysis?

References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K. and Brian, W. (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove, USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W. (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P. (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
10.0 FURTHER READING
1. Gerald, K. and Brian, W. (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove, USA: Brooks/Cole-Thompson Learning.
2. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
3. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
4. St Andrews (2003). Bayes. School of Mathematics and Statistics, University
of St Andrews, Scotland. http://www-gap.dcs.st-and.ac.uk/history/Mathematicians/Bayes.html.
Edited by John O’Connor and Edmund Robertson.
5. Calgary, U. (2003). Bayes Theorem. University of Calgary, Department of
Mathematics and Statistics, Division of Statistics and Actuarial Science.
http://balducci.math.ucalgary.ca/.
6. Gupta, S. P. (2010). Statistical Methods. New Delhi: Sultan Chand and
Sons.
7. Hooda, R. P. (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
8. Hein, L. W. (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
9. Levin, Richard I. and David S. Rubin (2009). Statistics for Management.
New Delhi: Prentice Hall.
10. Lawrence, B. Moore (2010). Statistics for Business & Economics. New
York: Harper Collins.
11. Watsman, Terry J. and Keith Parramor (2012). Quantitative Methods in
Finance International. London: Thompson Business Press.
11.0 GLOSSARY
1. Alternative hypothesis is the hypothesis that the researcher expects to support.
2. Analysis of variance is a statistical test of the difference of means for two or more
groups.
3. ANOVA is an acronym for analysis of variance. It is a statistical test of the difference
of means for two or more groups.
4. Central tendency is a typical or representative value for a dataset which is reported
as the mean, the median, or the mode, depending on the data and/or one's purposes.
5. Chi Square is a statistical procedure that examines the relationship between two
categorical variables. The test is based on the discrepancy between the observed
number of observations in each category and the expected number of observations in
each category.
6. Coefficient of determination is a statistic used in linear regression that indicates the
amount of variation in the dependent variable which is explained or accounted for by
the independent variable(s).
7. Confidence interval is the range of values within which the true population
mean is likely to fall, with a stated level of confidence. Its boundaries are
the decision points at which the researcher favours the alternative hypothesis
over the null hypothesis.
8. Continuous variable is a variable which can assume an infinite number of values.
Convenience sample is the kind of sampling used when the researcher decides to
select the units of study on the basis of their being readily available.
9. Correlation is a standardised index of the strength and direction of the relationship
between two variables. The range for the possible correlation between any two
variables is from -1.00 (a perfect inverse relationship) to +1.00 (a perfect positive
relationship).
10. Covariance is a measure of association between a pair of variables. It is similar to a
correlation, but a correlation is expressed in a standardised metric, whereas
covariance is expressed in the units of the original variables.
11. Critical value is a value that establishes the boundaries of the confidence interval.
12. Decile is a subset of adjacent scores in a distribution representing 10% of a sample or
a population. A "decile score" is a raw score corresponding to the 10th, 20th, 30th,
etc. percentile score.
13. Degrees of freedom is the number of components in the calculation of a statistic that
are free to vary.
14. Dichotomous variable is a discrete measure with two categories that may or may not
be ordered. It is a variable which has only two categories.
15. Discrete variable is a variable which is limited to a finite number of values. A
discrete variable usually describes something which occurs only in whole units. The
number of males in an English class is an example of a discrete variable.
16. Dispersion is the "spread" of a data set, the departure from central tendency.
17. Distribution is a representation of a variable in which the horizontal axis (x-axis)
carries the values of the variable being described. The density of the smooth curve
over the x-axis represents the probability of occurrence for each of the values on the
x-axis.
18. Explained variance is the portion of the variance in Y accounted for by the
regression line, i.e. the variance of the predicted values Y′ about Ȳ, where Y′ is the
value of Y predicted by the regression equation. If the regression line does not help
in predicting Y, it will pass through Ȳ, in which case b_yx = 0; in standardised
terms, the coefficient can be at most ±1.00 in magnitude.
19. Heteroscedasticity is a condition in which the variances of two or more population
distributions are not equal.
20. Histogram is a bar graph used to represent the frequency of each value occurring in a
distribution of scores.
21. Homoscedasticity is a condition in which the variances of two or more population
distributions are equal.
22. Hypotheses are a set of two or more mutually exclusive and often exhaustive
statements. The goal of hypothesis testing is to determine which of them is true.
23. Independent samples t-test is the procedure used in hypothesis testing to compare
the means of two different samples. As is true for all t-tests, the standard error is not
known and is estimated from sample data.
24. Interval data is data that possess magnitude (one value can be judged greater than,
less than, or equal to another) and a constant distance between intervals (units of
measurement are the same on the scale regardless of where the unit falls).
25. Interval variable is a variable whose attributes are rank ordered and have equal
distances between adjacent attributes.
26. Kurtosis is the degree of flatness or peakedness of a graph of a frequency
distribution. The relatively flat distributions are described as platykurtic. Distributions
with medium curvature are mesokurtic (note: a normal distribution is mesokurtic).
The most peaked distributions are leptokurtic.
27. Leptokurtic is a distribution that is more peaked than a normal distribution. This is to
say that there are more cases concentrated close to the mean than in a normal
distribution.
28. Line of best fit (least squares fit) is the least squares fit procedure that allows us to
reduce the scatterplot to a single straight line described by a linear equation. It
minimises the square of the vertical distance between each point and the regression
line.
29. Marginal is the frequency distribution of each of two cross-tabulated variables.
There are row marginals and column marginals.
30. Mean is a measure of central tendency calculated by dividing the sum of the scores in
a distribution by the number of scores in the distribution. This value best reflects the
typical score of a data set when there are few outliers and/or the dataset is generally
symmetrical.
31. Median is the value in a data set which divides the scores into two equal halves (i.e.,
an equal number of scores lie above and below it). As a measure of central tendency,
it is largely unaffected by extreme values.
32. Mode is the score that occurs most frequently in a data set. This measure of central
tendency is the only one appropriate for nominal data.
33. Negative skew is an asymmetry in a distribution in which the scores are bunched to
the right side of the centre. With a negatively skewed distribution, the mean generally
falls to the left of the median and the median usually lies to the left of the mode.
Study Hint: the tail of a negatively skewed distribution points to the negative side of
the number line.
34. Non-probability sample is a type of sampling that involves the researcher's judgment
to determine the elements to be selected for the sample.
35. Nominal data are data that are classified into mutually exclusive ("named") groups
that lack intrinsic order.
36. Normal distribution is a theoretical distribution which is typically bell-shaped when
graphed. The distribution is theoretical because the height of the curve is defined by a
mathematical formula (and the exact values necessary to create the curve would never
occur).
37. Null hypothesis is the prediction that the researcher believes will be "nullified." That
is, the researcher believes this prediction is not true.
38. Observation is the empirical data that is used to support or refute a hypothesis.
39. Ordinal data are data whose values are ordered so that inferences can be made
regarding magnitude, but which have no fixed interval between values. An example of
ordinal data is a letter grade on a test.
40. Ordinal variable is a variable whose values are ordered so that inferences regarding
magnitude can be made, but which have no fixed interval between values. Letter
grade on a test would be an ordinal variable: while an 'A' is greater than a 'B' which is
greater than a 'C', we cannot conclude that the distance between an 'A' and a 'B' is the
same as the distance between a 'B' and a 'C'.
41. Outlier is a value in a data set that is very different from most other values in the set.
42. Paired t-test is the procedure used in hypothesis testing when the independent
variable is within-subjects in nature. The goal is to compare two levels of the
independent variable assigned to the same group of subjects at different points in
time. As is true for all t-tests, the standard error is not known and is estimated from
sample data.
43. Parameter is a characteristic of a population, e.g. the mean (μ), pronounced "mu",
and the standard deviation (σ), or "sigma".
44. Pearson's correlation coefficient is a measure of association between two
continuous variables which estimates both the direction and strength of a linear
relationship.
45. Percentile is a value that exceeds a specific percentage of the distribution. Thus, if
the 63rd percentile score on the SAT verbal exam is 560, then 63% of scores are at or
below 560.
46. Platykurtic is a distribution that is flatter than a normal distribution. This is to say
that there are more cases in the tails of the distribution than in a normal distribution.
47. Population is the set of all possible data values that could be observed.
48. Positive skew is an asymmetry in a distribution in which the scores are bunched to
the left side of the centre. With a positively-skewed distribution, the mean generally
falls to the right of the median and the median usually lies to the right of the mode.
Study Hint: the tail of a positively skewed distribution points to the positive side of a
number line.
49. Probability sample is sampling in which each element within a study population has
a known, nonzero chance of being selected into the sample.
50. Protocol is a specified methodology for performing a task.
51. Quartile is a subset of adjacent scores in a distribution representing 25% of a sample
or a population. A "quartile score" is a raw score corresponding to the 25th, 50th, or
75th percentile score.
52. Quintile is subset of adjacent scores in a distribution representing 20% of a sample or
a population. A "quintile score" is a raw score corresponding to the 20th, 40th, 60th,
or 80th percentile score.
53. Random sample is a sample that contains observations which are selected from a
population so that every member of the population has a known chance of selection
for a sample.
54. Random variable is a variable whose measurements vary in a seemingly random and
unpredictable manner. A random variable assumes a unique numerical value for each
of the outcomes in the sample space of the probability experiment.
55. Range is a simple measure of dispersion, indicating the difference between the lowest
and highest values observed.
56. Ranked categories are categories within a variable that are logically ranked. The
different attributes of each category represent relatively more or less of the variable.
57. Ratio data are data that are ordered (so that we can make inferences regarding
magnitude), have equal intervals between values, and contain an absolute zero point.
Height is an example of ratio data: 60 inches is taller than 55 inches, the distance
between 60 and 55 inches is the same as the distance between 30 and 25 inches, and a
height of 0 inches implies no height at all.
58. Ratio variable is a variable that is based on a true zero point. An example of a
ratio variable would be age.
59. Regression is a statistical procedure that allows us to determine the extent to which
we can predict a given observation's score on a dependent variable from that
observation's score on one or more independent variables.
60. Regression coefficient is the slope of the regression line. It represents the change in y
for every one unit change in x.
61. Regression line is a model that simplifies the relationship between two variables. By
approximating a line through the centre of a scatterplot that represents the data, we
create a two dimensional centre for the data. The line summarises the data points in
the same way that measures of central tendency do.
62. Sample is a collection of observations selected from a larger population.
63. Sampling distribution is the set of all the possible non-overlapping samples that can
be drawn, given a constant sample size.
64. Sampling distribution of means is a frequency distribution of a large number of
random sample means that have been drawn from the same population.
65. Sampling distribution of the difference between means is a sampling distribution
that consists of the differences in means between groups.
66. Sampling distribution of the mean of difference scores is a sampling distribution
that consists of the differences in means within subjects across treatments.
67. Sampling error is the extent to which a sample distribution is different from the
population distribution from which the sample is drawn.
68. Scatterplot is a group of data points that are plotted along x-axis and y-axis
coordinates. Every individual is represented as a data point, whereby a perpendicular
line from the individual's "X" value intersects a perpendicular line from the
individual's "Y" value.
69. Single sample t-test is the procedure used to compare the mean of one sample to a
known population mean in hypothesis testing. As is true for all t-tests, the standard
error is not known and is estimated from sample data.
70. Skewness is asymmetry in a distribution in which scores are bunched on one side of
the distribution.
71. Standard deviation is a measure of dispersion describing the spread of scores around
the mean. It is the square root of the variance.
72. Standard error is the standard deviation of a sampling distribution.
73. Standard error of the mean is the standard deviation of a sampling distribution of
means.
74. Standard error of the mean of difference scores is the standard deviation of
a sampling distribution of the mean of difference scores.
75. Standard score is a raw score that has been converted from one scale into another
with an arbitrarily set mean and standard deviation. Standard scores are more easily
interpreted than raw scores, because they take into account the mean and standard
deviation of the distribution of values.
76. Statistic is a characteristic of a sample, e.g. the mean (x̄) and the standard
deviation (s).
77. Strata are subdivisions of a population.
78. Stratification is allocating samples among subcategories within a population.
Stratification is sometimes necessary to improve the effectiveness of a sampling effort
or to increase understanding of population characteristics. For example, stratifying an
election survey by sex allows analysts to better understand voter behaviour by
revealing differences in the way that males and females vote.
79. Type I error is erroneously rejecting the null hypothesis: concluding that a sample
came from a different population when, in fact, it is from the same population.
80. Type II error is erroneously failing to reject the null hypothesis: concluding that a
sample came from the given population when, in fact, it is from a different population.
81. Variance is a measure of dispersion, indicating the mean of the squared deviations of
a set of scores from the mean of the scores.
82. Y-intercept is the point at which the line intersects the Y-axis. It is the value of
y when x equals zero.
83. Z-score is a standardised score which indicates how many standard deviations a value
lies above or below the mean.
