Note Buad806 Mod1 3buhumcr6wxmrre
COURSE MATERIAL
All rights reserved. No part of this publication may be reproduced in any form or by any means,
electronic, mechanical, photocopying, recording or otherwise without the prior permission of the
Director, Distance Learning Centre, Ahmadu Bello University, Zaria, Nigeria.
ISBN:
Tel: +234
E-mail:
COURSE WRITERS/DEVELOPMENT TEAM
Mohammed Habibu Sabari (PhD)
Jamilu Abdulkadir (Subject Matter Experts)
Prof. Abiola Awosika
Halima Shuaibu (Subject Matter Reviewers)
Yusuf Musa (Language Reviewer)
Nasiru Tanko
Ibrahim Otukoya Graphics
Prof. Adamu Z. Hassan (Editor)
QUOTE
“It is the mark of a truly intelligent person to be moved by statistics”.
George Bernard Shaw
TABLE OF CONTENTS
Title Page
Copyright
Quote
Table of Contents
1.0 Course Information
2.0 Course Description
3.0 Course Introduction
4.0 Course Outcomes
5.0 Activities to Meet Course Objectives
6.0 Grading Criteria and Scale
7.0 Course Structure and Outline
8.0 Discussion Forum
8.1 Topical Discussions
8.2 Discussion Questions
9.0 Study Modules
9.1 Module 1: Basic Introduction to Statistics
Introduction
9.1.1 Objectives
9.1.2 Study Sessions
9.1.2.1 Study Session 1: Introduction
9.1.2.2 Study Session 2: Presentations of Statistical Data
9.1.2.3 Study Session 3: Measures of Central Tendency
9.1.2.4 Study Session 4: Measures of Dispersion
9.2 Module 2: Skewness and Kurtosis, Probability Theory and Distribution
Introduction
9.2.1 Objectives
9.2.2 Study Sessions
9.2.2.1 Study Session 5: Skewness and Kurtosis
9.2.2.2 Study Session 6: Probability I
9.2.2.3 Study Session 7: Probability II
9.2.2.4 Study Session 8: Probability III
9.3 Module 3: Sampling and Sampling Methods
Introduction
9.3.1 Objectives
9.3.2 Study Sessions
9.3.2.1 Study Session 9: Sampling I
9.3.2.2 Study Session 10: Sampling II
9.3.2.3 Study Session 11: Hypothesis
9.3.2.4 Study Session 12: Correlation
9.3.2.5 Study Session 13: Regression
10.0 Further Reading
11.0 Glossary
PREAMBLE
Welcome to Business Statistics and Quantitative Analysis. I am your instructor for
the semester. I assure you that if you put in your best, this will be a very interesting
course. I look forward to a fulfilling time with you.
Statistics, the subject under study, is made up of two aspects: descriptive and
inferential. We use descriptive statistics to study the features and characteristics of
data, while inferential statistics provides mathematical means of inferring the
properties of a population from a randomly selected sample taken from it. The course
is structured to address both descriptive and inferential statistics.
The course is organised into three modules. The first module is an introduction to
statistics and covers four study sessions. Session 1 looks at various definitions of the
subject, coupled with its classifications. Session 2 addresses the presentation and
tabulation of data. Session 3 explains and computes measures of central tendency.
Session 4 expounds measures of variability.
The second module deals with skewness, kurtosis and probability distributions. The
module is made up of four sessions, starting with session 5, which addresses
skewness and kurtosis. Session 6 covers probability and expected value. Session 7
discusses discrete probability distributions, and session 8 explains continuous
probability distributions.
The third module is made up of a variety of topics that include sampling and its
methods, hypothesis testing, and correlation and regression analysis. The module
comprises five sessions, starting with session 9, which addresses probability and
non-probability sampling. Session 10 explains sampling distributions of means and
proportions. Hypotheses and their testing are studied in session 11 and, finally,
sessions 12 and 13 explain correlation and regression analysis.
4.0 COURSE OUTCOMES
Upon the completion of this course, you are expected to be able to:
• Define and understand basic statistical terms.
• Compute measures of central tendency and also variability measures.
• Compute skewness and kurtosis of data and interpret their values in
describing the data distribution.
• Compute probabilities and apply the concepts of probability to confidence
intervals and hypothesis tests.
• Use hypothesis tests to weigh inferences concerning means and proportions.
• Use a scatter plot to visualise the relationship between variables, use
correlation coefficient to measure the strength and direction of the
relationship, use a linear function to describe relationship between variables
and make estimation/prediction.
To support these outcomes, clear and concise lecture guides will be provided to
help you gain a better understanding of the course. Video lectures that address
ambiguous areas will be provided. Relevant websites and three standard reference
books will be used. There will be a series of group and individual assignments that
you are expected to complete and submit within the defined time limit. These also
serve as part of your assessment.
The instructor's email address and telephone line(s) are provided so that you can
seek clarification on anything that is not clear to you. Also, tutorials will be arranged
during the two weeks of on-campus activities, in which questions will be clarified to
help you fully understand what you have learnt.
SchoolForge and SourceForge are good places to find, create, and publish open software.
SourceForge, for one, has millions of downloads each day.
Open Source Education Foundation and Open Source Initiative, and other organisations like
these, help disseminate knowledge.
Creative Commons has a number of open projects from Khan Academy to Curriki where teachers
and parents can find educational materials for children or learn about Creative Commons licenses.
Also, they recently launched the School of Open that offers courses on the meaning, application, and
impact of "openness."
Numerous open or open educational resource databases and search engines exist. Some examples
include:
• OEDb: over 10,000 free courses from universities as well as reviews of colleges and rankings of
college degree programmes
• Open Tapestry: over 100,000 open licensed online learning resources for an academic and
general audience
• OER Commons: over 40,000 open educational resources from elementary school through to
higher education; many of the elementary, middle, and high school resources are aligned to the
Common Core State Standards
• Open Content: a blog, definition, and game of open source as well as a friendly search engine for
open educational resources from MIT, Stanford, and other universities with subject and
description listings
• Academic Earth: over 1,500 video lectures from MIT, Stanford, Berkeley, Harvard, Princeton, and
Yale
• JISC: Joint Information Systems Committee works on behalf of UK higher education and is
involved in many open resources and open projects including digitising British newspapers from
1620-1900!
Universities
• The University of Cambridge's guide on Open Educational Resources for Teacher Education
(ORBIT)
• OpenLearn from Open University in the UK
Global
• Unesco's searchable open database is a portal to worldwide courses and research initiatives
• African Virtual University (http://oer.avu.org/) has numerous modules on subjects in English,
French, and Portuguese
• https://code.google.com/p/course-builder/ is Google's open source software that is designed to let
anyone create online education courses
• Global Voices (http://globalvoicesonline.org/) is an international community of bloggers who
report on blogs and citizen media from around the world, including on open source and open
educational resources
• Librarian Chick: everything from books to quizzes and videos here, includes directories on open
source and open educational resources
• K-12 Tech Tools: OERs, from art to special education
• Web 2.0: Cool Tools for Schools: audio and video tools
• Web 2.0 Guru: animation and various collections of free open source software
• Livebinders: search, create, or organise digital information binders by age, grade, or subject (why
re-invent the wheel?)
Legal help
• New Media Rights is trying to help digital creators use public domain or open materials legally.
They have guides on how to use free and open software materials in various fields.
7.0 COURSE STRUCTURE AND OUTLINE
7.1 Course Structure:
MODULE 1
Week 3: Study Session 1
Topics:
• Meaning and Definitions of Statistics
• Classification of Statistics
• Significance of Statistics
• Limitations of Statistics
• Data Sources and Data Types
• Key Statistical Concepts
Activities:
1. Study the course material of this session
2. Watch the video of this study session
3. Listen to the audio of this study session
4. Read chapters 1 to 4 of Gerald and Brian (2006) Statistics for Management and
Economics, International Thomson Publishing, Southern Africa, 10th Edition.
Discussion Question:
1. Explain the scope and functions of Statistics.
Note: Discussion Topics shall be uploaded weekly while Discussion Questions are
presented in the appropriate sections.
Week 4: Study Session 2
Topics:
• Tabulation Method
• Charting Method
• Graphical Techniques for Quantitative Data
Activities:
1. Study the course material of this session
2. Watch the video of this study session
3. Listen to the audio of this study session
4. Read chapters 1 to 4 of Lind, Marchal and Marson (2005) Statistical Techniques
in Business and Economics, McGraw-Hill Companies, 12th Edition.
Discussion Questions:
Briefly explain the following:
1. A bar chart
2. A pie chart
3. Z chart
Week 5: Study Session 3
Topics:
• Meaning of Central Tendency
• Measures of Central Tendency
Activities:
1. Study the course material of this session
2. Watch the video of this study session
3. Listen to the audio of this study session
4. Read chapters 1 to 4 of Lind, Marchal and Marson (2002) Statistical Techniques
in Business and Economics, McGraw-Hill Companies, International Edition.
Discussion Questions:
1. Why is it that statistics calculated from raw data are more accurate than statistics
calculated from frequency tables?
2. If the reasons advanced are so far the case, why then are statistics computed
from frequency tables?
Week 6: Study Session 4
Topics:
• Meaning of Variability
• Significance and Properties of Measuring Variability
• Measures of Variability
• Interpretation of Standard Deviation
Activities:
1. Study the course material of this session
2. Watch the video of this study session
3. Listen to the audio of this study session
4. Review the posted response to last week's Discussion Question(s).
MODULE 2
Week 7: Study Session 5
Topics:
• Meaning of Skewness
• Measures of Skewness
• Meaning of Kurtosis
• Measures of Kurtosis
Activities:
1. Study the course material of this session
2. Watch the video of this study session
3. Listen to the audio of this study session
Discussion Questions:
1. What do you understand by the term skewness and what is the purpose of
computing its value?
2. What do you understand by the term kurtosis and what is the purpose of
computing its value?
Week 8: Study Session 6
Topics:
• Meaning of Probability
• How to Assign Probabilities to Events
• Computational Probability Rules
• Bayes' Theorem
Activities:
1. Study the course material of this session
2. Watch the video of this study session
3. Listen to the audio of this study session
Discussion Questions:
Explain the meaning of the following:
1. Independent events.
2. Mutually exclusive events.
3. Conditional probability.
4. Expected value.
Week 9: Study Session 7
Topics:
• Meaning of Discrete Probability Distribution
• Bernoulli Random Variable
• The Binomial Distribution
• The Poisson Distribution
• The Hyper-geometric Distribution
Activities:
1. Study the course material of this session
2. Watch the video of this study session
3. Listen to the audio of this study session
Discussion Questions:
Define discrete random variable and discrete probability distribution.
What are the properties of discrete probability distribution?
Week 10: Study Session 8
Topics:
• Meaning of Continuous Probability Distribution
• Normal Distribution
• The Standard Normal Distribution
• The Standard Normal Variable Transformation
Activities:
1. Study the course material of this session
2. Watch the video of this study session
3. Listen to the audio of this study session
Discussion Questions:
1. Define continuous random variable and continuous probability distribution.
2. What are the properties of continuous probability distribution?
Week 11
MID SEMESTER BREAK
MODULE 3
Week 13: Study Session 10
Topics:
• Sampling Distribution of the Mean
• Central Limit Theorem
• Sampling Distribution of the Proportion
• Sampling Distribution of the Difference of Sample Means
• Sampling Distribution of the Difference of Proportions
• Small Sampling Distributions
Activities:
1. Study the course material of this session
2. Watch the video of this study session
3. Listen to the audio of this study session
Discussion Questions:
1. Define sampling distribution of the mean and sampling distribution of proportion.
2. Define the following:
- Parameter
- Statistic
- Standard error
Week 14: Study Session 11
Topics:
• The Null and Alternative Hypotheses
• Classical Approach to Testing Hypotheses
• Statistical Decision Rules and their Applications
• The Meaning and Interpretation of p-Value
• Power of a Test and the Size Effect
Activities:
1. Study the course material of this session
2. Watch the video of this study session
3. Listen to the audio of this study session
Discussion Questions:
1. Define null and alternative hypothesis.
2. What is the difference between type I and type II errors?
3. What is the difference between one-tailed and two-tailed tests?
Week 20
Week 21 SEMESTER EXAMINATION
7.2 Course Outline
MODULE 1: Basic Introduction to Statistics
Study Session 1: Introduction
- Meaning and Definitions of Statistics
- Classification of Statistics
- Significance of Statistics
- Limitations of Statistics
- Data Sources and Data Types
- Key Statistical Concepts
Study Session 2: Presentations of Statistical Data
- Tabulation Method
- Charting Method
- Graphical Techniques for Quantitative Data
Study Session 3: Measures of Central Tendency
- Meaning of Central Tendency
- Measures of Central Tendency
Study Session 4: Measures of Dispersion
- Meaning of Variability
- Significance and Properties of Measuring Variability
- Measures of Variability
- Interpretation of Standard Deviation
9.1.1 Objectives
After completing this module, you should be able to:
1. comprehend the historical evolution of statistics;
2. understand the meaning of statistics and its classifications;
3. critique the significance and limitations of statistics;
4. identify data sources, data types, and key statistical concepts and symbols;
5. apply mean, median, mode and other specialised averages to issues in your
business environment;
6. understand the limitations of central tendency measures;
7. compute variability measures and their properties; and
8. calculate skewness and kurtosis and conduct their respective tests.
9.1.2 STUDY SESSIONS
9.1.2.1 STUDY SESSION 1: Introduction
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Meaning and Definitions of Statistics
(B) Classification of Statistics
(C) Significance of Statistics
(D) Scope/Uses of Statistics
(E) Limitations of Statistics
(F) Sources of Data
(G) Types of Data
(H) Key Statistical Concepts
Summary
Self-assessment questions and Answers
References/Further Readings
Introduction
In this study session, you will be introduced to Business Statistics &
Quantitative Analysis, which forms the basis of the discussions that follow.
As you will see, Business Statistics is used in many disciplines such as
financial analysis, econometrics, auditing, production & operations, and
marketing research. It provides the knowledge and skills to interpret and use
statistical techniques in a variety of business applications. By means of
statistical concepts and statistical thinking, decision makers are able to solve
problems in a diversity of contexts, add value to decisions and reduce
guesswork in decision making.
Learning Outcomes
At the end of this study session you should be able to
1. Explain the meaning & definition of Business Statistics.
2. Discuss the classifications of statistics.
3. Discuss the uses and limitations of Business Statistics.
4. Explain the significance of statistics.
5. Know the types of statistics
6. Understand the key statistical concepts.
The word ‘statistics’ is used in either the plural or the singular form. In the
plural form, it refers to a set of figures or data, while in the singular form it
refers to the whole body of tools that are used to collect data, organise and
interpret them and, finally, draw conclusions from them. It should be noted
that both aspects of statistics are important if quantitative data are to
serve their purpose.
The desired information about a given population of interest may be collected
by observing all the units comprising the population. This total coverage is
called a census. Getting the desired value for the population through a census
is not always feasible or practical, for various reasons. Apart from time and
money considerations making census operations prohibitive, observing each
individual unit of the population with reference to a data characteristic may at
times involve destructive testing. In such cases, the only recourse available is
to employ the partial or incomplete information gathered through a sample.
This is precisely what inferential statistics does. Thus, obtaining a particular
value from the sample information and using it to draw an inference about the
entire population is the subject matter of inferential statistics. An example of a
population is all students of Ahmadu Bello University (ABU), Zaria-Nigeria,
while an example of a sample is all MBA students of ABU, Zaria-Nigeria.
Data sources are of two types, viz., primary and secondary. The two can be
defined as follows:
1. Primary data: Those data which do not already exist in any form, and thus
have to be collected for the first time from the primary source(s). By their very
nature, these data require fresh and first-time collection covering the whole
population or a sample drawn from it. The various methods of collecting
primary data include surveys, interviews, observation, questionnaires,
experiments, etc.
ii. Discrete data: are the values assumed by a discrete variable. A discrete
variable is the one whose outcomes are measured in fixed numbers. Such data
are essentially count data. These are derived from a process of counting, such as
the number of items possessing or not possessing a certain characteristic. The
number of customers visiting a departmental store every day, the incoming
flights at an airport, and the defective items in a consignment received for sale,
are all examples of discrete data.
i. Nominal data: are the outcome of classification into two or more categories
of items or units comprising a sample or a population according to some quality
characteristic. Classification of students according to sex (as males and
females), of workers according to skill (as skilled, semi-skilled, and unskilled),
and of employees according to the level of education (as matriculates,
undergraduates, and post-graduates), all result in nominal data. Given any
such basis of classification, it is always possible to assign each item to a
particular class and make a summation of items belonging to each class. The
count data so obtained are called nominal data.
ii. Rank data: on the other hand, are the result of assigning ranks to specify
order in terms of the integers 1,2,3, ..., n. Ranks may be assigned according to
the level of performance in a test, a contest, a competition, an interview, or a
show. The candidates appearing in an interview, for example, may be assigned
ranks in integers ranging from 1 to n, depending on their performance in the
interview. Ranks so assigned can be viewed as the continuous values of a
variable involving performance as the quality characteristic.
ii. Variable: This is any quality that can have a number of values, which may
be either discrete or continuous. A variable is a property that can take on
different values. Individuals in a class may differ in sex, age, intelligence,
height, etc. These properties are variables. Variables may vary in quality or in
quantity. Constants, unlike variables, do not assume different values.
iii. Dependent Variable: A variable whose values are influenced by the values
of another variable, so that a change in the latter causes a change in the
former. E.g. in y = 3x, y is the dependent variable because a change in the
value of x causes a change in the value of y. That is, the change in y depends
on x.
iv. Independent Variable: The variable which exerts influence on another
variable in the previous section is the independent variable. The value of the
independent variable explains the value of the dependent variable. E.g.
y = 3x - 2 is a functional relationship between x and y in which y is the
dependent variable and x is the independent variable.
vii. Discrete Variable: This is a variable that can only assume whole
numbers. Examples are the number of Local Government Council Areas in the
States of Nigeria and the number of female students in the various
programmes of Ahmadu Bello University.
A discrete variable has “interruptions” between the values it can assume. For
instance, between 1 and 2 there are infinitely many values such as 1.1, 1.11,
1.111, 1.1111 and so on, none of which a discrete variable can take. These
gaps are called interruptions.
viii. Continuous Variable: This is a variable that can assume both decimal and
non-decimal values. There is always a continuum of values that the continuous
variable can assume. The interruptions that characterize the discrete variable are
absent in the continuous variable. Weight, for example, can take either whole
or decimal values, such as 20 kilograms and 220.1752 kilograms.
ix. Distribution: This is the arrangement of a set of numbers classified
according to some properties or attributes such as age, height, weight, etc.
xi. Sample: This is the part of the population that is selected for a study. It is a
subset of a population. It is also a sub-group or sub-aggregate drawn from a
population, i.e. the portion appropriately selected out of the population by
some statistical method for observation.
xii. Random Sample: This is a sample drawn from a population by chance, so
that every unit has a known chance of being selected (an equal chance in simple
random sampling). The results of its analysis may therefore be used to
generalize about the population from which it was drawn.
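The ideas of population, sample and random sample defined above can be illustrated with a minimal Python sketch. The population values, the helper name `draw_simple_random_sample` and the seed are all hypothetical, chosen only for illustration; the sketch assumes simple random sampling without replacement.

```python
import random
from statistics import mean

# Hypothetical population: ages of 20 students (illustrative values only).
population = [21, 23, 25, 22, 30, 28, 24, 26, 27, 29,
              31, 22, 23, 35, 40, 33, 26, 24, 28, 32]


def draw_simple_random_sample(units, size, seed=None):
    """Return `size` units chosen without replacement, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(units, size)


sample = draw_simple_random_sample(population, size=5, seed=1)

population_mean = mean(population)  # a parameter: describes the population
sample_mean = mean(sample)          # a statistic: computed from the sample

print("Sample:", sample)
print("Population mean:", population_mean)
print("Sample mean:", sample_mean)
```

The sample mean will usually differ somewhat from the population mean; inferential statistics deals with how far such an estimate can be trusted.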
Summary
In this study session, we have discussed the basics of Business Statistics &
Quantitative Analysis, which form the basis of the discussions that follow.
As you have learnt, Business Statistics & Quantitative Analysis is used in
many disciplines such as financial analysis, econometrics, auditing,
production & operations, and marketing research. It provides the knowledge
and skills to interpret and use statistical techniques in a variety of business
applications. By means of statistical concepts and statistical thinking, decision
makers are able to solve problems in a diversity of contexts, add value to
decisions and reduce guesswork in decision making. We further discussed the
classification of statistics, its significance, scope and limitations, the sources
and types of data, and defined key statistical concepts.
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thomson Learning.
3. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
4. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.1.2.2 STUDY SESSION 2
Presentations of Statistical Data
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Tabulation Method
(B) Charting Method
1. Bar Chart
2. Simple Bar Chart
3. Multiple Bar Chart
4. Component Bar Chart
5. Percentage Component Bar Chart
6. Pie Chart
(C) Graphical Method
(D) Discussion Questions
Summary
Self-assessment questions and Answers
References/Further Readings
Introduction
You are welcome to study session two. In this session, you will understand the
role of tabulation, charting and graphical methods in Business Statistics.
Learning Outcomes
At the end of the study session, you should be able to:
1. Explain the role of tabulation in Business Statistics.
2. Discuss the concept of charting and graphical methods in Business
Statistics.
Tabulation, Charting & Graphical Methods
Data in their numerical (quantitative), categorical (qualitative) or ranking
(ordinal) forms do not seem meaningful until they are presented in tables, charts
or graphs. For the purpose of this study, we will focus more on the use of tables,
charts as well as graphical techniques.
Characteristics of a Table
1. A title which is a brief explanation of what the table is all about.
2. A column title or caption to show order of classification along the
columns.
3. A row title or sub title to show order of classification along the row.
4. A source note at the bottom which gives the source of the information
contained in the table.
5. An indication of the units in which the data in the table is given, usually
at the right hand corner.
The table below illustrates these characteristics
Table: Values of Principal Export Crops of Nigeria (2000-2005), N000,000
                              Periods
Export Crops     2000    2001    2002    2003    2004    2005
Cocoa            73.6    67.4    66.6    64.8    80.2    85.4
Palm produce     55.6    66.8    55.6    60.4    63.4    80.2
Groundnut        59.6    78.4    82.2    91.8    94.0   106.2
Total           188.8   212.6   200.4   217.0   237.6   271.8
Year Postgraduates
1990 300
1991 400
1992 500
1993 600
1994 750
[Figure: simple bar chart of Postgraduates by Year, 1990 to 1994]
[Figure: horizontally drawn simple bar chart of Postgraduates by Year, 1990 to 1994]
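A horizontally drawn simple bar chart can be approximated in plain text. The sketch below uses the postgraduate figures from the table above; the function name `horizontal_bar_lines` and the scale of one `#` per 50 postgraduates are illustrative assumptions, not part of the course material.

```python
# Postgraduate enrolment by year, taken from the table above.
postgraduates = {1990: 300, 1991: 400, 1992: 500, 1993: 600, 1994: 750}


def horizontal_bar_lines(data, units_per_mark=50):
    """Build one text bar per category; each '#' stands for `units_per_mark`."""
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / units_per_mark)
        lines.append(f"{label} | {bar} {value}")
    return lines


chart = horizontal_bar_lines(postgraduates)
print("\n".join(chart))
```

The length of each bar is proportional to the value it represents, which is the defining property of a simple bar chart.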
ii) Multiple Bar Chart
In a multiple bar chart, two or more variables of interest are represented by
separate bars drawn side by side; call them A and B. Suppose, for example,
that A is 20 for management staff and B is 130 for senior staff in a particular
organization. The organization's management and senior staff figures are then
presented in a multiple bar chart as follows:
Solution:
[Figure: multiple bar chart with two bars, management and senior staff; vertical axis from 0 to 140]
Another Example:
A women's council that wants to see the active involvement of women in
private sector employment conducted a research study. The research revealed
the following data for XYZ Company:
Year Male Female
1990 150 50
1991 200 100
1992 300 150
1993 350 150
1994 500 300
[Figure: multiple bar chart of Male and Female employees by Year]
iii) Component Bar Chart
In a component bar chart, each bar represents the total of the variables of
interest and is divided into segments, one for each component. Component
here means that the variables of interest are added together to make up the bar.
Example:
Using the above data from the women's council research on the involvement
of women in private sector employment, the figures for XYZ Company, with
totals, are as follows:
Year Male Female Total
1990 150 50 200
1991 200 100 300
1992 300 150 450
1993 350 150 500
1994 500 300 800
[Figure: component bar chart of Male and Female employees by Year]
iv) Percentage Component Bar Chart
A percentage component bar chart expresses each component of the variables
of interest as a percentage of the total, with each bar split into its constituent
parts.
Example:
Refer to the above information from the women's council, which is reproduced
below, and depict the information in a percentage component bar chart.
Year Male Female Total
1990 150 50 200
1991 200 100 300
1992 300 150 450
1993 350 150 500
1994 500 300 800
Solution:
Year Male% Female% Total
1990 75 25 100
1991 66.67 33.33 100
1992 66.67 33.33 100
1993 70 30 100
1994 62.5 37.5 100
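The percentages in the solution can be checked with a short Python sketch. It uses the XYZ Company figures from the example; the function name `percentage_components` and the rounding to two decimal places are illustrative choices.

```python
# Employment figures for XYZ Company (male, female), from the example above.
employment = {
    1990: (150, 50),
    1991: (200, 100),
    1992: (300, 150),
    1993: (350, 150),
    1994: (500, 300),
}


def percentage_components(male, female):
    """Return (male %, female %) of the combined total, rounded to 2 d.p."""
    total = male + female
    return round(male * 100 / total, 2), round(female * 100 / total, 2)


percentages = {year: percentage_components(m, f)
               for year, (m, f) in employment.items()}

for year, (male_pct, female_pct) in percentages.items():
    print(year, male_pct, female_pct)
```

Each pair of percentages adds up to 100 (up to rounding), which is why every bar in a percentage component bar chart has the same height.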
[Figure: percentage component bar chart; horizontal axis: Year (1990 to 1994); vertical axis: 0% to 100%; legend: Female, Male]
ITQ: What are the types of bar chart?
ITA: Simple bar chart, multiple bar chart, component bar chart and percentage component
bar chart.
b. Pie Chart
A pie chart is a circular representation of data. The circle is divided into
segments, one for each variable of interest. A pie chart shows the proportion of
each variable within a group of variables and relates it to the group as a whole.
The area of each segment is proportionate to the magnitude of the variable it
represents.
In the construction of a pie chart, you calculate the proportion of the total
that each frequency represents and multiply each proportion by 360 degrees.
Example:
The following information revealed the number of registered students of the
various departments in the Distance Learning Centre of Ahmadu Bello
University (ABU) Zaria-Nigeria.
Department Registered Students
Accounting 100
Business Administration 120
Economics 170
Geography 120
Political Science 190
Sociology 200
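$\text{Geography} = \frac{120}{900} \times 360^\circ = 48^\circ$
$\text{Pol. Science} = \frac{190}{900} \times 360^\circ = 76^\circ$
$\text{Sociology} = \frac{200}{900} \times 360^\circ = 80^\circ$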
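The construction rule for a pie chart (each frequency's share of the total, multiplied by 360 degrees) can be sketched in Python using the registered-student figures from the table above. The function name `pie_angles` is an illustrative choice.

```python
# Registered students per department, from the table above.
registered = {
    "Accounting": 100,
    "Business Administration": 120,
    "Economics": 170,
    "Geography": 120,
    "Political Science": 190,
    "Sociology": 200,
}


def pie_angles(frequencies):
    """Map each category to its sector angle in degrees (share of total x 360)."""
    total = sum(frequencies.values())
    return {name: count * 360 / total for name, count in frequencies.items()}


angles = pie_angles(registered)
for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```

A useful check on any pie chart calculation is that the sector angles add up to 360 degrees.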
[Figure: Pie Chart of registered students by department, with sector angles in degrees: Accountancy 40, Business 48, Economics 68, Geography 48, Pol. Science 76, Sociology 80]
ITA: A pie chart is a circular representation of data whereby a circle is divided into
segments, one for each variable of interest.
Exercise:
The following is the record of students offering various courses in the Business
Administration department of Ahmadu Bello University (ABU), Zaria-Nigeria.
Courses Scores
BUAD 101 90
BUAD 111 70
BUAD 113 40
BUAD 115 50
BUAD 117 30
BUAD 119 65
Summary
In this study session, you have been made to understand the role of tabulation
method, charting method as well as graphical methods in Business Statistics.
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006).Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
4. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.1.2.3 STUDY SESSION 3
Measures of Central Tendency
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Meaning of Central Tendency
(B) Measures of Central Tendency
i) Arithmetic Mean or Mean
ii) Median
iii) Mode
(C) Empirical Relationship between Mean, Median and Mode
(D) Specialised Averages
a. Harmonic Mean
b. Geometric Mean
(E) Relationship between Arithmetic Mean, Harmonic and Geometric
Mean
(F) Discussion Questions
Summary
Self-assessment questions and Answers
References/Further Readings
Introduction
In this study session, you will understand the meaning of central tendency, the
measures of central tendency as well as the specialised averages.
Learning outcomes
At the end of the study, you should be able to:
1. Understand the meaning of central tendency.
2. Know the measures of central tendency.
Where: i = 1
n = Number of items in a distribution
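The symbol ∑ (the Greek capital letter sigma) denotes summation. Removing the
subscript from the above, we have
$\bar{X} = \frac{\sum x}{n}$
For grouped data, the mean is
$\bar{X} = \frac{\sum fx}{\sum f}$
Where: x = class mark
∑f = total frequency
Using an assumed mean, the mean is
$\bar{X} = X_a + \frac{\sum fd}{\sum f}$
Where: $X_a$ = assumed mean
d = x − $X_a$ (deviation of each class mark from the assumed mean)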
Example:
Given the data below, compute the mean using the two methods.
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7
50 – 60 5
60 – 70 2
70 – 80 3
Solution:
Sales Class mark (x) Frequency (f) Fx
10 – 20 15 2 30
20 – 30 25 4 100
30 – 40 35 3 105
40 – 50 45 7 315
50 – 60 55 5 275
60 – 70 65 2 130
70 – 80 75 3 225
Total 26 1180
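* The class mark (x) is calculated by adding the lower class limit and the upper class
limit and dividing by two. For the first class:
$x = \frac{10 + 20}{2} = \frac{30}{2} = 15$
Direct method:
$\bar{X} = \frac{\sum fx}{\sum f} = \frac{1180}{26} = 45.38$
Assumed mean method (taking $X_a = 45$, which gives $\sum fd = 10$):
$\bar{X} = X_a + \frac{\sum fd}{\sum f} = 45 + \frac{10}{26} = 45 + 0.38 = 45.38$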
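Both methods can be compared in a short Python sketch. The function names `direct_mean` and `assumed_mean_method` are illustrative, and the assumed mean of 45 is taken from the final line of the worked solution.

```python
# Class marks and frequencies from the sales table
class_marks = [15, 25, 35, 45, 55, 65, 75]
freqs = [2, 4, 3, 7, 5, 2, 3]

def direct_mean(x, f):
    """Mean = sum(f * x) / sum(f)."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)

def assumed_mean_method(x, f, assumed):
    """Mean = assumed mean + sum(f * d) / sum(f), where d = x - assumed mean."""
    sum_fd = sum(fi * (xi - assumed) for xi, fi in zip(x, f))
    return assumed + sum_fd / sum(f)

mean_direct = direct_mean(class_marks, freqs)
mean_assumed = assumed_mean_method(class_marks, freqs, assumed=45)
print(round(mean_direct, 2), round(mean_assumed, 2))
```

Any class mark can serve as the assumed mean; the result is the same, and only the size of the intermediate figures changes.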
ii) Median
The median is one of the statistical measures of central tendency. It is the
middle item of a distribution once the items have been arranged in order. When
the number of items is odd, the median is simply the middle item. When the
number of items is even, there is no single middle item and a short calculation
is needed to locate the median.
The median can therefore be defined as the value of the observation that divides
the data into two equal parts: the value of the middle observation (or the mean
of the values of the two middle observations) when the observations are
arranged in ascending or descending order of magnitude.
Example:
Calculate the median of 8, 12, 14, 19, and 11
Solution:
Median = $\left(\frac{n + 1}{2}\right)$th term
= $\frac{5 + 1}{2} = \frac{6}{2}$ = 3rd item
By this formula, the median is the 3rd item in the distribution after arranging
it in ascending order:
8, 11, 12, 14, 19
So median = 12
Example:
Compute the median of 3, 5, 7, and 9
Solution:
For an even number of items, the same formula gives a position that lies between
two items:
Median = $\left(\frac{n + 1}{2}\right)$th term
= $\frac{4 + 1}{2} = \frac{5}{2}$ = 2.5th item
The 2nd and 3rd items in the ordered distribution are therefore added and
divided by two.
$\frac{5 + 7}{2} = \frac{12}{2} = 6$
So median = 6
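The two cases (odd and even number of items) can be sketched in Python as follows. The function name `median` is illustrative; the standard library's `statistics.median` gives the same results.

```python
def median(values):
    """Middle item of the ordered data, or the mean of the two middle items."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of items: the ((n + 1) / 2)th item is the median
        return ordered[mid]
    # Even number of items: average the two items on either side of position (n + 1) / 2
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([8, 12, 14, 19, 11]))  # odd case from the first example
print(median([3, 5, 7, 9]))         # even case from the second example
```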
Example:
Given the grouped data below, compute the median.
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7
50 – 60 5
60 – 70 2
70 – 80 3
Solution:
Sales Frequency Cumulative
10 – 20 2 2
20 – 30 4 6
30 – 40 3 9
40 – 50 7 16
50 – 60 5 21
60 – 70 2 23
70 – 80 3 26
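N/2 = 26/2 = 13 must then be located in the cumulative frequency column. The
first cumulative frequency that reaches 13 is 16, so the median class is 40 – 50
and its lower class boundary is 40. The lower class boundary is the point midway
between the upper limit of the preceding class and the lower limit of the median
class.
i.e.
$L_m$ = 40 (lower class boundary of the median class)
N/2 = 26/2 = 13
$(\sum f)_p$ = 9 (cumulative frequency of the classes preceding the median class)
$f_m$ = 7 (frequency of the median class)
C = 10 (class interval)
Median = $L_m + \left(\frac{\frac{N}{2} - (\sum f)_p}{f_m}\right) C$
= $40 + \left(\frac{13 - 9}{7}\right) \times 10$
= $40 + \frac{4}{7} \times 10$ = 40 + 5.71 = 45.71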
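The grouped-median formula can be sketched in Python as below. The function name `grouped_median` and its argument names are illustrative; equal class widths are assumed, as in the example.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median = L_m + ((N/2 - cumulative frequency before median class) / f_m) * C."""
    n = sum(freqs)
    half = n / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= half:
            # This is the median class
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must not be empty")

lower_boundaries = [10, 20, 30, 40, 50, 60, 70]
freqs = [2, 4, 3, 7, 5, 2, 3]
print(round(grouped_median(lower_boundaries, freqs, width=10), 2))
```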
Exercise:
Given the data below:
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7
50 – 60 5
60 – 70 2
70 – 80 3
Determine the median using graphical method
Solution:
Less than (upper class boundary) Cumulative frequency
Less than 10 0
Less than 20 2
Less than 30 6
Less than 40 9
Less than 50 16
Less than 60 21
Less than 70 23
Less than 80 26
Plotting these cumulative frequencies against the upper class boundaries gives
the less-than ogive. The median is the sales value at which the curve reaches
N/2 = 13.
Advantages of Median
1. It eliminates the effect of extreme values
2. It often corresponds to a definite item in a distribution.
3. It is easy to calculate and understand
4. Only the values of the middle items need to be known, i.e. the median can
still be calculated even if the first and the last classes are open-ended and
their lower and upper limits are unknown.
Disadvantages of Median
1. If the distribution is irregular, the median may not be clearly defined.
2. When the items are grouped, it may not be possible to locate the median
exactly.
iii) Mode
Mode is a French word meaning fashion. It is defined as the most frequent
(fashionable) value. In simple terms, the mode is the item with the highest
frequency in a distribution of ungrouped data. In other words, the item that
occurs most often in a distribution is the mode.
A distribution with one mode is referred to as unimodal, one with two modes as
bimodal, and one with three or more modes as multi-modal.
Example:
Determine the mode of the following distribution:
2, 3, 4, 2, 6, 2
Solution:
Mode = 2
The mode is 2 because it occurs most often (three times).
Example:
Determine the mode for the following distribution,
3, 6, 2, 4, 3, 3, 2, 2, 5
Solution:
Modes = 2 and 3
The numbers 2 and 3 each occur three times, more often than any other value, so
the distribution has two modes and is therefore bimodal.
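Counting how often each value occurs makes it easy to see whether a distribution has one mode or several. The sketch below uses Python's `collections.Counter`; the function name `modes` is illustrative.

```python
from collections import Counter

def modes(values):
    """Return every value that shares the highest frequency, in ascending order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([2, 3, 4, 2, 6, 2]))           # one mode
print(modes([3, 6, 2, 4, 3, 3, 2, 2, 5]))  # two values tie for the highest frequency
```

In the second distribution both 2 and 3 occur three times, so the list returned has two entries.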
For grouped data, the mode is computed as
Mode = $L + \left(\frac{d_1}{d_1 + d_2}\right) C$
Where:
L = lower class boundary of the modal class
$d_1$ = excess of the modal frequency over the frequency of the class preceding the
modal class
$d_2$ = excess of the modal frequency over the frequency of the class following the
modal class
C = class interval of the modal class
Example:
Given the data compute the mode
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7
50 – 60 5
60 – 70 2
70 – 80 3
Solution:
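Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7* (modal class)
50 – 60 5
60 – 70 2
70 – 80 3
L = 40, $d_1$ = 7 − 3 = 4, $d_2$ = 7 − 5 = 2, C = 10
Mode = $40 + \left(\frac{4}{4 + 2}\right) \times 10$ = 40 + 6.67 = 46.67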
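A minimal Python sketch of the grouped-mode formula, Mode = L + (d1 / (d1 + d2)) × C, is given below. The function name `grouped_mode` is illustrative, and the sketch assumes equal class widths and a single modal class.

```python
def grouped_mode(lower_boundaries, freqs, width):
    """Mode = L + d1 / (d1 + d2) * C for the class with the highest frequency."""
    i = freqs.index(max(freqs))  # position of the modal class
    before = freqs[i - 1] if i > 0 else 0              # frequency of the preceding class
    after = freqs[i + 1] if i < len(freqs) - 1 else 0  # frequency of the following class
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return lower_boundaries[i] + d1 / (d1 + d2) * width

lower_boundaries = [10, 20, 30, 40, 50, 60, 70]
freqs = [2, 4, 3, 7, 5, 2, 3]
print(round(grouped_mode(lower_boundaries, freqs, width=10), 2))
```

For the sales data the modal class is 40 – 50, with d1 = 7 − 3 = 4 and d2 = 7 − 5 = 2.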
Exercise:
Given the data below:
Determine the mode using graphical method.
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7
50 – 60 5
60 – 70 2
70 – 80 3
Solution:
Mode = 46.67
The mode is found at the point where the two lines intersect: a vertical line
drawn from that point of intersection to the horizontal axis gives the mode of
the distribution.
Advantages of Mode
1. Mode is very easy to calculate and understand.
2. Open-ended classes or extreme values do not affect it.
3. It is not necessary to know the values of all the items in the distribution in
order to calculate the mode.
Disadvantages of Mode
1. Mode is not a good measure of central tendency because it depends on
the arbitrary grouping of data.
2. Because the mode is imprecise, its usefulness in calculations requiring a
high degree of accuracy is limited, particularly if the distribution is
bimodal or widely dispersed.
The positions of the mean, median and mode for frequency curves that are skewed
to the right and to the left respectively are shown below.
Example:
From the previous sales examples, the mean and the median were calculated. Use
the empirical formula to calculate the mode when the mean is 45.38 and the
median is 45.71.
Solution:
Mean = 45.38
Median = 45.71
Mean – Mode = 3 (Mean – Median)
Mode = Mean – 3 (Mean – Median)
= 45.38 – 3 (45.38 – 45.71)
= 45.38 – 3 (–0.33)
= 45.38 + 0.99
= 46.37
The empirical estimate of 46.37 is close to the value of 46.67 obtained from the
grouped-data formula, which shows that the empirical relationship agrees well
with the direct calculation.
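The empirical relationship, Mean − Mode = 3 (Mean − Median), can be rearranged and evaluated in a few lines of Python. The function name `empirical_mode` is illustrative.

```python
def empirical_mode(mean, median):
    """Mode = Mean - 3 * (Mean - Median), i.e. 3 * Median - 2 * Mean."""
    return mean - 3 * (mean - median)

estimate = empirical_mode(mean=45.38, median=45.71)
print(round(estimate, 2))
```

The relationship is only approximate, so the estimate is expected to be close to, but not identical with, the mode obtained from the grouped-data formula.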
a. Harmonic Mean
The harmonic mean is defined as the reciprocal of the arithmetic mean of the
reciprocals of the items in a distribution. It is used to average rates.
Examples:
Kunun-Zaki is one of the popular fermented beverage drinks in northern
Nigeria. To manufacture kunun-Zaki the cost of the ingredients per kilogramme,
which serves as a mixture, are as follows:
Items Cost per kg (N)
Millet 9
Guinea Corn 6
Sugar 3
Spices 2
$\frac{1}{HM} = \frac{1}{n} \sum \frac{f}{X}$
Where: $f_1 + f_2 + f_3 + \dots + f_n = \sum f = n$
Example:
Given the data below:
Sales Frequency
10 – 20 2
20 – 30 4
30 – 40 3
40 – 50 7
50 – 60 5
60 – 70 2
70 – 80 3
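Solution:
$\frac{1}{HM} = \frac{1}{n} \sum \frac{f}{X}$
= $\frac{1}{26}\left[\frac{2}{15} + \frac{4}{25} + \frac{3}{35} + \frac{7}{45} + \frac{5}{55} + \frac{2}{65} + \frac{3}{75}\right]$
= 0.0385 [0.1333 + 0.1600 + 0.0857 + 0.1556 + 0.0909 + 0.0308 + 0.0400]
= 0.0385 × 0.6963
= 0.0268
HM = $\frac{26}{0.6963}$ = 37.34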
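Because the formula gives the reciprocal of the harmonic mean, the final step is to invert the result. A minimal Python sketch (the function name `grouped_harmonic_mean` is illustrative) makes this explicit:

```python
def grouped_harmonic_mean(class_marks, freqs):
    """HM = sum(f) / sum(f / x), the reciprocal of (1/n) * sum(f / x)."""
    n = sum(freqs)
    sum_f_over_x = sum(f / x for x, f in zip(class_marks, freqs))
    return n / sum_f_over_x

class_marks = [15, 25, 35, 45, 55, 65, 75]
freqs = [2, 4, 3, 7, 5, 2, 3]
hm = grouped_harmonic_mean(class_marks, freqs)
print(round(hm, 2))
```

The harmonic mean of about 37.34 is smaller than the arithmetic mean of 45.38 for the same data, as expected.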
b. Geometric Mean
The geometric mean is used to measure the average rate of change or growth of
some quantity. It is computed by taking the nth root of the product of n values
representing change, and is denoted by either G or GM.
For a set of n observations
$X_1, X_2, X_3, \dots, X_n$
$GM = \sqrt[n]{X_1 \times X_2 \times X_3 \times \dots \times X_n}$
Or $GM = (X_1 \times X_2 \times X_3 \times \dots \times X_n)^{\frac{1}{n}}$
Taking logarithms of both sides:
$\log GM = \frac{1}{n} \log(X_1 \times X_2 \times X_3 \times \dots \times X_n) = \frac{1}{n} \sum \log X$
Take antilog on both sides:
$GM = \text{antilog}\left[\frac{1}{n} \sum \log X\right]$
Example:
Calculate the geometric mean of the following numbers 6, 4, 3, 7.
Solution:
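$GM = \text{antilog}\left[\frac{1}{n} \sum \log X\right]$
= antilog $\left[\frac{1}{4}(\log 6 + \log 4 + \log 3 + \log 7)\right]$
= antilog $\left[\frac{1}{4}(0.7782 + 0.6021 + 0.4771 + 0.8451)\right]$
= antilog $\left[\frac{1}{4}(2.7025)\right]$
= antilog (0.6756)
= 4.74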
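The logarithm route and the direct nth-root route give the same answer, as the following Python sketch illustrates. The function names are illustrative, and base-10 logarithms are used to mirror the worked example.

```python
import math

def geometric_mean_logs(values):
    """GM = antilog of the mean of the base-10 logarithms."""
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log

def geometric_mean_root(values):
    """GM = nth root of the product of the n values."""
    return math.prod(values) ** (1 / len(values))

data = [6, 4, 3, 7]
gm_logs = geometric_mean_logs(data)
gm_root = geometric_mean_root(data)
print(round(gm_logs, 2), round(gm_root, 2))
```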
Example:
In a nursery school class, three children have the following ages: 5, 6 and 7.
Calculate:
a) Arithmetic mean
b) Harmonic mean
c) Geometric mean
d) Comment on the relationship between a, b and c.
Solution:
a) $\bar{X} = \frac{\sum X}{n} = \frac{5 + 6 + 7}{3} = \frac{18}{3} = 6$
b) $HM = \frac{n}{\sum \frac{1}{X}} = \frac{3}{\frac{1}{5} + \frac{1}{6} + \frac{1}{7}} = \frac{3}{0.2 + 0.17 + 0.14} = \frac{3}{0.51} = 5.88$
c) $GM = \sqrt[n]{X_1 \times X_2 \times \dots \times X_n} = \sqrt[3]{5 \times 6 \times 7} = \sqrt[3]{210} = 5.94$
d) The relationship between the three is as follows:
$HM < GM < \bar{X}$
5.88 < 5.94 < 6
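The ordering of the three averages can be confirmed with Python's standard `statistics` module (`geometric_mean` requires Python 3.8 or later). This is only a check on the worked example, not part of the course method.

```python
import statistics

ages = [5, 6, 7]

am = statistics.mean(ages)            # arithmetic mean
gm = statistics.geometric_mean(ages)  # geometric mean
hm = statistics.harmonic_mean(ages)   # harmonic mean

print(f"HM = {hm:.3f}, GM = {gm:.3f}, AM = {am:.3f}")
print(hm < gm < am)
```

The three averages are equal only when all the observations are identical; otherwise the harmonic mean is the smallest and the arithmetic mean the largest.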
Quadratic Mean
The quadratic mean is one of the measures of central tendency. It is sometimes
called the Root Mean Square (RMS). The quadratic mean is more popular and useful
in physical applications.
Therefore $QM = \sqrt{\frac{\sum X^2}{n}}$
Example:
Refer to the previous example and calculate the quadratic mean.
Solution:
$QM = \sqrt{\frac{\sum X^2}{n}}$
= $\sqrt{\frac{5^2 + 6^2 + 7^2}{3}}$
= $\sqrt{\frac{25 + 36 + 49}{3}}$
= $\sqrt{\frac{110}{3}}$
= $\sqrt{36.67}$
= 6.06
Summary
In this study session, we have discussed the meaning of central tendency, the
measures of central tendency as well as the specialised averages.
Introduction
In this study session, you will be able to understand the meaning of variability,
significance and properties of variability as well as the measures of variability.
Learning Outcomes
At the end of the study session, you should be able to:
1. Discuss the meaning of variability.
2. Explain the significance and properties of variability.
3. Discuss the measures of variability.
(A) Meaning of Variability/Dispersion/Scatter/Spread
A measure of variation or dispersion is one that measures the extent to which
there are differences between individual observation and some central or
average value. In measuring variation we shall be interested in the amount of
the variation or its degree but not in the direction. For example, a measure of 6
inches below the mean has just as much dispersion as a measure of six inches
above the mean.
As we have seen, the various measures of central value give us one single figure
that represents the entire data. But the average alone cannot adequately describe
a set of observations, unless all the observations are the same. It is necessary to
describe the variability or dispersion of the observations. In two or more
distributions, the central value may be the same but still there can be wide
disparities in the formation of the distributions. Measures of dispersion help
us in studying this important characteristic of a distribution. With the help of
dispersion, we have an idea about the homogeneity or heterogeneity of the
distribution.
1. Range
Range is the difference between the highest and the lowest observation in a
given set of data. Although the range is easy to determine and use, it is a poor
measure of variability because it makes use of only the two extreme values in
the given data.
Example:
The number of patients a medical consultant sees per day during the five
working days (Monday to Friday) in the ABU Teaching Hospital is as follows:
2, 4, 9, 5, 3
Determine the range
Solution:
Range = highest value – lowest value
= 9 – 2 = 7
Symbolically, interquartile range = $Q_3 - Q_1$
Many times the interquartile range is reduced to the form of the quartile
deviation or semi-interquartile range.
Symbolically, quartile deviation = $\frac{Q_3 - Q_1}{2}$
Lower Quartile ($Q_1$)
$Q_1$ divides a distribution into two parts such that 25% of all items in the
distribution have a value less than $Q_1$ and 75% have a value more than $Q_1$.
Upper Quartile ($Q_3$)
$Q_3$ divides a distribution into two parts such that 75% of all items in the
distribution have a value less than $Q_3$ while the remaining 25% have a value
more than $Q_3$.
NB: When the quartile deviation is small, it means that there is a small
deviation in the central 50 percent of items. In contrast, if the quartile
deviation is high, it shows that the central 50 percent of items have a large
variation. It may be noted that in a symmetrical distribution, the two
quartiles, that is, the Upper Quartile ($Q_3$) and the Lower Quartile ($Q_1$),
are equidistant from the Median (M).
Symbolically, $M - Q_1 = Q_3 - M$
However, this is seldom the case as most business and economic data are
asymmetrical. But one can assume that approximately 50 percent of the
observations are contained in the interquartile range. It may be noted that the
interquartile range or the quartile deviation is an absolute measure of
dispersion. It can be changed into a relative measure of dispersion as follows:
Coefficient of quartile deviation = $\frac{Q_3 - Q_1}{Q_3 + Q_1}$
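The quartiles of a grouped frequency distribution are computed as
$Q_j = L_j + \left(\frac{\frac{jN}{4} - (\sum f)_p}{f_j}\right) C_j$
Where:
$Q_j$ = jth quartile
$L_j$ = lower class boundary of the jth quartile class
N = total frequency
$(\sum f)_p$ = sum of frequencies preceding the jth quartile class
$f_j$ = frequency of the jth quartile class
$C_j$ = class size of the jth quartile class
j = 1, 2, 3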
Example:
JAMEEL Marketing (Nig.) has recorded the following frequency distribution in
one of its filling stations.
Solution:
Petrol per litre Frequency Cumulative frequency
35 – 39 2 2
40 – 44 3 5
45 – 49 2 7
50 – 54 8 15
55 – 59 7 22
60 – 64 6 28
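To determine the lower quartile, compute $\frac{N}{4} = \frac{28}{4} = 7$ and locate its
position in the cumulative frequency column. The 7th item falls in the class
45 – 49, so:
L = 44.5
∑f = 5
C = 5
f = 2
$Q_1 = 44.5 + \left(\frac{\frac{28}{4} - 5}{2}\right) \times 5$
$Q_1 = 44.5 + \left(\frac{7 - 5}{2}\right) \times 5$
= $44.5 + \frac{2}{2} \times 5$
= 44.5 + 5
= 49.5
To determine the upper quartile, compute $\frac{3N}{4} = \frac{84}{4} = 21$ and locate its
position in the cumulative frequency column. The 21st item falls in the class
55 – 59, so L = 54.5, ∑f = 15, C = 5 and f = 7:
$Q_3 = 54.5 + \left(\frac{\frac{84}{4} - 15}{7}\right) \times 5$
$Q_3 = 54.5 + \left(\frac{21 - 15}{7}\right) \times 5$
= $54.5 + \frac{6}{7} \times 5$
= 54.5 + 4.29
= 58.79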
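The quartile calculation can be sketched in Python as follows. The function name `grouped_quartile` is illustrative, and the class boundaries (34.5, 39.5, and so on) are obtained by subtracting 0.5 from each lower class limit, as the worked solution does for 44.5 and 54.5.

```python
def grouped_quartile(j, lower_boundaries, freqs, width):
    """Q_j = L_j + ((j * N / 4 - cumulative frequency before the class) / f_j) * C_j."""
    n = sum(freqs)
    target = j * n / 4
    cumulative = 0  # sum of frequencies preceding the current class
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must not be empty")

lower_boundaries = [34.5, 39.5, 44.5, 49.5, 54.5, 59.5]
freqs = [2, 3, 2, 8, 7, 6]

q1 = grouped_quartile(1, lower_boundaries, freqs, width=5)
q3 = grouped_quartile(3, lower_boundaries, freqs, width=5)
interquartile_range = q3 - q1
quartile_deviation = interquartile_range / 2
print(round(q1, 2), round(q3, 2), round(interquartile_range, 2), round(quartile_deviation, 2))
```

With Q1 and Q3 in hand, the interquartile range and the quartile deviation follow directly from their definitions.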
Example:
Size of Item Frequency
2-4 20
4-6 40
6-8 30
8-10 10
Solution:
Size of Item Mid-Point (m) Frequency (f) fm d = m − x̄ f|d|
2 – 4 3 20 60 -2.6 52
4 – 6 5 40 200 -0.6 24
6 – 8 7 30 210 1.4 42
8 – 10 9 10 90 3.4 34
Total 100 560 152
$\bar{x} = \frac{\sum fm}{n} = \frac{560}{100} = 5.6$
$MD(\bar{x}) = \frac{\sum f|d|}{n} = \frac{152}{100} = 1.52$
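The mean deviation uses the absolute value of each deviation, which is why the negative deviations in the table contribute positive amounts to the f|d| column. A minimal Python sketch (the function name `mean_deviation` is illustrative):

```python
def mean_deviation(mid_points, freqs):
    """Mean deviation about the mean = sum(f * |m - mean|) / sum(f)."""
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(mid_points, freqs)) / n
    total_abs_dev = sum(f * abs(m - mean) for m, f in zip(mid_points, freqs))
    return mean, total_abs_dev / n

mid_points = [3, 5, 7, 9]
freqs = [20, 40, 30, 10]
mean, md = mean_deviation(mid_points, freqs)
print(round(mean, 2), round(md, 2))
```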
Example:
x x−µ ( x − µ )2
20 20-18 = 2 4
15 15-18 = -3 9
19 19-18 = 1 1
24 24-18 = 6 36
16 16-18 = -2 4
14 14-18 = -4 16
108 Total 70
Solution:
Mean = 108/6 = 18
The second column shows the deviations from the mean. The third or the last
column shows the squared deviations, the sum of which is 70. The arithmetic
mean of the squared deviations is:
$\frac{\sum (x - \mu)^2}{N} = \frac{70}{6} = 11.67$ approx. (This result is the variance.)
This mean of the squared deviations is known as the variance. It may be noted
that this variance is described by different terms that are used interchangeably:
the variance of the distribution X; the variance of X; the variance of the
distribution; and just simply, the variance.
It is also written as $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Where $\sigma^2$ (called sigma squared) is used to denote the variance.
The standard deviation is the positive square root of the variance.
Symbolically, $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$
In applied Statistics, the standard deviation is more frequently used than the
variance. This can also be written as:
$\sigma = \sqrt{\frac{\sum x_i^2 - \frac{(\sum x_i)^2}{N}}{N}}$
We use this formula to calculate the standard deviation from the individual
observations given earlier.
Example:
x x2
20 400
15 225
19 361
24 576
16 256
14 196
108 2014
Solution:
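$\sum x_i^2 = 2014$, $\sum x_i = 108$, N = 6
$\sigma = \sqrt{\frac{2014 - \frac{(108)^2}{6}}{6}} = \sqrt{\frac{2014 - \frac{11664}{6}}{6}} = \sqrt{\frac{2014 - 1944}{6}}$
$\sigma = \sqrt{\frac{70}{6}} = \sqrt{11.67}$
σ = 3.42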
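Both routes to the standard deviation (from the squared deviations and from the shortcut formula) can be compared in Python. The sketch uses the standard `statistics` functions `pvariance` and `pstdev`, which divide by N as the text does.

```python
import math
import statistics

data = [20, 15, 19, 24, 16, 14]
n = len(data)

# Definition: mean of the squared deviations from the mean, and its square root
variance = statistics.pvariance(data)
sd_definition = statistics.pstdev(data)

# Shortcut: sqrt((sum(x^2) - (sum(x))^2 / N) / N)
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
sd_shortcut = math.sqrt((sum_x2 - sum_x ** 2 / n) / n)

print(round(variance, 2), round(sd_definition, 2), round(sd_shortcut, 2))
```

The shortcut formula avoids computing each deviation separately, which is convenient when the mean is not a whole number.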
Example:
The following distribution relates to the marks obtained by students in an
examination:
Marks Number of Students
0-10 1
10-20 3
20-30 6
30-40 10
40-50 12
50-60 11
60-70 6
70-80 3
80-90 2
90-100 1
Solution:
Marks Number of Students (f) Mid-Point (m) Deviation d = (m − 55)/10 fd fd²
0-10 1 5 -5 -5 25
10-20 3 15 -4 -12 48
20-30 6 25 -3 -18 54
30-40 10 35 -2 -20 40
40-50 12 45 -1 -12 12
50-60 11 55 0 0 0
60-70 6 65 1 6 6
70-80 3 75 2 6 12
80-90 2 85 3 6 18
90-100 1 95 4 4 16
Total 55 -45 231
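In the case of a frequency distribution where the individual values are not
known, we use the midpoints of the class intervals. Thus, the formula used for
calculating the standard deviation is as given below:
$\sigma = \sqrt{\frac{\sum_{i=1}^{K} f_i (m_i - \mu)^2}{N}}$
When deviations are taken from an assumed origin in units of the class interval,
this becomes
$\sigma = C \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}$
Where C is the class interval; $f_i$ is the frequency of the ith class; $d_i$ is the
deviation of the ith mid-point from an assumed origin, divided by C; and N is the
total number of observations.
$\sigma = 10 \sqrt{\frac{231}{55} - \left(\frac{-45}{55}\right)^2}$
$\sigma = 10 \sqrt{4.2 - 0.669421}$
$\sigma = 10 \sqrt{3.530579}$
σ = 18.8 marks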
When it becomes clear that the actual mean would turn out to be a fraction,
calculating deviations from the mean would be too cumbersome. In such cases,
an assumed mean is used and the deviations from it are calculated. While the
midpoint of any class can be taken as an assumed mean, it is advisable to
choose the mid-point of the class that would make the calculations least
cumbersome. Guided by this consideration, in the above example, we have
decided to choose 55 as the assumed mean and, accordingly, deviations have been
taken from it. It will be seen from the calculations that they are considerably
simplified.
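The step-deviation calculation can be sketched in Python as below. The function name `step_deviation_sd` is illustrative; the assumed mean of 55 and class interval of 10 are those used in the example.

```python
import math

def step_deviation_sd(mid_points, freqs, assumed, width):
    """sigma = C * sqrt(sum(f d^2)/N - (sum(f d)/N)^2), with d = (m - assumed) / C."""
    n = sum(freqs)
    d = [(m - assumed) / width for m in mid_points]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di ** 2 for f, di in zip(freqs, d))
    return width * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)

mid_points = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
freqs = [1, 3, 6, 10, 12, 11, 6, 3, 2, 1]

sd = step_deviation_sd(mid_points, freqs, assumed=55, width=10)
print(round(sd, 2))
```

Choosing a different assumed mean changes the intermediate sums but not the final standard deviation.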
ITQ: Why is the standard deviation preferred to other measures of dispersion?
ITA: It is preferred because it has desirable mathematical properties.
Symbolically, Coefficient of Variation (COV) = $\frac{\sigma}{\mu} \times 100$
Example:
In a small business firm, two typists are employed- typist A and typist B. Typist
A types out, on an average, 30 pages per day with a standard deviation of 6.
Typist B, on an average, types out 45 pages with a standard deviation of 10.
Which typist shows greater consistency in his output?
Solution:
Coefficient of variation for A = $\frac{\sigma}{\mu} \times 100$
A = $\frac{6}{30} \times 100$
A = 20% and
Coefficient of variation for B = $\frac{\sigma}{\mu} \times 100$
B = $\frac{10}{45} \times 100$
B = 22.2%
These calculations clearly indicate that although typist B types out more pages,
there is a greater variation in his output as compared to that of typist A. We can
say this in a different way: though typist A's daily output is much less, he is
more consistent than typist B. The usefulness of the coefficient of variation
becomes clear in comparing two groups of data having different means, as has
been the case in the above example.
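The comparison of the two typists can be written as a small Python sketch. The function name `coefficient_of_variation` is illustrative.

```python
def coefficient_of_variation(sd, mean):
    """COV = (standard deviation / mean) x 100, expressed as a percentage."""
    return 100 * sd / mean

cov_a = coefficient_of_variation(sd=6, mean=30)
cov_b = coefficient_of_variation(sd=10, mean=45)

print(round(cov_a, 1), round(cov_b, 1))
more_consistent = "A" if cov_a < cov_b else "B"
print("More consistent typist:", more_consistent)
```

The smaller coefficient indicates the more consistent output, even though that typist has the lower mean.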
Summary
In this study session, we have discussed the meaning of variability, significance
and properties of variability as well as the measures of variability.
In the case of the Classical Approach, the probability is obtained using theory
rather than observation. Theoretical probability has several characteristics,
one of which is that it assumes symmetry of events. A second characteristic is
that it is based on abstract reasoning and does not depend on experiment. It is
sometimes called a priori probability.
The Subjective Approach refers to assigning probability numbers using
personal judgment. The subjective probability of an event A is a probability
that expresses the decision maker's personal belief that A will occur. According
to the subjectivist view, a probability expresses the strength of a person's
belief.
There exist various probability distributions that are very relevant in solving
probability problems, notable among which are the Uniform probability
distribution, Binomial distribution, Hyper-geometric distribution, Poisson
distribution and Normal distribution.
9.2.1 Objectives
After this contact module, you will be able to:
1. calculate the probability of certain events occurring, based on previous
occurrences;
2. understand the link between probability and statistical frequencies;
3. combine probabilities;
4. calculate expected values for certain occurrences;
5. understand the limitations of probability as a tool for decision making;
6. compute discrete and continuous probabilities;
7. recognise the situations where the use of the normal distribution is
appropriate; and
8. solve problems using the normal distribution table arising from various
situations.
9.2.2 STUDY SESSIONS
9.2.2.1 STUDY SESSION 5: Skewness and Kurtosis
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Meaning of Skewness
1. Symmetrical Distribution
2. Asymmetrical Distribution
(B) Tests of Skewness
(C) Measures of Skewness
(D) Meaning of Kurtosis
(E) Measures of Kurtosis
(F) Discussion Questions
Summary
Self-assessment questions and Answers
References/Further Readings
Introduction
You are welcome to study session five. In this study session, you will be able to
understand the meaning of skewness, kurtosis as well as their various measures.
In the above study sessions, we have discussed frequency distributions in detail.
It may be repeated here that frequency distributions differ in three ways:
Average value, Variability or Dispersion, and Shape. Since the first two, that
is, average value and variability or dispersion have already been discussed
previously, here our main spotlight will be on the shape of frequency
distribution. Generally, there are two comparable characteristics called
skewness and kurtosis that help us to understand a distribution. Two
distributions may have the same mean and standard deviation but may differ
widely in their overall appearance as can be seen from the diagram below:
In both these distributions the value of the mean and standard deviation is the
same ($\bar{x} = 15$, $\sigma = 5$). But it does not imply that the distributions are alike in nature.
The distribution on the left-hand side is a symmetrical one whereas the
distribution on the right-hand side is asymmetrical or skewed. Measures of
skewness help us to distinguish between different types of distribution.
Learning Outcomes
At the end of the study session, you should be able to:
1. Explain the meaning of skewness and its measures.
2. Explain the meaning of kurtosis and its measures.
ii. Negatively Skewed Distribution. The bottom diagram below is the shape of a
negatively skewed distribution. In a negatively skewed distribution the value of
the mode is the greatest and that of the mean the least, while the median lies
between the two. In the positively skewed distribution the frequencies are
spread out over a greater range of values on the high-value end of the curve
(the right-hand side) than they are on the low-value end. In the negatively
skewed distribution the position is reversed, i.e. the excess tail is on the
left-hand side. It should be noted that in moderately asymmetrical distributions
the interval between the mean and the median is approximately one-third of the
interval between the mean and the mode. It is this relationship that provides a
means of measuring the degree of skewness.
ITQ: What is skewness?
ITA: According to Morris Hamburg, “Skewness refers to the asymmetry or lack of
symmetry in the shape of a frequency distribution”.
There are various measures of skewness, each divided into absolute and relative
measures. The relative measure is known as the coefficient of skewness and is
more frequently used than the absolute measure of skewness. When a
comparison between two or more distributions is involved, it is the relative
measure of skewness that is used. The measures of skewness are: Karl
Pearson's measure, Bowley's measure, Kelly's measure and the measure based on
moments. Karl Pearson's measure is discussed briefly below:
Example:
The information below shows the extent of patronage by female students in
buying jeans trousers in a particular supermarket in Zaria City.
Solution:
First, calculate the mean, mode and standard deviation.
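Jeans Class mark (x) Frequency (f) fx
0 – 5 2.5 5 12.5
5 – 10 7.5 2 15
10 – 15 12.5 3 37.5
15 – 20 17.5 8 140
20 – 25 22.5 4 90
Total ∑f = 22 ∑fx = 295
Mean ($\bar{X}$) = $\frac{\sum fx}{\sum f} = \frac{295}{22} = 13.41$
Mode = $L_1 + \left(\frac{d_1}{d_1 + d_2}\right) C$
= $15 + \left(\frac{5}{5 + 4}\right) \times 5$
= $15 + \frac{5}{9} \times 5$
= $15 + \frac{25}{9}$
= 15 + 2.78
= 17.78
SD ($\sigma$) = $\sqrt{\frac{\sum fx^2}{\sum f} - (\bar{x})^2}$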
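Mean ($\bar{X}$) = 13.41
Jeans Class mark (X) Frequency (f) X − X̄ (X − X̄)² f(X − X̄)²
0 – 5 2.5 5 -10.91 119.03 595.14
5 – 10 7.5 2 -5.91 34.93 69.89
10 – 15 12.5 3 -0.91 0.83 2.48
15 – 20 17.5 8 4.09 16.73 133.83
20 – 25 22.5 4 9.09 82.63 330.51
Total ∑f = 22 ∑f(X − X̄)² = 1131.82
Equivalently, in terms of deviations from the mean:
$\sigma = \sqrt{\frac{\sum f (X - \bar{X})^2}{\sum f}}$
$\sigma = \sqrt{\frac{1131.82}{22}}$
$\sigma = \sqrt{51.45}$
σ = 7.17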
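Median = $L_m + \left(\frac{\frac{N}{2} - (\sum f)_p}{f_m}\right) C$
= $15 + \left(\frac{\frac{22}{2} - 10}{8}\right) \times 5$
= $15 + \left(\frac{11 - 10}{8}\right) \times 5$
= $15 + \frac{1}{8} \times 5$
= 15 + 0.625
= 15.63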
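The mean, mode, median and standard deviation found above are the inputs to Karl Pearson's coefficient of skewness. The formula itself does not appear in the text above, so the sketch below assumes the standard definitions: (Mean − Mode) / σ, and the alternative 3 (Mean − Median) / σ. The function names are illustrative.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Assumed standard form: Sk = (Mean - Mode) / standard deviation."""
    return (mean - mode) / sd

def pearson_skewness_median(mean, median, sd):
    """Assumed alternative form: Sk = 3 * (Mean - Median) / standard deviation."""
    return 3 * (mean - median) / sd

# Summary values from the jeans example
mean, mode, median, sd = 13.41, 17.78, 15.63, 7.17

sk_mode = pearson_skewness_mode(mean, mode, sd)
sk_median = pearson_skewness_median(mean, median, sd)
print(round(sk_mode, 2), round(sk_median, 2))
```

Both versions are negative here because the mean lies below the median and the mode, which is the pattern described for a negatively skewed distribution.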
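Quartile coefficient of skewness = $\frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$
Percentile coefficient of skewness = $\frac{(P_{90} - P_{50}) - (P_{50} - P_{10})}{P_{90} - P_{10}}$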
Calculate:
a) Quartile coefficient of skewness
b) Percentile coefficient of skewness
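Solution:
a) Calculate $Q_1$, $Q_2$ and $Q_3$ using
$Q_j = L_j + \left(\frac{\frac{jN}{4} - (\sum f)_p}{f_j}\right) C_j$
For $Q_1$: $\frac{N}{4} = \frac{41}{4} = 10.25$
$Q_1 = 4 + \left(\frac{\frac{41}{4} - 5}{6}\right) \times 2$
= $4 + \left(\frac{10.25 - 5}{6}\right) \times 2$
= $4 + \frac{5.25}{6} \times 2$
= $4 + \frac{10.5}{6}$
= 4 + 1.75
= 5.75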
For $Q_2$: $\frac{2N}{4} = \frac{2 \times 41}{4} = \frac{82}{4} = 20.5$
$Q_2 = 8 + \left(\frac{\frac{2 \times 41}{4} - 19}{10}\right) \times 2$
= $8 + \frac{1.5}{10} \times 2$
= $8 + \frac{3}{10}$
= 8 + 0.3
= 8.3
For $Q_3$: $\frac{3N}{4} = \frac{3 \times 41}{4} = 30.75$
$Q_3 = 10 + \left(\frac{\frac{3 \times 41}{4} - 29}{12}\right) \times 2$
= $10 + \frac{1.75}{12} \times 2$
= 10 + 0.29
= 10.29
Quartile coefficient of skewness = $\frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$
= $\frac{10.29 - 2(8.3) + 5.75}{10.29 - 5.75}$
= $\frac{-0.56}{4.54}$
= −0.1233
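The quartile coefficient can be recomputed from the figures quoted in the working (class boundaries 4, 8 and 10; preceding cumulative frequencies 5, 19 and 29; class frequencies 6, 10 and 12; class size 2). In this sketch the second quartile is evaluated directly: (1.5 / 10) × 2 = 0.3, so Q2 = 8.3, which agrees with the value obtained for P50. The variable and function names are illustrative.

```python
def quartile_skewness(q1, q2, q3):
    """Quartile coefficient of skewness = (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)

n = 41

# Q_j = L + ((j*N/4 - preceding cumulative frequency) / class frequency) * class size
q1 = 4 + (1 * n / 4 - 5) / 6 * 2
q2 = 8 + (2 * n / 4 - 19) / 10 * 2
q3 = 10 + (3 * n / 4 - 29) / 12 * 2

coefficient = quartile_skewness(round(q1, 2), round(q2, 2), round(q3, 2))
print(round(q1, 2), round(q2, 2), round(q3, 2), round(coefficient, 4))
```

A negative coefficient indicates that the median lies closer to the upper quartile than to the lower quartile.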
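For $P_{50}$: $\frac{50N}{100} = \frac{50 \times 41}{100} = 20.5$
$P_{50} = L_j + \left(\frac{\frac{50N}{100} - (\sum f)_p}{f_j}\right) C_j$
$P_{50} = 8 + \left(\frac{\frac{2050}{100} - 19}{10}\right) \times 2$
= $8 + \left(\frac{20.5 - 19}{10}\right) \times 2$
= $8 + \frac{1.5}{10} \times 2$
= $8 + \frac{3}{10}$
= 8 + 0.3
= 8.3
For $P_{90}$: $\frac{90N}{100} = \frac{90 \times 41}{100} = 36.9$
$P_{90} = L_j + \left(\frac{\frac{90N}{100} - (\sum f)_p}{f_j}\right) C_j$
$P_{90} = 10 + \left(\frac{\frac{3690}{100} - 29}{12}\right) \times 2$
= $10 + \left(\frac{36.9 - 29}{12}\right) \times 2$
= $10 + \frac{7.9}{12} \times 2$
= 10 + 1.32
= 11.32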
It will be seen from the above figure that the Mesokurtic curve is neither too
flat nor too peaked. In fact, this is the frequency curve of a normal
distribution. The Leptokurtic curve is more peaked than the normal curve. In
contrast, the Platykurtic curve is relatively flat. The coefficient of kurtosis
as given by Karl Pearson is $\beta_2 = \mu_4 / \mu_2^2$. In the case of a normal distribution, that is,
a Mesokurtic curve, the value of $\beta_2$ = 3. If $\beta_2$ turns out to be > 3, the curve is
called a Leptokurtic curve and is more peaked than the normal curve. Again,
when $\beta_2$ < 3, the curve is called a Platykurtic curve and is less peaked than the
normal curve. The measure of kurtosis is very helpful in the selection of an
appropriate average. For example, for a normal distribution, the mean is most
appropriate; for a leptokurtic distribution, the median is most appropriate; and
for a platykurtic distribution, the quartile range is most appropriate.
Quartile and Percentile: This is another measure of kurtosis, based on both
quartiles and percentiles, and is expressed as follows:
$K = \frac{Q}{P_{90} - P_{10}}$
Where: $Q = \frac{1}{2}(Q_3 - Q_1)$ (Semi-Interquartile range)
Example:
The following information in the data below relates to the length of time spent
by car owners in filling station during a particular month.
$Q = \frac{1}{2}(Q_3 - Q_1)$
$Q = \frac{1}{2}(10.33 - 5.67)$
$Q = \frac{1}{2}(4.66)$
Q = 2.33
$K = \frac{Q}{P_{90} - P_{10}} = \frac{2.33}{11.33 - 3.5}$
= $\frac{2.33}{7.83}$ = 0.30
Summary
In this study session, we have discussed the meaning of skewness, kurtosis as
well as their various measures in Business Statistics.
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006).Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.2.2.2 STUDY SESSION 6
Probability I
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Meaning of Probability
(B) How to Assign Probabilities to Events
(C) Computational Probability Rule
(D) The Bayes’ Theorem
Summary
Self-assessment questions and Answers
References/Further Readings
Introduction
In this study session, you will be able to understand the meaning of probability,
some basic concepts, Bayes’ theorem and many more.
Learning Outcomes
At the end of the study session, you should be able to:
1. Explain the meaning of probability.
2. Discuss some basic concepts.
3. Explain the Bayes’ theorem and many others.
A probability statement always involves two possibilities: an event either
occurs or does not occur, for example a win or a loss. The most commonly used
words for probability include the following: chance, likely, probably or possibly.
Examples of probability statements are:
a) The patient will either survive or die
b) The business ventures will either make a profit or a loss
c) The man will either win or lose the game
d) There is a probability of success or a failure in a business
Example:
A student tosses a coin once. What is the probability that head will appear.
Solution:
P (head) = ½
Example:
A student throws a die once. What is the probability that the die face with
number 3 will appear.
Solution:
P (face with number 3) = 1/6
Example:
A carton of T–shirts contained 24 shirts in three different colours, 12 white, 8
yellow and 4 brown.
What is the probability that a customer willing to buy a T-shirt, selecting one
at random, picks either a yellow or a brown shirt?
Solution:
P (yellow) = Number of yellow shirts / Total number of shirts
= 8/24 = 1/3
P (brown) = 4/24 = 1/6
The probability of not selecting brown = 1 − 4/24 = 20/24 = 5/6
A shirt cannot be both yellow and brown, so the two events are mutually
exclusive and their probabilities are added:
P (selecting either yellow or brown) = P(Y) + P(B)
= 8/24 + 4/24
= 12/24
= 1/2
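Exact fractions avoid rounding when probabilities are added. The following Python sketch uses the standard `fractions.Fraction` type; the variable names are illustrative.

```python
from fractions import Fraction

# Contents of the carton: colour -> number of shirts
shirts = {"white": 12, "yellow": 8, "brown": 4}
total = sum(shirts.values())

# Classical probability: favourable outcomes / total outcomes
p = {colour: Fraction(count, total) for colour, count in shirts.items()}

p_yellow_or_brown = p["yellow"] + p["brown"]  # addition rule for mutually exclusive events
p_not_brown = 1 - p["brown"]                  # complement rule

print(p["yellow"], p["brown"], p_yellow_or_brown, p_not_brown)
```

The probabilities of the three colours sum to 1, since every shirt in the carton has exactly one colour.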
Example:
Toss 2 fair coins. What is the sample space?
Solution:
S is described as:
S = {HH, HT, TH, TT}
Example:
A calculator manufacturing company manufactures two types of calculators. In
its advertisements, the company takes pride in its products, emphasising the
durability and reliability of the calculators. The calculators are labelled
calculator X and calculator Y. The probability that calculator X will last for
30 years is ¾ and the probability that calculator Y will last for 30 years is
2/3.
Find the probability that:
a) Both calculators will last for 30 years;
b) Only calculator X will last for 30 years;
c) At least one calculator will last for 30 years
Solution:
The two calculators are assumed to last independently of each other, so the
probabilities are multiplied.
a) P(both calculators) = P(XY) = P(X) x P(Y)
= ¾ x 2/3 = 6/12 = ½
b) P(only X) = P(XY’) = P(X) x P(Y’)
P(X) = ¾
P(Y’) = 1 – 2/3 = 1/3
P(XY’) = ¾ x 1/3 = 3/12 = ¼
c) P(at least one calculator will last for 30 years)
= P(XY’) + P(X’Y) + P(XY)
= P(X) x P(Y’) + P(X’) x P(Y) + P(X) x P(Y)
= ¾ x 1/3 + ¼ x 2/3 + ¾ x 2/3
= 3/12 + 2/12 + 6/12
= 11/12
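The three answers can be verified with exact fractions in Python. The sketch assumes, as the solution does when it multiplies probabilities, that the two calculators last independently of each other; the variable names are illustrative.

```python
from fractions import Fraction

p_x = Fraction(3, 4)  # P(calculator X lasts 30 years)
p_y = Fraction(2, 3)  # P(calculator Y lasts 30 years)

p_both = p_x * p_y          # a) both last
p_only_x = p_x * (1 - p_y)  # b) X lasts and Y does not
p_only_y = (1 - p_x) * p_y  # Y lasts and X does not

# c) at least one lasts: sum of the three mutually exclusive cases
p_at_least_one = p_only_x + p_only_y + p_both

# Cross-check using the complement: 1 - P(neither lasts)
p_complement = 1 - (1 - p_x) * (1 - p_y)

print(p_both, p_only_x, p_at_least_one, p_complement)
```

The complement route, 1 − P(neither lasts), is usually the quicker way to answer an "at least one" question.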
Similar to the situation of the seller of winter garments, situations exist
where we are interested in an event on an ongoing basis. Every time some new
information is available, we revise our odds mentally. This revision of
probability with added information is formalised in probability theory with the
help of the famous Bayes’ Theorem. The theorem, discovered in 1761 by the
English clergyman Thomas Bayes, has had a profound impact on the
development of statistics and is responsible for the emergence of a new
philosophy of science. Bayes himself is said to have been unsure of his
extraordinary result, which was presented to the Royal Society by a friend in
1763, after Bayes’ death. We will first consider the Law of Total Probability,
which is helpful for the derivation of Bayes’ Theorem.
Summary
In this study session, we have discussed the meaning of probability, some basic
concepts and Bayes’ theorem.
(E) Discussion Questions
1. Explain with examples the meaning of the following:
a. Independent events
b. Mutually exclusive events
c. Conditional probability
d. Expected value
2. What is the difference between mathematical and statistical probabilities?
3. State probability axioms and the multiplication theorem of probability.
4. Find the probability that:
• both machines will be operating in two years’ time.
• neither machine will be operating in two years’ time.
• at least one machine will be operating in two years’ time.
• Find the probability that:
• the ball is yellow, given that it is striped.
• the ball is striped, given that it is red.
• the ball is blue, given that it is solid-coloured.
5. Compute the Expected Value and Variance of the problem.
6. Explain Bayes’ Theorem.
7. Define discrete random variable and discrete probability distribution.
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.2.2.3 STUDY SESSION 7
Probability II
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Discrete Probability Distribution
(B) Bernoulli Random Variable
(C) The Binomial Distribution
(D) The Poisson Distribution
(E) Hyper-geometric Distribution
Summary
Self-assessment questions and Answers
References/Further Readings
Introduction
You are welcome to this study session. In this session, you will understand the
meaning of Discrete Probability Distribution, Bernoulli Random Variable,
Binomial Distribution, Poisson Distribution as well as Hyper-Geometric
Distribution.
Learning Outcomes
At the end of this study session, you should be able to:
1. Know the meaning of Discrete Probability Distribution.
2. Know the concepts of Bernoulli Random Variable, Binomial
Distribution, Poisson Distribution and many others.
(A) Discrete Probability Distribution
In many situations, our interest does not lie in the outcomes of an experiment as
such; we may find it more useful to describe a particular property or attribute of
the outcomes of an experiment in numerical terms. For example, out of three
births, our interest may be in the probabilities of the number of
boys. Consider the sample space of 8 equally likely sample points.
Now look at the variable “the number of boys out of three births”. This
number varies among sample points in the sample space and can take the values 0,
1, 2, 3, and it is random, that is, given to chance.
A random variable is continuous if it can take on any value in an interval of
numbers (i.e. its possible values are uncountably infinite). Examples are
measured data on heights, weights, temperature, time and so on.
The correspondence between sample points and the value of the random
variable allows us to determine the probability distribution of X as follows:
These conditions must hold because the P(X = x) values are probabilities. The first
condition specifies that all probabilities must be greater than or equal to zero, as
we know from Study Session 6.
For the second condition, we note that for each value x, P(x) = P(X = x) is the
probability of the event that the random variable equals x. Since by definition all
x means all the values the random variable X may take, and since X may take
on only one value at a time, the occurrences of these values are mutually
exclusive events, and one of them must take place. Therefore, the sum of all the
probabilities P(X = x) must be 1.00.
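The distribution of “the number of boys out of three births” and the two conditions can be checked by listing the sample space. This is a minimal sketch, assuming the 8 sample points are equally likely as stated above; the labels B and G are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for three births: B = boy, G = girl (8 equally likely points).
sample_space = list(product("BG", repeat=3))

# X = number of boys in each sample point.
counts = Counter(point.count("B") for point in sample_space)

# P(X = x) = (number of sample points with x boys) / 8
distribution = {
    x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())
}

for x, p in distribution.items():
    print(f"P(X = {x}) = {p}")

# The two conditions: no negative probabilities, and a total of exactly 1.
all_non_negative = all(p >= 0 for p in distribution.values())
total = sum(distribution.values())
print(all_non_negative, total)
```

The output gives P(X = 0) = 1/8, P(X = 1) = 3/8, P(X = 2) = 3/8 and P(X = 3) = 1/8, which sum to 1.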
Let us first study the Bernoulli random variable, named in honour of the
mathematician Jakob Bernoulli (1654-1705). It is the building block for other
random variables and the resulting distributions we will study here and in other
study sessions.
Just after the operator produces one pin, it is inspected; let X denote the
“number of good pins produced”, i.e. “the number of successes”.
Now analysing the trial “inspecting a pin” and our random variable X, the
“number of successes”, we note two important points:
1. The trial “inspecting a pin” has only two possible outcomes, which are
mutually exclusive. Such a trial, whose outcome can only be either a success or
a failure, is a Bernoulli trial. In other words, the sample space of a Bernoulli
trial is
S = {success, failure}
2. The random variable X, which measures the number of successes in one Bernoulli
trial, is a Bernoulli random variable. Clearly, X is 1 if the pin is good and 0 if
it is defective.
the third, fifth, sixth and ninth are good pins, or successes. The rest are failures.
In practice, we are usually interested in the total number of good pins rather
than the sequence of 1’s and 0’s. In the example above, four out of nine are
good. In the general case, let X denote the total number of good pins produced
in n trials. We then have
$X = X_1 + X_2 + \dots + X_n$, where all $X_i \sim \mathrm{BER}(p)$ and are independent.
The random variable that counts the number of successes in many
independent, identical Bernoulli trials is called a Binomial Random
Variable.
1. There are only two mutually exclusive and collectively exhaustive outcomes
in the experiment i.e. S= {success, failure}
2. In repeated trials of the experiment, the probabilities of occurrence of these
events remain constant
3. The outcomes of the trials are independent of one another.
The probability distribution of a Binomial Random Variable is called the
“Binomial Distribution”.
The binomial distribution applies to repeated trials of an experiment that
satisfy the conditions above.
Example:
A supermarket ordered a certain product from the manufacturer. For the
goods to be accepted by the supermarket, a team of senior managers has to
inspect the product, carton by carton. The managers finally arrived at a
decision to take a random sample of seven items from each carton to see
whether they are good or defective. If there is one or fewer defective items in the
sample of seven items from a carton, the carton is accepted; otherwise it is
rejected. If 1% of the items are defective, what is the probability that a carton
is accepted?
Solution:
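Probability that an item is defective, p = 1% = 0.01
Probability that an item is not defective (good), 1 − p = 0.99
Sample size, n = 7
Decision Rule:
Accept the carton when x = 0 or x = 1, where x is the number of defective items in the sample.
The binomial probability of x defective items is
$$P(X = x) = {}^{n}C_{x}\, p^{x} (1 - p)^{n - x}$$
so that
$$P(\text{accepting carton}) = \sum_{x=0}^{1} {}^{7}C_{x}\, (0.01)^{x} (0.99)^{7 - x}$$
P(no defective) is given as:
$$P(X = 0) = {}^{7}C_{0}\, (0.01)^{0} (0.99)^{7} = 1 \times 1 \times 0.9321 = 0.9321$$
P(one defective) is given as:
$$P(X = 1) = {}^{7}C_{1}\, (0.01)^{1} (0.99)^{6} = 7 \times 0.01 \times 0.9415 = 0.0659$$
Therefore,
P(accepting carton) = 0.9321 + 0.0659 = 0.9980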
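The acceptance probability can be reproduced numerically. This is a minimal sketch using the binomial rule with the example's values (n = 7, p = 0.01); the function name `binomial_pmf` is illustrative, not part of the course text.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials, each with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n = 7       # items sampled from each carton
p = 0.01    # probability that an item is defective

p_zero = binomial_pmf(0, n, p)   # no defective item in the sample
p_one = binomial_pmf(1, n, p)    # exactly one defective item
p_accept = p_zero + p_one        # decision rule: accept when x = 0 or x = 1

print(round(p_zero, 4), round(p_one, 4), round(p_accept, 4))
```

Under these assumptions a carton is accepted with a probability of about 0.998, the sum of the two terms worked out by hand.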
We can develop the Poisson probability rule from the Binomial probability rule
under the above conditions.
Example:
A labour expert on the phenomenon of strikes conducted a research. The
researcher found out that the mean number of strikes in a manufacturing
industry was 3.4 per month.
What is the probability that, during a given month, there will be:
a. No strike at all
b. More than 2 strikes
c. Exactly 4 strikes
Solution:
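The Poisson probability rule is
$$f(x) = \frac{\mu^{x} e^{-\mu}}{x!}$$
with µ = 3.4.
a. P(no strikes):
$$P(x = 0) = \frac{3.4^{0} e^{-3.4}}{0!} = e^{-3.4} = 0.0334$$
b. P(more than 2 strikes) = P(x > 2)
Since the sum of all the probabilities is equal to one,
$$P(x > 2) = 1 - P(x \le 2)$$
$$P(x \le 2) = P(x = 0) + P(x = 1) + P(x = 2)$$
$$P(x = 0) = \frac{3.4^{0} e^{-3.4}}{0!} = e^{-3.4} = 0.0334$$
$$P(x = 1) = \frac{3.4^{1} e^{-3.4}}{1!} = 3.4 e^{-3.4} = 3.4 \times 0.0334 = 0.1136$$
$$P(x = 2) = \frac{3.4^{2} e^{-3.4}}{2!} = \frac{11.56 e^{-3.4}}{2!} = 0.1931$$
so that
$$P(x > 2) = 1 - (0.0334 + 0.1136 + 0.1931) = 1 - 0.3401 = 0.6599$$
c. P(exactly 4 strikes):
$$P(x = 4) = \frac{3.4^{4} e^{-3.4}}{4!} = \frac{133.63 e^{-3.4}}{4 \times 3 \times 2} = \frac{133.63 \times 0.0334}{24} = \frac{4.4633}{24} \approx 0.186$$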
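The three Poisson probabilities can be checked with a short script. This is a minimal sketch, assuming the mean of 3.4 strikes per month from the example; the function name `poisson_pmf` is illustrative. Because it uses the unrounded value of e^(−3.4) rather than 0.0334, its results can differ from the hand calculation in the third or fourth decimal place.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return mu**x * exp(-mu) / factorial(x)

mu = 3.4  # mean number of strikes per month

p_none = poisson_pmf(0, mu)                                       # a. no strike
p_more_than_two = 1 - sum(poisson_pmf(x, mu) for x in range(3))   # b. x > 2
p_exactly_four = poisson_pmf(4, mu)                               # c. x = 4

print(round(p_none, 4), round(p_more_than_two, 4), round(p_exactly_four, 4))
```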
In this example, k = 4 because there are four aces in the deck, x = 2 because the
problem asks about the probability of getting two aces, N = 52 because there are
52 cards in a deck, and n = 3 because 3 cards were sampled. Therefore:
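The calculation these values lead to can be sketched as follows, assuming the standard hypergeometric rule P(X = x) = C(k, x) · C(N − k, n − x) / C(N, n); the function name `hypergeom_pmf` is illustrative.

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """P(X = x): x successes in n draws without replacement
    from a population of N items that contains k successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Values from the card example: 4 aces (k) in a 52-card deck (N),
# 3 cards drawn (n), and exactly 2 aces wanted (x).
p_two_aces = hypergeom_pmf(x=2, N=52, n=3, k=4)

print(round(p_two_aces, 4))
```

With these inputs the numerator is 6 × 48 = 288 and the denominator is 22,100, a probability of roughly 0.013.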
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.2.2.4 STUDY SESSION 8
Probability III
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Continuous Probability Distribution
(B) The Normal Distribution
(C) The Standard Normal Distribution
(D) The Transformation of Normal Random Variables
Summary
Self-assessment questions and Answers
References/Further Readings
Introduction
You are welcome to this study session. In this session you will get to understand
the meaning of Continuous Probability Distribution, Normal Distribution,
Standard Normal Distribution as well as the Standard Normal Variable
Transformation.
Learning Outcomes
At the end of this study session, you should be able to:
1. Understand the meaning of Continuous Probability Distribution.
2. Explain the Normal Distribution, Standard Normal Distribution as
well as the Standard Normal Variable Transformation.
In the first case, the Binomial random variable $X_1$ could take only a finite number of
integer values: 0, 1, 2, …, n; whereas in the second case, the Poisson random variable
$X_2$ could take an infinite number of integer values: 0, 1, 2, 3, … The random
variables $X_1$ and $X_2$ are discrete, in the sense that their values could be listed in a
sequence, finite or infinite. In contrast to these, let us consider a situation, where
the variable of interest may take any value within a given range. Suppose we are
planning to measure the variability of an automatic bottling process that fills
½ litre (500 cm³) bottles with cola. The variable, say X, indicating the deviation
of the actual volume from the normal (average) volume can take any real value,
positive or negative, integer or decimal. This type of random variable, which can
take an infinite number of values in a given range, is called a continuous
random variable, and the probability distribution of such a variable is called a
continuous probability distribution. The concepts and assumptions inherent in
the treatment of such distributions are quite different from those used in the
context of a discrete distribution. In this study session, after understanding the
basic concepts of continuous distributions, we will discuss Normal distribution -
an important continuous distribution that is applicable to many real-life
processes.
3. The total area under the entire curve of f(x) is equal to 1.00:
$$P(-\infty \le X \le \infty) = \int_{-\infty}^{\infty} f(x)\,dx = 1.00$$
When the sample space is continuous, the probability of any single given value
is zero. For a continuous random variable, therefore, the probability of
occurrence of any given value is zero. We see this from property 2, noting that
the area under a curve between a point and itself is the area of a line, which is
zero. For a continuous random variable, non-zero probabilities are
associated only with intervals of numbers.
ITQ: The type of random variable which can take an infinite number of values in a
given range, is called ………
ITA: a continuous random variable
The Normal Distribution is the most versatile of all the continuous probability
distributions. It is widely used in data-based research in the fields of
agriculture, trade, business and industry. It is found to be useful in
characterizing uncertainties in many real-life processes, in statistical inferences,
and in approximating other probability distributions.
In many real life situations, we face the problem of making statistical inferences
about processes based on limited data. Limited data is basically a sample from
the full body of data on the process. Irrespective of how the full body of data is
distributed, it has been found that the Normal Distribution can be used to
characterize the sampling distribution of many of the sample statistics. (We will
see it in next few study sessions). This helps considerably in Statistical
Inferences.
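The normal probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$$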
A normal distribution has two parameters: the mean and the standard deviation,
denoted by μ and σ respectively. The distribution is symmetrical about the
mean, and the total area under its curve is 1. Since the distribution is
symmetrical, the area under each half of the curve is 0.5, whatever the units
in which the variable is measured (e.g. cm, kg, litres, etc.).
ITQ: Which distribution is the most versatile of all the continuous probability
distributions?
ITA: The Normal Distribution, which was developed by Abraham De Moivre
(1667-1754).
Example:
The attendance at Ababa’s wedding is normally distributed with a mean of four
hundred and a standard deviation of one hundred participants. What is the
probability that the number of participants is:
a. Between 250 and 500
b. Less than 250
c. Between 500 and 600
d. More than 600.
Solution:
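μ = 400
σ = 100
Each value of x is converted to a standard normal value using
$$Z = \frac{x - \mu}{\sigma}$$
a. Between 250 and 500
Where x = 250:
$$Z = \frac{250 - 400}{100} = -1.5$$
P(−1.5 ≤ Z ≤ 0) = 0.4332
Where x = 500:
$$Z = \frac{500 - 400}{100} = 1$$
P(0 ≤ Z ≤ 1) = 0.3413
P(250 ≤ X ≤ 500) = 0.4332 + 0.3413
= 0.7745
The probability that the participants are between 250 and 500 is 0.7745.
b. Less than 250
Where x = 250:
$$Z = \frac{250 - 400}{100} = -1.5$$
P(−1.5 ≤ Z ≤ 0) = 0.4332
P(Z < −1.5) = 0.5 − 0.4332
= 0.0668
The probability that the participants are less than 250 is 0.0668.
c. Between 500 and 600
Where x = 500, Z = 1 and P(0 ≤ Z ≤ 1) = 0.3413 (from part a).
Where x = 600:
$$Z = \frac{600 - 400}{100} = 2$$
P(0 ≤ Z ≤ 2) = 0.4772
P(500 ≤ X ≤ 600) = 0.4772 − 0.3413
= 0.1359
The probability that the participants are between 500 and 600 is 0.1359.
d. More than 600
Where x = 600:
$$Z = \frac{600 - 400}{100} = 2$$
P(0 ≤ Z ≤ 2) = 0.4772
P(Z > 2) = 0.5 − 0.4772
= 0.0228
The probability that the participants are more than 600 is 0.0228.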
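The four answers can be checked without a table of areas. This is a minimal sketch, assuming the cumulative normal probability is computed from the error function in Python's standard `math` module; the helper name `normal_cdf` is illustrative.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal random variable with mean mu and sd sigma."""
    z = (x - mu) / sigma              # standardise first
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 400, 100  # attendance: mean 400, standard deviation 100

p_a = normal_cdf(500, mu, sigma) - normal_cdf(250, mu, sigma)  # 250 to 500
p_b = normal_cdf(250, mu, sigma)                               # less than 250
p_c = normal_cdf(600, mu, sigma) - normal_cdf(500, mu, sigma)  # 500 to 600
p_d = 1 - normal_cdf(600, mu, sigma)                           # more than 600

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4))
```

The four intervals in (b), (a), (c) and (d) together cover every value except those between 250 and 500 counted twice, so a quick sanity check is that p_b + p_a + p_c + p_d equals 1.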
We say:
$$Z \sim N(0, 1^2)$$
The transformation $Z = \frac{X - \mu}{\sigma}$ takes us from a random variable X with mean μ
and standard deviation σ to the standard normal random variable. We also have
an opposite, or inverse, transformation, which takes us from the standard
normal random variable Z to the random variable X with mean μ and standard
deviation σ. The inverse transformation is given as
$$X = \mu + Z\sigma$$
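The two transformations undo each other, which a short round trip makes concrete. This is a minimal sketch; the function names are illustrative, and the values μ = 400, σ = 100 and x = 250 are borrowed from the wedding-attendance example.

```python
def to_standard(x, mu, sigma):
    """Transformation: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def from_standard(z, mu, sigma):
    """Inverse transformation: X = mu + Z * sigma."""
    return mu + z * sigma

mu, sigma = 400, 100

z = to_standard(250, mu, sigma)          # 250 participants as a Z value
x_back = from_standard(z, mu, sigma)     # back to the original scale

print(z, x_back)
```

The inverse transformation is what is needed when a question starts from a probability (and hence a Z value) and asks for the corresponding value of X.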
Summary
In this study session, we have discussed the meaning of Continuous Probability
Distribution, Normal Distribution, Standard Normal Distribution as well as the
Standard Normal Variable Transformation.
(E) Discussion Questions
1. What is the probability that a baby born in the hospital will weigh more than
8kg? Less than 7kg?
2. Why is there a need to standardise a variable by means of transformation?
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.3 MODULE 3
Sampling and Sampling Methods
Introduction
A sample is a set of measurements taken from a process or series of experiments.
It must be regarded as having been drawn from a large population of
measurements covering a large number of observations. If there is a limit to the
number of possible observations, we have a finite population. An unlimited
number of observations constitutes an infinite population.
A sample provides a small amount of information about the population and yet,
for various reasons, we rely on the information available from the sample. There
are two principal methods of drawing a sample from a population. These are
probability samples and non-probability samples. In probability samples, each
observation in the population has an equal chance of being selected to become a
part of the sample. In the case of non-probability samples, there is no way of
estimating the probability that each individual will be included in the sample.
Introduction
You are welcome to another study session. In this study session you will
understand the meaning of sampling, the concept of probability & non-
probability samples, determination of sample size and many more.
Learning Outcomes
In this study session, you should be able to:
1. Discuss the meaning of sampling.
2. Explain the concept of probability & non-probability samples,
determination of sample size and many more.
In a survey of the entire population, data is collected from every elementary unit
of the population. Suppose one is studying the wage structure of the coal
mining industry in the country; then one approach is to collect the data on
wages of every worker in the coal industry. From this data, one can calculate the
various characteristics of the population, such as the average wage, the range and
the variance. This is referred to as a census survey.
Although there are many advantages with the census method, the cost, effort
and time required to conduct a census survey are very large unless the
population is very small, and in many cases they are so prohibitive that one rarely
uses this method in surveys.
As sampling involves less time and money, it is possible to give
attention to different characteristics of the elementary units. With the same
money and time, a sample can support a detailed study of a smaller number of
units. The process of sampling involves selecting a sample, collecting all
relevant information, and finally drawing conclusions about the population from
which the sample has been drawn.
There are, however, some difficulties in these procedures. For, if N is large, the
task becomes physically difficult. So it is desirable to use better methods for
ensuring randomness. One such method is the use of random number tables.
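On a computer, a pseudo-random number generator plays the role of the random number table. The following is a minimal sketch, assuming a hypothetical population of N = 1000 units numbered 1 to N and a sample of n = 10; these figures and the fixed seed are illustrative choices, not values from the text.

```python
import random

N = 1000   # hypothetical population size (units numbered 1..N)
n = 10     # hypothetical sample size

# A fixed seed is used only so that the sketch gives the same sample each run.
rng = random.Random(42)

# Draw n distinct unit numbers; every unit has the same chance of selection.
population = range(1, N + 1)
sample = sorted(rng.sample(population, n))

print(sample)
```

`random.sample` draws without replacement, so no unit can appear twice, which matches the usual practice of skipping repeated numbers when reading a random number table.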
Thus, the elements to be drawn from each stratum would be 100, 150, 200 and
50 respectively. Proportional stratification yields a sample that represents the
population with respect to the proportion in each stratum in the population.
Proportional stratified sampling yields satisfactory results if the dispersion in
the various strata is of proportionately the same magnitude. If there is a
significant difference in dispersion from stratum to stratum, sample estimates
will be much more efficient if non-proportional stratified random sampling is
used. Here, equal numbers of elements are selected from each stratum
regardless of how the stratum is represented in the population. Thus, in the
earlier example, an equal number, i.e., 125, of elementary units will be drawn to
constitute the sample.
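The two allocation rules can be compared side by side. This is a minimal sketch: the stratum sizes below are hypothetical, chosen only so that a sample of 500 reproduces the 100, 150, 200 and 50 quoted above.

```python
# Hypothetical stratum sizes (assumption made for illustration only).
stratum_sizes = [2000, 3000, 4000, 1000]
sample_size = 500

population_size = sum(stratum_sizes)

# Proportional allocation: each stratum contributes in proportion to its size.
proportional = [sample_size * size // population_size for size in stratum_sizes]

# Non-proportional (equal) allocation: the same number from every stratum.
equal = [sample_size // len(stratum_sizes)] * len(stratum_sizes)

print(proportional)
print(equal)
```

Both allocations add up to the same total of 500; they differ only in how that total is shared among the strata.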
Suppose we want to take a sample of 5,000 households from Kaduna State.
At the first stage, the state may be divided into a number of districts and a few
districts are selected at random. At the second stage, each district may be sub-
divided into a number of villages and a sample of villages may be taken at
random. At the third stage, a number of households may be selected from each
of the villages selected at the second stage. To take another example, suppose that in a
particular survey we wish to take a sample of 10,000 students from Ahmadu
Bello University, Zaria, Nigeria. We may take faculties at the first stage, then
draw departments at the second stage, and choose students at the third and last
stage.
Symbolically:
$$m = \frac{N}{n}$$
where N is the population size and n is the sample size. While calculating the
value of m, we may get a fractional value. In such cases, it is rounded off to the
nearest whole number.
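The interval m = N/n can be turned into a selection rule as follows. This is a minimal sketch: the population size, the sample size and the use of a random starting unit between 1 and m are assumptions made for illustration and are not stated in the text above.

```python
import random

N = 1000   # hypothetical population size (units numbered 1..N)
n = 50     # hypothetical sample size

m = round(N / n)   # sampling interval, rounded to the nearest whole number

# Assumed procedure: pick a random start in the first interval,
# then take every m-th unit after it.
rng = random.Random(7)          # fixed seed only for repeatability
start = rng.randint(1, m)
sample = list(range(start, N + 1, m))[:n]

print(m, start, sample[:5])
```

Once the start is fixed, the whole sample is determined, which is why this method needs only one random number.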
Quota sampling has the advantage that the sample will conform to the
selected parameters of the population. The cost and time involved in getting
information from the sample will be relatively low for a quota sample, but there
are many weaknesses too. It is difficult to validate the
information gathered on the elementary units. It may also be difficult to specify
the characteristics of the population, and therefore hard to identify it.
Even when the sample does conform to the characteristics used in the
quotas, it may be distorted on other factors of importance in the study.
Therefore, one has to make a compromise between obtaining data with greater
precision and with that of lower cost of data collection. Several factors need to
be considered before determining the sample size.
The first and the foremost is the size of the error that would be tolerable for the
purposes of decision-making. The second consideration would be the degree of
confidence with the results of the study, i.e., if one wants to be 100 per cent
confident of the results, the entire population must be studied. However, this is
generally too impractical and costly. Therefore, one must accept something less
than 100 per cent confidence. In practice, the confidence levels most often used
are 99 per cent, 95 per cent and 90 per cent. The most commonly used confidence
level is 95 per cent. This means that there is a 5 per cent risk that the true
population parameter is outside the range of possible error specified by the
confidence interval. This 5 per cent risk appears to be acceptable in most
decisions. Thus, for the 95 per cent level of confidence, the Z value is 1.96. The Z value
can be obtained from normal probability distribution for a specified level of
confidence. For determining the sample size, we make use of the following
relationship:
$$\sigma_{\bar{x}} = \text{Standard error of the estimate} = \frac{\sigma}{\sqrt{n}}$$
$\sigma_{\bar{x}}$ can be calculated if we know the upper and lower confidence limits. Let
Example:
A state cooperative department is performing a survey to determine the annual
salary earned by the 3,000 managers in the cooperative sector within the
state. How large a sample should it take in order to estimate the mean
annual earnings within plus and minus #1,000 at the 95 per cent confidence
level? The standard deviation of annual earnings of the entire population is
known to be #3,000.
Solution:
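As the desired upper and lower limit is #1,000, i.e., we want to estimate the
annual earnings within plus and minus #1,000,
$$\therefore Z \sigma_{\bar{x}} = 1000$$
At the 95 per cent confidence level Z = 1.96, so
$$\sigma_{\bar{x}} = \frac{1000}{1.96} = 510.20$$
The standard error $\sigma_{\bar{x}}$ is given by $\frac{\sigma}{\sqrt{n}}$, where σ is the population standard
deviation.
$$\therefore \frac{\sigma}{\sqrt{n}} = 510.20$$
i.e.,
$$\frac{3000}{\sqrt{n}} = 510.20$$
i.e.,
$$\sqrt{n} = \frac{3000}{510.2} = 5.88$$
so that n = (5.88)² = 34.57, which is rounded up to a sample of 35 managers.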
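The same steps can be written as a short calculation. This is a minimal sketch using the example's figures (Z = 1.96, σ = 3,000, tolerable error 1,000); note that solving σ/√n = 510.20 for n means squaring the ratio 3000/510.2, and the result is rounded up so that the error does not exceed the tolerance.

```python
from math import ceil

z = 1.96        # Z value for the 95 per cent confidence level
sigma = 3000    # population standard deviation of annual earnings
error = 1000    # tolerable error (plus and minus)

standard_error = error / z            # required standard error of the mean
root_n = sigma / standard_error       # this is the square root of n
n_exact = root_n ** 2                 # solve sigma / sqrt(n) = standard_error
n = ceil(n_exact)                     # round up to a whole number of managers

print(round(standard_error, 2), round(root_n, 2), round(n_exact, 2), n)
```

This sketch ignores the fact that the population is finite (3,000 managers), as the worked solution does.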
Summary
In this study session, we have discussed the meaning of sampling, the concept
of probability & non-probability samples as well as determination of sample
size.
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and
Economics (10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson
Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.3.2.2 STUDY SESSION 10
Sampling II
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Sampling Distribution
(B) Why We Study Sampling Distributions?
(C) Sampling Distribution of the Mean
(D) The Central Limit Theorem
(E) Sampling Distribution of the Proportion
(F) Sampling Distribution of the Difference of Sample Means
(G) Sampling Distribution of the Difference of Sample Proportions
(H) Small Sampling Distributions
(I) Sampling Distribution of the Variance
(i) The Sample Variance
(ii) The Chi-Square Distribution
Summary
Self-assessment questions and Answers
References/Further Readings
Introduction
You are welcome to another study session. In this session, you will understand
the meaning of Sampling Distribution, Central Limit Theorem, Small Sampling
Distributions, Sampling Distribution of the Variance.
Learning Outcomes
In this study session, you will be able to:
1. Explain the meaning of Sampling Distribution.
2. Discuss the Central Limit Theorem, Small Sampling Distributions,
In reality, of course we do not have all possible samples and all possible values
of the statistic. We have only one sample and one value of the statistic. This
value is interpreted with respect to all other outcomes that might have
happened, as represented by the sampling distribution of the statistic. In this
lesson, we will refer to the sampling distributions of only the commonly used
sample statistics like sample mean, sample proportion, sample variance etc.,
which have a role in making inferences about the population.
Suppose we know that 40% of the population of all users of hair oil prefer our
brand to the next competing brand. A “new improved” version of our brand has
been developed and given to a random sample of 100 users for use. If 55 of
these prefer our “new improved” version to the next competing brand, what
should we conclude? For an answer, we would like to know the probability that
the sample proportion in a sample of size 100 is as large as 55% or higher when
the true population proportion is only 40%, i.e. assuming that the new version is
no better than the old. If this probability is quite large, say 0.5, we might
conclude that the high sample proportion viz. 55% is perhaps because of
sampling errors and the new version is not really superior to the old. On the
other hand, if this probability works out to a very small figure, say 0.001, then
rather than concluding that we have observed a rare event we might conclude
that the true population proportion is higher than 40%, i.e. the new version is
actually superior to the old one as perceived by members of the population. To
calculate this probability, we need to know the probability distribution of
sample proportion i.e. the sampling distribution of the proportion.
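The probability in question can be computed directly. This is a minimal sketch, assuming the number of users in the sample who prefer the brand follows a binomial distribution with n = 100 and p = 0.40, and summing the exact binomial tail rather than using a normal approximation (a choice made here for illustration).

```python
from math import comb

n = 100         # sample size
p = 0.40        # population proportion if the new version is no better
observed = 55   # users in the sample who prefer the new version

# P(X >= 55): add the binomial probabilities for x = 55, 56, ..., 100.
p_tail = sum(
    comb(n, x) * p**x * (1 - p) ** (n - x)
    for x in range(observed, n + 1)
)

print(p_tail)
```

Under these assumptions the probability comes out well below 0.01, much closer to the "very small figure" case than to 0.5, which supports the conclusion that the true proportion is higher than 40%.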
If we pick another sample of size n from the same population, we might end
up with a totally different set of sample values and so a different sample mean.
Therefore, there are many (perhaps infinitely many) possible values of the sample mean,
and the particular value that we obtain, if we pick only one sample, is
determined only by chance. In other words, the sample mean is a random
variable. The possible values of this random variable depend on the possible
values of the elements in the random sample from which the sample mean is
computed. The random sample, in turn, depends on the distribution of the
population from which it is drawn. As a random variable, $\bar{X}$ has a probability
distribution. This probability distribution is the sampling distribution of $\bar{X}$.
size n increases.
The central limit theorem is remarkable because it states that the distribution of
the sample mean $\bar{X}$ tends to a normal distribution regardless of the distribution
of the population from which the random sample is drawn. The theorem allows
us to make probability statements about the possible range of values the sample
mean may take. It allows us to compute probabilities of how far away $\bar{X}$ may be
from the population mean it estimates. We will extensively use the central limit
theorem in the next sessions about testing of hypotheses.
The central limit theorem says that, in the limit, as n goes to infinity ($n \to \infty$), the
distribution of $\bar{X}$ becomes a normal distribution (regardless of the distribution of
the population). The rate at which the distribution approaches a normal
distribution does depend, however, on the shape of the distribution of the parent
population.
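A small simulation illustrates the theorem. This is a minimal sketch under stated assumptions: the parent population is exponential with mean 1 and standard deviation 1 (a strongly skewed, non-normal choice made only for illustration), and the sample size, number of samples and seed are arbitrary.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)   # fixed seed so the sketch is repeatable

n = 50        # size of each random sample (hypothetical)
reps = 2000   # number of samples drawn (hypothetical)

# Draw many samples from the skewed parent population and keep each mean.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(reps)
]

centre = mean(sample_means)    # should be close to the population mean, 1
spread = stdev(sample_means)   # should be close to sigma / sqrt(n)

print(round(centre, 3), round(spread, 3), round(1 / n ** 0.5, 3))
```

The sample means cluster around the population mean with a spread close to σ/√n, and a histogram of `sample_means` would look roughly bell-shaped even though the parent population is not.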
$$\hat{p} = \frac{x}{n}$$
X is a binomial random variable; the possible values of this random variable
depend on the composition of the random sample from which $\hat{p}$ is computed.
The probability of x successes in the sample of size n is given by a binomial
probability distribution, viz.
$$P(x) = {}^{n}C_{x}\, p^{x} q^{n-x}$$
Since $\hat{p} = \frac{x}{n}$ and n is fixed (determined before the sampling) the distribution of
The expected value and the variance of x, i.e. the number of successes in a sample
of size n, are known to be:
E(x) = np
Var(x) = npq
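Because the sample proportion is the number of successes x divided by the fixed sample size n, its mean and variance follow from the two results above. The derivation below is a sketch that writes the sample proportion as $\hat{p} = x/n$ (the hat is a notational assumption) and uses only the standard rules E(aX) = aE(X) and Var(aX) = a²Var(X).

```latex
\begin{aligned}
E(\hat{p}) &= E\!\left(\frac{x}{n}\right) = \frac{1}{n}\,E(x) = \frac{np}{n} = p \\[4pt]
\operatorname{Var}(\hat{p}) &= \operatorname{Var}\!\left(\frac{x}{n}\right)
  = \frac{1}{n^{2}}\,\operatorname{Var}(x) = \frac{npq}{n^{2}} = \frac{pq}{n} \\[4pt]
\sigma_{\hat{p}} &= \sqrt{\frac{pq}{n}}
\end{aligned}
```

For instance, with p = 0.40, q = 0.60 and n = 100, the standard error of the sample proportion is $\sqrt{0.24/100} \approx 0.049$.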
Let us consider independent random sampling from the populations so that the
sample sizes need not be same for both populations.
Thus, the basic difference which the sample size makes is that while the
sampling distributions based on large samples are approximately normal and the
sample variance S² is an unbiased estimator of σ², the same does not occur when
the sample is small.
It may be appreciated that the small sampling distributions are also known as
exact sampling distributions, as the statistical inferences based on them are not
subject to approximation.
The small sampling distributions are defined in terms of the concept of degrees
of freedom (df). The concept of degrees of freedom (df) is important for many
statistical calculations and probability distributions. We may define df
associated with a sample statistic as the number of observations contained in
a set of sample data which can be freely chosen. It refers to the number of
independent variables which vary freely without being influenced by the
restrictions imposed by the sample statistic(s) to be computed.
For example, when the population variance σ² is not known, it is to be estimated
by a particular value of its estimator S², the sample variance. The number of
observations in the sample being n, df = n − m = n − 1 because σ² is the only
parameter (i.e. m = 1) to be estimated by the sample variance.
Then E(S²) = σ²
Properties of χ² Distribution
1. A χ² distribution is completely defined by the number of degrees of
freedom, df = n. So there are many χ² distributions, each with its own df.
2. χ² is a sample statistic having no corresponding parameter, which makes
the χ² distribution a non-parametric distribution.
3. As a sum of squares, the χ² random variable cannot be negative and is,
therefore, bounded on the left by zero.
4. The mean of a χ² distribution is equal to the degrees of freedom df. The
variance of the distribution is equal to twice the number of degrees of
freedom df.
$$E(\chi^2) = n, \qquad \mathrm{Var}(\chi^2) = 2n$$
5. Unless the df is large, a χ² distribution is skewed to the right. As df
increases, the χ² distribution looks more and more like a normal distribution. Thus, for
large df,
$$\chi^2 \sim N\left(n, \left(\sqrt{2n}\right)^2\right)$$
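Properties 3 and 4 can be checked by simulation. This is a minimal sketch, assuming the usual construction of a χ² variable with n degrees of freedom as the sum of the squares of n independent standard normal variables; the choice of df = 5, the number of draws and the seed are illustrative.

```python
import random
from statistics import mean, variance

rng = random.Random(1)   # fixed seed so the sketch is repeatable

df = 5          # degrees of freedom (hypothetical choice)
reps = 20000    # number of simulated chi-square values

# Each draw is a sum of df squared standard normal values.
draws = [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(reps)]

sim_mean = mean(draws)       # expected to be close to df
sim_var = variance(draws)    # expected to be close to 2 * df

print(round(sim_mean, 2), round(sim_var, 2), min(draws) >= 0)
```

The simulated mean and variance should land near 5 and 10, and no draw can be negative because every term in the sum is a square.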
Summary
In this study session, we have discussed the meaning of Sampling Distribution,
the reasons why we study sampling distributions, Central Limit Theorem, Small
Sampling Distributions, Sampling Distribution of the Variance and many
others.
(I) Discussion Questions
1. What is sampling distribution?
2. Discuss the reasons why we study sampling distribution
3. What do you understand by central limit theorem?
4. Discuss the sampling distributions of the difference of sample proportions
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and
Economics (10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson
Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.3.2.3 STUDY SESSION 11
Hypothesis
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Hypothesis
(B) The Null and the Alternative Hypothesis
(C) Approach to Testing Hypotheses
(D) Test Statistic and the Meaning & Interpretation of P-Value
(D) β and Power of the Test
(E) Sample Size Effect
(F) General Testing Procedure
Summary
Self-assessment questions and Answers
References/Further Readings
Introduction
You are welcome to study session eleven. In this study session, you will
understand the meaning of Hypothesis, the concept of Null and Alternative
Hypotheses, Interpretation of P-Value, Power of a Test & Size Effect and many
more.
Learning Outcomes
In this study session, you will be able to:
1. Explain the meaning of Hypothesis.
2. Discuss the concept of Null and Alternative Hypotheses.
3. Explain the meaning & Interpretation of P-Value, Power of a Test
& Size Effect and many more.
(A) Hypothesis
Testing of Hypotheses is one of the most important aspects of the theory of
decision-making. In this session, we will study a class of problems where the
decision made by a decision maker depends primarily on the strength of the
evidence thrown up by a random sample drawn from a population. We can
elaborate this by an example where the operations manager of a Cola company
has to decide whether the bottling operation is under statistical control or has
gone out of control (and needs some corrective action). Imagine that the
company sells Cola in bottles labeled 1-liter, filled by an automatic bottling
machine. The implied claim that on the average each bottle contains 1,000
cm³ of cola may or may not be true.
(i) If the claim is true, the process is said to be under statistical control. It is
in the interest of the company to continue the bottling process.
(ii) If the claim is not true, i.e. the average is either more than or less than
1,000 cm³, the process is said to have gone out of control. It is in the
interest of the company to halt the bottling process and set right the error.
Therefore, to decide about the status of the bottling operation, the operations
manager needs a tool which allows him to test such a claim.
A claim of this kind is a hypothesis. The statement is tentative as it implies some
assumption, which may or may not be found valid on verification. Hypothesis
testing is the process of determining whether or not a given hypothesis is true.
ITQ: What is one of the most important aspects of the theory of decision-making?
ITA: Testing of Hypotheses.
On June 13, the jury acquitted Michael Jackson of all charges against him in
the child molestation case. In other words, using the language of hypothesis
testing, the jury had to accept the null hypothesis (H0) of innocence because
the prosecution could not prove its case against it.
In a trial we do not have to rule out guilt in order to find someone innocent,
but we do have to rule out innocence in order to find someone guilty. Along
similar lines, we do not have to rule out H1 in order to accept H0, but we do
have to rule out H0 in order to accept H1. Thus, it is clear that the two
hypotheses, null and alternative, are not interchangeable; each one plays a
different, special role. It is therefore important to be clear about what
the null and alternative hypotheses should be in a given situation, or else the test
is meaningless. One can conceptualize the whole procedure of testing a
hypothesis as trying to answer one basic question: Is the sample evidence
strong enough to enable us to reject H0? This means that H0 will be rejected
only when there is strong sample evidence against it. However, if the sample
evidence is not strong enough, we shall conclude that we cannot reject H0 and so
we accept H0 by default.
In this situation, the operations manager will have to take corrective action
when the average is either more than or less than 1,000 cm³. Only when the
average equals 1,000 cm³ is no corrective action necessary. So we have:
H0: μ = 1,000 cm³
H1: μ ≠ 1,000 cm³
Situation II: A consumer advocate suspects that the average amount of Cola is
less than 1,000 cm³ and wants to test it.
In this situation, if the average amount of cola is greater than or equal to 1,000
cm³, no corrective action is needed, but if the average amount is less than 1,000
cm³, the company has to halt the bottling process and set right the error. So, in
this case, we have:
H0: μ ≥ 1,000 cm³
H1: μ < 1,000 cm³
Situation III: The owner of the company suspects that the machine is wasting
cola by filling more than 1,000 cm³ on the average and wants to test it.
From the owner’s point of view, no corrective action is necessary if the average
is less than or equal to 1,000 cm³. Therefore, in this case we have:
H0: μ ≤ 1,000 cm³
H1: μ > 1,000 cm³
As the bottling example indicates, there are three possible cases for the null
hypothesis, involving ≥, ≤ and = relationships. The exact null hypothesis should
be finalised before any evidence is gathered, or the test will not be valid. Data
snooping - formulating the null and alternative hypotheses at one’s convenience
after collecting and looking at the evidence – is unethical.
ITQ: What is the difference between the null hypothesis and the alternative
hypothesis?
ITA: A null hypothesis is an assertion about the value of a population parameter. It is
an assertion that we hold as true unless we have sufficient statistical evidence to
conclude otherwise. The alternative hypothesis is the negation of the null hypothesis.
Type I Error
In the context of statistical testing, the wrong decision of rejecting a true null
hypothesis is known as Type I Error. If the operations manager rejects H0 and
concludes that the process has gone out of control, when in reality it is under
control, he would be making a type I error.
Type II Error
The wrong decision of accepting (not rejecting, to be more accurate) a false null
hypothesis is known as Type II Error. If the operations manager does not reject
H0 and concludes that the process is under control, when in reality it has gone out
of control, he would be making a type II error.
Both the type I and type II errors are undesirable and should be reduced to the
minimum. Let us analyse how we can minimise the chances of type I and type II
errors. It may be easily realized that it is possible, even with imperfect sample
evidence, to reduce the probability of type I error all the way down to zero: just
accept the null hypothesis, no matter what the evidence is. Since we will never
reject any null hypothesis, we will never reject a true null hypothesis and thus
we will never commit a type I error.
ITQ: What is the difference between Type I Error and Type II Error?
ITA: The wrong decision of rejecting a true null hypothesis is known as Type I Error,
while the wrong decision of accepting (not rejecting, to be more accurate) a false null
hypothesis is known as Type II Error.
(D) Test Statistic and the Meaning & Interpretation of P-Value
Consider the case of owner’s suspicion related to our bottling process example.
The null and alternative hypotheses in this case are:
H 0 : μ ≤ 1,000
H 1 : μ >1,000
Suppose the population variance is 25 and a random sample of size 100 yields a
sample mean of 1,000.5. Because the sample mean is more than 1,000, the
evidence goes against the null hypothesis (H 0 ). Can we reject H 0 based on this
evidence?
i. If we reject it, there is some chance that we might be committing a type I
error, and
ii. If we accept it, there is some chance that we might be committing a type
II error.
Then what can we do? We should ask a natural question in this situation:
“What is the probability that H0 can still be true despite the evidence?” The
question asks for the “credibility” of H0 in light of unfavorable evidence.
However, due to mathematical complexities, it is not possible to compute the
probability that H0 is true. We, therefore, settle for a question that comes very
close:
“When the actual μ = 1,000, and with sample size 100, what is the
probability of getting a sample mean that is more than or equal to 1,000.5?”
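Since the population variance is known and the sample size is large enough, the
Central Limit Theorem is applicable here, that is,
$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
and the standard normal variable $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is to be used to calculate the required
probability:
$$P(\bar{X} \ge 1000.5) = P\left(Z \ge \frac{1000.5 - 1000}{5/\sqrt{100}}\right) = P(Z \ge 1.00) = 0.1587 \approx 0.16$$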
So the answer to our question is 16%. That is, there is a 16% chance for a
sample of size 100 to yield a sample mean more than or equal to 1,000.5 when
the actual μ = 1,000. Statisticians call this 16% the p-value. In other words, the
p-value, the probability of observing a sample statistic as extreme as the one
observed if the null hypothesis is true, is a kind of “credibility rating” of H0 in
light of the evidence. A p-value close to zero means that the evidence is very
hard to reconcile with H0, and a p-value close to 1 means that the evidence is
entirely consistent with H0. Strictly speaking, a p-value of 16% does not mean
that there is a 16% probability that H0 is true; it means that evidence at least
this unfavorable to H0 would arise about 16% of the time if H0 were true. The
implication is that if we made it our policy to reject H0 on evidence of this
strength, we would reject a true H0, that is, commit a type I error, about 16%
of the time. The formal definition of the p-value is as follows:
Given a null hypothesis and sample evidence with sample size n, the p-value is
the probability of getting a sample evidence with the same n that is equally or
more unfavorable to the null hypothesis while the null hypothesis is actually
true. The p-value is calculated giving the null hypothesis the maximum benefit
of doubt.
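The p-value calculation for the owner's suspicion in the bottling example can be sketched in a few lines of Python. This is a minimal illustration only: the function name `right_tail_p_value` is an assumption made for this sketch, and it relies on the known-variance, large-sample setting described in the text (σ² = 25, n = 100).

```python
from math import sqrt
from statistics import NormalDist


def right_tail_p_value(sample_mean, mu0, sigma, n):
    """p-value for H0: mu <= mu0 against H1: mu > mu0 when sigma is known."""
    # Observed value of the test statistic Z = (x_bar - mu0) / (sigma / sqrt(n))
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Area under the standard normal curve to the right of z
    p = 1.0 - NormalDist().cdf(z)
    return z, p


# Values taken from the example: variance 25 (sigma = 5), n = 100, mean 1,000.5
z, p = right_tail_p_value(sample_mean=1000.5, mu0=1000, sigma=5, n=100)
print(f"Z = {z:.2f}, p-value = {p:.4f}")
```

Giving H0 "the maximum benefit of doubt" corresponds here to evaluating the probability at the boundary value μ = 1,000, which is the value of μ allowed by H0 that makes the observed evidence most likely.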
The random variable used to calculate the p-value, Z in this case, is called the
test statistic. The formal definition of the test statistic is as follows:
A test statistic is a random variable calculated from the sample evidence, which
follows a well-known distribution and thus can be used to calculate the p-value.
Most of the time, the test statistic we use will be Z, t, χ² or F. The distributions
of these random variables are well known and we can calculate the p-value.
Up to this point it is clear that a statistical hypothesis is always stated
with reference to a population parameter (mean, proportion or variance). The
appropriate random variable calculated from the sample evidence acts as a test
statistic and provides the means to decide whether the statistical hypothesis is to be
rejected or accepted.
The Significance Level-α
From our discussion on the p-value, it becomes clear that the p-value of a test, i.e.
the credibility of the null hypothesis, varies with the actual observed value of the
sample statistic. This fact necessitates having a policy for rejecting H0 based on
the p-value. The policy is to fix a significance level α in advance and to reject
H0 only when the p-value is less than α.
In other words, we can say that the rejection region for H0 is the area under the
curve where the p-value is less than α. This region is also called the critical region.
The standard values for α are 10%, 5%, and 1%. Suppose α is set at 5%. In the
preceding example, for a sample mean of 1,000.5 the p-value was 16%, and
H0 will not be rejected. For a sample mean of 1,001 the p-value will be 2.28%,
which is below α = 5%. Hence H0 will be rejected.
Let us analyse in some detail the implications of using a significance level α for
rejecting a null hypothesis.
i. The first thing to note is that if we do not reject H0, this does not prove
that H0 is true. For example, if α = 5% and the p-value = 6%, we will
not reject H0. But a p-value of 6% means that the evidence is still quite
unfavorable to H0, which is hardly proof that H0 is true. It may be
possible that H0 is false and by not rejecting it, we are committing a
type II error. For this reason, we should say “We cannot reject H0 at an
α of 5%” rather than “We accept H0”.
ii. The second thing to note is that α is the maximum probability of type I
error we set for ourselves. Since α is the maximum p-value at which we
reject H0, it is the maximum probability of committing a type I error. In
other words, setting α = 5% means that we are willing to put up with up
to 5% chance of committing a type I error.
iii. The third thing to note is that the selected value of α indirectly determines
the probability of type II error as well. In general, other things
remaining the same, increasing the value of α will decrease the
probability of type II error. This should be intuitively obvious. For
example, increasing α from 5% to 10% means that in those instances with
a p-value in the range 5% to 10% the H0 that would not have been rejected
before would now be rejected. Thus, some cases of false H0 that escaped
rejection before may not escape now. As a result, the probability of type
II error will decrease.
iv. The fourth thing to note about α is the meaning of (1 - α). If we set α =
5%, then (1 - α) = 95% is the minimum confidence level that we set in
order to reject H0. In other words, we want to be at least 95% confident
that H0 is false before we reject it.
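The second point, that α caps the probability of a type I error, can be checked by simulation. The sketch below is only an illustration under stated assumptions: it reuses the bottling values (μ0 = 1,000, σ = 5, n = 100), draws each sample mean directly from its sampling distribution N(μ0, σ²/n) rather than simulating individual bottles, and the function name, trial count and seed are arbitrary choices made for this sketch.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_i_error_rate(alpha, mu0=1000, sigma=5, n=100, trials=20000, seed=1):
    """Share of right-tailed Z tests that reject H0 although mu really equals mu0."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    se = sigma / sqrt(n)                  # standard error of the sample mean
    rejections = 0
    for _ in range(trials):
        x_bar = rng.gauss(mu0, se)        # a sample mean generated with H0 true
        z = (x_bar - mu0) / se
        p_value = 1.0 - std_normal.cdf(z)
        if p_value < alpha:               # the rejection policy: p-value below alpha
            rejections += 1
    return rejections / trials


rate = type_i_error_rate(alpha=0.05)
print(f"Observed type I error rate at alpha = 5%: {rate:.3f}")
```

With H0 true at the boundary value μ = 1,000, the observed rejection rate should settle close to the chosen α; for values of μ below 1,000 it would be smaller, which is why α is described as a maximum.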
Such a case where rejection occurs in the left tail of the distribution of the test
statistic is called a left-tailed test, as seen in the figure below.
Now consider the case where the null and alternative hypotheses are:
H 0 : μ ≤ 1,000
H 1 : μ >1,000
Such a case where rejection occurs in the right tail of the distribution of the test
statistic is called a right-tailed test, as seen in the figure below
A Right-tailed Test figure
In the case of a right-tailed test, the p-value is the area to the right of the
calculated value of the test statistic.
In left-tailed and right-tailed tests, rejection occurs only on one tail. Hence
each of them is called a one-tailed test.
Finally, consider the case where the null and alternative hypotheses are:
H 0 : μ = 1,000
H 1 : μ ≠ 1,000
In this case, we have to reject H0 in both cases, that is, whether the sample
mean X̄ is significantly less than or greater than 1,000. Thus, rejection occurs
when Z is significantly less than or greater than zero, which is to say that
rejection occurs on both tails. Therefore, this case is called a two-tailed test.
See the figure below, where the shaded areas are the rejection regions.
Selecting Optimal α
All tests of hypotheses hinge upon this concept of the significance level and it is
possible that a null hypothesis can be rejected at α = 5% whereas the same
evidence is not strong enough to reject the null hypothesis at α = 1%. In other
words, the inference drawn can be sensitive to the significance level used. We
should note that selecting a value for α is a question of compromise between
type I and type II error probabilities. In practice, the significance level is
supposed to be arrived at after considering the cost consequences of type I error
and type II error. However, most of the time the costs are difficult to estimate
since they depend, among other things, on the unknown actual value of the
parameter being tested. Thus, arriving at a “calculated” optimal value for α is
impractical. Instead, we follow an intuitive approach of assigning one of the
three standard values, 1%, 5%, and 10%, to α.
In the intuitive approach, we try to estimate the relative costs of the two types of
errors. For example, suppose we are testing the average tensile strength of a
large batch of bolts produced by a machine to see if it is above the minimum
specified. Here type I error will result in rejecting a good batch of bolts and the
cost of the error is roughly equal to the cost of the batch of bolts. Type II error
will result in accepting a bad batch of bolts and its cost can be high or low
depending on how the bolts are used.
If the bolts are used to hold together a structure, then the cost is high because
defective bolts can result in the collapse of the structure, causing great damage.
In this case, we should strive to reduce the probability of type II error more than
that of type I error. In such cases where type II error is more costly, we keep
a large value for α, namely, 10%.
On the other hand, if the bolts are used to secure the lids on trash cans, then the
cost of type II error is not high and we should strive to reduce the probability of
type I error more than that of type II error. In such cases where type I error is
more costly, we keep a small value for α, namely, 1%.
Then there are cases where we are not able to determine which type of error is
more costly. If the costs are roughly equal, or if we have not much
knowledge about the relative costs of the two types of errors, then we keep
α = 5%.
Suppose the actual value of μ = μ1 (say 1,002), such that μ1 > 1,000. Obviously,
H0 is false. The cross-hatched area under the normal curve centered at μ1 in the
figure above is then the probability of accepting H0 when it is false, that is, the
probability β of a type II error. This area, which lies in the acceptance region of
the normal curve centered at μ0 = 1,000, represents the probability that the
observed sample mean X̄ falls in the acceptance region when μ = μ1 (1,002),
that is, when H0 is false.
Type I error and the power of the test (1-β) are, however, positively
related. Thus, the smaller the probability (α) of rejecting H 0 when it is
true, the smaller is the probability (1-β) of rejecting H 0 when it is false.
The figure below shows the relationship between α and β for various values of
sample size n. As n increases, the curve shifts downwards, reducing both α and
β. Thus, when the costs of both types of error are high, the best policy is to have
a large sample and a low α, such as 1%.
β versus α for various values of n figure
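The trade-off between α, β and the sample size n can be made concrete with a short calculation. The sketch below is a hedged illustration for the right-tailed test H0: μ ≤ 1,000 against H1: μ > 1,000 with σ = 5 and a true mean of μ1 = 1,002, as in the text; the function name `beta_right_tailed` and the particular values of n and α tried are assumptions chosen for this sketch.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def beta_right_tailed(mu1, alpha, n, mu0=1000, sigma=5):
    """P(do not reject H0 | true mean is mu1) for H0: mu <= mu0, H1: mu > mu0."""
    se = sigma / sqrt(n)
    # Sample means above this cut-off fall in the rejection region
    critical_mean = mu0 + STD_NORMAL.inv_cdf(1 - alpha) * se
    # Probability that the sample mean stays below the cut-off when mu = mu1
    return NormalDist(mu1, se).cdf(critical_mean)


for n in (25, 100):
    for alpha in (0.01, 0.05, 0.10):
        beta = beta_right_tailed(mu1=1002, alpha=alpha, n=n)
        print(f"n={n:3d}  alpha={alpha:.2f}  beta={beta:.4f}  power={1 - beta:.4f}")
```

For a fixed n, a larger α gives a smaller β (and so a larger power 1 - β); for a fixed α, a larger n also gives a smaller β.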
Having understood the basic concepts of testing of hypotheses, we are now
able to develop tests concerning different population parameters. Under
different conditions the test procedures have to be developed differently and
different test statistics are used for testing. Before proceeding further, let us
define the critical region in terms of the test statistic, which is often more
helpful in many situations.
Test            Critical Region
Left-tailed     Z < −Z_α
Right-tailed    Z > Z_α
Two-tailed      Z > Z_{α/2} or Z < −Z_{α/2}
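The cut-off values that bound these critical regions for the Z statistic can be obtained from the inverse of the standard normal distribution function. The sketch below is a minimal illustration; the function name `z_critical` and the `tail` labels are assumptions made for this sketch.

```python
from statistics import NormalDist


def z_critical(alpha, tail):
    """Cut-off value(s) of Z that bound the critical region at significance level alpha."""
    std_normal = NormalDist()
    if tail == "left":                       # reject when Z < -Z_alpha
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":                      # reject when Z > Z_alpha
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":                        # reject when |Z| > Z_(alpha/2)
        z = std_normal.inv_cdf(1 - alpha / 2)
        return (-z, z)
    raise ValueError("tail must be 'left', 'right' or 'two'")


for tail in ("left", "right", "two"):
    cutoffs = ", ".join(f"{value:.3f}" for value in z_critical(0.05, tail))
    print(f"{tail}-tailed test at alpha = 5%: {cutoffs}")
```

The corresponding cut-offs for t, χ² and F depend on degrees of freedom and are normally read from statistical tables or a statistics library; they are not covered by this sketch.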
t-test
When, in the testing of hypotheses, we use the random variable t for calculating
the p-value and for defining the critical region of the test, we call the test a
t-test. The critical regions in terms of t are summarized in the table below:
Critical Region of t-test Table
Test            Critical Region
Left-tailed     t < −t_α
Right-tailed    t > t_α
Two-tailed      t > t_{α/2} or t < −t_{α/2}
χ2-test
When, in the testing of hypotheses, we use the random variable χ² for calculating
the p-value and for defining the critical region of the test, we call the test a
χ²-test. The critical regions in terms of χ² are summarized in the table below:
Test            Critical Region
Left-tailed     χ² < χ²_{1−α}
Right-tailed    χ² > χ²_α
Two-tailed      χ² > χ²_{α/2} or χ² < χ²_{1−α/2}
F-test
When, in the testing of hypotheses, we use the random variable F for calculating
the p-value and for defining the critical region of the test, we call the test an
F-test. The critical regions in terms of F are summarized in the table below:
Test            Critical Region
Left-tailed     F < F_{1−α}(n1 − 1, n2 − 1)
Right-tailed    F > F_α(n1 − 1, n2 − 1)
Two-tailed      F > F_{α/2}(n1 − 1, n2 − 1) or F < F_{1−α/2}(n1 − 1, n2 − 1),
                i.e. F < 1 / F_{α/2}(n2 − 1, n1 − 1)
(F) General Testing Procedure
We have learnt a number of important concepts about hypothesis testing. We
are now in a position to lay down a general testing procedure in a more
systematic way. By now it should be clear that there are basically two phases in
testing of hypothesis - in the first phase, we design the test and set up the
conditions under which we shall reject the null hypothesis. In the second phase,
we use the sample evidence and draw our conclusion as to whether the null
hypothesis can be rejected. The detailed steps involved are as follows:
Step 1: State the Null and the Alternative Hypotheses, i.e. H0 and H1
Step 2: Specify a level of significance α
Step 3: Choose the test statistic and define the critical region in terms of the test
statistic
Step 4: Make the necessary computations
i. Calculate the observed value of the test statistic
ii. Find the p-value of the test
Step 5: Decide whether or not to reject the null hypothesis, either
i. By comparing the p-value with α, or
ii. By comparing the observed value of the test statistic with the cut-off
value or the critical value of the test statistic.
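The five steps can be traced in code for the simplest case, a Z test on a population mean with known σ. The sketch below is an illustration under that assumption only; the function `z_test`, its argument names and the dictionary it returns are hypothetical names introduced for this sketch.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, alpha, tail):
    """Steps 3 to 5 for a Z test; steps 1 and 2 are the choice of `tail` and `alpha`."""
    std_normal = NormalDist()
    # Step 4(i): observed value of the test statistic
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 3 and step 4(ii): critical region and p-value for the chosen alternative
    if tail == "left":          # H1: mu < mu0
        p_value = std_normal.cdf(z)
        in_critical_region = z < std_normal.inv_cdf(alpha)
    elif tail == "right":       # H1: mu > mu0
        p_value = 1 - std_normal.cdf(z)
        in_critical_region = z > std_normal.inv_cdf(1 - alpha)
    elif tail == "two":         # H1: mu != mu0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        in_critical_region = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    # Step 5: the two decision routes
    return {
        "z": z,
        "p_value": p_value,
        "reject_by_p_value": p_value < alpha,
        "reject_by_critical_region": in_critical_region,
    }


# Owner's suspicion in the bottling example: H0: mu <= 1,000, H1: mu > 1,000
owner_test = z_test(sample_mean=1001, mu0=1000, sigma=5, n=100, alpha=0.05, tail="right")
print(owner_test)
```

Both routes in step 5 lead to the same decision, because the p-value falls below α exactly when the observed statistic lies inside the critical region.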
Summary
In this study session, we have discussed the meaning of Hypothesis, the concept
of Null and Alternative Hypotheses, Interpretation of P-Value, sample size
effect and general testing procedure.
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.3.2.4 STUDY SESSION 12
Correlation
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Correlation
(B) Correlation Analysis
(C) Limitations of Correlation Analysis
(D) Discussion Questions
Summary
Self-assessment questions and Answers
References/Further Readings
Introduction
You are welcome to another study session. In this study session, you will
understand the meaning of Correlation, Correlation Analysis, Pearson’s
Coefficient of Correlation, Spearman’s Rank Correlation Coefficient, the
Concurrent Deviation Coefficient as well as the Limitations of Correlation
Analysis.
Learning Outcomes
In this study session, you will be able to:
1. Explain the meaning of Correlation and Correlation Analysis.
2. Discuss Pearson’s Coefficient of Correlation and Spearman’s
Rank Correlation Coefficient.
3. Discuss the Concurrent Deviation Coefficient as well as the
Limitations of Correlation Analysis.
(A) Correlation
Statistical methods of measures of central tendency, dispersion, skewness and
kurtosis are helpful for the purpose of comparison and analysis of distributions
involving only one variable i.e. univariate distributions. However, describing
the relationship between two or more variables is another important part of
statistics.
Since these issues are interrelated, correlation and regression analysis, as two
sides of a single process, consist of methods of examining the relationship
between two or more variables. If two (or more) variables are correlated, we can
use information about one (or more) variable(s) to predict the value of the other
variable(s), and can measure the error of estimation, which is the job of
regression analysis.
What is Correlation?
Correlation is a measure of association between two or more variables. When
two or more variables vary in sympathy so that movement in one tends to be
accompanied by corresponding movements in the other variable(s), they are
said to be correlated.
On the other hand, if the variables are varying in opposite direction, we say that
it is a case of negative correlation; e.g., movements of demand and supply.
The ratio of change in the above example is the same. It is, thus, a case of linear
correlation. If we plot these variables on graph paper, all the points will fall on
the same straight line.
On the other hand, if the amount of change in one variable does not follow a
constant ratio with the change in another variable, it is a case of non-linear or
curvilinear correlation. If a couple of figures in either series X or series Y are
changed, it would give a non-linear correlation.
In this study session, we will study linear correlation between two variables.
The commonly used methods for studying linear relationship between two
variables involve both graphic and algebraic methods. Some of the widely
used methods include:
1. Scatter Diagram
2. Correlation Graph
3. Pearson’s Coefficient of Correlation
4. Spearman’s Rank Correlation
5. Concurrent Deviation Method
1. Scatter Diagram
This method is also known as Dotogram or Dot diagram. Scatter diagram is
one of the simplest methods of diagrammatic representation of a bivariate
distribution. Under this method, both the variables are plotted on the graph
paper by putting dots. The diagram so obtained is called a “Scatter Diagram”. By
studying the diagram, we can get a rough idea about the nature and degree of the
relationship between the two variables. The term scatter refers to the spreading
of dots on the graph. We should keep the following points in mind while
interpreting correlation:
i. If the plotted points are very close to each other, it indicates high degree
of correlation. If the plotted points are away from each other, it indicates
low degree of correlation.
Scatter Diagrams Figure
ii. If the points on the diagram reveal any trend (either upward or
downward), the variables are said to be correlated and if no trend is
revealed, the variables are uncorrelated.
iii. If there is an upward trend rising from lower left hand corner and going
upward to the upper right hand corner, the correlation is positive since
this reveals that the values of the two variables move in the same
direction. If, on the other hand, the points depict a downward trend from
the upper left hand corner to the lower right hand corner, the correlation
is negative since in this case the values of the two variables move in the
opposite directions.
iv. In particular, if all the points lie on a straight line starting from the left
bottom and going up towards the right top, the correlation is perfect and
positive, and if all the points lie on a straight line starting from the left top
and coming down to the right bottom, the correlation is perfect and negative.
The various diagrams of the scattered data in the above figure depict different
forms of correlation.
Example:
Given the following data on sales (in thousand units) and expenses (in thousand
naira) of a firm for 10 months:
Month:     J   F   M   A   M   J   J   A   S   O
Sales:     50  50  55  60  62  65  68  60  60  50
Expenses:  11  13  14  16  16  15  15  14  13  13
a) Make a Scatter Diagram
b) Do you think that there is a correlation between sales and expenses of the
firm? Is it positive or negative? Is it high or low?
Solution:
a) The Scatter Diagram of the given data is shown in the figure below:
Scatter Diagram Figure: Expenses (vertical axis, 0 to 20) plotted against Sales
(horizontal axis, 0 to 80)
2. Correlation Graph
This method, also known as the Correlogram, is very simple. The data pertaining
to two series are plotted on a graph sheet. We can find out the correlation by
examining the direction and closeness of two curves. If both the curves drawn
on the graph are moving in the same direction, it is a case of positive
correlation. On the other hand, if both the curves are moving in opposite
direction, correlation is said to be negative. If the graph does not show any
definite pattern on account of erratic fluctuations in the curves, then it shows an
absence of correlation.
Example:
Find out graphically if there is any correlation between price yield per plot
(qtls), denoted by Y, and quantity of fertilizer used (kg), denoted by X.
Plot No.:  1    2    3    4    5    6    7    8    9    10
Y:         3.5  4.3  5.2  5.8  6.4  7.3  7.2  7.5  7.8  8.3
X:         6    8    9    12   10   15   17   20   18   24
Solution:
The Correlogram of the given data is shown in Figure below:
Correlogram Figure: X and Y (vertical axis, 0 to 30) plotted against Plot Number
(horizontal axis, 1 to 10)
The figure above shows that the two curves move in the same direction and,
moreover, they are very close to each other, suggesting a close relationship
between price yield per plot (qtls) and quantity of fertilizer used (kg).
Remark: Both graphic methods, the scatter diagram and the correlation graph,
provide a ‘feel’ for the data by giving a visual representation of the
association between the variables. These are readily comprehensible and enable
us to form a fairly good, though rough, idea of the nature and degree of the
relationship between the two variables. However, these methods are unable to
quantify the relationship between them. To quantify the extent of correlation,
we make use of algebraic methods, which calculate a correlation coefficient.
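$$r_{xy} = \frac{\frac{1}{N^2}\left[N\sum XY - \sum X \sum Y\right]}{\sqrt{\frac{1}{N^2}\left[N\sum X^2 - \left(\sum X\right)^2\right]}\;\sqrt{\frac{1}{N^2}\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
OR
$$r_{xy} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\;\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}}$$
In terms of the deviations from the means, $d_x = X - \bar{X}$ and $d_y = Y - \bar{Y}$:
$$r_{xy} = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}}$$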
Remark: Thus if (i) X and Y are fractional and (ii) X and Y assume large values,
the formula
$$r_{xy} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - \left(\sum X\right)^2}\;\sqrt{N\sum Y^2 - \left(\sum Y\right)^2}}$$
is not generally used for numerical problems. In such cases, the step deviation
method, where we take the deviations of the variables X and Y from any
arbitrary points, is used. We will discuss this method in the properties of the
correlation coefficient.
where A, B, h and k are constants and h > 0, k > 0, then the correlation
coefficient between X and Y is the same as the correlation coefficient between U
and V, i.e.,
r(X, Y) = r(U, V), that is, r_xy = r_uv
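This property can be checked numerically. The sketch below is an illustration only: it assumes the usual change of origin and scale, U = (X − A)/h and V = (Y − B)/k, and the particular constants A = 60, h = 5, B = 14 and k = 1 are arbitrary values chosen for this sketch. The sales and expenses figures are those of the ten firms used in the worked example of this session.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


sales = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]
expenses = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]

# Hypothetical origin (A, B) and positive scale (h, k) for the step deviations
A, h = 60, 5
B, k = 14, 1
u = [(x - A) / h for x in sales]
v = [(y - B) / k for y in expenses]

r_xy = pearson_r(sales, expenses)
r_uv = pearson_r(u, v)
print(f"r_xy = {r_xy:.4f}, r_uv = {r_uv:.4f}")
```

Because r is unchanged, the smaller step-deviation values U and V can be used in hand calculation in place of large or fractional X and Y.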
3. Two independent variables are uncorrelated but the converse is not true.
If X and Y are independent variables then
r_xy = 0
However, the converse of the theorem is not true, i.e. uncorrelated variables
need not necessarily be independent. As an illustration, consider the following
bivariate distribution.
X: 1 2 3 -3 -2 -1
Y: 1 4 9 9 4 1
For this distribution, the value of r will be 0.
Hence in the above example the variables X and Y are uncorrelated. But if we
examine the data carefully we find that X and Y are not independent but are
connected by the relation Y = X². The above example illustrates that
uncorrelated variables need not be independent.
Remarks: One should not confuse the terms uncorrelatedness and
independence. r_xy = 0, i.e. uncorrelatedness between the variables X and Y,
simply implies the absence of any linear (straight line) relationship between
them. They may, however, be related in some form other than a straight line,
e.g., quadratic (as we have seen in the above example), logarithmic or
trigonometric form.
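The value r = 0 for this distribution can be verified directly. The sketch below is a minimal check using the computational formula with sums of X, Y, XY, X² and Y²; the function name `pearson_r_from_sums` is an assumption made for this sketch.

```python
from math import sqrt


def pearson_r_from_sums(xs, ys):
    """Pearson's r via the formula based on N, sum X, sum Y, sum XY, sum X^2, sum Y^2."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


x = [1, 2, 3, -3, -2, -1]
y = [value ** 2 for value in x]      # Y is completely determined by X

r = pearson_r_from_sums(x, y)
print(f"Y values: {y}")
print(f"r = {r}")
```

Both ΣX and ΣXY are zero here, so the numerator vanishes even though every Y value is an exact function of its X value.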
The signs of both the regression coefficients are the same, and so the value of r
will also have the same sign.
This property will be dealt with in detail in the next study session on
Regression Analysis.
Example:
Find the Pearsonian correlation coefficient between sales (in thousand units)
and expenses (in thousand naira) of the following 10 firms:
Firm: 1 2 3 4 5 6 7 8 9 10
Sales: 50 50 55 60 65 65 65 60 60 50
Expenses: 11 13 14 16 16 15 15 14 13 13
Solution:
Let sales of a firm be denoted by X and expenses be denoted by Y
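Firm    X     Y     dx = X − 58   dy = Y − 14   dx²   dy²   dx·dy
1       50    11    -8            -3            64    9     24
2       50    13    -8            -1            64    1     8
3       55    14    -3            0             9     0     0
4       60    16    2             2             4     4     4
5       65    16    7             2             49    4     14
6       65    15    7             1             49    1     7
7       65    15    7             1             49    1     7
8       60    14    2             0             4     0     0
9       60    13    2             -1            4     1     -2
10      50    13    -8            -1            64    1     8
Total   580   140   0             0             360   22    70
That is, ∑X = 580, ∑Y = 140, ∑dx² = 360, ∑dy² = 22 and ∑dx·dy = 70, with
$$\bar{X} = \frac{\sum X}{N} = \frac{580}{10} = 58 \quad \text{and} \quad \bar{Y} = \frac{\sum Y}{N} = \frac{140}{10} = 14$$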
Applying the formula below, we have the Pearsonian coefficient of correlation:
$$r_{xy} = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \sum d_y^2}} = \frac{70}{\sqrt{360 \times 22}} = \frac{70}{\sqrt{7920}} \approx 0.79$$
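The column totals and the coefficient in this worked example can be reproduced with a short script. This is a sketch for checking the arithmetic only; the variable names are choices made for this illustration.

```python
from math import sqrt

sales = [50, 50, 55, 60, 65, 65, 65, 60, 60, 50]       # X
expenses = [11, 13, 14, 16, 16, 15, 15, 14, 13, 13]    # Y

n = len(sales)
mean_x = sum(sales) / n          # X-bar
mean_y = sum(expenses) / n       # Y-bar

# Deviations from the means, as in the dx and dy columns of the table
dx = [x - mean_x for x in sales]
dy = [y - mean_y for y in expenses]

sum_dx2 = sum(d * d for d in dx)
sum_dy2 = sum(d * d for d in dy)
sum_dxdy = sum(a * b for a, b in zip(dx, dy))

r_xy = sum_dxdy / sqrt(sum_dx2 * sum_dy2)
print(f"means: {mean_x}, {mean_y}")
print(f"sum dx^2 = {sum_dx2}, sum dy^2 = {sum_dy2}, sum dx*dy = {sum_dxdy}")
print(f"r_xy = {r_xy:.4f}")
```

The result, about 0.787, indicates a fairly high positive correlation between sales and expenses for these ten firms.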
Spearman’s rank correlation formula $\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$ can also be used even if we
are dealing with variables which are measured quantitatively, i.e. when the
actual data but not the ranks relating to two variables are given. In such a case
we shall have to convert the data into ranks. The highest (or the smallest)
observation is given the rank 1. The next highest (or the next lowest)
observation is given rank 2 and so on. It is immaterial in which way
(descending or ascending) the ranks are assigned. However, the same approach
should be followed for all the variables under consideration.
Example:
Calculate the rank coefficient of correlation from the following data:
X: 75 88 95 70 60 80 81 50
Y: 120 134 150 115 110 140 142 100
Solution:
Calculations for Coefficient of Rank Correlation
X     Rank R_X   Y     Rank R_Y   d = R_X − R_Y   d²
75    5          120   5          0               0
88    2          134   4          -2              4
95    1          150   1          0               0
70    6          115   6          0               0
60    7          110   7          0               0
80    4          140   3          +1              1
81    3          142   2          +1              1
50    8          100   8          0               0
                                                  Σd² = 6
$$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)} = 1 - \frac{6 \times 6}{8(8^2 - 1)} = 1 - \frac{36}{504} = 1 - 0.07 = +0.93$$
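The ranking and the calculation of ρ in this example can be sketched as follows. The sketch assumes, as in the example, that no value is repeated within a series and that rank 1 goes to the highest observation; the function names are assumptions made for this illustration.

```python
def descending_ranks(values):
    """Rank 1 for the highest value; assumes no repeated values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(value) + 1 for value in values]


def spearman_rho(xs, ys):
    """Spearman's rho = 1 - 6 * sum(d^2) / (N (N^2 - 1)) for untied data."""
    rank_x = descending_ranks(xs)
    rank_y = descending_ranks(ys)
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1)), sum_d2


x = [75, 88, 95, 70, 60, 80, 81, 50]
y = [120, 134, 150, 115, 110, 140, 142, 100]

rho, sum_d2 = spearman_rho(x, y)
print(f"ranks of X: {descending_ranks(x)}")
print(f"ranks of Y: {descending_ranks(y)}")
print(f"sum d^2 = {sum_d2}, rho = {rho:.4f}")
```

Ranking both series in ascending order instead would leave every difference d, and therefore ρ, unchanged, which is why the direction of ranking is immaterial as long as it is the same for both variables.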
Repeated Ranks
In the case of attributes, if there is a tie, i.e. if any two or more individuals
are placed together in any classification with respect to an attribute, or if in
the case of variable data there is more than one item with the same value in
either or both of the series, then Spearman’s formula $\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)}$ for calculating the rank correlation
coefficient breaks down, since in this case the variables X [the ranks of
individuals in characteristic A (1st series)] and Y [the ranks of individuals in
characteristic B (2nd series)] do not take the values from 1 to N.
In this case common ranks are assigned to the repeated items. These common
ranks are the arithmetic mean of the ranks, which these items would have got if
they were different from each other and the next item will get the rank next to
the rank used in computing the common rank. For example, suppose an item is
repeated at rank 4. Then the common rank to be assigned to each item is
(4+5)/2, i.e., 4.5 which is the average of 4 and 5, the ranks which these
observations would have assumed if they were different. The next item will be
assigned the rank 6. If an item is repeated thrice at rank 7, then the common
rank to be assigned to each value will be (7+8+9)/3, i.e., 8, which is the
arithmetic mean of 7, 8 and 9, viz., the ranks these observations would have got
if they were different from each other. The next rank to be assigned will be 10.
If only a small proportion of the ranks are tied, this technique may be applied
together with the formula
$$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)}.$$
If a large proportion of the ranks are tied, it is advisable to apply an
adjustment or a correction factor to this formula, as explained below:
To $\sum d^2$ add
$$\frac{m(m^2 - 1)}{12},$$
where ‘m’ is the number of times an item is repeated. This correction factor is
to be added for each repeated value in both the series.
f. Spearman’s formula has its limitations also. It is not practicable in the case of
bivariate frequency distribution. For N >30, this formula should not be used
unless the ranks are given.
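The common-rank rule and the correction factor described under Repeated Ranks
can be sketched as follows. This is only an illustration: it assumes the usual
correction term m(m^2 - 1)/12 for each group of repeated values, and the
function names and the scores list are hypothetical, chosen so that one value
is repeated at rank 4 and another is repeated thrice at rank 7, as in the text.

```python
def average_ranks(values):
    """Rank in descending order; tied values share the mean of their ranks."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = ordered.index(v) + 1            # first rank the value occupies
        m = ordered.count(v)                    # number of times it is repeated
        ranks.append(first + (m - 1) / 2)       # mean of first .. first + m - 1
    return ranks


def tie_correction(values):
    """Sum of m*(m^2 - 1)/12 over every group of repeated values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(m * (m ** 2 - 1) / 12 for m in counts.values() if m > 1)


def spearman_rho_tied(x, y):
    """Spearman's formula with the correction added to sum(d^2) for both series."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    adjusted = sum_d2 + tie_correction(x) + tie_correction(y)
    return 1 - 6 * adjusted / (n * (n ** 2 - 1))


# Hypothetical scores: 70 is repeated at rank 4, 50 is repeated thrice at rank 7
scores = [100, 90, 80, 70, 70, 60, 50, 50, 50, 40]
print(average_ranks(scores))
# [1.0, 2.0, 3.0, 4.5, 4.5, 6.0, 8.0, 8.0, 8.0, 10.0]
print(tie_correction(scores))  # 2.5
```

When there are no repeated values the correction is zero, so the function
reduces to the ordinary Spearman formula.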
5. Concurrent Deviation Method
This is a casual method of determining the correlation between two series when
we are not very serious about its precision. This is based on the signs of the
deviations (i.e. the direction of the change) of the values of the variable from its
preceding value and does not take into account the exact magnitude of the values
of the variables. Thus we put a plus (+) sign, minus (-) sign or equality (=) sign
for the deviation if the value of the variable is greater than, less than or equal to
the preceding value respectively. The deviations in the values of two variables
are said to be concurrent if they have the same sign (either both deviations are
positive or both are negative or both are equal). The formula used for computing
correlation coefficient $r_c$ by this method is given by:
$$r_c = \pm\sqrt{\pm\frac{2c - N}{N}}$$
where c is the number of pairs of concurrent deviations and N is the number of
pairs of deviations. If (2c − N) is positive, we take the positive sign inside
and outside the square root in the formula above, and if (2c − N) is negative,
we take the negative sign inside and outside the square root.
Remarks:
i. It should be clearly noted that here N is not the number of pairs of
observations but it is the number of pairs of deviations and as such it is
one less than the number of pairs of observations.
ii. Coefficient of concurrent deviations is primarily based on the following
principle:
“If the short time fluctuations of the time series are positively correlated or
in other words, if their deviations are concurrent, their curves would move
in the same direction and would indicate positive correlation between
them”
Example:
Calculate coefficient of correlation by the concurrent deviation method
Supply: 112 125 126 118 118 121 125 125 131 135
Price: 106 102 102 104 98 96 97 97 95 90
Solution:
Calculations for Coefficient of Concurrent Deviations
Supply (X)   Sign of deviation from    Price (Y)   Sign of deviation from    Concurrent
             preceding value (X)                   preceding value (Y)       deviations
112                                    106
125          +                         102         -
126          +                         102         =
118          -                         104         +
118          =                         98          -
121          +                         96          -
125          +                         97          +                         + (c)
125          =                         97          =                         = (c)
131          +                         95          -
135          +                         90          -
We have
Number of pairs of deviations, N = 10 – 1 = 9
c = Number of concurrent deviations
  = Number of deviations having like signs
  = 2
$$r_c = \pm\sqrt{\pm\frac{2c - N}{N}} = \pm\sqrt{\pm\frac{2(2) - 9}{9}} = \pm\sqrt{\pm(-0.5556)}$$
Since (2c − N) = −5 is negative, we take the negative sign inside and outside
the square root:
$$r_c = -\sqrt{0.5556} = -0.745 \approx -0.7$$
Hence there is a fairly good degree of negative correlation between supply and
price.
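The steps of the concurrent deviation method can be traced in a few lines of
Python. The sketch below is illustrative only; the function names are
assumptions, and the sign rule is implemented by applying the sign of (2c - N)
to the square root of its absolute ratio.

```python
import math


def deviation_signs(series):
    """Return '+', '-' or '=' for each value compared with its predecessor."""
    signs = []
    for prev, curr in zip(series, series[1:]):
        signs.append('+' if curr > prev else '-' if curr < prev else '=')
    return signs


def concurrent_deviation(x, y):
    """Return (c, N, r_c) for two series of equal length."""
    sx, sy = deviation_signs(x), deviation_signs(y)
    n = len(sx)                                    # number of pairs of deviations
    c = sum(1 for a, b in zip(sx, sy) if a == b)   # pairs with like signs
    ratio = (2 * c - n) / n
    # The sign of (2c - N) is taken both inside and outside the square root.
    r_c = math.copysign(math.sqrt(abs(ratio)), ratio)
    return c, n, r_c


# Data from the worked example
supply = [112, 125, 126, 118, 118, 121, 125, 125, 131, 135]
price = [106, 102, 102, 104, 98, 96, 97, 97, 95, 90]

c, n, r_c = concurrent_deviation(supply, price)
print(c, n, round(r_c, 3))  # 2 9 -0.745
```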
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
9.3.2.5 STUDY SESSION 13
Regression
Section and Subsection Headings
Introduction
Learning outcomes
Main Content
(A) Meaning of Simple Regression Analysis
(B) Simple Regression Equation
(C) Predicting an Estimate and its Preciseness
(D) Regression of x on y
(E) Properties of Regression Coefficients
(F) Correlation Analysis versus Regression Analysis
(G) Regression Diagnostics-I
Summary
Self-assessment questions and Answers
References/Further Readings
Introduction
You are welcome to the last study session of this course. This session
discusses how to measure the relationship between variables, with regression
analysis as one of the approaches. It deals with simple regression analysis,
explains how the intercept and the gradient of the simple regression equation
are determined, and shows how to carry out some diagnostic analysis on a simple
regression.
Learning Outcomes
In this study session, you will:
1. Be able to describe regression analysis.
2. Understand ways of expressing simple regression equation.
3. Be able to compute intercept, gradient and error term.
4. Know various properties of regression coefficient.
5. Be able to make a decision using regression analysis.
6. Understand some regression diagnostics.
When dealing with population data, $\alpha_i$ (alpha) and $\beta_i$ (beta) are
used in the regression equation, as represented thus:
$$y = \alpha_i + \beta_i x_i$$
In the case of the stochastic model, the equation of the sampled data is written
in the following forms:
$$y = a + bx + \varepsilon_i$$
or
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_i x_i + \varepsilon_i$$
ITQ: What is the distinction between a deterministic model and a stochastic model?
ITA: The distinction is that the stochastic model includes an error term (or random
term), whereas the deterministic model does not.
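The distinction can be illustrated with a small simulation. The sketch below is
only an illustration: the intercept, gradient, x values and the normally
distributed error with standard deviation 1 are all hypothetical choices, not
values from the course text.

```python
import random

random.seed(1)  # fixed seed so that the illustration is repeatable

a, b = 2.0, 0.5            # hypothetical intercept and gradient
x = [1, 2, 3, 4, 5]        # hypothetical values of the independent variable

# Deterministic model: y is fixed exactly by x.
y_deterministic = [a + b * xi for xi in x]

# Stochastic model: the same line plus a random error term for each observation.
errors = [random.gauss(0, 1) for _ in x]
y_stochastic = [a + b * xi + e for xi, e in zip(x, errors)]

print(y_deterministic)  # [2.5, 3.0, 3.5, 4.0, 4.5]
print(y_stochastic)     # the same values shifted by the random errors
```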
The line of best fit is a line drawn on a scatter diagram, close to the points,
to show the trend between two sets of data more clearly. Data points that lie
far from the line of best fit are considered outliers. The line can be drawn
freehand using the highest, lowest and mean values of x. This method is simple
and quick, but the result it yields is rough and subjective.
[Figure: scatter diagram of Series1 with its fitted line of best fit, Linear (Series1)]
Suppose that, using the scatter diagram above, we want to estimate the value of
y when x has a value of 3. All we need to do is draw a vertical line at x = 3
until it touches the line of best fit. From the point where it touches the line,
we then draw a horizontal line towards the y axis. The value at which the
horizontal line meets the y axis is the predicted or estimated value of y. The
value is roughly 7.
Normal equations are two simultaneous equations that are solved to determine
the values of the intercept and the gradient of a simple linear regression. The
equations are represented as follows:
$$an + b\sum x = \sum y \qquad \text{(i)}$$
$$a\sum x + b\sum x^2 = \sum xy \qquad \text{(ii)}$$
On solving the two equations, the sign of the gradient coefficient emerges
automatically. The letter n in the first equation stands for the sample size.
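As a worked step, the two normal equations can be solved by substitution. The
derivation below is a sketch of that algebra, using the same symbols a, b and n
as in the equations above.

```latex
% From equation (i), express a in terms of b:
a = \frac{\sum y - b\sum x}{n}

% Substitute this into equation (ii) and collect the terms in b:
\frac{\sum x\left(\sum y - b\sum x\right)}{n} + b\sum x^2 = \sum xy
\;\;\Longrightarrow\;\;
b\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \sum xy - \frac{\sum x \sum y}{n}

% Hence the gradient and the intercept are:
b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2},
\qquad
a = \frac{\sum y - b\sum x}{n}
```

The gradient b is obtained first and is then substituted back to obtain the
intercept a.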
In each of the above cases, the value of $\hat{b}_i$ has to be determined before
that of $\hat{b}_0$, because the formula for $\hat{b}_0$ contains two unknowns,
$\hat{b}_0$ and $\hat{b}_i$.
The values of $\hat{b}_0$ and $\hat{b}_i$ can also be computed by using the sum
of squared differences. This approach minimises the sum of the squares of the
errors made in the results of every single equation. It is also called the
least squares method, and the line of best fit determined using the method is
called the least squares line.
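The value of $\hat{b}_0 = \bar{y}_i - \hat{b}_i \bar{x}_i$, as in the case of the
second formula above, while that of
$$\hat{b}_i = \frac{SS_{xy}}{SS_x},$$
where $SS_{xy}$ is the sum of the cross-products of the deviations of x and y
from their means, and $SS_x$ is the sum of the squared deviations of x from its
mean. The two respective values are computed using the following formulae:
$$SS_{xy} = \sum (x_i - \bar{x}_i)(y_i - \bar{y}_i)$$
$$SS_x = \sum (x_i - \bar{x}_i)^2$$
Where:
$$\bar{y}_i = \frac{\sum (y_i)}{n} \quad \text{and} \quad \bar{x}_i = \frac{\sum (x_i)}{n}$$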
Note:
$SS_{xy}$ is the numerator that is used in the computation of the covariance and
the correlation coefficient. $SS_x$ is the numerator that is used in the
computation of the sample variance, and hence of the sample standard deviation
computed earlier.
There are shortcut formulae that can be used in computing $SS_{xy}$ and $SS_x$.
These formulae are presented thus:
$$SS_{xy} = \sum x_i y_i - \frac{\sum x_i \sum y_i}{n}$$
$$SS_x = \sum x_i^2 - \frac{(\sum x_i)^2}{n}$$
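The shortcut formulae lend themselves to a direct computation. The following is
a minimal sketch in Python; the function names and the five data points are
hypothetical and serve only to show the order of the steps (gradient first,
then intercept).

```python
def sums_of_squares(x, y):
    """Return (SS_xy, SS_x) using the shortcut formulae."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_x = sum(a * a for a in x) - sum_x ** 2 / n
    return ss_xy, ss_x


def least_squares_line(x, y):
    """Return (intercept, gradient) of the regression of y on x."""
    n = len(x)
    ss_xy, ss_x = sums_of_squares(x, y)
    gradient = ss_xy / ss_x                           # determined first
    intercept = sum(y) / n - gradient * sum(x) / n    # y-bar minus gradient * x-bar
    return intercept, gradient


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

ss_xy, ss_x = sums_of_squares(x, y)
intercept, gradient = least_squares_line(x, y)
print(ss_xy, ss_x)                              # 6.0 10.0
print(round(intercept, 2), round(gradient, 2))  # 2.2 0.6
```

With these estimates, the predicted value of y at x = 3 would be
2.2 + 0.6(3) = 4.0.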
(D) Regression of x on y
There are two lines of regression: one of y on x and the other of x on y. When y
is regressed on x, y is the dependent variable and x is the independent
variable. The reverse is the case when x is regressed on y. The values of the
intercept and the gradient when x is regressed on y can be found using the
following formulae:
$$\hat{b}_i' = \frac{\sum xy - n\bar{x}\bar{y}}{\sum y_i^2 - n\bar{y}^2}$$
$$\hat{b}_0' = \bar{x} - \hat{b}_i'\bar{y}$$
Alternatively, by means of the following normal equations, the values of the
intercept and the gradient can be determined:
$$\sum x_i = na' + b'\sum y_i \qquad (1)$$
$$\sum x_i y_i = a'\sum y_i + b'\sum y_i^2 \qquad (2)$$
After the values have been determined, they can then be substituted into the
regression equation below.
$$x = a' + b'y$$
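The formulae for the regression of x on y can be applied in the same way as
those for y on x, with the roles of the two variables exchanged. The sketch
below is illustrative; the function name and the data are hypothetical.

```python
def regress_x_on_y(x, y):
    """Return (intercept a', gradient b') of the regression of x on y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_y2 = sum(b * b for b in y)
    gradient = (sum_xy - n * x_bar * y_bar) / (sum_y2 - n * y_bar ** 2)
    intercept = x_bar - gradient * y_bar
    return intercept, gradient


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_prime, b_prime = regress_x_on_y(x, y)
print(a_prime, b_prime)  # -1.0 1.0
```

For these data the fitted equation is x = -1 + y, which is a different line
from the one obtained by regressing y on x.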
To ascertain whether there is any deviation from the above conditions, the
error term (or residual) must be examined. The examination is done by first
standardising the residuals, that is, by subtracting the mean and dividing by
the standard deviation. The mean of the residuals is zero and, because the
standard deviation is unknown, it is estimated by the standard error of
estimate. Hence, the formula for computing the standardised residual (SR) is
represented thus:
$$SR = \frac{r}{SE} = \frac{y_i - \hat{y}_i}{SE}$$
Where:
SR = standardised residual
r = residual = $y_i - \hat{y}_i$
SE = standard error of estimate = $\sqrt{\dfrac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$
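The computation of standardised residuals can be sketched as follows. The data,
the function name and the fitted values of the intercept (2.2) and gradient
(0.6) are hypothetical; they are the least squares estimates for this small
illustrative data set, not figures from the course text.

```python
import math


def standardised_residuals(x, y, intercept, gradient):
    """Return (list of standardised residuals, standard error of estimate)."""
    fitted = [intercept + gradient * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    # Standard error of estimate: sqrt(sum of squared residuals / (n - 2))
    se = math.sqrt(sum(r ** 2 for r in residuals) / (len(x) - 2))
    return [r / se for r in residuals], se


# Hypothetical data and its least squares line y = 2.2 + 0.6x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

sr, se = standardised_residuals(x, y, intercept=2.2, gradient=0.6)
print(round(se, 4))               # 0.8944
print([round(v, 3) for v in sr])  # [-0.894, 0.671, 1.118, -0.671, -0.224]
```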
Summary
The study session dealt with measuring relationships using regression analysis.
It explained how to make a prediction or estimate using the regression
equation. The properties of regression coefficients, the comparison between
correlation analysis and regression analysis, and regression diagnostics were
also discussed.
References
1. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
2. Gerald, K and Brian, W (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
3. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
4. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
5. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
10.0 FURTHER READING
1. Gerald, K and Brian, W (2006). Statistics for Management and Economics
(10th Edition). Pacific Grove-USA: Brooks/Cole-Thompson Learning.
2. Lind, Marchal and Marson (2005). Statistical Techniques in Business and
Economics (12th Edition). New York: McGraw-Hill Companies.
3. Dominick, Salvatore and Derrick, Reagle (2002). Theory and Problems of
Statistics and Econometrics (2nd Edition). New York: McGraw-Hill
Companies, Inc.
4. St. Andrews (2003). Bayes School of Mathematics and Statistics,
University of St Andrews Scotland, http://www-gap.dcs.st-and.ac.uk/
history/Mathematicians/Bayes.html. Edited by John O’Connor and
Edmund Robertson.
5. Calgary, U (2003). Bayes Theorem. University of Calgary, Department of
Mathematics and Statistics, Division of Statistics and Actuarial Science,
http://balducci.math.ucalgary.ca/.
6. Gupta, S. P (2010). Statistical Methods. New Delhi: Sultan Chand and
Sons.
7. Hooda, R. P (2014). Statistics for Business and Economics. New Delhi:
Macmillan.
8. Hein, L. W (2015). Quantitative Approach to Managerial Decisions. New
Jersey: Prentice Hall.
9. Levin, Richard I. and David S. Rubin (2009). Statistics for Management.
New Delhi: Prentice Hall.
10. Lawrence, B. Moore (2010). Statistics for Business & Economics. New
York: Harper Collins.
11. Watsman, Terry J. and Keith Parramor (2012). Quantitative Methods in
Finance International. London: Thompson Business Press.
11.0 GLOSSARY
1. Alternative hypothesis is the hypothesis that the researcher expects to support.
2. Analysis of variance is a statistical test of the difference of means for two or more
groups.
3. ANOVA is an acronym for analysis of variance. It is a statistical test of the difference
of means for two or more groups.
4. Central tendency is a typical or representative value for a dataset which is reported
as the mean, the median, or the mode, depending on the data and/or one's purposes.
5. Chi Square is a statistical procedure that examines the relationship between two
categorical variables. The test is based on the discrepancy between the observed
number of observations in each category and the expected number of observations in
each category.
6. Coefficient of determination is a statistic used in linear regression that indicates the
amount of variation in the dependent variable which is explained or accounted for by
the independent variable(s).
7. Confidence interval is the generic label used to describe the decision points where
the researcher favours the alternative hypothesis over the null hypothesis. Stated
differently, it is the range of mean values within which the true population mean is
likely to fall.
8. Continuous variable is a variable which can assume an infinite number of values.
Convenience sample is the kind of sampling used when the researcher decides to
select the units of study on the basis of their being readily available.
9. Correlation is a standardised index of the strength and direction of the relationship
between two variables. The range for the possible correlation between any two
variables is from -1.00 (a perfect inverse relationship) to +1.00 (a perfect positive
relationship).
10. Covariance is a measure of association between a pair of variables. It is similar to a
correlation, but a correlation is expressed in a standardised metric, whereas
covariance is expressed in the units of the original variables.
11. Critical value is a value that establishes the boundaries of the confidence interval.
12. Decile is a subset of adjacent scores in a distribution representing 10% of a sample or
a population. A "decile score" is a raw score corresponding to the 10th, 20th, 30th,
etc. percentile score.
13. Degrees of freedom is the number of components in the calculation of a statistic that
are free to vary.
14. Dichotomous variable is a discrete measure with two categories that may or may not
be ordered. It is a variable which has only two categories.
15. Discrete variable is a variable which is limited to a finite number of values. A
discrete variable usually describes something which occurs only in whole units. The
number of males in an English class is an example of a discrete variable.
16. Dispersion is the "spread" of a data set, the departure from central tendency.
17. Distribution is where the horizontal axis (x-axis) represents the variable being
described. The density of the smooth curve over the x-axis represents the probability
of occurrence for each of the values on the x-axis.
18. Explained variance is the variance in Y about Y' where Y' is the value of Y on the
regression line predicted by the regression equation. If the regression line does not
help in predicting Y, then it will pass through Y-bar, in which case, B yx = 0. In
absolute value terms, the highest possible score for B yx = +/- 1.00.
19. Heteroscedasticity is a condition in which the variances of two or more population
distributions are not equal.
20. Histogram is a bar graph used to represent the frequency of each value occurring in a
distribution of scores.
21. Homoscedasticity is a condition in which the variances of two or more population
distributions are equal.
22. Hypotheses are a set of two or more mutually exclusive and often exhaustive
statements. The goal of hypothesis testing is to determine which is true.
23. Independent samples t-test is the procedure used in hypothesis testing to compare
the means of two different samples. As is true for all t-tests, the standard error is not
known and is estimated from sample data.
24. Interval data is data that possess magnitude (one value can be judged greater than,
less than, or equal to another) and a constant distance between intervals (units of
measurement are the same on the scale regardless of where the unit falls).
25. Interval variable is a variable whose attributes are rank ordered and have equal
distances between adjacent attributes.
26. Kurtosis is the degree of flatness or peakedness of a graph of a frequency
distribution. The relatively flat distributions are described as platykurtic. Distributions
with medium curvature are mesokurtic (note: a normal distribution is mesokurtic).
The most peaked distributions are leptokurtic.
27. Leptokurtic is a distribution that is more peaked than a normal distribution. This is to
say that there are more cases concentrated close to the mean than in a normal
distribution.
28. Line of best fit (least squares fit) is the least squares fit procedure that allows us to
reduce the scatterplot to a single straight line described by a linear equation. It
minimises the square of the vertical distance between each point and the regression
line.
29. Marginal is the frequency distribution of each of two cross tabulated variables. There
are row marginal and column marginal.
30. Mean is a measure of central tendency calculated by dividing the sum of the scores in
a distribution by the number of scores in the distribution. This value best reflects the
typical score of a data set when there are few outliers and/or the dataset is generally
symmetrical.
31. Median is the value in a data set which divides the scores into two equal halves (i.e.,
an equal number of scores lie above and below it). As a measure of central tendency,
it is largely unaffected by extreme values.
32. Mode is the score that occurs most frequently in a data set. This measure of central
tendency is the only one appropriate for nominal data.
33. Negative skew is an asymmetry in a distribution in which the scores are bunched to
the right side of the centre. With a negatively skewed distribution, the mean generally
falls to the left of the median and the median usually lies to the left of the mode.
Study Hint: the tail of a negatively skewed distribution points to the negative side of
the number line.
34. Non-probability sample is a type of sampling that involves the researcher's judgment
to determine the elements to be selected for the sample.
35. Nominal data are data that are classified into mutually exclusive ("named") groups
that lack intrinsic order.
36. Normal distribution is a theoretical distribution which is typically bell-shaped when
graphed. The distribution is theoretical because the height of the curve is defined by a
mathematical formula (and the exact values necessary to create the curve would never
occur).
37. Null hypothesis is the prediction that the researcher believes will be "nullified." That
is, the researcher believes this prediction is not true.
38. Observation is the empirical data that is used to support or refute a hypothesis.
39. Ordinal data are data whose values are ordered so that inferences can be made
regarding magnitude, but which have no fixed interval between values. An example of
ordinal data is a letter grade on a test.
40. Ordinal variable is a variable whose values are ordered so that inferences regarding
magnitude can be made, but which have no fixed interval between values. Letter
grade on a test would be an ordinal variable: while an 'A' is greater than a 'B' which is
greater than a 'C', we cannot conclude that the distance between an 'A' and a 'B' is the
same as the distance between a 'B' and a 'C'.
41. Outlier is a value in a data set that is very different from most other values in the set.
42. Paired t-test is the procedure used in hypothesis testing when the independent
variable is within-subjects in nature. The goal is to compare two levels of the
independent variable assigned to the same group of subjects at different points in
time. As is true for all t-tests, the standard error is not known and is estimated
from sample data.
43. Parameter is a characteristic of a population, e.g. mean (μ), pronounced "mu", and
standard deviation (σ), or "sigma".
44. Pearson's correlation coefficient is a measure of association between two
continuous variables which estimates both the direction and strength of a linear
relationship.
45. Percentile is a value that exceeds a specific percentage of the distribution. Thus, if the
63rd percentile score for a set of scores on the SAT verbal exam is 560, then 63% of
scores are at or below 560.
46. Platykurtic is a distribution that is flatter than a normal distribution. This is to say
that there are more cases in the tails of the distribution than in a normal distribution.
47. Population is the set of all possible data values that could be observed.
48. Positive skew is an asymmetry in a distribution in which the scores are bunched to
the left side of the centre. With a positively-skewed distribution, the mean generally
falls to the right of the median and the median usually lies to the right of the mode.
Study Hint: the tail of a positively skewed distribution points to the positive side of a
number line.
49. Probability sample is sampling in which each element within a study population has
a known, nonzero chance of being selected into the sample.
50. Protocol is a specified methodology for performing a task.
51. Quartile is a subset of adjacent scores in a distribution representing 25% of a sample
or a population. A "quartile score" is a raw score corresponding to the 25th, 50th, or
75th percentile score.
52. Quintile is a subset of adjacent scores in a distribution representing 20% of a sample or
a population. A "quintile score" is a raw score corresponding to the 20th, 40th, 60th,
or 80th percentile score.
53. Random sample is a sample that contains observations which are selected from a
population so that every member of the population has a known chance of selection
for a sample.
54. Random variable is a variable whose measurements vary in a seemingly random and
unpredictable manner. A random variable assumes a unique numerical value for each
of the outcomes in the sample space of the probability experiment.
55. Range is a simple measure of dispersion, indicating the difference between the lowest
and highest values observed.
56. Ranked categories are categories within a variable that are logically ranked. The
different attributes of each category represent relatively more or less of the variable.
57. Ratio data are data that are ordered (so that we can make inferences regarding
magnitude), have equal intervals between values, and contain an absolute zero point.
Height is an example of ratio data: 60 inches is taller than 55 inches, the distance
between 60 and 55 inches is the same as the distance between 30 and 25 inches, and a
height of 0 inches implies no height at all.
58. Ratio variable is a variable that is based on a true zero point. An example of a
ratio variable would be age.
59. Regression is a statistical procedure that allows us to determine the extent to which
we can predict a given observation's score on a dependent variable, given that
observation's score on one or more independent variables.
60. Regression coefficient is the slope of the regression line. It represents the change in y
for every one unit change in x.
61. Regression line is a model that simplifies the relationship between two variables. By
approximating a line through the centre of a scatterplot that represents the data, we
create a two dimensional centre for the data. The line summarises the data points in
the same way that measures of central tendency do.
62. Sample is a collection of observations selected from a larger population.
63. Sampling distribution is the set of all the possible non-overlapping samples that can be
drawn, given a constant sample size.
64. Sampling distribution of means is a frequency distribution of a large number of
random sample means that have been drawn from the same population.
65. Sampling distribution of the difference between means is a sampling distribution
that consists of the differences in means between groups.
66. Sampling distribution of means is a frequency distribution of a large number of
random sample means that have been drawn from the same population.
67. Sampling distribution of the mean of difference scores is a sampling distribution
that consists of the differences in means within subjects across treatments.
68. Sampling error is the extent to which a sample distribution is different from the
population distribution from which the sample is drawn.
69. Scatterplot is a group of data points that are plotted along x-axis and y-axis
coordinates. Every individual is represented as a data point, whereby a perpendicular
line from the individual's "X" value intersects a perpendicular line from the
individual's "Y" value.
70. Single sample t-test is the procedure used in hypothesis testing to compare the mean
of one sample to a known population mean. As is true for all t-tests, the standard
error is not known and is estimated from sample data.
71. Skewness is asymmetry in a distribution in which scores are bunched on one side of
the distribution.
72. Standard deviation is a measure of dispersion describing the spread of scores around
the mean. It is the square root of the variance.
73. Standard error is the standard deviation of a sampling distribution.
74. Standard error of the mean is the standard deviation of a sampling distribution of
means.
75. Standard error of the mean of difference scores is the standard deviation of
a sampling distribution of the mean of difference scores.
76. Standard score is a raw score that has been converted from one scale into another
with an arbitrarily set mean and standard deviation. Standard scores are more easily
interpreted than raw scores, because they take into account the mean and standard
deviation of the distribution of values.
77. Statistic is a characteristic of a sample, e.g. mean (x̄) and standard deviation (s).
78. Strata are subdivisions of a population.
79. Stratification is allocating samples among subcategories, within a population.
Stratification is sometimes necessary to improve the effectiveness of a sampling effort
or to increase understanding of population characteristics. For example, stratifying an
election survey by sex allows analysts to better understand voter behaviour by
revealing differences in the way that males and females vote.
80. Type I error is erroneously rejecting the null hypothesis: concluding that a sample
came from a different population when it is, in fact, from the same population.
81. Type II error is erroneously failing to reject the null hypothesis: concluding that a
sample came from the given population when it is, in fact, from a different population.
82. Variance is a measure of dispersion, indicating the mean of the squared deviations of
a set of scores from the mean of the scores.
83. Y-intercept is the point through which the line intersects the Y-axis. It is the value of
y when x equals zero.
84. Z-score is a standardised score which indicates how many standard deviations a value
lies above or below the mean.