Statistics Sol
) Economics
Semester-I
Course Credit - 4
DSC-3
INTRODUCTORY STATISTICS
FOR ECONOMICS
As per the UGCF - 2022 and National Education Policy 2020
Introductory Statistics for Economics
Editorial Board
J. Khuntia, V.A.Rama Raju, Vajala Ravi,
Bhavna Rajput, Anupama, Devender
Content Writers
Suramya Sharma, Dr. Pooja Sharma,
Taramati, Sugandh Kumar Choudhary,
Tasha Agarwal
Academic Coordinator
Mr. Deekshant Awasthi
Published by:
Department of Distance and Continuing Education under
the aegis of Campus of Open Learning, University of Delhi
Printed by:
School of Open Learning, University of Delhi
TABLE OF CONTENT
LIST OF CONTRIBUTORS
About Contributors
LESSON 1
STRUCTURE
iii. To learn about various scales of measurement viz. ratio, interval, ordinal, and
nominal scales
iv. To identify and comprehend the differences between statistical population and
samples
v. To distinguish between various sampling techniques
vi. To demonstrate the knowledge of various branches of statistics through descriptive
and inferential statistics
1.2 INTRODUCTION
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read
and write." Samuel S. Wilks (1906 - 1964).
Over the past many decades, statistics has become an indispensable part of our lives. We often
come across statistics in one form or another. If we open a newspaper or turn on the television,
we are likely to find surveys that establish a relationship between particular issues, say,
eating fast food and the risk of having a health issue; or we may find graphs depicting growth
rates or changes in some variables over time, say the growth rate of GDP or the inflation level
in a country. The scope of statistics, however, is not limited to data collection and
representation; it also establishes a basis for decision making and problem solving. We can
create models not only to study past trends but also to extend them to study the uncertain
future. Statistics helps us to make informed decisions in a world of uncertainty and variability.
We would not have needed statistics had there been no uncertainty and variability around us.
The word Statistics is derived from the Latin word “status,” which means “state” or
“condition”; the term originally referred to numerical information collected about a state.
Statistics may be formally defined as “the study of collecting, organizing, analyzing and
interpreting information in the form of data.”
Statistics helps us gain valuable insights not only in Economics; it is also widely used by
finance scholars, engineers, medical researchers and practitioners of other science and
social-science disciplines. In the financial sector, statistical analysis may be used at the
micro and macro level. At the micro level, it facilitates understanding of a company's or
business's performance, such as determining its revenue-generating capacity or the relationship
between advertising and sales. At the macro level, statistical analysis allows a country to
assess its financial condition and measure economic growth. In the field of engineering,
statistics is indispensable for robust analysis, probabilistic risk assessment, measurement of
error, etc. Statistics allows clinical researchers to compare various medical treatments,
evaluate the benefits of alternative therapies, establish optimal treatment combinations, etc.
Since the scope of statistics has broadened and it is now used in a number of practical fields,
it is also referred to as applied statistics.
© Department of Distance & Continuing Education, Campus of Open Learning,
School of Open Learning, University of Delhi
When we talk about the nature of statistics, it is considered a science as well as an art. It is
a science since statistical techniques are systematic and have broad application. In several
instances, statistics is used to study cause and effect relationships and the results can be
generalized in the same way as any scientific experiment or law. Statistics is also an art, which
refers to the “skill of handling facts so as to achieve a given objective.” Managing, presenting,
and drawing relevant conclusions from data is considered an art.
We’ve learnt that statistics is the study of collecting, organizing, analyzing and interpreting
data. But you may wonder, what exactly is data? Data is nothing but pieces of raw information,
facts and figures that are used for analysis. Broadly, data can be of two kinds:
a. Quantitative data
b. Qualitative data
As the name suggests, all kinds of numerical data that can be measured comprise quantitative
data. Information like age, height, distance, income, saving, GDP, imports, exports, rate of
employment, etc. are quantitative data.
Now quantitative data can further be classified into two types:
i. Discrete data – Data which can only take specific, separate values are termed discrete
data. For example, the number of computers in a school can only be a whole number. We will
never witness 7.2 or 15.7 computers in a school. Similarly, the number of students in a
class can only take whole numbers. We never observe 39.7 or 45.2 students in a class. The
age of respondents, when recorded in completed years, is treated the same way. These are
examples of discrete data. So discrete data generally take only whole numbers and are
finite and countable.
ii. Continuous data – Data which can take any value within an interval are referred
to as continuous data. This means that such data can take any value between two
numbers. For example, the daily temperature recorded in an Indian city in degrees Celsius
can take any value in the range of -50° to 50°. Here, 4.7° or 42.3° are
acceptable observations. Similarly, for the daily income of an ice-cream seller,
Rs. 346 or Rs. 1787.5 are suitable values. So continuous data can take decimal values,
and the number of possible values is infinite and cannot be counted.
In contrast, any information which is not directly measurable is known as qualitative data.
Qualitative data represents the qualities or the characteristics of data. Information like political
ideology, physical attributes of a person, problems faced by workers, etc. are examples of
qualitative data.
Scales of measurement:
i. Nominal scale data – Information which cannot be sorted or put in any meaningful order
is known as nominal. Such data are individual categories where changing
the order of the categories does not change any meaning. For example, the occupation
of respondents may take values such as teacher, farmer, shopkeeper,
unemployed, etc. These data cannot be measured, nor can they be sorted in any meaningful way.
Another example of nominal data is the marital status of respondents. The
variable may take values like married, unmarried, divorced, widowed, etc. Changing
the order of the responses does not make any difference to the understanding of the
sample.
ii. Ordinal scale data – In contrast, ordinal data follows a natural order. Although
these too cannot be measured explicitly, we can sort the data or order them in a way
to observe basic comparison between values. For example, the education level of
respondents- such a variable can take values from nursery to post graduation and
the data has a specific order. Opinion of respondents towards relevance of CCTV
cameras in workplaces could take values like strongly agree, somewhat agree,
somewhat disagree, strongly disagree, etc. These values can also be arranged in a
particular order.
iii. Interval scale – This scale pertains to numerical data which possess the property
that differences in values represent real differences in the variable. With such
variables, we know not only that one value is greater than another but also that the
distances between the intervals on the scale are the same. Examples include the
temperature in Fahrenheit or Celsius, year of birth, etc. Here a temperature of 92°F
is greater than 90°F, and the difference between 92°F and 90°F is the same
as the difference between 90°F and 88°F.
iv. Ratio scale – Data belonging to the ratio scale are quantitative measurements that
can be labelled and ordered, with evenly spaced intervals between values. These
scales also have a real absolute zero representing the total absence of the variable being
measured. Hence ratio scale variables have all the properties of interval scale variables
along with a “true zero.” Examples include the weight of a commodity and the height of a
person. Note that a zero value of weight indicates that the commodity is weightless.
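The difference between the interval and ratio scales can be checked with a little arithmetic. The following is only an illustrative sketch: the helper names `c_to_f` and `kg_to_lb` and the chosen values are assumptions made for this example, not part of the lesson. It shows that equal differences stay equal when temperature is converted from Celsius to Fahrenheit, but a statement such as "twice as hot" does not survive the conversion, whereas "twice as heavy" holds in any unit of weight.

```python
def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def kg_to_lb(kg: float) -> float:
    """Convert kilograms to pounds (1 kg is roughly 2.20462 lb)."""
    return kg * 2.20462


# Interval scale: equal differences remain equal after a change of units.
diff_c = (40 - 30, 30 - 20)
diff_f = (c_to_f(40) - c_to_f(30), c_to_f(30) - c_to_f(20))

# Ratios are not preserved, because 0 degrees does not mean "no temperature".
ratio_c = 40 / 20
ratio_f = c_to_f(40) / c_to_f(20)

# Ratio scale: zero means "no weight", so ratios do not depend on the unit.
ratio_kg = 10 / 5
ratio_lb = kg_to_lb(10) / kg_to_lb(5)

print("Differences in C:", diff_c, "in F:", diff_f)
print("Temperature ratio in C:", ratio_c, "in F:", round(ratio_f, 2))
print("Weight ratio in kg:", ratio_kg, "in lb:", round(ratio_lb, 2))
```

The temperature ratio changes with the unit while the weight ratio does not, which is what the "true zero" of the ratio scale guarantees.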
IN-TEXT QUESTIONS
Q.1 Sort the following data into quantitative and qualitative data:
A. Gender of respondents
B. Number of lectures attended
C. Percentage marks obtained in Economics subject
D. Revenue in lakhs
E. Whether interested in buying a washing machine
Q.2 Mark whether the statements are true or false:
A. Nominal data can be arranged in a particular order
B. Amount of time taken to complete a class project is a continuous variable
C. Statistics helps us to make informed decisions
D. Qualitative data can be measured directly
E. Discrete data are finite and countable
Q.3 Match the following:
We can also categorize data into univariate, bivariate and multivariate data. Uni means one
and variate refers to variable. Hence univariate data consists of only a single variable. It is the
simplest form of data. An example is the number of cold drinks sold by a street vendor on
weekdays. The data may look as below:
1 2 3 4 5
In the above example, you may notice that we can only describe the data; no
relationship or comparison can be drawn. On the other hand, bivariate and multivariate data
allow a researcher to establish relationships and correlations between variables. Here, bi means
two and hence bivariate data involves two distinct variables. A researcher may use this data to
establish a relationship between the two variables. For instance, the number of cold drinks sold
by a street vendor and the daily temperature of the city the vendor resides in. They could look
like:
                     1   2   3   4   5
Temperature (in °C)  34  37  36  37  38
A researcher may draw a conclusion that sales and temperature have a positive relationship
since as temperature in the city rises, so does the number of cold drinks sold by the street vendor.
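One way to put a number on such a positive relationship is the correlation coefficient. The sketch below is illustrative only: the temperatures are the ones used in this example, but the sales figures are hypothetical values invented for the illustration, and `pearson_r` is a helper written here rather than a method introduced in this lesson.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variation_x = sum((a - mean_x) ** 2 for a in x)
    variation_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(variation_x * variation_y)


temperature = [34, 37, 36, 37, 38]   # in degrees Celsius, from the example
cold_drinks = [20, 28, 25, 27, 31]   # HYPOTHETICAL sales, for illustration only

r = pearson_r(temperature, cold_drinks)
print("Correlation between temperature and sales:", round(r, 3))
```

A value of `r` close to +1 indicates that the two variables tend to rise together; a value close to 0 would indicate no linear relationship.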
Finally, multivariate data consists of more than two variables. Suppose a researcher wishes to
analyze the determinants of cold drinks sales in a city. So, she gathers data on cold drinks sales,
temperature of city, and price of cold drinks.
                     1   2   3   4   5
Temperature (in °C)  34  37  36  37  38
However, in statistics, the definition of population is slightly different. Here, population refers
to all the individuals/entities possessing similar characteristics and belonging to a particular
group under study. The population could vary from study to study. For instance, a researcher
might be interested in analyzing the results of the Common Admission Test (CAT) examination
in India. Here, the population would comprise all the candidates who appear for CAT
in a particular year. Say in a year, about 2.3 lakh candidates appear for the exam. To study a
population, the researcher would have to gather data for all 2.3 lakh candidates—no one from
the population can be left out. The best example to understand the concept of a population is a
census. In India, the census is the process of collecting and analyzing data on the demographic,
economic and social aspects of Indian residents. Since the census is a study of the whole
population, the data is collected from each and every Indian resident. The scale of this data
collection is so large that the census is conducted only once every 10 years.
In statistics, the population depends on the topic and scope of research. Other examples of a
population could be all the people working under the NREGS, the voter population in
India, the students enrolled in private schools in a state, all the accidents that have taken
place in India, etc.
Studying a population helps a researcher to gain useful insights into the characteristics of all
the elements under study. Since each and every unit is considered, the results are considered to
be reliable and representative of the population. This also enables a researcher to study more
than one aspect of the population and carry out an intensive study. Such data can also form
the basis of further investigations. However, studying a population is more suitable when we
have a small scope of study.
The descriptive measures computed from the population are termed ‘population parameters’ or
simply ‘parameters’. So, a parameter describes a characteristic of a population. Parameters are
usually denoted by Greek letters such as µ (mu) for the mean and σ (sigma) for the standard deviation.
Based on the data collected from the entire population of 2.3 Lakh candidates, if we want to
say that the average marks a candidate scores in the CAT examination are 85, we could denote
it as: 𝜇 = 85.
1.4.2 Sample
As you may have identified, the major challenge of analyzing a population is that such research
requires data to be collected from each and every member of the population, which is
undeniably a tedious task. The possibility of making errors in studying a population is also
significant since there may be missing data due to non-response by respondents,
measurement complexities due to the large amount of data, etc. Gathering data from a
population not only requires extra effort, but also consumes a lot of time and is expensive.
Constraints on scarce resources often render a population survey infeasible.
To overcome these drawbacks, a subset of the population, referred to as a sample, may be used
instead. Ideally, a sample is an unbiased subset of the statistical population which is
representative of the entire population. This means that a researcher randomly draws and analyzes
some observations from the population to make inferences about the whole population. Figure 1
depicts the relationship between population and sample:
[Figure 1: Relationship between population and sample]
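The distinction between a measure computed from the whole population (denoted by Greek letters such as 𝜇 and 𝜎) and the same measure computed from a sample (commonly written 𝑥̅ and 𝑠) can be illustrated with a small simulation. Everything in the sketch below is assumed for illustration: the population is synthetic, scaled down to 2,300 made-up scores, and the seed and sample size of 100 are arbitrary choices.

```python
import random
from statistics import mean, pstdev, stdev

rng = random.Random(42)  # fixed seed so that the run is repeatable

# HYPOTHETICAL population: synthetic scores for 2,300 candidates,
# a scaled-down stand-in for the 2.3 lakh candidates in the example.
population = [rng.gauss(85, 15) for _ in range(2300)]

# Parameters: computed from every member of the population.
mu = mean(population)
sigma = pstdev(population)   # population standard deviation

# Statistics: computed from a random sample drawn without replacement.
sample = rng.sample(population, 100)
x_bar = mean(sample)
s = stdev(sample)            # sample standard deviation

print(f"Population: mu = {mu:.2f}, sigma = {sigma:.2f}")
print(f"Sample:     x_bar = {x_bar:.2f}, s = {s:.2f}")
```

The sample values will generally be close to, but not identical to, the population values; drawing a different sample would give slightly different statistics while the parameters stay fixed.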
IN-TEXT QUESTIONS
A. 𝑠 = 5.7
B. 𝜇 = 120
C. 𝜎 = 32
D. 𝑥̅ = 18
Q.7 Select the correct option:
I. Sample is used when:
A) Data collection is inexpensive B) Research is time sensitive
C) Population is small D) Population is unknown
II. A mean is called a statistic if it is calculated from the:
A) Sample B) Population
C) Parameter D) Standard deviation
To ensure that the samples drawn are representative of the population, it is crucial to understand
the different ways in which we can select a sample. The two broad methods of sampling are:
1.5.1 Probability sampling techniques – These are commonly used sampling techniques
in which each unit of the population has an equal chance (or probability) of getting selected in a
sample. This means that the samples selected are random and unbiased and hence are
representative of the population. These techniques are also known as random sampling
techniques. The techniques under probability sampling discussed here are:
a. Simple random sampling – As the name suggests, it is the most basic form
of random sampling. Here each unit of the population has an equal and independent chance
of being chosen. Example: in a lottery system, the name of each unit of the population is
written on a chit and, after thorough shuffling, the researcher picks the chits one by one
and notes down the names. Another way of carrying out simple random sampling is through
random number generation, where all the units of the population are assigned a number in
sequential order. Random numbers are then generated using software and the sample is
selected accordingly.
b. Systematic random sampling – Under systematic random sampling, the sample is
selected from the population at a fixed interval. This technique is often more feasible than
simple random sampling. To draw samples using systematic random sampling, the first
step is to arrange the units of the population in an order and assign a number to each unit
from 1 to 𝑁. Then the sampling interval is calculated using the formula: 𝐾 = 𝑁/𝑛,
where 𝐾 is the interval, 𝑁 is the population size and 𝑛 is the sample size. Finally, we
select one unit at random and then select the following units at equal intervals of 𝐾. For
example, suppose we have to collect information from residents of a city where the
houses are numbered from 1 to 100,000. So, if the size of the population is 100,000 and
we need a sample size of 1000 houses, then the interval should be 100,000/1000 = 100.
The researcher will choose one house at random and then will select every 100th house
thereafter to get the sample.
c. Stratified random sampling – The above two types of sampling techniques assume
that the population is homogeneous. In case the population is heterogeneous, we use the
stratified random sampling technique. Under this technique, we divide the population
into sub-groups, called strata, based on shared characteristics such as gender, age group,
income level, etc., and then select random samples from each sub-group
or stratum. For instance, if a company has 700 male employees and 300 female
employees, then simple and systematic random sampling techniques may give us biased
samples. To avoid the bias, we use stratified random sampling wherein we create two
groups based on gender and then select random samples from both groups. The
technique has further two approaches to select a sample: Proportionate stratified
sampling and disproportionate stratified sampling.
d. Multi-stage sampling – At times the population of interest is quite large and
geographically diverse. In such cases, one sampling technique is not enough to select a
sample and using multi-stage sampling is suitable. Here, a sample is selected in stages,
combining different sampling techniques as described above. For instance, to study the
issues faced by primary school children, a researcher may first divide the population
into states, then use simple random sampling to create a sample of states. Next, the
researcher could again use simple random sampling to select a few districts, and finally
use systematic random sampling to identify a few schools within a district. The multi-
stage sampling method is frequently used to scale down large data sets into workable
sizes. Although there is no restriction on the number of stages you could use to select a
sample, it is important to note that all the sampling techniques used must be probability
sampling methods.
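The first three probability sampling techniques described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names are invented for this example, the systematic version picks its random starting unit within the first interval of 𝐾 units (one common way of choosing "one unit at random"), and the stratified version uses the proportionate approach with simple rounding.

```python
import random


def simple_random_sample(units, n, rng):
    """Draw n units so that every unit has an equal chance of selection."""
    return rng.sample(units, n)


def systematic_sample(units, n, rng):
    """Pick a random start, then take every K-th unit, where K = N / n."""
    k = len(units) // n              # sampling interval K
    start = rng.randrange(k)         # assumption: random start in the first interval
    return [units[start + i * k] for i in range(n)]


def proportionate_stratified_sample(strata, n, rng):
    """Sample each stratum in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    chosen = {}
    for name, members in strata.items():
        # Rounding may make the total differ slightly from n in general.
        share = round(n * len(members) / total)
        chosen[name] = rng.sample(members, share)
    return chosen


rng = random.Random(0)  # fixed seed so that the run is repeatable

houses = list(range(1, 100_001))                      # houses numbered 1 to 100,000
srs = simple_random_sample(houses, 1000, rng)
sys_sample = systematic_sample(houses, 1000, rng)     # interval K = 100

employees = {
    "male": [f"M{i}" for i in range(700)],            # hypothetical employee labels
    "female": [f"F{i}" for i in range(300)],
}
strat = proportionate_stratified_sample(employees, 100, rng)

print("First systematic picks:", sys_sample[:3])
print("Stratum sizes:", {name: len(picked) for name, picked in strat.items()})
```

With 700 male and 300 female employees, a proportionate sample of 100 contains 70 men and 30 women, so neither group is over- or under-represented. Multi-stage sampling would chain such functions, for example sampling states first and then districts within the selected states.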
1.5.2 Non-probability sampling techniques – Under non-probability sampling or non-random
sampling techniques, each unit of the population does not have an equal chance of getting
selected in a sample, that is, the samples are not random: they may be biased and may not
represent the population accurately. Hence, non-probability sampling techniques may not
produce results that can be generalized. Yet, there are many occasions when non-probability
sampling methods are preferred over probability sampling methods, as discussed in
the following passages. The most commonly used non-probability sampling techniques are:
a. Convenience sampling – As is clear from the term itself, convenience sampling refers
to the sampling technique in which a researcher collects the data from convenient
sources. For instance, as part of an undergraduate course, a student undertakes a
research project in which she tries to understand consumer sentiment related to
rooftop solar panels. Considering the time, cost and effort constraints, the student may
choose to collect data from the location most convenient to her, which may be within the
city she resides in or within the district. It is worth noting that such a sample may not
be representative of the population and generalization of the results may not be
appropriate. That said, such a technique is useful when a researcher carries out a pilot
survey to test the questionnaire.
b. Judgement sampling – Under this technique, the researcher selects a sample based on
her own judgement about the characteristics of the individuals. In other words, the
researcher uses her expertise to identify the best fit for her sample. Take the example
of a researcher who is interested in understanding the challenges faced by disabled
employees in a company. In such a case, the researcher can identify her sample
by using her judgement about which employees are relevant to the study.
c. Snowball sampling – At times it is difficult to locate the appropriate target group for
a study. In such cases, snowball sampling may be used. This technique is generally
used in social science research. Under snowball sampling, the existing respondents are
asked to identify or suggest other individuals who are well-suited for the research. So
based on the references, the sample size keeps on increasing, just like a snowball. For
example, a researcher may want to understand the plight of immigrants in a city. Since
there may not be any official record of immigrants readily available, the researcher
could identify a few immigrants and then ask them to help locate other immigrants for
the study. Such a technique comes in handy when we are interested in researching
hard-to-find groups. However, the risk of obtaining biased results is quite high since the
initial respondents may refer their friends and family, who share common
characteristics and beliefs.
The following figure summarizes all the sampling techniques we have discussed:
IN-TEXT QUESTIONS
Q.8 What type of a sampling technique is being used in the following examples:
A. A manager wants to select a sample of her clients to ask for a donation. She
arranges the list of clients in alphabetical order and randomly selects the first
client. She then proceeds to select every 5th client from the list.
B. A news reporter gathers consumer sentiments regarding a government policy by
interviewing people on the street.
A researcher may apply statistics to simply summarize and describe the characteristics of data
or employ statistics to draw some conclusions or inferences from the data. Most
research applies two types of statistical analysis:
1. Descriptive statistics
2. Inferential statistics
can only report the findings. It cannot be concluded, by merely looking at the descriptive
statistics, that generally, in India, households with highly educated heads save more.
So, we see that descriptive statistics cannot be used to make general estimates or predictions.
Yet, descriptive statistics are extremely useful as they can provide a snapshot of the whole data
in meaningful ways. They help simplify large data sets and present them in visually attractive
ways. We will learn about the various components of descriptive statistics in later chapters.
1.6.2 Inferential Statistics
On the other hand, inferential statistics is used to predict, estimate, and make other generalized
approximations based on the data. It is usually based on sample data to draw conclusions and
generalize the results to the larger population. Hypothesis testing and regression analysis are
two examples of inferential statistics.
Inferential statistics is very popular among researchers since it allows them to gather limited
data and extrapolate the results to a larger population. This saves them a lot of cost as well as
time.
If we consider the above example again, the research institute now gathers data from 10
randomly selected villages in India with total observations equal to roughly 1000 households.
Now the institute may generalize its results to state or national level through inferential
statistics.
Figure 3 summarizes the branches of statistics we discussed:
[Figure 3: Statistical analysis divides into descriptive statistics (measures of central
tendency, measures of variability, graphs) and inferential statistics (hypothesis testing,
regression analysis)]
IN-TEXT QUESTIONS
Q.11 Fill in the blanks:
A. We can make predictions and estimations using __________ statistics.
B. Histogram is an example of __________ statistics.
C. Standard deviation is calculated as a part of ___________ statistics.
Q.12 Select the correct option:
Inferential statistics can be used to
A) Estimate B) Generalize
C) Only A) D) Both A) and B)
1.7 SUMMARY
1.8 GLOSSARY
b. On the occasion of National Milk Day, the government wants to estimate the
number of cows a dairy farmer owns in a particular state. Using the census
approach, the government finds that about 40 lakh households are engaged in dairy
farming in the state with the average number of cows owned equal to 22. When a
government official took a random sample of 100 households in a district engaged
in dairy farming, she found that the average number of cows owned was
37.
Q.4 In a college, parking on the premises has become a major problem. To deal with the
problem, the college administration wishes to compute the average parking time of the
students who park their vehicles in the college parking lot. One of the college officials
quietly follows 150 students and records the duration of time the students keep their
vehicle in the parking lot.
a. Identify the population of interest to the college administration.
b. What is the sample size that the college administration is examining?
Q.5 Briefly describe any two methods of drawing non-probability samples.
Q.6 Why are descriptive statistics used? Can we use descriptive statistics to make
generalized predictions based on the data? If not, then how can we do so?
LESSON 2
STRUCTURE
We’ve mentioned in the earlier chapters that descriptive statistics involve the computation of
the basic statistics of the data such as mean, median, standard deviation etc. These give a basic
idea about the distribution of the dataset. Visual representation of the data is also an integral
part of descriptive statistics. In this chapter we will take a closer look at the most common
graphic methods to present the data.
A stem and leaf plot is a convenient way to visualize quantitative data. The plot can easily be
constructed by hand and gives an overview of the distribution of the observations in the data at
first glance. To create a plot, the data is arranged in ascending order and divided into equal
intervals. We then create a table that presents the whole data set in two columns. Each value in
the data is split into two parts, a stem and a leaf. The first column, referred to as the stem,
includes the tens, hundreds or thousands digits, as per the values of the data, and the second
column, known as the leaf, contains the rest of the digits. The concept will become clear with
the following example.
Say a researcher has collected data on the weights of 10 college students, chosen at random. The
weights, in kg, are shown in the following stem and leaf display:
Stem Leaf
4 3
5 29
6 167
7 14
8 8
9 2
Here, the first row with stem of 4 and leaf of 3 denotes the weight of 43 kg and the last row
with stem of 9 and leaf of 2 denotes the weight of 92 kg. Simply looking at the table, we can
work out that a typical college student in the sample has a weight in the sixties. We can observe
that the shape of the display gradually rises, peaks at stem 6 and then steadily declines. We
call this a bell-shaped distribution, which is roughly symmetric. We will learn about other
features of a symmetrical distribution later in the unit.
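The construction of a stem and leaf display can also be automated. In the sketch below, the list of weights is read back from the display above (43, 52, 59, and so on), and the function name `stem_and_leaf` is simply a label chosen for this illustration. Each two-digit value is split into a tens digit (the stem) and a units digit (the leaf).

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem); units digits are the leaves."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # e.g. 43 -> stem 4, leaf 3
        rows[stem].append(leaf)
    return dict(rows)


# Weights in kg, as read from the stem and leaf display in the text.
weights = [43, 52, 59, 61, 66, 67, 71, 74, 88, 92]

display = stem_and_leaf(weights)
print("Stem Leaf")
for stem in sorted(display):
    print(stem, "".join(str(leaf) for leaf in display[stem]))
```

Sorting the values first ensures that the leaves in each row appear in ascending order, as they do in a display drawn by hand.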
Let us consider another example in which we have a data set consisting of the number of hours
of daily sleep that 100 college students get. The stem and leaf display looks like this:
Stem Leaf
5 4455567889
6 00223446667899
7 000112233566789999
8 00000011222234445577778889
9 0001333355777779
10 01344446899
11 12247
Note that here, the first row with stem of 5 and leaf of 4 denotes 5.4 hours of sleep and the last
row with stem of 11 and leaf of 7 denotes 11.7 hours of sleep. We can clearly observe that the
most common amount of sleep is about 8 hours. If we ask you how many students sleep for
10 hours or more a day, then you simply have to add the number of leaves written in front of
stems 10 and 11. So the answer will be 16 students.
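The leaf counts in this display can be verified mechanically. The sketch below transcribes the display as strings of leaves and stores each observation in tenths of an hour (so 5.4 hours becomes 54) to avoid rounding issues; the variable names are chosen only for this illustration. Note that the 16 leaves on stems 10 and 11 include one reading of exactly 10.0 hours.

```python
# The stem and leaf display of daily sleep, transcribed row by row.
sleep_display = {
    5: "4455567889",
    6: "00223446667899",
    7: "000112233566789999",
    8: "00000011222234445577778889",
    9: "0001333355777779",
    10: "01344446899",
    11: "12247",
}

# Convert every stem/leaf pair into tenths of an hour, e.g. 5 | 4 -> 54.
tenths = [
    stem * 10 + int(leaf)
    for stem, leaves in sleep_display.items()
    for leaf in leaves
]

total_students = len(tenths)
ten_hours_or_more = sum(1 for t in tenths if t >= 100)
more_than_ten_hours = sum(1 for t in tenths if t > 100)
busiest_stem = max(sleep_display, key=lambda stem: len(sleep_display[stem]))

print("Students in the display:", total_students)
print("Sleeping 10 hours or more:", ten_hours_or_more)
print("Sleeping strictly more than 10 hours:", more_than_ten_hours)
print("Stem with the most leaves:", busiest_stem)
```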
A stem and leaf plot is helpful to get a basic understanding of the dataset; however, it gets
difficult to create the plot when the number of observations increases.
IN-TEXT QUESTIONS
Q.1 The correct list of data for the following stem and leaf plot is:
Stem Leaf
0 3
3 27
5 119
7 0
A. 03, 32, 37, 11, 51, 59,70 B. 03, 27, 37, 51, 51, 59,70
C. 03, 32, 37, 51, 51, 59,70 D. 03, 32, 37, 51, 59,70
Q.2 The following stem-and-leaf plot depicts the number of cakes that a home-baker sells
each week. If 1|7 represents 17 cakes, then,
Stem Leaf
0 689
1 024479
2 01136889
3 122
A. For how many weeks did the home-baker sell cakes?
B. In how many weeks did the home-baker sell more than 25 cakes?
A dot plot is another simple way to visualize data, in which dots represent each
unit of observation. The dots are stacked over one another so that the height of each stack
represents the frequency of the value in our dataset. For example, suppose a researcher would
like to know the number of vaccinated children in a city. To make the analysis simpler, she
divides the city into 5 localities and collects the number of vaccinated children for each
locality. The data collected is tabulated in the following manner:
Locality  Number of vaccinated children
1         6
2         1
3         3
4         11
5         8
The dot plot for the data would look like Figure 1, where the height of each column denotes the
frequency of the observation.
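A rough text version of such a dot plot can be produced with a short program. The sketch below uses the locality data tabulated above; the function name `dot_plot_rows` and the use of `*` as the dot symbol are choices made for this illustration. The plot is built from the top row downwards, and a dot is printed in a column whenever that locality's count reaches the current level.

```python
def dot_plot_rows(frequencies):
    """Return the lines of a text dot plot, tallest level first."""
    keys = sorted(frequencies)
    tallest = max(frequencies.values())
    rows = []
    for level in range(tallest, 0, -1):
        # Print a dot if this column's stack is at least `level` dots high.
        rows.append(" ".join("*" if frequencies[k] >= level else " " for k in keys))
    rows.append(" ".join(str(k) for k in keys))   # axis labels (localities)
    return rows


# Number of vaccinated children in each of the 5 localities.
vaccinated = {1: 6, 2: 1, 3: 3, 4: 11, 5: 8}

rows = dot_plot_rows(vaccinated)
print("\n".join(rows))
```

Each column of dots has the same height as the corresponding frequency, so locality 4 stands out as the tallest stack and locality 2 as the shortest.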
Since the data is continuous and unique, we would have one dot for each state. Instead, to
make the dot plots more informative, we create groups of data or class intervals.
70-75 1
75-80 2
80-85 1
85-90 3
90-95 2
95-100 1
Now we can easily create the dot plot using the above table. Try making the dot plot on your
own using the above table. Your dot plot should look something like figure 2:
IN-TEXT QUESTIONS
Q.3 The following dot plot illustrates the number of hours a student spends talking on the phone
per week:
A. How many students report that they did not talk on the phone at all during the week?
B. How many students spend at least 2 hours on the phone?
Q.4 True or False: We cannot create dot plots for continuous data.
One of the most popular graph types is the bar chart, which uses horizontal or vertical bars
to depict the observations in the dataset. Bar charts showing several categories can further be
of two types: stacked bar charts and grouped bar charts. Let’s understand all the kinds of bar
graphs using an example. Suppose we have the following data on two-wheeler sales in a city
over the years:
Year   Two-wheeler sales
2010 12,500
2015 17,000
2020 24,000
To create a vertical bar graph, we plot the years on the X-axis and the sale numbers on Y-axis.
The height of the vertical rectangular bar represents the value of the data, as presented below:
Stacked bar charts are designed in a way that two or more categories of the same data are
presented on the same bar. Stacked bar charts allow the reader to easily compare the value of
various categories simultaneously. Let us take the above example one step further. Suppose
that we have the following data for two-wheeler, three-wheeler and four-wheeler sales in a city:
To create stacked bar charts, we draw the bars for each category on top of each other for a
particular year. The final chart will look something like this:
Figure 4: Stacked bar chart of sale of two-wheelers in 2010, 2015 and 2020
Here we can compare the sales of each category of vehicle for each year. We can also create
100% stacked bar charts to present the same data in a more visually appealing way. A 100%
stacked bar chart represents the share of each category in the data out of 100. The height of all
the bars is equal to hundred percent and we can observe the relative changes in the values from
the size of the sub-bars:
Figure 5: 100% Stacked bar chart of sale of two-wheelers in 2010, 2015 and 2020
Grouped bar charts are a convenient way to compare different categories of data. In a grouped
bar chart, the bars for each category are placed adjacent to each other instead of on top of each
other. In this case, we can easily observe the absolute changes in the data of each category.
When interpreting stacked and grouped bar charts, it is crucial to pay extra attention to the
legend that is usually displayed at the bottom of a chart.
It is important to remember that it can become complex to present too many categories of data
in a stacked or grouped bar chart.
Figure 6: Grouped bar chart of sale of two-wheelers in 2010, 2015 and 2020
IN-TEXT QUESTIONS
Q.5 The following bar graph displays the favorite color of 200 kindergarten students in a
school:
A. Which is the most preferred and least preferred color among the students?
B. How many students like orange?
Q.6 The following grouped bar chart shows the daily number of customers visiting a
shopping center in morning (M) and evening(E).
A. On which day/days does the shopping center receive an equal number of customers
in the morning as well as evening?
B. On which day/days do we see fewer customers shopping in the evening than in the
morning?
Q.7 A travel agent organizes trips to destinations either in South India or North India. The
following 100% stacked bar chart presents the share of tourists visiting both the regions
between 2010 and 2014.
A. Looking at the bar graph can you say that over the years the popularity of North Indian
destinations is increasing? Why or why not?
B. In which year did maximum tourists visit South Indian destinations as compared to
North Indian destinations?
2.6 HISTOGRAMS
At first glance, a histogram may look similar to a bar chart, but the two are significantly
different. A histogram is also drawn as bars placed adjacent to each other, but each bar here
represents the frequency with which observations occur within a class interval of the dataset.
The frequency of any particular value is simply the number of times that value occurs in the
dataset. Hence, a histogram is also said to represent the ‘frequency distribution’ of a variable.
On the horizontal axis, we usually take the class intervals and on the vertical axis we place the
frequency. We construct class intervals in such a way that each observation is contained in
exactly one interval.
For instance, following are the economics marks of 20 college students:
86, 57, 69, 64, 67, 59, 81, 34, 47, 46, 38, 51, 66, 91, 42, 73, 62, 70, 77, 55
To construct a histogram, we divide the dataset into class intervals with each interval
representing 10 marks. Next, we’ll insert the frequency with which an observation occurs
within each interval:
Marks Frequency
30-40 2
40-50 3
50-60 4
60-70 5
70-80 3
80-90 2
90-100 1
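The counting step can be checked with a short Python sketch. It assumes that each class includes its lower limit and excludes its upper limit (so a mark of 70 is counted in 70-80), which is consistent with the table above; the function name `frequency_table` is introduced only for this illustration.

```python
# Economics marks of the 20 students, as listed in the text.
marks = [86, 57, 69, 64, 67, 59, 81, 34, 47, 46,
         38, 51, 66, 91, 42, 73, 62, 70, 77, 55]

def frequency_table(values, start, stop, width):
    """Count the values falling in [lower, lower + width) for each class."""
    table = {}
    for lower in range(start, stop, width):
        upper = lower + width
        # Assumption: lower limit included, upper limit excluded.
        table[(lower, upper)] = sum(lower <= v < upper for v in values)
    return table

table = frequency_table(marks, start=30, stop=100, width=10)
for (lower, upper), count in table.items():
    # A row of '#' characters gives a rough sideways histogram.
    print(f"{lower}-{upper}: {'#' * count} ({count})")
```

Because every mark falls in exactly one class, the frequencies add up to the number of students.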
following dot plot displays data with a strong concentration of values in the center and only a
few values on either side.
Equal class widths are not always a sensible choice. With only a few wide class intervals of
equal width, nearly all observations may fall into just one or two of the classes; with many
narrow intervals of equal width, several classes may have zero frequency. Using a few wider
intervals close to the extreme observations and narrower intervals in the area of high
concentration is a wise decision. To construct a histogram with unequal class widths, first
determine the frequencies. Then relative frequencies may be calculated by the following formula:
$$\text{Relative frequency} = \frac{\text{Number of times the value occurs}}{\text{Number of observations in the data set}}$$
This signifies the proportion of times the value occurs in the data.
The height of each bar can then be computed using the formula given by:
$$\text{Bar height} = \frac{\text{Relative frequency of class}}{\text{Class width}}$$
The resulting rectangle or bar heights are typically called densities.
Such a histogram has an interesting property. If we multiply the bar height by the class width,
we will get,
$$\text{Bar height} \times \text{Class width} = \text{Relative frequency of class} = \text{Area of rectangle}$$
This means that the area of each rectangle or bar represents the relative frequency of the
corresponding class interval. Moreover, the sum of relative frequencies should be one and
hence the total area of all rectangles in such a density histogram is one.
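The property can be verified numerically. The sketch below uses a small set of hypothetical classes with unequal widths (the class limits and frequencies are invented for illustration and do not come from the lesson) and checks that the bar areas add up to one.

```python
# Hypothetical classes ((lower, upper), frequency) with unequal widths.
classes = [((0, 10), 4), ((10, 15), 8), ((15, 20), 6), ((20, 40), 2)]
total = sum(freq for _, freq in classes)

densities = []
areas = []
for (lower, upper), freq in classes:
    relative_frequency = freq / total
    width = upper - lower
    density = relative_frequency / width   # bar height
    densities.append(density)
    areas.append(density * width)          # equals the relative frequency

print("bar heights:", densities)
print("total area:", sum(areas))
```

Note how the widest class (20 to 40) gets the lowest bar even though it is not empty: its relative frequency is spread over a larger width.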
SHAPES OF HISTOGRAMS
The histogram in the first example follows a roughly symmetric, bell-shaped distribution. This
means that if we place a mirror at the exact center of the distribution, the left and right sides
of the distribution will have the same shape. The best-known symmetric, bell-shaped
distribution is the ‘normal distribution.’
However, this is not always the case. Asymmetric distributions are called skewed. There are
two types of skewed distributions: right-skewed and left-skewed. When the dataset has a
greater number of observations on the left side of the mean, so that the longer tail stretches to
the right, it is called a right or positively skewed distribution. Conversely, when the dataset has
a greater number of observations on the right side of the mean, so that the longer tail stretches
to the left, it is called a left or negatively skewed distribution.
Now consider the mathematics marks of the same 20 students:
45, 58, 51, 54, 49, 59, 88, 44, 49, 41, 48, 61, 66, 93, 40, 64, 69, 77, 72, 51
Follow the same steps to create intervals:
Marks Frequency
40-50 7
50-60 5
60-70 4
70-80 2
80-90 1
90-100 1
The histogram looks like this:
Marks Frequency
40-50 1
50-60 2
60-70 3
70-80 3
80-90 5
90-100 6
histogram arises in case the dataset consists of observations of two very dissimilar kinds of
individuals or objects. Let us understand this with an example. A restaurant experiences high
footfall during lunchtime and dinner time. Hence, if a researcher collects data on the number of
customers entering a restaurant over a day, the histogram will display two peaks. A bimodal
histogram with hypothetical numbers would look something like this:
Figure 10: Bimodal histogram of the number of customers entering a restaurant in a day
Similarly, multimodal histograms have more than two peaks. Such a graph suggests that the
data may have different patterns of response. Suppose a researcher has some data on the heights
of plants belonging to 3 different kinds of species. One of the species has tall plants, the other
has medium plants and the third species has shorter plants. She creates a histogram to find out
the pattern in her data. The following figure illustrates a histogram she created:
This is clearly a multimodal histogram where each peak represents the most common height of
each species of plants.
Although histograms are a useful way to present the data, one major drawback of histograms
is that the reader cannot identify the individual values of the data by simply looking at the
graph.
IN-TEXT QUESTIONS
Q. 11 Choose the correct option from the bracket and fill in the blank:
When a dataset has a greater number of observations on the left side of the mean, then it is
called a _______ (right/left) skewed distribution.
2.7 SUMMARY
In this lesson some of the most popular ways to display data, as a part of descriptive statistics,
were discussed. In a stem and leaf plot, data is divided into two columns – stem and leaf. The
plot gives a visual understanding of the distribution of the observations. Dot plots represent
each unit of observation as a dot. The dots are stacked on top of one another, so the height of
each stack represents the frequency of that value in the dataset. Bar charts use horizontal
or vertical bars to depict the observations in the dataset. Bar charts can further be of two types:
stacked bar charts and grouped bar charts. Histograms are among the most widely used graphs in
statistics. They use adjacent vertical bars to depict the frequency of observations in each class
interval. When the dataset has
a greater number of observations on the left side of mean, it is called a right or positively
skewed distribution whereas when the dataset has a greater number of observations on the right
side of mean, then it is called a left or negatively skewed distribution. A histogram with a single
peak is known as a unimodal histogram. A histogram with two peaks is referred to as a bimodal
histogram. Histograms with more than two peaks are referred to as multimodal histograms.
2.8 GLOSSARY
• Bar charts – A graph that displays data through vertical or horizontal rectangular bars, with
the height (or length) of each bar representing the value of the observation
• Dot plots – A plot illustrating the frequency of observations through dots on a simple scale
• Histograms – A graph that shows the frequency of data using rectangular bars, with the height
of each bar representing the frequency of observations lying in that range
• Negative skewness – When a dataset has a greater number of observations on the right side of
the mean
• Positive skewness – When a dataset has a greater number of observations on the left side of
the mean
• Stem and Leaf plot – A plot where each observation in the data is split into two columns:
stem (the leading digit or digits) and leaf (the last digit).
Q.1 Following are the weights of individuals who have taken membership of two local gyms
in a city:
Gym 1: 94 90 95 93 128 95 125 91 104 116 162 102 90 110 92 113 116 90 97 103 95 120 109
91 138
Gym 2: 123 116 90 158 122 119 125 90 96 94 137 102 105 106 95 125 122 103 96 111 81 113
128 93 92
Create stem and leaf plot for both the gyms and interpret the plots.
Q.2 The following table summarizes the data collected by a researcher on the time taken to
eat breakfast by 40 respondents:
Minutes: 0 1 2 3 4 5 6 7 8 9 10 11 12
People: 6 2 3 5 2 5 0 0 2 3 7 4 1
Q.3 Honey has recently passed class 12th and the following table presents her marksheet:
Business
Subject English Hindi Accountancy Mathematics Economics
studies
Marks 87 93 96 99 97 92
Business
Subject English Hindi Accountancy Mathematics Economics
studies
Marks 82 97 88 71 79 95
In which subject/subjects was the difference in the marks between Honey and Hazel highest?
Q.4 Mr. Kapoor owns a garden with 30 cherry trees. The heights of the trees, in inches, are:
61, 63, 64, 66, 68, 69, 71, 71.5, 72, 72.5, 67.5, 73.5, 74, 74.5, 76, 76.2, 76.5, 77, 77.5,
78, 78.5, 79, 79.2, 80, 81, 82, 83, 84, 85, 87
A. Create a histogram of the above data by creating intervals of 5 inches. Comment on the
shape of the graph.
B. Recently Mrs. Kapoor bought 10 more cherry plants to propagate them in their garden.
Their heights in inches are: 57.5, 66, 40.5, 59, 46, 69.5, 67, 51.5, 52, 62. Incorporate
this additional information and create a new histogram for all 40 cherry trees.
Comment and compare the shapes of both the histograms.
Q.5. Draw a histogram for the following dataset:
Frequency 35 25 45 15 20 40
Number of particles 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Frequency 1 2 3 12 11 15 18 10 12 4 5 3 1 2 1
a. What proportion of the sampled wafers had at least one particle? At least five
particles?
b. What proportion of the sampled wafers had between five and ten particles,
inclusive? Strictly between five and ten particles?
c. Draw a histogram using relative frequency on the vertical axis. How would you
describe the shape of the histogram?
2.11 REFERENCES
• Devore, J. L. (2016). Probability and Statistics for Engineering and the Sciences. Cengage
Learning.
• Larsen, R. J., & Marx, M. L. (2012). An Introduction to Mathematical Statistics and Its
Applications. Prentice Hall.
• McClave, J. T., Benson, P. G., & Sincich, T. (2018). Statistics for Business and Economics.
Pearson Education.
• Gupta, S. C. (2019). Fundamentals of Statistics. Himalaya Publishing House.
LESSON 3
STRUCTURE
Continuing with our discussion about descriptive statistics, we now move on to the core
elements of descriptive statistics, also known as summary statistics. Recall that under
descriptive statistics we quantitatively present an overview of the data. Descriptive statistics
can be divided into two sub-groups:
a. Measures of central tendency – Measures of central tendency comprise certain
measurements that give us a typical or central value around which the data is
generally clustered. These measures are also referred to as averages. In this course,
we will concentrate on the three major measures of central tendency: mean, median,
and mode. We will also briefly study quartiles, percentiles, deciles and trimmed mean.
b. Measures of variability – Measures of variability represent the spread of the data around
the averages, that is, the variation in the data. The most widely used measures of
variability are range, standard deviation, and variance.
Let us now look at the two in detail.
3.3.1 Mean
The most common measure of central tendency is the arithmetic mean or simply, the mean of
the dataset. You must have studied the mean at some point in your mathematics course. The
mean is simply the average value of the dataset, which can be calculated by adding the values
of all the observations and dividing the sum by the total number of observations, i.e.,
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
where $\mu$ denotes the mean of $N$ observations and $\sum_{i=1}^{N} x_i = x_1 + x_2 + x_3 + \dots + x_N$.
Recall from previous lesson, population mean (parameter) is denoted by 𝜇 and sample mean
(statistic) is denoted by 𝑥̅ .
Let us calculate the mean of following 5 numbers to understand the formula better:
Note here that we denote the mean as 𝑥̅. This is called the sample mean, which is calculated
from the sample data. The mean taken from the population data is denoted by µ, as mentioned
earlier.
Mean is a useful measure of central tendency which allows the reader to get an approximate idea
of the typical value in a dataset. The relevance of the mean is clearer when we have very large
datasets. Say a researcher gathers the prices of about 1000 houses in a locality. The prices
range between Rs. 45 lakh and Rs. 1.9 crore. If the researcher calculates the average price of
the houses as Rs. 1.2 crore, then we can interpret that a typical house in that locality is worth
about Rs. 1.2 crore. So, mean is a convenient way to locate the average value of the dataset.
Before we move on to median, we should note some pros and cons of using mean. Arithmetic
mean is the simplest measure of central tendency and is rigidly defined. The mean is not
affected by the order of the data, i.e., the data may be in ascending or in descending order. Each
and every observation in the dataset is used to calculate the mean. This ensures that there is no
loss of information. Finally, arithmetic mean is capable of further mathematical treatment: if
we have separate means of two groups of data, along with the number of observations in each
group, we can easily get the combined mean. However, mean also suffers from some
limitations. First, mean is affected by extreme observations. A few extreme observations can
shift the mean so much that it is no longer an accurate representation of the dataset. Second,
mean cannot be calculated if even one observation is missing. Third, we cannot determine the
mean by merely glancing at the dataset; it needs to be calculated each time using the formula.
Finally, mean cannot be calculated for open-ended class intervals.
Sometimes when we’re collecting data, we may come across extremely large or extremely
small values in our data that may either be incorrectly stated by the respondent or incorrectly
recorded by the researcher. In such cases, we see that mean gets impacted by extreme values
in our dataset. This is one of the major limitations of using mean as a central value.
The following example will make the argument clear: suppose that the salary of 10 employees
in a firm, in lakhs per annum, is given below:
We conclude that the average salary in the firm is Rs 6.1 lakhs per annum. However, if we take
a closer look at the observations, we realize that eight out of the 10 salaries lie between 3 and
4.5 lakhs and the two extreme values influenced the mean. So, we see here that mean is not
representative of the above data.
When we have data that contains extreme values or outliers, in such cases, we prefer to use
median over mean. We’ll see why and how, in the subsequent section.
IN-TEXT QUESTIONS
Q.3 The mean of 6, 8, x + 2, 10, 2x − 1, and 2 is 9. The value of x and the values of the two
observations that depend on x are:
A. x = 11; Observations = 13 and 21 B. x = 9; Observations = 11 and 17
C. x = 10; Observations = 12 and 19 D. x = 12; Observations = 14 and 23
Q.4 The average marks of 39 students of a class are 50. The marks obtained by the 40th
student is 39 more than the average marks of all 40 students. Find the mean marks of
all 40 students.
3.3.2 Median
The word Median is synonymous with ‘middle’. Median is the middle observation in the data
when the data is arranged in ascending or descending order of magnitude. Median is less
affected by extreme values. The population median (parameter) is denoted by 𝜇̃ while sample
median (statistic) is symbolized as 𝑥̃. There are two formulas to calculate median if we have N
observations:
i. When there is an odd number of observations, the median is simply the middle
value, i.e., when N is odd,
$$\tilde{\mu} = \left(\frac{N+1}{2}\right)^{\text{th}} \text{ ordered value}$$
ii. When there is an even number of observations, the median is the average of
the two middle values, i.e., when N is even,
$$\tilde{\mu} = \text{average of the } \left(\frac{N}{2}\right)^{\text{th}} \text{ and } \left(\frac{N}{2}+1\right)^{\text{th}} \text{ ordered values}$$
Let us consider the same example of salaries of 10 employees in a firm, in lakhs per annum:
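$$\begin{aligned}
\tilde{\mu} &= \text{average of the } \left(\frac{10}{2}\right)^{\text{th}} \text{ and } \left(\frac{10}{2}+1\right)^{\text{th}} \text{ ordered values} \\
&= \text{average of the 5th and 6th ordered values} \\
&= \text{average of 3.7 and 4.2} \\
&= \frac{3.7 + 4.2}{2} \\
&= 3.95
\end{aligned}$$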
So, we can conclude that the median salary in the firm is Rs. 3.95 lakh per annum.
Note here that if we had salaries of only 9 employees, then since 9 is odd, the median would
simply be the middle observation, i.e., the $\left(\frac{N+1}{2}\right)^{\text{th}}$ observation. This means that the median
would’ve been the 5th observation, i.e., 3.7. We would then have concluded that the median
salary in the firm is Rs. 3.7 lakh per annum.
It is worth noting here that even if we increase the values of the extreme observations from 15
and 16.5 to say 40 and 50, we will still get the same value of median. As median does not get
affected by extreme values in a dataset, it is said to be representative of the sample. Hence, in
cases when we have extreme observations, median is a better measure of central tendency than
mean.
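The contrast between the two measures can be reproduced with Python's standard `statistics` module. The salary list below is illustrative only: it is not the original table, and its values were chosen merely to agree with the figures quoted in the text (mean of 6.1, 5th and 6th ordered values of 3.7 and 4.2, extremes of 15 and 16.5).

```python
from statistics import mean, median

# Illustrative salaries (lakhs per annum), NOT the original data: chosen only
# to match the summary figures quoted in the text.
salaries = [3.0, 3.2, 3.4, 3.5, 3.7, 4.2, 4.2, 4.3, 15.0, 16.5]

print("mean:", round(mean(salaries), 2))
print("median:", round(median(salaries), 2))

# Replace the two extreme salaries by 40 and 50, as suggested in the text.
inflated = salaries[:-2] + [40.0, 50.0]
print("mean after change:", round(mean(inflated), 2))
print("median after change:", round(median(inflated), 2))
```

Under these assumed values the mean almost doubles when the two extreme salaries are raised, while the median does not move, because the 5th and 6th ordered values are untouched.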
Median as a measure of central tendency is quite useful since it is easy to understand and
calculate and, in some cases, median can be located by simply looking at the data. Median is
often preferred to mean because it is not affected by extreme values in the dataset. Also, it can
be calculated for open-ended distributions. However, median has its limitations. The data must
be arranged in either ascending or descending order, and if we have a very large dataset,
arranging the data may be time-consuming. Median is not based on all the observations and so
may not be representative of the dataset. Finally, it is not capable of further mathematical
treatment.
IN-TEXT QUESTIONS
Q.5 The import of electronic products in million dollars in a country for eight years was
recorded as 27.4, 16.6, 1.7, 14.1, 32.9, 18.7, 3.8, 22.5. The median import of the country
is ____.
Q.6 The following numbers are arranged in ascending order:
15 , 𝑥 , 22 , 𝑥 + 7, 32 , 56 , 88
If the median of the data is 25, then the value of x will be:
A. 17 B. 25
C. 18 D. 19
Q.7 The runs scored in a cricket match by 11 players are as follows:
7, 16, 167, 41, 110, 57, 1, 16, 9, 0, 16
Answer the following questions:
A. The mean runs scored is ____.
B. Mean is a good measure of central tendency for the above data. (True/False). Justify
your answer.
C. The median runs scored is ____.
D. Median is a good measure of central tendency for the above data. (True/False).
Justify your answer.
3.3.3 Mode
In simple terms, mode is that value in the dataset which appears most frequently. Like median,
the value of mode too does not get affected by extreme observations. Mode is easy to
understand and calculate and, in some cases, its value can be located by simply looking at the
data. Mode can also be calculated for open-ended distributions.
For example, following are a person’s daily expenditure, in Rs., on lunch in a week:
Clearly, we can observe that the person spends Rs. 130 four times a week on lunch. Hence, the
mode is 130.
Data may have a single mode, two modes (known as bimodal) or more than two modes
(multimodal).
Mode also suffers from some limitations. First, it is not rigidly defined. Second, mode is not
based on all the observations and so may not be representative of the dataset. Finally, it is not
capable of further mathematical treatment.
3.3.4 Relationship between Mean, Median and Mode
Now that we have studied the three measures of central tendency, you may believe that since
all of them denote the central value in a dataset, they must be the same. In this section we’ll
see that this is not always true.
First take the following 16 observations and try to calculate the mean, median and mode by
yourself:
4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10
$$\begin{aligned}
\text{Mean} = \bar{x} &= \frac{\sum_{i=1}^{n} x_i}{n} \\
&= \frac{4 + 5 + 6 + 6 + 6 + 7 + 7 + 7 + 7 + 7 + 7 + 8 + 8 + 8 + 9 + 10}{16} \\
&= \frac{112}{16} \\
&= 7
\end{aligned}$$
We have mean equal to 7.
Since we have even (16) observations, for median, we use the following formula:
$$\begin{aligned}
\tilde{x} &= \text{average of the } \left(\frac{n}{2}\right)^{\text{th}} \text{ and } \left(\frac{n}{2}+1\right)^{\text{th}} \text{ ordered values} \\
&= \text{average of the } \left(\frac{16}{2}\right)^{\text{th}} \text{ and } \left(\frac{16}{2}+1\right)^{\text{th}} \text{ ordered values} \\
&= \text{average of the 8th and 9th ordered values} \\
&= \frac{7 + 7}{2} \\
&= 7
\end{aligned}$$
Hence, we get median equal to 7.
Finally, we can look at the data and can infer that the mode is 7 since it occurs the maximum
number of times in the dataset.
In this example, it is evident that Mean = Median = Mode. We can look at the central tendencies
using a histogram as well:
Figure 1: Histogram of symmetric data
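The three values obtained by hand can be cross-checked with Python's standard `statistics` module. This is only a verification sketch for the 16 observations used in this section; the `Counter` tally is an extra added here to show where the mode comes from.

```python
from collections import Counter
from statistics import mean, median, mode

# The 16 observations used in this section.
observations = [4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10]

print("mean:", mean(observations))
print("median:", median(observations))
print("mode:", mode(observations))

# Tally of how often each value occurs; the mode is the most frequent value.
tally = Counter(observations)
print(sorted(tally.items()))
```

The tally also shows that the frequencies fall away evenly on both sides of 7, which is why the three measures coincide for this dataset.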
However, when we have unsymmetrical or skewed data, the three measures are not equal. We
will learn about skewness in detail later in the lesson.
The relationship between Mean, Median and Mode, when the data is not symmetrical, can
mathematically be explained by the following formula:
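The relationship usually quoted for this purpose is Karl Pearson's empirical formula. It is an approximation that tends to hold for moderately skewed distributions with a single peak, not an exact identity:

```latex
\text{Mean} - \text{Mode} \approx 3\,(\text{Mean} - \text{Median})
\quad\Longleftrightarrow\quad
\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean}
```

For instance, with a hypothetical mean of 30 and median of 28, the formula gives Mode ≈ 3(28) − 2(30) = 24. When the data is symmetric, the three measures are equal and the formula is satisfied trivially.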
IN-TEXT QUESTIONS
Q.8 A researcher records the age of each participant in her study. The ages are:
21, 59, 62, 21, 66, 28, 66, 48, 79, 59, 28, 62, 63, 63, 48, 66, 59, 66, 48, 79, 19, 79
The mode of the above data is:
A. 66 B. 59
C. 79 D. 63
Q.9 A researcher has computed the mean of her data as 22.5 and median as 20. Calculate
the value of mode using these values. Is the distribution symmetrical?
Q.10 A researcher calculates the following values of the median and mode of a distribution.
Median = 17.5
Mode = 20.5
A. Calculate Mean.
B. Do these values represent a symmetrical distribution?
3.3.5 Other measures of central tendency- Quartiles, percentiles, deciles and trimmed
mean
Mean and median are not the only measures of central tendency. There are several others as
well. We will briefly introduce the concepts of quartiles, percentiles, deciles and trimmed mean.
Just as the median divides the dataset into two equal halves, quartiles divide the dataset into 4
equal parts. There are 3 quartiles, and each of the four parts they create contains 25% of the
observations. The first quartile, Q1, below which the first 25% of observations lie, is known as
the lower quartile. The second quartile, Q2, is the median, which divides the dataset into two
halves. The third quartile, Q3, is known as the upper quartile: 75% of observations lie below it
and 25% of the observations are greater than it.
To calculate quartiles, we first arrange the observations in ascending or descending order. We
can then find the value of each quartile by using the following formulae:
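$$Q_1 = \left(\frac{n+1}{4}\right)^{\text{th}} \text{ observation}$$
$$Q_2 = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation} = \text{Median}$$
$$Q_3 = \left(3 \times \frac{n+1}{4}\right)^{\text{th}} \text{ observation}$$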
Consider the following dataset to understand the quartiles better:
0, 2, 5, 7, 8, 10, 16, 23, 35, 52, 77.
Since there is an odd number of observations, you can easily identify the median in the above
dataset. The median is 10. This is our second quartile, i.e., Q2. Now as per the formula of the
first quartile,
$$\begin{aligned}
Q_1 &= \left(\frac{11+1}{4}\right)^{\text{th}} \text{ observation} \\
&= \left(\frac{12}{4}\right)^{\text{th}} \text{ observation} \\
&= \text{3rd observation} \\
&= 5
\end{aligned}$$
Similarly, for third quartile,
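$$\begin{aligned}
Q_3 &= \left(3 \times \frac{n+1}{4}\right)^{\text{th}} \text{ observation} \\
&= \left(3 \times \frac{11+1}{4}\right)^{\text{th}} \text{ observation} \\
&= \left(3 \times \frac{12}{4}\right)^{\text{th}} \text{ observation} \\
&= \text{9th observation} \\
&= 35
\end{aligned}$$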
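The positional rule can be written as a small Python function. In this sketch the helper name `quartile` and the linear interpolation used when the position is not a whole number are assumptions of the example (the text only works through a case where the positions are whole numbers). The result is compared with `statistics.quantiles` using its "exclusive" method, which is also based on n + 1 positions.

```python
from statistics import quantiles

data = [0, 2, 5, 7, 8, 10, 16, 23, 35, 52, 77]

def quartile(sorted_values, k):
    """k-th quartile as the (k(n+1)/4)-th ordered value (1-based position).

    Assumes at least three observations.
    """
    position = k * (len(sorted_values) + 1) / 4
    lower = int(position)          # whole part of the position
    fraction = position - lower    # non-zero when the position falls between two values
    value = sorted_values[lower - 1]
    if fraction:
        # Assumption: interpolate linearly between the neighbouring values.
        value += fraction * (sorted_values[lower] - value)
    return value

q1, q2, q3 = (quartile(sorted(data), k) for k in (1, 2, 3))
print(q1, q2, q3)
print(quantiles(data, n=4, method="exclusive"))
```

Other textbooks and software packages use slightly different positional rules, so quartiles reported elsewhere may differ a little for the same data.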
For grouped data, a quartile can be calculated using the following formula:
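$$Q_j = L + \frac{\frac{jN}{4} - \text{Pcf}}{f} \times i, \quad \text{for } j = 1, 2, 3.$$
Here, L = lower limit of the quartile class, N = total number of observations, Pcf = preceding
cumulative frequency (the cumulative frequency of the class just before the quartile class),
f = frequency of the quartile class and i = size (width) of the quartile class.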
We can easily use the above formula to obtain each quartile in the following manner:
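$$Q_1 = L + \frac{\frac{N}{4} - \text{Pcf}}{f} \times i$$
$$Q_2 = L + \frac{\frac{N}{2} - \text{Pcf}}{f} \times i$$
and,
$$Q_3 = L + \frac{\frac{3N}{4} - \text{Pcf}}{f} \times i$$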
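To see the grouped-data formula at work, the sketch below applies it to the frequency table of economics marks from the histogram lesson (classes 30-40 to 90-100). The function name `grouped_quantile` and its `parts` argument are introduced for this illustration, and the quartile class is taken to be the first class whose cumulative frequency reaches jN/4, which is an assumption about the convention.

```python
# (lower limit, upper limit, frequency) for the economics marks of 20 students.
classes = [(30, 40, 2), (40, 50, 3), (50, 60, 4), (60, 70, 5),
           (70, 80, 3), (80, 90, 2), (90, 100, 1)]

def grouped_quantile(classes, j, parts):
    """j-th of `parts` quantiles for grouped data: L + (jN/parts - Pcf) / f * i."""
    total = sum(freq for _, _, freq in classes)
    target = j * total / parts
    cumulative = 0  # cumulative frequency of the classes already passed (Pcf)
    for lower, upper, freq in classes:
        # Assumption: the quantile class is the first class whose cumulative
        # frequency reaches the target.
        if freq > 0 and cumulative + freq >= target:
            return lower + (target - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("j must lie between 1 and parts - 1")

for j in (1, 2, 3):
    print(f"Q{j} =", round(grouped_quantile(classes, j, parts=4), 2))
```

Calling the same function with `parts=10` or `parts=100` applies the identical pattern with jN/10 or jN/100 in place of jN/4.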
Percentiles, on the other hand, simply denote that observation below which a particular
percentage of observations fall. Percentiles are numbered from 1 to 99. For instance, the 90th
percentile would indicate that observation in the dataset below which 90% of observations
fall. The percentile of an observation ‘x’ can be calculated by the following formula:
$$\text{Percentile} = \frac{\text{Number of values below } x}{\text{Total number of values}} \times 100$$
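A minimal Python sketch of this formula follows. The function name `percentile_rank` is chosen for the example, and the dataset is the one used above to illustrate quartiles.

```python
def percentile_rank(values, x):
    """Percentage of values that lie strictly below x."""
    below = sum(v < x for v in values)
    return below / len(values) * 100

data = [0, 2, 5, 7, 8, 10, 16, 23, 35, 52, 77]

# 8 of the 11 values lie below 35.
print(round(percentile_rank(data, 35), 1))
```

Since only values strictly below x are counted, the smallest observation in a dataset always has a percentile of 0 under this formula.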
For grouped data, a percentile can be calculated by using the following formula:
$$P_j = L + \frac{\frac{jN}{100} - \text{Pcf}}{f} \times i, \quad \text{for } j = 1, 2, 3, \dots, 99.$$
Similarly, deciles divide a dataset into 10 equal parts, in contrast to percentiles, which divide
a dataset into 100 equal parts. As seen above, we can derive a similar formula for computing
a decile:
$$D_j = L + \frac{\frac{jN}{10} - \text{Pcf}}{f} \times i, \quad \text{for } j = 1, 2, 3, \dots, 9.$$
where the symbols have the usual meaning and interpretation. You must note here that we will
have ninety-nine percentiles ($P_1, P_2, \dots, P_{99}$) and nine deciles ($D_1, D_2, \dots, D_9$). For both
percentiles and deciles, the middle values, $P_{50}$ and $D_5$, represent the median.
As we already know, the mean is sensitive to extreme observations, so we use the trimmed mean
to eliminate the extreme observations from our analysis. Such a measure is less sensitive to
outliers than the regular mean. For instance, to compute a 5% trimmed mean, we will eliminate
the smallest 5% and the largest 5% of the sample observations and then calculate the mean of
the remaining observations in the regular way.
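The procedure can be sketched in a few lines of Python. The salary figures below are hypothetical, and rounding the number of trimmed observations down when it is not a whole number is an assumption of this sketch, since the text does not say how to handle that case.

```python
from statistics import mean

def trimmed_mean(values, proportion):
    """Mean after dropping the smallest and largest `proportion` of the values."""
    ordered = sorted(values)
    # Assumption: when proportion * n is not a whole number, round it down.
    cut = int(len(ordered) * proportion)
    kept = ordered[cut:len(ordered) - cut]
    return mean(kept)

# Hypothetical salaries (lakhs per annum) with two extreme values.
salaries = [3.0, 3.2, 3.4, 3.5, 3.7, 4.2, 4.2, 4.3, 15.0, 16.5]

print("regular mean:", round(mean(salaries), 2))
print("20% trimmed mean:", round(trimmed_mean(salaries, 0.20), 2))
```

With ten observations, a 20% trimmed mean drops the two smallest and the two largest values, so both extreme salaries are removed before averaging.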
IN-TEXT QUESTIONS
Q.11 For the following data set, the value of upper quartile is _______.
18, 30, 32, 39, 54, 57, 61, 62, 81, 88, 90
Q.12 2nd Quartile = 5th Decile = 50th Percentile =
A. Mode B. Median
C. Mean D. Trimmed mean
Reporting the central values of a dataset provides only partial information about the data.
It is possible for two datasets to have similar measures of central tendency and yet differ in
the spread of their values. The logic will become clear through the following example.
Consider two restaurants selling pizzas that are located in the same town. A researcher collected
data on the delivery time of both the restaurants, in minutes, for a week, as given below:
For restaurant A,
$$\begin{aligned}
\bar{x}_A &= \frac{42 + 50 + 47 + 43 + 52 + 55 + 40}{7} \\
&= \frac{329}{7} \\
&= 47
\end{aligned}$$
For restaurant B,
$$\begin{aligned}
\bar{x}_B &= \frac{47 + 32 + 70 + 55 + 65 + 35 + 25}{7} \\
&= \frac{329}{7} \\
&= 47
\end{aligned}$$
It is your task to check if the value of median for both the restaurants is also equal to 47 or not.
Moving on, we can conclude that the average delivery time of both the restaurants is 47
minutes. You may also try to create a dot plot of the two datasets. It will look something like
this:
Now, since both the datasets have the same central value, can we claim that both the datasets
convey the same information? No.
If you observe the values in each dataset carefully, you will notice that the delivery time of
restaurant A ranges between 40 and 55 minutes, whereas the delivery time of restaurant B
ranges between 25 and 70 minutes. Even in the dot plots, you may observe that the data points
of restaurant A are clustered together, whereas the data points of restaurant B are spread out.
What can you infer from this extra piece of information? Restaurant A is more consistent: it
delivers pizzas within a narrow window of 40 to 55 minutes. The time taken by restaurant B
is subject to more variation, that is, its delivery window is much wider. Knowledge about such
variation is important when we have to make decisions. In this example, say you are starving,
but you have just entered an Economics class that will finish in exactly 45 minutes. If you
have to decide now whether to order a pizza, which restaurant would you prefer? You should
prefer restaurant A, since restaurant B might deliver the pizza as much as 20 minutes before
the class ends, and you can be sure that the professor will not finish the class, nor will you be
able to leave it, that early.
$$\text{Range} = H - L$$
where H is the highest value and L is the lowest value of a dataset. In the above example, the
range for restaurant A = 55 − 40 = 15, whereas the range for restaurant B = 70 − 25 = 45.
This clearly indicates that there is more variability in the data points of restaurant B as
compared to restaurant A.
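The comparison above can be checked with a short Python sketch. The two lists are the delivery times read off the sums in the worked example; the helper name `value_range` is only an illustrative choice, not something defined in the lesson.

```python
from statistics import mean, median

# Delivery times in minutes, taken from the worked example above.
restaurant_a = [42, 50, 47, 43, 52, 55, 40]
restaurant_b = [47, 32, 70, 55, 65, 35, 25]


def value_range(data):
    """Range = highest value (H) minus lowest value (L)."""
    return max(data) - min(data)


for name, times in [("A", restaurant_a), ("B", restaurant_b)]:
    # Same mean and median for both restaurants, but very different ranges.
    print(name, mean(times), median(times), value_range(times))
```

Both restaurants share the same mean and median, so only the range separates them here.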
The concept of range is extensively used in statistical quality control. Range is helpful in
studying the variation in the prices of shares and debentures and other commodities that are
very sensitive to price changes from one period to another. For the meteorological department
too, range is a good indicator for weather forecast.
The relative measure corresponding to range, called the coefficient of range, is obtained by the
formula:
$$\text{Coefficient of Range} = \frac{H - L}{H + L}$$
Although range is easy to understand and compute, the usage of this measure is limited since
it takes into consideration only the extreme data points and other observations in the data are
simply ignored. So, range can be affected by extreme values. Moreover, as the size of the
dataset increases, range loses its relevance as a measure of variability.
A slightly better measure of variability than range is the inter-quartile range. Here, instead of
taking the difference between the extreme observations in the dataset, we take the difference
between the upper and lower quartile. It is calculated as Q3 – Q1. This is a better measure than
range since it takes into account only the middle 50% of the observations and the extreme
observations do not affect the measure.
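As a rough illustration, the coefficient of range and the inter-quartile range for the two restaurants can be computed as below. The quartile convention used here (Q1 and Q3 as the medians of the lower and upper halves, leaving out the middle value when n is odd) is an assumption; other conventions give slightly different quartiles.

```python
from statistics import median

restaurant_a = [42, 50, 47, 43, 52, 55, 40]
restaurant_b = [47, 32, 70, 55, 65, 35, 25]


def coefficient_of_range(data):
    """(H - L) / (H + L), the relative measure corresponding to range."""
    high, low = max(data), min(data)
    return (high - low) / (high + low)


def interquartile_range(data):
    """Q3 - Q1, with Q1 and Q3 taken as medians of the lower and upper halves.

    Assumed convention: the middle value is excluded when n is odd.
    Needs at least two observations.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[-half:]) - median(ordered[:half])


for name, times in [("A", restaurant_a), ("B", restaurant_b)]:
    print(name, round(coefficient_of_range(times), 3), interquartile_range(times))
```

Under this convention both measures are larger for restaurant B, in line with the conclusion drawn from the range.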
3.4.2 Standard deviation and Variance
Standard deviation and Variance are considered significantly better measures of variability as
they are based on all the observations in the dataset and hence are more sensitive than range
and inter quartile range. Since standard deviation is just the square root of variance, we will
begin our discussion with variance.
As the name suggests, variance illustrates the variation in a dataset, that is, how far each data
point is from the average. Formally, variance is calculated by dividing the sum of the squared
deviations from the mean by (n – 1), where n denotes the sample size. We symbolize sample
variance by $s^2$ and the formula can be written as:
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$
Here, the term $\sum_{i=1}^{n}(x_i - \bar{x})^2$ represents the sum of squared deviations of each
data point from the sample mean $\bar{x}$. To make this simpler, we break the calculation into
three steps:
Step 1: Calculate the difference between the mean and each observation in the dataset
$$\sum_{i}(x_i - \bar{x}) = -5 + 3 + 0 - 4 + 5 + 8 - 7 = 0$$
Since the positive and negative deviations from the mean cancel each other out, we consider
the sum of squared deviations from the mean. In this way we can eliminate all the negative
values. So, in our example,
$(x_1 - \bar{x})^2 = (-5)^2 = 25$
$(x_2 - \bar{x})^2 = 3^2 = 9$
$(x_3 - \bar{x})^2 = 0^2 = 0$
$(x_4 - \bar{x})^2 = (-4)^2 = 16$
$(x_5 - \bar{x})^2 = 5^2 = 25$
$(x_6 - \bar{x})^2 = 8^2 = 64$
$(x_7 - \bar{x})^2 = (-7)^2 = 49$
$$\sum_{i}(x_i - \bar{x})^2 = 25 + 9 + 0 + 16 + 25 + 64 + 49 = 188$$
$$s = \sqrt{s^2}$$
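The steps for restaurant A can be traced in a few lines of Python. This is a minimal sketch that follows the formula with divisor (n − 1); the call to `statistics.variance` at the end is only a cross-check of the hand calculation.

```python
from math import sqrt
from statistics import mean, variance

restaurant_a = [42, 50, 47, 43, 52, 55, 40]

x_bar = mean(restaurant_a)                       # sample mean
deviations = [x - x_bar for x in restaurant_a]   # deviation of each value from the mean
sum_sq = sum(d ** 2 for d in deviations)         # sum of squared deviations
s2 = sum_sq / (len(restaurant_a) - 1)            # sample variance, divisor n - 1
s = sqrt(s2)                                     # sample standard deviation

print(sum(deviations), sum_sq, round(s2, 2), round(s, 2))
print(variance(restaurant_a))                    # library value for comparison
```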
understanding of the population parameters remains the same, just that we now say that the
population variance denotes the variability in the population and population standard deviation
denotes the typical deviation of a population value from its population mean µ.
The formal representation of the population variance and standard deviation also gets modified.
In terms of the population parameters, the population variance can be written as:
$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$$
and the population standard deviation as:
$$\sigma = \sqrt{\sigma^2}$$
Just as we use $\bar{x}$ to make inferences about µ, we use $s^2$ to make inferences about
$\sigma^2$.
Note that the population variance divides the sum of squared deviations from the mean by N,
whereas the sample variance $s^2$ divides by n − 1, because $s^2$ is based on n − 1 degrees of
freedom. Degrees of freedom refers to the maximum number of independent values in a sample that
have the freedom to vary. In other words, if we fix $\bar{x}$, then we need to determine only
(n − 1) of the elements in the sample in order to know the nth element of the sample.
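The idea of degrees of freedom can be made concrete with a small sketch. Using the restaurant A data purely as an illustration, fixing the mean and knowing any n − 1 values determines the last value; the two library calls then contrast the divisors n − 1 and N.

```python
from statistics import pvariance, variance

sample = [42, 50, 47, 43, 52, 55, 40]
n = len(sample)
x_bar = sum(sample) / n

# With the mean fixed, the first n - 1 values pin down the nth one,
# because all n values must add up to n * x_bar.
known = sample[:-1]
last_value = n * x_bar - sum(known)
print(last_value)

print(variance(sample))    # sample variance: divisor n - 1
print(pvariance(sample))   # population variance: divisor N
```

Treating the same seven values as a whole population gives a smaller variance, since the same sum of squares is divided by 7 rather than 6.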
Coefficient of Variation (C.V.)
A very popular and frequently used relative measure of variation is the coefficient of variation,
denoted by C.V. This is simply the ratio of the standard deviation to the arithmetic mean,
expressed as a percentage.
$$\text{Coefficient of Variation} = C.V. = \frac{\sigma}{\bar{x}} \times 100$$
The lower the C.V. of a dataset, the less variable, or more consistent, it is.
Consider the following data on the mean daily sales and standard deviation of four regions:
Region    Mean daily sales    Standard deviation
1         82                  10.41
2         44                  5.85
3         70                  9.52
4         60                  11.22
To determine which region is most consistent in terms of daily sales, we can calculate the
coefficient of variation:
$$CV_1 = \frac{10.41}{82} \times 100 = 12.69$$
$$CV_2 = \frac{5.85}{44} \times 100 = 13.29$$
$$CV_3 = \frac{9.52}{70} \times 100 = 13.60$$
$$CV_4 = \frac{11.22}{60} \times 100 = 18.70$$
Since the coefficient of variation is lowest for Region 1 (12.69), Region 1 is the most
consistent region for sales.
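The same comparison can be sketched in Python. The means and standard deviations are the ones used in the calculations above; the dictionary layout and function name are illustrative choices.

```python
# region -> (mean daily sales, standard deviation), as used in the example above
regions = {1: (82, 10.41), 2: (44, 5.85), 3: (70, 9.52), 4: (60, 11.22)}


def coefficient_of_variation(mean_value, std_dev):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean_value * 100


cv = {region: coefficient_of_variation(m, s) for region, (m, s) in regions.items()}
most_consistent = min(cv, key=cv.get)   # lowest C.V. = most consistent region

for region, value in cv.items():
    print(region, round(value, 2))
print("Most consistent region:", most_consistent)
```

Rounded values can differ in the second decimal place from figures that were truncated by hand, but the ranking of the regions is unaffected.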
IN-TEXT QUESTIONS
Q.13 If the standard deviation of a dataset is 0.012, the variance will be:
A) 0.144 B) 0.00144
C) 0.000144 D) 0.0000144
Q.14 The variance of the first 10 whole numbers is _______.
Q.15 In a class of 100 students, the mean marks on a particular exam was 75, and the standard
deviation was 0. This implies that:
A) All students scored 75 marks B) Variance is 0.75
C) Standard deviation cannot be zero D) None of the above
Q.16 If the mean of certain observations is given as 60 and the standard deviation is 12, then
the coefficient of variation is 20%. (True/False)
Now that we are clear about the calculation of the measures of central tendency and variation,
we will examine the effects of a change in origin and scale on both the mean and the standard
deviation/variance. These effects are worth learning because a researcher may report the values
in a dataset incorrectly, and recalculating the mean and standard deviation from scratch can
become a lengthy task. To avoid such inefficiency, we will learn how the mean and variance
respond when every value in the dataset is changed.
3.5.1 Change in origin
Change in origin can also be understood in simpler terms as shifting of data. This suggests a
situation in which we add or subtract a constant value from all the observations in our dataset.
We will now see how this impacts the value of mean and standard deviation. Let us understand
this concept through a simple example.
Suppose we have the following 5 observations in our sample:
3, 9, 12, 18, 23
For convenience, we will call these observations the ‘original dataset.’ At this point, you can
easily compute the mean and standard deviation of the original dataset.
$$\text{Mean} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{3 + 9 + 12 + 18 + 23}{5} = \frac{65}{5} = 13$$
$$\text{Variance} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \frac{100 + 16 + 1 + 25 + 100}{4} = \frac{242}{4} = 60.5$$
$$\text{Standard deviation} = \sqrt{60.5} \approx 7.78$$
$$\text{Mean}: \bar{y} = \bar{x} + a$$
$$\text{Standard deviation}: s_y = s_x$$
$$\text{Variance} = \frac{\sum_{i=1}^{n}(y_i - \bar{y})^2}{n - 1} = \frac{900 + 144 + 9 + 225 + 900}{4} = \frac{2178}{4} = 544.5$$
$$\text{Mean}: \bar{y} = b\bar{x}$$
In conclusion, we can say that the mean is affected by both a change in origin and a change in
scale, whereas the standard deviation is affected only by a change in scale.
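These two effects can be verified numerically on the original dataset. In the sketch below the shift `a = 5` is a hypothetical constant chosen only for illustration, while the scale factor `b = 3` matches the variance of 544.5 computed above.

```python
from statistics import mean, stdev, variance

original = [3, 9, 12, 18, 23]
a = 5   # hypothetical constant added to every observation (change in origin)
b = 3   # constant multiplying every observation (change in scale)

shifted = [x + a for x in original]
scaled = [b * x for x in original]

# Change in origin: the mean moves by a, the standard deviation does not move.
print(mean(original), mean(shifted), stdev(original), stdev(shifted))

# Change in scale: mean and standard deviation are both multiplied by b,
# so the variance is multiplied by b squared.
print(mean(scaled), stdev(scaled), variance(scaled))
```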
IN-TEXT QUESTIONS
Q.17 Suppose the standard deviation of a dataset is 6. If each observation is divided by 3 then
the standard deviation of the new dataset will be:
A) 3 B) 2
C) 18 D) 9
Q.18 A researcher measures the weight of 10 students. The mean weight she calculates is
57kg. Later she realized that the weighing scale was misreporting each weight and she
had to add 3 kgs to the weight of each student. The new mean weight of students will
be _____.
Q.19 If the standard deviation of 11, 21, 31…,71, 81, 91 is ‘K’, then the standard deviation
of 15, 25, 35…,75, 85, 95 will be:
A) K - 4 B) K +4
C) K D) 4K
3.6 SKEWNESS
The measures of central tendencies and variation discussed above do not reveal all the
characteristics of a given set of data. For example, two distributions may have the same mean,
variance and standard deviation but may differ widely in terms of their shape and peakedness.
The given data is either symmetrical or it is not. It may be flat, normal or peaked.
If the distribution of data is not symmetrical, it is called asymmetrical or skewed. Thus,
skewness refers to the lack of symmetry in distribution.
A simple method of detecting the direction of skewness is to look at the tails of the distribution.
The rules are:
1. Data are symmetrical when there are no extreme values in a particular direction, so that
low and high values balance each other. In this case, mean = median = mode, or
$\bar{x}$ = Q2 = mode.
2. If the longer tail is towards the lower value or left-hand side, the skewness is negative.
Negative skewness arises when the mean is decreased by some very low values. Then
we have, mean < median < mode.
3. If the longer tail of the distribution is towards the higher values or right-hand side, then
skewness is positive. Positive skewness occurs when mean is increased by some very
high valued observations. In this case, mean > median > mode.
$$SK = \frac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$$
or, in symbols,
$$SK = \frac{\bar{x} - \text{Mode}}{\sigma}$$
If the mode is not given, we can use the approximate relationship studied earlier in the lesson,
i.e., Mode = 3 Median – 2 Mean. Hence, we can write the coefficient of skewness as:
$$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$
Now, if,
• SK = 0, it is a symmetrical distribution.
• SK > 0, the distribution is positively skewed.
• SK < 0, then it is a negatively skewed distribution.
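A minimal sketch of this coefficient is given below. The function names and the numbers in the example (mean 50, median 48, mode 44, standard deviation 10) are hypothetical and chosen so that Mode = 3 Median − 2 Mean holds exactly, which makes both forms of the formula agree.

```python
def pearson_skewness(mean_value, std_dev, mode=None, median=None):
    """Coefficient of skewness from the mode, or from the median if no mode is given."""
    if mode is not None:
        return (mean_value - mode) / std_dev
    if median is not None:
        return 3 * (mean_value - median) / std_dev
    raise ValueError("supply either the mode or the median")


def describe(sk):
    """Translate the sign of SK into the direction of skewness."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical"


# Hypothetical summary values, for illustration only.
sk_from_mode = pearson_skewness(50, 10, mode=44)
sk_from_median = pearson_skewness(50, 10, median=48)
print(sk_from_mode, sk_from_median, describe(sk_from_mode))
```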
For open-ended distributions, or data with extreme values, where positional measures such as
the median and quartiles are used, Bowley’s coefficient of skewness is applied:
$$SK = \frac{Q_1 + Q_3 - 2Q_2}{Q_3 - Q_1}$$
• SK = 0, it is a symmetrical distribution.
• SK > 0, the distribution is positively skewed.
• SK < 0, then it is a negatively skewed distribution.
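Bowley’s coefficient needs only the three quartiles, as the sketch below shows. The quartile values in the example calls are hypothetical and are meant only to show how the sign changes with the position of the median between Q1 and Q3.

```python
def bowley_skewness(q1, q2, q3):
    """Bowley's coefficient of skewness: (Q1 + Q3 - 2*Q2) / (Q3 - Q1)."""
    return (q1 + q3 - 2 * q2) / (q3 - q1)


# Hypothetical quartiles, for illustration only.
print(bowley_skewness(20, 30, 50))   # median nearer Q1: positive value
print(bowley_skewness(20, 35, 50))   # median midway between Q1 and Q3: zero
print(bowley_skewness(20, 40, 50))   # median nearer Q3: negative value
```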
IN-TEXT QUESTIONS
Q.20 For the frequency distribution of a variable x, mean = 32, median = 30 and mode = 26.
The distribution is:
A. Positively Skewed B. Negatively skewed
C. Symmetric D. None of the above
Q.21 A researcher gathers data on the number of years of experience professors in a
university have. The mean, median, mode and standard deviation are 25, 24, 26 and 5,
respectively. Karl Pearson’s coefficient of skewness is _______ (0.20 / - 0.20).
3.7 BOXPLOTS
We will conclude this lesson by discussing boxplots. Boxplots are yet another type of graphical
representation that is extremely informative. Stem and leaf plots, bar graphs and histograms
each depict a particular aspect of the data, while measures of central tendency and variability
focus on separate features of the data. Is there a way to visualize the data and also trace the
central value and variability at the same time? There is, and the answer is boxplots. A boxplot
is a comprehensive graphical representation of data that illustrates not only the central value
and the variability in the data, but also the extreme values (outliers) and the shape of the
distribution. The following figure displays a box plot and its features:
Figure 7: Boxplot
The left side of the box represents the first quartile of the dataset, whereas the right side
denotes the third quartile. The difference between the two, i.e., the interquartile range or
fourth spread (fs), is the length of the box. You can see the median, or the second quartile,
inside the box, dividing it into two parts (two equal halves when the data are symmetric). The
whiskers (the two lines extending out of the box on the right and left sides) extend to the
smallest and the largest values in the dataset that are not outliers. The small dots beyond the
ends of the whiskers are termed outliers.
Let us create a boxplot together for better understanding. Suppose we have the following
dataset containing 10 numbers:
34, 29, 25, 35, 28, 37, 30, 35, 29, 38
To create a boxplot, we first need the 5-number summary of the data, i.e., the smallest number,
Q1, Median (Q2), Q3 and the largest number. To get these, it is advisable to arrange the data in
ascending or descending order. So, we get:
25, 28, 29, 29, 30, 34, 35, 35, 37, 38
You should now try and identify the 5-number summary from the above data.
We will get:
Smallest number = 25
Q1 = 29
Median (Q2) = 32
Q3 = 35
Largest number = 38
Interquartile range = 35 – 29 = 6
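The 5-number summary above can be reproduced with a short function. The sketch assumes that Q1 and Q3 are taken as the medians of the lower and upper halves of the sorted data, which is the convention that yields the values shown here; library routines that interpolate between observations may return slightly different quartiles.

```python
from statistics import median


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; the middle value is left out of both halves when n is odd.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    return (
        ordered[0],
        median(ordered[:half]),
        median(ordered),
        median(ordered[-half:]),
        ordered[-1],
    )


data = [34, 29, 25, 35, 28, 37, 30, 35, 29, 38]
smallest, q1, q2, q3, largest = five_number_summary(data)
print(smallest, q1, q2, q3, largest, "IQR =", q3 - q1)
```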
Now start by drawing an X-axis with appropriate labels and then draw a box around the first
and third quartile. Mark the median value in the middle. The length of the box must equal the
interquartile range. Next, draw two whiskers from both the ends of the box extending to the
smallest value on the left and largest value on the right. You should get a graph that looks like:
Figure 8: Boxplot
The boxplots can also be created in a vertical manner.
In a larger dataset, we can differentiate between the highest and lowest values of the dataset
and the extreme values, which are known as outliers. We use the following formulas to calculate
the minimum and maximum limits for a dataset; any value lower than the minimum or greater than
the maximum is termed an outlier:
Minimum value: Q1 – 1.5 IQR
Maximum value: Q3 + 1.5 IQR
where IQR is simply the interquartile range. So, any value below the minimum and any value
above the maximum are outliers, and we denote such values in the box plot as dots. Formally,
any observation farther than 1.5fs from the closest quartile is termed as an outlier. An outlier is
extreme if it is more than 3fs from the nearest quartile, and it is mild otherwise. Having an idea
about outliers is important since as we have seen in earlier sections extreme values can affect
our measures of central tendencies and variability. There is a possibility that the extreme value
is a result of an error in any step of research. By identifying such extreme values, we can be
cautious with our study ahead.
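The outlier rule can be written as a small classifier. The sketch below uses Q1 = 29 and Q3 = 35 from the earlier example, so the limits are 20 and 44 for mild outliers and 11 and 53 for extreme ones; the test values 18, 25, 50 and 60 are hypothetical.

```python
def classify(value, q1, q3):
    """Label a value using the 1.5 fs and 3 fs rules (fs = Q3 - Q1)."""
    fs = q3 - q1
    if value < q1 - 3 * fs or value > q3 + 3 * fs:
        return "extreme outlier"
    if value < q1 - 1.5 * fs or value > q3 + 1.5 * fs:
        return "mild outlier"
    return "not an outlier"


q1, q3 = 29, 35   # quartiles from the 10-number example above
for value in [18, 25, 50, 60]:   # hypothetical observations
    print(value, classify(value, q1, q3))
```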
Boxplots are also useful in indicating the shape of the distribution of the data, i.e., whether we
have symmetrical or skewed data. When the median sits exactly in the center of the box and
has equal length of the whiskers on both the sides, we can say that the distribution is
symmetrical. However, when the median lies somewhere on the right end of the box with the
right whisker smaller than the left one, then the distribution is said to be left skewed and vice-
versa. The following figure represents the relationship between the shape of distribution and
boxplots:
Figure 10: Comparative boxplot of duration of Indian classical songs and rap songs
Try to interpret both the box plots yourself. What differences can you observe between the two
plots? What do these differences signify?
Let us begin by interpreting the median. We can clearly see that the median length of classical
songs is higher than that of rap songs: the median length of classical songs is 4.8 minutes,
whereas the median length of rap songs is 4 minutes. Next, interpret the length of the box,
i.e., the interquartile range. 50% of the data lies within this range. So, we can say that half
of the classical songs are 4.40 to 5 minutes long, whereas 50% of the rap songs are 3.60 to
4.40 minutes long. We can also interpret the range of the data, i.e., the difference between
the maximum and minimum values. For Indian classical songs, the range is 5.20 – 3.80 = 1.4
minutes, whereas for rap songs, the range is 4.80 – 3.20 = 1.6 minutes. So, we can say that the
length of rap songs is more variable than that of classical songs. Finally, we can observe the
shape of the distributions by studying the location of the median value inside the box. The
median length of Indian classical songs is towards the right end of the box, indicating that
the distribution is left skewed. This means that most of the observations lie towards the right
of the mean. In other words, we can say that most of the Indian classical songs in the data
have a longer duration. In contrast, the median length of rap songs lies right in the middle of
the box. This means that the distribution is quite symmetric.
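The reading of the two boxplots can be summarised numerically. The five values for each genre below are the ones quoted in the interpretation above; the function name and the "position of the median inside the box" measure are illustrative devices, not part of the lesson.

```python
# (minimum, Q1, median, Q3, maximum) in minutes, as quoted in the interpretation above
summaries = {
    "Indian classical": (3.80, 4.40, 4.80, 5.00, 5.20),
    "Rap": (3.20, 3.60, 4.00, 4.40, 4.80),
}


def describe_boxplot(low, q1, med, q3, high):
    """Return the range, the IQR and where the median sits inside the box.

    A position of 0.5 means the median is in the middle of the box; a value
    above 0.5 means it lies towards the right end (suggesting left skew).
    """
    iqr = q3 - q1
    return {"range": high - low, "iqr": iqr, "median_position": (med - q1) / iqr}


for genre, summary in summaries.items():
    result = describe_boxplot(*summary)
    print(genre, {key: round(value, 2) for key, value in result.items()})
```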
IN-TEXT QUESTIONS
Q.22 The following boxplots illustrate the data about the ages of actors and actresses who
have won the National film award since 1967.
Mark all the statements that support the data shown by the boxplots:
A. The first quartile age of Best Actor winner is less than the last quartile age of
Best Actress winner
B. The minimum age of Best Actor winner is equal to the minimum age of Best
Actress winner
C. The range of age of Best Actor winner is higher than the range of age of Best
Actress winner
D. Both the distributions are left skewed
Q.23 Inspired by his statistics class, a student started maintaining a record of the number of
minutes he was late to enter the classroom every day. He recorded the time for 15 days
and the following list displays the data (in minutes):
19, 12, 9, 7, 17, 10, 6, 18, 9, 14, 19, 8, 5, 17, 9
Which of the given boxplots accurately depicts the data?
A. B.
C. D.
3.8 SUMMARY
This lesson focused on the quantitative features of data in terms of measures of central tendency
and variability and their applications. Measures of central tendency refer to a typical or central
value of the data around which the data is generally clustered. Arithmetic mean is the average
value of the dataset which can be calculated by adding the value of all the observations and
dividing the sum by the total number of observations. Since the mean is affected by extreme
values, the median is preferred when such values are present. The median is the middle
observation in the data when the data is arranged in
ascending or descending order of magnitude.
When the distribution is symmetric, the three measures of central tendencies converge
at the same point. Quartiles divide the data set into 4 equal parts. Percentiles denote the
observation below which a particular percentage of observations fall. Deciles divide a dataset
into 10 equal parts. Since arithmetic mean is affected by extreme values, trimmed mean is used
to eliminate the extreme observations from analysis. Under measures of variability, range is
the simplest measure. It is the difference between the largest and the smallest value in the
dataset. Inter-quartile range is calculated by taking the difference between the upper and lower
quartile. Variance is calculated by dividing the sum of the squared deviations from the mean
by (n – 1). Standard deviation is simply the square root of variance. Degree of freedom refers
to the maximum number of independent values, that have the freedom to vary, in a sample.
Coefficient of variation is a relative measure of variation. The mean is affected by both a
change in origin and a change in scale, whereas the standard deviation is affected only by a
change in scale. Skewness refers to the lack of symmetry in a distribution. In case of a right or
positively skewed distribution, the value of mean is the largest, followed by the median and
mode. The opposite is true in the case of a left or negatively skewed distribution, i.e., the value
of mode is the largest and mean has the smallest value. Boxplots are capable of illustrating the
central values, variability in the data, the extreme values (outliers) as well as the shape of the
distribution. The boxplots are extremely useful to compare two datasets and identify outliers.
3.9 GLOSSARY
• Boxplot: Graphical representation of measure of central tendency, variability and skewness
of numerical data using quartiles
• Change in origin: Addition or subtraction of a constant value from all observations in
dataset
• Change in scale: Multiplying or dividing all the observations in the dataset with a constant
term
• Coefficient of variation: Relative measure of variation
• Deciles: Divide dataset into 10 equal parts
• Degree of freedom: Maximum number of independent values, that have the freedom to
vary
• Inter-quartile range: Difference between the upper and lower quartile
• Mean: Average value of the dataset
• Median: Middle observation in the data when the data is arranged in ascending or
descending order
• Measures of central tendency: Certain measurements that give a typical or central value
from the data
• Measures of variability: Represent spread of data around averages
• Mode: Most frequently occurring value in the dataset
• Percentiles: That observation below which a particular percentage of observations fall
• Quartile: Divide the data set into 4 equal parts
• Range: Difference between largest and smallest value in the dataset
• Skewness: Lack of symmetry in distribution
• Standard deviation: square root of variance
• Trimmed mean: Computing mean after eliminating the extreme observations
• Variance: Illustrates the variation in a dataset
If the average monthly spending by 21 women in a kitty group was Rs. 5240, what is
the new average spending if another member is added whose average monthly spending
is 5540? Use the formula above to answer.
Q.7 The sum of deviations of a certain number of observations from 12 is 166 while the
sum of deviations of these observations from 16 is (-54). Find the number of
observations and their mean.
Q.8 Consider the following data that depicts the daily income of two ice cream sellers in
two different regions:
Q.9 If the coefficient of skewness of a distribution is 0.32, the standard deviation is 6.5 and
the mean is 29.6 then find the mode of the distribution.
Q.10 There are 40 students in a class preparing for a statistics test. There are two strategies
that these students can adopt to ace the test. After the test, the professor interviewed the
students and noted down their strategies. The professor observed that 20 students
followed strategy A and the other 20 followed strategy B. Given below are the marks
of the students based on the strategies they adopted:
Strategy A: 78, 78, 79, 80, 80, 82, 82, 83, 83, 86, 86, 86, 86, 87, 87, 87, 88, 88, 88, 91
Strategy B: 66, 66, 66, 67, 68, 70, 72, 75, 75, 78, 82, 83, 86, 88, 89, 90, 93, 94, 95, 98
Write down the 5-number summary for both the strategies. Create two boxplots for each
strategy using the 5-number summary and comment on the shapes of boxplots. According to
you, which strategy is more likely to fetch you good marks in the test and why?
LESSON 4
STRUCTURE
4.2 INTRODUCTION
This unit introduces the concept of ‘probability’ to the students. The phenomenon of
probability indicates the presence of randomness and the existence of some element of
uncertainty. Whenever we face a situation in which there is more than one possible outcome
that can occur, the concept of probability renders a technique for quantifying the chances or
likelihood associated with every possible outcome. There are several instances that involve
chances and thus the notion of probability is applicable. For example, in political elections,
based on exit polls it is plausible to predict that a certain political party could come into power.
Using data from previous days and considering various parameters such as temperature, humidity,
pressure, etc., meteorologists use specific tools or techniques to forecast the weather and
determine, for example, that there are 60 out of 100 chances that it will rain today.
Another example from day-to-day life is the statement, ‘since it is supposed to rain tomorrow,
it is very likely I will use my raincoat when I go to work.’ Similarly, when flipping a fair
coin, the probability of getting either a head or a tail is 0.5, and when rolling a die, there
is a one in six chance that the required number will come up. Thus, the concept of probability
can be applied to several interesting events.
Probability is a mathematical term, and the study of probability as a branch of mathematics is
over 300 years old. This chapter enables students to understand and estimate the likelihood of
various possible events and outcomes. Various elementary concepts used in comprehending
probability will be discussed and explained, such as sample, population, random experiments,
Venn diagrams, sample points, events, types of events, etc.
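Figure: Relationship between population and sample. Probability (deductive reasoning) runs from
the population to the sample; inferential statistics (inductive reasoning) runs from the sample
to the population.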
reasoning. However, when the sample is used to infer the characteristics of the population,
inferential statistics is deployed. This technique is referred to as ‘inductive reasoning’.
Thus, the role of probability is explicit and well-defined: it plays a critical role in
reasoning from the population to the sample drawn from it, and it is crucial in the deductive
method of statistical inference or research.
Having understood the difference between the sample and the population, the relationship
between them, and the role they play in statistical inference, it is crucial to comprehend the
kinds of experiments or data collection involved.
4.3.1 Statistical or Random Experiments
Any activity or process whose outcome is subject to uncertainty is considered an experiment.
Experiments generally suggest carefully controlled testing of a situation or planned testing in
the laboratory. However, in the discipline of statistics, experiments refer to a wider scope of
trials, such as tossing a coin once or several times, selecting a card from the deck, obtaining
a particular blood type from a group of individuals, etc.
Any process of observation or measurement that has more than one possible outcome and for
which there is uncertainty about which outcome will actually materialize is referred to as a
‘random experiment’. For example, tossing a coin, throwing a pair of dice, drawing a card from
the deck of cards.
4.3.2 Sample Point, Event
Each member or outcome of a sample space or population is called a sample point; an event is a
set of one or more sample points. A sample point is also called an element of the sample space.
Let us consider the example of the toss of a coin, for which the sample space is S = {H, T}.
The number of elements in the sample space or population is n(S) = 2. Each element of the
sample space, that is, H and T, is known as a sample point. In general, n(S) denotes the number
of sample points in the sample space.
Consider an event B defined as ‘Tail appears’: B = {T}. The number of elements in event B is 1,
denoted by n(B) = 1.
In a random experiment of the toss of a coin, suppose the event A denotes the event that Head
appears: A = {H}. The number of elements in event A is 1, denoted by n(A) = 1.
Together, A and B make up the sample space: A ∪ B = {H} ∪ {T} = {H, T} = S
Let us consider another example of tossing two fair coins. The sample space or population for
this experiment is given by S = {HH, HT, TH, TT}.
The number of elements in the sample space is 4, denoted by n(S) = 4.
Consider an event B that a head appears on at least one of the coins when two coins are tossed
simultaneously. Event B can be represented as
B = {HH, HT, TH}.
The number of elements in event B is 3, represented by n(B) = 3.
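The sample space and the event B for the two-coin experiment can be listed mechanically. The sketch below uses Python sets, with each outcome written as a two-letter string; the variable names are illustrative.

```python
from itertools import product

# Every ordered outcome of tossing two coins: HH, HT, TH, TT
sample_space = {"".join(outcome) for outcome in product("HT", repeat=2)}

# Event B: a head appears on at least one of the coins
event_b = {outcome for outcome in sample_space if "H" in outcome}

print(sorted(sample_space), len(sample_space))   # n(S)
print(sorted(event_b), len(event_b))             # n(B)
```

Changing `repeat=2` to `repeat=3` lists the outcomes for three coins in the same way.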
Trial & Events: When an experiment is repeated under essentially identical conditions, it does
not give a unique result; it may result in any one of several possible outcomes. The experiment
is called a trial and the outcomes are called events. For example, tossing a coin once is a
trial, and getting a Head or Tail is an event. Planting a sapling is a trial, and whether it
survives or dies is an event. Sitting for an examination is a trial, and getting a grade such
as A, B, C, D, or E is an event.
Exhaustive Events: All possible outcomes of an experiment collectively constitute exhaustive
events. For example, tossing a coin results in two exhaustive cases, which are Head and Tail.
Planting a sapling leads to two exhaustive cases, which are Survival and Death. Sitting for an
examination where a student is awarded one of only 5 grades results in five exhaustive cases.
Favourable Events: All those outcomes of an experiment that serve the objective of the
experiment are favourable events. For example, in a game of cards where every draw decides the
winner or loser, a gambler betting on an Ace has 4 favourable events, and a gambler betting on
a black card has 13 + 13 = 26 favourable events.
Mutually Exclusive Events: Events are said to be mutually exclusive if the happening of one
event prevents the occurrence of the other events at the same time. Such events are also
referred to as disjoint events since they have no element in common. For example, in an
athletics meet involving 10 challengers, if any one of them wins then none of the remaining 9
can win, and hence these events are mutually exclusive. Similarly, in a toss of a coin, the
occurrence of Head and the occurrence of Tail are mutually exclusive.
Equally Likely Events: Two events are said to be equally likely if one of them is as likely to
happen as the other. For example, in tossing a fair coin once, the outcomes Head and Tail are
equally likely. In a throw of 6-faced dice, all the six numbers 1,2,3,4,5,6 are equally likely. If
a person suffers a minor heart attack, the death or survival outcomes are not equally likely.
Independent Events: If the happening of one event is not affected by the happening (or not
happening) of another event, such events are said to be independent. For example, when darts
are thrown successively at a dartboard, getting a perfect score on each throw are independent
events. However, if a person throws the dart once, practices, and then throws it a second
time, the events of getting a perfect score on the two throws are not independent.
A CASE STUDY
Consider another example, that of rolling a die. The sample space for the random experiment
of rolling a die is S = {1,2,3,4,5,6}.
The number of elements in the sample space is 6, denoted by n(S) = 6.
Let E be the event that an even number appears on the die, represented by
E = {2, 4, 6}.
The number of elements in event E is 3, represented by n(E) = 3.
There are several varieties of events as described in the next section.
IN-TEXT QUESTIONS
1. Events are said to be _____________ if the occurrence of one event prevents the
occurrence of another event at the same time.
2. If event A represents the event that at least a head appears, and event B represents the
event that only the tail appears, then events A and B are equally likely. (True/False)
3. In a single throw of a coin, the event {Head} and the event {Tail} are disjoint. The two
events are called
a) Mutually exhaustive b) Equally likely c) Both
4. In an experiment consisting of tossing two coins, if event A represents an Event that at
least a Head occurs and event B represents that at least a Tail occurs, then
a) Events A and B are equally likely (True/False)
b) Events A and B are mutually exclusive (True/False)
c) Events A and B together form an exhaustive set (True/False)
4.3.5 Events, Set theory, and Venn diagrams
An event can be considered a set; therefore the relationships and results from elementary set
theory can be used to study the events of any random experiment. Some of the fundamental
operations of set theory that can be applied to events are the following.
1. The complement of an event A, denoted by A', is the set of all outcomes in the sample
space S that are not contained in A.
2. The union of the two events A and B is denoted by A ∪ B and is read as “A or B”. The
union of two events includes outcomes for which both A and B occur as well as outcomes
for which exactly one occurs. It means all outcomes in at least one of the events.
3. The intersection of the two events A and B, denoted by A ∩ B, is read as “A and B”.
The intersection of two events is the event consisting of all outcomes that are in
both A and B.
4. A null event is an event consisting of no outcomes whatsoever and is denoted by ∅.
Suppose there are two events A and B, and it is given that A ∩ B = ∅; then A and B are
said to be mutually exclusive or disjoint events.
4.3.6 De Morgan’s laws
a. The complement of the union of events A and B is equal to the intersection of the
complement of A and the complement of B.
(A ∪ B)' = A' ∩ B'
b. The complement of the intersection of events A and B is equal to the union of the
complement of A and the complement of B.
(A ∩ B)' = A' ∪ B'
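The set operations and De Morgan's laws can be checked on a small sample space. The sketch below is only an illustration: it reuses the die sample space and the even-number event from the case study, while the second event B is a hypothetical choice made for the example.

```python
# Sample space for one roll of a six-faced die (from the case study).
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # even numbers, the event E of the case study
B = {1, 2, 3}   # hypothetical second event: a number not exceeding 3

def complement(event, sample_space=S):
    """Outcomes of the sample space that are not in the event."""
    return sample_space - event

union = A | B          # A ∪ B: outcomes in at least one of the events
intersection = A & B   # A ∩ B: outcomes in both events

# De Morgan's laws
law_a = complement(A | B) == complement(A) & complement(B)
law_b = complement(A & B) == complement(A) | complement(B)

print("A ∪ B =", sorted(union))
print("A ∩ B =", sorted(intersection))
print("(A ∪ B)' = A' ∩ B' :", law_a)
print("(A ∩ B)' = A' ∪ B' :", law_b)
```

Both laws hold for any choice of A and B inside S, so changing the hypothetical event B does not alter the last two results.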
The events can be represented by using the Venn diagram as shown in the diagrams below.
[Figure: Venn diagrams of events A and B]
IN-TEXT QUESTIONS
1. Consider an experiment in which each of the three vehicles taking a particular freeway
exit turns left (L) or right (R) at the end of the exit ramp. Outline the sample space and
events.
2. The two events E1 and E2 are mutually exclusive, where E1 is the event consisting of
numbers less than 3 and E2 is the event that consists of numbers greater than 4. (True/
False)
3. If the two events have some common elements, the two events are not ____________.
4.4 PROBABILITY
In the realm of random experiments, the objective of probability is to assign a number P(A)
to any event A. This value P(A) is called the probability of event A, and it gives a precise
measure of the chance that the event will occur.
In other words, probability is the chance of the occurrence of an event, as in statements
such as "it might rain today", "team X will probably win today", or "I may win the lottery".
Broadly, probability is a measure of uncertainty.
4.4.1 Classical Definition of Probability
It is also called the a priori or mathematical definition of probability. The probabilities
are derived from purely deductive reasoning. This implies that one does not need to throw a
coin to state that the probability of obtaining a head, or a tail, is ½. However, there are
cases where the possibilities that arise cannot be regarded as equally likely, for example
the probability of a recession next year or of a particular GDP value next year. Similarly,
the possibilities of whether it will rain, or the outcomes of an election, are not equally
likely.
If an experiment results in n mutually exclusive and equally likely outcomes, and m of these
outcomes are favourable to event A, then
$$P(A) = \frac{m}{n} = \frac{\text{number of outcomes favourable to } A}{\text{total number of outcomes}}$$
In a single throw of a die, the total occurrences or sample space is n = 6. All are mutually
exclusive and equally likely.
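As a minimal sketch of the classical definition, the ratio m/n can be computed by counting outcomes. The function name and the choice of the event "an even number appears" are assumptions made for illustration.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = m / n for equally likely outcomes: m favourable, n total."""
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))

# Single throw of a die: n = 6 equally likely outcomes.
die = [1, 2, 3, 4, 5, 6]
even = {2, 4, 6}          # illustrative event: an even number appears
p_even = classical_probability(even, die)
print(p_even)             # 1/2
```

Exact fractions are used so that the result is the ratio m/n itself rather than a rounded decimal.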
4.4.2 Relative Definition of Probability (by Von Mises)
If a trial is repeated a large number of times under essentially homogeneous and identical
conditions, then the limiting value of the relative frequency, which is the ratio of the
number of times m that the event occurs to the total number of trials n, is called the
probability of the event.
$$P(A) = \lim_{n \to \infty} \frac{m}{n}$$
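The limiting behaviour of the relative frequency can be illustrated by simulation. The sketch below assumes a fair coin and uses Python's pseudo-random generator with a fixed seed; it shows the ratio m/n for increasing n and is not a proof of convergence.

```python
import random

def relative_frequency(n_trials, seed=0):
    """Relative frequency m/n of heads in n_trials simulated fair-coin tosses."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(n_trials))
    return heads / n_trials

for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency(n))
```

For small n the ratio can be far from ½, while for large n it typically settles close to the classical value.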
IN-TEXT QUESTIONS
6. In a toss of two coins simultaneously, find the probability of getting exactly 2 heads,
using P(E) = no. of favourable outcomes / total no. of outcomes.
7. In a toss of 3 coins simultaneously, find the probability of getting exactly two heads.
8. What is the probability of getting at least 1 head when two coins are tossed
simultaneously?
9. Find the probability of getting at most 2 tails when three coins are tossed
simultaneously.
10. Find the probability of getting at least 2 heads when three coins are tossed
simultaneously.
11. Find the probability of getting a greater number of tails than heads when three coins
are tossed simultaneously.
4.4.3 Axiomatic Definition of Probability
The axiomatic approach to probability was provided by the Russian mathematician A.N.
Kolmogorov and includes both the above definitions. To ensure that the assignment of a
probability P(A) to each event A in the sample space S is consistent with the intuitive
notion of probability, all assignments of probability must satisfy the following properties
or Axioms.
1. For any event A, the probability of event A, given by P(A), is non-negative: P(A) ≥ 0.
In other words, the probability that event A will occur can either be zero or some
positive number. The probability of event A can never be negative.
Axiom 1 reflects the intuitive notion that the chance of A occurring should be
non-negative and is known as the Axiom of non-negativity.
2. The probability of the entire sample space is 1, that is P(S) = 1. In other words, the
probability that the entire sample space will occur is 100 percent, which means it will
surely occur. This is known as the Axiom of Certainty.
The sample space by definition is the event that must occur when the experiment is
performed. The sample space S contains all possible outcomes, therefore the maximum
possible probability is assigned to sample space S.
3. If A1, A2, A3, … is an infinite collection of disjoint events, then
$$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = \sum_{i=1}^{\infty} P(A_i)$$
This indicates that the probability of the union of disjoint events belonging to the sample
space is the sum of the chances of the individual events.
The third Axiom formalizes the idea that if we want the probability that at least one of a
number of events will occur, given that no two events can occur simultaneously, then the
chance of at least one occurring is the sum of the chances of the individual events. Since
the collection of events may be infinite, this is known as the axiom of countable additivity.
4. The probability of an event always lies between 0 and 1.
0 ≤ P(A) ≤ 1,
P(A) = 0 means event A will not occur.
P(A) = 1 means event A will occur certainly.
5. Let ∅ be the null event, the event containing no outcomes whatsoever. This property
follows from Axiom 3 applied to a collection of disjoint events.
Therefore, P (∅) = 0, the probability of a null event is zero.
6. If A, B, C, … are mutually exclusive events, the probability that any one of them will
occur is equal to the sum of the probabilities of their individual occurrences.
P(A+B+C+…) = P(A∪B∪C∪…) = P(A) + P(B) + P(C) + …
7. If A, B, C, … are a mutually exclusive and collectively exhaustive set of events, the
sum of the probabilities of their individual occurrences is 1. However, if A, B, C, …
are any events, they are said to be statistically independent if the probability of their
occurring together is equal to the product of their individual probabilities, that is
P(A∩B∩C) = P(A)·P(B)·P(C). Here P(A∩B∩C) is the probability of events A, B, and C
occurring together, jointly, or simultaneously, also referred to as the joint probability.
P(A), P(B), and P(C) are called unconditional, marginal, or individual probabilities.
8. If events A and B are not mutually exclusive, then
P(A+B) or P(A∪B) = P(A) + P(B) - P(A∩B)
where P(A∩B), also written P(AB), is the joint probability that the two events occur
simultaneously. However, if A and B are mutually exclusive, then
P(A∩B) = P (∅) = 0
For every event A, there is an event A', called the complement of A, with P(A') = 1 - P(A).
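The addition rule for events that are not mutually exclusive can be verified by direct counting. The sketch below assumes one throw of a fair die; the events A and B are hypothetical choices made only for illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}          # one throw of a fair die
A = {2, 4, 6}                   # an even number appears
B = {4, 5, 6}                   # hypothetical event: a number greater than 3

def prob(event):
    """Classical probability: favourable outcomes over total outcomes."""
    return Fraction(len(event), len(S))

lhs = prob(A | B)                        # P(A ∪ B) counted directly
rhs = prob(A) + prob(B) - prob(A & B)    # P(A) + P(B) - P(A ∩ B)
print(lhs, rhs)                          # 2/3 2/3

# Complement check: P(A') equals 1 - P(A).
print(prob(S - A) == 1 - prob(A))        # True
```

Subtracting P(A ∩ B) removes the outcomes 4 and 6, which would otherwise be counted twice.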
IN-TEXT QUESTIONS
4.5 SUMMARY
This lesson familiarized the students with the basic concepts of sample space and population
along with their significance. The notion of probability was introduced with the help of
random experiments. Various applications of probability in real life are presented in the
chapter. Important concepts related to probability such as sample space, events, sample
points, and random experiments are described. The basic differences between the sample,
population, sample points, and events have been emphasized. The types of events, such as
disjoint events and collectively exhaustive and mutually exclusive events, have been
explained. Further, the concept of the Venn diagram is presented. The notion of probability
using the classical and relative definitions has been introduced, and the properties of
probabilities are also discussed in the chapter.
4.6 GLOSSARY
1. Two six-faced dice are rolled together, or one die is rolled twice. The total number of
possible outcomes is 36.
2. (i) Prove that the probability of null event is zero, P (∅) = 0.
(ii) Prove that for any two events A and B
P(AUB) = P(A) +P(B) - P(AB)
LESSON 5
CONDITIONAL PROBABILITY
STRUCTURE
• To understand the concept of conditional probability and its significance in real life.
• To comprehend the significance of the initial assignment of probability that may be
followed by partial information relevant to the outcome.
• To visualize that partial information may affect the assignment of probability. This
leads us to the concept of conditional probability and Bayes’ Theorem.
• To comprehend the concept of Bayes’ Theorem and its applications.
• To learn the computation of a posterior probability from the given prior probabilities
and conditional probabilities, which plays a critical role in Bayes’ Theorem.
• To practice several cases and examples for the application of Bayes’ Theorem.
5.2 INTRODUCTION
In the previous chapter, we introduced the topic of probability. In this chapter, we expose
students to the deeper concepts and situations related to probability. The probabilities
assigned to various events or occurrences depend on the experimental situation. The initial
assignment may be followed by partial information relevant to the outcome, and this partial
information may affect the assignment of probability. This leads us to the concept of
conditional probability and Bayes’ Theorem.
Conditional probability is a measure of the likelihood of an event occurring, assuming that
another event or outcome has previously occurred. For instance, suppose a student aims to
receive an academic scholarship while applying for admission. For every 1000 applications,
the college accepts 100 applications and awards an academic scholarship to 10 of every 500
students accepted. Of the scholarship recipients, 50% receive university stipends for books,
meals, housing etc. As a result, the chance of a student being accepted and then receiving a
scholarship is 0.2%, given by 0.1 multiplied by 0.02, while the chance of being accepted,
receiving the scholarship and then receiving the stipend for books etc. is 0.1%, given by
0.1 multiplied by 0.02 multiplied by 0.5.
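The chain of multiplications in the scholarship example can be reproduced with a few lines of arithmetic. This is only a sketch using the figures stated in the example; the variable names are chosen for illustration.

```python
from fractions import Fraction

p_accept = Fraction(100, 1000)               # accepted per application: 0.1
p_scholar_given_accept = Fraction(10, 500)   # scholarship among accepted students: 0.02
p_stipend_given_scholar = Fraction(1, 2)     # stipend among scholarship recipients: 0.5

# Accepted and then awarded a scholarship.
p_accept_and_scholar = p_accept * p_scholar_given_accept
# Accepted, awarded a scholarship, and then given the stipend.
p_all_three = p_accept_and_scholar * p_stipend_given_scholar

print(float(p_accept_and_scholar))   # 0.002, i.e. 0.2%
print(float(p_all_three))            # 0.001, i.e. 0.1%
```

Each factor after the first is a conditional probability, which is why the joint chance shrinks at every step.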
In the realm of conditional probability, Bayes’ Rule or Bayes’ Law, also known as Bayes’
Theorem, is a mathematical equation used to calculate conditional probability. The
computation of a posterior probability from the given prior probabilities and conditional
probabilities plays a critical role in Bayes’ Theorem.
5.3 CONDITIONAL PROBABILITY
The conditional probability of event A, given that event B has occurred, is denoted by
P (A | B):
P (A | B) = P(AB)/ P(B), P(B) > 0
where P(AB) is the joint probability of events A and B occurring together, also written
P(A∩B) and read as the probability of A intersection B. The computation of the conditional
probability P (A | B) requires that P(B) is positive; the probability of event B occurring
cannot be zero.
Similarly, the conditional probability of B, given A, is denoted by P (B | A):
P (B | A) = P(AB)/ P(A), P(A) > 0
Rearranging gives the joint probability:
P(AB) = P (B | A) * P(A), P(A) > 0
The conditional probability is expressed as a ratio of the unconditional probabilities. The
numerator is simply the probability of the intersection of the two events, whereas the
denominator is the probability of the conditioning event B. The conditional probability can be
represented by the following Venn diagram.
[Figure: Venn diagram of events A and B showing their intersection]
Points to remember
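(a) Two possible (non-zero probability) mutually disjoint events are always dependent.
Proof: Let A and B be disjoint events, i.e. A ∩ B = Ø
So, P (A ∩ B) = 0
We know that P (A ∩ B) = P(A)*P (B |A), P(A) ≠ 0
= P(B)*P (A |B), P(B) ≠ 0
Since P (A ∩ B) = 0 while P(A) ≠ 0 and P(B) ≠ 0, it follows that P (B |A) = 0 ≠ P(B) and
P (A |B) = 0 ≠ P(A). Hence A and B are dependent.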
CASE STUDY
Suppose that of all the individuals who buy the smartphone, 60% include an optional memory
card in their purchase, 40% include an extra battery and 30% include both a card and a battery.
Solution:
Let us consider a randomly selected buyer, and let event A be the memory card purchased,
while event B be the battery purchased. Thus, P(A) is 0.6, P(B) is 0.4. The probability that both
memory card and battery are purchased, P(A∩B) or P(AB) is 0.3.
Given that the selected individual purchased an extra battery, the probability that an
optional card was also purchased is P (memory card|battery) or P(A|B).
P (memory card|battery) = P(A|B) = P(A∩B)/ P(B), P(B) > 0
= 0.3/0.4 = 0.75
This implies that of all those purchasing an extra battery, 75% purchased an optional memory
card. Similarly, given that the memory card was purchased, the probability of buying a
battery is given by P (battery|memory card) or P(B|A).
P (battery|memory card) = P(B|A) = P(A∩B)/ P(A), P(A) > 0
= 0.3/0.6 = 0.5
It can be observed that P(A|B) ≠ P(A) and P(B|A) ≠ P(B)
This means that here the conditional probability is not equal to the unconditional probability.
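The two conditional probabilities of the smartphone case study can be reproduced with a small helper. The function below is a minimal sketch of the ratio P(X ∩ Y)/P(Y); its name and the guard against a zero-probability conditioning event are assumptions added for illustration.

```python
p_card = 0.6          # P(A): memory card purchased
p_battery = 0.4       # P(B): extra battery purchased
p_both = 0.3          # P(A ∩ B): both purchased

def conditional(p_joint, p_given):
    """P(X | Y) = P(X ∩ Y) / P(Y), defined only when P(Y) > 0."""
    if p_given <= 0:
        raise ValueError("conditioning event must have positive probability")
    return p_joint / p_given

p_card_given_battery = conditional(p_both, p_battery)   # P(A|B)
p_battery_given_card = conditional(p_both, p_card)      # P(B|A)
print(round(p_card_given_battery, 2))   # 0.75
print(round(p_battery_given_card, 2))   # 0.5
```

The same joint probability 0.3 appears in both numerators; only the conditioning event in the denominator changes.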
IN-TEXT QUESTIONS
1. A card is drawn from a deck of cards. What is the probability that it will be either a
heart or a queen?
2. The numerator is the union of two events in the computation of conditional probability.
(True/false)
3. In a class, there are 500 students of which 300 are males and 200 are females. Of these,
100 males and 60 females plan to major in accounting. A student is selected at random
from this class and it is found that this student plans to be an accounting major. What
is the probability that the student is a male?
4. If there are two events, A and B, then the probability that event A occurs knowing that
event B has already occurred is referred to as ______________.
5. Suppose we randomly pick two TV sets in succession from a shipment of 240 TV sets, of
which 15 are defective. What is the probability that they will both be defective?
5.4 BAYES’ THEOREM
1
https://www.investopedia.com/terms/b/bayes-theorem.asp
∪ ………∪ ( Bk ∩A), where all the events (Bi ∩A) are mutually exclusive. This “partitioning of
A” is illustrated in figure 2 below.
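$$P(B_r \mid A) = \frac{P(B_r)\, P(A \mid B_r)}{\sum_{i=1}^{k} P(B_i)\, P(A \mid B_i)}, \quad \text{for all } r = 1, 2, 3, \ldots, k$$
where
$$\sum_{i=1}^{k} P(B_i)\, P(A \mid B_i) = P(B_1)\, P(A \mid B_1) + P(B_2)\, P(A \mid B_2) + P(B_3)\, P(A \mid B_3) + \cdots + P(B_k)\, P(A \mid B_k)$$
Equivalently,
$$P(B_r \mid A) = \frac{P(B_r \cap A)}{P(A)}$$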
In figure 2, it is evident that, given that event A has occurred, the probability that A
occurred through partition B4 is given by
$$P(B_4 \mid A) = \frac{P(B_4)\, P(A \mid B_4)}{\sum_{i=1}^{4} P(B_i)\, P(A \mid B_i)}$$
where the sum runs over i = 1, 2, 3, 4:
$$\sum_{i=1}^{4} P(B_i)\, P(A \mid B_i) = P(B_1)\, P(A \mid B_1) + P(B_2)\, P(A \mid B_2) + P(B_3)\, P(A \mid B_3) + P(B_4)\, P(A \mid B_4)$$
P(B4/A): Probability that partition B4 occurs given that event A has occurred.
Example: Consider the probability that an incoming mail is a spam message, given the result
of a computer programme filter, when the prior probability of spam is 0.6. These are related
probabilities that can be calculated by using Bayes’ Theorem. Using the same notation, we
define two mutually exclusive and collectively exhaustive events A and B as follows.
A : The incoming mail is a spam message.
B : The incoming mail is not a spam message.
The other events defined in the context of the same experiment are:
C : Filter test confirms spam
D : Filter test did not confirm spam.
The data given to us are:
P(A) : Probability of finding spam = 0.6
P(B) : Probability of not finding spam = 0.4
P(C|A) : Probability that the test predicts correctly when spam is actually found = 0.9
P(D|A) : Probability that the test predicts incorrectly when spam is actually found = 0.1
P(D|B) : Probability that the test predicts correctly when spam is actually not there = 0.7
P(C|B) : Probability that the test predicts incorrectly when actually no spam is found = 0.3
We are interested in finding:
P(C) : Probability that the test says spam is there.
P(D) : Probability that the test says no spam is there.
P(A|C) : Probability of finding spam, given positive test results.
P(A|D) : Probability of finding spam, given negative test results.
P(B|C) : Probability of not finding spam, given positive test results.
P(B|D) : Probability of not finding spam, given negative test results.
Applying Bayes’ Theorem
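$$P(A \mid C) = \frac{P(C \mid A)\, P(A)}{P(C \mid A)\, P(A) + P(C \mid B)\, P(B)} = \frac{0.9 \times 0.6}{0.9 \times 0.6 + 0.3 \times 0.4} = 0.818$$
$$P(B \mid C) = \frac{P(C \mid B)\, P(B)}{P(C \mid B)\, P(B) + P(C \mid A)\, P(A)} = \frac{0.3 \times 0.4}{0.3 \times 0.4 + 0.9 \times 0.6} = 0.182$$
$$P(A \mid D) = \frac{P(D \mid A)\, P(A)}{P(D \mid A)\, P(A) + P(D \mid B)\, P(B)} = \frac{0.1 \times 0.6}{0.1 \times 0.6 + 0.7 \times 0.4} = 0.176$$
$$P(B \mid D) = \frac{P(D \mid B)\, P(B)}{P(D \mid B)\, P(B) + P(D \mid A)\, P(A)} = \frac{0.7 \times 0.4}{0.1 \times 0.6 + 0.7 \times 0.4} = 0.824$$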
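The spam-filter calculation can be reproduced numerically, including the total probabilities P(C) and P(D) that the example lists among the quantities of interest. The sketch below uses the values appearing in the worked formulas (0.9 and 0.3 for the filter flagging spam and non-spam respectively); the variable names are assumptions made for illustration.

```python
p_spam = 0.6                 # P(A): incoming mail is spam
p_not_spam = 0.4             # P(B): incoming mail is not spam
p_flag_given_spam = 0.9      # P(C|A): filter confirms spam when mail is spam
p_flag_given_not_spam = 0.3  # P(C|B): filter confirms spam when mail is not spam

# Total probability of each filter result.
p_flag = p_flag_given_spam * p_spam + p_flag_given_not_spam * p_not_spam   # P(C)
p_clear = ((1 - p_flag_given_spam) * p_spam
           + (1 - p_flag_given_not_spam) * p_not_spam)                     # P(D)

# Posterior probabilities by Bayes' Theorem.
p_spam_given_flag = p_flag_given_spam * p_spam / p_flag              # P(A|C)
p_spam_given_clear = (1 - p_flag_given_spam) * p_spam / p_clear      # P(A|D)

print(round(p_flag, 2), round(p_clear, 2))
print(round(p_spam_given_flag, 3), round(1 - p_spam_given_flag, 3))     # P(A|C), P(B|C)
print(round(p_spam_given_clear, 3), round(1 - p_spam_given_clear, 3))   # P(A|D), P(B|D)
```

Because A and B are exhaustive, P(B|C) and P(B|D) are obtained as one minus the corresponding posterior for A.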
CASE STUDY
Suppose a consulting firm rents motorbikes from three rental agencies, 60 percent from agency
1, 30 percent from agency 2, and 10 percent from agency 3. Suppose 9 percent of the bikes
from agency 1 need a tune-up, 20 percent of the bikes from agency 2 need a tune-up, and 6
percent of the bikes from agency 3 need a tune-up. What is the probability that a rental
bike delivered to the firm will need a tune-up?
Let A be the event that the bike needs a tune-up, and B1, B2, and B3 be the events that the
bike comes from rental agencies 1, 2, or 3. P(B1) = 0.60, P(B2) = 0.30, P(B3) = 0.10,
P(A|B1) = 0.09, P(A|B2) = 0.20, and P(A|B3) = 0.06
According to the rule of total probability,
P(A) = (0.60) * (0.09) + (0.30) * (0.20) + (0.10) * (0.06) = 0.12
If a rental bike delivered to the consulting firm needs a tune-up, then what is the
probability that it came from rental agency 2? According to Bayes’ Theorem,
P(B2|A) = (0.30) * (0.20) / [(0.60) * (0.09) + (0.30) * (0.20) + (0.10) * (0.06)]
= 0.060 / 0.120 = 0.5
It is observed that although only 30 percent of the bikes delivered to the firm come from
agency 2, 50 percent of those that require a tune-up come from that agency.
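The two steps of this case study, the total probability of a tune-up and the posterior probability for agency 2, can be written for a partition of any size. The function names below are hypothetical and the sketch uses only the agency shares and tune-up rates given in the case study (9, 20, and 6 percent).

```python
def total_probability(priors, likelihoods):
    """P(A) = sum of P(B_i) * P(A | B_i) over a partition B_1, ..., B_k."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def posterior(r, priors, likelihoods):
    """Bayes' Theorem: P(B_r | A) for the partition member at list index r."""
    return priors[r] * likelihoods[r] / total_probability(priors, likelihoods)

priors = [0.60, 0.30, 0.10]        # P(B1), P(B2), P(B3): share of bikes per agency
likelihoods = [0.09, 0.20, 0.06]   # P(A|B1), P(A|B2), P(A|B3): tune-up rates

print(round(total_probability(priors, likelihoods), 2))   # 0.12
print(round(posterior(1, priors, likelihoods), 2))        # 0.5 (agency 2 is index 1)
```

The posteriors for all three agencies sum to 1, which gives a quick check on the inputs.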
IN-TEXT QUESTIONS
6. A balanced die is tossed twice. If A is the event that an even number comes up on the
first toss, B is the event that an even number comes up on the second toss, and C is the
event that both tosses result in the same number, are the events A, B and C
a. Pairwise independent
b. Independent?
7. The probability of the simultaneous occurrence of two events can never exceed the sum
of the probabilities of these events. (True/False)
8. The conditional probability of an event given another event can never be less than the
probability of the joint occurrence of these events. (True/False)
5.5 INDEPENDENCE OF EVENTS
The concept of conditional probability suggests that the probability of an event A, P(A),
must be modified when another event B, whose outcome affects the occurrence of event A, has
occurred. The new probability assigned to A is expressed as P(A|B). This is the conditional
probability of event A occurring given that event B has already occurred. In general, the
conditional probability P(A|B) differs from the unconditional probability P(A). This
indicates that the information that B has occurred changes the chance of A occurring.
It may happen, however, that the occurrence of A is not affected by the fact that B has
occurred, implying that P(A|B) = P(A). In other words, the occurrence or non-occurrence of
one event has no consequence on the chances that the other will occur. Such events are
referred to as independent events.
The two events A and B are independent if P(A|B) = P(A), while they are dependent if
P(A|B) ≠ P(A). There exists a strong connection between the concept of independence and
conditional probability.
The conditional probability formulas for P(A|B) and P(B|A) are given below.
P(A|B) = P(AB)/ P(B), P(B) > 0 eq (1)
For P(B|A),
P(B|A) = P(AB)/ P(A), P(A) > 0 eq (2)
Sample space: S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
A = {HHH, HHT}, B = {HHT, HTT, THT, TTT}, C = {HTT, THT, TTH}
A ∩ B = {HHT}, B ∩ C = {HTT, THT}
P(A) = ¼, P(B) = ½, P(C) = ⅜, P(A ∩ B) = ⅛, and P(B ∩ C) = ¼
Since P(A) * P(B) = ¼ * ½ = ⅛ = P(A ∩ B), events A and B are independent.
Since P(B) * P(C) = ½ * ⅜ = 3/16 ≠ P(B ∩ C) = ¼, events B and C are not independent.
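The product test for independence can be checked by enumerating the eight outcomes of three coin tosses. The sketch below assumes a fair coin, so that all outcomes are equally likely, and uses the events A, B, and C listed above; the helper names are chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for three tosses of a fair coin: 8 equally likely outcomes.
S = {"".join(t) for t in product("HT", repeat=3)}
A = {"HHH", "HHT"}
B = {"HHT", "HTT", "THT", "TTT"}
C = {"HTT", "THT", "TTH"}

def prob(event):
    """Classical probability of an event within the sample space S."""
    return Fraction(len(event), len(S))

def independent(x, y):
    """Events are independent when P(X ∩ Y) = P(X) * P(Y)."""
    return prob(x & y) == prob(x) * prob(y)

print(independent(A, B))   # True:  1/8 equals 1/4 * 1/2
print(independent(B, C))   # False: 1/4 differs from 1/2 * 3/8
```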
IN-TEXT QUESTIONS
9. Prove that if A and B are independent, then A' and B' are also independent.
10. The two mutually exclusive events must be independent. (True / False)
11. A bag contains 7 red and 4 blue balls. Two balls are drawn at random with replacement.
The probability of getting the balls of different colors is:
a. 28/121
b. 56/121
c. ½
d. None of these
12. The conditional and unconditional probabilities of random variables are always equal.
(True/False)
5.6 SUMMARY
The lesson presents conditional probability, a measure of the likelihood of an event
occurring, assuming that another event or outcome has previously occurred. Conditional
probability has a wide range of applications. Its computation has been explained
systematically in the lesson. The role of conditional probability in defining the
independence and dependence of events has been described. In the case of independent events,
the conditional and unconditional probabilities are the same. The notion and relevance of
Bayes’ Theorem, which is primarily an application of conditional probability, has been
explained.
5.7 GLOSSARY
1. Conditional Probability: Conditional probability is defined as the likelihood of an
event or an outcome occurring based on the occurrence of a previous event or outcome.
It is conditional upon the occurrence of some event that has happened earlier.
5. 7/1,912
Hint: Let A be the event that the first randomly picked TV set out of the two is
defective, A: 1st TV defective. The number of elements in the sample space, n(S), is
240.
The probability of event A, P(A) = n(A)/n(S) = 15/240.
Let B be the event that the second randomly picked TV set out of the two is
defective, B: 2nd TV defective, P(B|A) = 14/239
P(AB) = P(A)*P(B|A) = 15/240*14/239 = 7/1,912
6. a) the events are pairwise independent
b) the events are not independent
7. True
8. True
9. Hint: Event A can be expressed as (A ∩ B) ∪ (A ∩ B')
Also note that (A ∩ B) and (A ∩ B') are mutually exclusive. It is given that A and B
are independent.
10. False
11. (b)
12. False
5.9 SELF-ASSESSMENT QUESTIONS
1. Two six-faced dice are rolled together, or one die is rolled twice. The total number of
possible outcomes is 36.
2. (i) Prove that the probability of null event is zero, P (∅) = 0.
(ii) Prove that for any two events A and B
P(AUB) = P(A) +P(B) - P(AB)
LESSON 6
RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
STRUCTURE
6.1 Learning Objectives
6.2 Introduction
6.3 Random Variables
6.3.1 Types of random variables
6.4 Probability Distribution Function
6.5 Probability Density Function
6.5.1 Properties of Probability Density Function
6.6 Summary
6.7 Glossary
6.8 Answers to In-text Questions
6.9 Self-Assessment Questions
6.10 References
6.11 Suggested Readings
6.1 LEARNING OBJECTIVES
• To understand the concept of random variables, and their significance in statistical
analysis.
• The students will be able to distinguish between the two fundamental types of random
variables, namely the discrete random variable and continuous random variable.
• To familiarize the students with some commonly used discrete and continuous
distributions of random variables.
• To understand the concept of probability distribution function or probability mass
function.
• To comprehend the derivation of the probability distribution function
6.2 INTRODUCTION
We have seen in earlier units how the concept of probability enables us to compute the extent
of uncertainty associated with random experiments. An experiment may yield both qualitative
and quantitative outcomes. Statistical analysis focuses on the numerical aspect of the data or
experiment. Thus, the term random variable is introduced to represent any event or outcome
that can take different values. Such a variable takes the values that are the plausible
outcomes of any event or experiment.
Since the values are random and associated with a random experiment, such variables are
termed random variables. The concept of a random variable allows us to pass from the
experimental outcomes themselves to a numerical function of the outcomes.
There are two types of random variables: discrete random variables and continuous random
variables. This chapter will define the concept of random variables along with the two
fundamental types of random variables. The chapter will help us to understand the derivation
of the probability distribution function for both discrete and continuous random variables.
6.3 RANDOM VARIABLE
Each outcome of an experiment can be associated with a number by specifying a rule of
association, for example the total weight of baggage for a sample of 25 airline passengers.
Such a rule of association is called a random variable.
It is a variable because different numerical values are possible, and random because the
observed value depends on which of the possible experimental outcomes results.
[Figure: mapping from outcomes X1, X2 in the sample space to the values f(x1), f(x2)]
A: {no. of heads in the toss}, then a variable X can be assigned as a random variable to A,
where the values that X picks up are all random based on the outcome of the experiment i.e.
Toss of two coins.
Similarly, if an event B is defined as the number of tails in the toss of two coins,
B: {no. of tails in the toss}, then a variable Y can be assigned as a random variable to B,
where the values that Y takes are all random, based on the outcome of the experiment, i.e.
the toss of two coins.
6.3.1 Types of random variables
A random variable is a variable whose values are the numerical outcomes associated with the
random experiment. Random variables can be distinguished on the basis of the values they take
and whether those values can be counted.
Thus, a random variable is either a discrete rv or a continuous rv.
a) A discrete random variable: A discrete random variable is a rv whose possible values either
constitute a finite set or else can be listed in an infinite sequence in which there is a first
element, a second element, and so on (countably infinite).
b) A continuous random variable: The possible values of a continuous random variable consist
either of all the numbers in a single interval on the number line (possibly the whole line
from -∞ to ∞) or of all numbers in a disjoint union of such intervals. No possible value of
the variable has a positive probability, i.e. P(X = c) = 0 for any possible value c.
On the basis of the values taken by the random variable, we define functions that yield the
probability corresponding to each specific value of the random variable. Such a function is
called a probability distribution; it gives the probabilities of occurrence of the different
possible outcomes of an experiment. Depending on whether the random variable takes discrete
or continuous values, the function is referred to as a probability mass function (pmf) or a
probability density function (pdf).
6.4 PROBABILITY MASS FUNCTION
For a discrete random variable X that can take at most a countably infinite number of values
x1, x2, ..., we associate with each value x a probability
p(x) = P[X = x] = P[all s ∈ S such that X(s) = x]
that must satisfy the following conditions,
1. p(x) ≥ 0 for all x, which implies that the value of the probability distribution is
non-negative at all the values taken by the random variable X.
2. ∑_{x} p(x) = 1, which implies that the values taken by the random variable exhaust the
sample space; therefore the sum of the probabilities of all values of the random
variable is 1.
Example 1:
Let us consider a lab in the Department of Economics, where six computers are reserved for
Economics majors. Let the random variable X denote the number of these computers that are
in use at a particular time of day. Suppose the probability distribution of X is as given
below.
x      0     1     2     3     4     5     6
p(x)   0.05  0.10  0.15  0.25  0.20  0.15  0.10
It is easy to verify that the above values satisfy both properties of a probability mass
function, as each p(x) is non-negative and all of them sum to unity.
Using the probabilities given above, various other probabilities can be computed.
The probability that at most 2 computers are in use is given by P (X ≤ 2), the probability that
the random variable takes a value of at most 2.
P (X ≤ 2) = P (X =0 or 1 or 2) = p (0) + p (1) + p (2) = 0.05 + 0.10 +0.15 = 0.30
The probability that at least 3 computers are in use is given by P (X ≥ 3). Since the event
that at least 3 computers are in use is complementary to the event that at most 2 computers
are in use, the probability can be computed as follows.
P (X ≥ 3) = 1 - P (X ≤ 2)
= 1 - 0.30
= 0.70
Another way to compute the probability of the event that at least 3 computers are in use is by
adding values of probability when X takes the values 3, 4, 5 and 6.
The probability that between 2 and 5 computers are in use is given by
P (2 ≤ X ≤ 5) = P (X = 2, 3, 4 or 5) = 0.15 + 0.25 + 0.20 + 0.15 = 0.75
The probability that the number of computers in use is strictly between 2 and 5 is
P (2 < X < 5) = P (X = 3 or 4) = 0.25 + 0.20 = 0.45
In the above example of the number of computers in use in the computer lab of the Department
of Economics, both properties hold. Every value of the probability mass function is
non-negative, so the first property is satisfied. The sum of all probabilities is 1, so the
second property is also satisfied. Thus, the given function can serve as a probability mass
function.
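The calculations in Example 1 can be checked with a short Python sketch. The dictionary `pmf` and the helper `prob` are illustrative names, not part of the lesson, and the value p(6) = 0.10 is an assumption inferred from the requirement that the probabilities sum to 1.

```python
from fractions import Fraction

# pmf of X = number of computers in use.
# p(6) = 0.10 is an assumption, inferred so that the probabilities sum to 1.
pmf = {0: Fraction(5, 100), 1: Fraction(10, 100), 2: Fraction(15, 100),
       3: Fraction(25, 100), 4: Fraction(20, 100), 5: Fraction(15, 100),
       6: Fraction(10, 100)}


def prob(event):
    """Return P(event), where event is a condition on the values of X."""
    return sum(p for x, p in pmf.items() if event(x))


# Both pmf properties: non-negative values that sum to 1.
is_valid = all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

at_most_2 = prob(lambda x: x <= 2)              # P(X <= 2)
at_least_3 = 1 - at_most_2                      # complement rule
between_2_and_5 = prob(lambda x: 2 <= x <= 5)   # P(2 <= X <= 5)
strictly_between = prob(lambda x: 2 < x < 5)    # P(2 < X < 5)

print(float(at_most_2), float(at_least_3),
      float(between_2_and_5), float(strictly_between))
```

Exact fractions are used so that the check "sums to 1" is not disturbed by floating-point rounding.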
Let us create a probability distribution function for the toss of two fair coins. In this random
experiment, let the random variable X be the number of heads that appear.
S: Sample space    X: Random variable (r.v.)    Probability of the outcome
                   (no. of heads)
TT                 0                            ½ * ½ = ¼
HT                 1                            ½ * ½ = ¼
TH                 1                            ½ * ½ = ¼
HH                 2                            ½ * ½ = ¼
Now, one can create the probability distribution function defined for the specific values x
taken by the random variable X, the number of heads that appear in the toss of two fair coins.
X can take the values 0, 1 and 2 because in two tosses of a fair coin we can have no heads,
exactly one head, or two heads. Since both HT and TH give one head, P(X = 1) = ¼ + ¼ = ½.
The probability distribution function for this discrete random variable can be represented by
the function p(x) defined below.
       ¼, if x = 0
p(x) = ½, if x = 1
       ¼, if x = 2
1. The value of the probability distribution function is never negative. This implies p(x) ≥ 0.
2. The sum of all values of the probability distribution function at the given values of x is one.
∑_{x=0}^{2} p(x) = 1
The above function satisfies both conditions; therefore, it serves as a pmf, which
can be depicted in the form of a graph as below.
[Figure: graph of the pmf p(x) for the number of heads in the toss of two fair coins]
4 5 6 7 8 9
5 6 7 8 9 10
6 7 8 9 10 11
7 8 9 10 11 12
The probability distribution or pmf is therefore given by:
x p(x) = P(X=x)
2 1/36
3 2/36
4 3/36
5 4/36
6 5/36
7 6/36
8 5/36
9 4/36
10 3/36
11 2/36
12 1/36
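The table above can be reproduced by enumeration. The sketch below assumes, as the values suggest, that X is the sum of the faces shown by two fair dice; the names `counts` and `pmf` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: X is the sum of the faces of two fair dice.
# Enumerate the 36 equally likely outcomes and count how often each sum occurs.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# p(x) = (number of outcomes with sum x) / 36
pmf = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

for x in pmf:
    print(x, f"{counts[x]}/36")
```

Counting outcomes first and dividing by 36 afterwards keeps the printed values in the same unreduced form as the table.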
IN-TEXT QUESTIONS
1. A random variable that can assume any possible value between two points is called a
________________ (discrete/continuous) random variable.
2. If c is a constant and X follows a continuous probability distribution, then P(X = c) is
always equal to zero. This statement is True/False.
3. A listing of all the outcomes of an experiment and the probability associated with each
outcome is called:
a) Probability density function
b) Cumulative distribution function
c) Probability distribution
d) Probability tabulation
CASE STUDY
Check if the following function given by
p(x) = (x + 2) / 25, for x = 1, 2, 3, 4, 5
can serve as the probability distribution of a discrete random variable.
Solution: For all the values of x, p(x) is computed as follows,
For x = 1, p(1) = 3/25
x = 2, p(2) = 4/25
x = 3, p(3) = 5/25
x = 4, p(4) = 6/25
x = 5, p(5) = 7/25
For every x, p(x) > 0, and the sum of all p(x) is (3 + 4 + 5 + 6 + 7)/25 = 1. Thus, both
properties of the probability distribution function are satisfied.
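The same check can be automated. The following is a minimal sketch; the helper name `is_valid_pmf` is hypothetical and simply tests the two pmf conditions on a finite set of values.

```python
from fractions import Fraction


def is_valid_pmf(p, support):
    """Check that p(x) >= 0 on the support and that the values sum to exactly 1."""
    values = [p(x) for x in support]
    return all(v >= 0 for v in values) and sum(values) == 1


def p(x):
    """Candidate pmf from the case study: p(x) = (x + 2) / 25."""
    return Fraction(x + 2, 25)


valid = is_valid_pmf(p, range(1, 6))
print(valid)
```

A function such as x/25 on the same values would fail the check, because its values sum to 15/25 rather than 1.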
IN-TEXT QUESTIONS
4. Find the probability distribution of the total number of heads obtained in four tosses of
a balanced coin.
5. The probability distribution of a random variable is defined as
x -1 -2 0 1 2
p(x) c 2c 3c 4c 6c
Then, c is equal to
a) 0
b) ¼
c) 1
d) 1/16
6. The suitable graph for the probability distribution of a discrete random variable is
a) Probability Histogram
b) Stepwise Function
c) Both (a) and (b)
Figure 3: The density curve between a and b given by P (a ≤ X ≤ b)
The probability that X lies between a and b can be computed by integrating the density function,
as depicted in figure 3. That is, P (a ≤ X ≤ b) is obtained as the integral
∫_{a}^{b} f(x) dx.
6.5.1 Properties of Probability Density Function
Every probability density function satisfies certain properties. The first and foremost property
consists of the two conditions that must be satisfied for any function of a continuous random
variable to be called a probability density function.
1. A function can serve as a probability density of a continuous random variable X if its
values, f(x), satisfy the following two conditions.
(a) f(x) ≥ 0 for all x, -∞ < x < ∞
(b) ∫_{-∞}^{∞} f(x) dx = 1
2. If X is a continuous random variable and a and b are real constants with a ≤ b, then
P (a ≤ X ≤ b) = P (a ≤ X < b) = P (a < X ≤ b) = P (a < X < b)
CASE STUDY
For example, suppose the random variable X has the probability density given by
f(x) = k·e^(-3x)   for x > 0
       0           elsewhere
For f(x) to be a probability density function, it must satisfy the two necessary conditions.
(a) f(x) ≥ 0 for -∞ < x < ∞, which holds whenever k ≥ 0
(b) ∫_{-∞}^{∞} f(x) dx = ∫_{0}^{∞} k·e^(-3x) dx = k/3 = 1, so that k = 3
For k = 3, P (0.5 ≤ X ≤ 1) = ∫_{0.5}^{1} 3·e^(-3x) dx = e^(-1.5) - e^(-3) = 0.173
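The two integrals in the case study can be checked numerically. The sketch below uses a simple midpoint rule written by hand; the helper `integrate` and the choice of 20 as a stand-in for the infinite upper limit are assumptions made for illustration.

```python
import math


def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


k = 3.0


def f(x):
    """Density from the case study with k = 3."""
    return k * math.exp(-3 * x) if x > 0 else 0.0


# The upper limit 20 stands in for infinity; the tail beyond it is negligible.
total_area = integrate(f, 0, 20)
p_numeric = integrate(f, 0.5, 1)
p_exact = math.exp(-1.5) - math.exp(-3)

print(round(total_area, 4), round(p_numeric, 3), round(p_exact, 3))
```

The numerical probability agrees with the closed form e^(-1.5) - e^(-3) to well beyond three decimal places.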
IN-TEXT QUESTIONS
The pdf of a continuous random variable X is given by:
f(x) = 0.075x + 0.2,   3 ≤ x ≤ 5
       0,              otherwise
7. The total area under the density curve of X is
(a) 1
(b) 0
(c) ½
(d) ¼
8. Find P(X≤ 4).
9. P(X≤ 4) will be the same as P(X<4). This statement is True/False.
6.6 SUMMARY
The lesson describes the concept of a random variable, which takes values that are outcomes of
random experiments. The two types of random variables, namely discrete and continuous
random variables, have been described in the lesson. Discrete random variables are
variables whose possible values form a finite or countably infinite set, while the values of a
continuous random variable consist either of an interval on the number line or of a disjoint
union of intervals. The probability distribution functions, also known as probability mass
functions, giving the probability corresponding to each value of a discrete random variable
have been derived, and the corresponding graph of the probability mass function has been
plotted. Similarly, the probability density function, which gives probabilities for ranges of
values of a continuous random variable, has been derived. Finally, the properties of the
probability distribution and probability density functions have been presented in the lesson.
6.7 GLOSSARY
1. Random Variable: A random variable is a rule that assigns a numerical value to each
outcome in a sample space. It is a variable because the observed value depends on which
of the possible experimental outcomes results.
2. Discrete random variable: A discrete random variable is an rv whose possible values
either constitute a finite set or else can be listed in an infinite sequence in which there
is a first element, a second element, and so on.
3. Continuous random variable: The possible values of a continuous random variable consist
either of all the numbers in a single interval on the number line (possibly from -∞ to ∞)
or of all numbers in a disjoint union of such intervals.
4. Probability distribution function: The probabilities assigned to the various outcomes in
S in turn determine the probabilities associated with the values of any particular random
variable X. It is a function that provides the relative likelihood of occurrence of all
possible outcomes of an experiment.
5. Probability density function: The function that describes the probability of a
continuous random variable falling within a specified range of values. The probability
is obtained as the area under the function over that range.
6.8 ANSWERS TO THE QUESTIONS
1. Discrete
2. True
3. (c) Probability Distribution
4. p(x) = 4Cx /16, for x = 0, 1,2, 3, 4
5. (d) 1/16
6. (a) Probability Histogram
7. (a) 1
8. 0.4625
9. True
6.9 SELF-ASSESSMENT QUESTIONS
1. What is the difference between discrete and continuous random variables?
2. An event management company has overbooked the tickets for an upcoming music
concert. The available seating capacity is 40 while the company has sold 45 tickets.
Suppose X denotes the number of ticketed people who actually show up for the concert.
The probability mass function of X is given by:
x 35 36 37 38 39 40 41 42 43 44 45
p(x) 0.05 0.10 0.13 0.13 0.25 0.18 0.05 0.05 0.03 0.02 0.01
What is the probability that the event management company will be able to accommodate
all ticketed people who show up?
3. Prove that the following function defined by
f(x) = 1/5,   2 < x < 7
       0,     otherwise
can serve as a valid pdf for a random variable X.
4. The pdf of a continuous random variable Y is given by
f(y) = k/√y,   0 < y < 4
       0,      otherwise
Find the value of k. Also, find P (Y≥ 1)
6.10 REFERENCES
• Devore, J. L. (2015). Probability and Statistics for Engineering and the Sciences.
Cengage Learning.
• Freund, J. E., Miller, I., & Miller, M. (2004). John E. Freund's Mathematical Statistics:
With Applications. Pearson Education India.
• McClave, J. T., Benson, P. G., & Sincich, T. (2008). Statistics for business and
economics. Pearson Education.
6.11 SUGGESTED READINGS
• Rice, J. A. (2006). Mathematical statistics and data analysis. Cengage Learning.
• Larsen, R. J., & Marx, M. L. (2005). An introduction to mathematical statistics.
Prentice Hall.
LESSON 7
CUMULATIVE DISTRIBUTION FUNCTION, DENSITY FUNCTION, EXPECTED
VALUE, AND VARIANCE
STRUCTURE
7.1 Learning Objectives
7.2 Introduction
7.3 Cumulative Distribution Function
7.3.1 Properties of Cumulative Distribution Function
7.4 Cumulative Density Function
7.4.1 Properties of Cumulative Density Function
7.5 Expected Value
7.5.1 Properties of Expected value
7.6 Variance
7.6.1 Properties of Variance
7.7 Summary
7.8 Glossary
7.9 Answers to In-text Questions
7.10 Self-Assessment Questions
7.11 References
7.12 Suggested Readings
7.1 LEARNING OBJECTIVES
• To understand the need to evaluate the descriptive statistics of the probability
distribution function.
• To learn the formula and method to derive the expected value of the probability
distribution function.
• To compute and apply the expected value in different random experiments by deriving
the probability distribution functions.
• To learn the formula and method to derive the variance of the probability distribution
function.
• To compute and apply the variance in different random experiments by deriving the
probability distribution functions.
• To provide exposure to various useful properties of expected value and variance to the
students and their applications.
7.2 INTRODUCTION
In lesson 6, we discussed random variables, types of random variables, and
probability distributions. Once we have the probability distribution function of a random
variable, it is essential to evaluate its descriptive statistics, such as the expected value
and the variance. The population mean of a random variable is a measure of centre
for its distribution. The expected value is essentially a formula for this mean: as more and
more values of the random variable are collected through repeated trials of a random
experiment, the sample mean becomes closer and closer to the expected value. It is obtained by
summing the products of each value of the random variable and its associated probability over
all the values of the random variable.
Another important descriptive measure is the variance, which is a measure of dispersion. It
measures the spread of the distribution of a random variable and reflects the degree of
variability of the values of the random variable around the expected value. The variance of a
given probability distribution function is obtained by summing, across all the values of the
random variable, the product of the squared difference between each value and the expected
value and the probability associated with that value.
7.3 CUMULATIVE DISTRIBUTION FUNCTIONS (CDF)
The cumulative distribution function (cdf) of a discrete r.v. X with pmf f(x) is defined
for every number x by
F(x) = P (X ≤ x) = ∑_{y ≤ x} f(y)
For any number x, F(x) is the probability that the observed value of X will be at most x.
Let us compute the Cumulative Distribution Function for the toss of two coins.
S: Sample space    X: Random variable (r.v.)    f(x): pmf    F(x): CDF (cdf)
                   (no. of heads)
TT                 0                            ¼            ¼
HT, TH             1                            ½            ¾ = ¼ + ½
HH                 2                            ¼            1 = ¾ + ¼
The first column represents the elements in the sample space. The next column presents the
values of the random variable X, which is defined as the number of heads in the toss of two
coins. The values that the random variable X takes are denoted by x, and
these values are 0, 1 or 2. The third column gives the probability
associated with each value of the random variable X. The fourth column is obtained by adding,
or cumulating, all the probabilities up to and including each value of the random variable.
The Cumulative Distribution Function for the toss of two coins can be presented in the
following way.
       0,  if x < 0
F(x) = ¼,  if 0 ≤ x < 1
       ¾,  if 1 ≤ x < 2
       1,  if x ≥ 2
1. f(x) ≥ 0
The probability mass function is non-negative for each value of the random variable X.
2. ∑_{x ∈ D} f(x) = 1, so that F(x) = 1 at the largest value of X
This property signifies that the value of the cumulative distribution function at the last
value of the random variable for which it is defined is equal to 1. This is because,
in cumulating all the probabilities up to the last value of the random variable X for which
the pmf is defined, we exhaust all possible outcomes of the sample space, and therefore
the value of the CDF is 1, as the probability of the whole sample space is 1.
The graph of the above cumulative distribution function can be presented in figure 1 below.
The graph of the cumulative distribution function is a step function. For all values of the
random variable less than zero, the value of the cumulative distribution function is zero,
since the probability mass function is zero there. The cumulative
distribution function is defined on intervals, where the upper end point of each interval is
not included in that interval but in the next one. At each value of the random variable, the
corresponding probability is added, or cumulated, so the cumulative
distribution function looks like a ladder or steps moving upwards. At the last value of the
random variable, all the probabilities have been added and the
cumulative distribution function attains its maximum value of one. The value of the cdf
remains at one for all larger values of x.
Consider whether the next person buying a computer at a university bookstore buys a laptop or
a desktop model. Define
X = 1, if the customer purchases a laptop computer
    0, if the customer purchases a desktop computer
If 20% of all purchasers during a week select a laptop computer, then the pmf of X is
For X = 0, p (0) = P(X=0) = P (next customer purchases a desktop model) = 0.8
For X = 1, p (1) = P (X =1) = P (next customer purchases a laptop model) = 0.2
p(x) = P (X = x) = 0 for x ≠ 0, 1
For the above example of the next person buying either a laptop or a desktop model at a
university bookstore, the cumulative distribution function
can be derived in the following manner.
       0,    if x < 0
F(x) = 0.8,  if 0 ≤ x < 1
       1,    if x ≥ 1
The value of CDF remains zero for all negative values of x for which the probability
distribution function is zero, while the CDF takes a value of 0.8 between 0 and 1 and finally 1
for all values 1 and greater than 1.
The CDF is a non-decreasing function of x. This implies that for each greater
value of x, the value of the cumulative distribution function is at least as large.
If the probability distribution of a discrete random variable is given, the corresponding
distribution function can be derived.
The distribution function of the total number of heads obtained in four tosses of a balanced
coin can be obtained as below. For x = 0, 1, 2, 3, 4, the pmf is
f (0) = 1/16, f(1) = 4/16, f(2) = 6/16, f(3) = 4/16, and f(4) = 1/16
It follows that
F (0) = f (0) = 1/16
F (1) = f (0) + f(1) = 5/16
F (2) = f (0) + f(1) + f(2) = 11/16
F (3) = f(0) + f(1) + f(2) + f(3) = 15/16
F (4) = f (0) + f(1) + f(2) + f(3) + f (4) = 1
The properties of Distribution Function are satisfied by the above CDF.
1. F(- ∞ ) = 0 and F ( ∞ ) = 1
2. If a < b, then F (a) ≤ F (b) for any real numbers a and b.
Therefore, the distribution function is given by
       0      for x < 0
       1/16   for 0 ≤ x < 1
F(x) = 5/16   for 1 ≤ x < 2
       11/16  for 2 ≤ x < 3
       15/16  for 3 ≤ x < 4
       1      for x ≥ 4
The distribution function is defined not only for the values taken on by the given random
variable but for all real numbers. For instance,
F (1.7) = 5/16 and F (100) = 1, although the probabilities of getting “at most 1.7 heads” or “at
most 100 heads” in four tosses of a balanced coin may not be of any real significance.
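The step-function behaviour of the distribution function can be illustrated with a few lines of Python. This is a sketch for the four-toss example only; the names `pmf` and `cdf` are illustrative.

```python
from fractions import Fraction
from math import comb

# pmf of the number of heads in four tosses of a balanced coin: f(x) = C(4, x) / 16
pmf = {x: Fraction(comb(4, x), 16) for x in range(5)}


def cdf(x):
    """F(x) = P(X <= x): add f(y) over all possible values y that do not exceed x."""
    return sum(p for y, p in pmf.items() if y <= x)


# F is defined for every real number, not only for the values 0, 1, 2, 3, 4.
print(cdf(-1), cdf(0), cdf(1.7), cdf(3), cdf(100))
# prints: 0 1/16 5/16 15/16 1
```

Because the sum only picks up new terms when x passes one of the values 0 to 4, F stays constant between those values, which is exactly the ladder shape described earlier.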
If the range of a random variable X consists of the values x1 < x2 < x3 < x4 < ... < xn, then
f(x1) = F(x1) and f(xi) = F(xi) - F(x(i-1)) for i = 2, 3, ..., n.
       0.25   x = 0
f(x) = 0.10   x = 1
       0.30   x = 2
       0.35   x = 3
(A) Calculate the probability that at most 1 bulb is defective.
(B) Find the CDF.
Q2: Let X be the number of groups that attended the festival of the college.
X 0 1 2 3 4 5
P(X) 0.20 0.10 0.05 0.15 0.30 0.20
(a) Find the probability that at most two groups attended the festival.
(b) Find the probability that at least four groups attended the festival.
Q3: The cumulative distribution function of a random variable Y, evaluated at y, is the
probability that Y takes a value _____
(a) Equal to y
(b) Greater than y
(c) Less than or equal to y
(d) Zero
[Figure 2(a): Probability density function f(x). Figure 2(b): Cumulative density function F(x)]
Figure 2 (a) depicts the probability density function while figure 2(b) depicts the cumulative
density function.
7.4.1 Properties of Cumulative Density Function
There are certain properties of cumulative density function:
f (x) = dF(x) / dx
where the derivative exists.
The third property indicates that one can obtain the probability density function from the given
distribution or cumulative density function by simply differentiating the cumulative density
function.
CASE STUDY
In the case study presented in lesson 6, section 6.5.1, the random variable X has the
probability density given by
f(x) = k·e^(-3x)   for x > 0
       0           elsewhere
with k = 3. For x > 0, the cumulative density function is
F(x) = ∫_{-∞}^{x} f(t) dt = ∫_{0}^{x} 3·e^(-3t) dt = 1 - e^(-3x)
Since F(x) = 0 for x ≤ 0,
F(x) = 0             for x ≤ 0
     = 1 - e^(-3x)   for x > 0
Now, to determine the probability P (0.5 ≤ X ≤ 1), the cumulative density function F(x) can be
used:
P (0.5 ≤ X ≤ 1) = F(1) - F(0.5)
                = (1 - e^(-3)) - (1 - e^(-1.5))
                = 0.173
IN -TEXT QUESTIONS
Q.4 Given the following probability density function pdf
If the r.v. X has a set of possible values D and pmf f(x), then the expected value of any function
h(X), denoted by E[h(X)] or μ_h(X), is computed by
E[h(X)] = μ_h(X) = ∑_{x ∈ D} h(x)·f(x)
The expected value of h(X) is given by the sum of the products of h(x) and the corresponding
probability f(x), as shown in the above expression, where D is the set of possible values of X.
Example 1: Suppose there is a collection of 12 audio sets that includes 2 with white cords. If
three of the sets are chosen at random for shipment to a hotel, how many sets with white cords
can the shipper expect to send to the hotel?
First, we need to construct the probability distribution of X, the number of sets with white cords
shipped to the hotel, given by
f(x) = [C(2, x) × C(10, 3 - x)] / C(12, 3)   for x = 0, 1, 2
where C(n, r) denotes the number of ways of choosing r items out of n.
x      0      1      2
f(x)   6/11   9/22   1/22
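As a check on Example 1, the distribution and its expected value can be computed directly. This is a sketch under the assumption that the pmf is the one implied by the problem statement, f(x) = C(2, x)·C(10, 3 - x)/C(12, 3); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

# Assumed pmf: choose x of the 2 white-cord sets and 3 - x of the other 10 sets,
# out of all C(12, 3) ways of choosing 3 sets from 12.
pmf = {x: Fraction(comb(2, x) * comb(10, 3 - x), comb(12, 3)) for x in range(3)}

# E(X) = sum over x of x * f(x)
expected = sum(x * p for x, p in pmf.items())

print(pmf[0], pmf[1], pmf[2], expected)
```

Under this assumption the shipper can expect to send half a white-cord set on average, which is not itself a possible value of X; the expected value is a long-run average.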
E(b) = b
2. Expectation of the sum of the two random variables X and Y is equal to the sum of the
expectations of those random variables.
E (X+Y) = E (X) + E (Y)
3. The expected value of the ratio of two random variables is, in general, not equal to the
ratio of the expected values of those random variables.
E (X/ Y) ≠ E(X) / E(Y)
4. The expected value of the product of two dependent random variables is, in general, not
equal to the product of the expectations of those random variables.
E (XY) ≠ E(X) * E(Y)
However, if X and Y are independent random variables then
E (XY) = E(X) * E(Y)
Hint: For independent random variables, the joint probability mass function is equal to the
product of the individual probability mass functions of the two random variables for all values
of the variables.
5. The expected value of the square of X is, in general, not equal to the square of the expected
value of X
E (X²) ≠ [E(X)]²
6. If a is a constant, then
E (aX) = a E (X)
Let X have pmf f(x) and expected value E(X) = μₓ. Then the variance of X, denoted by V(X)
or σₓ², is
V(X) = σₓ² = ∑_{x ∈ D} (x - μₓ)²·f(x) = E[(X - μₓ)²]
The standard deviation (sd) of X is σₓ = √(σₓ²)
Let us consider the computation of the variance for the toss of two coins.
X: Random variable (r.v.)    f(x): pmf    x·f(x)         x²·f(x)
(no. of heads)
0                            ¼            0 * ¼ = 0      0 * ¼ = 0
1                            ½            1 * ½ = ½      1 * ½ = ½
2                            ¼            2 * ¼ = ½      4 * ¼ = 1
E (X) = ∑_{x ∈ D} x·f(x) = 1             E (X²) = ∑_{x ∈ D} x²·f(x) = 1.5
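The table can be turned into a short computation. The sketch below, with the illustrative helper `expectation`, evaluates the variance of the number of heads both from the definition E[(X - μₓ)²] and from the shortcut E(X²) - [E(X)]², which give the same value.

```python
from fractions import Fraction

# pmf of the number of heads in the toss of two fair coins
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def expectation(h, pmf):
    """E[h(X)] = sum of h(x) * f(x) over all possible values x."""
    return sum(h(x) * p for x, p in pmf.items())


mean = expectation(lambda x: x, pmf)                        # E(X)
second_moment = expectation(lambda x: x ** 2, pmf)          # E(X^2)
var_definition = expectation(lambda x: (x - mean) ** 2, pmf)
var_shortcut = second_moment - mean ** 2

print(mean, second_moment, var_definition, var_shortcut)
```

The shortcut form is usually quicker by hand because it reuses the two column totals of the table.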
A CASE STUDY
Calculate the variance of a random variable X that represents the number of points rolled with
a balanced die.
X 0 1 2 3 4 5 6
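V (X + Y) = V (X) + V (Y)
V (X - Y) = V (X) + V (Y)
(for independent random variables X and Y; the variances add even when the variables are
subtracted)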
4. If b is a constant then,
V (X + b) = V (X) + V (b)
          = V (X) + 0
          = V (X)
5. If a is a constant then,
V (aX) = a² V (X)
For example, V (5X) = 25 V (X)
6. If a and b are constants then,
V (aX + b) = a² V (X) + 0
For example, V (5X + 9) = 25 V (X)
7. If X and Y are independent random variables and a and b are constants then,
V (aX + bY) = a² V (X) + b² V (Y)
For example, V (3X + 5Y) = 9 V (X) + 25 V (Y)
8. The variance can be computed as
V (X) = E (X²) - [E (X)]²
where
E (X²) = ∑_{x ∈ D} x²·f(x)
E (X) = ∑_{x ∈ D} x·f(x)
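Properties 4 to 6 and 8 can be verified on a concrete distribution. The sketch below uses a balanced die with faces 1 to 6 as the example; the helpers `variance` and `transform` are hypothetical names introduced only for this illustration.

```python
from fractions import Fraction

# pmf of the number of points rolled with a balanced die (faces 1 to 6)
die = {x: Fraction(1, 6) for x in range(1, 7)}


def variance(pmf):
    """V(X) = E(X^2) - [E(X)]^2 (property 8)."""
    mean = sum(x * p for x, p in pmf.items())
    return sum(x * x * p for x, p in pmf.items()) - mean ** 2


def transform(pmf, a, b):
    """pmf of aX + b for a != 0: each value x moves to a*x + b with the same probability."""
    return {a * x + b: p for x, p in pmf.items()}


v = variance(die)                                 # 35/12
v_shifted = variance(transform(die, 1, 9))        # V(X + 9)
v_scaled = variance(transform(die, 5, 0))         # V(5X)
v_linear = variance(transform(die, 5, 9))         # V(5X + 9)

print(v, v_shifted, v_scaled, v_linear)
```

Adding a constant leaves the variance unchanged, while multiplying by 5 multiplies it by 25, in line with V(aX + b) = a² V(X).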
IN-TEXT QUESTION
Q.10 V(X) = 9, Find V(2X).
(A) 18
(B) 9
(C) 36
(D) 72
Q.11 If the standard deviation of a set of observations is 5 and if each observation is divided
by 5, then the new standard deviation is.
(A) 1
(B) 2
(C) 4
(D) 5
7.7 SUMMARY
The lesson presents the cumulative distribution function for both discrete and continuous
random variables. The properties of the cumulative distribution function have been discussed
with respect to both discrete and continuous random variables. The method
of deriving the probability density function from the cumulative density function by
differentiating the cumulative density function is explained. Further, descriptive statistics
such as the expected value and the variance are computed for probability distribution
functions. Several useful properties of the expected value and the variance are also
explained in the lesson.
7.8 GLOSSARY
1. Cumulative Distribution Function: The cumulative distribution function is another
way of defining the distribution of a discrete random variable. The cumulative
distribution function (cdf) of a discrete r.v. X with pmf p(x) is defined for every
number x by F(x) = P (X ≤ x).
2. Cumulative Density Function: The cumulative density function, also referred to as the
distribution function, is the cumulative function of the probability density for
continuous random variables.
3. Expected value of Distribution Function: Let X be a discrete rv with a set of possible
values D and pmf p(x). The expected value or mean value of X, denoted by E (X) or μₓ,
is the sum of the products of each value of the random variable and its associated
probability.
4. Variance of Distribution Function: The variance is the mean of the squared
deviations of the values of the random variable from the expected value or population
mean. The variance of the random variable X is denoted by σ² or Var (X). The
symbol σ represents the standard deviation, the square root of the variance.
7.9 ANSWERS TO IN-TEXT QUESTIONS
Q.4 A coin is tossed thrice. Let X denote the number of tails. Find its expectation and
variance.
Q.5 The probability density function of a random variable Y is given as below:
• Freund, J. E., Miller, I., & Miller, M. (2004). John E. Freund's Mathematical Statistics:
With Applications. Pearson Education India.
• McClave, J. T., Benson, P. G., & Sincich, T. (2008). Statistics for business and
economics. Pearson Education.
7.12 SUGGESTED READINGS
• Rice, J. A. (2006). Mathematical statistics and data analysis. Cengage Learning.
• Larsen, R. J., & Marx, M. L. (2005). An introduction to mathematical statistics.
Prentice Hall.
LESSON-8
DISCRETE DISTRIBUTION
STRUCTURE
8.1 Learning Objectives
8.2 Introduction
8.3 Probability distribution for discrete random variable
8.3.1 Uniform Distribution
8.3.2 Bernoulli Distribution
8.3.3 Poisson Distribution
8.3.4 Limiting Case of Binomial Distribution
8.3.5 Hyper Geometric Distribution
8.4 Summary
8.5 Answer to Intext Questions
8.6 Self Assessment Questions
8.7 References
8.1 LEARNING OBJECTIVES
(1) In this unit, you will learn about different kinds of discrete distributions.
(2) Uniform distributions, Bernoulli distributions and Bernoulli trials are discussed.
(3) Binomial distributions and Poisson distributions are discussed with some important
numerical examples.
(4) Waiting-time distributions, i.e., geometric and negative binomial distributions, and
hypergeometric distributions are discussed.
8.2 INTRODUCTION
In this unit we will study the different types of discrete distributions. We have studied the
discrete random variable in unit 3. Discrete random variables give rise to discrete probability
distributions: the possible values of a discrete random variable, along with their
probabilities, form the discrete probability distribution.
1. Uniform Distribution
2. Bernoulli Distribution
3. Binomial Distribution
4. Poisson Distribution
5. Limiting case of Binomial distribution
6. Hyper-geometric distribution
= 1- [0.0776 + n C1p1q 4
0 ; otherwise
where λ represents the average number of successes.
E(X) = V (X) = λ, i.e., the mean and the variance are both equal to λ.
For example:
The number of deaths in an area during a month.
The number of cars arriving at a parking lot during a given period of time.
The number of errors made by a typist per page.
The number of defective bulbs in a manufacturing unit.
The number of customers per hour at a shop.
Visits to a particular website, or e-mail messages sent to a particular address.
Accidents in an industrial facility.
Cosmic ray showers observed by astronomers at a particular observatory.
These are some of the examples where the Poisson distribution can be used.
For example, suppose the average number of customers arriving at a shop is 4 per hour. Find
the probability that during an hour (i) no customer arrives, (ii) 2 or more customers arrive
at the shop.
Let X be the number of customers arriving at the shop in an hour.
So, X follows a Poisson distribution with λ = 4.
0 ; otherwise
e- 4 ´ 40
f (X = 0) = = 0.01831
(i) 0!
(ii) f(X2)=1-f(X<2)
= 1 – [f (x = 0) + f (x = 1)]
𝑒 −4 41
= 1-[0.1831+ ]
1!
= 1 – [0.01831 + 0.07326]
= 0.90843
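The shop example can be checked numerically. The following is a minimal sketch using only the Python standard library; the function name `poisson_pmf` and the variable names are illustrative choices, not part of the lesson.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) when X follows a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 4  # average number of customers per hour, taken from the example

p_no_customer = poisson_pmf(0, lam)                              # part (i)
p_two_or_more = 1 - (poisson_pmf(0, lam) + poisson_pmf(1, lam))  # part (ii)

print(f"P(X = 0)  = {p_no_customer:.4f}")
print(f"P(X >= 2) = {p_two_or_more:.4f}")
```

Small differences in the last decimal place relative to a hand calculation come from rounding the intermediate terms.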
8.3.5 Limiting Case of Binomial Distribution
If the probability of success in the binomial distribution is very small and the number of trials
is large, then the binomial distribution can be approximated by the Poisson distribution. In such
an approximation the average number of successes is the mean of the binomial distribution, which
is np, i.e.
$$\lambda = np$$
Mathematically, if $n \to \infty$ and $p \to 0$ while np remains finite, the binomial distribution
tends to the Poisson distribution.
For example, the probability that a Covid vaccine is ineffective is 0.002. Determine the
probability that out of 1000 vaccinated individuals:
(i) exactly 2 will suffer Covid infection after being vaccinated.
(ii) more than 2 will suffer from infection.
Answer. Since p is 0.002, i.e., $p \to 0$, and n is large, i.e., 1000, the binomial distribution
is approximated by the Poisson distribution with $\lambda = np$.
$$\lambda = np = 0.002 \times 1000 = 2$$
(i) $$f(X = 2) = \frac{e^{-2}\,2^{2}}{2!} = 0.2707$$
(ii) $$f(X > 2) = 1 - f(X \le 2)$$
$$= 1 - [f(X = 0) + f(X = 1) + f(X = 2)]$$
$$= 1 - \left[\frac{e^{-2}\,2^{0}}{0!} + \frac{e^{-2}\,2^{1}}{1!} + \frac{e^{-2}\,2^{2}}{2!}\right]$$
$$= 1 - [0.1353 + 0.2707 + 0.2707]$$
$$= 0.3233$$
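To see how good the approximation is for the vaccine example, the exact binomial probability can be compared with the Poisson value. This is a small sketch with illustrative helper names (`binomial_pmf`, `poisson_pmf`); it assumes n = 1000 and p = 0.002 as in the example.

```python
import math

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability of x successes in n trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability of x successes with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

n, p = 1000, 0.002
lam = n * p  # Poisson mean used in the approximation

exact_2 = binomial_pmf(2, n, p)
poisson_2 = poisson_pmf(2, lam)
poisson_more_than_2 = 1 - sum(poisson_pmf(k, lam) for k in range(3))

print(f"Binomial P(X = 2): {exact_2:.4f}")
print(f"Poisson  P(X = 2): {poisson_2:.4f}")
print(f"Poisson  P(X > 2): {poisson_more_than_2:.4f}")
```

The two values of P(X = 2) agree to about three decimal places, which is why the Poisson form is preferred when n is large and p is small.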
X 0 1 2 3 4
F 25 68 45 12 5
For example: Let there be 5 economics and 10 commerce graduates. An organization requires
5 analysts, who are to be chosen from these economics and commerce graduates. Find the
probability of selecting exactly 3 economics graduates for this job.
K = 5 = number of economics graduates
N - K = 10 = number of commerce graduates
N = 15 = total number of graduates
x = number of economics graduates selected as analysts
n = 5 = total number of graduates selected as analysts
$$f(x) = \frac{{}^{K}C_{x} \times {}^{N-K}C_{n-x}}{{}^{N}C_{n}}$$
$$f(x = 3) = \frac{{}^{5}C_{3} \times {}^{10}C_{2}}{{}^{15}C_{5}} = \frac{10 \times 45}{3003}$$
$$= 0.1499$$
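The hypergeometric calculation above can be reproduced with `math.comb` from the Python standard library. The function name `hypergeom_pmf` and its argument names are illustrative; they follow the symbols K, N, n and x used in the example.

```python
import math

def hypergeom_pmf(x: int, N: int, K: int, n: int) -> float:
    """P(x successes) when n items are drawn without replacement
    from N items of which K are successes."""
    return math.comb(K, x) * math.comb(N - K, n - x) / math.comb(N, n)

# 5 economics and 10 commerce graduates, 5 analysts chosen, 3 from economics
p_three = hypergeom_pmf(3, N=15, K=5, n=5)
print(f"P(x = 3) = {p_three:.4f}")
```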
8.4 SUMMARY
A thorough knowledge of the discrete distributions helps to assess the level of uncertainty
and plan accordingly. The distributions, along with their probability mass function, mean and
variance, are shown in the following table.
Distribution | Probability mass function (PMF) | Mean | Variance
Uniform distribution | $f(X = x) = f(x) = \frac{1}{n}$, $x = 1, 2, 3, \ldots, n$ | $\frac{n+1}{2}$ | $\frac{n^{2}-1}{12}$
Poisson | $\frac{e^{-\lambda}\lambda^{x}}{x!}$ | $\lambda$ | $\lambda$
So, the binomial distribution is approximated by the Poisson distribution with $\lambda = np$.
$$\lambda = np = 0.00006 \times 100000 = 6$$
$$f(X = 1) = \frac{e^{-6}\,6^{1}}{1!} = 0.0149$$
So, the answer is 0.0149.
2. Bernoulli
3. small; large
$$4p = \frac{214}{155}$$
$$p = \frac{107}{310}; \quad q = \frac{203}{310}$$
x | P(X = x) | Expected frequency = 155 × P(X = x)
1 | ${}^{4}C_{1} \left(\frac{107}{310}\right)^{1} \left(\frac{203}{310}\right)^{3} = 0.3877$ | 60.1
2 | ${}^{4}C_{2} \left(\frac{107}{310}\right)^{2} \left(\frac{203}{310}\right)^{2} = 0.3065$ | 47.5
3 | ${}^{4}C_{3} \left(\frac{107}{310}\right)^{3} \left(\frac{203}{310}\right)^{1} = 0.1077$ | 16.7
Fitted frequencies:
X | 0 | 1 | 2 | 3 | 4
F | 28 | 60 | 48 | 17 | 2
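The fitting steps in this answer can be sketched in code. The sketch assumes that the observed frequencies are 25, 68, 45, 12 and 5 for X = 0, 1, 2, 3, 4 (the frequency table given earlier in this lesson, which sums to 155 and gives a total of 214 successes); the variable names are illustrative.

```python
import math

# Assumed observed frequencies for x = 0, 1, 2, 3, 4
observed = [25, 68, 45, 12, 5]
n = len(observed) - 1   # number of trials per observation (4)
total = sum(observed)   # total frequency (155)

# Estimate p from the sample mean, since mean = n * p for a binomial
sample_mean = sum(x * f for x, f in enumerate(observed)) / total
p = sample_mean / n
q = 1 - p

# Expected frequency for each x is total * P(X = x)
expected = [total * math.comb(n, x) * p ** x * q ** (n - x) for x in range(n + 1)]

for x, e in enumerate(expected):
    print(x, round(e, 2))
```

The expected frequencies are not whole numbers; rounding them so that they still add up to the total frequency gives a fitted frequency table.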
LESSON 9
CONTINUOUS DISTRIBUTION
STRUCTURE
9.1 Learning Objective
9.2 Introduction
9.2.1 Uniform Distribution
9.2.2 Exponential Distribution
9.2.3 Normal Distribution
9.2.4 Standard Normal Distribution
9.2.5 Central Limit Theorem
9.3 Summary
9.4 Answer to In Text Question
9.5 Self Assessment Question
9.6 References
9.1 LEARNING OBJECTIVES
In this chapter we will learn about various continuous distributions.
1. In the first section, the uniform and exponential distributions are discussed.
2. The normal distribution and its properties are discussed.
3. The standard normal distribution and its applications are discussed.
4. The central limit theorem and its application are discussed.
9.2 INTRODUCTION
In unit 3, you learned about continuous random variables and their associated functions. In this
chapter you will read about different kinds of continuous distributions. The normal distribution
is used extensively in economics and statistics. Several important applications of the normal
distribution are discussed with familiar examples. The central limit theorem is one of
the most celebrated theorems in statistics and is used extensively.
$$= \frac{140}{90} - \frac{70}{90} = \frac{70}{90} = 0.78$$
$$P(50 \le Y \le 130) = \int_{50}^{130} \frac{1}{90}\, dy = \frac{130}{90} - \frac{50}{90} = \frac{80}{90} = 0.89$$
9.2.2 EXPONENTIAL DISTRIBUTION
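A continuous non-negative random variable X is said to have an exponential distribution with
parameter $\lambda > 0$ if its pdf is given by
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
The exponential distribution is also known as the waiting-time distribution, where the waiting
parameter is $\lambda$.
So, $E(X) = \dfrac{1}{\lambda}$ and $V(X) = \dfrac{1}{\lambda^{2}}$.
Cumulative distribution function:
$$P(X \le x) = \int_{0}^{x} \lambda e^{-\lambda t}\, dt = \left[-e^{-\lambda t}\right]_{0}^{x} = 1 - e^{-\lambda x}$$
Hence,
$$P(X > x) = 1 - [1 - e^{-\lambda x}]$$
$$= e^{-\lambda x}$$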
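Example: A call center receives, on average, 4 calls per hour. What is the probability that the
next call arrives after ½ hour?
Solution: Here $\lambda = 4$.
$$P\left(X > \tfrac{1}{2}\right) = \int_{1/2}^{\infty} 4e^{-4x}\, dx$$
$$= \left[\frac{4e^{-4x}}{-4}\right]_{1/2}^{\infty}$$
$$= -e^{-4(\infty)} + e^{-4/2}$$
$$= -e^{-\infty} + e^{-2}$$
$$= 0.13533$$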
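The call-centre result follows directly from the survival formula P(X > x) = e^(-λx). A minimal sketch, with the illustrative function name `exponential_survival`:

```python
import math

def exponential_survival(x: float, lam: float) -> float:
    """P(X > x) for an exponential distribution with rate lam."""
    return math.exp(-lam * x)

lam = 4          # average calls per hour, from the example
wait = 0.5       # half an hour

p_wait_longer = exponential_survival(wait, lam)
mean_wait = 1 / lam   # E(X) = 1 / lambda, in hours

print(f"P(X > 0.5) = {p_wait_longer:.5f}")
print(f"Mean waiting time = {mean_wait} hours")
```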
9.2.3 Normal Distribution
This is the most important distribution. It was developed in 1733 by the French mathematician
De Moivre. The normal distribution is also called the Gaussian distribution, as the German
mathematician Carl Friedrich Gauss (1777-1855) derived its equation. It is a symmetric
bell-shaped curve.
A random variable X is called a normal random variable if its pdf is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$$
where the constants are π ≈ 3.14 and e ≈ 2.718, µ and σ are the two parameters of the
distribution, and x is a real number denoting a value of the continuous random variable of
interest.
Properties
• It is symmetric about the mean.
• The curve has points of inflexion at a distance of ±σ from the mean and, because of its
symmetry, it has a bell shape.
• The right and left tails of the curve extend infinitely without touching the horizontal x
axis.
• In a normal distribution Mean = Median = Mode.
• The probability that X lies between two points a and b is the area under the curve between
a and b.
Property
(2) In the standard normal curve, the area to the right of 0 and the area to the left of 0 are
each 0.5.
The normal curve is symmetric. So, the area between 0 and a particular point c and the area
between 0 and the point −c are the same.
$$P(-c \le z \le 0) = P(0 \le z \le c)$$
For example, the area between 0 and 1.45 is equal to the area between 0 and −1.45.
1. P( X 15)
2. P( 10< X<25)
3. P(12 X 36)
(III)
$$P(10 \le X \le 25) = P\left(\frac{10 - 30}{4} \le \frac{X - \mu}{\sigma} \le \frac{25 - 30}{4}\right)$$
$$= P(-5 \le Z \le -1.25)$$
$$= P(-5 \le Z \le 0) - P(-1.25 \le Z \le 0)$$
$$= 0.49999 - 0.3944$$
$$= 0.1056$$
(IV)
$$P(12 \le X \le 36) = P\left(\frac{12 - 30}{4} \le \frac{X - \mu}{\sigma} \le \frac{36 - 30}{4}\right)$$
$$= P(-4.5 \le Z \le 1.5)$$
$$= P(-4.5 \le Z \le 0) + P(0 \le Z \le 1.5)$$
$$= 0.4999 + 0.4332$$
$$= 0.9331$$
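The two table look-ups above can be cross-checked with `statistics.NormalDist` from the Python standard library (available from Python 3.8). The sketch assumes, as the standardisation steps indicate, that X is normal with mean 30 and standard deviation 4.

```python
from statistics import NormalDist

# X ~ N(mean 30, standard deviation 4), as used in the standardisation above
x_dist = NormalDist(mu=30, sigma=4)

p_10_to_25 = x_dist.cdf(25) - x_dist.cdf(10)
p_12_to_36 = x_dist.cdf(36) - x_dist.cdf(12)

print(f"P(10 <= X <= 25) = {p_10_to_25:.4f}")
print(f"P(12 <= X <= 36) = {p_12_to_36:.4f}")
```

The computed values agree with the table-based answers up to rounding in the last decimal place.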
INTEXT QUESTIONS
Ques. In normal distribution, 34% of the items are under 50 and 5% of items are over 70. Find
the mean and variance of the distribution.
Ques. Standard normal distribution has mean ………... and variance…………
SKEWNESS
When a distribution departs from symmetry, it is said to be a skewed distribution.
There are two types of skewed distribution, i.e., positively and negatively skewed distributions.
A positively skewed distribution is skewed to the right, i.e., it has a longer tail on the right
side. Similarly, a negatively skewed distribution is skewed to the left, i.e., it has a longer
tail on the left side.
KURTOSIS
The degree of peakedness is measured by the kurtosis. A distribution with a high peak is
characterized as leptokurtic.
A distribution with a low peak is characterized as platykurtic.
The normal distribution is mesokurtic, i.e., neither too peaked nor too flat.
So, the normal distribution is symmetric, i.e., neither positively nor negatively skewed, and
mesokurtic, i.e., neither too high peaked nor too low peaked.
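Rule: $E(X + Y) = E(X) + E(Y)$
$$\sum X_i = X_1 + X_2 + \ldots + X_n$$
$$E\left(\sum X_i\right) = E(X_1 + X_2 + \ldots + X_n)$$
$$E\left(\sum X_i\right) = E(X_1) + E(X_2) + \ldots + E(X_n)$$
$$E\left(\sum X_i\right) = \mu + \mu + \ldots + \mu = n\mu \qquad (1)$$
To find the variance of $\sum X_i$ (for independent $X_i$):
$$V\left(\sum X_i\right) = V(X_1 + X_2 + X_3 + \ldots + X_n)$$
$$V\left(\sum X_i\right) = V(X_1) + V(X_2) + V(X_3) + \ldots + V(X_n)$$
$$V\left(\sum X_i\right) = \sigma^{2} + \sigma^{2} + \ldots + \sigma^{2} = n\sigma^{2} \qquad (2)$$
$$z = \frac{\sum X_i - n\mu}{\sqrt{n\sigma^{2}}} \sim N(0, 1)$$
The sample mean is
$$\bar{X} = \frac{\sum X_i}{n}$$
Using $E\left(\sum X_i\right) = n\mu$ from (1):
$$E(\bar{X}) = E\left(\frac{\sum X_i}{n}\right) = \frac{1}{n} E\left(\sum X_i\right) = \frac{1}{n}\, n\mu = \mu$$
Rule: $V(aX) = a^{2} V(X)$
Using $V\left(\sum X_i\right) = n\sigma^{2}$ from (2):
$$V(\bar{X}) = V\left(\frac{\sum X_i}{n}\right) = \frac{V\left(\sum X_i\right)}{n^{2}} = \frac{n\sigma^{2}}{n^{2}} = \frac{\sigma^{2}}{n}$$
$$\bar{X} \sim N\left(\mu, \frac{\sigma^{2}}{n}\right)$$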
Ques. In a large population the distribution of a variable has a mean of 165 and standard
deviation 25 units. If a random sample of size 35 is chosen, find the approximate probability
that the sample mean lies between 162 and 170.
Solution: $X \sim N(165, 25^{2})$
where the sample size is n = 35.
We have to find the distribution of the sample mean:
$$\bar{X} \sim N\left(\mu, \frac{\sigma^{2}}{n}\right), \quad \text{since } \mu = 165,\ \sigma^{2} = 25^{2}$$
$$\bar{X} \sim N\left(165, \frac{25^{2}}{35}\right)$$
Note that $\frac{25^{2}}{35}$, i.e., $\frac{\sigma^{2}}{n}$, is the variance.
The standard deviation is $\sqrt{\frac{\sigma^{2}}{n}} = \frac{\sigma}{\sqrt{n}} = \frac{25}{\sqrt{35}} = 4.23$.
$$P(162 \le \bar{X} \le 170) = P\left(\frac{162 - 165}{4.23} \le z \le \frac{170 - 165}{4.23}\right)$$
$$= P(-0.71 \le z \le 1.18)$$
$$= P(0 \le z \le 1.18) + P(0 \le z \le 0.71)$$
as $P(-0.71 \le z \le 0) = P(0 \le z \le 0.71)$
$$= 0.3810 + 0.2611$$
$$= 0.6421$$
As the sample size increases, the distribution of $\bar{X}$ tends to the normal distribution,
whatever the distribution of the population. As a rule of thumb, the sample size must be at
least 30, i.e., $n \ge 30$, for the normal approximation to be used. As the sample size
increases, even the sample mean of a discrete distribution is approximately normally distributed.
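The sample-mean question above (mean 165, standard deviation 25, n = 35) can be checked with the standard library. This is a sketch under the normal approximation that the central limit theorem provides; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 165, 25, 35

# Standard deviation of the sample mean (the standard error)
standard_error = sigma / math.sqrt(n)

# Approximate distribution of the sample mean under the central limit theorem
sample_mean_dist = NormalDist(mu=mu, sigma=standard_error)
p_between = sample_mean_dist.cdf(170) - sample_mean_dist.cdf(162)

print(f"Standard error = {standard_error:.4f}")
print(f"P(162 <= mean <= 170) = {p_between:.4f}")
```

Because the code does not round the z-values to two decimals, its result can differ from a table-based answer in the third decimal place.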
INTEXT QUESTIONS
Q. Consider a random sample of size 30 taken from a normal distribution with mean 60
and variance 25. Let the sample mean be denoted by $\bar{X}$. Calculate the probability
that $\bar{X}$ assumes a value greater than 62.
LESSON 10
STRUCTURE
10.1 Learning Objective
10.2 Introduction
10.3 Joint probability mass function
10.3.1 Conditional probability distributions
10.3.2 Independence of random variables
10.3.3 Marginal probability mass functions
10.3.4. Expectations of probability mass functions
10.4 Continuous random variables
10.4.1 Marginal probability density functions
10.4.2 Expected value of a probability density function
10.4.3 Conditional probability distributions
10.5 Summary
10.6 Glossary
10.7 Answers to In-text Questions
10.8 Self-Assessment Questions
10.9 References
10.10 Suggested Readings
(c) $P[(X, Y) \in A] = \sum_{(x, y) \in A} P(x, y)$
It must be noted that conditions (a) and (b) are required for P( x, y ) to be a valid joint PMF.
Example-1
Consider two random variables X and Y with joint PMF as shown in the table below:
Y=0 Y=1 Y=2
X=0 1/6 1/4 1/8
X=1 1/8 1/6 1/6
(i) P( X = 0, Y 1)
(ii) P(Y = 0, X 1)
Solution:
Example-2
A function f is given by $f(x, y) = cxy$ for x = 1, 2, 3; y = 1, 2, 3.
Determine the value of c for which the above function f(x, y) is a valid p.m.f.
Solution:
From the question given,
$$f(x, y) = cxy$$
where x = 1, 2, 3 and y = 1, 2, 3.
There are 9 possible pairs of X and Y, namely (1,1), (1,2), (1,3), (2,1), (2,2), (2,3),
(3,1), (3,2) and (3,3). The probabilities associated with each of the pairs are:
f(1,1) = c(1)(1) = c
f(1,2) = 2c, f(1,3) = 3c, f(2,1) = 2c
f(2,2) = 4c, f(2,3) = 6c, f(3,1) = 3c
f(3,2) = 6c, f(3,3) = 9c
For f(x, y) to be a valid joint pmf,
$$\sum_{x} \sum_{y} f(x, y) = 1$$
Hence,
$$\sum_{x=1}^{3} \sum_{y=1}^{3} f(x, y) = c + 2c + 3c + 2c + 4c + 6c + 3c + 6c + 9c = 1$$
$$36c = 1$$
$$c = \frac{1}{36}$$
Thus, for $c = \frac{1}{36}$ the given function is a valid probability mass function.
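The normalising constant in Example-2 can be found mechanically by summing the unnormalised weights xy over the support. A minimal sketch using exact fractions; the names `support`, `total_weight` and `pmf` are illustrative.

```python
from fractions import Fraction

# Support of (X, Y): all pairs with x, y in {1, 2, 3}
support = [(x, y) for x in (1, 2, 3) for y in (1, 2, 3)]

# f(x, y) = c * x * y must sum to 1, so c = 1 / (sum of x * y)
total_weight = sum(x * y for x, y in support)
c = Fraction(1, total_weight)

# Build the normalised joint PMF and confirm that it sums to 1
pmf = {(x, y): c * x * y for x, y in support}
print("c =", c)
print("sum of probabilities =", sum(pmf.values()))
```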
where C , D R .
For discrete random variables X and Y, the conditional PMFs of X given Y and of Y given X
are, respectively, given by
$$P_{X|Y}(x_i \mid y_j) = \frac{P_{XY}(x_i, y_j)}{P_Y(y_j)}$$
$$P_{Y|X}(y_j \mid x_i) = \frac{P_{XY}(x_i, y_j)}{P_X(x_i)}$$
for any $x_i \in R_X$ and $y_j \in R_Y$.
Example-3
Consider two random variables X and Y with joint PMF as shown in the table below
Y=2 Y=4 Y=5
X=1 1/12 1/24 1/24
X=2 1/6 1/12 1/8
X=3 1/4 1/8 1/12
Find:
(a) $P(X \le 2, Y \le 4)$
(b) $P(Y = 2 \mid X = 1)$
Solution:
(a) $$P(X \le 2, Y \le 4) = P_{XY}(1, 2) + P_{XY}(1, 4) + P_{XY}(2, 2) + P_{XY}(2, 4)$$
$$= \frac{1}{12} + \frac{1}{24} + \frac{1}{6} + \frac{1}{12} = \frac{3}{8}$$
(b) $$P(Y = 2 \mid X = 1) = \frac{P(X = 1, Y = 2)}{P(X = 1)} = \frac{P_{XY}(1, 2)}{P_X(1)} = \frac{1/12}{1/6} = \frac{1}{2} \quad \text{Ans.}$$
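Two discrete random variables X and Y are independent if
$$P_{XY}(x_i, y_j) = P_X(x_i)\, P_Y(y_j)$$
for all $x_i \in R_X$ and for all $y_j \in R_Y$.
$$P(X = 2, Y = 2) = \frac{1}{6}$$
$$P(X = 2)\, P(Y = 2) = \frac{3}{8} \times \frac{1}{2} = \frac{3}{16}$$
$$\frac{1}{6} \ne \frac{3}{16}$$
X and Y are not independent.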
If (X, Y) are discrete random variables, then the marginal probability of one variable is the
probability that it takes a given value, irrespective of the value taken by the other variable.
The marginal probability mass function of $X_i$ is obtained from the joint PMF as shown below:
$$P_{X_i}(x) = \sum_{(x_1, \ldots, x_k) \in R_X :\ x_i = x} P_X(x_1, x_2, \ldots, x_k)$$
In words, the marginal PMF of $X_i$ at the point x is obtained by summing the joint PMF
$P_X$ over all the vectors that belong to $R_X$ and whose i-th component is equal to x.
Example-5
Carrying forward from example 3, find the marginal PMFs of X and Y.
Solution
RX = {1, 2,3}, RY = {2, 4,5}
Marginal PMFs are given by
$$P_X(x) = \begin{cases} \frac{1}{6}, & \text{for } X = 1 \\ \frac{3}{8}, & \text{for } X = 2 \\ \frac{11}{24}, & \text{for } X = 3 \\ 0, & \text{otherwise} \end{cases}$$
$$P_Y(y) = \begin{cases} \frac{1}{2}, & \text{for } Y = 2 \\ \frac{1}{4}, & \text{for } Y = 4 \\ \frac{1}{4}, & \text{for } Y = 5 \\ 0, & \text{otherwise} \end{cases}$$
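The marginal PMFs, a conditional probability and the independence check for the joint table of Example-3 can all be computed from one dictionary. This is a sketch with exact fractions; the helper name `marginal` and the dictionary layout are illustrative choices.

```python
from fractions import Fraction

# Joint PMF of Example-3: keys are (x, y) pairs
joint = {
    (1, 2): Fraction(1, 12), (1, 4): Fraction(1, 24), (1, 5): Fraction(1, 24),
    (2, 2): Fraction(1, 6),  (2, 4): Fraction(1, 12), (2, 5): Fraction(1, 8),
    (3, 2): Fraction(1, 4),  (3, 4): Fraction(1, 8),  (3, 5): Fraction(1, 12),
}

def marginal(joint_pmf, axis):
    """Sum the joint PMF over the other variable (axis 0 for X, 1 for Y)."""
    result = {}
    for pair, prob in joint_pmf.items():
        key = pair[axis]
        result[key] = result.get(key, Fraction(0)) + prob
    return result

p_x = marginal(joint, 0)
p_y = marginal(joint, 1)

# Conditional probability P(Y = 2 | X = 1)
p_y2_given_x1 = joint[(1, 2)] / p_x[1]

# X and Y are independent only if the joint PMF factorises for every pair
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for (x, y) in joint)

print("P_X:", p_x)
print("P_Y:", p_y)
print("P(Y=2 | X=1) =", p_y2_given_x1)
print("Independent:", independent)
```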
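Let X and Y be jointly distributed discrete random variables with probability mass function
P(x, y). Then the expected value of a function g(X, Y) is given by
$$E[g(X, Y)] = \sum_{x} \sum_{y} g(x, y)\, P(x, y)$$
Example-6
Find E(XY) for the data given in Example-3.
Solution:
$$E(XY) = \sum_{x} \sum_{y} xy\, P(x, y)$$
$$= \left(1 \times 2 \times \tfrac{1}{12}\right) + \left(1 \times 4 \times \tfrac{1}{24}\right) + \left(1 \times 5 \times \tfrac{1}{24}\right) + \left(2 \times 2 \times \tfrac{1}{6}\right) + \left(2 \times 4 \times \tfrac{1}{12}\right)$$
$$+ \left(2 \times 5 \times \tfrac{1}{8}\right) + \left(3 \times 2 \times \tfrac{1}{4}\right) + \left(3 \times 4 \times \tfrac{1}{8}\right) + \left(3 \times 5 \times \tfrac{1}{12}\right)$$
$$= \frac{177}{24} = 7.38$$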
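The expectation of a function of two discrete variables is a weighted sum over the joint table. The sketch below computes E(XY) for the joint PMF with x in {1, 2, 3} and y in {2, 4, 5} used in the calculation above; the helper name `expectation` is illustrative.

```python
from fractions import Fraction

# Joint PMF with x in {1, 2, 3} and y in {2, 4, 5}
joint = {
    (1, 2): Fraction(1, 12), (1, 4): Fraction(1, 24), (1, 5): Fraction(1, 24),
    (2, 2): Fraction(1, 6),  (2, 4): Fraction(1, 12), (2, 5): Fraction(1, 8),
    (3, 2): Fraction(1, 4),  (3, 4): Fraction(1, 8),  (3, 5): Fraction(1, 12),
}

def expectation(g, joint_pmf):
    """E[g(X, Y)] = sum over all (x, y) of g(x, y) * P(x, y)."""
    return sum(g(x, y) * prob for (x, y), prob in joint_pmf.items())

e_xy = expectation(lambda x, y: x * y, joint)
print("E(XY) =", e_xy, "=", float(e_xy))
```

The same helper gives E(X) or E(Y) by passing `lambda x, y: x` or `lambda x, y: y`.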
IN – TEXT QUESTIONS
1. Let $U \in \{0, 1\}$ and $V \in \{0, 1\}$ be two independent binary variables. If P(U = 0) = p
and P(V = 0) = q, then $P(U + V \ge 1)$ is
(a) pq + (1 − p)(1 − q)
(b) pq
(c) p (1 − q)
(d) 1 − pq
2. If a variable can take certain integer values between two given points, then it is called –
(a) Continuous random variable
(b) Discrete random variable
(c) Irregular random variable
(d) Uncertain random variable
3. If E (U ) = 2 and E (V ) = 4 then E (U − V ) = ?
(a) 2
(b) 6
(c) 0
(d) Insufficient data
4. Height is a discrete variable (T / F)
5. If X and Y are two events associated with the same sample space of a random experiment. then
P( X | Y ) is given by
(c) P( X Y ) / P(Y )
(d) P ( X Y ) / P( X )
The probability that the observed value of a continuous random variable X lies in a
one-dimensional set A is obtained by integrating the probability density function (PDF) f(x)
over the set A.
Similarly, the probability that the pair (X, Y) of continuous random variables falls in a
two-dimensional set A is obtained by integrating the joint PDF.
The joint density function is a piecewise continuous function of two variables f(x, y), such
that for any "reasonable" two-dimensional set A
$$P[(X, Y) \in A] = \iint_{A} f(x, y)\, dx\, dy$$
Definition: Let X and Y be continuous random variables. A joint density function f(x, y) for
these two variables is a function satisfying
(a) $f(x, y) \ge 0$
and
(b) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1$
Example-7
The joint PDF of (X, Y) is given by
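$$f(x, y) = \begin{cases} \frac{6}{5}(x + y^{2}), & 0 \le x \le 1,\ 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
(a) For f(x, y) to be a legitimate PDF, it must satisfy
(i) $f(x, y) \ge 0$
(ii) $\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy = 1$
The first condition is fulfilled, as $f(x, y) \ge 0$. For the verification of the second condition:
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x, y)\, dx\, dy = \int_{0}^{1} \int_{0}^{1} \frac{6}{5}(x + y^{2})\, dx\, dy$$
$$= \int_{0}^{1} \int_{0}^{1} \frac{6}{5} x\, dx\, dy + \int_{0}^{1} \int_{0}^{1} \frac{6}{5} y^{2}\, dx\, dy$$
$$= \int_{0}^{1} \frac{6}{5} x\, dx + \int_{0}^{1} \frac{6}{5} y^{2}\, dy$$
$$= \frac{3}{5} + \frac{2}{5} = 1$$
(b) $$P\left(0 \le X \le \tfrac{1}{4},\ 0 \le Y \le \tfrac{1}{4}\right) = \int_{0}^{1/4} \int_{0}^{1/4} \frac{6}{5}(x + y^{2})\, dx\, dy$$
$$= \int_{0}^{1/4} \int_{0}^{1/4} \frac{6}{5} x\, dx\, dy + \int_{0}^{1/4} \int_{0}^{1/4} \frac{6}{5} y^{2}\, dx\, dy$$
$$= \frac{6}{20} \left[\frac{x^{2}}{2}\right]_{x=0}^{x=1/4} + \frac{6}{20} \left[\frac{y^{3}}{3}\right]_{y=0}^{y=1/4}$$
$$= \frac{7}{640} \quad \text{Ans.}$$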
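Both integrals in Example-7 can be approximated numerically as a check. The sketch below uses a simple midpoint rule on a grid; this is a generic numerical technique chosen for illustration, not the method of the lesson, and the function name `double_integral` is hypothetical.

```python
def double_integral(f, x_lo, x_hi, y_lo, y_hi, n=400):
    """Approximate the integral of f(x, y) over a rectangle
    with the midpoint rule on an n-by-n grid."""
    hx = (x_hi - x_lo) / n
    hy = (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            total += f(x, y)
    return total * hx * hy

def joint_pdf(x, y):
    """Joint PDF of Example-7 on the unit square."""
    return 6 / 5 * (x + y ** 2)

total_probability = double_integral(joint_pdf, 0, 1, 0, 1)
corner_probability = double_integral(joint_pdf, 0, 0.25, 0, 0.25)

print(f"Total probability      = {total_probability:.6f}")
print(f"P(X <= 1/4, Y <= 1/4)  = {corner_probability:.6f}")
print(f"7/640                  = {7 / 640:.6f}")
```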
Example-8
Consider two continuous random variables X and Y with joint p.d.f.
$$f(x, y) = \begin{cases} \frac{2}{81} x^{2} y, & 0 < x < K,\ 0 < y < K \\ 0, & \text{otherwise} \end{cases}$$
b) $$P(X > 3Y) = \int_{0}^{3} \left(\int_{0}^{x/3} \frac{2}{81} x^{2} y\, dy\right) dx$$
$$= \int_{0}^{3} \frac{1}{729} x^{4}\, dx$$
$$= \frac{1}{15}$$
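$$f_X(x) = \int_{-\infty}^{\infty} f(x, y)\, dy = \int_{0}^{1} \frac{6}{5}(x + y^{2})\, dy = \frac{6x}{5} + \frac{2}{5}$$
$$f_X(x) = \begin{cases} \frac{6}{5} x + \frac{2}{5}, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
$$f_Y(y) = \int_{-\infty}^{\infty} f(x, y)\, dx = \int_{0}^{1} \frac{6}{5}(x + y^{2})\, dx = \frac{6}{5} y^{2} + \frac{3}{5}$$
$$f_Y(y) = \begin{cases} \frac{6}{5} y^{2} + \frac{3}{5}, & \text{for } 0 \le y \le 1 \\ 0, & \text{otherwise} \end{cases}$$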
10.4.2 Expected value of a PDF
Let X and Y be continuous random variables with joint PDF f(x, y). Let g be some function,
then
$$E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f(x, y)\, dx\, dy$$
Example-10
A thread is 1 mm long, and two points are chosen uniformly and independently along
the thread. Find the expected distance between these two points.
Solution
Let U and V be the two points that are chosen. The joint PDF of U and V is
$$f(U, V) = \begin{cases} 1, & 0 \le U, V \le 1 \\ 0, & \text{otherwise} \end{cases}$$
$$E[|U - V|] = \int_{0}^{1} \int_{0}^{1} |U - V|\, dU\, dV$$
$$= \int_{0}^{1} \int_{V}^{1} (U - V)\, dU\, dV + \int_{0}^{1} \int_{0}^{V} (V - U)\, dU\, dV$$
$$= \frac{1}{6} + \frac{1}{6}$$
$$E[|U - V|] = \frac{1}{3}$$
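The expected distance of 1/3 can be checked by averaging |u − v| over a fine grid of points in the unit square. This grid average is a generic numerical approximation used here for illustration; the function name `mean_abs_difference` is hypothetical.

```python
def mean_abs_difference(n: int = 400) -> float:
    """Approximate E|U - V| for independent uniform(0, 1) variables
    by averaging |u - v| over the midpoints of an n-by-n grid."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h
        for j in range(n):
            v = (j + 0.5) * h
            total += abs(u - v)
    # The joint density is 1 on the unit square, so each cell has weight h * h
    return total * h * h

expected_distance = mean_abs_difference()
print(f"E|U - V| is approximately {expected_distance:.5f}")
```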
Example 11
The joint PDF of X and Y is given by
3
( x + y) 0 x 1
f ( x, y ) = 7
0 otherwise
2
find the expected value of X / Y .
Solution
2 1
3x( x + y )
E[ X , Y ] =
2
dxdy
1 0 7 y2
2
3 1 1
= 2 + dy
7 1 3y y
3
E[ X , Y 2 ] =
28 Ans.
$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x\, f_{X|Y}(x \mid y)\, dx$$
Similarly, one can define the conditional PDF and the conditional expected value of Y given
X = x by interchanging the roles of X and Y.
Properties of Conditional PDFs
(1) The conditional PDF for X, given Y = y, is a valid PDF, as two conditions are satisfied:
(a) $f_{X|Y}(x \mid y) \ge 0$
(b) $\int_{-\infty}^{\infty} f_{X|Y}(x \mid y)\, dx = 1$
(2) In general, the conditional distribution of X given Y does not equal the conditional
distribution of Y given X, i.e. $f_{X|Y}(x \mid y) \ne f_{Y|X}(y \mid x)$.
Example 12
If the joint PDF of U and V is given by
$$f(U, V) = \begin{cases} \frac{2}{3}(U + 2V), & 0 \le U \le 1,\ 0 \le V \le 1 \\ 0, & \text{otherwise} \end{cases}$$
so that
$$f\left(U \mid \tfrac{1}{2}\right) = \begin{cases} \frac{2}{3}(U + 1), & 0 \le U \le 1 \\ 0, & \text{otherwise} \end{cases}$$
then
$$E\left(U \mid \tfrac{1}{2}\right) = \int_{0}^{1} \frac{2}{3}\, U (U + 1)\, dU = \frac{5}{9}$$
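The conditional expectation in Example 12 can be verified numerically. The sketch assumes that the joint density is (2/3)(u + 2v) on the unit square, which is the form consistent with the conditional density (2/3)(U + 1) and the answer 5/9 quoted in the example; the helper names are illustrative and the midpoint rule is a generic numerical choice.

```python
def integrate(g, lo, hi, n=2000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

def joint_pdf(u, v):
    """Assumed joint PDF: (2/3)(u + 2v) on the unit square, 0 elsewhere."""
    if 0 <= u <= 1 and 0 <= v <= 1:
        return 2 / 3 * (u + 2 * v)
    return 0.0

def conditional_mean_u(v):
    """E[U | V = v] = integral of u * f(u, v) du divided by the marginal f_V(v)."""
    marginal_v = integrate(lambda u: joint_pdf(u, v), 0, 1)
    return integrate(lambda u: u * joint_pdf(u, v), 0, 1) / marginal_v

e_u_given_half = conditional_mean_u(0.5)
print(f"E(U | V = 1/2) is approximately {e_u_given_half:.5f}")
print(f"5/9 = {5 / 9:.5f}")
```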
IN TEXT QUESTIONS
(a) a = 0.5, b = 1
(b) a = 1, b = 4
(c) a = 1, b = −1
(d) a = 0, b = 0
11. What are the two important conditions that must be satisfied for f(x, y) to be a
legitimate PDF?
12. When does the conditional density function reduce to the marginal density
function?
(a) Only if the random variables exhibit statistical dependence.
(b) Only if the random variables exhibit statistical independence.
(c) Only if the random variables exhibit deviation from their mean values.
A joint probability distribution refers to the combined probability distribution of more
than one random variable. These variables may be discrete or continuous. The marginal probability
distribution of one variable is obtained by summing (or, in the continuous case, integrating)
the joint distribution over all values of the other variable. In the case of discrete
variables, P(x, y) must satisfy the following conditions to be a valid joint probability mass
function:
(a) $0 \le P(x, y) \le 1$
(b) $\sum_{x} \sum_{y} P(x, y) = 1$
10.6 GLOSSARY
Conditional Probability: a measure of the probability of an event occurring given that another
event has already occurred.
Independence of Random Variables: X and Y are independent if $P_{XY}(x, y) = P_X(x)\, P_Y(y)$
for all x, y.
Marginal Probability Density Function: obtained by integrating the joint PDF over all values
of the other variable.
Expected Value of a PDF: $E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y)\, f(x, y)\, dx\, dy$
1. d 8. a
2. b 9. a
3. a 10. b
4. False 11. Refer to example 5
5. a 12. b
6. c 13. a) Yes
7. a b). Yes
c) Yes
1. A fair coin is tossed 4 times. Let the random variable X denote the number of heads in
the first 3 tosses and let the random variable Y denote the number of heads in the last 3
tosses. Answer the following.
(a) What is the joint PMF of X and Y?
(b) What is the probability of 2 or 3 heads appearing in the first three tosses and 1 or 2
heads appearing in the last three tosses?
2. Let X and Y be random variables with joint PDF
$$f_{XY}(x, y) = \begin{cases} \frac{1}{4}, & -1 \le x, y \le 1 \\ 0, & \text{otherwise} \end{cases}$$
Find
(a) P( X + Y 1)
2 2
(b) P(2 X − Y 0)
3. Let X and Y be two jointly distributed continuous random variable with joint PDF
$$f_{X,Y}(x, y) = \begin{cases} 6xy, & 0 \le x \le 1,\ 0 \le y \le \sqrt{x} \\ 0, & \text{otherwise} \end{cases}$$
10.9 REFERENCES
Devore J. L. (2012). Probability and statistics for engineering and the sciences (8th ed.; First
Indian reprint 2012). Brooks/Cole Cengage Learning.
Rice J. A. (2007). Mathematical statistics and data analysis (3rd ed.). Thomson/Brooks/Cole.
Johnson R. A. & Pearson Education. (2017). Miller & Freund’s probability and statistics for
engineers (Ninth edition Global). Pearson Education.
Miller, I., Miller, M. (2017). J. Freund's Mathematical Statistics with Application, 8th ed.,
Pearson
Hogg R. V. Tanis E. A. & Zimmerman D. L. (2021). Probability and statistical inference (10th
Edition). Pearson.
James McClave, P. George Benson, Terry Sincich (2017), Statistics for Business and
Economics, Pearson Publication
10.10 SUGGESTED READINGS
LESSON 11
CORRELATION AND COVARIANCE
STRUCTURE
11.1. Learning objectives
11.2. Introduction
11.3. Covariance
11.4. Method of calculating covariance
11.5. Correlation
11.5.1. Positive and Negative Correlation
11.5.2. Linear and Non-Linear Correlation
11.5.3. Simple and Multiple Correlation
11.6. Covariance vs correlation
11.7. Methods of calculating correlation
11.7.1. Scatter diagram
11.7.2. Karl Pearson’s coefficient of correlation
11.7.3. Spearman’s rank correlation
11.8. Glossary
11.9. Summary
11.10. Answers to the in-text Questions
11.11 Self-Assessment Questions
11.12. References
11.13. Suggested Readings
11.1. LEARNING OBJECTIVES
By the end of the unit, students will be able to understand the following:
• Difference between covariance and correlation
• Method of calculating covariance
• Types of correlation
• Methods of calculating correlation
11.2 INTRODUCTION
In the previous units, you came across problems dealing with a single variable, such as the
marks, weight or height of students in a classroom, for which you used statistical measures
of central tendency and dispersion such as the mean, median, mode and standard deviation. All
these measures focus on understanding each variable in a data set independently. However,
in the real world, we often have to analyse not a single variable but a number of variables at
the same time. In such a situation, the basic questions that come to mind are: Is there
any relationship between the two or more variables? If there is a relationship, what kind of
relationship is it? How can we detect the presence of such a relationship between the
variables? What is the strength of such a relationship? The objective of this unit is to
answer these questions, which arise whenever we deal with two or more variables
simultaneously.
11.3. COVARIANCE
Covariance is a statistical measure of the relationship between two variables. In other
words, it shows how two variables change together. Suppose that in your classroom there
are students with different heights and weights, and you want to know whether there
is any relationship between the height and weight of students, that is, whether the weight
of students varies together with their height. In such a case, we can use
covariance to understand the relationship between the two variables, height and weight.
The formula for covariance is given by:
$$Cov(X, Y) = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$$
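X | Y | $(X - \bar{X})$ | $(Y - \bar{Y})$ | $(X - \bar{X})(Y - \bar{Y})$
65 | 73 | 5 | 13 | 65
60 | 82 | 0 | 22 | 0
70 | 50 | 10 | -10 | -100
55 | 40 | -5 | -20 | 100
50 | 55 | -10 | -5 | 50
ΣX = 300 | ΣY = 300 | | | 115
$$\bar{X} = \frac{\sum X}{N} = \frac{300}{5} = 60 \quad \text{and} \quad \bar{Y} = \frac{\sum Y}{N} = \frac{300}{5} = 60$$
Using $Cov(X, Y) = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{N}$,
we have $Cov(X, Y) = \dfrac{115}{5} = 23$.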
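The covariance table above can be reproduced in a few lines. This is a minimal sketch of the formula Cov(X, Y) = Σ(X − X̄)(Y − Ȳ)/N; the function name `covariance` is illustrative, and the divisor N (rather than N − 1) follows the formula used in this unit.

```python
def covariance(xs, ys):
    """Covariance with divisor N: sum of products of deviations divided by N."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

x = [65, 60, 70, 55, 50]
y = [73, 82, 50, 40, 55]

cov_xy = covariance(x, y)
print("Cov(X, Y) =", cov_xy)
```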
Q.3. Find the covariance between X and Y from the following table.
X 100 150 175 225 250
Y 700 600 500 400 800
11.4. CORRELATION
In the earlier example, where we examined the relationship between the weight and height of
students, we used covariance and found a positive covariance between the two
variables. A positive covariance suggests that the variables move in the same direction, i.e.
when the height of a student increases, weight tends to rise as well. However, covariance
could not tell us how closely weight and height move together.
Therefore, covariance tells us the direction of the relationship between two variables.
However, it fails to determine the strength of such a relationship, because its magnitude
depends on the units of measurement. In other words, covariance does not tell us how closely
two variables are related to each other. Thus, in order to determine the strength of the
relationship between two variables, we use another measure known as correlation.
Correlation is defined as the degree of association between two variables. In simple words, it
explains how closely two variables are related to each other. In fact, the coefficient of
correlation is the covariance between two variables expressed in standardized form, i.e.
divided by the product of their standard deviations.
Correlation is an important statistical measure which helps in determining changes in one
variable vis-à-vis another variable. For example, according to the law of demand, the
quantity demanded is inversely related to the price of a commodity, given that all other things
are constant. Similarly, Keynes's psychological law of consumption says that an
increase in income leads to an increase in consumption, but by less than the increase in
income. If we need to find out how closely consumption moves with income, we can use the
correlation coefficient to measure the relationship between income and consumption.
However, it is important to distinguish correlation from causation. Correlation simply informs
us about how closely one variable varies with changes in another variable. It
does not necessarily mean causation. That is, correlation does not tell us anything about the
cause-and-effect relationship between two variables; it only gives an understanding
of the strength of the relationship between them. For example, in the following
table, we have information regarding the demand and price of a commodity.
In this case, there is a perfect negative correlation between the demand and price. However,
this does not imply that a decrease in price causes demand to rise. It only describes the
inverse relationship between the price and quantity demanded. In order to study the
cause-and-effect relationship, we need to use more advanced statistical techniques such as
regression analysis.
11.5. TYPES OF CORRELATION
11.5.1. Positive and Negative Correlation
When the coefficient of correlation between two variables is positive, it means that both
variables move in the same direction. In other words, when one variable increases, the
other also increases, though the rate of increase could be different.
For example: The law of supply states that there is a direct relationship between the
price of a commodity and the quantity supplied, other things remaining constant. The
relationship between the price and quantity supplied is positive: as the price rises, the
quantity supplied by the producer also rises.
Negative correlation between two variables implies that they move in opposite directions,
i.e. when one variable increases, the other variable decreases. Such an inverse
relationship is found in the law of demand, which states that as the price of a commodity
increases, the quantity demanded of the commodity falls.
11.5.2. Linear and Non- linear correlation
When the relationship between the two variables is linear, it is referred to as linear
correlation. In the case of linear correlation, the amount of change in one variable tends to
bear a constant ratio to the amount of change in the other variable. As a result, when the two
variables are plotted on a graph, we get a straight line.
[Figure: Marks in History (vertical axis) plotted against Marks in English (horizontal axis), illustrating linear correlation]
On the other hand, when the relationship between the two variables is non-linear, it is
referred to as non-linear correlation. In this case, the amount of change in one variable does
not bear a constant ratio to the amount of change in the other variable. Thus, when we plot
the two variables on a graph, we do not get a straight line but a curve.
[Figure: Marks (vertical axis) plotted against No. of Hours of study (horizontal axis), illustrating non-linear correlation]
Correlation | Covariance
The value of correlation lies between -1 and +1. | The value of covariance lies between -∞ and +∞.
It measures the direction as well as the strength of the relationship between the given two variables. | It only indicates the direction of the relationship between the given two variables.
It is free from units of measurement. | It depends on units of measurement.
As the name suggests, in this method we simply plot the data on a graph in the form of a
scatter plot to examine the correlation between the two variables. If the plotted points
cluster closely around a line, the correlation between the two variables is high. However, if
the plotted points are spread widely, the correlation between the two variables is low.
This is one of the simplest methods of ascertaining the relationship between two variables, as
it only requires plotting the data on a graph and inspecting it visually.
The above graph plots the GDP at current prices and public expenditure on education by the
Government of India from 1999-2000 to 2019-2020. In this case, the points lie close to an
upward sloping straight line from left to right; thus the correlation between the GDP and
public expenditure on education is highly positive.
The above graph plots the GDP and the expenditure on education as a percentage of GDP
in India during 1999-2000 to 2019-20. In this case, the points are widely
scattered in the graph, which indicates a weak correlation between the GDP and expenditure on
education as a percentage of GDP.
11.7.2. Karl Pearson’s Coefficient of Correlation
It is the mathematical method of calculating the coefficient of correlation. Karl Pearson’s
coefficient of correlation is denoted by r. This method is most appropriate when both variables
in a data set are normally distributed. However, the coefficient is sensitive to extreme values,
which can exaggerate or weaken the measured strength of the association. This makes it less
suitable when one or both of the variables are not normally distributed.
Assumptions of Karl Pearson’s coefficient of correlation:
1) There exists a linear relationship between the two variables.
2) The two variables are normally distributed.
3) There is a cause-and-effect relationship between the factors which affect
the distribution of the two variables.
$$r = \frac{\sum xy}{N \sigma_x \sigma_y}$$
where $x = X - \bar{X}$
$y = Y - \bar{Y}$
$N$ = number of items
$\sigma_x$ = standard deviation of X
$\sigma_y$ = standard deviation of Y
In simpler form,
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}$$
where $x = X - \bar{X}$
$y = Y - \bar{Y}$
$x^2 = (X - \bar{X})^2$
$y^2 = (Y - \bar{Y})^2$
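The two forms of the formula give the same value. A short derivation, assuming the standard deviations are computed with divisor N, i.e. $\sigma_x = \sqrt{\sum x^2 / N}$ and $\sigma_y = \sqrt{\sum y^2 / N}$:

```latex
N \sigma_x \sigma_y
  = N \sqrt{\frac{\sum x^2}{N}} \sqrt{\frac{\sum y^2}{N}}
  = \sqrt{\sum x^2 \sum y^2}
\quad\Longrightarrow\quad
r = \frac{\sum xy}{N \sigma_x \sigma_y}
  = \frac{\sum xy}{\sqrt{\sum x^2 \sum y^2}}
```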
Example 1: Find the Karl Pearson’s coefficient of correlation between the marks in economics
and mathematics of the following students.
Economics 70 50 55 70 85 90
Mathematics 35 45 28 33 25 14
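Solution:
X | Y | $x = X - \bar{X}$ | $y = Y - \bar{Y}$ | $x^2$ | $y^2$ | $xy$
70 | 35 | 0 | 5 | 0 | 25 | 0
50 | 45 | -20 | 15 | 400 | 225 | -300
55 | 28 | -15 | -2 | 225 | 4 | 30
70 | 33 | 0 | 3 | 0 | 9 | 0
85 | 25 | 15 | -5 | 225 | 25 | -75
90 | 14 | 20 | -16 | 400 | 256 | -320
ΣX = 420 | ΣY = 180 | | | Σx² = 1250 | Σy² = 544 | Σxy = -665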
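$$\bar{X} = \frac{\sum X}{N} = \frac{420}{6} = 70$$
$$\bar{Y} = \frac{\sum Y}{N} = \frac{180}{6} = 30$$
Using
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$
we have
$$r = \frac{-665}{\sqrt{1250 \times 544}}$$
$$= \frac{-665}{\sqrt{6{,}80{,}000}}$$
$$= \frac{-665}{824.62}$$
= -0.8064
= -0.81 (approx.)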
Therefore, the marks of students in economics and mathematics are inversely correlated to each
other as the Karl Pearson’s coefficient of correlation is -0.81.
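The calculation in Example 1 can be reproduced in a few lines of Python. This is a minimal sketch of the deviation method using only the standard library. The variable names are chosen for illustration.

```python
from math import sqrt

# Marks from Example 1
economics = [70, 50, 55, 70, 85, 90]    # X
mathematics = [35, 45, 28, 33, 25, 14]  # Y

n = len(economics)
mean_x = sum(economics) / n    # 420 / 6
mean_y = sum(mathematics) / n  # 180 / 6

# Deviations from the respective means: x = X - mean(X), y = Y - mean(Y)
dev_x = [x - mean_x for x in economics]
dev_y = [y - mean_y for y in mathematics]

sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))
sum_x2 = sum(dx ** 2 for dx in dev_x)
sum_y2 = sum(dy ** 2 for dy in dev_y)

# r = sum(xy) / sqrt(sum(x^2) * sum(y^2))
r = sum_xy / sqrt(sum_x2 * sum_y2)

print("Sum xy =", sum_xy, " Sum x^2 =", sum_x2, " Sum y^2 =", sum_y2)
print("r =", round(r, 4))
```

The printed sums can be compared with the totals row of the table, which is a quick way to catch arithmetic slips in a hand calculation.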
Example 2: Find the coefficient of correlation for the following data set using the direct
method.
A 1 6 9 3 4 5 8 2 1
B 12 9 5 6 3 7 15 11 9
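Solution: Let X denote the values of A and Y the values of B.
X | Y | X² | Y² | XY
1 | 12 | 1 | 144 | 12
6 | 9 | 36 | 81 | 54
9 | 5 | 81 | 25 | 45
3 | 6 | 9 | 36 | 18
4 | 3 | 16 | 9 | 12
5 | 7 | 25 | 49 | 35
8 | 15 | 64 | 225 | 120
2 | 11 | 4 | 121 | 22
1 | 9 | 1 | 81 | 9
ΣX = 39 | ΣY = 77 | ΣX² = 237 | ΣY² = 771 | ΣXY = 327
With N = 9, the direct method uses
$$r = \frac{N\sum XY - \sum X \sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$
$$r = \frac{9 \times 327 - (39 \times 77)}{\sqrt{9 \times 237 - (39)^2}\,\sqrt{9 \times 771 - (77)^2}}$$
$$r = \frac{2943 - 3003}{\sqrt{2133 - 1521}\,\sqrt{6939 - 5929}}$$
$$r = \frac{-60}{\sqrt{612}\,\sqrt{1010}}$$
$$r = \frac{-60}{24.74 \times 31.78}$$
$$r = \frac{-60}{786.24}$$
r = -0.076 (approx.)
Therefore, there is only a very weak negative correlation between the two variables.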
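The direct method can be checked with a short Python sketch. It works on the original observations rather than on deviations from the mean. The variable names are illustrative. Since a correlation coefficient must lie between -1 and +1, the final check in the code is a useful safeguard against arithmetic errors.

```python
from math import sqrt

# Data from Example 2 (A taken as X, B taken as Y)
x = [1, 6, 9, 3, 4, 5, 8, 2, 1]
y = [12, 9, 5, 6, 3, 7, 15, 11, 9]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v ** 2 for v in x)
sum_y2 = sum(v ** 2 for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# r = (N*SumXY - SumX*SumY) / (sqrt(N*SumX2 - SumX^2) * sqrt(N*SumY2 - SumY^2))
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator

# Sanity check: a valid coefficient always lies in [-1, +1]
assert -1 <= r <= 1

print("Numerator =", numerator)
print("r =", round(r, 3))
```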
11.7.3. Spearman’s Coefficient of Rank Correlation
The previous method of calculating the correlation coefficient made the important assumption
that the variables under study are normally distributed so as to yield appropriate results.
However, in practice, we often face situations where the variables are not
normally distributed but skewed. In such situations, we need a method
that does not make such restrictive assumptions about the distribution of the variables in
question. One such method is Spearman’s rank correlation, which makes no assumption about the
distribution of the two variables.
In Spearman’s rank correlation, the variables are ranked, and the coefficient of correlation is
calculated from the ranks rather than from the original observations.
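The formula for Spearman’s rank correlation is
$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
where D is the difference between the two ranks of an item and N is the number of items.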
Solution:
Students | Ranks in Section A (R1) | Ranks in Section B (R2) | (R1 - R2) = D | (R1 - R2)² = D²
1 | 4 | 1 | 3 | 9
2 | 5 | 3 | 2 | 4
3 | 7 | 2 | 5 | 25
4 | 1 | 7 | -6 | 36
5 | 3 | 5 | -2 | 4
6 | 2 | 6 | -4 | 16
7 | 6 | 4 | 2 | 4
ΣD² = 98
Using
$$R = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
where ΣD² = 98 and N = 7,
$$R = 1 - \frac{6 \times 98}{7(7^2 - 1)}$$
$$R = 1 - \frac{588}{336}$$
R = 1 - 1.75
R = -0.75
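The rank correlation above can be verified with a minimal Python sketch. The ranks are taken from the solution table, and the variable names are illustrative.

```python
# Ranks of the seven students from the solution table
ranks_section_a = [4, 5, 7, 1, 3, 2, 6]  # R1
ranks_section_b = [1, 3, 2, 7, 5, 6, 4]  # R2

n = len(ranks_section_a)

# D = R1 - R2 for each student, then square and add
differences = [r1 - r2 for r1, r2 in zip(ranks_section_a, ranks_section_b)]
sum_d2 = sum(d ** 2 for d in differences)

# R = 1 - 6 * Sum(D^2) / (N * (N^2 - 1))
rank_corr = 1 - (6 * sum_d2) / (n * (n ** 2 - 1))

print("Sum of D^2 =", sum_d2)
print("R =", rank_corr)
```

This simple formula applies only when no two items share the same rank.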
Example 4: Calculate the coefficient of rank correlation for the participants in a dance
competition from the scores awarded by two judges.
Participants | 1 | 2 | 3 | 4 | 5
Score of Judge 1 | 55 | 70 | 80 | 70 | 75
Score of Judge 2 | 70 | 80 | 90 | 50 | 60
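Solution: In the above case, we are given the scores awarded by two judges. Since ranks are not
given, we rank the participants on the basis of each judge’s scores, with the highest score
receiving rank 1. Under Judge 1, participants 2 and 4 both scored 70 and so share ranks 3 and 4.
Each is therefore given the average rank 3.5.
Participants | Score of Judge 1 | R1 | Score of Judge 2 | R2 | R1 - R2 = D | (R1 - R2)² = D²
1 | 55 | 5 | 70 | 3 | 2 | 4
2 | 70 | 3.5 | 80 | 2 | 1.5 | 2.25
3 | 80 | 1 | 90 | 1 | 0 | 0
4 | 70 | 3.5 | 50 | 5 | -1.5 | 2.25
5 | 75 | 2 | 60 | 4 | -2 | 4
ΣD² = 12.50
Using
$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}(m_1^3 - m_1)\right]}{N(N^2 - 1)}$$
where $m_1 = 2$ is the number of times the score 70 is repeated,
$$R = 1 - \frac{6\left[12.50 + \frac{1}{12}(2^3 - 2)\right]}{5(5^2 - 1)}$$
$$R = 1 - \frac{6 \times 13}{120}$$
R = 1 - 0.65
R = 0.35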
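When ranks are not given and some scores are tied, the ranking step and the correction for ties can be sketched in Python as follows. This is a minimal sketch, assuming the standard tie correction in which (m³ - m)/12 is added to ΣD² for every group of m tied scores. The helper name `average_ranks` is hypothetical and chosen for illustration.

```python
from collections import Counter

def average_ranks(scores):
    """Rank scores from highest (rank 1) to lowest.
    Tied scores receive the average of the ranks they jointly occupy."""
    ordered = sorted(scores, reverse=True)
    rank_of = {}
    for value in set(scores):
        first_position = ordered.index(value) + 1  # first rank occupied by this score
        count = ordered.count(value)               # number of times the score occurs
        rank_of[value] = first_position + (count - 1) / 2
    return [rank_of[s] for s in scores]

# Scores from Example 4
judge1 = [55, 70, 80, 70, 75]
judge2 = [70, 80, 90, 50, 60]

r1 = average_ranks(judge1)
r2 = average_ranks(judge2)
n = len(judge1)

sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))

# One correction term (m^3 - m) / 12 for each group of m tied scores, for either judge
tie_sizes = [m for scores in (judge1, judge2)
             for m in Counter(scores).values() if m > 1]
correction = sum((m ** 3 - m) / 12 for m in tie_sizes)

rank_corr = 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

print("R1 =", r1)
print("R2 =", r2)
print("Sum of D^2 =", sum_d2, " correction =", correction)
print("R =", round(rank_corr, 2))
```

Here only the score 70 under Judge 1 is repeated, so there is a single correction term with m = 2.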
IN TEXT QUESTIONS
Q.4. The scatter plot is the simplest method of determining correlation. True/False
Q.5. Karl Pearson’s coefficient is one of the methods of measuring correlation. True/False
Q.6. The scatter plot method involves plotting the variables on a graph. True/False
Q.7. Spearman’s Rank correlation cannot be used in case of common ranks. True/False
Q.8. Which of the following is the method of measuring correlation?
A) Spearman’s Rank method B) Standard Deviation
C) Covariance D) Mode
Q.9. Find out the correlation between X and Y using Karl Pearson’s coefficient of
correlation.
X 10 20 30 40 50 60 70 80
Y 100 150 100 250 300 200 200 300
Q.10. The following table gives the rankings of 10 participants by two judges in a drawing
competition. Calculate the coefficient of rank correlation.
Judge 1 | 2 | 5 | 10 | 1 | 9 | 8 | 4 | 3 | 7 | 6
Judge 2 | 2 | 10 | 7 | 4 | 9 | 5 | 9 | 1 | 8 | 3
Q.11. Calculate the rank correlation coefficient between the marginal utilities of two goods
received by the 10 individuals.
Individuals | A | B | C | D | E | F | G | H | I | J
Marginal Utility of Good X | 70 | 50 | 60 | 60 | 77 | 80 | 90 | 15 | 25 | 45
Marginal Utility of Good Y | 60 | 90 | 55 | 40 | 59 | 65 | 70 | 85 | 73 | 50
11.7. SUMMARY
In this chapter, we dealt with two related but distinct statistical measures of the
relationship between two or more variables. On the one hand, covariance tells us whether
the variables move together or in opposite directions. However, it tells us nothing about the
strength of the association between the variables. On the other hand, correlation shows the
direction as well as the degree of the relationship between two or more variables. Correlation
can be determined using the following methods:
A) Scatter Plot method
B) Karl Pearson’s Coefficient of Correlation
C) Spearman’s Rank Correlation
11.8 GLOSSARY
Q.4. Using a scatter plot, find out whether there is any correlation between the number of hours
individuals exercise and their respective weights.
No. of Exercise Hours | 1 | 2 | 3 | 4 | 5 | 6
Weight | 80 | 65 | 55 | 60 | 50 | 60