Statistics file of pust
Date: 14 / 02 / 23
Example: Let us consider the experiment of tossing two fair coins. The sample space of
the experiment is,
𝑋(𝑊1) = 2
𝑋(𝑊2) = 1
𝑋(𝑊3) = 1
𝑋(𝑊4) = 0
So, the random variable 𝑋 is real-valued and takes the values 0, 1, 2 with
probabilities 1/4, 1/2 and 1/4 respectively.
Rules of probability:
➢ 0 ≤ 𝑃(𝑋) ≤ 1
➢ ∑ 𝑃(𝑋) = 1
Example: Let us consider the experiment of 3 - child family in which the probability that
a child will be a boy and girl are equal. Let 𝑋 be a random variable which is defined by
the number of boys. Let B and G denote the boy and girl respectively. The sample space
of the experiment is,
𝑆 = {BBB, BBG, BGB, GBB, BGG, GBG, GGB, GGG}
Here the random variable 𝑋 takes a finite number of values 0, 1, 2, 3. Therefore, 𝑋 is a discrete
random variable.
Let 𝑓(𝑥) or 𝑃(𝑥) represent the probability that the random variable 𝑋 takes the value 𝑥.
Thus, the probability function or probability mass function of the random variable 𝑋 for
the above example can be written as,
Value of X: x        0      1      2      3
𝑓(𝑥) or 𝑃(𝑥)        1/8    3/8    3/8    1/8
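The probability mass function above can be checked by listing every outcome. The sketch below is only an illustration: it assumes boys and girls are equally likely and independent, and the names `outcomes` and `pmf` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 2**3 equally likely outcomes of a 3-child family.
outcomes = list(product("BG", repeat=3))

# X = number of boys in each outcome.
counts = Counter(outcome.count("B") for outcome in outcomes)

# Probability mass function f(x) = P(X = x), kept as exact fractions.
pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in pmf.items():
    print(x, p)

# Rules of probability: 0 <= P(x) <= 1 and the probabilities sum to 1.
assert all(0 <= p <= 1 for p in pmf.values())
assert sum(pmf.values()) == 1
```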
➢ Bar chart: [figure not reproduced]
➢ Probability Histogram: [figure not reproduced; x-axis: value of 𝑋 (0, 1, 2, 3), y-axis: probability from 0 to 1/2]
Date: 26 / 02 / 23
Example: The collection of all families residing in Dhaka city during the year constitutes
the population.
➢ Finite population
➢ Infinite population
➢ Qualitative data
➢ Quantitative data
➢ Nominal data
➢ Ordinal data
Nominal data: The measurement level in which numbers or symbols are assigned to the
categories or variable values for identification only is called nominal data. The
categories are distinct, mutually exclusive and exhaustive.
Ordinal data: The measurement level in which numbers are assigned to the categories
or variable values for identification as well as ranking is called ordinal data.
➢ Interval scale data: The measurement level in which numbers are assigned to the
variable values in such a way that the measurement has order and distance
properties but not an absolute zero value is called interval scale data.
➢ Ratio scale data: The measurement level in which numbers are assigned to the
variable values in such a way that the measurement has order and distance
properties and an absolute zero property is called ratio scale data.
# Question 1: What do you mean by measurement scale of data and its classification?
Solve: The measurement scale of data refers to how the properties of the numbers assigned
to the data change with their use. It is the foundation of any scientific investigation. Scales of
measurement can be categorized in two different forms:
1. Qualitative data
➢ Nominal Data
➢ Ordinal Data
2. Quantitative data
➢ Interval scale data
➢ Ratio scale data
Nominal data refers to data that cannot be ordered or ranked, such as categories of
colors or names of countries. Ordinal data can be ordered or ranked, but the intervals
between the numbers are not necessarily equal, such as a rating scale from 1 to 5.
Interval data has equal intervals between the values but no true zero point, such as
temperature in Celsius or Fahrenheit. Ratio data has equal intervals between the values
and a true zero point, such as weight or height.
Data collection
Data: Observed values of one or more variables yield data. Each individual piece of data
is called a data point or an observation.
➢ Primary data
➢ Secondary data
Primary data: Primary data are those which are collected fresh and for the first
time, and thus happen to be original in character.
Secondary data: These are data which have already been collected by individuals or
agencies for purposes other than those of our particular research study.
Data Operation
Frequency: The number of observations or values falling in each group or class is called
the class frequency or simply the frequency.
Class boundary: A class boundary is always located mid-way between the upper limit
of a class and the lower limit of the next higher class.
Date: 28 / 02 / 23
# Question 1: 10, 9, 15, 16, 13, 21, 25, 14, 37, 30, 32, 12, 19, 21, 23, 30, 33, 16, 16, 34.
Construct a frequency table and draw graph of histogram, ogive curve and frequency polygon.
Solve: Here, $N = 20$
\[ \therefore K = 1 + 3.322 \log_{10} 20 = 5.32 \approx 5 \]
\[ \text{Class interval (C.I)} = \frac{\text{Range}}{K - 1} = \frac{37 - 9}{5 - 1} = 7 \]
Arranging the data in ascending order:
9, 10, 12, 13, 14, 15, 16, 16, 16, 19, 21, 21, 23, 25, 30, 30, 32, 33, 34, 37
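The frequency table itself can be sketched in code. This is a minimal illustration, not the table from the lecture: it assumes half-open classes of width 7 starting at the smallest value, and the helper names `sturges_classes` and `frequency_table` are hypothetical.

```python
import math

data = [10, 9, 15, 16, 13, 21, 25, 14, 37, 30,
        32, 12, 19, 21, 23, 30, 33, 16, 16, 34]

def sturges_classes(n):
    """Suggested number of classes K = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)

def frequency_table(values, start, width):
    """Count values in half-open classes [lower, lower + width)."""
    table = []
    lower = start
    while lower <= max(values):
        upper = lower + width
        count = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, count))
        lower = upper
    return table

k = sturges_classes(len(data))
table = frequency_table(data, start=min(data), width=7)

cumulative = 0
for lower, upper, count in table:
    cumulative += count  # running total, the quantity plotted on an ogive
    print(f"{lower:>2} - {upper:<2}  f = {count}  cumulative = {cumulative}")
```

The class frequencies feed the histogram and frequency polygon, while the running totals feed the ogive.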
Ogive curve: [figure not reproduced]
Note: Ogive curve must include data points from both tables
➢ Frequency polygon: [figure not reproduced; x-axis ticks: 11, 15, 19, 23, 27, 31, 35, 39]
Qualitative data:
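➢ Bar Diagram: [figure not reproduced; categories O+, B+, B-, AB+ on the x-axis, frequency from 0 to 35 on the y-axis]
➢ Pie chart:
Pie chart is shown below: (Calculate the angles correctly and then draw using a
protractor)
[figure not reproduced; sectors labelled AB+, B, O+, B+]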
Date: 5 / 03 / 23
Central Tendency
Measures of center: Descriptive measures that indicate where the center or most typical
value of a data set lies.
➢ Mean
➢ Median
➢ Mode
Mean: The value obtained by summing all observations in a set and dividing by the
number of observations. The mean is divided into three types.
➢ Arithmetic mean
➢ Geometric mean
➢ Harmonic mean
Arithmetic mean: The arithmetic mean (AM) of a set of observations is the sum of
observations divided by the number of observations. Suppose we have 𝑛 observations
𝑥1 , 𝑥2 , 𝑥3 , … … , 𝑥𝑛 . Then the arithmetic mean is,
\[ AM = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum x_i}{n} \]
Geometric mean: If a data set contains 𝑛 observations which are all positive, then the
geometric mean is the 𝑛th positive root of their product. Suppose we have 𝑛 positive
observations $x_1, x_2, x_3, \dots, x_n$. Thus, the geometric mean is,
\[ GM = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{\frac{1}{n}} \]
Harmonic mean: If a data set contains non-zero observations, then the harmonic mean
is the reciprocal of the arithmetic mean of the reciprocals of the observations. Suppose we
have a set of 𝑛 non-zero observations $x_1, x_2, x_3, \dots, x_n$. Then the harmonic mean is,
\[ HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \]
Weighted arithmetic mean: If the relative importance of the values varies, each value is
assigned an appropriate numerical weight. Let $x_1, x_2, x_3, \dots, x_n$ be 𝑛 values whose
relative importance is measured by corresponding positive weights $w_1, w_2, w_3, \dots, w_n$.
Then the weighted arithmetic mean is given by,
\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i} \]
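# Question 1: Find the value of AM, GM and HM for ungrouped data 2, 5, 9, 3, 7, 11, 15
Solve: For AM, we know,
\[ AM = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{52}{7} = 7.43 \]
\[ \therefore HM = \frac{7}{\frac{1}{2} + \frac{1}{5} + \frac{1}{9} + \frac{1}{3} + \frac{1}{7} + \frac{1}{11} + \frac{1}{15}} = \frac{7}{1.445} = 4.84 \]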
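The three means for this data set can be cross-checked numerically. The sketch below is a plain illustration of the definitions given earlier; the function names are hypothetical and the geometric mean assumes all observations are positive.

```python
import math

data = [2, 5, 9, 3, 7, 11, 15]

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # n-th root of the product; requires all observations to be positive.
    return math.prod(xs) ** (1 / len(xs))

def harmonic_mean(xs):
    # Reciprocal of the arithmetic mean of the reciprocals.
    return len(xs) / sum(1 / x for x in xs)

def weighted_mean(xs, ws):
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

am = arithmetic_mean(data)
gm = geometric_mean(data)
hm = harmonic_mean(data)
print(f"AM = {am:.2f}, GM = {gm:.2f}, HM = {hm:.2f}")
```

With equal weights, `weighted_mean` reduces to the ordinary arithmetic mean.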
# Question 2: Find the value of AM, GM and HM for the grouped data,
Class Interval Mid-Point (𝑥𝑖 ) Frequency (𝑓𝑖 ) 𝑥𝑖 𝑓𝑖
0 – 10 5 2 10
10 – 20 15 7 105
20 – 30 25 13 325
30 – 40 35 5 175
40 – 50 45 1 45
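We know,
\[ AM = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + \dots + x_n f_n}{N} = \frac{\sum x_i f_i}{N} = \frac{660}{28} = 23.571 \]
Here $N = 28$ and $\sum \frac{f_i}{x_i} = 1.552$
We know,
\[ HM = \frac{N}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_n}{x_n}} = \frac{N}{\sum \frac{f_i}{x_i}} = \frac{28}{1.552} = 18.04 \]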
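The grouped-data means can be recomputed from the mid-points and frequencies in the table. This is a sketch under the usual assumption that every observation in a class sits at the class mid-point; the grouped geometric mean is computed through logarithms.

```python
import math

midpoints = [5, 15, 25, 35, 45]   # x_i
frequencies = [2, 7, 13, 5, 1]    # f_i
N = sum(frequencies)

# Grouped AM: frequency-weighted mean of the mid-points.
am = sum(f * x for f, x in zip(frequencies, midpoints)) / N

# Grouped GM: exponential of the frequency-weighted mean of log(x_i).
gm = math.exp(sum(f * math.log(x) for f, x in zip(frequencies, midpoints)) / N)

# Grouped HM: N divided by the sum of f_i / x_i.
hm = N / sum(f / x for f, x in zip(frequencies, midpoints))

print(f"AM = {am:.3f}, GM = {gm:.3f}, HM = {hm:.3f}")
```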
Median: The median of a dataset is the value that is exactly in the middle position of the
list when the data are arranged in order from smallest to largest. If the number of
observations is odd, then the median is the exact middle value in the ordered list. If it is
even, then the median is halfway between the two middle values in the ordered
list.
➢ If $n$ is odd, median $= \left(\frac{n+1}{2}\right)^{th}$ observation
➢ If $n$ is even, median $= \frac{\left(\frac{n}{2}\right)^{th} + \left(\frac{n}{2} + 1\right)^{th}}{2}$ observation
Class Interval Mid-point (𝑥𝑖 ) Frequency (𝑓𝑖 ) Cumulative Frequency (𝐹𝑐 )
0 – 10 5 2 2
10 – 20 15 7 9
20 – 30 25 13 22
30 – 40 35 5 27
40 – 50 45 1 28
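Here, $\frac{N}{2} = 14$, so the median class is 20 – 30, with $X_L = 20$, $F_c = 9$, $F_m = 13$ and $C.I = 10$.
We know,
\[ \text{Median} = X_L + \frac{\frac{N}{2} - F_c}{F_m} \times C.I = 20 + \frac{14 - 9}{13} \times 10 = 23.846 \]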
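The interpolation formula for the grouped median can be written as a short function. This is a sketch, assuming the classes are contiguous and listed in ascending order; `grouped_median` is a hypothetical name.

```python
def grouped_median(classes):
    """classes: list of (lower, upper, frequency) in ascending order."""
    total = sum(f for _, _, f in classes)
    half = total / 2
    cumulative = 0  # F_c: cumulative frequency before the median class
    for lower, upper, freq in classes:
        if cumulative + freq >= half:
            # X_L + (N/2 - F_c) / F_m * C.I
            return lower + (half - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("empty frequency table")

table = [(0, 10, 2), (10, 20, 7), (20, 30, 13), (30, 40, 5), (40, 50, 1)]
median = grouped_median(table)
print(round(median, 3))
```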
Date: 7 / 03 / 23
Mode: The mode of a dataset is its most frequently occurring value. If each value occurs
with the same frequency, the dataset has no mode; otherwise, any value that occurs
with greatest frequency is a mode.
Ans: 9
Solve:
𝑋𝐿 = 20,
∆1 = 13 − 7 = 6,
∆2 = 13 − 5 = 8,
𝐶. 𝐼 = 10
We know,
\[ \text{Mode } (M_o) = X_L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times C.I = 20 + \frac{6}{6 + 8} \times 10 = 24.286 \]
➢ 𝐴𝑀 ≥ 𝐺𝑀 ≥ 𝐻𝑀
➢ 𝐴𝑀 × 𝐻𝑀 = 𝐺𝑀2
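Now, for two positive observations $x_1$ and $x_2$,
\[ (\sqrt{x_1} - \sqrt{x_2})^2 \ge 0 \]
\[ \Rightarrow x_1 - 2\sqrt{x_1 x_2} + x_2 \ge 0 \]
\[ \Rightarrow x_1 + x_2 \ge 2\sqrt{x_1 x_2} \]
\[ \Rightarrow \frac{x_1 + x_2}{2} \ge (x_1 x_2)^{\frac{1}{2}} \]
\[ \therefore AM \ge GM \]
Again,
\[ \left(\frac{1}{\sqrt{x_1}} - \frac{1}{\sqrt{x_2}}\right)^2 \ge 0 \]
\[ \Rightarrow \frac{1}{x_1} - 2 \cdot \frac{1}{\sqrt{x_1}} \cdot \frac{1}{\sqrt{x_2}} + \frac{1}{x_2} \ge 0 \]
\[ \Rightarrow \frac{1}{x_1} + \frac{1}{x_2} \ge \frac{2}{\sqrt{x_1 x_2}} \]
\[ \Rightarrow \sqrt{x_1 x_2} \cdot \left(\frac{1}{x_1} + \frac{1}{x_2}\right) \ge 2 \]
\[ \Rightarrow \sqrt{x_1 x_2} \ge \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}} \]
\[ \Rightarrow GM \ge HM \]
\[ \therefore AM \ge GM \ge HM \]
Again,
\[ AM \times HM = \frac{x_1 + x_2}{2} \times \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}} \]
\[ = (x_1 + x_2) \times \frac{x_1 x_2}{x_1 + x_2} \]
\[ = x_1 x_2 = (\sqrt{x_1 x_2})^2 = GM^2 \]
\[ \therefore AM \times HM = GM^2 \]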
=0
#Question 5: Show that ∑𝑛𝑖=1(𝑥𝑖 − 𝑥̅ )2 ≤ ∑𝑛𝑖=1(𝑥𝑖 − 𝐴)2 where A is any arbitrary value.
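Here $A \in \mathbb{R}$. Writing $x_i - A = (x_i - \bar{x}) + (\bar{x} - A)$ and using $\sum (x_i - \bar{x}) = 0$, we get
\[ \sum_{i=1}^{n} (x_i - A)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - A)^2 \]
So, for any given value of $A$, the term $n(\bar{x} - A)^2$ will be a positive value or equal to
0 (when $A = \bar{x}$), which gives the required inequality.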
Therefore, we get,
𝑛
∑(𝑥𝑖 − 𝑥̅ ) = 0
𝑖=1
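\[ a + b + c + \frac{a+b+c}{3} \ge 4 \left( \frac{a \cdot b \cdot c\,(a+b+c)}{3} \right)^{\frac{1}{4}} \]
\[ \Rightarrow \frac{4(a+b+c)}{3} \ge 4\,(a \cdot b \cdot c)^{\frac{1}{4}} \left( \frac{a+b+c}{3} \right)^{\frac{1}{4}} \]
\[ \Rightarrow \frac{a+b+c}{3} \ge (a \cdot b \cdot c)^{\frac{1}{4}} \left( \frac{a+b+c}{3} \right)^{\frac{1}{4}} \]
\[ \Rightarrow \left( \frac{a+b+c}{3} \right)^{1 - \frac{1}{4}} \ge (a \cdot b \cdot c)^{\frac{1}{4}} \]
\[ \Rightarrow \left( \frac{a+b+c}{3} \right)^{\frac{3}{4}} \ge (a \cdot b \cdot c)^{\frac{1}{4}} \]
\[ \Rightarrow \frac{a+b+c}{3} \ge (a \cdot b \cdot c)^{\frac{1}{4} \times \frac{4}{3}} \]
\[ \Rightarrow \frac{a+b+c}{3} \ge (a \cdot b \cdot c)^{\frac{1}{3}} \]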
Thus, for three observations, we can prove that, 𝐴𝑀 ≥ 𝐺𝑀
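Part 2: Again, let’s assume that our three variables are $p, q, r$ and their values are as
follows, $p = \frac{1}{a}$, $q = \frac{1}{b}$, $r = \frac{1}{c}$
\[ \left( \left( \frac{1}{a \cdot b \cdot c} \right)^{\frac{1}{3}} \right)^{-1} \ge \left( \frac{a \cdot b + b \cdot c + c \cdot a}{3 \cdot a \cdot b \cdot c} \right)^{-1} \]
\[ \Rightarrow (abc)^{\frac{1}{3}} \ge \frac{3 \cdot a \cdot b \cdot c}{a \cdot b + b \cdot c + c \cdot a} \]
\[ \Rightarrow GM \ge HM \]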
Therefore, we proved, 𝐴𝑀 ≥ 𝐺𝑀 ≥ 𝐻𝑀
#Question 5: Which measure of the central tendency is the best and why?
The mean is usually the best measure of central tendency to use when the data
distribution is continuous and symmetrical, such as when the data is normally
distributed. However, it all depends on what we are trying to show from the data. The
mean is equal to the sum of the observation values in the dataset divided by the number of
values in the dataset, i.e.
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} \]
The median is a good choice to represent a set with one or two outliers (ordinal data).
And the mode is only useful for sets of data that have many identical values (nominal
data).
So, for all these reasons, the mean is usually regarded as the best measure of central tendency.
Quartile: There are three quartiles in a dataset, usually denoted by $Q_1$, $Q_2$ and $Q_3$, which
divide the whole distribution into four equal parts.
The first quartile 𝑄1 is the value at or below which one-fourth (25%) of all observations
in the set fall, the third quartile 𝑄3 is the value at or below which three-fourth (75%) of
the observations lie.
For grouped data,
\[ Q_i = X_L + \frac{\frac{i \times N}{4} - F_c}{F_m} \times C.I \]
Here,
𝑄𝑖 = 𝑖 𝑡ℎ quartile
𝐶. 𝐼 = Class Interval
Deciles: When a distribution is divided into ten equal parts, each division is called a
decile. Thus, there are 9 deciles in a distribution which are denoted by 𝐷1 , 𝐷2 , … … , 𝐷9
Date: 12 / 03 / 23
Computation of 𝑸𝟏 :
Here, 𝑁 = 100
Let $i = 1$, hence $Q_1$ is the $\frac{i \times N}{4} = \frac{1 \times 100}{4} = 25$th observation.
\[ Q_1 = X_L + \frac{\frac{i \times N}{4} - F_c}{F_m} \times C.I = 60 + \frac{25 - 22}{35} \times 10 = 60.857 \approx 60.9 \]
Computation of 𝑸𝟐 :
$i = 2$, hence $Q_2$ is the $\frac{2 \times 100}{4} = 50$th observation.
\[ Q_2 = X_L + \frac{\frac{2 \times N}{4} - F_c}{F_m} \times C.I = 60 + \frac{\frac{2 \times 100}{4} - 22}{35} \times 10 = 68 \]
Computation of 𝑸𝟑 :
$i = 3$, hence $Q_3$ is the $\frac{3 \times 100}{4} = 75$th observation.
\[ Q_3 = X_L + \frac{\frac{3 \times N}{4} - F_c}{F_m} \times C.I = 70 + \frac{\frac{3 \times 100}{4} - 57}{20} \times 10 = 79 \]
# Question 2: Compute first, fifth and ninth deciles for group frequency data.
Computation of 𝑫𝟏 :
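$i = 1$, hence $D_1$ is the $\frac{i \times N}{10} = \frac{1 \times 100}{10} = 10$th observation.
\[ D_1 = X_L + \frac{\frac{1 \times N}{10} - F_c}{F_m} \times C.I = 50 + \frac{\frac{1 \times 100}{10} - 8}{14} \times 10 = 51.43 \]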
Computation of 𝑫𝟓 :
$i = 5$, hence $D_5$ is the $\frac{i \times N}{10} = \frac{5 \times 100}{10} = 50$th observation.
So, the $D_5$ class is 60 – 70.
\[ D_5 = X_L + \frac{\frac{5 \times N}{10} - F_c}{F_m} \times C.I = 60 + \frac{\frac{5 \times 100}{10} - 22}{35} \times 10 = 68 \]
Computation of 𝑫𝟗 :
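$i = 9$, hence $D_9$ is the $\frac{i \times N}{10} = \frac{9 \times 100}{10} = 90$th observation.
\[ D_9 = X_L + \frac{\frac{9 \times N}{10} - F_c}{F_m} \times C.I = 80 + \frac{\frac{9 \times 100}{10} - 77}{15} \times 10 = 88.6667 \approx 88.67 \]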
# Question 3: Compute 30𝑡ℎ and 90𝑡ℎ percentiles for group frequency data.
Class Interval    Frequency (𝑓𝑖 )    Cumulative Frequency (𝐹𝑐 )
40 – 50           8                  8
50 – 60           14                 22
60 – 70           35                 57
70 – 80           20                 77
80 – 90           15                 92
90 – 100          8                  100
Computation of 𝑷𝟑𝟎 :
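$i = 30$, hence $P_{30}$ is the $\frac{i \times N}{100} = \frac{30 \times 100}{100} = 30$th observation.
\[ P_{30} = X_L + \frac{\frac{30 \times N}{100} - F_c}{F_m} \times C.I = 60 + \frac{\frac{30 \times 100}{100} - 22}{35} \times 10 = 62.286 \approx 62.29 \]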
Computation of 𝑷𝟗𝟎 :
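$i = 90$, hence $P_{90}$ is the $\frac{i \times N}{100} = \frac{90 \times 100}{100} = 90$th observation.
\[ P_{90} = X_L + \frac{\frac{90 \times N}{100} - F_c}{F_m} \times C.I = 80 + \frac{\frac{90 \times 100}{100} - 77}{15} \times 10 = 88.667 \approx 88.67 \]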
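Quartiles, deciles and percentiles for grouped data all use the same interpolation, differing only in the number of equal parts. The sketch below captures that with one function; `grouped_quantile` and `TABLE` are hypothetical names, and the classes are assumed contiguous and in ascending order.

```python
# (lower, upper, frequency) rows of the grouped frequency table.
TABLE = [(40, 50, 8), (50, 60, 14), (60, 70, 35),
         (70, 80, 20), (80, 90, 15), (90, 100, 8)]

def grouped_quantile(classes, i, parts):
    """i-th quantile when the distribution is cut into `parts` equal parts.

    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    """
    total = sum(f for _, _, f in classes)
    position = i * total / parts
    cumulative = 0  # F_c: cumulative frequency before the quantile class
    for lower, upper, freq in classes:
        if cumulative + freq >= position:
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("position lies outside the table")

q1 = grouped_quantile(TABLE, 1, 4)
q3 = grouped_quantile(TABLE, 3, 4)
d9 = grouped_quantile(TABLE, 9, 10)
p30 = grouped_quantile(TABLE, 30, 100)
print(round(q1, 2), round(q3, 2), round(d9, 2), round(p30, 2))
```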
Trimmed mean: After discarding a proportion of the largest and smallest observations from
a data set, the arithmetic mean is computed from the remaining observations. This mean is
called the trimmed mean.
# Question 4: Consider the following set of 10 observations 50, 55, 52, 56, 58, 60, 57, 53, 120, 5
and determine the 10% trimmed mean.
Solve: 10% of 10 is 1. Thus 10% trimmed mean is the mean of 8 observations discarding
the largest value 120 and smallest value 5. That is, the trimmed mean is,
\[ TM = \frac{1}{10 - (2 \times 1)} \times (50 + 55 + 52 + 56 + 58 + 60 + 57 + 53) = \frac{441}{8} = 55.125 \]
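The same calculation can be sketched as a small function. It assumes the stated proportion is removed from each end of the sorted data; `trimmed_mean` is a hypothetical name.

```python
def trimmed_mean(values, proportion):
    """Mean after discarding `proportion` of the observations from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations dropped per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

observations = [50, 55, 52, 56, 58, 60, 57, 53, 120, 5]
tm = trimmed_mean(observations, 0.10)
print(tm)
```

The untrimmed mean of the same data is 56.6, so the two extreme values 120 and 5 shift the ordinary mean noticeably.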
Date: 14 / 03 / 23
Measure of dispersion
Dispersion is an important characteristic of a frequency distribution. It describes how
compactly the individual scores are distributed around the average.
Methods of measuring dispersions:
1. Algebraic method
2. Graphical method
Range: The range is measured simply as the difference between the highest and lowest values
of the variable.
∴ 𝑅𝑎𝑛𝑔𝑒 = 𝐻 − 𝐿
In the case of grouped data, the range is the difference between the upper boundary of
the highest class and the lower boundary of the lowest class. It is also calculated by
using the difference between the mid-points of the highest class and the lowest class.
Uses of range:
1. Quality control: The idea basically is that if the range, the difference between the
largest and smallest mass-produced items increases beyond a certain point, the
production machinery should be examined to find out why the items produced
have not followed their usual more consistent pattern.
2. Fluctuations in the share prices: Range is useful in studying the variations in the
price of stocks and shares and other commodities that are sensitive to price
changes from one period to another.
3. Weather forecast: The meteorological department does make use of the range in
determining the difference between minimum temperature and maximum
temperature.
Coefficient of range:
\[ \text{Coefficient of range} = \frac{H - L}{H + L} \times 100\% \]
#Question 1: The following are the weekly wages of 8 workers in a manufacturing factory. Find
the range and coefficient of range. Wages are in taka 1400, 1450, 1520, 1380, 1485, 1495, 1575,
1440
Solve: Given that, wages are 1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440
We know, $\text{Range} = H - L = 1575 - 1380 = 195$
Again, we know,
\[ \text{Coefficient of range} = \frac{H - L}{H + L} \times 100\% \]
\[ \therefore \text{Coefficient of range} = \frac{195}{1575 + 1380} \times 100\% = 6.60\% \]
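A short numerical check of the range and its coefficient for the wage data, written as a sketch with hypothetical variable names:

```python
wages = [1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440]

highest, lowest = max(wages), min(wages)
value_range = highest - lowest
# Coefficient of range = (H - L) / (H + L) * 100%
coefficient = (highest - lowest) / (highest + lowest) * 100

print(value_range, round(coefficient, 2))
```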
#Question 2: The following table gives the frequency distribution of the number of orders
received each day during past 50 days at office of a mail company. Calculate the range and
coefficient of range.
Solve: Given that,
Here the table has inclusive class interval. So, to determine the highest and lowest value
of class interval, we need to adjust the class interval. By adjusting the class interval, we
get,
Here, 𝐻 = 23, 𝐿 = 11
Date: 21 / 03 / 23
#Quartile deviation:
Interquartile range: The interquartile range (IQR) of a data set is the difference between
the 1𝑠𝑡 and 3𝑟𝑑 quartiles.
𝐼𝑄𝑅 = 𝑄3 − 𝑄1
Example: Following are the runs scored by a batsman in last 20 test matches:
96, 70, 100, 96, 81, 84, 90, 89, 63, 90, 34, 75, 39, 82, 85, 86, 76, 64, 67 𝑎𝑛𝑑 88
Now,
\[ Q_3 = \frac{3 \times (20 + 1)}{4} = 15.75\text{th observation} \]
The 15.75th observation lies between the 15th and 16th observations of the ordered data, that is,
between 89 and 90. Therefore,
# Question 1: Following are the observation showing the age of 50 employees working
in a wholesale center. Find the quartile deviation and coefficient of QD
Here, 𝑁 = 50
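Let $i = 1$, hence $Q_1$ is the $\frac{i \times N}{4} = \frac{1 \times 50}{4} = 12.5$th observation. The 12.5th observation lies
between the 12th and 13th observations.
\[ Q_1 = X_L + \frac{\frac{i \times N}{4} - F_c}{F_m} \times C.I = 49.5 + \frac{12.5 - 11}{14} \times 5 = 50.0357 \approx 50.036 \]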
Computation of 𝑸𝟑 :
Here, 𝑁 = 50
Let $i = 3$, hence $Q_3$ is the $\frac{i \times N}{4} = \frac{3 \times 50}{4} = 37.5$th observation. The 37.5th observation lies
between the 37th and 38th observations.
\[ Q_3 = X_L + \frac{\frac{i \times N}{4} - F_c}{F_m} \times C.I = 59.5 + \frac{37.5 - 36}{8} \times 5 = 60.44 \]
Therefore,
\[ \text{Quartile Deviation } (QD) = \frac{Q_3 - Q_1}{2} = \frac{60.44 - 50.036}{2} = 5.202 \]
\[ \text{Coefficient of } QD = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\% = \frac{60.44 - 50.036}{60.44 + 50.036} \times 100\% = 9.42\% \]
Date: 28 / 03 / 23
# Mean deviation:
It may be mentioned that the mean deviation is generally calculated about the
arithmetic mean. If $x_1, x_2, x_3, \dots, x_n$ are the given values of a variable $x$ and
$f_1, f_2, f_3, \dots, f_n$ are the corresponding frequencies, then the mean deviation about a value $c$ is,
\[ MD_c = \frac{1}{N} \sum_{i=1}^{n} f_i |x_i - c|, \quad \text{where } N = \sum f_i \]
# Coefficient of mean deviation:
A relative measure of dispersion based on the mean deviation is called the coefficient of
mean deviation or the coefficient of dispersion.
\[ \text{Coefficient of mean deviation (mean)} = \frac{MD}{\text{mean}} \times 100\% \]
\[ \text{Coefficient of mean deviation (median)} = \frac{MD}{\text{median}} \times 100\% \]
\[ \text{Coefficient of mean deviation (mode)} = \frac{MD}{\text{mode}} \times 100\% \]
# Question 1: Following are the number of hours a machine worked for the last 9 weeks,
\[ \therefore \text{Median, } Me = \left( \frac{n+1}{2} \right)^{th} \text{observation} = 5^{th} \text{ observation} = 47 \]
\[ \therefore \text{Mean, } \bar{x} = \frac{10 + 28 + 32 + 39 + 47 + 60 + 63 + 75 + 96}{9} = 50 \]
To find the mean deviation and coefficient of mean deviation from mean, we construct
the following table:
$x_i$    $x_i - \bar{x}$    $|x_i - \bar{x}|$
10 -40 40
28 -22 22
32 -18 18
39 -11 11
47 -3 3
60 10 10
63 13 13
75 25 25
96 46 46
\[ MD(\text{median}) = \frac{\sum |x_i - Me|}{n} = \frac{185}{9} = 20.56 \]
\[ \text{Coefficient of MD (median)} = \frac{MD}{\text{median}} \times 100\% = \frac{20.56}{47} \times 100\% = 43.74\% \]
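Both mean deviations for the machine-hours data can be verified with a few lines. In this sketch `mean_deviation` is a hypothetical helper; the absolute deviations about the mean (the third column of the table) add up to 188, while those about the median add up to 185.

```python
import statistics

hours = [10, 28, 32, 39, 47, 60, 63, 75, 96]

def mean_deviation(values, centre):
    """Average absolute deviation of the values from `centre`."""
    return sum(abs(x - centre) for x in values) / len(values)

mean = statistics.mean(hours)      # arithmetic mean of the data
median = statistics.median(hours)  # middle value of the ordered data

md_mean = mean_deviation(hours, mean)
md_median = mean_deviation(hours, median)

print(f"MD about mean   = {md_mean:.2f}, coefficient = {md_mean / mean * 100:.2f}%")
print(f"MD about median = {md_median:.2f}, coefficient = {md_median / median * 100:.2f}%")
```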
# Variance: Variance is the arithmetic mean of the squares of the deviations of all values
in a set of number from their arithmetic mean. In other words, variance is the square of
the standard deviation,
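\[ \text{Standard Deviation, } \sigma = \sqrt{\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n}} \]
\[ \text{Variance, } \sigma^2 = \left\{ \sqrt{\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n}} \right\}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
\[ \text{Population variance, } \sigma^2 = \sum_{i=1}^{n} \frac{x_i^2}{n} - \left( \sum_{i=1}^{n} \frac{x_i}{n} \right)^2 \]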
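# Proof:
\[ \text{Variance, } \sigma^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n} \]
\[ = \sum_{i=1}^{n} \frac{x_i^2 - 2 x_i \bar{x} + \bar{x}^2}{n} \]
\[ = \sum_{i=1}^{n} \frac{x_i^2}{n} - \sum_{i=1}^{n} \frac{2 x_i \bar{x}}{n} + \sum_{i=1}^{n} \frac{\bar{x}^2}{n} \]
\[ = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - 2\bar{x} \cdot \frac{1}{n} \sum_{i=1}^{n} x_i + \bar{x}^2 \]
\[ = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - 2\bar{x}^2 + \bar{x}^2 \]
\[ = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2 \]
\[ = \sum_{i=1}^{n} \frac{x_i^2}{n} - \left( \sum_{i=1}^{n} \frac{x_i}{n} \right)^2 \]
Explanation: Since $\sum_{i=1}^{n} 1 = n$,
\[ \sum_{i=1}^{n} \frac{\bar{x}^2}{n} = \frac{\bar{x}^2}{n} \sum_{i=1}^{n} 1 = \frac{\bar{x}^2}{n} \times n = \bar{x}^2 \]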
Date: 2 / 04 / 23
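For frequency data, with $n = \sum f_i$,
\[ \text{Standard Deviation, } \sigma = \sqrt{\sum \frac{f_i (x_i - \bar{x})^2}{n}} = \sqrt{\sum \frac{f_i x_i^2}{n} - \left( \sum \frac{f_i x_i}{n} \right)^2} \]
\[ \text{Population variance, } \sigma^2 = \sum \frac{f_i (x_i - \bar{x})^2}{n} = \sum \frac{f_i x_i^2}{n} - \left( \sum \frac{f_i x_i}{n} \right)^2 \]
Sample standard deviation,
\[ S = \sqrt{\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{1}{n-1} \left\{ \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \right\}} \]
Sample variance,
\[ S^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n - 1} = \frac{1}{n-1} \left\{ \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \right\} \]
\[ \text{Sample Standard Deviation (frequency), } S = \sqrt{\sum \frac{f_i (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{1}{n-1} \left\{ \sum f_i x_i^2 - \frac{\left( \sum f_i x_i \right)^2}{n} \right\}} \]
\[ \text{Sample variance (frequency), } S^2 = \sum \frac{f_i (x_i - \bar{x})^2}{n - 1} = \frac{1}{n-1} \left\{ \sum f_i x_i^2 - \frac{\left( \sum f_i x_i \right)^2}{n} \right\} \]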
# Question 1: The data on relative humidity (in %) for the last 10 days of a month in a city in
Bangladesh are 90, 97, 92, 95, 93, 95, 85, 83, 85, 75. Calculate the variance and standard
deviation for the above data.
\[ \sigma^2 = \frac{79636}{10} - \left( \frac{890}{10} \right)^2 = 42.6 \]
\[ \sigma = \sqrt{42.6} = 6.53 \]
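The definition of the variance and the shortcut formula should give the same value, and this can be checked directly on the humidity data. A minimal sketch with hypothetical variable names:

```python
import math

humidity = [90, 97, 92, 95, 93, 95, 85, 83, 85, 75]
n = len(humidity)
mean = sum(humidity) / n

# Definition: mean of the squared deviations from the mean.
var_definition = sum((x - mean) ** 2 for x in humidity) / n

# Shortcut: mean of the squares minus the square of the mean.
var_shortcut = sum(x * x for x in humidity) / n - mean ** 2

# Sample variance divides by n - 1 instead of n.
sample_var = sum((x - mean) ** 2 for x in humidity) / (n - 1)

sd = math.sqrt(var_definition)
print(round(var_definition, 2), round(var_shortcut, 2), round(sd, 2))
```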
Daily Income 100 – 150 150 – 200 200 – 250 250 – 300 300 – 350
No. of workers 1 3 6 4 1
Solve: Using the data, we construct the following table,
𝐶𝐼 𝑓𝑖 𝑀𝑖𝑑𝑝𝑜𝑖𝑛𝑡 (𝑥𝑖 ) 𝑓𝑖 𝑥𝑖 𝑥𝑖2 𝑓𝑖 𝑥𝑖2
100 – 150 1 125 125 15625 15625
150 – 200 3 175 525 30625 91875
200 – 250 6 225 1350 50625 303750
250 – 300 4 275 1100 75625 302500
300 – 350 1 325 325 105625 105625
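Here, $n = \sum f_i = 15$, $\sum f_i x_i = 3425$ and $\sum f_i x_i^2 = 819375$.
We know,
\[ \sigma^2 = \sum \frac{f_i x_i^2}{n} - \left( \sum \frac{f_i x_i}{n} \right)^2 = \frac{819375}{15} - \left( \frac{3425}{15} \right)^2 = 2488.89 \]
\[ \sigma = \sqrt{2488.89} = 49.89 \]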
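The column totals and the grouped variance can be reproduced as follows. This is a sketch that, like the table, treats every worker in a class as earning the class mid-point.

```python
import math

midpoints = [125, 175, 225, 275, 325]  # x_i
workers = [1, 3, 6, 4, 1]              # f_i

n = sum(workers)
sum_fx = sum(f * x for f, x in zip(workers, midpoints))
sum_fx2 = sum(f * x * x for f, x in zip(workers, midpoints))

# Population variance for frequency data: sum(f x^2)/n - (sum(f x)/n)^2
variance = sum_fx2 / n - (sum_fx / n) ** 2
sd = math.sqrt(variance)

print(n, sum_fx, sum_fx2, round(variance, 2), round(sd, 2))
```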
# Question 3: The run scores of two cricketers for 10 innings are given below:
A 105 12 45 74 0 30 80 55 0 39
B 25 35 28 40 21 14 33 31 41 32
b. Who is a more consistent player?
Solve (a): In order to find the more consistent player, we have to calculate the
coefficient of variation for each cricketer.
Cricketer A Cricketer B
Score (𝑥𝑖 ) 𝑥𝑖2 Score (𝑦𝑖 ) 𝑦𝑖2
105 11025 25 625
12 144 35 1225
45 2025 28 784
74 5476 40 1600
0 0 21 441
30 900 14 196
80 6400 33 1089
55 3025 31 961
0 0 41 1681
39 1521 32 1024
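$\sum x_i = 450$, $\sum x_i^2 = 30616$, $\sum y_i = 300$, $\sum y_i^2 = 9626$
For cricketer A, the standard deviation is,
\[ \sigma_A = \sqrt{\sum \frac{x_i^2}{n} - \left( \sum \frac{x_i}{n} \right)^2} = \sqrt{\frac{30616}{10} - \left( \frac{450}{10} \right)^2} = 32.196 \]
\[ CV(A) = \frac{\sigma_A}{\bar{x}} \times 100\% = 71.55\% \]
And for cricketer B, the standard deviation is,
\[ \sigma_B = \sqrt{\sum \frac{y_i^2}{n} - \left( \sum \frac{y_i}{n} \right)^2} = \sqrt{\frac{9626}{10} - \left( \frac{300}{10} \right)^2} = 7.91 \]
\[ CV(B) = \frac{\sigma_B}{\bar{y}} \times 100\% = 26.37\% \]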
From the above results, the mean score of cricketer A is higher than the mean score of
cricketer B
Solve (B): Since the coefficient of variation for cricketer A is higher than that of cricketer
B, cricketer B is more consistent than cricketer A.
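The coefficient of variation can be wrapped in a small helper. The sketch below uses the population standard deviation, as in the working above, and is illustrated on cricketer B's scores; `coefficient_of_variation` is a hypothetical name.

```python
import math

def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    return sd / mean * 100

scores_b = [25, 35, 28, 40, 21, 14, 33, 31, 41, 32]
cv_b = coefficient_of_variation(scores_b)
print(round(cv_b, 2))
```

A smaller coefficient of variation indicates scores that are more tightly clustered relative to their mean.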
# Question 4: The daily wages (in taka) paid to the workers in the two factories A and B in
Dhaka city are given below:
Solve: From the above data, we can construct the table below for “Factory A”,
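Here, $n = \sum f_i = 194$, $\sum f_i x_i = 62250$ and $\sum f_i x_i^2 = 21171250$.
a: So, Mean (Factory A),
\[ \bar{x}_A = \sum_{i=1}^{n} \frac{f_i x_i}{n} = \frac{62250}{194} = 320.87 \]
From the above data, we can construct the table below for “Factory B”,
Mean (Factory B),
\[ \bar{x}_B = \sum_{i=1}^{n} \frac{f_i x_i}{n} = \frac{84250}{270} = 312.03 \]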
b: Here we want to determine which factory gives higher amount of wages. So, we need
to identify which factory has higher variability in individual wages.
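For “Factory A”,
\[ \sigma_A = \sqrt{\sum \frac{f_i x_i^2}{n} - \left( \sum \frac{f_i x_i}{n} \right)^2} = \sqrt{\frac{21171250}{194} - \left( \frac{62250}{194} \right)^2} = \sqrt{6168.56} = 78.54 \]
For “Factory B”,
\[ \sigma_B = \sqrt{\sum \frac{f_i x_i^2}{n} - \left( \sum \frac{f_i x_i}{n} \right)^2} = \sqrt{\frac{27993805}{270} - \left( \frac{84250}{270} \right)^2} = \sqrt{6313.64} = 79.45 \]
\[ CV(A) = \frac{\sigma_A}{\bar{x}_A} \times 100\% = \frac{78.54}{320.87} \times 100\% = 24.48\% \]
\[ CV(B) = \frac{\sigma_B}{\bar{x}_B} \times 100\% = \frac{79.45}{312.03} \times 100\% = 25.46\% \]
Since the coefficient of variation of “Factory B” is higher than that of “Factory A”,
“Factory A” has the more consistent wage structure.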
Boys Girls
Number 100 75
Mean Weight (kg) 56 42
Variance 9 4
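Solve: Given that,
$n_1 = 100$, $n_2 = 75$
$\bar{x}_1 = 56$, $\bar{x}_2 = 42$
$S_1^2 = 9$, $S_2^2 = 4$
\[ \bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = 50 \]
\[ S^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2} + \left\{ \frac{n_1 n_2}{(n_1 + n_2)^2} \times (\bar{x}_1 - \bar{x}_2)^2 \right\} = 54.857 \]
\[ S = \sqrt{54.857} = 7.407 \]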
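The combined mean and variance of two groups can be computed from their summary figures alone. The sketch below implements the two formulas used above; `combined_mean_variance` is a hypothetical name.

```python
import math

def combined_mean_variance(n1, mean1, var1, n2, mean2, var2):
    """Mean and variance of two groups pooled into one data set."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # Within-group part plus the part due to the gap between group means.
    variance = (n1 * var1 + n2 * var2) / n + n1 * n2 * (mean1 - mean2) ** 2 / n ** 2
    return mean, variance

mean, variance = combined_mean_variance(100, 56, 9, 75, 42, 4)
print(mean, round(variance, 3), round(math.sqrt(variance), 3))
```

Most of the combined variance here comes from the 14 kg gap between the two group means rather than from the spread within each group.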
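Solve c: If we want to determine the stability of the distributions, we need to know the
coefficient of variation.
We know,
\[ CV = \frac{\sigma}{\bar{x}} \times 100\% \]
Therefore,
\[ \text{Coefficient of variation for the boys, } CV(\text{Boys}) = \frac{\sigma_1}{\bar{x}_1} \times 100\% = \frac{3}{56} \times 100\% = 5.36\% \]
\[ \text{Coefficient of variation for the girls, } CV(\text{Girls}) = \frac{\sigma_2}{\bar{x}_2} \times 100\% = \frac{2}{42} \times 100\% = 4.76\% \]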
Date: 4 / 04 / 23
Theorem:
For a set of two unequal observations, each of mean deviation and standard deviation is
half of the range, that is,
𝑅𝑎𝑛𝑔𝑒
𝑀𝐷 = 𝑆𝐷 =
2
Proof:
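Now, taking $x_1 > x_2$, $R = x_1 - x_2$
And Mean, $\bar{x} = \frac{x_1 + x_2}{2}$
Now,
\[ x_1 - \bar{x} = x_1 - \frac{x_1 + x_2}{2} = \frac{x_1 - x_2}{2} \]
And,
\[ x_2 - \bar{x} = x_2 - \frac{x_1 + x_2}{2} = \frac{x_2 - x_1}{2} \]
Mean deviation,
\[ MD = \frac{\sum |x_i - \bar{x}|}{n} = \frac{|x_1 - \bar{x}| + |x_2 - \bar{x}|}{2} \]
\[ = \frac{\left| \frac{x_1 - x_2}{2} \right| + \left| \frac{x_2 - x_1}{2} \right|}{2} \]
\[ = \frac{\frac{x_1 - x_2}{2} + \left| -\frac{x_1 - x_2}{2} \right|}{2} \]
\[ = \frac{\frac{x_1 - x_2}{2} + \frac{x_1 - x_2}{2}}{2} \]
\[ = \frac{x_1 - x_2}{2} \]
\[ \therefore MD = \frac{R}{2} \]
Standard deviation,
\[ SD = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2}{2}} \]
\[ = \sqrt{\frac{\left( \frac{x_1 - x_2}{2} \right)^2 + \left( \frac{x_2 - x_1}{2} \right)^2}{2}} \]
\[ = \sqrt{\frac{\left( \frac{x_1 - x_2}{2} \right)^2 + \left( \frac{x_1 - x_2}{2} \right)^2}{2}} \]
\[ = \sqrt{\frac{2 \left( \frac{x_1 - x_2}{2} \right)^2}{2}} \]
\[ = \frac{x_1 - x_2}{2} = \frac{R}{2} \]
\[ \therefore SD = \frac{R}{2} \]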
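# Question 1: Show that the variance of the first $n$ natural numbers is $\frac{n^2 - 1}{12}$
Thus, the variance,
\[ S^2 = \frac{\sum x_i^2}{n} - \left( \frac{\sum x_i}{n} \right)^2 \]
Here,
\[ \sum x_i = 1 + 2 + 3 + \dots + n = \frac{n(n+1)}{2} \]
And
\[ \sum x_i^2 = 1^2 + 2^2 + 3^2 + \dots + n^2 = \frac{n(n+1)(2n+1)}{6} \]
Here,
\[ S^2 = \frac{n(n+1)(2n+1)}{6n} - \left( \frac{n(n+1)}{2n} \right)^2 \]
\[ = \frac{(n+1)(2n+1)}{6} - \left( \frac{n+1}{2} \right)^2 \]
\[ = \frac{n+1}{2} \times \left( \frac{2n+1}{3} - \frac{n+1}{2} \right) \]
\[ = \frac{n+1}{2} \times \frac{4n + 2 - 3n - 3}{6} \]
\[ = \frac{n+1}{2} \times \frac{n-1}{6} \]
\[ = \frac{n^2 - 1}{12} \]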
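The closed form can be spot-checked numerically for small values of $n$. This sketch uses exact fractions so that no rounding is involved; `variance_first_n` is a hypothetical name.

```python
from fractions import Fraction

def variance_first_n(n):
    """Population variance of 1, 2, ..., n computed from the definition."""
    xs = range(1, n + 1)
    mean = Fraction(sum(xs), n)
    return sum((x - mean) ** 2 for x in xs) / n

for n in (2, 5, 10):
    print(n, variance_first_n(n), Fraction(n * n - 1, 12))
```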
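# Question 2: For a set of $n$ positive observations, if $S^2 = \frac{\sum (x_i - \bar{x})^2}{n}$, then prove that,
1. $\bar{x} \sqrt{n - 1} > S$
2. $CV < 100 \sqrt{n - 1}$
Solve 1: We know,
\[ \bar{x} = \frac{\sum x_i}{n} \]
\[ \therefore \sum x_i = n\bar{x} \quad (1) \]
Again, since all the observations are positive, $(\sum x_i)^2 > \sum x_i^2$.
Thus,
\[ (\sum x_i)^2 - \frac{(\sum x_i)^2}{n} > \sum x_i^2 - \frac{(\sum x_i)^2}{n} \]
\[ \Rightarrow (\sum x_i)^2 - \frac{(\sum x_i)^2}{n} > \frac{n \left( \sum x_i^2 - \frac{(\sum x_i)^2}{n} \right)}{n} \]
\[ \Rightarrow (\sum x_i)^2 \left( 1 - \frac{1}{n} \right) > n \left( \frac{\sum x_i^2}{n} - \left( \frac{\sum x_i}{n} \right)^2 \right) \]
\[ \Rightarrow \frac{(n\bar{x})^2 (n - 1)}{n} > nS^2 \quad [\text{from equation (1), } \sum x_i = n\bar{x}] \]
\[ \Rightarrow \frac{n^2 \bar{x}^2 (n - 1)}{n} > nS^2 \]
\[ \Rightarrow \bar{x}^2 (n - 1) > S^2 \]
\[ \Rightarrow \bar{x} \sqrt{n - 1} > S \]
Solve 2: From part 1,
\[ \bar{x} \sqrt{n - 1} > S \]
\[ \Rightarrow \sqrt{n - 1} > \frac{S}{\bar{x}} \]
\[ \Rightarrow 100 \times \sqrt{n - 1} > \frac{S}{\bar{x}} \times 100 \]
\[ \therefore 100 \sqrt{n - 1} > CV \]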
# Show that the mean deviation cannot exceed the standard deviation, 𝑆 ≥ 𝑀𝐷.
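Solve: We know,
\[ S^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{\sum x_i^2}{n} - \left( \frac{\sum x_i}{n} \right)^2 \]
We also know that the square of any real number is greater than or equal to zero, so a variance
is never negative. Thus, for any set of values $z_i$,
\[ \frac{\sum z_i^2}{n} - \left( \frac{\sum z_i}{n} \right)^2 \ge 0 \]
\[ \Rightarrow \frac{\sum z_i^2}{n} \ge \left( \frac{\sum z_i}{n} \right)^2 \]
Putting $z_i = |x_i - \bar{x}|$,
\[ \frac{\sum (x_i - \bar{x})^2}{n} \ge \left( \frac{\sum |x_i - \bar{x}|}{n} \right)^2 \]
\[ \Rightarrow S^2 \ge MD^2 \]
\[ \Rightarrow S \ge MD \]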
Date: 11 / 04 / 23
1. Central moment: Moments computed about mean are called central moments.
2. Raw moment: Moments computed about any arbitrary value are called raw
moments.
Central moments are moments computed from arithmetic mean. Let us consider 𝑛
observations and 𝑥̅ be its mean. The 𝑟 𝑡ℎ central moment, denoted by 𝜇𝑟 , is defined as,
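\[ \mu_r = \frac{\sum (x_i - \bar{x})^r}{n} \]
\[ \mu_1 = \frac{\sum (x_i - \bar{x})}{n} = 0 \]
\[ \mu_2 = \frac{\sum (x_i - \bar{x})^2}{n} = \text{variance} \]
\[ \mu_3 = \frac{\sum (x_i - \bar{x})^3}{n} \]
For frequency data with $\sum f_i = N$,
\[ \mu_r = \frac{\sum f_i (x_i - \bar{x})^r}{N} \]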
# Question 1: The heights (in cm) of the players of football team-1 are given below:
142, 143, 146, 146, 148. Compute the first four central moments.
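\[ \mu_r = \frac{\sum (x_i - \bar{x})^r}{n} \]
Here, $\bar{x} = \frac{142 + 143 + 146 + 146 + 148}{5} = 145$, so
1st central moment, $\mu_1 = \frac{\sum (x_i - \bar{x})}{n} = 0$
2nd central moment, $\mu_2 = \frac{\sum (x_i - \bar{x})^2}{n} = 4.8$
3rd central moment, $\mu_3 = \frac{\sum (x_i - \bar{x})^3}{n} = -1.2$
4th central moment, $\mu_4 = \frac{\sum (x_i - \bar{x})^4}{n} = 36$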
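The four central moments of the heights follow directly from the definition. A minimal sketch, with `central_moment` as a hypothetical helper name:

```python
def central_moment(values, r):
    """r-th central moment: mean of (x - mean)^r."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** r for x in values) / n

heights = [142, 143, 146, 146, 148]
moments = [central_moment(heights, r) for r in (1, 2, 3, 4)]
print(moments)
```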
Putting 𝑟 = 1, 2, 3, 4, we get,
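\[ \mu_1' = \frac{\sum (x_i - a)}{n} = \frac{\sum x_i}{n} - \frac{\sum a}{n} = \frac{\sum x_i}{n} - \frac{na}{n} \]
\[ \therefore \mu_1' = \bar{x} - a \]
\[ \mu_1 = \frac{\sum (x_i - \bar{x})}{n} = \frac{\sum x_i}{n} - \frac{n\bar{x}}{n} = \bar{x} - \bar{x} = 0 \]
Writing $d_i = x_i - a$, so that $\mu_r' = \frac{\sum d_i^r}{n}$,
\[ \mu_2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{\sum \{(x_i - a) - (\bar{x} - a)\}^2}{n} = \frac{\sum (d_i - \mu_1')^2}{n} \]
\[ = \frac{\sum (d_i^2 - 2 d_i \mu_1' + {\mu_1'}^2)}{n} \]
\[ = \frac{\sum d_i^2}{n} - 2 \cdot \frac{\sum d_i}{n} \mu_1' + \frac{\sum {\mu_1'}^2}{n} \]
\[ = \mu_2' - 2{\mu_1'}^2 + {\mu_1'}^2 \]
\[ \therefore \mu_2 = \mu_2' - {\mu_1'}^2 \]
In general,
\[ \mu_r = \frac{1}{n} \sum (x_i - \bar{x})^r = \frac{1}{n} \sum \{(x_i - a) - (\bar{x} - a)\}^r = \frac{1}{n} \sum (d_i - \mu_1')^r \]
\[ \therefore \mu_r = \frac{1}{n} \sum \{ d_i^r - {}^{r}C_1 d_i^{r-1} \mu_1' + {}^{r}C_2 d_i^{r-2} {\mu_1'}^2 - \dots + (-1)^r {\mu_1'}^r \} \quad (i) \]
\[ \mu_r = \mu_r' - {}^{r}C_1 \mu_{r-1}' \mu_1' + {}^{r}C_2 \mu_{r-2}' {\mu_1'}^2 - {}^{r}C_3 \mu_{r-3}' {\mu_1'}^3 + \dots + (-1)^r {\mu_1'}^r \]
For $r = 3$,
\[ \mu_3 = \frac{1}{n} \sum \{ d_i^3 - {}^{3}C_1 d_i^2 \mu_1' + {}^{3}C_2 d_i {\mu_1'}^2 - {\mu_1'}^3 \} \]
\[ = \frac{\sum d_i^3}{n} - 3 \cdot \frac{\sum d_i^2}{n} \mu_1' + 3 \cdot \frac{\sum d_i}{n} {\mu_1'}^2 - \frac{n {\mu_1'}^3}{n} \]
\[ = \mu_3' - 3\mu_2' \mu_1' + 3{\mu_1'}^3 - {\mu_1'}^3 \]
\[ = \mu_3' - 3\mu_2' \mu_1' + 2{\mu_1'}^3 \]
For $r = 4$,
\[ \mu_4 = \frac{1}{n} \sum \{ d_i^4 - {}^{4}C_1 d_i^3 \mu_1' + {}^{4}C_2 d_i^2 {\mu_1'}^2 - {}^{4}C_3 d_i {\mu_1'}^3 + {\mu_1'}^4 \} \]
\[ = \frac{\sum d_i^4}{n} - 4 \cdot \frac{\sum d_i^3}{n} \cdot \mu_1' + 6 \cdot \frac{\sum d_i^2}{n} \cdot {\mu_1'}^2 - 4 \cdot \frac{\sum d_i}{n} \cdot {\mu_1'}^3 + \frac{n {\mu_1'}^4}{n} \]
\[ = \mu_4' - 4\mu_3' \mu_1' + 6\mu_2' {\mu_1'}^2 - 4{\mu_1'}^4 + {\mu_1'}^4 \]
\[ = \mu_4' - 4\mu_3' \mu_1' + 6\mu_2' {\mu_1'}^2 - 3{\mu_1'}^4 \]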
# Question 3: Compute first four central moments for the data on number of computers sold per
day is given below:
Class Interval 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79
Frequency 4 11 9 7 5 4
54
Solve: From the given data, let us construct the table below:
𝐶𝐼 𝑀𝑃 𝐹𝑟𝑒𝑞 𝑑 𝑓𝑑 𝑓𝑑 2 𝑓𝑑 3 𝑓𝑑 4
20 – 29 24.5 4 -2 -8 16 -32 64
30 – 39 34.5 11 -1 -11 11 -11 11
40 – 49 44.5 9 0 0 0 0 0
50 – 59 54.5 7 1 7 7 7 7
60 – 69 64.5 5 2 10 20 40 80
70 – 79 74.5 4 3 12 36 108 324
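Here, $N = \sum f = 40$, $\sum fd = 10$, $\sum fd^2 = 90$, $\sum fd^3 = 112$, $\sum fd^4 = 486$ and the class width $C = 10$.
Raw moments,
\[ \mu_1' = C \sum \frac{fd}{N} = 10 \times \frac{10}{40} = 2.5 \]
\[ \mu_2' = C^2 \sum \frac{fd^2}{N} = 10^2 \times \frac{90}{40} = 225 \]
\[ \mu_3' = C^3 \sum \frac{fd^3}{N} = 10^3 \times \frac{112}{40} = 2800 \]
\[ \mu_4' = C^4 \sum \frac{fd^4}{N} = 10^4 \times \frac{486}{40} = 121500 \]
Central moments:
\[ \mu_1 = 0 \]
\[ \mu_2 = \mu_2' - {\mu_1'}^2 = 218.75 \]
\[ \mu_3 = \mu_3' - 3\mu_2' \mu_1' + 2{\mu_1'}^3 = 1143.75 \]
\[ \mu_4 = \mu_4' - 4\mu_3' \mu_1' + 6\mu_2' {\mu_1'}^2 - 3{\mu_1'}^4 = 101820.3125 \]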
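The conversion from raw to central moments can be checked against the table. In this sketch the coded deviations `d` and frequencies `f` are taken from the table, `C` is the class width, and `central_from_raw` is a hypothetical helper implementing the three relations derived earlier.

```python
def central_from_raw(m1, m2, m3, m4):
    """Central moments from raw moments about an arbitrary origin."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return 0.0, mu2, mu3, mu4

d = [-2, -1, 0, 1, 2, 3]   # coded deviations of the class mid-points
f = [4, 11, 9, 7, 5, 4]    # class frequencies
C = 10                     # class width
N = sum(f)

# r-th raw moment: C^r * sum(f * d^r) / N
raw = [C ** r * sum(fi * di ** r for fi, di in zip(f, d)) / N for r in (1, 2, 3, 4)]
central = central_from_raw(*raw)

print(raw)
print(central)
```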
Date: 7 / 05 / 23
Measures of skewness:
One measure of skewness is based on the second and third central moments in a
frequency distribution. If 𝜇2 and 𝜇3 are the second and third central moments
respectively, in a frequency distribution,
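\[ \mu_2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n}, \qquad \mu_3 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^3}{n} \]
then the measures of skewness are
\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \gamma_1 = \frac{\mu_3}{\sqrt{\mu_2^3}} \]
For frequency data,
\[ \mu_3 = \sum_{i=1}^{n} \frac{f_i (x_i - \bar{x})^3}{n} \]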
The sign of skewness 𝛾1 would depend entirely upon the value of 𝜇3 . This is the most
important measure of skewness from a theoretical point of view.
1. Mesokurtic
2. Platykurtic
3. Leptokurtic
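Measure of kurtosis:
\[ \beta_2 = \frac{\mu_4}{\mu_2^2} \]
Where $\mu_2$ and $\mu_4$ are the second and fourth central moments respectively,
\[ \mu_2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n}, \qquad \mu_4 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^4}{n} \]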
3. When 𝛽2 < 3 or 𝛾2 = 𝛽2 − 3 < 0, the curve is less peaked than the mesokurtic
curve and is called a platykurtic curve.
Question 1: The 1st four central moments of distribution are 0, 2.5, 0.7 and 18.75. Examine the
skewness and kurtosis.
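\[ \beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{0.7^2}{2.5^3} = 0.031 \]
\[ \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{18.75}{2.5^2} = 3 \]
\[ \therefore \gamma_2 = \beta_2 - 3 = 0 \]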
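The moment-based coefficients can be evaluated with a small helper. This is a sketch using the formulas for $\beta_1$, $\gamma_1$, $\beta_2$ and $\gamma_2$ that appear in these notes; `skewness_kurtosis` is a hypothetical name.

```python
import math

def skewness_kurtosis(mu2, mu3, mu4):
    """Moment-based coefficients of skewness and kurtosis."""
    beta1 = mu3 ** 2 / mu2 ** 3
    gamma1 = mu3 / math.sqrt(mu2 ** 3)  # keeps the sign of mu3
    beta2 = mu4 / mu2 ** 2
    gamma2 = beta2 - 3                  # zero for a mesokurtic curve
    return beta1, gamma1, beta2, gamma2

beta1, gamma1, beta2, gamma2 = skewness_kurtosis(2.5, 0.7, 18.75)
print(round(beta1, 3), round(gamma1, 3), beta2, gamma2)
```

Here $\gamma_1$ is small and positive (slight positive skewness) and $\gamma_2 = 0$, so the curve is mesokurtic.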
Question 2: The hourly earnings (in taka) of sample of 7 workers in a manufacturing company
are 27, 27, 24, 26, 25, 24, 22. Compute the coefficient of skewness based on moment.
27 + 27 + 24 + 26 + 25 + 24 + 22
𝑥̅ = = 25
7
𝑥𝑖 𝑥𝑖 − 𝑥̅ (𝑥𝑖 − 𝑥̅ )2 (𝑥𝑖 − 𝑥̅ )3 (𝑥𝑖 − 𝑥̅ )4
27 2 4 8 16
27 2 4 8 16
24 -1 1 -1 1
26 1 1 1 1
25 0 0 0 0
24 -1 1 -1 1
22 -3 9 -27 81
Here,
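1st central moment, $\mu_1 = \frac{\sum (x_i - \bar{x})}{n} = 0$
2nd central moment, $\mu_2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{20}{7} = 2.8571$
3rd central moment, $\mu_3 = \frac{\sum (x_i - \bar{x})^3}{n} = -\frac{12}{7} = -1.7143$
4th central moment, $\mu_4 = \frac{\sum (x_i - \bar{x})^4}{n} = \frac{116}{7} = 16.5714$
Coefficient of skewness,
\[ \gamma_1 = \frac{\mu_3}{\sqrt{\mu_2^3}} = \frac{-1.7143}{\sqrt{2.8571^3}} = -0.355 \]
Coefficient of kurtosis,
\[ \beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{16.5714}{2.8571^2} = 2.0301 \]
Hence,
\[ \gamma_2 = \beta_2 - 3 = 2.0301 - 3 = -0.9699 \]
Since $\gamma_2 < 0$, the distribution is platykurtic.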
Date: 9 / 05 / 23
Correlation: The strength of association that exists between two variables is called correlation. The
statistical analysis which gives us the strength or degree and direction of association
or interrelationship that exists between two variables is called correlation analysis.
Scatter plot:
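\[ r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]
Alternative method:
\[ r = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sqrt{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} \cdot \sqrt{\sum y_i^2 - \frac{(\sum y_i)^2}{n}}} \]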
1. Positive correlation: When two variables vary together in the same direction, i.e.
small values of variable 𝑥 are associated with small values of 𝑦, and large values
of 𝑥 are associated with large values of 𝑦, the correlation between the two
variables is said to be positive.
2. Negative correlation: When two variables vary together in the opposite
direction, i.e. small values of 𝑥 are associated with the large values of 𝑦, the large
values of 𝑥 are associated with the small values of 𝑦, the correlation between the
variables is said to be negative.
3. Zero correlation: When there is no relationship between two variables, i.e. high
and low values of the two variables do not show any relationship that can be
predicted, then there exists a zero correlation between the variables.
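Solve: Let $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be $n$ pairs of observations. Then the
correlation coefficient between $x$ and $y$ is denoted by $r_{xy}$ and defined as,
\[ r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} \]
Writing $X = x_i - \bar{x}$ and $Y = y_i - \bar{y}$, so that $r_{xy} = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}$, and noting that a sum of squares cannot be negative,
\[ \sum \left( \frac{X}{\sqrt{\sum X^2}} \pm \frac{Y}{\sqrt{\sum Y^2}} \right)^2 \ge 0 \]
\[ \Rightarrow \sum \left( \frac{X^2}{\sum X^2} \pm 2 \cdot \frac{X}{\sqrt{\sum X^2}} \cdot \frac{Y}{\sqrt{\sum Y^2}} + \frac{Y^2}{\sum Y^2} \right) \ge 0 \]
\[ \Rightarrow \frac{\sum X^2}{\sum X^2} \pm 2 \cdot \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} + \frac{\sum Y^2}{\sum Y^2} \ge 0 \]
\[ \Rightarrow 1 \pm 2 r_{xy} + 1 \ge 0 \]
\[ \Rightarrow 1 \pm r_{xy} \ge 0 \quad (i) \]
Either, $1 + r_{xy} \ge 0$, $\therefore r_{xy} \ge -1$
Or, $1 - r_{xy} \ge 0$, $\therefore r_{xy} \le 1$
Hence, $-1 \le r_{xy} \le 1$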
Question 2. Compute correlation between 𝑥 and 𝑦 from the data given below:
x 60 66 66 66 68 68 70 72 74 80
y 5 7 8 9 11 12 14 16 21 27
Solve: Here,
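\[ \bar{x} = \frac{\sum x_i}{n} = \frac{690}{10} = 69 \]
\[ \bar{y} = \frac{\sum y_i}{n} = \frac{130}{10} = 13 \]
$\sum (x_i - \bar{x})^2 = 266$, $\sum (y_i - \bar{y})^2 = 416$, $\sum (x_i - \bar{x})(y_i - \bar{y}) = 324$
\[ r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{324}{\sqrt{266 \times 416}} = 0.97 \]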
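The correlation coefficient for this data can be recomputed from the deviation sums. A minimal sketch, with `pearson_r` as a hypothetical helper name:

```python
import math

def pearson_r(xs, ys):
    """Product-moment correlation coefficient from deviation sums."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [60, 66, 66, 66, 68, 68, 70, 72, 74, 80]
y = [5, 7, 8, 9, 11, 12, 14, 16, 21, 27]
r = pearson_r(x, y)
print(round(r, 2))
```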
Date: 14 / 05 / 23
\[ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} \]
Where $R$ denotes the rank coefficient of correlation and $D$ refers to the difference of ranks
between paired items in two series.
Question 1. Two managers are asked to rank a group of employees in order of potential for
eventually becoming top managers. The rankings are as follows:
Employee A B C D E F G H I J
Manager 1 10 2 1 4 3 6 5 8 7 9
Manager 2 9 4 2 3 1 5 6 8 7 10
Employee A B C D E F G H I J
Manager 1 10 2 1 4 3 6 5 8 7 9
Manager 2 9 4 2 3 1 5 6 8 7 10
$(R_1 - R_2)^2 = D^2$ 1 4 1 1 4 1 1 0 0 1
Here, $\sum D^2 = 14$
We know,
\[ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 14}{10(10^2 - 1)} = 0.915 \]
Thus, we find that there is a high degree of positive correlation in the ranks assigned by
the two managers.
Question 2. Calculate the rank correlation coefficient for the following data of marks of 2 tests
given to candidates for clerical job.
Preli. 92 89 87 86 83 77 71 63 53 50
Final 86 83 91 77 68 85 52 82 37 57
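Solve: The given data are unranked. So, ranking these data (rank 1 for the highest mark), we can
construct the table below,
Preli.         92 89 87 86 83 77 71 63 53 50
Rank $R_1$     1  2  3  4  5  6  7  8  9  10
Final          86 83 91 77 68 85 52 82 37 57
Rank $R_2$     2  4  1  6  7  3  9  5  10 8
$D^2$          1  4  4  4  4  9  4  9  1  4
Here, $\sum D^2 = 44$
\[ R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 44}{10(10^2 - 1)} = 0.733 \]
Thus, we find that there is a fairly high degree of positive correlation in the ranks of the given
two test marks.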
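Both rank-correlation questions can be checked with two small helpers. In this sketch `ranks` gives rank 1 to the highest score and assumes there are no ties; both function names are hypothetical.

```python
def ranks(scores):
    """Rank 1 for the highest score; assumes no tied scores."""
    order = sorted(scores, reverse=True)
    return [order.index(s) + 1 for s in scores]

def spearman(rank_1, rank_2):
    """Rank correlation R = 1 - 6 * sum(D^2) / (N * (N^2 - 1))."""
    n = len(rank_1)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_1, rank_2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Question 1: the ranks are given directly.
manager_1 = [10, 2, 1, 4, 3, 6, 5, 8, 7, 9]
manager_2 = [9, 4, 2, 3, 1, 5, 6, 8, 7, 10]
r_managers = spearman(manager_1, manager_2)

# Question 2: the marks must be ranked first.
preli = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
final = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]
r_tests = spearman(ranks(preli), ranks(final))

print(round(r_managers, 3), round(r_tests, 3))
```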
Regression: The relationship between the response or dependent variable and the
explanatory or independent variable, specified by a linear equation, is called a
regression. The general form of a linear equation is,
\[ y = a + bx \]
Where $x$ is called the independent variable, $y$ is called the dependent variable and
$a, b$ are constants.
Date: 16 / 05 / 23
Simple linear regression model: For 𝑥 = 𝑥𝑖, we have an observed saving 𝑦𝑖 in the data set,
and 𝑦̂𝑖 is its predicted (average) saving. For all values of 𝑥𝑖, the predicted values 𝑦̂𝑖 fall
on the regression line.
In general, we expect that 𝑦𝑖 ≠ 𝑦̂𝑖, because some data points do not fall on the best-fit line.
The difference 𝑒𝑖 = 𝑦𝑖 − 𝑦̂𝑖 is the prediction error. Rearranging for the data point 𝑖 gives,
→ 𝑦𝑖 = 𝑦̂𝑖 + 𝑒𝑖
= 𝐴 + 𝐵𝑥𝑖 + 𝑒𝑖
Where,
𝑦 = dependent variable
𝑥 = independent variable
𝑒 = random error
𝐴, 𝐵 = constants (intercept and slope) of the regression line
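The regression equation of 𝑥 on 𝑦 is,
$$ \hat{x} = a_1 + b_1 y $$
Where,
$$ b_1 = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum y_i^2 - \frac{(\sum y_i)^2}{n}} $$
$$ a_1 = \bar{x} - b_1 \bar{y} = \frac{1}{n}\left(\sum x_i - b_1 \sum y_i\right) $$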
𝑥 60 66 66 66 68 68 70 72 74 80
𝑦 5 7 8 9 11 12 14 16 21 27
Solve: From the given data, we can construct the table below:
𝑥𝑖    𝑦𝑖    𝑥𝑖²    𝑦𝑖²    𝑥𝑖𝑦𝑖
60    5     3600   25     300
66    7     4356   49     462
66    8     4356   64     528
66    9     4356   81     594
68    11    4624   121    748
68    12    4624   144    816
70    14    4900   196    980
72    16    5184   256    1152
74    21    5476   441    1554
80    27    6400   729    2160
Here, ∑𝑥𝑖 = 690, ∑𝑦𝑖 = 130, ∑𝑥𝑖² = 47876, ∑𝑦𝑖² = 2106, ∑𝑥𝑖𝑦𝑖 = 9294, 𝑛 = 10
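We know, the regression equation of 𝑦 on 𝑥 is,
$$ \hat{y} = a + bx $$
Where,
$$ b = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} = \frac{9294 - \frac{690 \times 130}{10}}{47876 - \frac{690^2}{10}} = \frac{324}{266} = 1.218 $$
And,
$$ a = \frac{1}{n}\left(\sum y_i - b \sum x_i\right) = \frac{1}{10} \times (130 - 1.218 \times 690) = -71.042 $$
Therefore, the fitted regression equation is,
$$ \hat{y} = -71.042 + 1.218x $$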
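The same fit can be reproduced from the raw data with a short Python sketch. The function name `fit_line` is an illustrative assumption; it applies the sum-based formulas for the regression of 𝑦 on 𝑥 used in these notes.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    a = (sum_y - b * sum_x) / n
    return a, b


x = [60, 66, 66, 66, 68, 68, 70, 72, 74, 80]
y = [5, 7, 8, 9, 11, 12, 14, 16, 21, 27]
a, b = fit_line(x, y)

# Prediction errors e_i = y_i - y_hat_i for every data point.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(round(b, 3), round(a, 3))
```

With the unrounded slope the intercept comes out near −71.045; the value −71.042 in the hand calculation arises from rounding 𝑏 to 1.218 before computing 𝑎. The prediction errors sum to zero up to floating-point error, as expected for a least-squares line with an intercept.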
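# Origin and scale shift method of computing the regression equation. Let,
$$ u_i = \frac{x_i - A}{c} \quad \text{and} \quad v_i = \frac{y_i - B}{d} $$
so that 𝑥̅ = 𝐴 + 𝑐𝑢̅ and 𝑦̅ = 𝐵 + 𝑑𝑣̅. The regression coefficient of 𝑦 on 𝑥 is,
$$ b = \frac{d}{c} \times \frac{\sum u_i v_i - \frac{\sum u_i \sum v_i}{n}}{\sum u_i^2 - \frac{(\sum u_i)^2}{n}} $$
And,
$$ a = \bar{y} - b\bar{x} = \frac{1}{n}\left(\sum y_i - b \sum x_i\right) $$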
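The factor 𝑑/𝑐 in the coefficient follows from substituting 𝑥𝑖 = 𝐴 + 𝑐𝑢𝑖 and 𝑦𝑖 = 𝐵 + 𝑑𝑣𝑖 into the usual deviation form of the slope. The short derivation below is a sketch using only the definitions of 𝑢𝑖 and 𝑣𝑖 given in these notes:

```latex
\begin{align*}
x_i - \bar{x} &= c\,(u_i - \bar{u}), \qquad y_i - \bar{y} = d\,(v_i - \bar{v}) \\
b &= \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
   = \frac{cd \sum (u_i - \bar{u})(v_i - \bar{v})}{c^2 \sum (u_i - \bar{u})^2} \\
  &= \frac{d}{c} \times
     \frac{\sum u_i v_i - \frac{\sum u_i \sum v_i}{n}}
          {\sum u_i^2 - \frac{(\sum u_i)^2}{n}}
\end{align*}
```

The shift constants 𝐴 and 𝐵 cancel in the deviations, so only the scale constants 𝑐 and 𝑑 affect the slope.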
Date: 21 / 05 / 23
Question 1. Find the regression equation using origin and scale shifting method:
a. Determine the regression equation that can be used to predict the price of an external
disk, given its capacity.
b. Determine the regression equation that can be used to predict the capacity of an
external disk, given its price.
c. Predict the price of a disk whose capacity is 1200 GB.
d. Predict the capacity of a disk whose price is taka 5000.
Solution: Let 𝑥 be the capacity (GB) and 𝑦 the price (taka) of a disk. Taking 𝐴 = 500,
𝑐 = 10, 𝐵 = 4200 and 𝑑 = 100,
$$ u_i = \frac{x_i - A}{c}, \qquad v_i = \frac{y_i - B}{d} $$
𝑥𝑖     𝑦𝑖     𝑢𝑖    𝑣𝑖    𝑢𝑖²     𝑣𝑖²    𝑢𝑖𝑣𝑖
120    2000   -38   -22   1444    484    836
200    2600   -30   -16   900     256    480
250    2800   -25   -14   625     196    350
320    3000   -18   -12   324     144    216
500    4200   0     0     0       0      0
800    5500   30    13    900     169    390
1000   6000   50    18    2500    324    900
1500   6800   100   26    10000   676    2600
Here, ∑𝑢𝑖 = 69, ∑𝑣𝑖 = −7, ∑𝑢𝑖² = 16693, ∑𝑣𝑖² = 2249, ∑𝑢𝑖𝑣𝑖 = 5772, 𝑛 = 8, 𝑐 = 10, 𝑑 = 100
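Ans (a). The regression equation of price (𝑦) on capacity (𝑥) is 𝑦̂ = 𝑎 + 𝑏𝑥, where,
$$ b = \frac{d}{c} \times \frac{\sum u_i v_i - \frac{\sum u_i \sum v_i}{n}}{\sum u_i^2 - \frac{(\sum u_i)^2}{n}} $$
$$ = \frac{100}{10} \times \frac{5772 - \frac{69 \times (-7)}{8}}{16693 - \frac{69^2}{8}} $$
$$ = 3.623 $$
And,
$$ a = \bar{y} - b\bar{x} = \left(B + d\,\frac{\sum v_i}{n}\right) - b\left(A + c\,\frac{\sum u_i}{n}\right) $$
$$ = \left(4200 + 100 \times \frac{-7}{8}\right) - 3.623\left(500 + 10 \times \frac{69}{8}\right) $$
$$ = 1988.516 $$
$$ \hat{y} = 1988.516 + 3.623x $$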
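Ans (b). The regression equation of capacity (𝑥) on price (𝑦) is,
$$ \hat{x} = a_1 + b_1 y $$
Where,
$$ b_1 = \frac{c}{d} \times \frac{\sum u_i v_i - \frac{\sum u_i \sum v_i}{n}}{\sum v_i^2 - \frac{(\sum v_i)^2}{n}} $$
$$ = \frac{10}{100} \times \frac{5772 - \frac{69 \times (-7)}{8}}{2249 - \frac{(-7)^2}{8}} $$
$$ = 0.26 $$
And,
$$ a_1 = \bar{x} - b_1\bar{y} = \left(A + c\,\frac{\sum u_i}{n}\right) - b_1\left(B + d\,\frac{\sum v_i}{n}\right) $$
$$ = \left(500 + 10 \times \frac{69}{8}\right) - 0.26\left(4200 + 100 \times \frac{-7}{8}\right) $$
$$ = -483 $$
$$ \hat{x} = -483 + 0.26y $$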
Ans (c). When capacity is 1200 GB or 𝑥 = 1200, then the predicted price of the disk
would be,
𝑦̂ = 1988.516 + 3.623 × 1200
= 6336.116
Ans (d). When price is taka 5000 or 𝑦 = 5000, then the predicted capacity of the disk
would be,
𝑥̂ = −483 + 0.26 × 5000
= 817
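The whole disk example can be checked with a short Python sketch of the origin and scale shift method. The helper name `coded_sums` and the variable names are illustrative assumptions; the constants 𝐴 = 500, 𝑐 = 10, 𝐵 = 4200 and 𝑑 = 100 are the ones implied by the 𝑢𝑖 and 𝑣𝑖 columns of the table.

```python
def coded_sums(xs, ys, A, c, B, d):
    """Return n and the sums of u, v, u^2, v^2, u*v for the coded variables."""
    us = [(x - A) / c for x in xs]
    vs = [(y - B) / d for y in ys]
    return (
        len(xs),
        sum(us),
        sum(vs),
        sum(u * u for u in us),
        sum(v * v for v in vs),
        sum(u * v for u, v in zip(us, vs)),
    )


capacity = [120, 200, 250, 320, 500, 800, 1000, 1500]       # x, in GB
price = [2000, 2600, 2800, 3000, 4200, 5500, 6000, 6800]    # y, in taka
A, c, B, d = 500, 10, 4200, 100

n, su, sv, suu, svv, suv = coded_sums(capacity, price, A, c, B, d)
x_bar = A + c * su / n
y_bar = B + d * sv / n
s_uv = suv - su * sv / n

# Regression of price on capacity: y_hat = a_yx + b_yx * x
b_yx = (d / c) * s_uv / (suu - su ** 2 / n)
a_yx = y_bar - b_yx * x_bar

# Regression of capacity on price: x_hat = a_xy + b_xy * y
b_xy = (c / d) * s_uv / (svv - sv ** 2 / n)
a_xy = x_bar - b_xy * y_bar

price_at_1200 = a_yx + b_yx * 1200
capacity_at_5000 = a_xy + b_xy * 5000
print(round(b_yx, 3), round(b_xy, 2))
print(round(price_at_1200, 2), round(capacity_at_5000, 2))
```

Because the sketch keeps full precision, its predictions (about 6336.16 taka and 817.04 GB) differ slightly from the hand-calculated values, which use coefficients rounded to two or three decimals.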
Date: 23 / 05 / 23
Methods of constructing index number: A large number of formulae have been derived
for constructing index numbers. Broadly speaking, they can be grouped under two
heads,
i. Unweighted indices
   1. Simple aggregative
   2. Simple average of price relatives
ii. Weighted indices
   1. Weighted aggregative
   2. Weighted average of price relatives
Simple aggregative method: This is the simplest method of constructing index numbers.
When this method is used to construct a price index, the total of current year prices for
the various commodities in question is divided by the total of base year prices and the
quotient is multiplied by 100.
$$ P_{01} = \frac{\sum P_1}{\sum P_0} \times 100 $$
Question 1. From the following data construct an index number for 2005 taking 2004 as base.
Here, ∑𝑃1 = 1170, ∑𝑃0 = 1115
$$ P_{01} = \frac{\sum P_1}{\sum P_0} \times 100 $$
$$ = \frac{1170}{1115} \times 100 $$
$$ = 104.93 $$
Simple average of relatives method: When this method is used to compute a price
index, price relatives are obtained for the various items included in the index and then
averaged using any one of the measures of central tendency. When the arithmetic mean is
used, the formula is,
$$ P_{01} = \frac{\sum \left(\frac{P_1}{P_0} \times 100\right)}{N} $$
When the geometric mean is used for averaging the price relatives, the formula for
obtaining the index number becomes,
$$ P_{01} = \operatorname{antilog}\left(\frac{\sum \log P}{N}\right) $$
Where $P = \frac{P_1}{P_0} \times 100$ and 𝑁 is the number of items.
Question 2. From the following data construct a price index by the simple average of price
relatives method based on,
a. Arithmetic mean
b. Geometric mean
Commodity and unit Price (2004) Price (2005)
Butter (kg) 100.0 110.0
Cheese (kg) 60.0 75.0
Milk (liter) 20.0 30.0
Bread (quantity) 15.0 20.0
Eggs (Dozen) 20.0 25.0
Ghee (kg) 900.0 910.0
Solve: From the given data, we can construct the table below:
Commodity and unit   Price (2004) 𝑃0   Price (2005) 𝑃1   𝑃 = (𝑃1/𝑃0) × 100   log 𝑃
Butter (kg)          100.0             110.0             110                 2.041
Cheese (kg)          60.0              75.0              125                 2.097
Milk (liter)         20.0              30.0              150                 2.176
Bread (quantity)     15.0              20.0              133.33              2.125
Eggs (Dozen)         20.0              25.0              125                 2.097
Ghee (kg)            900.0             910.0             101.11              2.004
Here, ∑𝑃 = 744.44 and ∑ log 𝑃 = 12.54
$$ \therefore \text{Price index (AM)}, \quad P_{01} = \frac{\sum \left(\frac{P_1}{P_0} \times 100\right)}{N} = \frac{744.44}{6} = 124.073 $$
$$ \therefore \text{Price index (GM)}, \quad P_{01} = \operatorname{antilog}\left(\frac{\sum \log P}{N}\right) = \operatorname{antilog}\left(\frac{12.54}{6}\right) = 123.027 $$
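Both averages of price relatives can be computed directly from the price lists. The sketch below is a minimal illustration in Python; the variable names are assumptions, and base-10 logarithms are used to mirror the log and antilog steps of the hand calculation.

```python
import math

p0 = [100.0, 60.0, 20.0, 15.0, 20.0, 900.0]   # 2004 (base year) prices
p1 = [110.0, 75.0, 30.0, 20.0, 25.0, 910.0]   # 2005 (current year) prices

# Price relative of each commodity: P = (P1 / P0) * 100
relatives = [cur / base * 100 for base, cur in zip(p0, p1)]

# Arithmetic mean of the price relatives.
index_am = sum(relatives) / len(relatives)

# Geometric mean: antilog of the mean of log10(P).
mean_log = sum(math.log10(r) for r in relatives) / len(relatives)
index_gm = 10 ** mean_log

print(round(index_am, 3), round(index_gm, 2))
```

With full-precision logarithms the geometric-mean index is about 123.08, slightly above the hand-calculated figure, which relies on logarithms rounded to three decimals.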
Weighted aggregative index number: There are various methods of assigning weights
and consequently a large number of formulae for constructing index number have been
devised. Some of them are,
1. Laspeyres method
2. Paasche method
3. Dorbish and Bowley’s method
4. Fisher’s ideal method
5. Marshall-Edgeworth method
6. Kelly’s method
Laspeyres method: In this method, the base quantities are taken as weights. The
formula for constructing index is,
$$ P_{01} = \frac{\sum q_0 p_1}{\sum q_0 p_0} \times 100 $$
Paasche method: In this method, the current year quantities are taken as weights. The
formula for constructing the index is,
$$ P_{01} = \frac{\sum q_1 p_1}{\sum q_1 p_0} \times 100 $$
Dorbish and Bowley’s method: Dorbish and Bowley have suggested simple arithmetic
mean of the two indices (Laspeyres and Paasche) mentioned above so as to consider the
influence of both periods. The formula for constructing the index is,
$$ P_{01} = \frac{L + P}{2} $$
Where 𝐿 is the Laspeyres index and 𝑃 is the Paasche index.
Fisher’s ideal method: Professor Fisher has given a number of formulae for
constructing index numbers, and of those he calls one the ideal index. Fisher’s ideal
index is given by the formula,
$$ P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{L \times P} $$
Marshall-Edgeworth method: In this method also both current year as well as base year
prices and quantities are considered. The formula for constructing the index is,
$$ P_{01} = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 $$
Kelly’s method: T. L. Kelly has suggested the following formula for constructing index
numbers,
$$ P_{01} = \frac{\sum p_1 q}{\sum p_0 q} \times 100 $$
Where $q = \frac{q_0 + q_1}{2}$
Question 3. Construct index numbers of prices from the following data using methods 1 to 5:
Commodity   Price (2004)   Quantity (2004)   Price (2005)   Quantity (2005)
A           5              8                 7              9
B           4              15                9              17
C           7              18                10             20
D           9              20                3              13
Solve: From the above data, we can construct the table below:
Commodity   𝑝0   𝑞0   𝑝1   𝑞1   𝑝0𝑞0   𝑝1𝑞0   𝑝0𝑞1   𝑝1𝑞1   𝑝0(𝑞0 + 𝑞1)   𝑝1(𝑞0 + 𝑞1)
A           5    8    7    9    40     56     45     63     85            119
B           4    15   9    17   60     135    68     153    128           288
C           7    18   10   20   126    180    140    200    266           380
D           9    20   3    13   180    60     117    39     297           99
Total                           406    431    370    455    776           886
Where 𝑝0, 𝑞0 are the price and quantity in 2004 and 𝑝1, 𝑞1 are the price and quantity in 2005.
Here, using the column totals,
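1. Laspeyres method:
$$ P_{01} = \frac{\sum q_0 p_1}{\sum q_0 p_0} \times 100 = \frac{431}{406} \times 100 = 106.158 $$
2. Paasche method:
$$ P_{01} = \frac{\sum q_1 p_1}{\sum q_1 p_0} \times 100 = \frac{455}{370} \times 100 = 122.973 $$
3. Dorbish and Bowley’s method:
$$ P_{01} = \frac{L + P}{2} = \frac{\left(\frac{\sum q_0 p_1}{\sum q_0 p_0} \times 100\right) + \left(\frac{\sum q_1 p_1}{\sum q_1 p_0} \times 100\right)}{2} = \frac{106.158 + 122.973}{2} = 114.565 $$
4. Fisher’s method:
$$ P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{106.158 \times 122.973} = 114.26 $$
5. Marshall-Edgeworth method:
$$ P_{01} = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 = \frac{886}{776} \times 100 = 114.175 $$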
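The five weighted aggregative indices for Question 3 can be sketched in Python as below. The helper `dot` and the variable names are illustrative assumptions; each line follows the corresponding formula given in these notes.

```python
import math

p0 = [5, 4, 7, 9]      # 2004 prices
q0 = [8, 15, 18, 20]   # 2004 quantities
p1 = [7, 9, 10, 3]     # 2005 prices
q1 = [9, 17, 20, 13]   # 2005 quantities


def dot(a, b):
    """Sum of element-wise products, e.g. sum of p*q over all commodities."""
    return sum(x * y for x, y in zip(a, b))


laspeyres = dot(p1, q0) / dot(p0, q0) * 100           # base-year weights
paasche = dot(p1, q1) / dot(p0, q1) * 100             # current-year weights
dorbish_bowley = (laspeyres + paasche) / 2            # arithmetic mean of L and P
fisher = math.sqrt(laspeyres * paasche)               # geometric mean of L and P
q_sum = [a + b for a, b in zip(q0, q1)]
marshall_edgeworth = dot(p1, q_sum) / dot(p0, q_sum) * 100

for name, value in [
    ("Laspeyres", laspeyres),
    ("Paasche", paasche),
    ("Dorbish-Bowley", dorbish_bowley),
    ("Fisher", fisher),
    ("Marshall-Edgeworth", marshall_edgeworth),
]:
    print(f"{name}: {value:.3f}")
```

Fisher's index is the geometric mean of the Laspeyres and Paasche indices, so it always lies between them and never exceeds their arithmetic mean (the Dorbish and Bowley index).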
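Weighted average of price relatives method: When the arithmetic mean is used for
averaging the weighted price relatives, the formula is,
$$ P_{01} = \frac{\sum PV}{\sum V} $$
When the geometric mean is used, the formula becomes,
$$ P_{01} = \operatorname{antilog}\left[\frac{\sum V \log P}{\sum V}\right] $$
Where $P = \frac{p_1}{p_0} \times 100$ and 𝑉 denotes the value weights (typically 𝑝0𝑞0).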
Question 4. From the following data compute price index by applying weighted average of price
relatives using arithmetic mean.
Here, ∑𝑃𝑉 = 847499.85 and ∑𝑉 = 7150.
$$ P_{01} = \frac{\sum PV}{\sum V} = \frac{847499.85}{7150} = 118.531 $$
This means that there has been an 18.531 percent increase in price over the base level.