
Statistics file of pust

Stats

Uploaded by

aysalikhelbo

Statistics – 1131

Date: 14 / 02 / 23

Random variable: A random variable, denoted by 𝑋(·) or simply 𝑋, for a given probability space (Ω, 𝑃) is a function whose domain is Ω and whose counter-domain is the real line.

Example: Let us consider the experiment of tossing two fair coins. The sample space of
the experiment is,

Ω = {𝐻𝐻, 𝐻𝑇, 𝑇𝐻, 𝑇𝑇} [with two coins]

= {𝑊1, 𝑊2, 𝑊3, 𝑊4}

Let the random variable 𝑋 be defined as the number of heads. Thus,

𝑋(𝑊1) = 2
𝑋(𝑊2) = 1
𝑋(𝑊3) = 1
𝑋(𝑊4) = 0

So, the random variable 𝑋 is real-valued and takes the values 0, 1, 2 with probabilities 1/4, 1/2 and 1/4 respectively.

Rules of probability:

➢ 0 ≤ 𝑃(𝑋) ≤ 1
➢ ∑ 𝑃(𝑋) = 1

Discrete random variable: A random variable on Ω is defined to be a discrete random


variable if it takes at most a countable number of values. In other words, a real valued
function defined on a discrete sample space is called a discrete random variable.

Example: Let us consider a 3-child family in which a child is equally likely to be a boy or a girl. Let 𝑋 be the random variable defined as the number of boys, and let B and G denote a boy and a girl respectively. The sample space of the experiment is,

Ω = {𝐵𝐵𝐵, 𝐵𝐵𝐺, 𝐵𝐺𝐵, 𝐺𝐵𝐵, 𝐺𝐺𝐵, 𝐺𝐵𝐺, 𝐵𝐺𝐺, 𝐺𝐺𝐺}


and 𝑃(𝐵) = 𝑃(𝐺) = 1/2.

The sample space is simple and the probability of each outcome is 1/8.

Here the random variable 𝑋 takes a finite number of values 0, 1, 2, 3. Therefore 𝑋 is a discrete random variable.

Let 𝑓(𝑥) or 𝑃(𝑥) represent the probability that the random variable 𝑋 takes the value 𝑥. Thus, the probability function or probability mass function of the random variable 𝑋 for the above example can be written as,

Value of X: x     0      1      2      3
f(x) or P(x)      1/8    3/8    3/8    1/8
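
The probability mass function above can be checked by enumerating the sample space. The following is a minimal sketch, assuming boys and girls are equally likely and births are independent; the function name `pmf_number_of_boys` is only illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def pmf_number_of_boys(children=3):
    """Enumerate all equally likely B/G sequences and count the boys in each."""
    outcomes = list(product("BG", repeat=children))
    counts = Counter(outcome.count("B") for outcome in outcomes)
    # Each outcome has probability 1 / len(outcomes).
    return {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}


pmf = pmf_number_of_boys()
for x, p in pmf.items():
    print(f"P(X = {x}) = {p}")
```

The probabilities sum to 1 and each lies between 0 and 1, as the rules of probability require.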

➢ Bar chart: [figure not reproduced]

➢ Probability histogram: [figure not reproduced; x-axis: values of X (0, 1, 2, 3), y-axis: probability from 0 to 1/2]

Date: 26 / 02 / 23

Population: A population is the entire collection of objects or units whose


characteristics are of interest in any particular enquiry.

Example: The collection of all families residing in Dhaka city during the year constitutes
the population.

Populations are of two types:

➢ Finite population
➢ Infinite population

Finite population: A population consisting of a finite number of units is called a finite population; in a finite population, the total number of units is limited.

Infinite population: A population consisting of an infinite number of units is called an


infinite population; in infinite population, the number of units is endless.

Sample: A subgroup of a population selected for study is called a sample.

Scales of measurement: Scales of measurement describe how the properties of numbers change with the way the numbers are used. They are the foundation of any scientific investigation.

Scales of measurement are of two types,

➢ Qualitative data
➢ Quantitative data

Qualitative data: A variable is said to be qualitative if its values cannot be measured inherently on a numerical scale.

Quantitative data: A variable is said to be quantitative if its values are measured


inherently on a numerical scale.

Qualitative data are of two types,

➢ Nominal data
➢ Ordinal data

Nominal data: The measurement level in which numbers or symbols are assigned to the categories or variable values for identification only is called nominal data. The categories are distinct, mutually exclusive and exhaustive.

Ordinal data: The measurement level in which numbers are assigned to the categories or variable values for identification as well as ranking is called ordinal data.

Quantitative data are of two types,

➢ Interval scale data: The measurement level in which numbers are assigned to the
variable values in such a way that the measurement has order and distance
properties but not an absolute zero value is called interval scale data.
➢ Ratio scale data: The measurement level in which numbers are assigned to the
variable values in such a way that the measurement has order and distance
properties and an absolute zero property is called ratio scale data.

# Question 1: What do you mean by measurement scale of data and its classification?

Solve: The measurement scale of data describes how the properties of numbers change with the way the numbers are used. It is the foundation of any scientific investigation. Scales of measurement can be categorized in two different forms:

1. Qualitative data
➢ Nominal Data
➢ Ordinal Data
2. Quantitative data
➢ Interval scale data
➢ Ratio scale data

Nominal data refers to data that cannot be ordered or ranked, such as categories of
colors or names of countries. Ordinal data can be ordered or ranked, but the intervals
between the numbers are not necessarily equal, such as a rating scale from 1 to 5.
Interval data has equal intervals between the values but no true zero point, such as
temperature in Celsius or Fahrenheit. Ratio data has equal intervals between the values
and a true zero point, such as weight or height.

Data collection
Data: Observed values of one or more variables yield data. Each individual piece of data
is called a data point or an observation.

Sources of data: There are two sources of getting statistical data.

➢ Primary data
➢ Secondary data

Primary data: Primary data are those which are collected fresh and for the first time, and thus happen to be original in character.

Secondary data: These are data which have already been collected by individuals or agencies for purposes other than those of our particular research study.

Data Operation
Frequency: The number of observations or values falling in each group or class is called the class frequency or simply the frequency.

Frequency distribution: A frequency distribution is a set of mutually exclusive classes or categories together with the frequency of occurrence of items, values or observations in each class or category in a given set of data, usually presented in a tabular form.

Class boundary: A class boundary is always located mid-way between the upper limit of a class and the lower limit of the next higher class.

Percentage distribution: A percentage distribution is formed by dividing the number of observations attributable to a category or class by the total number of observations and multiplying the resulting value by 100.

$$\therefore P_i = \frac{F_i}{N} \times 100$$

Relative frequency distribution: Instead of presenting the frequencies in absolute
terms, it is sometimes convenient to express the frequencies in relative terms. The
resulting distribution is then called relative frequency distribution.

Graphs and diagrams:

➢ Quantitative data: Histogram, Frequency Polygon, Ogive curve
➢ Qualitative data: Bar Diagram, Pie chart

Date: 28 / 02 / 23

# Question 1: 10, 9, 15, 16, 13, 21, 25, 14, 37, 30, 32, 12, 19, 21, 23, 30, 33, 16, 16, 34.
Construct a frequency table and draw graph of histogram, ogive curve and frequency polygon.

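Solve: Here, 𝑁 = 20

Using Sturges' formula,

$$K = 1 + 3.322 \log_{10} N = 1 + 3.322 \log_{10} 20 = 5.32 \approx 5$$

$$\text{Number of class intervals} = \frac{\text{Range}}{K - 1} = \frac{37 - 9}{5 - 1} = 7 \quad [\text{C.I} = \text{Class Interval}]$$

Each class therefore has width 𝐾 − 1 = 4, and one more class is needed to hold the largest value 37. This means we have to construct a table with (C.I + 1) = 7 + 1 = 8 rows.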

By sorting the values, we get,

9, 10, 12, 13, 14, 15, 16, 16, 16, 19, 21, 21, 23, 25, 30, 30, 32, 33, 34, 37

The table is shown below:

Class interval    Frequency    Cumulative Frequency
9 – 13            3            3
13 – 17           6            9
17 – 21           1            10
21 – 25           3            13
25 – 29           1            14
29 – 33           3            17
33 – 37           2            19
37 – 41           1            20
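
The table can be reproduced with a short tally. This is a sketch under the assumption, consistent with the frequencies shown, that each class includes its lower limit and excludes its upper limit; `frequency_table` is a hypothetical helper name.

```python
data = [10, 9, 15, 16, 13, 21, 25, 14, 37, 30,
        32, 12, 19, 21, 23, 30, 33, 16, 16, 34]


def frequency_table(values, start, width, classes):
    """Return (lower, upper, frequency, cumulative frequency) for each class."""
    rows = []
    cumulative = 0
    for k in range(classes):
        lower = start + k * width
        upper = lower + width
        # Assumption: a value equal to the upper limit belongs to the next class.
        freq = sum(1 for v in values if lower <= v < upper)
        cumulative += freq
        rows.append((lower, upper, freq, cumulative))
    return rows


table = frequency_table(data, start=9, width=4, classes=8)
for lower, upper, freq, cumulative in table:
    print(f"{lower:>2} - {upper:<2}  {freq:>2}  {cumulative:>2}")
```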
Histogram: [figure not reproduced]

Ogive curve:

➢ Less than table:

Less than Frequency Cumulative frequency


13 3 3
17 6 9
21 1 10
25 3 13
29 1 14
33 3 17
37 2 19
41 1 20
➢ More than table:
Here 𝐹𝑖 = 𝑁 − (total frequency of all classes below the given lower boundary).

More than Frequency 𝐹𝑖


9 3 20
13 6 17
17 1 11
21 3 10
25 1 7
29 3 6
33 2 3
37 1 1
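
Both cumulative columns can be derived from the class frequencies alone. The sketch below uses the frequencies from the tables above; the variable names are illustrative.

```python
from itertools import accumulate

lower_bounds = [9, 13, 17, 21, 25, 29, 33, 37]
frequencies = [3, 6, 1, 3, 1, 3, 2, 1]
N = sum(frequencies)

# "Less than" cumulative frequency at each upper boundary (13, 17, ..., 41).
less_than = list(accumulate(frequencies))

# "More than" cumulative frequency at each lower boundary:
# N minus the total frequency of all classes below that boundary.
more_than = [N - below for below in [0] + less_than[:-1]]

print(less_than)
print(more_than)
```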
The ogive curve is shown below (plotted using only the cumulative frequency points from the “Less than” table and the 𝐹𝑖 points from the “More than” table): [figure not reproduced]

Note: The ogive curve must include data points from both tables.

➢ Frequency polygon:

Class interval Midpoint Frequency


9 – 13 11 3
13 – 17 15 6
17 – 21 19 1
21 – 25 23 3
25 – 29 27 1
29 - 33 31 3
33 – 37 35 2
37 - 41 39 1

Frequency polygon is shown below: [figure not reproduced; x-axis: class midpoints 11, 15, 19, 23, 27, 31, 35, 39; y-axis: frequency]

Qualitative data:

➢ Bar Diagram:

Blood Group Frequency


O+ 15
B+ 30
B- 2
AB+ 3

Bar diagram is shown below: [figure not reproduced; x-axis: blood groups O+, B+, B-, AB+; y-axis: frequency from 0 to 35]

➢ Pie chart:

Blood Group    Frequency    Relative Frequency (R.F)    R.F × 360°
O+             15           15/50 × 100 = 30%           108°
B+             30           30/50 × 100 = 60%           216°
B-             2            2/50 × 100 = 4%             14.4°
AB+            3            3/50 × 100 = 6%             21.6°
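
The relative frequencies and sector angles in the table follow directly from the counts. A minimal sketch, with illustrative variable names:

```python
frequencies = {"O+": 15, "B+": 30, "B-": 2, "AB+": 3}
total = sum(frequencies.values())

# Relative frequency in percent and the matching pie-chart angle in degrees.
percentages = {group: f * 100 / total for group, f in frequencies.items()}
angles = {group: f * 360 / total for group, f in frequencies.items()}

for group in frequencies:
    print(f"{group:>3}: {percentages[group]:.0f}%  {angles[group]:.1f} degrees")
```

The angles add up to 360°, which is a quick check before drawing the sectors.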

Pie chart is shown below (calculate the angles correctly and then draw using a protractor): [figure not reproduced; sectors: O+, B+, B-, AB+]

Date: 5 / 03 / 23

Central Tendency
Measures of center: Descriptive measures that indicate where the center or most typical
value of a data set lies.

Measures of center are three types:

➢ Mean
➢ Median
➢ Mode

Mean: The value obtained by summing all observations in a set and dividing by the number of observations. Means are of three types.

➢ Arithmetic mean
➢ Geometric mean
➢ Harmonic mean

Arithmetic mean: The arithmetic mean (AM) of a set of observations is the sum of the observations divided by the number of observations. Suppose we have $n$ observations $x_1, x_2, x_3, \dots, x_n$. Then the arithmetic mean is,

$$AM = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum x_i}{n}$$

Geometric mean: If a data set contains $n$ observations which are all positive, then the geometric mean is the positive $n^{th}$ root of their product. Suppose we have $n$ positive observations $x_1, x_2, x_3, \dots, x_n$. Thus, the geometric mean is,

$$GM = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{\frac{1}{n}}$$

Harmonic mean: If a data set contains non-zero observations, then the harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the observations. Suppose we have a set of $n$ non-zero observations $x_1, x_2, x_3, \dots, x_n$. Then the harmonic mean is,

$$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$

Weighted arithmetic mean: If the relative importance of the values varies, each value is assigned an appropriate numerical weight. Let $x_1, x_2, x_3, \dots, x_n$ be $n$ values whose relative importance is measured by corresponding positive weights $w_1, w_2, w_3, \dots, w_n$. Then the weighted arithmetic mean is given by,

$$\bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + \dots + w_n x_n}{w_1 + w_2 + \dots + w_n} = \frac{\sum w_i x_i}{\sum w_i}$$

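# Question 1: Find the value of AM, GM and HM for ungrouped data 2, 5, 9, 3, 7, 11, 15

Solve: Here $x_1 = 2, x_2 = 5, x_3 = 9, x_4 = 3, x_5 = 7, x_6 = 11, x_7 = 15$ and $n = 7$.

For AM, we know, $AM = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$

$$\therefore AM = \frac{2 + 5 + 9 + 3 + 7 + 11 + 15}{7} = \frac{52}{7} = 7.43$$

For GM, we know, $GM = (x_1 \cdot x_2 \cdot x_3 \cdots x_n)^{\frac{1}{n}}$

$$\therefore GM = (2 \cdot 5 \cdot 9 \cdot 3 \cdot 7 \cdot 11 \cdot 15)^{\frac{1}{7}} = 6.093$$

For HM, we know, $HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \dots + \frac{1}{x_n}}$

$$\therefore HM = \frac{7}{\frac{1}{2} + \frac{1}{5} + \frac{1}{9} + \frac{1}{3} + \frac{1}{7} + \frac{1}{11} + \frac{1}{15}} = \frac{7}{1.445} = 4.84$$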
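
The three means can be cross-checked with Python's standard `statistics` module. This is only a verification sketch for the data of this question, not part of the hand calculation.

```python
from statistics import geometric_mean, harmonic_mean, mean

data = [2, 5, 9, 3, 7, 11, 15]

am = mean(data)            # arithmetic mean: sum / n
gm = geometric_mean(data)  # n-th root of the product (all values positive)
hm = harmonic_mean(data)   # n / sum of reciprocals

print(f"AM = {am:.2f}, GM = {gm:.3f}, HM = {hm:.2f}")
```

The printed values also illustrate the ordering AM ≥ GM ≥ HM for positive data.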

# Question 2: Find the value of AM, GM and HM for the grouped data,

Class interval Frequency


0 – 10 2
10 – 20 7
20 – 30 13
30 – 40 5
40 – 50 1

Solve: For AM, we get,

Class Interval    Mid-Point (x_i)    Frequency (f_i)    x_i f_i
0 – 10            5                  2                  10
10 – 20           15                 7                  105
20 – 30           25                 13                 325
30 – 40           35                 5                  175
40 – 50           45                 1                  45

Here $N = 28$ and $\sum x_i f_i = 660$

We know,

$$AM = \frac{x_1 f_1 + x_2 f_2 + x_3 f_3 + \dots + x_n f_n}{N} = \frac{\sum x_i f_i}{N} = \frac{660}{28} = 23.571$$

For GM, we get (logarithms are to base 10),

Class Interval    Mid-Point (x_i)    Frequency (f_i)    log x_i    f_i log x_i
0 – 10            5                  2                  0.6990     1.3979
10 – 20           15                 7                  1.1761     8.2326
20 – 30           25                 13                 1.3979     18.1732
30 – 40           35                 5                  1.5441     7.7203
40 – 50           45                 1                  1.6532     1.6532

Here, $N = 28$ and $\sum f_i \log x_i = 37.1773$

$$GM = \text{Antilog}\left(\frac{\sum f_i \log x_i}{N}\right) = \text{Antilog}\left(\frac{37.1773}{28}\right) = 21.27$$

For HM, we get,

Class Interval    Mid-Point (x_i)    Frequency (f_i)    f_i / x_i
0 – 10            5                  2                  0.400
10 – 20           15                 7                  0.467
20 – 30           25                 13                 0.520
30 – 40           35                 5                  0.143
40 – 50           45                 1                  0.022

Here $N = 28$ and $\sum \frac{f_i}{x_i} = 1.552$

We know,

$$HM = \frac{N}{\frac{f_1}{x_1} + \frac{f_2}{x_2} + \dots + \frac{f_n}{x_n}} = \frac{N}{\sum \frac{f_i}{x_i}} = \frac{28}{1.552} = 18.04$$
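
The grouped formulas can be evaluated directly from the midpoints and frequencies. The sketch below assumes, as the tables do, that every observation in a class sits at the class midpoint, and uses base-10 logarithms for the geometric mean.

```python
import math

midpoints = [5, 15, 25, 35, 45]
frequencies = [2, 7, 13, 5, 1]
N = sum(frequencies)

# AM = sum(f * x) / N
am = sum(f * x for x, f in zip(midpoints, frequencies)) / N

# GM = antilog( sum(f * log10 x) / N )
gm = 10 ** (sum(f * math.log10(x) for x, f in zip(midpoints, frequencies)) / N)

# HM = N / sum(f / x)
hm = N / sum(f / x for x, f in zip(midpoints, frequencies))

print(f"AM = {am:.3f}, GM = {gm:.2f}, HM = {hm:.2f}")
```

Working with full floating-point precision avoids the small differences that rounded table entries introduce in the second decimal place.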

Median: The median of a dataset is the value that is exactly in the middle position of the list when the data are arranged in order from smallest to largest. If the number of observations is odd, then the median is the exact middle value in the ordered list. If it is even, then the median is halfway between the two middle values in the ordered list.

➢ If $N$ is odd, $\text{median} = \left(\frac{N+1}{2}\right)^{th}$ observation

➢ If $N$ is even, $\text{median} = \frac{\left(\frac{N}{2}\right)^{th} \text{ observation} + \left(\frac{N}{2}+1\right)^{th} \text{ observation}}{2}$

#Question 3: Find the median of the ungrouped data 2, 5, 9, 3, 7, 11, 15

Ans: Sorted, the data are 2, 3, 5, 7, 9, 11, 15, so the median is the 4th value, 7 (pretty simple huh!)

# Question 4: Find the median of the grouped data,

Class Interval Frequency


0 – 10 2
10 – 20 7
20 – 30 13
30 – 40 5
40 – 50 1

Solve: For median, we get,

Class Interval    Mid-point (x_i)    Frequency (f_i)    Cumulative Frequency (F_c)
0 – 10            5                  2                  2
10 – 20           15                 7                  9
20 – 30           25                 13                 22
30 – 40           35                 5                  27
40 – 50           45                 1                  28

Here the (N/2) = 14th observation falls in the class 20 – 30, so

$N = 28,\ X_L = 20,\ F_{C1} = 9,\ F_m = 13,\ C.I = 10$

We know,

$$\text{Median} = X_L + \frac{\frac{N}{2} - F_{C1}}{F_m} \times C.I = 20 + \frac{14 - 9}{13} \times 10 = 23.846$$
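
The interpolation formula for the grouped median can be sketched as a small function. The name `grouped_median` and its argument layout are illustrative assumptions; the classes are assumed to be contiguous and of equal width.

```python
def grouped_median(lower_bounds, frequencies, width):
    """Median = X_L + (N/2 - F_c) / F_m * C.I for the class holding the N/2-th value."""
    N = sum(frequencies)
    target = N / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_bounds, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("frequencies must contain at least one positive value")


median = grouped_median([0, 10, 20, 30, 40], [2, 7, 13, 5, 1], width=10)
print(round(median, 3))
```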

Date: 7 / 03 / 23

Mode: The mode of a dataset is its most frequently occurring value. If each value occurs
with the same frequency, the dataset has no mode; otherwise, any value that occurs
with greatest frequency is a mode.

# Question 1: Find the mode of the ungrouped data 2, 3, 5, 7, 9, 9, 11, 15

Ans: 9

# Question 2: Find the mode of the following grouped data,

Class Interval Frequency


0 – 10 2
10 – 20 7
20 – 30 13
30 – 40 5
40 – 50 1

Solve:

Class Interval    Mid-Point (x_i)    Frequency (f_i)    Cumulative Frequency
0 – 10            5                  2                  2
10 – 20           15                 7                  9
20 – 30           25                 13                 22
30 – 40           35                 5                  27
40 – 50           45                 1                  28

The modal class is 20 – 30, the class with the highest frequency. Here we get,

$X_L = 20$,
$\Delta_1 = 13 - 7 = 6$,
$\Delta_2 = 13 - 5 = 8$,
$C.I = 10$

We know,

$$\text{Mode } (M_o) = X_L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times C.I = 20 + \frac{6}{6 + 8} \times 10 = 24.286$$
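
The grouped mode formula can be sketched in the same style. `grouped_mode` is a hypothetical helper; it assumes a single modal class and, as an extra assumption not stated in the notes, treats a missing neighbouring class as having frequency 0.

```python
def grouped_mode(lower_bounds, frequencies, width):
    """Mode = X_L + d1 / (d1 + d2) * C.I for the class with the largest frequency."""
    m = max(range(len(frequencies)), key=lambda k: frequencies[k])
    # Assumption: a neighbouring class that does not exist has frequency 0.
    f_prev = frequencies[m - 1] if m > 0 else 0
    f_next = frequencies[m + 1] if m < len(frequencies) - 1 else 0
    d1 = frequencies[m] - f_prev  # difference from the preceding class
    d2 = frequencies[m] - f_next  # difference from the following class
    return lower_bounds[m] + d1 / (d1 + d2) * width


mode = grouped_mode([0, 10, 20, 30, 40], [2, 7, 13, 5, 1], width=10)
print(round(mode, 3))
```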

# Question 3: For two observations show that,

➢ 𝐴𝑀 ≥ 𝐺𝑀 ≥ 𝐻𝑀
➢ 𝐴𝑀 × 𝐻𝑀 = 𝐺𝑀2

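Solve: Let $x_1$ and $x_2$ be two positive observations, then we know,

$$AM = \frac{x_1 + x_2}{2}, \qquad GM = (x_1 \cdot x_2)^{\frac{1}{2}}, \qquad HM = \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}}$$

Now,

$$(\sqrt{x_1} - \sqrt{x_2})^2 \ge 0$$
$$\Rightarrow x_1 - 2\sqrt{x_1 x_2} + x_2 \ge 0$$
$$\Rightarrow x_1 + x_2 \ge 2\sqrt{x_1 x_2}$$
$$\Rightarrow \frac{x_1 + x_2}{2} \ge (x_1 x_2)^{\frac{1}{2}}$$
$$\therefore AM \ge GM$$

Again,

$$\left(\frac{1}{\sqrt{x_1}} - \frac{1}{\sqrt{x_2}}\right)^2 \ge 0$$
$$\Rightarrow \frac{1}{x_1} - 2 \cdot \frac{1}{\sqrt{x_1}} \cdot \frac{1}{\sqrt{x_2}} + \frac{1}{x_2} \ge 0$$
$$\Rightarrow \frac{1}{x_1} + \frac{1}{x_2} \ge \frac{2}{\sqrt{x_1 x_2}}$$
$$\Rightarrow \sqrt{x_1 x_2} \cdot \left(\frac{1}{x_1} + \frac{1}{x_2}\right) \ge 2$$
$$\Rightarrow \sqrt{x_1 x_2} \ge \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}}$$
$$\Rightarrow GM \ge HM$$
$$\therefore AM \ge GM \ge HM$$

Again,

$$AM \times HM = \frac{x_1 + x_2}{2} \times \frac{2}{\frac{1}{x_1} + \frac{1}{x_2}} = (x_1 + x_2) \times \frac{x_1 x_2}{x_1 + x_2} = x_1 x_2 = (\sqrt{x_1 x_2})^2 = GM^2$$
$$\therefore AM \times HM = GM^2$$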

# Question 4: Show that, $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$

Solve: Let $x_1, x_2, \dots, x_n$ be a set of n observations. Then the arithmetic mean,

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x_i}{n}$$

Thus, $n\bar{x} = \sum x_i$ ………(i)

Therefore, the sum of deviations of the observations about the mean,

$$\sum_{i=1}^{n} (x_i - \bar{x}) = (x_1 - \bar{x}) + (x_2 - \bar{x}) + \dots + (x_n - \bar{x})$$
$$= (x_1 + x_2 + \dots + x_n) - n\bar{x}$$
$$= n\bar{x} - n\bar{x} \quad [\text{from (i)}]$$
$$= 0$$

#Question 5: Show that $\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - A)^2$ where A is any arbitrary value.

Ans: Let $x_1, x_2, \dots, x_n$ be a set of n observations. Then the arithmetic mean,

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{\sum x_i}{n} \qquad (1)$$

Thus, $n\bar{x} = \sum x_i$

Therefore, the sum of deviations of the observations about the mean,

$$\sum_{i=1}^{n} (x_i - \bar{x}) = (x_1 + x_2 + \dots + x_n) - n\bar{x} = n\bar{x} - n\bar{x} = 0 \qquad (2)$$

Here $A \in \mathbb{R}$. For any such value, write $x_i - A = (x_i - \bar{x}) + (\bar{x} - A)$. Squaring and summing over all observations, we get,

$$\sum_{i=1}^{n} (x_i - A)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + 2(\bar{x} - A) \sum_{i=1}^{n} (x_i - \bar{x}) + n(\bar{x} - A)^2$$

By equation (2) the middle term is zero, so

$$\sum_{i=1}^{n} (x_i - A)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - A)^2 \qquad (3)$$

Since $n(\bar{x} - A)^2 \ge 0$, with equality only when $A = \bar{x}$, equation (3) gives,

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 \le \sum_{i=1}^{n} (x_i - A)^2$$

#Question 6: For three observations show that 𝐴𝑀 ≥ 𝐺𝑀 ≥ 𝐻𝑀

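Solve: For three positive observations $a, b, c$ we get,

$$AM = \frac{a + b + c}{3}, \qquad GM = (a \cdot b \cdot c)^{\frac{1}{3}}, \qquad HM = \frac{3abc}{ab + bc + ca}$$

Part 1: The relation between AM and GM is,

$$\frac{x_1 + x_2 + \dots + x_n}{n} \ge (x_1 x_2 \dots x_n)^{\frac{1}{n}}$$

Now, let's assume we have four variables, $p, q, r, s$. If we rewrite the relation, we get,

$$\frac{p + q + r + s}{4} \ge (pqrs)^{\frac{1}{4}}$$

(The four-variable case follows by applying the two-observation result of Question 3 to the pairs $p, q$ and $r, s$, and then to the two resulting means.)

Now, let's assume that the values are as follows,

$$p = a, \quad q = b, \quad r = c, \quad s = \frac{a + b + c}{3}$$

Here, we get,

$$a + b + c + \frac{a + b + c}{3} \ge 4 \left(\frac{a \cdot b \cdot c \, (a + b + c)}{3}\right)^{\frac{1}{4}}$$
$$\Rightarrow \frac{4(a + b + c)}{3} \ge 4 \, (abc)^{\frac{1}{4}} \left(\frac{a + b + c}{3}\right)^{\frac{1}{4}}$$
$$\Rightarrow \frac{a + b + c}{3} \ge (abc)^{\frac{1}{4}} \left(\frac{a + b + c}{3}\right)^{\frac{1}{4}}$$
$$\Rightarrow \left(\frac{a + b + c}{3}\right)^{1 - \frac{1}{4}} \ge (abc)^{\frac{1}{4}}$$
$$\Rightarrow \left(\frac{a + b + c}{3}\right)^{\frac{3}{4}} \ge (abc)^{\frac{1}{4}}$$
$$\Rightarrow \frac{a + b + c}{3} \ge (abc)^{\frac{1}{4} \times \frac{4}{3}}$$
$$\Rightarrow \frac{a + b + c}{3} \ge (abc)^{\frac{1}{3}}$$

Thus, for three observations, we can prove that, $AM \ge GM$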

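Part 2: Again, let's assume that our three variables are $p, q, r$ and their values are as follows, $p = \frac{1}{a},\ q = \frac{1}{b},\ r = \frac{1}{c}$

Now for these three observations, we get,

$$\frac{p + q + r}{3} \ge (p \cdot q \cdot r)^{\frac{1}{3}}$$
$$\Rightarrow p + q + r \ge 3 \, (p \cdot q \cdot r)^{\frac{1}{3}}$$

Now, by applying those values for those variables, we get,

$$\frac{1}{a} + \frac{1}{b} + \frac{1}{c} \ge 3 \left(\frac{1}{a} \cdot \frac{1}{b} \cdot \frac{1}{c}\right)^{\frac{1}{3}}$$
$$\Rightarrow \frac{ab + bc + ca}{3abc} \ge \left(\frac{1}{abc}\right)^{\frac{1}{3}}$$

Now inverting both sides (both are positive, so the inequality reverses), we get,

$$\left(\left(\frac{1}{abc}\right)^{\frac{1}{3}}\right)^{-1} \ge \left(\frac{ab + bc + ca}{3abc}\right)^{-1}$$
$$\Rightarrow (abc)^{\frac{1}{3}} \ge \frac{3abc}{ab + bc + ca}$$
$$\Rightarrow GM \ge HM$$

Therefore, we proved, $AM \ge GM \ge HM$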

#Question 5: Which measure of the central tendency is the best and why?

Solve: The mean is generally regarded as the best measure of central tendency, for the following reasons.

The mean is usually the best measure of central tendency to use when the data distribution is continuous and symmetrical, such as when the data are normally distributed. However, it all depends on what we are trying to show from the data. The mean is equal to the sum of the observation values in the dataset divided by the number of values in the dataset,

i.e. $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n}$

so every observation contributes to it.

The median is a good choice to represent a set with one or two outliers, or ordinal data.

And the mode is only useful for sets of data that have many identical values, such as nominal data.

For all these reasons, the mean is usually taken as the best measure of central tendency.

Quartile: There are three quartiles in a dataset, usually denoted by $Q_1$, $Q_2$ and $Q_3$, which divide the whole distribution into four equal parts.

The second quartile $Q_2$ is identical with the median.

The first quartile $Q_1$ is the value at or below which one-fourth (25%) of all observations in the set fall; the third quartile $Q_3$ is the value at or below which three-fourths (75%) of the observations lie.

So $Q_1$, $Q_2$ and $Q_3$ mark the 25%, 50% and 75% points of the distribution.

For ungrouped data,

$$Q_i = \left(\frac{i \times N}{4}\right)^{th} \text{ observation}$$

For grouped data,

$$Q_i = X_L + \frac{\frac{i \times N}{4} - F_c}{F_m} \times C.I$$

Here,

$Q_i$ = $i^{th}$ quartile

$X_L$ = Lower boundary of the $i^{th}$ quartile class

$F_m$ = Frequency of the $i^{th}$ quartile class

$F_c$ = Cumulative frequency of the class preceding the $i^{th}$ quartile class

$C.I$ = Class Interval

$Q_1 \le Q_2 \le Q_3$ and $Q_2$ = median

Deciles: When a distribution is divided into ten equal parts, each division is called a decile. Thus, there are 9 deciles in a distribution, which are denoted by $D_1, D_2, \dots, D_9$

For ungrouped data,

$$D_i = \left(\frac{i \times N}{10}\right)^{th} \text{ observation}$$

For grouped data,

$$D_i = X_L + \frac{\frac{i \times N}{10} - F_c}{F_m} \times C.I$$

Percentiles: The statistical measure referred to as a percentile offers a means of identifying the location of values in the data set that are not necessarily central values. Percentiles are the values which divide the distribution into 100 equal parts. Thus, there are 99 percentiles in a distribution, which are conventionally denoted by $P_1, P_2, \dots, P_{99}$

For ungrouped data,

$$P_i = \left(\frac{i \times N}{100}\right)^{th} \text{ observation}$$

For grouped data,

$$P_i = X_L + \frac{\frac{i \times N}{100} - F_c}{F_m} \times C.I$$

Date: 12 / 03 / 23

# Question 1: Find the Quartiles for group frequency data.

Class Interval Frequency


40 – 50 8
50 – 60 14
60 – 70 35
70 – 80 20
80 – 90 15
90 – 100 8

Solve: For Quartiles,

Class Interval    Frequency (f_i)    Cumulative Frequency (f_c)
40 – 50           8                  8
50 – 60           14                 22
60 – 70           35                 57
70 – 80           20                 77
80 – 90           15                 92
90 – 100          8                  100

Computation of Q1:

Here, $N = 100$

Let $i = 1$, hence $Q_1$ is the $\frac{i \times N}{4} = \frac{1 \times 100}{4} = 25^{th}$ observation.

The first quartile class is 60 – 70.

Here we get, $X_L = 60,\ i = 1,\ N = 100,\ F_c = 22,\ F_m = 35,\ C.I = 10$

$$Q_1 = X_L + \frac{\frac{i \times N}{4} - F_c}{F_m} \times C.I = 60 + \frac{25 - 22}{35} \times 10 = 60.857 \approx 60.86$$

Computation of Q2:

$i = 2$, hence $Q_2$ is the $\frac{2 \times 100}{4} = 50^{th}$ observation.

So, the second quartile class is 60 – 70.

Here we get, $X_L = 60,\ i = 2,\ N = 100,\ F_c = 22,\ F_m = 35,\ C.I = 10$

$$Q_2 = X_L + \frac{\frac{2 \times N}{4} - F_c}{F_m} \times C.I = 60 + \frac{\frac{2 \times 100}{4} - 22}{35} \times 10 = 68$$

Computation of Q3:

$i = 3$, hence $Q_3$ is the $\frac{3 \times 100}{4} = 75^{th}$ observation.

So, the third quartile class is 70 – 80.

Here we get, $X_L = 70,\ i = 3,\ N = 100,\ F_c = 57,\ F_m = 20,\ C.I = 10$

$$Q_3 = X_L + \frac{\frac{3 \times N}{4} - F_c}{F_m} \times C.I = 70 + \frac{\frac{3 \times 100}{4} - 57}{20} \times 10 = 79$$

# Question 2: Compute first, fifth and ninth deciles for group frequency data.

Class Interval Frequency


40 – 50 8
50 – 60 14
60 – 70 35
70 – 80 20
80 – 90 15
90 – 100 8
Ans: For deciles,

Class Interval    Frequency (f_i)    Cumulative Frequency (f_c)
40 – 50           8                  8
50 – 60           14                 22
60 – 70           35                 57
70 – 80           20                 77
80 – 90           15                 92
90 – 100          8                  100

Computation of D1:

$i = 1$, hence $D_1$ is the $\frac{i \times N}{10} = \frac{1 \times 100}{10} = 10^{th}$ observation.

So, the $D_1$ class is 50 – 60.

Here we get, $X_L = 50,\ i = 1,\ N = 100,\ F_c = 8,\ F_m = 14,\ C.I = 10$

$$D_1 = X_L + \frac{\frac{1 \times N}{10} - F_c}{F_m} \times C.I = 50 + \frac{\frac{1 \times 100}{10} - 8}{14} \times 10 = 51.43$$

Computation of D5:

$i = 5$, hence $D_5$ is the $\frac{i \times N}{10} = \frac{5 \times 100}{10} = 50^{th}$ observation.

So, the $D_5$ class is 60 – 70.

Here we get, $X_L = 60,\ i = 5,\ N = 100,\ F_c = 22,\ F_m = 35,\ C.I = 10$

$$D_5 = X_L + \frac{\frac{5 \times N}{10} - F_c}{F_m} \times C.I = 60 + \frac{\frac{5 \times 100}{10} - 22}{35} \times 10 = 68$$

Computation of D9:

$i = 9$, hence $D_9$ is the $\frac{i \times N}{10} = \frac{9 \times 100}{10} = 90^{th}$ observation.

So, the $D_9$ class is 80 – 90.

Here we get, $X_L = 80,\ i = 9,\ N = 100,\ F_c = 77,\ F_m = 15,\ C.I = 10$

$$D_9 = X_L + \frac{\frac{9 \times N}{10} - F_c}{F_m} \times C.I = 80 + \frac{\frac{9 \times 100}{10} - 77}{15} \times 10 = 88.6667 \approx 88.67$$

# Question 3: Compute 30𝑡ℎ and 90𝑡ℎ percentiles for group frequency data.

Class Interval Frequency


40 – 50 8
50 – 60 14
60 – 70 35
70 – 80 20
80 – 90 15
90 – 100 8
Ans: For percentiles,

Class Interval    Frequency (f_i)    Cumulative Frequency (f_c)
40 – 50           8                  8
50 – 60           14                 22
60 – 70           35                 57
70 – 80           20                 77
80 – 90           15                 92
90 – 100          8                  100

Computation of P30:

$i = 30$, hence $P_{30}$ is the $\frac{i \times N}{100} = \frac{30 \times 100}{100} = 30^{th}$ observation.

So, the $P_{30}$ class is 60 – 70.

Here we get, $X_L = 60,\ i = 30,\ N = 100,\ F_c = 22,\ F_m = 35,\ C.I = 10$

$$P_{30} = X_L + \frac{\frac{30 \times N}{100} - F_c}{F_m} \times C.I = 60 + \frac{\frac{30 \times 100}{100} - 22}{35} \times 10 = 62.285 \approx 62.29$$

Computation of P90:

$i = 90$, hence $P_{90}$ is the $\frac{i \times N}{100} = \frac{90 \times 100}{100} = 90^{th}$ observation.

So, the $P_{90}$ class is 80 – 90.

Here we get, $X_L = 80,\ i = 90,\ N = 100,\ F_c = 77,\ F_m = 15,\ C.I = 10$

$$P_{90} = X_L + \frac{\frac{90 \times N}{100} - F_c}{F_m} \times C.I = 80 + \frac{\frac{90 \times 100}{100} - 77}{15} \times 10 = 88.667 \approx 88.67$$
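
Quartiles, deciles and percentiles for grouped data all use the same interpolation formula, differing only in the divisor (4, 10 or 100). The sketch below captures that in one function; `grouped_quantile` and its `parts` parameter are illustrative names, and equal-width contiguous classes are assumed.

```python
def grouped_quantile(lower_bounds, frequencies, width, i, parts):
    """Return X_L + (i*N/parts - F_c) / F_m * C.I for the class holding the target."""
    N = sum(frequencies)
    target = i * N / parts  # position of the required observation
    cumulative = 0          # F_c: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, frequencies):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("i must satisfy 0 < i <= parts")


lower_bounds = [40, 50, 60, 70, 80, 90]
frequencies = [8, 14, 35, 20, 15, 8]

q2 = grouped_quantile(lower_bounds, frequencies, 10, i=2, parts=4)
q3 = grouped_quantile(lower_bounds, frequencies, 10, i=3, parts=4)
d9 = grouped_quantile(lower_bounds, frequencies, 10, i=9, parts=10)
p30 = grouped_quantile(lower_bounds, frequencies, 10, i=30, parts=100)
print(q2, q3, round(d9, 2), round(p30, 2))
```

Calling it with `i=5, parts=10` or `i=50, parts=100` gives the same value as `q2`, reflecting that the median, the fifth decile and the fiftieth percentile coincide.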

Trimmed mean: When a proportion of the largest and smallest observations is discarded from a data set and the arithmetic mean is computed from the remaining observations, that mean is called the trimmed mean.

Let $x_1, x_2, x_3, \dots, x_n$ be a set of observations arranged in order from smallest to largest. Let $\alpha\%$ of $n$ be $m$. Then the $\alpha\%$ trimmed mean is defined as,

$$TM = \frac{1}{n - 2m} \sum_{i=m+1}^{n-m} x_i$$

# Question 4: Consider the following set of 10 observations 50, 55, 52, 56, 58, 60, 57, 53, 120, 5
and determine the 10% trimmed mean.

Solve: 10% of 10 is 1. Thus the 10% trimmed mean is the mean of the 8 observations that remain after discarding the largest value 120 and the smallest value 5. That is, the trimmed mean is,

$$TM = \frac{1}{10 - (2 \times 1)} \times (50 + 55 + 52 + 56 + 58 + 60 + 57 + 53) = \frac{441}{8} = 55.125$$
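
The same computation can be sketched in a few lines. The helper name `trimmed_mean` is illustrative, and rounding `m` down when α% of n is not a whole number is an assumption that the notes do not address.

```python
def trimmed_mean(values, alpha_percent):
    """Drop the m smallest and m largest values (m = alpha% of n), then average."""
    ordered = sorted(values)
    n = len(ordered)
    m = int(n * alpha_percent / 100)  # assumption: fractional m is rounded down
    kept = ordered[m:n - m]
    return sum(kept) / len(kept)


observations = [50, 55, 52, 56, 58, 60, 57, 53, 120, 5]
tm = trimmed_mean(observations, 10)
print(tm)  # 55.125
```

For comparison, the untrimmed mean of these ten values is 56.6, pulled upward by the outlier 120.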

Date: 14 / 03 / 23

Measure of dispersion
Dispersion is an important characteristic of a frequency distribution; it describes how compactly the individual scores are distributed around the average.

Purpose of measure of dispersion:

1. To compare two or more series with regard to their variability


2. To determine the reliability of an average
3. To serve as a basis for control of variability
4. To facilitate the use of other statistical measures like correlation, regression,
structural equation modelling etc.

Methods of measuring dispersions:

1. Algebraic method
2. Graphical method

Algebraic method has two types:

1. The Absolute measure (same statistical unit)


➢ Range
➢ Quartile deviation
➢ Mean deviation
➢ Standard deviation
2. The relative measure (proportional unit)
➢ Coefficient of range
➢ Coefficient of quartile deviation
➢ Coefficient of mean deviation
➢ Coefficient of variation (CV)

Range: The range is measured simply as the difference between the highest and lowest values of the variable.

For ungrouped data,

∴ 𝑅𝑎𝑛𝑔𝑒 = 𝐻 − 𝐿

Where, 𝐻 = 𝐻𝑖𝑔ℎ𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒, 𝐿 = 𝐿𝑜𝑤𝑒𝑠𝑡 𝑣𝑎𝑙𝑢𝑒

For grouped data,

In the case of grouped data, the range is the difference between the upper boundary of
the highest class and the lower boundary of the lowest class. It is also calculated by
using the difference between the mid-points of the highest class and the lowest class.

Uses of range:

1. Quality control: The idea basically is that if the range, the difference between the
largest and smallest mass-produced items increases beyond a certain point, the
production machinery should be examined to find out why the items produced
have not followed their usual more consistent pattern.
2. Fluctuations in the share prices: Range is useful in studying the variations in the
price of stocks and shares and other commodities that are sensitive to price
changes from one period to another.

3. Weather forecast: The meteorological department does make use of the range in
determining the difference between minimum temperature and maximum
temperature.

Coefficient of range:

$$\text{Coefficient of range} = \frac{H - L}{H + L} \times 100\%$$

#Question 1: The following are the weekly wages of 8 workers in a manufacturing factory. Find
the range and coefficient of range. Wages are in taka 1400, 1450, 1520, 1380, 1485, 1495, 1575,
1440

Solve: Given that, wages are 1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440

By sorting the wages values from lowest to highest we get,

1380, 1400, 1440, 1450, 1485, 1495, 1520, 1575

Here, $H = 1575,\ L = 1380$

We know, $\text{Range} = H - L$

$$\therefore \text{Range} = 1575 - 1380 = 195$$

Again, we know,

$$\text{Coefficient of range} = \frac{H - L}{H + L} \times 100\%$$

$$\therefore \text{Coefficient of range} = \frac{195}{1575 + 1380} \times 100\% = 6.60\%$$
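
A quick check of the range and its coefficient for the wage data, as a minimal sketch with illustrative variable names:

```python
wages = [1400, 1450, 1520, 1380, 1485, 1495, 1575, 1440]

highest, lowest = max(wages), min(wages)
value_range = highest - lowest
coefficient_of_range = (highest - lowest) / (highest + lowest) * 100  # in percent

print(value_range, round(coefficient_of_range, 2))
```

Sorting is not actually required here, since only the largest and smallest values enter the formulas.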

#Question 2: The following table gives the frequency distribution of the number of orders
received each day during past 50 days at office of a mail company. Calculate the range and
coefficient of range.

Number of Order Frequency


10 – 12 4
13 – 15 12
16 – 18 20
19 – 21 10
22 – 24 4

Solve: Here the table has inclusive class intervals. So, to determine the highest and lowest values, we need to adjust the class intervals. By adjusting the class intervals, we get,

Class Interval    Adjusted Class Interval    Frequency
10 – 12           9.5 – 12.5                 4
13 – 15           12.5 – 15.5                12
16 – 18           15.5 – 18.5                20
19 – 21           18.5 – 21.5                10
22 – 24           21.5 – 24.5                4

Method – 1: Using adjusted class interval:

Here, $H = 24.5,\ L = 9.5$

$$\therefore \text{Range} = 24.5 - 9.5 = 15$$

$$\therefore \text{Coefficient of range} = \frac{H - L}{H + L} \times 100\% = \frac{15}{24.5 + 9.5} \times 100\% = 44.12\%$$

Method – 2: Using mid-point:

Here, $H = 23,\ L = 11$

$$\therefore \text{Range} = 23 - 11 = 12$$

$$\therefore \text{Coefficient of range} = \frac{H - L}{H + L} \times 100\% = \frac{12}{23 + 11} \times 100\% = 35.29\%$$

Date: 21 / 03 / 23

#Quartile deviation:

Interquartile range: The interquartile range (IQR) of a data set is the difference between the 1st and 3rd quartiles.

$$IQR = Q_3 - Q_1$$

Quartile Deviation (QD):

$$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2}$$

Coefficient of Quartile deviation:

The coefficient of QD is used to compare the variation in two data sets. It is calculated as,

$$\frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\%$$

Example: Following are the runs scored by a batsman in last 20 test matches:

96, 70, 100, 96, 81, 84, 90, 89, 63, 90, 34, 75, 39, 82, 85, 86, 76, 64, 67 𝑎𝑛𝑑 88

Calculate the quartile deviation and coefficient of quartile deviation.

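Solve: Arranging the runs in ascending order, we get,

34, 39, 63, 64, 67, 70, 75, 76, 81, 82, 84, 85, 86, 88, 89, 90, 90, 96, 96, 100

First quartile ($Q_1$):

$$Q_1 = \frac{i(n + 1)}{4} = \frac{20 + 1}{4} = 5.25^{th} \text{ observation}$$

The 5.25th observation lies a quarter of the way between the 5th and 6th observations, that is, between 67 and 70. Therefore,

$$Q_1 = 67 + 0.25 \times (70 - 67) = 67.75$$

Now,

$$Q_3 = \frac{3 \times (20 + 1)}{4} = 15.75^{th} \text{ observation}$$

The 15.75th observation lies three quarters of the way between the 15th and 16th observations, that is, between 89 and 90. Therefore,

$$Q_3 = 89 + 0.75 \times (90 - 89) = 89.75$$

Hence the quartile deviation,

$$\frac{Q_3 - Q_1}{2} = \frac{89.75 - 67.75}{2} = 11$$

And the coefficient of quartile deviation,

$$\frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\% = \frac{89.75 - 67.75}{89.75 + 67.75} \times 100\% = 13.97\%$$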
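
The positional rule used in this example, with the i-th quartile at position i(n + 1)/4 and linear interpolation between neighbouring ordered values, can be sketched as follows. The function name `quartile` and the clamping at the ends of the list are assumptions added for illustration.

```python
def quartile(values, i):
    """i-th quartile at 1-based position i*(n+1)/4, interpolating linearly."""
    ordered = sorted(values)
    position = i * (len(ordered) + 1) / 4
    k = int(position)          # index of the lower neighbour (1-based)
    fraction = position - k    # how far to move towards the upper neighbour
    if k < 1:                  # assumption: clamp positions before the first value
        return ordered[0]
    if k >= len(ordered):      # assumption: clamp positions after the last value
        return ordered[-1]
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])


runs = [96, 70, 100, 96, 81, 84, 90, 89, 63, 90,
        34, 75, 39, 82, 85, 86, 76, 64, 67, 88]

q1, q3 = quartile(runs, 1), quartile(runs, 3)
quartile_deviation = (q3 - q1) / 2
coefficient_of_qd = (q3 - q1) / (q3 + q1) * 100
print(q1, q3, quartile_deviation, round(coefficient_of_qd, 2))
```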

# Question 1: Following are the observation showing the age of 50 employees working
in a wholesale center. Find the quartile deviation and coefficient of QD

Age (in year) 40-44 45-49 50-54 55-59 60-64 65-69


Employees 4 7 14 11 8 6

Solve: For Quartiles,

Class Interval    Adjusted Class Interval    Frequency (f_i)    Cumulative Frequency (f_c)
40 – 44           39.5 – 44.5                4                  4
45 – 49           44.5 – 49.5                7                  11
50 – 54           49.5 – 54.5                14                 25
55 – 59           54.5 – 59.5                11                 36
60 – 64           59.5 – 64.5                8                  44
65 – 69           64.5 – 69.5                6                  50
Computation of Q1:

Here, $N = 50$

Let $i = 1$, hence $Q_1$ is the $\frac{i \times N}{4} = \frac{1 \times 50}{4} = 12.5^{th}$ observation. The 12.5th observation lies between the 12th and 13th observations.

Hence, the first quartile class is 49.5 − 54.5

Here we get, $X_L = 49.5,\ i = 1,\ N = 50,\ F_c = 11,\ F_m = 14,\ C.I = 5$

$$Q_1 = X_L + \frac{\frac{i \times N}{4} - F_c}{F_m} \times C.I = 49.5 + \frac{12.5 - 11}{14} \times 5 = 50.0357 \approx 50.036$$

Computation of Q3:

Here, $N = 50$

Let $i = 3$, hence $Q_3$ is the $\frac{i \times N}{4} = \frac{3 \times 50}{4} = 37.5^{th}$ observation. The 37.5th observation lies between the 37th and 38th observations.

Hence, the third quartile class is 59.5 − 64.5

Here we get, $X_L = 59.5,\ i = 3,\ N = 50,\ F_c = 36,\ F_m = 8,\ C.I = 5$

$$Q_3 = X_L + \frac{\frac{i \times N}{4} - F_c}{F_m} \times C.I = 59.5 + \frac{37.5 - 36}{8} \times 5 = 60.4375 \approx 60.44$$

Therefore,

$$\text{Quartile Deviation (QD)} = \frac{Q_3 - Q_1}{2} = \frac{60.44 - 50.036}{2} = 5.20$$

$$\text{Coefficient of QD} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100\% = \frac{60.44 - 50.036}{60.44 + 50.036} \times 100\% = 9.42\%$$

Date: 28 / 03 / 23

# Mean deviation:

The mean deviation of $x$ about $c$ is denoted by $MD_c$. Thus,

$$MD_c = \frac{1}{n} \sum_{i=1}^{n} |x_i - c|$$

In particular, when $c = \bar{x}$, the mean deviation about the mean is,

$$MD_{\bar{x}} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$

It may be mentioned that the mean deviation is generally calculated about the arithmetic mean. Again, if $x_1, x_2, x_3, \dots, x_n$ are the given values of a variable $x$ and $f_1, f_2, f_3, \dots, f_n$ are the corresponding frequencies, then,

$$MD_c = \frac{1}{N} \sum_{i=1}^{n} f_i |x_i - c|, \qquad N = \sum_{i=1}^{n} f_i$$

# Coefficient of mean deviation:

A relative measure of dispersion based on the mean deviation is called the coefficient of mean deviation or the coefficient of dispersion.

$$\text{Coefficient of mean deviation (mean)} = \frac{MD}{\text{mean}} \times 100\%$$

$$\text{Coefficient of mean deviation (median)} = \frac{MD}{\text{median}} \times 100\%$$

$$\text{Coefficient of mean deviation (mode)} = \frac{MD}{\text{mode}} \times 100\%$$

# Question 1: Following are the number of hours a machine worked for the last 9 weeks,

47, 63, 75, 39, 10, 60, 96, 32 𝑎𝑛𝑑 28

a. Find the mean absolute deviation from mean.
b. Find the coefficient of mean deviation from mean.
c. Find mean absolute deviation from median.
d. Find the coefficient of mean deviation from median.

Solve: We arrange the data in ascending order:

10, 28, 32, 39, 47, 60, 63, 75, 96

$$\therefore \text{Median}, Me = \left(\frac{n + 1}{2}\right)^{th} \text{ observation} = 5^{th} \text{ observation} = 47$$

$$\therefore \text{Mean}, \bar{x} = \frac{10 + 28 + 32 + 39 + 47 + 60 + 63 + 75 + 96}{9} = 50$$

To find the mean deviation and coefficient of mean deviation from mean, we construct
the following table:

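x_i    x_i − x̄    |x_i − x̄|
10     -40        40
28     -22        22
32     -18        18
39     -11        11
47     -3         3
60     10         10
63     13         13
75     25         25
96     46         46

$\sum x_i = 450, \quad \sum |x_i - \bar{x}| = 188$

$$\therefore MD(\text{mean}) = \frac{\sum |x_i - \bar{x}|}{n} = \frac{188}{9} = 20.89$$

$$\text{Coefficient of } MD(\text{mean}) = \frac{MD}{\text{mean}} \times 100\% = \frac{20.89}{50} \times 100\% = 41.78\%$$
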
Again, to find the mean deviation and coefficient of mean deviation from median, we construct the following table:

x_i    x_i − Me    |x_i − Me|
10     -37         37
28     -19         19
32     -15         15
39     -8          8
47     0           0
60     13          13
63     16          16
75     28          28
96     49          49

$\sum |x_i - Me| = 185$

$$MD(\text{median}) = \frac{\sum |x_i - Me|}{n} = \frac{185}{9} = 20.56$$

$$\text{Coefficient of } MD(\text{median}) = \frac{MD}{\text{median}} \times 100\% = \frac{20.56}{47} \times 100\% = 43.74\%$$
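
Both mean deviations can be computed with one small helper. This is a sketch for the ungrouped case; `mean_deviation` is an illustrative name.

```python
from statistics import mean, median

hours = [47, 63, 75, 39, 10, 60, 96, 32, 28]


def mean_deviation(values, center):
    """Average absolute deviation of the values about the given center."""
    return sum(abs(x - center) for x in values) / len(values)


md_mean = mean_deviation(hours, mean(hours))
md_median = mean_deviation(hours, median(hours))

coef_mean = md_mean / mean(hours) * 100
coef_median = md_median / median(hours) * 100

print(round(md_mean, 2), round(coef_mean, 2))
print(round(md_median, 2), round(coef_median, 2))
```

The deviation about the median is the smaller of the two, which holds in general: the mean absolute deviation is minimised when taken about the median.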

# Standard deviation and variance:

Standard deviation of a set of observations of a variable is defined as the square root of the arithmetic mean of the squares of the deviations from the arithmetic mean. In other words, it is,

“The root mean squared deviation from the arithmetic mean”

# Variance: Variance is the arithmetic mean of the squares of the deviations of all values in a set of numbers from their arithmetic mean. In other words, variance is the square of the standard deviation,

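$$\text{Standard deviation}, \sigma = \sqrt{\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n}}$$

$$\text{Variance}, \sigma^2 = \left\{\sqrt{\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n}}\right\}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$\text{Population variance}, \sigma^2 = \sum_{i=1}^{n} \frac{x_i^2}{n} - \left(\sum_{i=1}^{n} \frac{x_i}{n}\right)^2$$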

# Proof:
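$$\text{Variance}, \sigma^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n}$$

$$= \sum_{i=1}^{n} \frac{x_i^2 - 2 x_i \bar{x} + \bar{x}^2}{n}$$

$$= \sum_{i=1}^{n} \frac{x_i^2}{n} - \sum_{i=1}^{n} \frac{2 x_i \bar{x}}{n} + \sum_{i=1}^{n} \frac{\bar{x}^2}{n}$$

$$= \frac{1}{n} \sum_{i=1}^{n} x_i^2 - 2\bar{x} \cdot \frac{1}{n} \sum_{i=1}^{n} x_i + \bar{x}^2$$

Explanation: since $\sum_{i=1}^{n} 1 = n$, we have $\sum_{i=1}^{n} \frac{\bar{x}^2}{n} = \frac{\bar{x}^2}{n} \sum_{i=1}^{n} 1 = \frac{\bar{x}^2}{n} \times n = \bar{x}^2$.

$$= \frac{1}{n} \sum_{i=1}^{n} x_i^2 - 2\bar{x}^2 + \bar{x}^2$$

$$= \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \bar{x}^2$$

$$= \sum_{i=1}^{n} \frac{x_i^2}{n} - \left(\sum_{i=1}^{n} \frac{x_i}{n}\right)^2$$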

Date: 2 / 04 / 23

# For the data having their respective frequency:

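$$\text{Standard deviation}, \sigma = \sqrt{\sum \frac{f_i (x_i - \bar{x})^2}{n}} = \sqrt{\sum \frac{f_i x_i^2}{n} - \left(\sum \frac{f_i x_i}{n}\right)^2}$$

$$\text{Population variance}, \sigma^2 = \sum \frac{f_i (x_i - \bar{x})^2}{n} = \sum \frac{f_i x_i^2}{n} - \left(\sum \frac{f_i x_i}{n}\right)^2$$

Here the sums run over the classes and 𝑛 = ∑𝑓𝑖 is the total frequency.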

# Sample standard deviation and variance:

Sample standard deviation is denoted by 𝑆 (without having frequency),

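$$S = \sqrt{\sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{1}{n - 1} \left\{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right\}}$$

Sample variance,

$$S^2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \left\{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right\}$$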

Having respective frequencies,

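$$\text{Sample standard deviation (frequency)}, S = \sqrt{\sum \frac{f_i (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{1}{n - 1} \left\{\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}\right\}}$$

$$\text{Sample variance (frequency)}, S^2 = \sum \frac{f_i (x_i - \bar{x})^2}{n - 1} = \frac{1}{n - 1} \left\{\sum f_i x_i^2 - \frac{\left(\sum f_i x_i\right)^2}{n}\right\}$$

Here 𝑛 = ∑𝑓𝑖 is the total frequency.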

# Question 1: The data on relative humidity (in %) for the last 10 days of a month in a city in
Bangladesh are 90, 97, 92, 95, 93, 95, 85, 83, 85, 75. Calculate the variance and standard
deviation for the above data.

Solve: Using the data, we construct the following table,


𝑥𝑖 𝑥𝑖²
90 8100
97 9409
92 8464
95 9025
93 8649
95 9025
85 7225
83 6889
85 7225
75 5625

Here, ∑𝑥𝑖 = 890, ∑𝑥𝑖² = 79636, 𝑛 = 10

We know,

$$\sigma^2 = \sum_{i=1}^{n} \frac{x_i^2}{n} - \left(\sum_{i=1}^{n} \frac{x_i}{n}\right)^2 = \frac{79636}{10} - \left(\frac{890}{10}\right)^2 = 7963.6 - 7921 = 42.6$$

And standard deviation,

𝜎 = √42.6 = 6.53
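
As a cross-check, the shortcut formula can be coded directly and compared with the standard library. This is a minimal sketch; the function name `population_variance` is a hypothetical helper, while `statistics.pvariance` is the standard-library population variance.

```python
from math import sqrt
from statistics import pvariance

def population_variance(data):
    """Shortcut form: mean of the squares minus the square of the mean."""
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2

humidity = [90, 97, 92, 95, 93, 95, 85, 83, 85, 75]
var = population_variance(humidity)
sd = sqrt(var)

# The definitional form (pvariance) and the shortcut should agree.
print(round(var, 2), round(pvariance(humidity), 2), round(sd, 2))
```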

# Question 2: Daily incomes of 15 workers in a factory are given below:

Daily Income 100 – 150 150 – 200 200 – 250 250 – 300 300 – 350
No. of workers 1 3 6 4 1

Calculate variance and standard deviation (SD)

Solve: Using the data, we construct the following table,
𝐶𝐼 𝑓𝑖 𝑀𝑖𝑑𝑝𝑜𝑖𝑛𝑡 (𝑥𝑖 ) 𝑓𝑖 𝑥𝑖 𝑥𝑖2 𝑓𝑖 𝑥𝑖2
100 – 150 1 125 125 15625 15625
150 – 200 3 175 525 30625 91875
200 – 250 6 225 1350 50625 303750
250 – 300 4 275 1100 75625 302500
300 – 350 1 325 325 105625 105625

Here, 𝑛 = ∑𝑓𝑖 = 15, ∑𝑓𝑖𝑥𝑖 = 3425, ∑𝑓𝑖𝑥𝑖² = 819375

We know,

$$\sigma^2 = \sum \frac{f_i x_i^2}{n} - \left(\sum \frac{f_i x_i}{n}\right)^2 = \frac{819375}{15} - \left(\frac{3425}{15}\right)^2 = 54625 - 52136.11 = 2488.89$$

And standard deviation,

𝜎 = √2488.89 = 49.89
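
The grouped-data calculation follows the same pattern with class midpoints weighted by frequency. A small sketch, assuming each class is represented by its midpoint (the function name `grouped_mean_variance` is illustrative only):

```python
from math import sqrt

def grouped_mean_variance(intervals, freqs):
    """Population mean and variance of grouped data via class midpoints."""
    mids = [(lo + hi) / 2 for lo, hi in intervals]
    n = sum(freqs)
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    mean_sq = sum(f * x * x for f, x in zip(freqs, mids)) / n
    return mean, mean_sq - mean ** 2

intervals = [(100, 150), (150, 200), (200, 250), (250, 300), (300, 350)]
workers = [1, 3, 6, 4, 1]
mean_income, var_income = grouped_mean_variance(intervals, workers)
sd_income = sqrt(var_income)
print(round(mean_income, 2), round(var_income, 2), round(sd_income, 2))
```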

# Coefficient of variation: The coefficient of variation (CV) is a relative measure of dispersion, defined by,

$$CV = \frac{\sigma}{\bar{x}} \times 100\%$$

Where 𝜎 is the standard deviation and 𝑥̅ is the mean. The smaller the CV, the more consistent (less variable) the data.

# Question 3: The run scores of two cricketers for 10 innings are given below:

A 105 12 45 74 0 30 80 55 0 39
B 25 35 28 40 21 14 33 31 41 32

a. Who is a better player?

b. Who is a more consistent player?

Solve (a): To compare the two players, we calculate the mean, the standard deviation and the coefficient of variation for each cricketer.

Cricketer A Cricketer B
Score (𝑥𝑖 ) 𝑥𝑖2 Score (𝑦𝑖 ) 𝑦𝑖2
105 11025 25 625
12 144 35 1225
45 2025 28 784
74 5476 40 1600
0 0 21 441
30 900 14 196
80 6400 33 1089
55 3025 31 961
0 0 41 1681
39 1521 32 1024
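Totals (recomputed from the scores listed above): ∑𝑥𝑖 = 440, ∑𝑥𝑖² = 30516, ∑𝑦𝑖 = 300, ∑𝑦𝑖² = 9626

For cricketer A, mean,

$$\bar{x} = \sum_{i=1}^{n} \frac{x_i}{n} = \frac{440}{10} = 44$$

And standard deviation,

$$\sigma_A = \sqrt{\sum_{i=1}^{n} \frac{x_i^2}{n} - \left(\sum_{i=1}^{n} \frac{x_i}{n}\right)^2} = \sqrt{\frac{30516}{10} - \left(\frac{440}{10}\right)^2} = \sqrt{1115.6} = 33.40$$

$$CV(A) = \frac{\sigma_A}{\bar{x}} \times 100\% = \frac{33.40}{44} \times 100\% = 75.91\%$$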

Again, for cricketer B, Mean,


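$$\bar{y} = \sum_{i=1}^{n} \frac{y_i}{n} = \frac{300}{10} = 30$$

And standard deviation,

$$\sigma_B = \sqrt{\sum_{i=1}^{n} \frac{y_i^2}{n} - \left(\sum_{i=1}^{n} \frac{y_i}{n}\right)^2} = \sqrt{\frac{9626}{10} - \left(\frac{300}{10}\right)^2} = \sqrt{62.6} = 7.91$$

$$CV(B) = \frac{\sigma_B}{\bar{y}} \times 100\% = \frac{7.91}{30} \times 100\% = 26.37\%$$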

From the above results, the mean score of cricketer A is higher than the mean score of cricketer B.

Hence cricketer A is the better player.

Solve (b): Since the coefficient of variation for cricketer A is higher than that of cricketer B, cricketer B is more consistent than cricketer A.

# Question 4: The daily wages (in taka) paid to the workers in the two factories A and B in
Dhaka city are given below:

Daily wages 150-200 200-250 250-300 300-350 350-400 400-450 450-500


Factory A 15 20 42 50 35 20 12
Factory B 22 37 71 55 44 26 15

a. Which factory pays higher average wages?


b. Which factory pays higher amount of wages?
c. Which factory has more consistent wages structure?

Solve: From the above data, we can construct the table below for “Factory A”,

Class Interval 𝑓𝑖 𝑥𝑖 𝑥𝑖2 𝑓𝑖 𝑥𝑖 𝑓𝑖 𝑥𝑖2


150 – 200 15 175 30625 2625 459375
200 – 250 20 225 50625 4500 1012500
250 – 300 42 275 75625 11550 3176250
300 – 350 50 325 105625 16250 5281250
350 – 400 35 375 140625 13125 4921875
400 – 450 20 425 180625 8500 3612500
450 – 500 12 475 225625 5700 2707500

Here, 𝑛 = 194, ∑𝑓𝑖𝑥𝑖 = 62250, ∑𝑓𝑖𝑥𝑖² = 21171250

a: So, mean (Factory A),

$$\bar{x}_A = \sum \frac{f_i x_i}{n} = \frac{62250}{194} = 320.88$$

From the above data, we can construct the table below for “Factory B”,

Class Interval 𝑓𝑖 𝑥𝑖 𝑥𝑖2 𝑓𝑖 𝑥𝑖 𝑓𝑖 𝑥𝑖2


150 – 200 22 175 30625 3850 673750
200 – 250 37 225 50625 8325 1873125
250 – 300 71 275 75625 19525 5369375
300 – 350 55 325 105625 17875 5809375
350 – 400 44 375 140625 16500 6187500
400 – 450 26 425 180625 11050 4696250
450 – 500 15 475 225625 7125 3384375
Here, 𝑛 = 270, ∑𝑓𝑖𝑥𝑖 = 84250, ∑𝑓𝑖𝑥𝑖² = 27993750

Mean (Factory B),

$$\bar{x}_B = \sum \frac{f_i x_i}{n} = \frac{84250}{270} = 312.04$$

Thus, “Factory A” pays higher average wages than “Factory B”.

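b: The total amount of wages paid by a factory is the sum of the wages of all its workers, that is, ∑𝑓𝑖𝑥𝑖 = 𝑛𝑥̅.

For “Factory A”, total wages = 62250 taka

For “Factory B”, total wages = 84250 taka

Thus, “Factory B” pays the higher total amount of wages, because it employs more workers (270 against 194), even though its average wage is lower.

The standard deviations, which are needed for part (c), are as follows. For “Factory A”,

$$\sigma_A = \sqrt{\sum \frac{f_i x_i^2}{n} - \left(\sum \frac{f_i x_i}{n}\right)^2} = \sqrt{\frac{21171250}{194} - \left(\frac{62250}{194}\right)^2} = \sqrt{6168.56} = 78.54$$

For “Factory B”,

$$\sigma_B = \sqrt{\sum \frac{f_i x_i^2}{n} - \left(\sum \frac{f_i x_i}{n}\right)^2} = \sqrt{\frac{27993750}{270} - \left(\frac{84250}{270}\right)^2} = \sqrt{6313.44} = 79.46$$

Here 𝜎𝐴 is smaller than 𝜎𝐵, so the wages in “Factory B” are slightly more variable in absolute terms.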

c: To determine which wage structure is more consistent, we need the coefficient of variation.

Now, 𝜎𝐴 = 78.54, 𝜎𝐵 = 79.46, 𝑥̅𝐴 = 320.88, 𝑥̅𝐵 = 312.04

So, for “Factory A”,

$$CV(A) = \frac{\sigma_A}{\bar{x}_A} \times 100\% = \frac{78.54}{320.88} \times 100\% = 24.48\%$$

For “Factory B”,

$$CV(B) = \frac{\sigma_B}{\bar{x}_B} \times 100\% = \frac{79.46}{312.04} \times 100\% = 25.46\%$$

Since the coefficient of variation of “Factory B” is higher than that of “Factory A”,
“Factory A” has more consistent wage structure.
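
The three comparisons (average wage, total wages, consistency) can be reproduced together. The sketch below assumes class midpoints stand in for individual wages; the helper `grouped_stats` and its dictionary keys are illustrative names, not part of the notes.

```python
from math import sqrt

def grouped_stats(mids, freqs):
    """Total, mean, population SD and CV (%) for grouped data."""
    n = sum(freqs)
    total = sum(f * x for f, x in zip(freqs, mids))
    mean = total / n
    var = sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2
    sd = sqrt(var)
    return {"n": n, "total": total, "mean": mean, "sd": sd, "cv": sd / mean * 100}

mids = [175, 225, 275, 325, 375, 425, 475]
a = grouped_stats(mids, [15, 20, 42, 50, 35, 20, 12])   # Factory A
b = grouped_stats(mids, [22, 37, 71, 55, 44, 26, 15])   # Factory B

higher_average = "A" if a["mean"] > b["mean"] else "B"
higher_total = "A" if a["total"] > b["total"] else "B"
more_consistent = "A" if a["cv"] < b["cv"] else "B"     # smaller CV wins
print(higher_average, higher_total, more_consistent)
```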

# Question 5: The following information is given about the weights of boys and girls in a class.

Boys Girls

Number 100 75
Mean Weight (kg) 56 42
Variance 9 4

a. Calculate combined arithmetic mean (AM).


b. Calculate combined Standard Deviation (SD).
c. Which of the two distributions is more stable?

Solve: Given that,

𝑛1 = 100, 𝑛2 = 75

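$$\bar{x}_1 = 56, \quad \bar{x}_2 = 42$$

$$S_1^2 = 9, \quad S_2^2 = 4$$

Solve a: The combined mean,

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2} = \frac{100 \times 56 + 75 \times 42}{175} = 50$$

Solve b: The combined variance,

$$S^2 = \frac{n_1 S_1^2 + n_2 S_2^2}{n_1 + n_2} + \frac{n_1 n_2}{(n_1 + n_2)^2} (\bar{x}_1 - \bar{x}_2)^2 = \frac{1200}{175} + \frac{7500}{30625} \times 196 = 6.857 + 48 = 54.857$$

So, the combined standard deviation,

$$S = \sqrt{54.857} = 7.407$$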
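
The combined mean and variance formulas are easy to mistype, so a short check may help. This is a sketch under the assumption that the given variances are population variances (divisor 𝑛); the function name `combine` is hypothetical.

```python
from math import sqrt

def combine(n1, mean1, var1, n2, mean2, var2):
    """Combined mean and variance of two groups (population variances)."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    # within-group part + between-group part
    var = (n1 * var1 + n2 * var2) / n + n1 * n2 * (mean1 - mean2) ** 2 / n ** 2
    return mean, var

mean_c, var_c = combine(100, 56, 9, 75, 42, 4)
sd_c = sqrt(var_c)
print(mean_c, round(var_c, 3), round(sd_c, 3))
```

Most of the combined variance comes from the between-group term, since the two group means are far apart relative to the spread inside each group.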

Solve c: To determine the stability of the distributions, we need to know the coefficient of variation.

We know,

$$\text{Standard deviation}, \sigma_1 = \sqrt{S_1^2} = \sqrt{9} = 3$$

$$\text{Standard deviation}, \sigma_2 = \sqrt{S_2^2} = \sqrt{4} = 2$$

Therefore,

$$CV(\text{Boys}) = \frac{\sigma_1}{\bar{x}_1} \times 100\% = \frac{3}{56} \times 100\% = 5.36\%$$

$$CV(\text{Girls}) = \frac{\sigma_2}{\bar{x}_2} \times 100\% = \frac{2}{42} \times 100\% = 4.76\%$$

Thus, the girls’ distribution is more stable.

Date: 4 / 04 / 23

Theorem:

For a set of two unequal observations, the mean deviation and the standard deviation are each equal to half of the range, that is,

$$MD = SD = \frac{\text{Range}}{2}$$

Proof:

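Let the two observations be 𝑥1 and 𝑥2, with 𝑥1 > 𝑥2.

Now, the range is 𝑅 = 𝑥1 − 𝑥2, and the mean is

$$\bar{x} = \frac{x_1 + x_2}{2}$$

Now,

$$x_1 - \bar{x} = x_1 - \frac{x_1 + x_2}{2} = \frac{x_1 - x_2}{2}$$

And,

$$x_2 - \bar{x} = x_2 - \frac{x_1 + x_2}{2} = \frac{x_2 - x_1}{2}$$

Mean deviation,

$$MD = \frac{\sum |x_i - \bar{x}|}{n} = \frac{|x_1 - \bar{x}| + |x_2 - \bar{x}|}{2} = \frac{\left|\frac{x_1 - x_2}{2}\right| + \left|\frac{x_2 - x_1}{2}\right|}{2} = \frac{\frac{x_1 - x_2}{2} + \frac{x_1 - x_2}{2}}{2} = \frac{x_1 - x_2}{2}$$

$$\therefore MD = \frac{R}{2}$$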

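Now, standard deviation,

$$SD = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2}{2}} = \sqrt{\frac{\left(\frac{x_1 - x_2}{2}\right)^2 + \left(\frac{x_2 - x_1}{2}\right)^2}{2}} = \sqrt{\frac{2 \left(\frac{x_1 - x_2}{2}\right)^2}{2}} = \frac{x_1 - x_2}{2}$$

$$\therefore SD = \frac{R}{2}$$
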
# Question 1: Show that the variance of the first 𝑛 natural numbers is (𝑛² − 1)/12.

Solve: Let the variable 𝑥 assume the first 𝑛 natural numbers 1, 2, 3, … , 𝑛.

Thus, the variance,

$$S^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$$

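Here,

$$\sum x_i = 1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2}$$

And,

$$\sum x_i^2 = 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$$

Hence,

$$S^2 = \frac{n(n+1)(2n+1)}{6n} - \left(\frac{n(n+1)}{2n}\right)^2 = \frac{(n+1)(2n+1)}{6} - \left(\frac{n+1}{2}\right)^2$$

$$= \frac{n+1}{2} \times \left(\frac{2n+1}{3} - \frac{n+1}{2}\right) = \frac{n+1}{2} \times \frac{4n + 2 - 3n - 3}{6} = \frac{n+1}{2} \times \frac{n-1}{6} = \frac{n^2 - 1}{12}$$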
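
The identity can also be spot-checked numerically. The sketch below uses exact rational arithmetic (`fractions.Fraction`) so that no rounding is involved; the function name `variance_first_n` is an illustrative choice, and a numerical check for a range of 𝑛 is of course not a substitute for the proof.

```python
from fractions import Fraction

def variance_first_n(n):
    """Population variance of 1, 2, ..., n, computed exactly."""
    xs = range(1, n + 1)
    mean_sq = Fraction(sum(x * x for x in xs), n)
    mean = Fraction(sum(xs), n)
    return mean_sq - mean ** 2

# Compare with (n^2 - 1) / 12 for n = 1, ..., 50.
checks = {n: variance_first_n(n) == Fraction(n * n - 1, 12) for n in range(1, 51)}
print(all(checks.values()))
```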

# Question 2: For a set of 𝑛 positive observations, if $S^2 = \frac{\sum (x_i - \bar{x})^2}{n}$, then prove that,

1. $\bar{x} \sqrt{n - 1} > S$
2. $100 \sqrt{n - 1} > CV$

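Solve 1: We know,

$$\bar{x} = \frac{\sum x_i}{n} \quad \therefore \sum x_i = n\bar{x} \quad (1)$$

Again,

$$\left(\sum x_i\right)^2 = \sum x_i^2 + \sum_{i \neq j} x_i x_j$$

Since all the observations are positive, the cross-product term is positive (for 𝑛 ≥ 2). Thus,

$$\left(\sum x_i\right)^2 > \sum x_i^2$$

Subtracting $\frac{(\sum x_i)^2}{n}$ from both sides,

$$\left(\sum x_i\right)^2 - \frac{(\sum x_i)^2}{n} > \sum x_i^2 - \frac{(\sum x_i)^2}{n}$$

$$\rightarrow \left(\sum x_i\right)^2 \left(1 - \frac{1}{n}\right) > n \left(\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2\right)$$

$$\rightarrow \frac{(n\bar{x})^2 (n - 1)}{n} > n S^2 \quad [\text{from equation (1)}, \sum x_i = n\bar{x}]$$

$$\rightarrow n \bar{x}^2 (n - 1) > n S^2$$

$$\rightarrow \bar{x}^2 (n - 1) > S^2$$

$$\rightarrow \bar{x} \sqrt{n - 1} > S$$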

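Solve 2: From part 1,

$$\bar{x} \sqrt{n - 1} > S$$

$$\rightarrow \sqrt{n - 1} > \frac{S}{\bar{x}}$$

$$\rightarrow 100 \times \sqrt{n - 1} > \frac{S}{\bar{x}} \times 100$$

$$\therefore 100 \sqrt{n - 1} > CV$$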

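# Show that the mean deviation cannot exceed the standard deviation, i.e. 𝑆 ≥ 𝑀𝐷.

Solve: We know,

$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$$

A variance is a mean of squares of real numbers, so it is never negative. Thus, for any set of real values 𝑥𝑖,

$$\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2 \geq 0$$

$$\rightarrow \frac{\sum x_i^2}{n} \geq \left(\frac{\sum x_i}{n}\right)^2$$

Since this holds for any real values, we may replace 𝑥𝑖 with |𝑥𝑖 − 𝑥̅| and get,

$$\frac{\sum |x_i - \bar{x}|^2}{n} \geq \left(\frac{\sum |x_i - \bar{x}|}{n}\right)^2$$

$$\rightarrow S^2 \geq MD^2$$

$$\rightarrow S \geq MD \quad [\text{both are non-negative}]$$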

Date: 11 / 04 / 23

# Moments: A set of descriptive measures which can provide a unique characterization of a distribution, and hence can determine the distribution uniquely, is called moments.

Moments can be computed about the mean or about any arbitrary constant.

1. Central moments: Moments computed about the mean are called central moments.
2. Raw moments: Moments computed about any arbitrary value are called raw moments.

# Central moments or moments about mean:

Central moments are moments computed about the arithmetic mean. Let us consider 𝑛 observations and let 𝑥̅ be their mean. The 𝑟th central moment, denoted by 𝜇𝑟, is defined as,

$$\mu_r = \frac{\sum (x_i - \bar{x})^r}{n}$$

Thus, 1st central moment,

$$\mu_1 = \frac{\sum (x_i - \bar{x})}{n} = 0$$

2nd central moment,

$$\mu_2 = \frac{\sum (x_i - \bar{x})^2}{n} = \text{variance}$$

3rd central moment,

$$\mu_3 = \frac{\sum (x_i - \bar{x})^3}{n}$$

# Central moments for frequency data:

Suppose 𝑥1, 𝑥2, … , 𝑥𝑛 are the class marks and 𝑓1, 𝑓2, … , 𝑓𝑛 the corresponding frequencies, such that,

∑𝑓𝑖 = 𝑁

Then the 𝑟th central moment,

$$\mu_r = \frac{\sum f_i (x_i - \bar{x})^r}{N}$$

# Question 1: The heights (in cm) of the players of football team-1 are given below:

142, 143, 146, 146, 148. Compute the first four central moments.

Solve: Given that, 142, 143, 146, 146, 148

Here,

$$\bar{x} = \frac{\sum x_i}{n} = \frac{142 + 143 + 146 + 146 + 148}{5} = 145$$

We know the 𝑟th central moment is,

$$\mu_r = \frac{\sum (x_i - \bar{x})^r}{n}$$

From the given data, we construct the table below:


𝑥𝑖 𝑥𝑖 − 𝑥̅ (𝑥𝑖 − 𝑥̅ )2 (𝑥𝑖 − 𝑥̅ )3 (𝑥𝑖 − 𝑥̅ )4
142 -3 9 -27 81
143 -2 4 -8 16
146 1 1 1 1
146 1 1 1 1
148 3 9 27 81

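Here,

∑(𝑥𝑖 − 𝑥̅) = 0, ∑(𝑥𝑖 − 𝑥̅)² = 24, ∑(𝑥𝑖 − 𝑥̅)³ = −6, ∑(𝑥𝑖 − 𝑥̅)⁴ = 180, 𝑛 = 5

1st central moment, $\mu_1 = \frac{\sum (x_i - \bar{x})}{n} = 0$

2nd central moment, $\mu_2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{24}{5} = 4.8$

3rd central moment, $\mu_3 = \frac{\sum (x_i - \bar{x})^3}{n} = \frac{-6}{5} = -1.2$

4th central moment, $\mu_4 = \frac{\sum (x_i - \bar{x})^4}{n} = \frac{180}{5} = 36$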
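
The definition of the 𝑟th central moment translates almost word for word into code. A minimal sketch for ungrouped data (the function name `central_moment` is an assumption for illustration):

```python
def central_moment(data, r):
    """r-th central moment: mean of (x - mean)^r, with divisor n."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** r for x in data) / n

heights = [142, 143, 146, 146, 148]
mu = [central_moment(heights, r) for r in (1, 2, 3, 4)]
print(mu)
```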

# Central moments in terms of Raw moments:

Let 𝑥1, 𝑥2, … , 𝑥𝑛 be 𝑛 observations and 𝑥̅ be their mean. Let 𝑎 be an arbitrary value such that 𝑎 ≠ 𝑥̅. Then the 𝑟th raw moment is,

$$\mu_r' = \frac{\sum (x_i - a)^r}{n} = \frac{\sum d_i^r}{n} \quad [d_i = x_i - a, \; r = 1, 2, 3, 4]$$

Putting 𝑟 = 1, we get,

$$\mu_1' = \frac{\sum (x_i - a)}{n} = \frac{\sum x_i}{n} - \frac{\sum a}{n} = \frac{\sum x_i}{n} - \frac{na}{n}$$

$$\therefore \mu_1' = \bar{x} - a$$

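1st central moment,

$$\mu_1 = \frac{\sum (x_i - \bar{x})}{n} = \frac{\sum x_i}{n} - \frac{n\bar{x}}{n} = \bar{x} - \bar{x} = 0$$

2nd central moment,

$$\mu_2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{\sum \{(x_i - a) - (\bar{x} - a)\}^2}{n} = \frac{\sum (d_i - \mu_1')^2}{n}$$

$$= \frac{\sum (d_i^2 - 2 d_i \mu_1' + \mu_1'^2)}{n} = \frac{\sum d_i^2}{n} - 2 \mu_1' \cdot \frac{\sum d_i}{n} + \frac{\sum \mu_1'^2}{n}$$

$$= \mu_2' - 2\mu_1'^2 + \mu_1'^2$$

$$\therefore \mu_2 = \mu_2' - \mu_1'^2$$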

# Question 2: Prove that,

1. $\mu_3 = \mu_3' - 3\mu_2' \mu_1' + 2\mu_1'^3$
2. $\mu_4 = \mu_4' - 4\mu_3' \mu_1' + 6\mu_2' \mu_1'^2 - 3\mu_1'^4$

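Solve: We know that,

$$\mu_r = \frac{1}{n} \sum (x_i - \bar{x})^r = \frac{1}{n} \sum \{(x_i - a) - (\bar{x} - a)\}^r$$

Now, replacing 𝑥𝑖 − 𝑎 with 𝑑𝑖 and 𝑥̅ − 𝑎 with 𝜇1′, we get,

$$\mu_r = \frac{1}{n} \sum (d_i - \mu_1')^r$$

$$\therefore \mu_r = \frac{1}{n} \sum \left\{d_i^r - {}^{r}C_1 d_i^{r-1} \mu_1' + {}^{r}C_2 d_i^{r-2} \mu_1'^2 - \cdots + (-1)^r \mu_1'^r\right\} \quad (i)$$

From equation (𝑖), summing term by term, we get,

$$\mu_r = \mu_r' - {}^{r}C_1 \mu_{r-1}' \mu_1' + {}^{r}C_2 \mu_{r-2}' \mu_1'^2 - {}^{r}C_3 \mu_{r-3}' \mu_1'^3 + \cdots + (-1)^r \mu_1'^r$$

1: Now putting 𝑟 = 3 in equation (𝑖), we get,

$$\mu_3 = \frac{1}{n} \sum \left\{d_i^3 - {}^{3}C_1 d_i^2 \mu_1' + {}^{3}C_2 d_i \mu_1'^2 - \mu_1'^3\right\}$$

$$= \frac{\sum d_i^3}{n} - 3 \mu_1' \frac{\sum d_i^2}{n} + 3 \mu_1'^2 \frac{\sum d_i}{n} - \frac{n \mu_1'^3}{n}$$

$$= \mu_3' - 3\mu_2' \mu_1' + 3\mu_1'^3 - \mu_1'^3$$

$$= \mu_3' - 3\mu_2' \mu_1' + 2\mu_1'^3$$

2: Now putting 𝑟 = 4 in equation (𝑖), we get,

$$\mu_4 = \frac{1}{n} \sum \left\{d_i^4 - {}^{4}C_1 d_i^3 \mu_1' + {}^{4}C_2 d_i^2 \mu_1'^2 - {}^{4}C_3 d_i \mu_1'^3 + \mu_1'^4\right\}$$

$$= \frac{\sum d_i^4}{n} - 4 \mu_1' \frac{\sum d_i^3}{n} + 6 \mu_1'^2 \frac{\sum d_i^2}{n} - 4 \mu_1'^3 \frac{\sum d_i}{n} + \frac{n \mu_1'^4}{n}$$

$$= \mu_4' - 4\mu_3' \mu_1' + 6\mu_2' \mu_1'^2 - 4\mu_1'^4 + \mu_1'^4$$

$$= \mu_4' - 4\mu_3' \mu_1' + 6\mu_2' \mu_1'^2 - 3\mu_1'^4$$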

# Question 3: Compute first four central moments for the data on number of computers sold per
day is given below:

Class Interval 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69 70 – 79
Frequency 4 11 9 7 5 4

Solve: Taking the assumed origin 𝑎 = 44.5 and the class width 𝐶 = 10, so that 𝑑 = (𝑀𝑃 − 44.5)/10, we construct the table below:
𝐶𝐼 𝑀𝑃 𝐹𝑟𝑒𝑞 (𝑓) 𝑑 𝑓𝑑 𝑓𝑑² 𝑓𝑑³ 𝑓𝑑⁴
20 – 29 24.5 4 -2 -8 16 -32 64
30 – 39 34.5 11 -1 -11 11 -11 11
40 – 49 44.5 9 0 0 0 0 0
50 – 59 54.5 7 1 7 7 7 7
60 – 69 64.5 5 2 10 20 40 80
70 – 79 74.5 4 3 12 36 108 324

Here,

𝑁 = 40, ∑ 𝑓𝑑 = 10, ∑ 𝑓𝑑 2 = 90 , ∑ 𝑓𝑑 3 = 112 , ∑ 𝑓𝑑 4 = 486

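Raw moments,

$$\mu_1' = C \sum \frac{fd}{N} = 10 \times \frac{10}{40} = 2.5$$

$$\mu_2' = C^2 \sum \frac{fd^2}{N} = 10^2 \times \frac{90}{40} = 225$$

$$\mu_3' = C^3 \sum \frac{fd^3}{N} = 10^3 \times \frac{112}{40} = 2800$$

$$\mu_4' = C^4 \sum \frac{fd^4}{N} = 10^4 \times \frac{486}{40} = 121500$$

Central moments:

$$\mu_1 = 0$$

$$\mu_2 = \mu_2' - \mu_1'^2 = 225 - 6.25 = 218.75$$

$$\mu_3 = \mu_3' - 3\mu_2' \mu_1' + 2\mu_1'^3 = 2800 - 1687.5 + 31.25 = 1143.75$$

$$\mu_4 = \mu_4' - 4\mu_3' \mu_1' + 6\mu_2' \mu_1'^2 - 3\mu_1'^4 = 121500 - 28000 + 8437.5 - 117.1875 = 101820.3125$$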
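
The conversion from raw moments (about the working origin 44.5) to central moments can be sketched as follows. The function names `raw_moments` and `central_from_raw` are hypothetical; the coded deviations 𝑑 and frequencies 𝑓 are taken from the table.

```python
def raw_moments(d, f, c, orders=(1, 2, 3, 4)):
    """Raw moments about the working origin, from coded deviations d."""
    n = sum(f)
    return [c ** r * sum(fi * di ** r for fi, di in zip(f, d)) / n for r in orders]

def central_from_raw(m1, m2, m3, m4):
    """Central moments expressed through the first four raw moments."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return 0.0, mu2, mu3, mu4

d = [-2, -1, 0, 1, 2, 3]        # coded deviations (MP - 44.5) / 10
f = [4, 11, 9, 7, 5, 4]         # class frequencies
raw = raw_moments(d, f, c=10)
central = central_from_raw(*raw)
print(raw)
print(central)
```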

Date: 7 / 05 / 23

Skewness: Skewness is a measure of the shape characteristics of a distribution. It is a measure of symmetry or, more precisely, lack of symmetry. A distribution or series of data is symmetric if it looks the same to the left and right of the center point. If the data is not symmetrical, it is said to be skewed.

Symmetrical distribution: A symmetrical distribution is a type of distribution where the left side of the distribution mirrors the right side (Example: 1). For a symmetric distribution, the measures of its central tendency, i.e. AM, median and mode, are equal.

Negatively skewed distribution: A negatively skewed distribution is one in which the tail of the distribution shifts towards the left side, i.e. towards the negative side of the peak (Example: 2). For such a distribution,

𝑀𝑒𝑎𝑛 < 𝑀𝑒𝑑𝑖𝑎𝑛 < 𝑀𝑜𝑑𝑒

Positively skewed distribution: A positively skewed distribution is one in which the tail of the distribution shifts towards the right (Example: 3). For such a distribution,

𝑀𝑒𝑎𝑛 > 𝑀𝑒𝑑𝑖𝑎𝑛 > 𝑀𝑜𝑑𝑒

Measures of skewness:

One measure of skewness is based on the second and third central moments of a frequency distribution. If 𝜇2 and 𝜇3 are the second and third central moments respectively,

$$\mu_2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n}, \quad \mu_3 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^3}{n}$$

then the coefficient of skewness 𝛽1 is given by,

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$

For grouped data,

$$\mu_2 = \sum \frac{f_i (x_i - \bar{x})^2}{n}, \quad \mu_3 = \sum \frac{f_i (x_i - \bar{x})^3}{n}$$

where 𝑛 = ∑𝑓𝑖 is the total frequency.

In a symmetric distribution, 𝛽1 will be zero. Because 𝛽1 is never negative, it cannot show the direction of skewness. For this we use the coefficient of skewness 𝛾1, which is defined as the square root of 𝛽1 taken with the sign of 𝜇3,

$$\gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\sqrt{\mu_2^3}}$$

The sign of 𝛾1 depends entirely upon the sign of 𝜇3. This is the most important measure of skewness from a theoretical point of view.

𝑖. 𝐼𝑓 𝛾1 = 0, 𝑡ℎ𝑒 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛 𝑖𝑠 𝑠𝑦𝑚𝑚𝑒𝑡𝑟𝑖𝑐

𝑖𝑖. 𝐼𝑓 𝛾1 > 0, 𝑡ℎ𝑒 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛 𝑖𝑠 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒𝑙𝑦 𝑠𝑘𝑒𝑤𝑒𝑑

𝑖𝑖𝑖. 𝐼𝑓 𝛾1 < 0, 𝑡ℎ𝑒 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛 𝑖𝑠 𝑛𝑒𝑔𝑎𝑡𝑖𝑣𝑒𝑙𝑦 𝑠𝑘𝑒𝑤𝑒𝑑

Kurtosis: Kurtosis is another measure of the shape of a distribution. Whereas skewness measures the lack of symmetry of the frequency curve of a distribution, kurtosis is a measure of the relative peakedness of its frequency curve. Frequency curves can be divided into three categories depending upon the shape of their peak.

1. Mesokurtic
2. Platykurtic
3. Leptokurtic

Measure of kurtosis:

A measure of kurtosis is defined by,

$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$

Where 𝜇2 and 𝜇4 are the second and fourth central moments respectively,

$$\mu_2 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^2}{n}, \quad \mu_4 = \sum_{i=1}^{n} \frac{(x_i - \bar{x})^4}{n}$$

For grouped data,

$$\mu_2 = \sum \frac{f_i (x_i - \bar{x})^2}{n}, \quad \mu_4 = \sum \frac{f_i (x_i - \bar{x})^4}{n}$$

This measure is also known as Karl Pearson’s measure of kurtosis.

The coefficient of kurtosis:

1. If the value of 𝛽2 = 3 or 𝛾2 = 𝛽2 − 3 = 0, then the distribution is mesokurtic. This value is taken as a standard against which the kurtosis of other distributions is judged.
2. When 𝛽2 > 3 or 𝛾2 = 𝛽2 − 3 > 0, the curve is more peaked than the mesokurtic curve and is called a leptokurtic curve.
3. When 𝛽2 < 3 or 𝛾2 = 𝛽2 − 3 < 0, the curve is less peaked than the mesokurtic curve and is called a platykurtic curve.

Question 1: The 1st four central moments of distribution are 0, 2.5, 0.7 and 18.75. Examine the
skewness and kurtosis.

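Solve: We know, the coefficient of skewness,

$$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{0.7^2}{2.5^3} = \frac{0.49}{15.625} = 0.031$$

Since 𝜇3 > 0 and 𝛽1 is small, the distribution is slightly positively skewed.

Now the coefficient of kurtosis,

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{18.75}{2.5^2} = 3$$

∴ 𝛾2 = 𝛽2 − 3 = 0

Hence the curve is mesokurtic.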

Question 2: The hourly earnings (in taka) of sample of 7 workers in a manufacturing company
are 27, 27, 24, 26, 25, 24, 22. Compute the coefficient of skewness based on moment.

Solve: From the given data, we get,

27 + 27 + 24 + 26 + 25 + 24 + 22
𝑥̅ = = 25
7
𝑥𝑖 𝑥𝑖 − 𝑥̅ (𝑥𝑖 − 𝑥̅ )2 (𝑥𝑖 − 𝑥̅ )3 (𝑥𝑖 − 𝑥̅ )4
27 2 4 8 16
27 2 4 8 16
24 -1 1 -1 1
26 1 1 1 1
25 0 0 0 0
24 -1 1 -1 1
22 -3 9 -27 81

Here,

∑(𝑥𝑖 − 𝑥̅ ) = 0, ∑(𝑥𝑖 − 𝑥̅ )2 = 20, ∑(𝑥𝑖 − 𝑥̅ )3 = −12, ∑(𝑥𝑖 − 𝑥̅ )4 = 116, 𝑛 = 7

1st central moment, $\mu_1 = \frac{\sum (x_i - \bar{x})}{n} = 0$

2nd central moment, $\mu_2 = \frac{\sum (x_i - \bar{x})^2}{n} = \frac{20}{7} = 2.8571$

3rd central moment, $\mu_3 = \frac{\sum (x_i - \bar{x})^3}{n} = -\frac{12}{7} = -1.7143$

4th central moment, $\mu_4 = \frac{\sum (x_i - \bar{x})^4}{n} = \frac{116}{7} = 16.5714$

Coefficient of skewness,

$$\gamma_1 = \frac{\mu_3}{\sqrt{\mu_2^3}} = \frac{-1.7143}{\sqrt{2.8571^3}} = -0.355$$

Since 𝛾1 = −0.355 is negative, the given distribution is negatively skewed.

The coefficient of kurtosis,

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{16.5714}{2.8571^2} = 2.03$$

Hence,

𝛾2 = 𝛽2 − 3 = 2.03 − 3 = −0.97

Since, 𝛾2 < 0,

Hence, the distribution is platykurtic.
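
The moment-based shape measures can be computed in one pass. The sketch below assumes population moments (divisor 𝑛), as in the notes; the function name `shape_measures` and the dictionary keys are illustrative.

```python
from math import sqrt

def shape_measures(data):
    """Moment coefficients of skewness and kurtosis (divisor n)."""
    n = len(data)
    x_bar = sum(data) / n
    mu2, mu3, mu4 = (sum((x - x_bar) ** r for x in data) / n for r in (2, 3, 4))
    beta2 = mu4 / mu2 ** 2
    return {
        "beta1": mu3 ** 2 / mu2 ** 3,
        "gamma1": mu3 / sqrt(mu2 ** 3),   # keeps the sign of mu3
        "beta2": beta2,
        "gamma2": beta2 - 3,              # excess kurtosis
    }

earnings = [27, 27, 24, 26, 25, 24, 22]
m = shape_measures(earnings)
print({k: round(v, 3) for k, v in m.items()})
```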

1. Mesokurtosis: The property of having kurtosis equal to that of a normal


distribution, equivalently having zero excess kurtosis.
2. Leptokurtosis: The property of having kurtosis greater than that of a normal
distribution, equivalently having positive excess kurtosis.
3. Platykurtosis: The property of having kurtosis less than that of a normal
distribution, equivalently having negative excess kurtosis.

Date: 9 / 05 / 23

Correlation and Regression

Correlation: Correlation is the strength of association that exists between two variables. The statistical analysis which gives us the strength (degree) and direction of the association or interrelationship that exists between two variables is called correlation analysis.

1. Simple correlation: The study of the strength or degree of the linear relationship that exists between two variables is termed simple correlation.
2. Multiple correlation: The relationship among more than two variables is termed multiple correlation.

Correlation is of three types, and the correlation coefficient is denoted by 𝑟𝑥𝑦:

1. Positive correlation → 𝑟𝑥𝑦 > 0 (perfect positive correlation when 𝑟𝑥𝑦 = +1)
2. Negative correlation → 𝑟𝑥𝑦 < 0 (perfect negative correlation when 𝑟𝑥𝑦 = −1)
3. Zero correlation → 𝑟𝑥𝑦 = 0

Scatter plot:

Correlation coefficient: It is a descriptive measure of the strength and the direction of the linear (straight line) relationship between two quantitative variables. Let (𝑥1, 𝑦1), (𝑥2, 𝑦2), (𝑥3, 𝑦3), … , (𝑥𝑛, 𝑦𝑛) be 𝑛 pairs of observations, and let 𝑥̅ and 𝑦̅ be the means of 𝑥 and 𝑦 respectively. Then the correlation coefficient is defined as,

$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$

Alternative method:

$$r = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sqrt{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} \cdot \sqrt{\sum y_i^2 - \frac{(\sum y_i)^2}{n}}}$$

1. Positive correlation: When two variables vary together in the same direction, i.e.
small values of variable 𝑥 are associated with small values of 𝑦, and large values
of 𝑥 are associated with large values of 𝑦, the correlation between the two
variables is said to be positive.
2. Negative correlation: When two variables vary together in the opposite
direction, i.e. small values of 𝑥 are associated with the large values of 𝑦, the large
values of 𝑥 are associated with the small values of 𝑦, the correlation between the
variables is said to be negative.
3. Zero correlation: When there is no relationship between two variables, i.e. high
and low values of the two variables do not show any relationship that can be
predicted, then there exists a zero correlation between the variables.

Question 1. Prove that, −1 ≤ 𝑟𝑥𝑦 ≤ 1

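Solve: Let (𝑥1, 𝑦1), (𝑥2, 𝑦2), … , (𝑥𝑛, 𝑦𝑛) be 𝑛 pairs of observations. Then the correlation coefficient between 𝑥 and 𝑦 is denoted by 𝑟𝑥𝑦 and defined as,

$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$$

Suppose (𝑥𝑖 − 𝑥̅) = 𝑋 and (𝑦𝑖 − 𝑦̅) = 𝑌, so that $r_{xy} = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}}$.

Let us consider the following expression, which is a sum of squares and therefore never negative:

$$\sum \left(\frac{X}{\sqrt{\sum X^2}} \pm \frac{Y}{\sqrt{\sum Y^2}}\right)^2 \geq 0$$

$$\rightarrow \sum \left(\frac{X^2}{\sum X^2} \pm 2 \cdot \frac{X}{\sqrt{\sum X^2}} \cdot \frac{Y}{\sqrt{\sum Y^2}} + \frac{Y^2}{\sum Y^2}\right) \geq 0$$

$$\rightarrow \frac{\sum X^2}{\sum X^2} \pm 2 \cdot \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} + \frac{\sum Y^2}{\sum Y^2} \geq 0$$

$$\rightarrow 1 \pm 2 r_{xy} + 1 \geq 0$$

$$\rightarrow 1 \pm r_{xy} \geq 0 \quad (i)$$

From equation (𝑖), both of the following hold:

1 + 𝑟𝑥𝑦 ≥ 0, ∴ 𝑟𝑥𝑦 ≥ −1

1 − 𝑟𝑥𝑦 ≥ 0, ∴ 𝑟𝑥𝑦 ≤ 1

Hence, −1 ≤ 𝑟𝑥𝑦 ≤ 1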

Question 2. Compute correlation between 𝑥 and 𝑦 from the data given below:

x 60 66 66 66 68 68 70 72 74 80
y 5 7 8 9 11 12 14 16 21 27

Solve: Here,

∑𝑥𝑖 690
𝑥̅ = = = 69
𝑛 10
∑𝑦𝑖 130
𝑦̅ = = = 13
𝑛 10

From the given data, we construct the table below:

𝑥 𝑦 𝑥𝑖 − 𝑥̅ 𝑦𝑖 − 𝑦̅ (𝑥𝑖 − 𝑥̅ )2 (𝑦𝑖 − 𝑦̅)2 (𝑥𝑖 − 𝑥̅ ) × (𝑦𝑖 − 𝑦̅)


60 5 -9 -8 81 64 72
66 7 -3 -6 9 36 18
66 8 -3 -5 9 25 15
66 9 -3 -4 9 16 12
68 11 -1 -2 1 4 2
68 12 -1 -1 1 1 1
70 14 1 1 1 1 1
72 16 3 3 9 9 9
74 21 5 8 25 64 40
80 27 11 14 121 196 154

From the table,

∑(𝑥𝑖 − 𝑥̅ ) × (𝑦𝑖 − 𝑦̅) = 324

∑(𝑥𝑖 − 𝑥̅ )2 = 266

∑(𝑦𝑖 − 𝑦̅)2 = 416

Hence the correlation coefficient is,

$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} = \frac{324}{\sqrt{266 \times 416}} = 0.97$$
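
The deviation form of the correlation coefficient can be sketched in a few lines. The function name `pearson_r` is an illustrative choice; the data are the ten (𝑥, 𝑦) pairs from the question.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [60, 66, 66, 66, 68, 68, 70, 72, 74, 80]
y = [5, 7, 8, 9, 11, 12, 14, 16, 21, 27]
r = pearson_r(x, y)
print(round(r, 2))
```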

Date: 14 / 05 / 23

Rank Correlation: The rank correlation coefficient is defined as,

$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

Where 𝑅 denotes the rank coefficient of correlation, 𝐷 refers to the difference of ranks between paired items in the two series, and 𝑁 is the number of pairs.

Question 1. Two managers are asked to rank a group of employees in order of potential for
eventually becoming top managers. The rankings are as follows:

Employee A B C D E F G H I J
Manager 1 10 2 1 4 3 6 5 8 7 9
Manager 2 9 4 2 3 1 5 6 8 7 10

Solve: From the given data, we can construct,

Employee A B C D E F G H I J
Manager 1 10 2 1 4 3 6 5 8 7 9
Manager 2 9 4 2 3 1 5 6 8 7 10
𝐷² = (𝑅1 − 𝑅2)² 1 4 1 1 4 1 1 0 0 1

Here, ∑𝐷² = 14, 𝑁 = 10

We know,

$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 14}{10(10^2 - 1)} = 1 - \frac{84}{990} = 0.915$$

Thus, we find that there is a high degree of positive correlation in the ranks assigned by
the two managers.

Question 2. Calculate the rank correlation coefficient for the following data of marks of 2 tests
given to candidates for clerical job.

Preli. 92 89 87 86 83 77 71 63 53 50
Final 86 83 91 77 68 85 52 82 37 57

Solve: The given data is unranked. So we rank each series separately (rank 1 for the highest mark), keeping each candidate's two marks paired, and construct the table below:

Preli. Rank 𝑅1 Final Rank 𝑅2 𝐷 = 𝑅1 − 𝑅2 𝐷²
92 1 86 2 −1 1
89 2 83 4 −2 4
87 3 91 1 2 4
86 4 77 6 −2 4
83 5 68 7 −2 4
77 6 85 3 3 9
71 7 52 9 −2 4
63 8 82 5 3 9
53 9 37 10 −1 1
50 10 57 8 2 4

Here, ∑𝐷² = 44, 𝑁 = 10

$$R = 1 - \frac{6 \sum D^2}{N(N^2 - 1)} = 1 - \frac{6 \times 44}{10(10^2 - 1)} = 1 - \frac{264}{990} = 0.733$$

Thus, we find that there is a fairly high degree of positive correlation in the ranks of the given
two test marks.
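
Ranking by hand is error-prone because the two series must stay paired by candidate. The sketch below ranks each series separately (rank 1 = largest value) and then applies the formula. It assumes there are no tied values; the function names `ranks_desc` and `spearman_from_ranks` are hypothetical.

```python
def ranks_desc(values):
    """Rank 1 for the largest value; assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_from_ranks(r1, r2):
    """Rank correlation 1 - 6*sum(D^2) / (N(N^2 - 1)) for paired ranks."""
    n = len(r1)
    d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Question 1: the data are already ranks.
manager1 = [10, 2, 1, 4, 3, 6, 5, 8, 7, 9]
manager2 = [9, 4, 2, 3, 1, 5, 6, 8, 7, 10]
r_managers = spearman_from_ranks(manager1, manager2)

# Question 2: marks must be ranked first, pair by pair.
prelim = [92, 89, 87, 86, 83, 77, 71, 63, 53, 50]
final = [86, 83, 91, 77, 68, 85, 52, 82, 37, 57]
r_tests = spearman_from_ranks(ranks_desc(prelim), ranks_desc(final))
print(round(r_managers, 3), round(r_tests, 3))
```

Since ∑𝐷² is never negative, the formula can never give a value above 1, which is a quick sanity check on any hand calculation.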

Regression: The relationship between the response or dependent variable and the
explanatory or independent variable specified by a linear equation is called a
regression. The general form of a linear equation is,

𝑦 = 𝑎 + 𝑏𝑥

Where 𝑥 is called the independent variable and 𝑦 is called the dependent variable and
𝑎, 𝑏 are constants.

Date: 16 / 05 / 23

Simple linear regression model: For 𝑥 = 𝑥𝑖, we have an observed value 𝑦𝑖 of the dependent variable (for example, a saving), and 𝑦̂𝑖 is its predicted (average) value. For all values of 𝑥𝑖, the predicted values 𝑦̂𝑖 fall on the line 𝑦̂ = 𝐴 + 𝐵𝑥.

In general, we expect that 𝑦𝑖 ≠ 𝑦̂𝑖, because some data points do not fall on the fitted line. The difference 𝑒𝑖 = 𝑦𝑖 − 𝑦̂𝑖 is the prediction error for the data point 𝑖:

𝑦𝑖 − 𝑦̂𝑖 = 𝑒𝑖 [𝑦̂ = 𝐴 + 𝐵𝑥, 𝑦̂𝑖 = 𝐴 + 𝐵𝑥𝑖]

→ 𝑦𝑖 = 𝑦̂𝑖 + 𝑒𝑖

= 𝐴 + 𝐵𝑥𝑖 + 𝑒𝑖

Which is the linear regression model.

Where,

𝑦 = Dependent variable

𝐴 = 𝑦-intercept of the line

𝑥 = Independent variable

𝐵 = Slope of the line

𝑒 = Random error

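# The linear regression line of 𝑥 on 𝑦 is,

$$\hat{x} = a_1 + b_1 y$$

Where,

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (y_i - \bar{y})^2} = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum y_i^2 - \frac{(\sum y_i)^2}{n}}$$

$$a_1 = \bar{x} - b_1 \bar{y} = \frac{1}{n} \left(\sum x_i - b_1 \sum y_i\right)$$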

Question 1. Find the regression equation of 𝑦 on 𝑥 for the following data:

𝑥 60 66 66 66 68 68 70 72 74 80
𝑦 5 7 8 9 11 12 14 16 21 27

Solve: From the given data, we can construct the table below:
𝑥𝑖 𝑦𝑖 𝑥𝑖2 𝑦𝑖2 𝑥𝑖 𝑦𝑖
60 5 3600 25 300
66 7 4356 49 462
66 8 4356 64 528
66 9 4356 81 594
68 11 4624 121 748
68 12 4624 144 816
70 14 4900 196 980
72 16 5184 256 1152
74 21 5476 441 1554
80 27 6400 729 2160

Here, ∑ 𝑥𝑖 = 690, ∑𝑦𝑖 = 130, ∑𝑥𝑖2 = 47876, ∑𝑦𝑖2 = 2106, ∑𝑥𝑖 𝑦𝑖 = 9294, 𝑛 = 10

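We know, the regression line of 𝑦 on 𝑥 is,

$$\hat{y} = a + bx$$

where,

$$b = \frac{\sum x_i y_i - \frac{(\sum x_i)(\sum y_i)}{n}}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}, \quad a = \frac{1}{n} \left(\sum y_i - b \sum x_i\right)$$

The slope of the regression line is,

$$b = \frac{9294 - \frac{690 \times 130}{10}}{47876 - \frac{690^2}{10}} = \frac{324}{266} = 1.218$$

And,

$$a = \frac{1}{n} \left(\sum y_i - b \sum x_i\right) = \frac{1}{10} \times (130 - 1.218 \times 690) = -71.042$$

So, the regression equation is,

𝑦̂ = −71.042 + 1.218𝑥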
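
The same least-squares fit can be sketched from the raw sums. The function name `fit_line` is an assumption for illustration. Note that the code keeps the slope unrounded, so its intercept differs from the hand value in the second decimal place.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b of y on x, from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    b = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)
    a = (sy - b * sx) / n
    return a, b

x = [60, 66, 66, 66, 68, 68, 70, 72, 74, 80]
y = [5, 7, 8, 9, 11, 12, 14, 16, 21, 27]
a, b = fit_line(x, y)
print(round(a, 3), round(b, 3))
```

Swapping the arguments, `fit_line(y, x)`, gives the regression line of 𝑥 on 𝑦.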

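# Origin and scale shift method of computing the regression equation: The regression coefficient of 𝑦 on 𝑥 is,

$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

We now transform 𝑥 and 𝑦 such that,

$$u_i = \frac{x_i - A}{c} \quad \text{and} \quad v_i = \frac{y_i - B}{d}$$

Where 𝐴 and 𝐵 are the shifted origins and 𝑐 and 𝑑 are the scales.

Thus, 𝑥𝑖 = 𝐴 + 𝑐𝑢𝑖 so that 𝑥̅ = 𝐴 + 𝑐𝑢̅

And 𝑦𝑖 = 𝐵 + 𝑑𝑣𝑖 so that 𝑦̅ = 𝐵 + 𝑑𝑣̅

$$\therefore b = \frac{\sum (A + c u_i - A - c\bar{u})(B + d v_i - B - d\bar{v})}{\sum (A + c u_i - A - c\bar{u})^2} = \frac{cd \sum (u_i - \bar{u})(v_i - \bar{v})}{c^2 \sum (u_i - \bar{u})^2}$$

$$= \frac{d}{c} \times \frac{\sum u_i v_i - \frac{\sum u_i \sum v_i}{n}}{\sum u_i^2 - \frac{(\sum u_i)^2}{n}}$$

And,

$$a = \bar{y} - b\bar{x} = (B + d\bar{v}) - b(A + c\bar{u}) = \frac{1}{n} \left(\sum y_i - b \sum x_i\right)$$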

Date: 21 / 05 / 23

Question 1. Find the regression equation using origin and scale shifting method:

Capacity 120 200 250 320 500 800 1000 1500


(in GB)
Price 2000 2600 2800 3000 4200 5500 6000 6800

a. Determine the regression equation that can be used to predict the price of an external disk, given its capacity.
b. Determine regression that can be used to predict capacity of an external disk,
given its price.
c. Predict the price of a disk whose capacity is 1200 GB.
d. Predict the disk capacity whose price is taka 5000.

Solution: We take the origins 𝐴 = 500, 𝐵 = 4200 and the scales 𝑐 = 10, 𝑑 = 100, so that,

$$u_i = \frac{x_i - A}{c} = \frac{x_i - 500}{10}, \quad v_i = \frac{y_i - B}{d} = \frac{y_i - 4200}{100}$$

From the given data, we can construct the table below:

Capacity 𝑥 Price 𝑦 𝑢𝑖 𝑣𝑖 𝑢𝑖² 𝑣𝑖² 𝑢𝑖𝑣𝑖
120 2000 -38 -22 1444 484 836
200 2600 -30 -16 900 256 480
250 2800 -25 -14 625 196 350
320 3000 -18 -12 324 144 216
500 4200 0 0 0 0 0
800 5500 30 13 900 169 390
1000 6000 50 18 2500 324 900
1500 6800 100 26 10000 676 2600

Here, ∑𝑢𝑖 = 69, ∑𝑣𝑖 = −7, ∑𝑢𝑖² = 16693, ∑𝑣𝑖² = 2249, ∑𝑢𝑖𝑣𝑖 = 5772, 𝑛 = 8, 𝑐 = 10, 𝑑 = 100

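Ans (a). We know,

$$b = \frac{d}{c} \times \frac{\sum u_i v_i - \frac{\sum u_i \sum v_i}{n}}{\sum u_i^2 - \frac{(\sum u_i)^2}{n}} = \frac{100}{10} \times \frac{5772 - \frac{69 \times (-7)}{8}}{16693 - \frac{69^2}{8}} = 10 \times \frac{5832.375}{16097.875} = 3.623$$

And,

$$a = \bar{y} - b\bar{x} = \left(B + d \frac{\sum v_i}{n}\right) - b \left(A + c \frac{\sum u_i}{n}\right)$$

$$= \left(4200 + 100 \times \frac{-7}{8}\right) - 3.623 \left(500 + 10 \times \frac{69}{8}\right) = 4112.5 - 3.623 \times 586.25 = 1988.516$$

So, the regression equation of 𝑦 on 𝑥 is,

𝑦̂ = 1988.516 + 3.623𝑥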

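Ans (b). The regression line of 𝑥 on 𝑦 is,

$$\hat{x} = a + by$$

$$b = \frac{c}{d} \times \frac{\sum u_i v_i - \frac{\sum u_i \sum v_i}{n}}{\sum v_i^2 - \frac{(\sum v_i)^2}{n}} = \frac{10}{100} \times \frac{5772 - \frac{69 \times (-7)}{8}}{2249 - \frac{(-7)^2}{8}} = 0.1 \times \frac{5832.375}{2242.875} = 0.26$$

And,

$$a = \bar{x} - b\bar{y} = \left(A + c \frac{\sum u_i}{n}\right) - b \left(B + d \frac{\sum v_i}{n}\right)$$

$$= \left(500 + 10 \times \frac{69}{8}\right) - 0.26 \left(4200 + 100 \times \frac{-7}{8}\right) = 586.25 - 0.26 \times 4112.5 = -483$$

Hence, the regression line equation of 𝑥 on 𝑦 is,

𝑥̂ = −483 + 0.26𝑦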

Ans (c). When capacity is 1200 GB or 𝑥 = 1200, then the predicted price of the disk
would be,

𝑦̂ = 1988.516 + 3.623 × 1200

= 6336.116

Ans (d). When price is taka 5000 or 𝑦 = 5000, then the predicted capacity of the disk
would be,

𝑥̂ = −483 + 0.26𝑦

= −483 + 0.26 × 5000

= 817
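
The origin and scale shift method can be sketched as below. The function name `fit_with_coding` is hypothetical. The code keeps the slope unrounded, so its intercept and predictions differ slightly from the hand values, which use 𝑏 rounded to three decimals.

```python
def fit_with_coding(xs, ys, A, c, B, d):
    """Regression of y on x computed from coded values u and v."""
    us = [(x - A) / c for x in xs]
    vs = [(y - B) / d for y in ys]
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    b = (d / c) * (suv - su * sv / n) / (suu - su ** 2 / n)
    a = (B + d * sv / n) - b * (A + c * su / n)   # a = y_bar - b * x_bar
    return a, b

capacity = [120, 200, 250, 320, 500, 800, 1000, 1500]
price = [2000, 2600, 2800, 3000, 4200, 5500, 6000, 6800]

a, b = fit_with_coding(capacity, price, A=500, c=10, B=4200, d=100)
predicted_price = a + b * 1200                    # part (c)
print(round(a, 2), round(b, 3), round(predicted_price, 2))
```

Any other choice of 𝐴, 𝐵, 𝑐 and 𝑑 (with 𝑐 and 𝑑 non-zero) returns the same line, which is the point of the method: the coding only makes the arithmetic smaller.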

Date: 23 / 05 / 23

Methods of constructing index number: A large number of formulae had been derived
for constructing index number. Broadly speaking, they can be grouped under two
heads,

i. Unweighted indices
ii. Weighted indices

Unweighted indices: They are two types such as,

1. Simple aggregative
2. Simple average of price relatives

Weighted indices: They are also two types,

1. Weighted aggregative
2. Weighted average of price relatives

Simple aggregative method:

This is the simplest method of constructing index numbers. When this method is used
to construct a price index, the total of current year prices for the various commodities in
question is divided by the total of base year price and the quotient is multiplied by 100.

$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100\%$$

Where ∑𝑃1 = total of current year prices for various commodities

∑𝑃0 = total of base year prices for various commodities

Question 1. From the following data construct an index number for 2005 taking 2004 as base.

Commodity and unit Price (2004) Price (2005)


Butter (kg) 100.0 110.0
Cheese (kg) 60.0 75.0
Milk (liter) 20.0 30.0
Bread (quantity) 15.0 20.0
Eggs (Dozen) 20.0 25.0
Ghee (kg) 900.0 910.0

Solve: From the above data, we have,

∑𝑃1 = 1170, ∑𝑃0 = 1115

$$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100\% = \frac{1170}{1115} \times 100\% = 104.93\%$$

Simple average of relatives method: When this method is used to compute a price index, price relatives are obtained for the various items included in the index, and then an average of these relatives is obtained using any one of the measures of central tendency. When the arithmetic mean is used,

$$P_{01} = \frac{\sum \left(\frac{P_1}{P_0} \times 100\right)}{N}$$

When the geometric mean is used for averaging the price relatives, the formula for obtaining the index number becomes,

$$P_{01} = \text{antilog} \left(\frac{\sum \log P}{N}\right)$$

Where $P = \frac{P_1}{P_0} \times 100$ and 𝑁 is the number of commodities.

Question 2. From the following data construct a price index by simple average of price relatives’
method based on,

a. Arithmetic mean
b. Geometric mean
Commodity and unit Price (2004) Price (2005)
Butter (kg) 100.0 110.0
Cheese (kg) 60.0 75.0
Milk (liter) 20.0 30.0
Bread (quantity) 15.0 20.0
Eggs (Dozen) 20.0 25.0
Ghee (kg) 900.0 910.0

Solve: From the given data, we construct the table below:

Commodity and unit Price (2004) Price (2005) 𝑃 = (𝑃1/𝑃0) × 100 log 𝑃
Butter (kg) 100.0 110.0 110 2.041
Cheese (kg) 60.0 75.0 125 2.097
Milk (liter) 20.0 30.0 150 2.176
Bread (quantity) 15.0 20.0 133.33 2.125
Eggs (Dozen) 20.0 25.0 125 2.097
Ghee (kg) 900.0 910.0 101.11 2.004

Here,

$$\sum P = \sum \left(\frac{P_1}{P_0} \times 100\right) = 744.44, \quad \sum \log P = 12.54$$

$$\therefore \text{Price index (AM)}, P_{01} = \frac{\sum \left(\frac{P_1}{P_0} \times 100\right)}{N} = \frac{744.44}{6} = 124.07$$

$$\therefore \text{Price index (GM)}, P_{01} = \text{antilog} \left(\frac{\sum \log P}{N}\right) = \text{antilog} \left(\frac{12.54}{6}\right) = \text{antilog}(2.09) = 123.03$$
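
The three unweighted indices can be computed side by side. This is a sketch using the six commodity prices from the question; the variable names are illustrative. The geometric-mean index is computed with full-precision logarithms, so it can differ in the first decimal place from a value obtained with three-figure log tables.

```python
from math import log10

p0 = [100.0, 60.0, 20.0, 15.0, 20.0, 900.0]   # base year (2004) prices
p1 = [110.0, 75.0, 30.0, 20.0, 25.0, 910.0]   # current year (2005) prices

# Simple aggregative index
aggregative = sum(p1) / sum(p0) * 100

# Price relatives and their arithmetic / geometric means
relatives = [b / a * 100 for a, b in zip(p0, p1)]
am_index = sum(relatives) / len(relatives)
gm_index = 10 ** (sum(log10(p) for p in relatives) / len(relatives))

print(round(aggregative, 2), round(am_index, 2), round(gm_index, 2))
```

The aggregative index is dominated by the expensive item (ghee), while the averages of relatives give every commodity equal weight, which is why the two approaches disagree so much here.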

Weighted aggregative index number: There are various methods of assigning weights
and consequently a large number of formulae for constructing index number have been
devised. Some of them are,

1. Laspeyres method
2. Paasche method
3. Dorbish and Bowley’s method
4. Fisher’s ideal method
5. Marshall-Edgeworth method
6. Kelly’s method

Laspeyres method: In this method, the base year quantities are taken as weights. The formula for constructing the index is,

$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$

Paasche method: In this method, the current year quantities are taken as weights. The formula for constructing the index is,

$$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$

Dorbish and Bowley’s method: Dorbish and Bowley have suggested simple arithmetic
mean of the two indices (Laspeyres and Paasche) mentioned above so as to consider the
influence of both periods. The formula for constructing the index is,

𝐿+𝑃
𝑃01 =
2

Where 𝐿 = 𝐿𝑎𝑠𝑝𝑒𝑦𝑟𝑒𝑠 𝑖𝑛𝑑𝑒𝑥, 𝑃 = 𝑃𝑎𝑎𝑠𝑐ℎ𝑒 𝑖𝑛𝑑𝑒𝑥

Fisher’s ideal method: Professor Fisher has given a number of formulae for
constructing index number and of those he calls on as the ideal index. The Fisher’s ideal
index is given by the formula,

∑𝑝1 𝑞0 ∑𝑝1 𝑞1
𝑃01 = √ × × 100
∑𝑝0 𝑞0 ∑𝑝0 𝑞1

= √𝐿 × 𝑃

Marshall-Edgeworth method: In this method, both current-year and base-year
prices and quantities are considered. The formula for constructing the index is,

$$P_{01} = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100$$

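Kelly’s method: T. L. Kelly has suggested the following formula for constructing index
numbers,

$$P_{01} = \frac{\sum p_1 q}{\sum p_0 q} \times 100$$

Where $q = \frac{q_0 + q_1}{2}$, the average of the base-year and current-year quantities.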
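
As a rough illustration of the fixed-weight idea, the sketch below applies Kelly's formula with weights equal to the average of the base-year and current-year quantities, which is one common choice and is an assumption here. The function name `kelly_index` and the sample figures are hypothetical and do not come from the notes.

```python
def kelly_index(p0, p1, q):
    """Fixed-weight aggregative index: sum(p1*q) / sum(p0*q) * 100."""
    numerator = sum(price * weight for price, weight in zip(p1, q))
    denominator = sum(price * weight for price, weight in zip(p0, q))
    return numerator / denominator * 100


# Hypothetical data for three commodities (not from the notes)
p0 = [10.0, 4.0, 6.0]   # base-year prices
p1 = [12.0, 5.0, 6.0]   # current-year prices
q0 = [3.0, 8.0, 5.0]    # base-year quantities
q1 = [5.0, 6.0, 5.0]    # current-year quantities

# Assumed weights: average of base-year and current-year quantities
q_avg = [(a + b) / 2 for a, b in zip(q0, q1)]
kelly = kelly_index(p0, p1, q_avg)

# Marshall-Edgeworth uses (q0 + q1) as weights; the factor 1/2 cancels
q_sum = [a + b for a, b in zip(q0, q1)]
marshall_edgeworth = kelly_index(p0, p1, q_sum)

print(round(kelly, 3), round(marshall_edgeworth, 3))
```

With this particular choice of weights the result coincides with the Marshall-Edgeworth index, because the factor 1/2 appears in both numerator and denominator. A different fixed weight vector would generally give a different figure.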

Question 3. Construct index numbers of prices from the following data using methods 1-5 listed above:

Commodity    Price (2004)    Quantity (2004)    Price (2005)    Quantity (2005)
A            5               8                  7               9
B            4               15                 9               17
C            7               18                 10              20
D            9               20                 3               13

Solve: From the above data, we can construct the table below:

Commodity    p0    q0    p1    q1    p0q0    p1q0    p0q1    p1q1    p0(q0+q1)    p1(q0+q1)
A            5     8     7     9     40      56      45      63      85           119
B            4     15    9     17    60      135     68      153     128          288
C            7     18    10    20    126     180     140     200     266          380
D            9     20    3     13    180     60      117     39      297          99

Here,

$$\sum p_0 q_0 = 406, \quad \sum p_0 q_1 = 370, \quad \sum p_1 q_0 = 431, \quad \sum p_1 q_1 = 455,$$

$$\sum p_0 (q_0 + q_1) = 776, \quad \sum p_1 (q_0 + q_1) = 886$$

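1. Laspeyres method:

$$P_{01} = \frac{\sum q_0 p_1}{\sum q_0 p_0} \times 100 = \frac{431}{406} \times 100 = 106.157$$

2. Paasche method:

$$P_{01} = \frac{\sum q_1 p_1}{\sum q_1 p_0} \times 100 = \frac{455}{370} \times 100 = 122.973$$

3. Dorbish and Bowley’s method:

$$P_{01} = \frac{L + P}{2} = \frac{\left( \frac{\sum q_0 p_1}{\sum q_0 p_0} \times 100 \right) + \left( \frac{\sum q_1 p_1}{\sum q_1 p_0} \times 100 \right)}{2} = \frac{106.157 + 122.973}{2} = 114.565$$

4. Fisher’s method:

$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{106.157 \times 122.973} = 114.26$$

5. Marshall-Edgeworth method:

$$P_{01} = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 = \frac{886}{776} \times 100 = 114.175$$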
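
The five results for Question 3 can be reproduced with a few lines of Python. This is a minimal sketch, assuming the prices and quantities given in the question; the variable names are illustrative only. Small differences from hand-worked figures in the last decimal place come from rounding.

```python
import math

# (p0, q0, p1, q1) for commodities A-D, from Question 3
data = {
    "A": (5, 8, 7, 9),
    "B": (4, 15, 9, 17),
    "C": (7, 18, 10, 20),
    "D": (9, 20, 3, 13),
}
rows = list(data.values())

# The four basic aggregates used by every formula
sum_p0q0 = sum(p0 * q0 for p0, q0, p1, q1 in rows)
sum_p1q0 = sum(p1 * q0 for p0, q0, p1, q1 in rows)
sum_p0q1 = sum(p0 * q1 for p0, q0, p1, q1 in rows)
sum_p1q1 = sum(p1 * q1 for p0, q0, p1, q1 in rows)

laspeyres = sum_p1q0 / sum_p0q0 * 100           # base-year quantity weights
paasche = sum_p1q1 / sum_p0q1 * 100             # current-year quantity weights
dorbish_bowley = (laspeyres + paasche) / 2      # arithmetic mean of L and P
fisher = math.sqrt(laspeyres * paasche)         # geometric mean of L and P
marshall_edgeworth = (sum_p1q0 + sum_p1q1) / (sum_p0q0 + sum_p0q1) * 100

results = {
    "Laspeyres": laspeyres,
    "Paasche": paasche,
    "Dorbish-Bowley": dorbish_bowley,
    "Fisher": fisher,
    "Marshall-Edgeworth": marshall_edgeworth,
}
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Note that the Dorbish-Bowley, Fisher and Marshall-Edgeworth figures all fall between the Laspeyres and Paasche values, since each is a form of average of the two weighting schemes.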

Weighted average of relative index number: In the weighted aggregative methods
discussed above, price relatives were not computed. However, as in the unweighted
relatives method, it is also possible to compute a weighted average of relatives. For
this purpose, we may use either the arithmetic mean or the geometric mean. When the
arithmetic mean is used, the formula becomes,

$$P_{01} = \frac{\sum PV}{\sum V}$$

Where $P$ = price relative and $V$ = value weights, i.e. $p_0 q_0$.
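
It may help to see what this formula reduces to when the value weights are the base-year values. Taking the price relative as P = (p1/p0) × 100 and the weight as V = p0 q0, a short substitution (added here for illustration) gives:

```latex
\frac{\sum PV}{\sum V}
  = \frac{\sum \left( \frac{p_1}{p_0} \times 100 \right) p_0 q_0}{\sum p_0 q_0}
  = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100
```

So with base-year value weights, the weighted arithmetic mean of price relatives gives the same figure as the Laspeyres index. This equality does not carry over to the geometric-mean version.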

When the geometric mean is used, the formula becomes,

$$P_{01} = \operatorname{antilog} \left[ \frac{\sum V \log P}{\sum V} \right]$$

Where $P = \frac{p_1}{p_0} \times 100$.

Question 4. From the following data compute price index by applying weighted average of price
relatives using arithmetic mean.

Commodity    Price (Previous)    Quantity    Price (Present)
Rice         30                  15 kg       35
Milk         25                  40 liter    30
Oil          90                  50 liter    100
Sugar        20                  30 kg       25
Flour        15                  40 kg       25

Solve: From the above data, we construct the table below:

Commodity    Price p0 (Previous)    Quantity q0    Price p1 (Present)    V = p0q0    P = (p1/p0) × 100    PV
Rice         30                     15             35                    450         116.667              52500.15
Milk         25                     40             30                    1000        120                  120000
Oil          90                     50             100                   4500        111.111              499999.5
Sugar        20                     30             25                    600         125                  75000
Flour        15                     40             25                    600         166.667              100000.2

Here, $\sum PV = 847499.85$, $\sum V = 7150$


$$P_{01} = \frac{\sum PV}{\sum V} = \frac{847499.85}{7150} = 118.531$$

This means that prices have risen by 18.531 percent over the base level, since the index stands at 118.531 against a base of 100.
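
The calculation for Question 4 can be checked with a short Python sketch, assuming the prices and quantities given in the question. The variable names are illustrative, and the geometric-mean variant is included only for comparison since the question asks for the arithmetic mean.

```python
import math

# (p0, q0, p1): previous price, quantity, present price, from Question 4
data = {
    "Rice": (30, 15, 35),
    "Milk": (25, 40, 30),
    "Oil": (90, 50, 100),
    "Sugar": (20, 30, 25),
    "Flour": (15, 40, 25),
}

relatives = [p1 / p0 * 100 for p0, q0, p1 in data.values()]  # P
weights = [p0 * q0 for p0, q0, p1 in data.values()]          # V = p0 * q0
total_weight = sum(weights)

# Weighted arithmetic mean of price relatives: sum(P*V) / sum(V)
index_am = sum(p * v for p, v in zip(relatives, weights)) / total_weight

# Weighted geometric mean: antilog of sum(V * log10 P) / sum(V)
log_mean = sum(v * math.log10(p) for p, v in zip(relatives, weights)) / total_weight
index_gm = 10 ** log_mean

print(f"Weighted AM index: {index_am:.3f}")
print(f"Weighted GM index: {index_gm:.3f}")
print(f"Rise over base level (AM): {index_am - 100:.3f} percent")
```

The last line subtracts the base of 100 from the index, which is how an index value translates into a percentage change.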
