Unit III

Research and statistics method

Uploaded by

Neethupaul
MATHEMATICAL STATISTICS

UNIT-III

MEASURE OF DISPERSION

Measures of central tendency describe the central value, called the average, around which the
values in a data set tend to concentrate. But these measures do not reveal how the values are
dispersed (spread or scattered) on each side of the central value. Therefore, while describing a
data set, it is equally important to know how far the items in the data lie close to, or scattered
away from, the measure of central tendency.

Definitions

Dispersion is the measure of the variation of the items – A L Bowley

Dispersion or spread is the degree of the scatter or variation of the variable about a central value
– Brooks & Dick

Example

Look at the runs scored by the two cricket players in a test match:

Players     I Innings   II Innings   Mean
Player 1    0           100          50
Player 2    40          60           50
Comparing the averages of the two players we may come to the conclusion that they were
playing alike. But player 1 scored 0 runs in I innings and 100 in II innings. Player 2 scored
nearly equal runs in both the innings. Therefore it is necessary for us to understand data by
measuring dispersion.

Characteristics of a good Measure of Dispersion

An ideal measure of dispersion should satisfy the following characteristics.

(i) It should be well defined without any ambiguity.

(ii) It should be based on all observations in the data set.

(iii) It should be easy to understand and compute.

(iv) It should be capable of further mathematical treatment.

(v) It should not be affected by fluctuations of sampling.

(vi) It should not be affected by extreme observations.

Dr.J.Kesavan, Biostatistician Research, VMRF(DU) Page 1



Types of measures of dispersion

The measures of dispersion are classified in two categories, namely

1. Absolute measures

2. Relative measures

1. Absolute Measures

They involve the units of measurement of the observations. For example,

(i) the dispersion of salary of employees is expressed in rupees

(ii) the variation of time required for workers is expressed in hours.

Such measures are not suitable for comparing the variability of two data sets that are
expressed in different units of measurement.

Range

Raw Data

Range is defined as the difference between the largest and smallest observations in the data
set.

Range (R) = Largest value in the data set (L) - Smallest value in the data set (S)

R = L – S

Grouped Data

For grouped frequency distribution of values in the data set, the range is the difference
between the upper class limit of the last class interval and the lower class limit of first class
interval.

Coefficient of Range

The relative measure of range is called the coefficient of range.

Coefficient of Range = (L - S) / (L + S)

Example

The following data relate to the heights of 10 students (in cm) in a school. Calculate
the range and the coefficient of range.

158, 164, 168, 170, 142, 160, 154, 174, 159, 146

Solution

L = 174 S = 142

Range = L – S = 174 – 142

Range = 32

Coefficient of range = (L – S) / (L + S)

= (174 – 142) / (174 + 142) = 32 / 316

Coefficient of range = 0.101
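
The calculation above can be checked with a short Python sketch. The function name `range_and_coefficient` is only illustrative and is not part of the original notes; the heights are the ten values given in the example.

```python
# Heights (in cm) of the 10 students from the example.
heights = [158, 164, 168, 170, 142, 160, 154, 174, 159, 146]

def range_and_coefficient(values):
    """Return (range, coefficient of range) for raw data."""
    largest, smallest = max(values), min(values)
    data_range = largest - smallest                   # R = L - S
    coefficient = data_range / (largest + smallest)   # (L - S) / (L + S)
    return data_range, coefficient

r, coeff = range_and_coefficient(heights)
print(r, round(coeff, 3))   # prints: 32 0.101
```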

Example

Calculate the range and the co-efficient of range for the marks obtained by 100 students
in a school

Marks             60-63   63-66   66-69   69-72   72-75
No. of students   5       18      42      27      8

Solution

L = Upper limit of highest class = 75

S = Lower limit of lowest class = 60

Range = L – S

= 75 – 60

Range = 15

Coefficient of range = (L – S) / (L + S)

= (75 - 60) / (75 + 60)

Coefficient of range = 0.111

Merits

• Range is the simplest measure of dispersion.
• It is well defined, and easy to compute.
• It is widely used in quality control, weather forecasting, stock market variations etc.

Limitations

• The calculation of range is based on only two values: the largest value and the smallest value.
• It is largely influenced by the two extreme values.
• It cannot be computed in the case of open-ended frequency distributions.
• It is not suitable for further mathematical treatment.

Inter Quartile Range or Quartile Deviation

The quartiles Q1, Q2 and Q3 have already been introduced and studied.

Inter quartile range is defined as:

Inter quartile Range (IQR) = Q3 – Q1

Quartile Deviation is defined as half of the distance between Q1 and Q3:

Quartile Deviation Q.D = (Q3 – Q1) / 2

$$\text{Coefficient of Q.D.} = \frac{(Q_3 - Q_1)/2}{(Q_3 + Q_1)/2} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$

Example

Find out the value of quartile deviation and its coefficient from the following data:

Roll No   1    2    3    4    5    6    7
Marks     20   28   40   12   30   15   50

Solution

Marks arranged in ascending order

12 15 20 28 30 40 50

$$Q_1 = \text{size of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{size of } \left(\frac{7+1}{4}\right)^{\text{th}} \text{ item}$$

Size of 2nd item is 15. Thus Q1 = 15

$$Q_3 = \text{size of } 3\left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{size of } \left(\frac{3 \times 8}{4}\right)^{\text{th}} \text{ item}$$

Size of 6th item is 40. Thus Q3 = 40

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{40 - 15}{2} = 12.5$$

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 15}{40 + 15} = \frac{25}{55} = 0.455$$
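
As a hedged illustration, the positional rule used above (Q1 is the (N+1)/4-th item, Q3 the 3(N+1)/4-th item) can be sketched in Python. The helper name `quartile` is hypothetical, and the linear interpolation for fractional positions is an assumption added for completeness; the example in the notes only needs whole positions.

```python
def quartile(sorted_values, k):
    """k-th quartile (k = 1, 2, 3) as the size of the k(N + 1)/4-th item."""
    n = len(sorted_values)
    position = k * (n + 1) / 4          # 1-based position of the item
    whole = int(position)
    fraction = position - whole
    if whole < 1:
        return float(sorted_values[0])
    if whole >= n:
        return float(sorted_values[-1])
    below, above = sorted_values[whole - 1], sorted_values[whole]
    # Assumption: interpolate linearly when the position is fractional.
    return below + fraction * (above - below)

marks = sorted([20, 28, 40, 12, 30, 15, 50])
q1, q3 = quartile(marks, 1), quartile(marks, 3)
qd = (q3 - q1) / 2                      # quartile deviation
coeff_qd = (q3 - q1) / (q3 + q1)        # coefficient of quartile deviation
print(q1, q3, qd, round(coeff_qd, 3))   # prints: 15.0 40.0 12.5 0.455
```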

Example

Compute coefficient of quartile deviation from the following data:

Marks             10   20   30   40   50   60
No. of Students   4    7    15   8    7    2

Solution:

Marks   Frequency   c.f
10      4           4
20      7           11
30      15          26
40      8           34
50      7           41
60      2           43

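$$Q_1 = \text{size of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \frac{43+1}{4} = 11\text{th item}$$

Size of 11th item is 20. Thus Q1 = 20

$$Q_3 = \text{size of } 3\left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = 3\left(\frac{43+1}{4}\right) = 33\text{rd item}$$

Size of 33rd item is 40. Thus Q3 = 40

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2} = \frac{40 - 20}{2} = 10$$

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{40 - 20}{40 + 20} = \frac{20}{60} = 0.333$$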
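
For a discrete frequency table, locating the 11th and 33rd items amounts to scanning the cumulative frequencies. The following is a minimal sketch of that lookup; the helper name `item_at` is illustrative only.

```python
from itertools import accumulate

marks = [10, 20, 30, 40, 50, 60]
freq = [4, 7, 15, 8, 7, 2]

def item_at(position, values, frequencies):
    """Value of the item at a 1-based position, found via cumulative frequencies."""
    for value, cum in zip(values, accumulate(frequencies)):
        if position <= cum:
            return value
    raise ValueError("position exceeds total frequency")

n = sum(freq)                                  # N = 43
q1 = item_at((n + 1) / 4, marks, freq)         # 11th item
q3 = item_at(3 * (n + 1) / 4, marks, freq)     # 33rd item
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)
print(q1, q3, qd, round(coeff_qd, 3))          # prints: 20 40 10.0 0.333
```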

Example

Calculate quartile deviation and the coefficient of quartile deviation from the following data

Wages in Rupees per hour   Less than 35   35-37   38-40   41-43   Over 43
Number of wage earners     14             62      99      18      7

Solution

Wages (Rs. per hour)   f    c.f
Less than 35           14   14
35-37                  62   76
38-40                  99   175
41-43                  18   193
Over 43                7    200

$$\text{Q.D.} = \frac{Q_3 - Q_1}{2}$$

$$Q_1 = \text{size of } \left(\frac{N}{4}\right)^{\text{th}} \text{ item} = \frac{200}{4} = 50\text{th item}$$

Q1 lies in the class 35 – 37.

$$Q_1 = L + \frac{\frac{N}{4} - c.f.}{f} \times i$$

L = 35, N/4 = 50, c.f. = 14, f = 62, i = 2

$$Q_1 = 35 + \frac{50 - 14}{62} \times 2 = 35 + 1.16 = 36.16$$

$$Q_3 = \text{size of } \left(\frac{3N}{4}\right)^{\text{th}} \text{ item} = \frac{3 \times 200}{4} = 150\text{th item}$$

Q3 lies in the class 38 – 40.

$$Q_3 = L + \frac{\frac{3N}{4} - c.f.}{f} \times i$$

L = 38, 3N/4 = 150, c.f. = 76, f = 99, i = 2

$$Q_3 = 38 + \frac{150 - 76}{99} \times 2 = 38 + 1.49 = 39.49$$

$$\text{Q.D.} = \frac{39.49 - 36.16}{2} \approx 1.67$$

$$\text{Coefficient of Q.D.} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{39.49 - 36.16}{39.49 + 36.16} = \frac{3.33}{75.65} = 0.044$$
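
The interpolation formula used for both quartiles has the same shape, so it can be sketched as one small Python function. The name `grouped_quantile` and its parameter names are assumptions made for illustration; the inputs (L, c.f., f and i = 2) are taken exactly as they appear in the worked example.

```python
def grouped_quantile(lower_limit, target, cum_before, freq, width):
    """L + ((target - c.f.) / f) * i for the class that contains the target item."""
    return lower_limit + (target - cum_before) / freq * width

n = 200
q1 = grouped_quantile(35, n / 4, 14, 62, 2)        # class 35-37
q3 = grouped_quantile(38, 3 * n / 4, 76, 99, 2)    # class 38-40
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(qd, 2), round(coeff_qd, 3))
# prints: 36.16 39.49 1.67 0.044
```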

Merits

• It is not affected by the extreme (highest and lowest) values in the data set.
• It is an appropriate measure of variation for a data set summarized in open-ended class
intervals.
• It is a positional measure of variation; therefore it is useful in the cases of erratic or
highly skewed distributions.

Limitations

• The QD is based on the middle 50 per cent of observed values only and is not based on all
the observations in the data set; therefore it cannot be considered a good measure of
variation.
• It is not suitable for mathematical treatment.
• It is affected by sampling fluctuations.
• The QD is a positional measure and has no relationship with any average in the data set.

Mean Deviation

The Mean Deviation (MD) is defined as the arithmetic mean of the absolute deviations of
the individual values from a measure of central tendency of the data set. It is also known as the
average deviation.

The measure of central tendency is either mean or median. If the measure of central
tendency is mean (or median), then we get the mean deviation about the mean (or median).

$$\text{MD (about mean)} = \frac{\sum |D|}{N}, \quad |D| = |x - \bar{x}|$$

$$\text{MD (about median)} = \frac{\sum |D_m|}{N}, \quad |D_m| = |x - \text{median}|$$

The coefficient of mean deviation (CMD) is the relative measure of dispersion
corresponding to mean deviation and it is given by

$$\text{Coefficient of Mean Deviation (CMD)} = \frac{\text{MD (about mean or median)}}{\text{mean or median}}$$

Example

The following are the weights of 10 children admitted in a hospital on a particular day.
Find the mean deviation about mean, median and their coefficients of mean deviation.

7 4 10 9 15 12 7 9 9 18

Solution

$$n = 10; \quad \text{Mean: } \bar{x} = \frac{\sum x}{n} = \frac{100}{10} = 10$$

Median: The arranged data is: 4 7 7 9 9 9 10 12 15 18

$$\text{Median} = \frac{9 + 9}{2} = \frac{18}{2} = 9$$

Weight (x)    |D| = |x – mean|   |Dm| = |x – median|
7             3                  2
4             6                  5
10            0                  1
9             1                  0
15            5                  6
12            2                  3
7             3                  2
9             1                  0
9             1                  0
18            8                  9
Total = 100   30                 28

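$$\text{MD (about mean)} = \frac{\sum |D|}{N} = \frac{30}{10} = 3$$

$$\text{Coefficient of Mean Deviation about mean} = \frac{\text{Mean Deviation about mean}}{\bar{x}} = \frac{3}{10} = 0.3$$

$$\text{MD (about median)} = \frac{\sum |D_m|}{N} = \frac{28}{10} = 2.8$$

$$\text{Coefficient of Mean Deviation about median} = \frac{\text{Mean Deviation about median}}{\text{median}} = \frac{2.8}{9} = 0.311$$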
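
A brief Python sketch of the same calculation follows. It uses the standard library's `statistics.mean` and `statistics.median` for the central values; the helper name `mean_deviation` is illustrative and not from the original notes.

```python
from statistics import mean, median

weights = [7, 4, 10, 9, 15, 12, 7, 9, 9, 18]

def mean_deviation(values, centre):
    """Arithmetic mean of the absolute deviations from the chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)

x_bar, med = mean(weights), median(weights)    # 10 and 9
md_mean = mean_deviation(weights, x_bar)       # 30 / 10
md_median = mean_deviation(weights, med)       # 28 / 10
cmd_mean = md_mean / x_bar                     # coefficient about the mean
cmd_median = md_median / med                   # coefficient about the median
print(md_mean, md_median, round(cmd_mean, 3), round(cmd_median, 3))
```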

Example

Calculate mean deviation from the following series

x   10   11   12   13   14
f   3    12   18   12   3

Solution

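x            f    c.f   |D| = |x – median|   f|D|
10           3    3     2                    6
11           12   15    1                    12
12           18   33    0                    0
13           12   45    1                    12
14           3    48    2                    6
Total = 60   48                              36

$$\text{MD} = \frac{\sum f|D|}{N}$$

$$\text{Median} = \text{size of } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \frac{48+1}{2} = 24.5\text{th item}$$

Size of 24.5th item is 12, hence median = 12

$$\text{MD} = \frac{\sum f|D|}{N} = \frac{36}{48} = 0.75$$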
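
The steps above can be sketched in Python as follows. The helper `median_by_position` is a hypothetical name; it simply returns the value whose cumulative frequency first reaches the (N+1)/2-th position, which is a simplification that is adequate here because the 24th and 25th items are both 12.

```python
from itertools import accumulate

values = [10, 11, 12, 13, 14]
freq = [3, 12, 18, 12, 3]

def median_by_position(values, freq):
    """Value of the (N + 1)/2-th item, located through cumulative frequencies."""
    position = (sum(freq) + 1) / 2
    for value, cum in zip(values, accumulate(freq)):
        if position <= cum:
            return value
    raise ValueError("empty frequency table")

n = sum(freq)                                   # N = 48
med = median_by_position(values, freq)          # 24.5th item
md = sum(f * abs(x - med) for x, f in zip(values, freq)) / n
print(med, md)                                  # prints: 12 0.75
```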

Example

Find the median and mean deviation of the following data

Size   0-10   10-20   20-30   30-40   40-50   50-60   60-70
f      7      12      18      25      16      14      8

Solution

Size    f     c.f   x    |D| = |x – median|   f|D|
0-10    7     7     5    30.2                 211.4
10-20   12    19    15   20.2                 242.4
20-30   18    37    25   10.2                 183.6
30-40   25    62    35   0.2                  5.0
40-50   16    78    45   9.8                  156.8
50-60   14    92    55   19.8                 277.2
60-70   8     100   65   29.8                 238.4
Total   100                                   1314.8

$$\text{Median} = \text{size of } \left(\frac{N}{2}\right)^{\text{th}} \text{ item} = \frac{100}{2} = 50\text{th item}$$

Median lies in the class 30 - 40.

$$\text{Median} = L + \frac{\frac{N}{2} - c.f.}{f} \times i$$

L = 30, N/2 = 50, c.f. = 37, f = 25, i = 10

$$\text{Median} = 30 + \frac{50 - 37}{25} \times 10 = 30 + 5.2 = 35.2$$

$$\text{MD} = \frac{\sum f|D|}{N} = \frac{1314.8}{100} = 13.148$$
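
A minimal Python sketch of this grouped calculation is given below. The function name `grouped_median` is an assumption for illustration; it applies the interpolation formula to the first class whose cumulative frequency reaches N/2, and the mean deviation then uses the class mid-values.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freq = [7, 12, 18, 25, 16, 14, 8]

def grouped_median(classes, freq):
    """Median = L + ((N/2 - c.f.) / f) * i for the median class."""
    target = sum(freq) / 2
    cum_before = 0
    for (lower, upper), f in zip(classes, freq):
        if cum_before + f >= target:
            return lower + (target - cum_before) / f * (upper - lower)
        cum_before += f
    raise ValueError("empty frequency table")

n = sum(freq)
med = grouped_median(classes, freq)
midpoints = [(lower + upper) / 2 for lower, upper in classes]
md = sum(f * abs(x - med) for x, f in zip(midpoints, freq)) / n
print(round(med, 1), round(md, 3))   # prints: 35.2 13.148
```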

Standard Deviation

Definition: Standard deviation is the positive square root of the average of the squared deviations
of all the observations taken from the mean. It is denoted by the Greek letter σ.

Ungrouped data

If x1, x2, x3, …, xn are the ungrouped data, then the standard deviation is calculated by

1. Actual mean method:

$$\sigma = \sqrt{\frac{\sum d^2}{N}}, \quad d = x - \bar{x}$$

2. Assumed mean method:

$$\sigma = \sqrt{\frac{\sum d^2}{N} - \left(\frac{\sum d}{N}\right)^2}, \quad d = x - A$$

Grouped Data (Discrete)

$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}, \quad d = x - A$$

Where, f = frequency of each class interval

N = total number of observations (or elements) in the population

x = mid-value of each class interval

A = assumed arithmetic mean (A.M.)

Grouped Data (continuous)

$$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times C, \quad d = \frac{x - A}{C}$$

Where, f = frequency of each class interval

N = total number of observations (or elements) in the population

C = width of class interval

x = mid-value of each class interval

A = assumed arithmetic mean (A.M.)

Merits

• The value of standard deviation is based on every observation in a set of data.
• It is less affected by fluctuations of sampling.
• It is the only measure of variation capable of algebraic treatment.

Limitations

• Compared to other measures of dispersion, calculations of standard deviation are
difficult.
• While calculating standard deviation, more weight is given to extreme values and less to
those near the mean.
• It cannot be calculated for open-ended class intervals.
• If two or more data sets are given in different units, the variation among those data sets
cannot be compared.

Example

The weights of children admitted in a hospital are given below. Calculate the standard deviation
of the weights of the children.

13 15 12 19 10.5 11.3 13 15 12 9

Solution

$$\text{A.M., } \bar{x} = \frac{\sum x}{n} = \frac{13 + 15 + 12 + 19 + 10.5 + 11.3 + 13 + 15 + 12 + 9}{10} = \frac{129.8}{10}$$

$$\bar{x} = 12.98$$

x               d = x – 12.98   d²
13              0.02            0.0004
15              2.02            4.0804
12              -0.98           0.9604
19              6.02            36.2404
10.5            -2.48           6.1504
11.3            -1.68           2.8224
13              0.02            0.0004
15              2.02            4.0804
12              -0.98           0.9604
9               -3.98           15.8404
Total = 129.8                   71.136

$$\text{Standard deviation } \sigma = \sqrt{\frac{\sum d^2}{N}}, \quad d = x - \bar{x}$$

$$\sigma = \sqrt{\frac{71.136}{10}} = 2.67$$
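
The actual mean method can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative, and the comparison with `statistics.pstdev` (the population standard deviation in the standard library) is added as a cross-check rather than as part of the original method.

```python
import math
import statistics

weights = [13, 15, 12, 19, 10.5, 11.3, 13, 15, 12, 9]

n = len(weights)
x_bar = sum(weights) / n                          # actual mean
sum_sq = sum((x - x_bar) ** 2 for x in weights)   # sum of squared deviations d^2
sigma = math.sqrt(sum_sq / n)                     # divide by N, then take the root
print(round(x_bar, 2), round(sum_sq, 3), round(sigma, 2))   # prints: 12.98 71.136 2.67

# Cross-check against the library's population standard deviation.
assert abs(sigma - statistics.pstdev(weights)) < 1e-9
```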

Example

The wholesale price of a commodity for seven consecutive days in a month is as follows:

Days                          1     2     3     4     5     6     7
Commodity price per quintal   240   260   270   245   255   286   264

Calculate the variance and standard deviation.

Solution

We assume the A.M. = 255.

Observations (x)   d = x – A   d²
240                -15         225
260                5           25
270                15          225
245                -10         100
255                0           0
286                31          961
264                9           81
Total              35          1617

$$\text{Variance } \sigma^2 = \frac{\sum d^2}{n} - \left(\frac{\sum d}{n}\right)^2 = \frac{1617}{7} - \left(\frac{35}{7}\right)^2 = 231 - 5^2 = 231 - 25$$

$$\text{Variance } \sigma^2 = 206$$

$$\text{Standard deviation } \sigma = \sqrt{\text{variance}} = \sqrt{206} = 14.35$$
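
The assumed mean method can be sketched in Python as shown below, with A = 255 as in the solution. The variable names are illustrative; the final line cross-checks the result against `statistics.pvariance`, which computes the population variance directly, to show that the choice of assumed mean does not change the answer.

```python
import math
import statistics

prices = [240, 260, 270, 245, 255, 286, 264]
A = 255                                       # assumed mean, as chosen in the example

d = [x - A for x in prices]                   # deviations from the assumed mean
n = len(prices)
variance = sum(v * v for v in d) / n - (sum(d) / n) ** 2   # 1617/7 - (35/7)^2
sigma = math.sqrt(variance)
print(sum(d), sum(v * v for v in d), variance, round(sigma, 2))
# prints: 35 1617 206.0 14.35

# Cross-check: the library's population variance gives the same value.
assert abs(variance - statistics.pvariance(prices)) < 1e-9
```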

Example

A study of 100 engineering companies gives the following information

Profit (Rs. in crore)   0-10   10-20   20-30   30-40   40-50   50-60
No. of Companies        8      12      20      30      20      10

Calculate the standard deviation of the profit earned.

Solution

A = 35, C = 10

Profit (Rs. in crore)   Mid-value (x)   d = (x – A)/C   f     fd    fd²
0-10                    5               -3              8     -24   72
10-20                   15              -2              12    -24   48
20-30                   25              -1              20    -20   20
30-40                   35              0               30    0     0
40-50                   45              1               20    20    20
50-60                   55              2               10    20    40
Total                                                   100   -28   200

$$\text{Standard deviation } \sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times C$$

$$\sigma = \sqrt{\frac{200}{100} - \left(\frac{-28}{100}\right)^2} \times 10 = \sqrt{2 - 0.0784} \times 10 \approx 13.86$$
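
The step-deviation calculation can be sketched in Python as follows, with A = 35 and C = 10 as in the solution. The variable names are illustrative only; the sums of fd and fd² are recomputed from the rows of the table rather than typed in.

```python
import math

midpoints = [5, 15, 25, 35, 45, 55]
freq = [8, 12, 20, 30, 20, 10]
A, C = 35, 10                                   # assumed mean and class width

d = [(x - A) / C for x in midpoints]            # step deviations
n = sum(freq)
sum_fd = sum(f * di for f, di in zip(freq, d))
sum_fd2 = sum(f * di * di for f, di in zip(freq, d))
sigma = math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * C
print(sum_fd, sum_fd2, round(sigma, 2))         # prints: -28.0 200.0 13.86
```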

Coefficient of Variation

The coefficient of variation is obtained by dividing the standard deviation by the mean
and multiplying it by 100. Symbolically,

$$\text{Coefficient of Variation (C.V.)} = \frac{\sigma}{\bar{x}} \times 100$$

Merit

• The CV is independent of the unit in which the measurement has been taken, whereas the standard
deviation depends on the units of measurement. Hence the coefficient of variation, rather than the
standard deviation, should be used when comparing the variability of data sets measured in
different units.

Limitations

• If the value of the mean approaches 0, the coefficient of variation approaches infinity, so
minute changes in the mean produce major changes in the CV.

Example

The scores of two batsmen, A and B, in ten innings during a certain season, are as under:

A: Mean score = 50; Standard deviation = 5

B: Mean score = 75; Standard deviation = 25

Find which of the batsmen is more consistent in scoring

Solution

$$\text{Coefficient of Variation (C.V.)} = \frac{\sigma}{\bar{x}} \times 100$$

$$\text{C.V. for batsman A} = \frac{5}{50} \times 100 = 10\%$$

$$\text{C.V. for batsman B} = \frac{25}{75} \times 100 = 33.33\%$$

The batsman with the smaller C.V is more consistent.

Since the C.V for batsman A is smaller, he is more consistent than B.
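
The comparison can be expressed as a small Python sketch. The function name `coefficient_of_variation` is illustrative; it takes the standard deviation and mean directly, as given in the example.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (standard deviation / mean) * 100, in per cent."""
    return sd * 100 / mean

cv_a = coefficient_of_variation(5, 50)     # batsman A
cv_b = coefficient_of_variation(25, 75)    # batsman B
more_consistent = "A" if cv_a < cv_b else "B"   # smaller C.V. means more consistent
print(cv_a, round(cv_b, 2), more_consistent)    # prints: 10.0 33.33 A
```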

Example

The weekly sales of two products A and B were recorded as given below

Product A   59    75    27    63    27    28    56
Product B   150   200   125   310   330   250   225

Find out which of the two shows greater fluctuations in sales.

Solution:

For comparing the fluctuations in sales of two products, we will prefer to calculate coefficient of
variation for both the products.

Product A: Let A = 56 be the assumed mean of sales for product A.

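Sales (x)   Frequency (f)   d = x – A (A = 56)   fd    fd²
27          2               -29                  -58   1682
28          1               -28                  -28   784
56          1               0                    0     0
59          1               3                    3     9
63          1               7                    7     49
75          1               19                   19    361
Total       7                                    -57   2885

$$\bar{x} = A + \frac{\sum fd}{N} = 56 + \frac{-57}{7} = 47.86$$

$$\text{Variance } \sigma^2 = \frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2 = \frac{2885}{7} - \left(\frac{-57}{7}\right)^2 = 412.143 - 66.306 = 345.837$$

$$\text{SD } \sigma = \sqrt{\text{Variance}} = \sqrt{345.837} \approx 18.60$$

$$\text{C.V.}(A) = \frac{\sigma}{\bar{x}} \times 100 = \frac{18.60}{47.86} \times 100 \approx 38.86\%$$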

Product B: Let A = 225 be the assumed mean of sales for product B.

Sales (x)   Frequency (f)   d = x – A (A = 225)   fd     fd²
125         1               -100                  -100   10000
150         1               -75                   -75    5625
200         1               -25                   -25    625
225         1               0                     0      0
250         1               25                    25     625
310         1               85                    85     7225
330         1               105                   105    11025
Total       7                                     15     35125

$$\bar{x} = A + \frac{\sum fd}{N} = 225 + \frac{15}{7} = 227.14$$

$$\text{Variance } \sigma^2 = \frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2 = \frac{35125}{7} - \left(\frac{15}{7}\right)^2 = 5017.857 - 4.592 = 5013.265$$

$$\text{SD } \sigma = \sqrt{\text{Variance}} = \sqrt{5013.265} \approx 70.80$$

$$\text{C.V.}(B) = \frac{\sigma}{\bar{x}} \times 100 = \frac{70.80}{227.14} \times 100 \approx 31.17\%$$

Since the coefficient of variation for product A is greater than that of product B, the
fluctuation in sales of product A is higher than that of product B.
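
As a cross-check, the same comparison can be sketched in Python directly from the raw weekly sales. Note that this sketch does not follow the assumed mean method used in the solution; it relies on the standard library's `statistics.mean` and `statistics.pstdev` (population standard deviation), and the function name `cv_percent` is illustrative.

```python
import statistics

product_a = [59, 75, 27, 63, 27, 28, 56]
product_b = [150, 200, 125, 310, 330, 250, 225]

def cv_percent(values):
    """Coefficient of variation in per cent, using the population standard deviation."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

cv_a, cv_b = cv_percent(product_a), cv_percent(product_b)
more_variable = "A" if cv_a > cv_b else "B"    # larger C.V. means greater fluctuation
print(round(cv_a, 1), round(cv_b, 1), more_variable)   # roughly 39% versus 31%
```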
