Unit 16
Structure
16.0 Objectives
16.1 Introduction
16.2 Meaning of Skewness
16.3 Positive and Negative Skewness
16.4 Difference between Dispersion and Skewness
16.5 Tests of Skewness
16.6 Measures of Skewness
16.7 Some Illustrations
16.8 Properties of Normal Curve
16.9 Let Us Sum Up
16.10 Key Words and Symbols
16.11 Answers to Check Your Progress
16.12 Terminal Questions/Exercises
16.0 OBJECTIVES
After studying this unit, you should be able to :
* distinguish between skewness and dispersion
* differentiate between symmetrical, positively skewed and negatively skewed data
* calculate skewness by different methods
* decide which of the methods of computing skewness is suitable in a given situation
* appreciate the role of normal curve in the analysis of data and discuss its properties.
16.1 INTRODUCTION
As you know, to analyse any numerical data there are three main characteristics :
1) central tendency i.e., a value around which many other items of the data
congregate, 2) dispersion i.e., how much the items deviate from central tendency,
and 3) skewness i.e., how the items are distributed about the central tendency. In
this unit, you will learn about the third characteristic i.e., skewness.
In Units 10 to 13 you have studied the measures of central tendency viz., arithmetic
mean, median, mode, geometric mean, harmonic mean and moving average. In Units
14 and 15 you have studied the measures of dispersion viz., range, quartile deviation,
mean deviation, standard deviation and Lorenz curve. In this unit you will learn
about the third characteristic i.e., skewness. You will study the meaning, purpose and
methods of computing skewness. You will also study the role and properties of
normal curve in analysis of data. In fact, there is one more characteristic called
kurtosis i.e., concentration of frequencies in the central part of the data, which is not
within the scope of this course.
Figure 16.1: Symmetrical Distribution
If the graph of perfectly symmetrical data is folded at the line passing through the mean,
one side of the curve perfectly coincides with the other side. You can say one side is
the mirror image of the other side.
In general, however, frequency distributions are not perfectly symmetrical; some may
be slightly asymmetrical and some others may be highly asymmetrical. Consider the
following two asymmetrical (or skewed) distributions :
B)
X : 5-9 9-13 13-17 17-21 21-24
f : 7 28 15 10 2
Here the frequencies are not symmetrically distributed about the middle. In
distribution A the extent of asymmetry is small while in distribution B it is
comparatively larger.
The word 'skewness' is used to denote the 'extent of asymmetry' in the data. When the
frequency distribution is not symmetrical, it is said to be 'skewed'. The word
'skewness' literally denotes 'asymmetry' or 'lack of symmetry' and the word 'skewed'
denotes 'asymmetrical'. A symmetrical distribution has therefore zero skewness.
A distribution can be symmetrical even if frequencies first steadily fall and then
steadily rise. Consider the following distribution:
Size of Items : 10-20 20-30 30-40 40-50 50-60 60-70 70-80
Frequency : 40 27 15 10 15 27 40
This is also a case of symmetrical distribution. But in this case there will be two values
of the mode and both of them will be different from the arithmetic mean and median,
which will be in the middle group. You may notice that in such symmetrical distributions,
which are called bimodal or U-shaped, only mean and median are equal. Look at
Figure 16.2 and study the shape of such data on graph paper.
Measures of Dispersion and
Skewness
[Figure 16.2: U-shaped distribution, with Mode 1 and Mode 2 on either side of Mean = Median]
1) It helps in finding out the nature and the degree of concentration - whether it
is in higher or the lower values.
2) The empirical relationship between mean, median and mode i.e.,
$M_o = 3M_d - 2\bar{X}$, is based on a moderately skewed distribution. The measure
of skewness will reveal to what extent such empirical relationship holds good.
3) It helps in knowing if the distribution is normal or not. You will learn about
normal distribution later in this unit.
A) Size of Items : 2-4 4-6 6-8 8-10 10-12 12-14 14-16
Frequency : 5 12 27 10 8 3 1
B) Frequency : 2 5 12 18 30 21 6
As in Set B, if there is a longer tail towards the lower values or left hand side i.e.,
larger spread on the lower side of the mode, the skewness is negative or left handed. In
such a case Mean < Median < Mode. Look at Figure 16.3 to understand the data
on graph paper.
Measures of Skewness
As in the case of Set A, if there is a longer tail of the distribution towards the higher
values or right hand side i.e., larger spread on the higher side of the mode, the skewness is
positive or right handed. In this case Mean > Median > Mode. The shape of such data
on graph paper would be as shown in Figure 16.4.
Such data is termed 'elongated bell shaped data'. The case of extreme positive
skewness would arise when frequencies are highest in the lowest values and then they
steadily fall as the values increase. Similarly, extreme negative skewness would
arise when frequencies are lowest in the lower values and they steadily increase as
values increase, the highest frequency representing the highest values. Such data
is called 'J' shaped data. Consider the following two sets of data :
A) Size of Items : 10-12 12-14 14-16 16-18 18-20
Frequency : 27 20 12 6 3
B) Size of Items : 10-12 12-14 14-16 16-18 18-20
Frequency : 3 6 12 20 27
Set A shows very high positive skewness and Set B shows very high negative skewness.
Their shape on graph paper will be as shown in Figures 16.5A and 16.5B.
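The direction of skewness in these two J-shaped sets can be checked numerically. The following is a minimal sketch, assuming the usual grouped-data formulas for mean and median with class mid-points; the helper names `grouped_mean` and `grouped_median` are introduced here only for illustration.

```python
# Lower class limits for 10-12, 12-14, 14-16, 16-18, 18-20 (class width 2).
lowers = [10, 12, 14, 16, 18]
width = 2
set_a = [27, 20, 12, 6, 3]   # frequencies of Set A
set_b = [3, 6, 12, 20, 27]   # frequencies of Set B


def grouped_mean(lowers, width, freqs):
    """Mean of grouped data, using class mid-points."""
    mids = [lo + width / 2 for lo in lowers]
    return sum(f * x for f, x in zip(freqs, mids)) / sum(freqs)


def grouped_median(lowers, width, freqs):
    """Median of grouped data by interpolation within the median class."""
    target = sum(freqs) / 2
    cum = 0
    for lo, f in zip(lowers, freqs):
        if f > 0 and cum + f >= target:
            return lo + (target - cum) / f * width
        cum += f
    raise ValueError("empty distribution")


mean_a, median_a = grouped_mean(lowers, width, set_a), grouped_median(lowers, width, set_a)
mean_b, median_b = grouped_mean(lowers, width, set_b), grouped_median(lowers, width, set_b)

print("Set A: mean > median ->", mean_a > median_a)   # positive skewness
print("Set B: mean < median ->", mean_b < median_b)   # negative skewness
```

In Set A the mean is pulled above the median by the long right tail, and in Set B it is pulled below the median by the long left tail.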
2) Differentiate between positive skewness and negative skewness.
3) Distinguish between high skewness and moderate skewness.
.........................................................................................................
4) Differentiate between bell shaped and U-shaped data.
16.4 DIFFERENCE BETWEEN DISPERSION AND SKEWNESS
It has been explained in Units 14 and 15 that dispersion relates to the scatteredness
or spread or the deviation of the items of a series from its central value. You also
know that the measure of dispersion shows the degree of scatteredness or the average
deviation of the items from the central tendency. On the other hand, skewness relates
to the departure of the items of a series from symmetry, and the measure of skewness
shows the degree of imbalance in the distribution of items around the central
tendency. The distinguishing features are tabulated below :
Dispersion : 2) Judges the extent of representativeness of any of the three averages : Mean, Median and Mode
Skewness : 2) ... the difference between any two of the three averages : Mean, Median and Mode
How can we say that a particular distribution is skewed or not? We can say skewness
is present in a distribution if it has the following features:
1) Mean, median and mode do not coincide.
2) The sum of the positive deviations from the median is not equal to the sum of the
negative deviations.
3) Frequencies and their spread on either side of the mode are not equal.
4) Quartiles are not equidistant from the median i.e., (Q3 - Md) is not equal to
(Md - Q1).
5) When the observations in the series are plotted on graph paper, they do not
yield a symmetrical curve. This means that when the graph is divided vertically
through the median or mean and folded, the two halves of the curve do not
coincide in a perfect manner.
measuring skewness is as follows:
$$Sk_P = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}} = \frac{\bar{X} - M_o}{\sigma}$$
i.e., the first absolute measure of skewness is
divided by standard deviation. Thus, this value will be free of units of the data. The
value of this coefficient would be zero in a symmetrical distribution. If mean is greater
than mode, coefficient of skewness would be positive, otherwise negative. In practice,
the value of this coefficient usually lies between ±3.
If the mode is ill-defined, then using the approximate relationship:
Mode = 3 Median - 2 Mean
The above formula reduces to
$$Sk_P = \frac{3(\text{Mean} - \text{Median})}{\text{S.D.}} = \frac{3(\bar{X} - M_d)}{\sigma}$$
Note : As mean and standard deviation are calculated by using values of all the items
of the data, Karl Pearson's method measures skewness utilising all the items of the
data.
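The two forms of Karl Pearson's coefficient given above can be sketched in a few lines of code. This is a minimal sketch only; the function names and the input values are illustrative assumptions and are not taken from any illustration in this unit.

```python
def pearson_skewness_mode(mean, mode, sd):
    """Sk_P = (Mean - Mode) / S.D."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Sk_P = 3 (Mean - Median) / S.D., for use when the mode is ill-defined."""
    return 3 * (mean - median) / sd


# Hypothetical values, chosen only to show the sign of the result.
print(pearson_skewness_mode(mean=50, mode=45, sd=10))      # positive: mean > mode
print(pearson_skewness_median(mean=23, median=24, sd=10))  # negative: mean < median
```

Because both forms divide by the standard deviation, the result carries no units and can be compared across data sets.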
To understand the application of Karl Pearson's method clearly, let us consider some
illustrations.
Illustration 1
From the marks secured by the 120 students in Sections A and B of a class,
the following measures are obtained:
Solution
Section A
$$Sk_P = \frac{\bar{X} - \text{Mode}}{\sigma}$$
Section B
Hence the distribution of marks in Section A is more skewed. The skewness for
Section A is negative, while that of B is positive.
Illustration 2 .
Following statistical measures are given for a data set. Find out the value of standard
deviation. ,
Coefficient of skewness is -0.375, Mean is 62 and Median is 6.
Solution
The coefficient of skewness that depends upon Mean, Median and Standard
Deviation is Karl Pearson's coefficient of skewness.
$$Sk_P = \frac{3(\bar{X} - M_d)}{\sigma}$$
Substituting the given values,
This method is particularly useful in case of open end distributions and where extreme
values are present or when class-intervals are unequal. Skewness should also be measured
by Bowley's method when positional measures are called for.
If the value of this coefficient is zero, it is a symmetrical distribution. For a positive
value, it is a positively skewed distribution and for a negative value it is a negatively
skewed distribution. The range of variation under this formula is ±1. But the main
drawback of this measure is that it is based on the central 50% of the data and it ignores
the remaining 50% of the data i.e., 25% of the data below Q1 and 25% of the data
above Q3. To understand the application of Bowley's method clearly, study
Illustrations 3 and 4.
Illustration 3
For a given data, Q1 = 58, Md = 59 and Q3 = 61. Find the coefficient of skewness.
Solution
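The calculation can be sketched as follows, assuming Bowley's quartile formula Sk_B = (Q3 + Q1 - 2 Md) / (Q3 - Q1) used elsewhere in this unit. The function name `bowley_skewness` is introduced here only for illustration.

```python
def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2*Md) / (Q3 - Q1)."""
    if q3 == q1:
        raise ValueError("Q3 and Q1 coincide; the coefficient is undefined")
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Values given in Illustration 3.
sk_b = bowley_skewness(q1=58, median=59, q3=61)
print(round(sk_b, 2))  # 0.33
```

The positive value reflects the fact that Q3 lies further from the median (2 units) than Q1 does (1 unit).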
Illustration 4
In a frequency distribution, the coefficient of skewness based upon quartiles is 0.6
If the sum of the upper and lower quartiles is 100 and the median is 38, find the value
of the upper quartile.
Solution
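Bowley's coefficient of skewness based on quartiles is given by:
$$Sk_B = \frac{Q_3 + Q_1 - 2M_d}{Q_3 - Q_1}$$
Substituting the given values, 0.6 = (100 - 2 × 38) / (Q3 - Q1)
or Q3 - Q1 = 24 / 0.6 = 40 ... (i)
Also it is given, Q3 + Q1 = 100 ... (ii)
Adding (i) and (ii), we get 2 Q3 = 140, so Q3 = 70.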
2200-2400 5
Solution
Since the given distribution is not open-ended and also the mode can be determined,
it is appropriate to apply Karl Pearson's formula as given below :
$$\text{Mode} = L + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times i$$
Clearly the modal group is 1800 - 2000. Substituting the values.
Now calculating the standard deviation.
$$\sigma = 200 \times \sqrt{\cdots} = 1.582 \times 200 = 316.4$$
Now coefficient of skewness, $Sk_P = \frac{1682 - 1850}{316.4} = -0.531$
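The mode formula and the final step of this solution can be sketched in code. The frequencies passed to `grouped_mode` below are hypothetical (the original table is not reproduced here) and were picked only so that the modal class 1800 - 2000 yields the mode of 1850 used in the solution; the last line uses the mean, mode and standard deviation stated above.

```python
def grouped_mode(lower, width, f0, f1, f2):
    """Mode = L + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * i.

    f1 is the frequency of the modal class, f0 and f2 are the frequencies
    of the classes just before and just after it.
    """
    return lower + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width


# Hypothetical frequencies (assumption, not the source data).
mode = grouped_mode(lower=1800, width=200, f0=40, f1=50, f2=20)
print(mode)  # 1850.0

# Values stated in the solution: mean 1682, mode 1850, S.D. 316.4.
sk_p = (1682 - 1850) / 316.4
print(round(sk_p, 3))  # -0.531
```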
Illustration 6
Calculate the coefficient of skewness based on mean and median from the following
distribution:
Class Interval    Frequency
0-10              6
10-20             12
20-30             22
30-40             48
40-50             56
50-60             32
60-70             18
70-80             6
Solution
Calculations for Mean, Median and S.D.
Class Interval   Mid Value (x)   d' = (x - 35)/10   f     fd'    fd'^2   c.f.
0-10             5               -3                 6     -18    54      6
10-20            15              -2                 12    -24    48      18
20-30            25              -1                 22    -22    22      40
30-40            35              0                  48    0      0       88
40-50            45              1                  56    56     56      144
50-60            55              2                  32    64     128     176
60-70            65              3                  18    54     162     194
70-80            75              4                  6     24     96      200
Total                                               200   134    566
$\sigma$ = 1.543 × 10 = 15.43
Karl Pearson's coefficient of skewness based on mean and median is given by:
= -0.085
Hence, the distribution is negatively skewed with very low degree of skewness.
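The whole of Illustration 6 can be reproduced with a short script. This is a sketch under the usual grouped-data assumptions (class mid-points for the mean and standard deviation, linear interpolation within the median class); the helper names are introduced here only for illustration.

```python
import math

# Class limits and frequencies as given in Illustration 6.
classes = [(0, 10), (10, 20), (20, 30), (30, 40),
           (40, 50), (50, 60), (60, 70), (70, 80)]
freqs = [6, 12, 22, 48, 56, 32, 18, 6]


def grouped_mean_sd(classes, freqs):
    """Mean and (population) standard deviation from class mid-points."""
    n = sum(freqs)
    mids = [(lo + hi) / 2 for lo, hi in classes]
    mean = sum(f * x for f, x in zip(freqs, mids)) / n
    var = sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n
    return mean, math.sqrt(var)


def grouped_median(classes, freqs):
    """Median by interpolation inside the class containing the N/2-th item."""
    target = sum(freqs) / 2
    cum = 0
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cum + f >= target:
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    raise ValueError("empty distribution")


mean, sd = grouped_mean_sd(classes, freqs)
median = grouped_median(classes, freqs)
sk_p = 3 * (mean - median) / sd

print(round(mean, 2), round(median, 2), round(sd, 2), round(sk_p, 3))
```

Carried out without intermediate rounding, the coefficient comes to about -0.086, which agrees with the -0.085 obtained above up to rounding of the median.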
Illustration 7
Calculate the coefficient of skewness based on quartiles from the following data:
Monthly Salary No. of Employees
1000-1200 5
Solution
Computation of Quartiles
1000-1200 5 5
1200-1400 14 19
Q1 has N/4 observations or 50 observations below it. It lies in the class 1600 - 1800.
Q2 (= Median) has N/2 observations or 100 observations below it. So it lies in the class
1800 - 2000.
Q3 has 3N/4 observations or 150 observations below it. So it lies in the class
$$\text{Coefficient of } Sk = \frac{Q_3 + Q_1 - 2M_d}{Q_3 - Q_1}$$
Illustration 8
The following table gives the distribution of monthly income of 500 workers in a
factory:
Below Rs. 1000 10
1000-1500 25
1500-2000 145
2500-3000
3000 and above
i) Obtain the limits of income of the central 50 per cent of the observed employees.
ii) Calculate Bowley's coefficient of skewness.
Solution
i) For obtaining the limits of the central 50% of the workers, calculate Q1 and Q3.
Hence the incomes of the central 50% of workers lie between Rs. 1810.3 and
Rs. 2443.18.
Illustration 9
Calculate Karl Pearson's coefficient of skewness from the following data:
Incomes (Rs.per day) No. of Shops
Above 0 150
Above 100 140
Above 200 100
Above 300 80
Above 400 80
Above 500 70
Above 600 30
Above 700 14
Above 800 0
Solution
Converting the cumulative frequency distribution to an ordinary frequency distribution,
we have:
Income     Mid Value (x)   f     d' = (x - 350)/100   fd'   fd'^2   c.f.
0-100      50              10    -3                   -30   90      10
100-200    150             40    -2                   -80   160     50
200-300    250             20    -1                   -20   20      70
300-400    350             0     0                    0     0       70
400-500    450             10    1                    10    10      80
500-600    550             40    2                    80    160     120
600-700    650             16    3                    48    144     136
700-800    750             14    4                    56    224     150
Total                      150                        64    808
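The conversion from 'more than' cumulative frequencies to ordinary class frequencies amounts to taking successive differences. A minimal sketch, using the figures given in Illustration 9 (the variable names are assumptions made for this example):

```python
# "Above" limits and the number of shops above each limit (Illustration 9).
limits = [0, 100, 200, 300, 400, 500, 600, 700, 800]
above = [150, 140, 100, 80, 80, 70, 30, 14, 0]

# Frequency of the class (limit_i, limit_i+1) = above_i - above_i+1.
freqs = [above[i] - above[i + 1] for i in range(len(above) - 1)]
classes = list(zip(limits[:-1], limits[1:]))

for (lo, hi), f in zip(classes, freqs):
    print(f"{lo}-{hi}: {f}")
```

The differences sum to 150, the total number of shops, which is a quick check that no class has been lost in the conversion.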
Calculation of Mean
$$\bar{X} = A + \frac{\sum f d'}{N} \times i$$
Calculation of Median
Illustration 10
Find standard deviation, mode and median when mean = 50, coefficient of
variation = 40%, Skewness = -0.4.
Solution
Substituting the values of mean and C.V. in the formula
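The steps of this solution can be sketched as below. The sketch assumes that the given skewness is Karl Pearson's coefficient based on the mode, Sk_P = (Mean - Mode) / S.D., and that the median is obtained from the empirical relationship Mode = 3 Median - 2 Mean; both assumptions follow formulas stated earlier in this unit.

```python
mean = 50
cv = 40        # coefficient of variation, in per cent
sk_p = -0.4    # assumed to be Karl Pearson's coefficient (mode-based)

# C.V. = (S.D. / Mean) * 100  =>  S.D. = C.V. * Mean / 100
sd = cv * mean / 100

# Sk_P = (Mean - Mode) / S.D.  =>  Mode = Mean - Sk_P * S.D.
mode = mean - sk_p * sd

# Mode = 3 Median - 2 Mean  =>  Median = (Mode + 2 Mean) / 3
median = (mode + 2 * mean) / 3

print(sd, mode, round(median, 2))
```

Under these assumptions the standard deviation is 20, the mode is 58 and the median is about 52.67, so Mean < Median < Mode as expected for negative skewness.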
Below 50 8 8
50- 60 12 20
60-80 20 40
80 - 100 25 65
100 and Above 15 80
Solution
Here class intervals are unequal and open. So the appropriate method of determining
skewness is Bowley's method.
Q3 has 3N/4 or 60 observations below it. So it lies in the class 80 - 100.
$$Sk_B = \frac{Q_3 + Q_1 - 2M_d}{Q_3 - Q_1} = -0.11$$
This value of the coefficient of skewness indicates that the distribution is slightly skewed
to the left and, therefore, there is a greater concentration of the sales at the higher
values than at the lower values of the distribution.
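The quartile calculation for this distribution, with its unequal and open-ended classes, can be sketched as follows. The open ends are marked with `None`, and the helper `grouped_quantile` (a name introduced here for illustration) refuses to interpolate inside an open class, since its width is unknown.

```python
# (lower limit, upper limit, frequency); None marks an open end.
classes = [(None, 50, 8), (50, 60, 12), (60, 80, 20),
           (80, 100, 25), (100, None, 15)]


def grouped_quantile(classes, p):
    """Value below which a proportion p of the items lie (interpolated)."""
    n = sum(f for _, _, f in classes)
    target = p * n
    cum = 0
    for lo, hi, f in classes:
        if f > 0 and cum + f >= target:
            if lo is None or hi is None:
                raise ValueError("quantile falls in an open-ended class")
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    raise ValueError("p must lie between 0 and 1")


q1 = grouped_quantile(classes, 0.25)
md = grouped_quantile(classes, 0.50)
q3 = grouped_quantile(classes, 0.75)
sk_b = (q3 + q1 - 2 * md) / (q3 - q1)

print(q1, md, q3, round(sk_b, 2))
```

Here none of the quartiles falls in an open class, which is why Bowley's method remains usable even though the mean and standard deviation cannot be computed without assuming limits for the open classes.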
Illustration 12
The following facts were gathered from a firm before and after an industrial dispute:
By making use of the above data, compare the position of the firm before and after
the dispute as fully as possible.
Solution
a) Number of workers has decreased by 50, from 600 to 550 as a result of the dispute.
b) Although the mean wage has slightly increased, the firm saves Rs. 15,000 (after
dispute) in respect of the monthly salary bill:
Total Wages before Dispute (600 x 850) = Rs. 5,10,000
Total Wages after Dispute (550 x 900) = Rs. 4,95,000
Difference 15,000
c) The median and modal wages have decreased. Before the dispute, 50% of the
workers used to get Rs. 820 and above. But after the dispute, workers in this
category are less than 50%. Similarly, most of the workers are being paid around
Rs. 600 (after dispute) as against Rs. 760 (before dispute).
d) The first quartile Q1 has not changed. The second quartile Q2 (i.e., Median) has
decreased slightly, but the third quartile Q3 has increased. The significance of the
information is as shown below :
Wages (Rs.)
Category (A) workers are not affected. The next higher category (B) workers
are now confined to a narrower range of salary. But the highest paid categories
(C) and (D) are now generally paid more after the dispute.
e) Standard deviation has increased from Rs. 30 to Rs. 110 implying thereby that
the variability in individual wages has increased after dispute. For proper
comparison, we have :
C.V. (before dispute) = (30 / 850) × 100 = 3.53%
f) Pearson's measure of skewness after dispute has decreased while Bowley's
measure has increased, both being positive. This means that for middle 50% of
workers concentration in lower wages has increased. But when we consider all
the workers, then the relative concentration of frequencies on lower values side
is lower.
Note: There is nothing wrong if one formula gives a result indicating increase in
skewness while the other gives decrease in skewness. In fact, there can be
situations when one formula gives positive skewness while the other may give
negative skewness. This is because Bowley's method is based on only middle 50%
data while Pearson's method relates to entire data.
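The wage-bill and variability comparisons in points (b) and (e) can be reproduced with the figures quoted in the solution (600 workers at a mean of Rs. 850 with S.D. Rs. 30 before the dispute; 550 workers at a mean of Rs. 900 with S.D. Rs. 110 after it). A minimal sketch, with function and variable names chosen only for illustration:

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (S.D. / Mean) * 100, expressed in per cent."""
    return sd / mean * 100


# Figures quoted in points (a), (b) and (e) of the solution.
workers_before, mean_before, sd_before = 600, 850, 30
workers_after, mean_after, sd_after = 550, 900, 110

saving = workers_before * mean_before - workers_after * mean_after
cv_before = coefficient_of_variation(sd_before, mean_before)
cv_after = coefficient_of_variation(sd_after, mean_after)

print(saving)               # 15000
print(round(cv_before, 2))  # 3.53
print(round(cv_after, 2))   # 12.22
```

Comparing the two C.V. values, rather than the two standard deviations, allows for the change in the mean wage between the two periods.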
............................................................................................................
2) What is skewness?
..........................................................................................................
4) State whether the following statements are True or False.
i) Skewness judges the extent of representativeness of any average.
ii) For a positively skewed distribution, concentration of frequencies is on left.
iii) Only relative value of skewness is used for comparison even though standard
deviation is the same.
iv) Skewness cannot be calculated for open end class intervals.
v) Skewness does not exist in bimodal distribution.
vi) In a perfectly symmetrical distribution, 50% items are above 60 and 75%
items are below 75. Therefore Md = ..........................
Q3 = ........................., Q1 = ........................., coefficient of quartile
deviation is ....................., and coefficient of skewness is ......................
Suppose mean height of 100 persons selected from a big group is 68 inches and
standard deviation is 1.5 inches.
i) What is the range of height of middle 95% persons in the whole group?
' ii) How much would be the expected value of mode, Q.D. and M.D. for the
whole group?
Solution
i) Now 95% of items have values in the range Mean ± 1.96 SD. So the
required range is 68 ± 1.96 × 1.5 or 65.06 inches to 70.94 inches.
ii) Mean = Mode. Therefore mode is also 68 inches.
QD = 2/3 SD approximately. So QD = 2/3 × 1.5 = 1 inch approximately.
MD = 4/5 SD approximately. So MD = 4/5 × 1.5 = 1.2 inches approximately.
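These normal-curve relationships can be checked against the exact normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; it assumes heights in the whole group are normally distributed with the stated mean and standard deviation, and uses the standard result that the mean deviation of a normal distribution equals S.D. × sqrt(2/π).

```python
import math
from statistics import NormalDist

mean, sd = 68, 1.5
dist = NormalDist(mu=mean, sigma=sd)

# Middle 95% of the items: between the 2.5th and 97.5th percentiles.
lower = dist.inv_cdf(0.025)
upper = dist.inv_cdf(0.975)

# Quartile deviation = (Q3 - Q1) / 2.
qd = (dist.inv_cdf(0.75) - dist.inv_cdf(0.25)) / 2

# Mean deviation about the mean for a normal distribution.
md = sd * math.sqrt(2 / math.pi)

print(round(lower, 2), round(upper, 2))  # 65.06 70.94
print(round(qd, 2), round(md, 2))
```

The exact figures (QD of about 0.6745 SD and MD of about 0.7979 SD) show how close the 2/3 and 4/5 rules of thumb are.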
In fact normal curve is very much useful in drawing statistical inference. It is also
used as a standard to find out the extent of concentration of frequencies in the central
part of the given data. This is the fourth main characteristic in analysis of data, called
kurtosis, the details of which are outside the scope of this course.
In highly skewed data the highest frequency exists at one extreme of the data. A
positively skewed distribution has a long tail on the right hand side of the data and is also
termed right handed skew. A negatively skewed distribution has a long tail on the left hand
side of the data and is also termed left handed skew. When the graph of perfectly
symmetrical data, bell shaped or U-shaped, is folded at the line at the mean, the two sides of
the curve perfectly coincide with one another.
Most of the data which occurs in nature resembles the normal distribution. The normal
curve is perfectly symmetrical with a bell shape. It has fixed percentages of
frequencies lying in different ranges from the mean. These percentages help us
in deciding whether the given data is normal or not.
16.11 ANSWERS TO CHECK YOUR PROGRESS
A) 5) i) False ii) True iii) True iv) False v) False vi) False vii) True viii) True
ix) True x) False xi) False xii) False.
6) i) no variation ii) symmetrical iii) skewed
Questions
1000-1100 22
1100-1200 6
(Answer : $Sk_P = \frac{715.5 - 669.23}{190.2} = 0.243$)
3) The following data shows the daily sales at a petrol station. Calculate the mean,
median, standard deviation and coefficient of skewness.
Quantity sold (in Litres) No. of Days
(Answer : $\bar{X}$ = 1426, Md = 1600, $\sigma$ = 447.35, Sk = -1.167)
4) The following table gives the distribution of daily travelling allowance of
salesmen in a company. Compute Bowley's Coefficient of Skewness and
comment on its value.
Travelling Allowance (in Rs.) No. of Salesmen
100-120 14
120-140 16
140-160 20
160-180 18
180-200 15
200-220 7
(Answer : $Sk_B = \frac{203.75 + 39.95 - (2 \times 76.45)}{203.75 - 39.95} = 0.554$)
7) You are given below the details relating to the wages in respect of two factories.
From this it is concluded that the skewness and variability are the same in both
the factories. Point out the mistake or wrong inference in the above statement.
                  Factory A (Rs.)   Factory B (Rs.)
Arithmetic Mean   50                45
Mode              45                50
Variance          100               100
8) Calculate Karl Pearson's coefficient of skewness based on the empirical
relationship that exists between the central tendencies in a moderately
asymmetrical distribution:
Mean = 23, Median = 24, Standard Deviation = 10.
Is this distribution negatively or positively skewed?
(Answer : $Sk_P = -0.3$)
9) The following is the position in a factory before and after the settlement of an
industrial dispute. Comment on the gains or losses from the point of view of
workers and that of management:
Before After
No. of Workers 3,000 2,900
Mean of Wages (Rs.) 220 230
Median of Wages (Rs.) 250 240
Standard Deviation 30 26
Note : These questions and exercises will help you to understand the unit
better. Try to write answer for them. But do not submit your answer to the
University. These are for your practice only.