Statistics Theory
Let us consider the marks obtained by 30 students in an exam. The data is as given below:
7, 7, 2, 5, 5, 3, 1, 3, 2, 1
7, 8, 9, 4, 1, 6, 7, 4, 4, 3
2, 9, 5, 1, 1, 6, 3, 7, 9, 1
This representation of data is called Raw Data. This data doesn’t furnish any useful information and is rather
confusing. A better way to express the data is as given below.
The representation of data as above is known as ‘ungrouped frequency distribution’. Here, marks are called the
variable (x) and the number of students against the marks is known as the frequency (f) of the variable.
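As a rough illustration (a Python sketch, not part of the original notes), the ungrouped frequency distribution of the 30 marks can be tallied with the standard-library `collections.Counter`. The names `marks` and `freq` are illustrative only.

```python
from collections import Counter

# Raw data: marks of the 30 students, copied from the listing above.
marks = [7, 7, 2, 5, 5, 3, 1, 3, 2, 1,
         7, 8, 9, 4, 1, 6, 7, 4, 4, 3,
         2, 9, 5, 1, 1, 6, 3, 7, 9, 1]

# Frequency f of each value x of the variable, in ascending order of x.
freq = dict(sorted(Counter(marks).items()))

for x, f in freq.items():
    print(x, f)
```

Each printed row pairs a mark (x) with the number of students who obtained it (f); the frequencies add up to 30.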
If the identity of the individuals about whom particular information is taken is not relevant, nor the order in which the
observations arise, then the first real step of condensation is to divide the observed range of variable into a suitable
number of class intervals. For example, in the above case, the data may be expressed as shown below.
The representation of data as above is known as ‘grouped frequency distribution’ (inclusive type of
Classification).
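A minimal sketch of the inclusive classification, assuming the classes 1 – 3, 4 – 6 and 7 – 9 mentioned in the note that follows. The names `classes` and `grouped` are illustrative only.

```python
# Raw data: marks of the 30 students.
marks = [7, 7, 2, 5, 5, 3, 1, 3, 2, 1,
         7, 8, 9, 4, 1, 6, 7, 4, 4, 3,
         2, 9, 5, 1, 1, 6, 3, 7, 9, 1]

# Inclusive classes: both the lower and the upper limit belong to the class.
classes = [(1, 3), (4, 6), (7, 9)]

grouped = {
    f"{lo}-{hi}": sum(lo <= m <= hi for m in marks)
    for lo, hi in classes
}

for label, f in grouped.items():
    print(label, f)
```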
Note:
(a) The classes of the type 1 – 3, 4 – 6, 7 – 9, etc., in which both the upper and lower limits are included are called
inclusive classes.
(b) The following points may be kept in mind for classification of data into grouped frequency.
The classes should be clearly defined and should not lead to any ambiguity.
The classes should be exhaustive i.e. each of the given values should be included in one of the classes.
The classes should be mutually exclusive and non-overlapping.
The classes should be of equal width. However, this rule cannot always be followed rigidly.
If the classes are of varying width, the different class frequencies will not be comparable. The ratio of the frequency to
the corresponding width of the class interval is called ‘frequency density’.
(c) The difference between the greatest and smallest observations is called the ‘range’.
KUKATPALLY CENTRE: # 22-97, Plot No.1, Opp. Patel Kunta Huda Park, Vijaynagar Colony, Hyderabad - 500 072. Ph.: 040-64601123
MIYAPUR CENTRE: Above Sai Motors Maruthi Showroom, Allwyn X Road, Miyapur, Hyderabad.
Regd. Off.: 29A, ICES House, Kalu Sarai, Sarvapriya Vihar, New Delhi - 110 016. Ph: 011 - 2651 5949, 2656 9493, Fax: 2651 3942
If we deal with a continuous variable, it is not possible to arrange the data in the class intervals of above type. For
example, let us consider the distribution of age in years. If class intervals are 1 – 3, 4 – 6, 7 – 9…… then the
persons with ages between 6 and 7 years are not taken into consideration. In such cases, we form the class
intervals as shown below.
This form of frequency distribution is known as ‘continuous frequency distribution’ (or) ‘exclusive type of
classification’.
Note: It should be understood that in the above classes, the upper limit of each class is excluded from the
respective class.
Histograms: In drawing the histogram of a given continuous frequency distribution, we first mark off along the
x-axis all the class intervals on a suitable scale. On each class interval, draw a rectangle with height proportional to
the frequency of the corresponding class interval. The diagram of contiguous rectangles so obtained is called a
histogram.
Note: To draw the histogram for an ungrouped frequency distribution of a variable, we shall have to assume that
the frequency corresponding to the variable value $x$ is spread over the interval $x - \frac{h}{2}$ to $x + \frac{h}{2}$, where $h$ is the jump from
one value to the next. For example, consider the following frequency distribution.
3. Frequency Polygon:
4. Average (or) Measure of central tendency:
(b) In case of an ungrouped frequency distribution $x_i \mid f_i$, $i = 1, 2, 3, \ldots, n$, the Arithmetic Mean is given by
$\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$, where $N = \sum_{i=1}^{n} f_i$.
(c) In case of a continuous frequency distribution, the arithmetic mean is $\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$, where $N = \sum_{i=1}^{n} f_i$ and $x_i$ is the
mid value of the class interval.
x: 1 2 3 4 5 6 7
f: 5 9 12 17 14 10 6
Solution: $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{299}{73}$
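The arithmetic above can be checked with a short Python sketch (the variable names are illustrative, not from the notes). It computes $\sum f_i$, $\sum f_i x_i$ and their ratio for the distribution in this example.

```python
from fractions import Fraction

x = [1, 2, 3, 4, 5, 6, 7]        # values of the variable
f = [5, 9, 12, 17, 14, 10, 6]    # corresponding frequencies

N = sum(f)                                        # total frequency
total = sum(fi * xi for fi, xi in zip(f, x))      # sum of f_i * x_i
mean = Fraction(total, N)                         # exact arithmetic mean

print(N, total, float(mean))
```

Using `Fraction` keeps the mean exact (299/73); converting to `float` gives roughly 4.1.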
Solution:
Arithmetic Mean $\bar{x} = \frac{2800}{100} = 28$
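Note:
(a) Shifting of origin: The Arithmetic Mean is $\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$.
Let $d_i = x_i - A$, where $A$ is a fixed constant for all $i = 1, 2, \ldots, n$. Then $x_i = d_i + A$.
A.M., $\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i (d_i + A) = \frac{1}{N}\sum_{i=1}^{n} f_i d_i + \frac{A}{N}\sum_{i=1}^{n} f_i = A + \frac{1}{N}\sum_{i=1}^{n} f_i d_i$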
Example (3): Find the Arithmetic Mean of the data given in Example (2) by shifting the origin.
Solution:
Let $A = 25$.
Arithmetic Mean $= 25 + \frac{300}{100} = 28$
(b) Change of Scale: Let $d_i = \frac{x_i - A}{h}$, for all $i = 1, 2, \ldots, n$, where $A$ is any constant and $h$ is the length of each class.
Then, $\bar{x} = A + \frac{h}{N}\sum_{i=1}^{n} f_i d_i$
Example (4): Find the Arithmetic Mean of the data given in Example (2) by change of scale.
Solution:
Let $A = 25$ and $h = 10$.
Then, $\bar{x} = A + \frac{h}{N}\sum f_i d_i = 25 + \frac{10 \times 30}{100} = 28$
(c) Let the mean of the $m$ values $x_1, x_2, \ldots, x_m$ be $\bar{x}$ and the mean of $y_1, y_2, \ldots, y_n$ be $\bar{y}$. Then the mean of the combined
data $x_1, x_2, \ldots, x_m, y_1, y_2, \ldots, y_n$ is equal to $\frac{m\bar{x} + n\bar{y}}{m + n}$.
Example (5): The average salary of a male employee is Rs 520 and that of a female is Rs 420. The mean salary of
all the employees is Rs. 500. Find the percentage of male and female employees.
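The notes give no worked solution here, so the following Python sketch shows one way to apply the combined-mean formula. If a fraction $p$ of the employees are male, then $500 = 520p + 420(1 - p)$. All variable names are illustrative.

```python
male_mean = 520      # average salary of a male employee (Rs)
female_mean = 420    # average salary of a female employee (Rs)
overall_mean = 500   # mean salary of all employees (Rs)

# overall = p * male + (1 - p) * female  =>  p = (overall - female) / (male - female)
p_male = (overall_mean - female_mean) / (male_mean - female_mean)

pct_male = 100 * p_male
pct_female = 100 - pct_male

print(pct_male, pct_female)
```

Under this reading, 80% of the employees are male and 20% are female.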
Merits and demerits of A.M.:
$\log G = \frac{1}{N}\sum_{i=1}^{n} f_i \log x_i$
(c) In case of grouped frequency distribution, Geometric Mean is given by $G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}$, where
$N = \sum_{i=1}^{n} f_i$ and $x_1, x_2, \ldots, x_n$ are mid-values of the class intervals.
Note: Let G1 be the G.M. of x1, x 2 ,...x m and G2 be the G.M. of y1,y 2 ,....yn . Let G be the G.M. of combined data
x1, x 2 ,....xm ,y1,y 2 ....yn .
Then, $G = (x_1 x_2 \cdots x_m \, y_1 y_2 \cdots y_n)^{1/(m+n)} = \left(G_1^m G_2^n\right)^{1/(m+n)}$, so that $\log G = \frac{m \log G_1 + n \log G_2}{m + n}$.
(c) In case of grouped frequency distribution, Harmonic Mean is given by H.M. $= \frac{N}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$, where $N = \sum_{i=1}^{n} f_i$.
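Example (6): A student travels from his house to the college at a speed of 10 kmph and returns from the college
to his house at a speed of 15 kmph. Find the average speed.
Solution:
Let $x$ be the one-way distance (A to B), covered in time $t_1$ at speed $v_1$ on the way out and in time $t_2$ at speed $v_2$ on the way back.
$V_{avg} = \frac{2x}{t_1 + t_2} = \frac{2}{\frac{1}{v_1} + \frac{1}{v_2}} = 12$ km/h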
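The average speed over two equal distances is the harmonic mean of the two speeds. A small Python sketch (the function name `average_speed` is illustrative) confirms the value for this example.

```python
def average_speed(v1, v2):
    """Average speed when equal distances are covered at speeds v1 and v2.

    Total distance 2x divided by total time x/v1 + x/v2; the distance x cancels,
    leaving the harmonic mean of v1 and v2.
    """
    return 2 / (1 / v1 + 1 / v2)

avg = average_speed(10, 15)
print(avg)
```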
Merits and Demerits of H.M:
(iv) Median: Median of a distribution is the value of the variable which divides it into two equal parts.
(a) In case of ungrouped data, if the number of observations is odd then median is the middle value after values
have been arranged in ascending or descending order.
In case of even number of observations, there are two middle terms and median is obtained by taking A.M. of
middle two terms. In fact, any value lying between the middle two values can be taken as the median.
(b) In case of discrete (ungrouped) frequency distribution, median is obtained by considering the cumulative
frequencies. The steps for calculating median are given below.
Step (1): Find $\frac{N}{2}$, where $N = \sum_{i=1}^{n} f_i$.
Step (2): See the less than cumulative frequency (l.c.f) just greater than $\frac{N}{2}$.
Step (3): The corresponding value of x is the median.
x: 1 2 3 4 5 6 7 8 9
f: 8 10 11 16 20 25 15 9 6
Solution:
x    f    c.f
1    8    8
2    10   18
3    11   29
4    16   45
5    20   65
6    25   90
7    15   105
8    9    114
9    6    120
N = 120
Here, $\frac{N}{2} = 60$. Therefore, Median = 5.
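The three steps can be sketched in Python as follows. This is an illustrative helper (the name `median_discrete` is not from the notes) and it assumes the values of x are listed in ascending order.

```python
from itertools import accumulate

def median_discrete(xs, fs):
    """Median of a discrete frequency distribution.

    xs: values of the variable in ascending order; fs: their frequencies.
    """
    cum = list(accumulate(fs))       # less-than cumulative frequencies
    half = cum[-1] / 2               # Step (1): N / 2
    for x, cf in zip(xs, cum):
        if cf > half:                # Step (2): first c.f. just greater than N / 2
            return x                 # Step (3): the corresponding x

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
fs = [8, 10, 11, 16, 20, 25, 15, 9, 6]
median = median_discrete(xs, fs)
print(median)
```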
(c) In case of grouped frequency distribution the median is given by the following formula:
Median $= l + \frac{h}{f}\left(\frac{N}{2} - c\right)$
where, $l$ is the lower limit of the median class
$h$ is the length of the median class
$f$ is the frequency of the median class
$c$ is the cumulative frequency of the class preceding the median class and $N = \sum_{i=1}^{n} f_i$.
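Proof:
Let $F_k$ denote the cumulative frequency up to the end of the median class, which starts at $x_k$ and has width $h$, and let $N = \sum f_i$.
Equating the slope of AB with the slope of AC:
$\frac{F_k - F_{k-1}}{h} = \frac{\frac{N}{2} - F_{k-1}}{x - x_k}$
Writing $f = F_k - F_{k-1}$, $c = F_{k-1}$ and $l = x_k$:
$\frac{f}{h} = \frac{\frac{N}{2} - c}{x - l} \Rightarrow x = l + \frac{h}{f}\left(\frac{N}{2} - c\right)$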
Solution:
Class Intervals   Frequency   c.f
20-30             3           3
30-40             5           8
40-50             20          28
50-60             10          38
60-70             5           43
N = 43
Here, $\frac{N}{2} = 21.5$, $l = 40$, $h = 10$, $f = 20$ and $c = 8$.
Median $= 40 + \frac{10}{20}(21.5 - 8) = 46.75$
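A minimal Python sketch of the grouped-median formula Median $= l + \frac{h}{f}\left(\frac{N}{2} - c\right)$, assuming equal class widths and classes listed in ascending order. The function name `median_grouped` is illustrative.

```python
def median_grouped(lower_limits, h, freqs):
    """Median of a continuous frequency distribution with equal class width h."""
    N = sum(freqs)
    c = 0                                  # cumulative frequency before the current class
    for l, f in zip(lower_limits, freqs):
        if c + f >= N / 2:                 # this class contains the (N/2)th observation
            return l + (h / f) * (N / 2 - c)
        c += f

lower_limits = [20, 30, 40, 50, 60]
freqs = [3, 5, 20, 10, 5]
median = median_grouped(lower_limits, 10, freqs)
print(median)
```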
Demerits: (a) In case of even number of observations, median can’t be determined exactly.
(b) It does not depend on all the observations.
Uses: Median is the only average to be used while dealing with qualitative data which cannot be measured
quantitatively but still can be arranged in ascending or descending order of magnitude. For example, average
intelligence or average honesty.
(v) Mode: Mode is the value which occurs most frequently in a set of observations and around which other
observations of the set cluster densely.
Demerits:
(a) Mode is ill-defined. For example, the data 1, 2, 7, 2, 3, 4, 5, 3 have two modes. This kind of data is called
bimodal.
(b) It is not based upon all the observations
Note: (a) For a symmetrical distribution mean, median and mode coincide.
For example, for the following data Mean = Median = Mode = 3
x f
1 1
2 3
3 5
4 3
5 1
(b) If the distribution is moderately asymmetrical, Mean, Median and Mode obey the following empirical
relation:
Mode = 3 Median − 2 Mean
5. Partition values: These are the values which divide the series into a number of equal parts.
(i) The three points which divide the series into 4 equal parts are called quartiles. The first, second and third points
are known as the first, second and third quartiles, respectively. The first quartile ($Q_1$) is the value which exceeds 25% of the
observations and is exceeded by 75% of the observations. The second quartile ($Q_2$) is the value which exceeds 50% of the
observations and is exceeded by 50% of the observations. The third quartile ($Q_3$) is the value which exceeds 75% of the
observations and is exceeded by 25% of the observations.
(ii) The nine points which divide the series into ten equal parts are called deciles. The first, second, ..., ninth
deciles are represented by $D_1, D_2, \ldots, D_9$ respectively.
(iii) The ninety-nine points which divide the series into hundred equal parts are called percentiles. The first, second, ...,
ninety-ninth percentiles are represented by $P_1, P_2, \ldots, P_{99}$ respectively.
Example(9) : Calculate the median, quartiles, 4th decile and 27th percentile of following data:
x: 0 1 2 3 4 5 6 7 8
f: 1 9 26 59 72 52 29 7 1
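Solution:
x    f    c.f
0    1    1
1    9    10
2    26   36
3    59   95
4    72   167
5    52   219
6    29   248
7    7    255
8    1    256
N = 256
(i) $\frac{N}{2} = 128 \Rightarrow$ Median $= 4$
(ii) $\frac{N}{4} = 64 \Rightarrow Q_1 = 3$; $\frac{N}{2} = 128 \Rightarrow Q_2 = 4$; $\frac{3N}{4} = 192 \Rightarrow Q_3 = 5$
(iii) $\frac{4N}{10} = 102.4 \Rightarrow D_4 = 4$
(iv) $\frac{27N}{100} = 69.12 \Rightarrow P_{27} = 3$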
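The same cumulative-frequency rule gives every partition value: take the first value of x whose cumulative frequency exceeds the required fraction of N. The Python sketch below applies it to this example; the helper name `partition_value` and its `fraction` argument are illustrative.

```python
from itertools import accumulate

def partition_value(xs, fs, fraction):
    """Value of x whose cumulative frequency is the first to exceed fraction * N.

    xs must be in ascending order; fraction is e.g. 1/4 for Q1 or 27/100 for P27.
    """
    cum = list(accumulate(fs))
    target = fraction * cum[-1]
    for x, cf in zip(xs, cum):
        if cf > target:
            return x

xs = [0, 1, 2, 3, 4, 5, 6, 7, 8]
fs = [1, 9, 26, 59, 72, 52, 29, 7, 1]

q1 = partition_value(xs, fs, 1 / 4)
q2 = partition_value(xs, fs, 1 / 2)       # the median
q3 = partition_value(xs, fs, 3 / 4)
d4 = partition_value(xs, fs, 4 / 10)
p27 = partition_value(xs, fs, 27 / 100)

print(q1, q2, q3, d4, p27)
```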
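Mode:
Let $A(x_k, f_{k-1})$, $B(x_{k+1}, f_k)$, $D(x_k, f_k)$ and $C(x_{k+1}, f_{k+1})$, where $x_{k+1} = x_k + h$.
Line AB: $y - f_{k-1} = \frac{f_k - f_{k-1}}{h}(x - x_k)$
Line DC: $y - f_k = \frac{f_{k+1} - f_k}{h}(x - x_k)$
Subtracting, $f_k - f_{k-1} = \frac{x - x_k}{h}\left(2f_k - f_{k-1} - f_{k+1}\right)$
$x = x_k + \frac{h(f_k - f_{k-1})}{2f_k - f_{k-1} - f_{k+1}}$
Mode $= l + \frac{h(f_k - f_{k-1})}{2f_k - f_{k-1} - f_{k+1}}$, where $l = x_k$ is the lower limit of the modal class.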
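A hedged Python sketch of the formula Mode $= l + \frac{h(f_k - f_{k-1})}{2f_k - f_{k-1} - f_{k+1}}$. The function name `mode_grouped` is illustrative, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, not something the notes state. The data reuse the class table 20-30, ..., 60-70 from the grouped-median example.

```python
def mode_grouped(lower_limits, h, freqs):
    """Mode of a continuous frequency distribution with equal class width h."""
    k = max(range(len(freqs)), key=freqs.__getitem__)      # index of the modal class
    f_k = freqs[k]
    f_prev = freqs[k - 1] if k > 0 else 0                  # assumption: 0 if no class before
    f_next = freqs[k + 1] if k < len(freqs) - 1 else 0     # assumption: 0 if no class after
    return lower_limits[k] + h * (f_k - f_prev) / (2 * f_k - f_prev - f_next)

lower_limits = [20, 30, 40, 50, 60]
freqs = [3, 5, 20, 10, 5]
mode = mode_grouped(lower_limits, 10, freqs)
print(mode)
```

Here the modal class is 40-50, so the mode is $40 + \frac{10 \times 15}{25} = 46$.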
2. Measures of Dispersion
1. Measure of dispersion:
Consider the series (i) 7, 8, 9 (ii) 8, 8, 8 (iii) 4,8,12
In all these cases, the number of observations is 3 and the mean is 8. If we are told only that the mean of 3
observations is 8, we cannot tell whether it is the average of the first, second or third series, or of
any other series of 3 observations whose sum is 24. Thus, measures of central tendency are inadequate to
give us a complete idea of the distribution. They must be supported and supplemented by some other measures.
One such measure is dispersion.
(i) Range: The range is the difference between the two extreme observations of the distribution. If A and B are the greatest
and smallest observations respectively in a distribution, then its range is A − B.
Range is the simplest, but a crude, measure of dispersion. Since it is based on only the two extreme observations, it is not
a reliable measure of dispersion.
(ii) Quartile Deviation: Quartile deviation Q is given by $Q = \frac{1}{2}(Q_3 - Q_1)$, where $Q_1$ and $Q_3$ are the first and third
quartiles of the distribution respectively.
Quartile deviation is definitely a better measure than range as it makes use of 50% of the data but since it ignores
the other 50 % of the data, it cannot be regarded as a reliable measure.
(iii) Mean Deviation: Consider n observations $x_1, x_2, x_3, \ldots, x_n$. Then the mean deviation from the average A (usually
A.M., median or mode) is given by $\frac{1}{n}\sum_{i=1}^{n} |x_i - A|$.
In case of a frequency distribution $x_i \mid f_i$, $i = 1, 2, \ldots, n$, Mean deviation $= \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - A|$, where $N = \sum_{i=1}^{n} f_i$.
Since mean deviation is based on all observations, it is a better measure of dispersion than quartile deviation.
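(iv) Standard Deviation: The standard deviation of n observations $x_1, x_2, x_3, \ldots, x_n$ is denoted by $\sigma$, and it is given by
$\sigma = \sqrt{\frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$, where $\bar{x}$ = A.M. of $x_1, x_2, \ldots, x_n$.
In case of a frequency distribution $x_i \mid f_i$, $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}$, where $N = \sum_{i=1}^{n} f_i$.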
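To see how the standard deviation separates series that share the same mean, the following Python sketch evaluates $\sigma = \sqrt{\frac{1}{n}\sum (x_i - \bar{x})^2}$ for the three series 7, 8, 9; 8, 8, 8; and 4, 8, 12 considered at the start of this chapter. The function name `std_dev` is illustrative.

```python
from math import sqrt

def std_dev(values):
    """Standard deviation with divisor n, as defined in these notes."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((x - mean) ** 2 for x in values) / n)

series = [[7, 8, 9], [8, 8, 8], [4, 8, 12]]
sigmas = [std_dev(s) for s in series]

for s, sigma in zip(series, sigmas):
    print(s, round(sigma, 3))
```

All three series have mean 8, but their standard deviations are about 0.816, 0 and 3.266 respectively.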
(v) Root mean square deviation: The root mean square deviation of n observations $x_1, x_2, \ldots, x_n$ about any constant
A is denoted by ‘s’ and it is given by
$s = \sqrt{\frac{(A - x_1)^2 + (A - x_2)^2 + \cdots + (A - x_n)^2}{n}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (A - x_i)^2}$.
In case of a frequency distribution $x_i \mid f_i$, $i = 1, 2, \ldots, n$, we have $s = \sqrt{\frac{1}{N}\sum_{i=1}^{n} f_i (x_i - A)^2}$, where $N = \sum_{i=1}^{n} f_i$.
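(a) $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$
$= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - 2\bar{x} \cdot \frac{1}{n}\sum_{i=1}^{n} x_i + \frac{1}{n} \cdot n\bar{x}^2$
$= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - 2\bar{x} \cdot \bar{x} + \bar{x}^2$
$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$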
(b) The shifting of origin doesn’t change value of the variance i.e variance is independent of shifting the origin.
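Let $y_i = x_i + k$, so that $\bar{y} = \bar{x} + k$. Then
$\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2$
$= \frac{1}{n}\sum_{i=1}^{n} (x_i + k)^2 - (\bar{x} + k)^2$
$= \frac{1}{n}\sum_{i=1}^{n} x_i^2 + \frac{2k}{n}\sum_{i=1}^{n} x_i + k^2 - \bar{x}^2 - 2k\bar{x} - k^2$
$= \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2 = \sigma_x^2$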
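(c) Change of scale: Let $y_i = h x_i$, so that $\bar{y} = h\bar{x}$. Then
$\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} y_i^2 - \bar{y}^2 = \frac{1}{n}\sum_{i=1}^{n} h^2 x_i^2 - h^2\bar{x}^2 = h^2\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2\right) = h^2\sigma_x^2$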
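The effect of shifting the origin and of changing the scale can be checked numerically. The sketch below uses `statistics.pvariance` (the population variance, divisor n) on a small made-up data set; the list `x` and the constants `k` and `h` are hypothetical values chosen only for illustration.

```python
from statistics import pvariance

x = [7, 8, 9, 4, 12]   # hypothetical observations
k = 10                 # shift of origin: y_i = x_i + k
h = 3                  # change of scale: y_i = h * x_i

var_x = pvariance(x)
var_shift = pvariance([xi + k for xi in x])   # expected to equal var_x
var_scale = pvariance([h * xi for xi in x])   # expected to equal h**2 * var_x

print(var_x, var_shift, var_scale)
```

The shifted data have the same variance as the original, while the scaled data have variance $h^2$ times as large.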
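(d) Relation between s and $\sigma$:
$s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - A)^2$
$= \frac{1}{n}\sum_{i=1}^{n} \left[(x_i - \bar{x}) + (\bar{x} - A)\right]^2$
$= \frac{1}{n}\sum_{i=1}^{n} \left[(x_i - \bar{x}) + d\right]^2$, where $d = \bar{x} - A$
$= \frac{1}{n}\sum_{i=1}^{n} \left[(x_i - \bar{x})^2 + d^2 + 2d(x_i - \bar{x})\right]$
$= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 + d^2 + 2d \cdot \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})$
$= \sigma^2 + d^2 + 2d(\bar{x} - \bar{x})$
$s^2 = \sigma^2 + d^2$, where $d^2 = (\bar{x} - A)^2$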
3. Coefficient of Dispersion:
Whenever we want to compare the variability of two series which differ widely in their averages or which are
measured in different units, we do not merely calculate the measure of dispersion but we calculate the coefficients
of dispersion. The coefficients of dispersion (C.D) based on different measures of dispersion are as follows:
(i) Based upon the range: C.D $= \frac{A - B}{A + B}$, where A and B are the greatest and the smallest items in the series.
(ii) Based upon Quartile deviation: C.D $= \frac{Q_3 - Q_1}{Q_3 + Q_1}$
(iii) Based upon Mean deviation: C.D $= \frac{\text{Mean deviation}}{\text{Average from which it is calculated}}$
(iv) Based upon standard deviation: C.D $= \frac{\sigma}{\bar{x}}$
4. Coefficient of variation (C.V.): The coefficient of variation is given by C.V $= 100 \times \frac{\sigma}{\bar{x}}$
Examples:
1. Find the mean deviation about the mean for 38, 70, 48, 40, 42, 55, 63, 46, 54, 44
Sol: Given data is 38, 70, 48, 40, 42, 55, 63, 46, 54, 44.
The arithmetic mean of the given data is $\bar{x} = \frac{38 + 70 + 48 + 40 + 42 + 55 + 63 + 46 + 54 + 44}{10} = \frac{500}{10} = 50$
Then the absolute values $|x_i - \bar{x}|$ are 12, 20, 2, 10, 8, 5, 13, 4, 4, 6
Mean deviation about the mean $= \frac{\sum_{i=1}^{10} |x_i - \bar{x}|}{10} = \frac{84}{10} = 8.4$
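A short Python sketch of the same computation; the helper name `mean_deviation` is illustrative. It takes the point about which deviations are measured as an argument, so it works for the mean, the median or any other average.

```python
def mean_deviation(values, about):
    """Mean of the absolute deviations of `values` from the point `about`."""
    return sum(abs(x - about) for x in values) / len(values)

data1 = [38, 70, 48, 40, 42, 55, 63, 46, 54, 44]
mean1 = sum(data1) / len(data1)          # arithmetic mean of the data
md_mean = mean_deviation(data1, mean1)   # mean deviation about the mean

print(mean1, md_mean)
```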
2. Find the mean deviation about the median for 13, 17, 16, 11, 13, 10, 16, 11, 18, 12, 17
Sol. Given data is 13, 17, 16, 11, 13, 10, 16, 11, 18, 12, 17.
On keeping the observations in ascending order 10, 11, 11, 12, 13, 13, 16, 16, 17, 17, 18
The median (b) of the given 11 observations is 13.
Then the absolute values $|x_i - b|$ are 3, 2, 2, 1, 0, 0, 3, 3, 4, 4, 5.
Mean deviation about the median $= \frac{\sum_{i=1}^{11} |x_i - b|}{11} = \frac{27}{11} = 2.45$
3. Find the mean deviation about the mean for the following distribution.
xi 10 11 12 13
fi 3 12 18 12
Sol. Construct the following table to find the mean deviation about the mean.
x_i        10    11    12    13
f_i        3     12    18    12
f_i x_i    30    132   216   156
Arithmetic mean, $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{534}{45} = 11.87$
4. Find the mean deviation about the median for the following frequency distribution.
xi 5 7 9 10 12 15
fi 8 6 2 2 2 6
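The notes do not carry Examples 3 and 4 through to a final value, so the following Python sketch shows one way to finish them using Mean deviation $= \frac{1}{N}\sum f_i |x_i - A|$. The helper name `mean_deviation_freq` and all variable names are illustrative.

```python
from itertools import accumulate

def mean_deviation_freq(xs, fs, about):
    """Mean deviation of a frequency distribution x_i | f_i about the point `about`."""
    N = sum(fs)
    return sum(f * abs(x - about) for x, f in zip(xs, fs)) / N

# Example 3: mean deviation about the mean.
xs3, fs3 = [10, 11, 12, 13], [3, 12, 18, 12]
mean3 = sum(f * x for x, f in zip(xs3, fs3)) / sum(fs3)     # 534 / 45
md3 = mean_deviation_freq(xs3, fs3, mean3)                  # approx. 0.71

# Example 4: mean deviation about the median.
xs4, fs4 = [5, 7, 9, 10, 12, 15], [8, 6, 2, 2, 2, 6]
half = sum(fs4) / 2                                         # N / 2 = 13
# Median: first x whose cumulative frequency exceeds N / 2.
median4 = next(x for x, cf in zip(xs4, accumulate(fs4)) if cf > half)
md4 = mean_deviation_freq(xs4, fs4, median4)                # approx. 3.23

print(round(md3, 2), median4, round(md4, 2))
```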
5. Find the mean deviation about the mean for the following data:
Marks obtained 0-10 10-20 20-30 30-40 40-50
No. of students 5 8 15 16 6
Sol. We take the assumed mean $A = 25$. Here, $C = 10$. Hence, we form the following table.
Class interval   f_i   Midpoint x_i   d_i = (x_i - 25)/10   f_i d_i   |x_i - x̄|   f_i |x_i - x̄|
0 – 10           5     5              -2                    -10       22          110
10 – 20          8     15             -1                    -8        12          96
20 – 30          15    25 (= A)       0                     0         2           30
30 – 40          16    35             1                     16        8           128
40 – 50          6     45             2                     12        18          108
Total            N = 50                                     10                    472
Here, $N = 50$. Mean $\bar{x} = A + C \cdot \frac{\sum f_i d_i}{N} = 25 + 10 \times \frac{10}{50} = 25 + 2 = 27$
Mean deviation about the mean, M.D $= \frac{1}{N}\sum_{i=1}^{5} f_i |x_i - \bar{x}| = \frac{1}{50} \times 472 = 9.44$
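The table can be reproduced with a few lines of Python. This sketch follows the same step-deviation route (assumed mean $A = 25$, class width $C = 10$); the variable names are illustrative.

```python
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 15, 16, 6]
A, C = 25, 10                      # assumed mean and class width

mids = [(lo + hi) / 2 for lo, hi in classes]        # midpoints x_i
d = [(x - A) / C for x in mids]                     # step deviations d_i
N = sum(freqs)

# Mean by the step-deviation method: A + C * (sum f_i d_i) / N
mean = A + C * sum(f * di for f, di in zip(freqs, d)) / N

# Mean deviation about the mean: (1/N) * sum f_i |x_i - mean|
md = sum(f * abs(x - mean) for f, x in zip(freqs, mids)) / N

print(N, mean, md)
```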
6. Find the mean deviation from the median for the following data
Age (years) 20-25 25-30 30-35 35-40 40-45 45-50 50-55 55-60
No. of workers f i 120 125 175 160 150 140 100 30
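Here $N = 1000$. The $\frac{N}{2}$th $= \frac{1000}{2}$th $= 500$th observation lies in the class interval 35 – 40. This is the
median class.
Median $= L + \frac{N/2 - \text{p.c.f.}}{f} \times i = 35 + \frac{500 - 420}{160} \times 5 = 35 + \frac{400}{160} = 35 + 2.5 = 37.50$
Taking the class midpoints as $x_i$, $\sum_{i=1}^{8} f_i |x_i - \text{med.}| = 8175$, so the mean deviation about the median is
$\frac{1}{N}\sum_{i=1}^{8} f_i |x_i - \text{med.}| = \frac{8175}{1000} = 8.175$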
Some Important results to be remembered:
1. The A.M. of n numbers $x_1, x_2, x_3, \ldots, x_n$ is $\bar{x}$. If $x_i$ is replaced by $x'$, then the new average is $\frac{n\bar{x} - x_i + x'}{n}$.
2. Let $\bar{x}$, $\sigma^2$, $\sigma$ be the Mean, Variance and standard deviation of $x_1, x_2, x_3, \ldots, x_n$; then,
4. The sum of squares of deviations of the variates is minimum when taken about the mean, i.e. if $\sum (x_i - A)^2$ is
minimum then $A = \bar{x}$.
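5. A.M. of a combined series: Let $\bar{x}$ and $\bar{y}$ be the A.M.s of two series of sizes m and n and let $\bar{X}$ be the mean of
the combined data; then $\bar{X} = \frac{m\bar{x} + n\bar{y}}{m + n}$.
6. G.M. of a combined series: Let $G_1$ and $G_2$ be the G.M.s of two series of sizes m and n. If G is the geometric
mean of the combined series, then $\log G = \frac{m \log G_1 + n \log G_2}{m + n}$.
7. Variance of a combined series: If m, n are the sizes, $\bar{x}$, $\bar{y}$ the means and $\sigma_1$, $\sigma_2$ the standard deviations of two
series, then $\sigma^2 = \frac{1}{m + n}\left[m(\sigma_1^2 + d_1^2) + n(\sigma_2^2 + d_2^2)\right]$, where $d_1 = \bar{x} - \bar{X}$, $d_2 = \bar{y} - \bar{X}$ and $\bar{X} = \frac{m\bar{x} + n\bar{y}}{m + n}$.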
8. The variance of the first n natural numbers is $\frac{n^2 - 1}{12}$.
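The closed form $\frac{n^2 - 1}{12}$ can be checked by brute force. The sketch below uses exact rational arithmetic; the function name `variance_first_n` is illustrative.

```python
from fractions import Fraction

def variance_first_n(n):
    """Variance (divisor n) of the first n natural numbers 1, 2, ..., n."""
    values = range(1, n + 1)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / n

# Compare the direct computation with (n^2 - 1) / 12 for n = 1, ..., 50.
checks = all(variance_first_n(n) == Fraction(n * n - 1, 12) for n in range(1, 51))
print(checks)
```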
9. If $\sum_{i=1}^{n} (x_i - a) = n$ and $\sum_{i=1}^{n} (x_i - a)^2 = na$ $(n, a > 1)$, then the standard deviation of the ‘n’ observations $x_1, x_2, \ldots, x_n$
is $\sqrt{a - 1}$.
10. If the distribution is moderately asymmetrical Mean, Median and Mode obey the empirical relation
Mode = 3.Median – 2.Mean.