Statistics
n Some Basic Terms
l Data : The word data means information in the form of numerical figures or a set of given facts.
l Raw data : Data obtained from direct observation is called raw data.
The marks obtained by 10 students in a monthly test are an example of raw data or ungrouped data.
l Grouped data : To present the data in a more meaningful way, we condense the data into a convenient
number of classes or groups, generally not more than 10 and not fewer than 5. This helps us perceive,
at a glance, certain salient features of the data.
l Observation : Each numerical figure in the data is called an observation.
l Frequency of an observation : The number of times a particular observation occurs is called its frequency.
l Range : The difference between the maximum and the minimum values of the given observations is called the
range of the data.
Given x1, x2, ..., xn (n individual observations):
Range = (Maximum Value) – (Minimum Value)
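As a small illustration of frequency and range as defined above, here is a minimal Python sketch. The marks are made-up sample data, and the name `data_range` is chosen only for this example.

```python
from collections import Counter


def data_range(observations):
    """Range = (maximum value) - (minimum value)."""
    return max(observations) - min(observations)


# Illustrative marks of 10 students in a monthly test (made-up data).
marks = [12, 7, 15, 9, 18, 7, 11, 15, 20, 5]

frequency = Counter(marks)   # frequency of each observation
print(frequency[15])         # 15 occurs twice
print(data_range(marks))     # 20 - 5 = 15
```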
l Class boundaries or true upper and true lower limits
} In the exclusive form, the upper and lower limits of a class are respectively known as the true upper limit
and true lower limit.
} In the inclusive form, the number midway between the upper limit of a class and lower limit of the
subsequent class gives the true upper limit of the class and the true lower limit of the subsequent class.
l Class interval : Each group into which the raw data is condensed is called a class-interval.
l Class size : The difference between the true upper limit and the true lower limit of a class is called its class size.
l Class mark of a class
Class mark = $\dfrac{\text{True upper limit} + \text{True lower limit}}{2}$
The difference between any two successive class marks gives the class size.
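The definitions of true limits, class size and class mark can be tried out on a few inclusive-form classes. This is only a sketch: the classes are made up, the helper names are hypothetical, and it assumes that the gap between classes is the same throughout and is also applied at the outer ends of the first and last classes.

```python
def true_limits(classes):
    """Convert inclusive-form classes [(lower, upper), ...] to true limits.

    The gap between one class's upper limit and the next class's lower
    limit is split equally between the two classes. Assumption: at least
    two classes, a uniform gap, and the same gap at the outer ends.
    """
    gap = classes[1][0] - classes[0][1]   # e.g. 20 - 19 = 1
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]


def class_mark(true_lower, true_upper):
    """Class mark = (true upper limit + true lower limit) / 2."""
    return (true_upper + true_lower) / 2


inclusive = [(10, 19), (20, 29), (30, 39)]   # made-up inclusive classes
limits = true_limits(inclusive)
class_marks = [class_mark(lo, hi) for lo, hi in limits]
sizes = [hi - lo for lo, hi in limits]

print(limits)       # [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)]
print(class_marks)  # [14.5, 24.5, 34.5]
print(sizes)        # [10.0, 10.0, 10.0]
```

Here successive class marks differ by 10, which equals the class size, as stated above.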
Mean = $\dfrac{\text{Sum of observations}}{\text{Number of observations}}$
l Mean of ungrouped Data
The mean of n observations x1, x2, ..., xn is given by
Mean, $\bar{x} = \dfrac{x_1 + x_2 + x_3 + \dots + x_n}{n} = \dfrac{\sum x_i}{n}$
where the symbol $\sum$, called sigma, stands for the summation of the terms.
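If the observations x1, x2, ..., xn occur with frequencies f1, f2, ..., fn respectively, then
Mean, $\bar{x} = \dfrac{f_1 x_1 + f_2 x_2 + \dots + f_n x_n}{f_1 + f_2 + \dots + f_n} = \dfrac{\sum f_i x_i}{\sum f_i}$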
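The two mean formulas above can be compared on the same data. The sketch below uses made-up values, and the function names are hypothetical; it only shows that the frequency form gives the same result as averaging the raw observations.

```python
def mean(observations):
    """Mean = (sum of observations) / (number of observations)."""
    return sum(observations) / len(observations)


def frequency_mean(values, freqs):
    """Mean = sum(f_i * x_i) / sum(f_i) for values x_i with frequencies f_i."""
    total = sum(f * x for x, f in zip(values, freqs))
    return total / sum(freqs)


# Made-up data: 10 occurs twice, 20 five times, 30 three times.
values = [10, 20, 30]
freqs = [2, 5, 3]
raw = [10] * 2 + [20] * 5 + [30] * 3

print(mean(raw))                      # 210 / 10 = 21.0
print(frequency_mean(values, freqs))  # 21.0, same data in frequency form
```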
Step-2: Choose a suitable value of xi in the middle as the assumed mean and denote it by 'a'.
Step-3: Find di = xi – a for each i
Step-4: Find fi × di for each i
Step-5 : Find N = $\sum f_i$
Step-6 : Calculate the mean ($\bar{x}$) by using the formula $\bar{x} = a + \dfrac{\sum f_i d_i}{N}$.
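Steps 2 to 6 of the assumed mean method can be sketched in Python as follows. The data are made up, and the function name `assumed_mean_method` is an assumption of this example; the values may equally be class marks of a grouped distribution.

```python
def assumed_mean_method(values, freqs, a):
    """Mean by the assumed mean method: a + sum(f_i * d_i) / N."""
    d = [x - a for x in values]                # Step-3: d_i = x_i - a
    fd = [f * di for f, di in zip(freqs, d)]   # Step-4: f_i * d_i
    n = sum(freqs)                             # Step-5: N = sum of f_i
    return a + sum(fd) / n                     # Step-6


values = [10, 20, 30]   # made-up observations
freqs = [2, 5, 3]

print(assumed_mean_method(values, freqs, a=20))  # 20 + 10/10 = 21.0
print(assumed_mean_method(values, freqs, a=10))  # 10 + 110/10 = 21.0
```

The choice of the assumed mean a does not change the result; a value in the middle simply keeps the deviations small.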
Sometimes, the values of x and f are so large that calculating the mean by the assumed mean method
becomes quite inconvenient. In this case, we use the following steps:
Step-1 : Find the class mark xi of each class, using the class mark formula given earlier.
Step-2 : Choose a suitable value of xi in the middle as the assumed mean and denote it by 'a'.
Step-3 : Find h = (upper limit – lower limit) for each class.
Step-4 : Find $u_i = \dfrac{x_i - a}{h}$ for each class.
Step-6 : Calculate the mean by using the formula $\bar{x} = a + \left\{\dfrac{\sum f_i u_i}{N}\right\} \times h$, where N = $\sum f_i$.
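The step-deviation steps above can be sketched in Python as follows. The class marks and frequencies are made up, the function name is hypothetical, and the sketch assumes that all classes have the same size h.

```python
def step_deviation_mean(class_marks, freqs, a, h):
    """Mean by the step-deviation method: a + (sum(f_i * u_i) / N) * h."""
    u = [(x - a) / h for x in class_marks]     # u_i = (x_i - a) / h
    fu = [f * ui for f, ui in zip(freqs, u)]   # f_i * u_i for each class
    n = sum(freqs)                             # N = sum of f_i
    return a + h * sum(fu) / n


# Made-up distribution with classes 10-20, 20-30, ..., 50-60 (class size 10).
class_marks = [15, 25, 35, 45, 55]
freqs = [3, 5, 8, 3, 1]

print(step_deviation_mean(class_marks, freqs, a=35, h=10))  # 35 + (-6/20) * 10 = 32.0
```

Computing the mean directly gives 640 / 20 = 32, which agrees with the step-deviation result.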
Important facts about mean
} The algebraic sum of deviations taken about the mean is zero, i.e., $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$
} The A.M. of two numbers a and b is $\dfrac{a+b}{2}$
} Combined mean : If $\bar{x}_1$ and $\bar{x}_2$ are the arithmetic means of two series with n1 and n2 observations
respectively, then the combined mean is :
$\bar{x}_c = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
} If $\bar{x}$ is the mean of x1, x2, ..., xn, then the mean of ax1, ax2, ..., axn is $a\bar{x}$ and that of $\dfrac{x_1}{a}, \dfrac{x_2}{a}, \dots, \dfrac{x_n}{a}$ is $\dfrac{\bar{x}}{a}$ (a ≠ 0)
} The mean of the first n natural numbers is $\dfrac{n+1}{2}$
} The mean of the squares of the first n natural numbers = $\dfrac{(n+1)(2n+1)}{6}$
} The mean of the cubes of the first n natural numbers = $\dfrac{n(n+1)^2}{4}$ (the sum of the cubes is $\left(\dfrac{n(n+1)}{2}\right)^2$; dividing by n gives the mean)
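Several of the facts about the mean listed above can be checked numerically. The sketch below uses made-up data and exact fractions so that equality tests are not affected by rounding; it is a spot check on sample values, not a proof.

```python
from fractions import Fraction


def mean(values):
    """Exact mean as a Fraction, so equality checks are not blurred by rounding."""
    values = list(values)
    return Fraction(sum(values), len(values))


data = [4, 8, 15, 16, 23, 42]        # made-up observations
m = mean(data)                        # 108 / 6 = 18
deviation_sum = sum(x - m for x in data)
print(deviation_sum)                  # 0: deviations about the mean cancel

first, second = [2, 4, 6], [10, 20]   # two made-up series
n1, n2 = len(first), len(second)
combined = (n1 * mean(first) + n2 * mean(second)) / (n1 + n2)
print(combined == mean(first + second))    # True

a = 3
print(mean(a * x for x in data) == a * m)  # True: scaling the data scales the mean

n = 10
naturals = range(1, n + 1)
print(mean(naturals) == Fraction(n + 1, 2))                               # True
print(mean(k * k for k in naturals) == Fraction((n + 1) * (2 * n + 1), 6))  # True
```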
l Median
} Median of ungrouped data
After arranging the given data in an ascending or a descending order of magnitude, the value of the
middle-most observation is called the median of the data.
} Method for finding the median of an ungrouped data
Arrange the given data in an increasing or decreasing order of magnitude. Let the total number of
observations be n.
(i) If n is odd, then median = value of the $\left(\dfrac{n+1}{2}\right)$th observation.
(ii) If n is even, then median = $\dfrac{1}{2}\left\{\left(\dfrac{n}{2}\right)\text{th observation} + \left(\dfrac{n}{2} + 1\right)\text{th observation}\right\}$
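The two cases of the method above can be sketched in Python as follows. The data are made up; note that the text counts observations from 1, while Python lists are indexed from 0, so each position is shifted by one.

```python
def median(observations):
    """Median of ungrouped data, following the odd and even cases above."""
    data = sorted(observations)   # arrange in increasing order
    n = len(data)
    if n % 2 == 1:
        # ((n + 1) / 2)th observation, converted to a 0-based index
        return data[(n + 1) // 2 - 1]
    # average of the (n / 2)th and (n / 2 + 1)th observations
    return (data[n // 2 - 1] + data[n // 2]) / 2


print(median([7, 3, 9, 1, 5]))  # sorted: 1, 3, 5, 7, 9 -> 3rd observation = 5
print(median([7, 3, 9, 1]))     # sorted: 1, 3, 7, 9 -> (3 + 7) / 2 = 5.0
```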
} Median of a grouped data
Median : It is a measure of central tendency which gives the value of the middle most observation in the
data. In a grouped data, it is not possible to find the middle observation by looking at the cumulative
frequencies as the middle observation will be some value in a class interval. It is, therefore, necessary to
find the value inside a class that divides the whole distribution into two halves.
Median Class : The class whose cumulative frequency is just greater than $\dfrac{N}{2}$ is called the median class.
To calculate the median of grouped data, we follow these steps:
Step-1 : Prepare the cumulative frequency table corresponding to the given frequency distribution and obtain
N = $\sum f_i$.
Step-2 : Find $\dfrac{N}{2}$.
Step-3 : Look at the cumulative frequency just greater than $\dfrac{N}{2}$ and find the corresponding class (median class).
Step-4 : Use the formula Median, $M = l + \left\{\dfrac{\frac{N}{2} - C}{f}\right\} \times h$
where l = Lower limit of the median class.
f = Frequency of the median class.
C = Cumulative frequency of the class preceding the median class.
h = Size of the median class.
N = $\sum f_i$.
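Steps 1 to 4 above can be sketched in Python as follows. The distribution is made up, the function name is hypothetical, and the classes are assumed to be given by their true lower and upper limits.

```python
def grouped_median(classes, freqs):
    """Median of grouped data: l + ((N/2 - C) / f) * h.

    classes: list of (true_lower, true_upper); freqs: matching frequencies.
    """
    n = sum(freqs)              # Step-1: N = sum of f_i
    half = n / 2                # Step-2: N / 2
    cumulative = 0              # cumulative frequency of the preceding classes
    for (lower, upper), f in zip(classes, freqs):
        if cumulative + f > half:       # Step-3: first cumulative frequency > N/2
            h = upper - lower           # size of the median class
            return lower + (half - cumulative) / f * h   # Step-4
        cumulative += f


# Made-up distribution.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 14, 8, 5]

print(grouped_median(classes, freqs))  # N/2 = 20, median class 20-30: 20 + (7/14) * 10 = 25.0
```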
Important facts about median
} The median does not take into consideration all the items.
} The sum of absolute deviations taken about the median is the least.
} The median can be calculated graphically while the mean cannot be.
} The median is not affected by extreme values.
} The sum of absolute deviations taken about the median is not greater than the sum of absolute deviations taken from any
other observation in the data.
l Mode
Mode is that value among the observations which occurs most often i.e. the value of the observation having the
maximum frequency.
} Mode of a grouped data
In a grouped frequency distribution, it is not possible to determine the mode by looking at the frequencies.
Modal Class : The class of a frequency distribution having maximum frequency is called modal class of a
frequency distribution.
The mode is a value inside the modal class and is calculated by using the formula.
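Mode = $l + \left\{\dfrac{f_1 - f_0}{2f_1 - f_0 - f_2}\right\} \times h$
where l = Lower limit of the modal class.
f1 = Frequency of the modal class.
f0 = Frequency of the class preceding the modal class.
f2 = Frequency of the class succeeding the modal class.
h = Size of the modal class.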
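The mode formula above can be sketched in Python as follows, assuming the usual reading of its symbols: f1 is the frequency of the modal class, f0 and f2 are the frequencies of the classes just before and just after it, l is the lower limit of the modal class and h is its size. The data are made up, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch.

```python
def grouped_mode(classes, freqs):
    """Mode of grouped data: l + ((f1 - f0) / (2*f1 - f0 - f2)) * h."""
    i = freqs.index(max(freqs))                      # modal class (first one if tied)
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0                # assumption: 0 if no preceding class
    f2 = freqs[i + 1] if i < len(freqs) - 1 else 0   # assumption: 0 if no succeeding class
    lower, upper = classes[i]
    h = upper - lower                                # size of the modal class
    return lower + h * (f1 - f0) / (2 * f1 - f0 - f2)


# Made-up distribution.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [5, 8, 14, 10, 3]

print(grouped_mode(classes, freqs))  # modal class 20-30: 20 + (6 / 10) * 10 = 26.0
```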
Application of an ogive
Ogive can be used to find the median of a frequency distribution. To find the median, we follow these steps.
Method-I
Step-1 : Draw any one of the two types of cumulative frequency curves (ogives) on the graph paper.
Step-2 : Compute $\dfrac{N}{2}$ (where N = $\sum f_i$) and mark the corresponding point on the y-axis.
Step-3 : Draw a line parallel to x-axis from the point marked in step 2, cutting the cumulative frequency curve
at a point P.
Step-4 : Draw perpendicular PM from P on the x-axis. The x - coordinate of point M gives the median.
Method-II
Step-1 : Draw less than type and more than type cumulative frequency curves on the graph paper.
Step-2 : Mark the point of intersection (P) of the two curves drawn in step 1.
Step-3 : Draw perpendicular PM from P on the x-axis. The x- coordinate of point M gives the median.
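Method-I can be imitated numerically. The sketch below assumes that the plotted points of the less than type ogive are joined by straight line segments (a freehand curve would give a slightly different reading) and that the total frequency is not zero. The data and the function name are made up for illustration.

```python
def median_from_ogive(boundaries, freqs):
    """Read the median off a less than type ogive with straight segments.

    boundaries: class boundaries [b0, b1, ..., bk]; freqs: the k class frequencies.
    """
    # Points of the ogive: (boundary, cumulative frequency up to that boundary).
    cumulative = [0]
    for f in freqs:
        cumulative.append(cumulative[-1] + f)
    target = cumulative[-1] / 2          # N / 2 marked on the y-axis
    for i in range(1, len(cumulative)):
        if cumulative[i] >= target:      # the horizontal line meets this segment at P
            x0, x1 = boundaries[i - 1], boundaries[i]
            y0, y1 = cumulative[i - 1], cumulative[i]
            # x-coordinate of P, i.e. of the foot M of the perpendicular
            return x0 + (target - y0) * (x1 - x0) / (y1 - y0)


boundaries = [0, 10, 20, 30, 40, 50]   # made-up class boundaries
freqs = [5, 8, 14, 8, 5]

print(median_from_ogive(boundaries, freqs))  # 20 + (20 - 13) * 10 / 14 = 25.0
```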
n Mean deviation
l Mean deviation (M.D.) for ungrouped or raw data
M.D. = $\dfrac{\sum |x_i - \bar{x}|}{n}$
where xi is each observation, $\bar{x}$ is the arithmetic mean, median or mode as specified in the problem, and n is the number of observations.
l Mean deviation for discrete data
M.D. = $\dfrac{\sum f_i |x_i - \bar{x}|}{N}$ or $\dfrac{\sum f D}{N}$, where D = $|x - \bar{x}|$
} M.D. of two numbers a and b is $\dfrac{|a - b|}{2}$.
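The mean deviation formulas above can be sketched in Python as follows. The data are made up, the function names are hypothetical, and the deviations here are taken about the arithmetic mean; the median or mode could be passed as `about` instead.

```python
def mean_deviation(observations, about):
    """M.D. = sum(|x_i - about|) / n for raw data."""
    return sum(abs(x - about) for x in observations) / len(observations)


def mean_deviation_freq(values, freqs, about):
    """M.D. = sum(f_i * |x_i - about|) / N for discrete data, N = sum of f_i."""
    n = sum(freqs)
    return sum(f * abs(x - about) for x, f in zip(values, freqs)) / n


raw = [2, 4, 6, 8, 10]                    # made-up data, mean 6
print(mean_deviation(raw, about=6))       # (4 + 2 + 0 + 2 + 4) / 5 = 2.4

values, freqs = [10, 20, 30], [2, 5, 3]   # made-up data, mean 21
print(mean_deviation_freq(values, freqs, about=21))  # (22 + 5 + 27) / 10 = 5.4

a, b = 3, 9
print(mean_deviation([a, b], about=(a + b) / 2))  # 3.0
print(abs(a - b) / 2)                             # 3.0, matching |a - b| / 2
```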
STATISTICS EXERCISE
Direction (Q.1-2) : These questions are based on the following data (figure).
[Figure: graph of cumulative frequency (y-axis, 0 to 60 in steps of 5) against the true limits of the classes (x-axis, 0 to 120 in steps of 10).]
The given figure represents the percentage of marks on the x-axis and the number of students on the y-axis.
1. Find the number of students who scored less than or equal to 50% marks.
(1) 35 (2) 15 (3) 20 (4) 30
2. Find the number of students who scored greater than or equal to 90% marks.
(1) 47 (2) 45 (3) 5 (4) 10
3. In a class of 15 students, each student got 12 books on average. If exactly two students received the same number of books, and the average number of books received by the remaining students is an integer, then which of the following could be the number of books received by each of the two students who received the same number of books?
(1) 11 (2) 15 (3) 20 (4) 25
4. If the mean of the observations x1, x2, ..., xn is $\bar{x}$, then the mean of x1 + a, x2 + a, x3 + a, ..., xn + a is
(1) $a\bar{x}$ (2) $\bar{x} - a$ (3) $\bar{x} + a$ (4) $\dfrac{\bar{x}}{a}$
5. If the mean of the first n odd natural numbers is $\dfrac{n^2}{81}$, then n =
(1) 9 (2) 81 (3) 27 (4) 18
6. The mean of 1, 3, 4, 5, 7, 4 is m. The numbers 3, 2, 2, 4, 3, 3, p have mean m – 1 and median q. Then p + q =
(1) 7 (2) 6 (3) 5 (4) 4
7. A class of 40 students is divided into four groups named A, B, C and D. The percentages of marks scored by them are given group-wise in the table below.
A    B    C    D
20   42   10   21
30   51   25   69
40   45   85   70
25   58   73   86
22   53   98   53
45   64   43   68
65   72   64   99
By using the coefficient of range, say which of the groups has shown good performance.
(1) A (2) B (3) C (4) D
8. The lives (in h) of 10 bulbs from each of four different companies A, B, C and D are given in the table below.
A      B      C      D
120    700    950    530
1600   502    330    650
280    1430   1200   720
420    625    500    550
800    780    445    748
770    335    1260   570
270    224    375    635
455    1124   1130   804
150    473    185    500
By using the coefficient of range, which company has shown the best consistency in its quality?
(1) A (2) B (3) C (4) D
9. If the mode of the observations 5, 4, 4, 3, 5, x, 3, 4, 3, 5, 4, 3 and 5 is 3, then find the median of the observations.
(1) 3 (2) 4 (3) 5 (4) 3.5
10. If the ratio of the mean and the median of a certain data is 2 : 3, then find the ratio of its mode and mean.
(1) 2 : 5 (2) 3 : 2 (3) 5 : 2 (4) 1 : 2
11. If the mean of n observations ax1, ax2, ax3, ..., axn is $a\bar{x}$, then $(ax_1 - a\bar{x}) + (ax_2 - a\bar{x}) + \dots + (ax_n - a\bar{x})$ =
(1) $a\bar{x}$ (2) $-a\bar{x}$ (3) 0 (4) ax1 + axn
12. If the arithmetic mean of the observations x1, x2, x3, ..., xn is 1, then the arithmetic mean of $\dfrac{x_1}{k}, \dfrac{x_2}{k}, \dfrac{x_3}{k}, \dots, \dfrac{x_n}{k}$ (k > 0) is
(1) greater than 1 (2) less than 1 (3) equal to 1 (4) None of these
13. Range of 14, 12, 17, 18, 16 and x is 20. Find x (x > 0).
(1) 2 (2) 28 (3) 32 (4) Can’t be determined
14. The mean of a set of observations is a. If each observation is multiplied by b and each product is decreased by c, then the mean of the new set of observations is ___________.
21. The mean of the following distribution is 5, then find the value of B.
x    3    5    7    4
f    2    A    5    B
(1) 10 (2) 6 (3) 8 (4) None
22. The mean deviation of $\dfrac{a+b}{2}$ and $\dfrac{a-b}{2}$ (where a and b > 0) is ________.
(1) $\dfrac{b}{2}$ (2) $\dfrac{a}{2}$ (3) a (4) b
23. If the mean of x + 2, 2x + 3, 3x + 4 and 4x + 5 is x + 2, then find the value of x.
(1) 0 (2) 1 (3) –1 (4) 2
24. The range of 15, 14, x, 25, 30, 35 is 23. Find the least possible value of x.
(1) 14 (2) 12 (3) 13 (4) 11
25. Find the median of the following data.
ANSWER KEY
Que. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Ans. 2 3 4 3 2 1 2 4 2 3 3 4 3 2 1 3 3 3 2
Que. 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
Ans. 2 1 3 2 3 4 1 2 2 2 2 1 4 2 1 3 3 3 1
Que. 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
Ans. 2 1 2 2 4 3 2 1 3 2 1 2 4 3 3 2 2 1 3