Episode 2
Example 1: In a survey of 40 families in a village, the number of children per family was
recorded and the following data obtained.
1 0 3 2 1 5 6 2
2 1 0 3 4 2 1 6
3 2 1 5 3 3 2 4
2 2 3 0 2 1 4 5
3 3 4 4 1 2 4 5
Represent the data in the form of a discrete frequency distribution.
Solution:
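The tally asked for here can be sketched in Python. This is an illustration rather than part of the original solution: it uses `collections.Counter` from the standard library, and the names `children` and `discrete_frequency_distribution` are made up for this example.

```python
from collections import Counter

# Number of children per family, copied from the 40 survey values above.
children = [
    1, 0, 3, 2, 1, 5, 6, 2,
    2, 1, 0, 3, 4, 2, 1, 6,
    3, 2, 1, 5, 3, 3, 2, 4,
    2, 2, 3, 0, 2, 1, 4, 5,
    3, 3, 4, 4, 1, 2, 4, 5,
]


def discrete_frequency_distribution(values):
    """Return (value, frequency) pairs sorted by value."""
    counts = Counter(values)
    return sorted(counts.items())


table = discrete_frequency_distribution(children)

print("Number of children (x)  Frequency (f)")
for x, f in table:
    print(f"{x:>22}  {f:>13}")
# The frequencies should add up to the number of families surveyed.
print(f"{'Total':>22}  {sum(f for _, f in table):>13}")
```

The last line is the same check used in the next example: the frequencies must add up to the number of observations (here 40).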
Example 2: In a survey of 30 students, the number of siblings each student has was
recorded, and the following data was obtained:
2 1 0 3 2 1 4 2
1 0 2 3 1 2 1 4
3 2 1 0 2 3 1 2
4 1 2 3 1 2
Solution:
The distinct values in the data are: 0, 1, 2, 3, 4.
Count the frequency of each value
We count how many times each value appears in the dataset:
Number of Siblings (x) Frequency (f)
0 3
1 9
2 10
3 5
4 3
Total 30
Add up all the frequencies to ensure they match the total number of
students surveyed (30):
3 + 9 + 10 + 5 + 3 = 30. Correct!
Interpretation:
Sometimes the data collected are so numerous that they cannot easily be managed, and it
becomes necessary to group them into intervals. When data are organised into intervals
(class intervals), the organised data are called grouped data. The advantage of a grouped
frequency distribution is that it reduces a very large array of data to a smaller,
manageable size.
Class interval Frequency
50-100 4
100-150 12
150-200 22
200-250 33
250-300 16
300-350 8
350-400 5
Total 100
Nature of class
The following basic technical terms are used when a continuous frequency distribution
is formed, that is, when data are classified according to class intervals.
a. Class limits:
The class limits are the lowest and the highest values that can be included in the class.
For example, take the class 50-100. The lowest value of the class is 50 and the highest
value is 100. These two values are known as the lower limit and the upper limit of
the class. In statistical calculations, the lower class limit is denoted by L and the upper
class limit by U.
b. Class Interval:
The class interval may be defined as the size of each grouping of data.
For example, 50-75, 75-100, 100-125, … are class intervals. Each grouping begins with
the lower limit of a class interval and ends at the lower limit of the next succeeding class
interval.
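d. Range:
The difference between the largest and the smallest values of the observations is called
the range and is denoted by R, i.e.
R = Largest value − Smallest value
R = L − S
where, in this formula only, L stands for the largest value and S for the smallest value.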
e. Mid-value or mid-point:
The central point of a class interval is called the mid-value or mid-point. It is found by
adding the upper and lower limits of a class and dividing the sum by 2, i.e.
Mid-value = (L + U) / 2
For example, if the class interval is 20-30 then the mid-value is (20 + 30) / 2 = 25.
f. Number of class intervals:
The number of class intervals in a frequency distribution is a matter of importance. The
number of class intervals should not be too large. For an ideal frequency distribution, the
number of class intervals can vary from 5 to 15. To decide the number of class intervals
for the whole data, we take the lowest and the highest of the values; the difference
between them helps us to decide the class intervals. Thus the number of class intervals
can be fixed arbitrarily, keeping in view the nature of the problem under study, or it can
be decided with the help of Sturges’ Rule:
K = 1 + 3.322 log10 N
where K is the number of class intervals and N is the total number of observations.
Example: if the number of observations is 10, then the number of class intervals is
K = 1 + 3.322 log10 10 = 4.322 ≅ 4
The size of the class interval is inversely proportional to the number of class intervals
in a given distribution. The approximate value of the size (or width or magnitude) of the
class interval, C, is obtained by using Sturges’ rule as
C = Range / (1 + 3.322 log10 N)
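Sturges' rule and the class-size formula can be checked numerically. The Python sketch below is only an illustration; the function names `sturges_classes` and `sturges_class_size` are made up here, and only `math.log10` from the standard library is used.

```python
import math


def sturges_classes(n):
    """Unrounded number of class intervals, K = 1 + 3.322 * log10(n)."""
    return 1 + 3.322 * math.log10(n)


def sturges_class_size(largest, smallest, n):
    """Approximate class size, C = Range / (1 + 3.322 * log10(n))."""
    return (largest - smallest) / sturges_classes(n)


# The example above: 10 observations give K = 4.322, taken as 4 classes.
k = sturges_classes(10)
print(round(k, 3), "->", round(k), "classes")
```

In practice K and C are rounded to convenient whole numbers, so the rule is a guide and not a fixed requirement.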
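There are three methods of classifying the data according to class intervals, namely the
exclusive method, the inclusive method and open-end classes.
a. Exclusive method:
In this type of classification the class intervals overlap: the upper limit of one class is
the lower limit of the next. The following data are classified on this basis.
Class interval Frequency
0 - 5000 60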
5000 - 10000 95
15000 - 20000 83
20000 - 25000 40
TOTAL 400
The first class covers all values from 0 up to but not including 5000 (for example, up to
4999.99); the value 5000 itself is not included in the first class. Likewise, the second
class covers all values from 5000 up to but not including 10000; the value 10000 is not
included but is transferred to the third class, and so on.
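The rule "lower limit included, upper limit excluded" can be written as a short Python sketch. The function name `exclusive_class` is hypothetical, and the list of limits assumes equal class widths of 5000 from 0 to 25000, which is what the text above implies.

```python
def exclusive_class(value, limits):
    """Return the (lower, upper) class that contains value, or None.

    Exclusive method: lower <= value < upper, so an observation equal to
    an upper limit is counted in the next class.
    """
    for lower, upper in zip(limits, limits[1:]):
        if lower <= value < upper:
            return (lower, upper)
    return None


# Assumed class limits: equal widths of 5000 from 0 to 25000.
limits = [0, 5000, 10000, 15000, 20000, 25000]

print(exclusive_class(4999.99, limits))  # (0, 5000)
print(exclusive_class(5000, limits))     # (5000, 10000)
print(exclusive_class(10000, limits))    # (10000, 15000)
```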
b. Inclusive method:
In this method, the overlapping of the class intervals is avoided. Both the lower and upper
limits are included in the class interval. This type of classification may be used for a
grouped frequency distribution of a discrete variable, such as the number of members in a
family or the number of workers in a factory, where the variable may take only integral
values. It cannot be used for variables that take fractional values, such as age, height or
weight.
Class interval Frequency
5-9 7
10-14 12
15-19 15
20-24 21
25-29 10
30-34 5
Total 70
Thus, to decide whether to use the inclusive method or the exclusive method, it is
important to determine whether the variable under observation is a continuous or a
discrete one. For continuous variables, the exclusive method must be used. The inclusive
method should be used for discrete variables.
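For comparison with the exclusive method, here is a minimal Python sketch of inclusive classification for integer data, using the classes 5-9, 10-14, …, 30-34. The function name `inclusive_class` and its parameters are hypothetical, chosen only for this illustration.

```python
def inclusive_class(value, first_lower=5, width=5, n_classes=6):
    """Return the inclusive (lower, upper) class of an integer value, or None.

    Inclusive method: lower <= value <= upper, with no shared limits
    between neighbouring classes.
    """
    if not isinstance(value, int):
        # A value such as 9.5 lies between 5-9 and 10-14 and fits no class.
        raise TypeError("inclusive classes are meant for integer-valued data")
    index = (value - first_lower) // width
    if 0 <= index < n_classes:
        lower = first_lower + index * width
        return (lower, lower + width - 1)
    return None


print(inclusive_class(9))   # (5, 9)
print(inclusive_class(10))  # (10, 14)
print(inclusive_class(34))  # (30, 34)
```

The `TypeError` branch reflects the point made above: a fractional value can fall in the gap between two inclusive classes, which is why continuous variables need the exclusive method.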
c. Open-end classes:
In an open-end class, a class limit is missing at the lower end of the first class interval,
at the upper end of the last class interval, or at both. Open-end classes are needed in a
number of practical situations, particularly with economic and medical data, when there
are a few very high or very low values that lie far from the majority of observations. An
example of open-end classes is as follows:
Class interval Frequency
Below 2000 7
2000-4000 5
4000-6000 6
6000-8000 4
Total 25
The presentation of data in the form of a frequency distribution describes the basic
pattern which the data assume in the mass. A frequency distribution gives a better picture
of the pattern of the data if the number of items is large. If the identity of the individuals
about whom particular information is taken is not relevant, then the first step of
condensation is to divide the observed range of the variable into a suitable number of
class intervals and to record the number of observations in each class.
Example 1: Given below are the numbers of tools produced by workers in a factory.
43 18 25 18 39 44 19 20 20 26
40 45 38 25 13 14 27 41 42 17
34 31 32 27 33 37 25 26 32 25
33 34 35 46 29 34 31 34 35 24
28 30 41 32 29 28 30 31 30 34
31 35 36 29 26 32 36 35 36 37
32 23 22 29 33 37 33 27 24 36
23 42 29 37 29 23 44 41 45 39
21 21 42 22 28 22 15 16 17 28
22 29 35 31 27 40 23 32 40 37
Using Sturges’ rule, determine the number of class intervals and prepare a frequency
distribution table.
Solution
Number of observations N = 100, so K = 1 + 3.322 log10 100 = 7.644 ≈ 7.6
Largest value = 46, smallest value = 13
C = Range / K = (46 − 13) / 7.6 = 4.34
Thus Sturges’ rule suggests about 8 class intervals, and the size of each class is taken
as 5. Taking the magnitude of the class intervals as 5, the data are in fact covered by
7 classes of the inclusive type: 13-17, 18-22, … 43-47. Using tally marks, the required
frequency distribution is obtained in the following table.
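The tally for this example can be reproduced with a short Python sketch. It is an illustration under stated assumptions: the variable names are made up, the class size is rounded up to the next whole number, and K is left unrounded (7.644), which gives C of about 4.32 where the worked solution, using 7.6, gives 4.34. Both round up to 5.

```python
import math

# Tools produced per worker, copied from the 100 values above.
tools = [
    43, 18, 25, 18, 39, 44, 19, 20, 20, 26,
    40, 45, 38, 25, 13, 14, 27, 41, 42, 17,
    34, 31, 32, 27, 33, 37, 25, 26, 32, 25,
    33, 34, 35, 46, 29, 34, 31, 34, 35, 24,
    28, 30, 41, 32, 29, 28, 30, 31, 30, 34,
    31, 35, 36, 29, 26, 32, 36, 35, 36, 37,
    32, 23, 22, 29, 33, 37, 33, 27, 24, 36,
    23, 42, 29, 37, 29, 23, 44, 41, 45, 39,
    21, 21, 42, 22, 28, 22, 15, 16, 17, 28,
    22, 29, 35, 31, 27, 40, 23, 32, 40, 37,
]

n = len(tools)
data_range = max(tools) - min(tools)   # largest value minus smallest value
k = 1 + 3.322 * math.log10(n)          # Sturges' rule, unrounded
size = math.ceil(data_range / k)       # class size rounded up to a whole number

# Build inclusive classes starting at the smallest observation.
classes = []
lower = min(tools)
while lower <= max(tools):
    classes.append((lower, lower + size - 1))
    lower += size

# Count the observations falling in each inclusive class.
frequencies = [sum(lo <= x <= hi for x in tools) for lo, hi in classes]

print("Class interval  Frequency")
for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi:<12} {f}")
print("Total          ", sum(frequencies))
```

Starting the first class at the smallest observation (13) reproduces the seven classes 13-17 to 43-47 named above.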
Histogram
A frequency distribution can be represented in the form of graphs and charts. A histogram
is also called a block frequency diagram. It shows the pattern of the distribution of the
data, whether symmetrical or skewed. A histogram represents a continuous distribution;
therefore, if the class intervals are discrete, we need to adjust them to continuous ones
before the histogram is drawn, by subtracting 0.5 from the lower class limits and adding
0.5 to the upper class limits. The histogram is constructed by placing the class boundaries
on the horizontal (X) axis and the frequency on the vertical (Y) axis.
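The adjustment from class limits to class boundaries is a one-line calculation. The Python sketch below is illustrative only; the function name `class_boundaries` and the `gap` parameter (the distance between one class's upper limit and the next class's lower limit, 1 for whole-number data) are assumptions made for this example.

```python
def class_boundaries(lower_limit, upper_limit, gap=1):
    """Convert inclusive class limits to continuous class boundaries.

    Half of the gap between neighbouring classes is subtracted from the
    lower limit and added to the upper limit (0.5 when the gap is 1).
    """
    half = gap / 2
    return (lower_limit - half, upper_limit + half)


print(class_boundaries(126, 130))  # (125.5, 130.5)
print(class_boundaries(136, 140))  # (135.5, 140.5)
```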
Example 2: The scores of thirty students in a Statistics examination were given as follows
136-140 135.5 -140.5 140.5 5
Total 30
[Figure: histogram of the scores, with frequency on the vertical axis and upper class boundaries (125.5 to 160.5) on the horizontal axis]
Frequency Polygon
A frequency polygon is obtained by plotting the midpoint of each class against the
corresponding frequency of that class. It can also be obtained by joining the midpoints of
the tops of the rectangles of the histogram and extending the lines to meet the X-axis. A
polygon thus drawn will have the same area as the corresponding histogram if the class
intervals are the same.
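The points of a frequency polygon can be listed with a short Python sketch. It is an illustration only: the function name `frequency_polygon_points` is made up, equal class widths are assumed, and the data reused here are the inclusive classes 5-9 to 30-34 and their frequencies from the table given earlier in these notes.

```python
def frequency_polygon_points(classes, frequencies):
    """Return (mid-value, frequency) points, closed on the X-axis at both ends."""
    midpoints = [(lower + upper) / 2 for lower, upper in classes]
    step = midpoints[1] - midpoints[0]        # assumes equal class widths
    points = [(midpoints[0] - step, 0)]       # extend left to meet the X-axis
    points += list(zip(midpoints, frequencies))
    points.append((midpoints[-1] + step, 0))  # extend right to meet the X-axis
    return points


classes = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29), (30, 34)]
frequencies = [7, 12, 15, 21, 10, 5]

for mid_value, frequency in frequency_polygon_points(classes, frequencies):
    print(mid_value, frequency)
```

Joining these points in order with straight lines gives the polygon; the two zero-frequency points at either end are where the lines meet the X-axis.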
Using the data in Example 2, plot the frequency polygon of the distribution.
126-130 125.5 -130.5 130.5 4 128.00
Total 30
[Figure: histogram with the frequency polygon superimposed; frequency on the vertical axis, class boundaries (125.5 to 160.5) on the horizontal axis]
[Figure: frequency polygon; frequency on the vertical axis, mid-values on the horizontal axis]