Lec 01 - Frequency Distribution - Stat - 1
Frequency: Given a collection of data values, the specification of all the distinct values
together with the number of times each of these values occurs in the collection is called a
frequency distribution. The number of times a value occurs is called its frequency. The
frequency of a data value x is denoted by f_x or simply f.
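As a minimal sketch in Python, assuming the data are held in a plain list, a frequency distribution can be built by counting each distinct value. The helper name `frequency_distribution` and the sample list are hypothetical and used only for illustration.

```python
from collections import Counter


def frequency_distribution(values):
    """Return (x, f_x) pairs for every distinct value x, in ascending order of x."""
    counts = Counter(values)        # f_x: how many times each distinct x occurs
    return sorted(counts.items())   # list the distinct values in ascending order


# Hypothetical sample used only for illustration.
sample = [3, 1, 2, 3, 3, 1]
for x, f in frequency_distribution(sample):
    print(x, f)
```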
A frequency distribution, or frequency table, is one of the most important forms of tabulation. A
table prepared according to the levels of a quantitative variable is known as a frequency
table. Thus a frequency table is simply a classification by a variable.
(i) Each frequency table should begin with a short, concise and self-explanatory title.
(ii) A frequency table has rows and columns. The rows present the number of
observations against individual values of the variable or against intervals of
values of the variable. The first column, headed "Class interval", lists the
classes, one per row. The second column, headed "Tally marks", is used to record
the observations belonging to each class. The third column, headed "Frequency",
gives the number of observations in each class. The fourth column, headed
"Cumulative frequency", gives either the number of observations below the upper
limit of a class or the number of observations at or above the lower limit of a
class, depending on the direction of cumulating. Cumulating is done either from
the top of the table or from the bottom. Sometimes a fifth column is used to
present the mid-value of each class.
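The tally-mark column can be imitated in plain text. The sketch below assumes the common convention of bundling strokes in groups of five and writes the fifth (crossing) stroke as `/`. The function name `tally` and the sample row are hypothetical.

```python
def tally(frequency):
    """Render a class frequency as tally marks grouped in fives.

    Each complete group is written as '||||/' (four strokes crossed by a
    fifth); any remaining strokes are written as '|'.
    """
    groups, rest = divmod(frequency, 5)
    marks = ["||||/"] * groups
    if rest:
        marks.append("|" * rest)
    return " ".join(marks)


# One row with "Class interval", "Tally marks" and "Frequency" columns (illustrative values).
print(f"{'5-10':<12}{tally(7):<16}{7}")
```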
(iii) Decision regarding the number of classes and the class interval: Let x_1, x_2, …, x_N be
the N sample observations arranged in ascending order (an array), where x_N is the highest value
and x_1 is the lowest value in the series. Then the range is
R = x_N – x_1.
Let k be the number of classes in a frequency table. This number k should be no
less than 4 and no more than 20. The value of k can also be found from a formula:
k = 1 + 3.3 log N (logarithm to base 10).
This is known as Sturges' rule for the number of classes. Once the value of k is
decided, the interval (width, h) of a class is found by
h = R/k.
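Sturges' rule and the width formula can be sketched as follows. Rounding k to the nearest integer and keeping it between 4 and 20 are assumptions that follow the guidance above; they are not part of the formula itself. The function names are hypothetical.

```python
import math


def sturges_classes(n):
    """Number of classes k = 1 + 3.3 log10(N), rounded to the nearest integer.

    The result is kept between 4 and 20, as recommended in the text.
    """
    k = round(1 + 3.3 * math.log10(n))
    return max(4, min(20, k))


def class_width(data_range, k):
    """Raw class width h = R / k (rounding is left to the caller)."""
    return data_range / k


k = sturges_classes(128)       # 1 + 3.3 * 2.107... = 7.95... -> 8
print(k, class_width(21, k))   # R = 21 -> h = 2.625
```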
For a large data set (large N), it is better to take h as 5 or a multiple of 5. This
is convenient and simplifies computation. The class interval should be
uniform as far as possible.
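For large data sets, the raw width R/k can be moved to a convenient multiple of 5. This sketch rounds up rather than to the nearest multiple, which is an assumption: a width h ≥ R/k makes k classes of width h together at least as wide as the range R. The function name is hypothetical.

```python
import math


def convenient_width(h, step=5):
    """Round a raw class width h up to the next multiple of `step` (default 5)."""
    return step * math.ceil(h / step)


print(convenient_width(12.3))   # -> 15
print(convenient_width(10))     # already a multiple of 5 -> 10
```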
(iv) Class limits: Every class has a lower limit and an upper limit, as in a–(a+h),
(a+h)–(a+2h), …, {a+(k-1)h}–(a+kh). Here a is the lower limit of the first class
and (a+h) is its upper limit; {a+(k-1)h} is the lower limit of the
last class and (a+kh) is its upper limit. In practice, the data set may
be discrete or continuous. Whichever it is, it is better to use continuous classes
such as 5-10, 10-15, 15-20, 20-25, and so on, where each class includes its lower
limit and excludes its upper limit. Excluding the upper limit keeps the real
width of every class uniform. However, for some data sets, discontinuous classes
may be used. In such a case the lower and upper limits of the classes are decided as
follows:
Lower limit of a class = observed lower limit – ½x.
Upper limit of a class = observed upper limit + ½x.
Here, x = observed lower limit of the class minus the upper limit of the
preceding class. Thus, if the classes are 0-4, 5-9, 10-14, 15-19, and so on,
then x = 1 (for example, 5 – 4 = 1 for the first two classes), and the limits of
the classes become (0 – 0.5) to (4 + 0.5), (5 – 0.5) to (9 + 0.5), (10 – 0.5) to
(14 + 0.5), …, that is, –0.5 to 4.5, 4.5 to 9.5, 9.5 to 14.5, and so on.
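The conversion from discontinuous (inclusive) classes to continuous limits can be sketched as below. It assumes at least two classes and the same gap x between every pair of consecutive classes, as in the 0-4, 5-9, … example. The function name `continuous_limits` is hypothetical.

```python
def continuous_limits(classes):
    """Convert inclusive classes [(lower, upper), ...] into continuous limits.

    x = observed lower limit of a class - upper limit of the preceding class;
    each lower limit is reduced by x/2 and each upper limit increased by x/2.
    """
    x = classes[1][0] - classes[0][1]   # gap between consecutive classes
    half = x / 2
    return [(lower - half, upper + half) for lower, upper in classes]


print(continuous_limits([(0, 4), (5, 9), (10, 14), (15, 19)]))
# [(-0.5, 4.5), (4.5, 9.5), (9.5, 14.5), (14.5, 19.5)]
```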
(v) Width of a class: The difference between the upper limit and the lower limit of a class
is known as the width of the class. It is usually denoted by h, where h = (a+h) – a. If
the classes are 0-5, 5-10, …, the width is h = 5 – 0 = 10 – 5 = 5.
(vii) Frequency: The total number of tally marks in a class is the frequency of that class.
Cumulative Frequency
It is also a part of the frequency table. If cumulating is done from the top of the table, it
gives the number of observations less than the upper limit of a class. If cumulating is
done from the bottom of the table, it gives the number of observations at or above the
lower limit of a class.
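Both directions of cumulating can be sketched with `itertools.accumulate` from the Python standard library. The class frequencies in the example are hypothetical, as are the function names.

```python
from itertools import accumulate


def less_than_cumulative(freqs):
    """Cumulate from the top: observations below the upper limit of each class."""
    return list(accumulate(freqs))


def more_than_cumulative(freqs):
    """Cumulate from the bottom: observations at or above the lower limit of each class."""
    return list(accumulate(reversed(freqs)))[::-1]


freqs = [2, 5, 3]                     # hypothetical frequencies of three classes
print(less_than_cumulative(freqs))    # [2, 7, 10]
print(more_than_cumulative(freqs))    # [10, 8, 3]
```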
Let X be the mid-value of a class, where X = ½ [lower limit of the class + upper limit of the
class].
The mid-values of a frequency table are used for further statistical analysis, such as
calculating the mean, variance, mean deviation and other characteristics of the
distribution.
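As one example of such further analysis, the mean of grouped data is commonly approximated as Σ f X / Σ f, treating every observation in a class as if it were at the mid-value X. Here is a minimal sketch with hypothetical classes and frequencies; the function names are also hypothetical.

```python
def mid_values(classes):
    """Mid-value X = (lower limit + upper limit) / 2 for each class."""
    return [(lower + upper) / 2 for lower, upper in classes]


def grouped_mean(classes, freqs):
    """Approximate mean of grouped data: sum(f * X) / sum(f)."""
    xs = mid_values(classes)
    return sum(f * x for f, x in zip(freqs, xs)) / sum(freqs)


classes = [(0, 5), (5, 10), (10, 15)]   # hypothetical classes
freqs = [2, 5, 3]                        # hypothetical frequencies
print(mid_values(classes))               # [2.5, 7.5, 12.5]
print(grouped_mean(classes, freqs))      # 8.0
```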
Example. The following data represent the number of workers in different small-scale
industries in the country:
16, 25, 28, 32, 26, 25, 25, 20, 20, 22, 24, 26, 28, 30, 35, 32, 17, 20, 22, 22, 24, 25, 26, 28,
20, 18, 26, 28, 30, 30, 32, 34, 31, 36, 30, 35, 28, 27, 21, 24, 20, 18, 15, 15, 15, 18, 20, 22,
36, 26, 21, 23, 24, 26, 28, 30, 32, 34, 28, 27, 15, 20, 19, 26, 16, 24, 20, 18, 20, 20, 24, 27,
25, 25, 25, 26, 20, 21, 20, 28, 17, 30, 32, 33, 30, 28, 26, 24, 26, 24, 20, 18, 19, 18, 15, 16,
23, 18, 15, 17, 18, 20, 20, 18, 18, 19, 20, 21, 27, 25, 26, 19, 29, 20, 24, 26, 27, 29, 30, 32,
34, 28, 30, 27, 26, 28, 28, 33
(i) Prepare a frequency table of discontinuous type (ungrouped).
(ii) Find the number of industries having 25 or more workers.
(iii) Find the number of industries having 20 or fewer workers.
(iv) Prepare a grouped frequency table.
(ii) There are 66 industries having 25 or more workers (last column of the table).
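Parts (i) to (iii) can be checked with a short script. It builds the ungrouped (discontinuous) frequency table of the 128 values listed in the example, with "or fewer" and "or more" cumulative columns. It then counts the industries with 25 or more workers and with 20 or fewer workers. This is a sketch, and the column layout is an assumption rather than a reproduction of the original table.

```python
from collections import Counter
from itertools import accumulate

# Number of workers in each of the 128 small-scale industries (from the example).
WORKERS = [
    16, 25, 28, 32, 26, 25, 25, 20, 20, 22, 24, 26, 28, 30, 35, 32, 17, 20, 22, 22, 24, 25, 26, 28,
    20, 18, 26, 28, 30, 30, 32, 34, 31, 36, 30, 35, 28, 27, 21, 24, 20, 18, 15, 15, 15, 18, 20, 22,
    36, 26, 21, 23, 24, 26, 28, 30, 32, 34, 28, 27, 15, 20, 19, 26, 16, 24, 20, 18, 20, 20, 24, 27,
    25, 25, 25, 26, 20, 21, 20, 28, 17, 30, 32, 33, 30, 28, 26, 24, 26, 24, 20, 18, 19, 18, 15, 16,
    23, 18, 15, 17, 18, 20, 20, 18, 18, 19, 20, 21, 27, 25, 26, 19, 29, 20, 24, 26, 27, 29, 30, 32,
    34, 28, 30, 27, 26, 28, 28, 33,
]

freq = Counter(WORKERS)
values = sorted(freq)                               # distinct values, smallest to largest
counts = [freq[v] for v in values]                  # frequency of each distinct value
or_fewer = list(accumulate(counts))                 # cumulated from the top: <= value
or_more = list(accumulate(reversed(counts)))[::-1]  # cumulated from the bottom: >= value

print(f"{'Workers':>7} {'f':>3} {'<= x':>5} {'>= x':>5}")
for row in zip(values, counts, or_fewer, or_more):
    print("{:>7} {:>3} {:>5} {:>5}".format(*row))

at_least_25 = sum(n for v, n in freq.items() if v >= 25)
at_most_20 = sum(n for v, n in freq.items() if v <= 20)
print("Industries with 25 or more workers:", at_least_25)
print("Industries with 20 or fewer workers:", at_most_20)
```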
(iv) We have N = 128, x_1 = 15 and x_N = 36, so the range is R = x_N – x_1 = 36 – 15 = 21.
k = 1 + 3.3 log N = 1 + 3.3 log 128 ≈ 1 + 6.95 = 7.95 ≈ 8
Class interval h = R/k = 21/8 = 2.625 ≈ 3
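The grouped table for part (iv) can be sketched in the same way. Two assumptions apply here. First, 15 (the smallest observation) is taken as the lower limit of the first class. Second, the classes follow the continuous convention described earlier, with the lower limit included and the upper limit excluded. The width is rounded up with `math.ceil`, which gives the same 3 as in the text. The function name `grouped_frequencies` is hypothetical.

```python
import math
from itertools import accumulate

# Number of workers in each of the 128 small-scale industries (from the example).
WORKERS = [
    16, 25, 28, 32, 26, 25, 25, 20, 20, 22, 24, 26, 28, 30, 35, 32, 17, 20, 22, 22, 24, 25, 26, 28,
    20, 18, 26, 28, 30, 30, 32, 34, 31, 36, 30, 35, 28, 27, 21, 24, 20, 18, 15, 15, 15, 18, 20, 22,
    36, 26, 21, 23, 24, 26, 28, 30, 32, 34, 28, 27, 15, 20, 19, 26, 16, 24, 20, 18, 20, 20, 24, 27,
    25, 25, 25, 26, 20, 21, 20, 28, 17, 30, 32, 33, 30, 28, 26, 24, 26, 24, 20, 18, 19, 18, 15, 16,
    23, 18, 15, 17, 18, 20, 20, 18, 18, 19, 20, 21, 27, 25, 26, 19, 29, 20, 24, 26, 27, 29, 30, 32,
    34, 28, 30, 27, 26, 28, 28, 33,
]


def grouped_frequencies(data, start, width, k):
    """Frequencies of k classes [start + i*width, start + (i+1)*width).

    The lower limit is included and the upper limit excluded; values that
    fall outside all k classes are not counted.
    """
    freqs = [0] * k
    for value in data:
        i = (value - start) // width
        if 0 <= i < k:
            freqs[i] += 1
    return freqs


n = len(WORKERS)                            # N = 128
r = max(WORKERS) - min(WORKERS)             # R = 36 - 15 = 21
k = round(1 + 3.3 * math.log10(n))          # Sturges' rule: 7.95... -> 8
h = math.ceil(r / k)                        # 21 / 8 = 2.625 -> 3
start = min(WORKERS)                        # assumed lower limit of the first class

freqs = grouped_frequencies(WORKERS, start, h, k)
cumulative = list(accumulate(freqs))        # "less than the upper limit" column
for i, (f_i, c_i) in enumerate(zip(freqs, cumulative)):
    lower = start + i * h
    print(f"{lower}-{lower + h}: frequency {f_i}, cumulative {c_i}")
```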
It is observed that there are 43 industries in which the number of workers is less than 21, and
there are 25 industries having 30 or more workers.
Additional questions:
(i) Find the percentage of industries which have 30 or more workers.
(ii) Find the percentage of industries which have 20 or fewer workers.
Answer: From the above table, (i) 25 × 100/128 = 19.53% of the industries have 30 or more
workers.
(ii) 43 × 100/128 = 33.59% of the industries have 20 or fewer workers.
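The percentages are simple proportions of N = 128. The counts used here come from the listed data: 25 industries with 30 or more workers and 43 with 20 or fewer. The function name `percentage` is hypothetical.

```python
def percentage(count, total):
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    return round(100 * count / total, 2)


print(percentage(25, 128))   # 30 or more workers -> 19.53
print(percentage(43, 128))   # 20 or fewer workers -> 33.59
```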
Sales managers, stock analysts, hospital administrators and other busy executives often
need a quick picture of trends in sales, stock prices or hospital costs. These trends can
often be depicted with charts and graphs. Three charts that help portray a
frequency distribution graphically are (i) the histogram, (ii) the frequency polygon and
(iii) the cumulative frequency polygon, or ogive.
Histogram