Data An Overview Lecture 5
Frequency Distribution
“Statistical data are always numerical.”
True or false?
Consider the following examples.
• When a class of 12 students was asked their favourite colour, the
following answers were received:
Blue, Blue, Green, Blue, Red, Purple, Cyan, Blue, Green, Orange,
Red, Blue
• When the same group was asked their grade in the last exam, the
following responses were received:
A, A, A, B, A, B, A, A, A, C, A, B
• Don’t we consider these responses to be data?
Attributes
• An attribute is a qualitative character that cannot be numerically
expressed. Example: Colour, Religion, Economic Status, Educational
Qualification, Mother Tongue.
• Data on attributes may be of two types.
• If there is an inherent ordering of the forms or categories, then
that particular data type is known as ordinal data. Example: Economic
Status, Educational Qualification, Grade.
• However, if there is no inherent ordering of the categories, then that
data type is known as nominal data. Example: Colour, Religion,
Gender, Mother Tongue.
Frequency Distribution of Attributes
The term frequency refers to the number of cars of each colour. For example, the frequency for red is 6, and the
total frequency is 20. The distribution of the total frequency over all the categories is known as the frequency
distribution. Table 1 is a frequency table that describes the frequency distribution of the car colours. The relative
frequency of a category is its frequency divided by the total frequency; for red it is 6/20 = 0.3.
Note: The final frequency table should not contain any tally marks.
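A frequency table for an attribute can be sketched with the standard library's `collections.Counter`. This is an illustration, not the lecture's own method, and it uses the favourite-colour and exam-grade responses of the 12 students from the opening example (the car-colour data behind Table 1 is not reproduced in this text). The helper name `frequency_table` is hypothetical. For the nominal colours the rows are listed by decreasing frequency; for the ordinal grades they follow the natural order A, B, C.

```python
from collections import Counter

def frequency_table(responses, order=None):
    """Return (category, frequency, relative frequency) rows.

    `order` fixes the row order for ordinal data (e.g. grades A, B, C);
    without it, rows are listed by decreasing frequency (nominal data).
    """
    counts = Counter(responses)
    total = sum(counts.values())
    categories = order if order is not None else [c for c, _ in counts.most_common()]
    # Relative frequency = frequency / total frequency.
    return [(c, counts[c], counts[c] / total) for c in categories]

colours = ["Blue", "Blue", "Green", "Blue", "Red", "Purple",
           "Cyan", "Blue", "Green", "Orange", "Red", "Blue"]
grades = ["A", "A", "A", "B", "A", "B", "A", "A", "A", "C", "A", "B"]

# Nominal attribute: no inherent order among the colours.
for category, freq, rel in frequency_table(colours):
    print(f"{category:<8}{freq:>3}{rel:>8.3f}")

# Ordinal attribute: list the grades in their natural order.
for category, freq, rel in frequency_table(grades, order=["A", "B", "C"]):
    print(f"{category:<8}{freq:>3}{rel:>8.3f}")
```

Blue has frequency 5 out of 12 (relative frequency about 0.417), and the grade counts in the order A, B, C are 8, 3 and 1. As the note above says, the final table shows only counts, never tally marks.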
Graphical Representation of Attributes
• The frequency distribution of an attribute, when expressed in terms
of absolute frequencies, can be represented by a horizontal bar
diagram.
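[Figure: Horizontal bar diagram, “Frequency Distribution of Car Colours”; categories Red, Blue, Pink, Green, Yellow and Magenta on the vertical axis, frequency (0 to 9) on the horizontal axis.]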
• A pie diagram or a stacked bar diagram can be used to show the
frequency distribution in terms of relative frequencies:
[Figure: Pie diagram showing the frequency distribution of car colours.]
[Figure: Stacked bar diagram showing the frequency distribution of car colours; vertical axis: Frequency, 0% to 100%.]
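In a pie diagram each category gets a sector whose angle is its relative frequency multiplied by 360°; in a stacked bar diagram each category gets a segment whose length is its relative frequency as a percentage of the whole bar. The sketch below computes both from a frequency table. It uses the students' favourite-colour frequencies, since the car-colour frequencies are not reproduced in this text, and the function names are hypothetical.

```python
def pie_angles(freqs):
    """Sector angle in degrees for each category: relative frequency x 360."""
    total = sum(freqs.values())
    return {cat: f * 360 / total for cat, f in freqs.items()}

def stacked_bar_segments(freqs):
    """(start %, end %) of each category's segment in a single 100% bar."""
    total = sum(freqs.values())
    segments, start = {}, 0.0
    for cat, f in freqs.items():
        end = start + f * 100 / total  # segment length = relative frequency in %
        segments[cat] = (start, end)
        start = end
    return segments

colour_freqs = {"Blue": 5, "Green": 2, "Red": 2, "Purple": 1, "Cyan": 1, "Orange": 1}

for cat, angle in pie_angles(colour_freqs).items():
    print(f"{cat:<8}{angle:6.1f} degrees")
for cat, (lo, hi) in stacked_bar_segments(colour_freqs).items():
    print(f"{cat:<8}{lo:6.1f}% to {hi:6.1f}%")
```

The sector angles always add up to 360° and the segments always end at 100%, because the relative frequencies add up to 1.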
• Variables can be classified into two main types, namely 1. discrete and 2. continuous.
• Discrete Variable: A quantitative character that can take only certain isolated values within its
range of variation is called a discrete variable. E.g. the number of students in different colleges, the
size of families in a locality.
• Continuous Variable: A quantitative character that can assume any value within its range of
variation is termed a continuous variable. E.g. the weight of individuals, marks
obtained by candidates in an exam, income of different persons.
Frequency Distribution of a Variable
The same frequency distribution may also be represented with relative frequencies. For variables, another important
feature is the cumulative frequency. We may ask how many families have more than, or fewer than, a certain number
of members; cumulative frequency tables help us answer such questions.
Frequency Distribution of a Discrete Variable:
Table 3. More-than and less-than type cumulative frequency table of house sizes in that particular locality.
To get the less-than type cumulative frequency, we start with the smallest value assumed by the variable and
consecutively add the frequencies as the values increase. The more-than type cumulative frequency starts at
the largest value assumed by the variable, and we consecutively add the frequencies as the values decrease.
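The two procedures just described are running totals taken in opposite directions. Here is a minimal sketch with `itertools.accumulate`, using the family-size frequencies of Table 2 (sizes 2 to 7 with frequencies 9, 20, 30, 17, 10, 4). For a discrete variable the less-than type figure at a value x counts the families of size at most x, and the more-than type figure counts those of size at least x.

```python
from itertools import accumulate

sizes = [2, 3, 4, 5, 6, 7]
freqs = [9, 20, 30, 17, 10, 4]

# Less-than type: add frequencies from the smallest value upwards.
less_than = list(accumulate(freqs))
# More-than type: add frequencies from the largest value downwards,
# then reverse so the list lines up with `sizes` again.
more_than = list(accumulate(reversed(freqs)))[::-1]

print("Size   f  Less-than  More-than")
for x, f, lt, mt in zip(sizes, freqs, less_than, more_than):
    print(f"{x:>4} {f:>3} {lt:>10} {mt:>10}")
```

The last less-than value and the first more-than value both equal the total frequency, 90, which is a quick check on the arithmetic.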
Graphical Representation of the frequency distribution of a Discrete Variable:
Consider the data on family size, with the frequency table given in Table 2:
[Figure: Frequency polygon of family sizes; horizontal axis: Family Size (0 to 9), vertical axis: Frequency (0 to 30).]
[Figure: Column diagram of family sizes; horizontal axis: Family Size (0 to 8), vertical axis: Frequency (0 to 30).]
Column Diagram
1. A frequency distribution of a discrete variable can be represented
graphically using two perpendicular axes: the horizontal axis for
variable values and the vertical axis for frequencies.
2. Proper scales must be chosen for both axes to ensure accurate
representation.
3. Perpendicular columns are drawn at each variable value on the
horizontal axis, with heights corresponding to their frequencies.
4. This graphical representation is called a column diagram or frequency
bar diagram; relative frequencies may be used in place of absolute
frequencies.
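Point 4 can be illustrated with a small text-mode column diagram. This is only a sketch (a plotting library would normally be used): each column stands at a family size from Table 2, and its height is the relative frequency measured in units of 0.05, rounded to the nearest row. The unit and the function name `column_diagram` are arbitrary choices for illustration.

```python
def column_diagram(values, freqs, unit=0.05):
    """Return (heights, rows) of a text column diagram using relative frequencies.

    Each column stands at a variable value; its height in rows is the
    relative frequency divided by `unit`, rounded to the nearest row.
    """
    total = sum(freqs)
    heights = [round(f / total / unit) for f in freqs]
    rows = []
    # Draw from the top row down; a column shows '#' up to its height.
    for level in range(max(heights), 0, -1):
        rows.append("".join(" # " if h >= level else "   " for h in heights))
    rows.append("".join(f"{v:^3}" for v in values))  # horizontal axis labels
    return heights, rows

heights, rows = column_diagram([2, 3, 4, 5, 6, 7], [9, 20, 30, 17, 10, 4])
print("\n".join(rows))
```

Because every frequency is divided by the same total, switching from absolute to relative frequencies changes only the vertical scale, not the shape of the diagram.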
Graphical Representation of the cumulative frequency distribution of a Discrete Variable:
• Consider the data on family size in Table 2, together with the less-than and more-than type
cumulative frequencies:
Family Size   Frequency   Less-than Type Cumul. Freq.   More-than Type Cumul. Freq.
2             9           9                             90
3             20          29                            81
4             30          59                            61
5             17          76                            31
6             10          86                            14
7             4           90                            4
Total         90
[Figure: Step diagram showing the cumulative frequencies of the family sizes; horizontal axis: family size (0 to 9), vertical axis: cumulative frequency (0 to 100); series: Less Than Type, Greater Than Type.]
Step Diagrams
Step diagrams are graphical representations used to display cumulative frequency
distributions. They consist of horizontal and vertical segments, resembling a staircase.
Key Features:
• Cumulative Representation: Used for "less than" and "greater than" cumulative
frequencies.
• Axis Representation: The variable values are plotted on the horizontal axis, while
cumulative frequencies are on the vertical axis.
• Staircase Shape:
• The "less than" type diagram ascends from left to right.
• The "greater than" type diagram ascends from right to left.
• Use: Step diagrams show how the data accumulate over the range of the variable, so the number of
observations at or below (or at or above) any given value can be read off directly.
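The staircase shape arises because the cumulative frequency of a discrete variable changes only at the values the variable actually takes and stays constant in between. The sketch below evaluates both step functions at any real x for the family-size data of Table 2, using the standard `bisect` module. The helper names `less_than_step` and `more_than_step` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

sizes = [2, 3, 4, 5, 6, 7]           # sorted variable values (Table 2)
freqs = [9, 20, 30, 17, 10, 4]
cum = [0] + list(accumulate(freqs))  # cum[i] = total frequency of sizes[:i]
total = cum[-1]

def less_than_step(x):
    """Number of families with size <= x; rises from left to right."""
    return cum[bisect_right(sizes, x)]

def more_than_step(x):
    """Number of families with size >= x; rises from right to left."""
    return total - cum[bisect_left(sizes, x)]

for x in [1, 2, 3.5, 4, 4.5, 7, 8]:
    print(x, less_than_step(x), more_than_step(x))
```

Between two consecutive family sizes (for example at 4 and at 4.5) both functions return the same value, which is the horizontal tread of the step; each vertical riser sits at a value the variable actually takes.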
Frequency Distribution of Continuous Variable
• Suppose the following data are the marks obtained in a mathematics test by 25 students
of a college.
57 54 95 67 65
38 64 75 69 74
85 77 60 72 63
60 36 57 70 87
70 71 55 67 44
Tally Marks:
Table 4: Tally Marks for the Data on Marks
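If $d$ is the gap between the upper limit of a class and the lower limit of the next class, then
$\text{Lower boundary} = \text{lower limit} - \dfrac{d}{2}$
$\text{Upper boundary} = \text{upper limit} + \dfrac{d}{2}$
In general, if $a$ is the lower limit of the first class and $c$ is the common class width, the $k$-th class has class boundaries $a + (k-1)c - \dfrac{d}{2}$ to $a + kc - \dfrac{d}{2}$ and frequency $f_k$.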
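The boundary formulas can be applied to the 25 marks above. This sketch assumes the class intervals 31-40 to 91-100 that label the histogram's horizontal axis, so $a = 31$, $c = 10$ and $d = 1$ (marks are whole numbers). The function name `class_table` is hypothetical, and the counting step plays the role of the tally marks of Table 4.

```python
marks = [57, 54, 95, 67, 65,
         38, 64, 75, 69, 74,
         85, 77, 60, 72, 63,
         60, 36, 57, 70, 87,
         70, 71, 55, 67, 44]

def class_table(data, a, c, d, k_max):
    """Rows (lower limit, upper limit, lower boundary, upper boundary, f_k).

    Class k (k = 1..k_max) has limits a + (k-1)c to a + kc - d and
    boundaries a + (k-1)c - d/2 to a + kc - d/2.
    """
    rows = []
    for k in range(1, k_max + 1):
        lower_limit = a + (k - 1) * c
        upper_limit = a + k * c - d
        lower_boundary = lower_limit - d / 2
        upper_boundary = upper_limit + d / 2
        # A value belongs to class k if it lies in [lower boundary, upper boundary).
        f_k = sum(lower_boundary <= x < upper_boundary for x in data)
        rows.append((lower_limit, upper_limit, lower_boundary, upper_boundary, f_k))
    return rows

table = class_table(marks, a=31, c=10, d=1, k_max=7)
for row in table:
    print("{}-{}  {}-{}  {}".format(*row))
```

Under these assumptions the class frequencies come to 2, 1, 6, 8, 5, 2 and 1, which add up to the 25 students.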
Relative and Cumulative Frequencies
• A frequency distribution can also be expressed in terms of:
• Relative frequencies (proportions)
• Cumulative frequencies
• Cumulative frequencies are calculated by successively adding class frequencies.
• The addition starts:
• From the top (lowest class) → for less-than type cumulative frequencies
• From the bottom (highest class) → for more-than type cumulative frequencies
• Less-than cumulative frequency of a class shows:
• The number of values less than the upper boundary of that class
• More-than cumulative frequency of a class shows:
• The number of values greater than or equal to the lower boundary of that class
• Which points the cumulative frequencies correspond to:
• Less-than type → corresponds to upper boundaries
• More-than type → corresponds to lower boundaries
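The pairing rules can be sketched for the marks data, taking the classes 31-40 to 91-100 (boundaries 30.5 to 100.5) and the class frequencies 2, 1, 6, 8, 5, 2, 1 counted from the 25 marks. Each less-than figure is attached to an upper boundary and each more-than figure to a lower boundary.

```python
from itertools import accumulate

lower_boundaries = [30.5, 40.5, 50.5, 60.5, 70.5, 80.5, 90.5]
upper_boundaries = [b + 10 for b in lower_boundaries]  # class width 10
freqs = [2, 1, 6, 8, 5, 2, 1]

# Less-than type: add from the top (lowest class) down the table.
less_than = list(accumulate(freqs))
# More-than type: add from the bottom (highest class) up the table.
more_than = list(accumulate(reversed(freqs)))[::-1]

for ub, lt in zip(upper_boundaries, less_than):
    print(f"less than {ub}: {lt}")
for lb, mt in zip(lower_boundaries, more_than):
    print(f"{lb} or more: {mt}")
```

For instance, 9 students scored less than 60.5 and 16 scored 60.5 or more; the two figures for a shared boundary always add up to the total of 25.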
Graphical Representation of Frequency Distribution
[Figure: Frequency polygon of the marks; horizontal axis: Marks (0 to 120), vertical axis: frequency (0 to 5).]
[Figure: Histogram of the marks; horizontal axis: Marks, with classes 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100; vertical scale 0 to 0.7.]
Histogram
1. A histogram represents the frequency distribution of a
continuous variable by showing how the frequency of each
class is spread over its class interval.
2. The horizontal axis represents class boundaries, and
rectangles are drawn over each class interval.
3. The area of each rectangle represents the class frequency,
so its height is the frequency density (class frequency
divided by class width).
4. The diagram consists of adjoining rectangles, and class
widths may vary.
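Point 3 can be made concrete: height = frequency density = class frequency / class width, so area = height × width = class frequency. The sketch below (function name `histogram_heights` hypothetical) computes rectangle heights from class boundaries. To show why density matters when widths vary, the second call uses an illustrative regrouping of the same 25 marks into unequal classes; that regrouping is not taken from the lecture.

```python
def histogram_heights(boundaries, freqs):
    """Heights (frequency densities) of adjoining histogram rectangles.

    `boundaries` has one more entry than `freqs`; rectangle i spans
    boundaries[i] to boundaries[i + 1].
    """
    if len(boundaries) != len(freqs) + 1:
        raise ValueError("need exactly one more boundary than frequencies")
    heights = []
    for i, f in enumerate(freqs):
        width = boundaries[i + 1] - boundaries[i]
        heights.append(f / width)  # so that area = height * width = f
    return heights

# Equal widths: classes 31-40 to 91-100 of the marks.
equal_bounds = [30.5, 40.5, 50.5, 60.5, 70.5, 80.5, 90.5, 100.5]
equal = histogram_heights(equal_bounds, [2, 1, 6, 8, 5, 2, 1])

# Unequal widths: 31-50, 51-60, 61-70, 71-80, 81-100 (illustrative regrouping).
unequal_bounds = [30.5, 50.5, 60.5, 70.5, 80.5, 100.5]
unequal = histogram_heights(unequal_bounds, [3, 6, 8, 5, 3])

print(equal)
print(unequal)
```

If the raw frequency 3 were used as the height of the 20-mark-wide classes, those rectangles would have twice the area they should; dividing by the width keeps every area equal to its class frequency.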
[Figure: Ogives corresponding to less-than and more-than type cumulative frequencies; horizontal axis: Marks (0 to 120), vertical axis: Cumulative Frequencies (0 to 30); series: Less Than Type, More Than Type.]
Ogives
An ogive exhibits the frequency distribution of a continuous variable using
cumulative frequencies.
• Axes Setup:
• Horizontal axis: Represents the variable values.
• Vertical axis: Represents the cumulative frequencies.
• Less-than Type Ogive:
• Plot the less-than cumulative frequencies against the upper class boundaries.
• Join the points by line segments to form the ogive.
• The less-than cumulative frequency is zero at the lower boundary of the lowest class,
and this point is also included in the diagram.
• More-than Type Ogive:
• Plot the more-than cumulative frequencies against the lower class boundaries.
• The construction is otherwise similar to that of the less-than ogive.
• The more-than cumulative frequency is zero at the upper boundary of the highest class,
and this point is also included.
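The construction rules above translate into two lists of (x, y) points to be joined by line segments. This sketch uses the marks data, assuming the class boundaries 30.5 to 100.5 and the class frequencies 2, 1, 6, 8, 5, 2, 1 counted from the 25 marks. The function name `ogive_points` is hypothetical, and any plotting tool could be used to join the points.

```python
from itertools import accumulate

def ogive_points(boundaries, freqs):
    """Return (less_than_points, more_than_points) for the two ogives.

    `boundaries` lists all class boundaries in increasing order, one more
    than the number of classes.
    """
    total = sum(freqs)
    # Less-than: start at (lowest lower boundary, 0), then plot each
    # cumulative frequency against its class's upper boundary.
    less_than = [(boundaries[0], 0)] + list(zip(boundaries[1:], accumulate(freqs)))
    # More-than: plot each cumulative frequency against its class's lower
    # boundary, ending at (highest upper boundary, 0).
    more_cum = [total - c for c in [0] + list(accumulate(freqs))[:-1]]
    more_than = list(zip(boundaries[:-1], more_cum)) + [(boundaries[-1], 0)]
    return less_than, more_than

bounds = [30.5, 40.5, 50.5, 60.5, 70.5, 80.5, 90.5, 100.5]
lt, mt = ogive_points(bounds, [2, 1, 6, 8, 5, 2, 1])
print(lt)
print(mt)
```

The less-than ogive rises from (30.5, 0) to (100.5, 25), and the more-than ogive falls from (30.5, 25) to (100.5, 0), matching the zero-frequency end points described above.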