
Data An Overview Lecture 5

The document provides an overview of frequency distribution, distinguishing between attributes and variables, and explaining their types (nominal, ordinal, discrete, and continuous). It includes methods for calculating and representing frequency distributions through tables, graphs, and diagrams, emphasizing the importance of cumulative frequencies. Additionally, it outlines key definitions related to class intervals and frequency densities for continuous variables.
Data: an overview

Lecture 5

Frequency Distribution
“Statistical data are always numerical.”

-True or False?
Consider the following examples.
• When a class of 12 students was asked for their favourite colour, the
following answers were received:
Blue, Blue, Green, Blue, Red, Purple, Cyan, Blue, Green, Orange,
Red, Blue

• When the same group was asked what was their grade in the last
exam, the following responses were received:
A, A, A, B, A, B, A, A, A, C, A, B
• Don’t we consider these responses to be data?
Attributes
• An attribute is a qualitative character that cannot be expressed
numerically. Example: Colour, Religion, Economic Status, Educational
Qualification, Mother Tongue.
• Data on attributes may be of two types.
• If there is an inherent ordering of the forms or categories, then
the data are known as ordinal data. Example: Economic
Status, Educational Qualification, Grade.
• However, if there is no inherent ordering of the categories, then that
data type is known as nominal data. Example: Colour, Religion,
Gender, Mother Tongue.
Frequency Distribution of Attributes

• Suppose that in a survey of cars, the following colours were noted
for 20 cars:
red blue blue pink blue blue red blue blue green
red blue red pink yellow yellow red blue red magenta

• Find the number of cars of each colour.


Frequency Distribution of Attributes
• You can use tally marks to count the numbers.
Table 1. Frequency table of car colours.

Colour     Tally Marks   Frequency   Relative Frequency
Red        |||| |        6           6/20 = 0.30
Blue       |||| |||      8           8/20 = 0.40
Pink       ||            2           2/20 = 0.10
Green      |             1           1/20 = 0.05
Yellow     ||            2           2/20 = 0.10
Magenta    |             1           1/20 = 0.05
Total                    20

The term frequency refers to the number of cars with each colour. For example, the frequency for red is 6. The
total frequency is 20. The distribution of the total frequency over all the categories is known as the frequency
distribution. Table 1 is a frequency table that describes the frequency distribution of the car colours. Relative
frequency refers to the relative share of the frequencies over different categories.
Note: The final frequency table should not contain any tally marks.
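The counting behind Table 1 can be reproduced with a short script. This is only a minimal sketch: the variable names are illustrative, and the colour list is copied from the survey listing above.

```python
from collections import Counter

# The 20 observed car colours, copied from the survey listing.
colours = (
    "red blue blue pink blue blue red blue blue green "
    "red blue red pink yellow yellow red blue red magenta"
).split()

frequency = Counter(colours)             # absolute frequency of each category
total = sum(frequency.values())          # total frequency
relative = {c: f / total for c, f in frequency.items()}

# Print the final frequency table (no tally marks).
for colour, f in frequency.most_common():
    print(f"{colour:<8} {f:>2}  {relative[colour]:.2f}")
```

The relative frequencies always add up to 1, which is a quick check that no observation was missed.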
Graphical Representation of Attributes
• The frequency distribution of an attribute when expressed in terms
of absolute frequencies, can be represented by horizontal bar
diagrams.
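[Figure: Horizontal bar diagram, "Frequency Distribution of Car Colours". Categories on the vertical axis (Magenta, Yellow, Green, Pink, Blue, Red); frequency on the horizontal axis, from 0 to 9.]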
Graphical Representation of Attributes
• Stacked bar diagram and Pie diagram can be used to showcase the
frequency distribution in terms of relative frequencies:

[Figure: Pie diagram showing the frequency distribution of car colours. Legend: Red, Blue, Pink, Green, Yellow, Magenta.]
[Figure: Stacked bar diagram showing the frequency distribution of car colours. Vertical axis: relative frequency, from 0% to 100%. Legend: Red, Blue, Pink, Green, Yellow, Magenta.]
Variables
• The term variable (or variate) refers to a character of an item or an individual that can
be expressed in numerical terms. It is also called a quantitative character, and such characters
can be measured or counted. Weights of students in a school, ages of boys, family size,
etc. are characters of this type.

• Variables can be classified into two main types, namely 1. Discrete and 2. Continuous.

• Discrete Variable: A quantitative character that can take only certain isolated values in its
range of variation is called a discrete variable, e.g. the number of students in different colleges, the
size of families in a locality.

• Continuous Variable: A quantitative character that can assume any value within its range of
variation is termed a continuous variable, e.g. the weight of individuals, marks
obtained by candidates in an exam, income of different persons, etc.
Frequency Distribution of a Variable

• Frequency Distribution of a Discrete Variable.


• Frequency Distribution of a Continuous Variable.
Frequency Distribution of a Discrete Variable:
• A survey was performed in a locality of Calcutta and the following
data relating to the number of members in different families was
recorded.
4 4 3 6 5 4 3 4 4 4
5 6 2 7 6 5 4 3 4 3
4 6 6 5 3 5 3 3 2 4
4 5 4 6 3 5 5 3 2 3
3 5 3 5 4 4 4 5 6 7
4 4 4 6 7 2 4 4 4 3
4 4 3 5 3 3 4 5 3 6
4 6 4 3 2 5 5 3 3 4
4 5 4 2 4 2 7 5 2 2

• Find the frequency distribution of the given dataset.


Frequency Distribution of a Discrete Variable:
We proceed in the same way:
1. First we identify the different values assumed by the discrete variable in its range of
variability.
2. Then we use tally marks to find the frequency with respect to each possible value.

Family Size   Tally Marks                     Frequency
2             |||| ||||                       9
3             |||| |||| |||| ||||             20
4             |||| |||| |||| |||| |||| ||||   30
5             |||| |||| |||| ||               17
6             |||| ||||                       10
7             ||||                            4
Total                                         90
Frequency Distribution of a Discrete Variable:
Table 2. Frequency distribution of family sizes in that particular locality.

Family Size Frequency


2 9
3 20
4 30
5 17
6 10
7 4
Total 90

The same frequency distribution may be represented with relative frequencies. For variables, another important
feature is the cumulative frequencies. We may ask: how many families have more than or less than a certain number
of members. Cumulative frequency tables help us answer the questions.
Frequency Distribution of a Discrete Variable:
Table 3. Less than and more than type cumulative frequency table of family sizes in that particular locality.

Family Size   Frequency   Less than type   More than type
                          Cumul. Freq.     Cumul. Freq.
2             9           9                90
3             20          29               81
4             30          59               61
5             17          76               31
6             10          86               14
7             4           90               4
Total         90

To get the less than type cumulative frequency, we start with the smallest value assumed by the variable and we
consecutively add the frequencies as the values gradually increase. The more than type cumulative frequency starts at
the largest value assumed by the variable and we consecutively add the frequencies as the values gradually decrease.
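The two running totals can be sketched as follows, using the family-size frequencies from Table 2. The variable names are illustrative only.

```python
from itertools import accumulate

# Table 2: family size and its frequency.
sizes = [2, 3, 4, 5, 6, 7]
freq = [9, 20, 30, 17, 10, 4]

# Less than type: running total starting from the smallest value.
less_than = list(accumulate(freq))

# More than type: running total starting from the largest value,
# reversed afterwards so that it lines up with `sizes` again.
more_than = list(accumulate(reversed(freq)))[::-1]

for s, f, lt, mt in zip(sizes, freq, less_than, more_than):
    print(f"{s:>4} {f:>4} {lt:>4} {mt:>4}")
```

The last less than type value and the first more than type value both equal the total frequency, 90.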
Graphical Representation of the frequency
distribution of a Discrete Variable:
Consider the data on the family size with the following frequency table
(Table 2.):

Family Size Frequency


2 9
3 20
4 30
5 17
6 10
7 4
Total 90
[Figure: Frequency polygon showing the frequency distribution of family size. Horizontal axis: Family Size (0 to 9); vertical axis: Frequency (0 to 35).]
Frequency Polygon

1. A frequency polygon is an effective way to represent the frequency
distribution of a discrete variable.
2. Variable values are placed on the horizontal axis, and frequencies are
plotted on the vertical axis.
3. Data points are marked where each variable value meets its
corresponding frequency, with additional zero-frequency points at both
ends.
4. These points are connected by line segments to form the frequency
polygon.
[Figure: Column diagram showing the frequency distribution of family size. Horizontal axis: Family Size (0 to 8); vertical axis: Frequency (0 to 35).]
Column Diagram
1. A frequency distribution of a discrete variable can be represented
graphically using two perpendicular axes: the horizontal axis for
variable values and the vertical axis for frequencies.
2. Proper scales must be chosen for both axes to ensure accurate
representation.
3. Perpendicular columns are drawn at each variable value on the
horizontal axis, with heights corresponding to their frequencies.
4. This graphical representation is called a column diagram or frequency
bar diagram and can use relative frequencies instead of absolute
frequencies.
Graphical Representation of the cumulative
frequency distribution of a Discrete Variable:
• Consider the data on the family size with the following cumulative
frequency table (Table 3):

Family Size   Frequency   Less than type   More than type
                          Cumul. Freq.     Cumul. Freq.
2             9           9                90
3             20          29               81
4             30          59               61
5             17          76               31
6             10          86               14
7             4           90               4
Total         90
[Figure: Step diagram showing the cumulative frequencies of the family sizes. Horizontal axis: family size (0 to 9); vertical axis: cumulative frequency (0 to 100). Series: Less Than Type, Greater Than Type.]
Step Diagrams
Step diagrams are graphical representations used to display cumulative frequency
distributions. They consist of horizontal and vertical segments, resembling a staircase.
Key Features:
• Cumulative Representation: Used for "less than" and "greater than" cumulative
frequencies.
• Axis Representation: The variable values are plotted on the horizontal axis, while
cumulative frequencies are on the vertical axis.
• Staircase Shape:
• The "less than" type diagram ascends from left to right.
• The "greater than" type diagram ascends from right to left.
• Use: Step diagrams help visualize how data accumulates over a range, making it easier
to interpret trends in frequency distributions.
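The height of the "less than" step diagram at any point of the horizontal axis can be sketched as a step function. The function name `cumulative_at` is hypothetical, and the sketch assumes that each jump is counted at the value itself, as in Table 3, where family size 2 already has cumulative frequency 9.

```python
from bisect import bisect_right
from itertools import accumulate

sizes = [2, 3, 4, 5, 6, 7]
freq = [9, 20, 30, 17, 10, 4]
less_than = list(accumulate(freq))

def cumulative_at(x):
    """Number of families whose size is at most x."""
    i = bisect_right(sizes, x)          # how many distinct sizes are <= x
    return less_than[i - 1] if i else 0

# The function is flat between observed values and jumps at each of them.
for x in (1, 2, 2.5, 4, 4.5, 7, 10):
    print(x, cumulative_at(x))
```

Between two consecutive family sizes the value does not change, which is what produces the horizontal segments of the staircase.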
Frequency Distribution of Continuous
Variable
• Suppose the following data relate to marks in a test on mathematics
of 25 students in a college.

57 54 95 67 65
38 64 75 69 74
85 77 60 72 63
60 36 57 70 87
70 71 55 67 44
Tally Marks:
Table 4: Tally Marks for the Data on Marks

Class Limits Tally Marks Class Frequency


31-40 || 2
41-50 | 1
51-60 |||| | 6
61-70 |||| ||| 8
71-80 |||| 5
81-90 || 2
91-100 | 1
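The tally in Table 4 can be checked with a short script. This is a minimal sketch with illustrative variable names; the marks are copied from the data listing and the class limits from Table 4.

```python
# Marks of the 25 students, copied from the data listing.
marks = [57, 54, 95, 67, 65, 38, 64, 75, 69, 74, 85, 77, 60,
         72, 63, 60, 36, 57, 70, 87, 70, 71, 55, 67, 44]

# Class limits from Table 4: 31-40, 41-50, ..., 91-100.
limits = [(lower, lower + 9) for lower in range(31, 100, 10)]

# Count the marks that fall between the two limits of each class.
class_freq = [sum(lo <= m <= hi for m in marks) for lo, hi in limits]

for (lo, hi), f in zip(limits, class_freq):
    print(f"{lo}-{hi}: {f}")
```

Because the classes are exhaustive and mutually exclusive, the class frequencies add up to the number of observations.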
Frequency Table

Table 5: Frequency Distribution of Marks of Students
in Mathematics in a College

Class Boundaries   Class Frequency
30.5-40.5          2
40.5-50.5          1
50.5-60.5          6
60.5-70.5          8
70.5-80.5          5
80.5-90.5          2
90.5-100.5         1
Total              25
Frequency Table
Table 6: Relative and Cumulative Frequency Table of
Marks of Students in Mathematics in a College

Class Boundaries   Relative Frequency   Cumulative Frequency
                                        Less Than   More Than
30.5-40.5          0.08                 2           25
40.5-50.5          0.04                 3           23
50.5-60.5          0.24                 9           22
60.5-70.5          0.32                 17          16
70.5-80.5          0.20                 22          8
80.5-90.5          0.08                 24          3
90.5-100.5         0.04                 25          1
Total              1
Some Definitions:
1. Class-Interval: The whole range of the variable values is divided into
groups in the form of intervals. Each interval is called a class
interval.
2. Class Frequency: The number of observations included in a class is
termed the class frequency (or absolute frequency).
3. Class Limits: These are the two end-points of a class interval used for
tally marking the given values. However, these limits do not show the real
boundaries of the class.
4. Class Boundaries: In the case of a continuous variable, the values are
rounded off; for instance, any value from 59.5 to 60.5 is taken as 60. In
other words, the number 60 stands for any value from 59.5 to 60.5. The
two real end-points of a class interval are called class boundaries. Clearly,
the upper boundary of a class coincides with the lower boundary of the
next class. The class boundaries are used for forming the frequency
distribution of a continuous variable.
Some Definitions:
5. Class-mark: The mid-value of a class interval that lies half way
between its two end points (i.e., class limits or class boundaries) is
termed as class-mark.
6. Class width: The difference between the upper and lower
boundaries of a class interval is called the class width or size of the
class.
7. Frequency density: The frequency density of a class is the
frequency per unit width or size of the class. i.e.,
Frequency density=class frequency/class width. Frequency densities
are used for comparing the concentration of frequencies in different
classes, particularly when the classes are of unequal width.
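As a worked illustration of definitions 5 to 7, the sketch below computes the class-mark, class width and frequency density for the classes of Table 5. The variable names are illustrative.

```python
# Class boundaries and class frequencies from Table 5.
boundaries = [30.5, 40.5, 50.5, 60.5, 70.5, 80.5, 90.5, 100.5]
class_freq = [2, 1, 6, 8, 5, 2, 1]

class_marks, widths, densities = [], [], []
for lower, upper, f in zip(boundaries, boundaries[1:], class_freq):
    class_marks.append((lower + upper) / 2)   # mid-value of the class
    widths.append(upper - lower)              # upper minus lower boundary
    densities.append(f / (upper - lower))     # frequency per unit width

for m, w, d in zip(class_marks, widths, densities):
    print(f"mark={m}  width={w}  density={d:.1f}")
```

Here every class has width 10, so the densities are simply the frequencies divided by 10; the densities become more informative when the widths differ.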
Construction of frequency distribution of a
continuous variable: Some points to remember
1. n values of a continuous variable are given. First we
pick out the smallest and the greatest of the given values. Their
difference gives us the range of variation, which is divided into a
suitable number of classes.
2. The classes should be exhaustive (no value should be left out) and
mutually exclusive (no value should be contained in more than one
class).
3. The number of classes should be neither too large nor too small.
4. Equal width should preferably be maintained over different classes.
Steps to construct the frequency distribution
for continuous variable:
• a: a number smaller than or equal to the smallest value.
• c: the desired width of the class.
• k: the number of classes.
• d: 1, 0.1, 0.001 etc. (i.e. the decimal place up to which the values are given).
• a + kc − d: a number greater than or equal to the highest value.

Tally marks are recorded against the following class limits:

Class   Lower limit      Upper limit
1       a                a + c − d
2       a + c            a + 2c − d
…       …                …
k       a + (k − 1)c     a + kc − d

After completing the table of tally marks, we prepare the final frequency table, in which the classes are given
in terms of class boundaries and the frequencies of the different classes are noted against them. For any class:

Lower boundary = lower limit − d/2
Upper boundary = upper limit + d/2

Class   Lower boundary         Upper boundary     Frequency
1       a − d/2                a + c − d/2        f1
2       a + c − d/2            a + 2c − d/2       f2
…       …                      …                  …
k       a + (k − 1)c − d/2     a + kc − d/2       fk
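The construction can be sketched in code. The helper `build_classes` is hypothetical, and the values a = 31, c = 10, k = 7, d = 1 are chosen here only because they reproduce the classes of Table 4. The sketch takes the lower boundary as the lower limit minus d/2 and the upper boundary as the upper limit plus d/2, so that adjacent classes share a boundary.

```python
def build_classes(a, c, k, d):
    """Return (limits, boundaries) for k classes of width c starting at a.

    d is the unit in which the values are recorded (1, 0.1, ...).
    """
    limits = [(a + i * c, a + (i + 1) * c - d) for i in range(k)]
    boundaries = [(lo - d / 2, hi + d / 2) for lo, hi in limits]
    return limits, boundaries

# Assumed values matching the marks example: whole-number marks from 31 to 100.
limits, boundaries = build_classes(a=31, c=10, k=7, d=1)

for (lo, hi), (lb, ub) in zip(limits, boundaries):
    print(f"limits {lo}-{hi}   boundaries {lb}-{ub}")
```

With these values the first class has limits 31 and 40 and boundaries 30.5 and 40.5, in agreement with Tables 4 and 5.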
Relative and Cumulative Frequencies
• Frequency distribution can be represented as:
• Relative frequencies (proportions)
• Cumulative frequencies
• Cumulative frequencies are calculated by successively adding class frequencies.
• The addition starts:
• From the top (lowest class) → for less-than type cumulative frequencies
• From the bottom (highest class) → for more-than type cumulative frequencies
• Less-than cumulative frequency of a class shows:
• The number of values less than the upper boundary of that class
• More-than cumulative frequency of a class shows:
• The number of values greater than or equal to the lower boundary of that class
• Points to which the cumulative frequencies correspond:
• Less-than type → corresponds to upper boundaries
• More-than type → corresponds to lower boundaries
Graphical Representation of Frequency
Distribution

Class Boundaries Class Mark Class Frequency


30.5-40.5 35.5 2
40.5-50.5 45.5 1
50.5-60.5 55.5 6
60.5-70.5 65.5 8
70.5-80.5 75.5 5
80.5-90.5 85.5 2
90.5-100.5 95.5 1
Total 25
[Figure: Frequency polygon for marks in Mathematics. Horizontal axis: Marks (0 to 120); vertical axis: Frequency (0 to 9).]
Frequency Polygon

1. A frequency polygon represents the frequency distribution
of a continuous variable with equal class width.
2. The horizontal axis shows class boundaries, while the
vertical axis represents frequencies.
3. Frequencies are plotted against class marks, and the points
are connected by line segments.
4. Two additional classes with zero frequencies are added at
both ends to close the polygon.
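The points of the polygon can be listed before drawing. The sketch below uses the class marks and frequencies of the marks data; the variable names are illustrative, and the two closing points are placed one class width beyond the first and last class marks.

```python
# Class marks and class frequencies of the marks data.
class_marks = [35.5, 45.5, 55.5, 65.5, 75.5, 85.5, 95.5]
class_freq = [2, 1, 6, 8, 5, 2, 1]
width = 10  # common class width

# Add one zero-frequency class on each side to close the polygon.
xs = [class_marks[0] - width] + class_marks + [class_marks[-1] + width]
ys = [0] + class_freq + [0]
points = list(zip(xs, ys))

for x, y in points:
    print(x, y)
```

Joining consecutive points by line segments gives the polygon; any plotting tool that draws a line through a list of points can be used for the drawing itself.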
[Figure: Histogram of marks in mathematics. Horizontal axis: Marks (classes 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100); vertical axis: Frequency Density (0 to 0.9).]
Histogram
1. A histogram represents the frequency distribution of a
continuous variable by considering the spread of
frequency over an interval.
2. The horizontal axis represents class boundaries, and
rectangles are drawn over each class interval.
3. The area of each rectangle indicates the class frequency,
with height representing frequency density.
4. The diagram consists of adjoining rectangles, and class
widths may vary.
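The relation between area and frequency shows most clearly when the class widths differ. The classes below are an assumed illustration, not part of the lecture: they are Table 5 with its first two and its last two classes merged.

```python
# Assumed illustration: Table 5 with the first two and the last two
# classes merged, giving class widths 20, 10, 10, 10, 20.
boundaries = [30.5, 50.5, 60.5, 70.5, 80.5, 100.5]
class_freq = [3, 6, 8, 5, 3]

widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]
heights = [f / w for f, w in zip(class_freq, widths)]   # frequency densities
areas = [h * w for h, w in zip(heights, widths)]         # rectangle areas

for w, h, a in zip(widths, heights, areas):
    print(f"width={w}  height={h:.2f}  area={a:.1f}")
```

The two wide classes get a height of 0.15 rather than 3, so each rectangle's area still equals its class frequency and the total area equals the total frequency, 25.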
[Figure: Ogives corresponding to less than and more than type cumulative frequencies. Horizontal axis: Marks (0 to 120); vertical axis: Cumulative Frequencies (0 to 30). Series: Less Than Type, More Than Type.]
Ogives
This diagram exhibits the frequency distribution of a continuous variable using
cumulative frequencies.
• Axes Setup:
  • Horizontal axis: Represents the variable values.
  • Vertical axis: Represents the cumulative frequencies.
• Less-than Type Ogive:
  • Plot cumulative frequencies against the upper class boundaries.
  • Points are joined by line segments to form the ogive.
  • Cumulative frequency is zero at the lower boundary of the lowest class, which is
included in the diagram.
• More-than Type Ogive:
  • Plot cumulative frequencies against the lower class boundaries.
  • Construction is similar to the less-than ogive.
  • Cumulative frequency is zero at the upper boundary of the highest class, and this
point is included.
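The points of both ogives can be listed from the class boundaries and class frequencies of the marks data. This is a minimal sketch with illustrative variable names.

```python
from itertools import accumulate

# Class boundaries and class frequencies of the marks data.
boundaries = [30.5, 40.5, 50.5, 60.5, 70.5, 80.5, 90.5, 100.5]
class_freq = [2, 1, 6, 8, 5, 2, 1]

# Less-than type: zero at the lowest boundary, then running totals
# plotted against the upper boundary of each class.
less_than = [0] + list(accumulate(class_freq))

# More-than type: running totals from the highest class, plotted against
# the lower boundary of each class, then zero at the highest boundary.
more_than = list(accumulate(reversed(class_freq)))[::-1] + [0]

less_than_ogive = list(zip(boundaries, less_than))
more_than_ogive = list(zip(boundaries, more_than))

for (x, lt), (_, mt) in zip(less_than_ogive, more_than_ogive):
    print(f"{x:>6}  less-than={lt:>2}  more-than={mt:>2}")
```

At every boundary the two cumulative frequencies add up to the total frequency, 25, which is a convenient check on the construction.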
