Methods of Organizing Data
Introduction
When conducting a statistical study, the researcher must gather data for the particular variable under
study. For example, if a researcher wishes to study the number of people who were bitten by poisonous snakes
in a specific geographic area over the past several years, he or she has to gather the data from various doctors,
hospitals, or health departments. To describe situations, draw conclusions, or make inferences about events,
the researcher must organize the data in some meaningful way. The most convenient method of organizing data
is to construct a frequency distribution.
Data collected from different sources are usually unorganized and in a form unsuitable for immediate
interpretation. In any statistical investigation, once the relevant data have been gathered, the next step is to
organize and present them using appropriate tables and graphs.
I. Organizing Datasets
There are two ways of organizing numerical data. Ungrouped or raw data that are not too many
can be organized by making counts or by constructing a stem and leaf plot, while a large set of data
requires a frequency distribution table. A simple first way of organizing data is to prepare an array.
Array
- is considered as the first step in organizing data
- it is a process of arranging data in order of magnitude from smallest value to largest value
- enables one to determine quickly the values of the smallest and the largest measurements
Example:
For an exam of 100 items, the ten students scored as follows:
24, 65, 89, 35, 58, 79, 55, 80, 90, 37
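The arrangement described for an array can be checked with a short Python sketch. It uses the ten scores from the example; the variable names are illustrative only.

```python
# Scores of the ten students from the example (100-item exam).
scores = [24, 65, 89, 35, 58, 79, 55, 80, 90, 37]

# An array is the same data arranged from smallest to largest.
array = sorted(scores)

print(array)      # [24, 35, 37, 55, 58, 65, 79, 80, 89, 90]
print(array[0])   # smallest score: 24
print(array[-1])  # largest score: 90
```

Once the data are sorted, the smallest and largest measurements are simply the first and last entries.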
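Solution:
Arranged from smallest to largest, the array is:
24, 35, 37, 55, 58, 65, 79, 80, 89, 90
The smallest score is 24 and the largest score is 90.
Stem and Leaf Plot
Example 1:
Step 1: To create a stem and leaf plot, first organize the data into groups. The scores below come from a
different data set than the array example.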
56
67, 67, 69
72, 74, 77, 77, 78
81, 82, 82, 83, 84, 88
90, 92, 93, 94
Step 2: Create the plot with stems as the tens and the leaves as the ones.
Stem | Leaf
5    | 6
6    | 7, 7, 9
7    | 2, 4, 7, 7, 8
8    | 1, 2, 2, 3, 4, 8
9    | 0, 2, 3, 4
Example 2:
Here is a set of data showing the test scores on the last statistics quiz.
123, 125, 132, 156, 178, 190, 136, 200, 201, 205, 202
Solution:
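Step 1: Organize the data into groups, in ascending order within each group.
123, 125
132, 136
156
178
190
200, 201, 202, 205
Step 2: Create the plot. Because the scores have three digits, the stems are the hundreds and tens digits
taken together and the leaves are the ones.
The stems will be 12, 13, 15, 17, 19, 20
Stem | Leaf
12   | 3, 5
13   | 2, 6
15   | 6
17   | 8
19   | 0
20   | 0, 1, 2, 5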
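The two steps of building a stem and leaf plot can be sketched in Python as follows. This is a minimal illustration that assumes non-negative whole-number data; the function name `stem_and_leaf` is hypothetical and not part of any library.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digits)."""
    groups = defaultdict(list)
    for value in sorted(values):          # sorting keeps stems and leaves in order
        stem, leaf = divmod(value, 10)    # e.g. 123 -> stem 12, leaf 3
        groups[stem].append(leaf)
    return dict(groups)

# Test scores from Example 2.
quiz_scores = [123, 125, 132, 156, 178, 190, 136, 200, 201, 205, 202]
plot = stem_and_leaf(quiz_scores)

for stem, leaves in plot.items():
    print(f"{stem:>3} | {', '.join(str(leaf) for leaf in leaves)}")
```

The same function applied to two-digit scores gives single-digit stems, since the stem is whatever remains after the ones digit is removed.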
Since little information can be obtained from looking at raw data, the researcher organizes the data into
what is called a frequency distribution.
Frequency Distribution Table
- consists of classes and their corresponding frequencies
- each raw data value is placed into a quantitative or qualitative category called a class
- the frequency of a class refers to the count of data values that fall within a specific class interval
Example:
Given below are marks obtained by 20 students in Math out of 25:
22, 24, 20, 18, 13, 16, 16, 18, 18, 20, 24, 24, 22, 24, 26, 26, 22, 20, 20, 20
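For a small data set such as these marks, the frequency of each distinct value can be tallied directly. The sketch below uses Python's `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# Marks of the 20 students, as listed in the example.
marks = [22, 24, 20, 18, 13, 16, 16, 18, 18, 20,
         24, 24, 22, 24, 26, 26, 22, 20, 20, 20]

frequency = Counter(marks)   # maps each mark to the number of times it occurs

for mark in sorted(frequency):
    print(mark, frequency[mark])

total = sum(frequency.values())
print("Total", total)        # Total 20
```

Each distinct mark acts as its own class here, and the frequencies add up to the number of students.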
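Worked example:
The steps below use a different and larger data set, the intelligence quotients of 100 freshmen students
admitted at the College of Engineering (see Table 1); the raw values are not listed here. Construct a frequency
distribution with 10 classes. For each class interval of the frequency distribution, determine the class
midpoint and the lower and upper class boundaries. Use 83 as the lowest lower limit.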
Step 1: Get the highest and the lowest value in the distribution. Let H be the highest value and
L be the lowest value.
Note: The highest and lowest values can be found by manual inspection in two ways.
Sorting: Arrange the data in ascending or descending order to easily identify the highest and lowest values.
Scanning: Manually look through the data to find the highest and lowest values, which is practical for
small datasets.
H = 120; L = 83
Step 2: Determine the range of the raw data. The range is defined as the difference between
the highest and the lowest value in the distribution.
R = H – L = 120 – 83 = 37
Step 3: Determine the number of classes. There is no standard method to follow in determining the
number of classes. Generally, the number of classes should not be less than 5 and should not be more than 15,
although a common rule of thumb allows 5 to 20 classes. In some instances, however, the number of classes
can be approximated by using either of the relations
$k = 1 + 3.3 \log n$ or $k = \sqrt{n}$
where:
$k$ = number of classes | $n$ = sample size
Here $\log$ denotes the common (base-10) logarithm.
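As a rough illustration, both relations can be evaluated in Python. The sketch assumes n = 100, the sample size of the worked example (the class frequencies in the table total 100), and takes the logarithm to base 10.

```python
import math

n = 100  # sample size assumed from the worked example

k_log = 1 + 3.3 * math.log10(n)   # logarithmic relation
k_sqrt = math.sqrt(n)             # square-root relation

print(round(k_log, 1))  # 7.6
print(k_sqrt)           # 10.0
```

The two relations give different suggestions (about 8 and exactly 10), which is consistent with the remark that there is no single standard method; the worked example uses 10 classes.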
Step 4: Determine the size of the class interval, denoted by c. The value of c can be obtained by
dividing the range by the desired number of classes and rounding up to a convenient whole number. Hence,
$c = \frac{R}{k} = \frac{37}{10} = 3.7$, or 4 when rounded up
where:
$R$ = range | $k$ = desired number of classes
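The arithmetic of Steps 2 and 4 can be sketched in Python as follows. Rounding up with `math.ceil` is an assumption about how 3.7 becomes 4; it ensures that 10 classes of size 4 span the whole range.

```python
import math

H, L = 120, 83          # highest and lowest values from Step 1
k = 10                  # desired number of classes

R = H - L               # range
c = math.ceil(R / k)    # class size, rounded up to the next whole number

print(R, R / k, c)      # 37 3.7 4
```

With c = 4, the ten classes cover 40 consecutive whole numbers starting at 83, enough to include every value from 83 to 120.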
Step 5: Construct the classes. In constructing the classes, we first determine the lower limit of
the distribution. This lower limit can be chosen arbitrarily, as long as the lowest value falls in the first
interval and the highest value falls in the last interval.
Step 6: Determine the frequency of each class. The frequency is determined by counting
the number of items that fall in each class.
                  Class Mark or   Class Boundaries
Classes     f     Midpoint (x)    Lower    Upper
83 - 86     2     84.5            82.5     86.5
87 - 90     5     88.5            86.5     90.5
91 - 94     8     92.5            90.5     94.5
95 - 98     11    96.5            94.5     98.5
99 - 102    15    100.5           98.5     102.5
103 - 106   26    104.5           102.5    106.5
107 - 110   15    108.5           106.5    110.5
111 - 114   9     112.5           110.5    114.5
115 - 118   5     116.5           114.5    118.5
119 - 122   4     120.5           118.5    122.5
Total       100
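The class limits, midpoints, and boundaries in the table can be reproduced with a short Python sketch. The formulas used here are inferred from the table values rather than stated in the text: the midpoint is the average of the two class limits, and each boundary lies 0.5 beyond its limit because the limits are whole numbers.

```python
lowest_limit = 83   # lowest lower limit
c = 4               # class size
k = 10              # number of classes

classes = []
for i in range(k):
    lower = lowest_limit + i * c
    upper = lower + c - 1              # upper class limit
    midpoint = (lower + upper) / 2     # class mark
    lower_boundary = lower - 0.5       # halfway to the previous class
    upper_boundary = upper + 0.5       # halfway to the next class
    classes.append((lower, upper, midpoint, lower_boundary, upper_boundary))

for row in classes:
    print(row)
```

Each upper boundary equals the lower boundary of the next class, so the boundaries leave no gaps between adjacent classes.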
Given a frequency distribution, we can construct other frequency distributions like the relative frequency
distribution and the cumulative frequency distribution.
• Relative Frequency Distribution Table - is another table that describes the frequency
distribution in terms of percentages. The relative frequency, denoted by “%f”, is obtained by
dividing the class frequency by the sample size and multiplying the result by 100. The formula
for converting the class frequency to percent is
$\%f = \frac{f}{n}(100)$
where:
$\%f$ = relative frequency for each class interval
$f$ = frequency of each class
$n$ = sample size
Relating to the previous example, the relative frequencies of the first two intervals can be obtained as follows:
Relative Frequency of the Class Interval 83 - 86: $\%f = \frac{2}{100}(100) = 2\%$
Relative Frequency of the Class Interval 87 - 90: $\%f = \frac{5}{100}(100) = 5\%$
If we continue converting the class frequencies to percent, we shall develop the relative
frequency distribution below.
Classes     f     %f
83 - 86     2     2
87 - 90     5     5
91 - 94     8     8
95 - 98     11    11
99 - 102    15    15
103 - 106   26    26
107 - 110   15    15
111 - 114   9     9
115 - 118   5     5
119 - 122   4     4
Total       100   100%
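The conversion of class frequencies to percentages can be sketched in Python as follows. The frequencies are taken from the table; the variable names are illustrative.

```python
# Class frequencies from the frequency distribution table.
frequencies = [2, 5, 8, 11, 15, 26, 15, 9, 5, 4]

n = sum(frequencies)                            # sample size
relative = [f * 100 / n for f in frequencies]   # %f for each class

print(n)              # 100
print(relative)
print(sum(relative))  # 100.0
```

Because the sample size here is exactly 100, each relative frequency has the same numerical value as its class frequency; with any other sample size the two columns would differ.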
To interpret the result, we can say that 2% of the 100 freshmen students admitted at the College of
Engineering have intelligence quotients ranging from 83 to 86, 5% of the 100 freshmen students have
intelligence quotients ranging from 87 to 90, and so on.
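• Cumulative Frequency Distribution Table - can also be derived from the frequency
distribution. This distribution is obtained by successively adding the class frequencies. There are
two types of cumulative frequency distribution:
- the less than cumulative frequency (<cumf), which adds the frequencies from the lowest class upward
- the greater than cumulative frequency (>cumf), which adds the frequencies from the highest class downward
The less than and greater than cumulative frequency distributions are shown below.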
Table 1
Frequency Distribution of the Intelligence Quotients of 100 freshmen students
Admitted at the College of Engineering
                  Class Mark or   Class Boundaries
Classes     f     Midpoint (x)    Lower    Upper    %f     <cumf   >cumf
83 - 86     2     84.5            82.5     86.5     2      2       100
87 - 90     5     88.5            86.5     90.5     5      7       98
91 - 94     8     92.5            90.5     94.5     8      15      93
95 - 98     11    96.5            94.5     98.5     11     26      85
99 - 102    15    100.5           98.5     102.5    15     41      74
103 - 106   26    104.5           102.5    106.5    26     67      59
107 - 110   15    108.5           106.5    110.5    15     82      33
111 - 114   9     112.5           110.5    114.5    9      91      18
115 - 118   5     116.5           114.5    118.5    5      96      9
119 - 122   4     120.5           118.5    122.5    4      100     4
Total       100                                     100%
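The two cumulative columns of Table 1 can be reproduced with a short Python sketch using `itertools.accumulate`. The frequencies are taken from the table; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies from Table 1, lowest class first.
frequencies = [2, 5, 8, 11, 15, 26, 15, 9, 5, 4]

# Less than cumulative frequency: running total from the lowest class upward.
less_than = list(accumulate(frequencies))

# Greater than cumulative frequency: running total from the highest class
# downward, then reversed back into the table's row order.
greater_than = list(accumulate(reversed(frequencies)))[::-1]

print(less_than)     # [2, 7, 15, 26, 41, 67, 82, 91, 96, 100]
print(greater_than)  # [100, 98, 93, 85, 74, 59, 33, 18, 9, 4]
```

The last entry of the less than column and the first entry of the greater than column both equal the sample size, which is a quick check that no class was skipped.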
Worksheet No. 9 (1 whole sheet of paper)