Chapter 2
2. METHODS OF DATA COLLECTION AND PRESENTATION
Introduction
This unit deals with how to collect data and how to present the collected data so that they can be of use.
Collected data, also known as raw data, are always in an unorganized form. They need to be
organized and presented in a meaningful and readily comprehensible form in order to facilitate further
statistical analysis.
Frequency: the number of times a certain value of the variable is repeated in the given data, or the
number of times a certain value (or set of values) occurs in a specific group.
There are two things which must be considered before starting the data collection. These are:
B. Plan of data collection: in planning data collection the following points should be considered:
BY: Habtamu W.(MSc in Biostatistics) Page 1
Basic Statistics Lecture Note 2024/2025
NB:
Primary data are more expensive than secondary data.
Data which are primary for one may be secondary for the other.
A frequency distribution is a table that presents data according to some criteria, with the
corresponding number of items falling in each class (i.e. with the corresponding frequencies).
A frequency distribution is essentially the classification of data into an appropriate number of mutually
exclusive (non-overlapping) classes.
There are 3 types of Frequency distribution. These are:
1. Categorical Frequency distribution
2. Ungrouped Frequency distribution
3. Grouped Frequency distribution
There are specific procedures for constructing each type.
1) Categorical Frequency distribution: Used for data that can be placed in specific categories, such as
nominal or ordinal data, e.g. marital status and letter grade.
Example 2.1: A social worker collected the following data on marital status for 25
persons (M=married, S=single, W=widowed, D=divorced).
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital status: M, S,
D, and W. These types will be used as the classes of the distribution. We follow the procedure below to construct the
frequency distribution.
Step 1: Make a table as shown.
Step 2: Tally the data and place the result in column (2).
Step 3: Count the tally and place the result in column (3).
Step 4: Find the percentage of values in each class by using
% = (f / n) × 100, where f = frequency of the class and n = total number of values.
Percentages are not normally a part of a frequency distribution, but they can be added since they are
used in certain types of diagrams, such as pie charts.
Step 5: Find the total for column (3) and (4).
Combining all the steps, one can construct the following frequency distribution.
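Steps 2 to 5 can also be checked with a short Python sketch. It tallies the 25 codes of Example 2.1 and computes each percentage as 100 × f / n. The function name categorical_distribution is only illustrative and is not part of the lecture note.

```python
from collections import Counter

# Marital status codes from Example 2.1
# (M=married, S=single, W=widowed, D=divorced)
data = """
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
""".split()

def categorical_distribution(values):
    """Return {class: (frequency, percent)} with percent = 100 * f / n."""
    n = len(values)
    counts = Counter(values)
    return {cls: (f, 100 * f / n) for cls, f in counts.items()}

table = categorical_distribution(data)
print("Class  Frequency  Percent")
for cls in ("M", "S", "D", "W"):
    f, pct = table[cls]
    print(f"{cls:<6} {f:<10} {pct:.0f}")
print("Total ", sum(f for f, _ in table.values()))
```

The frequencies add up to n = 25 and the percentages to 100, which is a quick check that no observation was missed while tallying.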
2) Ungrouped Frequency distribution: A table of all the potential raw score values that could possibly
occur in the data, along with the number of times each actually occurred. An ungrouped frequency
distribution is often constructed for a small data set or for data on a discrete variable.
76 70 70 80 85
Construct an ungrouped frequency distribution.
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown
Step 3: Tally the data.
Step 4: Compute the frequency.
Mark Tally Frequency
60 // 2
62 / 1
63 / 1
65 / 1
70 //// 4
74 / 1
75 / 1
76 // 2
80 /// 3
85 /// 3
90 / 1
Each individual value is presented separately; that is why it is called an ungrouped frequency distribution.
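The same table can be produced with a few lines of Python. This is a minimal sketch: the list of marks is rebuilt from the frequency column of the table above (an assumption about the raw data; the order of the marks does not matter for tallying).

```python
from collections import Counter

# Assumption: marks rebuilt from the frequency column of the table above.
freq_table = {60: 2, 62: 1, 63: 1, 65: 1, 70: 4, 74: 1,
              75: 1, 76: 2, 80: 3, 85: 3, 90: 1}
marks = [mark for mark, f in freq_table.items() for _ in range(f)]

# Step 1: range = maximum - minimum
data_range = max(marks) - min(marks)

# Steps 3 and 4: tally each distinct value and count it
counts = Counter(marks)

print("Range =", data_range)
print("Mark  Tally  Frequency")
for mark in sorted(counts):
    print(f"{mark:<5} {'/' * counts[mark]:<6} {counts[mark]}")
```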
3) Grouped Frequency Distribution: Used when the range of the data is large, so that the data must be
grouped into classes that are more than one unit in width.
Class width (W): The difference between the upper and lower boundaries of any
class. The class width is also the difference between the lower limits (or the upper limits) of two
consecutive classes.
Class mark (Midpoint): The average of the lower and upper class limits, or the average of the
upper and lower class boundaries.
Cumulative frequency: The number of observations less than the upper class boundary of a class, or
greater than the lower class boundary of a class.
CF (Less than type): it is the number of values less than the upper class boundary of a given
class.
CF (Greater than type): it is the number of values greater than the lower class boundary of a
given class.
Relative frequency (Rf): The frequency divided by the total frequency. This gives the proportion
(or, when multiplied by 100, the percent) of values falling in that class.
Rfi = fi/n = fi/∑fi, where fi is the frequency of the ith class and n is the total number of observations or items.
Relative cumulative frequency (RCf): The running total of the relative frequencies, or the
cumulative frequency divided by the total frequency. It gives the proportion of the values which are less
than the upper class boundary, or the reverse.
RCfi = Cfi/n = Cfi/∑fi
3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule:
k = 1 + 3.32 log10(n), where k is the number of classes desired and n is the total number of observations.
4. Find the class width by dividing the range by the number of classes and rounding up, not off:
w = R / k.
5. Pick a suitable starting point less than or equal to the minimum value. The starting point is
called the lower limit of the first class. Continue to add the class width to this lower limit to get
the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the second class (i.e.
UCLi = LCL(i+1) − U). Then continue to add the class width to this upper limit to find the rest of the
upper limits.
7. Find the boundaries by subtracting U/2 units from the lower limits and adding U/2 units to
the upper limits. The boundaries are also half-way between the upper limit of one class and the
lower limit of the next class. Mathematically:
LCBi = LCLi − ½ U, where LCBi is the lower class boundary of the ith class
UCBi = UCLi + ½ U, where UCBi is the upper class boundary of the ith class
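Steps 3 to 7 can be sketched in Python as follows. The function names are illustrative only, and two assumptions are made: the value from Sturges' rule is rounded up to a whole number of classes, and the data are whole numbers (U = 1). The illustrative inputs (n = 20, starting point 6, width 6) are chosen to match the table that follows.

```python
import math

def sturges_classes(n):
    """Sturges' rule k = 1 + 3.32 log10(n); rounding up is an assumption."""
    return math.ceil(1 + 3.32 * math.log10(n))

def class_width(data_range, k):
    """w = R / k, rounded up (not off); assumes whole-number data (U = 1)."""
    return math.ceil(data_range / k)

def class_limits(start, width, k, unit=1):
    """Lower limits step by the width; UCLi = LCL(i+1) - U."""
    lowers = [start + i * width for i in range(k)]
    return [(lo, lo + width - unit) for lo in lowers]

def class_boundaries(limits, unit=1):
    """LCBi = LCLi - U/2 and UCBi = UCLi + U/2."""
    return [(lo - unit / 2, hi + unit / 2) for lo, hi in limits]

k = sturges_classes(20)
limits = class_limits(start=6, width=6, k=k, unit=1)
boundaries = class_boundaries(limits, unit=1)

for (lcl, ucl), (lcb, ucb) in zip(limits, boundaries):
    print(f"{lcl} - {ucl}    {lcb} - {ucb}    class mark {(lcl + ucl) / 2}")
```

Note that each upper boundary equals the lower boundary of the next class, so the boundaries leave no gaps between classes even though the limits do.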
Class limit  Class boundary  Class mark  Tally    Freq.  Cf (less than type)  Cf (more than type)  rf    rcf (less than type)
6 – 11       5.5 – 11.5      8.5         //       2      2                    20                   0.10  0.10
12 – 17      11.5 – 17.5     14.5        //       2      4                    18                   0.10  0.20
18 – 23      17.5 – 23.5     20.5        //// //  7      11                   16                   0.35  0.55
24 – 29      23.5 – 29.5     26.5        ////     4      15                   9                    0.20  0.75
30 – 35      29.5 – 35.5     32.5        ///      3      18                   5                    0.15  0.90
36 – 41      35.5 – 41.5     38.5        //       2      20                   2                    0.10  1.00
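The cumulative and relative columns of this table follow directly from the frequency column. A minimal Python sketch, using only the frequencies above (the variable names are illustrative):

```python
from itertools import accumulate

freqs = [2, 2, 7, 4, 3, 2]      # Freq. column of the table above
n = sum(freqs)                  # total number of observations

# Less-than type: running total starting from the first class
cf_less = list(accumulate(freqs))
# More-than type: running total starting from the last class
cf_more = list(accumulate(freqs[::-1]))[::-1]
# Relative frequency Rfi = fi / n
rf = [f / n for f in freqs]
# Relative cumulative frequency (less-than type) RCfi = Cfi / n
rcf_less = [c / n for c in cf_less]

for row in zip(freqs, cf_less, cf_more, rf, rcf_less):
    print("{:<3} {:<3} {:<3} {:.2f} {:.2f}".format(*row))
```

The last less-than cumulative frequency and the first more-than cumulative frequency both equal n, and the last relative cumulative frequency equals 1.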
Step 3: Using a protractor and compass, graph each section and write its name and corresponding
percentage.
[Pie chart of frequencies: Men, 2500 (25%); Women, 2000 (20%); Boys, 1500 (15%); Girls, 4000 (40%).]
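The percentage and the angle of each section can be checked with a short Python sketch. The counts are those shown in the pie chart. The angle formula (f / n) × 360° is the standard conversion used when drawing sections with a protractor, and the function name pie_sections is illustrative only.

```python
# Frequencies shown in the pie chart
counts = {"Men": 2500, "Women": 2000, "Boys": 1500, "Girls": 4000}

def pie_sections(counts):
    """Return {name: (percent, degrees)} for each section of the pie."""
    total = sum(counts.values())
    return {name: (100 * f / total, 360 * f / total)
            for name, f in counts.items()}

sections = pie_sections(counts)
for name, (pct, deg) in sections.items():
    print(f"{name:<6} {pct:.0f}%  {deg:.0f} degrees")
```

The percentages add up to 100 and the angles to 360 degrees, so the sections fill the circle exactly.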
B) Pictogram: is a device used to represent data by means of pictures or small symbols. We decide
about a suitable picture to represent a definite number of units in which the variable is
measured.
Example: The following table shows the orange production of a plantation for the production years
1990 to 1993. Represent the data by a pictogram.
Production year 1990 1991 1992 1993
Amount (in kg) 3000 3850 3500 5000
C) Bar Charts: Used to represent and compare the frequency distributions of discrete variables and
attributes or categorical series. Bars can be drawn either vertically or horizontally.
In presenting data using a bar diagram:
All bars must have equal width, and the distance between bars must be equal.
The height or length of each bar indicates the size (frequency) of the figure represented.
There are different types of bar charts. The most common are:
[Bar chart: vertical axis scaled from 0 to 120; horizontal axis: production year, 1990 to 1995.]
Example: The following data represent the sales by product, 1957-1959, of a given company for three
products A, B, and C.
[Bar chart: sales in $ (vertical axis, 0 to 100) by year of production (1957, 1958, 1959); legend: Product A, Product B, Product C.]
III. Multiple Bar charts: These are used to display data on more than one variable. They are
used for comparing different variables at the same time.
Example: Draw a multiple bar chart to represent the sales by product from 1957 to 1959.
[Multiple bar chart: sales in $ (vertical axis, 0 to 60) by year of production (1957, 1958, 1959); legend: Product A, Product B, Product C.]
Choose a suitable scale for the frequencies or cumulative frequencies and label it on the y-axis.
Represent the class boundaries (for the histogram or ogive) or the midpoints (for the frequency
polygon) on the x-axis.
Plot the points.
Draw the bars, or draw lines to connect the points.
i. Histogram: A graph which displays the data by using vertical bars of various heights to
represent frequencies. Class boundaries are placed along the horizontal axis. Class marks or
class limits are sometimes used as the quantity on the x-axis instead.
Example: Construct a histogram for the frequency distribution of the time spent by the automobile
workers. The frequency distribution is:
Class boundary  Class mark  Frequency
15.5-21.5       18.5        3
21.5-27.5       24.5        6
27.5-33.5       30.5        8
33.5-39.5       36.5        4
39.5-45.5       42.5        3
45.5-51.5       48.5        1
Figure 5. The time in minutes spent by automobile workers to travel from home to work.
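As a rough check of what the histogram should look like, the bars can be printed as text. This is only a sketch: the class marks and frequencies come from the frequency distribution above, and the boundaries of each bar are derived as class mark ± half the class width, where the width of 6 is taken as the gap between consecutive class marks.

```python
# Class marks and frequencies from the frequency distribution above
class_marks = [18.5, 24.5, 30.5, 36.5, 42.5, 48.5]
freqs = [3, 6, 8, 4, 3, 1]

# Class width = distance between consecutive class marks
width = class_marks[1] - class_marks[0]

def text_histogram(marks, freqs, width):
    """One text bar per class; bar length equals the class frequency."""
    lines = []
    for mark, f in zip(marks, freqs):
        lower, upper = mark - width / 2, mark + width / 2
        lines.append(f"{lower:4.1f}-{upper:4.1f} | {'#' * f} {f}")
    return lines

histogram = text_histogram(class_marks, freqs, width)
print("\n".join(histogram))
```

The bars are drawn sideways here, but the reading is the same: adjacent bars share a boundary, and the longest bar belongs to the class with the highest frequency.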
Example: Construct a frequency polygon for the frequency distribution of the time spent by the
automobile workers.
Figure 6: The time in minutes spent by automobile workers to travel from home to work.
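The points of the frequency polygon can be listed with a short Python sketch. It plots each frequency against its class mark. Anchoring the polygon to the x-axis with two extra zero-frequency midpoints, one class width before the first class mark and one after the last, is a common drawing convention assumed here. The function name polygon_points is illustrative only.

```python
# Class marks and frequencies of the automobile workers example
class_marks = [18.5, 24.5, 30.5, 36.5, 42.5, 48.5]
freqs = [3, 6, 8, 4, 3, 1]

def polygon_points(marks, freqs):
    """Return (class mark, frequency) points, closed at zero on both ends."""
    width = marks[1] - marks[0]
    xs = [marks[0] - width] + list(marks) + [marks[-1] + width]
    ys = [0] + list(freqs) + [0]
    return list(zip(xs, ys))

points = polygon_points(class_marks, freqs)
for x, y in points:
    print(x, y)
```

Joining consecutive points with straight lines gives the polygon.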