
Basic Statistics Lecture Note 2024/2025

CHAPTER TWO
2. METHODS OF DATA COLLECTION AND PRESENTATION

Introduction
This unit deals with how to collect and present data so that they can be of use. Collected data,
also known as raw data, are always in an unorganized form and need to be organized and presented
in a meaningful and readily comprehensible form in order to facilitate further statistical
analysis.

2.1 Methods of Data Collection


Definition of some basic terms
Data: any information collected as part of a research project, or the numerical result of any
scientific measurement. Data may be obtained by counting or by measurement.
Raw data: collected data that have not been organized numerically.
Array: an arrangement of raw numerical data in ascending or descending order of magnitude. It
enables us to see the range of the data set easily and also gives some idea of the general
characteristics of the distribution.

Frequency: the number of times a certain value of the variable is repeated in the given data, or the
number of times a certain value (or set of values) occurs in a specific group.
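
The terms array, range and frequency can be illustrated with a minimal Python sketch. The data values below are hypothetical and are not taken from this lecture note; they only show how the definitions apply.

```python
from collections import Counter

# Hypothetical raw data (not from the lecture note), in the order collected.
raw_data = [7, 3, 9, 3, 5, 7, 3]

# Array: the raw data arranged in ascending order of magnitude.
array = sorted(raw_data)

# The range is easy to read from the array: last value minus first value.
data_range = array[-1] - array[0]

# Frequency: the number of times each value occurs in the data.
frequency = Counter(raw_data)

print(array)         # [3, 3, 3, 5, 7, 7, 9]
print(data_range)    # 6
print(frequency[3])  # 3
```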
Two things must be settled before data collection starts:

a. Statement of the purpose of the investigation (objective)
b. Plan of data collection

A. Purpose of investigation (objective of statistical inquiry):
The objective of a statistical investigation may be:
1. To supplement, disprove or test some current theory (hypothesis).
2. To discover a new theory (hypothesis).
3. To solve a problem involving the interdependence of several groups of facts.

B. Plan of data collection: in planning data collection, the following points should be considered:
BY: Habtamu W.(MSc in Biostatistics) Page 1

a. Scope of inquiry: decide with reference to

• Time: the collection of data must be finished within a reasonable time. What is reasonable
  depends on the nature of the phenomenon under investigation. If conditions change quickly
  and frequently, the duration of the investigation should be narrowed so that there is no
  possibility of a change affecting the data.
• Space:
  - Political and administrative divisions (country, district, woreda, municipality)
  - Economic divisions (agriculture and animal husbandry, mining, manufacturing, trade,
    transport)
  - Natural or climatic divisions (plains, mountains, plateaus, forests)
• The number of items included in the study: this is the choice between the census and the
  sampling technique of data collection.
  - Census: every item in the population (the universe) is enumerated.
  - Sample: only a limited number of items is taken into account (this limited set of items
    is regarded as the sample of the population).
2.1.1 Source of Data
Any scientific investigation requires data related to the study. The required data are obtained from two
sources, called primary and secondary.
A. Primary Sources: a primary source supplies firsthand information for the immediate purpose.
Primary data are data originally collected for that purpose. The sources of primary data are the
objects under study themselves, so there is direct contact between the investigator and the items
(objects) under investigation. Because of this, primary data are more expensive to collect.
B. Secondary Sources: When an investigator uses data that have already been collected by
others, such data are called "Secondary Data". Such data are primary data for the agency that
collected them and become secondary for anyone else who uses them for his or her own
purposes. Secondary data can be obtained from journals, reports, government publications, and
publications of professionals and research organizations. Secondary data are less expensive to
collect in both money and time.

NB:
• Primary data are more expensive to collect than secondary data.
• Data that are primary for one investigator may be secondary for another.

Methods of Primary Data Collection

In primary data collection, you collect the data yourself using methods such as interviews,
observations, laboratory experiments and questionnaires. The key point is that the data you collect
are unique to you and your research and, until you publish, no one else has access to them. The main
methods of collecting primary data include:
• Questionnaire methods: these include personal interviews (face to face or by telephone) and mail
  interviews.
• Observation: recording the behavioral patterns of people, objects and events in a systematic
  manner.
• Diaries: a diary is a way of gathering information about how individuals spend their time on
  professional activities. Diaries in this sense are not records of engagements or personal
  journals of thought. They can record either quantitative or qualitative data, and in
  management research they can provide information about work patterns and activities.
• Laboratory experiments: conducting experiments in fields such as chemistry and the biological
  sciences.

2.2 Methods of Data Presentation


Having collected and edited the data, the next important step is to organize them, that is, to present
them in a readily comprehensible, condensed form that helps in drawing inferences. It is also
necessary that like items be separated from unlike ones.
The presentation of data is broadly classified into the following two categories:
• Tabular presentation
• Diagrammatic and Graphic presentation.
2.2.1 Tabular Presentation
Classification is the process of arranging items (data) into classes or categories according to their
similarities or differences. Classification is a preliminary step that prepares the ground for the proper
presentation of data. In tabular presentation, data are presented using a frequency distribution.


A frequency distribution is a table that presents data according to some criteria, with the
corresponding number of items falling in each class (i.e. with the corresponding frequencies).
A frequency distribution is essentially the classification of data into an appropriate number of mutually
exclusive (non-overlapping) classes.
There are 3 types of frequency distribution. These are:
1. Categorical Frequency distribution
2. Ungrouped Frequency distribution
3. Grouped Frequency distribution
There are specific procedures for constructing each type.

1) Categorical Frequency distribution: Used for data that can be placed in specific categories, such as
nominal or ordinal data, e.g. marital status and letter grade.
Example 2.1: A social worker collected the following data on marital status for 25
persons (M = married, S = single, W = widowed, D = divorced).
M S D W D
S S M M M
W D S M M
W D D S S
S W W D D
Solution:
Since the data are categorical, discrete classes can be used. There are four types of marital status: M, S,
D, and W. These types will be used as the classes for the distribution. We follow the procedure below to
construct the frequency distribution.
Step 1: Make a table as shown.

Class   Tally   Frequency   Percent
(1)     (2)     (3)         (4)
M
S
D
W


Step 2: Tally the data and place the result in column (2).
Step 3: Count the tallies and place the result in column (3).
Step 4: Find the percentage of values in each class by using
% = (f / n) * 100, where f = frequency of the class and n = total number of values.
Percentages are not normally a part of a frequency distribution, but they can be added since they are
used in certain types of diagrams, such as pie charts.
Step 5: Find the totals for columns (3) and (4).
Combining all the steps, one can construct the following frequency distribution.

Class   Tally      Frequency   Percent
(1)     (2)        (3)         (4)
M       ///// /    6           24
S       ///// //   7           28
D       ///// //   7           28
W       /////      5           20
Total              25          100
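
As a cross-check of Example 2.1, the tallying and the percent formula (f / n) * 100 can be sketched in Python. This is only an illustration; the variable names are chosen here and do not come from the lecture note.

```python
from collections import Counter

# Marital status of the 25 persons in Example 2.1, row by row.
responses = (
    "M S D W D "
    "S S M M M "
    "W D S M M "
    "W D D S S "
    "S W W D D"
).split()

n = len(responses)              # total number of values
frequency = Counter(responses)  # frequency f of each class

# Percent for each class: (f / n) * 100
percent = {cls: 100 * f / n for cls, f in frequency.items()}

for cls in ["M", "S", "D", "W"]:
    print(cls, frequency[cls], percent[cls])
```

Counting the raw data this way gives 6 married, 7 single, 7 divorced and 5 widowed persons, that is 24%, 28%, 28% and 20% of the 25 persons.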

2) Ungrouped Frequency distribution: a table of all the raw score values that could possibly
occur in the data, along with the number of times each actually occurred. An ungrouped frequency
distribution is often constructed for a small data set or for data on a discrete variable.

Steps for constructing ungrouped frequency distribution:

• First find the smallest and largest raw scores in the collected data.
• Arrange the data in order of magnitude and count the frequency.
• To facilitate counting, one may include a column of tallies.
Example:
The following data represent the marks of 20 students.
80 76 90 85 80
70 60 62 70 85
65 60 63 74 75
76 70 70 80 85
Construct an ungrouped frequency distribution.
Solution:
Step 1: Find the range, Range=Max-Min=90-60=30.
Step 2: Make a table as shown
Step 3: Tally the data.
Step 4: Compute the frequency.
Mark Tally Frequency
60 // 2
62 / 1
63 / 1
65 / 1
70 //// 4
74 / 1
75 / 1
76 // 2
80 /// 3
85 /// 3
90 / 1

Each individual value is presented separately, which is why it is called an ungrouped frequency distribution.
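
The same ungrouped distribution can be obtained with a short Python sketch. It is one possible way of automating the tally, with names chosen for illustration.

```python
from collections import Counter

# Marks of the 20 students from the example.
marks = [80, 76, 90, 85, 80,
         70, 60, 62, 70, 85,
         65, 60, 63, 74, 75,
         76, 70, 70, 80, 85]

data_range = max(marks) - min(marks)  # 90 - 60 = 30
frequency = Counter(marks)

# One row per distinct mark, in ascending order of magnitude.
table = [(mark, frequency[mark]) for mark in sorted(frequency)]

for mark, f in table:
    # Print the mark, a simple tally of f strokes, and the frequency.
    print(f"{mark:>4}  {'/' * f:<5} {f}")
```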
3) Grouped Frequency Distribution: used when the range of the data is large, so that the data must be
grouped into classes that are more than one unit in width.

Definition of some basic terms

• Grouped frequency distribution: a frequency distribution (FD) in which several values are
  grouped into one class.
• Class limits (CL): the values that separate one class from another. The limits can actually
  appear in the data, and there are gaps between the upper limit of one class and the lower limit
  of the next class.
• Unit of measure (U): the smallest possible difference between successive values, e.g. 1, 0.1,
  0.01, 0.001, etc.
• Class boundaries: the values that separate one class in a grouped frequency distribution from
  the next. A boundary has one more decimal place than the raw data. There is no gap between the
  upper boundary of one class and the lower boundary of the succeeding class. The lower class
  boundary is found by subtracting half of the unit of measure from the lower class limit, and the
  upper class boundary is found by adding half of the unit of measure to the upper class limit.
• Class width (W): the difference between the upper and lower boundaries of any class. The class
  width is also the difference between the lower limits (or the upper limits) of two consecutive
  classes.
• Class mark (midpoint): the average of the lower and upper class limits, or the average of the
  lower and upper class boundaries.
• Cumulative frequency (CF): the number of observations less than the upper class boundary, or
  greater than the lower class boundary, of a class.
  - CF (less than type): the number of values less than the upper class boundary of a given class.
  - CF (greater than type): the number of values greater than the lower class boundary of a given
    class.
• Relative frequency (Rf): the frequency of a class divided by the total frequency. This gives the
  proportion of values falling in that class (multiply by 100 to obtain the percent).
  Rfi = fi/n = fi/∑fi, where fi is the frequency of the ith class and n is the total number of
  observations or items.
• Relative cumulative frequency (RCf): the running total of the relative frequencies, or the
  cumulative frequency divided by the total frequency. It gives the proportion of values that are
  less than the upper class boundary (less than type) or greater than the lower class boundary
  (greater than type).
  RCfi = Cfi/n = Cfi/∑fi
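
The relations between class limits, class boundaries, class mark and class width can be sketched as follows. The two classes 10-19 and 20-29 with U = 1 are hypothetical, and `class_summary` is a helper name introduced only for this illustration.

```python
def class_summary(lower_limit, upper_limit, unit):
    """Return (lower boundary, upper boundary, class mark) for one class."""
    lower_boundary = lower_limit - unit / 2   # LCB = LCL - U/2
    upper_boundary = upper_limit + unit / 2   # UCB = UCL + U/2
    class_mark = (lower_limit + upper_limit) / 2
    return lower_boundary, upper_boundary, class_mark

# Hypothetical consecutive classes 10-19 and 20-29, unit of measure U = 1.
first = class_summary(10, 19, 1)
second = class_summary(20, 29, 1)

# Class width: difference between the lower boundaries of consecutive classes.
width = second[0] - first[0]

print(first)   # (9.5, 19.5, 14.5)
print(second)  # (19.5, 29.5, 24.5)
print(width)   # 10.0
```

The upper boundary of the first class equals the lower boundary of the second, which is the "no gap" property of class boundaries.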

Guidelines for classes


1. There should be between 5 and 20 classes.
2. The classes must be mutually exclusive. This means that no data value can fall into two different
classes.
3. The classes must be all inclusive or exhaustive. This means that all data values must be included.
4. The classes must be continuous. There are no gaps in a frequency distribution.
5. The classes must be equal in width.

Steps for constructing Grouped frequency Distribution


1. Find the largest and smallest values
2. Compute the Range(R) = Maximum - Minimum


3. Select the number of classes desired, usually between 5 and 20, or use Sturges' rule
k = 1 + 3.32 log n, where k is the number of classes desired, n is the total number of observations
and log is the base-10 logarithm.
4. Find the class width by dividing the range by the number of classes and rounding up, not off:
w = R/k.
5. Pick a suitable starting point less than or equal to the minimum value. The starting point is
called the lower limit of the first class. Continue to add the class width to this lower limit to get
the rest of the lower limits.
6. To find the upper limit of the first class, subtract U from the lower limit of the second class (i.e.
UCLi = LCL(i+1) - U). Then continue to add the class width to this upper limit to find the rest of the
upper limits.
7. Find the boundaries by subtracting U/2 from the lower limits and adding U/2 to
the upper limits. The boundaries are also halfway between the upper limit of one class and the
lower limit of the next class. Mathematically:

LCBi = LCLi - ½ U, where LCBi is the lower class boundary of the ith class
UCBi = UCLi + ½ U, where UCBi is the upper class boundary of the ith class

8. Find class mark (CM)


CMi = (UCLi + LCLi )/ 2 or CMi = (UCBi + LCBi )/ 2.
9. Tally the data.
10. Find the frequencies.
11. Find the cumulative frequencies. Depending on what you're trying to accomplish, it may not be
necessary to find the cumulative frequencies.
12. If necessary, find the relative frequencies and/or relative cumulative frequencies
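
Steps 1 to 6 above can be sketched in Python as follows. This is a minimal illustration for integer data with U = 1; the function names are invented here, the starting point is assumed to be the minimum value, and the data set (the integers 1 to 10) is hypothetical.

```python
import math

def sturges_classes(n):
    """Number of classes k = 1 + 3.32 log10(n), rounded up."""
    return math.ceil(1 + 3.32 * math.log10(n))

def class_limits(data, unit=1):
    """Return (lower limit, upper limit) pairs for integer data (steps 1-6)."""
    low, high = min(data), max(data)       # step 1
    data_range = high - low                # step 2
    k = sturges_classes(len(data))         # step 3
    width = math.ceil(data_range / k)      # step 4: round up, not off
    limits = []
    lower = low                            # step 5: start at the minimum
    # Keep adding classes until the maximum is covered, so that the
    # classes are all inclusive even when R/k is already a whole number.
    while lower <= high:
        limits.append((lower, lower + width - unit))  # step 6
        lower += width
    return limits

# Hypothetical data: the integers 1 to 10.
print(sturges_classes(10))               # 5
print(class_limits(list(range(1, 11))))  # [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
```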

Example: Construct a frequency distribution for the following data.


11 29 6 33 14 31 22 27 19 20
18 17 22 38 23 21 26 34 39 27
Solutions:
Step 1: Find the highest and the lowest value H=39, L=6
Step 2: Find the range; R=H-L=39-6=33


Step 3: Select the number of classes desired using Sturges' formula:

k = 1 + 3.32 log(20) ≈ 5.32, which is rounded up to 6.
Step 4: Find the class width: w = R/k = 33/6 = 5.5, which is rounded up to 6.
Step 5: Select the starting point; let it be the minimum observation.
• 6, 12, 18, 24, 30, 36 are the lower class limits.
Step 6: Find the upper class limits; e.g. the first upper class limit = 12 - U = 12 - 1 = 11.
• 11, 17, 23, 29, 35, 41 are the upper class limits.
Combining step 5 and step 6, one can construct the following classes.
Class limits
6 – 11
12 – 17
18 – 23
24 – 29
30 – 35
36 – 41
Step 7: Find the class boundaries.
E.g. for class 1: lower class boundary = 6 - U/2 = 5.5
upper class boundary = 11 + U/2 = 11.5
• Then continue adding the class width (w) to both boundaries to obtain the remaining boundaries.
Doing so gives the following classes.
Class boundary
5.5 – 11.5
11.5 – 17.5
17.5 – 23.5
23.5 – 29.5
29.5 – 35.5
35.5 – 41.5
Step 8: Find the class marks (CM).
CMi = (UCLi + LCLi)/2, so CM1 = (6 + 11)/2 = 8.5. Then continue to add w to find the remaining class
marks. So the class marks are:
8.5, 14.5, 20.5, 26.5, 32.5, 38.5
Step 9: Tally the data.
Step 10: Write the numeric values for the tallies in the frequency column.
Step 11: Find the cumulative frequencies.
Step 12: Find the relative frequencies and/or relative cumulative frequencies.


The complete frequency distribution follows:

Class limit  Class boundary  Class mark  Tally     Freq.  Cf (less than type)  Cf (more than type)  rf    rcf (less than type)
6 – 11       5.5 – 11.5      8.5         //        2      2                    20                   0.10  0.10
12 – 17      11.5 – 17.5     14.5        //        2      4                    18                   0.10  0.20
18 – 23      17.5 – 23.5     20.5        ///// //  7      11                   16                   0.35  0.55
24 – 29      23.5 – 29.5     26.5        ////      4      15                   9                    0.20  0.75
30 – 35      29.5 – 35.5     32.5        ///       3      18                   5                    0.15  0.90
36 – 41      35.5 – 41.5     38.5        //        2      20                   2                    0.10  1.00
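
The frequency, cumulative frequency and relative frequency columns of this example can be recomputed with a short Python sketch. The class limits are those found in steps 5 and 6; the variable names are chosen only for illustration.

```python
# The 20 observations of the example and the class limits from steps 5 and 6.
data = [11, 29, 6, 33, 14, 31, 22, 27, 19, 20,
        18, 17, 22, 38, 23, 21, 26, 34, 39, 27]
limits = [(6, 11), (12, 17), (18, 23), (24, 29), (30, 35), (36, 41)]
n = len(data)

# Frequency: number of observations between the limits of each class.
freq = [sum(lo <= x <= hi for x in data) for lo, hi in limits]

# Cumulative frequency, less than type: running total from the first class.
cf_less = []
running = 0
for f in freq:
    running += f
    cf_less.append(running)

# Cumulative frequency, more than type: what remains from each class onward.
cf_more = []
remaining = n
for f in freq:
    cf_more.append(remaining)
    remaining -= f

rel_freq = [f / n for f in freq]
rel_cum_freq = [c / n for c in cf_less]

print(freq)     # [2, 2, 7, 4, 3, 2]
print(cf_less)  # [2, 4, 11, 15, 18, 20]
print(cf_more)  # [20, 18, 16, 9, 5, 2]
```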

2.2.2 Diagrammatic and Graphic Presentation of Data


The most convenient and popular way of describing data is graphical presentation. Data are easier to
understand and interpret when they are presented graphically than when they are described in words or
in a frequency table. A graph presents data in a simple and clear way and can highlight the important
aspects of the data, which leads to better analysis and presentation. In this section, we discuss the
most commonly used diagrammatic and graphical methods.
2.2.2.1 Diagrammatic Presentation of Data
The three most commonly used diagrammatic presentations for discrete as well as qualitative data are:
• Pie charts
• Bar charts
• Pictogram
A) Pie chart
A pie chart is a circle that is divided into sections or wedges according to the percentage of frequencies
in each category of the distribution. The angle of each sector is obtained using:
Angle of sector = (frequency of the class / total frequency) * 360°

Example: Draw a suitable diagram to represent the following population in a town.


Men Women Girls Boys
2500 2000 4000 1500
Solutions:

Step 1: Find the percentage.

Step 2: Find the number of degrees for each class.


Step 3: Using a protractor and compass, draw each sector and label it with its name and corresponding
percentage.

Class   Frequency   Percent   Degree
Men     2500        25        90
Women   2000        20        72
Girls   4000        40        144
Boys    1500        15        54
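
The percent and degree columns can be reproduced with a few lines of Python. This sketch assumes that each sector angle is the class's share of the total multiplied by 360 degrees, which agrees with the table.

```python
# Population of the town by class, from the example.
population = {"Men": 2500, "Women": 2000, "Girls": 4000, "Boys": 1500}
total = sum(population.values())

# Step 1: percentage of each class.
percent = {name: 100 * count / total for name, count in population.items()}

# Step 2: number of degrees for each class (share of the full 360-degree circle).
degrees = {name: 360 * count / total for name, count in population.items()}

for name in population:
    print(name, population[name], percent[name], degrees[name])
```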

Figure 1. Pie chart of the population in a town (sectors: Men 2500, 25%; Women 2000, 20%; Girls 4000, 40%; Boys 1500, 15%)

B) Pictogram: a device used to represent data by means of pictures or small symbols. A suitable
picture is chosen to represent a definite number of units in which the variable is measured.
Example: The following table shows the orange production of a plantation for production years
1990-1993. Represent the data by a pictogram.
Production year   1990   1991   1992   1993
Amount (in kg)    3000   3850   3500   5000

Figure 2: Pictogram of the data on Orange productions from 1990 to 1993.


C) Bar Charts: used to represent and compare the frequency distributions of discrete variables and
attributes or categorical series. Bars can be drawn either vertically or horizontally.
In presenting data using a bar diagram,

• All bars must have equal width, and the distance between bars must be equal.
• The height or length of each bar indicates the size (frequency) of the figure represented.
There are different types of bar charts, the most common being:

• Simple bar chart
• Component or subdivided bar chart
• Multiple bar chart
I. Simple bar chart
• Simple bar charts are used to display data on one variable.
• The bars are thick lines (narrow rectangles) of the same breadth. The magnitude of a quantity
  is represented by the height or length of the bar.
Example: Draw a bar chart for the following coffee production data from 1990 to 1995.

Year                    1990   1991   1992   1993   1994   1995
Amount (in 1000 tons)   50     75     92     64     100    120

Figure 3: Production of coffee from 1990 to 1995 (vertical axis: amount of coffee in 1000 tons; horizontal axis: production year)

II. Component Bar Chart:


When there is a desire to show how a total (or aggregate) is divided into its component parts, we use
a component bar chart. The bars represent the total value of a variable, with each total broken into its
component parts, and different colors or designs are used for identification.

Example: The following data represent the sales by product, 1957-1959, of a given company for three
products A, B and C.

Product   Sales ($) in 1957   Sales ($) in 1958   Sales ($) in 1959
A         12                  14                  18
B         24                  21                  18
C         24                  35                  54
Draw a component bar chart to represent the sales by product from 1957 to 1959.
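
Before drawing, it helps to compute where each component starts and ends within its bar. The following Python sketch does this for the sales data; stacking the products in the order A, B, C from the bottom is an assumption made here for illustration.

```python
# Sales ($) by product and year, from the example table.
sales = {
    "A": {1957: 12, 1958: 14, 1959: 18},
    "B": {1957: 24, 1958: 21, 1959: 18},
    "C": {1957: 24, 1958: 35, 1959: 54},
}
years = [1957, 1958, 1959]

# For each year, record (product, bottom, top) for every segment of the bar.
segments = {}
for year in years:
    bottom = 0
    segments[year] = []
    for product in ["A", "B", "C"]:  # assumed stacking order, bottom to top
        top = bottom + sales[product][year]
        segments[year].append((product, bottom, top))
        bottom = top

# The height of each bar is the top of its last segment (the yearly total).
totals = {year: segments[year][-1][2] for year in years}

print(totals)          # {1957: 60, 1958: 70, 1959: 90}
print(segments[1959])  # [('A', 0, 18), ('B', 18, 36), ('C', 36, 90)]
```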

Figure 3. Component bar chart of sales by product from 1957 to 1959 (vertical axis: sales in $; horizontal axis: year of production; segments: Product A, Product B, Product C)

III. Multiple Bar charts: These are used to display data on more than one variable. They are
used for comparing different variables at the same time.
Example: Draw a multiple bar chart to represent the sales by product from 1957 to 1959.

Figure 4. Multiple bar chart of sales by product from 1957 to 1959 (vertical axis: sales in $; horizontal axis: year of production; bars: Product A, Product B, Product C)

2.2.2.2 Graphical Presentation of Data


The histogram, frequency polygon and cumulative frequency graph (ogive) are the most commonly applied
graphical representations for continuous data.

Procedures for constructing statistical graphs:

• Draw and label the x and y axes.
• Choose a suitable scale for the frequencies or cumulative frequencies and label it on the y-axis.
• Represent the class boundaries (for the histogram or ogive) or the midpoints (for the frequency
  polygon) on the x-axis.
• Plot the points.
• Draw the bars or lines to connect the points.
i. Histogram: a graph that displays the data by using vertical bars of various heights to
represent frequencies. Class boundaries are placed along the horizontal axis. Class marks and
class limits are sometimes used as the quantity on the x-axis.
Example: Construct a histogram for the frequency distribution of the time spent by the automobile
workers. The frequency distribution is:

Time (class boundaries)   Class mark   Number of workers
15.5-21.5                 18.5         3
21.5-27.5                 24.5         6
27.5-33.5                 30.5         8
33.5-39.5                 36.5         4
39.5-45.5                 42.5         3
45.5-51.5                 48.5         1

Figure 5. The time in minutes spent by automobile workers to travel from home to work.


ii. Frequency polygon


A frequency polygon is a line graph. The frequencies are placed along the vertical axis and the class
midpoints along the horizontal axis. Two classes with zero frequency are added, one at each end of the
frequency distribution, so that the polygon is closed.

Example: Construct a frequency polygon for the frequency distribution of the time spent by the
automobile workers.

Figure 6: The time in minutes spent by automobile workers to travel from home to work.
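
The points of the frequency polygon for the automobile workers can be listed with a short Python sketch. It takes the class marks and frequencies from the example and adds one zero-frequency class at each end, one class width away.

```python
# Class marks and number of workers from the automobile workers example.
class_marks = [18.5, 24.5, 30.5, 36.5, 42.5, 48.5]
workers = [3, 6, 8, 4, 3, 1]

# Class width, taken as the distance between consecutive class marks.
width = class_marks[1] - class_marks[0]

# Add a zero-frequency class at each end so the polygon meets the x-axis.
points = (
    [(class_marks[0] - width, 0)]
    + list(zip(class_marks, workers))
    + [(class_marks[-1] + width, 0)]
)

for x, y in points:
    print(x, y)
```

Joining these points in order with straight lines gives the polygon, which starts at (12.5, 0) and ends at (54.5, 0).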

iii. Ogive (cumulative frequency polygon):


An ogive is a graph that plots the cumulative frequencies of a distribution against the class boundaries.
There are two types of ogive, namely the less than ogive and the more than ogive. The less than ogive is
plotted against the upper class boundaries and the more than ogive against the lower class boundaries.
That is, class boundaries are plotted along the horizontal axis and the corresponding cumulative
frequencies along the vertical axis. The points are joined by a freehand curve.
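
The points of both ogives for the automobile workers data can be sketched in Python. The class boundaries below are assumed to be equally spaced with width 6, as implied by the class marks 18.5, 24.5, ..., 48.5.

```python
from itertools import accumulate

# Class boundaries (assumed equal width of 6) and number of workers per class.
boundaries = [15.5, 21.5, 27.5, 33.5, 39.5, 45.5, 51.5]
workers = [3, 6, 8, 4, 3, 1]

# Running totals of the frequencies, starting from 0 at the first boundary.
cumulative = [0] + list(accumulate(workers))
n = cumulative[-1]

# Less than ogive: number of values below each boundary.
less_than = list(zip(boundaries, cumulative))

# More than ogive: number of values above each boundary.
more_than = list(zip(boundaries, [n - c for c in cumulative]))

print(less_than)
print(more_than)
```

The less than ogive rises from 0 at 15.5 to 25 at 51.5, while the more than ogive falls from 25 at 15.5 to 0 at 51.5.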
