Chapter 1 INTRODUCTION TO DATA
presentation of data to draw valid conclusions. Statistics is divided into two branches: Descriptive and
Inferential.
Descriptive Statistics: It involves scientific methods to collect and present information with
Statistical data are the raw facts of statistics. They may relate to an activity under study, a
phenomenon, or a situation of interest. Statistical data are derived through the process of
measuring, counting and/or observing. An activity or phenomenon that generates data through its
is one that takes on different values upon successive measurements. In statistics, data are
classified into two categories: quantitative data and qualitative data. This classification is based
Quantitative Data: These are data that can be expressed numerically or quantified in definite
units of measurement.
Examples: age of students taking STS 102, scores in the UTME exam, etc. These observations are
Depending on the nature of the variable observed for measurement, quantitative data can be
measurement. Examples include blood group, sex, and nationality. These data are further
DATA SOURCES
Primary Data: These are data collected directly from the respondent. They are regarded as
first-hand information collected by the researcher. Primary data can be obtained from, for example:
Census
Survey
Secondary Data: These are data that already exist in published or unpublished sources.
They are available from published source(s), though not necessarily in the form actually
required.
Journal publications
The method of data collection depends solely on the problem at hand. There are various methods
Interviewing
Questionnaire
Observation
Telephone
Data Presentation
A set of raw data collected are organized numerically for ease of analysis and
distribution. Presenting data in tables, charts, and graphs gives a clearer meaning to the data.
Basic Terms
Class Interval: A symbol defining a class, e.g. 60–62, is called a class interval. The end numbers,
60 and 62, are called class limits; the smaller number (60) is the lower class limit, and the larger
number (62) is the upper class limit.
Class Boundaries: The class boundaries are obtained by adding the upper limit of one class
interval to the lower limit of the next higher class interval and dividing by 2.
Class Width or Class Size: The size, or width, of a class interval is the difference between the
lower and upper class boundaries and is also referred to as the class width, class size, or class
length. If all class intervals of a frequency distribution have equal widths, this common width is
denoted by c. In such a case, c is equal to the difference between two successive lower class
limits or two successive upper class limits.
Class Mark: The class mark is the midpoint of the class interval and is obtained by adding the
class limits and dividing by 2. The class mark is also called the class midpoint.
Relative Frequency: The ratio of the number of times a value of the data occurs in the set of all
outcomes to the total number of outcomes. To find the relative frequencies, divide each frequency
by the total number of observations (here, students) in the sample, n.
Cumulative Frequency: The sum of the frequency of a particular class and the frequencies of
all the classes before it.
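The class-interval terms above (limits, boundaries, width and mark) can be illustrated with a short Python sketch. Only the interval 60–62 comes from the text; the two following intervals are hypothetical, and the sketch assumes the usual convention that a boundary lies midway between the upper limit of one class and the lower limit of the next.

```python
# Hypothetical consecutive class intervals as (lower limit, upper limit).
# Only 60-62 appears in the notes; 63-65 and 66-68 are assumed for illustration.
class_limits = [(60, 62), (63, 65), (66, 68)]

# Gap between the upper limit of one class and the lower limit of the next.
gap = class_limits[1][0] - class_limits[0][1]


def describe(lower, upper):
    """Return the boundaries, width and class mark of one class interval."""
    lower_boundary = lower - gap / 2   # midway to the previous class
    upper_boundary = upper + gap / 2   # midway to the next class
    return {
        "limits": (lower, upper),
        "boundaries": (lower_boundary, upper_boundary),
        "width": upper_boundary - lower_boundary,
        "mark": (lower + upper) / 2,   # midpoint of the class limits
    }


summary = [describe(lower, upper) for lower, upper in class_limits]
for row in summary:
    print(row)
```

For 60–62 this gives boundaries 59.5 and 62.5, a width of 3 and a class mark of 61. The width also equals the difference between successive lower limits (63 - 60).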
Frequency Distribution
Ungrouped frequency: This is used for quantitative data sets. It works best when the range of the
data is less than 10 units. Range is the difference between the largest data value and the smallest
data value. For example, twenty students were asked how many hours they worked per day.
5; 6; 3; 3; 2; 4; 8; 5; 2; 3; 5; 6; 5; 4; 4; 3; 5; 2; 5; 3.
Range = 8 - 2 = 6
Since the range is 6, we will keep each data value separate and not group them together.
Creating an ungrouped frequency distribution is a simple task. Place the data values, from smallest
to largest without skipping any values, in the first column. Place the frequency, the count of
The table below shows the different data values in ascending order and their frequencies. Notice
that every value from 2 to 8 is listed, including 7, which does not appear in the original data set.
Data Values Frequency(f)
2 3
3 5
4 3
5 6
6 2
7 0
8 1
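The table above can be reproduced with a few lines of Python. This is a minimal sketch rather than a prescribed method; it also adds relative and cumulative frequency columns, assuming cumulative frequency means a running total of the frequencies up to and including each value.

```python
from collections import Counter
from itertools import accumulate

# Hours worked per day by the twenty students in the example.
hours = [5, 6, 3, 3, 2, 4, 8, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

counts = Counter(hours)

# List every value from smallest to largest without skipping any.
values = list(range(min(hours), max(hours) + 1))

# A Counter returns 0 for values that never occur (here, 7).
freqs = [counts[v] for v in values]

n = len(hours)
rel_freqs = [f / n for f in freqs]       # each frequency divided by n
cum_freqs = list(accumulate(freqs))      # running total of frequencies

print(f"{'Value':>5} {'f':>3} {'Rel. f':>7} {'Cum. f':>7}")
for v, f, r, c in zip(values, freqs, rel_freqs, cum_freqs):
    print(f"{v:>5} {f:>3} {r:>7.2f} {c:>7}")
```

The relative frequencies sum to 1 and the last cumulative frequency equals n = 20, which gives two quick checks on the tally.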
Grouped frequency: This second type of frequency distribution is also used for quantitative data.
However, it is used when the range is large and the data values need to be grouped together. For example,
28 students were asked how many hours they worked per week. Their responses, in hours, are as
follows:
15; 26; 13; 33; 22; 14; 27; 15; 32; 23; 5; 26; 25; 14; 34; 13; 15; 22; 15; 28; 10; 18; 21; 24; 20; 18;
34; 20.
Here there are too many different data values to list them separately as in the ungrouped
frequency distribution. Notice the range is 29 (highest – lowest = 34 – 5). Therefore we need to
construct a grouped frequency distribution and group data values into classes.
A class is an interval where the lowest value of the interval is known as the lower limit and the
1. Find the range (R): highest data value – lowest data value.
2. Determine the number of classes (C); usually the minimum is 5 classes and the maximum is 20
classes.
There are several suggested guidelines to help one decide how many classes to use, for example:
(a) C = 1 + 3.322 log10(n), where n is the number of observations
3. Determine the width of the class interval (W), given as W = R/C, where R is the range of values
and C is the number of classes.
5. Create the other lower limits of the classes by adding the class width to the previous lower
limit.
7. Determine the number of observations falling into each class interval, i.e. find the class
frequencies.
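The steps above can be sketched in Python using the 28 weekly-hours responses listed earlier. Several choices here are assumptions rather than rules from the notes: W = R/C is rounded up to a whole number, the first lower limit is taken to be the smallest data value, the data are whole numbers (so each upper limit is the next lower limit minus 1), and classes are added until the largest value is covered. The function names are hypothetical.

```python
import math


def sturges_classes(n):
    """Suggested number of classes from C = 1 + 3.322 * log10(n), rounded."""
    return round(1 + 3.322 * math.log10(n))


def grouped_frequency(data, num_classes):
    """Return a list of (lower limit, upper limit, frequency) for integer data."""
    data_range = max(data) - min(data)                    # range R
    width = max(1, math.ceil(data_range / num_classes))   # W = R / C, rounded up
    table = []
    lower = min(data)              # assumption: start at the smallest value
    while lower <= max(data):      # keep adding classes until the maximum is covered
        upper = lower + width - 1  # integer data: next class starts at upper + 1
        frequency = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, frequency))
        lower += width             # next lower limit = previous lower limit + W
    return table


hours_per_week = [15, 26, 13, 33, 22, 14, 27, 15, 32, 23, 5, 26, 25, 14,
                  34, 13, 15, 22, 15, 28, 10, 18, 21, 24, 20, 18, 34, 20]

for lower, upper, frequency in grouped_frequency(hours_per_week, 5):
    print(f"{lower}-{upper}: {frequency}")
```

With 5 classes the range of 29 gives a width of 6 and the classes 5-10, 11-16, 17-22, 23-28 and 29-34. Because the loop runs until the largest value is covered, the number of classes produced can differ from the number requested for other data sets.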
Example 1: The following are the marks of 50 students in STS 102:
48 70 60 47 51 55 59 63 68 63 47 53 72 53 67 62 64 70 57 56
48 51 58 63 65 62 49 64 53 59 63 50 61 67 72 56 64 66 49 52
61 71 58 53 63 69 59 64 73 56.
(iii) what is the probability that a student selected at random from the class will
Solution:
Range = 73 - 47 = 26
Frequency Table
Class interval   Tally          Frequency
47-50            |||| ||        7
51-54            |||| ||        7
55-58            |||| ||        7
59-62            |||| |||       8
63-66            |||| |||| |    11
67-70            |||| |         6
71-74            ||||           4
Total                           50
b. i. 7+7+8 = 22
ii. 7+7+8+11+6+4= 43
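The class frequencies for Example 1 can be checked mechanically. The sketch below assumes seven classes of width 4 starting at 47 (47-50 up to 71-74, as in the frequency table) and assigns each mark to a class by integer division; it also computes the cumulative frequencies.

```python
from itertools import accumulate

# Marks of the 50 students in Example 1.
marks = [48, 70, 60, 47, 51, 55, 59, 63, 68, 63, 47, 53, 72, 53, 67, 62, 64, 70, 57, 56,
         48, 51, 58, 63, 65, 62, 49, 64, 53, 59, 63, 50, 61, 67, 72, 56, 64, 66, 49, 52,
         61, 71, 58, 53, 63, 69, 59, 64, 73, 56]

# Assumed class layout: 47-50, 51-54, ..., 71-74.
first_lower, width, num_classes = 47, 4, 7

freqs = [0] * num_classes
for mark in marks:
    # Index of the class that contains this mark.
    freqs[(mark - first_lower) // width] += 1

cum_freqs = list(accumulate(freqs))   # running totals of the class frequencies

for i, (f, cf) in enumerate(zip(freqs, cum_freqs)):
    lower = first_lower + i * width
    print(f"{lower}-{lower + width - 1}: f = {f}, cumulative f = {cf}")
```

The frequencies add up to 50. Under the assumption that the two sums in part (b) cover the classes 51-62 and 51-74 respectively, `sum(freqs[1:4])` and `sum(freqs[1:])` reproduce 22 and 43.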
Example 2: Twenty-eight students were asked how many hours they worked per week. Their
responses, in hours, are as follows: 15; 26; 13; 33; 22; 14; 27; 15; 32; 23; 5; 26; 25; 14; 34; 13;
15; 22; 15; 28; 10; 18; 21; 24; 20; 18; 34; 20. Construct a grouped frequency distribution using 5
classes.
Solution:
1. Range = 34 – 5 = 29
2. Use 5 classes
5. The other lower limits will be 11, 17, 23 and 29, obtained by adding the class width of 6 to the
previous lower limit.
6. The first upper limit will be 10 since the next class begins at 11. Using class width again, the
Class interval   Tally        Frequency
5-10             ||           2
11-16            |||| |||     8
17-22            |||| ||      7
23-28            |||| ||      7
29-34            ||||         4
Total                         28
ASSIGNMENT 1
The following data represent the ages (in years) of people living in a housing estate
in Abeokuta.
18 31 30 6 16 17 18 43 2 8 32 33 9 18 33 19 21 13 13 14
14 6 52 45 61 23 26 15 14 15 14 27 36 19 37 11 12 11 20 12
39 20 40 69 63 29 64 27 15 28.
Present the above data in a frequency table showing the following columns: class
interval, class boundary, class mark (mid-point), tally, frequency and cumulative
ASSIGNMENT 2
The grade points of 40 students are given below. Using 8 classes, construct a frequency
48 70 60 47 51 55 59 63 68 63 47 53 72 53 67 62 64 70 57 56
48 51 58 63 65 62 49 64 53 59 63 50 61 67 72 56 64 66 49 52