
Chapter 1 INTRODUCTION TO DATA

Notes for computer science

Uploaded by zarahrasheed1
Copyright © All Rights Reserved

Statistics is the scientific method of collecting, analyzing, summarizing, presenting, and interpreting data in order to draw valid conclusions. Statistics is divided into two branches: descriptive and inferential.

Descriptive Statistics: This involves scientific methods for collecting and presenting information with graphs and numerical values.

Inferential Statistics: This involves using probability to generalize from a sample drawn from a larger population in order to draw conclusions about that population.

DATA AND DATA SOURCES

Statistical data are the raw facts of statistics. They may relate to an activity under study, a phenomenon, or a situation of interest, and they are obtained by measuring, counting, and/or observing. An activity or phenomenon that generates data is termed a variable; in other words, a variable is a characteristic that takes on different values in successive measurements. In statistics, data are classified into two categories, quantitative data and qualitative data, according to the kind of characteristic being measured.

Quantitative Data: These are data that can be expressed numerically or quantified in definite

units of measurement.

Examples: the ages of students taking STS 102, scores in the UTME, etc. These observations are expressed as numbers.

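Depending on the nature of the variable being measured, quantitative data can be further categorized as discrete or continuous. Discrete data take countable values (e.g., the number of students in a class), while continuous data can take any value within an interval (e.g., height or weight).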


Qualitative Data: These data cannot be expressed as numbers or quantified in units of measurement. Examples include blood group, sex, nationality, etc. These data are further classified as nominal or ranked (ordinal) data.

DATA SOURCES

Sources of data are divided into two types: primary and secondary.

Primary Data: These are data collected directly from respondents. They are regarded as first-hand information collected by the researcher. Primary data can be obtained from, for example:

• Census

• Survey

Secondary Data: These are data that already exist in published or unpublished sources. They may not necessarily be in the form actually required by the researcher.

Sources of secondary data include:

• Journal publications

• Research or media organizations

Methods of Data Collection

The method of data collection depends on the problem at hand. Common methods of data collection include:

• Interviewing

• Questionnaire

• Observation

• Telephone

Data Presentation

Raw data, once collected, are organized numerically for ease of analysis and presentation. This is done by constructing a frequency table, also known as a frequency distribution. Presenting data in tables, charts, and graphs gives the data a clearer meaning.

Basic Terms

Class Interval: A symbol defining a class, such as 60–62, is called a class interval. The end numbers, 60 and 62, are called class limits; the smaller number (60) is the lower class limit, and the larger number (62) is the upper class limit.

Class Boundaries: The class boundaries are obtained by adding the upper limit of one class interval to the lower limit of the next-higher class interval and dividing by 2.

Class Width or Class Size: The width of a class interval is the difference between its upper and lower class boundaries. It is also referred to as the class size or class length. If all class intervals of a frequency distribution have equal widths, this common width is denoted by c. In that case, c equals the difference between two successive lower class limits or between two successive upper class limits.

Class Mark: The class mark is the midpoint of the class interval and is obtained by adding the lower and upper class limits and dividing by 2. The class mark is also called the class midpoint.
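The definitions of class boundaries, class width, and class mark can be checked with a short Python sketch, using the class limits from the Example 1 marks table. It assumes whole-number data, so the first lower boundary and the last upper boundary are taken half a unit beyond the outer class limits (the notes only define boundaries between neighbouring classes). The helper name `class_summary` is hypothetical, chosen for illustration.

```python
# Class boundaries, class width and class marks for whole-number class limits.
# Assumption: data are whole numbers, so neighbouring classes are 1 unit apart
# and the outermost boundaries lie half a unit beyond the outer limits.

def class_summary(limits):
    """limits: list of (lower, upper) class limits in ascending order."""
    rows = []
    for i, (lower, upper) in enumerate(limits):
        # Lower boundary: average of the previous upper limit and this lower limit.
        if i > 0:
            lower_b = (limits[i - 1][1] + lower) / 2
        else:
            lower_b = lower - 0.5  # assumed half-unit gap for the first class
        # Upper boundary: average of this upper limit and the next lower limit.
        if i < len(limits) - 1:
            upper_b = (upper + limits[i + 1][0]) / 2
        else:
            upper_b = upper + 0.5  # assumed half-unit gap for the last class
        width = upper_b - lower_b   # class width = upper boundary - lower boundary
        mark = (lower + upper) / 2  # class mark = midpoint of the class limits
        rows.append((lower, upper, lower_b, upper_b, width, mark))
    return rows

# Class limits from the STS 102 marks table in Example 1.
limits = [(47, 50), (51, 54), (55, 58), (59, 62), (63, 66), (67, 70), (71, 74)]
for row in class_summary(limits):
    print("%d-%d  boundaries %.1f-%.1f  width %.0f  mark %.1f" % row)
```

For the class 47-50 this gives boundaries 46.5 and 50.5, a width of 4, and a class mark of 48.5; every class in that table has the same width, so c = 4.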

Frequency: The frequency is the number of times a data value occurs.


Relative Frequency: A relative frequency is the ratio (fraction or proportion) of the number of times a data value occurs to the total number of outcomes. To find the relative frequencies, divide each frequency by the total number of observations in the sample, n.

Cumulative Frequency: The cumulative frequency of a class is the sum of its frequency and the frequencies of all the classes before it.
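As an illustration of relative and cumulative frequency, the following minimal Python sketch uses the class frequencies from the Example 1 marks table (n = 50). The running totals come from the standard-library `itertools.accumulate`.

```python
from itertools import accumulate

# Class frequencies from the Example 1 table (marks of 50 students in STS 102).
classes = ["47-50", "51-54", "55-58", "59-62", "63-66", "67-70", "71-74"]
freq = [7, 7, 7, 8, 11, 6, 4]

n = sum(freq)                      # total number of observations
rel_freq = [f / n for f in freq]   # relative frequency = f / n
cum_freq = list(accumulate(freq))  # running total of the frequencies

for c, f, rf, cf in zip(classes, freq, rel_freq, cum_freq):
    print(f"{c}  f={f:2d}  rf={rf:.2f}  cf={cf}")
```

The relative frequencies sum to 1, and the last cumulative frequency equals n. The cumulative frequency of the class 59-62 (29) is the number of students scoring less than 63.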

Frequency Distribution

Frequency distributions are classified as grouped or ungrouped.

Ungrouped Frequency Distribution: This is used mainly for quantitative data sets and works best when the range of the data is less than 10 units. The range is the difference between the largest and the smallest data values. For example, twenty students were asked how many hours they worked per day. Their responses, in hours, are as follows:

5; 6; 3; 3; 2; 4; 8; 5; 2; 3; 5; 6; 5; 4; 4; 3; 5; 2; 5; 3.

Range = 8 - 2 = 6

Since the range is only 6, we keep each data value separate rather than grouping them together. Creating an ungrouped frequency distribution is simple: list the data values from smallest to largest in the first column, without skipping any values, and place the frequency (the count of each data value) in the corresponding row of the second column.

The table below shows the data values in ascending order with their frequencies. Notice that every value in the range is listed, including 7, which does not appear in the original data set.
Data Value    Frequency (f)
2             3
3             5
4             3
5             6
6             2
7             0
8             1
Total         20

Frequency distribution of students' work hours
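The ungrouped frequency distribution above can be reproduced with a minimal Python sketch using the standard-library `collections.Counter`. Looping over every value from the minimum to the maximum, rather than only the values that occur, is what makes 7 appear with a frequency of 0.

```python
from collections import Counter

# Hours worked per day by the twenty students in the ungrouped example.
hours = [5, 6, 3, 3, 2, 4, 8, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

counts = Counter(hours)
data_range = max(hours) - min(hours)  # small range, so the data stay ungrouped

# List every value from smallest to largest without skipping any;
# Counter returns 0 for a value that never occurs (here, 7).
table = [(value, counts[value]) for value in range(min(hours), max(hours) + 1)]

print("Data Value  Frequency (f)")
for value, f in table:
    print(f"{value:>10}  {f:>13}")
```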

Grouped Frequency Distribution

This second type of frequency distribution is also used for quantitative data, specifically when the range is large and the data values need to be grouped together. For example, 28 students were asked how many hours they worked per week. Their responses, in hours, are as follows:

15; 26; 13; 33; 22; 14; 27; 15; 32; 23; 5; 26; 25; 14; 34; 13; 15; 22; 15; 28; 10; 18; 21; 24; 20; 18; 34; 20.

Here there are too many distinct data values to list them separately, as we did in the ungrouped frequency distribution. The range is 29 (highest - lowest = 34 - 5), so we construct a grouped frequency distribution by grouping the data values into classes.

A class is an interval whose lowest value is called the lower limit and whose highest value is called the upper limit.

Guidelines for classes:

• There should be between 5 and 20 classes.

• Classes must be mutually exclusive (no overlap of data values).

• Classes must be all-inclusive (every data value falls in some class) and continuous (no gaps between classes).

• Classes must be equal in width.

Constructing a Grouped Frequency Distribution:

1. Find the range (R): highest data value - lowest data value.

2. Determine the number of classes (C), usually a minimum of 5 and a maximum of 20 classes.

There are several suggested guidelines to help decide how many class intervals to use. Two such methods are:

(a) $C = 1 + 3.322\log_{10} n$ (Sturges' rule)

(b) $C = \sqrt{n}$

where n is the number of observations.

3. Determine the width of the class interval (W), given as $W = \frac{R}{C}$, where R is the range of values and C is the number of classes.

Note: The class width is always rounded up, never down, so that the chosen number of classes covers the whole range of the data.

4. Choose the first lower limit (usually the lowest data value).

5. Create the other lower limits of the classes by adding the class width to the previous lower limit.

6. Create the upper limits so that the classes do not overlap; for whole-number data, each upper limit is one less than the next lower limit.

7. Count the number of observations falling into each class interval, i.e., find the class frequencies.
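The seven steps above can be sketched in Python as follows. This is a minimal sketch assuming whole-number data; the function names `suggested_classes` and `grouped_frequency` are hypothetical, chosen for illustration. The width is rounded up with integer division (R // C + 1). When R / C is not a whole number this is the usual round-up; when it is a whole number it adds 1, so that the largest value still falls in the last class. That adjustment is an assumption beyond the notes. The demonstration uses the 28 weekly-hours responses above with 5 classes.

```python
import math

def suggested_classes(n):
    """Rules of thumb for the number of classes: (a) 1 + 3.322 log10(n), (b) sqrt(n)."""
    return 1 + 3.322 * math.log10(n), math.sqrt(n)

def grouped_frequency(data, num_classes):
    """Build (lower limit, upper limit, frequency) rows for whole-number data."""
    data_range = max(data) - min(data)  # step 1: R = highest - lowest
    # Step 3: W = R / C rounded up. Integer division plus 1 rounds up and, if R / C
    # is already whole, still leaves room for the largest value (an assumption).
    width = data_range // num_classes + 1
    lower = min(data)                   # step 4: first lower limit
    rows = []
    for _ in range(num_classes):        # step 2: C classes
        upper = lower + width - 1       # step 6: upper limit, no overlap
        freq = sum(1 for x in data if lower <= x <= upper)  # step 7: class frequency
        rows.append((lower, upper, freq))
        lower += width                  # step 5: next lower limit
    return width, rows

# The 28 weekly-hours responses from the grouped frequency example.
hours = [15, 26, 13, 33, 22, 14, 27, 15, 32, 23, 5, 26, 25, 14,
         34, 13, 15, 22, 15, 28, 10, 18, 21, 24, 20, 18, 34, 20]

sturges, root_n = suggested_classes(len(hours))
print(f"Sturges: {sturges:.2f}, sqrt(n): {root_n:.2f}")

width, rows = grouped_frequency(hours, 5)
print("Class width:", width)
for lower, upper, freq in rows:
    print(f"{lower}-{upper}  {freq}")
```

With 5 classes the sketch gives a width of 6 and the classes 5-10, 11-16, 17-22, 23-28, and 29-34. Applied to the 50 marks in Example 1 with 7 classes, the same steps give a width of 4 and classes starting at 47.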

Example 1: The following are the marks of 50 students in STS 102:

48 70 60 47 51 55 59 63 68 63 47 53 72 53 67 62 64 70 57 56

48 51 58 63 65 62 49 64 53 59 63 50 61 67 72 56 64 66 49 52

61 71 58 53 63 69 59 64 73 56.

(a) Construct a frequency table for the above data.

(b) Answer the following questions using the table obtained:

(i) How many students scored between 51 and 62?

(ii) How many students scored above 50?

(iii) What is the probability that a student selected at random from the class will score less than 63?

Solution:

(a) Range: $R = \text{largest value} - \text{smallest value} = 73 - 47 = 26$

Number of classes: $C = \sqrt{n} = \sqrt{50} = 7.07 \approx 7$

Class size or width: $W = \frac{R}{C} = \frac{26}{7} = 3.71 \approx 4$ (rounded up)

Frequency Table

Marks    Tally           Frequency (f)
47-50    |||| ||         7
51-54    |||| ||         7
55-58    |||| ||         7
59-62    |||| |||        8
63-66    |||| |||| |     11
67-70    |||| |          6
71-74    ||||            4
Total                    50

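(b) (i) Students scoring between 51 and 62 (classes 51-54, 55-58 and 59-62): 7 + 7 + 8 = 22

(ii) Students scoring above 50 (all classes from 51-54 upward): 7 + 7 + 8 + 11 + 6 + 4 = 43

(iii) Students scoring less than 63 (classes 47-50 to 59-62): 7 + 7 + 7 + 8 = 29

Total number of students = 50

P(score less than 63) = 29/50 = 0.58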
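Because the class limits in the table line up with the cut-off points in part (b), the same answers can be obtained by counting the raw marks directly. The following is a minimal Python sketch that reads "between 51 and 62" as including both ends, as the table-based answer does; `fractions.Fraction` keeps the probability exact.

```python
from fractions import Fraction

# Marks of the 50 STS 102 students in Example 1.
marks = [48, 70, 60, 47, 51, 55, 59, 63, 68, 63, 47, 53, 72, 53, 67, 62, 64, 70, 57, 56,
         48, 51, 58, 63, 65, 62, 49, 64, 53, 59, 63, 50, 61, 67, 72, 56, 64, 66, 49, 52,
         61, 71, 58, 53, 63, 69, 59, 64, 73, 56]

n = len(marks)
between_51_62 = sum(1 for m in marks if 51 <= m <= 62)  # (i), inclusive of 51 and 62
above_50 = sum(1 for m in marks if m > 50)               # (ii)
below_63 = sum(1 for m in marks if m < 63)               # (iii), numerator
p_below_63 = Fraction(below_63, n)                       # exact probability

print(n, between_51_62, above_50, below_63, p_below_63, float(p_below_63))
```

Counting the raw data agrees with the grouped table here only because no class straddles a cut-off; if a question's bounds fell inside a class, the table alone could not give an exact count.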

Example 2: Twenty-eight students were asked how many hours they worked per week. Their responses, in hours, are as follows: 15; 26; 13; 33; 22; 14; 27; 15; 32; 23; 5; 26; 25; 14; 34; 13; 15; 22; 15; 28; 10; 18; 21; 24; 20; 18; 34; 20. Construct a grouped frequency distribution using 5 classes.

Solution:

1. Range = 34 - 5 = 29

2. Use 5 classes.

3. Class width = 29/5 = 5.8, rounded up to 6.

4. The first lower limit is 5, the minimum data value.

5. The other lower limits are 11, 17, 23 and 29, obtained by adding the class width of 6 to each previous lower limit.

6. The first upper limit is 10, since the next class begins at 11. Adding the class width again gives the other upper limits: 16, 22, 28 and 34.

Class    Tally         Frequency (f)
5-10     ||            2
11-16    |||| |||      8
17-22    |||| ||       7
23-28    |||| ||       7
29-34    ||||          4
Total                  28

ASSIGNMENT 1

The following data represent the ages (in years) of people living in a housing estate in Abeokuta.
18 31 30 6 16 17 18 43 2 8 32 33 9 18 33 19 21 13 13 14
14 6 52 45 61 23 26 15 14 15 14 27 36 19 37 11 12 11 20 12
39 20 40 69 63 29 64 27 15 28.
Present the above data in a frequency table with the following columns: class interval, class boundary, class mark (midpoint), tally, frequency, and cumulative frequency.

ASSIGNMENT 2

The grade points of 40 students are given below. Using 8 classes, construct a frequency distribution and a relative frequency distribution.

48 70 60 47 51 55 59 63 68 63 47 53 72 53 67 62 64 70 57 56

48 51 58 63 65 62 49 64 53 59 63 50 61 67 72 56 64 66 49 52
