GEC4 Mathematics in The Modern World CHAPTER 4

Chapter 4 covers data management, focusing on methods for collecting, organizing, and presenting data, as well as the importance of statistics in decision-making. It differentiates between descriptive and inferential statistics, outlines types of data and sampling methods, and discusses data organization through frequency distributions and various presentation techniques. The chapter aims to equip students with the skills to analyze and interpret data effectively using appropriate summary measures.

Uploaded by

dondon manlangit

Chapter 4

Data Management
Learning Outcomes:

At the end of the chapter, the students shall be able to:

1. differentiate the methods of collecting, organizing, and presenting data;
2. compare the different summary measures;
3. choose the most appropriate summary measures to be used in a given situation or data set;
4. analyze and interpret data using several summary measures; and
5. use the methods of linear regression and correlation to predict the value of a variable
given certain conditions.

Introduction

What is data management? In a broader sense, it encompasses a variety of techniques that
facilitate and ensure data control and flow, from creation to processing, utilization, and
deletion. In a simpler sense, it deals with how researchers explain and interpret the gathered
data. It covers basic concepts in statistics that we encounter in our daily life. It is also an
aid in decision making, particularly for a company that relies on its consumers for its
advancement.

What is statistics? What is its importance in our everyday living?

Statistics is a branch of science that deals with the collection, presentation, analysis, and
interpretation of numerical data. It provides procedures for collecting, presenting, analyzing,
and interpreting gathered data that are useful to business decision-makers (Sirug, 2015).

4.1 Preliminaries

AREAS OF STATISTICS

There are two main areas of statistics: descriptive and inferential statistics.

Descriptive statistics deals with collecting, presenting, and analyzing data; its primary
purpose is only to describe the characteristics of the population or sample under
investigation.

Inferential statistics is the logical process of moving from the analysis of data to
generalizations or conclusions about the population under study. From the data gathered from a
sample, it arrives at a conclusion about a particular population.

TYPES OF DATA

There are two basic types of data: qualitative and quantitative data.
GEC 4 Mathematics in the Modern World , First Semester,AY 2021-2022
Quantitative data (also termed numerical data) are data that involve quantities obtained from
counting or measurement. Their values differ in degree.
Examples are height, weight, number of employees, salary, etc.

Quantitative data can also be classified as either discrete or continuous. Discrete data are
data that can be counted, like the number of students or the number of likes and shares of a
Facebook post. Continuous data are data obtained through measurement, like weight, the length of
your hair, or the thickness of your eyeglasses.

Qualitative data (also termed as categorical data) is the data that involves qualities which
cannot be measured. Examples are sex, nationality, color of skin, religion, etc.

Levels of Measurement
Data can be classified according to the levels of measurement. This classification includes
nominal, ordinal, interval and ratio data.

1. Nominal level is the lowest level of data and is used purely for classification and
identification purposes. Examples of this level are gender, house number, home
ownership, etc.

2. Ordinal level is used in ranking. It is a stronger form of measurement than the nominal
level. However, it is still considered a weak level because no meaningful numerical
statement can be made about the differences between the ranks. Examples are honor ranks,
academic rank, level of awareness, etc.

3. Interval level specifies the precise difference between or among the values, but it has
no true zero point. An example is temperature in degrees Celsius.

4. Ratio level has the same characteristics as the interval level; in addition, it has a
true zero point, so ratios between values are meaningful. Its values are expressed in
units of measure. Examples are height, weight, and monthly income.

4.2 Data Collection, Organization, Presentation

4.2.1 Data Collection

The following are some methods of collecting data:

1. Survey Method. The researcher asks people, or the respondents, questions relative to the
research topic, either personally or through questionnaires sent through courier or
electronically. The interview is one technique under this method. A survey may be
classified into two types according to its coverage of respondents:
a. Census – the entire population is covered.
b. Sample Survey – makes use of only a segment of the population. Because a relatively
smaller group is covered compared with a census, this method can be more detailed. It is
also usually less expensive. Data may be collected through personal interview,
self-administered questionnaire, or the internet.

2. Observation Method. In this method, the researcher observes individual subjects or groups
of individuals to obtain information relative to the objectives of the investigation. It
is important to note that the behavior of the subjects is recorded only as it occurs. This
method is commonly used in sociology, anthropology, psychology, psychiatry, and other
fields.

3. Experimentation. This method of data collection involves the researcher’s intervention in
the conditions that may affect the outcome of the variables of interest.

4. Use of Existing Records/Sources. This method makes use of documentary sources such as
published articles, journals, unpublished materials, reports, and others. Experts may also
be considered here as sources.

Basic Sampling Method

As mentioned earlier, data gathering is usually less expensive when only a segment of the
population, or sample, is considered. Apart from economic reasons, that is, saving money, time,
and effort, gathering data from a sample is easier and, at times, more practical. The following
are sampling techniques that may be used.

A. Probability Sampling - each unit in the population has a known probability of selection, and a
random number table or other randomization mechanism is used to choose the specific units
to be included in the sample
- a relatively small sample can be used to make inferences about an arbitrarily
large population

1. Simple Random Sampling (SRS). This is the simplest form of probability sampling, wherein
all the elements of the population have equal chances of being selected as part of the
sample. It usually serves as the foundation of more complex sampling designs.

a. Without replacement - every possible subset of n distinct units in the population has
the same probability of being selected as the sample
- there are $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$ possible samples
- the probability of selecting any individual sample S of n units is
$P(S) = \frac{1}{\binom{N}{n}} = \frac{n!\,(N-n)!}{N!}$
- the probability that the ith unit appears in the sample is $\pi_i = n/N$
b. With replacement - on each draw, every element of the population has probability 1/N of
being chosen
- the sample may include duplicates

2. Systematic Sampling - a starting point is chosen from a list of population members using a
random number
- every kth unit thereafter is chosen to be in the sample. If N/n is an integer, let k =
N/n; otherwise, let k be the next integer after N/n.
- the sample consists of units that are equally spaced in the list
- the sample may not be representative of the population, for example when the list has a
periodic pattern

3. Stratified Random Sampling. In this sampling method, the elements are divided into
subgroups called strata. Then a random sample of units is taken from each
stratum. Elements in the same stratum often tend to be more similar than
randomly selected elements from the whole population, so stratification often
increases precision.

4. Cluster Sampling. Here, observation units in the population are aggregated into larger
sampling units called clusters. Clusters are selected at random, and all members of each
selected cluster are included in the sample.


Illustration:

Suppose you want to estimate the average amount of time that professors at CSU
say they spent grading homework in a specific week.

• To take an SRS, construct a list of all professors and randomly select n of them
to be your sample. Then ask each professor in your sample how much time he or she spent
grading homework that week. You would, of course, have to define the words homework
and grading carefully in your questionnaire.

• In a stratified sample, you might classify faculty by college: engineering, CAS,
CBA, CHS, CoEd, CIT, and CICT. You would then take an SRS of faculty in the
engineering college, a separate SRS of faculty in CAS, and so on.

• For a cluster sample, you might randomly select 3 of the 10 academic
departments in the university and ask each faculty member in those departments how
much time he or she spent grading homework.

• A systematic sample could be chosen by selecting an integer at random
between 1 and 20; if the random integer is 16, say, then you would include professors in
positions 16, 36, 56, and so on, in the list of faculty members of the university.
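
The four sampling designs in the illustration can be sketched in Python. This is only a minimal sketch under stated assumptions: the roster of 10 departments with 20 professors each is hypothetical, the helper function names are our own, and departments are used both as strata and as clusters purely for convenience.

```python
import math
import random


def simple_random_sample(population, n, rng):
    """SRS without replacement: every subset of n units is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, k, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    start = rng.randrange(k)  # 0-based index, i.e. list position start + 1
    return population[start::k]


def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS of n_per_stratum units from each stratum."""
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit inside them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]


rng = random.Random(2021)  # fixed seed so the sketch is repeatable

# Hypothetical roster (an assumption): 10 departments with 20 professors each.
departments = {
    f"Dept{d}": [f"Prof{d}-{i}" for i in range(1, 21)] for d in range(1, 11)
}
faculty = [prof for members in departments.values() for prof in members]

srs = simple_random_sample(faculty, 10, rng)
systematic = systematic_sample(faculty, 20, rng)   # k = 20, as in the illustration
stratified = stratified_sample(departments, 2, rng)
cluster = cluster_sample(departments, 3, rng)      # 3 of the 10 departments

# Number of distinct SRS samples of size 10 from N = 200: N! / (n! (N - n)!)
possible_samples = math.comb(len(faculty), 10)
```

With N = 200 and k = 20, the systematic sample always contains 10 professors, while the cluster sample contains all 60 professors of the three selected departments.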

B. Non-Probability Sampling - not all units in the population have a chance of being selected as
part of the sample

1. Convenience Sampling - sampling done based on the convenience of the researcher.

2. Purposive Sampling - sampling based on an expert's opinion or judgment. It is usually done
in clinical and experimental research.

3. Quota Sampling - a quota, or a certain number of sample units, is identified based on the
literature.

4.2.2 Organizing Data

Frequency Distribution. This organizes raw data in table form, using classes and frequencies
or counts. Each raw data value is placed into a quantitative or qualitative category called
a class. The frequency of a class is the number of data values contained in that specific
class.

A frequency distribution is used to:

• organize the data in a meaningful, intelligible way.
• enable the reader to determine the nature or shape of the distribution.
• facilitate computational procedures for measures of average and spread.
• enable the researcher to draw charts and graphs for the presentation of data.
• enable the reader to make comparisons among different data sets.

Types of Frequency Distribution

1. Categorical Frequency Distributions - used for data that can be placed in specific categories,
such as nominal or ordinal level data.

Example 1:

Distribution of Blood Types

Twenty-five army inductees were given a blood test to determine their blood type.
The data set is:

A    B    B    AB   O
O    O    B    AB   B
B    B    O    A    O
A    O    O    O    AB
AB   A    O    B    A

Construct a frequency distribution for the data.

The frequency distribution corresponding to the given data is as follows.

Class    Tally         Frequency    Percent
A        ||||| 5            20
B        ||||| ||      7            28
O        ||||| ||||    9            36
AB       ||||          4            16
TOTAL                  25           100

2. Ungrouped Frequency Distribution - used for data whose range of values is relatively small.
The single data values are used as classes.

Example 2:

MPGs for SUVs


The data shown here represent the number of miles per gallon (mpg) that 30
selected four-wheel-drive sports utility vehicles obtained in city driving. Construct a
frequency distribution, and analyze the distribution.

12 17 12 14 16 18
16 18 12 16 17 15
15 16 12 15 16 16
12 14 15 12 15 15
19 13 16 18 16 14

Below is the corresponding frequency distribution.
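
The tally for this example can also be reproduced programmatically. The following is a minimal sketch using Python's standard `collections.Counter`; the variable names and the added percent column are our own choices, not part of the original example.

```python
from collections import Counter

# The 30 city-driving mpg values from Example 2.
mpg = [12, 17, 12, 14, 16, 18,
       16, 18, 12, 16, 17, 15,
       15, 16, 12, 15, 16, 16,
       12, 14, 15, 12, 15, 15,
       19, 13, 16, 18, 16, 14]

frequency = Counter(mpg)  # each single data value is used as a class
n = len(mpg)

print(f"{'Class (mpg)':<12}{'Frequency':>10}{'Percent':>9}")
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value:<12}{count:>10}{100 * count / n:>9.1f}")
print(f"{'TOTAL':<12}{n:>10}{100.0:>9.1f}")
```

In this data set, 16 mpg is the most frequent value (8 of the 30 vehicles), and most values fall between 12 and 16 mpg.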

3. Grouped Frequency Distributions - used for data that has a very large range. Data are
grouped into classes that are more than 1 unit in width.



4.2.3 Presentation of Data

Tabular Form - an effective device for presenting both qualitative and quantitative data
- makes it possible to compare and draw relationships between and among variables

The following are the parts of a table:

1. Table Heading - table number and title
2. Body - contains the quantitative data
3. Stubs - labels that classify values of variables
4. Box Heads - captions above the columns
5. Footnote
6. Source Note

Exploratory/Graphical Method - uses visual representations of data, such as graphs, to describe the data

1. Histograms - use bars of various heights to represent the frequencies.

- Draw and label the x and y axes, with the class boundaries on the x axis and the
frequencies on the y axis.
- Using the frequencies as the heights, draw vertical bars for each class.

Since the class boundaries are used in the graph, the bars in a histogram are
contiguous, unlike those in a bar or column chart.

Example: Record high temperatures for each of the 50 states.

Class Boundaries    Frequency
99.5 – 104.5        2
104.5 – 109.5       8
109.5 – 114.5       18
114.5 – 119.5       13
119.5 – 124.5       7
124.5 – 129.5       1
129.5 – 134.5       1


2. Frequency Polygon - a graph that displays the data by using lines that connect points plotted
for the frequencies at the midpoints of the classes. The frequencies are
represented by the heights of the points.

Class Boundaries    Midpoints    Frequency
99.5 – 104.5        102          2
104.5 – 109.5       107          8
109.5 – 114.5       112          18
114.5 – 119.5       117          13
119.5 – 124.5       122          7
124.5 – 129.5       127          1
129.5 – 134.5       132          1

To construct a frequency polygon, follow the steps below.
i. Find the midpoint of each class as $\frac{LCB + UCB}{2}$, where LCB and UCB are the lower
and upper class boundaries.
ii. Draw the x and y axes. Label the x axis with the midpoint of each class, and
then use a suitable scale on the y axis for the frequencies.
iii. Using the midpoints for the x values and the frequencies as the y values, plot
the points.
iv. Connect adjacent points with line segments. Draw a line back to the x axis at
the beginning and end of the graph, at the same distance that the previous
and next midpoints would be located.

The graph below shows the frequency polygon corresponding to the distribution above.



3. Cumulative Frequency Graph or Ogive - used to represent the cumulative frequencies for
the classes.
- used to visually represent how many values are below a certain upper class
boundary

i. Find the cumulative frequency for each class.
ii. Draw the x and y axes. Label the x axis with the class boundaries. Use an
appropriate scale for the y axis to represent the cumulative frequencies.
iii. Plot the cumulative frequency at each upper class boundary. Upper
boundaries are used since the cumulative frequencies represent the number
of data values accumulated up to the upper boundary of each class.
iv. Starting with the first upper class boundary, connect adjacent points with line
segments. Then extend the graph to the first lower class boundary on the x
axis.

The cumulative frequency distribution of the data is as follows.

Cumulative Frequency
less than 99.5 0
less than 104.5 2
less than 109.5 10
less than 114.5 28
less than 119.5 41
less than 124.5 48
less than 129.5 49
less than 134.5 50
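
The class midpoints and the cumulative frequencies for the temperature data can be computed with a few lines of Python. This is a minimal sketch; the list names are our own, and `itertools.accumulate` from the standard library produces the running totals.

```python
from itertools import accumulate

# Class boundaries and frequencies from the record-high-temperature example.
lower = [99.5, 104.5, 109.5, 114.5, 119.5, 124.5, 129.5]
upper = [104.5, 109.5, 114.5, 119.5, 124.5, 129.5, 134.5]
freq = [2, 8, 18, 13, 7, 1, 1]

# Midpoint of each class: (LCB + UCB) / 2, used for the frequency polygon.
midpoints = [(lcb + ucb) / 2 for lcb, ucb in zip(lower, upper)]

# Running total of the frequencies, used for the ogive.
cumulative = list(accumulate(freq))

# Ogive points: start at the first lower boundary with cumulative frequency 0,
# then plot each cumulative frequency at its upper class boundary.
ogive_points = [(lower[0], 0)] + list(zip(upper, cumulative))

for boundary, total in ogive_points:
    print(f"less than {boundary:<6} {total}")
```

The printed rows match the cumulative frequency distribution above, ending at 50, the total number of states.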

The ogive is given below.


4. Bar graphs - used when the data is qualitative or categorical
- vertical/horizontal bars are used to represent frequency to be able to compare
the different categories

Example:



5. Time Series Graph - used to represent data that occur over a specific period of time

Example:



6. Pie Graph - shows the relationship of the parts to the whole by visually comparing the sizes
of the sections.
- sections/wedges are divided according to the percentage of frequencies in
each category of the distribution

[Figure: pie chart titled "ASEAN Population Distribution, 2020", with sectors for Indonesia, the
Philippines, Vietnam, Thailand, Myanmar, Malaysia, Cambodia, Laos, Singapore, and Timor-Leste]

7. Scatter Plot - shows relationship between two variables.



___________________________________________________________________________

Exercise 4.1

A. Direction: Identify whether each of the following situations is descriptive or inferential.
1. A store owner finds out that 80% of daily sales came from soft drinks.
2. A researcher claimed that the academic performance of students was affected by the
size of the family.
3. A school nurse reported that the nutritional status of some pupils was classified as
severely wasted.
4. The teacher finds that the average score of the students in a 40-item test is 23.
5. A social worker found that the use of electronic gadgets has a significant effect on the
communication of preschool children.

B. Direction: Classify the following data according to type and level of measurement.

Data                                       Quantitative/Qualitative    Level of Measurement
1. Volume of rice production (in sacks)
2. Scores in a test
3. Monetary prizes in a contest
4. Awards in a pageant
5. Membership in an organization
6. Academic rank
7. Monthly income
8. Civil status
9. Religious affiliation
10. Length of service



4.2 Measures of Central Tendency

The measures of central tendency describe how the data cluster about the center. The most
common measures of central tendency are the mean, median, and mode.

4.2.1 MEAN
The mean or arithmetic mean ($\bar{X}$) is the average of all the values in the data set. It is
obtained by dividing the sum of all the observations by the total number of observations.

For the sample mean:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where: $X_i$ is an individual observation
       $n$ is the total number of observations

Example 1: Player’s Height

Find the mean height of the 12 basketball players whose heights (in cm) are 150,
160, 163, 159, 174, 178, 165, 156, 187, 176, 175, 180.
Solution: Let X be the height of a player and n the total number of players.

Using the equation for finding the mean,

$$\bar{X} = \frac{150 + 160 + 163 + 159 + 174 + 178 + 165 + 156 + 187 + 176 + 175 + 180}{12} = \frac{2{,}023}{12} = 168.6$$

Note:
If the observations are whole numbers, the final answer must be rounded to one decimal place;
if the raw data have one decimal place, the final answer must have two decimal places, and so on.

Example 2: Infants’ Weight

What is the mean of the set of values 6.7, 4.6, 5.5, 3.4, 8.2, and 5.8?

Solution:

$$\bar{X} = \frac{6.7 + 4.6 + 5.5 + 3.4 + 8.2 + 5.8}{6} = \frac{34.2}{6} = 5.70$$

Example 3: Test Completion Times

Twelve students were given an arithmetic test and the times (in minutes) to
complete it were as follows:
10, 9, 12, 11, 8, 15, 9, 7, 8, 6, 12, 10

Calculate the average time to complete the test.

Solution:

$$\bar{X} = \frac{117}{12} = 9.75 \approx 9.8$$

Therefore, the average time to complete the arithmetic test is 9.8 minutes.

4.2.2 WEIGHTED MEAN

There are some cases when individual values do not have equal importance. In such cases, a
weighted mean is appropriate. The formula for the weighted arithmetic mean is:

$$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{W_1 X_1 + W_2 X_2 + \cdots + W_n X_n}{W_1 + W_2 + \cdots + W_n}$$

where: $X_i$ represents each of the item values
       $W_i$ represents the weight of each item value

Example 1: General Weighted Average (GWA)

Suppose Mark wants to determine his General Weighted Average (GWA) for the subjects he
took in the last semester he was enrolled, as follows:

Subject    Units    Rating
A          3        2.5
B          3        2.2
C          4        1.8
D          5        1.5
E          3        2.0
F          2        1.2

Solution:

$$\bar{X}_w = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{(3)(2.5) + (3)(2.2) + (4)(1.8) + (5)(1.5) + (3)(2.0) + (2)(1.2)}{3 + 3 + 4 + 5 + 3 + 2}$$

$$= \frac{7.5 + 6.6 + 7.2 + 7.5 + 6.0 + 2.4}{20} = \frac{37.2}{20} = 1.86$$
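
The same weighted mean can be checked with a short Python sketch. The function name `weighted_mean` is our own; it simply applies the formula above, with the units as weights and the ratings as item values.

```python
def weighted_mean(values, weights):
    """Return sum(W_i * X_i) / sum(W_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Units (weights) and ratings (item values) from the GWA example.
units = [3, 3, 4, 5, 3, 2]
ratings = [2.5, 2.2, 1.8, 1.5, 2.0, 1.2]

gwa = weighted_mean(ratings, units)
print(round(gwa, 2))
```

Subjects with more units pull the average toward their rating, which is why the 5-unit subject D (rating 1.5) influences the result more than the 2-unit subject F.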

Properties of the Mean:

1. It is a unique value for a data set and always exists.
2. It is affected by extreme and deviant values.
3. It is appropriate only for interval or ratio data, and it best represents the center when
the data are approximately normally distributed.
4. It is the most reliable measure of central tendency.

4.2.3 MEDIAN

The median is the middlemost value in the ordered data set. It divides the distribution into
two equal parts.

If the number of observations is even, the median is the average of the two middle values;
if the number of observations is odd, the median is the middlemost value in the data set.

$$\text{Median (rank value)} = \frac{n + 1}{2}$$

Example 1:

Find the median height of the 12 basketball players whose heights (in cm) were as
follows:
150, 160, 163, 159, 174, 178, 165, 156, 187, 176, 175, 180.

Solution:
The first step is to arrange the data in an increasing order (from lowest to highest).
Thus, 150, 156, 159, 160, 163, 165, 174, 175, 176, 178, 180, 187

Since the data set has an even number of observations, we take the average of the two middle
values, which are located using the formula for the median (rank value):

$$\text{Median (rank value)} = \frac{12 + 1}{2} = 6.5$$

Since the rank value is 6.5, the median lies between the 6th and 7th values.
Therefore,

$$\text{Median} = \frac{165 + 174}{2} = \frac{339}{2} = 169.5$$

Example 2:
The daily rates of a sample of 9 employees at GMS Inc. are ₱550, ₱420, ₱650,
₱500, ₱700, ₱480, ₱520, ₱860, and ₱670. Find the median rate.

Solution:
The first step is to arrange the data set in increasing order. Thus,
₱420, ₱480, ₱500, ₱520, ₱550, ₱650, ₱670, ₱700, ₱860

Since the data set has an odd number of observations, the median is the middlemost value.
Therefore, the median is the 5th value, which is ₱550.

Properties of the Median:

1. It is a positional average and is not influenced by extreme values.
2. Like the mean, it is a unique value for a data set.
3. It is associated with ordinal data.

4.2.4 MODE

The mode is the most frequent observation, that is, the observation that occurs most often
in the data set.

If only one observation has the highest frequency, the data set is said to be unimodal. If two
observations share the highest frequency, it is bimodal; if three do, it is trimodal. If no
individual value is repeated in the data set, no mode exists.

Example 1:
The following are the scores of the students in Mathematics quiz. Determine the
mode of the data set.
40, 27, 20, 40, 26, 24, 25, 29, 30, 31, 27, 33, 39, 36, 22, 36, 28, 27, 27,
26, 20, 21, 30, and 19.

Solution:

The most frequent number that appears in the data set is 27. Since there is only one
observation having the highest frequency, then it is unimodal.

Example 2:
Determine the mode of the grades of 19 engineering students in Mathematics
subject as follows:
2.2, 1.7, 2.1, 2.0, 1.9, 2.3, 2.0, 2.4, 1.9, 2.1, 2.2, 2.4, 2.0, 1.9, 2.1,
2.1, 2.1, 2.0, 2.0

Solution:
Since the grades 2.0 and 2.1 each appear five times, more often than any other grade, both
are modes. The data set is bimodal.

Properties of the Mode:


1. It is not affected by the extreme and deviant values.
2. It may not exist.
3. If it exists, it may not always be unique.
4. It is usually associated with nominal data.
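
The worked examples for the mean, median, and mode can be verified with Python's standard `statistics` module. This is a minimal sketch; the variable names are our own, and `statistics.multimode` (available from Python 3.8) returns every value that shares the highest frequency.

```python
import statistics

# Heights (cm) of the 12 basketball players.
heights = [150, 160, 163, 159, 174, 178, 165, 156, 187, 176, 175, 180]

# Scores in the Mathematics quiz (mode, Example 1).
scores = [40, 27, 20, 40, 26, 24, 25, 29, 30, 31, 27, 33,
          39, 36, 22, 36, 28, 27, 27, 26, 20, 21, 30, 19]

# Grades of the 19 engineering students (mode, Example 2).
grades = [2.2, 1.7, 2.1, 2.0, 1.9, 2.3, 2.0, 2.4, 1.9, 2.1,
          2.2, 2.4, 2.0, 1.9, 2.1, 2.1, 2.1, 2.0, 2.0]

mean_height = statistics.mean(heights)      # sum of the values divided by n
median_height = statistics.median(heights)  # average of the two middle values
score_modes = statistics.multimode(scores)  # one mode: unimodal
grade_modes = statistics.multimode(grades)  # two modes: bimodal

print(round(mean_height, 1), median_height)
print(score_modes, sorted(grade_modes))
```

Note that `statistics.median` sorts the data internally, so the list does not need to be arranged beforehand.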



Exercise 4.2
Direction: Answer the following problems.
1. Calculate the mean, median and mode of the following sets of data.
a. 14, 16, 8, 8, 10, 12, 9, 12, 13, 14, 8, 9, 5, 7, 11
b. 12.5, 15.5, 10.5, 10.2, 11.15, 9.5, 8.2, 18.3, 10.5, 10.5
2. Heather’s test scores are 81, 93, 74 and 95. What score must she get on the fifth test in
order to get a mean of 85 on all five tests?
3. The final grades of a student in six subjects are as follows:

Subject No. of Units Final Grade


Math in the Modern World 3 2.0
Readings in Phil. History 3 1.9
Wellness and Fitness 3 1.4
Understanding the Self 3 1.5
Masining na Pagpapahayag 3 1.5
Calculus 1 4 1.8

a. Determine his weighted mean grade.


b. If the subjects are of equal number of units, what would be the average grade of the
student?

4.3 Measures of Dispersion

The measures of dispersion or variability describe the spread of the data, or how the
individual values are dispersed from the mean. The common measures of dispersion are the
range, variance, and standard deviation.

The range is the simplest and easiest measure of dispersion to compute. It is obtained by
subtracting the lowest value from the highest value in the data set.

The variance is defined as the average of the squared deviations from the mean. The
square root of the variance is known as the standard deviation. The variance of sample data
is denoted by $s^2$, while the population variance is denoted by $\sigma^2$.

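The formula for the sample variance is:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

and for the population variance is:

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$

where $\bar{x}$ is the sample mean, $\mu$ is the population mean, $n$ is the sample size, and
$N$ is the population size.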

To determine the variance of ungrouped data, follow the steps below:
1. Arrange the values in order (i.e. increasing or decreasing) vertically.
2. Calculate the mean of the data set.
3. Subtract the mean from the individual values. Place the differences in another column.
4. Add another column for the square of the difference between each value and the mean.
5. Get the sum of the squared deviations.
6. Divide the sum in step 5 by n − 1 for sample data or by N for population data.

Example 1:
Determine the range, variance and standard deviation of the following data on a sample
of weights of pre-school children: 25.2, 19.5, 20.4, 18.2, 16.0, 17.8, 17.6

Solution:
a. Range = highest value – lowest value
= 25.2 – 16.0
= 9.2

b. To get the variance and standard deviation, make a table as follows:

$X$       $X - \bar{X}$    $(X - \bar{X})^2$
16.0      –3.24            10.50
17.6      –1.64            2.69
17.8      –1.44            2.07
18.2      –1.04            1.08
19.5      0.26             0.07
20.4      1.16             1.34
25.2      5.96             35.52
$\bar{X} = 19.24$          $\sum (X - \bar{X})^2 = 53.27$

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{53.27}{7 - 1} = \frac{53.27}{6} = 8.88$$

c. Since the standard deviation is the square root of the variance,

$$s = \sqrt{8.88} = 2.98$$

This value of the standard deviation indicates that the observations typically lie within
about 2.98 units above or below the mean.

Example 2:
The marks of 10 students in a class are 0, 4, 9, 12, 25, 2, 21, 7, 11 and
12. What is the variance of the data set?

Solution:
Step 1: Organize the marks of the students in a table.

Marks ($x$)    $x - \bar{x}$    $(x - \bar{x})^2$
25             14.7             216.09
21             10.7             114.49
12             1.7              2.89
12             1.7              2.89
11             0.7              0.49
9              –1.3             1.69
7              –3.3             10.89
4              –6.3             39.69
2              –8.3             68.89
0              –10.3            106.09
$\bar{x} = 10.3$                $\sum (x - \bar{x})^2 = 564.1$

Step 2: Substitute the values in the equation for variance.

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{564.1}{10 - 1} = \frac{564.1}{9} = 62.68$$

Step 3: Get the square root of the variance to obtain the standard deviation.

$$s = \sqrt{62.68} = 7.92$$

Step 4: Interpret the results.

The marks of the students typically fall within ±7.92 of the mean.
This means that the cluster of the distribution is between 2.38 and 18.22.
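
The steps for the variance of ungrouped data can be sketched in Python and cross-checked against the standard `statistics` module. This is a minimal sketch using the marks from Example 2; the variable names are our own.

```python
import statistics

# Marks of the 10 students from Example 2.
marks = [0, 4, 9, 12, 25, 2, 21, 7, 11, 12]

n = len(marks)
mean = sum(marks) / n                                 # step 2: mean of the data
squared_deviations = [(x - mean) ** 2 for x in marks]  # steps 3 and 4
sum_of_squares = sum(squared_deviations)              # step 5

sample_variance = sum_of_squares / (n - 1)            # step 6, sample data
population_variance = sum_of_squares / n              # step 6, population data
sample_sd = sample_variance ** 0.5

# Cross-check with the library functions for sample and population variance.
assert abs(sample_variance - statistics.variance(marks)) < 1e-9
assert abs(population_variance - statistics.pvariance(marks)) < 1e-9

print(round(mean, 1), round(sum_of_squares, 1))
print(round(sample_variance, 2), round(sample_sd, 2))
```

Dividing by n − 1 instead of n makes the sample variance slightly larger than the population variance computed from the same numbers.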

Exercise 4.3

Direction: Solve the following problems.

1. Determine the range, variance and standard deviation of the following data sets.
a. 7, 8, 4, 3, 2, 3, 6, 5 and 7
b. 2, 8, 11, 17, 12, 6 and 4

2. The results of the college entrance examination of 10 students in a certain university were
as follows:
2.5, 3.4, 5.6, 3.8, 4.2, 2.8, 3.0, 3.0, 3.4, 4.2
Compute the variance and standard deviation.

3. A newspaper company reported that a sample of its weekly sales (in hundred thousand
pesos) is: 345, 452, 254, 137, 483, 515 and 218. Calculate and interpret the variance and
standard deviation.
____________________________________________________________________________

4.4 MEASURES OF LOCATION/POSITION

The measures of location describe the position of a value relative to the rest of the data, so
it is useful to know how to interpret them. Quartiles, percentiles, and standard scores are the
most commonly used measures of location.

Quartiles divide the distribution into four equal parts (segments of 25% each). Three
quartiles are defined: Q1, Q2 and Q3.

Percentiles divide the distribution into 100 equal parts, denoted by $P_k$, where k is the
percentile rank. For example, P50 means the 50th percentile, P75 means the 75th percentile, and
so on. P50 and Q2 are both equal to the median of the distribution. Likewise, P25 is equal to Q1.

The position of a quartile or percentile in the ordered data set is computed as

$$Q_k = \frac{nk}{4}, \text{ where } k = 1, 2, 3$$

$$P_k = \frac{nk}{100}, \text{ where } k \text{ is an integer from 1 to 99}$$

If the computed position is an integer, the kth quartile/percentile is the average of the value
at that position and the value following it. If the computed position is not an integer, round
it up and take the value at that position.

Percentile rank refers to the percentile ranking of a certain value. It can be obtained using
the equation below:

$$\text{Percentile rank} = \frac{\text{number of values below} + 0.5}{\text{total number of values}} \cdot 100\%$$

Example 1:
What is the third quartile (Q3) of the following data set?
20, 40, 50, 65, 70, 75, 80, 100

Solution:

If the data set is not yet arranged in order, first arrange it in increasing order.

Since the given data set is already arranged, we can compute the position of Q3:
Q3 = nk/4 = (8)(3)/4 = 24/4 = 6

Since the computed position is an integer, we get the average of the 6th and 7th values
in the data set.

That is, (75 + 80)/2 = 155/2 = 77.5

Therefore, the 3rd quartile of the data set is 77.5.

Example 2:
For the data set below, which value is in the 75th percentile?
1, 3, 3, 4, 6, 7, 7, 7, 8, 9, 9, 10, 12, 15, 16, 17

Solution:
Since we want to find P75 and there are 16 values in the data set,
P75 = nk/100 = (16)(75)/100 = 1200/100 = 12

Since the obtained value is again an integer, we must get the average of the 12th and
13th values in the data set. That is, P75 = (10 + 12)/2 = 22/2 = 11

Therefore, the 75th percentile is 11. This implies that 75% of the values in the data set are
less than 11 and 25% are greater than 11.
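
The position rule used in Examples 1 and 2 can be sketched in Python as follows. This is a minimal illustration, not a standard library routine: the function name `position_rule_value` and its parameters are assumptions made for this example, and other software may use different percentile conventions and give slightly different answers.

```python
def position_rule_value(data, k, parts):
    """Return the k-th quantile using the position rule n*k/parts.

    parts = 4 gives quartiles; parts = 100 gives percentiles.
    """
    values = sorted(data)             # the rule assumes increasing order
    n = len(values)
    numerator = n * k                 # position = n*k / parts
    if numerator % parts == 0:
        pos = numerator // parts      # integer position (1-based)
        # Average the value at that position and the value following it.
        return (values[pos - 1] + values[pos]) / 2
    pos = numerator // parts + 1      # not an integer: round up
    return values[pos - 1]


q3 = position_rule_value([20, 40, 50, 65, 70, 75, 80, 100], k=3, parts=4)
p75 = position_rule_value(
    [1, 3, 3, 4, 6, 7, 7, 7, 8, 9, 9, 10, 12, 15, 16, 17], k=75, parts=100
)
print(q3, p75)  # 77.5 11.0
```

Sorting inside the function means the data may be supplied in any order.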

Example 3:
Find the percentile rank of the value 6 in the data set of Example 2.

Solution:

Using the equation for the percentile rank and substituting the given information,

$$\text{Percentile rank} = \frac{\text{number of values below} + 0.5}{\text{total number of values}} \cdot 100\%$$

Four values (1, 3, 3 and 4) lie below 6, and there are 16 values in all, so

$$\text{Percentile rank} = \frac{4 + 0.5}{16} \cdot 100\% \approx 28\%$$

Therefore, the value 6 is at the 28th percentile.
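
As a quick check of Example 3, the percentile rank equation can be written as a short Python function. The name `percentile_rank` is an assumption made for this sketch; it simply counts the values strictly below the given value.

```python
def percentile_rank(data, value):
    """Percentile rank = (number of values below + 0.5) / total * 100."""
    below = sum(1 for v in data if v < value)
    return (below + 0.5) / len(data) * 100


data = [1, 3, 3, 4, 6, 7, 7, 7, 8, 9, 9, 10, 12, 15, 16, 17]
rank = percentile_rank(data, 6)
print(round(rank))  # 28
```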

========================================================================
Exercise 4.4

Answer each of the following questions.

1. Which of the following data values is the 50th percentile?


{1.52, 5.36, 6.79, 5.21, 0.28, 6.36, 8.47, 5.52, 6.26, 5.97}

2. Listed are 29 ages for Academy Award-winning best actors in order from smallest to largest:
18, 21, 22, 25, 26, 27, 29, 30, 31, 33, 36, 37, 41, 42, 47, 52, 55,
57, 58, 62, 64, 67, 69, 71, 72, 73, 74, 76, 77
a. Find the 70th percentile.
b. Find the 83rd percentile.

3. At a high school, it was found that the 30th percentile of the number of hours that students spend
studying per week is seven hours. Interpret the 30th percentile in the context of this situation.
______________________________________________________________________

4.5 Probabilities and the Normal Distribution

4.5.1 Standard Score


Standard score (z-score) refers to the number of standard deviations that an observation lies
above or below the mean. If the z-score is negative, the observation is lower than the mean; if it
is positive, the observation is higher than the mean; and if it is zero, the observation is equal
to the mean. The z-score also allows us to calculate the probability of a score occurring within a
normal distribution and enables us to compare two scores that come from different normal
distributions.

Standard score can be obtained by getting the ratio of the difference of the value and the
mean and the standard deviation. In symbols,

$$Z = \frac{X - \bar{x}}{s}$$

where: Z = standard score
       X = raw score
       x̄ = mean
       s = standard deviation

Note: A positive (+) z-score means that the observed value is above the mean.
      A negative (-) z-score means that the observed value is below the mean.
      A zero (0) z-score means that the observed value is equal to the mean.

Example 1:
In a given distribution, the mean is 85 and the standard deviation is 10. Find the
corresponding standard score of each of the following values:

a. 95 b. 87 c. 68 d. 55

Solution:

a. The standard score of 95 is:

$$z = \frac{95 - 85}{10} = 1.0$$

Since the standard score is positive, this implies that the score of 95 is 1 standard
deviation above the mean.

b. The standard score of 87 is:

$$z = \frac{87 - 85}{10} = 0.2$$

This implies that the score of 87 is 0.2 standard deviations above the mean.

c. The standard score of 68 is:

$$z = \frac{68 - 85}{10} = -1.7$$

Since the standard score is negative, this implies that the score of 68 is 1.7 standard
deviations below the mean.

d. The standard score of 55 is:

$$z = \frac{55 - 85}{10} = -3.0$$

This implies that the score of 55 is 3 standard deviations below the mean.
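
The four computations above follow the same pattern, which a few lines of Python can reproduce. This is only an illustrative sketch; the function name `z_score` and its parameter names are assumptions made for this example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


scores = [95, 87, 68, 55]
z_values = [round(z_score(x, mean=85, sd=10), 2) for x in scores]
print(z_values)  # [1.0, 0.2, -1.7, -3.0]
```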

4.5.2 Normal Distribution

The graphical presentation of a normal distribution is the normal curve. A normal curve is
symmetrical, with its highest point at the center (the mean). Because it is symmetrical, the area
under the left half of the curve is equal to the area under the right half. The total area under
the curve represents the entire population of a particular distribution.

Two parameters are used to describe the normal curve: the mean and the standard
deviation. Values below the mean (negative z-scores) are located on the left side of the curve,
while values above the mean (positive z-scores) are on the right side.

The area under the normal curve represents probability. Thus, the larger the area, the
greater the probability.

[Figure: the normal curve. Source: Kanbanize]

Example 1:

Find the area under the standard normal curve for each of the following:

a. to the left of z = 1.99
b. to the right of z = -2.04
c. between z = 2.00 and z = 2.47
d. between z = -1.02 and z = 2.35

Solution:

Using the table of the areas under the normal curve, which gives the area between the mean
(z = 0) and a given z:

a. The area between the mean and z = 1.99 is 0.4767 or 47.67%. Since we want the area to the
   left of z = 1.99, we must add the 50% that lies to the left of the mean. Therefore, the
   total area is 97.67%.
b. The area between the mean and z = -2.04 is 0.4793 or 47.93% (by symmetry, the same as for
   z = 2.04). This area lies to the left of the mean, and we want everything to the right of
   z = -2.04, so we must add the 50% that lies to the right of the mean. Therefore, the total
   area to the right of z = -2.04 is 97.93%.
c. The area between the mean and z = 2.00 is 0.4772 or 47.72%, and the area between the mean
   and z = 2.47 is 0.4932 or 49.32%. Since both z-scores are on the same side of the mean, we
   subtract the smaller area from the larger one: 49.32% - 47.72% = 1.60%.
d. The area between the mean and z = -1.02 is 0.3461 or 34.61%, and the area between the mean
   and z = 2.35 is 0.4906 or 49.06%. Since the two z-scores lie on opposite sides of the mean,
   we add the areas to get the total area: 34.61% + 49.06% = 83.67%.
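
The table lookups above can be cross-checked with Python's standard library, which provides the cumulative area to the left of any z through `statistics.NormalDist`. The sketch below is an illustration only; the variable names are assumptions, and the computed areas should agree with the table values to about four decimal places.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# cdf(z) is the area to the LEFT of z.
left_of_199 = std_normal.cdf(1.99)
right_of_neg204 = 1 - std_normal.cdf(-2.04)
between_200_247 = std_normal.cdf(2.47) - std_normal.cdf(2.00)
between_neg102_235 = std_normal.cdf(2.35) - std_normal.cdf(-1.02)

for label, area in [
    ("a. left of 1.99", left_of_199),
    ("b. right of -2.04", right_of_neg204),
    ("c. between 2.00 and 2.47", between_200_247),
    ("d. between -1.02 and 2.35", between_neg102_235),
]:
    print(f"{label}: {area:.4f}")
```

Because `cdf` already includes the half of the curve below the mean, no separate "add 50%" step is needed here.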

Applications of the Normal Curve

Several problems in different fields can be solved with the application of the normal curve.
The only requirement is that the variable be normally or approximately normally distributed.

To solve problems by using standard normal distribution, transform the original variable to
a standard normal distribution variable using the standard score or z-score.

Example 1:
A survey found that women spend on average ₱146.21 on beauty products during
the summer months. Assume that the standard deviation is ₱29.44 and the variable is
normally distributed. Find the percentage of women who spend less than ₱160.00.

Solution:
Step 1: Draw the normal curve and represent the area.

[Figure: normal curve marking the mean x̄ = 146.21 and the value 160.00]

Step 2: Find the z- value corresponding to ₱160.00.

$$z = \frac{₱160.00 - ₱146.21}{₱29.44} \approx 0.47$$

Hence, ₱ 160.00 is 0.47 standard deviations above the mean.

Step 3: Find the area, using the Table for Areas Under the Normal Curve. Look for z =
0.47. From the table, the area between the mean and z = 0.47 is 0.1808. Since the question
asks for the percentage of women who spend less than ₱160.00, we need to add the area
below the mean, which is 0.5000 or 50%: 0.5000 + 0.1808 = 0.6808.

Therefore, 0.6808 or 68.08% of the women spend less than ₱160.00 on beauty
products during the summer months.
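
The same question can be answered without a table by giving the mean and standard deviation directly to `statistics.NormalDist`. This is a sketch under the stated normality assumption; the variable names are illustrative. Because the z-score is not rounded to two decimals here, the result is about 68.0%, slightly different from the table-based 68.08%.

```python
from statistics import NormalDist

# Spending modelled as normal with the survey's mean and standard deviation.
spending = NormalDist(mu=146.21, sigma=29.44)

# Area to the left of 160.00, i.e. the share of women spending less than that.
share_below_160 = spending.cdf(160.00)
print(f"{share_below_160:.1%}")
```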

Example 2:

To qualify for a police academy, candidates must score in the top 10% on a general
abilities test. The test has a mean of 200 and a standard deviation of 20. Find the lowest
possible score to qualify. Assume that the test scores are normally distributed.

Solution:
Since the test scores are normally distributed, the test value x that cuts off the
upper 10% of the area under the normal distribution is desired. The region above x
represents the candidates who qualify.

[Figure: normal curve with the upper 10% of the area shaded, to the right of z = 1.28]

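Step 1: Subtract 0.1000 from 1.0000 to get the area under the normal distribution to the left of x:
1.0000 - 0.1000 = 0.9000

Step 2: Find the z value that corresponds to a cumulative area of 0.9000. If the specific value
cannot be found, use the closest value. In this case, the closest value is 0.8997, and the
corresponding z value is 1.28. (In a table that lists areas between the mean and z, look up
0.9000 - 0.5000 = 0.4000 instead; the closest entry, 0.3997, also corresponds to z = 1.28.)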

Step 3: Substitute in the equation of the z-score and solve for x.

$$1.28 = \frac{x - 200}{20}$$
$$(1.28)(20) = x - 200$$
$$x = 25.60 + 200 = 225.60 \approx 226$$

Therefore, a score of 226 should be used as a cut off. Anybody scoring 226 and above qualifies.
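
Working backward from an area to a score is the inverse of the table lookup. As a hedged sketch, `statistics.NormalDist.inv_cdf` returns the value with a given area to its left; the variable names below are assumptions made for this example.

```python
import math
from statistics import NormalDist

test_scores = NormalDist(mu=200, sigma=20)

# Score that has 90% of the area to its left (so 10% lies above it).
cutoff = test_scores.inv_cdf(0.90)

# Round up to the next whole score so that no more than 10% qualify.
print(round(cutoff, 2), math.ceil(cutoff))
```

The unrounded cutoff is a little above 225.6, which rounds up to the same whole-number score of 226 found in the worked solution.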

========================================================================

Exercise 4.5

Direction: Solve the following problems:

1. The mean lifetime of a wristwatch is 25 months, with a standard deviation of 5 months. If
the distribution is normal, for how many months should a guarantee be made if the
manufacturer does not want to exchange more than 10% of the watches?
2. Scores on a standardized test are approximately normally distributed with a mean of 480
and a standard deviation of 90.
A. What proportion of the scores are above 700?
B. What is the percentage of scores between 420 and 520?
3. If a distribution of raw scores were plotted and then the scores are transformed to z-scores,
would the shape of the distribution change? Explain your answer.
____________________________________________________________________________

4.6 Linear Regression and Correlation Coefficient

When conducting research studies, researchers wish to determine whether two variables
are related. If these variables are found to be related, they may then find an equation that can be
used to model the relationship. A correlation is a relationship between two variables. The data
can be represented by the ordered pairs (𝑥, 𝑦) where x is the independent (or explanatory)
variable, and y is the dependent (or response) variable.

4.6.1 Correlation Coefficient

The correlation coefficient is a measure of the strength and the direction of a linear
relationship between two variables. The symbol r represents the sample correlation coefficient.
The formula for 𝑟 is

$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$$

where:
r = Pearson r correlation coefficient
n = number of observations
∑xy = sum of the products of paired scores
∑x = sum of x scores
∑y = sum of y scores
∑x² = sum of squared x scores
∑y² = sum of squared y scores

A value of +1 indicates a perfect positive correlation: as one variable increases, the other
variable also increases. A value of -1 indicates a perfect negative correlation: as one variable
increases, the other variable decreases. A value of 0 indicates that there is no linear
correlation between the variables (Tolentino, et al., 2018). The complete list of values is
presented below to help interpret the computed value of r.

0.0 - no correlation
±1.00 - perfect correlation
±0.01 − ±0.25 - very low correlation
±0.26 − ±0.50 - moderately low correlation
±0.51 − ±0.75 - high correlation
±0.76 − ±0.99 - very high correlation
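
The scale above can be turned into a small lookup function. This is only a sketch: the function name `describe_correlation` is an assumption, and so is the choice to let each band extend up to its upper bound, which closes the small gaps (for example between 0.25 and 0.26) left by the two-decimal scale.

```python
def describe_correlation(r):
    """Map a computed r to the verbal scale listed above."""
    size = abs(r)
    if size > 1:
        raise ValueError("r must lie between -1 and +1")
    if size == 0:
        return "no correlation"
    if size == 1:
        return "perfect correlation"
    # Assumption: each band runs up to and including its upper bound.
    if size <= 0.25:
        return "very low correlation"
    if size <= 0.50:
        return "moderately low correlation"
    if size <= 0.75:
        return "high correlation"
    return "very high correlation"


print(describe_correlation(0.93))   # very high correlation
print(describe_correlation(-0.40))  # moderately low correlation
```

The sign of r is ignored here because the scale describes strength only; the direction (positive or negative) is read from the sign separately.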

Example 1: Age and Weight of Preschool Children

Determine the correlation between the age and the weight of 10 preschool children at barangay
Marilima as shown in the table below:

Age (x), in years    Weight (y), in kg
1.0                  6.0
1.5                  6.8
2.5                  8.0
3.0                  9.2
3.6                  12.0
4.2                  13.0
4.8                  14.0
5.0                  10.5
5.4                  14.2
6.0                  17.3

Solution:

To determine the relationship between age and weight of 10 children, Pearson product
moment correlation must be used.

Step 1: To obtain the values, we may construct another table adding columns for specific values
necessary in the computation of the Pearson r.

Age (x), in years    Weight (y), in kg    xy        x²       y²
1.0                  6.0                  6.00      1.00     36.00
1.5                  6.8                  10.20     2.25     46.24
2.5                  8.0                  20.00     6.25     64.00
3.0                  9.2                  27.60     9.00     84.64
3.6                  12.0                 43.20     12.96    144.00
4.2                  13.0                 54.60     17.64    169.00
4.8                  14.0                 67.20     23.04    196.00
5.0                  10.5                 52.50     25.00    110.25
5.4                  14.2                 76.68     29.16    201.64
6.0                  17.3                 103.80    36.00    299.29

∑x = 37    ∑y = 111    ∑xy = 461.78    ∑x² = 162.30    ∑y² = 1351.06

Step 2: Substitute these values to the equation


$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$$

$$= \frac{(10)(461.78) - (37)(111)}{\sqrt{(10)(162.3) - 37^2}\,\sqrt{(10)(1351.06) - 111^2}}$$

$$= \frac{4617.8 - 4107}{\sqrt{1623 - 1369}\,\sqrt{13510.6 - 12321}}$$

$$= \frac{510.8}{\sqrt{254}\,\sqrt{1189.6}} = \frac{510.8}{(15.94)(34.49)} = \frac{510.8}{549.77} \approx 0.93$$

Step 3: Interpret the result.

The result of 0.93 indicates a very high positive correlation between the age of the
pre-school children and their weight. This implies that as the age of the pre-school children
increases, their weight also tends to increase.
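
The three steps above can be reproduced with a short Python function that builds the same sums as the table and substitutes them into the formula for r. This is a minimal sketch; the function name `pearson_r` and the variable names are assumptions made for this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from the computational formula used in this section."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


ages = [1.0, 1.5, 2.5, 3.0, 3.6, 4.2, 4.8, 5.0, 5.4, 6.0]
weights = [6.0, 6.8, 8.0, 9.2, 12.0, 13.0, 14.0, 10.5, 14.2, 17.3]
r = pearson_r(ages, weights)
print(round(r, 2))  # 0.93
```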

Example 2:

For the following data set, find the Pearson r and r².

x 12 2 5 9 11 10 4 1
y 10 3 7 5 9 8 6 3

Solution:

The scatterplot of the data set is presented below:

[Figure: scatterplot of the data set]

The following steps are helpful for the computation of the correlation coefficient.

Step 1: Construct another table for the values of xy, x², and y².

x     y     xy     x²     y²
12    10    120    144    100
2     3     6      4      9
5     7     35     25     49
9     5     45     81     25
11    9     99     121    81
10    8     80     100    64
4     6     24     16     36
1     3     3      1      9

∑x = 54    ∑y = 51    ∑xy = 412    ∑x² = 492    ∑y² = 373

Step 2: Substitute the necessary values in the equation.

$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2}\,\sqrt{n \sum y^2 - (\sum y)^2}}$$

$$= \frac{8(412) - (54)(51)}{\sqrt{(8)(492) - 54^2}\,\sqrt{(8)(373) - 51^2}}$$

$$= \frac{3296 - 2754}{\sqrt{3936 - 2916}\,\sqrt{2984 - 2601}}$$

$$= \frac{542}{\sqrt{1020}\,\sqrt{383}} = \frac{542}{\sqrt{390660}} \approx \frac{542}{625.03} \approx 0.867 \approx 0.87$$

Step 3: Interpret the result.

The value r = 0.87 indicates a very high positive correlation between the variables. This
implies that as the value of x increases, the value of y also tends to increase.

Step 4: Compute r². With r ≈ 0.867, r² = (0.867)² ≈ 0.75, which implies that about 75% of the
variation in y is explained by its linear relationship with x.

4.6.2 Linear Regression

Simple linear regression is a statistical method that allows us to summarize and study
relationships between continuous (quantitative) variables. One variable, denoted by x represents
the predictor variable or the independent variable. The other variable, denoted by y represents
the response or the dependent variable.

Simple linear regression is appropriate when the following conditions are satisfied.
• The dependent variable Y has a linear relationship to the independent variable X. To check
  this, make sure that the XY scatterplot is linear and that the residual plot shows a random
  pattern.
• For each value of X, the probability distribution of Y has the same standard deviation σ.
  When this condition is satisfied, the variability of the residuals will be relatively constant
  across all values of X, which is easily checked in a residual plot.

The least-squares regression equation can be formed from a set of sample data using the
formula:

$$\hat{y} = a + bx$$

where ŷ = the predicted value of the dependent variable
      x = the independent variable
      a = the y-intercept
      b = the slope of the line that represents the equation

The constants a, b in the regression equation are called the regression coefficients. The
values of a and b can be found using the following equations:

$$a = \bar{y} - b\bar{x} \quad \text{and} \quad b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}$$

The regression equation can be used to predict the value of one variable when the value
of the other variable is known.

Example 1:

The values of x and their corresponding values of y are shown in the table below

X 0 1 2 3 4
Y 2 3 5 4 6

a) Find the least-squares regression line ŷ = a + bx.
b) Estimate the value of y when x = 10.

Solution:

Step 1: Organize the listing of the values of x and y. Include the necessary values like x², y², xy
and the mean of x and y.

x    y    xy    x²    y²
0    2    0     0     4
1    3    3     1     9
2    5    10    4     25
3    4    12    9     16
4    6    24    16    36

∑x = 10    ∑y = 20    ∑xy = 49    ∑x² = 30    ∑y² = 90
x̄ = 2.0    ȳ = 4.0

Step 2: Substitute the values in the equation.

$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2} = \frac{49 - 5(2.0)(4.0)}{30 - \frac{1}{5}(10)^2} = \frac{9}{10} = 0.9$$

$$a = \bar{y} - b\bar{x} = 4 - (0.9)(2) = 2.2$$

Step 3: Formulate the regression equation.
ŷ = a + bx
ŷ = 2.2 + 0.9x

Step 4: Estimate the value of y when x = 10.
ŷ = 2.2 + 0.9(10) = 11.2

Therefore, if the value of x = 10, the estimated value of y will be 11.2.
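
The computation in this example can be sketched in Python using the same formulas for a and b. The function name `least_squares` and the variable names are assumptions made for this illustration.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line y_hat = a + b*x using the formulas of this section."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - sum(xs) ** 2 / n)  # slope
    a = y_bar - b * x_bar                                           # y-intercept
    return a, b


a, b = least_squares([0, 1, 2, 3, 4], [2, 3, 5, 4, 6])
y_hat = a + b * 10  # estimate of y when x = 10
print(round(a, 2), round(b, 2), round(y_hat, 2))  # 2.2 0.9 11.2
```

Note that x = 10 lies well outside the observed range of x (0 to 4), so the estimate assumes the linear pattern continues beyond the data.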

Example 2:
The table below shows the height, x, in inches and the pulse rate, y, per minute,
for 9 people. Find the least-squares regression equation.

x 68 72 65 70 62 75 78 64 68

y 90 85 88 100 105 98 70 65 72

Solution:
Step 1: Construct a table for the values of xy, x², and y².

Height (x)    Pulse rate (y)    xy      x²      y²
68            90                6120    4624    8100
72            85                6120    5184    7225
65            88                5720    4225    7744
70            100               7000    4900    10000
62            105               6510    3844    11025
75            98                7350    5625    9604
78            70                5460    6084    4900
64            65                4160    4096    4225
68            72                4896    4624    5184

∑x = 622    ∑y = 773    ∑xy = 53336    ∑x² = 43206    ∑y² = 68007
x̄ = 69.1    ȳ = 85.9

Step 2: Substitute the values in the equation.


$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2} = \frac{53336 - 9(69.1)(85.9)}{43206 - 9(69.1)^2} = \frac{-85.21}{232.71} \approx -0.37$$

Here the term (1/n)(∑x)² is written in its equivalent form n·x̄².

$$a = \bar{y} - b\bar{x} = 85.9 - (-0.37)(69.1) = 111.47$$

Step 3: Formulate the regression equation.


𝑦̂ = 𝑎 + 𝑏𝑥
𝑦̂ = 111.47 − 0.37𝑥
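
The worked solution rounds the means to one decimal place (69.1 and 85.9) before substituting, and the slope is sensitive to that rounding. As a hedged check, the sketch below uses an algebraically equivalent form of the slope formula, multiplied through by n, so that only the raw sums are needed; the variable names are assumptions made for this example.

```python
heights = [68, 72, 65, 70, 62, 75, 78, 64, 68]
pulses = [90, 85, 88, 100, 105, 98, 70, 65, 72]

n = len(heights)
sum_x, sum_y = sum(heights), sum(pulses)
sum_xy = sum(x * y for x, y in zip(heights, pulses))
sum_x2 = sum(x * x for x in heights)

# Same slope formula as in the text with numerator and denominator
# multiplied by n, so no rounded means are involved.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n
print(round(b, 2), round(a, 2))  # -0.4 113.32
```

With unrounded means the line is approximately ŷ = 113.32 − 0.40x, so carrying more decimal places for x̄ and ȳ in hand computations gives a more accurate slope and intercept.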

Exercise 4.6

Direction: Solve the following problems:

1. A researcher carefully computes the correlation coefficient between two variables and
gets r = 1.23. What does this value mean?

2. Given the following data:
X 72 73 75 76 77 78 79 80 80 81 82 83 84 85 86 88
Y 45 38 41 35 31 40 25 32 36 29 34 38 26 32 28 27

a. Sketch a scatterplot.
b. Compute the correlation coefficient, r.
c. Compute the coefficients of the linear regression line, y = b1x + b0.
d. What is the estimated value for X = 7?

Reflective Journal
Write your reflections in learning the topics. Describe your strengths and
weaknesses to learn the concepts.
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
________________________________________________________________
