Linear Regression and Correlation
December, 2024
Intended Learning Outcomes
1. Use the methods of linear regression and
correlation to predict the value of a variable
given certain conditions
2. Advocate the use of statistical data in
making important decisions
3. Apply linear regression and correlation
analysis to analyze real-world data and solve
practical problems in fields like economics,
science, and social sciences
Examples of situations where regression
analysis could be applied:
Educators are interested in determining how the
number of hours a student studies can be used to
predict the student’s score on a particular exam.
Researchers want to determine if the daily
caffeine intake (in milligrams) is related to
the level of heart damage.
Researchers want to determine if a person’s age
is related to their blood pressure.
Linear Regression is a statistical
method used to analyze the
relationship between two or more
variables, typically by fitting a straight
line to the data points.
Linear Regression is used to predict the
value of the dependent variable y based on
the independent variable x by fitting a
straight line to the data. It also helps to
understand the relationship between x and
y, quantifying how changes in x affect y.
The least-squares regression line is the straight line
that best fits a set of data points: it makes the sum of
the squared vertical differences between the actual data
points and the points predicted by the line as small as
possible. It helps to show the relationship between two
variables and can be used to make predictions.
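The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the slides: the function name `least_squares_line` and the hours/scores data are hypothetical, and the slope is called a and the intercept b to match the form ŷ = ax + b.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a*x + b."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # Slope from the standard sum formula.
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the line passes through the point (x_bar, y_bar).
    b = sum_y / n - a * (sum_x / n)
    return a, b


# Hypothetical data (not from the slides): hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 63, 71, 79]

a, b = least_squares_line(hours, scores)
print(f"y_hat = {a:.2f}x + {b:.2f}")  # prints: y_hat = 6.50x + 45.50
```

For this made-up data, each additional hour of study is associated with a predicted increase of 6.5 points.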
A scatter plot, also known as a scatter diagram, is
a type of mathematical diagram used to display
the relationship between two variables.
In a scatter plot, each data point represents the
values of two variables, one plotted along the
horizontal axis (x-axis) and the other along the
vertical axis (y-axis).
Let us solve for the value of a:
a = 6.25
What is the predicted achievement grade of a
student who spent 4 hours studying the subject?
Solution:
Substitute x = 4 hours into the equation of the regression line and solve for
y.
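$\sum x = 28.8,\ \sum y = 52.1$
$a \approx 2.7303$
Find $\bar{x}$ and $\bar{y}$.
$\bar{x} = \frac{28.8}{8} = 3.6,\quad \bar{y} = \frac{52.1}{8} = 6.5125$
$b = \bar{y} - a\bar{x} = 6.5125 - (2.7303)(3.6) = -3.31658$
If $a$ and $b$ are each rounded to the nearest tenth, to reflect the
accuracy of the original data, then we have as our equation of
the least-squares line:
$\hat{y} = ax + b$
$\hat{y} = 2.7x - 3.3$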
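As an arithmetic check, the slope and intercept can be recomputed from the sums alone. This sketch makes two assumptions: n = 8 is inferred from x̄ = 28.8/n = 3.6, and the values ∑xy = 195.86 and ∑x² = 106.72 are taken from the correlation computation for the same stride-length data. With the unrounded slope, b comes out near −3.3164 rather than −3.31658 (which uses a already rounded to 2.7303). Both round to −3.3.

```python
# Sums for the stride-length example; n = 8 is an inferred assumption.
n = 8
sum_x, sum_y = 28.8, 52.1
sum_xy, sum_x2 = 195.86, 106.72

# Slope of the least-squares line.
a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept from the means: b = y_bar - a * x_bar.
x_bar, y_bar = sum_x / n, sum_y / n
b = y_bar - a * x_bar

print(round(a, 4), round(b, 4))
print(f"y_hat = {a:.1f}x - {abs(b):.1f}")  # prints: y_hat = 2.7x - 3.3
```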
[Figure: scatter plot of the data with the fitted line f(x) = 2.730263157895 x − 3.316447368421, R² = 0.987469217736318]
Use the equation of the least-squares line to
predict the average speed of an adult man for each
of the following stride lengths. Round your results
to the nearest tenth of a meter per second.
a) 2.8 m
b) 4.8 m
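a. Substitute 2.8 for x:
$\hat{y} = 2.7(2.8) - 3.3 = 4.26 \approx 4.3$ m/s
b. Substitute 4.8 for x:
$\hat{y} = 2.7(4.8) - 3.3 = 9.66 \approx 9.7$ m/s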
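The substitution step can also be written as a small function. This is only a sketch: the name `predict_speed` is hypothetical, and it uses the rounded least-squares line ŷ = 2.7x − 3.3 from this example.

```python
def predict_speed(stride_length_m):
    """Predicted average speed (m/s) from the rounded line y_hat = 2.7x - 3.3."""
    return 2.7 * stride_length_m - 3.3


# The two stride lengths from the exercise, rounded to the nearest tenth.
for stride in (2.8, 4.8):
    print(f"{stride} m -> {predict_speed(stride):.1f} m/s")
# prints:
# 2.8 m -> 4.3 m/s
# 4.8 m -> 9.7 m/s
```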
$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
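The formula for r translates directly into code. The sketch below is illustrative only: the function name `correlation_coefficient` and the absences/grades values are hypothetical and do not come from the slides.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Linear correlation coefficient r, computed from the sum formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data (not from the slides): absences vs. final grade.
absences = [0, 1, 2, 3, 4]
grades = [95, 90, 88, 80, 77]

r = correlation_coefficient(absences, grades)
print(round(r, 2))  # prints: -0.99
```

The value always lies between −1 and 1. Here it is close to −1 because the made-up grades fall almost steadily as absences rise.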
If the linear correlation coefficient r is positive, the
relationship between the variables has a positive
correlation. In this case, if one variable increases, the other
variable also tends to increase.
The plot shown suggests a positive relationship,
since as the number of cars rented increases,
revenue tends to increase also.
[Figure: Car Rentals]
If r is negative, the linear relationship between the
variables has a negative correlation. In this case, if one
variable increases, the other variable tends to decrease.
The data shown suggests a negative relationship,
since as the number of absences increases, the
final grade tends to decrease.
[Figure: Absences and Final Grades]
The plot of the data shows no clear relationship,
as no visible pattern can be identified.
[Figure: Age and Health]
2. Find the linear correlation coefficient
for the stride length and speed of an adult man.
Round your result to the nearest
hundredth.
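$$r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Substitute n = 8 (since $\bar{x} = 28.8/n = 3.6$) and the sums for this data set:
$$r = \frac{8(195.86) - (28.8)(52.1)}{\sqrt{\left[8(106.72) - (28.8)^2\right]\left[8(362.25) - (52.1)^2\right]}}$$
$$r \approx 0.99$$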
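The result can be checked numerically from the sums. This sketch assumes n = 8, a value inferred from x̄ = 28.8/n = 3.6 rather than stated directly. Squaring r gives a value that can be compared with the R² reported on the fitted-line plot.

```python
from math import sqrt

# Sums for the stride-length data; n = 8 is an inferred assumption.
n = 8
sum_x, sum_y = 28.8, 52.1
sum_xy, sum_x2, sum_y2 = 195.86, 106.72, 362.25

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 2), round(r ** 2, 4))  # prints: 0.99 0.9875
```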
What is the significance of the fact that the
linear correlation coefficient is positive?
It indicates a positive correlation between a
man’s stride length and his speed. That is, as a
man’s stride length increases, his speed also
tends to increase.
Activity: (3 members in a group)
Find the equation of the least-squares line and
the linear correlation coefficient for the given
data. Round the constants a, b, and r to the
nearest hundredth.
{(−7, −11.7), (−5, −9.8), (−3, −8.1), (1, −5.9), (2, −5.7)}