
Correlation and Linear Regression –

Part 2
Reading
• Chapter 6.2.1, 6.2.2, 6.2.3, and 6.2.4 in the textbook
Bivariate analysis
• We know how to read and interpret the scatterplots. But we want to use
mathematical/statistical language to describe the relationship between
two data sets quantitatively.
• One of the most commonly used statistical tools is correlation.
• Bivariate analysis aims to understand the relationship between two
variables x and y.
• When the two variables are measured on the same object, x is usually
identified as the independent variable or predictor, and y as the
dependent variable or predictand.
• The methods of bivariate statistics aim to describe the strength of the
relationship between variables, such as Pearson’s correlation coefficient
for linear relationships or an equation obtained by regression.
Linear regression
• Simple linear regression seeks to summarize the relationship
between two variables, shown graphically in their scatterplot, by a
single straight line.
• The regression procedure chooses that line producing the least
error for predictions of y given observations of x.

Linear function: $y = bx + a$, where b is called the slope and a is called the intercept.
The value of y depends on the value of x: $\Delta y = b \cdot \Delta x$.
Linear function: $y = bx + a + \text{error}$
The value of y depends on the value of x, but this relationship is masked by random errors.

In ordinary linear regression, the dependent variable Y has some random error (X is considered to be error-free). There is a conditional dependence between the random variables Y and X: the probability that Y is found in a certain interval range is conditionally dependent on the value of X. If we know the value of X, we can make a statistical estimate of the value of Y given X.

We usually use $\hat{Y}$ to represent the estimated or predicted values of Y: $\hat{y} = bx + a$. This is what the linear regression line gives us when we fit the line to the data sample.
Linear function: $y = bx + a$
The value of y depends on the value of x.
Example: $b = \Delta y / \Delta x = 2$ and $a = -1$.

a is the intercept: the point at which the regression line intersects the y-axis (that is, the value of y for x = 0).
b is the slope: if you go one unit step in the x direction, b tells you how many steps you have to go in the y direction.
Linear function: $y = bx + a$
The value of y depends on the value of x. If b = 0, the value of y does not depend on x: $y = 0 \cdot x + a = a$.
A slope b > 0 corresponds to r(x,y) > 0, b = 0 corresponds to r(x,y) = 0, and b < 0 corresponds to r(x,y) < 0.

Linear relationships between the independent variable (x) and the dependent variable (y) are not perfect. We assume errors are affecting the dependent variable.
What is the best linear fit to the data?
What we need is a quantitative measure of how well the line fits the data!

$e_i = y_i - \hat{y}(x_i)$

Figure 6.1 from the textbook: Schematic illustration of simple linear regression. The regression line, $\hat{y} = bx + a$, is chosen as the one minimizing some measure of the vertical differences (the residuals) between the points and the line. In least-squares regression that measure is the sum of the squared vertical distances. The error or residual, $e$, is the difference between the data point (the observed y value) and the regression line (the y value estimated/predicted by the regression).
How to estimate the best fitting line?
Example: Imagine you want to 'predict' the expected outcome on a final exam given the statistical linear relationship between first exam scores (a test taken earlier in the semester) and the final exam scores.

Data available:
X-axis: First exam score (known before the final exam; our variable of interest before the final exam is taken)
Y-axis: Final exam score

You aim to build a linear regression model to predict your final exam score based on your first exam score. You use the scores from courses you have taken to build such a model.
How to estimate the best fitting line?

How do we estimate the best fitting line for final exam scores based on first exam scores?

[Scatterplot: X-axis is the first exam score, Y-axis is the final exam score. Blue dots mark observed scores in the course from past semesters. A red dot marks a point in the X-Y plane: given a first exam score of X = 65, we 'predict' that the student will have a final exam score around 145.]
How to estimate the best fitting line?

Mathematically we formulate this as a minimization problem: minimize the distance of the data points from the linear regression line. Distance is measured here only in the y-direction (because we assume error only in y). Note further that the squared distance is used.
How to estimate the best fitting line?

The deviations from the deterministic model line are interpreted as random errors (following a Gaussian distribution).

Sum of Squared Errors (SSE): $SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - \hat{y}(x_i) \right)^2$
How to estimate the best fitting line?

Sum of Squared Errors (SSE)

What we need to do is: find the slope and intercept values that minimize the SSE!

This is least-squares regression.


How to estimate the best fitting line?
Minimize
Sum of Squared Errors (SSE) !

A line is defined in the x-y plane when we


know the slope and one point on the line.
In linear regression we want to find the point
where the line intersects the y-axis (intercept)
How to estimate the best fitting line?
• Finding analytic expressions for the least-squares slope, b, and the intercept, a.
• In order to minimize the sum of squared residuals, $SSE = \sum_{i=1}^{n} (y_i - a - b x_i)^2$, we first recap how to find the minimum of a quadratic function.
Recap: finding the maximum and minimum of a quadratic function
• A quadratic function: $f(x) = ax^2 + bx + c$, $a \neq 0$.
• If a > 0, the function has a minimum.
• If a < 0, the function has a maximum.
• The maximum or minimum of the function (the vertex point) can be obtained by finding the root of the derivative of the function.
• The derivative of $f(x)$: $f'(x) = 2ax + b$.
• Setting $f'(x) = 0$ gives $x = -\frac{b}{2a}$.
• When $x = -\frac{b}{2a}$ (and a > 0), we find the minimum of $f(x)$.
How to estimate the best fitting line?
• Finding analytic expressions for the least-squares slope, b, and the intercept, a.
• In order to minimize the sum of squared residuals, $SSE = \sum_{i=1}^{n} (y_i - a - b x_i)^2$,
• set the derivatives of the above equation with respect to the parameters a and b to zero and solve:

$\frac{\partial (SSE)}{\partial a} = -2 \sum_{i=1}^{n} (y_i - a - b x_i) = 0$ and $\frac{\partial (SSE)}{\partial b} = -2 \sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0$
How to estimate the best fitting line?

• Now we have two equations and two unknowns, a and b.
• We can solve for the two unknowns, a and b.
• Derive the final expressions for a and b.
How to estimate the best fitting line?

$\sum_{i=1}^{n} y_i = an + b \sum_{i=1}^{n} x_i$

$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$

These two equations will be useful in the later analysis of variance.
How to estimate the best fitting line?

• Now we have two equations and two unknowns, a and b.
• We can solve for the two unknowns, a and b:

$a = \bar{y} - b\bar{x}$

$b = \frac{n\bar{x}\,\bar{y} - \sum_{i=1}^{n} x_i y_i}{n\bar{x}^2 - \sum_{i=1}^{n} x_i^2}$
Summary: How to estimate the best fitting line?
• Find the slope b and the intercept a such that the sum of squared residuals is minimized.

• $a = \bar{y} - b\bar{x}$

• $b = \frac{n\bar{x}\,\bar{y} - \sum_{i=1}^{n} x_i y_i}{n\bar{x}^2 - \sum_{i=1}^{n} x_i^2}$

• Other expressions of the slope b (equivalent forms used later in these slides):
$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)} = r \, \frac{s_y}{s_x}$


Class Activity:

Fitting the regression line (by hand) and comparing the results.

X (first exam score): 65 67 71 71 66 75 67 70 71 69 69
Y (final exam score): 175 133 185 163 126 198 153 163 159 151 159
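To compare with the by-hand results, here is a minimal Python sketch (assuming NumPy is available) that applies the least-squares formulas above to the class-activity data:

```python
import numpy as np

# Class-activity data: first exam scores (x) and final exam scores (y)
x = np.array([65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69], dtype=float)
y = np.array([175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159], dtype=float)

x_bar, y_bar = x.mean(), y.mean()

# Slope: b = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
b = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)

# Intercept: a = y_bar - b * x_bar (the line passes through the center point)
a = y_bar - b * x_bar

print(f"slope b = {b:.3f}, intercept a = {a:.3f}")

# Cross-check with NumPy's built-in degree-1 polynomial fit
b_np, a_np = np.polyfit(x, y, deg=1)
print(f"np.polyfit: slope = {b_np:.3f}, intercept = {a_np:.3f}")
```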
How to estimate the best fitting line?
Minimize the Sum of Squared Errors (SSE)!

Task 1: Find the best value for the slope:

$\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

$\bar{x}$ and $\bar{y}$ represent the sample means of variables X and Y, respectively.
How to estimate the best fitting line?
Minimize the Sum of Squared Errors (SSE)!

Task 1: Find the best value for the slope.

Note on the notation with the "^": many textbooks distinguish the estimated values from the actual true (but unknown) parameter values using a different symbol, or they use Greek letters for the true values and Latin letters for the estimates. The hat (^) on this slide indicates that. In the next slides we omit this nuance.
How to estimate the best fitting line?
Minimize the Sum of Squared Errors (SSE)!

Task 2: Find the intercept point: $a = \bar{y} - b\bar{x}$
Interpretation of the slope and intercept in terms of:

• statistical mean values,


• variances,
• covariance and correlation

In ordinary linear regression, the fitted line goes through the center point $(\bar{x}, \bar{y})$ of the paired data samples, where $\bar{x}$ and $\bar{y}$ are the sample means of x and y; the intercept a follows from this center point.
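As a quick check (not from the slides, but following directly from the formulas above), substituting the intercept $a = \bar{y} - b\bar{x}$ into the fitted line at $x = \bar{x}$ shows why the line must pass through the center point:

$\hat{y}(\bar{x}) = b\bar{x} + a = b\bar{x} + (\bar{y} - b\bar{x}) = \bar{y}$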
How to estimate the best fitting line?

Minimize the Sum of Squared Errors (SSE).

Analytical solution for min(SSE):

Intercept: $a = \bar{y} - b\bar{x}$
Slope: $b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$
Interpretation of the slope and intercept in terms of:
• statistical mean values,
• variances,
• covariance and correlation

Slope:

$b = \frac{\text{covariance between } x \text{ and } y}{\text{variance of } x} = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
Interpretation of the slope and intercept in terms of:
• statistical mean values,
• variances,
• covariance and correlation

Slope:

$b = \frac{\text{covariance between } x \text{ and } y}{\text{variance of } x}$

$r = \frac{\text{covariance between } x \text{ and } y}{\text{standard deviation of } x \,\times\, \text{standard deviation of } y}$

$b = \text{correlation between } x \text{ and } y \,\times\, \frac{\text{standard deviation of } y}{\text{standard deviation of } x}$
Relationship between the slope b and the Pearson correlation coefficient

Slope of the regression line = correlation coefficient × standard deviation of y / standard deviation of x, i.e., $b = r\,\frac{s_y}{s_x}$.
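A short derivation of this relationship (using the covariance/variance forms from the previous slides, with $s_{xy}$ denoting the sample covariance between x and y):

$b = \frac{s_{xy}}{s_x^2} = \frac{s_{xy}}{s_x s_y} \cdot \frac{s_y}{s_x} = r\,\frac{s_y}{s_x}$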
How to estimate the best fitting line?

Minimize the Sum of Squared Errors (SSE).

The intercept value, $a = \bar{y} - b\bar{x}$, guarantees that the regression line goes through the center point (mean of x, mean of y).

Slope of the regression line: correlation coefficient × standard deviation of y / standard deviation of x, i.e., $b = r\,\frac{s_y}{s_x}$.
How to estimate the best fitting line?

Linear relationship with errors: $y = bx + a + \varepsilon$
The value of y depends on the value of x plus a random error.

Estimated regression line: $\hat{y} = bx + a$, so that

$y = \hat{y} + e = bx + a + e$
Understanding and evaluating regression models
• Two assumptions for the quantities $e_i$:
• $e_i$ are independent random variables with zero mean and constant variance.
• The residuals follow a Gaussian distribution.
• The sample mean of the residuals (dividing the following equation by n) is zero:

$\sum_{i=1}^{n} e_i = 0$

• Given $e_i = y_i - \hat{y}_i$, we have $y_i = \hat{y}_i + e_i$.
• Prove that the mean of $\hat{y}_i$ is $\bar{y}$.
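A one-line sketch of the argument (not from the slides), using $\sum_{i=1}^{n} e_i = 0$ from above:

$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i + e_i) = \frac{1}{n}\sum_{i=1}^{n} \hat{y}_i + \frac{1}{n}\sum_{i=1}^{n} e_i = \frac{1}{n}\sum_{i=1}^{n} \hat{y}_i$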
Understanding and evaluating regression models
• Now we take a closer look at the sum of squared errors (SSE):

$SSE = \sum_{i=1}^{n} e_i^2$

• $s_e^2 = \frac{1}{n} SSE = \frac{1}{n}\sum_{i=1}^{n} e_i^2 = \frac{1}{n}\sum_{i=1}^{n} (e_i - \bar{e})^2$, given $\bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i = 0$.

This is the variance of the residuals/errors.
• If SSE is divided by n − 2 instead of n, it is the residual variance estimated from the sample of residuals (two degrees of freedom are used by the fitted parameters a and b).
Understanding and evaluating regression models
• Variance of the residuals: $s_e^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2$
• Variance of the observed y values: $s_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2$
• Variance of the estimated/predicted y values: $s_{\hat{y}}^2 = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
• Prove $s_y^2 = s_{\hat{y}}^2 + s_e^2$ (Assignment 5)
Understanding and evaluating regression models
• Variance of the residuals: $s_e^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2$
• Variance of the observed y values: $s_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2$
• Variance of the estimated/predicted y values: $s_{\hat{y}}^2 = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
• Prove $s_y^2 = s_{\hat{y}}^2 + s_e^2$ (Assignment 5)
• Hint:
• Start from $s_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i + e_i - \bar{y})^2$
• Replace $e_i$ with $y_i - \hat{y}_i$ in the above equation:
$\frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i + e_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i + y_i - \hat{y}_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^{n} \left[ (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \right]^2$
• So $\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^{n} \left[ (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i) \right]^2$
• Expand the right-hand side of the equation and use the following equations:
$\sum_{i=1}^{n} y_i = an + b\sum_{i=1}^{n} x_i$
$\sum_{i=1}^{n} x_i y_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2$
If you have a different approach, that would be great; feel free to use it!
Understanding and evaluating regression models
• We know $s_y^2 = s_{\hat{y}}^2 + s_e^2$; this is an important relationship!
• $\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \frac{1}{n}\sum_{i=1}^{n} e_i^2$
• Removing 1/n, we get

$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} e_i^2$

• $\sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares (SST).
• $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ is the regression sum of squares (SSR).
• $\sum_{i=1}^{n} e_i^2$ is the sum of squared errors (SSE).
• We get the relationship SST = SSR + SSE.
• This relationship describes how the variation in the predictand, y, can be decomposed into the variation represented by the regression and the variation of the residuals.
Understanding and evaluating regression models
• We take a close look at the relationship SST = SSR + SSE.
• If the variation of the predictand y can be completely explained by SSR, which means SSE = 0, then this is a perfect regression model, and it means all the data points in the scatterplot fall on the regression line.
• If there is absolutely no linear relationship between x and y, the regression slope will be zero, SSR = 0, and SSE = SST.
• So, we can use these three quantities to measure the fit of a regression, or the correspondence between the regression line and a scatterplot of the data.
Variance analysis
• So, we can use quantities related to these three quantities to measure the fit of a regression, or the correspondence between the regression line and a scatterplot of the data.
• These quantities are summarized in an ANOVA table.

Table 6.1 from the textbook


Measures of the fit of a regression model
• So, we can use quantities related to these three quantities to measure the fit of a regression, or the correspondence between the regression line and a scatterplot of the data.
• The first measure is SSE, or the MSE (mean squared error), MSE = SSE/n.
• The smaller the SSE or MSE, the better the fit of the regression model and the better the predictions.
• The second measure is the coefficient of determination, $R^2$:
$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$
• $R^2$ can be interpreted as the proportion of the variation of the predictand (proportional to SST) that is described or accounted for by the regression (SSR). It is also known as the explained variance.
• $R^2$ varies from 0 to 1.
• The larger the $R^2$, the better the fit of the regression model and the better the predictions.
• Prove that the square of the correlation coefficient r between x and y is the same as $R^2$ in a simple univariate linear regression (Assignment 5).
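As a quick numerical illustration (a sketch continuing the exam-score example used earlier, with NumPy assumed), the decomposition SST = SSR + SSE and the agreement between $R^2$ and $r^2$ can be checked as follows:

```python
import numpy as np

# Exam-score data from the class activity
x = np.array([65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69], dtype=float)
y = np.array([175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159], dtype=float)

# Least-squares fit and predicted values
b, a = np.polyfit(x, y, deg=1)
y_hat = b * x + a

# Sums of squares: SST = SSR + SSE
sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)  # regression sum of squares
sse = np.sum((y - y_hat) ** 2)         # sum of squared errors

r2 = ssr / sst                         # coefficient of determination
r = np.corrcoef(x, y)[0, 1]            # Pearson correlation coefficient

print(f"SST = {sst:.2f}, SSR + SSE = {ssr + sse:.2f}")
print(f"R^2 = {r2:.4f}, r^2 = {r ** 2:.4f}")  # the two should agree
```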
Multiple linear regression
• Chapter 6.2.8 in the textbook
• Multiple linear regression is the more general and more common situation of
linear regression. As in the case of simple linear regression, there is still a
single predictand, y, but in distinction, there is more than one predictor (x)
variable.
• Let K denote the number of predictor variables; the multiple linear regression will look like:
$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_K x_K$
• Each of the K predictor variables has its own coefficient, analogous to the slope, b, in a simple univariate linear regression. $b_0$ is the regression constant, analogous to the intercept, a, in a simple univariate linear regression.
• These K + 1 regression coefficients ($b_0, b_1, b_2, b_3, \ldots, b_K$) often are called the regression parameters.
• If K = 1, then $\hat{y} = b_0 + b_1 x_1$, which is the simple linear regression.
Multiple linear regression
• $\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \cdots + b_K x_K$
• To find the regression parameters, the same approach is used as in simple linear regression.
• Minimize the sum of squared errors (SSE):

$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
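A minimal sketch of such a least-squares fit with NumPy (the two-predictor data below are made up purely for illustration):

```python
import numpy as np

# Hypothetical data: n samples, K = 2 predictor variables
rng = np.random.default_rng(0)
n, K = 50, 2
X = rng.normal(size=(n, K))                              # predictors x1, x2
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

# Design matrix: a leading column of ones carries the regression constant b0
A = np.column_stack([np.ones(n), X])

# Least-squares solution minimizes SSE = sum((y - A @ params) ** 2)
params, residuals, rank, _ = np.linalg.lstsq(A, y, rcond=None)
b0, b1, b2 = params
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}")
```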
Curvilinear regression
• When we have nonlinear relations, we often assume an intrinsically linear model and then fit the data to the model using polynomial regression.
• That is, we employ models that use regression to fit curves instead of straight lines. The technique is known as curvilinear regression analysis.
• For example, we test several polynomial regression equations. Polynomial equations are formed by raising our independent variable to successive powers.
Curvilinear regression
• In general, a polynomial equation is referred to by its degree, which is the largest exponent.
• For example, the linear equation is a polynomial equation of the first degree, the quadratic is of the second degree, and the cubic is of the third degree.
• The function of the power terms is to introduce bends into the regression line. With simple linear regression, the regression line is straight. With the addition of the quadratic term, we can introduce or model one bend. With the addition of the cubic term, we can model two bends, and so forth.
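A minimal sketch of polynomial (curvilinear) regression using NumPy's polynomial fitting; the data below are made up for illustration and contain one bend:

```python
import numpy as np

# Hypothetical data with one bend (quadratic relationship plus noise)
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 40)
y = 2.0 + 1.5 * x - 0.2 * x ** 2 + rng.normal(scale=0.5, size=x.size)

# Fit polynomials of increasing degree; each added power term allows one more bend
for degree in (1, 2, 3):
    coeffs = np.polyfit(x, y, deg=degree)   # coefficients, highest degree first
    y_hat = np.polyval(coeffs, x)
    sse = np.sum((y - y_hat) ** 2)
    print(f"degree {degree}: SSE = {sse:.2f}")
```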
