05 Linear Regression

The document discusses linear regression and estimating coefficients. It begins with the basic form of a simple linear regression model relating a single input (x) to a continuous response variable (y). It then explains that coefficients are estimated to minimize the sum of squared errors between predicted and actual y values. Specifically, the coefficients are estimated with the normal equation, \hat{b} = (X^T X)^{-1} X^T Y.

Uploaded by

ashishamitav123


DATA SCIENCE

LINEAR REGRESSION
AGENDA

0. BASIC FORM
I. ESTIMATING COEFFICIENTS
II. CATEGORICAL VARIABLES
III. MAKING INFERENCES
LINEAR REGRESSION

0. BASIC FORM
                continuous             categorical
supervised      regression             classification
unsupervised    dimension reduction    clustering
Q: What is the motivation for learning about linear regression?

• widely used
• runs fast
• easy to use (not a lot of tuning required)
• highly interpretable
• basis for many other methods
Q: What is a regression model?

A: A functional relationship between input variables and a continuous response variable.

The simple linear regression model captures a linear relationship between a single input variable x and a response variable y:

y = b0 + b1 x + e
Q: What do the terms in this model mean?
y = b0 + b1 x + e
A: y = response variable (the one we want to predict)
x = input variable (the one we use to train the model)
b0 = intercept (where the line crosses the y-axis)
b1 = slope (the change in y per unit change in x)
e = error term (the part of y that the line does not explain)
[Figure: the line y = b0 + b1 x plotted against x; b0 is the y-intercept and b1 = ∆y / ∆x is the slope.]
We can extend this model to several input variables, giving us the multiple linear regression model:

y = b0 + b1 x1 + … + bn xn + e
LINEAR REGRESSION

I. ESTIMATING COEFFICIENTS
Q: How do we determine the impact of a particular input variable on the response variable?
A: The coefficient estimates (\hat{b}).
Q: What is meant by estimates?
A: We are making an inference based on a sample.

[Figure: y against x, showing the line fitted from the sample (estimates) next to the true model.]

A fundamental part of statistics is quantifying our confidence that our estimates reflect the truth.
Q: How do we estimate coefficients for a linear model?
A: By finding the line that minimizes the sum of squared residuals:

SS_{residuals} = \sum_{i=1}^{N} (\hat{y}_i - y_i)^2

[Figure: scatter plot of y against x with the fitted line; each residual is the vertical distance between the model prediction \hat{y}_i and the observed result y_i.]
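As a minimal sketch of the quantity being minimized, the snippet below computes the sum of squared residuals for a candidate line. The function name `sum_squared_residuals` and the toy data are illustrative assumptions, not part of the original slides.

```python
def sum_squared_residuals(xs, ys, b0, b1):
    """Return the sum over i of (y_hat_i - y_i)^2 for the line y_hat = b0 + b1 * x."""
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys))


# Toy data (assumed for illustration): points lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

perfect = sum_squared_residuals(xs, ys, b0=1.0, b1=2.0)  # the line the data came from
worse = sum_squared_residuals(xs, ys, b0=0.0, b1=2.0)    # intercept off by one
```

Here the line that generated the data has zero residual error, while shifting the intercept by one adds a squared error of 1 for each of the four points.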
Q: How do we calculate estimates that minimize the sum of squared errors?
A: Through calculus, it can be shown that the following estimate minimizes the sum of squared errors:

\hat{b} = (X^T X)^{-1} X^T Y
Let's walk through a trivial calculation to see how this works.

X = \begin{bmatrix} 1 & 3.385 \\ 1 & 0.48 \\ 1 & 1.35 \\ 1 & 465 \\ 1 & 36.33 \end{bmatrix}, \quad
Y = \begin{bmatrix} 44.5 \\ 15.5 \\ 8.1 \\ 423 \\ 119.5 \end{bmatrix}

The first column of X is a "dummy" column of ones, a placeholder for the intercept b0. The second column of X is the predictor column, and Y is the response column.

Along the way, we'll review some matrix math.
Transposing simply means flipping the columns and rows.

X^T X = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 3.385 & 0.48 & 1.35 & 465 & 36.33 \end{bmatrix}
\begin{bmatrix} 1 & 3.385 \\ 1 & 0.48 \\ 1 & 1.35 \\ 1 & 465 \\ 1 & 36.33 \end{bmatrix}
= \begin{bmatrix} 5 & 506.54 \\ 506.54 & 217558.38 \end{bmatrix}
Only square matrices can be inverted.

(X^T X)^{-1} = \begin{bmatrix} 5 & 506.54 \\ 506.54 & 217558.38 \end{bmatrix}^{-1}
\approx \begin{bmatrix} 0.26 & -6.1 \times 10^{-4} \\ -6.1 \times 10^{-4} & 6.0 \times 10^{-6} \end{bmatrix}

Taking the inverse of a 2x2 matrix means swapping the two diagonal entries, negating the two off-diagonal entries, and dividing each value by the determinant. For example, the top-left entry is

\frac{217558.38}{5 \times 217558.38 - 506.54 \times 506.54} \approx 0.26
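The 2x2 inversion rule can be sketched in a few lines of plain Python. The helper name `invert_2x2` is an assumption made for this illustration, and the input uses the rounded X^T X entries from the worked example.

```python
def invert_2x2(m):
    """Invert [[a, b], [c, d]]: swap a and d, negate b and c, divide by the determinant."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular; no inverse exists")
    return [[d / det, -b / det], [-c / det, a / det]]


# X^T X from the worked example (values as rounded on the slide).
xtx = [[5, 506.54], [506.54, 217558.38]]
inv = invert_2x2(xtx)
```

The entries of `inv` agree with the slide's rounded values: roughly 0.26 on the top left, about -6.1e-4 off the diagonal, and about 6.0e-6 on the bottom right.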
X^T Y = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 \\ 3.385 & 0.48 & 1.35 & 465 & 36.33 \end{bmatrix}
\begin{bmatrix} 44.5 \\ 15.5 \\ 8.1 \\ 423 \\ 119.5 \end{bmatrix}
= \begin{bmatrix} 610.6 \\ 201205.4 \end{bmatrix}
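\begin{bmatrix} \hat{b}_0 \\ \hat{b}_1 \end{bmatrix}
= \begin{bmatrix} 0.26 & -6.1 \times 10^{-4} \\ -6.1 \times 10^{-4} & 6.0 \times 10^{-6} \end{bmatrix}
\begin{bmatrix} 610.6 \\ 201205.4 \end{bmatrix}
= \begin{bmatrix} 37.201 \\ 0.838 \end{bmatrix}

The inverse is displayed rounded; the result on the right comes from the unrounded values.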
LINEAR REGRESSION

II. CATEGORICAL VARIABLES
Q: How do we deal with categorical variables (i.e., variables with k levels)?
A: Create k-1 binary ("dummy") variables.

Major (k=4)         Engineering   Business   Literature
Computer Science    0             0          0
Engineering         1             0          0
Business            0             1          0
Literature          0             0          1
Business            0             1          0
Engineering         1             0          0

Computer Science is the reference level.
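A minimal sketch of this encoding in plain Python follows. The helper `dummy_encode` is a hypothetical name introduced for illustration; libraries such as pandas offer their own routines, which may choose the reference level differently.

```python
def dummy_encode(values, reference):
    """Encode a categorical column as k-1 binary columns, omitting `reference`."""
    levels = []
    for v in values:  # keep the non-reference levels in first-seen order
        if v != reference and v not in levels:
            levels.append(v)
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


majors = ["Computer Science", "Engineering", "Business",
          "Literature", "Business", "Engineering"]
columns, encoded = dummy_encode(majors, reference="Computer Science")
```

The reference level becomes the all-zeros row, so every major is still uniquely identified by the three columns.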
Q: Why k-1 and not k?
A: Because k-1 dummy variables already distinguish all k levels (the reference level is the row of all zeros), and to avoid multicollinearity.

Multicollinearity is when two or more predictor variables in a regression model are very highly correlated.

Q: Does it matter which factor level I leave out?
A: Yes, this is the reference point for all other factor levels.
Q: Is this a limitation?
A: Not really, a comparison must have a baseline.
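To see why a k-th dummy causes trouble, the sketch below builds a design matrix with an intercept column and one dummy per level, then checks that the dummies always sum to the intercept entry. The variable names are assumptions made for this illustration.

```python
majors = ["Computer Science", "Engineering", "Business",
          "Literature", "Business", "Engineering"]
levels = ["Computer Science", "Engineering", "Business", "Literature"]  # all k = 4 levels

# Design matrix with an intercept column AND one dummy per level (k dummies, not k-1).
rows = [[1] + [1 if m == level else 0 for level in levels] for m in majors]

# In every row the k dummies add up to the intercept entry, so the intercept column
# is an exact linear combination of the dummy columns.
redundant = all(sum(row[1:]) == row[0] for row in rows)
```

Because one column is an exact combination of the others, X^T X has no inverse and the coefficients cannot be determined uniquely; dropping one level removes the redundancy.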
Q: Is this the only way to represent categorical data?
A: This is the conventional way to represent nominal data; ordinal data, however, can be represented with integers.

Ordinal means that the data have an order, while nominal data have no order.

Q: What does this mean?
A: Categories that can be ranked (e.g., strongly disagree, disagree, neutral, agree, strongly agree) can be represented as 1, 2, 3, 4, 5.
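A small sketch of the integer coding, using the five-point scale from the answer above. The names `LIKERT_SCALE` and `encode_ordinal` are assumptions made for this illustration.

```python
LIKERT_SCALE = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}


def encode_ordinal(responses, scale=LIKERT_SCALE):
    """Map ranked category labels to integers; unknown labels raise KeyError."""
    return [scale[r.strip().lower()] for r in responses]


codes = encode_ordinal(["Agree", "neutral", "Strongly Disagree"])
```

Note that feeding these integers to a linear model treats the steps between adjacent categories as equally sized, which is a modeling choice and not something the data guarantee.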
LINEAR REGRESSION

III. MAKING INFERENCES
Linear modeling is a parametric technique, meaning that it relies on specific assumptions about the underlying data:
1) Linearity and additivity of the relationship between input and response variables
2) Homoscedasticity of the errors
3) Normality of the error distribution
4) Statistical independence of the errors
Source: http://people.duke.edu/~rnau/testing.htm
INTERPRETING THE OUTPUT
Q: How do we determine whether a coefficient estimate is significant?
A: The p-value associated with the coefficient's t-value.
Q: What is a p-value?
A: The probability of getting an outcome at least as extreme as the one observed (e.g., the coefficient estimate) if the null hypothesis were true (p < 0.05 is typically considered significant).
Q: What is the null hypothesis for linear regression coefficients?
A: There is no relationship between X and Y.

H0: b_j = 0

Ha: b_j ≠ 0
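As a rough sketch of how the t-value for this test is obtained, the snippet below reuses the five data points from the worked example. It relies on the standard simple-regression formula SE(b1) = sqrt(s^2 / Sxx), which the slides do not state, and on a critical value of 3.182 read from a t-table for 3 degrees of freedom; both are assumptions added for illustration.

```python
import math

x = [3.385, 0.48, 1.35, 465.0, 36.33]
y = [44.5, 15.5, 8.1, 423.0, 119.5]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)

# Least-squares estimates (equivalent to the normal equation for one predictor).
b1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0_hat = y_bar - b1_hat * x_bar

# Residual variance with n - 2 degrees of freedom (two estimated coefficients).
ssr = sum((yi - (b0_hat + b1_hat * xi)) ** 2 for xi, yi in zip(x, y))
s2 = ssr / (n - 2)

# Standard error of the slope and the t-value for H0: b1 = 0.
se_b1 = math.sqrt(s2 / sxx)
t_value = b1_hat / se_b1

T_CRIT_95 = 3.182  # two-sided 5% critical value for 3 degrees of freedom (t-table)
reject_h0 = abs(t_value) > T_CRIT_95
```

A t-value beyond the critical value corresponds to p < 0.05, so in this sketch the null hypothesis for the slope is rejected. With only five observations, and one of them far from the rest, the result should be read as an illustration of the mechanics only.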
Q: What does the confidence interval mean?
A: If we repeated the sampling many times, about 95% of the intervals computed this way would contain the true coefficient.

Confidence intervals are calculated based on the error variance.

[Figure: confidence intervals for \hat{b}_j drawn against the true value for b_1.]
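The repeated-sampling reading of a confidence interval can be checked with a small simulation. Everything here is an assumption made for illustration: the "true model" parameters, the five x values, the helper name `slope_confidence_interval`, and the critical value 3.182 taken from a t-table for 3 degrees of freedom.

```python
import math
import random


def slope_confidence_interval(x, y, t_crit):
    """Return (low, high) for the slope of a simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(ssr / (n - 2) / sxx)  # the error variance drives the width
    return b1 - t_crit * se_b1, b1 + t_crit * se_b1


random.seed(0)
TRUE_B0, TRUE_B1, SIGMA = 2.0, 0.5, 1.0  # assumed true model for the simulation
x = [1.0, 2.0, 3.0, 4.0, 5.0]
T_CRIT = 3.182                           # 97.5th percentile of t with n - 2 = 3 df

trials = 2000
hits = 0
for _ in range(trials):
    # Draw a fresh sample from the true model and rebuild the interval.
    y = [TRUE_B0 + TRUE_B1 * xi + random.gauss(0.0, SIGMA) for xi in x]
    low, high = slope_confidence_interval(x, y, T_CRIT)
    hits += low <= TRUE_B1 <= high
coverage = hits / trials
```

The fraction `coverage` should land near 0.95: the true slope stays fixed, and it is the interval that moves from sample to sample.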
