3-4 CLRM
CLRM (Classical Linear Regression Model): An Overview
GIM
Learning Objectives
- Regression Model.
- Regression vs Correlation.
- Simple Regression.
- Terminology.
- Assumptions.
- Properties of OLS Estimators.
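What is a Regression Model?
- Concerned with the relationship between a given variable and one or more other variables.
- An attempt to explain movements in one variable by movements in one or more other variables.
- Names of y and the xs in regression models:
  - y: the dependent variable, also called the regressand or the explained variable.
  - xs: the independent variables, also called the regressors or the explanatory variables.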
Regression vs Correlation
- Correlation.
  - Measures the degree of linear association between two variables.
  - Treats y and x completely symmetrically: the correlation between y and x is the same as the correlation between x and y.
  - Does not imply that changes in x cause changes in y, or that changes in y cause changes in x.
- Regression.
  - The dependent variable (y) and the independent variable(s) (xs) are treated very differently.
  - The y variable is assumed to be random or ‘stochastic’ in some way, i.e., to have a probability distribution.
  - The x variables, however, are assumed to have fixed (‘non-stochastic’) values in repeated samples.
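A minimal sketch in plain Python, using made-up data, illustrates the symmetry point. The correlation coefficient is identical whichever variable is listed first. Regressing y on x and regressing x on y, by contrast, give different slopes. The helper names (`mean`, `cov`, `corr`, `ols_slope`) are hypothetical and exist only for this illustration.

```python
# Hypothetical data, made up purely to illustrate the point.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def corr(a, b):
    """Pearson correlation coefficient: symmetric in its two arguments."""
    return cov(a, b) / (cov(a, a) ** 0.5 * cov(b, b) ** 0.5)


def ols_slope(dep, indep):
    """OLS slope from regressing `dep` on `indep` (intercept included)."""
    return cov(indep, dep) / cov(indep, indep)


print(f"corr(x, y) = {corr(x, y):.4f}   corr(y, x) = {corr(y, x):.4f}")
print(f"slope of y on x = {ols_slope(y, x):.4f}")
print(f"slope of x on y = {ols_slope(x, y):.4f}")
```

The product of the two slopes equals the squared correlation. This is one way to see that a regression, unlike a correlation, depends on which variable is treated as dependent.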
Simple Regression
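- OLS (ordinary least squares) involves minimising the Residual Sum of Squares (RSS).
- With the five observations in the example, minimise $\sum_{t=1}^{5} \hat{u}_t^2$ or, equivalently, $\sum_{t=1}^{5} (y_t - \hat{y}_t)^2$.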
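The sketch below shows what "minimising the RSS" compares. It computes $\sum_t (y_t - \hat{y}_t)^2$ for two candidate lines on a five-point sample and prefers the line with the smaller value. The data, the candidate lines and the `rss` helper are hypothetical and were chosen only for illustration.

```python
# Hypothetical 5-point sample (T = 5), invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1]


def rss(alpha_hat, beta_hat, xs, ys):
    """Residual sum of squares: sum over t of (y_t - yhat_t)^2."""
    total = 0.0
    for x_t, y_t in zip(xs, ys):
        y_fit = alpha_hat + beta_hat * x_t   # fitted value yhat_t
        u_hat = y_t - y_fit                  # residual uhat_t
        total += u_hat ** 2
    return total


# Two candidate lines (intercept, slope); OLS prefers the smaller RSS.
line_a = (0.0, 1.0)   # yhat = 0.0 + 1.0 x
line_b = (0.5, 0.8)   # yhat = 0.5 + 0.8 x
print("RSS for line a:", rss(*line_a, x, y))
print("RSS for line b:", rss(*line_b, x, y))
```

OLS does not search over a handful of candidate lines like this. It finds the single pair of values that minimises the RSS over all possible intercepts and slopes.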
Simple Regression
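OLS (ordinary least squares): the most common method used to fit a line to the data
- Equation of the fitted line:
  - $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$
- Hence the following function (also known as the loss function) is minimised:
  - $L = \sum_{t=1}^{T} (y_t - \hat{y}_t)^2 = \sum_{t=1}^{T} (y_t - \hat{\alpha} - \hat{\beta} x_t)^2$
- For T observations, the coefficient estimators for the slope and the intercept are given by
  - $\hat{\beta} = \dfrac{\sum_{t=1}^{T} x_t y_t - T\bar{x}\bar{y}}{\sum_{t=1}^{T} x_t^2 - T\bar{x}^2}$
  - $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$.
- Hence, given $x_t$ and $y_t$ (with the $x_t$ not all equal), it is always possible to calculate the values of the two parameters, $\hat{\alpha}$ and $\hat{\beta}$, that best fit the set of data.
- This method of finding the optimum is known as OLS.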
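The slope and intercept estimators can be derived by minimising the loss function $L(\hat{\alpha}, \hat{\beta}) = \sum_{t=1}^{T} (y_t - \hat{\alpha} - \hat{\beta} x_t)^2$. Both partial derivatives are set to zero (the first-order conditions), using $\sum_{t=1}^{T} x_t = T\bar{x}$ and $\sum_{t=1}^{T} y_t = T\bar{y}$. The following is a worked derivation:

```latex
\begin{aligned}
\frac{\partial L}{\partial \hat\alpha} &= -2\sum_{t=1}^{T} (y_t - \hat\alpha - \hat\beta x_t) = 0
  \;\Rightarrow\; T\bar{y} - T\hat\alpha - \hat\beta T\bar{x} = 0
  \;\Rightarrow\; \hat\alpha = \bar{y} - \hat\beta \bar{x} \\
\frac{\partial L}{\partial \hat\beta} &= -2\sum_{t=1}^{T} x_t (y_t - \hat\alpha - \hat\beta x_t) = 0
  \;\Rightarrow\; \sum_{t=1}^{T} x_t y_t - \hat\alpha T\bar{x} - \hat\beta \sum_{t=1}^{T} x_t^2 = 0 \\
&\text{substituting } \hat\alpha = \bar{y} - \hat\beta\bar{x}: \quad
  \sum_{t=1}^{T} x_t y_t - T\bar{x}\bar{y} + \hat\beta T\bar{x}^2 - \hat\beta \sum_{t=1}^{T} x_t^2 = 0 \\
\Rightarrow\; \hat\beta &= \frac{\sum_{t=1}^{T} x_t y_t - T\bar{x}\bar{y}}{\sum_{t=1}^{T} x_t^2 - T\bar{x}^2}
\end{aligned}
```

The denominator equals $\sum_{t=1}^{T} (x_t - \bar{x})^2$. It is zero only when every $x_t$ takes the same value, which is why the estimators require the $x_t$ not to be all equal.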
Simple Regression
- The fund manager's intuition is that the β on this fund is positive.
- The task is to establish the relationship between x and y given the data.
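The fund example can be approached as follows: fit the line with the closed-form OLS estimators and check the sign of $\hat{\beta}$. The slide does not specify the data or say what x and y measure. For illustration only, this sketch assumes that $x_t$ is the excess return on the market and $y_t$ is the excess return on the fund. The return figures are invented, and the `ols_fit` helper is hypothetical.

```python
# Hypothetical annual excess returns in %, invented for illustration (NOT real data).
# Assumption: x_t = excess return on the market, y_t = excess return on the fund.
x = [10.0, 25.0, 5.0, 18.0, 12.0]
y = [8.0, 20.0, 3.0, 16.0, 9.0]


def ols_fit(xs, ys):
    """Return (alpha_hat, beta_hat) from the closed-form OLS estimators."""
    T = len(xs)
    x_bar = sum(xs) / T
    y_bar = sum(ys) / T
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_xx = sum(a * a for a in xs)
    denom = sum_xx - T * x_bar ** 2          # equals sum of (x_t - x_bar)^2
    if denom <= 1e-12 * max(1.0, sum_xx):
        raise ValueError("all x_t are (numerically) equal; slope is undefined")
    beta_hat = (sum_xy - T * x_bar * y_bar) / denom
    alpha_hat = y_bar - beta_hat * x_bar      # note the minus sign
    return alpha_hat, beta_hat


alpha_hat, beta_hat = ols_fit(x, y)
residuals = [yt - (alpha_hat + beta_hat * xt) for xt, yt in zip(x, y)]

print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
print("beta_hat > 0:", beta_hat > 0)
```

A positive $\hat{\beta}$ on these made-up numbers is consistent with the manager's intuition. A sign from a single small sample is only a point estimate, however, and is not a formal test.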
Terminology
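- PRF (population regression function)
  - Represents the true relationship between the variables.
  - Also known as the data generating process (DGP).
  - $y_t = \alpha + \beta x_t + u_t$
  - Note the disturbance term, $u_t$.
  - Even if we had the entire population, the points would still not lie exactly on a line, because of $u_t$.
- SRF (sample regression function)
  - The relationship estimated using a sample.
  - $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$
  - $y_t = \hat{\alpha} + \hat{\beta} x_t + \hat{u}_t$, where $\hat{u}_t$ is the residual.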
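A small simulation illustrates the PRF/SRF distinction. Data are generated from a known DGP, $y_t = \alpha + \beta x_t + u_t$, and an SRF is then estimated by OLS. The "true" values of α, β and the disturbance spread, the normal distribution for $u_t$, and the uniform draws for $x_t$ are all arbitrary assumptions made for this sketch.

```python
import random

# Assumed "true" PRF parameters, chosen arbitrarily for the simulation.
ALPHA, BETA, SIGMA = 1.0, 0.5, 2.0
T = 200

random.seed(42)
x = [random.uniform(0.0, 10.0) for _ in range(T)]
# DGP: y_t = alpha + beta * x_t + u_t, with u_t ~ N(0, SIGMA^2) (an assumption).
u = [random.gauss(0.0, SIGMA) for _ in range(T)]
y = [ALPHA + BETA * xt + ut for xt, ut in zip(x, u)]

# SRF estimated by OLS (deviation form of the slope estimator).
x_bar = sum(x) / T
y_bar = sum(y) / T
beta_hat = (sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y))
            / sum((xt - x_bar) ** 2 for xt in x))
alpha_hat = y_bar - beta_hat * x_bar

y_fit = [alpha_hat + beta_hat * xt for xt in x]     # yhat_t from the SRF
u_hat = [yt - yf for yt, yf in zip(y, y_fit)]       # residuals uhat_t

print(f"PRF (true):      alpha = {ALPHA}, beta = {BETA}")
print(f"SRF (estimated): alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
# With an intercept, OLS residuals sum to zero (up to rounding error).
print(f"sum of residuals = {sum(u_hat):.2e}")
```

The estimates land close to, but not exactly on, the true parameters, and a different seed gives a different SRF. The residuals $\hat{u}_t$ are also not the same as the unobservable disturbances $u_t$.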
Assumptions
Properties of OLS Estimators