5 - Part II - Regression Analysis w-notes(1)
OMGT 3223
Lecture 5: Regression Analysis
Lecture Outline
Predictive Models
In this topic we turn our attention to regression models. Recall that quantitative models can be classified as descriptive,
prescriptive or predictive models. Regression models are predictive models.
Predictive models are similar to descriptive models but allow for the fact that the outputs cannot be predicted exactly
from the inputs. In a predictive model the exact functional relationship between the inputs and outputs cannot be fully
described with the inputs available.
Regression Analysis
Regression analysis examines the relation of a dependent variable to one or more independent variables. Regression is a
rich and multi-faceted topic (entire courses cover nothing but regression!) and is one of the most widely applied tools
in statistics. Our objectives will be to review simple and multiple regression and to learn how to perform regression
analysis using Excel.
Linear regression is a mathematical technique that relates a dependent variable to one or more independent variables in
the form of a linear equation.
1. Simple linear regression (aka Bivariate regression) generates a linear equation that best fits the observed data to
a single independent or predictor variable.
2. Multiple linear regression generates a linear equation that best fits the observed data to multiple independent or
predictor variables.
There are several different lines that provide a “good” approximation to the data. Therefore, we need some criteria to
determine the “best” fit line. Different criteria are used in different applications, all of which are based on minimizing
some “error” function. Errors, also known as residuals, can be measured for each y data point.
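As a minimal sketch of one such criterion, the code below scores two candidate lines by the sum of their squared residuals, which is the error function minimized by ordinary least squares. The five data points and both candidate lines are invented for illustration; they are not taken from the course files.

```python
# Hypothetical (x, y) observations, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def residuals(b0, b1, xs, ys):
    """Residual for each point: actual y minus the y predicted by the line."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def sum_squared_errors(b0, b1, xs, ys):
    """Least-squares error function: the sum of the squared residuals."""
    return sum(e ** 2 for e in residuals(b0, b1, xs, ys))


# Two candidate lines that would both look reasonable on a scatter plot.
sse_a = sum_squared_errors(2.0, 0.7, xs, ys)  # candidate A: y = 2.0 + 0.7x
sse_b = sum_squared_errors(2.2, 0.6, xs, ys)  # candidate B: y = 2.2 + 0.6x

# The candidate with the smaller total is the better fit under this criterion.
best = "A" if sse_a < sse_b else "B"
print(f"SSE A = {sse_a:.2f}, SSE B = {sse_b:.2f}, better fit: {best}")
```

Squaring the residuals keeps positive and negative errors from cancelling and penalizes large misses more heavily than small ones.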
$\hat{y}_i = b_0 + b_1 x_i$
The regression line predicts the value of y for a given x. The predicted y value ($\hat{y}_i$) is the value calculated from
the regression equation for the corresponding x value ($x_i$). The actual y values will be scattered around that prediction.
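The regression equation can be sketched in a few lines of code. The example below assumes the usual least-squares formulas for the slope and intercept (the notes do not spell these out) and uses invented data; the function names `fit_simple_regression` and `predict` are hypothetical.

```python
def fit_simple_regression(xs, ys):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # slope
    b0 = y_bar - b1 * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return b0, b1


def predict(b0, b1, x):
    """Predicted y value (y-hat) for a given x."""
    return b0 + b1 * x


# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_simple_regression(xs, ys)
y_hat_6 = predict(b0, b1, 6.0)
print(f"y-hat = {b0:.2f} + {b1:.2f} x; prediction at x = 6: {y_hat_6:.2f}")
```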
Interpretation: The correlation coefficient, r, measures the strength and direction of a linear relationship. It ranges
from -1 to +1.
Interpretation: The coefficient of determination, R², measures the proportion of the total variation in y that is
explained by the regression line, i.e., R² is the percentage of the variation in the dependent variable y that is
accounted for by the independent variable x.
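The two measures can be computed directly from their definitions. The sketch below uses invented data and hypothetical function names; it also illustrates that in simple linear regression R² equals the square of r.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


def r_squared(xs, ys):
    """R-squared as 1 - SSE/SST for the least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total
    return 1.0 - sse / sst


# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(xs, ys)
r2 = r_squared(xs, ys)
print(f"r = {r:.4f}, R^2 = {r2:.4f}, r squared = {r * r:.4f}")
```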
Two alternative approaches can be used to perform simple linear regression analysis in Excel.
1. The Data Analysis Add-In can be used to develop and evaluate a simple linear regression model.
2. Individual Excel functions can also be used to estimate and evaluate a simple linear regression model.
The =Intercept and =Slope functions can be used to estimate a simple regression line.
The =Correl and =RSQ functions can be used to evaluate a simple linear regression model.
Download the file Simple Linear Regression Example.xls and complete the following tasks:
1. Use the data to develop a simple linear regression model.
2. Forecast the number of applications for the university if tuition increases to $10,000 per year and if tuition is
lowered to $7,000 per year.
3. Evaluate the regression model.
Regression Model:
The following equation predicts the average number of freshman applications for a given tuition cost.
# Applications =
Model Evaluation:
Tuition cost “explains” about 65% of the variation in the number of freshman applications in this sample.
The standard error of 408.4 is the estimated standard deviation of the actual number of applications about the regression
line (i.e., the typical size of a residual), not about the mean.
ANOVA Table: The ANOVA table is the middle section of the regression output. ANOVA stands for ANalysis Of
VAriance. ANOVA studies the overall variation in the y variable and the data in the ANOVA table can help us perform
diagnostics on the model.
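Model Significance: We can use the ANOVA table to determine whether the regression model as a whole is statistically
significant. The Significance-F value is the p-value of the overall F-test: the probability of obtaining an F statistic at
least as large as the one observed if y were in fact unrelated to the predictor(s).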
A model with a Significance-F value under .05 is generally regarded as a statistically significant model.
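The quantities in the ANOVA table can be reproduced by hand. The sketch below, on invented data and with a hypothetical function name, builds the sums of squares, the F statistic and the standard error for a simple regression. It stops at F: converting F to Significance-F requires the upper tail of the F distribution with 1 and n - 2 degrees of freedom, which Excel exposes as F.DIST.RT.

```python
import math


def anova_simple_regression(xs, ys):
    """Sums of squares, F statistic and standard error for a one-predictor model."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]

    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by the line
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # residual (unexplained)
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation in y

    df_regression, df_residual = 1, n - 2
    msr = ssr / df_regression
    mse = sse / df_residual
    return {
        "SSR": ssr,
        "SSE": sse,
        "SST": sst,
        "F": msr / mse,
        "standard_error": math.sqrt(mse),
    }


# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

table = anova_simple_regression(xs, ys)
for name, value in table.items():
    print(f"{name}: {value:.4f}")
```

The total variation splits into the explained and residual parts (SST = SSR + SSE), and the standard error is the square root of the residual mean square.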
In simple (bivariate) regression models we limit ourselves to a single predictor or independent variable. However, in
many cases we may have multiple potential predictor variables (e.g., when predicting house prices).
Simple regression can be easily extended to allow for multiple predictors. When we have more than one predictor we refer
to the model as a multiple regression model. Multiple regression is a more powerful extension of simple linear regression.
$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k$
The mean values of the dependent or response variable Y are estimated as a linear function of the multiple independent or
predictor X variables.
Data Format:
The n observed values of the response variable Y and the proposed predictor variables X1, X2, … , Xk are presented in the
form of a table with n rows and k + 1 columns: one column for Y and one column for each of the k predictors.
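To make the data layout concrete, here is a minimal sketch of fitting a multiple regression from rows of predictor values. It assumes the least-squares coefficients are found by solving the normal equations, and it uses a small hand-written solver so that no external library is needed. All names are hypothetical, and the data were constructed without noise from Y = 1 + 2 X1 + 3 X2 so that the expected answer is known.

```python
def solve_linear_system(a, b):
    """Solve a * z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_multiple_regression(rows, ys):
    """rows: one [x1, ..., xk] list per observation. Returns [b0, b1, ..., bk]."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, row):
    """Y-hat = b0 + b1*x1 + ... + bk*xk for one observation."""
    b0, slopes = coefficients[0], coefficients[1:]
    return b0 + sum(b * x for b, x in zip(slopes, row))


# Hypothetical noise-free data generated from Y = 1 + 2*X1 + 3*X2.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]]
ys = [9.0, 8.0, 19.0, 18.0, 29.0]

coefficients = fit_multiple_regression(rows, ys)
forecast = predict(coefficients, [6.0, 5.0])
print([round(b, 6) for b in coefficients], round(forecast, 6))
```

With real data the fit will not be exact, and the recovered coefficients are estimates rather than the true values.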
The Data Analysis Add-In can be used to develop Multiple Linear Regression models.
Download the file Multiple Linear Regression Example.xls and complete the following tasks:
1. Use Excel to develop a multiple linear regression equation using ‘Private Endowment’ and ‘Annual Budget’ as
the independent or predictor variables.
2. Forecast a ranking for a private endowment of $70 million and an annual budget of $40 million.
3. Evaluate the regression model.
Regression Model:
The following equation predicts the ranking position for a given endowment figure and a given budget level.
Model Evaluation:
The model “explains” about 91.4% of the variation in the ranking positions in this sample.
The Adjusted R²: Adjusted R² is a measure similar to R². It adjusts R² downward, adding a penalty for each additional
independent variable. We need the Adjusted R² because adding more predictor variables will always make R² increase
(or at least stay the same!), even when the new variables have no real predictive value.
Unlike R², the Adjusted R² will increase only if the additional predictor variables improve the model more
than would be expected by chance. Adding a predictor may make Adjusted R² decrease if the new variable
is not a good predictor.
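The penalty can be illustrated with the standard formula, Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The formula is not stated in these notes, and the R² values, n and k below are invented for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for n observations and k predictor variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# Hypothetical scenario: 20 observations, one predictor, R^2 = 0.600.
one_predictor = adjusted_r_squared(0.600, n=20, k=1)

# Adding a weak second predictor nudges R^2 up only slightly, to 0.605.
two_predictors = adjusted_r_squared(0.605, n=20, k=2)

print(f"k = 1: adjusted R^2 = {one_predictor:.4f}")
print(f"k = 2: adjusted R^2 = {two_predictors:.4f}")
```

In this invented scenario R² rises when the second predictor is added, but Adjusted R² falls, which signals that the extra variable is not pulling its weight.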
Model Significance:
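We can determine whether a multiple regression model as a whole is statistically significant. The Significance-F value is
the p-value of the overall F-test: the probability of obtaining an F statistic at least as large as the one observed if
none of the predictors were related to Y.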
A model with a Significance-F value under .05 is generally regarded as a statistically significant model.
Variable Significance:
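We can also determine how significant each predictor (i.e., each independent variable) is. A t-statistic and a p-value are
calculated for each independent variable. The p-value associated with each independent variable is the probability of
obtaining a t-statistic at least as extreme as the one observed if that variable's true coefficient were zero (i.e., if the
variable had no linear relationship with Y once the other predictors are accounted for). It is not the probability that
the variable is insignificant.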
A variable with a p-value under .05 is generally regarded as a statistically significant predictor.
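In symbols, the test for predictor j can be written as follows. This is the standard form of the coefficient t-test, stated here as a sketch rather than quoted from the course material: $\operatorname{se}(b_j)$ denotes the standard error of the coefficient as reported in the regression output, and $T_{n-k-1}$ a t-distributed variable with $n - k - 1$ degrees of freedom.

```latex
t_j = \frac{b_j}{\operatorname{se}(b_j)},
\qquad
p_j = P\left( \lvert T_{n-k-1} \rvert \ge \lvert t_j \rvert \right)
```

A coefficient that is large relative to its standard error gives a large |t| and therefore a small p-value.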