Regression Analysis: Case Study 1
Dr. Kempthorne
September 23, 2013
Contents
1 Linear Regression Models for Asset Pricing
1.1 CAPM Theory
1.2 Historical Financial Data
1.3 Fitting the Linear Regression for CAPM
1.4 Regression Diagnostics
1.5 Adding Macro-economic Factors to CAPM
1.6 References
1 Linear Regression Models for Asset Pricing
1.1 CAPM Theory
Sharpe (1964) and Lintner (1965) developed the Capital Asset Pricing Model
for a market in which investors have the same expectations, hold portfolios of
risky assets that are mean-variance efficient, and can borrow and lend money
freely at the same risk-free rate. In such a market, the expected return of asset
j is
$$E[R_j] = R_{riskfree} + \beta_j \left( E[R_{Market}] - R_{riskfree} \right)$$
$$\beta_j = Cov[R_j, R_{Market}] / Var[R_{Market}]$$
where $R_{Market}$ is the return on the market portfolio and $R_{riskfree}$ is the return
on the risk-free asset.
Consider fitting the simple linear regression model of a stock’s daily excess
return on the market-portfolio daily excess return, using the S&P 500 Index as
the proxy for the market return and the 3-month Treasury constant maturity
rate as the risk-free rate. The linear model is given by:
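$$R^*_{j,t} = \alpha_j + \beta_j R^*_{Market,t} + \epsilon_{j,t}, \quad t = 1, 2, \ldots$$
where $R^*$ denotes an excess return and the $\epsilon_{j,t}$ are white noise: $WN(0, \sigma^2)$.
Under the assumptions of the CAPM, the regression parameters $(\alpha_j, \beta_j)$ are
such that $\beta_j$ is the same as in the CAPM model, and $\alpha_j$ is zero.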
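The link between the CAPM beta and the regression slope can be illustrated with a minimal sketch on simulated data. Everything here (the sample size, the value 1.2 for beta, the standard deviations, and the variable names) is a hypothetical choice for illustration, not part of the case-study data.

```r
# Minimal sketch on simulated data (not the case-study data).
set.seed(1)
n <- 1000
beta.true <- 1.2                                     # hypothetical beta
r.market.excess <- rnorm(n, mean = 0, sd = 0.01)     # simulated market excess returns
eps <- rnorm(n, mean = 0, sd = 0.005)                # white-noise errors
r.asset.excess <- beta.true * r.market.excess + eps  # alpha = 0, as the CAPM implies

# Moment-based beta: Cov[R_j, R_Market] / Var[R_Market]
beta.moment <- cov(r.asset.excess, r.market.excess) / var(r.market.excess)

# Regression-based intercept (alpha) and slope (beta)
fit <- lm(r.asset.excess ~ r.market.excess)
alpha.ols <- unname(coef(fit)[1])
beta.ols <- unname(coef(fit)[2])

print(c(beta.moment = beta.moment, beta.ols = beta.ols, alpha.ols = alpha.ols))
```

The least-squares slope coincides with the sample covariance divided by the sample variance, which is why the regression can be used to estimate the CAPM beta.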
1.2 Historical Financial Data
> library("zoo")
> load("casestudy_1_0.RData")
> dim(casestudy1.data0.00)
[1] 3373 12
> names(casestudy1.data0.00)
> head(casestudy1.data0.00)
2000-01-07 15.87740 33.69002 719.76 31.27315 1441.47 5.38 6.00 6.42 6.52
2000-01-10 15.32631 33.67666 801.52 30.83501 1457.60 5.42 6.07 6.49 6.57
DAAA DBAA DCOILWTICO
2000-01-03 7.75 8.27 NA
2000-01-04 7.69 8.21 25.56
2000-01-05 7.78 8.29 24.65
2000-01-06 7.72 8.24 24.79
2000-01-07 7.69 8.22 24.79
2000-01-10 7.72 8.27 24.71
> tail(casestudy1.data0.00)
We first plot the raw data for the stock GE, the market-portfolio index
SP500, and the risk-free interest rate.
> library ("graphics")
> library("quantmod")
> plot(casestudy1.data0.00[,"GE"],ylab="Price",main="GE Stock")
[Figure: GE Stock. y-axis: Price; x-axis: Index]
> plot(casestudy1.data0.00[,"SP500"], ylab="Value",main="S&P500 Index")
[Figure: S&P500 Index. y-axis: Value; x-axis: Index]
> plot(casestudy1.data0.00[,"DGS3MO"], ylab="Rate" ,
+ main="3-Month Treasury Rate (Constant Maturity)")
[Figure: 3-Month Treasury Rate (Constant Maturity). y-axis: Rate; x-axis: Index]
Now we construct the variables with the log daily returns of GE and the
SP500 index as well as the risk-free asset returns.
> r.daily.GE<-zoo(x=as.matrix(diff(log(casestudy1.data0.00[,"GE"]))),
+ order.by=time(casestudy1.data0.00)[-1])
> dimnames(r.daily.GE)[[2]]<-"r.daily.GE"
> dim(r.daily.GE)
[1] 3372 1
> head(r.daily.GE)
r.daily.GE
2000-01-04 -0.0408219945
2000-01-05 -0.0017376199
2000-01-06 0.0132681098
2000-01-07 0.0379869230
2000-01-10 -0.0003966156
2000-01-11 0.0016515280
> dim(r.daily.SP500)
[1] 3372 1
> head(r.daily.SP500)
r.daily.SP500
2000-01-04 -0.0390992269
2000-01-05 0.0019203798
2000-01-06 0.0009552461
2000-01-07 0.0267299353
2000-01-10 0.0111278213
2000-01-11 -0.0131486343
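The construction of log returns and excess returns can be sketched with plain vectors. The prices and rates below are hypothetical, and the conversion of an annualized percentage rate to a daily risk-free return (a simple 360-day convention, using the same-day rate) is an assumption made for illustration; the case study's own conversion is not shown in the text above.

```r
# Toy inputs (hypothetical values, not the case-study data)
price <- c(100, 98, 99, 101)       # daily closing prices
rate.pct <- c(5.4, 5.4, 5.3, 5.3)  # annualized risk-free rate, in percent

# Log daily returns: r_t = log(P_t) - log(P_{t-1}); one fewer value than prices
r.daily <- diff(log(price))

# Daily risk-free return: ASSUMPTION of a simple 360-day convention
r.daily.riskfree <- log(1 + 0.01 * rate.pct[-1] / 360)

# Excess return = asset return minus risk-free return
r.daily.excess <- r.daily - r.daily.riskfree

print(cbind(r.daily, r.daily.riskfree, r.daily.excess))
```

Differencing drops the first observation, which matches the return series above having 3372 rows against 3373 rows of raw data.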
> # Merge all the time series together,
> # and display first and last sets of rows
> r.daily.data0<-merge(r.daily.GE, r.daily.SP500, r.daily.riskfree,
+ r.daily.GE.0, r.daily.SP500.0)
> head(r.daily.data0)
> tail(r.daily.data0)
1.3 Fitting the Linear Regression for CAPM
[Figure: scatter plot of r.daily.GE.0 (y-axis) against r.daily.SP500.0 (x-axis)]
> lmfit0<-lm(r.daily.GE.0 ~ r.daily.SP500.0, data=r.daily.data0)
> names(lmfit0)
 [1] "coefficients"  "residuals"     "effects"       "rank"
 [5] "fitted.values" "assign"        "qr"            "df.residual"
 [9] "xlevels"       "call"          "terms"         "model"
> summary(lmfit0)
Call:
lm(formula = r.daily.GE.0 ~ r.daily.SP500.0, data = r.daily.data0)
Residuals:
      Min        1Q    Median        3Q       Max
-0.153166 -0.005605 -0.000334  0.005560  0.137232
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)     -0.0001334  0.0002376  -0.561    0.575
r.daily.SP500.0  1.1843613  0.0177920  66.567   <2e-16
Note that the t-statistic for the intercept $\alpha_{GE}$ is not significant (-0.5613, p-value 0.575),
which is consistent with the CAPM implication that $\alpha_j$ is zero.
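The intercept test can be re-derived from the reported estimate and standard error. In this minimal sketch the degrees of freedom (3372 observations minus 2 coefficients) are an assumption based on the dimensions printed earlier; the inputs are rounded values copied from the coefficient table.

```r
# Values copied from the coefficient table for the GE regression
est <- -0.0001334   # intercept estimate
se <- 0.0002376     # its standard error
df <- 3372 - 2      # ASSUMPTION: 3372 observations, 2 fitted coefficients

t.stat <- est / se                        # t-statistic for H0: alpha = 0
p.value <- 2 * pt(-abs(t.stat), df = df)  # two-sided p-value

print(round(c(t.stat = t.stat, p.value = p.value), 3))
```

Because the inputs are rounded, the result agrees with the printed table only up to rounding.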
1.4 Regression Diagnostics
> lmfit0.inflm<-influence.measures(lmfit0)
> dim(lmfit0.inflm$infmat)
[1] 3372 6
> head(lmfit0.inflm$infmat)
> head(lmfit0.inflm$is.inf)
FALSE TRUE
3243 129
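For readers unfamiliar with the object returned by influence.measures(), the following sketch shows its two components on simulated data. The data, the planted extreme value, and the final tabulation (cases flagged by at least one measure) are assumptions for illustration; the command that produced the table above is not shown in the text, so this may not be the same tabulation.

```r
# Simulated simple regression with one deliberately extreme x value
set.seed(2)
n <- 200
x <- rnorm(n)
x[1] <- 8                    # planted high-leverage case
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

infl <- influence.measures(fit)

# infmat: one row per case; one column per influence measure
print(dim(infl$infmat))
print(colnames(infl$infmat))

# is.inf: logical matrix of the same shape, TRUE where a measure flags a case
flag.counts <- colSums(infl$is.inf)
print(flag.counts)

# One possible summary: cases flagged by at least one measure
any.flag <- apply(infl$is.inf, 1, any)
print(table(any.flag))
```

With a single regressor, infmat has six columns (two DFBETAS columns, DFFIT, the covariance ratio, Cook's distance, and the hat value), which matches the six columns reported for lmfit0.inflm$infmat.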
> # Re-Plot data adding
> # fitted regression line
> # selective highlighting of influential cases
>
> plot(r.daily.SP500.0, r.daily.GE.0,
+ main="GE vs SP500 Data \n OLS Fit (Green line)\n High-Leverage Cases (red points)\n High Cooks Dist (blue Xs)")
> abline(h=0,v=0)
> abline(lmfit0, col=3, lwd=3)
> # Plot cases with high leverage as red (col=2) "o"s
> index.inf.hat<-which(lmfit0.inflm$is.inf[,"hat"]==TRUE)
> points(r.daily.SP500.0[index.inf.hat], r.daily.GE.0[index.inf.hat],
+ col=2, pch="o")
> # Plot cases with high cooks distance as big (cex=2) blue (col=4) "X"s
> index.inf.cook.d<-which(lmfit0.inflm$is.inf[,"cook.d"]==TRUE)
> points(r.daily.SP500.0[index.inf.cook.d], r.daily.GE.0[index.inf.cook.d],
+ col=4, pch="X", cex=2.)
[Figure: GE vs SP500 Data; OLS Fit (Green line), High-Leverage Cases (red points), High Cooks Dist (blue Xs). y-axis: r.daily.GE.0; x-axis: r.daily.SP500.0]
> lmfit0.leverages<-zoo(lmfit0.inflm$infmat[,"hat"], order.by=time(r.daily.SP500.0))
> chartSeries(lmfit0.leverages)
[Figure: chartSeries plot of lmfit0.leverages, 2000-01-04 to 2013-05-31; last value 0.000639405004758484]
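In a regression with a single independent variable, the leverage of a case depends only on how far its x value lies from the mean of x. The sketch below checks this on simulated data; the sample size, coefficients, and variable names are hypothetical.

```r
# Simulated simple regression (hypothetical values)
set.seed(3)
n <- 50
x <- rnorm(n, sd = 0.01)
y <- 1.2 * x + rnorm(n, sd = 0.005)
fit <- lm(y ~ x)

# Leverages as computed by R (diagonal of the hat matrix)
h <- hatvalues(fit)

# Closed form for one regressor plus intercept:
# h_i = 1/n + (x_i - mean(x))^2 / sum((x - mean(x))^2)
h.formula <- 1 / n + (x - mean(x))^2 / sum((x - mean(x))^2)

print(all.equal(unname(h), h.formula))
print(sum(h))   # leverages sum to the number of coefficients (here 2)
```

This is why the highest-leverage days in the GE regression are the days with the most extreme market excess returns.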
The R function plot.lm() generates a useful 2x2 display of plots for various
regression diagnostic statistics:
[Figure: 2x2 display of plot.lm() diagnostics for the GE regression (residuals, standardized residuals, and Cook's distance contours); labeled cases include 2008-10-10, 2008-10-13, 2008-12-02, 2009-02-09 and 2009-03-10]
1.5 Adding Macro-economic Factors to CAPM
The CAPM relates a stock's return to that of the diversified market portfolio,
proxied here by the S&P 500 Index. A stock's return can also depend on
macro-economic factors, such as commodity prices, interest rates, and economic
growth (GDP).
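One way to write the extended model with a single macro-economic factor, here the daily return on crude oil, is sketched below. The symbols $\gamma_j$ and $R_{Oil,t}$ are notation assumed for this illustration and do not appear in the original text.

```latex
R^*_{j,t} = \alpha_j + \beta_j R^*_{Market,t} + \gamma_j R_{Oil,t} + \epsilon_{j,t},
\qquad t = 1, 2, \ldots
```

Setting $\gamma_j = 0$ recovers the CAPM regression, so the t-test on the oil coefficient is a test of whether the factor adds explanatory power beyond the market return.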
Call:
lm(formula = r.daily.GE.0 ~ r.daily.SP500.0 + r.daily.DCOILWTICO,
    data = r.daily.data00)
Residuals:
      Min        1Q    Median        3Q       Max
-0.152977 -0.005567 -0.000260  0.005589  0.133583
Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)        -0.0001216  0.0002373  -0.512 0.608532
r.daily.SP500.0     1.1972374  0.0181296  66.038  < 2e-16
r.daily.DCOILWTICO -0.0342538  0.0096188  -3.561 0.000374
The regression coefficient for the oil factor (r.daily.DCOILWTICO) is
statistically significant and negative (t = -3.561). Over the analysis period,
after controlling for the market return, price changes in GE stock are
negatively related to price changes in oil.
Consider the corresponding models for Exxon-Mobil stock, XOM:
Call:
lm(formula = r.daily.XOM.0 ~ r.daily.SP500.0, data = r.daily.data00)
Residuals:
      Min        1Q    Median        3Q       Max
-0.085289 -0.005788 -0.000009  0.006230  0.113614
Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)     0.0002968  0.0002105    1.41    0.159
r.daily.SP500.0 0.8299221  0.0157595   52.66   <2e-16
Call:
lm(formula = r.daily.XOM.0 ~ r.daily.SP500.0 + r.daily.DCOILWTICO.0,
    data = r.daily.data00)
Residuals:
      Min        1Q    Median        3Q       Max
-0.085977 -0.005564  0.000010  0.005765  0.105583
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)
(Intercept)          0.0002520  0.0002029   1.242    0.214
r.daily.SP500.0      0.7823785  0.0155009  50.473   <2e-16
r.daily.DCOILWTICO.0 0.1324461  0.0082237  16.105   <2e-16
The R-squared for XOM is lower than for GE. Its relationship to the market
index is less strong.
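The R-squared values are not shown in the output above, but for a regression with one independent variable they can be recovered from the slope's t-statistic through the identity R-squared = t^2 / (t^2 + df). The sketch below applies it to the two single-factor fits; the residual degrees of freedom (3370 for GE and 3369 for XOM) are assumptions inferred from the dimensions printed elsewhere in this case study, and the helper r2.from.t is a hypothetical name.

```r
# R-squared from the slope t-statistic in a one-regressor model:
# F = t^2 and R^2 = F / (F + df), where df is the residual degrees of freedom
r2.from.t <- function(t.stat, df) t.stat^2 / (t.stat^2 + df)

# t-statistics copied from the single-factor coefficient tables
r2.GE <- r2.from.t(66.567, df = 3370)   # ASSUMED df: 3372 - 2
r2.XOM <- r2.from.t(52.66, df = 3369)   # ASSUMED df: 3371 - 2

print(round(c(GE = r2.GE, XOM = r2.XOM), 3))
```

Under these assumptions the market index explains roughly 57% of the variance of GE's excess returns and roughly 45% of XOM's.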
The regression coefficient for the oil factor (r.daily.DCOILWTICO) is
statistically significant and positive.
For the extended model, we use the R function plot.lm() to display regression
diagnostic statistics:
> layout(matrix(c(1,2,3,4),2,2)) # optional 4 graphs/page
> plot(lmfit1)
[Figure: 2x2 display of plot.lm() diagnostics for lmfit1 (residuals, standardized residuals, and Cook's distance contours); labeled cases include 2000-03-07, 2001-01-03, 2008-10-13, 2008-10-15 and 2008-10-16]
The high-leverage cases in the data are those which have high Mahalanobis
distance from the center of the data in terms of the column space of the
independent variables (see Regression Analysis Problem Set).
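The relationship between leverage and Mahalanobis distance can be checked numerically. In this sketch on simulated data (all values and names hypothetical), the leverage of each case equals 1/n plus its squared Mahalanobis distance divided by n - 1, where the distance is measured from the column means of the independent variables using their sample covariance matrix.

```r
# Simulated two-regressor model (hypothetical values)
set.seed(4)
n <- 300
x1 <- rnorm(n, sd = 0.012)
x2 <- 0.3 * x1 + rnorm(n, sd = 0.025)
y <- 0.8 * x1 + 0.13 * x2 + rnorm(n, sd = 0.01)
fit <- lm(y ~ x1 + x2)

# Squared Mahalanobis distance of each case from the center of the regressors
X <- cbind(x1, x2)
d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Leverage from the fitted model, and from the distance identity
h <- hatvalues(fit)
h.from.d2 <- 1 / n + d2 / (n - 1)

print(all.equal(unname(h), unname(h.from.d2)))
```

Since the two quantities are linearly related, ranking cases by leverage is the same as ranking them by Mahalanobis distance.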
We display the data in terms of the independent variables and highlight the
high-leverage cases.
> # Refit the model using argument x=TRUE so that the lm object includes the
> # matrix of independent variables
> lmfit1<-lm(r.daily.XOM.0 ~ r.daily.SP500.0 + r.daily.DCOILWTICO,
+ data=r.daily.data00,
+ x=TRUE)
> names(lmfit1)
> dim(lmfit1$x)
[1] 3371 3
> head(lmfit1$x)
We now compute the leverage (and other influence measures) with the function
influence.measures() and display the scatter plot of the independent variables,
highlighting the high-leverage cases.
> lmfit1.inflm<-influence.measures(lmfit1)
> index.inf.hat<-which(lmfit1.inflm$is.inf[,"hat"]==TRUE)
> par(mfcol=c(1,1))
> plot(lmfit1$x[,2], lmfit1$x[,3],xlab="r.daily.SP500.0", ylab="r.daily.DCOILWTICO.0")
> title(main="Scatter Plot of Independent Variables \n High Leverage Points (red o s)")
> points(lmfit1$x[index.inf.hat,2], lmfit1$x[index.inf.hat,3],
+ col=2,
+ pch="o")
>
[Figure: Scatter Plot of Independent Variables, High Leverage Points (red o s). y-axis: r.daily.DCOILWTICO.0; x-axis: r.daily.SP500.0]
1.6 References
Lintner, J. (1965). “The Valuation of Risk Assets and the Selection of Risky
Investments in Stock Portfolios and Capital Budgets,” Review of Economics and
Statistics, 47: 13-37.
Sharpe, W. (1964). “Capital Asset Prices: A Theory of Market Equilibrium
under Conditions of Risk,” Journal of Finance, 19: 425-442.
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.