Handout TRX
EVIEWS
29 May 2008
OUTLINE OF CONTENTS:
I. INTRODUCTORY SESSION
Target: This session will equip participants to make decisions based on time series data. The
regression techniques covered will be particularly useful for people interested in forecasting
and in relating or predicting a variable from a single explanatory variable or a set of them. It
covers the basic elements of ordinary least squares (OLS) models as well as time series
econometrics and forecasting.
Prerequisite: Attendees are familiar with MS Office, especially MS Excel, and have basic
knowledge of statistics: descriptive statistics, both numerical (mean, standard deviation,
standard error, etc.) and graphical (histogram, scatter plot, etc.), hypothesis testing, and
confidence intervals.
Sub Contents:
Workfile basics: the creation of a new workfile, or loading an existing one into
memory, along with an explanation of data management.
Basic Data Analysis: a brief description of statistical graphs from series and groups,
and descriptive statistics.
The Classical Linear Regression Model: methods of estimation (least squares,
maximum likelihood), dummy variables, autocorrelation, and heteroskedasticity.
Refer to Handout 1
Target: Providing basic knowledge of structural time series models using EViews.
Attendees will be able to estimate and analyze econometric models using EViews.
Prerequisite: Knowledge of the contents mentioned below and of the material covered in
Session I.
Sub Contents:
Dummy variable
Handout
(A) Window Basics
***********************************************************************
To create a workfile, click File/New/Workfile, as shown in the following dialogue
box
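Equivalently, the workfile can be created from the command window; a minimal sketch, assuming a quarterly workfile spanning 1992 to 2002 (the range used later in this handout):
' create a quarterly workfile named cement covering 1992Q1-2002Q4
wfcreate(wf=cement) q 1992 2002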
Next, to enter data by import: save the Excel sheet as a text file, then click
File/Import/Read Text-Lotus-Excel
In the import dialog, give names for the series, OR the number of series if the names
are already in the file; for example, we can write "9" in our case
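The same import can also be issued as a command; a sketch, assuming the text file saved from Excel is named cement.txt and contains 9 named series (the file name is illustrative):
' import 9 series from a text file whose first row holds the series names
read(t=txt) cement.txt 9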
(B) Multivariate Methods
o We have the following variables in the workfile:
cons – cement consumption demand
gdp – gross domestic product
cngdp – construction share in GDP
wpi – wholesale price index
plr – prime lending rate, as a proxy for the interest rate
prdn – total production
capin – capacity installed
gdcf – gross domestic capital formation
The fitted values, the actual values of cons, and the residual plot can be viewed by
clicking View/Actual, Fitted, Residual Table.
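For reference, the simple regression behind this output (cons on the prime lending rate, per the discussion of the single slope coefficient below) can also be run from the command window; the equation name eq1 is illustrative:
' estimate cons on a constant and plr, storing the equation as eq1
equation eq1.ls cons c plr
' view actual, fitted and residual values for eq1
eq1.resids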
The F-statistic at the bottom right of the table gives the joint significance of the
coefficients (excluding the constant) in the regression. Since there is only one
slope coefficient, the F-statistic is equal to the square of the t-statistic of plr
We can also check for serial correlation using the Durbin-Watson statistic (0.859)
R-squared is the coefficient of determination and shows the goodness of fit of the model.
To generate the first difference, lag series, and growth series: click the Genr
button on the workfile toolbar and type:
wpi1=wpi(-1)
gdpgr=(gdp-gdp(-1))/gdp(-1)
cngdp1=cngdp(-1)
To regress cons on a constant, gdp, plr, and wpi [cons = f(c, gdp, plr, wpi)]:
Click Quick /Estimate Equation and specify the equation in the dialog box.
Click OK to get the estimation output
Alternatively, special functions can be used directly in the equation. For instance,
to get the percentage change (growth) in GDP, the function @pch(gdp) can be used
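A short sketch of such functions (the series names on the left are illustrative):
' growth rate via @pch, identical to (gdp-gdp(-1))/gdp(-1)
genr gdpgr2=@pch(gdp)
' first difference via d()
genr dwpi=d(wpi)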
To test whether two or more variables are jointly significant,
we perform the Wald F-test as follows:
Click View/Coefficient Test/Wald coefficient restrictions
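The same test can be issued as a command on the estimated equation; a sketch, assuming the equation is named eq1 and that wpi and gdp growth enter as the third and fourth coefficients (the indices depend on the order in your specification):
' Wald test that coefficients 3 and 4 are jointly zero
eq1.wald c(3)=0, c(4)=0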
The Wald F-statistic will look as shown below; observing the p-value, we
see that wpi and the growth rate of gdp are jointly significant.
(ii) Multicollinearity
How to deal with the problem of multicollinearity.
*****************************************************************
o Click Quick/Estimate Equation to run the following regression:
Cons = f(c, gdp, cngdp, gdcf, wpi, plr)
In the regression result we see that only cngdp is significant. The R2 is
very high but the variables are not significant. Furthermore, the highly
significant overall F-statistic combined with low individual t-statistics
indicates collinearity. To check this, look at the correlation matrix of all
the variables: highlight all the variables in the above equation in the
workfile, Open Group, then View/Correlations
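From the command window, a sketch (the group name g1 is illustrative):
' put the variables in a group and display their correlation matrix
group g1 cons gdp cngdp gdcf wpi plr
show g1.cor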
The high correlation coefficient of 0.98 between cngdp and gdp makes the regression
unable to identify the effects of these variables separately.
Drop gdp and re-run the regression. The results thus obtained are:
We see that the results improve overall. The Adjusted R2, AIC, Schwarz
criterion, etc. now give better values.
We can fit this model even better after checking for serial
correlation and heteroskedasticity.
The estimation output of the equation cons=f(c, cngdp, cngdp1, gdp1, plr, wpi,
gdcf) appears as:
In the presence of the lagged dependent variable as one of the regressors, the
Durbin h statistic is used to test for serial correlation.
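For reference, the Durbin h statistic is

h = \left(1 - \frac{d}{2}\right)\sqrt{\frac{T}{1 - T\,\widehat{\mathrm{Var}}(\hat{\alpha})}}

where d is the Durbin-Watson statistic, T is the number of observations, and Var(alpha-hat) is the estimated variance of the coefficient on the lagged dependent variable. The command below computes exactly this expression.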
Create a coefficient vector to hold the result. For this, type coef(10) result in the
command window.
To store the h-statistic in the first row of the result vector, use the
following command (here @covariance(7,7) is the estimated variance of the
coefficient on the lagged dependent variable):
result(1)=(1-@dw/2)*(@regobs/(1-@regobs*@covariance(7,7)))^.5
The h-statistic (-0.91) is smaller in absolute value than the 5% critical value of the
standard normal distribution (1.96). Hence, the null hypothesis of no serial correlation is not rejected.
In case the term inside the square root becomes negative, it is not possible to use
the h-statistic to test for serial correlation. Alternatively:
In the command window type genr res=resid
Run the following regression:
res c res(-1) cngdp cngdp1 gdp1 plr wpi gdcf
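Put together as commands (the equation name eq_bg is illustrative):
genr res=resid
' auxiliary regression of the residual on its own lag and the regressors;
' a significant t-statistic on res(-1) indicates first-order serial correlation
equation eq_bg.ls res c res(-1) cngdp cngdp1 gdp1 plr wpi gdcf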
(iv) Heteroskedasticity
o How to deal with heteroskedasticity
************************************************************************
Select the variables gdp, cngdp, cons, gdcf, plr, wpi and Open as Group to view
the graph of the selected series.
To include a time trend in the cement consumption function, use the special
function @trend(). For example, @trend(92.01) generates a series with value 0 in
1992:01, value 1 in 1992:02, and so on.
Click Quick/Estimate Equation and specify the following in the dialogue box:
cons c @trend(92.01) cngdp gdcf wpi plr
To carry out White's test without specifying the form of heteroskedasticity,
click View/Residual Tests/White Heteroskedasticity (cross terms). The following
result appears:
The upper window gives the test statistics under the null hypothesis of
homoskedasticity and the associated p-values. The F-statistic is the Wald version of
the test and Obs*R-squared is the Lagrange multiplier (LM) version of the test.
Observing the p-values, the null hypothesis of homoskedasticity is not rejected.
In the presence of heteroskedasticity, the standard errors from OLS are
incorrect.
To get consistent standard errors, re-estimate the cons function and at the
same time click Options; therein tick Heteroskedasticity Consistent
Covariance.
To generate the weighting series for efficient estimation of the cons equation, use
the following command in the command window:
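One common choice, shown here as an assumption since the handout's own weight may differ, is the reciprocal of the absolute residuals:
' assumed weighting scheme: w = 1/|residual| (illustrative only)
genr w=1/@abs(resid)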
Re-estimate the cons function and at the same time click Options, tick the
Weighted LS/TSLS option, and specify the weight as w.
Observe that EViews gives results for both the weighted and unweighted
statistics. Here we get better results in the case of WLS.
Check for serial correlation and heteroskedasticity to get efficient estimates.
Multicollinearity, however, is not a problem for forecasting.
Forecast Option
To plot the actual and the forecasted series along with the forecast interval:
Change the sample range to 2001:01 to 2002:04
Generate the upper and lower bounds of the forecast interval with the
following commands:
genr up=consf+2*se
genr low=consf-2*se
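Put together as a command-window sketch (the equation name eq1 and the series names consf and se are illustrative):
smpl 2001:1 2002:4
' forecast cons, saving the forecasts as consf and their standard errors as se
eq1.forecast consf se
genr up=consf+2*se
genr low=consf-2*se
' plot actuals, forecasts and the interval together
group fc cons consf up low
fc.line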
Equilibrium: DD = SS
This is an example of a simultaneous equation model. Here we take WPI, DD, and SS
as endogenous variables, and the rest of the variables are treated as either exogenous
or predetermined, where DD = cement consumption demand and SS = cement supply.
Further, to find the equilibrium price level, the WPI equation can be written as follows:
To test for endogeneity and simultaneity with respect to the WPI variable:
Click Genr in the workfile window and type res=resid. Alternatively, type genr
res=resid in the command window. This is done to carry out the Hausman test via
an auxiliary regression.
Next, click Quick/Estimate Equation to estimate the following equation:
DD = f(WPI, GDP(-1), CONTN, PLR, res)
Under the null hypothesis that WPI is exogenous, the coefficient on res should
not be significantly different from zero. Observe that the coefficient on res is
significantly different from zero at the 5% level of significance. This means WPI
is endogenous at the 5% level. If the null hypothesis is rejected, then the OLS
estimates of the cement demand equation are biased and inconsistent.
o How to do 2SLS?
Regress WPI on all the exogenous variables in the system. This has been
done above in estimating the first regression.
Next, obtain the fitted values from this regression. For this, type the
following in the command window:
GENR WPIHAT=WPI-RES
Estimate the following equation:
DD c WPIHAT GDP(-1) CONTN PLR
Compare these estimation results with those obtained when the Hausman test is
carried out by auxiliary regression. Observe that the standard errors from this
regression are not correct.
o To obtain the correct standard errors of the 2SLS estimates:
Click Estimate in the equation window.
In the estimation settings, give the method as 2SLS.
Specify all the exogenous variables in the system including the constant in
the instrument list.
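A command-line sketch of the same 2SLS estimation (the equation name, regressor order, and instrument list are illustrative; the instruments should be all the exogenous variables in the system plus the constant):
' 2SLS: regressors before @, instruments after @
equation eq_tsls.tsls dd c wpi gdp(-1) contn plr @ c gdp(-1) contn plr gdcf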
Here we get the correctly calculated standard errors as the final result. From here we can
do static recursive forecasting, as we did for the single-equation model, and
calculate the 95% confidence band, etc.
(C) Univariate Methods
(i) Decomposition
o Modelling and Forecasting with the Classical Decomposition (Multiplicative)
Method
********************************************************************
To open the series, double click on "cons".
Next, click "Proc", then "Seasonal Adjustment", and "OK". Then select "Ratio to
moving average - Multiplicative". Write "consd" for the adjusted
(deseasonalised) series and then click "OK".
Since the sum of the 'scaling factors' is 4.0079, adjust it to 4, i.e., to the
number of seasons in a year. For this, write
scalar c1a = (1.047819/4.0079)*4
scalar c2a = (0.921718/4.0079)*4
scalar c3a = (0.961611/4.0079)*4
scalar c4a = (1.076753/4.0079)*4
in the command window. This gives the 'adjusted scaling factors':
c1a = 1.045754, c2a = 0.919901, c3a = 0.959716, c4a = 1.074631
To get the seasonal indexes, first generate a dummy seasonal factor for each
season in the following way:
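A sketch using the built-in seasonal dummy function @seas (the series names are illustrative):
' quarterly seasonal dummies: @seas(q) equals 1 in quarter q and 0 otherwise
genr s1=@seas(1)
genr s2=@seas(2)
genr s3=@seas(3)
genr s4=@seas(4)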
Estimate the trend-cyclical regression equation using the deseasonalised data
(consd). Before running the regression, we need to generate a trend variable:
choose "Genr" and type trend=@trend(1992.02) in the dialogue box. To make the
trend equal 1 in quarter 2 of 1992, generate another series by writing
trend1 = trend + 1 in the dialogue box.
The fitted trend (= a + bt) can be calculated by writing the following in the dialogue
box: fitted_trend = consd - resid
To measure the accuracy of fit, i.e., the root mean squared error (RMSE), double
click "resid", then click "View" and select "Descriptive Statistics", "Histogram
and Stats". The RMSE is (approximately) the standard deviation of the residual series.
To get the adjusted R-squared, first get the standard deviation of the "cons" series
by following steps similar to those given for the residuals. The value of the
standard deviation will be 5.403. Then write
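(the following is a sketch of the conventional computation, adjusted R2 being approximately 1 - (s_residual/s_cons)^2, ignoring the exact degrees-of-freedom correction; the scalar name is illustrative)
' approximate adjusted R-squared; 5.403 is the standard deviation of cons found above
scalar rbar2 = 1 - (@stdev(resid)/5.403)^2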
************************************************************************
To carry out the ADF (Augmented Dickey-Fuller) and PP (Phillips-Perron) tests,
consider three different regression equations:
\Delta y_t = a_0 + \gamma y_{t-1} + a_2 t + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \qquad (1)

\Delta y_t = a_0 + \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \qquad (2)

\Delta y_t = \gamma y_{t-1} + \sum_{i=2}^{p} \beta_i \Delta y_{t-i+1} + \varepsilon_t \qquad (3)
For a sample size of 100, the complete set of test statistics is as follows:
To carry out the unit root tests (Augmented Dickey-Fuller tests) for cement
consumption demand cons, double click on cons, then right click/Open/Unit
root test. We start from the most general model, i.e. Model (1), and thus include a
constant and trend in the ADF test equation, with the lag length chosen optimally:
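From the command window, the test can be run with the uroot view; a sketch (option keywords vary somewhat across EViews versions):
' ADF test on cons with constant and trend (Model 1)
cons.uroot(adf, trend)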
The ADF test statistic reported at the top of the window is the t-statistic on the
lagged level of cons in the test regression. The t-statistic under the null of a unit
root does not have a normal distribution, so simulated critical values are reported.
The unit root test is a one-tailed test with the null hypothesis of a unit root against
the alternative of a stationary process (i.e. a root less than unity). We see that for
cons we cannot reject the null of a unit root even at the 10% level of significance.
The same exercise should be carried out for Model (2) (i.e. with intercept but without
trend) and Model (3) (i.e. without intercept and without trend) until you reject the null
of a unit root. If you cannot reject the null even at Model (3), then you have a
non-stationary series; but if you reject the null of a unit root already at Model (1),
then you stop there and declare the series stationary, and so on.
To test the joint significance of the unit root and the trend, we carry out a random
walk test. This test is more stringent than the unit root test.
Run the test equation by Quick/Estimate Equation and specify the test
equation
To test the joint significance of @trend(1992:01) and cons(-1), click
View/Coefficient Tests/Redundant Variables and type the variables under
test in the dialog box.
The result is:
The F-statistic (Φ3) under the null of a random walk does not follow the
standard F-distribution, so the reported p-value is not applicable. From the
tables above, the 5% critical value is 6.49, and the estimated F-statistic is
4.91, so we cannot reject the null hypothesis of the presence of a stochastic
trend (unit root) and thus conclude that the series is non-stationary.
We see that the joint null of a unit root and the absence of a deterministic trend
cannot be rejected even at the 10% critical value, where the 10% critical value is
-3.51 (for a sample size of 100)
And so on.
Similarly, you can perform the PP (Phillips-Perron) test using the same critical values
as in the ADF case. Some more unit root test options available in EViews are: KPSS
(Kwiatkowski-Phillips-Schmidt-Shin), DFGLS, and ERS Point-Optimal
(Elliott-Rothenberg-Stock), etc.
Repeat the same unit root test process for the other variables considered, such as gdp,
plr, wpi. If all the variables considered are non-stationary in levels, then we expect
the series to be stationary in first differences, so the order of integration is one,
and we can proceed to a cointegration test to get the long-run relationship
among the variables. If some of the series are stationary in levels, those
variables should be introduced as exogenous variables.
o VAR Model:
Highlight the variables cons, gdp, plr, wpi, then Open/as VAR; the dialog box will
appear as follows:
To choose the lag interval for the endogenous variables, take a generous maximum
lag and observe the values of the AIC and Schwarz criterion (SC). The lag length
that minimises the AIC and SC gives the optimal lag interval. However, we can
select lag intervals such as 1 4, 2 4, or 3 6, depending on requirements.
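The same VAR can also be specified from the command line; a sketch with lag interval 1 to 4 (the VAR name var1 is illustrative):
' unrestricted VAR of the four endogenous variables with lags 1-4
var var1.ls 1 4 cons gdp plr wpi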
After selecting the lag values and correctly specifying the variables, click OK;
the results will appear as follows:
Diagnostic Views:
Once you have estimated the VAR, a set of diagnostic views is provided
under the menus View/Lag Structure and View/Residual Tests in the VAR window
In the VAR results window, click View/Lag Structure/AR Roots Table; the
following result appears:
To carry out pairwise Granger causality tests, and to test whether endogenous
variables can be treated as exogenous, look at the Wald F-statistics in the result
window for each equation.
Alternatively, one can do this test by highlighting the variables cons, gdp, plr,
wpi, then Right Click/Open as Group/Granger Causality; specify the same lag value
as in the VAR and observe from the results whether an endogenous variable can be
treated as exogenous
Click on the impulse definitions and choose the appropriate options in the
dialog box shown below:
While impulse response functions trace the effects of a shock to one endogenous
variable on to the other variables in the VAR, variance decomposition separates the
variation in an endogenous variable into the component shocks to the VAR. Thus, the
variance decomposition provides information about the relative importance of each
random innovation in affecting the variables in the VAR.
To obtain the variance decomposition, select View/Variance Decomposition... from
the VAR object toolbar. You should provide the same information as for impulse
responses above. Here also we get the generalized and the orthogonalized variance
decomposition; Cholesky gives the orthogonalized variance decomposition, where
the ordering of the variables is important.
o Cointegration Test:
To carry out the cointegration test, highlight the variables cons, gdp, plr, wpi, then
Open as Group/View/Cointegration Test. The following dialog box appears;
click on the Summary option to get all the results together.
We get one cointegrating vector for the option "No Intercept No Trend" and two
cointegrating vectors for "Intercept No Trend". To check whether the summary
table gives the correct result, it can be confirmed from the individual-option
cointegration test; for example, click on the option "No
Intercept No Trend", and the results appear as follows:
From the different options of the Johansen cointegration test we can select the
appropriate cointegrating vector on the basis of the expected signs. Suppose we
select option one; then we can carry out a similar exercise for the cointegrating
relationship as in the VAR.
(ii) ARIMA
o How to plot autocorrelation functions and to determine the presence of a unit root.
o How to determine the order of the ARIMA models using sample autocorrelation and
partial autocorrelation functions
o How to estimate ARIMA models and to use them for forecasting.
***********************************************************************
Set the sample range to 1992:02 to 2002:04
To plot the graph, highlight cons, right click, View/Line Graph
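As commands (a sketch):
' restrict the sample and plot cons as a line graph
smpl 1992:2 2002:4
cons.line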
To plot and compute the autocorrelations of the first difference of cons, generate the
first difference of the series by clicking Genr and name it dcons. Then click on dcons,
View/Line Graph. We see that the mean of the series appears to be constant, although
the variance is unusually high during 1997-99.
Check the size of the ACF and PACF at various lag lengths. We see that the sample
autocorrelation function is much smaller in magnitude, with lags 1, 3, 5, 7 showing
a pattern of changing signs. It dies out in a manner loosely consistent with a
stationary series.
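The correlogram can also be requested as a command; a sketch, assuming the first difference has been saved as dcons:
genr dcons=d(cons)
' correlogram (ACF and PACF) of dcons up to 20 lags
dcons.correl(20)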
Similarly, plot the autocorrelations of the second difference. The results do not
appear qualitatively different from those for d(cons), indicating overdifferencing
and suggesting that the order of integration is d=1
Observing the autocorrelation function for d(cons), we see that it begins decaying
after k=1 (value of -0.752), thus exhibiting moving average properties of second or
third order.
For the diagnostic check, view the model fit by View/Actual, Fitted,
Residual/Graph
The Q-statistic (Ljung-Box) is

Q = T(T+2) \sum_{k=1}^{s} \frac{r_k^2}{T-k}

where r_k is the k-th residual autocorrelation, T is the number of observations, and s is the number of lags tested.
The Q-statistic for the null hypothesis that there is no serial correlation up to
order 20 is 48.82, with a p-value of zero, indicating serial correlation in the error
terms and misspecification.
Try out ARIMA models with different lag specifications, such as (2, 1, 4), (2, 1, 6),
(3, 1, 2), (3, 1, 4), (3, 1, 6), (4, 1, 4), etc. After some experimentation we opt for the
ARIMA(4, 1, 4) model to generate ex post forecasts over various horizons. To
estimate this model, click Quick/Estimate Equation and type:
d(cons) c ar(1) ar(2) ar(3) ar(4) ma(1) ma(2) ma(3) ma(4)
Note that the 4th order AR and MA terms are significant.
For the diagnostic check, view the model fit by View/Actual, Fitted,
Residual/Graph.
Check the residual autocorrelation function by View/Residual
Tests/Correlogram-Q-statistics. The Q-statistic for up to 20 lags is
approximately 13.55, which is smaller than that from the ARIMA(2, 1, 2) model.
As the figure shows, none of the autocorrelations and partial autocorrelations is
individually significant (except at lag 7), nor are the 20 autocorrelations jointly
significant, as shown by the Q-statistic. In other words, the correlograms of both the
autocorrelations and the partial autocorrelations give the impression that the residuals
are purely random. Hence there is no need to consider any other ARIMA model, and
we use the ARIMA(4, 1, 4) model for forecasting purposes.
The bias proportion of 0.04 indicates that the forecasts consistently track the actual
series. This can be seen graphically by plotting cons and consf: change the
sample range to 2001:01 to 2002:04, highlight cons and consf, Open as Group,
then View/Graph/Line
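As commands (the group name g2 is illustrative):
smpl 2001:1 2002:4
' plot the actual and forecast series together
group g2 cons consf
g2.line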
The plot shows that the model overpredicts in the first two quarters but
underpredicts in the last two quarters of the forecast range
Similarly, we can produce the 2nd-, 3rd- and 4th-quarter-ahead forecasts and get
the accuracy measures for each period to compare across models.
***********************************************************************
We have considered the following dummy variables (each equal to 1 if the condition
holds and 0 otherwise):
Experience_1: Experience <= 7
Experience_2: Experience >= 8
Age: Age >= 25
Age_1: Age >= 25 and Age <= 30
The results show that the coefficient on experience is significantly different from zero.
Hence earnings do depend on experience.
In the first result box above, we see that male earnings are not significantly different
from those of females. Similarly, even after including the race dummy, the individual
coefficients in the second result box are not significantly different from the earnings
of black females.