Advances 20220303 24
Received: August 15, 2022; Accepted: September 1, 2022; Published: September 29, 2022
Abstract: When the general linear regression model is used to represent data, its parameters are usually estimated by the ordinary least squares (OLS) method. The accuracy of the OLS estimates depends on a set of basic assumptions. In many practical applications these assumptions do not hold, so OLS no longer gives correct and accurate results and econometric problems arise: the estimated parameters lose credibility, may no longer be unbiased, no longer have the lowest possible variance, and no longer reflect the underlying theory. Most econometric models suffer from autocorrelation, multicollinearity, and heteroscedasticity. This paper gives a brief account of these problems, their causes, and how they can be detected, tested, and minimized. When the OLS assumptions are fulfilled, the estimates are unbiased, consistent, and efficient (lower variance than other methods). The problems are discussed in the following order: first, multicollinearity; second, autocorrelation; third, heteroscedasticity. The article presents inference for several commonly used diagnostics: variance inflation factors, the coefficient covariance matrix, the correlogram of residuals, the normality test for residuals, the serial correlation LM test, the Harvey heteroskedasticity test, and actual and estimated residuals.
Keywords: Multicollinearity, Autocorrelation, Heteroscedasticity
1. Introduction
The assumptions of the linear regression model may or may not hold. If they hold, the ordinary least squares method is valid for measuring the economic relations under study. If they do not, ordinary least squares is no longer the appropriate method for estimating the parameters of economic relations; econometric problems emerge, and other, more appropriate econometric methods must be sought.
I will present some standard problems encountered in research:
1. The problem of autocorrelation
2. The problem of non-constant variance (heteroskedasticity)
3. The problem of lack of normal distribution
4. The problem of multicollinearity
2. Material and Methods
2.1. Autocorrelation and Detection Tests
Causes of Autocorrelation
1) Deleting some explanatory variables from the regression model results in the so-called omission error, which in turn is reflected in the values of the random term.
2) Misspecification of the mathematical form of the model. For example, the real relationship of a dependent variable may be nonlinear while the researcher has used a linear formula. Using the linear formula instead of the nonlinear one involves a certain type of error, which is reflected in the random term [1].
Below are a few examples of some nonlinear formulas.
$Y_i = A + B X_i^2 + U_i$ (1)
$Y_i^2 = C + D(1/X_i) + U_i$ (2)
141 Abeer Mohamed Abd El Razek Youssef: Detecting Of Multicollinearity, Autocorrelation and
Heteroscedasticity in Regression Analysis
where A, B, C, D and F are constants whose values are estimated in the respective models. These formulas indicate that there is a nonlinear relationship between Y and the explanatory variable X in the three formulas. However, redefining the variable $X^2$ in formula (1), by putting $X^2 = W$, converts the original nonlinear relationship into a linear relationship:
$Y_i = A + B W_i + U_i$
A logarithmic transformation converts formula (3) into a linear relationship as well.
3) Data processing. In some cases the published data are monthly and the researcher wants quarterly data, so he aggregates the monthly figures and takes their average. This may produce less fluctuating data, but the smoothing involves a kind of error that is repeated from one observation to the next, which leads to autocorrelation [2].
2.2. Autocorrelation Tests
1. Durbin-Watson test
2. Durbin h test
3. Breusch-Godfrey serial correlation LM test
2.2.1. Steps of Residual Series
Source: Prepared by the researcher based on the statistical program EViews 10th Edition
Figure 1. Steps of Residual Series.
Source: Prepared by the researcher based on the statistical program EViews 10th Edition
Figure 2. Durbin-Watson.
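As a rough illustration of the Durbin-Watson test listed in Section 2.2, the statistic can be computed directly from a residual series as the sum of squared first differences divided by the sum of squared residuals. The sketch below is not the EViews procedure used in this paper; the function name `durbin_watson` and the two residual series are hypothetical and serve only to show how the statistic behaves.

```python
def durbin_watson(residuals):
    """Return the Durbin-Watson statistic for a sequence of regression residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    # Numerator: sum of squared first differences e_t - e_{t-1}.
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    # Denominator: sum of squared residuals.
    den = sum(e ** 2 for e in residuals)
    if den == 0:
        raise ValueError("residuals are all zero")
    return num / den


# Hypothetical residual series, for illustration only.
alternating = [1.0, -1.0, 1.0, -1.0]   # sign flips every period
persistent = [1.0, 1.0, 1.0, 1.0]      # no change between periods
print(durbin_watson(alternating))  # 3.0
print(durbin_watson(persistent))   # 0.0
```

The statistic lies between 0 and 4. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation; the formal decision still requires the tabulated lower and upper bounds.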
Advances 2022; 3(3): 140-152 142
2.3. Breusch-Godfrey Serial Correlation LM Test
To perform the Breusch-Godfrey test we have two possibilities [3]:
As for the classical F test, we note that the calculated F-statistic of 2.52 is smaller than the tabular one, which means accepting the null hypothesis and rejecting the alternative hypothesis, i.e., rejecting the existence of autocorrelation.
Source: Prepared by the researcher based on the statistical program EViews 10th Edition
Figure 4. Steps serial correlation LM test.
But if we take the second case, the calculated Obs*R-squared statistic of 5.41 is greater than the tabular value of the chi-square distribution, so we reject the null hypothesis and accept the alternative hypothesis, that is, the existence of autocorrelation [4].
2.4. Inconsistency of Variance (Heterogeneity of Variance)
ARCH test: we test the constant-variance hypothesis using an autoregressive conditional heteroskedasticity (ARCH) test. Based on the results, we either accept the null hypothesis, which states that the variance of the random error term in the estimated model is constant, or reject it and accept the alternative hypothesis of non-constant variance [5].
The ARCH test examines the relationship between the squared residuals, as the dependent variable, and the squared residuals lagged one period, in order to test the null hypothesis of constant variance. The test is based on either the classical Fisher (F) test or the Lagrange multiplier (LM) test.
In practice, the test is conducted in the following steps [6]:
The first stage: calculating the residuals $e_t$ of the regression model.
The second stage: calculating $e_t^2$.
The third stage: autoregression of the squared residuals on p lagged periods. Significant lags are retained.
The fourth stage: calculating the Lagrange multiplier statistic $LM = N \cdot R^2$, where N is the sample size and $R^2$ is the coefficient of determination [7].
3. Steps of Heteroscedasticity Test
The result leads us to accept the null hypothesis of constant variance and to reject the hypothesis of non-constant variance for the error term series. This confirms the validity of the second assumption of the least squares method, which states that the variance is constant [8].
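The four stages of the ARCH test described above can be sketched for the simplest case of one lag (p = 1). This is a minimal illustration, not the EViews routine: the function name `arch_lm_test` and the example residuals are hypothetical, and N is taken here as the number of observations available to the auxiliary regression after the lag is formed.

```python
import math


def arch_lm_test(residuals):
    """ARCH(1) LM test: regress e_t^2 on a constant and e_{t-1}^2."""
    sq = [e * e for e in residuals]          # stage 2: squared residuals
    y, x = sq[1:], sq[:-1]                   # stage 3: one-period lag
    n = len(y)                               # observations in the auxiliary regression
    if n < 3:
        raise ValueError("need at least four residuals")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("squared residuals have no variation")
    r2 = (sxy * sxy) / (sxx * syy)           # R^2 of a one-regressor OLS fit
    lm = n * r2                              # stage 4: LM = N * R^2
    p_value = math.erfc(math.sqrt(lm / 2))   # upper tail of chi-square with 1 df
    return lm, r2, p_value


# Hypothetical residuals whose squares double every period (strong ARCH effect).
example = [1.0, math.sqrt(2.0), 2.0, math.sqrt(8.0), 4.0]
lm, r2, p = arch_lm_test(example)
print(f"LM = {lm:.3f}, R^2 = {r2:.3f}, p-value = {p:.4f}")
```

The null hypothesis of constant variance is rejected when the p-value falls below the chosen significance level, which is equivalent to LM exceeding the tabulated chi-square value with p degrees of freedom.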
Source: Prepared by the researcher based on the statistical program EViews 10th Edition
Figure 5. Steps of heteroscedasticity test.
Source: Prepared by the researcher based on the statistical program EViews 10th Edition
Figure 6. Calculation method of heteroscedasticity LM Test.
3.2. Test to Verify the Normality of the Residuals of the Regression Equation (Jarque-Bera Test)
To check whether the residuals of the regression equation are normally distributed, we use the Jarque-Bera test.
The null hypothesis states that the residuals of the regression equation are normally distributed; it is rejected or accepted on the basis of the test statistic [9].
We reject the null hypothesis if the value of the JB statistic is greater than the tabular value of the chi-square distribution [10].
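The Jarque-Bera statistic combines the sample skewness S and kurtosis K of the residuals as JB = (n/6)(S² + (K − 3)²/4) and is compared with a chi-square distribution with 2 degrees of freedom (5% tabular value 5.991). The sketch below is a minimal illustration under these standard definitions; the function name `jarque_bera` and the five-point sample are hypothetical and are not the residuals of this study.

```python
import math


def jarque_bera(residuals):
    """Return (JB, skewness, kurtosis, p-value) for a residual series."""
    n = len(residuals)
    if n < 3:
        raise ValueError("need at least three residuals")
    mean = sum(residuals) / n
    # Central moments of order 2, 3 and 4 (divisor n).
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    if m2 == 0:
        raise ValueError("residuals have no variation")
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)   # upper tail of chi-square with 2 df
    return jb, skew, kurt, p_value


# Hypothetical, perfectly symmetric sample for illustration only.
jb, skew, kurt, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(f"JB = {jb:.4f}, skewness = {skew:.4f}, kurtosis = {kurt:.4f}, p = {p:.4f}")
print("reject normality at 5%" if jb > 5.991 else "do not reject normality at 5%")
```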
Source: Prepared by the researcher based on the statistical program EViews 10th Edition
Figure 7. Steps of Normal Distribution Test.
coefficient estimates. Both matrices are used in forming the prediction intervals of the model’s forecasts [14].
                          C          LCO2_EMISSIONS  LMANUFACTURING  LMETHANE_EMISSIONS  LNITROUS_OXIDE_EMISSIONS
C                         0.439164   0.020861        -0.004496       -0.039445           -0.021249
LCO2_EMISSIONS            0.020861   0.005877        -0.002078       0.001139            -0.005530
LMANUFACTURING            -0.004496  -0.002078       0.001028        -5.54E-05           0.000568
LMETHANE_EMISSIONS        -0.039445  0.001139        -5.54E-05       0.015156            -0.012460
LNITROUS_OXIDE_EMISSIONS  -0.021249  -0.005530       0.000568        -0.012460           0.020348
Source: Prepared by the researcher based on the statistical program EViews 10th Edition.
                          LGDP      LCO2_EMISSIONS  LMANUFACTURING  LMETHANE_EMISSIONS  LNITROUS_OXIDE_EMISSIONS
LGDP                      1.000000  0.958840        0.997754        0.674685            0.817970
LCO2_EMISSIONS            0.958840  1.000000        0.953773        0.731382            0.889963
LMANUFACTURING            0.997754  0.953773        1.000000        0.675877            0.826621
LMETHANE_EMISSIONS        0.674685  0.731382        0.675877        1.000000            0.872754
LNITROUS_OXIDE_EMISSIONS  0.817970  0.889963        0.826621        0.872754            1.000000
Source: Prepared by the researcher based on the statistical program EViews 10th Edition
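One way to relate the correlation matrix above to multicollinearity is through variance inflation factors: for standardized regressors, the VIF of each regressor equals the corresponding diagonal element of the inverse of the regressors' correlation matrix. The sketch below applies this to the four explanatory variables in the table (LGDP is left out on the assumption that it is the dependent variable). The helper names `invert` and `vifs_from_correlation` are hypothetical, and the resulting figures need not coincide with the VIFs reported by EViews.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs_from_correlation(corr):
    """VIF_j is the j-th diagonal element of the inverse correlation matrix."""
    inv = invert(corr)
    return [inv[i][i] for i in range(len(corr))]


names = ["LCO2_EMISSIONS", "LMANUFACTURING",
         "LMETHANE_EMISSIONS", "LNITROUS_OXIDE_EMISSIONS"]
# Correlations among the explanatory variables, copied from the table above.
corr = [
    [1.000000, 0.953773, 0.731382, 0.889963],
    [0.953773, 1.000000, 0.675877, 0.826621],
    [0.731382, 0.675877, 1.000000, 0.872754],
    [0.889963, 0.826621, 0.872754, 1.000000],
]
vifs = vifs_from_correlation(corr)
for name, v in zip(names, vifs):
    print(f"{name}: VIF = {v:.2f}")
```

A common rule of thumb treats VIF values above 10 (some authors use 5) as a sign of serious multicollinearity.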
From Figure 8, data of residuals are given, and the normality of these data was assessed. The results showed that the data were not normally distributed, as skewness (0.06772) and kurtosis (3.308) individually were within ±1. The Jarque-Bera test (P = 2.654) was statistically significant, that is, the data were considered not normally distributed.
Both methods indicated that the data were not normally distributed. As the SD of the residuals was less than half the mean value (0.037 < 3.000), the data were considered not normally distributed.
4.7. Autocorrelation Test
An identifiable relationship (positive or negative) exists between the values of the error in one period and the values of the error in another period.
As shown in the serial correlation LM test table, Prob. Chi-Square(2) is 0.0101, which is less than 5%, so we reject H0; that is, there is autocorrelation. Obs*R² = 9.198859 = NR²
H0: ρ = 0
H1: ρ ≠ 0
P value (Coefficient) > 5%: it means there is no correlation between the X's and the residuals.
4.8. Heteroskedasticity and Serial Correlation
In statistics, heteroskedasticity (or heteroscedasticity) happens when the standard deviations of a predicted variable, monitored over different values of an independent variable or as related to prior time periods, are non-constant. With heteroskedasticity, the tell-tale sign upon visual inspection of the residual errors is that they will tend to fan out over time, as depicted in the image below.
Unconditional heteroskedasticity: occurs when the heteroskedasticity is uncorrelated with the values of the independent variables. Although this is a violation of the homoscedasticity assumption, it does not present major problems to statistical inference.
Conditional heteroskedasticity: occurs when the error variance is related to, or conditional on, the values of the independent variables. It poses significant problems for statistical inference. Fortunately, many statistical software packages can diagnose and correct this error.
As shown in the table, Prob. Chi-Square(4) is 0.0335, which is less than 5%, so we reject H0; that is, there is heteroskedasticity. Obs*R² = 10.44874 = NR²
H0: $\sigma_1^2 = \sigma_2^2 = \dots = \sigma_n^2$ (homogeneity of variances)
H1: $\sigma_i^2 \neq \sigma_j^2$ for at least one pair $i \neq j$ (heterogeneity of variances)
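The two reported probabilities can be checked by hand, because the upper tail of a chi-square distribution with an even number of degrees of freedom has a closed form: for df = 2m, P(χ² > x) = exp(−x/2) · Σ_{k=0}^{m−1} (x/2)^k / k!. The sketch below applies it to the reported Obs*R² values of 9.198859 (2 degrees of freedom, serial correlation LM test) and 10.44874 (4 degrees of freedom, heteroskedasticity test). The function name `chi2_sf_even_df` is hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2
    # For df = 2m: P = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


p_serial = chi2_sf_even_df(9.198859, 2)    # serial correlation LM test
p_hetero = chi2_sf_even_df(10.44874, 4)    # heteroskedasticity test
print(round(p_serial, 4), round(p_hetero, 4))  # 0.0101 0.0335
```

Both probabilities are below 0.05, which agrees with the decisions stated above: the null hypotheses of no serial correlation and of constant variance are both rejected at the 5% level.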
Source: Prepared by the researcher based on the statistical program EViews 10th Edition
Figure 10. Simple regression.
Figure 10 plots LGDP (horizontal axis) against the independent variables LMETHANE_EMISSIONS and LNITROUS_OXIDE_EMISSIONS (vertical axis) and shows an increasing positive relation among the variables.
6. Trends of the Variables over Time
To avoid the problem of different scales for each variable, the variables are expressed in logarithms. Taking the natural logarithm does not affect the ordering of a variable's values. This method is used when the variable values are large, and it is intended to simplify them and to reduce dispersion and differences in variance between the variables. The log transformation is monotonic: it changes the scale on which the values are measured, not their order.
Figure 11 shows the variables over time. It is clear from the figure that the dependent and independent variables trend steadily through time and that there is an increasing direct relationship through time between the dependent and independent variables.
Source: Prepared by the researcher based on the statistical program EViews 10th Edition
Figure 11. Logarithm of dependent and independent variables.
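The effect of the log transformation described in Section 6 can be illustrated with the first five GDP observations from the appendix table (Egypt, 1990 to 1994). The sketch is illustrative only; the helper name `order_of` is hypothetical.

```python
import math

# GDP for Egypt, 1990-1994, copied from the appendix table.
gdp = [4.3e10, 3.74e10, 4.19e10, 4.66e10, 5.19e10]
lgdp = [math.log(v) for v in gdp]   # natural logarithm


def order_of(values):
    """Indices of the observations sorted from smallest to largest value."""
    return sorted(range(len(values)), key=lambda i: values[i])


# The log is monotonic, so the ordering of the observations is unchanged.
print(order_of(gdp) == order_of(lgdp))
# The scale, however, is compressed: compare the ranges before and after.
print(f"range of GDP  = {max(gdp) - min(gdp):.3e}")
print(f"range of LGDP = {max(lgdp) - min(lgdp):.4f}")
```

Because differences in logs approximate relative changes, variables measured in very different units become comparable on a common scale.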
Figure 12. Actual and Estimated Residuals.
Source: Prepared by the researcher based on the statistical program EViews 10th Edition.
The red curve represents the actual values of the time series of the dependent variable in Figure 12.
The green curve represents the values of the dependent variable estimated from the fitted equation.
The blue curve represents the residuals of the regression equation (the random error term).
From Figure 12, we notice that there are extreme, abnormal values in the random error term, and I think that the estimated model suffers from non-constant variance.
There is no essential difference between the actual values, which are the values before including the explanatory variables, and the estimated values, which are the values after including the explanatory variables. In the figure the two curves are almost identical. The blue curve shows the behavior of the residuals, which can theoretically be divided into three sections. The residuals are relatively stable in the middle region of the series, which reflects the close agreement between the actual and estimated values there. At the two ends of the series we notice fluctuation, which in turn affects the quality of the overall model. It is believed that the problem should be treated as an effect of outliers, which can be detected with the Mahalanobis test.
6.2. Logarithm of LGDP Residuals
Figure 13. LGDP Residuals.
Source: Prepared by the researcher based on the statistical program EViews 10th Edition.
Figure 13 shows the residual time series of the logarithm of GDP; it suggests that the residual series is unstable and that co-integration is often absent.
7. Conclusion
From the above we conclude the following:
1. The model suffers from the problem of heterogeneity of variance. As a result, predictions of the variable Y based on the estimators $\hat{\beta}$ (the coefficients of the independent variables) obtained from the original data will have large variances. The prediction will therefore be inefficient, because the variance of the predictions includes the variance of U as well as the variance of the parameters.
2. The model suffers from the problem of autocorrelation, which means that $Cov(u_j, u_i) \neq 0$. The error variance $\sigma^2$ and hence the standard errors are therefore rather large, which means that the accuracy of the model is low and that the confidence intervals and the model's significance tests are unreliable and inefficient.
3. The model suffers from the problem of multicollinearity. This means that the estimators' values are very large and biased, and that the variances and covariances of these estimators are very large, so the estimators are not BLUE.
professor Economic Dep., Faculty of Administrative Sciences, Ain Shams University, Egypt. I thank him for teaching the monetary policy course, benefiting from his knowledge, and teaching us advanced econometrics lectures in the preparatory year for the doctorate program. He is a distinguished young doctor, genius and a role model for the youth.
Appendix of Study
Table 10. Data of the Study Variables.
Country year GDP methane emissions nitrous oxide emissions CO2 emissions GDP growth (annual%) Manufacturing FDI
Egypt 1990 4.3E+10 11270 9450 87750 2.900791 7.3E+09 1.058425
Egypt 1991 3.74E+10 11980 10130 89370 3.973172 5.99E+09 2.420133
Egypt 1992 4.19E+10 12490 10100 90900 4.642459 6.54E+09 0.994028
Egypt 1993 4.66E+10 12760 10890 92660 4.988731 7.33E+09 0.940415
Egypt 1994 5.19E+10 13000 10030 87900 5.492355 8.31E+09 1.135376
Egypt 1995 6.02E+10 13140 11730 93720 5.575497 9.83E+09 1.268437
Egypt 1996 6.76E+10 13400 12120 98940 6.053439 1.12E+10 1.174393
Egypt 1997 7.84E+10 14100 11760 106060 6.370004 1.28E+10 1.236997
Egypt 1998 8.48E+10 13210 12290 110980 3.535252 1.44E+10 0.527385
Egypt 1999 9.07E+10 14640 12400 116540 2.390204 1.63E+10 0.759753
Egypt 2000 9.98E+10 15100 13170 114610 3.193455 1.8E+10 0.295684
Egypt 2001 9.67E+10 14760 13450 126700 4.092072 1.71E+10 1.590836
Egypt 2002 8.51E+10 15940 14140 129440 4.471744 1.53E+10 5.999509
Egypt 2003 8.03E+10 16040 14540 133020 6.843838 1.39E+10 9.348567
Egypt 2004 7.88E+10 16360 15680 144500 7.087827 1.36E+10 8.876336
Egypt 2005 8.96E+10 16350 15500 162220 7.156284 1.5E+10 5.831413
Egypt 2006 1.07E+11 16940 15180 170750 4.6736 1.72E+10 3.548351
Egypt 2007 1.3E+11 17630 14670 183400 5.147235 2E+10 2.916017
Egypt 2008 1.63E+11 17820 14910 189940 1.764572 2.53E+10 0.204543
Egypt 2009 1.89E+11 15810 14800 197660 2.2262 2.99E+10 1.002341
Egypt 2010 2.19E+11 14880 14400 200310 2.185466 3.53E+10 1.453434
Egypt 2011 2.36E+11 16110 14990 205770 2.915912 3.72E+10 1.50925
Egypt 2012 2.79E+11 16730 14560 215000 4.372019 4.51E+10 2.102581
Egypt 2013 2.88E+11 16090 14630 213860 4.346643 4.79E+10 2.438563
Egypt 2014 3.06E+11 15990 14740 219120 4.181221 5.13E+10 3.142826
Egypt 2015 3.29E+11 15380 15320 226280 5.314121 5.5E+10 3.260263
Egypt 2016 3.32E+11 15570 15670 231230 5.557684 5.6E+10 2.972837
Egypt 2017 2.36E+11 14770 15460 242230 3.569669 3.88E+10 1.602124
Egypt 2018 2.5E+11 13180 15070 247910 3.326742 4.04E+10 1.602124
Egypt 2019 3.03E+11 16800 15650 249370 3.326742 4.82E+10 1.602124
Egypt 2020 3.65E+11 16800 15650 249370 3.326742 5.88E+10 1.602124
Egypt 2021 4.04E+11 16800 15650 249370 3.326742 5.88E+10 1.602124
Egypt 2022 4.04E+11 16800 15650 249370 3.326742 5.88E+10 1.602124
[12] Obite, C. P., Olewuezi, N. P., Ugwuanyim, G. U., & Bartholomew, D. C. (2020). Multicollinearity effect in regression analysis: A feed forward artificial neural network approach. Asian Journal of Probability and Statistics, 6 (1), 22-33.
[13] Zhang, T., Zhou, X. P., & Liu, X. F. (2020). Reliability analysis of slopes using the improved stochastic response surface methods with multicollinearity. Engineering Geology, 271, 105617.
[14] Vörösmarty, G., & Dobos, I. (2020, October). Green purchasing frameworks considering firm size: a multicollinearity analysis using variance inflation factor. In Supply Chain Forum: An International Journal (Vol. 21, No. 4, pp. 290-301). Taylor & Francis.
[15] Aslam, M., & Ahmad, S. (2020). The modified Liu-ridge-type estimator: a new class of biased estimators to address multicollinearity. Communications in Statistics-Simulation and Computation, 1-20.
[16] Negret, P. J., Marco, M. D., Sonter, L. J., Rhodes, J., Possingham, H. P., & Maron, M. (2020). Effects of spatial autocorrelation and sampling design on estimates of protected area effectiveness. Conservation Biology, 34 (6), 1452-1462.
[17] Stojkoski, V., Sandev, T., Kocarev, L., & Pal, A. (2022). Autocorrelation functions and ergodicity in diffusion with stochastic resetting. Journal of Physics A: Mathematical and Theoretical, 55 (10), 104003.