Abstract
Marquardt and Snee (Am Stat 29(1):3–20, 1975), Marquardt (J Am Stat Assoc 75(369):87–91, 1980) and Snee and Marquardt (Am Stat 38(2):83–87, 1984) refer to non-essential multicollinearity as that caused by the relation between the independent variables and the intercept. Although it is clear that the solution is to center the independent variables in the regression model, it is unclear when this kind of collinearity exists. The goal of this study is to diagnose non-essential collinearity, starting from the simple linear model. The collinearity indices \(k_{j}\), traditionally misinterpreted as variance inflation factors, are reinterpreted in this paper and used to distinguish and quantify essential and non-essential collinearity. The results extend immediately to the multiple linear model. The study also makes recommendations for statistical software such as SPSS, Stata, GRETL or R to improve the diagnosis of non-essential collinearity.



References
Belsley DA (1982) Assessing the presence of harmful collinearity and other forms of weak data through a test for signal-to-noise. J Econ 20(2):211–253
Belsley DA (1984) Demeaning conditioning diagnostics through centering. Am Stat 38(2):73–77
Berk KN (1977) Tolerance and condition in regression computations. J Am Stat Assoc 72:863–866
Christensen R (2018) Comment on a note on collinearity diagnostics and centering. Am Stat 72(1):114–117
Curto JD, Pinto JC (2011) The corrected vif (cvif). J Appl Stat 38(7):1499–1507
EMMI (2018) European money markets institute. https://www.emmi-benchmarks.eu Checked: 1 Feb 2018
Eurostat (2018) European commission. http://www.ec.europa.eu/eurostat/web Checked: 1 Feb 2018
García J, Salmerón R, García C, López M (2016) Standardization of variables and collinearity diagnostic in ridge regression. Int Stat Rev 84(2):245–266
Gujarati D (2003) Basic Econometrics, 4th edn. McGraw-Hill, New York
Gunst RF (1984) Toward a balanced assessment of collinearity diagnostics. Am Stat 38:79–82
Jensen D, Ramírez D (2013) Revision: variance inflation in regression. Adv Decis Sci Article ID 671204
Johnston JD, Dinardo J (2001) Métodos de econometría. Ed. Vicens Vives, Barcelona
Marquardt DW (1980) You should standardize the predictor variables in your regression models. J Am Stat Assoc 75(369):87–91
Marquardt DW, Snee R (1975) Ridge regression in practice. Am Stat 29(1):3–20
Novales A (1993) Econometría, 2nd edn. Ed. McGraw-Hill, Madrid
Novales A (2010) Análisis de regresión. https://www.ucm.es/data/cont/docs/518-2013-11-13-Analisis%20de%20Regresion.pdf Checked: 16 Oct 2017
Salmerón R, Blanco V (2016) El problema de un tamaño muestral pequeño en la regresión lineal: micronumerosidad. Rect@ 17(2):167–177
Salmerón R, García J, García C, Martín ML (2017) A note about the corrected vif. Stat Pap 58(3):929–945
Salmeron R, Garcia C, Garcia J (2019) multiColl: collinearity detection in a multiple linear regression model. https://CRAN.R-project.org/package=multiColl, R package version 1.0
Snee RD, Marquardt DW (1984) Collinearity diagnostics depend on the domain of prediction, the model, and the data. Am Stat 38(2):83–87
Stewart G (1987) Collinearity and least squares regression. Stat Sci 2(1):68–100
Stock J, Watson M (2012) Introducción a la Econometría, 3rd edn. Ed. Pearson, Madrid
Uriel E, Periró A, Contreras D, Moltó M (1997) Econometría: El Modelo Lineal. Ed. Alfa Centauro, Madrid
Velilla S (2018) A note on collinearity diagnostics and centering. Am Stat 72(2):140–146
Wood F (1984) Comment on effect of centering on collinearity and interpretation of the constant. Am Stat 38(2):88–90
Wooldridge J (2009) Introductory Econometrics: A Modern Approach. South-Western Cengage Learning, Canada
Appendix A: Stewart indices
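Given a matrix \(\mathbf {A}\) with dimensions \(n \times p\), partitioned by columns as \(\mathbf {A} = [ \mathbf {A}_{1}, \ldots , \mathbf {A}_{i}, \ldots , \mathbf {A}_{p} ] = [ \mathbf {A}_{i}, \mathbf {A}_{-i}]\), where \(\mathbf {A}_{-i}\) is equal to \(\mathbf {A}\) after eliminating column i and \(\vert \cdot \vert \) denotes the determinant, Stewart (1987) defined the following index to measure the relation between \(\mathbf {A}_{i}\) and the rest of the columns of \(\mathbf {A}\):
\[ k_{i}^{2} = \frac{\vert \mathbf {A}_{i}^{t} \mathbf {A}_{i} \vert \cdot \vert \mathbf {A}_{-i}^{t} \mathbf {A}_{-i} \vert }{\vert \mathbf {A}^{t} \mathbf {A} \vert }. \]
Since \(\vert \mathbf {A}^{t} \mathbf {A} \vert = \vert \mathbf {A}_{-i}^{t} \mathbf {A}_{-i} \vert \cdot \vert \mathbf {A}_{i}^{t} \mathbf {A}_{i} - \mathbf {A}_{i}^{t} \mathbf {A}_{-i} \cdot \left( \mathbf {A}_{-i}^{t} \mathbf {A}_{-i} \right) ^{-1} \cdot \mathbf {A}_{-i}^{t} \mathbf {A}_{i} \vert \), it is clear that:
\[ k_{i}^{2} = \frac{\mathbf {A}_{i}^{t} \mathbf {A}_{i}}{\mathbf {A}_{i}^{t} \mathbf {A}_{i} - \mathbf {A}_{i}^{t} \mathbf {A}_{-i} \cdot \left( \mathbf {A}_{-i}^{t} \mathbf {A}_{-i} \right) ^{-1} \cdot \mathbf {A}_{-i}^{t} \mathbf {A}_{i}}. \]
Then, it is verified that:
\[ k_{i}^{2} = 1 \quad \Longleftrightarrow \quad \mathbf {A}_{i}^{t} \mathbf {A}_{-i} = \mathbf {0}, \]
where \(\mathbf {0}\) is a vector of zeros with appropriate dimensions. In addition, for \(i=1,\ldots ,p\), it is verified that: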
\(k_{i}^{2} > 1\) if \(\mathbf {A}_{-i}^{t} \mathbf {A}_{-i}\) is positive definite.
\(k_{i}^{2} < 1\) if \(\mathbf {A}_{-i}^{t} \mathbf {A}_{-i}\) is negative definite.
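As an illustration, the following minimal sketch computes the index for the simple linear model with an intercept, before and after centering the regressor. It assumes the determinant-ratio form \(k_{i}^{2} = \mathbf {A}_{i}^{t} \mathbf {A}_{i} \cdot \vert \mathbf {A}_{-i}^{t} \mathbf {A}_{-i} \vert / \vert \mathbf {A}^{t} \mathbf {A} \vert \) of Stewart's index. The function names (`gram`, `det`, `stewart_k2`) and the data are hypothetical, chosen only for this example.

```python
def gram(cols):
    """Gram matrix A^t A for a matrix stored as a list of columns."""
    return [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]


def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]  # work on a copy
    n = len(m)
    d = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            d = -d  # a row swap flips the sign
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d


def stewart_k2(cols, i):
    """Assumed form: k_i^2 = (A_i^t A_i) * |A_{-i}^t A_{-i}| / |A^t A|."""
    rest = cols[:i] + cols[i + 1:]
    norm2 = sum(v * v for v in cols[i])
    return norm2 * det(gram(rest)) / det(gram(cols))


# Hypothetical data: a regressor whose mean is far from zero.
x = [10.0, 11.0, 12.0, 13.0, 14.0]
ones = [1.0] * len(x)
mean_x = sum(x) / len(x)
x_centered = [v - mean_x for v in x]

k2_raw = stewart_k2([ones, x], 1)                 # approximately 73
k2_centered = stewart_k2([ones, x_centered], 1)   # 1: orthogonal to the intercept
print(k2_raw, k2_centered)
```

In this example the only source of a large index is the relation between the regressor and the intercept column, so centering brings the index down to 1.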
Thus, this index can capture the orthogonality between \(\mathbf {A}_{i}\) and the rest of the columns of matrix \(\mathbf {A}\). However, note that orthogonality does not imply that there is no correlation:
\[ \mathbf {A}_{i}^{t} \mathbf {A}_{j} = 0 \quad \not \Rightarrow \quad \mathrm {corr}(\mathbf {A}_{i}, \mathbf {A}_{j}) = 0, \]
for \(i,j = 1,\ldots ,p, \ i \not = j\), unless the columns have zero mean.
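A small numerical check of this remark is sketched below: two columns with zero inner product but nonzero sample correlation, and a zero-mean pair for which orthogonality and zero correlation coincide. The vectors and the helper names (`dot`, `pearson`) are hypothetical and serve only as an illustration.

```python
from math import sqrt


def dot(u, v):
    """Inner product u^t v."""
    return sum(a * b for a, b in zip(u, v))


def pearson(u, v):
    """Sample (Pearson) correlation coefficient of two columns."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    return dot(du, dv) / sqrt(dot(du, du) * dot(dv, dv))


# Orthogonal columns with nonzero means: correlated.
u, v = [1.0, 2.0, 3.0], [1.0, 1.0, -1.0]
print(dot(u, v), pearson(u, v))      # inner product 0, correlation about -0.866

# Orthogonal columns with zero means: uncorrelated.
u0, v0 = [-1.0, 0.0, 1.0], [1.0, -2.0, 1.0]
print(dot(u0, v0), pearson(u0, v0))  # inner product 0, correlation 0
```

For zero-mean columns the inner product is proportional to the sample covariance, which is why the two notions agree in the second pair.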
Cite this article
Salmerón-Gómez, R., Rodríguez-Sánchez, A. & García-García, C. Diagnosis and quantification of the non-essential collinearity. Comput Stat 35, 647–666 (2020). https://doi.org/10.1007/s00180-019-00922-x
DOI: https://doi.org/10.1007/s00180-019-00922-x