Shrinkage
December 15, 2020
Shrinkage is a useful method for feature selection in regression models.
By shrinking the regression coefficients towards 0, the relatively unimportant features have little influence on the response variable. Thus, shrinkage may produce a model with lower variability in prediction error, compared to direct filtering methods such as the t-statistic (Hastie et al., 2009). Two commonly used shrinkage methods are ridge regression (Hoerl and Kennard, 1970) and the lasso (Tibshirani, 1996). For ridge regression, the regression coefficients are estimated as

$$\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\},$$
where p is the number of features, n is the sample size, and λ ≥ 0 is a complexity parameter for controlling the amount of shrinkage (the larger λ, the larger the shrinkage). Thus, ridge regression estimates the regression coefficients by minimising the usual sum of squares along with an L2 penalty term, $\lambda \sum_{j=1}^{p} \beta_j^2$. For the lasso (Tibshirani, 1996), an L1 penalty term $\lambda \sum_{j=1}^{p} |\beta_j|$ is used instead:

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}.$$
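To make the two estimators concrete, the sketch below fits both on synthetic data using scikit-learn, whose alpha argument plays the role of λ above (scikit-learn rescales the squared-error term internally, so alpha is a rescaled λ rather than λ itself). The data-generating setup, feature count, and penalty values are illustrative choices, not part of the text above.

```python
# A minimal sketch of ridge and lasso fits; the synthetic data and
# penalty values below are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
n, p = 100, 10                      # sample size and number of features
X = rng.normal(size=(n, p))
# Only the first three features are truly informative.
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# alpha corresponds to the complexity parameter lambda in the text
# (up to scikit-learn's internal scaling of the squared-error term):
# larger alpha means stronger shrinkage of the coefficients towards 0.
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty: shrunk, rarely exactly 0
lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: some coefficients exactly 0

print("ridge coefficients:", np.round(ridge.coef_, 3))
print("lasso coefficients:", np.round(lasso.coef_, 3))
```

Running this typically shows all ridge coefficients shrunk but nonzero, while several lasso coefficients are exactly zero. Note that neither fit penalises the intercept β0, matching the formulas above.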
Equivalently, inclusion of the penalty term implies minimisation subject to the constraint $\sum_{j=1}^{p} \beta_j^2 \le t$ for ridge regression, and $\sum_{j=1}^{p} |\beta_j| \le t$ for the lasso, where t is some constant. For the lasso, making t small enough results in some of the regression coefficients becoming exactly zero. Thus, the lasso can also be considered as performing subset selection of the features.

References
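As a rough illustration of this subset-selection behaviour, the self-contained sketch below (reusing the same illustrative setup as before) sweeps the penalty strength and counts how many lasso coefficients are exactly zero; strengthening the penalty (larger λ, i.e. smaller t) zeroes more of them.

```python
# A self-contained sketch of the lasso's subset selection: as the
# penalty grows, coefficients hit exactly zero. The data and the
# alpha grid are illustrative assumptions, not recommended settings.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, -2.0, 1.5] + [0.0] * (p - 3))
y = X @ beta_true + rng.normal(scale=0.5, size=n)

for alpha in [0.01, 0.1, 0.5, 1.0]:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    n_zero = int(np.sum(coef == 0.0))
    print(f"alpha={alpha}: {n_zero} of {p} coefficients are exactly 0")
```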
Hastie, T., Tibshirani, R., and Friedman, J. (2009). The elements of statistical learning: data mining, inference, and prediction (2nd ed.). New York: Springer.

Hoerl, A. E., and Kennard, R. W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B, 58(1), 267-288.