
Statistical learning

Assignment 3
Jakub Skalski

March 11, 2025

1. The trace of a symmetric real matrix is the sum of its eigenvalues


We can easily show this by first decomposing a real symmetric matrix A into its eigenvectors
and eigenvalues, A = PΛP^T, and then using the cyclic property of the trace:

tr(A) = tr(PΛP^T) = tr(P^T PΛ) = tr(Λ) = Σ_i λ_i
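As a quick numerical sanity check, the identity can be verified in R on an arbitrary symmetric matrix (the matrix below is our own example):

set.seed(1)
B <- matrix(rnorm(25), 5, 5)
A <- (B + t(B)) / 2                                               # symmetrize to get a real symmetric matrix
all.equal(sum(diag(A)), sum(eigen(A, symmetric = TRUE)$values))   # TRUE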

2. Properties of A = X T X
2.1 Positive semi-definite
We say that a matrix A is positive semi-definite if it satisfies v^T A v ≥ 0 for all vectors
v ∈ R^n. Consider any vector v ∈ R^n; then:

v^T A v = v^T X^T X v = (Xv)^T (Xv) = ∥Xv∥^2 ≥ 0

2.2 Non-negative eigenvalues


Consider any eigenvector v ∈ R^n of A with eigenvalue λ. Since A is positive semi-definite:

v^T A v = v^T (λv) = λ v^T v ≥ 0

Since v ≠ 0, v^T v is strictly positive, therefore λ ≥ 0.

2.3 At least one eigenvalue is zero when p > n


A matrix is singular if and only if it is not of full rank, i.e. rank(A) < p.
Note that rank(A) is bounded from above by the rank of X^T:

rank(A) = rank(X^T X) ≤ rank(X^T) ≤ n

From the assumption p > n it follows that rank(A) ≤ n < p, so A is singular and at least one of its eigenvalues must be zero.
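A small numerical illustration in R (the dimensions below are chosen arbitrarily for the sketch):

set.seed(1)
n <- 5; p <- 8
X <- matrix(rnorm(n * p), n, p)
round(eigen(crossprod(X), symmetric = TRUE)$values, 10)   # the last p - n eigenvalues are numerically zero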


3. Model selection
The data consists of 100 observations of 10 variables. We fit 10 regression models,
where the k-th model includes only the first k variables. The residual sums of squares for
these 10 consecutive models are equal to (1731, 730, 49, 38.9, 32, 29, 28.5, 27.8, 27.6,
26.6). Let us consider different criteria for model selection under the assumption of a
standard normal error term (σ^2 = 1).

3.1 Akaike Information Criterion

Table 1: Values of the criterion AIC = RSS + 2kσ^2 for each model

k 1 2 3 4 5 6 7 8 9 10
AIC 1733 734 55 46.9 42 41 42.5 43.8 45.6 46.6

3.2 Bayesian Information Criterion

Table 2: Values of the criterion BIC = RSS + k log(n) σ^2 for each model

k 1 2 3 4 5 6 7 8 9 10
≈ BIC 1735 739 63 57 55 57 61 65 69 72

3.3 Risk Inflation Criterion

Table 3: Values of the criterion RIC = RSS + 2k log(p) σ^2 for each model

k 1 2 3 4 5 6 7 8 9 10
≈ RIC 1735 739 63 57 55 57 61 65 69 72
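Note that with n = 100 and p = 10 we have log(n) = 2 log(p), so the BIC and RIC penalties coincide here and Tables 2 and 3 are identical. The tables can be reproduced with a few lines of R (a sketch, assuming σ^2 = 1 as above):

rss <- c(1731, 730, 49, 38.9, 32, 29, 28.5, 27.8, 27.6, 26.6)
k <- seq_along(rss); n <- 100; p <- 10; sigma2 <- 1
aic <- rss + 2 * k * sigma2                  # Table 1
bic <- rss + k * log(n) * sigma2             # Table 2
ric <- rss + 2 * k * log(p) * sigma2         # Table 3 (equal to BIC here)
sapply(list(AIC = aic, BIC = bic, RIC = ric), which.min)   # AIC picks k = 6, BIC and RIC pick k = 5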


4. False discovery rate


Assuming an orthogonal design (X^T X = I) and n = p = 10000, we calculate the
expected number of false discoveries for AIC, BIC and RIC, using the fact that β̂_i ∼ N(0, σ^2) when
β_i = 0.

4.1 Akaike Information Criterion



AIC selects variables satisfying |β̂_i| ≥ √2 σ, so the probability of a type I error is:

P(X_i selected | β_i = 0) = 2(1 − Φ(√2)) ≈ 0.16

The expected number of false discoveries for this criterion is 0.16 · 10000 = 1600.

4.2 Bayesian Information Criterion



BIC selects variables satisfying |β̂_i| ≥ √(log n) σ, so the probability of a type I error is:

P(X_i selected | β_i = 0) = 2(1 − Φ(√(log n))) ≈ 0.0024

The expected number of false discoveries for this criterion is 0.0024 · 10000 = 24.

4.3 Risk Inflation Criterion



RIC selects variables satisfying |β̂_i| ≥ √(2 log p) σ, so the probability of a type I error is:

P(X_i selected | β_i = 0) = 2(1 − Φ(√(2 log p))) ≈ 0.00002

The expected number of false discoveries for this criterion is 0.00002 · 10000 = 0.2.
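These counts can be checked directly in R (a short sketch with σ = 1; the exact values are approximately 1573, 24 and 0.2, with the 1600 above coming from rounding the probability to 0.16):

n <- p <- 10000; sigma <- 1
thresholds <- c(AIC = sqrt(2), BIC = sqrt(log(n)), RIC = sqrt(2 * log(p))) * sigma
alpha <- 2 * (1 - pnorm(thresholds / sigma))   # P(|beta_hat_i| > threshold | beta_i = 0)
round(alpha * p, 1)                            # expected numbers of false discoveries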

5. Choosing the right criterion


5.1 Akaike Information Criterion
AIC is preferable when the primary aim is to select a model with the best predictive
performance rather than to identify the true underlying model, since minimizing AIC
coincides with minimizing an estimate of the prediction error.

5.2 Bayesian Information Criterion


BIC is preferable when one of the candidate models is believed to be the true model and the sample
size is relatively large, which makes it well suited to explanatory analysis.

5.3 Risk Inflation Criterion


RIC is mostly useful in high-dimensional settings, such as variable selection in regression
models with a large number of predictors.


6. Ridge regression formulas


The multiple regression model describes the relationship between a dependent
variable and multiple independent variables. In matrix notation it can be formulated
as follows:

Y = Xβ + ϵ, ϵ ∼ N (0, σ 2 I)

The general ridge regression solution can be obtained analytically by solving for the zero
gradient of this convex function:

∇[(Y − Xβ)T (Y − Xβ) + λβ T β] = 2X T Xβ − 2X T Y + 2λβ = 0

The closed form general solution is thus:

β̂ = (X T X + λI)−1 X T Y

Under orthogonal design this simplifies to:


β̂ = Λ X^T Y,  where Λ := 1/(1 + λ)
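For reference, a minimal R sketch of the closed-form solution (the function name and arguments are ours; X, Y and lambda are assumed given):

ridge_closed_form <- function(X, Y, lambda) {
  p <- ncol(X)
  solve(crossprod(X) + lambda * diag(p), crossprod(X, Y))   # (X^T X + lambda I)^(-1) X^T Y
}
# Under an orthogonal design (t(X) %*% X == I) this reduces to crossprod(X, Y) / (1 + lambda).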

6.1 Bias
First, let us compute the expectation of β̂:

E[β̂] = ΛX T E[Y ] = ΛX T Xβ = Λβ

Then the ridge regression bias is equal to the following:


E[β̂] − β = Λβ − β = −(λ/(1 + λ)) β

which, in the case of OLS (λ = 0), is zero.

6.2 Variance
The ridge regression estimator variance is the following:
V[β̂] = V[Λ X^T Y] = V[Λ X^T (Xβ + ϵ)] = V[Λ X^T ϵ] = Λ X^T (σ^2 I) X Λ = σ^2 Λ^2 I

so each coordinate of β̂ has variance σ^2/(1 + λ)^2, which for OLS is simply σ^2.

6.3 Mean squared error


E[(β̂ − β)^T (β̂ − β)] = tr(V[β̂]) + ∥E[β̂] − β∥^2 = p σ^2/(1 + λ)^2 + λ^2 Λ^2 β^T β = (p σ^2 + λ^2 β^T β)/(1 + λ)^2

which reduces to p σ^2 for OLS.
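A quick simulation agrees with the bias and mean squared error formulas above (a sketch under an orthonormal design; β, σ, λ and the number of replications are arbitrary choices of ours):

set.seed(1)
n <- 200; p <- 5; sigma <- 1; lambda <- 2
beta <- c(3, 1.5, 0, 0, -2)
X <- qr.Q(qr(matrix(rnorm(n * p), n, p)))          # orthonormal columns: t(X) %*% X = I
est <- replicate(20000, drop(crossprod(X, X %*% beta + rnorm(n, 0, sigma))) / (1 + lambda))
rowMeans(est) - beta            # empirical bias, close to -lambda / (1 + lambda) * beta
mean(colSums((est - beta)^2))   # empirical MSE, close to (p * sigma^2 + lambda^2 * sum(beta^2)) / (1 + lambda)^2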


7. Prediction error
Suppose that for a given data set with 40 explanatory variables the residual sums of
squares from the least squares method and the ridge regression are equal to 4.5 and
11.6, respectively. For the ridge regression the trace of M = X(X^T X + λI)^(-1) X^T is equal
to 32. We will now compute and compare the resulting prediction errors of these two
methods.

7.1 Ordinary least squares


PE_OLS = RSS + 2σ^2 p = 4.5 + 80σ^2

7.2 Ridge regression


PE_ridge = RSS + 2σ^2 Tr(M) = 11.6 + 64σ^2

7.3 Comparison
Assuming σ = 1, PE_OLS = 84.5 is greater than PE_ridge = 75.6, so the ridge fit has the smaller estimated prediction error.
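The comparison is a one-line check in R (σ = 1 plugged in as above):

sigma2 <- 1
pe_ols   <- 4.5  + 2 * sigma2 * 40    # 84.5
pe_ridge <- 11.6 + 2 * sigma2 * 32    # 75.6
pe_ols > pe_ridge                     # TRUE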

8. LASSO
8.1 False discovery rate and power
Under orthogonal design, LASSO selects variable X_i when |β̂_i| > λ, where β̂_i ∼ N(β_i, σ^2).
The probability of a false discovery is therefore P(X_i selected | β_i = 0) = 2(1 − Φ(λ/σ)), while the
power is P(X_i selected | β_i ≠ 0) = 1 − Φ((λ − β_i)/σ) + Φ((−λ − β_i)/σ).
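Both quantities are easy to tabulate as functions of λ; below is a minimal R sketch (function names are ours, and the exact two-sided power formula is used):

lasso_false_discovery_prob <- function(lambda, sigma = 1) {
  2 * (1 - pnorm(lambda / sigma))                        # P(|beta_hat_i| > lambda | beta_i = 0)
}
lasso_power <- function(lambda, beta_i, sigma = 1) {
  1 - pnorm((lambda - beta_i) / sigma) + pnorm((-lambda - beta_i) / sigma)   # P(|beta_hat_i| > lambda | beta_i)
}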

8.2 Computing adaptive LASSO


Solving the adaptive LASSO is equivalent to solving a regular LASSO on suitably scaled
input data and then rescaling the resulting estimator.

8.2.1 Explanation
Let β_j = (1/w_j) · β_j^w. Substituting into the minimization problem we obtain:

β̂^w = argmin_{β^w} { ∥y − X · diag(1/w_j) β^w∥_2^2 + λ Σ_{j=1}^p w_j · |(1/w_j) β_j^w| }


Now we can pull the factor 1/w_j into the absolute value in the penalty term, cancelling w_j,
so that the objective becomes:

β̂^w = argmin_{β^w} { ∥y − X · diag(1/w_j) β^w∥_2^2 + λ Σ_{j=1}^p |β_j^w| }

Finally, scaling the input data to account for this change we arrive at the regular LASSO
formulation we are familiar with:
 
β̂^w = argmin_{β^w} { ∥y − X^w β^w∥_2^2 + λ Σ_{j=1}^p |β_j^w| }

8.2.2 Solution

The transformed problem can then be solved with any standard LASSO solver (for example via the LARS algorithm):


1. Define x_{i,j}^w = x_{i,j} / w_j, j = 1, 2, . . . , p.

2. Solve the LASSO problem for the scaled data.

3. Output β̂_j = β̂_j^w / w_j, j = 1, 2, . . . , p.


Typically, one would simply pass the vector of weights (for instance the absolute inverses of initial
OLS estimates) as per-coefficient penalty factors to some regular LASSO solver (like glmnet, for instance).
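A possible R sketch of this recipe using glmnet (the design matrix x, response y and the OLS-based weight choice are illustrative assumptions, not part of the assignment data, and x is assumed to have fewer columns than rows):

library(glmnet)
w  <- 1 / abs(coef(lm(y ~ x))[-1])          # one common weight choice: inverse absolute OLS estimates
xw <- sweep(x, 2, w, "/")                   # step 1: scale the columns, x_j^w = x_j / w_j
fit    <- cv.glmnet(xw, y)                  # step 2: ordinary LASSO on the scaled data
beta_w <- as.numeric(coef(fit, s = "lambda.min"))[-1]
beta_adaptive <- beta_w / w                 # step 3: undo the scaling
# Equivalently, glmnet accepts per-coefficient penalties directly: cv.glmnet(x, y, penalty.factor = w)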

8.3 Closed form solution under orthogonal design


Solving LASSO is equivalent to minimizing the following objective function:
(1/2) ∥Y − Xβ∥^2 + Σ_{i=1}^p λ_i |β_i|

Expanding and rearranging the terms we obtain:


−Y^T Xβ + (1/2) ∥β∥^2 + Σ_{i=1}^p λ_i |β_i|   (dropping the constant (1/2)∥Y∥^2 and using X^T X = I)

Recall that under orthogonal design the least squares solution is β̂ = X^T Y, and so we can simplify
further:
Σ_{i=1}^p ( −β̂_i β_i + (1/2) β_i^2 + λ_i |β_i| ) = Σ_{i=1}^p Z_i
Minimizing the above amounts to minimizing each individual Z_i. Taking the subdifferential
of Z_i with respect to β_i and setting it to zero, we arrive at:

βi = β̂i − λi


which is only valid when β̂_i − λ_i is non-negative. Adjusting for the general case we obtain
the following closed-form solution:

β_i^* = sgn(β̂_i)(|β̂_i| − λ_i)_+
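In R, this soft-thresholding rule is a one-liner (a sketch, vectorized over coordinates; the example values are ours):

soft_threshold <- function(beta_ols, lambda) {
  sign(beta_ols) * pmax(abs(beta_ols) - lambda, 0)
}
soft_threshold(c(-3, 0.4, 2.5), lambda = 1)   # -2.0  0.0  1.5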

8.4 Relation to the ordinary least squares method


One could obtain weights for each variable from an OLS fit along with the estimator and
use them for the adaptive LASSO. For instance, suppose that the ordinary least squares
weight is w_1 = 1/4 and that the OLS estimator of β_1 under the orthogonal design is equal to 3. Also,
the regular LASSO estimator of this parameter is equal to 2. We can compute the tuning
parameter first:
2 = sgn(3)(3 − λ)+ =⇒ λ = 1
Then, the adaptive LASSO estimator would be:
sgn(β̂_1)(|β̂_1| − λ w_1)_+ = 3 − 1/4 = 11/4

9. Project 1
9.1 James-Stein Estimators
The most common estimation approach is maximum likelihood (MLE), which in many
contexts corresponds to the sample mean. The sample mean is an unbiased estimator,
meaning its expected value is equal to the true parameter value. However, when estimating
the mean of a multivariate normal distribution in three or more dimensions, the sample
mean is not the best estimator in terms of mean squared error (Stein's paradox). The
James-Stein estimator improves upon the sample mean by "shrinking" it towards a central
point (often the origin). This shrinkage reduces the mean squared error of the estimator.

9.1.1 Shrink to zero


 
μ̂_JS = (1 − (p − 2) σ^2 / ∥x̄∥^2) x̄

9.1.2 Shrink to the common mean


 
μ̂_JS = μ_0 + (1 − (p − 2) σ^2 / ∥x̄ − μ_0∥^2)(x̄ − μ_0)
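Both estimators are straightforward to implement; below is a minimal R sketch assuming a known noise level sigma2 and p ≥ 3 (function names are ours):

js_shrink_to_zero <- function(xbar, sigma2) {
  p <- length(xbar)
  (1 - (p - 2) * sigma2 / sum(xbar^2)) * xbar
}
js_shrink_to_point <- function(xbar, sigma2, mu0) {
  p <- length(xbar)
  mu0 + (1 - (p - 2) * sigma2 / sum((xbar - mu0)^2)) * (xbar - mu0)
}
# e.g. shrinking towards the common mean of the coordinates: js_shrink_to_point(xbar, sigma2, mean(xbar))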

9.1.3 Experiment

Here, we test the validity of the previous statements. We compute the James-Stein
estimators on a standardized gene dataset and compare them to the classical maximum
likelihood estimator (MLE). The estimators are computed from the first 5 observations, while
the remaining 205 observations are used for validation.

Table 4: Mean squared estimation errors

estimator   MLE    JS (shrink to zero)   JS (shrink to common mean)
MSE         ≈ 84   ≈ 85                  ≈ 7

Figure 1: Estimators

9.2 Multiple Regression


Here we look into the ordinary least squares method. The generated data are sampled
from N(0, 1/1000). The response variable Y is modeled as Y = Xβ + ϵ, where the first
five elements of β are 3 and the rest are 0. The error term ϵ follows N(0, I). We fit the
model for k = 2, 5, 10, 100, 500, 950 variables.


9.2.1 Prediction Errors

For each model we compute prediction error estimators:

• PE_1 = RSS + 2p s^2, where s^2 is the estimated noise variance

• PE_2 = RSS + 2p σ^2, where σ^2 is the true noise variance

• PE_3 = Σ_{i=1}^n ((Y_i − Ŷ_i)/(1 − M_ii))^2, where M_ii are the diagonal entries of the hat matrix

Figure 2: Prediction error residuals

PE_2 is clearly the superior estimator (since it uses the true variance), while PE_3 (the
leave-one-out cross-validation error) appears to struggle with high-dimensional data.
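A sketch of how the three estimates can be computed for a given submodel (assuming a design matrix X without an intercept column, a response y, and the true σ^2 where PE_2 requires it; the function name is ours):

pe_estimates <- function(X, y, sigma2 = 1) {
  fit <- lm(y ~ X - 1)                             # fit Y = X beta + eps without an intercept
  rss <- sum(residuals(fit)^2)
  k   <- ncol(X)
  s2  <- rss / (length(y) - k)                     # estimated noise variance
  H   <- X %*% solve(crossprod(X), t(X))           # hat matrix M
  pe3 <- sum((residuals(fit) / (1 - diag(H)))^2)   # leave-one-out (PRESS) estimate
  c(PE1 = rss + 2 * k * s2, PE2 = rss + 2 * k * sigma2, PE3 = pe3)
}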


Table 5: Rounded conditional prediction error estimate averages across thirty runs

k PE PE1 PE2 PE3


2 1002 1001 1001 1001
5 1005 1006 1006 1006
10 1009 1006 1006 1006
100 1098 1099 1099 1111
500 1497 1501 1500 2006
950 1950 1899 1948 20293

The bivariate model achieves the lowest prediction error, although by a very small margin.
Using prediction error as the only metric, it would be the optimal choice.

10. Project 2
The following experiment is designed to investigate the efficacy of various regression
techniques, particularly those involving regularization, such as ridge regression, LASSO,
and SLOPE. We also compare the results with the mBIC2 model selection criterion. For
each of the described methods we calculate the squared estimation errors E_1 = ∥β̂ − β∥^2
and E_2 = ∥X(β̂ − β)∥^2, as well as the false discovery proportion (FDP) and the true positive proportion (TPP).

10.1 Multiple regression methods


10.2 Ordinary least squares estimator
 2
n
X p
X
β̂ OLS = argmin yi − β0 − xij βj 
β i=1 j=1

Computed with cv.glmnet and SLOPE by setting λ to zero.

10.2.1 Ridge estimator


  2 
Xn p
X p
X 
β̂ ridge = argmin yi − β0 − xij βj  +λ 2
βj
β 
i=1 j=1

j=1

Computed with cv.glmnet by passing the alpha = 0 parameter.

10.2.2 LASSO estimator


  2 
Xn p
X p
X 
β̂ lasso = argmin yi − β0 − xij βj  + λ |βj |
β 
i=1 j=1

j=1


When fitting a cross-validated lasso model using cv.glmnet, two lambda values are
commonly reported:

• lambda.min: The value of lambda that gives the minimum mean cross-validated
error. This is often referred to as the "best" lambda because it directly minimizes
the prediction error on the validation set.

• lambda.1se: The largest value of lambda for which the mean cross-validated error
is within one standard error of the minimum. This lambda value usually results in a
sparser model (fewer non-zero coefficients), potentially improving interpretability
and generalization by favoring simpler models.
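For illustration, the ridge and LASSO fits described above can be obtained as follows (a sketch; x and y denote the simulated design and response):

library(glmnet)
cv_ridge <- cv.glmnet(x, y, alpha = 0)          # alpha = 0 selects the ridge penalty
cv_lasso <- cv.glmnet(x, y, alpha = 1)          # alpha = 1 (the default) selects the LASSO penalty
beta_min <- coef(cv_lasso, s = "lambda.min")    # coefficients at the CV-error-minimizing lambda
beta_1se <- coef(cv_lasso, s = "lambda.1se")    # sparser coefficients within one SE of the minimum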

10.3 Results

Table 6: Rounded error averages across ten runs

model E1 E2 FDP TPP


mBIC2 22 22 0 1
lasso (min) 99 87 0.81 1
lasso (1se) 126 114 0.53 1
lasso (arg) 1479 652 0.97 1
SLOPE 18493 989 0.97 1
ridge 553 405 - -
lasso (ols) 126 114 - -
SLOPE (ols) 720 691 - -

