
MTH 511a - 2020: Lecture 28

Instructor: Dootika Vats

The instructor of this course owns the copyright of all the course materials. This lecture
material was distributed only to the students attending the course MTH511a: “Statistical
Simulation and Data Analysis” of IIT Kanpur, and should not be distributed in print or
through electronic media without the consent of the instructor. Students can make their own
copies of the course materials for their use.
In this lecture, we will focus only on the Bayesian linear regression model, and discuss
accept-reject algorithms that sample from the posterior.

1 Bayesian linear regression


Consider a Bayesian version of the linear regression model, where prior distributions
are assigned to both the regression coefficient $\beta$ and the variance $\sigma^2$. Recall that the
likelihood is
\[
y = (y_1, \dots, y_n) \mid \beta, \sigma^2 \sim N\!\left(X\beta,\; \sigma^2 I_n\right).
\]
The parameters of interest are $\beta$ and $\sigma^2$, and popular prior distributions assume
\[
\beta \mid \sigma^2 \sim N_p\!\left(0,\; \sigma^2 I_p\right) \quad \text{and} \quad \sigma^2 \sim \text{Inverse Gamma}(a, b)\,.
\]
Note that the Inverse Gamma distribution has density
\[
\pi(\sigma^2) \propto \left(\frac{1}{\sigma^2}\right)^{a+1} e^{-b/\sigma^2}\,.
\]

The posterior distribution is
\begin{align*}
\pi(\beta, \sigma^2 \mid y)
&\propto \pi(\beta, \sigma^2) \prod_{i=1}^{n} f(y_i \mid \beta, \sigma^2) \\
&= \pi(\beta \mid \sigma^2)\, \pi(\sigma^2) \prod_{i=1}^{n} f(y_i \mid \beta, \sigma^2) \\
&= \left(\frac{1}{\sigma^2}\right)^{a+1} e^{-b/\sigma^2}
\left(\frac{1}{2\pi\sigma^2}\right)^{p/2} \exp\left\{ -\frac{\beta^T \beta}{2\sigma^2} \right\}
\left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left\{ -\frac{(y - X\beta)^T (y - X\beta)}{2\sigma^2} \right\} \\
&\propto \left(\sigma^2\right)^{-n/2 - p/2 - a - 1}
\exp\left\{ -\frac{(y - X\beta)^T (y - X\beta) + \beta^T \beta}{2\sigma^2} - \frac{b}{\sigma^2} \right\}\,.
\end{align*}

The above is the $(p + 1)$-dimensional posterior distribution, and we want to obtain
samples from it using accept-reject. We already know that accept-reject does not
work well in higher dimensions, so any implementation here should be expected to struggle.
To run an accept-reject sampler, consider the proposal distribution $q(\beta, \sigma^2) = \pi(\beta \mid \sigma^2)\, \pi(\sigma^2)$.
That is, the proposal distribution is the same as the prior distribution. Then, if the MLE
exists,
\[
\frac{\tilde{\pi}(\beta, \sigma^2 \mid y)}{\pi(\beta \mid \sigma^2)\, \pi(\sigma^2)}
= \frac{\pi(\beta \mid \sigma^2)\, \pi(\sigma^2) \prod_{i=1}^{n} f(y_i \mid \beta, \sigma^2)}{\pi(\beta \mid \sigma^2)\, \pi(\sigma^2)}
= \prod_{i=1}^{n} f(y_i \mid \beta, \sigma^2)
\leq \prod_{i=1}^{n} f\!\left(y_i \mid \hat{\beta}_{\mathrm{MLE}},\, \hat{\sigma}^2_{\mathrm{MLE}}\right) =: M\,.
\]

So an accept-reject sampler is theoretically possible to implement. However, the
dimensionality of the problem will certainly impede efficiency: AR will be very
inefficient here, and it may be close to impossible to obtain even one draw from the
posterior distribution in reasonable time.
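To see this concretely, here is a minimal sketch of the accept-reject sampler above on simulated stand-in data. The design matrix X, response y, prior parameters a and b, and all sample sizes below are invented for illustration; they are not the cars example that appears later.

```r
# Sketch only: simulated data standing in for a real (X, y).
set.seed(42)
n <- 50; p <- 2
X <- cbind(1, rnorm(n))
beta.true <- c(2, 3)
y <- X %*% beta.true + rnorm(n, sd = 5)
a <- 1; b <- 1   # prior parameters, chosen arbitrarily

# log-likelihood of the normal linear model
loglik <- function(beta, sig2) {
  sum(dnorm(y, mean = X %*% beta, sd = sqrt(sig2), log = TRUE))
}

# MLEs give the bound M (kept on the log scale for stability)
beta.mle <- solve(t(X) %*% X, t(X) %*% y)
sig2.mle <- sum((y - X %*% beta.mle)^2) / n
logM <- loglik(beta.mle, sig2.mle)

# Propose from the prior, accept with probability (likelihood / M)
tries <- 1e4
accepted <- 0
for (t in 1:tries) {
  sig2.prop <- 1 / rgamma(1, shape = a, rate = b)        # sig2 ~ IG(a, b)
  beta.prop <- rnorm(p, mean = 0, sd = sqrt(sig2.prop))  # beta | sig2 ~ N(0, sig2 I)
  if (log(runif(1)) <= loglik(beta.prop, sig2.prop) - logM) {
    accepted <- accepted + 1
  }
}
accepted / tries   # acceptance rate is typically minuscule
```

Even in this small example the acceptance rate is tiny, since a prior draw rarely lands near the region where the likelihood is large; the problem only worsens as $p$ grows.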

1.1 Linchpin variable samplers

As we have discussed plenty of times now, it is difficult to implement AR when the
target is high-dimensional or when the upper bound is hard to get. In the first case, a
linchpin variable trick can be very useful. Suppose the target density is
\[
\pi(x, y)\,.
\]
Then, we can split the joint distribution as the product of a conditional times a marginal.
That is,
\[
\pi(x, y) = \pi(x \mid y)\, \pi(y)\,.
\]
If $X \mid Y$ is known in closed form and we can sample from it, then we may try to get
samples from the marginal distribution of $Y$. This is beneficial since the dimension of
$Y$ is smaller than that of $(X, Y)$, and implementing AR on a smaller-dimensional problem will
be much easier. So the algorithm would be:
• Generate Y ⇠ ⇡(y)
• Generate X ⇠ X|Y
• Output (X, Y ).
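As a toy illustration of these three steps, consider a joint density built from a Gamma marginal and a Normal conditional; both pieces are invented purely for this sketch, and together they define a valid $\pi(x, y)$.

```r
# Toy linchpin sampler:
#   Y ~ Gamma(shape = 3, rate = 2)       (the linchpin, sampled directly)
#   X | Y = y ~ N(0, 1/y)                (closed-form conditional)
set.seed(1)
N <- 1e4
Y <- rgamma(N, shape = 3, rate = 2)        # Step 1: draw the linchpin variable
X <- rnorm(N, mean = 0, sd = sqrt(1 / Y))  # Step 2: draw X | Y
draws <- cbind(X, Y)                       # Step 3: (X, Y) are joint draws
```

Note that these are exact iid draws from the joint distribution; no accept-reject step and no Markov chain is involved once both pieces can be sampled directly.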

The variable $Y$ is called the linchpin variable with target density $\pi(y)$. We can use
this quite easily in Bayesian linear regression.

Example 1 (Bayesian linear regression). Recall the posterior distribution in Bayesian
linear regression:
\[
\pi(\beta, \sigma^2 \mid y) \propto \left(\sigma^2\right)^{-n/2 - p/2 - a - 1}
\exp\left\{ -\frac{(y - X\beta)^T (y - X\beta) + \beta^T \beta + 2b}{2\sigma^2} \right\}\,.
\]
First, note that we prefer $\sigma^2$ to be the linchpin variable since it is univariate, while $\beta$
is $p$-variate. So we need to find the conditional distribution $\beta \mid \sigma^2, y$ and the marginal distribution of
$\sigma^2 \mid y$. Let $A = (X^T X + I)$.

\begin{align*}
\int \pi(\beta, \sigma^2 \mid y)\, d\beta
&\propto \int \left(\sigma^2\right)^{-n/2 - p/2 - a - 1}
\exp\left\{ -\frac{y^T y - 2\beta^T X^T y + \beta^T X^T X \beta + \beta^T \beta}{2\sigma^2} - \frac{b}{\sigma^2} \right\} d\beta \\
&= \left(\sigma^2\right)^{-n/2 - p/2 - a - 1}
\exp\left\{ -\frac{y^T y}{2\sigma^2} - \frac{b}{\sigma^2} \right\}
\int \exp\left\{ -\frac{\beta^T (X^T X + I)\, \beta - 2\beta^T X^T y}{2\sigma^2} \right\} d\beta \\
&= \left(\sigma^2\right)^{-n/2 - p/2 - a - 1}
\exp\left\{ -\frac{y^T y}{2\sigma^2} - \frac{b}{\sigma^2} \right\}
\int \exp\left\{ -\frac{(\beta - A^{-1} X^T y)^T A\, (\beta - A^{-1} X^T y) - y^T X A^{-1} X^T y}{2\sigma^2} \right\} d\beta \\
&= \left(\sigma^2\right)^{-n/2 - p/2 - a - 1}
\exp\left\{ -\frac{y^T (I - X A^{-1} X^T)\, y}{2\sigma^2} - \frac{b}{\sigma^2} \right\}
\int \exp\left\{ -\frac{(\beta - A^{-1} X^T y)^T A\, (\beta - A^{-1} X^T y)}{2\sigma^2} \right\} d\beta\,,
\end{align*}
where the third equality completes the square in $\beta$.

So $\beta \mid \sigma^2, y$ is a multivariate normal distribution,
\[
\beta \mid \sigma^2, y \sim N_p\!\left( A^{-1} X^T y,\; \sigma^2 A^{-1} \right),
\]
and the remaining integral integrates to a known constant, $(2\pi\sigma^2)^{p/2} \det(A)^{-1/2}$.

\begin{align*}
\int \pi(\beta, \sigma^2 \mid y)\, d\beta
&\propto \left(\sigma^2\right)^{-n/2 - p/2 - a - 1}
\exp\left\{ -\frac{y^T (I - X A^{-1} X^T)\, y}{2\sigma^2} - \frac{b}{\sigma^2} \right\}
\cdot \left(\sigma^2\right)^{p/2} \det(A)^{-1/2} \\
&\propto \left(\sigma^2\right)^{-n/2 - a - 1}
\exp\left\{ -\frac{y^T (I - X A^{-1} X^T)\, y/2 + b}{\sigma^2} \right\}\,.
\end{align*}

So the marginal posterior distribution of $\sigma^2 \mid y$ is
\[
\sigma^2 \mid y \sim \text{Inverse Gamma}\!\left( \frac{n}{2} + a,\; \frac{y^T (I - X A^{-1} X^T)\, y}{2} + b \right).
\]
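Sampling from this Inverse Gamma reduces to a single `rgamma` call, since $\sigma^2 \sim \text{IG}(\alpha, \lambda)$ is equivalent to $1/\sigma^2 \sim \text{Gamma}(\alpha, \text{rate} = \lambda)$. A small sketch, with $\alpha$ and $\lambda$ set to placeholder values rather than the $n/2 + a$ and $y^T(I - XA^{-1}X^T)y/2 + b$ of the regression problem:

```r
# sig2 ~ IG(alpha, lambda)  <=>  1/sig2 ~ Gamma(shape = alpha, rate = lambda)
# alpha and lambda here are placeholder values for illustration
set.seed(1)
alpha <- 5; lambda <- 10
sig2.draws <- 1 / rgamma(1e5, shape = alpha, rate = lambda)
mean(sig2.draws)   # should be close to lambda / (alpha - 1) = 2.5
```

This is exactly the trick the implementation below uses for the cars data.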

We have just done the following decomposition:
\[
\pi(\beta, \sigma^2 \mid y) = \pi(\beta \mid \sigma^2, y)\, \pi(\sigma^2 \mid y)\,,
\]
where both of those densities are available in closed form and samples can be generated
easily from them in the following way:
1. Generate $\sigma^2 \sim$ Inverse Gamma as indicated above.
2. Generate $\beta \mid \sigma^2 \sim$ Normal distribution as indicated above.
3. $(\beta, \sigma^2)$ is one draw from the posterior. Repeat for many draws, and estimate
posterior means and quantiles.
We now implement Bayesian linear regression for the cars dataset.
###########################################
## Linchpin variable sampler
## for Bayesian linear regression for cars
###########################################
set.seed(1)

# loading the dataset


data(cars)
n <- dim(cars)[1]
X <- cbind(1, cars$speed)
y <- cars$dist
p <- dim(X)[2]
a <- 1 # prior parameters
b <- 1 # prior parameters

Drawing the samples is easy, since no AR step is required

# We implement Monte Carlo sampling using the linchpin sampler


N <- 1e4
A <- t(X)%*%X + diag(p)
A.inv <- solve(A)

sig2 <- numeric(length = N)


beta <- matrix(0, nrow = N, ncol = p)

rate.sig <- ( t(y) %*% (diag(1,n) - X %*% A.inv %*% t(X)) %*% y )/2 + b

# sampling Inverse Gamma for sigma2


sig2 <- 1 / rgamma(N, shape = n/2 + a, rate = rate.sig)

# Sampling beta from multivariate normal


# mean + sqrt(covariance) %*% rnorm
foo <- svd(A.inv) #Singular values decomposition of A^{-1}
Ainv.sqrt <- foo$u %*% diag(foo$d^(1/2)) %*% t(foo$v)

for(i in 1:N)
{
  # draw beta_i from N(A^{-1} X^T y, sig2[i] * A^{-1})
  beta[i,] <- A.inv %*% t(X) %*% y + Ainv.sqrt %*% rnorm(p, sd = sqrt(sig2[i]))
}

We can view the posterior marginal density plots of the samples


par(mfrow = c(1,3))
plot(density(sig2), main = expression(sigma^2))
plot(density(beta[,1] ), main = expression(beta[1]))
plot(density(beta[,2] ), main = expression(beta[2]))

[Figure: posterior marginal density plots of $\sigma^2$, $\beta_1$, and $\beta_2$, each estimated from the N = 10000 posterior draws.]

We can also find the posterior means and quantiles:


poster <- cbind(sig2, beta)
colMeans(poster)
# sig2
#232.404843 -14.713624 3.766138

apply(poster, 2, quantile, c(.025, .975))

# sig2
#2.5% 157.9957 -27.370729 2.993072
#97.5% 342.2522 -1.772828 4.539354

Note that the posterior credible intervals for both $\beta_1$ and $\beta_2$ do not contain 0,
implying both regression coefficients are important and should be treated as non-zero.

2 Questions to think about


• Implement accept-reject for the cars dataset and see for yourself how well the
algorithm works here.
• Suppose the marginal posterior distribution of $\sigma^2 \mid y$ was not from a nice known
family. What could we have done then?
• What is the MAP estimator of $\beta$ in this problem?
