
January 2021

Universidad Nacional de Ingeniería


FIECS - Summer School in Economics and Finance
Bayesian Econometrics of Time Series

Linear regression model


Alan Ledesma Arista¹

Contents

1 Bayesian inference

2 Likelihood function

3 Non-informative priors

4 Conjugate priors

5 Independent priors

6 Simulations

¹ Research analyst at the Central Reserve Bank of Peru. Email: alan.ledesma@bcrp.gob.pe

1 Bayesian inference
• We are interested in learning about a set of coefficients θ (of a model) based on the data Y.

• In the Bayesian approach, the ‘true’ coefficients are regarded as random variables; hence, we estimate distributions rather
than points.

• Bayes’ theorem is used to ‘update’ our ‘beliefs’ about θ:

1. The prior distribution p(θ). It reflects the researcher’s beliefs about θ and is parametrized as a PDF.
2. The likelihood function L(θ) ≡ p(Y|θ). It describes the likelihood of observing Y conditional on θ.
3. The posterior distribution p(θ|Y). This measure is the fundamental object of interest of a Bayesian analysis. It summarizes
what we learn about θ given the data. It can also be understood as the update of our beliefs once the data have
been processed.

• Bayes’ theorem:
$$p(\theta|Y) = \frac{p(Y|\theta)\,p(\theta)}{p(Y)}.$$
As inference is made on θ, the denominator can be treated as a scalar. Hence

$$p(\theta|Y) \propto p(Y|\theta)\,p(\theta) \equiv L(\theta)\,p(\theta) \qquad (1.1)$$

where ∝ means “proportional to”.


2 Likelihood function
• The linear regression model is
$$y = X\beta + \varepsilon, \qquad (2.1)$$
where y, X, and ε are n × 1, n × k, and n × 1, respectively.

• If $\varepsilon \sim N(0_{n\times 1}, \sigma^2 I_n)$ and X is exogenous, then $y \sim N(X\beta, \sigma^2 I_n)$; hence

$$L(\beta,\sigma^2) = p(y|\beta,\sigma^2) \propto (\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2}\frac{(y-X\beta)'(y-X\beta)}{\sigma^2}\right\}.$$

• Remember that the OLS estimates of β and σ² are
$$b = (X'X)^{-1}X'y \qquad \text{and} \qquad s^2 = \frac{(y-Xb)'(y-Xb)}{n-k};$$
therefore, the expression (y − Xβ)′(y − Xβ) can be reduced to
$$\begin{aligned}
(y-X\beta)'(y-X\beta) &= (y-Xb+Xb-X\beta)'(y-Xb+Xb-X\beta)\\
&= [(y-Xb)-X(\beta-b)]'[(y-Xb)-X(\beta-b)]\\
&= (y-Xb)'(y-Xb) - (y-Xb)'X(\beta-b) - (\beta-b)'X'(y-Xb) + (\beta-b)'X'X(\beta-b)\\
&= (y-Xb)'(y-Xb) + (\beta-b)'X'X(\beta-b)\\
&= (n-k)s^2 + (\beta-b)'X'X(\beta-b),
\end{aligned}$$
where the second-to-last equality comes from the well-known result $X'(y-Xb) = X'(y - X(X'X)^{-1}X'y) = X'(I - X(X'X)^{-1}X')y = (X' - X'X(X'X)^{-1}X')y = 0$. Hence, the likelihood can be written as

$$L(\beta,\sigma^2) = p(y|\beta,\sigma^2) \propto (\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2}\frac{(n-k)s^2 + (\beta-b)'X'X(\beta-b)}{\sigma^2}\right\}. \qquad (2.2)$$


3 Non-informative priors

• If there is no a priori information about (β, σ²), a non-informative prior should be set.
• A non-informative prior is p(β, σ²) = 1, i.e., all pairs (β, σ²) are equally likely.
• As the prior is non-informative, it does not alter the information within the likelihood function. Hence, the posterior
distribution is
$$p(\beta,\sigma^2|y) \propto L(\beta,\sigma^2)\,p(\beta,\sigma^2) = L(\beta,\sigma^2) \propto (\sigma^2)^{-n/2}\exp\left\{-\frac{1}{2}\frac{(n-k)s^2 + (\beta-b)'X'X(\beta-b)}{\sigma^2}\right\}. \qquad (3.1)$$

• As a result, the posterior distribution of β given σ² is
$$p(\beta|\sigma^2,y) \propto \exp\left\{-\frac{1}{2}(\beta-b)'\left[(\sigma^2)^{-1}(X'X)\right](\beta-b)\right\}, \qquad (3.2)$$
from which $\beta|\sigma^2,y \sim N(b, \sigma^2(X'X)^{-1})$.

• The posterior marginal density of β is
$$p(\beta|y) = \int_0^\infty p(\beta,\sigma^2|y)\,d\sigma^2. \qquad (3.3)$$
Using identity (2.1) in the technical appendix, it can be shown that
$$p(\beta|y) \propto \left[\nu + (\beta-b)'\left[(s^2)^{-1}X'X\right](\beta-b)\right]^{-(\nu+k)/2} \quad \text{with } \nu = n-k, \qquad (3.4)$$
from which $\beta|y \sim t(b, s^2(X'X)^{-1}, n-k)$.

• The posterior marginal density of σ² is
$$p(\sigma^2|y) = \int p(\beta,\sigma^2|y)\,d\beta \propto \frac{1}{(\sigma^2)^{(n-k-2)/2+1}}\exp\left\{-\frac{((n-k)/2)s^2}{\sigma^2}\right\}, \qquad (3.5)$$
so $\sigma^2|y \sim \Gamma^{-1}\left(\frac{n-k-2}{2}, \frac{n-k}{2}s^2\right)$.
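Because (3.5) and (3.2) are standard distributions, joint posterior draws can be generated directly: draw σ² from (3.5), then β from (3.2). A sketch, assuming NumPy and the Γ⁻¹(a, c) convention above, so that a draw equals c divided by a Gamma(a, 1) draw; the function name is illustrative:

```python
import numpy as np

def noninformative_draws(y, X, R=10_000, rng=None):
    """Joint posterior draws under p(beta, sigma^2) = 1:
    sigma^2 | y from (3.5), then beta | sigma^2, y from (3.2)."""
    rng = np.random.default_rng() if rng is None else rng
    n, k = X.shape
    XtX = X.T @ X
    b = np.linalg.solve(XtX, X.T @ y)
    resid = y - X @ b
    s2 = resid @ resid / (n - k)
    # sigma^2 ~ InvGamma((n-k-2)/2, (n-k)s^2/2): scale / Gamma(shape, 1)
    sig2 = ((n - k) * s2 / 2) / rng.gamma((n - k - 2) / 2, 1.0, size=R)
    # beta | sigma^2 ~ N(b, sigma^2 (X'X)^{-1}) via a Cholesky factor
    L = np.linalg.cholesky(np.linalg.inv(XtX))
    beta = b + np.sqrt(sig2)[:, None] * (rng.standard_normal((R, k)) @ L.T)
    return beta, sig2
```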


4 Conjugate priors
• A prior p(θ) is called a “conjugate prior” for the likelihood p(y|θ) if the posterior distribution p(θ|y) is in the same
probability distribution family as the prior density.

• In the case of the linear regression model, set the following prior distributions: $\beta|\sigma^2 \sim N(\underline{b}, \sigma^2\underline{Q})$ and $\sigma^2 \sim \Gamma^{-1}\left(\frac{\underline{\nu}-k-2}{2}, \frac{\underline{\eta}}{2}\right)$.
These priors are
$$p(\beta|\sigma^2) \propto \frac{1}{(\sigma^2)^{k/2}}\exp\left\{-\frac{1}{2\sigma^2}(\beta-\underline{b})'\underline{Q}^{-1}(\beta-\underline{b})\right\} \qquad (4.1)$$
and
$$p(\sigma^2) \propto \frac{1}{(\sigma^2)^{(\underline{\nu}-k)/2}}\exp\left[-\frac{\underline{\eta}}{2\sigma^2}\right], \qquad (4.2)$$
such that
$$p(\beta,\sigma^2) = p(\beta|\sigma^2)\,p(\sigma^2) \propto \frac{1}{(\sigma^2)^{\underline{\nu}/2}}\exp\left\{-\frac{1}{2\sigma^2}\left[\underline{\eta} + (\beta-\underline{b})'\underline{Q}^{-1}(\beta-\underline{b})\right]\right\}. \qquad (4.3)$$

The prior in (4.3) is known as the normal-gamma prior. (Underlined symbols denote prior hyperparameters; barred symbols below denote their posterior counterparts.)

• The posterior distribution is
$$\begin{aligned}
p(\beta,\sigma^2|y) \propto{}& p(\beta,\sigma^2)\,L(\beta,\sigma^2)\\
\propto{}& \frac{1}{(\sigma^2)^{\underline{\nu}/2}}\exp\left\{-\frac{1}{2\sigma^2}\left[\underline{\eta} + (\beta-\underline{b})'\underline{Q}^{-1}(\beta-\underline{b})\right]\right\}\frac{1}{(\sigma^2)^{n/2}}\exp\left\{-\frac{1}{2}\frac{(n-k)s^2+(\beta-b)'X'X(\beta-b)}{\sigma^2}\right\}\\
\propto{}& \frac{1}{(\sigma^2)^{(\underline{\nu}+n)/2}}\exp\left\{-\frac{1}{2\sigma^2}\left[\underline{\eta} + (n-k)s^2 + (\beta-\underline{b})'\underline{Q}^{-1}(\beta-\underline{b}) + (\beta-b)'X'X(\beta-b)\right]\right\}. \qquad (4.4)
\end{aligned}$$

• Define
$$\bar{Q} = \left(\underline{Q}^{-1} + X'X\right)^{-1}, \qquad \bar{b} = \bar{Q}\left(\underline{Q}^{-1}\underline{b} + X'Xb\right), \qquad (4.5)$$
$$\bar{\nu} = \underline{\nu} + n \qquad \text{and} \qquad \bar{\eta} = \underline{\eta} + (n-k)s^2 + (\underline{b}-b)'\left[\underline{Q} + (X'X)^{-1}\right]^{-1}(\underline{b}-b). \qquad (4.6)$$


• With the definitions in (4.5), the following result holds (see the derivations in equations (2.2)-(2.9) in Appendix.pdf):
$$(\beta-\underline{b})'\underline{Q}^{-1}(\beta-\underline{b}) + (\beta-b)'X'X(\beta-b) = (\beta-\bar{b})'\bar{Q}^{-1}(\beta-\bar{b}) + (\underline{b}-b)'\left[(X'X)^{-1}+\underline{Q}\right]^{-1}(\underline{b}-b). \qquad (4.7)$$

Replacing (4.7) in (4.4) yields
$$p(\beta,\sigma^2|y) \propto \frac{1}{(\sigma^2)^{(\underline{\nu}+n)/2}}\exp\left\{-\frac{1}{2\sigma^2}\left[\underline{\eta} + (n-k)s^2 + (\beta-\bar{b})'\bar{Q}^{-1}(\beta-\bar{b}) + (\underline{b}-b)'\left[(X'X)^{-1}+\underline{Q}\right]^{-1}(\underline{b}-b)\right]\right\}.$$

With the definitions in (4.6), the equation above can be written as
$$\begin{aligned}
p(\beta,\sigma^2|y) &\propto \frac{1}{(\sigma^2)^{\bar{\nu}/2}}\exp\left\{-\frac{1}{2\sigma^2}\left[\bar{\eta} + (\beta-\bar{b})'\bar{Q}^{-1}(\beta-\bar{b})\right]\right\}\\
&\propto \frac{1}{(\sigma^2)^{k/2}}\exp\left\{-\frac{1}{2\sigma^2}(\beta-\bar{b})'\bar{Q}^{-1}(\beta-\bar{b})\right\}\frac{1}{(\sigma^2)^{(\bar{\nu}-k)/2}}\exp\left\{-\frac{\bar{\eta}}{2\sigma^2}\right\}; \qquad (4.8)
\end{aligned}$$
that is, $\beta|\sigma^2,y \sim N(\bar{b}, \sigma^2\bar{Q})$ and $\sigma^2|y \sim \Gamma^{-1}\left(\frac{\bar{\nu}-k-2}{2}, \frac{\bar{\eta}}{2}\right)$.

• Using identity (2.1) in the technical appendix, it can be shown that the posterior marginal distributions are
$$\beta|y \sim t\left(\bar{b}, \frac{\bar{\eta}}{\bar{\nu}-k-1}\bar{Q}, \bar{\nu}-k-1\right) \qquad (4.9)$$
$$\sigma^2|y \sim \Gamma^{-1}\left(\frac{\bar{\nu}-k-2}{2}, \frac{\bar{\eta}}{2}\right) \qquad (4.10)$$
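Since the conjugate posterior factors as in (4.8), draws can again be generated directly. A sketch under the notation above, assuming NumPy; the arguments b0, Q0, nu0, eta0 stand in for the underlined prior hyperparameters and are illustrative names:

```python
import numpy as np

def conjugate_posterior_draws(y, X, b0, Q0, nu0, eta0, R=10_000, rng=None):
    """Posterior draws under the normal-gamma prior of Section 4:
    hyperparameters from (4.5)-(4.6), then sigma^2|y and beta|sigma^2,y from (4.8)."""
    rng = np.random.default_rng() if rng is None else rng
    n, k = X.shape
    XtX = X.T @ X
    b_ols = np.linalg.solve(XtX, X.T @ y)
    s2 = (y - X @ b_ols) @ (y - X @ b_ols) / (n - k)
    Q0inv = np.linalg.inv(Q0)
    Qbar = np.linalg.inv(Q0inv + XtX)                       # (4.5)
    bbar = Qbar @ (Q0inv @ b0 + XtX @ b_ols)
    d = b0 - b_ols                                          # (4.6)
    eta_bar = eta0 + (n - k) * s2 + d @ np.linalg.solve(Q0 + np.linalg.inv(XtX), d)
    nu_bar = nu0 + n
    # sigma^2|y ~ InvGamma((nu_bar-k-2)/2, eta_bar/2)
    sig2 = (eta_bar / 2) / rng.gamma((nu_bar - k - 2) / 2, 1.0, size=R)
    # beta|sigma^2,y ~ N(b_bar, sigma^2 Q_bar)
    L = np.linalg.cholesky(Qbar)
    beta = bbar + np.sqrt(sig2)[:, None] * (rng.standard_normal((R, k)) @ L.T)
    return beta, sig2
```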


5 Independent priors
• Sometimes it is convenient to formulate priors as independent distributions across coefficients.

• The algorithm used to draw simulations from the posterior depends upon the shape of the resulting posterior:

– If the complete set of conditional posterior distributions is easy to simulate: Gibbs sampling.
– Otherwise: Metropolis-Hastings.

• Complete set of conditional posteriors known


 
– If independent priors are set as $\beta \sim N(\underline{b}, \underline{V})$ and $\sigma^2 \sim \Gamma^{-1}\left(\frac{\underline{a}}{2}, \frac{\underline{b}}{2}\right)$, then the joint prior is
$$\begin{aligned}
p(\beta,\sigma^2) &\propto \exp\left\{-\frac{1}{2}(\beta-\underline{b})'\underline{V}^{-1}(\beta-\underline{b})\right\}\frac{1}{(\sigma^2)^{\frac{\underline{a}+2}{2}}}\exp\left\{-\frac{\underline{b}}{2\sigma^2}\right\}\\
&\propto \frac{1}{(\sigma^2)^{\frac{\underline{a}+2}{2}}}\exp\left\{-\frac{1}{2\sigma^2}\left[\underline{b} + \sigma^2(\beta-\underline{b})'\underline{V}^{-1}(\beta-\underline{b})\right]\right\}. \qquad (5.1)
\end{aligned}$$
Under this specification, there is no closed-form expression for the joint posterior distribution; however, the full conditional posteriors are of known form, as shown next.
– The posterior distribution is
$$\begin{aligned}
p(\beta,\sigma^2|y) \propto{}& L(\beta,\sigma^2)\,p(\beta,\sigma^2)\\
\propto{}& \frac{1}{(\sigma^2)^{n/2}}\exp\left\{-\frac{1}{2\sigma^2}\left[(n-k)s^2 + (\beta-b)'X'X(\beta-b)\right]\right\}\\
& \times \frac{1}{(\sigma^2)^{\frac{\underline{a}+2}{2}}}\exp\left\{-\frac{1}{2\sigma^2}\left[\underline{b} + \sigma^2(\beta-\underline{b})'\underline{V}^{-1}(\beta-\underline{b})\right]\right\}\\
\propto{}& \frac{1}{(\sigma^2)^{\frac{\underline{a}+n+2}{2}}}\exp\left\{-\frac{1}{2\sigma^2}\left[\underline{b} + (n-k)s^2 + (\beta-b)'X'X(\beta-b) + \sigma^2(\beta-\underline{b})'\underline{V}^{-1}(\beta-\underline{b})\right]\right\}. \qquad (5.2)
\end{aligned}$$


– From here, if σ² is taken as given,
$$p(\beta|\sigma^2,y) \propto \exp\left\{-\frac{1}{2\sigma^2}\left[(\beta-b)'X'X(\beta-b) + \sigma^2(\beta-\underline{b})'\underline{V}^{-1}(\beta-\underline{b})\right]\right\}. \qquad (5.3)$$

Define
$$\bar{V} = \left(\sigma^2\underline{V}^{-1} + X'X\right)^{-1} \qquad \text{and} \qquad \bar{b} = \bar{V}\left(\sigma^2\underline{V}^{-1}\underline{b} + X'y\right); \qquad (5.4)$$

then
$$p(\beta|\sigma^2,y) \propto \exp\left\{-\frac{1}{2\sigma^2}(\beta-\bar{b})'\bar{V}^{-1}(\beta-\bar{b})\right\}; \qquad (5.5)$$
hence, $\beta|\sigma^2,y \sim N(\bar{b}, \sigma^2\bar{V})$.

– Now, if β is taken as given,
$$p(\sigma^2|\beta,y) \propto \frac{1}{(\sigma^2)^{\frac{\underline{a}+n}{2}+1}}\exp\left\{-\frac{1}{2\sigma^2}\left[\underline{b} + (n-k)s^2 + (\beta-b)'X'X(\beta-b)\right]\right\}. \qquad (5.6)$$

Define
$$\bar{a} = \underline{a} + n \qquad \text{and} \qquad \bar{b} = \underline{b} + (n-k)s^2 + (\beta-b)'X'X(\beta-b); \qquad (5.7)$$

therefore,
$$p(\sigma^2|\beta,y) \propto \frac{1}{(\sigma^2)^{\bar{a}/2+1}}\exp\left\{-\frac{\bar{b}/2}{\sigma^2}\right\}. \qquad (5.8)$$
As a result, $\sigma^2|\beta,y \sim \Gamma^{-1}\left(\frac{\bar{a}}{2}, \frac{\bar{b}}{2}\right)$.

– The set of conditional distributions is complete; a sketch of both conditional draws follows.
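A minimal sketch of the two conditional draws, assuming NumPy; b0/V0inv denote the prior mean and inverse covariance of β and a0/b0_ig the inverse-gamma hyperparameters (illustrative names, since the notes use underlined symbols):

```python
import numpy as np

def draw_beta_given_sig2(y, X, b0, V0inv, sig2, rng):
    """beta | sigma^2, y ~ N(b_bar, sigma^2 V_bar), eqs. (5.4)-(5.5)."""
    Vbar = np.linalg.inv(sig2 * V0inv + X.T @ X)
    bbar = Vbar @ (sig2 * V0inv @ b0 + X.T @ y)
    C = np.linalg.cholesky(sig2 * Vbar)          # covariance sigma^2 V_bar
    return bbar + C @ rng.standard_normal(len(bbar))

def draw_sig2_given_beta(y, X, a0, b0_ig, beta, rng):
    """sigma^2 | beta, y ~ InvGamma(a_bar/2, b_bar/2), eqs. (5.7)-(5.8)."""
    resid = y - X @ beta
    a_bar = a0 + len(y)                          # a_bar = a + n
    # (n-k)s^2 + (beta-b)'X'X(beta-b) = (y - X beta)'(y - X beta), by Section 2
    b_bar = b0_ig + resid @ resid
    return (b_bar / 2) / rng.gamma(a_bar / 2, 1.0)
```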


• A more general prior

– In the more general case, the posterior does not take a known form:
$$p(\beta,\sigma^2|y) \propto L(\beta,\sigma^2)\,p(\beta,\sigma^2) \propto \frac{1}{(\sigma^2)^{n/2}}\exp\left\{-\frac{1}{2\sigma^2}\left[(n-k)s^2 + (\beta-b)'X'X(\beta-b)\right]\right\}p(\beta)\,p(\sigma^2). \qquad (5.9)$$


6 Simulations
Non-informative prior and conjugate priors
• As the marginal posterior distributions of β and σ² are known and easy to simulate, we can use them directly to
draw simulations. These distributions are given by equations (3.4) and (3.5) in the case of the non-informative prior and
by equations (4.9) and (4.10) in the case of the conjugate prior.

Gibbs sampling
• Gibbs sampling is a Markov chain Monte Carlo (MCMC) based algorithm to simulate from an unknown joint distribution when the whole
set of conditional distributions is known.

• We can simulate $z = \{z_1, \ldots, z_n\} \sim f(z)$, with f(z) unknown but with all $f(z_i|z_{-i})$ known (notation: $z_{-i} \equiv \{z_1, \ldots, z_{i-1}, z_{i+1}, \ldots, z_n\}$), with the following recursion:

1. Initialize: propose starting points $z^{(0)} = \{z_1^{(0)}, \ldots, z_n^{(0)}\}$.
2. Simulate each $z_i^{(r)}$ from $f(z_i|z_{-i}^{(r-1)})$ for $i \in \{1, \ldots, n\}$.
3. Repeat step 2 for $r = \{1, \ldots, R+B\}$ and disregard the first B simulations.

• Geman and Geman (1984) showed that the previous recursion converges to the joint distribution exponentially fast.

Under the independent prior

• Gibbs sampling can be used in the case of the independent prior, as the whole set of conditional posterior distributions
belongs to known distribution families.

• The sampler (a code sketch follows the steps):

1. Initialize: calculate $\bar{a}$ according to (5.7) and propose a starting point $\sigma^{2(0)}$.
2. With $\sigma^2 = \sigma^{2(r-1)}$, calculate $\bar{V}^{(r)}$ and $\bar{b}^{(r)}$ according to (5.4) and simulate $\beta^{(r)}$ from $N(\bar{b}^{(r)}, \sigma^{2(r-1)}\bar{V}^{(r)})$.


3. With $\beta = \beta^{(r)}$, calculate $\bar{b}^{(r)}$ according to (5.7) and simulate $\sigma^{2(r)}$ from $\Gamma^{-1}\left(\frac{\bar{a}}{2}, \frac{\bar{b}^{(r)}}{2}\right)$.
4. Repeat steps 2 and 3 for $r = \{1, \ldots, R+B\}$ and disregard the first B simulations.
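A sketch of this sampler, reusing the conditional-draw helpers from Section 5 (assumed to be in scope); hyperparameter names remain illustrative:

```python
import numpy as np

def gibbs_sampler(y, X, b0, V0, a0, b0_ig, R=10_000, B=1_000, rng=None):
    """Gibbs sampler for the independent prior: alternate draws from
    (5.5) and (5.8), then drop the first B draws as burn-in."""
    rng = np.random.default_rng() if rng is None else rng
    k = X.shape[1]
    V0inv = np.linalg.inv(V0)
    sig2 = 1.0                                   # starting point sigma^{2(0)}
    betas, sig2s = np.empty((R + B, k)), np.empty(R + B)
    for r in range(R + B):
        beta = draw_beta_given_sig2(y, X, b0, V0inv, sig2, rng)
        sig2 = draw_sig2_given_beta(y, X, a0, b0_ig, beta, rng)
        betas[r], sig2s[r] = beta, sig2
    return betas[B:], sig2s[B:]                  # disregard the first B draws
```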

Metropolis-Hastings sampling
• (Random-walk) Metropolis-Hastings sampling is a more general MCMC-based simulation method.

• We want to simulate $z \sim f(z) = \frac{g(z)}{k}$, where

– g(·) is the unnormalized distribution, and
– k is the (potentially unknown) normalizing constant.

That is, f(z) ∝ g(z).

• Define the candidate-generating function $q(z^{(r-1)}, z^*)$; in the case of a Gaussian random-walk Metropolis-Hastings it is
specified as follows:
$$z^* = z^{(r-1)} + \epsilon \qquad \text{with} \qquad \epsilon \sim N(0, \Omega).$$

• The probability of accepting the candidate z∗ is
$$\alpha\left(z^{(r-1)}, z^*\right) = \min\left\{\frac{f(z^*)\,q(z^*, z^{(r-1)})}{f(z^{(r-1)})\,q(z^{(r-1)}, z^*)}, 1\right\} = \min\left\{\frac{g(z^*)}{g(z^{(r-1)})}, 1\right\}.$$
The second equality holds because, for a random walk, the probability of proposing state z∗ departing from state z⁽ʳ⁻¹⁾ equals the probability of proposing state
z⁽ʳ⁻¹⁾ departing from state z∗ (i.e., $q(z^*, z^{(r-1)}) = q(z^{(r-1)}, z^*)$: a symmetric proposal, so the q terms cancel).

• The sampler is given by the following recursion:

1. Initialize: propose a starting point $z^{(0)}$.
2. Simulate z∗ from $q(z^{(r-1)}, z^*)$ and calculate its acceptance probability $\alpha(z^{(r-1)}, z^*)$.

3. Simulate u from U(0, 1); if $u < \alpha(z^{(r-1)}, z^*)$, set $z^{(r)} = z^*$; otherwise set $z^{(r)} = z^{(r-1)}$.
4. Repeat steps 2 and 3 until r = R + B and drop the first B simulations.
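A generic sketch of this recursion in NumPy. It takes any unnormalized log density log g; working in logs avoids numerical underflow when g is tiny. Names are illustrative:

```python
import numpy as np

def rw_metropolis_hastings(log_g, z0, Omega, R=10_000, B=1_000, rng=None):
    """Random-walk MH: candidates z* = z + e, e ~ N(0, Omega), accepted
    with probability min{g(z*)/g(z), 1}; returns R draws after burn-in B."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(Omega)                  # proposal scale
    z = np.asarray(z0, dtype=float)
    lg = log_g(z)
    draws = np.empty((R + B, z.size))
    for r in range(R + B):
        z_star = z + L @ rng.standard_normal(z.size)   # symmetric proposal
        lg_star = log_g(z_star)
        if np.log(rng.uniform()) < lg_star - lg:       # accept/reject in logs
            z, lg = z_star, lg_star
        draws[r] = z
    return draws[B:]                               # drop the first B simulations
```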
Under the independent prior
• The density to simulate is $f(\beta, \sigma^2) \equiv p(\beta, \sigma^2|y)$, which is proportional to
$$g(\beta,\sigma^2) \equiv \frac{1}{(\sigma^2)^{n/2}}\exp\left\{-\frac{1}{2\sigma^2}\left[(n-k)s^2 + (\beta-b)'X'X(\beta-b)\right]\right\}p(\beta)\,p(\sigma^2). \qquad (6.1)$$

• The sampler:

1. Initialize: set $\beta^{(0)}, \sigma^{2(0)}$ and select Ω. It is customary to set $(\beta^{(0)}, \sigma^{2(0)}) = \arg\max \log g(\beta, \sigma^2)$ and
$$\Omega = -c\begin{bmatrix}\frac{\partial^2 \log g(\beta,\sigma^2)}{\partial\beta\,\partial\beta'} & \frac{\partial^2 \log g(\beta,\sigma^2)}{\partial\beta\,\partial\sigma^2}\\ \frac{\partial^2 \log g(\beta,\sigma^2)}{\partial\sigma^2\,\partial\beta'} & \frac{\partial^2 \log g(\beta,\sigma^2)}{\partial(\sigma^2)^2}\end{bmatrix}^{-1}_{\left(\beta^{(0)},\,\sigma^{2(0)}\right)}$$
for some c > 0 (the scaled negative inverse Hessian of log g at the mode).
2. Simulate $(\beta^*, \sigma^{2*})$ from
$$\beta^* = \beta^{(r-1)} + \epsilon_\beta \qquad (6.2)$$
$$\sigma^{2*} = \sigma^{2(r-1)} + \epsilon_{\sigma^2} \qquad (6.3)$$
where $[\epsilon_\beta' \;\; \epsilon_{\sigma^2}]' \sim N(0, \Omega)$.
3. Calculate the candidate's log acceptance probability
$$\log\alpha = \min\left[\log g(\beta^*, \sigma^{2*}) - \log g(\beta^{(r-1)}, \sigma^{2(r-1)}),\; 0\right]. \qquad (6.4)$$
4. Simulate u from U(0, 1); if $\log u < \log\alpha$, set $(\beta^{(r)}, \sigma^{2(r)}) = (\beta^*, \sigma^{2*})$; otherwise set $(\beta^{(r)}, \sigma^{2(r)}) = (\beta^{(r-1)}, \sigma^{2(r-1)})$.
5. Repeat steps 2 to 4 until r = R + B and drop the first B simulations.

Here
$$\log g(\beta,\sigma^2) \equiv -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\left[(n-k)s^2 + (\beta-b)'X'X(\beta-b)\right] + \log p(\beta) + \log p(\sigma^2). \qquad (6.5)$$
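Putting (6.5) together with the generic rw_metropolis_hastings sketch above: the factory below builds log g from the data and any pair of log priors. This is a sketch; the prior arguments and the choice to stack z = (β, σ²) into one vector are illustrative, not from the notes:

```python
import numpy as np

def make_log_g(y, X, log_p_beta, log_p_sig2):
    """log g(beta, sigma^2) from (6.5), with z stacking (beta, sigma^2).
    Uses (n-k)s^2 + (beta-b)'X'X(beta-b) = (y - X beta)'(y - X beta)."""
    n = len(y)
    def log_g(z):
        beta, sig2 = z[:-1], z[-1]
        if sig2 <= 0.0:
            return -np.inf                 # candidates outside the support
        resid = y - X @ beta
        return (-0.5 * n * np.log(sig2) - 0.5 * (resid @ resid) / sig2
                + log_p_beta(beta) + log_p_sig2(sig2))
    return log_g

# Hypothetical usage with flat priors, given data y, X and a starting
# point (beta0, sig2_0) with proposal covariance Omega from step 1:
#   log_g = make_log_g(y, X, lambda b: 0.0, lambda s2: 0.0)
#   draws = rw_metropolis_hastings(log_g, np.append(beta0, sig2_0), Omega)
```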

