A new design of a nuclear reactor is suggested for which it is claimed that a nuclear meltdown is impossible. In order to verify this, a numerical code h(θ) has been developed which (perfectly) simulates the reactor under operating conditions θ, where it is known that θ ∼ p(θ). The numerical code returns for each θ either 1 or 0 according to

\[
h(\theta) = \begin{cases} 1 & \text{if nuclear meltdown occurs} \\ 0 & \text{if no nuclear meltdown occurs} \end{cases} \tag{1.1}
\]

Using this code a Monte Carlo simulation with N = 10⁸ samples has been run in order to estimate the probability q of a nuclear meltdown. This Monte Carlo estimate of the failure probability q was exactly zero.
Two hypotheses, a priori equally probable, are considered:
• H1: nuclear meltdown is impossible, i.e. q = 0
• H2: nuclear meltdown is possible but rare, i.e. q = 10⁻¹⁰
a) Write down the Monte Carlo estimator used to obtain the numerical estimate.
The Monte Carlo estimator is given by

\[
\hat{I} = \frac{1}{N} \sum_{i=1}^{N} h\!\left(\theta^{(i)}\right), \qquad \theta^{(i)} \sim p(\theta)
\]

with θ^(i) i.i.d. samples from p(θ) and N = 10⁸.
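A minimal MATLAB sketch of this estimator (the helper sample_theta() drawing from p(θ) is a hypothetical placeholder; h is the simulator handle from the problem statement):

N = 10^8;
I_hat = 0;
for i = 1:N
    theta = sample_theta();     % hypothetical: draws theta^(i) ~ p(theta)
    I_hat = I_hat + h(theta);   % h returns 1 (meltdown) or 0 (no meltdown)
end
I_hat = I_hat / N;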
b) Assess the plausibility of H1 compared to H2 within the Bayesian framework given the results obtained during the Monte Carlo study. Can you give preference to either of the hypotheses?
Due to a priori equal probabilities of the hypotheses H1 and H2 it follows

\[
\frac{p(H_1 \mid D)}{p(H_2 \mid D)} = \frac{p(D \mid H_1)}{p(D \mid H_2)}
\]

Note that for a Monte Carlo estimator to yield exactly zero probability, all N = 10⁸ numerical simulations must have predicted no occurrence of a nuclear meltdown; as such the data is given by D = {0, 0, ..., 0} with |D| = 10⁸. We therefore obtain the Bayes factor

\[
\frac{p(H_1 \mid D)}{p(H_2 \mid D)} = \frac{p(D \mid H_1)}{p(D \mid H_2)} = \frac{1}{(1-q)^{10^8}} \approx 1.0101
\]

I.e. one can conclude that the simulations run for the Monte Carlo analysis do not offer any evidence for a fail-safe design vs. a small failure probability of q = 10⁻¹⁰.
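A quick numerical sanity check of the Bayes factor (a sketch; note that (1 − q)^N ≈ exp(−qN), so the factor is ≈ exp(0.01)):

q = 1e-10; N = 1e8;
BF = 1 / (1 - q)^N    % p(D|H1)/p(D|H2) ≈ exp(q*N) ≈ 1.0101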
Problem 2 SIR Model (20 credits)
The SIR model is a simple epidemiological model for the spread of an infectious disease within a population of N
people. It is given by a set of ordinary differential equations

\[
\frac{dS}{dt} = -\beta \frac{I \cdot S}{N} \tag{2.1}
\]
\[
\frac{dI}{dt} = \beta \frac{I \cdot S}{N} - \gamma I \tag{2.2}
\]
\[
\frac{dR}{dt} = \gamma I \tag{2.3}
\]
where
• β > 0, γ > 0 are model parameters,
• Note that S + I + R = N (constant) for all t (i.e. d(S + I + R)/dt = 0).
The variables S, I, R depend on time and their values at t = 0 are known. A dataset D_K has been collected which contains the number of infected people I₁, ..., I_K at time instances t₁, ..., t_K, i.e. D_K = {(t_k, I_k)}_{k=1}^{K}.
a) Complete the function PandemicRisk which takes as inputs the function handles prior and likelihood and returns an estimate of the probability q = Pr(β/γ > 1). (The ratio R₀ = β/γ is known as the basic reproduction number; when R₀ > 1 the number of infected people grows at an exponential rate.)
Note: Both the likelihood and prior implementations return probability zero if either β ≤ 0 or γ ≤ 0. Any implementation that runs in finite time is acceptable – efficiency is no consideration.
1 function q = PandemicRisk(prior, likelihood)
2
3
4
5 % the first part of the problem consists of writing a standard MCMC
6 % sampler as introduced in the lecture and / or exercise
7
8 N = 10^8; stepsize = 1;
9 beta = 1; gamma = 1;
10 p = prior(beta,gamma) * likelihood(beta,gamma);
11
12
13
14 beta_samples = zeros(1,N);
15 gamma_samples = zeros(1,N);
16
17 for n=1:N
18
19     beta_proposed  = beta  + stepsize*randn();   % random-walk proposal
20     gamma_proposed = gamma + stepsize*randn();
21     p_proposed = prior(beta_proposed,gamma_proposed) * likelihood(beta_proposed,gamma_proposed);
22
23     % Metropolis accept-reject (symmetric proposal; one possible completion)
24     if rand() < p_proposed / p
25         beta = beta_proposed;
26         gamma = gamma_proposed;
27         p = p_proposed;
28     end
29
30     beta_samples(n) = beta;
31     gamma_samples(n) = gamma;
32 end
33
34 % fraction of posterior samples with basic reproduction number > 1
35 q = mean(beta_samples ./ gamma_samples > 1);
36
37
38 end
b) Describe or mark within your code of the PandemicRisk function where the numerical expense of solving the ordinary differential equations (i.e. Eqs. (2.1)–(2.3)) occurs.

The numerical burden of solving the ODE occurs in Line 21, where the posterior and thus the likelihood of the proposed β, γ values are evaluated.
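To make this concrete, a hedged sketch of what the likelihood handle might do internally (the Gaussian observation model and the variables t_obs, I_obs, u0, sigma2, N are illustrative assumptions, not given in the problem):

function L = likelihood(beta, gamma)
    if beta <= 0 || gamma <= 0
        L = 0; return;     % invalid parameters have zero probability
    end
    % right-hand side of Eqs. (2.1)-(2.3), with u = [S; I; R]
    rhs = @(t, u) [-beta*u(2)*u(1)/N;
                    beta*u(2)*u(1)/N - gamma*u(2);
                    gamma*u(2)];
    [~, U] = ode45(rhs, t_obs, u0);                  % <-- the expensive ODE solve
    L = prod(normpdf(I_obs, U(:,2), sqrt(sigma2)));  % predicted vs. observed I
end

Every evaluation of the posterior therefore triggers a full ODE solve, which dominates the cost of the MCMC loop.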
Problem 3 Linear latent variable model (15 credits)
A set of random vectors x^(i) ∈ R⁴ is generated as

\[
x^{(i)} = L\, y^{(i)} + \gamma \tag{3.1}
\]

where:
• y^(i) ∈ R⁴ with y^(i) ∼ N(0, Σ) and

\[
\Sigma = \begin{pmatrix} \beta_1^{-1} & 0 & 0 & 0 \\ 0 & \beta_2^{-1} & 0 & 0 \\ 0 & 0 & \beta_3^{-1} & 0 \\ 0 & 0 & 0 & \beta_4^{-1} \end{pmatrix} \quad \text{with } \beta_1 < \beta_2 < \beta_3 < \beta_4 \text{ (given)},
\]

• L ∈ R^{4×4} is a rotation matrix (given),
• γ ∈ R⁴ (given).

Given the data D = {x^(i)}_{i=1}^{N} we use an Expectation-Maximization algorithm to find the maximum likelihood estimate of the parameters W, b, σ² of a linear latent variable model

\[
x = W z + b + \epsilon, \qquad \epsilon \sim N(0, \sigma^2 I) \tag{3.2}
\]

where z ∈ R² carries a spherical prior z ∼ N(0, I).
b) Since L defines a rotation it is clear, based on the ordering of the eigenvalues, that the principal directions, i.e. span(W), will be given by the span of the first two column vectors of L. Given the spherical prior on z and the rotation-invariance in the latent space, no further statements can be made.
c) Maximum likelihood estimate of parameter σ² ∈ R₊

The MLE of σ² depends on the smallest eigenvalues of Σ, since the eigenvalues of the covariance of x are unaffected by L and γ, which correspond to a rotation and a shift. Given the ordering of the βᵢ, σ²_MLE therefore only depends on β₃ and β₄, which define the largest precision values, i.e.

\[
\sigma^2_{MLE} = \frac{1}{4-2}\left(\beta_3^{-1} + \beta_4^{-1}\right)
\]
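As a quick worked example with illustrative values β = (1, 2, 4, 8): the eigenvalues of the covariance of x are (1, 1/2, 1/4, 1/8), and the two discarded (smallest) ones give σ²_MLE = (1/4 + 1/8)/2 = 3/16.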
Problem 1 Auxiliary Variable Markov Chain Monte Carlo (15 credits)
We want to make use of Markov Chain Monte Carlo to sample from a posterior distribution π(x), with x ∈ R^D. To this end we consider an MCMC method where the proposal mechanism makes use of an auxiliary variable u ∼ U[0, 1], as explained in the code below. The function takes as input a function handle posterior (which returns π(x)) and the dimension D = dim(x).

Implement the accept-reject step (lines 20–38) to obtain a valid Metropolis-Hastings algorithm. Do not alter the code outside these lines, and do not alter the way in which proposals are generated.
1 function X = MCMC(posterior, D)
2
3 % The array X needs to contain valid samples from the posterior when returned
4 N = 10^8; X = zeros(N,D); x = ones(D,1);
5 % function handle takes D-dimensional vectors and returns posterior probability
6 p = posterior(x);
7
8 for n=1:N
9
10     % auxiliary variable selecting the mixture component of the proposal
11     u = rand();
12     if u < 0.75
13         y = x + randn(D,1);
14     else
15         y = x + sqrt(10)*randn(D,1);
16     end
17
18     % ============== accept-reject step (one possible solution) ==============
19     % Both mixture components are symmetric random walks, so the proposal
20     % density satisfies q(y|x) = q(x|y) and cancels in the acceptance ratio.
21     p_y = posterior(y);
22
23     if rand() < p_y / p
24         x = y; p = p_y;
25     end
26
37 % ===============================================================================
38     X(n,:) = x;
39 end
Reminder: rand() returns a sample from U (0, 1), randn() returns a sample from N (0, 1).
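Why this acceptance step is valid: the proposal is the mixture q(y|x) = 0.75·N(y | x, I) + 0.25·N(y | x, 10·I), which depends only on y − x and is therefore symmetric, q(y|x) = q(x|y). The Hastings correction thus cancels, and accepting with probability min(1, π(y)/π(x)) yields a valid Metropolis-Hastings chain targeting π(x).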
Problem 2 Buckling Mode (10 credits)
We consider a beam under vertical loading F_V and horizontal loading F_H. Buckling is assumed to occur if the horizontal force F_H exceeds 10% of the vertical force F_V, which we simplify to be given by the event F_H > 0.1 · F_V. It is known that F_H and F_V are jointly Gaussian

\[
\begin{pmatrix} F_H \\ F_V \end{pmatrix} \sim N(\mu, \Sigma)
\]

where the parameters µ and Σ are fully defined by the marginal distributions

\[
F_H \sim N(1, 0.5), \qquad F_V \sim N(10, 2)
\]

and the correlation coefficient ρ ∈ [0, 1] of F_H and F_V.
Derive the probability q for buckling to occur.
Note: You may express your answer w.r.t. the CDF Φ(·) (or its inverse) of the distribution N(0, 1).

The probability q is found to be independent of ρ, since

\[
q = p\left(\frac{F_H}{F_V} > 0.1\right) = p(F_H > 0.1 \cdot F_V) = p(F_H - 0.1 \cdot F_V > 0) = \frac{1}{2}
\]

The last step follows by noting that Z := F_H − 0.1 · F_V is a zero-mean Gaussian, i.e. E[F_H − 0.1 · F_V] = 1 − 0.1 · 10 = 0, and any zero-mean Gaussian is symmetric about zero.
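A small Monte Carlo check of the ρ-independence (a sketch; ρ = 0.5 is an arbitrary illustrative value, and mvnrnd is from the Statistics Toolbox):

rho = 0.5;                          % any value in [0, 1] gives q ≈ 0.5
Sigma = [0.5, rho*sqrt(0.5*2);
         rho*sqrt(0.5*2), 2];       % Cov(F_H, F_V) = rho*sqrt(Var(F_H)*Var(F_V))
F = mvnrnd([1, 10], Sigma, 1e6);    % columns: F_H, F_V
q_hat = mean(F(:,1) > 0.1*F(:,2))   % ≈ 0.5 for any rho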
Problem 3 Statistical Independence (5 credits)
Consider a random vector x ∼ N(0, K) where K is:

\[
K = \begin{pmatrix} 1 & 0.5 & 0 & 0 & 0 \\ 0.5 & 1 & 0.5 & 0 & 0 \\ 0 & 0.5 & 1 & 0.5 & 0 \\ 0 & 0 & 0.5 & 1 & 0.5 \\ 0 & 0 & 0 & 0.5 & 1 \end{pmatrix}
\]

Since x₁ and x₅ are jointly Gaussian with unit variances and zero covariance (K₁₅ = K₅₁ = 0), they are independent and

\[
\begin{pmatrix} x_1 \\ x_5 \end{pmatrix} \sim N(0, I)
\]
Problem 4 Linear Kernel (5 credits)
We wish to use Gaussian Process regression to infer a mapping implied by the dataset {x_n, f_n}_{n=1}^{5}, where f_n = f(x_n). For this purpose we introduce a zero-mean Gaussian Process GP(0, C) with a covariance function defined by

\[
C(x_i, x_j) = x_i^T x_j + \delta_{ij} \sigma^2
\]
Specify in which circumstance you will obtain p(f*|x*, D) = p(f*|x*), i.e. the posterior is equal to the prior.

This is the case when the covariance between the observed data points and f* is zero. We therefore require x*ᵀ x_n = 0 ∀n ∈ {1, 2, 3, 4, 5}, i.e. all x_n in the dataset need to be orthogonal to x* (leading to exclusively zero off-diagonal entries and no coupling).
Problem 5 Linear elastic bar (21 credits)
We consider a linear elastic bar of length l = 1 and cross-sectional area A_cs = 1. Under the applied load, the displacement field is

\[
u(x) = \theta x
\]

where θ is the reciprocal of the elastic modulus E > 0, i.e. θ = 1/E. Suppose θ is unknown and we try to determine it by measuring the displacement at x = 1. Suppose we obtain a noisy measurement u₁ which is assumed to relate to u(x = 1) as follows:

\[
u_1 = u(x = 1) + \epsilon, \qquad \epsilon \sim N(0, \sigma^2)
\]

We assume that a priori θ ≥ 0 follows an exponential distribution, i.e. p(θ) = λ exp(−λθ), where λ > 0 and σ² > 0 are given.

a) Since u(x = 1) = θ, the likelihood follows as

\[
p(u_1 \mid \theta) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(u_1 - \theta)^2\right)
\]
b) Noting that the prior p(θ) only has support for θ ≥ 0,

\[
p(\theta \mid u_1) \propto p(u_1 \mid \theta)\, p(\theta) \propto \exp\left(-\frac{1}{2\sigma^2}(u_1 - \theta)^2\right) \exp(-\lambda\theta) \cdot I_{\theta \ge 0}(\theta)
\]

where

\[
I_{\theta \ge 0}(\theta) = \begin{cases} 1 & \text{if } \theta \ge 0 \\ 0 & \text{else} \end{cases}
\]
c) Determine the MAP estimate of θ (for σ² = 1, λ = 1/2).

\[
\begin{aligned}
p(\theta \mid u_1) &\propto \exp\left(-\frac{1}{2}(u_1 - \theta)^2 - \frac{1}{2}\theta\right) \cdot I_{\theta \ge 0}(\theta) \\
&= \exp\left(-\frac{1}{2}\left(u_1^2 - 2\theta u_1 + \theta^2 + \theta\right)\right) \cdot I_{\theta \ge 0}(\theta) \\
&\propto \exp\left(-\frac{1}{2}\left(\theta^2 - 2\theta\left(u_1 - \frac{1}{2}\right)\right)\right) \cdot I_{\theta \ge 0}(\theta) \\
&\propto \exp\left(-\frac{1}{2}\left(\theta - \left(u_1 - \frac{1}{2}\right)\right)^2\right) \cdot I_{\theta \ge 0}(\theta)
\end{aligned}
\]

Therefore θ_MAP = max(u₁ − 0.5, 0), i.e.

\[
\theta_{MAP} = \begin{cases} u_1 - 0.5 & \text{if } u_1 > 0.5 \\ 0 & \text{else} \end{cases}
\]
d) Determine the Laplace approximation of the posterior if u₁ = 1.5 (and σ² = 1, λ = 1/2).

From the result of the previous part it follows trivially that the Laplace approximation (for u₁ > 0.5) is given by N(θ | u₁ − 0.5, 1), i.e. N(θ | 1, 1). Alternatively one may obtain the result by taking the second derivative of the log-posterior at the mode.
Problem 6 Monte Carlo estimator (24 credits)
We wish to estimate the value of the integral

\[
I = \int_{-\infty}^{+\infty} x^3 e^{-2x^2}\, dx
\]

using a Monte Carlo estimator of the form

\[
\hat{I} = \frac{1}{N} \sum_{i=1}^{N} h(x^i), \qquad x^i \sim p(x^i)
\]

where the x^i are independent and identically distributed random numbers drawn from a PDF p(x^i).
a) Specify a valid choice for the form of h(·) and the distribution p(x^i).

With σ = 1/2 (so that 1/(2σ²) = 2):

\[
I = \int_{-\infty}^{\infty} x^3 \exp(-2x^2)\, dx = \int_{-\infty}^{\infty} x^3 \sqrt{2\pi\sigma^2}\, \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{x^2}{2\sigma^2}\right) dx = \int_{-\infty}^{\infty} x^3 \sqrt{2\pi\sigma^2}\, N(x \mid 0, 0.25)\, dx = E_{N(0,0.25)}\left[x^3 \sqrt{2\pi\sigma^2}\right]
\]

this implies

\[
h(x) = x^3 \sqrt{0.5\pi}, \qquad p(x^i) = N(0, 0.25)
\]
b) Explain why, for the choices above, the resulting Monte Carlo estimator will converge to I.

Explanation should contain:
• unbiasedness: E_{p(x^i)}[h(x^i)] = I, and
• Var[Î] = Var[h(x^i)]/N → 0 as N → ∞ (the variance of h(x^i) under N(0, 0.25) is finite), so Î converges to I by the law of large numbers.
c) Implement the function MonteCarlo which returns the estimate Î (using your previous choice of h(·), p(x^i)).

function I_hat = MonteCarlo()

    % number of Monte Carlo samples
    N = 10^8;

    I_hat = 0;

    for n = 1:N
        % sample x_i ~ N(0, 0.25), i.e. standard deviation 0.5
        x_i = 0.5*randn();
        % weight x_i^3 by the normalization constant of N(0, 0.25)
        I_hat = I_hat + (x_i^3)*sqrt(0.5*pi);
    end

    I_hat = I_hat / N;

end
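Equivalently, the loop can be vectorized (a sketch, with a smaller N to keep the sample array in memory):

x = 0.5*randn(1e6, 1);               % samples from N(0, 0.25)
I_hat = mean(sqrt(0.5*pi) * x.^3)    % fluctuates around the exact value

Since the integrand x³e^{−2x²} is odd, the exact value is I = 0, which the estimate will fluctuate around.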
Reminder: rand() returns a sample from U (0, 1), randn() returns a sample from N (0, 1).
d) Consider an alternative estimator (for I) of the form

\[
\tilde{I} = \frac{1}{N} \sum_{i=1}^{N} \left[ h(x^i) + a\left(x^i - \mu\right) \right]
\]

where µ = E_{p(x^i)}[x^i] and the x^i are again i.i.d. samples from p(x^i). For which value of a will the estimator above converge the fastest? Provide a numerical value.
• You may use: the first four moments of a zero-mean Gaussian Y ∼ N(0, σ²) are given by

\[
E[Y] = 0, \qquad E[Y^2] = \sigma^2, \qquad E[Y^3] = 0, \qquad E[Y^4] = 3\sigma^4
\]

• You can reuse known results from the lecture or exercise to expedite the solution.
This is a Monte Carlo estimator making use of a control variate. As discussed e.g. in problem sheet 6, the optimal coefficient is a* = −Cov[h(X), X]/Var[X]. With Cov[h(X), X] = √(2πσ²) · E[X⁴] = √(2πσ²) · 3σ⁴ and Var[X] = σ²,

\[
a^* = -\frac{\sqrt{2\pi\sigma^2} \cdot 3\sigma^4}{\sigma^2} = -\sqrt{2\pi} \cdot 3\sigma^3 \approx -0.94
\]
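An empirical check of the variance reduction (a sketch):

sigma = 0.5; N = 1e6;
x = sigma*randn(N, 1);
h = sqrt(0.5*pi) * x.^3;
a = -sqrt(2*pi) * 3 * sigma^3;   % optimal coefficient, approx. -0.94
var(h)                           % variance without the control variate
var(h + a*x)                     % reduced variance

Using E[X⁶] = 15σ⁶, the variance ratio at the optimum is 1 − 9/15 = 0.4, i.e. the control variate removes 60% of the variance.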
Problem 1 Monte Carlo (12 credits)
A rod in a truss system experiences significant loading on any given day with probability q = 0.1 due to wind. If the truss is under significant loading due to wind, the maximum stress it experiences on this day can be modeled as a Gaussian (in MPa)

\[
X \sim N(\mu = 40, \sigma^2 = 9)
\]

Complete the Matlab function to return a numerical estimate of the expected number of days within a year (365 days) for which the stress exceeds 60 MPa.
function D = MonteCarlo()
    % D: expected number of days for which stress exceeds 60MPa (in a year)

    N = 1e6;
    mu = 40;
    sigma = 3;

    I = 0;

    for n=1:N
        days_threshold_exceeded = 0;
        % number of significantly loaded days in one simulated year
        for d=1:binornd(365,0.1)
            S = mu + sigma*randn();
            if S > 60
                days_threshold_exceeded = days_threshold_exceeded + 1;
            end
        end
        % accumulate over the simulated years
        I = I + days_threshold_exceeded;
    end

    D = I / N;

end
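For reference, the expected value also has a closed form: E[D] = 365 · 0.1 · P(X > 60) = 36.5 · (1 − Φ(20/3)), which is of order 10⁻¹⁰. A Monte Carlo run with N = 10⁶ simulated years will therefore almost surely return D = 0, so a much larger N (or the analytic evaluation) would be needed for a nonzero estimate.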
Reminder: rand() returns a sample from U (0, 1), randn() returns a sample from N (0, 1), binornd(n,p) returns a
sample from Binomial(n, p).
Problem 2 Model Selection (14 credits)
We consider a bar of length ℓ = 1 and cross-sectional area A_cs = 1 subjected to a force F. The following two models, which predict the displacement of the bar along the x-direction, are proposed:

\[
M_1: \; u(x) = \alpha\sqrt{x} \qquad\qquad M_2: \; u(x) = \beta x
\]
\[
p(\alpha) = N(1, 0.01) \qquad\qquad p(\beta) = N(1, 0.01)
\]

A noisy measurement û of the displacement at x = 0.64 is obtained:

\[
\hat{u} = u(x = 0.64) + \epsilon, \qquad \epsilon \sim N(0, 0.01)
\]
Find the evidence ratio p(M₁|û)/p(M₂|û) if a priori both models are considered equally plausible and û = 0.70. Provide a numerical value of the evidence ratio for û = 0.70.
Hint: there is a way to do this without solving an integral.
With p(M₁) = p(M₂) = 0.5,

\[
R = \frac{p(M_1 \mid \hat{u})}{p(M_2 \mid \hat{u})} = \frac{p(\hat{u} \mid M_1)\, p(M_1)}{p(\hat{u} \mid M_2)\, p(M_2)} = \frac{p(\hat{u} \mid M_1)}{p(\hat{u} \mid M_2)}
\]

The remaining terms involve a (tractable) integral, e.g. p(û|M₁) = ∫ p(û|α) p(α) dα. Note however that we can sidestep this integration due to Gaussians being closed under linear transformations and addition. With x = 0.64 we obtain for

Model 1: û = α√x + ε

\[
E[\hat{u}] = E[\alpha\sqrt{x} + \epsilon] = \sqrt{0.64} \cdot E[\alpha] + E[\epsilon] = 0.8
\]
\[
\mathrm{Var}[\hat{u}] = \mathrm{Var}[\alpha\sqrt{x} + \epsilon] = x \cdot \mathrm{Var}[\alpha] + \mathrm{Var}[\epsilon] = 0.64 \cdot 0.01 + 0.01 = 0.0164
\]

Model 2: û = βx + ε

\[
E[\hat{u}] = x \cdot E[\beta] = 0.64, \qquad \mathrm{Var}[\hat{u}] = x^2 \cdot \mathrm{Var}[\beta] + \mathrm{Var}[\epsilon] = 0.64^2 \cdot 0.01 + 0.01 = 0.014096
\]

From which follows p(û|M₁) = N(0.80, 0.0164) and p(û|M₂) = N(0.64, 0.014096). For û = 0.70 this yields

\[
R \approx \frac{2.2966}{2.9574} \approx 0.7766
\]
Problem 3 Probability Density Function (20 credits)
We consider the following PDF

\[
f_X(x) = \begin{cases} 0 & \text{for } x < a, \\ \frac{2(x-a)}{(b-a)(c-a)} & \text{for } a \le x \le c, \\ \frac{2(b-x)}{(b-a)(b-c)} & \text{for } c < x \le b, \\ 0 & \text{for } x > b \end{cases}
\]

with parameters a = 0, b = 2, c = 1.
For the given parameters, f_X(x) defines a simple triangular distribution, symmetric about 1, so the mean is E[X] = 1:

\[
f_X(x) = \begin{cases} 0 & \text{for } x < 0, \\ x & \text{for } 0 \le x \le 1, \\ 2 - x & \text{for } 1 < x \le 2, \\ 0 & \text{for } x > 2 \end{cases}
\]

Making use of Var[X] = E[X²] − E²[X]:

\[
E[X^2] = \int_{-\infty}^{+\infty} x^2 f_X(x)\, dx = \int_0^1 x^3\, dx + \int_1^2 x^2(2 - x)\, dx = \left[\frac{1}{4}x^4\right]_0^1 + \left[\frac{2}{3}x^3 - \frac{1}{4}x^4\right]_1^2 = \frac{1}{4} + \frac{16}{3} - 4 - \frac{2}{3} + \frac{1}{4} = \frac{7}{6}
\]

so that Var[X] = 7/6 − 1 = 1/6.
b) Complete the Matlab function sample() to generate and return a single sample from the conditional distribution f_{X|X≤1}(x | x ≤ 1). The only random number generator you are allowed to use is rand(), which returns a sample from the uniform distribution U(0, 1). Provide a short derivation / justification for your solution.
function x = sample()

    u = rand();

    % solution: inverse-CDF (inverse transform) sampling
    x = sqrt(u);

end
With the PDF symmetric around 1, it follows that p(X ≤ 1) = 0.5. Therefore the conditional PDF follows immediately as

\[
f_{X|X \le 1}(x) = \begin{cases} 2x & 0 \le x \le 1 \\ 0 & \text{else} \end{cases}
\]

With F_{X|X≤1}(x) = x² (for 0 ≤ x ≤ 1), inverse transform sampling applies: u ∼ U(0, 1) mapped via x = F⁻¹(u) = √u has CDF P(√u ≤ x) = P(u ≤ x²) = x² and hence the desired density d/dx x² = 2x (again, for 0 ≤ x ≤ 1).
Problem 4 Elastic Rod (10 credits)
A rod made of a material with yield strength Y modeled with a Gaussian

\[
Y \sim N(100, 5)
\]

is subjected to a stress

\[
X \sim N(70, 10)
\]

We are interested in the probability that X exceeds Y, i.e.

\[
p(X > Y) = p(X - Y > 0) = 1 - p(X - Y \le 0)
\]

Let D := X − Y; assuming X and Y independent, D ∼ N(70 − 100, 10 + 5) = N(−30, 15). Then with Z ∼ N(0, 1),

\[
p(X > Y) = 1 - p(D \le 0) = 1 - p\left(Z \le \frac{30}{\sqrt{15}}\right) = 1 - \Phi\left(\frac{30}{\sqrt{15}}\right) \approx 1 - \Phi(7.746)
\]
Problem 5 Bayesian Inference (20 credits)
The measured displacements Y₁, Y₂ of a material at two different locations are assumed to relate to an externally applied force X as follows:

\[
Y_1 = 2X + W_1, \qquad Y_2 = X + W_2
\]

with W₁ ∼ N(0, 2) and W₂ ∼ N(0, 4) assumed independent, and the a-priori belief X ∼ N(0, 1). The measurements obtained are Y₁ = 5 and Y₂ = 4.
a) Derive the maximum likelihood estimate of X given Y₁, Y₂.

The likelihood factorizes:

\[
L = p(y_1, y_2 \mid x) = N(y_1 \mid 2x, 2) \cdot N(y_2 \mid x, 4)
\]

Working with the log-likelihood

\[
\log L = -\frac{1}{2 \cdot 2}(y_1 - 2x)^2 - \frac{1}{2 \cdot 4}(y_2 - x)^2 + \text{const.}
\]

we take the derivative and set it to zero:

\[
\frac{d}{dx} \log L = y_1 - 2x + \frac{1}{4}(y_2 - x) = 6 - \frac{9}{4}x \overset{!}{=} 0
\]

so that x_MLE = 24/9 = 8/3 ≈ 2.67.
b) Complete the MCMC function below such that it returns N = 5000 samples from the a-posteriori distribution of X using a Random Walk Metropolis-Hastings algorithm.
function X = MCMC()

    % init
    N = 5000;
    X = zeros(1,N);
    y1 = 5; y2 = 4;
    x = 0;

    % unnormalized posterior (normpdf takes the standard deviation)
    posterior = @(x) normpdf(y1, 2*x, sqrt(2)) * normpdf(y2, x, 2) * normpdf(x, 0, 1);

    for n=1:N

        % propose
        x_proposed = x + randn();

        % accept-reject (symmetric random-walk proposal)
        if rand() < posterior(x_proposed) / posterior(x)
            x = x_proposed;
        end

        X(n) = x;

    end

end
Alternatively, p(y₁, y₂|x) can of course also be implemented using the joint distribution. This is identical to the exercise; merely the posterior needs to be adapted to the specific case.
Reminder: rand() returns a sample from U (0, 1), randn() returns a sample from N (0, 1)
Problem 6 Gaussian Process Regression (14 credits)
We wish to solve a regression problem using the Gaussian Process prior f ∼ GP(0, C), where we assume a linear kernel

\[
C(x_i, x_j) = x_i x_j
\]

The observed values {y_n}_{n=1}^{N} are corrupted by additive Gaussian noise

\[
y_n = f(x_n) + \epsilon_n, \qquad \epsilon_n \sim N(0, 1)
\]

The available data are (x₁, y₁) = (0, 1) and (x₂, y₂) = (1, 1).
What can you say about the value of the function f at x₃ = 2 based on this data? Provide numerical values for the parameters of the PDF.

Let f₃ = f(x₃). The vector z = [y₁, y₂, f₃] is jointly Gaussian with zero mean and covariance matrix following from C_ij = x_i x_j and, for y₁, y₂, the additive Gaussian noise (+δ_ij):

\[
C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 2 \\ 0 & 2 & 4 \end{pmatrix}
\]

which implies that we can disregard the observed value y₁ due to independence. We can instead consider

\[
\begin{pmatrix} y_2 \\ f_3 \end{pmatrix} \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}\right)
\]

From the lecture / exercise it is known that the conditional distribution follows as

\[
p(f_3 \mid D) = N(\mu, \sigma^2)
\]

with

\[
\mu = \frac{2}{2}\, y_2 = 1, \qquad \sigma^2 = 4 - \frac{2 \cdot 2}{2} = 2
\]

i.e. p(f₃|D) = N(1, 2).
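The conditioning step can be checked numerically (a sketch of the standard Gaussian conditioning formulas µ = Σ₁₂Σ₂₂⁻¹y₂ and σ² = Σ₁₁ − Σ₁₂Σ₂₂⁻¹Σ₂₁):

Cz = [2 2; 2 4];                       % covariance of [y2; f3]
y2 = 1;
mu   = Cz(2,1) / Cz(1,1) * y2          % = 1
sig2 = Cz(2,2) - Cz(2,1)^2 / Cz(1,1)   % = 2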