
Sampling methods

Søren Højsgaard
Department of Mathematical Sciences

Aalborg University, Denmark

June 3, 2014

Contents

1 Introduction – Bayesian modelling

2 Computations using Monte Carlo methods
  2.1 Rejection sampling
  2.2 Example: Rejection sampling
  2.3 Quiz
  2.4 Sampling importance resampling (SIR)*
  2.5 Example: Sampling importance resampling (SIR)*

3 Markov Chain Monte Carlo methods
  3.1 The Metropolis–Hastings (MH) algorithm
  3.2 Special cases of the Metropolis–Hastings algorithm
  3.3 Example: Metropolis–Hastings algorithm
  3.4 Capture–recapture revisited
  3.5 Quiz
  3.6 Single component Metropolis–Hastings
  3.7 The Gibbs sampler
  3.8 Slice sampling
  3.9 Towards omnibus software
  3.10 Sampling from a (truncated) Poisson
  3.11 Sampling on the unit interval
  3.12 Capture–recapture revisited
  3.13 Work on the log–scale

1 Introduction – Bayesian modelling
• In a Bayesian setting, parameters are treated as random quantities on equal footing with the random variables.

• The joint distribution of a parameter (vector) θ and data (vector) y is specified through a prior distribution π(θ) for θ and a conditional distribution p(y | θ) of data for a fixed value of θ.

• This leads to the joint distribution for data AND parameters

   p(y, θ) = p(y | θ) π(θ)

• The prior distribution π(θ) represents our knowledge (or uncertainty) about θ before data have been observed.

• After observing data y, the posterior distribution π*(θ) of θ is obtained by conditioning on the data, which gives

   π*(θ) = p(θ | y) = p(y | θ) π(θ) / p(y) ∝ L(θ) π(θ)

  where L(θ) = p(y | θ) is the likelihood and the marginal density p(y) = ∫ p(y | θ) π(θ) dθ is the normalizing constant.
• Often we are interested in the posterior mean of some function g(θ):

   E(g(θ) | π*) = ∫ g(θ) π*(θ) dθ

  Examples: E(θ | π*) or Var(θ | π*).

• However, usually π*(θ) cannot be found analytically because the normalizing constant p(y) = ∫ p(y | θ) π(θ) dθ is intractable.

• In such cases one will often resort to sampling-based methods: if we can draw samples θ^{(1)}, ..., θ^{(N)} from π*(θ) we can do just as well, since

   E(g(θ) | π*) ≈ (1/N) Σ_i g(θ^{(i)})

  (a small numerical sketch of this idea is given after this list).

• The question is then how to draw samples from π*(θ) when π*(θ) is only known up to the normalizing constant.

• There are many methods for achieving this; the most important are the Markov Chain Monte Carlo (MCMC) methods described in Section 3 below.

• Sections marked with “*” in the following can be skipped at first reading.
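
To fix ideas, here is a minimal sketch (not from the original notes) of the Monte Carlo principle: when the posterior can be sampled directly, posterior expectations become sample averages. The conjugate Beta/binomial posterior below is a hypothetical example chosen only because rbeta() lets us sample from it without any of the machinery developed later.

> ## Hypothetical example: s successes and f failures with a Beta(1,1) prior
> ## give a Beta(1+s, 1+f) posterior, which can be sampled directly.
> s <- 7; f <- 3
> theta <- rbeta(10000, 1 + s, 1 + f)   ## samples from the posterior
> mean(theta)                           ## Monte Carlo estimate of E(theta | y)
> var(theta)                            ## Monte Carlo estimate of Var(theta | y)
> mean(theta > 0.5)                     ## Monte Carlo estimate of P(theta > 0.5 | y)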

2 Computations using Monte Carlo methods
Consider a random vector X with density / probability mass function p(x); this is the target distribution (the distribution from which we want to sample).

In many real-world applications

• we cannot directly draw samples from p;

• p is only known up to a constant of proportionality, that is,

   p(x) = k(x)/c

  where k() is known and the normalizing constant c is unknown.

We reserve h(x) for a proposal distribution, which is a distribution from which we can draw samples.

2.1 Rejection sampling


Let p(x) = k(x)/c be a density where k() is known and c is unknown. Let h(x) be a proposal distribution.

Suppose we can find a constant M such that k(x) < M h(x) (i.e. k(x)/M < h(x) and hence k(x)/(M h(x)) < 1) for all x. The algorithm is then:

1. Draw a sample x ∼ h(·). Draw u ∼ U(0, 1).

2. Set α = k(x) / (M h(x)).

3. If u < α, accept x.

The accepted values x_1, ..., x_N form a random sample from p(·).


Notice:

• It is tricky to choose a good proposal distribution h(). It should have support at least as large as that of p() and preferably heavier tails than p().

• It is desirable to choose M as small as possible, but finding a small valid M is difficult in practice. A conservative (large) choice of M means that only few proposed values are accepted, so it can be difficult to make rejection sampling efficient.

2.2 Example: Rejection sampling
> k <- function(x, a=.4, b=.08){exp(a*(x-a)^2 - b*x^4)}
> x <- seq(-4, 4, 0.1)
> plot(x,k(x),type="l")
[Plot: the unnormalized target density k(x) for x in [−4, 4]]

> # uniform proposal on [-4,4]:


> h <- function(x){rep.int(0.125,length(x))}
> # we can find M in this case:
> M <- round(max(k(x)/h(x))) + 1; M
[1] 25
> # number of samples
> N <- 1000
> # generate proposals and u
> x.h <- runif( N, -4, 4 )
> u <- runif( N )
> acc <- u < k(x.h) / (M * h(x.h))
> x.acc <- x.h[ acc ]
> # how many proposals are accepted
> sum( acc ) /N
[1] 0.335
> # calculate some statistics
> c(m=mean(x.acc), s=sd(x.acc))
m s
-0.60006 1.42031
> par(mfrow=c(1,2), mar=c(2,2,1,1))
> plot(x,k(x),type="l")
> barplot(table(round(x.acc,1))/length(x.acc))

[Plots: k(x) (left) and relative frequencies of the accepted samples x.acc (right)]

2.3 Quiz
Reuse the code from above to answer these questions, but think about what the results would be before executing the code.

• Suppose we could not easily determine M and hence had to make a conservative choice, say M = 100 or M = 500 in this context. Which effect would that have on the number of accepted samples, and how would you have to compensate?

• Suppose we take the proposal distribution h() to be uniform on [−10, 10]. Which effect would that have on the acceptance rate? What if the proposal distribution is an N(0, 1)? What is the quality of the samples in this case? Hint: Use dnorm() to evaluate the normal density.

2.4 Sampling importance resampling (SIR)*


When M is not readily available, we may generate approximate samples from p as follows.

1. Draw samples x_1, ..., x_N ∼ h(x).

2. Calculate importance weights w_i = p(x_i)/h(x_i).

3. Normalize the weights: q_i = w_i / Σ_j w_j.

4. Resample (with replacement) from {x_1, ..., x_N}, where x_i is drawn with probability q_i.

The samples obtained in step 4 are approximately samples from p.


Notice:

• This scheme also works if p is only known up to proportionality (because the normalizing constant cancels out in step 3 above).

• The samples from h which “fit p best” are those most likely to appear in the resample. However, if h is a poor approximation to p, then the “best samples from h” are not necessarily good samples in the sense of resembling p.

2.5 Example: Sampling importance resampling (SIR)*


> h <- function(x){rep.int(0.125,length(x))}
> N <- 1000
> x.h <- runif( N, -4, 4 )
> u <- runif( N )
> ww <- k(x.h) / h(x.h)
> qq <- ww / sum(ww)
> x.acc <-sample(x.h, prob=qq, replace=T)
> par(mfrow=c(2,2), mar=c(2,2,1,1))
> plot(x,k(x),type="l")
> barplot(table(round(x.acc,1))/length(x.acc))
[Plots: k(x) (left) and relative frequencies of the resampled values x.acc (right)]

3 Markov Chain Monte Carlo methods


• A drawback of the rejection algorithm and the SIR algorithm is that it is difficult to suggest a proposal distribution h which leads to an efficient algorithm.

• For the rejection algorithm, it is also difficult to find M.

• A way around this problem is to let the proposed values depend on the last accepted value: if x′ is a “likely” value from p, then so, probably, is a proposed value x which is “close” to x′.

• Hence the proposal distribution will now be conditional on the last accepted value and have the form h(x | x′).

• This leads to schemes (described below) for drawing samples x_1, ..., x_N, and these samples will, under certain conditions, form an ergodic Markov chain with p(x) as its stationary distribution.

• Hence the expected value of any function g of x can be calculated approximately as

   ∫ g(x) p(x) dx ≈ (1/N) Σ_i g(x_i).

• The samples x_1, ..., x_N will typically be correlated, because the value x_j is generated from h(· | x_{j−1}) and hence depends on x_{j−1}.

3.1 The Metropolis–Hastings (MH) algorithm


Given an accepted value x_{t−1}:

1. Draw x ∼ h(· | x_{t−1}). Draw u ∼ U(0, 1).

2. Calculate the acceptance probability

   α = min(1, [p(x) h(x_{t−1} | x)] / [p(x_{t−1}) h(x | x_{t−1})])

3. If u < α then set x_t = x; else set x_t = x_{t−1}.

After a burn-in period the samples x_1, x_2, ... will be samples from p(·).


Notice:

• The samples x_1, x_2, ... will be correlated.

• The algorithm also works if p is only known up to proportionality (because the normalizing constant cancels when calculating the acceptance probability).

• We must be able both to sample from h(· | ·) and to evaluate its density.

3.2 Special cases of the Metropolis–Hastings algorithm


Metropolis algorithm (a special case of the Metropolis–Hastings algorithm) The proposal distribution is symmetric, i.e. h(x | x′) = h(x′ | x) for all pairs (x, x′). Hence the acceptance probability is

   α = min(1, p(x) / p(x_{t−1})).

Random–walk Metropolis A popular choice of proposal in a Metropolis algorithm is h(x | x′) = g(x − x′) where g is symmetric, i.e.

   x = x′ + e,   e ∼ g.

Example: x = x′ + N(0, σ²).

The independence sampler (a special case of the Metropolis–Hastings algorithm) The proposal h(x | x′) = h(x) does not depend on x′. The acceptance probability becomes

   α = min(1, [p(x) h(x_{t−1})] / [p(x_{t−1}) h(x)]).

For this sampler to work well, h should be a good approximation to p. A short sketch of an independence sampler is given below.
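
As an illustration (not part of the original notes), here is a minimal sketch of an independence sampler for the unnormalized target k() defined in the rejection-sampling example above; the N(0, 2²) proposal is an arbitrary choice made for this sketch.

> ## Independence sampler: the proposal does not depend on the current state.
> N <- 10000
> x.ind <- rep.int(NA, N)
> xc <- 0                               ## starting value
> for (ii in 1:N){
   xp <- rnorm(1, mean=0, sd=2)         ## propose from N(0, 2^2)
   alpha <- min(1, (k(xp)/k(xc)) *
                   (dnorm(xc, mean=0, sd=2)/dnorm(xp, mean=0, sd=2)))
   if (runif(1) < alpha) xc <- xp       ## accept, or keep the current value
   x.ind[ii] <- xc
 }
> c(m=mean(x.ind), s=sd(x.ind))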

3.3 Example: Metropolis–Hastings algorithm
Random–walk Metropolis is straightforward to implement:
> N <- 10000
> x.acc5 <- rep.int(NA, N)
> u <- runif(N)
> acc.count <- 0
> std <- 0.05 ## Spread of proposal distribution
> xc <- 0; ## Starting value
> for (ii in 1:N){
xp <- rnorm(1, mean=xc, sd=std) ## proposal
alpha <- min(1, (k(xp)/k(xc)) *
(dnorm(xc, mean=xp,sd=std)/dnorm(xp, mean=xc,sd=std)))
x.acc5[ii] <- xc <- ifelse(u[ii] < alpha, xp, xc)
## count the number of accepted proposals:
acc.count <- acc.count + (u[ii] < alpha)
}
> ## Fraction of accepted *new* proposals
> acc.count/N
[1] 0.9846
> par(mfrow=c(1,2), mar=c(2,2,1,1))
> plot(x,k(x),type="l")
> barplot(table(round(x.acc5,1))/length(x.acc5))
[Plots: k(x) (left) and relative frequencies of the Metropolis–Hastings samples x.acc5 (right)]

3.4 Capture–recapture revisited


Consider again the capture–recapture model for estimating population size.

            recaptured     not recaptured     total
marked      m = 20         n − m = 80         n = 100
unmarked    u = 180        ?                  U?
total       R = 200        ?                  N?

We assume

   m ∼ bin(n, θ),   u ∼ bin(U, θ)

So we get

   p(m | θ) ∼ bin(n, θ),   p(u | θ, U) ∼ bin(U, θ)

Hence, as before, we get

   p(m, u | θ, U) = p(m | θ) p(u | θ, U)

The likelihood is:

   p(m, u | θ, U) = L(θ, U)
                  = C(n, m) θ^m (1 − θ)^{n−m} · C(U, u) θ^u (1 − θ)^{U−u}
                  ∝ C(U, u) θ^{m+u} (1 − θ)^{n+U−(m+u)}

where C(n, m) denotes the binomial coefficient “n choose m”.

To complete the model specification we must specify prior distributions for θ and U. These must reflect our prior knowledge of the problem.

The joint density of the data (m, u) and the parameters (θ, U) is then

   p(m, u, θ, U) ∝ C(U, u) θ^{m+u} (1 − θ)^{n+U−(m+u)} π_θ(θ) π_U(U)

The posterior is proportional to the joint density:

   p(θ, U | m, u) ∝ C(U, u) θ^{m+u} (1 − θ)^{n+U−(m+u)} π_θ(θ) π_U(U)

To fit in with the current notation, let x1 = θ, x2 = U and x = (x1, x2). Also notice that the data (m, u) are fixed, so we need not write them in the posterior:

   p*(x1, x2) ∝ C(x2, u) x1^{m+u} (1 − x1)^{n+x2−(m+u)} π_x1(x1) π_x2(x2) = k(x1, x2)
> logk <- function(x1, x2, n_, m_, u_){
R_ <- m_ + u_
R_*log(x1) + (n_+x2-R_)*log(1-x1) + lchoose (x2, u_) +
+ log(dunif(x1, .0, .2)) + log( disc.pmf(x2, 500, 2000))
}
> disc.pmf <- function(x, a, b){
ifelse (x>=a & x<=b, 1/(b-a+1), 0)
}
> n_ <- 100
> m_ <- 20

> u_ <- 180
> NN <- 10000 ## Number of samples
> u <- runif(NN)
> th.prop <- runif(NN, .0, 0.5)
> U.prop <- sample(300:3000, NN, replace=T)
> out <- matrix(NA, NN,2)
> xc <- c(0.2, 1500)
> acc.count <- 0
> for (i in 1:NN){
xp <- c( th.prop[i], U.prop[i] )
alpha <- min(1, exp(logk(xp[1], xp[2], n_, m_, u_) -
logk(xc[1], xc[2], n_, m_, u_)))
xc <- if(u[i]<alpha) xp else xc
out[i, ] <- xc
acc.count <- acc.count + (u[i]<alpha)
}
> acc.count / NN ## Not impressive acceptance ratio
[1] 0.0128
> summary(out[,1])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.103 0.158 0.168 0.166 0.180 0.200
> summary(out[,2])
Min. 1st Qu. Median Mean 3rd Qu. Max.
813 991 1090 1110 1200 1750
> par(mfrow=c(1,2), mar=c(2,2,1,1))
> hist(out[,1], prob=T); lines(density(out[,1]), col="red")
> hist(out[,2], prob=T); lines(density(out[,2]), col="red")
[Histograms of the sampled values of θ (out[,1], left) and U (out[,2], right), with density estimates overlaid]

3.5 Quiz
Using the code from the slides, experiment with the following:

• Set m = 2 and u = 18. How does that affect the posterior distribution? What if you set m = 40 and u = 360?

• Experiment with narrowing and widening the range of the proposal distributions. Which effect does that have on the output?

• Try changing the prior for U to a Poisson distribution. Hint: dpois is your friend.

• Experiment with changing the number of samples. How many do you need to produce “nice” histograms?

3.6 Single component Metropolis–Hastings


Instead of updating the entire vector x, it is often more convenient and computationally efficient to update x in blocks.

We partition x into blocks, for example x = (x1, x2, x3).

Suppose that we have a sample x^{t−1} = (x_1^{t−1}, x_2^{t−1}, x_3^{t−1}) and that x1 has already been updated to x_1^t in the current iteration. The task is to update x2.

To do so we specify a proposal distribution h2 from which we can sample candidate values for x2:

1. Draw x2 ∼ h2(· | x_1^t, x_2^{t−1}, x_3^{t−1}). Draw u ∼ U(0, 1).

2. Calculate the acceptance probability

   α = min(1, [p(x2 | x_1^t, x_3^{t−1}) h2(x_2^{t−1} | x_1^t, x2, x_3^{t−1})] /
              [p(x_2^{t−1} | x_1^t, x_3^{t−1}) h2(x2 | x_1^t, x_2^{t−1}, x_3^{t−1})])

3. If u < α set x_2^t = x2; else set x_2^t = x_2^{t−1}.

Notice:

• Item 3 can be restated as: with probability α set x_2^t = x2; with probability 1 − α set x_2^t = x_2^{t−1}.

• If we can choose h2 such that α is close to 1, then we have an efficient sampler.

3.7 The Gibbs sampler


Consider the acceptance probability for single component Metropolis–Hastings for updating x2:

   α = min(1, [p(x2 | x_1^t, x_3^{t−1}) h2(x_2^{t−1} | x_1^t, x2, x_3^{t−1})] /
              [p(x_2^{t−1} | x_1^t, x_3^{t−1}) h2(x2 | x_1^t, x_2^{t−1}, x_3^{t−1})])

The Gibbs sampler is a special case of single component Metropolis–Hastings, namely the case where the proposal distribution h2(x2 | x_1^t, x_2^{t−1}, x_3^{t−1}) for updating x2 is chosen to be the full conditional p(x2 | x_1^t, x_3^{t−1}).

Hence for the Gibbs sampler the proposed values are always accepted: substituting this choice of proposal into the expression above makes the ratio equal to 1, so α = 1.
One version of the algorithm is as follows. Suppose a sample x^t = (x_1^t, x_2^t, x_3^t) is available.

1. Sample x_1^{t+1} ∼ p(x1 | x_2^t, x_3^t)

2. Sample x_2^{t+1} ∼ p(x2 | x_1^{t+1}, x_3^t)

3. Sample x_3^{t+1} ∼ p(x3 | x_1^{t+1}, x_2^{t+1})

4. Set x^{t+1} = (x_1^{t+1}, x_2^{t+1}, x_3^{t+1})

The sequence x^1, x^2, ... then consists of (correlated) samples from p(x).


Notice:

• The proposed values are always accepted (because α = 1), so the sampler is very efficient.

• The sampler requires that we can sample from the full conditionals p(x_i | x_{−i}). In some cases this is easy; in other cases it is difficult. In general, slice sampling can be used (and this is what JAGS does). A minimal Gibbs sketch for a case with tractable conditionals is given below.
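
As an illustration (not part of the original notes), here is a minimal Gibbs sampler for a bivariate normal target with unit variances and correlation rho; this target is chosen only because both full conditionals are known normal distributions, so each Gibbs step is a single rnorm() draw.

> ## Gibbs sampling from a bivariate normal with correlation rho:
> ## the full conditionals are x1 | x2 ~ N(rho*x2, 1 - rho^2) and vice versa.
> rho <- 0.8
> N <- 5000
> out <- matrix(NA, N, 2)
> x1 <- x2 <- 0                     ## starting values
> for (i in 1:N){
   x1 <- rnorm(1, mean=rho*x2, sd=sqrt(1 - rho^2))  ## update x1 given x2
   x2 <- rnorm(1, mean=rho*x1, sd=sqrt(1 - rho^2))  ## update x2 given x1
   out[i, ] <- c(x1, x2)
 }
> cor(out[,1], out[,2])             ## should be close to rho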

3.8 Slice sampling


Suppose we want to sample from p(x_i | x_{−i}) where x_{−i} = (x_1, ..., x_{i−1}, x_{i+1}, ..., x_K).

Since x_{−i} is fixed, we can regard p(x_i | x_{−i}) as a function of x_i alone; call this function k_i(x_i) and recall that k_i() is an unnormalized density.
Slice sampling is a simple approach for sampling from an unnormalized density.
> k <- function(x, a=.4, b=.08){exp(a*(x-a)^2 - b*x^4)}
> plot(x,k(x), type='l', lwd=2, col=2)
[Plot: the unnormalized density k(x) on [−4, 4]]

Notice: k() is practically zero outside [−4, 4], and on this interval k() takes values in, say, [0, 3].
Slice sampling is based on the following idea: Sample uniformly in a “large enough” window:
> N <- 5000
> xs <- runif(N, -4, 4)
> ys <- runif(N, 0, 3)
> plot(x,k(x), type='l', lwd=2, col=2)
> points(xs,ys, pch=".")
[Plot: uniform samples in the rectangle [−4, 4] × [0, 3] with k(x) overlaid]

Keep those samples that fall under the curve.


> good <- ys<k(xs)
> xg <- xs[good]
> yg <- ys[good]
> plot(x,k(x), type='l', lwd=2, col=2)
> points(xg, yg, pch='.')
[Plot: the retained points, i.e. those falling under the curve k(x)]

> hist(xg)

[Histogram of xg, the x-values of the retained points]

This is obviously inefficient for other types of distributions, for example:


> x2 <- seq(0,10, 0.1)
> k2 <- function(x){1/x}
> plot(x2, k2(x2), type='l')
[Plot: k2(x) = 1/x for x in (0, 10]]

The algorithm goes as follows: given a sample x_t, pick y uniformly in [0, k(x_t)].
> xt<-1; y <- runif(1, 0, k(xt))
> plot(x,k(x), type='l', lwd=2, col=2)
> abline(v=xt, col='green'); abline(h=y, col='blue')
[Plot: k(x) with the current value x_t (vertical line) and the level y (horizontal line)]

Let S = {x : k(x) ≥ y} be the set of x-values for which the point (x, y) lies under the curve. Sample x_{t+1} uniformly from S.

1. Sample y uniformly from ]0, k(x_t)]. This defines a horizontal “slice” S = {x : k(x) ≥ y}.

2. Find an interval I = [L, R] containing all (or much) of the slice.

3. Sample x_{t+1} uniformly from the part of the slice within I.

The last two steps can be implemented in many ways. We need an interval width w (chosen by us).

“Stepping out”: Position an interval of length w randomly around x_t and denote it by I = [L, R]. Expand both ends in steps of size w until both ends are outside the slice, i.e. until k(L) < y and k(R) < y. Sample x_{t+1} from the part of the slice within I. (That is, sample uniformly from I; if a sample falls outside S, just sample again.)

“Doubling”: Position an interval of length w around x_t and denote it by I = [L, R]. Double the interval until both ends are outside the slice, i.e. until k(L) < y and k(R) < y. Sample x_{t+1} from the part of the slice within I.
> sliceSample_real<- function(k, xc, w){
kc<-k(xc)
y <-runif(1, 0, kc)
a <- runif(1) ## place w randomly around xc
l <- xc-a*w
u <- xc+(1-a)*w
kl <- k(l)
while (kl>y){ ## expand interval to the left if necessary
l <- l-w; kl <- k(l)
}
ku <- k(u)
while(ku>y){ ## expand interval to the right if necessary
u <- u+w; ku <- k(u)
}
xp <- runif(1, l, u) ## propose xp
kp <- k(xp)
while(kp<y){ ## shrink interval if possible
if (xp<xc) l <- xp else u <- xp
xp <- runif(1, l, u)
kp <- k(xp)
}
xp
}
> N <- 3000
> out <- rep.int(NA, N)
> x <- 1     ## starting value
> for (i in 1:N){
   x <- sliceSample_real(k, x, w=1)
   out[i] <- x
 }

> hist( out )
[Histogram of out: samples obtained with the slice sampler]

3.9 Towards omnibus software


With the slice sampling method in hand, it becomes clear how general purpose software (such as JAGS) can be constructed.
We need a collection of slice samplers:

• Sampling on the real line

• Sampling on the unit interval [0, 1]

• Sampling on non–negative real values

• Sampling on the non–negative integers 0, 1, 2, . . .

• Sampling on a, a + 1, a + 2, . . . a + b

• ... and more

3.10 Sampling from a (truncated) Poisson


Suppose we want to sample from a (truncated) Poisson distribution given as

   p(x) ∝ k(x) = I(x ≥ L) I(x ≤ U) λ^x e^{−λ} / x!  ∝  I(x ≥ L) I(x ≤ U) λ^x / x!
> k <- function(x, lambda=12, L=5, U=15){
(x>=L)*(x<=U)*lambda^x/factorial(x)   ## Poisson kernel lambda^x/x! on {L, ..., U}
}
> x <- seq(0,20)
> plot(x, k(x))

[Plot: the truncated Poisson kernel k(x) for x = 0, ..., 20]

> sliceSample_int <- function(k, xc, w=5){


a <- runif(1, 0, 1)
l <- floor( xc-a*w )
u <- ceiling( xc+(1-a)*w )
#cat(sprintf("init l=%d, u=%d\n",l,u))
y <- runif(1, 0, k(xc))
kl <- k(l)
while(kl>y){
l <- l-w; kl <- k(l)
#cat(sprintf("lower: l=%d, u=%d\n",l,u))
}
ku <- k(u)
while(ku>y){
u <- u+w; ku <- k(u)
#cat(sprintf("upper: l=%d, u=%d\n",l,u))
}
z <- l:u # sample uniformly on [l,u]
xp <- sample(z, 1)
kp <- k(xp)
while(kp<y){xp <- sample(z, 1);kp <- k(xp)}
xp
}
Sampling from the Poisson distribution:
> N <- 1000
> out <- rep.int(NA, N)
> x <- 10 # Some starting value
> k <- function(x, lambda=12, L=0, U=150){
if(x<0) 0
else
(x>=L)*(x<=U)*lambda^x/factorial(x)
}
> for (i in 1:N){
out[i] <- sliceSample_int(k, x, w=5)
}
> barplot(table(out))

[Barplot of the sampled values: approximately Poisson(λ = 12)]

Sampling from the truncated Poisson distribution:


> N <- 1000
> out <- rep.int(NA, N)
> x <- 10 # Some starting value
> k <- function(x, lambda=12, L=7, U=15){
if(x<0) 0
else
(x>=L)*(x<=U)*lambda^x/factorial(x)
}
> for (i in 1:N){
out[i] <- sliceSample_int(k, x, w=5)
}
> barplot(table(out))
[Barplot of the sampled values: Poisson(λ = 12) truncated to {7, ..., 15}]

3.11 Sampling on the unit interval


> k <- function(x){-x*(1-x)+.25}
> x <- seq(0,1,0.01)
> plot(x,k(x))

[Plot: k(x) = −x(1 − x) + 0.25 on the unit interval]

> sliceSample_unit <- function(k, xc, w){   ## w is not used: proposals are uniform on [0,1]


kc<-k(xc)
y <-runif(1, 0, kc)
xp <-runif(1, 0, 1)
kp <- k(xp)
while (kp<y){xp<-runif(1,0,1); kp<-k(xp)}
xp
}
> N <- 1000
> out <- rep(NA, N)
> x <- 0.5    ## starting value on the unit interval
> for (i in 1:N){
   x <- sliceSample_unit(k, x, w=1)
   out[i] <- x
 }
> hist(out)
[Histogram of out: samples from the density proportional to k on [0, 1]]

3.12 Capture–recapture revisited


Consider again the capture–recapture model for estimating population size.
We assume

   m ∼ bin(n, θ),   u ∼ bin(U, θ)

            recaptured     not recaptured     total
marked      m = 20         n − m = 80         n = 100
unmarked    u = 180        ?                  U?
total       R = 200        ?                  N?

So we get

   p(m | θ) ∼ bin(n, θ),   p(u | θ, U) ∼ bin(U, θ)

Hence, as before, we get

   p(m, u | θ, U) = p(m | θ) p(u | θ, U)

The likelihood is:

   p(m, u | θ, U) = L(θ, U)
                  = C(n, m) θ^m (1 − θ)^{n−m} · C(U, u) θ^u (1 − θ)^{U−u}
                  ∝ C(U, u) θ^{m+u} (1 − θ)^{n+U−(m+u)}

To complete the model specification we must specify prior distributions for θ and U. These must reflect our prior knowledge of the problem.

The joint density of the data (m, u) and the parameters (θ, U) is then

   p(m, u, θ, U) ∝ C(U, u) θ^{m+u} (1 − θ)^{n+U−(m+u)} π_θ(θ) π_U(U)

The posterior is proportional to the joint density:

   p(θ, U | m, u) ∝ C(U, u) θ^{m+u} (1 − θ)^{n+U−(m+u)} π_θ(θ) π_U(U)

To fit in with the current notation, let x1 = θ, x2 = U and x = (x1, x2). Also notice that the data (m, u) are fixed, so we need not write them in the posterior:

   p*(x1, x2) ∝ C(x2, u) x1^{m+u} (1 − x1)^{n+x2−(m+u)} π_x1(x1) π_x2(x2) = k(x1, x2)
> k <- function(x1, x2, n_, m_, u_){
R_ <- m_ + u_
z<-R_*log(x1) + (n_+x2-R_)*log(1-x1) + lchoose (x2, u_) +
+ log(dunif(x1, .0, .2)) + log( disc.pmf(x2, 500, 2000))
exp(z)
}
> disc.pmf <- function(x, a, b){

ifelse (x>=a & x<=b, 1/(b-a+1), 0)
}
> n_ <- 100
> m_ <- 20
> u_ <- 180
> library(doBy)
> kk <- specialize(k, list(n_=n_, m_=m_, u_=u_))
> # Now kk is function only of x1, x2
> args(kk)
function (x1, x2)
NULL
> kk
function (x1, x2)
{
R_ <- 20 + 180
z <- R_ * log(x1) + (100 + x2 - R_) * log(1 - x1) + lchoose(x2,
180) + +log(dunif(x1, 0, 0.2)) + log(disc.pmf(x2, 500,
2000))
exp(z)
}
<environment: 0x07e1c188>
> N <- 10000
> x1t <- .1 # initial values
> x2t <- 1000 # initial values
> out <- matrix(NA, N, 2)
> kk1 <- specialize(kk, list(x2=x2t)); kk1
function (x1)
{
R_ <- 20 + 180
z <- R_ * log(x1) + (100 + 1000 - R_) * log(1 - x1) + lchoose(1000,
180) + +log(dunif(x1, 0, 0.2)) + log(disc.pmf(1000, 500,
2000))
exp(z)
}
<environment: 0x07db16e0>
> kk2 <- specialize(kk, list(x1=x1t)); kk2
function (x2)
{
R_ <- 20 + 180
z <- R_ * log(0.1) + (100 + x2 - R_) * log(1 - 0.1) + lchoose(x2,
180) + +log(dunif(0.1, 0, 0.2)) + log(disc.pmf(x2, 500,
2000))
exp(z)
}
<environment: 0x07d95200>

> for (i in 1:N){
x1t <- sliceSample_unit(kk1, x1t, w=1)
kk2 <- specialize(kk, list(x1=x1t))
x2t <- sliceSample_int(kk2, x2t, w=10)
kk1 <- specialize(kk, list(x2=x2t))
out[i,] <- c(x1t,x2t)
}
> par(mfrow=c(1,2))
> z<-apply(out, 2, hist)
[Histograms of the slice-within-Gibbs samples of x1 = θ (left) and x2 = U (right)]

3.13 Work on the log–scale


For numerical reasons it is generally better to work on a log scale, i.e. with log k(x) instead of k(x).

Sampling y from a U(0, k(x_t)) distribution is the same as taking y = k(x_t) · u where u ∼ U(0, 1).

Instead of doing this, work on the log scale and take

   z = log k(x_t) + log u   where u ∼ U(0, 1).

We need to sample x_{t+1} uniformly from the slice S = {x : k(x) ≥ y}. But this is the same as sampling x_{t+1} uniformly from S = {x : log k(x) ≥ log y = z}. A sketch of such a log-scale slice sampler is given below.
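
As an illustration (not part of the original notes), here is a possible log-scale version of the real-line slice sampler from Section 3.8; the function name sliceSample_real_log and the helper logk1 are introduced here for this sketch only, and logk1 is simply the log of the k() used in Section 3.8.

> ## Slice sampler on the real line working with log k(x) instead of k(x).
> sliceSample_real_log <- function(logk, xc, w){
   z <- logk(xc) + log(runif(1))        ## log of the slice level y = k(xc)*u
   a <- runif(1)                        ## place interval of width w around xc
   l <- xc - a*w
   u <- xc + (1-a)*w
   while (logk(l) > z) l <- l - w       ## step out to the left
   while (logk(u) > z) u <- u + w       ## step out to the right
   xp <- runif(1, l, u)                 ## propose a point in the interval
   while (logk(xp) < z){                ## shrink until the proposal is in the slice
     if (xp < xc) l <- xp else u <- xp
     xp <- runif(1, l, u)
   }
   xp
 }
> logk1 <- function(x, a=.4, b=.08){ a*(x-a)^2 - b*x^4 }  ## log of k() from Section 3.8
> out <- rep.int(NA, 1000); xcur <- 1
> for (i in 1:1000){ xcur <- sliceSample_real_log(logk1, xcur, w=1); out[i] <- xcur }
> c(m=mean(out), s=sd(out))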
