Lec3 Inverse Transformation Rejection
Random Variables
Department of Statistics
Yunnan University of Finance and Economics
Outline
3.1 Introduction
Random Generators of Common Probability Distributions in R
3.2 The Inverse Transform Method
3.2.1 Inverse Transform Method, Continuous Case
3.2.2 Inverse Transform Method, Discrete Case
3.3 The Acceptance-Rejection Method
3.4 Transformation Methods
3.5 Sums and Mixtures
3.6 Multivariate Distributions
3.6.1 Multivariate Normal Distribution
3.6.2 Mixtures of Multivariate Normals
3.6.3 Wishart Distribution
3.6.4 Uniform Dist. on the d-Sphere
Introduction
superimposed.
R note 3.2
The rbinom (random binomial) function with size = 1 generates a
Bernoulli sample. Another method is to sample from the vector
(0, 1) with prob. (1 − p, p).
rbinom(n, size = 1, prob = p)
sample(c(0, 1), size = n, replace = TRUE, prob = c(.6, .4))  # here p = 0.4
Example 3.5 (Geometric distribution)
Generate a random geometric sample with parameter $p = 1/4$.
The pmf is $f(x) = p q^x$, $x = 0, 1, 2, \cdots$, where $q = 1 - p$. At the
points of discontinuity $x = 0, 1, 2, \cdots$, the cdf is $F(x) = 1 - q^{x+1}$. For
each sample element, we generate a random uniform $u$ and solve
$$1 - q^{x} < u \le 1 - q^{x+1}$$
for $x$.
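The inequality above can be solved in closed form. Taking logarithms (and noting that log(q) < 0) gives x < log(1 − u)/log(q) ≤ x + 1, so x + 1 is the ceiling of log(1 − u)/log(q). The following is a minimal sketch of that step; the sample size and seed are illustrative choices, not taken from the example.

```r
# Inverse transform sketch for Geometric(p) on x = 0, 1, 2, ...
set.seed(1)              # seed chosen only for reproducibility (assumption)
n <- 10000               # illustrative sample size (assumption)
p <- 0.25
q <- 1 - p
u <- runif(n)
# smallest integer x with u <= 1 - q^(x + 1)
x <- ceiling(log(1 - u) / log(q)) - 1
# compare the sample mean with the theoretical mean q / p
c(sample_mean = mean(x), theoretical_mean = q / p)
```

Because the whole computation is vectorized, no loop over sample elements is needed.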
$$f(x) = P(X = x) = \frac{a\,\theta^{x}}{x}, \quad x = 1, 2, \cdots \tag{3.1}$$
where $0 < \theta < 1$ and $a = (-\log(1 - \theta))^{-1}$. A recursive formula for
$f(x)$ is
$$f(x + 1) = \frac{\theta x}{x + 1}\, f(x), \quad x = 1, 2, \cdots \tag{3.2}$$
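As a quick sanity check, the recursion (3.2) can be compared numerically with the direct formula (3.1). The sketch below assumes θ = 0.5 and truncates at N = 20 terms; both values are illustrative.

```r
# Check that recursion (3.2) reproduces the pmf (3.1)
theta <- 0.5                       # illustrative parameter (assumption)
N <- 20                            # illustrative truncation point (assumption)
a <- -1 / log(1 - theta)
f_direct <- a * theta^(1:N) / (1:N)          # formula (3.1)
f_rec <- numeric(N)
f_rec[1] <- a * theta                        # f(1) from (3.1)
for (x in 1:(N - 1)) {
  f_rec[x + 1] <- theta * x / (x + 1) * f_rec[x]   # formula (3.2)
}
all.equal(f_direct, f_rec)   # the two computations should agree
sum(f_direct)                # partial sum of the pmf, close to 1
```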
Generate random samples from Logarithmic(0.5) dist.
rlogarithmic
• Initially choose a length N for the cdf vector, and compute
F (x), x = 1, 2, · · · , N . If necessary, N will be increased.
rlogarithmic <- function(n, theta) {
  # returns a random logarithmic(theta) sample of size n
  u <- runif(n)
  # set the initial length of the cdf vector
  N <- ceiling(-16 / log10(theta))
  k <- 1:N
  a <- -1 / log(1 - theta)
  fk <- a * theta^k / k
  Fk <- cumsum(fk)
  x <- integer(n)
  for (i in 1:n) {
    x[i] <- as.integer(sum(u[i] > Fk))  # F^{-1}(u) - 1
    while (x[i] == N) {
      # if x == N we need to extend the cdf
      # very unlikely because N is large
      f <- a * theta^(N + 1) / (N + 1)  # pmf at N + 1, from (3.1)
      fk <- c(fk, f)
      Fk <- c(Fk, Fk[N] + fk[N + 1])
      N <- N + 1
      x[i] <- as.integer(sum(u[i] > Fk))
    }
  }
  x + 1
}
The Acceptance-Rejection Method
Suppose that X and Y are r.v. with density or pmf f and g respectively,
and there exists a constant c such that $f(t)/g(t) \le c$ for all t
such that f(t) > 0. Then the acceptance-rejection method can be
applied to generate the r.v. X.
The acceptance-rejection method
1. Find a r.v. Y with density g satisfying f(t)/g(t) ≤ c, for all t
such that f(t) > 0.
2. Generate a random y from the dist. with density g.
3. Generate a random u from the U(0, 1) dist.
4. If u < f(y)/(cg(y)), accept y and deliver x = y; otherwise
reject y and repeat steps 2-4.
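The four steps above can be sketched as a small generic function. The helper name accept_reject and its arguments are hypothetical, introduced only for illustration. The usage lines assume a target density f(t) = 2t on (0, 1) with a U(0, 1) proposal and c = 2, which is not an example from these notes.

```r
# Hypothetical helper (not from the notes): generic acceptance-rejection sampler
# f: target density, g: proposal density, rg: sampler for g, cbound: the constant c
accept_reject <- function(n, f, g, rg, cbound) {
  x <- numeric(n)
  k <- 0        # number of accepted values
  iter <- 0     # number of iterations (proposals tried)
  while (k < n) {
    y <- rg(1)                            # step 2: candidate from g
    u <- runif(1)                         # step 3: uniform for the test
    iter <- iter + 1
    if (u < f(y) / (cbound * g(y))) {     # step 4: accept or reject
      k <- k + 1
      x[k] <- y
    }
  }
  list(sample = x, iterations = iter)
}

set.seed(2)     # seed chosen only for reproducibility (assumption)
# Illustrative target f(t) = 2t on (0, 1); f/g <= 2 for the U(0, 1) proposal
res <- accept_reject(1000, f = function(t) 2 * t, g = dunif, rg = runif, cbound = 2)
mean(res$sample)    # the target mean is 2/3
res$iterations      # expected to be near cbound * n = 2000
```

A loop is used here to mirror the steps one by one; a vectorized version that proposes candidates in batches would usually be faster in R.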
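Note that in step 4,
$$P(\text{accept} \mid Y) = P\left(U < \frac{f(Y)}{c\,g(Y)} \,\Big|\, Y\right) = \frac{f(Y)}{c\,g(Y)}.$$
The total prob. of acceptance for any iteration is
$$\sum_{y} P(\text{accept} \mid y)\, P(Y = y) = \sum_{y} \frac{f(y)}{c\,g(y)}\, g(y) = \frac{1}{c}.$$
In the discrete case, for each k such that f(k) > 0,
$$P(Y = k \mid A) = \frac{P(A \mid k)\, g(k)}{P(A)} = \frac{\left(\frac{f(k)}{c\,g(k)}\right) g(k)}{1/c} = f(k).$$
In the continuous case, we need to prove $P\left(Y \le y \mid U \le \frac{f(Y)}{c\,g(Y)}\right) = F_X(y)$. In fact
$$P\left(Y \le y \,\Big|\, U \le \frac{f(Y)}{c\,g(Y)}\right) = \frac{P\left(U \le \frac{f(Y)}{c\,g(Y)},\; Y \le y\right)}{1/c}$$
$$= \int_{-\infty}^{y} \frac{P\left(U \le \frac{f(Y)}{c\,g(Y)} \,\Big|\, Y = \omega\right)}{1/c}\, g(\omega)\, d\omega$$
$$= c \int_{-\infty}^{y} \frac{f(\omega)}{c\,g(\omega)}\, g(\omega)\, d\omega = F_X(y).$$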
Example 3.7 (Acceptance-rejection method), Beta(α = 2, β = 2) dist.
The Beta(2,2) density is f (x) = 6x(1 − x), 0 < x < 1. Let g(x)
be the U (0, 1) density. Then f (x)/g(x) ≤ 6 for all 0 < x < 1, so
c = 6. A random x from g(x) is accepted if
f(x)/(cg(x)) = x(1 − x) > u.
On average, cn = 6000 iterations are required for a sample of size 1000.
n <- 1000; k <- 0          # k counts accepted values
y <- numeric(n); j <- 0    # j counts iterations
while (k < n) {
  u <- runif(1); j <- j + 1
  x <- runif(1)            # random variate from g
  if (x * (1 - x) > u) {   # we accept x
    k <- k + 1; y[k] <- x
  }
}
> j
[1] 5873
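The bound c = 6 is valid but not the smallest possible one: 6x(1 − x) attains its maximum 1.5 at x = 1/2, so c = 1.5 also satisfies f(x)/g(x) ≤ c and raises the acceptance probability from 1/6 to 2/3. The following is a minimal vectorized sketch under that assumption; the variable name c_tight and the batching scheme are illustrative and not part of the example above.

```r
# Beta(2,2) by acceptance-rejection with the tighter bound c = 1.5 (assumption)
set.seed(3)          # seed chosen only for reproducibility (assumption)
n <- 1000
c_tight <- 1.5       # max of f(x) = 6x(1 - x) on (0, 1)
y <- numeric(0)      # accepted values
j <- 0               # total number of candidates tried
while (length(y) < n) {
  m <- n - length(y)                   # candidates still needed
  x <- runif(m)                        # batch of candidates from g
  u <- runif(m)
  j <- j + m
  y <- c(y, x[u < 6 * x * (1 - x) / c_tight])   # keep accepted candidates
}
j          # expected to be near c_tight * n = 1500
mean(y)    # the Beta(2,2) mean is 0.5
```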
has the Logarithmic($\theta$) dist., where $\lfloor x \rfloor$ denotes the integer part
of x.
Example 3.8 (Beta distribution)
Fig.3.2: QQ Plot comparing the Beta(3, 2) distribution with simulated data in Example 3.8 (axis label: Sample).
Continuous mixture X: the dist. of X is
$$F_X(x) = \int_{-\infty}^{+\infty} F_{X \mid Y = y}(x)\, f_Y(y)\, dy$$
for a family X|Y = y indexed by the real numbers y and weighting function $f_Y$
such that $\int_{-\infty}^{+\infty} f_Y(y)\, dy = 1$.
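A standard instance of a continuous mixture is the Poisson-Gamma mixture: if Λ ∼ Gamma(r, β) and X | Λ = λ ∼ Poisson(λ), then X is negative binomial with size r and prob. p = β/(1 + β). The sketch below assumes r = 4 and β = 3, chosen only for illustration, and compares the simulated relative frequencies with dnbinom.

```r
# Poisson-Gamma mixture sketch; r and rate_par are illustrative (assumption)
set.seed(4)          # seed chosen only for reproducibility (assumption)
n <- 10000
r <- 4
rate_par <- 3        # plays the role of beta in Gamma(r, beta)
lambda <- rgamma(n, shape = r, rate = rate_par)   # draw the mixing variable
x <- rpois(n, lambda)                             # draw X given Lambda = lambda
p <- rate_par / (1 + rate_par)
# relative frequencies of 0..5 against the negative binomial pmf
rbind(mixture = tabulate(x + 1, nbins = 6) / n,
      negbin  = dnbinom(0:5, size = r, prob = p))
```

Because rpois accepts a vector of means, the mixture is generated without an explicit loop.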
Figure: Histograms of the samples generated by the convolution S = X1 + X2 and the mixture FX = 0.5FX1 + 0.5FX2 (left panel: density of s; right panel: density of x).
Remark: Histograms of the convolution S and mixture X are different.
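The difference between a convolution and a mixture can be seen numerically. The sketch below assumes X1 ∼ N(0, 1) and X2 ∼ N(3, 1); these two distributions are an assumption made for illustration, since the text here does not state them.

```r
# Convolution vs 50/50 mixture; the component distributions are assumptions
set.seed(5)          # seed chosen only for reproducibility (assumption)
n <- 1000
x1 <- rnorm(n, mean = 0, sd = 1)
x2 <- rnorm(n, mean = 3, sd = 1)
s <- x1 + x2                                  # convolution S = X1 + X2
k <- sample(0:1, size = n, replace = TRUE)    # component label, prob. 0.5 each
x <- k * x1 + (1 - k) * x2                    # mixture 0.5 F_X1 + 0.5 F_X2
# under these assumptions S has mean 3, variance 2; X has mean 1.5, variance 3.25
c(mean_s = mean(s), var_s = var(s), mean_x = mean(x), var_x = var(x))
```

Plotting hist(s, prob = TRUE) and hist(x, prob = TRUE) side by side should show one mode for the convolution and two modes for the mixture.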
R note 3.6: par: set (or query) certain graphical parameters. par():
return a list of all graphical parameters. par(mfcol=c(n,m)): configure
graphical device to display nm graphs per screen (n rows and m columns).
Example 3.12 (Mixture of several gamma dist.)
There are several components to the mixture and the mixing weights
are not uniform. The mixture is $F_X = \sum_{j=1}^{5} \theta_j F_{X_j}$ where $X_j \sim$
Gamma($r = 3$, $\lambda_j = 1/j$) are independent and the mixing prob. are
$\theta_j = j/15$, $j = 1, \cdots, 5$.
To simulate the mixture FX :
1. Generate an integer k ∈ {1, 2, 3, 4, 5}, P (k) = θk , k = 1, . . . , 5.
2. Deliver a random Gamma(r, λk ) variate.
These steps suggest using a for loop to generate a sample of size n, but
for loops are inefficient in R.
Efficient vectorized algorithm:
1. Generate a random sample k1 , . . . , kn of integers in vector k,
where P (k) = θk , k = 1, . . . , 5. k[i] indicates which of the
five gamma distributions will be sampled to get the ith element
of sample (use sample).
2. Set rate equal to the length n vector λ = (λk ).
3. Generate a gamma sample of size n, with shape parameter r and
rate vector rate (use rgamma).
n <- 5000
k <- sample(1:5, size = n, replace = TRUE, prob = (1:5) / 15)
rate <- 1 / k
x <- rgamma(n, shape = 3, rate = rate)
# plot the density of the mixture
# with the densities of the components
plot(density(x), xlim = c(0, 40), ylim = c(0, .3),
     lwd = 3, xlab = "x", main = "")
for (i in 1:5)
  lines(density(rgamma(n, 3, 1 / i)))
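One way to check the vectorized generator is to compare the sample mean with the mixture mean, E[X] = Σ θj r/λj = Σ (j/15)(3j) = 11. The sketch below repeats the sampling step with a fixed seed; the seed and the variable name theoretical_mean are illustrative.

```r
# Check of the gamma mixture sample against its theoretical mean
set.seed(6)          # seed chosen only for reproducibility (assumption)
n <- 5000
theta <- (1:5) / 15                                     # mixing prob. theta_j
k <- sample(1:5, size = n, replace = TRUE, prob = theta)
x <- rgamma(n, shape = 3, rate = 1 / k)                 # rate vector lambda_k = 1/k
theoretical_mean <- sum(theta * 3 * (1:5))              # sum_j theta_j * r / lambda_j
c(sample_mean = mean(x), theoretical_mean = theoretical_mean)
table(k) / n         # observed component frequencies, to compare with theta
```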
Example 3.14 (Plot density of mixture)
R note 3.7
• The apply function requires a dimension attribute for x.
• A vector x does not have a dimension attribute, so one is assigned
by dim(x) <- length(x). Alternatively, x <- as.matrix(x)
converts x to a matrix (a column vector), which has a dimension
attribute.
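A small sketch of the point made in this note: a plain vector has no dimension attribute until one is assigned, after which apply can be used along margin 1. The values of x and the squaring function are illustrative.

```r
# apply needs a dim attribute; assign one to a plain vector
x <- seq(0, 8, length.out = 5)        # illustrative vector: 0, 2, 4, 6, 8
has_dim_before <- !is.null(dim(x))    # a plain vector has no dim attribute
dim(x) <- length(x)                   # x becomes a one-dimensional array
y <- apply(x, 1, function(t) t^2)     # apply the function to each element
y
```

The alternative x <- as.matrix(x) gives an n by 1 matrix, and apply(x, 1, ...) then works row by row in the same way.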
Example 3.15 (Poisson-Gamma mixture, continuous mixture)
SVD Method of generating Nd (µ, Σ) samples
SVD generalizes the idea of eigenvectors to rectangular matrices.
• svd: $X = U D V^T$, where D is the diagonal matrix of singular
values of X (the svd function returns them as a vector d), U is a matrix
whose columns contain the left singular vectors of X, and V is a matrix
whose columns contain the right singular vectors of X.
• Since $\Sigma \succ 0$ (positive definite), $U V^T = I$, thus $\Sigma^{1/2} =
U \Lambda^{1/2} U^T$ and svd is equivalent to spectral decomposition.
• svd is less efficient because it does not take advantage of the
fact that the matrix Σ is square symmetric.
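The SVD method can be sketched as follows. The function name rmvn_svd and the values of mu and Sigma in the usage lines are hypothetical, chosen for illustration; svd, rnorm and cov are base R functions.

```r
# Hypothetical helper: generate n draws from N_d(mu, Sigma) via the SVD of Sigma
rmvn_svd <- function(n, mu, Sigma) {
  d <- length(mu)
  S <- svd(Sigma)
  # square root of Sigma built from the singular values and vectors
  R <- S$u %*% diag(sqrt(S$d), nrow = d) %*% t(S$v)
  Z <- matrix(rnorm(n * d), nrow = n, ncol = d)   # iid standard normal rows
  Z %*% R + matrix(mu, n, d, byrow = TRUE)        # rescale, then shift by mu
}

set.seed(7)          # seed chosen only for reproducibility (assumption)
mu <- c(0, 1)                                # illustrative mean vector
Sigma <- matrix(c(1, 0.5, 0.5, 2), 2, 2)     # illustrative covariance matrix
X <- rmvn_svd(5000, mu, Sigma)
colMeans(X)    # should be close to mu
cov(X)         # should be close to Sigma
```

Each row of X is one draw. For symmetric positive definite Σ the matrix R is symmetric, so the rows of Z %*% R have covariance R R = Σ.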
[Figure: panels labelled Sepal.Length and Petal.Width]
Fig.3.8: Histograms of the marginal distributions of MVN data generated in Example 3.20 (four panels; x-axis: x[, i], y-axis: Density).
R note 3.9
• par(pty = "s"): sets the square plot type so the circle is round
rather than elliptical;
• par(pty = "m"): restores the plot type to the maximal plotting region.