Exercise 3 Computer Intensive Statistics
Johan S. Wind
April 2018
Problem A
1
We implemented the residual-resampling bootstrap with T = 100 and B = 2000, as specified in the problem description, and obtained the following results:
Least squares
Bias in beta1: -0.0911826
Bias in beta2: 0.07552565
Variance in beta1: 0.005416145
Variance in beta2: 0.00532669

Absolute differences
Bias in beta1: 0.01754425
Bias in beta2: -0.01578929
Variance in beta1: 0.0006502845
Variance in beta2: 0.0006313204
We see that for this problem the LA estimator is clearly superior in both
bias and variance. The LS estimator is only optimal when we have Gaussian
residuals, which is clearly not the case here.
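For reference, the fitted model and the residuals being resampled are (a brief restatement of the scheme in our own notation, matching the prediction formula used in part 2 and in the code):
\[
x_t = \beta_1 x_{t-1} + \beta_2 x_{t-2} + e_t, \qquad \hat{e}_t = x_t - \hat{\beta}_1 x_{t-1} - \hat{\beta}_2 x_{t-2},
\]
and each bootstrap series is generated by drawing T residuals with replacement from the estimated residuals, choosing a random pair of consecutive observations as initial values, and filtering them through the fitted model before re-estimating β1 and β2.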
2
Again, we followed the instructions from the problem statement and got the
95% prediction intervals:
Least squares
95% prediction interval: (7.552931, 22.81706)

Absolute differences
95% prediction interval: (7.605901, 23.17706)
We see that these are very similar, especially when looking at the widths of the intervals. This is likely because almost all the variance in the predictions comes from the irreducible error (at least irreducible within our model), not from the variance of the β-estimates.
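Concretely, each simulated value of x101 is formed as in the code below,
\[
x_{101}^{*} = \beta_1^{*} x_{100} + \beta_2^{*} x_{99} + e^{*},
\]
where β1* and β2* are drawn with replacement from the bootstrap estimates and e* from the observed residuals; the interval endpoints are the empirical 2.5% and 97.5% quantiles of the simulated values.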
Code
Below is the source code used for Problem A; it uses the provided scripts "probAhelp.R" and "probAdata.R".
source("probAhelp.R")
source("probAdata.R")

data3A = data3A[[1]]
B = 2000

# Fit the AR(2) model to the original series; these estimates are the
# reference values for the bias computations below.
model = ARp.beta.est(data3A, 2)

beta_bootstrap_LA = matrix(nrow=2, ncol=B)
beta_bootstrap_LS = matrix(nrow=2, ncol=B)

for (k in 1:2) {
  beta = if (k == 1) model$LS else model$LA
  T = length(data3A)
  e = ARp.resid(data3A, beta)  # residuals of the fitted model

  for (i in 1:B) {
    s = sample(1:(T-1), 1)                    # random pair of consecutive initial values
    e_bootstrap = sample(e, T, replace=TRUE)  # resample residuals with replacement
    data_bootstrap = ARp.filter(data3A[c(s, s+1)], beta, e_bootstrap)
    model_boot = ARp.beta.est(data_bootstrap, 2)
    if (k == 1) {
      beta_bootstrap_LS[, i] = model_boot$LS
    } else {
      beta_bootstrap_LA[, i] = model_boot$LA
    }
  }
}

for (k in 1:2) {
  if (k == 1) {
    beta = model$LS
    beta_bootstrap = beta_bootstrap_LS
    cat("\nLeast squares\n")
  } else {
    beta = model$LA
    beta_bootstrap = beta_bootstrap_LA
    cat("\nAbsolute differences\n")
  }
  cat("Bias in beta1:", mean(beta_bootstrap[1, ]) - beta[1], "\n")
  cat("Bias in beta2:", mean(beta_bootstrap[2, ]) - beta[2], "\n")
  cat("Variance in beta1:", var(beta_bootstrap[1, ]), "\n")
  cat("Variance in beta2:", var(beta_bootstrap[2, ]), "\n")

  # Simulate x_101 = beta1*x_100 + beta2*x_99 + e by resampling bootstrap
  # estimates and residuals, and read off the empirical 2.5%/97.5% quantiles.
  samples = 10000
  e = ARp.resid(data3A, beta)
  beta1 = sample(beta_bootstrap[1, ], samples, replace=TRUE)
  beta2 = sample(beta_bootstrap[2, ], samples, replace=TRUE)
  e = sample(e, samples, replace=TRUE)
  x101 = sort(beta1*data3A[100] + beta2*data3A[99] + e)
  #hist(x101, breaks=100)
  cat("95% prediction interval: (", x101[samples*2.5/100], ",", x101[samples*97.5/100], ")\n")
}
Problem B
Figure 1: Box plot of the logarithm of the bilirubin readings of the different
people.
We got "F-statistic: 3.67 on 2 and 26 DF, p-value: 0.03946" from the summary, so the null hypothesis that all three people have equal levels is rejected at the 5% significance level.
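For reference, the fitted model is a one-way regression of the log-measurements on the person factor; restated in our own notation,
\[
\log Y_{ij} = \beta_i + \varepsilon_{ij}, \qquad \varepsilon_{ij} \overset{\mathrm{iid}}{\sim} N(0, \sigma^2), \qquad i = 1, 2, 3,
\]
and the reported F-statistic tests the null hypothesis β1 = β2 = β3.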
Figure 2: Histogram of the F-statistics generated by the permutation test; the F-statistic of the original data (Fval) is marked with a red line.
The result of the permutation test was a p-value of 0.0408, which is very
close to the one given by the standard F-test. This indicates that the F-statistic
distribution given by the permutation test is similar to the F-distribution with
2 and 26 DF.
set.seed(12345)  # reproducible results

permTest = function() {
  # Permute one column of the data, breaking the association between meas and pers,
  # and return the F-statistic of the refitted model.
  bilirubin_copy = bilirubin
  bilirubin_copy[[1]] = sample(bilirubin[[1]], replace=F)
  model = lm(log(meas) ~ pers, data=bilirubin_copy)
  sm = summary(model)
  sm$fstatistic[[1]]
}

f = sort(f)
hist(f, breaks=100, xlim=c(0, 5), xlab='F', main='')
abline(v=Fval, col="red")
sum(f > Fval) / length(f)  # permutation p-value
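The listing above assumes that the data frame bilirubin, the F-statistic Fval of the original data and the vector f of permuted F-statistics already exist; these pieces are not shown in the report. A minimal sketch of the missing setup, assuming a data file named "bilirubin.txt" and 10000 permutations (both are assumptions not stated in the report):

bilirubin = read.table("bilirubin.txt", header=TRUE)  # assumed file name
Fval = summary(lm(log(meas) ~ pers, data=bilirubin))$fstatistic[[1]]  # F-statistic of the original data
f = replicate(10000, permTest())  # assumed number of permutations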
Problem C
1
We find an expression for a single sample (scalar-valued x, y, z, u) and exploit the independence between samples to generalize to multiple observations at the end.
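To fix notation (this restates the setup from the problem description, and can also be read off from the conditional densities used below): x ∼ Exp(λ0) and y ∼ Exp(λ1) are independent, and we observe z = max(x, y) together with u = I(x ≥ y), so the pair (x, y) itself is only partially observed.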
\[
\ln f(x, y \mid \lambda_0, \lambda_1) = \ln\lambda_0 + \ln\lambda_1 - \lambda_0 x - \lambda_1 y
\]
\[
E\big(\ln f(x, y \mid \lambda_0, \lambda_1) \mid z, u, \lambda_0^{(t)}, \lambda_1^{(t)}\big)
= \int_0^\infty \!\!\! \int_0^\infty (\ln\lambda_0 + \ln\lambda_1 - \lambda_0 x - \lambda_1 y)\, f(x, y \mid z, u, \lambda_0^{(t)}, \lambda_1^{(t)}) \, dx \, dy
\]
\[
= \ln\lambda_0 + \ln\lambda_1 - \int_0^\infty \!\!\! \int_0^\infty (\lambda_0 x + \lambda_1 y)\, f(x, y \mid z, u, \lambda_0^{(t)}, \lambda_1^{(t)}) \, dx \, dy
\]
We split it into the two cases u = 0 and u = 1. First, u = 0:
\begin{align*}
& \int_0^\infty \!\!\! \int_0^\infty (\lambda_0 x + \lambda_1 y)\,
  \frac{U(y - x)\,\delta(z - y)\,\lambda_0^{(t)} \exp(-\lambda_0^{(t)} x)\,\lambda_1^{(t)} \exp(-\lambda_1^{(t)} y)}
       {\lambda_1^{(t)} \exp(-\lambda_1^{(t)} z)\,\big(1 - \exp(-\lambda_0^{(t)} z)\big)} \, dx \, dy \\
&= \frac{\lambda_0^{(t)}}{\exp(-\lambda_1^{(t)} z)\big(1 - \exp(-\lambda_0^{(t)} z)\big)}
   \int_0^\infty \!\!\! \int_0^\infty (\lambda_0 x + \lambda_1 y)\, U(y - x)\, \delta(z - y)\,
   \exp(-\lambda_0^{(t)} x) \exp(-\lambda_1^{(t)} y) \, dx \, dy \\
&= \frac{\lambda_0^{(t)}}{\exp(-\lambda_1^{(t)} z)\big(1 - \exp(-\lambda_0^{(t)} z)\big)}
   \int_0^z (\lambda_0 x + \lambda_1 z)\, \exp(-\lambda_0^{(t)} x) \exp(-\lambda_1^{(t)} z) \, dx \\
&= \frac{\lambda_0^{(t)}}{1 - \exp(-\lambda_0^{(t)} z)}
   \left( \frac{\lambda_1 z}{\lambda_0^{(t)}} \big(1 - \exp(-\lambda_0^{(t)} z)\big)
        + \lambda_0 \int_0^z x \exp(-\lambda_0^{(t)} x) \, dx \right) \\
&= \frac{\lambda_0^{(t)}}{1 - \exp(-\lambda_0^{(t)} z)}
   \left( \frac{\lambda_1 z}{\lambda_0^{(t)}} \big(1 - \exp(-\lambda_0^{(t)} z)\big)
        - \frac{\lambda_0}{\lambda_0^{(t)}}
          \Big( z \exp(-\lambda_0^{(t)} z) - \frac{1 - \exp(-\lambda_0^{(t)} z)}{\lambda_0^{(t)}} \Big) \right) \\
&= \lambda_1 z + \frac{\lambda_0}{\lambda_0^{(t)}} - \frac{\lambda_0 z}{\exp(\lambda_0^{(t)} z) - 1}
\end{align*}
(Here U denotes the unit step function and δ the Dirac delta.)
Similarly for u = 1:
\[
\int_0^\infty \!\!\! \int_0^\infty (\lambda_0 x + \lambda_1 y)\, f(x, y \mid z, u, \lambda_0^{(t)}, \lambda_1^{(t)}) \, dx \, dy
= \lambda_0 z + \frac{\lambda_1}{\lambda_1^{(t)}} - \frac{\lambda_1 z}{\exp(\lambda_1^{(t)} z) - 1}
\]
Combining these expressions (for u = 0 and u = 1) gives the complete
expression:
\begin{align*}
E\big(\ln f(x, y \mid \lambda_0, \lambda_1) \mid z, u, \lambda_0^{(t)}, \lambda_1^{(t)}\big)
&= \ln\lambda_0 + \ln\lambda_1
 - u \left( \lambda_0 z + \frac{\lambda_1}{\lambda_1^{(t)}} - \frac{\lambda_1 z}{\exp(\lambda_1^{(t)} z) - 1} \right)
 - (1 - u) \left( \lambda_1 z + \frac{\lambda_0}{\lambda_0^{(t)}} - \frac{\lambda_0 z}{\exp(\lambda_0^{(t)} z) - 1} \right) \\
&= \ln\lambda_0 + \ln\lambda_1
 - \lambda_0 \left( u z + (1 - u)\left( \frac{1}{\lambda_0^{(t)}} - \frac{z}{\exp(\lambda_0^{(t)} z) - 1} \right) \right)
 - \lambda_1 \left( (1 - u) z + u\left( \frac{1}{\lambda_1^{(t)}} - \frac{z}{\exp(\lambda_1^{(t)} z) - 1} \right) \right)
\end{align*}
Since the observations are independent, we can simply sum the expectations over the n observations:
\[
E\big(\ln f(x, y \mid \lambda_0, \lambda_1) \mid z, u, \lambda_0^{(t)}, \lambda_1^{(t)}\big)
= n(\ln\lambda_0 + \ln\lambda_1)
- \lambda_0 \sum_{i=1}^{n} \left( u_i z_i + (1 - u_i)\left( \frac{1}{\lambda_0^{(t)}} - \frac{z_i}{\exp(\lambda_0^{(t)} z_i) - 1} \right) \right)
- \lambda_1 \sum_{i=1}^{n} \left( (1 - u_i) z_i + u_i\left( \frac{1}{\lambda_1^{(t)}} - \frac{z_i}{\exp(\lambda_1^{(t)} z_i) - 1} \right) \right)
\]
2
The maximizers with respect to λ0 and λ1 are found by differentiating the expected log-likelihood and setting the derivatives to zero:
\[
\frac{\partial\, E\big(\ln f(x, y \mid \lambda_0, \lambda_1) \mid z, u, \lambda_0^{(t)}, \lambda_1^{(t)}\big)}{\partial \lambda_0}
= \frac{n}{\lambda_0} - \sum_{i=1}^{n} \left( u_i z_i + (1 - u_i)\left( \frac{1}{\lambda_0^{(t)}} - \frac{z_i}{\exp(\lambda_0^{(t)} z_i) - 1} \right) \right) = 0
\]
\[
\implies \lambda_0 = \frac{n}{\sum_{i=1}^{n} u_i z_i + (1 - u_i)\left( \frac{1}{\lambda_0^{(t)}} - \frac{z_i}{\exp(\lambda_0^{(t)} z_i) - 1} \right)}
\]
Similarly,
\[
\lambda_1 = \frac{n}{\sum_{i=1}^{n} (1 - u_i) z_i + u_i\left( \frac{1}{\lambda_1^{(t)}} - \frac{z_i}{\exp(\lambda_1^{(t)} z_i) - 1} \right)}
\]
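In the code below these updates are carried out by two helper functions, iter_lambda0 and iter_lambda1, whose bodies are not included in the listings. A minimal sketch of what they compute, reconstructed from the formulas above (they assume the global vectors u and z and the count n used by the rest of the code):

iter_lambda0 = function(lambda0) {
  # One EM update for lambda0; the argument plays the role of lambda0^(t).
  n / sum(u*z + (1 - u)*(1/lambda0 - z/(exp(lambda0*z) - 1)))
}

iter_lambda1 = function(lambda1) {
  # One EM update for lambda1; the argument plays the role of lambda1^(t).
  n / sum((1 - u)*z + u*(1/lambda1 - z/(exp(lambda1*z) - 1)))
}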
Figure 3: Plot of convergence of the EM algorithm. We see the error goes down
exponentially with the number of iterations.
The maximum likelihood estimates for the data given in u.txt and z.txt were:
λ0 = 3.465735
λ1 = 9.353215
library(ggplot2)

U = read.table("data/u.txt")
Z = read.table("data/z.txt")
N = dim(U)[1]

plotConvergence = function() {
  u <<- U
  z <<- Z
  n <<- N
  # Rough starting values for the EM iteration.
  lambda0 = n/sum(u*z)
  lambda1 = n/sum((1 - u)*z)
  m = 40
  l0 = 1:m
  l1 = 1:m
  for (i in 1:m) {
    lambda0 = iter_lambda0(lambda0)
    lambda1 = iter_lambda1(lambda1)
    l0[i] = lambda0
    l1[i] = lambda1
  }
  # Distance from the final iterate, shown on a logarithmic scale.
  e0 = abs(l0 - l0[m])
  e1 = abs(l1 - l1[m])
  plot(1:m, e0, type='l', log='y', ylim=c(1e-15, max(e0, e1)),
       xlab='Iteration', ylab='Error', col='red')
  lines(1:m, e1, col='blue')
  legend('topright', legend=c('lambda 0', 'lambda 1'), col=c('red', 'blue'), lty=1)
}

plotConvergence()  # produces Figure 3
3
The requested values are in the output of the code:
Lambda 0 bias: 0.01914471
Lambda 1 bias: 0.08557119
Lambda 0 improved bias: 0.01781187
Lambda 1 improved bias: 0.08329537
Lambda 0 standard deviation: 0.248345
Lambda 1 standard deviation: 0.7980227
Lambda 0 Lambda 1 correlation: -0.01510014
Maximum likelihood Lambda 0: 3.465735
Maximum likelihood Lambda 1: 9.353215
Bias corrected Lambda 0: 3.447923
Bias corrected Lambda 1: 9.26992
Our bootstrap algorithm works as follows (K = 10000 in our tests):

for K iterations:
    s = sample(1:N, N, replace=TRUE)
    lambda_k = EM(U[s], Z[s], N)

We treat the K values of lambda as an independent sample and use it to estimate the desired statistics. This means we generate many bootstrapped data sets, and for each of them we obtain estimates of λ0 and λ1. We can then simply calculate statistics on this bootstrap sample, such as the variance of the estimators.
In this case we would prefer the bias-corrected estimates over the maximum likelihood estimates, as we want the estimates to be as close as possible to the true values and care less about their variance. The estimated bias is significant in this problem, so it makes sense to correct for it.
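For clarity, the bias correction reported by the code below subtracts the estimated bias from the maximum likelihood estimate,
\[
\hat{\lambda}_{\mathrm{corrected}} = \hat{\lambda}_{\mathrm{ML}} - \widehat{\mathrm{bias}},
\]
where the code uses the "improved" bias estimate (the mean of the K bootstrap estimates minus the EM estimate obtained from the pooled bootstrap data sets). With the numbers above this gives 3.465735 − 0.01781187 ≈ 3.447923 and 9.353215 − 0.08329537 ≈ 9.26992.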
EM = function() {
  # Starting values, then a fixed number of EM updates.
  lambda0 = n/sum(u*z)
  lambda1 = n/sum((1 - u)*z)
  m = 20
  for (i in 1:m) {
    lambda0 = iter_lambda0(lambda0)
    lambda1 = iter_lambda1(lambda1)
  }
  c(lambda0, lambda1)
}

EM_all = function() {
  # Run EM on the full data set.
  u <<- U
  z <<- Z
  n <<- N
  EM()
}

boot = function() {
  iters = 10000
  ll = matrix(0, nrow=2, ncol=iters)
  u_all = c()
  z_all = c()
  n_all = 0
  for (iter in 1:iters) {
    # Resample the observation pairs (u_i, z_i) with replacement and rerun EM.
    s = sample(1:N, N, replace=TRUE)
    u <<- U[[1]][s]
    z <<- Z[[1]][s]
    n <<- N
    ll[, iter] = EM()
    # Pool all bootstrap samples; used for the "improved" bias estimate below.
    u_all = append(u_all, u)
    z_all = append(z_all, z)
    n_all = n_all + N
  }
  l0 = EM_all()
  cat("Lambda 0 bias:", mean(ll[1, ]) - l0[1], "\n")
  cat("Lambda 1 bias:", mean(ll[2, ]) - l0[2], "\n")
  # Improved bias estimate: compare the bootstrap mean to an EM fit on the pooled data.
  u <<- u_all
  z <<- z_all
  n <<- n_all
  l = EM()
  cat("Lambda 0 improved bias:", mean(ll[1, ]) - l[1], "\n")
  cat("Lambda 1 improved bias:", mean(ll[2, ]) - l[2], "\n")
  cat("Lambda 0 standard deviation:", sd(ll[1, ]), "\n")
  cat("Lambda 1 standard deviation:", sd(ll[2, ]), "\n")
  cat("Lambda 0 Lambda 1 correlation:", cor(ll[1, ], ll[2, ]), "\n")
  cat("Maximum likelihood Lambda 0:", l0[1], "\n")
  cat("Maximum likelihood Lambda 1:", l0[2], "\n")
  cat("Bias corrected Lambda 0:", l0[1] - (mean(ll[1, ]) - l[1]), "\n")
  cat("Bias corrected Lambda 1:", l0[2] - (mean(ll[2, ]) - l[2]), "\n")
  ll
}

l = boot()
hist(l[1, ], main="", breaks=50, freq=FALSE, xlab=expression(lambda[0]))
hist(l[2, ], main="", breaks=50, freq=FALSE, xlab=expression(lambda[1]))