
Exercise 2 - Computer intensive statistics

Johan S. Wind
April 2018

Problem A
1
We implemented the residual resampling bootstrap, with T = 100 and B = 2000, as specified in the problem description, and obtained the following results:
Least squares
Bias in beta1: -0.0911826
Bias in beta2: 0.07552565
Variance in beta1: 0.005416145
Variance in beta2: 0.00532669

Absolute differences
Bias in beta1: 0.01754425
Bias in beta2: -0.01578929
Variance in beta1: 0.0006502845
Variance in beta2: 0.0006313204
We see that for this problem the LA (absolute differences) estimator is clearly superior to the LS estimator in both bias and variance. The LS estimator is only efficient when the residuals are Gaussian, which is clearly not the case here, as the sketch below illustrates.
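One quick way to see the non-Gaussianity is to inspect the residuals directly. A minimal sketch (not part of the required solution), reusing ARp.beta.est and ARp.resid from the provided probAhelp.R and the data3A series, exactly as in the code listed below:

# Quick look at the LS residuals (sketch; same setup as the Problem A code below)
source("probAhelp.R")
source("probAdata.R")
data3A = data3A[[1]]
model = ARp.beta.est(data3A, 2)          # LS and LA estimates of (beta1, beta2)
e = ARp.resid(data3A, model$LS)          # residuals under the LS fit
qqnorm(e); qqline(e)                     # heavy tails / skewness show up as curvature
print(shapiro.test(e))                   # a small p-value indicates non-Gaussian residuals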

2
Again, we followed the instructions from the problem statement and obtained the following 95% prediction intervals:
Least squares
95% prediction interval: (7.552931, 22.81706)

Absolute differences
95% prediction interval: (7.605901, 23.17706)
We see that these are very similar, especially when looking at the widths of the intervals. This is likely because almost all of the variance in the predictions comes from the irreducible error (at least irreducible in our model), rather than from the variance in the β-estimates; the sketch below makes this comparison explicit.
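To make the claim concrete, one can compare the spread coming from the bootstrapped β-estimates with the spread of the residuals. A minimal sketch (an illustration, not part of the required solution), assuming model, beta_bootstrap_LS and data3A from the Problem A code below are in the workspace:

# Rough variance decomposition for the prediction of x_101
e_LS = ARp.resid(data3A, model$LS)                 # residuals under the LS fit
var_beta = var(beta_bootstrap_LS[1, ]*data3A[100] +
               beta_bootstrap_LS[2, ]*data3A[99])  # variability due to the beta-estimates
cat("Variance from beta-estimates:", var_beta, "\n")
cat("Variance of the residuals:   ", var(e_LS), "\n")  # irreducible part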

Code
Below is the source code used for Problem A. It uses the provided scripts "probAhelp.R" and "probAdata.R".
source("probAhelp.R")
source("probAdata.R")
data3A = data3A[[1]]

model = ARp.beta.est(data3A, 2)

B = 2000
beta_bootstrap_LA = matrix(nrow=2, ncol=B)
beta_bootstrap_LS = matrix(nrow=2, ncol=B)

for (k in 1:2) {
  if (k == 1) beta = model$LS
  else beta = model$LA

  e = ARp.resid(data3A, beta)

  T = length(data3A)
  for (i in 1:B) {
    s = sample(1:(T-1), 1)                     # random start position for the initial values
    e_bootstrap = sample(e, T, replace=TRUE)   # resample the residuals
    data_bootstrap = ARp.filter(data3A[c(s, s+1)], beta, e_bootstrap)
    # refit on the bootstrap series; kept in model_boot so the original fit in model is not overwritten
    model_boot = ARp.beta.est(data_bootstrap, 2)
    if (k == 1) beta_bootstrap_LS[, i] = model_boot$LS
    else beta_bootstrap_LA[, i] = model_boot$LA
  }
}

for (k in 1:2) {
  if (k == 1) {
    beta = model$LS
    beta_bootstrap = beta_bootstrap_LS
    cat("\nLeast squares\n")
  } else {
    beta = model$LA
    beta_bootstrap = beta_bootstrap_LA
    cat("\nAbsolute differences\n")
  }
  cat("Bias in beta1:", mean(beta_bootstrap[1, ]) - beta[1], "\n")
  cat("Bias in beta2:", mean(beta_bootstrap[2, ]) - beta[2], "\n")
  cat("Variance in beta1:", var(beta_bootstrap[1, ]), "\n")
  cat("Variance in beta2:", var(beta_bootstrap[2, ]), "\n")

  samples = 10000
  e = ARp.resid(data3A, beta)
  beta1 = sample(beta_bootstrap[1, ], samples, replace=TRUE)
  beta2 = sample(beta_bootstrap[2, ], samples, replace=TRUE)
  e = sample(e, samples, replace=TRUE)

  x101 = sort(beta1*data3A[100] + beta2*data3A[99] + e)
  # hist(x101, breaks=100)
  cat("95% prediction interval: (", x101[samples*2.5/100], ",", x101[samples*97.5/100], ")\n")
}

Problem B

Figure 1: Box plot of the logarithm of the bilirubin readings of the different
people.

bilirubin <- read.table("data/bilirubin.txt", header=T)

boxplot(log(meas) ~ pers, data=bilirubin, xlab='Person', ylab='log(concentration)')  # in mg/dL

model = lm(log(meas) ~ pers, data=bilirubin)

sm = summary(model)
sm
Fval = sm$fstatistic[[1]]

We got "F-statistic: 3.67 on 2 and 26 DF, p-value: 0.03946" from the summary, so the null hypothesis that all three people have equal bilirubin levels is rejected at the 5% significance level.

Figure 2: Histogram of the F-statistics generated by the permutation test, with Fval (from the original data) marked by the red line.

The result of the permutation test was a p-value of 0.0408, which is very
close to the one given by the standard F-test. This indicates that the F-statistic
distribution given by the permutation test is similar to the F-distribution with
2 and 26 DF.
set.seed(12345)  # reproducible results

permTest = function() {
  # Permute one column, refit the model and return the F-statistic
  bilirubin_copy = bilirubin
  bilirubin_copy[[1]] = sample(bilirubin[[1]], replace=F)
  model = lm(log(meas) ~ pers, data=bilirubin_copy)
  sm = summary(model)
  r = sm$fstatistic[[1]]
}

samples = 5000  # we chose a larger sample size, for less variance

f = 1:samples
for (i in 1:samples) {
  f[i] = permTest()
}

f = sort(f)
hist(f, breaks=100, xlim=c(0, 5), xlab='F', main='')
abline(v=Fval, col="red")
sum(f > Fval) / length(f)   # permutation p-value
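To check visually that the permutation distribution resembles the F-distribution with 2 and 26 degrees of freedom, the theoretical density can be overlaid on the histogram. A minimal sketch (not part of the required solution), reusing f and Fval from the code above; freq=FALSE puts the histogram on the density scale:

# Overlay the theoretical F(2, 26) density on the permutation histogram
hist(f, breaks=100, xlim=c(0, 5), freq=FALSE, xlab='F', main='')
curve(df(x, df1=2, df2=26), from=0, to=5, add=TRUE, col='blue')
abline(v=Fval, col='red')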

Problem C
1
We find an expression for a single sample (scalar valued x, y, z, u) and exploit
the independence between samples to generalize to multiple samples in the end.

f(x, y | z, u, λ0, λ1) = f(z, u | x, y, λ0, λ1) f(x, y | λ0, λ1) / f(z, u | λ0, λ1)

f(z, u | x, y, λ0, λ1) = u U(x − y) δ(z − x) + (1 − u) U(y − x) δ(z − y)

where U is the Heaviside step function and δ is the Dirac delta function.
Plugging in the exponential distributions, and using the independence of x and y:

f(x, y | λ0, λ1) = f(x | λ0) f(y | λ1) = λ0 exp(−λ0 x) λ1 exp(−λ1 y)

Marginalizing out x and y:

f(z, u | λ0, λ1) = ∫_0^∞ ∫_0^∞ ( u U(x − y) δ(z − x) + (1 − u) U(y − x) δ(z − y) ) λ0 exp(−λ0 x) λ1 exp(−λ1 y) dx dy

= λ0 λ1 ( u ∫_0^∞ ∫_0^∞ U(x − y) δ(z − x) exp(−λ0 x − λ1 y) dx dy + (1 − u) ∫_0^∞ ∫_0^∞ U(y − x) δ(z − y) exp(−λ0 x − λ1 y) dx dy )

= λ0 λ1 ( u ∫_0^∞ U(z − y) exp(−λ0 z − λ1 y) dy + (1 − u) ∫_0^∞ U(z − x) exp(−λ0 x − λ1 z) dx )

= λ0 λ1 ( u ∫_0^z exp(−λ0 z − λ1 y) dy + (1 − u) ∫_0^z exp(−λ0 x − λ1 z) dx )

= λ0 λ1 ( (u/λ1) exp(−λ0 z)(1 − exp(−λ1 z)) + ((1 − u)/λ0) exp(−λ1 z)(1 − exp(−λ0 z)) )

= λ0 u exp(−λ0 z)(1 − exp(−λ1 z)) + λ1 (1 − u) exp(−λ1 z)(1 − exp(−λ0 z))
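As a quick sanity check of this marginal density (not required by the exercise), summing over u ∈ {0, 1} and integrating over z should give 1. A minimal R sketch with arbitrary made-up rates:

# Check that f(z, u | lambda0, lambda1) sums/integrates to 1 (made-up rates)
lambda0 = 2; lambda1 = 5
fzu = function(z, u) {
  lambda0*u*exp(-lambda0*z)*(1 - exp(-lambda1*z)) +
    lambda1*(1 - u)*exp(-lambda1*z)*(1 - exp(-lambda0*z))
}
total = integrate(function(z) fzu(z, 1), 0, Inf)$value +
        integrate(function(z) fzu(z, 0), 0, Inf)$value
cat("Total probability:", total, "\n")   # should be very close to 1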
So now we have an expression for the conditional density:

f(x, y | z, u, λ0, λ1) = [ ( u U(x − y) δ(z − x) + (1 − u) U(y − x) δ(z − y) ) λ0 exp(−λ0 x) λ1 exp(−λ1 y) ] / [ λ0 u exp(−λ0 z)(1 − exp(−λ1 z)) + λ1 (1 − u) exp(−λ1 z)(1 − exp(−λ0 z)) ]

Now we can take the expectation of the log-likelihood:

ln f(x, y | λ0, λ1) = ln λ0 + ln λ1 − λ0 x − λ1 y

E(ln f(x, y | λ0, λ1) | z, u, λ0^(t), λ1^(t)) = ∫_0^∞ ∫_0^∞ (ln λ0 + ln λ1 − λ0 x − λ1 y) f(x, y | z, u, λ0^(t), λ1^(t)) dx dy

= ln λ0 + ln λ1 − ∫_0^∞ ∫_0^∞ (λ0 x + λ1 y) f(x, y | z, u, λ0^(t), λ1^(t)) dx dy

We split it into the two cases u = 0 and u = 1. First, u = 0:

∫_0^∞ ∫_0^∞ (λ0 x + λ1 y) [ U(y − x) δ(z − y) λ0^(t) exp(−λ0^(t) x) λ1^(t) exp(−λ1^(t) y) ] / [ λ1^(t) exp(−λ1^(t) z)(1 − exp(−λ0^(t) z)) ] dx dy

= λ0^(t) / [ exp(−λ1^(t) z)(1 − exp(−λ0^(t) z)) ] ∫_0^∞ ∫_0^∞ (λ0 x + λ1 y) U(y − x) δ(z − y) exp(−λ0^(t) x) exp(−λ1^(t) y) dx dy

= λ0^(t) / [ exp(−λ1^(t) z)(1 − exp(−λ0^(t) z)) ] ∫_0^z (λ0 x + λ1 z) exp(−λ0^(t) x) exp(−λ1^(t) z) dx

= λ0^(t) / (1 − exp(−λ0^(t) z)) [ (λ1 z/λ0^(t))(1 − exp(−λ0^(t) z)) + λ0 ∫_0^z x exp(−λ0^(t) x) dx ]

= λ0^(t) / (1 − exp(−λ0^(t) z)) [ (λ1 z/λ0^(t))(1 − exp(−λ0^(t) z)) − (λ0/λ0^(t)) ( z exp(−λ0^(t) z) − (1 − exp(−λ0^(t) z))/λ0^(t) ) ]

= λ1 z + λ0/λ0^(t) − λ0 z/(exp(λ0^(t) z) − 1)
Similarly, for u = 1:

∫_0^∞ ∫_0^∞ (λ0 x + λ1 y) f(x, y | z, u = 1, λ0^(t), λ1^(t)) dx dy = λ0 z + λ1/λ1^(t) − λ1 z/(exp(λ1^(t) z) − 1)
Combining these expressions (for u = 0 and u = 1) gives the complete expression:

E(ln f(x, y | λ0, λ1) | z, u, λ0^(t), λ1^(t))
= ln λ0 + ln λ1 − u ( λ0 z + λ1/λ1^(t) − λ1 z/(exp(λ1^(t) z) − 1) ) − (1 − u) ( λ1 z + λ0/λ0^(t) − λ0 z/(exp(λ0^(t) z) − 1) )
= ln λ0 + ln λ1 − λ0 ( u z + (1 − u)(1/λ0^(t) − z/(exp(λ0^(t) z) − 1)) ) − λ1 ( (1 − u) z + u (1/λ1^(t) − z/(exp(λ1^(t) z) − 1)) )

Since the observations are independent (given λ0^(t), λ1^(t)), we can simply sum the expectations over the n observations:

E(ln f(x, y | λ0, λ1) | z, u, λ0^(t), λ1^(t))
= n (ln λ0 + ln λ1)
  − λ0 Σ_{i=1}^n ( u_i z_i + (1 − u_i)(1/λ0^(t) − z_i/(exp(λ0^(t) z_i) − 1)) )
  − λ1 Σ_{i=1}^n ( (1 − u_i) z_i + u_i (1/λ1^(t) − z_i/(exp(λ1^(t) z_i) − 1)) )
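As a numerical check of the u = 0 case above (again not required by the exercise): given u = 0 and z, x is a right-truncated Exp(λ0^(t)) variable on [0, z] and y = z, so the conditional expectation can be compared with direct numerical integration. A minimal sketch in which z, λ0, λ1 and the iterate λ0^(t) are made-up values:

# Numerical check of E[lambda0*x + lambda1*y | z, u = 0] (made-up values)
z = 1.3; lambda0 = 2; lambda1 = 5; lambda0t = 1.5
dens = function(x) lambda0t*exp(-lambda0t*x) / (1 - exp(-lambda0t*z))   # density of x given u = 0, z
numeric_value = integrate(function(x) (lambda0*x + lambda1*z)*dens(x), 0, z)$value
closed_form = lambda1*z + lambda0/lambda0t - lambda0*z/(exp(lambda0t*z) - 1)
cat(numeric_value, closed_form, "\n")    # the two numbers should agree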

2
The maximum with respect to λ0 and λ1 is found by differentiating the expected log-likelihood and setting the derivatives to zero:

∂E(ln f(x, y | λ0, λ1) | z, u, λ0^(t), λ1^(t)) / ∂λ0
= n/λ0 − Σ_{i=1}^n ( u_i z_i + (1 − u_i)(1/λ0^(t) − z_i/(exp(λ0^(t) z_i) − 1)) ) = 0

⟹ λ0 = n / Σ_{i=1}^n ( u_i z_i + (1 − u_i)(1/λ0^(t) − z_i/(exp(λ0^(t) z_i) − 1)) )

Similarly,

λ1 = n / Σ_{i=1}^n ( (1 − u_i) z_i + u_i (1/λ1^(t) − z_i/(exp(λ1^(t) z_i) − 1)) )

We choose the starting values

λ0 = n / Σ_{i=1}^n u_i z_i ,
λ1 = n / Σ_{i=1}^n (1 − u_i) z_i .

This initialization is equivalent to starting with infinite estimates for λ0 and λ1 and then running one iteration: as λ0^(t) → ∞, both 1/λ0^(t) and z_i/(exp(λ0^(t) z_i) − 1) tend to zero, so the update reduces to λ0 = n/Σ u_i z_i (and similarly for λ1). Since the iteration converges quickly from arbitrarily large starting points, the initialization is not that important in this problem.
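As a sanity check on these closed-form updates (not part of the exercise), they can be compared against a direct numerical maximization of the expected log-likelihood from part 1. A minimal sketch in which the data u, z and the previous iterates λ0^(t), λ1^(t) are made up:

# Compare the closed-form M-step with numerical maximization of Q (made-up data)
set.seed(1)
n = 50
u = rbinom(n, 1, 0.4)
z = rexp(n, 3)
lambda0t = 2; lambda1t = 6

w0 = u*z + (1 - u)*(1/lambda0t - z/(exp(lambda0t*z) - 1))
w1 = (1 - u)*z + u*(1/lambda1t - z/(exp(lambda1t*z) - 1))

Q = function(par) n*log(par[1]) + n*log(par[2]) - par[1]*sum(w0) - par[2]*sum(w1)
num    = optim(c(1, 1), Q, method="L-BFGS-B", lower=c(1e-6, 1e-6),
               control=list(fnscale=-1))$par      # numerical maximizer of Q
closed = c(n/sum(w0), n/sum(w1))                  # closed-form update from above
print(rbind(num, closed))                         # the two rows should agree closely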

Figure 3: Plot of convergence of the EM algorithm. We see the error goes down
exponentially with the number of iterations.

The maximum likelihood estimates for the data given in u.txt and z.txt were:

λ0 = 3.465735

λ1 = 9.353215
library(ggplot2)

U = read.table("data/u.txt")
Z = read.table("data/z.txt")
N = dim(U)[1]

# One EM update for lambda0, given the current value (uses globals u, z, n)
iter_lambda0 = function(lambda0) {
  r = n/sum(u*z + (1-u)*(1/lambda0 - z/(exp(lambda0*z) - 1)))
}

# One EM update for lambda1, given the current value
iter_lambda1 = function(lambda1) {
  r = n/sum((1-u)*z + u*(1/lambda1 - z/(exp(lambda1*z) - 1)))
}

plotConvergence = function() {
  u <<- U
  z <<- Z
  n <<- N

  # Starting values, as derived above
  lambda0 = n/sum(u*z)
  lambda1 = n/sum((1-u)*z)

  m = 40
  l0 = 1:m
  l1 = 1:m

  for (i in 1:m) {
    lambda0 = iter_lambda0(lambda0)
    lambda1 = iter_lambda1(lambda1)
    l0[i] = lambda0
    l1[i] = lambda1
  }
  # Error relative to the final iterate, plotted on a log scale
  e0 = abs(l0 - l0[m])
  e1 = abs(l1 - l1[m])
  plot(1:m, e0, type='l', log='y', ylim=c(1e-15, max(e0, e1)),
       xlab='Iteration', ylab='Error', col='red')
  lines(1:m, e1, col='blue')
  legend('topright', legend=c('lambda0', 'lambda1'), col=c('red', 'blue'), lty=1)

  cat("Maximum likelihood Lambda 0:", lambda0, "\n")
  cat("Maximum likelihood Lambda 1:", lambda1, "\n")
}
plotConvergence()

3
The requested values are in the output of the code:
Lambda 0 bias: 0.01914471
Lambda 1 bias: 0.08557119
Lambda 0 improved bias: 0.01781187
Lambda 1 improved bias: 0.08329537
Lambda 0 standard deviation: 0.248345
Lambda 1 standard deviation: 0.7980227
Lambda 0 Lambda 1 correlation: -0.01510014
Maximum likelihood Lambda 0: 3.465735
Maximum likelihood Lambda 1: 9.353215
Bias corrected Lambda 0: 3.447923
Bias corrected Lambda 1: 9.26992
Our bootstrap algorithm works as follows (K = 10000 in our tests):

for K iterations:
    s = sample(1:N, N, replace=TRUE)
    λi = EM(U[s], Z[s], N)

Treat the resulting λi values as an independent sample and use them to estimate the desired statistics.

This means we generate many bootstrapped datasets and, for each of them, compute the corresponding estimates λ̂0 and λ̂1. We can then calculate statistics on this bootstrap sample, such as the variance of the estimators.

In this case we would prefer the bias-corrected estimates over the maximum likelihood estimates, as we want the estimates to be as close as possible to the true values and care less about their variance. The estimated bias is significant in this problem, so it makes sense to correct for it.
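For reference, writing λ̂ for the maximum likelihood estimate from the original data and λ̄* for the mean of the K bootstrap estimates, the standard bootstrap bias correction is

bias ≈ λ̄* − λ̂,   λ̂_corrected = λ̂ − (λ̄* − λ̂) = 2λ̂ − λ̄*.

In the code below, the bias that is subtracted is the "improved" estimate, which compares the bootstrap mean to the estimate obtained from all the bootstrap datasets pooled together, rather than to λ̂ itself.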
EM = function() {
  # Run the EM iteration from the starting values (uses globals u, z, n)
  lambda0 = n/sum(u*z)
  lambda1 = n/sum((1-u)*z)

  m = 20
  for (i in 1:m) {
    lambda0 = iter_lambda0(lambda0)
    lambda1 = iter_lambda1(lambda1)
  }
  l = c(lambda0, lambda1)
}

EM_all = function() {
  # EM estimate on the full (original) dataset
  u <<- U
  z <<- Z
  n <<- N
  l = EM()
}

set.seed(12345)

boot = function() {
  iters = 10000
  ll = matrix(0, nrow=2, ncol=iters)
  u_all = c()
  z_all = c()
  n_all = 0
  for (iter in 1:iters) {
    s = sample(1:N, N, replace=TRUE)   # resample observation indices
    u <<- U[[1]][s]
    z <<- Z[[1]][s]
    n <<- N
    ll[, iter] = EM()                  # EM estimate on the bootstrap sample

    u_all = append(u_all, u)           # pooled bootstrap data, used for the "improved" bias
    z_all = append(z_all, z)
    n_all = n_all + N
  }

  l0 = EM_all()
  cat("Lambda 0 bias:", mean(ll[1, ]) - l0[1], "\n")
  cat("Lambda 1 bias:", mean(ll[2, ]) - l0[2], "\n")

  u <<- u_all
  z <<- z_all
  n <<- n_all
  l = EM()
  cat("Lambda 0 improved bias:", mean(ll[1, ]) - l[1], "\n")
  cat("Lambda 1 improved bias:", mean(ll[2, ]) - l[2], "\n")
  cat("Lambda 0 standard deviation:", sd(ll[1, ]), "\n")
  cat("Lambda 1 standard deviation:", sd(ll[2, ]), "\n")
  cat("Lambda 0 Lambda 1 correlation:", cor(ll[1, ], ll[2, ]), "\n")
  cat("Maximum likelihood Lambda 0:", l0[1], "\n")
  cat("Maximum likelihood Lambda 1:", l0[2], "\n")
  cat("Bias corrected Lambda 0:", l0[1] - (mean(ll[1, ]) - l[1]), "\n")
  cat("Bias corrected Lambda 1:", l0[2] - (mean(ll[2, ]) - l[2]), "\n")
  ll
}
l = boot()
hist(l[1, ], main="", breaks=50, freq=FALSE, xlab=expression(lambda[0]))
hist(l[2, ], main="", breaks=50, freq=FALSE, xlab=expression(lambda[1]))

