
ETC 2420/5242 Lab 10 2016

Souhaib Ben Taieb


Week 10

Purpose

In this lab you will compute conditional probabilities and practice Bayesian inference.

Question 1

A situation where Bayesian analysis is routinely used is the spam filter in your mail server. The message is
scrutinized for the appearance of key words which make it likely that the message is spam. Let us describe
how one of these filters might work. We imagine that the evidence for spam is that the subject line of the
message contains the sentence “check this out”. We define the events spam (the message is spam) and check
this out (the subject line contains this sentence).
From previous experience we know that 40% of emails are spam, 1% of spam emails have “check this out” in
the subject line, and 0.4% of non-spam emails have this sentence in the subject line.
Explain the different steps to compute the conditional probability P(spam | check this out).
P(spam | check this out) = P(check this out | spam) P(spam) / P(check this out)

P(spam) = 0.4
P(check this out | spam) = 0.01

P(check this out) = P(check this out | spam) P(spam) + P(check this out | not spam) P(not spam)
                  = 0.01 × 0.4 + 0.004 × 0.6 = 0.0064

P(spam | check this out) = 0.004 / 0.0064 = 5/8 = 0.625
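The same steps can be checked numerically; a minimal sketch in R, using only the figures given in the question:

```r
# Known quantities from the question
p_spam <- 0.4                     # P(spam)
p_check_given_spam <- 0.01        # P(check this out | spam)
p_check_given_nonspam <- 0.004    # P(check this out | not spam)

# Law of total probability for the denominator
p_check <- p_check_given_spam * p_spam + p_check_given_nonspam * (1 - p_spam)

# Bayes' theorem
p_spam_given_check <- p_check_given_spam * p_spam / p_check
p_spam_given_check  # 0.625
```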

Question 2

Let X₁, . . . , Xₙ ∼ N(θ, 9).

a. If θ ∼ N(µ, τ²), what is π(θ|x₁, . . . , xₙ)?
b. What is the posterior mean E[θ|x₁, . . . , xₙ]?
c. What is the MLE estimate θ̂MLE?

See the slides of week 9
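For reference, these are the standard normal–normal conjugacy results (with known variance σ² = 9, so σ = 3); they match the quantities a and b computed in the code below:

```latex
\pi(\theta \mid x_1,\dots,x_n) = N(m, v), \qquad
v = \left(\frac{n}{\sigma^2} + \frac{1}{\tau^2}\right)^{-1}, \qquad
m = v\left(\frac{n\bar{x}}{\sigma^2} + \frac{\mu}{\tau^2}\right)
```

so the posterior mean is E[θ|x₁, . . . , xₙ] = m, a precision-weighted average of the sample mean and the prior mean, and the MLE (maximising the likelihood alone) is θ̂MLE = x̄.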


Suppose the “true” value is θ = 2. Consider (1) µ = 5 and τ = 1, and (2) µ = 2 and τ = 2.
For n ∈ {1, 10, 20, 50, 100, 10000}:

a. Simulate a data set consisting of n observations.
b. Plot on the same graphic π(θ), π(θ|x₁, . . . , xₙ) and θ̂MLE.

Discuss the behavior of π(θ|x1 , . . . , xn ) as n increases and the impact of the prior distribution.

set.seed(1986)
theta <- 2      # "true" value of theta
sigma_0 <- 3    # known sd, since Var = 9

alln <- c(1, 2, 5, 10, 100, 10000)

for (case in c(1, 2)) {
  if (case == 1) {
    prior_mu <- 2
    prior_tau <- 2
  } else if (case == 2) {
    prior_mu <- 5
    prior_tau <- 1
  }

  for (n in alln) {
    x <- rnorm(n, mean = theta, sd = sigma_0)
    x_bar <- mean(x)

    # Posterior is N(a/b, 1/b)
    a <- (n * x_bar) / sigma_0^2 + prior_mu / prior_tau^2
    b <- n / sigma_0^2 + 1 / prior_tau^2

    post_mu <- a / b
    print(post_mu)
    post_sigma <- sqrt(1 / (n / sigma_0^2 + 1 / prior_tau^2))  # sd, not variance

    xx <- seq(-5, 5, by = 0.001)
    xx_prior <- xx * prior_tau + prior_mu
    xx_post <- xx * post_sigma + post_mu

    Y <- cbind(dnorm(xx_prior, mean = prior_mu, sd = prior_tau),
               dnorm(xx_post, mean = post_mu, sd = post_sigma))
    X <- cbind(xx_prior, xx_post)
    matplot(X, Y, type = "l", lty = 1, main = paste("n =", n))
    abline(v = x_bar, lty = 1)  # MLE = sample mean
  }
}
# Printed posterior means for the prior with mu = 2, tau = 2:
# n = 1: 1.957306    n = 2: 2.376356     n = 5: 1.561813
# n = 10: 2.718445   n = 100: 2.307785   n = 10000: 2.028544

[Figures: prior and posterior densities with the sample mean marked, for n = 1, 2, 5, 10, 100, 10000, under the prior with mu = 2, tau = 2]

# Printed posterior means for the prior with mu = 5, tau = 1:
# n = 1: 4.495602    n = 2: 4.879569     n = 5: 3.914996
# n = 10: 2.632301   n = 100: 2.529208   n = 10000: 2.069093

[Figures: as above, under the prior with mu = 5, tau = 1]

As n increases, π(θ|x₁, . . . , xₙ) concentrates around the true value θ = 2 and the influence of the prior vanishes: even the badly centred, confident prior (µ = 5, τ = 1) is eventually overwhelmed by the data, with the posterior mean moving from about 4.50 at n = 1 to about 2.07 at n = 10000.

Question 3

Suppose there is a Beta(4, 4) prior distribution on the probability θ that a coin will yield a “head” when
spun in a specified manner. The coin is independently spun ten times, and “heads” appear fewer than 3 times.
You are not told how many heads were seen, only that the number is less than 3. Calculate your exact
posterior density (up to a proportionality constant) for θ and plot it.
Prior density:
π(θ) ∝ θ³(1 − θ)³
Likelihood:

f(data|θ) = C(10, 0) θ⁰(1 − θ)¹⁰ + C(10, 1) θ¹(1 − θ)⁹ + C(10, 2) θ²(1 − θ)⁸
          = (1 − θ)¹⁰ + 10θ(1 − θ)⁹ + 45θ²(1 − θ)⁸

Posterior density:
π(θ|data) ∝ θ³(1 − θ)¹³ + 10θ⁴(1 − θ)¹² + 45θ⁵(1 − θ)¹¹

theta <- seq(0, 1, .01)
dens <- theta^3 * (1 - theta)^13 + 10 * theta^4 * (1 - theta)^12 + 45 * theta^5 * (1 - theta)^11
plot(theta, dens, ylim = c(0, 1.1 * max(dens)), type = "l",
     xlab = "theta", ylab = "", xaxs = "i", yaxs = "i", yaxt = "n")

[Figure: unnormalised posterior density of theta, plotted over (0, 1)]
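As a sanity check (my own illustration, not part of the original lab), the unnormalised posterior can be normalised numerically and its mean computed:

```r
# Unnormalised posterior from the derivation above
post <- function(theta) {
  theta^3 * (1 - theta)^13 + 10 * theta^4 * (1 - theta)^12 +
    45 * theta^5 * (1 - theta)^11
}

# Normalising constant by numerical integration over (0, 1)
Z <- integrate(post, 0, 1)$value

# Posterior mean E[theta | data]
post_mean <- integrate(function(t) t * post(t) / Z, 0, 1)$value
post_mean  # about 0.30
```

The posterior is a mixture of Beta(4, 14), Beta(5, 13) and Beta(6, 12) densities, so a mean around 0.3 is what conjugacy would lead us to expect.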

Question 4

Suppose your prior distribution for θ, the proportion of Californians who support the death penalty, is beta
with mean 0.6 and standard deviation 0.3.

a. Determine the parameters α and β of your prior distribution. Plot the prior density function.
b. A random sample of 1000 Californians is taken, and 65% support the death penalty. What are your
posterior mean and variance for θ? Plot the posterior density function.

α + β = E[θ](1 − E[θ])/Var(θ) − 1 = 0.6 × 0.4/0.3² − 1 = 1.67
α = (α + β) E[θ] = 1
β = (α + β)(1 − E[θ]) = 0.67
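Solving the moment equations numerically (a quick check of the arithmetic above):

```r
m <- 0.6   # prior mean
s <- 0.3   # prior standard deviation

ab <- m * (1 - m) / s^2 - 1   # alpha + beta
alpha <- ab * m
beta <- ab * (1 - m)
c(alpha = alpha, beta = beta)  # approximately 1 and 0.67
```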

theta <- seq(0, 1, .001)
dens <- dbeta(theta, 1, .67)
plot(theta, dens, xlim = c(0, 1), ylim = c(0, 3),
     type = "l", xlab = "theta", ylab = "", xaxs = "i",
     yaxs = "i", yaxt = "n", bty = "n", cex = 2)
lines(c(1, 1), c(0, 3), col = 0)   # blank out the solid line at theta = 1
lines(c(1, 1), c(0, 3), lty = 3)   # dotted line where the density is unbounded

[Figure: Beta(1, 0.67) prior density of theta over (0, 1)]
Posterior distribution:
π(θ|data) = Beta(α + 650, β + 350) = Beta(651, 350.67)
E(θ|data) = 0.6499
sd(θ|data) = 0.015

theta <- seq(0, 1, .001)
dens <- dbeta(theta, 651, 350.67)
cond <- dens / max(dens) > 0.001   # restrict the plot to where the density is non-negligible
plot(theta[cond], dens[cond],
     type = "l", xlab = "theta", ylab = "", xaxs = "i",
     yaxs = "i", yaxt = "n", bty = "n", cex = 2)

[Figure: Beta(651, 350.67) posterior density of theta, concentrated around 0.65]
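The quoted posterior mean and standard deviation can be checked directly from the Beta(651, 350.67) parameters:

```r
a <- 1 + 650       # alpha + number of supporters
b <- 0.67 + 350    # beta + number of non-supporters

post_mean <- a / (a + b)
post_sd <- sqrt(a * b / ((a + b)^2 * (a + b + 1)))
c(mean = post_mean, sd = post_sd)  # about 0.650 and 0.015
```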

Question 5

Ten Prussian cavalry corps were monitored for 20 years (200 corps-years) and the number of fatalities due to
horse kicks was recorded:

x = # Deaths    Number of Corps-Years with x Fatalities
0               109
1               65
2               22
3               3
4               1

Let xᵢ, i = 1, . . . , 200, be the number of deaths in observation i. Assume that the xᵢ are i.i.d. Poisson(θ).

a. Compute the MLE estimate θ̂MLE.

θ̂MLE = x̄ = 122/200 = 0.61
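The MLE is just the sample mean of the 200 observations, computed from the frequency table:

```r
deaths <- c(0, 1, 2, 3, 4)
years <- c(109, 65, 22, 3, 1)

n <- sum(years)                  # 200 corps-years
mle <- sum(deaths * years) / n   # sample mean
mle                              # 0.61
```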
Suppose θ ∼ Gamma(α, β).

a. What is the prior mean and variance?

E[θ] = α/β, Var[θ] = α/β²
b. What is the posterior distribution π(θ|x)?

Gamma(α + n x̄, β + n)

c. What is the posterior mean and variance.

E[θ|x] = (α + n x̄)/(β + n)

Var[θ|x] = (α + n x̄)/(β + n)²
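As a worked example (my own illustration), with the prior α = β = 1 the posterior is Gamma(1 + 122, 1 + 200) = Gamma(123, 201):

```r
alpha <- 1
beta <- 1
n <- 200
xbar <- 122 / 200   # sample mean from the data above

post_mean <- (alpha + n * xbar) / (beta + n)    # 123 / 201
post_var <- (alpha + n * xbar) / (beta + n)^2
c(mean = post_mean, var = post_var)
```

The posterior mean, about 0.612, sits between the MLE 0.61 and the prior mean α/β = 1, and with n = 200 it is dominated by the data.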

Plot on the same graphic π(θ), π(θ|x) and θ̂MLE for

a. α = β = 0.5
b. α = β = 1
c. α = β = 10
d. α = β = 100

n <- 200
DT <- data.frame(deaths = c(0, 1, 2, 3, 4), years = c(109, 65, 22, 3, 1))
xbar <- sum(DT[, 1] * DT[, 2]) / n

x <- seq(0, 2, by = 0.01)

for (case in c(1, 2, 3, 4)) {
  if (case == 1) {
    alpha <- beta <- 0.5
  } else if (case == 2) {
    alpha <- beta <- 1
  } else if (case == 3) {
    alpha <- beta <- 10
  } else if (case == 4) {
    alpha <- beta <- 100
  }

  # Prior and posterior densities
  dens <- dgamma(x, shape = alpha, rate = beta)
  alpha_posterior <- alpha + n * xbar
  beta_posterior <- beta + n
  dens_posterior <- dgamma(x, shape = alpha_posterior, rate = beta_posterior)

  matplot(x, cbind(dens, dens_posterior), lty = 1, type = "l",
          ylab = "Density", xlab = "theta")
  abline(v = xbar)  # MLE
}

[Figures: prior and posterior Gamma densities of theta over (0, 2), with the MLE x̄ = 0.61 marked, for α = β = 0.5, 1, 10, 100]
TURN IN

• Your .Rmd file


• Your Word (or pdf) file that results from knitting the Rmd.
• Make sure your group members are listed as authors, one person per group will turn in the report
• DUE: Wednesday after the lab, by 7am, loaded into moodle

Resources

• Lecture slides on Bayesian reasoning
