Lecture 8: Inference 36-401, Fall 2015, Section B
1 Sampling Distribution of β̂0, β̂1 and σ̂²
The Gaussian-noise simple linear regression model has three parameters: the intercept β0, the slope β1, and the noise variance σ². We've seen, previously, how to estimate all of these by maximum likelihood; the MLE for the βs is the same as their least-squares estimates. These are
\[
\hat{\beta}_1 = \frac{c_{XY}}{s_X^2} = \sum_{i=1}^{n} \frac{X_i - \bar{x}}{n s_X^2} Y_i \quad (1)
\]
\[
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \quad (2)
\]
\[
\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} \left( Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_i) \right)^2 \quad (3)
\]
We have also seen how to re-write the first two of these as a deterministic part plus a weighted sum of the noise terms εi:
\[
\hat{\beta}_1 = \beta_1 + \sum_{i=1}^{n} \frac{X_i - \bar{x}}{n s_X^2} \epsilon_i \quad (4)
\]
\[
\hat{\beta}_0 = \beta_0 + \frac{1}{n} \sum_{i=1}^{n} \left( 1 - \bar{x} \frac{X_i - \bar{x}}{s_X^2} \right) \epsilon_i \quad (5)
\]
Finally, we have our modeling assumption that the εi are independent Gaussians, εi ∼ N(0, σ²).
1.1 Reminders of Basic Properties of Gaussian Distributions
Suppose U ∼ N(µ, σ²). By the basic algebra of expectations and variances, E[a + bU] = a + bµ, while Var[a + bU] = b²σ². This would be true of any random variable; what is special to Gaussians is that the transformed variable is still Gaussian, a + bU ∼ N(a + bµ, b²σ²).
Suppose U1, U2, . . . , Un are independent Gaussians, with means µi and variances σ²i. Then
\[
\sum_{i=1}^{n} U_i \sim N\!\left( \sum_{i=1}^{n} \mu_i ,\; \sum_{i=1}^{n} \sigma_i^2 \right)
\]
That the expected values add up for a sum is true of all random variables; that the variances add
up is true for all uncorrelated random variables. That the sum follows the same type of distribution
as the summands is a special property of Gaussians.
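It's easy to see these facts at work in a quick simulation; the particular means and variances below are arbitrary choices, picked only for illustration.
# Simulation check: sums of independent Gaussians are Gaussian, with
# means and variances adding up (the particular values here are arbitrary)
mu <- c(1, -2, 0.5)
sigma.sq <- c(0.25, 1, 4)
sums <- replicate(1e5, sum(rnorm(3, mean=mu, sd=sqrt(sigma.sq))))
mean(sums)   # should be close to sum(mu) = -0.5
var(sums)    # should be close to sum(sigma.sq) = 5.25
hist(sums, breaks=50, freq=FALSE, main="")
curve(dnorm(x, mean=sum(mu), sd=sqrt(sum(sigma.sq))), add=TRUE, col="blue")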
# Simulate a Gaussian-noise simple linear regression model
# Inputs: x sequence; intercept; slope; noise variance; switch for whether to
#   return the simulated values, or run a regression and return the coefficients
# Output: data frame or coefficient vector
sim.gnslrm <- function(x, intercept, slope, sigma.sq, coefficients=TRUE) {
  n <- length(x)
  y <- intercept + slope*x + rnorm(n, mean=0, sd=sqrt(sigma.sq))
  if (coefficients) {
    return(coefficients(lm(y ~ x)))
  } else {
    return(data.frame(x=x, y=y))
  }
}
Figure 1: Code setting up a simulation of a Gaussian-noise simple linear regression model, along a fixed
vector of Xi values.
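The simulation code below all presumes that a fixed vector x of Xi values has already been set up; the original setup isn't shown here, and any fixed vector will do. For concreteness, one arbitrary choice:
# An arbitrary fixed vector of X values, held constant across simulations
# (the particular choice is unimportant)
x <- seq(from=-2, to=2, length.out=42)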
\[
\frac{\hat{\beta}_0 - \beta_0}{\sqrt{\frac{\sigma^2}{n}\left( 1 + \frac{\bar{x}^2}{s_X^2} \right)}} \sim N(0, 1)
\]
The right-hand side of this equation is admirably simple and easy for us to calculate, but the left-
hand side unfortunately involves two unknown parameters, and that complicates any attempt to
use it.
1.4 Sampling Distribution of σ̂²
It is mildly challenging, but certainly not too hard, to show that
\[
E\left[ \hat{\sigma}^2 \right] = \frac{n-2}{n} \sigma^2
\]
We can be much more specific. When εi ∼ N(0, σ²), it can be shown that
\[
\frac{n \hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2}
\]
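We can check this distributional claim by simulation, re-using sim.gnslrm and the fixed x from above; the intercept, slope and noise variance below are arbitrary. Since the MLE σ̂² is the mean squared residual, nσ̂²/σ² is just the residual sum of squares over σ².
# Simulation check (sketch): n * sigma.hat^2 / sigma^2 should be chi^2 with n-2 d.f.
sigma.sq <- 0.1
scaled.mles <- replicate(1e4, {
  sim.data <- sim.gnslrm(x=x, intercept=5, slope=-2, sigma.sq=sigma.sq,
                         coefficients=FALSE)
  fit <- lm(y ~ x, data=sim.data)
  sum(residuals(fit)^2) / sigma.sq   # = n * (MLE of sigma^2) / sigma^2
})
hist(scaled.mles, breaks=50, freq=FALSE, xlab=expression(n*hat(sigma)^2/sigma^2),
     main="")
n <- length(x)
curve(dchisq(x, df=n-2), add=TRUE, col="blue")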
[Figure 2 plot: histogram of the simulated slope estimates β̂1, on a density scale, with the theoretical Gaussian sampling distribution overlaid in blue; see the caption below.]
# Run the simulation 10,000 times and collect all the coefficients
# What intercept, slope and noise variance does this impose?
many.coefs <- replicate(1e4, sim.gnslrm(x=x, 5, -2, 0.1, coefficients=TRUE))
# Histogram of the slope estimates
hist(many.coefs[2,], breaks=50, freq=FALSE, xlab=expression(hat(beta)[1]),
main="")
# Theoretical Gaussian sampling distribution
theoretical.se <- sqrt(0.1/(length(x)*var(x)))
curve(dnorm(x,mean=-2,sd=theoretical.se), add=TRUE,
col="blue")
Figure 2: Simulating 10,000 runs of a Gaussian-noise simple linear regression model, calculating β̂1 each
time, and comparing the histogram of estimates to the theoretical Gaussian distribution (Eq. 8, in blue).
1.5 Standard Errors of β̂0 and β̂1
The standard error of an estimator is its standard deviation. We've just seen that the true standard errors of β̂0 and β̂1 are, respectively,
\[
\mathrm{se}\left[ \hat{\beta}_1 \right] = \frac{\sigma}{s_X \sqrt{n}} \quad (9)
\]
\[
\mathrm{se}\left[ \hat{\beta}_0 \right] = \frac{\sigma}{s_X \sqrt{n}} \sqrt{s_X^2 + \bar{x}^2} \quad (10)
\]
Unfortunately, these standard errors involve the unknown parameter σ² (or its square root σ, equally unknown to us).
We can, however, estimate the standard errors. The maximum-likelihood estimates just substitute σ̂ for σ:
\[
\widehat{\mathrm{se}}\left[ \hat{\beta}_1 \right] = \frac{\hat{\sigma}}{s_X \sqrt{n}} \quad (11)
\]
\[
\widehat{\mathrm{se}}\left[ \hat{\beta}_0 \right] = \frac{\hat{\sigma}}{s_X \sqrt{n}} \sqrt{s_X^2 + \bar{x}^2} \quad (12)
\]
For later theoretical purposes, however, things will work out slightly nicer if we use the de-biased version, (n/(n−2)) σ̂²:
\[
\widehat{\mathrm{se}}\left[ \hat{\beta}_1 \right] = \frac{\hat{\sigma}}{s_X \sqrt{n-2}} \quad (13)
\]
\[
\widehat{\mathrm{se}}\left[ \hat{\beta}_0 \right] = \frac{\hat{\sigma}}{s_X \sqrt{n-2}} \sqrt{s_X^2 + \bar{x}^2} \quad (14)
\]
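Since Eqs. 13 and 14 involve only things we can compute from the data, we can check them against the standard errors that lm reports; here is a sketch on one simulated data set (parameter values arbitrary, x re-used from above), being careful to use the divide-by-n versions of σ̂² and s²X that these notes use.
# Computing the estimated standard errors of Eqs. 13 and 14 by hand (sketch;
# data simulated with arbitrary parameter values, re-using x from above)
sim.data <- sim.gnslrm(x=x, intercept=5, slope=-2, sigma.sq=0.1,
                       coefficients=FALSE)
fit <- lm(y ~ x, data=sim.data)
n <- nrow(sim.data)
sigma.hat.sq <- mean(residuals(fit)^2)    # MLE of sigma^2 (divides by n)
s.sq.x <- mean((x - mean(x))^2)           # s_X^2, also dividing by n
se.slope <- sqrt(sigma.hat.sq) / (sqrt(s.sq.x) * sqrt(n-2))            # Eq. 13
se.intercept <- sqrt(sigma.hat.sq) * sqrt(s.sq.x + mean(x)^2) /
  (sqrt(s.sq.x) * sqrt(n-2))                                           # Eq. 14
# These should match the "Std. Error" column of lm's summary
cbind(by.hand = c(se.intercept, se.slope),
      from.lm = coefficients(summary(fit))[, "Std. Error"])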
These standard errors — approximate or estimated though they be — are one important way of quantifying how much uncertainty there is around our point estimates. However, we can't use them alone to say anything terribly precise about, say, the probability that β1 is in the interval [β̂1 − ŝe[β̂1], β̂1 + ŝe[β̂1]], which is the sort of thing we'd want in order to give guarantees about the reliability of our estimates.
2 Sampling distribution of (β̂ − β)/ŝe[β̂]
It should take only a little work with the properties of the Gaussian distribution to convince yourself
that
\[
\frac{\hat{\beta}_1 - \beta_1}{\mathrm{se}\left[ \hat{\beta}_1 \right]} \sim N(0, 1)
\]
the standard Gaussian distribution. If the Oracle told us σ², we'd know se[β̂1], and so we could assert that (for example)
\[
P\left( \beta_1 - 1.96\,\mathrm{se}\left[\hat{\beta}_1\right] \leq \hat{\beta}_1 \leq \beta_1 + 1.96\,\mathrm{se}\left[\hat{\beta}_1\right] \right) \quad (15)
\]
\[
= P\left( -1.96\,\mathrm{se}\left[\hat{\beta}_1\right] \leq \hat{\beta}_1 - \beta_1 \leq 1.96\,\mathrm{se}\left[\hat{\beta}_1\right] \right) \quad (16)
\]
\[
= P\left( -1.96 \leq \frac{\hat{\beta}_1 - \beta_1}{\mathrm{se}\left[\hat{\beta}_1\right]} \leq 1.96 \right) \quad (17)
\]
\[
= \Phi(1.96) - \Phi(-1.96) = 0.95 \quad (18)
\]
Of course, the Oracle tells us nothing, so we have to use ŝe[β̂1] in place of se[β̂1], and that changes the distribution. The key fact is the following.
Proposition 1 If Z ∼ N(0, 1) and S ∼ χ²_d, with Z and S independent, then Z/√(S/d) ∼ t_d, a t distribution with d degrees of freedom.
(I call this a proposition, but it's almost a definition of what we mean by a t distribution with d degrees of freedom. Of course, if we take this as the definition, the proposition that this distribution has a probability density ∝ (1 + x²/d)^{−(d+1)/2} would become yet another proposition to be demonstrated.)
Let's try to manipulate (β̂1 − β1)/ŝe[β̂1] into this form.
\[
\frac{\hat{\beta}_1 - \beta_1}{\widehat{\mathrm{se}}\left[\hat{\beta}_1\right]}
= \frac{\hat{\beta}_1 - \beta_1}{\sigma} \, \frac{\sigma}{\widehat{\mathrm{se}}\left[\hat{\beta}_1\right]}
= \frac{\frac{\hat{\beta}_1 - \beta_1}{\sigma}}{\frac{\widehat{\mathrm{se}}[\hat{\beta}_1]}{\sigma}}
= \frac{N(0, 1/n s_X^2)}{\frac{\hat{\sigma}}{\sigma s_X \sqrt{n-2}}}
= \frac{s_X \, N(0, 1/n s_X^2)}{\frac{\hat{\sigma}}{\sigma \sqrt{n-2}}}
\]
\[
= \frac{N(0, 1/n)}{\frac{\hat{\sigma}}{\sigma \sqrt{n-2}}}
= \frac{\sqrt{n}\, N(0, 1/n)}{\frac{\sqrt{n}\,\hat{\sigma}}{\sigma \sqrt{n-2}}}
= \frac{N(0, 1)}{\sqrt{\frac{n \hat{\sigma}^2}{\sigma^2} \frac{1}{n-2}}}
= \frac{N(0, 1)}{\sqrt{\chi^2_{n-2}/(n-2)}}
= t_{n-2}
\]
where in the last step I’ve used the proposition I stated (without proof) above.
To sum up:
Proposition 2 Using the ŝe[β̂1] of Eq. 13,
\[
\frac{\hat{\beta}_1 - \beta_1}{\widehat{\mathrm{se}}\left[\hat{\beta}_1\right]} \sim t_{n-2} \quad (19)
\]
Notice that we can compute ŝe[β̂1] without knowing any of the true parameters — it's a pure statistic, just a function of the data. This is a key to actually using the proposition for anything useful.
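As with Figure 2, it's easy to check Proposition 2 by simulation: simulate many data sets, form (β̂1 − β1)/ŝe[β̂1] from each (using the standard error that lm reports, which is Eq. 13), and compare the histogram with the t density with n − 2 degrees of freedom. A sketch; the small, made-up x vector and the parameter values here are arbitrary.
# Simulation check of Proposition 2 (sketch; x.few and parameters arbitrary)
x.few <- seq(from=-1, to=1, length.out=10)   # small n, so t and Gaussian differ
true.slope <- -2
t.stats <- replicate(1e4, {
  sim.data <- sim.gnslrm(x=x.few, intercept=5, slope=true.slope, sigma.sq=0.1,
                         coefficients=FALSE)
  est <- coefficients(summary(lm(y ~ x, data=sim.data)))["x", ]
  (est["Estimate"] - true.slope) / est["Std. Error"]
})
hist(t.stats, breaks=50, freq=FALSE, main="", xlab="t statistic for the slope")
curve(dt(x, df=length(x.few)-2), add=TRUE, col="blue")
curve(dnorm(x), add=TRUE, lty="dashed")   # standard Gaussian, for contrast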
By exactly parallel reasoning, we may also demonstrate that
\[
\frac{\hat{\beta}_0 - \beta_0}{\widehat{\mathrm{se}}\left[\hat{\beta}_0\right]} \sim t_{n-2}
\]
What about β0? By exactly parallel reasoning, a 1 − α confidence interval for β0 is [β̂0 − k(n, α) ŝe[β̂0], β̂0 + k(n, α) ŝe[β̂0]].
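Concretely, with k(n, α) the 1 − α/2 quantile of the t distribution with n − 2 degrees of freedom (which is what the code for Figure 3 below uses), we can build these intervals by hand and check them against R's confint function. A sketch on one arbitrary simulated data set, re-using x and sim.gnslrm from above.
# Building 1 - alpha confidence intervals by hand and checking against confint
# (sketch; the simulated data set is arbitrary, x re-used from above)
fit <- lm(y ~ x, data=sim.gnslrm(x=x, intercept=5, slope=-2, sigma.sq=0.1,
                                 coefficients=FALSE))
alpha <- 0.05
k <- qt(1 - alpha/2, df=length(x)-2)        # k(n, alpha)
ests <- coefficients(summary(fit))
by.hand <- cbind(lower = ests[, "Estimate"] - k * ests[, "Std. Error"],
                 upper = ests[, "Estimate"] + k * ests[, "Std. Error"])
by.hand
confint(fit, level=1-alpha)                 # should agree exactly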
What α should we use? It’s become conventional to set α = 0.05. To be honest, this owes more
to the fact that the resulting k tends to 1.96 as n → ∞, and 1.96 ≈ 2, and most psychologists and
economists could multiply by 2, even in 1950, than to any genuine principle of statistics or scientific
method. A 5% error rate corresponds to messing up about one working day in every month, which
you might well find high. On the other hand, there is nothing which stops you from increasing α.
It’s often illuminating to plot a series of confidence sets, at different values of α.
What about power? The coverage of a confidence set is the probability that it includes the
true parameter value. This is not, however, the only virtue we want in a confidence set; if it was, we
could just say “Every possible parameter is in the set”, and have 100% coverage no matter what.
We would also like the wrong values of the parameter to have a high probability of not being in the
set. Just as the coverage is controlled by the size / false-alarm probability / type-I error rate α of
the hypothesis test, the probability of excluding the wrong parameters is controlled by the power
/ miss probability / type-II error rate. Tests with higher power exclude (correctly) more parameter
values, and give smaller confidence sets.
2. If we have a way of constructing a 1 − α confidence set, we can use it to test the hypothesis
that β = β ∗ : reject when β ∗ is outside the confidence set, retain the null when β ∗ is inside
the set.
I will leave it as a pair of exercises (2 and 3) to show that inverting a test of size α gives a 1 − α confidence set, and that inverting a 1 − α confidence set gives a test of size α.
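Point 2 is easy to act on in code: to test β1 = β∗ at level α, just check whether β∗ falls inside the 1 − α confidence interval for the slope. A sketch, using a simulated fit; the helper function test.from.ci and the particular β∗ values are made up for illustration.
# Testing beta1 = beta.star by inverting a confidence interval (sketch)
# test.from.ci is a made-up helper, not a standard R function
test.from.ci <- function(fit, beta.star, alpha=0.05) {
  ci <- confint(fit, parm="x", level=1-alpha)   # interval for the slope
  if (beta.star < ci[1] || beta.star > ci[2]) "reject" else "retain"
}
fit <- lm(y ~ x, data=sim.gnslrm(x=x, intercept=5, slope=-2, sigma.sq=0.1,
                                 coefficients=FALSE))
test.from.ci(fit, beta.star=-2)   # the truth; retained about 95% of the time
test.from.ci(fit, beta.star=0)    # far from the truth; should be rejected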
As n → ∞, the t distribution converges to the standard Gaussian, so that
\[
\frac{\hat{\beta} - \beta}{\widehat{\mathrm{se}}\left[\hat{\beta}\right]} \rightarrow N(0, 1)
\]
which considerably simplifies the sampling intervals and confidence sets; as n grows, we can forget
about the t distribution and just use the standard Gaussian distribution. Figure 3 plots the
convergence of k(n, α) towards the k(∞, α) we’d get from the Gaussian approximation. As you
can see from the figure, by the time n = 100 — a quite small data set by modern standards — the difference between the t distribution and the standard Gaussian is pretty trivial.
[Figure 3 plot: k(n, α) against sample size n, for α = 0.01 (blue), α = 0.05 (black) and α = 0.5 (orange), with dashed horizontal lines at the corresponding Gaussian limits; see the code and caption below.]
curve(qt(0.995,df=x-2),from=3,to=1e4,log="x", ylim=c(0,10),
xlab="Sample size (n)", ylab=expression(k(n,alpha)),col="blue")
abline(h=qnorm(0.995),lty="dashed",col="blue")
curve(qt(0.975,df=x-2), add=TRUE)
abline(h=qnorm(0.975),lty="dashed")
curve(qt(0.75,df=x-2), add=TRUE, col="orange")
abline(h=qnorm(0.75), lty="dashed", col="orange")
legend("topright", legend=c(expression(alpha==0.01), expression(alpha==0.05),
expression(alpha==0.5)),
col=c("blue","black","orange"), lty="solid")
Figure 3: Convergence of k(n, α) as n → ∞, illustrated for α = 0.01, α = 0.05 and α = 0.5. (Why do I
plot the 97.5th percentile when I’m interested in α = 0.05?)
4 Statistical Significance: Uses and Abuses
4.1 p-Values
The test statistic for the Wald test,
\[
T = \frac{\hat{\beta}_1 - \beta_1^*}{\widehat{\mathrm{se}}\left[\hat{\beta}_1\right]}
\]
has the nice, intuitive property that it ought to be close to zero when the null hypothesis β1 = β1∗
is true, and take large values (either positive or negative) when the null hypothesis is false. When
a test statistic works like this, it makes sense to summarize just how bad the data looks for the
null hypothesis in a p-value: when our observed value of the test statistic is Tobs , the p-value is
\[
P = P\left( |T| \geq |T_{\mathrm{obs}}| \right) ,
\]
where the probability is calculated under the null hypothesis. (I write a capital P here as a reminder that
this is a random quantity, though it’s conventional to write the phrase “p-value” with a lower-case
p.) This is the probability, under the null, of getting results which are at least as extreme as what
we saw. It should be easy to convince yourself that rejecting the null in a level-α test is the same
as getting a p-value < α.
It is not too hard (Exercise 4) to show that P has a uniform distribution over [0, 1] under the
null hypothesis.
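Exercise 4 asks for the proof, but the phenomenon is easy to see by simulation: generate data where the null β1 = 0 really is true, collect the p-value that lm reports for the slope each time, and look at the histogram, which should be flat. (A sketch, with x and sim.gnslrm re-used from above and the other parameter values arbitrary.)
# Sketch: under the null beta1 = 0, the p-value should be Unif(0,1)
p.vals <- replicate(1e4, {
  sim.data <- sim.gnslrm(x=x, intercept=5, slope=0, sigma.sq=0.1,
                         coefficients=FALSE)
  coefficients(summary(lm(y ~ x, data=sim.data)))["x", "Pr(>|t|)"]
})
hist(p.vals, breaks=20, freq=FALSE, xlab="p-value", main="")
abline(h=1, col="blue")     # density of the Unif(0,1) distribution
mean(p.vals < 0.05)         # should be close to 0.05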
2. β1 ≠ 0, but ŝe[β̂1] is so large that we can't tell anything about β1 with any confidence.
There is a very big difference between data which lets us say “we can be quite confident that the
true β1 is, if not perhaps exactly 0, then very small”, and data which only lets us say “we have
no earthly idea what β1 is, and it may as well be zero for all we can tell.” It is good practice to
always compute a confidence interval, but it is especially important to do so when you retain the
null, so you know whether you can say “this parameter is zero to within such-and-such a (small)
precision”, or whether you have to admit “I couldn’t begin to tell you what this parameter is”.
Substantive vs. statistical significance Even a huge β1, which it would be crazy to ignore in any circumstance, can be statistically insignificant, so long as ŝe[β̂1] is large enough. Conversely, any β1 which isn't exactly zero, no matter how close it might be to 0, will become statistically significant at any threshold once ŝe[β̂1] is small enough. Since, as n → ∞,
\[
\widehat{\mathrm{se}}\left[\hat{\beta}_1\right] \rightarrow \frac{\sigma}{s_X \sqrt{n}} ,
\]
we can show that ŝe[β̂1] → 0, and β̂1/ŝe[β̂1] → ±∞, unless β1 is exactly 0 (see below).
Statistical significance is a weird mixture of how big the coefficient is, how big a sample we’ve got,
how much noise there is around the regression line, and how spread out the data is along the x axis.
This has so little to do with “significance” in ordinary language that it’s pretty unfortunate we’re
stuck with the word; if the Ancestors had decided to say “statistically detectable” or “statistically
distinguishable from 0”, we might have avoided a lot of confusion.
If you confuse substantive and statistical significance in this class, it will go badly for you.
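To see the mixture at work, here is a sketch with made-up numbers: a slope that is tiny compared to the noise becomes "statistically significant" once the sample is enormous, while a huge slope estimated from a handful of very noisy points typically is not.
# Sketch: statistical significance depends on n and noise, not just on beta1
# (all numbers below are made up for illustration)
# Tiny slope (0.01), huge sample, modest noise: the p-value will typically be tiny
x.big <- runif(1e6, min=-1, max=1)
y.big <- 10 + 0.01*x.big + rnorm(length(x.big), sd=1)
coefficients(summary(lm(y.big ~ x.big)))["x.big", c("Estimate", "Pr(>|t|)")]
# Huge slope (50), ten points, lots of noise: typically not "significant"
x.tiny <- runif(10, min=-1, max=1)
y.tiny <- 10 + 50*x.tiny + rnorm(length(x.tiny), sd=200)
coefficients(summary(lm(y.tiny ~ x.tiny)))["x.tiny", c("Estimate", "Pr(>|t|)")]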
Model checking Our statistical models often make very strong claims about the probability
distribution of the data, with little wiggle room. The simple linear regression model, for instance,
claims that the regression function is exactly linear, and that the noise around this line has exactly
constant variance. If we test these claims and find very small p-values, then we have evidence that
there’s a detectable, systematic departure from the model assumptions, and we should re-formulate
the model.
Actual scientific interest Some scientific theories make very precise predictions about coeffi-
cients. According to Newton, the gravitational force between two masses is inversely proportional
to the square of the distance between them, ∝ r−2 . The prediction is exactly ∝ r−2 , not ∝ r−1.99
nor ∝ r−2.05 . Measuring that exponent and finding even tiny departures from 2 would be big news,
if we had reason to think they were real and not just noise. One of the most successful theories
11
in physics, quantum electrodynamics, makes predictions about some properties of hydrogen atoms
with a theoretical precision of one part in a trillion; finding even tiny discrepancies between what
the theory predicts and what we estimate would force us to rethink lots of physics. Experiments to
detect new particles, like the Higgs boson, essentially boil down to hypothesis testing, looking for
deviations from theoretical predictions which should be exactly zero if the particle doesn’t exist.
Outside of the natural sciences, however, it is harder to find examples of interesting, exact null hypotheses which are, so to speak, "live options". The best I can come up with are theories of
economic growth and business cycles which predict that the share of national income going to labor
(as opposed to capital) should be constant over time. Otherwise, in the social sciences, there’s
usually little theoretical reason to think that certain regression coefficients should be exactly zero,
or exactly one, or anything else.
To get confidence intervals for the coefficients of a fitted linear model in R, use the confint function:
confint(object, level=0.95)
Here object is the name of the fitted model object, and level is the confidence level; if you
want 95% confidence, you can omit that argument. For instance:
library(gamair); data(chicago)
death.temp.lm <- lm(death ~ tmpd, data=chicago)
confint(death.temp.lm)
## 2.5 % 97.5 %
## (Intercept) 128.8783687 131.035734
## tmpd -0.3096816 -0.269607
confint(death.temp.lm, level=0.90)
## 5 % 95 %
## (Intercept) 129.0518426 130.8622598
## tmpd -0.3064592 -0.2728294
If you want p-values for the coefficients, those are conveniently computed as part of the summary
function:
coefficients(summary(death.temp.lm))
Notice how this actually gives us an array with four columns: the point estimate, the standard
error, the t statistic, and finally the p-value. Each row corresponds to a different coefficient of the
model. If we want, say, the p-value of the intercept, that’s
coefficients(summary(death.temp.lm))[1,4]
## [1] 0
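The p-value in that table is nothing exotic: it is the two-sided tail probability of the observed t statistic, computed under a t distribution with the residual degrees of freedom, so we can re-derive it by hand.
# Re-computing the Wald-test p-value for the slope coefficient by hand
coef.table <- coefficients(summary(death.temp.lm))
t.obs <- coef.table["tmpd", "t value"]
2 * pt(-abs(t.obs), df=death.temp.lm$df.residual)  # two-sided tail probability
coef.table["tmpd", "Pr(>|t|)"]                     # same thing, from the table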
The summary function will also print out a lot of information about the model:
summary(death.temp.lm)
##
## Call:
## lm(formula = death ~ tmpd, data = chicago)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.275 -9.018 -0.754 8.187 305.952
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 129.95705 0.55023 236.19 <2e-16 ***
## tmpd -0.28964 0.01022 -28.34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.22 on 5112 degrees of freedom
## Multiple R-squared: 0.1358,Adjusted R-squared: 0.1356
## F-statistic: 803.1 on 1 and 5112 DF, p-value: < 2.2e-16
##
A more compact version of the same output, with the significance stars turned off and fewer digits printed:
## Call:
## lm(formula = death ~ tmpd, data = chicago)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.27 -9.02 -0.75 8.19 305.95
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 129.9571 0.5502 236.2 <2e-16
## tmpd -0.2896 0.0102 -28.3 <2e-16
##
## Residual standard error: 14.2 on 5112 degrees of freedom
## Multiple R-squared: 0.136,Adjusted R-squared: 0.136
## F-statistic: 803 on 1 and 5112 DF, p-value: <2e-16
5.1 Coverage of the Confidence Intervals: A Demo
Here is a little computational demonstration of how the confidence interval for a parameter is
itself a random interval, and how it covers the true parameter value with the probability we want.
I’ll repeat many simulations of the model from Figure 2, calculate the confidence interval on each
simulation, and plot those. I’ll also keep track of how often, in the first m simulations, the confidence
interval covers the truth; this should converge to 1 − α as m grows.
Exercises
To think through or to practice on, not to hand in.
1. (a) Find a formula for the 1 − α sampling interval for σ̂², in terms of the CDF of the χ²_{n−2} distribution, α, n and σ². (Some of these might not appear in your answer.) Is the width of your sampling interval the same for all σ², the way the width of the sampling interval for β̂1 doesn't change with β1?
(b) Fix α = 0.05, n = 40, and plot the sampling intervals against σ².
(c) Find a formula for the 1 − α confidence interval for σ², in terms of σ̂², the CDF of the χ²_{n−2} distribution, α and n.
2. Suppose we start from a way of testing the hypothesis β = β∗ which can be applied to any β∗, and which has size (false alarm / type I error) probability α for each β∗. Show that the set of β∗ retained by these tests is a confidence set, with confidence level 1 − α. What happens if the size is ≤ α for all β∗ (rather than exactly α)?
3. Suppose we start from a way of creating confidence sets which we know has confidence level
1 − α. We test the hypothesis β = β ∗ by rejecting when β ∗ is outside the confidence set, and
retaining when β ∗ is inside the confidence set. Show that the size of this test is α. What
happens if the initial confidence level is ≥ 1 − α, rather than exactly 1 − α?
4. Prove that the p-value P is uniformly distributed under the null hypothesis. You may,
throughout, assume that the test statistic T has a continuous distribution.
(a) Show that if Q ∼ Unif(0, 1), then P = 1 − Q has the same distribution.
(b) Let X be a continuous random variable with CDF F . Show that F (X) ∼ Unif(0, 1).
Hint: the CDF of the Unif(0, 1) distribution is F_{Unif(0,1)}(x) = x for 0 ≤ x ≤ 1.
(c) Show that P, as defined, is 1 − F_{|T|}(|T_obs|).
(d) Using the previous parts, show that P ∼ Unif(0, 1).
5. Use Eq. ?? to show Eq. ??, following the derivation of Eq. ??.
[Plot: the first 100 simulated confidence intervals for the slope, with confidence limits for the slope (roughly −2.05 to −1.95) on the vertical axis and simulation number (1 to 100) on the horizontal axis, and a grey horizontal line at the true slope of −2; produced by the code below.]
# Run 1000 simulations and get the confidence interval from each
CIs <- replicate(1000, confint(lm(y~x,data=sim.gnslrm(x=x,5,-2,0.1,FALSE)))[2,])
# Plot the first 100 confidence intervals; start with the lower limits
plot(1:100, CIs[1,1:100], ylim=c(min(CIs),max(CIs)),
xlab="Simulation number", ylab="Confidence limits for slope")
# Now the upper limits
points(1:100, CIs[2,1:100])
# Draw line segments connecting them
segments(x0=1:100, x1=1:100, y0=CIs[1,1:100], y1=CIs[2,1:100], lty="dashed")
# Horizontal line at the true coefficient value
abline(h=-2, col="grey")
[Plot: cumulative sample coverage proportion (vertical axis, 0 to 1) against the number of simulations (horizontal axis), with a grey horizontal line at the nominal coverage of 0.95; produced by the code below.]
# For each simulation, check whether the interval covered the truth
covered <- (CIs[1,] <= -2) & (CIs[2,] >= -2)
# Calculate the cumulative proportion of simulations where the interval
# contained the truth, plot vs. number of simulations.
plot(1:length(covered), cumsum(covered)/(1:length(covered)),
xlab="Number of simulations",
ylab="Sample coverage proportion", ylim=c(0,1))
abline(h=0.95, col="grey")