MATH3714 Jan 2024

The document outlines the examination details for the MATH371401 module on Linear Regression and Robustness at the University of Leeds, including calculator and dictionary usage, exam duration, and structure. It contains specific questions related to linear regression models, statistical outputs, and robust statistical methods. Additionally, it provides instructions for showing calculations and using provided statistical tables.

Module Code: MATH371401

Module Title: Linear Regression and Robustness

©UNIVERSITY OF LEEDS

School of Mathematics, Semester One 2023/24

Calculator instructions:

• You are allowed to use a non-programmable calculator in this examination.

Dictionary instructions:

• You are not allowed to use your own dictionary in this exam. A basic English dictionary
is available to use. Raise your hand and ask an invigilator if you need it.

Exam information:

• There are 9 pages to this examination.

• There will be 2 hours 30 minutes to complete this examination.

• This examination is worth 80% of the module mark.

• This is an Open Book Examination. You are allowed to bring one A4 sheet of notes with
you into the examination. You may write your notes on both sides of the paper. Your
notes may be printed or handwritten.

• There are four questions in this exam paper. You must answer all four questions.

• The numbers in brackets indicate the marks available for each question.

• Statistical tables are attached.

• You must show all your calculations.

• You must write your answers in the answer booklet provided. If you require an additional
answer booklet, raise your hand so an invigilator can provide one.

• You must clearly state your name and Student ID Number in the relevant boxes on your
answer booklet. Other boxes may be left blank.


1. The linear regression model in matrix notation is given by

$$y = X\beta + \varepsilon,$$

where $y \in \mathbb{R}^n$, $\beta \in \mathbb{R}^{p+1}$, $X \in \mathbb{R}^{n \times (p+1)}$ and $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$. An intercept is used, which is represented by a column of ones in the design matrix $X$. We assume that $X^\top X$ is invertible.

(a) Let $H$ be the hat matrix. Give the definition of $H$ and show that the fitted values satisfy $\hat y = Hy$. [2]
(b) Show that $H$ is idempotent. [2]
(c) Show that $H$ is symmetric. [2]
(d) Show that the residuals satisfy $\hat\varepsilon = (I - H)y$. [2]
(e) Give a short proof of the fact that $\hat\varepsilon^\top \hat y = 0$. [3]
(f) State the distributions of $y$, $\hat y$ and $\hat\varepsilon$. (No proofs are required for this part.) [3]
(g) Let $v \in \mathbb{R}^n$ be a fixed vector. Determine the distribution of $v^\top \hat\varepsilon$. [2]
(h) Determine the expectation $\mathbb{E}\bigl(\hat y^\top \hat y\bigr)$. Explicitly state any results from lectures you may use. [4]
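The algebraic properties in parts (a) to (e) can be checked numerically on a small example. The following is a minimal sketch, assuming simple regression (p = 1) and the standard definition H = X (XᵀX)⁻¹ Xᵀ; the data values and helper names are made up for illustration, and a numerical check is not a substitute for the proofs asked for.

```python
# Numerical check of hat-matrix properties for simple regression (p = 1).
# The data are hypothetical; only the Python standard library is used.

def hat_matrix(x):
    """Return H = X (X^T X)^{-1} X^T for the design matrix with rows (1, x_i)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx  # determinant of the 2x2 matrix X^T X
    # Explicit inverse of X^T X = [[n, sx], [sx, sxx]]
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    rows = [(1.0, v) for v in x]
    return [[sum(a[r] * inv[r][c] * b[c] for r in range(2) for c in range(2))
             for b in rows] for a in rows]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def matmul(a, b):
    """Matrix-matrix product."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]

x = [1.0, 2.0, 3.0, 4.0, 6.0]
y = [1.2, 1.9, 3.2, 3.8, 6.1]
n = len(x)

H = hat_matrix(x)
fitted = matvec(H, y)                           # y_hat = H y
resid = [yi - fi for yi, fi in zip(y, fitted)]  # eps_hat = (I - H) y
HH = matmul(H, H)

# Largest entrywise deviations from idempotence (HH = H) and symmetry (H = H^T)
max_idempotent_gap = max(abs(HH[i][j] - H[i][j]) for i in range(n) for j in range(n))
max_symmetry_gap = max(abs(H[i][j] - H[j][i]) for i in range(n) for j in range(n))
inner_product = sum(r * f for r, f in zip(resid, fitted))  # eps_hat^T y_hat
trace_H = sum(H[i][i] for i in range(n))                   # p + 1 = 2 in this example

print(max_idempotent_gap, max_symmetry_gap, inner_product, trace_H)
```

All three gaps should be zero up to floating-point rounding, and the trace should equal the number of columns of X.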


2. Consider the following R output, for fitting the model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \varepsilon$$

to data:
Call:
lm(formula = y ~ x1 + x2 + x3 + x4)

Residuals:
    Min      1Q  Median      3Q     Max
-2.2495 -0.6392 -0.1275  0.8969  1.9140

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.11799    0.55602   0.212    0.834
x1          -0.10824    0.07753  -1.396    0.176
x2           0.40373    0.33949   1.189    0.246
x3           3.32964    0.31410  10.601 2.51e-10 ***
x4           3.25805    0.08065  40.398  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.126 on 23 degrees of freedom
Multiple R-squared:  0.9897,  Adjusted R-squared:  0.9879
F-statistic: 553.1 on 4 and 23 DF,  p-value: < 2.2e-16

(a) Using the information from the R output, test the hypothesis $H_0\colon \beta_1 = -1$ against $H_1\colon \beta_1 \neq -1$ at the 5% significance level. Clearly state your conclusion. [4]
(b) Determine a 95% confidence interval for $\beta_2$. [4]
(c) Assume that we want to test

$$H_0\colon \beta_2 = 1 \text{ and } \beta_3 = \beta_4 \quad \text{against} \quad H_1\colon \beta_2 \neq 1 \text{ or } \beta_3 \neq \beta_4.$$

i. How is the test statistic calculated for this test? [3]
ii. What is the distribution of the test statistic under $H_0$? [2]
iii. Which part of the R output above can be used when computing the test statistic? Which additional information (not shown in the output above) is required? [2]
(d) Finally, assume that the error variance $\sigma^2$ is known exactly.
i. State the distribution of $\hat\beta_i$. [2]
ii. From this, derive the formula for a 95% confidence interval for $\beta_i$ in terms of the error variance $\sigma^2$. [3]
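The arithmetic behind parts (a) and (b) can be sketched in Python as a check on hand calculations. The estimates and standard errors are copied from the R output; the critical value 2.069 for the 0.975-quantile of t(23) is an assumption taken from standard tables, since the attached t-table lists only ν = 20 (2.086) and ν = 25 (2.060). The variable names are illustrative.

```python
# Sketch of the t-test and confidence interval arithmetic for Question 2.
# T_CRIT is an assumed table value for t(23); it is not in the attached table.

T_CRIT = 2.069  # assumed 0.975-quantile of the t-distribution with 23 d.f.

# Part (a): test H0: beta_1 = -1 using the estimate and standard error of x1
est_b1, se_b1 = -0.10824, 0.07753
t_stat = (est_b1 - (-1.0)) / se_b1
reject_h0 = abs(t_stat) > T_CRIT

# Part (b): 95% confidence interval for beta_2
est_b2, se_b2 = 0.40373, 0.33949
ci_b2 = (est_b2 - T_CRIT * se_b2, est_b2 + T_CRIT * se_b2)

print(round(t_stat, 2), reject_h0)
print(tuple(round(v, 3) for v in ci_b2))
```

Under these assumptions the test statistic is about 11.5, well above the critical value, and the interval for β2 is roughly (−0.299, 1.106).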


3. A pack of food is taken from the freezer for thawing, and the temperature is repeatedly
measured, after 1, 2, . . . , 20 minutes. Some of the results are given in the following
table:

t [min]      1      2      3    ···     20
y [°C]   -16.2  -14.8  -13.9    ···    2.9

(a) A linear model is fitted to these data. The following figure shows a residual plot (left) and a Q-Q plot (right) for the fitted model. [4]

[Figure: left panel, residuals resid(m) plotted against fitted values fitted(m); right panel, normal Q-Q plot of sample quantiles against theoretical quantiles.]

Discuss relevant features seen in these plots. Which problem with the model fit do these plots indicate?
(b) In this model, how would the parameter estimate $\hat\beta$ change if time was measured in hours (instead of minutes) and temperature in Fahrenheit (instead of Celsius)? To transform between degrees Celsius and degrees Fahrenheit, note that $c$ °C corresponds to $f$ °F, when $f = 1.8\,c + 32$. [4]
(c) Based on results from physics, a good way to model the temperature might be to use the model

$$y = T_{\text{room}} - A e^{-kt},$$

where $T_{\text{room}}$ is the known room temperature. How can a linear model be used to estimate the unknown constants $A$ and $k$ from data? [4]
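One common way to linearise the model in part (c) is to take logarithms: since T_room − y = A e^{−kt}, we get log(T_room − y) = log A − k t, which is a straight line in t. The sketch below illustrates this on synthetic, noise-free data; the values of T_ROOM, A_TRUE and K_TRUE are hypothetical and not taken from the exam, and the transformation assumes every measurement lies below room temperature.

```python
import math

def fit_line(t, z):
    """Least squares fit of z = intercept + slope * t (simple regression)."""
    n = len(t)
    tbar = sum(t) / n
    zbar = sum(z) / n
    stt = sum((ti - tbar) ** 2 for ti in t)
    slope = sum((ti - tbar) * (zi - zbar) for ti, zi in zip(t, z)) / stt
    return zbar - slope * tbar, slope

# Hypothetical constants, used only to generate illustrative data
T_ROOM = 20.0
A_TRUE, K_TRUE = 37.0, 0.05

t = [float(i) for i in range(1, 21)]
y = [T_ROOM - A_TRUE * math.exp(-K_TRUE * ti) for ti in t]

# Transformed response: log(T_room - y) = log(A) - k * t (needs y < T_room)
z = [math.log(T_ROOM - yi) for yi in y]

intercept, slope = fit_line(t, z)
A_hat = math.exp(intercept)  # intercept estimates log(A)
k_hat = -slope               # slope estimates -k

print(A_hat, k_hat)
```

With real, noisy measurements the recovered values would only approximate the constants, and the log transform changes the error structure, so the residuals of the transformed fit would need checking.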


We now return to the general linear regression problem, instead of the specific example above. Let $\hat\beta^{(i)} \in \mathbb{R}^{p+1}$ be the estimate of $\beta$, computed from the data after removing the $i$th observation. It can be shown that

$$\hat\beta^{(i)} - \hat\beta = -(X^\top X)^{-1} x_i \, \frac{\hat\varepsilon_i}{1 - h_{ii}},$$

where $x_i$ is the $i$th row of the design matrix $X$ and $h_{ii}$ is the $i$th diagonal element of the hat matrix $H$. (You don't need to prove this.)

(d) Let $\hat y_i^{(i)}$ be the fitted value for $x_i$ computed from the model fitted after removing the $i$th observation. Show that [4]

$$\hat y_i^{(i)} - \hat y_i = -\hat\varepsilon_i \, \frac{h_{ii}}{1 - h_{ii}}.$$

(e) The Prediction Error Sum of Squares (PRESS) is defined as [4]

$$\mathrm{PRESS} = \sum_{i=1}^{n} \bigl( y_i - \hat y_i^{(i)} \bigr)^2.$$

Show that the PRESS can be computed from the residuals as

$$\mathrm{PRESS} = \sum_{i=1}^{n} \frac{\hat\varepsilon_i^2}{(1 - h_{ii})^2}.$$
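The PRESS identity in part (e) can be illustrated numerically by comparing an explicit leave-one-out computation with the residual-based shortcut. This is a minimal sketch for simple regression, where the leverages take the standard form h_ii = 1/n + (x_i − x̄)² / Σ_j (x_j − x̄)²; the data and function names are invented for illustration.

```python
def fit_line(x, y):
    """Least squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    b1 = sum((v - xbar) * (w - ybar) for v, w in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 9.0]
y = [2.1, 2.9, 4.2, 4.8, 6.3, 7.9, 10.4]
n = len(x)

# Full-data fit, residuals and leverages
b0, b1 = fit_line(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
xbar = sum(x) / n
sxx = sum((v - xbar) ** 2 for v in x)
h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

# Direct PRESS: refit without observation i, then predict y_i
press_direct = 0.0
for i in range(n):
    a0, a1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    press_direct += (y[i] - (a0 + a1 * x[i])) ** 2

# Shortcut PRESS: uses only the full-data residuals and leverages
press_shortcut = sum((e / (1.0 - hi)) ** 2 for e, hi in zip(resid, h))

print(press_direct, press_shortcut)
```

The two values should agree up to rounding, while the shortcut needs one model fit instead of n + 1.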


4. (a) For a small dataset with n = 20 observations and p = 2 inputs, a linear model
with intercept is fitted using least squares regression. The following table shows the
diagonal elements of the hat matrix, Cook’s D values, residuals and studentized
residuals for each observation:
 i    h_ii     D_i     ε̂_i     r_i
 1   0.065   0.001   0.075   0.227
 2   0.126   0.000  -0.013  -0.040
 3   0.245   0.011  -0.097  -0.324
 4   0.205   0.004   0.064   0.208
 5   0.150   0.000  -0.001  -0.004
 6   0.083   0.011   0.199   0.605
 7   0.837   1.371  -0.124  -0.895
 8   0.054   0.008   0.213   0.639
 9   0.060   0.015  -0.279  -0.837
10   0.051   0.040   0.495   1.479
11   0.069   0.046   0.452   1.365
12   0.145   0.555  -0.993  -3.130
13   0.052   0.019  -0.342  -1.023
14   0.118   0.030  -0.265  -0.821
15   0.128   0.003  -0.076  -0.236
16   0.086   0.025   0.291   0.887
17   0.101   0.014   0.197   0.606
18   0.074   0.007   0.166   0.501
19   0.159   0.006  -0.095  -0.302
20   0.189   0.014   0.133   0.429
i. Identify any outliers in this dataset. Justify your answer. [4]
ii. The studentized residual for the $i$th observation is defined as [4]

$$r_i = \frac{\hat\varepsilon_i}{\sqrt{\hat\sigma^2 (1 - h_{ii})}}.$$

Explain the motivation of this definition and discuss the advantages of utilizing studentized residuals over ordinary residuals, $\hat\varepsilon_i$.
(b) In a few sentences, explain what an M estimator is, and how M estimators address the issue of x-space and y-space outliers. [3]
(c) Explain two reasons why robust statistical methods are often not used by default, and are instead only utilized when the presence of outliers is suspected in the data. [4]
(d) For $n$ data points $(x_i, y_i)$, the Theil-Sen estimator calculates the $n(n-1)/2$ slopes between all pairs of points $(x_i, y_i)$ and $(x_j, y_j)$ with $i < j$, and then takes the median slope as a robust estimate of the slope of the regression line in simple regression. We denote the median slope by $\hat\beta$.

i. Show that by changing $k := \lceil (1 - 1/\sqrt{2})\, n \rceil$ of the points, the value of $\hat\beta$ can be made arbitrarily large. (The symbol $\lceil x \rceil$ denotes the smallest integer greater than or equal to $x$.) [3]
ii. What does the result from part (i) mean for the breakdown point of the estimator $\hat\beta$? [2]
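The behaviour described in part (d) can be illustrated on a toy dataset. The sketch below computes the median of all pairwise slopes and then corrupts the points with the largest x-values by moving them to (x, big · x); this particular corruption scheme, the data, and the function names are assumptions chosen for illustration. The example shows one case only and is not a proof.

```python
import math
from itertools import combinations
from statistics import median

def theil_sen_slope(points):
    """Median of the slopes between all pairs of points (distinct x assumed)."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(points, 2)]
    return median(slopes)

def corrupt(points, k, big):
    """Replace the k points with the largest x-values by (x, big * x)."""
    pts = sorted(points)
    keep = pts[:len(pts) - k]
    moved = [(x, big * x) for x, _ in pts[len(pts) - k:]]
    return keep + moved

n = 10
clean = [(float(i), 2.0 * i + 1.0) for i in range(1, n + 1)]  # exact slope 2

# Number of points to change, as in part (i)
k = math.ceil((1 - 1 / math.sqrt(2)) * n)

slope_clean = theil_sen_slope(clean)
slope_fewer = theil_sen_slope(corrupt(clean, k - 1, 1e6))  # k - 1 points moved
slope_k = theil_sen_slope(corrupt(clean, k, 1e6))          # k points moved

print(k, slope_clean, slope_fewer, slope_k)
```

For this example, moving k − 1 points leaves the median slope unchanged, whereas moving k points makes more than half of the pairwise slopes grow with `big`, so the median grows with it.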

End of questions



Normal Distribution Function Tables

The first table gives

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2} t^2} \, dt,$$

which is the area under the standard normal density to the left of $x$. $\Phi(x)$ is the probability that a random variable, normally distributed with zero mean and unit variance, will be less than or equal to $x$. When $x < 0$ use $\Phi(x) = 1 - \Phi(-x)$, as the normal distribution with mean zero is symmetric about zero. To interpolate, use the formula

$$\Phi(x) \approx \Phi(x_1) + \frac{x - x_1}{x_2 - x_1} \bigl( \Phi(x_2) - \Phi(x_1) \bigr).$$
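As a small worked example of the interpolation formula, the sketch below approximates Φ(1.02) from the tabulated values Φ(1.00) = 0.8413 and Φ(1.05) = 0.8531, and compares the result with a direct evaluation through the error function. The choice of x = 1.02 and the function name are illustrative.

```python
import math

def interpolate_phi(x, x1, phi1, x2, phi2):
    """Linear interpolation between two tabulated values of Phi."""
    return phi1 + (x - x1) / (x2 - x1) * (phi2 - phi1)

# Tabulated neighbours of x = 1.02 in Table 1
approx = interpolate_phi(1.02, 1.00, 0.8413, 1.05, 0.8531)

# Reference value: Phi(x) = (1 + erf(x / sqrt(2))) / 2
exact = 0.5 * (1.0 + math.erf(1.02 / math.sqrt(2.0)))

print(round(approx, 5), round(exact, 5))
```

The interpolated value is about 0.8460, which matches the direct evaluation to roughly three decimal places.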

Table 1

x Φ(x) x Φ(x) x Φ(x) x Φ(x) x Φ(x) x Φ(x)

0.00 0.5000 0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938
0.05 0.5199 0.55 0.7088 1.05 0.8531 1.55 0.9394 2.05 0.9798 2.55 0.9946
0.10 0.5398 0.60 0.7257 1.10 0.8643 1.60 0.9452 2.10 0.9821 2.60 0.9953
0.15 0.5596 0.65 0.7422 1.15 0.8749 1.65 0.9505 2.15 0.9842 2.65 0.9960
0.20 0.5793 0.70 0.7580 1.20 0.8849 1.70 0.9554 2.20 0.9861 2.70 0.9965

0.25 0.5987 0.75 0.7734 1.25 0.8944 1.75 0.9599 2.25 0.9878 2.75 0.9970
0.30 0.6179 0.80 0.7881 1.30 0.9032 1.80 0.9641 2.30 0.9893 2.80 0.9974
0.35 0.6368 0.85 0.8023 1.35 0.9115 1.85 0.9678 2.35 0.9906 2.85 0.9978
0.40 0.6554 0.90 0.8159 1.40 0.9192 1.90 0.9713 2.40 0.9918 2.90 0.9981
0.45 0.6736 0.95 0.8289 1.45 0.9265 1.95 0.9744 2.45 0.9929 2.95 0.9984

0.50 0.6915 1.00 0.8413 1.50 0.9332 2.00 0.9772 2.50 0.9938 3.00 0.9987

Table 2. The inverse function $\Phi^{-1}(p)$ is tabulated below for various values of $p$.

p           0.900   0.950   0.975   0.990   0.995   0.999   0.9995
Φ⁻¹(p)     1.2816  1.6449  1.9600  2.3263  2.5758  3.0902  3.2905


Quantiles of the t-Distribution

This table gives the $\alpha$-quantiles $q_\alpha$ for the $t(\nu)$-distribution with various degrees of freedom $\nu$, and for various values of $\alpha$, where $P(T \le q_\alpha) = \alpha$. Quantiles for the lower tail can be found using the symmetry $P(T \le -q) = P(T \ge q) = 1 - P(T \le q)$. The limiting distribution of $t(\nu)$ as $\nu \to \infty$ is the standard normal distribution.

ν α = 0.9 α = 0.95 α = 0.975 α = 0.99 α = 0.995


1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
25 1.316 1.708 2.060 2.485 2.787
30 1.310 1.697 2.042 2.457 2.750
35 1.306 1.690 2.030 2.438 2.724
40 1.303 1.684 2.021 2.423 2.704
45 1.301 1.679 2.014 2.412 2.690
50 1.299 1.676 2.009 2.403 2.678
60 1.296 1.671 2.000 2.390 2.660
80 1.292 1.664 1.990 2.374 2.639
100 1.290 1.660 1.984 2.364 2.626


Quantiles of the χ²-Distribution

This table gives the $\alpha$-quantiles $q_\alpha$ for the $\chi^2(\nu)$-distribution with various degrees of freedom $\nu$, and for various values of $\alpha$, where $P(Y \le q_\alpha) = \alpha$. If $Y \sim \chi^2(\nu)$ for $\nu > 100$, then $\sqrt{2Y}$ is approximately normally distributed with mean $\sqrt{2\nu - 1}$ and unit variance.
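The normal approximation stated above can be turned into an approximate quantile: if √(2Y) is roughly N(√(2ν − 1), 1), then q_α ≈ (z_α + √(2ν − 1))² / 2, where z_α = Φ⁻¹(α). The sketch below evaluates this at ν = 100, the boundary of the stated range, so that it can be compared with the last row of the table; the function name is illustrative and z_α = 1.6449 is the tabulated value of Φ⁻¹(0.95).

```python
import math

def chi2_quantile_approx(nu, z_alpha):
    """Approximate chi-squared quantile from sqrt(2Y) ~ N(sqrt(2*nu - 1), 1)."""
    return 0.5 * (z_alpha + math.sqrt(2 * nu - 1)) ** 2

# alpha = 0.95 with z_alpha = 1.6449; the tabulated value for nu = 100 is 124.342
q_approx = chi2_quantile_approx(100, 1.6449)
relative_error = abs(q_approx - 124.342) / 124.342

print(round(q_approx, 2), round(relative_error, 4))
```

At ν = 100 the approximation gives about 124.06, within about 0.3% of the tabulated value.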

ν α = 0.025 α = 0.9 α = 0.95 α = 0.975 α = 0.99 α = 0.995


1 0.001 2.706 3.841 5.024 6.635 7.879
2 0.051 4.605 5.991 7.378 9.210 10.597
3 0.216 6.251 7.815 9.348 11.345 12.838
4 0.484 7.779 9.488 11.143 13.277 14.860
5 0.831 9.236 11.070 12.833 15.086 16.750
6 1.237 10.645 12.592 14.449 16.812 18.548
7 1.690 12.017 14.067 16.013 18.475 20.278
8 2.180 13.362 15.507 17.535 20.090 21.955
9 2.700 14.684 16.919 19.023 21.666 23.589
10 3.247 15.987 18.307 20.483 23.209 25.188
11 3.816 17.275 19.675 21.920 24.725 26.757
12 4.404 18.549 21.026 23.337 26.217 28.300
13 5.009 19.812 22.362 24.736 27.688 29.819
14 5.629 21.064 23.685 26.119 29.141 31.319
15 6.262 22.307 24.996 27.488 30.578 32.801
16 6.908 23.542 26.296 28.845 32.000 34.267
17 7.564 24.769 27.587 30.191 33.409 35.718
18 8.231 25.989 28.869 31.526 34.805 37.156
19 8.907 27.204 30.144 32.852 36.191 38.582
20 9.591 28.412 31.410 34.170 37.566 39.997
25 13.120 34.382 37.652 40.646 44.314 46.928
30 16.791 40.256 43.773 46.979 50.892 53.672
35 20.569 46.059 49.802 53.203 57.342 60.275
40 24.433 51.805 55.758 59.342 63.691 66.766
45 28.366 57.505 61.656 65.410 69.957 73.166
50 32.357 63.167 67.505 71.420 76.154 79.490
60 40.482 74.397 79.082 83.298 88.379 91.952
80 57.153 96.578 101.879 106.629 112.329 116.321
100 74.222 118.498 124.342 129.561 135.807 140.169

End of paper
