MATH3806 Project 1
Foreword
All computations in this project were done in Python, using the numpy,
pandas, matplotlib, and scipy.stats packages. The equations documented
here are the ones required to compute the results. Some computations,
such as finding the optimal power transformation, rely on the algorithms
implemented in the respective packages. Thank you in advance for your
time and effort.
Question 1
a.
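Mean:
\[ \begin{bmatrix} 1 & -1 \end{bmatrix} \bar{x} = 19.272 - 9.42 = 9.852 \]
Variance:
\[ \begin{bmatrix} 1 & -1 \end{bmatrix} S \begin{bmatrix} 1 \\ -1 \end{bmatrix} = s_1^2 + s_2^2 - 2\,\mathrm{cov}(x_1, x_2) = 14.139166666666666 + 62.23876666666666 - 2(13.472667) = 49.4326 \]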
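As a quick check of the arithmetic above, the linear-combination formulas can be evaluated directly from the reported summary statistics. This is only a sketch: the variable names are illustrative, and the pairing of each variance with a particular variable is an assumption (the result does not depend on it).

```python
import numpy as np

# Summary statistics as reported in part a. Which variance belongs to which
# variable is an assumption; a' S a is the same under either pairing.
a = np.array([1.0, -1.0])
xbar = np.array([19.272, 9.42])
S = np.array([[62.23876666666666, 13.472667],
              [13.472667, 14.139166666666666]])

mean_lc = a @ xbar      # a' x-bar
var_lc = a @ S @ a      # a' S a = s1^2 + s2^2 - 2 cov(x1, x2)

print(round(float(mean_lc), 3), round(float(var_lc), 4))
```

Note that the covariance enters twice in the quadratic form, which is why the subtraction uses 2(13.472667).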
b.
\[ x_c = x_2 - x_1 \]
Mean:
\[ \frac{1}{n} \sum_{i=1}^{n} x_{c,i} = 9.852 \]
Variance:
\[ \frac{1}{n-1} \sum_{i=1}^{n} (x_{c,i} - \bar{x}_c)^2 = 49.4326 \]
The means and the variances are the same, showing that the linear-combination
approach in part a and the direct computation on the differenced data in
part b give identical results for the mean and variance.
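The equivalence of the two approaches holds for any dataset, not only this one. The sketch below illustrates it on synthetic data (the array `X` is randomly generated and is not the project data; all names are illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the two measured variables (NOT the project data).
X = rng.normal(loc=[10.0, 20.0], scale=[4.0, 8.0], size=(25, 2))

# Approach from part b: difference the columns, then summarise.
xc = X[:, 1] - X[:, 0]
mean_direct = xc.mean()
var_direct = xc.var(ddof=1)

# Approach from part a: apply the contrast vector to the summary statistics.
a = np.array([-1.0, 1.0])                       # encodes x_c = x2 - x1
mean_lc = a @ X.mean(axis=0)
var_lc = a @ np.cov(X, rowvar=False) @ a

print(np.isclose(mean_direct, mean_lc), np.isclose(var_direct, var_lc))
```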
c.
\[ d_j^2 = (x_j - \bar{x})^\top S^{-1} (x_j - \bar{x}), \quad \text{compared against } \chi^2_{df,\alpha} \]
The squared distance of each point is compared to the upper 5% point of a
χ² distribution with 2 degrees of freedom (95% confidence level),
$\chi^2_{2,0.05} = 5.991464547107979$. Observations 4
($d_4^2 = 7.749818085576154$) and 20 ($d_{20}^2 = 9.373870986979817$)
exceed this cutoff and are therefore flagged as outliers.
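A minimal sketch of this outlier check is given below. It uses synthetic data with one planted outlier, so the flagged index is not one of the observations above; the function name `mahalanobis_outliers` is hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_outliers(X, alpha=0.05):
    """Return squared distances, the chi-square cutoff, and flagged row indices."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # d_j^2 = (x_j - xbar)' S^{-1} (x_j - xbar), one value per row
    d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
    cutoff = chi2.ppf(1 - alpha, df=X.shape[1])
    return d2, cutoff, np.flatnonzero(d2 > cutoff)

rng = np.random.default_rng(1)
X = rng.normal(size=(25, 2))        # synthetic data, not the project data
X[3] = [6.0, -6.0]                  # planted outlier at 0-based row 3

d2, cutoff, flagged = mahalanobis_outliers(X)
print(cutoff, flagged)
```

The returned indices are 0-based, whereas the observation numbers quoted in the text are 1-based.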
d.
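MLE of Power Transformation:
\[ L(\lambda) = -\frac{n}{2} \ln(s^2) + (\lambda - 1) \sum_{i=1}^{n} \ln(y_i) \]
Q-Q Plot Construction:
\[ z_i = \Phi^{-1}\left( \frac{i - 0.5}{n} \right) \]
Optimal λ for $x_1$:
\[ \hat{\lambda} = \arg\max_{\lambda} L(\lambda) = 0.05449653698671072 \]
\[ y^{(\lambda)} = \frac{y^{0.05449653698671072} - 1}{0.05449653698671072} \]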
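The sketch below shows one way the optimal λ could be obtained with `scipy.stats.boxcox`, alongside a hand-written version of $L(\lambda)$ for comparison. The data are synthetic, the helper `boxcox_loglik` is hypothetical, and taking $s^2$ as the maximum-likelihood variance (divisor $n$) of the transformed data is an assumption.

```python
import numpy as np
from scipy import stats

def boxcox_loglik(lam, y):
    """L(lam) = -(n/2) ln(s^2) + (lam - 1) * sum(ln y_i).

    s^2 is taken as the variance (divisor n) of the transformed data;
    this choice of divisor is an assumption.
    """
    y = np.asarray(y, dtype=float)
    y_t = np.log(y) if lam == 0 else (y**lam - 1.0) / lam
    return -0.5 * y.size * np.log(y_t.var()) + (lam - 1.0) * np.log(y).sum()

rng = np.random.default_rng(2)
y = rng.lognormal(mean=1.0, sigma=0.5, size=40)   # synthetic positive data

# With lmbda left unset, boxcox returns the transformed data and the
# lambda that maximises the log-likelihood.
y_t, lam_hat = stats.boxcox(y)
print(lam_hat, boxcox_loglik(lam_hat, y))
```

The Box-Cox transformation requires strictly positive data, which is why the synthetic sample is drawn from a lognormal distribution.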
e.
Optimal λ for $x_2$:
\[ \hat{\lambda} = \arg\max_{\lambda} L(\lambda) = -0.7013811554461303 \]
\[ y^{(\lambda)} = \frac{y^{-0.7013811554461303} - 1}{-0.7013811554461303} \]
f.
By appending $x_2$ to $x_1$ such that $x = \begin{bmatrix} x_1 & x_2 \end{bmatrix}^\top$,
then repeating the procedure from parts d and e:
Optimal λ for the bivariate case:
\[ \hat{\lambda} = \arg\max_{\lambda} L(\lambda) = -0.02000817559628896 \]
\[ y^{(\lambda)} = \frac{y^{-0.02000817559628896} - 1}{-0.02000817559628896} \]
Question 2
a.
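\[ S = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^\top \]
\[ \bar{x} = \begin{bmatrix} 1860.500000 & 8354.133333 \end{bmatrix}^\top, \qquad S = \begin{bmatrix} 124054.67241379 & 361620.44827586 \\ 361620.44827586 & 3486333.15402299 \end{bmatrix} \]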
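For reference, the covariance formula above corresponds to `np.cov` with its default divisor of $n - 1$. The sketch below checks this on synthetic data (not the project data).

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 2))          # synthetic data, rows are observations

n = X.shape[0]
xbar = X.mean(axis=0)
centered = X - xbar

# S = 1/(n-1) * sum_i (x_i - xbar)(x_i - xbar)'
S_manual = centered.T @ centered / (n - 1)
S_numpy = np.cov(X, rowvar=False)     # rowvar=False: columns are variables

print(np.allclose(S_manual, S_numpy))
```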
b.
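Half Length: $\begin{bmatrix} 901.6522227 & 140.51187343 \end{bmatrix}$
Eigenvector Matrix:
\[ \begin{bmatrix} -0.10573993 & -0.99439382 \\ -0.99439382 & 0.10573993 \end{bmatrix} \]
\[ \text{Axis}_i = \sqrt{\frac{c^2 \lambda_i}{n}} \, e_i = \text{Half Length}_i \cdot \text{Eigenvector Matrix}_{:,i} \]
\[ c^2 := \frac{(n-1)p}{n-p} F_{p,n-p}(\alpha) \]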
No. The point [2000, 10000], represented by the pink dot, lies outside the
95% confidence region. Therefore, at the 95% confidence level, the proposed
mean is not consistent with the given data.
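The half-lengths and the membership check can be sketched from the reported $\bar{x}$ and $S$. The sample size is not stated in this section; `n = 30` below is an assumption, chosen because it reproduces the reported half-lengths. Checking whether $T^2 \le c^2$ is equivalent to checking whether the point lies inside the ellipse.

```python
import numpy as np
from scipy.stats import f

xbar = np.array([1860.5, 8354.133333])
S = np.array([[124054.67241379, 361620.44827586],
              [361620.44827586, 3486333.15402299]])
mu0 = np.array([2000.0, 10000.0])
n, p, alpha = 30, 2, 0.05             # n = 30 is an ASSUMPTION (not in the text)

# c^2 = (n-1)p/(n-p) * F_{p,n-p}(alpha)
c2 = (n - 1) * p / (n - p) * f.ppf(1 - alpha, p, n - p)

# eigh returns eigenvalues in ascending order, eigenvectors as columns
eigvals, eigvecs = np.linalg.eigh(S)
half_lengths = np.sqrt(c2 * eigvals / n)

# mu0 is inside the region iff n (xbar - mu0)' S^{-1} (xbar - mu0) <= c^2
diff = xbar - mu0
T2 = n * (diff @ np.linalg.solve(S, diff))
inside = T2 <= c2

print(half_lengths, T2, c2, inside)
```

Because `eigh` sorts eigenvalues in ascending order, the half-lengths come out in the reverse order of the ones listed above, and eigenvector signs may differ.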
c.
\[ \text{Correlation Coefficient} = \frac{\sum_{j=1}^{n} (x_{(j)} - \bar{x})(q_{(j)} - \bar{q})}{\sqrt{\sum_{j=1}^{n} (x_{(j)} - \bar{x})^2} \, \sqrt{\sum_{j=1}^{n} (q_{(j)} - \bar{q})^2}} \]
Since the data for both x1 and x2 follow the reference line of their normal
probability plots closely, bivariate normality is a reasonable assumption.
This is further supported by the correlation coefficients of the x1 and x2
normal probability plots, 0.9892631529453173 and 0.9883208213778977,
respectively. In addition, the scatterplot shows no visible outliers.
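The Q-Q correlation coefficient can be sketched as follows, pairing the ordered observations $x_{(j)}$ with the standard normal quantiles $q_{(j)} = \Phi^{-1}((j - 0.5)/n)$. The data are synthetic and the function name `qq_correlation` is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def qq_correlation(x):
    """Correlation between ordered data and standard normal quantiles."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    n = x_sorted.size
    q = norm.ppf((np.arange(1, n + 1) - 0.5) / n)   # q_(j), j = 1..n
    xd = x_sorted - x_sorted.mean()
    qd = q - q.mean()
    return (xd @ qd) / np.sqrt((xd @ xd) * (qd @ qd))

rng = np.random.default_rng(4)
x = rng.normal(loc=50.0, scale=5.0, size=200)       # synthetic normal sample
r_q = qq_correlation(x)
print(r_q)
```

Values close to 1 indicate that the points lie near a straight line in the normal probability plot.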
Question 3
a.
b.
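\[ T^2 = n(\bar{x} - \mu_0)^\top S_{\text{pooled}}^{-1} (\bar{x} - \mu_0) \le \frac{(n-1)p}{n-p} F_{p,n-p}(\alpha) := c^2 \]
\[ S_{\text{pooled}} = \frac{\sum_{i=1}^{k} (n_i - 1) S_i}{\sum_{i=1}^{k} (n_i - 1)} \]
\[ \text{Coefficient Vector} = \left( \frac{1}{n_1} S_1 + \frac{1}{n_2} S_2 \right)^{-1} (\mu_1 - \mu_2) \]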
The null hypothesis is rejected either way, so the two treatments of the
covariance (pooled and unpooled) lead to the same conclusion in our case.
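A sketch of a two-sample comparison with both covariance treatments is given below. It uses the standard two-sample form of Hotelling's $T^2$, which is an assumption about how the test was carried out; the data are synthetic and the function name `two_sample_T2` is hypothetical. Sample means stand in for $\mu_1$ and $\mu_2$ in the coefficient vector.

```python
import numpy as np
from scipy.stats import f

def two_sample_T2(X1, X2, alpha=0.05):
    """Two-sample Hotelling T^2 with pooled and unpooled covariance."""
    n1, p = X1.shape
    n2 = X2.shape[0]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)

    # Pooled covariance: weighted average of the group covariances
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    T2_pooled = d @ np.linalg.solve((1 / n1 + 1 / n2) * S_pooled, d)

    # Unpooled: coefficient vector (S1/n1 + S2/n2)^{-1} (xbar1 - xbar2)
    coef = np.linalg.solve(S1 / n1 + S2 / n2, d)
    T2_unpooled = d @ coef

    # F-based critical value for the pooled statistic
    df2 = n1 + n2 - p - 1
    crit = (n1 + n2 - 2) * p / df2 * f.ppf(1 - alpha, p, df2)
    return T2_pooled, T2_unpooled, crit

rng = np.random.default_rng(5)
X1 = rng.normal(loc=[0.0, 0.0], size=(45, 2))   # synthetic group 1
X2 = rng.normal(loc=[1.5, 1.5], size=(45, 2))   # synthetic group 2, shifted mean
T2_pooled, T2_unpooled, crit = two_sample_T2(X1, X2)
print(T2_pooled, T2_unpooled, crit)
```

With equal group sizes the two statistics coincide exactly, since $(1/n + 1/n) S_{\text{pooled}} = S_1/n + S_2/n$; with unequal sizes they generally differ. The unpooled statistic is usually referred to a large-sample χ² cutoff rather than the F-based value returned here.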
c.
Simultaneous Confidence Interval:
\[ \bar{x}_j - \sqrt{\frac{p(n-1)}{n-p} F_{p,n-p}(\alpha)} \sqrt{\frac{s_{jj}}{n}} \le \mu_j \le \bar{x}_j + \sqrt{\frac{p(n-1)}{n-p} F_{p,n-p}(\alpha)} \sqrt{\frac{s_{jj}}{n}} \]
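The interval formula above can be sketched as a small function. The data below are synthetic and the name `simultaneous_ci` is hypothetical; the sketch implements the one-sample form exactly as written in the formula.

```python
import numpy as np
from scipy.stats import f

def simultaneous_ci(X, alpha=0.05):
    """T^2-based simultaneous confidence intervals for each component mean."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    s_jj = X.var(axis=0, ddof=1)                    # diagonal of S
    # sqrt( p(n-1)/(n-p) * F_{p,n-p}(alpha) )
    c = np.sqrt(p * (n - 1) / (n - p) * f.ppf(1 - alpha, p, n - p))
    half_width = c * np.sqrt(s_jj / n)
    return xbar - half_width, xbar + half_width

rng = np.random.default_rng(6)
X = rng.normal(loc=[-6.0, 1.0], scale=[10.0, 12.0], size=(45, 2))  # synthetic
lower, upper = simultaneous_ci(X)
print(np.column_stack([lower, upper]))
```

Each row of the printed array is the interval for one variable; an interval that contains 0 gives no evidence of a nonzero mean for that component.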
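Case Observation 31 = 184:
$x_1 = \begin{bmatrix} -11.90907531 & -1.15759136 \end{bmatrix}$, $x_2 = \begin{bmatrix} -6.16866246 & 8.34644023 \end{bmatrix}$
Half Length: $\begin{bmatrix} 6.32119702 & 2.24142036 \end{bmatrix}$
Eigenvector Matrix:
\[ \begin{bmatrix} -0.55889415 & -0.82923901 \\ -0.82923901 & 0.55889415 \end{bmatrix} \]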
Half Length: $\begin{bmatrix} 6.35396193 & 2.254200933 \end{bmatrix}$
Eigenvector Matrix:
\[ \begin{bmatrix} -0.55881027 & -0.82929554 \\ -0.82929554 & 0.55881027 \end{bmatrix} \]
d.
Male birds generally have larger tails than females according to the given
data. The comparison of tail sizes remains inconclusive.