
MATH3806 Project 1

This project involves statistical analysis using Python, focusing on mean, variance, and outlier detection in datasets. Key findings include optimal power transformations for two datasets and the rejection of the null hypothesis regarding treatment differences. Additionally, the analysis indicates that male birds tend to have larger tails than females, although the comparison remains inconclusive.

Uploaded by

vymotu

Project 1

Chun Kit Bruce Lam


MATH 3806
01/04/2025
April 3, 2025

Foreword
All computations in this project were done solely in Python, in particular
with the numpy, pandas, matplotlib, and scipy.stats packages. The equations
documented here are the ones technically required to compute the results.
However, some computations, such as finding the optimal power transformation,
were done using the algorithms implemented in the respective packages. Thank
you in advance for your time and effort.

Question 1
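a.
Using the contrast vector $\mathbf{c} = \begin{bmatrix} -1 & 1 \end{bmatrix}^\top$:

Mean: $\mathbf{c}^\top \bar{\mathbf{x}} = 19.272 - 9.42 = 9.852$

Variance: $\mathbf{c}^\top S\,\mathbf{c} = s_1^2 + s_2^2 - 2\,\mathrm{cov}(x_1, x_2) = 14.139166666666666 + 62.23876666666666 - 2(13.472667) = 49.4326$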

b.
$x_c = x_2 - x_1$

Mean: $\bar{x}_c = \frac{1}{n}\sum_{i=1}^{n} x_{c,i} = 9.852$

Variance: $s_c^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_{c,i} - \bar{x}_c)^2 = 49.4326$

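The two approaches give the same mean and the same variance, showing that
computing the mean and variance of a linear combination from $\bar{\mathbf{x}}$ and $S$
is equivalent to first forming the new variable $x_c$ and then computing its
sample mean and variance directly.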
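A minimal NumPy sketch of this equivalence, run on a hypothetical random sample rather than the project data. The contrast `c = [-1, 1]` (giving $x_2 - x_1$) is an assumption matching part b; the key detail is using the same $n - 1$ divisor in both approaches.

```python
import numpy as np

# Hypothetical bivariate sample (NOT the project data): 25 rows, columns x1, x2.
rng = np.random.default_rng(0)
X = rng.normal(loc=[10.0, 20.0], scale=[4.0, 8.0], size=(25, 2))

c = np.array([-1.0, 1.0])  # assumed contrast: c^T x = x2 - x1

# Approach (a): summary statistics first, then the linear combination.
xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)  # sample covariance with the n - 1 divisor
mean_a = c @ xbar
var_a = c @ S @ c            # equals s11 + s22 - 2 s12 for this c

# Approach (b): build the new variable first, then summarise it.
xc = X[:, 1] - X[:, 0]
mean_b = xc.mean()
var_b = xc.var(ddof=1)       # ddof=1 matches the n - 1 divisor of np.cov

print(mean_a, mean_b)
print(var_a, var_b)
```

If `ddof=1` were dropped in approach (b), the two variances would differ by the factor $(n-1)/n$.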

c.
$$d_j^2 = (\mathbf{x}_j - \bar{\mathbf{x}})^\top S^{-1} (\mathbf{x}_j - \bar{\mathbf{x}})$$
By calculating the squared generalized distance of each observation and comparing it
with the upper 5% quantile of a $\chi^2$ distribution with 2 degrees of freedom,
$\chi^2_{2}(0.05) = 5.991464547107979$, observations 4 ($d_4^2 = 7.749818085576154$) and
20 ($d_{20}^2 = 9.373870986979817$) are flagged as outliers.
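The distance check can be sketched as below. This assumes the sample covariance uses the $n - 1$ divisor. The function name `flag_outliers` and the data, with an outlier planted at observation 4, are hypothetical and not the project's.

```python
import numpy as np
from scipy import stats

def flag_outliers(X, alpha=0.05):
    """Squared generalized distances d_j^2 and the observations exceeding chi2_p(alpha)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - xbar
    # d_j^2 = (x_j - xbar)^T S^{-1} (x_j - xbar), for every row j at once
    d2 = np.einsum("ij,jk,ik->i", diff, S_inv, diff)
    cutoff = stats.chi2.ppf(1 - alpha, df=p)    # upper-alpha chi-square quantile
    flagged = np.flatnonzero(d2 > cutoff) + 1    # 1-based observation numbers
    return d2, cutoff, flagged

# Hypothetical data (NOT the project data) with one planted outlier at observation 4.
rng = np.random.default_rng(1)
X = rng.normal(size=(25, 2))
X[3] = [5.0, -5.0]

d2, cutoff, flagged = flag_outliers(X)
print(cutoff, flagged)
```

A useful sanity check is that, with the $n - 1$ divisor, the distances always sum to $(n-1)p$.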

d.
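MLE of Power Transformation:
$$L(\lambda) = -\frac{n}{2}\ln\left(s^2\right) + (\lambda - 1)\sum_{i=1}^{n}\ln(y_i)$$
where $s^2$ is the variance of the transformed observations $y_i^{(\lambda)}$ (using an $n$ or $n-1$ divisor only shifts $L$ by a constant, so the maximizer is unchanged).

Q-Q Plot Construction: $z_i = \Phi^{-1}\left(\frac{i - 0.5}{n}\right)$

Optimal λ for x1: $\hat{\lambda} = \arg\max_{\lambda} L(\lambda) = 0.05449653698671072$

$$y^{(\lambda)} = \frac{y^{0.05449653698671072} - 1}{0.05449653698671072}$$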
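A minimal sketch of how the optimal λ and the Q-Q coordinates can be obtained. It uses `scipy.stats.boxcox`, which returns the transformed data together with the λ that maximizes the Box-Cox log-likelihood, and cross-checks the result with a brute-force grid search over $L(\lambda)$ as written above. The helper `boxcox_llf` and the gamma-distributed sample are hypothetical.

```python
import numpy as np
from scipy import stats

def boxcox_llf(lam, y):
    """L(lambda) = -(n/2) ln(s^2) + (lambda - 1) * sum(ln y_i), with s^2 the
    (1/n-divisor) variance of the transformed data."""
    y = np.asarray(y, dtype=float)
    n = y.size
    if abs(lam) < 1e-12:
        yt = np.log(y)                 # the lambda -> 0 limit of the transform
    else:
        yt = (y**lam - 1) / lam
    return -n / 2 * np.log(yt.var()) + (lam - 1) * np.log(y).sum()

# Hypothetical positive, right-skewed sample (NOT the project data).
rng = np.random.default_rng(2)
y = rng.gamma(shape=2.0, scale=3.0, size=40)

# scipy.stats.boxcox returns (transformed data, lambda maximizing the log-likelihood).
y_bc, lam_hat = stats.boxcox(y)

# Cross-check with a brute-force grid search over the same likelihood.
grid = np.linspace(-2.0, 2.0, 4001)
lam_grid = grid[np.argmax([boxcox_llf(lam, y) for lam in grid])]

# Normal Q-Q plot coordinates: z_i = Phi^{-1}((i - 0.5) / n) against sorted data.
n = y_bc.size
z = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)
q = np.sort(y_bc)

print(lam_hat, lam_grid)
```

Plotting `q` against `z` (for example with matplotlib) gives the Q-Q plot of the transformed data. The grid search only confirms that scipy maximizes the same function.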

e.
Optimal λ for x2: $\hat{\lambda} = \arg\max_{\lambda} L(\lambda) = -0.7013811554461303$

$$y^{(\lambda)} = \frac{y^{-0.7013811554461303} - 1}{-0.7013811554461303}$$

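f.
By appending $x_2$ to $x_1$ so that $\mathbf{x} = \begin{bmatrix} x_1 & x_2 \end{bmatrix}^{T}$ forms a single
stacked vector, and then repeating the procedures in parts d and e, a single common λ is obtained:

Optimal λ for the bivariate case: $\hat{\lambda} = \arg\max_{\lambda} L(\lambda) = -0.02000817559628896$

$$y^{(\lambda)} = \frac{y^{-0.02000817559628896} - 1}{-0.02000817559628896}$$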
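The stacking step can be sketched as follows, assuming it simply concatenates the two columns before a single Box-Cox fit. The two gamma-distributed columns are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical positive columns (NOT the project data).
rng = np.random.default_rng(3)
x1 = rng.gamma(shape=2.0, scale=3.0, size=30)
x2 = rng.gamma(shape=4.0, scale=1.5, size=30)

# Append x2 to x1 to get one vector of length 60, then fit a single lambda to it.
stacked = np.concatenate([x1, x2])
_, lam_common = stats.boxcox(stacked)

# Apply the common lambda to each original column (with lmbda given, boxcox returns only the array).
x1_t = stats.boxcox(x1, lmbda=lam_common)
x2_t = stats.boxcox(x2, lmbda=lam_common)

print(lam_common)
```

This pools both variables under one λ. Jointly estimating separate $\lambda_1, \lambda_2$ from a bivariate normal likelihood would be a different procedure.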

Question 2
a.
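$$S = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^\top$$

$$\bar{\mathbf{x}} = \begin{bmatrix} 1860.500000 \\ 8354.133333 \end{bmatrix}, \qquad S = \begin{bmatrix} 124054.67241379 & 361620.44827586 \\ 361620.44827586 & 3486333.15402299 \end{bmatrix}$$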
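The formula for $S$ is a sum of outer products of centered rows, which NumPy can express as one matrix product. Below is a minimal sketch on a hypothetical 30 × 2 data matrix. The helper name `sample_mean_cov` is illustrative only.

```python
import numpy as np

def sample_mean_cov(X):
    """Mean vector and S = 1/(n-1) * sum_i (x_i - xbar)(x_i - xbar)^T."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    xbar = X.mean(axis=0)
    centered = X - xbar
    # The sum of outer products of the centered rows equals centered^T @ centered.
    S = centered.T @ centered / (n - 1)
    return xbar, S

# Hypothetical 30 x 2 data matrix (NOT the project data).
rng = np.random.default_rng(4)
X = rng.normal(loc=[1800.0, 8000.0], scale=[300.0, 1800.0], size=(30, 2))
xbar, S = sample_mean_cov(X)
print(xbar)
print(S)
```

In practice `np.cov(X, rowvar=False)` gives the same matrix, since its default divisor is also $n - 1$.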

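b.

Half Length: $\begin{bmatrix} 901.6522227 & 140.51187343 \end{bmatrix}$

Eigenvector Matrix: $\begin{bmatrix} -0.10573993 & -0.99439382 \\ -0.99439382 & 0.10573993 \end{bmatrix}$

$$\text{Axis}_i = \pm\sqrt{\frac{c^2\lambda_i}{n}}\,\mathbf{e}_i = \pm\,\text{Half Length}_i \cdot \mathbf{e}_i, \qquad c^2 := \frac{(n-1)p}{n-p}F_{p,\,n-p}(\alpha)$$

where $\lambda_i$ is the $i$-th eigenvalue of $S$ and $\mathbf{e}_i$ is the $i$-th column of the eigenvector matrix.

The proposed mean vector $\boldsymbol{\mu}_0 = [2000, 10000]^\top$, represented by the pink dot in the plot,
lies outside the 95% confidence region. Therefore, at $\alpha = 0.05$, $H_0: \boldsymbol{\mu} = \boldsymbol{\mu}_0$
is rejected: the proposed mean is not consistent with the given data.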
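The ellipse axes and the membership check can be sketched as below. This is a minimal version assuming the one-sample $T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)^\top S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$ and the $c^2$ above, with $\boldsymbol{\mu}_0$ inside the region exactly when $T^2 \le c^2$. The function name and data are hypothetical. Note that `np.linalg.eigh` returns eigenvalues in ascending order, so its columns appear in the reverse of the order used above.

```python
import numpy as np
from scipy import stats

def ellipse_and_test(X, mu0, alpha=0.05):
    X = np.asarray(X, dtype=float)
    mu0 = np.asarray(mu0, dtype=float)
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    c2 = (n - 1) * p / (n - p) * stats.f.ppf(1 - alpha, p, n - p)
    # eigh: ascending eigenvalues; eigenvector e_i is column i of eigvecs
    eigvals, eigvecs = np.linalg.eigh(S)
    half_lengths = np.sqrt(c2 * eigvals / n)    # sqrt(c^2 * lambda_i / n)
    diff = xbar - mu0
    T2 = n * diff @ np.linalg.solve(S, diff)
    return half_lengths, eigvecs, T2, c2, bool(T2 <= c2)  # True -> mu0 inside

# Hypothetical data (NOT the project data).
rng = np.random.default_rng(5)
X = rng.normal(loc=[0.0, 0.0], scale=[1.0, 3.0], size=(30, 2))
half_lengths, eigvecs, T2, c2, inside = ellipse_and_test(X, mu0=[0.0, 0.0])
print(half_lengths, T2, c2, inside)
```

The test $T^2 \le c^2$ gives the same answer as checking the point against the drawn ellipse, because $T^2 / c^2 = \sum_i (\mathbf{e}_i^\top(\bar{\mathbf{x}} - \boldsymbol{\mu}_0))^2 / \text{Half Length}_i^2$.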

c.
$$\text{Correlation Coefficient } r_Q = \frac{\sum_{j=1}^{n}(x_{(j)} - \bar{x})(q_{(j)} - \bar{q})}{\sqrt{\sum_{j=1}^{n}(x_{(j)} - \bar{x})^2}\,\sqrt{\sum_{j=1}^{n}(q_{(j)} - \bar{q})^2}}$$

Since the data for both $x_1$ and $x_2$ follow the reference line of their normal Q-Q plots
closely, the bivariate normality assumption appears reasonable. This is further supported
by the correlation coefficients of the $x_1$ and $x_2$ normal probability plots, 0.9892631529453173
and 0.9883208213778977, respectively, both close to 1. In addition, the scatterplot shows no visible outliers.
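A minimal sketch of $r_Q$, assuming the same plotting positions $(j - 0.5)/n$ as the Q-Q construction in Question 1. The helper `qq_correlation` and the sample are hypothetical.

```python
import numpy as np
from scipy import stats

def qq_correlation(x):
    """Correlation r_Q between the ordered data x_(j) and normal quantiles q_(j)."""
    x_sorted = np.sort(np.asarray(x, dtype=float))
    n = x_sorted.size
    q = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)  # q_(j) = Phi^{-1}((j - 0.5)/n)
    dx = x_sorted - x_sorted.mean()
    dq = q - q.mean()
    return np.sum(dx * dq) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dq**2)))

rng = np.random.default_rng(6)
x = rng.normal(size=30)  # hypothetical sample (NOT the project data)
r_q = qq_correlation(x)
print(r_q)
```

`scipy.stats.probplot(x, dist="norm")` also reports a correlation, but it uses Filliben's plotting positions, so its value can differ slightly from this one.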

Question 3
a.

b.
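$$T^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^\top\left[\left(\frac{1}{n_1} + \frac{1}{n_2}\right)S_{\text{pooled}}\right]^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2) \le \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}F_{p,\,n_1+n_2-p-1}(\alpha) := c^2$$

$H_0: \boldsymbol{\mu}_1 = \boldsymbol{\mu}_2$ is rejected at level $\alpha$ when $T^2 > c^2$.

$$S_{\text{pooled}} = \frac{\sum_{i=1}^{k}(n_i - 1)S_i}{\sum_{i=1}^{k}(n_i - 1)}$$

$$\text{Coefficient Vector} = \left(\frac{1}{n_1}S_1 + \frac{1}{n_2}S_2\right)^{-1}(\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)$$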

Case observation 31 = 184:

$T^2 = 25.662530996663882$, critical value $c^2 = 6.273885668660057$

Coefficient Vector: $\begin{bmatrix} -3.57426836 & 2.12202034 \end{bmatrix}$

Case observation 31 deleted:

$T^2 = 24.96490074510203$, critical value $c^2 = 6.2772565319529265$

Coefficient Vector: $\begin{bmatrix} -3.49023807 & 2.07954999 \end{bmatrix}$

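In both cases $T^2 > c^2$, so the null hypothesis of equal mean vectors is rejected either way.
The two ways of handling observation 31 (setting it to 184 or deleting it) therefore do not
materially change the conclusion.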
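The part b quantities can be sketched as below, assuming the pooled (equal-covariance) two-sample form of $T^2$ and the $c^2$ with $n_1 + n_2 - p - 1$ denominator degrees of freedom. The coefficient vector uses $S_1/n_1 + S_2/n_2$ as in the formula for it. The function name `two_sample_hotelling` and the data are hypothetical, and this is not necessarily the exact code behind the reported numbers.

```python
import numpy as np
from scipy import stats

def two_sample_hotelling(X1, X2, alpha=0.05):
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    (n1, p), n2 = X1.shape, X2.shape[0]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    S1 = np.cov(X1, rowvar=False)
    S2 = np.cov(X2, rowvar=False)
    # Pooled covariance: sum (n_i - 1) S_i / sum (n_i - 1) with k = 2 groups
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    T2 = d @ np.linalg.solve((1 / n1 + 1 / n2) * S_pooled, d)
    df2 = n1 + n2 - p - 1
    c2 = (n1 + n2 - 2) * p / df2 * stats.f.ppf(1 - alpha, p, df2)
    # Linear combination most responsible for a rejection
    a_hat = np.linalg.solve(S1 / n1 + S2 / n2, d)
    return T2, c2, a_hat

# Hypothetical groups (NOT the project data).
rng = np.random.default_rng(7)
X1 = rng.normal(loc=[0.0, 0.0], scale=[1.0, 2.0], size=(45, 2))
X2 = rng.normal(loc=[1.0, 0.0], scale=[1.0, 2.0], size=(45, 2))
T2, c2, a_hat = two_sample_hotelling(X1, X2)
print(T2, c2, T2 > c2, a_hat)
```

The "deleted observation" case would call the same function after dropping that row, for example `np.delete(X, 30, axis=0)` for the 31st row of a data matrix `X`.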

c.
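Simultaneous confidence intervals:

$$(\bar{x}_{1j} - \bar{x}_{2j}) - \sqrt{c^2}\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)s_{jj,\text{pooled}}} \;\le\; \mu_{1j} - \mu_{2j} \;\le\; (\bar{x}_{1j} - \bar{x}_{2j}) + \sqrt{c^2}\sqrt{\left(\frac{1}{n_1} + \frac{1}{n_2}\right)s_{jj,\text{pooled}}}$$

where $c^2 = \frac{(n_1 + n_2 - 2)p}{n_1 + n_2 - p - 1}F_{p,\,n_1+n_2-p-1}(\alpha)$.

Case observation 31 = 184:

$x_1$: $[-11.90907531,\ -1.15759136]$, $x_2$: $[-6.16866246,\ 8.34644023]$

Half Length: $\begin{bmatrix} 6.32119702 & 2.24142036 \end{bmatrix}$

Eigenvector Matrix: $\begin{bmatrix} -0.55889415 & -0.82923901 \\ -0.82923901 & 0.55889415 \end{bmatrix}$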

Case observation 31 deleted:

$x_1$: $[-11.89886251,\ -1.02740012]$, $x_2$: $[-6.16232369,\ 8.51585904]$

Half Length: $\begin{bmatrix} 6.35396193 & 2.254200933 \end{bmatrix}$

Eigenvector Matrix: $\begin{bmatrix} -0.55881027 & -0.82929554 \\ -0.82929554 & 0.55881027 \end{bmatrix}$
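A sketch of the simultaneous intervals for the component differences, again assuming a pooled covariance and the two-sample $c^2$. The function name and data are hypothetical, and the reported intervals may have been computed differently.

```python
import numpy as np
from scipy import stats

def simultaneous_intervals(X1, X2, alpha=0.05):
    """T^2 simultaneous intervals for mu_1j - mu_2j, assuming a pooled covariance."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    (n1, p), n2 = X1.shape, X2.shape[0]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False)
                + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    df2 = n1 + n2 - p - 1
    c2 = (n1 + n2 - 2) * p / df2 * stats.f.ppf(1 - alpha, p, df2)
    # Half-width for component j: sqrt(c^2) * sqrt((1/n1 + 1/n2) * s_jj,pooled)
    half = np.sqrt(c2) * np.sqrt((1 / n1 + 1 / n2) * np.diag(S_pooled))
    return np.column_stack([d - half, d + half])  # row j = [lower_j, upper_j]

# Hypothetical groups (NOT the project data).
rng = np.random.default_rng(8)
X1 = rng.normal(loc=[0.0, 0.0], scale=[1.0, 2.0], size=(45, 2))
X2 = rng.normal(loc=[1.0, 0.0], scale=[1.0, 2.0], size=(45, 2))
ci = simultaneous_intervals(X1, X2)
print(ci)
```

An interval that excludes 0, like the one reported for $x_1$, indicates a significant difference in that component. An interval that contains 0, like the one for $x_2$, does not.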

d.
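Male birds generally have larger tails than females according to the given data,
consistent with the simultaneous interval for $x_1$ excluding 0 in both cases. The comparison
for $x_2$ remains inconclusive, since its simultaneous interval contains 0.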
