Exercise 1 Statistical Learning
Johan S. Wind
February 2018
a)
In Figure 2 from the task description we see the result of running K-NN many
times with different random seeds for the noise. This causes the resulting (blue)
lines to vary, and therefore to create a band around the true (black) signal. We
see that the band is tighter when we average over more points (larger K). Also,
the averaged fit forms a straight line at the edges, since the K nearest neighbors
are the same for all points far enough to the right (or left). We also see that
higher K leads to a smoother fit.
A low value of K gives the most flexible fit. The most extreme case is K = 1,
when the fit passes through all available data points. Higher values of K force
the fit to be smoother and to take more data points into account for each
predicted point.
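As a small illustration of this flexibility trade-off, here is a hypothetical one-dimensional K-NN regression sketch (the data points below are made up, not the ones from the task); K = 1 reproduces the training responses exactly, while a larger K averages over a neighborhood:

```python
# Hypothetical sketch of K-NN regression in one dimension.
def knn_regress(x_train, y_train, x0, K):
    # Average the responses of the K training points nearest to x0.
    nearest = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x0))[:K]
    return sum(y_train[i] for i in nearest) / K

x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [4.0, 1.0, 0.0, 1.0, 4.0]

# K = 1: the fit passes through every training point (most flexible).
print(knn_regress(x, y, 1.0, 1))  # 1.0
# K = 3: the prediction is smoothed by averaging over the neighborhood.
print(knn_regress(x, y, 1.0, 3))  # (0.0 + 1.0 + 4.0) / 3
```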
In Figure 3 from the task description we see the training error and test error for
different values of K. As expected, a low K gives a better fit to the training data,
as it is more flexible. However, very low values of K (< 3) have a higher test error
than K = 5, showing that we are over-fitting the noise rather than only capturing
the signal. As K increases further the model becomes too rigid to fit the data
well, so both training error and test error steadily go up. Judging from the test
error as a function of K, the minimum seems to be around K = 4, so this would
be the best value.
b)
By repeating the fitting M times with different random seeds for the noise, we
can estimate the variance and bias of the fit. The variance is simply the variance
of the estimated fits, and the bias is the average difference between the fits and
the true signal. So the variance is high if the model overfits the noise, and the bias
is high if the model is too restrictive to fit the data well (this shows up as
consistent differences between fit and true signal over many random seeds).
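A minimal sketch of this procedure, with a hypothetical signal and noise level (not the ones from the task):

```python
import random
import statistics

# Hypothetical true signal and noise level, for illustration only.
def f(x):
    return x * x

def knn_fit(xs, ys, x0, K):
    # K-NN regression: average the K responses nearest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:K]
    return sum(ys[i] for i in nearest) / K

random.seed(1)
xs = [i / 10 - 3 for i in range(61)]  # grid on [-3, 3]
x0, K, M = 0.0, 5, 200

fits = []
for _ in range(M):  # repeat the fit M times with fresh noise
    ys = [f(x) + random.gauss(0, 0.5) for x in xs]
    fits.append(knn_fit(xs, ys, x0, K))

variance = statistics.pvariance(fits)   # variance of the estimated fits
bias = statistics.mean(fits) - f(x0)    # average fit minus true signal
```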
In Figure 4 from the task description we see how the different components of the
error vary with K. The irreducible error is of course not dependent on K. The
variance goes down as K increases, because larger K enforces a smoother fit.
The bias increases with K, again because larger K restricts the fit by enforcing
smoothness and wider averaging. We now see the reason why the total error has
a minimum around K = 3: this is where the bias-variance trade-off is optimized.
The value K = 3 found here is slightly lower than the answer K = 4 found in
a), but there isn't much difference in the actual test error for K ∈ {3, 4, 5}, so we
would say the answers are consistent.
Extra: If we naively look at each plot independently we find the minima are at
about K = 9, K = 15, K = 10 and K = 12. Averaging these gives K = 11.5,
which is much greater than the previous answers. This is, however, a bad way
to find the optimal K, because it does not take into account that the edges (x
around −3 or 3) are a special case where larger K gives a much worse fit. Also,
it doesn't take into account whether the function has a clear minimum or a very
flat one for each x. We would therefore still trust K ∈ {1, 2, 3} much more.
2
a)
The fitted model is:

−1/√SYSBP = −1.103e−01 − 2.989e−04 · SEX + 2.378e−04 · AGE
  − 2.504e−04 · CURSMOKE + 3.087e−04 · BMI
  + 9.288e−06 · TOTCHOL + 5.469e−03 · BPMEDS
The "estimate" field in the summary is the least squares estimate of the con-
tribution of the corresponding factor. This means that it should be how much
a change in the factor would affect the prediction (here −1/√SYSBP). For example,
AGE's estimate of 2.378e−04 means that, according to the model, being one
year older makes negative one over the square root of the systolic blood pres-
sure 2.378e−04 higher. The intercept estimate represents the baseline, as it gives
the result if all other parameters were set to zero, and its (linear) contribution
is the same no matter the parameters. The formula for the estimate is the least
squares fit β = (X^T X)^{-1} X^T y, where X is the matrix of observed factors
augmented with a column of ones (for the intercept) and y is the observed response.
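The least squares formula can be checked numerically. Here is a sketch on made-up data (the factor values and coefficients below are hypothetical, not those of the SYSBP model):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Design matrix: a column of ones (intercept) plus two made-up factors.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([-0.11, 0.3, -0.05])
y = X @ beta_true + rng.normal(scale=0.01, size=n)

# Least squares estimate: beta = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true, since the noise is small
```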
The "Std.Error" field is an estimate of the standard error of the "estimate" field
of the same factor. So, basically, a low "Std.Error" means that the "estimate"
field is certain, and conversely a high "Std.Error" means the corresponding "es-
timate" field is uncertain. A formula for "Std.Error" is the square root of the
corresponding diagonal entry of σ̂²(X^T X)^{-1}, where σ̂² = ε^T ε/(n − p − 1)
and ε = y − Xβ is the empirical residual.
2
The "t value" is the t statistic of a test of whether the corresponding
"estimate" field is different from zero. The formula is simply the corresponding
"estimate" divided by "Std.Error".
The "Pr(>|t|)" field is the probability that a t-distributed random variable
(here with 2600 − 7 = 2593 degrees of freedom) exceeds the absolute
value in the "t value" field. So it is the p-value for the null hypothesis that the
true coefficient is zero and the observed correlation was a coincidence. These fields
are also marked with stars, making it easy to see which factors contribute significantly.
The "Residual standard error" is the standard deviation of the residuals, so it
measures how closely the model fits the observed data. The formula is √(Var(ε)),
where Var is the empirical variance estimator.
The "F-statistic" field shows how the model compares to a constant model,
testing against the hypothesis that all the factors are uncorrelated with the
response. The p-value of this test is the chance that a constant model would
perform this well. The formula is

F = ((TSS − RSS)/p) / (RSS/(n − p − 1)),

where TSS = Σ(y_i − ȳ)² and RSS = Σε_i².
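These summary quantities are easy to recompute from the residuals. A small sketch with made-up observations and fitted values (not the model's actual output):

```python
# Recompute R^2 and the F statistic from observed and fitted values.
def regression_summary(y, y_hat, p):
    # p is the number of predictors (excluding the intercept).
    n = len(y)
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    r2 = 1 - rss / tss
    f = ((tss - rss) / p) / (rss / (n - p - 1))
    return r2, f

# Made-up observations and fits, for illustration only.
r2, f = regression_summary([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], p=1)
print(r2, f)  # 0.98 and 98.0
```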
b)
The proportion of the variability explained by the model is given by the "Mul-
tiple R-squared" field, so about 25% is explained. This metric does not take
into account overfitting of noise, so if we have many factors we can often
get a high R² based on overfitting rather than on explaining the underlying
causes. In such cases the model would generalize poorly to other datasets
and predictions. "Adjusted R-squared" tries to mitigate this problem, and is
very similar to the "Multiple R-squared" in this case. This, combined with the
fact that we only have 6 factors (7 with intercept) and thousands of data points,
makes me believe 25% is fairly close to the proportion of explained variability of
the underlying distribution.
The "fitted values vs. standardized residuals" plot shows fairly uniformly distributed
residuals, in the "QQ-plot of standardized residuals" the residuals follow the
line well, and the Anderson-Darling normality test gives no reason to suspect
anything wrong with the model (p = 0.8959). So modelA passes all diagnostic
tests well.
modelB is a whole other story. The residuals in the "fitted values vs. standardized
residuals" plot are clearly biased towards higher (positive) values, in the "QQ-plot
of standardized residuals" the residuals clearly curve upwards compared to the
reference line, and the Anderson-Darling normality test gives extremely strong
evidence (A = 13.2, p-value < 2.2e−16) that the regression assumptions are violated.
For all purposes we would therefore prefer modelA over modelB.
As a side note, we note that ”Multiple R-squared” and ”Adjusted R-squared”
are both around 25% for modelB, like they were for modelA. This shows that
R2 isn’t an indicator of model validity (this time at least).
Figure 1: Plots of standardized residuals for modelA (left) and modelB (right)
c)
The estimate β̂_BMI = 3.087e−04 is given in the summary. This says that,
according to the model, if you keep the other parameters the same but increase
the BMI by one, you will increase negative one over the square root of the
systolic blood pressure by 3.087e−04. So it is the amount the BMI is estimated
to affect the response in our model.
Our estimate β̂_BMI follows a t distribution centered at the estimate (3.087e−04)
with standard error 2.955e−05. We are working with 2593 degrees of freedom, so
we can approximate it with a normal distribution. A 99% confidence interval
is then (3.087e−04 − 2.58 · 2.955e−05, 3.087e−04 + 2.58 · 2.955e−05) =
(2.324610e−04, 3.849390e−04), where 2.58 is approximately the 0.995 quantile of the
standard normal distribution. This interval tells me there is about a 99% chance
of the true β_BMI lying in this interval. The confidence interval also tells me
that the p-value of the null hypothesis β_BMI = 0 is less than 0.01, since 0 is
not in the 99% confidence interval.
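The interval arithmetic, using the numbers from the summary:

```python
# 99% confidence interval for beta_BMI via the normal approximation;
# 2.58 is (approximately) the 0.995 quantile of the standard normal.
est, se = 3.087e-04, 2.955e-05
z = 2.58
ci = (est - z * se, est + z * se)
print(ci)  # about (2.3246e-04, 3.8494e-04)

# 0 lies outside the interval, so beta_BMI = 0 is rejected at the 1% level.
assert not (ci[0] <= 0 <= ci[1])
```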
d)
"predict(modelA, new)" gives us our best estimate for his −1/√SYSBP of
−0.08667246, meaning our best guess for his SYSBP is 133.1.
"predict(modelA, new, interval="prediction", level = 0.90)" constructs a 90%
prediction interval for −1/√SYSBP. The output is (−0.09625664, −0.07708829).
The corresponding interval in SYSBP is then (107.93, 168.28). We
see that the prediction is quite uncertain, as the range is very wide even at 90%
confidence. Because of the high uncertainty we wouldn’t find this interval very
useful, unless it counts that the interval shows that it IS very uncertain, which
you wouldn’t know if you just looked at the best prediction produced by the
model. I find the underlying model parameters the more useful result of this
task, as they indicate what and how you could change to affect your systolic
blood pressure (even if some of it might be correlation and not causality).
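The back-transformation from the modelled response to SYSBP is a one-liner; with the predicted values quoted above:

```python
# If t = -1/sqrt(SYSBP), then SYSBP = 1/t^2.
def to_sysbp(t):
    return 1.0 / (t * t)

fit = -0.08667246
lo, hi = -0.09625664, -0.07708829  # 90% prediction interval, transformed scale

print(to_sysbp(fit))  # about 133.1
print(to_sysbp(lo), to_sysbp(hi))  # about (107.9, 168.3)
```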
3
We chose random seed 1234 and reuse the code given in the task description as
much as possible.
a)
This is clearly a linear boundary: if we plot x1 against x2 along the boundary
we simply get a straight line (x2 = −β0/β2 − (β1/β2) x1).
The relevant part of the summary output is β̂0 = 3.3824, β̂1 = 0.3354 and
β̂2 = −1.9645. Inserting this into the formula for p_i given in the problem
statement gives

p̂(x) = exp(3.3824 + 0.3354 x1 − 1.9645 x2) / (1 + exp(3.3824 + 0.3354 x1 − 1.9645 x2)).
The accuracy is then 54/65 = 83%; this classifier performs reasonably taking
the number of data points into account. However, we see that the model is
clearly biased towards answering Y = 1: the error rate is less than half if we put
the threshold at 0.9 instead of 0.5. Changing this hyperparameter based on the
test data is, however, not recommended when we have so few data points and
no validation dataset.
The sensitivity is 100% while the specificity is 24/35 = 69%. So we also here
observe a strong bias towards false positives.
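A sketch of how the fitted logistic model classifies a point, using the coefficients from the summary and a tunable threshold (the two test points below are made up, chosen near the two group means):

```python
import math

# Fitted coefficients from the summary.
b0, b1, b2 = 3.3824, 0.3354, -1.9645

def prob(x1, x2):
    # P(Y = 1 | x) under the logistic model.
    eta = b0 + b1 * x1 + b2 * x2
    return 1.0 / (1.0 + math.exp(-eta))

def classify(x1, x2, threshold=0.5):
    return 1 if prob(x1, x2) > threshold else 0

# Made-up points near the two group means:
print(classify(16.4, 5.7))  # 0
print(classify(19.7, 3.2))  # 1
```

Raising the threshold (e.g. to 0.9) flips borderline points from class 1 to class 0, which is how the bias towards Y = 1 can be traded away.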
b)
In the expression

P(Y = 1 | X = x0) = (1/K) Σ_{i ∈ N0} I(yi = 1),

N0 is the set of the K nearest neighbors of x0 in the training data.
The accuracy is 85%, the sensitivity is 100% and the specificity is 25/35 = 71%.
So we also here observe a strong bias towards false positives.
We get the following confusion matrix for K = 9.

         pred 0  pred 1
true 0       22      13
true 1        0      30
The accuracy is 80%, the sensitivity is 100% and the specificity is 22/35 = 63%.
I prefer the K = 3 case because it gives higher test accuracy.
We can't choose K too low because of the risk of overfitting, and we can't choose
K too high because that would limit our model from accurately modelling the
ideal solution (in the extreme case K = n we predict the same class for every point).
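The K-NN rule above can be sketched directly; the tiny training set here is made up for illustration, not the data from the task:

```python
# Majority-vote K-NN classifier on (x1, x2) points.
def knn_classify(train, x0, K):
    # train: list of ((x1, x2), y) pairs; returns the majority class
    # among the K nearest training points.
    dist2 = lambda item: (item[0][0] - x0[0]) ** 2 + (item[0][1] - x0[1]) ** 2
    nearest = sorted(train, key=dist2)[:K]
    votes = sum(y for _, y in nearest)
    return 1 if votes > K / 2 else 0

train = [((16.4, 5.7), 0), ((17.0, 6.0), 0),
         ((19.7, 3.2), 1), ((20.1, 3.0), 1), ((18.9, 3.5), 1)]

print(knn_classify(train, (16.5, 5.8), 3))  # 0: the local neighborhood wins
print(knn_classify(train, (16.5, 5.8), 5))  # 1: with K = n, the overall majority wins
```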
c)
π_k is the prior probability that a random element has Y = k, μ_k is the mean of
x = (x1, x2)^T for all elements with Y = k (not just the training samples, but
the whole population), Σ is the covariance matrix of x (again of the population),
and f_k(x) is the probability density of x conditional on Y = k (the class-conditional
density).
π_k is easily estimated as the fraction of the training data having class Y = k,
μ_k is estimated as the empirical mean of x = (x1, x2)^T over all training elements
with Y = k, and Σ is estimated as the empirical, unbiased covariance matrix of x.
Writing out the lda object from R, and using "var(train[c("x1", "x2")])", gives
the estimates. Here "Prior probabilities of groups" is π_k, the rows of "Group
means" are μ_k and "Covariance" is Σ.
Prior probabilities of groups:
        0         1
0.3692308 0.6307692

Group means:
       x1       x2
0 16.4375 5.737500
1 19.7122 3.179024

Covariance:
          x1        x2
x1 12.499365 -2.190996
x2 -2.190996  2.674946
If we want the class boundary at Pr(Y = 1|x) > 0.5 we get the same line as
above, giving classification Y = 1 iff δ1(x) > δ0(x), where

δk(x) = x^T Σ^{-1} μ_k − (1/2) μ_k^T Σ^{-1} μ_k + log π_k.   (1)

We solve δ0(x) = δ1(x) using (1) to get a linear equation for the boundary between
classes and plot it.
ltrain = lda(y ~ x1 + x2, data = train)
ltrain
cat("\nCovariance:\n")
cov = var(train[c("x1", "x2")])
cov
info = solve(cov)
mu1 = c(16.4375, 5.737500)   # group 0 mean
pi1 = 0.3692308              # group 0 prior
cat("\n")
mu2 = c(19.7122, 3.179024)   # group 1 mean
pi2 = 0.6307692              # group 1 prior
ab = info %*% mu1 - info %*% mu2
c = -0.5 * (t(mu1) %*% info %*% mu1) + log(pi1) - (-0.5 * (t(mu2) %*% info %*% mu2) + log(pi2))
slope = -ab[1] / ab[2]
intercept = -c / ab[2]

g1 + geom_abline(slope = slope, intercept = intercept) +
  ggtitle("train data and lda boundary") + geom_point(data = test, pch = 3)
Figure 4: Training data is represented by circles (o), while test data is repre-
sented by plusses (+)
The accuracy is 83%, the sensitivity is 100% while the specificity is 24/35 = 69%.
So we also here observe a strong bias towards false positives.
The QDA lets each class have its own covariance matrix; this results in a number
of degrees of freedom quadratic in the number of explanatory factors. This means
we should use it if each class is suspected to have a different covariance matrix,
and be extra careful about overfitting.
d)
Because of the strong bias towards false positives across all methods, we expect
that the training/test data split we got was hard for the algorithms with the
default threshold of 0.5. This also means we wouldn't put much trust in choosing
between the methods based on results gained with this threshold. But for the
sake of the task question, K-NN with K = 3 scored the best with the default
threshold; this is probably because K-NN is less sensitive than the other methods
to the threshold (there isn't really a continuous threshold in the same way as
the other methods have).
When we change the threshold in the previous algorithms, we get a trade-
off between sensitivity and specificity. ROC traces out this trade-off for all
thresholds so we can get an informative plot of it. AUC is simply the area
under the ROC.
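The AUC can also be computed without explicitly tracing the curve, via the rank formulation (the score vector below is made up for illustration):

```python
# AUC as the probability that a random positive example is scored
# above a random negative one (ties count one half).
def roc_auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up scores and true labels:
print(roc_auc([0.9, 0.8, 0.7, 0.4, 0.3], [1, 1, 0, 1, 0]))  # 5/6
```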
We use the code provided in the task description to get the ROC and AUC.
We see that K-NN performs significantly worse than the other two methods,
which are quite similar. If we had to choose one we would choose LDA, as it is
slightly better than logistic regression in both the ROC plot and the AUC. Here
we see the importance of taking the threshold into account, as K-NN, the method
which seemed the most promising with the default threshold, scores considerably
worse at other thresholds.