
CS 771A: Intro to Machine Learning, IIT Kanpur Endsem Exam (16 July 2024)

Name MELBO 40 marks


Roll No 24007 Dept. AWSM Page 1 of 4

Instructions:
1. This question paper contains 2 pages (4 sides of paper). Please verify.
2. Write your name, roll number, department in block letters with ink on each page.
3. Write your final answers neatly with a blue/black pen. Pencil marks may get smudged.
4. Don’t overwrite/scratch answers especially in MCQ – ambiguous cases may get 0 marks.

Q1. (True-False) Write T or F for True/False (write only in the box on the right-hand side). You
must also give a brief justification for your reply in the space provided below. (3 × (1+2) = 9 marks)

1. (T) EM run on data 𝐱₁, …, 𝐱_N ∈ ℝ² s.t. ‖𝐱ᵢ‖₂ ≤ 2 for all i ∈ [N] to learn a mixture of two Gaussians 𝒩(𝛍_k, I), k ∈ [2] will always ensure that the means satisfy ‖𝛍_k‖₂ ≤ 2.

In any iteration of the EM algorithm, the means 𝛍_k are updated as 𝛍_c = ∑_{i∈[N]} η_i^c ⋅ 𝐱ᵢ where η_i^c ≝ q_c^i / ∑_{j∈[N]} q_c^j, so that ∑_{i∈[N]} η_i^c = 1, i.e., a weighted average of the points is used to update the mean. However, convex sets 𝒞 satisfy the property that if 𝐱, 𝐲 ∈ 𝒞, then η ⋅ 𝐱 + (1 − η) ⋅ 𝐲 ∈ 𝒞 as well for any η ∈ [0,1], and more generally any convex combination of points of 𝒞 lies in 𝒞. Since the set ℬ₂(𝟎, 2) ≝ {𝐱 ∈ ℝ²: ‖𝐱‖₂ ≤ 2} is convex, the result follows.
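As a quick numeric illustration (not part of the key), the M-step update really is a convex combination: the points and the responsibilities below are made-up values, chosen only so that every point lies in ℬ₂(𝟎, 2).

```python
import math

# Hypothetical responsibilities q_c^i of one cluster c for N = 3 points
# (made-up numbers, purely illustrative); all points satisfy ||x_i||_2 <= 2.
points = [(1.0, 1.0), (-2.0, 0.0), (0.5, -1.5)]
q = [0.2, 0.5, 0.3]

total = sum(q)
eta = [qi / total for qi in q]          # eta_i^c = q_c^i / sum_j q_c^j

# M-step: the new mean is a convex combination of the data points
mu = (sum(e * p[0] for e, p in zip(eta, points)),
      sum(e * p[1] for e, p in zip(eta, points)))

assert abs(sum(eta) - 1.0) < 1e-12      # weights sum to 1
assert math.hypot(mu[0], mu[1]) <= 2.0  # so the mean stays in B_2(0, 2)
```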

2. (F) The difference of two Mercer kernels can never be Mercer. If True, give a proof. If False, construct two Mercer kernels K₁, K₂ with maps 𝜙₁, 𝜙₂ s.t. the difference K₃ ≝ K₁ − K₂ is also a Mercer kernel with map 𝜙₃. Give maps 𝜙₁, 𝜙₂, 𝜙₃ explicitly.

Let K₁, K₂: ℝ × ℝ → ℝ be defined as K₁(x, y) ≝ 25xy and K₂(x, y) ≝ 16xy. The corresponding feature maps are 𝜙₁(x) = [5x] and 𝜙₂(x) = [4x]. Note that the feature maps are unidimensional. We have K₃(x, y) = 9xy, for which the feature map 𝜙₃(x) = [3x] works.
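The counterexample can be sanity-checked numerically; the sketch below just evaluates the three kernels via their stated feature maps on a few points.

```python
# K1(x, y) = 25xy with phi1(x) = [5x]; K2(x, y) = 16xy with phi2(x) = [4x];
# their difference K3 = K1 - K2 = 9xy has the valid map phi3(x) = [3x].
def K1(x, y): return (5 * x) * (5 * y)
def K2(x, y): return (4 * x) * (4 * y)
def K3(x, y): return (3 * x) * (3 * y)

for x, y in [(0.0, 1.0), (2.0, -3.0), (1.5, 1.5)]:
    assert K3(x, y) == K1(x, y) - K2(x, y)
```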

3. (T) For convex differentiable f: ℝ → ℝ, if f((x+y)/2) > 1 for some x, y ∈ ℝ, then we must have max{f(x), f(y)} > 1. Justify either using a proof or counter-example.

Convex functions satisfy f((x+y)/2) ≤ (f(x) + f(y))/2. If f(x) ≤ 1 as well as f(y) ≤ 1, then we would have f((x+y)/2) ≤ (1+1)/2, i.e., f((x+y)/2) ≤ 1, which is a contradiction. Thus, at least one of f(x) or f(y) must be strictly greater than 1, which implies that max{f(x), f(y)} > 1.
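A concrete instance (not required by the question): f(t) = t² is convex and differentiable, and with the made-up choice x = 1, y = 3 all the inequalities in the argument can be checked directly.

```python
# f(t) = t^2 is convex and differentiable on R (an illustrative choice).
def f(t):
    return t * t

x, y = 1.0, 3.0
assert f((x + y) / 2) > 1                   # f(2) = 4 > 1
assert max(f(x), f(y)) > 1                  # max{1, 9} = 9 > 1
assert f((x + y) / 2) <= (f(x) + f(y)) / 2  # midpoint convexity
```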
Q2 (Almost Uniform) Melbo is constructing a distribution 𝒟 with support over 2D vectors of length up to 1, i.e., {𝐱 ∈ ℝ²: ‖𝐱‖₂ ≤ 1}. 𝒟 has two parameters 𝐜 ∈ ℝ², 𝜖 ∈ [0,1] and assigns a high density of 1/𝜖 in a “dense ball” of radius 𝜖 centered at 𝐜, i.e., in {𝐱 ∈ ℝ²: ‖𝐱‖₂ ≤ 1, ‖𝐱 − 𝐜‖₂ ≤ 𝜖}, and a low density of 1/(2π) in the rest of the support, i.e., in {𝐱 ∈ ℝ²: ‖𝐱‖₂ ≤ 1, ‖𝐱 − 𝐜‖₂ > 𝜖}. We have ‖𝐜‖₂ ≤ 1 − 𝜖, i.e., the dense ball stays within the support.
a. For which values of 𝜖 will 𝒟 be a proper distribution? Find them and show calculations. You may find the facts that π − √(π² − 2) ∈ [0,1] and π − √(π² − 1) ∈ [0,1] useful.
b. Find out the mean vector 𝛍 ∈ ℝ² of this distribution. Show calculations. (5 + 7 = 12 marks)
Hint: the mean of a uniform distribution over a circle is its centre.
Find value(s) of 𝜖 for which 𝒟 is a proper distribution.
Distributions are normalized, i.e., (1/𝜖) ⋅ π𝜖² + (1/(2π)) ⋅ (π − π𝜖²) = 1, i.e., 𝜖² − 2π𝜖 + 1 = 0. Solving the quadratic gives the candidate values π ± √(π² − 1). However, the larger root 𝜖 = π + √(π² − 1) > π > 1 is absurd, since it would in turn force ‖𝐜‖₂ ≤ 1 − 𝜖 < 0. Thus, the only value 𝜖 can take is π − √(π² − 1). Note that this value satisfies 𝜖 ∈ [0,1] using the fact provided in the question statement.
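The normalization can be checked numerically for the selected root; this is a quick sanity check, not part of the key.

```python
import math

# The root selected in the answer above
eps = math.pi - math.sqrt(math.pi**2 - 1)

# mass of the dense ball + mass of the rest of the unit disc
mass = (1 / eps) * math.pi * eps**2 \
     + (1 / (2 * math.pi)) * (math.pi - math.pi * eps**2)

assert 0.0 <= eps <= 1.0
assert abs(mass - 1.0) < 1e-12
```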
Find out the mean vector of the distribution 𝒟.

Let 𝒰 denote the unit ball {𝐱 ∈ ℝ²: ‖𝐱‖₂ ≤ 1} and ℋ be the dense ball {𝐱 ∈ ℝ²: ‖𝐱 − 𝐜‖₂ ≤ 𝜖}.
We have 𝛍 = ∫_𝒰 𝐱 ⋅ 𝒟(𝐱) d𝐱 = ∫_ℋ 𝐱 ⋅ 𝒟(𝐱) d𝐱 + ∫_{𝒰∖ℋ} 𝐱 ⋅ 𝒟(𝐱) d𝐱 ≕ (A) + (B).

(A) = (1/𝜖) ∫_ℋ 𝐱 d𝐱. Now ∫_ℋ 𝐱 d𝐱 = π𝜖² ⋅ ∫_ℋ 𝐱 ⋅ 𝒫(𝐱) d𝐱 where 𝒫(𝐱) = 1/(π𝜖²) is the (conditional) uniform distribution inside the dense ball. As the mean of a uniform distribution over a circle is its centre, we have ∫_ℋ 𝐱 ⋅ 𝒫(𝐱) d𝐱 = 𝐜, which gives us (A) = (1/𝜖) ⋅ π𝜖² ⋅ 𝐜 = π𝜖 ⋅ 𝐜.

(B) = (1/(2π)) ∫_{𝒰∖ℋ} 𝐱 d𝐱 = (1/(2π)) (∫_𝒰 𝐱 d𝐱 − ∫_ℋ 𝐱 d𝐱) ≕ (1/(2π)) ((C) − (D)). Using the same argument as above, we get (C) = π ⋅ 1² ⋅ 𝟎 = 𝟎 and (D) = π𝜖² ⋅ 𝐜, which gives us (B) = −(𝜖²/2) ⋅ 𝐜, and hence 𝛍 = (π𝜖 − 𝜖²/2) ⋅ 𝐜.

However, recall that 𝜖 satisfies 𝜖² − 2π𝜖 + 1 = 0, i.e., π𝜖 − 𝜖²/2 = 1/2, which means 𝛍 = (1/2) ⋅ 𝐜.

Can you simplify these calculations? What if the low density is some general value p_l ≠ 1/(2π)?
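The answer 𝛍 = 𝐜/2 can be spot-checked by Monte Carlo integration over the unit disc; the centre 𝐜 below is an arbitrary choice satisfying ‖𝐜‖₂ ≤ 1 − 𝜖, and the tolerance is loose to absorb sampling noise.

```python
import math
import random

random.seed(0)
eps = math.pi - math.sqrt(math.pi**2 - 1)
c = (0.3, 0.2)                       # arbitrary centre with ||c||_2 <= 1 - eps
assert math.hypot(c[0], c[1]) <= 1 - eps

def density(x, y):
    # 1/eps inside the dense ball, 1/(2*pi) elsewhere in the unit disc
    if math.hypot(x - c[0], y - c[1]) <= eps:
        return 1.0 / eps
    return 1.0 / (2.0 * math.pi)

# mu = integral of x * D(x) over the unit disc
#    = pi * E_uniform[x * D(x)]   (pi is the area of the unit disc)
n, sx, sy = 0, 0.0, 0.0
while n < 100_000:
    x, y = random.uniform(-1, 1), random.uniform(-1, 1)
    if x * x + y * y <= 1.0:         # uniform sample over the unit disc
        n += 1
        d = density(x, y)
        sx += x * d
        sy += y * d
mu = (math.pi * sx / n, math.pi * sy / n)

assert abs(mu[0] - c[0] / 2) < 0.03 and abs(mu[1] - c[1] / 2) < 0.03
```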

Q3 (Positive Linear Regression) We have data features 𝐱₁, …, 𝐱_N ∈ ℝ^D and labels y₁, …, y_N ∈ ℝ, stylized as X ∈ ℝ^{N×D}, 𝐲 ∈ ℝ^N. We wish to fit a linear model with positive coefficients:

argmin_{𝐰∈ℝ^D} (1/2)‖X𝐰 − 𝐲‖₂²  s.t.  w_j ≥ 0 for all j ∈ [D]

1. Write the Lagrangian for this problem by introducing dual variables (no derivation needed).
2. Simplify the dual problem (eliminate 𝐰) – show major steps. Assume 𝑋 ⊤ 𝑋 is invertible.
3. Give a coordinate descent/ascent algorithm to solve the dual. (2 + 4 + 6 = 12 marks)
Write down the Lagrangian here (you will need to introduce dual variables and give them names)

ℒ(𝐰, 𝛂) = (1/2)‖X𝐰 − 𝐲‖₂² − 𝛂⊤𝐰

which can be rewritten for convenience as

ℒ(𝐰, 𝛂) = (1/2)𝐰⊤X⊤X𝐰 − 𝐰⊤X⊤𝐲 − 𝐰⊤𝛂 + (1/2)‖𝐲‖₂²

Derive and simplify the dual. Show major calculation steps.

The dual is max_{𝛂≥𝟎} { min_𝐰 ℒ(𝐰, 𝛂) }. Solving the inner problem by applying first-order optimality (since it is an unconstrained problem) gives ∂ℒ/∂𝐰 = 𝟎 ⇒ X⊤(X𝐰 − 𝐲) − 𝛂 = 𝟎, i.e., 𝐰 = (X⊤X)⁻¹(X⊤𝐲 + 𝛂). Putting this back into the Lagrangian and neglecting constant terms gives us

min_{𝛂≥𝟎} { (1/2) 𝛂⊤C𝛂 + 𝛂⊤𝐬 }

where C = [c_ij] ≝ (X⊤X)⁻¹ ∈ ℝ^{D×D} and 𝐬 = [s_i] ≝ C X⊤𝐲 ∈ ℝ^D.


Give a coordinate descent/ascent algorithm to solve the dual problem.

Consider a single coordinate of the dual variable, say α_i (the coordinate may have been chosen cyclically or via a random permutation, etc.). The optimization problem restricted to α_i is

min_{α_i ≥ 0} (1/2) c_ii α_i² + α_i (s_i + ∑_{j≠i} c_ij α_j)

Using the QUIN trick tells us that the optimal value is α_i = max{0, −(1/c_ii)(s_i + ∑_{j≠i} c_ij α_j)}.
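The full coordinate-descent loop can be sketched on a tiny made-up instance (N = 3, D = 2, chosen so that the non-negativity constraint is active); the 2×2 inverse is computed by hand to keep the sketch dependency-free, and the primal solution is recovered via 𝐰 = C(X⊤𝐲 + 𝛂).

```python
# Tiny made-up instance: the unconstrained least-squares solution has a
# negative coordinate, so the constraint w >= 0 is active at the optimum.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, -1.0, 0.5]

# H = X^T X and b = X^T y; the 2x2 inverse C = H^{-1} is done by hand
H = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
b = [sum(X[k][i] * y[k] for k in range(3)) for i in range(2)]
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
C = [[H[1][1] / det, -H[0][1] / det],
     [-H[1][0] / det, H[0][0] / det]]
s = [C[i][0] * b[0] + C[i][1] * b[1] for i in range(2)]   # s = C X^T y

# Cyclic coordinate descent on: min_{alpha >= 0} 0.5 a^T C a + a^T s
alpha = [0.0, 0.0]
for _ in range(200):
    for i in range(2):
        off = sum(C[i][j] * alpha[j] for j in range(2) if j != i)
        alpha[i] = max(0.0, -(s[i] + off) / C[i][i])

# Recover the primal solution w = C (X^T y + alpha)
w = [C[i][0] * (b[0] + alpha[0]) + C[i][1] * (b[1] + alpha[1])
     for i in range(2)]

assert all(wi >= -1e-9 for wi in w)                   # w >= 0 (feasible)
assert all(a * wi < 1e-6 for a, wi in zip(alpha, w))  # complementary slackness
```

On this instance the iterate converges after one sweep: the second primal coordinate is clamped to zero while the first adjusts to compensate.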

Q4. (Kernel Smash) K₁, K₂, K₃: ℝ × ℝ → ℝ are Mercer kernels, i.e., for any x, y ∈ ℝ, we have K_i(x, y) = ⟨𝜙_i(x), 𝜙_i(y)⟩ with 𝜙₁(x) = (1, x), 𝜙₂(x) = (x, x²), 𝜙₃(x) = (x², x⁴, x⁶). Design a map 𝜙₄: ℝ → ℝ⁷ for a kernel K₄ s.t. K₄(x, y) = (K₁(x, y) − K₂(x, y))² + 3K₃(x, y) for all x, y ∈ ℝ. No derivation needed. Note that 𝝓₄ must not use more than 7 dimensions. If your solution does not require 7 dimensions then leave the rest of the dimensions blank or fill with zero. (7 marks)

𝜙₄(x) = ( 1 , x² , 2x⁴ , √3 ⋅ x⁶ , 0 , 0 , 0 )
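The map can be verified numerically by expanding K₄ on a few points; the sketch below just checks the identity (K₁ − K₂)² + 3K₃ = ⟨𝜙₄(x), 𝜙₄(y)⟩.

```python
import math

def K1(x, y): return 1 + x * y                     # phi1 = (1, x)
def K2(x, y): return x * y + x**2 * y**2           # phi2 = (x, x^2)
def K3(x, y):                                      # phi3 = (x^2, x^4, x^6)
    return x**2 * y**2 + x**4 * y**4 + x**6 * y**6

def phi4(x):
    return (1.0, x**2, 2 * x**4, math.sqrt(3) * x**6, 0.0, 0.0, 0.0)

def K4(x, y):
    return sum(a * b for a, b in zip(phi4(x), phi4(y)))

for x, y in [(0.5, -1.0), (1.0, 2.0), (0.0, 3.0)]:
    target = (K1(x, y) - K2(x, y))**2 + 3 * K3(x, y)
    assert abs(K4(x, y) - target) < 1e-9
```

Expanding by hand: (K₁ − K₂)² = (1 − x²y²)² = 1 − 2x²y² + x⁴y⁴, and adding 3K₃ gives 1 + x²y² + 4x⁴y⁴ + 3x⁶y⁶, which the stated map reproduces.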
