
CS 771A: Intro to Machine Learning, IIT Kanpur
Midsem Exam (26 Feb 2023), 40 marks

Name: MELBO    Roll No: 230001    Dept: AWSM

Instructions:
1. This question paper contains 2 pages (4 sides of paper). Please verify.
2. Write your name, roll number, and department in block letters with ink on each page.
3. Write your final answers neatly with a blue/black pen. Pencil marks may get smudged.
4. Don't overwrite/scratch answers, especially in MCQs – ambiguous cases will get 0 marks.

Q1. Write T or F for True/False in the box. Also, give justification. (4 x (1+2) = 12 marks)

1. (F) For 𝐱, 𝐲, 𝐳 ∈ ℝ² s.t. ‖𝐱‖₂ = ‖𝐲‖₂ = √2, ‖𝐳‖₂ = 1 and 𝐱⊤𝐲 ≥ 𝐱⊤𝐳, we always have
‖𝐱 − 𝐲‖₂² ≤ ‖𝐱 − 𝐳‖₂². Give a brief proof if True, else give a counterexample if False.

Consider the following counterexample: 𝐱 = [√2, 0], 𝐲 = [1, 1], 𝐳 = [1, 0].
We have 𝐱⊤𝐲 = √2 ≥ √2 = 𝐱⊤𝐳, however we also have
‖𝐱 − 𝐲‖₂² = (√2 − 1)² + 1 > (√2 − 1)² = ‖𝐱 − 𝐳‖₂².
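As a quick numeric sanity check (an added sketch, not part of the original key; it assumes NumPy is available), the counterexample can be verified directly:

    import numpy as np

    # Counterexample from above
    x = np.array([np.sqrt(2), 0.0])
    y = np.array([1.0, 1.0])
    z = np.array([1.0, 0.0])

    print(x @ y >= x @ z)                           # True: x'y = sqrt(2) = x'z
    print(np.sum((x - y)**2), np.sum((x - z)**2))   # ~1.172 > ~0.172, so the claim fails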

2. (F) Let 𝑓, 𝑔: ℝ → ℝ be two distinct, non-constant, convex functions, i.e., 𝑓 ≠ 𝑔 and it is not
the case that for some 𝑐, 𝑑 ∈ ℝ, 𝑓(𝑥) = 𝑐, 𝑔(𝑥) = 𝑑 for all 𝑥 ∈ ℝ. Then ℎ: ℝ → ℝ defined as
ℎ(𝑥) ≝ 𝑓(𝑥)⁄𝑔(𝑥) can never be convex. Give a brief proof if True; if False, give a counterexample
using two distinct, non-constant, convex functions. It is okay to give a counterexample where ℎ
has isolated, removable discontinuities.

Consider the following counterexample: 𝑓(𝑥) = 𝑒²ˣ, 𝑔(𝑥) = 𝑒ˣ.
Both are distinct, non-constant, convex functions. Note that 𝑓(𝑥) = (𝑔(𝑥))². However,
𝑓(𝑥)⁄𝑔(𝑥) = 𝑒ˣ, which is a convex function itself.
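A similar numeric check (again an added sketch assuming NumPy; a discrete second difference stands in for the second derivative):

    import numpy as np

    xs = np.linspace(-2.0, 2.0, 401)
    h = np.exp(2 * xs) / np.exp(xs)      # h(x) = f(x)/g(x) = e^x
    print(np.all(np.diff(h, n=2) >= 0))  # True: non-negative second differences, i.e., convex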

3. (T) 𝑋 is a discrete random variable that takes value −1 with probability 𝑝 and 1 with
probability 1 − 𝑝. The value of 𝑝 at which 𝑋 has maximum entropy is the same as the value of 𝑝
at which 𝑋 has maximum variance.

𝔼[𝑋] = 1 − 2𝑝 and 𝔼[𝑋²] = 1, i.e., Var[𝑋] = 1 − (1 − 2𝑝)² = 4𝑝(1 − 𝑝). Applying FOO (first-order
optimality) and the second-derivative test tells us that the maximum variance is achieved at
𝑝 = 1/2. The entropy of 𝑋 is defined as Ent[𝑋] = −𝑝 ln 𝑝 − (1 − 𝑝) ln(1 − 𝑝). Applying FOO tells
us that entropy is maximized at 𝑝 = 1/2 as well.
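Both maximizers can be compared numerically with the following added sketch (assumes NumPy):

    import numpy as np

    p = np.linspace(1e-6, 1 - 1e-6, 100001)
    var = 4 * p * (1 - p)                           # Var[X] = 4p(1 - p)
    ent = -p * np.log(p) - (1 - p) * np.log(1 - p)  # Ent[X] in nats
    print(p[np.argmax(var)], p[np.argmax(ent)])     # both print 0.5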
4. (F) 𝑌 is a Boolean random variable with ℙ[𝑌 = 1] = 1⁄(1 + exp(−𝑡)). Then 𝑌's entropy is
maximized as 𝑡 → ∞. Justify your answer by giving brief calculations.

Let ℙ[𝑌 = 1] = 1⁄(1 + exp(−𝑡)) ≝ 𝑝. As 𝑌 is Boolean, this gives us ℙ[𝑌 = 0] = 1 − 𝑝. Thus, the
entropy of 𝑌 is Ent[𝑌] = −𝑝 ln 𝑝 − (1 − 𝑝) ln(1 − 𝑝). The derivative of the entropy w.r.t. 𝑝 is
ln((1 − 𝑝)⁄𝑝), which vanishes at 𝑝 = 1/2, where the entropy is maximized. However, as 𝑡 → ∞,
𝑝 → 1, i.e., 𝑌's entropy is not maximized as 𝑡 → ∞. In fact, the entropy goes to 0 as 𝑡 → ∞ or
𝑡 → −∞, and is maximized at 𝑡 = 0, where 𝑝 = 1/2.
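A short added sketch (assuming NumPy) confirms that the entropy peaks at 𝑡 = 0 and vanishes in both tails:

    import numpy as np

    t = np.linspace(-10.0, 10.0, 100001)
    p = 1.0 / (1.0 + np.exp(-t))                    # P[Y = 1] as a function of t
    ent = -p * np.log(p) - (1 - p) * np.log(1 - p)
    print(t[np.argmax(ent)])                        # 0.0, not t -> infinity
    print(ent[0], ent[-1])                          # both tiny: entropy -> 0 in the tails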

Q2. (X marks the split) Create a feature map 𝜙: ℝ² → ℝᴰ for some 𝐷 > 0 so that for any
𝐳 = (𝑥, 𝑦) ∈ ℝ², sign(𝟏⊤𝜙(𝐳)) takes value −1 if 𝐳 is in the dark cross-hatched region and +1 if 𝐳
is in the light dotted region (see fig). 𝟏 = (1, 1, …, 1) ∈ ℝᴰ is the 𝐷-dimensional all-ones vector.
The dashed lines in the fig are 𝑥 = 𝑦 and 𝑥 = −𝑦. No derivation needed – just give the final map
below. (3 marks)

Several solutions are possible, e.g., [𝑥², −𝑦²] ∈ ℝ², [𝑥² − 𝑦²] ∈ ℝ, [|𝑥|, −|𝑦|] ∈ ℝ², and
[|𝑥| − |𝑦|] ∈ ℝ.
Incorrect solutions include [|𝑥𝑦|, −1] ∈ ℝ², [|𝑥𝑦|, −𝑦²] ∈ ℝ² and [𝑦² − 𝑥², 𝑥𝑦] ∈ ℝ². Note that
all these solutions give a wrong label on the point (1, 0). The label should be +1 on this point,
but for 𝑥 = 1, 𝑦 = 0 we have |𝑥𝑦| − 1 = 𝑦² − 𝑥² + 𝑥𝑦 = −1 and |𝑥𝑦| − 𝑦² = 0, neither of which
gives sign +1.
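One of the valid maps can be spot-checked in code. The sketch below is an addition (assumes NumPy); since [𝑥², −𝑦²] is a valid map, the point (0, 1) must lie in the dark cross-hatched region:

    import numpy as np

    def phi(z):
        x, y = z
        return np.array([x**2, -y**2])    # one of the valid maps listed above

    ones = np.ones(2)
    print(np.sign(ones @ phi((1.0, 0.0))))   # +1.0: light dotted region
    print(np.sign(ones @ phi((0.0, 1.0))))   # -1.0: dark cross-hatched region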
Q3. (Maximum stretch) Consider the optimization problem min_{𝐱∈ℝ³} ½‖𝐱‖₂² s.t. 𝐜⊤𝐱 ≥ 𝑝, which
has a single constraint, where 𝐜 ∈ ℝ³ is a constant vector and 𝑝 ∈ ℝ is a real constant.
(3+2 = 5 marks)
(a) Give a brief derivation solving the problem for 𝐜 = (1, 2, 3) and 𝑝 = 7. Write the value of 𝐱 at
which the optimum is achieved. (Hint: try orthogonal decomposition or some other trick)
Decompose 𝐱 = 𝐱∥ + 𝐱⊥ where 𝐱∥ is along 𝐜 and 𝐱⊥ is perpendicular to 𝐜. Note that 𝐜⊤𝐱 = 𝐜⊤𝐱∥,
but by Pythagoras's theorem, ‖𝐱‖₂² = ‖𝐱∥‖₂² + ‖𝐱⊥‖₂² > ‖𝐱∥‖₂² unless ‖𝐱⊥‖₂ = 0. This means
that having 𝐱⊥ ≠ 𝟎 does not contribute to the constraint but increases the objective value, so the
optimum must be achieved at 𝐱⊥ = 𝟎, i.e., 𝐱 = 𝜆 ⋅ 𝐜. We want 𝐜⊤𝐱 ≥ 𝑝, i.e.,
𝜆 ≥ 𝑝⁄‖𝐜‖₂² = 7/14 = 1/2. Since we wish to minimize ½‖𝐱‖₂², we choose the smallest value of 𝜆
that satisfies the constraint, i.e., the optimal value is 𝐱 = (0.5, 1, 1.5).
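The closed-form answer can be checked numerically (an added sketch, assuming NumPy):

    import numpy as np

    c = np.array([1.0, 2.0, 3.0])
    p = 7.0
    lam = p / (c @ c)          # lambda = p / ||c||_2^2 = 7/14 = 0.5
    x_star = lam * c
    print(x_star)              # [0.5 1.  1.5]
    print(c @ x_star)          # 7.0: the constraint is tight at the optimum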

(b) Give a brief derivation solving the problem for 𝐜 = (−1, −2, −3) and 𝑝 = −7. Write the value
of 𝐱 at which the optimum is achieved.

The optimal value is 𝐱 = (0, 0, 0). To see this, notice that this value achieves ½‖𝐱‖₂² = 0, which
is the smallest possible value since norms always take non-negative values. Moreover, this also
satisfies the constraint since 𝐜⊤𝐱 = 0 ≥ −7.

Q4. (Elastic-net regression) Given 𝑛 points (𝐱ⁱ, 𝑦ⁱ) with 𝐱ⁱ ∈ ℝᵈ, 𝑦ⁱ ∈ ℝ, we wish to solve
min_{𝐰∈ℝᵈ} ½‖𝐰‖₂² + ‖𝐰‖₁ + ½ ∑_{𝑖∈[𝑛]} (𝑦ⁱ − 𝐰⊤𝐱ⁱ)².
To create its dual, we introduce variables 𝐳 = [𝑧₁, …, 𝑧𝑑] ∈ ℝᵈ and 𝐫 = [𝑟₁, …, 𝑟𝑛] ∈ ℝⁿ to give
us the constrained problem in the box on the right. Note that 𝟏 ∈ ℝᵈ is the all-ones vector. We
introduce dual variables 𝛼𝑗 for the constraints 𝑤𝑗 − 𝑧𝑗 ≤ 0, 𝛽𝑗 for −𝑤𝑗 − 𝑧𝑗 ≤ 0, and 𝜆𝑖 for
𝑦ⁱ − 𝐰⊤𝐱ⁱ − 𝑟𝑖 = 0. For simplicity, we collect the dual variables as vectors 𝛂, 𝛃 ∈ ℝᵈ and 𝛌 ∈ ℝⁿ.
For each part, give your answers in the space demarcated for that part. (3+2+6+5+4=20 marks)
a. Fill in the circle indicating the correct constraint for the dual variables 𝛼𝑗, 𝛽𝑗, 𝜆𝑖. (3x1 marks)
Since 𝛼𝑗 and 𝛽𝑗 multiply inequality constraints, we need 𝛼𝑗 ≥ 0 and 𝛽𝑗 ≥ 0; since 𝜆𝑖 multiplies an
equality constraint, 𝜆𝑖 is unconstrained.

b. Write down the Lagrangian ℒ(𝐰, 𝐳, 𝐫, 𝛂, 𝛃, 𝛌) – no derivation needed. (2 marks)

ℒ(𝐰, 𝐳, 𝐫, 𝛂, 𝛃, 𝛌) = ½‖𝐰‖₂² + 𝐳⊤𝟏 + ½‖𝐫‖₂² + 𝛂⊤(𝐰 − 𝐳) − 𝛃⊤(𝐰 + 𝐳) + 𝛌⊤(𝐲 − 𝑋𝐰 − 𝐫)

c. The dual problem is max_{𝛂,𝛃,𝛌} min_{𝐰,𝐳,𝐫} ℒ(𝐰, 𝐳, 𝐫, 𝛂, 𝛃, 𝛌). To simplify it, solve the 3 inner
problems min_𝐰 ℒ, min_𝐳 ℒ and min_𝐫 ℒ. In each case, give a brief derivation and write the
expression you get while solving the inner problem (e.g., in CSVM min_𝐰 ℒ gives
𝐰 = ∑ᵢ 𝛼𝑖 𝑦ⁱ 𝐱ⁱ). (3x(1+1) marks)

Expression + derivation for min_𝐰 ℒ:
Applying FOO and setting 𝜕ℒ⁄𝜕𝐰 = 𝟎 gives us 𝐰 = 𝑋⊤𝛌 + 𝛃 − 𝛂.

Expression + derivation for min_𝐳 ℒ:
The term in the Lagrangian involving 𝐳 is 𝐳⊤(𝟏 − 𝛂 − 𝛃), which is linear. The minimization of a
linear function over all of ℝᵈ always yields −∞ unless the linear function is identically 0. This
means that at the optimum, we must have 𝛂 + 𝛃 = 𝟏.
Expression + derivation for min_𝐫 ℒ:
Applying FOO and setting 𝜕ℒ⁄𝜕𝐫 = 𝟎 gives us 𝐫 = 𝛌.

d. Use the expressions obtained above and eliminate 𝛃. Fill in the 5 blank boxes below to show
us the simplified dual you get. 𝑋 is the 𝑛 × 𝑑 feature matrix with the 𝑖th row being 𝐱ⁱ. We have
turned the max dual problem into a min problem by negating the objective. (5x1 marks)
Using 𝛃 = 𝟏 − 𝛂 (from min_𝐳 ℒ) gives 𝐰 = 𝑋⊤𝛌 + 𝟏 − 2𝛂, and 𝛂, 𝛃 ≥ 𝟎 becomes 𝟎 ≤ 𝛂 ≤ 𝟏.

min_{𝛂∈ℝᵈ, 𝛌∈ℝⁿ} ½‖𝑋⊤𝛌 + 𝟏 − 2𝛂‖₂² + ½‖𝛌‖₂² − 𝛌⊤𝐲
s.t. 𝟎 ≤ 𝛂 ≤ 𝟏 ⟸ constraint for 𝛂
     no constraint, or equivalently 𝛌 ∈ ℝⁿ ⟸ constraint for 𝛌
e. For the simplified dual obtained above, let us perform block coordinate minimization.
1. For any fixed value of 𝛂 ∈ ℝᵈ, obtain the optimal value of 𝛌 ∈ ℝⁿ.
2. For any fixed value of 𝛌 ∈ ℝⁿ, obtain the optimal value of 𝛂 ∈ ℝᵈ.
Note: the optimal value for a variable must satisfy its constraints (if any). Show brief calculations.
You may use the QUIN trick and invent shorthand notation to save space, e.g., 𝐦 ≝ 𝑋𝛂.
(2+2 marks)
For any fixed value of 𝛂 ∈ ℝᵈ, obtain the optimal value of 𝛌 ∈ ℝⁿ: Applying FOO (since there are
no constraints on 𝛌) gives us 𝑋(𝑋⊤𝛌 + 𝟏 − 2𝛂) + 𝛌 − 𝐲 = 𝟎, i.e.,
𝛌 = (𝑋𝑋⊤ + 𝐼𝑛)⁻¹(𝐲 + 𝑋(2𝛂 − 𝟏)),
where 𝐼𝑛 is the 𝑛 × 𝑛 identity matrix.
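This closed form can be verified against the first-order condition on random data (an added sketch assuming NumPy; the sizes n = 5, d = 3 are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 5, 3
    X, y = rng.standard_normal((n, d)), rng.standard_normal(n)
    alpha = rng.uniform(0.0, 1.0, d)

    lam = np.linalg.solve(X @ X.T + np.eye(n), y + X @ (2 * alpha - 1))
    grad = X @ (X.T @ lam + 1 - 2 * alpha) + lam - y   # FOO condition from above
    print(np.allclose(grad, 0.0))                      # True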
For any fixed value of 𝛌 ∈ ℝⁿ, obtain the optimal value of 𝛂 ∈ ℝᵈ: The optimization problem
becomes min_{𝟎≤𝛂≤𝟏} ½‖𝑋⊤𝛌 + 𝟏 − 2𝛂‖₂², which splits neatly into 𝑑 separate coordinate-wise
problems as shown below:
min_{𝛼𝑖∈[0,1]} ½(𝑘𝑖 + 1 − 2𝛼𝑖)²,
where 𝐤 = [𝑘₁, 𝑘₂, …, 𝑘𝑑] ≝ 𝑋⊤𝛌 ∈ ℝᵈ. The above problem can be solved in a single step using
the QUIN trick, i.e., 𝛼𝑖 = Π_{[0,1]}((𝑘𝑖 + 1)⁄2).
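Putting the two updates together gives the following block coordinate minimization sketch (an addition, assuming NumPy; the data and iteration count are arbitrary). Each block is minimized exactly, so the dual objective is non-increasing across iterations:

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20, 5
    X, y = rng.standard_normal((n, d)), rng.standard_normal(n)

    def dual_obj(alpha, lam):
        v = X.T @ lam + 1 - 2 * alpha
        return 0.5 * (v @ v) + 0.5 * (lam @ lam) - lam @ y

    alpha, lam = np.full(d, 0.5), np.zeros(n)
    for it in range(50):
        lam = np.linalg.solve(X @ X.T + np.eye(n), y + X @ (2 * alpha - 1))  # lambda step
        alpha = np.clip((X.T @ lam + 1.0) / 2.0, 0.0, 1.0)  # QUIN step + projection onto [0,1]
    w = X.T @ lam + 1 - 2 * alpha    # recover the primal solution as in part (d)
    print(dual_obj(alpha, lam), w)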
