
Slides for the MFAI (Aug-Dec 2024) Lectures

slides for lectures from Sep 25 - Nov 15, 2024

C R Subramanian

CSE Dept, SECS, Indian Institute of Technology, Bhubaneswar.


I Example Motivation :
I Given : P = {(~x_i, y_i) : ~x_i ∈ R^d, y_i ∈ R}_{i=1,...,n} ;
I Find : a function y = f(~x) = ~a · ~x, ~a ∈ R^d, minimising the
I total sum of squared errors E(~a) = Σ_{i=1}^n (y_i − ~a · ~x_i)^2.
I
I Want to find an ~a ∈ R^d which minimises E(~a).
I
I When d = 1, E(a) becomes a continuous function of one
variable a.
I The minimiser a^* and the minimum value E(a^*) can be
computed in O(n) time (see the sketch below).
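A minimal numerical sketch of the d = 1 case (the function names below are illustrative, not from the slides): setting E'(a) = 0 gives the closed form a^* = (Σ_i x_i y_i)/(Σ_i x_i^2), computable in a single O(n) pass, assuming at least one x_i is nonzero.

# Sketch: closed-form minimiser of E(a) = sum_i (y_i - a*x_i)^2 for d = 1.
def least_squares_1d(xs, ys):
    sxy = sum(x * y for x, y in zip(xs, ys))   # sum_i x_i * y_i
    sxx = sum(x * x for x in xs)               # sum_i x_i^2
    a_star = sxy / sxx                         # E'(a) = 0  =>  a* = sxy / sxx
    e_min = sum((y - a_star * x) ** 2 for x, y in zip(xs, ys))
    return a_star, e_min

# Example: points lying roughly along y = 2x.
print(least_squares_1d([1.0, 2.0, 3.0], [2.1, 3.9, 6.2]))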
Limits and Continuity

I f : O → R is a function. O is an open set. Let a ∈ O.


I limit of f (x) as x approaches a is L if
I ∀ε > 0 ∃δ > 0 such that 0 < |x − a| < δ ⇒ |f (x) − L| ≤ ε.
I Denoted by : Ltx→a f (x) = L.
I Left limit : Ltx→a− f (x) = L. (−δ < x − a < 0)
I Right limit : Ltx→a+ f (x) = L. (0 < x − a < δ).
I
I L exists if and only if left- and right- limits exist and equal L.
I Example : f (x) = [x] does not have a limit when a is an
integer ; both left- and right- limits of f exist at integers.
I f (x) = 1/x has limits everywhere except at x = 0 ; neither the
left nor the right limit exists at x = 0.
Limits and Continuity
I f is continuous at a if f (a) is defined and Ltx→a f (x) = f (a).
I x, x^2, x^3, sin(x), cos(x), e^x, |x| - continuous everywhere.
I f (x) = [x] continuous everywhere except at integers
I f (x) = x^{−1} continuous everywhere except at x = 0.
I f and g are continuous at a. Then, f + g , f − g , f · g are
continuous at a. g(a) ≠ 0 ⇒ f /g cont. at a.
I
I f is continuous at a, g is continuous at f (a) ⇒
h(x) = g (f (x)) is continuous at a.
I sin(e^{x^2}), e^{sin(x^2)} and e^{sin(x)^2} are continuous everywhere.
I
I f is cont. over [a, b] with f (a) < f (b). Then,
∀c ∈ (f (a), f (b)) ∃x ∈ (a, b) such that f (x) = c.
I f is continuous over [a, b] implies f is bounded over [a, b].
I f is continuous over [a, b] implies f achieves its min and max.
Differentiability

I f is differentiable at a if Lt_{x→a} (f(x) − f(a))/(x − a) exists.
I
I The limit is the derivative of f at a, denoted by
I f'(a), f^{(1)}(a), or df(a)/dx.
I x, x^2, x^3, e^x, sin(x), cos(x) - differentiable at every x ∈ R.
I |x| is differentiable everywhere except at x = 0.
I
I f is differentiable at a ⇒ f is continuous at a.
I Converse need not be true : |x| and x = 0, for example.
I Left-derivative : same except we focus on x < a.
I Right-derivative : same except we focus on x > a.
I For |x|, f'_L(0) = −1 and f'_R(0) = +1.
Differentiability

I Algebra :
I f and g are defined over R.
I f 0 (a) and g 0 (a) exist for a ∈ R.
I (f ± g)'(a) = f'(a) ± g'(a).
I (f · g)'(a) = f(a) · g'(a) + f'(a) · g(a).
I (f /g)'(a) = (g(a) · f'(a) − f(a) · g'(a)) / g(a)^2, provided g(a) ≠ 0.
I
I Chain Rule :
I Suppose Range(f) ⊆ Domain(g) ; f'(a), g'(f(a)) exist.
I (g(f))'(a) exists and equals g'(f(a)) · f'(a).
I Familiar version :
I y = f(x), z = g(y), z = g(f(x)) ⇒ dz/dx = (dz/dy) · (dy/dx).
Differentiability

I For x ∈ R, B(x, δ) := {y ∈ R : 0 ≤ |y − x| < δ}.


I f is twice-differentiable at a if
I (i) for some δ > 0, f'(x) exists for every x ∈ B(a, δ)
I (ii) the derivative of f'(x) (= Lt_{x→a} (f'(x) − f'(a))/(x − a)) exists at a.
I The second derivative is denoted by f''(a), f^{(2)}(a) or d^2 f(a)/dx^2.
I
I Generally, for k ≥ 1, f is k-times differentiable at a if
I (i) for some δ > 0, f^{(k−1)}(x) exists for every x ∈ B(a, δ)
I (ii) f^{(k−1)}(x) is differentiable at a.
I The k-th derivative is denoted by f^{(k)}(a) or d^k f(a)/dx^k.
Differentiability

I x, x^2, x^3, e^x, sin(x), cos(x) - k-times differentiable for every
k ≥ 1 and everywhere.
I f(x) = log_e x - f^{(k)}(x) exists for every k ≥ 1 and every x > 0.
I
I a is a local minimum / local maximum of f if
I f (a) ≤ f (x) / f (a) ≥ f (x)
I for every x ∈ B(a, δ) for some δ > 0.
I
I f : O → R, O is open.
I a ∈ O is a global minimum / global maximum of f over O if
I f (a) ≤ f (x) / f (a) ≥ f (x) for every x ∈ O.
I Every global optimum is also a local optimum.
Differentiability and optima

I If a is a local optimum for f , then f'(a) = 0.


I
I Necessary but not sufficient.
I Example : f(x) = x^3 for x < 0 and f(x) = x^2 for x ≥ 0.
I f'(0) = 0 but 0 is neither a local minimum nor a local
maximum for f.
I
I a is a saddle point if f'(a) = 0 but a is not a local optimum.
I f'(a) = 0 - a is a critical point.
Differentiability and optima

I f'(a) = 0 and f''(a) > 0 ⇒ a is a local minimum for f.
I
I sufficient but not necessary.
I Eg : f(x) = −x^3 for x ≤ 0 and f(x) = x^3 for x > 0.
I 0 is the global minimum for f. But, f'(0) = f''(0) = 0.
I
I g'(a) = 0 and g''(a) < 0 ⇒ a is a local maximum for g.
I sufficient but not necessary.
I Eg : g(x) = −f(x).
I 0 is the global maximum for g. But, g'(0) = g''(0) = 0.
I
Taylor’s Approximation Formula

I f'' exists and is continuous over (a − δ, a + δ) for some δ > 0.
I
I Taylor's first-order approximation formula :
I f(x) = f(a) + f'(a)(x − a) + E_1(x), ∀x ∈ B(a, δ),
I where E_1(x) = ∫_a^x (x − t) f''(t) dt → 0 as x → a.
I
I E_1(x) = f''(c)(x − a)^2 / 2 for some c ∈ (a, x).
I f(a + h) = f(a) + h f'(a) + o(h) as h → 0.
I f(a + h) ≈ f(a) + h f'(a) as h → 0.
I
I differentiability ⇐⇒ local linearizability.
Taylor’s Approximation Formula

I f''' exists and is continuous over (a − δ, a + δ) for some δ > 0.
I
I Taylor's second-order approximation formula :
I f(x) = f(a) + f'(a)(x − a) + f''(a)(x − a)^2/2 + E_2(x), ∀x ∈ B(a, δ),
I where E_2(x) = (1/2) · ∫_a^x (x − t)^2 f'''(t) dt → 0 as x → a.
I
I E_2(x) = f'''(c)(x − a)^3 / 6 for some c ∈ (a, x).
I f(a + h) = f(a) + h f'(a) + h^2 f''(a)/2 + o(h^2) as h → 0.
I f(a + h) ≈ f(a) + h f'(a) + h^2 f''(a)/2 as h → 0.
Taylor’s Approximation Formula

I f^{(n+1)} exists and is continuous over (a − δ, a + δ) for some δ > 0.
I
I Taylor's n-th order approximation formula :
I f(x) = Σ_{j=0}^{n} f^{(j)}(a)(x − a)^j / j! + E_n(x), ∀x ∈ B(a, δ),
I where E_n(x) = (1/n!) · ∫_a^x (x − t)^n f^{(n+1)}(t) dt → 0 as x → a.
I f^{(0)}(a) = f(a).
I
I E_n(x) = f^{(n+1)}(c)(x − a)^{n+1} / (n+1)! for some c ∈ (a, x).
I f(a + h) = Σ_{j=0}^{n} f^{(j)}(a) h^j / j! + o(h^n) as h → 0.
I f(a + h) ≈ Σ_{j=0}^{n} f^{(j)}(a) h^j / j! as h → 0.
Lectures on 16/10/2024

I Taylor’s Formula - illustrations


I e^x is infinitely differentiable over R.
I e^x = 1 + x + x^2/2 + . . . + x^n/n! + o(x^n), x → 0 (see the sketch below).
I
I log(1 + x) is infinitely differentiable for every x > −1.
I log(1 + x) = x − x^2/2 + x^3/3 + . . . + (−1)^{n−1} x^n/n + o(x^n), x → 0.
I
I cos x is infinitely differentiable for every x ∈ R.
I cos x = 1 − x^2/2! + x^4/4! − x^6/6! + . . . + (−1)^n x^{2n}/(2n)! + o(x^{2n}), x → 0.
I
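A minimal sketch (illustrative, not from the slides) comparing the degree-n Taylor polynomial of e^x around a = 0 with the true value; the error shrinks rapidly as n grows, consistent with the o(x^n) remainder.

import math

# Degree-n Taylor polynomial of e^x around 0: sum_{j=0}^{n} x^j / j!.
def exp_taylor(x, n):
    return sum(x ** j / math.factorial(j) for j in range(n + 1))

x = 0.5
for n in (1, 2, 4, 8):
    approx = exp_taylor(x, n)
    print(n, approx, abs(math.exp(x) - approx))   # error decreases with n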
Taylor’s series

I f is infinitely differentiable over (a − δ, a + δ) for some δ > 0.


I Taylor series expansion for f(x) :
I f(x) = f(a) + f'(a)(x − a) + . . . + f^{(n)}(a)(x − a)^n/n! + . . ..
I
I f(x) = Σ_{j=0}^{∞} f^{(j)}(a)(x − a)^j / j!, ∀x ∈ B(a, δ).
I
I f(a + h) = Σ_{j=0}^{∞} f^{(j)}(a) h^j / j!, ∀h ∈ (−δ, δ).
I
I infinite differentiability is necessary but not sufficient for the series to converge to f(x).
Optimisation :

I Problem : Minimise (or Maximise) f (x) subject to x ∈ Ω.


I Given : oracle access to computing f(x), f'(x) and f''(x)
I and oracle access to testing “x ∈ Ω ?” :
I Goal : Find an x ∈ Ω optimising f(x).
I
I A General Optimisation Algorithm :
1. Start with an initial guess x.
2. while x is not an optimal solution do
3. Determine a search direction p ;
4. x ← x + p. endwhile
5. Return x.
Optimisation :

I Repeatedly check for local optimality ;


I Check if f'(x) = 0 and if f''(x) ≠ 0.
I Calls for finding zeroes of f'(x).
I Search direction p is guided by the optimality check.
I In special cases like Linear Programs or Semi-Definite
I Programs, other direct and efficient approaches are available.
I
I Checking global optimality is a much harder problem.
Newton’s Method for finding zeroes :

I Given oracle access to computing f and f',
Goal : To compute an x^* satisfying f(x^*) = 0.
I Newton’s Method for finding roots :
1. Start with an initial guess x.
2. while f(x) ≠ 0 and f'(x) ≠ 0 do
3. p ← −f(x)/f'(x) ; x ← x + p. endwhile
4. Return x.
I
I One can replace the test f(x) ≠ 0 by |f(x)| > ε, for a small ε > 0
(a minimal sketch follows).
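A minimal sketch of the root-finding loop above, assuming oracle access to f and f' as Python callables (the function names and the tolerance eps are illustrative):

def newton_root(f, fprime, x, eps=1e-12, max_iter=100):
    # Iterate x <- x - f(x)/f'(x) until |f(x)| <= eps (or f'(x) vanishes).
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) <= eps:
            break
        d = fprime(x)
        if d == 0:
            break                      # derivative vanished; the step is undefined
        x = x - fx / d
    return x

# Example: root of f(x) = x^2 - 2 (i.e. sqrt(2)), starting from x = 1.
print(newton_root(lambda x: x * x - 2, lambda x: 2 * x, 1.0))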
Newton’s Method - Analysis :
I Analysis of Newton’s Method :
I
I x0 = initial guess ; xk = guess after k iterations ;
I x_{k+1} = x_k − f(x_k)/f'(x_k) ; e_k = x_k − x^* ;
I 0 = f(x_k) − e_k f'(x_k) + f''(η_k) e_k^2/2
⇒ e_k = f(x_k)/f'(x_k) + f''(η_k) e_k^2/(2 f'(x_k)).
I e_{k+1} = e_k − f(x_k)/f'(x_k) = e_k^2 · f''(η_k)/(2 f'(x_k)) → e_k^2 · f''(x^*)/(2 f'(x^*)), as x_k → x^* ;
I
I ∀ large k, e_k ≈ (e_0 · f''(x^*)/(2 f'(x^*)))^{2^k} · 2 f'(x^*)/f''(x^*).
I If {x_k} → x^*, the convergence rate is quadratic with rate
constant f''(x^*)/(2 f'(x^*)), that is, Lt_{k→∞} |e_{k+1}|/|e_k|^2 = f''(x^*)/(2 f'(x^*)).
I Works fine if x0 is reasonably close to x ∗ and rate constant is
not too big.
Unconstrained Optimisation in 1D :

I Given : oracle access to computing f'(x) and f''(x) :
I
I Optimising f ⇐⇒ repeatedly finding roots of f'(x) = 0.
I Optimising strictly convex f ⇐⇒ finding a root of f'(x) = 0.
I
I By applying Newton's Method for finding roots, one
I can find approximations to a root of f'(x) = 0 with
I quadratic convergence rate and rate constant f'''(x^*)/(2 f''(x^*)),
I where x^* is a root of f'(x) = 0.
Gradient-Descent Method :

I Assumption : |f''(x)| ≤ L for x ∈ [a, b].
I
I Given oracle access to computing f and f',
Goal : To compute an x^* satisfying f'(x^*) = 0.
I
1. Start with an initial guess x. Define γ ← 1/L.
2. while f'(x) ≠ 0 do x ← x − γ f'(x) endwhile
3. Return x.
I A minimal sketch of this scheme follows.
I
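A minimal sketch of the 1D scheme above, assuming |f''| ≤ L on the region visited and oracle access to f' as a Python callable (function names and the stopping tolerance are illustrative):

def gradient_descent_1d(fprime, x, L, eps=1e-8, max_iter=10000):
    gamma = 1.0 / L                        # fixed step size from the bound |f''| <= L
    for _ in range(max_iter):
        g = fprime(x)
        if abs(g) <= eps:                  # practical replacement for f'(x) = 0
            break
        x = x - gamma * g
    return x

# Example: f(x) = (x - 3)^2, f'(x) = 2(x - 3), f''(x) = 2, so L = 2.
print(gradient_descent_1d(lambda x: 2 * (x - 3), x=0.0, L=2.0))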
Gradient-Descent - Analysis :

I x_k = value of x after k iterations ; x_{k+1} = x_k − γ f'(x_k).
I
I f(x_{k+1}) ≤ f(x_k) − γ f'(x_k)^2 + L γ^2 f'(x_k)^2/2 = f(x_k) − f'(x_k)^2/(2L).
I f(x_{k+1}) < f(x_k) for each k. {f(x_k)}_k is a decreasing sequence
converging to a limit a.
I {x_k}_k converges to a limit x^* satisfying f(x^*) = Lt_k f(x_k).
I Lt_k f'(x_k)^2 ≤ 2L · Lt_k (f(x_k) − f(x_{k+1})) = 0 ⇒ f'(x^*) = 0.
I A local optimum or a saddle point can be approached
arbitrarily closely.
Scalar and Vector functions

I f : Rn → Rm , n, m ≥ 1.
I m = 1 - real-valued or scalar functions/fields.
I m > 1 - vector-valued or vector functions/fields.
I n = 1 and m > 1 - trajectories (say, of a projectile in 3-space).
I
I ~x ∈ R^d. ||~x||_2 = √(x_1^2 + . . . + x_d^2) - the L2-norm of ~x.
I ~x, ~y ∈ R^d. d_2(~x, ~y) = ||~x − ~y||_2 - the L2-distance.
I
I f : R^n → R^m. ~a ∈ R^n, ~l ∈ R^m.
I Lt_{~x→~a} f(~x) = ~l if, ∀ε > 0, ∃δ > 0
I satisfying d_2(f(~x), ~l) ≤ ε whenever 0 < d_2(~x, ~a) ≤ δ.
I f is continuous at ~a if Lt_{~x→~a} f(~x) = f(~a).
Scalar and Vector functions

I Suppose f(~x) → ~f and g(~x) → ~g when ~x → ~a.
I
I Then, f(~x) ± g(~x) → ~f ± ~g as ~x → ~a.
I α f(~x) → α ~f as ~x → ~a for every α ∈ R.
I ||f(~x)||_2 → ||~f||_2 as ~x → ~a.
I f(~x) · g(~x) → ~f · ~g as ~x → ~a.
I
I f : Rn → Rm defined by f (~x ) = (f1 (~x ), . . . , fm (~x )) for each x.
I f is continuous at ~a if and only if each fi is continuous at ~a.
I
Scalar and Vector functions
I Suppose f : R^n → R^m and g : R^m → R^p. Define
h = g ∘ f : R^n → R^p by h(~x) = g(f(~x)) for each ~x ∈ R^n.
I Suppose also that f is continuous at ~a ∈ Rn and g is
continuous at f (~a) ∈ Rm . Then, h is continuous at ~a.
I
I Let f_1, f_2, f_3 : R^2 → R be defined by
I f_1(x, y) = sin(x^2 y) ; f_2(x, y) = log_e(x^2 + y^2) ;
I f_3(x, y) = e^{x+y}/(x + y) ;
I f_1 is continuous everywhere ;
I f_2 is continuous everywhere except at (0, 0).
I f_3 is continuous everywhere except on the line x + y = 0.
I
I f(x, y) = 2xy/(x^2 + y^2) for (x, y) ≠ (0, 0) and f(0, 0) = 0.
I f is continuous as a function of x alone and as a function of y
alone but not as a function of x and y both.
Differentiability of Scalar functions

I f : Rn → R may have different derivatives along different


directions at a point ~a.
I Focus on specific directions ~y .
I
I ~a, ~y ∈ R^n. The derivative of f at ~a along ~y is defined as
I Lt_{h→0} (f(~a + h~y) − f(~a))/h. Denoted by f'(~a, ~y) or df(~a)/d~y.
I f'(~a, ~0) = 0 always for any ~a.
I When ||~y||_2 = 1, f'(~a, ~y) is the directional derivative of f at ~a.
I When ~y = e_i along the x_i axis, f'(~a, e_i) = ∂f(~a)/∂x_i.
I The gradient of f at ~a is the vector
I ∇f(~a) = (∂f(~a)/∂x_1, . . . , ∂f(~a)/∂x_n).
Differentiability of Scalar functions

I Existence of directional derivatives f 0 (~a, ~y ) for each ~y does


not guarantee f is continuous at ~a.
I Example : f(x, y) = xy^2/(x^2 + y^4) for x ≠ 0 and f(0, y) = 0 for all y.
I f'((0, 0), ~y) exists for each ~y ∈ R^2.
I Along the parabola x = y^2, f(x, y) = 1/2 and so f is not
continuous at (0, 0).
I
I f is differentiable at ~a if, for some r > 0, there exist a linear
transformation (LT) T_~a : R^n → R and a scalar function E_~a(~y) such that
I f(~a + ~y) = f(~a) + T_~a(~y) + ||~y||_2 · E_~a(~y) holds true for all
||~y|| < r and E_~a(~y) → 0 as ||~y|| → 0.
I T_~a is the Total Derivative of f at ~a, denoted also by f'(~a).
Differentiability of Scalar functions

I f is differentiable at ~a =⇒ T_~a(~y) = f'(~a, ~y) for each ~y.
I
I Also, T_~a(~y) = ∇f(~a) · ~y = Σ_{i=1}^n (∂f(~a)/∂x_i) · y_i for each ~y.
I f is differentiable at ~a =⇒ f is continuous at ~a.
I
I f is differentiable at ~a =⇒ Taylor's first order formula :
I f(~a + ~y) = f(~a) + ∇f(~a) · ~y + ||~y|| E_~a(~y), ||~y|| < r.
I
I When ||~y|| = 1, f'(~a, ~y) = ||∇f(~a)|| · cos(θ) where
I θ = angle between ∇f(~a) and ~y.
I f'(~a, ~y) = component of ∇f(~a) in the direction of ~y.
Sufficiency for Differentiability and Chain Rule
I f is a scalar function over Rn and ~a ∈ Rn .
I If all first-order partial derivatives exist at all points in an
open neighborhood around ~a and they are continuous at ~a,
then f is differentiable at ~a.
I
I Chain Rule : r : O → S, f : S → R, O ⊆ R, S ⊆ Rn .
I Suppose r'(t) exists and f'(r(t)) exists. Then, for g = f ∘ r,
I g'(t) exists and g'(t) = ∇f(r(t)) · r'(t).
I
I Write r(t) = (r_1(t), . . . , r_n(t)).
I r'(t) = (r_1'(t), . . . , r_n'(t)).
I ∇f(r(t)) = (∂f(r(t))/∂r_1, . . . , ∂f(r(t))/∂r_n).
I
I g'(t) = Σ_{i=1}^n (∂f(r(t))/∂r_i) · (dr_i(t)/dt).
Higher-order derivatives for Scalar functions

I f : O → R, O ⊆ Rn , O is open.
I Suppose f'(~x) exists for every ~x ∈ B(~a, r).
I The derivative of f' at ~a, if it exists, is the second derivative f''(~a).
I Our Focus : second-order partial derivatives ∂^2 f(~a)/∂x_i ∂x_j.
I The Hessian (denoted by ∇^2 f(~a)) is the matrix (∂^2 f(~a)/∂x_i ∂x_j)_{i,j}.
I The Hessian is symmetric if the second-order partial derivatives are continuous.
Taylor’s approximation

I f : O → R, O ⊆ Rn , ~a ∈ O.
I second-order pds are continuous.
I f(~a + ~p) = f(~a) + ~p^T · ∇f(~a) + ~p^T · ∇^2 f(~a) · ~p / 2 + . . ..
I
I f(~a + ~p) = f(~a) + ~p^T · ∇f(~a) + ~p^T · ∇^2 f(~η) · ~p / 2
I for some ~η ∈ L(~a, ~a + ~p).
I
I f(~a + ~p) = f(~a) + Σ_{i=1}^n p_i ∂f(~a)/∂x_i + (1/2) Σ_{i,j=1}^n p_i p_j ∂^2 f(~η)/∂x_i ∂x_j.
I
I f(~a + ~p) = f(~a) + Σ_{i=1}^n p_i ∂f(~a)/∂x_i + o(||~p||) as ~p → ~0.
I
I Linear approximation : f(~a + ~p) ≈ f(~a) + ~p^T · ∇f(~a).
Example (from Griva, Nash and Sofer)

I Consider f (x, y ) = x 3 + 5x 2 y + 7xy 2 + 2y 3 . Let ~a = (−2, 3).


I ∇f(~a) = (3x^2 + 10xy + 7y^2, 5x^2 + 14xy + 6y^2)|_{(−2,3)} = (15, −10).
I ∇^2 f(~a) = [[6x + 10y, 10x + 14y], [10x + 14y, 14x + 12y]]|_{(−2,3)} = [[18, 22], [22, 8]].
I Let ~p = (0.1, 0.2).
I f(~a + ~p) = f(−1.9, 3.2) ≈ f(~a) + ~p^T · ∇f(~a) + ~p^T · ∇^2 f(~a) · ~p / 2.
I f(−1.9, 3.2) ≈ −20 − 0.5 + 0.69 = −19.81.
I Actual f(−1.9, 3.2) = −19.755 (verified in the sketch below).
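A minimal sketch (illustrative) that re-derives the numbers above by evaluating f, the gradient, the Hessian, and the quadratic approximation at ~a = (−2, 3) with ~p = (0.1, 0.2):

def f(x, y):
    return x**3 + 5*x**2*y + 7*x*y**2 + 2*y**3

def grad(x, y):
    return (3*x**2 + 10*x*y + 7*y**2, 5*x**2 + 14*x*y + 6*y**2)

def hess(x, y):
    return ((6*x + 10*y, 10*x + 14*y), (10*x + 14*y, 14*x + 12*y))

a, p = (-2.0, 3.0), (0.1, 0.2)
g, H = grad(*a), hess(*a)
lin = p[0]*g[0] + p[1]*g[1]                                              # p^T grad f(a) = -0.5
quad = sum(p[i]*H[i][j]*p[j] for i in range(2) for j in range(2)) / 2.0  # p^T Hess p / 2 = 0.69
print(f(*a) + lin + quad)              # quadratic approximation: -19.81
print(f(a[0] + p[0], a[1] + p[1]))     # actual value: -19.755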
Unconstrained minimisation of scalar functions

I f is a scalar function.
I ~a is a local minimum for f ⇒ ∇f (~a)T · p~ ≥ 0 for all p~.
I ~a is a local minimum for f ⇒ ∇f (~a) = ~0.
I Necessary but not sufficient.
I ~a is a local minimum for f ⇒ ∇2 f (~a) is positive semi-definite.
I
I Sufficiency :
I ∇f (~a) = ~0 and ∇2 f (~a) is positive definite ⇒ ~a is a local
minimum.
I A symmetric matrix B is positive semi-definite (B ⪰ 0) if
x^T B x ≥ 0 for all x ∈ R^n.
I A symmetric matrix B is positive definite (B ≻ 0) if
x^T B x > 0 for all x ≠ ~0.
Unconstrained Minimization : Newton’s Method

I f : Rn → R, a scalar function.
I Given oracle access to computing ∇f and ∇2 f ,
Goal : To compute a local minimizer ~x ∗ of f .
I Newton’s Method for Minimizing :
1. Start with an initial guess ~x .
2. while ∇f(~x) ≠ ~0 and ∇^2 f(~x) ≻ 0 do
3. ~p ← −(∇^2 f(~x))^{−1} · ∇f(~x) ; ~x ← ~x + ~p. endwhile
4. Return ~x.
I
I In practice, one replaces ∇f(~x) ≠ ~0 by ||∇f(~x)|| > ε, for a small ε > 0
(a minimal sketch follows).
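A minimal multivariate sketch, assuming oracle access to ∇f and ∇²f as NumPy callables (names, tolerance and iteration cap are illustrative); it solves the Newton linear system rather than forming the inverse explicitly:

import numpy as np

def newton_minimize(grad, hess, x, eps=1e-10, max_iter=100):
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps:       # practical test for grad f(x) = 0
            break
        H = hess(x)
        # Assumes H is positive definite; p = -H^{-1} g via a linear solve.
        p = -np.linalg.solve(H, g)
        x = x + p
    return x

# Example: f(x) = (x0 - 1)^2 + 10*(x1 + 2)^2, minimized at (1, -2).
grad = lambda x: np.array([2*(x[0] - 1), 20*(x[1] + 2)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_minimize(grad, hess, np.zeros(2)))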
Unconstrained Minimization : Newton’s Method

I Obtained by minimizing the RHS of the quadratic approximation :
I f(~x) ≈ f(~x_k) + ∇f(~x_k)^T (~x − ~x_k) + (~x − ~x_k)^T ∇^2 f(~x_k) (~x − ~x_k) / 2.
I ∇^2 f is Lipschitz continuous on O, that is,
||∇^2 f(~x) − ∇^2 f(~y)|| ≤ L ||~x − ~y||, ∀~x, ~y ∈ O.
I ~x^* - minimizer of f and ∇^2 f(~x^*) ≻ 0.
I If ||~x_0 − ~x^*|| is “sufficiently small”,
then {~x_k}_k converges quadratically to ~x^*.
I
Unconstrained Minimization : Gradient-Descent Method :

I Descent along the direction of steepest descent, namely −∇f(~x).
I
I Assumption : ||∇^2 f(~x)|| ≤ L for ~x ∈ O, for some L > 0.
I Given oracle access to computing ∇f(·) and f(·),
Goal : To compute an ~x^* satisfying ∇f(~x^*) = ~0.
I
1. Start with an initial guess ~x. Define γ ← 1/L.
2. while ∇f(~x) ≠ ~0 do ~x ← ~x − γ ∇f(~x) endwhile
3. Return ~x.
I
I In practice, one replaces ∇f(~x) ≠ ~0 by ||∇f(~x)|| > ε, for a small ε > 0.
Minimization of Scalar functions : Grad-Des. - Analysis :

I ~x_k = value of ~x after k iterations ; ~x_{k+1} = ~x_k − γ ∇f(~x_k).
I
I f(~x_{k+1}) ≤ f(~x_k) − γ ||∇f(~x_k)||^2 + γ^2 ||∇^2 f(~x_k)|| · ||∇f(~x_k)||^2 / 2
I = f(~x_k) − ||∇f(~x_k)||^2 / (2L).
I f(~x_{k+1}) < f(~x_k) for each k. {f(~x_k)}_k is a decreasing
sequence converging to a limit a.
I As in the 1D-case, {~x_k}_k converges to a limit ~x^* satisfying
f(~x^*) = Lt_k f(~x_k).
I Lt_k ||∇f(~x_k)||^2 ≤ 2L · Lt_k (f(~x_k) − f(~x_{k+1})) = 0
⇒ ∇f(~x^*) = ~0.
I A local optimum or a saddle point can be approached
arbitrarily closely.
Gradient Descent with Backtracking Line Search :

I Presumes a priori knowledge of L, which may not be available.
I
I x_0 ← initial guess of ~x^* ; n ← 0 ;
I while ∇f(~x_n) ≠ ~0 do
I    γ_n ← initial estimate of the step size γ ;
I    while f(~x_n − γ_n ∇f(~x_n)) > f(~x_n) − γ_n ||∇f(~x_n)||^2 / 2 do
I       γ_n ← γ_n / 2 endwhile
I    ~x_{n+1} ← ~x_n − γ_n ∇f(~x_n) ; n ← n + 1. endwhile
I Return ~x_n.
I
I In practice, one replaces ∇f(~x_n) ≠ ~0 by ||∇f(~x_n)|| > ε.
I Takes care of narrow, deep valleys and chooses γ adaptively
(a minimal sketch follows).
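A minimal sketch of the backtracking loop above, assuming oracle access to f and ∇f as NumPy callables (the initial step-size guess and the tolerance are illustrative):

import numpy as np

def gd_backtracking(f, grad, x, gamma0=1.0, eps=1e-8, max_iter=1000):
    for _ in range(max_iter):
        g = grad(x)
        gn2 = np.dot(g, g)
        if np.sqrt(gn2) <= eps:            # practical test for grad f(x) = 0
            break
        gamma = gamma0                     # initial estimate of the step size
        # Halve gamma until the sufficient-decrease condition holds.
        while f(x - gamma * g) > f(x) - gamma * gn2 / 2:
            gamma /= 2
        x = x - gamma * g
    return x

# Example: a narrow quadratic valley f(x) = x0^2 + 100*x1^2.
f = lambda x: x[0]**2 + 100*x[1]**2
grad = lambda x: np.array([2*x[0], 200*x[1]])
print(gd_backtracking(f, grad, np.array([1.0, 1.0])))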
Descent with Exact Line Search

I p~ is any descent direction, that is, ∇f (~x )T p~ < 0.


I Solve P : min_{α>0} f(~x + α ~p) for its optimum solution α^*.
I Replace the current solution ~x by ~x ← ~x + α^* ~p.
I Solving P exactly is often possible, since it involves only the single variable α.
I Tries to update the current solution to the best possible one
in the direction of descent ~p (see the quadratic-case sketch below).
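A minimal sketch of exact line search in the quadratic case f(~x) = ~x^T Q ~x / 2 − b^T ~x (an assumption made for illustration; Q symmetric positive definite), where min_α f(~x + α~p) has the closed form α^* = −∇f(~x)^T ~p / (~p^T Q ~p):

import numpy as np

def exact_line_search_quadratic(Q, b, x, p):
    # f(x) = 0.5 x^T Q x - b^T x ;  grad f(x) = Q x - b.
    g = Q @ x - b
    alpha = -(g @ p) / (p @ (Q @ p))       # closed-form minimiser of f(x + alpha*p)
    return x + alpha * p

# One steepest-descent step (p = -grad f) with exact line search.
Q = np.array([[2.0, 0.0], [0.0, 10.0]])
b = np.array([1.0, 1.0])
x = np.array([5.0, 5.0])
p = -(Q @ x - b)                           # descent direction
print(exact_line_search_quadratic(Q, b, x, p))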
Newton’s method (NM) vs Gradient descent (GD)

I GD guarantees convergence (to a stationary point) while NM can
fail if the Hessian is not positive definite.
I NM provides a quadratic rate of convergence if ~x_0 is
“reasonably close” to a local minimum.
I NM is computationally expensive (computing Hessian and its
inverse) and also suffers from numerical instabilities.
I Where applicable, NM converges much faster than GD if we
start within a suitable neighborhood.
I For GD, choose step size small in regions of greater variability
of the gradient and large in regions of small variability.
Gaussian Smoothing

I Tries to find a convex approximation of f by employing
Gaussian smoothing.
I f is replaced by g where g(x) is a weighted average of the values
of f in a neighborhood of x (see the sketch below).
I The weights are chosen by employing a Gaussian distribution.
I Has the effect of smoothing out sudden dips or ascents in the
value of f .
I Often helps find a global minimum (as against a local one),
even for non-convex f.
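A minimal Monte-Carlo sketch of one way to realise this idea (an assumption for illustration, not the lecture's prescribed procedure): g(x) is estimated as the average of f over Gaussian perturbations of x, which damps sudden dips and ascents.

import math
import random

def gaussian_smooth(f, x, sigma=0.5, samples=2000):
    # Estimate g(x) = E[ f(x + sigma*Z) ], Z ~ N(0, 1), by sampling.
    return sum(f(x + sigma * random.gauss(0.0, 1.0)) for _ in range(samples)) / samples

# Example: a wiggly non-convex function; smoothing damps the small dips.
f = lambda x: x**2 + 0.5 * math.sin(20 * x)
print(f(0.3), gaussian_smooth(f, 0.3))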
Stochastic Gradient Descent

I A very efficient and useful tool for f's of the form
I f(~x) = Σ_{i=1}^n f_i(~x).
I
I Example : Least-squares : Minimize
I f(~x) = Σ_{i=1}^n (y_i − ~a_i^T ~x)^2 = Σ_{i=1}^n f_i(~x).
I
I Shortens the work to compute ∇f(~x) by computing a
stochastic approximation to it : O(d) work per step instead of O(nd).
I Idea : Choose i ∈ {1, . . . , n} uniformly at random and
compute r(~x) = n ∇f_i(~x).
I E[r(~x)] = Σ_{i=1}^n n ∇f_i(~x) / n = ∇f(~x).
I r(~x) is an unbiased estimator of ∇f(~x) (see the sketch below).
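A minimal sketch of SGD for the least-squares objective above, assuming the data rows ~a_i and targets y_i are given as NumPy arrays (step size, iteration count and seed are illustrative):

import numpy as np

def sgd_least_squares(A, y, steps=20000, gamma=0.01, seed=0):
    # Minimise f(x) = sum_i (y_i - a_i^T x)^2 by sampling one term per step.
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)                          # uniform random index
        r = -2.0 * (y[i] - A[i] @ x) * A[i] * n      # r(x) = n * grad f_i(x), unbiased for grad f(x)
        x = x - gamma * r / n                        # O(d) work per step, vs O(nd) for the full gradient
    return x

# Example: recover x_true = (1, -2) from noiseless data.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
y = A @ np.array([1.0, -2.0])
print(sgd_least_squares(A, y))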
Convex sets
I S ⊆ R d is convex if,
I λx + (1 − λ)y ∈ S, ∀x, y ∈ S and 0 ≤ λ ≤ 1.
I for every x, y ∈ S, the unique line segment L(~x , ~y ) is also in S.
I
I Examples : circles, d-boxes (a_i ≤ x_i ≤ b_i ∀i), d-spheres
(d_2(x, c)^2 ≤ r^2), hyperplanes (a^T · x = b), half-spaces
(a^T · x ≤ b), feasible solutions of a system of linear
constraints ({~x : Ax ≤ b}), etc.
I
I S_i (i ∈ I) are convex ⇒ ∩_{i∈I} S_i is convex.
I
I A convex combination of x_1, . . . , x_k is any vector y satisfying
I y = Σ_i λ_i x_i where λ_i ≥ 0, ∀i and Σ_i λ_i = 1.
I S is convex ⇒ all convex combinations of any finite subset of
S are also in S.
Convex functions

I S ⊆ Rd is convex. f : S → R - scalar function.


I
I f is convex over S if, ∀x, y ∈ S and 0 ≤ λ ≤ 1,
I f (λx + (1 − λ)y ) ≤ λf (x) + (1 − λ)f (y ).
I for every x, y ∈ S, the graph of f between x and y lies entirely
on or below the line segment joining (x, f(x)) and (y, f(y)).
I
I f is strictly convex if ≤ is replaced with <.
I f is concave if ≤ is replaced with ≥.
I f is strictly concave if ≥ is replaced with >.
I f is convex over S if and only if −f is concave over S.
Convex functions

I Examples : a^T x + b, x^T Q x for pos-def. Q, etc. over R^d.
I Examples of f : R → R : x^2, x^4, x^6, . . . , e^{ax}, a ∈ R, over R.
I
I f is convex ⇔ f (y ) ≥ f (x) + ∇f (x) · (y − x), ∀x, y ∈ S.
I f is convex ⇔ ∇2 f (x) is pos-semi-def. for every x ∈ S.
I ∇^2 f(x) pos-def. ∀ x ∈ S ⇒ f is strictly convex (the converse need not hold, e.g., f(x) = x^4).
I Analogous statements hold true for concave functions.
I
I f , g convex over S, α ≥ 0 ⇒ f + g and αf are convex over S.
Minima and maxima of convex functions
I f : S → R - convex function. S ⊆ Rd - convex.
I
I x ∈ S is a local minimum for f if, for some r > 0,
I f (x) ≤ f (y ) for every y ∈ S satisfying ||x − y ||2 ≤ r .
I
I x is a local minimum of f over S if and only if
I (A) : ∇f (x) · (y − x) ≥ 0 for every y ∈ S.
I follows from f'(x, y − x) = ∇f(x) · (y − x).
I For S = Rd , (A) ⇔ ∇f (x) = 0.
I
I Every local minimum of f is a global minimum of f over S.
I follows by considering y ∈ S arbitrarily close to x.
I
