(EE227A: UC Berkeley)
Lecture 3
(Convex sets and functions)
29 Jan, 2013
Suvrit Sra
Course organization
http://people.kyb.tuebingen.mpg.de/suvrit/teach/ee227a/
Relevant texts / references:
♥ Convex optimization – Boyd & Vandenberghe (BV)
♥ Introductory lectures on convex optimisation – Nesterov
♥ Nonlinear programming – Bertsekas
♥ Convex Analysis – Rockafellar
♥ Numerical optimization – Nocedal & Wright
♥ Lectures on modern convex optimization – Nemirovski
♥ Optimization for Machine Learning – Sra, Nowozin, Wright
Instructor: Suvrit Sra (suvrit@gmail.com)
(Max Planck Institute for Intelligent Systems, Tübingen, Germany)
HW + Quizzes (40%); Midterm (30%); Project (30%)
TA Office hours to be posted soon
I don’t have an office yet
If you email me, please put EE227A in Subject:
Linear algebra recap
Eigenvalues and Eigenvectors
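Def. Let $A \in \mathbb{C}^{n \times n}$ and $x \in \mathbb{C}^n$. Consider the equation
$$Ax = \lambda x, \quad x \neq 0, \quad \lambda \in \mathbb{C}.$$
If a scalar $\lambda$ and a vector $x$ satisfy this equation, then $\lambda$ is called an
eigenvalue and $x$ an eigenvector of $A$.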
Theorem. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $A \in \mathbb{C}^{n \times n}$. Then,
$$\operatorname{Tr}(A) = \sum_i a_{ii} = \sum_i \lambda_i, \qquad \det(A) = \prod_i \lambda_i.$$
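The trace and determinant identities can be checked numerically. The sketch below assumes NumPy is available; the matrix `A` and the helper name `trace_det_from_eigenvalues` are illustrative choices, not part of the lecture.

```python
import numpy as np

def trace_det_from_eigenvalues(A):
    """Return (sum of eigenvalues, product of eigenvalues) of a square matrix."""
    eigvals = np.linalg.eigvals(A)  # eigenvalues may be complex in general
    return eigvals.sum(), eigvals.prod()

# Illustrative (non-symmetric) test matrix; any square matrix would do.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])

eig_sum, eig_prod = trace_det_from_eigenvalues(A)

# Both comparisons should hold up to floating-point error.
print(np.isclose(eig_sum, np.trace(A)))
print(np.isclose(eig_prod, np.linalg.det(A)))
```

The eigenvalues of a real non-symmetric matrix can come in complex-conjugate pairs, which is why the comparison uses `np.isclose` rather than exact equality.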
Positive definite matrices
Def. Let $A \in \mathbb{R}^{n \times n}$ be symmetric, i.e., $a_{ij} = a_{ji}$. Then, $A$ is called
positive definite if
$$x^T A x = \sum_{ij} x_i a_{ij} x_j > 0, \quad \forall\, x \neq 0.$$
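One practical way to test this definition relies on a standard fact that is not stated on the slide: a symmetric matrix is positive definite exactly when all of its eigenvalues are positive. The sketch below assumes NumPy; the function name `is_positive_definite` and the tolerance are illustrative.

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Check symmetry, then positivity of the smallest eigenvalue."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    if not np.allclose(A, A.T):
        return False  # the definition above assumes a symmetric matrix
    # eigvalsh is meant for symmetric matrices and returns real eigenvalues.
    return bool(np.linalg.eigvalsh(A).min() > tol)

print(is_positive_definite([[2.0, -1.0], [-1.0, 2.0]]))  # eigenvalues 1 and 3
print(is_positive_definite([[1.0, 2.0], [2.0, 1.0]]))    # eigenvalues -1 and 3
```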
Matrix and vector calculus
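$f(x)$                                               $\nabla f(x)$
$x^T a = \sum_i x_i a_i$                             $a$
$x^T A x = \sum_{ij} x_i a_{ij} x_j$                 $(A + A^T) x$
$\log\det(X)$                                        $X^{-1}$ (for symmetric $X$; $X^{-T}$ in general)
$\operatorname{Tr}(XA) = \sum_{ij} x_{ij} a_{ji}$    $A^T$
$\operatorname{Tr}(X^T A) = \sum_{ij} x_{ij} a_{ij}$ $A$
$\operatorname{Tr}(X^T A X)$                         $(A + A^T) X$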
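Entries of such a table can be sanity-checked with central finite differences. The sketch below assumes NumPy and checks the first two rows on random data; `numerical_gradient`, the step size `h`, and the random seed are illustrative assumptions.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at x by central finite differences."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * h)
    return grad

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # deliberately not symmetric
a = rng.standard_normal(4)
x = rng.standard_normal(4)

grad_linear = numerical_gradient(lambda v: v @ a, x)     # expect a
grad_quad = numerical_gradient(lambda v: v @ A @ v, x)   # expect (A + A^T) x

print(np.allclose(grad_linear, a, atol=1e-6))
print(np.allclose(grad_quad, (A + A.T) @ x, atol=1e-5))
```

Using a non-symmetric `A` makes the check distinguish $(A + A^T)x$ from $2Ax$.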
♣ Wikipedia
♣ My ancient notes
♣ Matrix cookbook
♣ I hope to put up notes on a less brute-force approach.
Convex Sets
Def. A set $C \subset \mathbb{R}^n$ is called convex if, for any $x, y \in C$, the
line segment $\theta x + (1 - \theta) y$ (here $0 \le \theta \le 1$) also lies in $C$.
Combinations
• Convex: $\theta_1 x + \theta_2 y \in C$, where $\theta_1, \theta_2 \ge 0$ and $\theta_1 + \theta_2 = 1$.
• Linear: if the restrictions on $\theta_1, \theta_2$ are dropped.
• Conic: if the restriction $\theta_1 + \theta_2 = 1$ is dropped.
Theorem (Intersection).
Let $C_1, C_2$ be convex sets. Then, $C_1 \cap C_2$ is also convex.
Proof. If $C_1 \cap C_2 = \emptyset$, then the claim holds vacuously.
Otherwise, let $x, y \in C_1 \cap C_2$ and $\theta \in [0, 1]$. Then, $x, y \in C_1$ and $x, y \in C_2$.
But $C_1, C_2$ are convex, hence $\theta x + (1 - \theta) y \in C_1$, and also in $C_2$.
Thus, $\theta x + (1 - \theta) y \in C_1 \cap C_2$.
It follows by induction that $\bigcap_{i=1}^m C_i$ is also convex.
Convex sets – more examples
♥ Let $x_1, x_2, \ldots, x_m \in \mathbb{R}^n$. Their convex hull is
$$\operatorname{co}(x_1, \ldots, x_m) := \Bigl\{ \sum_i \theta_i x_i \;\Big|\; \theta_i \ge 0,\ \sum_i \theta_i = 1 \Bigr\}.$$
♥ polyhedron $\{x \mid Ax \le b,\ Cx = d\}$.
♥ ellipsoid $\{x \mid (x - x_0)^T A (x - x_0) \le 1\}$, ($A$: semidefinite)
♥ probability simplex $\{x \mid x \ge 0,\ \sum_i x_i = 1\}$
Convex functions
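Def. A function $f : I \to \mathbb{R}$ on an interval $I$ is called midpoint convex if
$$f\Bigl(\frac{x + y}{2}\Bigr) \le \frac{f(x) + f(y)}{2}, \quad \text{whenever } x, y \in I.$$
Read: f of AM is less than or equal to AM of f (AM = arithmetic mean).
Def. A function $f : \mathbb{R}^n \to \mathbb{R}$ is called convex if its domain dom(f)
is a convex set and for any $x, y \in \operatorname{dom}(f)$ and $\lambda \in [0, 1]$,
$$f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y).$$
[Figure: graph of $f$ between $x$ and $y$; the chord value $\lambda f(x) + (1 - \lambda) f(y)$ lies above the graph.]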
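The defining inequality suggests a simple numerical test for functions of one variable. Sampling can only refute convexity, never prove it, so the sketch below (assuming NumPy; the helper name `find_convexity_violation`, the grid, and the tolerance are illustrative) searches for a counterexample.

```python
import numpy as np

def find_convexity_violation(f, points, num_lambdas=11, tol=1e-9):
    """Return (x, y, lam) violating the convexity inequality, or None."""
    lambdas = np.linspace(0.0, 1.0, num_lambdas)
    for x in points:
        for y in points:
            for lam in lambdas:
                lhs = f(lam * x + (1.0 - lam) * y)
                rhs = lam * f(x) + (1.0 - lam) * f(y)
                if lhs > rhs + tol:
                    return (x, y, lam)
    return None  # no violation found on this sample (not a proof)

points = np.linspace(-3.0, 3.0, 13)
print(find_convexity_violation(np.exp, points))  # exp is convex
print(find_convexity_violation(np.sin, points))  # sin is not convex on [-3, 3]
```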
[Figure: graph of $f$ with its tangent at $y$; $f(x)$ lies above the linearization $f(y) + \langle \nabla f(y), x - y \rangle$.]
[Figure: labeled points $P$, $Q$, $R$, with $x$, $z = \lambda x + (1 - \lambda) y$, and $y$ marked on the horizontal axis.]
Recognizing convex functions
♠ If $f$ is continuous and midpoint convex, then it is convex.
♠ If $f$ is differentiable, then $f$ is convex if and only if dom $f$ is
convex and $f(x) \ge f(y) + \langle \nabla f(y), x - y \rangle$ for all $x, y \in \operatorname{dom} f$.
♠ If $f$ is twice differentiable, then $f$ is convex if and only if dom $f$
is convex and $\nabla^2 f(x) \succeq 0$ at every $x \in \operatorname{dom} f$.
Convex functions
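Linear: $f(\theta_1 x + \theta_2 y) = \theta_1 f(x) + \theta_2 f(y)$; $\theta_1, \theta_2$ unrestricted
Concave: $f(\theta x + (1 - \theta) y) \ge \theta f(x) + (1 - \theta) f(y)$ for $\theta \in [0, 1]$
Strictly convex: if the convexity inequality is strict for $x \neq y$ and $\theta \in (0, 1)$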
Example. The pointwise maximum of a family of convex functions is
convex. That is, if $f(x; y)$ is a convex function of $x$ for every $y$ in
some “index set” $\mathcal{Y}$, then $g(x) := \max_{y \in \mathcal{Y}} f(x; y)$ is a convex function of $x$.
Recognizing convex functions
♠ By showing $f$ to be a pointwise max of convex functions.
♠ By restricting $f$ to lines: $f : \operatorname{dom}(f) \to \mathbb{R}$ is convex if and only if its
restriction to any line that intersects dom(f) is convex. That is,
for any $x \in \operatorname{dom}(f)$ and any $v$, the function $g(t) = f(x + tv)$ is
convex (on its domain $\{t \mid x + tv \in \operatorname{dom}(f)\}$).
♠ See exercises (Ch. 3) in Boyd & Vandenberghe for more ways
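The restriction-to-a-line criterion is easy to probe numerically: build $g(t) = f(x + tv)$ and inspect its second differences on a uniform grid, which are nonnegative whenever $g$ is convex. The sketch below assumes NumPy and uses log-sum-exp as the test function (a standard convex function chosen here for illustration, not taken from the slides); all helper names are hypothetical.

```python
import numpy as np

def restrict_to_line(f, x, v):
    """Return the one-dimensional function g(t) = f(x + t v)."""
    return lambda t: f(x + t * v)

def second_differences(g, ts):
    """Second differences g(t-h) - 2 g(t) + g(t+h) on a uniform grid ts."""
    vals = np.array([g(t) for t in ts])
    return vals[:-2] - 2.0 * vals[1:-1] + vals[2:]

def log_sum_exp(z):
    """Numerically stable log(sum(exp(z)))."""
    m = z.max()
    return m + np.log(np.exp(z - m).sum())

rng = np.random.default_rng(1)
x = rng.standard_normal(5)
v = rng.standard_normal(5)

g = restrict_to_line(log_sum_exp, x, v)
d2 = second_differences(g, np.linspace(-2.0, 2.0, 41))
print(d2.min() >= -1e-9)  # no negative second difference along this line
```

A single line is only a spot check; the criterion requires convexity along every line through the domain.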
Operations preserving convexity
Examples
Quadratic
Let $f(x) = x^T A x + b^T x + c$, where $A \succeq 0$, $b \in \mathbb{R}^n$, and $c \in \mathbb{R}$.
What is $\nabla^2 f(x)$?
$\nabla f(x) = 2Ax + b$, $\nabla^2 f(x) = 2A \succeq 0$, hence $f$ is convex.
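For symmetric $A$, differentiating the gradient $2Ax + b$ once more gives the constant Hessian $2A$. The sketch below, assuming NumPy, compares a finite-difference Hessian against $2A$ for a small positive semidefinite $A$; the helper names and the specific `B`, `b`, `c` are illustrative.

```python
import numpy as np

def quadratic(A, b, c):
    """Return f(x) = x^T A x + b^T x + c as a Python callable."""
    return lambda x: x @ A @ x + b @ x + c

def numerical_hessian(f, x, h=1e-4):
    """Approximate the Hessian of f at x by central second differences."""
    n = x.size
    I = np.eye(n)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + h * I[i] + h * I[j]) - f(x + h * I[i] - h * I[j])
                       - f(x - h * I[i] + h * I[j]) + f(x - h * I[i] - h * I[j])) / (4.0 * h * h)
    return H

B = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, -1.0]])
A = B.T @ B                      # symmetric positive semidefinite by construction
b = np.array([1.0, -2.0])
c = 0.5

f = quadratic(A, b, c)
H = numerical_hessian(f, np.array([0.3, -0.7]))
print(np.allclose(H, 2.0 * A, atol=1e-5))    # Hessian matches 2A
print(np.linalg.eigvalsh(A).min() >= 0.0)    # A has no negative eigenvalue
```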
Indicator
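Let $I_{\mathcal{X}}$ be the indicator function for $\mathcal{X}$, defined as
$$I_{\mathcal{X}}(x) := \begin{cases} 0 & \text{if } x \in \mathcal{X}, \\ \infty & \text{otherwise.} \end{cases}$$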
Distance to a set
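Example. Let $\mathcal{Y}$ be a convex set. Let $x \in \mathbb{R}^n$ be some point. The
distance of $x$ to the set $\mathcal{Y}$ is defined as
$$\operatorname{dist}(x, \mathcal{Y}) := \inf_{y \in \mathcal{Y}} \|x - y\|.$$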
Norms
Let f : Rn → R be a function that satisfies
1. $f(x) \ge 0$, and $f(x) = 0$ if and only if $x = 0$ (definiteness)
2. $f(\lambda x) = |\lambda| f(x)$ for any $\lambda \in \mathbb{R}$ (positive homogeneity)
3. $f(x + y) \le f(x) + f(y)$ (subadditivity)
Such a function is called a norm. We usually denote norms by $\|x\|$.
Theorem Norms are convex.
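The theorem follows in two steps from the norm axioms. The derivation below is a short sketch for any $x, y \in \mathbb{R}^n$ and $\theta \in [0, 1]$, using subadditivity (property 3) and then positive homogeneity (property 2).

```latex
\begin{align*}
f(\theta x + (1-\theta) y)
  &\le f(\theta x) + f((1-\theta) y)
     && \text{(subadditivity)} \\
  &= |\theta|\, f(x) + |1-\theta|\, f(y)
     && \text{(positive homogeneity)} \\
  &= \theta f(x) + (1-\theta) f(y)
     && (\theta \ge 0,\ 1-\theta \ge 0).
\end{align*}
```

Since the domain $\mathbb{R}^n$ is convex, this is exactly the defining inequality of a convex function.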
Vector norms
Example ($\ell_p$-norm): Let $p \ge 1$. $\|x\|_p = \bigl(\sum_i |x_i|^p\bigr)^{1/p}$
Example (Frobenius norm): Let $A \in \mathbb{R}^{m \times n}$. The Frobenius norm
of $A$ is $\|A\|_F := \sqrt{\sum_{ij} |a_{ij}|^2}$; that is, $\|A\|_F = \sqrt{\operatorname{Tr}(A^* A)}$.
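Both formulas translate directly into code. The sketch below assumes NumPy and compares hand-written versions against `np.linalg.norm`; the function names `lp_norm` and `frobenius_norm` are illustrative.

```python
import numpy as np

def lp_norm(x, p):
    """l_p norm (sum_i |x_i|^p)^(1/p) for finite p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1 for this to be a norm")
    x = np.asarray(x, dtype=float)
    return float((np.abs(x) ** p).sum() ** (1.0 / p))

def frobenius_norm(A):
    """Frobenius norm computed as sqrt(Tr(A^* A))."""
    A = np.asarray(A)
    return float(np.sqrt(np.trace(A.conj().T @ A).real))

x = np.array([3.0, -4.0])
M = np.array([[1.0, 2.0], [3.0, 4.0]])

print(lp_norm(x, 1), lp_norm(x, 2))                           # 7.0 5.0
print(np.isclose(lp_norm(x, 3), np.linalg.norm(x, 3)))
print(np.isclose(frobenius_norm(M), np.linalg.norm(M, "fro")))
```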
Mixed norms
$\|x\|_p := \|\xi\|_{p_0}$.
Matrix Norms
Induced norm
Let $A \in \mathbb{R}^{m \times n}$, and let $\|\cdot\|$ be any vector norm. We define an
induced matrix norm as
$$\|A\| := \sup_{\|x\| \neq 0} \frac{\|Ax\|}{\|x\|}.$$
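Because the induced norm is a supremum, any sampled ratio $\|Ax\| / \|x\|$ is a lower bound on it. The sketch below assumes NumPy and compares sampled ratios for the $\ell_1$, $\ell_2$ and $\ell_\infty$ vector norms with the values returned by `np.linalg.norm(A, p)`, which computes the corresponding induced matrix norms; the helper name, sample count, and seed are illustrative.

```python
import numpy as np

def sampled_induced_norm(A, p, num_samples=2000, seed=0):
    """Largest ratio ||A x||_p / ||x||_p over random x (a lower bound on ||A||)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((A.shape[1], num_samples))  # one sample per column
    ratios = np.linalg.norm(A @ X, ord=p, axis=0) / np.linalg.norm(X, ord=p, axis=0)
    return float(ratios.max())

A = np.array([[1.0, -2.0, 0.5],
              [3.0, 0.0, -1.0]])

for p in (1, 2, np.inf):
    lower = sampled_induced_norm(A, p)
    exact = np.linalg.norm(A, p)
    print(p, lower <= exact + 1e-9)
```

Random sampling usually leaves a gap below the true value; it illustrates the definition rather than computing the norm.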
Operator norm
$$\|A\|_2 := \sup_{\|x\|_2 \neq 0} \frac{\|Ax\|_2}{\|x\|_2}.$$
is a norm; 1 ≤ k ≤ n.
Dual norms
Fenchel Conjugate
Example. Let $f(x) = \|x\|$. We have $f^*(z) = I_{\|\cdot\|_* \le 1}(z)$. That is, the
conjugate of a norm is the indicator function of the dual norm ball.
Proof. $f^*(z) = \sup_x\, z^T x - \|x\|$. If $\|z\|_* > 1$, then by definition of the dual
norm, there is $u$ s.t. $\|u\| \le 1$ and $u^T z > 1$. Now select $x = \alpha u$ and let
$\alpha \to \infty$. Then, $z^T x - \|x\| = \alpha (z^T u - \|u\|) \to \infty$. If $\|z\|_* \le 1$, then
$z^T x \le \|x\| \|z\|_* \le \|x\|$, so $z^T x - \|x\| \le 0$ with equality at $x = 0$, which implies the sup must be zero.
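The two cases of the argument can be seen numerically for the Euclidean norm, which is its own dual (a standard fact assumed here, not stated on the slide). The sketch below assumes NumPy; the vectors `z_inside` and `z_outside` and the helper name are illustrative.

```python
import numpy as np

def conjugate_objective(z, x):
    """The quantity z^T x - ||x||_2 whose supremum over x defines f*(z)."""
    return float(z @ x - np.linalg.norm(x))

z_inside = np.array([0.6, 0.0])    # ||z||_2 = 0.6 <= 1
z_outside = np.array([1.2, 0.9])   # ||z||_2 = 1.5 > 1

# Case ||z||_* <= 1: the objective never exceeds 0, and x = 0 attains 0.
rng = np.random.default_rng(0)
samples = 10.0 * rng.standard_normal((1000, 2))
max_inside = max(conjugate_objective(z_inside, x) for x in samples)
print(max_inside <= 0.0, conjugate_objective(z_inside, np.zeros(2)) == 0.0)

# Case ||z||_* > 1: along x = alpha * u with u = z / ||z||_2 the objective
# equals alpha * (||z||_2 - 1), which grows without bound.
u = z_outside / np.linalg.norm(z_outside)
growth = [conjugate_objective(z_outside, alpha * u) for alpha in (1.0, 10.0, 100.0)]
print(growth)
```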
Misc Convexity
Other forms of convexity
♣ Log-convex: $\log f$ is convex; log-cvx $\implies$ cvx;
♣ Log-concavity: $\log f$ concave; not closed under addition!
♣ Exponentially convex: $[f(x_i + x_j)] \succeq 0$, for $x_1, \ldots, x_n$
♣ Operator convex: $f(\lambda X + (1 - \lambda) Y) \preceq \lambda f(X) + (1 - \lambda) f(Y)$
♣ Quasiconvex: $f(\lambda x + (1 - \lambda) y) \le \max\{f(x), f(y)\}$
♣ Pseudoconvex: $\langle \nabla f(y), x - y \rangle \ge 0 \implies f(x) \ge f(y)$
♣ Discrete convexity: f : Zn → Z; “convexity + matroid theory.”
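The remark that log-concavity is not closed under addition can be illustrated with two shifted Gaussian bumps, an example chosen here and not taken from the slides. Each bump $e^{-(x \mp 3)^2/2}$ is log-concave, but the log of their sum has positive curvature near $x = 0$. The sketch assumes NumPy; the helper name and grid are illustrative.

```python
import numpy as np

def second_differences(values):
    """Second differences v[i-1] - 2 v[i] + v[i+1] of samples on a uniform grid."""
    values = np.asarray(values, dtype=float)
    return values[:-2] - 2.0 * values[1:-1] + values[2:]

xs = np.linspace(-6.0, 6.0, 241)
log_f1 = -0.5 * (xs - 3.0) ** 2          # log of exp(-(x - 3)^2 / 2), concave
log_f2 = -0.5 * (xs + 3.0) ** 2          # log of exp(-(x + 3)^2 / 2), concave
log_sum = np.logaddexp(log_f1, log_f2)   # log(f1 + f2), computed stably

print(second_differences(log_f1).max() < 0.0)   # concave: all negative
print(second_differences(log_sum).max() > 0.0)  # not concave: some positive
```

Analytically, $\log(f_1 + f_2) = -x^2/2 - 9/2 + \log(2\cosh 3x)$, whose second derivative at $0$ is $-1 + 9 = 8 > 0$.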