Introduction to Convex Optimization
Georgia Tech ECE 8823c notes by M. A. Davenport and J. Romberg. Last updated 12:28, January 8, 2019
Introduction to Optimization
Optimization problems are ubiquitous in science and engineering.
Optimization problems arise any time we have a collection of elements and wish to select the “best” one (according to some criterion). In its most general form, we can express an optimization problem mathematically as
minimize_x f0(x)   subject to   x ∈ X.                    (1)

In order to solve this optimization problem, we must find an x̂ ∈ X such that

f0(x̂) ≤ f0(x)   for all x ∈ X.                            (2)

We call an x̂ satisfying (2) a minimizer of f0 in X, and a solution to the optimization problem (1).
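To make (1) and (2) concrete, here is a minimal numerical sketch in Python; the particular choices f0(x) = (x − 3)² and X = [0, 1] are our own illustration, not an example from the notes.

    # Minimal instance of (1): minimize f0(x) = (x - 3)^2 over X = [0, 1].
    # f0 is decreasing on [0, 1], so the minimizer is x_hat = 1.
    from scipy.optimize import minimize_scalar

    f0 = lambda x: (x - 3.0) ** 2

    result = minimize_scalar(f0, bounds=(0.0, 1.0), method="bounded")
    x_hat = result.x

    print(x_hat)      # approximately 1.0
    print(f0(x_hat))  # approximately 4.0; f0(x_hat) <= f0(x) for every x in [0, 1]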
2. Uniqueness. Note that an x̂ satisfying (2) need not be unique. Only when the inequality is strict can we conclude that there is a unique (strict) minimizer. When can we conclude that there is a unique solution?
3. Verification. Given a candidate solution x̂, is there a simple condition we can check to determine if it is a/the solution to (1)?
4. Solution. Can we find a closed-form expression for a/the solution to (1)? Can we provide an efficient algorithm for computing a/the solution to (1)?
Throughout this course we will devote significant attention to all of
these questions, primarily in the context of convex problems.
Convex Optimization
A function f0 is convex if

f0(θx + (1 − θ)y) ≤ θf0(x) + (1 − θ)f0(y)

for all x, y and for all θ ∈ [0, 1]. (Much more on this later!) With these definitions in hand, a convex program simply corresponds to one where

1. The constraint set X is a convex subset of a real vector space (in this class we will focus exclusively on X ⊆ R^N).

2. The objective function f0 : X → R is a convex function.
² And as we will see, much of what we do can be naturally extended to non-smooth functions which do not have any derivatives.
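As a quick illustration of the convexity inequality above, the following sketch checks it numerically on random samples for f0(x) = ‖x‖₂ (the Euclidean norm, which we will see is convex); the test function and dimensions are our own choices for illustration, and a random check is of course not a proof.

    # Numerically spot-check f0(theta*x + (1-theta)*y) <= theta*f0(x) + (1-theta)*f0(y)
    # for f0(x) = ||x||_2 on random points x, y and random theta in [0, 1].
    import numpy as np

    rng = np.random.default_rng(0)
    f0 = lambda x: np.linalg.norm(x)

    for _ in range(10000):
        x, y = rng.standard_normal(5), rng.standard_normal(5)
        theta = rng.uniform()
        lhs = f0(theta * x + (1 - theta) * y)
        rhs = theta * f0(x) + (1 - theta) * f0(y)
        assert lhs <= rhs + 1e-12

    print("convexity inequality held on all random samples")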
The material in this course has three major components. The first
is the mathematical foundations of convex optimization. We will
see that talking about the solution to convex programs requires a
beautiful combination of algebraic and geometric ideas.
Finally, we will talk a lot about modeling. That is, how convex optimization appears in signal processing, control systems, machine learning, statistical inference, etc. We will give many examples of mapping a word problem into an optimization program. These examples will be interleaved with the discussion of the first two components, and there are several examples that we will return to multiple times.
Convexity and Efficiency
You might have two questions at this point:
This is a valid matrix norm, and we will see later that all valid
norms are convex. But it is known that computing f0 is NP-hard (see
[Roh00]), as is approximating it to a fixed accuracy. So optimizations
involving this quantity are bound to be difficult.
When we can solve a nonconvex program, it is often because of nice coincidences in the structure of the problem; perturbing the problem just a little bit can disturb these coincidences. Consider another nonconvex program that we know how to solve:
minimize_X Σ_{i,j=1}^{N} (X_{i,j} − A_{i,j})²   subject to   rank(X) ≤ R.
That is, we want the best rank-R approximation (in the least-squares sense) to the N × N matrix A. The functional we are optimizing above is convex, but the rank constraint definitely is not. Nevertheless, we can compute the answer efficiently using the SVD of A:
A = UΣV^T = Σ_{n=1}^{N} σ_n u_n v_n^T,   σ_1 ≥ σ_2 ≥ ··· ≥ σ_N ≥ 0.

The solution is obtained simply by keeping the first R terms of this sum.
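As a sketch of this, the following Python snippet forms the best rank-R approximation by truncating the SVD; the matrix A here is a random placeholder chosen only for illustration.

    # Best rank-R approximation of A (in the least-squares / Frobenius sense)
    # by keeping the R largest singular values and their singular vectors.
    import numpy as np

    rng = np.random.default_rng(0)
    N, R = 50, 5
    A = rng.standard_normal((N, N))

    U, s, Vt = np.linalg.svd(A)                       # s sorted in decreasing order
    X_hat = U[:, :R] @ np.diag(s[:R]) @ Vt[:R, :]     # truncate to the first R terms

    print(np.linalg.matrix_rank(X_hat))               # R
    print(np.linalg.norm(A - X_hat, "fro") ** 2)      # value of the objective at X_hat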
But now suppose that instead of the matrix A, we are given a subset
of its entries indexed by I. We now want to find the matrix that is
most consistent over this subset while also having rank at most R:
minimize_X Σ_{(i,j)∈I} (X_{i,j} − A_{i,j})²   subject to   rank(X) ≤ R.
Despite its similarity to the first problem above, this “matrix completion” problem is NP-hard in general.
positive, and considering functionals of linear transforms of x all
preserve the essential convex structure.
For the rest of this introduction, we will introduce a few of the very
well-known classes of convex optimization programs, and give an
example of an application for each.
Linear programming
A linear program (LP) minimizes a linear functional subject to a set of linear inequality constraints:
minimize_x c^T x   subject to   a_m^T x ≤ b_m,   m = 1, . . . , M.
The general form above can include linear equality constraints a_i^T x = b_i by enforcing both a_i^T x ≤ b_i and (−a_i)^T x ≤ −b_i; in our study later on, we will find it convenient to specifically distinguish between these two types of constraints. We can also write the M constraints compactly as Ax ≤ b, where A is the M × N matrix with the a_m^T as its rows.
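As a minimal sketch of solving an LP in this form, the snippet below uses scipy.optimize.linprog; the particular c, A, and b are arbitrary illustrative values, not an example from the notes.

    # Solve: minimize c^T x subject to A x <= b (elementwise), with free x.
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([-1.0, -2.0])
    A = np.array([[-1.0,  0.0],   # -x1      <= 0  (i.e., x1 >= 0)
                  [ 0.0, -1.0],   #      -x2 <= 0  (i.e., x2 >= 0)
                  [ 1.0,  1.0]])  #  x1 + x2 <= 4
    b = np.array([0.0, 0.0, 4.0])

    res = linprog(c, A_ub=A, b_ub=b, bounds=(None, None))
    print(res.x, res.fun)         # optimal x and the optimal value c^T x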
Example: Chebyshev approximations
Suppose that we want to find the vector x so that Ax deviates as little as possible from a vector of observations y in its maximum deviation; that is, we want to minimize max_m |y_m − a_m^T x|. This can be recast as a linear program:
minimize_{x∈R^N, u∈R} u   subject to   y_m − a_m^T x ≤ u,
                                       y_m − a_m^T x ≥ −u,   m = 1, . . . , M.
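A sketch of this reformulation in code: stack the variables as z = [x; u] and pass the two sets of constraints to scipy.optimize.linprog. The data A and y below are random placeholders for illustration.

    # Chebyshev approximation as an LP in z = [x; u]:
    #   minimize u  subject to  y_m - a_m^T x <= u  and  y_m - a_m^T x >= -u.
    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(0)
    M, N = 30, 5
    A = rng.standard_normal((M, N))
    y = rng.standard_normal(M)

    c = np.concatenate([np.zeros(N), [1.0]])          # objective: minimize u
    A_ub = np.block([[-A, -np.ones((M, 1))],          #  y_m - a_m^T x <= u
                     [ A, -np.ones((M, 1))]])         #  a_m^T x - y_m <= u
    b_ub = np.concatenate([-y, y])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    x_hat, u_hat = res.x[:N], res.x[-1]
    print(u_hat, np.max(np.abs(y - A @ x_hat)))       # these two numbers agree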
Filter design
If we restrict ourselves to the case where H∗(ω) has linear phase (so the impulse response is symmetric around some time index)³, we can recast this as a Chebyshev approximation problem.
We will approximate the supremum on the inside by measuring it at
M equally spaced points ω1, . . . , ωM between −π and π. Then
minimize_x max_{ω_m} |H∗(ω_m) − Σ_{k=0}^{K} x_k cos(kω_m)| = minimize_x ‖y − Fx‖_∞,

where y is the vector with entries y_m = H∗(ω_m) and F is the M × (K + 1) matrix with entries F_{m,k} = cos(kω_m).
It should be noted that since the ωm are equally spaced, the matrix
F (and its adjoint) can be applied efficiently using a fast discrete
cosine transform. We will see later how this has a direct impact
on the number of computations we need to solve the Chebyshev
approximation problem above.
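The discretization itself is easy to set up; the sketch below builds y and F on M equally spaced frequencies for an assumed ideal lowpass target H∗ (our own illustrative choice, not from the notes) and evaluates the maximum deviation ‖y − Fx‖_∞ for a candidate x. The coefficients x could then be found with the Chebyshev LP sketched earlier.

    # Build the discretized filter-design data: y_m = H_star(omega_m) and
    # F[m, k] = cos(k * omega_m) on M equally spaced frequencies in [-pi, pi].
    import numpy as np

    M, K = 512, 20
    omega = np.linspace(-np.pi, np.pi, M)

    H_star = (np.abs(omega) <= np.pi / 4).astype(float)   # assumed ideal lowpass target
    y = H_star
    F = np.cos(np.outer(omega, np.arange(K + 1)))          # M x (K + 1) matrix

    x = np.zeros(K + 1)                                    # any candidate filter coefficients
    print(np.max(np.abs(y - F @ x)))                       # ||y - F x||_inf for this x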
References
[Roh00] J. Rohn. Computing the norm ‖A‖∞,1 is NP-hard. Linear and Multilinear Algebra, 47:195–204, 2000.