
I. Introduction to Convex Optimization

Georgia Tech ECE 8823c notes by M. A. Davenport and J. Romberg. Last updated 12:28, January 8, 2019
Introduction to Optimization
Optimization problems are ubiquitous in science and engineering.
Optimization problems arise any time we have a collection of elements and wish to select the "best" one (according to some criterion). In its most general form, we can express an optimization problem mathematically as

minimize_x f0(x) subject to x ∈ X, (1)

where X is a set which defines the possible elements x to consider, and the functional f0 : X → R quantifies our criterion for selecting the "best" x. The set X is called the constraint set and the function f0 is called the objective function.

In order to solve this optimization problem, we must find an x̂ ∈ X such that

f0(x̂) ≤ f0(x) for all x ∈ X. (2)

We call an x̂ satisfying (2) a minimizer of f0 in X, and a solution to the optimization problem (1).

By convention, we will focus only on minimization problems, noting that x̂ maximizes f0 in X if and only if x̂ minimizes −f0 in X; thus any maximization problem can be easily turned into an equivalent minimization problem.

There are a number of fundamental questions that arise when considering an optimization problem of the form (1):
1. Existence. Does a solution to (1) even exist? It could be that
f0 is not bounded from below, or that X has been defined in
such a way as to be empty. How can we guarantee the existence
of a solution?

2. Uniqueness. Note that an x̂ satisfying (2) need not be unique. Only when the inequality is strict can we conclude that there is a unique (strict) minimizer. When can we conclude that there is a unique solution?
3. Verification. Given a candidate solution x̂, is there a simple condition we can check to determine if it is a/the solution to (1)?
4. Solution. Can we find a closed-form expression for a/the solution to (1)? Can we provide an efficient algorithm for computing a/the solution to (1)?
Throughout this course we will devote significant attention to all of
these questions, primarily in the context of convex problems.
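As a toy illustration of these four questions (a hypothetical example, not from the notes), consider minimizing f0(x) = (x − 3)² over X = R: a solution exists, it is unique, it can be verified via the first-order condition, and it is available in closed form:

```python
# Toy illustration (hypothetical example, not from the notes):
# minimize f0(x) = (x - 3)**2 over X = R.
def f0(x):
    return (x - 3.0) ** 2

def f0_prime(x):
    return 2.0 * (x - 3.0)

x_hat = 3.0                      # closed-form solution
assert f0_prime(x_hat) == 0.0    # verification: first-order condition holds
# uniqueness follows from f0 being strictly convex; spot-check (2):
for x in [0.0, 1.0, 2.9, 3.1, 10.0]:
    assert f0(x_hat) <= f0(x)
print(x_hat)
```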

Convex Optimization

The great watershed in optimization is not between linearity and non-linearity, but convexity and non-convexity.
— R. Tyrrell Rockafellar

Solving optimization problems is in general very difficult. In this class, we will develop a framework for analyzing and solving convex programs. (Throughout this course we will use the terminology optimization/convex program interchangeably with optimization/convex problem.) To state precisely what we mean by this, recall that a set C is convex if

x, y ∈ C ⇒ (1 − θ)x + θy ∈ C for all θ ∈ [0, 1].

Similarly, a function f is convex if

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

for all x, y and for all θ ∈ [0, 1]. (Much more on this later!) With
these definitions in hand, a convex program simply corresponds to
one where
1. The constraint set X is a convex subset of a real vector space
(in this class we will focus exclusively on X ⊆ RN ).
2. The objective function f0 : X → R is a convex function.
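The defining inequality of a convex function can be spot-checked numerically. Here is a small sketch (the test function and tolerance are our own illustrative choices) that samples f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y) at random points:

```python
import random

# Numerically spot-check the convexity inequality
#   f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y)
# for a function we expect to be convex (illustrative example).
def f(x):
    return abs(x) + x ** 2

random.seed(0)
ok = True
for _ in range(10000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    t = random.uniform(0, 1)
    lhs = f(t * x + (1 - t) * y)
    rhs = t * f(x) + (1 - t) * f(y)
    if lhs > rhs + 1e-12:        # small tolerance for floating-point error
        ok = False
print(ok)
```

A check like this can refute convexity (one violated sample suffices) but never prove it; proving convexity requires the tools developed later in the course.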

Typically, we will rely on X to be specified by a series of constraint functionals:

x ∈ X ⇔ fm(x) ≤ bm for m = 1, . . . , M.

In this case, an equivalent way to characterize a convex program is for each of the fm to be convex functions.

What does convexity tell us? Two important things:

• Local minimizers are also global minimizers. So we can check if a certain point is optimal by looking in a small neighborhood and seeing if there is a direction to move that decreases f0.
• First-order necessary conditions for optimality turn out to be sufficient. For example, when the problem is unconstrained and smooth, this means we can find an optimal point by finding x̂ such that ∇f0(x̂) = 0.
The upshot of these two things is that if the fm(x) and their derivatives² are reasonable to compute, then relatively simple algorithms (e.g. gradient descent) are provably effective at performing the optimization.

² And as we will see, much of what we do can be naturally extended to non-smooth functions which do not have any derivatives.
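To make the "relatively simple algorithms" point concrete, here is a minimal gradient-descent sketch for a smooth unconstrained convex problem (the quadratic objective, step size, and iteration count are illustrative assumptions, not from the notes):

```python
import numpy as np

# Gradient descent on f0(x) = 0.5 * x^T Q x - b^T x with Q positive
# definite, so the optimum satisfies grad f0(x) = Q x - b = 0.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])   # positive definite
b = np.array([1.0, 1.0])

def grad(x):
    return Q @ x - b

x = np.zeros(2)
step = 0.25                               # < 2 / lambda_max(Q), so GD converges
for _ in range(500):
    x = x - step * grad(x)

x_star = np.linalg.solve(Q, b)            # closed-form solution for comparison
print(np.allclose(x, x_star, atol=1e-8))
```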

The material in this course has three major components. The first
is the mathematical foundations of convex optimization. We will
see that talking about the solution to convex programs requires a
beautiful combination of algebraic and geometric ideas.

The second component is algorithms for solving convex programs. We will talk about general purpose algorithms (and their associated computational guarantees), but we will also look at algorithms that are specialized to certain classes of problems, and even certain applications. Rather than focus exclusively on the "latest and greatest", we will try to understand the key ideas that are combined in different ways in many solvers.

Finally, we will talk a lot about modeling: that is, how convex optimization appears in signal processing, control systems, machine learning, statistical inference, etc. We will give many examples of mapping a word problem into an optimization program. These examples will be interleaved with the discussion of the first two components, and there are several examples to which we will return multiple times.

Convexity and Efficiency
You might have two questions at this point:

Can all convex programs be solved efficiently?

Unfortunately, no. There are many examples of even seemingly innocuous convex programs which are NP-hard. One way this can happen is if the functionals themselves are hard to compute. For example, suppose we were trying to find the matrix with minimal (∞, 1) norm that obeyed some convex constraints:

f0(X) = ‖X‖∞,1 = max_{‖v‖∞ ≤ 1} ‖Xv‖1.

This is a valid matrix norm, and we will see later that all valid
norms are convex. But it is known that computing f0 is NP-hard (see
[Roh00]), as is approximating it to a fixed accuracy. So optimizations
involving this quantity are bound to be difficult.
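To get a feel for why this norm is expensive, note that ‖Xv‖1 is convex in v, so its maximum over the box ‖v‖∞ ≤ 1 is attained at a vertex v ∈ {−1, +1}^N, and checking all vertices costs 2^N evaluations. A brute-force sketch (illustrative only, feasible for tiny N):

```python
import itertools
import numpy as np

# Brute-force evaluation of the (infinity,1) norm. The maximum of
# ||X v||_1 over ||v||_inf <= 1 is attained at a sign vector
# v in {-1,+1}^N, so this costs 2^N evaluations -- consistent with
# the norm being NP-hard to compute in general.
def norm_inf_1(X):
    N = X.shape[1]
    best = 0.0
    for signs in itertools.product([-1.0, 1.0], repeat=N):
        v = np.array(signs)
        best = max(best, np.abs(X @ v).sum())
    return best

X = np.array([[1.0, -2.0], [0.0, 3.0]])
print(norm_inf_1(X))   # maximum is attained at v = (1, -1) or (-1, 1)
```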

Are there any non-convex programs that can be solved efficiently?

Of course there are. Here is one for which you already know the answer:

maximize_{x∈R^N} xᵀAx subject to ‖x‖2 = 1,

where A is an arbitrary N × N symmetric matrix. This is the maximization of an indefinite quadratic form (not necessarily convex or concave) over a nonconvex set. But we know that the optimal value of this program is the largest eigenvalue, and the optimizer is the corresponding eigenvector, and there are well-known practical algorithms for computing these.
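One such well-known algorithm is power iteration, sketched below. The matrix here is an illustrative assumption, chosen so that its largest eigenvalue also has the largest magnitude; for a general symmetric A, power iteration converges to the eigenvalue of largest magnitude, which need not be the largest eigenvalue.

```python
import numpy as np

# Power-iteration sketch for: maximize x^T A x subject to ||x||_2 = 1,
# with A symmetric and its top eigenvalue dominant in magnitude.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

x = np.ones(3) / np.sqrt(3.0)          # arbitrary unit-norm starting point
for _ in range(1000):
    x = A @ x
    x = x / np.linalg.norm(x)          # renormalize onto the sphere

lam = x @ A @ x                        # Rayleigh quotient at convergence
lam_max = np.linalg.eigvalsh(A).max()  # reference value from a dense solver
print(np.isclose(lam, lam_max))
```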

When there is a solution to a nonconvex program, it often relies on nice coincidences in the structure of the problem; perturbing the problem just a little bit can disturb these coincidences. Consider another nonconvex program that we know how to solve:

minimize_X Σ_{i,j=1}^{N} (X_{i,j} − A_{i,j})² subject to rank(X) ≤ R.

That is, we want the best rank-R approximation (in the least-squares sense) to the N × N matrix A. The functional we are optimizing above is convex, but the rank constraint definitely is not. Nevertheless, we can compute the answer efficiently using the SVD of A:

A = UΣVᵀ = Σ_{n=1}^{N} σn un vnᵀ, σ1 ≥ σ2 ≥ · · · ≥ σN ≥ 0.

The program above is solved simply by truncating this sum to its first R terms:

X̂ = Σ_{n=1}^{R} σn un vnᵀ.
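This truncation (the Eckart–Young theorem) is easy to check numerically with a standard SVD routine; the matrix and rank below are illustrative assumptions:

```python
import numpy as np

# Best rank-R approximation via truncated SVD (Eckart-Young).
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
R = 2

U, s, Vt = np.linalg.svd(A)                    # s is sorted descending
X_hat = U[:, :R] @ np.diag(s[:R]) @ Vt[:R, :]  # keep the first R terms

assert np.linalg.matrix_rank(X_hat) == R
# The optimal squared Frobenius error is the tail of the singular values:
err = np.linalg.norm(A - X_hat, "fro")
print(np.isclose(err ** 2, np.sum(s[R:] ** 2)))
```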

But now suppose that instead of the matrix A, we are given a subset of its entries indexed by I. We now want to find the matrix that is most consistent over this subset while also having rank at most R:

minimize_X Σ_{(i,j)∈I} (X_{i,j} − A_{i,j})² subject to rank(X) ≤ R.

Despite its similarity to the first problem above, this "matrix completion" problem is NP-hard in general.

Convex programs tend to be more robust to variations of this type. Things like adding subspace constraints, restricting variables to be positive, and considering functionals of linear transforms of x all preserve the essential convex structure.

For the rest of this introduction, we will introduce a few of the very
well-known classes of convex optimization programs, and give an
example of an application for each.

Linear programming
A linear program (LP) minimizes a linear functional subject to multiple linear constraints:

minimize_x cᵀx subject to amᵀx ≤ bm, m = 1, . . . , M.

The general form above can include linear equality constraints aiᵀx = bi by enforcing both aiᵀx ≤ bi and (−ai)ᵀx ≤ −bi; in our study later on, we will find it convenient to specifically distinguish between these two types of constraints. We can also write the M constraints compactly as Ax ≤ b, where A is the M × N matrix with the amᵀ as rows.

Linear programs do not necessarily have a solution; it is possible that there is no x such that Ax ≤ b, or that the program is unbounded in that there exists a sequence x1, x2, . . ., all obeying Axk ≤ b, with cᵀxk → −∞.

There is no formula for the solution of a general linear program. Fortunately, there exists very reliable and efficient software for solving them. The first LP solver was developed in the late 1940s (Dantzig's "simplex algorithm"), and now LP solvers are considered a mature technology. If the constraint matrix A is structured, then linear programs with millions of variables can be solved to high accuracy on a standard computer.
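As a minimal illustration of such software, here is a tiny LP in the form above solved with SciPy's linprog (the data is an illustrative assumption, and this is just one solver among many):

```python
import numpy as np
from scipy.optimize import linprog

# Tiny LP sketch: minimize c^T x subject to A x <= b.
c = np.array([1.0, 2.0])
A = np.array([[-1.0, 0.0],     # -x1      <= -1   (i.e. x1 >= 1)
              [0.0, -1.0],     #      -x2 <= -1   (i.e. x2 >= 1)
              [1.0, 1.0]])     #  x1 + x2 <= 4
b = np.array([-1.0, -1.0, 4.0])

# linprog's default bounds are x >= 0; pass (None, None) to keep x free,
# matching the general form above.
res = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None), (None, None)])
print(res.success, np.round(res.x, 6))
```

Since both objective coefficients are positive, the minimizer pushes each variable to its lower constraint, giving x = (1, 1).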

Linear programs are a very important class of optimization problems. However, if a single constraint (or the objective function) is nonlinear, then we move into the much broader class of nonlinear programs. While we will certainly discuss theory and algorithms for LPs, we will spend a greater fraction of the course discussing these more general nonlinear optimization problems.

Example: Chebyshev approximations
Suppose that we want to find the vector x that makes Ax as close as possible to a vector y in terms of maximum deviation:

minimize_{x∈R^N} max_{m=1,...,M} |ym − amᵀx| = minimize_{x∈R^N} ‖y − Ax‖∞.

This is called the Chebyshev approximation problem.


We can solve this problem with linear programming. To do this, we introduce the auxiliary variable u ∈ R; it should be easy to see that the program above is equivalent to

minimize_{x∈R^N, u∈R} u subject to ym − amᵀx ≤ u,
                                   ym − amᵀx ≥ −u,  m = 1, . . . , M.

To put this in the standard linear programming form, take

z = [ x ],   c′ = [ 0 ],   A′ = [ −A  −1 ],   b′ = [ −y ]
    [ u ]         [ 1 ]         [  A  −1 ]         [  y ]

(where 1 denotes the M-vector of all ones),

and then solve

minimize_{z∈R^{N+1}} c′ᵀz subject to A′z ≤ b′.
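The reformulation above can be sketched directly in code; the random data is an illustrative assumption:

```python
import numpy as np
from scipy.optimize import linprog

# Chebyshev approximation min ||y - A x||_inf recast as an LP:
# z = [x; u], minimize u subject to [-A -1; A -1] z <= [-y; y].
rng = np.random.default_rng(1)
M, N = 20, 3
A = rng.standard_normal((M, N))
y = rng.standard_normal(M)

ones = np.ones((M, 1))
A0 = np.block([[-A, -ones], [A, -ones]])
b0 = np.concatenate([-y, y])
c0 = np.concatenate([np.zeros(N), [1.0]])

res = linprog(c0, A_ub=A0, b_ub=b0, bounds=[(None, None)] * (N + 1))
x_hat, u = res.x[:N], res.x[N]
# at the optimum, u equals the worst-case residual:
print(np.isclose(u, np.abs(y - A @ x_hat).max()))
```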

Filter design

The standard "filter synthesis" problem is to find a finite-impulse response (FIR) filter whose discrete-time Fourier transform (DTFT) is as close to some target H∗(ω) as possible. That is, we would like to solve

minimize_H sup_{ω∈[−π,π]} |H∗(ω) − H(ω)| subject to H(ω) being FIR.

When the deviation from the optimal response is measured using a uniform error, this is called "equiripple design", since the error in the solution will tend to have ripples a uniform distance away from the ideal.

If we restrict ourselves to the case where H∗(ω) has linear phase (so the impulse response is symmetric around some time index)³ we can recast this as a Chebyshev approximation problem.

Specifically, a symmetric filter with 2K + 1 taps (meaning that hn = 0 for |n| > K) has a real DTFT that can be written as a superposition of a DC term plus K cosines:

H(ω) = Σ_{k=0}^{K} h̃k cos(kω),   h̃k = { h0,   k = 0
                                        { 2hk,  1 ≤ k ≤ K.

So we are trying to solve

minimize_{x∈R^{K+1}} sup_{ω∈[−π,π]} | H∗(ω) − Σ_{k=0}^{K} xk cos(kω) |.

³ The case with general phase can also be handled using convex optimization, but it is not naturally stated as a linear program.

We will approximate the supremum on the inside by measuring it at M equally spaced points ω1, . . . , ωM between −π and π. Then

minimize_x max_{ωm} | H∗(ωm) − Σ_{k=0}^{K} xk cos(kωm) | = minimize_x ‖y − F x‖∞,
where y ∈ R^M and the M × (K + 1) matrix F are defined as

    [ H∗(ω1) ]        [ 1  cos(ω1)  cos(2ω1)  · · ·  cos(Kω1) ]
y = [ H∗(ω2) ]    F = [ 1  cos(ω2)  cos(2ω2)  · · ·  cos(Kω2) ]
    [  ...   ]        [ ...                       ...         ]
    [ H∗(ωM) ],       [ 1  cos(ωM)  cos(2ωM)  · · ·  cos(KωM) ].

It should be noted that since the ωm are equally spaced, the matrix
F (and its adjoint) can be applied efficiently using a fast discrete
cosine transform. We will see later how this has a direct impact
on the number of computations we need to solve the Chebyshev
approximation problem above.
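The discretized design can be sketched end to end: we build y and F for a hypothetical ideal lowpass target H∗(ω) = 1 for |ω| ≤ π/2 (and 0 otherwise), then solve the resulting Chebyshev problem as the LP from the previous section (the target, K, and M are our own illustrative assumptions):

```python
import numpy as np
from scipy.optimize import linprog

# Discretized equiripple design sketch for an illustrative lowpass target.
K, M = 8, 101
w = np.linspace(-np.pi, np.pi, M)
y = (np.abs(w) <= np.pi / 2).astype(float)    # target samples H*(w_m)
F = np.cos(np.outer(w, np.arange(K + 1)))     # F[m, k] = cos(k * w_m)

# Chebyshev LP: z = [x; u], minimize u s.t. [-F -1; F -1] z <= [-y; y].
ones = np.ones((M, 1))
res = linprog(np.r_[np.zeros(K + 1), 1.0],
              A_ub=np.block([[-F, -ones], [F, -ones]]),
              b_ub=np.r_[-y, y],
              bounds=[(None, None)] * (K + 2))
x = res.x[:K + 1]                             # the h-tilde coefficients
print(res.success)
```

The optimal u is the equiripple error level; plotting y − F x would show the characteristic uniform ripples.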

References
[Roh00] J. Rohn. Computing the norm ‖A‖∞,1 is NP-hard. Linear and Multilinear Algebra, 47:195–204, 2000.

