
10-725/36-725: Convex Optimization

Prerequisite Topics

February 3, 2015

This is meant to be a brief, informal refresher of some topics that will form building blocks in
this course. The content of the first two sections of this document is mainly taken from Appendix
A of B & V, with some supplemental information where needed. See the end for a list of potentially
helpful resources you can consult for further information.

1 Real Analysis and Calculus


1.1 Properties of Functions
Limits You should be comfortable with the notion of limits, not necessarily because you will have
to evaluate them, but because they are key to understanding other attributes of functions.
Informally, lim_{x→a} f(x) is the value that f approaches as x approaches the value a.
Continuity A function f(x) is continuous at a particular point x0 if, as a sequence x1, x2, ... approaches x0, the values f(x1), f(x2), ... approach f(x0). In limit notation: lim_{i→∞} f(xi) = f(lim_{i→∞} xi). f is continuous if it is continuous at all points x0 ∈ dom f.
Differentiability A function f : Rn → R is differentiable at x ∈ int dom f if there exists a row vector Df(x) that satisfies the following limit:

    lim_{z ∈ dom f, z ≠ x, z → x} ||f(z) − f(x) − Df(x)(z − x)||_2 / ||z − x||_2 = 0

We refer to Df(x) as the derivative of f at x; it is the transpose of the gradient ∇f(x).


Smoothness f is smooth if the derivatives of f are continuous over all of dom f. We can also speak of smoothness of a certain order, meaning that the derivatives of f are continuous up to that order.
It is also reasonable to talk about smoothness over a particular interval of the domain of f.

Lipschitz A function f is Lipschitz with Lipschitz constant L if ||f(x) − f(y)|| ≤ L||x − y||
∀x, y ∈ dom f. Calling a function f Lipschitz is a stronger statement about the continuity of f: a Lipschitz function is not only continuous, it also cannot change value too rapidly. This is related to the smoothness of f, but a function can be Lipschitz without being smooth (for example, f(x) = |x| is Lipschitz with L = 1 but is not differentiable at 0).

Taylor Expansion The first order Taylor expansion of a function gives us an easy way to form a linear approximation to that function:

    f(y) ≈ f(x) + ∇f(x)^T (y − x)

An equivalent form that is often useful is the following:

    f(y) = f(x) + ∫_0^1 ∇f(t(x − y) + y)^T (y − x) dt

For a quadratic approximation, we add another term:

    f(y) ≈ f(x) + ∇f(x)^T (y − x) + (1/2)(y − x)^T ∇^2 f(x)(y − x)

Often when doing convergence analysis we will upper bound the Hessian and use the quadratic approximation to understand how well a technique does as a function of iterations.
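As a quick sanity check of these approximations, here is a small NumPy sketch; the example function (log-sum-exp) and the evaluation points are choices made here for illustration, not part of these notes:

```python
# Numerically compare f(y) with its first- and second-order Taylor
# approximations around x, for f(x) = log(sum(exp(x))).
import numpy as np

def f(x):
    return np.log(np.sum(np.exp(x)))

def grad_f(x):
    p = np.exp(x)
    return p / p.sum()              # gradient of log-sum-exp is the softmax

def hess_f(x):
    p = np.exp(x) / np.sum(np.exp(x))
    return np.diag(p) - np.outer(p, p)

x = np.array([0.5, -1.0, 2.0])
y = x + 0.1 * np.array([1.0, -2.0, 0.5])   # a nearby point

first  = f(x) + grad_f(x) @ (y - x)
second = first + 0.5 * (y - x) @ hess_f(x) @ (y - x)
print(f(y), first, second)   # the quadratic approximation should be closer to f(y)
```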

1.2 Sets
Interior The interior int C of the set C is the set of all points x ∈ C for which ∃ε > 0 s.t. {y : ||y − x||_2 ≤ ε} ⊆ C.
Closure The closure cl C of a set C is the set of all x such that ∀ε > 0 ∃y ∈ C s.t. ||x − y||_2 ≤ ε. The closure can be considered the union of the interior of C and the boundary of C; for a closed set (see below), cl C = C.
Boundary The boundary bd C is the set of points x for which the following is true: ∀ε > 0 ∃y ∈ C, z ∉ C s.t. ||y − x||_2 ≤ ε and ||z − x||_2 ≤ ε.
Complement The complement of the set C ⊆ R^n is denoted R^n \ C. It is the set of all points not in C.
Open vs Closed A set C is open if int C = C. A set is closed if its complement is open.
Equality You’ll notice that above we used a notion of equality for sets. To show formally that
sets A and B are equal, you must show A ⊆ B and B ⊆ A.

1.3 Norms
See B & V for a much more detailed treatment of this topic. I am going to list the most common
norms so that you are aware of the notation we will be using in this class:
ℓ0 ||x||_0 is the number of nonzero elements in x. We often want to minimize this, but it is non-convex (and, in fact, not a real norm), so we approximate it (you could say we relax it) with other norms (e.g. ℓ1).
ℓp ||x||_p = (|x1|^p + · · · + |xn|^p)^{1/p}, where p ≥ 1. Some common examples:
• ||x||_1 = Σ_{i=1}^n |xi|
• ||x||_2 = (Σ_{i=1}^n xi^2)^{1/2}
• ||x||_∞ = max_i |xi|
Spectral/Operator Norm ||X||_op = σ1(X), the largest singular value of X.
Trace Norm ||X||_tr = Σ_{i=1}^r σi(X), the sum of all the singular values of X.
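Here is a short NumPy sketch computing the norms above (the example vector and matrix are arbitrary choices made for illustration):

```python
import numpy as np

x = np.array([3.0, 0.0, -4.0, 1.0])
print(np.count_nonzero(x))          # "l0 norm": number of nonzero elements
print(np.linalg.norm(x, 1))         # l1 norm: sum of absolute values
print(np.linalg.norm(x, 2))         # l2 norm: Euclidean length
print(np.linalg.norm(x, np.inf))    # l-infinity norm: max absolute value

X = np.array([[2.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
sigma = np.linalg.svd(X, compute_uv=False)
print(sigma[0])                     # spectral/operator norm: largest singular value
print(sigma.sum())                  # trace (nuclear) norm: sum of singular values
print(np.linalg.norm(X, 2), np.linalg.norm(X, 'nuc'))  # same two values via norm()
```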

1.4 Linear/Affine Functions


In this course, a linear function will be a function f(x) = a^T x. Affine functions are linear functions with an added intercept term: g(x) = a^T x + b.

1.5 Derivatives of Functions


See B & V for some nice examples. Consider the following for a function f : Rn → R:
Gradient The ith element of ∇f(x) is the partial derivative of f w.r.t. the ith dimension of the input x: ∇f(x)_i = ∂f(x)/∂x_i

Chain Rule Let h(x) = g(f(x)) for g : R → R. We have: ∇h(x) = g′(f(x)) ∇f(x)

Hessian In the world of optimization, we denote the Hessian matrix as ∇^2 f(x) ∈ R^{n×n} (some of you may have seen this symbol used as the Laplace operator in other courses). The ijth entry of the Hessian is given by: ∇^2 f(x)_{ij} = ∂^2 f(x)/∂x_i ∂x_j. (Both the gradient and the Hessian are checked numerically in the sketch at the end of this subsection.)

Matrix Differentials In general we will not be using these too much in class. The major differ-
entials you need to know are:
• ∂_X (1/2) tr(X^T X) = X

• ∂_X tr(XA) = A^T
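Here is the numerical sketch mentioned above: it checks an analytic gradient and Hessian against finite differences for a simple quadratic function. The example function, the random data, and the helper names are choices made here for illustration, not something taken from B & V:

```python
# Check grad_f against central differences of f, and hess_f against
# central differences of grad_f, for f(x) = x^T A x + b^T x with A symmetric.
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric matrix
b = rng.standard_normal(n)

f      = lambda x: x @ A @ x + b @ x
grad_f = lambda x: 2 * A @ x + b          # analytic gradient
hess_f = lambda x: 2 * A                  # analytic Hessian (constant here)

x, eps = rng.standard_normal(n), 1e-6
num_grad = np.array([(f(x + eps*e) - f(x - eps*e)) / (2*eps) for e in np.eye(n)])
num_hess = np.array([(grad_f(x + eps*e) - grad_f(x - eps*e)) / (2*eps) for e in np.eye(n)])

print(np.allclose(num_grad, grad_f(x), atol=1e-4))    # should print True
print(np.allclose(num_hess, hess_f(x), atol=1e-4))    # should print True
```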

2 Linear Algebra
2.1 Matrix Subspaces
Row Space The row space of a matrix A is the subspace spanned by the rows of A.
Column Space The column space of a matrix A is the subspace spanned by the columns of A.
Null Space The null space of a matrix A is the set of all x such that Ax = 0.

Rank rankA is the number of linearly independent columns in A (or, equivalently, the number of
linearly independent rows). A matrix A ∈ Rm×n is full rank if rankA = min{m, n}. Recall
that if A is square and full rank, it is invertible.
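A quick NumPy illustration of rank and invertibility (the example matrices are chosen here, not taken from the notes):

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0]])    # second row is 2x the first row
B = np.array([[1.0, 2.0], [0.0, 3.0]])

print(np.linalg.matrix_rank(A))   # 1: not full rank, so A is not invertible
print(np.linalg.matrix_rank(B))   # 2: full rank, so B is invertible
print(np.linalg.inv(B))
```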

2.2 Orthogonal Subspaces


Two subspaces S1, S2 ⊆ R^n are orthogonal if s1^T s2 = 0 ∀ s1 ∈ S1, s2 ∈ S2.
2.3 Decomposition
Eigen Decomposition If A ∈ S^n, the set of real, symmetric, n × n matrices, then A can be factored:

    A = QΛQ^T

Here Q is an orthogonal matrix, which means that Q^T Q = I. Λ = diag(λ1, λ2, ..., λn), where the eigenvalues λi are ordered by decreasing value. Some useful facts about A that we can ascertain from the eigen decomposition:
• |A| = ∏_{i=1}^n λi
• tr A = Σ_{i=1}^n λi
• A is invertible iff (if and only if) all its eigenvalues are nonzero. Then A^{-1} = QΛ^{-1}Q^T (note that I have used the fact that for orthogonal Q, Q^{-1} = Q^T).
• A is positive semidefinite if all its eigenvalues are nonnegative.
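These facts are easy to verify numerically. Here is a small NumPy sketch on a random symmetric matrix (note that np.linalg.eigh returns eigenvalues in increasing order, so they are flipped to match the decreasing convention above):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2                          # real symmetric matrix

lam, Q = np.linalg.eigh(A)
lam, Q = lam[::-1], Q[:, ::-1]             # decreasing eigenvalue order

print(np.allclose(Q @ np.diag(lam) @ Q.T, A))                    # A = Q Lambda Q^T
print(np.allclose(np.linalg.det(A), np.prod(lam)))               # |A| = product of eigenvalues
print(np.allclose(np.trace(A), np.sum(lam)))                     # tr A = sum of eigenvalues
print(np.allclose(np.linalg.inv(A), Q @ np.diag(1/lam) @ Q.T))   # A^{-1} = Q Lambda^{-1} Q^T
```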
Singular Value Decomposition Any matrix A ∈ R^{m×n} with rank r can be factored as:

    A = UΣV^T

Here U ∈ R^{m×r} has the property that U^T U = I and V ∈ R^{n×r} likewise satisfies V^T V = I. Σ = diag(σ1, σ2, ..., σr), where the singular values σi are ordered by decreasing value. Some useful facts that we can learn using this decomposition:
• The SVD of A has the following implication for the eigendecomposition of A^T A:

    A^T A = [V W] [Σ^2 0; 0 0] [V W]^T

where W is the matrix such that [V W] is orthogonal.
• The condition number of A (an important concept for us in this course) is cond A = σ1/σr.
Pseudoinverse The (reduced) SVD of a matrix A that is not invertible yields the pseudoinverse A† = VΣ^{-1}U^T.
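A NumPy sketch of the reduced SVD, the condition number, and the pseudoinverse follows; the example matrix is random and (almost surely) has full column rank, a choice made here for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 3))                    # rank r = 3 almost surely

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # reduced SVD: A = U diag(s) V^T
V = Vt.T

print(np.allclose(U @ np.diag(s) @ Vt, A))
print(np.allclose(U.T @ U, np.eye(3)), np.allclose(V.T @ V, np.eye(3)))
print(s[0] / s[-1], np.linalg.cond(A))             # condition number sigma_1 / sigma_r

A_pinv = V @ np.diag(1 / s) @ U.T                  # pseudoinverse via the SVD
print(np.allclose(A_pinv, np.linalg.pinv(A)))
```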

3 Canonical ML Problems
3.1 Linear Regression
Linear regression is the problem of finding f : X → Y , where X ∈ Rn×p , Y is an n-dimensional
vector of real values and f is a linear function. Canonically, we find f by finding the vector β̂ ∈ Rp
that minimizes the least squares objective:

    β̂ = argmin_β ||Xβ − Y||_2^2

For Y ∈ R^{n×q}, the multiple linear regression problem, we find a matrix B̂ such that:

    B̂ = argmin_B ||XB − Y||_F^2

Note that in its basic form, the linear regression problem can be solved in closed form.
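For concreteness, here is a sketch of the closed-form (normal equations) solution checked against a library least-squares solver; the synthetic data are generated here purely for illustration:

```python
# beta_hat = (X^T X)^{-1} X^T Y, computed by solving the normal equations.
import numpy as np

rng = np.random.default_rng(3)
n, p = 100, 5
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
Y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_closed = np.linalg.solve(X.T @ X, X.T @ Y)        # normal equations
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)     # library solver

print(np.allclose(beta_closed, beta_lstsq))
```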
3.2 Logistic Regression
Logistic regression is the problem of finding f : X → Y, where Y is an n-dimensional vector of binary values, and f has the form f(x) = σ(β^T x), where σ is the logistic (sigmoid) function σ(α) = 1/(1 + exp(−α)). We typically solve for β by maximizing the likelihood of the observed data, which results in the following optimization problem (for yi ∈ {0, 1}):

    β̂ = argmax_β Σ_{i=1}^n [yi β^T xi − log(1 + exp(β^T xi))]
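As an illustration (not part of the original notes), here is a minimal sketch that maximizes this log-likelihood by gradient ascent for yi ∈ {0, 1}; the synthetic data, step size, and iteration count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 200, 3
X = rng.standard_normal((n, p))
beta_true = np.array([1.5, -2.0, 0.5])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))).astype(float)

sigmoid = lambda z: 1 / (1 + np.exp(-z))

beta = np.zeros(p)
step = 0.1
for _ in range(500):
    grad = X.T @ (y - sigmoid(X @ beta))   # gradient of the log-likelihood
    beta += step / n * grad                # (scaled) gradient ascent step

print(beta)   # roughly recovers the direction of beta_true, up to noise
```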

3.3 Support Vector Machines


Like logistic regression, SVMs attempt to find a function that linearly separates two classes. In this
case, the elements of Y are either 1 or −1. SVMs frame the problem as the following constrained
optimization problem (in primal form):
    β̂ = argmin_β (1/2)||β||_2^2
    s.t. yi (β^T xi) ≥ 1 ∀i = 1, ..., n
In its simplest form, the support vector machine seeks to find the hyperplane (parameterized
by β) that separates the classes (encoded in the constraint) and does so in a way that creates the
largest margin between the data points and the plane (encoded in the objective that is minimized).
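Here is a sketch of this primal problem using the cvxpy package (assumed to be installed); the toy data are generated here so that they are linearly separable and the constraints are feasible:

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
n, p = 40, 2
# Two well-separated clusters, labeled +1 and -1.
X = np.vstack([rng.standard_normal((n // 2, p)) + 3,
               rng.standard_normal((n // 2, p)) - 3])
y = np.concatenate([np.ones(n // 2), -np.ones(n // 2)])

beta = cp.Variable(p)
objective = cp.Minimize(0.5 * cp.sum_squares(beta))       # (1/2)||beta||_2^2
constraints = [cp.multiply(y, X @ beta) >= 1]             # y_i (beta^T x_i) >= 1
prob = cp.Problem(objective, constraints)
prob.solve()

print(beta.value)   # the max-margin hyperplane (no intercept in this simple form)
```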

3.4 Regularization/Penalization
Regularization (sometimes referred to as penalization) is a technique that can be applied to al-
most all machine learning problems. Most of the time, we regularize in an effort to simplify the
learned function, often by forcing the parameters to be “small” (either in absolute size or in rank)
and/or setting many of them to be zero. Regularization is also sometimes used to incorporate prior
knowledge about the problem.
We incorporate regularization by adding either constraints or penalties to the existing optimiza-
tion problem. This is easiest to see in the context of linear regression. Where previously we only
had least squares loss, we can add penalties to create the following two variations:
Ridge Regression By adding an ℓ2 penalty, our objective to minimize becomes:

    β̂ = argmin_β ||Xβ − Y||_2^2 + λ||β||_2^2

This will result in many elements of β being close to 0 (more so if λ is larger).


Lasso Regression By adding an ℓ1 penalty, our objective to minimize becomes:

    β̂ = argmin_β ||Xβ − Y||_2^2 + λ||β||_1

This will result in many elements of β being 0 (more if λ is larger).


The first example is nice because it can still be solved in closed form. Notice, however, that the ℓ1 penalty creates issues not only for a closed-form solution, but also for standard first-order methods, because it is not differentiable everywhere. We will study how to deal with this later in the course.
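For reference, a sketch of the ridge closed form, β̂ = (X^T X + λI)^{-1} X^T Y, on synthetic data; the data and the choice of λ are arbitrary assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 10
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal(p) + 0.5 * rng.standard_normal(n)

lam = 5.0
beta_ols   = np.linalg.solve(X.T @ X, X.T @ Y)                  # ordinary least squares
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)  # ridge closed form

print(np.linalg.norm(beta_ols), np.linalg.norm(beta_ridge))     # ridge shrinks the coefficients
```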
4 Further Resources
In addition to B & V, the following are good sources of information on these topics:
• Matrix Cookbook: http://www.mit.edu/~wingated/stuff_i_use/matrix_cookbook.pdf
• Linear Algebra Lectures by Zico Kolter: http://www.cs.cmu.edu/~zkolter/course/linalg/index.html

• Functional Analysis/Matrix Calculus Lectures by Aaditya Ramdas: http://www.cs.cmu.edu/~aramdas/videos.html
