Slides Multivariate Calculus Wima 2018

This document provides an overview of multivariate calculus and optimization. It discusses functions of several variables, including visualizing them through plots and sections. It covers mathematical properties like continuity and convexity for multivariate functions. It also defines and provides examples of partial derivatives and differentiability for functions of multiple variables. Finally, it introduces the gradient vector and Jacobian matrix for multivariate functions.

Lecture Analysis and Linear Algebra

Multivariate Calculus and Optimization

Rüdiger Frey

ruediger.frey@wu.ac.at
http://statmath.wu.ac.at/~frey

Winter 2018


Functions of several variables

From now on we consider functions f : G ⊂ Rⁿ → R,

    (x1, . . . , xn) ↦ f(x1, . . . , xn).

Visualisation of a function y = f(x1, x2):

Plot the graph Gf = {(x1, x2, f(x1, x2)) : (x1, x2) ∈ G} (perspective plot).
Contour plot: plot the level sets or contours Hc = {(x1, x2) : f(x1, x2) = c}.
Plot sections, that is, graphs of the functions x1 ↦ f(x1, x2) for fixed values of x2 and of x2 ↦ f(x1, x2) for fixed x1.
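As an illustration, a minimal Python sketch producing a perspective plot and a contour plot; the concrete function f(x1, x2) = exp(−x1² − x2²) and the plotting ranges are assumptions chosen for demonstration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed example function f : R^2 -> R; any smooth f works here.
def f(x1, x2):
    return np.exp(-x1**2 - x2**2)

x1, x2 = np.meshgrid(np.linspace(-2, 2, 100), np.linspace(-2, 2, 100))
z = f(x1, x2)

fig = plt.figure(figsize=(10, 4))
ax1 = fig.add_subplot(1, 2, 1, projection="3d")
ax1.plot_surface(x1, x2, z)        # perspective plot of the graph Gf
ax2 = fig.add_subplot(1, 2, 2)
ax2.contour(x1, x2, z, levels=10)  # contour plot: level sets Hc
plt.show()
```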


Example: plots of densities on R²

[Figure: contour plots and perspective plots of two densities on R². Left: contour and density plot of a bivariate normal density; right: a bivariate t density.]

Further mathematical properties

A function f : G ⊂ Rⁿ → R is continuous in x0 ∈ G if for every sequence (xk) in G with xk → x0 it holds that limk→∞ f(xk) = f(x0).
Be careful: it is not enough to check continuity of the sections x1 ↦ f(x1, x2) and x2 ↦ f(x1, x2).
Suppose that G is convex. Then f is convex on G if for all x, y ∈ G and all α ∈ [0, 1] it holds that

    f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y).


A useful counterexample

Consider the function

    f(x1, x2) = x1x2 / (x1² + x2²)  for (x1, x2) ≠ (0, 0),    f(0, 0) = 0.

Then f is constant on every ray through the origin: for arbitrary x = (x1, x2) and arbitrary t ≠ 0 one has f(tx1, tx2) = f(x1, x2). For instance f(t, 0) = 0 and f(t, t) = t²/(2t²) = 1/2. In particular, the sections t ↦ f(t, 0) and t ↦ f(0, t) are continuous, but f is not continuous at the origin.
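A quick numerical check of this (a plain Python sketch, evaluating f along the two paths):

```python
def f(x1, x2):
    # value at the origin is 0 by definition
    if x1 == 0 and x2 == 0:
        return 0.0
    return x1 * x2 / (x1**2 + x2**2)

for t in [1.0, 0.1, 0.001]:
    # along the axis the values stay at 0, along the diagonal at 1/2,
    # so f has no limit at the origin
    print(t, f(t, 0.0), f(t, t))
```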


Differentiability and partial derivatives


Consider some f : U ⊂ Rⁿ → R, (x1, . . . , xn) ↦ f(x1, . . . , xn), where U is an open subset of Rⁿ, and some x ∈ U.
Definition. (1) The ith partial derivative of f at x ∈ U is given by

    ∂f/∂xi (x) = lim_{h→0} ( f(x + h ei) − f(x) ) / h

(if the limit exists).
(2) f is called continuously differentiable on U (f ∈ C¹(U)) if for all x ∈ U all partial derivatives exist and the partial derivatives are continuous functions of x.
(3) More generally, a function f : U ⊂ Rⁿ → Rᵐ,

    (x1, . . . , xn) ↦ (f1(x1, . . . , xn), . . . , fm(x1, . . . , xn))ᵗ

is continuously differentiable on U if all components f1, . . . , fm belong to C¹(U).

Examples

(1) Let f(x1, x2) = x1³x2 + x1²x2² + x1 + x2². Then

    ∂f/∂x1 (x) = 3x1²x2 + 2x1x2² + 1,    ∂f/∂x2 (x) = x1³ + 2x1²x2 + 2x2.

These are obviously continuous functions of x; hence f ∈ C¹.

(2) Consider a symmetric 2 × 2 matrix A and let f(x1, x2) = xᵗAx = a11x1² + 2a12x1x2 + a22x2². Then

    ∂f/∂x1 (x) = 2a11x1 + 2a12x2 = (2Ax)1   and   ∂f/∂x2 (x) = 2a12x1 + 2a22x2 = (2Ax)2.

Gradient and Jacobi matrix

Suppose that f : U → R is in C¹(U). Then the column vector

    ∇f(x) = ( ∂f/∂x1 (x), . . . , ∂f/∂xn (x) )ᵗ

is the gradient of f.
For a C¹ function g : U → Rᵐ the Jacobi matrix is given by

    Jg(x) = ( ∂g1(x)/∂x1  · · ·  ∂g1(x)/∂xn )
            (    . . .             . . .    )
            ( ∂gm(x)/∂x1  · · ·  ∂gm(x)/∂xn )

Sometimes one also uses the gradient matrix ∇g(x) = Jg(x)ᵗ.
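A short sketch computing a Jacobi matrix by finite differences; the map g : R² → R³ below is an assumed example:

```python
import numpy as np

def g(x):
    x1, x2 = x
    return np.array([x1**2, x1 * x2, np.sin(x2)])

def jacobian_fd(g, x, h=1e-6):
    m, n = len(g(x)), len(x)
    J = np.zeros((m, n))
    for j in range(n):
        e = np.zeros(n); e[j] = h
        J[:, j] = (g(x + e) - g(x - e)) / (2 * h)  # column j: partials w.r.t. xj
    return J

x = np.array([1.0, np.pi / 2])
print(jacobian_fd(g, x))
# analytically: [[2 x1, 0], [x2, x1], [0, cos(x2)]]
```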


First order (Taylor) approximation

Theorem. Consider some C¹ function f : U → R. Then it holds for any x, y ∈ U that

    f(y) − f(x) = ∇f(x)ᵗ(y − x) + R(x, y − x)                    (1)

where lim_{‖z‖→0} R(x, z)/‖z‖ = 0.
Idea. The function f can be approximated locally around x by the affine mapping y ↦ f(x) + ∇f(x)ᵗ(y − x).
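The defining property of the remainder can be observed numerically (same example function as before; the direction z is an arbitrary choice):

```python
import numpy as np

f = lambda x: x[0]**3 * x[1] + x[0]**2 * x[1]**2 + x[0] + x[1]**2
grad = lambda x: np.array([3*x[0]**2*x[1] + 2*x[0]*x[1]**2 + 1,
                           x[0]**3 + 2*x[0]**2*x[1] + 2*x[1]])

x, z = np.array([1.0, 2.0]), np.array([1.0, -1.0])
for s in [1e-1, 1e-2, 1e-3]:
    R = f(x + s*z) - f(x) - grad(x) @ (s*z)  # remainder R(x, sz)
    print(s, R / np.linalg.norm(s*z))        # ratio tends to 0
```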


Chain rule

Theorem. Consider C¹ functions f : Rⁿ → Rᵐ and g : Rᵏ → Rⁿ and let h := f ∘ g. Then h is C¹ and for the Jacobi matrix it holds that Jh(x) = Jf(g(x)) Jg(x), i.e. the Jacobian of the composition is the product of the individual Jacobi matrices.
Example. (Derivative along a vector.) Consider a C¹ function f : Rⁿ → R. We want to consider f along the straight line φ(t) := x + tv, for t ∈ R and x, v ∈ Rⁿ. We have Jφ(t) = v and Jf(x) = (∇f(x))ᵗ, and hence

    d/dt f(φ(t)) = (∇f(x + tv))ᵗ v,   in particular   d/dt f(φ(t))|_{t=0} = (∇f(x))ᵗ v.
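The identity d/dt f(φ(0)) = (∇f(x))ᵗv can be checked by a finite difference in t (the function and the points below are assumed examples):

```python
import numpy as np

f = lambda x: x[0]**2 * x[1] + np.exp(x[1])
grad = lambda x: np.array([2*x[0]*x[1], x[0]**2 + np.exp(x[1])])

x, v = np.array([1.0, 0.5]), np.array([2.0, -1.0])
h = 1e-6
dd_fd = (f(x + h*v) - f(x - h*v)) / (2*h)  # d/dt f(x + t v) at t = 0
print(dd_fd, grad(x) @ v)                  # both approx. -0.6487
```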


Second derivatives

Definition. Consider a C¹ function f : U ⊂ Rⁿ → R. Then the first order partial derivatives ∂f/∂xi (x), 1 ≤ i ≤ n, are themselves functions from U to R.
1. If all partial derivatives are C¹ functions, f is called twice continuously differentiable on U (f ∈ C²(U)). Fix i, j ∈ {1, . . . , n}. Then one writes

    ∂²f/∂xi∂xj (x) := ∂/∂xj ( ∂f/∂xi (x) )

for the second partial derivative in direction xi and xj.
2. For f ∈ C²(U) the matrix Hf with Hfij(x) = ∂²f/∂xi∂xj (x) is the Hessian matrix of f.


Theorem of Young and Schwarz

Theorem. Consider f ∈ C²(U). Then the Hessian matrix is symmetric, that is,

    ∂²f/∂xi∂xj (x) = ∂²f/∂xj∂xi (x),   1 ≤ i, j ≤ n,

or equivalently Hfij(x) = Hfji(x), 1 ≤ i, j ≤ n. In particular, the definiteness of Hf can be checked using eigenvalues or (for strictly definite matrices) with leading principal minors.


Example

(1) Consider f(x1, x2) = x1³x2 + x1²x2² + x1 + x2². Then we have

    ∂²f/∂x1² = 6x1x2 + 2x2²,   ∂²f/∂x2² = 2x1² + 2,   ∂²f/∂x1∂x2 = 3x1² + 4x1x2.

(2) Consider f(x) = xᵗAx for some symmetric matrix A. Then Hf(x) = 2A.
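Both examples can be verified with a finite-difference Hessian (step size and test points are assumptions):

```python
import numpy as np

def hessian_fd(f, x, h=1e-4):
    # central differences for all second partial derivatives
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x+ei+ej) - f(x+ei-ej)
                       - f(x-ei+ej) + f(x-ei-ej)) / (4*h*h)
    return H

A = np.array([[2.0, 1.0], [1.0, 3.0]])            # an assumed symmetric matrix
f_quad = lambda x: x @ A @ x
print(hessian_fd(f_quad, np.array([0.3, -0.7])))  # approx. 2A, and symmetric
```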


Second order Taylor expansion

Theorem. If f is C²(U) and x, y ∈ U, the Taylor formula becomes

    f(y) − f(x) = ∇f(x)ᵗ(y − x) + ½ (y − x)ᵗ Hf(x)(y − x) + R2(x, y − x)

where lim_{‖z‖→0} R2(x, z)/‖z‖² = 0.
Idea. f can be approximated locally around x ∈ U by the quadratic function

    y ↦ f(x) + ∇f(x)ᵗ(y − x) + ½ (y − x)ᵗ Hf(x)(y − x).

Locally, this is a better approximation than the first order Taylor approximation.
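Numerically, the second order remainder R2 indeed vanishes faster than ‖z‖² (same example function as before):

```python
import numpy as np

f = lambda x: x[0]**3 * x[1] + x[0]**2 * x[1]**2 + x[0] + x[1]**2
grad = lambda x: np.array([3*x[0]**2*x[1] + 2*x[0]*x[1]**2 + 1,
                           x[0]**3 + 2*x[0]**2*x[1] + 2*x[1]])
hess = lambda x: np.array([[6*x[0]*x[1] + 2*x[1]**2, 3*x[0]**2 + 4*x[0]*x[1]],
                           [3*x[0]**2 + 4*x[0]*x[1], 2*x[0]**2 + 2]])

x, z = np.array([1.0, 2.0]), np.array([1.0, -1.0])
for s in [1e-1, 1e-2, 1e-3]:
    q = f(x) + grad(x) @ (s*z) + 0.5 * (s*z) @ hess(x) @ (s*z)
    print(s, (f(x + s*z) - q) / np.linalg.norm(s*z)**2)  # ratio tends to 0
```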


Characterizing Convex and Concave functions

Let f : x ↦ f(x1, x2) be defined on an open and convex set U and have continuous partial derivatives of first and second order. If Hf(x) is positive semidefinite for all x ∈ U, then all sections of f along straight lines are convex functions.
Theorem. (1) f is a convex function on U ⇔ Hf is positive semidefinite on U.
(2) f is concave on U ⇔ Hf is negative semidefinite on U.
Note that we may decide convexity or concavity by finding the eigenvalues of Hf(x).


Example

Problem. Let f(x1, x2) = 2x1 − x2 − x1² + 2x1x2 − x2². Is f convex, concave, or neither?

Solution. Define the matrix

    A = ( −1   1 )
        (  1  −1 )

An easy computation gives for the Hessian that Hf(x) = 2A. Hence we need to check the definiteness of A.
Approach via eigenvalues. The characteristic polynomial of A is P(λ) = λ² + 2λ; the equation P(λ) = 0 has the solutions (eigenvalues) −2 and 0. Hence A is negative semidefinite and the function is concave.
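The eigenvalue computation can be reproduced in one line with NumPy:

```python
import numpy as np

A = np.array([[-1.0, 1.0], [1.0, -1.0]])
print(np.linalg.eigvalsh(A))  # [-2.  0.] -> negative semidefinite -> f concave
```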


Example
Problem. Let f(x1, x2) = x1² − x2² − x1x2 − x1³. Find the largest domain on which f is concave.

Solution. We find easily that

    Hf(x) = ( 2 − 6x1   −1 )  =: A(x)
            (   −1      −2 )

We look for the maximal domain where A(x) is negative semidefinite. Now a symmetric 2 × 2 matrix A = ( a b ; b c ) is negative semidefinite if and only if a ≤ 0, c ≤ 0 and det(A) ≥ 0. Here c = −2 < 0 always holds, so we obtain the conditions 2 − 6x1 ≤ 0 and det(A(x)) = 12x1 − 5 ≥ 0. Therefore the solution is

    G = {(x1, x2) : 2 − 6x1 ≤ 0 and 12x1 − 5 ≥ 0} = {(x1, x2) : x1 ≥ 5/12}.
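A small check of the boundary x1 = 5/12 (the test values are arbitrary choices):

```python
import numpy as np

def is_nsd(x1):
    # Hessian Hf(x) depends on x1 only
    A = np.array([[2 - 6*x1, -1.0], [-1.0, -2.0]])
    return bool(np.all(np.linalg.eigvalsh(A) <= 1e-12))

for x1 in [0.3, 5/12, 0.5, 1.0]:
    print(x1, is_nsd(x1))  # False, True, True, True
```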


Optimization problems

In its most general form an optimization problem is

    minimize f(x) subject to x ∈ X̃                              (2)

Here the set of admissible points X̃ is a subset of Rⁿ, and the cost function f is a function from X̃ to R. Often the admissible points are further restricted by explicit inequality constraints.
Note that maximization problems can be addressed by replacing f with −f, as sup_{x∈X̃} f(x) = − inf_{x∈X̃} {−f(x)}.


Unconstrained optimization: the problem

In this section we consider problems of the form

    minimize f(x) for x ∈ X̃ = Rⁿ                                 (3)

Moreover, we assume that f is once or twice continuously differentiable. Most results also hold in the case where X̃ is an open subset of Rⁿ.


Local and global optima

Definition. Consider the optimization problem (3).
1. x∗ is called an (unconstrained) local minimum of f if there is some δ > 0 such that f(x∗) ≤ f(x) for all x ∈ Rⁿ with ‖x − x∗‖ < δ.
2. x∗ is called a global minimum of f if f(x∗) ≤ f(x) for all x ∈ Rⁿ.
3. x∗ is said to be a strict local/global minimum if the inequality f(x∗) ≤ f(x) is strict for x ≠ x∗.
4. The value of the problem is f∗ := inf{f(x) : x ∈ Rⁿ}.
Remark. Local and global maxima are defined analogously.


Necessary optimality conditions

Proposition. Suppose that x∗ ∈ U is a local minimum of f.
1. If f is C¹ in U, then ∇f(x∗) = 0 (First Order Necessary Condition or FONC).
2. If moreover f ∈ C²(U), then Hf(x∗) is positive semidefinite (Second Order Necessary Condition or SONC).
Comments.
A point x∗ ∈ Rⁿ with ∇f(x∗) = 0 is called a stationary point of f.
The proof is based on the Taylor formula.
Necessary conditions for a local maximum: ∇f(x∗) = 0, Hf(x∗) negative semidefinite.
The necessary conditions are in general not sufficient: consider f(x) = x³, x∗ = 0.


Sufficient optimality conditions

Proposition. (Sufficient conditions.) Let f : U ⊂ Rⁿ → R be C² on U. Suppose that x∗ ∈ U satisfies the conditions

    ∇f(x∗) = 0,   Hf(x∗) strictly positive definite.             (4)

Then x∗ is a local minimum.

Comments.
The sufficient conditions are not necessary: consider e.g. f(x) = x⁴, x∗ = 0.
No global statements are possible.
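A sketch verifying the sufficient conditions for an assumed sample function f(x1, x2) = x1² + x1x2 + x2² − x1, whose gradient happens to be linear:

```python
import numpy as np

H = np.array([[2.0, 1.0], [1.0, 2.0]])  # Hessian of f (constant here)
b = np.array([1.0, 0.0])                # gradient of f is H @ x - b

x_star = np.linalg.solve(H, b)          # stationary point: grad f(x*) = 0
print(x_star)                           # [ 0.6667 -0.3333]
print(np.linalg.eigvalsh(H))            # [1. 3.] > 0 -> strict local minimum
```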


The case of convex functions

Lemma. Consider an open convex set X ⊂ Rⁿ. A C¹ function f : X → R is convex on X if and only if it holds for all x, z ∈ X that

    f(z) ≥ f(x) + ∇f(x)ᵗ(z − x).

If f is C², a necessary and sufficient condition for the convexity of f on X is that Hf(x) is positive semidefinite for all x ∈ X.
Proposition. Let f : X → R be a convex function on some convex set X ⊂ Rⁿ. Then
1. A local minimum of f over X is also a global minimum. If f is strictly convex, there exists at most one global minimum.
2. If X is open, the condition ∇f(x∗) = 0 is necessary and sufficient for x∗ ∈ X to be a global minimum of f.


Example: Quadratic cost functions

Let f(x) = ½ xᵗQx − bᵗx, x ∈ Rⁿ, for a symmetric n × n matrix Q and some b ∈ Rⁿ. Then we have

    ∇f(x) = Qx − b   and   Hf(x) = Q.

a) Local minima. If x∗ is a local minimum we must have

    ∇f(x∗) = Qx∗ − b = 0,   Hf(x∗) = Q positive semidefinite;

hence if Q is not positive semidefinite, f has no local minima.

b) If Q is positive semidefinite, f is convex. In that case we need not distinguish global and local minima, and f has a global minimum if and only if there is some x∗ with Qx∗ = b.
c) If Q is positive definite, Q⁻¹ exists and the unique global minimum is attained at x∗ = Q⁻¹b.
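Case c) in NumPy, for an assumed positive definite Q:

```python
import numpy as np

Q = np.array([[3.0, 1.0], [1.0, 2.0]])  # symmetric, eigenvalues > 0
b = np.array([1.0, -1.0])

f = lambda x: 0.5 * x @ Q @ x - b @ x
x_star = np.linalg.solve(Q, b)          # unique global minimum: Q x* = b
print(np.linalg.eigvalsh(Q))            # both positive -> Q positive definite
print(x_star, f(x_star))                # minimizer and minimal value
```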

Constrained optimization and Lagrange multipliers

Given C¹-functions f and h from Rⁿ to R, consider the problem

    min_{x∈Rⁿ} f(x) subject to h(x) = 0                          (5)

Proposition. (Existence of a Lagrange multiplier.) Let x∗ be a local minimum for Problem (5), and suppose that ∇h(x∗) ≠ 0. Then
1. (FONC) There exists a unique λ∗ ∈ R such that

    ∇f(x∗) + λ∗∇h(x∗) = 0.

Comments and Interpretation.

i) V(x∗) := {y ∈ Rⁿ : ∇h(x∗)ᵗy = 0} is the subspace of variations Δx = x − x∗ for which the constraint h(x) = 0 holds 'up to first order'. The FONC states that ∇f(x∗) is orthogonal to all these 'locally permissible' variations: for y ∈ V(x∗) it holds that

    ∇f(x∗)ᵗy = −λ∗∇h(x∗)ᵗy = 0.

This condition is analogous to the condition ∇f(x∗) = 0 in unconstrained optimization.
ii) Using the Lagrange function L(x, λ) = f(x) + λh(x) we may write the FONC as ∂L/∂xi (x∗, λ∗) = 0, 1 ≤ i ≤ n.


Examples

Geometric interpretation of multipliers: Consider the problem

    min_{x∈R²} x1 + x2 subject to x1² + x2² = 2.

The Lagrange approach can also be used to establish the geometric-arithmetic mean inequality.
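For the first example the FONC gives 1 + 2λ∗x1 = 0 and 1 + 2λ∗x2 = 0, hence x1 = x2; the constraint then yields the minimum at (−1, −1) with value −2 and λ∗ = 1/2. A numerical cross-check with SciPy (solver and starting point are arbitrary choices):

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0] + x[1]
cons = {"type": "eq", "fun": lambda x: x[0]**2 + x[1]**2 - 2}

res = minimize(f, x0=np.array([0.5, -0.5]), constraints=[cons], method="SLSQP")
print(res.x, res.fun)  # approx. [-1. -1.] and -2.0
```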
