Convex Analysis (Analiza Convexa)

The document defines convex functions and their properties, including conditions for convexity, strict convexity, and strong convexity. It also discusses L-smooth functions and Jensen's inequality, which relates the expectation of a convex function to the function of an expectation. Various examples and theorems illustrate these concepts in the context of real vector spaces.

Uploaded by

mariustodorut78
Copyright
© © All Rights Reserved

Let 𝑋 be a convex subset of a real vector space and let 𝑓 : 𝑋 → ℝ be a function.

Then 𝑓 is called **convex** if and only if any of the following equivalent
conditions hold:

1. For all 0 ≤ 𝑡 ≤ 1 and all 𝑥₁, 𝑥₂ ∈ 𝑋:

𝑓(𝑡𝑥₁ + (1 − 𝑡)𝑥₂) ≤ 𝑡𝑓(𝑥₁) + (1 − 𝑡)𝑓(𝑥₂)

The right hand side represents the straight line between (𝑥₁, 𝑓(𝑥₁)) and
(𝑥₂, 𝑓(𝑥₂)) in the graph of 𝑓 as a function of 𝑡; increasing 𝑡 from 0 to 1 or
decreasing 𝑡 from 1 to 0 sweeps this line. Similarly, the argument of the
function 𝑓 in the left hand side represents the straight line between 𝑥₁ and 𝑥₂
in 𝑋 or the 𝑥-axis of the graph of 𝑓. So, this condition requires that the
straight line between any pair of points on the curve of 𝑓 be above or just
meeting the graph.[2]

2. For all 0 < 𝑡 < 1 and all 𝑥₁, 𝑥₂ ∈ 𝑋 such that 𝑥₁ ≠ 𝑥₂:

𝑓(𝑡𝑥₁ + (1 − 𝑡)𝑥₂) ≤ 𝑡𝑓(𝑥₁) + (1 − 𝑡)𝑓(𝑥₂)

The difference between this second condition and the first condition above is
that it does not include the intersection points (for example, (𝑥₁, 𝑓(𝑥₁)) and
(𝑥₂, 𝑓(𝑥₂))) between the curve of 𝑓 and the straight line passing through a
pair of points on the curve (the straight line is represented by the right hand
side of this condition). The first condition includes the intersection points,
as it becomes 𝑓(𝑥₁) ≤ 𝑓(𝑥₁) or 𝑓(𝑥₂) ≤ 𝑓(𝑥₂) at 𝑡 = 0 or 1, or when 𝑥₁ = 𝑥₂. In
fact, the intersection points do not need to be considered in a condition for
convexity using

𝑓(𝑡𝑥₁ + (1 − 𝑡)𝑥₂) ≤ 𝑡𝑓(𝑥₁) + (1 − 𝑡)𝑓(𝑥₂)

because 𝑓(𝑥₁) ≤ 𝑓(𝑥₁) and 𝑓(𝑥₂) ≤ 𝑓(𝑥₂) are always true and therefore add
nothing to the condition.

The second statement can also be modified to get the definition of strict
convexity, where the latter is obtained by replacing ≤ with the strict
inequality <. Explicitly, the map 𝑓 is called strictly convex if and only if
for all real 0 < 𝑡 < 1 and all 𝑥₁, 𝑥₂ ∈ 𝑋 such that 𝑥₁ ≠ 𝑥₂:

𝑓(𝑡𝑥₁ + (1 − 𝑡)𝑥₂) < 𝑡𝑓(𝑥₁) + (1 − 𝑡)𝑓(𝑥₂)

A strictly convex function 𝑓 is a function for which the straight line between
any pair of points on the curve of 𝑓 lies above the curve, except at the
intersection points between the straight line and the curve.

The function 𝑓 is said to be concave (resp. strictly concave) if −𝑓 (𝑓
multiplied by −1) is convex (resp. strictly convex).
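
As a minimal sketch of the definitions above, the following Python snippet tests the convexity inequality on a handful of sample points. The helper names (`is_convex_on_samples`, `is_concave_on_samples`), the sample points, and the values of 𝑡 are illustrative assumptions, not part of the original text. A sampled check of this kind can only refute convexity on the chosen points; it never proves it.

```python
from itertools import product

def is_convex_on_samples(f, points, ts, strict=False):
    """Test f(t*x1 + (1-t)*x2) <= t*f(x1) + (1-t)*f(x2) on sample points.

    With strict=True the inequality must be strict (x1 != x2, 0 < t < 1).
    Returns False as soon as one sampled triple violates the inequality.
    """
    for x1, x2, t in product(points, points, ts):
        if x1 == x2:
            continue  # condition 2 only considers distinct points
        lhs = f(t * x1 + (1 - t) * x2)
        rhs = t * f(x1) + (1 - t) * f(x2)
        violated = lhs >= rhs if strict else lhs > rhs
        if violated:
            return False
    return True

def is_concave_on_samples(f, points, ts, strict=False):
    """f is concave (resp. strictly concave) iff -f is (strictly) convex."""
    return is_convex_on_samples(lambda x: -f(x), points, ts, strict)

# Assumed sample data: dyadic values keep the float arithmetic exact.
points = [-3.0, -1.0, 0.0, 2.0, 5.0]
ts = [0.25, 0.5, 0.75]

square_convex = is_convex_on_samples(lambda x: x * x, points, ts)
square_strict = is_convex_on_samples(lambda x: x * x, points, ts, strict=True)
abs_convex = is_convex_on_samples(abs, points, ts)
abs_strict = is_convex_on_samples(abs, points, ts, strict=True)
neg_square_concave = is_concave_on_samples(lambda x: -x * x, points, ts)

# prints: True True True False True
print(square_convex, square_strict, abs_convex, abs_strict, neg_square_concave)
```

The absolute value fails the strict test because for two points on the same side of 0 the chord lies on the graph, so the inequality holds with equality.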

**Strongly convex functions**

The concept of strong convexity extends and parametrizes the notion of strict
convexity. Intuitively, a strongly convex function is a function that grows at
least as fast as a quadratic function.[11] A strongly convex function is also
strictly convex, but not vice versa. If a one-dimensional function 𝑓 is twice
continuously differentiable and the domain is the real line, then we can
characterize it as follows:

- 𝑓 convex if and only if 𝑓″(𝑥) ≥ 0 for all 𝑥.
- 𝑓 strictly convex if 𝑓″(𝑥) > 0 for all 𝑥 (note: this is sufficient, but not
necessary).
- 𝑓 strongly convex if and only if 𝑓″(𝑥) ≥ 𝑚 > 0 for all 𝑥, for some constant 𝑚.

**Functions of one variable**

- The function 𝑓(𝑥) = 𝑥² has 𝑓″(𝑥) = 2 > 0, so 𝑓 is a convex function. It is also
strongly convex (and hence strictly convex too), with strong convexity
constant 2.

- The function 𝑓(𝑥) = 𝑥⁴ has 𝑓″(𝑥) = 12𝑥² ≥ 0, so 𝑓 is a convex function. It is
strictly convex, even though the second derivative is not strictly positive at
all points. It is not strongly convex.

- The absolute value function 𝑓(𝑥) = |𝑥| is convex (as reflected in the triangle
inequality), even though it does not have a derivative at the point 𝑥 = 0. It
is not strictly convex.

- The function 𝑓(𝑥) = |𝑥|ᵖ for 𝑝 ≥ 1 is convex.

- The exponential function 𝑓(𝑥) = 𝑒ˣ is convex. It is also strictly convex,
since 𝑓″(𝑥) = 𝑒ˣ > 0, but it is not strongly convex since the second derivative
can be arbitrarily close to zero. More generally, the function 𝑔(𝑥) = 𝑒^{𝑓(𝑥)} is
logarithmically convex if 𝑓 is a convex function. The term "superconvex" is
sometimes used instead.[18]


- The function 𝑥³ has second derivative 6𝑥; thus it is convex on the set where 𝑥 ≥
0 and concave on the set where 𝑥 ≤ 0.
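
The second-derivative tests can be illustrated numerically. The sketch below evaluates the second derivatives quoted in the examples on a grid and reports their minimum; the grid, the interval [−10, 10], and the helper name `min_second_derivative` are assumptions made for illustration. A grid minimum is only a heuristic for the infimum over the whole real line.

```python
import math

# Second derivatives taken from the examples above.
second_derivatives = {
    "x^2": lambda x: 2.0,
    "x^4": lambda x: 12.0 * x * x,
    "exp(x)": math.exp,
    "x^3": lambda x: 6.0 * x,
}

def min_second_derivative(d2f, lo, hi, n=2001):
    """Smallest value of d2f on an evenly spaced grid of n points in [lo, hi]."""
    step = (hi - lo) / (n - 1)
    return min(d2f(lo + i * step) for i in range(n))

# Minimum of f'' on the assumed interval [-10, 10].
grid_minima = {name: min_second_derivative(d2f, -10.0, 10.0)
               for name, d2f in second_derivatives.items()}
for name, m in grid_minima.items():
    print(name, m)

# For exp, the minimum of f'' over [-R, 0] shrinks towards 0 as R grows,
# so no single constant m > 0 works on the whole real line.
exp_minima = [min_second_derivative(math.exp, -R, 0.0) for R in (10.0, 20.0, 40.0)]
print(exp_minima)
```

On this grid the minimum for 𝑥² is 2, for 𝑥⁴ it is (numerically) 0, and for 𝑥³ it is negative, matching the classification in the list. For 𝑒ˣ the minimum is positive on every bounded interval but decreases as the interval widens.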

-----------------------------------------------------------------------------------
-----------------------------------

1 L-smooth functions and strong convexity

Unless otherwise specified, X is a finite dimensional ℝ-vector space equipped with
a p-norm ‖ · ‖.

**Definition 1.** *The dual space X* of X is the space of linear forms on X with
norm ‖ · ‖*
defined by

‖f‖* = max_{‖x‖=1} f(x).

As X is assumed to be finite dimensional, there is a natural equivalence between X
and X*, i.e. X* is an ℝ-vector space of the same dimension as X.

**Remark 2.** *For X with norm ‖ · ‖_p, the dual norm is ‖ · ‖_q, where 1/p + 1/q =
1 for p > 1, and
q = ∞ for p = 1. In particular, if X has Euclidean norm, then so does X*.*
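
Remark 2 can be checked numerically for 1 < p < ∞. The sketch below identifies a linear form with a coefficient vector a, so that f(x) = ⟨a, x⟩, and uses the standard maximizer from the equality case of Hölder's inequality. The vector `a`, the choice p = 3, and all function names are assumptions for illustration.

```python
import math
import random

def p_norm(x, p):
    """The p-norm of a vector given as a list of floats (1 <= p < inf)."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def dual_maximizer(a, p):
    """Vector x with ||x||_p = 1 maximizing <a, x>, for 1 < p < inf and a != 0."""
    q = p / (p - 1.0)                      # conjugate exponent: 1/p + 1/q = 1
    scale = p_norm(a, q) ** (q - 1.0)
    return [math.copysign(abs(ai) ** (q - 1.0), ai) / scale for ai in a]

def inner(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))

a = [3.0, -4.0, 1.0]   # assumed coefficients of the linear form f(x) = <a, x>
p = 3.0
q = p / (p - 1.0)

x_star = dual_maximizer(a, p)
attained = inner(a, x_star)    # value of f at the maximizer
dual_norm = p_norm(a, q)       # the q-norm predicted by Remark 2

# Random vectors, normalized in the p-norm, never exceed the dual norm.
rng = random.Random(0)
best_random = 0.0
for _ in range(1000):
    v = [rng.uniform(-1.0, 1.0) for _ in a]
    n = p_norm(v, p)
    if n > 0:
        best_random = max(best_random, inner(a, v) / n)

print(attained, dual_norm, best_random)
```

The first two printed numbers should agree up to rounding, and the third should stay below them.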

1.1 L-smooth functions

The definition of L-smooth functions can be generalized to X with an unspecified
p-norm.

**Definition 3.** *A differentiable function f : X → ℝ is L-smooth with respect to
a norm ‖ · ‖ if*

‖∇f(y) − ∇f(x)‖* ≤ L‖y − x‖, ∀x, y ∈ X.
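
A one-dimensional example may help make Definition 3 concrete. On X = ℝ both the norm and its dual are the absolute value. The sketch below uses the function f(x) = log(1 + eˣ), which is not from the original text and is chosen here as an assumption because its second derivative is bounded by 1/4, so L = 1/4 is a valid smoothness constant. The grid of sample points is also an assumption.

```python
import math
from itertools import combinations

def softplus_grad(x):
    """Derivative of f(x) = log(1 + e^x), i.e. the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))

def max_gradient_ratio(grad, points):
    """Largest |grad(y) - grad(x)| / |y - x| over distinct sample pairs."""
    return max(abs(grad(y) - grad(x)) / abs(y - x)
               for x, y in combinations(points, 2))

points = [-5.0 + 0.5 * i for i in range(21)]   # assumed grid on [-5, 5]
L = 0.25                                       # valid because f''(x) <= 1/4
ratio = max_gradient_ratio(softplus_grad, points)
print(ratio, ratio <= L)
```

The sampled ratio is a lower estimate of the best Lipschitz constant of the gradient, so it should come out slightly below 1/4.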


**Theorem 4.** *Let f : X → ℝ be convex, and L > 0. Then the following are
equivalent for all x, y ∈ X and λ ∈ [0, 1]:*

1. *f is L-smooth with respect to ‖ · ‖;*
2. f(y) ≤ f(x) + ⟨∇f(x), y − x⟩ + (L/2) ‖y − x‖²;
3. f(y) − f(x) − ⟨∇f(x), y − x⟩ ≥ (1/(2L)) ‖∇f(x) − ∇f(y)‖²_*;
4. ⟨∇f(x) − ∇f(y), x − y⟩ ≥ (1/L) ‖∇f(x) − ∇f(y)‖²_*;
5. f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y) − (L/2) λ(1 − λ)‖x − y‖².
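
The inequalities (2) to (5) can be spot-checked on a concrete function. The sketch below again uses f(x) = log(1 + eˣ) on X = ℝ with L = 1/4; the function, the sample grid, the values of λ, and the small tolerance for rounding are all assumptions for illustration, and passing the check on samples is not a proof.

```python
import math

def f(x):
    return math.log1p(math.exp(x))        # convex, with f'' <= 1/4

def g(x):
    return 1.0 / (1.0 + math.exp(-x))     # derivative of f

def check_conditions(L, points, lambdas, tol=1e-12):
    """Check inequalities (2)-(5) of Theorem 4 on sample points (1-D case)."""
    ok = {2: True, 3: True, 4: True, 5: True}
    for x in points:
        for y in points:
            gap = f(y) - f(x) - g(x) * (y - x)
            dg2 = (g(x) - g(y)) ** 2
            ok[2] &= gap <= 0.5 * L * (y - x) ** 2 + tol
            ok[3] &= gap >= dg2 / (2.0 * L) - tol
            ok[4] &= (g(x) - g(y)) * (x - y) >= dg2 / L - tol
            for lam in lambdas:
                lhs = f(lam * x + (1 - lam) * y)
                rhs = (lam * f(x) + (1 - lam) * f(y)
                       - 0.5 * L * lam * (1 - lam) * (x - y) ** 2)
                ok[5] &= lhs >= rhs - tol
    return ok

points = [-4.0 + 0.5 * i for i in range(17)]   # assumed grid on [-4, 4]
lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]

with_valid_L = check_conditions(0.25, points, lambdas)
with_small_L = check_conditions(0.1, points, lambdas)
print(with_valid_L)
print(with_small_L)
```

With L = 0.25 every condition holds on the samples, while with the too-small constant 0.1 the conditions fail, consistent with the equivalence stated in the theorem.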

**Proof.** (1) ⇒ (2) Let x_λ = x + λ(y − x) for λ ∈ [0, 1]. Using the fundamental
theorem of calculus and Hölder’s inequality:

f(y) − f(x) − ⟨∇f(x), y − x⟩ = ∫₀¹ ⟨∇f(x_λ) − ∇f(x), y − x⟩ dλ
≤ ∫₀¹ ‖∇f(x_λ) − ∇f(x)‖_* ‖y − x‖ dλ
≤ ∫₀¹ Lλ‖y − x‖² dλ
= (L/2) ‖y − x‖²,

where the second inequality uses (1) together with ‖x_λ − x‖ = λ‖y − x‖.

(2) ⇒ (3) For fixed x ∈ X let

φ(y) = f(y) − f(x) − ⟨∇f(x), y − x⟩.

By definition ∇φ(y) = ∇f(y) − ∇f(x), and by convexity, φ(x) = 0 is the minimum
value of φ. Since φ differs from f by an affine function, φ also satisfies (2).
For y ∈ X, set z = y − (‖∇φ(y)‖_* / L) v, where v is chosen so that
⟨∇φ(y), v⟩ = ‖∇φ(y)‖_* and ‖v‖ = 1. Then

0 ≤ φ(z)
≤ φ(y) − ⟨∇φ(y), (‖∇φ(y)‖_* / L) v⟩ + (L/2) ‖(‖∇φ(y)‖_* / L) v‖²
= φ(y) − (1/(2L)) ‖∇φ(y)‖²_*
= f(y) − f(x) − ⟨∇f(x), y − x⟩ − (1/(2L)) ‖∇f(y) − ∇f(x)‖²_*.

(3) ⇒ (4) For each x, y ∈ X,

f(y) − f(x) − ⟨∇f(x), y − x⟩ ≥ (1/(2L)) ‖∇f(x) − ∇f(y)‖²_*,
f(x) − f(y) − ⟨∇f(y), x − y⟩ ≥ (1/(2L)) ‖∇f(y) − ∇f(x)‖²_*.

Summation yields (4).


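(4) ⇒ (1) Using Hölder’s inequality,

(1/L) ‖∇f(x) − ∇f(y)‖²_* ≤ ⟨∇f(x) − ∇f(y), x − y⟩ ≤ ‖∇f(x) − ∇f(y)‖_* ‖x − y‖.

Dividing by ‖∇f(x) − ∇f(y)‖_* (the case ∇f(x) = ∇f(y) is trivial) and
multiplying by L gives (1).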

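(2) ⇒ (5) This follows from the inequality in (2). Let x_λ = λx + (1 − λ)y.
Applying (2) at the base point x_λ, once with target x and once with target y,
gives

f(x) ≤ f(x_λ) + ⟨∇f(x_λ), x − x_λ⟩ + (L/2) (1 − λ)² ‖x − y‖²,
f(y) ≤ f(x_λ) + ⟨∇f(x_λ), y − x_λ⟩ + (L/2) λ² ‖x − y‖².

Multiplying the first inequality by λ and the second by 1 − λ and adding, the
gradient terms cancel because λ(x − x_λ) + (1 − λ)(y − x_λ) = 0, and
λ(1 − λ)² + (1 − λ)λ² = λ(1 − λ), which gives (5).

(5) ⇒ (2) Rewrite (5), with the roles of x and y exchanged and λ ∈ (0, 1], as

f(y) ≤ f(x) + [f(x + λ(y − x)) − f(x)] / λ + (L(1 − λ)/2) ‖y − x‖².

The limit as λ → 0 results in (2), since the difference quotient converges to
⟨∇f(x), y − x⟩.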


-----------------------------------------------------------------------------------
---------------------------------------------------

**Definition:** A function f from the reals to the reals is convex if
for every x₁ and x₂ and every p ∈ [0, 1] we have

pf(x₁) + (1 − p)f(x₂) ≥ f(px₁ + (1 − p)x₂).

If f is twice differentiable then f is convex if and only if d²f/dx² ≥ 0.

Now consider a probability distribution P on a set M and a function X
assigning real values X(m) for m ∈ M.

**Theorem 1 (Jensen’s Inequality)** *If f is convex then for any distribution
P on M we have the following.*

E_{m~P} [f(X(m))] ≥ f (E_{m~P} [X(m)])

Usually the right hand side above, f of an expectation, is simpler than
the left hand side, the expectation of f. Jensen’s inequality is used to bound
the “complicated” expression E [f(X)] from below by the simpler expression
f(E [X]). Often these expressions are actually very close to each other.
(Assuming that these expressions are equal is called the mean field
approximation.)
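
A small numerical instance of the inequality, as a sketch: the values X(mᵢ), the probabilities, and the choice f(x) = x² below are assumptions picked so that the arithmetic is exact.

```python
def expectation(values, probs):
    """Expectation of a finite distribution given values and probabilities."""
    return sum(p * v for v, p in zip(values, probs))

xs = [1.0, 2.0, 4.0]        # assumed values X(m_1), X(m_2), X(m_3)
ps = [0.5, 0.25, 0.25]      # assumed probabilities P(m_i), summing to 1
f = lambda x: x * x         # a convex function

expectation_of_f = expectation([f(x) for x in xs], ps)   # E[f(X)]
f_of_expectation = f(expectation(xs, ps))                # f(E[X])

# prints: 5.5 4.0
print(expectation_of_f, f_of_expectation)
```

For f(x) = x² the gap E [f(X)] − f(E [X]) is exactly the variance of X, here 1.5, so the two sides are close only when X varies little.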

We prove Jensen’s inequality only for the case where M is a finite set
{m₁, ..., m_k}. Let x_i abbreviate X(m_i) and p_i abbreviate P(m_i). First
consider the case where M contains only two elements. In this case we have
the following, where the inequality is the definition of convexity with
p = p₁ and 1 − p = p₂.

E_{m~P} [f(X(m))] = p₁f(x₁) + p₂f(x₂)
≥ f(p₁x₁ + p₂x₂)
= f(E_{m~P} [X(m)])

Note that the definition of convexity is simply the statement that Jensen’s
inequality holds for two point distributions. We prove Jensen’s inequality for
finite M by induction on the number of elements of M. Suppose M contains
k elements and assume that Jensen’s inequality holds for distributions on
k − 1 points. We now have the following, where the first inequality follows
from the definition of convexity and the second from the induction hypothesis
applied to the k − 1 points (p₁x₁ + p₂x₂) / (p₁ + p₂), x₃, ..., x_k with
probabilities p₁ + p₂, p₃, ..., p_k (we may assume p₁ + p₂ > 0).

E_{m~P} [f(X(m))]
= p₁f(x₁) + p₂f(x₂) + p₃f(x₃) + ⋯ + p_kf(x_k)
= (p₁ + p₂) ( (p₁ / (p₁ + p₂)) f(x₁) + (p₂ / (p₁ + p₂)) f(x₂) ) + p₃f(x₃) + ⋯ + p_kf(x_k)
≥ (p₁ + p₂) f ( (p₁ / (p₁ + p₂)) x₁ + (p₂ / (p₁ + p₂)) x₂ ) + p₃f(x₃) + ⋯ + p_kf(x_k)
≥ f ( (p₁ + p₂) (p₁x₁ + p₂x₂) / (p₁ + p₂) + p₃x₃ + ⋯ + p_kx_k )
= f(p₁x₁ + p₂x₂ + p₃x₃ + ⋯ + p_kx_k)
= f(E_{m~P} [X(m)])
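
The induction can be unrolled into a loop: repeatedly merge the first two points into their weighted mean and record the expectation of f after each merge. This is a sketch of that idea rather than a transcription of the proof; the function names and the sample distribution are assumptions, and all probabilities are assumed positive.

```python
def expected_f(f, xs, ps):
    """E[f(X)] for the finite distribution with values xs and probabilities ps."""
    return sum(p * f(x) for x, p in zip(xs, ps))

def merge_first_two(xs, ps):
    """Replace the first two points by their weighted mean with mass p1 + p2.

    Assumes p1 + p2 > 0.
    """
    s = ps[0] + ps[1]
    merged = (ps[0] * xs[0] + ps[1] * xs[1]) / s
    return [merged] + xs[2:], [s] + ps[2:]

def jensen_chain(f, xs, ps):
    """E[f] after each merge; the last entry is f(E[X]) when sum(ps) == 1."""
    chain = [expected_f(f, xs, ps)]
    while len(xs) > 1:
        xs, ps = merge_first_two(xs, ps)
        chain.append(expected_f(f, xs, ps))
    return chain

# Assumed example: f(x) = x^2 on three points.
chain = jensen_chain(lambda x: x * x, [1.0, 2.0, 4.0], [0.5, 0.25, 0.25])
print(chain)
```

Each merge is one application of the two-point inequality, so for a convex f the recorded values never increase, starting at E [f(X)] and ending at f(E [X]).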

The definition of convexity generalizes to the case where f is a function
from vectors to reals and x₁ and x₂ are taken to be vectors. Jensen’s
inequality also generalizes to the case where X(m) is a vector. In this case
E_{m~P} [X(m)] is an average vector. In the vector case the above definitions
and derivations go through unchanged.
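
For the vector case, a short sketch with the squared Euclidean norm as the convex function; the vectors, the probabilities, and the helper names are assumptions for illustration.

```python
def mean_vector(vectors, probs):
    """Componentwise expectation of a finite distribution over vectors."""
    dim = len(vectors[0])
    return [sum(p * v[i] for v, p in zip(vectors, probs)) for i in range(dim)]

def squared_norm(v):
    """Squared Euclidean norm, a convex function from vectors to reals."""
    return sum(vi * vi for vi in v)

vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]   # assumed values X(m_i)
probs = [0.5, 0.25, 0.25]                          # assumed probabilities

average = mean_vector(vectors, probs)              # the average vector E[X]
expectation_of_f = sum(p * squared_norm(v) for v, p in zip(vectors, probs))
f_of_expectation = squared_norm(average)
print(average, expectation_of_f, f_of_expectation)
```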
-----------------------------------------------------------------------------------
---------------------------------------------------------------------------------
