Opt2017 Part1
Contents
1 Convergence in Metric Spaces
1.1 Definition of Metric and Normed Spaces
1.2 Sequences and Convergence
1.3 Open and Closed Sets
1.4 Limits of Functions, Continuity
1.5 Cauchy Sequences, Complete Metric Spaces
1.6 Banach Contraction Mapping Theorem
1.7 Cantor Intersection Theorem
1.8 Separable Spaces
1.9 Basic Examples revisited
1.10 Linear Mappings (Operators) in Normed Spaces
1.11 Compact Sets (Heine-Borel, Bolzano-Weierstrass)
1.12 Continuous Functions on Compact Sets
1.13 Equivalent Metrics and Norms
1.14 Back to the Reals: R^d as a Banach Space
1.15 Summary: Structures on Vector Spaces
1.16 Multivalued Mappings (Correspondences)
1.1 Definition of Metric and Normed Spaces
Of fundamental importance in mathematical analysis is the notion of limit and convergence
(e.g., for real numbers, complex numbers, vectors in $\mathbb{R}^n$, functions, etc.). This constitutes a
basis for defining two fundamental operations: differentiation and integration of functions.
The notion of limit requires a way to measure a “distance” between the objects.
Extending the well-known notion of the Euclidean distance
$d(x, y) := |x - y|$ between two reals $x, y \in \mathbb{R}$, or
$d(x, y) := \|x - y\| = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2}$ between two points
$x := (x_1, x_2, x_3)$, $y := (y_1, y_2, y_3) \in \mathbb{R}^3$,
we naturally come to the notion of a metric space, which is a basic one in modern mathematics.
Roughly speaking, a metric space is a set in which we have defined a distance,
i.e., a metric. Having introduced a metric, we can study the properties of limits independently
of the concrete nature of the objects under consideration.
History: The notion of a metric space was first introduced by the French mathemati-
cian M. Fréchet (1906). This area of mathematics is now seen as a part of Functional
Analysis (FA).
Functional Analysis (as a branch of Mathematical Analysis) studies vector spaces (in
general, infinite-dimensional) endowed with some kind of limit-related structure (e.g., inner
product, norm, metric, topology, etc.) and the functions (so-called functionals or operators)
acting upon these spaces and respecting these structures in a suitable sense (see Wiki).
FA started out at the beginning of the last century in the works of the French mathematicians
Hadamard, Fréchet, Lévy, Riesz, . . . , and of the group of Polish mathematicians
around Stefan Banach (1892-1945). The PhD thesis of Stefan Banach (1922) included the
basic ideas of functional analysis, which was soon to become an entirely new branch of
mathematics. Banach’s most influential work was Théorie des opérations linéaires (Theory
of Linear Operations, 1932), in which he formulated the concept now known as “Banach
spaces” (i.e., complete normed vector spaces) and proved many basic theorems of FA. An
important example of such spaces is a Hilbert space (named after David Hilbert, 1862-1943),
where the norm arises from an inner product. These spaces are of fundamental
importance in many areas.
Definition 1.1.1. Let X be a nonempty set (i.e., a collection of objects we call elements).
A metric (or distance function) on X is a mapping
$d : X \times X \ni (x, y) \to d(x, y) \in \mathbb{R}_+$ (≥ 0)
(ii) d(x, y) = d(y, x), ∀x, y ∈ X (symmetry);
The pair (X, d) is called a metric space. The elements x of a metric space X are also
called points.
Exercise 1.1.4. Prove that (i)–(iii) imply the inverse triangle inequality
Given a metric space (X, d) and a (nonempty) subset Y ⊂ X, it is clear that (Y, d) is
also a metric space; it is called a metric subspace of X.
Remark 1.1.6 (For Supporting material, see A. de la Fuente, p. 28.). A vector (or
linear) space is a set V of elements called vectors, together with a binary operation V ×
V → V called vector addition (and denoted by “+”) and an operation R × V → V called
scalar multiplication. These operations have the following properties:
for all x, y, z ∈ V and α, β ∈ R
(1) x + y = y + x (commutative property);
(2) x + (y + z) = (x + y) + z (associative property);
(3) ∃! 0 ∈ V : x + 0 = 0 + x = x (existence of the zero element);
(4) ∀ x ∈ V , ∃! (−x) ∈ V : x + (−x) = 0 (existence of inverse elements);
(5) α(x + y) = αx + αy and (α + β)x = αx + βx (distributive property);
(6) α(βx) = (αβ)x (associative law for scalars);
(7) 1 · x = x (multiplicative identity).
Then one can define the difference operation
$d(x, y) := \|x - y\|$, x, y ∈ V,
is a metric on V .
(1) The set of real numbers R with $\|x\| := |x|$ (absolute value).
Euclidean distance $d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$.
$\|x + y\| \le \|x\| + \|y\|$? The triangle inequality is not trivial. To check it one needs
the Cauchy-Schwarz inequality (Theorem 1.1.10 below).
(3) Another norm on Rn , the maximum norm
Exercise 1.1.8. Show that the trivial metric on R does not come from any norm.
(5) The space of continuous functions C([0, 1]) with the uniform (or maximum) norm.
f : [0, 1] → R, continuous,
Exercise 1.1.9. Prove that the examples (1), (2), (3) and (5) are norms. You will need:
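$|x \cdot y| = \left| \sum_{i=1}^{n} x_i y_i \right| \le \sqrt{\sum_{i=1}^{n} x_i^2} \cdot \sqrt{\sum_{i=1}^{n} y_i^2} = \|x\| \cdot \|y\|.$
$\Delta = \left( \sum_{i=1}^{n} x_i y_i \right)^2 - \left( \sum_{i=1}^{n} x_i^2 \right) \left( \sum_{i=1}^{n} y_i^2 \right) \le 0.$
$\iff \left| \sum_{i=1}^{n} x_i y_i \right| \le \sqrt{\sum_{i=1}^{n} x_i^2} \cdot \sqrt{\sum_{i=1}^{n} y_i^2}.$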
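The Cauchy-Schwarz inequality can be sanity-checked numerically. The following is only an illustrative sketch, not a proof: the helper names (`dot`, `euclidean_norm`, `cauchy_schwarz_gap`), the dimension 5 and the random sampling are assumptions made for this example.

```python
import math
import random

def dot(x, y):
    """Standard inner product x . y = sum_i x_i y_i."""
    return sum(a * b for a, b in zip(x, y))

def euclidean_norm(x):
    """Euclidean norm ||x|| = sqrt(x . x)."""
    return math.sqrt(dot(x, x))

def cauchy_schwarz_gap(x, y):
    """Return ||x|| * ||y|| - |x . y|, which Cauchy-Schwarz says is >= 0."""
    return euclidean_norm(x) * euclidean_norm(y) - abs(dot(x, y))

random.seed(0)  # fixed seed so the experiment is reproducible
gaps = []
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(5)]
    y = [random.uniform(-1, 1) for _ in range(5)]
    gaps.append(cauchy_schwarz_gap(x, y))

min_gap = min(gaps)                                   # nonnegative up to rounding
parallel_gap = cauchy_schwarz_gap([1, 2, 3], [2, 4, 6])  # parallel vectors: gap ~ 0
```

The gap vanishes (up to rounding) for parallel vectors, which matches the case of equality in the inequality.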
Definition 1.1.11. The open ball with center at point x ∈ X and radius ε > 0 is
$B_\varepsilon(x) := \{y \in X \mid d(x, y) < \varepsilon\}$,
and the closed ball is $\bar{B}_\varepsilon(x) := \{y \in X \mid d(x, y) \le \varepsilon\}$.
The subset U ⊆ X is open if for each x ∈ U there exists an open ball $B_\varepsilon(x) \subset U$ (with
radius ε = ε(x) > 0 depending on x).
U ⊆ X is closed if its complement $U^c := X \setminus U$ is open.
U ⊆ X is called bounded if there exists some ball $B_R(x)$ (or $\bar{B}_R(x)$) such that $U \subseteq B_R(x)$
(resp. $U \subseteq \bar{B}_R(x)$).
Proof. Let $y \in B_R(x)$. We have to find ε > 0 such that $B_\varepsilon(y) \subseteq B_R(x)$.
Example 1.1.13. The intervals (a, b), (b, ∞), (−∞, b) are open sets in R. Their unions
are also open. The inverse statement is also true:
(ii) the intersection of two (or a finite number of ) open sets is open;
Exercise 1.1.17. Let X = [0, 1] with the metric d(x, y) = |y − x|. Characterise all open
balls in X. Is [0, 1] open? Is [0, 1/2] open?
(i) Y ⊆ X is bounded;
Exercise 1.1.20. [*] Show that, in general, the Hausdorff distance between two sets is
not a metric. [Hint: Show that $d_H(A, B) = 0$ if and only if $\bar{A} = \bar{B}$. Hence, construct a
counterexample: find two subsets $A \ne B \subset \mathbb{R}$ such that $\bar{A} = \bar{B}$.]
What is true is that the Hausdorff distance on Rn defines a metric on the set of all
non-empty closed and bounded subsets of Rn (we’ll see what this means soon).
Basic Examples
Now we will introduce some of the more important metrics, as well as some metrics which
are a bit ‘non-standard’ (just to illustrate that metrics can be unintuitive sometimes):
(i) Euclidean space $(\mathbb{R}^n, |\cdot|)$ of vectors $x = (x_i)_{i=1}^{n}$ with the metric
$d(x, y) = |x - y| := \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$.
(ii) Manhattan (or taxicab, or city-block, or $l^1$-) metric (norm) on $\mathbb{R}^n$ (in particular,
n = 2):
$d(x, y) = \|x - y\|_{l^1} := \sum_{i=1}^{n} |x_i - y_i|$.
(Taxicabs cannot drive through buildings. They have to drive either North-South or
East-West).
(This metric is not generated by any norm $\|\cdot\|$. Indeed, $\|x - y\| = d(x, y) = |x| + |y|$
(for $x \ne y$) implies $\|x\| = d(x, 0) = |x|$ (for $x \ne 0$ and y = 0). But then we must have
$|x - y| = \|x - y\| = d(x, y) := |x| + |y|$, which is surely wrong for arbitrary $x \ne y$.)
Exercise: Let n = 2. Show that the set {(0, 1)} is open in the British rail metric.
Find all points x ∈ R2 such that the set {x} is not open in the British rail metric.
Describe all the open sets in this metric.
This metric is similar to the British rail metric, but now passengers are allowed to
take a shorter journey if their destination is on the same rail line coming from Paris
(both points lie on the same ray emanating from the origin). Exercise: Can you
find an open ball in the British rail metric which is not open in the French Metro
metric?
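(iv) C([a, b]) – Space of all continuous functions on a bounded interval [a, b] with the
maximum norm
$\|f - g\|_\infty := \max_{t \in [a, b]} |f(t) - g(t)|$.
(v) $\mathbb{R}^\infty$ – space of all real sequences $x = (x_i)_{i \ge 1}$ with $x_i \in \mathbb{R}$. The metric (which
cannot be generated by any norm) is given by
$d(x, y) := \sum_{i=1}^{\infty} \frac{1}{2^i} \, \frac{|x_i - y_i|}{1 + |x_i - y_i|}$.
(vi) $l^\infty$ – the space of all bounded sequences
$l^\infty := \left\{ x = (x_i)_{i \ge 1} \in \mathbb{R}^\infty \;\middle|\; \sup_{i \ge 1} |x_i| < \infty \right\}$
with $d(x, y) := \|x - y\|_\infty := \sup_{i \ge 1} |x_i - y_i|$.
$\left( \sum_{i=1}^{\infty} |x_i + y_i|^p \right)^{1/p} \le \left( \sum_{i=1}^{\infty} |x_i|^p \right)^{1/p} + \left( \sum_{i=1}^{\infty} |y_i|^p \right)^{1/p}$, $x, y \in l^p$.
$\sum_{i=1}^{\infty} |x_i y_i| \le \left( \sum_{i=1}^{\infty} |x_i|^p \right)^{1/p} \cdot \left( \sum_{i=1}^{\infty} |y_i|^q \right)^{1/q}$,
$x \in l^p$, $y \in l^q$, $\frac{1}{p} + \frac{1}{q} = 1$ (p, q > 1).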
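To make some of the metrics above concrete, here is a small sketch that evaluates the Euclidean metric, the Manhattan metric and a truncated version of the metric on the space of all real sequences. The function names, the sample points and the truncation parameter `terms` are assumptions chosen for illustration; sequences are modelled as functions i -> x_i.

```python
import math

def euclidean(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    """d(x, y) = sum_i |x_i - y_i|."""
    return sum(abs(a - b) for a, b in zip(x, y))

def sequence_metric(x, y, terms=60):
    """Partial sum of sum_i 2^(-i) |x_i - y_i| / (1 + |x_i - y_i|).

    x and y are functions i -> x_i (i = 1, 2, ...); the neglected tail
    is smaller than 2^(-terms).
    """
    total = 0.0
    for i in range(1, terms + 1):
        diff = abs(x(i) - y(i))
        total += diff / (1.0 + diff) / 2 ** i
    return total

p, q = (0.0, 0.0), (3.0, 4.0)
d_euclid = euclidean(p, q)   # straight-line distance
d_taxi = manhattan(p, q)     # distance along the street grid

# Even for the unbounded sequence (1, 2, 3, ...) the distance to 0 stays below 1.
d_seq = sequence_metric(lambda i: float(i), lambda i: 0.0)
```

For the points (0, 0) and (3, 4) the taxicab distance is larger than the Euclidean one, while the sequence metric is always bounded by 1 because each summand is smaller than 2^(-i).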
Exercise 1.1.21. Check that the following define metrics on the set of all positive
integers N := {1, 2, 3, . . .}:
(i) $\rho(n, m) = \frac{|m - n|}{mn}$;
(ii) $\rho(n, m) = 0$ if m = n, and $\rho(n, m) = 1 + \frac{1}{m + n}$ if $m \ne n$.
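Before attempting a proof, the metric axioms for the two functions of Exercise 1.1.21 can be tested by brute force on finitely many integers. This sketch is an assumption-laden experiment rather than a solution: the names `rho1`, `rho2`, `is_metric_on` and the range 1..30 are chosen here, and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import product

def rho1(n, m):
    """rho(n, m) = |m - n| / (m n)."""
    return Fraction(abs(m - n), m * n)

def rho2(n, m):
    """rho(n, m) = 0 if m == n, else 1 + 1 / (m + n)."""
    return Fraction(0) if m == n else 1 + Fraction(1, m + n)

def is_metric_on(points, rho):
    """Check identity of indiscernibles, symmetry and the triangle inequality
    for all pairs and triples taken from the given finite set of points."""
    for a, b in product(points, repeat=2):
        if (rho(a, b) == 0) != (a == b) or rho(a, b) != rho(b, a):
            return False
    return all(rho(a, c) <= rho(a, b) + rho(b, c)
               for a, b, c in product(points, repeat=3))

points = range(1, 31)
ok1 = is_metric_on(points, rho1)
ok2 = is_metric_on(points, rho2)
```

A passing check on a finite range only supports the claim; the exercise still asks for an argument valid for all positive integers.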
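Notation 1.2.2. We write $x_n \to x$ or $\lim_{n \to \infty} x_n = x$.
Theorem 1.2.3 (Uniqueness of limits). A sequence $(x_n)_{n \in \mathbb{N}}$ has at most one limit.
$\lim_{n \to \infty} x_n = x$, $\lim_{n \to \infty} x_n = y$, $x \ne y$.
Let
$\varepsilon := d(x, y)/2 > 0$.
Then:
$\exists N_1(\varepsilon) : d(x_n, x) < \varepsilon \quad \forall n > N_1(\varepsilon)$;
$\exists N_2(\varepsilon) : d(x_n, y) < \varepsilon \quad \forall n > N_2(\varepsilon)$.
Put $N := \max\{N_1(\varepsilon), N_2(\varepsilon)\}$. Then for n > N: $d(x_n, x), d(x_n, y) < \varepsilon$.
Hence $d(x, y) \le d(x, x_n) + d(x_n, y) < 2\varepsilon = d(x, y)$, a contradiction.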
Remark 1.2.5. In the discrete space X (see Example 1.1.2), any convergent sequence
is ‘eventually stationary’:
Definition 1.2.6. x ∈ X is a cluster point of (xn )n∈N if for any (small) open ball
with center at x, the sequence returns infinitely often to the ball.
In other words,
Remark 1.2.7. This is a weaker condition than convergence: we may have an infinite
number of terms outside the ball $B_\varepsilon(x)$. The limit of the sequence is always a cluster
point, but the converse need not be true.
Example 1.2.8. Let X = R and
$x_n = 0$ if n is even (n = 2m), $x_n = 1$ if n is odd (n = 2m + 1).
$x_n = (0, 0, 1, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 5, \ldots)$
(i) $x_n = (-1)^n$;
(ii) $x_n = \sin\left(\frac{\pi}{2} n\right)$;
(iii) $x_n = 2 - \frac{1}{n}$ if n is even, $x_n = 1/n$ if n is odd;
(iv) $x_n = n \bmod 4$.
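For sequences that take only finitely many values, the cluster points in the sense of Definition 1.2.6 are exactly the values that recur infinitely often. The sketch below lists the distinct values in a late stretch of such a sequence; this is a heuristic experiment, not a proof, and the helper `tail_values` with its parameters `start`, `length` and `digits` is an assumption made for this example.

```python
import math

def tail_values(seq, start=1000, length=1000, digits=6):
    """Distinct (rounded) values taken by seq(n) for start <= n < start + length."""
    return sorted({round(seq(n), digits) for n in range(start, start + length)})

alternating = tail_values(lambda n: (-1) ** n)                 # x_n = (-1)^n
sine = tail_values(lambda n: math.sin(math.pi * n / 2))        # x_n = sin(pi n / 2)
mod4 = tail_values(lambda n: n % 4)                            # x_n = n mod 4
```

For a sequence such as 2 − 1/n the tail contains many distinct values, so this simple listing is no longer informative and one has to argue with limits of subsequences instead.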
1.3 Open and Closed Sets
Recall the definition of an open set from Section 1.1.
Example 1.3.3. We’ll show why point (ii) of Theorem 1.3.1 does not necessarily
apply to infinite intersections. Consider open balls in $\mathbb{R}^n$, n ≥ 1,
is not open in $\mathbb{R}^n$.
The closure of A, which will be denoted by $\bar{A}$, consists of all closure points of
A. Obviously, $A \subseteq \bar{A}$ (since $B_\varepsilon(x) \cap A \ni x$ for each x ∈ A).
(iii) x ∈ X is a boundary point of A if for all ε > 0:
$x \in \partial A \iff x \in \bar{A} \cap \overline{A^c}$.
From the above definitions
In particular, the closure of the open ball $B_\varepsilon(x)$ is always contained in the closed ball
$\bar{B}_\varepsilon(x)$, see Definition 1.1.11. Furthermore, in the Euclidean space $\mathbb{R}^n$ they coincide.
Remark 1.3.5. In general, the closure of $B_\varepsilon(x)$ does not coincide with the closed
ball $\bar{B}_\varepsilon(x)$! A typical counterexample is the discrete metric with the unit open ball
$B_1(x) = \{x\}$ (which is an open and closed set simultaneously) and the closed ball
$\bar{B}_1(x) = X$.
(i) Show first that Å is open. By definition, for any x ∈ Å there exists $B_\varepsilon(x) \subset A$.
We claim that indeed $B_\varepsilon(x) \subset \mathring{A}$. Pick an arbitrary $y \in B_\varepsilon(x)$. Since $B_\varepsilon(x)$ is
an open set, there exists η > 0 such that
$B_\eta(y) \subset B_\varepsilon(x) \subset A$.
This means that any $y \in B_\varepsilon(x)$ is an interior point of A, i.e., $B_\varepsilon(x) \subset \mathring{A}$.
Show next that Å is the largest open subset of A: If B ⊆ A and B is open,
then B ⊆ Å. Indeed, for every point x ∈ B one finds a ball $B_\varepsilon(x) \subset B \subset A$,
which means x ∈ Å.
(ii) If A = Å, then A is open, because Å is always open by Claim (i). If A is open,
then any x ∈ A is an interior point, which implies A ⊆ Å ⊆ A.
Exercise 1.3.7. Prove Claims (iii) and (iv). Hint: use that A open ⇐⇒ Ac closed.
$\exists \lim_{n \to \infty} x_n := x \in X \implies x \in A$.
Proof. (⟹) Let A be closed and $x := \lim_{n \to \infty} x_n \in X$. Suppose $x \notin A$, i.e.,
$x \in A^c := X \setminus A$, which is an open set. Therefore, $B_\varepsilon(x) \subseteq A^c$ for some ε > 0. By the
definition of convergence,
Exercise 1.3.9. Show that any closed ball $\bar{B}_\varepsilon(x)$ is a closed set. Hint: use Theorem
1.3.8. Indeed, let $(x_n)_{n \in \mathbb{N}} \subset \bar{B}_\varepsilon(x)$ be convergent to some y ∈ X. Then by the triangle
inequality, for any n ∈ N
Intuition: you can control the changes of f : a “small ” change in x away from x0
will not change f (x0 ) too much.
The following is an equivalent definition of continuity.
Definition 1.4.2 (Continuous functions in terms of open balls).
Note that in the above definition $\delta := \delta(\varepsilon, x_0)$, i.e., it depends on ε as well as on the
point $x_0$.
History: This (ε, δ)-definition is due to A.-L. Cauchy (1789-1857), the French mathematician
who was an early pioneer of analysis. He had more than 800 research papers
and became a full professor of École Polytechnique at 28 years old. Cauchy’s inspi-
ration for continuity came from properties of differentiable functions. It was thought
for a very long time that all continuous functions were differentiable (except possibly
on a set of isolated points). Weierstrass proved that this was not true in 1872 when
he gave an example of a continuous function that was differentiable nowhere.
Exercise 1.4.3. Let y ∈ X be fixed. Show that the distance function f defined by
is continuous.
Theorem 1.4.4 (Sequential Characterization of Continuity). The function f : X →
Y is continuous at the point x0 ∈ X if and only if for every sequence (xn )n∈N ⊆ X
converging to x0 , the sequence (f (xn ))n∈N ⊆ Y is convergent to f (x0 ) ∈ Y .
Hence,
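$\rho(f(x_n), f(x_0)) < \varepsilon$ for all n > N(δ).
Therefore, $f(x_n) \to f(x_0)$ as n → ∞.
(⟸) Proof by contradiction: Suppose f is not continuous at $x_0 \in X$. Then ∃ε > 0
such that ∀δ > 0 one finds $x_\delta \in X$ with $d(x_\delta, x_0) < \delta$ but $\rho(f(x_\delta), f(x_0)) \ge \varepsilon$. Now
let $\delta_n = 1/n$ and choose the corresponding sequence
$x_n := x_{\delta_n}$, n ∈ N.
Then $x_n \to x_0$, but $\rho(f(x_n), f(x_0)) \ge \varepsilon$ for all n, so $f(x_n)$ does not converge to $f(x_0)$, a contradiction.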
Theorem 1.4.5 (Composition of Continuous Functions). Let (X, d1 ), (Y, d2 ), and
(Z, d3 ) be metric spaces. Let f : X → Y and g : Y → Z be functions such that f is
continuous at some point x0 ∈ X and g is continuous at y0 := f (x0 ) ∈ Y . Then,
their composition
h := g ◦ f : X → Z
is continuous at x0 .
The next theorem is one of the most important characterisations of continuity and
it should be memorised!
Theorem 1.4.6 (Global Continuity). The function f : X → Y is continuous if and
only if the preimage
$f^{-1}(U) := \{x \in X \mid f(x) \in U\}$
of any open set U ⊆ Y is open in X.
Corollary 1.4.8. The function f : X → Y is continuous ⟺ the preimage $f^{-1}(B)$
of any closed set B ⊆ Y is closed in X.
Exercise 1.4.9. Prove the above corollary using Theorem 1.4.6 and the equality
$[f^{-1}(U)]^c = f^{-1}(U^c)$
for any subset U ⊆ Y .
Warning: A similar statement for images of open (resp. closed) sets is in general
wrong! If V ⊆ X is open (resp. closed) in X, then
f (V ) := {y ∈ Y | y := f (x), x ∈ V }
is not necessarily open (resp. closed) in Y .
Exercise 1.4.10. Find an example of a continuous function f : X → Y and an open
subset U ⊆ X such that f (U ) is not an open subset of Y .
Definition 1.4.11. Let (X, d) and (Y, ρ) be two metric spaces. A function f : X → Y
is uniformly continuous if
$\forall \varepsilon > 0, \ \exists \delta > 0 : \ \rho(f(x_1), f(x_2)) < \varepsilon, \ \forall x_1, x_2 \in X$ with $d(x_1, x_2) < \delta$.
Remark 1.4.12. δ := δ(ε) is independent of points x ∈ X, unlike in the
(ε, δ)-definition of continuity.
The following generalisations of continuity are important but will not be examinable:
Definition 1.4.13. Let $(X, \|\cdot\|_X)$ and $(Y, \|\cdot\|_Y)$ be two normed spaces. A function
f : X → Y is Lipschitz-continuous if there exists a (Lipschitz) constant L > 0
such that
$\|f(x_1) - f(x_2)\|_Y \le L \|x_1 - x_2\|_X$, $\forall x_1, x_2 \in X$.
Lemma 1.4.14. Lipschitz-continuous functions are uniformly continuous.
Proof. Let ε > 0 and set δ := ε/L > 0. Then for any $x_1, x_2 \in X$ with $\|x_1 - x_2\|_X < \delta$
$d_Y(f(x_1), f(x_2)) := \|f(x_1) - f(x_2)\|_Y \le L \|x_1 - x_2\|_X < L\delta = \varepsilon$.
1.5 Cauchy Sequences, Complete Metric Spaces
Definition 1.5.1. Let (X, d) be a metric space. A sequence (xn )n∈N ⊆ X is a
Cauchy sequence if
Intuition: A sequence is Cauchy if its terms get closer and closer to each other.
Theorem 1.5.2. Every convergent sequence in (X, d) is Cauchy.
T: X →X
Exercise 1.6.2. Show that every contraction is a uniformly continuous function.
The next theorem is possibly the most important result in this course and it should be
memorised! It is used throughout analysis, the study of differential equations, matrix
analysis, game theory, dynamical systems and many other areas of mathematics and
economics.
Theorem 1.6.3 (Banach Contraction Mapping Theorem). Let (X, d) be a complete
metric space and T : X → X be a contraction of modulus β ∈ (0, 1). Then:
T x = x.
(ii) How to construct x∗ : For any starting point x0 ∈ X, the sequence (xn )n∈N
defined by
x1 := T x0 , x2 := T x1 , . . . , xn+1 := T xn , . . . (∗)
converges to x∗ .
Proof. It is useful to prove the existence of the fixed point and its uniqueness sepa-
rately.
(ia) Existence: Take any x0 ∈ X and define (xn )n∈N as in (∗). We show that this
sequence is Cauchy:
(ib) Uniqueness: Let x, y ∈ X be two fixed points. T is a contraction with β <
1 =⇒
(ii) As was shown in (ia), any sequence of the form (∗) converges to some fixed point
x∗ (possibly depending on the initial point x0 ). But according to (ib), the fixed
point is unique, which means that each approximating sequence (xn )n∈N has the
limit x∗ , which is the same for all x0 ∈ X.
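The iteration (∗) translates directly into a short program. The sketch below is a minimal illustration for real-valued maps only; the function name `iterate_to_fixed_point`, the stopping rule based on successive iterates, and the sample map T(x) = x/2 + 1 (a contraction of modulus 1/2 with fixed point 2) are assumptions made for this example, not part of the theorem.

```python
def iterate_to_fixed_point(T, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = T(x_n) until successive iterates differ by less than tol.

    Returns the last iterate and the number of steps taken.
    """
    x = x0
    for n in range(1, max_iter + 1):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

# Hypothetical example: T(x) = x/2 + 1 is a contraction on R with modulus 1/2
# and unique fixed point x* = 2.
T = lambda x: 0.5 * x + 1.0
limit_a, steps_a = iterate_to_fixed_point(T, x0=-100.0)
limit_b, steps_b = iterate_to_fixed_point(T, x0=50.0)
```

Both starting points lead to the same limit, in line with part (ii) of the theorem: the limit does not depend on the initial point.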
Exercise 1.6.4. Let (X, d) be a complete metric space, and let T : X → X be such
that, for some n ∈ N, the operator $T^n$ is a contraction. Show that T has a unique
fixed point.
Hint: (i) Prove that $T^n$ has a unique fixed point, say $x^*$. (ii) Check that this $x^*$ is
also the unique fixed point for T.
The Banach fixed point theorem is one of the most important theorems in all of
mathematics! Many problems can be formulated as equations F (x) = 0 and can be
rewritten in the form f (x) = x with f (x) := F (x) + x.
Standard applications of the Banach fixed point theorem include:
— The Picard-Lindelöf theorem about unique solvability of ordinary differential equa-
tions.
— The PageRank algorithm used by Google: One computes a fixed point of a linear
operator in $\mathbb{R}^N$ (with huge N), which is a contraction. This fixed point $x^* \in \mathbb{R}^N$
gives the ordering of pages.
— Image compression: Fractal image compression encodes an image as the fixed point
of a contraction, so decoding relies on the Banach fixed point theorem.
Theorem 1.6.5 (Continuous Dependence of the Fixed Point on Parameters). Let
(X, d) and (Ω, ρ) be metric spaces, and T (x, ω) be a mapping X × Ω → X. Further-
more, let (X, d) be complete. Suppose that for each x ∈ X
$\Omega \ni \omega \to T(x, \omega) \in X$ is continuous,
Then the solution x∗ (ω) ∈ X of the fixed point problem
T (x, ω) = x
or
$d(x_n^*, x^*) \le \frac{1}{1 - \beta} \, d(T(x^*, \omega), T(x^*, \omega_n))$.
Since $T(x^*, \omega)$ is continuous in ω, the right-hand side tends to zero as $\omega_n \to \omega$. Thus,
$d(x_n^*, x^*) \to 0$ as n → ∞.
Exercise 1.6.6. Let (X, d) be a complete metric space. Suppose we have two con-
tractions A : X → X and B : X → X such that
d(Ax, Ay) ≤ αd(x, y), d(Bx, By) ≤ βd(x, y), with α, β ∈ (0, 1).
Exercise 1.6.7. On a calculator, if you enter any number x0 and then apply the
cos function to x0 , you get a new number x1 := cos x0 . If we iterate this process
$x_{n+1} := \cos x_n$, you’ll find that the sequence of values $x_n$ converges to some other value
$x_\infty \approx 0.739085133\ldots$. Show that the function g : [0, 1] → [0, 1] given by g(x) = cos x
is a contraction (you might need to use the Mean Value Theorem). Interpret this
property of the cosine function in terms of the Banach fixed point theorem. Why does
this justify the method of iterating the cos function on a calculator to show that the
sequence xn converges? How many roots does the function f (x) = cos x − x have in
the interval [0, 1]?
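The calculator experiment of Exercise 1.6.7 can be reproduced in a few lines. This is only a numerical illustration and does not replace the contraction argument asked for in the exercise; the starting value 1.0 and the fixed number of 200 iterations are arbitrary choices made here.

```python
import math

x = 1.0  # any starting value in [0, 1]
for _ in range(200):
    x = math.cos(x)  # x_{n+1} := cos(x_n)

# How far x is from satisfying cos(x) = x.
residual = abs(math.cos(x) - x)
```

After the loop, `x` agrees with the value 0.739085133... quoted in the exercise to many digits, and `residual` is at the level of floating-point rounding.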
$\emptyset \ne A_n = \bar{A}_n \subseteq X$, $A_{n+1} \subseteq A_n$, n ∈ N,
Proof. (⟹) Suppose X is complete and consider any decreasing sequence $(A_n)_{n \in \mathbb{N}}$ of subsets of
X as described above. Since $A_n \ne \emptyset$, there exists $x_n \in A_n$. Obviously, $x_m \in A_m \subseteq A_n$ for
all m ≥ n, and hence
$d(x_n, x_m) \le \operatorname{diam} A_n$, m ≥ n.
So, $(x_n)_{n \in \mathbb{N}}$ is a Cauchy sequence as $\operatorname{diam} A_n \to 0$. Since X is complete,
$\exists \lim_{n \to \infty} x_n =: x \in X$.
We claim that $x \in \bigcap_{n \in \mathbb{N}} A_n$. Indeed, $x_m \in A_n$ for all m ≥ n and $A_n$ is closed, thus
by the characterisation of closed sets (Theorem 1.3.8), $x = \lim_{m \to \infty,\, m \ge n} x_m \in A_n$.
So, $x \in \bigcap_{n \in \mathbb{N}} A_n$.
1 Georg Cantor (1845–1918), German mathematician, inventor of set theory
Finally, we observe that $\bigcap_{n \in \mathbb{N}} A_n$ consists only of the single point x. If $y \ne x$ is
some other point from $\bigcap_{n \in \mathbb{N}} A_n$, then
$d(x, y) \le \operatorname{diam} A_n \to 0$, n → ∞,
which yields x = y.
(⇐=) This is left as an optional exercise. It is not so trivial. Argue by contradiction.
In other words Definition 1.8.3 says that for each x ∈ X there exists a subsequence
$(x_{n_m})_{m \in \mathbb{N}}$ of $(x_n)_{n \in \mathbb{N}}$ such that
$x = \lim_{m \to \infty} x_{n_m}$,
1.9 Basic Examples revisited
(i) Euclidean space $(\mathbb{R}^n, |\cdot|)$ of vectors $x = (x_i)_{i=1}^{n}$ with metric
$d(x, y) = |x - y| := \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$.
This is a Polish (normed) space with the dense set $\mathbb{Q}^n$ (consisting of vectors
with rational components $x_i \in \mathbb{Q}$).
(ii) Manhattan (or taxicab, or city-block, or $l^1$-) metric (norm) on $\mathbb{R}^n$ (in particular,
n = 2):
$d(x, y) = \|x - y\|_{l^1} := \sum_{i=1}^{n} |x_i - y_i|$.
(We cannot cut corners; we have to walk along the streets.) Again this is a separable
Banach space.
(iii) British rail (or post) metric on $\mathbb{R}^n$ (with centre 0:=London):
$d(x, y) = 0$ if x = y, and $d(x, y) = |x| + |y|$ if $x \ne y$.
Exercise: Show that Euclidean space with the British rail metric is complete
but not separable. (Hint: recall that for every $x \ne 0$, the set {x} is open).
(iii) French Metro metric on $\mathbb{R}^n$ (with centre 0:=Paris):
$d(x, y) = |x - y|$ if x = cy for some c ∈ R, and $d(x, y) = |x| + |y|$ otherwise.
Exercise: Show that Euclidean space with the French Metro metric is not
separable. (Hint: think in polar coordinates)
(iv) C([a, b]) – Banach space of all continuous functions on a bounded interval
[a, b] with the maximum norm
For completeness of C([a, b]) see Lemma 1.9.3 below. This space is separable:
for instance, Ā = C([a, b]), where A is the set of all polynomials with rational
coefficients;
$P(t) := a_0 t^N + a_1 t^{N-1} + \cdots + a_{N-1} t + a_N$, $a_i \in \mathbb{Q}$, N ∈ N.
(v) $\mathbb{R}^\infty$ – space of all real sequences $x = (x_i)_{i \ge 1}$ with $x_i \in \mathbb{R}$. The metric (which
cannot be generated by any norm) is given by
$d(x, y) := \sum_{i=1}^{\infty} \frac{1}{2^i} \, \frac{|x_i - y_i|}{1 + |x_i - y_i|}$.
This is a Polish space; a countable dense set A consists e.g. of all finite sequences
with rational coefficients $(q_1, q_2, \ldots, q_N, 0, 0, \ldots)$, $q_i \in \mathbb{Q}$, N ∈ N. For a sequence
$(x_n)_{n \in \mathbb{N}} \subset \mathbb{R}^\infty$ with $x_n = (x_{n,i})_{i \ge 1} = (x_{n,1}, x_{n,2}, \ldots, x_{n,i}, \ldots)$,
$d(x_n, x) \to 0$ as n → ∞ ⟺ $x_{n,i} \to x_i$ as n → ∞, ∀i ∈ N,
uniformly w.r.t. i ∈ N,
i.e., $\sup_{i \in \mathbb{N}} |x_{n,i} - x_i| \to 0$ as n → ∞.
Warning! This space is complete but not separable! Define its subset
This subset is not countable (its cardinality is that of the continuum). But
d(x, y) = 1 for all x, y ∈ B, $x \ne y$. If there exists a set A that is dense in $l^\infty$,
then in each of the balls $B_{1/2}(y)$, y ∈ B, there should be at least one point
x ∈ A. Such balls do not intersect, which means that A is also uncountable.
This contradicts the assumption that l∞ is separable.
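(vii) Polish spaces of p-summable sequences $l^p$, 1 ≤ p < ∞,
$l^p := \left\{ x = (x_i)_{i \ge 1} \in \mathbb{R}^\infty \;\middle|\; \sum_{i=1}^{\infty} |x_i|^p < \infty \right\}$
with the norm $\|x\|_p := \left( \sum_{i=1}^{\infty} |x_i|^p \right)^{1/p}$ and the metric $d(x, y) := \|x - y\|_p = \left( \sum_{i=1}^{\infty} |x_i - y_i|^p \right)^{1/p}$.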
Exercise 1.9.1. (not trivial) Check completeness of lp .
(viii) $L^p([a, b])$ – Banach space of all Lebesgue p-integrable functions, 1 ≤ p < ∞,
with
$d(x, y) = \|x - y\|_{L^p} := \left( \int_a^b |x(t) - y(t)|^p \, dt \right)^{1/p}$.
(ix) $C^k([a, b])$ for k = 1, 2, ..., Banach space of k-times continuously differentiable
functions x : [a, b] → R, with the norm
$\|x\|_{C^k} := \sum_{i=0}^{k} \max_{t \in [a, b]} |x^{(i)}(t)|$, $x^{(0)}(t) := x(t)$.
Again, a countable dense set in $C^k([a, b])$ is formed by the polynomials with rational coefficients.
Compare:
(b) Cb ([a, b]) with the max-norm (since max f = sup f on a bounded closed interval
[a, b]).
Lemma 1.9.3. Let a sequence (fn )n∈N ⊂ Cb (X) converge uniformly to some func-
tion f : X → R. Then certainly f ∈ Cb (X).
$|f(x) - f(y)| \le |f(x) - f_N(x)| + |f_N(x) - f_N(y)| + |f_N(y) - f(y)| < \varepsilon$,
Proof. Let $(f_n)_{n \in \mathbb{N}} \subset C_b(X)$ be Cauchy w.r.t. $\|\cdot\|_\infty$. Recall that any Cauchy sequence
is bounded, i.e.,
$\sup_{n \ge 1} \|f_n\|_\infty =: C < \infty$.
Furthermore, for each x ∈ X
i.e., $\|f\|_\infty \le C$.
$\forall \varepsilon > 0, \ \exists N(\varepsilon) \in \mathbb{N} : \ |f_n(x) - f_m(x)| < \varepsilon/2, \ \forall n, m > N(\varepsilon), \ \forall x \in X$.
Thus, for each fixed n > N(ε) and x ∈ X, in the above estimate we can pass to the
limit $f_m(x) \to f(x)$ as m → ∞ and get
So, $f_n \to f$ pointwise, but $f \notin C[0, 1]$. This immediately tells us that $f_n$ cannot
converge uniformly on [0, 1], otherwise by Lemma 1.9.3 we should have f ∈ C[0, 1].
(i) L is uniformly continuous on X;
(ii) L is continuous on X;
(iii) L is continuous only at 0 ∈ X (or at some x ∈ X);
(iv) L is bounded, in the sense it has the bounded operator norm
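Then ∀x ∈ X, $\|x\|_X \le 1$,
$\left\| \frac{\delta}{2} x \right\|_X \le \frac{\delta}{2} < \delta \implies \|Lx\|_Y = \left\| L\left( \frac{2}{\delta} \cdot \frac{\delta}{2} x \right) \right\|_Y = \frac{2}{\delta} \left\| L\left( \frac{\delta}{2} x \right) \right\|_Y < \frac{2}{\delta}$.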
The definition of uniform continuity follows with any ε > 0 and δ = ε/L > 0.
Actually, by taking $f_0(t) \equiv 1$ with $\|f_0\|_\infty = 1$, we see that $I(f_0) = (b - a)$ and hence
$\|I\| = (b - a)$.
Example 1.10.3. Let $X = C^1([0, 1])$ with the same sup-norm $\|f\|_\infty$.
Define $D : C^1([0, 1]) \to C([0, 1])$ by
$Df := f'$ (derivative).
Proof. Take
$f_n(t) := t^n$, t ∈ [0, 1].
Obviously, $f_n \in C^1([0, 1])$ and $\|f_n\|_\infty = 1$. But
and
$\|Df_n\|_\infty = n$.
So,
$\|D\| := \sup_{f \in C^1,\ \|f\|_\infty \le 1} \|Df\|_\infty \ge \sup_n \|Df_n\|_\infty = \infty$.
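The unboundedness of the differentiation operator in Example 1.10.3 can be observed numerically. The sketch below approximates the sup-norm on a uniform grid; the helper `sup_norm`, the grid size and the sample exponents are assumptions chosen for illustration, and the derivative of t^n is entered by hand rather than computed.

```python
def sup_norm(f, grid_points=10_001):
    """Approximate max |f(t)| over [0, 1] on a uniform grid that contains t = 1."""
    return max(abs(f(k / (grid_points - 1))) for k in range(grid_points))

norms = {}
for n in (1, 5, 25):
    f_n = lambda t, n=n: t ** n               # f_n(t) = t^n
    df_n = lambda t, n=n: n * t ** (n - 1)    # (D f_n)(t) = n t^(n-1)
    norms[n] = (sup_norm(f_n), sup_norm(df_n))
```

Each entry of `norms` pairs the norm of f_n (always 1) with the norm of its derivative (equal to n), so the ratio grows without bound as n increases.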
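$\|L\| := \sup_{z \ne 0} \frac{\|Lz\|_Y}{\|z\|_X}$. (∗∗∗)
We claim all three definitions are equivalent.
Indeed, by the linearity of L
$\sup_{z \ne 0} \frac{\|Lz\|_Y}{\|z\|_X} = \sup_{z \ne 0} \frac{\left\| L\left( \|z\|_X \cdot \frac{z}{\|z\|_X} \right) \right\|_Y}{\left\| \|z\|_X \cdot \frac{z}{\|z\|_X} \right\|_X} = \sup_{z \ne 0} \frac{\|z\|_X \cdot \left\| L \frac{z}{\|z\|_X} \right\|_Y}{\|z\|_X \cdot \left\| \frac{z}{\|z\|_X} \right\|_X}$
$= \sup_{\|y\|_X = 1} \frac{\|Ly\|_Y}{\|y\|_X} = \sup_{\|y\|_X = 1} \|Ly\|_Y \le \sup_{0 \le \|x\|_X \le 1} \|Lx\|_Y = \sup_{0 < \|x\|_X \le 1} \|Lx\|_Y$
(since L0 = 0). Note that for each x ∈ X with $0 < \|x\|_X \le 1$
$\|Lx\|_Y = \left\| L\left( \|x\|_X \cdot \frac{x}{\|x\|_X} \right) \right\|_Y = \underbrace{\|x\|_X}_{\le 1} \left\| L \frac{x}{\|x\|_X} \right\|_Y \le \left\| L \frac{x}{\|x\|_X} \right\|_Y = \|Ly\|_Y$,
where
$y := \frac{x}{\|x\|_X} \in X$ obeys $\|y\|_X = 1$.
Hence,
$\sup_{\|y\| = 1} \|Ly\|_Y \le \sup_{0 < \|x\|_X \le 1} \|Lx\|_Y \le \sup_{\|y\| = 1} \|Ly\|_Y$.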
(ii) A is compact if every open cover $(U_i)_{i \in I}$ of A has a finite subcover
$A \subseteq \bigcup_{1 \le k \le N} U_{i_k} = U_{i_1} \cup \cdots \cup U_{i_N}$.
Example 1.11.2. Let $(x_n)_{n \ge 1}$ be a convergent sequence in X with $\lim_{n \to \infty} x_n = x$. Then
is compact.
As $U_{i_0}$ is open, we can find ε > 0 such that $x \in B_\varepsilon(x) \subseteq U_{i_0}$. As $x_n \to x$, there exists
$N_0 \in \mathbb{N}$ such that
$x_n \in B_\varepsilon(x) \subseteq U_{i_0}$ for all $n > N_0$.
Now choose some sets from the open cover such that $U_{i_n} \ni x_n$, $1 \le n \le N_0$. Then
$A \subseteq \bigcup_{0 \le k \le N_0} U_{i_k} = U_{i_0} \cup U_{i_1} \cup \cdots \cup U_{i_{N_0}}$.
Warning: Without the limit point {x} the argument would not work! Indeed, see
the counterexample below:
Exercise 1.11.3. Show that the set A = {1/n}n≥1 is not compact in R.
Hint: Consider e.g. the following open cover $\bigcup_{n \in \mathbb{N}} U_n \supseteq A$:
$U_1 = \left( \frac{1}{2}, 2 \right) \ni 1$, $U_n = \left( \frac{1}{n+1}, \frac{1}{n-1} \right) \ni 1/n$, n ≥ 2.
Theorem 1.11.4. Any compact set is closed and bounded.
Thus,
$A \subseteq U_K = \left[ \bar{B}_{1/K}(x) \right]^c$ with $K = \max\{n_1, \ldots, n_N\}$,
or $B_{1/K}(x) \subseteq \bar{B}_{1/K}(x) \subseteq A^c$.
By compactness,
$A \subseteq B_{n_1}(x) \cup \cdots \cup B_{n_N}(x)$.
Then
$A \subseteq B_K(x)$ with $K = \max\{n_1, \ldots, n_N\}$,
which means that A is bounded.
Proof. By our assumption Ac := X\A is open. Let (Ui )i∈I be an open cover of A.
Then (Ui )i∈I and Ac together constitute an open cover of K. Since K is compact,
there exists a finite subcover
K ⊆ Ui1 ∪ · · · ∪ UiN ∪ Ac .
The next theorem is a famous and very important result which allows one to more
easily determine exactly when a subset of Euclidean space is compact. It should be
memorised!
Theorem 1.11.6 (Heine-Borel). Let A ⊆ Rn . Then
Warning: This theorem (more precisely, its sufficient part) holds only in $\mathbb{R}^n$ or in
finite dimensional spaces, see Riesz Theorem below.
Definition 1.11.7. A set A ⊆ X is called sequentially compact if every sequence
$(x_n)_{n \ge 1} \subseteq A$ has a convergent subsequence $(x_{n_k})_{k \ge 1}$ whose limit belongs to A:
$\exists \lim_{k \to \infty} x_{n_k} =: x \in A$.
In other words, every $(x_n)_{n \ge 1} \subseteq A$ has at least one cluster point x ∈ A.
Definition 1.11.8. A set A ⊆ X is totally bounded if for any ε > 0 it can be
covered by a finite family of balls
The total boundedness is much stronger than the usual boundedness, but in $\mathbb{R}^n$
they are equivalent! (Since ∀R, ε > 0: $B_R(0) \subset \bigcup_{k=1}^{N} B_\varepsilon(x_k)$ with proper
N = N(R, ε) ∈ N and $x_k \in \mathbb{R}^n$, 1 ≤ k ≤ N.)
Theorem 1.11.10 (Characterisation of Compact Sets). Let (X, d) be a metric space.
For any A ⊆ X, the following claims are equivalent:
(i) A is compact;
(ii) A is sequentially compact;
(iii) (A, d) is complete and A is totally bounded in X.
Remark 1.11.11. If (X, d) is complete, then for each A ⊆ X
where we denote $\varepsilon_1 := \varepsilon(y_1) > 0, \ldots, \varepsilon_N := \varepsilon(y_N) > 0$. Thus, A contains only finitely
many terms of the sequence $(x_n)_{n \ge 1}$, i.e., $x_n \notin A$ for all n larger than some $N_0 \in \mathbb{N}$.
This contradicts the initial assumption that $(x_n)_{n \ge 1} \subseteq A$.
Related Claim: Any sequentially compact set A is closed (compare with Theorem
1.11.4 saying that any compact set is closed).
Let us prove this by contradiction. Suppose that A is not closed, then by Theorem
1.3.8 there exists $(x_n)_{n \ge 1} \subseteq A$ such that
$x_n \to x \in A^c$ as n → ∞.
Then any subsequence $(x_{n_k})_{k \ge 1}$ also converges to this x. This contradicts the
sequential compactness of A, which requires some subsequence to have its limit in A.
(ii) =⇒ (iii) (a) We show that the metric space (A, d) is complete. Take any
Cauchy sequence (xn )n≥1 ⊆ A. By sequential compactness, there exists a subsequence
(xnk )k≥1 that converges to some x ∈ A. So, (xn )n≥1 has a cluster point in A. But
(xn )n≥1 is Cauchy, and from the very definition (see Exercise 1.11.12 below) any
Cauchy sequence can have at most one cluster point which (provided it exists) will
also be its limit point. This means that (xn )n≥1 is convergent to x ∈ A.
(b) Let us show that A is totally bounded. Suppose not, i.e., for some ε > 0
we cannot find a finite ε-net for A. Take any $x_1 \in A$ and let $U_1 := B_\varepsilon(x_1)$. By
assumption $\exists x_2 \in A$ with $x_2 \notin U_1$. Let $U_2 := B_\varepsilon(x_2)$, then $\{U_1, U_2\}$ is still not a
cover of A, which implies $\exists x_3 \in A$ with $x_3 \notin U_1 \cup U_2$. Put $U_3 := B_\varepsilon(x_3)$, and so on...
Consider the sequence $(x_n)_{n \ge 1}$; by construction $d(x_n, x_m) \ge \varepsilon$ for all $n \ne m$. Clearly,
no subsequence of $(x_n)_{n \ge 1}$ is a Cauchy sequence, and hence $(x_n)_{n \ge 1}$ cannot contain a convergent
subsequence $(x_{n_k})_{k \ge 1}$, which contradicts the sequential compactness of A.
Exercise 1.11.12 (see also Tutorials 3). Every Cauchy sequence (xn )n≥1 ⊆ X has
at most one cluster point. If such cluster point exists, it would also be the limit
point.
Corollary 1.11.13 (already stated as Theorem 1.11.6, Heine-Borel). Let X = Rn .
Then
A ⊆ Rn is compact ⇐⇒ A is closed and bounded.
The dimension N = dim X < ∞ is the smallest number N ∈ N such that there
exists a basis of vectors $e_1, \ldots, e_N \in X$ allowing the representation (as a linear combination)
$x = \sum_{i=1}^{N} \alpha_i e_i$, $\alpha_i \in \mathbb{R}$,
for all x ∈ X.
Some examples of infinite dimensional spaces include $l^p$, $L^p$ with 1 ≤ p ≤ ∞; C([0, 1]).
For instance, the closed unit ball $\bar{B}_1(0)$ in $l^p$ is not (sequentially) compact. This ball
contains a sequence of basis vectors
$e_n := (\underbrace{0, \ldots, 0}_{n-1}, 1, 0, 0, \ldots) = (\delta_{n,i})_{i=1}^{\infty}$, n ∈ N,
so that
$\|e_n\|_{l^p} = 1$, $\|e_n - e_m\|_{l^p} = \sqrt[p]{2} > 0$, $n \ne m$,
and hence there are no convergent subsequences in $(e_n)_{n=1}^{\infty}$.
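The distance computation for the basis vectors can be checked on finite truncations. In this sketch the exponent p = 3, the truncation length 10 and the helper names `lp_norm` and `basis_vector` are assumptions made for illustration; only finitely supported sequences are represented.

```python
def lp_norm(x, p):
    """l^p norm of a finitely supported sequence given as a list."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def basis_vector(n, length):
    """e_n = (0, ..., 0, 1, 0, ...) truncated to `length` coordinates (1-based n)."""
    return [1.0 if i == n else 0.0 for i in range(1, length + 1)]

p = 3.0
e = [basis_vector(n, 10) for n in range(1, 11)]

# Set of all pairwise distances ||e_n - e_m||_p for n != m.
distances = {lp_norm([a - b for a, b in zip(e[i], e[j])], p)
             for i in range(10) for j in range(10) if i != j}
```

All pairwise distances coincide with the p-th root of 2, so no subsequence of the basis vectors can be Cauchy.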
Proof. We use sequential compactness. Let (yn )n≥1 ⊆ f (K), which means that yn =
f (xn ) for some xn ∈ K. By sequential compactness of K
$\exists \lim_{k \to \infty} x_{n_k} =: x \in K$.
Proof. Let ε > 0 be arbitrary. Due to continuity of f, for every x ∈ K we can choose
δ(x) > 0 such that
$\rho(f(y), f(x)) < \varepsilon/2$, $\forall y \in B_{\delta(x)}(x)$. (∗)
The family $\{B_{\delta(x)/2}(x)\}_{x \in K}$ constitutes a (trivial) open cover of K. By compactness
of K, it holds that
$K \subseteq B_{\delta_1/2}(x_1) \cup \ldots \cup B_{\delta_N/2}(x_N)$, (∗∗)
for some N ∈ N and $\delta_1 := \delta(x_1) > 0, \ldots, \delta_N := \delta(x_N) > 0$. Set $\delta := \min\{\delta_1, \ldots, \delta_N\} > 0$.
Now let x, y ∈ K with d(x, y) < δ/2. Because of (∗∗), there exists some $x_i$ with
1 ≤ i ≤ N, such that $d(x_i, x) < \delta_i/2$. Moreover,
$f(x_{\max}) = \sup_{x \in K} f(x) = \max_{x \in K} f(x)$, $f(x_{\min}) = \inf_{x \in K} f(x) = \min_{x \in K} f(x)$.
We are now in a position to state what an optimisation problem is. The goal of this
course will be to study when such problems have solutions and, if they do, how many
solutions there are and what their values are.
Optimization Problems for f : K → R
f – objective function;
K – constraint set.
Maximize (or minimize) f(x) subject to x ∈ K.
Notation:
max{f (x) | x ∈ K}, min{f (x) | x ∈ K}.
The Weierstrass extreme value theorem is a powerful tool. But it says nothing about
how to find these extrema. Concrete (e.g., numerical) ways to do this will be the
subject of Part 3 of this course.
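To make the distinction between existence and computation concrete, here is a crude sketch of a grid search on a compact interval. It is not one of the methods of Part 3; the objective f(x) = x·e^{−x}, the set K = [0, 3] and the helper `grid_extrema` are hypothetical choices made only for illustration.

```python
import math

def grid_extrema(f, a, b, n=10_000):
    """Approximate min and max of f on [a, b] by evaluating f on n + 1
    equally spaced points. Returns ((x_min, f_min), (x_max, f_max))."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    x_min = min(xs, key=f)
    x_max = max(xs, key=f)
    return (x_min, f(x_min)), (x_max, f(x_max))

# Hypothetical objective on the compact set K = [0, 3].
f = lambda x: x * math.exp(-x)
(x_min, f_min), (x_max, f_max) = grid_extrema(f, 0.0, 3.0)
# The exact maximiser is x = 1 with f(1) = 1/e; the minimiser is x = 0.
```

For a continuous f the grid values approach the true extrema as n grows, but the number of grid points needed explodes with the dimension, which is one reason more refined methods are required.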
Problem 1.12.4. Let (X, d) be a metric space with nonempty subset K. Given some
x ∉ K, find the “closest” element to x in K.
Proposition 1.12.5. Let K be a compact set. Then for every x ∉ K there exists
y0 ∈ K (not necessarily unique) such that
d(x, y0) = inf{d(x, y) | y ∈ K} =: d(x, K).
Proof. Consider the function f(y) := d(x, y), y ∈ K.
Note that |f (y) − f (z)| ≤ d(y, z) for any y, z ∈ K, thus f is (Lipschitz) continuous
on K. But K is compact which implies that f attains its min on K. Hence, ∃y0 ∈ K
such that
f (y0 ) = d(x, y0 ) = inf{d(x, y) | y ∈ K} = d(x, K).
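A minimal sketch of this statement for a finite (hence compact) set in R² with the Euclidean metric; the set K, the point x and the helper `closest_points` are hypothetical. It also shows that the closest point need not be unique and checks the Lipschitz estimate used in the proof.

```python
import math

def closest_points(x, K, tol=1e-12):
    """Return d(x, K) and all points of the finite set K attaining it."""
    dist = min(math.dist(x, y) for y in K)
    return dist, [y for y in K if math.dist(x, y) <= dist + tol]

# Hypothetical finite (hence compact) set K in R^2 and a point x outside it.
K = [(-1.0, 0.0), (1.0, 0.0), (0.0, 3.0)]
d, minimisers = closest_points((0.0, 0.0), K)
# d == 1.0 and both (-1, 0) and (1, 0) are closest: y0 need not be unique.

# The function f(y) = d(x, y) is 1-Lipschitz: |f(y) - f(z)| <= d(y, z).
x = (0.0, 0.0)
assert all(abs(math.dist(x, y) - math.dist(x, z)) <= math.dist(y, z) + 1e-12
           for y in K for z in K)
```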
Problem 1.12.6. Let (X, ‖·‖) be a normed space and let L ⊆ X be a linear subspace
generated by a finite system of vectors {e1, . . . , eN}, N ∈ N, i.e.,
\[ L := \Big\{ y \in X \;\Big|\; y = \sum_{i=1}^{N} \alpha_i e_i \text{ with } \alpha_1, \dots, \alpha_N \in \mathbb{R} \Big\}, \qquad \dim L \le N. \]
Given some x ∉ L, find in L ⊂ X the “best” approximation of x.
Proposition 1.12.7. For every x ∉ L there exists y0 ∈ L such that
‖x − y0‖ = inf{‖x − y‖ | y ∈ L}.
inf{‖x − y‖ | y ∈ L} =: δ > 0.
The sequence {yn}n∈N is bounded, more precisely ‖yn‖ ≤ ‖x‖ + ‖x − yn‖ < ‖x‖ +
δ + 1 =: R. The closed ball B_R(0) in the (finite dimensional) space (L, ‖·‖) is a
compact set, which implies that ∃ ynk → y0 ∈ B_R(0) ⊆ L as k → ∞. Passing to the limit in
(∗) as k → ∞, we conclude that ‖x − y0‖ = δ.
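For a one-dimensional subspace L = span{e} the best approximation can be computed directly, because α ↦ ‖x − αe‖ is convex. The sketch below assumes the sup norm on R² and uses ternary search (a technique chosen here for illustration, not taken from the notes); the names `sup_norm` and `best_approximation` are hypothetical. As in the proof, the search is confined to a bounded, hence compact, set of candidates.

```python
def sup_norm(v):
    return max(abs(t) for t in v)

def best_approximation(x, e, norm, iters=200):
    """Minimise g(alpha) = norm(x - alpha * e) over alpha by ternary search.
    g is convex, and a minimiser satisfies |alpha| <= 2 * norm(x) / norm(e),
    so the search can be restricted to that compact interval."""
    g = lambda alpha: norm([xi - alpha * ei for xi, ei in zip(x, e)])
    bound = 2 * norm(x) / norm(e)
    lo, hi = -bound, bound
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) <= g(m2):
            hi = m2
        else:
            lo = m1
    alpha = (lo + hi) / 2
    return alpha, g(alpha)

# Hypothetical data: x = (1, 0), L = span{(1, 1)} in (R^2, sup norm).
alpha, delta = best_approximation([1.0, 0.0], [1.0, 1.0], sup_norm)
# Here g(alpha) = max(|1 - alpha|, |alpha|), minimised at alpha = 1/2.
```

The bound on α comes from the same triangle-inequality argument as in the proof: at a minimiser ‖x − αe‖ ≤ ‖x‖, so ‖αe‖ ≤ 2‖x‖.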
(i) Given a set X ≠ ∅, the metrics d1 and d2 are (topologically) equivalent if for
any x ∈ X and (xn)n≥1 ⊆ X, the sequence xn → x in (X, d1) if and only if
xn → x in (X, d2).
(ii) Given a linear space X, the norms ‖·‖1 and ‖·‖2 are equivalent if the metrics
generated by ‖·‖1 and ‖·‖2 are equivalent.
Theorem 1.13.2 (Equivalence of all norms in Rn). Let ‖·‖ be any norm on Rn.
Then there exist constants m, M ∈ (0, ∞) such that
m|x| ≤ ‖x‖ ≤ M|x|, ∀x ∈ Rn.
Proof. Note that the norm function (Rn, ‖·‖) ∋ x ↦ ‖x‖ ∈ R is always continuous.
But we claim that
(Rn, |·|) ∋ x ↦ f(x) := ‖x‖ ∈ R
is also continuous (although we now consider another norm on Rn !). Indeed, each
vector x = (x1 , . . . , xn ) ∈ Rn is uniquely represented as a linear combination
x = x1 e1 + · · · + xn en ,
where
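\[ e_i = (0, \dots, 0, \underbrace{1}_{i}, 0, \dots, 0), \qquad 1 \le i \le n, \]
and hence, for all x, y ∈ Rn,
\[ |f(x) - f(y)| \le \|x - y\| = \Big\| \sum_{i=1}^{n} (x_i - y_i) e_i \Big\| \le \sum_{i=1}^{n} |x_i - y_i| \cdot \|e_i\| \underbrace{\le}_{\text{Cauchy inequ.}} \Big( \sum_{i=1}^{n} |x_i - y_i|^2 \Big)^{1/2} \Big( \sum_{i=1}^{n} \|e_i\|^2 \Big)^{1/2} \]
\[ = C\,|x - y|_{\mathbb{R}^n}, \qquad \text{with } C := \Big( \sum_{i=1}^{n} \|e_i\|^2 \Big)^{1/2} < \infty, \]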
which means the uniform continuity of f . By Theorem 1.12.3 x → f (x) achieves its
maximum M and minimum m on the unit sphere
S1 (0) := {x ∈ Rn | |x|Rn = 1} ,
which is a compact set in (Rn, |·|Rn). It is easy to see that m > 0 (since f(xmin) =
‖xmin‖ = 0 implies xmin = 0 ∉ S1(0)).
Consider now any x ≠ 0; then |x|Rn =: α > 0 and x/α ∈ S1(0), so that
m ≤ ‖x/α‖ ≤ M.
Thus,
αm ≤ ‖x‖ ≤ αM, if α := |x|Rn > 0, or
m|x| ≤ ‖x‖ ≤ M|x|, ∀x ∈ Rn.
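The constants m and M from the proof can be approximated by sampling the Euclidean unit sphere. The following sketch assumes n = 2 and takes ‖·‖ to be the l_1 norm (an illustrative choice); the helper `equivalence_constants` is hypothetical.

```python
import math

def norm_1(v):
    """The l_1 norm on R^2 (an illustrative choice of the norm ||.||)."""
    return abs(v[0]) + abs(v[1])

def equivalence_constants(norm, samples=3600):
    """Approximate m = min and M = max of `norm` over the Euclidean unit
    circle by sampling it at equally spaced angles."""
    values = [norm((math.cos(2 * math.pi * k / samples),
                    math.sin(2 * math.pi * k / samples)))
              for k in range(samples)]
    return min(values), max(values)

m, M = equivalence_constants(norm_1)
# For the l_1 norm one expects m = 1 (on the axes) and M = sqrt(2) (diagonals).

x = (3.0, -4.0)                      # |x| = 5, ||x||_1 = 7
assert m * math.hypot(*x) <= norm_1(x) <= M * math.hypot(*x)
```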
Proof. (⇐=) is obvious.
(=⇒) If ‖·‖1 and ‖·‖2 are (topologically) equivalent, the embedding operators
(X, ‖·‖1) ∋ x ↦ I1 x := x ∈ (X, ‖·‖2),
(X, ‖·‖2) ∋ x ↦ I2 x := x ∈ (X, ‖·‖1),
Equivalent metrics preserve continuity of functions and generate the same system
of open sets.
Definition 1.13.5. Let (X, ‖·‖) be a separable Banach (i.e., complete normed) space.
A sequence (en)n≤∞ (finite or countable) is called a Schauder basis of X if every
element x ∈ X has a unique presentation as a linear combination
\[ x = \sum_{n=1}^{\le \infty} \alpha_n e_n \quad \text{with some coefficients } \alpha_n \in \mathbb{R}, \]
\[ e_n = (0, \dots, 0, \underbrace{1}_{n}, 0, \dots), \qquad n \in \mathbb{N}. \]
Clearly, each Banach space with a Schauder basis is necessarily separable. As a dense
set, one can take all finite sums \sum_{n=1}^{N} \alpha_n e_n with αn ∈ Q and 1 ≤ n ≤ N ∈ N.
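As a small illustration of approximation by finite sums with rational coefficients, the sketch below takes the element x = (2^{-n})_{n≥1} of l_1, truncated to D = 40 coordinates so that it fits in memory (the element, the space l_1 and the truncation are all assumptions made for the example). Exact rational arithmetic shows the approximation error of the N-th partial sum.

```python
from fractions import Fraction

def l1_norm(v):
    return sum(abs(t) for t in v)

D = 40  # number of coordinates kept; an illustrative truncation
x = [Fraction(1, 2 ** n) for n in range(1, D + 1)]

def partial_sum(x, N):
    """Coordinates of sum_{n<=N} alpha_n e_n with alpha_n = x_n (rational)."""
    return [x[n] if n < N else Fraction(0) for n in range(len(x))]

errors = [l1_norm([a - b for a, b in zip(x, partial_sum(x, N))])
          for N in range(1, 11)]
# errors[N-1] equals 2**-N - 2**-D: the finite rational sums approach x.
```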
Problem 1.13.7 (The Basis problem). Does every separable Banach space have a
Schauder basis?
The basis problem was posed by S. Banach in the 1930s. It remained open for more
than 40 years and was finally solved in 1973 by Per Enflo (born in 1944), a Swedish
mathematician (and concert pianist!). Surprisingly, the answer is negative: Enflo
constructed a counterexample.
1.14 Back to the Reals: Rd as a Banach Space
We will first review some basic facts in R (d = 1).
R = the real line, with the norm |x| = absolute value of x ∈ R.
• (R, | · |) is a Banach space, which means that every Cauchy sequence is conver-
gent.
• For a set A ⊂ R, the supremum sup A ∈ R ∪ {+∞} is the least upper bound
for A. That is, (i) ∀x ∈ A, x ≤ sup A; (ii) ∀y < sup A, ∃x ∈ A such that x > y.
• If sup A ∈ A, we call this number the maximum of A: max A ∈ R.
• Analogously we define the infimum inf A ∈ R ∪ {−∞} and the minimum
min A ∈ R.
• The supremum property (one of the basic axioms for R; it cannot be proved or
disproved):
Every nonempty set A ⊂ R which is ‘bounded above’ has a supremum sup A ∈ R. That
is, for nonempty A:
x ≤ M < ∞ for all x ∈ A ⇐⇒ ∃ sup A ∈ R.
Theorem 1.14.2. Every bounded above, increasing sequence (xn )n≥1 ⊂ R (such that
xn ≤ xn+1 ≤ M < ∞, ∀n ≥ 1) converges to its supremum
\[ \exists \lim_{n \to \infty} x_n = \sup_{n \ge 1} x_n \ (\le M). \]
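A quick numerical illustration of this theorem, assuming the recursion x_{n+1} = √(2 + x_n) with x_0 = 0 (an example chosen here, not taken from the notes): the sequence is increasing, bounded above by M = 2, and approaches its supremum 2.

```python
import math

def iterate(x0, steps):
    """Return x_0, ..., x_steps for the recursion x_{n+1} = sqrt(2 + x_n)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(math.sqrt(2 + xs[-1]))
    return xs

xs = iterate(0.0, 30)
# The sequence is increasing and bounded above by M = 2 ...
assert all(a <= b for a, b in zip(xs, xs[1:]))
assert all(x <= 2 for x in xs)
# ... and its terms approach the supremum 2.
assert abs(xs[-1] - 2) < 1e-12
```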
Proposition 1.14.3 (The algebra of limits in R). Let (xn )n≥1 , (yn )n≥1 be convergent
sequences in R,
\[ \lim_{n \to \infty} x_n = x, \qquad \lim_{n \to \infty} y_n = y. \]
Then:
(ii) limn→∞ (xn · yn ) = x · y;
(iii) limn→∞ (xn /yn ) = x/y if y ≠ 0;
(iv) if xn ≤ yn for all n ≥ 1, then x ≤ y.
Remark 1.14.4. Note that (iv) is not true for “<”: If xn < yn for all n ≥ 1, then
in general we can only conclude that x ≤ y. For example, xn := 0 < yn := 1/n for all
n ≥ 1, but x = y = 0.
Definition 1.14.5. (xn )n≥1 ⊂ R tends to infinity if
• R × R ∋ (x, y) ↦ x + y ∈ R,
• R × R ∋ (x, y) ↦ x · y ∈ R,
• R × R\{0} ∋ (x, y) ↦ x/y ∈ R.
Proof. This follows from the algebra of limits and Theorem 1.4.4.
Proof. This follows from the Weierstrass Theorem, since [a, b] is compact.
Proof. (Idea) For concreteness, let f (a) < y0 < f (b). Define
Definition 1.15.1 (Hilbert spaces). Let X be a vector space. The most restrictive
structure on X is that of the inner product. By definition, this is the mapping
X × X ∋ (x, y) ↦ ⟨x, y⟩ ∈ R
An orthonormal basis in l2 consists of the vectors
\[ e_i = (0, \dots, 0, \underbrace{1}_{i}, 0, 0, \dots), \qquad 1 \le i < \infty, \]
\[ \langle e_i, e_j \rangle = \delta_{i,j} := \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases} \]
The most general notion is a topological space. Such spaces are described by a system
of open sets (Ui)i∈I, which is called its topology. In a general topological space,
however, we cannot quantitatively measure the distance between two points x, y ∈ X.
To do this we need some metric d on X which induces the topology. Not all topologies
are induced by a metric!
Definition 1.16.3. Let f : X ⇉ Y be a correspondence, and let V ⊆ Y.
The strong (or upper) inverse of V under f is
\[ f^{-1}_{\mathrm{str}}(V) := \{ x \in X \mid f(x) \subseteq V,\ f(x) \neq \emptyset \}. \]
Definition 1.16.4. We now have some choices about how to define continuity of a
correspondence:
(i) A correspondence f : X ⇉ Y is upper hemicontinuous (sometimes called
semicontinuous) if the strong inverse f^{-1}_{str}(V) of every open set V ⊆ Y is open;
(ii) A correspondence f : X ⇉ Y is lower hemicontinuous if the weak inverse
f^{-1}_{weak}(V) of every open set V ⊆ Y is open;
(iii) A correspondence f : X ⇉ Y is continuous if it has both properties.
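The two inverses can be computed explicitly for interval-valued correspondences on the real line. The sketch below assumes that f(x) is a closed interval encoded as a pair (lo, hi), that V = (a, b) is an open interval, and that the weak inverse is defined in the usual way as the set of x with f(x) ∩ V ≠ ∅ (this definition is an assumption here). The correspondence f(x) = [x, x + 1/2] and the function names are hypothetical.

```python
def strong_inverse(f, xs, a, b):
    """Points x in xs with f(x) nonempty and f(x) contained in V = (a, b)."""
    return [x for x in xs if f(x) is not None
            and a < f(x)[0] and f(x)[1] < b]

def weak_inverse(f, xs, a, b):
    """Points x in xs with f(x) meeting V = (a, b). This is the usual
    definition of the weak (lower) inverse, assumed here."""
    return [x for x in xs if f(x) is not None
            and f(x)[0] < b and f(x)[1] > a]

# Hypothetical correspondence on X = [0, 1]: f(x) = [x, x + 1/2].
f = lambda x: (x, x + 0.5)
xs = [k / 10 for k in range(11)]

strong = strong_inverse(f, xs, 0.25, 1.0)   # needs 0.25 < x and x + 0.5 < 1
weak = weak_inverse(f, xs, 0.25, 1.0)       # needs x < 1 and x + 0.5 > 0.25
```

On this grid the strong inverse consists of the points 0.3 and 0.4, while the weak inverse contains every grid point except 1.0; for nonempty-valued f the strong inverse is always contained in the weak one.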
Definition 1.16.5. Some other important properties that a correspondence can satisfy:
(i) A correspondence f : X ⇉ Y is compact-valued if every f(x) is a compact
set in Y;
(ii) A correspondence f : X ⇉ Y is called closed if its graph Graph(f) is a closed
set in X × Y. In other words, f is closed whenever
xn → x, yn ∈ f(xn), yn → y =⇒ y ∈ f(x).
(i) A compact-valued correspondence f : X ⇉ Y is upper hemicontinuous iff for
any convergent sequence {xn}n∈N ⊂ X, limn→∞ xn = x ∈ X, every sequence
{yn}n∈N ⊂ Y, yn ∈ f(xn), has a convergent subsequence ynk → y ∈ f(x) as
k → ∞.
(ii) A compact-valued correspondence f : X ⇉ Y is lower hemicontinuous iff for
any convergent sequence {xn}n∈N ⊂ X, limn→∞ xn = x ∈ X, and any y ∈ f(x), there
exists a sequence {yn}n∈N ⊂ Y, yn ∈ f(xn), such that yn → y.
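The sequential criteria can be tried out on a simple example. The sketch below uses a hypothetical correspondence on R with f(0) = [0, 1] and f(x) = {0} for x ≠ 0, and the sequence xn = 1/n → 0. The lower test is applied in its standard form: every y ∈ f(x) should be the limit of some yn ∈ f(xn). The checks along one sequence only illustrate the criteria; they are not a proof.

```python
def f(x):
    """Hypothetical correspondence on R: f(0) = [0, 1], f(x) = {0} otherwise.
    Values are closed intervals encoded as (lo, hi)."""
    return (0.0, 1.0) if x == 0 else (0.0, 0.0)

def dist_to_interval(y, interval):
    """Distance from the point y to the closed interval [lo, hi]."""
    lo, hi = interval
    return max(lo - y, 0.0, y - hi)

xs = [1 / n for n in range(1, 101)]          # x_n -> x = 0

# Upper test: any y_n in f(x_n) = {0} equals 0, and the limit 0 lies in f(0).
ys = [f(x)[0] for x in xs]
assert all(y == 0.0 for y in ys) and dist_to_interval(0.0, f(0)) == 0.0

# Lower test fails: y = 1 lies in f(0), but every point of f(x_n) stays at
# distance 1 from it, so no sequence y_n in f(x_n) can converge to y.
gaps = [dist_to_interval(1.0, f(x)) for x in xs]
assert all(g == 1.0 for g in gaps)
```

This f is upper but not lower hemicontinuous: the value set "explodes" at x = 0, which upper hemicontinuity permits and lower hemicontinuity forbids.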