Selective Review

1) Infinite sets can have different cardinalities (sizes), with countable ($\aleph_0$) and uncountable (continuum) being the main distinctions. 2) Two sets have the same cardinality if there exists a bijection between them; the sets of integers, rational numbers, and algebraic numbers are countable, while the set of real numbers is uncountable. 3) Sets of measure zero can be covered by sets of arbitrarily small positive measure; they include single points, finite sets of points, and countably infinite sets of points.


Cardinality of Infinite Sets

The cardinality of a set $A$, denoted by $|A|$, $\#A$, or $\operatorname{card}(A)$, is a measure of the number of elements of the set. For finite sets, this is obvious: if $A = \{2, 3, 5\}$, then $|A| = 3$. But what if the set has an infinite number of elements? Do all infinite sets have the same cardinality, or do they somehow differ in size?

The key idea for settling the question of the cardinality of infinite sets is that two sets have the same cardinality if there exists a bijection between them. In an injective (one-to-one) mapping, each element of the codomain is associated with at most one element of the domain. In a surjective (onto) mapping, each element of the codomain is associated with at least one element of the domain. A bijective mapping is both injective and surjective: each element of the codomain is mapped to by exactly one element of the domain.

For example, consider the set of positive integers $\mathbb{Z}^+ = \{1, 2, 3, 4, \ldots\}$ and the set of positive even integers $\mathbb{Z}^+_e = \{2, 4, 6, 8, \ldots\}$. The mapping $f: \mathbb{Z}^+ \to \mathbb{Z}^+_e$, $f(n) = 2n$, is a bijection between the two sets. Despite the fact that $\mathbb{Z}^+_e$ is a proper subset of $\mathbb{Z}^+$, the existence of a bijection between them means that they have the same cardinality, i.e., $|\mathbb{Z}^+| = |\mathbb{Z}^+_e|$. (In the realm of infinite sets, the part can be put into a one-to-one correspondence with the whole.) In fact, the set of integers $\mathbb{Z}$, the set of natural numbers $\mathbb{N}$, the set of prime integers, the set of rational numbers $\mathbb{Q}$, the two-dimensional integer lattice $\mathbb{Z}^2$ (composed of all possible pairs of integers), the set of algebraic numbers, the set of computable numbers, etc., all have the same cardinality, because it can be proven that there is a bijection between any two of these sets. This cardinality is called aleph null (or aleph naught) and is denoted by $\aleph_0$. It turns out that this is the smallest infinity, i.e., the smallest size/cardinality of an infinite set. There are infinite sets with cardinality greater than $\aleph_0$. Sets with cardinality $\aleph_0$ are called countable sets or countably infinite sets.
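As a concrete illustration of why the lattice $\mathbb{Z}^2$ is countable (restricted here to $\mathbb{N}\times\mathbb{N}$ for simplicity), the following minimal Python sketch enumerates pairs with the Cantor pairing function; the function names are our own, and bijectivity is checked only on a finite prefix:

import math

def pair(m, n):
    # Cantor pairing: a bijection from N x N to N (diagonal enumeration).
    return (m + n) * (m + n + 1) // 2 + n

def unpair(k):
    # Inverse of pair: recover (m, n) from k.
    w = (math.isqrt(8 * k + 1) - 1) // 2   # index of the diagonal containing k
    n = k - w * (w + 1) // 2
    return w - n, n

# Round trip and no collisions on a finite prefix.
assert all(unpair(pair(m, n)) == (m, n) for m in range(50) for n in range(50))
assert len({pair(m, n) for m in range(50) for n in range(50)}) == 2500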

Consider the set of real numbers $\mathbb{R}$ and the open interval $(a, b)$, $-\infty < a < b < \infty$. The mapping $f: (a,b) \to \mathbb{R}$,
$$f(t) = x = \tan\!\left(\frac{\pi}{b-a}\left(t - \frac{a+b}{2}\right)\right),$$
is a bijection between $(a,b)$ and $\mathbb{R}$, demonstrating that $|\mathbb{R}| = |(a,b)|$. The midpoint of the interval is mapped to $0$. As $t \to b$, $x \to \tan(\pi/2) = \infty$, and as $t \to a$, $x \to \tan(-\pi/2) = -\infty$. Knowing $x$, the corresponding $t$, which is unique, is given by
$$t = \frac{a+b}{2} + \frac{b-a}{\pi}\tan^{-1}(x).$$
Since the set of real numbers is sometimes referred to as the continuum, we denote $|\mathbb{R}| = c$. The previous example shows that $|(a,b)| = c$. It can also be shown that $|[a,b]| = |(a,b]| = |[a,b)| = c$.
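A quick numeric sanity check of this bijection and its inverse (Python; the endpoints $a$, $b$ are arbitrary sample values):

import math

a, b = -1.0, 3.0   # arbitrary finite interval (a, b)

def f(t):
    # Bijection from (a, b) onto R.
    return math.tan(math.pi / (b - a) * (t - (a + b) / 2))

def f_inv(x):
    # Unique preimage of x.
    return (a + b) / 2 + (b - a) / math.pi * math.atan(x)

assert abs(f((a + b) / 2)) < 1e-12          # midpoint maps to 0
for t in (-0.999, -0.5, 1.0, 2.5, 2.999):
    assert abs(f_inv(f(t)) - t) < 1e-9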

What is the relation between $c$ and $\aleph_0$? Before answering this question, we study the following theorem:

The cardinality of a set $X$ is strictly less than the cardinality of its powerset $\mathcal{P}(X)$.

Proof. This is obvious for finite nonempty sets, where $|\mathcal{P}(X)| = 2^{|X|}$. It is also valid for infinite sets. Consider a set $X$ and an injection $\Psi$ from $X$ to $\mathcal{P}(X)$. Assume that $\Psi$ is also surjective. Consider the set $B = \{x \in X : x \notin \Psi(x)\}$. Set $B$ is a subset of $X$ and, hence, is an element of $\mathcal{P}(X)$.

If $\Psi$ is surjective, then there exists $z \in X$ such that $\Psi(z) = B$. If $z \in B$, then, by the definition of $B$, $z \notin \Psi(z)$. But $\Psi(z) = B$, which means that $z \notin B$. If $z \notin B$, then, by the definition of $B$, $z \in \Psi(z)$, which means that $z \in B$. Because of these contradictions, we conclude that $\Psi$ cannot be surjective and, hence, is not bijective.

Note that since the powerset is a set, it has its own powerset with strictly greater cardinality, and so forth. So, in fact, there is an infinite number of infinite set sizes.

Now we show that $c > \aleph_0$. We can focus on the interval $[0,1]$, as $|[0,1]| = c$. Suppose that we write all elements of $[0,1]$ in binary. These are strings of ones and zeros to the right of a binary point. Each string can be put into one-to-one correspondence with the set containing the positions of the ones in its binary representation, where position 1 is the position immediately to the right of the binary point, position 2 is the next position to the right, and so forth. For example, the number $0.5$ (decimal) is $0.100\ldots$ in binary; the associated set is $\{1\}$. The number $0.5625$ is $0.100100\ldots$ in binary; the associated set is $\{1, 4\}$. The number $0.3333\ldots$ is $0.01010101\ldots$ in binary; the associated set is $\{2, 4, 6, 8, 10, \ldots\}$, the set of positive even integers. The number zero does not have any ones in its representation and is associated with the empty set. The number one can be represented in binary as $0.111111\ldots$; the corresponding set is $\{1, 2, 3, 4, 5, \ldots\}$, the set of positive integers $\mathbb{Z}^+$. Ignoring some technicalities that do not affect the conclusion (e.g., dyadic rationals have two binary representations), we can see that there is a bijection between $[0,1]$ and the powerset of $\mathbb{Z}^+$. Since $|[0,1]| = c$ and $|\mathbb{Z}^+| = \aleph_0$, it follows that $c > \aleph_0$. In fact, we can write $c = 2^{\aleph_0}$. The continuum hypothesis states that there is no set whose cardinality is strictly between that of the set of integers and that of the set of real numbers; it is known that this hypothesis can be neither proved nor disproved from the standard (ZFC) axioms of set theory.
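The correspondence between numbers in $[0,1]$ and subsets of $\mathbb{Z}^+$ can be illustrated with a short Python sketch that extracts the positions of ones in a (truncated) binary expansion; the truncation depth `bits` is our illustrative choice:

def one_positions(x, bits=16):
    # Positions of ones in the binary expansion of x in [0, 1), truncated.
    positions = set()
    for k in range(1, bits + 1):
        x *= 2
        if x >= 1:
            positions.add(k)
            x -= 1
    return positions

print(one_positions(0.5))      # {1}
print(one_positions(0.5625))   # {1, 4}
print(one_positions(1 / 3))    # {2, 4, 6, 8, 10, 12, 14, 16}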

Sets with cardinality strictly greater than $\aleph_0$ are called uncountable sets or uncountably infinite sets. Examples include the set of real numbers $\mathbb{R}$, the set of irrational numbers, the set of transcendental numbers, the sets $\mathbb{R}^2$, $\mathbb{R}^3$, etc. Note that all these particular examples have the same cardinality $c$.

There are three very important properties of $\aleph_0$ that will be useful for our purposes:

i. $\aleph_0 + \aleph_0 = \aleph_0$. (For example, the set of even integers has cardinality $\aleph_0$, the set of odd integers has cardinality $\aleph_0$, and their union, the set of integers, has cardinality $\aleph_0$.)

ii. $\aleph_0 \cdot \aleph_0 = \aleph_0$. (For example, the cardinality of $\mathbb{Z}^2$ is $\aleph_0$.)

iii. A finite or countable union of countably infinite sets is countably infinite.

Sets of Measure Zero

Consider the intervals $(a,b)$, $[a,b]$, $[a,b)$, and $(a,b]$, where $-\infty < a < b < \infty$. All these intervals have a measure (length) equal to $b - a$.

Set $A$ is covered by set $B$ if $A \subset B$. In this case, the measure of $A$ is at most the measure of $B$. For example, the set $[1,2]$, of measure 1, is covered by the set $[0,\pi]$, which has a measure of $\pi$.

The measure of the union of a countable number of sets is upper-bounded by the sum of the measures of the sets. If the sets are disjoint, the measure of their union equals the sum of their measures. For example, the measure of $[3,5] \cup [10,13]$ is $5 = 2 + 3$, whereas the measure of $[3,5] \cup [4,7]$ is $4$, while the sum of the measures of $[3,5]$ and $[4,7]$ is $2 + 3 = 5$.

A set $X$ is of measure zero if it can be covered by another set whose measure can be made arbitrarily small while still covering $X$.

Consider the set consisting of just a single real number $\alpha$. This set can be covered by the set $[\alpha - \tfrac{\epsilon}{2}, \alpha + \tfrac{\epsilon}{2}]$, $\epsilon > 0$, which has a measure equal to $\epsilon$. Since $\epsilon$ can be made arbitrarily small, sets consisting of single real numbers are of measure zero.

Suppose we have a set composed of a finite number $m$ of real numbers: $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$. Given $\epsilon > 0$, this set can be covered by the set
$$\left[\alpha_1 - \tfrac{\epsilon}{2m}, \alpha_1 + \tfrac{\epsilon}{2m}\right] \cup \left[\alpha_2 - \tfrac{\epsilon}{2m}, \alpha_2 + \tfrac{\epsilon}{2m}\right] \cup \cdots \cup \left[\alpha_m - \tfrac{\epsilon}{2m}, \alpha_m + \tfrac{\epsilon}{2m}\right].$$
The measure of this set is upper-bounded by $m\,\frac{\epsilon}{m} = \epsilon$. Again, this can be made arbitrarily small, meaning that a set of a finite number of real numbers is of measure zero.

Now let's consider a countably infinite set of real numbers $\{\alpha_k\}_{k=1}^{\infty}$. Given $\epsilon > 0$, this set can be covered by the set
$$\left[\alpha_1 - \tfrac{\epsilon}{4}, \alpha_1 + \tfrac{\epsilon}{4}\right] \cup \left[\alpha_2 - \tfrac{\epsilon}{8}, \alpha_2 + \tfrac{\epsilon}{8}\right] \cup \left[\alpha_3 - \tfrac{\epsilon}{16}, \alpha_3 + \tfrac{\epsilon}{16}\right] \cup \cdots,$$
in which the $k$th interval has measure $\epsilon/2^k$. The measure of the union is therefore upper-bounded by
$$\sum_{k=1}^{\infty} \frac{\epsilon}{2^k} = \epsilon \sum_{k=1}^{\infty} \left(\frac{1}{2}\right)^k = \epsilon\,\frac{1/2}{1 - 1/2} = \epsilon.$$
This set is also of measure zero.

An example of an uncountably infinite set that has zero measure is the Cantor set.

If a function is nonzero only over a set of measure zero, then its integral is zero. For example,
$$\int_0^1 \mathbb{I}\{x \in \mathbb{Q}\}\,dx = 0,$$
where $\mathbb{Q}$ is the set of rational numbers and $\mathbb{I}\{\cdot\}$ is the indicator function, which is equal to one when its argument is true and zero otherwise.

Linear Algebra Review

Vector norms

If $z \in \mathbb{C}$, $z = \operatorname{Real}\{z\} + i\operatorname{Imag}\{z\}$, $i = \sqrt{-1}$, $\operatorname{Real}\{z\}, \operatorname{Imag}\{z\} \in \mathbb{R}$. The complex conjugate of $z$ is $z^* = \operatorname{Real}\{z\} - i\operatorname{Imag}\{z\}$. The magnitude of $z$ is $|z| = \sqrt{\operatorname{Real}\{z\}^2 + \operatorname{Imag}\{z\}^2} \ge 0$, and the argument (or phase or angle) of $z$ is $\arg(z) = \tan^{-1}\!\big(\operatorname{Imag}\{z\}/\operatorname{Real}\{z\}\big)$. The principal value of the argument belongs to the interval $[0, 2\pi)$ or $[-\pi, \pi)$.

$$\mathbf{x} = [x_1\ x_2\ \ldots\ x_{n-1}\ x_n]^T \in \mathbb{C}^{n\times 1}\quad(\text{or } \mathbb{R}^{n\times 1} \text{ if we are interested only in real-valued vectors}).$$

The transpose of $\mathbf{x}$ is $\mathbf{x}^T = [x_1\ x_2\ \ldots\ x_{n-1}\ x_n]$ (a row vector).

The Hermitian (conjugate transpose) of $\mathbf{x}$ is $\mathbf{x}^H = [x_1^*\ x_2^*\ \ldots\ x_{n-1}^*\ x_n^*]$.

A norm function $n(\cdot)$ applied to elements of a vector space $V$, with $\mathbb{R}$ as codomain, must satisfy the following properties:

i. $\forall \mathbf{v} \in V$, $n(\mathbf{v}) \ge 0$, with $n(\mathbf{v}) = 0$ iff $\mathbf{v}$ is the zero vector (the identity element of the vector space).

ii. $\forall \mathbf{v} \in V$, $c \in \mathbb{C}$, $n(c\mathbf{v}) = |c|\,n(\mathbf{v})$.

iii. $\forall \mathbf{v}, \mathbf{w} \in V$, $n(\mathbf{v}+\mathbf{w}) \le n(\mathbf{v}) + n(\mathbf{w})$ (triangle inequality or subadditivity).

The $\ell_p$ norm (or $p$-norm) of a vector $\mathbf{x} \in \mathbb{C}^{n\times 1}$, $p \ge 1$, is
$$\|\mathbf{x}\|_p = \big(|x_1|^p + |x_2|^p + \cdots + |x_n|^p\big)^{1/p}.$$
Parameter $p$ is allowed to take the value $\infty$; in this case, $\|\mathbf{x}\|_\infty = \max_k |x_k|$.

$\ell_2$ norm: $\|\mathbf{x}\|_2 = \sqrt{|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2}$

$\ell_1$ norm: $\|\mathbf{x}\|_1 = |x_1| + |x_2| + \cdots + |x_n|$

$\ell_\infty$ norm: $\|\mathbf{x}\|_\infty = \max_k |x_k| = \max\{|x_1|, |x_2|, \ldots, |x_n|\}$ (also called the maximum norm, supremum norm, sup norm, or uniform norm)

(Sometimes the number of nonzero entries of $\mathbf{x}$ is denoted by $\|\mathbf{x}\|_0$ and referred to as the $\ell_0$ norm. However, this is not a norm, as it violates property ii.)

If $q > p \ge 1$, then
$$\|\mathbf{x}\|_q \le \|\mathbf{x}\|_p \le n^{\frac{1}{p}-\frac{1}{q}}\,\|\mathbf{x}\|_q.$$
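A small numeric spot-check of this ordering (Python with NumPy; the test vector is arbitrary):

import numpy as np

x = np.array([3.0, -1.0, 2.0, 0.5])
n = x.size
p, q = 1, 2                      # any q > p >= 1

norm_p = np.linalg.norm(x, p)
norm_q = np.linalg.norm(x, q)
assert norm_q <= norm_p <= n ** (1 / p - 1 / q) * norm_q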

Inner Product and Important Inequalities

The inner product of two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{C}^{n\times 1}$ is $\mathbf{x}^H\mathbf{y} = \sum_{k=1}^n x_k^* y_k$. Note that $(\mathbf{x}^H\mathbf{y})^* = (\mathbf{x}^H\mathbf{y})^H = \mathbf{y}^H\mathbf{x}$.

Cauchy-Schwarz inequality: $|\mathbf{x}^H\mathbf{y}| \le \|\mathbf{x}\|_2\,\|\mathbf{y}\|_2$, with equality iff $\mathbf{x}$ and $\mathbf{y}$ are linearly dependent.

Holder's inequality: $|\mathbf{x}^H\mathbf{y}| \le \|\mathbf{x}\|_p\,\|\mathbf{y}\|_q$ for $p \ge 1$, $q \ge 1$ and $\frac{1}{p} + \frac{1}{q} = 1$. If $x_k$ and $y_k$ are the $k$th elements of $\mathbf{x}, \mathbf{y} \in \mathbb{C}^{n\times 1}$, respectively, then Holder's inequality is satisfied with equality iff
$$\forall k \in \{1, 2, \ldots, n\},\quad y_k = c\,\exp(i\theta)\exp\big(i\arg(x_k)\big)\,|x_k|^{p/q},$$
where $c$ is a real-valued positive constant and $\theta \in \mathbb{R}$.

For $\frac{1}{p} + \frac{1}{q} = 1$, $\|\mathbf{x}\|_p = \max_{\|\mathbf{y}\|_q = 1} |\mathbf{x}^H\mathbf{y}|$.
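Both inequalities are easy to test numerically; the sketch below (Python with NumPy) uses random complex vectors and the conjugate pair $p = 3$, $q = 3/2$:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5) + 1j * rng.normal(size=5)
y = rng.normal(size=5) + 1j * rng.normal(size=5)

inner = np.vdot(x, y)            # computes x^H y (conjugates the first vector)
assert abs(inner) <= np.linalg.norm(x) * np.linalg.norm(y)   # Cauchy-Schwarz

p, q = 3.0, 1.5                  # conjugate exponents: 1/p + 1/q = 1
holder = np.sum(np.abs(x) ** p) ** (1 / p) * np.sum(np.abs(y) ** q) ** (1 / q)
assert abs(inner) <= holder      # Holder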

Matrices

Element $(v,k)$, on the $v$th row and $k$th column of matrix $\mathbf{A}$, is denoted $\mathbf{A}(v,k)$, $\mathbf{A}_{v,k}$, or $\mathbf{A}_{vk}$.

The trace of a square matrix $\mathbf{A} \in \mathbb{C}^{n\times n}$ is the sum of the elements on its main diagonal, i.e., $\operatorname{trace}(\mathbf{A}) = \sum_{k=1}^n \mathbf{A}(k,k)$. If $\mathbf{A} \in \mathbb{C}^{n\times m}$ and $\mathbf{B} \in \mathbb{C}^{m\times n}$, $\operatorname{trace}(\mathbf{A}\mathbf{B}) = \operatorname{trace}(\mathbf{B}\mathbf{A})$.

A square matrix $\mathbf{A} \in \mathbb{C}^{n\times n}$ is nonsingular or invertible (and, hence, $\mathbf{A}^{-1}$ exists) iff $\det(\mathbf{A}) \ne 0$. In this case, $\det(\mathbf{A}^{-1}) = 1/\det(\mathbf{A})$. Generally, for $\mathbf{A}, \mathbf{B} \in \mathbb{C}^{n\times n}$, $\det(\mathbf{A}\mathbf{B}) = \det(\mathbf{A})\det(\mathbf{B})$.

The range or column space of matrix $\mathbf{A} \in \mathbb{C}^{n\times m}$ is $\mathcal{R}(\mathbf{A}) = \{\mathbf{y} \in \mathbb{C}^{n\times 1} : \mathbf{y} = \mathbf{A}\mathbf{x},\ \mathbf{x} \in \mathbb{C}^{m\times 1}\}$. The range is the span of the columns of $\mathbf{A}$.

The nullspace or kernel of matrix $\mathbf{A} \in \mathbb{C}^{n\times m}$ is $\mathcal{N}(\mathbf{A}) = \{\mathbf{x} \in \mathbb{C}^{m\times 1} : \mathbf{A}\mathbf{x} = \mathbf{0}_{n\times 1}\}$.

The dimension of a vector space or subspace is the number of linearly independent vectors needed to span the whole space or subspace.

Rank-nullity theorem: $\dim \mathcal{R}(\mathbf{A}) + \dim \mathcal{N}(\mathbf{A}) = m$, where $m$ is the number of columns of $\mathbf{A}$.

The rank of matrix $\mathbf{A} \in \mathbb{C}^{n\times m}$ is the largest number of rows (columns) of $\mathbf{A}$ that constitute a linearly independent set. The rank of $\mathbf{A}$ is zero iff $\mathbf{A}$ is the all-zero matrix.

$\operatorname{rank}(\mathbf{A}) = \dim \mathcal{R}(\mathbf{A}) = m - \dim \mathcal{N}(\mathbf{A})$

$\operatorname{rank}(\mathbf{A}) \le \min(n, m)$ (Hence, if $\mathbf{x}$ is a nonzero vector, its rank is equal to 1.)

$\operatorname{rank}(\mathbf{A}) = \operatorname{rank}(\mathbf{A}^H) = \operatorname{rank}(\mathbf{A}\mathbf{A}^H) = \operatorname{rank}(\mathbf{A}^H\mathbf{A})$

$\operatorname{rank}(\mathbf{A}\mathbf{B}) \le \min\big(\operatorname{rank}(\mathbf{A}), \operatorname{rank}(\mathbf{B})\big)$ for $\mathbf{A} \in \mathbb{C}^{n\times m}$ and $\mathbf{B} \in \mathbb{C}^{m\times k}$ (Hence, if $\mathbf{x} \in \mathbb{C}^{n\times 1}$ is a nonzero vector, the rank of $\mathbf{x}\mathbf{x}^H$ is equal to 1. If $\mathbf{x}, \mathbf{y} \in \mathbb{C}^{n\times 1}$ are nonzero vectors, the rank of $\mathbf{x}\mathbf{y}^H$ or $\mathbf{y}\mathbf{x}^H$ is equal to 1.)

$\operatorname{rank}(\mathbf{A}+\mathbf{B}) \le \operatorname{rank}(\mathbf{A}) + \operatorname{rank}(\mathbf{B})$ for $\mathbf{A}, \mathbf{B} \in \mathbb{C}^{n\times m}$

A square matrix $\mathbf{A} \in \mathbb{C}^{n\times n}$ is nonsingular or invertible iff $\operatorname{rank}(\mathbf{A}) = \dim \mathcal{R}(\mathbf{A}) = n$ or, equivalently, $\dim \mathcal{N}(\mathbf{A}) = 0$.
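These rank identities can be spot-checked with NumPy on a matrix that is rank-deficient by construction (real-valued here, so $\mathbf{A}^H = \mathbf{A}^T$):

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))   # rank 2 by construction

r = np.linalg.matrix_rank(A)
assert r == 2
assert np.linalg.matrix_rank(A.T) == r
assert np.linalg.matrix_rank(A @ A.T) == r
assert np.linalg.matrix_rank(A.T @ A) == r

x = rng.normal(size=(5, 1))
assert np.linalg.matrix_rank(x @ x.T) == 1              # outer product: rank 1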

Hermitian and Unitary Matrices

A square matrix $\mathbf{A} \in \mathbb{C}^{n\times n}$ is Hermitian if $\mathbf{A}^H = \mathbf{A}$. The main-diagonal elements of a Hermitian matrix are real-valued.

If $\mathbf{A}, \mathbf{B} \in \mathbb{C}^{n\times n}$, $(\mathbf{A}\mathbf{B})^H = \mathbf{B}^H\mathbf{A}^H$.

If $\mathbf{A} \in \mathbb{C}^{n\times m}$, $\mathbf{A} = [\mathbf{A}_{\mathrm{col},1}\ \mathbf{A}_{\mathrm{col},2}\ \ldots\ \mathbf{A}_{\mathrm{col},m}]$, then $\mathbf{A}\mathbf{A}^H \in \mathbb{C}^{n\times n}$ is Hermitian, and
$$\mathbf{A}\mathbf{A}^H = \sum_{k=1}^m \mathbf{A}_{\mathrm{col},k}\,\mathbf{A}_{\mathrm{col},k}^H.$$

Similarly, $\mathbf{A}^H\mathbf{A} \in \mathbb{C}^{m\times m}$ is Hermitian. Element $(v,k)$ (on the $v$th row and $k$th column) of $\mathbf{A}^H\mathbf{A}$ is $\mathbf{A}_{\mathrm{col},v}^H\,\mathbf{A}_{\mathrm{col},k}$.

A square matrix $\mathbf{A} \in \mathbb{C}^{n\times n}$ is unitary if $\mathbf{A}^H\mathbf{A} = \mathbf{A}\mathbf{A}^H = \mathbf{I}_{n\times n}$. That is, $\mathbf{A}$ is always invertible and its inverse is $\mathbf{A}^H$. The columns (rows) of a unitary matrix constitute an orthonormal set of vectors.

A square matrix $\mathbf{A} \in \mathbb{R}^{n\times n}$ is symmetric if $\mathbf{A}^T = \mathbf{A}$. A square matrix $\mathbf{A} \in \mathbb{R}^{n\times n}$ is orthogonal if $\mathbf{A}^T\mathbf{A} = \mathbf{A}\mathbf{A}^T = \mathbf{I}_{n\times n}$. That is, $\mathbf{A}$ is always invertible and its inverse is $\mathbf{A}^T$.

Eigendecomposition

Consider a square matrix $\mathbf{A} \in \mathbb{C}^{n\times n}$. If $\mathbf{A}\mathbf{q} = \lambda\mathbf{q}$, where $\mathbf{q} \in \mathbb{C}^{n\times 1}$ is a nonzero vector and $\lambda \in \mathbb{C}$, then $\lambda$ is an eigenvalue of $\mathbf{A}$ and $\mathbf{q}$ is an associated eigenvector. If $\mathbf{q}$ is an eigenvector, so is $z\mathbf{q}$, $z \in \mathbb{C}$, $z \ne 0$. The eigenvalue spectrum $\sigma(\mathbf{A})$ of $\mathbf{A}$ is the set of all eigenvalues.

The eigenvalues are the roots of the characteristic polynomial $p_{\mathbf{A}}(\lambda) = \det(\lambda\mathbf{I}_{n\times n} - \mathbf{A})$.

Matrix $\mathbf{A} \in \mathbb{C}^{n\times n}$ has exactly $n$ eigenvalues, counting multiplicities. Eigenvectors corresponding to different eigenvalues are linearly independent.

Consider square matrices $\mathbf{A}, \mathbf{B} \in \mathbb{C}^{n\times n}$, each having $n$ linearly independent eigenvectors. Matrices $\mathbf{A}$ and $\mathbf{B}$ commute, i.e., $\mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A}$, iff they share the same eigenvectors.

Assuming $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of $\mathbf{A} \in \mathbb{C}^{n\times n}$ (perhaps not distinct), then
$$p_{\mathbf{A}}(\lambda) = \det(\lambda\mathbf{I}_{n\times n} - \mathbf{A}) = \prod_{k=1}^n (\lambda - \lambda_k) = \lambda^n - \Big(\sum_{k=1}^n \lambda_k\Big)\lambda^{n-1} + \cdots + (-1)^n\prod_{k=1}^n \lambda_k = \lambda^n - \operatorname{trace}(\mathbf{A})\,\lambda^{n-1} + \cdots + (-1)^n\det(\mathbf{A}).$$
That is, $\operatorname{trace}(\mathbf{A}) = \sum_{k=1}^n \mathbf{A}(k,k) = \sum_{k=1}^n \lambda_k$ and $\det(\mathbf{A}) = \prod_{k=1}^n \lambda_k$: the trace is the sum of the eigenvalues, and the determinant is the product of the eigenvalues.

The eigenvalues of $\mathbf{A}^T$ are the same as those of $\mathbf{A}$. The eigenvalues of $\mathbf{A}^H$ are the complex conjugates of the eigenvalues of $\mathbf{A}$.
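A quick NumPy check of the trace and determinant identities on a random matrix:

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
lam = np.linalg.eigvals(A)

assert np.isclose(np.trace(A), lam.sum().real)          # trace = sum of eigenvalues
assert np.isclose(np.linalg.det(A), lam.prod().real)    # det = product of eigenvalues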

The set of eigenvectors associated with a particular eigenvalue $\lambda \in \sigma(\mathbf{A})$, $\mathbf{A} \in \mathbb{C}^{n\times n}$, together with the zero vector, is a subspace of the $n$-dimensional complex space, called the eigenspace of $\mathbf{A}$ corresponding to $\lambda$. The dimension of this eigenspace is called the geometric multiplicity of $\lambda$. The multiplicity of $\lambda$ as a zero of $p_{\mathbf{A}}(\lambda)$ is the algebraic multiplicity of $\lambda$. The algebraic multiplicity is greater than or equal to the geometric multiplicity. A matrix with one or more eigenvalues whose algebraic multiplicity exceeds the geometric multiplicity is called defective.

If $\mathbf{A} \in \mathbb{C}^{n\times m}$ and $\mathbf{B} \in \mathbb{C}^{m\times n}$, matrix $\mathbf{A}\mathbf{B} \in \mathbb{C}^{n\times n}$ has the same nonzero eigenvalues as matrix $\mathbf{B}\mathbf{A} \in \mathbb{C}^{m\times m}$.

If $\mathbf{A} \in \mathbb{C}^{n\times n}$ and $\lambda \in \sigma(\mathbf{A})$, then $\lambda^k \in \sigma(\mathbf{A}^k)$.

If $\mathbf{A} \in \mathbb{C}^{n\times n}$ and $\lambda \in \sigma(\mathbf{A})$, then $1 + \lambda \in \sigma(\mathbf{I}_{n\times n} + \mathbf{A})$.

If $\mathbf{A} \in \mathbb{C}^{n\times n}$ is nonsingular and $\lambda \in \sigma(\mathbf{A})$, then $\lambda^{-1} \in \sigma(\mathbf{A}^{-1})$.

In a nondefective matrix, every eigenvalue has equal algebraic and geometric multiplicities. Nondefective matrices are diagonalizable. That is, if $\mathbf{A} \in \mathbb{C}^{n\times n}$ is diagonalizable, $\mathbf{A}$ can be expressed as
$$\mathbf{A} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{-1},$$
where $\mathbf{Q} \in \mathbb{C}^{n\times n}$ is a matrix with linearly independent eigenvectors as columns, and $\boldsymbol{\Lambda}$ is a diagonal matrix containing the corresponding eigenvalues.

For nondefective/diagonalizable matrices, the rank of the matrix is equal to the number of nonzero eigenvalues, counting multiplicities.

Normal Matrices

Matrix $\mathbf{A} \in \mathbb{C}^{n\times n}$ is normal if it satisfies $\mathbf{A}^H\mathbf{A} = \mathbf{A}\mathbf{A}^H$. All Hermitian, skew-Hermitian, and unitary matrices are normal. All normal matrices are diagonalizable. Moreover, the eigenvector matrix $\mathbf{Q}$ of the eigendecomposition of $\mathbf{A}$ can be chosen to be unitary, i.e., $\mathbf{Q}^H\mathbf{Q} = \mathbf{Q}\mathbf{Q}^H = \mathbf{I}_{n\times n}$. Hence, for a normal matrix $\mathbf{A}$,
$$\mathbf{A} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^H.$$
If $\mathbf{A} \in \mathbb{C}^{n\times n}$ is Hermitian, its eigenvalues are real-valued and, as mentioned for normal matrices, its eigenvectors can be chosen orthogonal to one another.

Quadratic Forms and Definiteness of Matrices

Consider a Hermitian matrix $\mathbf{A} \in \mathbb{C}^{n\times n}$ and a vector $\mathbf{x} \in \mathbb{C}^{n\times 1}$. The quadratic form $\mathbf{x}^H\mathbf{A}\mathbf{x}$ is a real-valued scalar given by $\mathbf{x}^H\mathbf{A}\mathbf{x} = \sum_{k=1}^n\sum_{m=1}^n x_k^* x_m\,\mathbf{A}(k,m)$.

A Hermitian matrix $\mathbf{A} \in \mathbb{C}^{n\times n}$ is positive definite if $\mathbf{x}^H\mathbf{A}\mathbf{x} > 0$ for all nonzero $\mathbf{x} \in \mathbb{C}^{n\times 1}$.

A Hermitian matrix $\mathbf{A} \in \mathbb{C}^{n\times n}$ is positive semidefinite if $\mathbf{x}^H\mathbf{A}\mathbf{x} \ge 0$ for all $\mathbf{x} \in \mathbb{C}^{n\times 1}$.

For a Hermitian matrix $\mathbf{A} \in \mathbb{C}^{n\times n}$, let $\lambda_1$ be its maximum eigenvalue and $\lambda_n$ be its minimum eigenvalue. We have
$$\lambda_n\,\mathbf{x}^H\mathbf{x} \le \mathbf{x}^H\mathbf{A}\mathbf{x} \le \lambda_1\,\mathbf{x}^H\mathbf{x}.$$
The quadratic form $\mathbf{x}^H\mathbf{A}\mathbf{x}$ achieves its maximum possible value, $\lambda_1\|\mathbf{x}\|_2^2$, when $\mathbf{x}$ is an eigenvector of $\mathbf{A}$ corresponding to its maximum eigenvalue. Hence, we can write
$$\lambda_1 = \lambda_{\max}(\mathbf{A}) = \max_{\mathbf{x}\ne\mathbf{0}}\frac{\mathbf{x}^H\mathbf{A}\mathbf{x}}{\mathbf{x}^H\mathbf{x}} = \max_{\|\mathbf{x}\|_2=1}\mathbf{x}^H\mathbf{A}\mathbf{x}.$$
Similarly, $\mathbf{x}^H\mathbf{A}\mathbf{x}$ achieves its minimum possible value, $\lambda_n\|\mathbf{x}\|_2^2$, when $\mathbf{x}$ is an eigenvector of $\mathbf{A}$ corresponding to its minimum eigenvalue. Hence, we can write
$$\lambda_n = \lambda_{\min}(\mathbf{A}) = \min_{\mathbf{x}\ne\mathbf{0}}\frac{\mathbf{x}^H\mathbf{A}\mathbf{x}}{\mathbf{x}^H\mathbf{x}} = \min_{\|\mathbf{x}\|_2=1}\mathbf{x}^H\mathbf{A}\mathbf{x}.$$

A Hermitian matrix is positive definite iff its eigenvalues are strictly positive, and positive semidefinite iff its eigenvalues are nonnegative.
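The Rayleigh-quotient bounds can be verified numerically (Python with NumPy; `eigh` returns the eigenvalues of a Hermitian matrix in ascending order):

import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = (B + B.conj().T) / 2                 # Hermitian by construction
lam, Q = np.linalg.eigh(A)               # real eigenvalues, ascending order

x = rng.normal(size=4) + 1j * rng.normal(size=4)
rayleigh = (x.conj() @ A @ x).real / (x.conj() @ x).real
assert lam[0] - 1e-12 <= rayleigh <= lam[-1] + 1e-12

v = Q[:, -1]                             # eigenvector of the largest eigenvalue
assert np.isclose((v.conj() @ A @ v).real, lam[-1])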

Matrix Norms

For a matrix $\mathbf{A} \in \mathbb{C}^{n\times m}$, the Frobenius matrix norm is defined as
$$\|\mathbf{A}\|_F = \sqrt{\sum_{k=1}^n\sum_{v=1}^m |\mathbf{A}(k,v)|^2}.$$
Using the trace operation, $\|\mathbf{A}\|_F^2 = \sum_{k=1}^n\sum_{v=1}^m |\mathbf{A}(k,v)|^2 = \operatorname{trace}(\mathbf{A}^H\mathbf{A}) = \operatorname{trace}(\mathbf{A}\mathbf{A}^H)$.

Using the $\ell_p$ vector norm, an induced matrix norm is defined as
$$\|\mathbf{A}\|_p = \sup_{\mathbf{x}\ne\mathbf{0}}\frac{\|\mathbf{A}\mathbf{x}\|_p}{\|\mathbf{x}\|_p} = \sup_{\|\mathbf{x}\|_p=1}\|\mathbf{A}\mathbf{x}\|_p.$$
One can show that matrix norms satisfy the same defining properties as vector norms. Moreover, the Frobenius norm and the induced norms are submultiplicative: for $\mathbf{A} \in \mathbb{C}^{n\times m}$ and $\mathbf{B} \in \mathbb{C}^{m\times k}$,
$$\|\mathbf{A}\mathbf{B}\| \le \|\mathbf{A}\|\,\|\mathbf{B}\|.$$
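A numeric check of submultiplicativity for the Frobenius and induced 2-norms (NumPy):

import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 5))
B = rng.normal(size=(5, 2))

assert np.linalg.norm(A @ B, 'fro') <= np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro')
assert np.linalg.norm(A @ B, 2) <= np.linalg.norm(A, 2) * np.linalg.norm(B, 2)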
Important Inequalities

Triangle Inequality

$\left|\sum_n z_n\right| \le \sum_n |z_n|$, where $z_n \in \mathbb{C}$. For an integral, $\left|\int f(z)\,dz\right| \le \int |f(z)|\,dz$, where $f(z)$ is a complex-valued function.

In a normed vector space $V$, one of the defining properties of the norm is the triangle inequality: $\|\mathbf{x}+\mathbf{y}\| \le \|\mathbf{x}\| + \|\mathbf{y}\|$ for all $\mathbf{x}, \mathbf{y} \in V$.

Cauchy-Schwarz Inequality

For vectors $\mathbf{x}, \mathbf{y} \in \mathbb{C}^{n\times 1}$, $|\mathbf{x}^H\mathbf{y}| \le \|\mathbf{x}\|_2\,\|\mathbf{y}\|_2$, with equality iff $\mathbf{x}$ and $\mathbf{y}$ are linearly dependent.

Proof.

The statement is obvious if one or both vectors is the all-zero vector. Assume now that both $\mathbf{x}$ and $\mathbf{y}$ are nonzero. Consider the vector $\alpha\mathbf{x} - \mathbf{y}$, where $\alpha \in \mathbb{C}$:
$$\|\alpha\mathbf{x}-\mathbf{y}\|_2^2 \ge 0 \;\Longrightarrow\; (\alpha^*\mathbf{x}^H - \mathbf{y}^H)(\alpha\mathbf{x}-\mathbf{y}) = |\alpha|^2\,\mathbf{x}^H\mathbf{x} + \mathbf{y}^H\mathbf{y} - \alpha^*\mathbf{x}^H\mathbf{y} - \alpha\,\mathbf{y}^H\mathbf{x} \ge 0.$$
This inequality is valid for any $\alpha$. Set $\alpha = \dfrac{\mathbf{x}^H\mathbf{y}}{\mathbf{x}^H\mathbf{x}}$. Hence,
$$\frac{|\mathbf{x}^H\mathbf{y}|^2}{(\mathbf{x}^H\mathbf{x})^2}\,\mathbf{x}^H\mathbf{x} + \mathbf{y}^H\mathbf{y} - \frac{|\mathbf{x}^H\mathbf{y}|^2}{\mathbf{x}^H\mathbf{x}} - \frac{|\mathbf{x}^H\mathbf{y}|^2}{\mathbf{x}^H\mathbf{x}} \ge 0 \;\Longrightarrow\; \mathbf{y}^H\mathbf{y} \ge \frac{|\mathbf{x}^H\mathbf{y}|^2}{\mathbf{x}^H\mathbf{x}}.$$
Thus, $|\mathbf{x}^H\mathbf{y}|^2 \le (\mathbf{x}^H\mathbf{x})(\mathbf{y}^H\mathbf{y})$, or $|\mathbf{x}^H\mathbf{y}| \le \sqrt{\mathbf{x}^H\mathbf{x}}\sqrt{\mathbf{y}^H\mathbf{y}} = \|\mathbf{x}\|_2\,\|\mathbf{y}\|_2$.

An important point is that the Cauchy-Schwarz inequality is achieved with equality if and only if $\mathbf{y} = \beta\mathbf{x}$, where $\beta \in \mathbb{C}$. This is based on the fact that, for $\mathbf{z} \in \mathbb{C}^n$, $\|\mathbf{z}\|_2 = 0$ iff $\mathbf{z}$ is the all-zero vector. Thus, $\|\alpha\mathbf{x}-\mathbf{y}\|_2 = 0$ iff $\alpha\mathbf{x} = \mathbf{y}$.

If the components of $\mathbf{x}$ are $\{x_k\}_{k=1}^n$ and those of $\mathbf{y}$ are $\{y_k\}_{k=1}^n$, the Cauchy-Schwarz inequality can be written as
$$\left|\sum_{k=1}^n x_k^* y_k\right| \le \sqrt{\sum_{k=1}^n |x_k|^2}\,\sqrt{\sum_{k=1}^n |y_k|^2}.$$
Replacing $x_k^*$ by $x_k$ on both sides (which leaves the right-hand side unchanged), we obtain
$$\left|\sum_{k=1}^n x_k y_k\right| \le \sqrt{\sum_{k=1}^n |x_k|^2}\,\sqrt{\sum_{k=1}^n |y_k|^2}.$$
The integral form of the Cauchy-Schwarz inequality is
$$\left|\int f(\alpha)\,g(\alpha)\,d\alpha\right| \le \sqrt{\int |f(\alpha)|^2\,d\alpha}\,\sqrt{\int |g(\alpha)|^2\,d\alpha},$$
where $f(\alpha)$ and $g(\alpha)$ are complex-valued functions.

Arithmetic Mean-Geometric Mean (AM-GM) Inequality

Consider positive real numbers $\{x_k\}_{k=1}^n$.

Their arithmetic mean is $\frac{1}{n}\sum_{k=1}^n x_k = \frac{1}{n}(x_1 + x_2 + \cdots + x_n)$.

Their geometric mean is $\left(\prod_{k=1}^n x_k\right)^{1/n} = (x_1 x_2 \cdots x_n)^{1/n}$.

The AM-GM inequality states that $\frac{1}{n}\sum_{k=1}^n x_k \ge \left(\prod_{k=1}^n x_k\right)^{1/n}$, with equality iff all the numbers are equal.

Proof.

We can make use of the inequality $\exp(z) \ge 1 + z$, with equality iff $z = 0$, which is equivalent to $\exp(z-1) \ge z$ with equality iff $z = 1$.

Consider $y_k = \frac{x_k}{\eta}$, where $\eta$ is the arithmetic mean, i.e., $\eta = \frac{1}{n}\sum_{k=1}^n x_k$.

Since $z \le \exp(z-1)$,
$$\prod_{k=1}^n y_k \le \prod_{k=1}^n \exp(y_k - 1) = \exp\left(\sum_{k=1}^n y_k - n\right).$$
Now, $\frac{1}{n}\sum_{k=1}^n y_k = \frac{1}{n}\sum_{k=1}^n \frac{x_k}{\eta} = \frac{\eta}{\eta} = 1$, so $\sum_{k=1}^n y_k = n$. Also,
$$\prod_{k=1}^n y_k = \frac{1}{\eta^n}\prod_{k=1}^n x_k.$$
Therefore,
$$\frac{1}{\eta^n}\prod_{k=1}^n x_k \le \exp(n - n) = 1 \;\Longrightarrow\; \left(\prod_{k=1}^n x_k\right)^{1/n} \le \eta = \frac{1}{n}\sum_{k=1}^n x_k.$$
Note that $\exp(z-1) = z$ iff $z = 1$. This means that the AM-GM inequality is satisfied with equality iff $y_k = 1$ for all $k$, which is equivalent to $x_k = \eta$ for all $k$.
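A trivial numeric illustration in Python (arbitrary positive inputs; `math.prod` requires Python 3.8+):

import math

xs = [0.5, 2.0, 3.0, 8.0]
am = sum(xs) / len(xs)
gm = math.prod(xs) ** (1 / len(xs))
assert gm <= am          # equality would require all entries to be equal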

Leibniz's Rule for Differentiation under the Integral Sign

Let $f(x,t)$ be a function such that the partial derivative of $f$ with respect to $t$ exists and is continuous. Then,
$$\frac{d}{dt}\int_{\alpha(t)}^{\beta(t)} f(x,t)\,dx = \int_{\alpha(t)}^{\beta(t)}\frac{\partial f}{\partial t}\,dx + f\big(\beta(t),t\big)\frac{d\beta}{dt} - f\big(\alpha(t),t\big)\frac{d\alpha}{dt}.$$

Example:
$$\frac{d}{dt}\int_0^t \frac{\ln(1+tx)}{1+x^2}\,dx = \int_0^t \frac{x}{(1+x^2)(1+tx)}\,dx + \frac{\ln(1+t^2)}{1+t^2}.$$

Another example:
$$\frac{d}{dt}\int_t^{\infty} e^{-x^2/2}\,dx = -e^{-t^2/2}.$$
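Leibniz's rule can be verified numerically on the first example; the sketch below assumes SciPy's `quad` is available and differentiates the left-hand side by central differences at an arbitrary point $t = 1.3$:

import math
from scipy.integrate import quad

def F(t):
    # Left-hand side: the integral as a function of t.
    return quad(lambda x: math.log(1 + t * x) / (1 + x ** 2), 0, t)[0]

t, h = 1.3, 1e-5
lhs = (F(t + h) - F(t - h)) / (2 * h)    # numerical d/dt
rhs = quad(lambda x: x / ((1 + x ** 2) * (1 + t * x)), 0, t)[0] \
      + math.log(1 + t ** 2) / (1 + t ** 2)
assert abs(lhs - rhs) < 1e-6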

Taylor Series

Consider a function $g(x): \mathbb{R} \to \mathbb{R}$. Its Taylor series expansion about a point $x_0$ is given by
$$g(x) = \sum_{k=0}^{\infty}\frac{g^{(k)}(x_0)}{k!}(x-x_0)^k,\quad\text{where } g^{(0)}(x) = g(x),\ g^{(1)}(x) = \frac{dg(x)}{dx},\ g^{(2)}(x) = \frac{d^2 g(x)}{dx^2},\ \text{and so on}.$$
Using $n+1$ terms and a remainder term, we can write
$$g(x) = \sum_{k=0}^{n}\frac{g^{(k)}(x_0)}{k!}(x-x_0)^k + \frac{g^{(n+1)}(\zeta)}{(n+1)!}(x-x_0)^{n+1},$$
where $\zeta = \lambda x_0 + (1-\lambda)x$ for some $\lambda \in [0,1]$, i.e., $\zeta$ lies between $x$ and $x_0$.

Example: Let $g(x) = e^x$ and $x_0 = 0$. Hence,
$$e^x = \sum_{k=0}^{n}\frac{x^k}{k!} + \frac{e^{\zeta}}{(n+1)!}x^{n+1},\quad \zeta \in [0,x] \text{ when } x > 0 \text{ and } \zeta \in [x,0] \text{ when } x < 0.$$
Note that if $n$ is odd, $x^{n+1}$ is nonnegative. Hence, for odd $n$,
$$e^x \ge \sum_{k=0}^{n}\frac{x^k}{k!}\quad\text{for all } x.$$
When $n = 1$, $e^x \ge 1 + x$ for all $x$. This also means that $e^{-x} \ge 1 - x$ for all $x$.

If the function $g(x)$ is complex-valued,
$$g(x) = \sum_{k=0}^{n}\frac{g^{(k)}(x_0)}{k!}(x-x_0)^k + \frac{1}{n!}\int_{x_0}^{x}(x-t)^n\,g^{(n+1)}(t)\,dt.$$
This expression is also valid for real-valued functions.

In the multidimensional case, when $g(\mathbf{x}): \mathbb{R}^N \to \mathbb{R}$, the Taylor series expansion about a point $\mathbf{y}$ is given by
$$g(\mathbf{x}) = g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \frac{1}{2}(\mathbf{x}-\mathbf{y})^T\mathcal{H}(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \cdots,$$
where $\nabla^T g(\mathbf{y}) = \left[\frac{\partial g}{\partial x_1}\big|_{\mathbf{y}}\ \ \frac{\partial g}{\partial x_2}\big|_{\mathbf{y}}\ \ \ldots\ \ \frac{\partial g}{\partial x_N}\big|_{\mathbf{y}}\right]$ and $\mathcal{H}(\mathbf{y})$ is the Hessian matrix evaluated at $\mathbf{y}$.

That is,
$$g(\mathbf{x}) = g(\mathbf{y}) + \sum_{n=1}^N \frac{\partial g}{\partial x_n}\bigg|_{\mathbf{y}}(x_n - y_n) + \frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N \frac{\partial^2 g}{\partial x_n\partial x_m}\bigg|_{\mathbf{y}}(x_n - y_n)(x_m - y_m) + \frac{1}{6}\sum_{n=1}^N\sum_{m=1}^N\sum_{k=1}^N \frac{\partial^3 g}{\partial x_n\partial x_m\partial x_k}\bigg|_{\mathbf{y}}(x_n - y_n)(x_m - y_m)(x_k - y_k) + \cdots$$

With remainder terms, and given some $\lambda \in [0,1]$ (generally different in each expression), we have:
$$g(\mathbf{x}) = g(\mathbf{y}) + \nabla^T g\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big)(\mathbf{x}-\mathbf{y}) = g(\mathbf{y}) + \nabla^T g\big(\mathbf{y} + \lambda(\mathbf{x}-\mathbf{y})\big)(\mathbf{x}-\mathbf{y}),$$
$$g(\mathbf{x}) = g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \frac{1}{2}(\mathbf{x}-\mathbf{y})^T\mathcal{H}\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big)(\mathbf{x}-\mathbf{y}) = g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \frac{1}{2}(\mathbf{x}-\mathbf{y})^T\mathcal{H}\big(\mathbf{y} + \lambda(\mathbf{x}-\mathbf{y})\big)(\mathbf{x}-\mathbf{y}).$$

Differentiation with respect to vectors and matrices

In order to solve an optimization problem, we typically need to partially differentiate a real-valued scalar function of multiple variables. The differentiation is with respect to real-valued optimization variables, or with respect to the real and imaginary parts of complex-valued optimization variables. These optimization variables can be the elements of a vector or a matrix.

For example, consider vectors $\mathbf{x}, \mathbf{a} \in \mathbb{R}^{n\times 1}$:
$$\mathbf{x} = [x_1\ x_2\ \ldots\ x_{n-1}\ x_n]^T\quad\text{and}\quad \mathbf{a} = [a_1\ a_2\ \ldots\ a_{n-1}\ a_n]^T.$$
Assume that we want to differentiate $J(\mathbf{x}) = \mathbf{a}^T\mathbf{x}$ in order to obtain
$$\nabla_{\mathbf{x}} J = \left[\frac{\partial J}{\partial x_1}\ \ \frac{\partial J}{\partial x_2}\ \ \ldots\ \ \frac{\partial J}{\partial x_{n-1}}\ \ \frac{\partial J}{\partial x_n}\right]^T.$$
Let $z$ be a real scalar, and let $\mathbf{e}_k$ be the vector with $1$ in the $k$th position and zeros elsewhere. We can compute $\frac{\partial J}{\partial x_k}$ as
$$\frac{\partial J}{\partial x_k} = \lim_{z\to 0}\frac{J(\mathbf{x} + z\mathbf{e}_k) - J(\mathbf{x})}{z} = \lim_{z\to 0}\frac{\mathbf{a}^T(\mathbf{x} + z\mathbf{e}_k) - \mathbf{a}^T\mathbf{x}}{z} = \mathbf{a}^T\mathbf{e}_k = a_k.$$
Thus, $\nabla_{\mathbf{x}} J = [a_1\ a_2\ \ldots\ a_{n-1}\ a_n]^T = \mathbf{a} \Longrightarrow \nabla_{\mathbf{x}}(\mathbf{a}^T\mathbf{x}) = \mathbf{a}$.

We move now to the complex-valued case. Consider vectors $\mathbf{x}, \mathbf{a} \in \mathbb{C}^{n\times 1}$:
$$\mathbf{x} = [x_1\ x_2\ \ldots\ x_n]^T = [x_{1,r}+ix_{1,i}\ \ x_{2,r}+ix_{2,i}\ \ \ldots\ \ x_{n,r}+ix_{n,i}]^T,$$
$$\mathbf{a} = [a_1\ a_2\ \ldots\ a_n]^T = [a_{1,r}+ia_{1,i}\ \ a_{2,r}+ia_{2,i}\ \ \ldots\ \ a_{n,r}+ia_{n,i}]^T.$$
Assume that we want to differentiate $J(\mathbf{x}) = \mathbf{a}^H\mathbf{x} + \mathbf{x}^H\mathbf{a}$ with respect to the real and imaginary parts of the elements of vector $\mathbf{x}$ and arrange the partial derivatives as
$$\nabla_{\mathbf{x}} J = \left[\frac{\partial J}{\partial x_{1,r}} + i\frac{\partial J}{\partial x_{1,i}},\ \ \frac{\partial J}{\partial x_{2,r}} + i\frac{\partial J}{\partial x_{2,i}},\ \ \ldots,\ \ \frac{\partial J}{\partial x_{n,r}} + i\frac{\partial J}{\partial x_{n,i}}\right]^T.$$
With real scalar $z$, we can compute $\frac{\partial J}{\partial x_{k,r}}$ and $\frac{\partial J}{\partial x_{k,i}}$ as
$$\frac{\partial J}{\partial x_{k,r}} = \lim_{z\to 0}\frac{J(\mathbf{x} + z\mathbf{e}_k) - J(\mathbf{x})}{z},\qquad \frac{\partial J}{\partial x_{k,i}} = \lim_{z\to 0}\frac{J(\mathbf{x} + zi\mathbf{e}_k) - J(\mathbf{x})}{z}.$$
For $J(\mathbf{x}) = \mathbf{a}^H\mathbf{x} + \mathbf{x}^H\mathbf{a}$:
$$\frac{\partial J}{\partial x_{k,r}} = \lim_{z\to 0}\frac{\mathbf{a}^H(\mathbf{x}+z\mathbf{e}_k) + (\mathbf{x}+z\mathbf{e}_k)^H\mathbf{a} - \mathbf{a}^H\mathbf{x} - \mathbf{x}^H\mathbf{a}}{z} = \lim_{z\to 0}\frac{z(\mathbf{a}^H\mathbf{e}_k + \mathbf{e}_k^H\mathbf{a})}{z} = a_k^* + a_k = 2a_{k,r},$$
$$\frac{\partial J}{\partial x_{k,i}} = \lim_{z\to 0}\frac{\mathbf{a}^H(\mathbf{x}+zi\mathbf{e}_k) + (\mathbf{x}+zi\mathbf{e}_k)^H\mathbf{a} - \mathbf{a}^H\mathbf{x} - \mathbf{x}^H\mathbf{a}}{z} = \lim_{z\to 0}\frac{iz(\mathbf{a}^H\mathbf{e}_k - \mathbf{e}_k^H\mathbf{a})}{z} = i(a_k^* - a_k) = 2a_{k,i}.$$
Hence,
$$\frac{\partial J}{\partial x_{k,r}} + i\frac{\partial J}{\partial x_{k,i}} = 2a_{k,r} + i\,2a_{k,i} = 2a_k.$$
Thus,
$$\nabla_{\mathbf{x}} J = 2[a_1\ a_2\ \ldots\ a_n]^T = 2\mathbf{a} \;\Longrightarrow\; \nabla_{\mathbf{x}}(\mathbf{a}^H\mathbf{x} + \mathbf{x}^H\mathbf{a}) = 2\mathbf{a}.$$

Suppose we want to differentiate the quadratic form $J(\mathbf{x}) = \mathbf{x}^T\mathbf{A}\mathbf{x}$, where $\mathbf{A} \in \mathbb{R}^{n\times n}$, with respect to the elements of vector $\mathbf{x} \in \mathbb{R}^{n\times 1}$:
$$\frac{\partial J}{\partial x_k} = \lim_{z\to 0}\frac{(\mathbf{x}+z\mathbf{e}_k)^T\mathbf{A}(\mathbf{x}+z\mathbf{e}_k) - \mathbf{x}^T\mathbf{A}\mathbf{x}}{z} = \lim_{z\to 0}\frac{z(\mathbf{e}_k^T\mathbf{A}\mathbf{x} + \mathbf{x}^T\mathbf{A}\mathbf{e}_k) + z^2\,\mathbf{e}_k^T\mathbf{A}\mathbf{e}_k}{z} = \mathbf{e}_k^T\mathbf{A}\mathbf{x} + \mathbf{x}^T\mathbf{A}\mathbf{e}_k = \mathbf{e}_k^T\mathbf{A}\mathbf{x} + \mathbf{e}_k^T\mathbf{A}^T\mathbf{x} = \mathbf{A}_{\mathrm{row},k}\,\mathbf{x} + \mathbf{A}_{\mathrm{col},k}^T\,\mathbf{x}.$$
Stacking the partial derivatives,
$$\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{A}\mathbf{x}) = \begin{bmatrix}\mathbf{A}_{\mathrm{row},1}\,\mathbf{x}\\ \vdots\\ \mathbf{A}_{\mathrm{row},n}\,\mathbf{x}\end{bmatrix} + \begin{bmatrix}\mathbf{A}_{\mathrm{col},1}^T\,\mathbf{x}\\ \vdots\\ \mathbf{A}_{\mathrm{col},n}^T\,\mathbf{x}\end{bmatrix} = (\mathbf{A} + \mathbf{A}^T)\,\mathbf{x},$$
and $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{A}\mathbf{x}) = 2\mathbf{A}\mathbf{x}$ when $\mathbf{A}$ is symmetric.
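A finite-difference check of $\nabla_{\mathbf{x}}(\mathbf{x}^T\mathbf{A}\mathbf{x}) = (\mathbf{A}+\mathbf{A}^T)\mathbf{x}$ (NumPy; step size and seed are arbitrary):

import numpy as np

rng = np.random.default_rng(5)
n = 4
A = rng.normal(size=(n, n))
x = rng.normal(size=n)

J = lambda v: v @ A @ v
h = 1e-6
num_grad = np.array([(J(x + h * e) - J(x - h * e)) / (2 * h) for e in np.eye(n)])
assert np.allclose(num_grad, (A + A.T) @ x, atol=1e-5)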

For the complex case, $J(\mathbf{x}) = \mathbf{x}^H\mathbf{A}\mathbf{x}$, where $\mathbf{A} \in \mathbb{C}^{n\times n}$ and $\mathbf{x} \in \mathbb{C}^{n\times 1}$:
$$\frac{\partial J}{\partial x_{k,r}} = \lim_{z\to 0}\frac{(\mathbf{x}+z\mathbf{e}_k)^H\mathbf{A}(\mathbf{x}+z\mathbf{e}_k) - \mathbf{x}^H\mathbf{A}\mathbf{x}}{z} = \lim_{z\to 0}\frac{z(\mathbf{e}_k^H\mathbf{A}\mathbf{x} + \mathbf{x}^H\mathbf{A}\mathbf{e}_k) + z^2\,\mathbf{e}_k^H\mathbf{A}\mathbf{e}_k}{z} = \mathbf{e}_k^H\mathbf{A}\mathbf{x} + \mathbf{x}^H\mathbf{A}\mathbf{e}_k,$$
$$\frac{\partial J}{\partial x_{k,i}} = \lim_{z\to 0}\frac{(\mathbf{x}+zi\mathbf{e}_k)^H\mathbf{A}(\mathbf{x}+zi\mathbf{e}_k) - \mathbf{x}^H\mathbf{A}\mathbf{x}}{z} = \lim_{z\to 0}\frac{iz(-\mathbf{e}_k^H\mathbf{A}\mathbf{x} + \mathbf{x}^H\mathbf{A}\mathbf{e}_k) + z^2\,\mathbf{e}_k^H\mathbf{A}\mathbf{e}_k}{z} = -i\,\mathbf{e}_k^H\mathbf{A}\mathbf{x} + i\,\mathbf{x}^H\mathbf{A}\mathbf{e}_k.$$
Hence,
$$\frac{\partial J}{\partial x_{k,r}} + i\frac{\partial J}{\partial x_{k,i}} = 2\,\mathbf{e}_k^H\mathbf{A}\mathbf{x} = 2\,\mathbf{A}_{\mathrm{row},k}\,\mathbf{x},$$
and, stacking,
$$\nabla_{\mathbf{x}} J = 2\begin{bmatrix}\mathbf{A}_{\mathrm{row},1}\,\mathbf{x}\\ \vdots\\ \mathbf{A}_{\mathrm{row},n}\,\mathbf{x}\end{bmatrix} = 2\mathbf{A}\mathbf{x} \;\Longrightarrow\; \nabla_{\mathbf{x}}(\mathbf{x}^H\mathbf{A}\mathbf{x}) = 2\mathbf{A}\mathbf{x}.$$

Now we investigate an example involving optimization variables arranged in a matrix.

Consider a matrix $\mathbf{X} \in \mathbb{R}^{n\times n}$ and $J(\mathbf{X}) = \det(\mathbf{X})$. Let $\mathbf{X}(k,v)$ be the element of $\mathbf{X}$ on its $k$th row and $v$th column, and let $\mathbf{e}_{k,v}$ be the matrix with one in the $k$th row and $v$th column and zeros elsewhere.
$$\frac{\partial J}{\partial \mathbf{X}(k,v)} = \lim_{z\to 0}\frac{J(\mathbf{X} + z\mathbf{e}_{k,v}) - J(\mathbf{X})}{z} = \lim_{z\to 0}\frac{\det(\mathbf{X} + z\mathbf{e}_{k,v}) - \det(\mathbf{X})}{z}.$$
We can expand both determinants along the $k$th row; all terms cancel except those involving $\mathbf{X}(k,v)$, so the numerator is
$$(-1)^{k+v}\big(\mathbf{X}(k,v) + z\big)M_{k,v} - (-1)^{k+v}\mathbf{X}(k,v)\,M_{k,v} = (-1)^{k+v}z\,M_{k,v},$$
where the minor $M_{k,v}$ is the determinant of $\mathbf{X}$ after removing the $k$th row and the $v$th column. Thus,
$$\frac{\partial J}{\partial \mathbf{X}(k,v)} = (-1)^{k+v}M_{k,v},$$
$$\nabla_{\mathbf{X}} J = \begin{bmatrix}\frac{\partial J}{\partial \mathbf{X}(1,1)} & \cdots & \frac{\partial J}{\partial \mathbf{X}(1,n)}\\ \vdots & \ddots & \vdots\\ \frac{\partial J}{\partial \mathbf{X}(n,1)} & \cdots & \frac{\partial J}{\partial \mathbf{X}(n,n)}\end{bmatrix} = \begin{bmatrix}M_{1,1} & \cdots & (-1)^{1+n}M_{1,n}\\ \vdots & \ddots & \vdots\\ (-1)^{n+1}M_{n,1} & \cdots & M_{n,n}\end{bmatrix}.$$
This is the cofactor matrix, equal to $\operatorname{adj}(\mathbf{X})^T$, where $\operatorname{adj}$ denotes the adjugate. If $\mathbf{X}$ is invertible, $\mathbf{X}^{-1} = \frac{1}{\det(\mathbf{X})}\operatorname{adj}(\mathbf{X})$. Hence, $\nabla_{\mathbf{X}}\det(\mathbf{X}) = \operatorname{adj}(\mathbf{X})^T$ and, if $\mathbf{X}$ is invertible, $\nabla_{\mathbf{X}}\det(\mathbf{X}) = \operatorname{adj}(\mathbf{X})^T = \det(\mathbf{X})\,\mathbf{X}^{-T}$.

Finally, let $\mathbf{X} \in \mathbb{R}^{n\times n}$, $\mathbf{a}, \mathbf{b} \in \mathbb{R}^{n\times 1}$, and $J(\mathbf{X}) = \mathbf{a}^T\mathbf{X}\mathbf{b}$, with $\mathbf{a} = [a_1\ a_2\ \ldots\ a_n]^T$ and $\mathbf{b} = [b_1\ b_2\ \ldots\ b_n]^T$.
$$\frac{\partial J}{\partial \mathbf{X}(k,v)} = \lim_{z\to 0}\frac{\mathbf{a}^T(\mathbf{X} + z\mathbf{e}_{k,v})\mathbf{b} - \mathbf{a}^T\mathbf{X}\mathbf{b}}{z} = \mathbf{a}^T\mathbf{e}_{k,v}\,\mathbf{b} = a_k b_v,$$
$$\nabla_{\mathbf{X}} J = \begin{bmatrix}a_1 b_1 & a_1 b_2 & \cdots\\ a_2 b_1 & a_2 b_2 & \cdots\\ \vdots & \vdots & \ddots\end{bmatrix} = \mathbf{a}\mathbf{b}^T \;\Longrightarrow\; \nabla_{\mathbf{X}}(\mathbf{a}^T\mathbf{X}\mathbf{b}) = \mathbf{a}\mathbf{b}^T.$$
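Both matrix-gradient identities can be spot-checked numerically; the sketch below verifies $\nabla_{\mathbf{X}}\det(\mathbf{X}) = \det(\mathbf{X})\,\mathbf{X}^{-T}$ by finite differences (NumPy; the shift toward the identity just keeps $\mathbf{X}$ comfortably invertible):

import numpy as np

rng = np.random.default_rng(6)
n = 3
X = rng.normal(size=(n, n)) + n * np.eye(n)   # comfortably invertible

h = 1e-6
num_grad = np.zeros((n, n))
for k in range(n):
    for v in range(n):
        E = np.zeros((n, n))
        E[k, v] = 1.0
        num_grad[k, v] = (np.linalg.det(X + h * E) - np.linalg.det(X - h * E)) / (2 * h)

assert np.allclose(num_grad, np.linalg.det(X) * np.linalg.inv(X).T, atol=1e-6)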
Convex and Concave Functions

Let $\mathbf{x} = [x_1\ x_2\ \ldots\ x_{N-1}\ x_N]^T \in \mathbb{R}^{N\times 1}$.

A set $S \subseteq \mathbb{R}^{N\times 1}$ is convex if, for every $\mathbf{x}, \mathbf{y} \in S$, $\lambda\mathbf{x} + (1-\lambda)\mathbf{y} \in S$ for every $\lambda \in [0,1]$.

Let $g(\mathbf{x})$ be a function defined over the domain $\mathcal{D} \subseteq \mathbb{R}^{N\times 1}$. The function $g(\mathbf{x}): \mathcal{D} \to \mathbb{R}$ is convex if its domain $\mathcal{D}$ is a convex set and it satisfies
$$g(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}) \le \lambda g(\mathbf{x}) + (1-\lambda)g(\mathbf{y})\quad\text{for every } \mathbf{x}, \mathbf{y} \in \mathcal{D} \text{ and } \lambda \in [0,1].$$
A function $g(\mathbf{x})$ is concave if its domain is a convex set and it satisfies
$$g(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}) \ge \lambda g(\mathbf{x}) + (1-\lambda)g(\mathbf{y})\quad\text{for every } \mathbf{x}, \mathbf{y} \in \mathcal{D} \text{ and } \lambda \in [0,1].$$

Examples:

$g(x) = x^2$ is convex over $\mathbb{R}$.

$g(x) = \ln x$ is concave over $\mathbb{R}^+$.

$g(x) = \frac{1}{x}$ is convex over $\mathbb{R}^+$.

$g(\mathbf{x}) = x_1^2 + x_2^2$ is convex over $\mathbb{R}^2$.

Differentiable Convex Functions

We now consider a differentiable function $g(\mathbf{x})$ on $\mathcal{D}$. Given $\mathbf{z} \in \mathcal{D}$, the sentence "$g$ is differentiable at $\mathbf{z}$" is meaningful only if $g$ is defined at least in a neighborhood of $\mathbf{z}$. It is therefore natural to assume that $\mathcal{D}$ is contained in an open set on which $g$ is differentiable, or simply that $\mathcal{D}$ is open. This means that for any point in $\mathcal{D}$, there is a neighborhood that lies completely in $\mathcal{D}$.

We make use of the following function:
$$\varphi(\lambda) = g\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big),\quad \lambda \in [0,1].$$
If $g(\mathbf{x})$ is continuously differentiable and $\mathbf{z} = [z_1\ z_2\ \ldots\ z_N]^T = \lambda\mathbf{x} + (1-\lambda)\mathbf{y}$, then, by the chain rule, the first derivative of $\varphi(\lambda)$ with respect to $\lambda$ is
$$\varphi^{(1)}(\lambda) = \sum_{k=1}^N \frac{\partial g}{\partial z_k}\frac{\partial z_k}{\partial \lambda} = \sum_{k=1}^N \frac{\partial g}{\partial z_k}(x_k - y_k) = \nabla^T g\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big)(\mathbf{x}-\mathbf{y}),$$
where $\nabla g(\mathbf{x}) = \left[\frac{\partial g}{\partial x_1}\ \ \frac{\partial g}{\partial x_2}\ \ \ldots\ \ \frac{\partial g}{\partial x_N}\right]^T$ with the partial derivatives evaluated at $\mathbf{x}$.

If $g(\mathbf{x})$ is twice continuously differentiable, then the second derivative of $\varphi(\lambda)$ with respect to $\lambda$ is
$$\varphi^{(2)}(\lambda) = \frac{d}{d\lambda}\sum_{k=1}^N \frac{\partial g}{\partial z_k}(x_k - y_k) = \sum_{k=1}^N (x_k - y_k)\frac{d}{d\lambda}\frac{\partial g}{\partial z_k} = \sum_{k=1}^N (x_k - y_k)\sum_{j=1}^N \frac{\partial^2 g}{\partial z_k\partial z_j}\frac{\partial z_j}{\partial\lambda} = \sum_{k=1}^N\sum_{j=1}^N (x_k - y_k)(x_j - y_j)\frac{\partial^2 g}{\partial z_k\partial z_j}.$$
That is, $\varphi^{(2)}(\lambda) = (\mathbf{x}-\mathbf{y})^T\mathcal{H}\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big)(\mathbf{x}-\mathbf{y})$, where the Hessian matrix is
$$\mathcal{H}(\mathbf{x}) = \begin{bmatrix}\frac{\partial^2 g}{\partial x_1^2} & \frac{\partial^2 g}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2 g}{\partial x_1\partial x_N}\\ \frac{\partial^2 g}{\partial x_2\partial x_1} & \frac{\partial^2 g}{\partial x_2^2} & \cdots & \frac{\partial^2 g}{\partial x_2\partial x_N}\\ \vdots & & \ddots & \vdots\\ \frac{\partial^2 g}{\partial x_N\partial x_1} & \frac{\partial^2 g}{\partial x_N\partial x_2} & \cdots & \frac{\partial^2 g}{\partial x_N^2}\end{bmatrix}.$$

We now show that $g(\mathbf{x})$ is convex iff $\varphi(\lambda)$ is convex. If $g(\mathbf{x})$ is convex, then for $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ and $\alpha, \lambda_1, \lambda_2 \in [0,1]$:
$$\varphi\big(\alpha\lambda_1 + (1-\alpha)\lambda_2\big) = g\Big(\big(\alpha\lambda_1 + (1-\alpha)\lambda_2\big)\mathbf{x} + \big(1 - \alpha\lambda_1 - (1-\alpha)\lambda_2\big)\mathbf{y}\Big) = g\Big(\alpha\big(\lambda_1\mathbf{x} + (1-\lambda_1)\mathbf{y}\big) + (1-\alpha)\big(\lambda_2\mathbf{x} + (1-\lambda_2)\mathbf{y}\big)\Big) \le \alpha g\big(\lambda_1\mathbf{x} + (1-\lambda_1)\mathbf{y}\big) + (1-\alpha)g\big(\lambda_2\mathbf{x} + (1-\lambda_2)\mathbf{y}\big) = \alpha\varphi(\lambda_1) + (1-\alpha)\varphi(\lambda_2) \;\Longrightarrow\; \varphi(\lambda) \text{ is convex.}$$
If $\varphi(\lambda)$ is convex, then for $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ and $\alpha, \lambda_1, \lambda_2 \in [0,1]$:
$$g\Big(\alpha\big(\lambda_1\mathbf{x} + (1-\lambda_1)\mathbf{y}\big) + (1-\alpha)\big(\lambda_2\mathbf{x} + (1-\lambda_2)\mathbf{y}\big)\Big) = \varphi\big(\alpha\lambda_1 + (1-\alpha)\lambda_2\big) \le \alpha\varphi(\lambda_1) + (1-\alpha)\varphi(\lambda_2) = \alpha g\big(\lambda_1\mathbf{x} + (1-\lambda_1)\mathbf{y}\big) + (1-\alpha)g\big(\lambda_2\mathbf{x} + (1-\lambda_2)\mathbf{y}\big) \;\Longrightarrow\; g(\mathbf{x}) \text{ is convex.}$$

First-Order Condition of Convexity

Assume that $g(\mathbf{x})$ is differentiable with continuous first-order partial derivatives. Function $g(\mathbf{x})$ is convex iff $g(\mathbf{x}) \ge g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y})$ for every $\mathbf{x}, \mathbf{y} \in \mathcal{D}$.

Let us now prove the equivalence between this condition and the main definition of convexity. Since $g(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}) \le \lambda g(\mathbf{x}) + (1-\lambda)g(\mathbf{y})$ for every $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ and $\lambda \in [0,1]$, we have $\varphi(\lambda) \le \lambda\varphi(1) + (1-\lambda)\varphi(0)$. Thus, for $\lambda > 0$,
$$\varphi(1) - \varphi(0) = g(\mathbf{x}) - g(\mathbf{y}) \ge \frac{1}{\lambda}\big(\varphi(\lambda) - \varphi(0)\big).$$
Therefore,
$$g(\mathbf{x}) - g(\mathbf{y}) \ge \lim_{\lambda\downarrow 0}\frac{1}{\lambda}\big(\varphi(\lambda) - \varphi(0)\big) = \varphi^{(1)}(0) = \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) \;\Longrightarrow\; g(\mathbf{x}) \ge g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y})$$
for every $\mathbf{x}, \mathbf{y} \in \mathcal{D}$. By symmetry, $g(\mathbf{y}) \ge g(\mathbf{x}) + \nabla^T g(\mathbf{x})(\mathbf{y}-\mathbf{x})$. Adding the two inequalities, we obtain $0 \ge \big(\nabla^T g(\mathbf{y}) - \nabla^T g(\mathbf{x})\big)(\mathbf{x}-\mathbf{y})$, i.e.,
$$\big(\nabla^T g(\mathbf{x}) - \nabla^T g(\mathbf{y})\big)(\mathbf{x}-\mathbf{y}) \ge 0\quad\text{for every } \mathbf{x}, \mathbf{y} \in \mathcal{D}.$$

To prove that the first-order condition implies convexity, we proceed as follows. If $g(\mathbf{x}) \ge g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y})$ for every $\mathbf{x}, \mathbf{y} \in \mathcal{D}$, then, considering $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3 \in \mathcal{D}$:
$$g(\mathbf{x}_1) \ge g(\mathbf{x}_3) + \nabla^T g(\mathbf{x}_3)(\mathbf{x}_1 - \mathbf{x}_3),$$
$$g(\mathbf{x}_2) \ge g(\mathbf{x}_3) + \nabla^T g(\mathbf{x}_3)(\mathbf{x}_2 - \mathbf{x}_3).$$
Multiplying the first inequality by $\lambda \in [0,1]$ and the second by $1-\lambda$, and adding:
$$\lambda g(\mathbf{x}_1) + (1-\lambda)g(\mathbf{x}_2) \ge g(\mathbf{x}_3) + \nabla^T g(\mathbf{x}_3)\big(\lambda\mathbf{x}_1 + (1-\lambda)\mathbf{x}_2 - \mathbf{x}_3\big).$$
Since $\mathcal{D}$ is a convex set, $\lambda\mathbf{x}_1 + (1-\lambda)\mathbf{x}_2 \in \mathcal{D}$, and we can set $\mathbf{x}_3 = \lambda\mathbf{x}_1 + (1-\lambda)\mathbf{x}_2$. Consequently, $g\big(\lambda\mathbf{x}_1 + (1-\lambda)\mathbf{x}_2\big) \le \lambda g(\mathbf{x}_1) + (1-\lambda)g(\mathbf{x}_2)$.

Second-Order Condition of Convexity

Now we tackle the case where $g(\mathbf{x})$ is twice continuously differentiable, i.e., it has continuous second-order partial derivatives.

Function $g(\mathbf{x})$ is convex iff $\mathcal{H}(\mathbf{x})$ is positive semidefinite for every $\mathbf{x} \in \mathcal{D}$.

If the Hessian matrix is positive semidefinite at all points of $\mathcal{D}$, then for every $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ there exists some $\lambda \in [0,1]$ such that
$$g(\mathbf{x}) = g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \frac{1}{2}(\mathbf{x}-\mathbf{y})^T\mathcal{H}\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big)(\mathbf{x}-\mathbf{y}) = g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \text{nonnegative term}.$$
That is, $g(\mathbf{x}) \ge g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y})$ for every $\mathbf{x}, \mathbf{y} \in \mathcal{D}$. Consequently, $g(\mathbf{x})$ is convex by virtue of the first-order condition.

We now prove that the convexity of $g(\mathbf{x})$ implies that $\mathcal{H}(\mathbf{x})$ is positive semidefinite for every $\mathbf{x} \in \mathcal{D}$. When $N = 1$, a convex function $g(x)$ with first derivative $g^{(1)}(x)$ satisfies $\big(g^{(1)}(x) - g^{(1)}(y)\big)(x-y) \ge 0$. If $x \ne y$, dividing by $(x-y)^2$, we obtain
$$\frac{g^{(1)}(x) - g^{(1)}(y)}{x-y} \ge 0.$$
As $y$ tends to $x$, the left-hand side approaches the second derivative, yielding $\frac{d^2 g(x)}{dx^2} \ge 0$.

In general, if $g(\mathbf{x})$ is convex, then the scalar-valued $\varphi(\lambda)$ is convex and its second derivative, $\varphi^{(2)}(\lambda)$, is nonnegative. Hence, for $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ and $\lambda \in [0,1]$:
$$(\mathbf{x}-\mathbf{y})^T\mathcal{H}\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big)(\mathbf{x}-\mathbf{y}) \ge 0.$$
Consider a point $\mathbf{z} \in \mathcal{D}$ at which we want to show that $\mathcal{H}(\mathbf{z})$ is positive semidefinite. Since the domain is open, there is a neighborhood about $\mathbf{z}$ that lies in $\mathcal{D}$. Given a certain direction $\mathbf{v}$, we can choose the two points $\mathbf{x} = \mathbf{z} + t\mathbf{v}$ and $\mathbf{y} = \mathbf{z} - t\mathbf{v}$, where $t$ is a positive real-valued parameter chosen such that $\mathbf{z} + t\mathbf{v}, \mathbf{z} - t\mathbf{v} \in \mathcal{D}$. Since $\varphi^{(2)}(\lambda) \ge 0$ for $\lambda \in [0,1]$, in particular $\varphi^{(2)}(\tfrac{1}{2}) \ge 0$. Thus, for $\lambda = \tfrac{1}{2}$, $t \in \mathbb{R}^+$, $\mathbf{x} = \mathbf{z} + t\mathbf{v}$, $\mathbf{y} = \mathbf{z} - t\mathbf{v}$, and $\mathbf{x}, \mathbf{y} \in \mathcal{D}$:
$$(\mathbf{x}-\mathbf{y})^T\mathcal{H}\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big)(\mathbf{x}-\mathbf{y}) = 4t^2\,\mathbf{v}^T\mathcal{H}(\mathbf{z})\,\mathbf{v} \ge 0 \;\Longrightarrow\; \mathbf{v}^T\mathcal{H}(\mathbf{z})\,\mathbf{v} \ge 0.$$
This is true for all $\mathbf{v} \in \mathbb{R}^{N\times 1}$. Thus, $\mathcal{H}(\mathbf{x})$ is positive semidefinite for all $\mathbf{x} \in \mathcal{D}$.
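The second-order condition can be probed numerically for a standard convex function such as $g(\mathbf{x}) = \ln\sum_k e^{x_k}$ (log-sum-exp); the sketch below builds the Hessian by central differences and checks that its smallest eigenvalue is nonnegative up to numerical error:

import numpy as np

def g(x):
    return np.log(np.sum(np.exp(x)))     # log-sum-exp, a standard convex function

rng = np.random.default_rng(7)
x0 = rng.normal(size=3)
n, h = x0.size, 1e-4
H = np.zeros((n, n))
I = np.eye(n)
for a in range(n):
    for b in range(n):
        H[a, b] = (g(x0 + h * I[a] + h * I[b]) - g(x0 + h * I[a] - h * I[b])
                   - g(x0 - h * I[a] + h * I[b]) + g(x0 - h * I[a] - h * I[b])) / (4 * h * h)

eigs = np.linalg.eigvalsh((H + H.T) / 2)
assert eigs.min() >= -1e-6               # PSD up to numerical error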

Strictly Convex Functions

A strictly convex function $g(\mathbf{x})$ satisfies: for every $\mathbf{x}, \mathbf{y} \in \mathcal{D}$, $\mathbf{x} \ne \mathbf{y}$, and $\lambda \in (0,1)$,
$$g\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big) < \lambda g(\mathbf{x}) + (1-\lambda)g(\mathbf{y}).$$
As in the preceding section, it can be shown that strict convexity is necessary and sufficient for
$$g(\mathbf{x}) > g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y})\quad\text{for every } \mathbf{x}, \mathbf{y} \in \mathcal{D},\ \mathbf{x} \ne \mathbf{y}.$$
If the Hessian matrix is positive definite at all points of $\mathcal{D}$, then for $\mathbf{x}, \mathbf{y} \in \mathcal{D}$, $\mathbf{x} \ne \mathbf{y}$, there exists some $\lambda \in [0,1]$ such that
$$g(\mathbf{x}) = g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \frac{1}{2}(\mathbf{x}-\mathbf{y})^T\mathcal{H}\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big)(\mathbf{x}-\mathbf{y}) = g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \text{positive term}.$$
That is, $g(\mathbf{x}) > g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y})$ for every $\mathbf{x}, \mathbf{y} \in \mathcal{D}$, $\mathbf{x} \ne \mathbf{y}$. Consequently, $g(\mathbf{x})$ is strictly convex by virtue of the first-order condition.

The converse is not necessarily true. For example, the one-dimensional function $g(x) = x^4$ is strictly convex although its second derivative is zero at $x = 0$.

Strongly Convex Functions

A strongly convex function $g(\mathbf{x})$ satisfies, for some $\eta > 0$ and every $\mathbf{x}, \mathbf{y} \in \mathcal{D}$, $\lambda \in [0,1]$:
$$g\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big) \le \lambda g(\mathbf{x}) + (1-\lambda)g(\mathbf{y}) - \frac{1}{2}\eta\lambda(1-\lambda)\|\mathbf{x}-\mathbf{y}\|_2^2.$$

It is clear that strong convexity implies strict convexity. Now we repeat the above derivations for differentiable convex functions in the case where the function is strongly convex.

Since $g(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}) \le \lambda g(\mathbf{x}) + (1-\lambda)g(\mathbf{y}) - \frac{1}{2}\eta\lambda(1-\lambda)\|\mathbf{x}-\mathbf{y}\|_2^2$ for every $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ and $\lambda \in [0,1]$, we have $\varphi(\lambda) \le \lambda\varphi(1) + (1-\lambda)\varphi(0) - \frac{1}{2}\eta\lambda(1-\lambda)\|\mathbf{x}-\mathbf{y}\|_2^2$. Thus, for $\lambda > 0$,
$$\varphi(1) - \varphi(0) = g(\mathbf{x}) - g(\mathbf{y}) \ge \frac{1}{\lambda}\big(\varphi(\lambda) - \varphi(0)\big) + \frac{1}{2}\eta(1-\lambda)\|\mathbf{x}-\mathbf{y}\|_2^2.$$
Thus,
$$g(\mathbf{x}) - g(\mathbf{y}) \ge \lim_{\lambda\downarrow 0}\left[\frac{1}{\lambda}\big(\varphi(\lambda) - \varphi(0)\big) + \frac{1}{2}\eta(1-\lambda)\|\mathbf{x}-\mathbf{y}\|_2^2\right] = \varphi^{(1)}(0) + \frac{1}{2}\eta\|\mathbf{x}-\mathbf{y}\|_2^2 = \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \frac{1}{2}\eta\|\mathbf{x}-\mathbf{y}\|_2^2.$$
Therefore,
$$g(\mathbf{x}) \ge g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \frac{1}{2}\eta\|\mathbf{x}-\mathbf{y}\|_2^2.$$
Similarly,
$$g(\mathbf{y}) \ge g(\mathbf{x}) + \nabla^T g(\mathbf{x})(\mathbf{y}-\mathbf{x}) + \frac{1}{2}\eta\|\mathbf{x}-\mathbf{y}\|_2^2.$$
Adding, $\nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \nabla^T g(\mathbf{x})(\mathbf{y}-\mathbf{x}) + \eta\|\mathbf{x}-\mathbf{y}\|_2^2 \le 0$, i.e.,
$$\big(\nabla g(\mathbf{x}) - \nabla g(\mathbf{y})\big)^T(\mathbf{x}-\mathbf{y}) \ge \eta\|\mathbf{x}-\mathbf{y}\|_2^2.$$

Now suppose we have a function $g(\mathbf{x})$ that satisfies
$$g(\mathbf{x}) \ge g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \frac{1}{2}\eta\|\mathbf{x}-\mathbf{y}\|_2^2.$$
We can show that the function satisfies the main definition of strong convexity. Consider $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3 \in \mathcal{D}$:
$$g(\mathbf{x}_1) \ge g(\mathbf{x}_3) + \nabla^T g(\mathbf{x}_3)(\mathbf{x}_1 - \mathbf{x}_3) + \frac{1}{2}\eta\|\mathbf{x}_1 - \mathbf{x}_3\|_2^2,$$
$$g(\mathbf{x}_2) \ge g(\mathbf{x}_3) + \nabla^T g(\mathbf{x}_3)(\mathbf{x}_2 - \mathbf{x}_3) + \frac{1}{2}\eta\|\mathbf{x}_2 - \mathbf{x}_3\|_2^2.$$
Multiplying by $\lambda$ and $1-\lambda$, respectively, and adding:
$$\lambda g(\mathbf{x}_1) + (1-\lambda)g(\mathbf{x}_2) \ge g(\mathbf{x}_3) + \nabla^T g(\mathbf{x}_3)\big(\lambda\mathbf{x}_1 + (1-\lambda)\mathbf{x}_2 - \mathbf{x}_3\big) + \frac{1}{2}\eta\lambda\|\mathbf{x}_1 - \mathbf{x}_3\|_2^2 + \frac{1}{2}\eta(1-\lambda)\|\mathbf{x}_2 - \mathbf{x}_3\|_2^2.$$
Choosing $\mathbf{x}_3 = \lambda\mathbf{x}_1 + (1-\lambda)\mathbf{x}_2$ (so that $\mathbf{x}_1 - \mathbf{x}_3 = (1-\lambda)(\mathbf{x}_1 - \mathbf{x}_2)$ and $\mathbf{x}_2 - \mathbf{x}_3 = -\lambda(\mathbf{x}_1 - \mathbf{x}_2)$),
$$\lambda g(\mathbf{x}_1) + (1-\lambda)g(\mathbf{x}_2) \ge g\big(\lambda\mathbf{x}_1 + (1-\lambda)\mathbf{x}_2\big) + \frac{1}{2}\eta\lambda(1-\lambda)\|\mathbf{x}_1 - \mathbf{x}_2\|_2^2.$$

Assume that $g(\mathbf{x})$ is twice continuously differentiable and $\mathcal{H}(\mathbf{x}) - \eta\mathbf{I}_{N\times N}$ is positive semidefinite over $\mathcal{D}$. For $\mathbf{x}, \mathbf{y} \in \mathcal{D}$, there exists some $\lambda \in [0,1]$ such that
$$g(\mathbf{x}) = g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \frac{1}{2}(\mathbf{x}-\mathbf{y})^T\mathcal{H}\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big)(\mathbf{x}-\mathbf{y}).$$
Since $(\mathbf{x}-\mathbf{y})^T\big(\mathcal{H}(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}) - \eta\mathbf{I}_{N\times N}\big)(\mathbf{x}-\mathbf{y}) \ge 0$ for all points in the domain, we get
$$g(\mathbf{x}) \ge g(\mathbf{y}) + \nabla^T g(\mathbf{y})(\mathbf{x}-\mathbf{y}) + \frac{1}{2}\eta\|\mathbf{x}-\mathbf{y}\|_2^2,$$
and $g(\mathbf{x})$ is strongly convex.

When $N = 1$, a strongly convex function $g(x)$ with first derivative $g^{(1)}(x)$ satisfies $\big(g^{(1)}(x) - g^{(1)}(y)\big)(x-y) \ge \eta(x-y)^2$. If $x \ne y$, dividing by $(x-y)^2$, we obtain
$$\frac{g^{(1)}(x) - g^{(1)}(y)}{x-y} \ge \eta.$$
As $y$ tends to $x$, the left-hand side approaches the second derivative, yielding $\frac{d^2 g(x)}{dx^2} \ge \eta$, i.e., $\frac{d^2 g(x)}{dx^2} - \eta \ge 0$.

We use again the function $\varphi(\lambda) = g\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big)$, $\lambda \in [0,1]$. We first show that $g(\mathbf{x})$ is strongly convex iff $\varphi(\lambda)$ is strongly convex. If $g(\mathbf{x})$ is strongly convex with parameter $\eta$, then for $\mathbf{x}, \mathbf{y} \in \mathcal{D}$, $\mathbf{x} \ne \mathbf{y}$, and $\alpha, \lambda_1, \lambda_2 \in [0,1]$:
$$\varphi\big(\alpha\lambda_1 + (1-\alpha)\lambda_2\big) = g\Big(\alpha\big(\lambda_1\mathbf{x} + (1-\lambda_1)\mathbf{y}\big) + (1-\alpha)\big(\lambda_2\mathbf{x} + (1-\lambda_2)\mathbf{y}\big)\Big) \le \alpha g\big(\lambda_1\mathbf{x} + (1-\lambda_1)\mathbf{y}\big) + (1-\alpha)g\big(\lambda_2\mathbf{x} + (1-\lambda_2)\mathbf{y}\big) - \frac{1}{2}\eta\alpha(1-\alpha)(\lambda_1 - \lambda_2)^2\|\mathbf{x}-\mathbf{y}\|_2^2 = \alpha\varphi(\lambda_1) + (1-\alpha)\varphi(\lambda_2) - \frac{1}{2}\eta\|\mathbf{x}-\mathbf{y}\|_2^2\,\alpha(1-\alpha)(\lambda_1 - \lambda_2)^2,$$
so $\varphi(\lambda)$ is strongly convex with parameter $\eta\|\mathbf{x}-\mathbf{y}\|_2^2$. (Here we used the fact that the two combined points, $\lambda_1\mathbf{x} + (1-\lambda_1)\mathbf{y}$ and $\lambda_2\mathbf{x} + (1-\lambda_2)\mathbf{y}$, differ by $(\lambda_1 - \lambda_2)(\mathbf{x}-\mathbf{y})$.)

If $\varphi(\lambda)$ is strongly convex with parameter $\eta\|\mathbf{x}-\mathbf{y}\|_2^2$ for $\mathbf{x}, \mathbf{y} \in \mathcal{D}$, then for $\alpha, \lambda_1, \lambda_2 \in [0,1]$:
$$g\Big(\alpha\big(\lambda_1\mathbf{x} + (1-\lambda_1)\mathbf{y}\big) + (1-\alpha)\big(\lambda_2\mathbf{x} + (1-\lambda_2)\mathbf{y}\big)\Big) = \varphi\big(\alpha\lambda_1 + (1-\alpha)\lambda_2\big) \le \alpha\varphi(\lambda_1) + (1-\alpha)\varphi(\lambda_2) - \frac{1}{2}\eta\|\mathbf{x}-\mathbf{y}\|_2^2\,\alpha(1-\alpha)(\lambda_1 - \lambda_2)^2 = \alpha g\big(\lambda_1\mathbf{x} + (1-\lambda_1)\mathbf{y}\big) + (1-\alpha)g\big(\lambda_2\mathbf{x} + (1-\lambda_2)\mathbf{y}\big) - \frac{1}{2}\eta\alpha(1-\alpha)(\lambda_1 - \lambda_2)^2\|\mathbf{x}-\mathbf{y}\|_2^2 \;\Longrightarrow\; g(\mathbf{x}) \text{ is strongly convex.}$$

If $g(\mathbf{x})$ is strongly convex, then the scalar-valued $\varphi(\lambda)$ is strongly convex and its second derivative is greater than or equal to $\eta\|\mathbf{x}-\mathbf{y}\|_2^2$. Hence, for $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ and $\lambda \in [0,1]$:
$$(\mathbf{x}-\mathbf{y})^T\mathcal{H}\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big)(\mathbf{x}-\mathbf{y}) \ge \eta\|\mathbf{x}-\mathbf{y}\|_2^2.$$
Similarly to the case of ordinary convexity, we conclude that $\mathcal{H}(\mathbf{x}) - \eta\mathbf{I}_{N\times N}$ is positive semidefinite for all $\mathbf{x} \in \mathcal{D}$.

Convex Optimization Problems

Suppose we are interested in minimizing the convex function $f(\mathbf{x})$ with domain $\mathcal{D}$. If $\mathbf{x}^*$ is a local optimum, then it is a global optimum. We can prove this by contradiction. Since $\mathbf{x}^*$ is a local optimum, there exists an open neighborhood about $\mathbf{x}^*$ with radius $\epsilon$, denoted $N_\epsilon(\mathbf{x}^*)$, such that for all $\mathbf{x} \in N_\epsilon(\mathbf{x}^*)$, i.e., all $\mathbf{x}$ such that $\|\mathbf{x}-\mathbf{x}^*\|_2 < \epsilon$, we have $f(\mathbf{x}) \ge f(\mathbf{x}^*)$. Assume that $\mathbf{x}^*$ is not a global optimum; specifically, assume that there exists $\mathbf{z} \in \mathcal{D}$, $\mathbf{z} \notin N_\epsilon(\mathbf{x}^*)$, such that $f(\mathbf{z}) < f(\mathbf{x}^*)$.

For some $\lambda \in (0,1)$, $\mathbf{w} = \lambda\mathbf{z} + (1-\lambda)\mathbf{x}^* \in \mathcal{D}$ satisfies $\mathbf{w} \in N_\epsilon(\mathbf{x}^*)$; this is the case if $\lambda < \frac{\epsilon}{\|\mathbf{z}-\mathbf{x}^*\|_2}$. Due to convexity,
$$f(\mathbf{w}) = f\big(\lambda\mathbf{z} + (1-\lambda)\mathbf{x}^*\big) \le \lambda f(\mathbf{z}) + (1-\lambda)f(\mathbf{x}^*) < \lambda f(\mathbf{x}^*) + (1-\lambda)f(\mathbf{x}^*) = f(\mathbf{x}^*),$$
thereby violating the assumption that $\mathbf{x}^*$ is a local optimum.

A convex optimization problem can be cast in the form:
$$\min_{\mathbf{x}} f(\mathbf{x})$$
$$\text{subject to } h_i(\mathbf{x}) \le 0,\quad i = 1, \ldots, q,$$
$$g_j(\mathbf{x}) = 0,\quad j = 1, \ldots, p.$$
The functions $f(\mathbf{x})$ and $\{h_i(\mathbf{x})\}_{i=1}^q$ are convex, and the $\{g_j(\mathbf{x})\}_{j=1}^p$ are affine. Recall that a convex function has a domain which is a convex set. The feasible set, i.e., the set of all vectors $\mathbf{x}$ that satisfy the constraints, is a convex set. The problem is infeasible if there is no $\mathbf{x}$ that satisfies the constraints of the problem.

Assume that the optimal value of a feasible convex program is $p^* = \min_{\mathbf{x}} f(\mathbf{x})$. The set of optimal solutions, $\mathcal{O}$, is a convex set. If $\mathbf{x}, \mathbf{y} \in \mathcal{O}$, then $f(\mathbf{x}) = p^*$ and $f(\mathbf{y}) = p^*$. For $\lambda \in [0,1]$,
$$p^* \le f\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big) \le \lambda f(\mathbf{x}) + (1-\lambda)f(\mathbf{y}) = p^*.$$
Hence, $f\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big) = p^*$ for all $\mathbf{x}, \mathbf{y} \in \mathcal{O}$ and $\lambda \in [0,1]$; that is, $\mathcal{O}$ is a convex set. (If the problem is infeasible, $\mathcal{O}$ is empty; the empty set is considered convex.)

If $f(\mathbf{x})$ is strictly convex and $\mathbf{x}, \mathbf{y} \in \mathcal{O}$ with $\mathbf{x} \ne \mathbf{y}$, then for $\lambda \in (0,1)$, $f\big(\lambda\mathbf{x} + (1-\lambda)\mathbf{y}\big) < \lambda f(\mathbf{x}) + (1-\lambda)f(\mathbf{y}) = p^*$, thereby contradicting $p^* = \min_{\mathbf{x}} f(\mathbf{x})$. Hence, $\mathbf{x} = \mathbf{y}$: the optimal solution, if it exists, is unique when the objective function is strictly convex. If the objective function is convex, but not strictly convex, then the optimal solution may or may not be unique. For example, $\min_{\mathbf{x}\in\mathbb{R}^{n\times 1}} \|\mathbf{x}\|_2 = 0$ is achieved uniquely by the $n$-dimensional all-zero vector, despite the fact that $\|\mathbf{x}\|_2$ is not strictly convex (whereas $\|\mathbf{x}\|_2^2$ is).

Periodic Signals and Fourier Series

We focus here on one-dimensional signals. A signal $x(\alpha)$ is periodic if there exists $\Delta > 0$ such that $x(\alpha + \Delta) = x(\alpha)$ for all $\alpha$.

Note that we require $\Delta > 0$ because all signals satisfy $x(\alpha + 0) = x(\alpha)$.

The minimum value of $\Delta$, if it exists, satisfying $x(\alpha + \Delta) = x(\alpha)$ is called the fundamental, basic, or prime period. The minimum may not exist, as in the case of constant signals and, for instance, the signal $x(\alpha) = \mathbb{I}\{\alpha \in \mathbb{Q}\}$, where $\mathbb{Q}$ is the set of rational numbers.

The Fourier series of a periodic signal $x(\alpha)$ with period $\Delta$ is given by
$$x(\alpha) = a_0 + \sum_{k=1}^{\infty}\left[a_k\cos\left(2\pi\frac{k}{\Delta}\alpha\right) + b_k\sin\left(2\pi\frac{k}{\Delta}\alpha\right)\right],$$
where, for any $\beta \in \mathbb{R}$,
$$a_0 = \frac{1}{\Delta}\int_{\beta}^{\beta+\Delta} x(\alpha)\,d\alpha,$$
$$a_k = \frac{2}{\Delta}\int_{\beta}^{\beta+\Delta} x(\alpha)\cos\left(2\pi\frac{k}{\Delta}\alpha\right)d\alpha,\quad \forall k \in \{1, 2, 3, \ldots\},$$
$$b_k = \frac{2}{\Delta}\int_{\beta}^{\beta+\Delta} x(\alpha)\sin\left(2\pi\frac{k}{\Delta}\alpha\right)d\alpha,\quad \forall k \in \{1, 2, 3, \ldots\}.$$

The complex Fourier series is given by
$$x(\alpha) = \sum_{k=-\infty}^{\infty} c_k\exp\left(i2\pi\frac{k}{\Delta}\alpha\right),$$
where, for all integers $k$,
$$c_k = \frac{1}{\Delta}\int_{\beta}^{\beta+\Delta} x(\alpha)\exp\left(-i2\pi\frac{k}{\Delta}\alpha\right)d\alpha.$$
Note that $\beta$ is typically set to $0$, $-\Delta/2$, or whatever value is convenient for the evaluation of the integrals. If $x(\alpha)$ is real-valued, $c_k = c_{-k}^*$.
Important Note: The equality in the Fourier series is not pointwise equality over $\mathbb{R}$. For periodic $L^2(0,\Delta)$ signals with finite energy over a period, i.e., for which $\frac{1}{\Delta}\int_{\beta}^{\beta+\Delta}|x(\alpha)|^2\,d\alpha$ is finite, the equality should be understood as
$$\lim_{n\to\infty}\int_{\beta}^{\beta+\Delta}\left|x(\alpha) - \sum_{k=-n}^{n} c_k\exp\left(i2\pi\frac{k}{\Delta}\alpha\right)\right|^2 d\alpha = 0.$$
In 1966, Lennart Carleson proved the following pointwise almost-everywhere convergence result for finite-energy signals:
$$x(\alpha) = \lim_{n\to\infty}\sum_{k=-n}^{n} c_k\exp\left(i2\pi\frac{k}{\Delta}\alpha\right)\quad\text{almost everywhere.}$$
Almost everywhere means over the whole domain, possibly excluding a set of measure zero. Pointwise convergence over a set $\Omega$ means that $\forall\alpha \in \Omega$ and $\forall\epsilon > 0$, there exists $n(\alpha) \ge 0$ such that
$$\left|x(\alpha) - \sum_{k=-m}^{m} c_k\exp\left(i2\pi\frac{k}{\Delta}\alpha\right)\right| < \epsilon\quad\text{for all } m \ge n(\alpha).$$
If $x(\alpha)$ has a finite number of finite discontinuities over a period, and if at each point in the period the left and right derivatives exist, it can be shown that at all points the Fourier sum converges pointwise to $\frac{1}{2}\big(x(\alpha^-) + x(\alpha^+)\big)$, where $x(\alpha^-)$ and $x(\alpha^+)$ are the one-sided limits at $\alpha$ (see https://en.wikipedia.org/wiki/One-sided_limit).
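The $L^2$ and midpoint-convergence behavior can be illustrated numerically on a square wave of period 1; the sketch below (Python with NumPy) computes the coefficients by simple quadrature, so the tolerances are deliberately loose:

import numpy as np

Delta = 1.0
def x(alpha):
    return 1.0 * ((alpha % Delta) < 0.5)          # square wave over one period

def c(k, num=20000):
    a = np.linspace(0.0, Delta, num, endpoint=False)
    return np.mean(x(a) * np.exp(-2j * np.pi * k * a / Delta))

def partial_sum(alpha, n):
    return sum(c(k) * np.exp(2j * np.pi * k * alpha / Delta)
               for k in range(-n, n + 1)).real

assert abs(partial_sum(0.25, 50) - 1.0) < 2e-2    # interior of the "high" segment
assert abs(partial_sum(0.5, 50) - 0.5) < 2e-2     # midpoint value at the jump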

Periodization of a Signal

Consider the signal $y(\alpha)$ and $P > 0$. The summation $\sum_{m=-\infty}^{\infty} y(\alpha - mP)$, provided that it converges for all $\alpha$, is called the periodization of $y(\alpha)$. Replacing $\alpha$ by $\alpha + P$,
$$\sum_{m=-\infty}^{\infty} y(\alpha + P - mP) = \sum_{m=-\infty}^{\infty} y\big(\alpha - (m-1)P\big) = \sum_{m=-\infty}^{\infty} y(\alpha - mP),$$
thereby indicating that $P$ is a period for this signal or, equivalently, that the summation repeats every $P$.

If the signal has bounded support with $\alpha \in [-q, q]$, and if $P > 2q$, then the shifted replicas in the infinite sum do not overlap.
Dirac Delta "Function"

In this course, we adopt a mathematically non-rigorous approach to the Dirac delta, $\delta(t)$, which is a distribution or generalized function and not a function in the strict mathematical sense. To us, the Dirac delta is mainly defined in terms of its action on other functions when it is coupled with them under the integration sign. Specifically, we will take the Dirac delta to satisfy $\lim_{t\to\pm\infty}\delta(t) = 0$ and
$$\int_{-\infty}^{\infty}\delta(t)\,x(t)\,dt = \int_{0^-}^{0^+}\delta(t)\,x(t)\,dt = x(0),$$
provided that $x(t)$ is continuous at $t = 0$. The lower limit $0^-$ can be taken to mean a negative number that is infinitesimally small in magnitude. Similarly, $0^+$ can be taken to be a positive number that is infinitesimally small in magnitude. If zero is outside the domain of integration $\mathcal{D}$, then $\int_{\mathcal{D}}\delta(t)\,x(t)\,dt = 0$. We will take the integral to be undefined if $0$ is a boundary point of $\mathcal{D}$.


- When $x(t) = 1$, $\int_{-\infty}^{\infty}\delta(t)\,dt = 1$.

- What is $\delta(at - b)$, $a \ne 0$? According to our method, we examine the integral $\int_{-\infty}^{\infty}\delta(at-b)\,x(t)\,dt$ and see how it is related to the basic property of the Dirac delta. Let $u = at - b$. If $a > 0$, the integral becomes $\frac{1}{a}\int_{-\infty}^{\infty}\delta(u)\,x\!\left(\frac{u+b}{a}\right)du$. If $a < 0$, the integral becomes $\frac{1}{a}\int_{\infty}^{-\infty}\delta(u)\,x\!\left(\frac{u+b}{a}\right)du$. Both cases can be combined as
$$\frac{1}{|a|}\int_{-\infty}^{\infty}\delta(u)\,x\!\left(\frac{u+b}{a}\right)du = \frac{1}{|a|}\,x\!\left(\frac{0+b}{a}\right) = \frac{1}{|a|}\,x\!\left(\frac{b}{a}\right) = \frac{1}{|a|}\int_{-\infty}^{\infty}\delta\!\left(t - \frac{b}{a}\right)x(t)\,dt.$$
Because of this, we will set $\delta(at - b) = \frac{1}{|a|}\,\delta\!\left(t - \frac{b}{a}\right)$.

- When $b = 0$ and $a = -1$, we obtain $\delta(-t) = \delta(t)$, meaning that the Dirac delta is even.

- When $b = t_0$ and $a = 1$, $\int_{-\infty}^{\infty}\delta(t - t_0)\,x(t)\,dt = x(t_0)$, provided that $x(t)$ is continuous at $t = t_0$.

- What is $y(t)\delta(t)$? Assuming that $x(t)$ and $y(t)$ are continuous at $t = 0$,
$$\int_{-\infty}^{\infty}y(t)\,\delta(t)\,x(t)\,dt = \int_{-\infty}^{\infty}\delta(t)\,y(t)\,x(t)\,dt = y(0)\,x(0) = \int_{-\infty}^{\infty}y(0)\,\delta(t)\,x(t)\,dt.$$
Hence, $y(t)\delta(t) = y(0)\delta(t)$. If $y(0) = 0$, then $y(t)\delta(t) = 0$.

- Convolution: $\delta(t-u) * x(t-w) = \int_{-\infty}^{\infty}\delta(z-u)\,x(t-z-w)\,dz = x(t-u-w)$. In the special case of $w = 0$ and $u = 0$, $x(t) * \delta(t) = x(t)$. When $w = 0$, $x(t) * \delta(t-u) = x(t-u)$.

It is mathematically valid to convolve Dirac deltas together, i.e., $\delta(t-u) * \delta(t-w) = \delta(t-u-w)$. On the other hand, it is invalid to multiply Dirac deltas.

- Derivatives: The Dirac delta is infinitely differentiable (although, of course, not in the classical sense). Like the Dirac delta itself, its derivatives can be characterized through their impact on other functions under the integral sign. Denoting the first derivative of the Dirac delta by $\delta^{(1)}(t)$ and that of $x(t)$ by $x^{(1)}(t)$, integration by parts gives
$$\int_{-\infty}^{\infty}\delta^{(1)}(t)\,x(t)\,dt = \int_{-\infty}^{\infty}x(t)\,d\delta(t) = -\int_{-\infty}^{\infty}\delta(t)\,x^{(1)}(t)\,dt = -x^{(1)}(0),$$
provided that $x^{(1)}(t)$ is continuous at $t = 0$. (Note the use of $\lim_{t\to\pm\infty}\delta(t) = 0$.)

- $\delta(t) = \frac{du(t)}{dt}$, where $u(t)$ is the Heaviside unit-step function. This is because
$$\int_{-\infty}^{\infty}\frac{du(t)}{dt}\,x(t)\,dt = \lim_{t\to\infty}x(t)u(t) - \int_{-\infty}^{\infty}u(t)\,x^{(1)}(t)\,dt = \lim_{t\to\infty}x(t) - \int_{0}^{\infty}x^{(1)}(t)\,dt = \lim_{t\to\infty}x(t) - \left(\lim_{t\to\infty}x(t) - x(0)\right) = x(0).$$

- CTFT: $\int_{-\infty}^{\infty}\delta(t)\exp(-i2\pi ft)\,dt = \exp(-i2\pi f\cdot 0) = 1$. Hence, the inverse CTFT of $1$ is $\delta(t)$, i.e., $\int_{-\infty}^{\infty}1\cdot\exp(i2\pi ft)\,df = \delta(t)$. More generally, $\int_{-\infty}^{\infty}\exp(\pm i2\pi wz)\,dw = \delta(z)$.

- Composition with the Dirac delta: Consider a differentiable function $g(t)$ with roots at $t_1, t_2, \ldots$, i.e., $g(t_1) = 0$, $g(t_2) = 0$, and so on. Suppose that $g(t)$ has continuous nonzero derivatives at $t_1, t_2, \ldots$; this means that $g^{(1)}(t_1) \ne 0$, $g^{(1)}(t_2) \ne 0$, etc., where $g^{(1)}(t) = \frac{dg(t)}{dt}$. Consider a function $\varphi(t)$ that is continuous at $t_1, t_2, \ldots$ With infinitesimally small $\epsilon$,
$$\int_{-\infty}^{\infty}\delta\big(g(t)\big)\,\varphi(t)\,\big|g^{(1)}(t)\big|\,dt = \sum_j \int_{t_j-\epsilon}^{t_j+\epsilon}\delta\big(g(t)\big)\,\varphi(t)\,\big|g^{(1)}(t)\big|\,dt.$$
Over $t \in (t_j - \epsilon, t_j + \epsilon)$, $g(t) \approx g(t_j) + g^{(1)}(t_j)(t - t_j) = g^{(1)}(t_j)(t - t_j)$, with $g^{(1)}(t_j) \ne 0$. We can then assume that
$$\int_{-\infty}^{\infty}\delta\big(g(t)\big)\,\varphi(t)\,\big|g^{(1)}(t)\big|\,dt = \sum_j \int_{t_j-\epsilon}^{t_j+\epsilon}\delta\Big(g^{(1)}(t_j)(t - t_j)\Big)\,\varphi(t)\,\big|g^{(1)}(t)\big|\,dt.$$
Since $\delta\big(g^{(1)}(t_j)(t - t_j)\big) = \frac{1}{|g^{(1)}(t_j)|}\,\delta(t - t_j)$,
$$\int_{-\infty}^{\infty}\delta\big(g(t)\big)\,\varphi(t)\,\big|g^{(1)}(t)\big|\,dt = \sum_j \frac{1}{\big|g^{(1)}(t_j)\big|}\,\varphi(t_j)\,\big|g^{(1)}(t_j)\big| = \sum_j \varphi(t_j).$$
Thus, $\delta\big(g(t)\big)\,\big|g^{(1)}(t)\big| = \sum_j \delta(t - t_j)$, which gives
$$\delta\big(g(t)\big) = \sum_j \frac{\delta(t - t_j)}{\big|g^{(1)}(t)\big|} = \sum_j \frac{\delta(t - t_j)}{\big|g^{(1)}(t_j)\big|}.$$
That is, for $g(t)$ with a continuous derivative satisfying $g(t_1) = 0$, $g^{(1)}(t_1) \ne 0$, $g(t_2) = 0$, $g^{(1)}(t_2) \ne 0$, etc.,
$$\delta\big(g(t)\big) = \sum_j \frac{\delta(t - t_j)}{\big|g^{(1)}(t_j)\big|}.$$
For instance, if $g(t) = t^2 - 1$, then $g(1) = 0$ and $g(-1) = 0$. The first derivative is $2t$, which equals $2$ at $t = 1$ and $-2$ at $t = -1$. Consequently, $\delta(t^2 - 1) = \frac{1}{2}\delta(t-1) + \frac{1}{2}\delta(t+1)$.
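This composition rule can be sanity-checked numerically by replacing $\delta$ with a narrow Gaussian (a nascent delta); the width $\sigma$ and the test function below are our illustrative choices:

import numpy as np

sigma = 1e-3                                      # width of the nascent delta
def nascent_delta(s):
    return np.exp(-s ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

t = np.linspace(-3, 3, 600_001)
dt = t[1] - t[0]
phi = np.cos(t)                                   # arbitrary smooth test function

lhs = np.sum(nascent_delta(t ** 2 - 1) * phi) * dt   # approx. of integral of delta(g(t)) phi(t)
rhs = np.cos(1.0) / 2 + np.cos(-1.0) / 2             # sum_j phi(t_j) / |g'(t_j)|
assert abs(lhs - rhs) < 1e-3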

Soft-Thresholding Operator

Assuming $z, z_0 \in \mathbb{R}$, we wish to minimize the function
$$g(z) = \frac{1}{2}(z - z_0)^2 + \lambda|z|,$$
where $\lambda \in \mathbb{R}^+$.
$$g(z) - g(0) = \frac{1}{2}(z - z_0)^2 + \lambda|z| - \frac{1}{2}z_0^2 = \frac{1}{2}z^2 + \lambda|z| - z_0 z \ge \frac{1}{2}z^2 + \lambda|z| - |z_0|\,|z| \;\Longrightarrow\;$$
$$g(z) - g(0) \ge \frac{1}{2}z^2 + \big(\lambda - |z_0|\big)|z|.$$
If $|z_0| \le \lambda$, $g(z) \ge g(0) + \frac{1}{2}z^2 \ge g(0)$. Hence, if $|z_0| \le \lambda$, the function $g(z)$ is minimized at $z = 0$.

When $|z_0| > \lambda$, let us examine the stationary points of $g(z)$. When $z \ne 0$,
$$\frac{dg(z)}{dz} = z - z_0 + \lambda\,\operatorname{sign}(z),\quad\text{where } \operatorname{sign}(z) = 1 \text{ when } z > 0 \text{ and } \operatorname{sign}(z) = -1 \text{ when } z < 0.$$
$$\frac{dg(z)}{dz} = 0 \;\Longrightarrow\; z = z_0 - \lambda\,\operatorname{sign}(z).$$
If $z > 0$, then $z_0 - \lambda > 0 \Rightarrow z_0 > \lambda > 0$. If $z < 0$, then $z_0 + \lambda < 0 \Rightarrow z_0 < -\lambda < 0$. Thus, when $z \ne 0$ and $|z_0| > \lambda$, the stationary point is $z = z_0 - \lambda\,\operatorname{sign}(z_0)$. The second derivative in this case is always $1$, i.e., positive; hence, when $|z_0| > \lambda$, $g(z)$ is minimized at $z = z_0 - \lambda\,\operatorname{sign}(z_0)$. We can combine the cases $|z_0| \le \lambda$ and $|z_0| > \lambda$ by introducing the soft-thresholding operator $S_\lambda(z_0)$:
$$S_\lambda(z_0) = \begin{cases} 0, & |z_0| \le \lambda \\ z_0 - \lambda\,\operatorname{sign}(z_0), & |z_0| > \lambda \end{cases}$$
Function $g(z)$ is minimized at $z = S_\lambda(z_0)$.
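A direct implementation, checked against brute-force minimization of $g(z)$ on a grid (NumPy; the grid range and $\lambda$ are arbitrary):

import numpy as np

def soft_threshold(z0, lam):
    # S_lambda(z0): shrink toward zero by lam, clipping to zero.
    return np.sign(z0) * max(abs(z0) - lam, 0.0)

lam = 1.0
z = np.linspace(-5, 5, 200001)
for z0 in (-3.0, -0.5, 0.0, 0.8, 2.4):
    g = 0.5 * (z - z0) ** 2 + lam * np.abs(z)
    z_star = z[np.argmin(g)]
    assert abs(z_star - soft_threshold(z0, lam)) < 1e-3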

Important Sum

Consider a positive integer $N$ and the sum
$$S(v) = \sum_{m=0}^{N-1}\exp\left(\pm i2\pi\frac{mv}{N}\right).$$
If $v$ is a multiple of $N$, then $v = lN$ with $l \in \mathbb{Z}$. In this case, $S(v) = \sum_{m=0}^{N-1}\exp(\pm i2\pi ml) = \sum_{m=0}^{N-1}1 = N$.

If $v$ is not a multiple of $N$, we can apply the rule for geometric series to obtain
$$S(v) = \sum_{m=0}^{N-1}\left(\exp\left(\pm i2\pi\frac{v}{N}\right)\right)^m = \frac{1 - \exp\left(\pm i2\pi\frac{v}{N}\right)^N}{1 - \exp\left(\pm i2\pi\frac{v}{N}\right)} = \frac{1 - \exp(\pm i2\pi v)}{1 - \exp\left(\pm i2\pi\frac{v}{N}\right)} = 0.$$
(Note that the denominator is not equal to zero, as $v$ is not a multiple of $N$.)

Therefore,
$$S(v) = \sum_{m=0}^{N-1}\exp\left(\pm i2\pi\frac{mv}{N}\right) = \begin{cases} N, & v = lN,\ l \in \mathbb{Z} \\ 0, & \text{otherwise} \end{cases}$$

Recall that, for $z \in \mathbb{C}$, $z \ne 1$, the sum of a geometric series is given by
$$\sum_{m=K}^{M} z^m = z^K\,\frac{1 - z^{M-K+1}}{1 - z}$$
($M - K + 1$ is the number of terms in the sum). When $M \to \infty$, and provided that $|z| < 1$,
$$\sum_{m=K}^{\infty} z^m = \frac{z^K}{1 - z}.$$
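A numeric check of $S(v)$ (NumPy; $N = 8$ is arbitrary):

import numpy as np

N = 8
def S(v):
    m = np.arange(N)
    return np.sum(np.exp(2j * np.pi * m * v / N))

for v in (0, N, -2 * N):                 # multiples of N
    assert np.isclose(S(v), N)
for v in (1, 3, 5, 11):                  # not multiples of N
    assert abs(S(v)) < 1e-10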

Ideal Response of Continuous-Time (CT) Filters

Ideal Response of Discrete-Time (DT) Filters
