Selective Review
The key idea used to answer questions about the cardinality of infinite sets is that two sets have the same cardinality if there exists a bijection between them. In an injective or one-to-one mapping, each element of the codomain is associated with at most one element of the domain. In a surjective or onto mapping, each element of the codomain is associated with at least one element of the domain. A bijective mapping is both injective and surjective: each element of the codomain is mapped to by exactly one element of the domain.
For example, consider the set of positive integers ℤ⁺ = {1, 2, 3, 4, …} and the set of positive even integers ℤₑ⁺ = {2, 4, 6, 8, …}. The mapping f: ℤ⁺ → ℤₑ⁺, f(n) = 2n is a bijection between them, which means that they have the same cardinality, i.e., |ℤ⁺| = |ℤₑ⁺|. (In the realm of infinite sets, a part can be put into one-to-one correspondence with the whole.) In fact, the set of integers ℤ, the set of natural numbers ℕ, the set of prime integers, the set of rational numbers ℚ, the two-dimensional integer lattice ℤ² (composed of all possible pairs of integers), the set of algebraic numbers, the set of computable numbers, etc., all have the same cardinality, because it can be proven that there is a bijection between any two of these sets. This cardinality is called aleph null or aleph naught and is denoted by the symbol ℵ₀. It turns out that this is the smallest infinity, i.e., the smallest size/cardinality of an infinite set. There are infinite sets with cardinality greater than ℵ₀. Sets with cardinality ℵ₀ are called countable sets or countably infinite sets.
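To make the bijection concrete, here is a small Python sketch (the function name f and the cutoff N are my own choices) that checks injectivity and surjectivity of f(n) = 2n onto the positive even integers up to a finite bound:

```python
# Sketch: f(n) = 2n as a bijection between Z+ and the positive even integers.
def f(n):
    return 2 * n

N = 1000
domain = range(1, N + 1)
image = [f(n) for n in domain]

# Injective: no two inputs share an output.
injective = len(set(image)) == len(image)

# Surjective onto the positive even integers up to 2N: every even target is hit.
targets = set(range(2, 2 * N + 1, 2))
surjective = targets == set(image)
```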
Consider the set of real numbers ℝ and the open interval (a, b), −∞ < a < b < ∞. The mapping f: (a, b) → ℝ, f(t) = x = tan( (π/(b − a)) (t − (a + b)/2) ) is a bijection between (a, b) and ℝ; hence, |(a, b)| = |ℝ|. The cardinality of ℝ is called the cardinality of the continuum and is denoted by c.
What is the relation between c and ℵ₀? Before answering this question, we study the following theorem (Cantor's theorem):
The cardinality of a set X is strictly less than the cardinality of the set 𝒫(X), the powerset of X (the set of all subsets of X).
Proof. This is obvious for finite nonempty sets, where |𝒫(X)| = 2^|X|. It is also valid for infinite sets. Consider a set X and an injection Ψ from X to 𝒫(X), e.g., Ψ(x) = {x}, so that |X| ≤ |𝒫(X)|. Assume that some such Ψ is also surjective. Consider the set B = {x ∈ X : x ∉ Ψ(x)}. Set B is a subset of X and, hence, is an element of 𝒫(X). (The figure below provides a specific example for set B.) By the assumed surjectivity, B = Ψ(b) for some b ∈ X. If b ∈ B, then by the definition of B, b ∉ Ψ(b) = B, a contradiction. If b ∉ B, then b ∉ Ψ(b), so b ∈ B, again a contradiction. Hence, no injection Ψ can be surjective, and |X| < |𝒫(X)|.
Note that since the powerset is itself a set, it has its own powerset with strictly greater cardinality, and so forth. So, in fact, there are infinitely many distinct infinite set sizes.
Now we show that c > ℵ₀. We can focus on the interval [0, 1], as |[0, 1]| = c. Suppose that we write all elements of [0, 1] in binary. These are strings of ones and zeros to the right of a binary point. Each string can be put into one-to-one correspondence with the set containing the positions of the ones in its binary representation, where position 1 is the position immediately to the right of the binary point, position 2 is the position to its right, and so forth. For example, the number 0.5 (decimal) is 0.100… in binary; the associated set is {1}. The number 0.5625 is 0.1001000… in binary; the associated set is {1, 4}. The number 0.3333… is 0.01010101… in binary, and the associated set is {2, 4, 6, 8, 10, …}, which is the set of positive even integers. The number zero does not have any ones in its representation and is associated with the empty set. The number one can be represented in binary as 0.111111…; the corresponding set is {1, 2, 3, 4, 5, …}, which is the set of positive integers ℤ⁺. Ignoring some technicalities that will not impact the conclusion, we can see that there is a bijection between [0, 1] and the powerset of ℤ⁺. Since |[0, 1]| = c and |ℤ⁺| = ℵ₀, then c > ℵ₀. In fact, we can write c = 2^ℵ₀. The continuum hypothesis states that there is no set whose cardinality is strictly between that of the set of integers and that of the set of real numbers; it is known to be neither provable nor refutable from the standard (ZFC) axioms of set theory.
Sets with cardinality strictly greater than ℵ₀ are called uncountable sets or uncountably infinite sets. Examples include the set of real numbers ℝ, the set of irrational numbers, the set of transcendental numbers, the sets ℝ², ℝ³, etc. Note that all of these particular examples have the same cardinality c.
There are three very important properties of ℵ₀ that will be useful for our purposes:
i. ℵ₀ + ℵ₀ = ℵ₀. (An example of this is that the set of even integers has cardinality ℵ₀, the set of odd integers has cardinality ℵ₀, and their union, the set of integers, has cardinality ℵ₀.)
ii. ℵ₀ · ℵ₀ = ℵ₀. (An example of this is that the two-dimensional integer lattice ℤ² has cardinality ℵ₀.)
iii. The finite or countable union of countably infinite sets is countably infinite.
Sets of Measure Zero
Consider the intervals (a, b), [a, b), (a, b], and [a, b], where −∞ < a < b < ∞. All of these intervals have a measure, or length, equal to b − a.
Set A is covered by set B if A ⊂ B. In this case, the measure of A is less than or equal to the measure of B. For example, the set [1, 2], of measure 1, is covered by the set [0, π], which has a measure of π.
The measure of the union of a countable number of sets is upper-bounded by the sum of the measures of the sets. If the sets are disjoint, the measure of their union is the sum of their measures. For example, the measure of [3, 5] ∪ [10, 13] is 5 = 2 + 3, whereas the measure of [3, 5] ∪ [4, 7] is 4 while the sum of the measures of [3, 5] and [4, 7] is 2 + 3 = 5.
A set X is of measure zero if it can be covered by another set whose measure can be made arbitrarily small while still covering set X.
Consider the set consisting of just a single real number α. This set can be covered by the set [α − ε/2, α + ε/2], ε > 0, which has a measure equal to ε. Since ε can be made arbitrarily small, sets consisting of a single real number are of measure zero.
Next, consider a finite set of real numbers {α₁, α₂, …, α_m}. Given ε > 0, this set can be covered by the set [α₁ − ε/(2m), α₁ + ε/(2m)] ∪ [α₂ − ε/(2m), α₂ + ε/(2m)] ∪ … ∪ [α_m − ε/(2m), α_m + ε/(2m)]. The measure of this set is upper-bounded by m(ε/m) = ε. Again, this can be made arbitrarily small, meaning that a set consisting of a finite number of real numbers is of measure zero.
Now let us consider a countably infinite set of real numbers {α_k}, k = 1, 2, 3, …. Given ε > 0, this set can be covered by the set ⋃_{k=1}^∞ [α_k − ε/2^(k+1), α_k + ε/2^(k+1)], in which the k-th interval has length ε/2^k. The measure of this set is upper-bounded by ∑_{k=1}^∞ ε/2^k = ε ∑_{k=1}^∞ (1/2)^k = ε · (1/2)/(1 − 1/2) = ε. This set is also of measure zero.
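The cover construction can be sanity-checked numerically. In this sketch (variable names are my own), the k-th interval contributes length ε/2^k, and the partial sums of the lengths stay at or below ε:

```python
# Interval k = [a_k - eps/2**(k+1), a_k + eps/2**(k+1)] has length eps/2**k,
# regardless of where the points a_k actually sit.
eps = 0.1
lengths = [eps / 2**k for k in range(1, 61)]   # lengths of the first 60 intervals
total = sum(lengths)                           # partial sum: eps*(1 - 2**-60)
```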
An example of an uncountably infinite set that has measure zero is the Cantor set.
If a function is nonzero only over a set of measure zero, then its integral is zero. For example, ∫₀¹ 𝕀(x ∈ ℚ) dx = 0, where ℚ is the set of rational numbers and 𝕀(·) is the indicator function, which is equal to one when its argument is true and zero otherwise.
Linear Algebra Review
Vector norms
A function 𝓃(·): V → ℝ is a norm on the vector space V if:
i. ∀v ∈ V, 𝓃(v) ≥ 0, with 𝓃(v) = 0 iff v is the zero vector.
ii. ∀v ∈ V, c ∈ ℂ, 𝓃(cv) = |c| 𝓃(v).
iii. ∀u, v ∈ V, 𝓃(u + v) ≤ 𝓃(u) + 𝓃(v) (the triangle inequality).
For x = [x₁ x₂ … x_n]ᵀ ∈ ℂⁿ:
ℓ₂ norm: ‖x‖₂ = √(|x₁|² + |x₂|² + ⋯ + |x_n|²)
ℓ₁ norm: ‖x‖₁ = |x₁| + |x₂| + ⋯ + |x_n|
If q > p ≥ 1, then
‖x‖_q ≤ ‖x‖_p ≤ n^(1/p − 1/q) ‖x‖_q
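A quick numerical check of this norm-equivalence inequality with NumPy (a sketch; here p = 1 and q = 2 on random complex vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 8, 1.0, 2.0
ok = True
for _ in range(100):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    norm_p = np.sum(np.abs(x) ** p) ** (1 / p)
    norm_q = np.sum(np.abs(x) ** q) ** (1 / q)
    # ||x||_q <= ||x||_p <= n**(1/p - 1/q) * ||x||_q
    ok = ok and norm_q <= norm_p + 1e-12
    ok = ok and norm_p <= n ** (1 / p - 1 / q) * norm_q + 1e-12
```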
Matrices
Element (v, k) on the v-th row and k-th column of matrix A is denoted A(v, k), A_{v,k}, or A_{vk}.
The trace of a square matrix A ∈ ℂⁿˣⁿ is the sum of the elements on its main diagonal, i.e., trace{A} = ∑_{k=1}^n A(k, k). If A ∈ ℂⁿˣᵐ and B ∈ ℂᵐˣⁿ, trace{AB} = trace{BA}.
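The cyclic-trace identity is easy to verify numerically; note that AB and BA may even have different sizes (a sketch with random complex matrices):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))  # 3x5
B = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))  # 5x3
t1 = np.trace(A @ B)   # trace of a 3x3 matrix
t2 = np.trace(B @ A)   # trace of a 5x5 matrix
trace_match = bool(np.isclose(t1, t2))
```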
A square matrix A ∈ ℂⁿˣⁿ is nonsingular or invertible (and, hence, A⁻¹ exists) iff det{A} ≠ 0. In this case, det{A⁻¹} = 1/det{A}. Generally, for A, B ∈ ℂⁿˣⁿ, det{AB} = det{A} det{B}.
The dimension of a vector space or subspace is the number of linearly independent vectors needed to span the whole space or subspace.
The rank of a matrix A ∈ ℂⁿˣᵐ is the largest number of rows (columns) of A that constitute a linearly independent set. The rank of A is zero iff A is the all-zero matrix.
rank{A} = dim 𝓡{A} = m − dim 𝓝{A}, where 𝓡{A} denotes the range (column space) of A and 𝓝{A} its null space.
rank{AB} ≤ min(rank{A}, rank{B}) for A ∈ ℂⁿˣᵐ and B ∈ ℂᵐˣᵏ. (Hence, if x ∈ ℂⁿˣ¹ is a nonzero vector, the rank of xx^H is equal to 1. If x, y ∈ ℂⁿˣ¹ are nonzero vectors, the rank of xy^H or yx^H is equal to 1.)
If A, B ∈ ℂⁿˣⁿ, (AB)^H = B^H A^H.
A square matrix A ∈ ℂⁿˣⁿ is unitary if A^H A = A A^H = I_{n×n}. Hence, a unitary A is always invertible and its inverse is A^H. The columns (rows) of a unitary matrix constitute an orthonormal set of vectors.
Eigendecomposition
For A ∈ ℂⁿˣⁿ, a nonzero vector v satisfying Av = λv is an eigenvector of A with associated eigenvalue λ. The eigenvalues are the roots of the characteristic polynomial p_A(λ) = det{λI_{n×n} − A}.
Matrix 𝑨 ∈ ℂ𝒏×𝒏 has exactly 𝒏 eigenvalues, counting multiplicities. The eigenvectors
corresponding to different eigenvalues are linearly independent.
Consider square matrices A, B ∈ ℂⁿˣⁿ where each has n linearly independent eigenvectors. Matrices A and B commute, i.e., AB = BA, iff they share a common set of n linearly independent eigenvectors.
For A ∈ ℂⁿˣⁿ with eigenvalues λ₁, …, λ_n (counting multiplicities), trace{A} = ∑_{k=1}^n A(k, k) = ∑_{k=1}^n λ_k and det{A} = ∏_{k=1}^n λ_k. The trace is the sum of the eigenvalues and the determinant is the product of the eigenvalues.
The eigenvalues of 𝑨𝑻 are the same as those of 𝑨. The eigenvalues of 𝑨𝑯 are the complex
conjugates of the eigenvalues of 𝑨.
If A ∈ ℂⁿˣᵐ and B ∈ ℂᵐˣⁿ, the matrix AB ∈ ℂⁿˣⁿ has the same nonzero eigenvalues as the matrix BA ∈ ℂᵐˣᵐ.
Nondefective matrices have eigenvalues such that each eigenvalue has the same algebraic and
geometric multiplicities. Nondefective matrices are diagonalizable. That is, if 𝑨 ∈ ℂ𝒏×𝒏 and 𝑨 is
diagonalizable, 𝑨 can be expressed as:
𝑨 = 𝑸𝚲𝑸−𝟏
where 𝑸 ∈ ℂ𝒏×𝒏 is a matrix with linearly independent eigenvectors as columns, and 𝚲 is a
diagonal matrix containing the corresponding eigenvalues.
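The decomposition can be reproduced with NumPy's eigensolver (a sketch; a random matrix is diagonalizable with probability one):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
evals, Q = np.linalg.eig(A)            # columns of Q are eigenvectors
Lam = np.diag(evals)                   # diagonal matrix of eigenvalues
A_rebuilt = Q @ Lam @ np.linalg.inv(Q)
recon_ok = bool(np.allclose(A, A_rebuilt))
```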
For nondefective/diagonalizable matrices, the rank of the matrix is equal to the number of
nonzero eigenvalues, counting multiplicities.
Normal Matrices
Matrix 𝑨 ∈ ℂ𝒏×𝒏 is normal if it satisfies 𝑨𝑯 𝑨 = 𝑨𝑨𝑯 . All Hermitian, skew-Hermitian and unitary
matrices are normal. All normal matrices are diagonalizable. Moreover, the eigenvector matrix
𝑸 of eigendecomposition of 𝑨 can be chosen to be unitary, i.e., 𝑸𝑯 𝑸 = 𝑸𝑸𝑯 = 𝑰𝒏×𝒏 . Hence,
for normal matrix 𝑨,
𝑨 = 𝑸𝚲𝑸𝑯
If 𝑨 ∈ ℂ𝒏×𝒏 is Hermitian, its eigenvalues are real-valued and, as mentioned for normal matrices,
its eigenvectors are orthogonal to one another.
A Hermitian matrix 𝑨 ∈ ℂ𝒏×𝒏 is positive definite if 𝒙𝑯 𝑨𝒙 > 0 for all nonzero 𝒙 ∈ ℂ𝒏×𝟏 .
For a Hermitian matrix 𝑨 ∈ ℂ𝒏×𝒏 , let 𝝀𝟏 be its maximum eigenvalue and 𝝀𝒏 be its minimum
eigenvalue. We have:
λ_n x^H x ≤ x^H A x ≤ λ₁ x^H x
The quadratic form x^H A x achieves its maximum possible value, λ₁‖x‖₂², when x is an eigenvector of A corresponding to its maximum eigenvalue. Hence, we can write:
λ₁ = λ_max{A} = max_{x≠0} (x^H A x)/(x^H x) = max_{‖x‖₂=1} x^H A x
λ_n = λ_min{A} = min_{x≠0} (x^H A x)/(x^H x) = min_{‖x‖₂=1} x^H A x
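These Rayleigh-quotient bounds can be checked numerically on a random Hermitian matrix (a sketch; `eigvalsh` returns real eigenvalues in ascending order):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
A = (M + M.conj().T) / 2               # Hermitian matrix
evals = np.linalg.eigvalsh(A)          # real, ascending
lam_min, lam_max = evals[0], evals[-1]

rayleigh_ok = True
for _ in range(200):
    x = rng.standard_normal(5) + 1j * rng.standard_normal(5)
    r = (x.conj() @ A @ x).real / (x.conj() @ x).real   # Rayleigh quotient
    rayleigh_ok = rayleigh_ok and (lam_min - 1e-10 <= r <= lam_max + 1e-10)
```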
A Hermitian positive definite matrix has strictly positive eigenvalues.
Matrix Norms
For a matrix A ∈ ℂⁿˣᵐ, the Frobenius norm is defined as ‖A‖_F = √( ∑_{k=1}^n ∑_{v=1}^m |A(k, v)|² ). Using the trace operation, ‖A‖_F² = ∑_{k=1}^n ∑_{v=1}^m |A(k, v)|² = trace{A^H A} = trace{A A^H}.
The induced (operator) p-norm is defined as:
‖A‖_p = sup_{x≠0} ‖Ax‖_p/‖x‖_p = sup_{‖x‖_p=1} ‖Ax‖_p
One can show that these matrix norms satisfy the same defining criteria as vector norms. Moreover, they are submultiplicative: for A ∈ ℂⁿˣᵐ and B ∈ ℂᵐˣᵏ,
‖AB‖ ≤ ‖A‖ ‖B‖
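Submultiplicativity is easy to spot-check numerically for the Frobenius norm and the induced 2-norm (a sketch with random matrices):

```python
import numpy as np

rng = np.random.default_rng(4)
submult_ok = True
for _ in range(100):
    A = rng.standard_normal((4, 6))
    B = rng.standard_normal((6, 3))
    lhs_f = np.linalg.norm(A @ B, "fro")
    rhs_f = np.linalg.norm(A, "fro") * np.linalg.norm(B, "fro")
    lhs_2 = np.linalg.norm(A @ B, 2)          # largest singular value
    rhs_2 = np.linalg.norm(A, 2) * np.linalg.norm(B, 2)
    submult_ok = submult_ok and lhs_f <= rhs_f + 1e-12 and lhs_2 <= rhs_2 + 1e-12
```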
Important Inequalities
Triangle Inequality
In a normed vector space 𝑽, one of the defining properties of the norm is the triangle inequality:
𝒙 + 𝒚 ≤ 𝒙 + 𝒚 for all 𝒙, 𝒚 ∈ 𝑽.
Cauchy-Schwarz Inequality
For vectors x, y ∈ ℂⁿˣ¹, |x^H y| ≤ ‖x‖₂ ‖y‖₂, with equality iff x and y are linearly dependent.
Proof.
The statement is obvious if one or both vectors is the all-zero vector. Assume now that both x and y are nonzero. Consider the vector αx − y, where α ∈ ℂ. Since ‖αx − y‖₂² ≥ 0,
(α* x^H − y^H)(αx − y) = |α|² x^H x + y^H y − α* x^H y − α y^H x ≥ 0
This inequality is valid for any α. Set α = (x^H y)/(x^H x). Hence,
|x^H y|²/(x^H x) + y^H y − |x^H y|²/(x^H x) − |x^H y|²/(x^H x) ≥ 0, which yields y^H y ≥ |x^H y|²/(x^H x).
Thus, |x^H y|² ≤ (x^H x)(y^H y), or |x^H y| ≤ √(x^H x) √(y^H y) = ‖x‖₂ ‖y‖₂.
An important point is that the Cauchy-Schwarz inequality is achieved with equality if and only if y = βx for some β ∈ ℂ (or x is the all-zero vector). This is based on the fact that for z ∈ ℂⁿ, ‖z‖₂ = 0 iff z is the all-zero vector; thus, ‖αx − y‖₂ = 0 iff αx = y.
If the components of x are {x_k}_{k=1}^n and those of y are {y_k}_{k=1}^n, the Cauchy-Schwarz inequality can be written as |∑_{k=1}^n x_k* y_k| ≤ √(∑_{k=1}^n |x_k|²) √(∑_{k=1}^n |y_k|²).
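A numerical check of the inequality and of its equality case (a sketch; `np.vdot` conjugates its first argument, matching x^H y):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(6) + 1j * rng.standard_normal(6)
y = rng.standard_normal(6) + 1j * rng.standard_normal(6)

lhs = abs(np.vdot(x, y))                          # |x^H y|
rhs = np.linalg.norm(x) * np.linalg.norm(y)       # ||x||_2 ||y||_2
cs_ok = lhs <= rhs + 1e-12

# Equality when the vectors are linearly dependent, y = beta*x:
beta = 2.0 - 1.5j
eq_ok = bool(np.isclose(abs(np.vdot(x, beta * x)),
                        np.linalg.norm(x) * np.linalg.norm(beta * x)))
```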
An analogous inequality holds for real-valued functions: |∫ f(α)g(α) dα| ≤ √(∫ f(α)² dα) √(∫ g(α)² dα).
Arithmetic Mean-Geometric Mean (AM-GM) Inequality
Consider positive real numbers x₁, x₂, …, x_n.
Their arithmetic mean is (1/n) ∑_{k=1}^n x_k = (1/n)(x₁ + x₂ + ⋯ + x_n).
Their geometric mean is (∏_{k=1}^n x_k)^(1/n) = (x₁x₂⋯x_n)^(1/n).
The AM-GM inequality states that (1/n) ∑_{k=1}^n x_k ≥ (∏_{k=1}^n x_k)^(1/n), with equality iff all the numbers are equal.
Proof.
We can make use of the inequality exp(z) ≥ 1 + z, with equality iff z = 0, which is equivalent to exp(z − 1) ≥ z, with equality iff z = 1.
Consider y_k = x_k/η, where η is the arithmetic mean, i.e., η = (1/n) ∑_{k=1}^n x_k.
Since z ≤ exp(z − 1),
∏_{k=1}^n y_k ≤ ∏_{k=1}^n exp(y_k − 1) = exp( ∑_{k=1}^n y_k − n )
Note that (1/n) ∑_{k=1}^n y_k = (1/n) ∑_{k=1}^n (x_k/η) = η/η = 1, so ∑_{k=1}^n y_k = n. Also,
∏_{k=1}^n y_k = (1/η^n) ∏_{k=1}^n x_k
Therefore, (1/η^n) ∏_{k=1}^n x_k ≤ exp(n − n) = 1, which yields (∏_{k=1}^n x_k)^(1/n) ≤ η = (1/n) ∑_{k=1}^n x_k.
Note that exp(z − 1) = z iff z = 1. This means that the AM-GM inequality is satisfied with equality iff y_k = 1 for all k, which is equivalent to x_k = η for all k.
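The inequality and its equality case can be spot-checked numerically (a sketch; the geometric mean is computed via logarithms to avoid overflow):

```python
import numpy as np

rng = np.random.default_rng(6)
amgm_ok = True
for _ in range(100):
    x = rng.uniform(0.1, 10.0, size=7)    # positive numbers
    am = x.mean()
    gm = np.exp(np.log(x).mean())         # (prod x_k)**(1/n) via logs
    amgm_ok = amgm_ok and gm <= am + 1e-12

# Equality iff all the numbers are equal:
x_eq = np.full(7, 3.5)
eq_ok = bool(np.isclose(x_eq.mean(), np.exp(np.log(x_eq).mean())))
```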
Leibniz's Rule for Differentiation under the Integral Sign
Let f(x, t) be a function such that the partial derivative of f with respect to t exists and is continuous. Then,
d/dt ∫_{α(t)}^{β(t)} f(x, t) dx = ∫_{α(t)}^{β(t)} (∂f(x, t)/∂t) dx + f(β(t), t) dβ/dt − f(α(t), t) dα/dt
Example:
d/dt ∫₀^t [ln(1 + tx)/(1 + x²)] dx = ∫₀^t x/[(1 + x²)(1 + tx)] dx + ln(1 + t²)/(1 + t²)
Another example:
d/dt ∫_t^∞ e^(−x²/2) dx = −e^(−t²/2)
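The second example can be verified numerically. This sketch uses the known closed form ∫_t^∞ e^(−x²/2) dx = √(π/2)·erfc(t/√2) (an outside fact, not derived above) and compares a central finite difference against the Leibniz-rule answer −e^(−t²/2):

```python
import math

def tail_integral(t):
    # Closed form of the integral of exp(-x**2/2) from t to infinity.
    return math.sqrt(math.pi / 2) * math.erfc(t / math.sqrt(2))

t0, h = 0.7, 1e-5
numeric = (tail_integral(t0 + h) - tail_integral(t0 - h)) / (2 * h)  # central difference in t
analytic = -math.exp(-t0**2 / 2)                                     # Leibniz's rule result
leibniz_ok = abs(numeric - analytic) < 1e-8
```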
Taylor Series
Consider a function g(x): ℝ → ℝ. Its Taylor series expansion about a point x₀ is given by:
g(x) = ∑_{k=0}^∞ [g^(k)(x₀)/k!] (x − x₀)^k
where g^(0)(x) = g(x), g^(1)(x) = dg(x)/dx, g^(2)(x) = d²g(x)/dx², and so on. Using n + 1 terms and a remainder term, we can write:
g(x) = ∑_{k=0}^n [g^(k)(x₀)/k!] (x − x₀)^k + [g^(n+1)(ζ)/(n + 1)!] (x − x₀)^(n+1)
for some ζ between x₀ and x. For example, expanding about x₀ = 0,
e^x = ∑_{k=0}^n x^k/k! + [e^ζ/(n + 1)!] x^(n+1), ζ ∈ (0, x) when x > 0 and ζ ∈ (x, 0) when x < 0.
For odd n, the remainder [e^ζ/(n + 1)!] x^(n+1) is nonnegative, so e^x ≥ ∑_{k=0}^n x^k/k! for all x. When n = 1, e^x ≥ 1 + x for all x. This means that e^(−x) ≥ 1 − x for all x.
The remainder can also be written in integral form:
g(x) = ∑_{k=0}^n [g^(k)(x₀)/k!] (x − x₀)^k + (1/n!) ∫_{x₀}^x (x − t)^n g^(n+1)(t) dt
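A small numeric illustration of the e^x expansion and the n = 1 bound (a sketch):

```python
import math

def taylor_exp(x, n):
    # Partial sum of e^x about 0: sum_{k=0}^{n} x**k / k!
    return sum(x**k / math.factorial(k) for k in range(n + 1))

# e^x >= 1 + x for all x (the n = 1 case), checked on a grid:
grid = [-5 + 0.1 * k for k in range(101)]
bound_ok = all(math.exp(x) >= 1 + x for x in grid)

# The partial sums converge to e^x:
converge_ok = abs(taylor_exp(2.0, 30) - math.exp(2.0)) < 1e-12
```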
In the multidimensional case, when g(x): ℝᴺ → ℝ, the Taylor series expansion about a point y is given by:
g(x) = g(y) + ∇ᵀg(y)(x − y) + (1/2)(x − y)ᵀ𝓗(y)(x − y) + ⋯
where ∇ᵀg(y) = [∂g/∂x₁|_y  ∂g/∂x₂|_y  …  ∂g/∂x_N|_y] and 𝓗(y) is the Hessian matrix evaluated at the point y. That is,
g(x) = g(y) + ∑_{n=1}^N (∂g/∂x_n)|_y (x_n − y_n) + (1/2) ∑_{n=1}^N ∑_{m=1}^N (∂²g/∂x_n∂x_m)|_y (x_n − y_n)(x_m − y_m) + (1/6) ∑_{n=1}^N ∑_{m=1}^N ∑_{k=1}^N (∂³g/∂x_n∂x_m∂x_k)|_y (x_n − y_n)(x_m − y_m)(x_k − y_k) + ⋯
There are also exact mean-value forms. For some λ ∈ [0, 1],
g(x) = g(y) + ∇ᵀg(λx + (1 − λ)y)(x − y) = g(y) + ∇ᵀg(y + λ(x − y))(x − y)
and, for some (possibly different) λ ∈ [0, 1],
g(x) = g(y) + ∇ᵀg(y)(x − y) + (1/2)(x − y)ᵀ𝓗(λx + (1 − λ)y)(x − y) = g(y) + ∇ᵀg(y)(x − y) + (1/2)(x − y)ᵀ𝓗(y + λ(x − y))(x − y)
Differentiation with respect to vectors and matrices
In order to solve an optimization problem, we typically need to partially differentiate a real-
valued scalar function of multiple variables. The differentiation is with respect to real-valued
optimization variables, or with respect to the real and imaginary parts of complex-valued
optimization variables. These optimization variables can be the elements of a vector or a
matrix.
As a first example, consider the real-valued function J(x) = aᵀx with x = [x₁ x₂ … x_{n−1} x_n]ᵀ ∈ ℝⁿˣ¹ and a = [a₁ a₂ … a_{n−1} a_n]ᵀ ∈ ℝⁿˣ¹. Let e_k denote the k-th column of the identity matrix. Then:
∂J/∂x_k = lim_{z→0} [J(x + z e_k) − J(x)]/z = lim_{z→0} [aᵀ(x + z e_k) − aᵀx]/z = aᵀe_k = a_k
Thus, ∇_x J = [∂J/∂x₁ ∂J/∂x₂ … ∂J/∂x_{n−1} ∂J/∂x_n]ᵀ = [a₁ a₂ … a_{n−1} a_n]ᵀ = a ⟹ ∇_x{aᵀx} = a.
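The limit computation above is exactly a finite-difference derivative, so it can be verified numerically (a sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5
a = rng.standard_normal(n)
x = rng.standard_normal(n)
J = lambda v: a @ v                    # J(x) = a^T x

h = 1e-6
grad = np.zeros(n)
for k in range(n):
    e_k = np.zeros(n)
    e_k[k] = 1.0
    grad[k] = (J(x + h * e_k) - J(x - h * e_k)) / (2 * h)  # central difference

grad_ok = bool(np.allclose(grad, a, atol=1e-8))
```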
Now consider the complex case, with x = [x₁ x₂ … x_n]ᵀ ∈ ℂⁿˣ¹ and a = [a₁ a₂ … a_n]ᵀ ∈ ℂⁿˣ¹, and write x_k = x_{k,r} + i x_{k,i} in terms of its real and imaginary parts. For a real-valued function J of the complex vector x, the gradient is defined as:
∇_x J = [∂J/∂x_{1,r} + i ∂J/∂x_{1,i}, ∂J/∂x_{2,r} + i ∂J/∂x_{2,i}, …, ∂J/∂x_{n,r} + i ∂J/∂x_{n,i}]ᵀ
Consider J(x) = a^H x + x^H a. We can compute ∂J/∂x_{k,r} and ∂J/∂x_{k,i} as (with z real):
∂J/∂x_{k,r} = lim_{z→0} [J(x + z e_k) − J(x)]/z,  ∂J/∂x_{k,i} = lim_{z→0} [J(x + z i e_k) − J(x)]/z
∂J/∂x_{k,r} = lim_{z→0} [a^H(x + z e_k) + (x + z e_k)^H a − a^H x − x^H a]/z = lim_{z→0} z(a^H e_k + e_k^H a)/z = a_k* + a_k = 2a_{k,r}
∂J/∂x_{k,i} = lim_{z→0} [a^H(x + z i e_k) + (x + z i e_k)^H a − a^H x − x^H a]/z = lim_{z→0} z(i a_k* − i a_k)/z = i a_k* − i a_k = 2a_{k,i}
Hence, ∂J/∂x_{k,r} + i ∂J/∂x_{k,i} = 2a_{k,r} + i·2a_{k,i} = 2a_k. Thus,
∇_x J = 2[a₁ a₂ … a_n]ᵀ = 2a ⇒ ∇_x{a^H x + x^H a} = 2a
Suppose we want to differentiate the quadratic form J(x) = xᵀAx, where A ∈ ℝⁿˣⁿ, with respect to the elements of the vector x ∈ ℝⁿˣ¹. Denoting by A_{row,k} the k-th row of A and by A_{col,k} its k-th column, a similar limit computation gives ∂J/∂x_k = A_{row,k} x + Aᵀ_{col,k} x. Stacking the partial derivatives:
∇_x{xᵀAx} = [A_{row,1}x + Aᵀ_{col,1}x, …, A_{row,n}x + Aᵀ_{col,n}x]ᵀ = (A + Aᵀ)x ⇒ ∇_x{xᵀAx} = 2Ax when A is symmetric
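The same finite-difference check applies to the quadratic form, with A deliberately non-symmetric so that the (A + Aᵀ)x formula is exercised (a sketch):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 4
A = rng.standard_normal((n, n))        # general, not necessarily symmetric
x = rng.standard_normal(n)
J = lambda v: v @ A @ v                # J(x) = x^T A x

h = 1e-5
grad = np.zeros(n)
for k in range(n):
    e_k = np.zeros(n)
    e_k[k] = 1.0
    grad[k] = (J(x + h * e_k) - J(x - h * e_k)) / (2 * h)

quad_ok = bool(np.allclose(grad, (A + A.T) @ x, atol=1e-7))
```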
For the complex quadratic form J(x) = x^H A x with A ∈ ℂⁿˣⁿ Hermitian (so that J is real-valued) and x ∈ ℂⁿˣ¹:
∂J/∂x_{k,r} = lim_{z→0} [(x + z e_k)^H A (x + z e_k) − x^H A x]/z = lim_{z→0} [z(e_k^H A x + x^H A e_k) + z² e_k^H A e_k]/z = e_k^H A x + x^H A e_k
Computing ∂J/∂x_{k,i} similarly and combining:
∂J/∂x_{k,r} + i ∂J/∂x_{k,i} = 2 e_k^H A x = 2 A_{row,k} x
Thus,
∇_x J = 2[A_{row,1}x, A_{row,2}x, …, A_{row,n}x]ᵀ = 2Ax ⇒ ∇_x{x^H A x} = 2Ax
Next, consider J(X) = det{X} with X ∈ ℝⁿˣⁿ, differentiated with respect to the element X(k, v). Let e_{k,v} denote the matrix with a one in position (k, v) and zeros elsewhere, so that ∂J/∂X(k, v) = lim_{z→0} [det{X + z e_{k,v}} − det{X}]/z. We can expand both determinants along the k-th row. The numerator is:
(−1)^(k+v) (X(k, v) + z) M_{k,v} − (−1)^(k+v) X(k, v) M_{k,v} = z(−1)^(k+v) M_{k,v}
where the minor M_{k,v} is the determinant of X after removing the k-th row and v-th column. Thus,
∂J/∂X(k, v) = (−1)^(k+v) M_{k,v}
and ∇_X J is the matrix whose (k, v) entry is (−1)^(k+v) M_{k,v}, i.e., with M_{1,1} in the top-left corner, (−1)^(1+n) M_{1,n} in the top-right, (−1)^(n+1) M_{n,1} in the bottom-left, and M_{n,n} in the bottom-right.
This is the cofactor matrix, which equals adj{X}ᵀ, the transpose of the adjugate. If X is invertible, X⁻¹ = adj{X}/det{X}. Hence, ∇_X det{X} = adj{X}ᵀ and, if X is invertible, ∇_X det{X} = adj{X}ᵀ = det{X} X⁻ᵀ.
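The determinant-gradient formula can be verified entrywise by finite differences and compared with det{X}·X⁻ᵀ (a sketch; the identity is added to make X comfortably invertible):

```python
import numpy as np

rng = np.random.default_rng(9)
n = 4
X = rng.standard_normal((n, n)) + np.eye(n)   # generically invertible

h = 1e-6
grad = np.zeros((n, n))
for k in range(n):
    for v in range(n):
        E = np.zeros((n, n))
        E[k, v] = 1.0
        grad[k, v] = (np.linalg.det(X + h * E) - np.linalg.det(X - h * E)) / (2 * h)

det_grad_ok = bool(np.allclose(grad, np.linalg.det(X) * np.linalg.inv(X).T,
                               rtol=1e-5, atol=1e-6))
```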
Finally, consider J(X) = aᵀXb with a = [a₁ a₂ … a_{n−1} a_n]ᵀ and b = [b₁ b₂ … b_{n−1} b_n]ᵀ:
∂J/∂X(k, v) = lim_{z→0} [aᵀ(X + z e_{k,v})b − aᵀXb]/z = aᵀe_{k,v}b = a_k b_v
Thus ∇_X J is the matrix with (k, v) entry a_k b_v:
∇_X J = abᵀ ⟹ ∇_X{aᵀXb} = abᵀ
Convex and Concave Functions
Consider a vector x = [x₁ x₂ … x_{N−1} x_N]ᵀ ∈ ℝᴺˣ¹ and a function g(x): 𝓓 → ℝ whose domain 𝓓 ⊆ ℝᴺˣ¹ is a convex set. The function g(x) is convex if, for every x, y ∈ 𝓓 and λ ∈ [0, 1], g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y); it is concave if −g is convex.
Examples:
g(x) = x² is convex over ℝ.
g(x) = 1/x is convex over ℝ₊.
For x, y ∈ 𝓓, define φ(λ) = g(λx + (1 − λ)y), λ ∈ [0, 1]. If g(x) is continuously differentiable and z = [z₁ z₂ … z_{N−1} z_N]ᵀ = λx + (1 − λ)y, then by the chain rule, the first derivative of φ(λ) with respect to λ is:
φ^(1)(λ) = ∑_{k=1}^N (∂g/∂z_k)(∂z_k/∂λ) = ∑_{k=1}^N (∂g/∂z_k)(x_k − y_k) = ∇ᵀg(λx + (1 − λ)y)(x − y)
where ∇g(x) = [∂g/∂x₁ ∂g/∂x₂ … ∂g/∂x_{N−1} ∂g/∂x_N]ᵀ with the partial derivatives evaluated at x.
If g(x) is twice continuously differentiable, then the second derivative of φ(λ) with respect to λ is:
φ^(2)(λ) = d/dλ ∑_{k=1}^N (∂g/∂z_k)(x_k − y_k) = ∑_{k=1}^N (x_k − y_k) d/dλ(∂g/∂z_k) = ∑_{k=1}^N (x_k − y_k) ∑_{j=1}^N (∂²g/∂z_k∂z_j)(∂z_j/∂λ) = ∑_{k=1}^N ∑_{j=1}^N (x_k − y_k)(x_j − y_j)(∂²g/∂z_k∂z_j)
In matrix form, φ^(2)(λ) = (x − y)ᵀ𝓗(λx + (1 − λ)y)(x − y), where 𝓗(x) is the N×N Hessian matrix whose (n, m) entry is ∂²g/(∂x_n∂x_m) evaluated at x.
We now show that g(x) is convex iff φ(λ) is convex for every x, y ∈ 𝓓. If g(x) is convex, then for x, y ∈ 𝓓 and α, λ₁, λ₂ ∈ [0, 1]:
φ(αλ₁ + (1 − α)λ₂) = g( (αλ₁ + (1 − α)λ₂)x + (1 − αλ₁ − (1 − α)λ₂)y ) = g( α(λ₁x + (1 − λ₁)y) + (1 − α)(λ₂x + (1 − λ₂)y) ) ≤ αg(λ₁x + (1 − λ₁)y) + (1 − α)g(λ₂x + (1 − λ₂)y) = αφ(λ₁) + (1 − α)φ(λ₂) ⟹ φ(λ) is convex.
Conversely, if φ(λ) is convex for every x, y ∈ 𝓓, then:
g( α(λ₁x + (1 − λ₁)y) + (1 − α)(λ₂x + (1 − λ₂)y) ) = φ(αλ₁ + (1 − α)λ₂) ≤ αφ(λ₁) + (1 − α)φ(λ₂) = αg(λ₁x + (1 − λ₁)y) + (1 − α)g(λ₂x + (1 − λ₂)y) ⟹ g(x) is convex (choosing λ₁ = 1 and λ₂ = 0 recovers the standard form of the definition).
Assume that g(x) is differentiable with continuous first-order partial derivatives. Then g(x) is convex iff g(x) ≥ g(y) + ∇ᵀg(y)(x − y) for every x, y ∈ 𝓓. Let us now prove the equivalence between this first-order condition and the main definition of convexity.
Since g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y) for every x, y ∈ 𝓓 and λ ∈ [0, 1], φ(λ) ≤ λφ(1) + (1 − λ)φ(0). Thus, for λ > 0, φ(1) − φ(0) = g(x) − g(y) ≥ (1/λ)[φ(λ) − φ(0)].
Therefore, g(x) − g(y) ≥ lim_{λ↓0} (1/λ)[φ(λ) − φ(0)] = φ^(1)(0) = ∇ᵀg(y)(x − y) ⟹ g(x) ≥ g(y) + ∇ᵀg(y)(x − y).
Conversely, assume the first-order condition holds and consider x₁, x₂, x₃ ∈ 𝓓:
g(x₁) ≥ g(x₃) + ∇ᵀg(x₃)(x₁ − x₃)
g(x₂) ≥ g(x₃) + ∇ᵀg(x₃)(x₂ − x₃)
Multiplying the first inequality by λ ∈ [0, 1] and the second by 1 − λ, and adding:
λg(x₁) + (1 − λ)g(x₂) ≥ g(x₃) + ∇ᵀg(x₃)(λx₁ + (1 − λ)x₂ − x₃)
Choosing x₃ = λx₁ + (1 − λ)x₂ makes the gradient term vanish, yielding λg(x₁) + (1 − λ)g(x₂) ≥ g(λx₁ + (1 − λ)x₂), i.e., g(x) is convex.
Now we tackle the case where g(x) is twice continuously differentiable, i.e., it has continuous second-order partial derivatives.
If the Hessian matrix is positive semidefinite at all points of 𝓓, then for x, y ∈ 𝓓 there exists some λ ∈ [0, 1] such that:
g(x) = g(y) + ∇ᵀg(y)(x − y) + (1/2)(x − y)ᵀ𝓗(λx + (1 − λ)y)(x − y) = g(y) + ∇ᵀg(y)(x − y) + nonnegative term
so the first-order condition holds and g(x) is convex.
We now prove that the convexity of g(x) implies that 𝓗(x) is positive semidefinite for every x ∈ 𝓓. When N = 1, a convex function g(x) with first derivative g^(1)(x) satisfies (g^(1)(x) − g^(1)(y))(x − y) ≥ 0 (add the first-order conditions at x and y). If x ≠ y, dividing by (x − y)² we obtain [g^(1)(x) − g^(1)(y)]/(x − y) ≥ 0. As y tends to x, the left-hand side approaches the second derivative, yielding d²g(x)/dx² ≥ 0.
In the multidimensional case, convexity of g implies convexity of φ and, hence, φ^(2)(λ) ≥ 0. Thus, for λ = 1/2, t ∈ ℝ₊, x = z + tv, y = z − tv with x, y ∈ 𝓓:
φ^(2)(1/2) = (x − y)ᵀ𝓗(z)(x − y) = 4t² vᵀ𝓗(z)v ≥ 0
so vᵀ𝓗(z)v ≥ 0 for every direction v, i.e., 𝓗(z) is positive semidefinite.
As in the preceding derivations, it can be shown that strict convexity of g(x) is necessary and sufficient for the strict first-order condition g(x) > g(y) + ∇ᵀg(y)(x − y) to hold for every x, y ∈ 𝓓 with x ≠ y.
If the Hessian matrix is positive definite at all points of 𝓓, then for x, y ∈ 𝓓, x ≠ y, there exists some λ ∈ [0, 1] such that:
g(x) = g(y) + ∇ᵀg(y)(x − y) + (1/2)(x − y)ᵀ𝓗(λx + (1 − λ)y)(x − y) = g(y) + ∇ᵀg(y)(x − y) + positive term
so g(x) is strictly convex. The converse is not necessarily true. For example, the one-dimensional function g(x) = x⁴ is strictly convex although its second derivative is zero at x = 0.
A function g(x) is strongly convex with parameter η > 0 if, for every x, y ∈ 𝓓 and λ ∈ [0, 1]:
g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y) − (η/2)λ(1 − λ)‖x − y‖₂²
It is clear that strong convexity implies strict convexity. Now we repeat the above derivations
for differentiable convex functions in case the function is strongly convex.
Since g(λx + (1 − λ)y) ≤ λg(x) + (1 − λ)g(y) − (η/2)λ(1 − λ)‖x − y‖₂² for every x, y ∈ 𝓓 and λ ∈ [0, 1], φ(λ) ≤ λφ(1) + (1 − λ)φ(0) − (η/2)λ(1 − λ)‖x − y‖₂². Thus, for λ > 0,
φ(1) − φ(0) = g(x) − g(y) ≥ (1/λ)[φ(λ) − φ(0)] + (η/2)(1 − λ)‖x − y‖₂². Thus,
g(x) − g(y) ≥ lim_{λ↓0} { (1/λ)[φ(λ) − φ(0)] + (η/2)(1 − λ)‖x − y‖₂² } = φ^(1)(0) + (η/2)‖x − y‖₂² = ∇ᵀg(y)(x − y) + (η/2)‖x − y‖₂²
Therefore, g(x) ≥ g(y) + ∇ᵀg(y)(x − y) + (η/2)‖x − y‖₂².
Similarly, g(y) ≥ g(x) + ∇ᵀg(x)(y − x) + (η/2)‖x − y‖₂².
Adding, ∇ᵀg(y)(x − y) + ∇ᵀg(x)(y − x) + η‖x − y‖₂² ≤ 0 ⟹ (∇g(x) − ∇g(y))ᵀ(x − y) ≥ η‖x − y‖₂².
Conversely, suppose that g(x) ≥ g(y) + ∇ᵀg(y)(x − y) + (η/2)‖x − y‖₂² for every x, y ∈ 𝓓. We can show that the function then satisfies the main definition of strong convexity. Consider x₁, x₂, x₃ ∈ 𝓓.
g(x₁) ≥ g(x₃) + ∇ᵀg(x₃)(x₁ − x₃) + (η/2)‖x₁ − x₃‖₂²
g(x₂) ≥ g(x₃) + ∇ᵀg(x₃)(x₂ − x₃) + (η/2)‖x₂ − x₃‖₂²
Multiplying by λ and 1 − λ, respectively, and adding:
λg(x₁) + (1 − λ)g(x₂) ≥ g(x₃) + ∇ᵀg(x₃)(λx₁ + (1 − λ)x₂ − x₃) + (η/2)λ‖x₁ − x₃‖₂² + (η/2)(1 − λ)‖x₂ − x₃‖₂²
Choosing x₃ = λx₁ + (1 − λ)x₂, so that x₁ − x₃ = (1 − λ)(x₁ − x₂) and x₂ − x₃ = −λ(x₁ − x₂):
λg(x₁) + (1 − λ)g(x₂) ≥ g(λx₁ + (1 − λ)x₂) + (η/2)λ(1 − λ)‖x₁ − x₂‖₂²
For the second-order characterization, recall that for some λ ∈ [0, 1]:
g(x) = g(y) + ∇ᵀg(y)(x − y) + (1/2)(x − y)ᵀ𝓗(λx + (1 − λ)y)(x − y)
Since (x − y)ᵀ(𝓗(λx + (1 − λ)y) − ηI_{N×N})(x − y) ≥ 0 when 𝓗 − ηI_{N×N} is positive semidefinite at all points in the domain, then g(x) ≥ g(y) + ∇ᵀg(y)(x − y) + (η/2)‖x − y‖₂² and g(x) is strongly convex.
When N = 1, a strongly convex function g(x) with first derivative g^(1)(x) satisfies (g^(1)(x) − g^(1)(y))(x − y) ≥ η(x − y)². If x ≠ y, dividing by (x − y)² gives [g^(1)(x) − g^(1)(y)]/(x − y) ≥ η. As y tends to x, the left-hand side approaches the second derivative, yielding d²g(x)/dx² ≥ η ⟹ d²g(x)/dx² − η ≥ 0.
If g(x) is strongly convex with parameter η, then for α, λ₁, λ₂ ∈ [0, 1]:
φ(αλ₁ + (1 − α)λ₂) = g( α(λ₁x + (1 − λ₁)y) + (1 − α)(λ₂x + (1 − λ₂)y) ) ≤ αg(λ₁x + (1 − λ₁)y) + (1 − α)g(λ₂x + (1 − λ₂)y) − (η/2)α(1 − α)(λ₁ − λ₂)²‖x − y‖₂² = αφ(λ₁) + (1 − α)φ(λ₂) − (η/2)‖x − y‖₂² α(1 − α)(λ₁ − λ₂)² ⟹ φ(λ) is strongly convex with parameter η‖x − y‖₂².
Conversely, if φ(λ) is strongly convex with parameter η‖x − y‖₂² for every x, y ∈ 𝓓, then for α, λ₁, λ₂ ∈ [0, 1]:
g( α(λ₁x + (1 − λ₁)y) + (1 − α)(λ₂x + (1 − λ₂)y) ) = φ(αλ₁ + (1 − α)λ₂) ≤ αφ(λ₁) + (1 − α)φ(λ₂) − (η/2)‖x − y‖₂² α(1 − α)(λ₁ − λ₂)² = αg(λ₁x + (1 − λ₁)y) + (1 − α)g(λ₂x + (1 − λ₂)y) − (η/2)α(1 − α)(λ₁ − λ₂)²‖x − y‖₂² ⟹ g(x) is strongly convex.
If g(x) is strongly convex, then the scalar-valued φ(λ) is strongly convex and its second derivative is greater than or equal to η‖x − y‖₂². Hence, for x, y ∈ 𝓓 and λ ∈ [0, 1]: (x − y)ᵀ𝓗(λx + (1 − λ)y)(x − y) ≥ η‖x − y‖₂². Similar to the case of ordinary convexity, we conclude that 𝓗(x) − ηI_{N×N} is positive semidefinite for all x ∈ 𝓓.
Convex Optimization Problems
Suppose we are interested in minimizing a convex function f(x) with domain 𝓓. If x* is a local optimum, then it is a global optimum. We can prove this by contradiction. Since x* is a local optimum, there exists an open neighborhood of x* with radius ε, denoted N_ε(x*), such that for all x ∈ N_ε(x*), i.e., all x such that ‖x − x*‖₂ < ε, we have f(x) ≥ f(x*). Assume that x* is not a global optimum; specifically, assume that there exists z ∈ 𝓓 with z ∉ N_ε(x*) such that f(z) < f(x*). For λ ∈ (0, 1), the point λz + (1 − λ)x* lies in 𝓓, and by convexity, f(λz + (1 − λ)x*) ≤ λf(z) + (1 − λ)f(x*) < f(x*). Choosing λ small enough that λ‖z − x*‖₂ < ε places this point inside N_ε(x*) while giving it a value below f(x*), contradicting local optimality. Hence, x* is a global optimum.
A convex optimization problem in standard form is:
min_x f(x)
subject to h_i(x) ≤ 0, i = 1, …, q
           g_j(x) = 0, j = 1, …, p
The functions f(x) and {h_i(x)}_{i=1}^q are convex, and the {g_j(x)}_{j=1}^p are affine. Recall that a convex function has a domain which is a convex set. The feasible set, i.e., the set of all vectors x that satisfy the constraints, is a convex set. The problem is infeasible if there is no x that satisfies the constraints of the problem.
Assume that the optimal value of the feasible convex program is 𝒑∗ = 𝐦𝐢𝐧𝒙 𝒇(𝒙). The set of
optimal solutions, 𝓞, is a convex set. If 𝒙, 𝒚 ∈ 𝓞, then 𝒇 𝒙 = 𝒑∗ and 𝒇 𝒚 = 𝒑∗ . For 𝝀 ∈ 𝟎, 𝟏 ,
𝒑∗ ≤ 𝒇 𝝀𝒙 + 𝟏 − 𝝀 𝒚 ≤ 𝝀𝒇 𝒙 + 𝟏 − 𝝀 𝒇 𝒚 = 𝒑∗ . Hence, 𝒇 𝝀𝒙 + 𝟏 − 𝝀 𝒚 = 𝒑∗ for all
𝒙, 𝒚 ∈ 𝓞 and 𝝀 ∈ 𝟎, 𝟏 . That is, 𝓞 is a convex set. (If the problem is infeasible, 𝓞 is empty. The
empty set is considered convex.)
If f(x) is strictly convex and x, y ∈ 𝓞 with x ≠ y, then for λ ∈ (0, 1), f(λx + (1 − λ)y) < λf(x) + (1 − λ)f(y) = p*, thereby contradicting p* = min_x f(x). Hence, x = y: the optimal solution, if it exists, is unique when the objective function is strictly convex. If the objective function is convex but not strictly convex, then the optimal solution may or may not be unique. For example, min_{x∈ℝⁿˣ¹} ‖x‖₂ = 0 and is achieved uniquely by the n-dimensional all-zero vector, despite the fact that ‖x‖₂ is not strictly convex (whereas ‖x‖₂² is).
Periodic Signals and Fourier Series
We focus here on one-dimensional signals. A signal x(α) is periodic if there exists ∆ > 0 such that x(α + ∆) = x(α) for all α.
The Fourier series of a periodic signal x(α) with period ∆ is given by:
x(α) = a₀ + ∑_{k=1}^∞ [ a_k cos(2πkα/∆) + b_k sin(2πkα/∆) ]
where
a₀ = (1/∆) ∫_β^{β+∆} x(α) dα
a_k = (2/∆) ∫_β^{β+∆} x(α) cos(2πkα/∆) dα, ∀k ∈ {1, 2, 3, …}
b_k = (2/∆) ∫_β^{β+∆} x(α) sin(2πkα/∆) dα, ∀k ∈ {1, 2, 3, …}
Equivalently, in complex exponential form:
x(α) = ∑_{k=−∞}^∞ c_k exp(i2πkα/∆)
where, for all integers k, c_k = (1/∆) ∫_β^{β+∆} x(α) exp(−i2πkα/∆) dα.
Note that β is typically set to 0, −∆/2, or whatever value may be convenient for the evaluation of the integrals. If x(α) is real-valued, c_k = c*_{−k}.
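The coefficient formulas can be exercised numerically. This sketch (signal choice, grid size, and number of harmonics are my own) computes c_k for a square wave by midpoint quadrature and checks that the partial sums approach the signal value at a point of continuity:

```python
import numpy as np

Delta = 2.0

def x_sig(alpha):
    # Square wave with period Delta: +1 on [0, Delta/2), -1 on [Delta/2, Delta).
    return np.where((alpha % Delta) < Delta / 2, 1.0, -1.0)

M = 100000
alpha = (np.arange(M) + 0.5) * Delta / M   # midpoints over one period

def c(k):
    # c_k = (1/Delta) * integral of x(alpha) exp(-i 2 pi k alpha / Delta)
    return np.mean(x_sig(alpha) * np.exp(-1j * 2 * np.pi * k * alpha / Delta))

def partial_sum(a, n):
    return sum(c(k) * np.exp(1j * 2 * np.pi * k * a / Delta) for k in range(-n, n + 1))

val = partial_sum(0.5, 101)   # evaluate away from the discontinuities
fourier_ok = abs(val.real - 1.0) < 0.05 and abs(val.imag) < 1e-10
```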
Important Note: The equality in the Fourier series is not pointwise equality over ℝ. For periodic L²(0, ∆) signals with finite energy over a period, i.e., for which (1/∆) ∫_β^{β+∆} |x(α)|² dα is finite, the equality should be understood as
lim_{n→∞} ∫_β^{β+∆} | x(α) − ∑_{k=−n}^n c_k exp(i2πkα/∆) |² dα = 0.
In 1966, Lennart Carleson proved the following pointwise almost-everywhere convergence result for finite-energy signals:
x(α) = lim_{n→∞} ∑_{k=−n}^n c_k exp(i2πkα/∆) almost everywhere
Almost everywhere means over the whole domain, perhaps excluding a set of measure zero. Pointwise convergence over a set Ω means that ∀α ∈ Ω and ∀ε > 0, there exists n(α) ≥ 0 such that |x(α) − ∑_{k=−m}^m c_k exp(i2πkα/∆)| < ε for all m ≥ n(α).
If x(α) has a finite number of finite discontinuities over a period, and if at each point in the period the left and right derivatives exist, it can be shown that at all points the Fourier sum converges pointwise to (1/2)[x(α⁻) + x(α⁺)], where x(α⁻) and x(α⁺) are the one-sided limits¹ at α.
Periodization of a Signal
Consider a signal y(α) and P > 0. The summation y_P(α) = ∑_{m=−∞}^∞ y(α − mP), provided that it converges for all α, is called the periodization of y(α). Note that
y_P(α + P) = ∑_{m=−∞}^∞ y(α + P − mP) = ∑_{m=−∞}^∞ y(α − (m − 1)P) = ∑_{m=−∞}^∞ y(α − mP) = y_P(α)
thereby indicating that P is a period of this signal or, equivalently, that the summation repeats every P.
¹ https://en.wikipedia.org/wiki/One-sided_limit
If the signal y(α) has bounded support, vanishing outside α ∈ [−q, q], and if P > 2q, then the shifted replicas in the infinite sum do not overlap.
Dirac Delta "Function"
The Dirac delta is characterized by its sifting property:
∫_{−∞}^∞ δ(t)x(t) dt = ∫_{0⁻}^{0⁺} δ(t)x(t) dt = x(0)
provided that x(t) is continuous at t = 0. The lower limit 0⁻ can be taken to mean a negative number that is infinitesimally small in magnitude. Similarly, 0⁺ can be taken to be a positive number that is infinitesimally small in magnitude. If zero is outside the domain of integration 𝓓, then ∫_𝓓 δ(t)x(t) dt = 0. We will take the integral to be undefined if 0 is a boundary point of 𝓓.
- When x(t) = 1, ∫_{−∞}^∞ δ(t) dt = 1.
- Scaling: consider ∫_{−∞}^∞ δ(at − b)x(t) dt, a ≠ 0, and see how it is related to the basic property of the Dirac delta. Let u = at − b. If a > 0, the integral becomes (1/a) ∫_{−∞}^∞ δ(u)x((u + b)/a) du. If a < 0, the limits flip and the integral becomes −(1/a) ∫_{−∞}^∞ δ(u)x((u + b)/a) du. Both cases can be combined as:
(1/|a|) ∫_{−∞}^∞ δ(u)x((u + b)/a) du = (1/|a|)x(b/a) = ∫_{−∞}^∞ (1/|a|)δ(t − b/a)x(t) dt
Because of this, we will set δ(at − b) = (1/|a|)δ(t − b/a).
- When b = 0 and a = −1, we obtain δ(−t) = δ(t), meaning that the Dirac delta is even.
- When b = t₀ and a = 1, ∫_{−∞}^∞ δ(t − t₀)x(t) dt = x(t₀), provided that x(t) is continuous at t = t₀.
- Multiplication by a function y(t) continuous at t = 0: y(t)δ(t) = y(0)δ(t). If y(0) = 0, then y(t)δ(t) = 0.
- Convolution: δ(t − u) ∗ x(t − w) = ∫_{−∞}^∞ δ(z − u)x(t − z − w) dz = x(t − u − w).
- Derivatives: The Dirac delta is infinitely differentiable (although, of course, not in the classical sense). Like the Dirac delta, its derivatives can be characterized through their impact on other functions under the integral sign. Denoting the first derivative of the Dirac delta by δ^(1)(t) and that of x(t) by x^(1)(t), integration by parts gives:
∫_{−∞}^∞ δ^(1)(t)x(t) dt = ∫_{−∞}^∞ x(t) dδ(t) = −∫_{−∞}^∞ δ(t)x^(1)(t) dt = −x^(1)(0)
provided that x^(1)(t) is continuous at t = 0. (Note the use of lim_{t→±∞} δ(t) = 0.)
- δ(t) = du(t)/dt, where u(t) is the Heaviside unit-step function. This is because
∫_{−∞}^∞ (du(t)/dt)x(t) dt = lim_{t→∞} x(t)u(t) − ∫_{−∞}^∞ u(t)x^(1)(t) dt = lim_{t→∞} x(t) − ∫_0^∞ x^(1)(t) dt = lim_{t→∞} x(t) − [lim_{t→∞} x(t) − x(0)] = x(0).
- CTFT{δ(t)} = ∫_{−∞}^∞ δ(t) exp(−i2πft) dt = exp(−i2πf·0) = 1. Hence, the inverse CTFT of 1 is δ(t), i.e., ∫_{−∞}^∞ 1·exp(i2πft) df = δ(t). Generally, ∫_{−∞}^∞ exp(±i2πwz) dw = δ(z).
- Composition with the Dirac delta: Consider a differentiable function g(t) with roots at t₁, t₂, …, i.e., g(t₁) = 0, g(t₂) = 0, …. Suppose that g(t) has continuous nonzero derivatives at t₁, t₂, …; this means that g^(1)(t₁) ≠ 0, g^(1)(t₂) ≠ 0, etc., where g^(1)(t) = dg(t)/dt. Since δ(g(t)) contributes only near the roots,
∫_{−∞}^∞ δ(g(t))φ(t)|g^(1)(t)| dt = ∑_j ∫_{t_j−ε}^{t_j+ε} δ(g(t))φ(t)|g^(1)(t)| dt with infinitesimally small ε.
Over t ∈ (t_j − ε, t_j + ε), g(t) ≈ g(t_j) + g^(1)(t_j)(t − t_j) = g^(1)(t_j)(t − t_j), with g^(1)(t_j) ≠ 0. Since δ(g^(1)(t_j)(t − t_j)) = (1/|g^(1)(t_j)|)δ(t − t_j),
∫_{−∞}^∞ δ(g(t))φ(t)|g^(1)(t)| dt = ∑_j (1/|g^(1)(t_j)|)φ(t_j)|g^(1)(t_j)| = ∑_j φ(t_j).
Thus, δ(g(t))|g^(1)(t)| = ∑_j δ(t − t_j) ⇒ δ(g(t)) = ∑_j δ(t − t_j)/|g^(1)(t)| = ∑_j δ(t − t_j)/|g^(1)(t_j)|.
That is, for g(t) with a continuous derivative satisfying g(t₁) = 0, g^(1)(t₁) ≠ 0, g(t₂) = 0, g^(1)(t₂) ≠ 0, etc.,
δ(g(t)) = ∑_j δ(t − t_j)/|g^(1)(t_j)|.
𝒋
Soft-Thresholding Operator
Consider minimizing over z ∈ ℝ the function
g(z) = (1/2)(z − z₀)² + λ|z|
where λ ∈ ℝ₊.
g(z) − g(0) = (1/2)(z − z₀)² + λ|z| − (1/2)z₀² = (1/2)z² + λ|z| − z₀z ≥ (1/2)z² + λ|z| − |z₀||z| ⟹
g(z) − g(0) ≥ (1/2)z² + (λ − |z₀|)|z|
If |z₀| ≤ λ, g(z) ≥ g(0) + (1/2)z² ≥ g(0), so the minimizer is z = 0.
For z ≠ 0, dg(z)/dz = z − z₀ + λ sign(z), where sign(z) = 1 when z > 0 and sign(z) = −1 when z < 0.
dg(z)/dz = 0 ⟹ z = z₀ − λ sign(z). If z > 0, z₀ − λ > 0 ⟹ z₀ > λ > 0. If z < 0, z₀ + λ < 0 ⟹ z₀ < −λ < 0. Thus, when z ≠ 0 and |z₀| > λ, the stationary point is z = z₀ − λ sign(z₀). The second derivative in this case is always 1, i.e., positive; hence, when |z₀| > λ, g(z) is minimized at z = z₀ − λ sign(z₀). We can combine the cases |z₀| ≤ λ and |z₀| > λ by introducing the soft-thresholding operator S_λ(z₀):
S_λ(z₀) = 0 when |z₀| ≤ λ, and S_λ(z₀) = z₀ − λ sign(z₀) when |z₀| > λ.
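The operator is straightforward to implement and check against a brute-force grid minimization of g(z) (a sketch; function names are my own):

```python
import numpy as np

def soft_threshold(z0, lam):
    # S_lambda(z0): 0 if |z0| <= lambda, else z0 - lambda*sign(z0).
    return np.sign(z0) * np.maximum(np.abs(z0) - lam, 0.0)

def g(z, z0, lam):
    return 0.5 * (z - z0) ** 2 + lam * np.abs(z)

lam = 0.8
zgrid = np.linspace(-5.0, 5.0, 200001)
soft_ok = True
for z0 in (-3.0, -0.5, 0.0, 0.4, 2.7):
    z_star = float(soft_threshold(z0, lam))
    z_grid_best = zgrid[np.argmin(g(zgrid, z0, lam))]   # brute-force minimizer
    soft_ok = soft_ok and abs(z_star - z_grid_best) < 1e-3
```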
Important Sum
Consider a positive integer N and the sum:
S(v) = ∑_{m=0}^{N−1} exp(±i2πmv/N)
If v is a multiple of N, then v = lN with l ∈ ℤ. In this case, S(v) = ∑_{m=0}^{N−1} exp(±i2πml) = ∑_{m=0}^{N−1} 1 = N.
If v is not a multiple of N, we can apply the rule for geometric series to obtain
S(v) = ∑_{m=0}^{N−1} [exp(±i2πv/N)]^m = [1 − exp(±i2πv/N)^N]/[1 − exp(±i2πv/N)] = [1 − exp(±i2πv)]/[1 − exp(±i2πv/N)] = 0.
(Note that the denominator is not equal to zero, as v is not a multiple of N.)
Therefore,
S(v) = ∑_{m=0}^{N−1} exp(±i2πmv/N) = N when v = lN, l ∈ ℤ, and 0 otherwise.
Recall that, for z ∈ ℂ, z ≠ 1, the sum of a geometric series is given by:
∑_{m=K}^M z^m = z^K (1 − z^(M−K+1))/(1 − z)
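A direct numerical check of S(v) (a sketch; the choice N = 16 is arbitrary):

```python
import numpy as np

def S(v, N):
    m = np.arange(N)
    return np.exp(1j * 2 * np.pi * m * v / N).sum()

N = 16
mult_ok = bool(np.isclose(S(0, N), N)) and bool(np.isclose(S(3 * N, N), N))
zero_ok = all(np.isclose(S(v, N), 0.0, atol=1e-9) for v in range(1, N))
```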
Ideal Response of Continuous-Time (CT) Filters
Ideal Response of Discrete-Time (DT) Filters