Optimality Conditions
Bang-Shien Chen∗
When it comes to optimality conditions, we often discuss well-known conditions such as the Karush-Kuhn-Tucker (KKT) conditions and the Fritz John (FJ) conditions. However, it is helpful to start with a simpler case, unconstrained optimization problems, and derive necessary and sufficient conditions for optimality.
Let x∗ be a local minimum of Problem 1.1, and suppose that f is continuously differentiable
at x∗ . Then
∇f (x∗ ) = 0.
Proof. Given some direction d ∈ Rn and scalar α, we have the first-order approximation
f(x∗ + αd) ≈ f(x∗) + α∇f(x∗)⊤d.
Since x∗ is a local minimum, we have the inequality
f(x∗) ≤ f(x∗ + αd) ≈ f(x∗) + α∇f(x∗)⊤d,
i.e., 0 ≤ α d⊤∇f(x∗) for all sufficiently small α. Taking small α > 0 gives d⊤∇f(x∗) ≥ 0 for all d, and taking small α < 0 gives d⊤∇f(x∗) ≤ 0 for all d. Together, d⊤∇f(x∗) = 0 for all d, thus ∇f(x∗) = 0.
∗ https://dgbshien.com/
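As a quick numerical sanity check (my own illustration, not part of the original notes), the sketch below evaluates a central finite-difference gradient of a simple quadratic f(x) = ½x⊤Ax − b⊤x at its closed-form minimizer x∗ = A⁻¹b; the estimated gradient is numerically zero there, as the first-order condition predicts, and nonzero elsewhere.

import numpy as np

# Hypothetical example (not from the text): f(x) = 0.5 * x^T A x - b^T x with A
# positive definite, whose unique minimizer is x* = A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])

def f(x):
    return 0.5 * x @ A @ x - b @ x

def numerical_grad(f, x, h=1e-6):
    """Central finite-difference estimate of the gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x_star = np.linalg.solve(A, b)            # closed-form minimizer
print(numerical_grad(f, x_star))          # approximately [0, 0]
print(numerical_grad(f, np.zeros(2)))     # clearly nonzero at a non-stationary point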
Similar to the first-order condition, we can also derive a second-order condition using a second-order approximation.
Let x∗ be a local minimum of Problem 1.1, and suppose that f is twice continuously
differentiable at x∗ . Then
∇2 f (x∗ ) ⪰ 0.
Proof. Given some direction d ∈ Rn and scalar α, we have the second-order approximation
f(x∗ + αd) ≈ f(x∗) + α∇f(x∗)⊤d + (α²/2) d⊤∇²f(x∗)d.
Since x∗ is a local minimum, we have the inequality
f(x∗) ≤ f(x∗) + α∇f(x∗)⊤d + (α²/2) d⊤∇²f(x∗)d.
By Theorem 1.1, ∇f(x∗) = 0, and since α²/2 > 0, we obtain d⊤∇²f(x∗)d ≥ 0 for all d, i.e., ∇²f(x∗) ⪰ 0.
Proof. Given some direction d ∈ Rn, by the second-order approximation and the first-order condition,
f(x∗ + d) = f(x∗) + ∇f(x∗)⊤d + (1/2) d⊤∇²f(x∗)d + o(∥d∥²)
= f(x∗) + (1/2) d⊤∇²f(x∗)d + o(∥d∥²).
Since ∇²f(x∗) is real symmetric, we have the spectral decomposition ∇²f(x∗) = QΛQ⊤, where Λ is a diagonal matrix of eigenvalues and Q is an orthogonal matrix whose columns are the corresponding eigenvectors. Let z = Q⊤d; then ∥z∥ = ∥Q⊤d∥ = ∥d∥ since Q is orthogonal, and
d⊤∇²f(x∗)d = d⊤QΛQ⊤d = z⊤Λz = Σ_{i=1}^n λi zi² ≥ λmin Σ_{i=1}^n zi² = λmin ∥z∥² = λmin ∥d∥².
Since ∇²f(x∗) ≻ 0, we have λmin > 0, hence f(x∗ + d) ≥ f(x∗) + (λmin/2)∥d∥² + o(∥d∥²) > f(x∗) for all sufficiently small d ≠ 0, i.e., x∗ is a strict local minimum.
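As a small numerical illustration of this eigenvalue argument (my own example, not part of the original notes), the sketch below checks the smallest eigenvalue of the Hessian at a known minimizer of f(x, y) = x⁴ + y². The Hessian there is positive semidefinite, so the second-order necessary condition holds, while positive definiteness (needed for the sufficient condition) fails, showing that the two conditions are genuinely different.

import numpy as np

# Hypothetical example: f(x, y) = x**4 + y**2 has its global minimum at the origin.
# The Hessian at the origin is diag(0, 2): positive semidefinite but not positive definite.
def hessian(x, y):
    return np.array([[12.0 * x**2, 0.0],
                     [0.0,         2.0]])

H = hessian(0.0, 0.0)
eigvals = np.linalg.eigvalsh(H)      # eigenvalues of the symmetric Hessian, ascending
print(eigvals)                       # [0. 2.]  -> lambda_min = 0
print(eigvals.min() >= 0)            # True: second-order necessary condition holds
print(eigvals.min() > 0)             # False: the strict sufficient condition does not apply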
1.3 Existence of Optimal Solutions
In general, a minimum need not exist. For example, f(x) = x and f(x) = e^x have no global minimum. To derive existence, we first review some definitions. Let X ⊆ Rn,
3. There exists a scalar γ such that the level set {x ∈ X | f (x) ≤ γ} is nonempty and
compact.
Then there exists a vector x∗ ∈ X such that f(x∗) = inf_{x∈X} f(x), i.e., x∗ is a global minimum.
This can be reduced to the well-known Weierstrass Extreme Value Theorem, which states
that every continuous function on a nonempty compact set attains its extreme values on that
set, including a global minimum. We next give an example of using optimality conditions to
prove a well-known inequality.
For the problem
min  e^{y1} + e^{y2} + · · · + e^{yn}
s.t.  y1 + y2 + · · · + yn = s,
we aim to show the optimal value is n e^{s/n}. We rewrite it as an equivalent unconstrained problem by eliminating yn = s − y1 − · · · − y_{n−1}:
min  f(y1, . . . , y_{n−1}) = e^{y1} + · · · + e^{y_{n−1}} + e^{s−y1−···−y_{n−1}}.
Note that since f is coercive, by Proposition 1.4, there exists a global minimum. Let (y∗1, y∗2, . . . , y∗_{n−1}) be the global minimum; by Theorem 1.1, we have
∂f/∂yi = e^{y∗i} + e^{s−y∗1−···−y∗_{n−1}} · (−1) = 0   for i = 1, . . . , n − 1,
which implies y∗i = s − y∗1 − · · · − y∗_{n−1} for i = 1, . . . , n − 1. The system has only one solution, y∗i = s/n for all i, which is also the unique global minimum. Also, e^{y∗1} + · · · + e^{y∗n} = n e^{s/n}.
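A quick numerical check of this conclusion (my own sketch, not in the original notes): sampling random points on the constraint y1 + · · · + yn = s, the value Σ_i e^{yi} never falls below n e^{s/n}, and equals it at yi = s/n.

import numpy as np

# Numerical sanity check: on the hyperplane sum(y) = s, the value sum(exp(y))
# is never below n * exp(s / n), with equality at y_i = s / n for all i.
rng = np.random.default_rng(0)
n, s = 5, 2.0
claimed_optimum = n * np.exp(s / n)

for _ in range(10_000):
    y = rng.normal(size=n)
    y += (s - y.sum()) / n                 # project the sample onto sum(y) = s
    assert np.exp(y).sum() >= claimed_optimum - 1e-9

print(claimed_optimum)                      # n * exp(s / n)
print(np.exp(np.full(n, s / n)).sum())      # identical value, attained at y_i = s / n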
2 Lagrange Duality
In the previous section, we derived optimality conditions for unconstrained problems. We
will next show the optimality conditions for constrained problems, such as the Fritz John con-
ditions and Karush-Kuhn-Tucker conditions. However, since it is easier to understand these
conditions with a basic knowledge of Lagrange duality, we first give a brief introduction to
Lagrange duality.
The Lagrangian L : Rn × Rm × Rl → R associated with Problem (2.1) is
L(x, λ, ν) = f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^l νi hi(x),
where λ and ν are called the Lagrange multipliers or dual variables. We then define the (Lagrange) dual function g : Rm × Rl → R by
g(λ, ν) = inf_{x∈D} L(x, λ, ν) = inf_{x∈D} ( f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^l νi hi(x) ).
Note that the dual function is the pointwise infimum of a family of affine functions of (λ, ν), so it is always concave even if Problem (2.1) is not convex. Now suppose that x∗ is an optimal solution. For any λ ≥ 0 and any ν, we have
L(x∗, λ, ν) = f0(x∗) + Σ_{i=1}^m λi fi(x∗) + Σ_{i=1}^l νi hi(x∗) ≤ f0(x∗),
where the first sum is nonpositive since λ ≥ 0 and fi(x∗) ≤ 0, and the second sum is zero since hi(x∗) = 0,
which implies
g(λ, ν) = inf_{x∈D} L(x, λ, ν) ≤ L(x∗, λ, ν) ≤ f0(x∗) = p∗. (2.2)
That is, the optimal value p∗ is an upper bound on the dual function g; this is the main idea of weak duality. At the same time, p∗ is a lower bound on the objective function f0 over the feasible set. Figure 2.1 illustrates this property.
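The sketch below demonstrates weak duality numerically on a small hypothetical problem of my own (not the one in Example 2.1): minimize x² subject to 1 − x ≤ 0, for which p∗ = 1 and the dual function g(λ) = λ − λ²/4 can be computed in closed form.

import numpy as np

# Hypothetical problem: minimize x**2 subject to 1 - x <= 0, so p* = 1 at x* = 1.
# The Lagrangian is L(x, lam) = x**2 + lam * (1 - x); minimizing over x at x = lam/2
# gives the dual function g(lam) = lam - lam**2 / 4.
p_star = 1.0

def dual(lam):
    x = lam / 2.0                    # unconstrained minimizer of L(., lam)
    return x**2 + lam * (1.0 - x)    # equals lam - lam**2 / 4

lams = np.linspace(0.0, 10.0, 101)
print(np.all(dual(lams) <= p_star + 1e-12))   # True: g(lam) <= p* for every lam >= 0
print(dual(2.0))                              # 1.0: the bound is tight at lam = 2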
Example 2.1
min_x  x³ + 2x² − x + 1
s.t.  x² ≤ 1 ⟺ −1 ≤ x ≤ 1.
We can observe from Figure 2.1 that the optimal value is a lower bound of the objective function over the feasible set, and an upper bound of the dual function.
[Figure 2.1: Left, the objective function f0(x) with the optimal value marked; right, the dual function g(λ) with the optimal value marked.]
Since the optimal value is an upper bound of the dual function, the (Lagrange) dual problem aims to maximize the dual function:
max_{λ,ν}  g(λ, ν)
s.t.  λ ⪰ 0.   (2.3)
The dual problem is always a convex optimization problem (the maximization of a concave function over a convex set), whether or not the primal problem is convex.
Let d∗ denote the optimal value of the dual problem (2.3). Weak duality always holds:
d∗ ≤ p∗.
If the bound is tight, we say that strong duality holds:
d∗ = p∗.
While strong duality does not hold in general, there are results that establish conditions on the problem under which it does. These conditions are called constraint qualifications. Here we give a simple and widely used constraint qualification in the context of convex optimization, Slater's condition: there exists an x ∈ relint D such that fi(x) < 0 for all i = 1, . . . , m and hi(x) = 0 for all i = 1, . . . , l, i.e., there exists a strictly feasible point in the relative interior of the domain D.
2.3 Complementary Slackness
Suppose that the primal optimal value p∗ and dual optimal value d∗ are attained where x∗
is the primal optimal solution and (λ∗ , ν ∗ ) is the dual optimal solution. This means that
g(λ∗, ν∗) ≤ f0(x∗) + Σ_{i=1}^m λ∗i fi(x∗) + Σ_{i=1}^l ν∗i hi(x∗) ≤ f0(x∗).
The first inequality holds because the infimum is a lower bound, i.e., g(λ∗, ν∗) = inf_{x∈D} L(x, λ∗, ν∗) ≤ L(x∗, λ∗, ν∗), and the second inequality holds because x∗ is feasible, so that fi(x∗) ≤ 0 for i = 1, . . . , m and hi(x∗) = 0 for i = 1, . . . , l, while λ∗i ≥ 0. Now suppose that strong duality holds, i.e., g(λ∗, ν∗) = f0(x∗); then the two inequalities in the chain hold with equality. This implies complementary slackness:
λ∗i fi (x∗ ) = 0, i = 1, . . . , m. (2.4)
We say an inequality constraint is binding or active if fi(x∗) = 0. An important consequence of complementary slackness is that the i-th inequality constraint must be binding whenever λ∗i is nonzero, and λ∗i must be zero whenever the i-th inequality constraint is non-binding:
• λ∗i > 0 =⇒ fi (x∗ ) = 0.
• fi (x∗ ) < 0 =⇒ λ∗i = 0.
Roughly speaking, only the constraints that are binding at the optimal point have a direct
impact on the optimal solution through their Lagrange multipliers, while constraints that are
non-binding do not affect the solution, as their Lagrange multipliers are zero. Complementary
slackness allows us to identify which constraints are active and further simplify an optimization
problem, or identify which constraints are critical and provide insights for decision-making.
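As a concrete illustration (my own hypothetical 1-D example, not from the text), consider minimizing (x − 2)² subject to x ≤ c. When c < 2 the constraint is binding and its multiplier is positive; when c > 2 the constraint is non-binding and the multiplier is zero. In both cases the product λ∗ f1(x∗) vanishes, as complementary slackness requires.

# Hypothetical 1-D problem: minimize (x - 2)**2 subject to f1(x) = x - c <= 0.
def kkt_point(c):
    """Return the optimal x and its Lagrange multiplier for the problem above."""
    if c < 2.0:
        x_star = c                 # constraint binding: optimum sits on the boundary
        lam = 2.0 * (2.0 - c)      # from stationarity: 2*(x - 2) + lam = 0
    else:
        x_star = 2.0               # unconstrained minimizer is already feasible
        lam = 0.0                  # non-binding constraint gets a zero multiplier
    return x_star, lam

for c in (1.0, 3.0):
    x_star, lam = kkt_point(c)
    f1 = x_star - c                          # constraint value at the optimum
    print(c, x_star, lam, lam * f1)          # lam * f1 is 0.0 in both cases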
Another consequence of complementary slackness is that, since
f0(x∗) = g(λ∗, ν∗) = inf_{x∈D} L(x, λ∗, ν∗) ≤ L(x∗, λ∗, ν∗) = f0(x∗) + Σ_{i=1}^m λ∗i fi(x∗) + Σ_{i=1}^l ν∗i hi(x∗) = f0(x∗),
the infimum of L(x, λ∗, ν∗) over x is attained at x∗, i.e., x∗ minimizes L(x, λ∗, ν∗). Therefore, by Theorem 1.1, it follows that
∇f0(x∗) + Σ_{i=1}^m λ∗i ∇fi(x∗) + Σ_{i=1}^l ν∗i ∇hi(x∗) = 0. (2.5)
3.1 Fritz John Necessary Conditions
The FJ necessary conditions hold for local minima. We need some assumptions:
1. Stationary condition: λ0∇f0(x) + Σ_{i∈I} λi∇fi(x) + Σ_{i=1}^l νi∇hi(x) = 0.
This can be reduced to a more commonly used form of the FJ necessary conditions [4], stated with complementary slackness, under stronger assumptions:
Corollary 3.2
• Binding inequality constraints are strictly pseudoconvex.
4 Karush-Kuhn-Tucker Conditions
The FJ conditions provide a general set of necessary conditions for optimality which do not require the Lagrange multiplier associated with the objective function to be positive; only λ0 ≥ 0 is required, so λ0 may be zero. This makes the FJ conditions more general but less informative: if λ0 = 0, the stationary condition no longer reflects a balance between the objective and the constraints. The main difference between the FJ conditions and the KKT conditions is that in the KKT conditions the Lagrange multiplier λ0 cannot be zero, i.e., λ0 > 0 (so it can be normalized to λ0 = 1). Thus, the KKT conditions can be seen as a special case of the FJ conditions, where regularity of the problem ensures meaningful Lagrange multipliers. We can derive the KKT conditions from the FJ conditions if some constraint qualification holds.
• Constraint qualification.
1. Stationary condition: ∇f0(x) + Σ_{i∈I} λi∇fi(x) + Σ_{i=1}^l νi∇hi(x) = 0.
Similar to the FJ necessary conditions, these can be reduced to a more commonly used form of the KKT necessary conditions, stated with complementary slackness, under stronger assumptions:
• Objective function and inequality constraints are differentiable.
• Equality constraints are continuously differentiable.
• Constraint qualification.
Corollary 4.2
1. Stationary condition: ∇f0(x) + Σ_{i∈I} λi∇fi(x) + Σ_{i=1}^l νi∇hi(x) = 0.
Suppose that x is a KKT point, and let I = {i | fi (x) = 0} be the set of binding constraints.
If f0 is pseudoconvex, fi for i ∈ I are quasiconvex, and hi is quasiconvex if νi > 0 and
quasiconcave if νi < 0, then x is a global minimum.
For convex optimization problems, the KKT conditions are sufficient. Here is an example
of solving a convex program with the KKT conditions.
Example 4.4
Consider the convex program
min  (x1 − 1)² + (x2 − 2)²
s.t.  x1 + 3x2 ≤ 1,
i.e., f0(x) = (x1 − 1)² + (x2 − 2)² and f1(x) = x1 + 3x2 − 1. We first check Slater's condition. We have a feasible point (0, 0) where f1(0, 0) = −1 < 0, so it is strictly feasible (and both the objective and the constraint are differentiable). Then by the
KKT conditions, there exists a unique λ such that the stationary condition holds; we then check primal/dual feasibility and complementary slackness. Supposing that the constraint is binding, the system becomes
2x1 − 2 + λ = 0
2x2 − 4 + 3λ = 0      =⇒  (x1, x2, λ) = (2/5, 1/5, 6/5),
x1 + 3x2 − 1 = 0
and since λ = 6/5 ≥ 0 is dual feasible, the point x∗ = (2/5, 1/5) with λ∗ = 6/5 satisfies the KKT conditions and is the global minimum.
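A numerical cross-check of this example (using the problem data as reconstructed above, with scipy purely as an independent solver): the binding-constraint KKT system is a 3 × 3 linear system, and a generic constrained solver recovers the same point.

import numpy as np
from scipy.optimize import minimize

# KKT system for the binding constraint, written as a linear system in (x1, x2, lam):
#   2*x1 - 2 + lam = 0,   2*x2 - 4 + 3*lam = 0,   x1 + 3*x2 - 1 = 0.
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 3.0],
              [1.0, 3.0, 0.0]])
rhs = np.array([2.0, 4.0, 1.0])
x1, x2, lam = np.linalg.solve(A, rhs)
print(x1, x2, lam)                       # 0.4 0.2 1.2, i.e. (2/5, 1/5, 6/5)

# Independent check with a generic solver (SLSQP treats constraints as fun(x) >= 0).
res = minimize(lambda x: (x[0] - 1.0)**2 + (x[1] - 2.0)**2,
               x0=np.zeros(2),
               constraints=[{"type": "ineq", "fun": lambda x: 1.0 - x[0] - 3.0*x[1]}],
               method="SLSQP")
print(res.x)                             # approximately [0.4, 0.2]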
5 Constraint Qualifications
The FJ necessary conditions are a general set of conditions that hold at a local minimum of a constrained optimization problem. The KKT necessary conditions refine the FJ conditions by requiring that the Lagrange multiplier associated with the objective function is positive, which in turn requires that a constraint qualification (regularity condition) holds. We list some constraint qualifications:
• Slater’s condition.
• Linear independence constraint qualification (LICQ): ∇fi(x) for i ∈ I and ∇hi(x) for i = 1, . . . , l are linearly independent (a numerical sketch of this check follows the list).
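LICQ can be checked numerically by stacking the gradients of the binding inequality constraints and of all equality constraints at a candidate point and testing whether the stacked matrix has full row rank. The constraints below are my own hypothetical example, not from the text.

import numpy as np

# Hypothetical constraints, checked at x = (1, 0):
#   f1(x) = x1**2 + x2**2 - 1 <= 0   (binding at this point)
#   h1(x) = x1 + x2 - 1 = 0
x = np.array([1.0, 0.0])
grad_f1 = np.array([2.0 * x[0], 2.0 * x[1]])   # gradient of the binding inequality
grad_h1 = np.array([1.0, 1.0])                 # gradient of the equality constraint

G = np.vstack([grad_f1, grad_h1])
print(np.linalg.matrix_rank(G) == G.shape[0])  # True: the gradients are linearly independent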
For more information about constraint qualifications, please refer to Bazaraa et al. [1].
References
[1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and
Algorithms. John Wiley & Sons, 2006.
[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[4] G. Giorgi. Remarks on Fritz John conditions for problems with inequality and equality constraints. International Journal of Pure and Applied Mathematics, 71:643–657, 2011.