lgo
Lecture notes
Ulrich Brenner
Research Institute for Discrete Mathematics, University of Bonn
Summer term 2023
April 9, 2024
Preface
Continuous updates of these lecture notes can be found on the webpage of the lecture course:
http://www.or.uni-bonn.de/lectures/ss22/lgo_ss22.html
These lecture notes are based on a number of textbooks and lecture notes from earlier courses.
See e.g. the lecture notes by Tim Nieberg (winter term 2012/2013) and Stephan Held (winter
term 2013/2014 and 2017/18) that are available online on the teaching web pages of the Research
Institute for Discrete Mathematics, University of Bonn (http://www.or.uni-bonn.de/lectures).
Recommended textbooks:
• Chvátal [1983]: Still a good introduction into the field of linear programming.
• Korte and Vygen [2018]: Chapters 3–5 contain the most important results of this lecture
course. Very compact description.
• Matoušek and Gärtner [2007]: Very good description of the linear programming part. For
some results, proofs are missing, and the book does not consider integer programming.
• Schrijver [1986]: Comprehensive textbook covering both linear and integer programming.
Proofs are short but precise.
Prerequisites of this course are the lectures “Algorithmische Mathematik I” and “Lineare
Algebra I/II”. The lecture “Algorithmische Mathematik I” is covered by the textbook by
Hougardy and Vygen [2018]. The results concerning Linear Algebra that are used in this course
can be found, e.g., in the textbooks by Anthony and Harvey [2012], Bosch [2007], and Fischer
[2009].
We also make use of some basic results of complexity theory as they are taught in the
lecture course “Einführung in die Diskrete Mathematik”. These results on complexity theory
can be found e.g. in Chapter 15 of the textbook by Korte and Vygen [2018].
The notation concerning graphs is based on the notation proposed in the textbook by Korte
and Vygen [2018].
Please report any errors in these lecture notes to brenner@or.uni-bonn.de
Contents
1 Introduction 5
1.1 A First Example 5
1.2 Optimization Problems 6
1.3 Possible Outcomes 8
1.4 Integrality Constraints 8
1.5 Modeling of Optimization Problems as (Integral) Linear Programs 9
1.6 Polyhedra 12
2 Duality 17
2.1 Dual LPs 17
2.2 Fourier-Motzkin Elimination 18
2.3 Farkas' Lemma 21
2.4 Strong Duality 24
2.5 Complementary Slackness 27
4 Simplex Algorithm 45
4.1 Feasible Basic Solutions 46
4.2 The Simplex Method 48
4.3 Efficiency of the Simplex Algorithm 57
4.4 Dual Simplex Algorithm 58
4.5 Network Simplex 59
5 Sizes of Solutions 65
5.1 Gaussian Elimination 67
6 Ellipsoid Method 69
6.1 Idealized Ellipsoid Method 69
6.2 Error Analysis 74
6.3 Ellipsoid Method for Linear Programs 78
6.4 Separation and Optimization 80
1 Introduction
1.1 A First Example

Assume that a farmer has 10 hectares of farmland where he can grow two kinds of crops: corn
and wheat (or a combination of both). For each hectare of corn he gets a revenue of 2 units of
money and for each hectare of wheat he gets 3 units of money. Planting corn in an area of one
hectare takes him 1 day while planting wheat takes him 2 days per hectare. In total, he has 16
days for the work on his field. Moreover, each hectare planted with corn needs 5 units of water
and each hectare planted with wheat needs 2 units of water. In total he has 40 units of water.
How can he maximize his revenue?
If x1 is the number of hectares planted with corn and x2 is the number of hectares planted with
wheat, we can write the corresponding optimization problem in the following compact way:

    max   2x1 + 3x2
    s.t.   x1 +  x2 ≤ 10
           x1 + 2x2 ≤ 16
          5x1 + 2x2 ≤ 40
           x1, x2 ≥ 0

This is what we call a linear program (LP). In such an LP, we are given a linear objective
function (in our case (x1 , x2 ) 7→ 2x1 + 3x2 ) that has to be maximized or minimized under a
number of linear constraints. These constraints can be given by linear inequalities (but not
strict inequalities “<”) or by linear equations. However, a linear equation can easily be replaced
by a pair of inequalities (e.g. 4x1 + 3x2 = 7 is equivalent to 4x1 + 3x2 ≤ 7 and 4x1 + 3x2 ≥ 7),
so we may assume that all constraints are given by linear inequalities.
In our example, there were only two variables, x1 and x2 . In this case, linear programs can be
solved graphically. Figure 1 illustrates the method. The grey area is the set

    {(x1, x2) ∈ R² | x1, x2 ≥ 0, x1 + x2 ≤ 10, x1 + 2x2 ≤ 16, 5x1 + 2x2 ≤ 40},

which is the set of all feasible solutions of our problem. We can solve the problem by moving
the green line, which is orthogonal to the cost vector (2, 3)t (shown in red), in the direction of (2, 3)t
as long as it intersects the feasible area. We end up with x1 = 4 and x2 = 6, which is in fact an
optimum solution.
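For readers who want to experiment, the example can also be checked numerically. The following
is a minimal sketch (assuming NumPy and SciPy are available) that solves the farmer's LP with
scipy.optimize.linprog; since linprog minimizes, the objective is negated.

import numpy as np
from scipy.optimize import linprog

# Farmer LP: max 2*x1 + 3*x2 s.t. x1 + x2 <= 10, x1 + 2*x2 <= 16, 5*x1 + 2*x2 <= 40, x >= 0.
c = np.array([2.0, 3.0])                  # objective coefficients
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [5.0, 2.0]])
b = np.array([10.0, 16.0, 40.0])

# linprog minimizes, so we maximize c^t x by minimizing -c^t x.
res = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)                    # expected: [4. 6.] and 26.0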
Figure 1: Graphical solution of the farmer's LP. The constraint lines are x1 + x2 = 10,
x1 + 2x2 = 16 and 5x1 + 2x2 = 40.
1.2 Optimization Problems

In this lecture course, we consider optimization problems with linear objective functions and
linear constraints. The constraints can be written in a compact way using matrices:
Linear Programming
Instance: A matrix A ∈ Rm×n , vectors c ∈ Rn and b ∈ Rm .
Task: Find a vector x ∈ Rn with Ax ≤ b maximizing ct x.
Notation: Unless stated differently, we always let A = (aij)i=1,...,m; j=1,...,n ∈ Rm×n, b = (b1, . . . , bm) ∈ Rm
and c = (c1, . . . , cn) ∈ Rn.
Remark: Real vectors are simply ordered sets of real numbers. But when we multiply vectors
with each other or with matrices, we have to interpret them as n × 1-matrices (column vectors)
or as 1 × n-matrices (row vectors). By default, we consider vectors as column vectors in this
context, so if we want to use them as row vectors, we have to transpose them (“ct ”).
We often write linear programs in the following way:

    max  ct x
    s.t. Ax ≤ b                                        (1)

This is the standard inequality form. The standard equation form of a linear program is:

    max  ct x
    s.t. Ax = b                                        (2)
         x ≥ 0

Both standard forms can be transformed into each other: If we are given a linear program in
standard equation form, we can replace each equation by a pair of inequalities and the constraint
x ≥ 0 by −In x ≤ 0 (where In is always the n × n identity matrix). This leads to a formulation
of the same linear program in standard inequality form.
The transformation from the standard inequality form into the standard equation form is slightly
more complicated: Assume we are given the following linear program in standard inequality
form
    max  ct x
    s.t. Ax ≤ b                                        (3)
We replace each variable xi by two variables zi and z̄i . Moreover, for each of the m constraints
we introduce a new variable x̃i (a so-called slack variable). With variables z = (z1 , . . . , zn ),
z̄ = (z̄1 , . . . , z̄n ) and x̃ = (x̃1 , . . . , x̃m ), we state the following LP in standard equation form:
    max  ct z − ct z̄
    s.t. [A | −A | Im] (z, z̄, x̃)t = b                  (4)
         z, z̄, x̃ ≥ 0
Note that [A | −A | Im] is the m × (2n + m)-matrix that we get by concatenating the matrices
A, −A and Im. Any solution z, z̄ and x̃ of the LP (4) gives a solution of the LP (3) with the
same cost by setting xj := zj − z̄j (for j ∈ {1, . . . , n}).
On the other hand, if x is a solution of LP (3), then we get a solution of LP (4) with the same
cost by setting zj := max{xj, 0}, z̄j := −min{xj, 0} (for j ∈ {1, . . . , n}) and x̃i := bi − Σ_{j=1}^n aij xj
(for i ∈ {1, . . . , m}, where Σ_{j=1}^n aij xj ≤ bi is the i-th constraint of Ax ≤ b).
Note that (in contrast to the first transformation) this second transformation (from the standard
inequality form into the standard equation form) leads to a different solution space because we
have to introduce new variables.
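This second transformation can be written down mechanically. The following is a small sketch
(assuming NumPy; the function names are ours, not part of any standard library) that builds the
matrix [A | −A | Im] and recovers x = z − z̄ from a solution of the equation form:

import numpy as np

def to_equation_form(A, b, c):
    """Turn max{c^t x | Ax <= b} into max{d^t w | Mw = b, w >= 0} with w = (z, zbar, s)."""
    m, n = A.shape
    M = np.hstack([A, -A, np.eye(m)])          # [A | -A | I_m]
    d = np.concatenate([c, -c, np.zeros(m)])   # objective: c^t z - c^t zbar
    return M, b, d

def back_to_inequality_form(w, n):
    """Recover x from a solution w = (z, zbar, s) of the equation form."""
    z, zbar = w[:n], w[n:2 * n]
    return z - zbar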
1.3 Possible Outcomes

There are three possible outcomes for a linear program max{ct x | Ax ≤ b}:
• The linear program is infeasible, i.e. there is no x ∈ Rn with Ax ≤ b.
• The linear program is feasible but unbounded, i.e. for every K ∈ R there is an x with Ax ≤ b and ct x > K.
• The linear program has an optimum solution, i.e. there is an x∗ with Ax∗ ≤ b and ct x∗ ≥ ct x for all x with Ax ≤ b.
We will see that deciding if a linear program is feasible is as hard as computing an optimum
solution to a feasible and bounded linear program (see Section 2.4).
1.4 Integrality Constraints

In many applications, we need an integral solution. This leads to the following class of problems:

Integer Linear Programming
Instance: A matrix A ∈ Rm×n, vectors c ∈ Rn and b ∈ Rm.
Task: Find a vector x ∈ Zn with Ax ≤ b maximizing ct x.
Replacing the constraint x ∈ Rn by x ∈ Zn makes a huge difference. We will see that
there are polynomial-time algorithms for Linear Programming while Integer Linear
Programming is NP-hard.
Of course, one can also consider optimization problems where we have integrality constraints
only for some of the variables. These linear optimization problems are called Mixed Integer
Linear Programs.
1.5 Modeling of Optimization Problems as (Integral) Linear Programs

We consider some examples of how optimization problems can be modeled as LPs or ILPs. Many
flow problems can easily be formulated as linear programs:
Definition 2 Let G be a directed graph with capacities u : E(G) → R>0 and let s and t
be two vertices of G. A feasible s-t-flow in (G, u) is a mapping f : E(G) → R≥0 with
• f(e) ≤ u(e) for all e ∈ E(G), and
• Σ_{e∈δ+G(v)} f(e) − Σ_{e∈δ−G(v)} f(e) = 0 for all v ∈ V(G) \ {s, t}.
The value of an s-t-flow f is val(f) = Σ_{e∈δ+G(s)} f(e) − Σ_{e∈δ−G(s)} f(e).
Maximum-Flow Problem
Instance: A directed Graph G, capacities u : E(G) → R>0 , vertices s, t ∈ V (G) with s ̸= t.
Task: Find an s-t-flow f : E(G) → R≥0 of maximum value.
    max  Σ_{e∈δ+G(s)} xe − Σ_{e∈δ−G(s)} xe
    s.t. xe ≥ 0                                      for e ∈ E(G)
         xe ≤ u(e)                                   for e ∈ E(G)               (7)
         Σ_{e∈δ+G(v)} xe − Σ_{e∈δ−G(v)} xe = 0       for v ∈ V(G) \ {s, t}
It is well known that the value of a maximum s-t-flow equals the capacity of a minimum cut
separating s from t. We will see in Section 2.5 that this result also follows from properties of the
linear program formulation. Moreover, if the capacities are integral, there is always a maximum
flow that is integral (see Section 8.4).
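The LP (7) can be set up mechanically from an edge list. The following sketch (assuming SciPy;
the function and the small example instance are ours) builds the LP and solves it with linprog:

import numpy as np
from scipy.optimize import linprog

def max_flow_lp(nodes, edges, u, s, t):
    """Solve the max-flow LP (7); edges is a list of pairs (v, w), u maps edges to capacities."""
    m = len(edges)
    c = np.zeros(m)
    for k, (v, w) in enumerate(edges):      # objective: flow out of s minus flow into s
        c[k] = (v == s) - (w == s)
    A_eq, b_eq = [], []
    for x in nodes:
        if x in (s, t):
            continue
        row = np.zeros(m)
        for k, (v, w) in enumerate(edges):  # flow conservation at the internal vertex x
            row[k] = (v == x) - (w == x)
        A_eq.append(row)
        b_eq.append(0.0)
    A_eq = np.array(A_eq) if A_eq else None
    b_eq = np.array(b_eq) if b_eq else None
    res = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, u[e]) for e in edges])
    return dict(zip(edges, res.x)), -res.fun

# Example: two parallel s-t-paths of capacities 1 and 2; the maximum flow value is 3.
edges = [('s', 'a'), ('a', 't'), ('s', 'b'), ('b', 't')]
u = {('s', 'a'): 1, ('a', 't'): 1, ('s', 'b'): 2, ('b', 't'): 2}
print(max_flow_lp(['s', 'a', 'b', 't'], edges, u, 's', 't'))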
In some cases, we first have to modify a given optimization problem slightly in order to get a
linear program formulation. See the following example of a modified version of the Maximum-
Flow Problem where we have two sources and want to maximize the minimal out-flow of
both sources.
The objective function here is not a linear function but the minimum of two linear functions. To
see how such a problem can be written as an LP, we assume, slightly more generally, that we are
given the following optimization problem:
max min{ct x + d, et x + f }
s.t. Ax ≤ b
This problem is equivalent to the following linear program with one additional variable σ:

    max  σ
    s.t. σ ≤ ct x + d
         σ ≤ et x + f
         Ax ≤ b

And of course, this trick also works if we want to compute the minimum of more than two
linear functions.
More or less the same trick can be applied to the following problem in which the objective
function contains absolute values of linear functions:
min |ct x + d|
s.t. Ax ≤ b
for some c ∈ Rn and d ∈ R. Again the problem can be written equivalently as a linear program
in the following form:
max −σ
s.t. −σ − ct x ≤ d
−σ + ct x ≤ −d
Ax ≤ b
The two additional constraints on σ ensure that we have σ ≥ max{ct x + d, −ct x − d} = |ct x + d|.
Other problems allow a formulation as an ILP but presumably not as an LP:

Vertex Cover Problem
Instance: An undirected graph G and vertex costs c : V(G) → R≥0.
Task: Find a set X ⊆ V(G) with {v, w} ∩ X ≠ ∅ for all {v, w} ∈ E(G) minimizing Σ_{v∈X} c(v).

This problem is known to be NP-hard (see standard textbooks like Korte and Vygen [2018]),
so we cannot hope for a polynomial-time algorithm. Nevertheless, the problem can easily be
formulated as an integer linear program:
    min  Σ_{v∈V(G)} c(v) xv
    s.t. xv + xw ≥ 1     for {v, w} ∈ E(G)              (8)
         xv ∈ {0, 1}     for v ∈ V(G)
For each vertex v ∈ V (G), we have a 0-1-variable xv which is 1 if and only if v should be in the
set X, i.e. if (xv )v∈V (G) is an optimum solution to (8), the set X = {v ∈ V (G) | xv = 1} is an
optimum solution to the Vertex Cover Problem.
This example shows that Integer Linear Programming itself is an NP-hard problem. By
skipping the integrality constraints (xv ∈ {0, 1}) we get the following linear program:
    min  Σ_{v∈V(G)} c(v) xv
    s.t. xv + xw ≥ 1     for {v, w} ∈ E(G)
         xv ≥ 0          for v ∈ V(G)                   (9)
         xv ≤ 1          for v ∈ V(G)
We call this linear program an LP-relaxation of (8). In this particular case, the relaxation
gives a 2-approximation of the Vertex Cover Problem: For any solution x of the relaxed
problem, we get an integral solution x̃ by setting

    x̃v = 1 if xv ≥ 1/2    and    x̃v = 0 if xv < 1/2.
It is easy to check that this yields a feasible solution of the ILP with Σ_{v∈V(G)} c(v) x̃v ≤ 2 Σ_{v∈V(G)} c(v) xv.
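The rounding procedure is easy to implement. The following sketch (assuming SciPy; solver
tolerances are ignored) solves the LP-relaxation (9) with linprog and rounds at 1/2:

import numpy as np
from scipy.optimize import linprog

def lp_rounding_vertex_cover(vertices, edges, cost):
    """Sketch of the 2-approximation: solve the LP relaxation (9) and round at 1/2."""
    idx = {v: i for i, v in enumerate(vertices)}
    # Each edge {v, w} gives x_v + x_w >= 1, i.e. -x_v - x_w <= -1.
    A = np.zeros((len(edges), len(vertices)))
    for k, (v, w) in enumerate(edges):
        A[k, idx[v]] = A[k, idx[w]] = -1.0
    b = -np.ones(len(edges))
    c = np.array([cost[v] for v in vertices])
    res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * len(vertices))
    return {v for v in vertices if res.x[idx[v]] >= 0.5}   # rounded cover

# Example: a triangle with unit costs; the LP optimum is 3/2, the rounded cover has cost 3.
print(lp_rounding_vertex_cover(['a', 'b', 'c'],
                               [('a', 'b'), ('b', 'c'), ('a', 'c')],
                               {'a': 1, 'b': 1, 'c': 1}))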
Obviously, in minimization problems relaxing some constraints can only decrease the value of
an optimum solution. We call the supremum of the ratio between the values of the optimum
solutions of an ILP and its LP-relaxation the integrality gap of the relaxation. The rounding
procedure described above also proves that in this case the integrality gap is at most 2. Indeed,
2 is the integrality gap as the example of a complete graph with weights c(v) = 1 for all vertices
v shows. For the Maximum-Flow Problem with integral edge capacities, the integrality gap
is 1 because there is always an optimum flow that is integral.
The following problem is NP-hard as well: In the Stable Set Problem, we are given an undirected
graph G with vertex weights c : V(G) → R≥0 and look for a set X ⊆ V(G) that contains no two
adjacent vertices and maximizes Σ_{v∈X} c(v). Skipping the integrality constraints in the obvious
ILP formulation (with a 0-1-variable xv for each vertex v) yields the following LP-relaxation:
    max  Σ_{v∈V(G)} c(v) xv
    s.t. xv + xw ≤ 1     for {v, w} ∈ E(G)
         xv ≥ 0          for v ∈ V(G)                   (11)
         xv ≤ 1          for v ∈ V(G)
Unfortunately, in this case, the LP-relaxation is of no use. Even if G is a complete graph (where a
feasible solution of the Stable Set Problem can contain at most one vertex), setting xv = 1/2
for all v ∈ V(G) would be a feasible solution of the LP-relaxation. This example shows that the
integrality gap is at least n/2. Hence, this LP-relaxation does not provide any useful information
about a good ILP solution.
1.6 Polyhedra
Definition 4 For x1, . . . , xk ∈ Rn and λ1, . . . , λk ∈ R with λi ≥ 0 (i ∈ {1, . . . , k}) and Σ_{i=1}^k λi = 1,
the vector Σ_{i=1}^k λi xi is called a convex combination of x1, . . . , xk.
Remark: It is easy to check that the convex hull of a set X ⊆ Rn is the (inclusion-wise)
minimal convex set containing X.
• X = Rn
of A by a1, . . . , am. If aj = 0 for some j ∈ {1, . . . , m}, then bj ≥ 0 (where b = (b1, . . . , bm)) because
otherwise X = ∅. Hence we have

    X = ∩_{j=1}^m {x ∈ Rn | atj x ≤ bj} = ∩_{j=1,...,m : aj ≠ 0} {x ∈ Rn | atj x ≤ bj},
In other words, the dimension of X ⊆ Rn is n minus the maximum size of a set of linearly
independent vectors that are orthogonal to any difference of elements in X. For example, the
empty set and sets consisting of exactly one vector have dimension 0. The set Rn has dimension
n.
Observation: The dimension of a set X ⊆ Rn is the largest d for which X contains elements
v0 , v1 , . . . , vd such that v1 − v0 , v2 − v0 , . . . , vd − v0 are linearly independent.
Observation: A non-empty set X ⊆ Rn is a convex cone if and only if X is convex and for all
x ∈ X and λ ∈ R≥0 we have λx ∈ X.
sufficiently large λ, the i-th entry of λAx would be greater than bi which is a contradiction to
the assumption that X is a convex cone. Therefore, X = {x ∈ Rn | Ax ≤ 0}. 2
Let x1, . . . , xm ∈ Rn be vectors. The cone generated by x1, . . . , xm is the set

    cone({x1, . . . , xm}) := { Σ_{i=1}^m λi xi | λ1, . . . , λm ≥ 0 }.
2 Duality
2.1 Dual LPs

Consider the following linear program (P):

    max  12x1 + 10x2
    s.t.  4x1 +  2x2 ≤ 5
          8x1 + 12x2 ≤ 7                                (P)
          2x1 −  3x2 ≤ 1

How can we find upper bounds on the value of an optimum solution? By combining the first
two constraints we can get the following bound for any feasible solution (x1, x2):

    12x1 + 10x2 = 2 · (4x1 + 2x2) + 1/2 · (8x1 + 12x2) ≤ 2 · 5 + 1/2 · 7 = 13.5.
We can even do better by combining the last two inequalities:

    12x1 + 10x2 = 7/6 · (8x1 + 12x2) + 4/3 · (2x1 − 3x2) ≤ 7/6 · 7 + 4/3 · 1 = 9.5.
More generally, for computing upper bounds we ask for non-negative numbers u1 , u2 , u3 such
that
12x1 + 10x2 = u1 · (4x1 + 2x2 ) + u2 · (8x1 + 12x2 ) + u3 · (2x1 − 3x2 ).
Then, 5 · u1 + 7 · u2 + 1 · u3 is an upper bound on the value of any solution of (P), so we want
to choose u1, u2, u3 in such a way that 5 · u1 + 7 · u2 + 1 · u3 is minimized.
This leads us to the following linear program (D):

    min  5u1 + 7u2 + u3
    s.t. 4u1 +  8u2 + 2u3 = 12
         2u1 + 12u2 − 3u3 = 10                          (D)
         u1, u2, u3 ≥ 0

This linear program is called the dual linear program of (P). Any solution of (D) yields
an upper bound on the optimum value of (P), and in this particular case it turns out that
u1 = 0, u2 = 7/6, u3 = 4/3 (the second solution from above) with value 9.5 is an optimum solution
of (D) because x1 = 11/16, x2 = 1/8 is a solution of (P) with value 9.5.
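The claimed values can be verified numerically. The following sketch (assuming SciPy) solves
(P) and (D) with linprog and should report the value 9.5 for both:

import numpy as np
from scipy.optimize import linprog

A = np.array([[4.0, 2.0], [8.0, 12.0], [2.0, -3.0]])
b = np.array([5.0, 7.0, 1.0])
c = np.array([12.0, 10.0])

# Primal (P): max c^t x s.t. Ax <= b (x unrestricted in sign).
p = linprog(-c, A_ub=A, b_ub=b, bounds=[(None, None)] * 2)
# Dual (D): min b^t u s.t. A^t u = c, u >= 0.
d = linprog(b, A_eq=A.T, b_eq=c, bounds=[(0, None)] * 3)
print(-p.fun, d.fun)   # both should be 9.5
print(p.x, d.x)        # roughly (11/16, 1/8) and (0, 7/6, 4/3)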
For a general linear program (P)
max ct x
s.t. Ax ≤ b
in standard inequality form we define its dual linear program (D) as
min bt y
s.t. At y = c
y ≥ 0
In this context, we call the linear program (P) primal linear program.
Remark: Note that the dual linear program does not only depend on the objective function
and the solution space of the primal linear program but on its description by linear inequalities.
For example adding redundant inequalities to the system Ax ≤ b will lead to more variables in
the dual linear program.
Proposition (weak duality): If x is a feasible solution of (P) and y is a feasible solution of (D),
then ct x ≤ bt y.
Proof: ct x = (At y)t x = yt Ax ≤ yt b. 2
Remark: The term “dual” implies that applying the transformation from (P) to (D) twice
yields (P) again. This is not exactly the case but it is not very difficult to see that dualizing (D)
(after transforming it into standard equational form) gives a linear program that is equivalent
to (P) (see the exercises).
2.2 Fourier-Motzkin Elimination

Consider the following system of inequalities:

3x + 2y + 4z ≤ 10
3x + 2z ≤ 9
2x − y ≤ 5
(12)
−x + 2y − z ≤ 3
−2x ≤ 4
2y + 2z ≤ 7
Assume that we just want to decide if a feasible solution x, y, z exists. The goal is to get rid of
the variables one after the other. To get rid of x, we first reformulate the inequalities such that
we can easily see lower and upper bounds for x:
    x ≤ 10/3 − 2/3 y − 4/3 z
    x ≤ 3 − 2/3 z
    x ≤ 5/2 + 1/2 y
    x ≥ −3 + 2y − z                                     (13)
    x ≥ −2
    2y + 2z ≤ 7
This system of inequalities has a feasible solution if and only if the following system (that does
not contain x) has a solution:

    −3 + 2y − z ≤ 10/3 − 2/3 y − 4/3 z
    −3 + 2y − z ≤ 3 − 2/3 z
    −3 + 2y − z ≤ 5/2 + 1/2 y
    −2 ≤ 10/3 − 2/3 y − 4/3 z
    −2 ≤ 3 − 2/3 z
    −2 ≤ 5/2 + 1/2 y
    2y + 2z ≤ 7
Note that this method, which is called Fourier-Motzkin elimination, is in general very
inefficient. If m is the number of inequalities in the initial system, it may be necessary to state
m²/4 inequalities in the system with one variable less (this is the case if there are m/2 inequalities
that gave an upper bound on the variable we got rid of and m/2 inequalities that gave a lower
bound).
Nevertheless, the Fourier-Motzkin elimination can be used to get a certificate that a given
system of inequalities does not have a feasible solution. In the proof of the following theorem
we give a general description of one iteration of the method:
Theorem 4 Let A ∈ Rm×n and b ∈ Rm (with n ≥ 1). Then there are Ã ∈ Rm̃×(n−1) and
b̃ ∈ Rm̃ with m̃ ≤ m + m²/4 such that
(a) Each inequality in the system Ãx̃ ≤ b̃ is a positive linear combination of inequalities
from Ax ≤ b
(b) The system Ax ≤ b has a solution if and only if Ãx̃ ≤ b̃ has a solution.
Proof: Denote the entries of A by aij, i.e. A = (aij)i=1,...,m; j=1,...,n. We will show how to get rid of
the variable with index 1. To this end, we partition the index set {1, . . . , m} of the rows into
three disjoint sets U, L, and N:

    U := {i | ai1 > 0},   L := {i | ai1 < 0},   N := {i | ai1 = 0}.
We can assume that |ai1 | = 1 for all i ∈ U ∪ L (otherwise we divide the corresponding inequality
by |ai1 |).
For vectors ãi = (ai2, . . . , ain) and x̃ = (x2, . . . , xn) (that are empty if n = 1), we replace the
inequalities that correspond to indices in U and L by

    ãtu x̃ + ãtl x̃ ≤ bu + bl     for u ∈ U, l ∈ L.       (17)

Obviously, each of these |U| · |L| new inequalities is simply the sum of two of the given inequalities
(and hence a positive linear combination of them).
The inequalities with index in N are rewritten as

    ãtl x̃ ≤ bl     for l ∈ N.                            (18)
The inequalities in (17) and (18) form a set of inequalities Ãx̃ ≤ b̃ with n − 1 variables, and each
solution of Ax ≤ b gives a solution of Ãx̃ ≤ b̃ by restricting x = (x1 , . . . , xn ) to (x2 , . . . , xn ).
On the other hand, if x̃ = (x2, . . . , xn) is a solution of Ãx̃ ≤ b̃, then we can set x̃1 to any value
in the (non-empty) interval

    [ max_{l∈L} (ãtl x̃ − bl) ,  min_{u∈U} (bu − ãtu x̃) ],

where we set the minimum of an empty set to ∞ and the maximum of an empty set to −∞.
Then, x = (x̃1 , x2 , . . . , xn ) is a solution of Ax ≤ b. 2
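One iteration of the method, as described in the proof, can be implemented directly. The
following is a sketch (assuming NumPy) that eliminates the variable with index 1:

import numpy as np

def fourier_motzkin_step(A, b):
    """One Fourier-Motzkin step: eliminate variable x_1 from Ax <= b (cf. Theorem 4)."""
    U = [i for i in range(len(b)) if A[i, 0] > 0]   # rows giving upper bounds on x_1
    L = [i for i in range(len(b)) if A[i, 0] < 0]   # rows giving lower bounds on x_1
    N = [i for i in range(len(b)) if A[i, 0] == 0]
    rows, rhs = [], []
    for u in U:
        for l in L:
            # Normalize so that |a_{u1}| = |a_{l1}| = 1 and add the two inequalities.
            rows.append(A[u, 1:] / A[u, 0] + A[l, 1:] / (-A[l, 0]))
            rhs.append(b[u] / A[u, 0] + b[l] / (-A[l, 0]))
    for i in N:
        rows.append(A[i, 1:])
        rhs.append(b[i])
    if not rows:                      # no inequalities left: every x is feasible
        return np.zeros((0, A.shape[1] - 1)), np.zeros(0)
    return np.array(rows), np.array(rhs)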
Theorem 6 (Farkas’ Lemma, most general case) For A ∈ Rm1 ×n1 , B ∈ Rm1 ×n2 , C ∈
Rm2 ×n1 , D ∈ Rm2 ×n2 , a ∈ Rm1 and b ∈ Rm2 exactly one of the two following systems has
a feasible solution:
System 1:
Ax + By ≤ a
Cx + Dy = b (19)
x ≥ 0
System 2:

    ut A + vt C ≥ 0t
    ut B + vt D = 0t
    u ≥ 0                                               (20)
    ut a + vt b < 0
Proof: The first system is equivalent to
Ax + By ≤ a
Cx + Dy ≤ b
−Cx − Dy ≤ −b
−In1 x ≤ 0
By Theorem 5, this system has a solution if and only if the following system does not have a
solution:
Obviously, this system has a solution if and only if the second system of the theorem has a
solution. 2
Corollary 7 (Farkas' Lemma, further variants) For A ∈ Rm×n and b ∈ Rm, the following
statements hold:
(a) Exactly one of the two systems Ax = b, x ≥ 0 and ut A ≥ 0t, ut b < 0 has a solution.
(b) Exactly one of the two systems Ax = b and ut A = 0t, ut b < 0 has a solution.
Proof: Restrict the statement of Theorem 6 to the vector b and the matrix C (for part (a)) or D
(for part (b)). 2
Remark: Statement (a) of Corollary 7 has a nice geometric interpretation. Let C be the cone
generated by the columns of A. Then, the vector b is either in C or there is a hyperplane (given
by the normal u) that separates b from C.
As an example, consider A = (2 3; 1 1), i.e. the columns of A are (2, 1)t and (3, 1)t, and let
b1 = (5, 2)t and b2 = (1, 3)t (see Figure 2). The vector b1 is in the cone generated by the columns
of A (because (5, 2)t = (2, 1)t + (3, 1)t) while b2 can be separated from the cone by a hyperplane
orthogonal to u = (1, −2)t.
Hence, we get the following corollary.
Figure 2: The vector b1 = (5, 2)t lies in the cone generated by the columns (2, 1)t and (3, 1)t of A,
while b2 = (1, 3)t is separated from this cone by the hyperplane orthogonal to u = (1, −2)t.
Corollary 8 Let a1, . . . , an ∈ Rm. Then for any vector b ∈ Rm exactly one of the following
statements holds:
(a) b ∈ cone({a1, . . . , an}).
(b) There is a hyperplane {x ∈ Rm | ut x = 0} separating b from cone({a1, . . . , an}), i.e. there is
a vector u ∈ Rm with ut ai ≥ 0 for all i ∈ {1, . . . , n} and ut b < 0.
Proof: Statement (b) is equivalent to the existence of a vector u ∈ Rm with ut ai ≥ 0 for all
i ∈ {1, . . . , n} and ut b < 0. Thus the corollary follows from Corollary 7 (a). 2
2.4 Strong Duality
max ct x (P )
s.t. Ax ≤ b
and
min bt y (D)
s.t. At y = c
y ≥ 0
4. Both (P) and (D) have a feasible solution. Then both have an optimal solution, and
for an optimal solution x̃ of (P) and an optimal solution ỹ of (D), we have
ct x̃ = bt ỹ.
ũ := (1/z) u. This implies At ũ = c and ũ ≥ 0, so ũ is a feasible solution of (D). Therefore (D) is
feasible. It is bounded as well because of the weak duality.
It remains to show that there are feasible solutions x of (P) and y of (D) such that ct x ≥ bt y.
This is the case if (and only if) the following system has a feasible solution:
Ax ≤ b
At y = c
−ct x + bt y ≤ 0
y ≥ 0
By Theorem 6, this is the case if and only if the following system (with variables u ∈ Rm ,
v ∈ Rn and w ∈ R) does not have a feasible solution:
    ut A − w ct = 0t
    vt At + w bt ≥ 0t
    ut b + vt c < 0                                     (23)
    u ≥ 0
    w ≥ 0
Hence, assume that system (23) has a feasible solution u, v and w.
Case 1: w = 0. Then (again by Farkas’ Lemma) the system
Ax ≤ b
At y = c
y ≥ 0
does not have a feasible solution, which is a contradiction because both (P) and (D) have a
feasible solution.
Case 2: w > 0. Then
0 > wut b + wv t c ≥ ut (−Av) + v t (At u) = 0,
which is a contradiction. 2
Remark: Theorem 9 shows in particular that if a linear program max{ct x | Ax ≤ b} is feasible
and bounded, then there is a vector x̃ with Ax̃ ≤ b such that ct x̃ = sup{ct x | Ax ≤ b}.
The following table gives an overview of the possible combinations of states of the primal and
dual LPs ("✓" means that the combination is possible, "x" means that it is not possible):

                                                    (D)
                              Feasible, bounded   Feasible, unbounded   Infeasible
 (P)  Feasible, bounded              ✓                    x                 x
      Feasible, unbounded            x                    x                 ✓
      Infeasible                     x                    ✓                 ✓
Remark: The previous theorem can be used to show that computing a feasible solution of
a linear program is in general as hard as computing an optimum solution. Assume that we
want to compute an optimum solution of the program (P) in the theorem. To this end, we can
compute any feasible solution of the following linear program:
max ct x
s.t. Ax ≤ b
At y = c (24)
ct x ≥ bt y
y ≥ 0
Here x and y are the variables. We can ignore the objective function in the modified LP because
we just need any feasible solution. The constraints At y = c, ct x ≥ bt y and y ≥ 0 guarantee that
any vector x from a feasible solution of the new LP is an optimum solution of (P).
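The following sketch (assuming SciPy) illustrates this reduction: it feeds the combined system (24)
to linprog with a zero objective, so only feasibility matters, and returns the x-part of any feasible
solution, which is then an optimum solution of (P):

import numpy as np
from scipy.optimize import linprog

def optimum_via_feasibility(A, b, c):
    """Find an optimum of max{c^t x | Ax <= b} by solving the feasibility system (24)."""
    m, n = A.shape
    # Variables are (x, y) in R^(n+m).  Inequalities: Ax <= b and b^t y - c^t x <= 0.
    A_ub = np.vstack([np.hstack([A, np.zeros((m, m))]),
                      np.hstack([-c, b])[None, :]])
    b_ub = np.concatenate([b, [0.0]])
    # Equalities: A^t y = c.  Sign constraints: y >= 0, x free.
    A_eq = np.hstack([np.zeros((n, n)), A.T])
    res = linprog(np.zeros(n + m), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=c,
                  bounds=[(None, None)] * n + [(0, None)] * m)
    return res.x[:n] if res.success else None

# On the farmer LP from the introduction this returns an optimum solution with value 26.
A = np.array([[1.0, 1.0], [1.0, 2.0], [5.0, 2.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([10.0, 16.0, 40.0, 0.0, 0.0])
print(optimum_via_feasibility(A, b, np.array([2.0, 3.0])))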
                         Primal LP                      Dual LP
 Variables               x1, . . . , xn                 y1, . . . , ym
 Matrix                  A                              At
 Right-hand side         b                              c
 Objective function      max ct x                       min bt y
 Constraints             Σ_{j=1}^n aij xj ≤ bi          yi ≥ 0
                         Σ_{j=1}^n aij xj ≥ bi          yi ≤ 0
                         Σ_{j=1}^n aij xj = bi          yi ∈ R
                         xj ≥ 0                         Σ_{i=1}^m aij yi ≥ cj
                         xj ≤ 0                         Σ_{i=1}^m aij yi ≤ cj
                         xj ∈ R                         Σ_{i=1}^m aij yi = cj
max{ct x | Ax ≤ b, x ≥ 0} min{bt y | y t A ≥ c, y ≥ 0}
max{ct x | Ax ≥ b, x ≥ 0} min{bt y | y t A ≥ c, y ≤ 0}
max{ct x | Ax = b, x ≥ 0} min{bt y | y t A ≥ c}
Let x be a feasible solution of max{ct x | Ax ≤ b} and y a feasible solution of min{bt y | At y = c, y ≥ 0}.
Then the following statements are equivalent:
(a) x and y are both optimum solutions.
(b) ct x = bt y.
(c) yt (b − Ax) = 0.
Proof: The equivalence of the statements (a) and (b) follows from Theorem 9. To see the
equivalence of (b) and (c) note that y t (b − Ax) = y t b − y t Ax = y t b − ct x, so ct x = bt y is
equivalent to y t (b − Ax) = 0. 2
With the notation of the theorem, let at1, . . . , atm be the rows of A and b = (b1, . . . , bm). Then,
the theorem implies that for an optimum primal solution x and an optimum dual solution y and
i ∈ {1, . . . , m} we have yi = 0 or ati x = bi (since Σ_{i=1}^m yi (bi − ati x) must be zero and yi (bi − ati x)
cannot be negative for any i ∈ {1, . . . , m}).
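In this form, complementary slackness is easy to check numerically. The following sketch
(assuming NumPy) tests yi (bi − ati x) = 0 for all i on the duality example of Section 2.1:

import numpy as np

def complementary_slackness_holds(A, b, c, x, y, tol=1e-9):
    """Check y^t (b - Ax) = 0 componentwise for feasible x (Ax <= b) and y (A^t y = c, y >= 0)."""
    slack = b - A @ x                      # non-negative if x is feasible
    return bool(np.all(np.abs(y * slack) <= tol))

# The duality example of Section 2.1: x = (11/16, 1/8), y = (0, 7/6, 4/3).
A = np.array([[4.0, 2.0], [8.0, 12.0], [2.0, -3.0]])
b = np.array([5.0, 7.0, 1.0])
c = np.array([12.0, 10.0])
print(complementary_slackness_holds(A, b, c,
                                    np.array([11/16, 1/8]),
                                    np.array([0.0, 7/6, 4/3])))   # True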
Similarly, let x be a feasible solution of max{ct x | Ax ≤ b, x ≥ 0} and y a feasible solution of
min{bt y | yt A ≥ ct, y ≥ 0}. Then the following statements are equivalent:
(a) x and y are both optimum solutions.
(b) ct x = bt y.
(c) yt (b − Ax) = 0 and xt (At y − c) = 0.
Proof: The equivalence of the statements (a) and (b) follows again from Theorem 9. To
see the equivalence of (b) and (c) note that 0 ≤ y t (b − Ax) and 0 ≤ xt (At y − c). Hence
y t (b − Ax) + xt (At y − c) = y t b − y t Ax + xt At y − xt c = y t b − xt c is zero if and only if
0 = y t (b − Ax) and 0 = xt (At y − c). 2
Corollary: A feasible linear program max{ct x | Ax ≤ b} is bounded if and only if c is contained
in the cone generated by the rows of A.
Proof: The linear program is bounded if and only if its dual linear program is feasible. This is
the case if and only if there is a vector y ≥ 0 with yt A = ct, which is equivalent to the statement
that c is in the cone generated by the rows of A. 2
Theorem 11 allows us to strengthen the statement of the previous Corollary. Let x be an
optimum solution of the linear program max{ct x | Ax ≤ b} and y an optimum solution of its
dual min{bt y | At y = c, y ≥ 0}. Denote the row vectors of A by at1 , . . . , atm . Then yi = 0 if
ati x < bi (for i ∈ {1, . . . , m}), so c is in fact in the cone generated only by these rows of A where
ati x = bi (see Figure 3 for an illustration).
Figure 3: The vector c lies in the cone generated by the rows a1 and a2 of A, which correspond
to the constraints at1 x ≤ b1 and at2 x ≤ b2 that are tight at the optimum solution of
max{ct x | Ax ≤ b}.
Theorem 14 Let max{ct x | Ax ≤ b} and min{bt y | At y = c, y ≥ 0} be a pair of feasible and
bounded primal and dual LPs, and let i ∈ {1, . . . , m}. Then exactly one of the following two
statements holds:
(a) The primal LP max{ct x | Ax ≤ b} has an optimum solution x∗ with ati x∗ < bi.
(b) The dual LP min{bt y | At y = c, y ≥ 0} has an optimum solution y∗ with yi∗ > 0.
Proof: By complementary slackness, at most one of the statements can be true. Let δ =
max{ct x | Ax ≤ b} be the value of an optimum solution. Assume that (a) does not hold. This
means that
max −ati x
Ax ≤ b
−ct x ≤ −δ
has an optimum solution with value −bi . Hence, also its dual LP
min bt y − δu
At y − uc = −ai
y ≥ 0
u ≥ 0
must have an optimum solution of value −bi . Therefore, there are y ∈ Rm and u ∈ R with y ≥ 0
and u ≥ 0 with y t A − uct = −ati and y t b − uδ = −bi . Let ỹ = y + ei (i.e. ỹ arises from y by
increasing the i-th entry by one). If u = 0, then ỹt A = yt A + ati = 0t and ỹt b = yt b + bi = 0, so if y∗
is an optimum solution of min{bt y | At y = c, y ≥ 0}, then y∗ + ỹ is also an optimum solution and
has a positive i-th entry. If u > 0, then (1/u) ỹ is an optimum solution of min{bt y | At y = c, y ≥ 0}
(because (1/u) ỹt A = (1/u) yt A + (1/u) ati = ct and (1/u) ỹt b = (1/u) yt b + (1/u) bi = δ) and has a
positive i-th entry. 2
Corollary: If the LPs max{ct x | Ax ≤ b} and min{bt y | At y = c, y ≥ 0} are both feasible and
bounded, then there are optimum solutions x∗ and y∗ such that for every i ∈ {1, . . . , m} we have
ati x∗ < bi or yi∗ > 0.
Proof: By Theorem 14, for any inequality ati x ≤ bi there is a pair of optimum solutions
x(i) ∈ Rn, y(i) ∈ Rm such that ati x(i) < bi or y(i)i > 0. Since the convex combination of optimum
LP solutions is again an optimum solution, we can set x∗ := (1/m) Σ_{i=1}^m x(i) and y∗ := (1/m) Σ_{i=1}^m y(i)
and get a pair of optimum solutions fulfilling the conditions of the theorem. 2
As an application, consider again the LP formulation of the Maximum-Flow Problem:

    max  Σ_{e∈δ+G(s)} xe − Σ_{e∈δ−G(s)} xe
    s.t. xe ≥ 0                                      for e ∈ E(G)
         xe ≤ u(e)                                   for e ∈ E(G)               (25)
         Σ_{e∈δ+G(v)} xe − Σ_{e∈δ−G(v)} xe = 0       for v ∈ V(G) \ {s, t}
Its dual LP reads:

    min  Σ_{e∈E(G)} u(e) ye
    s.t. ye ≥ 0                  for e ∈ E(G)
         ye + zv − zw ≥ 0        for e = (v, w) ∈ E(G), {s, t} ∩ {v, w} = ∅
         ye + zv ≥ 0             for e = (v, t) ∈ E(G), v ≠ s
         ye − zw ≥ 0             for e = (t, w) ∈ E(G), w ≠ s                    (26)
         ye − zw ≥ 1             for e = (s, w) ∈ E(G), w ≠ t
         ye + zv ≥ −1            for e = (v, s) ∈ E(G), v ≠ t
         ye ≥ 1                  for e = (s, t) ∈ E(G)
         ye ≥ −1                 for e = (t, s) ∈ E(G)
In a simplified way its dual LP can be written with two dummy variables zs = −1 and zt = 0:
    min  Σ_{e∈E(G)} u(e) ye
    s.t. ye ≥ 0              for e ∈ E(G)
         ye + zv − zw ≥ 0    for e = (v, w) ∈ E(G)      (27)
         zs = −1
         zt = 0
We will use the dual LP to show the Max-Flow-Min-Cut Theorem. We call a set δ+(R) with
R ⊂ V(G), s ∈ R and t ∉ R an s-t-cut, and Σ_{e∈δ+G(R)} u(e) its capacity.

Theorem (Max-Flow-Min-Cut Theorem): The maximum value of a feasible s-t-flow in (G, u)
equals the minimum capacity of an s-t-cut.
Proof: If x is a feasible solution of the primal problem (25) (i.e. x encodes an s-t-flow) and
δ + (R) is an s-t-cut, then
    Σ_{e∈δ+G(s)} xe − Σ_{e∈δ−G(s)} xe = Σ_{v∈R} ( Σ_{e∈δ+G(v)} xe − Σ_{e∈δ−G(v)} xe )
                                      = Σ_{e∈δ+G(R)} xe − Σ_{e∈δ−G(R)} xe ≤ Σ_{e∈δ+G(R)} u(e).
The first equation follows from the flow conservation rule (i.e. Σ_{e∈δ+G(v)} xe − Σ_{e∈δ−G(v)} xe = 0) applied
to all vertices in R \ {s} and the second one from the fact that flow values on edges inside R
cancel out in the sum. The last inequality follows from the fact that flow values are between 0
and u.
Thus, the capacity of any s-t-cut is an upper bound for the value of an s-t-flow. We will show
that for any maximum s-t-flow there is an s-t-cut whose capacity equals the value of the flow.
Let x̃ be an optimum solution of the primal problem (25) and ỹ, z̃ be an optimum solution of
the dual problem (27). In particular x̃ defines a maximum s-t-flow. Consider the set R := {v ∈
V (G) | z̃v ≤ −1}. Then s ∈ R and t ̸∈ R.
If e = (v, w) ∈ δ+G(R), then z̃v < z̃w, so ỹe ≥ z̃w − z̃v > 0. By complementary slackness
this implies x̃e = u(e). On the other hand, if e = (v, w) ∈ δ−G(R), then z̃v > z̃w and hence
ỹe + z̃v − z̃w ≥ z̃v − z̃w > 0, so again by complementary slackness x̃e = 0. This leads to:
    Σ_{e∈δ+G(s)} x̃e − Σ_{e∈δ−G(s)} x̃e = Σ_{v∈R} ( Σ_{e∈δ+G(v)} x̃e − Σ_{e∈δ−G(v)} x̃e )
                                      = Σ_{e∈δ+G(R)} x̃e − Σ_{e∈δ−G(R)} x̃e = Σ_{e∈δ+G(R)} u(e).     2
3 The Structure of Polyhedra
Proposition: For a matrix A ∈ Rm×(n+k) and a vector b ∈ Rm, the set {x ∈ Rn | ∃y ∈ Rk : A (x, y)t ≤ b}
is a polyhedron.
Proof: Exercise. 2
Remark: The set P = {x ∈ Rn | ∃y ∈ Rk : A (x, y)t ≤ b} is called a projection of {z ∈ Rn+k |
Az ≤ b} to Rn.
More generally, the image of a polyhedron {x ∈ Rn | Ax ≤ b} under an affine linear mapping
f : Rn → Rk , which is given by D ∈ Rk×n , d ∈ Rk and x 7→ Dx + d is also a polyhedron:
{y ∈ Rk | ∃x ∈ Rn : Ax ≤ b and y = Dx + d}
is a polyhedron.
    {y ∈ Rk | ∃x ∈ Rn : Ax ≤ b and y = Dx + d}
       = { y ∈ Rk | ∃x ∈ Rn : (A 0 ; D −Ik ; −D Ik) (x, y)t ≤ (b, −d, d)t }
3.2 Faces
(a) F is a face of P .
Proof:
Set z := x∗ + ϵ(x∗ − y) (see Figure 4). Then ct z > δ, so z ̸∈ P . Therefore, there must
be an inequality at x ≤ β of the system Ax ≤ b such that at z > β. We claim that this
inequality cannot belong to Ãx ≤ b̃. To see this assume that at x ≤ β belongs to Ãx ≤ b̃.
If at x∗ ≤ at y then
at z = at x∗ + ϵat (x∗ − y) ≤ at x∗ < β.
But if at x∗ > at y then

    at z = at x∗ + ϵ at (x∗ − y) < at x∗ + ((β − at x∗) / (at (x∗ − y))) · at (x∗ − y) = β.
In both cases, we get a contradiction, so the inequality at x ≤ β belongs to A′ x ≤ b′ .
Therefore, at y = at (x∗ + 1ϵ (x∗ − z)) = (1 + 1ϵ )β − 1ϵ at z < β, which means that A′ y ̸= b′ .
Figure 4: The point z = x∗ + ϵ(x∗ − y) lies outside of P (the vector c and the face F containing x∗
are indicated).
(a) Let c ∈ Rn be a vector such that max{ct x | x ∈ P } < ∞. Then the set of all vectors
x where the maximum of ct x over P is attained is a face of P .
(b) F is a polyhedron.
We are in particular interested in the largest and the smallest faces of a polyhedron.
3.3 Facets
Proof: If P = {x ∈ Rn | Ax = b}, then P does not have a facet (the only face of P is P itself,
see Corollary 20 (d)), so both statements are trivial.
Hence assume that P ̸= {x ∈ Rn | Ax = b}.
Let A′ x ≤ b′ be a minimal system of inequalities such that P = {x ∈ Rn | Ax = b, A′ x ≤ b′ }.
Let at x ≤ β be an inequality in A′ x ≤ b′ , and let A′′ x ≤ b′′ be the rest of the system A′ x ≤ b′
without at x ≤ β.
We will show that at x ≤ β is facet-defining.
Let y ∈ Rn be a vector with Ay = b, A′′y ≤ b′′ and at y > β. Such a vector exists because
otherwise A′′x ≤ b′′ would be a smaller system of inequalities than A′x ≤ b′ with P = {x ∈ Rn |
Ax = b, A′′x ≤ b′′}, which is a contradiction to the definition of A′x ≤ b′.
Moreover, let ỹ ∈ P be a vector with A′ ỹ < b′ (such a vector ỹ exists because P is full-
dimensional in the linear subspace {x ∈ Rn | Ax = b} and because of the minimality of the
system A′ x ≤ b′ ). Consider the vector
    z = ỹ + ((β − at ỹ) / (at y − at ỹ)) · (y − ỹ).

Then, at z = at ỹ + ((β − at ỹ) / (at y − at ỹ)) · (at y − at ỹ) = β. Furthermore, 0 < (β − at ỹ) / (at y − at ỹ) < 1.
Thus, z is the convex
that is met by all elements of F with equality (e.g. the vector z ∈ F fulfills all inequalities in
A′′ x ≤ b′′ with strict inequality).
On the other hand, by Proposition 19 any facet is defined by an inequality of A′ x ≤ b′ . 2
Corollary 22 Let P ⊆ Rn be a polyhedron.
In particular, this means that the smallest possible representation of a full-dimensional polyhe-
dron P = {x ∈ Rn | Ax ≤ b} is unique (up to swapping inequalities and multiplying inequalities
with positive constants). If possible, we want to describe any polyhedron by facet-defining
inequalities because according to the Theorem 21, this gives such a smallest possible description
of the polyhedron (with respect to the number of inequalities).
Proof: “⇒:” Let F be a minimal face of P . By Proposition 19, we know that there is a
subsystem A′ x ≤ b′ of Ax ≤ b with F = {x ∈ P | A′ x = b′ }. Choose A′ x ≤ b′ maximal with
this property. Let Ãx ≤ b̃ be a minimal subsystem of Ax ≤ b such that F = {x ∈ Rn | A′ x =
b′ , Ãx ≤ b̃}.
We have to show the following claim:
Claim: Ãx ≤ b̃ is an empty system of inequalities.
Proof of the Claim: Assume that at x ≤ β is an inequality in Ãx ≤ b̃. The inequality at x ≤ β
is not redundant, so by Theorem 21, F′ = {x ∈ Rn | A′x = b′, Ãx ≤ b̃, at x = β} is a facet of F,
and hence, by Corollary 20, F′ is a face of P. On the other hand, we have F′ ≠ F, because
at x = β is not valid for all elements of F (otherwise we could have added at x ≤ β to the set of
inequalities A′ x ≤ b′ ). This is a contradiction to the minimality of F . This proves the claim.
“⇐:” Assume that F = {x ∈ Rn | A′ x = b′ } ⊆ P (for a subsystem A′ x ≤ b′ of Ax ≤ b) is
non-empty.
Then, F cannot contain a proper subset as a face (see Corollary 20 (d)).
Moreover, F = {x ∈ Rn | A′ x = b′ } = {x ∈ P | A′ x = b′ }, so by Proposition 19 the set F is a
face of P . Since any proper subset of F that is a face of P would also be a face of F and we
know that F does not contain proper subsets as faces, F is a minimal face of P . 2
(a) x′ is a vertex of P .
Proof:
i ∈ {1, . . . , k}, then at x′ = Σ_{i=1}^k λi at x(i) < β, which is a contradiction. But then, we have
x(i) ∈ {x ∈ P | A′x = b′} = {x′} for all i ∈ {1, . . . , k}, which is a contradiction, too.
“(d) ⇒ (b)”: Let A′ x ≤ b′ be a maximal subsystem of Ax ≤ b such that A′ x′ = b′ . Assume
that A′ does not contain n linearly independent rows. Then, there is a vector d that is
orthogonal to all rows in A′ . Hence, for any ϵ > 0, we have A′ (x′ + ϵd) = A′ (x′ − ϵd) = b′ .
For any inequality at x ≤ β that is in Ax ≤ b but not in A′ x ≤ b′ , we have at x′ < β.
Therefore, if ϵ > 0 is sufficiently small, at (x′ + ϵd) ≤ β and at (x′ − ϵd) ≤ β are valid for
inequalities at x ≤ β in Ax ≤ b but not in A′ x ≤ b′ . In other words, we have (x′ + ϵd) ∈ P
and (x′ − ϵd) ∈ P . 2
Examples:
Corollary 26 If the linear program max{ct x | Ax ≤ b} is feasible and bounded and the
polyhedron P = {x ∈ Rn | Ax ≤ b} is pointed, then there is a vertex x′ of P such that
ct x′ = max{ct x | Ax ≤ b}. 2
3.5 Cones
Proof: Let {a1, . . . , ak} be an inclusion-wise minimal set of vectors in X such that c ∈
cone({a1, . . . , ak}). This means that there are positive numbers λ1, . . . , λk such that c = Σ_{i=1}^k λi ai.
We show that the vectors a1, . . . , ak are linearly independent. If this is not the case, there are
numbers γ1, . . . , γk such that Σ_{i=1}^k γi ai = 0. We can assume that at least one γi is positive.
Choose σ maximal such that λi − σγi ≥ 0 for all i ∈ {1, . . . , k}. Then, in particular, for at least
one i ∈ {1, . . . , k}, we have λi − σγi = 0. Therefore, c = Σ_{i=1}^k (λi − σγi) ai is a representation of c
with fewer vectors, which is a contradiction to the minimality of the set {a1, . . . , ak}. 2
Proof: Obviously, at most one of the statements can be valid. Let A be the matrix with rows
at1 , . . . , atm .
If c ∈ cone({a1 , . . . , am }) then by the previous theorem, c can be written as a non-negative
combination of linearly independent vectors from at1 , . . . , atm .
Hence, assume that c ̸∈ cone({a1 , . . . , am }), so there is no vector v ∈ Rm , v ≥ 0 such that
ct = v t A. By Farkas’ Lemma (Theorem 6), this implies that there is a vector ũ ∈ Rn such that
Aũ ≥ 0 and ct ũ < 0. This implies that the following LP (with u ∈ Rn as variable vector) has a
feasible solution:
max ct u
s.t. ct u ≤ −1
−ct u ≤ 1
−Au ≤ 0
Moreover, the LP is bounded (-1 is the value of an optimum solution). Hence, the optimum is
attained on a face of the solution polyhedron. By Theorem 23, we can write a minimal face
where the optimum solution value is attained as a set F = {u ∈ Rn | A′ u = b′ } where A′ u ≤ b′
is a subsystem of ct u ≤ −1, −ct u ≤ 1, −Au ≤ 0 consisting of t linearly independent vectors.
Hence, any vector u ∈ F fulfills the condition of (b). 2
Theorem 29 (Farkas-Minkowski-Weyl Theorem) A convex cone is polyhedral if and only
if it is finitely generated.
• {a1 , . . . , am } ⊆ Hu , and
• There are n − 1 linearly independent vectors ai1 , . . . , ain−1 in {a1 , . . . , am } such that
ut aij = 0 for j ∈ {1, . . . , n − 1}
The set H is finite because there are at most (m choose n−1) such half-spaces, and by Theorem 28 the
set cone({a1 , . . . , am }) is the intersection of these half-spaces. Hence, cone({a1 , . . . , am }) is a
polyhedron.
“⇒:” Let C = {x ∈ Rn | Ax ≤ 0} be a polyhedral cone. We have to show that C is finitely
generated. Let CA be the cone generated by the rows of A. By the first part of the proof, we
know that CA (as any other finitely generated cone) is polyhedral. Hence, there are vectors
d1 , . . . , dk ∈ Rn such that CA = {x ∈ Rn | dt1 x ≤ 0, . . . , dtk x ≤ 0}. Let CB = cone({d1 , . . . , dk })
be the cone generated by d1 , . . . , dk .
Claim: C = CB .
Proof of the claim: “CB ⊆ C”: Every row vector of A is contained in CA . Hence Adi ≤ 0 for all
i ∈ {1, . . . , k}. Therefore, di ∈ C (for i ∈ {1, . . . , k}) and thus (as C is a cone) CB ⊆ C.
“C ⊆ CB ”: Assume that there is a y ∈ C \ CB . Again by the first part, CB is polyhedral. Thus,
there must be a vector w ∈ Rn with wt di ≤ 0 (for i = 1, . . . , k) and wt y > 0. This implies
w ∈ CA , and therefore wt x ≤ 0 for all x ∈ C. Obviously, together with wt y > 0 this is a
contradiction to the assumption y ∈ C. 2
Remark: For a set S ⊆ Rn we call the set S o = {x ∈ Rn | xt y ≤ 0 for all y ∈ S}, the polar
cone of S (in particular it obviously is a convex cone). For a polyhedral cone C = {x ∈ Rn |
Ax ≤ 0} its polar cone C o is the cone generated by the rows of A (see exercises). We have just
seen in the proof that C oo = C for a polyhedral cone C.
3.6 Polytopes
Proof: “⇒:” Let X = {x ∈ Rn | Ax ≤ b} be a non-empty polytope. We can write X as follows:

    X = { x ∈ Rn | (x, 1)t ∈ C }

where

    C = { (x, λ)t ∈ Rn+1 | λ ≥ 0, Ax − λb ≤ 0 }.

The set C is a polyhedral cone, so by Theorem 29 it is finitely generated by a set of vectors
(x1, λ1)t, . . . , (xk, λk)t. Since X is bounded, C cannot contain a vector (x, 0)t with x ≠ 0 (such a
vector would give an unbounded direction in X), so we can assume that all λi are positive (for
i ∈ {1, . . . , k}). We can even assume that we have λi = 1 for all i ∈ {1, . . . , k} because otherwise
we could scale the i-th generator by the factor 1/λi. Thus, we have

    x ∈ X  ⇔  ∃µ1, . . . , µk ≥ 0 : (x, 1)t = µ1 (x1, 1)t + · · · + µk (xk, 1)t.

This implies that X is the convex hull of x1, . . . , xk.
“⇐:” Let X = conv({x1, . . . , xk}) be the convex hull of x1, . . . , xk. We have to show that X is a
polytope. Let C = cone({(x1, 1)t, . . . , (xk, 1)t}) be the cone generated by (x1, 1)t, . . . , (xk, 1)t.
Then, we have

    x ∈ X  ⇔  (x, 1)t ∈ C.

By Theorem 29, C is polyhedral, so we can write C as C = { (x, λ)t | Ax + bλ ≤ 0 }. This shows
X = {x ∈ Rn | Ax + b ≤ 0}, so X is a polyhedron.
It is even a polytope, because for M = max{||xi|| | i ∈ {1, . . . , k}} and x ∈ X, we can write x as
x = Σ_{i=1}^k λi xi with λ1, . . . , λk ≥ 0 and Σ_{i=1}^k λi = 1, so ||x|| ≤ Σ_{i=1}^k λi ||xi|| ≤ M Σ_{i=1}^k λi = M.
2
Proof: Let P be a polytope with vertex set X. Since P is convex and X ⊆ P , we have
conv(X) ⊆ P . It remains to show that P ⊆ conv(X). Theorem 30 implies that conv(X) is a
polytope, so in particular a polyhedron. Assume that there is a vector y ∈ P \ conv(X). Then,
there is a half-space Hy = {x ∈ Rn | ct x ≤ δ} such that conv(X) ⊆ Hy and y ̸∈ Hy . This means
that ct y > ct x for all x ∈ X, so the maximum of the function ct x over P will not be attained at
a vertex. This is a contradiction to Corollary 26. 2
Notation: For two vector sets X, Y ⊆ Rn , we define their Minkowski sum as:
X + Y := {z ∈ Rn | ∃x ∈ X ∃y ∈ Y : z = x + y}.
In particular if X = ∅ or Y = ∅, we get X + Y = ∅.
P = conv(V ) + cone(E).
4 Simplex Algorithm
The Simplex Algorithm by Dantzig [1951] is the oldest algorithm for solving general linear
programs. Geometrically it works as follows: Given a polyhedron P and a linear objective
function, we start with any vertex of P. Then we walk along a one-dimensional face of P to
another vertex and repeat this until we find a vertex where the objective function attains a
maximum.
If we want to have a chance to follow this main strategy, we need a pointed polyhedron. That
is why in this section we consider linear programs in standard equation form:
max ct x
s.t. Ax = b (28)
x ≥ 0
max ct x
s.t. Ãx ≤ b (29)
x ≥ 0
4.1 Feasible Basic Solutions
Notation: We denote the index set of the columns of a matrix A ∈ Rm×n by {1, . . . , n}. For a
subset B ⊆ {1, . . . , n}, we denote by AB the sub-matrix of A containing exactly the columns
with index in B. Similarly, for a vector x ∈ Rn , we denote by xB the sub-vector of x containing
the entries with index in B. Note that xB is a vector of length |B| but its entries are not indexed
from 1 to |B|, but the indices are the elements of B, so for example for B = {2, 4, 9} we have
xB = (x2 , x4 , x9 ).
(a) A set B ⊆ {1, . . . , n} with |B| = m such that AB is non-singular is called a basis of A. The
basic solution of Ax = b for B is the vector x ∈ Rn with xB = A−1B b and xN = 0, where
N := {1, . . . , n} \ B.
(b) If x is a basic solution of Ax = b for B, then the variables xj with j ∈ B are called
basic variables and the variables xj with j ∈ N are called non-basic variables.
(c) A basic solution x is called feasible if x ≥ 0. A basis is called feasible if its basic
solution is feasible.
Remark: We also use the above definition for inequality systems of the type Ãx ≤ b, x ≥ 0
(with à ∈ Rm×ñ ). E.g. we call a vector x∗ ∈ Rñ with Ãx∗ ≤ b and x∗ ≥ 0 a basic solution if x∗ , s∗
with s∗ := b − Ãx∗ is a basic solution for Ãx + Im s = b, x ≥ 0, s ≥ 0 (with n := ñ + m variables).
In particular, in a feasible basic solution of Ãx ≤ b, x ≥ 0, the number of tight constraints
(including non-negativity constraints) must be at least n − m = ñ, and in a non-degenerated
feasible basic solution, the number of tight constraints must be exactly ñ. This is because
each positive non-slack variable and each positive slack variable is associated with a non-tight
constraint.
Example: Consider the following system of equations:
x 1 + x2 + s 1 = 1
2x1 + x2 + s2 = 2 (30)
x1 , x 2 , s 1 , s 2 ≥ 0
The variables are x1 , x2 , s1 , and s2 . We denoted the last two variables by s1 and s2 because
they can be interpreted as slack variables for the following system of inequalities: x1 + x2 ≤
1, 2x1 + x2 ≤ 2, x1 , x2 ≥ 0.
If we write the system of equations in matrix notation, we get:

    ( 1 1 1 0 ; 2 1 0 1 ) (x1, x2, s1, s2)t = (1, 2)t.
For B = {1, 2}, we get AB = (1 1; 2 1) with feasible basic solution (1, 0, 0, 0). So in particular
this feasible basic solution is degenerated. If we choose instead B = {2, 3}, we get AB = (1 1; 1 0)
and the corresponding basic solution is (0, 2, −1, 0), which is, of course, infeasible.
Figure 5 illustrates these two basic solutions. However, note that the figure does not show the
solution space (which is 4-dimensional) but only the solution space of the problem without the
slack variables s1 and s2, i.e. the solution space of the system x1 + x2 ≤ 1, 2x1 + x2 ≤ 2, x1, x2 ≥ 0.
So the two points (1, 0) and (0, 2) are basic solutions only in the sense of the remark stated
after the last definition.
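The two basic solutions of (30) can be recomputed directly. The following sketch (assuming
NumPy; column indices are 0-based) solves AB xB = b for the chosen basis and sets the remaining
entries to zero:

import numpy as np

def basic_solution(A, b, B):
    """Basic solution of Ax = b for the basis B (a list of column indices)."""
    x = np.zeros(A.shape[1])
    x[B] = np.linalg.solve(A[:, B], b)   # requires A_B to be non-singular
    return x

# System (30) with variables (x1, x2, s1, s2).
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [2.0, 1.0, 0.0, 1.0]])
b = np.array([1.0, 2.0])
print(basic_solution(A, b, [0, 1]))   # B = {1, 2}: (1, 0, 0, 0), feasible and degenerated
print(basic_solution(A, b, [1, 2]))   # B = {2, 3}: (0, 2, -1, 0), infeasible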
Figure 5: The feasible region of x1 + x2 ≤ 1, 2x1 + x2 ≤ 2, x1, x2 ≥ 0 with the degenerated basic
solution (1, 0) and the infeasible basic solution (0, 2).
In this example we could easily make the degenerated basic solution non-degenerated by
skipping the redundant constraint 2x1 + x2 ≤ 2. This is always possible if we only have two
non-slack variables, but already in three dimensions there are instances where we cannot
get rid of degenerated basic solutions. As an example consider Figure 6. If the pyramid defines
the set of all feasible solutions, the marked vector is a degenerated basic solution, because four
constraints are fulfilled with equality while there are only three non-slack variables.
Note that the example (30) shows that the same vertex of a polyhedron can belong to a
degenerated or a non-degenerated basic solution, depending on how we describe the polyhedron
by a system of inequalities.
Figure 6: A degenerated basic solution at the apex of a pyramid.
Proposition: Let P = {x ∈ Rn | Ax = b, x ≥ 0} with A ∈ Rm×n of rank m. A vector x′ ∈ P is a
vertex of P if and only if x′ is a feasible basic solution of Ax = b, x ≥ 0.
Proof: The vector x′ is a vertex of P if and only if it is a feasible solution of the following
system and fulfills n linearly independent inequalities of the system with equality:
Ax ≤ b
−Ax ≤ −b
−In x ≤ 0
This is the case if and only if x′ ≥ 0, Ax′ = b and x′N = 0 for a set N ⊆ {1, . . . , n} with
|N | = n − m such that with B = {1, . . . , n} \ N the matrix AB has full rank. This is equivalent
to being a feasible basic solution. 2
4.2 The Simplex Method

Before we describe the algorithm in general, we will present some examples (which are taken
from Matoušek and Gärtner [2007]).
Consider the following linear program:
max x1 + x2
s.t. −x1 + x2 + x3 = 1
x1 + x4 = 3
x2 + x5 = 2
x1 , x 2 , x 3 , x 4 , x 5 ≥ 0
    ( −1 1 1 0 0 ; 1 0 0 1 0 ; 0 1 0 0 1 ) (x1, x2, x3, x4, x5)t = (1, 3, 2)t
We first need a basis to start with. We simply choose B = {3, 4, 5}, which gives us the basic
solution x = (0, 0, 1, 3, 2). We write the constraints and the objective function in a so-called
simplex tableau:
x3 = 1 + x1 − x2
x4 = 3 − x1
x5 = 2 − x2
z = x1 + x2
The first three rows describe an equation system that is equivalent to the given one but each
basic variable is written as a combination of the non-basic variable. The last line describes the
objective function.
We will try to increase non-basic variables (which are zero in the current solution) with a
positive coefficient in the objective function. Hence, here we could use x1 or x2 , and we choose
x2 . Equation x3 = 1 + x1 − x2 is the critical constraint that prevents us from increasing to
something bigger than 1 (without increasing x1 ). If we set x2 to something bigger than 1,
x3 would become negative. The constraint x5 = 2 − x2 only gives an upper bound of 2 for
the value of x2 . Since the bound induced by non-negativity of x3 is tighter (so the constraint
x3 = 1 + x1 − x2 is critical), we replace 3 in the basis by 2. The new basic variable x2 can be
written as a combination of the non-basic variables by using the first constraint: x2 = 1 + x1 − x3 .
The new base is B = {2, 4, 5} with a new basic solution x = (0, 1, 0, 3, 1). This is the new
simplex tableau:
x2 = 1 + x1 − x3
x4 = 3 − x1
x5 = 1 − x1 + x3
z = 1 + 2x1 − x3
Increase x1 . x5 = 1−x1 +x3 is critical. x1 = 1+x3 −x5 . New base B = {1, 2, 4}. x = (1, 2, 0, 2, 0).
x1 = 1 + x3 − x5
x2 = 2 − x5
x4 = 2 − x3 + x5
z = 3 + x3 − 2x5
Increase x3 . x4 = 2−x3 +x5 is critical. x3 = 2−x4 +x5 . New base B = {1, 2, 3}. x = (3, 2, 2, 0, 0).
x1 = 3 − x4
x2 = 2 − x5
x3 = 2 − x4 + x5
z = 5 − x4 − x5
The value of the objective function for any feasible solution (x1 , . . . , x5 ) is 5 − x4 − x5 . Since we
have found a solution where x4 = x5 = 0 and we have the constraint that xi ≥ 0 (i = 1, . . . , 5),
our solution is an optimum solution.
Unbounded instance:
As a second example, consider:
max x1
s.t. x 1 − x2 + x3 = 1
−x1 + x2 + x4 = 2
x1 , x 2 , x 3 , x 4 ≥ 0
Quite obviously this LP is unbounded (one can choose x1 arbitrarily large and set x2 = x1,
x3 = 1, and x4 = 2).
Again we use the “slack variables” (here x3 and x4 ) for a first basis. This gives B = {3, 4} and
x = (0, 0, 1, 2).
x3 = 1 − x1 + x2
x4 = 2 + x1 − x2
z = x1
Increase x1. x3 = 1 − x1 + x2 is critical. x1 = 1 + x2 − x3. New base B = {1, 4}. x = (1, 0, 0, 3).

x1 = 1 + x2 − x3
x4 = 3 − x3
z = 1 + x2 − x 3
We can increase x2 as much as we want (provided that we increase x1 by the same amount).
Thus the simplex tableau shows that the linear program is unbounded.
Degeneracy:
A final example shows what may happen if we get a degenerated basic solution.
max x2
s.t. −x1 + x2 + x3 = 0
x1 + x4 = 2
x1 , x 2 , x 3 , x 4 ≥ 0
x3 = x1 − x2
x4 = 2 − x1
z = x2
We want to increase x2 . x3 = x1 − x2 is critical. x2 = x1 − x3 . We will replace 3 by 2 in the
basis. However, we cannot increase x2 . New base B = {2, 4}. x = (0, 0, 0, 2).
x2 = x1 − x3
x4 = 2 − x1
z = x1 − x3
Increase x1. x4 = 2 − x1 is critical. x1 = 2 − x4. New base B = {1, 2}. x = (2, 2, 0, 0).

x1 = 2 − x4
x2 = 2 − x3 − x4
z = 2 − x3 − x4
Again, we have found an optimum solution because all coefficients of the non-basic variables in
the objective function z = 2 − x3 − x4 are negative.
After these three examples, we will now describe the simplex method in general.
For a basis B, the simplex tableau is a system T (B) of m + 1 linear equations with variables
x1 , . . . , xn and z with this form
xB = p + QxN
(31)
z = z0 + rt xN
• xB is the vector of the basic variables, N = {1, . . . , n} \ B, and xN is the vector of the
non-basic variables,
• p = A−1B b ∈ Rm (indexed by B),
• Q = −A−1B AN ∈ Rm×(n−m),
• z0 = ctB A−1B b ∈ R, and
• r = cN − (ctB A−1B AN)t ∈ Rn−m.
Note that the entries of p are not necessarily numbered from 1 to m but that p uses B as the
set of indices (and for r, we have a corresponding statement). In particular, the rows of Q are
indexed by B and the columns by N . We denote the entries of Q by qij (where i ∈ B and
j ∈ N ).
Then xB = A−1B b − A−1B AN xN, which is equivalent to AB xB = b − AN xN and to Ax = b.
Moreover, z = ctB A−1B b + (ctN − ctB A−1B AN) xN = ctB A−1B (b − AN xN) + ctN xN = ctB xB + ctN xN = ct x
for any solution x of Ax = b. 2
Remark: It is easy to check that there is only one simplex tableau for every feasible basis B.
The cost function z0 + rt xN does not directly depend on the basic variables but only on the
non-basic variables. Their impact on the overall cost is given by the vector r = cN − (ctB A−1B AN)t.
An entry of r is called the reduced cost of its corresponding non-basic variable.
If all reduced costs are non-positive, we have already found an optimum solution:
Lemma 35 Let T (B) be a simplex tableau for a feasible basis B. If r ≤ 0, then the basic
solution of B is optimum.
Lemma 36 Let T(B) be a simplex tableau for a feasible basis B. If there is an index α ∈ N
with rα > 0 and qiα ≥ 0 for all i ∈ B, then the linear program is unbounded.
Proof: Let x be the feasible basic solution for B. Let K ∈ R with K > ct x be a constant. Define
a new feasible solution x̃ as follows: x̃α := (K − ct x) / rα, x̃i := xi for i ∈ N \ {α}, and x̃j := pj + qjα x̃α
for j ∈ B. It is easy to check that x̃ is a feasible solution with ct x̃ ≥ K. Hence, the linear
program is unbounded. 2
In the following, we denote the entries of A by aij (i ∈ {1, . . . , m}, j ∈ {1, . . . , n}). The column
of A with index j is denoted by a·j .
Lemma 37 Let T(B) be a simplex tableau for a feasible basis B, let α ∈ N, and let β ∈ B with
qβα < 0 and pβ/qβα = max{pi/qiα | qiα < 0, i ∈ B}. Then B̃ := (B \ {β}) ∪ {α} is a feasible basis.
Proof: We have to show that AB̃ has full rank and that it is feasible, i.e. that its basic solution
is non-negative.
(i) All but one of the columns of AB̃ belong to AB. Hence, the matrix A−1B AB̃ contains all unit
vectors ei with the possible exception of eβ because we removed the β-th column from
AB. However, this removed column has been replaced by the α-th column a·α of A, so
the remaining column of A−1B AB̃ is A−1B a·α. But this is exactly the column with index
α of −Q = A−1B AN. By construction, qβα ≠ 0, so all columns of A−1B AB̃ are linearly
independent.
(ii) We have to show that the basic solution of B̃ is non-negative. We increase xα to −pβ/qβα and
set the basic variables xB to p − q·α · (pβ/qβα), where q·α is the column with index α of Q. For
i ∈ B with qiα ≥ 0 (so in particular i ≠ β) we have pi − qiα · (pβ/qβα) ≥ pi ≥ 0. For i ∈ B with
qiα < 0 we have pβ/qβα ≥ pi/qiα, so pi ≥ qiα · (pβ/qβα), with equality in the last inequality for i = β.
This leads to xβ = 0 and xB ≥ 0, so we get a feasible basic solution for B̃. 2
xB = p + QxN
z = z0 + rt xN
for the basis B; // See equation (31) and the following notation.
5 if r ≤ 0 then
return x̃ = x; // x̃ is optimum (see Lemma 35).
6 Choose an index α ∈ N with rα > 0;
// Here we can apply different pivot rules.
7 if qiα ≥ 0 for all i ∈ B then
return “unbounded”; // By Lemma 36, the LP is unbounded.
8 Choose an index β ∈ B with qβα < 0 and pβ/qβα = max{pi/qiα | qiα < 0, i ∈ B};
// Again, we can apply different pivot rules.
9 Set B = (B \ {β}) ∪ {α};
// See Lemma 37 proving that we get a new feasible basis.
10 go to line 3
max −(xn+1 + xn+2 + · · · + xn+m )
s.t. Ãx̃ = b (32)
x̃ ≥ 0
For this linear program, it is trivial to find a feasible basis ({n + 1, . . . , n + m} will work), so
we can solve it by the Simplex Algorithm. If the value of its optimum solution is negative,
this means that the original linear program does not have a feasible solution. Otherwise, the
Simplex Algorithm will provide a basic solution for the original linear program. In this case,
the solution of the new LP computed by the Simplex Algorithm could contain variables
from xn+1 , . . . , xn+m as basic variables but their value must be 0 and hence they can be replaced
easily by variables from x1 , . . . , xn .
In lines 6 and 8, we may have a choice between different candidates to enter or leave the basis.
The elements chosen in these steps are called pivot elements, and the rules by which we choose
them are called pivot rules. Several different pivot rules for the entering variable have been
proposed:
• Largest coefficient rule: For the entering variable choose α such that rα is maximized.
This is the rule that was proposed by Dantzig in his first description of the Simplex
Algorithm.
• Largest increase rule: Choose the entering variable such that the increase of the
objective function is maximized. Finding an α with that property takes more time because
it is not sufficient to consider the vector r only.
• Steepest edge rule: Choose the entering variable in such a way that we move the
feasible basic solution in a direction as close to the direction of the vector c as possible.
This means we maximize
ct (xnew − xold )
||xnew − xold ||
where xold is the basic feasible solution of the current basis and xnew is the basic feasible
solution of the basis after the exchange step. This rule is even more time-consuming but
in many practical experiments it turned out to lead to a small number of exchange steps.
Here, we only analyze a pivot rule that is quite inefficient in practice but has the nice property
that we can show that the Simplex Algorithm terminates at all, if we follow that rule. If all
exchange steps improve the value of the current solution, we can be sure that the algorithm will
terminate because we can never visit the same basic solution twice, and there is only a finite
(though exponential) number of basic solutions. However, exchange steps do not necessarily
change the value of the solution. Therefore, depending on the pivot rules, it is possible that
the Simplex Algorithm runs in an endless loop by considering the same sequence of bases
forever. This behavior is called cycling (see page 30 ff. of Chvátal [1983] for an example that
this can really happen). The good news is that we can avoid cycling by using an appropriate
pivot rule.
If the algorithm does not terminate, it has to consider the same basis B twice. The computation
between two occurrences of B is called a cycle. Let F ⊆ {1, . . . , n} be the indices of the variables
that have been added to (and hence removed from) the basis during one cycle. We call xF the
cycle variables.
Lemma 38 If the Simplex Algorithm cycles, all basic solutions during the cycling
are the same and all cycle variables are 0.
Proof: The value of a solution considered in Simplex Algorithm never decreases, so during
cycling it cannot increase either. Let B be a feasible basis that occurs in the cycle, and let
B ′ = (B ∪ {α}) \ {β} be the next basis. The only non-basic variable that could be increased is
xα . However, if it indeed was increased, then, because rα > 0, this would increase the value of
the solution. This shows that the non-basic variables remain zero. But then, all variables remain
unchanged because the basic variables are determined uniquely by the non-basic variables. 2
A pivot rule that is able to avoid cycling is Bland’s rule (Bland [1977]) that can be described
as follows: In line 6 of the Simplex Algorithm, we choose α among all elements in N with
rα > 0 such that α is minimal. In line 8, we choose β among all elements in B with qβα < 0
and pβ/qβα = max{pi/qiα | qiα < 0, i ∈ B} such that β is minimal.
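The whole method with Bland's rule fits into a few lines. The following sketch (assuming NumPy;
numerical tolerances and the Phase-I computation of a starting basis are left out) recomputes the
tableau in every iteration instead of updating it, which is inefficient but matches the description
above:

import numpy as np

def simplex_blands_rule(A, b, c, B):
    """Simplex Algorithm with Bland's rule for max{c^t x | Ax = b, x >= 0},
    started from a feasible basis B (list of column indices)."""
    m, n = A.shape
    B = list(B)
    while True:
        N = [j for j in range(n) if j not in B]
        AB_inv = np.linalg.inv(A[:, B])
        p = AB_inv @ b                              # current basic solution (indexed by B)
        Q = -AB_inv @ A[:, N]
        r = c[N] - (c[B] @ AB_inv @ A[:, N])        # reduced costs
        if np.all(r <= 1e-12):                      # optimal (Lemma 35)
            x = np.zeros(n)
            x[B] = p
            return x
        alpha = min(N[k] for k in range(len(N)) if r[k] > 1e-12)   # Bland: smallest entering index
        a = N.index(alpha)
        if np.all(Q[:, a] >= -1e-12):               # unbounded (Lemma 36)
            return None
        # Bland: among the rows attaining the maximum ratio, pick the smallest leaving index.
        ratios = [(p[i] / Q[i, a], B[i]) for i in range(m) if Q[i, a] < -1e-12]
        best = max(t for t, _ in ratios)
        beta = min(j for t, j in ratios if t >= best - 1e-12)
        B[B.index(beta)] = alpha                    # exchange step (Lemma 37)

# Example from Section 4.2: starting basis {3, 4, 5} (0-based: [2, 3, 4]).
A = np.array([[-1.0, 1.0, 1.0, 0.0, 0.0],
              [ 1.0, 0.0, 0.0, 1.0, 0.0],
              [ 0.0, 1.0, 0.0, 0.0, 1.0]])
b = np.array([1.0, 3.0, 2.0])
c = np.array([1.0, 1.0, 0.0, 0.0, 0.0])
print(simplex_blands_rule(A, b, c, [2, 3, 4]))   # optimum (3, 2, 2, 0, 0)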
Theorem 39 With Bland’s rule as pivot rule in lines 6 and 8, the Simplex Algorithm
terminates after a finite number of steps.
Proof: Assume that the algorithm cycles while using Bland’s rule. We use the notation from
above and consider the set F of the indices of the cycle variables. Let π be the largest element
of F , and let B be the basis just before π enters the basis. Let p,Q,r and z0 be the entries of
the simplex tableau T (B). Let B ′ be the basis just before π leaves it. Let p′ ,Q′ ,r′ and z0′ be the
entries of the simplex tableau T (B ′ ).
Let N = {1, . . . , n} \ B be the set of the non-basic variables (so in particular π ∈ N ). According
to Bland’s rule we choose the smallest index and π = max(F ), so when B is considered, π is
the only candidate in F to enter the basis. In other words:
Let α be the index entering B ′ . Again by Bland’s rule, π must have been the only candidate
among all elements of F to leave B ′ . Since p′j = 0 for all j ∈ B ′ ∩ F , this means that
′ ′
qπα < 0 and qjα ≥ 0 for j ∈ B ′ ∩ (F \ {π}). (34)
Roughly spoken, we will get a contradiction because (33) says that in a feasible basic solution
increasing a non-basic variable in xF \{π} or decreasing xπ (to something negative!) will not
55
improve the result. On the other hand, (34) says that increasing xα while decreasing xπ (again
to something negative) will improve the result.
We will formalize this statement by considering the following auxiliary linear program:
max ct x
s.t. Ax = b
xF \{π} ≥ 0 (35)
xπ ≤ 0
xN \F = 0
≥0 if j ∈ F \ {π}
xj
≤0 if j = π
56
4.3 Efficiency of the Simplex Algorithm
We have seen that Bland’s rule guarantees that the Simplex Algorithm will terminate. What
can we say about the running time? Consider for some ϵ with 0 < ϵ < 12 the following example:
max xn
−x1 ≤ 0
x1 ≤ 1
ϵxj−1 − xj ≤ 0 for j ∈ {2, . . . , n}
ϵxj−1 + xj ≤ 1 for j ∈ {2, . . . , n}
Of course, adding non-negativity constraints for all variables would not change the problem.
The polyhedron defined by these inequalities is called Klee-Minty cube (Klee and Minty
[1972]). It turns out that the Simplex Algorithm with Bland’s rule (depending on the initial
solution) may consider 2n bases before finding the optimum solution. In particular, this example
shows that we don’t get a polynomial-time algorithm.
The bad news is that for any of the above pivot rules instances have been found where the
Simplex Algorithm with that particular pivot rule has exponential running time (see e.g.
Goldfarb and Sit [1979] for the steepest edge rule).
Assume that you are given an optimum pivot rule that guides you to an optimum solution
with a smallest possible number of iterations. Then, the number of iterations depends on the
following property of the instances:
Obviously, if we don’t make any assumptions on the starting solution, the number of iterations
performed by the Simplex Algorithm optimizing over a polyhedron P will be at least the
combinatorial diameter of P , even with an optimum pivot rule.
It is an open question what the largest combinatorial diameter of a d-dimensional polyhedron
with n facets is. In 1957, W. Hirsch conjectured that the combinatorial diameter could be
at most n − d. This conjecture was open for decades but it has been disproved by Santos
[2011] who showed that there is a 20-dimensional polyhedron with 40 facets and combinatorial
diameter 21. More generally, he proved that there are counter-examples to the Hirsch conjecture
with arbitrarily many facets. Nevertheless, it is still possible that the combinatorial diameter
is always polynomially (or even linearly) bounded in the dimension and the number of facets.
An upper bound for the combinatorial diameter is O(n2+log d ), which was proven by Kalai and
Kleitman [1992]. This bound has been improved by Todd [2014] to O((n − d)log d ). For an
57
overview of this topic see Section 3.3 of Ziegler [2007].
In practical experiments, the Simplex Algorithm typically turns out to be very efficient. It
could also be proved that the average running time (with a specified probabilistic model) is
polynomial (see Borgwardt [1982]). Moreover, Spielmann and Teng [2005] have shown that the
expected running time on a slight perturbation of a worst-case instance can be bounded by a
polynomial (“smoothed analysis”, see also Dadush and Hulberts [2019])
If the linear program max{ct x | Ax = b, x ≥ 0} is feasible and bounded then the Simplex
Algorithm does not only provide an optimum primal solution but we can also get an optimum
solution of the dual linear program min{bt y | At y ≥ c}. To see this, let B the feasible basis
corresponding to the optimum computed by the Simplex Algorithm. Set ỹ = A−t B cB (where
A−tB = (A t −1
B ) ). This leads to A t
B ỹ = c B and At
N ỹ = A t
A −t
N B B c ≥ c N where the last inequality
t −1 t
follows from the fact that in T (B) we have 0 ≥ r = cN − (cB AB AN ) . So the vector ỹ is feasible
for the dual LP, and it is an optimum solution because together with the (primal) basic solution
x̃ for the basis B, it satisfies the complementary slackness condition (ỹ t A − ct )x̃ = 0.
In fact, the condition r ≤ 0 in the simplex tableau T (B) guarantees the existence of a dual
solution y with y t AB = ctB . In the Dual Simplex Algorithm, we start with a feasible basic
dual solution, i.e. a feasible dual solution for which a basis B exists with y t AB = ctB . If ctB A−1
B
is a feasible dual solution, we call B a dual feasible basis. Then, we compute the corresponding
simplex tableau T (B) (which exists for any basis not just a feasible basis). Thus the vector r
will have no positive entry. Note that B may not be feasible, so entries of p can be negative.
Now the algorithm swaps elements between the basis and the rest of the variables similarly to
the simplex algorithm but instead of keeping p non-negative it keeps r non-positive.
For any basis B such that in T (B) the vector r has no positive entry, the following properties
(that are easy to prove) are the basis of the Dual Simplex Algorithm:
58
• z0 is the current solution value of the dual solution.
• If there is a β ∈ B with pβ < 0 such that qβj ≤ 0 for all j ∈ N , then the primal LP is
infeasible.
r
• For β ∈ B with pβ < 0 and α ∈ N with qβα > 0 with qrβα α
≥ qβjj for all j ∈ N with qβj > 0,
then (B \ {β}) ∪ {α} is a dual feasible basis. Then the value of the dual solution is changed
−p
by qβαβ rα . In particular, if rα ̸= 0 then the value of the dual solution gets smaller.
The Dual Simplex Algorithm simply applies the exchange steps in the last item until we
get a feasible basis. The algorithm can be considered as the Simplex Algorithm applied to
the dual LP. Thus it can also run into cycling and its efficiency is not better then the efficiency
of the Simplex Algorithm.
However, in some applications, the Dual Simplex Algorithm is very useful: If you add
an additional constraint to the primal LP, then a primal solution can become infeasible, so in
the Primal Simplex Algorithm we have to start from scratch. However, the dual solution
is still feasible. It is possibly not optimal but often it can be made optimal with just some
iterations of the Dual Simplex Algorithm.
The Network Simplex Algorithm can be seen as the Simplex Algorithm applied to
Min-Cost-Flow-Problems. Even for this special case, we cannot prove a polynomial running
time but it turns out that, in practice, the Network Simplex Algorithm is among the
fastest algorithms for Min-Cost-Flow-Problems. Though it is a variant of the Simplex
Algorithm, it can be described as a pure combinatorial algorithm.
•
P P
e∈δ + (v) f (e) −
G e∈δ − (v) f (e) = b(v) for all v ∈ V (G).
G
Notation: We call b(v) the balance of v. If b(v) > 0, we call it the supply of v, and if b(v) < 0,
we call it the demand of v. Nodes v of G with b(v) > 0 are called sources, nodes v with
b(v) < 0 are called sinks.
During this chapter, n is always the number of nodes and m the number of edges of the graph
G.
59
Minimum-Cost Flow Problem
↔ ↔
Definition 16 Let G be a directed graph. We define the graph G by V (G) = V (G) and
↔ ← ←
E(G) = E(G)∪{ ˙ e | e ∈ E(G)} where e is an edge from w to v if e is an edge from v to
← ↔
w. e is called the reverse edge of e. Note that G may have parallel edges even if G does
not contain any parallel edges. If we have edge costs c : E(G) → R these are extended
↔ ←
canonically to edges in E(G) by setting c( e ) = −c(e).
Let (G, u, b, c) be an instance of the Minimum-Cost Flow Problem and let f be a
b-flow in (G, u). Then, the residual graph Gu,f is defined by V (Gu,f ) := V (G) and
← ↔
E(Gu,f ) := {e ∈ E(G) | f (e) < u(e)}∪{ ˙ e ∈ E(G) | f (e) > 0}. For e ∈ E(G) we define
←
the residual capacity of e by uf (e) = u(e) − f (e) and the residual capacity of e by
←
uf ( e ) = f (e).
The residual graph contains the edges where flow can be increased as forward edges and edges
where flow can be reduced as reverse edges. In both cases, the residual capacity is the maximum
value by which the flow can be modified. If P is a subgraph of the residual graph, then an
augmentation along P by γ means that we increase the flow on forward edges in P (i.e. edges
in E(G) ∩ E(P )) by γ and reduce it on reverse edges in P by γ. Note that the resulting mapping
is only a flow if γ is at most the minimum of the residual capacity of the edges in P .
60
Lemma 40 Let (G, u, b, c) be an instance of the Minimum-Cost Flow Problem. A
b-flow f is a spanning tree solution if and only if x̃ ∈ RE(G) with x̃e = f (e) is a vertex of
the polytope
X X
E(G)
x∈R | 0 ≤ xe ≤ u(e) (e ∈ E(G)), xe − xe = b(v) (v ∈ V (G)) . (36)
+ −
e∈δ (v) e∈δ (v)
Proof: “⇒:” Let f be a spanning tree solution and x̃ ∈ RE(G) with x̃e = f (e) for e ∈ E(G).
Consider all inequalities xe ≥ 0 for edges e with f (e) = 0, xe ≤ u(e) for edges e with f (e) = u(e)
and for each connectedP componentPof (V (G), {e ∈ E(G) | 0 < f (e) < u(e)}) for all but one
vertex the equation e∈δ+ (v) xe − e∈δ− (v) xe = b(v). These are |E(G)| linearly independent
inequalities that are fulfilled with equality by x̃. Hence x̃ is a vertex.
“⇐:” Let f be a b-flow. Assume that x̃ ∈ RE(G) with x̃e = f (e) is a vertex of the polytope (36).
Assume that (V (G), {e ∈ E(G) | 0 < f (e) < u(e)}) contains an undirected cycle C. Choose
an ϵ > 0 such that ϵ ≤ min{min{f (e), u(e) − f (e)} | e ∈ E(C)}. Fix one of the two possible
orientations of C. We call an edge of C a forward edge if its orientation is the same as the
chosen orientation, otherwise it is called backward edge. Set x′e = ϵ for all forward edges and
x′e = −ϵ for all backward edges. For all edges e ∈ E(G) \ E(C), we set x′e = 0. Then x̃ + x′
and x̃ − x′ belong to the polytope (36) and x̃ = 12 ((x̃ + x′ ) + (x̃ − x′ )), so by Proposition 25, x̃
cannot be a vertex. Hence, we have a contradiction. 2
Proof: Since the polyhedron (36) is in fact a polytope, it is pointed, so there is an optimum
solution that is a vertex. Together with Lemma 40, this proves the statement. 2
61
Definition 18 Let (G, u, b, c) be an instance of the Minimum-Cost Flow Problem
where we assume that G is connected. A spanning tree structure is a quadruple
(r, T, L, U ) where r ∈ V (G), E(G) = T ∪˙ L ∪˙ U , |T | = |V (G)| − 1, and (V (G), T ) does
not contain any undirected cycle.
The b-flow f associated to the spanning tree structure (r, T, L, U ) is defined by
• f (e) = 0 for e ∈ L,
• f (e) = v∈Ce b(v) + e′ ∈U ∩δ− (Ce ) u(e′ ) − e′ ∈U ∩δ+ (Ce ) u(e′ ) for e ∈ T where we
P P P
G G
denote by Ce vertex set of the the connected component of (V (G), T \ {e}) containing
v (for e = (v, w)).
Let (r, T, L, U ) be a spanning tree structure and f the b-flow associated to it. The structure
(r, T, L, U ) is called feasible if 0 ≤ f (e) ≤ u(e) for all e ∈ T .
An edge (v, w) ∈ T is called downward if v is on the undirected r-w-path in T , otherwise
is is called upward.
A feasible spanning tree structure (r, T, L, U ) is called strongly feasible if 0 < f (e) for
every downward edge e ∈ T and f (e) < u(e) for every upward edge e ∈ T (where f is
again the b-flow associated to (r, T, L, U )).
We call the unique function π : V (G) → R with π(r) = 0 and cπ (e) := c(e)+π(v)−π(w) = 0
for all e = (v, w) ∈ T the potential associated to the spanning tree structure
(r, T, L, U ).
Remarks:
• Obviously, the b-flow associated to the spanning tree structure (r, T, L, U ) fulfills the flow
conservation rule, but it may be infeasible.
↔ ↔
• π(v) is the length of the r-v-path in (G, c ) consisting of edges of T and their reverse
edges, only.
• In a strongly feasible tree structure, we can send a positive flow from each vertex v to r
along tree edges such that that the new flow remains non-negative and fulfills the capacity
constraints.
62
Proof: Since the potential π just encodes the distances to r in T , a breadth-first search in
the edges of T and the reverse edges of T is sufficient.
We can compute f by scanning the vertices in an order of non-increasing distance to r in T . 2
Proposition 43 Let (r, T, L, U ) be a feasible spanning tree structure and π the potential
associated to it. If cπ (e) ≥ 0 for all e ∈ L and cπ (e) ≤ 0 for all e ∈ U , then the b-flow
associated to (r, T, L, U ) is optimum.
Proof: The flow associated to (r, T, L, U ) is a basic solution of the standard linear program-
ming formulation for the minimum-cost flow problem. The criterion in the proposition is
equivalent to the statement that the reduced costs of all non-basic variables are non-positive.
This is equivalent to the optimality of the solution. 2
↔ ←
For an edge e = (v, w) ∈ E(G) \ T with e ̸∈ T , we call e together with the w-v path consisting
of edges of T and reverse edges of edges of T only, the fundamental circuit of e. The vertex
closest to r in the fundamental circuit is called the peak of e.
Algorithm 2 gives a summary of the Network Simplex Algorithm. As an input, we need
a strongly feasible tree structure. However, even if there is a feasible b-flow, such a strongly
feasible tree structure may not exist. But we can modify the instance such that we can easily
find a strongly feasible tree structure (r, T, L, U ). We add artificial expensive edges between r
and all other nodes. For each sink v ∈ V (G) \ {r}, we add an edge (r, v) with u((r, v)) = −b(v).
For all other nodes v ∈ V (G) \ {r} we add an edges (v, r) with u((v, r)) = b(v) + 1. Then,
we get a strongly feasible spanning tree structure by setting L to the set of all old edges (i.e.
without the artificial edges connecting r) and by setting U = ∅. If the weight on the artificial
edges is high enough (1 + n maxe∈E(G) |c(e)| would be sufficient) and there is a solution that
does not use these edges at all, no optimum solution will send flow along these new edges, so
the new instance is equivalent.
Proof: It is easy to check that after the modification in the lines 7 to 14 f and π are still the
b-flow and the potential associated to (r, T, L, U ).
We will show that the spanning tree structure (r, T, L, U ) remains strongly feasible. By the
choice of γ in line 5 it remains feasible.
For an edge e = (v, w) on T let ẽ = (v, w) if e is an upward edge and ẽ = (w, v) if e is a
downward edge. We have to show that after an iteration of the algorithm, for all edges e ∈ T ,
the edge ẽ has a positive residual capacity. This is obvious for all edges outside C. For the edges
on the path on C from the head of e′ to the peak of C, this is also obvious because we augment
63
Algorithm 2: Network Simplex Algorithm
Input: An instance (G, u, b, c) of the Minimum-Cost Flow Problem and a strongly
feasible spanning tree structure (r, T, L, U ).
Output: A minimum-cost flow f .
1 Compute the b-flow f and the potential π associated to (r, T, L, U );
2 Let e0 be an edge with e0 ∈ L and cπ (e0 ) < 0 or an edge with e0 ∈ U and cπ (e0 ) > 0;
3 if No such edge exists then
return f
←
4 Let C be the fundamental circuit of e0 (if e0 ∈ L) or of e0 (if e0 ∈ U ) and let ρ = cπ (e0 );
5 Let γ = mine′ ∈E(C) uf (e′ ), and let e′ the last edge where this minimum is attained when
C is traversed (starting at the peak);
←
6 Let e1 be the corresponding edge in the input graph, i.e. e′ = e1 or e′ =e1 ;
7 Remove e0 from L or U ;
8 Set T = (T ∪ {e0 }) \ {e1 };
9 if e′ = e1 then
Set U = U ∪ {e1 };
10 else
Set L = L ∪ {e1 };
11 Augment f along γ by C;
12 Let X be the connected component of (V (G), T \ {e0 }) that contains r;
13 if e0 ∈ δ + (X) then
Set π(v) = π(v) + ρ for v ∈ V (G) \ X;
14 if e0 ∈ δ − (X) then
Set π(v) = π(v) − ρ for v ∈ V (G) \ X;
15 go to line 2;
by γ = uf (e′ ) which is smaller than the residual capacities on this path (by the choice of e′ ).
For the remaining edges e on C − e′ , the residual capacity uf (ẽ) is, after the augmentation, at
least γ. Thus, is if γ > 0, we are done. But if γ = 0, then e′ must be on the path from the peak
to e0 , so for the edges e on the path from the peak to the tail of e′ we had uf (ẽ) > 0 before
the augmentation (because (r, T, L, U ) was strongly feasible), so this is still the case after the
augmentation.
We will show that we never consider the same spanning tree structure twice. In each iteration,
the cost of the flow is reduced by γ|ρ|, so if γ > 0, then we are done. Hence assume that γ = 0.
− +
P e0 ̸= e1 (since all capacities are positive). Moreover, e0 ∈ L ∩ δ (X) or e0 ∈ U ∩ δ (X),
Then
so v∈V (G) π(v) will get larger (and it will never get smaller). This shows that we can never
get the same spanning tree structure twice.Since there is only a finite number of spanning tree
structures, this proves that the algorithm will terminate after a finite number of iterations.
By Proposition 43, the output of the algorithm is optimal when the algorithm terminates. 2
64
5 Sizes of Solutions
Before we will describe polynomial-time algorithms for solving linear programs we have to make
sure that we can store the output and all intermediate results with numbers whose sizes are
polynomial in the input size. To this end we have to define the size of numbers. Assuming that
all numbers are given in a binary representation, we define for
• r= p
q
with p, q ∈ Z, relatively prime: size(r) := size(p) + size(q),
Remark: In order to get a description of a fraction r of with size(r) bits, we have to write
r as pq for numbers p, q ∈ Z that are relatively prime. Therefore, in any computation, when a
fraction pq arises, we apply the Euclidean Algorithm to p and q and divide p and q by their
greatest common divisor. The Euclidean Algorithm has polynomial running time, so during
any algorithm, we can assume that any fraction r is stored by using just size(r) bits.
Proof: Both statements are obvious if the numbers r1 , . . . , rn are integers. Hence assume that
ri = pqii for non-zero numbers pi and qi that are relatively prime (i = 1, . . . , n).
n
n
n
n n n
Q Q Q P P P
(a) size ri ≤ size pi + size qi ≤ size(pi ) + size(qi ) = size(ri ).
i=1 i=1 i=1 i=1 i=1 i=1
!
n
Q n
P n
P n
P Q
(b) We have size qi ≤ size(qi ) ≤ size(ri ), and size pi qj ≤
i=1 i=1 i=1 i=1 j∈{1,...,n}\{i}
!
n n n n n
1
P Q P P P Q
size |pi | qj ≤ size(ri ). Since ri = n
Q pi qj , this proves the
i=1 j=1 i=1 i=1 qi i=1 j∈{1,...,n}\{i}
i=1
claim. 2
65
Proposition 46 For x, y ∈ Qn , we have
Proof:
(a) We have
n
X n
X n
X
size(x + y) = n + size(xi + yi ) ≤ n + 2 size(xi ) + 2 size(yi ) = 2(size(x) + size(y)) − 3n.
i=1 i=1 i=1
(b) We have
n
! n n n
!
X X X X
t
size(x y) = size xi yi ≤2 size(xi yi ) ≤ 2 size(xi ) + size(yi )
i=1 i=1 i=1 i=1
= 2(size(x) + size(y)) − 4n.
2
p
Proof: Write the entries aij of A as aij = qijij where pij and qij are relatively prime (i, j =
1, . . . , n). Let det(A) = pq where p and q are relatively prime, too.
Then |det(A)| ≤ ni=1 nj=1 (|pij | + 1) and |q| ≤ ni=1 nj=1 |qij |. Therefore,
Q Q Q Q
size(q) ≤ size(A)
Qn Qn
and |p| = |det(A)||q| ≤ i=1 j=1 (|pij | + 1)|qij | . We can conclude
n X
X n
size(p) ≤ (size(pij ) + 1 + size(qij )) = size(A).
i=1 j=1
66
Proof: By Corollary 20 the maximum of ct x over P = {x ∈ Rn | Ax ≤ b} must be attained in
a minimal face of P . Let F be a minimal face where the maximum is attained. By Proposition 23,
we can write F = {x ∈ Rn | Ãx = b̃} for some subsystem Ãx ≤ b̃ of Ax ≤ b. We can assume
that the rows of à are linearly independent. Choose B ⊆ {1, . . . , n} such that ÃB is a regular
square matrix. Then x ∈ Rn with xB = Ã−1 B b̃ and xN = 0 (with N = {1, . . . , n} \ B) is an
optimum solution of the linear program. By Cramer’s rule the entries of xB can be written
det(Ã )
as xj = det(Ã j ) where Ãj arises from ÃB by replacing the j-th column by b̃. Thus, we have
B
Proof: According to the proof of the previous proposition there is an optimum solution x
such that for each entry xj of x we have size(xj ) ≤ 4n(size(A) + size(b)). Since every positive
number smaller than 2−4n(size(A)+size(b)) has a size larger than 4n(size(A) + size(b)), this proves
the claim. 2
Assume that we want solve an equation system Ax = b. We can do this by applying the Gaussian
Elimination. This algorithm performs three kinds of operations to the matrix A:
It should be well-known (see e.g. textbooks Hougardy and Vygen [2018] or Korte and Vygen
[2018]) that with these steps O(mn(rank(A) + 1)) elementary arithmetical operations are
sufficient to transform A into an upper (right) triangular matrix. Then it is easy to check if the
equation system is feasible, and, in case that it is feasible, to compute a solution. However, in
order to show that Gaussian Elimination is a polynomial-time algorithm, we have to show that
the numbers that arise during the algorithm aren’t too big.
The intermediate matrices that occur during the algorithm are of the type
B C
, (37)
0 D
67
where B is an upper triangular matrix. Then, an elementary step of the Gaussian Elimination
consist of choosing a non-zero entry of D (called pivot element; if no such entry exists, we are
done) and to swap rows and/or columns such that this element is at position (1, 1) of D. Then
we add a multiple of the first row of D to the other rows of D such that the entry at position
(1, 1) is the only non-zero entry of the first column of D.
We want to prove that the numbers that occur during the algorithm can be encoded using a
polynomial number of bits. We can assume that we don’t need any swapping operation because
swapping columns or rows doesn’t change the numbers in the matrix.
B C
Assume that our current matrix is à = where B is a k × k-matrix. Then for each
0 D
entry dij of D we have
det(Ã1,...,k,k+i 1,...,k
1,...,k,k+j ) = dij · det(Ã1,...,k ). (38)
where Mji11,...,j
,...,it
t
denotes the submatrix of a matrix M induced by the rows i1 , . . . , it and the
columns j1 , . . . , jt . To see the correctness of (38), apply Laplace’s formula to the last row of
Ã1,...,k,k+i
1,...,k,k+j which contains dij as the only non-zero element. Since the determinant does not
change if we add the multiple of a row to another row, this leads to
det(A1,...,k,k+i
1,...,k,k+j )
dij =
det(A1,...,k
1,...,k )
By Proposition 47 and Proposition 45, this implies size(dij ) ≤ 4 size(A). Since all entries of the
matrix occur as entries of such a matrix D, this shows that the sizes of all numbers that are
considered during the Gaussian Elimination are bounded by 4size(A).
Note that we have to apply the Euclidean Algorithm to any intermediate result in order to
get small representations of the numbers. But this is not a problem because the Euclidean
Algorithm is polynomial as well.
Finally, we get the result:
In particular this result shows that the following problems can be solved with a polynomial
running time:
• Solving a system of linear equations.
• Computing the determinant of a matrix.
• Computing the rank of a matrix.
• Computing the inverse of a regular matrix.
• Checking if a set of rational vectors is linearly independent.
68
6 Ellipsoid Method
The Ellipsoid Method (proposed by Khachiyan [1979]) was the first polynomial-time algorithm
for linear programming. The algorithm solves the problem of finding a feasible solution of a
linear program. As we have seen in Section 2.4, this is sufficient to solve as well the optimization
problem.
E = {M x + s | x ∈ B n }
But (using the previous remark) this is equivalent to the statement that there is a positive
definite n × n-matrix Q and a vector s ∈ Rn such that E = {x ∈ Rn | (x − s)t Q−1 (x − s) ≤ 1}.
2
The Ellipsoid Algorithm just finds an element in an polytope or ends with the assertion
that the polytope is empty. On the other hand, it can be applied to more general sets K ⊆ Rn
69
provided that K is a compact convex set and that for any x ∈ Rn \ K we can find a half-space
containing K such that x is on the border of the half-space.
Basically, the algorithms works as follows: We always keep track of an ellipsoid containing K.
Then we check if the center c of the ellipsoid is contained in K. If this is the case, we are done.
Otherwise, we compute the intersection X of the ellipsoid and a half-space containing K such
that c is on the border of the half-space. Then, we find a new (smaller) ellipsoid containing X.
For the 1-dimensional space, the ellipsoid method contains the binary search as a special case.
However, for technical reasons, we assume in the following that the dimension of our solution
space is at least 2.
We start with a special case that is easier to handle: We assume that our given ellipsoid is
the ball B n (with radius 1 and center 0). We want to find a small ellipsoid E covering the
intersection of B n with the half-space {x ∈ Rn | x1 ≥ 0} (the gray area in Figure 7).
(0, 1)
B2
(c, 0) (1, 0)
E
(0, −1)
For symmetry reasons, we choose the center of the new smaller ellipsoid on the vector e1 at a
position c · e1 (where c is still to be determined). Our candidates for the ellipsoid are of the form
( n
)
X
E = x ∈ Rn | α2 (x1 − c)2 + β 2 x2i ≤ 1
i=2
1
where we also have to choose α and β. The matrix Q is then a diagonal matrix with entry α2
at position (1, 1) and β12 on all other diagonal positions.
To keep E small, we want e1 to lie on the border of E. This condition leads to α2 (1 − c)2 = 1
and hence
1
α2 = . (39)
(1 − c)2
70
be on the border of E. This condition leads to α2 c2 + β 2 = 1 and thus
c2 1 − 2c
β 2 = 1 − α 2 c2 = 1 − 2
= . (40)
(1 − c) (1 − c)2
p
The volume of an ellipsoid E = {x ∈ Rn | (x−s)t Q−1 (x−s) ≤ 1} is vol(E) = det(Q)×vol(B n )
(a result from measure theory, see e.g. Proposition 6.1.2 in Cohn [1980]).
p
Therefore, our goal is to choose α, β and c in such a way that det(Q) = α−1 β −(n−1) is
minimized.
(1−c)2n
Thus, we want to find a c minimizing (1−2c)n−1
.
2n 2n 2n−1
d (1−c)
We have dc (1−2c)n−1
= 2(n−1)(1−c)
(1−2c)n
− 2n(1−c)
(1−2c)n−1
which is zero if 2(n−1)(1−c)
1−2c
= 2n. This leads to
2(n − 1) − 2c(n − 1) = 2n − 4cn and c(2n − (n − 1)) = 1. Thus, we minimize the volume by
1
setting c = n+1 .
(n+1)2 n2 −1
Then, α2 = n2
and β 2 = n2
.
1 + x ≤ ex for any x ∈ R. 2
71
Lemma 53 (Half-Ellipsoid Lemma) Let E = p + {x ∈ Rn | xt Q−1 x ≤ 1} be an ellipsoid
and a ∈ Rn with at Qa = 1. Then,
2
n t t 1 ′ n n −1 t −1 2 t
E ∩{x ∈ R | a x ≥ a p} ⊆ E = p+ Qa+ x ∈ R | x Q + aa x ≤ 1 .
n+1 n2 n−1
1
vol(E ′ )
Moreover, vol(E)
≤ e− 2(n+1) .
E ∩ {x ∈ Rn | at x ≥ at p}
= (p + M B n ) ∩ {x ∈ Rn | at x ≥ at p}
= p + (M B n ∩ {x ∈ Rn | at (x + p) ≥ at p})
= p + (M B n ∩ {x ∈ Rn | at x ≥ 0})
= p + M (B n ∩ M −1 {x ∈ Rn | at x ≥ 0})
= p + M (B n ∩ {x ∈ Rn | at M x ≥ 0})
= p + M (B n ∩ {x ∈ Rn | et1 x ≥ 0})
2
1 n n −1 t 2
Lem. 52
t
⊆ p+ M e1 + M x ∈ R | x In + e1 e x ≤ 1
n+1 n2 n−1 1
2
1 n n −1 −1 t 2 t −1
= p+ M e1 + x ∈ R | (M x) In + e1 e M x ≤ 1
n+1 n2 n−1 1
2
1 n n −1 t −1 2 t
= p+ Qa + x ∈ R | x Q + aa x ≤ 1
n+1 n2 n−1
n o
We can write the ellipsoid E ′ in standard form as E ′ = p + 1
n+1
Qa + x ∈ Rn | xt Q̃−1 x ≤ 1
2
with Q̃ = n2n−1 Q − n+1
2
Qaat Qt because
n2 − 1 n2
2 −1 t 2 t t
Q + aa Q− Qaa Q
n2 n−1 n2 − 1 n+1
2 2 4
= In − aat Qt + aat Q − 2 a at Qa at Qt
n+1 n−1 n − 1 | {z }
=1
= In .
q
vol(E ′ )
Therefore, vol(E) = det( Q̃)
det(Q)
.
2 n n
n2 n2
We have det( Q̃) n 2 2
det(Q)
= det n2 −1
In − n+1
aat Qt = n2 −1
det In − n+1
aat Qt = n2 −1
(1 −
2
n+1
). To see the last equality note that the matrix aat Qt has eigenvalue 1 for the eigenvector
72
a (because at Qt a = 1) while all other eigenvalues are zero (the rank of aat Qt is 1).q Since the
determinant is the product of all eigenvalues, this implies the last equation. Hence, det( Q̃)
det(Q)
≤
2 n2 n−1
1
≤ e− 2(n+1) (see the proof of the Half-Ball Lemma for
n 2 1 n n2 2
n2 −1
(1 − n+1 ) 2 = n+1 n2 −1
details of the last steps). 2
n o 2
Remark: The ellipsoid E ′ = p+ n+1 1
Qa+ x ∈ Rn | xt Q̃−1 x ≤ 1 with Q̃ = n2n−1 Q − n+12
Qaat Qt
is called Löwner-John ellipsoid. It is in fact the smallest ellipsoid containing E ∩ {x ∈ Rn |
at x ≥ at p}.
A separation oracle for a convex set K ⊆ Rn is a black-box algorithm which, given x ∈ Rn ,
either returns an a ∈ Rn with at y > at x for all y ∈ K or asserts x ∈ K.
Observation: Given A ∈ Qm×n and b ∈ Qm , a separation oracle for {x ∈ Rn | Ax ≤ b} can be
implemented in O(mn) arithmetical operations.
Proof: As an invariant, we will prove that during the k-th iteration of the algorithm, the
set K is contained in the set pk + {x ∈ Rn | xt A−1k x ≤ 1}. For k = 0, this is true because R is
big enough. For the step from k to k + 1, we apply the Half-Ellipsoid Lemma (Lemma 53) to
t
Q = Ak and a = √ āt (this scaling leads to at Ak a = āāt A k ā
Ak ā
= 1).
ā Ak ā
73
We have vol({x ∈ Rn | xt x ≤ R2 }) ≤ vol([−R, R]n ) = 2n Rn , and in each iteration, the
1
− 2(n+1)
volume of Ek = {x ∈ Rn | xt A−1
k x ≤ 1} is reduced at least by the factor e , so we get
k
− 2(n+1)
vol(Ek ) ≤ e 2n Rn .
k
Thus, we have to find a smallest k such that e− 2(n+1) 2n Rn ≤ ϵ which is equivalent to 2(n+1)k
≥
2n Rn 1 1
ln ϵ and k ≥ 2(n + 1)(n ln(2R) + ln( ϵ )). This shows that O(n(n ln(R) + ln( ϵ ))) iterations
are sufficient. 2
We cannot compute square roots exactly, so during the algorithm, we have to work with rounded
intermediate solutions. Let pek and Ãk be the exact values and pk and Ak be the rounded values
(and the same for the corresponding ellipsoids Ẽk and Ek ). Note that pek and Ãk are based on
the rounded values pk−1 and Ak−1 .
Let δ be an upper bound on the maximum absolute rounding error for the entries in pek and
Ãk , so ∥pk − pek ∥∞ ≤ δ and ∥Ak − Ãk ∥∞ ≤ δ. So δ (that will be defined later) describes the
precision of the rounding. When we round the entries in Ãk , we do it in such a way that the
matrix remains symmetric. Let Γk = Ak − Ãk and ∆k = pk − pek .
In the following, we write by ∥∥˙ the Euclidean norm for vectors and the induced operator norm
for the matrices. When considering matrices, we often make use of the fact that the Frobenius
norm is an upper bound for the operator norm induced by the Euclidean norm.
−1
For any x ∈ K we can assume that (x − pek )t Ãk (x − pek ) ≤ 1 and we want to prove the same
for pk and Ak . To this end, we have to increase the ellipsoid slightly by scaling Ãk .
−1 −1
We have (x − pk )t A−1 t
k (x − pk ) = (x − pk ) Ãk (x − pk ) + (x − pk )t (A−1
k − Ãk )(x − pk ). We
analyze the two summands separately:
−1 −1 −1 −1
(x − pk )t Ãk (x − pk ) = (x − p˜k )t Ãk (x − p˜k ) + |2∆tk Ãk (x − p˜k )| + ∆tk Ãk ∆k
−1 −1
≤ 1 + 2∥∆k ∥ · ∥Ãk ∥ (R + ∥p˜k ∥) + ∥∆k ∥2 · ∥Ãk ∥ (41)
√ −1 −1
≤ 1 + 2 nδ ∥Ãk ∥ (R + ∥p˜k ∥) + nδ 2 ∥Ãk ∥.
And:
−1 −1
(x − pk )t (A−1
k − Ãk )(x − pk ) ≤ ∥x − pk ∥2 · ∥A−1
k − Ãk ∥
−1 −1
≤ (R + ∥pk ∥)2 ∥Ak (Ak − Ãk )Ãk ∥
−1 (42)
≤ (R + ∥pk ∥)2 ∥A−1
k ∥ · ∥Ãk ∥ · ∥Γk ∥
2 −1 −1
≤ (R + ∥pk ∥) ∥Ak ∥ · ∥Ãk ∥ · nδ
1
We adjust Ãk by multiplying it by µ = 1 + 2n(n+1)
, so we replace Ãk by µÃk (which we call Ãk
again). Then
−1 1 2n(n + 1) 1
(x − p̃k )t Ãk (x − p̃k ) = 1 = < 1 − . (43)
1+ 2n(n+1)
2n2 + 2n + 1 4n2
74
and (E
]k+1 also refers to the scaled version of Ãk ):
n2
vol(E
] k+1 ) 1
− 2(n+1) 1 1 1 1
≤e 1+ ≤ e− 2(n+1) e 4(n+1) = e− 4(n+1) . (44)
vol(Ek ) 2n(n + 1)
Thus,
q
vol(Ek+1 ) vol(E
]k+1 ) vol(Ek+1 ) 1 −1
= ≤ e− 4(n+1) det(Ak+1 A
]k+1 ) (45)
vol(Ek ) vol(Ek ) vol(E]k+1 )
We have
−1 −1
det(Ak+1 A
] k+1 ) = det In + (Ak+1 − A
] k+1 )A]k+1
(∗) −1
n
≤ ∥In + (Ak+1 − A ] k+1 )Ak+1 ∥
]
−1
n
≤ (1 + ∥Γk+1 ∥ · ∥A
] k+1 ∥)
−1
n
≤ (1 + nδ∥A
]k+1 ∥)
−1
2 δ∥A ∥
≤ en
^ k+1 ,
Qn
where inequality (∗) follows from Hadamard’s inequality (| det(A)| ≤ i=1 ∥ai ∥ for an n × n-
matrix with columns a1 , . . . , an , see exercises).
This implies
vol(Ek+1 ) 1 1 2
−1
≤ e− 4(n+1) · e 2 n δ∥Ak+1 ∥ .
^
vol(Ek )
−1 1
Hence, if we had 12 δ∥A
] k+1 ∥ <
1
8(n+1)3
, then we had vol(Ek+1 )
vol(Ek )
< e− 8(n+1) .
Therefore, and by equations (41) and (42) our goal is to choose δ such that we get the following
inequalities:
√ fk −1 ∥ (R + ∥p˜k ∥) + nδ 2 ∥A
fk −1 ∥ + (R + ∥pk ∥)2 ∥A−1 ∥ · ∥A
fk −1 ∥nδ ≤
• 2 nδ ∥A k
1
4n2
−1
• δ∥A
]k+1 ∥ ≤
1
4(n+1)3
1
Proposition 55 Assume that δ is chosen such that δ ≤ 12n4k
in iteration k of the
Ellipsoid Method. Then, we have:
(c) ∥Ak ∥ ≤ R2 2k , ∥A
fk ∥ ≤ R2 2k .
75
Proof: We have
n2 − 1 1 āāt
−1 2
A
]k+1 = A−1
k + .
n2 µ n − 1 āt Ak ā
−1
Thus, as a sum of a positive definite matrix and a positive semidefinite matrix A
]k+1 is positive
n2 2 t
definite. Therefore Ak+1 = n2 −1 µ(Ak − n+1 bk bk ) is positive definite.
]
Thus,
n2 − 1 1 āāt
−1 2
∥A
]k+1 ∥≤ ∥A−1
k ∥+ ∥ t ∥ ≤ 3∥Ak −1 ∥
n2 µ n − 1 ā Ak ā
Let λ be a smallest eigenvalue of Ak+1 and v a vector with ∥v∥ = 1 such that λ = v t Ak+1 v.
Then:
v t Ak+1 v ≥ v t A
]k+1 v − nδ
≥ min{ut A ] n
k+1 u | u ∈ R , ∥u∥ = 1} − nδ
1
≥ −1 − nδ
∥Ak+1 ∥
]
1
≥ − nδ
3∥Ak −1 ∥
1 1
≥ − nδ
3 R−2 4k
1
≥ ,
R−2 4k+1
provided that:
R2
1 1
nδ ≤ − . (46)
3 4 4k
n2
∥Ak+1 ∥ ≤ ∥A
] k+1 ∥ + ∥Γk+1 ∥ ≤ 2−1
µ ∥Ak ∥ + nδ ≤ R2 2k+1
n
| {z }
≤ 32
n2
We also get ∥A
]k+1 ∥ ≤ n2 −1
µ∥Ak ∥ ≤ R2 2k+1 , so we have proved (c).
76
We can write Ak = M M t with a regular matrix M . Then,
r s
∥Ak ā∥ t
ā Ak Ak ā (M t ā)t Ak (M t ā) p k
∥bk ∥ = √ t = t
= t t t
≤ ∥Ak ∥ ≤ R2 2 , (47)
ā Ak ā ā Ak ā (M ā )(M ā)
where the first inequality follows from the fact that ∥Ak ∥ = max{xT Ak x | ∥x∥ = 1} because Ak
is positive semidefinite (see exercises).
Therefore, we get by induction (using the fact that p0 = 0)
1 √ k √ k 1
∥pk+1 ∥ ≤ ∥pk ∥ + ∥bk ∥ + nδ ≤ ∥pk ∥ + R2 2 + nδ ≤ R2k + R2 2 + √ k ≤ R2k+1 .
n+1 3 n4
−1
Lemma 56 Let δ be positive with δ < 26(N (R,ϵ)+1) 16n3 where N (R, ϵ) :=
1
⌈8(n + 1)(n ln(2R) + ln( ϵ ))⌉. Then, in iteration k of the Ellipsoid Algorithm, we
k
have K ⊆ pk + Ek and vol(Ek ) < e− 8(n+1) 2n Rn .
1 1
Proof: By the choice of δ, we have nδ ≤ 12 4k
.
Moreover,
√ fk −1 ∥ (R + ∥p˜k ∥) + nδ 2 ∥A
fk −1 ∥ +(R + ∥pk ∥)2 ∥A−1 ∥ · ∥A
fk −1 ∥ nδ ≤ δn26k ≤
• 2 nδ ∥A k
1
4n2
| {z } |{z} | {z } |{z} | {z } | {z }
≤R−2 4k ≤R2k ≤R−2 4k ≤R2k ≤R−2 4k ≤R−2 4k
77
−1
• δ ∥A
] ∥≤ 1
| k+1
{z } 4(n+1)3
≤R−2 4k
Hence, by the above analysis, Ek (with rounded numbers) always contains the set K, and
1
is reduced at least by a factor of e− 8(n+1) in each iteration, so after
the volume of Ek
O n n ln R + ln 1ϵ iterations, the algorithm terminates with a correct output. 2
There number of calls of the separation oracle can be reduced to O(n ln( nR
ϵ
)) (see Lee, Sidford,
and Wong [2015] for an algorithm that only needs O(n ln( ϵ )) oracle calls and O(n3 lnO(1) ( nR
nR
ϵ
))
additional time).
We first want to use the Ellipsoid Algorithm just to check if a given polyhedron P is empty.
This can be done directly, provided that P is in fact a polytope and if we have the assertion
that if P is non-empty, its volume cannot be arbitrarily small. The following proposition implies
that we can assume these properties:
(a) P = ∅ ⇔ PR,ϵ = ∅.
2ϵ
n
(b) If P ̸= ∅, then vol(PR,ϵ ) ≥ n2size(A)
.
Proof:
78
y t A = 0 and y t b = −1. Then, by Proposition 48
min 11t y
At y = 0
bt y = −1
y ≥ 0
has an optimum solution y such that the absolute value of any entry of y is at most
24n(size(A)+size(b)) . Thus, y t (b + ϵ11) < −1 + (n + 1)24n(size(A)+size(b)) ϵ < 0. Again by Farkas’
Lemma, this implies that Ax ≤ b + ϵ11 does not have a feasible solution. In particular,
there is no feasible solution in [−R, R]n , so PR,ϵ = ∅.
(b) If P ̸= ∅, then PR−1,0 ̸= ∅ (with the same proof as in (a) for R). But for any z ∈ PR−1,0 , we
ϵ
have {x ∈ Rn | ||x − z||∞ < n2size(A) } ⊆ PR,ϵ . Hence vol(PR,ϵ ) ≥ vol({x ∈ Rn | ||x − z||∞ <
n
ϵ 2ϵ
2
n2size(A)
}) = n2size(A) .
√
Proof: We can apply the Ellipsoid Algorithm to K = PR,ϵ with R = ⌈ n(1+24n(size(A)+size(b)) )⌉
n −1
and ϵ′ = n2size(A)
2ϵ
(for ϵ = 2n24n(size(A)+size(b)) as a lower bound for the volume). We need
N (R, ϵ′ ) = O(n(n ln(R) + ln( ϵ1′ ))) iterations, which is polynomial in the input size.
Moreover, it is sufficient to set the bound on the absolute rounding error to any value δ <
′ −1
26(N (R,ϵ )+1) 16n3 , so also the number of bits that we have to compute during the algorithm
is polynomial. 2
Proof: By Theorem 59, we can check in polynomial time if a given linear program has a
feasible solution. We will show that this is sufficient for computing a feasible solution if one exists.
Assume that we are given m inequalities ati x ≤ bi with ai ∈ Qn and bi ∈ Q (i ∈ {1, . . . , m}). First
check if the system is feasible. If it infeasible, we are done. Otherwise, perform for i = 1, . . . , m
the following steps: Check if the system remains feasible if we replace ati x ≤ bi by ati x = bi . If
this is the case, replace ati x ≤ bi by ati x = bi . Otherwise, the inequality is redundant, and we
can skip it. We end up with a feasible system of equations with the property that any solution
of this system of equations is also a solution of the given system of inequalities. However, the
system of equations can be solved in polynomial time by using Gaussian Elimination (see
79
Section 5.1). Hence, for any linear program, we can compute in polynomial-time a feasible
solution if one exists.
In Section 2.4 we have seen that the task of computing an optimum solution for a bounded
feasible linear program can be reduced to the computation of a feasible solution of a modified
linear program (see the LP (24)). Thus, we can also compute an optimum solution. 2
Remark: By Proposition 23, the method described in the previous proof computes a solution
in a minimal face of the solution polyhedron P . In particular, if P is pointed, we compute a
vertex of P .
An advantage of the Ellipsoid Algorithm is that it does not necessarily need a complete
description of a solution space K ⊆ Rn but only needs a separation oracle that provides a linear
inequality satisfied by all elements of K but not by a given vector x ∈ Rn \ K. This allows us
to use the method e.g. for linear program with an exponential number of constraints.
Example: Consider the Maximum-Matching Problem. A matching in an undirected
graph is a set M ⊆ E(G) such that |δG (v) ∩ M | ≤ 1 for all v ∈ V (G). In the Maximum-
Matching Problem we are given an undirected graph G and ask for a matching with
maximum cardinality. It can be formulated as the following integer linear program:
P
max x
P e∈E(G) e
e∈δG (v) xe ≤ 1 v ∈ V (G)
xe ∈ {0, 1} e ∈ E(G)
In the LP-relaxation, we simply replace the constraint “xe ∈ {0, 1}” by “xe ≥ 0”. However, this
allows us e.g. in the graph K3 (i.e. the complete graph on three vertices) to set all values xe to
1
2
. To avoid such solutions, we may add the following constraints:
P |U |−1
e∈E(G[U ]) xe ≤ 2
U ⊆ V (G), |U | odd
are indeed the convex combinations of the solutions of the ILP formulation. In other words, the
vertices of the solution polyhedron of the LP are the integer solutions. We won’t prove this
statement here, see Edmonds [1965] for a proof. Hence, solving the linear program would be
sufficient to solve the matching problem. The number of constraints is exponential in the size
of the graph, but the good news is that there is a separation oracle with polynomial running
80
time for this linear program (see Padberg and Rao [1982]). We will see how such a separation
oracle can be used for solving the optimization problem.
In the remainder of this chapter, we always consider closed convex sets K for which numbers r
and R with 0 < r < R2 exist such that rB n ⊆ K ⊆ RB n . We call sets for which such numbers r
and R exist, r-R-sandwiched sets.
We will consider relaxed versions both of linear optimization problems and of separation
problems. In the weak optimization problem we are given a set K ⊆ Rn , a number ϵ > 0
and a vector c ∈ Qn . The task is to find an x ∈ K with ct x ≥ max{ct z | z ∈ K} − ϵ.
In order to apply the Ellipsoid Algorithm directly to an optimization problem, we need the
property that the set of almost optimum solutions cannot have an arbitrarily small volume.
The following lemma guarantees this for r-R-sandwiched sets:
n−1
ϵ 1
rn−1 vol(Bn−1 )
2 ct z
ϵ n−1 ϵ 1
′ n−1
vol(conv(A ∪ {z})) ≥ r vol(Bn−1 )
2ct z 2∥c∥ n
n−1
ϵ 1 ϵ 1
≥ rn−1 n .
2∥c∥R| n 2∥c∥ n
Here we use the fact that conv(A′ ∪ {z}) is an n-dimensional pyramid with height at least ϵ
2∥c∥
n−1 n−1
and a base of ((n − 1)-dimensional) volume 2cϵt z r vol(Bn−1 ). 2
This result allows us to find a polynomial-time algorithm for the weak optimization problem
provided that we can solve the corresponding separation problem efficiently:
81
Proposition 62 Given a polynomial-time separation oracle for an r-R-sandwiched
convex set K ⊆ Rn with running time polynomial in size(R), size(r) and size(x)
(where x is the input vector for the oracle), a number ϵ > 0 and a vector c, there is a
polynomial-time algorithm (w.r.t. size(R), size(r), size(c) and size(ϵ)) that computes a
vector v ∈ K with ct v ≥ sup{ct x | x ∈ K} − ϵ.
Proof: Apply the Ellipsoid Algorithm to find an almost optimum vector in K. Use the
previous lemma that shows that the set of almost optimum vectors in K cannot be arbitrarily
small. 2
A weak separation oracle for a convex set K ⊆ Rn is an algorithm which, given x ∈ Rn
and η with 0 < η < 21 , either asserts x ∈ K or finds v ∈ Rn with v t z ≤ 1 for all z ∈ K and
v t x ≥ 1 − η.
Remark: For the previous proposition, it would be enough to have a weak separation oracle
for K.
Notation: For K ⊆ Rn , we define K ∗ := {y ∈ Rn | y t x ≤ 1 for all x ∈ K}.
Proof: Claim: K ∗∗ = K
Proof of the claim: For x ∈ K, we have y t x ≤ 1 for all y ∈ K ∗ which implies x ∈ K ∗∗ . Therefore,
we have K ⊆ K ∗∗ .
Now let z ∈ Rn \ K. And let w ∈ K be a vector such that ∥z − w∥2 is smallest possible over
vectors in K (w exists because K is convex and closed). Let u = z − w. Then, for all x ∈ K, we
have ut x ≤ ut w < ut z. Moreover, since 0 ∈ K, we have ut w ≥ 0. By scaling u, we can assume
that ut z > 1 while ut x ≤ 1 for all x ∈ K. But then u ∈ K ∗ and ut z > 1 which implies z ̸∈ K ∗∗ .
Thus K ∗∗ ⊆ K. This prove the claim.
Now, let x ∈ Rn be an instance for the weak separation oracle. If x = 0, we can assert x ∈ K,
x
and if ∥x∥ > R we can choose v = ∥x∥ 2 . Therefore, we can assume that 0 < ∥x∥ ≤ R.
We can solve the (strong) separation problem for K ∗ (see the exercises). Since K ∗ is a closed
convex R1 - 1r -sandwiched set, we can apply Proposition 62 to it, and thus, we can solve the weak
η
optimization problem for K ∗ with c = ∥x∥ x
2 and ϵ = R in polynomial time. Thus, we get a vector
xt t xt
v0 ∈ K ∗ with v
∥x∥ 0
x
≥ max{ ∥x∥ v | v ∈ K ∗ } − Rη . If v
∥x∥ 0
≥ 1
∥x∥
− Rη , then v0t x ≥ 1 − η ∥x∥
R
≥ 1 − η,
t
and v0t z ≤ 1 for all z ∈ K (since v0 ∈ K ∗ ). Otherwise max{ ∥x∥
x
v | v ∈ K ∗} ≤ 1
∥x∥
, so
82
max{xt v | v ∈ K ∗ } ≤ 1, which implies x ∈ K ∗∗ . Together with the above claim, this implies
x ∈ K. Therefore, we have a weak separation oracle for K in polynomial running time. 2
It turns out that for rational r-R-sandwiched polyhedra P an exact polynomial-time separati-
on algorithm also provides an exact polynomial-time optimization algorithm, provided that
appropriate bounds on the sizes of the vertices of P are given:
83
84
7 Interior Point Methods
The Ellipsoid Algorithm gives a polynomial-time algorithm for solving linear programs but
in practice it is typically much less efficient than the Simplex Algorithm. In contrast, the
algorithm that we will describe in this section is efficient both in theory and practice.
The term “interior point method” refers to several quite different algorithms. They all have in
common that during the algorithm we always consider vectors in the interior of the polyhedron
of feasible solutions (in contrast to the Simplex Algorithm where we always have vectors
on the border of the polyhedron). Here, we restrict ourselves to one variant and follow the
description by Mehlhorn and Saxena [2015]. The first version of the algorithm has been proposed
by Karmakar [1984].
We consider an LP max{ct x | Ax ≤ b} in standard inequality form.
To simplify the notation, we write the slack variables s explicitly, so we consider the following
problem:
max ct x
s.t. Ax + s = b (48)
s ≥ 0
min bt y
s.t. At y = c (49)
y ≥ 0
Ax + s = b
At y = c
yts = 0 (50)
y ≥ 0
s ≥ 0
Note that y t s = 0 is not a linear constraint. Without this constraint (i.e. for the system
Ax + s = b, At y = c, y ≥ 0, s ≥ 0), the term y t s is exactly the difference between the
85
(dual) value of the dual solution y and the (primal) value of the primal solution x, s because
bt y − ct x = xt At y + st y − ct x = xt c + st y − ct x = st y.
The system (50) has a solution only if both the primal and the dual linear program are feasible
and bounded, so for the moment we assume that this is the case. In Section 7.1, we will see
what to do to enforce these properties.
In the interior point methods, one generally considers vectors in the interior of the solution
space. In the system (50), the only inequalities are y ≥ 0 and s ≥ 0, so during the algorithm,
we always have solutions x, s, y with y > 0 and s > 0. We will replace the condition y t s = 0 by
2
the condition σ 2 := m yi s i
− 1 ≤ 14 for some number µ > 0. During the iterations of the
P
i=1 µ
algorithm, we will decrease µ more and more towards 0.
To summarize, during the algorithm, we have a number µ > 0 and vectors x, s, y meeting the
following invariants
Ax + s = b
At y = c
Pm yi si 2
1 (51)
i=1 µ
−1 ≤ 4
y > 0
s > 0
(II) Reduce µ by a constant factor and adapt x, y and s to this new value of µ such that we
again get a solution of (51). Iterate this step until µ is small enough (Section 7.2).
We will show how we can modify (51) to an equivalent problem that can be solved easily,
provided that we are allowed to choose µ. This modification will in particular make both the
primal and the dual LP feasible. This is equivalent to the statement that one of them is feasible
and bounded. We will show how to modify the dual LP (49) such that the modified version is
feasible and bounded.
In a first step, we make the LP (49) bounded (in such a way that we do not change the
problem if the given LP was bounded). By Theorem 48, we know that if (49) is feasible and
bounded, then there is a W with W ∈ 2Θ(m(size(A)+size(c))) such that there is an optimum solution
y = (y1 , . . . , ym ) ≥ 0 with yi ≤ W (i = 1, . . . , m). So in this case there is a vector y ≥ 0 with
11t y ≤ mW and At y = c. Equivalently (after dividing everything by W ), we can ask for a vector
y ≥ 0 with 11t y ≤ m and At y = W1 c. By relaxing the constraint 11t y ≤ m to 11t y ≤ m + 1 and
86
by adding a slack variable ym+1 ≥ 0 this leads to the following LP which is equivalent to (49)
provided that (49) is bounded:
min bt y
s.t. At y = W1 c
11t y + ym+1 = m+1 (52)
y ≥ 0
ym+1 ≥ 0
In a second step, we will make the LP feasible. To this end, we add a new variable ym+2
such that setting all variables to 1 will get us a feasible solution. Let H be a constant (to be
determined later). Then, we state the following LP:
min bt y + Hy
m+2
1
t
s.t. A y + W
c − A 11 ym+2 = W1 c
t
t
11 y + ym+1 + ym+2 = m + 2
(53)
y ≥ 0
ym+1 ≥ 0
ym+2 ≥ 0
The goal is to choose H that big that if this LP has a feasible solution with ym+2 = 0 at all,
then in any optimum solution ym+2 = 0 will hold. In fact, by Corollary 49 we know that there
is a constant l such that if there is an optimum solution of (53) with ym+2 > 0, then there is an
optimum solution with ym+2 ≥ 2−4ml(size(A)+size(c)+size(W )) . On the other hand, bt y ≤ ∥b∥1 (m + 2)
in any feasible solution of (53), so if we set H = (∥b∥1 (m + 2) + 1)24ml(size(A)+size(c)+size(W )) , then
we enforce that ym+2 = 0 in any optimum solution (if a solution with ym+2 = 0 exists).
Proof: Setting y = 11, ym+1 = 1 and ym+2 = 1 gives a feasible solution, and due to the
constraint 11y + ym+1 + ym+2 = m + 2 and the non-negativity constraints the LP is bounded. 2
In addition, we can use an optimum solution of (53) to check if the initial dual LP was feasible
and/or bounded, and if it is feasible and bounded, we can find an optimum solution of it:
Let y1 , . . . , ym+2 be an optimum solution of (53) where all non-zero entries have an absolute
values of at least 2−4ml(size(A)+size(c)+size(W )) . If ym+2 > 0, then we know that (52) has no feasible
solution (otherwise there was a feasible solution of (53) with ym+2 = 0 which is cheaper). Thus,
the LP (49) has no feasible solution either. On the other hand, if ym+2 = 0, then the initial dual
LP must be feasible. Assume that this is the case, then we still have to check if the initial dual
LP was bounded. If ym+1 > 0, the initial dual program must be bounded. If ym+1 = 0, then the
initial dual LP can be bounded or unbounded. To decide if it is bounded, we can replace c by
the all-zero vector and first solve this new problem. Then, by Farkas’ Lemma, the LP (49) is
bounded if and only if the value of an optimum solution of the new problem is non-negative.
87
If we dualize the LP (53), we get the following LP (with variables x ∈ Rn , s ∈ Rm and additional
variables xn+1 , sm+1 , and sm+2 ):
max W1 ct x + (m + 2)xn+1
Ax + xn+1 11 + s = b
1 t t
W
c − 11 A x + xn+1 + sm+2 = H
xn+1 + sm+1 = 0 (54)
s ≥ 0
sm+1 ≥ 0
sm+2 ≥ 0
Instead of the primal-dual pair (48) and (49), we will consider the pair (53) and (54). Due to
the modification, both LPs are feasible and bounded.
For the new pair
2 of LPs we can easily find feasible solutions and a number µ such that
Pm+2 yi si
i=1 µ
− 1 ≤ 41 : We set y1 = y2 = · · · = ym = ym+1 = ym+2 = 1 which is obviously
µ
feasible for (53). For (54), we set x1 = x2 = · · · = xn = 0. Moreover, we choose sm+1 = ym+1 =µ
(where µ itself is still to be determined). This leads to xn+1 = −µ, sm+2 = H + µ, and
si = bi − xn+1 = bi + µ (i = 1, . . . m).
As a consequence of this choice, we get:
y i si bi
−1 = i = 1, . . . , m
µ µ
ym+1 sm+1
−1 = 0
µ
ym+2 sm+2 H
−1 =
µ µ
Therefore, !
m+2
X 2 m
yi si 1 X
σ2 = −1 = 2 H2 + b2i .
i=1
µ µ i=1
p Pm 2 2 1
Hence, by choosing µ = 2 H 2 + i=1 bi , we enforce σ ≤ 4
. Moreover, since µ > |bi |, we have
si = bi + µ > 0 for i ∈ {1, . . . , m}.
So what did we get so far? We have replaced the primal-dual pair (48) and (49) by the pair (53)
and (54) such that optimum solutions of these modified problems directly lead to a solution of
the original problem. Moreover, the new primal-dual pair consists of two feasible and bounded
problems.
We will write (53) as
min b̃t y
s.t. Ãt y = c̃ (55)
y ≥ 0
88
and (54) as
max c̃t x
s.t. Ãx + s = b̃ (56)
s ≥ 0
Ãx + s = b̃
Ãt y = c̃
Pm+2 yi si 2
i=1 µ
− 1 ≤ 14 (57)
y > 0
s > 0
In this section, we will describe a solution for the following problem: Given a solution
µ(k) , x(k) , y (k) , s(k) of (57) we want to compute a new solution µ(k+1) , x(k+1) , y (k+1) , s(k+1) of
(57) where µ(k+1) = (1 − δ)µ(k) for some δ that does not depend on the solution (to be
determined later).
In a first version, we describe the step without considering the sizes of the numbers that occur
during the computation. Afterwards, we will show how we can round intermediate solutions in
such a way that the numbers can be written with a polynomial number of bits.
We write x(k+1) = x(k) + f , y (k+1) = y (k) + g, and s(k+1) = s(k) + h. Think of the entries of f , g
and h as relatively small values. Assuming that µ(k+1) is fixed, we describe how to compute
appropriate values for f , g and h. The first two conditions of (57) lead to Ãf + h = 0 and
(k) (k)
Ãt g = 0. In addition we want to choose f and h such that (yi + gi )(si + hi ) is close to µ(k+1)
(k) (k) (k) (k) (k) (k)
(i = 1 . . . , m + 2). Since (yi + gi )(si + hi ) = yi si + gi si + yi hi + gi hi and the product gi hi
(k) (k) (k) (k)
is small (provided that gi and hi are small) we simply demand yi si + gi si + yi hi = µ(k+1)
(i = 1 . . . , m + 2). Hence, we want to compute f , g and h such that
Ãt g = 0
Ãf + h = 0 (58)
(k) (k) (k) (k)
si gi + yi hi = µ(k+1) − yi si i = 1, . . . , m + 2
Note that y (k) and s(k) are constant in this context. In this formulation, we skipped the
constraints that y (k+1) > 0 and s(k+1) > 0. We will see what we can do to get positive values,
anyway.
89
Let f , g and h be a solution of (58). By construction, we have
g t h = −g t Ãf = 0t f = 0. (60)
This implies
t
b̃t y (k+1) − c̃t x(k+1) = Ã(x(k) + f ) + (s(k) + h) (y (k) + g) − c̃t (x(k) + f )
t
= Ã(x(k) + f ) (y (k) + g) + (m + 2)µ(k+1) − c̃t (x(k) + f ) (61)
t
= x(k) + f Ãt y (k) + (m + 2)µ(k+1) − c̃t (x(k) + f )
= (m + 2)µ(k+1)
(k)
Proof: Let S be an (m + 2) × (m + 2)-diagonal matrix with si as entry at position (i, i)
(k)
and Y be an (m + 2) × (m + 2)-diagonal matrix with yi as entry at position (i, i).
Then, the last condition of (58) is equivalent to
which is equivalent to
g + S −1 Y h = S −1 µ(k+1) 11m+2 − y (k) .
This implies
Ãt g + Ãt S −1 Y h = Ãt S −1 µ(k+1) 11m+2 − Ãt y (k) , (62)
and hence
Ãt S −1 Y h = Ãt S −1 µ(k+1) 11m+2 − c̃. (63)
With h = −Ãf this leads to
However, the matrix Ãt S −1 Y Ã is invertible, so f = (Ãt S −1 Y Ã)−1 (c̃ − Ãt S −1 µ(k+1) 11m+2 ) is
the unique solution of this last equation. In particular, if (58) has a solution, this is the only
choice for f . By setting h = −Ãf , we fulfill the second constraint of (58). Finally, we set
g = S −1 µ(k+1) 11m+2 − y (k) − S −1 Y h (again the only choice) satisfying the third constraint of
(58).
Since we have chosen g and h such that (62) and (63) are met, we also have Ãt g = 0, so the
solution satisfies the first condition of (58). 2
90
In the above proof we have to solve an equation system −Ãt S −1 Y Ãf = Ãt S −1 µ(k+1) 11m+2 − c̃
in order to compute f . This equation system depends on the previous solutions s(k) and y (k) , so
here the sizes of the numbers to store the intermediate solutions could get too big. At the end
of this section, we will describe how to handle such issues.
s 2 s 2 sm+2
m+2 (k) (k) m+2 (k+1) (k+1) P gi hi 2
yi s i y s
We have σ (k) = − 1 and σ (k+1) =
P P
µ(k)
i i
µ(k+1)
− 1 = µ(k+1)
.
i=1 i=1 i=1
It remains to show that y (k+1) > 0 and s(k+1) > 0 and σ (k+1) ≤ 21 .
We first show that for an appropriate choice of µ(k+1) we get σ (k+1) ≤ 21 .
µ(k) 1
Lemma 68 (a) For i = 1, . . . , m + 2 we have (k) (k) ≤ 1−σ (k)
.
yi si
Proof:
2 2
(k) (k) (k) (k)
Pm+2 yi s i yi s i
(a) We have (σ (k) )2 = i=1 µ(k)
− 1 , so µ(k)
−1 ≤ (σ (k) )2 which implies
(k) (k) (k) (k)
yi si yi s i
1− µ(k)
≤ σ (k) and µ(k)
≥ 1 − σ (k) for i = 1, . . . , m + 2. This proves the claim.
(b) The statement is simply a special case of the Cauchy-Schwarz inequality that can be
proved as follows:
m+2
!2
(k) (k)
X y s
(σ (k) 2
) (m + 2) − 1 − i (k)i
i=1
µ
!2
m+2 (k) (k) 2 m+2 (k) (k)
X y i si X y s
= (m + 2) 1− − 1 − i (k)i
i=1
µ(k) i=1
µ
m+2 (k) (k) 2 m+2
X m+2 (k) (k) (k) (k)
X y i si X y s yj sj
= (m + 1) 1− − 2 1 − i (k)i · 1−
i=1
µ(k) i=1 j=i+1
µ µ(k)
m+2 (k) (k)
!2
X m+2
X y s
(k) (k)
y j sj
= 1 − i (k)i − 1−
i=1 j=i+1
µ µ(k)
≥ 0
91
Lemma 69 If δ = √1 (i.e. µ(k+1) = (1 − √1 )µ(k) ) then σ (k+1) < 21 .
8 m+2 8 m+2
r r
(k) (k)
si yi
Proof: Let Gi := gi (k) and Hi := hi (k) (for i ∈ {1, . . . , m + 2}).
yi µ(k+1) si µ(k+1)
v v
um+2 um+2
u X gi hi 2 uX
σ (k+1) = t
(k+1)
= t (Gi Hi )2
i=1
µ i=1
v !
u 1 m+2 m+2
u
X X
= t (G2 + Hi2 )2 − (G2i − Hi2 )2
4 i=1 i i=1
v
um+2 m+2
1u X 1X 2
≤ t 2
(G + Hi )2 2
≤ (G + Hi2 )
2 i=1 i 2 i=1 i
m+2 m+2
g t h=0 1X 1X 1 (k) (k) 2
= (Gi + Hi )2 = g i s i + hi y i
2 i=1 2 i=1 yi(k) s(k)
i µ
(k+1) | {z }
(k) (k)
=µ(k+1) −yi si
m+2 2 !2
(k) (k)
1X µ(k) µ(k+1) yi si
= −
2 i=1 yi(k) s(k)
i µ
(k+1) µ(k) µ(k)
m+2
!2
(k) (k)
1 X µ(k) 1 y s
= (k) (k)
−δ + 1 − i (k)i
2 i=1 yi si 1 − δ µ
| {z }
1
≤
1−σ (k)
(k) (k) 2
m+2
! m+2
! !
(k) (k)
1 X y s yi si
X
≤ (m + 2)δ 2
− 2δ 1 − i (k)i
+ 1−
2(1 − δ)(1 − σ (k) ) i=1
µ i=1
µ(k)
(k) (k) 2
m+2 m+2
! !
(k) (k)
1 X y s X y s
≤ (m + 2)δ 2 + 2δ 1 − i (k)i + 1 − i (k)i
2(1 − δ)(1 − σ (k) ) µ µ
|i=1 {z
√
} i=1
| {z }
2
≤σ (k) m+2 =(σ )
(k)
1 √ 2
≤ (k)
(m + 2)δ 2 + 2δσ (k) m + 2 + σ (k)
2(1 − δ)(1 − σ )
1 √ 2
(k)
= m + 2δ + σ
2(1 − δ)(1 − σ (k) )
σ (k) ≤ 12
2 √ 2
√
1 1 8 m+2 1 1
≤ m + 2δ + = √ +
1−δ 2 8 m+2−1 8 2
1
≤ .
2
2
92
Lemma 70 We have y (k+1) > 0 and s(k+1) > 0.
(k+1) (k+1)
Proof: Claim: We have yi si > 0 for i = 1, . . . , m + 2.
Proof of the Claim:
(k+1) (k+1)
Assume that yj sj ≤ 0 for a j ∈ {1, . . . , m + 2}. Then,
m+2
!2 (k+1) (k+1)
!2
(k+1) (k+1)
(k+1) 2
X yi si yj sj
σ = −1 ≥ −1 ≥ 1,
i=1
µ(k+1) µ(k+1)
(k) (k)
which is a contradiction to the fact that si , yi , and µ(k+1) are positive. 2
93
7.3 Finding an Optimum Solution
Pm+2 yi si 2
Proof: By the condition i=1 µ
− 1 ≤ 41 , we get
µ 3µ
≤ y i si ≤ < 2µ
2 2
for all i ∈ {1, . . . , m + 2}. Therefore, st y = m+2
P
i=1 yi si ≤ 2(m + 2)µ.
(a) Since $y^*$ is an optimum and $y$ a feasible solution of the dual LP, we have $\tilde{b}^t y \ge \tilde{b}^t y^*$ and thus
$$s^t y = \tilde{b}^t y - x^t \tilde{A}^t y = \tilde{b}^t y - \tilde{c}^t x \ge \tilde{b}^t y^* - \tilde{c}^t x = \tilde{b}^t y^* - x^t \tilde{A}^t y^* = s^t y^*.$$
Let $i \in \{1, \dots, m+2\}$ with $y_i < \frac{\eta}{4(m+2)}$. We have
$$s_i \ge \frac{\mu}{2 y_i} > \frac{2(m+2)\mu}{\eta} \ge \frac{s^t y}{\eta}.$$
Assume that $y_i^* > 0$, so $y_i^* \ge \eta$. This implies
$$s^t y^* \ge s_i y_i^* > \frac{s^t y}{\eta} \cdot \eta = s^t y \ge s^t y^*,$$
which is a contradiction. Therefore, $y_i^* = 0$.
(b) The case is very similar to part (a): Since $x^*, s^*$ is an optimum and $x, s$ a feasible solution of the primal LP, we have $\tilde{c}^t x \le \tilde{c}^t x^*$ and thus
$$s^t y = \tilde{b}^t y - \tilde{c}^t x \ge \tilde{b}^t y - \tilde{c}^t x^* = \tilde{b}^t y - (x^*)^t \tilde{A}^t y = y^t s^*.$$
Let $i \in \{1, \dots, m+2\}$ with $s_i < \frac{\eta}{4(m+2)}$. We have
$$y_i \ge \frac{\mu}{2 s_i} > \frac{2(m+2)\mu}{\eta} \ge \frac{s^t y}{\eta}.$$
Assume that $s_i^* > 0$, so $s_i^* \ge \eta$. This implies
$$y^t s^* \ge s_i^* y_i > \eta \cdot \frac{s^t y}{\eta} = s^t y \ge y^t s^*,$$
which is a contradiction. Therefore, $s_i^* = 0$. 2
There are several ways to find an optimum solution. Before we describe a method to round an interior point directly to an optimum solution, we will present a simpler but less efficient method: We choose k big enough such that $\mu^{(k)} < \frac{\eta^2}{32(m+2)^2}$. Then, for each $i \in \{1, \dots, m+2\}$, we have $y_i^{(k)} < \frac{\eta}{4(m+2)}$ or $s_i^{(k)} < \frac{\eta}{4(m+2)}$. Let $\bar{A}^t y = \bar{c}$ be the subsystem of $\tilde{A}^t y = \tilde{c}$ consisting of the rows with indices i for which $s_i^{(k)} < \frac{\eta}{4(m+2)}$, so $s_i^* = 0$. For all other rows, we know that $y_i^* = 0$, so we can ignore them when computing an optimum solution for the dual LP. If $\bar{A}^t y = \bar{c}$ has only one solution, we compute it and get an optimum solution of the modified dual LP (53) (provided that the result is non-negative). Otherwise, we check if $y_{i_0}^{(k)} < \frac{\eta}{4(m+2)}$ for some $i_0 \in \{1, \dots, m\}$. In this case we know that if the initial dual LP has an optimal solution, then there is one with $y_{i_0} = 0$. Hence we can start the whole process again but now without the variable $y_{i_0}$, without the row of A with index $i_0$ and without the entry of b with index $i_0$. Hence we have reduced the instance size, so this method will terminate after at most m iterations.
What can we do if there is no $i \in \{1, \dots, m\}$ with $y_i^{(k)} < \frac{\eta}{4(m+2)}$? To handle this case, we first make sure that the system $\tilde{A}x = \tilde{b}$ does not have a feasible solution. If it has a feasible solution (which can be checked by Gaussian Elimination), we modify $\tilde{b}$ slightly to a vector $b^*$ such that $\tilde{A}x = b^*$ has no feasible solution. To this end, choose n linearly independent rows of A. These rows will define the solution of $\tilde{A}x = \tilde{b}$. Then, any modification of b outside these rows will make the system $\tilde{A}x = \tilde{b}$ infeasible. We simply add an $\epsilon > 0$ to one of these entries of b. If $\epsilon$ is small enough, then an optimum solution of the dual LP with respect to $b^*$ will still be an optimum solution of the original dual LP. To see that we can write $\epsilon$ with a polynomial number of bits, observe that the absolute value of the difference between the costs of two basic solutions of an LP is either 0 or can be bounded from below by some value $2^{-L}$ where L is polynomial in the input size. This follows from the fact that any basic solution can be written with a polynomial number of bits. Thus, the same is true for any difference u of two basic solutions and for the scalar product $\tilde{b}^t u$. Hence, $\tilde{b}^t u$ is either zero or its absolute value is at least $2^{-L}$. This implies that we can choose $\epsilon$ in such a way that it can be written with polynomially many bits and that no suboptimal solution can become optimal by the modification.
Now assume that the initial dual LP is bounded and feasible. Then, we can compute optimum solutions $x^*, y^*, s^*$ of the modified LPs (53) and (54) by expanding optimum solutions of the initial primal and dual problems in a canonical way. In particular, we will set $x_{n+1}$ to 0. Then $\tilde{A}x^* + s^* = b^*$ but $\tilde{A}x = b^*$ has no feasible solution. Hence, there must be an $i_0 \in \{1, \dots, m\}$ with $s_{i_0}^* > 0$, so $y_{i_0}^{(k)} < \frac{\eta}{4(m+2)}$ and $y_{i_0}^* = 0$. Again, we get rid of at least one dual variable and can restart the whole procedure on a smaller instance.
Now, we describe how we can avoid iterating the whole process:
Consider again the two problems (55) and (56). Theorem 14 implies that we can partition the index set $\{1, \dots, m+2\}$ of the dual variables into $\{1, \dots, m+2\} = B\,\dot\cup\,N$ such that for $i \in B$ there is an optimum dual solution $y^*$ with $y_i^* > 0$ and for $i \in N$ there is an optimum primal solution $x^*, s^*$ with $s_i^* > 0$. Moreover, it is easy to see that there are also solutions $x^*, s^*$ and $y^*$ with these properties such that in addition the size of their entries is $O(\text{size}(\tilde{A}) + \text{size}(\tilde{b}) + \text{size}(\tilde{c}))$. Hence, in Lemma 71, for any $i \in \{1, \dots, m+2\}$ we can either have $y_i < \frac{\eta}{4(m+2)}$ or $s_i < \frac{\eta}{4(m+2)}$ but not both. Now we choose k big enough such that $\mu^{(k)} < \frac{\eta^2}{32(m+2)^2\Delta}$ for some $\Delta \ge 1$ that will be determined later. Then, for each $i \in \{1, \dots, m+2\}$, exactly one of the inequalities $y_i < \frac{\eta}{4(m+2)\Delta}$ and $s_i < \frac{\eta}{4(m+2)\Delta}$ holds. Therefore, we can find the partitioning $\{1, \dots, m+2\} = B\,\dot\cup\,N$. In particular, we have $y_i \ge \frac{\eta}{4(m+2)}$ for each $i \in B$ and $y_i < \frac{\eta}{4(m+2)\Delta}$ for each $i \in N$.
Let $A_B$ be the submatrix of $\tilde{A}$ consisting of the rows with indices in B, and $A_N$ be the submatrix of $\tilde{A}$ consisting of the remaining rows. By $y_B^{(k)}, y_N^{(k)}, b_B, b_N$ we denote the corresponding subvectors of the vectors $y^{(k)}$ and b. As in the description of the Simplex Algorithm, the entries of e.g. $y_B^{(k)}$ are not necessarily indexed from 1 to |B| but their index set is the set $B \subseteq \{1, \dots, m+2\}$. We can assume that $A_B$ has full column rank.
In the following, the vector norm is the Euclidean norm ∥ · ∥2 and the matrix norm is the norm
induced by the Euclidean norm.
Theorem 72 Set $\Delta = \max\{\sqrt{m+2}\,\|A_B(A_B^tA_B)^{-1}A_N^t\|,\ 1\}$. Let k be big enough such that $\mu^{(k)} < \frac{\eta^2}{32(m+2)^2\Delta}$. Let $Y_B$ be a diagonal matrix whose rows and columns are indexed with B such that the entry at position (i, i) is $y_i^{(k)}$. Define
$$d_y := Y_B A_B \left(A_B^t (Y_B)^2 A_B\right)^{-1} A_N^t y_N^{(k)}$$
and $\tilde{y}_B = Y_B d_y + y_B^{(k)}$, and let $\tilde{y} \in \mathbb{R}^{m+2}$ be the vector which arises from $\tilde{y}_B$ by adding zeros for the entries with index in N. Then:
(a) $\tilde{A}^t \tilde{y} = \tilde{c}$.
(b) $\tilde{y}_B > 0$.
(c) $\tilde{y}$ is an optimum dual solution.
Proof:
(b)
$$\begin{aligned}
\|d_y\| &= \left\|Y_B A_B\left(A_B^t(Y_B)^2A_B\right)^{-1}A_N^t y_N^{(k)}\right\|\\
&= \Bigl\|Y_B A_B\left(A_B^t(Y_B)^2A_B\right)^{-1}\underbrace{A_B^tY_B\, Y_B^{-1}A_B\left(A_B^tA_B\right)^{-1}}_{=I_n}A_N^t y_N^{(k)}\Bigr\|\\
&\le \underbrace{\left\|Y_B A_B\left(A_B^t(Y_B)^2A_B\right)^{-1}A_B^tY_B\right\|}_{=1}\cdot\left\|Y_B^{-1}A_B\left(A_B^tA_B\right)^{-1}A_N^t y_N^{(k)}\right\|\\
&\le \underbrace{\left\|Y_B^{-1}\right\|}_{\le\frac{4(m+2)}{\eta}}\cdot\underbrace{\left\|A_B\left(A_B^tA_B\right)^{-1}A_N^t\right\|}_{\le\frac{\Delta}{\sqrt{m+2}}}\cdot\underbrace{\left\|y_N^{(k)}\right\|}_{<\frac{\eta\sqrt{m+2}}{4(m+2)\Delta}}\\
&\le 1.
\end{aligned}$$
(c) By (a), we have Ãt ỹ = c̃, and by (b), we know that ỹB > 0, so we have ỹ ≥ 0. Hence ỹ
is a feasible dual solution. Moreover, we know that there is a feasible primal solution in
which the slack variables si are zero for i ∈ B. Hence, by complementary slackness, ỹ is
an optimum dual solution. 2
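In terms of linear algebra, the rounding of Theorem 72 is a single solve. The following sketch (our own, with hypothetical numpy inputs and floating-point arithmetic instead of the exact computations that the size bounds require) produces the candidate optimum dual solution from the partition B, N:

```python
import numpy as np

def round_to_optimal_dual(A_tilde, y, B, N):
    """Sketch of the rounding in Theorem 72.

    A_tilde: (m+2) x n matrix, y: current interior dual iterate y^(k),
    B, N: index lists partitioning {0, ..., m+1}; A_B must have full column rank.
    Returns the candidate optimum dual solution y~ (zero on N)."""
    A_B, A_N = A_tilde[B, :], A_tilde[N, :]
    Y_B = np.diag(y[B])

    # d_y = Y_B A_B (A_B^t Y_B^2 A_B)^{-1} A_N^t y_N
    M = A_B.T @ Y_B @ Y_B @ A_B
    d_y = Y_B @ A_B @ np.linalg.solve(M, A_N.T @ y[N])

    y_tilde = np.zeros_like(y)
    y_tilde[B] = Y_B @ d_y + y[B]   # y~_B = Y_B d_y + y_B^(k)
    return y_tilde
```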
Interior point methods can be motivated as follows: Consider the primal LP
$$\max\ c^tx \quad \text{s.t.}\quad Ax + s = b,\ s \ge 0. \qquad (64)$$
Replacing the sign constraint on s by a logarithmic barrier term leads, for a parameter $\mu > 0$, to the problem $\max\{c^tx + \mu\sum_{i=1}^m\ln(s_i) \mid Ax + s = b\}$ with the Lagrange function
$$L(x,s,y) = c^tx + \mu\sum_{i=1}^m\ln(s_i) + y^t(b - Ax - s).$$
For any fixed vector y the maximum value $\tilde{L}(y) = \max\{L(x,s,y) \mid x \in \mathbb{R}^n,\ s \in \mathbb{R}^m,\ s > 0\}$ is an upper bound on $\max\{c^tx + \mu\sum_{i=1}^m\ln(s_i) \mid Ax + s = b\}$, and hence one asks for $\min\{\tilde{L}(y) \mid y \in \mathbb{R}^m\}$. In this setting, one can show that $c^tx + \mu\sum_{i=1}^m\ln(s_i)$ is maximum if all partial derivatives of L(x, s, y) are zero, i.e. all the following terms must be zero:
L(x, s, y) are zero, i.e. all the following terms must be zero:
m
∂L(x, s, y) X
= cj − yi aij for j ∈ {1, . . . , n}
∂xj i=1
∂L(x, s, y) µ
= − yi for i ∈ {1, . . . , m}
∂si si
m
∂L(x, s, y) X
= bi − aij xj − si for i ∈ {1, . . . , m}
∂yi j=1
This leads to y t A = c, yi sj = µ (for j ∈ {1, . . . , n}), and Ax + s = b. Note that we do not get
explicit sign constraints on y but if we request s to be positive, then y will be positive. For
a decreasing sequence of µ(k) values, the sequence of corresponding vectors x(k) , s(k) and y (k)
(k) (k)
satisfying (y (k) )t A = c, yi sj = µ(k) (for j ∈ {1, . . . , n}), and Ax(k) + s(k) = b(k) is called a
central path. However, note that also the sequence of solutions we constructed in the first
(k) (k)
three subsections with only yi sj ≈ µ(k) is called central path.
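As a minimal illustration (a toy example of our own): for the LP $\max\{x \mid x + s = 1,\ s \ge 0\}$ with $m = n = 1$, the conditions read $y = 1$, $ys = \mu$, and $x + s = 1$, so the central path is $y(\mu) = 1$, $s(\mu) = \mu$, $x(\mu) = 1 - \mu$; for $\mu \to 0$ it converges to the primal optimum $x = 1$ and the dual optimum $y = 1$.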
8 Integer Linear Programming
Imposing integrality constraints on all or some variables of a linear program allows us to model many new conditions that could not be described by linear constraints. For example, even if we only consider Binary Linear Programs (i.e. all integrality constraints are of the type x ∈ {0, 1}), we can already model a number of logical conditions on variables x and y.
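For example, for binary variables $x, y \in \{0,1\}$: the implication "if $x = 1$ then $y = 1$" is expressed by the linear constraint $y \ge x$, the condition "at most one of x and y" by $x + y \le 1$, and "$x = 1$ or $y = 1$" by $x + y \ge 1$.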
On the other hand, we have already seen that there are NP-hard optimization problems that
can be modeled as (mixed) integer linear programs. Hence, we cannot hope for polynomial-time
algorithms to solve general ILPs.
Fig. 8: A polyhedron P (given by the red hyperplanes) and its integer hull PI (green). The
black dots indicate the integral vectors.
Observations:
• For a rational polyhedral cone (i.e. a cone C = {x ∈ Rn | Ax ≤ 0} with A ∈ Qm×n ), we
have CI = C (because a polyhedral cone is rational if and only if it is generated by a
finite number of integral vectors).
rational polyhedra $P = \{x \in \mathbb{R}^n \mid Ax \le b\} \subseteq \mathbb{R}^n$ such that $P_I$ has $\Omega(\phi^{n-1})$ vertices, where $\phi = \text{size}(A) + \text{size}(b)$.
In this section, our goal is to find a certificate that a given system of equations does not have
any integral solution (which will be the result of Corollary 77).
The following operations on matrices are called elementary unimodular column operations:
• exchanging two columns,
• multiplying a column by −1,
• adding an integral multiple of one column to another column.
Proof: We may assume that A is integral. Assume that we have already transformed A into a matrix $\begin{pmatrix} F & 0\\ G & H\end{pmatrix}$ where F is a lower triangular matrix with positive diagonal. Let $h_{11}, \dots, h_{1k}$ be the first row of H. Apply elementary unimodular column operations to H such that all $h_{1j}$ are non-negative and such that $\sum_{j=1}^k h_{1j}$ is as small as possible. We may assume that $h_{11} \ge h_{12} \ge \dots \ge h_{1k}$. Then, $h_{11} > 0$ because A has rank m. Moreover, $h_{1j} = 0$ for $j \in \{2,\dots,k\}$ because otherwise subtracting the j-th column of H from its first column would reduce $\sum_{j=1}^k h_{1j}$. Hence, we have obtained a larger lower triangular matrix F′.
We iterate this step and end up with a matrix $\begin{pmatrix} B & 0\end{pmatrix}$ where B is a lower triangular matrix with positive diagonal. Denote the entries of B by $b_{ij}$ ($i = 1,\dots,m$, $j = 1,\dots,m$). Finally, we perform for $i = 2,\dots,m$ the following steps: For $j = 1,\dots,i-1$ add an integer multiple of the i-th column of B to the j-th column of B such that $b_{ij}$ is non-negative and less than $b_{ii}$. 2
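The proof is constructive. As an illustration (a sketch of our own, using exact integer arithmetic but making no attempt to control the size of intermediate entries), the following routine brings an integral matrix of full row rank into Hermite normal form using exactly the three elementary unimodular column operations:

```python
import numpy as np

def hermite_normal_form(A):
    """Sketch: transform an integral matrix A of full row rank into Hermite
    normal form [B 0] by elementary unimodular column operations (swapping
    columns, negating a column, adding an integer multiple of one column
    to another)."""
    H = np.array(A, dtype=object)   # exact Python integers
    m, n = H.shape
    for i in range(m):
        # zero out the entries right of the pivot via Euclidean column steps
        for j in range(i + 1, n):
            while H[i, j] != 0:
                q = H[i, i] // H[i, j]
                H[:, i] -= q * H[:, j]           # add integer multiple of column j
                H[:, [i, j]] = H[:, [j, i]]      # swap columns i and j
        if H[i, i] < 0:
            H[:, i] = -H[:, i]                   # make the diagonal entry positive
        # reduce the entries left of the diagonal modulo the diagonal entry
        for j in range(i):
            q = H[i, j] // H[i, i]
            H[:, j] -= q * H[:, i]
    return H
```

For instance, hermite_normal_form([[6, 4], [10, 2]]) yields the matrix [[2, 0], [8, 14]].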
Proof: “⇒:” If x and $y^tA$ are integral vectors and Ax = b, then $y^tAx = y^tb$ is also integral.
“⇐:” Assume that $b^ty$ is integral for each $y \in \mathbb{Q}^m$ for which $A^ty$ is integral. Then, Ax = b must have a (fractional) solution, since otherwise, by Farkas' Lemma (Corollary 7), there would be a vector $y \in \mathbb{Q}^m$ with $y^tA = 0$ and $y^tb = -\frac{1}{2}$. Thus, we may assume that the rows of A are linearly independent, so A has rank m.
It is easy to check that the statement to be proved holds for A if and only if it holds for any matrix $\tilde{A}$ where $\tilde{A}$ arises from A by applying an elementary unimodular column operation. Hence, we can assume that A is in Hermite normal form $[B\ 0]$. Thus $B^{-1}[B\ 0] = [I_m\ 0]$ is an integral matrix. Therefore, by our assumption (applied to the rows of $B^{-1}$), $B^{-1}b$ is an integral vector. Since $[B\ 0]\binom{B^{-1}b}{0} = b$, the vector $x := \binom{B^{-1}b}{0}$ is an integral solution of $[B\ 0]\,x = b$. 2
8.3 TDI Systems
Theorem 79 Let $P = \{x \in \mathbb{R}^n \mid Ax \le b\}$ be a rational polyhedron. Then, the following statements are equivalent:
(a) P is integral.
(b) Each face of P contains an integral vector.
(c) Each minimal face of P contains an integral vector.
(d) Each supporting hyperplane of P contains an integral vector.
(e) Each rational supporting hyperplane of P contains at least one integral vector.
(f) $\max\{c^tx \mid x \in P\}$ is attained by an integral vector for each c for which the maximum is finite.
(g) $\max\{c^tx \mid x \in P\}$ is an integer for each integral vector c for which the maximum is finite.
Proof: The following implications are obvious: “(b) ⇔ (c)”, “(b) ⇒ (d)”, “(d) ⇒ (e)”, and “(f) ⇒ (g)”.
“(a) ⇒ (b):” Assume that P is integral. Let F = P ∩ H be a face of P where $H = \{x \in \mathbb{R}^n \mid c^tx = \delta\}$ is a supporting hyperplane of P. Then, any $z \in F$ is a convex combination of integral vectors $v_1, \dots, v_k$ of P. If $v_i \in P \setminus F$ (so $c^tv_i < \delta$) for some $i \in \{1,\dots,k\}$, then (since $c^tz = \delta$) there must be a $j \in \{1,\dots,k\}$ with $c^tv_j > \delta$, which is a contradiction to $v_j \in P$. Thus, all $v_i$ must be in F, so in particular F contains an integral vector.
“(c) ⇒ (f):” Follows from Corollary 20.
“(f) ⇒ (a):” Assume that (f) holds but P ̸= PI . Then, there is an x∗ ∈ P \ PI . By Theorem 74,
PI is a polyhedron, so there is an inequality at x ≤ β that is valid for PI but not for x∗ , so
at x∗ > β. This is a contradiction to (f) because max{at x | x ∈ P } is finite (by Proposition 75)
but is not attained by any integral vector.
So far, we have proved that (a),(b),(c), and (f) are equivalent.
“(e) ⇒ (c):” We may assume that A and b are integral. Let F = {x ∈ Rn | A′ x = b′ } be a
minimal face of P (where A′ x ≤ b′ is a subsystem of Ax ≤ b). If there is no integral vector x
with A′ x = b′ , then, by Corollary 77, there must be a rational vector y such that c := (A′ )t y is
integral while δ := y t b′ is not an integer. Moreover, we may assume that all entries of y are
positive (otherwise we add an appropriate integral vector to y). Since c is integral but δ is not
integral, the rational hyperplane H := {x ∈ Rn | ct x = δ} does not contain any integral vector.
We will show that H ∩ P = F which implies that H is a supporting hyperplane. By construction,
we have F ⊆ H, so we have to show that H∩P ⊆ F . Let x ∈ H∩P . Then, y t A′ x = ct x = δ = y t b′ ,
so y t (A′ x − b′ ) = 0. Thus, since all components of y are positive, A′ x = b′ , so x ∈ F .
Now, we know that (a),(b),(c),(d),(e), and (f) are equivalent.
“(g) ⇒ (e):” Let H = {x ∈ Rn | ct x = δ} be a rational supporting hyperplane of P , so
max{ct x | x ∈ P } = δ. Assume that H does not contain any integral vector. Then, by
Corollary 77, there is a positive number γ for which γc is integral but γδ is not integral. Then
max{(γc)t x | x ∈ P } = γ max{ct x | x ∈ P } = γδ ̸∈ Z, so the statement of (g) is false.
Since “(f) ⇒ (g)” is trivial, this shows the equivalence of all statements. 2
Note that total dual integrality is in fact a property of the system of inequalities, not just of the polyhedron that is defined by them. For example, the systems
$$\begin{pmatrix} 1 & 1\\ 1 & 0\\ 1 & -1 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} \le \begin{pmatrix} 0\\ 0\\ 0 \end{pmatrix}
\qquad\text{and}\qquad
\begin{pmatrix} 1 & 1\\ 1 & -1 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} \le \begin{pmatrix} 0\\ 0 \end{pmatrix}$$
define the same polyhedron. But it is easy to check that the first system of inequalities is TDI while the second one is not TDI.
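To make the difference concrete (a small check of our own): take $c = (1,0)^t$. In both cases $\max\{c^tx \mid x \in P\} = 0$, so the dual $\min\{0^ty \mid A^ty = c,\ y \ge 0\}$ is finite. For the first system, $y = (0,1,0)^t$ is an integral optimum dual solution. For the second system, the only vector with $y_1(1,1)^t + y_2(1,-1)^t = (1,0)^t$ and $y \ge 0$ is $y = (\tfrac{1}{2},\tfrac{1}{2})^t$, so no integral optimum dual solution exists, although the optimum value 0 is of course an integer. This also illustrates why total dual integrality asks for integral optimum dual solutions rather than just integral optimum values.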
Theorem 81 Let A ∈ Qm×n and b ∈ Zm such that Ax ≤ b is totally dual integral. Then,
the polyhedron P = {x ∈ Rn | Ax ≤ b} is integral.
Proof: If Ax ≤ b is TDI, then by definition the minimum in $\min\{b^ty \mid A^ty = c,\ y \ge 0\}$ is attained by an integral vector for each integral vector c for which it is finite; since b is integral, this minimum is then an integer. By duality, this implies that $\max\{c^tx \mid Ax \le b\}$ is an integer for each integral vector c for which the maximum is finite. Thus, by the implication “(g) ⇒ (a)” of Theorem 79, P is integral. 2
$$\begin{aligned}
\min\{b^ty + \beta\gamma \mid A^ty + \gamma a = c,\ y \ge 0,\ \gamma \ge 0\} &= \max\{c^tx \mid Ax \le b,\ a^tx \le \beta\}\\
&= \max\{c^tx \mid Ax \le b\}\\
&= \min\{b^ty \mid A^ty = c,\ y \ge 0\}
\end{aligned}$$
The last minimization problem has an optimum solution $y^*$ that is integral. Together with $\gamma^* = 0$, this gives an optimum integral solution for the first minimization problem. 2
Hence, if a system Ax ≤ b is not TDI, then no proper subsystem A′ x ≤ b′ with {x ∈ Rn | Ax ≤
b} = {x ∈ Rn | A′ x ≤ b′ } can be TDI. We call a system Ax ≤ b minimally TDI if it is TDI
but no proper subsystem of Ax ≤ b defining the same polyhedron is TDI.
is finite. Let $x^*, y^*, \lambda^*, \mu^*$ be optimum primal and dual solutions. Set $\tilde{c} := c + \lceil\mu^*\rceil a$. Then,
$$\max\{\tilde{c}^tx \mid Ax \le b,\ a^tx \le \beta\} = \min\{b^ty + \beta\lambda \mid y \ge 0,\ \lambda \ge 0,\ A^ty + \lambda a = \tilde{c}\} \qquad (66)$$
is finite because $x^*$ is feasible for the maximum and $y^*$ and $\lambda^* + \lceil\mu^*\rceil - \mu^*$ are feasible for the minimum.
Since $Ax \le b,\ a^tx \le \beta$ is a TDI-system, the minimum in equation (66) has an integer optimum solution $\tilde{y}, \tilde{\lambda}$. Then, $y := \tilde{y}$, $\lambda := \tilde{\lambda}$, $\mu := \lceil\mu^*\rceil$ is an integer optimum solution for the minimum in (65): it is obviously feasible, and its cost is:
The inequality follows from the fact that y ∗ , λ∗ + ⌈µ∗ ⌉ − µ∗ is feasible for the minimum in (66)
and ỹ, λ̃ is an optimum solution for the minimum in (66). Hence, the minimum in (65) has an
integral optimum solution, so Ax ≤ b, at x = β is TDI. 2
Example: The set of the standard unit vectors together with their negatives is a Hilbert basis
of the cone Rn .
is integral and an element of P. Thus (since $\{b_1,\dots,b_k\} \subseteq H$), b can be written as a non-negative integral combination of the elements of H. This shows that H is a Hilbert basis. 2
Notation: For a system of inequalities Ax ≤ b and a face F of {x ∈ Rn | Ax ≤ b}, we call
a row of A active, if the corresponding inequality in Ax ≤ b is satisfied with equality for all
x ∈ F.
Proof: “⇒:” Suppose that Ax ≤ b is TDI. Let F be a minimal face of P and let a1 , . . . , at be
the rows of A that are active for F . We have to show that {a1 , . . . , at } is a Hilbert basis. Let
c be an integral vector in cone({a1 , . . . , at }). We have to write c as an integral non-negative
combination of $a_1, \dots, a_t$. The maximum in the LP-duality equation
$$\max\{c^tx \mid Ax \le b\} = \min\{b^ty \mid A^ty = c,\ y \ge 0\} \qquad (67)$$
is attained by every vector x in F. Since Ax ≤ b is TDI, the dual problem has an integral
optimum solution y. By complementary slackness, the entries of y at positions corresponding
to rows that are not active in F are 0. Thus, c is an integral non-negative combination of
a1 , . . . , a t .
“⇐:” Assume that for each minimal face F of P , the rows that are active in F form a Hilbert
basis. Let c be an integral vector for which the optima in (67) are finite. We have to show that
the minimum is attained by an integral vector. Let F be a minimal face of P such that each
vector in F attains the maximum in the duality equation. Let a1 , . . . , at be rows of A that are
active in F. Then, by complementary slackness, $c \in \text{cone}(\{a_1,\dots,a_t\})$. Since $a_1,\dots,a_t$ form a Hilbert basis, we can write $c = \sum_{i=1}^t \lambda_i a_i$ for certain non-negative integral numbers $\lambda_1, \dots, \lambda_t$.
We can extend (λ1 , . . . , λt ) with zero-components to a vector y ∈ Zm with y ≥ 0, At y = c and
bt y = xt At y = ct x for all x ∈ F . In other words, y is an integral optimum solution of the dual
LP. 2
Theorem 86 The rational system of inequalities Ax ≤ 0 is TDI if and only if the rows
of A form a Hilbert basis.
Proof: Follows from the previous Theorem with b = 0 (note that in the unique minimal face
of {x ∈ Rn | Ax ≤ 0} all rows of A are active). 2
Theorem 87 (Giles and Pulleyblank [1979]) For each rational polyhedron P ⊆ Rn there
exists a rational TDI-system Ax ≤ b with A ∈ Zm×n and P = {x ∈ Rn | Ax ≤ b}. The
vector b can be chosen to be integral if and only if P is integral.
Proof: We can assume w.l.o.g. that P ≠ ∅. For each minimal face F of P, we define
$$C_F := \{c \in \mathbb{R}^n \mid c^tz = \max\{c^tx \mid x \in P\} \text{ for all } z \in F\}.$$
Then, $C_F$ is a polyhedral cone. To see this, assume that $P = \{x \in \mathbb{R}^n \mid \tilde{A}x \le \tilde{b}\}$ is some description of P. Then $C_F$ is generated by the rows of $\tilde{A}$ that are active in F.
Let F be a minimal face, and let a1 , . . . , at be an integral Hilbert basis generating CF . Choose
x0 ∈ F , and define βi := ati x0 for i = 1, . . . , t. Then, βi = max{ati x | x ∈ P } (i = 1, . . . , t). Let
SF be the system at1 x ≤ β1 , . . . , att x ≤ βt . All inequalities in SF are valid for P . Let Ax ≤ b
be the union of the systems SF over all minimal faces F of P . Then, P ⊆ {x ∈ Rn | Ax ≤ b}.
On the other hand, if x∗ ∈ Rn \ P , then there is a supporting hyperplane of P separating x∗
from P , and this supporting hyperplane touches P in a minimal face, so there is an inequality
in Ax ≤ b that is violated by x∗ . Hence, P = {x ∈ Rn | Ax ≤ b}. Moreover, by Theorem 85,
Ax ≤ b is TDI.
If P is integral, then all the $\beta_i$ can be chosen to be integral because we can choose the vectors $x_0 \in F$ as integral vectors. On the other hand, if b is integral, then by Theorem 81, P is integral.
2
In this section, we want to identify integral matrices A such that Ax ≤ b, x ≥ 0 is TDI for
any vector b. It will turn out that these are exactly the totally unimodular matrices (see
Corollary 92).
In particular, a regular square matrix is unimodular if and only if it is integral and its determinant
is −1 or 1. Moreover, by Cramer’s rule, the inverse of any unimodular square matrix is an
integral matrix.
Exercise: Check that any series of elementary unimodular column operations, applied to a
matrix A (see Chapter 8.2), can be performed by multiplying A from the right by an appropriate
regular unimodular square matrix.
Theorem 88 Let A be a totally unimodular matrix, and let b be an integral vector. Then,
the polyhedron P = {x ∈ Rn | Ax ≤ b} is integral.
Proof: Let F be a minimal face of P . We will show that F contains an integral vector. By
the implication “(c) ⇒ (a)” of Theorem 79 this is sufficient to prove that P is integral.
By Proposition 23, we can write the minimal face as F = {x ∈ Rn | A′ x = b′ } where A′ x ≤ b′
is a subsystem of Ax ≤ b. We can assume that A′ has full row rank. By permuting coordinates,
we can write $A' = \begin{pmatrix} U & V \end{pmatrix}$ for some matrix U with $\det(U) \in \{-1, 1\}$. Thus $x := \begin{pmatrix} U^{-1}b'\\ 0 \end{pmatrix}$ is an integral vector in F. 2
Theorem 89 Let A ∈ Zm×n be a matrix with rank m. Then A is unimodular if and only
if for each integral vector b the polyhedron {x ∈ Rn | Ax = b, x ≥ 0} is integral.
Proof: “⇒:” Assume that A is unimodular, and let b be an integral vector. Let x′ be a vertex
of {x ∈ Rn | Ax = b, x ≥ 0}. This means that there are n linearly independent constraints
in the system Ax ≤ b, −Ax ≤ −b, −In x ≤ 0 that are satisfied by x′ with equality. Thus, the
columns of A corresponding to non-zero entries of x′ are linearly independent. This set of
columns can be extended to a regular m × m-submatrix B of A. Then, the restriction of x′ to
coordinates corresponding to B is B −1 b. This is integral (because det(B) ∈ {−1, 1}). The other
entries of x′ are zero, so x′ is integral.
“⇐:” Suppose that {x ∈ Rn | Ax = b, x ≥ 0} is integral for every integral vector b. Let B be
a regular m × m-submatrix of A. We have to show that det(B) ∈ {−1, 1}. To this end, it
is sufficient to show that B −1 u is integral for every integral vector u (by Cramer’s rule). So
let u be an integral vector. Then, there is an integral vector y such that z := y + B −1 u ≥ 0.
Then, b := Bz is integral. Let z ′ be a vector with Az ′ = Bz = b that arises from z by adding
zero-entries. Then, z ′ is a feasible (i.e. non-negative) basic solution of Ax = b, so it is a vertex
of {x ∈ Rn | Ax = b, x ≥ 0}. Therefore z ′ is integral, which also shows that z is integral. This
implies that B −1 u = z − y is integral. 2
Proof: The matrix A is totally unimodular if and only if $[I_m\ A]$ is unimodular. Let b be an integral vector. Then, the vertices of $\{x \in \mathbb{R}^n \mid Ax \le b,\ x \ge 0\}$ are integral if and only if the vertices of $\{z \in \mathbb{R}^{m+n} \mid [I_m\ A]\,z = b,\ z \ge 0\}$ are integral. Thus, the statement follows from Theorem 89. 2
Corollary 91 An integral matrix A is totally unimodular if and only if for all integral vectors b and c both optima in the duality equation
$$\max\{c^tx \mid Ax \le b,\ x \ge 0\} = \min\{b^ty \mid A^ty \ge c,\ y \ge 0\}$$
are attained by integral vectors (if they are finite).
Proof: Follows directly from Hoffman and Kruskal's Theorem (Theorem 90) using the fact that a matrix is totally unimodular if and only if its transposed matrix is totally unimodular. 2
Proof: “⇒:” If A is totally unimodular, then also At is totally unimodular. Thus, by Theo-
rem 90, min{bt y | At y ≥ c, y ≥ 0} is attained by an integral vector for each vector b and each
integral vector c for which the minimum is finite. This implies that the system Ax ≤ b, x ≥ 0 is
TDI for each vector b.
“⇐:” Suppose that Ax ≤ b, x ≥ 0 is TDI for each vector b. By Theorem 81 this implies that the
polyhedron {x ∈ Rn | Ax ≤ b, x ≥ 0} is integral for each integral vector b. By Theorem 90, this
means that A is totally unimodular. 2
The following theorem provides a certificate to show that a matrix is totally unimodular.
Proof: “⇒:” Let A be totally unimodular and $R \subseteq \{1,\dots,n\}$. Let $d \in \{0,1\}^n$ be the characteristic vector of R, i.e.
$$d_r = \begin{cases} 1 & \text{for } r \in R\\ 0 & \text{for } r \in \{1,\dots,n\}\setminus R \end{cases}$$
Since A is totally unimodular, the matrix $\begin{pmatrix} A\\ -A\\ I_n \end{pmatrix}$ is also totally unimodular. Thus, the polytope
$$P := \left\{x \in \mathbb{R}^n \,\middle|\, Ax \le \left\lceil\tfrac{1}{2}Ad\right\rceil,\ Ax \ge \left\lfloor\tfrac{1}{2}Ad\right\rfloor,\ x \le d,\ x \ge 0\right\}$$
is integral (its right-hand sides are integral, so we can apply Theorem 88). Since $\frac{1}{2}d \in P$, the polytope is non-empty and therefore contains an integral vector z. Define $R_1 := \{r \in R \mid z_r = 0\}$ and $R_2 := \{r \in R \mid z_r = 1\}$. For $i \in \{1,\dots,m\}$, this yields
$$\sum_{j\in R_1} a_{ij} - \sum_{j\in R_2} a_{ij} = \sum_{j=1}^n a_{ij}(d_j - 2z_j) \in \{-1, 0, 1\}$$
because $\lfloor\tfrac{1}{2}Ad\rfloor \le Az \le \lceil\tfrac{1}{2}Ad\rceil$. Hence $R = R_1\,\dot\cup\,R_2$ is a partition of the required kind.
“⇐:” Assume that for each $R \subseteq \{1,\dots,n\}$ there are sets $R_1, R_2 \subseteq R$ with $R = R_1\,\dot\cup\,R_2$ as described in the theorem. We show by induction on k that every k × k-submatrix of A has determinant −1, 0, or 1. For k = 1 this follows from the criterion for |R| = 1.
Let k > 1. Let $B = (b_{ij})_{i,j\in\{1,\dots,k\}}$ be a submatrix of A. We can assume that B is non-singular because otherwise its determinant is 0.
By Cramer's rule, each entry of $B^{-1}$ is $\frac{\det(B')}{\det(B)}$ where B′ arises from B by replacing a column by a unit vector. By the induction hypothesis, det(B′) ∈ {−1, 0, 1}. Hence, all entries of the matrix $B^* := (\det(B))B^{-1}$ are in {−1, 0, 1}.
Let $b^*$ be the first column of $B^*$. Then, $Bb^* = \det(B)e_1$ where $e_1$ is the first unit vector. We define $R := \{j \in \{1,\dots,k\} \mid b^*_j \ne 0\}$. For $i \in \{2,\dots,k\}$, we have $0 = (Bb^*)_i = \sum_{j\in R} b_{ij}b^*_j$, so $|\{j \in R \mid b_{ij} \ne 0\}|$ is even.
Let $R = R_1\,\dot\cup\,R_2$ such that $\sum_{j\in R_1}b_{ij} - \sum_{j\in R_2}b_{ij} \in \{-1,0,1\}$ for all $i \in \{1,\dots,k\}$. Thus, for $i \in \{2,\dots,k\}$, we have (since $|\{j\in R \mid b_{ij}\ne 0\}|$ is even): $\sum_{j\in R_1}b_{ij} - \sum_{j\in R_2}b_{ij} = 0$. If we also had $\sum_{j\in R_1}b_{1j} - \sum_{j\in R_2}b_{1j} = 0$, then the columns of B would not be linearly independent. Hence, $\sum_{j\in R_1}b_{1j} - \sum_{j\in R_2}b_{1j} \in \{-1,1\}$ and thus $Bx \in \{e_1, -e_1\}$ where the vector $x \in \{-1,0,1\}^k$ is defined by
$$x_j = \begin{cases} 1 & \text{for } j \in R_1\\ -1 & \text{for } j \in R_2\\ 0 & \text{for } j \in \{1,\dots,k\}\setminus R \end{cases}$$
Therefore, $b^* = \det(B)B^{-1}e_1 \in \{\det(B)x,\ -\det(B)x\}$. But both $b^*$ and x are non-zero vectors with entries −1, 0, 1 only, so we can conclude that det(B) ∈ {−1, 1}. 2
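Directly checking the definition is of course exponential, but for tiny matrices it is a handy sanity check. The following brute-force test (a sketch of our own, only suitable for very small matrices because it enumerates all square submatrices and uses floating-point determinants) verifies total unimodularity:

```python
import numpy as np
from itertools import combinations

def is_totally_unimodular(A):
    """Brute-force check: every square submatrix of A must have
    determinant -1, 0 or 1.  Exponential running time."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                d = int(round(np.linalg.det(A[np.ix_(rows, cols)])))
                if d not in (-1, 0, 1):
                    return False
    return True
```

Applied to the incidence matrix of an odd cycle it returns False, while for incidence matrices of small bipartite graphs it returns True, in accordance with the results below.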
This result allows us to prove total unimodularity for some quite important matrices: The incidence matrix of an undirected graph G is the matrix $A_G = (a_{v,e})_{v \in V(G),\, e \in E(G)}$ which is defined by
$$a_{v,e} = \begin{cases} 1, & \text{if } v \in e\\ 0, & \text{if } v \notin e \end{cases}$$
The incidence matrix of a directed graph G is the matrix $A_G = (a_{v,e})_{v \in V(G),\, e \in E(G)}$ which is defined by
$$a_{v,(x,y)} = \begin{cases} -1, & \text{if } v = x\\ 1, & \text{if } v = y\\ 0, & \text{if } v \notin \{x, y\} \end{cases}$$
Proof: Let G be an undirected graph and AG its incidence matrix. Since a matrix is TU if
and only if its transposed matrix is TU, we can apply Theorem 93 to the rows of AG : AG is TU
if and only if for each X ⊆ V (G) there is a partition X = A∪B ˙ with E(G[A]) = E(G[B]) = ∅.
The last condition is satisfied if and only if G is bipartite. 2
Applications:
• The previous theorem can be used to show König's Theorem: The maximum cardinality of a matching in a bipartite graph equals the minimum cardinality of a vertex cover. To see this, let G be a bipartite graph and $A_G$ its incidence matrix. Then, a maximum matching is given by an integral solution of $\max\{\mathbb{1}_m^t x \mid A_G x \le \mathbb{1}_n,\ x \ge 0\}$ and a minimum vertex cover by an integral solution of $\min\{\mathbb{1}_n^t y \mid A_G^t y \ge \mathbb{1}_m,\ y \ge 0\}$. By the previous theorem, $A_G$ is TU, so by Corollary 91 both optima are attained by integral vectors.
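As a quick numerical illustration (a sketch of our own; it relies on scipy's LP solver returning a vertex solution, which by total unimodularity is then integral up to numerical tolerance):

```python
import numpy as np
from scipy.optimize import linprog

# Small bipartite example (a 4-cycle); vertices 0..3, edges as vertex pairs.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n, m = 4, len(edges)

# Incidence matrix A_G: rows = vertices, columns = edges.
A = np.zeros((n, m))
for j, (u, v) in enumerate(edges):
    A[u, j] = A[v, j] = 1

# Matching LP: max 1^t x  s.t.  A x <= 1, x >= 0  (linprog minimizes, so negate c).
matching = linprog(c=-np.ones(m), A_ub=A, b_ub=np.ones(n), bounds=(0, None))
# Vertex cover LP: min 1^t y  s.t.  A^t y >= 1, y >= 0.
cover = linprog(c=np.ones(n), A_ub=-A.T, b_ub=-np.ones(m), bounds=(0, None))

print(matching.x, -matching.fun)   # integral optimum (up to tolerance), value 2
print(cover.x, cover.fun)          # integral optimum (up to tolerance), value 2
```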
Proof: Again, we apply Theorem 93 to the transpose of the incidence matrix. For any set $R \subseteq \{1, \dots, m\}$ we can choose $R_1 := R$ and $R_2 := \emptyset$, satisfying the constraints of Theorem 93. 2
Remark: This result gives a reason for the existence of integral optimum solutions of flow problems. These results can be extended to more general linear functions on the edges of directed graphs (see exercises).
The general strategy of cutting-plane methods can be described as follows: Assume that we are given a polyhedron P and we want to optimize a linear function over the integral vectors in P. To this end, we first find an optimum solution $x^*$ over P. If this belongs to $P_I$, we are done, because then we can also easily compute an integral solution of the same cost. Otherwise, we look for a hyperplane separating $x^*$ from $P_I$, so we ask for a vector c and a number δ such that $c^tx \le \delta$ for all $x \in P_I$ but $c^tx^* > \delta$. Then, we add the constraint $c^tx \le \delta$, solve the linear program again and iterate these steps until we get an integral solution.
How can we find half-spaces that contain $P_I$ but not necessarily P? An easy observation is that if H is a half-space that contains P, then $P_I$ is contained in $H_I$. This motivates the following definition:
Definition 29 Let $P \subseteq \mathbb{R}^n$ be a convex set. Let M be the set of all rational half-spaces $H = \{x \in \mathbb{R}^n \mid c^tx \le \delta\}$ with $P \subseteq H$. Then, we define
$$P' := \bigcap_{H \in M} H_I.$$
We set $P^{(0)} := P$ and $P^{(i)} := \left(P^{(i-1)}\right)'$ for $i \in \mathbb{N}\setminus\{0\}$. $P^{(i)}$ is the i-th Gomory-Chvátal-truncation of P.
$$x^* = \frac{1}{\alpha}\bigl(\alpha x^* - (\alpha-1)y\bigr) + \frac{\alpha-1}{\alpha}\,y.$$
Since $c^t\bigl(\alpha x^* - (\alpha-1)y\bigr) \le c^ty = \lfloor\delta\rfloor$, this shows that $x^*$ is a convex combination of two integral vectors in H, so $x^* \in H_I$. 2
Proposition 97 Let P = {x ∈ Rn | Ax ≤ b} be a rational polyhedron. Then
Let ũ be an optimum solution of the minimum. Since ũt A = ct is integral, this leads to
ũt Az ≤ ⌊ũt b⌋, so
ct z = ũt Az ≤ ⌊ũt b⌋ ≤ ⌊δ⌋.
By the previous lemma, this implies z ∈ HI . Since this is true for any half-space H containing
P , it also shows z ∈ P ′ . 2
Cuts that are given by inequalities of the type ut Ax ≤ ⌊ut b⌋ (for some vector u ≥ 0 with ut A
integral) are called Gomory-Chvátal cuts. They have been used for the first algorithms for
integer linear programming based on cutting planes (see Gomory [1963]).
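As a small illustration of how such a cut is generated (a sketch of our own; in practice the multiplier vector u would come from an optimal dual solution or a simplex tableau, here it is simply given):

```python
import numpy as np

def gomory_chvatal_cut(A, b, u):
    """Return the Gomory-Chvatal cut  (u^t A) x <= floor(u^t b)  for a given
    multiplier vector u >= 0 with u^t A integral (this is checked)."""
    u = np.asarray(u, dtype=float)
    assert np.all(u >= 0)
    lhs = u @ A
    assert np.allclose(lhs, np.round(lhs)), "u^t A must be integral"
    return np.round(lhs), np.floor(u @ b)

# Example: P = {x >= 0 | 2 x1 + 2 x2 <= 3}.  With u = (1/2, 0, 0) we obtain
# the cut x1 + x2 <= 1, which is valid for P_I but cuts off e.g. (3/2, 0).
A = np.array([[2.0, 2.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([3.0, 0.0, 0.0])
print(gomory_chvatal_cut(A, b, [0.5, 0.0, 0.0]))   # (array([1., 1.]), 1.0)
```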
Since Ax ≤ b is TDI, the minimum is attained by an integral vector ỹ. Thus,
ut Ax̃ = ỹ t Ax̃ ≤ ỹ t ⌊b⌋ ≤ ⌊ỹ t b⌋ ≤ ⌊ut b⌋.
This shows P ′ ⊇ {x ∈ Rn | Ax ≤ ⌊b⌋}. 2
Proof: Follows from the previous theorem and the fact that any rational polyhedron can be
described by a rational TDI system with integral matrix (Theorem 87). 2
Dunkel and Schulz [2013] have shown that for any polytope P (no matter whether it is rational or not), P′ is a polytope.
Now assume in addition that P is rational. Since U is unimodular, U x is integral if and only if
x is integral. This implies
(f (P ))I = conv({y ∈ Zn | y = U x, x ∈ P })
= conv({y ∈ Rn | y = U x, x ∈ P, x ∈ Zn })
= conv({y ∈ Rn | y = U x, x ∈ PI })
= f (PI ).
By Theorem 87, we can assume that Ax ≤ b is TDI, A is integral and b is rational. Then, for
any integral vector c for which min{bt y | y t AU −1 = ct , y ≥ 0} is feasible and bounded, also
min{bt y | y t A = ct U, y ≥ 0} is feasible and bounded and ct U is integral. Hence AU −1 x ≤ b is
TDI. Thus, Theorem 98 implies
(f (P ))′ = {x ∈ Rn | AU −1 x ≤ b}′ = {x ∈ Rn | AU −1 x ≤ ⌊b⌋} = f (P ′ ).
2
Remark: This shows as well that (f (P ))(i) = f (P (i) ) for a rational polyhedron P and i ∈ N.
Theorem 103 (Schrijver [1980]) For every rational polyhedron P , there is a number t
with P (t) = PI .
PI = {x ∈ Rn | Cx ≤ d} with some integral matrix C and some rational vector d. If PI = ∅, we
choose C = A and d = b − A′ 11n where A′ arises from A by taking the absolute value of each
entry. Note that {x ∈ Rn | Ax + A′ 11n ≤ b} = ∅ because any vector x∗ with Ax∗ + A′ 11n ≤ b
could be rounded down to an integral vector x with Ax ≤ b.
Let ct x ≤ δ be an inequality in Cx ≤ d. Then, we claim that there is an s ∈ N with
P (s) ⊆ H := {x ∈ Rn | ct x ≤ δ}. The theorem is a direct consequence of this claim.
Proof of the claim: Observe that there is an integral number β ≥ δ with P ⊆ {x ∈ Rn | ct x ≤ β}.
If PI = ∅, this is true by construction. In the case PI ≠ ∅, it follows from the fact that ct x is
bounded over P if and only if it is bounded over PI (Proposition 75).
Assume that the claim is false, so there is an integer γ with δ < γ ≤ β for which there is an
s0 ∈ N with P (s0 ) ⊆ {x ∈ Rn | ct x ≤ γ} but there is no s ∈ N with P (s) ⊆ {x ∈ Rn | ct x ≤ γ−1}.
Then, max{ct x | x ∈ P (s) } = γ for all s ≥ s0 . To see this, assume that max{ct x | x ∈ P (s) } < γ
for some s. Then there is an ϵ > 0 with P (s) ⊆ {x ∈ Rn | ct x ≤ γ − ϵ}. This implies
max{ct x | x ∈ P (s+1) } ≤ γ − 1 because {x ∈ Rn | ct x ≤ γ − ϵ}I ⊆ {x ∈ Rn | ct x ≤ γ − 1}.
Define F := P (s0 ) ∩ {x ∈ Rn | ct x = γ}. Then, dim(F ) < n = dim(P ), so we can apply the
induction hypothesis to F , which implies that there is a number s1 with F (s1 ) = FI . Thus,
F (s1 ) = FI ⊆ PI ∩ {x ∈ Rn | ct x = γ} = ∅.
Proof: We claim that there is a rational polytope Q with P ⊆ Q such that QI = PI . This
is sufficient to show the theorem because we can apply the previous theorem to Q. To prove
the claim, let W be a hypercube containing P . Then, there are finitely many integral vectors
z ∈ W \ P . For each such vector z we choose a rational hyperplane separating z from P . The
set of all the inequalities corresponding to these separating hyperplanes together with the
inequalities that define W give a description of a set Q with the desired properties. 2
For a polyhedron P, the smallest number t with $P^{(t)} = P_I$ (if there is such a number) is called the Chvátal rank of P.
Nevertheless, they are of great practical relevance. Algorithm 5 describes the approach for
integer linear programs but it can be applied to mixed integer linear programs, too. The
algorithm stores a number L which is the cost of the best integral solution found so far (so in
the beginning it is −∞). In each iteration of the main loop, the algorithm chooses a polyhedron
Pj , which is a subset of the given polyhedron P0 , and solves the corresponding linear program. If
this LP is bounded and feasible, the algorithm first checks if the value c∗ of an optimum solution
x∗ is larger than L. If this is not the case, the algorithm can reject the polyhedron Pj because
it cannot contain a better integral solution than the best current solution (this is the bounding
part). If c∗ > L and x∗ is integral, we have found a better integral solution and can update L.
Otherwise, we choose a non-integral component x∗i of x∗ and compute sub-polyhedra P2j+1 and
P2j+2 of Pj with additional constraints that arise by rounding x∗i up or down (branching step).
19 if L > −∞ then
20 return x̃;
21 else
22 return “There is no feasible solution”;
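The following compact sketch (our own; it uses scipy's LP solver for the relaxations, a last-in-first-out choice of the next sub-polyhedron, and no initial heuristic solution) implements the scheme of Algorithm 5 for maximization problems whose LP relaxations are bounded:

```python
import math
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A, b):
    """Sketch of branch-and-bound for max{c^t x | A x <= b, x integral}.
    Assumes all LP relaxations are bounded; branches on the first
    fractional component."""
    L, best = -math.inf, None
    K = [(A, b)]                      # list of open sub-polyhedra P_j
    while K:
        A_j, b_j = K.pop()            # last-in-first-out choice of P_j
        res = linprog(-np.asarray(c), A_ub=A_j, b_ub=b_j, bounds=(None, None))
        if not res.success:           # P_j empty: reject
            continue
        x, value = res.x, -res.fun
        if value <= L:                # bounding step
            continue
        frac = [i for i, xi in enumerate(x) if abs(xi - round(xi)) > 1e-6]
        if not frac:                  # integral solution: update best
            L, best = value, np.round(x)
            continue
        i = frac[0]                   # branching step on component x_i
        row = np.zeros(len(x))
        row[i] = 1.0
        K.append((np.vstack([A_j, row]), np.append(b_j, math.floor(x[i]))))
        K.append((np.vstack([A_j, -row]), np.append(b_j, -math.ceil(x[i]))))
    return best, L

# Hypothetical test instance: max{x1 + x2 | 2 x1 + 2 x2 <= 3, x >= 0, x integral}.
A0 = np.array([[2.0, 2.0], [-1.0, 0.0], [0.0, -1.0]])
b0 = np.array([3.0, 0.0, 0.0])
print(branch_and_bound([1.0, 1.0], A0, b0))   # optimum value 1.0
```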
Example: Consider the following ILP:
$$\max\{-x_1 + 2x_2 \mid -4x_1 + 6x_2 \le 9,\ x_1 + x_2 \le 4,\ x_1, x_2 \ge 0,\ x_1, x_2 \in \mathbb{Z}\}$$
Figure 9 illustrates what the algorithm may do on this instance. Since the optimum solution of the LP-relaxation is not integral, we create in the first branching step two sub-polytopes $P_1 = \{(x_1, x_2) \mid x_2 \le 2\} \cap P_0$ and $P_2 = \{(x_1, x_2) \mid x_2 \ge 3\} \cap P_0 = \emptyset$. In $P_1$ we still do not find an integral optimum solution, so we branch again and get the polytopes $P_3$ and $P_4$. In $P_4$ we get an integral optimum $x^* = (1, 2)$ with cost 3. In $P_3$ we get a non-integral optimum solution $(0, 1.5)$ whose cost is not better than the best integral solution found so far (provided that we considered $P_4$ before $P_3$), so the algorithm will stop here.
A branch-and-bound computation is often represented by a so-called branch-and-bound tree.
This is in fact rather an arborescence than a tree. Its nodes are the polyhedra Pj that are
considered during the computation, and P0 is the root. For any Pj , the nodes P2j+1 and P2j+2
are its children (if they exist).
In line 5 of the algorithm, we have to choose the next LP to be solved, and in line 15 we have to
decide which non-integral component is used for creating new sub-problems. There are different
strategies for these steps (branching rules). For example, it is often reasonable to store the
elements of K in a last-in-first-out queue and to choose the last element that has been added to
K. In the branch-and-bound tree, this corresponds to a leaf with the biggest distance to the
root. This strategy can reduce the time until the first feasible solution has been found. Another reasonable branching rule consists of choosing a polyhedron $P_j$ for which $\max\{c^tx \mid x \in P_j\}$ is as large as possible. Note that the maximum over all these values for all $P_j \in K$ gives an upper bound U on the best possible solution that can still be computed. Hence, by choosing a $P_j$ with $\max\{c^tx \mid x \in P_j\} = U$, we get a chance to reduce U. This can be useful if we do not want to compute an exact optimum solution but rather stop as soon as U − L is small enough.
For the choice of $x_i^*$ a common strategy is to choose $x_i^*$ such that $\left|x_i^* - \lfloor x_i^*\rfloor - \frac{1}{2}\right|$ is minimized. Another, more time-consuming approach is to choose $x_i^*$ such that the effect on the objective function is maximized (strong branching).
Further remarks:
• In order to get at least a finite algorithm, we have to guarantee that in line 8 we always find an integral optimum solution if $P_j$ is integral.
• Instead of initializing L with −∞, it is often possible to compute some reasonable integral
solution by some heuristics. In particular this is often the case for combinatorial problems.
• If one only wants to compute an approximate solution, the condition $c^* > L$ can be strengthened to $c^* > L(1 + \epsilon)$. This way, some optimum solution may be cut off, but the running time can sometimes be reduced drastically. In many cases, an optimum solution is computed quite early but then much time is needed just in order to prove that the solution is optimum.

Fig. 9: The polytopes considered by the branch-and-bound algorithm on the example instance: $P_0$, $P_1$ (with $P_2 = \emptyset$), and the sub-polytopes of the second branching step, together with the lines $-4x_1 + 6x_2 = 9$, $x_1 + x_2 = 4$, $x_2 = 2$, $x_2 = 3$, $x_1 = 0$, and $x_2 = 1$, and the respective LP optima $x^*$.
• The branch-and-bound strategy can be combined with a cutting-plane algorithm (see the
previous section). For each sub-polyhedron Pj , one can try to find hyperplanes separating
some non-integral vectors in Pj from (Pj )I . This combination is called branch-and-cut
method. For example, this approach has been used for solving quite large Traveling Salesman
Problems (see Padberg and Rinaldi [1991]).
Bibliography
Adler, I., Karp, R.M., Shamir, R. [1987]: A simplex variant solving an m × d linear program in
O(min(m2 , d2 )) expected number of steps. Journal of Complexity, 3, 372–387, 1987.
Ahuja, R.K., Magnanti, T.L., and Orlin, J.B. [1993]: Network Flows: Theory, Algorithms, and Applications.
Prentice Hall, 1993.
Anthony, M., and Harvey, M. [2012]: Linear Algebra: Concepts and Methods. Cambridge University
Press, 2012.
Bárány, I., Howe, R., and Lovász, L. [1992]: On integer points in polyhedra: a lower bound. Combina-
torica, 12, 135–142, 1992.
Bertsimas, D., and Tsitsiklis, J.N. [1997]: Introduction to Linear Optimization. Athena Scientific, 1997.
Bertsimas, D., and Weismantel, R. [2005]: Optimization over Integers. Dynamic Ideas, 2005.
Bland, R.G. [1977]: New finite pivoting rules for the simplex method. Mathematics of Operations
Research, 2, 103–107, 1977.
Borgwardt, K. [1982]: The average number of pivot steps required by the simplex method is polynomial.
Zeitschrift für Operations Research, 26, 157–177, 1982.
Chvátal, V. [1983]: Linear programming. Series of books in the mathematical sciences, W.H. Freeman,
1983.
Cunningham, W.H. [1976]: A network simplex method. Mathematical Programming, 11, 105–116,
1976.
Dadush, D. and Huiberts, S. [2019]: A Friendly Smoothed Analysis of the Simplex Method. SIAM Journal on Computing, 49, 5, 449–499, 2019.
Dantzig, G.B. [1951]: Maximization of a linear function of variables subject to linear inequalities. In:
Koopmans, T.C. (ed.), Activity Analysis of Production and Allocation, 359–373, Wiley, 1951.
Dunkel, J. and Schulz, A.S. [2013]: The Gomory-Chvátal closure of a non-rational polytope is a rational
polytope. Mathematics of Operations Research, 38, 1, 63–91, 2013.
Edmonds, J. [1965]: Maximum matching and a polyhedron with 0,1-vertices. Journal of Research of
the National Bureau of Standards, B, 69, 125–130, 1965.
Eisenbrand, F. [2003]: Fast integer programming in fixed dimension. Lecture Notes in Computer
Science, 2832, 196–207, 2003.
Fischer, G. [2009]: Lineare Algebra: Eine Einführung für Studienanfänger. 18th edition, Springer,
2013.
Ghouila-Houri, A. [1962]: Caractérisation des matrices totalement unimodulaires. Comptes Rendus
Hebdomadaires des Séances de l’Académie des Sciences (Paris), 254, 1192-1194, 1962.
Giles, F.R. and Pulleyblank, W.R. [1979]: Total dual integrality and integer polyhedra. Linear Algebra
and Its Applications, 25, 191–196, 1979.
Goldfarb, D. and Sit, W.Y. [1979]: Worst case behavior of the steepest edge simplex method. Discrete
Applied Mathematics, 1, 4, 277–285, 1979.
Gomory, R.E. [1963]: An algorithm for integer solutions of linear programs. In: Recent Advances in
Mathematical Programing (R.L. Graves, P. Wolfe, eds.), McGraw-Hill, 269–302, 1963.
Grötschel, M., Lovász, L. and Schrijver, A. [1981]: The ellipsoid method and its consequences in
combinatorial optimization. Combinatorica, 1, 169–197, 1981.
Guenin, B., Könemann, J., and Tunçel, L. [2014]: A Gentle Introduction to Optimization. Cambridge
University Press, 2014.
Hoffman, A. and Kruskal, J. [1956]: Integral boundary points of convex polyhedra. Linear Inequalities
and Related Systems (H. Kuhn, A. Tucker, eds.), Annals of Mathematics Studies, 38, 223–246, 1956.
Hougardy, S., and Vygen, J. [2018]: Algorithmische Mathematik. Second edition, Springer, 2018.
Kafer, S. [2022]: Polyhedral Diameters and Applications to Optimization. PhD Thesis, University of
Waterloo, 2022.
Kalai, G., and Kleitman, D. [1992]: A quasi-polynomial bound for the diameter of graphs of polyhedra.
Bulletin of the American Mathematical Society, 26, 315–316, 1992.
Khachiyan, L. [1979]: A polynomial algorithm for linear programming. Soviet Mathematics Doklady,
20, 191–194, 1979.
Klee, V., and Minty, G.J. [1972]: How good is the simplex algorithm? In: Inequalities III (O. Shisha,
ed.), Academic Press, 159–175, 1972.
Korte, B., and Vygen, J. [2018]: Combinatorial Optimization: Theory and Algorithms. Sixth edition,
Springer, 2018.
Lee, T., Sidford, A., Wong, S.C. [2015]: A Faster Cutting Plane Method and its Implications for
Combinatorial and Convex Optimization. arxiv.org/abs/1508.04874, Symposium on Foundations of
Computer Science, 2015.
Lenstra, H.W. [1983]: Integer programming with a fixed number of variables. Mathematics of Operations
Research, 8, 538–548, 1983.
Matoušek, J., and Gärtner, B. [2007]: Understanding and Using Linear Programming. Springer, 2007.
Megiddo, N. [1984]: Linear programming in linear time when the dimension is fixed. Journal of the
ACM, 31, 114–127, 1984.
Mehlhorn, K., and Saxena, S. [2015]: A still simpler way of introducing the interior-point method for
linear programming. Computer Science Review, 22, 1–11, 2016.
Padberg, M. [1999]: Linear Optimization and Extensions. Second edition, Springer, 1999
Padberg, M., and Rao, M. [1982]: Odd minimum cut-sets and b-matchings. Mathematics of Operations
Research, 7, 67–80, 1982.
Padberg, M., and Rinaldi, G. [1991]: A Branch-and-Cut Algorithm for the Resolution of Large-Scale
Symmetric Traveling Salesman Problems. SIAM Review, 33, 1, 60–100, 1991.
Panik, M.J. [1996]: Linear Programming: Mathematics, Theory and Algorithms. Kluwer Academic
Publishers, 1996.
Roos, C., Terlaky, T., Vial, J.-P. [2005]: Interior Point Methods for Linear Optimization. Second
edition, Springer, 2005.
Rubin, D. [1970]: On the unlimited number of faces in integer hulls of linear programs with a single
constraint. Operations Research, 18, 5, 940 – 946, 1970.
Santos, F. [2011]: A counterexample to the Hirsch conjecture. Annals of Mathematics, 176, 1, 383–412,
2011.
Sierksma, G., and Zwols, Y. [2015]: Linear and Integer Optimization. Theory and Practice. Third
edition, CRC Press, 2015.
Spielman, D.A., and Teng, S.-H. [2005]: Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM, 51, 3, 385–463, 2004.
Strang, G. [1980]: Linear Algebra and Its Applications. Second edition, Academic Press, 1980.
Tardos, É. [1986]: A strongly polynomial algorithm to solve combinatorial linear programs. Operations
Research, 34, 2, 250 – 256, 1986.
Terlaky, T. [2001]: An easy way to teach interior point methods. European Journal of Operational
Research, 130, 1–19, 2001.
Todd, M. [2014]: An improved Kalai-Kleitman bound for the diameter of a polyhedron. SIAM Journal on Discrete Mathematics, 28, 4, 1944–1947, 2014.
Vanderbei, R.J. [2014]: Linear Programming: Foundations and Extensions. Fourth edition, Springer,
2014.
Ye, Y. [1992]: On the finite convergence of interior-point algorithms for linear programming. Mathe-
matical Programming, 57, 325–335, 1992.
Ye, Y. [1997]: Interior Point Algorithms. Theory and Analysis. Wiley, 1997.