
2025 Lent Partii Logic and Set Notes

The document is a comprehensive overview of propositional logic, set theory, and related concepts, structured into sections covering definitions, examples, and theorems. It introduces key terms such as tautology, semantic entailment, and syntactic entailment, along with axioms and rules of deduction. The document aims to establish foundational principles in logic and set theory, culminating in the completeness theorem.

Uploaded by

Angelo Oppio
Copyright
© All Rights Reserved

Part II Logic and Set Theory

András Zsák
Lent 2025

Contents
1 Propositional Logic

2 Well-Orderings and Ordinals

3 Posets and Zorn’s Lemma

4 First-order Predicate Logic

5 Set Theory

6 Cardinal Arithmetic

7 *Classical descriptive set theory*

1 Propositional Logic

Definition. The language of propositional logic consists of a set P of primitive
propositions and the set L = L(P) of propositions (or compound propositions),
which is defined inductively as follows.

(i) P ⊂ L,

(ii) ⊥ ∈ L (the symbol ‘⊥’ is read ‘false’ or ‘bottom’),

(iii) if p, q ∈ L then (p ⇒ q) ∈ L.

Examples. We often use P = {p_1, p_2, p_3, …}. In other words, we need a countably
infinite supply of primitive propositions. The following are then examples
of compound propositions.

(p_1 ⇒ p_2), ((p_1 ⇒ ⊥) ⇒ p_3), ((p_1 ⇒ p_2) ⇒ (p_1 ⇒ p_3))

Moreover, for any p ∈ L, the proposition ((p ⇒ ⊥) ⇒ ⊥) is also in L.

Remarks. 1. ‘L defined inductively’ means, more precisely, that L = ⋃_{n∈N} L_n,
where
L_1 = P ∪ {⊥}
and for n ∈ N,
L_{n+1} = L_n ∪ {(p ⇒ q) : p, q ∈ L_n}.

2. A proposition is a finite string of symbols from the alphabet P ∪ {⊥, ⇒, (, )}.
It is easy to check that L is the smallest (with respect to inclusion) subset of
the set Σ of all finite strings of symbols from the alphabet P ∪ {⊥, ⇒, (, )}
satisfying clauses (i)-(iii) above. Note that L ≠ Σ. For example, the string
⇒ p_1 )(
is in Σ but is not a proposition.
3. Every proposition is built uniquely using clauses (i)-(iii) above, i.e., for every
p ∈ L, either p ∈ P or p = ⊥ or p is (q ⇒ r) for unique q, r ∈ L.
4. We will later introduce other logical connectives like ∧ and ∨.

Semantic Entailment
Definition. A valuation on L is a function v : L → {0, 1} such that:
(i) v(⊥) = 0,
(ii) v(p ⇒ q) = 0 if v(p) = 1 and v(q) = 0, and v(p ⇒ q) = 1 otherwise,

for all p, q ∈ L.

Example. If v(p_1) = 1 and v(p_2) = 0, then v((⊥ ⇒ p_1) ⇒ (p_1 ⇒ p_2)) = 0, since v(⊥ ⇒ p_1) = 1 and v(p_1 ⇒ p_2) = 0.

Proposition 1.
(i) If v and v′ are valuations with v|_P = v′|_P, then v = v′.

(ii) For any function w : P → {0, 1}, there exists a valuation v on L such that
v|_P = w.

Remark. The proposition says that a valuation is determined by its values on
P, and any values will do.
Proof. (i) Let L′ = {p ∈ L : v(p) = v′(p)}. By assumption, L_1 ⊂ L′. If
p, q ∈ L′, then v(p ⇒ q) = v′(p ⇒ q) by definition. It follows that L_n ⊂ L′
implies L_{n+1} ⊂ L′ for every n ∈ N. By induction, L_n ⊂ L′ for all n ∈ N, and
thus v = v′.
(ii) Set v(p) = w(p) for every p ∈ P, and set v(⊥) = 0. This defines v on L_1.
Having defined v on L_n for some n ∈ N, for p ∈ L_{n+1} \ L_n, write p = (q ⇒ r)
for unique q, r ∈ L_n, and define v(p) = 0 if v(q) = 1 and v(r) = 0, and v(p) = 1
otherwise. This defines v on L_{n+1}. Thus, we obtain a valuation v on L with v|_P = w.
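Illustration. The recursive construction in the proof of Proposition 1(ii) can be sketched in a few lines of Python. This is only a minimal sketch and not part of the formal development: the encoding of propositions as nested tuples and the names BOT, imp and extend are assumptions made for the illustration.

```python
# Assumed encoding (illustrative only): a primitive proposition is a str,
# falsum is the constant BOT, and (p ⇒ q) is the tuple ("imp", p, q).
BOT = "⊥"


def imp(p, q):
    """Build the compound proposition (p ⇒ q)."""
    return ("imp", p, q)


def extend(w):
    """Return the valuation v on L determined by a dict w : P -> {0, 1}."""
    def v(t):
        if t == BOT:
            return 0                    # clause (i): v(⊥) = 0
        if isinstance(t, str):
            return w[t]                 # v agrees with w on P
        _, p, q = t
        # clause (ii): (p ⇒ q) is false only if p is true and q is false
        return 0 if (v(p) == 1 and v(q) == 0) else 1
    return v


# The example following the definition of a valuation: v(p1) = 1, v(p2) = 0.
v = extend({"p1": 1, "p2": 0})
example = imp(imp(BOT, "p1"), imp("p1", "p2"))
print(v(example))  # prints 0
```

The recursion on the structure of t mirrors the induction on n in the proof: the value on (q ⇒ r) is computed from the values on q and r, which were defined at an earlier stage.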

Definition. Say t ∈ L is a tautology if v(t) = 1 for all valuations v.

Definition. At this point, we enrich our language by adding the symbols ⊤
(‘true’ or ‘top’), ∧ (‘and’), ∨ (‘or’), ¬ (‘not’) and ⇔ (‘iff’) as abbreviations as
follows.

⊤ = (⊥ ⇒ ⊥)
¬p = (p ⇒ ⊥)
(p ∨ q) = (¬p ⇒ q)
(p ∧ q) = ¬(p ⇒ ¬q)
(p ⇔ q) = ((p ⇒ q) ∧ (q ⇒ p))

for any p, q ∈ L. We have v(⊤) = 1 for any valuation v, and so ⊤ is a tautology.
Similarly, v(¬p), v(p ∨ q) and v(p ∧ q) have the expected values.

Examples. The following three examples of tautologies will play a role later.

1. (p ⇒ (q ⇒ p)) (‘a true statement is implied by anything’).
We check:

v(p)  v(q)  v(q ⇒ p)  v(p ⇒ (q ⇒ p))
 0     0       1             1
 0     1       0             1
 1     0       1             1
 1     1       1             1

Note that an identically 1 column indicates a tautology.

2. (¬¬p ⇒ p), i.e., (((p ⇒ ⊥) ⇒ ⊥) ⇒ p) (‘the law of excluded middle’).
This can also be written as (¬p ∨ p).

v(p)  v(p ⇒ ⊥)  v((p ⇒ ⊥) ⇒ ⊥)  v(((p ⇒ ⊥) ⇒ ⊥) ⇒ p)
 0       1            0                   1
 1       0            1                   1

so it is a tautology.
  
3. (p ⇒ (q ⇒ r)) ⇒ ((p ⇒ q) ⇒ (p ⇒ r))
Suppose it is not a tautology. Then

v(p ⇒ (q ⇒ r)) = 1 and v((p ⇒ q) ⇒ (p ⇒ r)) = 0

for some valuation v. Then v(p ⇒ q) = 1 and v(p ⇒ r) = 0. Thus, v(p) =
1, v(r) = 0, and so v(q) = 1, v(q ⇒ r) = 0 and v(p ⇒ (q ⇒ r)) = 0, a
contradiction.
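Illustration. The truth-table checks above are mechanical, so they can be sketched as a short program. The following Python sketch is only an illustration under stated assumptions: propositions are encoded as nested tuples, and the names BOT, imp, neg, atoms, value and is_tautology are hypothetical helpers, not notation from these notes. It checks the three examples for primitive p, q, r only.

```python
from itertools import product

# Assumed encoding (illustrative only): primitive propositions are str,
# falsum is BOT, and (p ⇒ q) is the tuple ("imp", p, q).
BOT = "⊥"


def imp(p, q):
    return ("imp", p, q)


def neg(p):
    """The abbreviation ¬p = (p ⇒ ⊥)."""
    return imp(p, BOT)


def atoms(t):
    """Set of primitive propositions occurring in t."""
    if t == BOT:
        return set()
    if isinstance(t, str):
        return {t}
    return atoms(t[1]) | atoms(t[2])


def value(t, w):
    """Value of t under the valuation determined by the dict w."""
    if t == BOT:
        return 0
    if isinstance(t, str):
        return w[t]
    return 0 if (value(t[1], w) == 1 and value(t[2], w) == 0) else 1


def is_tautology(t):
    """Check every row of the truth table of t."""
    names = sorted(atoms(t))
    return all(
        value(t, dict(zip(names, bits))) == 1
        for bits in product((0, 1), repeat=len(names))
    )


p, q, r = "p", "q", "r"
ex1 = imp(p, imp(q, p))
ex2 = imp(neg(neg(p)), p)
ex3 = imp(imp(p, imp(q, r)), imp(imp(p, q), imp(p, r)))
print([is_tautology(t) for t in (ex1, ex2, ex3)])  # prints [True, True, True]
```

By Proposition 1 it is enough to range over the values of the primitive propositions that actually occur in t, which is why the table has only finitely many rows.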

Definition. For S ⊂ L and t ∈ L, say S entails t (or semantically entails t),
written S ⊨ t, if for every valuation v,

v(s) = 1 for all s ∈ S implies v(t) = 1,

i.e., ‘whenever all of S is true, t is true as well.’

Examples. {p, p ⇒ q} ⊨ q and {p ⇒ q, q ⇒ r} ⊨ (p ⇒ r).

Note. t is a tautology if and only if ∅ ⊨ t, which we abbreviate to ⊨ t.

Definition. For t ∈ L, if v(t) = 1, then say t is true in v or v is a model of t.
For S ⊂ L, a valuation v is a model of S if v(s) = 1 for all s ∈ S.

Note. S ⊨ t says that t is true in every model of S.

Syntactic Entailment

A notion of proof consists of axioms and deduction rules. In Propositional Logic
we adopt the following propositions as axioms.
(A1) p ⇒ (q ⇒ p) (p, q ∈ L)

(A2) (p ⇒ (q ⇒ r)) ⇒ ((p ⇒ q) ⇒ (p ⇒ r)) (p, q, r ∈ L)

(A3) ¬¬p ⇒ p (p ∈ L)
These are more accurately called ‘axiom-schemes’, as each is an infinite collection
of axioms.

Note. The axioms are all tautologies.

We will have only one deduction rule, called modus ponens: ‘from p and p ⇒ q,
we can deduce q’.

For S ⊂ L and t ∈ L, a proof of t from S is a finite sequence t_1, t_2, …, t_n of
propositions such that t_n = t and for each i,
(i) either t_i is an axiom,

(ii) or t_i is a member of S (premiss or hypothesis),

(iii) or t_i follows from earlier lines by modus ponens (MP): there exist j, k < i
with t_k = (t_j ⇒ t_i).

If there exists a proof of t from S, say S proves t, or syntactically entails t,
written S ⊢ t.

Say t is a theorem if ∅ ⊢ t, which we simply denote ⊢ t.

Examples. 1. {p ⇒ q, q ⇒ r} ⊢ (p ⇒ r)

(p ⇒ (q ⇒ r)) ⇒ ((p ⇒ q) ⇒ (p ⇒ r))    (A2)
(q ⇒ r) ⇒ (p ⇒ (q ⇒ r))                (A1)
q ⇒ r                                  (premiss)
p ⇒ (q ⇒ r)                            (MP)
(p ⇒ q) ⇒ (p ⇒ r)                      (MP)
p ⇒ q                                  (premiss)
p ⇒ r                                  (MP)

2. ⊢ (p ⇒ p)

p ⇒ ((p ⇒ p) ⇒ p)                                    (A1)
(p ⇒ ((p ⇒ p) ⇒ p)) ⇒ ((p ⇒ (p ⇒ p)) ⇒ (p ⇒ p))      (A2)
(p ⇒ (p ⇒ p)) ⇒ (p ⇒ p)                              (MP)
p ⇒ (p ⇒ p)                                          (A1)
p ⇒ p                                                (MP)
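Illustration. Whether a given sequence of propositions is a proof from S can be checked mechanically, line by line. The following Python sketch shows one way to do this; the tuple encoding and the names BOT, imp, is_imp, is_axiom and is_proof are assumptions made for the illustration and are not part of these notes.

```python
# Assumed encoding (illustrative only): primitive propositions are str,
# falsum is BOT, and (p ⇒ q) is the tuple ("imp", p, q).
BOT = "⊥"


def imp(p, q):
    return ("imp", p, q)


def is_imp(t):
    return isinstance(t, tuple) and t[0] == "imp"


def is_axiom(t):
    """Recognise instances of the schemes (A1), (A2) and (A3)."""
    if not is_imp(t):
        return False
    a, b = t[1], t[2]
    # (A1): a ⇒ (q ⇒ a)
    if is_imp(b) and b[2] == a:
        return True
    # (A2): (p ⇒ (q ⇒ r)) ⇒ ((p ⇒ q) ⇒ (p ⇒ r))
    if (is_imp(a) and is_imp(a[2]) and is_imp(b)
            and is_imp(b[1]) and is_imp(b[2])):
        p, q, r = a[1], a[2][1], a[2][2]
        if b == imp(imp(p, q), imp(p, r)):
            return True
    # (A3): ((b ⇒ ⊥) ⇒ ⊥) ⇒ b
    return a == imp(imp(b, BOT), BOT)


def is_proof(lines, S):
    """Check that each line is an axiom, a premiss, or follows by MP."""
    for i, t in enumerate(lines):
        by_mp = any(lines[k] == imp(lines[j], t)
                    for j in range(i) for k in range(i))
        if not (is_axiom(t) or t in S or by_mp):
            return False
    return True


# Example 2: the five-line proof of (p ⇒ p) from no premisses.
p = "p"
pp = imp(p, p)
lines = [
    imp(p, imp(pp, p)),                            # (A1)
    imp(imp(p, imp(pp, p)), imp(imp(p, pp), pp)),  # (A2)
    imp(imp(p, pp), pp),                           # (MP)
    imp(p, pp),                                    # (A1)
    pp,                                            # (MP)
]
print(is_proof(lines, set()))  # prints True
```

Note that checking a proof is easy, whereas finding one is a different matter: the checker only inspects the lines it is given.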

In showing S ⊢ t, the following result is often helpful.

Proposition 2. (Deduction Theorem) Let S ⊂ L and p, q ∈ L. Then we
have S ⊢ (p ⇒ q) if and only if S ∪ {p} ⊢ q.

Remark. So ‘⇒’ really does behave like implication in formal proofs.

Note. To show {p ⇒ q, q ⇒ r} ⊢ (p ⇒ r) (Example 1 above), it is enough to
show that {p ⇒ q, q ⇒ r, p} ⊢ r. This is much easier: write down all three
premisses and apply modus ponens twice.

Proof. Assume that S ⊢ (p ⇒ q). Write down a proof of p ⇒ q from S and add
the following two lines to obtain a proof of q from S ∪ {p}:

p    (premiss)
q    (MP)

Conversely, assume that S ∪ {p} ⊢ q and let t_1, t_2, …, t_n = q be a proof of q
from S ∪ {p}. We show that S proves p ⇒ t_i for each i by induction. Then in
particular, S proves p ⇒ q and we are done.
Case 1: If t_i is an axiom or t_i ∈ S, then

t_i ⇒ (p ⇒ t_i)    (A1)
t_i                (axiom or premiss)
p ⇒ t_i            (MP)

is a proof of p ⇒ t_i from S.
Case 2: If t_i = p, then S ⊢ (p ⇒ t_i) since ⊢ (p ⇒ p) (Example 2 above).
Case 3: Finally, if there exist j, k < i such that t_k = (t_j ⇒ t_i), then by the induction
hypothesis, there are proofs of p ⇒ t_j and p ⇒ (t_j ⇒ t_i) from S. Adding the
lines

(p ⇒ (t_j ⇒ t_i)) ⇒ ((p ⇒ t_j) ⇒ (p ⇒ t_i))    (A2)
(p ⇒ t_j) ⇒ (p ⇒ t_i)                          (MP)
p ⇒ t_i                                        (MP)

we obtain a proof of p ⇒ t_i from S.
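Illustration. The converse direction of the proof is constructive: it rewrites a proof of q from S ∪ {p} line by line into a proof of p ⇒ q from S. The following Python sketch follows the three cases above. The encoding is an assumption made for the illustration: a proof is a list of (proposition, reason) pairs, where reason is "axiom", "premiss" or ("mp", j, k) with 0-based indices j, k of earlier lines such that line k is (line j ⇒ this line); the names imp, proof_of_p_implies_p and discharge are hypothetical.

```python
def imp(p, q):
    """(p ⇒ q) encoded as a tuple; primitive propositions are plain str."""
    return ("imp", p, q)


def proof_of_p_implies_p(p):
    """The five-line proof of (p ⇒ p) from Example 2."""
    pp = imp(p, p)
    return [
        imp(p, imp(pp, p)),                            # (A1)
        imp(imp(p, imp(pp, p)), imp(imp(p, pp), pp)),  # (A2)
        imp(imp(p, pp), pp),                           # (MP)
        imp(p, pp),                                    # (A1)
        pp,                                            # (MP)
    ]


def discharge(proof, p):
    """Turn a proof of q from S ∪ {p} into a proof of (p ⇒ q) from S.

    Returns the new proof as a list of propositions.
    """
    out = []
    for t, reason in proof:
        if t == p and reason == "premiss":       # Case 2: t_i = p
            out.extend(proof_of_p_implies_p(p))
        elif reason in ("axiom", "premiss"):      # Case 1: axiom or t_i ∈ S
            out.extend([imp(t, imp(p, t)), t, imp(p, t)])
        else:                                     # Case 3: modus ponens
            _, j, _k = reason
            tj = proof[j][0]
            out.extend([
                imp(imp(p, imp(tj, t)), imp(imp(p, tj), imp(p, t))),  # (A2)
                imp(imp(p, tj), imp(p, t)),                           # (MP)
                imp(p, t),                                            # (MP)
            ])
    return out


# The easy proof of r from {p ⇒ q, q ⇒ r, p} ...
proof = [
    ("p", "premiss"),
    (imp("p", "q"), "premiss"),
    ("q", ("mp", 0, 1)),
    (imp("q", "r"), "premiss"),
    ("r", ("mp", 2, 3)),
]
# ... becomes a proof of (p ⇒ r) from {p ⇒ q, q ⇒ r}.
out = discharge(proof, "p")
print(len(out), out[-1] == imp("p", "r"))
```

The resulting proof is longer than the one written by hand in Example 1, which is typical: the construction is uniform rather than economical.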


Our aim is to prove the Completeness Theorem: S ⊢ t if and only if S ⊨ t. This
is made up of soundness (if S ⊢ t then S ⊨ t) and adequacy (if S ⊨ t then S ⊢ t).
Soundness says that our notion of proof is sound: it doesn’t lead to absurd
conclusions. Adequacy says that our notion of proof is sufficiently strong to
prove from S every semantic consequence of S.

Proposition 3. (Soundness Theorem) Let S ⊂ L and t ∈ L. Then S ⊢ t
implies S ⊨ t.
Proof. Let t_1, t_2, …, t_n = t be a proof of t from S. Let v be a model of S.
We show that v(t_i) = 1 for all i by induction. Then in particular, v(t) = 1, as
required.
If t_i is an axiom, then v(t_i) = 1 since axioms are tautologies. If t_i is a
premiss, then v(t_i) = 1 since v is a model of S. Finally, if there exist j, k < i
such that t_k = (t_j ⇒ t_i), then v(t_j) = v(t_j ⇒ t_i) = 1 by the induction hypothesis,
and hence v(t_i) = 1.

Definition. Let S ⊂ L. Say S is inconsistent if S ⊢ ⊥; otherwise S is consistent.

Note. A special case of adequacy: S ⊨ ⊥ implies S ⊢ ⊥, i.e., if S has no model,
then S is inconsistent. Equivalently, if S is consistent, then S has a model.

Theorem 4. (Model existence lemma) Let S ⊂ L. If S is consistent, then
S has a model.

Idea. Note that if S ⊢ t, then S ⊨ t (soundness), so v(t) = 1 for every model v
of S. So we could try v(t) = 1 if S ⊢ t, and v(t) = 0 otherwise.
However, this doesn’t work. It is possible to have t ∈ L with S ⊬ t and S ⊬ ¬t.
So we could try to enlarge S by adding either t or ¬t for every t ∈ L, while
keeping S consistent.
Proof. We present the proof in the case that the set P of primitive propositions is
countable. The general case will be presented in Chapter 3. Note that if P is
countable, then so is L_1 = P ∪ {⊥}. It follows by induction that L_n is countable
for every n ∈ N, and hence L = ⋃_n L_n is also countable. We enumerate L as
t_1, t_2, t_3, … .
Next observe that if T ⊂ L is consistent and t ∈ L, then either T ∪ {t} or
T ∪ {¬t} is consistent. Indeed, if T ∪ {t} ⊢ ⊥ and T ∪ {¬t} ⊢ ⊥, then T ⊢ ¬t
and T ⊢ ¬¬t by the Deduction Theorem, which in turn implies that T ⊢ ⊥ by
modus ponens – a contradiction.
We now start with a consistent S ⊂ L, set S_0 = S and define S_n ⊂ L for
each n ∈ N inductively as follows. Having defined S_{n−1} and assuming S_{n−1} is
consistent, we let S_n be either S_{n−1} ∪ {t_n} or S_{n−1} ∪ {¬t_n}, whichever is consistent.
Finally, we let S̄ = ⋃_{n≥0} S_n.
By construction, for each t ∈ L, either t ∈ S̄ or ¬t ∈ S̄. We note the following
two properties of S̄.
S̄ is consistent: if S̄ ⊢ ⊥, then since proofs are finite, we have S_n ⊢ ⊥ for some
n – a contradiction.
S̄ is deductively closed: if S̄ ⊢ t, then t ∈ S̄. Indeed, if t ∉ S̄, then ¬t ∈ S̄. It
follows that S̄ proves both t and ¬t, and hence S̄ ⊢ ⊥, contradicting that S̄ is
consistent.
We now define v : L → {0, 1} by v(t) = 1 if S̄ ⊢ t (or equivalently, if t ∈ S̄), and
v(t) = 0 otherwise.
We show that v is a valuation on L. It will then follow that v is a model of S,
since S ⊂ S̄, completing the proof.
Firstly, v(⊥) = 0 since ⊥ ∉ S̄ as S̄ is consistent. Next, we examine v(p ⇒ q)
for arbitrary p, q ∈ L.
Case 1: v(p) = 1 and v(q) = 0, i.e., p ∈ S̄ and q ∉ S̄. We need to show that
v(p ⇒ q) = 0. If not, then (p ⇒ q) ∈ S̄. Then the sequence

p        (premiss)
p ⇒ q    (premiss)
q        (MP)

is a proof of q from S̄. Since S̄ is deductively closed, it follows that q ∈ S̄ –
a contradiction.
Case 2: v(q) = 1, i.e., q ∈ S̄. We need to show that v(p ⇒ q) = 1, i.e., that
(p ⇒ q) ∈ S̄. The sequence

q              (premiss)
q ⇒ (p ⇒ q)    (A1)
p ⇒ q          (MP)

is a proof of (p ⇒ q) from S̄. Since S̄ is deductively closed, it follows that
(p ⇒ q) ∈ S̄ as required.
Case 3: v(p) = 0, i.e., p ∉ S̄. We need to show that v(p ⇒ q) = 1, i.e., that
(p ⇒ q) ∈ S̄. As in previous cases, since S̄ is deductively closed, it is enough
to show that S̄ ⊢ (p ⇒ q), which is equivalent to showing that S̄ ∪ {p} ⊢ q by
the Deduction Theorem. Note that ¬p ∈ S̄ since p ∉ S̄. The following then is
a proof of q from S̄ ∪ {p}.

p                (premiss)
¬p               (premiss)
⊥                (MP)
⊥ ⇒ (¬q ⇒ ⊥)     (A1)
¬¬q              (MP)
¬¬q ⇒ q          (A3)
q                (MP).

Corollary 5. (Adequacy Theorem) Let S ⊂ L and t ∈ L. If S ⊨ t, then
S ⊢ t.
Proof. If S ⊨ t, then S ∪ {¬t} ⊨ ⊥, i.e., S ∪ {¬t} has no model. Hence by Theorem 4,
we have S ∪ {¬t} ⊢ ⊥, which in turn implies S ⊢ ¬¬t by the Deduction Theorem.
We now obtain a proof of t from S by adding the following lines to a proof of ¬¬t from S.

¬¬t ⇒ t    (A3)
t          (MP).

Theorem 6. (Completeness Theorem) Let S ⊂ L and t ∈ L. Then S ⊨ t if
and only if S ⊢ t.

Proof. ‘If’ is soundness (Proposition 3).
‘Only if’ is adequacy (Corollary 5).

Corollary 7. (Compactness Theorem) Let S ⊂ L and t ∈ L. If S ⊨ t, then
there is a finite subset S′ of S such that S′ ⊨ t.
Proof. Trivial if ‘⊨’ is replaced with ‘⊢’ since proofs are finite. The result then
follows from the Completeness Theorem.

Note. This is highly non-trivial without completeness. A special case of
Corollary 7 is the following.

Corollary 8. Let S ⊂ L. If every finite subset of S has a model, then S has a
model.
Proof. If S does not have a model, then S ⊨ ⊥. By Corollary 7 there is a finite
subset S′ of S such that S′ ⊨ ⊥, i.e., S′ has no model.

Remark. Sometimes Corollary 8 is called the Compactness Theorem. It implies
Corollary 7. Indeed, assume that S ⊨ t. Then S ∪ {¬t} ⊨ ⊥. By Corollary 8
there is a finite subset S′ of S such that S′ ∪ {¬t} ⊨ ⊥, and thus S′ ⊨ t.

Corollary 9. (Decidability Theorem) Let S ⊂ L be a finite set and t ∈ L.
Then there is an algorithm that determines in finite time whether S ⊢ t or not.
Proof. Trivial if ‘⊢’ is replaced with ‘⊨’ by simply writing out a truth table of
S ∪ {t} for the 2^n possible values of the primitive propositions appearing in
members of S ∪ {t}, where n is the number of such propositions. The result then
follows from the Completeness Theorem.

Remark. If S ⊢ t, then a proof can be found in finite time: by writing out all
proofs from S, we will eventually arrive at t. However, this algorithm does not
terminate if S ⊬ t.
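Illustration. The truth-table algorithm in the proof of Corollary 9 can be sketched as follows. This Python sketch decides S ⊨ t for a finite S, which by completeness is the same as deciding S ⊢ t. The tuple encoding of propositions and the names BOT, imp, atoms, value and entails are assumptions made for the illustration.

```python
from itertools import product

# Assumed encoding (illustrative only): primitive propositions are str,
# falsum is BOT, and (p ⇒ q) is the tuple ("imp", p, q).
BOT = "⊥"


def imp(p, q):
    return ("imp", p, q)


def atoms(t):
    """Set of primitive propositions occurring in t."""
    if t == BOT:
        return set()
    if isinstance(t, str):
        return {t}
    return atoms(t[1]) | atoms(t[2])


def value(t, w):
    """Value of t under the valuation determined by the dict w."""
    if t == BOT:
        return 0
    if isinstance(t, str):
        return w[t]
    return 0 if (value(t[1], w) == 1 and value(t[2], w) == 0) else 1


def entails(S, t):
    """Decide S ⊨ t for a finite collection S by checking all 2^n rows."""
    names = sorted(set().union(atoms(t), *(atoms(s) for s in S)))
    for bits in product((0, 1), repeat=len(names)):
        w = dict(zip(names, bits))
        if all(value(s, w) == 1 for s in S) and value(t, w) == 0:
            return False  # this row gives a model of S in which t is false
    return True


p, q, r = "p", "q", "r"
print(entails([p, imp(p, q)], q))                   # prints True
print(entails([imp(p, q), imp(q, r)], imp(p, r)))   # prints True
print(entails([imp(p, q)], q))                      # prints False
```

The running time is exponential in the number n of primitive propositions involved, but it is always finite, in contrast to the proof search described in the Remark.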

2 Well-Orderings and Ordinals
A linear order or total order on a set X is a relation < on X that is
(i) irreflexive: ¬(x < x) for all x ∈ X.

(ii) transitive: (x < y) ∧ (y < z) ⇒ (x < z) for all x, y, z ∈ X.

(iii) trichotomous: (x < y) ∨ (x = y) ∨ (y < x) for all x, y ∈ X.


We will say ‘X is linearly ordered by <’ or simply ‘X is a linearly ordered set’.

Note. In (iii) exactly one of the three possibilities holds. E.g., if x < y and
y < x, then x < x by (ii), which contradicts (i).

Examples. N, Z, Q and R in the usual order. Note that N = {1, 2, 3, . . . }.

Note. For a set X of size at least 2, the relation on the power set PX of X (the
set of all subsets of X) defined by a < b if a ⊂ b and a ≠ b is not trichotomous.
Note that a ⊂ b means: (x ∈ a) ⇒ (x ∈ b) for all x ∈ X, which includes the
case a = b.

Notation. We write ‘x > y’ for ‘y < x’ and ‘x ≤ y’ for ‘x < y or x = y’. Then
the relation ≤ is
(i) reflexive: x ≤ x for all x ∈ X

(ii) antisymmetric: (x ≤ y) ∧ (y ≤ x) ⇒ (x = y) for all x, y ∈ X

(iii) transitive: (x ≤ y) ∧ (y ≤ z) ⇒ (x ≤ z) for all x, y, z ∈ X

(iv) trichotomous: (x ≤ y) ∨ (y ≤ x) for all x, y ∈ X.

Note. If X is linearly ordered by <, then any subset Y of X is linearly ordered
by < (or, more precisely, by the restriction of < to Y).

Definition. A well-ordering of a set X is a linear order < on X such that every
non-empty subset of X has a least element:

(∀ S ⊂ X)(S ≠ ∅ ⇒ (∃ x ∈ S)(∀ y ∈ S)(x ≤ y))

Note that the least element is unique by antisymmetry.
We will say ‘X is well-ordered by <’ or simply ‘X is a well-ordered set’.

Examples. N is well-ordered by the usual order.
Z, Q and R are not (e.g., they have no least element).
The subset [0, ∞) of R does have a least element, but it is not well-ordered:
e.g., the subset (0, ∞) has no least element.

Note. A subset of a well-ordered set is well-ordered. We will see that Q has a
rich collection of well-ordered subsets.

Definition. Two linearly ordered sets X and Y are order-isomorphic if there
is a bijection f : X → Y that is order-preserving: x < y implies f(x) < f(y).
We say f is an order-isomorphism. Note that f^{-1} is also an order-isomorphism,
and thus x < y ⇐⇒ f(x) < f(y).

Note. If the linearly ordered sets X and Y are order-isomorphic, and X is
well-ordered, then so is Y.

Examples. N and Q are not order-isomorphic.
Q and Q \ {0} are order-isomorphic.
A = {1/2, 2/3, 3/4, …} is order-isomorphic to N (via n ↦ n/(n+1)).
B = A ∪ {1} is well-ordered, not order-isomorphic to N.
C = A ∪ {2} is order-isomorphic to B.
D = A ∪ (A + 1) is well-ordered, not order-isomorphic to A or B.

Definition. A subset I of a linearly ordered set X is an initial segment of X
if x ∈ I and y < x implies y ∈ I.

Examples. {1, 2, 3, 4} is an initial segment of N, {1, 2, 3, 5} is not.
The real interval (0, 1) is an initial segment of (0, ∞).
In general, for every x ∈ X, the subset I_x = {y ∈ X : y < x} is a proper initial
segment of X by transitivity. Not every proper initial segment of X is of this
form in general (e.g., the subset (−∞, 1] of R).

Note. If I is a proper initial segment of a well-ordered set X, then I = I_x for
x the least element of X \ I. Indeed, if y ∈ I_x, then y < x, so y ∈ I by choice
of x. Conversely, if y ∈ I and x ≤ y, then x ∈ I as I is an initial segment – a
contradiction, so y < x, i.e., y ∈ I_x.

Lemma 1. Let X and Y be well-ordered sets, I be an initial segment of Y
and f : X → I be an order-isomorphism. Then f(x) is the least element of
Y \ {f(y) : y < x} for every x ∈ X.
Proof. The set A = Y \ {f(y) : y < x} is not empty since f(x) ∈ A. Let a be
the least element of A. Then a ≤ f(x) and f(x) ∈ I, so a ∈ I. Thus a = f(z)
for some z ∈ X. We need to show that z = x.
Since f(z) = a ≤ f(x), it follows that z ≤ x as f is order-preserving.
If z < x, then f(z) ∉ A by definition of A, which contradicts the choice of a.
Thus, z = x as required.

Proposition 2. (Proof by induction) Let X be a well-ordered set and S ⊂ X
such that for every x ∈ X the following holds: if y ∈ S for all y < x, then x ∈ S.
Then S = X.

Note. Assume that S is defined in terms of a property p: S = {x ∈ X : p(x)}.
Then Proposition 2 says:

(∀ x)(((∀ y < x) p(y)) ⇒ p(x)) =⇒ (∀ x) p(x).

The ‘base case’ (p(x) for the least element x of X) is included in the assumption
‘((∀ y < x) p(y)) ⇒ p(x)’.

Proof. If S ≠ X, then X \ S has a least element x, say. For y < x, we have
y ∈ S by choice of x. It follows from the assumption on S that x ∈ S –
a contradiction.

Remark. We next show an example of how induction is used.

Proposition 3. Let X and Y be well-ordered sets that are order-isomorphic.
Then there is a unique order-isomorphism from X to Y.

Note. This is false in general for linearly ordered sets. E.g., from Z → Z we
have n ↦ n or n ↦ n + 17; from [0, ∞) → [0, ∞) we have x ↦ x or x ↦ x^2.
Proof. Let f, g : X → Y be order-isomorphisms. We show (∀ x)(f(x) = g(x))
by induction.
Let x ∈ X and assume that f(y) = g(y) for all y < x (the induction hypothesis).
We need to show that f(x) = g(x), which will then complete the induction. By
Lemma 1, f(x) is the least element of A = Y \ {f(y) : y < x}, and g(x) is the
least element of B = Y \ {g(y) : y < x}. By the induction hypothesis, A = B,
and hence f(x) = g(x).

Remark. Induction allows us to prove things. We will also need a tool to
construct things: recursion. We first recall that a function from a set X to a
set Y is a subset f of X × Y such that
(i) for all x ∈ X there exists y ∈ Y with (x, y) ∈ f;

(ii) for all x ∈ X and y, z ∈ Y, if (x, y) ∈ f and (x, z) ∈ f, then y = z.

We of course write ‘y = f(x)’ for ‘(x, y) ∈ f’ and say that ‘f maps x to y’.
Note that f ∈ P(X × Y).
For Z ⊂ X, the restriction of f to Z is f|_Z = {(x, y) ∈ f : x ∈ Z}, which is a
function Z → Y. Note that f|_Z is a subset of Z × Y, and so in particular f|_Z
is also an element of P(X × Y).

Theorem 4. (Definition by recursion) Let X be a well-ordered set and Y
be an arbitrary set. Then for every function G : P(X × Y) → Y there is a unique
function f : X → Y such that f(x) = G(f|_{I_x}) for all x ∈ X.
Proof. Say h is an attempt if h is a function I → Y, where the domain of h,
denoted dom(h), is an initial segment I of X such that h(x) = G(h|_{I_x}) for every
x ∈ dom(h) (note that x ∈ dom(h) implies I_x ⊂ dom(h)). We need to show
that there is a unique attempt whose domain is X.
We first show that if h, h′ are attempts, then h(x) = h′(x) for all x ∈ dom(h) ∩
dom(h′), which in particular will show the uniqueness in the statement of the
theorem. We prove this by induction. Fix x ∈ dom(h) ∩ dom(h′) and assume
that h(y) = h′(y) for all y < x (induction hypothesis). Then h|_{I_x} = h′|_{I_x}, and
thus h(x) = G(h|_{I_x}) = G(h′|_{I_x}) = h′(x).
We complete the proof of existence by setting f = ⋃{h : h is an attempt}.
Then f is a function since for any x ∈ X, if there is an attempt h defined at x,
then the value h(x) is independent of h by what we showed above.

The domain of f is the union ⋃{dom(h) : h is an attempt}, and thus dom(f)
is an initial segment of X.
For any x ∈ dom(f), there is an attempt h such that f(x) = h(x). It follows
that f(y) = h(y) for all y < x, and hence f(x) = h(x) = G(h|_{I_x}) = G(f|_{I_x}).
Thus, f is an attempt.
Finally, assume that dom(f) ≠ X. Then dom(f) = I_x for some x ∈ X. In
particular, there is no attempt defined at x. However, f ∪ {(x, G(f))} is an
attempt defined at x – a contradiction. Thus, f is defined on the whole of X.
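Illustration. For a finite well-ordered set, the proof above amounts to building up attempts one element at a time. The following Python sketch is only an illustration under stated assumptions: X is given as a list in increasing order, a function is stored as a dict, G is only ever applied to functions (rather than to arbitrary subsets of X × Y), and the name define_by_recursion is hypothetical.

```python
def define_by_recursion(X, G):
    """Return f with f(x) = G(f restricted to I_x) for all x in X.

    X: list of distinct elements in increasing order (a finite well-ordering).
    G: function taking a dict (the restriction of f to I_x) to a value.
    """
    f = {}
    for x in X:
        # Here f is exactly the attempt with domain I_x = {y : y < x},
        # so the next value is forced to be G of that attempt.
        f[x] = G(dict(f))
    return f


# Example: f(x) = 1 + (sum of all earlier values), giving powers of 2.
X = [1, 2, 3, 4, 5]
f = define_by_recursion(X, lambda h: 1 + sum(h.values()))
print(f)  # prints {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}
```

At each stage there is no choice in the value of f(x), which is the finite shadow of the uniqueness part of the theorem. For infinite X no such loop is available, and this is where the union of all attempts is needed.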

Proposition 5. (Subset collapse) Let Y be a well-ordered set and X ⊂ Y.
Then X is order-isomorphic to a unique initial segment of Y.
Proof. For uniqueness, assume that f is an order-isomorphism from X to an
initial segment of Y. By Lemma 1, we have

f(x) = min(Y \ {f(y) : y ∈ X, y < x}).
It follows by induction that f is uniquely determined.
For existence, we may assume Y ≠ ∅. Fix y_0 ∈ Y and define f : X → Y by
recursion as follows:

f(x) = min(Y \ {f(y) : y ∈ X, y < x}) if this exists, and f(x) = y_0 otherwise.

We first show that the ‘otherwise’ clause never arises by showing that f(x) ≤ x
for all x ∈ X. Indeed, fix x ∈ X and assume that f(y) ≤ y holds for all y ∈ X
with y < x. Then x ∈ Y \ {f(y) : y ∈ X, y < x}, and hence f(x) ≤ x. The
claim follows by induction.
Given y < x in X, since
f(x) ∈ Y \ {f(z) : z ∈ X, z < x} ⊂ Y \ {f(z) : z ∈ X, z < y},
it follows that f(y) < f(x). Thus, f is order-preserving.
Finally, assume that a ∈ Y \ im(f). We show by induction that f(x) < a for all
x ∈ X, which shows that im(f) is an initial segment of Y. Fix x ∈ X and assume
that f(y) < a for all y ∈ X with y < x. Then a ∈ Y \ {f(y) : y ∈ X, y < x},
so f(x) ≤ a by minimality, and f(x) ≠ a since a ∉ im(f). Thus f(x) < a, as required.
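Illustration. For finite Y the recursive formula for f can be evaluated directly. The following Python sketch is only an illustration: Y is assumed to be given as a list in increasing order, X as a set of elements of Y, and the name collapse is hypothetical.

```python
def collapse(Y, X):
    """Return the order-isomorphism f from X onto an initial segment of Y.

    Y: list of distinct elements in increasing order (a finite well-ordering).
    X: set of elements of Y.
    """
    order = [y for y in Y if y in X]   # X in the order inherited from Y
    f = {}
    for x in order:
        used = set(f.values())         # {f(y) : y ∈ X, y < x}
        # least element of Y \ used; it exists because f(y) <= y < x
        f[x] = next(y for y in Y if y not in used)
    return f


Y = list(range(10))
X = {2, 3, 5, 7}
f = collapse(Y, X)
print(f)  # prints {2: 0, 3: 1, 5: 2, 7: 3}
```

In the example the image {0, 1, 2, 3} is an initial segment of Y and f(x) ≤ x for every x, as in the proof.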

Remark. It follows from Proposition 5 that a well-ordered set X is not order-isomorphic
to any proper initial segment of X.

Notation. For well-ordered sets X, Y, we write X ≤ Y if X is order-isomorphic
to an initial segment of Y.

Theorem 6. Let X, Y be well-ordered sets. Then either X ≤ Y or Y ≤ X.
Proof. Assume that Y ≰ X. Then in particular, Y ≠ ∅. Fix y_0 ∈ Y and define
f : X → Y by recursion as follows.

f(x) = min(Y \ {f(y) : y < x}) if this exists, and f(x) = y_0 otherwise.

If the ‘otherwise’ clause ever arises, then let x be the least element of X for
which this happens. Then f(I_x) = Y and for y < x the ‘otherwise’ clause does
not arise in the definition of f(y). It follows as in the proof of Proposition 5
that f is an order-isomorphism from I_x to Y, contradicting Y ≰ X. Hence
the ‘otherwise’ clause never arises, and then it follows again as in the proof
of Proposition 5 that f is an order-isomorphism from X to an initial segment
of Y.

Proposition 7. Let X, Y be well-ordered sets. If X ≤ Y and Y ≤ X, then X
and Y are order-isomorphic.

Proof. Let f : X → Y and g : Y → X be order-isomorphisms to initial segments
of Y and X, respectively. Then g ◦ f is an order-isomorphism from X to an
initial segment of X. By uniqueness in Proposition 5, and by Proposition 3, it
follows that g ◦ f = Id_X, the identity on X. Similarly, f ◦ g = Id_Y.

Remark. What the above shows is that ‘≤’ is a linear order (reflexive, antisymmetric,
transitive and trichotomous) on the collection of well-ordered sets
provided we identify order-isomorphic sets. (We haven’t shown transitivity but
that is straightforward.) It is natural to introduce the corresponding ‘<’ sign
as follows. For well-ordered sets X, Y, write X < Y to mean ‘X ≤ Y and X
not order-isomorphic to Y’. Equivalently, X < Y if and only if X is order-isomorphic
to a proper initial segment of Y. Then ‘<’ is irreflexive, transitive
and (regarding order-isomorphic well-ordered sets as the same) trichotomous. A
natural question arises: Is the collection of all well-ordered sets a well-ordered
set? We return to this question later in the chapter, but first show how to build
new well-ordered sets from old ones.

There is always another one. Let X be a well-ordered set. Fix z ∉ X
and define X^+ = X ∪ {z}, well-ordered by extending the well-ordering < on X
by defining x < z for all x ∈ X. Then X^+ is uniquely defined up to order-isomorphism
(i.e., it does not depend on the choice of z) and X < X^+.

Note. For any set X, there is always some z not in X, for example, because
there is no surjection from X to the power set PX by Cantor’s diagonal argu-
ment.

Upper bounds. We next want to show that if {X_i : i ∈ I} is a set of well-ordered
sets, then there is a well-ordered set X such that X_i ≤ X for all i ∈ I.
For well-ordered sets (X, <_X) and (Y, <_Y), we say Y extends X if X ⊂ Y, <_X
is the restriction of <_Y to X (formally, <_X = <_Y ∩ (X × X)) and X is an initial
segment of Y. We say that the set {X_i : i ∈ I} is nested if for all i, j ∈ I, either
X_j extends X_i or X_i extends X_j.

Proposition 8. Let {X_i : i ∈ I} be a nested set of well-ordered sets. Then
there is a well-ordered set X such that X_i ≤ X for all i ∈ I.
Proof. Set X = ⋃_{i∈I} X_i and for x, y ∈ X set x < y if and only if there exists
i ∈ I with x, y ∈ X_i and x <_i y, where <_i denotes the well-ordering of X_i.
From the assumption that the X_i are nested, it follows that < is a well-defined
linear order on X and each X_i is an initial segment of X.
Given a non-empty subset S ⊂ X, we have S ∩ X_i ≠ ∅ for some i ∈ I. Since X_i
is well-ordered, S ∩ X_i has a least element x. Since X_i is an initial segment of
X, it follows that x is a least element of S.

Remark. The same result holds without the assumption that the Xi are nested
(see Chapter 5). The well-ordered set X constructed in the proof above is in
fact a least upper bound for {Xi : i ∈ I}.

Ordinals
Definitions. An ordinal is a well-ordered set with two ordinals regarded the
same if they are order-isomorphic. The order-type of a well-ordered set X is the
unique ordinal to which X is order-isomorphic.

Remark. The formal definition of ordinal will be given in Chapter 5. For now
you can view the word ordinal as shorthand for identifying well-ordered sets
that are order-isomorphic. The results in this chapter can be expressed purely
in terms of well-ordered sets.

Examples. For k ∈ {0} ∪ N we write k for the order-type of a well-ordered set
of size k. We let ω denote the order-type of N (same as order-type of {0} ∪ N).
Note that in Q, the set A = {1/2, 2/3, 3/4, …} also has order-type ω.

Definition. Let α, β be ordinals and X, Y be well-ordered sets with order-types
α, β, respectively. We write α ≤ β if X ≤ Y and α < β if X < Y. Similarly, we
write α^+ for the order-type of X^+.

Note. The notions above are well-defined, i.e., they don’t depend on the choice
of X and Y. For arbitrary ordinals α, β, we have either α ≤ β or β ≤ α
(Theorem 6), and if α ≤ β and β ≤ α, then α = β (Proposition 7).

Theorem 9. Let α be an ordinal. Then the ordinals strictly smaller than α
form a well-ordered set of order-type α.
Proof. Fix a well-ordered set X with order-type α and form the set

X′ = {Y ⊂ X : Y is a proper initial segment of X}.

Then X′ is linearly ordered by the relation < by Propositions 5 and 7 and by
Theorem 6. Note that here we do not have to identify well-ordered sets that
are order-isomorphic: Indeed, if Y, Z ∈ X′ are order-isomorphic, then Y = Z
by Proposition 5.
The map x ↦ I_x is an order-isomorphism from X to X′, and thus X′ is
well-ordered with order-type α. We can now form the set

I = {order-type(Y) : Y ∈ X′}

which consists precisely of all ordinals strictly less than α. It is linearly ordered
by <, and moreover, the map Y ↦ order-type(Y) is an order-isomorphism
X′ → I, and hence I is well-ordered of order-type α.

Note. It is natural to denote the set of ordinals {β : β < α} by I_α. It is a
well-ordered set of order-type α.

Proposition 10. A non-empty set S of ordinals has a least element.

Proof. Fix α ∈ S. If α is not a least element of S, then S ∩ I_α ≠ ∅. Then S ∩ I_α
has a least element β by Theorem 9. Since I_α is an initial segment of ordinals,
it follows that β is a least element of S.

Theorem 11. (Burali-Forti paradox) The ordinals do not form a set.

Proof. Assume that the ordinals do form a set X. Then X is well-ordered by
Proposition 10. Let α be the order-type of X. Then α ∈ X and I_α is a proper
initial segment of X also of order-type α. Thus, X is order-isomorphic to its
proper initial segment I_α, contradicting Proposition 5.

Remark. Let S = {α_i : i ∈ I} be a set of ordinals. Applying Proposition 8
to the nested set {I_{α_i} : i ∈ I}, we obtain an ordinal α that is an
upper bound for S. The least element of the non-empty set {β ∈ I_α ∪ {α} :
β is an upper bound for S} of ordinals is a least upper bound for S denoted
sup S. Note that ⋃_{i∈I} I_{α_i} = I_{sup S}.

Examples. We now list further examples of ordinals. We have already seen
0, 1, 2, …, ω. Note that ω = sup{0, 1, 2, …}. The next ordinal is ω^+, which we
write as ω + 1. For now this is just notation but it is consistent with ordinal
addition defined later in this chapter. We then have ω + 1, ω + 2, … . It is
natural to denote sup{ω, ω + 1, ω + 2, …} by ω + ω or by ω·2. The latter
notation is consistent with ordinal multiplication defined later in the chapter.
We continue with a few more ordinals.

0, 1, 2, 3, …, ω, ω + 1 (officially ω^+), ω + 2 (officially ω^{++}), ω + 3, …,
ω + ω = ω·2 (officially sup{ω, ω + 1, ω + 2, …}), ω·2 + 1, ω·2 + 2, ω·2 + 3,
…, ω·3, …, ω·4, …, ω·5, …, ω·ω = ω^2 (officially sup{ω, ω·2, ω·3, …}), ω^2 + 1,
ω^2 + 2, …, ω^2 + ω, ω^2 + ω + 1, …, ω^2 + ω·2, …, ω^2 + ω·3, …, ω^2 + ω^2 = ω^2·2,
ω^2·2 + 1, …, ω^2·2 + ω, …, ω^2·3, …, ω^2·4, …, ω^3, …, ω^3·2, …, ω^3·3, …,
ω^4, …, ω^5, …, ω^ω (officially sup{ω, ω^2, ω^3, …}), …, ω^ω·2, …, ω^ω·3, …,
ω^ω·ω = ω^{ω+1}, …, ω^{ω+2}, …, ω^{ω+3}, …, ω^{ω·2}, …, ω^{ω·3}, …, ω^{ω·4}, …, ω^{ω^2} = ω^{(ω^2)},
…, ω^{ω^2·2}, …, ω^{ω^2·3}, …, ω^{ω^3}, …, ω^{ω^4}, …, ω^{ω^ω}, …, ω^{ω^{ω^2}}, …, ω^{ω^{ω^3}}, …, ω^{ω^{ω^ω}},
…, ε_0 = ω^{ω^{ω^{⋰}}} (officially sup{ω, ω^ω, ω^{ω^ω}, …}), ε_0 + 1, ε_0 + 2, …, ε_0 + ω,
…, ε_0 + ε_0 = ε_0·2, …, ε_0·3, …, ε_0·ω, …, ε_0·ε_0 = ε_0^2, …, ε_0^3, …, ε_0^ω, …, ε_0^{ω^2},
…, ε_0^{ω^3}, …, ε_0^{ω^ω}, …, ε_0^{ω^{ω^{⋰}}} = ε_0^{ε_0}, …, ε_0^{ε_0^{ε_0}}, …, ε_0^{ε_0^{ε_0^{ε_0}}}, …, ε_1 = ε_0^{ε_0^{ε_0^{⋰}}}, …, ε_2,
…, ε_3, …, ε_ω, …, ε_{ε_0}, …, ε_{ε_{ε_0}}, …, ε_{ε_{ε_{⋱}}}, …

Note. All the ordinals above are countable! By this we mean, of course, that
they are order-types of countable well-ordered sets.

Questions. Does there exist an uncountable ordinal? I.e., does there exist an
uncountable well-ordered set? Can we well-order R?

Theorem 12. There exists an uncountable ordinal.

Idea. If there is an uncountable ordinal, then there is a least one, α say. Then
Iα is the set of countable ordinals, i.e., the set of order-types of well-orderings
of subsets of N.
Proof. We can form the set

A = {(M, R) ∈ PN × P(N × N) : R is a well-ordering of M }

of well-orderings of subsets of N. Then the set

B = {order-type(X) : X ∈ A}

consists of all countable ordinals. Set ω1 = sup B. If ω1 is countable, then so is
ω1⁺, and hence ω1⁺ ∈ B. It follows that ω1⁺ ≤ ω1 , which is a contradiction.

Note. The ordinal ω1 constructed in the proof above is the least uncountable
ordinal. (Indeed, if α < ω1 , then α < β for some β ∈ B, and so α is countable.)
It follows that every proper initial segment of ω1 is countable. If α1 , α2 , . . . are
countable ordinals, then so is sup{α1 , α2 , . . . }, being the order-type of ⋃n∈N Iαn .
(Here we are using the fact that a countable union of countable sets is countable.)

Theorem 13. (Hartogs’ Lemma) For every set X, there is an ordinal γ
which does not inject into X.
Proof. This is a generalisation of Theorem 12. We form the set B of order-types
of well-orderings of subsets of X. We let γ = (sup B)⁺. If there is an injection
from γ to X, then this induces a well-ordering on a subset of X with order-type
γ. Thus, γ ∈ B, and so γ ≤ sup B < γ, a contradiction.

Notation. The least ordinal that does not inject into X is denoted γ(X).

Types of ordinals. Let α be an ordinal. We have two cases according to
whether α (or, more precisely, any well-ordered set of order-type α, e.g., Iα )
has a greatest element or not.

If Iα has greatest element β, then Iα = Iβ ∪ {β}, and hence α = β⁺. In this
case, we say α is a successor ordinal. Note that in this case β = sup Iα < α.

If Iα has no greatest element, then α = sup Iα and we say α is a limit ordinal.

Examples. Rather trivially, 0 is a limit ordinal, whereas 1 = 0⁺ is a successor.
Any ordinal n with 0 < n < ω is a successor since any non-empty, finite well-
ordered set has a greatest element. Since ω = sup{n : n < ω}, it follows that ω
is a limit. On the other hand, ω⁺ is a successor.

Ordinal Arithmetic
Ordinal addition. For ordinals α, β, we define α + β by recursion on β, with
α fixed, as follows.
α + 0 = α
α + β⁺ = (α + β)⁺
α + λ = sup{α + β : β < λ} for a non-zero limit ordinal λ.

Remark. Technically, since the ordinals do not form a set, we need to fix an
ordinal γ and define α + β for β < γ by recursion in the well-ordered set Iγ . The
definition is then independent of γ by uniqueness of recursion. This justifies the
recursive definition above and others given below.

In a similar way, induction on ordinals works even though ordinals do not form
a set. Indeed, assume p is a property of ordinals. Then
(∀ α)((∀ β < α)p(β) ⇒ p(α)) =⇒ (∀ α)p(α)
since otherwise the assumption (∀ α)((∀ β < α)p(β) ⇒ p(α)) holds, but for some
γ we have ¬p(γ). Then the non-empty set S = {β ≤ γ : ¬p(β)} has a least
element α. By minimality, we then have (∀ β < α)p(β) which implies p(α) by
our assumption, which contradicts α ∈ S.

Examples. α + 1 = α + 0⁺ = (α + 0)⁺ = α⁺ for every ordinal α.

m + 0 = m and m + (n + 1) = m + n⁺ = (m + n)⁺ = (m + n) + 1 for any
m, n < ω. This is the usual recursive definition of integer addition.

ω + 1 = ω⁺ by the above, and so ω + 2 = ω + 1⁺ = (ω + 1)⁺ = ω⁺⁺.

ω + ω = sup{ω + n : n < ω} = sup{ω, ω + 1, ω + 2, . . . }.

1 + ω = sup{1 + n : n < ω} = sup{1, 2, 3, . . . } = ω ≠ ω + 1. Thus ordinal
addition is not commutative.
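
The failure of commutativity can be checked mechanically for small ordinals. The following is a minimal sketch, assuming (purely for illustration) that an ordinal below ω · ω is encoded as a pair (a, b) of natural numbers standing for ω · a + b. The encoding and the function name `add` are not part of these notes.

```python
def add(x, y):
    """Ordinal sum of x = ω·a + b and y = ω·c + d, encoded as (a, b) and (c, d)."""
    a, b = x
    c, d = y
    if c == 0:
        # Adding a finite ordinal d only extends the finite tail of x.
        return (a, b + d)
    # A right-hand summand that is at least ω absorbs the finite tail b of x.
    return (a + c, d)

one, omega = (0, 1), (1, 0)
print(add(one, omega))   # (1, 0), i.e. 1 + ω = ω
print(add(omega, one))   # (1, 1), i.e. ω + 1
```

The asymmetry sits entirely in the second branch: a finite tail on the left disappears under an infinite summand on the right, but not the other way round.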

Proposition 14. β ≤ γ implies α + β ≤ α + γ.

Proof. We proceed by induction on γ with α fixed. We consider three cases.

γ = 0: if β ≤ γ, then β = 0, so α + β = α + γ = α.

γ = δ⁺ is a successor: if β ≤ γ, then either β = γ and the result is clear, or
β ≤ δ, and so by induction,
α + β ≤ α + δ < (α + δ)⁺ = α + γ .

γ is a non-zero limit ordinal: if β ≤ γ, then again we can assume that β < γ,
and so β < δ for some δ < γ by the definition of a limit ordinal. It follows by
induction and by the definition of α + γ that
α + β ≤ α + δ ≤ α + γ .

Remark. It follows directly from the result above that β < γ implies α + β <
α + γ. Indeed, we have β⁺ ≤ γ, and hence

α + β < (α + β)⁺ = α + β⁺ ≤ α + γ .

Note, however, that β < γ does not imply β + α < γ + α in general. For
example, 1 < 2 but 1 + ω = 2 + ω = ω. On the other hand, β ≤ γ does imply
β + α ≤ γ + α (by induction on α).

Lemma 15. Let S be a non-empty set of ordinals. Then

α + sup S = sup{α + β : β ∈ S}

for any ordinal α.


Proof. Put T = {α + β : β ∈ S}.

For β ∈ S, we have β ≤ sup S, and hence α + β ≤ α + sup S by Proposition 14.
It follows that sup T ≤ α + sup S.

To show the reverse inequality, first assume that S has a greatest element γ.
Then γ = sup S, and α + γ is also the greatest element of T by Proposition 14.
It follows that sup T = α + γ = α + sup S.

Now assume that S has no greatest element. Then λ = sup S is a non-zero limit
ordinal. Indeed, λ ∉ S as S has no greatest element, and so S ⊂ Iλ , which
implies that λ = sup S ≤ sup Iλ . It follows that λ = sup Iλ and λ is a limit
ordinal (λ > 0 since S ≠ ∅). By definition of ordinal addition, we have

α + sup S = α + λ = sup{α + γ : γ < λ} .

Given γ < λ, there exists β ∈ S with γ < β. Hence α + γ ≤ α + β ≤ sup T . It
follows that α + sup S ≤ sup T .

Proposition 16. α + (β + γ) = (α + β) + γ.
Proof. We proceed by induction on γ with α, β fixed. As usual, there are three
cases.

γ = 0: α + (β + 0) = α + β = (α + β) + 0.

γ = δ⁺ is a successor: Then by induction, we get

α + (β + γ) = α + (β + δ)⁺ = (α + (β + δ))⁺ = ((α + β) + δ)⁺ = (α + β) + γ .


γ is a non-zero limit: Then using Lemma 15 and induction, we get

α + (β + γ) = α + sup{β + δ : δ < γ}
= sup{α + (β + δ) : δ < γ}
= sup{(α + β) + δ : δ < γ} = (α + β) + γ .

Remark. The definition of ordinal addition we gave above is called the inductive
definition. We now give an alternative.

Synthetic definition of ordinal addition. For well-ordered sets X, Y , we
write X ⊔ Y for X × {0} ∪ Y × {1} (the disjoint union of X and Y ) well-ordered
by the relation

(x, i) < (y, j) ⇐⇒ either i = j = 0 and x < y in X,
                   or i = j = 1 and x < y in Y ,
                   or i = 0, j = 1.

Informally, this is ‘X followed by Y ’. For ordinals α, β, we define α + β to be
the order-type of α ⊔ β or, more precisely, the order-type of X ⊔ Y , where X is
a well-ordered set of order-type α and Y is a well-ordered set of order-type β.

Remark. In a moment we show that the synthetic and inductive definitions
coincide. For now, observe that α + 1 = α⁺ for any ordinal α holds by definition.
Also, associativity is easy in the synthetic definition, since X ⊔ (Y ⊔ Z) is order-
isomorphic to (X ⊔ Y ) ⊔ Z for any well-ordered sets X, Y, Z. The inequality
α + β ≤ α + γ for β ≤ γ also follows easily since if Y is an initial segment of Z,
then X ⊔ Y is an initial segment of X ⊔ Z.

Proposition 17. The synthetic and inductive definitions of ordinal addition
coincide.
Proof. Let us temporarily write α ∔ β for the synthetic definition while keeping
α + β for the inductive definition of addition. We prove α + β = α ∔ β by
induction on β with α fixed.

β = 0: α + 0 = α = order-type(α ⊔ 0) = α ∔ 0.

β = δ⁺ is a successor: by induction, we have α + β = (α + δ)⁺ = (α ∔ δ)⁺. The
latter is the order-type of (α ⊔ δ) ⊔ 1, which is order-isomorphic to α ⊔ (δ ⊔ 1)
and, in turn, to α ⊔ β. Thus, α + β = α ∔ β.

β is a non-zero limit: By induction we have

α + β = sup{α + δ : δ < β} = sup{α ∔ δ : δ < β} .

For δ < β, α ∔ δ is the order-type of α ⊔ δ, and the supremum of the nested set
{α ⊔ δ : δ < β} of well-ordered sets is ⋃δ<β (α ⊔ δ) = α ⊔ β, which has order-type
α ∔ β. This completes the proof that α + β = α ∔ β in this case.

Ordinal multiplication. As for addition, we give an inductive and a synthetic
definition. We start with the former: we define α · β by recursion on β with α
fixed:

α · 0 = 0
α · β⁺ = α · β + α
α · λ = sup{α · δ : δ < λ} for a non-zero limit ordinal λ .

For the synthetic definition, we first define the product X × Y of well-ordered
sets to be their Cartesian product well-ordered as follows:
(x, y) < (u, v) ⇐⇒ either y < v in Y ,
                   or y = v and x < u in X .

We then define α·β to be the order-type of α×β or, more precisely, the order-type
of X × Y , where X is a well-ordered set of order-type α and Y is a well-ordered
set of order-type β. It is then straightforward to verify (by induction on β) that
the two definitions coincide.

Examples. For m, n < ω, m · 0 = 0 and m · (n + 1) = m · n⁺ = m · n + m,
which is the usual inductive definition of integer multiplication.

ω · 2 = ω · 1⁺ = ω · 1 + ω = ω · 0⁺ + ω = (ω · 0 + ω) + ω = ω + ω.

2 · ω = sup{2 · n : n < ω} = ω. So ordinal multiplication is not commutative.
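
The value of 2 · ω can also be read off from the synthetic definition. The sketch below assumes a finite truncation {0, . . . , n − 1} of ω as a stand-in (the names `product_less` and `rank` are illustrative only). It sorts the Cartesian product 2 × n by the product order and checks that the pair (x, y) lands in position 2y + x. So every element of 2 × ω has only finitely many predecessors, which is consistent with 2 · ω = ω.

```python
from functools import cmp_to_key

def product_less(p, q):
    """(x, y) < (u, v) iff y < v, or y = v and x < u."""
    (x, y), (u, v) = p, q
    return y < v or (y == v and x < u)

def rank(elements):
    """Sort the elements by the product order and return each element's position."""
    def cmp(p, q):
        return -1 if product_less(p, q) else (1 if product_less(q, p) else 0)
    ordered = sorted(elements, key=cmp_to_key(cmp))
    return {e: i for i, e in enumerate(ordered)}

n = 5
two_times_n = [(x, y) for x in range(2) for y in range(n)]
positions = rank(two_times_n)
# The map (x, y) -> 2y + x is an order-isomorphism onto {0, ..., 2n - 1}.
assert all(positions[(x, y)] == 2 * y + x for (x, y) in two_times_n)
```

In ω × 2, by contrast, the element (0, 1) already has infinitely many predecessors (0, 0), (1, 0), (2, 0), . . . , so no finite truncation of this kind can capture it.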

Properties. The following can be verified either by induction on γ (or by
induction on α for the last implication) or by using the synthetic definition.

α(βγ) = (αβ)γ.

β ≤ γ ⇒ αβ ≤ αγ.

β < γ ⇒ αβ < αγ provided α > 0.

β ≤ γ ⇒ βα ≤ γα.
Note, however, that the last inequality cannot be strengthened to a strict one.
E.g., 1 < 2 but 1 · ω = 2 · ω = ω.

Exercise. Which, if any, of the following distributive laws hold: α(β + γ) =


αβ + αγ and (α + β)γ = αγ + βγ?

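Ordinal exponentiation. We define $\alpha^{\beta}$ by recursion on β:

$\alpha^{0} = 1$
$\alpha^{\beta^{+}} = \alpha^{\beta} \cdot \alpha$
$\alpha^{\lambda} = \sup\{\alpha^{\delta} : \delta < \lambda\}$ for a non-zero limit ordinal λ .

A synthetic definition also exists.

Examples. For m, n < ω we recover the usual meaning of $m^{n}$.

$\omega^{2} = \omega^{1^{+}} = \omega^{1} \cdot \omega = \omega^{0^{+}} \cdot \omega = (1 \cdot \omega) \cdot \omega = \omega \cdot \omega$

$2^{\omega} = \sup\{2^{n} : n < \omega\} = \omega$ is countable!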

An application to Functional Analysis (non-examinable)


Remark. This section is for those who attended the Part II Linear Analysis
course in the Michaelmas Term.

Fact. Every separable Banach space embeds isometrically into the separable
Banach space C[0, 1] of continuous functions on [0, 1] with the uniform norm.
Thus, the class SB of all separable Banach spaces has a universal element: there
is a member Z of SB that contains isomorphic (or, in this case, even isometric)
copies of every other member of the class.

Question. Does the class SR of separable reflexive spaces have a universal


member?

Solution by W. Szlenk. Szlenk introduced an ordinal index, known today as


the Szlenk index, which associates an ordinal Sz(X) to every separable Banach
space X. This has the following key properties.

(i) Sz(X) ≤ ω1 and furthermore, Sz(X) < ω1 if and only if the dual space X∗
of X is separable;

(ii) if the Banach space X isomorphically embeds into the Banach space Y ,
then Sz(X) ≤ Sz(Y );

(iii) for every countable ordinal α, there exists a separable, reflexive Banach
space Xα such that Sz(Xα ) > α.

It follows immediately that the answer to the question posed above is ‘no’.
Indeed, if Z is a universal member of the class SR, then each Xα embeds
isomorphically into Z. Then by property (ii) above, Sz(Z) > α for all countable
ordinals α. Thus, Sz(Z) = ω1 , which implies that Z∗ is not separable by
property (i). Since Z is reflexive, Z∗∗ = Z is separable, and hence Z∗ is separable (as
the dual of a space is always at least as ‘big’ as the space itself), a contradiction.

END OF NON-EXAMINABLE SECTION

3 Posets and Zorn’s Lemma

Definition. A partial order on a set X is a relation ≤ on X that is

reflexive: x ≤ x for all x ∈ X;

antisymmetric: if x ≤ y and y ≤ x, then x = y for all x, y ∈ X;

transitive: if x ≤ y and y ≤ z, then x ≤ z for all x, y, z ∈ X.

We then define x < y to mean x ≤ y and x ≠ y for x, y ∈ X. Then < is a
relation on X that is irreflexive and transitive.

Definition. A partially ordered set or poset is a set with a partial order on it.

Examples. 1. Any linearly ordered set is a poset.
2. On N letting a ≤ b iff a|b is a partial order.
3. For any set S, the power set X = PS is a poset with a ≤ b iff a ⊂ b.
4. Any subset of a poset is a poset by restricting the partial order to the subset.
E.g., if G is a group, then the subset {H ⊂ G : H a subgroup of G} of PG is a
poset.
5. This is an example of a poset given by a Hasse diagram:
[Hasse diagram: a at the bottom; b and c above a; d above b and c; e above c; f at the top, above d and e]
X = {a, b, c, d, e, f }
a ≤ b, a ≤ c, b ≤ d, c ≤ d, c ≤ e, d ≤ f, e ≤ f
and all consequences by reflexivity and transitivity

In general, a Hasse diagram of a poset X is a drawing of the points of X with


x joined to y with an upward line if y covers x meaning: x < y and there is no
z with x < z < y.
For example, here are drawings of P{1, 2} and P{1, 2, 3}.
[Hasse diagrams: P{1, 2} with ∅ at the bottom, {1} and {2} in the middle, and {1, 2} at the top; P{1, 2, 3} with ∅ at the bottom, then {1}, {2}, {3}, then {1, 2}, {1, 3}, {2, 3}, and {1, 2, 3} at the top]

Q has no Hasse diagram as there are no x, y ∈ Q such that y covers x.


6. The following Hasse diagram shows that unlike in the previous examples, in
general there is no sensible notion of ‘height’ or ‘rank’ in a Hasse diagram.
[Hasse diagram: a at the bottom; b and c above a; d above c; e at the top, above b and d]
If the ‘height’ of a, c, d, e is 0, 1, 2, 3, respectively, then
what should be the ‘height’ of b?

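7. [Hasse diagram on {a, b, c, d, e}: a, b, c in the bottom row and d, e in the top row, drawn in two separate parts]
There need be no relation between different parts.

8. [Hasse diagram: four points a, b, c, d with no lines]
There need be no relation at all other than what is necessary by reflexivity.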
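
For a finite poset, the covering relation that a Hasse diagram draws can be computed directly from its definition. The following is a minimal sketch, assuming the poset is given as a list of elements together with a function `leq` (both names, and the choice of the divisors of 12 under divisibility as the example, are illustrative).

```python
def covers(elements, leq):
    """Return all pairs (x, y) such that y covers x: x < y with nothing strictly between."""
    def lt(a, b):
        return a != b and leq(a, b)
    return [
        (x, y)
        for x in elements
        for y in elements
        if lt(x, y) and not any(lt(x, z) and lt(z, y) for z in elements)
    ]

divisors = [1, 2, 3, 4, 6, 12]
edges = covers(divisors, lambda a, b: b % a == 0)
print(edges)
```

Each returned pair corresponds to one upward line in the Hasse diagram; the remaining relations, such as 1 ≤ 12, follow by reflexivity and transitivity.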

Definition. A subset S of a poset X is a chain if it is linearly ordered by the


partial order of X.

Examples. 1. Every subset of a linearly ordered set is a chain.


2. Every subset of a chain in a poset is a chain.
3. In N with a ≤ b iff a|b, the set {1, 2, 4, 8, . . . } of powers of 2 is a chain.
4. The set {∅, {1}, {1, 2}, {1, 2, 3}} is a chain in P{1, 2, 3}.
5. In the poset
[Hasse diagram: a at the bottom; b and c above a; d above c; e at the top, above b and d]
{a, c, d, e} is a chain.

6. The set {(−∞, x) ∩ Q : x ∈ R} is an uncountable chain in PQ.

Definition. A subset S of a poset X is an antichain if no two distinct members
of S are related: for all x, y ∈ S, if x ≤ y, then x = y.

Examples. 1. In a linearly ordered set, there is no antichain of size > 1.


2. In N with a ≤ b iff a|b, the set of primes is an antichain.
3. In P{1, 2, . . . , n}, the family Fk = {A ⊂ {1, . . . , n} : |A| = k} is an antichain
for each k = 0, 1, . . . , n.
4. In
[Hasse diagram: a at the bottom; b and c above a; d above b and c; e above c; f at the top, above d and e]
{b, c} and {d, e} are antichains.

5. In
[Hasse diagram: four points a, b, c, d with no lines]
the whole set {a, b, c, d} is an antichain.
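
For finite subsets, the definitions of chain and antichain can be tested pair by pair. The following is a small sketch under the assumption that the order is supplied as a Python function `leq`; the helper names `is_chain`, `is_antichain` and `divides` are illustrative.

```python
from itertools import combinations

def is_chain(subset, leq):
    """A chain: any two members are comparable."""
    return all(leq(a, b) or leq(b, a) for a, b in combinations(subset, 2))

def is_antichain(subset, leq):
    """An antichain: no two distinct members are comparable."""
    return not any(leq(a, b) or leq(b, a) for a, b in combinations(subset, 2))

def divides(a, b):
    return b % a == 0

print(is_chain([1, 2, 4, 8], divides))       # True: powers of 2
print(is_antichain([2, 3, 5, 7], divides))   # True: distinct primes
print(is_chain([2, 3], divides))             # False: 2 and 3 are unrelated
```

Sets with at most one element pass both tests, which matches the definitions.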
Definition. Let X be a poset, S ⊂ X and x ∈ X.
x is an upper bound for S if y ≤ x for all y ∈ S.
x is a least upper bound or supremum for S if x is an upper bound for S and
x ≤ y for every upper bound y for S.
If a supremum for S exists, it is unique and is denoted sup S or ⋁S.

Examples. 1. If S ⊂ PX, then sup S = ⋃{A : A ∈ S}.
2. In R, sup(0, 1) = 1 and sup[0, 1] = 1.
3. In Q, the set {x : x² < 2} has an upper bound, e.g., 2, but it has no
supremum.
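4. In the poset
[Hasse diagram: a at the bottom; b and c above a; d above c; e at the top, above b and d]
sup{a, b, c} = e.

5. In the poset
[Hasse diagram: a and b at the bottom; c and d at the top, each above both a and b]
{a, b} has upper bounds c, d but {a, b} has no supremum.

6. In the poset
[Hasse diagram on {a, b, c, d, e}: a, b, c in the bottom row and d, e in the top row, drawn in two separate parts]
{b, c} has no upper bounds.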

Definition. A poset X is complete if every subset of X has a supremum.

Examples. 1. The real interval [0, 1] in the usual order is complete.


2. R is not complete; e.g., R has no upper bound.
3. [0, 2] ∩ Q is not complete; e.g., {x : x² < 2} has no supremum.
4. For any set S, the power set X = PS is complete.

Note. If X is complete, then X has greatest element sup X and least element
sup ∅. In particular, X is non-empty.

Definition. A function f : X → Y between posets X and Y is order-preserving
if f (x) ≤ f (y) whenever x ≤ y.

Examples. 1. f : N → N, f (n) = n + 1.
2. f : P(S) → P(S), f (A) = A ∪ B, where B ⊂ S is fixed.

Note. If f : X → Y is order-preserving and injective, then x < y implies
f (x) < f (y). The converse holds if X is linearly ordered.

Theorem 1. (Knaster–Tarski fixed point theorem) Let X be a complete
poset and f : X → X order-preserving. Then f has a fixed point.
Proof. Set S = {x ∈ X : x ≤ f (x)} and let z = sup S. We show that f (z) = z.
For x ∈ S, we have x ≤ z, and hence x ≤ f (x) ≤ f (z). Thus, f (z) is an upper
bound for S, and so z ≤ f (z). It follows that f (z) ≤ f (f (z)), and so f (z) ∈ S
and f (z) ≤ z. Hence f (z) = z.
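
The proof is constructive enough to run on a finite power set. The sketch below follows it literally: it collects S = {x : x ⊂ f (x)} and takes z to be the union of S. The function names and the particular map `f` on P{0, 1, 2, 3} are hypothetical choices made for illustration.

```python
from itertools import combinations

def powerset(base):
    items = list(base)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def knaster_tarski_fixed_point(base, f):
    """Return z = sup{x : x ⊆ f(x)} in the power set of base, for an order-preserving f."""
    post_fixed = [x for x in powerset(base) if x <= f(x)]
    # In a power set the supremum is the union; the empty set always belongs to post_fixed.
    return frozenset().union(*post_fixed)

def f(a):
    """An order-preserving map: always add 0, and add x + 1 for each x in a with x + 1 <= 2."""
    return frozenset({0}) | frozenset(x + 1 for x in a if x + 1 <= 2)

z = knaster_tarski_fixed_point(range(4), f)
assert f(z) == z
print(sorted(z))   # [0, 1, 2]
```

The search over the whole power set is exponential in the size of the base set, so this is only meant as a check of the argument on small examples.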

Corollary 2. (Schröder–Bernstein theorem) Suppose that A, B are sets
such that there are injections f : A → B and g : B → A. Then there is a
bijection h : A → B.
Proof. We seek partitions A = P ∪ Q of A and B = R ∪ S of B such that
f (P ) = R and g(S) = Q. If we then define h : A → B by

h(x) = f (x) if x ∈ P ,
h(x) = g⁻¹(x) if x ∈ Q,

then h is a bijection. To see that such partitions exist, define H : PA → PA by
H(P ) = A \ g(B \ f (P )). Then H is an order-preserving map on the complete
poset PA. Hence by Knaster–Tarski, H has a fixed point P . Setting Q = A \ P ,
R = f (P ) and S = B \ R, we get the desired partitions.

Zorn’s Lemma
Definition. Let X be a poset. Say x ∈ X is a maximal element of X if x ≤ y
implies x = y for every y ∈ X, i.e., there is no y in X such that x < y.

Examples. 1. For a set S, the poset X = P(S) has maximal element S. In
fact, S is a maximum element.
2. In general, a greatest element is maximal, but the converse is false in general.
E.g., in the poset
[Hasse diagram: a and b at the bottom; c and d at the top, each above both a and b]
c and d are maximal, but there is no maximum.

3. In N with a ≤ b iff a|b, there are no maximal elements.

Theorem 3. (Zorn’s Lemma) Let X be a (non-empty) poset in which every
chain has an upper bound. Then X has a maximal element.

Remark. The empty set is a chain in X, so it has an upper bound by
assumption. Thus X ≠ ∅. However, we sometimes verify the conditions of Zorn’s
lemma by checking that X ≠ ∅ and that every non-empty chain has an upper
bound.

Proof. Assume that X has no maximal elements. Then for every x ∈ X we
can fix an element x′ ∈ X with x < x′. Let us also fix, for every chain C, an
upper bound u(C) for C in X. Let γ = γ(X) (from Hartogs’ lemma) and define
f : γ → X by recursion as follows:

f (0) = u(∅)

f (α + 1) = f (α)′

f (λ) = u({f (α) : α < λ}) for a non-zero limit ordinal λ.

An easy induction (on β with α fixed) shows that f (α) < f (β) for all α < β. It
follows that f is injective, contradicting the choice of γ.

Remark. Technically, the definition of f (λ) above is only valid when {f (α) :
α < λ} is a chain, otherwise we should define f (λ) differently, e.g., we could set
f (λ) = u(∅) in that case. Then an easy induction shows that the ‘otherwise’
clause never arises.

Theorem 4. Every vector space V has a basis.


Proof. We seek a maximal (with respect to inclusion) linearly independent set
B. Any such set is a basis, i.e., spans V , since otherwise for any x ∈ V \ span B the
set B ∪ {x} is linearly independent, contradicting the maximality of B.
Let X = {A ⊂ V : A is linearly independent} partially ordered by inclusion.
Let C = {Ai : i ∈ I} be a chain in X. We show that $A = \bigcup_{i\in I} A_i$ is linearly
independent. Then A is an upper bound for C. So assume that $\sum_{j=1}^{n} \lambda_j x_j = 0$
is a linear relation on A. For each j = 1, . . . , n we have $x_j \in A_{i_j}$ for some
$i_j \in I$. Since C is a chain, there is a k, 1 ≤ k ≤ n, such that $A_{i_j} \subset A_{i_k}$ for each
j = 1, . . . , n. Since $A_{i_k}$ is linearly independent, it follows that λj = 0 for all j,
and hence A is linearly independent.
We showed that every chain in X has an upper bound. By Zorn’s Lemma, X
has a maximal element.

Remarks. 1. A similar argument shows that every linearly independent subset


B0 of V is contained in a basis B of V .
2. R is a vector space over the field Q. A basis of R over Q is called a Hamel
basis.
3. The real vector space RN of all real sequences has no countable basis. By
Theorem 4, a basis of RN does exist.
4. There are many examples for the use of Zorn’s Lemma in different areas of
mathematics. E.g., the existence of maximal ideals in rings with 1, the existence
of continuous linear functionals in normed spaces, the compactness of a product
of compact topological spaces (Tychonov’s theorem).

We next use Zorn’s Lemma to complete the proof of the Model Existence Lemma
from Chapter 1 without the assumption that the set of primitive propositions
is countable.

Theorem 5. Let P be any set of primitive propositions. Let S ⊂ L = L(P ) be
consistent. Then there exists a consistent S̄ ⊂ L such that S ⊂ S̄ and for all t ∈ L, either
t ∈ S̄ or ¬t ∈ S̄.

Proof. We seek a maximal (with respect to inclusion) consistent set S̄ ⊃ S. We
then complete the proof as follows. Given t ∈ L, one of S̄ ∪ {t} and S̄ ∪ {¬t} is
consistent since otherwise S̄ ∪ {t} ⊢ ⊥ and S̄ ∪ {¬t} ⊢ ⊥, which implies S̄ ⊢ ¬t
and S̄ ⊢ ¬¬t by the Deduction Theorem, and hence S̄ ⊢ ⊥ by modus ponens.
It follows by maximality of S̄ that either t ∈ S̄ or ¬t ∈ S̄.
Let X = {T ⊂ L : S ⊂ T, T is consistent} partially ordered by inclusion. Note
that X ≠ ∅ since S ∈ X. Next, let C = {Ti : i ∈ I} be a non-empty chain in
X. We show that $T = \bigcup_{i\in I} T_i$ is an upper bound of C. We have S ⊂ T (as
C ≠ ∅) and Ti ⊂ T for all i ∈ I. So we just need to show that T ∈ X, i.e.,
T is consistent. If T ⊢ ⊥, then since proofs are finite, there exist i1 , . . . , in in
I such that $\bigcup_{j=1}^{n} T_{i_j} \vdash \bot$. Since C is a chain, for some k = 1, . . . , n we have
$\bigcup_{j=1}^{n} T_{i_j} = T_{i_k} \vdash \bot$, a contradiction.
By Zorn’s Lemma, X has a maximal element as required.

Theorem 6. (Well-ordering principle (WO)) Every set A can be well-ordered.
Proof. Let X = {(B, R) ∈ PA × P(A × A) : R is a well-ordering on B}. We
partially order X by (B1 , R1 ) ≤ (B2 , R2 ) if and only if B2 extends B1 (i.e.,
B1 ⊂ B2 , R1 = R2 ∩ (B1 × B1 ) and B1 is an initial segment of B2 ).
Note that X ≠ ∅ as (∅, ∅) ∈ X. Assume that {(Bi , Ri ) : i ∈ I} is a non-empty
chain in X, i.e., a nested set of well-ordered sets. Then (⋃i∈I Bi , ⋃i∈I Ri ) is
an upper bound (cf. Proposition 8 in Chapter 2).
By Zorn’s Lemma, there is a maximal element (B, R) in X. If B ≠ A, then for
any x ∈ A \ B, the extension (B, R)⁺ = (B ∪ {x}, R ∪ {(b, x) : b ∈ B}) of (B, R)
contradicts the maximality of (B, R). Thus, A = B is well-ordered by R.

Example. R can be well-ordered, which is surprising.

Remark. In applications of Zorn’s Lemma, like the previous example, the
maximal object whose existence it asserts cannot be described explicitly.

The Axiom of Choice (AC)

Note. In the proof of Zorn’s Lemma we used two functions: ′ : X → X,
x ↦ x′ ∈ {y ∈ X : x < y}, and u : {C ⊂ X : C is a chain} → X with
u(C) ∈ {x ∈ X : x is an upper bound for C}. These are examples of choice
functions. Another example from IA Numbers and Sets: to show that ⋃n∈N An
is countable if each An is countable, we chose, for each n ∈ N, an injection
fn : An → N.

Axiom of Choice (AC). This is the assertion that for every set X of non-empty
sets, X = {Ai : i ∈ I}, there is a function f : I → ⋃i∈I Ai with f (i) ∈ Ai
for all i ∈ I, called a choice function for X.

Note. This rule differs in character from other rules for building sets (e.g.,
union, power set) in that the object whose existence it asserts is not unique.
It is therefore often of interest whether a result whose proof uses AC can be
proved without AC. We show in a moment that ZL and WO both need AC.

Remark. We don’t need AC when I is finite. In this case, the existence of a


choice function can be proved by induction on |I|.

Theorem 7. AC ⇐⇒ ZL ⇐⇒ WO
Proof. We have already proved the implications AC⇒ZL (Theorem 3) and
ZL⇒WO (Theorem 6). It remains to show WO⇒AC. Let X = {Ai : i ∈ I} be
a set of non-empty sets. Let Y = ⋃i∈I Ai and fix a well-ordering of Y . Define
f : I → Y by letting f (i) be the least element of Ai . Then f is a choice function
of X.
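
The step WO⇒AC is easy to illustrate when a well-ordering is already at hand. The sketch below assumes a family of non-empty finite sets of natural numbers, so that the usual order on N serves as the well-ordering and ‘least element’ is just `min`; the dictionary `family` and the name `choice_function` are illustrative.

```python
def choice_function(family):
    """Given a dict i -> non-empty set of naturals, choose the least element of each set."""
    return {i: min(a) for i, a in family.items()}

family = {"evens": {4, 2, 8}, "primes": {7, 3, 5}, "single": {10}}
f = choice_function(family)
# f is a choice function: f(i) lies in A_i for every index i.
assert all(f[i] in family[i] for i in family)
print(f)
```

No choice is involved here because the rule ‘take the least element’ specifies f uniformly; the content of WO is that such a rule is available for an arbitrary set Y .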

Exercise. Prove the remaining three implications directly.

Remark. We finish this chapter with some additional material.

START OF NON-EXAMINABLE SECTION

Definition. A poset X is chain-complete if X ≠ ∅ and every non-empty chain
in X has a supremum.

Examples. Every complete poset and every non-empty finite poset is chain-
complete. In general, if X is a poset, then
Y = {C ⊂ X : C is a chain}
partially ordered by inclusion is chain-complete.

Definition. A function f : X → X on a poset X is inflationary if x ≤ f (x) for
all x ∈ X.

Theorem 8. (Bourbaki–Witt fixed-point theorem) If X is a chain-


complete poset and f : X → X is inflationary, then f has a fixed point.
Proof 1 (with AC). By ZL, X has a maximal element x. Then x ≤ f (x), and
hence x = f (x).
Proof 2 (without AC). Let γ = γ(X) (from Hartogs’ Lemma). Fix x0 ∈ X
(X ≠ ∅) and define g : γ → X recursively as follows:
g(0) = x0
g(α⁺) = f (g(α))
g(λ) = sup{g(α) : α < λ} for a non-zero limit ordinal λ
An easy induction shows that g is increasing: α ≤ β implies g(α) ≤ g(β). (Note
that in particular, for a non-zero limit λ, the set {g(α) : α < λ} is a chain, and
thus the definition of g(λ) makes sense.)
If f has no fixed point, then g is strictly increasing (α < β implies g(α) < g(β)),
and hence injective, which contradicts the choice of γ.

Theorem 9. AC+Bourbaki–Witt =⇒ ZL

Remark. For this reason, the Bourbaki–Witt fixed point theorem is sometimes
called the ‘choice-free’ part of ZL.

Proof. Let X be a poset in which every chain has an upper bound.

Case 1: X is chain-complete. Fix a choice function g : PX \ {∅} → X. Assume
that X has no maximal element. We can then define f : X → X by letting
f (x) = g({y ∈ X : x < y}) for every x ∈ X. Then x < f (x) for all x ∈ X,
contradicting Bourbaki–Witt.
Case 2: general case. Define

𝒞 = {C ⊂ X : C is a chain}

partially ordered by inclusion. Then 𝒞 is chain-complete, and hence it contains
a maximal element C by Case 1. Let x be an upper bound for C in X. If x < y
for some y ∈ X, then C ∪ {y} is a chain, contradicting the maximality of C.
Thus, x is a maximal element of X.

Remark. As a by-product, we proved the Hausdorff Maximality Principle


(which is also equivalent to AC): every poset contains a maximal chain.

END OF NON-EXAMINABLE SECTION

4 First-order Predicate Logic

Introduction. In Propositional Logic we had a set P of primitive propositions
and we combined them using ⊥ and ⇒ (and later ∧, ∨, ¬, ⊤, ⇔) to build the
language L = L(P ) of all (compound) propositions. The primitive propositions,
however, had no meanings attached to them.
Our aim now is to describe a wide variety of mathematical theories. We replace
primitive propositions with statements like
m(x, m(y, z)) = m(m(x, y), z) , m(x, i(x)) = e
in the language of groups, or x ≤ y in the language of posets. These statements
are built using variables (like x, y, z, . . . above), operation symbols (like the
binary symbol m, the unary symbol i and the nullary symbol, i.e., constant, e)
and predicate symbols (like the binary predicate ≤).
We then combine these statements into formulae. For example, in the language
of groups we have
(∀ x)(∀ y)(∀ z)(m(x, m(y, z)) = m(m(x, y), z))
to describe associativity, or
(∀ x)(m(x, x) = e) ⇒ (∀ x)(∀ y)(m(x, y) = m(y, x))
to describe the statement you proved in IA Groups that if every non-identity
element has order 2, then the group is abelian. An example from the language
of posets is the sentence
(∀ x)(∀ y)(∀ z)(((x ≤ y) ∧ (y ≤ z)) ⇒ (x ≤ z))
to describe transitivity.
In Propositional Logic we had valuations. A valuation v can be thought of as
a choice of constant v(p), for every proposition p, from the set {0, 1}. In turn,
a choice of constant from {0, 1} is a function from a singleton set to {0, 1}. In
first-order logic we will have a structure which will be a set A together with a
choice of function pA : An → {0, 1} for every formula p, where n is the number
of variables in p. For a set S of formulae, we will define the notion of model
for S and we will define semantic and syntactic consequences of S in a way
very similar to what we did in Chapter 1. It is now time to give the formal
definitions.

The Language
The language is specified by a disjoint pair of sets: the set Ω of operation
symbols and the set Π of predicate symbols together with an arity function
α : Ω ∪ Π → N0 = N ∪ {0}. The language L = L(Ω, Π) then consists of the
following.

Variables. This is a countable set disjoint from Ω and Π. We usually denote


variables as x1 , x2 , . . . or x, y, z, . . . .

Terms. Also called Ω-terms, these are defined inductively as follows.

(i) Every variable is a term.

(ii) If ω ∈ Ω, n = α(ω) and t1 , t2 , . . . , tn are terms, then ωt1 t2 . . . tn is also a
term.
For example, the language of groups consists of Ω = {m, i, e} and Π = ∅ with
arities α(m) = 2, α(i) = 1, α(e) = 0. The following then are examples of terms
in the language of groups.

mxmyz , mmxyz , mxe , mxix , mee

Note that brackets are not needed as their positions are uniquely determined by
the order of operation symbols and variables. The following strings of operation
symbols and variables are not terms: mxymz, emx, mxyz.
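
The claim that brackets are unnecessary can be tested with a short recursive parser. The following is a sketch for the language of groups only, assuming that every symbol is a single character and that every character other than m, i, e is a variable; the names `ARITY`, `parse_term` and `is_term` are illustrative.

```python
ARITY = {"m": 2, "i": 1, "e": 0}  # operation symbols of the language of groups

def parse_term(s, pos=0):
    """Parse one term starting at s[pos]; return (tree, position just after the term)."""
    if pos >= len(s):
        raise ValueError("unexpected end of string")
    symbol = s[pos]
    pos += 1
    if symbol not in ARITY:
        return symbol, pos  # a variable is a term on its own
    args = []
    for _ in range(ARITY[symbol]):
        arg, pos = parse_term(s, pos)  # each argument is itself a term
        args.append(arg)
    return (symbol, *args), pos

def is_term(s):
    """A string is a term iff exactly one term can be read and nothing is left over."""
    try:
        _, end = parse_term(s)
    except ValueError:
        return False
    return end == len(s)

for s in ["mxmyz", "mmxyz", "mxe", "mxix", "mee", "mxymz", "emx", "mxyz"]:
    print(s, is_term(s))
```

At each position the arity of the symbol read determines how many terms must follow, so the parse tree, and hence the bracketing, is forced.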

Atomic formulae. These are of two kinds.


(i) If s, t are terms, then (s = t) is an atomic formula.

(ii) If ϕ ∈ Π, α(ϕ) = n and t1 , t2 , . . . , tn are terms, then ϕt1 t2 . . . tn is an atomic
formula.
For example, the language of posets consists of Ω = ∅ and Π = {≤} with arity
α(≤) = 2. The following are then atomic formulae in the language of posets.

(x1 = x2 ) , (x1 ≤ x2 )

The latter example should really be written as ≤ x1 x2 . However, here and
below we shall often revert to the more conventional way of writing algebraic
expressions, which then may require the insertion of brackets to avoid ambiguity.

Formulae. These are defined inductively as follows.


(i) Atomic formulae are formulae.

(ii) ⊥ is a formula.

(iii) If p, q are formulae, then (p ⇒ q) is a formula. As before, we introduce
∧, ∨, ¬, ⊤, ⇔ in order to abbreviate certain formulae.

(iv) If p is a formula and x is a free variable in p, then (∀ x)p is a formula. We
also introduce (∃ x)p as an abbreviation for ¬(∀ x)¬p.
To explain clause (iv), note that a formula is a string of symbols from Ω ∪ Π,
from variables and from the set {⊥, ⇒, (, ), =, ∀ } (and other symbols introduced
as abbreviations). An occurrence of a variable inside a term or inside a term as
part of a formula is either free or bound, defined inductively on the language as
follows. Any occurrence of a variable in any term or in any atomic formula is
free. Given formulae p and q, an occurrence of a variable in p or q remains an
occurrence of the same type in the formula (p ⇒ q). Finally, if q is a formula
containing free occurrences of a variable x, then those occurrences become bound
in the formula (∀ x)q (all other occurrences of variables in q remain unchanged
in (∀ x)q). For a formula p, we denote by FV(p) the set of free variables of p,
i.e., the set of variables that have at least one free occurrence in p.
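
The inductive description of free occurrences translates into a recursive computation of FV(p). The sketch below assumes a hypothetical encoding of terms and formulae as nested tuples: a variable is a string, a compound term is (operation, arguments . . . ), and formulae are built with the tags '=', 'bot', 'imp' and 'forall'. Predicate symbols other than equality are left out for brevity, and ¬ and ∃ are expanded as abbreviations.

```python
def term_vars(t):
    """Variables occurring in a term (a variable name, or a tuple (operation, *arguments))."""
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[1:]))

def free_vars(p):
    """FV(p) for formulae ('=', s, t), ('bot',), ('imp', p, q) and ('forall', x, p)."""
    tag = p[0]
    if tag == "=":
        return term_vars(p[1]) | term_vars(p[2])  # every occurrence in an atomic formula is free
    if tag == "bot":
        return set()
    if tag == "imp":
        return free_vars(p[1]) | free_vars(p[2])
    if tag == "forall":
        return free_vars(p[2]) - {p[1]}  # the quantifier binds the free occurrences of x
    raise ValueError(f"unknown formula tag: {tag!r}")

def neg(p):
    return ("imp", p, ("bot",))

def exists(x, p):
    return neg(("forall", x, neg(p)))

e = ("e",)
p1 = ("imp", ("=", ("m", "x", ("i", "x")), e), ("=", ("m", ("i", "x"), "x"), e))
p2 = ("imp",
      ("forall", "x", ("=", ("m", "x", "x"), e)),
      ("forall", "x", ("forall", "y", ("=", ("m", "x", "y"), ("m", "y", "x")))))
p3 = ("imp",
      ("=", ("m", "x", "x"), "y"),
      neg(exists("x", ("=", ("m", ("m", "x", "x"), "x"), "y"))))

print(free_vars(p1), free_vars(p2), free_vars(p3))
```

Note that `free_vars` records only which variables have some free occurrence; a variable such as x in the third formula can have bound occurrences as well and still belong to FV.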

Examples. The following are formulae in the language of groups. In each case
we indicate for every occurrence of the variable x whether it is a free or bound
occurrence.

(mxix = e) ⇒ (mixx = e)
All four occurrences of x are free.

(∀ x)(mxx = e) ⇒ (∀ x)(∀ y)(mxy = myx)
All occurrences of x in mxx, mxy and myx are bound.

(mxx = y) ⇒ ¬(∃ x)(mmxxx = y)
The two occurrences of x in mxx are free; the three occurrences of x in mmxxx are bound.

Note that in the last example, the variable x has both free and bound occurrences.
Although such formulae are technically allowed, it is usual mathematical
practice to avoid them. In the second example, there are no free variables: both x
and y only have bound occurrences. Such formulae have a special name.

Definition. A formula with no free variables is called a sentence.

Structures
Definition. A structure in a language L = L(Ω, Π), or L-structure, is a non-
empty set A together with functions

ωA : An → A (ω ∈ Ω, n = α(ω))

and subsets ϕA ⊂ An or equivalently, identifying a subset with its indicator


function, functions

ϕA : An → {0, 1} (ϕ ∈ Π, n = α(ϕ))

Note. An operation symbol ω of arity 0 is called a constant. Its interpretation


in a structure A is a function ωA from A0 , a singleton set, to A, i.e., ωA is
simply an element of A.

Note. In normal mathematical practice, we allow the empty set to be a struc-


ture. Its exclusion here is a simplifying assumption (see later for an explanation).

Examples. 1. If L is the language of groups, then an L-structure is a set A


together with functions

mA : A × A → A , iA : A → A

and an element eA ∈ A. Note that A is not a group yet!


2. If L is the language of posets, then an L-structure is a non-empty set A
together with a subset ≤A of A × A, i.e., a relation on A. Again, note that A
is not yet a poset.

Motivation. Let L = L(Ω, Π) be a first-order language and A an L-structure.
Given a formula p in the language L, we want to define the interpretation of
p in A and what it means that ‘p is satisfied in A’. For example, if p is the
formula (mxix = e) in the language of groups, then we let pA be the subset
{a ∈ A : mA (a, iA (a)) = eA } of A, and then say that p is satisfied in A if
pA = A. Equivalently, identifying pA with its indicator function, we say p is
satisfied in A if pA : A → {0, 1} is the constant function with value 1, where
pA (a) = 1 if mA (a, iA (a)) = eA , and pA (a) = 0 otherwise. If q now is the
sentence (∀ x)p, then its interpretation in A should be a function A0 → {0, 1},
i.e., qA is simply an element of {0, 1}. We set qA = 1 if pA (a) = 1 for every
a ∈ A, i.e., if p holds in A, otherwise we set qA = 0. We now give the formal
description of how to interpret formulae in a structure. This is rather dry, and
it is best to let these motivating examples be the guide.

Definitions. Let L = L(Ω, Π) be a first-order language and A an L-structure.


Let t be a term in L and p be a formula in L both with free variables contained
in the set {x1 , . . . , xn }. We define interpretations tA : An → A of t in A and
pA : An → {0, 1} (equivalently, pA ⊂ An ) of p in A by induction on the language
L as follows. (Think of these as substitutions.)

• If t is xi for some i = 1, . . . , n, then

tA : An → A , tA (a1 , . . . , an ) = ai

• If t is ωt1 . . . tm for some ω ∈ Ω with arity m and terms t1 , . . . , tm , then

tA : An → A , tA (a1 , . . . , an ) = ωA ((t1 )A (a1 , . . . , an ), . . . , (tm )A (a1 , . . . , an )) .




[Note: if ω ∈ Ω with α(ω) = n and t is the term ωx1 . . . xn , then tA = ωA .]

• If p is (u = v) for terms u, v, then

pA = {(a1 , . . . , an ) ∈ An : uA (a1 , . . . , an ) = vA (a1 , . . . , an )} .

• If p is ϕt1 . . . tm for some ϕ ∈ Π with arity m and terms t1 , . . . , tm , then

pA = {(a1 , . . . , an ) ∈ An : ((t1 )A (a1 , . . . , an ), . . . , (tm )A (a1 , . . . , an )) ∈ ϕA }

or, identifying ϕA ⊂ Am with its indicator function, pA : An → {0, 1} is given by

pA (a1 , . . . , an ) = ϕA ((t1 )A (a1 , . . . , an ), . . . , (tm )A (a1 , . . . , an )) .
[Note: if ϕ ∈ Π with n = α(ϕ) and p is ϕx1 . . . xn , then pA = ϕA .]

• ⊥A is the constant function An → {0, 1} with value 0.

• If p is (q ⇒ r), then

pA = {(a1 , . . . , an ) ∈ An : qA (a1 , . . . , an ) = 0 or rA (a1 , . . . , an ) = 1} .

• If p is (∀ xn+1 )q with FV(q) ⊂ {x1 , . . . , xn , xn+1 }, then

pA = {(a1 , . . . , an ) ∈ An : (a1 , . . . , an , an+1 ) ∈ qA for all an+1 ∈ A} .
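
For a finite structure, the inductive clauses above can be followed line by line to compute tA and pA. The sketch below is an illustration under stated assumptions, not part of the formal development: a structure is a dictionary with a finite "universe", interpretations "ops" and "preds", and an assignment env plays the role of the tuple (a1 , . . . , an ). The tuple encoding of terms and formulae and the names eval_term and holds are hypothetical.

```python
# Terms: a variable is a str; an operation symbol applied to arguments is (symbol, [args]).
# Formulae: ("bot",), ("eq", s, t), ("pred", symbol, [terms]), ("imp", p, q), ("forall", x, p).

def eval_term(t, A, env):
    """Interpretation of the term t in the structure A under the assignment env."""
    if isinstance(t, str):
        return env[t]                       # t is a variable x_i
    symbol, args = t
    return A["ops"][symbol](*(eval_term(a, A, env) for a in args))

def holds(p, A, env):
    """True if the interpretation of p in A takes the value 1 at the assignment env."""
    tag = p[0]
    if tag == "bot":
        return False
    if tag == "eq":
        return eval_term(p[1], A, env) == eval_term(p[2], A, env)
    if tag == "pred":
        return A["preds"][p[1]](*(eval_term(a, A, env) for a in p[2]))
    if tag == "imp":
        return (not holds(p[1], A, env)) or holds(p[2], A, env)
    if tag == "forall":
        # Quantify over every element of the (finite) underlying set.
        return all(holds(p[2], A, {**env, p[1]: a}) for a in A["universe"])
    raise ValueError(f"unknown formula tag: {tag}")

# A structure in the language of groups: the integers mod 3 under addition.
Z3 = {
    "universe": [0, 1, 2],
    "ops": {"m": lambda a, b: (a + b) % 3, "i": lambda a: (-a) % 3, "e": lambda: 0},
    "preds": {},
}

# Another structure in the same language that is not a group.
bad = {
    "universe": [0, 1, 2],
    "ops": {"m": lambda a, b: (a * b) % 3, "i": lambda a: a, "e": lambda: 1},
    "preds": {},
}

p = ("eq", ("m", ["x", ("i", ["x"])]), ("e", []))   # (mxix = e)
q = ("forall", "x", p)                               # (forall x)(mxix = e)

print(holds(q, Z3, {}))    # True
print(holds(q, bad, {}))   # False
```

Here p is satisfied in a structure precisely when holds(p, A, env) is True for every assignment env of its free variables, which for the sentence q is the single empty assignment.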

Theories and Models
Definition. Let L = L(Ω, Π) be a first-order language and A be an L-structure.
Given a formula p in the language L, we say p is satisfied in A (or p holds in A, or
p is true in A or A is a model of p) if pA = An or, equivalently, pA : An → {0, 1}
is the constant function with value 1, where n is the number of free variables
in p.

Note. When p is a sentence, then pA is a function A0 → {0, 1}, i.e., an element


of {0, 1}. Thus, p holds in A if and only if pA = 1.

Definition. A theory in L is a set of sentences in L. A model for a theory T is


an L-structure in which each sentence in T is satisfied.

Examples. 1. Theory of groups The language of groups has been defined


above: it consists of Ω = {m, i, e} and Π = ∅ with the arities of m, i, e being
2, 1, 0, respectively. The theory of groups is the following set of sentences.

T = {(∀ x)(∀ y)(∀ z)(mxmyz = mmxyz) ,


(∀ x)(mxe = x ∧ mex = x) ,
(∀ x)(mxix = e ∧ mixx = e)}

Then every model of T is a group and every group is a model of T . So groups


can be axiomatised as a first-order theory.
2. Theory of posets The language is as defined above: Ω = ∅, Π = {≤} and
the arity of ≤ is 2. The theory of posets is then the following set of sentences.

T = {(∀ x)(x ≤ x) ,
(∀ x)(∀ y)((x ≤ y ∧ y ≤ x) ⇒ x = y) ,
(∀ x)(∀ y)(∀ z)((x ≤ y ∧ y ≤ z) ⇒ x ≤ z)}

This axiomatises posets: every (non-empty) poset is a model of T and every


model of T is a poset.
3. Theory of rings with 1 The language consists of Ω = {+, ×, −, 0, 1} with
arities 2, 2, 1, 0, 0, respectively, and Π = ∅. The theory is as follows.

T = {(∀ x)(∀ y)(∀ z)((x + (y + z)) = ((x + y) + z)) ,


(∀ x)(∀ y)(x + y = y + x) ,
(∀ x)(x + 0 = x) ,
(∀ x)(x + (−x) = 0) ,
(∀ x)(∀ y)(∀ z)(x(yz) = (xy)z) ,
(∀ x)(∀ y)(∀ z)((x(y + z) = xy + xz) ∧ ((x + y)z = xz + yz)) ,
(∀ x)((x1 = x) ∧ (1x = x))}

The models of T are precisely the rings with 1. Note that here we reverted to
writing x + y instead of +xy, xy instead of x × y, etc. This and the theory of
groups are examples of algebraic theories: the sentences only involve equations.
(Strictly speaking they do not, but, for example, the sentence (∀ x)((x1 = x) ∧ (1x = x))
can be replaced with the two sentences (∀ x)(x1 = x) and (∀ x)(1x = x), etc.)

4. Theory of fields The language is the same as for rings with 1. The theory is
the union of the theory for rings with 1 and the following set of three sentences.

(∀ x)(∀ y)(xy = yx), ¬(0 = 1), (∀ x)(¬(x = 0) ⇒ (∃ y)(xy = 1))

This axiomatises fields. Note that this is not an algebraic theory, and indeed
fields cannot be axiomatised as an algebraic theory. This is because every field
has at least two elements, and it is easy to see that the singleton set is a model
for every algebraic theory.
5. Graph theory The language consists of Ω = ∅, Π = {a} with a having
arity 2 (a is the adjacency predicate). The theory is

(∀ x)¬(axx), (∀ x)(∀ y)(axy ⇒ ayx)

So graphs can also be axiomatised as a first-order theory.


6. Propositional theories This example shows that propositional logic is a
special case of predicate logic. Let P be a set of primitive propositions. Set
Ω = ∅ and Π = P with each primitive proposition having arity 0. In the language
L = L(Ω, Π) every primitive proposition is an atomic formula, and thus every
proposition in L(P ) (as defined in Chapter 1) is a formula. An L-structure is
a nonempty set A together with a function v : P → {0, 1} that maps p ∈ P to
pA ∈ {0, 1} (as usual we identify a function A0 → {0, 1} with the value it takes
in {0, 1}). Given a (compound) proposition t ∈ L(P ), its interpretation tA in
A is an element of {0, 1} (i.e., a function A0 → {0, 1}), and the map t ↦ tA is
precisely the extension v : L(P ) → {0, 1} of v to L(P ) as defined in Chapter 1.
Given S ⊂ L(P ), a model of S is a structure A with a function v : P → {0, 1}
such that for every s ∈ S we have sA = v(s) = 1. Thus, the meaning of model
coincides with the definition of model in Chapter 1. Note that the underlying
set A here is irrelevant.

Semantic entailment
Definition. Given a first-order language L = L(Ω, Π), a set S of sentences in
L and a sentence t in L, we say S (semantically) entails t, written S ⊨ t, if t
holds in every model of S.

Examples. 1. Let T be the theory of groups (in the language of groups). Then

T ⊨ ((∀ x)(xx = e) ⇒ (∀ x)(∀ y)(xy = yx))

2. Let T be the theory of fields (in the language of rings with 1). Then

T ⊨ (∀ x)(¬(x = 0) ⇒ (∀ y)(∀ z)(xy = xz ⇒ y = z))

Remark. We will also need to define S ⊨ t in the case when S ∪ {t} contains
formulae with free variables. The following example motivates the definition.
Let T be the theory of fields (in the language of rings with 1). Let p be the
formula ¬(x = 0), let t be the formula (∃ y)(xy = 1) and let S = T ∪ {p}. It
ought to be the case that S ⊨ t because, given a field F , if we assign a value
a ∈ F to the variable x, then according to the field axioms, if pF (a) is true (i.e.,
pF (a) = 1), then tF (a) is also true.

Definition. Let L = L(Ω, Π) be a first-order language, S a set of formulae in
L and t a formula in L. Introduce a new constant to L for each free variable
occurring in S ∪ {t}. For a formula u ∈ S ∪ {t}, let u′ be the sentence in the
new language L′ obtained from u by replacing each free occurrence of a variable
with the corresponding constant and set S′ = {s′ : s ∈ S}. We then say S
(semantically) entails t, written S ⊨ t, if S′ ⊨ t′.

Definition. A formula t in a first-order language L is a tautology if ∅ ⊨ t, i.e.,
if t holds in every L-structure.

Note. In the definition of semantic entailment we substituted constants into


formulae. Later we will need a more general notion of substitution.

Definition. Let p be a formula in a first-order language L. If x is a free variable
in p and t is a term in L whose variables do not occur bound in p, then p[t/x]
is the formula obtained from p by replacing each free occurrence of x with t.

Examples. Let p be the formula (∀ y)(mmxyx = mmyxy) in the language of


groups.
(i) If t is the term mzz, then p[t/x] is (∀ y)(mmmzzymzz = mmymzzy).

(ii) If t is the term mxx, then p[t/x] is (∀ y)(mmmxxymxx = mmymxxy).

(iii) If t is the term myy, then p[t/x] is not defined as y occurs bound in p.
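
The three examples above can be reproduced mechanically. The following sketch uses an assumed encoding of terms and formulae as nested tuples; the names subst, subst_term, bound_vars and show_term are hypothetical. It refuses to form p[t/x] when a variable of t occurs bound in p, as in the definition, but for brevity it does not check that x is actually free in p.

```python
# Terms: a variable is a str; an operation symbol applied to arguments is (symbol, [args]).
# Formulae: ("bot",), ("eq", s, t), ("pred", symbol, [terms]), ("imp", p, q), ("forall", x, p).

def term_vars(t):
    """Set of variables occurring in the term t."""
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[1]))

def bound_vars(p):
    """Set of variables that have a bound occurrence in p."""
    tag = p[0]
    if tag == "imp":
        return bound_vars(p[1]) | bound_vars(p[2])
    if tag == "forall":
        return {p[1]} | bound_vars(p[2])
    return set()

def subst_term(s, t, x):
    """Replace every occurrence of the variable x in the term s with t."""
    if isinstance(s, str):
        return t if s == x else s
    symbol, args = s
    return (symbol, [subst_term(a, t, x) for a in args])

def _subst(p, t, x):
    tag = p[0]
    if tag == "bot":
        return p
    if tag == "eq":
        return ("eq", subst_term(p[1], t, x), subst_term(p[2], t, x))
    if tag == "pred":
        return ("pred", p[1], [subst_term(a, t, x) for a in p[2]])
    if tag == "imp":
        return ("imp", _subst(p[1], t, x), _subst(p[2], t, x))
    if tag == "forall":
        if p[1] == x:
            return p            # occurrences of x below this quantifier are bound
        return ("forall", p[1], _subst(p[2], t, x))
    raise ValueError(f"unknown formula tag: {tag}")

def subst(p, t, x):
    """p[t/x], defined only if no variable of t occurs bound in p."""
    if term_vars(t) & bound_vars(p):
        raise ValueError("p[t/x] is not defined: a variable of t occurs bound in p")
    return _subst(p, t, x)

def show_term(t):
    """Write a term in the prefix notation used in the text."""
    return t if isinstance(t, str) else t[0] + "".join(show_term(a) for a in t[1])

def m(s, t):
    return ("m", [s, t])

# p is (forall y)(mmxyx = mmyxy)
p = ("forall", "y", ("eq", m(m("x", "y"), "x"), m(m("y", "x"), "y")))

r = subst(p, m("z", "z"), "x")
print(show_term(r[2][1]), "=", show_term(r[2][2]))   # mmmzzymzz = mmymzzy

r = subst(p, m("x", "x"), "x")
print(show_term(r[2][1]), "=", show_term(r[2][2]))   # mmmxxymxx = mmymxxy

try:
    subst(p, m("y", "y"), "x")
except ValueError as err:
    print(err)   # p[t/x] is not defined: a variable of t occurs bound in p
```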

Syntactic entailment
Axioms.
(A1) p ⇒ (q ⇒ p) (p, q any formulae)

(A2) (p ⇒ (q ⇒ r)) ⇒ ((p ⇒ q) ⇒ (p ⇒ r)) (p, q, r any formulae)

(A3) ¬¬p ⇒ p (p any formula)

(A4) (∀ x)(x = x)

(A5) (∀ x)(∀ y)((x = y) ⇒ (p ⇒ p[y/x]))


(x, y distinct variables, p a formula, y does not occur bound in p)

(A6) ((∀ x)p) ⇒ p[t/x]


(p a formula, x a free variable of p, t a term no variable of which occurs
bound in p)

(A7) (∀ x)(p ⇒ q) ⇒ (p ⇒ (∀ x)q)


(p, q formulae, x a free variable of q that does not occur free in p)

Note. Every axiom is a tautology.

Rules of deduction.

Modus ponens (MP): from p and p ⇒ q, we can deduce q.

Generalisation (Gen): from p with free variable x, we can deduce (∀ x)p provided
x does not occur free in any premiss used in the proof of p.
Definition. Let S be a set of formulae and p be a formula. A proof of p from
S is a finite sequence t1 , . . . , tn of formulae such that tn = p and for every i,
(i) either ti is an axiom, or

(ii) ti is a premiss (member of S), or

(iii) ti follows by modus ponens (∃ j, k < i such that tk is (tj ⇒ ti )), or

(iv) ti follows by Generalisation (∃ j < i such that ti = (∀ x)tj where x is a free
variable in tj that does not occur free in any premiss tk , k ≤ j).
We say S proves p, and write S ⊢ p, if there is a proof of p from S. If S is a
theory, p is a sentence and S ⊢ p, then we say p is a theorem of S.

Remark. We can now explain why a structure in a language L had to be non-empty.
Suppose we allowed the empty set as a structure. Then (∀ x)¬(x = x)
is satisfied in ∅, whereas ⊥ is not. Thus, {(∀ x)¬(x = x)} ⊭ ⊥. On the other
hand, {(∀ x)¬(x = x)} ⊢ ⊥. Indeed, we have
(∀ x)¬(x = x) (premiss)
(∀ x)¬(x = x) ⇒ ¬(x = x) (A6)
¬(x = x) (MP)
(∀ x)(x = x) (A4)
(∀ x)(x = x) ⇒ (x = x) (A6)
(x = x) (MP)
⊥ (MP)

Example. {(x = y)} ⊢ (y = x)

(∀ x)(∀ y)((x = y) ⇒ ((x = z) ⇒ (y = z))) (A5)


(x = y) ⇒ ((x = z) ⇒ (y = z)) (A6+MP twice)
(x = y) (premiss)
(x = z) ⇒ (y = z) (MP)
(∀ z)((x = z) ⇒ (y = z)) (Gen)
(∀ z)((x = z) ⇒ (y = z)) ⇒ ((x = x) ⇒ (y = x)) (A6)
(x = x) ⇒ (y = x) (MP)
(∀ x)(x = x) (A4)
(∀ x)(x = x) ⇒ (x = x) (A6)
(x = x) (MP)
(y = x) (MP)

Proposition 1. (Deduction Theorem) Let S be a set of formulae and p, q
be formulae. Then S ⊢ (p ⇒ q) if and only if S ∪ {p} ⊢ q.

Proof. Assume that S ⊢ (p ⇒ q). Write down a proof of (p ⇒ q) from S and


append the lines

p (premiss)
q (MP)

to obtain a proof of q from S ∪ {p}.

Now assume that S ∪ {p} ⊢ q and let t1 , . . . , tn be a proof of q from S ∪ {p}.
We show that S ⊢ (p ⇒ ti ) for all i by induction.
Induction hypothesis before the ith step: for each j < i, we have S ⊢ (p ⇒ tj )
in such a way that if a variable x did not occur free in any premiss used in the
proof t1 , . . . , tj of tj from S ∪ {p}, then x does not occur free in any premiss
used in the proof of (p ⇒ tj ) from S.
We now show S ⊢ (p ⇒ ti ) by considering a number of cases.
Case 1: If ti is an axiom or ti ∈ S, then

ti ⇒ (p ⇒ ti ) (A1)
ti (axiom or premiss)
p ⇒ ti (MP)

is a proof of p ⇒ ti from S.
Case 2: If ti = p, then S ⊢ (p ⇒ ti ) since ⊢ (p ⇒ p).
Case 3: If there exist j, k < i such that tk = (tj ⇒ ti ), then by induction
hypothesis, there are proofs of p ⇒ tj and p ⇒ (tj ⇒ ti ) from S. Adding the
lines

(p ⇒ (tj ⇒ ti )) ⇒ ((p ⇒ tj ) ⇒ (p ⇒ ti )) (A2)
(p ⇒ tj ) ⇒ (p ⇒ ti ) (MP)
p ⇒ ti (MP)

we obtain a proof of p ⇒ ti from S.


Case 4: Finally, if for some j < i we have ti = (∀ x)tj , then x occurs free in tj
and x does not occur free in any premiss used in the proof of tj from S ∪ {p}.
We now have two further cases.
Case 4a: If x occurs free in p, then p was not used in the proof of tj from S ∪{p},
and so we have a proof of tj from S and x does not occur free in any premiss
used in this proof. We write down this proof and append the lines

(∀ x)tj (Gen)
(∀ x)tj ⇒ (p ⇒ (∀ x)tj ) (A1)
p ⇒ (∀ x)tj (MP)

to obtain a proof of p ⇒ ti from S.

Case 4b: If x does not occur free in p, then we write down a proof of p ⇒ tj
from S in which x does not occur free in any premiss (possible by induction
hypothesis). We then append the following lines

(∀ x)(p ⇒ tj ) (Gen)

(∀ x)(p ⇒ tj ) ⇒ (p ⇒ (∀ x)tj ) (A7)
p ⇒ (∀ x)tj (MP)

to obtain a proof of p ⇒ ti from S. In all cases, it is easy to check that the


induction hypothesis is valid before the (i + 1)th step.

Aim. We now embark on the proof of the Completeness Theorem that states
that, for first-order logic, ⊢ and ⊨ coincide.

Proposition 2. (Soundness Theorem) Let S be a set of formulae in a
first-order language L and p be a formula in L. If S ⊢ p, then S ⊨ p.
Proof (non-examinable). We proceed by induction on the length of a proof
of p from S. Note that if p follows by Generalisation from an earlier line q in
the proof, i.e., if p = (∀ x)q, then there exists S1 ⊂ S such that only premisses
in S1 are used in the proof of q and x does not occur free in S1 . By induction
hypothesis, S1 ⊨ q and since x does not occur free in S1 , it follows that S1 ⊨ p,
which in turn implies that S ⊨ p.

Note. For the converse, i.e., that S ⊨ p implies S ⊢ p, we first consider the
special case when S is a theory and p is the formula ⊥.

Definition. A theory S in a first-order language is inconsistent if S ⊢ ⊥, and
otherwise S is consistent.

Theorem 3. (Model existence lemma) If S is a consistent theory in a


first-order language L = L(Ω, Π), then S has a model.

Idea of proof. We will build a model from the language L itself. We initially
choose our structure to be the set A of all closed terms of L, i.e., terms not
involving variables. Examples of closed terms in the language of commutative
rings with 1:

1+1 , (1 + 0) + 1 , 1·1 , (1 + 0) · 0 + 1 , etc.

We turn A into a structure by interpreting the operation symbols in the obvious


way. In the example above, we would have (1 + 0) +A (1 + 1) = (1 + 0) + (1 + 1)
or 0A = 0, 1A = 1.
However, if S is the theory of fields, for example, then A is not a model of S.
E.g., we have S ⊢ (1 + 0 = 1) but 1 + 0 = 1 is not satisfied in A since the closed
terms 1 + 0 and 1 are different terms. There is an easy remedy. Introduce the
equivalence relation s ∼ t if and only if S ⊢ (s = t) and replace A by its quotient
A/ ∼. Operation symbols are now interpreted on representatives of equivalence
classes. E.g., [1] +A [1] = [1 + 1] in our example. Two issues remain.

For an example of the first issue, consider the theory S of fields with
characteristic 2 or 3, which consists of the theory of fields together with the sentence
(1 + 1 = 0) ∨ (1 + 1 + 1 = 0). Then S ⊬ (1 + 1 = 0) since S has models that are fields
of characteristic 3. Similarly, S ⊬ (1 + 1 + 1 = 0). It follows that in our structure
A, we have [1] +A [1] = [1 + 1] ≠ [0] and [1] +A [1] +A [1] = [1 + 1 + 1] ≠ [0], and
thus A is not a model. As for propositional logic, we will first extend the theory
S to a consistent theory that is complete. In general, we say that a theory S
in a first-order language L is complete if for every sentence p, either S ⊢ p or
S ⊢ ¬p.
For an example of the second issue, consider the theory S of fields in which
2 has a square root. This consists of the theory of fields together with the
sentence (∃ x)(xx = 1 + 1). Then the structure A defined above consisting of
∼-equivalence classes of closed terms is not a model, since there is no closed
term t such that [tt] = [1 + 1]. In other words, we lack a witness to the sentence
(∃ x)(xx = 1 + 1), i.e., we lack a closed term t such that S ⊢ p[t/x] where p
is the formula (xx = 1 + 1). The solution is to add a new constant c to our
language and the new sentence (cc = 1 + 1) to our theory.
The problem is that the two processes of adding witnesses and of completion
pull in different directions. When we add witnesses to a complete theory, the
new theory may no longer be complete. When we complete a theory which has
witnesses, the new theory may lack witnesses.

Proof of Theorem 3 (non-examinable). We begin with the observation that if
S is a consistent theory, then for any sentence p, one of S ∪ {p} and S ∪ {¬p} is
consistent. Indeed, otherwise, by the Deduction Theorem, we have S ⊢ ¬p and
S ⊢ ¬¬p, and hence an application of modus ponens yields S ⊢ ⊥, which is a
contradiction. An argument using Zorn’s Lemma now shows that S is contained
in a consistent theory S̄ such that for every sentence p, either p ∈ S̄ or ¬p ∈ S̄.
In particular, S̄ is complete.

Next assume that S is a consistent theory and S ⊢ (∃ x)p where p is a formula
with one free variable x. We add a new constant c to the language L and show
that S ∪ {p[c/x]} is consistent. Indeed, otherwise, by the Deduction Theorem,
we get S ⊢ ¬p[c/x]. Since c does not occur in S, if we replace every occurrence of
c in the proof of ¬p[c/x] from S with a new variable not used in the proof, and
then apply (Gen), axiom (A6) and (MP), we obtain a proof of ¬p from S. It
follows that S ⊢ (∀ x)¬p by Generalisation, and since S ⊢ (∃ x)p by assumption,
we deduce S ⊢ ⊥ by modus ponens, which is a contradiction. Applying this to
all theorems of S of the form (∃ x)p, we obtain a new language L̄ = L(Ω ∪ C, Π)
where C is a set of new constants, and a consistent theory S̄ ⊃ S such that for
every sentence of the form (∃ x)p in the language L that is deducible from S,
there is a closed term t in L̄ such that S̄ ⊢ p[t/x].

Let us now start with a consistent theory S in a first-order language L =
L(Ω, Π). Put S0 = S, L0 = L and inductively define, using the two processes
described above, theories S0 ⊂ S1 ⊂ T1 ⊂ S2 ⊂ T2 ⊂ . . . and languages
Ln = L(Ω ∪ C1 ∪ · · · ∪ Cn , Π), where C1 , C2 , . . . are pairwise disjoint sets of
constants each disjoint from Ω (and from Π), such that for each n ∈ N, Sn is a
consistent, complete theory in Ln−1 and Tn is a consistent theory in Ln which
has witnesses for Sn . We put L∗ = ⋃n Ln and S∗ = ⋃n Sn . It is straightforward

to verify that S ∗ is a consistent theory in the language L∗ that is complete and
has witnesses. Since every model of S ∗ is also a model of S, to complete the
proof, we may assume that S is complete and has witnesses.

We let A be the set of equivalence classes of closed terms in L under the
equivalence relation s ∼ t if and only if S ⊢ (s = t). We turn A into an L-structure
as follows. For ω ∈ Ω with arity n, we define ωA : An → A by setting

ωA ([t1 ], . . . , [tn ]) = [ωt1 . . . tn ] ,

and for ϕ ∈ Π with arity n, we let

ϕA ([t1 ], . . . , [tn ]) = 1 if and only if S ⊢ (ϕt1 . . . tn ) .

An easy induction shows that if s is a term with variables in {x1 , . . . , xn },


then sA ([t1 ], . . . , [tn ]) = [s[t1 /x1 , . . . , tn /xn ]]. In particular, sA = [s] for any
closed term s. Another induction on the language then shows that for any
formula p with FV(p) ⊂ {x1 , . . . , xn } and for closed terms t1 , . . . , tn , we have
S ⊢ p[t1 /x1 , . . . , tn /xn ] if and only if pA ([t1 ], . . . , [tn ]) = 1. In particular, if
p ∈ S, then pA = 1 (i.e., p holds in A), and thus A is a model of S.

Corollary 4. (Adequacy Theorem) Let S be a set of formulae in a first-order
language L and p be a formula in L. If S ⊨ p, then S ⊢ p.
Proof (non-examinable). We first reduce to the case when S is a theory and
p is a sentence. Indeed, by definition, S ⊨ p means that S′ ⊨ p′, where S′
and p′ are obtained from S and p by replacing free occurrences of variables with
constants added to the language. If we know that there is a proof of p′
from S′, then we can put new variables back in to replace the new constants
and repeatedly apply (Gen), axiom (A6) and (MP) to obtain a proof of p from S.

Now since S ⊨ p, the theory S ∪ {¬p} has no models, and so S ∪ {¬p} ⊢ ⊥ by
Theorem 3. By the Deduction Theorem, we have S ⊢ ¬¬p, and hence S ⊢ p
using the axiom ¬¬p ⇒ p and modus ponens.

Theorem 5. (Gödel’s Completeness Theorem for first-order logic)


Let S be a set of formulae in a first-order language L and p be a formula in L.
Then S ⊨ p if and only if S ⊢ p.

Corollary 6. (Compactness Theorem) Let S be a theory in a first-order


language L. If every finite subset of S has a model, then S has a model.
Proof. If S has no model, then S ⊢ ⊥ by Theorem 3. As proofs are finite, there
is a finite subset S′ of S such that S′ ⊢ ⊥. By the Soundness Theorem, S′ ⊨ ⊥,
i.e., S′ has no model, which contradicts the assumption about S.

Applications of completeness/compactness
Question. Can we axiomatise the theory of finite groups? In other words, does
there exist a first-order theory T in a suitable language such that every finite
group is a model of T and every model of T is a finite group?

Consider, for each n ∈ N, the following sentence tn defined in any language L:

(∃ x1 )(∃ x2 ) . . . (∃ xn )(∀ x)(x = x1 ∨ x = x2 ∨ · · · ∨ x = xn )

Note that tn is satisfied in an L-structure A if and only if A has at most n


elements. So we would like to take T to be the theory of groups (in the language
of groups) and add the ‘sentence’ t1 ∨ t2 ∨ . . . . Obviously, this is not possible
as sentences are finite strings of symbols.
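
The claim that tn holds exactly in structures with at most n elements can be checked by brute force on small finite sets. This is only a sketch: the function name t_holds is hypothetical, and the quantifiers are evaluated directly over a finite, non-empty universe.

```python
from itertools import product

def t_holds(n, universe):
    """Evaluate (exists x1)...(exists xn)(forall x)(x = x1 or ... or x = xn)
    in a finite, non-empty universe by trying every choice of witnesses."""
    return any(
        all(x in witnesses for x in universe)
        for witnesses in product(universe, repeat=n)
    )

for size in range(1, 6):
    universe = list(range(size))
    # t_n holds exactly when the universe has at most n elements.
    print(size, [n for n in range(1, 6) if t_holds(n, universe)])
```

Consequently ¬tn holds exactly in structures with more than n elements, which is how the sentences ¬t1 , ¬t2 , . . . are used in the compactness arguments that follow.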

Corollary 7. The theory of finite groups is not axiomatisable as a first-order


theory.
Proof. Assume that T is a first-order theory whose models are the finite groups.
Let
S = T ∪ {¬t1 , ¬t2 , ¬t3 , . . . }
where the tn are the sentences defined above. Note that ¬tn says that there are
more than n elements in any model. Thus, every finite subset of S has a model:
e.g., the cyclic group of order N for sufficiently large N . By the Compactness
Theorem, S has a model, which is absurd.
A similar argument shows the following.

Corollary 8. Let S be a theory in a first-order language. If S has arbitrarily


large finite models, then S has an infinite model.
Proof. Set S′ = S ∪ {¬t1 , ¬t2 , . . . } (tn as before). Observe that, by assumption,
every finite subset of S′ has a model, and thus S′ has a model by the
Compactness Theorem. A model of S′ is an infinite model of S.

Corollary 9. (The Upward Löwenheim–Skolem Theorem) If a first-order


theory S has an infinite model, then it has an uncountable model.
Proof. Add an uncountable set {ci : i ∈ I} of new constants to the language
and set
S′ = S ∪ {¬(ci = cj ) : i, j ∈ I, i ≠ j} .
Any infinite model of S, which exists by assumption, is a model of any finite
subset of S′. Hence by the Compactness Theorem, S′ has a model. Note that
a model of S′ is a model A of S together with an injection I → A.

Note. For any set X we can take I = γ(X) (from Hartogs’ Lemma). This
shows that S has models that do not inject into X.

Remark. We can easily write down uncountable groups or vector spaces, but
already for fields, the Upward Löwenheim–Skolem Theorem is not obvious.

Corollary 10. (The Downward Löwenheim–Skolem Theorem) Let S be


a first-order theory in a countable language (meaning that Ω ∪ Π is countable).
If S has a model, then it has a countable model.
Proof. By the Soundness Theorem, S is consistent since it has a model. Then
the model constructed in the proof of the Model Existence Lemma is countable.

Peano Arithmetic
We finish this chapter with another worked example. Our aim is to axiomatise
the set of natural numbers. The key defining property of N is induction which
we try to emulate with an axiom-scheme.

The language consists of Ω = {0, s, +, ×} with arities 0, 1, 2, 2, respectively, and


Π = ∅.

Peano Arithmetic (PA) (also known as formal number theory) is the theory in
the language above with sentences as follows.

(∀ x)¬(sx = 0)

(∀ x)(∀ y)(sx = sy ⇒ x = y)

(∀ x)(x + 0 = x)

(∀ x)(∀ y)(x + sy = s(x + y))

(∀ x)(x × 0 = 0)

(∀ x)(∀ y)(x × sy = (x × y) + x)

(∀ y1 ) . . . (∀ yn )((p[0/x] ∧ (∀ x)(p ⇒ p[sx/x])) ⇒ (∀ x)p)
for any formula p with FV(p) = {x, y1 , . . . , yn }.
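
Read as recursion equations in the second argument, the axioms for + and × determine both operations on the numerals 0, s0, ss0, . . . . The sketch below is an informal illustration, assuming the standard base case x + 0 = x and using Python integers as stand-ins for numerals; the function names s, add and mul are hypothetical.

```python
def s(n):
    """Successor: stands for the operation symbol s."""
    return n + 1

def add(x, y):
    """x + 0 = x  and  x + s(y) = s(x + y)."""
    return x if y == 0 else s(add(x, y - 1))

def mul(x, y):
    """x * 0 = 0  and  x * s(y) = (x * y) + x."""
    return 0 if y == 0 else add(mul(x, y - 1), x)

print(add(3, 4))  # 7
print(mul(3, 4))  # 12
```

Only the successor function is used as a primitive here; addition and multiplication are obtained purely from the two pairs of equations.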

Remark. The last axiom is the axiom-scheme for induction. The variables
y1 , . . . , yn are parameters. To see why they are needed, consider the following
formula p: (x + y) + z = x + (y + z) with free variables x, y, z. We can prove
(∀ x)(∀ y)(∀ z)p by induction on z with x, y treated as parameters. Formally, we
verify that
p[0/z] ∧ (∀ z)(p ⇒ p[sz/z])
holds in any model of PA, and hence it is provable by completeness. We
then use the induction-scheme to deduce that (∀ z)p holds, and hence so does
(∀ x)(∀ y)(∀ z)p by Generalisation.

Note. An obvious model of PA is the set N0 of non-negative integers. (N is also
a model but it is more natural to include 0 in this context.) Then by the
Upward Löwenheim–Skolem Theorem, PA has uncountable models. Doesn’t
this contradict the fact that induction (and the other axioms of PA) uniquely
determine N0 ? The answer is ‘no’ because true induction says:

(∀ A ⊂ N0 )((0 ∈ A ∧ (∀ x)(x ∈ A ⇒ sx ∈ A)) ⇒ A = N0 )

whereas in first-order theory we cannot quantify over subsets of a structure.


Since the language of PA is countable, the induction axiom-scheme can only
capture countably many subsets of N0 .

Definition. A subset A of N0 is definable in PA if there is a formula p with
free variable x such that pN0 , the interpretation of p in N0 , is A.

Examples. The following sets are definable using the given formula:

set of squares: (∃ y)(y × y = x)



set of primes: ¬(x = 1) ∧ (∀ y)(y|x ⇒ (y = 1 ∨ y = x))
(where y|x is shorthand for (∃ z)(z × y = x) and 1 is s0)

powers of 2: (∀ y)((‘y is a prime’ ∧ y|x) ⇒ y = 2)
(where 2 is ss0.)
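
The three defining formulae can be evaluated in N0 for small values of x. The sketch below is an illustration only: the quantifiers in the formulae range over all of N0, and the finite search ranges are a shortcut justified as follows. For x ≥ 1, any y with y|x and any witness z satisfy y, z ≤ x, and for x = 0 the values y = 2 and y = 3 already falsify the formulae for primes and powers of 2. The function names are hypothetical.

```python
def divides(y, x):
    """y|x, i.e. (exists z)(z * y = x); a witness z can be taken at most x."""
    return any(z * y == x for z in range(x + 1))

def is_square(x):
    """(exists y)(y * y = x)."""
    return any(y * y == x for y in range(x + 1))

def is_prime(x):
    """not (x = 1) and (forall y)(y|x => (y = 1 or y = x))."""
    return x != 1 and all(
        (not divides(y, x)) or y == 1 or y == x for y in range(x + 4)
    )

def is_power_of_two(x):
    """(forall y)(('y is a prime' and y|x) => y = 2)."""
    return all(
        (not (is_prime(y) and divides(y, x))) or y == 2 for y in range(x + 4)
    )

print([x for x in range(20) if is_square(x)])        # [0, 1, 4, 9, 16]
print([x for x in range(20) if is_prime(x)])         # [2, 3, 5, 7, 11, 13, 17, 19]
print([x for x in range(20) if is_power_of_two(x)])  # [1, 2, 4, 8, 16]
```

Note that 0 satisfies neither the formula for primes nor the formula for powers of 2, because every y divides 0.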

Remark. Gödel’s Incompleteness Theorem says (amongst other things) that
PA is not a complete theory. Thus, there is a sentence p in the language of PA
that holds in N0 but PA ⊬ p.

5 Set Theory
This is just another piece of mathematical theory like group theory, topology,
etc. We will axiomatise set theory as a first-order theory. So we could think
of this chapter as just another worked example of first-order logic. Since any
model of set theory should contain all of mathematics, it will obviously be a
very complicated example of a first-order theory.

Zermelo–Fraenkel Set Theory (ZF). This is the version of set theory we


will be studying. The language has no operation symbols (Ω = ∅) and a single
predicate ∈ of arity 2. There will be 9 axioms (i.e., sentences of the theory): 2
to get started, 4 for building new sets and 3 further ones.
A model of ZF will be denoted by V . This is a non-empty set together with an
interpretation ∈V ⊂ V × V of the predicate ∈ in which the 9 axioms of ZF are
satisfied. Elements of V will be called ‘sets’. When (a, b) is in ∈V , we say that
‘a belongs to b’ or that ‘a is a member of b’. We refer to V as the ‘set-theoretic
universe’ or the ‘universe of sets’. So ‘set’ and ‘belongs to’ now have technical
meanings inside the universe, but they retain their usual meaning in the world
of mathematics of which V is a part. We will be interested in the question of
what the universe of all sets looks like. We begin by listing the axioms of ZF.

1. Axiom of extensionality (Ext).


‘If two sets have the same members, then they are equal.’

(Ext) (∀ x)(∀ y)((∀ z)(z ∈ x ⇔ z ∈ y) ⇒ x = y)

2. Axiom of separation (Sep).


‘Can form a subset of a set.’ This is in fact an axiom-scheme:

(Sep) (∀ t1 ) . . . (∀ tn )(∀ x)(∃ y)(∀ z)(z ∈ y ⇔ (z ∈ x ∧ p))

for any formula p with FV(p) = {z, t1 , . . . , tn }, where t1 , . . . , tn should be


thought of as parameters. The set y whose existence is asserted in the sen-
tence above is unique by (Ext) and is denoted by {z ∈ x : p}. (Formally, we
are adding an (n + 1)-ary operation symbol to the language of ZF.)

Note. Parameters are needed. For example, in a model of ZF we can form,


given sets t and x, the subset {z ∈ x : t ∈ z} of x.

3. Empty-set axiom (Emp).


‘There is a set with no members.’

(Emp) (∃ x)(∀ y)¬(y ∈ x)

The set x whose existence this axiom asserts is unique by (Ext). We call this set
the empty set denoted by ∅. (Formally, we add the constant ∅ to the language
of ZF with the sentence (∀ y)¬(y ∈ ∅).)
Strictly speaking, this axiom is not needed as it follows from (Sep). Indeed, in
a structure V , we can pick any set x and form the set {y ∈ x : ¬(y = y)} by
(Sep). However, if in first-order logic we allow the empty set as a structure,
then (Emp) is needed (or some axiom asserting the existence of some set).

4. Pair-set axiom (Pair).
‘For any sets x, y, we can form {x, y}.’

(Pair) (∀ x)(∀ y)(∃ z)(∀ t)(t ∈ z ⇔ (t = x ∨ t = y))

The unique (by Extensionality) set z whose existence is asserted here is denoted
by {x, y}. We shall write {x} for {x, x}. (Formally, we add a binary operation
{, } and a unary operation {} to the language of ZF.)
It follows from (Ext) that {x, y} = {y, x} for all x, y. Thus, (Pair) gives us
unordered pairs. Ordered pairs can be constructed using the following device
(due, independently, to K. Kuratowski and N. Wiener): for sets x, y define
(x, y) = {{x}, {x, y}}. This satisfies:

(∀ x)(∀ y)(∀ z)(∀ t)((x, y) = (z, t) ⇔ (x = z ∧ y = t))

as one would expect. We now introduce a number of abbreviations.


‘x is an ordered pair’ means

(∃ y)(∃ z)(x = (y, z))

‘f is a function’ means

(∀ x)(x ∈ f ⇒ ‘x is an ordered pair’)
∧ (∀ x)(∀ y)(∀ z)(((x, y) ∈ f ∧ (x, z) ∈ f ) ⇒ y = z)

‘f is a function from x to y’ (written f : x → y) means

(‘f is a function’) ∧ (∀ s)((∃ t)((s, t) ∈ f ) ⇔ s ∈ x)
∧ (∀ t)((∃ s)((s, t) ∈ f ) ⇒ t ∈ y)
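
The characteristic property of the ordered pair (x, y) = {{x}, {x, y}} can be tested on a handful of small sets. The sketch below is an illustration, not a proof: it models hereditarily finite sets by Python frozensets, and the names pair and sets are hypothetical.

```python
from itertools import product

def pair(x, y):
    """The Kuratowski ordered pair (x, y) = {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})

# A few small sets built from the empty set.
empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
sets = [empty, one, two, frozenset({one})]

# (x, y) = (z, t) if and only if x = z and y = t, for all choices from the sample.
characteristic = all(
    (pair(x, y) == pair(z, t)) == (x == z and y == t)
    for x, y, z, t in product(sets, repeat=4)
)
print(characteristic)                                            # True

# Unordered pairs, by contrast, do not record the order.
print(frozenset({empty, one}) == frozenset({one, empty}))        # True
print(pair(empty, one) == pair(one, empty))                      # False
```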

5. Union axiom (Un).


‘Can form the union of a set.’

(Un) (∀ x)(∃ y)(∀ z)(z ∈ y ⇔ (∃ t)(t ∈ x ∧ z ∈ t))

The set y whose existence this axiom asserts is unique by Extensionality, and
will be denoted by ⋃x. (Formally, we add the unary operation symbol ⋃ to
the language of ZF.) Note that the set ⋃x is the union of the members of x.
As an example, note that if x = {a, b}, then t ∈ ⋃x if and only if t ∈ a or t ∈ b.
We shall denote ⋃{a, b} by a ∪ b. (Formally, we introduce a binary operation
symbol ∪ to the language of ZF.)

Note. We do not need a separate axiom for intersections. Indeed, the following
sentence follows from the axioms so far:

(∀ x)(¬(x = ∅) ⇒ (∃ y)(∀ z)(z ∈ y ⇔ (∀ t)(t ∈ x ⇒ z ∈ t)))

Indeed, given a non-empty set x, we can form the set

y = {z ∈ ⋃x : (∀ t)(t ∈ x ⇒ z ∈ t)}

using (Un) and (Sep). Note that technically we work in a model here to construct
the set y; then the sentence above follows by the Completeness Theorem. We
will denote the unique (by Extensionality) set y constructed above by ⋂x. We
will write a ∩ b for ⋂{a, b}.

Note. We can now define the domain of a function f . Note that if (x, y) ∈ f ,
then since (x, y) = {{x}, {x, y}}, it follows that x, y ∈ ⋃⋃f . We can then form
the set

dom f = {x ∈ ⋃⋃f : (∃ y)((x, y) ∈ f )}

using (Un) and (Sep). This of course makes sense for any set f . Formally, we
introduce a new unary operation symbol dom to the language of ZF.

6. Power-set axiom (Pow).


‘Can form the power set of a set.’

(Pow) (∀ x)(∃ y)(∀ z)(z ∈ y ⇔ z ⊂ x)

where z ⊂ x is an abbreviation for (∀ t)(t ∈ z ⇒ t ∈ x). The unique set y whose
existence is asserted here is denoted by Px.
We can now form the Cartesian product of sets x and y. First note that for
s ∈ x and t ∈ y, the ordered pair (s, t) = {{s}, {s, t}} belongs to PP(x ∪ y). We
can then form the set

x × y = {z ∈ PP(x ∪ y) : (∃ s)(∃ t)(s ∈ x ∧ t ∈ y ∧ z = (s, t))}

using (Un), (Pow) and (Sep). In turn, we can form the set y^x of all functions
from x to y:

y^x = {f ∈ P(x × y) : f : x → y}

using (Pow) and (Sep).
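
As a sanity check on the two separations above, here is a small Python sketch, again assuming sets are modelled as nested frozensets (an illustrative choice, with hypothetical function names). It selects x × y from PP(x ∪ y) and the functions from x to y from P(x × y), so it is only practical for very small sets.

```python
from itertools import combinations

def power_set(x):
    """Px: all subsets of x."""
    items = list(x)
    return frozenset(frozenset(c) for r in range(len(items) + 1)
                     for c in combinations(items, r))

def pair(s, t):
    """Kuratowski ordered pair (s, t) = {{s}, {s, t}}."""
    return frozenset({frozenset({s}), frozenset({s, t})})

def product(x, y):
    """x × y, selected from PP(x ∪ y) as in the (Sep) step."""
    universe = power_set(power_set(x | y))
    return frozenset(z for z in universe
                     if any(z == pair(s, t) for s in x for t in y))

def functions(x, y):
    """y^x: members f of P(x × y) with exactly one (s, t) ∈ f for each s ∈ x."""
    return frozenset(f for f in power_set(product(x, y))
                     if all(sum(pair(s, t) in f for t in y) == 1 for s in x))

empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
```

For instance, product(two, two) has four members and functions(two, two) has four members, matching 2 · 2 and 2².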

7. Axiom of infinity (Inf ). With the first six axioms we can already do
quite a bit of mathematics. Also, in any model V there will be infinitely many
elements. For example, it is easy to show that the sets ∅, P∅, PP∅, . . . are pairwise
distinct.
For another example, let us first introduce, for a set x, the successor of x to be
the set x⁺ = x ∪ {x}. Then the sets ∅, ∅⁺, ∅⁺⁺, ∅⁺⁺⁺, . . . are pairwise distinct.
We shall denote these sets by 0, 1, 2, 3, . . . , respectively. Thus,

0 = ∅, 1 = {0}, 2 = {0, 1}, 3 = {0, 1, 2}, . . .

These examples show that from the outside, V is an infinite set. However, V is
not a set, i.e., the sentence (∃ x)(∀ y)(y ∈ x) does not hold in V . This is known
as Russell’s paradox. (Indeed, if the sentence holds in V , then we can form the
set y = {z ∈ x : ¬(z ∈ z)} by (Sep), and get a contradiction by considering
whether y ∈ y.) So we need an axiom that says that there is a set that contains,
for example, the elements 0, 1, 2, 3, . . . . We begin with a definition.

Say that ‘x is a successor set’ if

(0 ∈ x) ∧ (∀ y ∈ x)(y⁺ ∈ x)

where (∀ y ∈ x)p is shorthand for (∀ y)(y ∈ x ⇒ p) for any formula p. Any
successor set will contain 0, 1, 2, 3, . . . , and hence any successor set is infinite.
The axiom of infinity states that successor sets exist.

(Inf) (∃ x)(‘x is a successor set’)

It follows from this axiom (and the ones listed so far) that there is a smallest
successor set:

(∃ x)(‘x is a successor set’ ∧ (∀ y)(‘y is a successor set’ ⇒ x ⊂ y))

To prove this, fix a successor set y and form the set

z = {t ∈ Py : ‘t is a successor set’}

by (Pow) and (Sep). It is easy to check that x = ⋂z is the smallest successor
set, which will be denoted by ω.
Note that every successor set contained in ω is ω:
 
(∀ x ⊂ ω)((0 ∈ x ∧ (∀ y ∈ x)(y⁺ ∈ x)) ⇒ x = ω)

where (∀ x ⊂ y)p is shorthand for (∀ x)(x ⊂ y ⇒ p) for any formula p. So we
have true induction in V as we quantify over all subsets of ω. We refer to this
as ω-induction. (Note, however, that from the outside, ω may have subsets that
do not form a set inside V .)
It is straightforward to check that (∀ x ∈ ω)¬(x⁺ = 0). One can also prove that
(∀ x ∈ ω)(∀ y ∈ ω)(x⁺ = y⁺ ⇒ x = y). Thus, ω satisfies the usual rules for the
natural numbers (cf. axioms of Peano Arithmetic).
We now define abbreviations ‘x is finite’ to mean (∃ y ∈ ω)(‘x bijects with y’)
and ‘x is countable’ to mean that (∃ f )((f : x → ω) ∧ ‘f is injective’).
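
The first few members of ω can be built explicitly. The sketch below assumes sets are modelled as Python frozensets (an illustrative assumption; `successor` and `numeral` are hypothetical names) and checks the two Peano-style facts just mentioned on an initial segment only, so it illustrates rather than proves them.

```python
def successor(x):
    """x⁺ = x ∪ {x}."""
    return x | frozenset({x})

def numeral(n):
    """The set denoted by n, i.e. ∅ with the successor operation applied n times."""
    x = frozenset()
    for _ in range(n):
        x = successor(x)
    return x

numerals = [numeral(k) for k in range(6)]

# n = {0, 1, ..., n − 1}: each numeral is the set of the earlier ones.
each_is_set_of_predecessors = all(
    numerals[k] == frozenset(numerals[:k]) for k in range(6)
)
# x⁺ is never 0, and x ↦ x⁺ is injective on this initial segment.
successor_never_zero = all(successor(x) != numerals[0] for x in numerals)
successor_injective = len({successor(x) for x in numerals}) == len(numerals)
```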

8. Axiom of replacement (Rep).


With the axioms so far, we can form the set ω containing 0, 1, 2, . . . . How about
the sets ∅, P∅, PP∅, . . . ? We will see later that there is a map (inside V ) defined
by a formula that maps 0 ↦ ∅, 1 ↦ P∅, 2 ↦ PP∅, . . . . A map defined by a
formula is called a function-class. So we need an axiom saying that the image of
a set under a function-class is a set. It will then follow that the image of ω under
the function-class 0 ↦ ∅, 1 ↦ P∅, 2 ↦ PP∅, . . . is a set containing ∅, P∅, PP∅, . . . .

Digression on classes.

Definition. A class is a collection C of elements of V such that there is a
formula p with one free variable satisfying C = p_V . Thus, x is in C if and only
if p(x) holds in V , i.e., p_V (x) = 1.

Examples. 1. V is a class: we can take p to be the formula (x = x).
2. The collection of sets of size one is a class: we can take p to be the formula
(∃ y)(x = {y}).

Definition. A class C, given by a formula p with free variable y, is a set if
(∃ x)(∀ y)(y ∈ x ⇔ p) holds in V . Otherwise we say C is a proper class.

Examples. The class given by the formula (∀ y)(‘y is a successor set’ ⇒ x ∈ y)
is the set ω. On the other hand, V is a proper class (Russell’s paradox).

Definition. A function-class is a subset F of V × V such that F = p_V for some
formula p with two free variables x, y satisfying

(∀ x)(∀ y)(∀ z)((p ∧ p[z/y]) ⇒ y = z)

in V . Thus, (x, y) is in F if and only if p(x, y) holds in V , which we will of
course often write as F (x) = y.

Example. The map x ↦ {x} is a function-class: we can take p to be (y = {x}).

End of digression.
The Axiom of Replacement is an axiom-scheme stating that the image of a set
under a function-class is a set. As usual, we use parameters.

(Rep) (∀ t1) . . . (∀ tn)[(∀ x)(∀ y)(∀ z)((p ∧ p[z/y]) ⇒ (y = z))
        ⇒ (∀ x)(∃ y)(∀ z)(z ∈ y ⇔ (∃ u)(u ∈ x ∧ p[u/x, z/y]))]

for any formula p with FV(p) = {x, y, t1, . . . , tn}.

9. Axiom of foundation (Fnd).


We want a picture of the universe in which sets appear at a certain ‘time’, and
a set cannot appear before all its elements do. So we want to avoid pathological
behaviour like x ∈ x, i.e., we want to avoid {x} having no ∈-minimal member.
Similarly, we don’t want x ∈ y and y ∈ x, i.e., we want to avoid {x, y} having no
∈-minimal member. The Axiom of Foundation (or Axiom of Regularity) states
that every non-empty set has an ∈-minimal member:

(Fnd) (∀ x)(¬(x = ∅) ⇒ (∃ y)(y ∈ x ∧ (∀ z)(z ∈ x ⇒ ¬(z ∈ y))))

Remark. The nine axioms and axiom-schemes above form ZF set theory. Note
that the Axiom of Choice is not included. We shall write ZFC for ZF+AC, i.e.,
ZF set theory with the Axiom of Choice.
(AC) (∀ x)((∀ y ∈ x)¬(y = ∅) ⇒ (∃ f )(f : x → ⋃x ∧ (∀ y ∈ x)(f (y) ∈ y)))

Remark. For the rest of this chapter we work within ZF. Our ultimate aim is
to describe the set-theoretic universe V . We first prove versions of induction and
recursion similar to but more general than those introduced for well-ordered sets
in Chapter 2. This will eventually lead to a proper definition of ordinals thereby
filling the gap from Chapter 2. We then describe a picture of the universe in
which sets appear in ‘time’ measured by ordinals where no set appears before
all its members do.

Definition. A set x is transitive if every member of a member of x is a member
of x. Thus, ‘x is transitive’ is shorthand for

(∀ y)((∃ z)(z ∈ x ∧ y ∈ z) ⇒ y ∈ x)

or, equivalently, for ⋃x ⊂ x.

Remark. This is not the same as saying that ∈ is a transitive relation on x.

Examples. An easy ω-induction shows that every member of ω is a transitive
set. Another ω-induction shows that (∀ x ∈ ω)(x ⊂ ω), which implies that ω is
also a transitive set.

Lemma 1. Every set x is contained in a transitive set:

(∀ x)(∃ y)(‘y is transitive’ ∧ x ⊂ y)

Remark. Since the intersection of a non-empty set of transitive sets is transitive,
Lemma 1 implies that there is a smallest transitive set containing x, called
the transitive closure of x and denoted TC(x).

Idea of proof. If y is a transitive set with x ⊂ y, then ⋃x ⊂ y, and in
turn ⋃⋃x ⊂ y, etc. So we would like to form the set ⋃{x, ⋃x, ⋃⋃x, . . . }
which is indeed transitive and contains x. However, for this to work, we need
{x, ⋃x, ⋃⋃x, . . . } to be a set. This will follow by the Axiom of Replacement.
So we need to show that the map sending 0 ↦ x, 1 ↦ ⋃x, 2 ↦ ⋃⋃x, . . . is a
function-class.
Proof of Lemma 1. We introduce the abbreviation ‘f is an attempt’ to mean

‘f is a function’ ∧ (dom f ∈ ω) ∧ (x = f (0)) ∧ (∀ n)((n⁺ ∈ dom f ) ⇒ f (n⁺) = ⋃f (n))

A straightforward ω-induction shows that any two attempts agree on the
intersection of their domains:

(*) (∀ f )(∀ g)(∀ n)[(‘f is an attempt’ ∧ ‘g is an attempt’
        ∧ n ∈ dom f ∧ n ∈ dom g) ⇒ f (n) = g(n)]

Another ω-induction shows that every member of ω is in the domain of some
attempt:

(**) (∀ n ∈ ω)(∃ f )(‘f is an attempt’ ∧ n ∈ dom f )

To see this, form the set

w = {n ∈ ω : (∃ f )(‘f is an attempt’ ∧ n ∈ dom f )}

by (Sep). We show that w is a successor set, from which it will follow that
w = ω, as required. Since f = {(0, x)} is an attempt, 0 ∈ w. If n ∈ w, then fix
an attempt f with n ∈ dom f . Since every member of ω is transitive, we have
n ⊂ dom f , and hence n⁺ ⊂ dom f . By restricting f to n⁺, we can assume that
dom f = n⁺, in which case

g = f ∪ {(n⁺, ⋃f (n))}

is an attempt with n⁺ ∈ dom g. Thus, n⁺ ∈ w.


We now let p be the following formula with free variables y, z:

(∃ f )(‘f is an attempt’ ∧ (y, z) ∈ f )

It follows from (∗) that

(∀ y)(∀ z)(∀ u)((p ∧ p[u/z]) ⇒ u = z)

and thus p defines a function-class. By Replacement, the image of ω under this
function-class is a set: we can form the set w such that z ∈ w if and only if
(∃ y)p(y, z) holds. Informally, w is the set {x, ⋃x, ⋃⋃x, . . . }. We now form
the set t = ⋃w and show that t is transitive with x ⊂ t, which will complete the
proof.
Since {(0, x)} is an attempt, it follows that x ∈ w and x ⊂ t.
Now assume that a ∈ t. We need to show that a ⊂ t. Since a ∈ t, it follows
that a ∈ z for some z ∈ w, and in turn z = f (n) for some attempt f with
n ∈ dom f . By (∗∗), there is an attempt g with n⁺ ∈ dom g. Then n ∈ dom g
since members of ω are transitive, and g(n) = f (n) = z by (∗). It follows that
g(n⁺) = ⋃g(n) = ⋃z by the definition of an attempt, and so ⋃z ∈ w. Thus,
we have a ⊂ ⋃z ⊂ t, as required.

Remark. The set t constructed in the proof above is in fact the transitive
closure TC(x) of x.
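
For hereditarily finite sets the iteration x, ⋃x, ⋃⋃x, . . . reaches ∅ after finitely many steps, so the construction in the proof can be run directly. The following Python sketch assumes sets are modelled as nested frozensets (an illustrative assumption; the function names are hypothetical).

```python
def big_union(x):
    """⋃x: the set of all members of members of x."""
    return frozenset(z for t in x for z in t)

def is_transitive(x):
    """x is transitive iff ⋃x ⊂ x."""
    return big_union(x) <= x

def transitive_closure(x):
    """TC(x) = ⋃{x, ⋃x, ⋃⋃x, ...}; the loop stops once a level is empty."""
    closure, level = frozenset(), x
    while level:
        closure |= level
        level = big_union(level)
    return closure

empty = frozenset()
one = frozenset({empty})
x = frozenset({frozenset({one})})   # x = {{1}}, which is not transitive
tc = transitive_closure(x)          # {{1}, 1, 0}
```

The loop plays the role of the attempts in the proof: the k-th value of `level` is f(k) for any attempt f defined at k.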

Theorem 2. (Principle of ∈-induction) For each formula p with free variables
FV(p) = {x, t1, . . . , tn}, the following sentence holds in ZF.

(∀ t1) . . . (∀ tn)[(∀ x)((∀ y ∈ x)p[y/x] ⇒ p) ⇒ (∀ x)p]

Proof. Fix values t1, . . . , tn of the parameters and assume that p(x) holds whenever
p(y) holds for all members y of x, i.e., that (∀ x)((∀ y ∈ x)p[y/x] ⇒ p) holds.
Assume for a contradiction that ¬(∀ x)p holds and fix any set x such that p(x)
fails.
(At this point we would like to take a minimal counterexample, i.e., an
∈-minimal member of {y : ¬p(y)}. However, {y : ¬p(y)} may not be a set. This
is where transitive closure comes in.)
By Lemma 1 we can form the set t = TC({x}), and by (Sep) we can form the
set u = {y ∈ t : ¬p(y)}. Then x ∈ u, so u is non-empty and hence, by (Fnd), has
an ∈-minimal member z. If y ∈ z, then y ∈ t since t is transitive, and thus y ∉ u
since z is ∈-minimal in u. It follows that p(y) holds for all y ∈ z. By assumption
on p, we deduce p(z), contradicting the choice of z.

Remark. In the presence of the first eight axioms of ZF, the Principle of
∈-induction is equivalent to the Axiom of Foundation. One direction is Theorem 2.
For the converse, assume the Principle of ∈-induction. Say that a set x is regular
if

(∀ y)(x ∈ y ⇒ ‘y has an ∈-minimal member’)

(this definition is the clever bit). Then (Fnd) is equivalent to the assertion that
(∀ x)(‘x is regular’) which we prove by ∈-induction. Fix a set x and assume that
every y ∈ x is regular (the induction hypothesis). Let z be a set with x ∈ z.
We need to show that z has an ∈-minimal member. This is obviously true if x
itself is an ∈-minimal member of z. If not, then we have y ∈ z for some y ∈ x,
in which case z has an ∈-minimal member since y is regular by the induction
hypothesis. This shows that x is regular, as required.
We now turn to ∈-recursion. Informally, this is the statement that a function f
can be defined so that for every x, the value f (x) is given in terms of the values
f (y), y ∈ x.

Theorem 3. (∈-recursion theorem) Let G be a function-class (given by a
formula p in two variables) defined everywhere (i.e., (∀ x)(∃ y)p(x, y)). Then there
is a function-class F (given by a formula q in two variables) defined everywhere
such that (∀ x)(F (x) = G(F↾x)). Moreover, F is unique.

Note. The restriction F↾x of F to x is {(s, F (s)) : s ∈ x}, which is a set by
Replacement: it is the image of the set x under the function-class s ↦ (s, F (s)),
which is given by the formula (∃ z)(q(s, z) ∧ t = (s, z)).
Proof. We first prove uniqueness. Assume that F1 and F2 both satisfy the
conclusions of the theorem. We show (∀ x)(F1(x) = F2(x)) by ∈-induction. Fix
a set x and assume that (∀ y ∈ x)(F1(y) = F2(y)). Then F1↾x = F2↾x, and hence
F1(x) = G(F1↾x) = G(F2↾x) = F2(x).
We now turn to existence. We say that a set f is an attempt if

‘f is a function’ ∧ ‘dom f is transitive’ ∧ (∀ x)(x ∈ dom f ⇒ f (x) = G(f↾x))

Note that if x ∈ dom f , then x ⊂ dom f since dom f is transitive, and hence f↾x
makes sense. Now a straightforward ∈-induction (as in the proof of uniqueness)
shows that any two attempts agree on the intersection of their domains:

(*) (∀ f )(∀ g)(∀ x)[(‘f is an attempt’ ∧ ‘g is an attempt’
        ∧ x ∈ dom f ∧ x ∈ dom g) ⇒ f (x) = g(x)]

Another ∈-induction shows that every set is in the domain of some attempt:

(∀ x)(∃ f )(‘f is an attempt’ ∧ x ∈ dom f )

To see this, fix a set x and assume that for all y ∈ x there is an attempt defined
at y. Note that an attempt defined at y is defined on TC({y}) since the domain
of an attempt is transitive, and the restriction to TC({y}) of this attempt is still
an attempt. Hence by (∗), for each y ∈ x there is a unique attempt f_y with
domain TC({y}). Then {f_y : y ∈ x} is a set by Replacement and f′ = ⋃{f_y : y ∈ x}
is an attempt whose domain contains x as a subset. Finally, f = f′ ∪ {(x, G(f′↾x))}
is an attempt defined at x.
We now let q be the formula
(∃ f )(‘f is an attempt’ ∧ y = f (x))
It is now straightforward to verify that q defines a function-class F with the
required properties.
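
On hereditarily finite sets, ∈-recursion is ordinary structural recursion. The Python sketch below assumes sets are nested frozensets and passes the restriction F↾x to G as a dictionary {y: F(y) for y ∈ x}; both choices, and the names `epsilon_recursion` and `collect`, are illustrative assumptions rather than anything fixed by the notes.

```python
def epsilon_recursion(G):
    """Return F with F(x) = G(F↾x), where F↾x is passed as {y: F(y) for y in x}."""
    cache = {}
    def F(x):
        if x not in cache:
            cache[x] = G({y: F(y) for y in x})
        return cache[x]
    return F

def collect(restriction):
    """G for the transitive closure: TC(x) = x ∪ ⋃{TC(y) : y ∈ x}."""
    below = frozenset().union(*restriction.values())
    return frozenset(restriction) | below

TC = epsilon_recursion(collect)

empty = frozenset()
one = frozenset({empty})
two = frozenset({empty, one})
x = frozenset({frozenset({one})})   # x = {{1}}
```

The cache corresponds loosely to the attempts in the proof: each stored entry is a value that every attempt defined at that set must agree on.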

Remark. We can generalize ∈-induction and ∈-recursion to other relations.
By a relation-class we mean a formula r with two free variables (whose
interpretation in a model V is a subset of V × V ). For example, if r is (x ∈ y),
then r is the ∈-relation. The next definition identifies the two properties of the
∈-relation that were crucial in proving induction and recursion.

Definition. A relation-class r is well-founded if every non-empty set has an
r-minimal member:

(∀ x)(¬(x = ∅) ⇒ (∃ y)(y ∈ x ∧ (∀ z ∈ x)¬r(z, y)))

A relation-class r is local if the r-predecessors of a set form a set:

(∀ x)(∃ y)(∀ z)(z ∈ y ⇔ r(z, x))

Remark. Given a local relation-class r, we can define the notion of r-closure
similar to transitive closure. Then if in addition r is well-founded, we can prove
r-induction and r-recursion.
In what follows we will restrict attention to relations defined on sets. If r is a
relation on a set a, i.e., r ⊂ a × a, then r is automatically local since for any
x ∈ a, we can form the set {y ∈ a : y r x} by Separation. (Here we use the
familiar notation y r x instead of (y, x) ∈ r.) So in this case we just need r to be
well-founded (every non-empty subset of a has an r-minimal member) in order
to prove r-induction and r-recursion.
The next result shows that a well-founded relation on a set can always be
modelled by ∈ provided we assume a further property.

Definition. A relation r on a set a is extensional if members of a are uniquely
determined by their set of r-predecessors:

(∀ x ∈ a)(∀ y ∈ a)((∀ z ∈ a)(z r x ⇔ z r y) ⇒ x = y)

Theorem 4. (Mostowski’s Collapsing Theorem) Let r be a well-founded,
extensional relation on a set a. Then there is a transitive set b and a bijection
f : a → b such that

(∀ x ∈ a)(∀ y ∈ a)(x r y ⇔ f (x) ∈ f (y))

Moreover, the pair (b, f ) is unique.

Remark. The sentence above states that f is an order-isomorphism between a
with relation r and b with relation ∈. Thus, well-foundedness and extensionality
of r are necessary conditions.

Proof. We begin with existence. By r-recursion there is a function-class f such
that

(∀ x ∈ a)(f (x) = {f (y) : y ∈ a, y r x})

We set b = {f (x) : x ∈ a} which is a set by (Rep). Since {(x, f (x)) : x ∈ a}
is also a set by (Rep), we can take f to be a function. We verify that (b, f )
satisfies the conclusions of the theorem.
We first show that b is transitive. Given z ∈ b, z = f (x) for some x ∈ a, and
hence z = {f (y) : y ∈ a, y r x}. It follows that w ∈ b whenever w ∈ z.
By definition, f is surjective and x r y implies f (x) ∈ f (y) for all x, y ∈ a. It
remains to show that f is injective. This will also show that f (x) ∈ f (y) implies
x r y for all x, y ∈ a. Indeed, if f (x) ∈ f (y), then f (x) = f (z) for some z ∈ a
with z r y. Then by injectivity z = x, and thus x r y.
For x ∈ a, say that f is injective at x if (∀ y ∈ a)(f (y) = f (x) ⇒ y = x).
Then f is injective if and only if (∀ x ∈ a)(f is injective at x) which we show
by r-induction. Fix x ∈ a and assume that f is injective at s for all s ∈ a with
s r x. Assume that f (x) = f (y) for some y ∈ a. Then

{f (s) : s ∈ a, s r x} = {f (t) : t ∈ a, t r y}

Since f is injective at every s ∈ a with s r x, it follows that

{s : s ∈ a, s r x} = {t : t ∈ a, t r y}

Since the relation r is extensional, we have x = y as required.


We complete the proof by showing uniqueness. Let (b, f ) and (b′, f′) be pairs
both satisfying the conclusions of the theorem. We show

(∀ x ∈ a)(f (x) = f′(x))

by r-induction. Fix x ∈ a and assume that f (y) = f′(y) for all y ∈ a with y r x.
Given w ∈ f (x), we have w ∈ b since b is transitive, and w = f (z) for some
z ∈ a since f is surjective. Then f (z) ∈ f (x), and hence z r x. It follows by
the induction hypothesis that w = f (z) = f′(z) ∈ f′(x). Similarly, w ∈ f′(x)
implies w ∈ f (x). Thus, by the Axiom of Extensionality, we have f (x) = f′(x).
This completes the r-induction, which shows that f = f′, and thus b = b′.
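
For a finite set a the collapsing map can be computed directly from the recursion f(x) = {f(y) : y ∈ a, y r x}. The Python sketch below is an illustration under stated assumptions: the relation is given as a set of tuples (y, x) meaning y r x, the image sets are frozensets, and the function name is hypothetical. If r is not well-founded the recursion does not terminate.

```python
def mostowski_collapse(a, r):
    """Collapse a finite well-founded relation r on a.

    r is a set of tuples (y, x) meaning y r x; the result maps each x in a
    to f(x) = {f(y) : y in a, y r x}.
    """
    cache = {}
    def f(x):
        if x not in cache:
            cache[x] = frozenset(f(y) for y in a if (y, x) in r)
        return cache[x]
    return {x: f(x) for x in a}

# Strict divisibility on {1, 2, 4, 8}: a well-founded, extensional (indeed linear) relation.
a = {1, 2, 4, 8}
r = {(y, x) for x in a for y in a if y != x and x % y == 0}
collapse = mostowski_collapse(a, r)
b = frozenset(collapse.values())
```

In this example the four elements collapse to the sets 0, 1, 2, 3 of this chapter, so b is the set 4.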

Definition. An ordinal is a transitive set well-ordered by ∈ (equivalently,
linearly ordered by ∈ since ∈ is well-founded by (Fnd)).

Note. Suppose a is a set and r is a well-ordering on a. By Mostowski, there
is a transitive set b and a bijection f : a → b such that x r y ⇔ f (x) ∈ f (y) for
all x, y ∈ a. Thus, (a, r) is order-isomorphic to (b, ∈). It follows that b is an
ordinal. Moreover, by the uniqueness in Theorem 4, this is the unique ordinal
to which (a, r) is order-isomorphic. This unique ordinal is called the order-type
of the well-ordered set a.

Remark. We denote by ON the class of all ordinals. ON is a proper class by the
Burali-Forti paradox. Note that each order-isomorphism class of well-ordered
sets contains exactly one ordinal.

Proposition 5. Let α, β ∈ ON and a be a set of ordinals.
(i) Every member of α is an ordinal.

(ii) β ∈ α ⇔ β < α

(iii) β ∈ α or β = α or α ∈ β

(iv) α⁺ = α ∪ {α}

(v) ⋃a is an ordinal and ⋃a = sup a.

Remarks. 1. Recall from Chapter 2 that the notation β < α in part (ii) means
that β is order-isomorphic to a proper initial segment of α. Parts (i) and (ii)
together show that the ordinal α really is the set of ordinals strictly less than α.
2. Part (iii) shows that ∈ is a linear order on the class ON.
3. Part (iv) reconciles two definitions. According to the definition in Chapter 2,
α⁺ is the unique (up to order-isomorphism) well-ordered set that consists of
α as a proper initial segment and one extra element that is a maximum. By
Mostowski, this well-ordered set is order-isomorphic to a unique ordinal (its
order-type). Part (iv) shows that this ordinal is the successor of the set α as
defined in this chapter. In particular, this shows that the successor of an ordinal
is an ordinal.
4. Part (v) implies that any set x of well-ordered sets has an upper bound. This
was owed from Chapter 2 (see the Remark following Proposition 2.8). Indeed,
a = {order-type(y) : y ∈ x} is a set of ordinals by (Rep) which by part (v) has
an upper bound.

Proof. Since α is transitive, for x ∈ α, we have x = {y ∈ α : y ∈ x} is the
proper initial segment I_x of α. Parts (i) and (ii) now follow immediately. Note
that for the right-to-left implication in (ii), we use the uniqueness in Theorem 4.
(iii) is immediate from part (ii) and from Theorem 2.6. Note that if α ≠ β, then
α and β are not order-isomorphic by uniqueness in Theorem 4.
(iv) Let γ be the successor of α, i.e., γ = α ∪ {α}. It is straightforward to check
that γ is an ordinal. Then α is the maximum element of γ (with respect to ∈ of
course), and α is a proper initial segment of γ. So γ is indeed α⁺ in the sense
of Chapter 2.
(v) It is clear that ⋃a is transitive. It follows from parts (ii) and (iii) that
β ⊂ α ⇔ β ≤ α and that one of α ≤ β and β ≤ α holds. Thus, a is a nested set
of well-ordered sets (each well-ordered by ∈). It follows from Proposition 2.8
that ⋃a is well-ordered by ∈. Thus, ⋃a is an ordinal and it is clearly sup a.

Examples. We finally give some examples of ordinals. Rather trivially, 0 = ∅
is an ordinal. An easy ω-induction and Proposition 5 (iv) show that every
member of ω is an ordinal. Hence, so is ω = ⋃ω (the equality follows from
transitivity of ω). This shows that ω = sup ω, and thus the set ω introduced in
this chapter reassuringly coincides with the ω of Chapter 2.

Picture of the Universe
Idea. We build the entire universe V starting from the empty set by repeatedly
applying P and ⋃. So we have ∅, P∅, PP∅, . . . , then ⋃{∅, P∅, PP∅, . . . }, etc.
Formally, we define Vα, α ∈ ON, by ∈-recursion as follows.

V0 = ∅
V_{α⁺} = PVα
Vλ = ⋃{Vα : α < λ} for a non-zero limit ordinal λ.

The class of sets Vα , α ∈ ON, is called the von Neumann hierarchy. Our aim is
to show that every set appears in one of the sets Vα . This leads to the following,
somewhat unstable-looking, picture of the universe, which perhaps also explains
why it is usually denoted by V .

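[Figure: the von Neumann hierarchy drawn as a V-shaped picture. The ordinals ON
run up the vertical axis (0, 1, 2, . . . , ω, ω + 1, . . . , α, . . . ), and opposite
them sit the levels V0 = ∅, V1 = P∅, V2 = PP∅, . . . , Vω = ⋃{V0, V1, V2, . . . },
V_{ω+1} = PVω, . . . , Vα, . . . ]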

Lemma 6. Vα is transitive for all α ∈ ON.


Proof. We proceed by induction on α.
α = 0: V0 = ∅ is transitive.
α = β + 1: Assume Vβ is transitive and let x ∈ Vα and y ∈ x. We need to
show that y ∈ Vα . Since Vα = PVβ , we have x ⊂ Vβ and y ∈ Vβ . Since Vβ is
transitive, it follows that y ⊂ Vβ , and hence y ∈ PVβ = Vα , as required.
α is a non-zero limit: Assume Vβ is transitive for all β < α. Then Vα is a union
of transitive sets, and thus transitive.

Lemma 7. Vα ⊂ Vβ for all α ≤ β.

Proof. We proceed by induction on β. In each case, we can assume α < β. The
case β = 0 is clear.
β = γ + 1: Let α < β. Then α ≤ γ, and so Vα ⊂ Vγ by the induction hypothesis.
It follows that Vα ∈ PVγ = Vβ. By Lemma 6, the set Vβ is transitive, and thus
Vα ⊂ Vβ, as required.
β is a non-zero limit: If α < β, then Vα ⊂ Vβ by definition of Vβ.
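
The finite levels of the hierarchy are small enough to compute, which gives a concrete check of Lemmas 6 and 7 for the first few levels. This is a minimal Python sketch, assuming sets are modelled as nested frozensets (an illustrative assumption, with hypothetical function names); it stops at V4 because the levels grow as iterated powers of 2.

```python
from itertools import combinations

def power_set(x):
    """Px: all subsets of x."""
    items = list(x)
    return frozenset(frozenset(c) for r in range(len(items) + 1)
                     for c in combinations(items, r))

def von_neumann_levels(n):
    """Return [V_0, V_1, ..., V_n] using V_0 = ∅ and V_{k+1} = P(V_k)."""
    levels = [frozenset()]
    for _ in range(n):
        levels.append(power_set(levels[-1]))
    return levels

levels = von_neumann_levels(4)

# Lemma 6 (each level is transitive) and Lemma 7 (levels are nested), checked for n ≤ 4.
all_transitive = all(x <= v for v in levels for x in v)
all_nested = all(levels[i] <= levels[i + 1] for i in range(4))
```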

Theorem 8. The von Neumann hierarchy exhausts the set-theoretic universe,
i.e., (∀ x)(∃ α ∈ ON)(x ∈ Vα ) holds in ZF.

Note. If x ∈ Vα, then x ⊂ Vα since Vα is transitive. Conversely, if x ⊂ Vα,
then x ∈ PVα = V_{α+1}. Thus, Theorem 8 is equivalent to the assertion that
(∀ x)(∃ α ∈ ON)(x ⊂ Vα) holds in ZF. For a set x, the least ordinal α with
x ⊂ Vα is called the rank of x, denoted by rank(x).
Proof. We prove (∀ x)(∃ α ∈ ON)(x ⊂ Vα) by ∈-induction.
Fix a set x, and assume that every y ∈ x has a rank (the induction hypothesis).
Set α = sup{rank(y)⁺ : y ∈ x}, noting that {rank(y)⁺ : y ∈ x} is a set by
Replacement. For each y ∈ x, we have y ⊂ V_{rank(y)}, and so y ∈ V_{rank(y)⁺}.
By Lemma 7 we have V_{rank(y)⁺} ⊂ Vα for all y ∈ x, and hence x ⊂ Vα, as
required.

Corollary 9. rank(x) = sup{rank(y)⁺ : y ∈ x}.

Proof. It follows from the proof of Theorem 8 that

rank(x) ≤ sup{rank(y)⁺ : y ∈ x}.

For the reverse inequality, we first show that if x ∈ Vα then rank(x) < α. So
let us assume that x ∈ Vα. Then α > 0. If α = β⁺, then x ⊂ Vβ, and hence
rank(x) ≤ β < α. If α is a limit ordinal, then x ∈ Vβ for some β < α. It follows
that x ⊂ Vβ since Vβ is transitive, and thus rank(x) ≤ β < α.
Now set α = rank(x). Then for each y ∈ x, we have y ∈ Vα, and hence
rank(y) < α by the claim above. It follows that sup{rank(y)⁺ : y ∈ x} ≤ α.

Example. Using the formula above, an easy induction shows that rank(α) = α
for every ordinal α.
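
For hereditarily finite sets the formula of Corollary 9 is a terminating recursion with natural-number values. The Python sketch below assumes sets are nested frozensets and represents finite ranks as Python integers (both illustrative assumptions; the names are hypothetical).

```python
def rank(x):
    """rank(x) = sup{rank(y)⁺ : y ∈ x}, with sup ∅ = 0; finite ranks are ints."""
    return max((rank(y) + 1 for y in x), default=0)

def numeral(n):
    """The set denoted by n, built by iterating x ↦ x ∪ {x}."""
    x = frozenset()
    for _ in range(n):
        x = x | frozenset({x})
    return x

# rank(n) = n for the finite ordinals, as in the Example above.
ranks_of_numerals = [rank(numeral(k)) for k in range(6)]

# A non-ordinal example: {{1}} has rank 3.
x = frozenset({frozenset({numeral(1)})})
```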

6 Cardinal Arithmetic
In this chapter we are interested in the size of sets. So we will want to identify
sets that have the same size. We introduce the abbreviation ‘x ≡ y’ for the
formula (∃ f )(‘f is a bijection from x to y’). Note that this is an equivalence
relation on V .
Next, we wish to define the size, or cardinality, of a set x to be a set card(x)
such that the following holds.

(†) (∀ x)(∀ y)(card(x) = card(y) ⇔ x ≡ y)

One obvious choice for card(x) would be the ≡-equivalence class {y : y ≡ x}
of x. However, this is always a proper class (except when x = ∅ whose
≡-equivalence class is the set {∅}). Another possibility is to choose a particular
representative of the ≡-equivalence class of x to be card(x). This can be done if
we assume the Axiom of Choice. It turns out that something along the lines of
the first possibility can be made to work in ZF (this is a device due to D. S. Scott).

Definitions. Let x be a set. In ZFC we define the cardinality card(x) of x to
be the least ordinal α such that x ≡ α. Note that such ordinals exist since in
ZFC x can be well-ordered.
In ZF we first define the essential rank ess rank(x) of x to be the least ordinal
α such that there exists a set y with rank(y) = α and y ≡ x. We then define

card(x) = {y ∈ V_{ess rank(x)+1} : y ≡ x}

which is a set by Separation.

Note. In the rest of this chapter we will work in ZFC, so we could adopt the
first definition of cardinality. However, the exact definition does not matter that
much. What is important is property (†). Also, much of what we do below is
valid in ZF.

Definition. Say that a set m is a cardinal if m = card(x) for some set x. In
this case we say m is the cardinality of x.
Before discussing the arithmetic of cardinals, we introduce initial ordinals and
the alephs.

The Alephs
Definition. Say α ∈ ON is an initial ordinal if (∀ β ∈ ON)(β < α ⇒ ¬(β ≡ α)).

Examples. For every set x, the Hartogs’ ordinal γ(x) is an initial ordinal. Since
for n < ω we have γ(n) = n⁺ (easy ω-induction), it follows that all members of
ω are initial ordinals, which in turn implies that ω is an initial ordinal.
The ordinals ω^2, ω^3 or ε0 = ω^ω^ω^⋰ (an infinite tower of exponents) are not
initial ordinals as they all biject with ω. In fact, the next initial ordinal after ω
is γ(ω) = ω1. More generally, we can index the infinite initial ordinals as follows.

Definition. Define ωα for α ∈ ON by recursion:

ω0 = ω
ω_{β⁺} = γ(ωβ)
ωλ = sup{ωβ : β < λ} (λ non-zero limit)

Proposition 1. The ordinals ωα are exactly the infinite initial ordinals.

Remark. Note that for ordinals α < β, if β injects into α, then by the Schröder–
Bernstein theorem we have α ≡ β. It follows that if α < β and β is an initial
ordinal, then β cannot inject into α. We shall use this simple observation several
times below.
Proof. We first show that the ωα are initial ordinals by induction on α. We only
need to check the case when α is a non-zero limit. In this case, assume that
ωα ≡ γ for some γ < ωα . Then γ < ωβ for some β < α. Since ωβ < ωα (easy
induction on α), it follows that ωβ injects into γ contradicting the induction
hypothesis that ωβ is an initial ordinal.
Now assume that δ is an infinite initial ordinal. An easy induction shows that
α ≤ ωα for all α ∈ ON, and hence there is a least α with δ < ωα. Since δ is
infinite, α ≠ 0, and moreover α cannot be a limit, otherwise δ < ωβ for some
β < α contradicting the minimality of α. Thus α = β⁺ for some β that satisfies
ωβ ≤ δ < ω_{β⁺} = γ(ωβ). It follows that δ injects into ωβ, and thus δ = ωβ as δ
is an initial ordinal.

Notation. For α ∈ ON we denote by ℵα (‘aleph-α’) the cardinality of ωα. By
Proposition 1 the alephs are the cardinalities of all infinite sets. (This is true in
ZFC. In ZF the alephs are the cardinalities of all infinite well-ordered sets.)

The arithmetic of cardinals


We use the letters m, n, p for cardinals, and M, N, P for sets with cardinalities
m, n, p, respectively.

Definition. Write m ≤ n if M injects into N , and m < n if m ≤ n and m ≠ n.

Note. The relation ≤ is well defined, i.e., does not depend on the choice of
the sets M, N . It is also easy to check that ≤ is a partial order on the class
of cardinals. Antisymmetry (m ≤ n and n ≤ m imply m = n) follows from
Schröder–Bernstein. In ZFC it is even a linear order.

Definition. We define cardinal addition, multiplication and exponentiation as
follows.

m + n = card(M ⊔ N )
m · n = card(M × N )
m^n = card(M^N ) (M^N = {f ∈ P(N × M ) : f : N → M })

These operations are well-defined, i.e., they do not depend on the choice of the
sets M, N .

Properties. The following are straightforward to check by writing down a
bijection between appropriate sets.

m + n = n + m
(m + n) + p = m + (n + p)
m · n = n · m
(mn)p = m(np)
m(n + p) = mn + mp
(mn)^p = m^p n^p
m^{n+p} = m^n m^p
(m^n)^p = m^{np}

In addition, if m ≤ n, then m + p ≤ n + p, mp ≤ np, m^p ≤ n^p and p^m ≤ p^n
(the last for p ≠ 0, since 0^0 = 1 whereas 0^n = 0 for n ≠ 0).
This is again easy to verify by writing down appropriate injections.

Note. Cantor’s diagonal argument shows that m < 2^m for all cardinals m
(there is no surjection M → 2^M ). In particular, ℵ0 < 2^{ℵ0}, which contrasts with
ω = 2^ω for ordinal exponentiation.
Similarly, we have 2 · ℵ0 = ℵ0 · 2 in contrast with 2 · ω = ω ≠ ω · 2 for ordinal
multiplication.
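
The diagonal argument is constructive: from any map f : M → PM it produces a subset of M that f misses. The Python sketch below illustrates this for a finite M, identifying 2^M with PM and representing f as a dictionary (both illustrative assumptions; `diagonal_set` is a hypothetical name).

```python
def diagonal_set(M, f):
    """Given f : M → PM (as a dict), return D = {x ∈ M : x ∉ f(x)}.

    D differs from f(x) at x for every x ∈ M, so D is not in the image of f.
    """
    return frozenset(x for x in M if x not in f[x])

M = frozenset({0, 1, 2})
f = {0: frozenset({0, 1}), 1: frozenset(), 2: frozenset({0, 1, 2})}
D = diagonal_set(M, f)   # only 1 fails to belong to its own image
```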
A consequence of the next result is that addition and multiplication of alephs
are easy.

Theorem 2. ℵα · ℵα = ℵα for all α ∈ ON.


Proof. We proceed by induction on α. The case α = 0 is clear since ω × ω ≡ ω.
Now let α > 0 and assume that ωβ × ωβ ≡ ωβ for all β < α.
Well-order ωα × ωα by ‘going up in squares’: (x, y) < (w, z) if and only if

either max{x, y} < max{w, z},
or max{x, y} = max{w, z} = δ, say, and
        either y < δ = z,
        or y < z < δ,
        or x < w and y = z = δ.

Given δ ∈ ωα × ωα, the proper initial segment I_δ is contained in β × β for
some β < ωα. (E.g., if δ = (x, y), then β = max{x, y}⁺ will do.) Then
card(β) < card(ωα) since ωα is initial. So by the induction hypothesis, either β is
finite, or β × β ≡ β. It follows that card(I_δ) ≤ card(β × β) < card(ωα).
The above shows that every proper initial segment of ωα × ωα has order-type
< ωα, and hence ωα × ωα has order-type ≤ ωα. It follows that ωα × ωα injects
into ωα, and so ℵα · ℵα ≤ ℵα.
Since ℵα = ℵα · 1 ≤ ℵα · ℵα, the result follows.
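
In the case α = 0 the ‘going up in squares’ ordering can be written out explicitly. The Python sketch below encodes the ordering as a sort key and compares it with a closed-form position d² + y or d² + d + x, where d = max{x, y}. The closed form is not taken from the notes; it is derived here by counting the d² pairs with smaller maximum, and the function names are hypothetical.

```python
def square_key(p):
    """Sort key realising the 'going up in squares' ordering on pairs of naturals."""
    x, y = p
    d = max(x, y)
    # Within max = d: first the pairs (d, y) with y < d, ordered by y,
    # then the pairs (x, d) with x ≤ d, ordered by x.
    return (d, 0, y) if y < d else (d, 1, x)

def square_index(p):
    """Position of (x, y) in the ordering (derived closed form, an assumption here)."""
    x, y = p
    d = max(x, y)
    return d * d + y if y < d else d * d + d + x

N = 6
pairs = sorted(((x, y) for x in range(N) for y in range(N)), key=square_key)
indices = [square_index(p) for p in pairs]   # should enumerate 0, 1, ..., N*N − 1
```

Since the square N × N is an initial segment of the ordering, the positions of its members are exactly 0, . . . , N² − 1, which is the finite shadow of ω × ω having order-type ω.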

Corollary 3. Let α ≤ β be ordinals. Then ℵα + ℵβ = ℵα · ℵβ = ℵβ.

Proof. ℵβ ≤ ℵα + ℵβ ≤ ℵβ · 2 ≤ ℵα · ℵβ ≤ ℵβ · ℵβ = ℵβ.

Note. In ZFC one can define more general infinite sums and products of
cardinals. In the definitions below, as earlier, lower-case letters denote cardinals
and upper-case letters denote sets with cardinality the corresponding lower-case
letter.

Definitions. Let I be a set, and for each i ∈ I let m_i be a cardinal. Then

∑_{i∈I} m_i = card(⨆_{i∈I} M_i) and ∏_{i∈I} m_i = card(∏_{i∈I} M_i)

where ⨆_{i∈I} M_i = ⋃_{i∈I} M_i × {i} and

∏_{i∈I} M_i = {f : I → ⋃_{i∈I} M_i : f (i) ∈ M_i for all i ∈ I}.

Note. We need AC in these definitions twice. Firstly, we need to make a choice
of sets M_i with cardinality m_i. Secondly, when we show that these definitions
don’t depend on the choice of the M_i, we need to make a choice of bijections
f_i : M_i → M_i′ where M_i′ is another set with cardinality m_i.

Example. It is possible to show results similar to Corollary 3. For example, if
m_i ≤ ℵα for all i ∈ I and card(I) ≤ ℵα, then ∑_{i∈I} m_i ≤ ℵα.
Infinite products of cardinals relate to cardinal exponentiation which is hard. We
can achieve some reduction in the problem of studying cardinal exponentiation.
For example, if α ≤ β, then

2^{ℵβ} ≤ ℵα^{ℵβ} ≤ (2^{ℵα})^{ℵβ} = 2^{ℵα·ℵβ} = 2^{ℵβ}

So it is of interest to study cardinals of the form 2^{ℵβ}. We know that ℵβ < 2^{ℵβ}
but very little else is known. For example, a natural question is whether 2^{ℵ0}
is equal to ℵ1. Since 2^{ℵ0} is the cardinality of R, this became known as the
Continuum Hypothesis (or CH for short):

(CH) 2^{ℵ0} = ℵ1

K. Gödel (around 1940) and P. Cohen (in the 1960s) proved that if ZFC is
consistent, then so are ZFC+CH and ZFC+¬CH, respectively. So CH is
independent of ZFC.

7 *Classical descriptive set theory*

Note. The material in this chapter is non-examinable. We study ‘definable’
sets in Polish spaces: Borel sets and projective sets (see definitions below). We
aim to cover enough material to show the existence of analytic non-Borel sets
(a sort of superficial analogue of P≠NP) and that the Continuum Hypothesis
holds for analytic sets.

Polish spaces
Definition. A Polish space is a separable, complete metrizable topological
space.

Examples. The most important example for us is Baire space
$\mathcal{N} = \mathbb{N}^{\mathbb{N}}$, the space of sequences in $\mathbb{N}$ with
the product topology: open sets are unions of basic open sets which are of the form

$$U_{m_1,\dots,m_k} = \{ n = (n_i)_{i \in \mathbb{N}} \in \mathcal{N} : n_i = m_i \text{ for } 1 \le i \le k \}$$

for any finite sequence $m_1, \dots, m_k$ in $\mathbb{N}$. For $m \ne n$ in $\mathcal{N}$,
let $k \in \mathbb{N}$ be minimal with $m_k \ne n_k$, and set $d(m, n) = 1/k$. It is
easy to check that $d$ is a complete metric on $\mathcal{N}$ inducing the product
topology (the open balls are exactly the basic open sets). Moreover, the set of
eventually constant sequences is a countable dense subset of $\mathcal{N}$. Thus Baire
space is indeed a Polish space.
Another important example is the space $\{0, 1\}^{\mathbb{N}}$ which is compact in the
product topology and can be viewed as a subspace of $\mathcal{N}$ in the obvious way.
Further examples are Euclidean spaces $\mathbb{R}^d$ and, more generally, separable
Banach spaces and closed subsets thereof.
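The metric on Baire space can be explored numerically. The sketch below represents a sequence as a Python function $i \mapsto n_i$ (indices starting at 1) and searches for the first difference up to a cut-off; the function name `baire_distance` and the `horizon` parameter are assumptions of this illustration, and sequences that agree up to the horizon are reported as being at distance 0.

```python
from fractions import Fraction

def baire_distance(m, n, horizon=1000):
    """Return 1/k for the least k in 1..horizon with m(k) != n(k).

    Returns 0 if the two sequences agree on all indices up to the horizon.
    """
    for k in range(1, horizon + 1):
        if m(k) != n(k):
            return Fraction(1, k)
    return Fraction(0)

# Three sample sequences of positive integers.
constant_one = lambda i: 1
differs_at_3 = lambda i: 2 if i == 3 else 1
differs_from_5 = lambda i: 2 if i >= 5 else 1

d_xy = baire_distance(constant_one, differs_at_3)      # first difference at index 3
d_yz = baire_distance(differs_at_3, differs_from_5)    # first difference at index 3
d_xz = baire_distance(constant_one, differs_from_5)    # first difference at index 5
```

In this sample the distances satisfy $d(x, z) \le \max\{d(x, y), d(y, z)\}$, a stronger form of the triangle inequality, which is consistent with open balls coinciding with basic open sets.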

Lemma 1. Every (non-empty) Polish space $X$ is the continuous image of Baire
space.

Proof. By separability, we can write $X$ as a union $\bigcup_{m \in \mathbb{N}} U_m$ of
non-empty open sets each of diameter at most 1. In turn, each $U_m$ can be written
as a union $\bigcup_{n \in \mathbb{N}} U_{m,n}$ of non-empty open sets each of diameter
at most $1/2$. Continuing this way, we find a family of non-empty open sets
$U_{m_1,\dots,m_k}$ of diameter at most $1/k$ indexed by finite sequences in $\mathbb{N}$
such that $U_{m_1,\dots,m_{k-1}} = \bigcup_{m_k \in \mathbb{N}} U_{m_1,\dots,m_k}$
for all $k, m_1, \dots, m_{k-1} \in \mathbb{N}$.
Next, fix an element $x_{m_1,\dots,m_k}$ of $U_{m_1,\dots,m_k}$ for each finite sequence
$m_1, \dots, m_k$ in $\mathbb{N}$. Define $\varphi : \mathcal{N} \to X$ by
$\varphi(n) = \lim_{k \to \infty} x_{n_1,\dots,n_k}$ for $n = (n_i)_{i \in \mathbb{N}} \in \mathcal{N}$
(the limit exists by completeness, since the points $x_{n_1,\dots,n_k}$ form a Cauchy
sequence). It is straightforward to verify that $\varphi$ is continuous and surjective.

Lemma 2. $\mathcal{N}$ is homeomorphic to the set of irrationals in $[0, 1]$.

Proof. Use continued fractions: Define

$$\varphi(n) = \cfrac{1}{n_1 + \cfrac{1}{n_2 + \cfrac{1}{n_3 + \cdots}}}$$

for $n = (n_i)_{i \in \mathbb{N}} \in \mathcal{N}$.
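The map $\varphi$ can be approximated by truncating the continued fraction after finitely many terms. The sketch below computes these truncations exactly with rational arithmetic; the function name `convergent` is an assumption of this illustration, and the entries $n_i$ are taken to be positive integers as elsewhere in the notes.

```python
from fractions import Fraction

def convergent(prefix):
    """Value of 1/(n_1 + 1/(n_2 + ... + 1/n_k)) for a finite prefix (n_1, ..., n_k)."""
    value = Fraction(0)
    # Evaluate from the innermost term outwards.
    for n in reversed(prefix):
        value = 1 / (n + value)
    return value

short = convergent([1, 1, 1, 1])
# The constant sequence 1, 1, 1, ... gives truncations approaching (sqrt(5) - 1)/2.
long_approx = float(convergent([1] * 30))
golden_conjugate = (5 ** 0.5 - 1) / 2
```

Every truncation is rational, while the limit for the constant sequence is irrational; this matches the statement that the image of $\varphi$ consists of irrationals.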

Borel hierarchy
Definitions. Let $X$ be an arbitrary set. A σ-field (or σ-algebra) on $X$ is a
subset $\mathcal{F}$ of the power set $\mathcal{P}X$ such that

(i) $\emptyset \in \mathcal{F}$

(ii) $A_1, A_2, \dots \in \mathcal{F}$ implies $\bigcup_{n \in \mathbb{N}} A_n \in \mathcal{F}$

(iii) $A \in \mathcal{F}$ implies $X \setminus A \in \mathcal{F}$

Note that in particular a σ-field is closed under countable intersections (as well
as countable unions).
Now assume that $X$ is a Polish space. The Borel σ-field $\mathcal{B}$ on $X$ is the
smallest σ-field on $X$ containing all the open sets. (Equivalently, $\mathcal{B}$ is the
intersection of all σ-fields on $X$ that contain the open sets; there exists at least one
such σ-field, namely $\mathcal{P}X$.) Members of $\mathcal{B}$ are called Borel sets.

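Borel hierarchy. For a Polish space $X$, we define families $\Sigma^0_\alpha$ and
$\Pi^0_\alpha$ of subsets of $X$ for ordinals $1 \le \alpha < \omega_1$ by recursion as follows.

$$\Sigma^0_1 = \{ U \subset X : U \text{ open} \}$$

$$\Pi^0_1 = \{ F \subset X : F \text{ closed} \}$$

$$\Sigma^0_{\alpha+1} = \Bigl\{ \bigcup_{n \in \mathbb{N}} A_n : A_n \in \Pi^0_\alpha \text{ for all } n \in \mathbb{N} \Bigr\}$$

$$\Pi^0_{\alpha+1} = \{ X \setminus A : A \in \Sigma^0_{\alpha+1} \}$$

$$\Sigma^0_\lambda = \Bigl\{ \bigcup_{n \in \mathbb{N}} A_n : \forall\, n \in \mathbb{N} \ \exists\, \alpha < \lambda \ A_n \in \Pi^0_\alpha \Bigr\}$$

$$\Pi^0_\lambda = \{ X \setminus A : A \in \Sigma^0_\lambda \}$$

where in the last two lines $\lambda$ is a non-zero limit ordinal. The collection of these
families is the Borel hierarchy of $X$.
We define $\Delta^0_\alpha = \Sigma^0_\alpha \cap \Pi^0_\alpha$ for $1 \le \alpha < \omega_1$.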

Example. $\Sigma^0_2$ is the family of countable unions of closed sets, known as $F_\sigma$-sets.
$\Pi^0_2$ is the family of countable intersections of open sets, known as $G_\delta$-sets.

Remark. Any open set in a Polish space (or indeed in any metric space) is
a countable union of closed sets, i.e., $\Sigma^0_1 \subset \Sigma^0_2$. An easy induction
then shows that $\Sigma^0_\alpha \subset \Delta^0_\beta$ and $\Pi^0_\alpha \subset \Delta^0_\beta$
for $1 \le \alpha < \beta < \omega_1$.

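$$\begin{array}{ccccccc}
 & \Sigma^0_1 & & \Sigma^0_2 & & \Sigma^0_3 & \\
\Delta^0_1 & & \Delta^0_2 & & \Delta^0_3 & & \cdots \\
 & \Pi^0_1 & & \Pi^0_2 & & \Pi^0_3 &
\end{array}$$

(In this diagram each family is contained in every family in a column to its right.)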

Lemma 3. $\bigcup_{1 \le \alpha < \omega_1} \Sigma^0_\alpha = \bigcup_{1 \le \alpha < \omega_1} \Pi^0_\alpha = \mathcal{B}$ in any Polish space.

Proof. The first equality follows from the inclusions above. For the second
equality, first show by induction that $\Sigma^0_\alpha \subset \mathcal{B}$ for all
$1 \le \alpha < \omega_1$, and then show that $\bigcup_{1 \le \alpha < \omega_1} \Sigma^0_\alpha$
is a σ-field containing the open sets.

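Definition. A subset $A \subset \mathcal{N} \times \mathcal{N}$ is a universal $\Sigma^0_\alpha$-set if

(i) $A$ is $\Sigma^0_\alpha$, and

(ii) if $B \subset \mathcal{N}$ is $\Sigma^0_\alpha$, then $B = \{ n \in \mathcal{N} : (m, n) \in A \}$ for some $m \in \mathcal{N}$.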

Theorem 4. For every $1 \le \alpha < \omega_1$, there exists a universal $\Sigma^0_\alpha$-set.


Proof. We first show that there exists a universal open set. Enumerate the basic
open sets (recall that these are indexed by finite sequences of positive integers)
as $U_1, U_2, U_3, \dots$. Let

$$A = \{ (m, n) \in \mathcal{N} \times \mathcal{N} : \exists\, i \in \mathbb{N} \ \ n \in U_{m_i} \}.$$

It is easy to check that $A$ is open.
An open set $B \subset \mathcal{N}$ can be written as a union of basic open sets:
$B = \bigcup_{i \in \mathbb{N}} U_{m_i}$ for some $m \in \mathcal{N}$. Then

$$n \in B \iff \exists\, i \in \mathbb{N} \ \ n \in U_{m_i} \iff (m, n) \in A$$

and thus $B = \{ n \in \mathcal{N} : (m, n) \in A \}$.
So far we have established the theorem for $\alpha = 1$. One can prove the general
statement by induction on $\alpha$. This is left as an exercise.

Corollary 5. For each $1 \le \alpha < \omega_1$ there is a $\Sigma^0_\alpha$-subset of $\mathcal{N}$ that is not $\Pi^0_\alpha$.

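Remark. This leads to the following refinement of the picture of the Borel
hierarchy of $\mathcal{N}$, in which all inclusions are strict.

$$\begin{array}{ccccccc}
 & \Sigma^0_1 & & \Sigma^0_2 & & \Sigma^0_3 & \\
\Delta^0_1 & & \Delta^0_2 & & \Delta^0_3 & & \cdots \\
 & \Pi^0_1 & & \Pi^0_2 & & \Pi^0_3 &
\end{array}$$

(Each family is properly contained in every family in a column to its right.)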

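Proof. Fix a universal $\Sigma^0_\alpha$-set $A \subset \mathcal{N} \times \mathcal{N}$. Then
$B = \{ n \in \mathcal{N} : (n, n) \in A \}$ is $\Sigma^0_\alpha$ since $n \mapsto (n, n)$ is
continuous. If $B$ is $\Pi^0_\alpha$, then $B = \{ n \in \mathcal{N} : (m, n) \notin A \}$
for some $m \in \mathcal{N}$. Considering whether $m \in B$ leads to a contradiction.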
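The contradiction at the end of this proof can be spelled out in two lines. The following is a sketch of the diagonal step, using only the definition of $B$ and the choice of $m$ (which comes from applying universality of $A$ to the $\Sigma^0_\alpha$-set $\mathcal{N} \setminus B$).

```latex
\begin{align*}
m \in B &\iff (m, m) \in A    && \text{(definition of } B \text{)} \\
m \in B &\iff (m, m) \notin A && \text{(choice of } m \text{)}
\end{align*}
```

The two equivalences together give $(m, m) \in A \iff (m, m) \notin A$, which is impossible, so $B$ cannot be $\Pi^0_\alpha$.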

Projective hierarchy
Definition. A subset of a Polish space is analytic if it is a continuous image of
$\mathcal{N}$ (or it is empty).

Examples. It follows from Lemma 1 that any Polish space, and thus any closed
subset of a Polish space, is analytic.

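Remark. We shall often use implicitly the following observation. The spaces
$\mathbb{N} \times \mathcal{N} = \mathbb{N}^{\{0\} \sqcup \mathbb{N}}$,
$\mathcal{N} \times \mathcal{N} = \mathbb{N}^{\mathbb{N} \sqcup \mathbb{N}}$ and
$\mathcal{N}^{\mathbb{N}} = \mathbb{N}^{\mathbb{N} \times \mathbb{N}}$ are homeomorphic
to $\mathcal{N}$ in the obvious way.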

Proposition 6. Let $X$ be a Polish space and $A \subset X$. Then TFAE.

(i) $A$ is analytic.

(ii) $A$ is the continuous image of a Borel subset of some Polish space.

(iii) $A$ is the projection onto $X$ of a Borel subset of $Y \times X$ (for some Polish space $Y$).

(iv) $A$ is the projection onto $X$ of a closed subset of $Y \times X$ (for some Polish space $Y$).

(v) $A$ is the projection onto $X$ of a Borel subset of $\mathcal{N} \times X$.

(vi) $A$ is the projection onto $X$ of a closed subset of $\mathcal{N} \times X$.


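Proof. We show the implications (ii)⇒(i)⇒(vi). Together with the trivial
implications, this will complete the proof.
(i)⇒(vi): Let $f : \mathcal{N} \to X$ be continuous with $f(\mathcal{N}) = A$. Then $A$ is
the projection onto $X$ of the graph $\{ (n, f(n)) : n \in \mathcal{N} \}$ of $f$, which is
closed since $f$ is continuous and $X$ is Hausdorff.
(ii)⇒(i): We need to show that every Borel set is analytic. Since closed sets,
i.e., $\Pi^0_1$-sets, are analytic, it is enough to show that countable unions and
intersections of analytic sets are analytic. Indeed, an easy induction then shows
that $\Sigma^0_\alpha$-sets and $\Pi^0_\alpha$-sets are analytic for all $1 \le \alpha < \omega_1$,
and the result follows.
For each $k \in \mathbb{N}$ let $A_k \subset X$ be an analytic set, and let $F_k$ be a closed
subset of $\mathcal{N} \times X$ such that $A_k$ is the projection onto $X$ of $F_k$. Then

$$x \in \bigcup_{k \in \mathbb{N}} A_k \iff \exists\, k \in \mathbb{N} \ \exists\, n \in \mathcal{N} \ \ (n, x) \in F_k
\iff \exists\, (k, n) \in \mathbb{N} \times \mathcal{N} \ \ (n, x) \in F_k ,$$

and thus $\bigcup_{k \in \mathbb{N}} A_k$ is the projection onto $X$ of the closed subset

$$\{ (k, n, x) \in \mathbb{N} \times \mathcal{N} \times X : (n, x) \in F_k \}$$

of $\mathbb{N} \times \mathcal{N} \times X$. Similarly, we have

$$x \in \bigcap_{k \in \mathbb{N}} A_k \iff \forall\, k \in \mathbb{N} \ \exists\, n \in \mathcal{N} \ \ (n, x) \in F_k
\iff \exists\, (n^{(k)})_{k \in \mathbb{N}} \in \mathcal{N}^{\mathbb{N}} \ \forall\, k \in \mathbb{N} \ \ (n^{(k)}, x) \in F_k$$

and thus $\bigcap_{k \in \mathbb{N}} A_k$ is the projection onto $X$ of the closed subset

$$\bigcap_{k \in \mathbb{N}} \{ (n^{(1)}, n^{(2)}, \dots, x) \in \mathcal{N}^{\mathbb{N}} \times X : (n^{(k)}, x) \in F_k \}$$

of $\mathcal{N}^{\mathbb{N}} \times X$.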

Definition. We define $\Sigma^1_1$ to be the family of analytic sets (in some Polish
space) and $\Pi^1_1$ to be the family of complements of analytic sets, called coanalytic
sets. Then inductively, for $1 \le n < \omega$, we define $\Sigma^1_{n+1}$ to be the family of
continuous images of $\Pi^1_n$-sets, and $\Pi^1_{n+1}$ to be the family of complements of
$\Sigma^1_{n+1}$-sets. We also let $\Delta^1_n = \Sigma^1_n \cap \Pi^1_n$ for $1 \le n < \omega$.

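Note. It follows from Proposition 6 that $\mathcal{B} \subset \Delta^1_1$. Then an easy
induction establishes the following inclusions.

$$\begin{array}{cccccccc}
 & & \Sigma^1_1 & & \Sigma^1_2 & & \Sigma^1_3 & \\
\mathcal{B} \subset & \Delta^1_1 & & \Delta^1_2 & & \Delta^1_3 & & \cdots \\
 & & \Pi^1_1 & & \Pi^1_2 & & \Pi^1_3 &
\end{array}$$

(Each family is contained in every family in a column to its right.)

It follows that $\bigcup_{1 \le n < \omega} \Sigma^1_n = \bigcup_{1 \le n < \omega} \Pi^1_n$;
we denote the common union by $\mathbf{P}$.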

Definition. The collection of families $\Sigma^1_n$ and $\Pi^1_n$, $1 \le n < \omega$, is called the
projective hierarchy. Members of $\mathbf{P}$ are the projective sets.

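Theorem 7. There exists a universal analytic set $A \subset \mathcal{N} \times \mathcal{N}$.

Proof. Let $U \subset \mathcal{N} \times (\mathcal{N} \times \mathcal{N})$ be a universal open
set: If $V \subset \mathcal{N} \times \mathcal{N}$ is open, then
$V = \{ (m, n) \in \mathcal{N} \times \mathcal{N} : (p, m, n) \in U \}$ for some $p \in \mathcal{N}$.
Set

$$A = \{ (p, n) \in \mathcal{N} \times \mathcal{N} : \exists\, m \in \mathcal{N} \ \ (p, m, n) \notin U \} .$$

Then $A$ is analytic since it is the projection of a closed set. We show that $A$ is
universal.
Let $B \subset \mathcal{N}$ be analytic. Then there is a closed set $F \subset \mathcal{N} \times \mathcal{N}$ such that

$$B = \{ n \in \mathcal{N} : \exists\, m \in \mathcal{N} \ \ (m, n) \in F \}$$

and by the choice of $U$ above, there exists $p \in \mathcal{N}$ such that

$$F = \{ (m, n) \in \mathcal{N} \times \mathcal{N} : (p, m, n) \notin U \} .$$

It follows that

$$B = \{ n \in \mathcal{N} : \exists\, m \in \mathcal{N} \ \ (p, m, n) \notin U \} = \{ n \in \mathcal{N} : (p, n) \in A \}$$

as required.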

Corollary 8. There exists an analytic subset of $\mathcal{N}$ that is not coanalytic, i.e.,
it belongs to $\Sigma^1_1 \setminus \Pi^1_1$.

Proof. Let $A \subset \mathcal{N} \times \mathcal{N}$ be a universal analytic set. Let

$$B = \{ n \in \mathcal{N} : (n, n) \in A \} .$$

Then $B$ is analytic since $n \mapsto (n, n)$ is continuous.
Assume $B$ is also coanalytic. Then $B = \{ n \in \mathcal{N} : (m, n) \notin A \}$ for some
$m \in \mathcal{N}$. We get a contradiction by considering whether $m \in B$.

Remark. We have already observed that Borel sets are both analytic and
coanalytic. So the set $B$ constructed in the proof above is analytic non-Borel.
We will now show the converse, namely that a set that is both analytic and
coanalytic is Borel.

Definition. We say two subsets $Y, Z$ of a Polish space $X$ can be separated
by Borel sets if there exist disjoint Borel sets $B, C$ in $X$ such that $Y \subset B$ and
$Z \subset C$; equivalently, there exists a Borel set $B$ in $X$ such that
$Y \subset B \subset X \setminus Z$.

Theorem 9. (Lusin’s separation theorem) Disjoint analytic sets in a Polish
space can be separated by Borel sets.
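Proof. We first observe that given sets $Y = \bigcup_{n \in \mathbb{N}} Y_n$ and
$Z = \bigcup_{n \in \mathbb{N}} Z_n$ in a Polish space $X$, if $Y_m$ and $Z_n$ can be
separated by Borel sets for all $m, n \in \mathbb{N}$, then $Y$ and $Z$ can also be
separated by Borel sets. Indeed, for each $m, n \in \mathbb{N}$, choose a Borel set
$B_{m,n}$ such that $Y_m \subset B_{m,n} \subset X \setminus Z_n$. Then the Borel set
$B = \bigcup_{m \in \mathbb{N}} \bigcap_{n \in \mathbb{N}} B_{m,n}$ satisfies
$Y \subset B \subset X \setminus Z$.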
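The two inclusions claimed for $B$ in this observation can be checked directly. The following is a short sketch of that verification, using only the choice of the sets $B_{m,n}$.

```latex
\begin{align*}
Y_m &\subset \bigcap_{n \in \mathbb{N}} B_{m,n} \subset B
   && \text{for every } m, \text{ hence } Y = \bigcup_{m \in \mathbb{N}} Y_m \subset B, \\
\bigcap_{n \in \mathbb{N}} B_{m,n} &\subset B_{m,n_0} \subset X \setminus Z_{n_0}
   && \text{for every } m \text{ and } n_0, \text{ hence } B \subset \bigcap_{n_0 \in \mathbb{N}} (X \setminus Z_{n_0}) = X \setminus Z .
\end{align*}
```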
Now consider two disjoint analytic sets in a Polish space $X$. These have the
form $f(\mathcal{N})$ and $g(\mathcal{N})$ where $f : \mathcal{N} \to X$ and
$g : \mathcal{N} \to X$ are continuous with $f(\mathcal{N}) \cap g(\mathcal{N}) = \emptyset$.
Recall that for any finite sequence $m_1, \dots, m_k$ of positive integers, there is a
corresponding basic open set

$$U_{m_1,\dots,m_k} = \{ n \in \mathcal{N} : n_i = m_i \text{ for } 1 \le i \le k \}$$

in $\mathcal{N}$. Now assume that $f(\mathcal{N})$ and $g(\mathcal{N})$ cannot be separated
by Borel sets. Since $f(\mathcal{N}) = \bigcup_{n \in \mathbb{N}} f(U_n)$ and
$g(\mathcal{N}) = \bigcup_{n \in \mathbb{N}} g(U_n)$, it follows by the initial
observation that $f(U_{m_1})$ and $g(U_{n_1})$ cannot be separated by Borel sets for some
$m_1, n_1 \in \mathbb{N}$. Repeatedly applying this reasoning, we obtain $m, n \in \mathcal{N}$
such that $f(U_{m_1,\dots,m_k})$ and $g(U_{n_1,\dots,n_k})$ cannot be separated by Borel
sets for any $k \in \mathbb{N}$.
Since $f(\mathcal{N})$ and $g(\mathcal{N})$ are disjoint, it follows that $f(m) \ne g(n)$,
and hence $f(m)$ and $g(n)$ can be separated by disjoint open sets $V, W$ (as $X$ is
Hausdorff). Hence $f^{-1}(V)$ and $g^{-1}(W)$ are open neighbourhoods of $m$ and $n$,
respectively, and thus contain $U_{m_1,\dots,m_k}$ and $U_{n_1,\dots,n_k}$, respectively,
for a sufficiently large $k \in \mathbb{N}$. It follows that $f(U_{m_1,\dots,m_k}) \subset V$
and $g(U_{n_1,\dots,n_k}) \subset W$, which contradicts the choice of $m$ and $n$.

Corollary 10. A subset of a Polish space is Borel if and only if it is both
analytic and coanalytic. In symbols: $\Sigma^1_1 \cap \Pi^1_1 = \mathcal{B}$.
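The notes do not spell out how the corollary follows. One way to see it, sketched here under the assumption that Theorem 9 is applied to a set $A \subset X$ and its complement, is as follows. If $A$ is both analytic and coanalytic, then $A$ and $X \setminus A$ are disjoint analytic sets, so there is a Borel set $B$ with

```latex
\[
A \subset B \subset X \setminus (X \setminus A) = A ,
\qquad\text{and hence}\qquad A = B \in \mathcal{B} .
\]
```

The reverse inclusion $\mathcal{B} \subset \Sigma^1_1 \cap \Pi^1_1$ is the earlier observation that Borel sets are both analytic and coanalytic.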

Remark. Corollary 8 shows the existence of an analytic non-Borel set. We
now present a concrete example.

Example. Let Seq be the set of all finite sequences of positive integers. Then
$\mathcal{P}\,\mathrm{Seq} = \{0, 1\}^{\mathrm{Seq}}$ is a Polish space in the product
topology (homeomorphic to $\{0, 1\}^{\mathbb{N}}$ since Seq is countable).
Given $s = (m_1, \dots, m_k)$ and $t = (n_1, \dots, n_l)$ in Seq, write $s \prec t$ if
$0 \le k \le l$ and $m_i = n_i$ for $1 \le i \le k$.
Say $T \subset \mathrm{Seq}$ is a tree if $s \in T$ whenever $s \prec t$ and $t \in T$. Say
$n \in \mathcal{N}$ is an infinite branch of $T$ if $(n_1, \dots, n_k) \in T$ for all
$k \in \mathbb{N}$. Say $T$ is well-founded if $T$ has no infinite branch.
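These notions can be tried out on a small example. The sketch below represents a (possibly infinite) tree by a membership predicate on tuples and uses the tree of strictly decreasing sequences of positive integers, which is infinite but well-founded; all function names here are assumptions of this illustration, and the checks are carried out only on finite samples and up to a finite depth.

```python
from itertools import product

def is_prefix(s, t):
    """s ≺ t: s is an initial segment of t (possibly equal to t)."""
    return len(s) <= len(t) and t[:len(s)] == s

def strictly_decreasing(s):
    """Membership predicate of the tree of strictly decreasing sequences."""
    return all(a > b for a, b in zip(s, s[1:]))

def prefix_closed_on(sample, in_tree):
    """Check the tree condition on a finite sample of sequences."""
    return all(in_tree(t[:k])
               for t in sample if in_tree(t)
               for k in range(len(t) + 1))

def branch_survives(branch, in_tree, depth):
    """Largest k <= depth with (branch(1), ..., branch(k)) in the tree."""
    k = 0
    while k < depth and in_tree(tuple(branch(i) for i in range(1, k + 2))):
        k += 1
    return k

# All sequences over {1, 2, 3} of length at most 3.
sample = [s for k in range(4) for s in product((1, 2, 3), repeat=k)]
closed = prefix_closed_on(sample, strictly_decreasing)

# The sequence 5, 4, 3, 2, 1, 1, 1, ... leaves the tree after five steps.
candidate = lambda i: max(6 - i, 1)
survived = branch_survives(candidate, strictly_decreasing, depth=10)
```

A strictly decreasing sequence of positive integers starting at $n_1$ has length at most $n_1$, so no element of $\mathcal{N}$ is an infinite branch of this tree, even though the tree contains arbitrarily long sequences.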

Let $\mathcal{T}$ be the set of all trees and WFT be the set of all well-founded trees.
Note that $\mathcal{T}$ is a closed subset of $\mathcal{P}\,\mathrm{Seq}$, and thus
$\mathcal{T}$ is also a Polish space. We show that the subset WFT of $\mathcal{T}$ is
coanalytic. For any $T \in \mathcal{T}$ we have

$$T \notin \mathrm{WFT} \iff \exists\, n \in \mathcal{N} \ \forall\, k \in \mathbb{N} \ \ (n_1, \dots, n_k) \in T .$$

It follows that $\mathcal{T} \setminus \mathrm{WFT}$ is the projection onto $\mathcal{T}$ of the closed set

$$\bigcap_{k \in \mathbb{N}} \{ (n, T) \in \mathcal{N} \times \mathcal{T} : (n_1, \dots, n_k) \in T \} ,$$

and thus analytic. It is possible to show that WFT is not analytic, and hence
WFT is a coanalytic non-Borel set.

Definition. A subset of a topological space is perfect if it is closed and contains
no isolated points. (A point $x$ in a subset $A$ of a topological space $X$ is isolated
in $A$ if there is an open neighbourhood $U$ of $x$ such that $U \cap A = \{x\}$.)

Lemma 11. A non-empty perfect subset of a Polish space has cardinality $2^{\aleph_0}$.

Proof. Let $A$ be a non-empty perfect subset of a Polish space $X$. Given $x \in A$
and a radius $r > 0$, since $x$ is not isolated in $A$, there exist $y, z \in A$ and a radius
$s > 0$ such that the closed balls $B_s(y)$ and $B_s(z)$ are disjoint and contained in
$B_r(x)$. (Note that we are implicitly assuming that $X$ comes with a complete
metric defining its topology.)
Since $A$ is not empty, we can fix a point $x_\emptyset$ in $A$. Using the observation above,
we inductively construct points $x_{\varepsilon_1,\dots,\varepsilon_k}$ in $A$ indexed by finite
sequences $\varepsilon_1, \dots, \varepsilon_k$ in $\{0, 1\}$ (where $\emptyset$ is the sequence
of length zero) and radii $r_1, r_2, \dots$ such that the closed ball
$B_{r_k}(x_{\varepsilon_1,\dots,\varepsilon_k})$ contains the disjoint closed balls
$B_{r_{k+1}}(x_{\varepsilon_1,\dots,\varepsilon_k,0})$ and
$B_{r_{k+1}}(x_{\varepsilon_1,\dots,\varepsilon_k,1})$, and moreover $r_k \to 0$ as $k \to \infty$.
It is easy to verify that the function $\varphi : \{0, 1\}^{\mathbb{N}} \to A$ given by

$$\varphi(\varepsilon_1, \varepsilon_2, \dots) = \lim_{k \to \infty} x_{\varepsilon_1,\dots,\varepsilon_k}$$

is injective. It follows that $2^{\aleph_0} \le \operatorname{card}(A)$.
Since $X$ is the continuous image of $\mathcal{N}$, it follows that

$$\operatorname{card}(A) \le \operatorname{card}(X) \le \operatorname{card}(\mathcal{N}) = 2^{\aleph_0} ,$$

and hence $\operatorname{card}(A) = 2^{\aleph_0}$.
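The inductive construction in this proof can be made concrete in a simple case. The sketch below takes $X = \mathbb{R}$ and $A = [0, 1]$, starts at the point $1/2$ with radius $1/2$, and places the two children of a ball of radius $r$ centred at $x$ at $x \pm r/2$ with radius $r/4$; these particular choices, and the function names, are assumptions of this illustration rather than part of the notes.

```python
from fractions import Fraction
from itertools import product

def scheme_point(bits, centre=Fraction(1, 2), radius=Fraction(1, 2)):
    """Return (x, r): the centre and radius attached to a finite 0/1 sequence."""
    x, r = centre, radius
    for bit in bits:
        x = x + r / 2 if bit else x - r / 2   # move right for 1, left for 0
        r = r / 4                             # radii shrink to 0
    return x, r

def closed_ball(x, r):
    """Closed ball in the real line, as a pair of endpoints."""
    return (x - r, x + r)

def nested_and_disjoint(bits):
    """Check that both child balls lie in the parent ball and are disjoint."""
    lo, hi = closed_ball(*scheme_point(bits))
    lo0, hi0 = closed_ball(*scheme_point(bits + (0,)))
    lo1, hi1 = closed_ball(*scheme_point(bits + (1,)))
    inside = lo <= lo0 and hi0 <= hi and lo <= lo1 and hi1 <= hi
    return inside and hi0 < lo1

all_ok = all(nested_and_disjoint(bits)
             for k in range(6) for bits in product((0, 1), repeat=k))
points_at_depth_6 = {scheme_point(bits)[0] for bits in product((0, 1), repeat=6)}
```

Distinct 0/1 sequences of the same length end up in disjoint balls, which is the reason the limit map $\varphi$ is injective.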

Theorem 12. Every analytic set either has a non-empty perfect subset or is
countable. It follows that every infinite analytic set has cardinality $\aleph_0$ or $2^{\aleph_0}$.
Proof. For a tree $T$ let

$$[T] = \{ n \in \mathcal{N} : (n_1, \dots, n_k) \in T \text{ for all } k \in \mathbb{N} \}$$

be the set of all infinite branches of $T$. Note that for $T = \mathrm{Seq}$ we have
$[T] = \mathcal{N}$, and for $T \in \mathrm{WFT}$ we have $[T] = \emptyset$. For a tree $T$
and $s \in \mathrm{Seq}$, let

$$T(s) = \{ t \in T : t \prec s \text{ or } s \prec t \} .$$

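Now fix an analytic set $A$ in some Polish space $X$. Then
$A = f(\mathcal{N}) = f([\mathrm{Seq}])$ for some continuous function $f : \mathcal{N} \to X$.
For a tree $T$ let

$$T' = \{ s \in \mathrm{Seq} : f([T(s)]) \text{ is uncountable} \} .$$

Note that $T'$ is a tree contained in $T$. Next, define trees $T^{(\alpha)}$ by recursion as
follows.

$$T^{(0)} = \mathrm{Seq}$$

$$T^{(\alpha)} = \bigl(T^{(\beta)}\bigr)' \quad \text{if } \alpha = \beta^+$$

$$T^{(\alpha)} = \bigcap_{\beta < \alpha} T^{(\beta)} \quad \text{if } \alpha \text{ is a non-zero limit}$$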

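Since Seq is countable, there exists $\alpha < \omega_1$ such that $T^{(\alpha+1)} = T^{(\alpha)}$.
Set $T = T^{(\alpha)}$ and consider the following two cases.
If $T = \emptyset$, then

$$A = \bigcup_{\beta < \alpha} f\bigl( [T^{(\beta)}] \setminus [T^{(\beta+1)}] \bigr)$$

and

$$f\bigl( [T^{(\beta)}] \setminus [T^{(\beta+1)}] \bigr) = \bigcup \bigl\{ f\bigl([T^{(\beta)}(s)]\bigr) : f\bigl([T^{(\beta)}(s)]\bigr) \text{ is countable} \bigr\}$$

which implies that $A$ is countable.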


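Now consider the case when $T \ne \emptyset$. Given $s \in T$, since $T' = T$ and hence
$s \in T'$, it follows in particular that $f([T(s)])$ has at least two distinct elements
$f(m)$ and $f(n)$. Then by the continuity of $f$, there exists $k \in \mathbb{N}$ such that
$f([T(m_1, \dots, m_k)]) \cap f([T(n_1, \dots, n_k)]) = \emptyset$. We have thus shown that
for all $s \in T$ there exist $t, u \in T$ such that $s \prec t$, $s \prec u$ and
$f([T(t)]) \cap f([T(u)]) = \emptyset$.
Using the above observation, we can construct elements $s_{\varepsilon_1,\dots,\varepsilon_k}$
of $T$ for all finite sequences $\varepsilon_1, \dots, \varepsilon_k$ in $\{0, 1\}$ such that
for all $k \in \mathbb{N}$, we have
$s_{\varepsilon_1,\dots,\varepsilon_k} \prec s_{\varepsilon_1,\dots,\varepsilon_k,0}$,
$s_{\varepsilon_1,\dots,\varepsilon_k} \prec s_{\varepsilon_1,\dots,\varepsilon_k,1}$ and
$f([T(s_{\varepsilon_1,\dots,\varepsilon_k,0})]) \cap f([T(s_{\varepsilon_1,\dots,\varepsilon_k,1})]) = \emptyset$.
Finally, let

$$M = \{ n \in \mathcal{N} : \exists\, (\varepsilon_i) \in \{0, 1\}^{\mathbb{N}} \text{ such that } s_{\varepsilon_1,\dots,\varepsilon_k} \prec n \text{ for all } k \in \mathbb{N} \} .$$

One can show that $M$ is compact and $f(M)$ is a perfect set. This is left as an
exercise.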
