4400 Full
Pete L. Clark
Contents
Chapter 1. The Fundamental Theorem and Some Applications
1. Foundations
2. The Fundamental Theorem (in Z)
3. Some examples of failure of unique factorization
4. Consequences of the fundamental theorem
5. Some Irrational Numbers
6. Primitive Roots
Chapter 11. The Prime Number Theorem and the Riemann Hypothesis
1. Some History of the Prime Number Theorem
2. Coin-Flipping and the Riemann Hypothesis
Chapter 12. The Gauss Circle Problem and the Lattice Point Enumerator
1. Introduction
2. Better Bounds
3. Connections to average values
Appendix C. More on Polynomials
1. Polynomial Rings
2. Finite Fields
Appendix.
Bibliography
CHAPTER 1
Let us elaborate. Consider first the non-negative integers which, as is traditional, we will denote by $\mathbb{N}$, endowed with the operation $+$. This is a very simple structure: we start with 0, the additive identity, and get every positive integer by repeatedly adding 1.¹ In some sense the natural numbers under addition are the simplest nontrivial algebraic structure.
Note that subtraction is not in general defined on the natural numbers: we would like to define $a - b = c$ in case $a = b + c$, but of course there is not always such a natural number c: consider e.g. $3 - 5$.
As you well know, there are two different responses to this: the first is to formally extend the natural numbers so that additive inverses always exist. In other words, for every positive integer n, we formally introduce a corresponding number $-n$ with the property that $n + (-n) = 0$. Although it is not a priori obvious that such a construction works (rather, the details and meaning of this construction were a point of confusion even among leading mathematicians for a few thousand years), nowadays we understand that it works to give a consistent structure: the integers $\mathbb{Z}$, endowed with an associative addition operation $+$, which has an identity 0 and for which each integer n has a unique additive inverse $-n$.
The second response is to record the relation between two natural numbers a and b such that $b - a$ exists as a natural number. Of course this relation is just that $a \le b$. This is quite a simple relation on $\mathbb{N}$: indeed, for any pair of integers, we have either $a \le b$ or $b \le a$, and we have both exactly when $a = b$.²
Now for comparison consider the positive integers
\[ \mathbb{Z}^+ = \{1, 2, 3, \ldots\} \]
under the operation of multiplication. This is a richer structure: whereas additively there is a single building block, namely 1, the multiplicative building blocks are the prime numbers 2, 3, 5, 7, .... Of course the primes are familiar objects, but the precise analogy with the additive case may not be as familiar, so let us spell it out carefully: just as subtraction is not in general defined on $\mathbb{N}$, division is not in general defined on $\mathbb{Z}^+$. On the one hand we can "formally complete" $\mathbb{Z}^+$ by adjoining multiplicative inverses, getting this time the positive rational numbers $\mathbb{Q}^+$. However, again one can view the fact that $a/b$ is not always a positive integer as being intriguing rather than problematic, and we again consider the relation between two positive integers a and b that $b/a$ be a positive integer: in other words, that there exist a positive integer c such that $b = a \cdot c$. In such a circumstance we say that a divides b, and write it as $a \mid b$.³ It is easy to see that the relation of divisibility is more complicated than the relation $\le$, since divisibility is not a total ordering: e.g. $2 \nmid 3$ and also $3 \nmid 2$. What are we to make of this divisibility relation?
First, on a case-by-case basis, we do know how to determine whether a | b.
Proposition 1. (Division Theorem) For any positive integers n and d, there exist unique non-negative integers q and r with $0 \le r < d$ and $n = qd + r$.
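To make the case-by-case test concrete, here is a minimal sketch in Python. The helper name `divides` is ours, not part of the notes; the built-in `divmod` returns exactly the pair (q, r) of the Division Theorem when both arguments are positive.

```python
def divides(a: int, b: int) -> bool:
    """Return True when a | b, for positive integers a and b."""
    q, r = divmod(b, a)  # b = q*a + r with 0 <= r < a
    return r == 0

n, d = 47, 5
q, r = divmod(n, d)
assert n == q * d + r and 0 <= r < d
print(q, r)             # 9 2
print(divides(3, 12))   # True
print(divides(2, 3))    # False
```

So a | b is decided by one division with remainder: it holds exactly when r = 0.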
¹ Here I am alluding to the fact that in the natural numbers, addition can be defined in terms of the successor operation $s(n) = n + 1$, as was done by the 19th century mathematical logician Giuseppe Peano. No worries if you have never heard of the Peano axioms: their importance lies in the realm of mathematical logic rather than arithmetic itself.
² That is to say, the relation $\le$ on $\mathbb{N}$ is a linear, or total, ordering.
³ Careful: $a \mid b \iff \frac{b}{a}$ is an integer.
This is a very useful tool, but it does not tell us the structure of $\mathbb{Z}^+$ under the divisibility relation. To address this, the primes inevitably come into play: there is a unique minimal element of $\mathbb{Z}^+$ under divisibility, namely 1 (in other words, 1 divides every positive integer and is the only positive integer with this property): it therefore plays the analogous role to 0 under $\le$ on $\mathbb{N}$. In $\mathbb{N} \setminus \{0\}$, the unique smallest element is 1. In $\mathbb{Z}^+ \setminus \{1\}$ the smallest elements are the primes p. Given that the definition of a prime is precisely an integer greater than one divisible only by one and itself, this is clear. The analogue to repeatedly adding 1 is taking repeated powers of a single prime: e.g., $2, 2^2, 2^3, \ldots$. However, we certainly have more than one prime (in fact, as you probably know and we will recall soon enough, there are infinitely many primes) and this makes things more complicated. This suggests that maybe we should consider the divisibility relation one prime at a time.
So, for any prime p, let us define $a \mid_p b$ to mean that $\frac{b}{a}$ is a rational number which, when written in lowest terms, has denominator not divisible by p. For instance, $3 \mid_2 5$, since $\frac{5}{3}$, while not an integer, doesn't have a 2 in the denominator. For that matter, $3 \mid_p 5$ for all primes p different from 3, and this suggests the following:
Proposition 2. For any $a, b \in \mathbb{Z}^+$, $a \mid b \iff a \mid_p b$ for all primes p.
Proof. Certainly if $a \mid b$, then $a \mid_p b$ for all primes p. For the converse, write $\frac{b}{a}$ in lowest terms, say as $\frac{B}{A}$. Then $a \mid_p b$ iff A is not divisible by p. But the only positive integer which is not divisible by any primes is 1. So $A = 1$, and $\frac{b}{a} = B$ is an integer.
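Proposition 2 is easy to test numerically. The following is a sketch only: the function names `divides_at`, `primes_up_to` and `proposition_2_holds` are our own, and we use the standard `fractions.Fraction` type because it stores every rational in lowest terms.

```python
from fractions import Fraction

def divides_at(a: int, b: int, p: int) -> bool:
    """The relation a |_p b: b/a in lowest terms has denominator prime to p."""
    return Fraction(b, a).denominator % p != 0

def primes_up_to(n: int) -> list:
    """All primes <= n, by naive trial division (fine for small n)."""
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, p))]

def proposition_2_holds(a: int, b: int) -> bool:
    # Only primes p <= a can divide the denominator of b/a, so testing those suffices.
    locally = all(divides_at(a, b, p) for p in primes_up_to(a))
    return locally == (b % a == 0)

print(divides_at(3, 5, 2))  # True: 5/3 has no 2 in the denominator
print(divides_at(3, 5, 3))  # False
print(all(proposition_2_holds(a, b) for a in range(1, 40) for b in range(1, 40)))
```

Of course a finite check proves nothing; it only illustrates what the proposition asserts.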
In summary, we find that the multiplicative structure of $\mathbb{Z}^+$ is similar to the additive structure of $\mathbb{N}$, except that instead of there being one generator, namely 1, such that every element can be obtained as some power of that generator, we have infinitely many generators (the primes) and every element can be obtained (uniquely, as we shall see!) by taking each prime a non-negative integer number of times (which must be zero for all but finitely many primes). This switch from one generator to infinitely many does not in itself cause much trouble: given
\[ a = p_1^{a_1} \cdots p_n^{a_n} \]
and
\[ b = p_1^{b_1} \cdots p_n^{b_n} \]
we find that $a \mid b$ iff $a \mid_p b$ for all p iff $a_i \le b_i$ for all i. Similarly, it is no problem to multiply the two integers: we just have
\[ ab = p_1^{a_1 + b_1} \cdots p_n^{a_n + b_n}. \]
Thus we can treat positive integers under multiplication as vectors with infinitely many components, which are not fundamentally more complicated than vectors with a single component.
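The "vector" point of view can be sketched in a few lines of Python. Here we assume a naive trial-division routine `factor` (our own name) and represent the exponent vector as a `collections.Counter` mapping each prime to its exponent, with absent primes counting as exponent 0.

```python
from collections import Counter

def factor(n: int) -> Counter:
    """Exponent vector of n >= 1 as {prime: exponent}, by trial division."""
    exps = Counter()
    p = 2
    while p * p <= n:
        while n % p == 0:
            exps[p] += 1
            n //= p
        p += 1
    if n > 1:
        exps[n] += 1  # the remaining cofactor is prime
    return exps

a, b = 12, 360
fa, fb = factor(a), factor(b)
print(dict(fa), dict(fb))   # {2: 2, 3: 1} {2: 3, 3: 2, 5: 1}

# Multiplication adds exponent vectors componentwise.
assert fa + fb == factor(a * b)
# Divisibility is the componentwise order a_i <= b_i.
assert all(fa[p] <= fb[p] for p in fa) == (b % a == 0)
```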
The trouble begins when we mix the additive and multiplicative structures. If we write integers in standard decimal notation, it is easy to add them, and if we write integers in the above "vector" factored form, it is easy to multiply them. But what is the prime factorization of $2^{137} + 3^{173}$? In practice, the problem of, given an integer n, finding its prime power factorization (1) is extremely computationally difficult, to the extent that much present-day cryptographic security rests on this difficulty.
It is remarkable how quickly we can find ourselves in very deep waters by asking apparently innocuous questions that mix additive and multiplicative structure. For instance, although in the multiplicative structure, each of the primes just rests "on its own axis" as a generator, in the additive structure we can ask where the primes occur with respect to the relation $\le$. We do not have anything approaching a formula for $p_n$, and the task of describing the distribution of the $p_n$'s inside $\mathbb{N}$ is a branch of number theory in and of itself (we will see a taste of it later on). For instance, consider the quantity $g(n) = p_{n+1} - p_n$, the nth prime gap. For $n > 1$, the primes are all odd, so $g(n) \ge 2$. Computationally one finds lots of instances when $g(n)$ is exactly 2, e.g. (5, 7), (11, 13), and so forth: an instance of $g(n) = 2$ (equivalently, of a prime p such that $p + 2$ is also a prime) is called a twin prime pair. The trouble is that knowing the factorization of p tells us nothing⁴ about the factorization of $p + 2$. Whether or not there are infinitely many twin primes is a big open problem in number theory.
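The computational experiment mentioned above is easy to reproduce. This sketch assumes a standard sieve of Eratosthenes (the function name `primes_below` is ours) and lists the first prime gaps and the twin prime pairs below 100.

```python
def primes_below(limit: int) -> list:
    """Sieve of Eratosthenes: all primes < limit (limit >= 2)."""
    sieve = [True] * limit
    sieve[0:2] = [False, False]
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # cross out the multiples of p, starting from p*p
            sieve[p * p :: p] = [False] * len(range(p * p, limit, p))
    return [n for n, is_p in enumerate(sieve) if is_p]

primes = primes_below(100)
gaps = [q - p for p, q in zip(primes, primes[1:])]   # g(1), g(2), ...
twins = [(p, q) for p, q in zip(primes, primes[1:]) if q - p == 2]
print(gaps[:8])   # [1, 2, 2, 4, 2, 4, 2, 4]
print(twins)
```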
It goes on like this: suppose we ask to represent numbers as a sum of two odd primes. Then such a number must be even and at least 6, and experimenting, one soon is led to guess that every even number at least 6 is a sum of two odd primes: this is known as Goldbach's Conjecture, and dates back to 1742. It remains unsolved. There are many, many such easily stated unsolved problems which mix primes and addition: for instance, how many primes p are of the form $n^2 + 1$? Again, it is a standard conjecture that there are infinitely many, and it is wide open. Note that if we asked instead how many primes were of the form $n^2$, we would have no trouble answering: the innocent addition of 1 gives us terrible problems.
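Both questions invite experiment. Below is a minimal sketch (the names `is_prime`, `goldbach_pair` and `n2_plus_1_primes` are ours) that searches for a representation of an even number as a sum of two odd primes and lists the first few primes of the form n^2 + 1. It verifies the conjecture only in a small range, which is evidence and not a proof.

```python
def is_prime(n: int) -> bool:
    """Naive primality test by trial division (fine for small n)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def goldbach_pair(n: int):
    """Odd primes (p, q) with p + q == n and p <= q, or None if there are none."""
    for p in range(3, n // 2 + 1, 2):
        if is_prime(p) and is_prime(n - p):
            return p, n - p
    return None

# Every even number from 6 up to 2000 is a sum of two odd primes.
assert all(goldbach_pair(n) is not None for n in range(6, 2000, 2))
print(goldbach_pair(100))   # (3, 97)

n2_plus_1_primes = [n * n + 1 for n in range(1, 30) if is_prime(n * n + 1)]
print(n2_plus_1_primes)
```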
Lest you think we are just torturing ourselves by asking such questions, let me
mention three amazing positive results:
Theorem 3. (Fermat, 12/25/1640) A prime $p > 2$ is of the form $x^2 + y^2$ iff it is of the form $4k + 1$.
This is, to my mind, the first beautiful theorem of number theory. It says that to check whether an odd prime satisfies the very complicated condition of being a sum of two (integer, of course!) squares, all we need to do is divide it by four: if its remainder is 1, then it is a sum of two squares; otherwise its remainder will be 3 and it will not be a sum of two squares.
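One can watch Theorem 3 at work on small primes. The sketch below uses a brute-force search (`two_squares` is our own helper, and `math.isqrt` is the standard integer square root); it illustrates the statement and has nothing to do with its proof.

```python
from math import isqrt

def is_prime(n: int) -> bool:
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

def two_squares(p: int):
    """Return (x, y) with x*x + y*y == p and 0 <= x <= y, or None."""
    for x in range(isqrt(p // 2) + 1):
        y = isqrt(p - x * x)
        if x * x + y * y == p:
            return x, y
    return None

for p in range(3, 60, 2):
    if is_prime(p):
        print(p, p % 4, two_squares(p))

# The remainder mod 4 decides the question for every odd prime below 500.
assert all((two_squares(p) is not None) == (p % 4 == 1)
           for p in range(3, 500, 2) if is_prime(p))
```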
Theorem 4. (Lagrange, 1770) Every positive integer is of the form $x^2 + y^2 + z^2 + w^2$.
that $x^2 + y^2$ can be factored in the ring $\mathbb{Z}[i]$ of Gaussian integers as $(x + iy)(x - iy)$ and will be our jumping off point to the use of algebraic methods. There is an analogous proof of Theorem 4 using a noncommutative ring of "integral quaternions". This proof however has some technical complications which make it less appealing for in-class presentation, so we do not discuss it in these notes.⁵ On the other hand, we will give parallel proofs of Theorems 3 and 4 using geometric methods.
The proof of Theorem 5 is of a different degree of sophistication than any other proofs in this course. We do present a complete proof at the end of these notes, but I have not managed to persuade myself that our treatment is appropriate for a one-semester undergraduate course in the subject.
Admission: In fact there is a branch of number theory which studies only the addition operation on subsets of $\mathbb{N}$: if A and B are two subsets of natural numbers, then by $A + B$ we mean the set of all numbers of the form $a + b$ for $a \in A$ and $b \in B$. For a positive integer h, by $hA$ we mean the set of all h-fold sums $a_1 + \ldots + a_h$ of elements of A (repetitions allowed). There are plenty of interesting theorems concerning these operations, and this is a branch of mathematics called additive number theory. In truth, though, it is much more closely related to other branches of mathematics like combinatorics, Fourier analysis and ergodic theory than to the sort of number theory we will be exploring in this course.
2. The Fundamental Theorem (in Z)
2.1. Existence of prime factorizations.
We had better pay our debts by giving a proof of the uniqueness of the prime power factorization. This is justly called the Fundamental Theorem of Arithmetic. Let us first nail down the existence of a prime power factorization, although as mentioned above this is almost obvious:
Proposition 6. Every positive integer n is a product of primes $p_1^{a_1} \cdots p_r^{a_r}$ (when $n = 1$ this is the empty product).
Proof. By induction on n, the case of $n = 1$ being trivial. Assume $n > 1$ and the result holds for all $m < n$. Among all divisors $d > 1$ of n, the least is necessarily a prime, say p. So $n = pm$ with $m < n$, and we apply the result inductively to m.
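The induction in this proof is really an algorithm: repeatedly split off the least divisor greater than 1. A minimal Python sketch, with function names of our own choosing:

```python
def least_divisor(n: int) -> int:
    """Smallest divisor d > 1 of n (for n > 1); it is necessarily prime."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime

def prime_factors(n: int) -> list:
    """Primes whose product is n, in nondecreasing order; [] for n == 1."""
    factors = []
    while n > 1:
        p = least_divisor(n)   # n = p * m with m < n
        factors.append(p)
        n //= p                # continue with m, as in the inductive step
    return factors

print(prime_factors(360))   # [2, 2, 2, 3, 3, 5]
print(prime_factors(97))    # [97]
print(prime_factors(1))     # []
```

Because the least divisor is always split off first, the output is automatically sorted, so collecting equal primes gives a factorization in standard form.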
Important Remark: Note that the result seemed obvious, and we proved it by induction. Formally speaking, just about any statement about the integers contains an appeal to induction at some point, since induction (or equivalently, the well-ordering principle that any nonempty subset of the non-negative integers has a smallest element) is (along with a few much more straightforward axioms) their characteristic property. But induction proofs can be straightforward, tedious, or both. Often I will let you fill in such induction proofs; I will either just say "by induction" or, according to taste, present the argument in less formal noninductive terms. To be sure, sometimes an induction argument is nontrivial, and those will be given in detail.
A factorization $n = p_1^{a_1} \cdots p_r^{a_r}$ is in standard form if $p_1 < \ldots < p_r$. Any factorization can be put in standard form by correctly ordering the prime divisors.
⁵ It was, in fact, the subject of a student project in the 2007 course.
p does not divide either a or b. Writing out $a = \prod_i p_i^{a_i}$ and $b = \prod_j q_j^{b_j}$, our assumptions are equivalent to $p_i \neq p \neq q_j$ for all i, j. But then $ab = \prod_i p_i^{a_i} \prod_j q_j^{b_j}$, and collecting this into standard form we get that no positive power of the prime p appears in the standard form factorization of ab. On the other hand, by assumption $p \mid ab$ so $ab = p \cdot m$, and then factoring m into primes we will get a standard form factorization of ab in which p does appear to some positive power, contradicting the uniqueness of the standard form prime factorization.
Theorem 142 $\implies$ Theorem 140: Let us induct on the (minimal!) number r of factors in a prime factorization of n. The case of $r = 0$, i.e., $n = 1$, is trivial. Suppose the result holds for numbers with $< r$ factors, and consider
\[ n = p_1^{a_1} \cdots p_r^{a_r} = q_1^{b_1} \cdots q_s^{b_s}. \]
\[ n = p_1 \cdots p_r = q_1 \cdots q_s. \]
Here the $p_i$'s and $q_j$'s are prime numbers, not necessarily distinct from each other. However, we must have $p_1 \neq q_j$ for any j. Indeed, if we had such an equality, then after relabelling the $q_j$'s we could assume $p_1 = q_1$ and then divide through by $p_1 = q_1$ to get a smaller positive integer $\frac{n}{p_1}$. By the assumed minimality of n, the prime factorization of $\frac{n}{p_1}$ must be unique: i.e., $r - 1 = s - 1$ and $p_i = q_i$ for all $2 \le i \le r$. But then multiplying back by $p_1 = q_1$ we see that we didn't have two different factorizations after all. (In fact this shows that for all i, j, $p_i \neq q_j$.)
In particular $p_1 \neq q_1$. Without loss of generality, assume $p_1 < q_1$. Then, if we subtract $p_1 q_2 \cdots q_s$ from both sides of (1), we get
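\[ n - p_1 q_2 \cdots q_s = p_1 (p_2 \cdots p_r - q_2 \cdots q_s) = (q_1 - p_1)\, q_2 \cdots q_s. \tag{2} \]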
It is not obvious, but rather familiar and true. The best way to perceive the non-obviousness is to consider new and different contexts.
Example: let E denote the set of even integers.⁷ Because this is otherwise known as the ideal $(2) = 2\mathbb{Z}$, it has a lot of structure: it forms a group under addition, and there is a well-defined multiplication operation satisfying all the properties of a ring except one: namely, there is no 1, or multiplicative identity. (A ring without identity is sometimes wryly called a rng, so the title of this section is not a typo.)
Let us consider factorization in E: in general, an element x of some structure should be "prime" if every factorization $x = yz$ is trivial in some sense. However, in E, since there is no 1, there are no trivial factorizations, and we can define an element x of E to be prime if it cannot be written as the product of two other elements of E.
Of course this is a new notion of prime: 2 is a conventional prime and also a prime of E, but clearly none of the other conventional primes are E-prime. Moreover there are E-primes which are not prime in the usual sense: e.g., 6 is E-prime. Indeed, it is not hard to see that an element of E is an E-prime iff it is divisible by 2 but not by 4.
Now consider
\[ 36 = 2 \cdot 18 = 6 \cdot 6. \]
Since 2, 18 and 6 are all divisible by 2 and not 4, they are E-primes, so 36 has two different factorizations into E-primes.
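The failure of unique factorization in E can be found by a brute-force search. In this sketch, `is_e_prime` and `e_factorizations` are our own hypothetical helpers, restricted for simplicity to positive even integers; the first implements the definition of E-prime directly and the second enumerates factorizations as nondecreasing tuples.

```python
def is_e_prime(x: int) -> bool:
    """A positive even x is E-prime if it is not a product of two even integers."""
    return x % 2 == 0 and not any(
        x % y == 0 and (x // y) % 2 == 0 for y in range(2, x, 2)
    )

def e_factorizations(x: int, smallest: int = 2) -> list:
    """All factorizations of x into E-primes >= smallest, as nondecreasing tuples."""
    results = [(x,)] if is_e_prime(x) and x >= smallest else []
    for y in range(smallest, x, 2):
        if is_e_prime(y) and x % y == 0 and x // y >= y:
            results += [(y,) + rest for rest in e_factorizations(x // y, y)]
    return results

# The criterion from the text: E-prime iff divisible by 2 but not by 4.
assert all(is_e_prime(x) == (x % 4 == 2) for x in range(2, 200, 2))
print(e_factorizations(36))   # [(2, 18), (6, 6)]
print(e_factorizations(60))   # [(2, 30), (6, 10)]
```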
This example begins to arouse our skepticism about unique factorization: it is not, for instance, inherent in the nature of factorization that factorization into primes must be unique. On the other hand, the rng E is quite artificial: it is an inconveniently small substructure of a better behaved ring $\mathbb{Z}$. Later we will see more distressing examples.
Example 2: Let $R = \mathbb{R}[\cos\theta, \sin\theta]$ be the ring of real trigonometric polynomials: i.e., the ring whose elements are polynomial expressions in $\sin\theta$ and $\cos\theta$ with real coefficients. We view the elements as functions from $\mathbb{R}$ to $\mathbb{R}$ and add and multiply them pointwise.
Of course this ring is not isomorphic to the polynomial ring $\mathbb{R}[x, y]$, since we have the Pythagorean identity $\cos^2\theta + \sin^2\theta = 1$. It is certainly plausible (and can be shown to be true) that all polynomial relations between the sine and cosine are consequences of this one relation, in the sense that R is isomorphic to the quotient ring $\mathbb{R}[x, y]/(x^2 + y^2 - 1)$.
Now consider the basic trigonometric identity
\[ (\cos\theta)(\cos\theta) = (1 + \sin\theta)(1 - \sin\theta). \tag{4} \]
It turns out that $\cos\theta$, $1 + \sin\theta$ and $1 - \sin\theta$ are all irreducible elements in the ring R. Moreover, the only units in R are the nonzero real numbers, so all three of these elements are nonassociate, and therefore (4) exhibits two different factorizations into irreducible elements! Thus, in a sense, the failure of unique factorization
⁷ This example is taken from Silverman's book. In turn Silverman took it, I think, from Harold Stark's introductory number theory text. Maybe it is actually due to Stark...
\[ n = \prod_i p_i^{a_i}, \]
where $p_i$ denotes the ith prime in sequence, and $a_i$ is a non-negative integer. This looks like an infinite product, but we impose the condition that $a_i = 0$ for all but finitely many i,⁸ so that past a certain point we are just multiplying by 1. The convenience of this is that we do not need different notation for the primes dividing some other integer.
Now suppose we have two such factored positive integers
⁸ In fact, this representation is precisely analogous to the expression of $(\mathbb{Z}^+, \cdot) \cong \bigoplus_i (\mathbb{N}, +)$ of problem G1.
\[ a = \prod_i p_i^{a_i}, \qquad b = \prod_i p_i^{b_i}. \]
Then we can give a simple and useful formula for the gcd and the lcm. Namely, the greatest common divisor of a and b is
\[ \gcd(a, b) = \prod_i p_i^{\min(a_i, b_i)}, \]
where $\min(c, d)$ just gives the smaller of the two integers c and d (and, of course, the common value $c = d$ when they are equal). More generally, we have that, writing out two integers a and b in factored form above, $a \mid b \iff a_i \le b_i$ for all i. In fact this is exactly the statement that $a \mid b \iff a \mid_p b$ for all p that we expressed earlier.
We often (e.g. now) find ourselves wanting to make reference to the $a_i$ in the prime power factorization of an integer a. The $a_i$ is the exponent of the highest power of $p_i$ that divides a. One often says that $p_i^{a_i}$ exactly divides a, meaning that $p_i^{a_i} \mid a$ and $p_i^{a_i + 1}$ does not. So let us define, for any prime p, $\operatorname{ord}_p(a)$ to be the exponent of the highest power of p that divides a; equivalently:
\[ n = \prod_i p_i^{\operatorname{ord}_{p_i}(n)}. \]
Notice that $\operatorname{ord}_p$ is reminiscent of a logarithm to the base p: in fact, that's exactly what it is when $n = p^a$ is a power of p only: $\operatorname{ord}_p(p^a) = a$. However, for integers n divisible by some prime $q \neq p$, $\log_p(n)$ is nothing nice (in fact, it is an irrational number) whereas $\operatorname{ord}_p(n)$ is by definition always a non-negative integer. In some sense, the beauty of the functions $\operatorname{ord}_p$ is that they allow us to "localize" our attention at one prime at a time: every integer n can be written as $p^r \cdot m$ with $\gcd(m, p) = 1$, and the $\operatorname{ord}_p$ just politely ignores the m: $\operatorname{ord}_p(p^r \cdot m) = \operatorname{ord}_p(p^r) = r$.
This is really just notation, but it is quite useful: for instance, we can easily see that for all p,
\[ \operatorname{ord}_p(\gcd(a, b)) = \min(\operatorname{ord}_p(a), \operatorname{ord}_p(b)); \]
this just says that the power of p which divides the gcd of a and b should be the largest power of p which divides both a and b. And then a positive integer n is determined by all of its $\operatorname{ord}_p(n)$'s via the above equation.
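Similarly, define the least common multiple $\operatorname{lcm}(a, b)$ of positive integers a and b to be the positive integer m which is a common multiple of a and b and has the property that $a \mid e$ and $b \mid e \implies m \mid e$. Then essentially the same reasoning gives us that
\[ \operatorname{ord}_p(\operatorname{lcm}(a, b)) = \max(\operatorname{ord}_p(a), \operatorname{ord}_p(b)), \]
and then that
\[ \operatorname{lcm}(a, b) = \prod_i p_i^{\max(a_i, b_i)}. \]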
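The min and max formulas translate directly into code. The sketch below uses our own helpers `ord_p`, `primes_up_to` and `gcd_lcm_via_ord`, and compares the result with the standard library's `math.gcd`. It is meant to illustrate the formulas and is far slower than the Euclidean algorithm.

```python
from math import gcd

def ord_p(p: int, n: int) -> int:
    """Exponent of the highest power of the prime p dividing the nonzero integer n."""
    if n == 0:
        raise ValueError("ord_p(0) is not a finite integer")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def primes_up_to(n: int) -> list:
    return [p for p in range(2, n + 1) if all(p % q for q in range(2, p))]

def gcd_lcm_via_ord(a: int, b: int):
    """gcd and lcm of positive a, b as products of p^min and p^max."""
    g = l = 1
    for p in primes_up_to(max(a, b)):
        g *= p ** min(ord_p(p, a), ord_p(p, b))
        l *= p ** max(ord_p(p, a), ord_p(p, b))
    return g, l

a, b = 84, 360
g, l = gcd_lcm_via_ord(a, b)
print(g, l)   # 12 2520
assert g == gcd(a, b)
assert l == a * b // gcd(a, b)
```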
We can equally well define $\operatorname{ord}_p$ on a negative integer n: it is again the largest power i of p such that $p^i \mid n$. Since multiplying by $-1$ doesn't change divisibility in any way, we have that $\operatorname{ord}_p(-n) = \operatorname{ord}_p(n)$. Note however that $\operatorname{ord}_p(0)$ is slightly problematic (every $p^i$ divides 0: $0 \cdot p^i = 0$) so if we are going to define this at all it would make sense to put $\operatorname{ord}_p(0) = \infty$.
We do lose something by extending the ord functions to negative integers: namely, since for all p, $\operatorname{ord}_p(n) = \operatorname{ord}_p(-n)$, the ord functions do not allow us to distinguish between n and $-n$. From a more abstract algebraic perspective, this is because n and $-n$ generate the same ideal (are associates; more on this later), and we make peace with the fact that different generators of the same ideal are more or less equivalent when it comes to divisibility. However, in $\mathbb{Z}$ we do have a remedy: we could define a map $\operatorname{ord}_{-1} : \mathbb{Z} \setminus \{0\} \to \{\pm 1\}$ such that $\operatorname{ord}_{-1}(n) = +1$ if $n > 0$ and $-1$ if $n < 0$. Then $-1$ acts as a "prime of order 2," in contrast to the other "infinite order primes," and we get a corresponding unique factorization statement.⁹ But although there is some sense to this, we will not adopt it formally here.
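Proposition 12. For p a prime and m and n integers, we have:
a) $\operatorname{ord}_p(mn) = \operatorname{ord}_p(m) + \operatorname{ord}_p(n)$.
b) $\operatorname{ord}_p(m + n) \ge \min(\operatorname{ord}_p(m), \operatorname{ord}_p(n))$.
c) If $\operatorname{ord}_p(m) \neq \operatorname{ord}_p(n)$, then $\operatorname{ord}_p(m + n) = \min(\operatorname{ord}_p(m), \operatorname{ord}_p(n))$.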
We leave these as exercises: suitably decoded, they are familiar facts about divisibility. Note that part a) says that $\operatorname{ord}_p$ is some sort of homomorphism from $\mathbb{Z} \setminus \{0\}$ to $\mathbb{Z}$. However, $\mathbb{Z} \setminus \{0\}$ under multiplication is not our favorite kind of algebraic structure: it lacks inverses, so is a monoid rather than a group. This perhaps suggests that we should try to extend it to a map on the nonzero rational numbers $\mathbb{Q}^\times$ (which, if you did problem G1, you will recognize as the group completion of $\mathbb{Z} \setminus \{0\}$; if not, no matter), and this is no sooner said than done:
For a nonzero rational number $\frac{a}{b}$, we define
\[ \operatorname{ord}_p\left(\frac{a}{b}\right) = \operatorname{ord}_p(a) - \operatorname{ord}_p(b). \]
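In other words, powers of p dividing the numerator count positively; powers of p dividing the denominator count negatively. There is something to check here, namely that the definition does not depend upon the choice of representative of $\frac{a}{b}$. But it clearly doesn't:
\[ \operatorname{ord}_p\left(\frac{ac}{bc}\right) = \operatorname{ord}_p(ac) - \operatorname{ord}_p(bc) = \operatorname{ord}_p(a) + \operatorname{ord}_p(c) - \operatorname{ord}_p(b) - \operatorname{ord}_p(c) = \operatorname{ord}_p(a) - \operatorname{ord}_p(b) = \operatorname{ord}_p\left(\frac{a}{b}\right). \]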
So we get a map
\[ \operatorname{ord}_p : \mathbb{Q}^\times \to \mathbb{Z} \]
which has all sorts of uses: among other things, we can use it to recognize whether a rational number x is an integer: it will be iff $\operatorname{ord}_p(x) \ge 0$ for all primes p.
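As a small sketch of the extended map (the names `ord_int` and `ord_rat` are ours), one can compute ord_p of a rational number from its numerator and denominator, and check on an example that the value does not depend on the representative.

```python
from fractions import Fraction

def ord_int(p: int, n: int) -> int:
    """ord_p of a nonzero integer n."""
    if n == 0:
        raise ValueError("ord_p(0) is not a finite integer")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def ord_rat(p: int, x) -> int:
    """ord_p of a nonzero rational x = a/b, defined as ord_p(a) - ord_p(b)."""
    x = Fraction(x)
    return ord_int(p, x.numerator) - ord_int(p, x.denominator)

# Independence of the representative: a/b and (ac)/(bc) give the same value.
a, b, c = 20, 48, 6
assert ord_int(2, a * c) - ord_int(2, b * c) == ord_int(2, a) - ord_int(2, b)

print(ord_rat(2, Fraction(3, 4)))   # -2
print(ord_rat(3, Fraction(9, 2)))   # 2
print(ord_rat(5, Fraction(9, 2)))   # 0
```

The integrality test then reads: x is an integer exactly when `ord_rat(p, x) >= 0` for every prime p dividing its denominator, which for 9/2 fails at p = 2.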
Example: Let us look at the partial sums $S_i$ of the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$. The first partial sum $S_1 = 1$: that's a whole number. The second one is $S_2 = 1 + \frac{1}{2} = \frac{3}{2}$, which is not. Then $S_3 = 1 + \frac{1}{2} + \frac{1}{3} = \frac{11}{6}$ is not an integer either; neither is $S_4 = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} = \frac{25}{12}$.
⁹ This perspective is due to J.H. Conway.
It is natural to ask whether any partial sum $S_n$ for $n > 1$ is an integer. Indeed, this is a standard question in honors math classes because...well, frankly, because it's rather hard.¹⁰ But using properties of the ord function we can give a simple proof. The first step is to look carefully at the data and see if we can find a pattern. (This is, of course, something to do whenever you are presented with a problem whose solution you do not immediately know. Modern presentations of mathematics, including, alas, these notes, to a large extent often hide this experimentation and discovery process.) What we see in the small partial sums is that not only are they not integers, they are all not integers for the same reason: there is always a power of 2 in the denominator.
So what we'd like to show is that for all $n > 1$, $\operatorname{ord}_2(S_n) < 0$. It is true for $n = 2$; moreover we don't have to do the calculation for $n = 3$: since $\operatorname{ord}_2(\frac{1}{3}) = 0 \neq \operatorname{ord}_2(S_2)$, we must have $\operatorname{ord}_2(S_2 + \frac{1}{3}) = \min(\operatorname{ord}_2(S_2), \operatorname{ord}_2(\frac{1}{3})) = -1$. And then we get $\frac{1}{4}$, which has 2-order $-2$, which is different from $\operatorname{ord}_2(S_3)$, so again, using that when we add two rational numbers with different 2-orders, the 2-order of the sum is the smaller of the two 2-orders, we get that $\operatorname{ord}_2(S_4) = -2$. Excitedly testing a few more values, we see that this pattern continues: $\operatorname{ord}_2(S_n)$ and $\operatorname{ord}_2(\frac{1}{n+1})$ are always different; if only we can show that this always holds, this will prove the result. In fact one can say even more: one can say precisely what $\operatorname{ord}_2(S_n)$ is as a function of n and thus see in particular that it is always negative. I will leave the final observation and proof to you: why should I steal your fun?
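For readers who want to test a few more values themselves, here is a short sketch that tabulates the partial sums together with their 2-orders. The helper `ord2` and the dictionary `partial_sums` are our own names; we deliberately print the data without stating the pattern.

```python
from fractions import Fraction

def ord2(x: Fraction) -> int:
    """2-order of a nonzero rational: ord_2(numerator) - ord_2(denominator)."""
    def v(n: int) -> int:
        k = 0
        while n % 2 == 0:
            n //= 2
            k += 1
        return k
    return v(x.numerator) - v(x.denominator)

partial_sums = {}
S = Fraction(0)
for n in range(1, 17):
    S += Fraction(1, n)     # S is now the nth partial sum S_n
    partial_sums[n] = S

for n, S_n in partial_sums.items():
    print(n, S_n, ord2(S_n), ord2(Fraction(1, n + 1)))
```

The last two columns are the 2-orders of S_n and of 1/(n+1), the two quantities the argument above needs to be different.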
in rational numbers, where a and b are nonzero rational numbers and c is any rational number. Well, it's not much fun, is it? Let x be any rational number at all, and solve for y:
\[ y = \frac{c - ax}{b}. \]
Speaking more geometrically, any line $y = mx + b$ in the plane passing through one rational point and with rational slope (roughly speaking, with m and b rational) will have lots of rational solutions: one for every rational choice of x.
So for Diophantus, the first interesting example was quadratic polynomial equations. Indeed, after this section, the quadratic case will occupy our interest for perhaps the majority of the course.
However, over $\mathbb{Z}$ things are never so easy: for instance, the equation
\[ 3x + 3y = 1 \]
clearly does not have an integer solution, since no matter what integers x and y we choose, $3x + 3y$ will be divisible by 3. More generally, if a and b have a common divisor $d > 1$, then it is hopeless to try to solve
\[ ax + by = 1. \]
But this is the only restriction, and indeed we saw this before: en route to proving the fundamental theorem, we showed that for any integers a and b, not both zero, $\gcd(a, b)$ generates the ideal $\{xa + yb \mid x, y \in \mathbb{Z}\}$, meaning that for any integer m, the equation
\[ ax + by = m \gcd(a, b) \]
has solutions in x and y. In other words, we can solve
\[ ax + by = n \]
if n is a multiple of the gcd of a and b. By the above, it is also true that we can only solve the equation if n is a multiple of the gcd of a and b (the succinct statement is the equality of ideals $I_{a,b} = (\gcd(a, b))$) so we have (and already had, really) the following important result.
Theorem 13. For fixed $a, b \in \mathbb{Z}$, not both zero, and any $m \in \mathbb{Z}$, the equation
\[ ax + by = m \]
has a solution in integers (x, y) iff $\gcd(a, b) \mid m$.
In particular, if gcd(a, b) = 1, then we can solve the equation for any integer m.
The fundamental case is to solve
ax + by = 1,
because if we can find such x and y, then just by multiplying through by m we can solve the general equation.
This is a nice result, but it raises two further questions. First, we found one solution. Now what can we say about all solutions?¹¹ Second, given that we know that solutions exist, how do we actually find them?
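On the second question, one standard tool is the extended Euclidean algorithm. The following is a minimal sketch under that assumption, which is not necessarily the route these notes take; the function names `extended_gcd` and `solve_linear` are ours. It produces one solution of ax + by = m whenever Theorem 13 says one exists.

```python
def extended_gcd(a: int, b: int):
    """Return (g, x, y) with g = gcd(a, b) >= 0 and a*x + b*y == g."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # invariant: a*old_x + b*old_y == old_r and a*x + b*y == r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y

def solve_linear(a: int, b: int, m: int):
    """One integer solution (x, y) of a*x + b*y == m, or None if gcd(a, b) does not divide m."""
    g, x, y = extended_gcd(a, b)   # a, b not both zero, so g > 0
    if m % g != 0:
        return None
    k = m // g                     # multiply a solution of ax + by = g through by k
    return x * k, y * k

print(solve_linear(3, 3, 1))       # None
x, y = solve_linear(12, 42, 30)
print(x, y, 12 * x + 42 * y)       # -15 5 30
```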
¹¹ Diophantus was for the most part content with finding a single solution. The more penetrating inquiry into the set of all solutions was apparently first made by Fermat.
\[ ax + by = N \]
\[ ax + by = 0 \tag{6} \]
Here we are saying something quite basic in a fancy way: the real solutions of (6) form a line through the origin in $\mathbb{R}^2$, with slope $m = -\frac{a}{b}$. But the set of integer solutions to (6) also has a nice algebraic structure: if $(x_1, y_1)$, $(x_2, y_2)$ are any two integer solutions and C is any integer, then since
\[ a(x_1 + x_2) + b(y_1 + y_2) = (ax_1 + by_1) + (ax_2 + by_2) = 0 + 0 = 0, \]
\[ a(Cx_1) + b(Cy_1) = C(ax_1 + by_1) = C \cdot 0 = 0, \]
both the sum $(x_1, y_1) + (x_2, y_2)$ and the integer multiple $C(x_1, y_1)$ are solutions. To be algebraically precise about it, the set of integer solutions to (6) forms a subgroup of the additive group of the one-dimensional $\mathbb{R}$-vector space of all real solutions.
Now we claim that it is easy to solve the homogeneous equation directly. The $\mathbb{Q}$-solutions are clearly $\{(x, -\frac{a}{b}x) \mid x \in \mathbb{Q}\}$. And, since a and b are relatively prime, in order for x and $-\frac{a}{b}x$ to both be integers, it is necessary and sufficient that x itself be an integer and that it moreover be divisible by b. Therefore the general integral solution to the homogeneous equation is $\{(nb, -na) \mid n \in \mathbb{Z}\}$.
¹² Especially, from the elementary theory of differential equations.
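\[ \ell_N = \sqrt{\left(\frac{N}{a}\right)^2 + \left(\frac{N}{b}\right)^2} = N \sqrt{\frac{1}{a^2} + \frac{1}{b^2}} = \frac{\sqrt{a^2 + b^2}}{ab}\, N = \frac{d}{ab}\, N. \]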
Thus when N is small, $L_N$ is a very small line segment, and since consecutive integral solutions on the line are spaced d units apart, it is by no means guaranteed that there are any integral solutions on $L_N$. For instance, since $ax + by \ge a + b \ge 2$, there is no positive integral solution to $ax + by = 1$. But since the length of $L_N$ grows linearly with N and d is independent of N, when N is sufficiently large we must have some integral points on $L_N$. In fact this must happen as soon as the length $\ell_N$ of $L_N$ exceeds d, i.e., as soon as $N > ab$.¹³ By similar reasoning, the number of solutions must be extremely close to $\frac{\ell_N}{d} = \frac{N}{ab}$. Precisely:
Theorem 15. Let $a, b \in \mathbb{Z}^+$ be relatively prime, and let $N \in \mathbb{Z}^+$.
a) If $N > ab$, then there exist positive integers x, y such that $ax + by = N$.
b) Let $N_N$ be the number of positive integral solutions (x, y) to $ax + by = N$. Then
\[ \frac{N}{ab} - 1 \le N_N \le \frac{N}{ab} + 1. \]
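Both parts of Theorem 15 can be checked by brute force for small a and b. In this sketch the names `positive_solutions` and `check_theorem_15` are ours, and the bounds of part b) are tested in the equivalent integer form N - ab <= ab * (number of solutions) <= N + ab to avoid floating point.

```python
from math import gcd

def positive_solutions(a: int, b: int, N: int) -> list:
    """All pairs (x, y) of positive integers with a*x + b*y == N."""
    return [(x, (N - a * x) // b)
            for x in range(1, N // a + 1)
            if N - a * x > 0 and (N - a * x) % b == 0]

def check_theorem_15(max_ab: int, max_N: int) -> bool:
    for a in range(1, max_ab + 1):
        for b in range(1, max_ab + 1):
            if gcd(a, b) != 1:
                continue
            for N in range(1, max_N + 1):
                count = len(positive_solutions(a, b, N))
                if N > a * b and count == 0:
                    return False          # would contradict part a)
                if not (N - a * b <= count * a * b <= N + a * b):
                    return False          # would contradict part b)
    return True

print(positive_solutions(2, 3, 7))    # [(2, 1)]
print(positive_solutions(2, 3, 6))    # []: N = ab has no positive solution here
print(check_theorem_15(8, 200))       # True
```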
Proof. Suppose not: then there exist integers a and $b \neq 0$ such that $\sqrt{2} = \frac{a}{b}$, meaning that $2 = \frac{a^2}{b^2}$. We may assume that a and b have no common divisor (if they do, divide it out) and in particular that a and b are not both even.
Now clear denominators:
\[ a^2 = 2b^2. \]
So $2 \mid a^2$. It follows that $2 \mid a$. Notice that this is a direct consequence of Euclid's Lemma: if $p \mid a^2$, then $p \mid a$ or $p \mid a$. On the other hand, we can simply prove the contrapositive: if a is odd, then $a^2$ is odd. By the Division Theorem, a number is odd iff we can represent it as $a = 2k + 1$, and then we just check: $(2k + 1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1$ is indeed again odd. So $a = 2A$, say. Plugging this into the equation we get
\[ (2A)^2 = 4A^2 = 2b^2, \qquad b^2 = 2A^2, \]
so $2 \mid b^2$ and, as above, $2 \mid b$. Thus 2 divides both a and b: contradiction.
¹³ To understand the reasoning here, imagine that you know that a certain bus comes once every hour at a fixed time (i.e., at a certain number of minutes past each hour) but you don't know exactly what that fixed time is. Nevertheless, if you wait for any full hour, you will be able to catch the bus.
Can we prove that $\sqrt{3}$ is irrational in the same way(s)? The Euclid's Lemma
What about the case of general n? Well, of course $\sqrt{n^2}$ is not only rational but is an integer, namely n. Moreover, an arbitrary positive integer n can be factored to get one of these two limiting cases: namely, any n can be uniquely decomposed as
\[ n = sN^2, \]
where s is squarefree. (Prove it!) Since $\sqrt{sN^2} = N\sqrt{s}$, we have that $\sqrt{n}$ is rational iff $\sqrt{s}$ is rational; by the above result, this only occurs if $s = 1$. Thus:
cube; with this one can prove that $\sqrt[3]{n}$ is irrational unless $n = N^3$. For the sake of variety, we prove the general result in a different way.
2 Z
or
\[ a^n = b\left(-c_{n-1} a^{n-1} - \ldots - b^{n-2} c_1 a - b^{n-1} c_0\right). \tag{7} \]
If $b > 1$, then some prime p divides b and then, since p divides the right hand side of (7), it must divide the left hand side: $p \mid a^n$, so $p \mid a$. But, as usual, this contradicts the fact that a and b were chosen to be relatively prime.
Example: Is $\frac{1}{2} \in \overline{\mathbb{Z}}$? Well, the polynomial $2t - 1$ is not monic, but maybe $\frac{1}{2}$ satisfies some other monic polynomial? By Theorem 20, the answer is no: otherwise $\frac{1}{2} \in \mathbb{Z}$.
We can deduce Theorem 4 from Theorem 5 by noticing that for any k and n, $\sqrt[k]{n}$ is a root of the polynomial $t^k - n$ so lies in $\overline{\mathbb{Z}}$. On the other hand, evidently $\sqrt[k]{n}$ is an integer iff n is a perfect kth power, so when n is not a perfect kth power, $\sqrt[k]{n} \in \overline{\mathbb{Z}} \setminus \mathbb{Z}$, so by Theorem 20, $\sqrt[k]{n} \notin \mathbb{Q}$.
In fact Theorem 20 is a special case of a familiar result from high school algebra.
Theorem 21. (Rational Roots Theorem) If
\[ P(x) = a_n x^n + \ldots + a_1 x + a_0 \]
is a polynomial with integral coefficients, then the only possible rational roots are those of the form $\frac{c}{d}$, where $c \mid a_0$, $d \mid a_n$.
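The Rational Roots Theorem reduces the search for rational roots to a finite list of candidates. Here is a sketch of that search, assuming integer coefficients with nonzero leading and constant terms; the helper names `divisors` and `rational_roots` are ours.

```python
from fractions import Fraction

def divisors(n: int) -> list:
    """Positive divisors of the nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]

def rational_roots(coeffs: list) -> list:
    """Rational roots of a_n x^n + ... + a_0, given as [a_n, ..., a_0] with a_n, a_0 != 0."""
    a_n, a_0 = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * c, d)
                  for c in divisors(a_0) for d in divisors(a_n) for sign in (1, -1)}

    def value(x: Fraction) -> Fraction:
        result = Fraction(0)
        for a in coeffs:            # Horner's rule
            result = result * x + a
        return result

    return sorted(x for x in candidates if value(x) == 0)

# 2x^3 - 3x^2 - 3x + 2 = (x + 1)(2x - 1)(x - 2)
print(rational_roots([2, -3, -3, 2]))   # [Fraction(-1, 1), Fraction(1, 2), Fraction(2, 1)]
# x^2 - 2 has no rational roots, which is the irrationality of sqrt(2) again.
print(rational_roots([1, 0, -2]))       # []
```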
6. Primitive Roots
Let N be a positive integer. An integer g is said to be a primitive root modulo N if every element x of $(\mathbb{Z}/N\mathbb{Z})^\times$ is of the form $g^i$ for some positive integer i. Equivalently, the finite group $(\mathbb{Z}/N\mathbb{Z})^\times$ is cyclic and g (mod N) is a generator.
We'd like to find primitive roots mod N, if possible. There are really two problems:
Question 1. For which N does there exist a primitive root modulo N?
Question 2. Assuming there does exist a primitive root modulo N, how do we find one? How do we find all of them?
We can and shall give a complete answer to Question 1. We already know that the group of units of a finite field is cyclic, and we know that $\mathbb{Z}/N\mathbb{Z}$ is a field if (and only if) N is prime. Thus primitive roots exist modulo N when N is prime.
When N is not prime we might as well ask a more general question: what is the structure of the unit group $(\mathbb{Z}/N\mathbb{Z})^\times$? From our work on the Chinese Remainder Theorem, we know that if $N = p_1^{a_1} \cdots p_r^{a_r}$, there is an isomorphism of unit groups
$(\mathbb{Z}/N\mathbb{Z})^\times \cong \prod_{i=1}^{r} (\mathbb{Z}/p_i^{a_i}\mathbb{Z})^\times.$
Thus it is enough to figure out the group structure when $N = p^a$ is a prime power.
Theorem 22. The finite abelian group $(\mathbb{Z}/p^a\mathbb{Z})^\times$ is cyclic whenever p is an odd prime, or when p = 2 and a is 1 or 2. For $a \geq 3$, we have
$(\mathbb{Z}/2^a\mathbb{Z})^\times \cong Z_2 \times Z_{2^{a-2}}.$
Before proving Theorem 22, let us nail down the answer it gives to Question 1.
Corollary 23. Primitive roots exist modulo N in precisely the following cases:
(i) N = 1, 2 or 4.
(ii) $N = p^a$ is an odd prime power.
(iii) $N = 2p^a$ is twice an odd prime power.
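Proof. Theorem 22 gives primitive roots in cases (i) and (ii). If p is odd, then
$(\mathbb{Z}/2p^a\mathbb{Z})^\times \cong (\mathbb{Z}/2\mathbb{Z})^\times \times (\mathbb{Z}/p^a\mathbb{Z})^\times \cong (\mathbb{Z}/p^a\mathbb{Z})^\times$
since $(\mathbb{Z}/2\mathbb{Z})^\times$ is the trivial group. Conversely, if N is not of the form (i), (ii) or (iii) then either N is divisible by 8, or N is divisible by prime powers $p^a, q^b > 2$ with $p \neq q$ (two distinct odd primes, or 4 together with an odd prime). In the first case, write $N = 2^a M$ with (2, M) = 1 and $a \geq 3$. Then
$(\mathbb{Z}/N\mathbb{Z})^\times \cong (\mathbb{Z}/2^a\mathbb{Z})^\times \times (\mathbb{Z}/M\mathbb{Z})^\times,$
and $(\mathbb{Z}/N\mathbb{Z})^\times$, having the noncyclic subgroup $(\mathbb{Z}/2^a\mathbb{Z})^\times$, cannot itself be cyclic [Handout A2.5, Corollary 6]. In the second case write $N = p^a q^b M$ with M prime to pq; then
$(\mathbb{Z}/N\mathbb{Z})^\times \cong (\mathbb{Z}/p^a\mathbb{Z})^\times \times (\mathbb{Z}/q^b\mathbb{Z})^\times \times (\mathbb{Z}/M\mathbb{Z})^\times.$
Both $(\mathbb{Z}/p^a\mathbb{Z})^\times$ and $(\mathbb{Z}/q^b\mathbb{Z})^\times$ have even order, hence their orders are not relatively prime and the product group cannot be cyclic [Handout A2.5, Corollary 10].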
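Corollary 23 is easy to test by machine for small N. The following Python sketch (our own helper names, brute force only, not an efficient algorithm) searches for a generator of the unit group and compares the outcome with the list (i), (ii), (iii).

```python
from math import gcd

def multiplicative_order(g, N):
    """Order of g in the unit group mod N; assumes gcd(g, N) = 1."""
    k, power = 1, g % N
    while power != 1 % N:          # 1 % N is 0 when N = 1
        power = power * g % N
        k += 1
    return k

def has_primitive_root(N):
    """Brute force: does some unit mod N generate the whole unit group?"""
    units = [x for x in range(1, N + 1) if gcd(x, N) == 1]
    return any(multiplicative_order(g, N) == len(units) for g in units)

def is_odd_prime_power(n):
    if n < 3 or n % 2 == 0:
        return False
    # The smallest divisor >= 3 of an odd n >= 3 is its smallest prime factor.
    p = next(d for d in range(3, n + 1, 2) if n % d == 0)
    while n % p == 0:
        n //= p
    return n == 1

def predicted_by_corollary_23(N):
    return (N in (1, 2, 4)
            or is_odd_prime_power(N)
            or (N % 2 == 0 and is_odd_prime_power(N // 2)))
```

Running both predicates over a range of N and comparing them is a cheap sanity check on the case analysis in the proof, in particular for values such as N = 12 or N = 20.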
Proof of Theorem 22: The idea for odd p is as follows: if g is a primitive root mod p, then [Handout A2.5, Corollary 2] the order of g mod $p^a$ is divisible by p − 1, hence of the form $p^k(p - 1)$ for some $k \leq a - 1$. Therefore $g' = g^{p^k}$ has order p − 1 [Handout A2.5, Proposition 7]. We claim z = 1 + p has order $p^{a-1}$; since $\gcd(p^{a-1}, p - 1) = 1$, $g'z$ has order $p^{a-1}(p - 1)$ [Handout A2.5, Example 4].
Lemma 24. Let p be an odd prime and $z \in \mathbb{Z}$, $z \equiv 1 \pmod p$.
a) We have $\mathrm{ord}_p(z^p - 1) = \mathrm{ord}_p(z - 1) + 1$.
b) For all $k \in \mathbb{Z}^+$, $\mathrm{ord}_p(z^{p^k} - 1) = \mathrm{ord}_p(z - 1) + k$.
Proof. Write z = 1 + xp for some $x \in \mathbb{Z}$, so $\mathrm{ord}_p(z - 1) = 1 + \mathrm{ord}_p(x)$. Then
(8) $z^p - 1 = (1 + xp)^p - 1 = \binom{p}{1}(xp) + \binom{p}{2}(xp)^2 + \ldots + \binom{p}{p-1}(xp)^{p-1} + (xp)^p.$
For the first term on the right hand side of (8), we have
$\mathrm{ord}_p\left(\binom{p}{1} xp\right) = 2 + \mathrm{ord}_p(x) = \mathrm{ord}_p(z - 1) + 1.$
The remaining terms have larger p-orders, so the p-order of $z^p - 1$ is $\mathrm{ord}_p(z - 1) + 1$, whence part a). Since $z^{p^k} - 1 = (z^{p^{k-1}})^p - 1$, part b) follows by induction.
just $4x + 4x^2 = 4x(x + 1)$, whose 2-order is at least $3 + \mathrm{ord}_2(x)$ if x is odd. So instead we take x even. In fact we may just take x = 2, so z = 1 + 2x = 5,
$\mathrm{ord}_2(z^2 - 1) = \mathrm{ord}_2(z - 1) + \mathrm{ord}_2(z + 1) = \mathrm{ord}_2(z - 1) + \mathrm{ord}_2(6) = \mathrm{ord}_2(z - 1) + 1.$
Again, inductively, we get
$\mathrm{ord}_2(z^{2^k} - 1) = \mathrm{ord}_2(z - 1) + k,$
or $\mathrm{ord}_2(5^{2^k} - 1) = k + 2$. Thus for $a \geq 2$, 5 has order $2^{a-2}$ in $(\mathbb{Z}/2^a\mathbb{Z})^\times$. Moreover $5^k + 1 \equiv 2 \pmod 4$ for all k, so $5^k \not\equiv -1 \pmod{2^a}$, so the subgroups generated by the classes of 5 and of −1 intersect only in the identity. This completes the proof of Theorem 22.
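The two computational facts used in the proof can be spot-checked numerically. The sketch below is only an illustration with helper names of our own choosing: it tests Lemma 24 b) on a few inputs and checks, for small a, that 5 has order $2^{a-2}$ modulo $2^a$ and that −1 is not a power of 5.

```python
def p_adic_order(p, n):
    """ord_p(n): the exponent of p in the nonzero integer n."""
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def multiplicative_order(g, N):
    """Order of the unit g modulo N; assumes gcd(g, N) = 1 and N > 1."""
    k, power = 1, g % N
    while power != 1:
        power = power * g % N
        k += 1
    return k

def lemma_24_holds(p, z, k):
    """Check ord_p(z^(p^k) - 1) == ord_p(z - 1) + k for z = 1 mod p, z != 1."""
    return p_adic_order(p, z ** (p ** k) - 1) == p_adic_order(p, z - 1) + k

def five_behaves_as_claimed(a):
    """For a >= 2: 5 has order 2^(a-2) mod 2^a and -1 is not a power of 5."""
    N = 2 ** a
    powers_of_5 = {pow(5, k, N) for k in range(2 ** (a - 2))}
    return multiplicative_order(5, N) == 2 ** (a - 2) and (N - 1) not in powers_of_5
```

Python integers have arbitrary precision, so the large powers $z^{p^k}$ are computed exactly.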
CHAPTER 2
Pythagorean Triples
1. Parameterization of Pythagorean Triples
1.1. Introduction to Pythagorean triples.
By a Pythagorean triple we mean an ordered triple $(x, y, z) \in \mathbb{Z}^3$ such that
$x^2 + y^2 = z^2.$
The name comes from elementary geometry: if a right triangle has leg lengths x and y and hypotenuse length z, then $x^2 + y^2 = z^2$. Of course here x, y, z are positive real numbers. For most integer values of x and y, the integer $x^2 + y^2$ will not be a perfect square, so the positive real number $\sqrt{x^2 + y^2}$ will be irrational: e.g. $x = y = 1 \implies z = \sqrt{2}$. However, a few integer solutions to $x^2 + y^2 = z^2$ are familiar from high school algebra (and the SATs): e.g. (3, 4, 5), (5, 12, 13).
Remark: As soon as we have one solution, like (3, 4, 5), we can find infinitely many more, however in a somewhat unsatisfying way. Namely, if (x, y, z) is a Pythagorean triple and a is any integer, then also (ax, ay, az) is a Pythagorean triple:
$(ax)^2 + (ay)^2 = a^2(x^2 + y^2) = a^2 z^2 = (az)^2.$
This property of invariance under scaling is a characteristic feature of solutions $(x_1, \ldots, x_n)$ to homogeneous polynomials $P(t_1, \ldots, t_n)$ in n variables. We recall what this means: a monomial is an expression of the form $c\,t_1^{a_1} \cdots t_n^{a_n}$, and the degree of the monomial is defined to be $a_1 + \ldots + a_n$, i.e., the sum of the exponents. A polynomial is said to be homogeneous of degree d if each of its monomial terms has degree d, and simply homogeneous if it is homogeneous of some degree d. For instance, the polynomial $P(x, y, z) = x^2 + y^2 - z^2$ is homogeneous of degree 2, and indeed for any N the Fermat polynomial
$P_N(x, y, z) = x^N + y^N - z^N$
is homogeneous of degree N. Moreover, every (nonconstant) homogeneous polynomial $P(t_1, \ldots, t_n)$ has zero constant term, hence $P(0, \ldots, 0) = 0$. So $(0, \ldots, 0)$ is a solution to any homogeneous polynomial, called the trivial solution.
Coming back to Pythagorean triples, these considerations show that for all $a \in \mathbb{Z}$, (3a, 4a, 5a) is a Pythagorean triple (again, familiar to anyone who has studied for the SATs). For many purposes it is convenient to regard these rescaled solutions as being equivalent to each other. To this end we define a Pythagorean triple (a, b, c) to be primitive if gcd(a, b, c) = 1. Then every nontrivial triple (a, b, c) is a positive integer multiple of a unique primitive triple, namely $(\frac{a}{d}, \frac{b}{d}, \frac{c}{d})$ where d = gcd(a, b, c).
Our goal is to find all primitive Pythagorean triples. There are many ways to do so. We prefer the following method, both for its simplicity and because it motivates the study of not just integral but rational solutions of polynomial equations.
Namely, consider the algebraic curve $x^2 + y^2 = 1$ in $\mathbb{R}^2$: i.e., the unit circle. Why? Well, suppose (a, b, c) is a nontrivial Pythagorean triple, so $a^2 + b^2 = c^2$ with $c \neq 0$ (if c = 0, then $a^2 + b^2 = 0 \implies a = b = 0$). So we may divide through by $c^2$, getting
$\left(\frac{a}{c}\right)^2 + \left(\frac{b}{c}\right)^2 = 1.$
Thus $(\frac{a}{c}, \frac{b}{c})$ is a rational point on the unit circle. Moreover, the process can be essentially reversed: suppose that $(r, s) \in \mathbb{Q}^2$ is such that $r^2 + s^2 = 1$. Then, writing $r = \frac{a}{c}$ and $s = \frac{b}{d}$ (so $cd \neq 0$), we have
$\left(\frac{a}{c}\right)^2 + \left(\frac{b}{d}\right)^2 = 1.$
Multiplying through by $(cd)^2$, we get
$(da)^2 + (bc)^2 = (cd)^2,$
so that (da, bc, cd) is a nontrivial Pythagorean triple. If we start with a primitive Pythagorean triple (a, b, c), pass to the rational solution $(\frac{a}{c}, \frac{b}{c})$ and then clear denominators using the above formula, we get $(ca, cb, c^2)$. This is not the primitive triple that we started with, but it is simply a rescaling: no big deal. At the end we will find the correct scaling that gives primitive triples on the nose.
1.2. Rational parameterization of the unit circle.
Fix any one rational point $P_\bullet = (x_\bullet, y_\bullet)$ on the unit circle. The argument that we are about to make works for any choice of $P_\bullet$, e.g. $(\frac{3}{5}, \frac{4}{5})$, but let me pass along the wisdom of hindsight: the computations will be especially simple and clean if we take $P_\bullet = (-1, 0)$. So let us do so.
Now suppose $P = (x_P, y_P)$ is any other rational point on the unit circle. Then there is a unique line $\ell$ joining $P_\bullet$ to P, which of course has rational coefficients:
$\ell : y - y_P = \frac{y_P - y_\bullet}{x_P - x_\bullet}(x - x_P).$
In particular, the slope of this line
$m_P = \frac{y_P - y_\bullet}{x_P - x_\bullet}$
is a rational number. This already places a limitation on the rational solutions, since most lines passing through the fixed point $P_\bullet$ have irrational slope. More interesting is the converse: for any $m \in \mathbb{Q}$, let
$\ell_m : y = (y - y_\bullet) = m(x - x_\bullet) = m(x + 1)$
be the line passing through $P_\bullet = (-1, 0)$ with slope m. We claim that this line intersects the unit circle in precisely one additional point $P_m$, and that this point $P_m$ also has rational coordinates. That is, we claim that the rational points on the unit circle are precisely the point $P_\bullet = (-1, 0)$ together with the set of points $P_m$ as m ranges through the rational numbers.
Why is this so? With a bit of thought, we can argue for this in advance. Briefly, we plug the linear equation $\ell_m$ into the quadratic $x^2 + y^2 = 1$, thereby getting a quadratic equation in x with rational coefficients. Because we know that this equation has at least one rational solution (namely −1, the x-coordinate of $P_\bullet$), the other solution must be rational as well, as follows from contemplation of the quadratic formula. On the other hand, such forethought is not really necessary in this case, because we want to find the solutions explicitly anyway. In other words, let's do it!
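We have the system of equations
(9) $x^2 + y^2 = 1$
(10) $y = m(x + 1).$
Substituting (10) into (9) gives $x^2 + m^2(x + 1)^2 = 1$, i.e., $(1 + m^2)x^2 + 2m^2 x + (m^2 - 1) = 0$. The discriminant is $4m^4 - 4(1 + m^2)(m^2 - 1) = 4$, so the quadratic formula gives
$x = \frac{-2m^2 \pm 2}{2(1 + m^2)} = \frac{-m^2 \pm 1}{1 + m^2}.$
Notice that by taking the minus sign, we get the solution $x = \frac{-m^2 - 1}{1 + m^2} = -1$. That's great, because −1 is the x-coordinate of $P_\bullet$, so that it had better be a solution. The other solution is the one we really want:
$x_m = \frac{1 - m^2}{1 + m^2},$
and then we get
$y_m = m(1 + x_m) = m\left(1 + \frac{1 - m^2}{1 + m^2}\right) = \frac{2m}{m^2 + 1},$
so that finally
$P_m = \left(\frac{1 - m^2}{1 + m^2}, \frac{2m}{1 + m^2}\right).$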
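The formula for $P_m$ is easy to experiment with in exact rational arithmetic. The following Python sketch (the function name is ours) computes $P_m$ for a rational slope m and lets one confirm that the result lies on the unit circle.

```python
from fractions import Fraction

def circle_point(m):
    """Second intersection of the line y = m(x + 1) with x^2 + y^2 = 1."""
    m = Fraction(m)
    x = (1 - m * m) / (1 + m * m)
    y = 2 * m / (1 + m * m)
    return (x, y)

def on_unit_circle(point):
    x, y = point
    return x * x + y * y == 1
```

For instance, the slope m = 1/2 gives the point (3/5, 4/5), which after clearing denominators is the triple (3, 4, 5).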
triple is of the form (da, db, dc) for some $d \in \mathbb{Z}^+$, where (a, b, c) is a Pythagorean triple with gcd(a, b, c) = 1, called primitive.
c) In every primitive Pythagorean triple (a, b, c), exactly one of a and b is even. Every primitive triple with a odd is of the form $(v^2 - u^2, 2uv, v^2 + u^2)$ where $u, v \in \mathbb{Z}$ are relatively prime integers of opposite parity. Conversely, all such pairs u, v yield a primitive Pythagorean triple with first coordinate odd.
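Part c) translates directly into a short enumeration. The Python sketch below (with a function name and a `limit` parameter of our own invention) runs over coprime pairs 0 < u < v of opposite parity and emits the corresponding triples; it restricts to positive u and v, so it lists only the triples with all entries positive.

```python
from math import gcd

def primitive_triples(limit):
    """Triples (v^2 - u^2, 2uv, v^2 + u^2) for coprime 0 < u < v <= limit
    of opposite parity."""
    triples = []
    for v in range(2, limit + 1):
        for u in range(1, v):
            if gcd(u, v) == 1 and (u + v) % 2 == 1:
                triples.append((v * v - u * u, 2 * u * v, v * v + u * u))
    return triples
```

Each triple produced this way can be checked to satisfy $a^2 + b^2 = c^2$ with gcd(a, b, c) = 1 and a odd, as the theorem asserts.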
2. An Application: Fermat's Last Theorem for N = 4
In this section we will prove Fermat's Last Theorem for N = 4. In fact, following Fermat, we establish something stronger, from which FLT(4) immediately follows.
Theorem 26. (Fermat) $X^4 + Y^4 = Z^2$ has no solutions with $X, Y, Z \in \mathbb{Z} \setminus \{0\}$.
Proof. Step 1: Let (x, y, z) be a positive integral solution to $X^4 + Y^4 = Z^2$. We claim there is a positive integral solution $(x', y', z')$ with $\gcd(x', y') = 1$ and $z' \leq z$. Indeed, if x and y are not relatively prime, they are both divisible by some prime number p. Then $p^4 \mid x^4 + y^4 = z^2$, so $p^2 \mid z$. Therefore $\frac{x}{p}, \frac{y}{p}, \frac{z}{p^2} \in \mathbb{Z}^+$ and
$\left(\frac{x}{p}\right)^4 + \left(\frac{y}{p}\right)^4 = \frac{1}{p^4}(x^4 + y^4) = \frac{1}{p^4}(z^2) = \left(\frac{z}{p^2}\right)^2,$
so $(\frac{x}{p}, \frac{y}{p}, \frac{z}{p^2})$ is another positive integral solution, with z-coordinate smaller than the one we started with. Therefore the process can be repeated, and since the z-coordinate gets strictly smaller each time, it must eventually terminate with a solution $(x', y', z')$ as in the statement.
Step 2: Given a positive integral solution (x, y, z) to $X^4 + Y^4 = Z^2$ with gcd(x, y) = 1, we will produce another positive integral solution (u, v, w) with w < z.
First, we may assume without loss of generality that x is odd and y is even. They cannot both be even, since they are relatively prime; if instead x is even and y is odd, then we can switch x and y; so what we need to check is that x and y cannot both be odd. But then considering $x^4 + y^4 = z^2$ modulo 4, we find $2 \equiv z^2 \pmod 4$, which is impossible: $0^2 \equiv 2^2 \equiv 0 \pmod 4$, $1^2 \equiv 3^2 \equiv 1 \pmod 4$.
Now we bring in our complete solution of Pythagorean triples: since $(x^2)^2 + (y^2)^2 = z^2$ and x and y are relatively prime, $(x^2, y^2, z)$ is a primitive Pythagorean triple with first coordinate odd. Therefore by Theorem 25 there exist relatively prime integers m and n of opposite parity such that
(11) $x^2 = m^2 - n^2, \quad y^2 = 2mn, \quad z = m^2 + n^2.$
Now rewrite the first equation of (11) as $n^2 + x^2 = m^2$. Since gcd(m, n) = 1, this is again a primitive Pythagorean triple. Moreover, since x is odd, n must be even. So we can use our parameterization again (!!) to write
$x = r^2 - s^2, \quad n = 2rs, \quad m = r^2 + s^2,$
for coprime integers r, s of opposite parity. Now observe
$m\left(\frac{n}{2}\right) = \frac{2mn}{4} = \frac{y^2}{4} = \left(\frac{y}{2}\right)^2.$
Since m and $\frac{n}{2}$ are coprime integers whose product is a perfect square, they must both be perfect squares. Similarly,
$rs = \frac{2rs}{2} = \frac{n}{2} = \square$ (a perfect square),
so r and s must both be squares. Let us put $r = u^2$, $s = v^2$, $m = w^2$, and substitute these quantities into $m = r^2 + s^2$ to get
$u^4 + v^4 = w^2.$
Here $w \geq 1$, so
$w \leq w^4 < w^4 + n^2 = m^2 + n^2 = z,$
so that as promised, we found a new positive integral solution (u, v, w) with w < z.
Step 3: Steps 1 and 2 together lead to a contradiction, as follows: if we have any positive integral solution (x, y, z) to $X^4 + Y^4 = Z^2$, then by Step 1 we have one $(x', y', z')$ with $z' \leq z$ and $\gcd(x', y') = 1$. Then by Step 2 we have another positive integral solution (u, v, w) with $w < z' \leq z$. Then by Step 1 we have another positive integral solution $(u', v', w')$ with $w' \leq w < z' \leq z$ and $\gcd(u', v') = 1$, and then by Step 2 we get another solution whose final coordinate is strictly smaller than $w'$. And so on. In other words, the assumption that there are any positive integer solutions at all leads to the construction of an infinite sequence of positive integer solutions $(x_n, y_n, z_n)$ with $z_{n+1} < z_n$ for all n. But that's impossible: there are no infinite strictly decreasing sequences of positive integers. Contradiction!
Lemma 27. Let A and B be coprime integers. Then:
a) If A and B have the same parity, $\gcd(A + B, A - B) = 2$.
b) If A and B have opposite parity, $\gcd(A + B, A - B) = 1$.
Exercise: Prove Lemma 27.
Here is a second proof, communicated to us by Barry Powell.
Proof. Seeking a contradiction, suppose the equation $X^4 + Y^4 = Z^2$ has solutions (x, y, z) with $z \neq 0$. Among all such solutions, choose one with $z^2$ minimal. For such a minimal solution we must have gcd(x, y) = 1: indeed, if a prime p divided both x and y, then $p^4 \mid z^2$ so $p^2 \mid z$ and we may take $x = px'$, $y = py'$, $z = p^2 z'$ to get a solution $(x', y', z')$ with $(z')^2 < z^2$. Moreover x and y must have opposite parity: being coprime they cannot both be even; if both were odd then reducing modulo 4 gives a contradiction. It is no loss of generality to assume that x is odd and y is even, and it follows that z is odd.
We claim that $\gcd(z + y^2, z - y^2) = 1$. Indeed, let $d = \gcd(z + y^2, z - y^2)$. Since y is even and z is odd, $z + y^2$ is odd, hence d is odd. Suppose p is an odd prime dividing d. Then $p \mid (z + y^2) + (z - y^2) = 2z$, so $p \mid z$; moreover, $p \mid (z + y^2) - (z - y^2) = 2y^2$, so $p \mid y$. Since $x^4 = z^2 - y^4$, it follows that $p \mid x$, contradicting gcd(x, y) = 1.
By uniqueness of factorization there are coprime integers r and s such that
(12) $z - y^2 = r^4, \quad z + y^2 = s^4.$
So $(s^2 + r^2)(s^2 - r^2) = s^4 - r^4 = 2y^2$ with y even, hence r and s are both odd. Since $s^2, r^2$ are coprime integers of the same parity, by Lemma 27, $\gcd(s^2 + r^2, s^2 - r^2) = 2$. Since r, s are odd, so is $\frac{s^2 + r^2}{2}$, so $\gcd(\frac{s^2 + r^2}{2}, s^2 - r^2) = 1$, and then by uniqueness of factorization there are coprime integers a, b with
$r^2 + s^2 = 2b^2, \quad (s + r)(s - r) = s^2 - r^2 = a^2.$
Again by Lemma 27, $\gcd(s + r, s - r) = 2$, so $\gcd(\frac{s + r}{2}, \frac{s - r}{2}) = 1$ and thus there are coprime integers u, v with
$s - r = 2u^2, \quad s + r = 2v^2.$
It follows that
$4(u^4 + v^4) = (s - r)^2 + (s + r)^2 = 2(s^2 + r^2) = 4b^2,$
and thus
$u^4 + v^4 = b^2.$
Since x is odd, hence nonzero, and $x^4 + y^4 = z^2$, $y^2 \leq y^4 < z^2$, so
$2b^2 = s^2 + r^2 \leq (s^2 + r^2)(s^2 - r^2) = 2y^2 < 2z^2.$
Note also that $b \neq 0$, since otherwise u = v = 0, contradicting the fact that they are coprime. Therefore we have found a solution (u, v, b) to $X^4 + Y^4 = Z^2$ with $0 < b^2 < z^2$, contradicting the minimality of $z^2$.
Corollary 28. $X^4 + Y^4 = Z^4$ has no solutions with $X, Y, Z \in \mathbb{Z} \setminus \{0\}$.
Proof. Suppose there are $x, y, z \in \mathbb{Z} \setminus \{0\}$ such that $x^4 + y^4 = z^4$. We may assume x, y, z are all positive. Then, since $z^4 = (z^2)^2$, the triple $(x, y, z^2)$ is a positive integer solution to $X^4 + Y^4 = Z^2$, contradicting Theorem 26.
The strategy of the above proof is known as infinite descent. Over the centuries it has been refined and developed, and the modern theory of descent is one of the mainstays of contemporary Diophantine geometry.
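A computer search cannot replace the descent argument, but it does give a quick consistency check on Theorem 26 for small values. The Python sketch below (function name and search bound are our own choices) looks for positive x ≤ y up to a bound with $x^4 + y^4$ a perfect square.

```python
from math import isqrt

def fourth_power_sums_that_are_squares(bound):
    """All (x, y, z) with 1 <= x <= y <= bound and x^4 + y^4 = z^2."""
    hits = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            n = x ** 4 + y ** 4
            z = isqrt(n)              # exact integer square root
            if z * z == n:
                hits.append((x, y, z))
    return hits
```

By Theorem 26 the returned list must be empty for every bound; the search simply confirms this in the range it covers.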
3. Rational Points on Conics
The method of drawing lines that we used to find all rational points on the unit circle has further applicability. Namely, we consider an arbitrary conic curve
(13) $aX^2 + bY^2 = cZ^2,$
for $a, b, c \in \mathbb{Q} \setminus \{0\}$.
Remark: More generally, one calls a plane conic any curve given by an equation
$aX^2 + bXY + cXZ + dY^2 + eYZ + fZ^2 = 0$
for $a, b, c, d, e, f \in \mathbb{Q}$, not all zero. But as one learns in linear algebra, by making a linear change of variables, new coordinates can be found in which the equation is diagonal, i.e., in the form (13), and one can easily relate integral/rational points on one curve to those on the other. So by considering only diagonalized conics, we are not losing out on any generality.
Now, as in the case a = b = c = 1, we have a bijective correspondence between primitive integral solutions to $aX^2 + bY^2 = cZ^2$ and rational points on
(14) $ax^2 + by^2 = c.$
If we can find any one rational point $P_\bullet = (x_\bullet, y_\bullet)$ on (14) then our previous method works: taking the set of all lines through $P_\bullet$ with rational slope, together with the vertical line $x = x_\bullet$, and intersecting with the conic (14), we get all rational solutions.
In the exercises the reader is invited to try this in certain cases where there are obvious rational solutions. For instance, if a = c then an obvious rational solution is (1, 0). The reader is asked to carry this out in a particular case and also to investigate the structure of the primitive integral solutions in the exercises.
But there need not be any rational solutions at all! An easy example of this is
$x^2 + y^2 = -1,$
where indeed there are clearly no $\mathbb{R}$-solutions. But this is not the only obstruction. Consider for instance
$3x^2 + 3y^2 = 1,$
whose real solutions form a circle of radius $\frac{1}{\sqrt{3}}$. We claim that there are however no rational points on this circle. Equivalently, there are no integral solutions to $3X^2 + 3Y^2 = Z^2$ with gcd(x, y, z) = 1. For suppose there is such a primitive integral solution (x, y, z). Then, since $3 \mid 3x^2 + 3y^2 = z^2$, we have $3 \mid z$. So we may put $z = 3z'$, getting $3x^2 + 3y^2 = 9(z')^2$, or
$x^2 + y^2 = 3(z')^2.$
Now reducing mod 3, we get
$x^2 + y^2 \equiv 0 \pmod 3.$
Since the squares mod 3 are 0 and 1, the only solution mod 3 is $x \equiv y \equiv 0 \pmod 3$, but this means $3 \mid x$, $3 \mid y$, so that the solution (x, y, z) is not primitive after all: 3 is a common divisor.
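The key step, that $x^2 + y^2 \equiv 0 \pmod 3$ forces $x \equiv y \equiv 0$, is a finite check. Here is a small Python sketch (our own function name) that lists the nonzero solutions of the congruence modulo any prime p, so that the case p = 3 can be compared with other small primes.

```python
def nonzero_solutions_mod(p):
    """Pairs (x, y) != (0, 0) modulo p with x^2 + y^2 = 0 (mod p)."""
    return [(x, y)
            for x in range(p)
            for y in range(p)
            if (x * x + y * y) % p == 0 and (x, y) != (0, 0)]
```

Trying p = 3, 7, 11 on the one hand and p = 5, 13 on the other already suggests that the residue of p modulo 4 is what matters.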
This argument can be made to go through with 3 replaced by any prime p with $p \equiv 3 \pmod 4$. Arguing as above, it suffices to show that the congruence $x^2 + y^2 \equiv 0 \pmod p$ has only the zero solution. But if it has a solution with, say, $x \not\equiv 0$, then x is a unit modulo p and then $\left(\frac{y}{x}\right)^2 \equiv -1 \pmod p$. We will see later that for an odd prime p, the equation $a^2 \equiv -1 \pmod p$ has a solution iff $p \equiv 1 \pmod 4$.
In fact, for an odd prime $p \equiv 1 \pmod 4$, the curve
$px^2 + py^2 = 1$
does always have rational solutions, although this is certainly not obvious. Overall we need a method to decide whether the conic $aX^2 + bY^2 = cZ^2$ has any nontrivial integral solutions. This is provided by the following elegant theorem of Legendre.
Theorem 29. Let a, b, c be nonzero integers, squarefree, relatively prime in pairs, and neither all positive nor all negative. Then
$ax^2 + by^2 + cz^2 = 0$
has a solution in nonzero integers (x, y, z) iff all of the following hold:
(i) There exists $x \in \mathbb{Z}$ such that $-ab \equiv x^2 \pmod{|c|}$.
(ii) There exists $y \in \mathbb{Z}$ such that $-bc \equiv y^2 \pmod{|a|}$.
(iii) There exists $z \in \mathbb{Z}$ such that $-ca \equiv z^2 \pmod{|b|}$.
In particular, since we can compute all of the squares modulo any integer n by a direct, finite calculation, we can easily program a computer to determine whether or not the equation has any nonzero integer solutions. Once we know whether there are any integral solutions, we can search by brute force until we find one.
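As a rough illustration of such a program, here is a Python sketch of Legendre's test together with a naive search. The function names are our own, the congruence conditions are taken modulo |c|, |a| and |b| respectively (the standard form of Legendre's theorem), and the code assumes without checking that a, b, c satisfy the hypotheses of Theorem 29. The search looks for any solution other than (0, 0, 0) with non-negative entries up to a caller-supplied bound.

```python
from itertools import product

def is_square_mod(n, m):
    """Is n congruent to a square modulo |m|? (Always true when |m| = 1.)"""
    m = abs(m)
    return any((t * t - n) % m == 0 for t in range(m))

def legendre_solvable(a, b, c):
    """Legendre's criterion for a x^2 + b y^2 + c z^2 = 0.

    Assumes a, b, c are nonzero, squarefree, pairwise coprime and not all
    of the same sign; these hypotheses are not verified here.
    """
    return (is_square_mod(-a * b, c)
            and is_square_mod(-b * c, a)
            and is_square_mod(-c * a, b))

def brute_force_solution(a, b, c, bound):
    """First (x, y, z) != (0, 0, 0) with 0 <= x, y, z <= bound solving the
    equation, or None if there is none in that box."""
    for x, y, z in product(range(bound + 1), repeat=3):
        if (x, y, z) != (0, 0, 0) and a * x * x + b * y * y + c * z * z == 0:
            return (x, y, z)
    return None
```

Restricting the search to non-negative values loses nothing, since only the squares of x, y, z enter the equation.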
The following result of Holzer puts an explicit upper bound on our search:
CHAPTER 3
Quadratic Rings
1. Quadratic Fields and Quadratic Rings
It is also easy to check by hand that the ring $\mathbb{Q}[\sqrt{D}]$ is a field. For this and for many other things to come, the key identity is
$(a + b\sqrt{D})(a - b\sqrt{D}) = a^2 - Db^2.$
For rational numbers a and b which are not both zero, the rational number $a^2 - Db^2$ is also nonzero: equivalently there are no solutions to $D = \frac{a^2}{b^2}$, because $\sqrt{D}$ is irrational. It follows that again, for a, b not both 0 we have
$(a + b\sqrt{D})\left(\frac{a}{a^2 - Db^2} - \frac{b}{a^2 - Db^2}\sqrt{D}\right) = 1,$
1This equality is a fact which is not dicult to check; it is not the denition of Z[
By way of comparison, we recommend that the reader check that the ring Z[
form Z + Z for any two xed elements , of Z[
generated as an abelian group.
39
D
].
2
D
]
2
D].
is not of the
$x^2 + y^2 = p$?
Exercise: Use the cyclicity of U(p) to give a quick proof of Proposition 32.
As it happens, in order to determine which primes are a sum of 2 squares we only need half of the above result, and that half has a more elementary proof.
Lemma 33. (Fermat's Lemma) For a prime $p \equiv 1 \pmod 4$, there is an integer x such that $p \mid x^2 + 1$. Equivalently, −1 is a square modulo p.
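Proof. A reduced residue system modulo p is a set S of p − 1 integers representing the nonzero residue classes modulo p; one such system is
$S = \left\{-\frac{p-1}{2}, -\frac{p-1}{2} + 1, \ldots, -1, 1, \ldots, \frac{p-1}{2}\right\}.$
Then
$-1 \equiv \prod_{x \in S} x \equiv (-1)^{\frac{p-1}{2}} \left(\left(\frac{p-1}{2}\right)!\right)^2 \pmod p,$
where the first congruence is Wilson's Theorem. Since $p \equiv 1 \pmod 4$, $\frac{p-1}{2}$ is even, so $(-1)^{\frac{p-1}{2}} = 1$ and
$\left(\left(\frac{p-1}{2}\right)!\right)^2 \equiv -1 \pmod p.$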
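The proof is constructive: it exhibits $x = (\frac{p-1}{2})!$ as a square root of −1 modulo p. A short Python sketch (function names ours) makes this explicit and also confirms by brute force that no square root exists for a few primes congruent to 3 modulo 4.

```python
from math import factorial

def sqrt_of_minus_one(p):
    """For a prime p = 1 (mod 4), return ((p-1)/2)! reduced modulo p."""
    return factorial((p - 1) // 2) % p

def minus_one_is_square(p):
    """Brute-force test of whether x^2 = -1 (mod p) has a solution."""
    return any((x * x + 1) % p == 0 for x in range(p))
```

For p = 13 this gives 6! = 720 ≡ 5 (mod 13), and indeed $5^2 + 1 = 26$ is divisible by 13.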
It follows from Fermat's Lemma that (34) has no $\mathbb{Z}$-solutions unless $p \equiv 1 \pmod 4$.
What about the converse: if $p \equiv 1 \pmod 4$, must p be a sum of two squares?
By Fermat's Lemma, there is $x \in \mathbb{Z}$ such that $x^2 \equiv -1 \pmod p$, i.e., there exists $n \in \mathbb{Z}$ such that $pn = x^2 + 1$. Now factor the right hand side over $\mathbb{Z}[\sqrt{-1}]$:
$pn = (x + \sqrt{-1})(x - \sqrt{-1}).$
of $\mathbb{Z}[\sqrt{-1}]$, i.e., that both $\frac{x}{p}$ and $\frac{1}{p}$ are integers. But obviously $\frac{1}{p}$ is not an integer.
Therefore p is not prime, so3 there exists a nontrivial factorization
(16)
p = ,
p = p = = .
1]
with $a^2 + b^2 = 1$, then its multiplicative inverse in $\mathbb{Q}[\sqrt{-1}]$ is $\frac{a}{a^2 + b^2} - \frac{b}{a^2 + b^2}\sqrt{-1} = a - b\sqrt{-1}$, which again lies in $\mathbb{Z}[\sqrt{-1}]$. In other words, $a^2 + b^2 = 1$ implies that
3 A gap occurs in the argument here. It has been deliberately inserted for pedagogical reasons. Please keep reading at least until the beginning of the next section!
(19)
x2 + Dy 2 = p,
pn = (x + D)(x D).
D), it must
divide one
of the factors: p | x D. But since xp p1 D is still not in Z[ D], this is
, nonunits
in
Z[
D].
Let us now define, for any $\alpha = a + b\sqrt{D} \in \mathbb{Q}(\sqrt{D})$,
$q, r \in R$ with a = qb + r and N(r) < N(b). Then what we have actually shown is that any domain which admits a generalized Euclidean norm is a PID.5
4.2. PIDs and UFDs.
One also knows that any PID is a UFD. This is true in general, but in the general case it is somewhat tricky to establish the existence of a factorization into irreducibles. In the presence of a multiplicative norm function $N : R \to \mathbb{N}$ (i.e., a function such that $N(x) = 0 \iff x = 0$, $N(x) = 1 \iff x \in R^\times$, and $N(xy) = N(x)N(y)$ for all $x, y \in R$) this part of the argument becomes much easier to establish, since for any nontrivial factorization x = yz we have N(y), N(z) < N(x).
Complete details are available in loc. cit.
4.3. Some Euclidean quadratic rings.
ous subsection, what we must show is: for all $\alpha \in \mathbb{Q}(\sqrt{D})$, there exists $\beta \in \mathbb{Z}[\sqrt{D}]$ with $N(\alpha - \beta) < 1$. A general element $\alpha$ is of the form
$r + s\sqrt{D}$ with $r, s \in \mathbb{Q}$,
and we are trying to approximate it by an element $x + y\sqrt{D}$ with $x, y \in \mathbb{Z}$.
Let us try something easy: take x (resp. y) to be an integer nearest to r (resp. s). If z is any real number, there exists an integer n with $|z - n| \leq \frac{1}{2}$, and this bound is sharp, attained for all real numbers with fractional part $\frac{1}{2}$.6 So let $x, y \in \mathbb{Z}$ be
Evidently
|D|+1
4
|D| + 1
.
4
which are congruent to 1 modulo 4, and $q_1, \ldots, q_s$ for the distinct prime divisors of n which are congruent to −1 modulo 4, so that
$n = 2^a p_1^{m_1} \cdots p_r^{m_r} q_1^{n_1} \cdots q_s^{n_s},$
Euclid's Lemma applies. We have $p \mid x^2 + y^2 = (x + y\sqrt{-1})(x - y\sqrt{-1})$, so that $p \mid x + y\sqrt{-1}$ or $p \mid x - y\sqrt{-1}$. This implies that $\frac{x}{p}, \frac{y}{p} \in \mathbb{Z}$, i.e., $p \mid x$ and $p \mid y$.
In summary, we have shown:
Theorem 40. (Full Two Squares Theorem) A positive integer n is a sum of two squares iff $\mathrm{ord}_p(n)$ is even for all primes $p \equiv -1 \pmod 4$.
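Theorem 40 can be compared against a direct search. In the Python sketch below (function names ours), one function tests by brute force whether n is a sum of two squares, allowing 0 as one of the squares, and the other applies the criterion of the theorem via trial division.

```python
from math import isqrt

def is_sum_of_two_squares(n):
    """Brute force: is n = x^2 + y^2 for some integers x, y >= 0?"""
    return any(isqrt(n - x * x) ** 2 == n - x * x
               for x in range(isqrt(n) + 1))

def predicted_by_theorem_40(n):
    """True iff every prime p = 3 (mod 4) divides n to an even power."""
    m, p = n, 2
    while p * p <= m:
        e = 0
        while m % p == 0:       # only primes can divide m at this point
            m //= p
            e += 1
        if p % 4 == 3 and e % 2 == 1:
            return False
        p += 1
    # What remains is 1 or a single prime appearing to the first power.
    return m % 4 != 3
```

The two functions should agree on every positive integer; checking a range such as 1 ≤ n < 2000 takes a fraction of a second.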
CHAPTER 4
Quadratic Reciprocity
We now come to the most important result in our course: the law of quadratic reciprocity, or, as Gauss called it, the aureum theorema ("golden theorem").
Many beginning students of number theory have a hard time appreciating this golden theorem. I find this quite understandable, as many first courses do not properly prepare for the result by discussing enough of the earlier work which makes quadratic reciprocity an inevitable discovery and its proof a cause for celebration. Happily, our study of quadratic rings and the quadratic form $x^2 - Dy^2$ has provided excellent motivation. There are also other motivations, involving (what we call here) the direct and inverse problems regarding the Legendre symbol.
A faithful historical description of the QR law is especially complicated and will not be attempted here; we confine ourselves to the following remarks. The first traces of QR can be found in Fermat's Lemma that −1 is a square modulo an odd prime p iff $p \equiv 1 \pmod 4$, so date back to the mid 1600s. Euler was the first to make conjectures equivalent to the QR law, in 1744. He was unable to prove most of his conjectures despite a steady effort over a period of about 40 years. Adrien-Marie Legendre was the first to make a serious attempt at a proof of the QR law, in the late 1700s. His proofs are incomplete but contain much valuable mathematics. He also introduced the Legendre symbol in 1798, which as we will see, is a magical piece of notation with advantages akin to Leibniz's dx in the study of differential calculus and its generalizations. Karl Friedrich Gauss gave the first complete proof of the QR law in 1797, at the age of 19(!). His argument used mathematical induction(!!). The proof appears in his groundbreaking work Disquisitiones Arithmeticae which was written in 1798 and first published in 1801.
The circle of ideas surrounding quadratic reciprocity is so rich that I have found it difficult to linearize it into one written presentation. (In any classroom presentation I have found it useful to begin each class on the subject with an inscription of the QR Law on a side board.) In the present notes, the ordering is as follows. In §1 we give a statement of the quadratic reciprocity law and its two supplements in elementary language. Then in §2 we discuss the Legendre symbol, restate QR in terms of it, and discuss (with proof) some algebraic properties of the Legendre symbol which are so important that they should be considered part of the quadratic reciprocity package. In §3 we return to our unfinished theorems about representation of primes by $|x^2 - Dy^2|$ when $\mathbb{Z}[\sqrt{D}]$ is a PID: using quadratic reciprocity, we can state and prove three bonus theorems which complement Fermat's Two Squares Theorem. In §4 we define and discuss the direct and inverse problems for the Legendre symbol and show how quadratic reciprocity is useful for both of these, in particular for rapid computation of Legendre symbols. More precisely, the computation would be rapid if we could somehow avoid having to factor numbers quickly, and §5 explains how we can indeed avoid this by using an extension of the Legendre symbol due to Jacobi.
1. Statement of Quadratic Reciprocity
Notational comment: when we write something like $p \equiv a, b, c \pmod n$, what we mean is that $p \equiv a \pmod n$ or $p \equiv b \pmod n$ or $p \equiv c \pmod n$. (I don't see any other vaguely plausible interpretation, but it doesn't hurt to be careful.)
Theorem 41. (Quadratic Reciprocity Law) Let $p \neq q$ be odd primes. Then:
(i) If $p \equiv 1 \pmod 4$ or $q \equiv 1 \pmod 4$, p is a square mod q iff q is a square mod p.
(ii) If $p \equiv q \equiv 3 \pmod 4$, p is a square mod q iff q is not a square mod p.
Theorem 42. (First Supplement to the Quadratic Reciprocity Law) If p is an odd prime, then −1 is a square modulo p iff $p \equiv 1 \pmod 4$.
Theorem 43. (Second Supplement to the Quadratic Reciprocity Law) If p is an odd prime, then 2 is a square modulo p iff $p \equiv 1, 7 \pmod 8$.
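Since each statement concerns only squares modulo small primes, all three theorems can be checked empirically. The following Python sketch (helper names are ours) tests them by brute force; it is a numerical sanity check and of course proves nothing.

```python
from math import isqrt

def is_square_mod(a, p):
    """Is a congruent to a square modulo p?"""
    return any((t * t - a) % p == 0 for t in range(p))

def odd_primes_below(bound):
    """Odd primes less than bound, by trial division."""
    return [n for n in range(3, bound, 2)
            if all(n % d for d in range(3, isqrt(n) + 1, 2))]

def reciprocity_holds(p, q):
    """Theorem 41 for distinct odd primes p and q."""
    p_mod_q, q_mod_p = is_square_mod(p, q), is_square_mod(q, p)
    if p % 4 == 1 or q % 4 == 1:
        return p_mod_q == q_mod_p
    return p_mod_q != q_mod_p

def first_supplement_holds(p):
    return is_square_mod(-1, p) == (p % 4 == 1)

def second_supplement_holds(p):
    return is_square_mod(2, p) == (p % 8 in (1, 7))
```

Looping `reciprocity_holds` over all pairs of distinct odd primes below 100 runs instantly and finds no exceptions.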
2. The Legendre Symbol
2.1. Defining the Legendre Symbol.
We now introduce a piece of notation created by Adrien-Marie Legendre in 1798. There is no new idea here; it is merely notation, but is an example of how incredibly useful well-chosen notation can be.
For n an integer and p an odd prime, we define the Legendre symbol
$\left(\frac{n}{p}\right) := \begin{cases} 0, & \text{if } n \equiv 0 \pmod p \\ 1, & \text{if } n \bmod p \text{ is a nonzero square} \\ -1, & \text{if } n \bmod p \text{ is nonzero and not a square} \end{cases}$
We must of course distinguish the Legendre symbol $\left(\frac{n}{p}\right)$ from the rational number $\frac{n}{p}$.
Example 1: To compute $\left(\frac{12}{5}\right)$, we must first observe that 5 does not divide 12 and then determine whether 12 is a nonzero square modulo 5. Since $12 \equiv 2 \pmod 5$ and the squares modulo 5 are 1, 4, the answer to the question "Is 12 a nonzero square modulo 5?" is negative, so $\left(\frac{12}{5}\right) = -1$.
Example 2: To compute $\left(\frac{101}{97}\right)$ (note that 97 is prime!) we observe that 97 does not divide 101. Since $101 \equiv 4 \equiv 2^2 \pmod{97}$, the answer to the question "Is 101 a nonzero square modulo 97?" is positive, so $\left(\frac{101}{97}\right) = 1$.
Example 3: To compute $\left(\frac{97}{101}\right)$ (note that 101 is prime!) we observe that 101 certainly does not divide 97. However, at the moment we do not have a very efficient way to determine whether 97 is a square modulo 101: our only method is to compute all of the squares modulo 101. Some calculation reveals that $400 = 20^2 = 3 \cdot 101 + 97$, so $20^2 \equiv 97 \pmod{101}$. Thus 97 is indeed a square modulo 101, so $\left(\frac{97}{101}\right) = 1$.
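The "compute all of the squares" method of Example 3 is exactly what a naive program does. Here is a Python sketch of the Legendre symbol straight from the definition (the function name is ours; p is assumed to be an odd prime, which the code does not verify).

```python
def legendre_symbol(n, p):
    """Legendre symbol (n/p) computed directly from the definition.

    Assumes p is an odd prime; primality is not checked.
    """
    r = n % p
    if r == 0:
        return 0
    # List the nonzero squares modulo p and look for r among them.
    is_square = any(t * t % p == r for t in range(1, p))
    return 1 if is_square else -1
```

It reproduces the three examples: `legendre_symbol(12, 5)`, `legendre_symbol(101, 97)` and `legendre_symbol(97, 101)`.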
1 There is in fact some relationship with "n divided by p": if we divide n by p, getting n = qp + r with $0 \leq r < p$, then the Legendre symbols $\left(\frac{n}{p}\right)$ and $\left(\frac{r}{p}\right)$ are equal.
a)
$[n] : G/G[n] \stackrel{\sim}{\rightarrow} G^n.$
Now further suppose that G is finite. Then
$\#G^n = \frac{\#G}{\#G[n]}.$
Consider for a moment the case gcd(n, #G) = 1. Suppose $g \in G[n]$. Then the order of g divides n, whereas by Lagrange's theorem, the order of g divides #G, so the order of g divides gcd(n, #G) = 1: so g = 1 and G[n] = {1}. Thus $\#G^n = \#G$ so $G^n = G$. So in this case every element of G is an nth power.
We remark in passing that the converse is also true: if gcd(n, #G) > 1, then G[n] is nontrivial, so the subgroup $G^n$ of nth powers is proper in G. We do not need this general result, so we do not prove it here, but mention only that it can be deduced from the classification theorem for finite commutative groups.
Now we specialize to the case $G = U(p) = (\mathbb{Z}/p\mathbb{Z})^\times$ and n = 2. Then
$G[2] = \{x \in \mathbb{Z}/p\mathbb{Z} \setminus \{0\} \mid x^2 = 1\}.$
We claim that $G[2] = \{\pm 1\}$. First, note that since p is odd, $1 \not\equiv -1 \pmod p$, i.e., +1 and −1 are distinct elements in $\mathbb{Z}/p\mathbb{Z}$, and they clearly both square to 1, so that G[2] contains at least the two element subgroup $\{\pm 1\}$. Conversely, as above every element of G[2] is a root of the quadratic polynomial $t^2 - 1$ in the field $\mathbb{Z}/p\mathbb{Z}$. But a polynomial of degree d over any field (or integral domain) can have at most d distinct roots: whenever p(a) = 0, applying the division algorithm to p(t) and t − a gives p(t) = q(t)(t − a) + c, where c is a constant, and plugging in t = a gives c = 0. Thus we can factor out t − a and the degree decreases by 1. Therefore $\#G[2] \leq 2$, and since we have already found two elements, we must have $G[2] = \{\pm 1\}$.
So $G^2$ is an index two subgroup of G and the quotient $G/G^2$ has order two. Like any group of order 2, it is uniquely isomorphic to the group $\{\pm 1\}$ under multiplication. Thus we have defined a surjective group homomorphism
$L : U(p) \to \{\pm 1\},$
namely we take $x \in U(p)$ to the coset $xU(p)^2$. So, L(x) = 1 if x is a square in $(\mathbb{Z}/p\mathbb{Z})^\times$ and L(x) = −1 otherwise. But this means that for all $x \in \mathbb{Z}/p\mathbb{Z} \setminus \{0\}$, $L(x) = \left(\frac{x}{p}\right)$. Thus we have recovered the Legendre symbol in terms of purely algebraic considerations and also shown that
$\forall x, y \in U(p), \quad \left(\frac{xy}{p}\right) = \left(\frac{x}{p}\right)\left(\frac{y}{p}\right).$
In fact we can give a (useful!) second description of the Legendre symbol using power maps. (This discussion repeats the proof of Proposition 32, but we are happy to do so.) To see this, consider the map
$\left[\frac{p-1}{2}\right] : U(p) \to U(p).$
We claim that the kernel of this map is again the subgroup $U(p)^2$ of squares, of order $\frac{p-1}{2}$. On the one hand, observe that $U(p)^2 \subset U(p)[\frac{p-1}{2}]$: indeed $(x^2)^{\frac{p-1}{2}} = x^{p-1} = 1$ by Lagrange's Theorem. Conversely, the elements of $U(p)[\frac{p-1}{2}]$ are roots of the polynomial $t^{\frac{p-1}{2}} - 1$ in the field $\mathbb{Z}/p\mathbb{Z}$, so there are at most $\frac{p-1}{2}$ of them. Thus $U(p)^2 = U(p)[\frac{p-1}{2}]$. By similar reasoning we have $U(p)^{\frac{p-1}{2}} \subset \{\pm 1\}$, hence we can view $[\frac{p-1}{2}]$ as a homomorphism
$L' = \left[\frac{p-1}{2}\right] : U(p) \to \{\pm 1\}.$
Since the kernel of $L'$ is precisely the subgroup $U(p)^2$ and there are only two possible values, it must be the case that $L'(x) = -1$ for all $x \in U(p) \setminus U(p)^2$. In other words, we have $L'(x) = \left(\frac{x}{p}\right)$.
p1
2
=g
i(p1)
2
The odd primes $p < 200$ for which $-2$ is a square modulo $p$ are:
3, 11, 17, 19, 41, 43, 59, 67, 73, 83, 89, 97, 107, 113, 131, 137, 139, 163, 179, 193.
Notice that these are precisely the primes $p < 200$ with $p \equiv 1, 3 \pmod 8$.
For $D = 2, 3$ we will give some data and allow you a chance to find the pattern.
The odd primes p < 200 for which 2 is a square modulo p are:
7, 17, 23, 31, 41, 47, 71, 73, 79, 89, 97, 103, 113, 127, 137, 151, 167, 191, 193, 199.
The odd primes p < 200 for which 3 is a square modulo p are:
3, 11, 13, 23, 37, 47, 59, 61, 71, 73, 83, 97, 107, 109, 131, 157, 167, 179, 181, 191, 193.
While we are at it, why not a bit more data?
The odd primes p < 200 for which 5 is a square modulo p are:
5, 11, 19, 29, 31, 41, 59, 61, 71, 79, 89, 101, 109, 131, 139, 149, 151, 179, 181, 191, 199.
The odd primes p < 200 for which 7 is a square modulo p are:
3, 7, 19, 29, 31, 37, 47, 53, 59, 83, 103, 109, 113, 131, 137, 139, 149, 167, 193, 197, 199.
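The lists above can be reproduced by a direct search. The following is a minimal sketch in Python; the function names are ours (not from the text), and "D is a square modulo p" is taken to mean that $x^2 \equiv D \pmod p$ has a solution, with $x \equiv 0$ allowed, which is why 3, 5 and 7 appear in their own lists.

```python
def primes_below(n):
    """Return all primes < n by trial division (fine for small n)."""
    return [p for p in range(2, n)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def is_square_mod(D, p):
    """True if x^2 = D (mod p) has a solution x (x = 0 is allowed)."""
    return any((x * x - D) % p == 0 for x in range(p))

def odd_primes_with_square(D, bound=200):
    """Odd primes p < bound for which D is a square modulo p."""
    return [p for p in primes_below(bound)
            if p != 2 and is_square_mod(D, p)]

if __name__ == "__main__":
    for D in (-2, 2, 3, 5, 7):
        print(D, odd_primes_with_square(D))
```

Changing the tuple of values of D (or the bound) gives further data against which to test a conjectured pattern.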
3.2. With the help of quadratic reciprocity.
We already know that a prime $p$ is of the form $|x^2 - 2y^2|$ iff $\left(\frac{2}{p}\right) = 1$, and the second supplement tells us that this latter condition holds iff $p \equiv 1, 7 \pmod 8$.
While we are here, let's deal with the absolute value: it happens that $\mathbb{Z}[\sqrt{2}]$ contains an element of norm $-1$, namely $1 - \sqrt{2}$:
$$N(1 - \sqrt{2}) = (1 - \sqrt{2})(1 + \sqrt{2}) = 1^2 - 2 \cdot 1^2 = -1.$$
From this and the multiplicativity of the norm map, it follows that if we can represent any integer $n$ in the form $x^2 - 2y^2$, we can also represent it in the form $-(x^2 - 2y^2)$, and conversely. From this it follows that the absolute value is superfluous and we get the following result.
Theorem 47. (First Bonus Theorem) A prime number $p$ is of the form $x^2 - 2y^2$ iff $p = 2$ or $p \equiv 1, 7 \pmod 8$.
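As a sanity check on Theorem 47, one can compare the congruence condition with a brute-force search for representations. This is only a sketch: the helper names and the search limit of 60 are our own choices. For $p < 200$ a representation, when one exists, can be found with $0 \le y \le \sqrt{p/2}$, so the limit is generous.

```python
def primes_below(n):
    """Return all primes < n by trial division."""
    return [p for p in range(2, n)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]

def is_of_form_x2_minus_2y2(p, limit=60):
    """Search 0 <= x, y < limit for a solution of x^2 - 2y^2 = p."""
    return any(x * x - 2 * y * y == p
               for x in range(limit) for y in range(limit))

def mismatches(bound=200):
    """Primes p < bound where the search disagrees with the congruence test."""
    bad = []
    for p in primes_below(bound):
        predicted = (p == 2) or (p % 8 in (1, 7))
        if is_of_form_x2_minus_2y2(p) != predicted:
            bad.append(p)
    return bad

if __name__ == "__main__":
    print(mismatches())
```

An empty list of mismatches is what the theorem predicts for this range.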
Now let's look at the case of $D = 2$, i.e., the form $x^2 + 2y^2$. Since $2 = 0^2 + 2 \cdot 1^2$, $2$ is of the form $x^2 + 2y^2$. Now assume that $p$ is odd. We know that an odd prime $p$ is of the form $x^2 + 2y^2$ iff $\left(\frac{-2}{p}\right) = 1$. We don't have a single law for this, but the multiplicativity of the Legendre symbol comes to our rescue. Indeed,
$$\left(\frac{-2}{p}\right) = \left(\frac{-1}{p}\right)\left(\frac{2}{p}\right),$$
so
$$\left(\frac{-2}{p}\right) = 1 \iff \left(\frac{-1}{p}\right) = \left(\frac{2}{p}\right).$$
Case 1: $\left(\frac{-1}{p}\right) = \left(\frac{2}{p}\right) = 1$. By the first and second supplements, this occurs iff $p \equiv 1 \pmod 4$ and $p \equiv 1, 7 \pmod 8$, so iff $p \equiv 1 \pmod 8$.
Case 2: $\left(\frac{-1}{p}\right) = \left(\frac{2}{p}\right) = -1$. By the first and second supplements, this occurs iff $p \equiv 3 \pmod 4$ and $p \equiv 3, 5 \pmod 8$, so iff $p \equiv 3 \pmod 8$. Thus:
(21) $p = x^2 - 3y^2$
or
(22) $p = 3y^2 - x^2$.
It turns out that for any prime $p$, exactly one of the two equations (21), (22) holds, which is extremely convenient: it means that we can always show that one of the equations holds by showing that the other one does not hold!
Indeed, if we reduce the equation $p = x^2 - 3y^2$ modulo 3, we get $p \equiv x^2 \pmod 3$, i.e., $\left(\frac{p}{3}\right) = 1$, so $p \equiv 1 \pmod 3$. So if $p \equiv 11 \pmod{12}$ then $p$ is not of the form $x^2 - 3y^2$ so must be of the form $3y^2 - x^2$. Similarly, if we reduce the equation $p = 3y^2 - x^2$ modulo 3, we get $p \equiv -x^2 \equiv -1 \pmod 3$, so if $p \equiv 1 \pmod 3$ then (22) has no solution, so it must be that $p = x^2 - 3y^2$ does have a solution.
A very similar argument establishes the following more general result.
who have done their homework know better: in fact if Z[ q] is a PID, then we must
This is a new phenomenon for us. Note that when q = 5, in conjunction with
the above (unproved) result, we get the following
Theorem 51. An odd prime $p$ is of the form $x^2 + 5y^2$ iff $p \equiv 1, 9 \pmod{20}$.
Case 2b): Suppose also $q \equiv 3 \pmod 4$. Then $1 = \left(\frac{-q}{p}\right) = -\left(\frac{q}{p}\right) = \left(\frac{p}{q}\right)$. Thus the two congruence conditions are consistent in this case.
p1
2
(mod 28).
The QR law leads to the following general solution of the inverse problem:
Corollary 52. Let $q$ be any odd prime.
a) If $q \equiv 1 \pmod 4$, then $\left(\frac{q}{p}\right) = 1$ iff $p$ is congruent to a square modulo $q$ (so lies in one of $\frac{q-1}{2}$
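Example: Suppose we want to compute $\left(\frac{7}{19}\right)$. Using QR we can invert the Legendre symbol, tacking on an extra factor of $-1$ because $7 \equiv 19 \equiv -1 \pmod 4$:
$$\left(\frac{7}{19}\right) = -\left(\frac{19}{7}\right) = -\left(\frac{5}{7}\right) = -\left(\frac{7}{5}\right) = -\left(\frac{2}{5}\right).$$
We have reduced to a problem we know: 2 is not a square mod 5, so the final answer is $\left(\frac{7}{19}\right) = -(-1) = 1$.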
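Example:
$$\left(\frac{41}{103}\right) = \left(\frac{103}{41}\right) = \left(\frac{21}{41}\right) = \left(\frac{3}{41}\right)\left(\frac{7}{41}\right) = \left(\frac{41}{3}\right)\left(\frac{41}{7}\right) = \left(\frac{-1}{3}\right)\left(\frac{-1}{7}\right) = (-1) \cdot (-1) = 1.$$
Example:
$$\left(\frac{79}{101}\right) = \left(\frac{101}{79}\right) = \left(\frac{22}{79}\right) = \left(\frac{2}{79}\right)\left(\frac{11}{79}\right) = 1 \cdot \left(\frac{11}{79}\right) = -\left(\frac{79}{11}\right) = -\left(\frac{2}{11}\right) = -(-1) = 1.$$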
Let us now stop and make an important observation: the quadratic reciprocity law along with its first and second supplements, together with parts a) and c) of Proposition 45, allows for a computation of the Legendre symbol $\left(\frac{n}{p}\right)$ in all cases. Indeed, it is multiplicative in the numerator, so we may factor $n$ as follows:
$$n = (-1)^{\epsilon} 2^{a} p^{b} p_1 \cdots p_r m^2,$$
,
b
p1
pr
Using the Euler relation $\left(\frac{a}{p}\right) \equiv a^{\frac{p-1}{2}} \pmod p$ to compute $\left(\frac{a}{p}\right)$ is also rather efficient, as one can take advantage of a powering algorithm to rapidly compute exponents modulo $p$ (the basic idea being simply to not compute the integer $a^{\frac{p-1}{2}}$ at all but rather to alternate raising $a$ to successively larger powers and reducing the result modulo $p$): this can be done in time $O(\log^3 p)$. For more information on this and many other topics related to number-theoretic algorithms, we recommend Henri Cohen's A Course in Computational Algebraic Number Theory.
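Both methods of computation are easy to try out. The sketch below (function names are ours) computes the symbol once via the Euler relation, using Python's built-in three-argument `pow` for the modular powering, and once via repeated inversion with the QR law and the supplements. The second routine is the standard Jacobi-symbol form of the algorithm, which avoids factoring the numerator; it is one common way to organize the computation, not necessarily the one intended in the text.

```python
def legendre_euler(a, p):
    """Legendre symbol (a/p) for an odd prime p, via a^((p-1)/2) mod p."""
    r = pow(a, (p - 1) // 2, p)      # fast modular exponentiation
    return -1 if r == p - 1 else r   # r is 0, 1 or p - 1

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n, via reciprocity."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:            # second supplement: (2/n)
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                  # invert the symbol (QR law)
        if a % 4 == 3 and n % 4 == 3:
            result = -result         # extra sign when both are 3 mod 4
        a %= n
    return result if n == 1 else 0

if __name__ == "__main__":
    for a, p in ((7, 19), (41, 103), (79, 101)):
        print(a, p, legendre_euler(a, p), jacobi(a, p))
```

For an odd prime denominator the two functions agree, and the three worked examples above all come out to 1.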
6. Preliminaries on Congruences in Cyclotomic Rings
For a positive integer $n$, let $\zeta_n = e^{2\pi i/n}$
zni =
aij nj
j=0
xy
p
$R_n \cap \mathbb{Q} = \mathbb{Z}$.
To prove the second supplement we will take $n = 8$. To prove the QR law we will take $n = p$ an odd prime. These choices will be constant throughout each of the proofs so we will abbreviate $\zeta = \zeta_8$ (resp. $\zeta_p$) and $R = R_8$ (resp. $R_p$).
7. Proof of the Second Supplement
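Put $\zeta = \zeta_8$, a primitive eighth root of unity and $R = R_8 = \mathbb{Z}[\zeta_8]$. We have:
$$0 = \zeta^8 - 1 = (\zeta^4 + 1)(\zeta^4 - 1).$$
Since $\zeta^4 \neq 1$ (primitivity), we must have $\zeta^4 + 1 = 0$. Multiplying by $\zeta^{-2}$ we get
$$\zeta^2 + \zeta^{-2} = 0.$$
So
$$(\zeta + \zeta^{-1})^2 = \zeta^2 + 2 + \zeta^{-2} = 2.$$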
= ( )
(mod pR).
p
The $\equiv$ is by Euler's relation. Multiplying through by $\tau$, we get:
(23) $\tau^p \equiv \left(\frac{2}{p}\right)\tau \pmod p$.
Lemma 57. (Schoolboy binomial theorem)
Let $R$ be a commutative ring, $p$ a prime number and $x, y \in R$. We have
$$(x + y)^p \equiv x^p + y^p \pmod{pR}.$$
(mod pR).
(mod pR)
so (23) becomes
$$\tau \equiv \left(\frac{2}{p}\right)\tau \pmod{pR}.$$
It is tempting to cancel the $\tau$'s, but we must be careful: $pR$ need not be a prime ideal of the ring $R$.$^4$ But, sneakily, instead of dividing we multiply by $\tau$, getting
$$2 = \tau^2 \equiv \left(\frac{2}{p}\right)\tau^2 = 2\left(\frac{2}{p}\right) \pmod{pR},$$
which by Lemma 2 means that
$$2 \equiv 2\left(\frac{2}{p}\right) \pmod p$$
in the usual sense. Since 2 is a unit in $\mathbb{Z}/p\mathbb{Z}$, dividing both sides by 2 is permissible. We do so, getting the desired conclusion in this case:
$$\left(\frac{2}{p}\right) \equiv 1 \pmod p.$$
Case 2: $p \equiv -1 \pmod 8$ is very similar: this time $\zeta^p = \zeta^{-1}$, but still $\tau^p \equiv \zeta^p + \zeta^{-p} = \zeta^{-1} + \zeta = \tau \pmod{pR}$. The remainder of the argument is the same, in particular the conclusion: $\left(\frac{2}{p}\right) \equiv 1 \pmod p$.
$^4$ In fact, it can be shown not to be prime in the case $p \equiv 1 \pmod 8$.
(mod pR).
( )
2
(mod pR),
p
and again we multiply by $\tau$ to get a congruence modulo $p$ and conclude
$$\left(\frac{2}{p}\right) = -1.$$
This would mean in particular that $\mathbb{Q}(\sqrt{p}) \subset \mathbb{Q}(\zeta)$, which is far from obvious. Indeed, it need not even be quite true. Take $p = 3$: since $\zeta_3 = e^{2\pi i/3} = \frac{-1 + \sqrt{-3}}{2}$, the cyclotomic field $\mathbb{Q}(\zeta_3)$ is the same as the imaginary quadratic field $\mathbb{Q}(\sqrt{-3})$. There is an element $\tau \in \mathbb{Z}[\zeta_3]$ with $\tau^2 = -3$ but not one with $\tau^2 = 3$.
But take heart: finding a square root of $p$ in $\mathbb{Q}(\zeta_p)$ isn't exactly what we wanted anyway. Recall that a strange factor of $\pm 1$ according to whether $p \equiv \pm 1 \pmod 4$ is the hallmark of quadratic reciprocity. So actually we are on the right track.
Now, like a deus ex machina comes the Gauss sum:$^5$
$$\tau := \sum_{t=0}^{p-1} \left(\frac{t}{p}\right)\zeta^t.$$
In other words, we sum up all the $p$th roots of unity, but we insert $\pm 1$ signs in front of them according to a very particular recipe. This looks a bit like a random walk in the complex plane with $p$ steps of unit length. A probabilist would guess that the magnitude of the complex number is roughly $\sqrt{p}$.$^6$ Well, it is our lucky day:
Theorem 58. (Gauss)
$$\tau^2 = (-1)^{\frac{p-1}{2}} p.$$
That is, $|\tau| = \sqrt{p}$ on the nose! The extra factor of $(-1)^{\frac{p-1}{2}}$ is more than welcome, since it appears in the quadratic reciprocity law. In fact, we define $p^* = (-1)^{\frac{p-1}{2}} p$, and then it is entirely straightforward to check the following
$^5$ We make the convention that from now until the end of the handout, all sums extend over $0 \leq i \leq p - 1$.
$^6$ Much more on this, the philosophy of "almost square root error", can be found in the analytic number theory part of these notes.
Lemma 59. The quadratic reciprocity law is equivalent to the fact that for distinct odd primes $p$ and $q$, we have
$$\left(\frac{q}{p}\right) = \left(\frac{p^*}{q}\right).$$
Proof. Exercise.
Remarkably, we can now push through a proof as in the last section:
$$\tau^{q-1} = (\tau^2)^{\frac{q-1}{2}} = (p^*)^{\frac{q-1}{2}} \equiv \left(\frac{p^*}{q}\right) \pmod q,$$
and suddenly our way is clear: multiply by $\tau$ to get
(24) $\tau^q \equiv \left(\frac{p^*}{q}\right)\tau \pmod q$.
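On the other hand, we have
$$\tau^q = \left(\sum_t \left(\frac{t}{p}\right)\zeta^t\right)^q \equiv \sum_t \left(\frac{t}{p}\right)\zeta^{qt} \pmod q.$$
Now, since $q$ is prime to $p$ and hence to the order of $\zeta$, the elements $\zeta^{qt}$ still run through all distinct $p$th roots of unity as $t$ runs from $0$ to $p - 1$. In other words, we can make the change of variable $t \mapsto q^{-1}t$ and then the sum becomes
$$\sum_t \left(\frac{q^{-1}t}{p}\right)\zeta^t = \left(\frac{q^{-1}}{p}\right)\sum_t \left(\frac{t}{p}\right)\zeta^t = \left(\frac{q}{p}\right)\tau.$$
So we win: substituting this into (24) we get
$$\left(\frac{q}{p}\right)\tau \equiv \left(\frac{p^*}{q}\right)\tau \pmod q,$$
and multiplying through by $\tau$ we get an ordinary congruence
$$\left(\frac{q}{p}\right)p^* \equiv \left(\frac{p^*}{q}\right)p^* \pmod q;$$
since $p^*$ is prime to $q$, we may cancel to get
$$\left(\frac{q}{p}\right) \equiv \left(\frac{p^*}{q}\right) \pmod q,$$
and finally that
$$\left(\frac{q}{p}\right) = \left(\frac{p^*}{q}\right).$$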
9. . . . the Computation of the Gauss Sum
p1
2
p.
We do this by introducing a slightly more general sum: for any integer $a$, we define
$$\tau_a := \sum_t \left(\frac{t}{p}\right)\zeta^{at}.$$
If a 0 (mod p), then
a =
(t)
p
ap
(t)
p
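Notice that $\tau_q$ came up in the proof of the quadratic reciprocity law and we quickly rewrote it in terms of $\tau$. That argument still works here, to give:
$$\tau_a = \left(\frac{a}{p}\right)\tau.$$
Now we will evaluate the sum $\sum_a \tau_a \tau_{-a}$ in two different ways. First, if $a \neq 0$, then
$$\tau_a \tau_{-a} = \left(\frac{a}{p}\right)\left(\frac{-a}{p}\right)\tau^2 = \left(\frac{-1}{p}\right)\tau^2 = (-1)^{\frac{p-1}{2}}\tau^2.$$
On the other hand
$$\tau_0 = \sum_t \left(\frac{t}{p}\right)\zeta^{0t} = \sum_t \left(\frac{t}{p}\right) = 0,$$
since each nonzero quadratic residue mod $p$ contributes $+1$, each quadratic nonresidue contributes $-1$, and we have an equal number of each. It follows that
$$\sum_a \tau_a \tau_{-a} = (-1)^{\frac{p-1}{2}}(p - 1)\tau^2.$$
We also have
$$\tau_a \tau_{-a} = \sum_x \sum_y \left(\frac{x}{p}\right)\left(\frac{y}{p}\right)\zeta^{a(x-y)}.$$
Lemma 60.
a) If $a \equiv 0 \pmod p$, then $\sum_t \zeta^{at} = p$;
b) Otherwise $\sum_t \zeta^{at} = 0$.
The proof is easy. So interchanging the summations we get
$$\sum_a \tau_a \tau_{-a} = \sum_x \sum_y \left(\frac{x}{p}\right)\left(\frac{y}{p}\right)\sum_a \zeta^{a(x-y)}.$$
The inner sum is 0 for all $x \neq y$, and the outer sum is 0 when $x = y = 0$. For each of the remaining $p - 1$ values of $x = y$, we get a contribution to the sum of $p$, so
$$\sum_a \tau_a \tau_{-a} = (p - 1)p.$$
Equating the two expressions for $\sum_a \tau_a \tau_{-a}$ gives
$$(p - 1)p = (-1)^{\frac{p-1}{2}}(p - 1)\tau^2,$$
so
$$\tau^2 = (-1)^{\frac{p-1}{2}} p = p^*.$$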
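Theorem 58 is easy to test numerically. The sketch below (our own helper names; floating-point arithmetic, so equality holds only up to rounding error) forms $\tau = \sum_t \left(\frac{t}{p}\right)\zeta^t$ with $\zeta = e^{2\pi i/p}$ and compares $\tau^2$ with $p^* = (-1)^{\frac{p-1}{2}} p$.

```python
import cmath

def legendre(t, p):
    """Legendre symbol (t/p) for an odd prime p, via Euler's relation."""
    r = pow(t, (p - 1) // 2, p)
    return -1 if r == p - 1 else r

def gauss_sum(p):
    """tau = sum over t mod p of (t/p) * zeta^t, with zeta = exp(2*pi*i/p)."""
    zeta = cmath.exp(2j * cmath.pi / p)
    return sum(legendre(t, p) * zeta ** t for t in range(p))

def p_star(p):
    """p* = (-1)^((p-1)/2) * p."""
    return (-1) ** ((p - 1) // 2) * p

if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13):
        tau = gauss_sum(p)
        print(p, tau * tau, p_star(p))
```

For each odd prime tried, the printed value of $\tau^2$ should agree with $p^*$ up to a tiny numerical error.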
10. Comments
Working through this proof feels a little bit like being an accountant who has been assigned to carefully document a miracle. Nevertheless, every proof of QR I know feels this way, sometimes to an even greater extent. At least in this proof the miracle can be "bottled": there are many fruitful generalizations of Gauss sums, which can be used to prove an amazing variety of results in mathematics, from number theory to partial differential equations (really!).
The proof just given is a modern formulation of Gauss' sixth and last proof, in which his polynomial identities have been replaced by more explicit reference to algebraic integers. In particular I took the proof from the wonderful text of Ireland and Rosen, with only very minor expository modifications. In addition to being no harder than any other proof of QR that I have ever seen, it has other merits: first, it shows that the cyclotomic field $\mathbb{Q}(\zeta_p)$ contains the quadratic field $\mathbb{Q}(\sqrt{p^*})$ (in fact, Galois theory shows that this is the unique quadratic field contained in $\mathbb{Q}(\zeta_p)$), a fact which comes up again and again in algebraic number theory. Second, the proof can be adapted with relative ease to prove certain generalizations of the quadratic reciprocity law to cubic and biquadratic residues (for this see Ireland and Rosen again). These higher reciprocity laws were much sought by Gauss but found only by his student Eisenstein (not the filmmaker).
Finally, the Gauss sum can be rewritten to look more like the Gaussians one studies in continuous mathematics: you are asked in the homework to show that
$$\tau = \sum_t e^{\frac{2\pi i t^2}{p}}.$$
CHAPTER 5
(x)
R/(xy) R/(x) 0
(xy)
Let $I$ be a nonzero ideal of the FQ-domain $R$. Then $I$ contains a nonzero element $x$, we have a natural surjection $R/(x) \to R/I$, and since $R/(x)$ is finite, so is $R/I$. We may therefore extend the norm map to nonzero ideals by $|I| = \#R/I$ and also put $|(0)| = 0$. Note that this generalizes the previous norm map in that for all $x \in R$ we have $|(x)| = |x|$. As above, we say an ideal $I$ is odd if $|I|$ is odd.
A nonzero proper ideal $\mathfrak{b}$ of $R$ is factorable if there exist prime ideals $\mathfrak{p}_1, \ldots, \mathfrak{p}_r$ of $R$ such that $\mathfrak{b} = \prod_{i=1}^r \mathfrak{p}_i$. Note that if the element $b$ is factorable, then so is the principal ideal $(b)$, but in general the converse does not hold. Because of this we say that an element $b$ is I-factorable if the ideal $(b)$ factors into a product of prime ideals.
For $a \in R$ and an odd prime ideal $\mathfrak{p}$, we define the Legendre symbol $\left(\frac{a}{\mathfrak{p}}\right)$: it is $0$ if $a \in \mathfrak{p}$, $1$ if $a \notin \mathfrak{p}$ and $a \equiv x^2 \pmod{\mathfrak{p}}$ for some $x$, and $-1$ if $a \notin \mathfrak{p}$ and $a \not\equiv x^2 \pmod{\mathfrak{p}}$ for all $x$. For an odd factorable ideal $\mathfrak{b} = \mathfrak{p}_1 \cdots \mathfrak{p}_r$ of $R$ we define the Jacobi symbol
$$\left(\frac{a}{\mathfrak{b}}\right) = \prod_{i=1}^r \left(\frac{a}{\mathfrak{p}_i}\right).$$
Let $r$ be a ring, and let $a \in r^\times$. Then the map $m_a : r \to r$ by $x \mapsto xa$ is a bijection (its inverse is $m_{a^{-1}}$).
Now suppose moreover $r$ is finite, of order $n$, so upon choosing a bijection of $r$ with $\{1, \ldots, n\}$, we may identify $m_a$ with an element of the symmetric group $S_n$, and in particular $m_a$ has a well-defined sign $\left[\frac{a}{r}\right] \in \{\pm 1\}$. The sign map $\epsilon : S_n \to \{\pm 1\}$ is a homomorphism into a commutative group, so for all $\sigma, \tau \in S_n$, $\epsilon(\tau\sigma\tau^{-1}) = \epsilon(\sigma)$. In
a
R/b
is well-dened.
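$$\left(\frac{a}{2}\right) = \begin{cases} 0 & a \equiv 0 \pmod 2 \\ 1 & a \equiv 1, 7 \pmod 8 \\ -1 & a \equiv 3, 5 \pmod 8 \end{cases},$$
$$\left(\frac{a}{-1}\right) = \begin{cases} 0 & a = 0 \\ 1 & a > 0 \\ -1 & a < 0 \end{cases},$$
$$\left(\frac{a}{0}\right) = \begin{cases} 0 & a \neq \pm 1 \\ 1 & a = \pm 1 \end{cases}.$$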
With these additional rules there is a unique extension of the Jacobi symbol to a symbol $\left(\frac{a}{n}\right)$ defined for any $n, a \in \mathbb{Z}$ such that for all integers $n, a, b$, we have $\left(\frac{n}{ab}\right) = \left(\frac{n}{a}\right)\left(\frac{n}{b}\right)$. One also has $\left(\frac{ab}{n}\right) = \left(\frac{a}{n}\right)\left(\frac{b}{n}\right)$, i.e., the symbol is bi-multiplicative.
This extension of the Jacobi symbol is known as the Kronecker symbol.
(a)
0
1
=
When $n$ is not odd and positive, some authors (e.g. [DH05]) define $\left(\frac{a}{n}\right)$ only when $a \equiv 0, 1 \pmod 4$. It is not worth our time to discuss these two conventions, but we note that all of our results involve only this "restricted" Kronecker symbol.
For odd $n \in \mathbb{Z}^+$, define $n^* = (-1)^{\frac{n-1}{2}} n$. Full quadratic reciprocity (i.e., the usual QR law together with its First and Second Supplements) is equivalent to one elegant identity: for $a \in \mathbb{Z}$ and an odd positive $n \in \mathbb{Z}$,
(25) $\left(\frac{a}{n}\right) = \left(\frac{n^*}{a}\right)$.
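moreover, since $a \in (\mathbb{Z}/n\mathbb{Z})^\times$, there exists $b \in (\mathbb{Z}/n\mathbb{Z})^\times$ such that $ab \equiv 1 \pmod n$, and then the maps $g \mapsto g^a$ and $g \mapsto g^b$ compose in either order to $\mathrm{Id}_G$, so that each map $g \mapsto g^a$ is an automorphism of $G$.
As for any group action on a set, this determines a homomorphism from $(\mathbb{Z}/n\mathbb{Z})^\times$ to the group $\mathrm{Sym}(G)$ of permutations of $G$, the latter group being isomorphic to $S_n$, the symmetric group on $n$ elements. Recall that there is a unique homomorphism from $S_n$ to the cyclic group $Z_2$ given by the sign of the permutation. Therefore we have a composite homomorphism
$$(\mathbb{Z}/n\mathbb{Z})^\times \to \mathrm{Sym}(G) \to Z_2$$
which we will denote by
$$a \pmod n \mapsto \left(\frac{a}{G}\right).$$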
Example 2.1 (Zolotarev): Let $p$ be an odd prime and let $G = Z_p$ be the cyclic group of order $p$. Then the map $a \mapsto \left(\frac{a}{G}\right)$ is the usual Legendre symbol $a \mapsto \left(\frac{a}{p}\right)$. Indeed, the group $(\mathbb{Z}/p\mathbb{Z})^\times$ is cyclic of even order, so admits a unique surjective homomorphism to the group $Z_2 = \{\pm 1\}$: if $g$ is a primitive root mod $p$, we send $g$ to $-1$ and hence every odd power of $g$ to $-1$ and every even power of $g$ to $+1$. This precisely describes the Legendre symbol
a
(mod n1 )
, g2
(mod n2 )
).
the identity, so that $r_1 = 1$ and $r_2 = \frac{n-1}{2}$. In this case $G^* = |G|^* = (-1)^{\frac{n-1}{2}} n \equiv 1 \pmod 4$. If $n$ is even, then $n - r_1 = 2r_2 \equiv 0 \pmod 2$, so $r_1$ is even and hence is at least 2, so $G^* = (-1)^{r_2} n^{r_1} \equiv 0 \pmod 4$.
So the Kronecker symbol $\left(\frac{G^*}{a}\right)$ is always defined (even in the restricted sense).
Theorem 66. (Duke-Hopkins Reciprocity Law) For a finite commutative group $G$ and an integer $a$, we have
$$\left(\frac{a}{G}\right) = \left(\frac{G^*}{a}\right).$$
The proof will be given in the next section.
Corollary 67. a) Suppose $G$ has odd order $n$. Then for any $a \in (\mathbb{Z}/n\mathbb{Z})^\times$, we have
$$\left(\frac{a}{G}\right) = \left(\frac{n^*}{a}\right).$$
b) Taking $G = Z_n$ we recover (25).
c) We have $\left(\frac{a}{G}\right) = 1$ for all $a \in (\mathbb{Z}/n\mathbb{Z})^\times$ iff $n$ is a square.
Proof of Corollary 67: In the proof of Lemma 65 we saw that $G^* = n^*$; part a) then follows immediately from the reciprocity law. By part a), the symbol $\left(\frac{a}{G}\right)$ can be computed using any group of order $n$, so factor $n$ into a product $p_1 \cdots p_r$ of not necessarily distinct primes and apply Example 2.1: we get $\left(\frac{a}{G}\right) = \prod_{i=1}^r \left(\frac{a}{p_i}\right) = \left(\frac{a}{n}\right)$. This gives part b). Finally, using the Chinese Remainder Theorem it is easy to see that there is some $a$ such that $\left(\frac{a}{n}\right) = -1$ iff $n$ is not a square.
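The identification of the permutation sign with the Jacobi symbol can be checked by machine for cyclic groups of odd order. In the sketch below (all names are ours) the cyclic group $Z_n$ is written additively as $\mathbb{Z}/n\mathbb{Z}$, so the map $g \mapsto g^a$ becomes $x \mapsto ax$; the Jacobi symbol is computed by the usual reciprocity-based algorithm.

```python
from math import gcd

def perm_sign(perm):
    """Sign of a permutation given as a list: perm[i] is the image of i."""
    seen = [False] * len(perm)
    sign = 1
    for i in range(len(perm)):
        length = 0
        j = i
        while not seen[j]:           # walk the cycle containing i
            seen[j] = True
            j = perm[j]
            length += 1
        if length > 0 and length % 2 == 0:
            sign = -sign             # even-length cycles are odd permutations
    return sign

def symbol_cyclic(a, n):
    """(a / Z_n): the sign of x -> a*x on Z/nZ, for gcd(a, n) = 1."""
    return perm_sign([(a * x) % n for x in range(n)])

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

if __name__ == "__main__":
    n = 15
    print([(a, symbol_cyclic(a, n), jacobi(a, n))
           for a in range(1, n) if gcd(a, n) == 1])
```

For odd $n$ the second and third entries of each printed triple should coincide.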
4. The Proof
Enumerate the elements of $G$ as $g_1, \ldots, g_n$ and the characters of $G$ as $\chi_1, \ldots, \chi_n$. Let $M$ be the $n \times n$ matrix whose $(i, j)$ entry is $\chi_i(g_j)$.
Since any character $\chi \in X(G)$ has values on the unit circle in $\mathbb{C}$, we have $\chi^{-1} = \overline{\chi}$. Therefore the number $r_1$ of fixed points of the inversion map $g \mapsto g^{-1}$ on $G$ is the same as the number of characters $\chi$ such that $\overline{\chi} = \chi$, i.e., real-valued characters. Thus the effect of complex conjugation on the character matrix $M$ is to fix each row corresponding to a real-valued character and to otherwise swap the $i$th row with the $j$th row where $\chi_j = \overline{\chi_i}$. In all $r_2$ pairs of rows get swapped, so
$$\det(\overline{M}) = \det(M) \cdot (-1)^{r_2}.$$
Moreover, with $M^* = (\overline{M})^t$, we have
$$M M^* = n I_n,$$
so that
$$\det(M)\det(\overline{M}) = n^n,$$
so
(26) $\det(M)^2 = (-1)^{r_2} n^n = \delta^2 G^*$,
where $\delta = n^{r_2}$. (In particular $\det(M)^2$ is a nonzero integer. Note that $\det(M)$ itself lies in $\mathbb{Q}(\sqrt{G^*})$, and is not rational if $n$ is odd.) So for any $a \in \mathbb{Z}$, we have
(27) $\left(\frac{\det(M)^2}{a}\right) = \left(\frac{G^*}{a}\right)$.
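For a cyclic group the character matrix is explicit, so the value of $\det(M)^2$ can be tested numerically. The sketch below rests on assumptions of ours: $G = Z_n$ with characters $\chi_i(g_j) = \zeta_n^{ij}$, so that $r_1$ is 1 for odd $n$ and 2 for even $n$, and $r_2 = (n - r_1)/2$. The determinant routine is a plain Gaussian elimination written for this illustration, and all comparisons are in floating point.

```python
import cmath

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    d = 1
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) < 1e-12:
            return 0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            d = -d                   # a row swap flips the sign
        d *= m[c][c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= f * m[c][k]
    return d

def character_matrix(n):
    """Character matrix of the cyclic group Z_n: entry (i, j) is zeta^(i*j)."""
    zeta = cmath.exp(2j * cmath.pi / n)
    return [[zeta ** (i * j) for j in range(n)] for i in range(n)]

def predicted_det_squared(n):
    """(-1)^(r2) * n^n for G = Z_n."""
    r1 = 1 if n % 2 else 2           # elements of Z_n equal to their inverse
    r2 = (n - r1) // 2
    return (-1) ** r2 * n ** n

if __name__ == "__main__":
    for n in range(2, 9):
        print(n, det(character_matrix(n)) ** 2, predicted_det_squared(n))
```

Note that the predicted value is negative whenever $r_2$ is odd, for instance $-27$ for $n = 3$.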
The character matrix $M$ has values in the cyclotomic field $\mathbb{Q}(\zeta_n)$, which is a Galois extension of $\mathbb{Q}$, with Galois group isomorphic to (what a coincidence!) $(\mathbb{Z}/n\mathbb{Z})^\times$, an explicit isomorphism being given by making $a \in (\mathbb{Z}/n\mathbb{Z})^\times$ correspond to the unique automorphism $\sigma_a$ of $\mathbb{Q}(\zeta_n)$ satisfying $\sigma_a(\zeta_n) = \zeta_n^a$. (All of this is elementary Galois theory except for the more number-theoretic fact that the cyclotomic polynomial $\Phi_n$ is irreducible over $\mathbb{Q}$.) In particular the group $(\mathbb{Z}/n\mathbb{Z})^\times$ also acts by permutations on the character group $X(G)$, and indeed in exactly the same way it acts on $G$:
$$\forall g \in G, \quad (\sigma_a \chi)(g) = \chi(g^a) = (\chi(g))^a = \chi^a(g),$$
so $\sigma_a \chi = \chi^a$. This has the following beautiful consequence:
For $a \in (\mathbb{Z}/n\mathbb{Z})^\times$, applying the Galois automorphism $\sigma_a$ to the character matrix $M$ induces a permutation of the rows which is the same as the permutation $g \mapsto g^a$ of $G$. In particular the signs are the same, so
(28) $\det(\sigma_a M) = \left(\frac{a}{G}\right)\det(M)$.
Combining (26) and (28), we get that for all $a \in (\mathbb{Z}/n\mathbb{Z})^\times$,
$$\sigma_a(\sqrt{G^*}) = \left(\frac{a}{G}\right)\sqrt{G^*}.$$
G
Now, by the multiplicativity on both sides it is enough to prove Theorem 66 when $a = p$ is a prime not dividing $n$ and when $a = -1$.
Proposition
c) ( Gp ) = 1.
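The proof of this, a standard result in algebraic number theory, is omitted for now. We deduce that
$$\left(\frac{G^*}{p}\right) = \left(\frac{p}{G}\right).$$
$$\left(\frac{-1}{G}\right)\sqrt{G^*} = \sigma_{-1}(\sqrt{G^*}) = \begin{cases} \sqrt{G^*} & G^* > 0 \\ -\sqrt{G^*} & G^* < 0 \end{cases} = \left(\frac{G^*}{-1}\right)\sqrt{G^*},$$
so
$$\left(\frac{-1}{G}\right) = \left(\frac{G^*}{-1}\right).$$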
5. In Fact...
...the real Duke-Hopkins reciprocity law is an assertion about a group $G$ of order $n$ which is not necessarily commutative. In this case, the map $g \mapsto g^a$ need not be an automorphism of $G$, so a more sophisticated approach is needed. Rather, one considers the action of $(\mathbb{Z}/n\mathbb{Z})^\times$ on the conjugacy classes $\{C_1, \ldots, C_m\}$ of $G$: if $g = xhx^{-1}$ then $g^a = xh^a x^{-1}$, so this makes sense. We further define $r_1$ to be the number of "real" conjugacy classes $C = C^{-1}$ and assume that in our labelling $C_1, \ldots, C_{r_1}$ are all real and define $r_2$ by the equation $m = r_1 + 2r_2$. Then in place of our $G^*$ (notation which is not used in [DH05]), one has the discriminant
r1
CHAPTER 6
Then for all primes $p$, $\mathrm{ord}_p(a^n) = n\,\mathrm{ord}_p(a) = \mathrm{ord}_p(x)$ and $\mathrm{ord}_p(b^n) = n\,\mathrm{ord}_p(b) = \mathrm{ord}_p(y)$. We conclude $x = \pm a^n$, $y = \pm b^n$, establishing part a). Part b) follows upon noticing that if $n$ is odd, $(-1)^n = -1$, so we may write $x = (\pm a)^n$, $y = (\pm b)^n$.
1.1. An application.
Theorem 70. The only integral solutions to
(29) $y^2 - y = x^3$
are $(0, 0)$ and $(0, 1)$.
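A finite search cannot prove Theorem 70, but it is a cheap consistency check. The sketch below (function name and search bound are ours) solves $y^2 - y = x^3$ for $y$ by the quadratic formula, $y = \frac{1 \pm \sqrt{1 + 4x^3}}{2}$, and keeps the integer solutions with $|x|$ up to a given bound.

```python
from math import isqrt

def solutions(bound):
    """All integer pairs (x, y) with |x| <= bound and y^2 - y = x^3."""
    found = []
    for x in range(-bound, bound + 1):
        disc = 1 + 4 * x ** 3        # discriminant of y^2 - y - x^3 = 0
        if disc < 0:
            continue
        s = isqrt(disc)
        if s * s != disc:
            continue                 # discriminant is not a perfect square
        for root in {s, -s}:
            if (1 + root) % 2 == 0:
                found.append((x, (1 + root) // 2))
    return sorted(found)

if __name__ == "__main__":
    print(solutions(1000))
```

Within any bound, the search should return only the two solutions named in the theorem.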
(30) $y^2 + 2 = x^3$
factorization in sight takes place over the quadratic ring $\mathbb{Z}[\sqrt{-2}]$, namely:
$$x^3 = (y + \sqrt{-2})(y - \sqrt{-2}).$$
Looking back at the previous argument, it seems that what we would like to say is
$$y + \sqrt{-2} = \alpha^3, \quad y - \sqrt{-2} = \beta^3.$$
The justification for this will be a version of the coprime powers trick in the ring $\mathbb{Z}[\sqrt{-2}]$, but let us assume it just for a moment and see what comes of it.
infinite sequence $\{x_i\}_{i=1}^{\infty}$ of elements of $R$ such that $x_{i+1}$ properly divides$^1$ $x_i$ for all $i$. This is a very mild condition: it is satisfied by any Noetherian ring and by any UFD: c.f. [Factorization in Integral Domains].
Now let $\pi$ be a nonzero prime element of $R$, and let $x \in R \setminus \{0\}$. (ACCP) ensures that there exists a largest non-negative integer $n$ such that $\pi^n \mid x$, for otherwise $\pi^n \mid x$ for all $n$ and $\{\frac{x}{\pi^n}\}$ is an infinite sequence in which each element properly divides the previous one. We put $\mathrm{ord}_\pi(x)$ to be this largest integer $n$. In other words, $\mathrm{ord}_\pi(x) = n$ iff $\pi^n \mid x$ and $\pi^{n+1} \nmid x$. We formally set $\mathrm{ord}_\pi(0) = +\infty$, and we extend $\mathrm{ord}_\pi$ to a function on the fraction field $K$ of $R$ by multiplicativity:
$$\mathrm{ord}_\pi\left(\frac{x}{y}\right) := \mathrm{ord}_\pi(x) - \mathrm{ord}_\pi(y).$$
This generalizes the functions $\mathrm{ord}_p$ on $\mathbb{Z}$ and $\mathbb{Q}$, and the same properties hold.
Proposition 73. Let $R$ be an (ACCP) domain with fraction field $K$. Let $\pi$ be a nonzero prime element of $R$ and $x, y \in K \setminus \{0\}$. Then:
a) $\mathrm{ord}_\pi(xy) = \mathrm{ord}_\pi(x) + \mathrm{ord}_\pi(y)$.
b) $\mathrm{ord}_\pi(x + y) \geq \min(\mathrm{ord}_\pi(x), \mathrm{ord}_\pi(y))$.
c) Equality holds in part b) if $\mathrm{ord}_\pi(x) \neq \mathrm{ord}_\pi(y)$.
Proof. We will suppose for simplicity that $x, y \in R \setminus \{0\}$. The general case follows by clearing denominators as usual. Put $a = \mathrm{ord}_\pi(x)$, $b = \mathrm{ord}_\pi(y)$. By hypothesis, there exist $x', y'$ such that $x = \pi^a x'$, $y = \pi^b y'$ and $\pi \nmid x', y'$.
a) $xy = \pi^{a+b}(x'y')$. Thus $\mathrm{ord}_\pi(xy) \geq a + b$. Conversely, suppose that $\pi^{a+b+1} \mid xy$. Then $\pi \mid x'y'$, and, since $\pi$ is a prime element, this implies $\pi \mid x'$ or $\pi \mid y'$, contradiction. Thus $\mathrm{ord}_\pi(xy) = a + b = \mathrm{ord}_\pi(x) + \mathrm{ord}_\pi(y)$.
b) Let $c = \min(a, b)$, so $x + y = \pi^c(\pi^{a-c}x' + \pi^{b-c}y')$, and thus $\pi^c \mid x + y$ and $\mathrm{ord}_\pi(x + y) \geq c = \min(\mathrm{ord}_\pi(x), \mathrm{ord}_\pi(y))$.
c) Suppose without loss of generality that $a < b$, and write $x + y = \pi^a(x' + \pi^{b-a}y')$. If $\pi^{a+1} \mid x + y = \pi^a x' + \pi^b y'$, then $\pi \mid x' + \pi^{b-a}y'$. Since $b - a > 0$, we have $\pi \mid (x' + \pi^{b-a}y') - (\pi^{b-a}y') = x'$, contradiction.
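In the classical case $R = \mathbb{Z}$, $K = \mathbb{Q}$ the function $\mathrm{ord}_p$ and the three properties of Proposition 73 are easy to experiment with. The following is a small sketch using exact rational arithmetic; the function names and the random sampling scheme are ours.

```python
from fractions import Fraction
import random

def ord_p(p, x):
    """ord_p(x) for a prime p and a nonzero rational number x."""
    x = Fraction(x)
    if x == 0:
        raise ValueError("ord_p(0) is +infinity by convention")
    n = 0
    num, den = x.numerator, x.denominator
    while num % p == 0:              # powers of p in the numerator
        num //= p
        n += 1
    while den % p == 0:              # powers of p in the denominator
        den //= p
        n -= 1
    return n

def check_proposition(p, trials=1000, seed=0):
    """Test parts a), b), c) of the proposition on random rationals."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = Fraction(rng.randint(1, 500), rng.randint(1, 500))
        y = Fraction(rng.randint(-500, -1), rng.randint(1, 500))
        a, b = ord_p(p, x), ord_p(p, y)
        if ord_p(p, x * y) != a + b:
            return False
        if x + y != 0:
            s = ord_p(p, x + y)
            if s < min(a, b) or (a != b and s != min(a, b)):
                return False
    return True

if __name__ == "__main__":
    print(ord_p(3, 18), ord_p(2, Fraction(3, 8)), check_proposition(5))
```

The case $a = b$ is where strict inequality in part b) can occur, for example $\mathrm{ord}_2(1 + 1) = 1 > 0$.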
Suppose that $\pi$ and $\pi'$ are associate nonzero prime elements, i.e., there exists a unit $u \in R$ such that $\pi' = u\pi$. Then a moment's thought shows that the ord functions $\mathrm{ord}_\pi$ and $\mathrm{ord}_{\pi'}$ coincide. This means that $\mathrm{ord}_\pi$ depends only on the principal ideal $\mathfrak{p} = (\pi)$ that the prime element generates. We could therefore redefine the ord function as $\mathrm{ord}_{\mathfrak{p}}$ for a nonzero principal prime ideal $\mathfrak{p} = (\pi)$ of $R$, but for our purposes it is convenient to just choose one generator $\pi$ of each such ideal $\mathfrak{p}$. Let $P$ be a maximal set of mutually nonassociate nonzero prime elements, i.e., such that each nonzero principal prime ideal $\mathfrak{p}$ contains exactly one element of $P$.
Now suppose that $R$ is a UFD, and $x \in R \setminus \{0\}$ is an element such that $\mathrm{ord}_\pi(x) = 0$ for all $\pi \in P$. Then $x$ is not divisible by any irreducible elements, so is necessarily a unit. In fact the same holds for elements $x \in K \setminus \{0\}$, since we can express $x = \frac{a}{b}$ with $a$ and $b$ not both divisible by any prime element. (In other words, in a UFD we can reduce fractions to lowest terms!) It follows that any $x \in K \setminus \{0\}$ is determined up to a unit by the integers $\mathrm{ord}_\pi(x)$ as $\pi$ ranges over elements of $P$. Indeed, put
$$y = \prod_{\pi \in P} \pi^{\mathrm{ord}_\pi x}.$$
$^1$ We say that $a$ properly divides $b$ if $a \mid b$ but $a$ is not associate to $b$.
Then we have
ord ( xy )
x
y
y =
ordp (y)
n
pP
where the product extends over a maximal set of pairwise nonassociate nonzero prime elements of $R$. By construction, we have $\mathrm{ord}_p((x')^n) = n\,\mathrm{ord}_p(x') = n \cdot \frac{\mathrm{ord}_p(x)}{n} = \mathrm{ord}_p(x)$ for all $p \in P$, so the elements $x$ and $(x')^n$ are associate: i.e., there exists a unit $u$ in $R$ such that $x = u(x')^n$. Exactly the same applies to $y$ and $y'$: there exists a unit $v \in R$ such that $y = v(y')^n$.
3.2. Application to the Bachet-Fermat Equation.
To complete the proof of Theorem 71 we need to verify that the hypotheses of Proposition 72b) apply: namely, that every unit in $\mathbb{Z}[\sqrt{-2}]$ is a cube and that the
$$N(d)N(d') = N(2\sqrt{-2}) = 8,$$
$$N(d)N(\alpha) = N(d\alpha) = N(y + \sqrt{-2}) = y^2 + 2 = x^3,$$
so $N(d) \mid x^3$. We claim that $x$ must be odd. For if not, then reducing the equation $x^3 = y^2 + 2$ mod 8 gives $y^2 \equiv 6 \pmod 8$, but the only squares mod 8 are $0, 1, 4$. Thus $x^3$ is odd and $N(d) \mid \gcd(x^3, 8) = 1$ so $d = \pm 1$ is a unit in $R$.
3.3. Application to the Mordell Equation with $k = 1$.
Theorem 74. The only integer solution to $y^2 + 1 = x^3$ is $(1, 0)$.
Proof. This time we factor the left hand side over the UFD $R = \mathbb{Z}[\sqrt{-1}]$:
$$(y + \sqrt{-1})(y - \sqrt{-1}) = x^3.$$
If a nonunit $d$ in $R$ divides both $y + \sqrt{-1}$ and $y - \sqrt{-1}$, then it divides $(y + \sqrt{-1}) - (y - \sqrt{-1}) = 2\sqrt{-1} = (1 + \sqrt{-1})^2$. The element $1 + \sqrt{-1}$, having norm $N(1 + \sqrt{-1}) = 2$ a prime number, must be an irreducible (hence prime) element of $R$. So $1 + i$ is the only possible common prime divisor. We compute
$$\frac{y \pm \sqrt{-1}}{1 + \sqrt{-1}} \cdot \frac{1 - \sqrt{-1}}{1 - \sqrt{-1}} = \frac{(y \pm 1) + (-y \pm 1)\sqrt{-1}}{2},$$
which is an element of $R$ iff $y$ is odd. But consider the equation $y^2 + 1 = x^3$ modulo 4: if $y$ is odd, then $y^2 + 1 \equiv 2 \pmod 4$, but 2 is not a cube modulo 4. Therefore we must have that $y$ is even, so that $y \pm \sqrt{-1}$ are indeed coprime. Moreover, although
$$y + \sqrt{-1} = \alpha^3, \quad y - \sqrt{-1} = \beta^3.$$
$$y + \sqrt{-1} = a^3 - 3b^2 a + (3a^2 b - b^3)\sqrt{-1},$$
or
$$y = a(a^2 - 3b^2), \quad 1 = b(3a^2 - b^2).$$
So we have either $1 = b = 3a^2 - b^2$, which leads to $3a^2 = 2$, which has no integral solution, or $-1 = b = 3a^2 - b^2$, which leads to $a = 0$, so $x = 1$, $y = 0$.
the present situation we are using the assumption that $\mathbb{Z}[\sqrt{-k}]$ is a UFD in order to show that $y^2 + k = x^3$ has very few solutions, earlier we used the assumption that $\mathbb{Z}[\sqrt{-D}]$ is a UFD to show that the family of equations $x^2 + Dy^2 = p$ had many solutions, namely for all primes $p$ for which $-D$ is a square mod $p$.
A more significant difference is that the assumption that $\mathbb{Z}[\sqrt{-D}]$ is a UFD was necessary as well as sufficient for our argument to go through: we saw that whenever $D \geq 3$, $2$ is not of the form $x^2 + Dy^2$. On the other hand, suppose $\mathbb{Z}[\sqrt{-k}]$ is not a UFD: must the coprime powers trick fail? It is not obvious, so let us study it more carefully.
We would like to axiomatize the coprime powers trick. There is an agreed upon definition of coprimality of two elements $x$ and $y$ in a general domain $R$: if $d \mid x$ and $d \mid y$ then $d$ is a unit. However it turns out to be convenient to require a stronger property than this, namely that the ideal $\langle x, y \rangle = \{rx + sy \mid r, s \in R\}$ generated by $x$ and $y$ be the unit ideal $R$. More generally, for two ideals $I, J$ of a ring, the sum $I + J = \{i + j \mid i \in I, j \in J\}$ is an ideal, and we say that $I$ and $J$ are comaximal if $I + J = R$; equivalently, the only ideal which contains both $I$ and $J$ is the "improper" ideal $R$. Since every proper ideal in a ring is contained in a maximal, hence prime, ideal, the comaximality can be further reexpressed as the property that there is no prime ideal $\mathfrak{p}$ containing both $I$ and $J$. (This will be the formulation which is most convenient for our application.)
Notice that the condition that $x$ and $y$ be coprime can be rephrased as saying
4. BEYOND UFDS
that the only principal ideal $(d)$ containing both $x$ and $y$ is the improper ideal $R = (1)$. So the notions of coprime and comaximal elements coincide in a principal domain, but not in general.
Now, for a positive integer $n$, say that an integral domain $R$ has property CM(n) if the comaximal powers trick is valid in degree $n$: namely, for all $x, y, z \in R$ with $\langle x, y \rangle = R$ and $xy = z^n$, there exist elements $a, b \in R$ and units $u, v \in R^\times$ such that $x = ua^n$, $y = vb^n$. Exactly as above, if we also have $(R^\times)^n = R^\times$, i.e., every unit in $R$ is an $n$th power, then the units $u$ and $v$ can be omitted. Now consider the following
Theorem 75. Let $k \in \mathbb{Z}^+$ be squarefree with $k \equiv 1, 2 \pmod 4$. Suppose that the ring $\mathbb{Z}[\sqrt{-k}]$ has property CM(3). Then:
a) If there exists an integer $a$ such that $k = 3a^2 \pm 1$, then the only integer solutions to the Mordell equation $y^2 + k = x^3$ are $(a^2 + k, \pm a(a^2 - 3k))$.
b) If there is no integer $a$ as in part a), the Mordell equation $y^2 + k = x^3$ has no integral solutions.
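$$(y + \sqrt{-k})(y - \sqrt{-k}) = x^3.$$
We wish to show that $\langle y + \sqrt{-k}, y - \sqrt{-k} \rangle = R$. If not, there exists a prime ideal $\mathfrak{p}$ of $R$ with $y \pm \sqrt{-k} \in \mathfrak{p}$. Then $(y + \sqrt{-k}) - (y - \sqrt{-k}) = 2\sqrt{-k} \in \mathfrak{p}$, hence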
Since property CM(3) holds in the PIDs $\mathbb{Z}[\sqrt{-1}]$ and $\mathbb{Z}[\sqrt{-2}]$, whatever else Theorem 75 may be good for, it immediately implies Theorems 71 and 74. Moreover its proof was shorter than the proofs of either of these theorems! The economy was gained by consideration of not necessarily principal ideals.
Thus, if for a given $k$ as in the statement of Theorem 75 we can find more solutions to the Mordell Equation than the ones enumerated in the conclusion of the theorem we know that $\mathbb{Z}[\sqrt{-k}]$ does not satisfy property CM(3). In the following examples we simply made a brute force search over all $x$ and $y$ with $|x| \leq 10^6$. (There is, of course, no guarantee that we will find all solutions this way!)
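A brute force search of this kind takes only a few lines. The sketch below is our own illustration, not the program used for the examples in the text: for $k > 0$ it runs over $1 \leq x \leq$ a chosen bound (any solution has $x^3 = y^2 + k > 0$) and tests whether $x^3 - k$ is a perfect square, reporting solutions with $y \geq 0$.

```python
from math import isqrt

def mordell_solutions(k, x_bound):
    """Pairs (x, y) with y >= 0, 1 <= x <= x_bound and y^2 + k = x^3."""
    found = []
    for x in range(1, x_bound + 1):
        rhs = x ** 3 - k
        if rhs < 0:
            continue
        y = isqrt(rhs)
        if y * y == rhs:             # x^3 - k is a perfect square
            found.append((x, y))
    return found

def predicted_by_theorem(k):
    """Solutions (x, |y|) predicted by Theorem 75, if Z[sqrt(-k)] has CM(3)."""
    out = []
    for a in range(0, isqrt(k) + 2):
        if k in (3 * a * a + 1, 3 * a * a - 1):
            out.append((a * a + k, abs(a * (a * a - 3 * k))))
    return out

if __name__ == "__main__":
    for k in (1, 2, 26, 47):
        print(k, mordell_solutions(k, 1000), predicted_by_theorem(k))
```

Comparing the two outputs for a given $k$ is exactly the test described above: a solution found by the search but not predicted by the theorem shows that property CM(3) fails.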
Example: The equation $y^2 + 26 = x^3$ has solutions $(x, y) = (3, \pm 1), (35, \pm 207)$, so
Example: The equation $y^2 + 53 = x^3$ has solutions $(x, y) = (9, \pm 26), (29, \pm 156)$,
observe that 5 is not of the form $a^2 + 109$, so by Theorem 75, $\mathbb{Z}[\sqrt{-109}]$ does not have property CM(3).
$h(\mathbb{Q}(\sqrt{-k})) =$
1 for k = 1, 2, 3, 7, 11, 19, 43, 67, 163
2 for k = 5, 6, 10, 13, 15, 22, 35, 37, 51, 58, 91, 115, 123, 187
3 for k = 23, 31, 59, 83, 107, 139
4 for k = 14, 17, 21, 30, 33, 34, 39, 42, 46, 55, 57, 70, 73, 78, 82, 85, 93, 97, 102, 130, 133, 142, 155, 177, 190, 193, 195
5 for k = 47, 79, 103, 127, 131, 179
6 for k = 26, 29, 38, 53, 61, 87, 106, 109, 118, 157
7 for k = 71, 151
8 for k = 41, 62, 65, 66, 69, 77, 94, 98, 105, 113, 114, 137, 138, 141, 145, 154, 158, 165, 178
9 for k = 199
10 for k = 74, 86, 122, 166, 181, 197
11 for k = 167
12 for k = 89, 110, 129, 170, 174, 182, 186
13 for k = 191
14 for k = 101, 134, 149, 173
16 for k = 146, 161, 185
20 for k = 194
So Theorem 75 applies to give a complete solution to the Mordell equation $y^2 + k = x^3$ for the following values of $k$:
1, 2, 5, 6, 10, 13, 14, 17, 21, 22, 30, 33, 34, 37, 41, 42, 46, 57, 58, 62, 65, 69, 70, 73, 74, 77, 78, 82, 85, 86, 93, 94, 97, 98, 101, 102, 106, 113, 114, 122, 130, 133, 134, 137, 138, 141, 142, 145, 146, 149, 154, 158, 161, 165, 166, 177, 178, 181, 185, 190, 193, 194, 197.
Example: The equation $y^2 + 47 = x^3$ has solutions $(x, y) = (6, \pm 13), (12, \pm 41), (63, \pm 500)$. On the other hand $\mathbb{Z}[\sqrt{-47}]$ has class number 5 so does have property CM(3). Note that $47 \equiv 3 \pmod 4$.
Example: $\mathbb{Z}[\sqrt{-29}]$ has class number 6, but nevertheless $y^2 + 29 = x^3$ has no integral solutions.$^3$ Thus there is (much) more to this story than the coprime powers trick. For more details, we can do no better than recommend [M, Ch. 26].
5. Remarks and Acknowledgements
Our first inspiration for this material was the expository note [Conr-A]. Conrad proves Theorems 70 and 71 as an application of unique factorization in $\mathbb{Z}$ and $\mathbb{Z}[\sqrt{-2}]$. Many more examples of successful (and one unsuccessful!) solution of Mordell's equation for various values of $k$ are given in [Conr-B]. A range of techniques is showcased, including the coprime powers trick but also: elementary (but somewhat intricate) congruence arguments and quadratic reciprocity.
Also useful for us were lecture notes of P. Stevenhagen [St-ANT]. Stevenhagen's treatment is analogous to our discussion of quadratic rings. In particular, he first
3How do we know? For instance, we can look it up on the internet:
http://www.research.att.com/njas/sequences/A054504
proves Theorem 74. He then assumes that Z[ 19] satises CM(3) and deduces
that y 2 +19 = x3 has no integral solutions; nally he points out (x, y) = (18, 7). We
did not discuss this example in the text because it depends critically on the fact
that
For rings like $\mathbb{Z}[\sqrt{-19}]$ the definition we gave of the class number is not the correct one: we should count only equivalence classes of invertible ideals, i.e., nonzero ideals I for which there exists J such that IJ is principal. In this amended sense the class number of $\mathbb{Z}[\sqrt{-19}]$ is 3.
A generalization of Theorem 75 appears in §5.3 of lecture notes of Franz Lemmermeyer:
http://www.fen.bilkent.edu.tr/franz/ant/ant1-7.pdf
Lemmermeyer finds all integer solutions to the equation $y^2 + k = x^3$ whenever
CHAPTER 7
$$x^2 - dy^2 = 1. \qquad (32)$$
1.1. History.
Leonhard Euler called (32) Pell's Equation after the English mathematician John Pell (1611-1685). This terminology has persisted to the present day, despite the fact that it is well known to be mistaken: Pell's only contribution to the subject was the publication of some partial results of Wallis and Brouncker. In fact the correct names are the usual ones: the problem of solving the equation was first considered by Fermat, and a complete solution was given by Lagrange.
By any name, the equation is an important one for several reasons (only some of which will be touched upon here), and its solution furnishes an ideal introduction to a whole branch of number theory, Diophantine Approximation.
1.2. First remarks on Pell's equation.
We call a solution (x, y) to (32) trivial if xy = 0. We always have at least two trivial solutions: $(x, y) = (\pm 1, 0)$. As for any plane conic curve, as soon as there is one solution there are infinitely many rational solutions $(x, y) \in \mathbb{Q}^2$, and all arise as follows: draw all lines through a single point, say (1, 0), with rational slope r, and calculate the second intersection point $(x_r, y_r)$ of this line with the quadratic equation (32).
The above procedure generates all rational solutions and thus contains all integer solutions, but figuring out which of the rational solutions are integral is not straightforward. This is a case where the question of integral solutions is essentially different, and more interesting, than the question of rational solutions. Henceforth when we speak of solutions (x, y) to (32) we shall mean integral solutions.
Let us quickly dispose of some uninteresting cases.
Proposition 77. If the Pell equation $x^2 - dy^2 = 1$ has nontrivial solutions, then d is a positive integer which is not a perfect square.
Proof. ($d = -1$): The equation $x^2 + y^2 = 1$ has four trivial solutions: $(\pm 1, 0), (0, \pm 1)$.
($d < -1$): Then $y \neq 0 \implies x^2 - dy^2 \geq 2$, so (32) has only the solutions $(\pm 1, 0)$.
$$y^2 = \frac{x^2 - 1}{2}.$$
In other words, we are looking for positive integers x for which $\frac{x^2 - 1}{2}$ is an integer square. First of all $x^2 - 1$ must be even, so x must be odd. Trying x = 1 gives, of course, the trivial solution (1, 0). Trying x = 3 we get
$$\frac{3^2 - 1}{2} = 4 = 2^2,$$
so (3, 2) is a nontrivial solution. Trying successively x = 5, 7, 9 and so forth we find that it is rare for $\frac{x^2 - 1}{2}$ to be a square: the first few values are 12, 24, 40, 60, 84, 112 and then finally with x = 17 we are in luck:
$$\frac{17^2 - 1}{2} = 144 = 12^2,$$
so (17, 12) is another positive solution. Searching for further solutions is a task more suitable for a computer. My laptop has no trouble finding some more solutions: the next few are (99, 70), (577, 408), and (3363, 2378). Further study suggests that (i) the equation $x^2 - 2y^2 = 1$ has infinitely many integral solutions, and (ii) the size of the solutions is growing rapidly, perhaps even exponentially.
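The search described above is easy to mechanize. The following is a minimal sketch, not the author's program: the function name `pell_d2_solutions` and the search bound are illustrative choices. It tests each odd x and asks whether $(x^2 - 1)/2$ is a perfect square.

```python
from math import isqrt

def pell_d2_solutions(x_max):
    """Return all positive (x, y) with x^2 - 2*y^2 = 1 and 3 <= x <= x_max."""
    found = []
    for x in range(3, x_max + 1, 2):   # x must be odd for (x^2 - 1)/2 to be an integer
        half = (x * x - 1) // 2        # candidate value of y^2
        y = isqrt(half)
        if y * y == half:              # exact integer test for a perfect square
            found.append((x, y))
    return found

print(pell_d2_solutions(4000))
```

Using `isqrt` keeps the square test exact, which matters once x is large enough that floating-point square roots lose precision.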
2.1. Return of abstract algebra. If we've been paying attention, there are some clues here that we should be considering things from an algebraic perspective. Namely, (i) we see that factorization of the left-hand side of the equation $x^2 - dy^2 = 1$ leads only to trivial solutions; and (ii) when d < 0, we reduce to a problem that we have already solved: namely, finding all the units in the quadratic ring
$$\mathbb{Z}[\sqrt{d}] = \{a + b\sqrt{d} \mid a, b \in \mathbb{Z}\}.$$
We brought up the problem of determining the units in real quadratic rings $\mathbb{Z}[\sqrt{d}]$, but we did not solve it. We are coming to grips with this same problem here.
$$\alpha' = r - s\sqrt{d},$$
$$N : \mathbb{Q}(\sqrt{d}) \to \mathbb{Q}, \quad N(\alpha) = \alpha\alpha';$$
Lemma 78. The norm map is multiplicative: for any $\alpha, \beta \in \mathbb{Q}(\sqrt{d})$, we have
$$N(\alpha\beta) = N(\alpha)N(\beta).$$
Proof. This is a straightforward, familiar computation. Alternately, we get a more conceptual proof by using the homomorphism property of conjugation (which is itself verified by a simple computation!):
$$N(\alpha\beta) = \alpha\beta(\alpha\beta)' = \alpha\beta\alpha'\beta' = (\alpha\alpha')(\beta\beta') = N(\alpha)N(\beta).$$
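Thus Z-solutions (x, y) to (32) correspond to norm one elements $x + y\sqrt{d} \in \mathbb{Z}[\sqrt{d}]$:
$$N(x + y\sqrt{d}) = x^2 - dy^2 = 1.$$
Moreover, if $(x_1, y_1)$ and $(x_2, y_2)$ are two solutions, then
$$N((x_1 + y_1\sqrt{d})(x_2 + y_2\sqrt{d})) = N(x_1 + y_1\sqrt{d})\,N(x_2 + y_2\sqrt{d}) = (x_1^2 - dy_1^2)(x_2^2 - dy_2^2) = 1 \cdot 1 = 1;$$
multiplying out $(x_1 + y_1\sqrt{d})(x_2 + y_2\sqrt{d})$ and collecting rational and irrational parts, we get a new solution $(x_1x_2 + dy_1y_2, x_1y_2 + x_2y_1)$.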
Let us try out this formula in the case d = 2 for $(x_1, y_1) = (x_2, y_2) = (3, 2)$. Our new solution is $(3 \cdot 3 + 2 \cdot 2 \cdot 2, 2 \cdot 3 + 3 \cdot 2) = (17, 12)$, nothing else than the second smallest positive solution! If we now apply the formula with $(x_1, y_1) = (17, 12)$ and $(x_2, y_2) = (3, 2)$, we get the next smallest solution (99, 70).
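The product formula can be tried out directly. Here is a small sketch, assuming solutions are stored as integer pairs; the names `compose` and `is_solution` are hypothetical helpers introduced only for this illustration.

```python
def compose(s1, s2, d):
    """Multiply x1 + y1*sqrt(d) by x2 + y2*sqrt(d); return the new (x, y)."""
    x1, y1 = s1
    x2, y2 = s2
    return (x1 * x2 + d * y1 * y2, x1 * y2 + x2 * y1)

def is_solution(s, d):
    """Check whether (x, y) satisfies x^2 - d*y^2 = 1."""
    x, y = s
    return x * x - d * y * y == 1

second = compose((3, 2), (3, 2), 2)
third = compose(second, (3, 2), 2)
print(second, third, is_solution(third, 2))
```

Because the norm is multiplicative, composing any two solutions for the same d always yields another solution; no search is needed.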
Indeed, for any positive integer n, we may write the nth power $(3 + 2\sqrt{2})^n$ as $x_n + y_n\sqrt{d}$ and know that $(x_n, y_n)$ is a solution to the Pell equation. One can see from the formula for the product that it is a positive solution. Moreover, the solutions are all different because the real numbers $(3 + 2\sqrt{2})^n$ are all distinct: the only complex numbers z for which $z^n = z^m$ for some m < n are the roots of unity, and the only real roots of unity are $\pm 1$. Indeed, we get the trivial solution (1, 0) by taking the 0th power of $3 + 2\sqrt{2}$. Moreover, $(3 + 2\sqrt{2})^{-1} = 3 - 2\sqrt{2}$ is a "half-positive" solution, and taking negative integral powers of $3 + 2\sqrt{2}$ we get infinitely many more such solutions.
In total, every solution to $x^2 - dy^2 = 1$ that we have found is of the form $(x_n, y_n)$ where $x_n + y_n\sqrt{d} = (3 + 2\sqrt{2})^n$ for some $n \in \mathbb{Z}$.
Let us try to prove that these are all the integral solutions. It is enough to show that every positive solution is of the form $(x_n, y_n)$ for some positive integer n, since every norm one element $x + y\sqrt{d}$ is obtained from an element with $x, y \in \mathbb{Z}^+$ by multiplying by $-1$ and/or taking the reciprocal.
Lemma 79. Let (x, y) be a nontrivial integral solution to $x^2 - dy^2 = 1$.
a) x and y are both positive $\iff x + y\sqrt{d} > 1$.
b) x > 0 and y < 0 $\iff 0 < x + y\sqrt{d} < 1$.
c) x < 0 and y > 0 $\iff -1 < x + y\sqrt{d} < 0$.
d) x and y are both negative $\iff x + y\sqrt{d} < -1$.
Proof. Exercise.
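$$= (x + y\sqrt{d}) \cdot (3 + 2\sqrt{2})^{-n} = (x + y\sqrt{d}) \cdot (3 - 2\sqrt{2})^{n}.$$
$$x_n + y_n\sqrt{d} = (3 + 2\sqrt{2})^n$$
for $n \in \mathbb{Z}^+$. If we apply conjugation to this equation, then using the fact that it is a field homomorphism, we get
$$x_n - y_n\sqrt{d} = (3 - 2\sqrt{2})^n.$$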
$$x_n = \frac{(3 + 2\sqrt{2})^n + (3 - 2\sqrt{2})^n}{2}, \qquad y_n = \frac{(3 + 2\sqrt{2})^n - (3 - 2\sqrt{2})^n}{2\sqrt{2}}$$
for some positive integer n.
Among other things, this explains why it was not so easy to find solutions by hand: the size of both the x and y coordinates grows exponentially! The reader is invited to plug in a value of n for herself: e.g. for n = 17 it is remarkable how close the irrational numbers $u^{17}/2$ and $u^{17}/(2\sqrt{2})$ are to integers (here $u = 3 + 2\sqrt{2}$):
$$u^{17}/2 = 5168247530882.999999999999949;$$
very close to $\sqrt{2}$. Indeed, by turning this observation on its head we shall solve the Pell equation for general nonsquare d.
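The near-integrality is easy to observe numerically. The sketch below assumes $u = 3 + 2\sqrt{2}$ and uses Python's `decimal` module at 60 digits of precision (an arbitrary choice); `power_solution` is a hypothetical helper that computes $(x_n, y_n)$ exactly by repeated multiplication by $3 + 2\sqrt{2}$.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision; 60 digits is an arbitrary comfortable choice

def power_solution(n):
    """Return (x_n, y_n) with x_n + y_n*sqrt(2) = (3 + 2*sqrt(2))^n, using exact integers."""
    x, y = 1, 0
    for _ in range(n):
        # (x + y*sqrt2)(3 + 2*sqrt2) = (3x + 4y) + (2x + 3y)*sqrt2
        x, y = 3 * x + 4 * y, 2 * x + 3 * y
    return x, y

x17, y17 = power_solution(17)
sqrt2 = Decimal(2).sqrt()
u = Decimal(3) + 2 * sqrt2

print(x17, y17)
print(u ** 17 / 2)                        # just below the integer x17
print(Decimal(x17) / Decimal(y17) - sqrt2)  # tiny positive number
```

The last line illustrates the remark that the ratio $x_n / y_n$ is an extremely good rational approximation to $\sqrt{2}$.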
3. A result of Dirichlet
Lemma 81. (Dirichlet) For any irrational (real) number $\alpha$, there are infinitely many rational numbers $\frac{x}{y}$ (with gcd(x, y) = 1) such that
$$\left|\frac{x}{y} - \alpha\right| < \frac{1}{y^2}.$$
Proof. Since the lowest-term denominator of any rational number $\frac{x}{y}$ is unchanged by subtracting any integer n, by subtracting the integer part $[\alpha]$ of $\alpha$ we may assume $\alpha \in [0, 1)$. Now divide the half-open interval [0, 1) into n equal pieces: $[0, \frac{1}{n}) \cup [\frac{1}{n}, \frac{2}{n}) \cup \ldots \cup [\frac{n-1}{n}, 1)$. Consider the fractional parts of $0, \alpha, 2\alpha, \ldots, n\alpha$. Since we have n + 1 numbers in [0, 1) and only n subintervals, by the pigeonhole principle some two of them must lie in the same subinterval. That is, there exist $0 \leq j < k \leq n$ such that
$$|j\alpha - [j\alpha] - (k\alpha - [k\alpha])| < \frac{1}{n}.$$
Now take $y = k - j$, $x = [k\alpha] - [j\alpha]$, so that the previous inequality becomes
$$|x - y\alpha| < \frac{1}{n}.$$
We may assume that gcd(x, y) = 1, since were there a common factor, we could divide through by it and that would only improve the inequality. Moreover, since $0 < y \leq n$, we have
$$\left|\frac{x}{y} - \alpha\right| < \frac{1}{ny} \leq \frac{1}{y^2}.$$
This exhibits one solution. To see that there are infinitely many, observe that since $\alpha$ is irrational, $|\frac{x}{y} - \alpha|$ is always strictly greater than 0. But by choosing n sufficiently large we can apply the argument to find a rational number $\frac{x'}{y'}$ such that
$$\left|\frac{x'}{y'} - \alpha\right| < \left|\frac{x}{y} - \alpha\right|,$$
so that $\frac{x'}{y'}$ is a solution different from $\frac{x}{y}$; repeating this produces infinitely many solutions.
Remark: The preceding argument is perhaps the single most famous application of the pigeonhole principle. Indeed, in certain circles, the pigeonhole principle goes by the name "Dirichlet's box principle" because of its use in this argument.
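The pigeonhole argument is constructive, so it can be run as written. The following is a sketch under the assumption that floating-point arithmetic is accurate enough for the small n used here; `dirichlet_approx` is a hypothetical name, and the bucket index plays the role of the subinterval $[\frac{b}{n}, \frac{b+1}{n})$.

```python
from math import floor, gcd, sqrt

def dirichlet_approx(alpha, n):
    """Return coprime (x, y) with 0 < y <= n and |x - y*alpha| < 1/n (pigeonhole search)."""
    seen = {}  # bucket index -> the multiplier j whose fractional part landed there
    for k in range(n + 1):
        frac = k * alpha - floor(k * alpha)   # fractional part of k*alpha
        bucket = min(int(frac * n), n - 1)    # which of the n subintervals contains it
        if bucket in seen:
            j = seen[bucket]
            y = k - j
            x = floor(k * alpha) - floor(j * alpha)
            g = gcd(x, y)
            return x // g, y // g             # removing a common factor only helps
        seen[bucket] = k
    raise AssertionError("unreachable: n + 1 values cannot occupy n buckets injectively")

alpha = sqrt(2)
for n in (10, 100, 1000):
    x, y = dirichlet_approx(alpha, n)
    print(n, (x, y), abs(x / y - alpha), 1 / y ** 2)
```

For a serious computation one would replace the floats with exact or high-precision arithmetic, since the fractional parts lose accuracy as k grows.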
4. Existence of Nontrivial Solutions
We are now ready to prove that for all positive nonsquare integers d, the Pell equation $x^2 - dy^2 = 1$ has a nontrivial solution. Well, almost. First we prove an approximation to this result and then use it to prove the result itself.
Proposition 82. For some real number M, there exist infinitely many pairs of coprime positive integers (x, y) such that $|x^2 - dy^2| < M$.
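an infinite sequence of coprime positive (since $\sqrt{d}$ is positive) integers (x, y) with
$$|x - y\sqrt{d}| < \frac{1}{y}.$$
Since
$$|x^2 - dy^2| = |x - y\sqrt{d}| \cdot |x + y\sqrt{d}|,$$
in order to bound the left-hand side we also need a bound on $|x + y\sqrt{d}|$. There is no reason to expect that it is especially small, but using the triangle inequality we can get the following:
$$|x + \sqrt{d}y| = |x - \sqrt{d}y + 2\sqrt{d}y| \leq |x - \sqrt{d}y| + 2\sqrt{d}y < \frac{1}{y} + 2\sqrt{d}y.$$
Thus
$$|x^2 - dy^2| < \left(\frac{1}{y}\right)\left(\frac{1}{y} + 2\sqrt{d}y\right) = \frac{1}{y^2} + 2\sqrt{d} \leq 1 + 2\sqrt{d} = M.$$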
$$\alpha = X_1 + Y_1\sqrt{d}$$
and
$$\beta = X_2 + Y_2\sqrt{d};$$
we have $N(\alpha) = N(\beta) = m$. A first thought is to divide $\alpha$ by $\beta$ to get an element of norm 1; however, $\alpha/\beta \in \mathbb{Q}(\sqrt{d})$ but does not necessarily have integral x and y coordinates. However, it works after a small trick: consider instead
$$\alpha'\beta = X + Y\sqrt{d}.$$
I claim that both X and Y are divisible by m. Indeed we just calculate, keeping in mind that modulo m we can replace $X_2$ with $X_1$ and $Y_2$ with $Y_1$:
$$X = X_1X_2 - dY_1Y_2 \equiv X_1^2 - dY_1^2 \equiv 0 \pmod{|m|},$$
$$Y = X_1Y_2 - X_2Y_1 \equiv X_1Y_1 - X_1Y_1 \equiv 0 \pmod{|m|}.$$
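$$x_n = \frac{(x_1 + y_1\sqrt{d})^n + (x_1 - y_1\sqrt{d})^n}{2}, \qquad y_n = \frac{(x_1 + y_1\sqrt{d})^n - (x_1 - y_1\sqrt{d})^n}{2\sqrt{d}}$$
for a unique $n \in \mathbb{Z}^+$.
c) Every solution to the Pell equation is of the form $\pm(x_n, y_n)$ for $n \in \mathbb{Z}$.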
Proof. Above we showed the existence of a positive solution (x, y). It is easy to see that for any M > 0 there are only finitely many pairs of positive integers such that $x + y\sqrt{d} \leq M$, so among all positive solutions, there must exist one with $x + y\sqrt{d}$ least. By taking positive integral powers of this fundamental solution $x_1 + y_1\sqrt{d}$ we get infinitely many positive solutions, whose x and y coordinates can be found explicitly as in §2. Moreover, the argument of §2 given there for d = 2 works generally to show that every positive solution is of this form. The reader is invited to look back over the details.
6. A Caveat
It is time to admit that "solving the Pell equation" is generally taken to mean explicitly finding the fundamental solution $x_1 + y_1\sqrt{d}$. As usual in this course, we have concentrated on existence and not considered the question of how difficult it would be in practice to find the solution. Knowing that it exists we can, in principle, find it by trying all pairs (x, y) in order of increasing $x + y\sqrt{d}$ until we find one. When d = 2 this is immediate. If we try other values of d we will see that sometimes it is no trouble at all:
For d = 3, the fundamental solution is (2, 1). For d = 6, it is (5, 2). Similarly the fundamental solution can be found by hand for $d \leq 12$; it is no worse than (19, 6) for d = 10. However, for d = 13 it is (649, 180): a big jump!
If we continue to search we find that the size of the fundamental solution seems to obey no reasonable law: it does not grow in a steady way with d (e.g. for d = 42 it is the tiny (13, 2)) but sometimes it is very large: for d = 46 it is (24335, 3588), and (hold on to your hat!) for d = 61 the fundamental solution is
(1766319049, 226153980).
And things get worse from here on in: one cannot count on a brute-force search for d even of modest size (e.g. five digits).
There are known algorithms which find the fundamental solution relatively efficiently. The most famous and elementary of them is as follows: one can find the fundamental solution as a convergent in the continued fraction expansion of $\sqrt{d}$, and this is relatively fast (it depends upon the period length). Alas, we shall not touch the theory of continued fractions in this course.
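Although the theory is not developed here, the continued fraction method is short enough to sketch. The code below uses the standard recurrence for the continued fraction expansion of $\sqrt{d}$ and stops at the first convergent h/k with $h^2 - dk^2 = 1$; the function name `pell_fundamental` and the variable names are illustrative, and the sketch takes on faith the fact quoted above that the fundamental solution occurs among the convergents.

```python
from math import isqrt

def pell_fundamental(d):
    """Return the fundamental solution (x, y) of x^2 - d*y^2 = 1 for nonsquare d > 0."""
    a0 = isqrt(d)
    if a0 * a0 == d:
        raise ValueError("d must not be a perfect square")
    m, q, a = 0, 1, a0            # state of the expansion: sqrt(d) = [a0; a1, a2, ...]
    h_prev, h = 1, a0             # numerators of successive convergents
    k_prev, k = 0, 1              # denominators of successive convergents
    while h * h - d * k * k != 1:
        m = q * a - m
        q = (d - m * m) // q      # this division is always exact
        a = (a0 + m) // q         # next partial quotient
        h_prev, h = h, a * h + h_prev
        k_prev, k = k, a * k + k_prev
    return h, k

for d in (2, 3, 6, 10, 13, 42, 46, 61):
    print(d, pell_fundamental(d))
```

Everything is done in exact integer arithmetic, so the loop reproduces even the large d = 61 solution without rounding issues.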
Continued fractions are not the last word on solving the Pell Equation, however. When d is truly large, other methods are required. Amazingly, a test case for this can be found in the mathematics of antiquity: the so-called "cattle problem" of Archimedes. Archimedes composed a lengthy poem (twenty-two Greek elegiac distichs) which is in essence the hardest word problem in human history. The first part, upon careful study, reduces to solving a linear Diophantine equation (in several variables), which is essentially just linear algebra, and it turns out that there is a positive integer solution. However, to get this far is "merely competent", according to Archimedes. The second part of the problem poses a further constraint which boils down to solving a Pell equation with d = 410286423278424. In 1867 C.F. Meyer set out to solve the problem using continued fractions. However, he computed 240 steps of the continued fraction expansion of $\sqrt{d}$, whereas the period length is in fact 203254. Only in 1880 was the problem solved, by A. Amthor. (The gap between the problem and the solution, 2000 years and change, makes the case of Fermat's Last Theorem look fast!) Amthor used a different method. All of this and much more is discussed in a recent article by Hendrik Lenstra [Le02].
of unity, of which there are no more than $\pm 1$ in all of $\mathbb{R}$, let alone $\mathbb{Q}(\sqrt{d})$. The result follows from these considerations; the proof of this is left as an optional exercise.
This is a special case of an extremely important and general result in algebraic number theory. Namely, one can consider any algebraic number field (a finite degree field extension K of $\mathbb{Q}$) and then the ring $\mathcal{O}_K$ of all algebraic integers of K, that is, elements of K which satisfy a monic polynomial with Z-coefficients. We have been looking at the case $K = \mathbb{Q}(\sqrt{d})$, a real quadratic field. Other relatively familiar examples are the cyclotomic fields $\mathbb{Q}(\zeta_N)$ obtained by adjoining a primitive Nth root of unity: one can show that this field has degree $\varphi(N)$ over $\mathbb{Q}$ (equivalently, the cyclotomic polynomial $\Phi_N$ is irreducible over $\mathbb{Q}$).
However, the argument that we used to find the general solution to the Pell equation is fascinating and important. On the face of it, it is very hard to believe that the problem of finding good rational approximations to an irrational number (a problem which is, let's face it, not initially so fascinating) can be used to solve Diophantine equations: we managed to use a result involving real numbers and inequalities to prove a result involving equalities and integers! This is nothing less than an entirely new tool, lying close to the border between algebraic and analytic number theory (and therefore helping to ensure a steady commerce between them).
This subject is notoriously difficult but here is one easy result. Define
$$L = \sum_{n=0}^{\infty} 10^{-n!}.$$
$$\left|L - \frac{p_N}{q_N}\right| < \frac{A}{q_N^B}$$
for all sufficiently large N. On the other hand, Liouville proved the following:
Theorem 85. Suppose the irrational real number $\alpha$ satisfies a polynomial equation $a_d x^d + \ldots + a_1 x + a_0 = 0$ with Z-coefficients. Then there is A > 0 such that for all integers p and $q \neq 0$,
$$\left|\alpha - \frac{p}{q}\right| > \frac{A}{|q|^d}.$$
That is, being algebraic of degree d imposes an upper limit on the goodness of the approximation by rational numbers. An immediate and striking consequence is that Liouville's number L cannot satisfy an algebraic equation of any degree: that is, it is a transcendental number. In fact, by this argument Liouville established the existence of transcendental numbers for the first time!
Liouville's theorem was improved by many mathematicians, including Thue and Siegel, and culminating in the following theorem of Klaus Roth:
Theorem 86. (Roth, 1955) Let $\alpha$ be an algebraic real number (of any degree), and let $\epsilon > 0$ be given. Then there are at most finitely many rational numbers $\frac{p}{q}$ satisfying
$$\left|\alpha - \frac{p}{q}\right| < \frac{1}{q^{2+\epsilon}}.$$
For this result Roth won the Fields Medal in 1958.
CHAPTER 8
Arithmetic Functions
1. Introduction
Definition: An arithmetic function is a function $f : \mathbb{Z}^+ \to \mathbb{C}$.
Truth be told, this definition is a bit embarrassing. It would mean that taking any function from calculus whose domain contains $[1, +\infty)$ and restricting it to positive integer values, we get an arithmetic function. For instance, $\frac{e^{3x}}{\cos^2 x + (17 \log(x+1))}$ is an arithmetic function according to this definition, although it is, at best, dubious whether this function holds any significance in number theory.
If we were honest, the definition we would like to make is that an arithmetic function is a real or complex-valued function defined for positive integer arguments which is of some arithmetic significance, but of course this is not a formal definition at all. Probably it is best to give examples:
Example A: The prime counting function $n \mapsto \pi(n)$, the number of prime numbers p, $1 \leq p \leq n$.
This is the example par excellence of an arithmetic function: approximately half of number theory is devoted to understanding its behavior. This function really deserves a whole unit all to itself, and it will get one: we put it aside for now and consider some other examples.
Example 1: The function $\omega(n)$, which counts the number of distinct prime divisors of n.
Example 2: The function $\varphi(n)$, which counts the number of integers k, $1 \leq k \leq n$, with gcd(k, n) = 1. Properly speaking this function is called the totient function, but its fame inevitably precedes it and in modern times it is usually called just "the phi function" or "Euler's phi function." Since a congruence class k modulo n is invertible in the ring $\mathbb{Z}/n\mathbb{Z}$ iff its representative k is relatively prime to n, an equivalent definition is
$$\varphi(n) := \#(\mathbb{Z}/n\mathbb{Z})^\times,$$
the cardinality of the unit group of the finite ring $\mathbb{Z}/n\mathbb{Z}$.
Example 3: The function $n \mapsto d(n)$, the number of positive divisors of n.
Example 4: For a non-negative integer k, the function
$$\sigma_k(n) = \sum_{d \mid n} d^k,$$
the sum of the kth powers of the positive divisors of n. Note that $\sigma_0(n) = d(n)$.
Example 5: The Möbius function $\mu(n)$, defined as follows: $\mu(1) = 1$, $\mu(n) = 0$ if n is not squarefree; $\mu(p_1 \cdots p_r) = (-1)^r$, when $p_1, \ldots, p_r$ are distinct primes.
Example 6: For a positive integer k, the function $r_k(n)$ which counts the number of representations of n as a sum of k integral squares:
$$r_k(n) = \#\{(a_1, \ldots, a_k) \mid a_1^2 + \ldots + a_k^2 = n\}.$$
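The examples above can each be computed straight from the definition. The following brute-force sketch is meant only for small n; all function names (`omega`, `phi`, `sigma`, `mobius`, `r`) are labels chosen here for Examples 1 through 6, not notation from any library.

```python
from math import gcd, isqrt

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def is_prime(p):
    return p >= 2 and all(p % q for q in range(2, isqrt(p) + 1))

def omega(n):
    """Example 1: number of distinct prime divisors of n."""
    return sum(1 for p in divisors(n) if is_prime(p))

def phi(n):
    """Example 2: number of 1 <= k <= n with gcd(k, n) = 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def sigma(k, n):
    """Examples 3 and 4: sum of k-th powers of divisors; sigma(0, n) is d(n)."""
    return sum(d ** k for d in divisors(n))

def mobius(n):
    """Example 5: 0 if n is not squarefree, else (-1)^(number of prime factors)."""
    if any(n % (p * p) == 0 for p in divisors(n) if is_prime(p)):
        return 0
    return (-1) ** omega(n)

def r(k, n):
    """Example 6: number of integer k-tuples whose squares sum to n (order and signs count)."""
    if k == 0:
        return 1 if n == 0 else 0
    b = isqrt(n)
    return sum(r(k - 1, n - a * a) for a in range(-b, b + 1))

print([(n, omega(n), phi(n), sigma(0, n), sigma(1, n), mobius(n), r(2, n)) for n in (1, 5, 12, 30)])
```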
These examples already suggest many others. Notably, all our examples but Example 5 are special cases of the following general construction: if we have on hand, for any positive integer n, a finite set $S_n$ of arithmetic objects, then we can define an arithmetic function by defining $n \mapsto \#S_n$. This shows the link between number theory and combinatorics. In fact the Möbius function $\mu$ is a yet more purely combinatorial gadget, whose purpose we shall learn presently. In general we have lots of choices as to what sets $S_n$ we want to count: the first few examples are elementary in the sense that the sets counted are defined directly in terms of such things as divisibility, primality, and coprimality: as we shall see, they are also elementary in the sense that we can write down exact formulas for them. The example $r_k(n)$ is more fundamentally Diophantine in character: we have a polynomial in several variables, here $P(x_1, \ldots, x_k) = x_1^2 + \ldots + x_k^2$, and what we are counting is just the number of times the value n is taken by this polynomial. This could clearly be much generalized, with the obvious proviso that there should be some suitable restrictions so as to make the number of solutions finite (e.g. we would not want to count the number of integer solutions to ax + by = N, for that is infinite; however we could restrict x and y to taking non-negative values).
Ideally we would like to express these Diophantine arithmetic functions like $r_k$ in terms of more elementary arithmetic functions like the divisor sum functions $\sigma_k$. Very roughly, this is the arithmetic analogue of the analytical problem of expressing a real-valued function f(x) as a combination of simple functions like $x^k$ or cos(nx), sin(nx). Of course in analysis most interesting functions are not just polynomials (or trigonometric polynomials), at least not exactly: rather, one either needs to consider approximations to f by elementary functions, or to express f as some sort of limit (e.g. an infinite sum) of elementary functions (or both, of course). A similar philosophy applies here, with a notable exception: even the elementary functions like d(n) and $\varphi(n)$ are not really so elementary as they first appear!
2. Multiplicative Functions
2.1. Definition and basic properties.
An important property shared by many arithmetically significant functions is multiplicativity.
Definition: An arithmetic function f is said to be multiplicative if:
(M1) $f(1) \neq 0$.
(M2) For all relatively prime positive integers $n_1, n_2$, $f(n_1 n_2) = f(n_1) \cdot f(n_2)$.
Lemma 87. If f is multiplicative, then f(1) = 1.
Proof. Taking $n_1 = n_2 = 1$, we have, using (M2)
$$f(1) = f(1 \cdot 1) = f(1) \cdot f(1) = f(1)^2.$$
Now by (M1), $f(1) \neq 0$, so that we may cancel f(1)'s to get f(1) = 1.
Exercise: Suppose an arithmetic function f satisfies (M2) but not (M1). Show that $f \equiv 0$: i.e., f(n) = 0 for all $n \in \mathbb{Z}^+$.
The following is a nice characterization of multiplicative functions:
Proposition 88. For an arithmetic function f, the following are equivalent:
a) f is multiplicative;
b) f is not identically zero, and for all $n = p_1^{a_1} \cdots p_k^{a_k}$ (the standard form factorization of n), we have $f(n) = \prod_{i=1}^{k} f(p_i^{a_i})$.
Remark: Here we are using the convention that for n = 1, k = 0, and a product
extending over zero terms is automatically equal to 1 (just as a sum extending over
zero terms is automatically equal to 0). (If this is not to your taste, just insert in
part b) the condition that f (1) = 1!)
Proof. Exercise.
$$\sigma_k(n_1 n_2) = \sum_{d \mid n_1 n_2} d^k = \sum_{d_1 \mid n_1,\ d_2 \mid n_2} (d_1 d_2)^k = \Big(\sum_{d_1 \mid n_1} d_1^k\Big)\Big(\sum_{d_2 \mid n_2} d_2^k\Big) = \sigma_k(n_1)\sigma_k(n_2).$$
2.4. CRT and the multiplicativity of the totient. The multiplicativity of $\varphi$ is closely connected to the Chinese Remainder Theorem, as we now review. Namely, for coprime $n_1$ and $n_2$, consider the map $\Phi : \mathbb{Z}/(n_1 n_2) \to \mathbb{Z}/(n_1) \times \mathbb{Z}/(n_2)$ given by
$$k \pmod{n_1 n_2} \mapsto (k \pmod{n_1},\ k \pmod{n_2}).$$
This map is a well-defined homomorphism of rings, since if $k_1 \equiv k_2 \pmod{n_1 n_2}$, then $k_1 \equiv k_2 \pmod{n_i}$ for i = 1, 2. Because the source and target have the same, finite, cardinality $n_1 n_2$, in order for it to be an isomorphism it suffices to show either that it is injective or that it is surjective. Note that the standard, elementary form of the Chinese Remainder Theorem addresses the surjectivity: given any pair of congruence classes $i \pmod{n_1}$ and $j \pmod{n_2}$ the standard proof provides an explicit formula for a class $p(i, j) \pmod{n_1 n_2}$ which maps via $\Phi$ onto this pair of classes. However, writing down this formula requires at least a certain amount of cleverness, whereas it is trivial to show the injectivity: as usual, we need only show that the kernel is 0. Well, if $\Phi(k) = 0$, then k is 0 mod $n_1$ and 0 mod $n_2$, meaning that $n_1 \mid k$ and $n_2 \mid k$. In other words, k is a common multiple of $n_1$ and $n_2$, so, as we've shown, k is a multiple of the least common multiple of $n_1$ and $n_2$. Since $n_1$ and $n_2$ are coprime, this means that $n_1 n_2 \mid k$, i.e., that $k \equiv 0 \pmod{n_1 n_2}$!
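The counting argument can be checked by listing the images of all residues. This is a minimal sketch with hypothetical helper names (`crt_images`, `is_isomorphism`, `units`); it confirms that the map is a bijection for coprime moduli and fails to be one otherwise.

```python
from math import gcd

def crt_images(n1, n2):
    """Images of 0, ..., n1*n2 - 1 under k -> (k mod n1, k mod n2)."""
    return [(k % n1, k % n2) for k in range(n1 * n2)]

def is_isomorphism(n1, n2):
    # Source and target both have n1*n2 elements, so distinct images <=> bijection.
    return len(set(crt_images(n1, n2))) == n1 * n2

def units(n):
    """Residues modulo n that are invertible, i.e. coprime to n."""
    return [k for k in range(n) if gcd(k, n) == 1]

print(is_isomorphism(4, 9), is_isomorphism(4, 6))
print(len(units(36)), len(units(4)) * len(units(9)))
```

The second printed line compares the number of units modulo 36 with the product of the unit counts modulo 4 and 9, which is the numerical content of the multiplicativity of the totient.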
Theorem 90. There is a canonical isomorphism of groups
$$(\mathbb{Z}/(n_1 n_2))^\times \cong (\mathbb{Z}/(n_1))^\times \times (\mathbb{Z}/(n_2))^\times.$$
Proof. This follows from the isomorphism of rings discussed above, together with two almost immediate facts of pure algebra. First, if $\Phi : R \to S$ is an isomorphism of rings, then the restriction of $\Phi$ to the unit group $R^\times$ of R is an isomorphism onto the unit group $S^\times$ of S. Second, if $S = S_1 \times S_2$ is a product of rings, then $S^\times = S_1^\times \times S_2^\times$, i.e., the units of the product are the product of the units. We leave it to the reader to verify these two facts.
Corollary 91. The function $\varphi$ is multiplicative.
Proof. Since $\varphi(n) = \#(\mathbb{Z}/(n))^\times$, this follows immediately.
Now let us use the philosophy of multiplicativity to give exact formulas for $\sigma_k(n)$ and $\varphi(n)$. In other words, we have reduced to the case of evaluating at prime power values of n, but this is much easier. Indeed, the positive divisors of $p^a$ are $1, p, \ldots, p^a$, so for k > 0
$$\sigma_k(p^a) = 1 + p^k + \ldots + p^{ak} = \frac{1 - (p^k)^{a+1}}{1 - p^k} = \frac{1 - p^{(a+1)k}}{1 - p^k}.$$
Similarly, the only numbers $1 \leq i \leq p^a$ which are not coprime to $p^a$ are the multiples of p, of which there are $p^{a-1}$: $1 \cdot p, 2 \cdot p, \ldots, p^{a-1} \cdot p = p^a$. So
$$\varphi(p^a) = p^a - p^{a-1} = p^{a-1}(p - 1) = p^a\left(1 - \frac{1}{p}\right).$$
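Corollary 92. Suppose $n = p_1^{a_1} \cdots p_r^{a_r}$. Then:
a) $d(n) = \prod_{i=1}^{r} (a_i + 1)$.
b) For k > 0, $\sigma_k(n) = \prod_{i=1}^{r} \frac{1 - p_i^{(a_i+1)k}}{1 - p_i^k}$.
c) $\varphi(n) = \prod_{i=1}^{r} p_i^{a_i - 1}(p_i - 1)$.
The last formula is often rewritten as
$$\frac{\varphi(n)}{n} = \prod_{p \mid n} \left(1 - \frac{1}{p}\right). \qquad (33)$$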
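The closed formulas can be compared against the definitions for small n. The sketch below factors n by trial division (adequate only for small inputs); the helper names `factorize`, `d_formula`, `sigma_formula`, `phi_formula` and `formulas_agree` are assumptions made for this illustration.

```python
from math import gcd

def factorize(n):
    """Return {prime: exponent} for n >= 1, by trial division."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def d_formula(n):
    out = 1
    for a in factorize(n).values():
        out *= a + 1
    return out

def sigma_formula(k, n):
    """Product of geometric sums; valid for k >= 1 as in part b)."""
    out = 1
    for p, a in factorize(n).items():
        out *= (p ** ((a + 1) * k) - 1) // (p ** k - 1)
    return out

def phi_formula(n):
    out = 1
    for p, a in factorize(n).items():
        out *= p ** (a - 1) * (p - 1)
    return out

def formulas_agree(limit):
    """Compare each formula with the defining count or sum for 1 <= n < limit."""
    for n in range(1, limit):
        divs = [d for d in range(1, n + 1) if n % d == 0]
        if d_formula(n) != len(divs):
            return False
        if sigma_formula(2, n) != sum(d * d for d in divs):
            return False
        if phi_formula(n) != sum(1 for k in range(1, n + 1) if gcd(k, n) == 1):
            return False
    return True

print(formulas_agree(300))
```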
While we are here, we quote the following more general form of the CRT, which is
often useful:
Theorem 93. (Generalized Chinese Remainder Theorem) Let $n_1, \ldots, n_r$ be any r positive integers. Consider the natural map
$$\Phi : \mathbb{Z} \to \mathbb{Z}/n_1\mathbb{Z} \times \mathbb{Z}/n_2\mathbb{Z} \times \ldots \times \mathbb{Z}/n_r\mathbb{Z}$$
which sends an integer k to $(k \pmod{n_1}, \ldots, k \pmod{n_r})$.
a) The kernel of $\Phi$ is the ideal $(\mathrm{lcm}(n_1, \ldots, n_r))$.
b) The following are equivalent:
(i) $\Phi$ is surjective;
(ii) $\mathrm{lcm}(n_1, \ldots, n_r) = n_1 \cdots n_r$.
(iii) The integers $n_1, \ldots, n_r$ are pairwise relatively prime.
The proof is a good exercise. In fact the result holds essentially verbatim for elements $x_1, \ldots, x_r$ in a PID R, and, in some form, in more general commutative rings.
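For small moduli the three conditions in part b) can be compared by direct enumeration. This is only an experimental sketch: `lcm_all`, `image_size` and `pairwise_coprime` are hypothetical helpers, and the image is computed by running k over one full period of length lcm, which part a) says is enough.

```python
from itertools import combinations
from math import gcd, prod

def lcm_all(ns):
    out = 1
    for n in ns:
        out = out * n // gcd(out, n)
    return out

def image_size(ns):
    """Number of distinct tuples (k mod n_1, ..., k mod n_r) as k ranges over the integers."""
    period = lcm_all(ns)   # by part a), the map only depends on k modulo the lcm
    return len({tuple(k % n for n in ns) for k in range(period)})

def pairwise_coprime(ns):
    return all(gcd(a, b) == 1 for a, b in combinations(ns, 2))

for ns in [(3, 4, 5), (2, 4, 3), (6, 10, 15)]:
    surjective = image_size(ns) == prod(ns)
    print(ns, surjective, lcm_all(ns) == prod(ns), pairwise_coprime(ns))
```

In each printed row the three booleans agree, as the theorem predicts. Note that (6, 10, 15) has overall gcd 1 but is not pairwise coprime, so the map is not surjective.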
2.5. Additive functions. The function $\omega$ is not multiplicative: e.g. $\omega(1) = 0$ and $\omega(2) = 1$. However it satisfies a property which is "just as good" as multiplicativity: $\omega(n_1 n_2) = \omega(n_1) + \omega(n_2)$ when $\gcd(n_1, n_2) = 1$. Such functions are called additive. Finally, we have the notion of complete additivity: $f(n_1 n_2) = f(n_1) + f(n_2)$ for all $n_1, n_2 \in \mathbb{Z}^+$; i.e., f is a homomorphism from the positive integers under multiplication to the complex numbers under addition. We have seen some completely additive functions, namely, $\mathrm{ord}_p$ for a prime p.
Proposition 94. Fix any real number a > 1 (e.g. a = e, a = 2). A function f is additive (respectively, completely additive) iff $a^f$ is multiplicative (respectively, completely multiplicative).
Proof. Exercise.
this form. The argument is a well-known one (and is found in Silverman's book) so we omit it here. Whether or not there exist any odd perfect numbers is one of the notorious open problems of number theory. At least you should not go searching for odd perfect numbers by hand: it is known that there are no odd perfect numbers $N < 10^{300}$, and that any odd perfect number must satisfy a slew of restrictive conditions (e.g. on the shape of its standard form factorization).
3. Divisor Sums, Convolution and Möbius Inversion
The proof of the multiplicativity of the functions $\sigma_k$, easy though it was, actually establishes a more general result. Namely, suppose that f is a multiplicative function, and define a new function F by
$$F(n) = \sum_{d \mid n} f(d).$$
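For instance, if we start with the function $f(n) = n^k$, then $F = \sigma_k$. Note that $f(n) = n^k$ is (in fact completely) multiplicative. The generalization of the proof is then the following: for coprime $n_1$ and $n_2$,
$$F(n_1 n_2) = \sum_{d \mid n_1 n_2} f(d) = \sum_{d_1 \mid n_1,\ d_2 \mid n_2} f(d_1 d_2) = \Big(\sum_{d_1 \mid n_1} f(d_1)\Big)\Big(\sum_{d_2 \mid n_2} f(d_2)\Big) = F(n_1)F(n_2).$$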
Exercise: Show by example that f completely multiplicative need not imply F
completely multiplicative.
It turns out that the operation $f \mapsto F$ is of general interest; it gives rise to a certain kind of "duality" among arithmetic functions. Slightly less vaguely, sometimes f is simple and F is more complicated, but sometimes the reverse takes place.
Definition: Define the function $\delta$ by $\delta(1) = 1$ and $\delta(n) = 0$ for all n > 1. Note that $\delta$ is multiplicative. Also write $\iota$ for the function $n \mapsto n$.
$$\sum_{d \mid n} \mu(d) = 1 - r + \binom{r}{2} - \binom{r}{3} + \ldots + (-1)^r \binom{r}{r} = (1 - 1)^r = 0.$$
For part b) we take advantage of the fact that since $\varphi$ is multiplicative, so is the sum over its divisors. Therefore it is enough to verify the identity for a prime power:
$$\sum_{d \mid p^a} \varphi(d) = \sum_{i=0}^{a} \varphi(p^i) = 1 + \sum_{i=1}^{a} (p^i - p^{i-1}) = p^a.$$
$$f(n) = \sum_{d \mid n} F(d)\mu(n/d).$$
$$(f * g)(n) = \sum_{d \mid n} f(d)\,g\!\left(\frac{n}{d}\right).$$
Why is this relevant? Well, define 1 as the function 1(n) = 1 for all n (see footnote 1); then $F = f * 1$. We have also seen that
$$\mu * 1 = \delta,$$
$$(f * g)(n) = \sum_{d_1 d_2 = n} f(d_1)g(d_2). \qquad (34)$$
The sum extends over all pairs of positive integers $d_1, d_2$ whose product is n. This already makes the commutativity clear. As for the associativity, writing things out one finds that both $(f * g) * h$ and $f * (g * h)$ are equal to
$$\sum_{d_1 d_2 d_3 = n} f(d_1)g(d_2)h(d_3),$$
and hence they are equal to each other! For (iii), we have
$$(f * \delta)(n) = \sum_{d_1 d_2 = n} f(d_1)\delta(d_2);$$
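$$\Big(\sum_{a_1 a_2 = m} f(a_1)g(a_2)\Big)\Big(\sum_{b_1 b_2 = n} f(b_1)g(b_2)\Big) = \sum_{a_1 a_2 b_1 b_2 = mn} f(a_1 b_1)\,g(a_2 b_2) = \sum_{xy = mn} f(x)g(y) = (f * g)(mn).$$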
Footnote 1: We now have three similar-looking but different functions floating around: $\delta$, $\iota$ and 1. It may help the reader to keep on hand a short cheat sheet with the definitions of all three functions.
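Dirichlet convolution and Möbius inversion are easy to experiment with. Below is a minimal sketch in which arithmetic functions are ordinary Python callables; the names `conv`, `one`, `delta` and `mobius` are chosen for this illustration, and the test function $f(n) = n^2$ is arbitrary.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def conv(f, g):
    """Dirichlet convolution: (f * g)(n) = sum over d | n of f(d) * g(n / d)."""
    return lambda n: sum(f(d) * g(n // d) for d in divisors(n))

def one(n):
    return 1

def delta(n):
    return 1 if n == 1 else 0

def mobius(n):
    """Moebius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divided the original n: not squarefree
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

f = lambda n: n * n             # any arithmetic function will do here
F = conv(f, one)                # F(n) = sum of f(d) over d | n
recovered = conv(F, mobius)     # Moebius inversion should give back f

print([conv(mobius, one)(n) for n in range(1, 11)])
print([recovered(n) for n in range(1, 11)])
```

The first printed list is $\delta$ on 1 through 10, and the second is the original f, illustrating that convolving with $\mu$ undoes convolving with 1.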
4. Some Applications of Möbius Inversion
4.1. Application: another proof of the multiplicativity of the totient.
Our first application of Möbius inversion is to give a proof of the multiplicativity of $\varphi$ which is independent of the Chinese Remainder Theorem. To do this, we will give a direct proof of the identity $\sum_{d \mid n} \varphi(d) = n$. Note that it is equivalent to write the left hand side as
$$\sum_{d \mid n} \varphi\!\left(\frac{n}{d}\right),$$
since as d runs through all the divisors of n, so does $\frac{n}{d}$ (see footnote 2). Now let us classify elements of {1, ..., n} according to their greatest common divisor with n. The greatest common divisor of any such element k is a divisor d of n, and these are exactly the elements k such that $\frac{k}{d}$ is relatively prime to $\frac{n}{d}$, or, in yet other words, the elements dl with $1 \leq l \leq \frac{n}{d}$ and $\gcd(l, \frac{n}{d}) = 1$, of which there are $\varphi(\frac{n}{d})$. This proves the identity! Now, we can apply Möbius inversion to conclude that $\varphi = \iota * \mu$ is multiplicative.
Here is a closely related approach. Consider the additive group of $\mathbb{Z}/n\mathbb{Z}$, a cyclic group of order n. For a given positive integer d, how many order d elements does it have? Well, by Lagrange's Theorem we need $d \mid n$. An easier question is how many elements there are of order dividing a given d (itself a divisor of n): these are just the elements $x \in \mathbb{Z}/n\mathbb{Z}$ for which dx = 0, i.e., the multiples of n/d, of which there are clearly d. But Möbius Inversion lets us pass from the easier question to the harder question: indeed, define f(k) to be the number of elements of order k in $\mathbb{Z}/n\mathbb{Z}$; then $F(k) = \sum_{d \mid k} f(d)$ is the number of elements of order dividing k, so we just saw that F(k) = k. Applying Möbius inversion, we get that $f(k) = (\iota * \mu)(k) = \varphi(k)$. On the other hand, it is not hard to see directly that f(k) = 0 if k does not divide n and otherwise equals $\varphi(k)$ (e.g., using the fact that there is a unique subgroup of order k for all $k \mid n$), and this gives another proof that $\varphi * 1 = \iota$.
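The count of elements of each order is quick to verify by hand or by machine. The sketch below relies on the standard fact that the additive order of x in $\mathbb{Z}/n\mathbb{Z}$ is $n / \gcd(x, n)$; `order_counts` and `phi` are hypothetical helper names.

```python
from math import gcd

def phi(n):
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

def order_counts(n):
    """Map each order k to the number of elements of additive order k in Z/nZ."""
    counts = {}
    for x in range(n):
        order = n // gcd(x, n)   # gcd(0, n) = n, so the identity has order 1
        counts[order] = counts.get(order, 0) + 1
    return counts

n = 12
counts = order_counts(n)
print(counts)
print({k: phi(k) for k in sorted(counts)})
```

The two printed dictionaries contain the same values, and summing them over the divisors of n recovers the identity $\sum_{d \mid n} \varphi(d) = n$.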
4.2. A formula for the cyclotomic polynomials. For a positive integer d, let $\Phi_d(x)$ be the monic polynomial whose roots are the primitive dth roots of unity, i.e., those complex numbers which have exact order d in the multiplicative group $\mathbb{C}^\times$ (meaning that $z^d = 1$ and $z^n \neq 1$ for any integer 0 < n < d). These primitive roots are contained in the group of all dth roots of unity, which is cyclic of order d, so by the above discussion there are exactly $\varphi(d)$ of them: in other words, the degree of the polynomial $\Phi_d$ is $\varphi(d)$ (see footnote 3). It turns out that these important polynomials have entirely integer coefficients, although without a somewhat more sophisticated algebraic background this may well not be so obvious. One might think that to write down formulas for the $\Phi_d$ one would have to do a lot of arithmetic with complex numbers, but that is not at all the case. Very much in the spirit of the group-theoretical interpretation of $\sum_{d \mid n} \varphi(d) = n$, we have
$$\prod_{d \mid n} \Phi_d(x) = x^n - 1,$$
Footnote 2: Alternately, this just says that $f * 1 = 1 * f$.
Footnote 3: For once, the ancients who fixed the notation have planned ahead!
since both the left and right-hand sides are monic polynomials whose roots consist of each nth root of unity exactly once.
In fact it follows from this formula, by induction, that the $\Phi_d$'s have integral coefficients. But Möbius inversion gives us an explicit formula. The trick here is to convert the divisor product into a divisor sum by taking logarithms:
$$\log \prod_{d \mid n} \Phi_d(x) = \sum_{d \mid n} \log \Phi_d(x) = \log(x^n - 1).$$
Applying Möbius inversion, we get
$$\log \Phi_n(x) = \sum_{d \mid n} \log(x^d - 1)\,\mu\!\left(\frac{n}{d}\right),$$
so exponentiating back we get a formula which at first looks too good to be true:
$$\Phi_n(x) = \prod_{d \mid n} (x^d - 1)^{\mu(n/d)}.$$
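The product formula can be evaluated with nothing more than integer polynomial arithmetic. In the sketch below a polynomial is a list of integer coefficients, lowest degree first; this representation and the helper names (`poly_mul`, `poly_div_exact`, `x_pow_minus_one`, `cyclotomic`) are choices made for this illustration. Factors with $\mu = 1$ go into a numerator, factors with $\mu = -1$ into a denominator, and the quotient is taken at the end.

```python
def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_div_exact(a, b):
    """Divide a by a monic polynomial b, assuming the remainder is zero."""
    a = a[:]
    q = [0] * (len(a) - len(b) + 1)
    for i in range(len(q) - 1, -1, -1):
        q[i] = a[i + len(b) - 1]      # b is monic, so no coefficient division is needed
        for j, bj in enumerate(b):
            a[i + j] -= q[i] * bj
    assert not any(a), "division was not exact"
    return q

def x_pow_minus_one(d):
    """Coefficient list of x^d - 1."""
    return [-1] + [0] * (d - 1) + [1]

def cyclotomic(n):
    """Coefficients of the n-th cyclotomic polynomial via the Moebius product formula."""
    num, den = [1], [1]
    for d in range(1, n + 1):
        if n % d == 0:
            m = mobius(n // d)
            if m == 1:
                num = poly_mul(num, x_pow_minus_one(d))
            elif m == -1:
                den = poly_mul(den, x_pow_minus_one(d))
    return poly_div_exact(num, den)

for n in (1, 2, 3, 4, 6, 12):
    print(n, cyclotomic(n))
```

No complex numbers or logarithms appear in the computation; the logarithm in the derivation is only a device for turning the product into a sum.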
$F(n) = \sum_{d \mid n} f(d)$ makes sense, as does $\sum_{d \mid n} F(d)\mu(n/d)$, where for $a \in A$ we interpret $0 \cdot a$ as being the additive identity element $0_A$, $1 \cdot a$ as a and $-1 \cdot a$ as the additive inverse $-a$ of a. Then one can check that $\sum_{d \mid n} F(d)\mu(n/d) = f(n)$ for all f, just as before. We leave the proof as an exercise.
\[ I(\mathbb{Z}/p\mathbb{Z}, n) = \frac{1}{n} \sum_{d \mid n} p^{d}\, \mu\!\left(\frac{n}{d}\right). \]
The proof of Theorem 103 requires some preliminaries on polynomials over finite fields. We give a complete treatment in Appendix C.
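The formula is easy to evaluate for small cases. Here is a short Python sketch, under the assumption (suggested by the notation, but not restated here) that $I(\mathbb{Z}/p\mathbb{Z}, n)$ counts the monic irreducible polynomials of degree $n$ over $\mathbb{Z}/p\mathbb{Z}$. The function names are ours.

```python
def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def num_irreducible(p, n):
    """Evaluate (1/n) * sum over d | n of p^d * mu(n/d)."""
    total = sum(mobius(n // d) * p ** d for d in range(1, n + 1) if n % d == 0)
    # The formula asserts that n divides this sum exactly.
    assert total % n == 0
    return total // n
```

For $p = 2$ and $n = 2$ this gives 1, matching the fact that $x^2 + x + 1$ is the only irreducible quadratic over $\mathbb{Z}/2\mathbb{Z}$.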
CHAPTER 9
pretty result which once again underscores the importance of keeping an eye out for multiplicativity:
Theorem 105. Suppose $f$ is a multiplicative arithmetic function such that $f(p^a) \to 0$ as $p^a \to \infty$. Then $f(n) \to 0$ as $n \to \infty$.
In other words if $f$ is a multiplicative function such that for every $\epsilon > 0$, $|f(p^m)| < \epsilon$ for all sufficiently large prime powers, it follows that $|f(n)| < \epsilon$ for all sufficiently large $n$, prime power or otherwise.
Remark: As long as our multiplicative function $f$ is never $0$, an equivalent statement is that if $f(p^n) \to \infty$ along prime powers then $f(n) \to \infty$ along all $n$. (Just apply the theorem to $g = \frac{1}{f}$, which is multiplicative iff $f$ is.) So assuming the theorem, we can just look at
\[ \varphi(p^a) = p^{a-1}(p-1) \geq \max(p-1, a-1), \]
and if $p^a$ is large, at least one of $p$ and $a$ is large. But actually we get more:
Corollary 106. For any fixed $\delta$, $0 < \delta < 1$, we have $\varphi(n)/n^{\delta} \to \infty$ as $n \to \infty$.
Proof. We wish to show that $f(n) := \frac{n^{\delta}}{\varphi(n)} \to 0$ as $n \to \infty$. Since both $n^{\delta}$ and $\varphi(n)$ are multiplicative, so is their quotient $f$, so by the theorem it suffices to show that $f$ approaches zero along prime powers. No problem:
\[ f(p^n) = \frac{p^{n\delta}}{p^{n-1}(p-1)} = \frac{p}{p-1} \cdot (p^{\delta - 1})^{n}. \]
Here $\delta - 1 < 0$, so as $p \to \infty$ the first factor approaches 1 and the second factor approaches 0 (just as $x^{c} \to 0$ as $x \to \infty$ for negative $c$). On the other hand, if $p$ stays bounded and $n \to \infty$ then the expression tends to 0 exponentially fast.
Now let us prove Theorem 105. We again use the idea that for any $L > 0$, there exists $N = N(L)$ such that $n > N$ implies $n$ is divisible by a prime power $p^a > L$.
First let's set things up: since $f(p^m) \to 0$ we have that $f$ is bounded on prime powers, say $|f(p^m)| \leq C$. Moreover, there exists a $b$ such that $|f(p^m)| \leq 1$ for all $p^m \geq b$; and finally, for every $\epsilon > 0$ there exists $L(\epsilon)$ such that $p^m > L(\epsilon)$ implies $|f(p^m)| < \epsilon$. Now write $n = p_1^{a_1} \cdots p_r^{a_r}$, so that
\[ f(n) = f(p_1^{a_1}) \cdots f(p_r^{a_r}). \]
Since there are at most $b$ indices $i$ such that $p_i^{a_i} \leq b$, there are at most $b$ factors in the product which are at least 1 in absolute value, so that the product over these "bad" indices has absolute value at most $C^b$. Every other factor has absolute value at most 1. Moreover, if $n$ is sufficiently large with respect to $L(\epsilon)$ (explicitly, if $n > L(\epsilon)!^{L(\epsilon)}$, as above), then the largest prime power divisor $p_r^{a_r}$ of $n$ is greater than $L(\epsilon)$ and hence $|f(p_r^{a_r})| < \epsilon$. This gives
\[ |f(n)| = |f(p_1^{a_1}) \cdots f(p_r^{a_r})| \leq C^b \epsilon. \]
Since $C$ and $b$ are fixed and $\epsilon$ is arbitrary, this shows that $f(n) \to 0$ as $n \to \infty$.
A nice feature of Theorem 105 is that it can be applied to other multiplicative functions. For instance, it allows for a quick proof of the following useful upper bound on the divisor function: for any $\delta > 0$,
\[ \lim_{n \to \infty} \frac{d(n)}{n^{\delta}} = 0. \]
Proof. Exercise.
Note that Corollary 106 is equivalent to the following statement: for every $0 < \delta < 1$, there exists a positive constant $C(\delta)$ such that for all $n$,
\[ \varphi(n) \geq C(\delta)\, n^{\delta}. \]
Still equivalent would be to have such a statement for all $n \geq N_0$. This would be very useful provided we actually knew an acceptable value of $C(\delta)$ for some $\delta$, possibly with an explicitly given and reasonably small $N_0(\delta)$ of excluded values. We quote without proof the following convenient result for $\delta = \frac{1}{2}$:
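\[ \frac{\varphi(n)}{n} = \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right). \]
\[ \frac{n}{\varphi(n)} = \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right)^{-1}; \]
now what we need to show is that for any $L > 0$, we can choose primes $p_1, \ldots, p_r$ such that $\prod_{i=1}^{r} \left(\frac{p_i - 1}{p_i}\right)^{-1} > L$.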
Well, at the moment we (sadly for us) don't know much more about the sequence of primes except that it is infinite, so why don't we just take $n$ to be the product of the first $r$ primes $p_1 = 2, \ldots, p_r$? And time for a dirty trick: for any $i$, $1 \leq i \leq r$, we can view $\frac{1}{1 - \frac{1}{p_i}}$ as the sum of a geometric series with ratio $\frac{1}{p_i}$. This gives
\[ \frac{n}{\varphi(n)} = \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right)^{-1} = \prod_{i=1}^{r} \left(1 + p_i^{-1} + p_i^{-2} + \ldots\right). \]
The point here is that if we formally extended this product over all primes:
\[ \prod_{i=1}^{\infty} \left(1 + p_i^{-1} + p_i^{-2} + p_i^{-3} + \ldots\right) \]
and multiplied it all out, what would we get? A moment's reflection reveals a beautiful surprise: the uniqueness of the prime power factorization is precisely equivalent to the statement that multiplying out this infinite product we get the infinite series $\sum_{n=1}^{\infty} \frac{1}{n}$, i.e., the harmonic series! Well, except that the harmonic series is divergent. That's actually a good thing; but first let's just realize that if we multiply out the finite product $\prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right)^{-1}$ we get exactly the sum of the reciprocals of the integers which are divisible only by the first $r$ primes. In particular since of course $p_r \geq r$, this sum contains the reciprocals of the first $r$ integers, so: with $n = p_1 \cdots p_r$,
\[ \frac{n}{\varphi(n)} \geq \sum_{m=1}^{r} \frac{1}{m}. \]
But now we're done, since as we said before the harmonic series diverges: recall that a very good approximation to the $r$th partial sum is $\log r$, and certainly $\lim_{r \to \infty} \log r = \infty$. This proves the result.
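The inequality at the heart of this proof can be checked exactly for small $r$. The following Python sketch is an illustration only (the function names are ours); it uses exact rational arithmetic to compare $n/\varphi(n)$ for $n = p_1 \cdots p_r$ with the $r$th partial sum of the harmonic series.

```python
from fractions import Fraction


def first_primes(r):
    """Return the first r primes by trial division (fine for small r)."""
    primes = []
    candidate = 2
    while len(primes) < r:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def primorial_ratio_and_harmonic(r):
    """Return (n/phi(n), H_r) for n = p_1 * ... * p_r, as exact fractions."""
    ratio = Fraction(1)
    for p in first_primes(r):
        ratio *= Fraction(p, p - 1)          # the factor (1 - 1/p)^(-1)
    harmonic = sum(Fraction(1, m) for m in range(1, r + 1))
    return ratio, harmonic
```

For $r = 3$, i.e. $n = 30$, this returns $15/4$ and $11/6$. Both quantities grow without bound, but very slowly, which matches the remark below that $n$ must be very large before $\varphi(n)$ is much smaller than $n$.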
To summarize, if we want to make $\varphi(n)/n$ arbitrarily small, we can do so by taking $n$ to be divisible by sufficiently many primes. On the other hand $\varphi(n)/n$ doesn't have to be small: $\varphi(p)/p = \frac{p-1}{p} = 1 - \frac{1}{p}$, and of course this quantity approaches 1 as $p \to \infty$. Thus the relative size of $\varphi(n)$ compared to $n$ depends quite a lot on the shape of the prime power factorization of $n$.
Contemplation of this proof shows that we had to take $n$ to be pretty darned large in order for $\varphi(n)$ to be significantly smaller than $n$. In fact this is not far from the truth.
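4. The Truth About Euler's $\varphi$ Function
It is the following:
Theorem 110. a) For any $\epsilon > 0$ and all sufficiently large $n$, one has
\[ \frac{\varphi(n) \log \log n}{n} \geq e^{-\gamma} - \epsilon. \]
b) There exists a sequence of distinct positive integers $n_k$ such that
\[ \lim_{k \to \infty} \frac{\varphi(n_k) \log \log n_k}{n_k} = e^{-\gamma}. \]
Remarks: (a) Here $\gamma$ is the Euler-Mascheroni constant
\[ \gamma = \lim_{n \to \infty} \left( \sum_{k=1}^{n} \frac{1}{k} - \log n \right) \approx 0.5772. \]
(b) What the result is really saying is that $n/\varphi(n)$ can be, for arbitrarily large $n$, as large as a constant times $\log \log n$, but no larger.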
In stating the result in two parts we have just spelled out a fundamental concept from real analysis (which however is notoriously difficult for beginning students to understand): namely, if for a function $f : \mathbb{Z}^+ \to \mathbb{R}$ we have a number $L$ with the property that for every $\epsilon > 0$,
(i) for all sufficiently large $n$ one has
\[ f(n) > L - \epsilon, \]
and (ii) for all $L' > L$ there are infinitely many $n$ such that $f(n) < L'$, then one says that $L$ is the lower limit (or limit inferior) of $f(n)$, written
\[ \liminf_{n \to \infty} f(n) = L. \]
There is a similar definition of the upper limit (or limit superior) of a function: it is the smallest $L$ such that for any $\epsilon > 0$, for all but finitely many $n$ we have $f(n) < L + \epsilon$. A function which is unbounded below (i.e., takes arbitrarily large negative values) has no lower limit according to our definition, so instead one generally says that $\liminf f = -\infty$, and similarly we put $\limsup f = +\infty$ when $f$ is unbounded above. With these provisos, the merit of the upper and lower limits is that they always exist; moreover one has
\[ \liminf f \leq \limsup f \]
always, and equality occurs iff $\lim_{n \to \infty} f$ exists (or is $\pm\infty$). Using this terminology we can summarize the previous results much more crisply:
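Since $\varphi(p) = p - 1$, we certainly have
\[ \limsup_{n \to \infty} \varphi(n)/n = 1, \]
so we are only interested in how small $\varphi(n)$ can be for large $n$. We first showed that $\lim_{n \to \infty} \varphi(n) = +\infty$, and indeed that for any $\delta < 1$,
\[ \lim_{n \to \infty} \varphi(n)/n^{\delta} = \infty. \]
However, for $\delta = 1$,
\[ \liminf_{n \to \infty} \varphi(n)/n = 0. \]
Thus the lower order of $\varphi(n)$ lies somewhere between $n^{\delta}$ for $\delta < 1$ (i.e., $\varphi(n)$ is larger than this for all sufficiently large $n$) and $n$ (i.e., $\varphi(n)$ is smaller than any constant multiple of this for infinitely many $n$). In general, one might say that an arithmetic function $f$ has lower order $g : \mathbb{Z}^+ \to (0, \infty)$ (where $g$ is presumably some relatively simple function) if
\[ \liminf_{n \to \infty} \frac{f}{g} = 1. \]
In this language, Theorem 110 says that the lower order of $\varphi(n)$ is
\[ \frac{e^{-\gamma}\, n}{\log \log n}. \]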
Remark: all statements about limits, lim infs, lim sups and so on of a function $f$ are by their nature independent of the behavior of $f$ on any fixed finite set of values: if we took any arithmetic function and defined it completely randomly for the first $10^{10^{10}}$ values, then we would not change its lower/upper order. However in practice we would like inequalities which are true for all values of the function, or at least are true outside an explicitly given and reasonably small finite set of excluded values. In the jargon of the subject one describes the latter, better, sort of estimate as an effective bound. You can always ask the question "Is it effective?" at the end of any analytic number theory talk and the speaker will either get very happy or very defensive according to the answer. So here we can ask if there is an effective lower bound for $\varphi$ of the right order of magnitude, and the answer is a resounding yes. Here is a "nuclear-powered" lower bound for the totient function:
For all $n > 2$,
\[ \varphi(n) > \frac{n}{e^{\gamma} \log \log n + \frac{3}{\log \log n}}. \]
5. Other Functions
5.1. The sum of divisors function $\sigma$. The story for the function $\sigma$ is quite similar to that of $\varphi$. In fact there is a very close relationship between the size of $\sigma$ and the size of $\varphi$ coming from the following beautiful double inequality.
Proposition 112. For all $n$, we have
\[ \frac{1}{\zeta(2)} < \frac{\sigma(n)\varphi(n)}{n^2} \leq 1. \]
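\[ \varphi(n) = n \prod_{i} \left(1 - p_i^{-1}\right), \]
so
\[ \frac{\sigma(n)\varphi(n)}{n^2} = \prod_{i} \left(1 - p_i^{-a_i - 1}\right). \]
We have a product of terms in which each factor is less than one; therefore the product is at most 1. Conversely, each of the exponents is less than or equal to $-2$, so the product is at least $\prod_{i} (1 - p_i^{-2})$, which in turn is greater than the product $\prod_{p} (1 - p^{-2})$ over all primes. By the Euler product identity
\[ \prod_{p} \left(1 - p^{-s}\right)^{-1} = \prod_{p} \left(1 + p^{-s} + p^{-2s} + \ldots\right) = \sum_{n=1}^{\infty} \frac{1}{n^{s}} = \zeta(s), \]
so the last product is equal to $\frac{1}{\zeta(2)}$.
Remark: Recall that $\zeta(2) = \frac{\pi^2}{6}$, so that $\frac{1}{\zeta(2)} = \frac{6}{\pi^2}$.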
From this result and the corresponding results for $\varphi$ we immediately deduce:
Theorem 113. For every $\delta > 0$,
\[ \frac{\sigma(n)}{n^{1+\delta}} \to 0. \]
In fact we can prove this directly, the same way as for the $\varphi$ function.
The truth about the lower order of $\varphi$ dualizes to give the true upper order of $\sigma$, up to an ambiguity in the multiplicative constant, which will be somewhere between $\zeta(2)^{-1} e^{\gamma}$ and $e^{\gamma}$. In fact the latter is correct:
Theorem 114.
\[ \limsup_{n \to \infty} \frac{e^{-\gamma} \sigma(n)}{n \log \log n} = 1. \]
And again, because $\sigma(p) = p + 1 \sim p$ for primes, we find that the lower order of $\sigma(n)$ is just $n$.
5.2. The divisor function. The divisor function $d(n)$ is yet more irregularly behaved than $\varphi$ and $\sigma$, as is clear because $d(p) = 2$ for all primes $p$, but of course $d$ takes on arbitrarily large values. In particular the lower order of $d$ is just the constant function 2. As regards the upper order, we limit ourselves to the following two estimates, which you are asked to establish in the homework:
Theorem 115. For any $\delta > 0$, $\lim_{n \to \infty} \frac{d(n)}{n^{\delta}} = 0$.
In other words, for large $n$, the number of divisors of $n$ is less than any prearranged power of $n$. This makes us wonder whether its upper order is logarithmic or smaller, but in fact this is not the case either.
Proposition 116. For any $k \in \mathbb{Z}^+$ and any real number $C$, there exists an $n$ such that $d(n) > C(\log n)^k$.
Thus the upper order of $d(n)$ is something greater than logarithmic and something less than any power function. We leave the matter there, although much more could be said.
6. Average orders
If we take the perspective that we are interested in the distribution of values of an arithmetic function like $\varphi$ (or $d$ or $\sigma$ or . . .) in a statistical sense, then we ought to worry that just knowing the upper and lower orders is telling us a very small piece of the story. As a very rough comparison, suppose we tried to study the American educational system by looking at its best and worst students, not in the sense of best or worst ever, but concerned with what sort of education the top 0.1% and the bottom 0.1% of Americans get, durably over time. The first task is certainly of some interest (for instance, we all wonder how our upper echelon compares to the crème de la crème of the educational systems of other nations and other times; are we producing more or fewer scientific geniuses, and so forth) and the latter task is profoundly depressing, but probably no one will be deluded into believing that we are studying anything like what the typical or average American learns.
It may interest you to know that the rich range of statistical techniques that can be so fruitfully applied to studying distributions of real-world populations can be equally well applied to study the distribution of values of arithmetic functions. Indeed, this is a flourishing subbranch of analytic number theory: statistical number theory. Here we have time only to sample some of the main developments of analytic number theory. And, what is yet more sad, we cannot assume that we know enough about these statistical tools to apply them in all their glory.¹ But probably we are all familiar with the notion of an average.
The idea here is that if $f(n)$ is irregularly behaved, we can smooth it out by considering its successive averages, say
\[ f_a(n) = \frac{1}{n} \sum_{k=1}^{n} f(k). \]
¹ This is one case where by "we" I mean "me"; maybe your knowledge of statistics is equal to the task, but I must confess that mine is not.
We have every right to expect $f_a$ to be better behaved than $f$ itself, and we say that the average order of $f$ is some (presumably simpler) function $g$ if $f_a \sim g$.
As a sample we give the following classic result:
Theorem 117. The average order of the totient function is $g(n) = \frac{1}{2\zeta(2)}\, n = \frac{3}{\pi^2}\, n$.
Thus in some sense the typical value of $\varphi(k)$ for $k \leq n$ is about $.304\,n$; since a typical such $k$ has size about $n/2$, this suggests that the typical value of $\varphi(n)/n$ is about $6/\pi^2 \approx .608$. It would be nice to interpret this as saying that if we pick an $n$ at random, then with large probability $\varphi(n)/n$ is close to $.608$, but of course we well know that the average (i.e., the arithmetic mean) does not work that way. Just because the average score on an exam is 78 does not mean that most students got a grade close to 78. (Perhaps 2/3 of the course got A grades and the other third failed; these things do happen.)
Nevertheless it is an interesting result, and from it we will derive a very interesting consequence.
First however it is nice to have a "harder analysis" analogue of Theorem 117. That is, the theorem at the moment asserts that $\frac{1}{n} \sum_{k=1}^{n} \varphi(k) \sim \frac{3}{\pi^2}\, n$, or equivalently
\[ \lim_{n \to \infty} \frac{\sum_{k=1}^{n} \varphi(k)}{\frac{3}{\pi^2} n^2} = 1. \]
This in turn means that if we define the error term $E(n) = \sum_{k=1}^{n} \varphi(k) - \frac{3}{\pi^2} n^2$, so that $\sum_{k=1}^{n} \varphi(k) = \frac{3}{\pi^2} n^2 + E(n)$, then the error term is small compared to the main term: namely it is equivalent to
\[ \lim_{n \to \infty} \frac{E(n)}{(3/\pi^2)\, n^2} = 0. \]
So far we are just pushing around the definitions. But a fundamentally better thing to do would be to give an upper bound on the error $E(n)$, i.e., to find a nice, simple function $h(n)$ such that $|E(n)| \leq h(n)$ and $\frac{h(n)}{(3/\pi^2)\, n^2} \to 0$. In fact we do not need $|E(n)| \leq h(n)$ quite: if $|E(n)|$ were less than $100h(n)$ that would be just as good, because if $h(n)$ divided by $(3/\pi^2)\, n^2$ approaches zero, the same would hold for $100h(n)$. This motivates the following notation:
We say that $f(n) = O(g(n))$ if there exists a constant $C$ such that for all $n$, $|f(n)| \leq C g(n)$. So it would be enough to show that $E(n) = O(h(n))$ for some function $h$ which approaches zero when divided by $\frac{3}{\pi^2} n^2$, or in more colloquial language, for any function which grows less than quadratically. So for instance a stronger statement would be that $E(n) = O(n^{\delta})$ for some $\delta < 2$. In fact one can do a bit better than this:
Theorem 118.
\[ \sum_{k=1}^{n} \varphi(k) = \frac{3}{\pi^2}\, n^2 + O(n \log n). \]
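To get a feeling for the size of the error term, one can tabulate $\sum_{k \leq n} \varphi(k)$ numerically. The sketch below is illustrative only: the function names are ours, and the totients are computed with a standard sieve rather than by any method from these notes.

```python
import math


def totients(n):
    """Return a list phi with phi[k] = Euler's totient of k for 0 <= k <= n."""
    phi = list(range(n + 1))
    for p in range(2, n + 1):
        if phi[p] == p:                      # untouched so far, hence p is prime
            for m in range(p, n + 1, p):
                phi[m] -= phi[m] // p        # multiply by (1 - 1/p)
    return phi


def error_term(n):
    """E(n) = sum_{k <= n} phi(k) - (3/pi^2) n^2."""
    phi = totients(n)
    return sum(phi[1:]) - 3 * n * n / math.pi ** 2
```

Comparing `abs(error_term(n))` with `n * math.log(n)` for a range of `n` gives numerical evidence consistent with the theorem, though of course it proves nothing and says nothing about the value of the implied constant.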
Or again, dividing through by $n$, this theorem asserts that the average over the first $n$ values of the $\varphi$ function is very nearly $\frac{3}{\pi^2}\, n$, and the precise sense in which this is true is that the difference between the two is bounded by a constant times $\log n$. Note that it would be best of all to know an actual acceptable value of such
\[ \lim_{N \to \infty} \frac{V(N)}{L(N)} = \frac{6}{\pi^2}. \]
Before we prove the result, we can state it in a slightly different but equally striking way. We are asking after all for the number of ordered pairs of integers $(x, y)$ each of absolute value at most $N$, with $x$ and $y$ relatively prime. So, with a bit of poetic license perhaps, we are asking: what is the probability that two randomly chosen integers are relatively prime? If we lay down the ground rules that we are randomly choosing $x$ and $y$ among all integers of size at most $N$, then the astonishing answer is that we can make the probability as close to $\frac{6}{\pi^2}$ as we wish by taking $N$ sufficiently large.
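Now let us prove the result, or at any rate deduce it from Theorem 117. First we observe that the eight lattice points immediately nearest the origin (i.e., those with $\max(|x|, |y|) = 1$) are all visible. Indeed there is an eightfold symmetry in the situation: the total number of visible lattice points in the square $|x|, |y| \leq N$ will then be these 8 plus 8 times the number of visible lattice points with $2 \leq x \leq N$, $1 \leq y \leq x$ (i.e., the ones whose angular coordinate $\theta$ satisfies $0 < \theta \leq \frac{\pi}{4}$). But now we have
\[ V(N) = 8 + 8 \sum_{2 \leq n \leq N} \ \sum_{1 \leq m \leq n,\ (m,n)=1} 1 = 8 \sum_{1 \leq n \leq N} \varphi(n). \]
Aha: we know that $\sum_{1 \leq n \leq N} \varphi(n) = \frac{3}{\pi^2} N^2 + O(N \log N)$, so
\[ \left| \frac{V(N) - \frac{24}{\pi^2} N^2}{L(N)} \right| \leq C\, \frac{N \log N}{L(N)}. \]
But now $L(N) = (2N+1)^2$, and $C\, \frac{N \log N}{N^2} \to 0$ as $N \to \infty$, so we find that
\[ 0 = \lim_{N \to \infty} \frac{V(N) - \frac{24}{\pi^2} N^2}{(2N+1)^2}, \]
or
\[ \lim_{N \to \infty} \frac{V(N)}{L(N)} = \lim_{N \to \infty} \frac{\frac{24}{\pi^2}}{\left(2 + \frac{1}{N}\right)^2} = \frac{6}{\pi^2}. \]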
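The limit can be observed numerically by brute force. This Python sketch is an illustration under the conventions of the text (a lattice point $(x, y)$ counts as visible when $\gcd(x, y) = 1$, and $L(N) = (2N+1)^2$); the function name is ours.

```python
from math import gcd, pi


def visible_ratio(N):
    """Return (V(N), V(N)/L(N)) for the square |x|, |y| <= N."""
    visible = sum(
        1
        for x in range(-N, N + 1)
        for y in range(-N, N + 1)
        if gcd(x, y) == 1          # gcd(0, 0) == 0, so the origin is excluded
    )
    return visible, visible / (2 * N + 1) ** 2
```

Already for $N = 200$ the ratio agrees with $6/\pi^2 \approx 0.6079$ to about two decimal places, and `visible_ratio(1)` recovers the eight nearest neighbors of the origin.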
Having given a formal proof of this result based upon the unproved Theorem 117 (trust me that this theorem is not especially difficult to prove; it just requires a bit more patience than we have at the moment), let us now give a proof which is not rigorous but is extremely interesting and enlightening. Namely, what does it really mean for two integers $x$ and $y$ to be relatively prime? It means that there is no prime number $p$ which simultaneously divides both $x$ and $y$. Remarkably, this observation leads directly to the result. Namely, the chance that $x$ is divisible by a prime $p$ is evidently $\frac{1}{p}$, so the chance that $x$ and $y$ are both divisible by $p$ is $\frac{1}{p^2}$. Therefore the chance that $x$ and $y$ are not both divisible by a prime $p$ is $(1 - p^{-2})$. Now we think of being divisible by different primes as being independent events: if I tell you that an integer is divisible by 3, not divisible by 5 and divisible by 7, and ask you what are the odds it's divisible by 11, then we still think the chance is $\frac{1}{11}$. Now the probability that each of a set of independent events all occur is the product of the probabilities that each of them occur, so the probability that $x$ and $y$ are not simultaneously divisible by any prime $p$ ought to be
\[ \prod_{p} (1 - p^{-2}) = (1 - 2^{-2})(1 - 3^{-2}) \cdots (1 - p^{-2}) \cdots, \]
but we saw earlier that this infinite product is nothing else than the reciprocal of $\sum_{n=1}^{\infty} \frac{1}{n^2} = \zeta(2)$. Thus the answer should be $\frac{1}{\zeta(2)} = \frac{6}{\pi^2}$!
The argument was not rigorous because the integers are not really a probability space: there is nothing random about whether, say, 4509814091046 is divisible by 103; either it is or it isn't. Instead of probability one should rather work with the notion of the density of a set of integers (or of a set of pairs of integers), a notion which we shall introduce rather soon, and then all is well until we pass from sets defined by divisibility conditions on finitely many primes to divisibility conditions on all (infinitely many) primes. This is not to say that such a probability-inspired proof cannot be pushed through: it absolutely can. Moreover, the fact that the probabilistic argument gives an answer which can be proven to be the correct answer via conventional means is perhaps most interesting of all.
Finally, we note that probabilistic reasoning gives the same answer to a closely related question: what is the probability that a large positive integer $n$ is squarefree? This time we want, for each prime $p$ independently, $n$ not to be divisible by $p^2$, which is the case for a proportion $1 - p^{-2}$ of all integers. Therefore we predict that the probability that $n$ is squarefree is also $\frac{6}{\pi^2}$, and this too can be proved by similar (although not identical) means to the proof of Theorem 117.
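The squarefree prediction is also easy to test empirically. Here is a small Python sketch (the function name is ours) which sieves out the multiples of every square $m^2 > 1$ and counts what remains.

```python
from math import isqrt, pi


def squarefree_count(N):
    """Number of squarefree integers n with 1 <= n <= N."""
    is_squarefree = [True] * (N + 1)
    for m in range(2, isqrt(N) + 1):
        # Cross out every multiple of m^2; composite m are redundant but harmless.
        for multiple in range(m * m, N + 1, m * m):
            is_squarefree[multiple] = False
    return sum(is_squarefree[1:])
```

For $N = 100$ this gives 61, and the proportion `squarefree_count(N) / N` settles down near $6/\pi^2 \approx 0.6079$ as `N` grows.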
6.1. The average order of the Möbius function. We are interested in the behavior of
\[ \mu_a(n) = \frac{1}{n} \sum_{k=1}^{n} \mu(k). \]
CHAPTER 10
After the previous section we well know that one can ask for more, namely for the asymptotic behavior (if any) of $\pi(n)$. The asymptotic behavior is known (the celebrated Prime Number Theorem, coming up soon) but it admits no proof simple enough to be included in this course. So it is of interest to see what kind of bounds (if any!) we get from some of the proofs of the infinitude of primes we shall discuss.
1.1. Euclid's proof. We recall Euclid's proof. There is at least one prime, namely $p_1 = 2$, and if $p_1, \ldots, p_n$ are any $n$ primes, then consider
\[ N_n = p_1 \cdots p_n + 1. \]
This number $N_n$ may or may not be prime, but being at least 3 it is divisible by some prime number $q$, and we cannot have $q = p_i$ for any $i$: if so $p_i \mid p_1 \cdots p_n$ and $p_i \mid N_n$ implies $p_i \mid 1$. Thus $q$ is a new prime, which means that given any set of $n$ distinct primes we can always find a new prime not in our set: therefore there are infinitely many primes.
Comments: (i) Euclid's proof is often said to be "indirect" or "by contradiction", but this is unwarranted: given any finite set of primes $p_1, \ldots, p_n$, it gives a perfectly definite procedure for constructing a new prime.
(ii) Indeed, if we define $E_1 = 2$, and having defined $E_1, \ldots, E_n$, we define $E_{n+1}$ to be the smallest prime divisor of $E_1 \cdots E_n + 1$, we get a sequence of distinct prime numbers, nowadays called the Euclid sequence (of course we could get a different sequence by taking $E_1$ to be a prime different from 2). The Euclid sequence begins
\[ 2, 3, 7, 43, 13, 53, 5, \ldots \]
Many more terms can be found in the On-Line Encyclopedia of Integer Sequences. The obvious question (does every prime occur eventually in the Euclid sequence with $E_1 = 2$, or in any Euclid sequence?) remains unanswered.
(iii) It is certainly a classic proof, but it is not aesthetically perfect (whatever that may mean). Namely, there is a moment when the reader wonders "hey, why are we multiplying together the known primes and adding one?" One can address this by pointing out in advance the key fact that $\gcd(n, n+1) = 1$ for all $n$. Therefore if there were only finitely many primes $p_1, \ldots, p_r$, there would be an integer divisible by all of them, $N = p_1 \cdots p_r$, and then the fact that $\gcd(N, N+1) = 1$ leads to a contradiction. I do like this latter version better, but it is really just a rewording of Euclid's proof.¹
(iv) Euclid's proof can be used to prove some further results. For instance:
Theorem 121. Fix a positive integer $N > 2$. Then there are infinitely many primes $p$ which are not congruent to 1 (mod $N$).
Proof. Take $p_1 = 2$, which is not congruent to 1 (mod $N$). Assume that $p_1, \ldots, p_n$ is a list of $n$ primes, none of which are 1 (mod $N$). Now consider the number
\[ P_n := N p_1 \cdots p_n - 1. \]
$P_n \geq N - 1 \geq 2$, so it has a prime divisor. Also $P_n \equiv -1 \pmod{N}$. So if every prime divisor $q$ of $P_n$ were 1 mod $N$, then so would $P_n$ be 1 (mod $N$), which it isn't (here we use $N > 2$, so that $-1 \not\equiv 1 \pmod{N}$); therefore $P_n$ has at least one prime divisor $q$ which is not 1 (mod $N$). As above, clearly $q \neq p_i$ for any $i$, which completes the proof.
In fact this argument can be adapted to prove the following generalization.
Theorem 122. Fix a positive integer $N > 2$, and let $H$ be a proper subgroup of $U(N) = (\mathbb{Z}/N\mathbb{Z})^{\times}$. There are infinitely many primes $p$ such that $p \pmod{N} \notin H$.
The proof is left as an exercise. (Suggestion: fix $a \in \mathbb{Z}^+$, $1 < a < N$, such that $a \pmod{N} \notin H$. Take $P_0 = 2N + a$ and for $n \geq 1$, $P_n = (2N \prod_{i=1}^{n} p_i) + a$.)
Remark: If $\varphi(N) = 2$ (that is, for $N = 3$, 4, or 6) then $\pm 1$ gives a reduced residue system modulo $N$, so that any prime $p > N - 1$ which is not 1 (mod $N$) is necessarily $-1$ (mod $N$). Thus the argument shows that there are infinitely many primes $p$ which are $-1$ (mod 3), $-1$ (mod 4) or $-1$ (mod 6).
Remark: Interestingly, one can also prove without too much trouble that there are infinitely many primes $p \equiv 1 \pmod{N}$: the proof uses cyclotomic polynomials.
Theorem 123. For any field $F$, there are infinitely many irreducible polynomials over $F$, i.e., infinitely many irreducible elements in $F[t]$.
Proof. Euclid's argument works here: take e.g. $p_1(t) = t$, and having produced $p_1(t), \ldots, p_r(t)$, consider the irreducible factors of $p_1(t) \cdots p_r(t) + 1$.
Note that one can even conclude that there are infinitely many prime ideals in $F[t]$; equivalently, there are infinitely many monic irreducible polynomials. When $F$ is infinite, the monic polynomials $t - a$ for $a \in F$ do the trick. When $F$ is finite, we showed there are infinitely many irreducible polynomials, but there are only $\#F - 1$ different leading coefficients, so there must be infinitely many monic irreducible polynomials. It is interesting to think about why this argument does not work in an arbitrary PID.²
¹ I have heard this argument attributed to the great 19th century algebraist E. Kummer. For what little it's worth, I believe I came up with it myself as an undergraduate. Surely many others have had similar thoughts.
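Returning to comment (ii) above: the first terms of the Euclid sequence can be generated directly from the definition. The following Python sketch is only an illustration (the function names are ours), and it uses naive trial division, so it is practical only for the first several terms; the products grow quickly and later terms require serious factoring.

```python
def smallest_prime_factor(n):
    """Smallest prime divisor of n >= 2, by trial division."""
    p = 2
    while p * p <= n:
        if n % p == 0:
            return p
        p += 1
    return n                      # n itself is prime


def euclid_sequence(count, start=2):
    """First `count` terms of the Euclid sequence beginning with the prime `start`."""
    terms = [start]
    product = start
    while len(terms) < count:
        nxt = smallest_prime_factor(product + 1)
        terms.append(nxt)
        product *= nxt
    return terms
```

Calling `euclid_sequence(7)` reproduces the terms 2, 3, 7, 43, 13, 53, 5 quoted above.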
1.2. Fermat numbers. Another way to construe Euclid's proof is that it suffices to find an infinite sequence $n_i$ of pairwise coprime positive integers greater than 1, because these integers must be divisible by different primes. The Euclid sequence is such a sequence. A more natural looking sequence is the following: the Fermat numbers $F_n = 2^{2^n} + 1$ are pairwise coprime. To see this, we claim that for all $n \geq 1$,
\[ F_n = \prod_{d=0}^{n-1} F_d + 2. \]
This certainly suffices, since if $p$ is some common prime divisor of $F_d$ (for any $d < n$) and $F_n$ then $p \mid F_n - 2$, hence $p \mid 2$, but all the Fermat numbers are odd. The claim itself can be established by induction; we leave it to the reader.
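Before attempting the induction, it may be reassuring to check the claim for the first few Fermat numbers. This Python sketch is an illustration only (the function names are ours, and `math.prod` requires Python 3.8 or later); it verifies both the product identity and pairwise coprimality for small indices.

```python
from math import gcd, prod


def fermat(n):
    """The n-th Fermat number F_n = 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1


def check_fermat(limit):
    """Check the identity and pairwise coprimality for indices below `limit`."""
    identity_ok = all(
        fermat(n) == prod(fermat(d) for d in range(n)) + 2
        for n in range(1, limit)
    )
    coprime_ok = all(
        gcd(fermat(m), fermat(n)) == 1
        for n in range(limit)
        for m in range(n)
    )
    return identity_ok, coprime_ok
```

A finite check like `check_fermat(8)` is of course no substitute for the proof, but it makes the pattern $F_n - 2 = F_0 F_1 \cdots F_{n-1}$ concrete.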
1.3. Mersenne numbers. Recall that Fermat believed that all the Fermat numbers were prime, and this is not true, since e.g.
\[ F_5 = 2^{2^5} + 1 = 641 \cdot 6700417, \]
and in fact there are no known prime Fermat numbers beyond $F_4$. Nevertheless the previous proof shows that there is something to Fermat's idea: namely, they are "almost prime" in the sense that no two of them have a common divisor greater than 1. One then wonders whether one can devise a proof of the infinitude of the primes using the Mersenne numbers $2^p - 1$, despite the fact that it is unknown whether there are infinitely many Mersenne primes. This can indeed be done:
Let $p$ be a prime (e.g. $p = 2$, as usual) and $q$ a prime divisor of $2^p - 1$. Then $2^p \equiv 1 \pmod{q}$. In other words, $p$ is a multiple of the order of 2 in the cyclic group $(\mathbb{Z}/q\mathbb{Z})^{\times}$. Since $p$ is prime and the order is not 1 (because $2 \not\equiv 1 \pmod{q}$), the order of 2 must be exactly $p$. But by Lagrange's theorem, the order of an element divides the order of the group, which is $\varphi(q) = q - 1$, so $p \mid q - 1$ and hence $p < q$. Thus we have produced a prime larger than the one we started with.
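The argument predicts that every prime divisor $q$ of $2^p - 1$ satisfies $q \equiv 1 \pmod{p}$. A quick Python check for small primes $p$ follows; it is a sketch with our own function names, using trial division, so it is only suitable for small exponents.

```python
def smallest_prime_factor(n):
    """Smallest prime divisor of n >= 2, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def mersenne_witness(p):
    """Smallest prime divisor q of 2^p - 1, for a prime p."""
    return smallest_prime_factor(2 ** p - 1)
```

For example, `mersenne_witness(11)` is 23, since $2^{11} - 1 = 2047 = 23 \cdot 89$, and indeed $23 = 2 \cdot 11 + 1$.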
1.4. Euler's first proof. It is a remarkable fact that the formal identity
\[ \prod_{p} \left(1 - \frac{1}{p}\right)^{-1} = \sum_{n} \frac{1}{n}, \]
which amounts to unique factorization, immediately implies the infinitude of primes. Indeed, on the left-hand side we have a possibly infinite product, and on the right-hand side we have an infinite sum. But the infinite sum is well-known to be divergent, hence the product must be divergent as well, but if it were a finite product it would certainly be convergent!
² For there are PIDs with only finitely many prime ideals: e.g. the set of rational numbers whose reduced denominator is prime to 42 is a PID with exactly three nonzero prime ideals.
Many times in the course we have seen a rather unassuming bit of abstract algebra turned into a mighty number-theoretic weapon. This example shows that the same can be true of analysis.
1.5. Chaitin's proof. In his most recent book³ the computer scientist Gregory Chaitin announces an "algorithmic information theory" proof of the infinitude of primes. He says: if there were only finitely many primes $p_1, \ldots, p_k$ then every positive integer $N$ could be written as
\[ N = p_1^{a_1} \cdots p_k^{a_k}, \]
which is "too efficient" a way of representing all large integers $N$. Chaitin compares his proof with Euclid's proof and Euler's proof (with a grandiosity that I confess I find unjustified and unbecoming). But criticism is cheaper than understanding: can we at least make sense of his argument?
Let us try to estimate how many integers $n$, $1 \leq n \leq N$, could possibly be expressed in the form $p_1^{a_1} \cdots p_k^{a_k}$, i.e., as products of powers of a fixed set of $k$ primes. In order for this expression to be at most $N$, every exponent has to be much smaller than $N$: precisely we need $0 \leq a_i \leq \log_{p_i} N$; the latter quantity is at most $\log_2 N$, so there are at most $\log_2 N + 1$ choices for each exponent, or $(\log_2 N + 1)^k$ choices overall. But aha: this latter quantity is much smaller than $N$ when $N$ is itself large. It is indeed the case that the proportion of integers up to $N$ which we can express as a product of powers of any $k$ fixed primes tends to 0 as $N$ approaches infinity.
So Chaitin's proof is indeed correct and has a certain admirable directness to it.
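The counting estimate can be compared with an exact count. In the following Python sketch (our own function names; the choice of the primes 2, 3, 5 in the example is arbitrary) we count the integers up to $N$ built from a fixed set of primes and compare with the bound $(\log_2 N + 1)^k$.

```python
import math


def count_smooth(N, primes):
    """Count 1 <= n <= N all of whose prime factors lie in `primes`."""
    count = 0
    for n in range(1, N + 1):
        m = n
        for p in primes:
            while m % p == 0:     # strip out every factor of p
                m //= p
        if m == 1:                # nothing left over: n is a product of the given primes
            count += 1
    return count


def chaitin_bound(N, k):
    """The upper bound (log_2 N + 1)^k from the text."""
    return (math.log2(N) + 1) ** k
```

With `primes = [2, 3, 5]` there are only 34 such integers up to 100, and the fraction of integers up to $N$ that qualify shrinks rapidly as $N$ grows, as the argument requires.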
1.6. Another important proof. However, the novelty of Chaitin's proof is less clear. Indeed, in many standard texts (including Hardy and Wright, which was first written in 1938), one finds the following argument, which is really a more sophisticated version of Chaitin's proof.
Again, we will fix $k$ and estimate the number of integers $1 \leq n \leq N$ which are divisible only by the first $k$ primes $p_1, \ldots, p_k$, but this time we use a clever trick: recall that $n$ can be written uniquely as $uv^2$ where $u$ is squarefree. The number of squarefree $u$'s (however large!) which are divisible only by the first $k$ primes is $2^k$ (for each $p_i$, we either choose to include it or not). On the other hand, $n = uv^2 \leq N$ implies that $v^2 \leq N$ and hence $v \leq N^{\frac{1}{2}}$. Hence the number of $n \leq N$ divisible only by the first $k$ primes is at most $2^k \sqrt{N}$. If there are $k$ primes less than or equal to $N$, we therefore have
\[ 2^k \sqrt{N} \geq N \]
or
\[ k \geq \frac{\log_2(N)}{2}. \]
This shows the infinitude of the primes in $\mathbb{Z}$, since we saw that $\mathbb{Z}[\sqrt{-5}]$ is not a PID!
The proof of Theorem 125 lies further up and further in the realm of algebraic number theory than we dare to tread in this course. But here is a sketch of a proof for the slummers⁴: the ring $S$ is a Dedekind domain, so for any nonzero prime ideal $\mathfrak{p}$ of $R$, $\mathfrak{p}S$ is a nontrivial finite product of powers of prime ideals. The distinct prime ideals $\mathcal{P}_i$ appearing in this factorization are precisely the prime ideals $\mathcal{P}$ lying over $\mathfrak{p}$, i.e., such that $\mathcal{P} \cap R = \mathfrak{p}$. This shows that the restriction map $\mathcal{P} \mapsto \mathcal{P} \cap R$ from prime ideals of $S$ to prime ideals of $R$ has finite fibers. Thus, since by assumption there are only finitely many prime ideals of $R$, there are only finitely many prime ideals of $S$. Finally, a Dedekind domain with only finitely many prime ideals is necessarily a PID, as can be shown using the Chinese Remainder Theorem.
This is a proof with a moral: we need to have infinitely many primes in order for number theory to be as complicated as it is.
1.8. Furstenberg's proof. The last proof we will give is perhaps the most remarkable one. In the 1955 issue of the American Mathematical Monthly there appeared the following article by Hillel Furstenberg, which we quote in its entirety:
"In this note we would like to offer an elementary "topological" proof of the infinitude of the prime numbers. We introduce a topology into the space of integers $S$, by using the arithmetic progressions (from $-\infty$ to $+\infty$) as a basis. It is not difficult to verify that this actually yields a topological space. In fact under this topology $S$ may be shown to be normal and hence metrizable. Each arithmetic progression is closed as well as open, since its complement is the union of other arithmetic progressions (having the same difference). As a result the union of any finite number of arithmetic progressions is closed. Consider now the set $A = \bigcup A_p$, where $A_p$ consists of all multiples of $p$, and $p$ runs though the set of primes $\geq 2$. The only numbers not belonging to $A$ are $-1$ and $1$, and since the set $\{-1, 1\}$ is clearly not an open set, $A$ cannot be closed. Hence $A$ is not a finite union of closed sets which proves that there are an infinity of primes."
⁴ i.e., more advanced readers who are reading these notes
Remarks: Furstenberg was born in 1935, so this ranks as one of the leading instances of undergraduate mathematics in the 20th century. He is now one of the
leading mathematicians of our day. What is all the more remarkable is that this
little argument serves as a preview of the rest of his mathematical career, which has
concentrated on applying topological and dynamical methods (ergodic theory) to
the study of problems in number theory and combinatorics.
2. Bounds
Let us now go through some of these proofs and see what further information, if any, they yield on the function $\pi(n)$.
1. From Euclids proof one can deduce that (n) C log log n. We omit the
argument, especially since the same bound follows more readily from the Fermat
numbers proof. Of course this is a horrible bound.
2. The Mersenne numbers proof gives, I believe, an even worse (iterated logarithmic) bound. I leave it to the reader to check this.
3. Euler's first proof does not immediately come with a bound attached to it. However, as we saw earlier in our study of the $\varphi$ function, it really shows that
\[ \prod_{i=1}^{r} \left(1 - \frac{1}{p_i}\right)^{-1} > \sum_{i=1}^{r} \frac{1}{i} \geq C \log r. \]
After some work, one can deduce from this that
\[ \sum_{i=1}^{n} \frac{1}{p_i} \geq C \log \log n, \]
whence the divergence of the prime reciprocals. We will not enter into the details.
4. Chaitin's proof gives a lower bound on $\pi(n)$ which is between $\log \log n$ and $\log n$ (but much closer to $\log n$).
5. As we saw, one of the merits of the proof of §1.6 is that one easily deduces the bound $\pi(n) \geq \frac{\log_2 n}{2}$. (Of course, this is still almost a full exponential away from the truth.)
6. As we mentioned, knowing that the prime reciprocals diverge suggests that $\pi(n)$ is at worst only slightly smaller than $n$ itself. It shows that $\pi(n)$ is not bounded above by any power function $Cn^{\delta}$ for $\delta < 1$.
7. The last two proofs give no bounds whatsoever, not even implicitly. This seems to make them the worst, but there are situations in which one wants to separate out the problem of proving the infinitude of a set of numbers from the problem of estimating its size, the latter problem being either not of interest or (more often) hopelessly out of current reach. In some sense all of the arguments except the last two are implicitly trying to prove too much in that they give lower bounds on $\pi(n)$. Trying to prove more than what you really want is often a very good technique in mathematics, but sometimes, when the problem is really hard, making sure that you are concentrating your efforts solely on the problem at hand is also a key idea. At any rate, there are many problems in analytic combinatorics for which Furstenberg-type existence proofs either were derived long before the explicit lower bounds (which require much more complicated machinery) or are, at present, the only proofs which are known.
3. The Density of the Primes
All of the results of the previous section were lower bounds on $\pi(x)$. It is also of interest to give an upper bound on $\pi(x)$ beyond the "trivial" bound $\pi(x) \leq x$. The following gives such a result.
Theorem 126. As $n \to \infty$, we have
\[ \frac{\pi(n)}{n} \to 0. \]
If you like, this result expresses that the probability that a randomly chosen positive integer is prime is 0. We will come back to this idea after the proof, replacing "probability" by the more precise term "density."
Proof. Let us first observe that there are at most $\frac{N}{2} + 1$ primes in the interval $[1, N]$ since all but one of them must be odd. Similarly, since only one prime is divisible by 3, every prime $p > 6$ must be of the form $6k + 1$ or $6k + 5$, i.e., only 2 of the 6 residue classes mod 6 can contain more than one prime (in fact some of them, like 4, cannot contain any primes, but we don't need to worry about this), so that of the integers $n \leq N$, at most $\frac{2}{6} N + 6 + 2$ are primes.
In fact this simple reasoning can be carried much farther, using what we know about the $\varphi$ function. Namely, for any positive integer $d$, if $\gcd(a, d) > 1$ there is at most one prime $p \equiv a \pmod{d}$.⁵ In other words, only $\varphi(d)$ out of $d$ congruence classes mod $d$ can contain more than one prime, so at most $\frac{\varphi(d)}{d} N + d + \varphi(d)$ of the integers $1 \leq n \leq N$ can possibly be prime. (Here we are adding $d$ to take care of the one prime that might exist in each congruence class, and adding $\varphi(d)$ to take care of the fact that since $N$ need not be a multiple of $d$, the "partial congruence class" at the end may contain a higher frequency of primes than $\varphi(d)/d$, but of course no more than $\varphi(d)$ primes overall.) But we know, thank goodness, that for every $\epsilon > 0$, there exists a $d$ such that $\frac{\varphi(d)}{d} < \epsilon$, and choosing this $d$ we find that the proportion of primes $n \leq N$ satisfies
\[ \frac{\pi(N)}{N} \leq \frac{\epsilon N + d + \varphi(d)}{N} = \epsilon + \frac{d + \varphi(d)}{N}. \]
This approaches $\epsilon$ as $N \to \infty$, so is, say, less than $2\epsilon$ for all sufficiently large $N$.
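The counting argument in the proof is easy to test numerically. The following is only a sketch: the function names (`prime_count`, `phi`, `proof_bound`) are my own illustrative choices, and the moduli $d$ are taken to be products of the first few primes, as in the remark below.

```python
from math import gcd, isqrt

def prime_count(N):
    """Return pi(N), the number of primes <= N, via a sieve of Eratosthenes."""
    if N < 2:
        return 0
    sieve = bytearray([1]) * (N + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(N) + 1):
        if sieve[p]:
            # Cross out p^2, p^2 + p, ... (all composite).
            sieve[p * p :: p] = bytearray(len(range(p * p, N + 1, p)))
    return sum(sieve)

def phi(d):
    """Euler's totient by direct count (adequate for small d)."""
    return sum(1 for a in range(1, d + 1) if gcd(a, d) == 1)

def proof_bound(N, d):
    """The bound (phi(d)/d) * N + d + phi(d) on pi(N) used in the proof."""
    return phi(d) / d * N + d + phi(d)

if __name__ == "__main__":
    N = 10 ** 6
    count = prime_count(N)
    for d in (2, 6, 30, 210, 2310):
        # Columns: d, phi(d)/d, pi(N), the bound from the proof.
        print(d, phi(d) / d, count, proof_bound(N, d))
```

As $d$ runs through products of more primes, $\varphi(d)/d$ shrinks and the bound improves, but only slowly, which is consistent with the weak explicit estimate mentioned below.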
Remark: Reflecting on the proof, something slightly strange has happened: we showed that $\varphi(d)/d$ got arbitrarily small by evaluating at $d = p_1 \cdots p_r$, the product of the first $r$ primes. Thus, in order to show that the primes are relatively sparse, we used the fact that there are infinitely many of them!
5 Recall this is true because if $x \equiv a \pmod{d}$, then $\gcd(a, d) \mid d \mid x - a$ and $\gcd(a, d) \mid a$, so $\gcd(a, d) \mid x$.
In fact, by similarly elementary reasoning, one can prove a more explicit result, that $\pi(n) \leq \frac{Cn}{\log \log n}$. Before moving on to discuss some similar and stronger statements about the order of magnitude of $\pi(n)$, let us digress a bit on the notion of density of a set of integers.
Definition: A subset $A$ of the positive integers is said to have density $\delta(A) = \alpha$ if
\[ \lim_{N \to \infty} \frac{\#\{1 \leq n \leq N \mid n \in A\}}{N} = \alpha. \]
\[ P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i). \]
Our density function satisfies finite additivity but not countable additivity: indeed, if we took $A_i$ to be the singleton set $\{i\}$, then certainly $\delta(A_i) = 0$ for all $i$ but the union of all the $A_i$'s is the set of positive integers itself, so has density 1. This is the problem: for a probability measure we cannot have (countably!) infinitely many sets of measure zero adding up to a set of positive measure, but this happens for densities.
A similar problem occurs in our "proof" that the squarefree integers have density $\frac{6}{\pi^2}$. The set $S_{p^2}$ of integers which are not multiples of $p^2$ has density $1 - \frac{1}{p^2}$, and it is indeed true that these sets are "finitely independent" in the sense that the intersection of any finite number of them has density equal to the product of the densities of the component sets:
\[ \delta(S_{p_1^2} \cap \ldots \cap S_{p_n^2}) = \prod_{i=1}^{n} \left(1 - \frac{1}{p_i^2}\right). \]
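Both statements can be checked by brute force. The sketch below uses helper names of my own (`squarefree_flags`, `squarefree_density`, `density_avoiding_squares`); it compares the proportion of squarefree $n \leq N$ with $6/\pi^2$ and tests finite independence for the two primes 2 and 3.

```python
from math import pi

def squarefree_flags(N):
    """flags[n] is True iff n is squarefree, for 1 <= n <= N (index 0 is unused)."""
    flags = [True] * (N + 1)
    k = 2
    while k * k <= N:
        # Every multiple of k^2 fails to be squarefree.
        for m in range(k * k, N + 1, k * k):
            flags[m] = False
        k += 1
    return flags

def squarefree_density(N):
    """Proportion of squarefree integers among 1..N."""
    return sum(squarefree_flags(N)[1:]) / N

def density_avoiding_squares(N, primes):
    """Proportion of 1 <= n <= N divisible by none of p^2 for p in primes."""
    return sum(all(n % (p * p) for p in primes) for n in range(1, N + 1)) / N

if __name__ == "__main__":
    for N in (10 ** 3, 10 ** 4, 10 ** 5):
        print(N, squarefree_density(N), 6 / pi ** 2)
    # Finite independence for p = 2, 3: compare with (1 - 1/4)(1 - 1/9).
    print(density_avoiding_squares(3600, (2, 3)), (1 - 1 / 4) * (1 - 1 / 9))
```

Taking $N$ to be a multiple of $36$ makes the finite check exact, since divisibility by 4 and by 9 is periodic mod 36.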
4. Substance
Let us define a subset $S$ of the positive integers to be substantial if
\[ \sum_{n \in S} \frac{1}{n} = \infty. \]
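\[ \sum_{n \in S} \frac{1}{n} + \sum_{n \in S'} \frac{1}{n} = \sum_{n \in \mathbb{Z}^+} \frac{1}{n} = \infty. \]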
So there are plenty of substantial subsets. It is certainly possible for both $S$ and $S'$ to be substantial: take, e.g. the set of even numbers (or any $S$ with $0 < \delta(S) < 1$: see below).
Example 4: For any fixed $k > 1$, the set of all perfect $k$th powers is not substantial: by (e.g.) the Integral Test, $\sum_{n=1}^{\infty} \frac{1}{n^k} < \infty$.
Example 5: The set of integers whose first decimal digit is 1 is substantial.
Example 6: Indeed any set S with positive upper density is substantial. This
is elementary but rather tricky to show, and is left as a (harder) exercise.
The converse does not hold. Indeed, we saw above that the primes have zero
density, but we will now establish the following:
But this says that for any $N$, at least half of all positive integers are divisible by one of the first $k$ primes, which the argument of §1.6 showed not to be the case.
Remarks: Maybe this is the best "elementary" proof of the infinitude of the primes. Aside from being an elegant and interesting argument, it is a quantum leap beyond the previous results: since for any $k \geq 2$, $\sum_n \frac{1}{n^k}$ converges, it shows that there are, in some sense, many more primes than perfect squares. In fact it implies that there is no $\delta < 1$ and constant $C$ such that $\pi(n) \leq Cn^{\delta}$ for all $n$, so that if $\pi(n)$ is well-behaved enough to have a "true order of magnitude" then its true order is rather close to $n$ itself.
A striking substance-theoretic result that we will not be able to prove here:
Theorem 128. (Brun) The set $T$ of "twin primes" (i.e., primes $p$ for which at least one of $p - 2$ and $p + 2$ is prime) is insubstantial.
In a sense, this is disappointing, because we do not know whether $T$ is infinite, whereas if $T$ had turned out to be substantial we would immediately know that infinitely many twin primes exist! Nevertheless a fair amount of work has been devoted (for some reason) to calculating Brun's sum
\[ \sum_{n \in T} \frac{1}{n} \approx 1.902\ldots. \]
In particular Tom Nicely has done extensive computations of Brun's sum. His work got some unexpected publicity in the mid 1990s when his calculations led to the recognition of the infamous "Pentium bug", a design flaw in many of the Intel microprocessors.⁶
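One can watch how slowly Brun's sum converges with a few lines of code. This is a minimal sketch with names of my own choosing (`sieve`, `brun_partial_sums`). It computes two partial sums: over the set $T$, each prime counted once, and over twin pairs $(p, p+2)$, in which 5 is counted twice. The value $1.902\ldots$ is usually quoted for the pair convention, so the two totals differ by $\frac{1}{5}$.

```python
from math import isqrt

def sieve(N):
    """Return a list is_prime with is_prime[n] True iff n is prime (0 <= n <= N, N >= 1)."""
    is_prime = [True] * (N + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(N) + 1):
        if is_prime[p]:
            for m in range(p * p, N + 1, p):
                is_prime[m] = False
    return is_prime

def brun_partial_sums(N):
    """Partial sums of reciprocals of twin primes, using only pairs (p, p + 2) with p + 2 <= N.

    Returns (set_sum, pair_sum): set_sum counts each twin prime once,
    pair_sum adds 1/p + 1/(p + 2) for every pair (so 5 is counted twice).
    """
    is_prime = sieve(N)
    members = set()
    pair_sum = 0.0
    for p in range(3, N - 1):
        if is_prime[p] and is_prime[p + 2]:
            pair_sum += 1 / p + 1 / (p + 2)
            members.update((p, p + 2))
    set_sum = sum(1 / p for p in sorted(members))
    return set_sum, pair_sum

if __name__ == "__main__":
    for N in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
        print(N, *brun_partial_sums(N))
```

Even at $N = 10^5$ the partial sums are still visibly short of the limiting value, which is why serious computations of Brun's sum rely on extrapolation rather than brute force alone.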
6 The PC I bought in 1994 (my freshman year of college) had such a bug. The Intel corporation reassured consumers that the bug would be of no practical consequence unless they were doing substantial floating point arithmetic. Wonderful. . .
The last word on density versus substance: In 1975 Endre Szemerédi proved by elementary combinatorial means the sensational result that any subset $S$ of positive upper density contains arbitrarily long arithmetic progressions, a vast generalization of a famous theorem of van der Waerden (on "colorings") which was conjectured by Erdős and Turán in 1936.⁷ Unfortunately this great theorem does not apply to the primes, which have zero density.
However, Erdős and Turán made the much more ambitious conjecture that any substantial subset should contain arbitrarily long arithmetic progressions. Thus, when Green and Tao proved in 2004 that there are arbitrarily long arithmetic progressions in the primes, they verified a very special case of this conjecture. Doubtless many mathematicians are now reconsidering the Erdős-Turán conjecture with renewed seriousness.
7Several other mathematicians have devoted major parts of their career to bringing more
CHAPTER 11
\[ \pi(x) \sim \frac{x}{\log x}; \]
i.e.,
\[ \lim_{x \to \infty} \frac{\pi(x) \log x}{x} = 1. \]
1.1. Gauss at 15. The prime number theorem (affectionately called "PNT") was apparently first conjectured in the late 18th century, by Legendre and Gauss (independently). In particular, Gauss conjectured an equivalent but more appealing form of the PNT in 1792, at the age of 15 (!!!).
Namely, he looked at the frequency of primes in intervals of length 1000:
\[ \Delta(x) = \frac{\pi(x) - \pi(x - 1000)}{1000}. \]
Computing by hand, Gauss observed that $\Delta(x)$ seemed to tend to 0, however very slowly. To see how slowly he computed the reciprocal, and found
\[ \frac{1}{\Delta(x)} \approx \log x, \]
meaning that
\[ \Delta(x) \approx \frac{1}{\log x}. \]
Evidently 15 year old Gauss knew both differential and integral calculus, because he realized that $\Delta(x)$ was a slope of the secant line to the graph of $y = \pi(x)$. When $x$ is large, this suggests that the slope of the tangent line to $\pi(x)$ is close to $\frac{1}{\log x}$, and hence he guessed that the function
\[ \operatorname{Li}(x) := \int_2^x \frac{dt}{\log t} \]
was a good approximation to $\pi(x)$. Note that
\[ \operatorname{Li}(x) \sim \frac{x}{\log x}. \]
Thus PNT is equivalent to $\pi(x) \sim \operatorname{Li}(x)$. The function $\operatorname{Li}(x)$ (called the logarithmic integral) is not elementary, but has a simple enough power series expansion (see for yourself). Nowadays we have lots of data, and one can see that the error $|\pi(x) - \operatorname{Li}(x)|$ is in general much smaller than $|\pi(x) - \frac{x}{\log x}|$, so the logarithmic integral gives a better asymptotic expansion. (How good? Read on.)
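The data are easy to reproduce on a small scale. The sketch below is my own illustration, not a method from the text: `prime_count` is a plain sieve, and `Li` approximates $\int_2^x \frac{dt}{\log t}$ by Simpson's rule after the substitution $t = e^u$, which I chose because the integrand $e^u/u$ is better behaved near the lower limit.

```python
from math import exp, isqrt, log

def prime_count(N):
    """Return pi(N) via a sieve of Eratosthenes."""
    if N < 2:
        return 0
    sieve = bytearray([1]) * (N + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(N) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, N + 1, p)))
    return sum(sieve)

def Li(x, steps=10_000):
    """Approximate Li(x) = integral from 2 to x of dt / log t.

    With t = e^u this is the integral of e^u / u over [log 2, log x],
    evaluated by composite Simpson's rule (steps must be even).
    """
    a, b = log(2), log(x)
    h = (b - a) / steps
    f = lambda u: exp(u) / u
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

if __name__ == "__main__":
    for x in (10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6):
        p = prime_count(x)
        # Columns: x, pi(x), x / log x, Li(x), and the two absolute errors.
        print(x, p, x / log(x), Li(x), abs(p - x / log(x)), abs(p - Li(x)))
```

In this range the error against $\operatorname{Li}(x)$ is smaller than the error against $\frac{x}{\log x}$ by more than an order of magnitude.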
1.2. A partial result. As far as I know, there was no real progress for more than fifty years, until the Russian mathematician Pafnuty Chebyshev proved the following two impressive results.
Theorem 131. (Chebyshev, 1848, 1850)
a) There exist explicitly computable positive constants $C_1$, $C_2$ such that for all $x$,
\[ \frac{C_1 x}{\log x} < \pi(x) < \frac{C_2 x}{\log x}. \]
b) If $\lim_{x \to \infty} \frac{\pi(x)}{x/\log x}$ exists, then it equals 1.
Remarks:
(i) For instance, one version of the proof gives $C_1 = 0.92$ and $C_2 = 1.7$. (But I don't know what values Chebyshev himself derived.)
(ii) The first part shows that $\pi(x)$ is of "order of magnitude" $\frac{x}{\log x}$, and the second shows that if it is "regular enough" to have an asymptotic value at all, then it must be asymptotic to $\frac{x}{\log x}$. Thus the additional trouble in proving PNT is establishing this regularity in the distribution of the primes, a quite subtle matter. (We have seen that other arithmetical functions, like $\varphi$ and $d$, are far less regular than this: their upper and lower orders differ by more than a multiplicative constant, so the fact that this regularity should exist for $\pi(x)$ is by no means assured.)
(iii) Chebyshev's proof is quite elementary: it uses less machinery than some of the other topics in this course. However we will not give the time to prove it here: blame it on your instructor's failure to understand the proof.
1.3. A complex approach.
The next step was taken by Riemann in 1859. We have seen the zeta function
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \left(1 - \frac{1}{p^s}\right)^{-1} \]
and its relation to the primes (e.g. obtaining a proof that $\pi(x) \to \infty$ by the above factorization). However, Riemann considered $\zeta(s)$ as a function of a complex variable: $s = \sigma + it$ (indeed he used these rather strange names for the real and imaginary parts in his 1859 paper, and we have kept them ever since), so
\[ n^s = n^{\sigma + it} = n^{\sigma} n^{it}. \]
Here $n^{\sigma}$ is a real number and $n^{it} = e^{i (\log n) t}$ is a point on the unit circle, so in modulus we have $|n^s| = n^{\sigma}$. From this we get that $\zeta(s)$ is absolutely convergent for $\sigma = \Re(s) > 1$. Using standard results from analysis, one sees that it indeed defines an analytic function in the half-plane $\sigma > 1$. Riemann got the zeta function named after him by observing the following:
Fact: $\zeta(s)$ extends (meromorphically) to the entire complex plane and is analytic everywhere except for a simple pole at $s = 1$.
We recall in passing, for those with some familiarity with complex variable theory, that the extension of an analytic function defined in one (connected) domain in the complex plane to a larger (connected) domain is unique if it exists at all: this is the principle of analytic continuation. So the zeta function is well-defined. The continuation can be shown to exist via an integral representation valid for $\sigma > 0$ and a functional equation relating the values of $\zeta(s)$ to those of $\zeta(1 - s)$. (Note that the line $\sigma = \frac{1}{2}$ is fixed under the map $s \mapsto 1 - s$.) Riemann conjectured, but could not prove, certain simple (to state!) analytic properties of $\zeta(s)$, which he saw had profound implications on the distribution of the primes.
1.4. A nonvanishing theorem.
It is a testament to the difficulty of the subject that even after this epochal paper the proof of PNT did not come for almost 40 years. In 1896, Jacques Hadamard and Charles de la Vallée-Poussin proved PNT, independently, but by rather similar methods. The key point in both of their proofs (which Riemann could not establish) was that $\zeta(s) \neq 0$ for any $s = 1 + it$, i.e., along the line with $\sigma = 1$.
Their proof does come with an explicit error estimate, albeit an ugly one.
Theorem 132. There exist positive constants $C$ and $a$ such that
\[ |\pi(x) - \operatorname{Li}(x)| \leq C x e^{-a \sqrt{\log x}}. \]
It is not completely obvious that this is indeed an error bound, i.e., that
\[ \lim_{x \to \infty} \frac{x e^{-a \sqrt{\log x}}}{\operatorname{Li}(x)} = 0. \]
This is left as another calculus exercise.
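For readers who want a hint, here is one way the exercise can go. It is a sketch that assumes the standard asymptotic $\operatorname{Li}(x) \sim \frac{x}{\log x}$ and reads "error bound" as the statement that $x e^{-a\sqrt{\log x}}$ is of smaller order than $\operatorname{Li}(x)$. Substituting $u = \sqrt{\log x}$:

```latex
\[
\frac{x e^{-a\sqrt{\log x}}}{\operatorname{Li}(x)}
\;\sim\; \frac{x e^{-a\sqrt{\log x}}}{x/\log x}
\;=\; (\log x)\, e^{-a\sqrt{\log x}}
\;=\; u^{2} e^{-a u}
\;\longrightarrow\; 0
\qquad (u = \sqrt{\log x} \to \infty),
\]
```

since an exponential in $u$ eventually dominates any power of $u$.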
1.5. An elementary proof is prized.
Much was made of the fact that the proof of PNT, a theorem of number theory, used nontrivial results from complex analysis (which by the end of the 19th century had been developed to a large degree of sophistication). Many people speculated on the existence of an "elementary" proof, a yearning that to my knowledge was never formalized precisely. Roughly speaking it means a proof that uses no extraneous concepts from higher analysis (such as complex analytic functions) but only the notion of a limit and the definition of a prime. It thus caused quite a stir when Atle Selberg and Paul Erdős (not independently, but not quite collaboratively either; the story is a controversial one!) gave what all agreed to be an elementary proof of PNT in 1949. In 1950 Selberg (but not Erdős) received the Fields Medal.
In recent times the excitement about the elementary proof has dimmed: most experts agree that it is less illuminating and less natural than the proof via Riemann's zeta function. Moreover the elementary proof remains quite intricate: ironically, more so than the analytic proof for those with some familiarity with functions of a complex variable. For those who do not, the time taken to learn some complex analysis will probably turn out to be time well spent.
1.6. Equivalents of PNT.
Many statements are "equivalent" to PNT: i.e., it is much easier to show that they imply and are implied by PNT than to prove them. Here's one:
Theorem 133. Let $p_n$ be the $n$th prime. Then
\[ p_n \sim n \log n. \]
Note that this result implies (by the integral test) that $\sum_{n} \frac{1}{p_n} = \infty$; this consequence is much easier to prove than PNT itself.
Far more intriguing is that PNT is equivalent to an asymptotic formula for the average value of the Möbius function:
Theorem 134.
\[ \lim_{N \to \infty} \frac{\sum_{n=1}^{N} \mu(n)}{N} = 0. \]
Recall that the Möbius function is 0 if $n$ is not squarefree (which we know occurs with density $1 - \frac{6}{\pi^2}$) and is $(-1)^r$ if $n$ is a product of $r$ distinct primes. We also saw that the set of all positive integers divisible by only a bounded number, say $k$, of primes has density zero, so most integers $1 \leq n \leq N$ are divisible by "lots" of primes, and by adding up the values of $\mu$ we are recording $+1$ if this large number is even and $-1$ if this large number is odd. It is very tempting to view this parity as being essentially random, similar to what would happen if we flipped a coin for each (squarefree) $n$ and gave ourselves $+1$ if we got heads and $-1$ if we got tails.
With this "randomness" idea planted in our mind, the above theorem seems to assert that if we flip a large number $N$ of coins then (with large probability) the number of heads minus the number of tails is small compared to the total number of coin flips. But now it seems absolutely crazy that this result is equivalent to PNT since, under the (as yet completely unjustified) assumption of randomness, it is far too weak: doesn't probability theory tell us that the running total of heads minus tails will be likely to be on the order of the square root of the number of coin flips? Almost, but not quite. And is this probabilistic model justified? Well, that is the \$1 million question.
2. Coin-Flipping and the Riemann Hypothesis
Let us define the Mertens function
\[ M(N) = \sum_{n=1}^{N} \mu(n). \]
The goal of this lecture is to discuss the following seemingly innocuous question.
Question 5. What is the upper order of $M(N)$?
Among other incentives for studying this question there is a large financial one: if the answer is close to what we think it is, then proving it will earn you \$1 million!
Recall $\mu(n)$ takes on only the values $\pm 1$ and $0$, so the "trivial bound" is
\[ |M(N)| \leq N. \]
In fact we can do better, since we know that $\mu(n) = 0$ iff $n$ is not squarefree, and we know, asymptotically, how often this happens. This leads to an asymptotic expression for the "absolute sum":
\[ \sum_{n=1}^{N} |\mu(n)| = \#\{\text{squarefree } n \leq N\} \sim \frac{6}{\pi^2} N. \]
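It is instructive to tabulate these sums for small $N$. The sketch below uses names of my own (`mobius_sieve`, `mertens`, `squarefree_count`); it computes $\mu(n)$ by a sieve and prints $M(N)$, the normalized value $M(N)/\sqrt{N}$, and the absolute sum next to $\frac{6}{\pi^2} N$.

```python
from math import pi, sqrt

def mobius_sieve(N):
    """Return a list mu with mu[n] equal to the Mobius function of n for 1 <= n <= N."""
    mu = [1] * (N + 1)
    mu[0] = 0
    is_prime = [True] * (N + 1)
    for p in range(2, N + 1):
        if is_prime[p]:
            for m in range(2 * p, N + 1, p):
                is_prime[m] = False
            # Each prime factor p flips the sign ...
            for m in range(p, N + 1, p):
                mu[m] = -mu[m]
            # ... and a repeated factor p^2 kills the value.
            for m in range(p * p, N + 1, p * p):
                mu[m] = 0
    return mu

def mertens(mu, N):
    """M(N) = mu(1) + ... + mu(N)."""
    return sum(mu[1 : N + 1])

def squarefree_count(mu, N):
    """Number of squarefree n <= N, i.e. the sum of |mu(n)|."""
    return sum(abs(v) for v in mu[1 : N + 1])

if __name__ == "__main__":
    mu = mobius_sieve(10 ** 5)
    for N in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
        M = mertens(mu, N)
        print(N, M, M / sqrt(N), squarefree_count(mu, N), 6 / pi ** 2 * N)
```

In this range $|M(N)|$ stays well below $\sqrt{N}$, which is the kind of evidence that made the conjectures discussed below look plausible.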
However, in the last lecture we asserted that $\frac{M(N)}{N} \to 0$, which we interpreted as saying that the average order of $\mu$ is asymptotically 0. Thus the problem is one of cancellation in a series whose terms are sometimes positive and sometimes negative. Stop for a second and recall how much more complicated the theory of "conditional" convergence of such series is than the theory of convergence of series with positive terms. It turns out that the problem of how much cancellation to expect in a series whose terms are sometimes positive and sometimes negative (or a complex series in which the arguments of the terms are spread around on the unit circle) is absolutely a fundamental one in analysis and number theory. Indeed in such matters we can draw fundamental inspiration (if not proofs, directly) from probability theory, and to do so (i.e., to make heuristic probabilistic reasoning even in apparently "deterministic" situations) is an important theme in modern mathematics ever since the work of Erdős and Kac in the mid 20th century.
But our story starts before the 20th century. In the 1890s Mertens¹ conjectured:
(MC1) $|M(N)| \leq \sqrt{N}$
\[ \liminf_{N \to \infty} \frac{M(N)}{\sqrt{N}} \leq C_2. \]
In plainer terms, each of the inequalities $-\sqrt{N} \leq M(N)$ and $M(N) \leq \sqrt{N}$ fails for infinitely many $N$.
The Odlyzko-te Riele proof does not supply a concrete value of $N$ for which $M(N) > \sqrt{N}$, but soon after J. Pintz showed [Pi87] that $M(N) > \sqrt{N}$ for some $N < e^{3.21 \cdot 10^{64}} \approx 10^{1.4 \cdot 10^{64}}$ (note the double exponential: this is an enormous number!). Recent work of Saouter and te Riele [SatR14] shows that this inequality holds for some $N < e^{1.004 \cdot 10^{33}}$. Yet more recently Best and Trudgian have shown that in Theorem 135 one may take $C_1 = 1.6383$, $C_2 = -1.6383$ [BTxx].
Remark 136. An earlier draft contained the claim that $M(N) > \sqrt{N}$ for some $N < 10^{154}$. I thank Tim Trudgian for bringing this to my attention. Not only is a counterexample to (MC1) for such a small value of $N$ not known, in fact it seems quite unlikely that there is a counterexample at anything close to this order of magnitude. Trudgian recommends a work of Kotnik and van de Lune [KvdL04] which contains experimental data and conjectures on $M(N)$. In particular they give a conjecture which suggests that the first counterexample to (MC1) should occur at roughly $N \approx 10^{2.3 \cdot 10^{23}}$: i.e., much smaller than the known counterexamples and much larger than $10^{154}$. It should be emphasized that such conjectures are rather speculative, and the literature contains several incompatible such conjectures.
We still do not know whether (MC2) holds (so conceivably Stieltjes was right all along and the victim of some terrible mix up), although I am about to spin a tale to try to persuade you that (MC2) should be almost, but not quite, true.
But rst, what about the million dollars?
In the last section we mentioned two interesting equivalents of PNT. The following theorem takes things to another level:
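Theorem 137. The following (unknown!) assertions are equivalent:
a) For all $\epsilon > 0$ there exists a constant $C_{\epsilon}$ such that $|M(N)| \leq C_{\epsilon} N^{\frac{1}{2} + \epsilon}$ for all $N$.
b) $|\pi(x) - \operatorname{Li}(x)| \leq \frac{1}{8\pi} \sqrt{x} \log x$ for all sufficiently large $x$.
c) Suppose $\zeta(s_0) = 0$ for some $s_0$ with real part $0 < \Re(s_0) < 1$. Then $\Re(s_0) = \frac{1}{2}$.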
We note that the somewhat abstruse part c), which refers to the behavior of the zeta function in a region in which it is not obvious how it is defined, is the Riemann hypothesis (RH). Thus we care about RH (for instance) because it is equivalent to a wonderful error bound in the Prime Number Theorem.
In 2000 the Clay Math Institute set the Riemann Hypothesis as one of seven \$1 million prize problems. If you don't know complex analysis, no problem: just prove part a) about the order of magnitude of the partial sums of the Möbius function.
Note that (MC1) (which is false!) $\implies$ (MC2) $\implies$ condition a) of the theorem, so in announcing a proof of (MC2) Stieltjes was announcing a stronger result than the Riemann hypothesis, which did not have a million dollar purse in his day but was no less a mathematical holy grail then than now. (So you can decide how likely it is that Stieltjes's paper got lost in the mail and never found.)
But why should we believe in the Riemann hypothesis anyway? There is some experimental evidence for it: in any rectangle $|t| \leq N$, $0 < \sigma < 1$ the zeta function can have only finitely many zeros (this holds for any function meromorphic on $\mathbb{C}$), so one can find all the zeros up to a certain imaginary part, and the fact that all of these zeros lie on the critical line (i.e., have real part $\frac{1}{2}$) has been experimentally confirmed in a certain range of $t$. It is also known that there are infinitely many zeros lying on the critical line (Hardy) and that even a positive proportion of them as we go "up" lie on the critical line (Selberg, as I said, a great mathematician). For various reasons this evidence is rather less than completely convincing.
So let us go back to randomness: suppose $\mu$ really were a random variable. What would it do, in all probability?
We can consider instead the random walk on the integers, where we start at 0 and at time $i$, step to the right with probability $\frac{1}{2}$ and step to the left with probability $\frac{1}{2}$. Formally speaking, our walk is given by an infinite sequence $\{\epsilon_i\}_{i=1}^{\infty}$, each $\epsilon_i = \pm 1$. The set of all such sign sequences, $\{\pm 1\}^{\infty}$, forms in a natural way a probability space (meaning it has a natural measure, but don't worry about the details; just hold on for the ride). Then we define a random variable
\[ S(N) = \epsilon_1 + \ldots + \epsilon_N, \]
meaning a function that we can evaluate on any sign sequence, and it tells us where we end up on the integers after $N$ steps. Now the miracle of modern probability theory is that it makes perfect sense to ask what the lim sup of $S_N$ is.
If you've had a course in probability theory (good for you. . .) you will probably remember that $S_N$ should be no larger than $\sqrt{N}$, more or less. But this seems disappointing, because that is (MC1) (or maybe (MC2)), which feels quite dubious for the partial sums of the Möbius function. But in between Mertens' day and ours probability theory grew up, and we now know that $\sqrt{N}$ is not exactly the correct upper bound. Rather, it is given by the following spectacular theorem:
Theorem 138. (Kolmogorov) With probability 1, we have
\[ \limsup_{N \to \infty} \frac{S_N}{\sqrt{2N \log \log N}} = 1. \]
Thus if you flip a fair coin $N$ times, then in all probability there will be infinitely many moments in time when your running tally of heads minus tails is larger than any constant times the square root of the number of flips. (Similarly, and symmetrically, the limit infimum is $-1$.) So true randomness predicts that (MC2) is false. On the other hand, it predicts that the Riemann Hypothesis is true, since indeed for all $\epsilon > 0$ there exists a constant $C$ such that $\sqrt{2N \log \log N} < C N^{\frac{1}{2} + \epsilon}$.
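A simulation cannot verify a statement about a lim sup, but it does give a feel for the scales involved. The following sketch (the function name `walk_checkpoints` and the fixed seed are my own choices) records the walk at $N = 10, 100, 1000, \ldots$ together with $S_N/\sqrt{N}$ and $S_N/\sqrt{2N \log \log N}$.

```python
import random
from math import log, sqrt

def walk_checkpoints(N, seed=0):
    """Simulate N fair +-1 steps and record the walk at n = 10, 100, 1000, ...

    Returns a list of tuples (n, S_n, S_n / sqrt(n), S_n / sqrt(2 n log log n)).
    """
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    s = 0
    rows = []
    next_checkpoint = 10
    for n in range(1, N + 1):
        s += rng.choice((-1, 1))
        if n == next_checkpoint:
            rows.append((n, s, s / sqrt(n), s / sqrt(2 * n * log(log(n)))))
            next_checkpoint *= 10
    return rows

if __name__ == "__main__":
    for row in walk_checkpoints(10 ** 6):
        print(row)
```

A single run at a handful of checkpoints says nothing rigorous; the point is only that the normalized values are of moderate size, in contrast to the trivial bound $|S_N| \leq N$.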
So if we believed in the true randomness of $\mu$, we would believe the following
Conjecture 139. (Good-Churchhouse [GC68])
\[ \limsup_{N \to \infty} \frac{M(N)}{\sqrt{N \log \log N}} < \infty, \]
\[ \liminf_{N \to \infty} \frac{M(N)}{\sqrt{N \log \log N}} > -\infty. \]
Just to make sure, this conjecture is still significantly more precise than the bound $|M(N)| \leq C_{\epsilon} N^{\frac{1}{2} + \epsilon}$ which is equivalent to the Riemann Hypothesis, making it unclear exactly how much we should pay the person who can prove it: \$2 million? Or more??
Kolmogorov's "law of the iterated logarithm," and hence Conjecture 139, does not seem to be very well-known outside of probabilistic circles.² In searching the literature I found a paper from the 1960s predicting such a "logarithm law" for $M(N)$. More recently I have seen another paper suggesting that perhaps it should be $\log \log \log N$ instead of $\log \log N$. To be sure, the Möbius function is clearly not random, so one should certainly be provisional in one's beliefs about the precise form of the upper bounds on $M(N)$. The game is really to decide whether the Möbius function is "random enough" to make the Riemann hypothesis true.
Nevertheless the philosophy expressed here is a surprisingly broad and deep one: whenever one meets a sum $S_N$ of $N$ "things", each of absolute value 1, and varying in sign (or in argument in the case of complex numbers), one wants to know how much cancellation there is, i.e., how far one can improve upon the trivial bound of $|S_N| \leq N$. The mantra here is that if there is really no extra structure in the summands (i.e., "randomness") then one should expect $|S_N| \approx \sqrt{N}$, more or less! More accurately the philosophy has two parts, and the part that expresses that $|S_N|$ should be no smaller than $\sqrt{N}$ unless there is hidden structure is an extremely reliable one. An example of hidden structure is $a_n = e^{\frac{2 \pi i n}{N}}$, when in fact
\[ \sum_{n=1}^{N} a_n = 0. \]
But here we have chosen to sum over all of the $N$th roots of unity in the complex plane, a special situation. The second part of the philosophy allows us to hope that
2 I learned about Kolmogorov's theorem from a talk at Harvard given by W. Russell Mann.
3When all other lights go out?
CHAPTER 12
I myself find this heuristic convincing but not quite rigorous. More precisely, I believe it for a circular region and become more concerned as the boundary of the region becomes more irregularly shaped, but the heuristic doesn't single out exactly what "nice" properties of the circle are being used. Moreover the error bound is fuzzy: it would be useful to know an explicit value of $C$.
To be more quantitative about it, we define the error
\[ E(r) = |L(r) - \pi r^2|, \]
12. THE GAUSS CIRCLE PROBLEM AND THE LATTICE POINT ENUMERATOR
\[ L(r) \leq \pi (r + \sqrt{2})^2 = \pi r^2 + 2\sqrt{2} \pi r + 2\pi. \]
A similar argument gives a lower bound for $L(r)$. Namely, if $(x, y)$ is any point with distance from the origin at most $r - \sqrt{2}$, then the entire square $(\lfloor x \rfloor, \lfloor x \rfloor + 1) \times (\lfloor y \rfloor, \lfloor y \rfloor + 1)$ lies within the circle of radius $r$. Thus the union of all the unit squares $S(P)$ attached to lattice points on or inside $x^2 + y^2 = r^2$ covers the circle of radius $r - \sqrt{2}$, giving
\[ L(r) \geq \pi (r - \sqrt{2})^2 = \pi r^2 - 2\sqrt{2} \pi r + 2\pi. \]
Thus
This argument skillfully exploits the geometry of the circle. I would like to present an alternate argument with a much different emphasis.
The first step is to notice that instead of counting lattice points in an expanding sequence of closed disks, it is equivalent to fix the plane region once and for all (here, the unit disk $D: x^2 + y^2 \leq 1$) and consider the number of points $(x, y) \in \mathbb{Q}^2$ with $rx, ry \in \mathbb{Z}$. That is, instead of dividing the plane into squares of side length one, we divide it into squares of side length $\frac{1}{r}$. If we now count these "$\frac{1}{r}$-lattice points" inside $D$, a moment's thought shows that this number is precisely $L(r)$.
What sort of thing is an area? In calculus we learn that areas are associated to integrals. Here we wish to consider the area of the unit disk as a double integral over the square $[-1, 1]^2$. In order to do this, we need to integrate the characteristic function $\chi_D$ of the unit disk: that is, $\chi_D(P)$ evaluates to 1 if $P \in D$ and $\chi_D(P) = 0$ otherwise. The division of the square $[-1, 1]^2$ into $4r^2$ subsquares of side length $\frac{1}{r}$ is exactly the sort of sequence of partitions that we need to define a Riemann sum: that is, the maximum diameter of a subrectangle in the partition is $\frac{\sqrt{2}}{r}$, which tends to 0 as $r \to \infty$. Choosing a sample point $P^*_{i,j}$ in each subsquare, we put
\[ \Sigma_r := \frac{1}{r^2} \sum_{i,j} \chi_D(P^*_{i,j}). \]
\[ \lim_{r \to \infty} \Sigma_r = \int_{[-1,1]^2} \chi_D = \operatorname{Area}(D) = \pi. \]
But we observe that $\Sigma_r$ is very close to the quantity $L(r)$. Namely, if we take each sample point to be the lower left corner of the corresponding square, then $r^2 \Sigma_r = L(r) - 2$, because every such sample point is a lattice point (which gets multiplied by 1 iff the point lies inside the unit circle) and the converse is true except that the points $(1, 0)$ and $(0, 1)$ are not chosen as sample points. So
\[ \lim_{r \to \infty} \frac{L(r)}{r^2} = \lim_{r \to \infty} \frac{L(r) - 2 + 2}{r^2} = \lim_{r \to \infty} \Sigma_r + 0 = \pi. \]
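The identity $r^2 \Sigma_r = L(r) - 2$ can be confirmed directly for integer $r$. In this sketch the names `lattice_count` and `riemann_sum_times_r2` are mine; the sample points are the lower left corners $(\frac{i}{r}, \frac{j}{r})$ with $-r \leq i, j < r$, and the test for membership in $D$ is done in integers to avoid rounding.

```python
from math import isqrt

def lattice_count(r):
    """L(r): the number of (x, y) in Z^2 with x^2 + y^2 <= r^2, for an integer r >= 0."""
    return sum(2 * isqrt(r * r - x * x) + 1 for x in range(-r, r + 1))

def riemann_sum_times_r2(r):
    """r^2 * Sigma_r with lower-left sample points (i/r, j/r), -r <= i, j < r.

    The point (i/r, j/r) lies in the unit disk iff i^2 + j^2 <= r^2.
    """
    return sum(
        1
        for i in range(-r, r)
        for j in range(-r, r)
        if i * i + j * j <= r * r
    )

if __name__ == "__main__":
    for r in (1, 2, 5, 10, 50):
        print(r, lattice_count(r), riemann_sum_times_r2(r), lattice_count(r) / r ** 2)
```

The two missing points are exactly $(1, 0)$ and $(0, 1)$, which sit on the right and top edges of the square and so are never lower left corners.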
The above argument is less elementary than Gauss's and gives a weaker result: no explicit upper bound on $E(r)$ is obtained. So why have we bothered with it? The answer lies in the generality of this latter argument. We can replace the circle by any plane region $R \subset [-1, 1]^2$. For any $r \in \mathbb{R}_{>0}$, we define the $r$-dilate of $R$,
\[ rR = \{rP \mid P \in R\}. \]
This is a plane region which is "similar" to $R$ in the usual sense of Euclidean geometry. Note that if $R = D$ is the closed unit disk then $rD = \{(x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \leq r^2\}$ is the closed disk of radius $r$. Therefore a direct generalization of the counting function $L(r)$ is
\[ L_R(r) = \#\{(x, y) \in \mathbb{Z}^2 \cap rR\}. \]
As above, we can essentially view $\frac{L_R(r)}{r^2}$ as a sequence of Riemann sums for $\int_{[-1,1]^2} \chi_R$: "essentially" because any lattice points with $x$ or $y$ coordinate equal to 1 exactly will contribute to $L_R(r)$ but not to the Riemann sum. But since the total number of $\frac{1}{r}$-squares which touch the top and/or right sides of the square $[-1, 1]^2$ is $4r + 1$, this discrepancy goes to 0 when divided by $r^2$. (Another approach is just to assume that $R$ is contained in the interior $(-1, 1)^2$ of the unit square. It should be clear that this is no real loss of generality.) We get the following result:
Theorem 142. Let $R \subset [-1, 1]^2$ be a planar region. Then
\[ \lim_{r \to \infty} \frac{L_R(r)}{r^2} = \operatorname{Area}(R). \tag{35} \]
There remains a technical issue: what do we mean by a plane region? Any subset of $[-1, 1]^2$? A Lebesgue measurable subset? Neither of these is correct: take
2. BETTER BOUNDS
tends towards some constant as $r \to \infty$. We have at the moment only four values of $r$, so this is quite rough, but nevertheless let's try it:
r = 10 : P(r) = .453 . . . ,
r = 100 : P(r) = .01538 . . . ,
r = 1000 : P(r) = .54667 . . . ,
r = 10000 : P(r) = .5817 . . . .
Whatever is happening is happening quite slowly, but it certainly seems like $E(r) \leq C r^{\alpha}$ for some $\alpha$ which is safely less than 1.
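These numbers are easy to regenerate. The sketch below assumes that $P(r)$ denotes $\frac{\log E(r)}{\log r}$ with $E(r) = |L(r) - \pi r^2|$; the definition does not appear in this passage, but this reading reproduces the tabulated values. The function names are my own.

```python
from math import isqrt, log, pi

def lattice_count(r):
    """L(r): the number of (x, y) in Z^2 with x^2 + y^2 <= r^2, for an integer r >= 0."""
    return sum(2 * isqrt(r * r - x * x) + 1 for x in range(-r, r + 1))

def error(r):
    """E(r) = |L(r) - pi r^2|."""
    return abs(lattice_count(r) - pi * r * r)

def error_exponent(r):
    """Assumed meaning of P(r): log E(r) / log r, so that E(r) = r^P(r)."""
    return log(error(r)) / log(r)

if __name__ == "__main__":
    for r in (10, 100, 1000, 10000):
        print(r, lattice_count(r), error(r), error_exponent(r))
```

The dip at $r = 100$ is a reminder that $E(r)$ fluctuates: $L(100)$ happens to land very close to $\pi \cdot 100^2$, so a single value of $P(r)$ says little about the true exponent.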
The first theoretical progress was made in 1904 by a Polish undergraduate, in competition for a prize essay sponsored by the departments of mathematics and physics at the University of Warsaw. The student showed that there exists a constant $C$ such that $E_D(r) \leq C r^{\frac{2}{3}}$. His name was Wacław Sierpiński, and this was the beginning of a glorious mathematical career.¹
On the other hand, in 1916 G.H. Hardy and E. Landau, independently, proved that there does not exist a constant $C$ such that $E(r) \leq C r^{\frac{1}{2}}$. The conventional wisdom however says that $r^{\frac{1}{2}}$ is very close to the truth: namely, it is believed that for every real number $\epsilon > 0$, there exists a constant $C_{\epsilon}$ such that
\[ E(r) \leq C_{\epsilon} r^{\frac{1}{2} + \epsilon}. \tag{36} \]
is a country with an especially distinguished mathematical tradition. Sierpiński is most remembered nowadays for the fractal triangle pattern that bears his name. I have encountered his work several times over the years, and the work on the Gauss Circle Problem is typical of his style: his theorems have elementary but striking statements and difficult, intricate proofs.
2 I don't know what value he had for $C$ or even whether his proof gave an explicit value.
of possible values for the $y$-coordinate, so that $L_R(r) = (2r + 1)^2 = 4r^2 + 4r + 1$. In this case we have
\[ E_R(r) = |L_R(r) - \operatorname{Area}(rR)| = 4r + 1, \]
so that the true error is a linear function of r. This makes us appreciate Sierpinskis
result more: to get a bound of ED (r) Cr for some < 1 one does need to use
properties specic to the circle: in the roughest possible terms, there cannot be as
many lattice points on the boundary of a curved region as on a straight line segment.
More formally, in his 1919 thesis van der Corput proved the following result:
Theorem 143. Let R R2 be a bounded planar region whose boundary is
C -smooth and with nowhere vanishing curvature. Then there exists a constant C
(depending on R) such that for all suciently large r R>0 ,
$f_{\mathrm{ave}}(n) := \frac{1}{n} \left( f(1) + \ldots + f(n) \right)$.
As its name suggests, $f_{\mathrm{ave}}(n)$ is the average of the first n values of f.
It is also convenient to work with the summatory function $F(n) := \sum_{k=1}^{n} f(k)$.
The relation between them is simple: $f_{\mathrm{ave}}(n) = \frac{F(n)}{n}$.
For $f = r_2$ we have
$F(n) = L_D(\sqrt{n}) - 1 \sim \pi (\sqrt{n})^2 - 1 \sim \pi n$.
In this context, the Gauss Circle Problem is equivalent to studying the error between
F(n) and $\pi n$. Studying errors in asymptotic expansions for arithmetic functions is
one of the core topics of analytic number theory.
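The identity $F(n) = L_D(\sqrt{n}) - 1$ and the resulting average value can be checked by brute force. This is a minimal sketch; the helper names `r2` and `summatory_r2` are ours and the method is deliberately naive.

```python
import math

def r2(n: int) -> int:
    """Number of pairs (x, y) of integers with x^2 + y^2 = n."""
    count = 0
    bound = math.isqrt(n)
    for x in range(-bound, bound + 1):
        rest = n - x * x
        y = math.isqrt(rest)
        if y * y == rest:
            # y = 0 contributes one pair, otherwise both y and -y count.
            count += 1 if y == 0 else 2
    return count

def summatory_r2(n: int) -> int:
    """F(n) = r2(1) + ... + r2(n)."""
    return sum(r2(k) for k in range(1, n + 1))

n = 10_000
print(summatory_r2(n) / n, math.pi)  # the average value should be close to pi
```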
We remark with amusement that the average value of $r_2(n)$ is asymptotically constant and equal to the irrational number $\pi$: there is no n for which $r_2(n) = \pi$!
In fact there is a phenomenon here that we should take seriously. A natural question is how often is $r_2(n) = 0$? We know that $r_2(n) = 0$ for all n = 4k + 3, so it is
equal to zero at least $\frac{1}{4}$ of the time. But the average value computation allows us
to do better. Suppose that there exists a number $0 < \alpha \leq 1$ such that $r_2(n) = 0$ at
most $\alpha$ proportion of the time. Then $r_2(n) > 0$ at least $1 - \alpha$ of the time, so the
average value of $r_2(n)$ is at least $8(1 - \alpha)$. Then $\pi \geq 8(1 - \alpha)$, or
$\alpha \geq 1 - \pi/8 \approx .607$.
That is, we've shown that $r_2(n) = 0$ more than 60% of the time.3
In fact this only hints at the truth. In reality, $r_2(n)$ is equal to zero with probability one. In other words, if we pick a large number N and choose at random an
element $1 \leq n \leq N$, then the probability that n is a sum of two squares approaches
0 as $N \to \infty$. This exposes one of the weaknesses of the arithmetic mean (one
that those who compose and grade exams become well aware of): without further
assumptions it is unwarranted to assume that the mean value is a "typical value"
in any reasonable sense. To better capture the notion of typicality one can import
further statistical methods and study the normal order of an arithmetic function.
With regret, we shall have to pass this concept over entirely as being too delicate
for our course. See for instance G. Tenenbaum's text [T] for an excellent treatment.
The lattice point counting argument generalizes straightforwardly (but fruitfully)
to higher-dimensional Euclidean space $\mathbb{R}^N$. For instance, the analogous argument
involving lattice points on or inside the sphere of radius r in $\mathbb{R}^3$ gives:
Theorem 145. The number $R_3(r)$ of integer solutions (x, y, z) to $x^2 + y^2 + z^2 \leq r^2$
is asymptotic to $\frac{4}{3} \pi r^3$, with error being bounded by a constant times $r^2$.
3This argument was not intended to be completely rigorous, and it isn't. What it really
shows is that it is not the case that $r_2(n) = 0$ on a set of density at least $\alpha = 1 - \pi/8$. But this
is morally the right conclusion: see below.
Corollary 146. The average value of the function $r_3(n)$, which counts representations of n by sums of three integer squares, is asymptotic to $\frac{4}{3} \pi \sqrt{n}$.
We can similarly give asymptotic expressions for the average value of $r_k(n)$, the
number of representations of n as a sum of k squares, for any k, given a formula for
the volume of the unit ball in $\mathbb{R}^k$. We leave it to the reader to find such a formula
and thereby compute an asymptotic for the average value of e.g. $r_4(n)$.
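As a numerical sanity check of Theorem 145, one can count lattice points in a ball in $\mathbb{R}^3$ directly and compare with $\frac{4}{3}\pi r^3$. The sketch below uses a hypothetical helper name and makes no attempt at efficiency.

```python
import math

def lattice_points_in_ball(r: int) -> int:
    """Count integer points (x, y, z) with x^2 + y^2 + z^2 <= r^2."""
    total = 0
    for x in range(-r, r + 1):
        for y in range(-r, r + 1):
            rest = r * r - x * x - y * y
            if rest >= 0:
                # z ranges over the integers in [-isqrt(rest), isqrt(rest)].
                total += 2 * math.isqrt(rest) + 1
    return total

for r in (1, 2, 5, 10, 20):
    volume = 4 / 3 * math.pi * r ** 3
    print(r, lattice_points_in_ball(r), round(volume, 1))
```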
CHAPTER 13
More precisely, we showed that this holds for all bounded sets $\Omega$ which are Jordan
measurable, meaning that the characteristic function $1_\Omega$ is Riemann integrable.
It is also natural to ask for sufficient conditions on a bounded subset $\Omega$ for it
to have lattice points at all. One of the first results of this kind is a theorem of
Minkowski, which is both beautiful in its own right and indispensably useful in the
development of modern number theory (in several different ways).
Before stating the theorem, we need a bit of terminology. Recall that a subset
$\Omega \subset \mathbb{R}^N$ is convex if for all pairs of points $P, Q \in \Omega$, also the entire line segment
$PQ = \{(1 - t)P + tQ \mid 0 \leq t \leq 1\}$
is contained in $\Omega$. A subset $\Omega \subset \mathbb{R}^N$ is centrally symmetric if whenever it contains a point $v \in \mathbb{R}^N$ it also contains $-v$, the reflection of v through the origin.
A convex body is a nonempty, bounded, centrally symmetric convex set.
Some simple observations and examples:
i) A subset of $\mathbb{R}$ is convex iff it is an interval.
ii) A regular polygon together with its interior is a convex subset of $\mathbb{R}^2$.
iii) An open or closed disk is a convex subset of $\mathbb{R}^2$.
iv) Similarly, an open or closed ball is a convex subset of $\mathbb{R}^N$.
v) If $\Omega$ is a convex body, then there exists $P \in \Omega$; then $-P \in \Omega$ and $0 = \frac{1}{2}P + \frac{1}{2}(-P) \in \Omega$.
vi) The open and closed balls of radius r with center P are convex bodies iff P = 0.
Warning: The term "convex body" often has a similar but slightly different meaning: e.g., according to Wikipedia, a convex body is a closed, bounded convex subset
$\Omega$ of $\mathbb{R}^N$ which has nonempty interior (i.e., there exists at least one point P of
$\Omega$ such that for sufficiently small $\epsilon > 0$ the entire open ball $B_\epsilon(P)$ of points of $\mathbb{R}^N$ of
distance less than $\epsilon$ from P is contained in $\Omega$). Our definition of convex body is
chosen so as to make the statement of Minkowski's Theorem as clean as possible.
First we record a purely technical result, without proof:
Lemma 147. (Minkowski) A bounded convex set $\Omega \subset \mathbb{R}^N$ is Jordan measurable:
that is, the function
$1_\Omega : x \mapsto 1$ if $x \in \Omega$; $0$ if $x \notin \Omega$
is Riemann integrable. Therefore we can define the volume of $\Omega$ as
$\operatorname{Vol}(\Omega) = \int_{\mathbb{R}^N} 1_\Omega$.
0. If r = 1, it meets the four closest points to 0 if the disk is closed but not if it is
open; for r > 1 it necessarily meets other lattice points.
Can we find a convex body in $\mathbb{R}^2$ which contains no nonzero lattice points but
has larger area than the open unit disk, i.e., area larger than $\pi$? Of course we can:
the open square
$(-1, 1)^2 = \{(x, y) \in \mathbb{R}^2 \mid |x|, |y| < 1\}$
has area 4 but meets no nonzero lattice points. As in the case of circles, this is
certainly the limiting case of its kind: any centrally symmetric rectangle, i.e., one with vertices
$(\pm a, \pm b)$ for positive real numbers a, b, will contain the lattice point (1, 0) if a > 1
and the lattice point (0, 1) if b > 1, so if it does not contain any nonzero lattice
points we have $\max(a, b) \leq 1$ and thus its area is at most 4. But what if we rotated
the rectangle? Or took a more elaborate convex body?
One way (not infallible by any means, but a place to start) to gain intuition in
a multi-dimensional geometric problem is to examine the problem in a lower dimension. A symmetric convex subset of the real line $\mathbb{R}^1$ is just an interval, either of
the form $(-a, a)$ or $[-a, a]$. Thus by reasoning similar to, but even easier than, the
above we see that a centrally symmetric convex subset of $\mathbb{R}$ must have a nontrivial lattice point if its one-dimensional volume is greater than 2, and a centrally
symmetric convex body (i.e., closed) must have a nontrivial lattice point if its one-dimensional volume is at least 2.
Now passing to higher dimensions, we see that the open cube $(-1, 1)^N$ is a symmetric convex subset of volume $2^N$ which meets no nontrivial lattice point, whereas for
any $0 < V < 2^N$ the convex body $[-\frac{V^{1/N}}{2}, \frac{V^{1/N}}{2}]^N$ meets no nontrivial lattice point
and has volume V. After some further experimentation, it is natural to suspect the
following result.
Theorem 150. (Minkowski's Convex Body Theorem) Suppose $\Omega \subset \mathbb{R}^N$ is a
convex body with $\operatorname{Vol}(\Omega) > 2^N$. Then there exist integers $x_1, \ldots, x_N$, not all zero,
such that $P = (x_1, \ldots, x_N) \in \Omega$.
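The "further experimentation" alluded to above is easy to carry out by machine in the plane. The sketch below takes a centrally symmetric convex region as a membership predicate (our own convention, not the text's) and searches a box for a nonzero lattice point. The three sample regions are our choices: the open square of area 4, an open ellipse of area about 4.71, and an open rotated square of area 4.5.

```python
from itertools import product

def nonzero_lattice_point(contains, bound):
    """Search [-bound, bound]^2 for a nonzero integer point accepted by `contains`."""
    for x, y in product(range(-bound, bound + 1), repeat=2):
        if (x, y) != (0, 0) and contains(x, y):
            return (x, y)
    return None

# Open square (-1, 1)^2: area exactly 4, the limiting case.
open_square = lambda x, y: abs(x) < 1 and abs(y) < 1
# Open ellipse x^2/9 + 4y^2 < 1: semi-axes 3 and 1/2, area 1.5*pi > 4.
ellipse = lambda x, y: x * x / 9 + 4 * y * y < 1
# Open square rotated by 45 degrees, |x| + |y| < 1.5: area 4.5 > 4.
diamond = lambda x, y: abs(x) + abs(y) < 1.5

for name, region in [("square", open_square), ("ellipse", ellipse), ("diamond", diamond)]:
    print(name, nonzero_lattice_point(region, bound=3))
```

Only the two regions of area greater than $2^2 = 4$ are forced by the theorem to contain a nonzero lattice point; the open square shows that the bound cannot be lowered.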
1.2. First Proof of Minkowski's Convex Body Theorem.
Step 0: By Corollary 149, $\frac{1}{2}\Omega$ is also a convex body of volume
$\operatorname{Vol}(\frac{1}{2}\Omega) = \frac{1}{2^N} \operatorname{Vol}(\Omega) > 1$.
Moreover $\Omega$ contains a nonzero integral point $P \in \mathbb{Z}^N$ iff $\frac{1}{2}\Omega$ contains a nonzero
half-integral point, i.e., a nonzero P such that $2P \in \mathbb{Z}^N$. So it suffices to show:
for any convex body $\Omega \subset \mathbb{R}^N$ with volume greater than one, there exist integers
$x_1, \ldots, x_N$, not all zero, such that $P = (\frac{x_1}{2}, \ldots, \frac{x_N}{2})$ lies in $\Omega$.
Step 1: Observe that if $\Omega$ contains P and Q, by central symmetry it contains
$-Q$ and then by convexity it contains $\frac{1}{2}P + \frac{1}{2}(-Q) = \frac{1}{2}P - \frac{1}{2}Q$.
Step 2: For a positive integer r, let L(r) be the number of $\frac{1}{r}$-lattice points of $\Omega$,
i.e., points $P \in \mathbb{R}^N \cap \Omega$ such that $rP \in \mathbb{Z}^N$. By Lemma 147, $\Omega$ is Jordan measurable, and then by [Theorem 3, Gauss's Circle Problem],
$\lim_{r \to \infty} \frac{L(r)}{r^N} = \operatorname{Vol}(\Omega)$.
Since $\operatorname{Vol}(\Omega) > 1$, for sufficiently large r we must have $L(r) > r^N$. Because
$\#(\mathbb{Z}/r\mathbb{Z})^N = r^N$, by the pigeonhole principle there exist distinct integral points
$P = (x_1, \ldots, x_N) \neq Q = (y_1, \ldots, y_N)$
such that $\frac{1}{r}P, \frac{1}{r}Q \in \Omega$ and $x_i \equiv y_i \pmod{r}$ for all i. Therefore
$\frac{1}{r}(P - Q) = (\frac{x_1 - y_1}{r}, \ldots, \frac{x_N - y_N}{r}) \in \mathbb{Z}^N$ is a nonzero integral point,
and thus, by Step 1, $R = \frac{1}{2}(\frac{1}{r}(P - Q))$ is a nonzero half-integral point lying in $\Omega$: QED!
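The proof is constructive enough to run. The sketch below follows Steps 1 and 2 in dimension two: it enumerates the $\frac{1}{r}$-lattice points of a region, buckets them by residue class mod r, and turns the first collision into a half-integral point. The function name, the `bound` parameter (an integer with the region inside $[-\mathrm{bound}, \mathrm{bound}]^2$) and the sample region (an open disk of radius 3/5, area about 1.13) are our assumptions.

```python
from fractions import Fraction
from itertools import product

def half_integral_point(contains, bound, r):
    """Pigeonhole search among the (1/r)-lattice points of a convex body.

    `contains(x, y)` tests membership for Fractions x, y.
    Returns a nonzero point R with 2R integral, or None if no collision occurs.
    """
    seen = {}  # residue class mod r -> first scaled point (x, y) seen in it
    for x, y in product(range(-bound * r, bound * r + 1), repeat=2):
        if not contains(Fraction(x, r), Fraction(y, r)):
            continue
        key = (x % r, y % r)
        if key in seen:
            px, py = seen[key]
            # R = (1/2) * (1/r) * (P - Q), as in the last line of the proof.
            return (Fraction(px - x, 2 * r), Fraction(py - y, 2 * r))
        seen[key] = (x, y)
    return None

def in_disk(x, y):
    """Open disk of radius 3/5 about the origin (area about 1.13 > 1)."""
    return x * x + y * y < Fraction(9, 25)

print(half_integral_point(in_disk, bound=1, r=10))
```

If r is too small for $L(r) > r^N$ to hold, the search may return None; increasing r is then the remedy the proof suggests.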
$P(\Omega) := \bigcup_{x \in \mathbb{Z}^N} (x + \Omega)$;
that is, $P(\Omega)$ is the union of the translates of $\Omega$ by all integer points x. We say
that $\Omega$ is packable if the translates are pairwise disjoint, i.e., if for all $x \neq y \in \mathbb{Z}^N$,
$(x + \Omega) \cap (y + \Omega) = \emptyset$.
Example: Let $\Omega = B_0(r)$ be the open disk in $\mathbb{R}^N$ centered at the origin with
radius r. Then $\Omega$ is packable iff $r \leq \frac{1}{2}$.
Example: For r > 0, let $\Omega = [0, r]^N$ be the cube with side length r and one
vertex on the origin. Then $\Omega$ is packable iff r < 1, i.e., iff $\operatorname{Vol}(\Omega) < 1$. Also the
open cube $(0, 1)^N$ is packable and of volume one.
These examples serve to motivate the following result.
Theorem 151. (Blichfeldt's Theorem) If a bounded, Jordan measurable subset
$\Omega \subset \mathbb{R}^N$ is packable, then $\operatorname{Vol}(\Omega) \leq 1$.
Proof. Suppose that $\Omega$ is packable, i.e., that the translates $\{x + \Omega \mid x \in \mathbb{Z}^N\}$
are pairwise disjoint. Let d be a positive real number such that every point of $\Omega$
lies at a distance at most d from the origin (the boundedness of $\Omega$ is equivalent to
$d < \infty$).
Let $\overline{B}_r(0)$ be the closed ball of radius r centered at the origin. It has volume
$c(N) r^N$ where c(N) depends only on N.1 By our work on Gauss's Circle Problem,
we know that the number of lattice points inside $\overline{B}_r(0)$ is asymptotic to $c(N) r^N$.
Therefore the number of lattice points inside $\overline{B}_{r-d}(0)$ is asymptotic, as $r \to \infty$,
to $c(N)(r - d)^N \sim c(N) r^N$. Therefore for any fixed $\epsilon > 0$, there exists R such
that $r \geq R$ implies that the number of lattice points inside $\overline{B}_{r-d}(0)$ is at least
$(1 - \epsilon) c(N) r^N$.
Now note that if $x \in \mathbb{Z}^N$ is such that $\|x\| \leq r - d$, then the triangle inequality
1The values of c(N) are known of course: $c(2) = \pi$ and $c(3) = \frac{4\pi}{3}$ are familiar from our
mathematical childhood, and later on you will be asked to compute $c(4) = \frac{\pi^2}{2}$. But as you will
shortly see, it would be pointless to substitute in the exact value of c(N) here.
Remark: The reader who knows about such things will see that the proof works
verbatim if $\Omega$ is merely assumed to be bounded and Lebesgue measurable.
Now we use Blichfeldt's Theorem to give a shorter proof of Minkowski's Theorem. As in the first proof, after the rescaling $\Omega \mapsto \frac{1}{2}\Omega$, our hypothesis is that $\Omega$ is
a convex body with $\operatorname{Vol}(\Omega) > 1$ and we want to prove that $\Omega$ contains a nonzero
point with half-integral coordinates. Applying Blichfeldt's Theorem to $\Omega$, we get
distinct $x, y \in \mathbb{Z}^N$ such that $(x + \Omega) \cap (y + \Omega)$ is nonempty. In other words, there exist
$P, Q \in \Omega$ such that x + P = y + Q, or $P - Q = y - x \in \mathbb{Z}^N$. But as we saw
above, any convex body which contains two points P and Q also contains $-Q$ and
therefore $\frac{1}{2}P - \frac{1}{2}Q = \frac{1}{2}(P - Q)$, which is a nonzero half-integral point.
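1.4. Minkowski's Theorem Mark II.
Let $\Omega \subset \mathbb{R}^N$. In the last section we considered the effect of a dilation on $\Omega$:
we got another subset $\alpha\Omega$, which was convex iff $\Omega$ was, centrally symmetric iff $\Omega$
was, and whose volume was related to that of $\Omega$ in a predictable way.
Note that dilation by $\alpha \in \mathbb{R}_{>0}$ can be viewed as a linear automorphism of
$\mathbb{R}^N$: that is, the map $(x_1, \ldots, x_n) \mapsto (\alpha x_1, \ldots, \alpha x_n)$ is an invertible linear map.
Its action on the standard basis $e_1, \ldots, e_N$ of $\mathbb{R}^N$ is simply $e_i \mapsto \alpha e_i$, so its matrix
representation is
$\alpha : \mathbb{R}^N \to \mathbb{R}^N$, $(x_1, \ldots, x_n)^t \mapsto \begin{pmatrix} \alpha & 0 & 0 & \cdots & 0 \\ 0 & \alpha & 0 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \alpha \end{pmatrix} (x_1, \ldots, x_n)^t$.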
Vol()
Vol()
=
,
det(M )
Vol(M )
$[a, b] = \prod_{i=1}^{N} [a_i, b_i]$
$\int_{\mathbb{R}^N} f = \sup \int_{[a,b]} f$,
where the supremum ranges over all integrals over all rectangles. Note that such an
improper integral is always defined although it may be $\infty$: for instance it will be if
we integrate the constant function 1 over $\mathbb{R}^N$.
Theorem 158. (Refined Minkowski Theorem) Let $\Omega \subset \mathbb{R}^N$ be a nonempty
centrally symmetric convex subset.
a) Then $\#(\Omega \cap \mathbb{Z}^N) \geq 2 \left( \left\lceil \frac{\operatorname{Vol}(\Omega)}{2^N} \right\rceil - 1 \right) + 1$.
b) If $\Omega$ is closed and bounded, then $\#(\Omega \cap \mathbb{Z}^N) \geq 2 \left\lfloor \frac{\operatorname{Vol}(\Omega)}{2^N} \right\rfloor + 1$.
In other words, part a) says that if for some positive integer k we have $\operatorname{Vol}(\Omega)$ is
strictly greater than $k \cdot 2^N$, then $\Omega$ contains at least 2k nonzero lattice points (which
necessarily come in k antipodal pairs $P$, $-P$). Part b) says that the same conclusion holds in the limiting case $\operatorname{Vol}(\Omega) = k \cdot 2^N$ provided $\Omega$ is closed and bounded.
There are analogous refinements of Blichfeldt's theorem; moreover, by a linear
change of variables we can get a Refined Mark II Minkowski Theorem with the
standard integral lattice $\mathbb{Z}^N$ replaced by any lattice $\Lambda = M\mathbb{Z}^N$, with a suitable
correction factor of $\operatorname{Vol}(\Lambda)$ thrown in.
We leave the proof of Theorem 158 and the statements and proofs of these other
refinements as exercises for the interested reader.
2. Diophantine Applications
2.1. The Two Squares Theorem Again.
Suppose p = 4k + 1 is a prime number.
By Fermat's Lemma (Lemma 2 of Handout 4), there exists $u \in \mathbb{Z}$ such that $u^2 \equiv -1$
(mod p): equivalently, u has order 4 in $(\mathbb{Z}/p\mathbb{Z})^{\times}$. Define
$M := \begin{pmatrix} 1 & 0 \\ u & p \end{pmatrix}$.
We have det(M) = p, so $\Lambda := M\mathbb{Z}^2$ defines a lattice in $\mathbb{R}^2$ with
$\operatorname{Vol}(\Lambda) = \det(M) \operatorname{Vol}(\mathbb{Z}^2) = p$.
If $(t_1, t_2) \in \mathbb{Z}^2$ and $(x_1, x_2)^t = M(t_1, t_2)^t$, then
$x_1^2 + x_2^2 = t_1^2 + (ut_1 + pt_2)^2 \equiv (1 + u^2) t_1^2 \equiv 0 \pmod{p}$.
Now let
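The lattice $\Lambda = M\mathbb{Z}^2$ of Section 2.1 can be searched directly. The sketch below assumes, as Minkowski's theorem applied to a disk of area greater than 4p would give, that $\Lambda$ contains a nonzero vector $(x_1, x_2)$ with $x_1^2 + x_2^2 < 2p$; since the sum is divisible by p, it must then equal p. The function names and the brute-force search for u are ours, and the input is assumed to be a prime congruent to 1 mod 4.

```python
import math

def sqrt_of_minus_one(p: int) -> int:
    """Brute-force search for u with u^2 = -1 (mod p); p = 1 (mod 4) prime assumed."""
    for u in range(2, p):
        if (u * u + 1) % p == 0:
            return u
    raise ValueError("no square root of -1 mod p; is p a prime congruent to 1 mod 4?")

def two_squares(p: int):
    """Return (a, b) with a^2 + b^2 = p by scanning short vectors (t1, u*t1 + p*t2)."""
    u = sqrt_of_minus_one(p)
    for t1 in range(1, math.isqrt(p) + 1):
        # Choose t2 so that x2 = u*t1 + p*t2 is the residue of least absolute value.
        x2 = (u * t1) % p
        if x2 > p // 2:
            x2 -= p
        if t1 * t1 + x2 * x2 == p:
            return (t1, abs(x2))
    return None

for p in (5, 13, 17, 29, 37, 41, 97):
    print(p, two_squares(p))
```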
Proof. Exercise!
Thus the set of sums of four integer squares is closed under multiplication. Since
$1 = 1^2 + 0^2 + 0^2 + 0^2$ is a sum of four squares, it suffices to show that each prime p
is a sum of four squares. Since $2 = 1^2 + 1^2 + 0^2 + 0^2$, we may assume p > 2.
Lemma 160. The (four-dimensional) volume of a ball of radius r in $\mathbb{R}^4$ is $\frac{\pi^2}{2} r^4$.
Proof. Exercise!
Lemma 161. For a prime p > 2 and $a \in \mathbb{Z}$, there exist $r, s \in \mathbb{Z}$ such that
$r^2 + s^2 \equiv a \pmod{p}$.
Proof. There are $\frac{p-1}{2}$ nonzero squares mod p and hence $\frac{p-1}{2} + 1 = \frac{p+1}{2}$ squares
mod p. Rewrite the congruence as $r^2 \equiv a - s^2 \pmod{p}$. Since the map $\mathbb{F}_p \to \mathbb{F}_p$
given by $t \mapsto a - t$ is an injection, as r and s range over all elements of $\mathbb{F}_p$ both the left
and right hand sides take $\frac{p+1}{2}$ distinct values. Since $\frac{p+1}{2} + \frac{p+1}{2} > p$, these subsets
cannot be disjoint, and any common value gives a solution to the congruence.
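The counting argument in the proof of Lemma 161 translates into a direct search: tabulate the squares mod p, then look for s such that $a - s^2$ is among them. This is a minimal sketch with a hypothetical function name.

```python
def two_square_residues(a: int, p: int):
    """Return (r, s) with r^2 + s^2 = a (mod p), for an odd prime p."""
    # Map each square residue to one of its square roots.
    square_root_of = {(r * r) % p: r for r in range(p)}
    for s in range(p):
        target = (a - s * s) % p
        if target in square_root_of:
            return square_root_of[target], s
    return None  # not reached when p is an odd prime, by Lemma 161

print(two_square_residues(-1, 7))
```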
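$M = \begin{pmatrix} p & 0 & r & s \\ 0 & p & s & -r \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$.
be the open ball of radius $\sqrt{2p}$ about the origin in $\mathbb{R}^4$. Using Lemma 160 we have
$\operatorname{Vol}(\Omega) = \frac{\pi^2}{2} (\sqrt{2p})^4 = 2\pi^2 p^2 > 16p^2 = 2^4 \operatorname{Vol} \Lambda$,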
1 (mod 4)) so in fact it is, and in this case
computes an integral basis for K (the key word here is discriminant, but no more
about that!).
Question 8. For which number fields K is $\mathbb{Z}_K$ a principal ideal domain?
This is absolutely one of the deepest and most fundamental number-theoretic questions because, as we have seen, in trying to solve a Diophantine equation we are
often naturally led to consider arithmetic in a ring of integers $\mathbb{Z}_K$: e.g., in studying
the equation $x^2 - dy^2 = n$ we take $K = \mathbb{Q}(\sqrt{d})$ and in studying $x^n + y^n = z^n$ we
take $K = \mathbb{Q}(\zeta_n)$. If $\mathbb{Z}_K$ turns out to be a PID, we can use Euclid's Lemma, a
formidable weapon. Indeed, it turns out that a common explanation of each of the
classical success stories regarding these two families of equations (i.e., theorems of
Fermat, Euler and others) is that the ring $\mathbb{Z}_K$ is a PID.
Gauss conjectured that there are infinitely many squarefree d > 0 such that the
ring of integers of the real quadratic field $K = \mathbb{Q}(\sqrt{d})$ is a PID. This is still unknown; in fact, for all we can prove there are only finitely many number fields K (of
any and all degrees!) such that $\mathbb{Z}_K$ is a PID. In this regard two important goals are:
(i) To give an algorithm that will decide, for any given K, whether $\mathbb{Z}_K$ is a PID;
(ii) When it isn't, to quantify the failure of uniqueness of factorization in $\mathbb{Z}_K$.
For this we define the concept of class number. If R is any integral domain,
we define an equivalence relation on the set I(R) of nonzero ideals of R. Namely
we put $I \sim J$ iff there exist nonzero elements $a, b \in R$ such that (a)I = (b)J.
This partitions all the nonzero ideals into equivalence classes, simply called ideal
classes.3 The class number of R is indeed the number of classes of ideals. For
an arbitrary domain R, the class number may well be infinite.
The point here is that there is one distinguished class of ideals: an ideal I is
equivalent to the unit ideal R = (1) iff it is principal. It follows that R is a PID
iff its class number is equal to one. Therefore both (i) and (ii) above would be
addressed if we can compute the class number of an arbitrary ring of integers $\mathbb{Z}_K$.
This is exactly what Minkowski did:
Theorem 163. (Minkowski) Let K be any number field.
a) The ideals of the ring $\mathbb{Z}_K$ of integers of K fall into finitely many equivalence
classes; therefore K has a well-defined class number $h(K) < \infty$.
b) There is an explicit upper bound on h(K) in terms of invariants of K which can
be easily computed if an integral basis is known.
c) There is an algorithm to compute h(K).
The proof is not easy; apart from the expected ingredients of more basic algebraic
number theory, it also uses, crucially, Theorem 150!
As an example of the usefulness of the class number in quantifying failure of
factorization even when $\mathbb{Z}_K$ is not a UFD, we note that Lame erroneously believed
he could prove FLT for all odd primes p because he assumed (implicitly, since the
concept was not yet clearly understood) that $\mathbb{Z}[\zeta_p]$ was always a PID. Lame's proof
is essentially correct when the class number of $\mathbb{Q}(\zeta_p)$ is equal to one, which is some
progress from the previous work on FLT, but unfortunately this happens iff $p \leq 19$.
Kummer on the other hand found a sufficient condition for FLT(p) to hold which
turns out to be equivalent to: the class number of $\mathbb{Q}(\zeta_p)$ is not divisible by p.
This condition, in turn, is satisfied for all p < 200 except for 37, 59, 67, 101, 103,
131, 149, and 157; and conjecturally for a subset of the primes of relative density
$e^{-1/2} \approx 0.61$. Note finally that this remains conjectural to this day while FLT has
been proven: the study of class numbers really is among the deepest and most
difficult of arithmetic questions.
3In fact, the use of the term "class" in mathematics in the context of equivalence relations
can be traced back to this very construction in the case of $R = \mathbb{Z}_K$ the ring of integers of an
imaginary quadratic field K, which was considered by Gauss in his Disquisitiones Arithmeticae.
2.4. Comments and complements.
As is the case for many of the results we have presented, one of the attractions
of Theorem 162 is its simple statement. Anyone who is inquisitive enough to wonder which integers can be written as a sum of four squares will eventually conjecture
the result, but the proof is of course another matter. Apparently the first recorded
statement, without proof, is in the Arithmetica of Diophantus of Alexandria,
some time in the third century AD. Diophantus' text entered into the mathematical consciousness of Renaissance Europe through Gaspard Bachet's 1621 Latin
translation of the Arithmetica.
Famously, Fermat was an ardent reader of Bachet's book, and he saw and
claimed a proof of the Four Squares Theorem. As we have already mentioned, with
one exception (FLT for n = 4) Fermat never published proofs, making the question
of exactly which of his theorems he had actually proved a subject of perhaps eternal debate. In this case the consensus among mathematical historians seems to be
skepticism that Fermat actually had a proof. In any case, the proof was still much
sought after Fermat's death in 1665. Euler, one of the greatest mathematicians of
the 18th century, labored for 40 years without finding a proof. Finally the theorem
was proved and published in 1770 by the younger and equally great4 Italian-French
mathematician Joseph-Louis Lagrange.
There are many quite different looking proofs of the Four Squares Theorem.
There are proofs which are completely elementary, i.e., which neither require nor
introduce any "extraneous" concepts like lattices. The most pedestrian proof begins as ours did with Euler's identity and Lemma 161. From this we know that it
suffices to represent any prime as a sum of four squares and also that for any prime
p, some positive integer multiple mp is of the form $r^2 + s^2 + 1^2$ and in particular a
sum of four squares, and the general strategy is to let $m_0 p$ be the smallest positive integral
multiple of p which is a sum of four squares and to show, through a "descent"
argument, that $m_0 = 1$. Lagrange's original proof followed this strategy, and many
elementary number theory texts contain such a proof, including Hardy and Wright.
Another proof, which has the virtue of explaining the mysterious identity of Lemma
159, proceeds in analogy to our first proof of the two squares theorem: it works in
a certain (non-commutative!) ring of integral quaternions. Quaternions play
a vital role in modern number theory, but although it is not too hard to introduce
enough quaternionic theory to prove the Four Squares Theorem (again see Hardy
and Wright), one has to dig deeper to begin to appreciate what is really going on.
Yet another proof uses the arithmetic properties of theta series; this leads to an
exact formula for $r_4(n)$, the number of representations of a positive integer as a
sum of four squares. In this case, to understand what is really going on involves
discussion of the arithmetic theory of modular forms, which is again too rich for
our blood (but we will mention that modular forms and quaternions are themselves
quite closely linked!); and again Hardy and Wright manage to give a proof using
only purely formal power series manipulations, which succeeds in deriving the formula for $r_4(n)$.
4In quality at least. No one has ever equalled Euler for quantity, not even the famously
prolific and relatively long-lived twentieth century mathematician Paul Erdős, although there are
one or two living mathematicians that might eventually challenge Euler.
Regarding generalizations of Theorem 162, we will only mention one: a few months
before Lagrange's proof, Edward Waring asserted that "every number is a sum of
four squares, nine cubes, nineteen biquadrates [i.e., fourth powers] and so on." In
other words, Waring believed that for every positive integer k there exists a number
n depending only on k such that every positive integer is a sum of n non-negative
kth powers. If so, we can define g(k) to be the least such integer n. Evidently the
Four Squares Theorem, together with the observation that 7 is not a sum of three
squares, gives us g(2) = 4. That g(k) actually exists for all k is by no means obvious,
and indeed was first proven by David Hilbert in 1909. We now know the exact value
of g(k) for all k; that g(3) = 9 was established relatively early on (Wieferich, 1912),
but g(4) was the last value to be established, by Balasubramanian in 1986: indeed
g(4) = 19, so all of Waring's conjectures turned out to be correct.
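Waring's three numbers are lower bounds that a short computation can confirm: 7 needs four squares, 23 needs nine cubes and 79 needs nineteen fourth powers. The dynamic-programming sketch below (function name ours) computes the least number of positive kth powers summing to n. It only exhibits these lower bounds and says nothing about the upper bounds, which are the hard part.

```python
def min_kth_powers(n: int, k: int) -> int:
    """Least number of positive k-th powers whose sum is n (n >= 1)."""
    powers = []
    base = 1
    while base ** k <= n:
        powers.append(base ** k)
        base += 1
    # best[m] = least number of k-th powers summing to m.
    best = [0] * (n + 1)
    for m in range(1, n + 1):
        best[m] = 1 + min(best[m - p] for p in powers if p <= m)
    return best[n]

print(min_kth_powers(7, 2), min_kth_powers(23, 3), min_kth_powers(79, 4))
```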
Of more enduring interest is the quantity G(k), defined to be the least positive
integer n such that every sufficiently large positive integer can be written as a sum
of n non-negative kth powers: i.e., we allow finitely many exceptions. Since for all
k, 8k + 7 is not even a sum of three squares modulo 8, none of these infinitely many
integers are sums of three squares, so g(2) = G(2) = 4. On the other hand it is
known that $G(3) \leq 7 < 9 = g(3)$, and it is moreover suspected that G(3) = 4, but this
is far from being proven. Indeed only one other value of G is known.
Theorem 164. (Davenport, 1939) G(4) = 16.
Getting better bounds on G(k) is an active topic in contemporary number theory.
CHAPTER 14
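$\prod_{i=1}^{n} (1 - x_i^{q-1})$.
For $f : \mathbb{F}_q^n \to \mathbb{F}_q$, define
$P_f(t) := \sum_{y \in \mathbb{F}_q^n} f(y) \prod_{i=1}^{n} (1 - (t_i - y_i)^{q-1})$.
Every term in the sum with $y \neq x$ yields zero, and the term y = x yields f(x).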
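The claim that $P_f$ reproduces f at every point can be tested numerically. The sketch below restricts to a prime field $\mathbb{F}_p$ (an assumption made only so that arithmetic is plain arithmetic mod p) and evaluates the defining sum of $P_f$ at a point x; the function name and the sample f are ours.

```python
from itertools import product

def interpolant_value(f, p: int, n: int, x) -> int:
    """Evaluate P_f(x) = sum_y f(y) * prod_i (1 - (x_i - y_i)^(p-1)) in F_p."""
    total = 0
    for y in product(range(p), repeat=n):
        term = f(y) % p
        for xi, yi in zip(x, y):
            # (x_i - y_i)^(p-1) is 1 unless x_i = y_i, so the factor is an indicator.
            term = term * (1 - pow(xi - yi, p - 1, p)) % p
        total = (total + term) % p
    return total

f = lambda y: (y[0] * y[0] + 2 * y[1] + 1) % 5  # an arbitrary function F_5^2 -> F_5
print(all(interpolant_value(f, 5, 2, x) == f(x) for x in product(range(5), repeat=2)))
```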
On the other hand, over a finite field $\mathbb{F}_q$, a nonzero polynomial may evaluate to the
zero function: indeed $t^q - t$ is a basic one variable example. There is no contradiction here because a nonzero polynomial over a domain cannot have more roots
than its degree, but $t^q - t = \prod_{a \in \mathbb{F}_q} (t - a)$ has exactly as many roots as its degree.
Moreover, no nonzero polynomial of degree less than q could lie in the kernel of
the evaluation map, so $t^q - t$ is a minimal degree nonzero element of $\operatorname{Ker}(\Phi)$. But,
since $\mathbb{F}_q[t]$ is a PID, every nonzero ideal I is generated by its unique monic element
of least degree, so $\operatorname{Ker}(\Phi) = \langle t^q - t \rangle$.
We would like to compute $\operatorname{Ker}(\Phi)$ in the multivariable case. Reasoning as above it
is clear that for all $1 \leq i \leq n$ the polynomials $t_i^q - t_i$ must lie in the kernel of the
evaluation map, so at least we have $J = \langle t_1^q - t_1, \ldots, t_n^q - t_n \rangle \subset \operatorname{Ker}(\Phi)$. We will
see that in fact $J = \operatorname{Ker}(\Phi)$. We can even do better: for each polynomial P(t) we
can find a canonical element $\tilde{P}$ of the coset $P(t) + \operatorname{Ker}(\Phi)$.
The key idea is that of a reduced polynomial. We say that a monomial $c t_1^{a_1} \cdots t_n^{a_n}$
is reduced if $a_i < q$ for all i. A polynomial $P \in \mathbb{F}_q[t]$ is reduced if each of its
nonzero monomial terms is reduced. Equivalently, a reduced polynomial is one for
which the degree in each variable is less than q.
Example: The polynomial $P_f(t)$ above is a sum of polynomials each having degree q − 1 in each variable, so is reduced.
Exercise 5: The reduced polynomials form an $\mathbb{F}_q$-subspace of $\mathbb{F}_q[t_1, \ldots, t_n]$, with a
basis being given by the reduced monomials.
The idea behind the definition is that if in a monomial term we had an exponent $t_i^{a_i}$ with $a_i \geq q$, then from the perspective of the associated function this is
just wasteful: we have
$x_i^{a_i} = x_i^{q + (a_i - q)} = x_i^{q} x_i^{a_i - q} = x_i x_i^{a_i - q} = x_i^{a_i - (q-1)}$.
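The exponent reduction just described is a one-line loop: keep subtracting q − 1 while the exponent is at least q. The sketch below (function name ours) checks, over a prime field $\mathbb{F}_p$ for simplicity, that the reduced exponent defines the same function.

```python
def reduce_exponent(a: int, q: int) -> int:
    """Replace a by a - (q - 1) until a < q; x^a and x^(result) agree on F_q."""
    while a >= q:
        a -= q - 1
    return a

p = 5
print([(a, reduce_exponent(a, p)) for a in range(12)])
print(all(pow(x, a, p) == pow(x, reduce_exponent(a, p), p)
          for a in range(40) for x in range(p)))
```

Note that a positive exponent never reduces to 0: the values stay in the range 1 to q − 1, which matters because $x^0$ and $x^{q-1}$ differ at x = 0.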
r
i=1
We want to show that $\#Z \equiv 0 \pmod{p}$. Let $1_Z : \mathbb{F}_q^n \to \mathbb{F}_q$ be the ($\mathbb{F}_q$-valued)
characteristic function of the subset Z, i.e., the function which maps x to 1 if
$x \in Z$ and x to 0 otherwise. Now one polynomial representative of $1_Z$ is
(40) $P(t) := \prod_{i=1}^{r} \left( 1 - P_i(t)^{q-1} \right)$;
another is
$Q_Z(t) := \sum_{x \in Z} \prod_{i=1}^{n} \left( 1 - (t_i - x_i)^{q-1} \right)$.
Now comes the devilry: the total degree of P(t) is $(q - 1) \sum_i d_i < (q - 1)n$.
On the other hand, consider the coefficient of the monomial $t_1^{q-1} \cdots t_n^{q-1}$ in
$Q_Z(t)$: it is $(-1)^n \#Z$. If we assume that #Z is not divisible by p, then this term
is nonzero and $Q_Z(t)$ has total degree at least (q − 1)n. By Exercise X.X, we have
$\deg(\tilde{P}) \leq \deg(P) < (q - 1)n \leq \deg(Q_Z)$.
Therefore $\tilde{P} \neq Q_Z$, whereas we ought to have $\tilde{P} = Q_Z$, since each is the
reduced polynomial representative of $1_Z$. Evidently we assumed something we
shouldn't have: rather, we must have $p \mid \#Z$, qed.
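Warning's theorem is easy to test experimentally over a small prime field. The sketch below counts common zeros by brute force; the function name and the three sample systems (each with total degree less than the number of variables) are our own choices, not taken from the text.

```python
from itertools import product

def count_common_zeros(polys, p: int, n: int) -> int:
    """Number of x in F_p^n at which every polynomial in `polys` vanishes."""
    return sum(
        1
        for x in product(range(p), repeat=n)
        if all(P(*x) % p == 0 for P in polys)
    )

# One quadric in three variables over F_3: degree 2 < 3.
quadric = [lambda x, y, z: x * x + y * y + z * z]
# One quadric in three variables over F_5: degree 2 < 3.
graph = [lambda x, y, z: x * y + z]
# A quadric and a linear form in four variables over F_3: degrees 2 + 1 < 4.
system = [lambda x, y, z, w: x * y - z * w, lambda x, y, z, w: x + y + z + w]

print(count_common_zeros(quadric, 3, 3),
      count_common_zeros(graph, 5, 3),
      count_common_zeros(system, 3, 4))
```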
2.3. Ax's proof of Warning's theorem.
The following "book proof" of Warning's Theorem is due to James Ax [Ax64].
We maintain the notation of the previous section, especially the polynomial P(t)
of (40) and the subset Z of (39). Because $P(x) = 1_Z(x)$ for all $x \in \mathbb{F}_q^n$, we have
$\#Z \equiv \sum_{x \in \mathbb{F}_q^n} P(x) \pmod{p}$.
So we just need to evaluate the sum. Since every polynomial is an $\mathbb{F}_q$-linear combination
of monomial terms, it is reasonable to start by looking for a formula for
$\sum_{x \in \mathbb{F}_q^n} x_1^{a_1} \cdots x_n^{a_n}$ for non-negative integers $a_1, \ldots, a_n$. It is often the case that if
$f : G \to \mathbb{C}$ is a "nice" function from an abelian group to the complex numbers,
then the complete sum $\sum_{x \in G} f(x)$ has a simple expression. Case in point:
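Lemma 173. Let $a_1, \ldots, a_n$ be non-negative integers.
$\sum_{x \in \mathbb{F}_q^n} x_1^{a_1} \cdots x_n^{a_n} = \prod_{i=1}^{n} \sum_{x_i \in \mathbb{F}_q} x_i^{a_i}$.
(Here $\zeta$ denotes a generator of the cyclic group $\mathbb{F}_q^{\times}$.)
$\sum_{x_i \in \mathbb{F}_q} x_i^{a_i} = 0^{a_i} + \sum_{x_i \in \mathbb{F}_q^{\times}} x_i^{a_i} = 0 + \sum_{N=0}^{q-2} (\zeta^N)^{a_i} = \sum_{N=0}^{q-2} (\zeta^{a_i})^N = \frac{1 - (\zeta^{a_i})^{q-1}}{1 - \zeta^{a_i}} = \frac{1 - 1}{1 - \zeta^{a_i}} = 0$.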
Finally, the polynomial P(t) has degree $\sum_{i=1}^{r} d_i (q - 1) = (q - 1) \sum_{i=1}^{r} d_i < (q - 1)n$.
Thus each monomial term $c t_1^{a_1} \cdots t_n^{a_n}$ in P(t) must have $a_1 + \ldots + a_n < (q - 1)n$,
so it can't be the case that each $a_i \geq q - 1$. Therefore Lemma 173 applies, and $\sum_{x \in \mathbb{F}_q^n} P(x)$
is an $\mathbb{F}_q$-linear combination of sums each of which evaluates to 0 in $\mathbb{F}_q$ and therefore
the entire sum is 0. This completes our second proof of Warning's Theorem.
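The one-variable power sums at the heart of Ax's argument can be tabulated directly. The sketch below works over a prime field $\mathbb{F}_p$ and uses the convention $0^0 = 1$ (which is also what Python's `pow` returns); the function name is ours. The sum vanishes unless the exponent is a positive multiple of p − 1, in which case it equals −1.

```python
def power_sum(a: int, p: int) -> int:
    """Sum of x^a over all x in F_p, reduced mod p (with 0^0 = 1)."""
    return sum(pow(x, a, p) for x in range(p)) % p

p = 7
print([(a, power_sum(a, p)) for a in range(14)])
```

For the multivariable sum one multiplies these one-variable values, so a single exponent that is zero or not divisible by p − 1 already forces the whole sum to vanish.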
3. Some Later Work
Under the hypotheses of Warning's theorem we can certainly have 0 solutions.
For instance, we could take $P_1(t)$ to be any polynomial with $\deg(P_1) < \frac{n}{2}$ and
$P_2(t) = P_1(t) + 1$. Or, when q is odd, let $a \in \mathbb{F}_q$ be a quadratic nonresidue, let
$P_1(t)$ be a polynomial of degree less than $\frac{n}{2}$ and put $P(t) = P_1(t)^2 - a$.
On the other hand, it is natural to wonder: in Warning's theorem, might we actually have $\#Z \equiv 0 \pmod{q}$? The answer is now known, but it took 46 years.
First consider the case of r = 1, i.e., a single polynomial P of degree d less than
n. In [Wa36], Warning proved that #Z, if positive, is at least $q^{n-d}$. And in the
same paper [Ax64], Ax showed that $q^b \mid \#Z$ for all $b < \frac{n}{d}$. By hypothesis we can
take b = 1, so the aforementioned question has an affirmative answer in this case.
These divisibilities are called estimates of Ax-Katz type. It is known that there
are examples in which the Ax-Katz divisibilities are best possible, but refining these
estimates in various cases is a topic of active research: for instance there is a 2007
paper by W. Cao and Q. Sun, Improvements upon the Chevalley-Warning-Ax-Katz-type estimates, J. Number Theory 122 (2007), no. 1, 135-141.
Notice that the work since Warning has focused on the problem of getting best
possible p-adic estimates for the number of solutions: that is, instead of bounds of
the form $\#Z \geq N$, we look for bounds of the form $\operatorname{ord}_p(\#Z) \geq N$. Such estimates
are closely linked to the p-adic cohomology of algebraic varieties, a beautiful (if
technically difficult) field founded by Pierre Deligne in his landmark paper "Weil II."
The hypotheses of the Chevalley-Warning theorem are also immediately suggestive to algebraic geometers: (quite) roughly speaking there is a geometric division
of algebraic varieties into three classes: Fano, Calabi-Yau, and general type. The
degree conditions in Warning's theorem are precisely those which give, among the
class of algebraic varieties represented nicely by r equations in n variables ("smooth
complete intersections"), the Fano varieties. A recent result of Hélène Esnault gives
the geometrically natural generalization: any Fano variety over $\mathbb{F}_q$ has a rational
point. There are similar results for other Fano-like varieties.
CHAPTER 15
Additive Combinatorics
1. The Erdős-Ginzburg-Ziv Theorem
1.1. A Mathematical Card Game.
Consider the following game. One starts with a deck of one hundred cards (or
N cards, for some arbitrary positive integer N). Any number of players may play;
one of them is the dealer. The dealer shuffles the deck, and the player to the dealer's
left selects a card ("any card") from the deck and shows it to everyone. The player
to the dealer's right writes down the numerical value of the card, say n, and keeps
this in a place where everyone can see it. The card numbered n is reinserted into
the deck, which is reshuffled. The dealer then deals cards face up on the table, one
at a time, at one minute intervals, or sooner by unanimous consent (i.e., if everyone
wants the next card, including the dealer, then it is dealt; otherwise the dealer waits
for a full minute). A player wins this round of the game by correctly selecting any
k > 0 of the cards on the table such that the sum of their numerical values is divisible by n. When all the cards are dealt, the players have as much time as they wish.
For example, suppose that n = 5 and the first card dealt is 16. 16 is not divisible by 5, so the players all immediately ask for another card: suppose it is 92.
92 is not divisible by 5 and neither is 92 + 16 = 108, so if the players are good,
they will swiftly ask for the next card. Suppose the next card is 64. Then someone can win by collecting the 64 and the 16 and calling attention to the fact that
64 + 16 = 80 is divisible by 5.
Here's the question: is it always possible to win the game, or can all the cards
be dealt with no solution?
We claim that it is never necessary to deal more than n cards before a solution
exists. Moreover, if the number N of cards in the deck is sufficiently large compared
to the selected modulus n, it is possible for fewer than n cards to be insufficient.
To see the latter, note that if n = 1 we obviously need n cards, and if n = 2
we will need n cards iff the first card dealt is odd. If n = 3 we may need n cards iff
$N \geq 4$, since if 1 and 4 are the first two cards dealt there is no solution. In general,
if the cards dealt are $1, 1+n, 1+2n, \ldots, 1+(n-2)n$, then these are $n-1$ cards which
are all $\equiv 1 \pmod{n}$ and clearly we cannot obtain $0 \pmod{n}$ by adding up the values
of any $0 < k \leq n-1$ of these. This is possible provided $N \geq n^2 - 2n + 1 = (n-1)^2$.¹
¹We neglect the issue of figuring out exactly how many cards are necessary if n is moderately
large compared to N. It seems interesting but does not segue into our ultimate goal.
But why are n cards always sufficient? We can give an explicit algorithm for
finding a solution: for each $1 \leq k \leq n$, let $S_k = a_1 + \ldots + a_k$ be the sum of the
values of the first k cards. If for some k, $S_k$ is divisible by n, we are done: we
can at some point select all the cards. Otherwise, we have a sequence $S_1, \ldots, S_n$
of elements in $\mathbb{Z}/n\mathbb{Z}$, none of which are $0 \pmod{n}$. By the pigeonhole principle,
there must exist $k_1 < k_2$ such that $S_{k_1} \equiv S_{k_2} \pmod{n}$, and therefore
\[ 0 \equiv S_{k_2} - S_{k_1} = a_{k_1+1} + \ldots + a_{k_2} \pmod{n}. \]
In other words, not only does a solution exist, for some $k \leq n$ a solution exists
which we can scoop up quite efficiently, by picking up a consecutive run of cards
from right to left starting with the rightmost card.
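The prefix-sum argument is short enough to run. The following is a minimal sketch in Python, assuming the dealt cards are given as a list of integers; the function name consecutive_zero_run is ours, chosen for illustration. Including the empty prefix $S_0 = 0$ treats the case "some $S_k$ is divisible by n" and the pigeonhole case uniformly.

```python
def consecutive_zero_run(cards, n):
    """Return (start, end) with 0 <= start < end <= n such that the
    consecutive cards cards[start:end] have sum divisible by n.

    Uses the prefix sums S_0 = 0, S_1, ..., S_n reduced mod n: two of these
    n + 1 values must coincide by the pigeonhole principle.
    """
    if len(cards) < n:
        raise ValueError("need at least n cards")
    seen = {0: 0}            # residue of a prefix sum -> length of that prefix
    total = 0
    for k, value in enumerate(cards[:n], start=1):
        total = (total + value) % n
        if total in seen:    # S_k == S_j (mod n) for some j < k
            return seen[total], k
        seen[total] = k
    raise AssertionError("unreachable: pigeonhole guarantees a repeat")


# Example deal with modulus 5 (the last two cards are made up for illustration).
print(consecutive_zero_run([16, 92, 64, 7, 33], 5))  # (3, 5): 7 + 33 = 40
```

Note that this only ever finds consecutive runs, so on the deal 16, 92, 64 it does not spot the winning pair 64 + 16 and keeps asking for cards.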
Notice that this is not always the only way to win the game, so if this is the
only pattern you look for you will often lose to more skillful players. For instance,
in our example of n = 5, the sequence (which we will now reduce mod 5) 1, 2, 4
already has a solution but no consecutively numbered solution.
An interesting question that we will leave the reader with is the following: fix
n and assume that N is much larger than n: this is effectively the same as drawing with replacement (because after we draw any one card $a_i$, the change in the
proportion of the cards in the deck which are congruent to $a_i \pmod{n}$ is negligible
if N is sufficiently large, and we will never deal more than n cards). Suppose then
that we deal $1 \leq k \leq n$ cards. What is the probability that a solution exists?
Anyway, we have proven the following amusing mathematical fact:
Theorem 174. Let $a_1, \ldots, a_n$ be any integers. There exists a nonempty subset
$I \subset \{1, \ldots, n\}$ such that $\sum_{i \in I} a_i \equiv 0 \pmod{n}$.
1.2. The Erdős-Ginzburg-Ziv Theorem.
After a while it is tempting to change the rules of any game. Suppose we make
things more interesting by imposing the following additional requirement: we deal
cards in sequence as before with a predetermined modulus $n \in \mathbb{Z}^+$. But this time,
instead of winning by picking up any (positive!) number of cards which sum to 0
modulo n, we must select precisely n cards $a_{i_1}, \ldots, a_{i_n}$ such that $a_{i_1} + \ldots + a_{i_n} \equiv 0 \pmod{n}$.
Now (again assuming that $N \gg n$, or equivalently, dealing with replacement), is it always possible to win eventually? If so, how many cards must be dealt?
Well, certainly at least n: since the problem is more stringent than before, again
if the first $n-1$ congruence classes are all $1 \pmod{n}$ then no solution exists. If
we have at least n instances of 1 mod n then we can take them and win. On the
other hand, if the first $n-1$ cards are all 1's, then by adding up any $k \leq n-1$ of
them we will get something strictly less than n, so if the next few cards all come
out to be $0 \pmod{n}$, then we will not be able to succeed either. More precisely, if
in the first $2n-2$ cards we get $n-1$ instances of $1 \pmod{n}$ and $n-1$ instances
of $0 \pmod{n}$, then there is no way to select precisely n of them that add up to $0 \pmod{n}$.
Thus at least $2n-1$ cards may be required. Conversely:
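\[ P_1(t_1, \ldots, t_{2p-1}) = \sum_{i=1}^{2p-1} a_i t_i^{p-1}, \]
\[ P_2(t_1, \ldots, t_{2p-1}) = \sum_{i=1}^{2p-1} t_i^{p-1}. \]
\[ \sum_{i=1}^{2p-1} a_i x_i^{p-1} = 0, \tag{41} \]
\[ \sum_{i=1}^{2p-1} x_i^{p-1} = 0. \tag{42} \]
Put
\[ I = \{1 \leq i \leq 2p-1 \mid x_i \neq 0\}. \]
Since $x^{p-1} = 1$ for every nonzero $x \in \mathbb{F}_p$, (41) and (42) give
\[ \sum_{i \in I} a_i \equiv 0 \pmod{p}, \]
\[ \sum_{i \in I} 1 \equiv 0 \pmod{p}. \]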
But we have 0 < #I < 2p, and therefore #I = p, completing the proof of Step 1.
Step 2: Because we know the theorem is true for all primes n, we
may assume that n = km for 1 < k, m < n (i.e., n is composite) and, by induction,
that the theorem holds for k and m.
By an easy induction on r, one sees that if for any $r \geq 2$ we have $rk-1$ integers $a_1, \ldots, a_{rk-1}$, then there are $r-1$ pairwise disjoint subsets $I_1, \ldots, I_{r-1}$ of
$\{1, \ldots, rk-1\}$, each of size k, such that for all $1 \leq j \leq r-1$ we have $\sum_{i \in I_j} a_i \equiv 0 \pmod{k}$.
Apply this with r = 2m to our given set of $2n-1 = (2mk)-1$ integers:
this gives $2m-1$ pairwise disjoint subsets $I_1, \ldots, I_{2m-1} \subset \{1, \ldots, 2n-1\}$, each of
size k, such that for all $1 \leq j \leq 2m-1$ we have
\[ \sum_{i \in I_j} a_i \equiv 0 \pmod{k}. \]
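\[ b_j = \sum_{i \in I_j} a_i, \qquad b_j' = \frac{b_j}{k}. \]
\[ b_1', \ldots, b_{2m-1}'. \]
\[ \sum_{i \in I} a_i = \sum_{j \in J} \sum_{i \in I_j} a_i = \sum_{j \in J} k b_j' \equiv 0 \pmod{km}. \]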
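For very small moduli, both halves of the discussion can be checked by exhaustive search. The sketch below, with function names of our own choosing, tests every sequence of $2n-1$ residues mod n for an n-term zero sum, and confirms that the sequence of $n-1$ zeros and $n-1$ ones has none. It is a brute-force sanity check, not a proof, and its running time grows far too quickly to be useful beyond tiny n.

```python
from itertools import combinations, product


def has_n_term_zero_sum(seq, n):
    """True if some n terms of seq (distinct positions) sum to 0 mod n."""
    return any(sum(choice) % n == 0 for choice in combinations(seq, n))


def egz_holds(n):
    """Check every sequence of 2n - 1 residues mod n (only residues matter)."""
    return all(
        has_n_term_zero_sum(seq, n)
        for seq in product(range(n), repeat=2 * n - 1)
    )


def extremal_sequence(n):
    """n - 1 zeros followed by n - 1 ones: length 2n - 2, no n-term zero sum."""
    return [0] * (n - 1) + [1] * (n - 1)


for n in (2, 3, 4):
    print(n, egz_holds(n), has_n_term_zero_sum(extremal_sequence(n), n))
```

Each printed line should read n, True, False: every sequence of length $2n-1$ works, while the extremal sequence of length $2n-2$ does not.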
1.3. EGZ theorems in finite groups.
This application of Chevalley-Warning (one which makes good use of our ability to
choose multiple polynomials) is apparently well-known to combinatorial number
theorists. But I didn't know about it until Patrick Corn brought it to my attention.
As with the Chevalley-Warning theorem itself, the EGZ theorem is sort of a prototype for a whole class of problems in combinatorial algebra. In any group G
(which, somewhat unusually, we will write additively even if it is not commutative)
a zero sum sequence is a finite sequence $x_1, \ldots, x_n$ of elements of G such that
(guess what?) $x_1 + \ldots + x_n = 0$. By a zero sum subsequence we shall mean
a zero sum sequence $x_{i_1}, \ldots, x_{i_k}$ associated to a nonempty subset $I = \{i_1 < \ldots < i_k\} \subset \{1, \ldots, n\}$. In this
language, our Theorem 174 says that any sequence of n elements in $\mathbb{Z}/n\mathbb{Z}$ has a
zero sum subsequence. The same argument proves the following result:
Theorem 176. Let G be a finite group (not necessarily commutative), of order
n. Then any sequence $x_1, \ldots, x_n$ in G has a zero sum subsequence.
Some EGZ-type theorems in this context are collected in the following result.
Theorem 177. (EGZ for finite groups)
a) (Erdős-Ginzburg-Ziv, 1961) Let G be a finite solvable group of order n and
$x_1, \ldots, x_{2n-1} \in G$. Then there exist distinct indices $i_1, \ldots, i_n$ (not necessarily in
increasing order) such that $x_{i_1} + \ldots + x_{i_n} = 0$.
b) [Ol76] Same as part a) but for any finite group.
c) [Su99] Same as part a) but the indices can be chosen in increasing order:
$i_1 < \ldots < i_n$.
d) [Su99] The conclusion of part c) holds for a finite group G provided it holds for
all of its Jordan-Hölder factors.
We draw the reader's attention to the distinction between the results of parts a)
and b) and those of c) and d): in the first two parts, we are allowed to reorder
the terms of the subsequence, whereas in the latter two we are not. In a commutative group it makes no difference (thus, the generalization to all finite abelian
groups is already contained in the original paper of EGZ), but in a noncommutative group the desire to preserve the order makes the problem significantly harder.
The inductive argument in Step 2 of Theorem 175 is common to all the proofs,
and is most cleanly expressed in Sury's paper as the fact that the class of finite
groups for which EGZ holds is closed under extensions. Thus the case in which G
is cyclic of prime order is seen to be crucial. In 1961 Erdős, Ginzburg and Ziv gave
an elementary proof avoiding Chevalley-Warning. Nowadays there are several
proofs available; a 1993 paper of Alon and Dubiner presented at Erdős' 80th birthday conference gives five different proofs. Olson's proof also uses only elementary
group theory, but is not easy. In contrast, Sury's paper makes full use of Chevalley-Warning and is the simplest to read: it is only three pages long.
Sury's result has the intriguing implication that it would suffice to prove the EGZ
theorem for all finite simple groups (which are now completely classified. . .). To
my knowledge no one has followed up on this.
There is another possible generalization of the EGZ theorem to finite abelian, but
non-cyclic, groups. Consider for instance $G(n, 2) := \mathbb{Z}_n \times \mathbb{Z}_n$, which of course
has order $n^2$. Rather than asking for the maximal length of a sequence without
an $n^2$-term zero sum subsequence, one might ask for the maximal length of a sequence without an n-term zero sum subsequence. (One might ask many other such
questions, of course, but in some sense this is the most reasonable vector-valued
analogue of the EGZ situation.) A bit of thought shows that the analogous lower
bound is given by the sequence consisting of $n-1$ instances each of (0, 0), (0, 1),
(1, 0) and (1, 1): in other words, this is the obvious sequence with no n-term
zero-sum subsequence, of length $4(n-1)$. It was conjectured by A. Kemnitz in
1983 that indeed any sequence in G(n, 2) of length at least $4n-3$ has an n-term
zero sum subsequence. Kemnitz's conjecture was proved in 2003 independently by
Christian Reiher [Re07] (an undergraduate!) and Carlos di Fiore (a high school
student!!). Both proofs use the Chevalley-Warning theorem, but in quite intricate
and ingenious ways.
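The "bit of thought" behind the lower bound can be confirmed by machine for small n. The following sketch (the helper names are ours) builds the sequence of $n-1$ copies each of (0, 0), (0, 1), (1, 0), (1, 1) and searches all n-term subsequences for one summing to (0, 0) in $\mathbb{Z}_n \times \mathbb{Z}_n$. It also checks that appending one more copy of (0, 0) creates a zero sum, which is consistent with, but of course far weaker than, Kemnitz's conjecture.

```python
from itertools import combinations


def kemnitz_extremal(n):
    """n - 1 copies each of (0,0), (0,1), (1,0), (1,1): length 4(n - 1)."""
    return [(x, y) for x in (0, 1) for y in (0, 1) for _ in range(n - 1)]


def has_n_term_zero_sum(seq, n):
    """True if some n terms of seq sum to (0, 0) coordinatewise mod n."""
    for choice in combinations(seq, n):
        if (sum(x for x, _ in choice) % n == 0
                and sum(y for _, y in choice) % n == 0):
            return True
    return False


for n in (2, 3, 4):
    seq = kemnitz_extremal(n)
    print(n, len(seq), has_n_term_zero_sum(seq, n),
          has_n_term_zero_sum(seq + [(0, 0)], n))
```

For each n the output should show length $4(n-1)$, then False for the extremal sequence and True once the extra (0, 0) is appended.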
For any positive integer d, define $G(n, d) = (\mathbb{Z}_n)^d$, the product of d copies of
the cyclic group of order n, and consider lengths of sequences without an n-term
zero sum subsequence: let us put f(n, d) for the maximal length of such a sequence.
Analogues of the above sequences with {0, 1}-coordinates give
\[ f(n, d) \geq 2^d (n-1). \]
In 1973 Heiko Harborth established the (much larger) upper bound
\[ f(n, d) \leq n^d (n-1). \]
Harborth also computed $f(3, 3) = 18 > 2^3 (3-1)$: i.e., in this case the obvious
examples do not have maximal length! It seems that the computation of f(n, 3)
for all n (or, still more, of f(n, d) for all d) would be a significant achievement.
2. The Combinatorial Nullstellensatz
In this section we describe a celebrated result of Noga Alon that has served as
a powerful technical tool and organizing principle in combinatorics and additive
number theory. Recall that in §2 we considered the evaluation map
\[ \Phi : k[t_1, \ldots, t_n] \to \operatorname{Map}(k^n, k) \]
and showed that when k is a finite field of order q, $\operatorname{Ker} \Phi$ is the ideal $I_0$ generated
by $t_1^q - t_1, \ldots, t_n^q - t_n$. We showed this in a somewhat unorthodox way: first, by an
explicit construction we saw that $\Phi$ is surjective. Further, clearly $\operatorname{Ker} \Phi$ contains
\[ g_i(t_i) = \prod_{s_i \in S_i} (t_i - s_i) \in k[t] = k[t_1, \ldots, t_n]. \]
Suppose that for all $s = (s_1, \ldots, s_n) \in S$ we have f(s) = 0. Then there are
polynomials $h_1(t_1, \ldots, t_n), \ldots, h_n(t_1, \ldots, t_n)$ such that
\[ f = \sum_{i=1}^n h_i g_i. \]
\[ g_i(t_i) = t_i^{d_i+1} - \sum_{j=0}^{d_i} g_{ij} t_i^j. \]
\[ x_i^{d_i+1} = \sum_{j=0}^{d_i} g_{ij} x_i^j. \]
\[ \sum_{i=1}^n h_i g_i. \]
We hope the reader has noticed that the proof of Theorem 179 bears more than a
passing resemblance to our first proof of Warning's Theorem.
Example: Let $k = \mathbb{F}_q$ be a finite field, and take $S_1 = \ldots = S_n = \mathbb{F}_q$. Then
for $1 \leq i \leq n$, $g_i = t_i^q - t_i$. Applying the Combinatorial Nullstellensatz, we
see that a polynomial f which vanishes at every point in $\mathbb{F}_q^n$ lies in the ideal
$I_0 = \langle t_1^q - t_1, \ldots, t_n^q - t_n \rangle$. That is, we have yet again computed the kernel of
the evaluation map $\Phi$.
Exercise: a) Show that, in the notation of the proof of Theorem 179, the polynomials $h_1, \ldots, h_n$ satisfy $\deg h_i \leq \deg f - \deg g_i$ for all $1 \leq i \leq n$.
b) Show that the coefficients of $h_1, \ldots, h_n$ lie in the subring of k generated by the
coefficients of $f, g_1, \ldots, g_n$.
Corollary 180. (Polynomial Method) Let k be a field, $n \in \mathbb{Z}^+$, $a_1, \ldots, a_n \in \mathbb{N}$,
and let $f \in k[t] = k[t_1, \ldots, t_n]$. We suppose:
(i) $\deg f = a_1 + \ldots + a_n$.
(ii) The coefficient of $t_1^{a_1} \cdots t_n^{a_n}$ in f is nonzero.
Then, for any subsets $S_1, \ldots, S_n$ of k with $\#S_i > a_i$ for $1 \leq i \leq n$, there is
$s = (s_1, \ldots, s_n) \in S = \prod_{i=1}^n S_i$ such that $f(s) \neq 0$.
Proof. It is no loss of generality to assume that $\#S_i = a_i + 1$ for all i, and
we do so. We will show that if (i) holds and $f|_S \equiv 0$, then (ii) does not hold, i.e.,
the coefficient of $t_1^{a_1} \cdots t_n^{a_n}$ in f is 0.
\[ f = \sum_{i=1}^n h_i g_i \]
and
\[ \deg h_i \leq (a_1 + \ldots + a_n) - \deg g_i, \quad 1 \leq i \leq n, \]
so
\[ \deg h_i g_i \leq \deg f. \tag{44} \]
\[ f(t_1, t_2) = \prod_{c \in C} (t_1 + t_2 - c) \in \mathbb{F}_p[t_1, t_2] \]
CHAPTER 16
Dirichlet Series
1. Introduction
In considering the arithmetical functions $f : \mathbb{N} \to \mathbb{C}$ as a ring under pointwise
addition and convolution:
\[ (f * g)(n) = \sum_{d_1 d_2 = n} f(d_1) g(d_2), \]
\[ (f * g)(m) := \sum_{d_1 \bullet d_2 = m} f(d_1) g(d_2). \]
But not so fast! For this definition to make sense, we either need some assurance
that for all $m \in M$ the set of all pairs $d_1, d_2$ such that $d_1 \bullet d_2 = m$ is finite (so
the sum is a finite sum), or else some analytical means of making sense of the sum
when it is infinite. But let us just give three examples:
¹Recall that Lewis Carroll, or rather Charles L. Dodgson (1832-1898), was a mathematician.
Example 1: $(M, \bullet) = (\mathbb{Z}^+, \cdot)$. This is the example we started with and of course
the set of pairs of positive integers whose product is a given positive integer is finite.
Example 2: $(M, \bullet) = (\mathbb{N}, +)$. This is the additive version of the previous example:
\[ (f * g)(n) = \sum_{i+j=n} f(i) g(j). \]
Of course this sum is finite: indeed, for $n \in \mathbb{N}$ it has exactly n+1 terms. As we shall
see shortly, this additive convolution is closely related to the Cauchy product of
infinite series.
Example 3: $(M, \bullet) = (\mathbb{R}, +)$. Here we seem to have a problem, because
for functions $f, g : \mathbb{R} \to \mathbb{C}$, we are defining
\[ (f * g)(x) = \sum_{d_1 + d_2 = x} f(d_1) g(d_2) = \sum_{y \in \mathbb{R}} f(x-y) g(y), \]
and although it is possible to define a sum over all real numbers, it turns out never
to converge unless f and g are zero for the vast majority of their values.² However,
there is a well-known replacement for a sum over all real numbers: the integral.
So one should probably define
\[ (f * g)(x) = \int_{-\infty}^{\infty} f(x-y) g(y) \, dy. \]
Here still one needs some conditions on f and g to ensure convergence of this
improper integral. It is a basic result of analysis that if
\[ \int_{-\infty}^{\infty} |f| < \infty, \qquad \int_{-\infty}^{\infty} |g| < \infty, \]
if for all $\epsilon > 0$, there exists a finite subset $T_\epsilon \subset S$ such that for all finite subsets $T \supset T_\epsilon$ we have
$|\sum_{x_i \in T} x_i - x| < \epsilon$. (This is a special case of a Moore-Smith limit.) It can be shown that such
a sum can only converge if the set of indices i such that $x_i \neq 0$ is finite or countably infinite.
\[ \sum_{n=0}^{\infty} f(n) x^n. \]
That is, to the sequence {f(n)} we associate the corresponding power series
$F(x) = \sum_n f(n) x^n$. One can look at this construction both formally and analytically.
The formal construction is purely algebraic:
the ring of formal power series $\mathbb{C}[[x]]$
consists of all expressions of the form $\sum_{n=0}^{\infty} a_n x^n$ where the $a_n$'s are complex numbers. We define addition and multiplication in the obvious ways:
\[ \sum_{n=0}^{\infty} a_n x^n + \sum_{n=0}^{\infty} b_n x^n := \sum_{n=0}^{\infty} (a_n + b_n) x^n, \]
\[ \Big( \sum_{n=0}^{\infty} a_n x^n \Big) \Big( \sum_{n=0}^{\infty} b_n x^n \Big) := \sum_{n=0}^{\infty} (a_0 b_n + a_1 b_{n-1} + \ldots + a_n b_0) x^n. \]
The latter definition seems obvious because it is consistent with the way we multiply
polynomials, and indeed
the polynomials $\mathbb{C}[x]$ sit inside $\mathbb{C}[[x]]$ as the subring of all
formal expressions
$\sum_n a_n x^n$ with $a_n = 0$ for all sufficiently large n. Now note
that this definition of multiplication is just the convolution product in the additive
monoid $(\mathbb{N}, +)$:
\[ a_0 b_n + \ldots + a_n b_0 = (a * b)(n). \]
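To see concretely that multiplying power series is additive convolution of coefficient sequences, here is a minimal sketch for polynomials (finitely supported sequences), stored as coefficient lists with the constant term first. The function name is ours.

```python
def additive_convolution(a, b):
    """Coefficient list of the product of two polynomials.

    a[i] is the coefficient of x^i; the n-th output entry is
    a_0 b_n + a_1 b_{n-1} + ... + a_n b_0, i.e. the convolution in (N, +).
    """
    out = [0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


# (1 + x)^2 = 1 + 2x + x^2
print(additive_convolution([1, 1], [1, 1]))               # [1, 2, 1]
# (1 - x)(1 + x + x^2 + x^3 + x^4) = 1 - x^5
print(additive_convolution([1, -1], [1, 1, 1, 1, 1]))     # [1, 0, 0, 0, 0, -1]
```

The second example is the truncated form of the formal identity $(1-x)(1 + x + x^2 + \ldots) = 1$.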
It is not immediately clear that anything has been gained. For instance, it is,
technically, not for free that this multiplication law of formal power series is associative (although of course this is easy to check). Nevertheless, one should not
underestimate the value of this purely formal approach. Famously, there are many
nontrivial results about sequences $f_n$ which can be proved
just by simple algebraic
\[ \frac{N^{k-1}}{(k-1)! \, (a_1 \cdots a_k)}. \]
Nevertheless we also have, and need, an analytic theory of power series, i.e., the
study of properties of $F(x) = \sum_n a_n x^n$ viewed as a function of the complex variable
x. This theory famously works out very nicely, and can be summarized as follows:
c) If two power series $F(x) = \sum_n a_n x^n$, $G(x) = \sum_n b_n x^n$ are defined and equal for
all x in some open disk of radius R > 0, then $a_n = b_n$ for all n.
In particular, it follows from Cauchy's theory of products of absolutely convergent
series that if $F(x) = \sum_n a_n x^n$ and $G(x) = \sum_n b_n x^n$ are two power series convergent on some disk of radius R > 0, then on this disk the function
FG (the product of F and G in the usual sense) is given by the power series $\sum_n (a * b)(n) x^n$. In other
words, with suitable growth conditions on the sequences, we get that the product
of the transforms is the transform of the convolution, as advertised.
Now we return to the case of interest: $(M, \bullet) = (\mathbb{Z}^+, \cdot)$. The transform that does
the trick is $f \mapsto D(f, s)$, where D(f, s) is the formal Dirichlet series
\[ D(f, s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}. \]
\[ D(f, s) D(g, s) = \sum_{(m,n)} \frac{f(m) g(n)}{m^s n^s} = \sum_{(m,n)} \frac{f(m) g(n)}{(mn)^s}, \]
where in both sums m and n range over all positive integers. To make a Dirichlet series out of this, we need to collect all the terms with a given denominator, say
$N^s$. The only way to get 1 in the denominator is to have m = n = 1, so the first
term is $\frac{f(1)g(1)}{1^s}$. Now to get a 2 in the denominator we could have m = 1, n = 2
giving the term $\frac{f(1)g(2)}{2^s}$
or also m = 2, n = 1 giving the term $\frac{f(2)g(1)}{2^s}$, so all in
all the numerator of the $2^s$-term is f(1)g(2) + f(2)g(1).
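Aha. In general, to collect all the terms with a given denominator $N^s$ in the
\[ D(f, s) \cdot D(g, s) = \Big( \sum_{n=1}^{\infty} \frac{f(n)}{n^s} \Big) \Big( \sum_{n=1}^{\infty} \frac{g(n)}{n^s} \Big) = \sum_{n=1}^{\infty} \frac{\sum_{d \mid n} f(d) g(n/d)}{n^s} = D(f * g, s). \]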
Thus we have attained our goal: under the transformation which associates to
an arithmetical function its Dirichlet series D(f, s), Dirichlet convolution of arithmetical functions becomes the usual multiplication of functions!
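The coefficient rule "collect all terms with denominator $n^s$" is easy to compute with directly. Below is a minimal sketch of Dirichlet convolution on the integers up to a cutoff; the function names and the cutoff parameter limit are ours. As a check it verifies two standard facts: convolving the Möbius function with the constant function 1 gives the function that is 1 at n = 1 and 0 elsewhere, and convolving 1 with itself counts divisors.

```python
def dirichlet_convolution(f, g, limit):
    """Return h with h[n] = sum over d | n of f(d) * g(n // d), 1 <= n <= limit.

    h[0] is unused padding so that indices match the argument n.
    """
    h = [0] * (limit + 1)
    for d in range(1, limit + 1):
        f_d = f(d)
        for m in range(1, limit // d + 1):
            h[d * m] += f_d * g(m)
    return h


def mobius(n):
    """Moebius function by trial division (fine for small n)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:       # p^2 divided the original n
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def one(n):
    return 1


print(dirichlet_convolution(mobius, one, 12)[1:])  # 1 followed by zeros
print(dirichlet_convolution(one, one, 12)[12])     # 6 divisors of 12
```

On the Dirichlet series side the first identity is the statement that $D(\mu, s) \cdot D(1, s) = 1$.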
There are now several stages in the theory of Dirichlet series:
Step 1: Explore the purely formal consequences: that is, that identities involving
convolution and inversion of arithmetical functions come out much more cleanly on
the Dirichlet series side.
Step 2: Develop the theory of D(f, s) as a function of a complex variable s. It
is rather easy to tell when the series D(f, s) is absolutely convergent. In particular,
with suitable growth conditions on f (n) and g(n), we can see that
\[ D(f, s) D(g, s) = D(f * g, s) \]
holds not just formally but also as an equality of functions of a complex variable.
In particular, this leads to an analytic proof of the Möbius Inversion Formula.
On the other hand, unlike power series there can be a region of the complex
plane with nonempty interior in which the Dirichlet series D(f, s) is only conditionally convergent (that is, convergent but not absolutely convergent). We will
present, without proofs, the basic results on this more delicate convergence theory.
In basic analysis we learn to abjure conditionally convergent series, but they lie
at the heart of analytic number theory. In particular, in order to prove Dirichlet's
theorem on arithmetic progressions one studies the Dirichlet series $L(\chi, s)$ attached
to a Dirichlet character $\chi$ (a special kind of arithmetical function we will define
later on), and it is extremely important that for all $\chi \neq 1$, there is a critical strip
in the complex plane for which $L(\chi, s)$ is only conditionally convergent. We will
derive this using the assumed results about conditional convergence of Dirichlet
series and a convergence test, Dirichlet's test, from advanced calculus.³ Finally,
as an example of how much more content and subtlety lies in conditionally convergent series, we will use Dirichlet series to give an analytic continuation of the zeta
function to the right half-plane (complex numbers with positive real part), which
allows for a rigorous and concrete statement of the Riemann hypothesis.
2. Some Dirichlet Series Identities
Example 1: If f = 1 is the constant function 1, then by definition D(1, s) is what is
probably the single most important function in all of mathematics, the Riemann
zeta function:
\[ \zeta(s) = D(1, s) = \sum_{n=1}^{\infty} \frac{1}{n^s}. \]
³P.G.L. Dirichlet propounded his convergence test with this application in mind.
\[ D(\mu, s) = \frac{1}{\zeta(s)}. \]
Probably this is the most important such identity: it relates combinatorial methods (the Möbius function is closely related to the inclusion-exclusion principle) to
analytical methods. More on this later.
We record without proof the following further identities, whose derivations are
similarly straightforward. Some notational reminders: we write $\iota$ for the function
$n \mapsto n$; $\iota_k$ for the function $n \mapsto n^k$; and $\lambda$ for the function $n \mapsto (-1)^{\Omega(n)}$, where
$\Omega(n)$ is the number of prime divisors of n counted with multiplicity.
\[ D(\iota, s) = \zeta(s-1). \]
\[ D(\iota_k, s) = \zeta(s-k). \]
\[ D(\sigma, s) = \zeta(s) \zeta(s-1). \]
\[ D(\sigma_k, s) = \zeta(s) \zeta(s-k). \]
\[ D(\varphi, s) = \frac{\zeta(s-1)}{\zeta(s)}. \]
\[ D(\lambda, s) = \frac{\zeta(2s)}{\zeta(s)}. \]
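Identities of this kind can be spot-checked numerically at a point of absolute convergence. As one hedged illustration, take the function $n \mapsto (-1)^{\Omega(n)}$ (the Liouville function) at s = 2: using the classical values $\zeta(2) = \pi^2/6$ and $\zeta(4) = \pi^4/90$, the last identity predicts the value $\zeta(4)/\zeta(2) = \pi^2/15$. The sketch below compares this with a partial sum; the function names and the cutoff are ours, and the tail of the series beyond N terms is at most 1/N in absolute value.

```python
import math


def big_omega(n):
    """Number of prime factors of n counted with multiplicity."""
    count, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            n //= p
            count += 1
        p += 1
    return count + (1 if n > 1 else 0)


def liouville_series(s, terms):
    """Partial sum of sum_{n >= 1} (-1)^Omega(n) / n^s."""
    return sum((-1) ** big_omega(n) / n ** s for n in range(1, terms + 1))


approx = liouville_series(2, 20000)
predicted = math.pi ** 2 / 15        # zeta(4) / zeta(2)
print(approx, predicted, abs(approx - predicted))
```

Agreement to about four decimal places is all that the crude tail bound guarantees here; this is a sanity check and not a proof of the identity.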
3. Euler Products
Our first task is to make formal sense of an infinite product of infinite series, which
is unfortunately somewhat technical. Suppose that we have an infinite indexing set
P and for each element p of P an infinite series whose first term is 1:
\[ \sum_{n=0}^{\infty} a_{p,n} = 1 + a_{p,1} + a_{p,2} + \ldots. \]
Then by the infinite product $\prod_{p \in P} \sum_n a_{p,n}$ we mean an infinite series whose terms
$t \in T$ is just a function
$t : P \to \mathbb{N}$ such that t(p) = 0 for all but finitely many p in
P.⁴ Then by $\prod_{p \in P} \sum_n a_{p,n}$ we mean the formal infinite series
\[ \sum_{t \in T} \prod_{p \in P} a_{p,t(p)}. \]
Note well that for each t, since t(p) = 0 except for finitely many p and since $a_{p,0} = 1$
for all p, the product $\prod_{p \in P} a_{p,t(p)}$ is really a finite product. Thus the series is well-defined formally; that is, merely in order to write it down, no notion of limit of
an infinite process is involved.
Let us informally summarize the preceding: to make sense of a formal infinite
product of the form
\[ \prod_{p \in P} \sum_{n=0}^{\infty} a_{p,n}, \]
we give ourselves one term for each possible product of one term from the first
series, one term from the second series, and so forth, but we are only allowed to
choose a term which is different from the $a_{p,0} = 1$ term finitely many times.
With that out of the way, recall that when developing the theory of arithmetical functions, we found ourselves in much better shape under the hypothesis of
multiplicativity. It is natural to ask what purchase we gain on D(f, s) by assuming the multiplicativity of f . The answer is that multiplicativity of f is equivalent
to the following formal identity:
\[ D(f, s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s} = \prod_p \Big( 1 + \frac{f(p)}{p^s} + \frac{f(p^2)}{p^{2s}} + \ldots \Big). \tag{46} \]
Here the product extends over all primes. The fact that this identity holds (as an
identity of formal series) follows from the uniqueness of the prime power factorization of positive integers.
An expression as in (46) is called an Euler product expansion. If f is moreover
completely multiplicative, then $\frac{f(p^k)}{p^{ks}} = \big( \frac{f(p)}{p^s} \big)^k$, and each factor in the product is a
geometric series with ratio $\frac{f(p)}{p^s}$, so we get
\[ D(f, s) = \prod_p \Big( 1 - \frac{f(p)}{p^s} \Big)^{-1}. \]
\[ \frac{1}{\zeta(s)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^s} = \prod_p \Big( 1 - \frac{1}{p^s} \Big), \tag{47} \]
⁴The property that t(p) = 0 except on a finite set is, by definition, what distinguishes the
infinite direct sum from the infinite direct product.
and, plugging in s = 2,
\[ \frac{6}{\pi^2} = \frac{1}{\zeta(2)} = \sum_{n=1}^{\infty} \frac{\mu(n)}{n^2} = \prod_p \Big( 1 - \frac{1}{p^2} \Big). \]
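Before worrying about rigor, one can at least watch the numbers agree. The following sketch multiplies the factors $1 - 1/p^2$ over all primes up to a cutoff and compares the result with $6/\pi^2$. The sieve helper and the cutoff are our own choices; the truncated product always lies slightly above the limit, with relative error below 1/P for cutoff P. This is a numerical illustration only and says nothing about the convergence questions taken up in the text.

```python
import math


def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Cross out p^2, p^2 + p, ... in one slice assignment.
            sieve[p * p::p] = bytearray(len(range(p * p, limit + 1, p)))
    return [n for n in range(2, limit + 1) if sieve[n]]


def truncated_euler_product(limit):
    """Product of (1 - 1/p^2) over primes p <= limit."""
    product = 1.0
    for p in primes_up_to(limit):
        product *= 1.0 - 1.0 / (p * p)
    return product


print(truncated_euler_product(100000), 6 / math.pi ** 2)
```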
But not so fast! We changed the game here: so far (47) expresses a formal identity
of Dirichlet series. In order to be able to plug in a value of s, we need to discuss the
convergence properties of Dirichlet series and Euler products. In particular, since
we did not put any particular ordering on our formal infinite product, in order for
the sum to be meaningful we need the series involved to be absolutely convergent.
It is therefore to this topic that we now turn.
4. Absolute Convergence of Dirichlet Series
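Let us first study the absolute convergence of Dirichlet series $\sum_n \frac{a_n}{n^s}$. That is, we
Theorem 185. Suppose a Dirichlet series $D(s) = \sum_n \frac{a_n}{n^s}$ is absolutely convergent at some complex number $s_0 = \sigma_0 + i t_0$. Then it is also absolutely convergent
at all complex numbers s with $\sigma = \Re(s) > \sigma_0$.
Proof. If $\sigma = \Re(s) > \sigma_0 = \Re(s_0)$, then $n^{\sigma} \geq n^{\sigma_0}$ for all $n \in \mathbb{Z}^+$, so
\[ \sum_{n=1}^{\infty} \Big| \frac{a_n}{n^s} \Big| = \sum_{n=1}^{\infty} \frac{|a_n|}{n^{\sigma}} \leq \sum_{n=1}^{\infty} \frac{|a_n|}{n^{\sigma_0}} = \sum_{n=1}^{\infty} \Big| \frac{a_n}{n^{s_0}} \Big| < \infty. \]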
It follows that the set of real parts $\sigma = \Re(s)$ of the points s at which a Dirichlet series D(f, s)
converges absolutely is one of the following:
(i) The empty set. (I.e., for no s does the series absolutely converge.)
(ii) $(-\infty, \infty)$.
(iii) A half-infinite interval of the form $(S, \infty)$.
(iv) A half-infinite interval of the form $[S, \infty)$.
Notice that in all cases, there is a unique $\sigma_{ac} \in [-\infty, \infty]$ such that:
(AAC1) For all s with $\Re(s) > \sigma_{ac}$, D(s) is absolutely convergent.
(AAC2) For all s with $\Re(s) < \sigma_{ac}$, D(s) is not absolutely convergent.
This unique $\sigma_{ac}$ is called the abscissa of absolute convergence of D(s).
Example 1 (Type i): $D(s) = \sum_n \frac{2^n}{n^s}$.
This series does not converge (absolutely or otherwise) for any $s \in \mathbb{C}$: no matter
what s is, $|2^n n^{-s}| \to \infty$: exponentials grow faster than power functions. So
$\sigma_{ac} = \infty$.
Example 2 (Type ii): A trivial example is the zero series $a_n = 0$ for all n, or
for that matter, any series with $a_n = 0$ for all sufficiently large n: these give finite
sums. Or we could take $a_n = 2^{-n}$ and now the series converges absolutely independent of s. So $\sigma_{ac} = -\infty$.
⁵In other words, for a complex number s we write $\sigma$ for its real part and t for its imaginary
part. This seemingly unlikely notation was introduced in a fundamental paper of Riemann, and
remains standard to this day.
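Example 3 (Type iii): $\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}$
\[ \frac{1}{(\log n)^2}, \]
\[ \sum_n \frac{|f(n)|}{n^{N+1+\epsilon}} \leq \sum_n \frac{C n^N}{n^{N+1+\epsilon}} = C \sum_n \frac{1}{n^{1+\epsilon}} < \infty. \]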
Corollary 188. Let f (n), g(n) be arithmetical functions with polynomial
growth of order N . Then
\[ D(f, s) D(g, s) = D(f * g, s) \]
is an equality of functions defined on $(N+1, \infty)$.
This follows easily from the theory of absolute convergence and the Cauchy product.
Theorem 189. (Uniqueness Theorem) Let f(n), g(n) be arithmetical functions
whose Dirichlet series are both absolutely convergent in the halfplane $\sigma = \Re(s) > \sigma_0$.
Suppose there exists an infinite sequence $s_k$ of complex numbers, with $\sigma_k = \Re(s_k) > \sigma_0$
for all k and $\sigma_k \to \infty$, such that $D(f, s_k) = D(g, s_k)$ for all k. Then f(n) = g(n)
for all n.
Proof. First we put $h(n) := f(n) - g(n)$, so that $D(h, s) = D(f, s) - D(g, s)$.
Then our assumption is that $D(h, s_k) = 0$ for all k, and we wish to show that
h(n) = 0 for all n.
So suppose not, and let N be the least n for which $h(n) \neq 0$. Then
\[ D(h, s) = \sum_{n=N}^{\infty} \frac{h(n)}{n^s} = \frac{h(N)}{N^s} + \sum_{n=N+1}^{\infty} \frac{h(n)}{n^s}, \]
so
\[ h(N) = N^s D(h, s) - N^s \sum_{n=N+1}^{\infty} \frac{h(n)}{n^s}. \]
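\[ h(N) = -N^{s_k} \sum_{n=N+1}^{\infty} \frac{h(n)}{n^{s_k}}. \]
\[ |h(N)| \leq N^{\sigma_k} \sum_{n=N+1}^{\infty} |h(n)| n^{-\sigma_k} \leq \frac{N^{\sigma_k}}{(N+1)^{\sigma_k - c}} \sum_{n=N+1}^{\infty} |h(n)| n^{-c} \leq C \Big( \frac{N}{N+1} \Big)^{\sigma_k}, \]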
\[ f(n) = \sum_{d \mid n} F(d) \mu(n/d). \]
Surely this was the first known version of the Möbius inversion formula. Of course
as Hardy and Wright remark in their classic text, the real proof of MIF is the
purely algebraic one we gave earlier, but viewing things in terms of honest functions has a certain appeal.
Moreover, the theory of absolute convergence of infinite products (see e.g. [A1,
11.5]) allows us to justify our formal Euler product expansions:
valid for all s with $\Re(s) > \sigma_{ac}$. If f is completely multiplicative, this simplifies to
\[ D(f, s) = \prod_p \Big( 1 - \frac{f(p)}{p^s} \Big)^{-1}. \]
Euler products are ubiquitous in modern number theory: they play a prominent
role in (e.g.!) the proof of Fermat's Last Theorem.
5. Conditional Convergence of Dirichlet Series
Let $D(f, s) = \sum_{n=1}^{\infty} \frac{a_n}{n^s}$ be a Dirichlet series. We assume that the abscissa of
absolute convergence $\sigma_{ac}$ is finite.
Theorem 193. There exists a real number $\sigma_c$ with the following properties:
(i) If $\Re(s) > \sigma_c$, then D(f, s) converges (not necessarily absolutely).
(ii) If $\Re(s) < \sigma_c$, then D(f, s) diverges.
Because the proof of this result is already somewhat technical, we defer it until
X.X on general Dirichlet series, where we will state and prove a yet stronger result.
Definition: $\sigma_c$ is called the abscissa of convergence.
Contrary to the case of absolute convergence we make no claims about the convergence or divergence of D(f, s) along the line $\Re(s) = \sigma_c$: this is quite complicated.
Proposition 194. We have
\[ 0 \leq \sigma_{ac} - \sigma_c \leq 1. \]
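Proof. Since absolutely convergent series are convergent, we evidently must
have $\sigma_{ac} \geq \sigma_c$. On the other hand, let $s = \sigma + it$ be a complex number such that
$\sum_{n=1}^{\infty} \frac{a_n}{n^s}$ converges. Of course this implies that $\frac{a_n}{n^s} \to 0$ as $n \to \infty$, and that in
turn implies that there exists an N such that $n \geq N$ implies $|\frac{a_n}{n^s}| = \frac{|a_n|}{n^{\sigma}} \leq 1$. Now
let $s'$ be any complex number with real part $\sigma + 1 + \epsilon$ for any $\epsilon > 0$. Then for all
$n \geq N$,
\[ \Big| \frac{a_n}{n^{s'}} \Big| = \frac{|a_n|}{n^{\sigma}} \cdot \frac{1}{n^{1+\epsilon}} \leq \frac{1}{n^{1+\epsilon}}, \]
so by comparison to a p-series with $p = 1 + \epsilon > 1$, $D(f, s')$ is absolutely convergent.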
It can be a delicate matter to show that a series is convergent but not absolutely
convergent: there are comparatively few results that give criteria for this. The
following one (sometimes encountered in an advanced calculus class) will serve us
well.
Proposition 195. (Dirichlet's Test) Let $\{a_n\}$ be a sequence of complex numbers and $\{b_n\}$ a sequence of real numbers. Suppose both of the following hold:
(i) There exists a fixed M such that for all $N \in \mathbb{Z}^+$, $|\sum_{n=1}^{N} a_n| \leq M$ (bounded
partial sums);
(ii) $b_1 \geq b_2 \geq \ldots \geq b_n \geq \ldots$ and $\lim_{n \to \infty} b_n = 0$.
Then the series $\sum_{n=1}^{\infty} a_n b_n$ converges.
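Proof. Write $S_N$ for $\sum_{n=1}^{N} a_n$, so that by (i) we have $|S_N| \leq M$ for all N. Fix
$\epsilon > 0$, and choose N such that $b_N < \frac{\epsilon}{2M}$. Then, for all $n \geq m \geq N$:
\[ \Big| \sum_{k=m}^{n} a_k b_k \Big| = \Big| \sum_{k=m}^{n} (S_k - S_{k-1}) b_k \Big| = \Big| \sum_{k=m}^{n} S_k b_k - \sum_{k=m-1}^{n-1} S_k b_{k+1} \Big| \]
\[ = \Big| \sum_{k=m}^{n-1} S_k (b_k - b_{k+1}) + S_n b_n - S_{m-1} b_m \Big| \leq M \Big( \sum_{k=m}^{n-1} (b_k - b_{k+1}) + b_n + b_m \Big) = 2M b_m \leq 2M b_N < \epsilon. \]
Therefore the series satisfies the Cauchy criterion and hence converges.⁶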
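The summation by parts identity that drives this argument is an exact finite identity, so it can be verified with rational arithmetic. In the sketch below (names ours) we take $a_k = (-1)^{k+1}$, whose partial sums are bounded by M = 1, and $b_k = 1/k$, and check both the rewriting $\sum_{k=m}^{n} a_k b_k = \sum_{k=m}^{n-1} S_k (b_k - b_{k+1}) + S_n b_n - S_{m-1} b_m$ and the resulting bound $2 M b_m$.

```python
from fractions import Fraction


def direct_sum(a, b, m, n):
    """sum_{k=m}^{n} a_k b_k for 1-based sequences stored as a[k], b[k]."""
    return sum(a[k] * b[k] for k in range(m, n + 1))


def summed_by_parts(a, b, m, n):
    """The same sum, rewritten via the partial sums S_k = a_1 + ... + a_k."""
    S = [Fraction(0)]                     # S[0] = 0 is the empty sum
    for k in range(1, n + 1):
        S.append(S[-1] + a[k])
    interior = sum(S[k] * (b[k] - b[k + 1]) for k in range(m, n))
    return interior + S[n] * b[n] - S[m - 1] * b[m]


size = 40
# Index 0 is padding so that a[k], b[k] match the 1-based notation.
a = [None] + [Fraction((-1) ** (k + 1)) for k in range(1, size + 1)]
b = [None] + [Fraction(1, k) for k in range(1, size + 1)]

m, n = 5, 40
print(direct_sum(a, b, m, n) == summed_by_parts(a, b, m, n))   # True
print(abs(direct_sum(a, b, m, n)) <= 2 * b[m])                 # True, M = 1
```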
Theorem 196. Let $\{a_n\}_{n=1}^{\infty}$ be a complex sequence.
a) Suppose that the partial sums $\sum_{n=1}^{N} a_n$ are bounded. Then the Dirichlet series
$\sum_{n=1}^{\infty} \frac{a_n}{n^s}$ has $\sigma_c \leq 0$.
b) Assume in addition that $a_n$ does not converge to 0. Then $\sigma_{ac} = 1$, $\sigma_c = 0$.
Proof. By Proposition 186, $\sigma_{ac} = 1$. For any real number $\sigma > 0$, by
taking $b_n = n^{-\sigma}$ the hypotheses of Proposition 195 are satisfied, so that $D(\sigma) = \sum_n \frac{a_n}{n^{\sigma}}$
converges. The smallest right open half-plane which contains all positive real numbers is of course $\Re(s) > 0$, so $\sigma_c \leq 0$. By Proposition 194 we have $1 = \sigma_{ac} \leq 1 + \sigma_c$,
so we conclude that $\sigma_c = 0$.
Theorem 197. (Theorem 11.11 of [A1]) A Dirichlet series D(f, s) converges
uniformly on compact subsets of the half-plane of convergence $\Re(s) > \sigma_c$.
Suffice it to say that, in the theory of sequences of functions, uniform convergence on compact subsets is the magic incantation. As a consequence, we may
differentiate and integrate Dirichlet series term-by-term. Also:
Corollary 198. The function $D(f, s) = \sum_{n=1}^{\infty} \frac{f(n)}{n^s}$ defined by a Dirichlet
series in its half-plane $\Re(s) > \sigma_c$ of convergence is complex analytic.
6. Dirichlet Series with Nonnegative Coefficients
Suppose we are given a Dirichlet series $D(s) = \sum_n \frac{a_n}{n^s}$ with the property that for
all n, an is real and non-negative. There is more to say about the analytic theory
of such series. First, the non-negativity hypothesis ensures that for any real s, D(s)
is a series with non-negative terms, so its absolute convergence is equivalent to its
convergence. Thus:
Lemma 199. For a Dirichlet series with non-negative real coecients, the abscissae of convergence and absolute convergence coincide.
⁶This type of argument is known as summation by parts.
Thus one of the major differences from the theory of power series is eliminated
for Dirichlet series with non-negative real coefficients. Another critical property of
all complex power series is that the radius of convergence R is as large as conceivably possible, in that the function necessarily has a singularity somewhere on the
boundary of the disk $|z - z_0| < R$ of convergence. This property need not be true
for an arbitrary Dirichlet series. Indeed the series
\[ D(s) = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)^s} = 1 - \frac{1}{3^s} + \frac{1}{5^s} - \ldots \]
has $\sigma_c = 0$ but extends to an analytic function on the entire complex plane.⁷ However:
Theorem 200. (Landau) Let $D(s) = \sum_n \frac{a_n}{n^s}$ be a Dirichlet series, with $a_n$ real
and non-negative for all n. Suppose that for a real number $\sigma$, D(s) converges in the
half-plane $\Re(s) > \sigma$, and that D(s) extends to an analytic function in a neighborhood
of $\sigma$. Then $\sigma$ strictly exceeds the abscissa of convergence $\sigma_c$.
Proof (Kedlaya): Suppose on the contrary that D(s) extends to an analytic function
on the disk $|s - \sigma| < \epsilon$, for some $\epsilon > 0$, but $\sigma = \sigma_c$. Choose $c \in (\sigma, \sigma + \epsilon/2)$, and
write
\[ D(s) = \sum_{n} a_n n^{-c} n^{c-s} = \sum_{n} a_n n^{-c} e^{(c-s) \log n} = \sum_{n=1}^{\infty} \sum_{k=0}^{\infty} \frac{a_n n^{-c} (\log n)^k}{k!} (c-s)^k. \]
Here we have a double series with all coefficients non-negative, so it must converge
absolutely on the disk $|s - c| < \epsilon/2$. In particular, viewed as a Taylor series in (c - s),
this must be the Taylor series expansion of D(s) at s = c. Since D(s) is assumed
to be holomorphic in the disk $|s - c| < \epsilon/2$, this Taylor series is convergent there.
In particular, choosing any real number $\sigma'$ with $c - \epsilon/2 < \sigma' < \sigma$, we have that
$D(\sigma')$ is absolutely convergent. But this implies that the original Dirichlet series is
convergent at $\sigma'$, contradiction!
For example, it follows from Landau's theorem that the Riemann zeta function
$\zeta(s) = \sum_n \frac{1}{n^s}$ must have a singularity at s = 1, since otherwise there would exist
some $\sigma < 1$ such that the series converges in the entire half-plane $\Re(s) > \sigma$.
Of course this is a horrible illustration of the depth of Landau's theorem, since
we used the fact that $\zeta(1) = \infty$ in order to compute the abscissa of convergence
of the zeta function! We will see a much deeper application of Landau's theorem
during the proof of Dirichlet's theorem on primes in arithmetic progressions.
7. Characters and L-Series
Let $f : \mathbb{Z}^+ \to \mathbb{C}$ be an arithmetic function.
Recall that $f$ is said to be completely multiplicative if $f(1) \neq 0$ and for all
$a, b \in \mathbb{Z}^+$, $f(ab) = f(a)f(b)$. The conditions imply $f(1) = 1$.
[7] We will see a proof of the former statement shortly, but not the latter. More generally, it is
true for the L-function associated to any primitive Dirichlet character.
$$L(\chi, s) = D(\chi, s) = \sum_{n=1}^{\infty} \frac{\chi(n)}{n^s}.$$
In particular, taking $\chi = \chi_1 = 1$, we get $L(\chi_1, s) = \zeta(s)$, which has $\sigma_{ac} = \sigma_c = 1$.
But this is the exception:
Theorem 203. Let $\chi$ be a nontrivial Dirichlet character. Then for the Dirichlet
L-series $L(\chi, s) = D(\chi, s)$, we have $\sigma_{ac} = 1$, $\sigma_c = 0$.
Proof. It follows from the orthogonality relations [Handout A2.5, Theorem
17] that since $\chi$ is nonprincipal, the partial sums of the coefficients of $L(\chi, s)$ are bounded. Indeed
since $|\chi(n)| \le 1$ for each $n$ and the sum over any $N$ consecutive values is zero, the
partial sums are bounded by $N$. Also we clearly have $\chi(n) = 1$ for infinitely many
$n$, e.g. for all $n \equiv 1 \pmod{N}$. So the result follows directly from Theorem 196.
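The boundedness step in this proof is easy to check numerically. The sketch below is only an illustration under stated assumptions: it takes the Legendre symbol modulo 7 as a sample nonprincipal character (our choice, not one singled out in the text), and the helper names `legendre_symbol` and `max_partial_sum` are hypothetical.

```python
def legendre_symbol(n, p):
    """Quadratic character modulo an odd prime p, via Euler's criterion."""
    r = pow(n % p, (p - 1) // 2, p)   # r is 0, 1, or p - 1
    return -1 if r == p - 1 else r


def max_partial_sum(p, upto):
    """Largest |chi(1) + ... + chi(n)| for n <= upto."""
    total, worst = 0, 0
    for n in range(1, upto + 1):
        total += legendre_symbol(n, p)
        worst = max(worst, abs(total))
    return worst


# The character sums to zero over one full period ...
period_sum = sum(legendre_symbol(n, 7) for n in range(1, 8))
# ... so the partial sums never exceed the period in absolute value.
bound_holds = max_partial_sum(7, 1000) <= 7
```

Any other nonprincipal character would serve equally well here; the quadratic one is used only because it can be computed in one line.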
We remark that most of the proof of Dirichlet's theorem (specifically, that
every congruence class $n \in (\mathbb{Z}/N\mathbb{Z})^{\times}$ contains infinitely many primes) involves
showing that for every nontrivial character $\chi$, $L(\chi, 1 + it)$ is nonzero for all $t \in \mathbb{R}$.
This turns out to be much harder if $\chi$ takes on only real values.
8. An Explicit Statement of the Riemann Hypothesis
Let $g$ be the arithmetical function $g(n) = (-1)^{n+1}$. Then:
$$D(g, s) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^s} = \sum_{n=1}^{\infty} \frac{1}{n^s} - 2 \sum_{k=1}^{\infty} \frac{1}{(2k)^s} = \zeta(s)(1 - 2^{1-s}).$$
This formal manipulation holds analytically on the region on which all series are
absolutely convergent, namely on $\Re(s) > 1$. On the other hand, by Example XX
above we know that $D(g, s)$ is convergent for $\Re(s) > 0$. So consider the function
$$Z(s) = \frac{D(g, s)}{1 - 2^{1-s}}.$$
By Corollary 198 the numerator is complex analytic for $\Re(s) > 0$. The denominator
is defined and analytic on the entire complex plane, and is zero when
$2^{1-s} = e^{(1-s)\log 2} = 1$, or when $1 - s = \frac{2\pi n i}{\log 2}$ for $n \in \mathbb{Z}$, so when $s = s_n = 1 - \frac{2\pi n i}{\log 2}$.
But by construction $Z(s) = \zeta(s)$ for $\Re(s) > 1$, so $Z(s)$ is what is called a
meromorphic continuation of the zeta function.
Remark: All of the zeroes of $1 - 2^{1-s}$ are simple (i.e., are not also zeroes for the
derivative). It follows that for $n \neq 0$, $Z(s)$ is holomorphic at $s_n$ iff $D(g, s_n) = 0$.
We will see in the course of the proof of Dirichlet's theorem that this is indeed the
case, and thus $Z(s) = \zeta(s)$ is analytic in $\Re(s) > 0$ with the single exception of a
simple pole at $s = 1$.
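The quotient $Z(s) = D(g, s)/(1 - 2^{1-s})$ can be evaluated numerically for real $s > 0$, including points where the series for $\zeta(s)$ itself diverges. The following is a rough sketch rather than a serious numerical method: the function names `eta` and `Z` are ours, only real $s$ is handled, and averaging two consecutive partial sums is a convenience we add to tame the slow convergence of the alternating series.

```python
import math


def eta(s, terms=100_000):
    """Approximate D(g, s) = sum_{n>=1} (-1)^(n+1) / n^s for real s > 0."""
    total = 0.0
    previous = 0.0
    for n in range(1, terms + 1):
        previous = total
        total += (-1) ** (n + 1) / n ** s
    # The true value lies between consecutive partial sums of an
    # alternating series, so their mean is a much better estimate.
    return 0.5 * (previous + total)


def Z(s, terms=100_000):
    """Approximate D(g, s) / (1 - 2^(1 - s)) for real s > 0, s != 1."""
    return eta(s, terms) / (1 - 2 ** (1 - s))


zeta_two = Z(2.0)       # should agree with pi^2 / 6
zeta_half = Z(0.5)      # a value of the continued zeta function inside the critical strip
```

At $s = 2$ the result can be compared with the classical value $\pi^2/6$; at $s = 1/2$ the original series $\sum n^{-s}$ diverges, but the quotient still returns a finite (negative) number.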
However, our above analysis already shows that $1 - 2^{1-s}$ is defined and nonzero in
the critical strip $0 < \Re(s) < 1$, so that for such an $s$, $Z(s) = 0 \iff D(g, s) = 0$.
We can therefore give a precise statement of the Riemann hypothesis in the following (misleadingly, of course) innocuous form:
Conjecture 204. (Riemann Hypothesis) Suppose $s$ is a zero of the function
$$D(g, s) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^s}$$
with $0 < \Re(s) < 1$. Then $\Re(s) = \frac{1}{2}$.
an esn ,
n=1
c = lim sup
n
log |
k=1
n
k=1
ak |
log |
ak |
log |
k=1
ak |
c = lim sup
n
1
ln |
ai |.
n
i=1
But in fact it is no coincidence: just as general Dirichlet series generalize ordinary Dirichlet series, which we recover by taking $\lambda_n = \log n$, they also generalize
power series, which we essentially recover by taking $\lambda_n = n$. Indeed,
$$\sum_{n=1}^{\infty} a_n e^{-ns} = \sum_{n=1}^{\infty} a_n x^n,$$
with $x = e^{-s}$. This change of variables takes right half-planes to disks around the
origin: indeed the open disk $|x| < R$ corresponds to
$$|x| = |e^{-s}| = |e^{-\sigma - it}| = e^{-\sigma} < R,$$
or $\sigma > -\log R$, a right half-plane. Under the change of variables $x = e^{-s}$ the origin
$x = 0$ corresponds to some ideal complex number with infinitely large real part.
At first the fact that we have a theory which simultaneously encompasses Dirichlet
series and power series seems hard to believe, since the open disks of convergence
and of absolute convergence for a power series are identical. However, the analogue
of Proposition 194 for general Dirichlet series is
Proposition 206. Let $D_\lambda(a, s)$ be a general Dirichlet series. Then the abscissae of absolute convergence and of convergence are related by:
$$0 \le \sigma_{ac} - \sigma_c \le \limsup_{n \to \infty} \frac{\log n}{\lambda_n}.$$
In the case $\lambda_n = n$ we have $\frac{\log n}{\lambda_n} = \frac{\log n}{n} \to 0$, so that $\sigma_{ac} = \sigma_c$.
We leave it as an exercise for the interested reader to compare the formulae (48)
and (49) with Hadamard's formula $R^{-1} = \limsup_n |a_n|^{1/n}$ for the radius of convergence of power series. (After making the change of variables $x = e^{-s}$ they are not
identical formulae, but it is not too hard to show that they are equivalent in the
sense that any of them can be derived from the others without too much trouble.)
CHAPTER 17
$$P_a(s) = \sum_{p \equiv a \pmod{N}} \frac{1}{p^s},$$
which is defined say for real numbers $s > 1$, and to show that $\lim_{s \to 1^+} P_a(s) = +\infty$.
Of course this suffices, because a divergent series must have infinitely many terms!
The function $P_a(s)$ will in turn be related to a finite linear combination of logarithms
of Dirichlet L-series, and the differing behavior of the Dirichlet series for principal
and non-principal characters is a key aspect of the proof. Indeed, the fuel for the
entire proof is the following surprisingly deep fact:
Theorem 208. (Dirichlet's Nonvanishing Theorem) For any non-principal Dirichlet character $\chi$ of period $N$, we have $L(\chi, 1) \neq 0$.
[1] In fact, with relatively little additional work, one can show that the primes are, in a certain
precise sense, equidistributed among the $\varphi(N)$ possible congruence classes.
There are many possible routes to Theorem 208. We have chosen (following Serre)
to present a proof which exploits the theory of Dirichlet series which we have developed in the previous handout in loving detail. As in our treatment of Dirichlet
series, we do find it convenient to draw upon a small amount of complex function
theory. These results are summarized in Appendix C, which may be most useful
for a reader who has not yet been exposed to complex analysis but has a good
command of the theory of sequences and series of real functions.
I hope that readers who are unable or unwilling to check carefully through all
the analytic details of the proof will still gain an appreciation for the sometimes
difficult but also quite beautiful ideas which are on display here. It may be appropriate for me to end this introduction with a personal statement. I believe
that I first encountered the proof of Dirichlet's theorem during a reading course in
(mostly analytic) number theory that I took as an undergraduate with Professor R.
Narasimhan, but in truth I have little memory of it. For my entire graduate career
I neglected analysis in general and analytic number theory in particular, to the extent that I came to regard the study of conditionally convergent series as a sort of
idle amusement. As a postdoc in Montreal I found myself in an environment where
analytic and algebraic number theory were regarded with roughly equal importance
(and better yet, often practiced simultaneously). Eventually the limitations of my
overly algebraic bias became clear to me, and since my arrival at UGA I have made
some progress working my way back towards a more balanced perspective.
Dirichlet's theorem points the way towards modern analytic number theory
more than any other single result (even more than the Prime Number Theorem,
in my opinion, whose analytic proof is harder but less immediately enlightening).
Thus I came to the desire to discuss the proof of Dirichlet's theorem in the course
(which was not done the first time I taught it).
The proof that I am about to present is not substantively different from what
can be found in many other texts (and especially, from the proof given in [Se73]).
Nevertheless, both to follow every detail of the proof and to get a
sense of what was going on in the proof as a whole took me dozens of hours of
work, much more so than any other topic in this course. But to finally be able to
present the proof feels wonderful, like coming home again. So although I have done
what I can to present this material as transparently as possible, not only will I be
sympathetic if you find parts of it confusing the first time around, I will even be a
little jealous if you don't! But do try to enjoy the ride.
2. The Main Part of the Proof of Dirichlet's Theorem
2.1. Prelude on complex logarithms.
We begin rather inauspiciously by discussing logarithms. By a complex logarithm,
we mean a holomorphic function $L(z)$ such that $e^{L(z)} = z$. As compared to the
usual real logarithm, there are two subtleties. First, there are multiple such functions: since $e^{z + 2\pi i n} = e^z$ for all $z$, if $L(z)$ is any complex logarithm, so is $L(z) + 2\pi i n$
for any integer $n$. More seriously, no complex logarithm can be defined on the entire complex plane. Clearly we cannot have a logarithm defined at 0, since 0 is not
in the image of the complex exponential function. In complex analysis one learns
that if one removes from the complex plane any ray emanating from the origin
(the real interval $(-\infty, 0]$ being the most standard choice) then one can define a
complex logarithm on this restricted domain. In particular, given any open disk in
the complex plane which does not contain the origin, there is a complex logarithm
defined on that disk.
For the moment though, let us proceed exactly as in calculus: we define a function
$\log(1 - z)$ for $|z| < 1$ by the following convergent Taylor series expansion:
(50) $$\log(1 - z) = -\sum_{n=1}^{\infty} \frac{z^n}{n}.$$
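A quick numerical check of (50) may be reassuring: for $|z| < 1$ the truncated series should exponentiate back to $1 - z$. The sketch below assumes nothing beyond the formula itself; the function name `log_one_minus`, the truncation length and the sample point $z$ are illustrative choices.

```python
import cmath


def log_one_minus(z, terms=200):
    """Truncation of the series (50): log(1 - z) = -sum_{n>=1} z^n / n, |z| < 1."""
    if abs(z) >= 1:
        raise ValueError("the series only converges for |z| < 1")
    return -sum(z ** n / n for n in range(1, terms + 1))


z = 0.3 + 0.4j                    # |z| = 0.5, well inside the unit disk
g = log_one_minus(z)
exp_error = abs(cmath.exp(g) - (1 - z))          # is g a logarithm of 1 - z?
branch_error = abs(g - cmath.log(1 - z))         # does it match the principal branch?
```

Both errors should be at the level of floating-point rounding, since the tail of the series is of size about $0.5^{200}$.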
In our analysis, we will come to a point where we have an analytic function, say
$f(z)$, and we will initially want to interpret $\log f(z)$ in a rather formal way,
i.e., simply as the series expansion
$$\log(1 - (1 - f(z))) = -\sum_{n=1}^{\infty} \frac{(1 - f(z))^n}{n}.$$
It will be clear for our particular $f(z)$ that the series converges to an analytic
function, say $g$, of $z$. The subtle point is whether $g$ really is a logarithm of $f$
in the above sense, i.e., whether and for which values of $z$ we have $e^{g(z)} = f(z)$.
Our expository choice here is to state carefully the claims we are making about
logarithms during the course of the proof and then come back to explain them at
the end. Readers with less familiarity with complex analysis may skip these final
justifications without fear of losing any essential part of the argument.
2.2. The proof. To begin the proof proper, we let $X(N)$ denote the group
of Dirichlet characters modulo $N$. Fix $a$ with $\gcd(a, N) = 1$ as in the statement of
Dirichlet's theorem.
Write $\mathcal{P}_a$ for the set of prime numbers $p \equiv a \pmod{N}$, so our task is of course
to show that $\mathcal{P}_a$ is infinite. For this we consider the function
$$P_a(s) := \sum_{p \in \mathcal{P}_a} \frac{1}{p^s},$$
defined for $s$ with $\Re(s) > 1$. Our goal is to show that $P_a(s)$ approaches infinity as $s$
approaches 1. (It would be enough to show this for real $\sigma$, i.e., $\lim_{\sigma \to 1^+} P_a(\sigma) = \infty$,
but nevertheless for the proof it is useful to consider complex $s$.)
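To get a feeling for the sums involved, one can tabulate truncations of $P_a(s)$ over the primes up to some cutoff. This is only a heuristic illustration: a finite truncation can never demonstrate divergence, and the names `primes_up_to` and `truncated_P` as well as the cutoffs are our own assumptions.

```python
from math import gcd


def primes_up_to(limit):
    """Sieve of Eratosthenes; returns all primes <= limit (limit >= 1)."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, limit + 1, i)))
    return [i for i in range(limit + 1) if sieve[i]]


def truncated_P(N, s, limit):
    """For each unit class a mod N, the sum of p^(-s) over primes p <= limit, p = a mod N."""
    sums = {a: 0.0 for a in range(1, N) if gcd(a, N) == 1}
    for p in primes_up_to(limit):
        if p % N in sums:
            sums[p % N] += p ** (-s)
    return sums


small = truncated_P(4, 1.0, 10 ** 3)
large = truncated_P(4, 1.0, 10 ** 5)
```

With $N = 4$ and $s = 1$, both classes $1$ and $3$ keep growing (very slowly) as the cutoff increases, in line with what the theorem predicts.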
Remark: Notice that this gives more than just the infinitude of $\mathcal{P}_a$: it shows that
it is substantial in the sense of Handout X.X.
The overarching idea of the proof is to express $P_a(s)$ in terms of some Dirichlet
L-series for characters $\chi \in X(N)$, and thus to deduce the unboundedness of $P_a(s)$
as $s \to 1$ from some corresponding analytic properties of L-series near $s = 1$.
Why should $P_a(s)$ have anything to do with Dirichlet L-series? First, define $1_a$
to be the characteristic function of the congruence class $a \pmod{N}$: i.e., $1_a(n)$ is 1
if $n \equiv a \pmod{N}$ and is 0 otherwise. Then $P_a(s)$ is reminiscent of the Dirichlet
series for the arithmetical function $1_a$, except it is a sum only over primes. Note
that since $1_a$ is not a multiplicative function, it would be unfruitful to consider
its Dirichlet series $D(1_a, s)$: it does not have an Euler product expansion. Nevertheless $1_a$ has some character-like properties: it is $N$-periodic and it is 0 when
$\gcd(n, N) > 1$. Therefore $1_a$ is entirely determined by the corresponding function
$U(N) \to \mathbb{C}$, $n \pmod{N} \mapsto 1_a(n)$.
Now recall from [Handout A2.5, §4.3] that any function $f : U(N) \to \mathbb{C}$ can
be uniquely expressed as a $\mathbb{C}$-linear combination of characters; [Ibid, Corollary 18]
even gives an explicit formula.
With all this in mind, it is easy to discover the following result (which we may
as well prove directly):
Lemma 209. For all $n \in \mathbb{Z}$, we have
$$1_a(n) = \sum_{\chi \in X(N)} \frac{\chi(a)^{-1}}{\varphi(N)} \chi(n).$$
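The identity of Lemma 209 can be tested numerically in a simple special case. The sketch below restricts to an odd prime modulus $p$, where the characters can be written down from a primitive root; this restriction, the brute-force primitive root search and all function names are assumptions made for the illustration, not part of the text.

```python
import cmath


def primitive_root(p):
    """Smallest generator of the unit group modulo an odd prime p (brute force)."""
    for g in range(2, p):
        if len({pow(g, k, p) for k in range(p - 1)}) == p - 1:
            return g
    raise ValueError("no primitive root found; is p an odd prime?")


def characters_mod_prime(p):
    """All p - 1 Dirichlet characters modulo an odd prime p, as Python functions."""
    g = primitive_root(p)
    discrete_log = {pow(g, k, p): k for k in range(p - 1)}

    def make_character(index):
        def chi(n):
            n %= p
            if n == 0:
                return complex(0)
            angle = 2 * cmath.pi * index * discrete_log[n] / (p - 1)
            return cmath.exp(1j * angle)
        return chi

    return [make_character(index) for index in range(p - 1)]


def indicator_via_characters(a, n, p):
    """Right hand side of Lemma 209; chi(a)^(-1) is the complex conjugate of chi(a)."""
    chars = characters_mod_prime(p)
    return sum(chi(a).conjugate() * chi(n) for chi in chars) / (p - 1)


# Largest deviation from the true indicator function over a range of a and n.
worst = max(
    abs(indicator_via_characters(a, n, 7) - (1 if n % 7 == a else 0))
    for a in range(1, 7)
    for n in range(0, 30)
)
```

Up to rounding error, `worst` should be zero: the character sum reproduces the indicator of the class $a \pmod 7$, including the value 0 at multiples of 7.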
Proof: By the complete multiplicativity of the $\chi$'s, the right hand side equals
$$\frac{1}{\varphi(N)} \sum_{\chi \in X(N)} \chi(a^{-1} n),$$
The terms
(p)
ps
log(1
(p)
).
ps
Expanding out this logarithm using the series (50) as advertised above, we get
(52) $$\log(L(\chi, s)) = \sum_p \sum_n \left( \frac{\chi(p)}{p^s} \right)^n / n.$$
But we regard the above as just being motivational; let us now be a little more
precise. The right hand side of (52) is absolutely convergent for $\Re(s) > 1$ and
uniformly convergent on closed half-planes $\Re(s) \ge 1 + \epsilon$. So if we simply define
$$\ell(\chi, s) := \sum_p \sum_n \left( \frac{\chi(p)}{p^s} \right)^n / n,$$
then, whatever else it may be, $\ell(\chi, s)$ is an analytic function on the half-plane
$\Re(s) > 1$. Of course we know what the "whatever else" should be:
First Claim on Logarithms: In the halfplane $\Re(s) > 1$, we have $e^{\ell(\chi, s)} = L(\chi, s)$.
As stated above, we postpone justification of this claim until the next section.
Notice that the $n = 1$ contribution to $\ell(\chi, s)$ alone gives precisely the sums appearing in (51); there are also all the $n \ge 2$ terms, which we don't want. So let's
separate out these two parts of the series: defining
$$\ell_1(\chi, s) = \sum_p \frac{\chi(p)}{p^s}$$
and
$$R(\chi, s) = \sum_{n \ge 2} \sum_p \frac{\chi(p)^n}{n p^{ns}}$$
we have
$$\ell(\chi, s) = \ell_1(\chi, s) + R(\chi, s)$$
and also
$$P_a(s) = \sum_{\chi \in X(N)} \frac{\chi(a^{-1})}{\varphi(N)} \ell_1(\chi, s).$$
But recall what we're trying to show: that $P_a(s)$ is unbounded as $s \to 1$. If we're
trying to show that something is unbounded, any terms which do remain bounded as
$s \to 1$ can be ignored. But
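$$|R(\chi, 1)| \le \sum_p \sum_{n \ge 2} \frac{1}{n p^n} \le \sum_p \sum_{n \ge 2} \left( \frac{1}{p} \right)^n = \sum_p \frac{1}{p^2} \cdot \frac{p}{p - 1} \le 2 \sum_p \frac{1}{p^2} \le 2 \sum_{n=1}^{\infty} \frac{1}{n^2} < \infty.$$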
X(N )
(a)1
(, s) + O(1);
(N )
1 s
Pa (s) =
p +
(, s) + O(1).
(N )
p-N
=N
s1
1
lim+ Pa (s) =
= +,
ps
s1
pa
(mod N )
sn
n
n=1
sn
n=1 n
=1s
for all such $s$. By the principle of analytic continuation, the corresponding complex
power series gives a well-defined logarithm whenever it is defined, which is at least
for complex $s$ with $|s| < 1$. We have
$$\lim_{\Re(s) \to +\infty} L(\chi, s) = 1,$$
so that there exists a $\sigma_0$ such that $\Re(s) > \sigma_0$ implies $|1 - L(\chi, s)| < 1$. Thus in this
halfplane we do have $e^{\ell(\chi, s)} = L(\chi, s)$. By the principle of analytic continuation,
this identity will continue to hold so long as both sides are well-defined analytic
functions, which is the case for all $\Re(s) > 1$, justifying the first claim on logarithms.
Similar reasoning establishes the second claim: since $L(\chi, s)$ is analytic and nonzero
at $s = 1$, there exists some small open disk about $L(\chi, 1)$ which does not contain the
origin, and therefore we can choose a branch of the logarithm such that $\log L(\chi, s)$
is well-defined on the preimage of that disk, so in particular on some small open
disk $D$ about $s = 1$. Then $\log L(\chi, 1)$ is a well-defined complex number. It may not
be equal to our $\ell(\chi, 1)$, but since any two logarithms of the same analytic function
differ by a constant integer multiple of $2\pi i$, by the principle of analytic continuation
there exists some $n \in \mathbb{Z}$ such that $\ell(\chi, s) - 2\pi i n = \log L(\chi, s)$ on the disk $D$, and no
matter what $n$ is, this means that $\ell(\chi, s)$ remains bounded as $s \to 1$.
3. Nonvanishing of $L(\chi, 1)$
We claim that $L(\chi, 1) \neq 0$ for all nonprincipal characters $\chi \in X(N)$. Our argument
is as follows: consider the behavior of the Dedekind zeta function
$$\zeta_N(s) = \prod_{\chi \in X(N)} L(\chi, s).$$
The key point is that the Dirichlet series $\zeta_N(s)$ has a very particular form. To see this,
we need just a little notation: for a prime $p$ not dividing $N$, let $f(p)$ denote the
order of $p$ in the unit group $U(N)$, and put $g(p) = \frac{\varphi(N)}{f(p)}$, which is by Lagrange's
theorem a positive integer. Now:
a) $$\zeta_N(s) = \prod_{p \nmid N} \left( \frac{1}{1 - \frac{1}{p^{f(p)s}}} \right)^{g(p)}.$$
b) Therefore $\zeta_N(s)$ is a Dirichlet series with non-negative integral coefficients, converging absolutely in the half-plane $\Re(s) > 1$.
Proof. Let $\mu_{f(p)}$ be the group of $f(p)$th roots of unity. Then for all $p \nmid N$ we
have the polynomial identity
$$\prod_{w \in \mu_{f(p)}} (1 - wT) = 1 - T^{f(p)}.$$
Indeed, both sides have the $f(p)$th roots of unity as roots (with multiplicity one),
so they differ at most by a multiplicative constant; but both sides evaluate to 1 at
$T = 0$. Now by the Character Extension Lemma [Lemma 13, Handout A2.5], for
all $w \in \mu_{f(p)}$ there are precisely $g(p)$ elements $\chi \in X(N)$ such that $\chi(p) = w$. This
establishes part a), and part b) follows from the explicit formula of part a).
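The ingredients of this proof are easy to experiment with. The sketch below computes $f(p)$ and $g(p)$ for a sample modulus and multiplies out $\prod_{w}(1 - wT)$ over the $f$th roots of unity to confirm that it collapses to $1 - T^f$. The function names and the sample values ($N = 7$, $p = 2$, $f = 6$) are illustrative assumptions.

```python
import cmath
from math import gcd


def euler_phi(N):
    """Number of residues in 1..N coprime to N (brute force, for small N)."""
    return sum(1 for k in range(1, N + 1) if gcd(k, N) == 1)


def multiplicative_order(p, N):
    """Order f(p) of p in the unit group U(N); requires N >= 2 and gcd(p, N) = 1."""
    if N < 2 or gcd(p, N) != 1:
        raise ValueError("need N >= 2 and gcd(p, N) = 1")
    order, power = 1, p % N
    while power != 1:
        power = power * p % N
        order += 1
    return order


def product_over_roots_of_unity(f):
    """Coefficients (lowest degree first) of prod_{w^f = 1} (1 - w T)."""
    coeffs = [1 + 0j]
    for k in range(f):
        w = cmath.exp(2j * cmath.pi * k / f)
        expanded = coeffs + [0j]            # room for one more degree
        for i, c in enumerate(coeffs):
            expanded[i + 1] -= w * c        # contribution of the -w T term
        coeffs = expanded
    return coeffs


f_p = multiplicative_order(2, 7)            # order of 2 modulo 7
g_p = euler_phi(7) // f_p                   # phi(N) / f(p)
coeffs = product_over_roots_of_unity(6)     # should be 1 - T^6
```

The coefficient list comes out as $1, 0, \ldots, 0, -1$ up to rounding, which is the polynomial identity used in the proof.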
Now for a deus ex machina. We are given that $\zeta_N(s)$ is a Dirichlet series with non-negative real coefficients. Therefore we can apply Landau's Theorem: if $\sigma$ is the
abscissa of convergence of the Dirichlet series, then the function $\zeta_N(s)$ has a singularity at $\sigma$. Clearly $\sigma \le 1$, so, contrapositively, if $\zeta_N(s)$ does not have a singularity
at $s = 1$, then not only does $\zeta_N(s)$ extend analytically to some larger halfplane
$\Re(s) > 1 - \epsilon$, but it extends until it meets a singularity on the real line. But we
have already seen that each Dirichlet L-series is holomorphic for $0 < \Re(s) < 1$, so
Landau's theorem tells us that $\sigma \le 0$.
If you think about it for a minute, it is exceedingly unlikely that a Dirichlet series
with non-negative integral coefficients has abscissa of convergence $\sigma \le 0$, and in
our case it is quite straightforward to see that this is not the case: take $s$ to be in
the real interval $(0, 1)$. Expanding out the $p$th Euler factor we get
$$\left( \frac{1}{1 - \frac{1}{p^{f(p)s}}} \right)^{g(p)} = \left( 1 + \frac{1}{p^{f(p)s}} + \frac{1}{p^{2f(p)s}} + \ldots \right)^{g(p)}.$$
Ignoring all the crossterms gives a crude lower bound: this quantity is at least
$$1 + \frac{1}{p^{\varphi(N)s}} + \frac{1}{p^{2\varphi(N)s}} + \ldots.$$
Multiplying this over all $p$, it follows that
$$\zeta_N(s) \ge \sum_{n \,|\, (n, N) = 1} \frac{1}{n^{\varphi(N)s}}.$$
When we evaluate at $s = \frac{1}{\varphi(N)}$ we get
$$\sum_{(n, N) = 1} \frac{1}{n}.$$
Since the set of integers prime to $N$ has positive density, it is substantial. More
concretely, since every $n$ of the form $Nk + 1$ is coprime to $N$, this last sum is at
least as large as
$$\sum_{k=1}^{\infty} \frac{1}{Nk + 1} = \infty.$$
QED!
CHAPTER 18
$$\sum_{1 \le i \le j \le n} a_{ij} x_i x_j,$$
where the coefficients $a_{ij}$ are usually either integers or rational numbers (although
we shall also be interested in quadratic forms with coefficients in $\mathbb{Z}/n\mathbb{Z}$ and $\mathbb{R}$). For
instance, a binary quadratic form is any expression of the form
$$q(x, y) = ax^2 + bxy + cy^2.$$
As for most Diophantine equations, quadratic forms were first studied over the
integers, meaning that the coefficients $a_{ij}$ are integers and only integer values of
$x_1, \ldots, x_n$ are allowed to be plugged in. At the end of the 19th century it was
realized that by allowing the variables $x_1, \ldots, x_n$ to take rational values, one gets
a much more satisfactory theory. (In fact one can study quadratic forms with coefficients and values in any field $F$. This point of view was developed by Witt in
the 1930s, expanded in the middle years of this century by, among others, Pfister
and Milnor, and has in the last decade become especially closely linked to one of
the deepest and most abstract branches of contemporary mathematics: homotopy
K-theory.) However, a wide array of firepower has been constructed over the years
to deal with the complications presented by the integral case, culminating recently
in some spectacular results. Here we will concentrate on what can be done over the
rational numbers, and also on what statements about integral quadratic forms can
be directly deduced from the theory of rational quadratic forms.
Let us distinguish two types of problems concerning a quadratic form $q(x_1, \ldots, x_n)$,
which we will allow to have either integral or rational coefficients $a_{ij}$.
Homogeneous problem (or isotropy problem): Determine whether there exist
integers $x_1, \ldots, x_n$, not all zero, such that $q(x_1, \ldots, x_n) = 0$. A quadratic form
such that $q(x) = 0$ has a nontrivial integral solution is said to be isotropic; if there
is no nontrivial solution it is said to be anisotropic.
Example 0: The sum of squares forms $x_1^2 + \ldots + x_n^2$ are all anisotropic. Indeed,
for any real numbers $x_1, \ldots, x_n$, not all zero, $x_1^2 + \ldots + x_n^2 > 0$: a form with this
property is said to be positive definite.
Example 1: The $\mathbb{Z}$-quadratic form $x^2 - ny^2$ is isotropic iff $n$ is a perfect square.
Inhomogeneous problem: For a given integer $n$, determine whether the equation
$q(x_1, \ldots, x_n) = n$ has an integer solution (if so, we say "$q$ represents $n$"). More
generally, for fixed $q$, determine all integers $n$ represented by $q$.
Example 2: We determined all integers $n$ represented by $x_1^2 + x_2^2$, and stated
without proof the results for the quadratic forms $x_1^2 + x_2^2 + x_3^2$ and $x_1^2 + x_2^2 + x_3^2 + x_4^2$;
in the latter case, all positive integers are represented.
In general the inhomogeneous problem is substantially more difficult than the homogeneous problem. One reason why the homogeneous problem is easier is that,
even if we originally state it in terms of the integers, it can be solved using rational
numbers instead:
Proposition 211. (Principle of homogeneous equivalence) Let $P(x_1, \ldots, x_n)$
be a homogeneous polynomial with integral coefficients. Then $P(x_1, \ldots, x_n) = 0$ has a
nontrivial solution with $x_1, \ldots, x_n \in \mathbb{Z}$ iff it has a nontrivial solution with $x_1, \ldots, x_n \in \mathbb{Q}$.
Proof. Of course a nontrivial integral solution is in particular a nontrivial
rational solution. For the converse, assume there exist $\frac{p_1}{q_1}, \ldots, \frac{p_n}{q_n}$, not all 0, such
that $P(\frac{p_1}{q_1}, \ldots, \frac{p_n}{q_n}) = 0$. Suppose $P$ is homogeneous of degree $k$. Then for any
$\alpha \in \mathbb{R}^{\times}$, we have
$$P(\alpha x_1, \ldots, \alpha x_n) = \alpha^k P(x_1, \ldots, x_n),$$
since we can factor out $k$ $\alpha$'s from every term. So let $N = \mathrm{lcm}(q_1, \ldots, q_n)$. Then
$$P\left(N \frac{p_1}{q_1}, \ldots, N \frac{p_n}{q_n}\right) = N^k P\left(\frac{p_1}{q_1}, \ldots, \frac{p_n}{q_n}\right) = N^k \cdot 0 = 0,$$
so $(N \frac{p_1}{q_1}, \ldots, N \frac{p_n}{q_n})$ is a nontrivial integral solution.
Thus the homogeneous problem for integral forms (of any degree) is really a problem about rational forms.
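The proof of Proposition 211 is constructive: multiply a rational solution by the least common multiple of its denominators. A minimal sketch, with a hypothetical helper `clear_denominators` and the form $x^2 + y^2 - z^2$ chosen purely as an example:

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def clear_denominators(rational_point):
    """Scale a tuple of Fractions by N = lcm of the denominators, as in the proof."""
    N = reduce(lcm, (x.denominator for x in rational_point), 1)
    return tuple(int(x * N) for x in rational_point)


def P(x, y, z):
    """Sample homogeneous form of degree 2."""
    return x * x + y * y - z * z


rational_zero = (Fraction(3, 5), Fraction(4, 5), Fraction(1))
integral_zero = clear_denominators(rational_zero)
```

Here the rational zero $(3/5, 4/5, 1)$ is scaled by $N = 5$ to the integral zero $(3, 4, 5)$. Homogeneity is essential: the same scaling applied to an inhomogeneous equation would not preserve solutions.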
Remark: The inhomogeneous problem still makes sense for forms of higher degree, but to solve it even for rational forms is generally extremely difficult. For
instance, Selmer conjectured in 1951 that a prime $p \equiv 4, 7, 8 \pmod{9}$ is of the
form $x^3 + y^3$ for two rational numbers $x$ and $y$. A proof of this in the first two
cases was announced (but not published) by Noam Elkies in 1994; more recently,
Dasgupta and Voight have carefully written down a proof of a slightly weaker result
[DV09]. The case of $p \equiv 8 \pmod{9}$ remains open. In this case (i.e., that of binary
cubic forms) the rich theory of rational points on elliptic curves can be fruitfully
applied. Even less is known about (say) binary forms of higher degree.
2. LEGENDRES THEOREM
Some remarks on the conditions: if $a$, $b$ and $c$ are all positive or all negative, the
quadratic form is definite over $\mathbb{R}$ and has no nontrivial real solutions. Because
integral isotropy is equivalent to rational isotropy, we may adjust $a$, $b$ and $c$ by any
rational square, and therefore we may assume that they are squarefree integers.
Moreover, if two of them are divisible by a prime $p$, then they are both exactly divisible by $p$, and by a simple $\mathrm{ord}_p$ argument the equation certainly has no solutions
unless $p$ divides $c$. But then we may divide through $a$, $b$ and $c$ by $p$.
Let us prove the easy half of this theorem now, namely showing that these conditions are necessary. In fact, let us show that they are precisely the conditions
obtained by postulating a primitive integral solution $(x, y, z)$ and going modulo $a$,
$b$ and $c$. Indeed, go modulo $c$: we get
$$ax^2 \equiv -by^2 \pmod{c}.$$
Suppose first that there exists some prime $p \mid c$ such that $p \mid x$. Then since
$\gcd(b, c) = 1$, we get $p \mid y$, and that implies $p^2 \mid -ax^2 - by^2 = cz^2$. Since $c$
is squarefree, this implies $p \mid z$, contradicting primitivity. Therefore $x$ is nonzero
modulo every prime $p$ dividing $c$, so $x$ is a unit modulo $c$, and we can divide, getting
$$-ab \equiv (byx^{-1})^2 \pmod{c},$$
which is condition (i). By symmetry, reducing modulo $a$ we get (ii) and reducing
modulo $b$ we get (iii).
Following Ireland and Rosen, to prove the sufficiency we will state the theorem
in an equivalent form, as follows:
Theorem 215. (Legendre's theorem restated) For $a$ and $b$ positive squarefree
integers, the equation
$$ax^2 + by^2 = z^2$$
has a nontrivial integral solution iff all of the following hold:
(i) $a$ is a square modulo $b$.
(ii) $b$ is a square modulo $a$.
(iii) $-\frac{ab}{d^2}$ is a square modulo $d$, where $d = \gcd(a, b)$.
We leave it as a (not difficult, but somewhat tedious) exercise to the reader to check
that Theorem 215 is equivalent to Theorem 214.
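The three conditions of Theorem 215 are finite checks, so they can be compared against a brute-force search for small solutions. The following sketch assumes $a$ and $b$ are positive squarefree integers, as in the theorem; the function names and the search bound are our own, and a failed bounded search by itself proves nothing.

```python
from math import gcd, isqrt


def is_square_mod(r, m):
    """True if r is congruent to a square modulo m (m >= 1)."""
    return any((x * x - r) % m == 0 for x in range(m))


def satisfies_conditions(a, b):
    """Conditions (i)-(iii) of Theorem 215 for positive squarefree a, b."""
    d = gcd(a, b)
    return (
        is_square_mod(a, b)
        and is_square_mod(b, a)
        and is_square_mod(-(a * b) // (d * d), d)
    )


def small_solution(a, b, bound):
    """Search 0 <= x, y <= bound for a nontrivial solution of a x^2 + b y^2 = z^2."""
    for x in range(bound + 1):
        for y in range(bound + 1):
            if x == 0 and y == 0:
                continue
            t = a * x * x + b * y * y
            z = isqrt(t)
            if z * z == t:
                return (x, y, z)
    return None


yes_case = (satisfies_conditions(2, 7), small_solution(2, 7, 5))
no_case = (satisfies_conditions(3, 5), small_solution(3, 5, 30))
```

For $(a, b) = (2, 7)$ the conditions hold and the search finds $2 \cdot 1^2 + 7 \cdot 1^2 = 3^2$. For $(a, b) = (3, 5)$ condition (i) fails, since 3 is not a square modulo 5, and accordingly no solution turns up.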
Now we prove the sufficiency of the conditions of Theorem 215.
The result is obvious if $a = 1$.
Case 1: $a = b$. The theorem asserts that $ax^2 + ay^2 = z^2$ has a solution iff $-1$
is a square modulo $a$. By the first supplement to QR, this last condition is equivalent to: no prime $p \equiv 3 \pmod{4}$ divides $a$. If this condition holds then by the two
squares theorem we have $a = r^2 + s^2$, and then we can take $x = r$, $y = s$, $z = r^2 + s^2$.
On the other hand, if there exists $p \mid a$, $p \equiv 3 \pmod{4}$, then taking $\mathrm{ord}_p$ of both sides
of the equation $z^2 = a(x^2 + y^2)$ gives a contradiction, since $\mathrm{ord}_p(z^2) = 2\,\mathrm{ord}_p(z)$ is
even, and $\mathrm{ord}_p(a(x^2 + y^2)) = \mathrm{ord}_p(a) + \mathrm{ord}_p(x^2 + y^2) = 1 + \mathrm{ord}_p(x^2 + y^2)$ implies
$\mathrm{ord}_p(x^2 + y^2)$ is odd, contradicting the Two Squares Theorem.
If $b > a$, we can interchange $a$ and $b$, so we may now assume that $a > b$.
We will now prove the theorem by a descent-type argument, as follows: assuming
the hypotheses of Theorem 215 we will construct a new form $Ax^2 + by^2 = z^2$ satisfying the same hypotheses, with $0 < A < a$, and such that if this latter equation has a
nontrivial solution then so does $ax^2 + by^2 = z^2$. We perform this reduction process
repeatedly, interchanging $A$ and $b$ if $A < b$. Since each step reduces $\max(A, b)$,
eventually we will be in the case $A = 1$ or $A = b$, in which we have just shown
the equation has a solution. Reversing our sequence of reductions shows that the
original equation has a solution.
Now, since $b$ is a square modulo $a$, there exist $T$ and $c$ such that
(53) $$c^2 - b = aT,$$
a2
,
4
a
< a.
4
Claim: $A$ is a square modulo $b$.
Recalling $d = \gcd(a, b)$, write $a = a_1 d$, $b = b_1 d$, so that $\gcd(a_1, b_1) = 1$; since
$a$ and $b$ are squarefree, this implies $\gcd(a_1, d) = \gcd(b_1, d) = 1$. Then (53) reads
$$c^2 - b_1 d = a_1 d A m^2 = aAm^2.$$
So $d \mid c^2$, and since $d$ is squarefree, $d \mid c$. Put $c = c_1 d$ and cancel:
(54) $$d c_1^2 - b_1 = A a_1 m^2.$$
Reducing (54) modulo $d$ and multiplying by $a_1$ gives
(55) $$A a_1^2 m^2 \equiv -a_1 b_1 \pmod{d}.$$
Now, any common prime factor $p$ of $m$ and $d$ would divide both $b_1$ and $d$, a contradiction; so $\gcd(m, d) = 1$. Since $-\frac{ab}{d^2} = -a_1 b_1$ is a square modulo $d$ by (iii)
and $a_1$ and $m$ are units modulo $d$, (55) implies that $A$ is a square modulo $d$. Moreover, $c^2 \equiv aAm^2 \pmod{b_1}$. Since $a$ is a square modulo $b$, $a$ is a square modulo $b_1$. Also $\gcd(a, b_1) = 1$ (a common divisor would divide
$d$, but $\gcd(b_1, d) = 1$) and similarly $\gcd(m, b_1) = 1$. So
$$A \equiv c^2 (am^2)^{-1} \pmod{b_1},$$
(mod r),
AX 2 = Z 2 bY 2 .
aX 2 + bY 2 Z 2
(mod pa )
(59) $$\prod_{p \le \infty} \langle a, b \rangle_p = 1.$$
This has the extremely useful upshot that instead of having to check congruences
modulo all powers of all primes and a sign condition, it suffices to omit any one
$p \le \infty$ from these checks. In particular, we could omit "$p = \infty$" from the checking
and get the following result which looks hard to believe based upon the proof we
gave: if $ax^2 + by^2 = z^2$ has a solution modulo $p^a$ for all $p$ and $a$, then it necessarily has an integral solution: in particular the condition that $a$ and $b$ are not
both negative follows automatically from all the congruence conditions, although it
is certainly independent of any finite number of them!
In fact, with a bit of hindsight one can see that the condition of whether or not
there is going to be a solution modulo all powers of 2 is the most complicated one.
This is taken into account in the statement of Legendre's theorem: the congruence
conditions on their own would not imply that $\langle a, b \rangle_2 = +1$ without the sign conditions (conditions "at $\infty$"), so somehow Legendre's clean conditions exploit this
slight redundancy. To see this, consider the case of $a = b = -1$, which has solutions
modulo every power of an odd prime, but no nontrivial solutions modulo 4 (and
also no real solutions).
Hilbert also found explicit formulae for $\langle a, b \rangle_p$ in terms of Legendre symbols. For
the sake of concision we do not state them here. However, we cannot help but mention that if one knows these formulae (which are not so hard to prove), then
the relation (59) is equivalent to knowing quadratic reciprocity together with its
first and second supplements! It turns out that all aspects of the theory of rational
quadratic forms can be generalized to the case where the coefficients lie not in
$\mathbb{Q}$ but in an arbitrary algebraic number field $K$. In particular, a suitable version
of Hilbert's reciprocity law holds over $K$, and this is a very clean way to phrase
quadratic reciprocity over number fields.
4. The Local-Global Principle
We are now in a position to state what is surely one of the most important and
influential results in all of number theory.
Theorem 217. (Hasse-Minkowski) Let $q(x_1, \ldots, x_n)$ be an integral quadratic
form. The following are equivalent:
a) $q$ is isotropic (over $\mathbb{Z}$ $\iff$ over $\mathbb{Q}$).
b) $q$ is isotropic over $\mathbb{R}$, and for all $n \in \mathbb{Z}^+$, there are nontrivial solutions to the
congruence $q(x_1, \ldots, x_n) \equiv 0 \pmod{n}$.
It is clear that a) $\implies$ b). Indeed, in contrapositive form, this has been our favorite easy method for showing that an equation does not have a solution: any
integral solution also gives a real solution and a solution to every possible congruence. The matter of it is in the converse, which asserts that if a quadratic form
$q(x_1, \ldots, x_n) = 0$ does not have an integral solution, we can always detect it via
congruences and/or over the real numbers.
This turns out to be the master theorem in the area of rational quadratic forms. It
is not (yet) stated in a form as explicit as Legendre's theorem for ternary quadratic
forms which, recall, did not just assert that isotropy modulo $n$ for all $n$ implied
isotropy over $\mathbb{Z}$ (or equivalently, over $\mathbb{Q}$) but actually said explicitly, in terms of
the coefficients, a finite list of congruence conditions to check. Indeed one knows
such explicit conditions in all cases, and we will return to mention them in the next
section, but for now let us take a broader approach.
First, even in its qualitative form the theorem gives an algorithm for determining
whether any quadratic form is isotropic. Namely, we just have to search in parallel
for one of the two things:
(i) Integers $x_1, \ldots, x_n$, not all 0, such that $q(x_1, \ldots, x_n) = 0$.
(ii) An integer $N$ such that the congruence $q(x_1, \ldots, x_n) \equiv 0 \pmod{N}$ has only
the all-zero solution.
For any given $N$, (ii) is a finite problem: we have exactly $N^n - 1$ values to plug
in and see whether we get 0. Similarly, if we wanted to check all tuples of integers
$(x_1, \ldots, x_n)$ with $\max_i |x_i| \le M$, then that too is obviously a finite problem. Conceivably we could search forever and never find either a value of $M$ as in (i) or a
value of $N$ as in (ii) (for sure we will never find both!) but the Hasse-Minkowski
theorem asserts that if we search long enough we will find either one or the other.
This then is our algorithm!
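The parallel search just described can be sketched in a few lines. Two assumptions of ours should be flagged loudly. First, in the congruence test we only count primitive solutions, i.e. tuples with $\gcd(x_1, \ldots, x_n, N) = 1$; with the literal "not all zero modulo $N$" reading, a tuple such as $(m, 0, \ldots, 0)$ would always solve the congruence modulo $N = m^2$. Second, solvability over $\mathbb{R}$ is not tested, so the sketch is only meaningful for indefinite forms. The function names and the step limit are hypothetical.

```python
from functools import reduce
from itertools import product
from math import gcd


def find_integer_zero(q, n, bound):
    """Search for a nonzero integer tuple with max |x_i| <= bound and q = 0."""
    for xs in product(range(-bound, bound + 1), repeat=n):
        if any(xs) and q(*xs) == 0:
            return xs
    return None


def has_primitive_zero_mod(q, n, N):
    """Is there a tuple with gcd(x_1, ..., x_n, N) = 1 and q = 0 modulo N?"""
    for xs in product(range(N), repeat=n):
        if reduce(gcd, xs, N) == 1 and q(*xs) % N == 0:
            return True
    return False


def decide_isotropy(q, n, max_steps=12):
    """Alternate between the two searches; give up after max_steps rounds."""
    for M in range(1, max_steps + 1):
        zero = find_integer_zero(q, n, M)
        if zero is not None:
            return ("solution", zero)
        N = M + 1
        if not has_primitive_zero_mod(q, n, N):
            return ("obstruction", N)
    return ("undecided", None)


isotropic_case = decide_isotropy(lambda x, y, z: x * x + y * y - z * z, 3)
anisotropic_case = decide_isotropy(lambda x, y, z: x * x + y * y - 3 * z * z, 3)
```

For $x^2 + y^2 - z^2$ the search stops at once with an integral zero; for $x^2 + y^2 - 3z^2$ it finds that there is no primitive zero modulo 4. The exhaustive loops are exponential in $n$, so this is an illustration of the logic rather than a practical method.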
In point of fact the situation is better for part (ii): it can be shown that for any
degree $k$ form $P(x_1, \ldots, x_n)$ with integer coefficients, there is a recipe (algorithm!)
for computing a single value of $N$ such that if $P(x_1, \ldots, x_n) \equiv 0 \pmod{N}$ has a
nontrivial solution, then for all $N$ the congruence has a solution. Moreover, one
can determine whether or not there are any real solutions (using methods from
calculus). For this the two essential tools are:
(i) The Weil bounds for points on curves over $\mathbb{Z}/p\mathbb{Z}$, which allows one to compute a finite set of primes $S$ such that for all $p > S$ the congruence $P \equiv 0 \pmod{p}$
automatically has nontrivial solutions (in fact, a number of solutions which tends
to $\infty$ with $p$).
This is a serious piece of mathematics dating from around the 1940s.
(ii) Hensel's Lemma, which gives sufficient conditions for lifting a solution $(x_1, \ldots, x_n)$
to $P \equiv 0 \pmod{p}$ to solutions modulo all higher powers $p^a$ of $p$.
This turns out to be surprisingly similar to Newton's method for finding roots
of equations, and the proof is relatively elementary.
Alas, we do not have time to say more about either one.
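To make the comparison with Newton's method concrete, here is a minimal sketch of Hensel lifting in the simplest setting: a single-variable polynomial $f$ with a simple root modulo $p$. This one-variable restriction is our simplification (the text speaks of several variables), the name `hensel_lift` is hypothetical, and the modular inverse via `pow(..., -1, p)` assumes Python 3.8 or later.

```python
def hensel_lift(f, df, x0, p, k):
    """Lift a simple root x0 of f modulo p to a root modulo p**k.

    f and df are Python callables for the polynomial and its derivative.
    Requires f(x0) = 0 mod p and df(x0) != 0 mod p.
    """
    if f(x0) % p != 0 or df(x0) % p == 0:
        raise ValueError("x0 must be a simple root of f modulo p")
    inverse = pow(df(x0), -1, p)        # 1 / f'(x0) modulo p
    x, modulus = x0 % p, p
    for _ in range(k - 1):
        modulus *= p
        # Newton-style correction: x <- x - f(x) / f'(x0), one power of p at a time.
        x = (x - f(x) * inverse) % modulus
    return x


# A square root of 2 modulo 7^5, starting from 3^2 = 9 = 2 (mod 7).
root = hensel_lift(lambda x: x * x - 2, lambda x: 2 * x, 3, 7, 5)
```

Each pass through the loop gains one more power of $p$, and the lifted root still reduces to the starting value modulo $p$.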
So in finite time we can determine whether or not there is any value of $N$ for
which $P(x_1, \ldots, x_n) \equiv 0 \pmod{N}$ has only the trivial solution, and we can also tell whether
there are real solutions. Of course, if $P = 0$ fails to have congruential solutions
and/or real solutions, then we know it cannot have nontrivial integral (equivalently, rational) solutions. But suppose we find that our form $P$ passes all these
tests? Can we then assert that it has a nontrivial integral solution?
As we have just seen (or heard), the answer is a resounding "yes" when $P$ is a
quadratic form. In general, whenever the answer to this question is "yes", one
says that the local-global principle, or Hasse principle, holds for $P$. Of course
the big question is: does the Hasse principle hold for all forms of higher degree?
One can also ask whether the Hasse principle holds for not-necessarily homogeneous polynomials, like $x^2 + y^3 + z^7 = 13$. The following remarkable result shows
that it could not possibly hold for all polynomials in several variables over the
integers.
Theorem 218. (Davis-Matijasevic-Putnam-Robinson) There is no algorithm
that will accept as input a polynomial $P(x_1, \ldots, x_n)$ with integral coefficients and
output 1 if $P(x_1, \ldots, x_n) = 0$ has an integral solution, and 0 otherwise.
Since we just said that there is an algorithm which determines if a polynomial (not
necessarily homogeneous, in fact) has congruential solutions and real solutions,
there must therefore be some polynomials which pass these tests and yet still have
no solutions.
Remark: It is unknown whether there exists an algorithm to decide if a polynomial with rational coefficients has a rational solution.
One might think that such counterexamples to the Hasse principle might be in
some sense nonconstructive, but this is not at all the case:
Theorem 219. The following equations have congruential solutions and real
solutions, but no nontrivial integral solutions:
a) (Selmer) $3X^3 + 4Y^3 + 5Z^3 = 0$;
b) (Bremner) $5w^3 + 9x^3 + 10y^3 + 12z^3 = 0$.
These are just especially nice examples. It is known (if not well-known) that for
every $k > 2$ there is a form $P(x, y, z) = 0$ of degree $k$ which violates the local-global
principle. In fact some of my own work has been devoted to constructing large (in
particular, infinite) sets of counterexamples to the local-global principle.
There are however some further positive results, the most famous and important
being the following:
Theorem 220. (Birch) Let $k$ be a positive integer. Then there exists an $n_0(k)$
with the following property:
a) If $k$ is odd, then every degree $k$ form $P(x_1, \ldots, x_n) = 0$ in $n \geq n_0$ variables has
a nontrivial integral solution.
b) If $k$ is even and $P(x_1, \ldots, x_n)$ is a degree $k$ form in $n \geq n_0$ variables with "low-dimensional singularities", then $P$ has a nontrivial integral solution iff it has a
nontrivial real solution.
Remark: The condition of low-dimensional singularities is a bit technical. Let us
rather define what it means for an equation to have no singularities at all, which
is a special case. A nontrivial complex solution $(x_1, \ldots, x_n)$ to $P(x_1, \ldots, x_n) = 0$ at
which all the partial derivatives $\frac{\partial P}{\partial x_i}$ vanish is called a singular point. (Perhaps
you remember from multivariable calculus that these are the points at which a curve or
surface can be "not so nice": i.e., have self-intersections, cusps, or other pathologies.) $P$ is said to be nonsingular if there are no singular points. In particular,
one immediately checks that a diagonal form $P(x_1, \ldots, x_n) = a_1 x_1^k + \ldots + a_n x_n^k$ with all $a_i \neq 0$
is nonsingular, so Birch's theorem applies to diagonal forms, and in particular to
quadratic forms. (As far as I know it is an open problem whether the theorem
holds for forms of even degree without any additional hypotheses.)
Thus morally, if only there are enough variables compared to the degree, then
all congruence conditions are automatically satisfied and moreover th. However, in
the proof $n_0$ does indeed have to be very large compared to $k$, and it is quite an
active branch of analytic number theory to improve upon these bounds.
Another idea, which we shall be able to express only vaguely and see an example of in the case of the inhomogeneous problem for integral quadratic forms, is
that if one asks as a yes/no question whether or not the existence of congruential
solutions and real solutions is enough to ensure the existence of integral solutions,
then one has to take rather drastic measures (e.g., enormously many variables
compared to the degree, as above) to ensure that the answer is "yes" rather than
"no" most of the time. However, if one can somehow quantify the failure of a
local-global phenomenon, then one can hope that in any given situation it fails
only to a finite extent.
CHAPTER 19
of congruence and sign do not take into account the size of the coefficients of the
quadratic form, whereas one clearly wants some or all of the coefficients to be small
in order for a positive definite quadratic form to have a fighting chance at representing small positive integers.
So what to do?
Let us describe some of the ways that various mathematicians have reacted to
this question over the years.
1. The Davenport-Cassels Lemma
Here is a beautiful observation which allows us to solve the representation problem
for $x^2 + y^2 + z^2$:
Lemma 222. (Davenport-Cassels) Let $q(x) = f(x_1, \ldots, x_n) = \sum_{i,j=1}^{n} a_{ij} x_i x_j$
be a quadratic form with $a_{ij} = a_{ji} \in \mathbb{Z}$. We suppose condition (DC): that for any
$y = (y_1, \ldots, y_n) \in \mathbb{Q}^n \setminus \mathbb{Z}^n$, there exists $x = (x_1, \ldots, x_n) \in \mathbb{Z}^n$ such that
$0 < |q(x - y)| < 1$.
Then, for any integer $d$, $q$ represents $d$ rationally iff $q$ represents $d$ integrally.
Proof. For $x, y \in \mathbb{Q}^n$, put $x \cdot y := \frac{1}{2}(q(x + y) - q(x) - q(y))$. Then $(x, y) \mapsto x \cdot y$
is bilinear and $x \cdot x = q(x)$. Note that for $x, y \in \mathbb{Z}^n$, we need not have $x \cdot y \in \mathbb{Z}$,
but certainly we have $2(x \cdot y) \in \mathbb{Z}$. Our computations below are parenthesized so
as to emphasize this integrality property.
Let $d \in \mathbb{Z}$, and suppose that there exists $x \in \mathbb{Q}^n$ such that $q(x) = d$. Equivalently,
there exist $t \in \mathbb{Z}$ and $x' \in \mathbb{Z}^n$ such that $t^2 d = x' \cdot x'$. We choose $x'$ and $t$ such that
$|t|$ is minimal, and it is enough to show that $|t| = 1$.
Remark 1: Suppose that the quadratic form $q$ is anisotropic. Then condition (DC)
is equivalent to the following more easily verified one: for all $x \in \mathbb{Q}^n$, there exists
$y \in \mathbb{Z}^n$ such that $|q(x - y)| < 1$. Indeed, since $x \notin \mathbb{Z}^n$ and $y \in \mathbb{Z}^n$, $x - y \notin \mathbb{Z}^n$. In
particular $x - y \neq (0, \ldots, 0)$, so since $q$ is anisotropic, necessarily $|q(x - y)| > 0$.
Remark 2: Lemma 222 has a curious history. So far as I know there is no paper of Davenport and Cassels (two eminent 20th century number theorists) which
contains it: it is more folkloric. The attribution of this result seems to be due to
J.-P. Serre in his influential text [Se73]. Later, André Weil pointed out [W] that
in the special case of $f(x) = x_1^2 + x_2^2 + x_3^2$, the result goes back to a 1912 paper of
the amateur mathematician L. Aubry [Au12].
There is also more than the usual amount of variation in the hypotheses of this
result. Serre's text makes the additional hypothesis that $f$ is positive definite,
i.e., $x \neq 0 \implies f(x) > 0$. Many of the authors of more recent number theory
texts that include this result follow Serre and include the hypothesis of positive
definiteness. Indeed, when I first wrote these notes in 2006, I did so myself (and included a place-holder remark that I believed that this hypothesis was superfluous).[1]
To get from Serre's proof to ours requires only (i) inserting absolute values where
appropriate, and (ii) noting that whenever we need $x \cdot y$ to be integral, we have
an extra factor of 2 in the expression to make it so. The result is also stated and
proved (in a mildly different way) in Weil's text.
Remark 3: In the isotropic case, the stronger hypothesis $0 < |q(x - y)| < 1$ is
truly necessary. Consider for instance $q(x, y) = x^2 - y^2$: we ask the reader to show
that 2 is represented rationally but not integrally.
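As a hint towards Remark 3, here is a small Python check. It exhibits one rational representation of 2 (the particular point $(\frac{3}{2}, \frac{1}{2})$ is our own choice) and tabulates the residues of $x^2 - y^2$ modulo 4, which suggests why no integral representation can exist. It is a sanity check, not a proof.

```python
from fractions import Fraction

def q(x, y):
    """The isotropic form q(x, y) = x^2 - y^2."""
    return x * x - y * y

# A rational representation of 2: (3/2)^2 - (1/2)^2 = 9/4 - 1/4.
rational_value = q(Fraction(3, 2), Fraction(1, 2))

# For integers x, y the value x^2 - y^2 modulo 4 depends only on x, y modulo 4,
# so it is enough to run over the residues 0, 1, 2, 3.
residues_mod_4 = sorted({q(x, y) % 4 for x in range(4) for y in range(4)})
```

Since the residue 2 never occurs modulo 4, the equation $x^2 - y^2 = 2$ has no integral solutions.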
One might call a quadratic form Euclidean if it satisfies (DC). For example, the
quadratic form $q(x, y) = x^2 - dy^2$ is Euclidean iff given rational numbers $r_x, r_y$, we
can find integers $n_x, n_y$ such that
(60) $|(r_x - n_x)^2 - d(r_y - n_y)^2| < 1$.
Since we know that we can find an integer within $\frac{1}{2}$ of any rational number (and
that this estimate is best possible!), the quantity in question is at most $(\frac{1}{2})^2 + |d|(\frac{1}{2})^2$
if $d < 0$ and at most $\frac{d}{4}$ when $d > 0$. So the values of $d$ for which (60) holds are precisely $d = -1, -2, 2, 3$. This should
be a familiar list: these are precisely the values
of $d$ for which you proved that $\mathbb{Z}[\sqrt{d}]$ is a PID. Whenever $\mathbb{Z}[\sqrt{d}]$ is a PID, one can
use Euclid's Lemma to solve the problem of which primes (and in fact which integers, with more care) are integrally represented by $x^2 - dy^2$. The Davenport-Cassels
Lemma allows for a slightly different approach: for these values of $d$, $x^2 - dy^2 = N$
has an integral solution iff it has a rational solution iff $x^2 - dy^2 - Nz^2 = 0$ is
isotropic, which we can answer using Legendre's Theorem.
Also $x^2 + y^2 + z^2$ satisfies the hypotheses of the Davenport-Cassels lemma: given
rational numbers $x, y, z$, find integers $n_1, n_2, n_3$ at most $\frac{1}{2}$ a unit away, and then
$(x - n_1)^2 + (y - n_2)^2 + (z - n_3)^2 \leq \frac{1}{4} + \frac{1}{4} + \frac{1}{4} < 1$.
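The rounding argument for $x^2 + y^2 + z^2$ can be tried out on a sample point. The following Python sketch uses exact rational arithmetic; the helper names and the sample point are illustrative assumptions of ours.

```python
from fractions import Fraction

def nearest_integer(r):
    """Return an integer within 1/2 of the rational number r (computed as floor(r + 1/2))."""
    return (2 * r.numerator + r.denominator) // (2 * r.denominator)

def dc_witness(point):
    """Round each coordinate of a rational point and return
    (integer point, q(point - integer point)) for q = x^2 + y^2 + z^2."""
    rounded = [nearest_integer(c) for c in point]
    value = sum((c - n) ** 2 for c, n in zip(point, rounded))
    return rounded, value

point = [Fraction(1, 2), Fraction(-7, 3), Fraction(22, 5)]
rounded, value = dc_witness(point)
```

Here `value` works out to $\frac{1}{4} + \frac{1}{9} + \frac{4}{25} = \frac{469}{900}$, comfortably below the bound $\frac{3}{4}$, and it is positive because the sample point is not integral.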
[1] A notable exception is Lam's 2005 text on quadratic forms, which states the result for
anisotropic forms, simplified as in Remark 1.
$\left(\frac{q}{p_i}\right) = \left(\frac{-1}{p_i}\right)$ for all $1 \leq i \leq r$ and
$q \equiv 1 \pmod{4}$.
(Indeed, each of the first conditions restricts $q$ to a nonempty set of congruence
classes modulo the distinct odd primes $p_i$, whereas the last condition is a condition
modulo a power of 2. By the Chinese Remainder Theorem this amounts to a set of
congruence conditions modulo $4p_1 \cdots p_r$ and all of the resulting congruence classes
are relatively prime to $4p_1 \cdots p_r$, so Dirichlet's Theorem applies.)
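It follows that for all $1 \leq i \leq r$,
$\left(\frac{-q}{p_i}\right) = \left(\frac{-1}{p_i}\right)\left(\frac{q}{p_i}\right) = 1$,
and
$\left(\frac{m}{q}\right) = \left(\frac{p_1}{q}\right) \cdots \left(\frac{p_r}{q}\right) = \left(\frac{q}{p_1}\right) \cdots \left(\frac{q}{p_r}\right) = \left(\frac{-1}{p_1}\right) \cdots \left(\frac{-1}{p_r}\right) = 1$.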
The last equality holds because the number of factors of $-1$ is the number of primes
$p_i \equiv 3 \pmod{4}$, which as observed above is an even number.
Since $-q$ is a square modulo each of the distinct primes $p_i$, by the Chinese Remainder Theorem it is also a square modulo $m = p_1 \cdots p_r$. Therefore by the Chinese
Remainder Theorem there is an integer $x$ such that
$x^2 \equiv -q \pmod{m}$,
$x^2 \equiv m \pmod{q}$.
But according to Legendre's Theorem, these are precisely the congruence conditions
necessary and sufficient for the homogeneous equation
$qu^2 + z^2 - mt^2 = 0$
to have a solution in integers $(u, z, t)$, not all zero. Indeed, we must have $t \neq 0$,
for otherwise $qu^2 + z^2 = 0 \implies u = z = 0$. Moreover, since $q \equiv 1 \pmod{4}$,
by Fermat's Two Squares Theorem there are $x, y \in \mathbb{Z}$ such that $qu^2 = x^2 + y^2$.
Therefore
$mt^2 - z^2 = qu^2 = x^2 + y^2$,
so
$m = \left(\frac{x}{t}\right)^2 + \left(\frac{y}{t}\right)^2 + \left(\frac{z}{t}\right)^2$
and $m$ is a sum of three rational squares, completing the proof in this case.
Case 2: Suppose $m = 2m_1 = 2p_1 \cdots p_r$ with $m_1 = p_1 \cdots p_r$ squarefree and odd. In
this case we may proceed exactly as above, except that we require $q \equiv 1 \pmod{8}$.
Case 3: Suppose $m = p_1 \cdots p_r$ is squarefree and $m \equiv 3 \pmod{8}$. By Lemma 226,
the number of prime divisors $p_i$ of $m$ which are either 5 or 7 modulo 8 is even. By
Dirichlet's Theorem there exists a prime $q$ such that
$\left(\frac{q}{p_i}\right) = \left(\frac{-2}{p_i}\right)$ for all $1 \leq i \leq r$ and
$q \equiv 5 \pmod{8}$.
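It follows that for all $1 \leq i \leq r$,
$\left(\frac{-2q}{p_i}\right) = \left(\frac{-2}{p_i}\right)\left(\frac{q}{p_i}\right) = 1$,
and
$\left(\frac{m}{q}\right) = \left(\frac{p_1}{q}\right) \cdots \left(\frac{p_r}{q}\right) = \left(\frac{q}{p_1}\right) \cdots \left(\frac{q}{p_r}\right) = \left(\frac{-2}{p_1}\right) \cdots \left(\frac{-2}{p_r}\right) = 1$.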
The last equality holds because the number of factors of $-1$ is the number of primes
$p_i \equiv 5, 7 \pmod{8}$, which as observed above is an even number.
Therefore there is an integer $x$ such that
$x^2 \equiv -2q \pmod{m}$,
$x^2 \equiv m \pmod{q}$,
so by Legendre's Theorem the equation
$2qu^2 + z^2 - mt^2 = 0$
has a solution in integers $(u, z, t)$ with $t \neq 0$. Since $q \equiv 1 \pmod{4}$, there are
$x, y \in \mathbb{Z}$ such that $2qu^2 = x^2 + y^2$, so
$mt^2 - z^2 = 2qu^2 = x^2 + y^2$,
and thus once again
$m = \left(\frac{x}{t}\right)^2 + \left(\frac{y}{t}\right)^2 + \left(\frac{z}{t}\right)^2$.
$\sum_{i=1}^{N} r_q(i)$
is counting lattice points lying on or inside the ellipsoid $q(x_1, \ldots, x_n) = N$ in $n$-dimensional Euclidean space. Recalling our previous study of this sort of problem,
we know that there exists a constant $V$ such that
$R_q(N) \sim V N^{n/2}$,
so that the average value of $r_q(N)$ is asymptotically $N^{\frac{n}{2} - 1}$.
Here $d(N)$ is the divisor function, which, recall, grows slower than any positive
power of $N$. One can interpret this result as saying that a local-global principle for
$r_q(N)$ holds asymptotically, with almost square root error!
The proof of this theorem requires lots of techniques from 20th century number
theory, and in particular the introduction of objects which are a lot less elementary
and quaint than quadratic polynomials with integer coefficients. Notably the proof
first associates to a quadratic form a modular form (a certain especially nice
kind of function of a complex variable) and the result follows from a bound on
the coefficients of a power series expansion of this function. In particular, one uses
results on the number of solutions to much more general systems of equations over
finite fields established by fundamental work of Pierre Deligne in the 1970s (work
that justly landed him the Fields Medal).
Corollary 232. Let $q$ be a positive-definite quadratic form in $n \geq 4$ variables.
Then there exists $N_0$ such that if $N \geq N_0$, $q(x_1, \ldots, x_n) = N$ satisfies the local-global principle (has integral solutions iff it has congruential solutions).
Again, the theory of congruential solutions is sufficiently well-developed so as to
enable one to determine (with some work, to be sure) precise conditions on $N$ such
that solutions exist everywhere locally. Therefore the corollary gives a method
for solving the representation problem for integral quadratic forms in at least four
variables: (i) explicitly compute the value of $N_0$ in the Corollary; (ii) explicitly
compute the local conditions for solvability; (iii) check each of the finitely many
values of $N$, $1 \leq N \leq N_0$, to see whether $q(x_1, \ldots, x_n) = N$ has a solution.
$1^2 + 1^2 + 1^2 + 1 \cdot 2^2 = 7$.
$2^2 + 1^2 + 0^2 + 2 \cdot 1^2 = 7$.
$2^2 + 0^2 + 0^2 + 3 \cdot 1^2 = 7$.
$1^2 + 1^2 + 1^2 + 4 \cdot 1^2 = 7$.
$1^2 + 1^2 + 0^2 + 5 \cdot 1^2 = 7$.
$1^2 + 0^2 + 0^2 + 6 \cdot 1^2 = 7$.
$0^2 + 0^2 + 0^2 + 7 \cdot 1^2 = 7$.
Example: Taking $S$ to be the prime numbers, Bhargava showed that one may take
$S_0$ to be the primes less than or equal to 73.
The proof gives an algorithm for determining $S_0$, but whether or not it is practical seems to depend very much on the choice of $S$: it gets much harder if $S$ does
not contain several very small integers.
Indeed, we have been saying "integer matrix" quadratic forms for the last few
results, but a quadratic form is represented by a polynomial with integer coefficients iff its defining matrix satisfies the slightly weaker condition that its diagonal
entries are integers and its off-diagonal entries are half-integers (e.g. $q(x, y) = xy$).
However, if $q$ is any integral quadratic form, then the matrix entries of $2q$ are certainly integers, and $q$ represents an integer $N$ iff $2q$ represents $2N$. Thus, applying
Bhargava's Master Theorem to the subset of positive even integers, one deduces the
existence of an integer $N_0$ such that if a positive-definite integral quadratic form represents
every $N \in \{1, \ldots, N_0\}$ then it represents every positive integer.
Already in Conway's course it was suggested that $N_0$ could be taken to be 290.
However, the calculations necessary to establish this result were Herculean: one
needs to show that each of 6,436 quaternary quadratic forms is universal. Some
of these forms can be proven universal in relatively slick and easy ways, but about
1,000 of them are seriously hard. So Bhargava enlisted the help of Jonathan Hanke,
and after several years of intense work (including extremely intensive and carefully
checked computer calculations), they were able to show the following result.
Theorem 236. (290 Theorem [BHxx]) If a positive-definite integral quadratic
form represents each of:
1, 2, 3, 5, 6, 7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, 30, 31, 34, 35, 37, 42, 58, 93, 110, 145, 203, 290,
then it represents all positive integers.
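To see how little checking Theorem 236 asks for, here is a minimal Python sketch that tests the 29 critical integers against a diagonal form $c_1 x_1^2 + \ldots + c_n x_n^2$ with positive integer coefficients. Restricting to diagonal forms is our simplifying assumption (the theorem covers all positive-definite integral forms), and the function names are hypothetical.

```python
CRITICAL = [1, 2, 3, 5, 6, 7, 10, 13, 14, 15, 17, 19, 21, 22, 23, 26, 29, 30, 31,
            34, 35, 37, 42, 58, 93, 110, 145, 203, 290]

def represented_values(coeffs, bound):
    """Return the set of integers in [0, bound] represented by the diagonal form
    coeffs[0]*x_1^2 + ... + coeffs[-1]*x_n^2 (positive integer coefficients assumed)."""
    values = {0}
    for c in coeffs:
        # All values c*x^2 that do not exceed the bound.
        terms = []
        x = 0
        while c * x * x <= bound:
            terms.append(c * x * x)
            x += 1
        values = {v + t for v in values for t in terms if v + t <= bound}
    return values

def missing_critical(coeffs):
    """List the critical integers of the 290 Theorem that the form fails to represent."""
    values = represented_values(coeffs, max(CRITICAL))
    return [n for n in CRITICAL if n not in values]
```

For the sum of four squares, `missing_critical([1, 1, 1, 1])` is empty, so the theorem recovers universality from a finite check. For $x^2 + 2y^2 + 5z^2 + 5w^2$ the list is `[15]`, and for the sum of three squares the first missing critical integer is 7.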
APPENDIX A
2. RING HOMOMORPHISMS
$c(n) = 0$, and one can check that $\mathrm{Ker}(c)$ consists of all integer multiples of $n$, a set
which we will denote by $n\mathbb{Z}$ or by $(n)$. This integer $n$ is called the characteristic
of $R$, and if no such $n$ exists we say that $R$ is of characteristic 0 (yes, it would seem
to make more sense to say that $R$ has infinite characteristic). As an important
example, the homomorphism $c : \mathbb{Z} \to \mathbb{Z}_n$ is an extension of the map mod $n$ to all of
$\mathbb{Z}$; in particular the characteristic of $\mathbb{Z}_n$ is $n$.
3. Integral Domains
A commutative ring $R$ (which is not the zero ring!) is said to be an integral domain if it satisfies either of the following equivalent properties:[2]
(ID1) If $x, y \in R$ and $xy = 0$ then $x = 0$ or $y = 0$.
(ID2) If $a, b, c \in R$, $ab = ac$ and $a \neq 0$, then $b = c$.
(Suppose $R$ satisfies (ID1) and $ab = ac$ with $a \neq 0$. Then $a(b - c) = 0$, so $b - c = 0$
and $b = c$; so $R$ satisfies (ID2). The converse is similar.)
(ID2) is often called the "cancellation" property and it is extremely useful when
solving equations. Indeed, when dealing with equations in a ring which is not an
integral domain, one must remember not to apply cancellation without further justification! (ID1) expresses the nonexistence of zero divisors: a nonzero element
$x$ of a ring $R$ is called a zero divisor if there exists a nonzero $y$ in $R$ such that $xy = 0$.
An especially distressing kind of zero divisor is an element $0 \neq a \in R$ such that
$a^n = 0$ for some positive integer $n$. (If $N$ is the least positive integer such that
$a^N = 0$ we have $a, a^{N-1} \neq 0$ and $a \cdot a^{N-1} = 0$, so $a$ is a zero divisor.) Such an
element is called nilpotent, and a ring is reduced if it has no nilpotent elements.
One of the difficulties in learning ring theory is that the examples have to run
very fast to keep up with all the definitions and implications among definitions.
But, look, here come some now:
Example 3.1: Let us consider the rings $\mathbb{Z}_n$ for the first few $n$.
The rings $\mathbb{Z}_2$ and $\mathbb{Z}_3$ are easily seen to be fields: indeed, in $\mathbb{Z}_2$ the only nonzero
element, 1, is its own multiplicative inverse, and in $\mathbb{Z}_3$, $1 = 1^{-1}$ and $2 = 2^{-1}$.
In the ring $\mathbb{Z}_4$, $2^2 = 0$, so 2 is nilpotent and $\mathbb{Z}_4$ is nonreduced.
In $\mathbb{Z}_5$ one finds after some trial and error that $1^{-1} = 1$, $2^{-1} = 3$, $3^{-1} = 2$, $4^{-1} = 4$, so that $\mathbb{Z}_5$ is a field.
In $\mathbb{Z}_6$ we have $2 \cdot 3 = 0$ so there are zero divisors, but a bit of calculation shows
there are no nilpotent elements. (We take enough powers of every element until we
get the same element twice; if we never get zero then no power of that element will
be zero. For instance $2^1 = 2$, $2^2 = 4$, $2^3 = 2$, so $2^n$ will equal either 2 or 4 in $\mathbb{Z}_6$:
never 0.)
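The trial and error in Example 3.1 is easy to automate. The following Python sketch (the function name `classify` is our own) sorts the nonzero elements of $\mathbb{Z}_n$ into units, zero divisors, and nilpotent elements by brute force, exactly in the spirit of the hand computations above.

```python
def classify(n):
    """Split the nonzero elements of Z_n into units, zero divisors, and nilpotents."""
    nonzero = range(1, n)
    # a is a unit if some b satisfies a*b = 1 in Z_n.
    units = [a for a in nonzero if any(a * b % n == 1 for b in nonzero)]
    # a is a zero divisor if some nonzero b satisfies a*b = 0 in Z_n.
    zero_divisors = [a for a in nonzero if any(a * b % n == 0 for b in nonzero)]
    # a is nilpotent if some power a^k vanishes; exponents up to n always suffice.
    nilpotents = [a for a in nonzero
                  if any(pow(a, k, n) == 0 for k in range(1, n + 1))]
    return units, zero_divisors, nilpotents
```

For example, `classify(6)` returns `([1, 5], [2, 3, 4], [])`: there are zero divisors but no nilpotent elements, as claimed.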
[2] The terminology "integral domain" is completely standardized but a bit awkward: on the
one hand, the term "domain" has no meaning by itself. On the other hand there is also a notion
of an "integral extension of rings". And, alas, it may well be the case that an extension of integral
domains is not an integral extension! But there is no clear remedy here, and proposed changes
in the terminology (e.g. Lang's attempted use of "entire" for "integral domain") have not been
well received.
Remark: The field $F(R)$ is called the field of fractions[3] of the integral domain $R$.
Example 3.3 (Subrings of $\mathbb{Q}$): There are in general many different integral domains
with a given quotient field. For instance, let us consider the integral domains with
quotient field $\mathbb{Q}$, i.e., the subrings of $\mathbb{Q}$. The two obvious ones are $\mathbb{Z}$ and $\mathbb{Q}$, and
it is easy to see that they are the extremes: i.e., for any subring $R$ of $\mathbb{Q}$ we have
$\mathbb{Z} \subseteq R \subseteq \mathbb{Q}$. But there are many others: for instance, let $p$ be any prime number,
and consider the subset $R_p$ of $\mathbb{Q}$ consisting of rational numbers of the form $\frac{a}{b}$ where
$b$ is not divisible by any prime except $p$ (so, taking the convention that $b > 0$, we
are saying that $b = p^k$ for some $k$). A little checking reveals that $R_p$ is a subring
of $\mathbb{Q}$. In fact, this construction can be vastly generalized: let $S$ be any subset of
the prime numbers (possibly infinite!), and let $R_S$ be the rational numbers $\frac{a}{b}$ such
that $b$ is divisible only by primes in $S$. It is not too hard to check that: (i) $R_S$ is a
subring of $\mathbb{Q}$, (ii) if $S \neq S'$, $R_S \neq R_{S'}$, and (iii) every subring of $\mathbb{Q}$ is of the form
$R_S$ for some set of primes $S$. Thus there are uncountably many subrings in all!
4. Polynomial Rings
Let $R$ be a commutative ring. One can consider the ring $R[T]$ of polynomials with
coefficients in $R$: that is, the union over all natural numbers $n$ of the set of all
formal expressions $\sum_{i=0}^{n} a_i T^i$ ($T^0 = 1$). (If $a_n \neq 0$, this polynomial is said to have
degree $n$. By convention, we take the zero polynomial to have degree $-\infty$.) There
are natural addition and multiplication laws which reduce to addition in $R$, the law
$T^i \cdot T^j = T^{i+j}$ and distributivity. (Formally speaking we should write down these
laws precisely and verify the axioms, but this is not very enlightening.) One gets a
commutative ring $R[T]$.
One can also consider polynomial rings in more than one variable: $R[T_1, \ldots, T_n]$.
These are what they sound like; among various possible formal definitions, the most
technically convenient is an inductive one: $R[T_1, \ldots, T_n] := R[T_1, \ldots, T_{n-1}][T_n]$, so
e.g. the polynomial ring $R[X, Y]$ is just a polynomial ring in one variable (called
$Y$) over the polynomial ring $R[X]$.
Proposition 240. $R[T]$ is an integral domain iff $R$ is an integral domain.
Proof. $R$ is naturally a subring of $R[T]$ (the polynomials $rT^0$ for $r \in R$) and
any subring of an integral domain is a domain; this shows necessity. Conversely,
suppose $R$ is an integral domain; then any two nonzero polynomials have the form
$a_n T^n + a_{n-1} T^{n-1} + \ldots + a_0$ and $b_m T^m + \ldots + b_0$ with $a_n, b_m \neq 0$. When we
multiply these two polynomials, the leading term is $a_n b_m T^{n+m}$; since $R$ is a domain,
$a_n b_m \neq 0$, so the product polynomial has nonzero leading term and is therefore
nonzero.
Corollary 241. A polynomial ring in any number of variables over an integral
domain is an integral domain.
This construction gives us many new integral domains and hence many new fields.
For instance, starting with a field $F$, the fraction field of $F[T]$ is the set of all formal
quotients $\frac{P(T)}{Q(T)}$ of polynomials; this is denoted $F(T)$ and called the field of rational
functions over $F$. (One can equally well consider fields of rational functions in several variables, but we shall not do so here.)
[3] The term "quotient field" is also used, even by me until rather recently. But since there is
already a quotient construction in ring theory, it seems best to use a different term for the fraction
construction.
The polynomial ring $F[T]$, where $F$ is a field, has many nice properties; in some
ways it is strongly reminiscent of the ring $\mathbb{Z}$ of integers. The most important common property is the ability to divide:
Theorem 242. (Division theorem for polynomials) Given any two polynomials
$a(T), b(T)$ in $F[T]$ with $a(T) \neq 0$, there exist unique polynomials $q(T)$ and $r(T)$ such that
$b(T) = q(T)a(T) + r(T)$
and $\deg(r(T)) < \deg(a(T))$.
A more concrete form of this result should be familiar from high school algebra:
instead of formally proving that such polynomials exist, one learns an algorithm for
actually finding $q(T)$ and $r(T)$. Of course this is as good or better: all one needs
to do is to give a rigorous proof that the algorithm works, a task we leave to the
reader. (Hint: induct on the degree of $b$.)
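For readers who would like to see the high school algorithm spelled out, here is a minimal Python sketch of long division in $F[T]$ for the field $F = \mathbb{Z}/p\mathbb{Z}$. Representing a polynomial as a list of coefficients from the highest degree down, and the names `strip` and `poly_divmod`, are our own conventions; the sketch assumes $p$ is prime so that the leading coefficient of $a(T)$ can be inverted.

```python
def strip(poly, p):
    """Reduce coefficients modulo p and drop leading zeros (the zero polynomial is [])."""
    poly = [c % p for c in poly]
    while poly and poly[0] == 0:
        poly.pop(0)
    return poly

def poly_divmod(b, a, p):
    """Divide b(T) by a(T) in F_p[T], p prime.

    Returns (q, r) with b = q*a + r and deg r < deg a.
    """
    a = strip(a, p)
    b = strip(b, p)
    if not a:
        raise ZeroDivisionError("division by the zero polynomial")
    lead_inverse = pow(a[0], -1, p)
    quotient = []
    remainder = b[:]
    while len(remainder) >= len(a):
        # Choose the multiple of a that cancels the current leading coefficient.
        factor = remainder[0] * lead_inverse % p
        quotient.append(factor)
        for i, c in enumerate(a):
            remainder[i] = (remainder[i] - factor * c) % p
        remainder.pop(0)  # the leading coefficient is now zero
    return quotient, strip(remainder, p)
```

For instance, dividing $T^3 + 2T + 1$ by $T + 3$ over $\mathbb{Z}/5\mathbb{Z}$ via `poly_divmod([1, 0, 2, 1], [1, 3], 5)` gives quotient $T^2 + 2T + 1$ and remainder 3. Each pass of the loop lowers the degree of the remainder, which is the same induction on $\deg b$ suggested in the hint.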
Corollary 243. (Factor theorem) For $a(T) \in F[T]$ and $c \in F$, the following
are equivalent:
a) $a(c) = 0$.
b) $a(T) = q(T) \cdot (T - c)$.
Proof. We apply the division theorem, dividing $a(T)$ by $T - c$, to get $a(T) = q(T)(T - c) + r(T)$.
The degree of $r$ must be less than the degree of $T - c$, which is 1,
so $r$ is a constant. Now plug in $T = c$: we get that $a(c) = r$. So if $a(c) = 0$,
$a(T) = q(T)(T - c)$. The converse is obvious.
Corollary 244. A nonzero polynomial $p(T) \in F[T]$ has at most $\deg(p(T))$
roots.
Remark: The same result holds for polynomials with coefficients in an integral domain $R$, since every root of $p$ in $R$ is also a root of $p$ in the fraction field $F(R)$.
This may sound innocuous, but do not underestimate its power: a judicious application of this Remark (often in the case $R = \mathbb{Z}/p\mathbb{Z}$) can and will lead to substantial
simplifications of classical arguments in elementary number theory.
Example 4.1: Corollary 244 does not hold for polynomials with coefficients in an
arbitrary commutative ring: for instance, the polynomial $T^2 - 1 \in \mathbb{Z}_8[T]$ has degree
2 and 4 roots: 1, 3, 5, 7.
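Example 4.1 can be confirmed by simply trying every residue. The Python sketch below (with a hypothetical helper `roots_mod`) lists the roots of $T^2 - 1$ in $\mathbb{Z}_8$ and, for contrast, in the field $\mathbb{Z}_7$.

```python
def roots_mod(coeffs, n):
    """Return all roots in Z_n of the polynomial with the given coefficients
    (highest degree first), found by trying every residue."""
    roots = []
    for x in range(n):
        value = 0
        for c in coeffs:
            value = (value * x + c) % n
        if value == 0:
            roots.append(x)
    return roots

roots_in_z8 = roots_mod([1, 0, -1], 8)   # T^2 - 1 over Z_8
roots_in_z7 = roots_mod([1, 0, -1], 7)   # T^2 - 1 over the field Z_7
```

Over $\mathbb{Z}_8$ there are four roots, while over $\mathbb{Z}_7$ there are only the two roots $\pm 1$, in agreement with Corollary 244.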
5. Commutative Groups
A group is a set $G$ endowed with a single binary operation $* : G \times G \to G$, required
to satisfy the following axioms:
(G1) For all $a, b, c \in G$, $(a * b) * c = a * (b * c)$ (associativity).
(G2) There exists $e \in G$ such that for all $a \in G$, $e * a = a * e = a$.
(G3) For all $a \in G$, there exists $b \in G$ such that $a * b = b * a = e$.
Example: Take an arbitrary set $S$ and put $G = \mathrm{Sym}(S)$, the set of all bijections
$f : S \to S$. When $S = \{1, \ldots, n\}$, this is called the symmetric group on $n$ letters,
otherwise known as the group of all permutations on $n$ elements: it has order $n!$.
We have notions of subgroups and group homomorphisms that are completely
analogous to the corresponding ones for rings: a subgroup $H \subseteq G$ is a subset which
is nonempty, and is closed under the group law and inversion: i.e., if $g, h \in H$
then also $g * h$ and $g^{-1}$ are in $H$. (Since there exists some $h \in H$, also $h^{-1}$ and
$e = h * h^{-1} \in H$; so subgroups necessarily contain the identity.)[4] And a homomorphism $f : G_1 \to G_2$ is a map of groups which satisfies $f(g_1 * g_2) = f(g_1) * f(g_2)$ (as
mentioned above, that $f(e_{G_1}) = e_{G_2}$ is then automatic). Again we get many examples just by taking a homomorphism of rings and forgetting about multiplication.
Example 5.1: Let $F$ be a field. Recall that for any positive integer $n$, the $n \times n$
matrices with coefficients in $F$ form a ring under the operations of matrix addition
and matrix multiplication, denoted $M_n(F)$. Consider the subset of invertible matrices, $GL_n(F)$. It is easy to check that the invertible matrices form a group under
matrix multiplication (the "unit group" of the ring $M_n(F)$, coming up soon). No
matter what $F$ is, this is an interesting and important group, and is not commutative if $n \geq 2$ (when $n = 1$ it is just the group of nonzero elements of $F$ under
multiplication). The determinant is a map
$\det : GL_n(F) \to F \setminus \{0\}$;
a well-known property of the determinant is that $\det(AB) = \det(A) \det(B)$. In
other words, the determinant is a homomorphism of groups. Moreover, just as
for a homomorphism of rings, for any group homomorphism $f : G_1 \to G_2$ we can
consider the subset $K_f = \{g \in G_1 \mid f(g) = e_{G_2}\}$ of elements mapping to the identity
element of $G_2$, again called the kernel of $f$. It is easy to check that $K_f$ is always a
subgroup of $G_1$, and that $f$ is injective iff $K_f = \{e_{G_1}\}$. The kernel of the determinant
map is denoted $SL_n(F)$; by definition, it is the collection of all $n \times n$ matrices
with determinant 1.[5] For instance, the rotation matrices
$\begin{bmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{bmatrix}$
form a subset (indeed, a subgroup) of the group $SL_2(\mathbb{R})$.
Theorem 245. (Lagrange) For a subgroup $H$ of the finite group $G$, we have
$\#H \mid \#G$.
The proof[6] is combinatorial: we exhibit a partition of $G$ into a union of subsets
$H_i$, such that $\#H_i = \#H$ for all $i$. Then, the order of $G$ is $\#H \cdot n$, where $n$ is the
number of subsets.
The $H_i$'s will be the left cosets of $H$, namely the subsets of the form
$gH = \{gh \mid h \in H\}$.
[4] Indeed there is something called the "one step subgroup test": a nonempty subset $H \subseteq G$
is a subgroup iff whenever $g$ and $h$ are in $H$, then $g * h^{-1} \in H$. But this is a bit like saying you
can put on your pants in one step if you hold them steady and jump into them: it's true but
not really much of a time saver.
[5] The "GL" stands for "general linear" and the "SL" stands for "special linear".
[6] This proof may be too brief if you have not seen the material before; feel free to look in any
algebra text for more detail, or just accept the result on faith for now.
Here $g$ ranges over all elements of $G$; the key is that for $g_1, g_2 \in G$, the two cosets
$g_1 H$ and $g_2 H$ are either equal or disjoint, i.e., what is not possible is for them to
share some but not all elements. To see this: suppose $x \in g_1 H$ and is also in $g_2 H$.
This means that there exist $h_1, h_2 \in H$ such that $x = g_1 h_1$ and also $x = g_2 h_2$,
so $g_1 h_1 = g_2 h_2$. But then $g_2 = g_1 h_1 h_2^{-1}$, and since $h_1, h_2 \in H$, $h_3 := h_1 h_2^{-1}$ is
also an element of $H$, meaning that $g_2 = g_1 h_3$ is in the coset $g_1 H$. Moreover, for
any $h \in H$, this implies that $g_2 h = g_1 h_3 h = g_1 h_4 \in g_1 H$ (with $h_4 := h_3 h \in H$), so that $g_2 H \subseteq g_1 H$.
Interchanging the roles of $g_2$ and $g_1$, we can equally well show that $g_1 H \subseteq g_2 H$, so
that $g_1 H = g_2 H$. Thus overlapping cosets are equal, which was to be shown.
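The partition into left cosets can be watched in a small case. The Python sketch below takes $G$ to be the group of permutations of three elements and $H$ a subgroup of order 2; the encoding of permutations as tuples and the helper names are illustrative assumptions.

```python
from itertools import permutations

def compose(f, g):
    """Composite permutation (apply g first, then f), with permutations stored as tuples."""
    return tuple(f[g[i]] for i in range(len(g)))

def left_cosets(group, subgroup):
    """Return the distinct left cosets gH as a set of frozensets."""
    return {frozenset(compose(g, h) for h in subgroup) for g in group}

G = list(permutations(range(3)))          # all 6 permutations of {0, 1, 2}
H = [(0, 1, 2), (1, 0, 2)]                # the identity and one transposition
cosets = left_cosets(G, H)
```

Although $gH$ is computed for all six $g$, only three distinct cosets appear, each of size 2, and together they cover $G$ without overlap: $6 = 2 \cdot 3$, as Lagrange's Theorem predicts.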
Remark: In the proof that $G$ is partitioned into cosets of $H$, we did not use the
finiteness anywhere; this is true for all groups. Indeed, for any subgroup $H$ of any
group $G$, we showed that there is a set $S$ (namely the set of distinct left cosets
$\{gH\}$) such that the elements of $G$ can be put in bijection with $S \times H$. If you know
about such things (no matter if you don't), this means precisely that $\#H$ divides
$\#G$ even if one or more of these cardinalities is infinite.
Corollary 246. If $G$ has order $n$, and $g \in G$, then the order of $g$ (i.e., the
least positive integer $k$ such that $g^k = 1$) divides $n$.
Proof. The set of all positive powers of an element of a finite group forms a
subgroup, denoted $\langle g \rangle$, and it is easily checked that the distinct elements of this
group are $1, g, g^2, \ldots, g^{k-1}$, so the order of $g$ is also $\#\langle g \rangle$. Thus the order of $g$
divides the order of $G$ by Lagrange's Theorem.
Example 5.2: For any ring $R$, $(R, +)$ is a commutative group. Indeed, there is
nothing to check: a ring is simply more structure than a group. For instance, we
get for each $n$ a commutative group $\mathbb{Z}_n$ just by taking the ring $\mathbb{Z}_n$ and forgetting
about the multiplicative structure.
A group $G$ is called cyclic if it has an element $g$ such that every element $x$ in
$G$ is of the form $1 = g^0$, $g^n := g \cdot g \cdots g$ or $g^{-n} = g^{-1} \cdot g^{-1} \cdots g^{-1}$ for some
positive integer $n$. The group $(\mathbb{Z}, +)$ forms an infinite cyclic group; for every positive integer $n$, the group $(\mathbb{Z}_n, +)$ is cyclic of order $n$. It is not hard to show that
these are the only cyclic groups, up to isomorphism.
An element $u$ of a ring $R$ is a unit if there exists $v \in R$ such that $uv = vu = 1$.
Example 5.3: 1 is always a unit; 0 is never a unit (except in the zero ring, in
which $0 = 1$). The units in $\mathbb{Z}$ are $\pm 1$.
A nonzero ring is a division ring iff every nonzero element is a unit.
The set of all units in a ring is denoted $R^\times$. It is not hard to see that the units
form a group under multiplication: for instance, if $u$ and $v$ are units, then they
have two-sided inverses denoted $u^{-1}$ and $v^{-1}$, and then
$uv \cdot (v^{-1} u^{-1}) = (v^{-1} u^{-1}) \cdot uv = 1$,
so $uv$ is also a unit. Similarly, the (unique) inverse $u^{-1}$ of a unit is a unit. In
general, $R^\times$ is not commutative, but of course it will be if $R$ is commutative.
or $(a)$; they are simple and easy to understand and are called principal ideals and
$a$ is called a generator.
Proposition 248. (To contain is to divide) Let $a$ and $b$ be elements of $R$. The
following are equivalent:
a) $(a) \supseteq (b)$.
b) $a \mid b$; i.e., $b = ac$ for some $c$ in $R$.
Proof. Exercise!
APPENDIX B
$\frac{n}{\gcd(a,n)}$.
$\frac{n}{d} \mid k\frac{a}{d}$, and since $\gcd(\frac{n}{d}, \frac{a}{d}) = 1$, the least such $k$ is $\frac{n}{d}$.
Corollary 261. Let $a \in \mathbb{Z}$, $n \in \mathbb{Z}^+$.
a) The class of $a \in \mathbb{Z}_n$ is a generator if and only if $\gcd(a, n) = 1$. In particular
there are $\varphi(n)$ generators.
b) For any $d \mid n$, there are precisely $\varphi(d)$ elements of $\mathbb{Z}_n$ of order $d$.
c) It follows that $\sum_{d \mid n} \varphi(d) = n$.
Proof. Part a) is immediate from Proposition 260. For any $d \mid n$, each element
of order $d$ generates a cyclic subgroup of order $d$, and we know that there is exactly
one such subgroup of $\mathbb{Z}_n$, so the elements of order $d$ are precisely the $\varphi(d)$ generators
of this cyclic group. Part c) follows: the left hand side gives the number of elements
of order $d$ for each $d \mid n$ and the right hand side is $\#\mathbb{Z}_n$.
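Corollary 261 is easy to test numerically. The Python sketch below counts the elements of $(\mathbb{Z}_n, +)$ by order, using the formula $n / \gcd(a, n)$ for the order of the class of $a$, and compares the counts with Euler's function. The function names are our own.

```python
from math import gcd
from collections import Counter

def phi(n):
    """Euler's function: the number of 1 <= a <= n with gcd(a, n) = 1."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def order_counts(n):
    """Count the elements of (Z_n, +) by order; the order of a is n / gcd(a, n)."""
    return Counter(n // gcd(a, n) for a in range(n))

counts = order_counts(12)
```

For $n = 12$ the counts are $\varphi(d)$ for each divisor $d$ of 12, namely 1, 1, 2, 2, 2, 4 for $d = 1, 2, 3, 4, 6, 12$, and they add up to 12 as in part c).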
This leads to a very useful result:
Theorem 262. (Cyclicity criterion) Let $G$ be a finite group, not assumed to be
commutative. Suppose that for each $n \in \mathbb{Z}^+$, there are at most $n$ elements $x$ in $G$
with $x^n = e$. Then $G$ is cyclic.
Proof. Suppose $G$ has order $N$, and for all $1 \leq d \leq N$, let $f(d)$ be the number
of elements of $G$ of order $d$. By Lagrange's Theorem, $f(d) = 0$ unless $d \mid N$, so
$N = \#G = \sum_{d \mid N} f(d)$. Now, if $f(d) \neq 0$ then there exists at least one element of
order $d$, which therefore generates a cyclic group of order $d$, whose elements give
$d$ solutions to the equation $x^d = e$. By our assumption there cannot be any more
solutions to this equation, hence all the elements of order $d$ are precisely the $\varphi(d)$
generators of this cyclic group. In other words, for all $d \mid N$ we have either $f(d) = 0$
or $f(d) = \varphi(d)$, so in any case we have
$N = \sum_{d \mid N} f(d) \leq \sum_{d \mid N} \varphi(d) = N$.
Therefore we must have $f(d) = \varphi(d)$ for all $d \mid N$, including $d = N$, i.e., there exists
an element of $G$ whose order is the order of $G$: $G$ is cyclic.
Corollary 263. Let $F$ be a field, and let $G \subseteq F^\times$ be a finite subgroup of the
group of nonzero elements of $F$ under multiplication. Then $G$ is cyclic.
Proof. By basic field theory, for any $d \in \mathbb{Z}^+$ the degree $d$ polynomial $t^d - 1$
can have at most $d$ solutions, so the hypotheses of Theorem 262 apply to $G$.
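In the case $F = \mathbb{Z}/p\mathbb{Z}$ and $G = F^\times$, Corollary 263 says that a generator must exist, and a brute-force search will find one. The Python sketch below is only meant for small primes; the function names are hypothetical and no attempt is made at efficiency.

```python
def multiplicative_order(a, p):
    """Order of a in the group of nonzero residues modulo the prime p
    (a is assumed not divisible by p)."""
    k, power = 1, a % p
    while power != 1:
        power = power * a % p
        k += 1
    return k

def smallest_generator(p):
    """Smallest g whose powers give every nonzero residue modulo the prime p."""
    for g in range(1, p):
        if multiplicative_order(g, p) == p - 1:
            return g
```

For example, modulo 7 the element 2 has order 3 only, but 3 has order 6, so `smallest_generator(7)` returns 3.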
3. Products of Elements of Finite Order in a Commutative Group
Let $G$ be a commutative group, and let $x, y \in G$ be two elements of finite order, say
of orders $m$ and $n$ respectively. There is a unique smallest subgroup $H = H(x, y)$ of
$G$ containing both $x$ and $y$, called the subgroup generated by $x$ and $y$. $H(x, y)$
is the set of all elements of the form $x^a y^b$ for $a, b \in \mathbb{Z}$. Moreover, since $x$ has order
$m$ and $y$ has order $n$, we may write every element of $H$ as $x^a y^b$ with $0 \leq a < m$,
$0 \leq b < n$, so that $\#H \leq mn$. In particular the subgroup of an abelian group
$\gcd(\frac{m}{f}, n) = 1$, $\frac{m}{f} \mid \gcd(a, m)$, or $m \mid f \gcd(a, m) \mid fa$.
Similarly $n \mid f \gcd(a, n) \mid fa$.
Therefore $\operatorname{lcm}(m, n) \mid fa$, or $\frac{\operatorname{lcm}(m,n)}{\gcd(m,n)}$
Remark: The divisibilities in Theorem 264 are best possible: if $h$ and $o$ are positive integers such that $\operatorname{lcm}(m, n) \mid h \mid mn$ and $\frac{\operatorname{lcm}(m,n)}{\gcd(m,n)} \mid o \mid \operatorname{lcm}(m, n)$, then there exist elements $x, y \in Z_m \times Z_n$ such that $\#H(x, y) = h$, $\#\langle xy \rangle = o$.
Remark: The situation is profoundly different for noncommutative groups: for every $m, n \ge 2$ and $2 \le r \le \infty$ there exists a group $G$ containing elements $x$ of order $m$, $y$ of order $n$ whose product $xy$ has order $r$. Moreover, if $r < \infty$ then one can find a finite group $G$ with these properties, whereas one can find an infinite group with these properties iff $\frac{1}{m} + \frac{1}{n} + \frac{1}{r} \le 1$.
The following is a consequence of Theorem 264 (but is much simpler to prove):
Corollary 265. Let $m, n \in \mathbb{Z}^+$. Then $Z_m \times Z_n$ is cyclic iff $\gcd(m, n) = 1$.
Proof. The order of any element $(c, d)$ divides $\operatorname{lcm}(m, n)$, and the order of $(1, 1)$ is $\operatorname{lcm}(m, n)$. So the group is cyclic iff $mn = \operatorname{lcm}(m, n)$ iff $\gcd(m, n) = 1$.
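The proof of Corollary 265 is easy to confirm numerically. The sketch below uses our own helper names and assumes that the order of $(c, d)$ in $Z_m \times Z_n$ is the lcm of the orders of its two components.

```python
from math import gcd
from itertools import product

def lcm(a, b):
    return a * b // gcd(a, b)

def order(c, d, m, n):
    """Order of (c, d) in Z_m x Z_n: the lcm of the orders of c and d."""
    return lcm(m // gcd(c, m), n // gcd(d, n))

def is_cyclic(m, n):
    """True if some element of Z_m x Z_n has order m*n."""
    return any(order(c, d, m, n) == m * n
               for c, d in product(range(m), range(n)))

print(order(1, 1, 4, 6))  # 12, which is lcm(4, 6)
print(is_cyclic(4, 6))    # False, since gcd(4, 6) = 2
print(is_cyclic(4, 9))    # True, since gcd(4, 9) = 1
```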
4. Character Theory of Finite Abelian Groups
4.1. Introduction.
In this section our goal is to present the theory of characters of finite abelian groups. Although this is an easy theory in that we can present it in its entirety here, it is nevertheless of the highest importance, being the jumping off point for at least two entire disciplines of mathematics: the general theory of linear representations of groups, and Fourier analysis. The special case of characters of the unit groups $U(N) = (\mathbb{Z}/N\mathbb{Z})^\times$ will be used as one of the essential ingredients in the proof of Dirichlet's theorem on primes in arithmetic progressions.
Let $G$ be a finite commutative group. A character $\chi: G \to \mathbb{C}^\times$ of $G$ is a homomorphism from $G$ to the group $\mathbb{C}^\times$ of nonzero complex numbers under multiplication.
Suppose $N = \#G$. By Lagrange's theorem we have, for any $g \in G$, that $g^N = e$ (the identity element), and thus for any character $\chi$ on $G$ we have
$$\chi(g)^N = \chi(g^N) = \chi(e) = 1.$$
Thus $\chi(g)$ is itself a complex $N$th root of unity. Recall that the set of all complex $N$th roots of unity forms a cyclic group of order $N$, say $\mu_N$. In other words, every character on a group $G$ of order $N$ is really just a homomorphism from $G$ to $\mu_N$, or equally well, from $G$ into any fixed order $N$ cyclic group.
We write $X(G)$ for the set of all characters of $G$. We can endow $X(G)$ with the structure of a group: given $\chi_1, \chi_2 \in X(G)$, we define their product pointwise:
$$\forall g \in G, \quad (\chi_1 \chi_2)(g) := \chi_1(g) \chi_2(g).$$
The identity element is the trivial character $g \mapsto 1$ for all $g$, and the inverse of $\chi$ is the function $\chi^{-1}: g \mapsto \frac{1}{\chi(g)}$. Because for any $z \in \mathbb{C}$ we have $z \bar{z} = |z|^2$, if $z$ is a root of unity, then the inverse of $z$ is given by its complex conjugate $\bar{z}$. It follows that the inverse of a character is also given by taking complex conjugates:
$$\bar{\chi}(g) = \overline{\chi(g)} = \frac{1}{\chi(g)} = \chi^{-1}(g).$$
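As a concrete illustration, take $G = Z_N$ written additively. The maps $x \mapsto e^{2\pi i a x / N}$ are characters of $Z_N$ (a standard fact that we assume here rather than derive), and the properties above can be checked in floating-point arithmetic. The helper name `character` is ours.

```python
import cmath

def character(a, N):
    """The character x -> exp(2*pi*i*a*x/N) of the additive group Z_N."""
    return lambda x: cmath.exp(2j * cmath.pi * a * x / N)

N = 6
chi = character(5, N)
for x in range(N):
    z = chi(x)
    # Each value is an N-th root of unity ...
    assert abs(z ** N - 1) < 1e-9
    # ... so its inverse is its complex conjugate.
    assert abs(z.conjugate() - 1 / z) < 1e-9
    for y in range(N):
        # chi is a homomorphism from (Z_N, +) to the nonzero complex numbers.
        assert abs(chi((x + y) % N) - chi(x) * chi(y)) < 1e-9
```

The checks use a small numerical tolerance because `cmath.exp` returns approximate values.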
4.2. The Character Extension Lemma.
Most of the content of the entire theory resides in the following result.
Lemma 266. (Character Extension Lemma) Let $H$ be a subgroup of a finite commutative group $G$. For any character $\psi: H \to \mathbb{C}^\times$, there are precisely $[G : H]$ characters $\Psi: G \to \mathbb{C}^\times$ such that $\Psi|_H = \psi$.
Proof. The result is clear if $H = G$, so we may assume there is $g \in G \setminus H$. Let $H_g = \langle g, H \rangle$ be the subgroup generated by $H$ and $g$. Now we may or may not have $H_g = G$, but suppose that we can establish the result for the group $H_g$ and its subgroup $H$. Then the general case follows by induction, since for any $H \subset G$ we may choose $g_1, \ldots, g_n$ such that $G = \langle H, g_1, \ldots, g_n \rangle$. Then we can define $G_0 = H$ and for $1 \le i \le n$, $G_i = \langle G_{i-1}, g_i \rangle$. Applying the Lemma in turn to $G_{i-1}$ as a subgroup of $G_i$ gives that in all the number of ways to extend the character $\psi$ of $H = G_0$ is
$$[G_1 : G_0][G_2 : G_1] \cdots [G_n : G_{n-1}] = [G : G_0] = [G : H].$$
So let us now prove that the number of ways to extend $\psi$ from $H$ to $H_g = \langle H, g \rangle$ is $[H_g : H]$. For this, let $d$ be the order of $g$ in $G$, and consider $\tilde{G} := H \times \langle g \rangle$. The number of ways to extend a character $\psi$ of $H$ to a character of $\tilde{G}$ is equal to $\#\langle g \rangle = d$: such a homomorphism is uniquely specified by the image of $(1, g)$ in $\mu_d \subset \mathbb{C}^\times$, and all $d$ such choices give rise to homomorphisms.
Moreover, there is a surjective homomorphism $\varphi: H \times \langle g \rangle \to H_g$: we just take $(h, g^i) \mapsto h g^i$. The kernel of $\varphi$ is the set of all pairs $(h, g^i)$ such that $g^{-i} = h$. It is therefore isomorphic to the intersection $H \cap \langle g \rangle$, which has cardinality a divisor of $d$, say $e$. It follows that
$$\#H_g = \frac{\#(H \times \langle g \rangle)}{\#(H \cap \langle g \rangle)} = \frac{d}{e} \cdot \#H,$$
so
$$[H_g : H] = \frac{d}{e}.$$
But a homomorphism $f: H \times \langle g \rangle \to \mathbb{C}^\times$ descends to a homomorphism on the quotient $H_g$ iff it is trivial on the kernel of the quotient map. In other words, the extensions of $\psi$ to a character of $H_g$ correspond precisely to the ways to map the order $d$ element $g$ into $\mathbb{C}^\times$ such that $g^{d/e}$, which generates $H \cap \langle g \rangle$, gets mapped to $\psi(g^{d/e})$. Thus we must map $g$ to a $(\frac{d}{e})$th root of $\psi(g^{d/e})$, and conversely all $\frac{d}{e}$ such mappings induce extensions of $\psi$. Thus the number of extensions is $\frac{d}{e} = [H_g : H]$.
Corollary 267. For any finite commutative group $G$, $X(G)$ is finite and $\#X(G) = \#G$.
Proof. Apply Lemma 266 with $H = 1$.
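The count in Lemma 266 can be observed directly in a cyclic example. In this sketch we take $G = Z_N$ written additively and use the characters $\chi_a(x) = e^{2\pi i a x / N}$, $0 \le a < N$, each recorded exactly by the exponent $ax \bmod N$. That these are all the characters of $Z_N$ is an assumption of the sketch, and the function name is ours.

```python
from collections import Counter

def restriction_counts(N, H):
    """Group the N characters chi_a of Z_N by their restriction to the subgroup H.

    chi_a(x) = exp(2*pi*i*a*x/N) is recorded by the exponent (a*x) % N, so two
    characters agree on H exactly when these exponent tuples coincide.
    """
    return Counter(tuple((a * h) % N for h in H) for a in range(N))

# G = Z_12 and H = {0, 4, 8}, a subgroup of order 3 and index 4.
counts = restriction_counts(12, [0, 4, 8])
print(len(counts))            # 3 distinct characters of H arise
print(set(counts.values()))   # {4}: each one has [G : H] = 4 extensions
```

Taking $H = \{0\}$ gives a single restriction with 12 extensions, which is the count $\#X(G) = \#G$ of Corollary 267 in this case.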
$$S = \sum_{g \in G} \chi(g).$$
$$\chi(g_0) S = \sum_{g \in G} \chi(g) \chi(g_0) = \sum_{g \in G} \chi(g g_0) = \sum_{g \in G} \chi(g) = S;$$
whereas if $\chi_1 = \chi_2$, then
$$\langle \chi_1, \chi_2 \rangle = \frac{1}{\#G} \sum_{g \in G} |\chi_1(g)|^2 = 1.$$
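In other words, the set $X(G)$ of characters of $G$ is orthonormal with respect to the given inner product. In particular, the subset $X(G)$ of $\mathbb{C}^G$ is linearly independent. Since its cardinality, $\#G$, is equal to the dimension of $\mathbb{C}^G$, we conclude:
Corollary 271. Let $G$ be a finite commutative group, and let $\mathbb{C}^G$ be the $\mathbb{C}$-vector space of all functions from $G$ to $\mathbb{C}$, endowed with the inner product
$$\langle f, g \rangle = \frac{1}{\#G} \sum_{x \in G} f(x) \overline{g(x)}.$$
Then the set of characters of $G$ forms an orthonormal basis of $\mathbb{C}^G$, so that every $f \in \mathbb{C}^G$ can be written as
$$f = \sum_{\chi \in X(G)} \langle f, \chi \rangle \chi.$$
This can be viewed as the simplest possible case of a Fourier inversion formula.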
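For $G = Z_N$ the expansion of Corollary 271 is a finite Fourier expansion, and it can be verified numerically. In the sketch below the characters are taken to be $\chi_a(x) = e^{2\pi i a x / N}$ (an assumption about the explicit form of $X(Z_N)$), functions on $Z_N$ are stored as lists of values, and the names `chi`, `inner` and `fourier_expand` are ours.

```python
import cmath

def chi(a, N):
    """Values of the character x -> exp(2*pi*i*a*x/N) on Z_N, as a list."""
    return [cmath.exp(2j * cmath.pi * a * x / N) for x in range(N)]

def inner(f, g):
    """Inner product <f, g> = (1/#G) * sum over x of f(x) * conj(g(x))."""
    return sum(fx * gx.conjugate() for fx, gx in zip(f, g)) / len(f)

def fourier_expand(f):
    """Rebuild f from its coefficients <f, chi> as in the inversion formula."""
    N = len(f)
    chars = [chi(a, N) for a in range(N)]
    coeffs = [inner(f, c) for c in chars]
    return [sum(coeffs[a] * chars[a][x] for a in range(N)) for x in range(N)]

f = [3, -1, 4, 1, -5, 9]
rebuilt = fourier_expand(f)
assert all(abs(r - v) < 1e-9 for r, v in zip(rebuilt, f))
```

The reconstruction agrees with `f` up to floating-point error, which is the inversion formula in this special case.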
4.4. The canonical and illicit isomorphism theorems; Pontrjagin duality.
In the course of study of finite commutative groups, one sees that subgroups and quotient groups have many similar properties. For instance, subgroups of cyclic groups are cyclic, and also quotients of cyclic groups are cyclic. Moreover, a cyclic group of order $n$ has a unique subgroup of every order dividing $n$ and no other subgroups, and the same is true for its quotients. If one plays around for a bit with finite commutative groups, one eventually suspects the following result:
¹ Alternately, using the canonical isomorphism $G \cong X(X(G))$ described in the next section, one can literally deduce part b) from part a).
then $r = s$ and there exists a bijection $\sigma: \{1, \ldots, r\} \to \{1, \ldots, s\}$ such that for all $1 \le i \le r$, $q_{\sigma(i)} = p_i$ and $b_{\sigma(i)} = a_i$.
Now please bear with me while I make a few possibly confusing remarks about why I have labelled Theorem 273 the illicit isomorphism theorem. In some sense it is lucky that $G \cong X(G)$, in that it is not part of the general meaning of duality that an object be isomorphic to its dual object. Rather, what one has in much more generality is a canonical injection from an object to its double dual. Here, this means the following: we can construct a canonical map $G \to X(X(G))$. In other words, given an element $g$ in $G$, we want to define a character, say $g\bullet$, on the character group, i.e., a homomorphism $X(G) \to \mathbb{C}^\times$. This may sound complicated at first, but in fact there is a very easy way to do this: define $g \bullet \chi := \chi(g)$! It is no problem to check that the association $g \mapsto g\bullet$ is a homomorphism of finite abelian groups. Moreover, suppose that for a fixed $g \in G$ the map $g\bullet$ were trivial: that means that for all $\chi \in X(G)$, $\chi(g) = 1$. Applying Corollary 268, we get that $g = 1$. Therefore this map
$$\bullet : G \to X(X(G))$$
is an injective homomorphism between finite abelian groups. Moreover,
$$\#X(X(G)) = \#X(G) = \#G,$$
so it is an injective homomorphism between finite groups of the same order, and therefore it must be an isomorphism.
In order to write down the isomorphism $\bullet$, we did not have to make any choices. There is a precise sense in which the isomorphism to the double dual is canonical and any isomorphism between $G$ and $X(G)$ is noncanonical, but explaining this involves the use of category theory so is not appropriate here. More interesting is to remark that there is a vastly more general class of commutative groups $G$ for which $X(G)$ is defined in such a way as to render true all of the results we have proved here except the illicit isomorphism theorem: we need not have $G \cong X(G)$. For this we take $G$ to be a commutative group endowed with a topology which makes it locally compact Hausdorff. Any commutative group $G$ can be endowed with the discrete topology, which gives many examples. For a finite group the discrete topology is the only Hausdorff topology, so this is certainly the right choice, but an infinite group may or may not carry other interesting locally compact topologies. Some examples:
Example 1: The integers $\mathbb{Z}$: here we do want the discrete topology.
Example 2: The additive group $(\mathbb{R}, +)$ with its usual Euclidean topology: this is a locally compact group which is neither discrete nor compact. More generally, one can take $(\mathbb{R}^n, +)$ (and in fact, if $G_1$ and $G_2$ are any two locally compact commutative groups, then so is $G_1 \times G_2$ when endowed with the product topology).
Example 3: The multiplicative group $\mathbb{C}^\times$ of the complex numbers is again locally compact but neither discrete nor compact, but it is closer to being compact than the additive group $\mathbb{C} \cong \mathbb{R}^2$. In fact, considering polar coordinates gives an isomorphism of topological groups $\mathbb{C}^\times \cong \mathbb{R}^{>0} \times S^1$, where $S^1$ is the unit circle. Moreover, the logarithm function shows that $\mathbb{R}^{>0}$ is isomorphic as a topological group to $(\mathbb{R}, +)$, so all in all $\mathbb{C}^\times \cong (\mathbb{R}, +) \times S^1$. Note that $S^1$, the circle group, is itself a very interesting example.
Now, given any locally compact commutative group $G$, one defines the Pontrjagin dual group $X(G)$, which is the group of all continuous group homomorphisms from $G$ to the circle group $S^1$. Moreover, $X(G)$ can be endowed with a natural topology.² Again, one has a natural map $G \to X(X(G))$ which turns out to be an isomorphism in all cases.
If $G$ is a finite, discrete commutative group, then as we saw, any homomorphism to $\mathbb{C}^\times$ lands in $S^1$ (and indeed, in the countable subgroup of $S^1$ consisting of all roots of unity) anyway; moreover, by discreteness every homomorphism is continuous. Thus $X(G)$ in this new sense agrees with the character group we have defined. But for infinite groups Pontrjagin duality is much more interesting: it turns out that $G$ is compact iff $X(G)$ is discrete.³ Since a topological space is both compact and discrete iff it is finite, we conclude that a topological group $G$ which is infinite and either discrete or compact cannot be isomorphic to its Pontrjagin dual.
It is easy to see that $\operatorname{Hom}(\mathbb{Z}, S^1) = S^1$, which according to the general theory implies $\operatorname{Hom}(S^1, S^1) = \mathbb{Z}$: the discrete group $\mathbb{Z}$ and the compact circle group $S^1$ are mutually dual. This is the theoretical underpinning of Fourier series.
However, if $G$ is neither discrete nor compact, then the same holds for $X(G)$, so there is at least a fighting chance for $G$ to be isomorphic to $X(G)$. Indeed this happens for $\mathbb{R}$: $\operatorname{Hom}(\mathbb{R}, S^1) = \mathbb{R}$, where we send $x \in \mathbb{R}$ to the character $t \mapsto e^{2\pi i t x}$. This is the theoretical underpinning of the Fourier transform.
Another sense in which the isomorphism between $G$ and $X(G)$ for a finite commutative group $G$ is illicit is that it turns out not to be necessary in the standard number-theoretic applications. A perusal of elementary number theory texts reveals that careful authors take it as a sort of badge of honor to avoid using the illicit isomorphism, even if it makes the proofs a bit longer. For example, the most natural analysis of the group structure of $(\mathbb{Z}/2^a\mathbb{Z})^\times$ for $a \ge 3$ would consist in showing: (i) the group has order $2^{a-1}$; (ii) it has a cyclic subgroup of order $2^{a-2}$; (iii) it has a noncyclic quotient so is itself not cyclic. Applying Theorem 274 one can immediately conclude that it must be isomorphic to $Z_{2^{a-2}} \times Z_2$. In our work in Handout 9.5, however, we show the isomorphism by direct means.
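The three facts (i)-(iii) are easy to observe experimentally for small $a$. The sketch below is not the direct argument referred to above: it simply computes orders by brute force. The choice of the element 5 as a generator of the cyclic subgroup of order $2^{a-2}$ is our assumption, and the function names are ours.

```python
def order_mod(x, N):
    """Multiplicative order of the unit x modulo N."""
    k, y = 1, x % N
    while y != 1:
        y = (y * x) % N
        k += 1
    return k

def unit_group_summary(a):
    """Return (group order, order of 5, largest element order) for (Z/2^a Z)^x."""
    N = 2 ** a
    orders = [order_mod(u, N) for u in range(1, N, 2)]  # the units are the odd residues
    return len(orders), order_mod(5, N), max(orders)

print(unit_group_summary(5))  # (16, 8, 8)
```

For $a = 5$ the group has order $16 = 2^{a-1}$, the element 5 has order $8 = 2^{a-2}$, and no element has order 16, so the group is not cyclic.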
This was first drawn to my attention by a close reading of J.-P. Serre's text [Se73] in which the illicit isomorphism is never used. Following Serre, our main application of character groups (namely the proof of Dirichlet's theorem on primes in arithmetic progressions) uses only $\#X(G) = \#G$, but not $X(G) \cong G$.
However, to my mind, avoiding the proof of Theorem 274 gives a misleading impression of the difficulty of the result.⁴ On the other hand, Theorem 274 evidently has some commonalities with the fundamental theorem of arithmetic, which makes it somewhat desirable to see the proof. In the next section we provide such a proof, which is not in any sense required reading.
² If you happen to know something about topologies on spaces of functions, then you know that there is one particular topology that always has nice properties, namely the compact-open topology. That is indeed the correct topology here.
³ Similarly, $G$ is discrete iff $X(G)$ is compact; this follows from the previous statement together with $G \cong X(X(G))$.
⁴ The real reason it is often omitted in such treatments is that the authors know that they will be giving a more general treatment of the structure theory of finitely generated modules over a principal ideal domain, of which the theory of finite commutative groups is a very special case.
5. Proof of the Fundamental Theorem on Finite Commutative Groups
First some terminology: Let $G$ be a commutative group, written multiplicatively. If $\#G = p^a$ is a prime power, we say $G$ is a p-group.
For $n \in \mathbb{Z}^+$, we put $G[n] = \{x \in G \mid x^n = 1\}$. This is a subgroup of $G$.
We say that two subgroups $H_1, H_2$ of $G$ are complementary if $H_1 \cap H_2 = \{1\}$ and $H_1 H_2 = G$. In other words, every element $g$ of $G$ can be uniquely expressed in the form $h_1 h_2$, with $h_i \in H_i$. In yet other (equivalent) words, this means precisely that the homomorphism $H_1 \times H_2 \to G$, $(h_1, h_2) \mapsto h_1 h_2$ is an isomorphism. We say that a subgroup $H$ is a direct factor of $G$ if there exists $H'$ such that $H, H'$ are complementary subgroups. Thus, in order to prove part a) it suffices to show that every finite commutative group has a nontrivial direct factor which is cyclic of prime power order; and in order to prove part b) it suffices (but is much harder!) to show that if $G \cong H \times H' \cong H \times H''$ then $H' \cong H''$.
More generally if we have a finite set $\{H_1, \ldots, H_r\}$ of subgroups of $G$ such that $H_i \cap H_j = \{1\}$ for all $i \neq j$ and $G = H_1 \cdots H_r$, we say that the $H_i$'s form a set of complementary subgroups and that each $H_i$ is a direct factor. In such a circumstance we have $G \cong H_1 \times \ldots \times H_r$.
We now begin the proof of Theorem 274.
Step 1 (primary decomposition): For any commutative group $G$, let $G_p$ be the set of elements of $G$ whose order is a power of $p$. Also let $G_{p'}$ be the set of elements of $G$ whose order is prime to $p$. It follows from Theorem 264b) that $G_p$ and $G_{p'}$ are subgroups of $G$. We claim that they are complementary. Certainly $G_p \cap G_{p'} = \{e\}$, since any element of the intersection would have order both a power of $p$ and relatively prime to $p$ and thus have order 1 and be the identity. On the other hand, let $x$ be any element of $G$, and write its order as $p^k b$ with $\gcd(p, b) = 1$. Thus we can choose $i$ and $j$ such that $i p^k + j b = 1$, and then
$$x = x^1 = x^{i p^k + j b} = (x^{p^k})^i (x^b)^j,$$
and by Proposition 260 the order of $(x^{p^k})^i$ divides $b$ (so is prime to $p$) and the order of $(x^b)^j$ divides $p^k$. This proves the claim. Now a simple induction argument gives the following:
Proposition 275. Let $G$ be a finite abelian group, of order $n = p_1^{a_1} \cdots p_r^{a_r}$. Then $\{G_{p_i}\}_{i=1}^r$ forms a set of complementary subgroups, and the canonical map $H_1 \times \ldots \times H_r \to G$, $(h_1, \ldots, h_r) \mapsto h_1 \cdots h_r$ (where $H_i = G_{p_i}$) is an isomorphism.
Thus any finite commutative group can be decomposed, in a unique way, into a direct product of finite commutative groups of prime power order. We may therefore assume that $G$ is a commutative p-group from now on.
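The Bezout argument of Step 1 is completely explicit, and it can be carried out in the additive group $Z_n$, where $x^a$ becomes $a \cdot x$. The following sketch uses our own function names and assumes the formula $n / \gcd(x, n)$ for the order of $x$ in $Z_n$.

```python
from math import gcd

def bezout(a, b):
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = bezout(b, a % b)
    return g, t, s - (a // b) * t

def additive_order(x, n):
    """Order of x in the additive group Z_n."""
    return n // gcd(x, n)

def split(x, n, p):
    """Write x in Z_n as u + v with u of p-power order and v of order prime to p."""
    order = additive_order(x, n)
    pk = 1
    while order % p == 0:      # strip the p-part p^k from the order of x
        order //= p
        pk *= p
    b = order                  # now the order of x is pk * b with gcd(p, b) == 1
    _, i, j = bezout(pk, b)    # i*pk + j*b == 1
    v = (i * pk * x) % n       # additive analogue of (x^{p^k})^i: order divides b
    u = (j * b * x) % n        # additive analogue of (x^b)^j: order divides p^k
    return u, v

print(split(7, 60, 2))  # (15, 52)
```

In $Z_{60}$ with $p = 2$, the element 7 splits as $15 + 52$, where 15 has order 4 and 52 has order 15.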
Step 2: We prove a refinement of Theorem 262 for commutative p-groups.
$$\frac{\#G}{p^{a_1 + \ldots + a_k}} = \frac{\#G}{p^{b_1 + \ldots + b_k}} = p^{s_k},$$
$$G[2] = \{x \in G \mid 2x = 0\}.$$
Every nonzero element of $G[2]$ has order 2, so by Theorem 274, $G[2] \cong Z_2 \times \ldots \times Z_2 = Z_2^k$, a direct product of copies of the cyclic group of order 2.⁶
Consider the involution $\iota: G \to G$ given by $x \mapsto -x$. The fixed points of $\iota$, i.e., the elements $x \in G$ such that $\iota(x) = x$, are precisely the elements of $G[2]$. Thus the elements of $G \setminus G[2]$ occur in pairs of distinct elements $x, -x$, so
$$\sum_{x \in G} x = \sum_{x \in G[2]} x.$$
Case 1: $k = 0$, i.e., $G[2] = \{0\}$. Then
$$\sum_{x \in G[2]} x = \sum_{x \in \{0\}} x = 0.$$
Moreover, in this case $d_2(G) = 0$, in agreement with the statement of the theorem.
Case 2: $k = 1$, i.e., $G[2] = Z_2$. Then
$$\sum_{x \in G[2]} x = \sum_{x \in Z_2} x = 0 + 1 = 1,$$
a) $\left(\prod_{i=1}^{r} G_i\right)[2] = \prod_{i=1}^{r} G_i[2]$.
b) If $G_1, \ldots, G_r$ are finite, then $d_2\left(\prod_{i=1}^{r} G_i\right) = \sum_{i=1}^{r} d_2(G_i)$.
Proof. a) More generally, consider any $x = (x_1, \ldots, x_r) \in \prod_{i=1}^{r} G_i$. Then the order of $x$ is the lcm of the orders of the components $x_i$. Further, the lcm of a finite set of numbers divides 2 iff each number in the set divides 2.
b) This follows immediately from part a).
We now show that Theorem 280 implies Theorem 279. Let $\mathbb{F}$ be a finite field. We take $G = \mathbb{F}^\times$, the multiplicative group of nonzero elements of $\mathbb{F}$.⁷ Now $x \in G[2] \iff x^2 = 1$, and the polynomial $t^2 - 1$ has exactly two roots in any field of characteristic different from 2 and exactly one root in any field of characteristic 2. So $d_2(\mathbb{F}^\times)$ is equal to 1 if $\#\mathbb{F}$ is odd and equal to 0 if $\#\mathbb{F}$ is even. Thus:
$$\prod_{i=1}^{r} U(p_i^{b_i}).$$
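The computation of $\mathbb{F}^\times[2]$ above can be illustrated for the prime fields $\mathbb{Z}/p\mathbb{Z}$. The sketch below rests on an assumption about statements not reproduced here: we take the intended conclusion to be of Wilson type, namely that the product of all nonzero elements of a finite field is $-1$. The function names are ours.

```python
def square_roots_of_one(p):
    """The elements x of (Z/pZ)^x with x**2 == 1, i.e. the 2-torsion of the unit group."""
    return [x for x in range(1, p) if (x * x) % p == 1]

def product_of_units(p):
    """Product of all nonzero residues modulo p."""
    result = 1
    for x in range(1, p):
        result = (result * x) % p
    return result

for p in (2, 3, 5, 7, 11, 13):
    # One root of t^2 - 1 in characteristic 2, two roots otherwise.
    assert len(square_roots_of_one(p)) == (1 if p == 2 else 2)
    # The product of all units is congruent to -1 (note that -1 = 1 when p = 2).
    assert product_of_units(p) == p - 1
```

Elements other than $\pm 1$ cancel in pairs $x, x^{-1}$, which is the multiplicative form of the pairing $x, -x$ used in the additive argument.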
After this section was first written, I found a closely related paper of the early American group theorist George Abram Miller [Mi03]. In particular Miller proves Theorem 280 (with a very similar proof) and applies it to prove Theorem 283. That this result was first stated and proved by Gauss is not mentioned in Miller's paper, but its title suggests that he may have been aware of this.
APPENDIX C
More on Polynomials
1. Polynomial Rings
Let $k$ be a field, and consider the univariate polynomial ring $k[t]$.
Theorem 284. The ring $k[t]$ is a PID and hence a UFD.
Proof. In fact the argument is very close to the one we used to show that $\mathbb{Z}$ is a PID. Namely, let $I$ be an ideal of $k[t]$: we may assume that $I \neq 0$. Let $b$ be a nonzero element of $I$ of minimal degree: we claim that $I = \langle b \rangle$. Indeed, let $a$ be any element of $I$. By polynomial division (this is the key!), there are $q, r \in k[t]$ such that $a = qb + r$ and $\deg r < \deg b$. Since $a, b \in I$, $r = a - qb \in I$. Since $\deg r < \deg b$ and $b$ has minimal degree among nonzero elements of $I$, we must have $r = 0$, and thus $a = qb$ and $a \in \langle b \rangle$. Thus $k[t]$ is a PID and hence also a UFD.
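Polynomial division, the key step of the proof, is easy to implement. The sketch below works over the prime field $k = \mathbb{Z}/p\mathbb{Z}$ only (our simplifying assumption; the theorem holds for every field), stores a polynomial as its list of coefficients from degree 0 upward, and uses function names of our own. The Euclidean algorithm built on it returns the monic generator of the ideal $\langle a, b \rangle$.

```python
def trim(f):
    """Remove trailing zero coefficients in place; the zero polynomial is []."""
    while f and f[-1] == 0:
        f.pop()
    return f

def divmod_poly(a, b, p):
    """Return (q, r) with a = q*b + r and deg r < deg b, coefficients modulo the prime p."""
    a = trim([c % p for c in a])
    b = trim([c % p for c in b])
    if not b:
        raise ZeroDivisionError("division by the zero polynomial")
    inv_lead = pow(b[-1], p - 2, p)          # inverse of the leading coefficient (Fermat)
    q = [0] * max(len(a) - len(b) + 1, 0)
    r = a[:]
    while len(r) >= len(b):
        shift = len(r) - len(b)
        c = (r[-1] * inv_lead) % p           # cancels the leading term of r
        q[shift] = c
        for k, bk in enumerate(b):
            r[shift + k] = (r[shift + k] - c * bk) % p
        trim(r)
    return q, r

def gcd_poly(a, b, p):
    """Monic generator of the ideal generated by a and b in (Z/pZ)[t]."""
    a = trim([c % p for c in a])
    b = trim([c % p for c in b])
    while b:
        _, r = divmod_poly(a, b, p)
        a, b = b, r
    if a:
        inv = pow(a[-1], p - 2, p)
        a = [(c * inv) % p for c in a]
    return a

# Over Z/5Z: t^3 + 2t + 1 = t * (t^2 + 1) + (t + 1).
print(divmod_poly([1, 2, 0, 1], [1, 0, 1], 5))  # ([0, 1], [1, 1])
# gcd of (t - 1)(t - 2) and (t - 1)(t - 3) is t - 1, i.e. t + 4 modulo 5.
print(gcd_poly([2, 2, 1], [3, 1, 1], 5))        # [4, 1]
```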
Polynomial differentiation: When $k = \mathbb{R}$, every polynomial $f \in \mathbb{R}[t]$ can be viewed as a differentiable function $f: \mathbb{R} \to \mathbb{R}$, and indeed the derivative $f'$ is again a polynomial. Although the derivative is defined by a limiting process, when restricted to polynomials it is characterized by the following two properties:
(P1): $f \mapsto f'$ is an $\mathbb{R}$-linear map: for all $\alpha, \beta \in \mathbb{R}$ and all polynomials $f, g$, $(\alpha f + \beta g)' = \alpha f' + \beta g'$.
(P2): $1' = 0$, and for all $n \in \mathbb{Z}^+$, $(t^n)' = n t^{n-1}$.
Indeed, the set $\{1, t^n \mid n \in \mathbb{Z}^+\}$ is a basis for the $\mathbb{R}$-vector space $\mathbb{R}[t]$, and since differentiation is linear, it is entirely determined by its action on this basis.
Now let $k$ be any field. It is still true that $\{1, t^n \mid n \in \mathbb{Z}^+\}$ is a $k$-basis for $k[t]$, so there is a unique $k$-linear endomorphism of $k[t]$ defined by $1' = 0$ and $(t^n)' = n t^{n-1}$. We continue to call the operator $f \mapsto f'$ differentiation and refer to $f'$ as the derivative of $f$, despite the fact that there are no limits here: it is purely algebraic.
Exercise: Show that for any field $k$, polynomial differentiation satisfies the product rule: for all $f, g \in k[t]$, $(fg)' = f'g + fg'$.
Exercise: Compute the kernel of differentiation as a linear endomorphism of $k[t]$. In more concrete terms, find all polynomials $f \in k[t]$ such that $f' = 0$. (Hint: the answer strongly depends on the characteristic of $k$.)
We say a polynomial $f \in k[t]$ is separable if $\gcd(f, f') = 1$.
Proposition 285. A separable polynomial is squarefree.
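The purely algebraic derivative is straightforward to compute with. Here is a small sketch over $k = \mathbb{Z}/p\mathbb{Z}$ (our choice of field), with polynomials stored as coefficient lists from degree 0 upward and helper names of our own. It checks one instance of the product rule numerically, which of course is no substitute for the exercise.

```python
def trim(f):
    """Copy f without trailing zero coefficients; the zero polynomial is []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def derivative(f, p):
    """Formal derivative: the coefficient of t^(n-1) is n * f[n], reduced modulo p."""
    return trim([(n * c) % p for n, c in enumerate(f)][1:])

def multiply(f, g, p):
    if not f or not g:
        return []
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return trim(out)

def add(f, g, p):
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim([(a + b) % p for a, b in zip(f, g)])

f, g, p = [1, 2, 0, 1], [3, 0, 4], 5
lhs = derivative(multiply(f, g, p), p)
rhs = add(multiply(derivative(f, p), g, p), multiply(f, derivative(g, p), p), p)
assert lhs == rhs                       # the product rule in one example

print(derivative([1, 2, 0, 1], 5))      # [2, 0, 3], i.e. (t^3 + 2t + 1)' = 3t^2 + 2
print(derivative([0, 0, 0, 1], 3))      # [], i.e. (t^3)' = 0 in characteristic 3
```

The last line shows a nonconstant polynomial with zero derivative, in line with the hint that the kernel of differentiation depends on the characteristic.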
2. Finite Fields
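Theorem 290. Let $\mathbb{F}$ be a finite field of order $q$, and let $n \in \mathbb{Z}^+$. The number of monic irreducible polynomials of degree $n$ with coefficients in $\mathbb{F}$ is
$$I(\mathbb{F}, n) = \frac{1}{n} \sum_{d \mid n} q^d \mu\left(\frac{n}{d}\right). \qquad (62)$$
Proof. For any $d \in \mathbb{Z}^+$, let $I(\mathbb{F}, d)$ be the number of monic irreducible degree $d$ polynomials with $\mathbb{F}$-coefficients. Let $n \in \mathbb{Z}^+$. Then since $t^{q^n} - t$ is the squarefree product of all monic irreducible polynomials of degrees $d \mid n$, equating degrees gives
$$\sum_{d \mid n} d \, I(\mathbb{F}, d) = q^n.$$
Applying Möbius inversion, we get
$$n \, I(\mathbb{F}, n) = \sum_{d \mid n} q^d \mu\left(\frac{n}{d}\right).$$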
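Formula (62) can be compared with a direct count for small cases. The sketch below evaluates the formula for any $q$, but performs the brute-force count only over prime fields $\mathbb{Z}/p\mathbb{Z}$ (a restriction we impose so that coefficient arithmetic is just arithmetic modulo $p$). All function names are ours.

```python
from itertools import product

def mobius(n):
    """The Moebius function mu(n), computed by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0              # a square divides the original n
            result = -result
        d += 1
    return -result if n > 1 else result

def count_formula(q, n):
    """Right hand side of (62): (1/n) * sum over d | n of q^d * mu(n/d)."""
    total = sum(q ** d * mobius(n // d) for d in range(1, n + 1) if n % d == 0)
    return total // n

def monic(p, n):
    """All monic degree-n polynomials over Z/pZ, as coefficient tuples (degree 0 first)."""
    return [coeffs + (1,) for coeffs in product(range(p), repeat=n)]

def multiply(f, g, p):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return tuple(out)

def count_brute_force(p, n):
    """Count monic irreducibles of degree n by discarding all products of smaller monics."""
    reducible = set()
    for d in range(1, n // 2 + 1):
        for f in monic(p, d):
            for g in monic(p, n - d):
                reducible.add(multiply(f, g, p))
    return p ** n - len(reducible)

print([count_formula(2, n) for n in range(1, 7)])  # [2, 1, 2, 3, 6, 9]
assert all(count_brute_force(p, n) == count_formula(p, n)
           for p in (2, 3) for n in range(1, 5))
```

The brute-force count grows quickly with $p$ and $n$, so it is only a sanity check. The formula itself is cheap to evaluate.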
Corollary 291. a) For any finite field $\mathbb{F}$ and any $n \in \mathbb{Z}^+$, there is at least one irreducible polynomial of degree $n$ with $\mathbb{F}$-coefficients.
b) For every prime power $q = p^a$, there is a finite field $\mathbb{F}$ of order $q$.
Proof. a) Theorem 290 gives us an expression for the number of monic irreducible polynomials of degree $n$ with $\mathbb{F}$-coefficients. By making some very crude estimates we can quickly see that this quantity is always positive. Indeed:
$$I(\mathbb{F}, n) = \frac{1}{n} \sum_{d \mid n} q^d \mu\left(\frac{n}{d}\right) \ge \frac{1}{n}\left(q^n - (q^{n-1} + \ldots + q + 1)\right) = \frac{1}{n}\left(q^n - \frac{q^n - 1}{q - 1}\right) \ge \frac{1}{n}\left(q^n - (q^n - 1)\right) = \frac{1}{n} > 0.$$
b) By part a), there is a degree $a$ irreducible polynomial $f$ with $\mathbb{F}_p$-coefficients. Then $\mathbb{F}_p[t]/(f)$ is a finite field of order $p^a$.
Exercise: Try to extract from (62) more realistic estimates on the size of $I(\mathbb{F}, n)$.
Bibliography
[A1]
[A2]
[AD93]
[Al99]
[An57]
[AT92]
[Au12]
[Ax64]
[BR89]
[Ba11]
[BC12]
[Bh00]
[BHxx]
[Bl14]
[BR51]
[BTxx]
[CaGN]
[CaQF]
[CE59]
[Ch36]
[Cl94]
[Cl09]
[Cl21]
[CM98]
[Cox]
[Coh73] P.M. Cohn, Unique factorization domains. Amer. Math. Monthly 80 (1973), 1-18.
[Conr-A] K. Conrad, Two applications of unique factorization. http://www.math.uconn.edu/kconrad/blurbs/ringtheory/ufdapp.pdf
[Conr-B] K. Conrad, Examples of Mordell's Equation. http://www.math.uconn.edu/kconrad/blurbs/gradnumthy/mordelleqn1.pdf
[Con97] J.H. Conway, The sensual (quadratic) form. With the assistance of Francis Y. C. Fung. Carus Mathematical Monographs, 26. Mathematical Association of America, Washington, DC, 1997.
[Con00] J.H. Conway, Universal quadratic forms and the fifteen theorem. Quadratic forms and their applications (Dublin, 1999), 23-26, Contemp. Math., 272, Amer. Math. Soc., Providence, RI, 2000.
[CS07] W. Cao and Q. Sun, Improvements upon the Chevalley-Warning-Ax-Katz-type estimates. J. Number Theory 122 (2007), 135-141.
[Dic27] L.E. Dickson, Integers represented by positive ternary quadratic forms. Bull. Amer. Math. Soc. 33 (1927), 63-70.
[DH05] W. Duke and K. Hopkins, Quadratic reciprocity in a finite group. Amer. Math. Monthly 112 (2005), no. 3, 251-256.
[DV09] S. Dasgupta and J. Voight, Heegner points and Sylvester's conjecture. Arithmetic geometry, 91-102, Clay Math. Proc., 8, Amer. Math. Soc., Providence, RI, 2009.
[Eh55] E. Ehrhart, Une généralisation du théorème de Minkowski. C. R. Acad. Sci. Paris 240 (1955), 483-485.
[Euc] Euclid, The thirteen books of Euclid's Elements translated from the text of Heiberg. Vol. I: Introduction and Books I, II. Vol. II: Books III-IX. Vol. III: Books X-XIII and Appendix. Translated with introduction and commentary by Thomas L. Heath. 2nd ed. Dover Publications, Inc., New York, 1956.
[EGZ61] P. Erdős, A. Ginzburg and A. Ziv, Theorem in the additive number theory. Bull. Research Council Israel 10F (1961), 41-43.
[F] D.E. Flath, Introduction to Number Theory. Wiley-Interscience Publications, 1989.
[Fu55] H. Furstenberg, On the infinitude of primes. Amer. Math. Monthly 62 (1955), 353.
[Go59] S.W. Golomb, A Connected Topology for the Integers. Amer. Math. Monthly 66 (1959), 663-665.
[GC68] I.J. Good and R.F. Churchhouse, The Riemann hypothesis and pseudorandom features of the Möbius sequence. Math. Comp. 22 (1968), 857-861.
[GM84] R. Gupta and M.R. Murty, A remark on Artin's conjecture. Invent. Math. 78 (1984), 127-130.
[GPZ] J. Gebel, A. Pethő and H.G. Zimmer, On Mordell's equation. Compositio Math. 110 (1998), 335-367.
[Hag11] T. Hagedorn, Primes of the form $x^2 + ny^2$ and the geometry of (convenient) numbers, preprint.
[Ham68] J. Hammer, On some analogies to a theorem of Blichfeldt in the geometry of numbers. Amer. Math. Monthly 75 (1968), 157-160.
[Han04] J.P. Hanke, Local densities and explicit bounds for representability by a quadratic form. Duke Math. J. 124 (2004), 351-388.
[Har73] H. Harborth, Ein Extremalproblem für Gitterpunkte. Collection of articles dedicated to Helmut Hasse on his seventy-fifth birthday. J. Reine Angew. Math. 262/263 (1973), 356-360.
[Ká05] G. Károlyi, Cauchy-Davenport theorem in group extensions. Enseign. Math. (2) 51 (2005), 239-254.
[Ke83] A. Kemnitz, On a lattice point problem. Ars Combin. 16 (1983), B, 151-160.
[KvdL04] T. Kotnik and J. van de Lune, On the order of the Mertens function. Experiment. Math. 13 (2004), 473-481.
[Le02] H.W. Lenstra, Solving the Pell equation. Notices Amer. Math. Soc. 49 (2002), 182-192.
[Li33] F.A. Lindemann, The Unique Factorization of a Positive Integer. Quart. J. Math. 4, 319-320, 1933.
[LF73] H. London and R. Finkelstein, On Mordell's equation $y^2 - k = x^3$. Bowling Green State University, Bowling Green, Ohio, 1973.
[Le96] M. Lerch, Sur un théorème de Zolotarev. Bull. Intern. de l'Acad. François Joseph 3 (1896), 34-37.
[Mar71] C.F. Martin, Unique factorization of arithmetic functions. Aequationes Math. 7 (1971), 211.
[Me09] I.D. Mercer, On Furstenberg's proof of the infinitude of primes. Amer. Math. Monthly 116 (2009), 355-356.
[Mi03] G.A. Miller, A new proof of the generalized Wilson's theorem. Ann. of Math. (2) 4 (1903), 188-190.
[M] L.J. Mordell, Diophantine equations. Pure and Applied Mathematics, Vol. 30. Academic Press, London-New York, 1969.
[Mo69] L.J. Mordell, On the magnitude of the integer solutions of the equation $ax^2 + by^2 + cz^2 = 0$. J. Number Theory 1 (1969), 1-3.
[Mo79] P. Morton, A generalization of Zolotarev's theorem. Amer. Math. Monthly 86 (1979), 374-375.
[MT06] M.R. Murty and N. Thain, Prime numbers in certain arithmetic progressions. Funct. Approx. Comment. Math. 35 (2006), 249-259.
[MT07] M.R. Murty and N. Thain, Pick's theorem via Minkowski's theorem. Amer. Math. Monthly 114 (2007), 732-736.
[Nag] T. Nagell, Introduction to number theory. 2nd ed. New York: Chelsea Publishing Company; 1964.
[Nar] W. Narkiewicz, Elementary and analytic theory of algebraic numbers. Third edition. Springer Monographs in Mathematics. Springer-Verlag, Berlin, 2004.
[NZM] I. Niven, H.S. Zuckerman and H.L. Montgomery, An introduction to the theory of numbers. 5th ed. New York: John Wiley & Sons, Inc.; 1991.
[Nu10] L.M. Nunley, Geometry of Numbers Approach to Small Solutions of the Extended Legendre Equation. UGA Master's thesis, 2010.
[Ol76] J.E. Olson, On a combinatorial problem of Erdős, Ginzburg, and Ziv. J. Number Theory 8 (1976), 52-57.
[OlR85] A.M. Odlyzko and H.J.J. te Riele, Disproof of the Mertens conjecture. J. Reine Angew. Math. 357 (1985), 138-160.
[Pi87] J. Pintz, An effective disproof of the Mertens conjecture. Astérisque 147-148 (1987), 325-333.
[Ra17] Ramanujan, On the expression of a number in the form $ax^2 + by^2 + cz^2 + du^2$, Proceedings of the Cambridge Philosophical Society 19 (1917), 11-21. See also http://en.wikisource.org/wiki/Proceedings of the Cambridge Philosophical Society
[Re07] C. Reiher, On Kemnitz' conjecture concerning lattice-points in the plane. Ramanujan J. 13 (2007), 333-337.
[Ro63] K. Rogers, Classroom notes: Unique Factorization. Amer. Math. Monthly 70 (1963), 547-548.
[RS62] J.B. Rosser and L. Schoenfeld, Approximate formulas for some functions of prime numbers. Illinois J. Math. 6 (1962), 64-94.
[Sa68] P. Samuel, Unique factorization. Amer. Math. Monthly 75 (1968), 945-952.
[SatR14] Yannick Saouter and H. te Riele, Improved results on the Mertens conjecture. Math. Comp. 83 (2014), 421-433.
[Se73] J.-P. Serre, A course in arithmetic. Translated from the French. Graduate Texts in Mathematics, No. 7. Springer-Verlag, New York-Heidelberg, 1973.
[SiGN] C.L. Siegel, Lectures on the geometry of numbers. Berlin: Springer-Verlag; 1989.