Analytic Number Theory
THOMAS F. BLOOM
These are lecture notes for the Part III lecture course given in Lent Term 2019.
They are meant to be a faithful copy of the material given in lectures, with some
supplementary footnotes and historical notes. The lectures themselves are the guide
for what material is examinable, and any additional material in these printed notes
will be marked as non-examinable. In the case of any doubt, ask the lecturer.
As these are informal lecture notes, I have not given proper references. My main
source when compiling these notes, and the recommended textbook for the course,
is Multiplicative Number Theory by Montgomery and Vaughan.
If you have any questions, concerns, corrections, please contact the lecturer at
tb634@cam.ac.uk.
Elementary Techniques
1. Arithmetic functions
An arithmetic function is simply a function on the natural numbers, f : N → R.
An arithmetic function is multiplicative if
f (nm) = f (n)f (m) whenever (n, m) = 1,
and is completely multiplicative if f (nm) = f (n)f (m) for all n, m ∈ N.
An important operation on the space of arithmetic functions is that of
multiplicative convolution:
\[ f \star g(n) = \sum_{ab=n} f(a)\,g(b). \]
If f and g are both multiplicative functions, then so too is f ⋆ g. The most obvious
arithmetic function is the constant function:
1(n) = 1.
We recall the definition of the Möbius function,
\[ \mu(n) = \begin{cases} (-1)^k & \text{if } n = p_1 \cdots p_k \text{ where the } p_i \text{ are distinct primes, and} \\ 0 & \text{otherwise (i.e. if } n \text{ is divisible by a square).} \end{cases} \]
A fundamental relationship is that of Möbius inversion, which says that the Möbius
function acts as an inverse to multiplicative convolution:
\[ 1 \star f = g \quad\text{if and only if}\quad \mu \star g = f. \]
A great deal of analytic number theory is concerned with a deep study of the
distribution of the prime numbers. For this the ‘correct’ way to count primes is
not, as one might expect, the indicator function
\[ 1_P(n) = \begin{cases} 1 & \text{if } n \text{ is prime, and} \\ 0 & \text{otherwise,} \end{cases} \]
but instead the von Mangoldt function, which also counts prime powers p^k, and
counts them not with weight 1 but with weight log p:
\[ \Lambda(n) = \begin{cases} \log p & \text{if } n = p^k \text{ and} \\ 0 & \text{otherwise.} \end{cases} \]
The main reason that this function is much easier to work with than 1P directly, is
the following identity.
Lemma 1.
\[ 1 \star \Lambda(n) = \log n \quad\text{and}\quad \log \star \mu(n) = \Lambda(n). \]
Proof. The second identity follows from the first by Möbius inversion. To establish
the first, if we let n = p_1^{k_1} ⋯ p_r^{k_r}, then
\[ 1 \star \Lambda(n) = \sum_{i=1}^{r} \sum_{j=1}^{k_i} \log p_i = \sum_{i=1}^{r} \log p_i^{k_i} = \log n. \]
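Lemma 1 is easy to test numerically. The following is a minimal sketch, not part of the lectures: the helper names (mobius, von_mangoldt, convolve, one) are our own, and everything is done by naive trial division, so it is only sensible for small n.

```python
import math

def mobius(n):
    """Möbius function µ(n), computed by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divided the original n
                return 0
            result = -result
        p += 1
    if n > 1:                   # one remaining prime factor
        result = -result
    return result

def von_mangoldt(n):
    """Λ(n) = log p if n is a power of the prime p, and 0 otherwise."""
    if n < 2:
        return 0.0
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
        p += 1
    return math.log(n)          # n itself is prime

def one(n):
    """The constant function 1(n) = 1."""
    return 1

def convolve(f, g, n):
    """Multiplicative convolution: sum of f(a) g(b) over ab = n."""
    return sum(f(a) * g(n // a) for a in range(1, n + 1) if n % a == 0)

for n in (1, 12, 27, 30, 97):
    print(n, convolve(one, von_mangoldt, n), math.log(n))
```

The printed pairs should agree up to floating point error, which is the first identity of Lemma 1; replacing the convolution by convolve(mobius, math.log, n) checks the second.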
2. Summation
A major theme of analytic number theory is understanding the basic arithmetic
functions, particularly how large they are on average, which means understanding
Σ_{n≤x} f(n). For example, if f is the indicator function of primes, then this
summatory function is precisely the prime counting function π(x).
We say that f has average order g if
\[ \sum_{n \le x} f(n) \sim x\,g(x). \]
One of the most useful tools in dealing with summations is partial summation,
which is a discrete analogue of integrating by parts.
Theorem 1 (Partial summation). If a_n is any sequence of complex numbers and
f : R+ → R is such that f′ is continuous then
\[ \sum_{1 \le n \le x} a_n f(n) = A(x) f(x) - \int_1^x A(t) f'(t)\,dt, \]
where A(x) = Σ_{1≤n≤x} a_n.
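Proof. Let N = ⌊x⌋. Using a_n = A(n) − A(n − 1),
\[ \sum_{1 \le n \le N} a_n f(n) = \sum_{n=1}^{N} f(n)\bigl(A(n) - A(n-1)\bigr) = f(N)A(N) - \sum_{n=1}^{N-1} A(n)\bigl(f(n+1) - f(n)\bigr). \]
Since A(x) = A(n) for x ∈ [n, n + 1), each summand is an integral of A(x)f′(x), and hence
\[ \sum_{1 \le n \le N} a_n f(n) = f(N)A(N) - \sum_{n=1}^{N-1} \int_n^{n+1} A(x) f'(x)\,dx, \]
which is the claimed identity when x = N. The general case follows since
A(t) = A(N) for t ∈ [N, x], so that A(x)f(x) − ∫_N^x A(t)f′(t) dt = A(N)f(N).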
This is extremely useful even when the coefficients a_n are identically 1, when
A(x) = ⌊x⌋ = x + O(1).
Lemma 2.
\[ \sum_{n \le x} \frac{1}{n} = \log x + \gamma + O(1/x), \]
where γ = 0.5772… is Euler's constant.
Lemma 3.
\[ \sum_{1 \le n \le x} \log n = x \log x - x + O(\log x). \]
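As a sanity check on Lemmas 2 and 3, one can compute both sums directly and look at the size of the error terms. This is only an illustrative sketch: the function names are ours, and the numerical value of γ is hard-coded as an assumption rather than computed.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded

def harmonic_sum(x):
    """Sum of 1/n over 1 <= n <= x."""
    return sum(1.0 / n for n in range(1, int(x) + 1))

def log_sum(x):
    """Sum of log n over 1 <= n <= x."""
    return sum(math.log(n) for n in range(1, int(x) + 1))

for x in (10, 100, 1000, 10000):
    err2 = harmonic_sum(x) - (math.log(x) + EULER_GAMMA)   # Lemma 2 error
    err3 = log_sum(x) - (x * math.log(x) - x)              # Lemma 3 error
    # Lemma 2 predicts err2 * x bounded; Lemma 3 predicts err3 / log x bounded.
    print(x, err2 * x, err3 / math.log(x))
```

Both printed ratios should stay bounded as x grows, in line with the O(1/x) and O(log x) error terms.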
3. Divisor function
We now turn our attention to number theory proper, and examine one of the
most important arithmetic functions: the divisor function
\[ \tau(n) = 1 \star 1(n) = \sum_{ab=n} 1 = \sum_{d \mid n} 1. \]
It is a deep and difficult problem to improve the error term here – the truth is
probably O(x^{1/4+ε}), but this is an open problem, and the best known is O(x^{0.3149…}).
We have just shown that the ‘average’ number of divisors of n is log n. The worst
case behaviour can differ dramatically from this average behaviour, however.
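The claim about the average size of τ(n) can be illustrated numerically. The sketch below is ours, not from the lectures: it tabulates τ by a sieve and compares the summatory function with x log x + (2γ − 1)x, which is Dirichlet's classical main term. That main term and the hard-coded value of γ are assumptions imported from the standard literature.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded

def divisor_counts(x):
    """Return tau with tau[n] = number of divisors of n, for 0 <= n <= x."""
    tau = [0] * (x + 1)
    for d in range(1, x + 1):
        for m in range(d, x + 1, d):   # d divides every multiple m of d
            tau[m] += 1
    return tau

def divisor_sum(x):
    """Sum of tau(n) over 1 <= n <= x."""
    return sum(divisor_counts(x)[1:])

for x in (100, 1000, 10000):
    main_term = x * math.log(x) + (2 * EULER_GAMMA - 1) * x
    print(x, divisor_sum(x), round(main_term, 1))
```

Dividing the sum by x shows an average of roughly log x, while individual values of τ(n) fluctuate far more, as the next theorem quantifies.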
Theorem 3. For any n ≥ 1,
\[ \tau(n) \le n^{O(1/\log\log n)}. \]
In particular, for any ε > 0, τ(n) = O_ε(n^ε).
Proof. Let 0 < ε < 1/2 be something to be chosen later. Let n = p_1^{k_1} ⋯ p_r^{k_r}, so
that τ(n) = (k_1 + 1) ⋯ (k_r + 1), and hence
\[ \frac{\tau(n)}{n^{\varepsilon}} = \prod_{i=1}^{r} \frac{k_i + 1}{p_i^{\varepsilon k_i}}. \]
The proof of this is surprisingly involved, and we will return to it later in the course
when we examine the Riemann zeta function.
It is much easier to show, if not an asymptotic formula, at least that this is the
correct rate of growth of the function. This was proved in 1850 by Chebyshev.
Theorem 4 (Chebyshev).
\[ \psi(x) \asymp x. \]
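Proof. We will first prove the lower bound. This relies on the observation that, for
any x ≥ 1, ⌊x⌋ ≤ 2⌊x/2⌋ + 1.
\begin{align*}
\psi(x) &= \sum_{n \le x} \Lambda(n) \\
&\ge \sum_{n \le x} \Lambda(n) \left( \left\lfloor \frac{x}{n} \right\rfloor - 2 \left\lfloor \frac{x}{2n} \right\rfloor \right) \\
&= \sum_{nm \le x} \Lambda(n) - 2 \sum_{nm \le x/2} \Lambda(n) \\
&= \sum_{n \le x} \log n - 2 \sum_{n \le x/2} \log n.
\end{align*}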
By Lemma 3,
\[ \psi(x) \ge x \log x - x + O(\log x) - 2\left( \frac{x}{2} \log(x/2) - \frac{x}{2} + O(\log x) \right) = (\log 2)x + O(\log x). \]
It follows that, for any c > 0 and x sufficiently large, ψ(x) ≥ (log 2 − c)x, and hence
ψ(x) ≫ x.
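For the upper bound, we do something very similar, except we note that for
x ∈ [1, 2) we have equality ⌊x⌋ = 2⌊x/2⌋ + 1. Furthermore, for any x ≥ 1, we have
the lower bound ⌊x⌋ ≥ 2⌊x/2⌋. It follows that
\begin{align*}
\sum_{x/2 < n \le x} \Lambda(n) &= \sum_{x/2 < n \le x} \Lambda(n) \left( \left\lfloor \frac{x}{n} \right\rfloor - 2 \left\lfloor \frac{x}{2n} \right\rfloor \right) \\
&\le \sum_{n \le x} \Lambda(n) \left( \left\lfloor \frac{x}{n} \right\rfloor - 2 \left\lfloor \frac{x}{2n} \right\rfloor \right) \\
&= (\log 2)x + O(\log x)
\end{align*}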
by the above calculation. The left hand side is ψ(x) − ψ(x/2), and so we have
shown that
ψ(x) − ψ(x/2) ≤ (log 2)x + O(log x).
Using the fact that ψ(x) = 0 for any x ≤ 1,
\[ \psi(x) = \sum_{k=0}^{\lceil \log_2 x \rceil} \bigl( \psi(x/2^k) - \psi(x/2^{k+1}) \bigr) \le (2 \log 2)x + O((\log x)^2). \]
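The two ingredients of Chebyshev's argument can be checked by machine. In the sketch below (our own illustration, with hypothetical helper names) we tabulate Λ by a sieve, verify the identity Σ_{n≤x} Λ(n)⌊x/n⌋ = Σ_{n≤x} log n that was used to pass from Λ to log, and print ψ(x)/x, which should lie between roughly log 2 and 2 log 2.

```python
import math

def von_mangoldt_table(x):
    """Return lam with lam[n] = Λ(n) for 0 <= n <= x, computed by a sieve."""
    lam = [0.0] * (x + 1)
    composite = [False] * (x + 1)
    for p in range(2, x + 1):
        if composite[p]:
            continue
        for m in range(2 * p, x + 1, p):
            composite[m] = True
        q = p
        while q <= x:              # every prime power p^k <= x gets weight log p
            lam[q] = math.log(p)
            q *= p
    return lam

def psi(x):
    """Chebyshev's function ψ(x), the sum of Λ(n) over n <= x."""
    return sum(von_mangoldt_table(x))

for x in (100, 1000, 10000):
    print(x, psi(x) / x, math.log(2), 2 * math.log(2))
```

In practice the ratio is already very close to 1 for moderate x, which is what the prime number theorem predicts; Chebyshev's method only traps it between two constants.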
It follows that θ(x) = ψ(x) + O(x^{1/2} log x). We apply partial summation with
a_n = Λ(n) if n is prime, and 0 otherwise, and f(n) = 1/log n. This gives
\[ \pi(x) = \sum_{p \le x} 1 = \frac{\theta(x)}{\log x} + \int_2^x \frac{\theta(t)}{t (\log t)^2}\,dt = \frac{\theta(x)}{\log x} + O\left( \frac{x}{(\log x)^2} \right). \]
Lemma 5.
\[ \sum_{p \le x} \frac{\log p}{p} = \log x + O(1). \]
It remains to deal with the contribution from prime powers p^k ≤ x for k ≥ 2, which
we bound trivially by
\[ \sum_{p \le x^{1/2}} \log p \sum_{k \ge 2} \frac{1}{p^k} = \sum_{p \le x^{1/2}} \log p \cdot \frac{1}{p^2 - p} \ll 1. \]
Lemma 6.
\[ \sum_{p \le x} \frac{1}{p} = \log\log x + b + O(1/\log x), \]
where b is some constant.
Lemma 7.
\[ \prod_{p \le x} \left( 1 - \frac{1}{p} \right)^{-1} = c \log x + O(1). \]
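Proof. We use log(1 − t) = −Σ_{k=1}^∞ t^k/k to deduce that
\begin{align*}
\log \prod_{p \le x} \left( 1 - \frac{1}{p} \right)^{-1} &= -\sum_{p \le x} \log(1 - 1/p) \\
&= \sum_{k=1}^{\infty} \sum_{p \le x} \frac{1}{k p^k} \\
&= \sum_{p \le x} \frac{1}{p} + \sum_{k \ge 2} \sum_{p \le x} \frac{1}{k p^k} \\
&= \log\log x + b' + O(1/\log x).
\end{align*}
The result follows since e^x = 1 + O(x) for |x| ≤ 1.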
It is a little tricky to determine what the constant c in Lemma 7 actually is –
it turns out to be e^γ ≈ 1.78…. We can use this fact to point out why the naive
probabilistic heuristic can be misleading (and hopefully give some idea why the
prime number theorem itself, unlike these simple asymptotics, is hard to prove).
As a heuristic, we might guess that the probability that a given prime number p
divides a randomly chosen n is 1/p. Furthermore, we expect that these probabilities
should be independent for distinct primes p. Using the fact that n ≥ 3 is prime if
and only if p ∤ n for all 2 ≤ p ≤ n^{1/2}, we might guess that
\[ 1_{n \text{ is prime}} \approx \mathbb{P}(p \nmid n \text{ for all } 2 \le p \le n^{1/2}) \approx \prod_{p \le n^{1/2}} \left( 1 - \frac{1}{p} \right) \approx 2e^{-\gamma} / \log n. \]
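The constant in Lemma 7, and the resulting mismatch in the heuristic, can be seen numerically. The following sketch is an illustration under our own naming; the value of γ is hard-coded, and the identification c = e^γ is taken from the text above rather than proved.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded

def primes_up_to(x):
    """All primes <= x, by the sieve of Eratosthenes."""
    flags = [True] * (x + 1)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(x) + 1):
        if flags[p]:
            for m in range(p * p, x + 1, p):
                flags[m] = False
    return [n for n in range(x + 1) if flags[n]]

def mertens_product(x):
    """The product of (1 - 1/p)^(-1) over primes p <= x."""
    prod = 1.0
    for p in primes_up_to(x):
        prod /= 1.0 - 1.0 / p
    return prod

for x in (10**2, 10**3, 10**4, 10**5):
    print(x, mertens_product(x) / math.log(x), math.exp(EULER_GAMMA))

# The heuristic density is 2e^{-γ}/log n, versus the true density 1/log n.
print("heuristic over-count factor:", 2 * math.exp(-EULER_GAMMA))
```

The ratio in the first column of output approaches e^γ ≈ 1.781, and the final line shows that the naive independence heuristic over-counts primes by a factor of about 1.12.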
The first term is trivially O(1). If π(x) = c(1 + R(x)) x/log x, where R(x) → 0 as
x → ∞, then
\[ \sum_{p \le x} \frac{1}{p} = c \int_1^x \frac{1 + R(t)}{t \log t}\,dt + O(1) = c(1 + o(1)) \log\log x + O(1). \]
By Lemma 6 the left hand side is (1 + o(1)) log log x, and hence c = 1.
Why have we defined such a function? We will prove another useful identity for
Λ2 , which should also explain this peculiar choice of notation.
Lemma 8.
\[ \Lambda_2(n) = \Lambda(n) \log n + \Lambda \star \Lambda(n). \]
In particular, 0 ≤ Λ_2(n) ≤ (log n)^2, and if Λ_2(n) ≠ 0 then n has at most two
distinct prime divisors.
Proof. We will prove this identity using Möbius inversion, so that it is enough to
show that
\[ \sum_{d \mid n} \bigl( \Lambda(d) \log d + \Lambda \star \Lambda(d) \bigr) = (\log n)^2. \]
= (log n)^2.
This concludes the proof of the identity. From this identity it is obvious that
Λ2 (n) ≥ 0, since both terms on the right hand side are, for all n ≥ 1. Furthermore,
if n has at least three distinct prime divisors then the first term is zero, as is the
second, since in any decomposition ab = n at least one of a or b has at least two
distinct prime divisors, whence Λ(a)Λ(b) = 0.
Finally, since Σ_{d|n} Λ_2(d) = (log n)^2, and Λ_2(d) ≥ 0 for all d, it follows that
Λ_2(n) ≤ (log n)^2 for all n.
Theorem 6.
\[ \sum_{n \le x} \Lambda_2(n) = 2x \log x + O(x). \]
Proof. We have
\[ \sum_{n \le x} \Lambda_2(n) = \sum_{a \le x} \mu(a) \sum_{b \le x/a} (\log b)^2. \]
By partial summation
\[ \sum_{n \le x} (\log n)^2 = x (\log x)^2 - 2x \log x + 2x + O((\log x)^2). \]
Sieve methods
The idea of sieve theory is to start with some set of integers (usually an interval)
and remove, or ‘sift out’, those integers divisible by some set of primes. The classic
example is the Sieve of Eratosthenes: beginning with the set of integers {1, …, n},
and removing all integers divisible by some prime p ≤ n^{1/2}, we are left with only
primes. By counting how many removals took place, we aim to find estimates for
the count of what is left.
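For concreteness, here is a minimal sketch of the Sieve of Eratosthenes as just described. The function name is ours; the only liberty taken is the usual implementation convention of keeping the sieving primes p ≤ n^{1/2} themselves and discarding 0 and 1, so that the output is exactly the list of primes up to n.

```python
def eratosthenes(n):
    """Return the primes <= n (for n >= 2) by crossing out multiples of each prime p <= sqrt(n)."""
    is_prime = [True] * (n + 1)
    is_prime[0:2] = [False, False]      # 0 and 1 are not prime
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Cross out the proper multiples of p; those below p*p
            # were already removed by a smaller prime.
            for m in range(p * p, n + 1, p):
                is_prime[m] = False
        p += 1
    return [k for k in range(n + 1) if is_prime[k]]

print(eratosthenes(50))
print(len(eratosthenes(1000)))
```

Counting how many integers are crossed out at each stage, rather than listing the survivors, is the point of view taken in the rest of this chapter.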
6. Preliminaries
We first introduce some important notation – we do this in some generality, for
part of the virtue of sieve theory is that the same tools can be applied to study
many different problems. Our basic data will be:
• a finite set A, the set which is to be sifted;
• a set of primes P , which we will sift by (usually this is just the set of all
primes);
• a sifting limit z, which is an upper bound on the size of the primes we sieve
by;
• the sifting function
\[ S(A, P; z) = \sum_{n \in A} 1_{(n, P(z)) = 1}, \]
where P(z) = ∏_{p∈P, p<z} p, which counts those integers in A left after the
sieve;
• we are able to count using sieve theory only if we are able to count A under
various divisibility conditions: if
\[ A_d = \{ n \in A : d \mid n \} \]
then we suppose that
\[ |A_d| = \frac{f(d)}{d} X + R_d, \]
where f(d) ≥ 0 is some multiplicative function and R_d is thought of as an
error term. In general, we will only need this estimate for squarefree d. By
convention we suppose that f(p) = 0 if p ∉ P;
• finally, the expected density is defined as
\[ W(z) = W_P(z) = \prod_{\substack{p \in P \\ p < z}} \left( 1 - \frac{f(p)}{p} \right). \]
We note that
\[ |A| = |A_1| = \frac{f(1)}{1} X + R_1 = X + R_1, \]
for all d. Thus we let f ≡ 1 and then this gives us an error bound of R_d ≪ 1, and
X = y is a smooth count of how many integers are in A. The sifting function is
Example 2: We can also use sieve methods to count the number of twin primes.
For this we let A = {n(n + 2) : 1 ≤ n ≤ x} and P be the set of all primes except 2.
We will show later that if d is odd and square-free then
\[ |A_d| = \frac{2^{\omega(d)}}{d} x + O(2^{\omega(d)}), \]
The most basic sieve is the sieve of Eratosthenes which counts the sifting function
by an inclusion-exclusion argument. That is, we first subtract from the count all
integers divisible by a single sifted prime, then add back all those divisible by two
primes, and so on. In analytic language this can be cleanly expressed using the
identity
\[ \sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1 \text{ and} \\ 0 & \text{otherwise.} \end{cases} \]
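Proof.
\begin{align*}
S(A, P; z) &= \sum_{n \in A} \sum_{d \mid (n, P(z))} \mu(d) \\
&= \sum_{d \mid P(z)} \mu(d) \sum_{n \in A} 1_{d \mid n} \\
&= \sum_{d \mid P(z)} \mu(d) |A_d| \\
&= X \sum_{d \mid P(z)} \frac{\mu(d) f(d)}{d} + O\Bigl( \sum_{d \mid P(z)} |R_d| \Bigr) \\
&= X \prod_{\substack{p \in P \\ p < z}} \left( 1 - \frac{f(p)}{p} \right) + O\Bigl( \sum_{d \mid P(z)} |R_d| \Bigr).
\end{align*}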
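The inclusion-exclusion identity S(A, P; z) = Σ_{d|P(z)} µ(d)|A_d| in the third line of this proof can be checked directly on a small example. The sketch below is ours (the function names are hypothetical): it enumerates the squarefree d | P(z) as products of subsets of the sifting primes, so µ(d) is simply (−1) to the number of primes used. The number of subsets is 2^{π(z)}, which is exactly the exponential blow-up that limits this sieve.

```python
from itertools import combinations

def sift_directly(A, primes, z):
    """Count n in A with no prime factor p < z from the given set of primes."""
    sifting = [p for p in primes if p < z]
    return sum(1 for n in A if all(n % p != 0 for p in sifting))

def sift_by_inclusion_exclusion(A, primes, z):
    """Compute the same count as the sum of mu(d) |A_d| over d | P(z)."""
    sifting = [p for p in primes if p < z]
    total = 0
    for r in range(len(sifting) + 1):
        for combo in combinations(sifting, r):
            d = 1
            for p in combo:
                d *= p                        # d is squarefree with omega(d) = r
            size_A_d = sum(1 for n in A if n % d == 0)
            total += (-1) ** r * size_A_d     # mu(d) = (-1)^r
    return total

A = list(range(101, 201))                     # the interval (100, 200]
primes = [2, 3, 5, 7, 11, 13]
print(sift_directly(A, primes, 14), sift_by_inclusion_exclusion(A, primes, 14))
```

Since 200 < 17^2, the integers in (100, 200] with no prime factor below 14 are exactly the primes in that interval, so both numbers printed are π(200) − π(100) = 21.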
We can apply this to the interval as in Example 1 to get an upper bound for the
number of primes in an interval. If we take P = ∏_{p≤z} p then
\[ W(z) = \prod_{p \le z} \left( 1 - \frac{1}{p} \right) \ll 1/\log z \]
by Lemma 7. Choosing z = log y yields the following, which is striking in that the
upper bound is completely independent of x.
Corollary 1. For any x, y ≥ 1,
\[ \pi(x + y) - \pi(x) \ll \frac{y}{\log\log y}. \]
Proof. Let A = {x < n ≤ x + y} and P be the set of all primes. Then
\[ W(z) = \prod_{p < z} \left( 1 - \frac{1}{p} \right) \ll \frac{1}{\log z} \]
and so
\[ |\{ x < n \le x + y : \text{if } p \mid n \text{ then } p \ge z \}| \ll \frac{y}{\log z} + 2^z. \]
Choosing z = log y means that the right hand side is ≪ y/log log y. We are done
noting that any primes in (x, x + y] must survive this sieving process (unless they
are less than log y themselves, which introduces an error of only O(log y)).
7. Selberg’s sieve
The main weakness of the sieve of Eratosthenes is the large error term, which
forces us to take z quite small. This is because we have to sift by all d dividing
P (z), the number of which grow exponentially with z. We can instead consider
what happens if we only sieve up to some limit, so only considering the effect of
those d < D. We must then abandon any hope of getting an exact formula, but we
can hope that we’re still counting the main terms, and thus might get useful upper
and lower bounds for the sifting function – and now with much smaller error terms.
One of the most successful sieves was developed by Selberg. We will return to
lower bound sieves later on, and for now just focus on upper bounds. The key input
in the sieve of Eratosthenes was the replacing of
\[ 1_{(n, P(z)) = 1} \quad\text{by}\quad \sum_{d \mid (n, P(z))} \mu(d). \]
If we are content with an upper bound then we’d be happy instead replacing it
with any function F (n) such that F (1) ≥ 1 and F (n) ≥ 0 for all n. The starting
point for Selberg’s sieve is the trivial observation that any sequence of real numbers
λ_d ∈ R such that λ_1 = 1 can lead to such a function, since
\[ \Bigl( \sum_{d \mid n} \lambda_d \Bigr)^2 \ge \begin{cases} 1 & \text{if } n = 1 \text{ and} \\ 0 & \text{if } n > 1. \end{cases} \]
and also that f(p) ≠ 0 for p ∈ P. This allows us to make the useful definition of a
multiplicative function g such that for all primes p,
\[ g(p) = \left( 1 - \frac{f(p)}{p} \right)^{-1} - 1 = \frac{f(p)}{p - f(p)}, \]
\[ S(A, P; z) \le \frac{X}{G(t, z)} + \sum_{\substack{d \mid P(z) \\ d < t^2}} 3^{\omega(d)} |R_d| \]
Proof. Let λd ∈ R be some sequence to be chosen later, with the only restriction
that λ1 = 1. By the observation above, using the notation [d, e] for the least
\[ = XV + R, \]
say, where
\[ V = \sum_{d, e \mid P(z)} \lambda_d \lambda_e \frac{f([d, e])}{[d, e]} \]
and
\[ R = \sum_{d, e \mid P(z)} \lambda_d \lambda_e R_{[d,e]}. \]
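We will first examine the main term V. If we write [d, e] = abc where c = (d, e)
and d = ac and e = bc, so that (a, b) = (b, c) = (a, c) = 1, then using the fact that
f is multiplicative,
\begin{align*}
V &= \sum_{c \mid P(z)} \frac{f(c)}{c} \sum_{\substack{ab \mid P(z)/c \\ (a,b)=1}} \frac{f(a) f(b)}{ab} \lambda_{ac} \lambda_{bc} \\
&= \sum_{c \mid P(z)} \frac{f(c)}{c} \sum_{a, b \mid P(z)/c} \frac{f(a) f(b)}{ab} \lambda_{ac} \lambda_{bc} \sum_{d \mid (a,b)} \mu(d) \\
&= \sum_{c \mid P(z)} \frac{f(c)}{c} \sum_{d \mid P(z)} \mu(d) \Bigl( \sum_{d \mid m \mid P(z)/c} \frac{f(m)}{m} \lambda_{cm} \Bigr)^2 \\
&= \sum_{d \mid P(z)} \mu(d) \sum_{c \mid P(z)/d} \frac{c}{f(c)} \Bigl( \sum_{cd \mid n \mid P(z)} \frac{f(n)}{n} \lambda_n \Bigr)^2 \\
&= \sum_{m \mid P(z)} \sum_{c \mid m} \mu(m/c) \frac{c}{f(c)} \Bigl( \sum_{m \mid n \mid P(z)} \frac{f(n)}{n} \lambda_n \Bigr)^2.
\end{align*}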
holds for all primes n ∈ P and thus by multiplicativity also for all n | P (z). It
follows by Möbius inversion that
\[ \sum_{c \mid m} \mu(m/c) \frac{c}{f(c)} = \frac{1}{g(m)}. \]
It’s convenient to introduce the sifting limit t at this point, so we will assume that
λd is chosen so that λd = 0 whenever d ≥ t. Then
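\begin{align*}
V &= \sum_{m \mid P(z)} \frac{1}{g(m)} \Bigl( \sum_{m \mid n \mid P(z)} \frac{f(n)}{n} \lambda_n \Bigr)^2 \\
&= \sum_{\substack{k \mid P(z) \\ k < t}} \frac{y_k^2}{g(k)},
\end{align*}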
say. Since we are trying to produce an upper bound here, we want to choose λd
(and hence yk ) such that V is as small as possible.
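What is the relationship between λ_d and y_k? First note that y_k = 0 for k ≥ t.
For fixed d,
\begin{align*}
\sum_{k \mid P(z)} \mu(k) y_k 1_{d \mid k} &= \sum_{k \mid P(z)} \mu(k) \sum_{e \mid P(z)} \frac{f(e) \lambda_e}{e} 1_{k \mid e} 1_{d \mid k} \\
&= \mu(d) \sum_{e \mid P(z)} \frac{f(e) \lambda_e}{e} 1_{d \mid e} \sum_{k \mid e/d} \mu(k) \\
&= \mu(d) \frac{f(d) \lambda_d}{d}.
\end{align*}
In particular,
\[ \sum_{k \mid P(z)} \mu(k) y_k = \lambda_1 = 1. \]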
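Since
\[ 1 = \sum_{\substack{k \mid P(z) \\ k < t}} \mu(k) y_k = \sum_{\substack{k \mid P(z) \\ k < t}} \mu(k) g(k)^{1/2} \cdot \frac{y_k}{g(k)^{1/2}}, \]
by the Cauchy-Schwarz inequality, it follows that
\[ 1 \le \Bigl( \sum_{\substack{k \mid P(z) \\ k < t}} g(k) \Bigr) \Bigl( \sum_{k \mid P(z)} \frac{y_k^2}{g(k)} \Bigr) = G(t, z) V. \]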
\[ = \frac{g(d)\,d}{f(d)} G_d. \]
and
\[ \sum_{[d,e] = n} 1 \le 3^{\omega(n)}. \]
\[ \pi(x + y) - \pi(x) \ll \frac{y}{\log y}. \]
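\begin{align*}
G(z, z) &= \sum_{p_1 \cdots p_h < z} \prod_{i=1}^{h} (p_i - 1)^{-1} \\
&= \sum_{p_1 \cdots p_h < z} \prod_{i=1}^{h} \left( \frac{1}{p_i} + \frac{1}{p_i^2} + \cdots \right) \\
&= \sum_{p_1 \cdots p_h < z} \sum_{k_i \ge 1} \frac{1}{p_1^{k_1} \cdots p_h^{k_h}} \\
&\ge \sum_{n < z} \frac{1}{n} \\
&\gg \log z.
\end{align*}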
Furthermore, since 3^{ω(d)} ≤ τ(d)^{log 3/log 2} ≪_ε d^ε for any squarefree d, as in the estimate
for divisor function, the error term is ≪ z^{2+ε}. In total then,
\[ S(A, P; z) \ll \frac{y}{\log z} + z^{2+\varepsilon}, \]
The power of Selberg’s sieve is made most evident by the following application,
which gives an upper bound for the number of twin primes of the correct order of
magnitude.
Theorem 9 (Brun). Let π_2(x) count the number of n ≤ x such that both n and
n + 2 are prime. We have
\[ \pi_2(x) \ll \frac{x}{(\log x)^2}. \]
\[ |A_d| = \frac{2^{\omega(d)}}{d} x + O(2^{\omega(d)}), \]
We claim that the right hand side is ≫ (log z)^2. By partial summation it suffices
to show that
\[ \sum_{d < z} 2^{\omega(d)} \gg z \log z. \]
For this we use the identity Σ_{d²m=n} µ(d)τ(m) = 2^{ω(n)}, which is true since both
sides are multiplicative and it is easily checked for prime powers. We therefore
have, using that Σ_{m<y} τ(m) = y log y + O(y),
\begin{align*}
\sum_{d < z} 2^{\omega(d)} &= \sum_{d^2 < z} \mu(d) \sum_{m < z/d^2} \tau(m) \\
&= z \log z \sum_{d^2 < z} \frac{\mu(d)}{d^2} - 2z \sum_{d^2 < z} \frac{\mu(d) \log d}{d^2} + O\Bigl( z \sum_{d^2 < z} \frac{1}{d^2} \Bigr) \\
&= z \log z \sum_{d} \frac{\mu(d)}{d^2} + O\Bigl( z + z \log z \sum_{d > z^{1/2}} \frac{1}{d^2} \Bigr) \\
&= c z \log z + O(z),
\end{align*}
where c = Σ_{d=1}^∞ µ(d)/d² > 0 is some constant. This is ≫ z log z as required. Finally,
we note that the error term is bounded above by
\[ \sum_{d < z^2} 6^{\omega(d)} \ll z^{2+\varepsilon}, \]
say. If we choose z = x^{1/4}, say, then the right hand side is ≪ x/(log x)^2, and the
left-hand side is at least π_2(x) + O(x^{1/4}), and the result follows.
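Both the identity Σ_{d²m=n} µ(d)τ(m) = 2^{ω(n)} used in this proof and the conclusion of Brun's theorem are easy to probe numerically. The sketch below is our own illustration with hypothetical helper names, built on naive factorisation; it verifies the identity for small n and prints π_2(x) alongside x/(log x)^2.

```python
import math

def factorize(n):
    """Return {p: k} with n equal to the product of the p^k, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def tau(n):
    return math.prod(k + 1 for k in factorize(n).values())

def omega(n):
    return len(factorize(n))

def mobius(n):
    f = factorize(n)
    return 0 if any(k > 1 for k in f.values()) else (-1) ** len(f)

def identity_lhs(n):
    """Sum of mu(d) tau(m) over all factorisations d^2 m = n."""
    return sum(mobius(d) * tau(n // (d * d))
               for d in range(1, math.isqrt(n) + 1) if n % (d * d) == 0)

def twin_prime_count(x):
    """pi_2(x): the number of n <= x with n and n + 2 both prime."""
    flags = [True] * (x + 3)
    flags[0:2] = [False, False]
    for p in range(2, math.isqrt(x + 2) + 1):
        if flags[p]:
            for m in range(p * p, x + 3, p):
                flags[m] = False
    return sum(1 for n in range(2, x + 1) if flags[n] and flags[n + 2])

for x in (10**3, 10**4, 10**5):
    print(x, twin_prime_count(x), x / math.log(x) ** 2)
```

The ratio of the last two columns stays bounded (it appears to settle near 1.3), consistent with the upper bound of Theorem 9; no matching lower bound is known.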
8. Combinatorial sieve
Lemma 9 (Buchstab formula).
\[ S(A, P; z) = |A| - \sum_{p \mid P(z)} S(A_p, P; p). \]
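The Buchstab formula simply groups the removed integers according to their least prime factor in P below z, and this is easy to confirm on an example. A minimal sketch, with function names of our own choosing:

```python
def sift(A, primes, z):
    """S(A, P; z): count n in A divisible by no prime p < z from primes."""
    return sum(1 for n in A if all(n % p != 0 for p in primes if p < z))

def buchstab_rhs(A, primes, z):
    """|A| minus the sum of S(A_p, P; p) over primes p < z in P."""
    total = len(A)
    for p in primes:
        if p < z:
            A_p = [n for n in A if n % p == 0]
            # Elements of A_p with no prime factor below p: exactly those
            # elements of A whose least prime factor from P is p.
            total -= sift(A_p, primes, p)
    return total

A = list(range(1, 501))
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
for z in (2, 3, 10, 20, 24):
    print(z, sift(A, primes, z), buchstab_rhs(A, primes, z))
```

The two counts agree for every z, since each sifted-out integer is subtracted exactly once, in the term belonging to its least prime factor.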
By the same reasoning we can obtain a similar formula for the main term W(z) =
∏_{p∈P, p<z} (1 − f(p)/p).
Lemma 10.
\[ W(z) = 1 - \sum_{p \mid P(z)} \frac{f(p)}{p} W(p). \]
Since p < ℓ(d), the numbers pd are all square-free and ℓ(pd) = p. Furthermore, as d
ranges over all divisors of P(z) with ω(d) = r, these pd range over all divisors d′ of P(z)
with ω(d′) = r + 1. The general form follows by induction, since µ(d) = (−1)^r.
In particular,
\[ S(A, P; z) \ge \sum_{\substack{d \mid P(z) \\ \omega(d) < r}} \mu(d) |A_d| \]
if r is even, and similarly this holds with an upper bound if r is odd. We can
use this formula to truncate the Sieve of Eratosthenes at any level r, and examine
the remainder term. This leads us to Brun’s pure sieve, the simplest form of the
combinatorial sieve.
Theorem 10 (Brun’s pure sieve). If r ≥ 6 |log W(z)| then
\[ S(A, P; z) = X W(z) + O\Bigl( 2^{-r} X + \sum_{\substack{d \mid P(z) \\ d \le z^r}} |R_d| \Bigr). \]
Also by Corollary 3,
\[ W(z) = \sum_{\substack{d \mid P(z) \\ \omega(d) < r}} \mu(d) \frac{f(d)}{d} + O\Bigl( \sum_{\substack{d \mid P(z) \\ \omega(d) = r}} \frac{f(d)}{d} \Bigr), \]
and so
\[ S(A, P; z) = X W(z) + O\Bigl( X \sum_{\substack{d \mid P(z) \\ \omega(d) = r}} \frac{f(d)}{d} + \sum_{\substack{d \mid P(z) \\ \omega(d) \le r}} |R_d| \Bigr). \]
Furthermore,
\[ \sum_{p \mid P(z)} \frac{f(p)}{p} \le \sum_{p \mid P(z)} -\log(1 - f(p)/p) = -\log W(z). \]
If −log W(z) ≤ r/2e, then, the right-hand side above is ≤ 2^{−r}. The final error term
we bound crudely by noting that if d | P(z) and ω(d) ≤ r then certainly d ≤ z^r,
since it is the product of at most r primes, all less than z.
For example, if W(z) = ∏_{p<z} (1 − 1/p) ≈ 1/log z, then we can take r =
O(log log z), so the level is e^{log z log log z}, compared to 2^z from the Sieve of
Eratosthenes.
Corollary 4. For any z ≤ exp(o(log x / log log x)) such that z → ∞ with x we have the
asymptotic formula
\[ |\{ 1 \le n \le x : \text{if } p \mid n \text{ then } p \ge z \}| \sim e^{-\gamma} \frac{x}{\log z}. \]
Proof. We apply Brun’s pure sieve to A = {1 ≤ n ≤ x} and P the set of all primes.
As we have shown in the previous chapter,
\[ W(z) = \prod_{p \le z} \left( 1 - \frac{1}{p} \right) = (1 + o(1)) e^{-\gamma} \frac{1}{\log z}. \]
In particular, if we take x large enough, then r = 100⌈log log z⌉ will satisfy the
lower bound in Brun’s sieve. For this value of r,
\[ 2^{-r} x \le 2^{-100 \log\log z} x \le \frac{x}{(\log z)^2}, \]
say. Furthermore, since |R_d| ≪ 1, the other error term is
\[ \ll \sum_{d \le z^r} 1 \le z^r \le \exp(200 \log z \log\log z). \]
If log z = o(log x / log log x) then, for x sufficiently large, this is at most x^{1/2}, say.
It follows that
\[ S(A, P; z) = (1 + o(1)) e^{-\gamma} \frac{x}{\log z} + O\left( \frac{x}{(\log z)^2} + x^{1/2} \right) = (1 + o(1)) e^{-\gamma} \frac{x}{\log z}, \]
since z → ∞ as x → ∞.
CHAPTER 3
In this chapter, we will use (as is traditional for this topic) the letter s to denote
a complex variable, and σ and t to denote its real and imaginary parts respectively,
so that s = σ + it. Before we begin, it’s worth pausing to explicitly point out what
we mean by n^s, where n is a natural number and s ∈ C. By definition this is
\[ n^s = e^{s \log n} = n^{\sigma} e^{it \log n}. \]
It is easy to check the multiplicative property, that (nm)^s = n^s m^s.
A Dirichlet series is an infinite series of the form
\[ \sum_{n=1}^{\infty} \frac{a_n}{n^s}, \]
for some coefficients a_n ∈ C.
Lemma 11. For any sequence an there is an abscissa of convergence σc such that
α(s) converges for all s with σ > σc and for no s with σ < σc . If σ > σc then there
is a neighbourhood of s in which α(s) converges uniformly. In particular, α(s) is
holomorphic at s.
Proof. It suffices to show that if α(s) converges at s = s0 and we take some s with
σ > σ0 then α converges uniformly in some neighbourhood of s. The lemma then
follows by taking σc = inf{σ : α(s) converges}.
Suppose that α(s) converges at s = s_0. If we let R(u) = Σ_{n>u} a_n n^{−s_0} then by
partial summation, for any s,
\[ \sum_{M < n \le N} a_n n^{-s} = R(M) M^{s_0 - s} - R(N) N^{s_0 - s} + (s_0 - s) \int_M^N R(u) u^{s_0 - s - 1}\,du. \]
Since the sum here is convergent, the summands tend to 0, and hence c_n ≪ n^{σ_0}. It
follows that this sum is absolutely convergent for σ > σ_0 + 1. Since each term
tends to 0 as σ → ∞, and the series is absolutely convergent, the right-hand side
tends to 0, and hence c_N = 0.
Lemma 13. If α(s) and β(s) are two Dirichlet series, both absolutely convergent
at s, then
\[ \sum_{n=1}^{\infty} \Bigl( \sum_{cd = n} a_c b_d \Bigr) n^{-s} \]
is absolutely convergent and equals α(s)β(s).
Proof. We simply multiply out the product of two series,
\[ \Bigl( \sum_n \frac{a_n}{n^s} \Bigr) \Bigl( \sum_m \frac{b_m}{m^s} \Bigr) = \sum_{n, m} \frac{a_n b_m}{(nm)^s} = \sum_k \Bigl( \sum_{nm = k} a_n b_m \Bigr) k^{-s}, \]
p≤y n
p|n =⇒ p≤y
Therefore the difference between the product here and the Dirichlet series is at
most
\[ \sum_{n > y} |f(n)| n^{-\sigma} \to 0 \quad\text{as } y \to \infty. \]
We now define the Riemann zeta function in the half-plane σ > 1 by
\[ \zeta(s) = \sum_{n} \frac{1}{n^s}. \]
Observe that this series diverges at s = 1, and the series actually converges ab-
solutely for σ > 1. By the above, ζ(s) defines a holomorphic function in this
half-plane. For our applications, we need to extend this definition to be able to talk
about ζ(s) for σ > 0.
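Lemma 15. For σ > 1,
\[ \zeta(s) = 1 + \frac{1}{s - 1} - s \int_1^{\infty} \frac{\{t\}}{t^{s+1}}\,dt. \]
Proof. By partial summation, for any x,
\[ \sum_{1 \le n \le x} n^{-s} = \frac{\lfloor x \rfloor}{x^s} + s \int_1^x \frac{\lfloor t \rfloor}{t^{s+1}}\,dt. \]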
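The formula of Lemma 15 can be evaluated numerically for real s > 1, since on each interval [n, n + 1) the integrand (t − n)/t^{s+1} has an elementary antiderivative. The sketch below is only an illustration: the function names are ours, and the tail of the integral beyond N is approximated by replacing {t} with its average value 1/2, which is an assumption made for convenience rather than anything from the lectures.

```python
import math

def fractional_part_integral(s, N=10000):
    """Approximate the integral of {t} / t^(s+1) over [1, infinity) for real s > 1."""
    total = 0.0
    for n in range(1, N):
        # Exact integral of (t - n) t^(-s-1) over [n, n+1], using the
        # antiderivative t^(1-s)/(1-s) + n t^(-s)/s.
        total += ((n + 1) ** (1 - s) - n ** (1 - s)) / (1 - s)
        total += n * ((n + 1) ** (-s) - n ** (-s)) / s
    # Tail beyond N, with {t} replaced by its mean value 1/2 (an approximation).
    total += 0.5 * N ** (-s) / s
    return total

def zeta_via_lemma15(s, N=10000):
    """Evaluate 1 + 1/(s-1) - s * integral, as in Lemma 15."""
    return 1.0 + 1.0 / (s - 1.0) - s * fractional_part_integral(s, N)

print(zeta_via_lemma15(2.0), math.pi ** 2 / 6)
print(zeta_via_lemma15(4.0), math.pi ** 4 / 90)
```

The comparison values π²/6 and π⁴/90 are the classical evaluations of ζ(2) and ζ(4), quoted here without proof. The useful feature of the formula is that the integral converges for σ > 0, which is what allows ζ to be extended beyond σ > 1.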
which is valid for σ > 1. From this it follows that ζ(s) ≠ 0 for σ > 1. The Euler
product leads to the identity
\[ \frac{1}{\zeta(s)} = \prod_p \left( 1 - \frac{1}{p^s} \right) = \sum_n \frac{\mu(n)}{n^s}. \]
Furthermore, when σ > 1, the series is absolutely convergent, and so the derivative
can be computed summand by summand, leading to
\[ \zeta'(s) = -\sum_n \frac{\log n}{n^s}. \]
From the Euler product we have
\[ \log \zeta(s) = -\sum_p \log(1 - p^{-s}) = \sum_n \frac{\Lambda(n)}{\log n} n^{-s}. \]
Finally, taking the derivative of this, we obtain the Dirichlet series with Λ(n) as
coefficients:
\[ \frac{\zeta'}{\zeta}(s) = -\sum_n \frac{\Lambda(n)}{n^s}. \]
then
\[ \alpha(s) = s \int_1^{\infty} \frac{A(t)}{t^{s+1}}\,dt. \]
That is, α(s) can be expressed as a function of A(x). We are more interested in the
converse – for example, if a_n = Λ(n), then α(s) = −(ζ′/ζ)(s), which we hope we can
understand via analysis, and A(x) = Σ_{n≤x} Λ(n) = ψ(x), the asymptotics of which
are the subject of the prime number theorem.
For the proof of the prime number theorem we require the following three facts
about zero-free regions of ζ, which we will prove in the next section.
(1) there is a constant c such that, if σ > 1 − c/log t and |t| ≥ 7/8, then
ζ(s) ≠ 0 and (ζ′/ζ)(s) ≪ log(|t| + 2), and
(2) ζ(s) ≠ 0 for 8/9 < σ and |t| ≤ 7/8, and
(3) (ζ′/ζ)(s) = −1/(s − 1) + O(1) for 5/6 ≤ σ ≤ 2 and |t| ≤ 7/8.
Let’s first examine the error term here. The first summand is
\[ \ll \frac{x \log x}{T} \sum_{n=1}^{x} \frac{1}{n} \ll \frac{x (\log x)^2}{T}. \]
The infinite sum in the second summand is ≪ 1/(σ_0 − 1). Overall, then the error term
is
\[ \ll \frac{x (\log x)^2}{T} + \frac{x^{\sigma_0}}{T(\sigma_0 - 1)}. \]
Choosing σ_0 = 1 + 1/log x this is O(x(log x)^2/T). Now let σ_1 = 1 − c/log T, where
c is such that ζ(s) ≠ 0 for σ ≥ σ_1, and C be the rectangle contour connecting the
two lines. Since ζ′/ζ(s) has a simple pole at s = 1 with residue −1, and is otherwise
analytic (as there are no zeros of ζ inside the contour), we have
\[ \frac{1}{2\pi i} \int_C \frac{\zeta'}{\zeta}(s) \frac{x^s}{s}\,ds = -x. \]
The right-hand side of this contour is −ψ(x) + O(xT^{−1}(log x)^2) by the above. It
remains to bound the contribution from the other sides of the rectangle. To this
end, first note that
\[ \int_{\sigma_1 + iT}^{\sigma_0 + iT} \frac{\zeta'}{\zeta}(s) \frac{x^s}{s}\,ds \ll \frac{x^{\sigma_0}}{T} (\sigma_0 - \sigma_1) \log T, \]
since |s| ≫ T on this line, |x^s| ≤ x^{σ_0}, the line has length σ_0 − σ_1, and (ζ′/ζ)(s) ≪ log t.
Using our choices for σ_0 and σ_1, it follows that the two short sides of the rectangle
contribute O(x/T), provided T ≤ x.
Finally, we turn our attention to the long left-hand side of the rectangle. Away
from t = 0 (where |t| ≥ 1, say) we use the bound ζ′/ζ ≪ log |t| to bound the
contribution along this line by
\[ \ll x^{\sigma_1} \log T \int_1^T \frac{1}{t}\,dt \ll x^{\sigma_1} (\log T)^2. \]
In the interval −1 < t < 1 we use the bound ζ′/ζ ≪ 1/|s − 1| to bound the contribution
to the integral by
\[ \ll x^{\sigma_1} \int_{-1}^{1} \frac{1}{|t(t - 1)|}\,dt \ll x^{\sigma_1}. \]
Combining these estimates we have, for any 1 ≤ T ≤ x,
\[ \psi(x) = x + O\left( \frac{x}{T} (\log x)^2 + x^{1 - c/\log T} (\log T)^2 \right). \]
We now make a choice of T to optimise this error term, which is T = exp(c√(log x))
for some small constant c > 0, and the proof is complete.
Proof. Let
\[ g(z) = \frac{f(z)}{z(2M - f(z))}, \]
so that g is holomorphic for |z| ≤ R. Observe that, using ℜf(z) ≤ M,
\[ |f(z)|^2 = \Re(f(z))^2 + \Im(f(z))^2 \le (2M - \Re(f(z)))^2 + \Im(f(z))^2 = |2M - f(z)|^2, \]
and so |2M − f(z)| ≥ |f(z)| for |z| ≤ R. In particular, if |z| = R then |g(z)| ≤ 1/R.
By the maximum modulus principle, if |z| = r, then
\[ |g(z)| = \frac{|f(z)|}{r\,|2M - f(z)|} \le \frac{1}{R}, \]
and hence
\[ R\,|f(z)| \le |2Mr - r f(z)| \le 2Mr + r\,|f(z)|, \]
or
\[ |f(z)| \le \frac{2r}{R - r} M. \]
This shows that |f(z)| ≪ M. To deduce the same bound for f′(z), we use Cauchy’s
formula
\[ f'(z) = \frac{1}{2\pi i} \int_{r'} \frac{f(w)}{(w - z)^2}\,dw, \]
where the integral is taken over some circle of radius r < r′ < R, say.
Lemma 18. Suppose that f(z) is analytic in a domain containing |z| ≤ 1, that
|f(z)| ≤ M in this disc, and that f(0) ≠ 0. Let 0 < r < R < 1. Then for |z| ≤ r,
\[ \frac{f'}{f}(z) = \sum_{k=1}^{K} \frac{1}{z - z_k} + O\left( \log \frac{M}{|f(0)|} \right) \]
It follows that, since each zk satisfies |zk | ≤ R − , the number of zeros satisfies
K ≪ log M. Let h(z) = log(g(z)/g(0)) (which is allowed since g(z) has no zeros in
|z| ≤ R). We have h(0) = 0, and
\[ \Re h(z) = \log |g(z)| - \log |g(0)| \le \log M \]
for |z| ≤ R. By the Borel-Carathéodory lemma,
\[ |h'(z)| \ll \log M. \]
Finally,
\[ h'(z) = \frac{f'}{f}(z) - \sum_{k=1}^{K} \frac{1}{z - z_k} + \sum_{k=1}^{K} \frac{1}{z - R^2/\overline{z_k}}. \]
If |z| ≤ r then |z − R²/\overline{z_k}| ≥ |R²/\overline{z_k}| − |z| ≥ R − r, and so the final sum is
≪ log M.
Since |ζ(3/2 + it)| ≫ 1 and |ζ(s + 3/2 + it)| ≪ t for |s| ≤ 1 we obtain the
following information about the zeta function.
where the sum is over all zeros ρ of ζ(s) in the region |ρ − (3/2 + it)| ≤ 5/6.
Theorem 15. There is a constant c > 0 such that
\[ \zeta(s) \ne 0 \quad\text{for}\quad \sigma \ge 1 - \frac{c}{\log t}. \]
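Proof. Let ρ = σ + it be such that ζ(ρ) = 0, and let δ > 0 be something to be
chosen later. By Corollary 5,
\[ -\Re \frac{\zeta'}{\zeta}(1 + \delta + it) = -\frac{1}{1 + \delta - \sigma} - \Re \sum_{\rho' \ne \rho} \frac{1}{1 + \delta + it - \rho'} + O(\log t). \]
Since ℜρ′ ≤ 1 for all zeros ρ′, it follows that ℜ(1/(1 + δ + it − ρ′)) > 0, provided
δ > 0. In particular,
\[ -\Re \frac{\zeta'}{\zeta}(1 + \delta + it) \le -\frac{1}{1 + \delta - \sigma} + O(\log t). \]
Similarly,
\[ -\Re \frac{\zeta'}{\zeta}(1 + \delta + 2it) \ll \log t. \]
Finally,
\[ -\frac{\zeta'}{\zeta}(1 + \delta) = \frac{1}{\delta} + O(1). \]
We now note that
\[ \Re\left( -3 \frac{\zeta'}{\zeta}(1 + \delta) - 4 \frac{\zeta'}{\zeta}(1 + \delta + it) - \frac{\zeta'}{\zeta}(1 + \delta + 2it) \right) \]
is
\[ \sum_{n=1}^{\infty} \frac{\Lambda(n)}{n^{1+\delta}} \bigl( 3 + 4 \cos(t \log n) + \cos(2t \log n) \bigr). \]
Since 3 + 4 cos θ + cos 2θ = 2(1 + cos θ)^2 ≥ 0, the entire sum is ≥ 0. It follows that
\[ \frac{3}{\delta} - \frac{4}{1 + \delta - \sigma} + O(\log t) \ge 0. \]
This implies that 1 − σ ≫ 1/log t, choosing δ ≈ 1/log t.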
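The last two steps of this proof are elementary enough to check by machine. In the sketch below (ours, with hypothetical names) we verify the trigonometric identity on a grid, and then make the final deduction explicit: writing the O(log t) term as at most C log t for some unspecified constant C > 0 (an assumption, since the notes do not name the constant) and taking δ = a/log t, the inequality rearranges to 1 − σ ≥ 4/(3/δ + C log t) − δ.

```python
import math

def trig_poly(theta):
    """The non-negative trigonometric polynomial 3 + 4 cos(theta) + cos(2 theta)."""
    return 3 + 4 * math.cos(theta) + math.cos(2 * theta)

def zero_free_width(C, L, a):
    """Lower bound for 1 - sigma from 3/delta - 4/(1 + delta - sigma) + C*L >= 0.

    Here L plays the role of log t, C is a hypothetical constant bounding
    the O(log t) term, and delta = a / L.
    """
    delta = a / L
    return 4.0 / (3.0 / delta + C * L) - delta

# The identity 3 + 4 cos(theta) + cos(2 theta) = 2 (1 + cos(theta))^2 on a grid.
worst = max(abs(trig_poly(2 * math.pi * k / 1000)
                - 2 * (1 + math.cos(2 * math.pi * k / 1000)) ** 2)
            for k in range(1000))
print("largest discrepancy in the identity:", worst)

# With a = 1/(2C) the bound becomes 1 - sigma >= 1/(14 C log t).
print(zero_free_width(C=1.0, L=10.0, a=0.5), 1 / 140)
```

Any choice of a with 0 < a < 1/C gives a positive width of order 1/log t, which is exactly the shape of zero-free region claimed in Theorem 15.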
This result has been improved by Korobov and Vinogradov to 1 − c/(log t)^{2/3+ε}.
For this we will need the following lemma of Landau. The real interest of this
lemma is that when our integrand is non-negative we obtain not just a half-plane of
convergence, but more importantly knowledge about what happens on the boundary
line itself. A similar fact holds for Dirichlet series.
Lemma 19 (Landau). Suppose that A is an integrable function bounded in any
finite interval, A(x) ≥ 0 for all large x ≥ X, and let
\[ \sigma_c = \inf\left\{ \sigma : \int_X^{\infty} A(x) x^{-\sigma}\,dx < \infty \right\}. \]
The function
\[ F(s) = \int_1^{\infty} A(x) x^{-s}\,dx \]
is analytic in σ > σ_c but not at s = σ_c.
Proof. Divide the integral in the definition of F into [1, X] and [X, ∞), giving a
corresponding decomposition into F = F_1 + F_2, say. The function F_1 is entire.
For σ > σ_c, the integral converges absolutely, and hence F_2 defines an analytic
function in this half-plane. Suppose that F_2 is analytic at s = σ_c. We may expand F_2(s) as a power
series at s = σ_c + 1, so that
\[ F_2(s) = \sum_{k=0}^{\infty} c_k (s - 1 - \sigma_c)^k, \]
where
\[ c_k = \frac{F_2^{(k)}(1 + \sigma_c)}{k!} = \frac{1}{k!} \int_X^{\infty} A(x) (-\log x)^k x^{-1-\sigma_c}\,dx. \]
The radius of convergence of this power series is the distance from 1 + σ_c to the
nearest singularity of F_2(s), and hence by assumption is at least 1 + δ for some
δ > 0, say. If we consider s = σ_c − δ/2, then
\[ F_2(s) = \sum_{k=0}^{\infty} \frac{(1 + \sigma_c - s)^k}{k!} \int_X^{\infty} A(x) (\log x)^k x^{-1-\sigma_c}\,dx. \]
This is a convergent series with all non-negative terms, and hence we can
interchange the integral and summation, to find
\[ F_2(s) = \int_X^{\infty} A(x) x^{-1-\sigma_c} \exp((1 + \sigma_c - s) \log x)\,dx = \int_X^{\infty} A(x) x^{-s}\,dx, \]
and so the integral must converge at s = σ_c − δ/2, which contradicts the definition
of σ_c.
When discussing lower bounds for error terms, the following notation is useful.
We say that f = Ω±(g) if
\[ \limsup_{x \to \infty} \frac{f(x)}{g(x)} \ge c > 0 \]
and
\[ \liminf_{x \to \infty} \frac{f(x)}{g(x)} \le -c < 0, \]
for some absolute constant c > 0. That is, not only does f (x) exceed (some constant
multiple of) g(x) infinitely often, but it does so both positively and negatively.
Theorem 16. If σ_0 is the supremum of the real parts of the zeros of ζ(s) then, for
any σ < σ_0,
\[ \psi(x) = x + \Omega_{\pm}(x^{\sigma}). \]
If there is a zero ρ with ℜρ = σ_0, then
\[ \psi(x) = x + \Omega_{\pm}(x^{\sigma_0}). \]
Proof. Suppose that ψ(x) − x ≤ cx^σ for all large enough x, say x > X. Following
Lemma 19 we will consider the function
\[ F(s) = \int_1^{\infty} (c x^{\sigma} - \psi(x) + x) x^{-s-1}\,dx = \frac{c}{s - \sigma} + \frac{\zeta'(s)}{s \zeta(s)} + \frac{1}{s - 1}. \]
The right-hand side has a pole at s = σ, but is analytic for real s > σ. It follows
that in fact the above identity must hold for all s with ℜs > σ. It follows that
there can’t be any zeros of ζ(s) in this region, which is a contradiction if σ < σ_0.
For the second, stronger, conclusion, we need to argue a little more carefully.
Suppose that there is a zero ρ = σ_0 + it. Consider instead
\[ F(s) + \frac{e^{i\theta} F(s + it) + e^{-i\theta} F(s - it)}{2} = \int_1^{\infty} (c x^{\sigma} - \psi(x) + x)(1 + \cos(\theta - t \log x)) x^{-s-1}\,dx. \]
The coefficients here are still non-negative real numbers. The left-hand side has a
pole at s = σ with residue
\[ c + \frac{m e^{i\theta}/\rho + m e^{-i\theta}/\overline{\rho}}{2}, \]
where m is the multiplicity of the zero ρ. We have freedom to choose θ to be
whatever we like, in particular so that this expression is c − m/|ρ|. The lim inf of
the right-hand side is > −∞ as s approaches σ from the right along the real axis.
We must therefore have
\[ c - \frac{m}{|\rho|} \ge 0, \]
and hence c ≥ m/|ρ|. This establishes the Ω+ aspect of the theorem. For Ω− we
use the same argument with signs reversed, so that we start with
\[ F(s) = \int_1^{\infty} (-c x^{\sigma} + \psi(x) - x) x^{-s-1}\,dx, \]
and so on.
If we let F(x) = ∫_0^x f(u) du then, by integration by parts,
\[ \int_1^{\infty} \frac{f(u)}{u^{s+1}}\,du = \bigl[ F(u) u^{-s-1} \bigr]_1^{\infty} + (s + 1) \int_1^{\infty} \frac{F(u)}{u^{s+2}}\,du. \]
Since F(x) is bounded, the integral here converges for any s with σ > −1, and
hence the left-hand side also converges in this region. We may therefore take
\[ \zeta(s) = \frac{1}{2} + \frac{1}{s - 1} + s \int_1^{\infty} \frac{f(u)}{u^{s+1}}\,du \]
as the definition of ζ(s) in the half-plane σ > −1. If −1 < σ < 0 then
\[ \int_0^1 \frac{f(u)}{u^{s+1}}\,du = \frac{1}{2} \int_0^1 \frac{1}{u^{s+1}}\,du - \int_0^1 \frac{1}{u^{s}}\,du = -\frac{1}{2s} + \frac{1}{s - 1}, \]
and so in this strip
\[ \zeta(s) = s \int_0^{\infty} \frac{f(u)}{u^{s+1}}\,du. \]
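We now note that f(x) is a periodic function, continuous in (0, 1), and so it has a
Fourier series, which is
\[ f(u) = \sum_{n=1}^{\infty} \frac{\sin(2\pi n u)}{\pi n}. \]
For −1 < σ < 0 we therefore get
\[ \zeta(s) = s \int_0^{\infty} \frac{1}{u^{s+1}} \sum_{n=1}^{\infty} \frac{\sin(2\pi n u)}{\pi n}\,du = \frac{s}{\pi} \sum_{n=1}^{\infty} \frac{1}{n} \int_0^{\infty} \frac{\sin(2\pi n u)}{u^{s+1}}\,du. \]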
We should justify the interchange of integral and summation here. For this, we note
that the series is uniformly convergent almost everywhere, furthermore converges to
some bounded value in (−1/2, 1/2] almost everywhere, and hence the interchange
of limits is justified by the dominated convergence theorem.
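By change of variable, we have
\[ \int_0^{\infty} \frac{\sin(2\pi n u)}{u^{s+1}}\,du = (2\pi n)^s \int_0^{\infty} \frac{\sin(u)}{u^{s+1}}\,du. \]
Furthermore, writing sin(u) = (1/2i)(e^{iu} − e^{−iu}) and using another change of variable,
\[ \int_0^{\infty} \frac{\sin u}{u^{s+1}}\,du = -\sin(\pi s/2) \int_0^{\infty} t^{-s-1} e^{-t}\,dt. \]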
The integral here is an important one known as the Gamma function. It cannot be
simplified further, in general, but is one of the fundamental functions of analysis.
Let
\[ \Gamma(s) = \int_0^{\infty} t^{s-1} e^{-t}\,dt, \]
which converges for σ > 0, and defines a holomorphic function in this region.
Integrating by parts we see that
Γ(s + 1) = sΓ(s).
This identity, combined with the fact that Γ(1) = 1, implies that Γ(n) = (n − 1)!
for all integers n ≥ 1. Furthermore, it allows us to analytically continue Γ(s) to a
meromorphic function on the entire complex plane with poles at s = 0, −1, −2, . . ..
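The recurrence and its consequences can be checked numerically. The following Python sketch uses the standard library's math.gamma; the helper gamma_via_recurrence is a hypothetical name introduced only for illustration, and it mimics the continuation argument by running Γ(s) = Γ(s + 1)/s backwards for real non-integer s.

```python
import math

# Check Gamma(s + 1) = s * Gamma(s) at a few real points s > 0.
for s in (0.5, 1.7, 4.2):
    assert math.isclose(math.gamma(s + 1), s * math.gamma(s), rel_tol=1e-12)

# Check Gamma(n) = (n - 1)! for small positive integers n.
for n in range(1, 11):
    assert math.isclose(math.gamma(n), math.factorial(n - 1), rel_tol=1e-12)

def gamma_via_recurrence(s):
    """Gamma(s) for real non-integer s, using Gamma(s) = Gamma(s + 1) / s until s > 0."""
    if s > 0:
        return math.gamma(s)
    return gamma_via_recurrence(s + 1) / s

# Gamma(-1/2) should equal -2 * sqrt(pi), since Gamma(1/2) = sqrt(pi).
print(gamma_via_recurrence(-0.5), -2 * math.sqrt(math.pi))
```

The division by s at each step is exactly what produces the poles at s = 0, −1, −2, . . . in the continuation.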
The right-hand side is actually analytic for any σ < 1, and hence we can take the
right-hand side to be a definition of ζ(s) in this region. By analytic continuation it
follows that this identity must hold for all s ∈ C. We have proved the following.
Theorem 17 (Functional equation). The zeta function ζ(s) can be extended to a
function meromorphic on the whole complex plane, and for all s satisfies the identity
\[ \zeta(s) = 2^s \pi^{s-1} \sin(\pi s/2) \Gamma(1-s) \zeta(1-s). \]
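As a sanity check on the functional equation, the following Python sketch evaluates its right-hand side at a few negative integers, where only values of ζ at real arguments greater than 1 are needed. The helper zeta_real is an assumption of this sketch: it approximates ζ by a partial sum with a simple tail correction, and is not a method taken from the lectures.

```python
import math

def zeta_real(s, N=1000):
    """Approximate zeta(s) for real s > 1: the sum over n < N plus the
    tail correction N^(1-s)/(s-1) + N^(-s)/2."""
    head = sum(n ** -s for n in range(1, N))
    return head + N ** (1 - s) / (s - 1) + 0.5 * N ** -s

def zeta_via_functional_equation(s):
    """Right-hand side of the functional equation at a real s < 0."""
    return (
        2 ** s
        * math.pi ** (s - 1)
        * math.sin(math.pi * s / 2)
        * math.gamma(1 - s)
        * zeta_real(1 - s)
    )

print(zeta_via_functional_equation(-1))  # classical value of zeta(-1) is -1/12
print(zeta_via_functional_equation(-3))  # classical value of zeta(-3) is 1/120
print(zeta_via_functional_equation(-2))  # the factor sin(pi*s/2) vanishes here
```

At s = −2 the sine factor vanishes while the other factors stay finite, which is the mechanism behind the zeros at negative even integers.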
Many interesting facts can be deduced from this identity. We will first use it to
study the possible poles of ζ(s). We know that ζ(s) has a simple pole at s = 1,
and nowhere else for σ > −1. Suppose that ζ has a pole at some s with σ < 0. Then so
too does Γ(1 − s)ζ(1 − s), but both Γ(s) and ζ(s) are holomorphic for all s with
ℜ(s) > 1, which is a contradiction. It follows that ζ(s) only has one pole in C, which
is a simple pole at s = 1.
We will now consider the zeros of ζ(s). Suppose that ζ(s) = 0 and σ < 0. It
follows that
sin(πs/2)Γ(1 − s)ζ(1 − s) = 0.
Again, neither Γ(1 − s) nor ζ(1 − s) can be zero or a pole, and so sin(πs/2) = 0, which
means s must be an even integer. These are called the trivial zeros of ζ(s), located
at s = −2, −4, −6, . . .. Since there are no zeros with σ ≥ 1, there are no other zeros
with σ ≤ 0.
Aside from the trivial zeros, then, all zeros of ζ must lie in the critical strip
0 < σ < 1. Furthermore, since the other factors in the functional equation are
entire and non-zero in this strip, this implies that if ρ is a zero in the critical strip,
then so too is 1 − ρ. There is therefore a symmetry around the critical line σ = 1/2.
The Riemann hypothesis is motivated in part by the belief that this symmetry
should collapse so that all the zeros are located exactly on this line.
Using this symmetry and the results of the previous section we obtain the following
equivalence between the Riemann hypothesis and the error term in the prime
number theorem.
Theorem 18. The Riemann hypothesis is equivalent to the statement that
\[ \psi(x) = x + O\left( x^{1/2+o(1)} \right). \]
Proof. If the Riemann hypothesis is true then the contour integration proof of the
prime number theorem can be improved to show that ψ(x) = x + O(x^{1/2+o(1)}). For
the converse, we use Theorem 16. If the Riemann hypothesis fails, there is some
zero ρ with real part 0 < σ < 1 such that σ ≠ 1/2. Since 1 − ρ is also a zero, we
may assume without loss of generality that σ > 1/2. If we take some 1/2 < σ′ < σ
then Theorem 16 shows
\[ \psi(x) = x + \Omega_\pm(x^{\sigma'}), \]
which would contradict ψ(x) = x + O(x^{1/2+o(1)}) for large enough x (large enough
so that the 1/2 + o(1) exponent is less than σ′), which concludes the proof.
If we also assume the existence of at least one zero on the critical line (there are
infinitely many, and the first is at 1/2 + (14.1...)i), then we have the following
error term bound.
Theorem 19.
\[ \psi(x) = x + \Omega_\pm(x^{1/2}). \]
Proof. If the Riemann hypothesis is true then we use the existence of a zero on the
critical line and the second part of Theorem 16. If the Riemann hypothesis is false
then an even stronger statement is true by the first part of Theorem 16.
CHAPTER 4
In the statement of this lemma, and throughout this chapter unless specified
otherwise, Σ_χ means the sum is taken over all Dirichlet characters modulo q.
Proof. Let S be the first sum. If χ = χ0 then the claim is trivial. If χ ≠ χ0 there
exists some a with (a, q) = 1 such that χ(a) ≠ 1, 0. It follows that
\[ \chi(a) S = \sum_{\substack{1 \le n \le q \\ (q,n)=1}} \chi(an) = \sum_{\substack{1 \le m \le q \\ (q,m)=1}} \chi(m) = S, \]
Our main interest lies in using Dirichlet characters to detect when a number lies
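in a certain residue class modulo q, for which we note that, if (a, q) = 1 and ā
denotes the inverse of a modulo q, then
\[ 1_{n \equiv a \ (\mathrm{mod}\ q)} = \frac{1}{\phi(q)} \sum_\chi \chi(\bar{a} n) = \frac{1}{\phi(q)} \sum_\chi \overline{\chi(a)} \chi(n), \]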
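The detection of a residue class by characters can be tested directly for a small modulus. The sketch below is restricted to a prime modulus q with a known primitive root g (an assumption made only to keep the construction of the characters short); the function names are illustrative and the sum uses the conjugate of χ(a).

```python
import cmath

def characters_mod_prime(q, g):
    """All Dirichlet characters modulo a prime q, given a primitive root g.
    Character k sends g^j to exp(2*pi*i*j*k/(q-1)) and multiples of q to 0."""
    index = {pow(g, j, q): j for j in range(q - 1)}  # discrete logarithm table
    assert len(index) == q - 1, "g is not a primitive root modulo q"

    def make(k):
        def chi(n):
            n %= q
            if n == 0:
                return 0
            return cmath.exp(2j * cmath.pi * index[n] * k / (q - 1))
        return chi

    return [make(k) for k in range(q - 1)]

def indicator_via_characters(n, a, q, chars):
    """(1/phi(q)) * sum over chi of conj(chi(a)) * chi(n), with phi(q) = q - 1."""
    total = sum(complex(chi(a)).conjugate() * chi(n) for chi in chars)
    return total / (q - 1)

q, g = 7, 3  # 3 is a primitive root modulo 7
chars = characters_mod_prime(q, g)
a = 5
for n in range(1, 15):
    value = indicator_via_characters(n, a, q, chars)
    print(n, round(value.real, 10), n % q == a % q)
```

For a composite modulus the characters would have to be built from the structure of the unit group modulo q, which this sketch does not attempt.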
The innermost sum here we can study using analytic machinery, just as we have
done for ψ(x) previously. We now need to consider a more general type of zeta
function.
The Dirichlet L-function of χ modulo q is the Dirichlet series
\[ L(s, \chi) = \sum_{n=1}^\infty \frac{\chi(n)}{n^s}. \]
This is certainly absolutely convergent, and hence holomorphic, for σ > 1. By
partial summation, however, we can do better, if the character is not principal.
Lemma 21. If χ is a non-principal character then
\[ L(s, \chi) = \sum_{n=1}^\infty \frac{\chi(n)}{n^s} \]
converges for all σ > 0.
Proof. By partial summation, for any x, if F(x) = Σ_{1≤n≤x} χ(n),
\[ \sum_{1 \le n \le x} \frac{\chi(n)}{n^s} = \frac{F(x)}{x^s} + s \int_1^x \frac{F(t)}{t^{s+1}} \, dt. \]
Since χ is periodic and Σ_{1≤n≤q} χ(n) = 0, we know F(x) is bounded, and hence
the left-hand side has a limit as x → ∞ provided σ > 0.
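To see the boundedness of F(x) and the resulting convergence in a concrete case, the following Python sketch uses the non-principal character modulo 4. The choice of this character, and the comparison with the known value L(1, χ) = π/4 for it, are illustrative and not taken from the lectures.

```python
import math

def chi4(n):
    """The non-principal character modulo 4: 1, 0, -1, 0 at n = 1, 2, 3, 4 (mod 4)."""
    return (0, 1, 0, -1)[n % 4]

# Partial sums F(x) of chi stay bounded: record every value they take.
F, seen = 0, set()
for n in range(1, 1001):
    F += chi4(n)
    seen.add(F)
print(sorted(seen))

# The series for L(1, chi) therefore converges, even though sigma = 1
# lies outside the region of absolute convergence.
N = 200000
partial = sum(chi4(n) / n for n in range(1, N + 1))
print(partial, math.pi / 4)
```

For the principal character the partial sums F(x) grow linearly in x, and the same argument fails, which is consistent with the pole at s = 1.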
In particular, unlike the Riemann zeta function, there is no pole at s = 1. This
has surprisingly deep implications, as we will see later on.
Since χ is a completely multiplicative function, L(s, χ) can be written as an Euler
product,
\[ L(s, \chi) = \prod_p \left( 1 - \frac{\chi(p)}{p^s} \right)^{-1} \]
for σ > 1. In particular, note that
\[ L(s, \chi_0) = \zeta(s) \prod_{p \mid q} (1 - p^{-s}), \]
so that the L-function of the principal character is closely related to the zeta
function. In particular, L(s, χ0) is analytic for σ > 0 except for a simple pole at s = 1
with residue φ(q)/q.
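The residue φ(q)/q arises because ζ(s) has residue 1 at s = 1 and the finite product takes the value Π_{p|q} (1 − 1/p) there. The following Python sketch checks that this product equals φ(q)/q for a few moduli; the helper names are illustrative.

```python
from math import gcd

def prime_divisors(q):
    """Distinct primes dividing q, by trial division."""
    primes, p = [], 2
    while p * p <= q:
        if q % p == 0:
            primes.append(p)
            while q % p == 0:
                q //= p
        p += 1
    if q > 1:
        primes.append(q)
    return primes

def phi(q):
    """Euler's totient, counted directly from the definition."""
    return sum(1 for n in range(1, q + 1) if gcd(n, q) == 1)

def euler_factor_at_1(q):
    """Value at s = 1 of the finite product over primes p dividing q of (1 - p^(-s))."""
    value = 1.0
    for p in prime_divisors(q):
        value *= 1 - 1 / p
    return value

for q in (4, 12, 30, 97):
    print(q, euler_factor_at_1(q), phi(q) / q)
```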
Just as with the Riemann zeta function, we can take logarithms of this Euler
product and then differentiate, which yields
\[ -\frac{L'}{L}(s, \chi) = \sum_{n=1}^\infty \frac{\Lambda(n) \chi(n)}{n^s}, \]
which is valid for σ > 1. In the next section we will use this to prove Dirichlet’s
theorem.
To show that there are infinitely many primes ≡ a (mod q) it would suffice to show
that the final error term remains bounded as s → 1, which would imply that
\[ \sum_{p \equiv a \ (\mathrm{mod}\ q)} \frac{\log p}{p} = \infty. \]
Since L(s, χ) is analytic for σ > 0, L′/L is analytic except possibly for zeros of L.
To prove Dirichlet’s theorem, then, it suffices to prove the following.
Theorem 20. If χ ≠ χ0 then L(1, χ) ≠ 0.
Proof. We will first introduce some convenient terminology: a character is quadratic
if χ² = χ0 but χ ≠ χ0, so that it takes only values −1, 0, 1. Otherwise, χ takes on
some non-real values, and it is called complex. We will prove the theorem first for
complex characters.
For σ > 1,
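\[ \prod_\chi L(s, \chi) = \exp\left( \sum_\chi \sum_{n=2}^\infty \frac{\Lambda(n)}{\log n} \chi(n) n^{-s} \right) = \exp\left( \phi(q) \sum_{\substack{n \ge 2 \\ n \equiv 1 \ (\mathrm{mod}\ q)}} \frac{\Lambda(n)}{\log n} n^{-s} \right). \]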
If s = σ > 1 is real then the sum above is a non-negative real number, and so
\[ \prod_\chi L(\sigma, \chi) \ge 1 \]
for all σ > 1. Since L(s, χ0) has a pole at s = 1 it follows that L(1, χ) = 0 can hold
for at most one χ ≠ χ0, for otherwise the product would tend to 0 as σ → 1. The
theorem now follows immediately for complex χ: if χ is a complex character then
so is χ̄, and since L(s̄, χ̄) is the complex conjugate of L(s, χ), we have L(1, χ) = 0
if and only if L(1, χ̄) = 0. If χ is complex then χ ≠ χ̄, which contradicts the fact
that L(1, χ) = 0 for at most one character.
The case when χ is quadratic is harder, because there is no natural other character
to pair it with. Instead, we will pair it with ζ(s) itself. Suppose that L(1, χ) = 0.
Then ζ(s)L(s, χ) is analytic for σ > 0, and for σ > 1 this is a Dirichlet series with
coefficients
\[ r(n) = \sum_{d \mid n} \chi(d). \]
Clearly r is multiplicative, and furthermore r(n) ≥ 0 for all n, which is easily
checked by verifying it for prime powers, since
\[ r(p^k) = \sum_{0 \le j \le k} \chi(p)^j. \]
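The non-negativity of r(n) is easy to test numerically. The sketch below takes χ to be the quadratic character modulo 3 (an illustrative choice) and compares the divisor sum with the prime power formula; the function names are hypothetical.

```python
def chi3(n):
    """The non-principal (quadratic) character modulo 3."""
    return (0, 1, -1)[n % 3]

def r(n, chi):
    """r(n) = sum of chi(d) over the divisors d of n."""
    return sum(chi(d) for d in range(1, n + 1) if n % d == 0)

def r_prime_power(p, k, chi):
    """The formula r(p^k) = sum_{0 <= j <= k} chi(p)^j."""
    return sum(chi(p) ** j for j in range(k + 1))

values = [r(n, chi3) for n in range(1, 201)]
print(min(values))

# Compare the divisor sum with the prime power formula for a few primes.
for p in (2, 3, 5, 7):
    for k in range(6):
        assert r(p ** k, chi3) == r_prime_power(p, k, chi3)
```

When χ(p) = −1 the sum alternates between 1 and 0, when χ(p) = 0 it is 1, and when χ(p) = 1 it is k + 1, so it is never negative.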
Proof. This follows from Lemma 18 with f(s) = L(s + 3/2 + it, χ), R = 5/6 and
r = 2/3. We first establish a lower bound for f(0) using the Euler product, so
\[ |f(0)| = \prod_p \left| 1 - \frac{\chi(p)}{p^{3/2+it}} \right|^{-1} \ge \prod_p \left( 1 + \frac{1}{p^{3/2}} \right)^{-1} \gg 1. \]
We also require an upper bound for f(s) for |s| ≤ 1. For this, by partial summation,
we have, for σ > 0, the identity
\[ L(s, \chi) = s \int_1^\infty \frac{F(t)}{t^{s+1}} \, dt, \]
where F(t) = Σ_{1≤n≤t} χ(n). Observing that |F(t)| ≤ q, by periodicity of χ, we
deduce that
\[ |L(s, \chi)| \ll |s| \, q \int_1^\infty \frac{1}{t^{\sigma+1}} \, dt, \]
and hence |f(z)| ≪ qτ for |z| ≤ 1.
where the sum is over all zeros ρ of L(s, χ) such that |ρ − (3/2 + it)| ≤ 5/6.
Proof. Comparing Euler products, we see that for σ > 0,
\[ L(s, \chi_0) = \zeta(s) \prod_{p \mid q} (1 - p^{-s}). \]
When σ ≥ 5/6 the sum over p is ≪ log q, since that is a trivial bound for the
number of primes dividing q, and each summand is ≪ 1. The lemma now follows
from Theorem 13 (when s is close to 1) and Corollary 5.
and so if L(s, χ0) = 0 and σ > 0 then ζ(s) = 0, and so in this case the theorem
follows from the zero-free region we have established for ζ(s). We will thus henceforth
suppose that χ is a complex character.
Let ρ = σ + it be such that L(ρ, χ) = 0, and let δ > 0 be some parameter to be
chosen later. From the Euler product we know that σ ≤ 1. By Lemma 23, as in
the proof for the zero-free region for ζ, we have
\[ -\Re \frac{L'}{L}(1 + \delta + it, \chi) \le -\frac{1}{1 + \delta - \sigma} + O(\log q\tau) \]
and
\[ \Re \frac{L'}{L}(1 + \delta + 2it, \chi^2) \ll \log(q\tau). \]
This last step is where we crucially use the fact that χ is not quadratic, and so
χ² is not the principal character, since Lemma 23 is not applicable to principal
characters.
We also have that, by Lemma 24,
\[ -\Re \frac{L'}{L}(1 + \delta, \chi_0) = \frac{1}{\delta} + O(\log q). \]
Taking a linear combination of these three inequalities, as in the zero-free region
for the zeta function, we deduce that
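\[ \Re\left( -3 \frac{L'}{L}(1 + \delta, \chi_0) - 4 \frac{L'}{L}(1 + \delta + it, \chi) - \frac{L'}{L}(1 + \delta + 2it, \chi^2) \right) \le \frac{3}{\delta} - \frac{4}{1 + \delta - \sigma} + O(\log q\tau). \]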
The reason for this choice of linear combination, as well as the choice for the three
Dirichlet characters, becomes clear when we write the left-hand side as a Dirichlet
series,
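\[ \sum_{\substack{n \ge 1 \\ (n,q)=1}} \frac{\Lambda(n)}{n^{1+\delta}} \Re\left( 3 + 4\chi(n) n^{-it} + \chi(n)^2 n^{-2it} \right). \]
If χ(n)n^{−it} = e^{iθ} then the real part in this is precisely
3 + 4 cos θ + cos(2θ) = 2(1 + cos θ)² ≥ 0. In particular, this is a series with
non-negative summands, and hence
\[ \frac{3}{\delta} - \frac{4}{1 + \delta - \sigma} + O(\log q\tau) \ge 0. \]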
We have a contradiction if δ = c1 / log qτ and σ ≥ 1 − c2 / log qτ for suitably chosen
constants c1 , c2 > 0, and the proof is complete.
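The trigonometric inequality that drives this proof can be verified numerically. The following Python sketch samples 3 + 4 cos θ + cos(2θ) on a grid and compares it with 2(1 + cos θ)², which is what the double angle formula cos(2θ) = 2 cos² θ − 1 gives; the function name and grid size are illustrative.

```python
import math

def trig_polynomial(theta):
    """3 + 4*cos(theta) + cos(2*theta), the real part appearing in the Dirichlet series."""
    return 3 + 4 * math.cos(theta) + math.cos(2 * theta)

# Sample theta on a fine grid of [0, 2*pi] and compare with 2*(1 + cos(theta))**2.
grid = [2 * math.pi * k / 10000 for k in range(10001)]
smallest = min(trig_polynomial(t) for t in grid)
largest_gap = max(abs(trig_polynomial(t) - 2 * (1 + math.cos(t)) ** 2) for t in grid)
print(smallest, largest_gap)
```

The minimum is attained at θ = π, where the expression vanishes, so the coefficients 3, 4, 1 leave no slack in the inequality.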
We have proved a good zero-free region when χ is not a quadratic character.
The quadratic case is much harder (as you might guess from the earlier difficulty
showing even that L(1, χ) ≠ 0 when χ is quadratic), and we can show much less.
Theorem 22. Let χ be a quadratic character modulo q. There exists a constant
c > 0 such that L(s, χ) has
(1) no zeros in the region σ > 1 − c/ log qτ and t ≠ 0, and
(2) at most one real zero 1 − c/ log q < ρ < 1.
We cannot rule out the existence of a real zero of L(s, χ) very close to 1 when
χ is quadratic. These are called exceptional zeros (and similarly χ is called an
exceptional character and q an exceptional modulus).
Proof. Suppose that ρ = σ + it is a zero of L(s, χ) and that t ≠ 0. Let δ > 0 be a
parameter to be chosen later. As before,
\[ -\Re \frac{L'}{L}(1 + \delta + it, \chi) \le -\frac{1}{1 + \delta - \sigma} + O(\log q\tau) \]
and
\[ -\Re \frac{L'}{L}(1 + \delta, \chi_0) \le \frac{1}{\delta} + O(\log q\tau). \]
The key difference is now that χ² = χ0, and so we have the additional term 1/(s − 1)
in the expansion of L′/L. When τ ≥ C(1 − σ) for some suitably large absolute
constant C > 0, this works in our favour. In this case, we have
\[ -\Re \frac{L'}{L}(1 + \delta + 2it, \chi^2) \le \Re \frac{1}{\delta + 2it} + O(\log q\tau) \le \frac{\delta}{\delta^2 + 4t^2} + O(\log q\tau). \]
Taking a linear combination and using non-negativity of the Dirichlet series as
before implies that
\[ 0 \le \frac{3}{\delta} - \frac{4}{1 + \delta - \sigma} + \frac{\delta}{\delta^2 + 4t^2} + O(\log q\tau). \]
If σ = 1 this is a contradiction as δ → 0. Otherwise, we can choose δ = 1 − σ,
and again this is a contradiction unless σ ≤ 1 − c1/ log qτ for some constant c1 > 0.
Observe the importance of t ≠ 0 here in ensuring that the third summand is
≤ (1 − ε)/δ for some small ε > 0.
For small values of τ we require a different argument, and will no longer compare
1 + δ, 1 + δ + it and 1 + δ + 2it. Instead, we will just use 1 + δ and 1 + δ + it, and
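use additionally the observation that since L(ρ̄, χ) is the complex conjugate of
L(ρ, χ) (since χ is quadratic), if ρ is a zero then ρ̄ is also. Since t ≠ 0 these are
two distinct zeros of L, and so
\[ -\Re \frac{L'}{L}(1 + \delta + it, \chi) \le -\Re\left( \frac{1}{1 + \delta - \rho} + \frac{1}{1 + \delta - \bar{\rho}} \right) + O(\log q\tau). \]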
The right-hand side is
\[ \frac{-2(1 + \delta - \sigma)}{(1 + \delta - \sigma)^2 + t^2} + O(\log q\tau). \]
It follows that
\[ -\Re\left( \frac{L'}{L}(1 + \delta, \chi_0) + \frac{L'}{L}(1 + \delta + it, \chi) \right) \le \frac{1}{\delta} + \frac{-2(1 + \delta - \sigma)}{(1 + \delta - \sigma)^2 + t^2} + O(\log q\tau). \]
The left-hand side is
\[ \sum_{\substack{n \ge 1 \\ (n,q)=1}} \frac{\Lambda(n)}{n^{1+\delta}} \Re\left( 1 + \chi(n) n^{it} \right). \]
Again, if χ(n)n^{it} = e^{iθ} then the real part here is 1 + cos θ, which is obviously ≥ 0.
Once again, we obtain a contradiction choosing δ = c1 (1 − σ) if σ ≥ 1 − c2 / log qτ
for suitable c1 , c2 > 0.
It remains to consider the case of real zeros. The previous strategy no longer
works, since ρ and ρ̄ are not distinct zeros. But this idea does allow us to rule out
the existence of more than one such real zero.
The previous theorem shows that for a fixed quadratic character modulo q, there
is at most one exceptional zero. We can say a little more, and show that in fact
amongst all possible characters for fixed q, there is at most one exceptional zero
(and so we can justly talk of ‘the’ exceptional zero of q).
Lemma 25. If χ1 and χ2 are distinct quadratic characters modulo q then L(s, χ1 )L(s, χ2 )
has at most one real zero β with 1 − c/ log q < β < 1.
Proof. Say βi is a real zero of L(s, χi ) for i = 1, 2. Without loss of generality,
5/6 ≤ β1 ≤ β2 < 1. Let δ > 0 be some parameter to be chosen later. We have
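\[ -\Re \frac{L'}{L}(1 + \delta, \chi_i) \le -\frac{1}{1 + \delta - \beta_i} + O(\log q) \]
and
\[ -\Re \frac{L'}{L}(1 + \delta, \chi_1 \chi_2) \le O(\log q), \]
using the fact that χ1χ2 ≠ χ0. Finally, we have
\[ -\Re \frac{L'}{L}(1 + \delta, \chi_0) \le \frac{1}{\delta} + O(\log q). \]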
It follows that
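\[ -\Re\left( \frac{L'}{L}(1 + \delta, \chi_0) + \frac{L'}{L}(1 + \delta, \chi_1) + \frac{L'}{L}(1 + \delta, \chi_2) + \frac{L'}{L}(1 + \delta, \chi_1 \chi_2) \right) \le \frac{1}{\delta} - \frac{2}{1 + \delta - \beta_1} + O(\log q). \]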
The left-hand side is the Dirichlet series
\[ \sum_{\substack{n \ge 1 \\ (n,q)=1}} \frac{\Lambda(n)}{n^{1+\delta}} \Re\left( 1 + \chi_1(n) + \chi_2(n) + \chi_1 \chi_2(n) \right). \]
The term inside the brackets is (1 + χ1 (n))(1 + χ2 (n)) ≥ 0, since both χ1 and χ2
take only the values ±1. If we choose δ = c1 (1 − β1 ) for some constant c1 > 0, then
β1 ≤ 1 − c/ log q for some c > 0 as required.
By the same analysis as for ζ(s), the error term is O(x(log x)²/T) if we choose
σ0 = 1 + 1/ log x. We extend the contour integral to the rectangular contour with
corners at σ0 ± iT and σ1 ± iT where σ1 < 1 is chosen to avoid any (non-exceptional)
zeros on or inside the contour. The error terms from the short sides and the long
left-hand side contribute a total of
\[ O\left( \frac{x (\log x)^2}{T} + x^{\sigma_1} \right) = O\left( x \exp(-c\sqrt{\log x}) \right) \]
if we choose σ1 = 1 − c1/ log qT and T = exp(O(√(log x))).
It remains to note that the integral around the contour is x if χ = χ0
(from the simple pole at s = 1), is 0 if χ ≠ χ0 has no exceptional zeros (since there
are no zeros or poles of L(s, χ) inside the contour), and x^β/β if χ has an exceptional
zero at β.
Using the previous identity, we can immediately deduce the following prime
number theorem for arithmetic progressions.
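Corollary 6. Let (a, q) = 1 and q ≤ exp(O(√(log x))). If q has no exceptional zero
then
\[ \psi(x; q, a) = \frac{x}{\phi(q)} + O\left( x \exp(-c\sqrt{\log x}) \right). \]
If q has an exceptional zero β, with exceptional character χ1, then
\[ \psi(x; q, a) = \frac{x}{\phi(q)} - \frac{\chi_1(a) x^\beta}{\phi(q) \beta} + O\left( x \exp(-c\sqrt{\log x}) \right). \]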
17. Siegel-Walfisz theorem
The prime number theorem we established in the previous section is quite
frustrating in that we have two different results, depending on the existence of an
exceptional zero. One may ask whether, if q is small enough, this obstruction can
be overcome.
The answer is yes, as the following theorem shows.
Theorem 24 (Siegel-Walfisz). For all A > 0, if (a, q) = 1 and q ≤ (log x)^A then
\[ \psi(x; q, a) = \frac{x}{\phi(q)} + O_A\left( x \exp(-c\sqrt{\log x}) \right). \]
This follows immediately from the following similar expression for ψ(x, χ).
Theorem 25. If q ≤ (log x)^A and x is large enough (depending only on A) and if
χ ≠ χ0 then
\[ \psi(x, \chi) = O_A\left( x \exp(-c\sqrt{\log x}) \right). \]
In particular, note that this bound holds regardless of whether or not q has an
exceptional zero at χ. This in turn follows from the following result of Siegel.
Theorem 26. For all ε > 0 there exists C_ε such that if χ is a quadratic character
modulo q and β is a real zero of L(s, χ) then
\[ \beta < 1 - C_\varepsilon q^{-\varepsilon}. \]
Proof. Omitted; the curious student may find a proof in many standard texts on
analytic number theory, such as Davenport’s Multiplicative Number Theory or
Montgomery and Vaughan’s Multiplicative Number Theory.
A curious feature of Siegel’s result is that the constant C_ε is ineffective: that
is, we are not using the notation C_ε to hide some constant that we can’t be bothered
to work out exactly, but there is no way to find from the proof how the constant
depends on ε at all. In turn, this means that the constant in the Siegel-Walfisz
theorem is also ineffective.
Proof that Theorem 26 implies Theorem 25. If χ has no exceptional zero then we
have already proved this. Suppose that χ has an exceptional zero at β. By
Theorem 26, for any ε > 0, we know β < 1 − C_ε q^{−ε}. It follows that
\[ \psi(x, \chi) = O\left( \frac{x^\beta}{\beta} + x \exp(-c\sqrt{\log x}) \right) = x \, O\left( \exp(-C_\varepsilon q^{-\varepsilon} \log x) + \exp(-c\sqrt{\log x}) \right). \]
Using the bound q ≤ (log x)^A the error term here is O(exp(−c√(log x))) as required,
if we choose ε = 1/(3A), say, since then q^{−ε} log x ≥ (log x)^{2/3}.