
ANALYTIC NUMBER THEORY

THOMAS F. BLOOM

These are lecture notes for the Part III lecture course given in Lent Term 2019.
They are meant to be a faithful copy of the material given in lectures, with some
supplementary footnotes and historical notes. The lectures themselves are the guide
for what material is examinable, and any additional material in these printed notes
will be marked as non-examinable. In the case of any doubt, ask the lecturer.
As these are informal lecture notes, I have not given proper references. My main
source when compiling these notes, and the recommended textbook for the course,
is Multiplicative Number Theory by Montgomery and Vaughan.
If you have any questions, concerns, or corrections, please contact the lecturer at
tb634@cam.ac.uk.

What is analytic number theory?


Analytic number theory is the study of the integers using techniques from analysis, both real and complex.
At first sight this may seem paradoxical – how can the continuous methods of
analysis be useful for studying discrete objects? It is remarkable that not only can
they be useful, they are often the most successful techniques that we have. In this
course we will cover a variety of different methods, and see what they lead to.
We will first give some examples of the kinds of questions that analytic number
theory tries to answer. These are usually quantitative questions (e.g. ‘how many’,
or ‘how large’) asked about simple number theoretic objects, particularly prime
numbers.
(1) How many prime numbers are there? One of the first mathematical results
was Euclid’s theorem that there are infinitely many prime numbers. We
can ask for a more precise counting result: if π(x) denotes the number of
primes p satisfying 1 ≤ p ≤ x then can we give an asymptotic formula for
π(x)? The Prime Number Theorem, one of the great accomplishments of
analytic number theory, gives such an asymptotic:
π(x) ∼ x / log x.
This formula, first conjectured (independently) by Legendre and Gauss in
the 18th century, was proved (independently) by Hadamard and de la Vallée
Poussin in 1896, using complex analysis.
(2) How many twin primes are there? That is, primes p such that p + 2 is also
prime. Unlike the primes themselves, it is not even known whether there
are infinitely many such primes, let alone what the precise count is. We can
make a reasonable guess, however – the prime number theorem roughly says
that the ‘probability’ that a number 1 ≤ n ≤ x is prime is about 1/ log x,
1
2 THOMAS F. BLOOM

so the number of pairs n, n + 2 ≤ x which are both prime should be about


π₂(x) ≈ x / (log x)².
Of course, we cannot prove that this is correct – in this course, however,
using sieve methods, we will prove that the implicit upper bound is correct,
giving evidence that this is the right guess.
(3) Given (a, q) = 1, how many primes are there congruent to a modulo q?
Dirichlet was the first to prove that there are infinitely many, in 1837.
Once again, we ask the more refined question as to how many such primes
are in the interval [1, x]. Since all such primes must be in one of the φ(q)
many residue classes modulo q that are coprime to q, we might guess that
the primes are evenly distributed amongst them, so that
π(x; a, q) ≈ (1/φ(q)) · x/log x.
This has been shown to be true for small q by Siegel and Walfisz, and we
will prove this in the final section of the course.

Outline of the course


The course will be divided into four roughly equal parts:
(1) Elementary Techniques
(2) Sieve Methods
(3) Riemann’s zeta function and the prime number theorem
(4) Primes in arithmetic progressions
The first two parts will use only elementary methods – that is, just real analysis,
and no complex analysis. Complex analysis will be used often in the last two parts.
CHAPTER 1

Elementary Techniques

Review of asymptotic notation. We write f(x) = O(g(x)) if there exists some constant C > 0 such that |f(x)| ≤ C|g(x)| for all sufficiently large x. We will also use the Vinogradov notation f ≪ g to denote the same thing (so that f = O(g) and f ≪ g are equivalent).
We write f(x) = o(g(x)) if lim_{x→∞} f(x)/g(x) = 0. We write f ∼ g if lim_{x→∞} f(x)/g(x) = 1.
Observe that
f ∼ g if and only if f = (1 + o(1))g.

1. Arithmetic functions
An arithmetic function is simply a function on the natural numbers1, f : N → R.
An arithmetic function is multiplicative if
f (nm) = f (n)f (m) whenever (n, m) = 1,
and is completely multiplicative if f (nm) = f (n)f (m) for all n, m ∈ N.
An important operation on the space of arithmetic functions is that of multi-
plicative convolution:
f ⋆ g(n) = ∑_{ab=n} f(a)g(b).
If f and g are both multiplicative functions, then so too is f ? g. The most obvious
arithmetic function is the constant function:
1(n) = 1.
We recall the definition of the Möbius function,
µ(n) = (−1)^k if n = p₁ ··· p_k where the p_i are distinct primes, and µ(n) = 0 otherwise (i.e. if n is divisible by a square).
A fundamental relationship is that of Möbius inversion, which says that the Möbius function acts as an inverse to multiplicative convolution:
1 ⋆ f = g if and only if µ ⋆ g = f.
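To make the convolution machinery concrete, here is a small Python sketch (not part of the lectures; all function names are merely illustrative) that computes Dirichlet convolutions naively and checks Möbius inversion for small n.

    # Minimal sketch: Dirichlet convolution and Moebius inversion, checked for small n.
    def divisor_pairs(n):
        """All pairs (a, b) with a*b = n."""
        return [(a, n // a) for a in range(1, n + 1) if n % a == 0]

    def convolve(f, g, n):
        """Dirichlet convolution (f * g)(n) = sum over ab = n of f(a) g(b)."""
        return sum(f(a) * g(b) for a, b in divisor_pairs(n))

    def mu(n):
        """Moebius function via trial division."""
        m, k, p = n, 0, 2
        while p * p <= m:
            if m % p == 0:
                m //= p
                if m % p == 0:      # square factor, so mu(n) = 0
                    return 0
                k += 1
            p += 1
        if m > 1:
            k += 1
        return (-1) ** k

    one = lambda n: 1

    # Moebius inversion: if g = 1 * f then f = mu * g.  Checked with f(n) = n.
    f = lambda n: n
    g = lambda n: convolve(one, f, n)            # g(n) = sum of divisors of n
    for n in range(1, 50):
        assert convolve(mu, g, n) == f(n)

    # mu is the convolution inverse of 1: (1 * mu)(n) = 1 if n = 1, else 0.
    for n in range(1, 50):
        assert convolve(one, mu, n) == (1 if n == 1 else 0)
    print("Moebius inversion verified for n < 50")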
A great deal of analytic number theory is concerned with a deep study of the
distribution of the prime numbers. For this the ‘correct’ way to count primes is
not, as one might expect, the indicator function
1_P(n) = 1 if n is prime, and 0 otherwise,

1For the purposes of this course, 0 is not a natural number.


but instead the von Mangoldt function, which firstly also counts prime powers
p^k, but also counts them not with weight 1, but with weight log p instead:
Λ(n) = log p if n = p^k, and 0 otherwise.
The main reason that this function is much easier to work with than 1_P directly is
the following identity.
Lemma 1.
1 ⋆ Λ(n) = log n and log ⋆ µ(n) = Λ(n).
Proof. The second identity follows from the first by Möbius inversion. To establish
the first, if we let n = p₁^{k₁} ··· p_r^{k_r}, then
1 ⋆ Λ(n) = ∑_{i=1}^{r} ∑_{j=1}^{k_i} log p_i = ∑_{i=1}^{r} log p_i^{k_i} = log n. ∎


2. Summation
A major theme of analytic number theory is understanding the basic arithmetic
functions, particularly how large they are on average, which means understanding ∑_{n≤x} f(n). For example, if f is the indicator function of primes, then this summatory function is precisely the prime counting function π(x).
We say that f has average order g if
∑_{n≤x} f(n) ∼ x g(x).

One of the most useful tools in dealing with summations is partial summation,
which is a discrete analogue of integrating by parts.
Theorem 1 (Partial summation). If an is any sequence of complex numbers and
f : R⁺ → R is such that f′ is continuous, then
∑_{1≤n≤x} a_n f(n) = A(x)f(x) − ∫_1^x A(t)f′(t) dt,
where A(x) = ∑_{1≤n≤x} a_n.
Proof. Let N = ⌊x⌋. Using a_n = A(n) − A(n − 1),
∑_{1≤n≤N} a_n f(n) = ∑_{n=1}^{N} f(n)(A(n) − A(n − 1)) = f(N)A(N) − ∑_{n=1}^{N−1} A(n)(f(n + 1) − f(n)).
We now observe that
∫_n^{n+1} f′(x) dx = f(n + 1) − f(n),
and so, since A(x) is constant for x ∈ [n, n + 1),
∑_{1≤n≤N} a_n f(n) = f(N)A(N) − ∑_{n=1}^{N−1} ∫_n^{n+1} A(x)f′(x) dx,
and the result follows since if N ≤ x < N + 1 then
A(x)f(x) = A(N)f(x) = A(N) ( f(N) + ∫_N^x f′(x) dx ).

This is extremely useful even when the coefficients a_n are identically 1, in which case A(x) = ⌊x⌋ = x + O(1).
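As a quick illustration (not from the notes), the following Python sketch checks the partial summation identity numerically with a_n = 1 and f(t) = 1/t, so that A(t) = ⌊t⌋; the quadrature step and all names are arbitrary choices.

    # Numerical check of partial summation with a_n = 1, f(t) = 1/t, A(t) = floor(t).
    from math import floor

    def lhs(x):
        return sum(1.0 / n for n in range(1, floor(x) + 1))

    def rhs(x, steps=200000):
        # A(x) f(x) - integral_1^x A(t) f'(t) dt, with f'(t) = -1/t^2
        h = (x - 1) / steps
        integral = sum(floor(1 + (i + 0.5) * h) / (1 + (i + 0.5) * h) ** 2
                       for i in range(steps)) * h
        return floor(x) / x + integral

    x = 1000.5
    print(lhs(x), rhs(x))   # the two values agree up to the small quadrature error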

Lemma 2.
∑_{n≤x} 1/n = log x + γ + O(1/x),

where γ = 0.577 · · · is a constant, known as Euler’s constant.

Proof. By partial summation,
∑_{n≤x} 1/n = ⌊x⌋/x + ∫_1^x ⌊t⌋/t² dt
= 1 + ∫_1^x dt/t − ∫_1^∞ {t}/t² dt + ∫_x^∞ {t}/t² dt + O(1/x)
= log x + ( 1 − ∫_1^∞ {t}/t² dt ) + O(1/x).
It remains to note that the second term is a constant, since the integral converges.


It is remarkable how little we understand about Euler’s constant – it is not even


known whether it is irrational or not.

Lemma 3.
∑_{1≤n≤x} log n = x log x − x + O(log x).

Proof. By partial summation


∑_{n≤x} log n = ⌊x⌋ log x − ∫_1^x ⌊t⌋/t dt = x log x − x + O(log x).


3. Divisor function
We now turn our attention to number theory proper, and examine one of the most important arithmetic functions: the divisor function2
τ(n) = 1 ⋆ 1(n) = ∑_{ab=n} 1 = ∑_{d|n} 1.

We first find its average order.


Theorem 2.
∑_{n≤x} τ(n) = x log x + (2γ − 1)x + O(x^{1/2}).

In particular, the average order of τ (n) is log n.


Proof. A first attempt might go as follows:
∑_{n≤x} τ(n) = ∑_{ab≤x} 1 = ∑_{a≤x} ⌊x/a⌋ = x ∑_{a≤x} 1/a + O(x) = x log x + γx + O(x).
The problem is that the second term γx is lost in the error term O(x). To improve the error term we use what is known as the hyperbola method: when summing over pairs (a, b) such that ab ≤ x, we can take the sum over pairs with a ≤ x^{1/2}, plus the sum over pairs with b ≤ x^{1/2}, and then subtract the contribution where max(a, b) ≤ x^{1/2}, which is counted twice.
∑_{ab≤x} 1 = ∑_{a≤x^{1/2}} ⌊x/a⌋ + ∑_{b≤x^{1/2}} ⌊x/b⌋ − ∑_{a,b≤x^{1/2}} 1
= 2x ∑_{a≤x^{1/2}} 1/a − ⌊x^{1/2}⌋² + O(x^{1/2})
= x log x + (2γ − 1)x + O(x^{1/2}). ∎




It is a deep and difficult problem to improve the error term here – the truth is probably O(x^{1/4+ε}), but this is an open problem, and the best known is O(x^{0.3149···}).
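The hyperbola method is easy to see in action numerically. The following Python sketch (illustrative only; the numerical value of γ is hard-coded) computes ∑_{n≤x} τ(n) as 2∑_{a≤√x} ⌊x/a⌋ − ⌊√x⌋² and compares it with the main term of Theorem 2.

    # Dirichlet's hyperbola method in code: the divisor sum up to x,
    # compared with the main term x log x + (2*gamma - 1) x of Theorem 2.
    from math import isqrt, log

    GAMMA = 0.5772156649015329   # Euler's constant

    def divisor_sum(x):
        r = isqrt(x)
        return 2 * sum(x // a for a in range(1, r + 1)) - r * r

    x = 10**7
    exact = divisor_sum(x)
    main = x * log(x) + (2 * GAMMA - 1) * x
    print(exact, main, exact - main)   # the difference is O(sqrt(x))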
We have just shown that the ‘average’ number of divisors of n is log n. The worst
case behaviour can differ dramatically from this average behaviour, however.
Theorem 3. For any n ≥ 1,
τ(n) ≤ n^{O(1/ log log n)}.
In particular, for any ε > 0, τ(n) = O_ε(n^ε).

2Alternative notation used in some places is d(n) or σ₀(n).

Proof. Let 0 < ε < 1/2 be something to be chosen later. Let n = p₁^{k₁} ··· p_r^{k_r}, so that τ(n) = (k₁ + 1) ··· (k_r + 1), and hence
τ(n)/n^ε = ∏_{i=1}^{r} (k_i + 1)/p_i^{k_i ε}.
If p > 2^{1/ε} then
(k + 1)/p^{kε} ≤ (k + 1)/2^k ≤ 1,
and if p ≤ 2^{1/ε} then, using the inequality x + 1/2 ≤ 2^x, valid for all x ≥ 0, and the fact that p ≥ 2,
(k + 1)/p^{kε} ≤ 1/ε.
Trivially bounding the number of primes p ≤ 2^{1/ε} by 2^{1/ε}, it follows that
τ(n) ≤ n^ε ε^{−2^{1/ε}}.
This already shows the inequality τ(n) ≪_ε n^ε for any fixed 0 < ε < 1/2. To be more precise, we choose ε = c/log log n for some small enough constant c > 0, from which the first result follows. ∎

4. Estimates on prime numbers


The prime number theorem is the statement that
π(x) ∼ x/log x,
or, equivalently (we will justify this equivalence soon),
ψ(x) = ∑_{n≤x} Λ(n) ∼ x.

The proof of this is surprisingly involved, and we will return to it later in the course
when we examine the Riemann zeta function.
It is much easier to show, if not an asymptotic formula, at least that this is the
correct rate of growth of the function. This was proved in 1850 by Chebyshev.
Theorem 4 (Chebyshev).
ψ(x) ≍ x.
Proof. We will first prove the lower bound. This relies on the observation that, for
any x ≥ 1, bxc ≤ 2bx/2c + 1.
ψ(x) = ∑_{n≤x} Λ(n)
≥ ∑_{n≤x} Λ(n) ( ⌊x/n⌋ − 2⌊x/2n⌋ )
= ∑_{nm≤x} Λ(n) − 2 ∑_{nm≤x/2} Λ(n)
= ∑_{n≤x} log n − 2 ∑_{n≤x/2} log n.
By Lemma 3,
ψ(x) ≥ x log x − x + O(log x) − 2( (x/2) log(x/2) − x/2 + O(log x) ) = (log 2)x + O(log x).
It follows that, for any c > 0 and x sufficiently large, ψ(x) ≥ (log 2 − c)x, and hence
ψ(x) ≫ x.
For the upper bound, we do something very similar, except we note that for
x ∈ [1, 2) we have equality bxc = 2bx/2c + 1. Furthermore, for any x ≥ 1, we have
the lower bound bxc ≥ 2bx/2c. It follows that
∑_{x/2<n≤x} Λ(n) = ∑_{x/2<n≤x} Λ(n) ( ⌊x/n⌋ − 2⌊x/2n⌋ )
≤ ∑_{n≤x} Λ(n) ( ⌊x/n⌋ − 2⌊x/2n⌋ )
= (log 2)x + O(log x)
by the above calculation. The left hand side is ψ(x) − ψ(x/2), and so we have shown that
ψ(x) − ψ(x/2) ≤ (log 2)x + O(log x).
Using the fact that ψ(x) = 0 for any x ≤ 1,
ψ(x) = ∑_{k=0}^{⌈log₂ x⌉} ( ψ(x/2^k) − ψ(x/2^{k+1}) ) ≤ (2 log 2)x + O((log x)²),
and hence ψ(x) ≪ x as required. ∎
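For a numerical sanity check of Chebyshev's theorem (not part of the course), the Python sketch below computes ψ(x) directly from a prime sieve and prints ψ(x)/x, which stays bounded between positive constants (and is in fact close to 1, as the prime number theorem predicts).

    # psi(x) = sum over prime powers p^k <= x of log p, computed directly.
    from math import log

    def primes_up_to(n):
        sieve = bytearray([1]) * (n + 1)
        sieve[0:2] = b"\x00\x00"
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p:: p] = bytearray(len(sieve[p * p:: p]))
        return [p for p in range(2, n + 1) if sieve[p]]

    def psi(x):
        total = 0.0
        for p in primes_up_to(x):
            pk = p
            while pk <= x:
                total += log(p)
                pk *= p
        return total

    for x in (10**3, 10**4, 10**5, 10**6):
        print(x, psi(x) / x)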


Chebyshev’s estimate is the first non-trivial quantitative information we have
about the primes, and leads to a host of other facts about the primes – rather
surprisingly, not just big-oh behaviour, but precise asymptotic results.
Lemma 4.
π(x) = ψ(x)/log x + O( x/(log x)² ).
In particular, π(x) ≍ x/log x, and π(x) ∼ x/log x if and only if ψ(x) ∼ x.
Proof. We first remove the contribution from prime powers by noting that, if θ(x) = ∑_{p≤x} log p, then
ψ(x) − θ(x) ≤ ∑_{k≥2} ∑_{p≤x^{1/k}} log p ≪ ∑_{k=2}^{⌈log x⌉} x^{1/k} ≪ x^{1/2} log x.
It follows that θ(x) = ψ(x) + O(x^{1/2} log x). We apply partial summation with a_n = Λ(n) if n is prime, and 0 otherwise, and f(n) = 1/log n. This gives
π(x) = ∑_{p≤x} 1 = θ(x)/log x + ∫_2^x θ(t)/(t(log t)²) dt = θ(x)/log x + O( x/(log x)² ). ∎


Lemma 5.
∑_{p≤x} (log p)/p = log x + O(1).

Proof. Recalling that log = 1 ⋆ Λ, and using Lemma 3,
x log x + O(x) = ∑_{n≤x} log n = ∑_{ab≤x} Λ(b) = x ∑_{b≤x} Λ(b)/b + O(ψ(x)).
Using Chebyshev's estimate, this proves that
∑_{n≤x} Λ(n)/n = log x + O(1).
It remains to deal with the contribution from prime powers p^k ≤ x for k ≥ 2, which we bound trivially by
∑_{p≤x^{1/2}} log p ∑_{k≥2} 1/p^k = ∑_{p≤x^{1/2}} (log p)/(p² − p) ≪ 1. ∎

Lemma 6.
∑_{p≤x} 1/p = log log x + b + O(1/log x),

where b is some constant.


Proof. Let A(x) = ∑_{p≤x} (log p)/p = log x + R(x), say, where R(x) = O(1). By partial summation,
∑_{p≤x} 1/p = A(x)/log x + ∫_2^x A(t)/(t(log t)²) dt
= 1 + O(1/log x) + ∫_2^x dt/(t log t) + ∫_2^x R(t)/(t(log t)²) dt
= log log x + 1 − log log 2 + ∫_2^∞ R(t)/(t(log t)²) dt + O(1/log x). ∎

Lemma 7.
∏_{p≤x} (1 − 1/p)^{−1} = c log x + O(1),

where c > 1 is some constant.



Proof. We use log(1 − t) = −∑_{k=1}^{∞} t^k/k to deduce that
log ∏_{p≤x} (1 − 1/p)^{−1} = −∑_{p≤x} log(1 − 1/p)
= ∑_{k=1}^{∞} ∑_{p≤x} 1/(k p^k)
= ∑_{p≤x} 1/p + ∑_{k≥2} ∑_{p≤x} 1/(k p^k)
= log log x + b′ + O(1/log x).
The result follows since e^x = 1 + O(x) for |x| ≤ 1. ∎
It is a little tricky to determine what the constant c in Lemma 7 actually is – it turns out to be e^γ ≈ 1.78···. We can use this fact to point out why the naive probabilistic heuristic can be misleading (and hopefully give some idea why the prime number theorem itself, unlike these simple asymptotics, is hard to prove).
As a heuristic, we might guess that the probability that a given prime number p divides a randomly chosen n is 1/p. Furthermore, we expect that these probabilities should be independent for distinct primes p. Using the fact that n ≥ 3 is prime if and only if p ∤ n for all 2 ≤ p ≤ n^{1/2}, we might guess that
1_{n is prime} ≈ P(p ∤ n for all 2 ≤ p ≤ n^{1/2}) ≈ ∏_{p≤n^{1/2}} (1 − 1/p) ≈ 2e^{−γ}/log n.

This would in turn suggest that


π(x) = ∑_{n≤x} 1_{n is prime} ≈ 2e^{−γ} ∑_{n≤x} 1/log n ≈ 2e^{−γ} x/log x.
But since 2e^{−γ} = 1.12···, this contradicts the prime number theorem! This shows
that, while heuristically thinking about discrete concepts in terms of ‘probability’
can lead to roughly the right order of magnitude, one must take care not to take
the constants obtained too seriously!
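The constants in Lemmas 6 and 7 can also be seen emerging numerically. The following Python sketch (illustrative only; the value of γ is hard-coded) computes ∑_{p≤x} 1/p − log log x and ∏_{p≤x}(1 − 1/p)^{−1}/log x at x = 10^6; the latter is already close to e^γ ≈ 1.781.

    # Mertens' theorems numerically: the constant b of Lemma 6 and e^gamma of Lemma 7.
    from math import log, exp

    GAMMA = 0.5772156649015329

    def primes_up_to(n):
        sieve = bytearray([1]) * (n + 1)
        sieve[0:2] = b"\x00\x00"
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p:: p] = bytearray(len(sieve[p * p:: p]))
        return [p for p in range(2, n + 1) if sieve[p]]

    x = 10**6
    ps = primes_up_to(x)
    recip_sum = sum(1.0 / p for p in ps)
    prod_ratio = 1.0
    for p in ps:
        prod_ratio *= 1.0 / (1.0 - 1.0 / p)
    print(recip_sum - log(log(x)))           # tends to the constant b of Lemma 6
    print(prod_ratio / log(x), exp(GAMMA))   # both close to e^gamma = 1.781...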
Indeed, we can use the elementary estimates already obtained to show, not the prime number theorem itself, but at least the fact that if π(x)(log x)/x converges to a limit at all, then this limit must be 1, and hence the prime number theorem is true.
The hard part is showing that the limit exists.
Theorem 5 (Chebyshev). If π(x) ∼ c x/log x then c = 1.
Proof. By partial summation,
∑_{p≤x} 1/p = π(x)/x + ∫_1^x π(t)/t² dt.
The first term is trivially O(1). If π(x) = c(1 + R(x)) x/log x, where R(x) → 0 as x → ∞, then
∑_{p≤x} 1/p = c ∫_1^x (1 + R(t))/(t log t) dt + O(1) = c(1 + o(1)) log log x + O(1).
By Lemma 6 the left hand side is (1 + o(1)) log log x, and hence c = 1. ∎

5. Selberg’s elementary approach


Let
Λ₂(n) = µ ⋆ (log)²(n) = ∑_{ab=n} µ(a)(log b)².

Why have we defined such a function? We will prove another useful identity for
Λ2 , which should also explain this peculiar choice of notation.

Lemma 8.
Λ₂(n) = Λ(n) log n + Λ ⋆ Λ(n).
In particular, 0 ≤ Λ₂(n) ≤ (log n)², and if Λ₂(n) ≠ 0 then n has at most two
distinct prime divisors.

Proof. We will prove this identity using Möbius inversion, so that it is enough to
show that
∑_{d|n} ( Λ(d) log d + Λ ⋆ Λ(d) ) = (log n)².
To this end, we write the left hand side as
∑_{d|n} Λ(d) log d + ∑_{d|n} ∑_{ab=d} Λ(a)Λ(b) = ∑_{d|n} Λ(d) log d + ∑_{a|n} Λ(a) ∑_{b|n/a} Λ(b)
= ∑_{d|n} Λ(d) log d + ∑_{a|n} Λ(a) log(n/a)
= log n ∑_{d|n} Λ(d)
= (log n)².
This concludes the proof of the identity. From this identity it is obvious that Λ₂(n) ≥ 0, since both terms on the right hand side are non-negative for all n ≥ 1. Furthermore, if n has at least three distinct prime divisors then the first term is zero, as is the second, since in any decomposition ab = n at least one of a or b has at least two distinct prime divisors, whence Λ(a)Λ(b) = 0.
Finally, since ∑_{d|n} Λ₂(d) = (log n)², and Λ₂(d) ≥ 0 for all d, it follows that Λ₂(n) ≤ (log n)² for all n. ∎

Theorem 6.
∑_{n≤x} Λ₂(n) = 2x log x + O(x).

Proof. We have
∑_{n≤x} Λ₂(n) = ∑_{a≤x} µ(a) ∑_{b≤x/a} (log b)².
By partial summation,
∑_{n≤x} (log n)² = x(log x)² − 2x log x + 2x + O((log x)²).

To handle this, we first establish by partial summation that
∑_{n≤x} τ(n)/n = ½(log x)² + c log x + c′ + O(x^{−1/2})
= ½(log x)² + (c₁/x) ∑_{n≤x} τ(n) + c₂ + O(x^{−1/2}).
It follows that the main sum is
= 2 ∑_{a≤x} µ(a) ( ∑_{b≤x/a} (x/(ab)) τ(b) − c₁ ∑_{b≤x/a} τ(b) − c₂ (x/a) + O((x/a)^{1/2}) ).
The error term here is
≪ ∑_{a≤x} (x/a)^{1/2} ≪ x.
The third term is
≪ ∑_{a≤x} µ(a) (x/a) = ∑_{a≤x} µ(a) ⌊x/a⌋ + O(x) = ∑_{a≤x} µ(a) ∑_{b≤x/a} 1 + O(x) = ∑_{n≤x} µ ⋆ 1(n) + O(x) ≪ x.
The second term is
≪ ∑_{a≤x} µ(a) ∑_{bc≤x/a} 1 = ∑_{b≤x} ∑_{d≤x/b} 1 ⋆ µ(d) ≪ x.
Finally, the first term is
2x ∑_{n≤x} (µ ⋆ τ)(n)/n = 2x ∑_{n≤x} 1/n = 2x log x + O(x). ∎
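As a sanity check on Selberg's identity (again, not part of the lectures), the Python sketch below computes ∑_{n≤N} Λ₂(n) using Λ₂ = Λ·log + Λ⋆Λ and compares it with 2N log N; the smallest-prime-factor sieve is just one convenient way to tabulate Λ.

    # Numerical check of Theorem 6: sum of Lambda_2(n) for n <= N vs 2 N log N.
    from math import log

    def von_mangoldt_table(N):
        """Lam[n] = log p if n is a power of the prime p, else 0."""
        spf = list(range(N + 1))                  # smallest prime factor
        for p in range(2, int(N ** 0.5) + 1):
            if spf[p] == p:
                for m in range(p * p, N + 1, p):
                    if spf[m] == m:
                        spf[m] = p
        Lam = [0.0] * (N + 1)
        for n in range(2, N + 1):
            p, m = spf[n], n
            while m % p == 0:
                m //= p
            if m == 1:                            # n is a prime power p^k
                Lam[n] = log(p)
        return Lam

    N = 500000
    Lam = von_mangoldt_table(N)
    psi = [0.0] * (N + 1)
    for n in range(1, N + 1):
        psi[n] = psi[n - 1] + Lam[n]

    total = sum(Lam[n] * log(n) for n in range(2, N + 1))        # Lambda(n) log n part
    total += sum(Lam[a] * psi[N // a] for a in range(2, N + 1))  # (Lambda * Lambda) part
    print(total, 2 * N * log(N))                                  # close, error O(N)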

A fourteen point plan to proving the prime number theorem


(NON-EXAMINABLE)
In this section we give the main points on how to go from Selberg’s identity to
a full elementary proof of the prime number theorem. You are encouraged to try
and follow this outline yourself and try to prove each of the points below, thereby
producing a full proof. This will hopefully instill a proper respect for the power of
elementary methods, and how involved they can get. Everything in this section is
non-examinable, however.
We start by introducing some convenient notation. Let
r(x) = ψ(x)/x − 1.
The prime number theorem is equivalent to lim_{x→∞} |r(x)| = 0.
(1) Deduce from Selberg's identity that
r(x) log x = −∑_{n≤x} (Λ(n)/n) r(x/n) + O(1).
(2) By considering this identity with x replaced by x/m, show that
|r(x)| (log x)² ≤ ∑_{n≤x} (Λ₂(n)/n) |r(x/n)| + O(log x).
(3) Show that
∑_{n≤x} Λ₂(n) = 2 ∫_1^{⌊x⌋} log t dt + O(x).
(4) Show that
∑_{n≤x} (Λ₂(n)/n) |r(x/n)| = 2 ∑_{2≤n≤x} (1/n) |r(x/n)| ∫_{n−1}^{n} log t dt + O(log x).
(5) Show that
∑_{2≤n≤x} (1/n) |r(x/n)| ∫_{n−1}^{n} log t dt = ∫_1^x |r(x/t)| (log t)/t dt + O(log x).
(6) Deduce that
∑_{n≤x} (Λ₂(n)/n) |r(x/n)| = 2 ∫_1^x |r(x/t)| (log t)/t dt + O(log x).
(7) Now let V(u) = r(e^u). Show that
u² |V(u)| ≤ 2 ∫_0^u ∫_0^v |V(t)| dt dv + O(u).
(8) Let α = lim sup |V(u)| (so that the prime number theorem is equivalent to α = 0). Show that
α ≤ lim sup (1/u) ∫_0^u |V(t)| dt.
(9) Show that, for every u < v,
∫_u^v V(t) dt ≪ 1.
(10) Show that if u > 0 and V(u) = 0 then
∫_0^α |V(u + t)| dt ≤ α²/2 + O(1/u).
(11) Let δ > α and consider the behaviour of V(t) for t ∈ [u, u + δ − α]. Show that either V(t) = 0 for some t in this interval, or else V(t) changes sign at most once.
(12) If V(t) = 0 for some t ∈ [u, u + δ − α], show that
∫_u^{u+δ} |V(t)| dt ≤ α(δ − α/2) + o(1).
(13) If V(t) changes sign just once in [u, u + δ − α], show that
∫_u^{u+δ} |V(t)| dt < α² + O(1).
(14) If α > 0, by choosing δ suitably, show that
lim sup (1/u) ∫_0^u |V(t)| dt < α.
In particular, this contradicts (8) above, and hence α = 0 and we have proved the prime number theorem.
CHAPTER 2

Sieve methods

The idea of sieve theory is to start with some set of integers (usually an interval)
and remove, or ‘sift out’, those integers divisible by some set of primes. The classic
example is the Sieve of Eratosthenes: beginning with the set of integers {1, . . . , n}, and removing all integers divisible by some prime p ≤ n^{1/2}, we are left with only
primes. By counting how many removals took place, we aim to find estimates for
the count of what is left.

6. Preliminaries
We first introduce some important notation – we do this in some generality, for
part of the virtue of sieve theory is that the same tools can be applied to study
many different problems. Our basic data will be:
• a finite set A, the set which is to be sifted;
• a set of primes P , which we will sift by (usually this is just the set of all
primes);
• a sifting limit z, which is an upper bound on the size of the primes we sieve by;
• the sifting function
S(A, P; z) = ∑_{n∈A} 1_{(n,P(z))=1},
where P(z) = ∏_{p∈P, p<z} p, which counts those integers in A left after the sieve;
• we are able to count using sieve theory only if we are able to count A under various divisibility conditions – if
A_d = {n ∈ A : d | n}
then we suppose that
|A_d| = (f(d)/d) X + R_d,
where f(d) ≥ 0 is some multiplicative function and R_d is thought of as an error term. In general, we will only need this estimate for squarefree d. By convention we suppose that f(p) = 0 if p ∉ P;
• finally, the expected density is defined as
W(z) = W_P(z) = ∏_{p∈P, p<z} (1 − f(p)/p).
We note that
|A| = |A₁| = (f(1)/1) X + R₁ = X + R₁,

so that X is an approximation to the size of our original set. We demonstrate this


with two examples.
Example 1: The most basic example is sifting an interval by all primes. Here
A = {x < n ≤ x + y}, so that
|A_d| = ⌊(x + y)/d⌋ − ⌊x/d⌋ = y/d + O(1)
for all d. Thus we let f ≡ 1 and then this gives us an error bound of R_d ≪ 1, and X = y is a smooth count of how many integers are in A. The sifting function is
S(A, P; z) = #{x < n ≤ x + y : p | n ⟹ p ≥ z}.
For example, if x = 0 and z = y^{1/2}, then
S(A, P; z) = π(y) − π(y^{1/2}) + 1.

Example 2: We can also use sieve methods to count the number of twin primes.
For this we let A = {n(n + 2) : 1 ≤ n ≤ x} and P be the set of all primes except 2.
We will show later that if d is odd and square-free then

|A_d| = (2^{ω(d)}/d) x + O(2^{ω(d)}),
so that we can take f to be the multiplicative function defined by f(2) = 0 and f(p) = 2 for all odd primes. The sifting function can be used to count the number of twin primes, since if either n or n + 2 is composite then it must be divisible by a prime at most x^{1/2}, and so (taking z = x^{1/2})
S(A, P; z) = π₂(x) + O(x^{1/2}).

The most basic sieve is the sieve of Eratosthenes which counts the sifting function
by an inclusion-exclusion argument. That is, we first subtract from the count all
integers divisible by a single sifted prime, then add back all those divisible by two
primes, and so on. In analytic language this can be cleanly expressed using the
identity
∑_{d|n} µ(d) = 1 if n = 1, and 0 otherwise.

Theorem 7 (Sieve of Eratosthenes–Legendre).
S(A, P; z) = XW(z) + O( ∑_{d|P(z)} |R_d| ).

Proof.
S(A, P; z) = ∑_{n∈A} ∑_{d|(n,P(z))} µ(d)
= ∑_{d|P(z)} µ(d) ∑_{n∈A} 1_{d|n}
= ∑_{d|P(z)} µ(d) |A_d|
= X ∑_{d|P(z)} µ(d)f(d)/d + O( ∑_{d|P(z)} |R_d| )
= X ∏_{p∈P, p<z} (1 − f(p)/p) + O( ∑_{d|P(z)} |R_d| ). ∎


We can apply this to the interval as in Example 1 to get an upper bound for the number of primes in an interval. If we take P(z) = ∏_{p≤z} p then
W(z) = ∏_{p≤z} (1 − 1/p) ≍ 1/log z
by Lemma 7. Choosing z = log y yields the following, which is striking in that the upper bound is completely independent of x.

Corollary 1. For any x, y ≥ 1,
π(x + y) − π(x) ≪ y/log log y.

Proof. Let A = {x < n ≤ x + y} and P be the set of all primes. Then
W(z) = ∏_{p<z} (1 − 1/p) ≪ 1/log z
and R_d = O(1). It follows that
∑_{d|P(z)} |R_d| ≪ τ(P(z)) ≪ 2^z
and so
#{x < n ≤ x + y : if p | n then p ≥ z} ≪ y/log z + 2^z.
Choosing z = log y means that the right hand side is ≪ y/log log y. We are done noting that any primes in (x, x + y] must survive this sieving process (unless they are ≪ log y themselves, which introduces an error of only O(log y)). ∎
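The Eratosthenes–Legendre sieve is short enough to implement directly. The Python sketch below (illustrative only; the parameters x, y, z are arbitrary small choices, and z must be kept small since the inclusion–exclusion runs over 2^{π(z)} terms) computes S(A, P; z) for the interval A = (x, x+y] and checks it against a brute-force count.

    # Eratosthenes--Legendre sieve for an interval, by inclusion-exclusion.
    from itertools import combinations
    from math import prod

    def primes_up_to(n):
        sieve = bytearray([1]) * (n + 1)
        sieve[0:2] = b"\x00\x00"
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p:: p] = bytearray(len(sieve[p * p:: p]))
        return [p for p in range(2, n + 1) if sieve[p]]

    def sifted_count(x, y, z):
        """#{x < n <= x+y : p | n  =>  p >= z}, via inclusion-exclusion."""
        ps = primes_up_to(z - 1)
        total = 0
        for r in range(len(ps) + 1):
            for combo in combinations(ps, r):
                d = prod(combo)
                total += (-1) ** r * ((x + y) // d - x // d)   # |A_d|
        return total

    def brute_force(x, y, z):
        return sum(1 for n in range(x + 1, x + y + 1)
                   if all(n % p != 0 for p in range(2, z)))

    x, y, z = 10**6, 5000, 12
    print(sifted_count(x, y, z), brute_force(x, y, z))   # the two counts agree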

7. Selberg’s sieve
The main weakness of the sieve of Eratosthenes is the large error term, which
forces us to take z quite small. This is because we have to sift by all d dividing
P (z), the number of which grow exponentially with z. We can instead consider
what happens if we only sieve up to some limit, so only considering the effect of

those d < D. We must then abandon any hope of getting an exact formula, but we
can hope that we’re still counting the main terms, and thus might get useful upper
and lower bounds for the sifting function – and now with much smaller error terms.
One of the most successful sieves was developed by Selberg. We will return to
lower bound sieves later on, and for now just focus on upper bounds. The key input
in the sieve of Eratosthenes was the replacement of
1_{(n,P(z))=1} by ∑_{d|(n,P(z))} µ(d).

If we are content with an upper bound then we’d be happy instead replacing it
with any function F (n) such that F (1) ≥ 1 and F (n) ≥ 0 for all n. The starting
point for Selberg’s sieve is the trivial observation that any sequence of real numbers
λ_d ∈ R such that λ₁ = 1 can lead to such a function, since
( ∑_{d|n} λ_d )² ≥ 1 if n = 1, and ≥ 0 if n > 1.

We have complete freedom to choose λd to be whatever weights we wish (possibly


depending on A and P ) to make the right-hand side as small as possible, limited
only by the restriction λ1 = 1.
Before giving Selberg’s sieve, we first make the assumption that

0 ≤ f (p) < p for all p,

and also that f(p) ≠ 0 for p ∈ P. This allows us to make the useful definition of a multiplicative function g such that for all primes p,
g(p) = (1 − f(p)/p)^{−1} − 1 = f(p)/(p − f(p)),
extending this definition to all square-free d multiplicatively. Note that
∑_{d|P(z)} g(d) = ∏_{p∈P, p<z} (1 + g(p)) = 1/W(z).

Theorem 8 (Selberg’s sieve).

S(A, P; z) ≤ X/G(t, z) + ∑_{d|P(z), d<t²} 3^{ω(d)} |R_d|
for all t ≥ 1, where
G(t, z) = ∑_{d|P(z), d<t} g(d).

Proof. Let λd ∈ R be some sequence to be chosen later, with the only restriction
that λ₁ = 1. By the observation above, using the notation [d, e] for the least common multiple of d and e,
S(A, P; z) = ∑_{n∈A} 1_{(n,P(z))=1}
≤ ∑_{n∈A} ( ∑_{d|(n,P(z))} λ_d )²
= ∑_{d,e|P(z)} λ_d λ_e ∑_{n∈A} 1_{[d,e]|n}
= XV + R,

say, where
V = ∑_{d,e|P(z)} λ_d λ_e f([d, e])/[d, e]
and
R = ∑_{d,e|P(z)} λ_d λ_e R_{[d,e]}.

We will first examine the main term V . If we write [d, e] = abc where c = (d, e)
and d = ac and e = bc, so that (a, b) = (b, c) = (a, c) = 1, then using the fact that
f is multiplicative,
V = ∑_{c|P(z)} (f(c)/c) ∑_{ab|P(z)/c, (a,b)=1} λ_{ac} λ_{bc} f(a)f(b)/(ab)
= ∑_{c|P(z)} (f(c)/c) ∑_{ab|P(z)/c} λ_{ac} λ_{bc} (f(a)f(b)/(ab)) ∑_{d|(a,b)} µ(d)
= ∑_{c|P(z)} (f(c)/c) ∑_{d|P(z)} µ(d) ( ∑_{d|m|P(z)/c} λ_{cm} f(m)/m )²
= ∑_{d|P(z)} µ(d) ∑_{c|P(z)/d} (c/f(c)) ( ∑_{cd|n|P(z)} λ_n f(n)/n )²
= ∑_{m|P(z)} ( ∑_{c|m} µ(m/c) c/f(c) ) ( ∑_{m|n|P(z)} λ_n f(n)/n )².

For primes p ∈ P , we note that


1 + 1/g(p) = 1 + (p − f(p))/f(p) = p/f(p).
In particular, the identity
∑_{d|n} 1/g(d) = n/f(n)
holds for all primes n ∈ P and thus by multiplicativity also for all n | P(z). It follows by Möbius inversion that
∑_{c|m} µ(m/c) c/f(c) = 1/g(m).

It's convenient to introduce the sifting limit t at this point, so we will assume that λ_d is chosen so that λ_d = 0 whenever d ≥ t. Then
V = ∑_{m|P(z)} (1/g(m)) ( ∑_{m|n|P(z)} λ_n f(n)/n )² = ∑_{k|P(z), k<t} y_k²/g(k),
where y_k = ∑_{k|n|P(z)} λ_n f(n)/n,

say. Since we are trying to produce an upper bound here, we want to choose λd
(and hence yk ) such that V is as small as possible.
What is the relationship between λd and yk ? First note that yk = 0 for k ≥ t.
For fixed d,
∑_{k|P(z)} µ(k) y_k 1_{d|k} = ∑_{k|P(z)} µ(k) ∑_{e|P(z)} (f(e)λ_e/e) 1_{k|e} 1_{d|k}
= ∑_{e|P(z)} (f(e)λ_e/e) 1_{d|e} µ(d) ∑_{k|e/d} µ(k)
= µ(d) f(d)λ_d/d.
In particular,
∑_{k|P(z)} µ(k) y_k = λ₁ = 1.
Since
1 = ∑_{k|P(z), k<t} µ(k) y_k = ∑_{k|P(z), k<t} µ(k)g(k)^{1/2} · y_k/g(k)^{1/2},
by the Cauchy–Schwarz inequality, it follows that
1 ≤ ( ∑_{k|P(z), k<t} y_k²/g(k) ) ( ∑_{k|P(z), k<t} g(k) ) = G(t, z) V.

In particular, V ≥ G(t, z)^{−1}. Furthermore, we can achieve equality if there is some constant c such that
y_k = c µ(k) g(k).
Using λ₁ = 1 as above again, we must have c = G(t, z)^{−1}. This choice determines all y_k (for k < t; for k ≥ t we choose y_k = 0), and hence all λ_d. The only thing left to do is to control the error term. We will show that |λ_d| ≤ 1. The exact expression for λ_d for d < t is
λ_d = µ(d) (g(d)d/f(d)) (G_d/G)
f (d) G
where G = G(t, z) and
G_d = ∑_{e|P(z), e<t/d, (e,d)=1} g(e).

We now note that
G = ∑_{e|P(z), e<t} g(e)
= ∑_{k|d} ∑_{e|P(z), e<t, (e,d)=k} g(e)
= ∑_{k|d} g(k) ∑_{m|P(z), m<t/k, (m,d)=1} g(m)
≥ G_d ∑_{k|d} g(k)
= G_d g(d)d/f(d).

It follows that |λ_d| ≤ 1. Finally,
R ≤ ∑_{d,e|P(z)} |λ_d λ_e R_{[d,e]}| ≤ ∑_{n|P(z), n<t²} |R_n| ∑_{[d,e]=n} 1,
and
∑_{[d,e]=n} 1 ≤ 3^{ω(n)}. ∎

We first give an application to the problem considered in the previous section –


deriving an upper bound for π(x + y) − π(x) uniform in x.

Corollary 2. For any x, y ≥ 2,

π(x + y) − π(x) ≪ y/log y.

Proof. As before, we take A = {x < n ≤ x + y} and P to be the set of all primes,


but now we apply Selberg's sieve. To this end, observe that since f(p) = 1 for all p, we have g(p) = 1/(p − 1). We now estimate
G(z, z) = ∑_{p₁···p_h<z} ∏_{i=1}^{h} (p_i − 1)^{−1}
= ∑_{p₁···p_h<z} ∏_{i=1}^{h} ( 1/p_i + 1/p_i² + ··· )
= ∑_{p₁···p_h<z} ∑_{k_i≥1} 1/(p₁^{k₁} ··· p_h^{k_h})
≥ ∑_{n<z} 1/n
≫ log z.
Furthermore, since 3^{ω(d)} ≤ τ(d)^{log 3/log 2} ≪_ε d^ε for any squarefree d, as in the estimate for the divisor function, the error term is ≪ z^{2+ε}. In total then,
S(A, P; z) ≪ y/log z + z^{2+ε},
and the proof is complete, choosing z = y^{1/3}, say. ∎

The power of Selberg’s sieve is made most evident by the following application,
which gives an upper bound for the number of twin primes of the correct order of
magnitude.

Theorem 9 (Brun). Let π2 (x) count the number of n ≤ x such that both n and
n + 2 are prime. We have

π₂(x) ≪ x/(log x)².

Proof. We apply Selberg’s sieve with A = {n(n + 2) : 1 ≤ n ≤ x} and P being the


set of all primes except 2. We note that

|Ad | = #{1 ≤ n ≤ x : d | n(n + 2)}.

If d is odd and square-free, say d = p₁ ··· p_r, then d | n(n + 2) if and only if n ≡ 0 or −2 (mod p_i) for all 1 ≤ i ≤ r. By the Chinese Remainder Theorem this is true if and only if n lies in one of 2^{ω(d)} many residue classes modulo d. It follows that
|A_d| = (2^{ω(d)}/d) x + O(2^{ω(d)}),

so that f(2) = 0 and f(p) = 2 otherwise. In particular, g(p) = 2/(p − 2) ≥ 2/(p − 1), and so g(d) ≥ 2^{ω(d)}/φ(d) for all odd square-free d. We thus have
G(z, z) ≥ ∑_{d<z, d odd and square-free} 2^{ω(d)}/φ(d)
≥ ∑_{p₁···p_h<z} ∏_{i=1}^{h} 2/(p_i − 1)
= ∑_{p₁···p_h<z} ∏_{i=1}^{h} ( 2/p_i + 2/p_i² + ··· )
≥ ∑_{n<z} 2^{ω(n)}/n.
We claim that the right hand side is ≫ (log z)². By partial summation it suffices to show that
∑_{d<z} 2^{ω(d)} ≫ z log z.
For this we use the identity ∑_{d²m=n} µ(d)τ(m) = 2^{ω(n)}, which is true since both sides are multiplicative and it is easily checked for prime powers. We therefore have, using that ∑_{m<y} τ(m) = y log y + O(y),
∑_{d<z} 2^{ω(d)} = ∑_{d²<z} µ(d) ∑_{m<z/d²} τ(m)
= z log z ∑_{d²<z} µ(d)/d² − z ∑_{d²<z} µ(d)(log d²)/d² + O( z ∑_{d²<z} 1/d² )
= z log z ∑_{d} µ(d)/d² + O( z + z log z ∑_{d>z^{1/2}} 1/d² )
= c z log z + O(z),
where c = ∑_{n=1}^{∞} µ(n)/n² > 0 is some constant. This is ≫ z log z as required. Finally, we note that the error term is bounded above by
∑_{d<z²} 6^{ω(d)} ≪ z^{2+ε}
for any ε > 0, since 6^{ω(d)} ≪_ε d^ε. It follows that, for any z,
S(A, P; z) ≪ x/(log z)² + z³,
say. If we choose z = x^{1/4}, say, then the right hand side is ≪ x/(log x)², and the left-hand side is at least π₂(x) + O(x^{1/4}), and the result follows. ∎
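For comparison with Brun's bound, the following Python sketch (illustrative only) counts twin primes up to 10^6 and prints the guess x/(log x)²; both quantities are of the same order of magnitude, as the theorem and the heuristic from the introduction suggest.

    # Counting twin primes numerically and comparing with x/(log x)^2.
    from math import log

    def prime_sieve(n):
        sieve = bytearray([1]) * (n + 1)
        sieve[0:2] = b"\x00\x00"
        for p in range(2, int(n ** 0.5) + 1):
            if sieve[p]:
                sieve[p * p:: p] = bytearray(len(sieve[p * p:: p]))
        return sieve

    x = 10**6
    sieve = prime_sieve(x + 2)
    pi2 = sum(1 for n in range(2, x + 1) if sieve[n] and sieve[n + 2])
    print(pi2, x / log(x) ** 2)   # same order of magnitude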


8. Combinatorial sieve
Lemma 9 (Buchstab formula).
S(A, P; z) = |A| − ∑_{p|P(z)} S(A_p, P; p).

Proof. The left-hand side counts the number of elements in


S = {n ∈ A : if p | n and p ∈ P then p ≥ z}.
If we let
S_p = {n ∈ A : p | n, and if q | n and q ∈ P then q ≥ p}
then it is clear that S_p is disjoint from S if p < z, and furthermore S_{p₁} and S_{p₂} are disjoint if p₁ ≠ p₂. Finally, if n ∈ A then either n ∈ S or n ∈ S_p where p = ℓ(n), the least prime divisor of n. It follows that
A = S ⊔ ⊔_{p∈P, p<z} S_p,

and the result follows by equating the cardinalities of both sides. 

By the same reasoning we can obtain a similar formula for the main term W(z) = ∏_{p∈P, p<z} (1 − f(p)/p).

Lemma 10.
W(z) = 1 − ∑_{p|P(z)} (f(p)/p) W(p).

Corollary 3. For any r ≥ 1,


S(A, P; z) = ∑_{d|P(z), ω(d)<r} µ(d) |A_d| + (−1)^r ∑_{d|P(z), ω(d)=r} S(A_d, P; ℓ(d)),
where ℓ(d) denotes the least prime divisor of d. Similarly,
W(z) = ∑_{d|P(z), ω(d)<r} µ(d) f(d)/d + (−1)^r ∑_{d|P(z), ω(d)=r} (f(d)/d) W(ℓ(d)).

Proof. This follows by iteratively applying Buchstab’s identity. We use induction


on r, noting that the case r = 1 is precisely Buchstab’s identity. In general, for
fixed d | P (z), with ω(d) = r, by Buchstab’s identity
S(A_d, P; ℓ(d)) = |A_d| − ∑_{p∈P, p<ℓ(d)} S(A_{pd}, P; p).

Since p < ℓ(d), the numbers pd are all square-free and ℓ(pd) = p. Furthermore, as d ranges over all divisors of P(z) with ω(d) = r, these pd range over all divisors of P(z) with ω(d′) = r + 1. The general form follows by induction, since µ(d) = (−1)^r. ∎
In particular,
S(A, P; z) ≥ ∑_{d|P(z), ω(d)<r} µ(d) |A_d|

if r is odd, and similarly this holds with an upper bound if r is even. We can
use this formula to truncate the Sieve of Eratosthenes at any level r, and examine
the remainder term. This leads us to Brun’s pure sieve, the simplest form of the
combinatorial sieve.
Theorem 10 (Brun's pure sieve). If r ≥ 6|log W(z)| then
S(A, P; z) = XW(z) + O( 2^{−r} X + ∑_{d|P(z), d≤z^r} |R_d| ).

Proof. We note the trivial bounds
0 ≤ S(A_d, P; ℓ(d)) ≤ |A_d| = (f(d)/d) X + R_d.
Therefore, by Corollary 3,
S(A, P; z) = X ∑_{d|P(z), ω(d)<r} µ(d) f(d)/d + O( X ∑_{d|P(z), ω(d)=r} f(d)/d + ∑_{d|P(z), ω(d)≤r} |R_d| ).
Also by Corollary 3,
W(z) = ∑_{d|P(z), ω(d)<r} µ(d) f(d)/d + O( ∑_{d|P(z), ω(d)=r} f(d)/d ),
and so
S(A, P; z) = XW(z) + O( X ∑_{d|P(z), ω(d)=r} f(d)/d + ∑_{d|P(z), ω(d)≤r} |R_d| ).
We now note that
∑_{d|P(z), ω(d)=r} f(d)/d ≤ ( ∑_{p|P(z)} f(p)/p )^r / r! ≤ ( e ∑_{p|P(z)} f(p)/p / r )^r.
Furthermore,
∑_{p|P(z)} f(p)/p ≤ ∑_{p|P(z)} −log(1 − f(p)/p) = −log W(z).
If −log W(z) ≤ r/2e, then the right-hand side above is ≤ 2^{−r}. The final error term we bound crudely by noting that if d | P(z) and ω(d) ≤ r then certainly d ≤ z^r, since it is the product of at most r primes, all less than z. ∎
For example, if W(z) = ∏_{p<z} (1 − 1/p) ≈ 1/log z, then we can take r = O(log log z), so the level z^r is e^{O(log z log log z)}, compared to 2^z from the Sieve of Eratosthenes.

  
Corollary 4. For any z ≤ exp( o(log x/log log x) ) such that z → ∞ with x we have the asymptotic formula
#{1 ≤ n ≤ x : if p | n then p ≥ z} ∼ e^{−γ} x/log z.

Proof. We apply Brun's pure sieve to A = {1 ≤ n ≤ x} and P the set of all primes. As we have shown in the previous chapter,
W(z) = ∏_{p≤z} (1 − 1/p) = (1 + o(1)) e^{−γ}/log z.
In particular, if we take x large enough, then r = 100⌈log log z⌉ will satisfy the lower bound in Brun's sieve. For this value of r,
2^{−r} x ≤ 2^{−100 log log z} x ≤ x/(log z)²,
say. Furthermore, since |R_d| ≪ 1, the other error term is
≪ ∑_{d≤z^r} 1 ≤ z^r ≤ exp(200 log z log log z).
If log z = o(log x/log log x) then, for x sufficiently large, this is at most x^{1/2}, say. It follows that
S(A, P; z) = (1 + o(1)) e^{−γ} x/log z + O( x/(log z)² + x^{1/2} ) = (1 + o(1)) e^{−γ} x/log z,
since z → ∞ as x → ∞. ∎
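Corollary 4 can be illustrated numerically. The Python sketch below (illustrative; γ is hard-coded, and the agreement is only rough because z = 20 is far from the asymptotic regime) counts the z-rough integers up to x and compares with e^{−γ} x/log z.

    # Counting n <= x whose prime factors are all >= z, vs e^{-gamma} x / log z.
    from math import log, exp

    GAMMA = 0.5772156649015329

    def rough_count(x, z):
        small_primes = [p for p in range(2, z)
                        if all(p % q for q in range(2, int(p ** 0.5) + 1))]
        return sum(1 for n in range(1, x + 1)
                   if all(n % p for p in small_primes))

    x, z = 10**6, 20
    print(rough_count(x, z), exp(-GAMMA) * x / log(z))   # roughly comparable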
CHAPTER 3

The Riemann zeta function

In this chapter, we will use (as is traditional for this topic) the letter s to denote
a complex variable, and σ and t to denote its real and imaginary parts respectively,
so that s = σ + it. Before we begin, it’s worth pausing to explicitly point out what
we mean by n^s, where n is a natural number and s ∈ C. By definition this is
n^s = e^{s log n} = n^σ e^{it log n}.
It is easy to check the multiplicative property, that (nm)^s = n^s m^s.
A Dirichlet series is an infinite series of the form

α(s) = ∑_{n=1}^{∞} a_n n^{−s},
for some coefficients a_n ∈ C.
Lemma 11. For any sequence an there is an abscissa of convergence σc such that
α(s) converges for all s with σ > σc and for no s with σ < σc . If σ > σc then there
is a neighbourhood of s in which α(s) converges uniformly. In particular, α(s) is
holomorphic at s.
Proof. It suffices to show that if α(s) converges at s = s0 and we take some s with
σ > σ0 then α converges uniformly in some neighbourhood of s. The lemma then
follows by taking σc = inf{σ : α(s) converges}.
Suppose that α(s) converges at s = s₀. If we let R(u) = ∑_{n>u} a_n n^{−s₀} then by partial summation, for any s,
∑_{M<n≤N} a_n n^{−s} = R(M)M^{s₀−s} − R(N)N^{s₀−s} + (s₀ − s) ∫_M^N R(u) u^{s₀−s−1} du.
If |R(u)| ≤ ε for all u ≥ M, and if σ > σ₀, then it follows that
| ∑_{M<n≤N} a_n n^{−s} | ≤ 2ε + ε |s − s₀| ∫_M^∞ t^{σ₀−σ−1} dt ≤ 2ε + ε |s − s₀|/(σ − σ₀).
There is some neighbourhood of s in which |s − s₀| ≪ σ − σ₀, and hence by Cauchy's principle the series converges uniformly at s. ∎

principle the series converges uniformly at s. 
Lemma 12. If ∑ a_n n^{−s} = ∑ b_n n^{−s} for all s in some half-plane σ > σ₀ then a_n = b_n for all n.

Proof. It suffices to show that if ∑ c_n n^{−s} = 0 for all s with σ > σ₀ then c_n = 0 for all n. Suppose that c_n = 0 for all n < N. We can write
c_N = −∑_{n>N} c_n (n/N)^{−σ}.
Since the sum here is convergent, the summands tend to 0, and hence c_n ≪ n^{σ₀}. It follows that this sum is absolutely convergent for σ > σ₀ + 1. Since each term tends to 0 as σ → ∞, and the series is absolutely convergent, the right-hand side tends to 0, and hence c_N = 0. ∎
Lemma 13. If α(s) and β(s) are two Dirichlet series, both absolutely convergent
at s, then

∑_{n=1}^{∞} ( ∑_{cd=n} a_c b_d ) n^{−s}
is absolutely convergent and equals α(s)β(s).
Proof. We simply multiply out the product of the two series,
( ∑_n a_n n^{−s} )( ∑_m b_m m^{−s} ) = ∑_{n,m} a_n b_m (nm)^{−s} = ∑_k ( ∑_{nm=k} a_n b_m ) k^{−s},

which is justified since both series are absolutely convergent. 


Lemma 14. If f is multiplicative and ∑ |f(n)| n^{−σ} converges then
∑_{n=1}^{∞} f(n) n^{−s} = ∏_p ( 1 + f(p)p^{−s} + f(p²)p^{−2s} + ··· ).

Proof. By comparison each sum in the product is absolutely convergent. Since a product of finitely many absolutely convergent series can be arbitrarily rearranged,
∏_{p≤y} ( 1 + f(p)p^{−s} + f(p²)p^{−2s} + ··· ) = ∑_{n : p|n ⟹ p≤y} f(n) n^{−s}.
Therefore the difference between the product here and the Dirichlet series is at most
∑_{n>y} |f(n)| n^{−σ} → 0 as y → ∞. ∎

We now define the Riemann zeta function in the half-plane σ > 1 by
ζ(s) = ∑_n 1/n^s.
Observe that this series diverges at s = 1, and the series actually converges absolutely for σ > 1. By the above, ζ(s) defines a holomorphic function in this
half-plane. For our applications, we need to extend this definition to be able to talk
about ζ(s) for σ > 0.
Lemma 15. For σ > 1,
ζ(s) = 1 + 1/(s − 1) − s ∫_1^∞ {t}/t^{s+1} dt.

Proof. By partial summation, for any x,
∑_{1≤n≤x} n^{−s} = ⌊x⌋ x^{−s} + s ∫_1^x ⌊t⌋/t^{s+1} dt.

The integral here is
s ∫_1^x t^{−s} dt − s ∫_1^x {t}/t^{s+1} dt = s/(s − 1) − (s/(s − 1)) x^{1−s} − s ∫_1^x {t}/t^{s+1} dt.
Since σ > 1, if we take the limit as x → ∞, we have
ζ(s) = s/(s − 1) − s ∫_1^∞ {t}/t^{s+1} dt,
noting that the integral converges. ∎
The integral here is convergent for any σ > 0, and therefore the right hand side
defines an analytic function for σ > 0, aside from a simple pole at s = 1 with
residue 1. We have therefore given an analytic continuation for ζ(s) up to σ = 0.
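As a numerical sanity check (not part of the notes), the following Python sketch evaluates the formula of Lemma 15 at s = 2, computing the integral exactly on each unit interval where {t} = t − n, and compares it with the Dirichlet series and with π²/6.

    # Check of zeta(s) = 1 + 1/(s-1) - s * int_1^oo {t} t^{-s-1} dt at s = 2.
    from math import pi

    def zeta_series(s, N=10**6):
        return sum(n ** (-s) for n in range(1, N + 1))

    def zeta_formula(s, N=10**5):
        integral = 0.0
        for n in range(1, N):
            # integral over [n, n+1] of (t - n) t^{-s-1} dt, done in closed form
            F = lambda t: t ** (1 - s) / (1 - s) + n * t ** (-s) / s
            integral += F(n + 1) - F(n)
        return 1.0 + 1.0 / (s - 1.0) - s * integral

    print(zeta_series(2.0), zeta_formula(2.0), pi ** 2 / 6)   # all three agree closely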
We also record here the Euler product for ζ(s),
ζ(s) = ∏_p (1 − 1/p^s)^{−1},
which is valid for σ > 1. From this it follows that ζ(s) ≠ 0 for σ > 1. The Euler product leads to the identity
1/ζ(s) = ∏_p (1 − 1/p^s) = ∑_n µ(n)/n^s.
Furthermore, when σ > 1, the series is absolutely convergent, and so the derivative can be computed summand by summand, leading to
ζ′(s) = −∑_n (log n)/n^s.
From the Euler product we have
log ζ(s) = −∑_p log(1 − p^{−s}) = ∑_n (Λ(n)/log n) n^{−s}.
Finally, taking the derivative of this, we obtain the Dirichlet series with Λ(n) as coefficients:
(ζ′/ζ)(s) = −∑_n Λ(n)/n^s.

9. Prime number theorem


By partial summation, in a generalisation of the argument from the previous
section, one can show that if
α(s) = ∑_n a_n/n^s and A(x) = ∑_{n≤x} a_n
then
α(s) = s ∫_1^∞ A(t)/t^{s+1} dt.
That is, α(s) can be expressed as a function of A(x). We are more interested in the converse – for example, if a_n = Λ(n), then α(s) = −(ζ′/ζ)(s), which we hope we can understand via analysis, and A(x) = ∑_{n≤x} Λ(n) = ψ(x), the asymptotics of which are the subject of the prime number theorem.

Lemma 16. If σ₀ > 0 then
(1/2πi) ∫_{σ₀−iT}^{σ₀+iT} (y^s/s) ds = { 1 if y > 1, 0 if 0 < y < 1 } + O( y^{σ₀}/(T |log y|) ).

Proof. Let y > 1, u < 0, and let C be the rectangular contour from u − iT to σ₀ − iT to σ₀ + iT to u + iT. The function y^s/s is analytic apart from a simple pole at s = 0, where the residue is 1. It follows that
(1/2πi) ∫_C (y^s/s) ds = 1.
We can bound the contribution from the two unwanted horizontal paths by
| ∫_{u−iT}^{σ₀−iT} (y^s/s) ds | = | ∫_u^{σ₀} y^{σ−iT}/(σ − iT) dσ | ≪ (1/T) ∫_u^{σ₀} y^σ dσ ≤ (1/T) ∫_{−∞}^{σ₀} y^σ dσ = y^{σ₀}/(T log y).
Note that here we used that y > 1 for the final part. The contribution from the left-hand side of the rectangle is
≪ ∫_{−T}^{T} y^u/|u + it| dt ≪ T y^u/|u| → 0 as u → −∞.
The case 0 < y < 1 is similar, but we need to take the mirror image of the contour. ∎
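The kernel of Lemma 16 can also be checked numerically. The Python sketch below (illustrative; the values of σ₀, T, y and the crude midpoint quadrature are arbitrary choices) integrates y^s/s along the vertical segment and prints a value close to 1 for y > 1 and close to 0 for 0 < y < 1, up to the stated error.

    # (1/2*pi*i) * integral of y^s / s over the segment sigma_0 +- iT,
    # approximated by a midpoint sum in t.
    import cmath

    def kernel(y, sigma0=1.0, T=400.0, steps=200000):
        h = 2 * T / steps
        total = 0j
        for k in range(steps):
            t = -T + (k + 0.5) * h
            s = complex(sigma0, t)
            total += (y ** s) / s
        return (total * h / (2 * cmath.pi)).real

    print(kernel(3.0))    # close to 1
    print(kernel(0.3))    # close to 0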
Theorem 11 (Perron’s formula). Suppose that α(s) is absolutely convergent for
σ > σa . If σ0 > max(0, σa ) and x > 0 is not an integer then, for any T ≥ 1,
 
∑_{n≤x} a_n = (1/2πi) ∫_{σ₀−iT}^{σ₀+iT} α(s) (x^s/s) ds + O( (2^{σ₀} x/T) ∑_{x/2<n<2x} |a_n|/|x − n| + (x^{σ₀}/T) ∑_n |a_n|/n^{σ₀} ).

Proof. Since σ₀ > 0, we can write
1_{n<x} = (1/2πi) ∫_{σ₀−iT}^{σ₀+iT} ((x/n)^s/s) ds + O( (x/n)^{σ₀}/(T |log(x/n)|) ).
It follows that, since the series converges absolutely, and so we can interchange it with a finite integration,
∑_{n<x} a_n = (1/2πi) ∫_{σ₀−iT}^{σ₀+iT} α(s) (x^s/s) ds + O( (x^{σ₀}/T) ∑_n |a_n|/(n^{σ₀} |log(x/n)|) ).
Note that the interchange of integration and summation is valid since both are over finite ranges. To simplify the error term, write |log(x/n)| = |log(1 + (n − x)/x)| and use the fact that |log(1 + δ)| ≫ |δ| uniformly for −1/2 ≤ δ ≤ 1, so that |log(x/n)| ≫ |n − x|/x uniformly for x/2 < n < 2x. For other values of n, we have |log(x/n)| ≫ 1. The error term is therefore
≪ (2^{σ₀} x/T) ∑_{x/2<n<2x} |a_n|/|x − n| + (x^{σ₀}/T) ∑_n |a_n|/n^{σ₀}. ∎


For the proof of the prime number theorem we require the following three facts
about zero-free regions of ζ, which we will prove in the next section.
(1) there is a constant c such that, if σ > 1 − c/log t and |t| ≥ 7/8, then ζ(s) ≠ 0 and (ζ′/ζ)(s) ≪ log(|t| + 2), and
(2) ζ(s) ≠ 0 for 8/9 < σ and |t| ≤ 7/8, and
(3) (ζ′/ζ)(s) = −1/(s − 1) + O(1) for 5/6 ≤ σ ≤ 2 and |t| ≤ 7/8.

Theorem 12 (Prime Number Theorem). There is c > 0 such that


 
ψ(x) = x + O( x/exp(c√(log x)) ).
In particular, ψ(x) ∼ x.
Proof. We can assume that x = N + 1/2 for some integer N . By Perron’s formula,
for any 2 > σ0 > 1, we have
 
ψ(x) = (1/2πi) ∫_{σ₀−iT}^{σ₀+iT} (−ζ′/ζ)(s) (x^s/s) ds + O( (2^{σ₀} x/T) ∑_{x/2<n<2x} Λ(n)/|x − n| + (x^{σ₀}/T) ∑_n Λ(n)/n^{σ₀} ).
Let's first examine the error term here. The first summand is
≪ (x log x/T) ∑_{n≤x} 1/n ≪ x(log x)²/T.
The infinite sum in the second summand is ≪ 1/(σ₀ − 1). Overall, then, the error term is
≪ x(log x)²/T + x^{σ₀}/(T(σ₀ − 1)).
Choosing σ₀ = 1 + 1/log x this is O(x(log x)²/T). Now let σ₁ = 1 − c/log T, where c is such that ζ(s) ≠ 0 for σ ≥ σ₁, and C be the rectangle contour connecting the two lines. Since (ζ′/ζ)(s) has a simple pole at s = 1 with residue −1, and is otherwise analytic (as there are no zeros of ζ inside the contour), we have
(1/2πi) ∫_C (ζ′/ζ)(s) (x^s/s) ds = −x.
The right-hand side of this contour contributes −ψ(x) + O(xT^{−1}(log x)²) by the above. It
remains to bound the contribution from the other sides of the rectangle. To this
end, first note that
∫_{σ₁+iT}^{σ₀+iT} (ζ′/ζ)(s) (x^s/s) ds ≪ (x^{σ₀}/T) (σ₀ − σ₁) log T,
since |s| ≫ T on this line, |x^s| ≤ x^{σ₀}, the line has length σ₀ − σ₁, and (ζ′/ζ)(s) ≪ log t. Using our choices for σ₀ and σ₁, it follows that the two short sides of the rectangle contribute O(x/T), provided T ≤ x.
Finally, we turn our attention to the long left-hand side of the rectangle. Away from t = 0 (where |t| ≥ 1, say) we use the bound ζ′/ζ ≪ log|t| to bound the contribution along this line by
≪ x^{σ₁} log T ∫_1^T dt/t ≪ x^{σ₁} (log T)².
In the interval −1 < t < 1 we use the bound ζ′/ζ ≪ 1/|s − 1| to bound the contribution to the integral by
≪ x^{σ₁} ∫_{−1}^{1} dt/|s − 1| ≪ x^{σ₁} log T.
Combining these estimates we have, for any 1 ≤ T ≤ x,
ψ(x) = x + O( (x/T)(log x)² + x^{1−c/log T} (log T)² ).
We now make a choice of T to optimise this error term, which is T = exp(c√(log x)) for some small constant c > 0, and the proof is complete. ∎

10. Zero-free region


Theorem 13. If σ > (1 + t²)/2 then ζ(s) ≠ 0. In particular, ζ(s) ≠ 0 if 8/9 ≤ σ ≤ 1 and |t| ≤ 7/8. Furthermore,
ζ(s) = 1/(s − 1) + O(1) and −(ζ′/ζ)(s) = 1/(s − 1) + O(1)
uniformly for 8/9 ≤ σ ≤ 2 and |t| ≤ 7/8.

Proof. We recall the identity
ζ(s) = 1 + 1/(s − 1) − s ∫_1^∞ {u}/u^{s+1} du.
In particular,
| ζ(s) − s/(s − 1) | ≤ |s|/σ,
which proves the first claim and the second. The final claim follows since if ζ(s) = (s − 1)^{−1} + f(s) then
(ζ′/ζ)(s) = ( −(s − 1)^{−2} + f′(s) ) / ( (s − 1)^{−1} + f(s) ) = −1/(s − 1) + ( f(s) + f′(s)(s − 1) )/( 1 + f(s)(s − 1) ) = −1/(s − 1) + O(1).

Theorem 14 (Maximum modulus principle). If U is a bounded connected open
set and f is holomorphic on U then |f | attains its maximum on the boundary
∂U = Ū \ U.
Lemma 17 (Borel-Carathéodory Lemma). Let f be holomorphic on |z| ≤ R such
that f(0) = 0 and suppose ℜf(z) ≤ M for all |z| ≤ R. For any r < R,
sup_{|z|≤r} ( |f(z)|, |f′(z)| ) ≪_{r,R} M.

Proof. Let
g(z) = f(z)/( z(2M − f(z)) ),
so that g is holomorphic for |z| ≤ R. Observe that, using ℜf(z) ≤ M,
|f(z)|² = ℜ(f(z))² + ℑ(f(z))² ≤ (2M − ℜ(f(z)))² + ℑ(f(z))² = |2M − f(z)|²,
and so |2M − f(z)| ≥ |f(z)| for |z| ≤ R. In particular, if |z| = R then |g(z)| ≤ 1/R. By the maximum modulus principle, if |z| = r, then
|g(z)| = |f(z)|/( r|2M − f(z)| ) ≤ 1/R,

and hence
R|f(z)| ≤ |2Mr − rf(z)| ≤ 2Mr + r|f(z)|,
or
|f(z)| ≤ (2r/(R − r)) M.
This shows that |f(z)| ≪ M. To deduce the same bound for f′(z), we use Cauchy's formula
f′(z) = (1/2πi) ∫_{|w|=r′} f(w)/(w − z)² dw,
where the integral is taken over some circle of radius r < r′ < R, say. ∎

Lemma 18. Suppose that f(z) is analytic in a domain containing |z| ≤ 1, that |f(z)| ≤ M in this disc, and that f(0) ≠ 0. Let 0 < r < R < 1. Then for |z| ≤ r,
(f′/f)(z) = ∑_{k=1}^{K} 1/(z − z_k) + O( log(M/|f(0)|) ),
where the sum is over all zeros z_k of f for which |z_k| ≤ R.


Proof. Without loss of generality, we may suppose that f(0) = 1. Furthermore, we can assume that there are no zeros of f in the annulus r < R − ε ≤ |z| ≤ R + ε < 1 for some small enough ε > 0. All implicit constants in this proof can depend on r, R, and ε. Let
g(z) = f(z) ∏_{k=1}^{K} (R² − z̄_k z)/(R(z − z_k)).
Observe that the kth factor has a pole at z_k, and has modulus 1 on |z| = R. It follows that g is an analytic function in |z| ≤ R, and if |z| = R then |g(z)| = |f(z)| ≤ M. By the maximum modulus principle,
|g(0)| = ∏_{k=1}^{K} R/|z_k| ≤ M.
It follows that, since each z_k satisfies |z_k| ≤ R − ε, the number of zeros satisfies K ≪ log M. Let h(z) = log(g(z)/g(0)) (which is allowed since g(z) has no zeros in |z| ≤ R). We have h(0) = 0, and
ℜh(z) = log|g(z)| − log|g(0)| ≤ log M
for |z| ≤ R. By the Borel–Carathéodory lemma,
|h′(z)| ≪ log M.
Finally,
h′(z) = (f′/f)(z) − ∑_{k=1}^{K} 1/(z − z_k) + ∑_{k=1}^{K} 1/(z − R²/z̄_k).
If |z| ≤ r then |z − R²/z̄_k| ≥ R²/|z_k| − |z| ≥ R − r, and so the final sum is ≪ log M. ∎

Since |ζ(3/2 + it)| ≍ 1 and |ζ(s + 3/2 + it)| ≪ t for |s| ≤ 1 we obtain the
following information about the zeta function.

Corollary 5. If |t| ≥ 7/8 and 5/6 ≤ σ ≤ 2 then


(ζ′/ζ)(s) = ∑_ρ 1/(s − ρ) + O(log|t|),

where the sum is over all zeros ρ of ζ(s) in the region |ρ − (3/2 + it)| ≤ 5/6.
Theorem 15. There is a constant c > 0 such that
ζ(s) ≠ 0 for σ ≥ 1 − c/log t.

Proof. Let ρ = σ + it be such that ζ(ρ) = 0, and let δ > 0 be something to be chosen later. By Corollary 5,
−ℜ(ζ′/ζ)(1 + δ + it) = −1/(1 + δ − σ) − ℜ ∑_{ρ′≠ρ} 1/(1 + δ + it − ρ′) + O(log t).
Since ℜρ′ ≤ 1 for all zeros ρ′, it follows that ℜ( 1/(1 + δ + it − ρ′) ) > 0, provided δ > 0. In particular,
−ℜ(ζ′/ζ)(1 + δ + it) ≤ −1/(1 + δ − σ) + O(log t).
Similarly,
−ℜ(ζ′/ζ)(1 + δ + 2it) ≪ log t.
Finally,
−(ζ′/ζ)(1 + δ) = 1/δ + O(1).
We now note that
ℜ( −3(ζ′/ζ)(1 + δ) − 4(ζ′/ζ)(1 + δ + it) − (ζ′/ζ)(1 + δ + 2it) )
is
∑_{n=1}^{∞} (Λ(n)/n^{1+δ}) ( 3 + 4cos(t log n) + cos(2t log n) ).
Since 3 + 4cos θ + cos 2θ = 2(1 + cos θ)² ≥ 0, the entire sum is ≥ 0. It follows that
3/δ − 4/(1 + δ − σ) + O(log t) ≥ 0.
This implies that 1 − σ ≫ 1/log t, choosing δ ≈ 1/log t. ∎
This result has been improved by Korobov and Vinogradov to 1 − c/(log t)^{2/3+ε}.

11. Error terms


Assuming the Riemann hypothesis, the proof above of the prime number theorem
gives
ψ(x) = x + O(x^{1/2+o(1)}).
In fact the error term here can be taken as x^{1/2}(log x)², but this more precise form
takes some finesse to achieve. It is natural to ask how good an error term we might
hope for here. It turns out that, just as the absence of zeros of the zeta function
allows us to show the error term is small, their presence allows us to show that
error term must be somewhat large.

For this we will need the following lemma of Landau. The real interest of this
lemma is that when our integrand is non-negative we obtain not just a half-plane of
convergence, but more importantly knowledge about what happens on the boundary
line itself. A similar fact holds for Dirichlet series.
Lemma 19 (Landau). Suppose that A is an integrable function bounded in any finite interval, A(x) ≥ 0 for all large x ≥ X, and let
σ_c = inf{ σ : ∫_X^∞ A(x)x^{−σ} dx < ∞ }.
The function
F(s) = ∫_1^∞ A(x)x^{−s} dx
is analytic in σ > σ_c but not at s = σ_c.
Proof. Divide the integral in the definition of F into [1, X] and [X, ∞), giving a corresponding decomposition F = F₁ + F₂, say. The function F₁ is entire. For σ > σ_c, the integral converges absolutely, and hence F₂ defines an analytic function in this half-plane. Suppose that F₂ is analytic at s = σ_c. We may expand F₂(s) as a power series at s = σ_c + 1, so that
F₂(s) = ∑_{k=0}^{∞} c_k (s − 1 − σ_c)^k,
where
c_k = F₂^{(k)}(1 + σ_c)/k! = (1/k!) ∫_X^∞ A(x)(−log x)^k x^{−1−σ_c} dx.
The radius of convergence of this power series is the distance from 1 + σ_c to the nearest singularity of F₂(s), and hence by assumption is at least 1 + δ for some δ > 0, say. If we consider s = σ_c − δ/2, then
F₂(s) = ∑_{k=0}^{∞} ( (1 + σ_c − s)^k/k! ) ∫_X^∞ A(x)(log x)^k x^{−1−σ_c} dx.
This is a convergent series with all non-negative terms, and hence we can interchange the integral and summation, to find
F₂(s) = ∫_X^∞ A(x) x^{−1−σ_c} exp((1 + σ_c − s) log x) dx = ∫_X^∞ A(x)x^{−s} dx,
and so the integral must converge at s = σ_c − δ/2, which contradicts the definition of σ_c. ∎
When discussing lower bounds for error terms, the following notation is useful.
We say that f = Ω± (g) if
lim sup_{x→∞} f(x)/g(x) ≥ c > 0
and
lim inf_{x→∞} f(x)/g(x) ≤ −c < 0,

for some absolute constant c > 0. That is, not only does f (x) exceed (some constant
multiple of) g(x) infinitely often, but it does so both positively and negatively.

Theorem 16. If σ0 is the supremum of the real parts of the zeros of ζ(s) then, for
any σ < σ0 ,
ψ(x) = x + Ω±(x^σ).
If there is a zero ρ with ℜρ = σ₀, then
ψ(x) = x + Ω±(x^{σ₀}).
Proof. Suppose that ψ(x) − x ≤ cx^σ for all large enough x, say x > X. Following Lemma 19 we will consider the function
F(s) = ∫_1^∞ (cx^σ − ψ(x) + x) x^{−s−1} dx = c/(s − σ) + ζ′(s)/(sζ(s)) + 1/(s − 1).
The right-hand side has a pole at s = σ, but is analytic for real s > σ. It follows that in fact the above identity must hold for all s with ℜs > σ. It follows that there can't be any zeros of ζ(s) in this region, which is a contradiction if σ < σ₀.
For the second, stronger, conclusion, we need to argue a little more carefully. Suppose that there is a zero ρ = σ₀ + it. Consider instead
F(s) + ( e^{iθ}F(s + it) + e^{−iθ}F(s − it) )/2 = ∫_1^∞ (cx^σ − ψ(x) + x)(1 + cos(θ − t log x)) x^{−s−1} dx.
The coefficients here are still non-negative real numbers. The left-hand side has a pole at s = σ with residue
c + ( m e^{iθ}/ρ + m e^{−iθ}/ρ̄ )/2,
where m is the multiplicity of the zero ρ. We have freedom to choose θ to be whatever we like, in particular so that this expression is c − m/|ρ|. The lim inf of the right-hand side is > −∞ as s approaches σ from the right along the real axis. We must therefore have
c − m/|ρ| ≥ 0,
and hence c ≥ m/|ρ|. This establishes the Ω₊ aspect of the theorem. For Ω₋ we use the same argument with signs reversed, so that we start with
F(s) = ∫_1^∞ (−cx^σ + ψ(x) − x) x^{−s−1} dx,
and so on. ∎

12. Functional equation


Finally, we establish an important analytic property of the zeta function, which
in particular reveals a certain symmetry in the location of zeros.
We first extend the definition of the zeta function to a larger half-plane. Recall
that for σ > 0 we defined
ζ(s) = 1 + 1/(s − 1) − s ∫_1^∞ {u}/u^{s+1} du.
We will extend the region where this is valid by integrating by parts. First let f(x) = 1/2 − {x}, so that
ζ(s) = 1/2 + 1/(s − 1) + s ∫_1^∞ f(u)/u^{s+1} du.

If we let F(x) = ∫_0^x f(u) du then, by integration by parts,
∫_1^∞ f(u)/u^{s+1} du = [F(u)u^{−s−1}]_1^∞ + (s + 1) ∫_1^∞ F(u)/u^{s+2} du.
Since F(x) is bounded, the integral here converges for any s with σ > −1, and hence the left-hand side also converges in this region. We may therefore take
ζ(s) = 1/2 + 1/(s − 1) + s ∫_1^∞ f(u)/u^{s+1} du
as the definition of ζ(s) in the half-plane σ > −1. If −1 < σ < 0 then
∫_0^1 f(u)/u^{s+1} du = (1/2) ∫_0^1 u^{−s−1} du − ∫_0^1 u^{−s} du = −1/(2s) + 1/(s − 1),
and so in this strip
ζ(s) = s ∫_0^∞ f(u)/u^{s+1} du.
We now note that f (x) is a periodic function, continuous in (0, 1), and so it has a
Fourier series, which is

f(u) = ∑_{n=1}^{∞} sin(2πnu)/(πn).
For −1 < σ < 0 we therefore get
ζ(s) = s ∫_0^∞ (1/u^{s+1}) ∑_{n=1}^{∞} sin(2πnu)/(πn) du = (s/π) ∑_{n=1}^{∞} (1/n) ∫_0^∞ sin(2πnu)/u^{s+1} du.
We should justify the interchange of integral and summation here. For this, we note
that the series is uniformly convergent almost everywhere, furthermore converges to
some bounded value in (−1/2, 1/2] almost everywhere, and hence the interchange
of limits is justified by the dominated convergence theorem.
By change of variable, we have
∫_0^∞ sin(2πnu)/u^{s+1} du = (2πn)^s ∫_0^∞ sin(u)/u^{s+1} du.
Furthermore, writing sin(u) = (1/2i)(e^{iu} − e^{−iu}) and using another change of variable,
∫_0^∞ (sin u)/u^{s+1} du = −sin(πs/2) ∫_0^∞ t^{−s−1} e^{−t} dt.
The integral here is an important one known as the Gamma function. It cannot be
simplified further, in general, but is one of the fundamental functions of analysis.
Let
Γ(s) = ∫_0^∞ t^{s−1} e^{−t} dt,
which converges for σ > 0, and defines a holomorphic function in this region.
Integrating by parts we see that
Γ(s + 1) = sΓ(s).
This identity, combined with the fact that Γ(1) = 1, implies that Γ(n) = (n − 1)!
for all integer n ≥ 1. Furthermore, it allows us to analytically extend Γ(s) to a
meromorphic function on the entire complex plane with poles at s = 0, −1, −2, . . ..

Combining the above we have shown that, for −1 < σ < 0,



ζ(s) = −(s/π) sin(πs/2) Γ(−s) ∑_{n=1}^{∞} (2πn)^s/n = 2^s π^{s−1} sin(πs/2) Γ(1 − s) ζ(1 − s).

The right-hand side is actually analytic for any σ < 1, and hence we can take the
right-hand side to be a definition of ζ(s) in this region. By analytic continuation it
follows that this identity must hold for all s ∈ C. We have proved the following.
Theorem 17 (Functional equation). The zeta function ζ(s) can be extended to a function meromorphic on the whole complex plane, and for all s satisfies the identity
ζ(s) = 2^s π^{s−1} sin(πs/2) Γ(1 − s) ζ(1 − s).
Many interesting facts can be deduced from this identity. We will first use it to
study the possible poles of ζ(s). We know that ζ(s) has a simple pole at s = 1,
and nowhere else for σ > −1. Suppose that ζ has a pole at s for σ < 0. Then so
too does Γ(1 − s)ζ(1 − s), but both Γ(s) and ζ(s) are holomorphic for all s with ℜs > 1, which is a contradiction. It follows that ζ(s) only has one pole in C, which
is a simple pole at s = 1.
We will now consider the zeros of ζ(s). Suppose that ζ(s) = 0 and σ < 0. It
follows that
sin(πs/2)Γ(1 − s)ζ(1 − s) = 0.
Again, neither Γ(1 − s) nor ζ(1 − s) can be zero or a pole, and so sin(πs/2) = 0, which means s must be an even integer. These are called the trivial zeros of ζ(s), located
at s = −2, −4, −6, . . .. Since there are no zeros with σ ≥ 1, there are no other zeros
with σ ≤ 0.
Aside from the trivial zeros, then, all zeros of ζ must lie in the critical strip
0 < σ < 1. Furthermore, since the other factors in the functional equation are
entire and non-zero in this strip, this implies that if ρ is a zero in the critical strip,
then so too is 1 − ρ. There is therefore a symmetry around the critical line σ = 1/2.
The Riemann hypothesis is motivated in part by the belief that this symmetry
should collapse so that all the zeros are located exactly on this line.
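A non-examinable numerical aside: the first few non-trivial zeros can be located to high precision, and they do indeed lie on the critical line σ = 1/2. The sketch below uses mpmath's built-in zero finder, which is an assumption of this illustration rather than anything developed in the course.

# Compute the first three non-trivial zeros and check that zeta vanishes there.
from mpmath import mp, zetazero, zeta

mp.dps = 20
for k in (1, 2, 3):
    rho = zetazero(k)           # k-th zero in the upper half of the critical strip
    print(rho, abs(zeta(rho)))  # real part 1/2; |zeta(rho)| is zero to precision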
Using this symmetry and the results of the previous section we obtain the follow-
ing equivalence between the Riemann hypothesis and the error term in the prime
number theorem.
Theorem 18. The Riemann hypothesis is equivalent to the statement that
ψ(x) = x + O(x^{1/2+o(1)}).

Proof. If the Riemann hypothesis is true then the contour integration proof of the prime number theorem can be improved to show that ψ(x) = x + O(x^{1/2+o(1)}). For the converse, we use Theorem 16. If the Riemann hypothesis fails, there is some zero ρ with real part 0 < σ < 1 such that σ ≠ 1/2. Since 1 − ρ is also a zero, we may assume without loss of generality that σ > 1/2. If we take some 1/2 < σ′ < σ then Theorem 16 shows
ψ(x) = x + Ω_±(x^{σ′}),
which would contradict ψ(x) = x + O(x^{1/2+o(1)}) for large enough x (large enough so that the 1/2 + o(1) exponent is less than σ′), which concludes the proof. □

If we also assume the existence of at least one zero on the critical line (there are infinitely many, and the first is at 1/2 + i·14.134…), then we have the following error term bound.
Theorem 19.
ψ(x) = x + Ω_±(x^{1/2}).
Proof. If the Riemann hypothesis is true then we use the existence of a zero on the
critical line and the second part of Theorem 16. If the Riemann hypothesis is false
then an even stronger statement is true by the first part of Theorem 16. 
CHAPTER 4

Primes in arithmetic progression

13. Dirichlet characters and L-functions


A Dirichlet character modulo q is a totally multiplicative function χ : N → C such that χ(n) = 0 if (n, q) ≠ 1 and χ has period q. Another way to think of them is as group homomorphisms from (Z/qZ)^× to C^×, extended to a function on the whole of N in the natural way.
For every q there is a trivial character, known as the principal character, denoted
by χ0 , which is equal to 1 for all n with (n, q) = 1. Since (Z/qZ)× is a finite abelian
group, the set of characters is also a finite abelian group, isomorphic to (Z/qZ)× .
In particular, there are exactly φ(q) many Dirichlet characters modulo q.
These are multiplicative characters, in contrast to the additive characters n 7→
e2πitn used in Fourier analysis. As such, they are ideal for use in multiplicative
number theory. Their most fundamental property is their orthogonality.
Lemma 20. If χ is a Dirichlet character modulo q then
Σ_{1≤n≤q, (n,q)=1} χ(n) = φ(q) if χ = χ0, and 0 otherwise.

Similarly, for any n ∈ N,
Σ_χ χ(n) = φ(q) if n ≡ 1 (mod q), and 0 otherwise.

In the statement of this lemma, and throughout this chapter unless specified otherwise, Σ_χ means the sum is taken over all Dirichlet characters modulo q.

Proof. Let S be the first sum. If χ = χ0 then the claim is trivial. If χ ≠ χ0 there exists some a with (a, q) = 1 such that χ(a) ≠ 0, 1. It follows that
χ(a)S = Σ_{1≤n≤q, (n,q)=1} χ(an) = Σ_{1≤m≤q, (m,q)=1} χ(m) = S,
since n ↦ an is a permutation on the reduced residue classes modulo q, and hence S = 0. The proof of the second claim is similar. □
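A non-examinable aside: for a prime modulus the characters can be written down explicitly using a primitive root, which makes the two orthogonality relations easy to check by computer. In the sketch below the modulus q = 7 and the primitive root 3 are choices made for this illustration only.

# Characters mod 7: chi_j(3^k mod 7) = exp(2*pi*i*j*k/6) for j = 0, ..., 5,
# using that 3 is a primitive root mod 7 (an assumption of this sketch).
import cmath

q = 7
g = 3
index = {pow(g, k, q): k for k in range(q - 1)}  # discrete logarithm table

def chi(j, n):
    # j-th Dirichlet character mod q evaluated at n
    if n % q == 0:
        return 0
    return cmath.exp(2 * cmath.pi * 1j * j * index[n % q] / (q - 1))

# First relation: sum over n of chi(n) is phi(q) for chi_0 (j = 0), and 0 otherwise.
for j in range(q - 1):
    s = sum(chi(j, n) for n in range(1, q))
    print('chi_%d:' % j, complex(round(s.real, 10), round(s.imag, 10)))

# Second relation: sum over characters of chi(n) is phi(q) if n = 1 (mod q), else 0.
for n in range(1, q):
    s = sum(chi(j, n) for j in range(q - 1))
    print('n = %d:' % n, complex(round(s.real, 10), round(s.imag, 10)))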

Our main interest lies in using Dirichlet characters to detect when a number lies
in a certain residue class modulo q, for which we note that, if (a, q) = 1, then
1_{n≡a (mod q)} = (1/φ(q)) Σ_χ χ(ā n) = (1/φ(q)) Σ_χ χ(ā)χ(n),

where ā denotes the multiplicative inverse of a modulo q. Thus, for example, if we want to count primes ≡ a (mod q), then it is natural to consider the sum
Σ_{n≤x} Λ(n) 1_{n≡a (mod q)} = (1/φ(q)) Σ_χ χ(ā) Σ_{n≤x} Λ(n)χ(n).

The innermost sum here we can study using analytic machinery, just as we have
done for ψ(x) previously. We now need to consider a more general type of zeta
function.
The Dirichlet L-function of χ modulo q is the Dirichlet series
L(s, χ) = Σ_{n=1}^∞ χ(n)/n^s.
This is certainly absolutely convergent, and hence holomorphic, for σ > 1. By
partial summation, however, we can do better, if the character is not principal.
Lemma 21. If χ is a non-principal character then
L(s, χ) = Σ_{n=1}^∞ χ(n)/n^s
converges for all σ > 0.
Proof. By partial summation, for any x, if F(x) = Σ_{1≤n≤x} χ(n) then
Σ_{1≤n≤x} χ(n)/n^s = F(x)/x^s + s ∫_1^x F(t)/t^{s+1} dt.
Since χ is periodic and Σ_{1≤n≤q} χ(n) = 0, we know F(x) is bounded, and hence the left-hand side has a limit as x → ∞ provided σ > 0. □
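A non-examinable aside: this conditional convergence is easy to see numerically. For the non-principal character modulo 4 (which is +1 at n ≡ 1, −1 at n ≡ 3, and 0 at even n) the value L(1, χ) is the Leibniz series 1 − 1/3 + 1/5 − ⋯ = π/4; the sketch below (plain Python, with cutoffs chosen only for illustration) shows the partial sums approaching this value.

# Partial sums of L(1, chi) for the non-principal character mod 4,
# which converge (slowly) to pi/4, illustrating Lemma 21 at s = 1.
import math

def chi4(n):
    if n % 4 == 1:
        return 1
    if n % 4 == 3:
        return -1
    return 0

for x in (10**2, 10**4, 10**6):
    partial = sum(chi4(n) / n for n in range(1, x + 1))
    print(x, partial, math.pi / 4)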
In particular, unlike the Riemann zeta function, there is no pole at s = 1. This
has surprisingly deep implications, as we will see later on.
Since χ is a totally multiplicative function, L(s, χ) can be written as an Euler
product,
L(s, χ) = ∏_p (1 − χ(p)/p^s)^{−1}
for σ > 1. In particular, note that
L(s, χ0) = ζ(s) ∏_{p|q} (1 − p^{−s}),

so that the L-function of the principal character is closely related to the zeta func-
tion. In particular, L(s, χ0 ) is analytic for σ > 0 except for a simple pole at s = 1
with residue φ(q)/q.
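A non-examinable aside: the residue claim amounts to the identity ∏_{p|q}(1 − 1/p) = φ(q)/q, since the residue of ζ(s) at s = 1 is 1. A small sketch, which assumes the sympy library for the totient and prime factorisation, checks this for a few moduli.

# Check prod_{p | q} (1 - 1/p) = phi(q)/q for a few moduli q.
from fractions import Fraction
from sympy import totient, primefactors

for q in (4, 12, 30, 97, 360):
    prod = Fraction(1)
    for p in primefactors(q):
        prod *= Fraction(p - 1, p)
    assert prod == Fraction(int(totient(q)), q)
    print(q, prod)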
Just as with the Riemann zeta function, we can take logarithms of this Euler product and then differentiate, which yields
−(L′/L)(s, χ) = Σ_{n=1}^∞ Λ(n)χ(n)/n^s,
which is valid for σ > 1. In the next section we will use this to prove Dirichlet’s
theorem.

14. Dirichlet’s theorem


One of our goals is to count the number of primes which are congruent to a
(mod q) for some fixed residue class with (a, q) = 1. The first result to establish
is that there are in fact infinitely many such primes. This was first established by
Dirichlet using the machinery of L-functions.
To this end, note that by the discussion in the previous section, for σ > 1,
Σ_{n=1}^∞ (Λ(n)/n^s) 1_{n≡a (mod q)} = (1/φ(q)) Σ_χ χ(ā) Σ_{n=1}^∞ Λ(n)χ(n)/n^s = −(1/φ(q)) Σ_χ χ(ā) (L′/L)(s, χ).

The summand from χ = χ0 contributes
1/(φ(q)(s − 1)) + O_q(1),
since L(s, χ0) has a simple pole at s = 1. It follows that
Σ_{n=1}^∞ (Λ(n)/n^s) 1_{n≡a (mod q)} = 1/(φ(q)(s − 1)) + O_q(1) + O_q( Σ_{χ≠χ0} |(L′/L)(s, χ)| ).

To show that there are infinitely many primes ≡ a (mod q) it would suffice to show
that the final error term remains bounded as s → 1, which would imply that
Σ_{p≡a (mod q)} (log p)/p = ∞.

Since L(s, χ) is analytic for σ > 0, L′/L is analytic except possibly at zeros of L. To prove Dirichlet's theorem, then, it suffices to prove the following.
Theorem 20. If χ ≠ χ0 then L(1, χ) ≠ 0.
Proof. We will first introduce some convenient terminology: a character is quadratic if χ² = χ0 but χ ≠ χ0, so that it takes only the values −1, 0, 1. Otherwise, χ takes on some non-real values, and it is called complex. We will prove the theorem first for complex characters.
For σ > 1,
∏_χ L(s, χ) = exp( Σ_χ Σ_{n=2}^∞ (Λ(n)/log n) χ(n) n^{−s} ) = exp( φ(q) Σ_{n≥2, n≡1 (mod q)} (Λ(n)/log n) n^{−s} ).

If s = σ > 1 is real then the sum above is a non-negative real number, and so
∏_χ L(σ, χ) ≥ 1
for all σ > 1. Since L(s, χ0) has a simple pole at s = 1 it follows that L(1, χ) = 0 can hold for at most one χ ≠ χ0, for otherwise the product would tend to 0 as σ → 1. The theorem now follows immediately for complex χ, for if χ is a complex character then so is χ̄, and since L(s, χ̄) is the complex conjugate of L(s̄, χ), we have L(1, χ) = 0 if and only if L(1, χ̄) = 0. If χ is complex then χ ≠ χ̄, so a zero at s = 1 would give two distinct characters with L(1, ·) = 0, which contradicts the fact that L(1, χ) = 0 for at most one character.
The case when χ is quadratic is harder, because there is no natural other charac-
ter to pair it with. Instead, we will pair it with ζ(s) itself. Suppose that L(1, χ) = 0.
Then ζ(s)L(s, χ) is analytic for σ > 0, and for σ > 1 this is a Dirichlet series with
coefficients
r(n) = Σ_{d|n} χ(d).
Clearly r is multiplicative, and furthermore r(n) ≥ 0 for all n, which is easily checked by verifying it for prime powers, since
r(p^k) = Σ_{0≤j≤k} χ(p)^j.
If χ(p) = 1 then this is k + 1, if χ(p) = −1 this is 0 or 1, depending on whether k is odd or even, and if χ(p) = 0 then this is 1. It follows that, not only is r(n) ≥ 0, but also r(n²) ≥ 1 for all n.
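A non-examinable aside: these properties of r(n) are easy to observe numerically for a specific quadratic character. The sketch below takes χ to be the Legendre symbol modulo 7; the modulus and the use of the sympy library are choices made for this illustration only.

# Tabulate r(n) = sum_{d | n} chi(d) for chi the Legendre symbol mod 7,
# and check that r(n) >= 0 and r(n^2) >= 1, as claimed above.
from sympy import legendre_symbol, divisors

def chi(d):
    return 0 if d % 7 == 0 else legendre_symbol(d, 7)

def r(n):
    return sum(chi(d) for d in divisors(n))

assert all(r(n) >= 0 for n in range(1, 200))
assert all(r(n * n) >= 1 for n in range(1, 50))
print([r(n) for n in range(1, 20)])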
We have a Dirichlet series with non-negative coefficients, so we will now apply Landau's lemma for Dirichlet series, stated below. It follows that the series Σ_n r(n)n^{−s} must converge for σ > 0, but it cannot converge at s = 1/2, where
Σ_{n=1}^∞ r(n)n^{−1/2} ≥ Σ_{m=1}^∞ r(m²)/m ≥ Σ_{m=1}^∞ 1/m = ∞.
This contradiction shows that L(1, χ) ≠ 0 for quadratic χ as well. □

Lemma 22 (Landau). If L_f(s) is a Dirichlet series with non-negative coefficients f(n) ≥ 0 and
σ0 = inf{ σ : Σ_{n=1}^∞ f(n)/n^σ < ∞ },
then L_f(s) is analytic for all σ > σ0 but has a singularity at s = σ0 (so it cannot be continued analytically to any neighbourhood of the point σ0).

15. Zero-free region


Our main goal is to go further than Dirichlet’s theorem and establish a precise
quantitative result concerning the number of primes in arithmetic progressions.
Using the relationship
−(L′/L)(s, χ) = Σ_{n=1}^∞ Λ(n)χ(n)/n^s,
we will use contour integration, as in the proof of the prime number theorem. For
this we first need to establish a zero-free region for L(s, χ). Much of the argument
is similar to that we used for ζ(s), but there is a surprising difficulty owing to the
lack of a pole at s = 1.
We introduce the convenient notation of τ for |t| + 4, where t as usual denotes
the imaginary part of s.
Lemma 23. If χ ≠ χ0 is a character modulo q and 5/6 ≤ σ ≤ 2 then
(L′/L)(s, χ) = Σ_ρ 1/(s − ρ) + O(log qτ),
where the sum is over all zeros ρ of L(s, χ) such that |ρ − (3/2 + it)| ≤ 5/6.

Proof. This follows from Lemma 18 with f(s) = L(s + 3/2 + it, χ), R = 5/6 and r = 2/3. We first establish a lower bound for f(0) using the Euler product:
|f(0)| = ∏_p |1 − χ(p)/p^{3/2+it}|^{−1} ≥ ∏_p (1 + 1/p^{3/2})^{−1} ≫ 1.
We also require an upper bound for f(s) for |s| ≤ 1. For this, by partial summation, we have, for σ > 0, the identity
L(s, χ) = s ∫_1^∞ F(t)/t^{s+1} dt,
where F(t) = Σ_{1≤n≤t} χ(n). Observing that |F(t)| ≤ q, by periodicity of χ, we deduce that
|L(s, χ)| ≪ |s| q ∫_1^∞ dt/t^{σ+1},
and hence |f(z)| ≪ qτ for |z| ≤ 1. □

We will also need a similar lemma for the principal character.


Lemma 24. If χ0 is the principal character modulo q and 5/6 ≤ σ ≤ 2 then
(L′/L)(s, χ0) = −1/(s − 1) + Σ_ρ 1/(s − ρ) + O(log qτ),
where the sum is over all zeros ρ of L(s, χ0) such that |ρ − (3/2 + it)| ≤ 5/6.
Proof. Comparing Euler products, we see that for σ > 0,
L(s, χ0) = ζ(s) ∏_{p|q} (1 − p^{−s}).
Taking logarithmic derivatives it follows that
(L′/L)(s, χ0) = (ζ′/ζ)(s) + Σ_{p|q} (log p)/(p^s − 1).
When σ ≥ 5/6 the sum over p is ≪ log q, since that is a trivial bound for the number of primes dividing q, and each summand is ≪ 1. The lemma now follows from Theorem 13 (when s is close to 1) and Corollary 5. □

Theorem 21. Let χ be a non-quadratic Dirichlet character modulo q. There is an absolute constant c > 0 such that
L(s, χ) ≠ 0 for σ > 1 − c/log(qτ).
The overall strategy is similar to that we used for the zeta function: cleverly choose a linear combination of L′/L at 1 + δ, 1 + δ + it, and 1 + δ + 2it to create a Dirichlet series which is always non-negative, and then derive a contradiction if σ is close to 1 by taking δ → 0. There is some complication, however, introduced by the presence of the character χ. Surprisingly, we need to use not only the L-series for χ, but also those for χ0 and χ² in the proof.

Proof. We first note that, comparing Euler products,
L(s, χ0) = ζ(s) ∏_{p|q} (1 − p^{−s}),

and so if L(s, χ0 ) = 0 and σ > 0 then ζ(s) = 0, and so in this case the theorem fol-
lows from the zero-free region we have established for ζ(s). We will thus henceforth
suppose that χ is a complex character.
Let ρ = σ + it be such that L(ρ, χ) = 0, and let δ > 0 be some parameter to be
chosen later. From the Euler product we know that σ ≤ 1. By Lemma 23, as in
the proof for the zero-free region for ζ, we have
−ℜ (L′/L)(1 + δ + it, χ) ≤ −1/(1 + δ − σ) + O(log qτ)
and
−ℜ (L′/L)(1 + δ + 2it, χ²) ≪ log(qτ).
This last step is where we crucially use the fact that χ is not quadratic, and so χ² is not the principal character, since Lemma 23 is not applicable to principal characters.
We also have that, by Lemma 24,
−ℜ (L′/L)(1 + δ, χ0) ≤ 1/δ + O(log q).
Taking a linear combination of these three inequalities, as in the zero-free region for the zeta function, we deduce that
ℜ( −3(L′/L)(1 + δ, χ0) − 4(L′/L)(1 + δ + it, χ) − (L′/L)(1 + δ + 2it, χ²) ) ≤ 3/δ − 4/(1 + δ − σ) + O(log qτ).
The reason for this choice of linear combination, as well as the choice for the three Dirichlet characters, becomes clear when we write the left-hand side as a Dirichlet series,
Σ_{n≥1, (n,q)=1} (Λ(n)/n^{1+δ}) ℜ( 3 + 4χ(n)n^{−it} + χ(n)²n^{−2it} ).

If χ(n)n^{−it} = e^{iθ} then the real part in this is precisely 3 + 4cos θ + cos(2θ) = 2(1 + cos θ)² ≥ 0. In particular, this is a series with non-negative summands, and hence
3/δ − 4/(1 + δ − σ) + O(log qτ) ≥ 0.
We have a contradiction if δ = c1/log qτ and σ ≥ 1 − c2/log qτ for suitably chosen constants c1, c2 > 0, and the proof is complete. □
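A non-examinable aside: the non-negativity used above comes from the elementary identity 3 + 4cos θ + cos(2θ) = 2(1 + cos θ)², which can be checked symbolically; the sketch below assumes the sympy library.

# Symbolic check of 3 + 4cos(t) + cos(2t) = 2(1 + cos(t))^2.
from sympy import symbols, cos, expand_trig, simplify

t = symbols('t', real=True)
expr = 3 + 4*cos(t) + cos(2*t) - 2*(1 + cos(t))**2
print(simplify(expand_trig(expr)))  # prints 0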
We have proved a good zero-free region when χ is not a quadratic character. This case is much harder (as you might guess from the earlier difficulty showing even that L(1, χ) ≠ 0 when χ is quadratic), and we can show much less.
Theorem 22. Let χ be a quadratic character modulo q. There exists a constant
c > 0 such that L(s, χ) has
(1) no zeros in the region σ > 1 − c/log qτ with t ≠ 0, and
(2) at most one real zero ρ with 1 − c/log q < ρ < 1.

We cannot rule out the existence of a real zero of L(s, χ) very close to 1 when
χ is quadratic. These are called exceptional zeros (and similarly χ is called an
exceptional character and q an exceptional modulus).
Proof. Suppose that ρ = σ + it is a zero of L(s, χ) and that t ≠ 0. Let δ > 0 be a parameter to be chosen later. As before,
−ℜ (L′/L)(1 + δ + it, χ) ≤ −1/(1 + δ − σ) + O(log qτ)
and
−ℜ (L′/L)(1 + δ, χ0) ≤ 1/δ + O(log qτ).
The key difference is now that χ² = χ0, and so we have the additional term −1/(s − 1) in the expansion of L′/L. When |t| ≥ C(1 − σ) for some suitably large absolute constant C > 0, this works in our favour. In this case, we have
−ℜ (L′/L)(1 + δ + 2it, χ²) ≤ ℜ (1/(δ + 2it)) + O(log qτ) ≤ δ/(δ² + 4t²) + O(log qτ).
Taking a linear combination and using non-negativity of the Dirichlet series as before implies that
0 ≤ 3/δ − 4/(1 + δ − σ) + δ/(δ² + 4t²) + O(log qτ).
If σ = 1 this is a contradiction as δ → 0. Otherwise, we can choose δ = 1 − σ, and again this is a contradiction unless σ ≤ 1 − c1/log qτ for some constant c1 > 0. Observe the importance of t ≠ 0 here in ensuring that the third summand is at most (1 − ε)/δ for some small ε > 0.
For smaller values of |t| we require a different argument, and will no longer compare 1 + δ, 1 + δ + it and 1 + δ + 2it. Instead, we will just use 1 + δ and 1 + δ + it, and use additionally the observation that, since the coefficients χ(n) are real (χ is quadratic), L(s̄, χ) is the complex conjugate of L(s, χ), so if ρ is a zero then ρ̄ is also. Since t ≠ 0 these are two distinct zeros of L, and so
−ℜ (L′/L)(1 + δ + it, χ) ≤ −ℜ( 1/(1 + δ + it − ρ) + 1/(1 + δ + it − ρ̄) ) + O(log qτ).
The right-hand side is
−1/(1 + δ − σ) − (1 + δ − σ)/((1 + δ − σ)² + 4t²) + O(log qτ) ≤ −2(1 + δ − σ)/((1 + δ − σ)² + 4t²) + O(log qτ).
It follows that
−ℜ( (L′/L)(1 + δ, χ0) + (L′/L)(1 + δ + it, χ) ) ≤ 1/δ − 2(1 + δ − σ)/((1 + δ − σ)² + 4t²) + O(log qτ).
The left-hand side is
Σ_{n≥1, (n,q)=1} (Λ(n)/n^{1+δ}) ℜ( 1 + χ(n)n^{−it} ).
Again, if χ(n)n^{−it} = e^{iθ} then the real part here is 1 + cos θ, which is obviously ≥ 0.
Once again, we obtain a contradiction choosing δ = c1(1 − σ) if σ ≥ 1 − c2/log qτ, for suitable c1, c2 > 0.
It remains to consider the case of real zeros. The previous strategy no longer works, since ρ and ρ̄ are not distinct zeros. But the same idea does allow us to rule out the existence of more than one real zero in the range: if β1 ≤ β2 < 1 are two real zeros (counted with multiplicity) then
−ℜ (L′/L)(1 + δ, χ) ≤ −1/(1 + δ − β1) − 1/(1 + δ − β2) + O(log q) ≤ −2/(1 + δ − β1) + O(log q),
and combining this with the bound for χ0 and the non-negativity of Σ Λ(n)(1 + χ(n))n^{−1−δ} gives a contradiction, with δ = c1(1 − β1) for a suitable constant c1 > 1, unless β1 ≤ 1 − c/log q. □

The previous theorem shows that for a fixed quadratic character modulo q, there
is at most one exceptional zero. We can say a little more, and show that in fact
amongst all possible characters for fixed q, there is at most one exceptional zero
(and so we can justly talk of ’the’ exceptional zero of q).
Lemma 25. If χ1 and χ2 are distinct quadratic characters modulo q then L(s, χ1 )L(s, χ2 )
has at most one real zero β with 1 − c/ log q < β < 1.
Proof. Say βi is a real zero of L(s, χi) for i = 1, 2. Without loss of generality, 5/6 ≤ β1 ≤ β2 < 1. Let δ > 0 be some parameter to be chosen later. We have, for i = 1, 2,
−ℜ (L′/L)(1 + δ, χi) ≤ −1/(1 + δ − βi) + O(log q)
and
−ℜ (L′/L)(1 + δ, χ1χ2) ≤ O(log q),
using the fact that χ1χ2 ≠ χ0. Finally, we have
−ℜ (L′/L)(1 + δ, χ0) ≤ 1/δ + O(log q).
It follows that
−ℜ( (L′/L)(1 + δ, χ0) + (L′/L)(1 + δ, χ1) + (L′/L)(1 + δ, χ2) + (L′/L)(1 + δ, χ1χ2) ) ≤ 1/δ − 2/(1 + δ − β1) + O(log q).
The left-hand side is the Dirichlet series
Σ_{n≥1, (n,q)=1} (Λ(n)/n^{1+δ}) (1 + χ1(n) + χ2(n) + χ1χ2(n)).
The term inside the brackets is (1 + χ1(n))(1 + χ2(n)) ≥ 0, since both χ1 and χ2 take only the values ±1 on integers coprime to q. Hence the left-hand side is non-negative, and
0 ≤ 1/δ − 2/(1 + δ − β1) + O(log q).
If we choose δ = c1(1 − β1) for a suitable constant c1 > 1, this forces β1 ≤ 1 − c/log q for some c > 0, as required. □

16. Prime number theorem for arithmetic progressions


Recall that in proving Dirichlet’s theorem we used the identity
Σ_{n≤x, n≡a (mod q)} Λ(n) = (1/φ(q)) Σ_χ χ(ā) Σ_{n≤x} Λ(n)χ(n).
We therefore need to prove asymptotic formulas for ψ(x, χ) = Σ_{n≤x} Λ(n)χ(n),
which we will now do using Perron’s formula.

Theorem 23. If q ≤ exp(O(√(log x))) then:
(1) ψ(x, χ0) = x + O(x exp(−c√(log x))).
(2) If χ ≠ χ0 and χ has no exceptional zero then
ψ(x, χ) = O(x exp(−c√(log x))).
(3) If χ ≠ χ0 and χ has an exceptional zero at β then
ψ(x, χ) = −x^β/β + O(x exp(−c√(log x))).
Proof. By Perron's formula, Theorem 11, if 1 < σ0 < 2 and x is not an integer then, for any T ≥ 1,
ψ(x, χ) = (1/2πi) ∫_{σ0−iT}^{σ0+iT} −(L′/L)(s, χ) (x^s/s) ds + O( (x/T) Σ_{x/2<n<2x} Λ(n)/|x − n| + (x^{σ0}/T) Σ_n Λ(n)/n^{σ0} ).
By the same analysis as for ζ(s), the error term is O(x(log x)²/T) if we choose σ0 = 1 + 1/log x. We extend the contour integral to the rectangular contour with corners at σ0 ± iT and σ1 ± iT, where σ1 < 1 is chosen so that no non-exceptional zeros lie on or inside the contour. The error terms from the short sides and the long left-hand side contribute a total of
O( x(log x)²/T + x^{σ1} (log qT)² ) = O( x exp(−c√(log x)) )
if we choose σ1 = 1 − c1/log qT and T = exp(O(√(log x))).
It remains to note that the residues picked up inside the rectangular contour contribute x if χ = χ0 (from the simple pole of −(L′/L)(s, χ0)x^s/s at s = 1), contribute nothing if χ ≠ χ0 has no exceptional zero (since there are no zeros or poles of L(s, χ) inside the contour), and contribute −x^β/β if χ has an exceptional zero at β. □
Using the previous identity, we can immediately deduce the following prime
number theorem for arithmetic progressions.

Corollary 6. Let (a, q) = 1 and q ≤ exp(O(√(log x))). If q has no exceptional zero then
ψ(x; q, a) = x/φ(q) + O(x exp(−c√(log x))).
If q has an exceptional zero at β, coming from the character χ1, then
ψ(x; q, a) = x/φ(q) − χ1(a)x^β/(φ(q)β) + O(x exp(−c√(log x))).
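A non-examinable aside: the corollary is easy to test numerically for small moduli. The sketch below computes ψ(x; q, a) directly from the definition for q = 4 and compares it with the main term x/φ(q); the cutoff x = 10^5 and the use of sympy's primerange are choices made for this illustration only.

# Compare psi(x; q, a) = sum of Lambda(n) over n <= x with n = a (mod q)
# against the main term x/phi(q), for q = 4 and a in {1, 3}.
from math import log
from sympy import primerange, totient

x, q = 10**5, 4
psi = {1: 0.0, 3: 0.0}

for p in primerange(2, x + 1):
    pk = p
    while pk <= x:          # Lambda(p^k) = log p for every prime power p^k <= x
        r = pk % q
        if r in psi:
            psi[r] += log(p)
        pk *= p

main_term = x / int(totient(q))
for a in (1, 3):
    print(a, psi[a], main_term)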
17. Siegel-Walfisz theorem
The prime number theorem we established in the previous section is quite frus-
trating in that we have two different results, depending on the existence of an
exceptional zero. One may ask whether, if q is small enough, this obstruction can
be overcome.
The answer is yes, and is the following.
Theorem 24 (Siegel-Walfisz). For all A > 0, if (a, q) = 1 and q ≤ (log x)^A then
ψ(x; q, a) = x/φ(q) + O_A(x exp(−c√(log x))).
This follows immediately from the following similar expression for ψ(x, χ).
Theorem 25. If q ≤ (log x)^A and x is large enough (depending only on A) and if χ ≠ χ0 then
ψ(x, χ) = O_A(x exp(−c√(log x))).

In particular, note that this bound holds regardless of whether or not χ has an exceptional zero. This in turn follows from the following result of Siegel.
Theorem 26. For all ε > 0 there exists C_ε such that if χ is a quadratic character modulo q and β is a real zero of L(s, χ) then
β < 1 − C_ε q^{−ε}.
Proof. Omitted; the curious student may find a proof in many standard texts on
analytic number theory, such as Davenport’s Multiplicative Number Theory or
Montgomery and Vaughan’s Multiplicative Number Theory. 
A curious feature of Siegel's result is that the constant C_ε is ineffective: that is, we are not using the notation C_ε to hide some constant that we can't be bothered to work out exactly; rather, there is no way to extract from the proof how the constant depends on ε at all. In turn, this means that the constant in the Siegel-Walfisz theorem is also ineffective.
Proof that Theorem 26 implies Theorem 25. If χ has no exceptional zero then we have already proved this. Suppose that χ has an exceptional zero at β. By Theorem 26, for any ε > 0, we know β < 1 − C_ε q^{−ε}. It follows that
ψ(x, χ) = O( x^β/β + x exp(−c√(log x)) ) = x · O( exp(−C_ε q^{−ε} log x) + exp(−c√(log x)) ).
Using the bound q ≤ (log x)^A, the first term in the bracket is O_A(exp(−c√(log x))) if we choose ε = 1/(3A), say, which gives ψ(x, χ) = O_A(x exp(−c√(log x))) as required. □
