
Applied Mathematics

COMPUTATIONAL NUMBER THEORY

DISCRETE MATHEMATICS AND ITS APPLICATIONS
Series Editor KENNETH H. ROSEN

Developed from the author's popular graduate-level course, Computational Number Theory presents a complete treatment of number-theoretic algorithms. Avoiding advanced algebra, this self-contained text is designed for advanced undergraduate and beginning graduate students in engineering. It is also suitable for researchers new to the field and practitioners of cryptography in industry.

Requiring no prior experience with number theory or sophisticated algebraic tools, the book covers many computational aspects of number theory and highlights important and interesting engineering applications. It first builds the foundation of computational number theory by covering the arithmetic of integers and polynomials at a very basic level. It then discusses elliptic curves, primality testing, algorithms for integer factorization, computing discrete logarithms, and methods for sparse linear systems. The text also shows how number-theoretic tools are used in cryptography and cryptanalysis. A dedicated chapter on the application of number theory in public-key cryptography incorporates recent developments in pairing-based cryptography.

With an emphasis on implementation issues, the book uses the freely available number-theory calculator GP/PARI to demonstrate complex arithmetic computations. The text includes numerous examples and exercises throughout and omits lengthy proofs, making the material accessible to students and practitioners.

ABHIJIT DAS




DISCRETE MATHEMATICS AND ITS APPLICATIONS
Series Editor
Kenneth H. Rosen, Ph.D.
R. B. J. T. Allenby and Alan Slomson, How to Count: An Introduction to Combinatorics,
Third Edition
Craig P. Bauer, Secret History: The Story of Cryptology
Juergen Bierbrauer, Introduction to Coding Theory
Katalin Bimbó, Combinatory Logic: Pure, Applied and Typed
Donald Bindner and Martin Erickson, A Student’s Guide to the Study, Practice, and Tools of
Modern Mathematics
Francine Blanchet-Sadri, Algorithmic Combinatorics on Partial Words
Miklós Bóna, Combinatorics of Permutations, Second Edition
Richard A. Brualdi and Dragoš Cvetković, A Combinatorial Approach to Matrix Theory and Its
Applications
Kun-Mao Chao and Bang Ye Wu, Spanning Trees and Optimization Problems
Charalambos A. Charalambides, Enumerative Combinatorics
Gary Chartrand and Ping Zhang, Chromatic Graph Theory
Henri Cohen, Gerhard Frey, et al., Handbook of Elliptic and Hyperelliptic Curve Cryptography
Charles J. Colbourn and Jeffrey H. Dinitz, Handbook of Combinatorial Designs, Second Edition
Abhijit Das, Computational Number Theory
Martin Erickson, Pearls of Discrete Mathematics
Martin Erickson and Anthony Vazzana, Introduction to Number Theory
Steven Furino, Ying Miao, and Jianxing Yin, Frames and Resolvable Designs: Uses,
Constructions, and Existence
Mark S. Gockenbach, Finite-Dimensional Linear Algebra
Randy Goldberg and Lance Riek, A Practical Handbook of Speech Coders
Jacob E. Goodman and Joseph O’Rourke, Handbook of Discrete and Computational Geometry,
Second Edition



Titles (continued)

Jonathan L. Gross, Combinatorial Methods with Computer Applications


Jonathan L. Gross and Jay Yellen, Graph Theory and Its Applications, Second Edition
Jonathan L. Gross and Jay Yellen, Handbook of Graph Theory
David S. Gunderson, Handbook of Mathematical Induction: Theory and Applications
Richard Hammack, Wilfried Imrich, and Sandi Klavžar, Handbook of Product Graphs,
Second Edition
Darrel R. Hankerson, Greg A. Harris, and Peter D. Johnson, Introduction to Information Theory
and Data Compression, Second Edition
Darel W. Hardy, Fred Richman, and Carol L. Walker, Applied Algebra: Codes, Ciphers, and
Discrete Algorithms, Second Edition
Daryl D. Harms, Miroslav Kraetzl, Charles J. Colbourn, and John S. Devitt, Network Reliability:
Experiments with a Symbolic Algebra Environment
Silvia Heubach and Toufik Mansour, Combinatorics of Compositions and Words
Leslie Hogben, Handbook of Linear Algebra
Derek F. Holt with Bettina Eick and Eamonn A. O’Brien, Handbook of Computational Group Theory
David M. Jackson and Terry I. Visentin, An Atlas of Smaller Maps in Orientable and
Nonorientable Surfaces
Richard E. Klima, Neil P. Sigmon, and Ernest L. Stitzinger, Applications of Abstract Algebra
with Maple™ and MATLAB®, Second Edition
Richard E. Klima and Neil P. Sigmon, Cryptology: Classical and Modern with Maplets
Patrick Knupp and Kambiz Salari, Verification of Computer Codes in Computational Science
and Engineering
William Kocay and Donald L. Kreher, Graphs, Algorithms, and Optimization
Donald L. Kreher and Douglas R. Stinson, Combinatorial Algorithms: Generation, Enumeration,
and Search
Hang T. Lau, A Java Library of Graph Algorithms and Optimization
C. C. Lindner and C. A. Rodger, Design Theory, Second Edition
Nicholas A. Loehr, Bijective Combinatorics
Toufik Mansour, Combinatorics of Set Partitions
Alasdair McAndrew, Introduction to Cryptography with Open-Source Software
Elliott Mendelson, Introduction to Mathematical Logic, Fifth Edition
Alfred J. Menezes, Paul C. van Oorschot, and Scott A. Vanstone, Handbook of Applied
Cryptography
Stig F. Mjølsnes, A Multidisciplinary Introduction to Information Security
Jason J. Molitierno, Applications of Combinatorial Matrix Theory to Laplacian Matrices of Graphs
Richard A. Mollin, Advanced Number Theory with Applications




Richard A. Mollin, Algebraic Number Theory, Second Edition


Richard A. Mollin, Codes: The Guide to Secrecy from Ancient to Modern Times
Richard A. Mollin, Fundamental Number Theory with Applications, Second Edition
Richard A. Mollin, An Introduction to Cryptography, Second Edition
Richard A. Mollin, Quadratics
Richard A. Mollin, RSA and Public-Key Cryptography
Carlos J. Moreno and Samuel S. Wagstaff, Jr., Sums of Squares of Integers
Goutam Paul and Subhamoy Maitra, RC4 Stream Cipher and Its Variants
Dingyi Pei, Authentication Codes and Combinatorial Designs
Kenneth H. Rosen, Handbook of Discrete and Combinatorial Mathematics
Douglas R. Shier and K.T. Wallenius, Applied Mathematical Modeling: A Multidisciplinary
Approach
Alexander Stanoyevitch, Introduction to Cryptography with Mathematical Foundations and
Computer Implementations
Jörn Steuding, Diophantine Analysis
Douglas R. Stinson, Cryptography: Theory and Practice, Third Edition
Roberto Togneri and Christopher J. deSilva, Fundamentals of Information Theory and Coding
Design
W. D. Wallis, Introduction to Combinatorial Designs, Second Edition
W. D. Wallis and J. C. George, Introduction to Combinatorics
Jiacun Wang, Handbook of Finite State Based Models and Applications
Lawrence C. Washington, Elliptic Curves: Number Theory and Cryptography, Second Edition



DISCRETE MATHEMATICS AND ITS APPLICATIONS
Series Editor KENNETH H. ROSEN

COMPUTATIONAL
NUMBER THEORY

ABHIJIT DAS



CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2013 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works


Version Date: 20130308

International Standard Book Number-13: 978-1-4822-0582-4 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit
organization that provides licenses and registration for a variety of users. For organizations that
have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Dedicated to
C. E. Veni Madhavan

Contents

Preface xv

1 Arithmetic of Integers 1
1.1 Basic Arithmetic Operations . . . . . . . . . . . . . . . . . . 2
1.1.1 Representation of Big Integers . . . . . . . . . . . . . 3
1.1.1.1 Input and Output . . . . . . . . . . . . . . . 3
1.1.2 Schoolbook Arithmetic . . . . . . . . . . . . . . . . . . 5
1.1.2.1 Addition . . . . . . . . . . . . . . . . . . . . 5
1.1.2.2 Subtraction . . . . . . . . . . . . . . . . . . . 6
1.1.2.3 Multiplication . . . . . . . . . . . . . . . . . 7
1.1.2.4 Euclidean Division . . . . . . . . . . . . . . . 8
1.1.3 Fast Arithmetic . . . . . . . . . . . . . . . . . . . . . . 11
1.1.3.1 Karatsuba–Ofman Multiplication . . . . . . . 11
1.1.3.2 Toom–Cook Multiplication . . . . . . . . . . 13
1.1.3.3 FFT-Based Multiplication . . . . . . . . . . 16
1.1.4 An Introduction to GP/PARI . . . . . . . . . . . . . . 20
1.2 GCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.2.1 Euclidean GCD Algorithm . . . . . . . . . . . . . . . 27
1.2.2 Extended GCD Algorithm . . . . . . . . . . . . . . . . 29
1.2.3 Binary GCD Algorithm . . . . . . . . . . . . . . . . . 31
1.3 Congruences and Modular Arithmetic . . . . . . . . . . . . . 33
1.3.1 Modular Exponentiation . . . . . . . . . . . . . . . . . 38
1.3.2 Fast Modular Exponentiation . . . . . . . . . . . . . . 39
1.4 Linear Congruences . . . . . . . . . . . . . . . . . . . . . . . 41
1.4.1 Chinese Remainder Theorem . . . . . . . . . . . . . . 41
1.5 Polynomial Congruences . . . . . . . . . . . . . . . . . . . . 44
1.5.1 Hensel Lifting . . . . . . . . . . . . . . . . . . . . . . . 44
1.6 Quadratic Congruences . . . . . . . . . . . . . . . . . . . . . 46
1.6.1 Quadratic Residues and Non-Residues . . . . . . . . . 47
1.6.2 Legendre Symbol . . . . . . . . . . . . . . . . . . . . . 47
1.6.3 Jacobi Symbol . . . . . . . . . . . . . . . . . . . . . . 49
1.7 Multiplicative Orders . . . . . . . . . . . . . . . . . . . . . . 51
1.7.1 Primitive Roots . . . . . . . . . . . . . . . . . . . . . . 51
1.7.2 Computing Orders . . . . . . . . . . . . . . . . . . . . 53
1.8 Continued Fractions . . . . . . . . . . . . . . . . . . . . . . . 54
1.8.1 Finite Continued Fractions . . . . . . . . . . . . . . . 54


1.8.2 Infinite Continued Fractions . . . . . . . . . . . . . . . 55


1.9 Prime Number Theorem and Riemann Hypothesis . . . . . . 58
1.10 Running Times of Arithmetic Algorithms . . . . . . . . . . . 60

2 Arithmetic of Finite Fields 75


2.1 Existence and Uniqueness of Finite Fields . . . . . . . . . . . 76
2.2 Representation of Finite Fields . . . . . . . . . . . . . . . . . 77
2.2.1 Polynomial-Basis Representation . . . . . . . . . . . . 78
2.2.2 Working with Finite Fields in GP/PARI . . . . . . . . 81
2.2.3 Choice of the Defining Polynomial . . . . . . . . . . . 83
2.3 Implementation of Finite Field Arithmetic . . . . . . . . . . 84
2.3.1 Representation of Elements . . . . . . . . . . . . . . . 84
2.3.2 Polynomial Arithmetic . . . . . . . . . . . . . . . . . . 86
2.3.2.1 Addition and Subtraction . . . . . . . . . . . 86
2.3.2.2 Multiplication . . . . . . . . . . . . . . . . . 87
2.3.2.3 Comb Methods . . . . . . . . . . . . . . . . . 88
2.3.2.4 Windowed Comb Methods . . . . . . . . . . 90
2.3.2.5 Modular Reduction . . . . . . . . . . . . . . 92
2.3.3 Polynomial GCD and Inverse . . . . . . . . . . . . . . 94
2.3.3.1 Euclidean Inverse . . . . . . . . . . . . . . . 94
2.3.3.2 Binary Inverse . . . . . . . . . . . . . . . . . 95
2.3.3.3 Almost Inverse . . . . . . . . . . . . . . . . . 97
2.4 Some Properties of Finite Fields . . . . . . . . . . . . . . . . 99
2.4.1 Fermat’s Little Theorem for Finite Fields . . . . . . . 99
2.4.2 Multiplicative Orders of Elements in Finite Fields . . 101
2.4.3 Normal Elements . . . . . . . . . . . . . . . . . . . . . 102
2.4.4 Minimal Polynomials . . . . . . . . . . . . . . . . . . . 103
2.4.5 Implementing Some Functions in GP/PARI . . . . . . 106
2.5 Alternative Representations of Finite Fields . . . . . . . . . 108
2.5.1 Representation with Respect to Arbitrary Bases . . . 108
2.5.2 Normal and Optimal Normal Bases . . . . . . . . . . . 109
2.5.3 Discrete-Log Representation . . . . . . . . . . . . . . . 110
2.5.4 Representation with Towers of Extensions . . . . . . . 111
2.6 Computing Isomorphisms among Representations . . . . . . 113

3 Arithmetic of Polynomials 121


3.1 Polynomials over Finite Fields . . . . . . . . . . . . . . . . . 122
3.1.1 Polynomial Arithmetic . . . . . . . . . . . . . . . . . . 122
3.1.2 Irreducible Polynomials over Finite Fields . . . . . . . 122
3.1.3 Testing Irreducibility of Polynomials . . . . . . . . . . 125
3.1.4 Handling Irreducible Polynomials in GP/PARI . . . . 127
3.2 Finding Roots of Polynomials over Finite Fields . . . . . . . 128
3.2.1 Algorithm for Fields of Odd Characteristics . . . . . . 129
3.2.2 Algorithm for Fields of Characteristic Two . . . . . . 131
3.2.3 Root Finding with GP/PARI . . . . . . . . . . . . . . 132

3.3 Factoring Polynomials over Finite Fields . . . . . . . . . . . 133


3.3.1 Square-Free Factorization . . . . . . . . . . . . . . . . 134
3.3.2 Distinct-Degree Factorization . . . . . . . . . . . . . . 135
3.3.3 Equal-Degree Factorization . . . . . . . . . . . . . . . 136
3.3.4 Factoring Polynomials in GP/PARI . . . . . . . . . . 142
3.4 Properties of Polynomials with Integer Coefficients . . . . . . 145
3.4.1 Relation with Polynomials with Rational Coefficients . 145
3.4.2 Height, Resultant, and Discriminant . . . . . . . . . . 147
3.4.3 Hensel Lifting . . . . . . . . . . . . . . . . . . . . . . . 151
3.5 Factoring Polynomials with Integer Coefficients . . . . . . . 154
3.5.1 Berlekamp’s Factoring Algorithm . . . . . . . . . . . . 154
3.5.2 Basis Reduction in Lattices . . . . . . . . . . . . . . . 160
3.5.3 Lenstra–Lenstra–Lovász Factoring Algorithm . . . . . 166
3.5.4 Factoring in GP/PARI . . . . . . . . . . . . . . . . . . 169

4 Arithmetic of Elliptic Curves 177


4.1 What Is an Elliptic Curve? . . . . . . . . . . . . . . . . . . . 178
4.2 Elliptic-Curve Group . . . . . . . . . . . . . . . . . . . . . . 183
4.2.1 Handling Elliptic Curves in GP/PARI . . . . . . . . . 191
4.3 Elliptic Curves over Finite Fields . . . . . . . . . . . . . . . . 194
4.4 Some Theory of Algebraic Curves . . . . . . . . . . . . . . . 199
4.4.1 Affine and Projective Curves . . . . . . . . . . . . . . 199
4.4.1.1 Affine Curves . . . . . . . . . . . . . . . . . . 199
4.4.1.2 Projective Curves . . . . . . . . . . . . . . . 200
4.4.2 Polynomial and Rational Functions on Curves . . . . 205
4.4.3 Rational Maps and Endomorphisms on Elliptic Curves 213
4.4.4 Divisors . . . . . . . . . . . . . . . . . . . . . . . . . . 217
4.5 Pairing on Elliptic Curves . . . . . . . . . . . . . . . . . . . . 222
4.5.1 Weil Pairing . . . . . . . . . . . . . . . . . . . . . . . . 222
4.5.2 Miller’s Algorithm . . . . . . . . . . . . . . . . . . . . 223
4.5.3 Tate Pairing . . . . . . . . . . . . . . . . . . . . . . . 227
4.5.4 Non-Rational Homomorphisms . . . . . . . . . . . . . 232
4.5.4.1 Distortion Maps . . . . . . . . . . . . . . . . 232
4.5.4.2 Twists . . . . . . . . . . . . . . . . . . . . . 233
4.5.5 Pairing-Friendly Curves . . . . . . . . . . . . . . . . . 234
4.5.6 Efficient Implementation . . . . . . . . . . . . . . . . . 236
4.5.6.1 Windowed Loop in Miller’s Algorithm . . . . 237
4.5.6.2 Final Exponentiation . . . . . . . . . . . . . 237
4.5.6.3 Denominator Elimination . . . . . . . . . . . 239
4.5.6.4 Loop Reduction . . . . . . . . . . . . . . . . 240
4.6 Elliptic-Curve Point Counting . . . . . . . . . . . . . . . . . 243
4.6.1 A Baby-Step-Giant-Step (BSGS) Method . . . . . . . 244
4.6.1.1 Mestre’s Improvement . . . . . . . . . . . . . 246
4.6.2 Schoof’s Algorithm . . . . . . . . . . . . . . . . . . . . 247

5 Primality Testing 265


5.1 Introduction to Primality Testing . . . . . . . . . . . . . . . 266
5.1.1 Pratt Certificates . . . . . . . . . . . . . . . . . . . . . 266
5.1.2 Complexity of Primality Testing . . . . . . . . . . . . 268
5.1.3 Sieve of Eratosthenes . . . . . . . . . . . . . . . . . . 268
5.1.4 Generating Random Primes . . . . . . . . . . . . . . . 269
5.1.5 Handling Primes in the GP/PARI Calculator . . . . . 270
5.2 Probabilistic Primality Testing . . . . . . . . . . . . . . . . . 271
5.2.1 Fermat Test . . . . . . . . . . . . . . . . . . . . . . . . 271
5.2.2 Solovay–Strassen Test . . . . . . . . . . . . . . . . . . 274
5.2.3 Miller–Rabin Test . . . . . . . . . . . . . . . . . . . . 275
5.2.4 Fibonacci Test . . . . . . . . . . . . . . . . . . . . . . 277
5.2.5 Lucas Test . . . . . . . . . . . . . . . . . . . . . . . . 280
5.2.6 Other Probabilistic Tests . . . . . . . . . . . . . . . . 284
5.3 Deterministic Primality Testing . . . . . . . . . . . . . . . . 284
5.3.1 Checking Perfect Powers . . . . . . . . . . . . . . . . . 285
5.3.2 AKS Test . . . . . . . . . . . . . . . . . . . . . . . . . 287
5.4 Primality Tests for Numbers of Special Forms . . . . . . . . 289
5.4.1 Pépin Test for Fermat Numbers . . . . . . . . . . . . . 289
5.4.2 Lucas–Lehmer Test for Mersenne Numbers . . . . . . 290

6 Integer Factorization 297


6.1 Trial Division . . . . . . . . . . . . . . . . . . . . . . . . . . 299
6.2 Pollard’s Rho Method . . . . . . . . . . . . . . . . . . . . . . 301
6.2.1 Floyd’s Variant . . . . . . . . . . . . . . . . . . . . . . 302
6.2.2 Block GCD Calculation . . . . . . . . . . . . . . . . . 304
6.2.3 Brent’s Variant . . . . . . . . . . . . . . . . . . . . . . 304
6.3 Pollard’s p – 1 Method . . . . . . . . . . . . . . . . . . . . . 306
6.3.1 Large Prime Variation . . . . . . . . . . . . . . . . . . 309
6.4 Dixon’s Method . . . . . . . . . . . . . . . . . . . . . . . . . 310
6.5 CFRAC Method . . . . . . . . . . . . . . . . . . . . . . . . . 316
6.6 Quadratic Sieve Method . . . . . . . . . . . . . . . . . . . . 318
6.6.1 Sieving . . . . . . . . . . . . . . . . . . . . . . . . . . 319
6.6.2 Incomplete Sieving . . . . . . . . . . . . . . . . . . . . 323
6.6.3 Large Prime Variation . . . . . . . . . . . . . . . . . . 324
6.6.4 Multiple-Polynomial Quadratic Sieve Method . . . . . 326
6.7 Cubic Sieve Method . . . . . . . . . . . . . . . . . . . . . . . 327
6.8 Elliptic Curve Method . . . . . . . . . . . . . . . . . . . . . . 330
6.9 Number-Field Sieve Method . . . . . . . . . . . . . . . . . . 335

7 Discrete Logarithms 345


7.1 Square-Root Methods . . . . . . . . . . . . . . . . . . . . . . 347
7.1.1 Shanks’ Baby-Step-Giant-Step (BSGS) Method . . . . 348
7.1.2 Pollard’s Rho Method . . . . . . . . . . . . . . . . . . 349
7.1.3 Pollard’s Lambda Method . . . . . . . . . . . . . . . . 350

7.1.4 Pohlig–Hellman Method . . . . . . . . . . . . . . . . . 351


7.2 Algorithms for Prime Fields . . . . . . . . . . . . . . . . . . 352
7.2.1 Basic Index Calculus Method . . . . . . . . . . . . . . 353
7.2.2 Linear Sieve Method (LSM) . . . . . . . . . . . . . . . 355
7.2.2.1 First Stage . . . . . . . . . . . . . . . . . . . 356
7.2.2.2 Sieving . . . . . . . . . . . . . . . . . . . . . 358
7.2.2.3 Running Time . . . . . . . . . . . . . . . . . 359
7.2.2.4 Second Stage . . . . . . . . . . . . . . . . . . 359
7.2.3 Residue-List Sieve Method (RLSM) . . . . . . . . . . 360
7.2.4 Gaussian Integer Method (GIM) . . . . . . . . . . . . 363
7.2.5 Cubic Sieve Method (CSM) . . . . . . . . . . . . . . . 366
7.2.6 Number-Field Sieve Method (NFSM) . . . . . . . . . 369
7.3 Algorithms for Fields of Characteristic Two . . . . . . . . . . 370
7.3.1 Basic Index Calculus Method . . . . . . . . . . . . . . 371
7.3.1.1 A Faster Relation-Collection Strategy . . . . 373
7.3.2 Linear Sieve Method (LSM) . . . . . . . . . . . . . . . 375
7.3.3 Cubic Sieve Method (CSM) . . . . . . . . . . . . . . . 378
7.3.4 Coppersmith’s Method (CM) . . . . . . . . . . . . . . 381
7.4 Algorithms for General Extension Fields . . . . . . . . . . . 384
7.4.1 A Basic Index Calculus Method . . . . . . . . . . . . . 384
7.4.2 Function-Field Sieve Method (FFSM) . . . . . . . . . 385
7.5 Algorithms for Elliptic Curves (ECDLP) . . . . . . . . . . . 386
7.5.1 MOV/Frey–Rück Reduction . . . . . . . . . . . . . . . 387

8 Large Sparse Linear Systems 393


8.1 Structured Gaussian Elimination . . . . . . . . . . . . . . . . 395
8.2 Lanczos Method . . . . . . . . . . . . . . . . . . . . . . . . . 404
8.3 Wiedemann Method . . . . . . . . . . . . . . . . . . . . . . . 410
8.4 Block Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 415
8.4.1 Block Lanczos Method . . . . . . . . . . . . . . . . . . 415
8.4.2 Block Wiedemann Method . . . . . . . . . . . . . . . 421

9 Public-Key Cryptography 429


9.1 Public-Key Encryption . . . . . . . . . . . . . . . . . . . . . 433
9.1.1 RSA Encryption . . . . . . . . . . . . . . . . . . . . . 433
9.1.2 ElGamal Encryption . . . . . . . . . . . . . . . . . . . 436
9.2 Key Agreement . . . . . . . . . . . . . . . . . . . . . . . . . 437
9.3 Digital Signatures . . . . . . . . . . . . . . . . . . . . . . . . 438
9.3.1 RSA Signature . . . . . . . . . . . . . . . . . . . . . . 438
9.3.2 ElGamal Signature . . . . . . . . . . . . . . . . . . . . 439
9.3.3 DSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
9.3.4 ECDSA . . . . . . . . . . . . . . . . . . . . . . . . . . 441
9.4 Entity Authentication . . . . . . . . . . . . . . . . . . . . . . 442
9.4.1 Simple Challenge-Response Schemes . . . . . . . . . . 442
9.4.2 Zero-Knowledge Protocols . . . . . . . . . . . . . . . . 444

9.5 Pairing-Based Cryptography . . . . . . . . . . . . . . . . . . 447


9.5.1 Identity-Based Encryption . . . . . . . . . . . . . . . . 449
9.5.1.1 Boneh–Franklin Identity-Based Encryption . 449
9.5.2 Key Agreement Based on Pairing . . . . . . . . . . . . 452
9.5.2.1 Sakai–Ohgishi–Kasahara Two-Party Key Agreement . . 452
9.5.2.2 Joux Three-Party Key Agreement . . . . . . 453
9.5.3 Identity-Based Signature . . . . . . . . . . . . . . . . . 454
9.5.3.1 Shamir Scheme . . . . . . . . . . . . . . . . . 454
9.5.3.2 Paterson Scheme . . . . . . . . . . . . . . . . 455
9.5.4 Boneh–Lynn–Shacham (BLS) Short Signature Scheme 457

Appendices 465

Appendix A Background 467


A.1 Algorithms and Their Complexity . . . . . . . . . . . . . . . 467
A.1.1 Order Notations . . . . . . . . . . . . . . . . . . . . . 468
A.1.2 Recursive Algorithms . . . . . . . . . . . . . . . . . . 471
A.1.3 Worst-Case and Average Complexity . . . . . . . . . . 475
A.1.4 Complexity Classes P and NP . . . . . . . . . . . . . . 479
A.1.5 Randomized Algorithms . . . . . . . . . . . . . . . . . 481
A.2 Discrete Algebraic Structures . . . . . . . . . . . . . . . . . . 483
A.2.1 Functions and Operations . . . . . . . . . . . . . . . . 483
A.2.2 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . 484
A.2.3 Rings and Fields . . . . . . . . . . . . . . . . . . . . . 488
A.2.4 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . 492
A.2.5 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . 494
A.3 Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . 495
A.3.1 Linear Transformations and Matrices . . . . . . . . . . 496
A.3.2 Gaussian Elimination . . . . . . . . . . . . . . . . . . 497
A.3.3 Inverse and Determinant . . . . . . . . . . . . . . . . . 502
A.3.4 Rank and Nullspace . . . . . . . . . . . . . . . . . . . 506
A.3.5 Characteristic and Minimal Polynomials . . . . . . . . 509
A.4 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
A.4.1 Random Variables and Probability Distributions . . . 511
A.4.2 Birthday Paradox . . . . . . . . . . . . . . . . . . . . 513
A.4.3 Random-Number Generators . . . . . . . . . . . . . . 514

Appendix B Solutions to Selected Exercises 517

Index 583
Preface

This book is a result of my teaching a masters-level course with the same name for five years at the Indian Institute of Technology Kharagpur. The course was attended mostly by MTech and final-year BTech students from the Department of Computer Science and Engineering. Students from the Department of Mathematics and other engineering departments (mostly Electronics and Electrical Engineering, and Information Technology) also attended the course. Some research students enrolled in the MS and PhD programs constituted the third section of the student population. Historically, therefore, the material presented in this book is designed to cater to the needs and tastes of engineering students at the advanced undergraduate and beginning graduate levels. However, several topics that could not be covered in a one-semester course have also been included in order to make this book a comprehensive and complete treatment of number-theoretic algorithms.
A justification is perhaps needed to explain why another textbook on computational number theory is necessary. Some (perhaps not many) textbooks on this subject are already available to international students. These books vary widely with respect to their coverage and technical sophistication. I believe that a textbook specifically targeting the engineering population is missing. This book should be accessible (but is not restricted) to students who have not attended any course on number theory. My teaching experience shows that heavy use of algebra (particularly, advanced topics like commutative algebra or algebraic number theory) often demotivates students. While I have no intention to underestimate the role played by algebra in number theory, I believe that a large part of number theory can still reach students not conversant with sophisticated algebraic tools. It is, of course, meaningless to avoid algebra altogether. For example, when one talks about finite fields or elliptic curves, one expects the audience to be familiar with the notion of basic algebraic structures like groups, rings and fields (and some linear algebra, too). But that is all that I assume on the part of the reader. Although I have made an attempt to cover this basic algebra in an appendix, the concise appendix is perhaps more suited as a reference than as a tool to learn the subject. Likewise, students who have not attended a course on algorithms may pick up the basic terminology (asymptotic notations, types of algorithms, complexity classes) from another appendix. Any sophisticated topic has been treated in a self-contained manner in the main body of the text. For example, some basic algebraic geometry (projective curves, rational functions, divisors) needed for understanding pairing has been developed from scratch. This is not a book on only the computational aspects of elementary number theory; rather, its treatment is kept as elementary as possible.
This book is not above the allegation that it is meant for cryptographers. As a practitioner of cryptography, I do not fully deny this allegation. Having said that, I would also add that cryptography and cryptanalysis make heavy use of many important number-theoretic tools. Prime numbers and integer factorization are as important in the fundamental theorem of (computational) number theory as they are in the RSA cryptosystem. It is difficult to locate cryptography-free corners in computational number theory. Some fun stuff in number theory is omitted, like calculation of the digits of π, generalized Mersenne and Fermat numbers, and numerous simply stated but hard-to-prove conjectures of older origins. If one equates dubbing these issues as non-serious with my affinity to cryptology, I am not going to debate. It appears too subjective to tell what is important and fundamental from what is not. However, a dedicated chapter on applications of number theory in public-key cryptography is a product of my background. I think that this book without this chapter still makes sense, and there is no harm in highlighting practically important and interesting engineering applications of the material with which the remaining chapters deal. Arguably, privacy is as ancient as number theory, and when they go hand in hand, I should not be blamed for pointing that out. On the contrary, inclusion of recent developments in pairing-based cryptography is expected to be a value added to this book.
Emphasis on implementation issues is another distinctive feature of this book. Arithmetic of integers and polynomials is covered at the very basic level. The rest of computational number theory is built on top of that. It, however, appears too annoying to invoke low-level subroutines for complicated algorithms. Consequently, the freely available number-theory calculator GP/PARI has been taken up as the medium to demonstrate arithmetic computations. The reader may wonder why GP/PARI and not Sage has been promoted as the demonstration environment. A partial justification is this: work on this book started in 2006. It was a time when Sage existed but had not gained its present popularity. Although Sage, being a collection of all freely available mathematical utilities including GP/PARI itself, is more efficient and versatile than each of its individual components, GP/PARI is still a convenient and low-footprint package sufficient for most of this book.
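The flavor of such GP/PARI demonstrations can be conveyed by a short sample session (an illustrative sketch, not taken from the book itself; the `? ` prompts are the user's input and the `%n` lines are the calculator's numbered results):

```
? gcd(12345, 54321)
%1 = 3
? Mod(2, 101)^100            \\ Fermat's little theorem: 2^100 = 1 (mod 101)
%2 = Mod(1, 101)
? isprime(2^31 - 1)          \\ 2147483647, a Mersenne prime
%3 = 1
```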
Many examples and exercises accompany the technical presentation. While the examples are mostly illustrative in nature, the exercises can be broadly classified into two categories. Some of them are meant for deepening the understanding of the material presented in the chapters and for filling out certain missing details. The rest are used to develop additional theory which I could not cover in the main text of the book. No attempts have been made to classify exercises as easy or difficult.
Proofs of theorems, propositions, and correctness of algorithms are in many places omitted in the book, particularly when these proofs are long and/or involved and/or too sophisticated. Although this omission may alienate readers from mathematical intricacies, I believe that the risk is not beyond control. After all, every author of a book has to make a compromise among the bulk, the coverage, and the details. I achieved this in a way I found most suitable.
I have not made an attempt to formally cite every contribution discussed
in the text. Some key references are presented as on-line comments and/or
footnotes. I personally find citations like [561] or [ABD2c] rather distracting,
suited to technical papers and research monographs, not to a textbook.
I am not going to describe the technical organization of this book. The ta-
ble of contents already accomplishes this task. I instead underline the impos-
sibility of covering the entire material of this book in a standard three-to-four
hour per week course in a semester (or quarter). Chapters 1 and 2 form the
backbone of computational number theory, and may be covered in the first
half of a course. In the second half, the instructor may choose from a variety
of topics. The most reasonable coverage is that of Chapters 5 and 6, followed,
if time permits, by excerpts from Chapters 3 and/or 7. A second course might
deal with the rest of the book. A beginners’ course on elliptic curves may
concentrate on Chapters 1, 2 and 4. Suitable portions from Chapters 1, 2, 5, 6
and 9 make a course on introductory public-key cryptology. The entire book
is expected to be suitable for self study, for students starting a research career
in this area, and for practitioners of cryptography in industry.
While efforts have been made to keep this book as error-free as possible,
the complete elimination of errors is a dream any author can only hope for.
The onus lies on the readers, too, to detect errors and omissions at any level,
typographical to conceptual to philosophical. Any suggestion will improve
future editions of this book. I can be reached at abhij@cse.iitkgp.ernet.in
and also at SadTijihba@gmail.com.
No project like authoring this book can be complete without the active help
and participation of others. No amount of words suffices to describe the contri-
bution of my PhD supervisor C. E. Veni Madhavan. It is he who introduced me
to the wonderful world of computational number theory and thereby changed
the course of my life forever. I will never forget the days of working with him
as his student on finite-field arithmetic and the discrete-logarithm problem.
Those were, without any shred of doubt, the sweetest days in my academic life.
Among my other teachers, A. K. Nandakumaran, Basudeb Datta, Dilipkumar
Premchand Patil, Sathya S. Keerthi and Vijay Chandru, all from the Indian
Institute of Science, Bangalore, deserve specific mention for teaching me var-
ious aspects of pure and applied mathematics. I also gratefully acknowledge
Tarun Kumar Mukherjee from Jadavpur University, Calcutta, who inculcated
in me a strong affinity for mathematics in my undergraduate days. My one-
year stay with Uwe Storch and Hartmut Wiebe at the Ruhr-Universität in
Bochum, Germany, was a mathematically invigorating experience.
In the early 2000s, some of my colleagues in IIT Kharagpur developed
a taste for cryptography, and I joined this research group with an eye on
public-key algorithms. I have been gladly supplementing their areas of interest
in symmetric cryptology and hardware-based implementations of crypto
protocols. Their constructive suggestions for this book have always been an
asset to me. To Debdeep Mukhopadhyay, Dipanwita Roy Choudhury, Indranil
Sengupta and Rajat Subhra Chakraborty, thanks a lot to all of you. I must also
thank another colleague, Goutam Biswas (he is not a cryptologist though),
who has taken up the responsibility of continuing to teach the course after
me. He referred to this book quite often, and has pointed out many errors and
suggestions for its improvement. Jayanta Mukherjee, neither a cryptologist
nor any type of number theorist, is also gratefully acknowledged for setting
up my initial contact with CRC Press. The remaining faculty members of my
department, too, must be thanked for always extending to me the requisite
help and moral support toward the completion of this book.
I express my gratitude to Bimal Kumar Roy, Kishan Chand Gupta, Palash
Sarkar, Rana Barua and Subhomoy Maitra of the Indian Statistical Institute,
Calcutta, for our frequent exchange of ideas. I acknowledge various forms
of number-theoretic and cryptographic interactions with Debasis Giri from
the Haldia Institute of Technology; Chandan Mazumdar, Goutam Paul and
Indranath Sengupta from Jadavpur University; Gagan Garg from IIT Mandi;
Sugata Gangopadhyay from ISI Chennai; Sanjay Burman from DRDO; and
Aravind Iyer and Bhargav Bellur from General Motors, ISL, Bangalore.
No course or book is complete without the audience—the students. It is
a group of them whose active persuasion let me introduce Computational
Number Theory to our list of masters-level elective courses. In my five years
of teaching, this course gained significant popularity, the semester registra-
tion count rocketing from 10 to over 100. I collectively thank this entire
student population here. Special thanks are due to Angshuman Karmakar,
Aniket Nayak, Anup Kumar Bhattacharya, Binanda Sengupta, Debajit
Dey, Debojyoti Bhattacharya, Mathav Kishore, Pratyay Mukherjee, Rishiraj
Bhattacharyya, Sabyasachi Karati, Satrajit Ghosh, Somnath Ghosh, Souvik
Bhattacherjee, Sourav Basu, Sourav Sen Gupta, Utsab Bose and Yatendra
Dalal, whose association with the book at all stages of its conception has been
a dream to cherish forever.
Anonymous referees arranged by CRC Press provided invaluable sugges-
tions for improving the content and presentation of this book. The publishing
team at CRC Press deserves their share of thanks for always being nice and
friendly to me. Last, but not the least, I must acknowledge the support and
encouragement of my parents and my brother.

Abhijit Das
Kharagpur
Chapter 1
Arithmetic of Integers

1.1 Basic Arithmetic Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2


1.1.1 Representation of Big Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.1.1 Input and Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1.2 Schoolbook Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.2.1 Addition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.1.2.2 Subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.1.2.3 Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.1.2.4 Euclidean Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1.3 Fast Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.3.1 Karatsuba–Ofman Multiplication . . . . . . . . . . . . . . . . . . . . . . . 11
1.1.3.2 Toom–Cook Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.1.3.3 FFT-Based Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.1.4 An Introduction to GP/PARI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.2 GCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.2.1 Euclidean GCD Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.2.2 Extended GCD Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.2.3 Binary GCD Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
1.3 Congruences and Modular Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
1.3.1 Modular Exponentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
1.3.2 Fast Modular Exponentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1.4 Linear Congruences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.4.1 Chinese Remainder Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
1.5 Polynomial Congruences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.5.1 Hensel Lifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
1.6 Quadratic Congruences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
1.6.1 Quadratic Residues and Non-Residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
1.6.2 Legendre Symbol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
1.6.3 Jacobi Symbol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
1.7 Multiplicative Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.7.1 Primitive Roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
1.7.2 Computing Orders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
1.8 Continued Fractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
1.8.1 Finite Continued Fractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
1.8.2 Infinite Continued Fractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
1.9 Prime Number Theorem and Riemann Hypothesis . . . . . . . . . . . . . . . . . . . . . . . 58
1.10 Running Times of Arithmetic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

Loosely speaking, number theory deals with the study of integers, also called
whole numbers. It is an ancient branch of mathematics that has been studied
for millennia and has attracted a variety of researchers ranging from the
professional to the amateur. Recently, in particular after the invention of
public-key cryptology, number theory has found concrete engineering applications
and has turned out to be an interesting and important area of study for
computer scientists and engineers, and also for security experts. Several
outstanding computational challenges pertaining to the theory of numbers
continue to remain unsolved and are expected to boost fascinating research
in the near future.
Let me reserve some special symbols in the blackboard-bold font to denote
the following important sets.
N = {1, 2, 3, . . .} = the set of natural numbers,
N0 = {0, 1, 2, 3, . . .} = the set of non-negative integers,
Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .} = the set of all integers,
Q = { a/b | a ∈ Z, b ∈ N } = the set of rational numbers,
R = the set of real numbers,
C = the set of complex numbers,
P = {2, 3, 5, 7, 11, 13, . . .} = the set of (positive) prime numbers.1

1.1 Basic Arithmetic Operations


All common programming languages provide built-in data types for doing
arithmetic on integers. Usually, each such data type is of a fixed limited size
and so can store only finitely many possible integer values. For example, the
data type int provided in C is typically a 32-bit signed integer value capable of
representing integers in the range −231 , . . . , 231 − 1. If we want to store bigger
integers, we may use floating-point data types, but doing so leads to loss of
precision. The integer 100! requires 158 decimal digits or 525 bits for an exact
representation. A truncated floating-point number like 9.332621544394 × 10^157
is only an approximate representation of 100! (for that matter, also of 100!+1,
and even of 100! + 50!). In number theory, we need to store the exact values
of big integers, that is, 100! should be stored as
100! = 93326215443944152681699238856266700490715968264381621
46859296389521759999322991560894146397615651828625369
7920827223758251185210916864000000000000000000000000.
User-defined data types are needed to this effect. Moreover, it is necessary
to write specific subroutines to implement the standard arithmetic operations
(+,-,*,/,%) suitable for these data types.2
1 The negative primes −2, −3, −5, −7, −11, −13, . . . are primes in an algebraic sense.
2 Donald Ervin Knuth (1938–) provides a comprehensive treatment of integers and
floating-point numbers of arbitrary precisions in his second volume (Seminumerical Al-
gorithms, Chapter 4) of The Art of Computer Programming.

1.1.1 Representation of Big Integers


It is customary to express a big integer n in some predetermined base B,
and store the B-ary digits of n in an array of integers. The base B should be
so chosen that each B-ary digit can fit in a single built-in integer data type. In
order to maximize efficiency, it is worthwhile to take B as a power of 2 and as
large as possible. In 32-bit machines, the typical choice is B = 2^32 , for which a
B-ary digit (a value among 0, 1, 2, . . . , 2^32 − 1) fits in a 32-bit unsigned integer
variable. In 64-bit machines, it is preferable to choose the base B = 2^64 . We
refer to an integer in such a representation as a multiple-precision integer.

Example 1.1 For the purpose of illustration, I choose a small base B = 2^8 =
256 in this and many subsequent examples. For this choice, a B-ary digit is
a value between 0 and 255 (both inclusive), and can be stored in an eight-bit
unsigned integer variable (like unsigned char in C).
Let n = 12345678987654321. The base-256 expansion of n is

n = 12345678987654321
= 43 × B^6 + 220 × B^5 + 84 × B^4 + 98 × B^3 + 145 × B^2 + 244 × B + 177
= (43, 220, 84, 98, 145, 244, 177)B .

Here, 43 is the most significant (or leftmost) digit, whereas 177 is the least
significant (or rightmost) digit. (In general, an integer n having the base-B
representation (ns−1 , ns−2 , . . . , n1 , n0 )B with ns−1 ≠ 0 needs s B-ary digits
with ns−1 and n0 being respectively the most and least significant digits.) An
array of size seven suffices to store n of this example. While expressing an
integer in some base, it is conventional to write the most significant digit first.
On the other hand, if one stores n in an array with zero-based indexing, it is
customary to store n0 in the zeroth index, n1 in the first index, and so on. For
example, the above seven-digit number has the following storage in an array.
Array index 0 1 2 3 4 5 6
Digit 177 244 145 98 84 220 43

The sign of an integer can be stored using an additional bit with 0 meaning
positive and 1 meaning negative. Negative integers may be represented in the
standard 1’s-complement or 2’s-complement format. In what follows, I will
stick to the signed magnitude representation only. □
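The digit decomposition just illustrated can be coded in a few lines. The helper below is an illustrative Python sketch (the name to_digits is mine, not the book's); an actual implementation would store the digits in a fixed array of unsigned words rather than a list.

```python
def to_digits(n, B=256):
    """Little-endian base-B digit list of a nonnegative integer n."""
    d = []
    while n:
        n, r = divmod(n, B)   # peel off the least significant B-ary digit
        d.append(r)
    return d or [0]

# to_digits(12345678987654321) → [177, 244, 145, 98, 84, 220, 43],
# matching the array of Example 1.1 (index 0 holds the least significant digit).
```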

1.1.1.1 Input and Output


Users prefer to input/output integers in the standard decimal notation.
A long integer is typically typed as a string of decimal digits. It is not diffi-
cult to convert a string of decimal digits to any base-B representation. The
converse transformation too is not difficult. I am illustrating these conversion
procedures using a couple of examples.

Suppose that the user reads the input string d0 d1 d2 . . . dt−1 , where each
di is a decimal digit (0–9). This is converted to the base-B representation
(ns−1 , ns−2 , . . . , n1 , n0 )B . Call this integer n. Now, let another digit dt come
in the input string, that is, we now need to represent the integer n′ whose
decimal representation is d0 d1 d2 . . . dt−1 dt . We have n′ = 10n+dt . This means
that we multiply the representation (ns−1 , ns−2 , . . . , n1 , n0 )B by ten and add
dt to the least significant digit. Multiplication and addition algorithms are
described in the next section. For the time being, it suffices to understand
that the string to base-B conversion boils down to a sequence of elementary
arithmetic operations on multiple-precision integers.
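The loop just described can be sketched as follows. This is an illustrative Python version (the function name is mine); it touches the digit array only through single-word operations, as a C implementation with fixed-width words would.

```python
def str_to_base(dec_str, B=256):
    """Convert a decimal digit string to a little-endian base-B digit list."""
    n = []                    # least significant digit first
    for ch in dec_str:
        carry = int(ch)       # n := 10*n + digit, one word at a time
        for i in range(len(n)):
            v = n[i] * 10 + carry
            n[i], carry = v % B, v // B
        while carry:          # append any leftover high-order words
            n.append(carry % B)
            carry //= B
    return n or [0]
```

On the string "123454321" this reproduces the run of Example 1.2, ending at (7, 91, 195, 113)_256.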

Example 1.2 We convert the string “123454321” to an integer in base 256.


Input digit Operation B-ary representation
Initialization ()B
1 Multiply by 10 ()B
Add 1 (1)B
2 Multiply by 10 (10)B
Add 2 (12)B
3 Multiply by 10 (120)B
Add 3 (123)B
4 Multiply by 10 (4, 206)B
Add 4 (4, 210)B
5 Multiply by 10 (48, 52)B
Add 5 (48, 57)B
4 Multiply by 10 (1, 226, 58)B
Add 4 (1, 226, 62)B
3 Multiply by 10 (18, 214, 108)B
Add 3 (18, 214, 111)B
2 Multiply by 10 (188, 96, 86)B
Add 2 (188, 96, 88)B
1 Multiply by 10 (7, 91, 195, 112)B
Add 1 (7, 91, 195, 113)B

The base-256 representation of 123454321 is, therefore, (7, 91, 195, 113)256 . □

For the sake of efficiency, one may process, in each iteration, multiple dec-
imal digits from the input. For example, if n′′ has the decimal representation
d0 d1 d2 . . . dt−1 dt dt+1 , we have n′′ = 100n + (dt dt+1 )10 = 100n + (10dt + dt+1 ).
In fact, the above procedure can easily handle chunks of k digits simultane-
ously from the input so long as 10^k < B.
The converse transformation is similar, albeit a little more involved. Now,
we have to carry out arithmetic in a base B ′ which is a suitable integral power
of ten. For example, we may choose B ′ = 10^l with 10^l < B < 10^(l+1) . First, we
express the representation base B in the base B ′ used for output: B = HB ′ + L
with H, L ∈ {0, 1, . . . , B ′ − 1}. Let n = (ns−1 , ns−2 , . . . , n1 , n0 )B be available
in the base-B representation. Denote Ni = (ns−1 , ns−2 , . . . , ni )B . We have
Ns = 0 and N0 = n. We iteratively compute the base-B ′ representations of
Ns−1 , Ns−2 , . . . , N0 with the initialization Ns = ()B ′ . We have Ni = Ni+1 B +
ni = Ni+1 (HB ′ +L)+ni . Since ni is a B-ary digit, it may consist of two B ′ -ary
digits, so we write ni = hi B ′ +li . This gives Ni = Ni+1 (HB ′ + L)+(hi B ′ + li ).
Therefore, from the base-B ′ representation of Ni+1 , we can compute the base-
B ′ representation of Ni using multiple-precision arithmetic under base B ′ .
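A compact sketch of this output conversion follows (illustrative Python; the function name is mine). One shortcut deserves mention: the product out[i] * B + carry is a double-word quantity, which Python's unbounded integers absorb silently; a fixed-width implementation would instead use the B = HB′ + L split described above.

```python
def base_to_decimal(digits, B=256, Bp=100):
    """Convert a little-endian base-B digit list to a decimal string via the
    intermediate base B' = Bp, a power of ten."""
    out = []                          # little-endian base-B' accumulator
    for ni in reversed(digits):       # most significant B-ary digit first
        carry = ni                    # out := out*B + ni, in base-B' words
        for i in range(len(out)):
            v = out[i] * B + carry
            out[i], carry = v % Bp, v // Bp
        while carry:
            out.append(carry % Bp)
            carry //= Bp
    if not out:
        return "0"
    l = len(str(Bp - 1))              # pad each lower digit to l decimal places
    return str(out[-1]) + "".join(str(d).zfill(l) for d in reversed(out[:-1]))
```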

Example 1.3 Like the previous two examples, we choose the base B = 256.
The base for decimal output can be chosen as B ′ = 100. We, therefore, need
to convert integers from the base-256 representation to the base-100 represen-
tation. We write B = 2 × B ′ + 56, that is, H = 2 and L = 56. For the input
n = (19, 34, 230, 113)B , the conversion procedure works as follows.
i B-ary digit ni hi li Operation B ′ -ary representation
4 Initialization ()B ′
3 19 0 19 Multiply by 256 ()B ′
Add 19 (19)B ′
2 34 0 34 Multiply by 256 (48, 64)B ′
Add 34 (48, 98)B ′
1 230 2 30 Multiply by 256 (1, 25, 38, 88)B ′
Add 230 (1, 25, 41, 18)B ′
0 113 1 13 Multiply by 256 (3, 21, 5, 42, 8)B ′
Add 113 (3, 21, 5, 43, 21)B ′
If we concatenate the B ′ -ary digits from the most significant end to the least
significant end, we obtain the decimal representation of n. There is only a small
catch here. The digit 5 should be printed as 05, that is, each digit (except the
most significant one) in the base B ′ = 10^l must be printed as an l-digit integer
after padding with the requisite number of leading zeros (whenever necessary).
In the example above, l is 2, so the 100-ary digits 21, 43 and 21 do not require
this padding, whereas 5 requires this. The most significant digit 3 too does
not require a leading zero (although there is no harm if one prints it). To sum
up, we have n = (19, 34, 230, 113)B = 321054321. □

1.1.2 Schoolbook Arithmetic


We are used to doing arithmetic operations (addition, subtraction, multi-
plication and Euclidean division) on integers in the standard decimal represen-
tation. When we change the base of representation from ten to any other base
B (a power of two or a power of ten), the basic procedures remain the same.

1.1.2.1 Addition
Let two multiple-precision integers a = (as−1 , as−2 , . . . , a1 , a0 )B and b =
(bt−1 , bt−2 , . . . , b1 , b0 )B be available in the base-B representation. Without loss
of generality, we may assume that s = t (if not, we pad the smaller operand
with leading zero digits). We keep on adding ai + bi with carry adjustments
in the sequence i = 0, 1, 2, . . . .
A small implementation-related issue needs to be addressed in this context.
Suppose that we choose a base B = 2^32 in a 32-bit machine. When we add
ai and bi (possibly also an input carry), the sum may be larger than 2^32 − 1
and can no longer fit in a 32-bit integer. If such a situation happens, we know
that the output carry is 1 (otherwise, it is 0). But then, how can we detect
whether an overflow has occurred, particularly if we work only with high-level
languages (without access to assembly-level instructions)? A typical behavior
of modern computers is that whenever the result of some unsigned arithmetic
operation is larger than 2^32 − 1 (or 2^64 − 1 for a 64-bit machine), the least
significant 32 (or 64) bits are returned. In other words, addition, subtraction
and multiplication of 32-bit (or 64-bit) unsigned integers are actually com-
puted modulo 2^32 (or 2^64 ). Based upon this assumption about our computer,
we can detect overflows without any assembly-level support.
First, suppose that the input carry is zero, that is, we add two B-ary digits
ai and bi only. If there is no overflow, the modulo-B sum ai + bi (mod 2^32 ) is
not smaller than ai and bi . On the other hand, if an overflow occurs, the sum
ai + bi (mod 2^32 ) is smaller than both ai and bi . Therefore, by inspecting the
return value of the high-level sum, one can deduce the output carry. If the
input carry is one, ai + bi + 1 (mod 2^32 ) is at most as large as ai if and only
if an overflow occurs. Thus, if there is no overflow, this sum has to be larger
than ai . This observation lets us detect the presence of the output carry.
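The carry rule just derived can be written down directly. The following sketch is illustrative Python, with a mask simulating 32-bit unsigned wraparound (in C, unsigned arithmetic wraps this way for free):

```python
MASK = (1 << 32) - 1      # simulate 32-bit unsigned wraparound

def add_words(a, b, cin):
    """Add two 32-bit words and an input carry; deduce the output carry
    from the wrapped sum alone, as described in the text."""
    s = (a + b + cin) & MASK
    if cin == 0:
        cout = 1 if s < a else 0    # wrapped iff the sum dropped below an operand
    else:
        cout = 1 if s <= a else 0   # with a carry-in, equality also means wrap
    return s, cout

# add_words(0xFFFFFFFF, 1, 0) → (0, 1); add_words(2, 3, 0) → (5, 0)
```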

Example 1.4 Consider the two operands in base B = 256 = 2^8 :


a = 9876543210 = (2, 76, 176, 22, 234)B ,
b = 1357902468 = (80, 239, 242, 132)B .
Word-by-word addition proceeds as follows. We use cin and cout to denote the
input and the output carries, respectively.
i ai bi cin ai + bi + cin (mod 256) Carry? cout
0 234 132 0 110 (110 < 234)? 1
1 22 242 1 9 (9 ≤ 22)? 1
2 176 239 1 160 (160 ≤ 176)? 1
3 76 80 1 157 (157 ≤ 76)? 0
4 2 0 0 2 (2 < 2)? 0
It therefore follows that a + b = (2, 157, 160, 9, 110)B = 11234445678. □

1.1.2.2 Subtraction
The procedure for the subtraction of multiple-precision integers is equally
straightforward. In this case, we need to handle the borrows. The input and
output borrows are denoted, as before, by cin and cout . The check whether
computing ai −bi −cin results in an output borrow can be performed before the
operation is carried out. If cin = 0, the output borrow cout is one if and only if
ai < bi . On the other hand, for cin = 1, the output borrow is one if and only
if ai ≤ bi . Even in the case that there is an output borrow, one may blindly
compute the mod B operation ai − bi − cin and keep the returned value as the
output word, provided that the CPU supports 2’s-complement arithmetic. If
not, ai − bi − cin may be computed as ai + (B − bi − cin ) with B − bi − cin
computed using appropriate bit operations. More precisely, B − bi − 1 is the
bit-wise complement of bi , and B − bi is one more than that.
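Putting the borrow rule to work over whole digit arrays gives the following sketch (illustrative Python on little-endian base-B lists, assuming a ≥ b; the mod-B step plays the role of the 2's-complement wraparound mentioned above):

```python
def mp_sub(a, b, B=256):
    """Digit-wise subtraction a - b for little-endian base-B lists, a >= b."""
    out, borrow = [], 0
    for i in range(len(a)):
        ai = a[i]
        bi = b[i] if i < len(b) else 0
        bout = 1 if ai < bi + borrow else 0   # borrow test precedes the arithmetic
        out.append((ai - bi - borrow) % B)
        borrow = bout
    while len(out) > 1 and out[-1] == 0:      # strip leading zero digits
        out.pop()
    return out
```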

Example 1.5 Let us compute a − b for the same operands as in Example 1.4.
i ai bi cin Borrow? cout ai − bi − cin (mod 256)
0 234 132 0 (234 < 132)? 0 102
1 22 242 0 (22 < 242)? 1 36
2 176 239 1 (176 ≤ 239)? 1 192
3 76 80 1 (76 ≤ 80)? 1 251
4 2 0 1 (2 ≤ 0)? 0 1
We obtain a−b = (1, 251, 192, 36, 102)B = 8518640742. In this example, a > b.
If a < b, then a−b can be computed as −(b−a). While performing addition and
subtraction of signed multiple-precision integers, a subroutine for comparing
the absolute values of two operands turns out to be handy. □

1.1.2.3 Multiplication
Multiplication of two multiple-precision integers is somewhat problematic.
Like decimal (or polynomial) multiplication, one may multiply every word of
the first operand with every word of the second. But a word-by-word multi-
plication may lead to a product as large as (B − 1)^2 . For B = 2^32 , the product
may be a 64-bit value, whereas for B = 2^64 , the product may be a 128-bit
value. Correctly obtaining all these bits is trickier than a simple check for an
output carry or borrow as we did for addition and subtraction.
Many compilers support integer data types (perhaps non-standard) of size
twice the natural word size of the machine. If so, one may use this facility to
compute the double-sized intermediate products. Assembly-level instructions
may also allow one to retrieve all the bits of word-by-word products. If neither
of these works, one possibility is to break each operand word into two half-sized
integers. That is, one writes ai = hi √B + li and bj = h′j √B + l′j , and computes
ai bj as hi h′j B + (hi l′j + li h′j )√B + li l′j . Here, hi h′j contributes only to the more
significant word of ai bj , and li l′j only to the less significant word. But hi l′j and
li h′j contribute to both the words. With appropriate bit-shift and extraction
operations, these contributions can be separated and added to appropriate
words of the product. When ai bj is computed as hB + l = (h, l)B , the less
significant word l is added to the (i + j)-th position of the output, whereas h
is added to the (i + j + 1)-st position. Each such addition may lead to a carry
which needs to be propagated to higher positions until the carry is absorbed
in some word of the product.
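The whole word-by-word procedure, including the carry propagation just described, can be sketched as follows (illustrative Python on little-endian base-B lists; divmod(ai * bj, B) plays the role of the (h, l)_B split of the double-word product):

```python
def mp_mul(a, b, B=256):
    """Schoolbook product of two little-endian base-B digit lists."""
    c = [0] * (len(a) + len(b))        # product needs at most len(a)+len(b) digits
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            h, l = divmod(ai * bj, B)  # word product split as (h, l)_B
            for pos, d in ((i + j, l), (i + j + 1, h)):
                while d:               # add d at position pos, propagating the carry
                    d, c[pos] = divmod(c[pos] + d, B)
                    pos += 1
    while len(c) > 1 and c[-1] == 0:   # strip leading zero digits
        c.pop()
    return c
```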

Example 1.6 We compute the product of the following two operands avail-
able in the representation to base B = 2^8 = 256.
a = 1234567 = (18, 214, 135)B ,
b = 76543210 = (4, 143, 244, 234)B .
The product may be as large as having 3 + 4 = 7 B-ary words. We initialize
the product c as an array of seven eight-bit values, each initialized to zero. In
the following table, c is presented in the B-ary representation with the most
significant digit written first.
i ai j bj ai bj = (h, l)B Operation c
Initialization (0, 0, 0, 0, 0, 0, 0)B
0 135 0 234 (123, 102)B Add 102 at pos 0 (0, 0, 0, 0, 0, 0, 102)B
Add 123 at pos 1 (0, 0, 0, 0, 0, 123, 102)B
1 244 (128, 172)B Add 172 at pos 1 (0, 0, 0, 0, 1, 39, 102)B
Add 128 at pos 2 (0, 0, 0, 0, 129, 39, 102)B
2 143 ( 75, 105)B Add 105 at pos 2 (0, 0, 0, 0, 234, 39, 102)B
Add 75 at pos 3 (0, 0, 0, 75, 234, 39, 102)B
3 4 ( 2, 28)B Add 28 at pos 3 (0, 0, 0, 103, 234, 39, 102)B
Add 2 at pos 4 (0, 0, 2, 103, 234, 39, 102)B
1 214 0 234 (195, 156)B Add 156 at pos 1 (0, 0, 2, 103, 234, 195, 102)B
Add 195 at pos 2 (0, 0, 2, 104, 173, 195, 102)B
1 244 (203, 248)B Add 248 at pos 2 (0, 0, 2, 105, 165, 195, 102)B
Add 203 at pos 3 (0, 0, 3, 52, 165, 195, 102)B
2 143 (119, 138)B Add 138 at pos 3 (0, 0, 3, 190, 165, 195, 102)B
Add 119 at pos 4 (0, 0, 122, 190, 165, 195, 102)B
3 4 ( 3, 88)B Add 88 at pos 4 (0, 0, 210, 190, 165, 195, 102)B
Add 3 at pos 5 (0, 3, 210, 190, 165, 195, 102)B
2 18 0 234 ( 16, 116)B Add 116 at pos 2 (0, 3, 210, 191, 25, 195, 102)B
Add 16 at pos 3 (0, 3, 210, 207, 25, 195, 102)B
1 244 ( 17, 40)B Add 40 at pos 3 (0, 3, 210, 247, 25, 195, 102)B
Add 17 at pos 4 (0, 3, 227, 247, 25, 195, 102)B
2 143 ( 10, 14)B Add 14 at pos 4 (0, 3, 241, 247, 25, 195, 102)B
Add 10 at pos 5 (0, 13, 241, 247, 25, 195, 102)B
3 4 ( 0, 72)B Add 72 at pos 5 (0, 85, 241, 247, 25, 195, 102)B
Add 0 at pos 6 (0, 85, 241, 247, 25, 195, 102)B

The product of a and b is, therefore, c = ab = (0, 85, 241, 247, 25, 195, 102)B =
(85, 241, 247, 25, 195, 102)B = 94497721140070. □

1.1.2.4 Euclidean Division


Euclidean division turns out to be the most notorious among the basic
arithmetic operations. Given any integers a, b with b 6= 0, there exist unique
integers q, r satisfying a = bq + r and 0 ≤ r ≤ |b| − 1. We call q the quotient
of Euclidean division of a by b, whereas r is called the remainder of Euclidean
division of a by b. We denote q = a quot b and r = a rem b. For simplicity,
let us assume that a and b are positive multiple-precision integers, and a > b.
The quotient q and the remainder r are again multiple-precision integers in
general. Computing these essentially amounts to efficiently guessing the B-
ary digits of q from the most significant end. The process resembles the long
division procedure we are accustomed to for decimal integers.
In order that the guesses of the quotient words quickly converge to the
correct values, a certain precaution needs to be taken. We will assume that
the most significant word of b is at least as large as B/2. If this condition is not
satisfied, we multiply both a and b by a suitable power of 2. The remainder
computed for these modified operands will also be multiplied by the same
power of 2, whereas the quotient will remain unchanged. Preprocessing b in
this manner is often referred to as normalization.
Suppose that a = (as−1 , as−2 , . . . , a1 , a0 )B and b = (bt−1 , bt−2 , . . . , b1 , b0 )B
with bt−1 ≥ B/2. The quotient may be as large as having s−t+1 B-ary words.
So we represent q as an array with s − t + 1 cells each initialized to zero. Let
us denote q = (qs−t , qs−t−1 , . . . , q1 , q0 )B .
If as−1 ≥ bt−1 and a ≥ B^(s−t) b, we increment qs−t by one, and subtract
B^(s−t) b from a. Since b is normalized, this step needs to be executed at most
once. Now, assume that as−1 ≤ bt−1 and a < B^(s−t) b. We now guess the
next word qs−t−1 of the quotient. The initial guess is based upon only the two
most significant words of a and one most significant word of b, that is, if
as−1 = bt−1 , we set qs−t−1 = B − 1, otherwise we set qs−t−1 = ⌊(as−1 B + as−2 )/bt−1 ⌋.
.
Computing this division is easy if arithmetic routines for double-word-sized
integer operands are provided by the compiler.
The initial guess for qs−t−1 may be slightly larger than the correct value.
If b is normalized, the error cannot be more than three. In order to refine
the guess, we first consider one more word from the most significant end
of each of a and b. This time we do not require division, but we will keep on
decrementing qs−t−1 so long as as−1 B^2 + as−2 B + as−3 < (bt−1 B + bt−2 )qs−t−1
(this computation involves multiplications only). For a normalized b, this check
needs to be carried out at most twice. This means that the guessed value of
the word qs−t−1 is, at this stage, either the correct value or just one more
than the correct value. We compute c = qs−t−1 B^(s−t−1) b. If c > a, the guess is
still incorrect, so we decrement qs−t−1 for the last time and subtract bB^(s−t−1)
from c. Finally, we replace a by a − c.
Suppose that after the operations described in the last two paragraphs, we
have reduced a to (a′s−2 , a′s−3 , . . . , a′1 , a′0 )B . We repeat the above operations
on this updated a with s replaced by s − 1. The division loop is broken when
all the B-ary digits of q are computed, that is, when a is reduced to an integer
with at most t B-ary digits. At this stage, the reduced a may still be larger
than b. If so, q0 is incremented by 1, and a is replaced by a − b. The value
stored in a now is the remainder r = a rem b.
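The full procedure, normalization, digit guessing from the leading words, and downward correction, can be sketched as below. This is an illustrative Python version: the function name is mine, and Python's unbounded integers stand in for the multi-word arrays, comparisons, and subtractions that a C implementation would perform word by word.

```python
def mp_divmod(a, b, B=256):
    """Long division in the style described above, for integers a >= b > 0."""
    m = B.bit_length() - 1             # B = 2^m
    k = (-b.bit_length()) % m          # normalization: leading digit of b >= B/2
    a <<= k
    b <<= k
    t = (b.bit_length() + m - 1) // m  # number of B-ary digits of b
    bt = b >> (m * (t - 1))            # most significant digit of b
    s = (a.bit_length() + m - 1) // m
    q, r = 0, a
    for pos in range(s - t, -1, -1):
        # initial guess from the words of r above position t + pos - 1
        qd = min(B - 1, (r >> (m * (t + pos - 1))) // bt)
        while qd * (b << (m * pos)) > r:
            qd -= 1                    # a few corrections at most, b being normalized
        r -= qd * (b << (m * pos))
        q += qd << (m * pos)
    return q, r >> k                   # undo the normalization on the remainder
```

On the operands of Example 1.7 this reproduces the quotient 19148084 and the remainder 13571457690.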

Example 1.7 Let me explain the working of the above division algorithm on
the following two operands available in the base-256 representation.
a = 369246812345567890 = (5, 31, 212, 4, 252, 77, 138, 146)B ,
b = 19283746550 = (4, 125, 102, 158, 246)B .
The most significant word of b is too small. Multiplying both a and b by
2^5 = 32 completes the normalization procedure, and the operands change to
a = 1463015397982818880 = (163, 250, 128, 159, 137, 177, 82, 64)B ,
b = 617079889600 = (143, 172, 211, 222, 192)B .
We initially have s = 8 and t = 5, that is, the quotient can have at most
s − t + 1 = 4 digits to the base B = 256. The steps of the division procedure
are now illustrated in the following table.
s   Condition / Operation                                    Intermediate values
    Initialization                                           q = (0, 0, 0, 0)B
                                                             a = (163, 250, 128, 159, 137, 177, 82, 64)B
8   (a7 > b4)? Yes. Increment q3.                            q = (1, 0, 0, 0)B
    Subtract B^3 b from a.                                   a = (20, 77, 172, 192, 201, 177, 82, 64)B
    (a7 = b4)? No. Set q2 = ⌊(a7 B + a6)/b4⌋.                q = (1, 36, 0, 0)B
    (a7 B^2 + a6 B + a5 < q2 (b4 B + b3))? No. Do nothing.
    Compute c = q2 B^2 b.                                    c = (20, 52, 77, 203, 83, 0, 0, 0)B
    (c > a)? No. Do nothing.
    Set a := a − c.                                          a = (25, 94, 245, 118, 177, 82, 64)B
7   (a6 > b4)? No. Do nothing.
    (a6 = b4)? No. Set q1 = ⌊(a6 B + a5)/b4⌋.                q = (1, 36, 45, 0)B
    (a6 B^2 + a5 B + a4 < q1 (b4 B + b3))? No. Do nothing.
    Compute c = q1 B b.                                      c = (25, 65, 97, 62, 39, 192, 0)B
    (c > a)? No. Do nothing.
    Set a := a − c.                                          a = (29, 148, 56, 137, 146, 64)B
6   (a5 > b4)? No. Do nothing.
    (a5 = b4)? No. Set q0 = ⌊(a5 B + a4)/b4⌋.                q = (1, 36, 45, 52)B
    (a5 B^2 + a4 B + a3 < q0 (b4 B + b3))? No. Do nothing.
    Compute c = q0 b.                                        c = (29, 47, 27, 9, 63, 0)B
    (c > a)? No. Do nothing.
    Set a := a − c.                                          a = (101, 29, 128, 83, 64)B
5   (a4 > b4)? No. Do nothing.
Let us again use the letters a, b to stand for the original operands (before
normalization). We have computed (32a) quot (32b) = (1, 36, 45, 52)B and
(32a) rem (32b) = (101, 29, 128, 83, 64)B . For the original operands, we then
have a quot b = (32a) quot (32b) = (1, 36, 45, 52)B = 19148084, and a rem b =
[(32a) rem (32b)]/32 = (3, 40, 236, 2, 154)B = 13571457690. ¤
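The numbers appearing in this example are easy to cross-check. A small sketch (ours; to_int is a hypothetical helper) compares the digit tuples above with ordinary integer arithmetic:

```python
B = 256

def to_int(digits):
    """Value of a big-endian base-256 digit tuple, as printed above."""
    x = 0
    for d in digits:
        x = x * B + d
    return x

a, b = 369246812345567890, 19283746550
# Normalization multiplies both operands by 32.
assert 32 * a == to_int((163, 250, 128, 159, 137, 177, 82, 64))
assert 32 * b == to_int((143, 172, 211, 222, 192))
# Quotient and (denormalized) remainder recovered from the computation.
assert a // b == to_int((1, 36, 45, 52)) == 19148084
assert a % b == to_int((101, 29, 128, 83, 64)) // 32 == 13571457690
```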
1.1.3 Fast Arithmetic
Multiple-precision arithmetic has been an important field of research since
the advent of computers. The schoolbook methods for addition and subtrac-
tion of two n-digit integers take O(n) time, and cannot be improved further
(at least in the big-Oh notation). On the contrary, the O(n^2)-time schoolbook
algorithms for multiplying two n-digit integers and for dividing a 2n-digit
integer by an n-digit integer are far from optimal. In this section, I explain
some multiplication algorithms that run faster than O(n^2) time. When we
study modular arithmetic, some efficient algorithms for modular multiplica-
tion (multiplication followed by Euclidean division) will be discussed.

1.1.3.1 Karatsuba–Ofman Multiplication
Apparently, the first fast integer-multiplication algorithm was proposed by
Karatsuba and Ofman.3 Let a, b be two multiple-precision integers each with n
B-ary words. For the sake of simplicity, assume that n is even. Let m = n/2.
We can write a = A1 B^m + A0 and b = B1 B^m + B0, where A0, A1, B0, B1
are multiple-precision integers each having n/2 B-ary digits. We have ab =
(A1 B1)B^{2m} + (A1 B0 + A0 B1)B^m + (A0 B0). Therefore, if we compute the four
products A1 B1, A1 B0, A0 B1 and A0 B0 of n/2-digit integers, we can compute
ab using a few additions of n-digit integers. This does not immediately lead
to any improvement over the running time of schoolbook multiplication.
The Karatsuba–Ofman trick is to compute ab using only three products
of n/2-digit integers, namely A1 B1 , A0 B0 and (A1 + A0 )(B1 + B0 ). These
products give A1 B0 +A0 B1 = (A1 +A0 )(B1 +B0 )−A1 B1 −A0 B0 . This decrease
in the number of multiplications is achieved at the cost of an increased number
of additions and subtractions. But since this is only an additional O(n)-time
overhead, we can effectively speed up the computation of ab (unless the size
n of the operands is too small).
A small trouble with the above strategy is that A1 + A0 and/or B1 + B0
may be too large to fit in m = n/2 digits. Consequently, the subproduct
(A1 + A0)(B1 + B0) may be a product of (m + 1)-digit integers. It is, therefore, preferable
to compute the quantities A1 −A0 and B1 −B0 which may be negative but must
fit in m digits. Subsequently, the product (A1 − A0 )(B1 − B0 ) is computed,
and we obtain A1 B0 + A0 B1 = A1 B1 + A0 B0 − (A1 − A0 )(B1 − B0 ).
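As a concrete illustration of this subtractive variant, here is a minimal recursive sketch (ours, not code from the book; karatsuba and word_len are hypothetical names) with numbers split at word boundaries of B = 256:

```python
def word_len(x):
    """Number of base-256 words needed to store x >= 0."""
    return max(1, (x.bit_length() + 7) // 8)

def karatsuba(a, b):
    """Product of nonnegative integers a and b by the subtractive
    Karatsuba-Ofman variant: A1*B1, A0*B0 and (A1-A0)(B1-B0)."""
    n = max(word_len(a), word_len(b))
    if n == 1:
        return a * b                       # single-word base case
    m = n // 2
    R = 1 << (8 * m)                       # split point B^m
    A1, A0 = divmod(a, R)
    B1, B0 = divmod(b, R)
    H = karatsuba(A1, B1)                  # high part A1*B1
    L = karatsuba(A0, B0)                  # low part A0*B0
    # The differences may be negative; track the sign separately so the
    # recursive call works on nonnegative operands of half the size.
    sign = 1 if (A1 >= A0) == (B1 >= B0) else -1
    M = sign * karatsuba(abs(A1 - A0), abs(B1 - B0))
    mid = H + L - M                        # = A1*B0 + A0*B1
    return (H * R + mid) * R + L
```

For instance, karatsuba(123456789, 987654321) returns 121932631112635269.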
3 A. Karatsuba and Yu. Ofman, Multiplication of many-digital numbers by automatic
computers, Doklady Akad. Nauk. SSSR, Vol. 145, 293–294, 1962. The paper gives the full
credit of the multiplication algorithm to Karatsuba only.
Example 1.8 Take B = 256, a = 123456789 = (7, 91, 205, 21)B , and b =
987654321 = (58, 222, 104, 177)B . We have A1 = (7, 91)B , A0 = (205, 21)B ,
B1 = (58, 222)B and B0 = (104, 177)B . The subproducts are computed as
A1 B1 = (1, 176, 254, 234)B ,
A0 B0 = (83, 222, 83, 133)B ,
A1 − A0 = −(197, 186)B ,
B1 − B0 = −(45, 211)B ,
(A1 − A0 )(B1 − B0 ) = (35, 100, 170, 78)B .
It follows that
A1 B0 + A0 B1 = A1 B1 + A0 B0 − (A1 − A0 )(B1 − B0 ) = (50, 42, 168, 33)B .
The three subproducts are added with appropriate shifts to obtain ab.
    1  176  254  234
             50   42  168   33
                       83  222   83  133
    1  177   49   20  251  255   83  133
Therefore, ab = (1, 177, 49, 20, 251, 255, 83, 133)B = 121932631112635269. ¤
So far, we have used the Karatsuba–Ofman trick only once. If m = n/2 is
large enough, we can recursively apply the same trick to compute the three
subproducts A1 B1, A0 B0 and (A1 − A0)(B1 − B0). Each such subproduct leads
to three subsubproducts of n/4-digit integers. If n/4 too is large, we can again
compute these subsubproducts using the Karatsuba–Ofman algorithm. Recur-
sion stops when either the operands become too small or the level of recursion
reaches a prescribed limit. Under this recursive invocation, Karatsuba–Ofman
multiplication achieves a running time of O(n^{log_2 3}), that is, about O(n^{1.585}).
This is much better than the O(n2 ) time of schoolbook multiplication. In
practice, the advantages of using the Karatsuba–Ofman algorithm show up
for integer operands of size at least a few hundred bits.
Example 1.9 Let me recursively compute A1 B1 of Example 1.8. Since A1 =
7B + 91 and B1 = 58B + 222, we compute 7 × 58 = (1, 150)B , 91 × 222 =
(78, 234)B , and (7−91)(58−222) = (53, 208)B . Finally, (1, 150)B +(78, 234)B −
(53, 208)B = (26, 176)B , so A1 B1 is computed as:
    1  150
          26  176
               78  234
    1  176  254  234
This gives A1 B1 = (1, 176, 254, 234)B . The other two subproducts A0 B0 and
(A1 − A0 )(B1 − B0 ) can be similarly computed. ¤
1.1.3.2 Toom–Cook Multiplication
Before we go to asymptotically faster multiplication algorithms, it is worth-
while to view Karatsuba–Ofman multiplication from a slightly different angle.
Let us write a = A1 R + A0 and b = B1 R + B0, where R = B^m = B^{n/2}.
For a moment, treat R as an indeterminate, so a and b behave as linear
polynomials in one variable. The product c = ab can be expressed as the
quadratic polynomial c = C2 R^2 + C1 R + C0 whose coefficients are C2 = A1 B1,
C1 = A1 B0 + A0 B1 and C0 = A0 B0. In the Karatsuba–Ofman algorithm, we
have computed these three coefficients. Indeed, these three coefficients can be
fully determined from the values c(k) at three distinct points. Moreover, since
c(k) = a(k)b(k), we need to evaluate the polynomials a and b at three points
and compute the three subproducts of the respective values. Once C2 , C1 , C0
are computed, the polynomial c is evaluated at R to obtain an integer value.
The three evaluation points k chosen as ∞, 0, 1 yield the following three
linear equations in C2, C1, C0.

    c(∞) = C2 = a(∞)b(∞) = A1 B1,
    c(0) = C0 = a(0)b(0) = A0 B0,
    c(1) = C2 + C1 + C0 = a(1)b(1) = (A1 + A0)(B1 + B0).

Solving the system for C2, C1, C0 gives the first version of the Karatsuba–
Ofman algorithm. If we choose the evaluation point k = −1 instead of k = 1,
we obtain the equation

    c(−1) = C2 − C1 + C0 = a(−1)b(−1)
          = (A0 − A1)(B0 − B1) = (A1 − A0)(B1 − B0).

This equation along with the equations for c(∞) and c(0) yields the second
version of the Karatsuba–Ofman algorithm (as illustrated in Example 1.8).
This gives us a way to generalize the Karatsuba–Ofman algorithm. Toom4
and Cook5 propose representing a and b as polynomials of degrees higher
than one. Writing them as quadratic polynomials gives an algorithm popularly
known as Toom-3 multiplication.
Let a and b be n-digit integers. Take m = ⌈n/3⌉, and write

    a = A2 R^2 + A1 R + A0,
    b = B2 R^2 + B1 R + B0,

where R = B^m. The product c = ab can be expressed as the polynomial

    c = C4 R^4 + C3 R^3 + C2 R^2 + C1 R + C0
4 Andrei L. Toom, The complexity of a scheme of functional elements realizing the
multiplication of integers, Doklady Akad. Nauk. SSSR, Vol. 4, No. 3, 714–716, 1963.
5 Stephen A. Cook, On the minimum computation time of functions, PhD thesis,
Department of Mathematics, Harvard University, 1966.
with the coefficients
C4 = A2 B2 ,
C3 = A2 B1 + A1 B2 ,
C2 = A2 B0 + A1 B1 + A0 B2 ,
C1 = A1 B0 + A0 B1 ,
C0 = A0 B0 .
A straightforward computation of these coefficients involves computing nine
subproducts Ai Bj for i, j = 0, 1, 2, and fails to improve upon the schoolbook
method. However, since we now have only five coefficients to compute, it
suffices to compute only five subproducts of n/3-digit integers. We choose five
suitable evaluation points k, and obtain c(k) = a(k)b(k) at these points. The
choices k = ∞, 0, 1, −1, −2 lead to the following equations.
c(∞) = C4 = A2 B2 ,
c(0) = C0 = A0 B0 ,
c(1) = C4 + C3 + C2 + C1 + C0 = (A2 + A1 + A0 )(B2 + B1 + B0 ),
c(−1) = C4 − C3 + C2 − C1 + C0 = (A2 − A1 + A0 )(B2 − B1 + B0 ),
c(−2) = 16C4 − 8C3 + 4C2 − 2C1 + C0 = (4A2 − 2A1 + A0 )(4B2 − 2B1 + B0 ).
This system can be written in the matrix notation as

    [ c(∞)  ]   [  1   0   0   0   0 ] [ C4 ]
    [ c(0)  ]   [  0   0   0   0   1 ] [ C3 ]
    [ c(1)  ] = [  1   1   1   1   1 ] [ C2 ]
    [ c(−1) ]   [  1  −1   1  −1   1 ] [ C1 ]
    [ c(−2) ]   [ 16  −8   4  −2   1 ] [ C0 ]
We can invert the coefficient matrix in order to express C4 , C3 , C2 , C1 , C0 in
terms of the five subproducts c(k). The coefficient matrix is independent of
the inputs a and b, so the formulas work for all input integers.
    [ C4 ]   [  1    0    0    0    0  ] [ c(∞)  ]
    [ C3 ]   [  2  −1/2  1/6  1/2 −1/6 ] [ c(0)  ]
    [ C2 ] = [ −1   −1   1/2  1/2   0  ] [ c(1)  ]
    [ C1 ]   [ −2   1/2  1/3  −1   1/6 ] [ c(−1) ]
    [ C0 ]   [  0    1    0    0    0  ] [ c(−2) ]

This can be rewritten as

    C4 = c(∞),
    C3 = (12c(∞) − 3c(0) + c(1) + 3c(−1) − c(−2)) / 6,
    C2 = (−2c(∞) − 2c(0) + c(1) + c(−1)) / 2,
    C1 = (−12c(∞) + 3c(0) + 2c(1) − 6c(−1) + c(−2)) / 6,
    C0 = c(0).
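These evaluation and interpolation steps translate directly into code. The following sketch (ours, not the book's; toom3 is a hypothetical name) performs one level of Toom-3, computing the five subproducts by ordinary multiplication:

```python
def toom3(a, b):
    """One level of Toom-3 multiplication of nonnegative integers,
    using the evaluation points oo, 0, 1, -1, -2 and the
    interpolation formulas derived in the text (B = 256)."""
    n = max(1, (max(a, b).bit_length() + 7) // 8)
    m = -(-n // 3)                       # m = ceil(n/3)
    R = 1 << (8 * m)                     # R = B^m
    A2, r = divmod(a, R * R); A1, A0 = divmod(r, R)
    B2, r = divmod(b, R * R); B1, B0 = divmod(r, R)
    # The five subproducts c(k) = a(k) b(k).
    c_inf = A2 * B2
    c_0   = A0 * B0
    c_1   = (A2 + A1 + A0) * (B2 + B1 + B0)
    c_m1  = (A2 - A1 + A0) * (B2 - B1 + B0)
    c_m2  = (4*A2 - 2*A1 + A0) * (4*B2 - 2*B1 + B0)
    # Interpolation: every division below is exact.
    C4 = c_inf
    C3 = (12*c_inf - 3*c_0 + c_1 + 3*c_m1 - c_m2) // 6
    C2 = (-2*c_inf - 2*c_0 + c_1 + c_m1) // 2
    C1 = (-12*c_inf + 3*c_0 + 2*c_1 - 6*c_m1 + c_m2) // 6
    C0 = c_0
    # Evaluate C4 R^4 + C3 R^3 + C2 R^2 + C1 R + C0 by Horner's rule.
    return (((C4 * R + C3) * R + C2) * R + C1) * R + C0
```

For instance, toom3(1234567, 7654321) returns 9449772114007. A recursive version would call toom3 itself (below some cutoff) for the five subproducts.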
These formulas involve multiplications and divisions by small integers (like
2, 3, 6, 12). Multiplying or dividing an m-digit multiple-precision integer by a
single-precision integer can be completed in O(m) time, so this is no trouble.
Although some of these expressions involve denominators larger than 1, all
the coefficients Ci evaluate to integral values. If the subproducts a(k)b(k) too
are computed recursively using Toom-3 multiplication, we get a running time
of O(n^{log_3 5}), that is, about O(n^{1.465}) which is better than the running time
of Karatsuba–Ofman multiplication. However, this theoretical improvement
shows up only when the bit sizes of the input integers are sufficiently large. On
the one hand, the operands of the subproducts (like 4A2 − 2A1 + A0 ) may now
fail to fit in m words. On the other hand, the formulas for Ci are more cumber-
some than in the Karatsuba–Ofman method. Still, as reported in the litera-
ture, there is a range of bit sizes (several hundreds and more), for which Toom-
3 multiplication is practically the fastest known multiplication algorithm.
Example 1.10 We take B = 256 and multiply a = 1234567 = (18, 214, 135)B
and b = 7654321 = (116, 203, 177)B by the Toom-3 multiplication algorithm.
Here, n = 3, so that m = 1, and the coefficients Ai and Bj are single-precision
integers. In fact, we have A2 = 18, A1 = 214, A0 = 135, B2 = 116,
B1 = 203, and B0 = 177. The five subproducts are computed as follows.
c(∞) = A2 B2 = (8, 40)B ,
c(0) = A0 B0 = (93, 87)B ,
c(1) = (A2 + A1 + A0 )(B2 + B1 + B0 ) = (1, 111)B ×(1, 240)B = (2,199,16)B ,
c(−1) = (A2 − A1 + A0 )(B2 − B1 + B0 ) = −61 × 90 = −(21, 114)B ,
c(−2) = (4A2 − 2A1 + A0 )(4B2 − 2B1 + B0 ) = −221 × 235 = −(202, 223)B .
The formulas for Ci as linear combinations of these five subproducts give us
C4 = (8, 40)B , C3 = (111, 62)B , C2 = (243, 80)B , C1 = (255, 3)B , and C0 =
(93, 87)B . Adding appropriate shifts of these values yields c as follows.
    8   40
       111   62
            243   80
                 255    3
                       93   87
    8  152   50   79   96   87
Thus, we have computed ab = (8, 152, 50, 79, 96, 87)B = 9449772114007. ¤
Toom’s 3-way multiplication can be readily generalized to any k-way mul-
tiplication. For k = 1 we have the schoolbook method, for k = 2 we get
the Karatsuba–Ofman method, whereas k = 3 gives the Toom-3 method.
For k = 4, we get the Toom-4 method that calls for only seven multipli-
cations of one-fourth-sized operands. The schoolbook method invokes six-
teen such multiplications, whereas two levels of recursion of the Karatsuba–
Ofman method generate nine such multiplications. In general, Toom-k runs in
O(n^{log(2k−1)/log k}) time in which the exponent can be made arbitrarily close to
one by choosing large values of k. Toom and Cook suggest taking k adaptively
based on the size of the input. The optimal choice is shown as k = 2^⌈√(log r)⌉,
where each input is broken in k parts each of size r digits. This gives an
asymptotic running time of O(n 2^{5√(log n)}) for the optimal Toom–Cook method.
Unfortunately, practical implementations do not behave well for k > 4.
1.1.3.3 FFT-Based Multiplication
An alternative method based again on polynomial evaluations and inter-
polations turns out to be practically significant. Proposed by Schönhage and
Strassen,6 this method achieves a running time of O(n log n log log n), and is
practically the fastest known integer-multiplication algorithm for operands of
bit sizes starting from a few thousands. The Schönhage–Strassen algorithm is
a bit too involved to be discussed in this introductory section. So I present
only a conceptually simpler version of the algorithm.
Suppose that each of the input operands a and b consists of n digits in base
B = 2^r. Let 2^{t−1} < n ≤ 2^t. Define N = 2^{t+1}, and pad a and b with leading zero
words so that each is now represented as an N-digit integer in base B. Denote
a = (aN−1, aN−2, ..., a1, a0)B and b = (bN−1, bN−2, ..., b1, b0)B, where each
ai or bj is an r-bit integer. We have

    a = aN−1 B^{N−1} + aN−2 B^{N−2} + · · · + a1 B + a0,
    b = bN−1 B^{N−1} + bN−2 B^{N−2} + · · · + b1 B + b0.

Because of the padding of a and b in an N-digit space, we have ai = bj = 0
for N/2 ≤ i, j < N.
The cyclic convolution of the two sequences (aN−1, aN−2, ..., a1, a0) and
(bN−1, bN−2, ..., b1, b0) is the sequence (cN−1, cN−2, ..., c1, c0), where

    ck = Σ_{0≤i,j≤N−1; i+j=k or i+j=k+N} ai bj

for all k ∈ {0, 1, 2, ..., N − 1}. Since ai = bj = 0 for N/2 ≤ i, j < N, we have
    ck = Σ_{0≤i,j≤N−1; i+j=k or i+j=k+N} ai bj
       = ak b0 + ak−1 b1 + · · · + a0 bk + (aN−1 bk+1 + aN−2 bk+2 + · · · + ak+1 bN−1)
       = ak b0 + ak−1 b1 + · · · + a0 bk
       = Σ_{0≤i,j≤k; i+j=k} ai bj,
that is, the product of a and b can be expressed as

    ab = cN−1 B^{N−1} + cN−2 B^{N−2} + · · · + c1 B + c0.
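This identity can be checked numerically. The sketch below (ours, not from the book) forms the cyclic convolution of two zero-padded digit sequences and evaluates the result at B:

```python
B = 256

def digits(x, N):
    """Little-endian base-B digits of x >= 0, zero-padded to length N."""
    return [(x >> (8 * i)) & 0xFF for i in range(N)]

def cyclic_convolution(u, v):
    """c with c[k] = sum of u[i]*v[j] over i + j = k or i + j = k + N."""
    N = len(u)
    c = [0] * N
    for i in range(N):
        for j in range(N):
            c[(i + j) % N] += u[i] * v[j]
    return c

# Both operands fit in N/2 = 4 words, so no wraparound terms survive,
# and evaluating the convolution at B recovers the product.
a, b = 123456789, 987654
c = cyclic_convolution(digits(a, 8), digits(b, 8))
assert sum(ck * B**k for k, ck in enumerate(c)) == a * b
```

Without the zero padding, terms with i + j = k + N would wrap around and spoil the product (e.g., the convolution of the unpadded sequences of B and B places a contribution at position 0).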
6 Arnold Schönhage and Volker Strassen, Schnelle Multiplikation großer Zahlen,
Computing, 7, 281–292, 1971.
Now, let ωN be a primitive N-th root of unity. Depending upon the field
in which we are working, this root ωN can be appropriately defined. For the
time being, let us plan to work in the field of complex numbers so that
we can take ωN = e^{2πi/N} (where i = √−1). The discrete Fourier transform
(DFT) of the sequence (aN−1, aN−2, ..., a1, a0) is defined as the sequence
(AN−1, AN−2, ..., A1, A0), where for all k ∈ {0, 1, 2, ..., N − 1}, we have

    Ak = Σ_{0≤i<N} ωN^{ki} ai.
Ak is the value of the polynomial a evaluated at ωN^k (replace B by ωN^k). Likewise,
let (BN−1, BN−2, ..., B1, B0) be the DFT of (bN−1, bN−2, ..., b1, b0),
and (CN−1, CN−2, ..., C1, C0) the DFT of (cN−1, cN−2, ..., c1, c0). Since Bk
is b evaluated at ωN^k, and Ck is c evaluated at ωN^k, we have

    Ck = Ak Bk
for all k = 0, 1, 2, . . . , N − 1. Therefore, if we can efficiently compute the
DFTs of the polynomials a and b, we can compute, using only N additional
multiplications, the DFT of the product c = ab. Computing the sequence
(cN −1 , cN −2 , . . . , c1 , c0 ) from its DFT (CN −1 , CN −2 , . . . , C1 , C0 ) is called the
inverse discrete Fourier transform (IDFT). Let (ĈN−1, ĈN−2, ..., Ĉ1, Ĉ0) be
the DFT of (CN−1, CN−2, ..., C1, C0). One can check (Exercise 1.9) that

    (cN−1, cN−2, ..., c1, c0) = (1/N)(Ĉ1, Ĉ2, ..., ĈN−1, Ĉ0),        (1.1)
that is, the IDFT of a sequence can be easily computed from its DFT. So it suffices
to compute the DFT as efficiently as possible. A naïve application of the
DFT formula leads to O(N^2) running time. A divide-and-conquer procedure
for computing the DFT (AN−1, AN−2, ..., A1, A0) of (aN−1, aN−2, ..., a1, a0)
is now presented. This procedure uses only O(n log n) operations in the underlying
field (C for the time being), and is called the fast Fourier transform
(FFT) of the input sequence.
(FFT) of the input sequence. Let us write the polynomial a as
    a = a^(e)(B^2) + B × a^(o)(B^2),

where

    a^(e)(B) = aN−2 B^{N/2−1} + aN−4 B^{N/2−2} + · · · + a2 B + a0, and
    a^(o)(B) = aN−1 B^{N/2−1} + aN−3 B^{N/2−2} + · · · + a3 B + a1

are polynomials obtained from a by taking the terms at even and odd positions,
respectively. But ω_{N/2} = ωN^2 is a primitive (N/2)-th root of unity. Moreover,
a^(e) and a^(o) are polynomials with N/2 terms. We recursively compute
the DFT (actually, FFT) (A^(e)_{N/2−1}, A^(e)_{N/2−2}, ..., A^(e)_1, A^(e)_0) of a^(e), and the
DFT (A^(o)_{N/2−1}, A^(o)_{N/2−2}, ..., A^(o)_1, A^(o)_0) of a^(o). Finally, ωN^{N/2} = −1, so for all
k = 0, 1, 2, ..., N/2 − 1 we have

    Ak = A^(e)_k + ωN^k A^(o)_k   and   A_{N/2+k} = A^(e)_k − ωN^k A^(o)_k.
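Assembled into a complete multiplier, the recursion can be sketched as follows. This is our floating-point version using Python's cmath (fft and fft_multiply are hypothetical names); for brevity, the inverse transform uses the conjugate root with a final division by N instead of the reordering trick of equation (1.1):

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.
    The forward transform evaluates at powers of w_N = e^{2*pi*i/N}."""
    N = len(a)
    if N == 1:
        return list(a)
    even = fft(a[0::2], inverse)       # terms at even positions
    odd = fft(a[1::2], inverse)        # terms at odd positions
    w = cmath.exp((-2j if inverse else 2j) * cmath.pi / N)
    out, wk = [0] * N, 1
    for k in range(N // 2):            # butterfly combine step
        out[k] = even[k] + wk * odd[k]
        out[k + N // 2] = even[k] - wk * odd[k]
        wk *= w
    return out

def fft_multiply(a, b):
    """Multiply nonnegative integers via pointwise products of DFTs."""
    B = 256
    n = max(1, (max(a, b).bit_length() + 7) // 8)
    N = 1
    while N < 2 * n:                   # zero-pad so cyclic = linear convolution
        N *= 2
    A = fft([(a >> (8 * i)) & 0xFF for i in range(N)])
    Bf = fft([(b >> (8 * i)) & 0xFF for i in range(N)])
    c = [z / N for z in fft([x * y for x, y in zip(A, Bf)], inverse=True)]
    result = 0
    for ck in reversed(c):             # evaluate sum of c_k B^k, with carries
        result = result * B + round(ck.real)
    return result
```

Rounding the real parts back to integers is safe only while floating-point error stays below 1/2; this is the precision issue discussed after the next example.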
Example 1.11 Let me illustrate the FFT-based multiplication algorithm on
the following two integers represented in base B = 2^8 = 256.

    a = 1234567890 = (73, 150, 2, 210)B ,
    b = 1357924680 = (80, 240, 73, 72)B .
First, we need to pad a and b with leading zero digits so that each is of length
N = 8. The product c = ab is given as
c = IDFT(DFT(0, 0, 0, 0, 73, 150, 2, 210) · DFT(0, 0, 0, 0, 80, 240, 73, 72)).
A primitive eighth root of unity is ω8 = e^{2πi/8} = (1 + i)/√2. In the recursive
calls, we also need a primitive fourth root of unity ω4 = e^{2πi/4} = i, and a
primitive second root of unity ω2 = −1. For a sequence (x1, x0) of length two,
    DFT(x1, x0) = (x0 − x1, x0 + x1),

whereas for a sequence (x3, x2, x1, x0) of length four,

    DFT(x3, x2, x1, x0) = COMBINE(DFT(x2, x0), DFT(x3, x1))
                        = COMBINE((x0 − x2, x0 + x2), (x1 − x3, x1 + x3))
                        = ((x0 − x2) − i(x1 − x3), (x0 + x2) − (x1 + x3),
                           (x0 − x2) + i(x1 − x3), (x0 + x2) + (x1 + x3)).
Here, COMBINE stands for the combination of the DFTs of the two recur-
sive calls. For computing the DFT (X7 , X6 , X5 , X4 , X3 , X2 , X1 , X0 ) of the
sequence (x7 , x6 , x5 , x4 , x3 , x2 , x1 , x0 ) of length eight, recursive calls are made
on (x6 , x4 , x2 , x0 ) and (x7 , x5 , x3 , x1 ) to get the two sub-DFTs
    (Y3, Y2, Y1, Y0) = ((x0 − x4) − i(x2 − x6), (x0 + x4) − (x2 + x6),
                        (x0 − x4) + i(x2 − x6), (x0 + x4) + (x2 + x6)),
    (Z3, Z2, Z1, Z0) = ((x1 − x5) − i(x3 − x7), (x1 + x5) − (x3 + x7),
                        (x1 − x5) + i(x3 − x7), (x1 + x5) + (x3 + x7)),
respectively. Finally, the combine step gives

    X0 = Y0 + Z0,                       X4 = Y0 − Z0,
    X1 = Y1 + ((1 + i)/√2) Z1,          X5 = Y1 − ((1 + i)/√2) Z1,
    X2 = Y2 + i Z2,                     X6 = Y2 − i Z2,
    X3 = Y3 + ((−1 + i)/√2) Z3,         X7 = Y3 − ((−1 + i)/√2) Z3.
For the first operand a, we have
(Y3 , Y2 , Y1 , Y0 ) = DFT(0, 0, 150, 210) = (210 − 150i, 60, 210 + 150i, 360),
(Z3 , Z2 , Z1 , Z0 ) = DFT(0, 0, 73, 2) = (2 − 73i, −71, 2 + 73i, 75).
Therefore, the combining formulas give the DFT of a as

    ((−71 − 75i)/√2 + (210 − 150i),  60 + 71i,  (71 − 75i)/√2 + (210 + 150i),  285,
     (71 + 75i)/√2 + (210 − 150i),  60 − 71i,  (−71 + 75i)/√2 + (210 + 150i),  435).
Likewise, we compute the DFT of b as

    ((−7 − 153i)/√2 + (72 − 240i),  −168 + 7i,  (7 − 153i)/√2 + (72 + 240i),  159,
     (7 + 153i)/√2 + (72 − 240i),  −168 − 7i,  (−7 + 153i)/√2 + (72 + 240i),  465).
Now, we make a point-by-point multiplication of the two DFTs to obtain the
DFT C of the product c = ab:

    C = ((−23766 − 9720i)√2 + (−26369 − 55506i),  −10577 − 11508i,
         (23766 − 9720i)√2 + (−26369 + 55506i),  45315,
         (23766 + 9720i)√2 + (−26369 − 55506i),  −10577 + 11508i,
         (−23766 + 9720i)√2 + (−26369 + 55506i),  202275).
In order to recover c = IDFT(C) from C, we first take the DFT of C (the
steps are not shown here):

    DFT(C) = (123792, 490768, 267888, 331912, 236160, 46720, 0, 120960).
We then obtain c as

    c = IDFT(C)
      = (1/8)(0, 46720, 236160, 331912, 267888, 490768, 123792, 120960)
      = (0, 5840, 29520, 41489, 33486, 61346, 15474, 15120).
This implies that

    c = 0 × B^7 + 5840 × B^6 + 29520 × B^5 + 41489 × B^4 + 33486 × B^3
          + 61346 × B^2 + 15474 × B + 15120
      = 1676450206966525200.

One can check that this is indeed the value of 1234567890 × 1357924680.
Throughout this example, I used hybrid arithmetic, that is, integer arithmetic
in conjunction with arithmetic associated with the algebraic numbers
i and √2. Moreover, I have not shown the integer arithmetic in base 256. So
long as this example is meant for illustrating FFT-based multiplication, this
abstraction is fine. In practice, one may resort to floating-point arithmetic. ¤
In the FFT procedure, computing the DFT of a sequence of length N
is recursively replaced by the computations of the DFTs of two sequences of
length N/2 each. The additional effort in this process is O(N ) field operations.
So the running time of FFT can be expressed as O(N log N ) field operations.
Since N = Θ(n), this quantity is O(n log n) too.
We now review the issues associated with complex arithmetic involved in
the process. Knuth (Footnote 2) shows that a floating-point precision of 6(t + 1)
bits (where N = 2^{t+1}) suffices for these computations, leading to a running
time of O(n log n log log n log log log n · · ·) for FFT-based multiplication.
Schönhage and Strassen point out that one can use an integer-only arithmetic
in this algorithm. They suggest working in the ring Z_{2^s+1} with 2 used as
a primitive 2s-th root of unity (for a suitable s). In this way, they achieve an
O(n log n log log n) running time. The details are omitted here. Let me
conclude this topic with some relevant advice to potential implementers. For very
small input operands, the schoolbook method is the best choice. Beyond that,
the Karatsuba–Ofman method takes over, followed by the Toom-3 method.
Eventually, for large enough operands, the Schönhage–Strassen method is the
fastest alternative. The crossover points are to be located experimentally.

1.1.4 An Introduction to GP/PARI
Does everybody interested in computational number theory need to write
all the above basic functions? Fortunately, no. There exist good computa-
tional libraries that implement multiple-precision integer arithmetic. Recently,
the GMP (the GNU Multi-Precision) library has gained popularity. Some other
public-domain libraries are LiDIA, LIP, NTL, SIMATH, and ZEN. Each such
library can be freely downloaded from the Internet. The package sage com-
bines many open-source mathematical packages and provides a python-based
interface (visit http://www.sagemath.org/). One can read the accompanying
usage instructions to learn how to work with these libraries/packages.
In this book, I use the GP/PARI calculator for illustrating multiple-precision
arithmetic. It is free and efficient, with a simple text-based interface. When run
from a command prompt (shell), it displays a welcome note and then runs an
interpreter that waits for the user to enter instructions, parses each instruction,
executes it (if error-free), and displays the output of the instruction. On my
machine, running GP/PARI invokes the interpreter as shown below. The prompt
issued by this interpreter is shown as gp > (in some versions, the prompt is ?).

bash$ gp
GP/PARI CALCULATOR Version 2.1.7 (released)
i686 running linux (ix86 kernel) 32-bit version
compiled: Feb 24 2011, gcc-4.4.3 (GCC)
(readline v6.1 enabled, extended help available)

Copyright (C) 2002 The PARI Group

PARI/GP is free software, covered by the GNU General Public License, and
comes WITHOUT ANY WARRANTY WHATSOEVER.

Type ? for help, \q to quit.


Type ?12 for how to get moral (and possibly technical) support.

realprecision = 28 significant digits


seriesprecision = 16 significant terms
format = g0.28

parisize = 4000000, primelimit = 500000


gp >

One can enter an arithmetic expression against the prompt. GP/PARI eval-
uates the expression and displays the result. This result is actually stored
in a variable for future references. These variables are to be accessed as
%1,%2,%3,. . . . The last returned result is stored in the variable %%.
Here follows a simple conversation between me and GP/PARI. I ask GP/PARI
to calculate the expressions 2^{2^3} + 3^{2^2} and binomial(100, 25) = 100!/(25! 75!).
GP/PARI uses conventional precedence and associativity rules for arithmetic
operators. For example, the exponentiation operator ^ is right-associative and has a higher
precedence than the addition operator +. Thus, 2^2^3+3^2^2 is interpreted as
(2^(2^3))+(3^(2^2)). One can use explicit disambiguating parentheses.

gp > 2^2^3+3^2^2
%1 = 337
gp > 100!/(25!*75!)
%2 = 242519269720337121015504
gp >

GP/PARI supports many built-in arithmetic and algebraic functions. For
example, the binomial coefficient (n choose r) can be computed by invoking binomial().

gp > binomial(100,25)
%3 = 242519269720337121015504
gp >

One can also define functions at the GP/PARI prompt. For example, one
may choose to redefine the binomial() function as follows.

gp > choose1(n,r) = n!/(r!*(n-r)!)


gp > choose1(100,25)
%4 = 242519269720337121015504
gp >
Here is an alternative implementation of the binomial() function, based
on the formula (n choose r) = n(n − 1) · · · (n − r + 1)/r!. It employs sophisticated programming
styles (like for loops). The interpreter of GP/PARI reads instructions from the
user line by line. If one instruction is too big to fit in a single line, one may
let the instruction span over multiple lines. In that case, one has to end each
line (except the last) by the special character \.

gp > choose2(n,r) = \
num=1; den=1; \
for(k=1, r, \
num*=n; den*=r; \
n=n-1; r=r-1 \
); \
num/den
gp > choose2(100,25)
%5 = 242519269720337121015504
gp >

All variables in GP/PARI are global by default. In the function choose2(), the
variables num and den accumulate the numerator and the denominator. When
the for loop terminates, num stores n(n − 1) · · · (n − r + 1), and den stores r!.
These values can be printed subsequently. If a second call of choose2() is made
with different arguments, the values stored in num and den are overwritten.

gp > num
%6 = 3761767332187389431968739190317715670695936000000
gp > den
%7 = 15511210043330985984000000
gp > choose2(55,34)
%8 = 841728816603675
gp > num
%9 = 248505954558196590544596278440992435848871936000000000
gp > den
%10 = 295232799039604140847618609643520000000
gp > 34!
%11 = 295232799039604140847618609643520000000
gp >

All local variables to be used in a function must be specified by the local()
declaration. Here is a function that, upon input x, computes 2x^2 + 3x^3. In
this function, the variables y,z,u,v are local, whereas the variable w is global.

gp > f(x) = local(y=x*x,z=x*x*x,u=2,v); \
v=u-1; w=u+v; \
u*eval(y) + w*eval(z)
gp > f(43)
%12 = 242219
gp > x
%13 = x
gp > y
%14 = y
gp > z
%15 = z
gp > u
%16 = u
gp > v
%17 = v
gp > w
%18 = 3
gp >

Now, I present a more complicated example in order to illustrate how
we can use GP/PARI as a programmable calculator. Consider the expression
f(a, b) = (a^2 + b^2)/(ab − 1) for all positive integers a, b (except a = b = 1). It is known
that f(a, b) assumes integer values for infinitely many pairs (a, b). Moreover,
whenever f(a, b) evaluates to an integer value, f(a, b) = 5. Here is a GP/PARI
function that accepts an argument L and locates all pairs (a, b) with a, b ≤ L,
for which f(a, b) is an integer. Because of symmetry, we concentrate only on
those pairs with a ≤ b. Since f(a, b) cannot be an integer if a = b, our search
may be restricted only to the pairs (a, b) satisfying 1 ≤ a < b ≤ L.

gp > #
timer = 1 (on)
gp > \
searchPair(L) = \
for (a=1, L, \
for (b=a+1, L, \
x=(a^2+b^2)/(a*b-1); \
if (x == floor(x), \
print(" a = ", a, ", b = ", b, ", x = ", x, ".") \
) \
) \
)
gp > searchPair(10)
a = 1, b = 2, x = 5.
a = 1, b = 3, x = 5.
a = 2, b = 9, x = 5.
time = 1 ms.
gp > searchPair(100)
a = 1, b = 2, x = 5.
a = 1, b = 3, x = 5.
a = 2, b = 9, x = 5.
a = 3, b = 14, x = 5.
a = 9, b = 43, x = 5.
a = 14, b = 67, x = 5.
time = 34 ms.
gp > searchPair(1000)
a = 1, b = 2, x = 5.
a = 1, b = 3, x = 5.
a = 2, b = 9, x = 5.
a = 3, b = 14, x = 5.
a = 9, b = 43, x = 5.
a = 14, b = 67, x = 5.
a = 43, b = 206, x = 5.
a = 67, b = 321, x = 5.
a = 206, b = 987, x = 5.
time = 3,423 ms.
gp >

In the above illustration, I turned the timer on by using the special di-
rective #. Subsequently, GP/PARI displays the time taken for executing each
instruction. The timer can be turned off by typing the directive # once again.
GP/PARI provides text-based plotting facilities also.

gp > plot(X=0,2*Pi,sin(X))

0.9996892 |’’’’’’’’’’’_x""""x_’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’’|
| _x "x |
| x "_ |
| " x |
| _" x |
| _ x |
| _ " |
| _ " |
| _ " |
|_ x |
| x |
"‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘x‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘‘"
| x "|
| _ " |
| _ " |
| _ " |
| x " |
| x _" |
| x _ |
| "_ x |
| x_ x" |
-0.999689 |..........................................."x____x"...........|
0 6.283185
gp>

This book is not meant to be a tutorial on GP/PARI. One can read the
manual supplied in the GP/PARI distribution. One can also use the extensive
on-line help facility bundled with the calculator. Entering ? at the GP/PARI
prompt yields an overview of help topics. One may follow these instructions
in order to obtain more detailed help. Some examples are given below.

gp > ?
Help topics:
0: list of user-defined identifiers (variable, alias, function)
1: Standard monadic or dyadic OPERATORS
2: CONVERSIONS and similar elementary functions
3: TRANSCENDENTAL functions
4: NUMBER THEORETICAL functions
5: Functions related to ELLIPTIC CURVES
6: Functions related to general NUMBER FIELDS
7: POLYNOMIALS and power series
8: Vectors, matrices, LINEAR ALGEBRA and sets
9: SUMS, products, integrals and similar functions
10: GRAPHIC functions
11: PROGRAMMING under GP
12: The PARI community

Further help (list of relevant functions): ?n (1<=n<=11).


Also:
? functionname (short on-line help)
?\ (keyboard shortcuts)
?. (member functions)
Extended help looks available:
?? (opens the full user’s manual in a dvi previewer)
?? tutorial (same with the GP tutorial)
?? refcard (same with the GP reference card)

?? keyword (long help text about "keyword" from the user’s manual)
??? keyword (a propos: list of related functions).
gp > ?4

addprimes bestappr bezout bezoutres bigomega


binomial chinese content contfrac contfracpnqn
core coredisc dirdiv direuler dirmul
divisors eulerphi factor factorback factorcantor
factorff factorial factorint factormod ffinit
fibonacci gcd hilbert isfundamental isprime
ispseudoprime issquare issquarefree kronecker lcm
moebius nextprime numdiv omega precprime
prime primes qfbclassno qfbcompraw qfbhclassno
qfbnucomp qfbnupow qfbpowraw qfbprimeform qfbred
quadclassunit quaddisc quadgen quadhilbert quadpoly
quadray quadregulator quadunit removeprimes sigma
sqrtint znlog znorder znprimroot znstar

gp > ?znorder
znorder(x): order of the integermod x in (Z/nZ)*.

gp > znorder(Mod(19,101))
%19 = 25
gp > ?11

addhelp alias allocatemem break default error


extern for fordiv forprime forstep forsubgroup
forvec getheap getrand getstack gettime global
if input install kill next print
print1 printp printp1 printtex quit read
reorder return setrand system trap type
until whatnow while write write1 writetex

gp > ?while
while(a,seq): while a is nonzero evaluate the expression sequence seq.
Otherwise 0.

gp > n=50; t=1; while(n, t*=n; n--)


gp > t
%20 = 30414093201713378043612608166064768844377641568960512000000000000
gp > ?if
if(a,seq1,seq2): if a is nonzero, seq1 is evaluated, otherwise seq2. seq1 and
seq2 are optional, and if seq2 is omitted, the preceding comma can be omitted
also.

gp > MAX(a,b,c) = \
if (a>b, \
if(a>c, return(a), return(c)), \
if(b>c, return(b), return(c)) \
)
gp > MAX(3,7,5)
%21 = 7
gp > Fib(n) = \
if(n==0, return(0)); \
if(n==1, return(1)); \
return(Fib(n-1)+Fib(n-2))
gp > Fib(10)
%22 = 55
gp > ?printp
printp(a): outputs a (in beautified format) ending with newline.

gp > ?printtex
printtex(a): outputs a in TeX format.

gp > print(x^13+2*x^5-5*x+4)
x^13 + 2*x^5 - 5*x + 4
gp > printp(x^13+2*x^5-5*x+4)
(x^13 + 2 x^5 - 5 x + 4)
gp > printtex(x^13+2*x^5-5*x+4)
x^{13} + 2 x^5 - 5 x + 4
gp >

One can close the GP/PARI interpreter by typing \q or by hitting control-d.

gp > \q
Good bye!
bash$

I will not explain the syntax of GP/PARI further in this book, but will use
the calculator for demonstrating arithmetic (and algebraic) calculations.

1.2 GCD
Divisibility is an important property of integers. Let a, b ∈ Z. We say
that a divides b, and denote this as a|b, if there exists an integer c for which
b = ca. For example, 31|1023 (since 1023 = 33 × 31), −31|1023 (since 1023 =
(−33) × (−31)), and every integer a (including 0) divides 0 (since 0 = 0 × a). Let
a|b with a ≠ 0. The (unique) integer c with b = ca is called the cofactor of a
in b. If a|b with both a, b non-zero, then |a| ≤ |b|. By the notation a ∤ b, we
mean that a does not divide b.

Definition 1.12 Let a, b ∈ Z be not both zero. The largest positive integer
d that divides both a and b is called the greatest common divisor or the gcd
of a and b. It is denoted as gcd(a, b). Clearly, gcd(a, b) = gcd(b, a). For a ≠ 0,
we have gcd(a, 0) = |a|. The value gcd(0, 0) is left undefined. Two integers a, b
are called coprime or relatively prime if gcd(a, b) = 1. ⊳

In addition to this usual divisibility, Z supports another notion of division
that turns out to be very useful for the computation of gcd's.

Theorem 1.13 [Euclidean division]7 For a, b ∈ Z with b ≠ 0, there exist
unique integers q, r such that a = qb + r and 0 ≤ r ≤ |b| − 1. We call q
the quotient and r the remainder of Euclidean division of a by b. We denote
q = a quot b and r = a rem b. ⊳

Example 1.14 We have 1023 = 11 × 89 + 44, so 1023 quot 89 = 11, and
1023 rem 89 = 44. Similarly, 1023 = (−11) × (−89) + 44, that is,
1023 quot (−89) = −11, and 1023 rem (−89) = 44. Also −1023 = (−11) × 89 − 44 =
(−12) × 89 + (89 − 44) = (−12) × 89 + 45, that is, (−1023) quot 89 = −12, and
(−1023) rem 89 = 45. Finally, −1023 = 12 × (−89) + 45, so that
(−1023) quot (−89) = 12, and (−1023) rem (−89) = 45. The last two examples demonstrate
that while computing r = a rem b with a < 0, we force r to a value 0 ≤
r ≤ |b| − 1. This convention sometimes contradicts the remainder operators in
programming languages (like % in C), which may return negative values. ¤

1.2.1 Euclidean GCD Algorithm


At an early age, we have learned the repeated Euclidean division procedure
for computing the gcd of two integers. The correctness of this procedure is
based on Euclid’s gcd theorem.
7 The Greek mathematician and philosopher Euclid (ca. 325–265 BC) is especially famous for his contributions to geometry. His book Elements influences various branches of mathematics even today.

Theorem 1.15 [Euclidean gcd] Let a, b ∈ Z with b ≠ 0, and r = a rem b.
Then, gcd(a, b) = gcd(b, r). ⊳

Suppose that we want to compute gcd(a, b) with both a and b positive. We
compute a sequence of remainders as follows. We first let r0 = a and r1 = b.

r0 = q2 r1 + r2,       0 < r2 ≤ r1 − 1,
r1 = q3 r2 + r3,       0 < r3 ≤ r2 − 1,
r2 = q4 r3 + r4,       0 < r4 ≤ r3 − 1,
· · ·
ri−2 = qi ri−1 + ri,   0 < ri ≤ ri−1 − 1,
· · ·
rk−2 = qk rk−1 + rk,   0 < rk ≤ rk−1 − 1,
rk−1 = qk+1 rk.

Euclid’s gcd theorem guarantees that gcd(a, b) = gcd(r0, r1) = gcd(r1, r2) =
gcd(r2, r3) = · · · = gcd(rk−2, rk−1) = gcd(rk−1, rk) = rk, that is, gcd(a, b) is
the last non-zero remainder in the sequence r0, r1, r2, r3, . . . .

Example 1.16 Let us compute gcd(252, 91) by Euclid’s procedure.


252 = 2 × 91 + 70,
91 = 1 × 70 + 21,
70 = 3 × 21 + 7,
21 = 3 × 7.
Therefore, gcd(252, 91) = gcd(91, 70) = gcd(70, 21) = gcd(21, 7) = 7. ¤

Some important points need to be noted in connection with the Euclidean
gcd procedure. First, the remainder sequence follows the chain of inequalities

r1 > r2 > r3 > · · · > rk−1 > rk,

that is, the procedure terminates after finitely many steps. More specifically,
it continues for at most r1 = b iterations.
In practice, the number of iterations is much less than b. In order to see
why, let us look at the computation of the i-th remainder ri:

ri−2 = qi ri−1 + ri.

Without loss of generality, we may assume that a ≥ b (if not, a single Euclidean
division swaps the two operands). Under this assumption, ri−1 > ri for all
i ≥ 1. It then follows that qi ≥ 1 for all i ≥ 2. Therefore, ri−2 = qi ri−1 + ri ≥
ri−1 + ri ≥ ri + ri = 2ri, that is, ri ≤ ri−2/2, that is, after two iterations the
remainder reduces by at least a factor of two. Therefore, after O(lg(min(a, b)))
iterations, the remainder reduces to zero, and the algorithm terminates.
A C function for computing the gcd of two integers a, b is presented now.
Here, we assume signed single-precision integer operands only. The basic algorithm
can be readily extended to multiple-precision integers by replacing
the standard operators -, %, =, etc., by appropriate function calls. If one uses a
language that supports operator overloading (like C++), this gcd implementation
continues to make sense; however, the default data type int is to be
replaced by a user-defined name. The following function needs to remember
only the last two remainders in the sequence, and these remainders are always
stored in the formal arguments a and b.

Algorithm 1.1: Euclidean gcd


int gcd ( int a, int b )
{
   int r;

   if ((a == 0) && (b == 0)) return 0;
   if (a < 0) a = -a;
   if (b < 0) b = -b;
   while (b != 0) {
      r = a % b;
      a = b;
      b = r;
   }
   return a;
}
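To see the O(lg(min(a, b))) bound concretely, one can instrument the loop to count its iterations. The variant below (gcd_count is my name, not the book's) is a sketch along the lines of Algorithm 1.1:

```c
/* Euclidean gcd that also reports the number of loop iterations,
   illustrating the O(lg(min(a,b))) bound derived above. */
int gcd_count(int a, int b, int *iters)
{
    int r, n = 0;

    if (a < 0) a = -a;
    if (b < 0) b = -b;
    while (b != 0) {
        r = a % b;                /* Euclidean division step */
        a = b;
        b = r;
        ++n;
    }
    *iters = n;
    return a;
}
```

For the pair (252, 91) of Example 1.16, the loop runs four times and returns 7.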

1.2.2 Extended GCD Algorithm


An extremely important byproduct available during the computation of
gcd’s is the Bézout relation defined by the following theorem.8

Theorem 1.17 [Bézout relation] For a, b ∈ Z, not both zero, there exist
integers u, v satisfying gcd(a, b) = ua + vb. ⊳

The computation of the multipliers u, v along with gcd(a, b) is referred to


as the extended gcd computation. I first establish that the multipliers do exist.
The remainder sequence obtained during the computation of gcd(a, b) gives
gcd(a, b) = rk = rk−2 − qk rk−1 = rk−2 − qk (rk−3 − qk−1 rk−2 ) = −qk rk−3 +
(1 + qk qk−1 )rk−2 = · · · , that is, rk is expressed first as a linear combination
of rk−2 and rk−1 , then as a linear combination of rk−3 and rk−2 , and so on
until rk is available as a linear combination of r0 and r1 (that is, of a and b).
The quotients qk , qk−1 , . . . , q2 are used in that order for computing the desired
Bézout relation and so must be remembered until the values of the multipliers
u, v for r0 , r1 are computed.
8 Étienne Bézout (1730–1783) was a French mathematician renowned for his work on solutions of algebraic equations (see Theorem 4.27 for example).



There is an alternative way to compute the multipliers u, v without
remembering the intermediate quotients. Algorithm 1.2 implements this new idea.
The proof of its correctness is based on a delicate invariant of the extended
Euclidean gcd loop.
For computing gcd(a, b), we obtain the remainder sequence r0 , r1 , r2 , . . . .
In addition, we also compute two sequences u0 , u1 , u2 , . . . and v0 , v1 , v2 , . . .
such that for all i > 0 we maintain the relation ri = ui a + vi b. We initialize
the three sequences as r0 = a, r1 = b, u0 = 1, u1 = 0, v0 = 0, v1 = 1, so that
the relation ri = ui a + vi b is satisfied for i = 0, 1.
Assume that during the computation of ri, we have rj = uj a + vj b for all
j = 0, 1, . . . , i − 1. Euclidean division gives ri = ri−2 − qi ri−1. We analogously
assign ui = ui−2 − qi ui−1 and vi = vi−2 − qi vi−1. By the induction hypothesis,

ri−2 = ui−2 a + vi−2 b, and
ri−1 = ui−1 a + vi−1 b, so that
ri = ri−2 − qi ri−1 = (ui−2 − qi ui−1)a + (vi−2 − qi vi−1)b = ui a + vi b.

Therefore, the invariance continues to hold for j = i too.

Example 1.18 The following table illustrates the extended gcd computation
for 252, 91 (Also see Example 1.16).

i   qi   ri   ui   vi   ui a + vi b
Initialization
0 − 252 1 0 252
1 − 91 0 1 91
Iterations
2 2 70 1 −2 70
3 1 21 −1 3 21
4 3 7 4 −11 7
5 3 0 −13 36 0

Therefore, gcd(252, 91) = 7 = 4 × 252 + (−11) × 91. ¤

Before the algorithm implementing this idea is presented, some comments
are in order. The computation of ri, ui, vi in an iteration requires the values
ri−1 , ui−1 , vi−1 and ri−2 , ui−2 , vi−2 only from two previous iterations. We,
therefore, need to store only the values from two previous iterations, that is, we
store ri , ui , vi as r0 , u0 , v0 (refers to the current iteration), ri−1 , ui−1 , vi−1 as
r1 , u1 , v1 (the previous iteration), and ri−2 , ui−2 , vi−2 as r2 , u2 , v2 (the second
previous iteration). Moreover, the quotient qi calculated by the Euclidean
division of ri−2 by ri−1 is never needed after ri , ui , vi are calculated. This
eliminates the need for storing the quotient sequence. Finally, observe that
there is no need to compute both the u and v sequences inside the loop. This
is because the relation ri = ui a + vi b is always maintained, so we can compute

vi = (ri −ui a)/b from the values ri , ui , a, b only. Furthermore, the computation
of ui = ui−2 − qi ui−1 does not require the values from the v sequence.
Algorithm 1.2 incorporates all these intricate details. Each iteration of the
gcd loop performs only a constant number of integer operations. Moreover, like
the basic gcd calculation (Algorithm 1.1), the loop is executed O(lg(min(a, b)))
times. To sum up, the extended gcd algorithm is slower than the basic gcd
algorithm by only a small constant factor.

Algorithm 1.2: Extended Euclidean gcd


int egcd ( int a, int b, int *u, int *v )
{
   int q0, r0, r1, r2, u0, u1, u2;

   if ((a == 0) && (b == 0)) { *u = *v = 0; return 0; }
   r2 = (a < 0) ? -a : a; r1 = (b < 0) ? -b : b;
   u2 = 1; u1 = 0;
   while (r1 != 0) {
      q0 = r2 / r1;
      r0 = r2 - q0 * r1; r2 = r1; r1 = r0;
      u0 = u2 - q0 * u1; u2 = u1; u1 = u0;
   }
   *u = u2; if (a < 0) *u = -(*u);
   *v = (b == 0) ? 0 : (r2 - (*u) * a) / b;
   return r2;
}

GP/PARI provides the built-in function gcd for computing the gcd of two
integers. An extended gcd can be computed by the built-in function bezout
which returns the multipliers and the gcd in the form of a 3-dimensional vector.

gp > gcd(2^77+1,2^91+1)
%1 = 129
gp > bezout(2^77+1,2^91+1)
%2 = [-151124951386849816870911, 9223935021170032768, 129]
gp > (-151124951386849816870911) * (2^77+1) + 9223935021170032768 * (2^91+1)
%3 = 129
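For word-sized operands, the loop invariant ri = ui a + vi b can also be checked directly by carrying both sequences inside the loop. The following sketch (egcd_full is my name) mirrors the derivation in the text rather than the leaner Algorithm 1.2, which recovers v only at the end; it assumes a, b ≥ 0, not both zero:

```c
/* Extended Euclidean gcd carrying both the u and the v sequences,
   so that r_i = u_i*a + v_i*b holds at every step of the loop. */
int egcd_full(int a, int b, int *u, int *v)
{
    int r2 = a, r1 = b;           /* r_{i-2} and r_{i-1} */
    int u2 = 1, u1 = 0;
    int v2 = 0, v1 = 1;

    while (r1 != 0) {
        int q  = r2 / r1;
        int r0 = r2 - q * r1;     /* r_i = r_{i-2} - q_i r_{i-1} */
        int u0 = u2 - q * u1;
        int v0 = v2 - q * v1;     /* invariant: r0 == u0*a + v0*b */
        r2 = r1; r1 = r0;
        u2 = u1; u1 = u0;
        v2 = v1; v1 = v0;
    }
    *u = u2; *v = v2;
    return r2;
}
```

On the pair (252, 91) this reproduces the table of Example 1.18: the gcd is 7 with u = 4 and v = −11.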

1.2.3 Binary GCD Algorithm


Since computing gcd’s is a very basic operation in number theory, several
alternatives to the Euclidean algorithm have been studied extensively by the
research community. The Euclidean algorithm involves a division in each step.
This turns out to be a costly operation, particularly for multiple-precision
integers. Here, I briefly describe the binary gcd algorithm9 (Algorithm 1.3)
which performs better than the Euclidean gcd algorithm. This improvement
in performance is achieved by judiciously replacing Euclidean division by con-
siderably faster subtraction and bit-shift operations. Although the number of
iterations is typically larger in the binary gcd loop than in the Euclidean gcd
loop, the reduction of running time per iteration achieved by the binary gcd
algorithm usually leads to a more efficient algorithm for gcd computation.
Let gcd(a, b) be computed. We first write a = 2^s a′ and b = 2^t b′ with
a′, b′ odd. Since gcd(a, b) = 2^min(s,t) gcd(a′, b′), we may assume that we are
going to compute the gcd of two odd integers, that is, a, b themselves are
odd. First assume that a > b. We have gcd(a, b) = gcd(a − b, b). But a − b is
even and so can be written as a − b = 2^r α. Since b is odd, it turns out that
gcd(a, b) = gcd(α, b). Since α = (a − b)/2^r ≤ (a − b)/2 ≤ a/2, replacing the
computation of gcd(a, b) by gcd(α, b) implies reduction of the bit-size of the
first operand by at least one. The case a < b can be symmetrically treated.
Finally, if a = b, then gcd(a, b) = a.

Algorithm 1.3: Binary gcd


int bgcd ( int a, int b )
{
   int s, t;

   if ((a == 0) && (b == 0)) return 0;
   if (a < 0) a = -a; if (b < 0) b = -b;
   if (a == 0) return b; if (b == 0) return a;
   s = 0; while ((a & 1) == 0) { ++s; a = a >> 1; }
   t = 0; while ((b & 1) == 0) { ++t; b = b >> 1; }
   while (b > 0) {
      if (a > b) {
         a = a - b; while ((a & 1) == 0) a = a >> 1;
      } else if (a < b) {
         b = b - a; while ((b & 1) == 0) b = b >> 1;
      } else b = 0;
   }
   return a << ((s <= t) ? s : t);
}

It is evident that the binary gcd algorithm performs at most lg a + lg b
iterations of the gcd loop. For improving the efficiency of the algorithm, bit
operations (like >> for shifting, & for checking the least significant bit) are
used instead of equivalent arithmetic operations. If a and b vary widely in

9 Josef Stein, Computational problems associated with Racah algebra, Journal of Computational Physics, 1(3), 397–405, 1967. This algorithm seems to have been known in ancient China.

size, replacing the first iteration by a Euclidean division often improves the
performance of the algorithm considerably.
Like the extended Euclidean gcd algorithm, one can formulate the extended
binary gcd algorithm. The details are left to the reader as Exercise 1.13.
Sorenson10 extends the concept of binary gcd to k-ary gcd for k > 2.
Another gcd algorithm tailored to multiple-precision integers is the Lehmer
gcd algorithm.11
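The reduction step gcd(a, b) = 2^min(s,t) gcd(a′, b′) and the subtract-and-shift loop can be restated compactly for unsigned operands. This is only a sketch (bgcd_u is my name), equivalent in spirit to Algorithm 1.3:

```c
/* Binary gcd for unsigned operands: strip the common factors of two,
   then repeatedly subtract the smaller odd value from the larger one
   and discard the factors of two that the subtraction introduces. */
unsigned bgcd_u(unsigned a, unsigned b)
{
    unsigned s = 0;

    if (a == 0) return b;
    if (b == 0) return a;
    while (((a | b) & 1u) == 0) { a >>= 1; b >>= 1; ++s; }
    while ((a & 1u) == 0) a >>= 1;
    while ((b & 1u) == 0) b >>= 1;
    while (a != b) {              /* both a and b are odd here */
        if (a > b) { a -= b; while ((a & 1u) == 0) a >>= 1; }
        else       { b -= a; while ((b & 1u) == 0) b >>= 1; }
    }
    return a << s;                /* restore the common power of 2 */
}
```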

1.3 Congruences and Modular Arithmetic


The notion of divisibility of integers leads to the concept of congruences
which have far-reaching consequences in number theory.12

Definition 1.19 Let m ∈ N. Two integers a, b ∈ Z are called congruent
modulo m, denoted a ≡ b (mod m), if m|(a − b) or, equivalently, if a rem m =
b rem m. In this case, m is called the modulus of the congruence. ⊳

Example 1.20 Consider the modulus m = 5. The integers congruent to 0
modulo 5 are 0, ±5, ±10, ±15, . . . . The integers congruent to 1 modulo 5 are
. . . , −9, −4, 1, 6, 11, . . . , and those congruent to 3 modulo 5 are . . . , −7, −2,
3, 8, 13, . . . .
In general, the integers congruent to a modulo m are a + km for all k ∈ Z. ¤

Some basic properties of the congruence relation are listed now.

Proposition 1.21 Let m ∈ N and a, b, c, d ∈ Z be arbitrary.
(a) a ≡ a (mod m).
(b) If a ≡ b (mod m), then b ≡ a (mod m).
(c) If a ≡ b (mod m) and b ≡ c (mod m), then a ≡ c (mod m).
(d) If a ≡ c (mod m) and b ≡ d (mod m), then a + b ≡ c + d (mod m),
a − b ≡ c − d (mod m), and ab ≡ cd (mod m).
(e) If a ≡ b (mod m), and f(x) is a polynomial with integer coefficients, then
f(a) ≡ f(b) (mod m).
(f) If a ≡ b (mod m) and d is a positive integer divisor of m, then a ≡
b (mod d).
(g) ab ≡ ac (mod m) if and only if b ≡ c (mod m/gcd(a, m)). ⊳

10 Jonathan Sorenson, Two fast GCD algorithms, Journal of Algorithms, 16(1), 110–144, 1994.
11 Jonathan Sorenson, An analysis of Lehmer’s Euclidean gcd algorithm, ISSAC, 254–258, 1995.
12 The concept of congruences was formalized by the Swiss mathematician Leonhard Euler

(1707–1783) renowned for his contributions to several branches of mathematics. Many basic
notations we use nowadays (including congruences and functions) were introduced by Euler.

All the parts of the proposition can be easily verified using the definition of
congruence. Part (g) in the proposition indicates that one should be careful
while canceling a common factor from the two sides of a congruence relation.
Such a cancellation should be accompanied by dividing the modulus by the
gcd of the modulus with the factor being canceled.
Definition 1.22 Let m ∈ N. A set of m integers a0, a1, a2, . . . , am−1 is said
to constitute a complete residue system modulo m if every integer a ∈ Z is
congruent modulo m to one and only one of the integers ai for 0 ≤ i ≤ m − 1.
Evidently, no two distinct integers ai, aj in a complete residue system can be
congruent to one another. ⊳

Example 1.23 The integers 0, 1, 2, . . . , m − 1 constitute a complete residue
system modulo m. So do the integers 1, 2, 3, . . . , m. If a0, a1, . . . , am−1
constitute a complete residue system modulo m, then so also do aa0, aa1, . . . , aam−1
for an integer a if and only if a is coprime to m. For example, if m is odd,
then 0, 2, 4, . . . , 2m − 2 is a complete residue system modulo m. ¤

A complete residue system modulo m is denoted by Zm. Take arbitrary
elements a, b ∈ Zm. Consider the integer sum a + b. Since Zm is a complete
residue system modulo m, there exists a unique integer c ∈ Zm such that
c ≡ a + b (mod m). We define the sum of a and b in Zm to be this integer c.
The difference and product of a and b in Zm can be analogously defined. These
operations make Zm a commutative ring with identity. The additive identity
of Zm is its member congruent to 0 modulo m, whereas the multiplicative
identity of Zm is that member of Zm , which is congruent to 1 modulo m.
Z is a commutative ring (in fact, an integral domain) under addition and
multiplication of integers. Congruence modulo a fixed modulus m is an equiv-
alence relation on Z. Indeed the set mZ = {km | k ∈ Z} is an ideal of Z. The
quotient ring Z/mZ (that is, the set of all equivalence classes under congruence
modulo m) is the algebraic description of the set Zm .

Example 1.24 The most common representation of Zm is the set {0, 1, 2,
. . . , m − 1}. For a, b ∈ Zm, we can define the arithmetic operations as follows.

a + b (mod m) = a + b, if a + b < m; a + b − m, if a + b ≥ m.
a − b (mod m) = a − b, if a ≥ b; a − b + m, if a < b.
ab (mod m) = (ab) rem m.

Here, the operations and relations on the right sides pertain to standard
integer arithmetic.
As a specific example, take m = 17, a = 5, and b = 8. Then a + b as an
integer is 13, which is less than the modulus. So a + b (mod m) is 13. Since
a < b as integers, the difference a − b (mod m) is equal to 5 − 8 + 17 = 14.
Also 5 × 8 (mod m) = (5 × 8) rem 17 = 6.
Unless otherwise mentioned, I will let Zm stand for the standard residue
system {0, 1, 2, . . . , m − 1} rather than for any arbitrary residue system. ¤
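The case analysis of Example 1.24 translates directly into C for word-sized moduli. In this sketch, the operands are assumed to lie in the standard residue system {0, 1, . . . , m − 1} already:

```c
/* Modular addition, subtraction and multiplication on the standard
   residue system, following the case analysis of Example 1.24. */
unsigned addmod(unsigned a, unsigned b, unsigned m)
{
    unsigned s = a + b;           /* a + b < 2m; assumed not to overflow */
    return (s >= m) ? s - m : s;
}

unsigned submod(unsigned a, unsigned b, unsigned m)
{
    return (a >= b) ? a - b : a + m - b;
}

unsigned mulmod(unsigned a, unsigned b, unsigned m)
{
    /* widen to 64 bits so that the product ab does not overflow */
    return (unsigned)(((unsigned long long)a * b) % m);
}
```

With m = 17, a = 5, b = 8, these return 13, 14 and 6, as in the example.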

GP/PARI supports the standard representation of Zm as {0, 1, 2, . . . , m − 1}.
An element a ∈ Zm is represented as Mod(a,m). This notation is meant to
differentiate between a as an integer and a considered as an element of Zm .
A sample conversation with the GP/PARI interpreter follows.

gp > m = 17
%1 = 17
gp > a = Mod(5,m)
%2 = Mod(5, 17)
gp > b = Mod(25,m)
%3 = Mod(8, 17)
gp > a + b
%4 = Mod(13, 17)
gp > a - b
%5 = Mod(14, 17)
gp > a * b
%6 = Mod(6, 17)
gp > a / b
%7 = Mod(7, 17)
gp > 7 * a
%8 = Mod(1, 17)
gp > a^7
%9 = Mod(10, 17)

Here, 7*a stands for a + a + · · · + a (7 times), whereas a^7 stands for a × a × · · · ×
a (7 times). But what is a/b (mod m)? Like other situations, a/b is defined as
the product of a and the multiplicative inverse of b. Let us, therefore, look at
what is meant by modular multiplicative inverses and how to compute them.

Definition 1.25 An element a ∈ Zm is said to be invertible modulo m (or a
unit in Zm) if there exists an integer u (in Zm) such that ua ≡ 1 (mod m). ⊳

Example 1.26 Let m = 15. The element 7 is invertible modulo 15, since
13 × 7 ≡ 1 (mod 15). On the other hand, the element 6 of Z15 is not invertible.
I prove this fact by contradiction, that is, I assume that u is an inverse of 6
modulo 15. This means that 6u ≡ 1 (mod 15), that is, 15|(6u − 1), that is,
6u − 1 = 15k for some k ∈ Z, that is, 3(2u − 5k) = 1. This is impossible, since
the left side is a multiple of 3, whereas the right side is not. ¤

Theorem 1.27 An element a ∈ Zm is invertible if and only if gcd(a, m) = 1.


Proof First, suppose that gcd(a, m) = 1. By Bézout’s theorem, there exist
integers u, v for which 1 = ua + vm. But then ua ≡ 1 (mod m). Conversely,
suppose that a is invertible modulo m, that is, ua ≡ 1 (mod m) for some
integer u. But then ua − km = 1 for some k ∈ Z. The left side is divisible by
gcd(a, m), that is, gcd(a, m)|1, that is, gcd(a, m) = 1. ⊳

The proof of Theorem 1.27 indicates that in order to compute the inverse
of a ∈ Zm , one can compute the extended gcd d = ua + vm. If d > 1, then a
is not invertible modulo m. If d = 1, (the integer in Zm congruent modulo m
to) u is the (unique) inverse of a modulo m.

Example 1.28 Let us compute the inverse of 11 modulo 15. Extended gcd
calculations give gcd(11, 15) = 1 = (−4) × 11 + 3 × 15, that is, 11^(−1) ≡ −4 ≡
11 (mod 15), that is, 11 is its own inverse modulo 15.
On the other hand, if we try to invert 12 modulo 15, we obtain the Bézout
relation gcd(12, 15) = 3 = (−1) × 12 + 1 × 15. Since 12 and 15 are not coprime,
12 does not have a multiplicative inverse modulo 15. ¤
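The inversion procedure of Theorem 1.27 can be sketched in C for word-sized moduli by running the extended gcd loop of Algorithm 1.2 with the u sequence only (invmod is my name; it returns 0 when no inverse exists, which is unambiguous since 0 is never a valid inverse for m > 1):

```c
/* Inverse of a modulo m via the extended Euclidean algorithm.
   Assumes m > 1 and 0 < a < m. Returns 0 if gcd(a, m) > 1. */
int invmod(int a, int m)
{
    int r2 = a, r1 = m, u2 = 1, u1 = 0;

    while (r1 != 0) {
        int q = r2 / r1, t;
        t = r2 - q * r1; r2 = r1; r1 = t;
        t = u2 - q * u1; u2 = u1; u1 = t;   /* gcd == u2*a (mod m) */
    }
    if (r2 != 1) return 0;        /* a is not invertible modulo m */
    return (u2 < 0) ? u2 + m : u2;
}
```

For instance, invmod(11, 15) returns 11 and invmod(12, 15) returns 0, in agreement with Example 1.28.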

Definition 1.29 Let m ∈ N. A set of integers a1, a2, . . . , al is called a reduced
residue system modulo m if every integer a coprime to m is congruent to one
and only one of the integers ai, 1 ≤ i ≤ l. Elements of a reduced residue
system are themselves coprime to m and are not congruent modulo m to one
another. Every complete residue system modulo m contains a reduced residue
system modulo m.
Every reduced residue system modulo m has the same size. This size is
denoted by φ(m). The function φ(m) of m is called Euler’s phi function or
Euler’s totient function. ⊳

Example 1.30 The complete residue system {0, 1, 2, . . . , m − 1} modulo m
contains the reduced residue system {a | 0 ≤ a ≤ m − 1, gcd(a, m) = 1}. In
view of this, φ(m) is often defined as the number of integers between 0 and
m − 1 (or between 1 and m), that are coprime to m.
If a1, a2, . . . , al constitute a reduced residue system modulo m and if
gcd(a, m) = 1, then aa1, aa2, . . . , aal again constitute a reduced residue
system modulo m.
As a specific example, take m = 15. The integers between 0 and 14 that
are coprime to 15 are 1, 2, 4, 7, 8, 11, 13, 14. Thus φ(15) = 8. Since 15 is odd,
{2, 4, 8, 14, 16, 22, 26, 28} is another reduced residue system modulo 15. ¤

A reduced residue system modulo m is denoted by the symbol Z∗m. The
standard representation of Z∗m is the set

Z∗m = {a | 0 ≤ a ≤ m − 1, gcd(a, m) = 1}.

Algebraically, Z∗m is the set of all units of Zm. It is a commutative group under
multiplication modulo m.
Two important theorems involving the phi function follow.
Two important theorems involving the phi function follow.

Theorem 1.31 [Euler’s theorem] Let m ∈ N and gcd(a, m) = 1. Then
a^φ(m) ≡ 1 (mod m).
Proof Let a1, a2, . . . , aφ(m) constitute a reduced residue system modulo m.
Since gcd(a, m) = 1, the integers aa1, aa2, . . . , aaφ(m) too constitute a reduced
residue system modulo m. In particular, we have (aa1)(aa2) · · · (aaφ(m)) ≡
a1 a2 · · · aφ(m) (mod m). Each ai is invertible modulo m, that is, gcd(ai, m) =
1. Thus we can cancel the factors ai for i = 1, 2, . . . , φ(m) from the congruence,
and obtain the desired result. ⊳

The special case of Euler’s theorem applied to prime moduli follows.


Theorem 1.32 [Fermat’s little theorem]13 Let p ∈ P, and a an integer not
divisible by p. Then, a^(p−1) ≡ 1 (mod p). For any integer b, we have b^p ≡
b (mod p).
Proof Since p is a prime, every integer between 1 and p − 1 is coprime to p
and so φ(p) = p − 1. Euler’s theorem implies a^(p−1) ≡ 1 (mod p). For proving
the second part, first consider the case p ∤ b. In that case, gcd(b, p) = 1, so
b^(p−1) ≡ 1 (mod p). Multiplying by b gives b^p ≡ b (mod p). On the other hand,
if p|b, both b^p and b are congruent to 0 modulo p. ⊳
An explicit formula for φ(m) can be derived.
Theorem 1.33 Let m = p1^e1 · · · pr^er be the prime factorization of m with
pairwise distinct primes p1, . . . , pr and with each of e1, . . . , er positive. Then,

φ(m) = (p1^e1 − p1^(e1−1)) · · · (pr^er − pr^(er−1)) = m ∏_{p|m} (1 − 1/p),

where the last product is over the set of all (distinct) prime divisors of m.
Proof Consider the standard residue system modulo m. An integer (between
0 and m − 1) is coprime to m if and only if it is divisible by neither of the
primes p1, . . . , pr. We can use a combinatorial argument based on the principle
of inclusion and exclusion in order to derive the given formula for φ(m). ⊳

Example 1.34 98 = 2 × 7^2, so φ(98) = (2 − 1) × (7^2 − 7) = 42. As another
example, 100 = 2^2 × 5^2, so φ(100) = (2^2 − 2) × (5^2 − 5) = 40. Finally, 101 is
a prime, so φ(101) = 100. ¤
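For small arguments, the product formula of Theorem 1.33 gives a direct way to compute φ(m) by trial division, multiplying in p^e − p^(e−1) = (p − 1)p^(e−1) for each prime power p^e dividing m. A sketch:

```c
/* Euler's phi via the formula of Theorem 1.33: factor m by trial
   division and accumulate (p - 1) * p^(e-1) per prime power p^e. */
unsigned long phi(unsigned long m)
{
    unsigned long result = 1, p;

    for (p = 2; p * p <= m; ++p) {
        if (m % p == 0) {
            m /= p;
            result *= p - 1;
            while (m % p == 0) { m /= p; result *= p; }
        }
    }
    if (m > 1) result *= m - 1;   /* one leftover prime factor */
    return result;
}
```

This reproduces the values of Example 1.34: phi(98) = 42, phi(100) = 40, phi(101) = 100.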

The call eulerphi(m) returns φ(m) in the GP/PARI calculator.

gp > eulerphi(98)
%1 = 42
gp > eulerphi(99)
%2 = 60
gp > eulerphi(100)
%3 = 40
gp > eulerphi(101)
%4 = 100
gp > factor(2^101-1)

13 Pierre de Fermat (1601–1665), a French lawyer, was an amateur mathematician having significant contributions in number theory. Fermat is famous for his last theorem which states that the equation x^n + y^n = z^n does not have integer solutions with xyz ≠ 0 for all integers n ≥ 3. See Footnote 1 in Chapter 4 for a historical sketch on Fermat’s last theorem.

%5 =
[7432339208719 1]

[341117531003194129 1]

gp > (7432339208719-1)*(341117531003194129-1)
%6 = 2535301200456117678030064007904
gp > eulerphi(2^101-1)
%7 = 2535301200456117678030064007904

1.3.1 Modular Exponentiation


Modular exponentiation is a very important primitive useful in a variety
of computational problems. Given m ∈ N and integers a, e, our task
is to compute a^e (mod m). If e = −d is negative (where d > 0), then
a^e ≡ a^(−d) ≡ (a^(−1))^d (mod m), that is, we first compute the modular
inverse of a and then raise this inverse to the positive exponent d. In view of
this, it suffices to restrict our attention to positive exponents only. We have
a^e ≡ a × a × · · · × a (e times) (mod m), that is, e − 1 modular multiplications
yield a^e (mod m). If e is large, this is a very inefficient algorithm for computing
the modular exponentiation. A technique called the square-and-multiply
algorithm computes a^e (mod m) using only O(log e) modular multiplications.
Let e = (es−1 es−2 . . . e1 e0)2 be the binary representation of the exponent
e, where each ei ∈ {0, 1}. Define the partial exponents xi = (es−1 es−2 . . . ei)2
for i = s, s − 1, . . . , 1, 0, where xs is to be interpreted as 0. Also, denote
bi ≡ a^xi (mod m). We initially have bs ≡ 1 (mod m), and then keep on
computing bs−1, bs−2, . . . in that sequence until b0 ≡ a^e (mod m) is computed.
Since xi = 2xi+1 + ei, we have bi ≡ a^xi ≡ (a^xi+1)^2 × a^ei ≡ (bi+1)^2 × (a^ei) (mod m).
These observations lead to the following modular exponentiation algorithm.

Algorithm 1.4: Square-and-multiply modular exponentiation


Initialize t = 1 (mod m).
for (i = s − 1; i >= 0; --i) {
   Set t = t^2 (mod m).
   If (ei equals 1), set t = ta (mod m).
}
Return t.

Algorithm 1.4 requires the computation of s modular squares and ≤ s
modular multiplications. The bit-size s of the exponent e satisfies s = O(lg e),
that is, the square-and-multiply loop is executed at most O(lg e) times.
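For word-sized operands, Algorithm 1.4 can be sketched in C as follows. To keep the intermediate products within 64 bits, the modulus is assumed to be below 2^32:

```c
/* Square-and-multiply exponentiation (Algorithm 1.4), scanning the
   exponent from the most significant bit downwards. Assumes m < 2^32
   so that t*t and t*a fit in an unsigned long long. */
unsigned long long powmod(unsigned long long a,
                          unsigned long long e,
                          unsigned long long m)
{
    unsigned long long t = 1 % m;   /* handles m == 1 as well */
    int i;

    a %= m;
    for (i = 63; i >= 0; --i) {
        t = (t * t) % m;            /* square */
        if ((e >> i) & 1ULL)
            t = (t * a) % m;        /* multiply when e_i = 1 */
    }
    return t;
}
```

Leading zero bits of e merely square t = 1, so scanning all 64 bit positions is harmless. For the parameters of Example 1.35, powmod(7, 13, 31) returns 19.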

Example 1.35 Let us compute 7^13 (mod 31). Here m = 31, a = 7, and
e = 13 = (1101)2. The following table summarizes the steps of the square-
and-multiply algorithm on these parameters.

i   ei   xi = (e3 . . . ei)2   t (after sqr)        t (after mul)              bi
4   −    0                     −                    −                          1
3   1    (1)2 = 1              1^2 ≡ 1 (mod 31)     1 × 7 ≡ 7 (mod 31)         7
2   1    (11)2 = 3             7^2 ≡ 18 (mod 31)    18 × 7 ≡ 2 (mod 31)        2
1   0    (110)2 = 6            2^2 ≡ 4 (mod 31)     (multiplication skipped)   4
0   1    (1101)2 = 13          4^2 ≡ 16 (mod 31)    16 × 7 ≡ 19 (mod 31)       19

Thus, 7^13 ≡ 19 (mod 31). ¤

1.3.2 Fast Modular Exponentiation


A modular multiplication (or squaring) is an integer multiplication followed
by computing the remainder with respect to the modulus. The division
operation is costly. There are many situations where the overhead associated
with these division operations can be substantially reduced. In a modular
exponentiation algorithm (like Algorithm 1.4), the division is always carried out
by the same modulus m. This fact can be exploited to speed up the
exponentiation loop at the cost of some moderate precomputation before the loop
starts. I will now explain one such technique called Barrett reduction.14
Let B be the base (like 2^32 or 2^64) for representing multiple-precision
integers. Barrett reduction works for any B ≥ 3. Let the modulus m have the
base-B representation (ml−1, ml−2, . . . , m1, m0)B. The quantity T = ⌊B^2l/m⌋ is
precomputed. Given a 2l-digit integer product x = (x2l−1, x2l−2, . . . , x1, x0)B
of two elements of Zm and the precomputed value T, Algorithm 1.5 computes
x (mod m) without invoking the division algorithm.
Algorithm 1.5: Barrett reduction

Compute Q = ⌊x/B^(l−1)⌋, then Q = QT, and then Q = ⌊Q/B^(l+1)⌋.
Compute R = (x − Qm) (mod B^(l+1)).
While R ≥ m, set R = R − m.
Return R.

Barrett reduction works as follows. One can express x = qm + r with
0 ≤ r ≤ m − 1. We are interested in computing the remainder r. Algorithm 1.5
starts by computing an approximate value Q for the quotient q. Computing
q = ⌊x/m⌋ requires a division procedure. We instead compute

Q = ⌊ ⌊x/B^(l−1)⌋ ⌊B^2l/m⌋ / B^(l+1) ⌋.

14 Paul D. Barrett, Implementing the Rivest Shamir and Adleman public key encryption
algorithm on a standard digital signal processor, CRYPTO’86, 311–332, 1987.

Easy calculations show that q − 2 ≤ Q ≤ q, that is, the approximate quotient
Q may be slightly smaller than the actual quotient q. The computation of Q
involves an integer multiplication QT and two divisions by powers of B. Under
the base-B representation, ⌊y/B^i⌋ is obtained by throwing away the i least
significant B-ary digits from y. Therefore, Q can be efficiently computed.
In the next step, an approximate remainder R is computed. The actual
remainder is r = x − qm. Algorithm 1.5 uses the approximate quotient Q to
compute R = x − Qm. But q − 2 ≤ Q ≤ q, so R computed in this way satisfies
0 ≤ R < 3m. Moreover, since m < B^l and B ≥ 3, we have R < B^(l+1). While
performing the subtraction x − Qm, we may, therefore, pretend that x and
Qm are (l + 1)-digit integers. Even if the last subtraction results in a borrow,
we are certain that this borrow will be adjusted in the higher digits. Thus,
the computation of R involves one integer multiplication and one subtraction
of (l + 1)-digit integers, that is, R too can be computed efficiently.
From the approximate R, the actual r is computed by subtracting m as
often as is necessary. Since R < 3m, at most two subtractions are required.
Barrett remarks that in 90% of the cases, we have R = r, in 9% of the cases,
we have R = r + m, and in only 1% of the cases, we have R = r + 2m.

Example 1.36 Let us represent multiple-precision integers in base B =
2⁸ = 256. I now illustrate one modular multiplication step using Barrett
reduction. The modulus is chosen as m = 12345678 = (188, 97, 78)B (so
l = 3). The elements a = 10585391 = (161, 133, 47)B and b = 8512056 =
(129, 226, 56)B are multiplied. Integer multiplication gives the product x =
ab = 90103440973896 = (81, 242, 215, 151, 160, 72)B . The quantity to be
precomputed is T = ⌊B⁶/m⌋ = 22799474 = (1, 91, 228, 114)B . In this example,
T is an (l + 1)-digit integer. We call Algorithm 1.5 with x, m and T as input.
Integer division of x by B² (that is, discarding the two least significant digits
of x) gives ⌊x/B²⌋ = 1374869399 = (81, 242, 215, 151)B . Multiplication of this
value with the precomputed T gives 31346299115896126 = (111, 93, 74, 255,
211, 125, 62)B . Finally, we divide this by B⁴ (that is, discard four least
significant digits) to obtain the approximate quotient as Q = 7298378 =
(111, 93, 74)B . It turns out that this Q is one smaller than the actual quotient
q = ⌊x/m⌋, but we do not need to worry about (nor even detect) this at this
moment.
The approximate remainder R is computed as follows:

x = 90103440973896 = (81, 242, 215, 151, 160, 72)B


Qm = 90103424710284 = (81, 242, 214, 159, 118, 140)B
R = x − Qm = 16263612 = (0, 248, 41, 188)B

This R happens to be larger than m (but smaller than 2m), so the correct
value of r = x rem m is r = R − m = 3917934 = (59, 200, 110)B . ¤
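The computation just carried out can be replayed mechanically. The following Python sketch is our own illustration of the method (the function name barrett_reduce is ours, and Python's built-in big integers stand in for multiple-precision arithmetic):

```python
def barrett_reduce(x, m, T, l, B=256):
    # Barrett reduction: compute x rem m, given the precomputed T = floor(B^(2l)/m)
    Q = ((x // B**(l - 1)) * T) // B**(l + 1)   # approximate quotient, q-2 <= Q <= q
    R = x - Q * m                               # approximate remainder, 0 <= R < 3m
    while R >= m:                               # at most two subtractions are needed
        R -= m
    return R

B, m, l = 256, 12345678, 3
T = B**(2 * l) // m                 # precomputed once per modulus
x = 10585391 * 8512056              # the product from Example 1.36
print(barrett_reduce(x, m, T, l))   # prints 3917934, as computed above
```

The two divisions by powers of B inside barrett_reduce are exactly the digit-discarding operations described in the text.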
Arithmetic of Integers 41

1.4 Linear Congruences


Solving congruences modulo m is the same as solving equations in the ring
Zm . A linear congruence is of the form

ax ≡ b (mod m),

where m ∈ N is a given modulus, and a, b are integers. In order to find all


integers x satisfying the congruence, we use the following theorem.

Theorem 1.37 Let d = gcd(a, m). The congruence ax ≡ b (mod m) is solv-


able (for x) if and only if d|b. If d|b, then all solutions are congruent to each
other modulo m/d, that is, there is a unique solution modulo m/d. In par-
ticular, if gcd(a, m) = 1, then the congruence ax ≡ b (mod m) has a unique
solution modulo m.
Proof First, suppose that ax ≡ b (mod m) is solvable. For each such solution
x we have ax−b = km for some k ∈ Z. Since d|a and d|m, we have d|(ax−km),
that is, d|b. Conversely, suppose d|b, that is, b = db′ for some b′ ∈ Z. Also
let a = da′ and m = dm′ for some integers a′ , m′ . The congruence ax ≡
b (mod m) can be rewritten as da′ x ≡ db′ (mod dm′ ). Canceling the factor d
gives an equivalent congruence a′ x ≡ b′ (mod m′ ). Since gcd(a′ , m′ ) = 1, a′ is
invertible modulo m′ . Therefore, any integer x ≡ (a′ )−1 b′ (mod m′ ) satisfies
the congruence a′ x ≡ b′ (mod m′ ).
Let x1 , x2 be two solutions of ax ≡ b (mod m). We have ax1 ≡ b ≡
ax2 (mod m). Canceling a yields x1 ≡ x2 (mod m/d). ⊳

Since the original congruence is provided modulo m, it is desirable that we


supply the solutions of the congruence modulo m (instead of modulo m/d).
Let x0 be a solution (unique) modulo m/d. All the elements in Zm that are
congruent to x0 modulo m/d are xi = x0 + i(m/d) for i = 0, 1, 2, . . . , d − 1.
Thus, x0 , x1 , . . . , xd−1 are all the solutions modulo m. In particular, there are
exactly d = gcd(a, m) solutions modulo m.

Example 1.38 Take the congruence 21x ≡ 9 (mod 15). Here, a = 21, b = 9,
and m = 15. Since d = gcd(a, m) = 3 divides b, the congruence is solvable.
Canceling 3 gives 7x ≡ 3 (mod 5), that is, x ≡ 7−1 × 3 ≡ 3 × 3 ≡ 4 (mod 5).
The solutions modulo 15 are 4, 9, 14.
The congruence 21x ≡ 8 (mod 15) is not solvable, since 3 = gcd(21, 15)
does not divide 8. ¤
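Theorem 1.37 yields a complete recipe: check d | b, solve modulo m/d using a modular inverse, and spread the solution over Zm. The Python sketch below is our own rendering of that recipe (the function name is ours; pow(a, -1, n) computes a modular inverse):

```python
from math import gcd

def solve_linear_congruence(a, b, m):
    # return all solutions x in Z_m of a*x = b (mod m), or [] if unsolvable
    d = gcd(a, m)
    if b % d != 0:
        return []                           # solvable if and only if d | b
    a1, b1, m1 = a // d, b // d, m // d
    x0 = (pow(a1, -1, m1) * b1) % m1        # the unique solution modulo m/d
    return [x0 + i * m1 for i in range(d)]  # the d solutions modulo m

print(solve_linear_congruence(21, 9, 15))   # [4, 9, 14], as in Example 1.38
print(solve_linear_congruence(21, 8, 15))   # []: 3 does not divide 8
```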

1.4.1 Chinese Remainder Theorem


We now investigate how congruences modulo several moduli can be solved.

Theorem 1.39 [Chinese Remainder Theorem (CRT)] Let m1, m2, . . . , mt
be pairwise coprime moduli (that is, gcd(mi, mj) = 1 whenever i ≠ j). Moreover,
let a1, a2, . . . , at be arbitrary integers. Then, the congruences
x ≡ a1 (mod m1), x ≡ a2 (mod m2), . . . , x ≡ at (mod mt)
are simultaneously solvable. The solution is unique modulo m1 m2 · · · mt.
Proof A constructive proof of the Chinese remainder theorem is given here.
Call M = m1 m2 · · · mt , and ni = M/mi for all i = 1, 2, . . . , t. For each i,
compute the inverse of ni modulo mi , that is, we find an integer ui for which
ui ni ≡ 1 (mod mi ). Since the moduli m1 , m2 , . . . , mt are pairwise coprime,
ni = ∏_{j≠i} mj is coprime to mi, and so the inverse ui exists. Finally, we take
x = Σ_{i=1}^{t} ui ni ai.
It is an easy check that this x is a simultaneous solution of
the given congruences.
In order to prove the desired uniqueness, let x, y be two integers satisfying
the given congruences. We then have x ≡ y (mod mi ), that is, mi |(x − y) for
all i. Since mi are pairwise coprime, it follows that M |(x − y). ⊳
Example 1.40 Legend has it that Chinese generals counted soldiers using
CRT. Suppose that there are x ≤ 1000 soldiers in a group. The general asks
the soldiers to stand in rows of seven. The number of soldiers left over is
counted by the general. Let this count be five. The soldiers are subsequently
asked to stand in rows of eleven and in rows of thirteen. In both cases, the
numbers of leftover soldiers are counted (say, three and two). The general then
uses a magic formula to derive the exact number of soldiers in the group.15
Basically, the general is finding the common solution of the congruences:
x ≡ 5 (mod 7),
x ≡ 3 (mod 11),
x ≡ 2 (mod 13).
We have m1 = 7, m2 = 11, m3 = 13, a1 = 5, a2 = 3, and a3 = 2. M =
m1 m2 m3 = 1001, and so n1 = M/m1 = m2 m3 = 143, n2 = M/m2 =
m1 m3 = 91, and n3 = M/m3 = m1 m2 = 77. The inverses are u1 ≡ 143−1 ≡
3−1 ≡ 5 (mod 7), u2 ≡ 91−1 ≡ 3−1 ≡ 4 (mod 11), and u3 ≡ 77−1 ≡ 12−1 ≡
12 (mod 13). Therefore, the simultaneous solution is x ≡ u1 n1 a1 + u2 n2 a2 +
u3 n3 a3 ≡ 5 × 143 × 5 + 4 × 91 × 3 + 12 × 77 × 2 ≡ 6515 ≡ 509 (mod 1001). One can
verify that this solution is correct: 509 = 72 × 7 + 5 = 46 × 11 + 3 = 39 × 13 + 2.
If it is known that 0 ≤ x ≤ 1000 (as in the case of the Chinese general),
the unique solution is x = 509. ¤
15 Irrespective of the veracity of this story, there is something Chinese about CRT. The

oldest reference to the theorem appears in a third-century book by the Chinese mathemati-
cian Sun Tzu. In the sixth and seventh centuries, Indian mathematicians Aryabhatta and
Brahmagupta studied the theorem more rigorously.

The proof of the CRT can be readily converted to Algorithm 1.6.

Algorithm 1.6: Chinese remainder theorem


int CRT ( int a[], int m[], int t, int *Mptr ) {
    int i, M = 1, n, x = 0, u;

    /* M is the product of all the t moduli */
    for (i=0; i<t; ++i) M *= m[i];

    for (i=0; i<t; ++i) {
        n = M / m[i];
        u = invMod(n%m[i],m[i]);  /* u is the inverse of n modulo m[i] */
        x += n * u * a[i];        /* accumulate the term u*n*a for this modulus */
    }
    x %= M;
    if (x < 0) x += M;            /* normalize x to the range 0, 1, ..., M-1 */
    if (Mptr != NULL) *Mptr = M;  /* optionally report the combined modulus */
    return x;
}

GP/PARI supports the call chinese() for CRT-based combination. The func-
tion takes only two modular elements as arguments. The function combines the
two elements, and returns an element modulo the product of the input mod-
uli. In order to run CRT on more than two moduli, we need to make nested
calls of chinese(). The function chinese() can handle non-coprime moduli
also. An integer modulo the lcm of the input moduli is returned in this case.
However, the input congruences may now fail to have a simultaneous solution.
For example, there cannot exist an integer x satisfying both x ≡ 5 (mod 12)
and x ≡ 6 (mod 18), since such an integer is of the form 18k + 6 (a multiple
of 6) and at the same time of the form 12k + 5 (a non-multiple of 6).

gp > chinese(Mod(5,7),Mod(3,11))
%1 = Mod(47, 77)
gp > chinese(Mod(5,7),Mod(-3,11))
%2 = Mod(19, 77)
gp > chinese(chinese(Mod(5,7),Mod(3,11)),Mod(2,13))
%3 = Mod(509, 1001)
gp > chinese(Mod(47,77),Mod(2,13))
%4 = Mod(509, 1001)
gp > chinese(Mod(5,12),Mod(11,18))
%5 = Mod(29, 36)
gp > chinese(Mod(5,12),Mod(6,18))
*** incompatible arguments in chinois.

The incremental way of combining congruences for more than two moduli,
as illustrated above for GP/PARI, may be a bit faster (practically, but not in
terms of the order notation) than Algorithm 1.6 (see Exercise 1.44).
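The incremental (pairwise) combination just mentioned can be sketched in Python as follows. This is our own illustration, restricted to pairwise coprime moduli (unlike chinese(), which also handles the non-coprime case); the function names are ours:

```python
from math import gcd
from functools import reduce

def crt_pair(r1, m1, r2, m2):
    # combine x = r1 (mod m1) and x = r2 (mod m2) for coprime m1, m2
    assert gcd(m1, m2) == 1
    # choose k so that r1 + k*m1 = r2 (mod m2)
    k = (pow(m1, -1, m2) * (r2 - r1)) % m2
    return (r1 + k * m1) % (m1 * m2), m1 * m2

def crt(residues, moduli):
    # nested pairwise combination, like repeated calls to chinese()
    return reduce(lambda acc, rm: crt_pair(*acc, *rm), zip(residues, moduli))

print(crt([5, 3], [7, 11]))      # (47, 77)
print(crt([5, 3, 2], [7, 11, 13]))   # (509, 1001), as in Example 1.40
```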

1.5 Polynomial Congruences


Linear congruences are easy to solve. We now look at polynomial congru-
ences of higher degrees. This means that we plan to locate all the roots of
a polynomial f (x) with integer coefficients modulo a given m ∈ N. We can
substitute x in f (x) by the elements of Zm one by one, and find out for which
values of x we get f (x) ≡ 0 (mod m). This method is practical only if m is
small. However, if the complete factorization of the modulus m is available, a
technique known as Hensel lifting efficiently solves polynomial congruences.16

1.5.1 Hensel Lifting


Let m = p1^e1 · · · pr^er with pairwise distinct primes p1, . . . , pr and with ei ∈
N. If we know the roots of f(x) modulo each pi^ei, we can combine these roots
by the CRT in order to obtain all the roots of f(x) modulo m. So it suffices
to look at polynomial congruences of the form
f(x) ≡ 0 (mod p^e),
where p is a prime and e ∈ N. Hensel lifting is used to obtain all the solutions
of f(x) ≡ 0 (mod p^(ε+1)) from the solutions of f(x) ≡ 0 (mod p^ε). This means
that the roots of f(x) are first computed modulo p. These roots are then lifted
to the roots of f(x) modulo p², and the roots modulo p² are lifted to the roots
modulo p³, and so on.
Let ξ be a solution of f(x) ≡ 0 (mod p^ε). All integers that satisfy this
congruence and that are congruent to this solution modulo p^ε are ξ + kp^ε for
k ∈ Z. We investigate which of these solutions continue to remain the solutions
of f(x) ≡ 0 (mod p^(ε+1)) also. Let us write the polynomial f(x) as
f(x) = ad x^d + ad−1 x^(d−1) + · · · + a1 x + a0.
Substituting x = ξ + kp^ε yields
f(ξ + kp^ε) = ad (ξ + kp^ε)^d + ad−1 (ξ + kp^ε)^(d−1) + · · · + a1 (ξ + kp^ε) + a0
= (ad ξ^d + ad−1 ξ^(d−1) + · · · + a1 ξ + a0)
+ kp^ε (d ad ξ^(d−1) + (d − 1) ad−1 ξ^(d−2) + · · · + a1) + p^(2ε) α
= f(ξ) + kp^ε f′(ξ) + p^(2ε) α,
where α is some polynomial expression in ξ, and f′(x) is the (formal) derivative
of f(x). Since ε ≥ 1, we have 2ε ≥ ε + 1, so modulo p^(ε+1) the above equation
16 Kurt Wilhelm Sebastian Hensel (1861–1941) is a German mathematician whose major

contribution is the introduction of p-adic numbers that find many uses in analysis, algebra
and number theory. If the Hensel lifting procedure is carried out for all e ∈ N, in the limit
we get the p-adic solutions of f (x) = 0.

reduces to f(ξ + kp^ε) ≡ f(ξ) + kp^ε f′(ξ) (mod p^(ε+1)). We need to identify
all values of k for which f(ξ + kp^ε) ≡ 0 (mod p^(ε+1)). These are given by
kp^ε f′(ξ) ≡ −f(ξ) (mod p^(ε+1)). Since ξ is a solution of f(x) ≡ 0 (mod p^ε), we
have p^ε | f(ξ). Canceling p^ε yields the linear congruence
f′(ξ) k ≡ −f(ξ)/p^ε (mod p),
which has 0, 1, or p solutions for k depending on the values of f′(ξ) and f(ξ)/p^ε.

Each lifting step involves solving a linear congruence only. The problem of
solving a polynomial congruence then reduces to solving the congruence mod-
ulo each prime divisor of the modulus. We will study root-finding algorithms
for polynomials later in a more general setting.

Example 1.41 We find out all the solutions of the congruence

2x³ − 7x² + 189 ≡ 0 (mod 675).

We have f(x) = 2x³ − 7x² + 189, and so f′(x) = 6x² − 14x. The modulus
admits the prime factorization m = 3³ × 5². We proceed step by step in order
to obtain all the roots.
Solutions of f (x) ≡ 0 (mod 3)
We have f(0) ≡ 189 ≡ 0 (mod 3), f(1) ≡ 2 − 7 + 189 ≡ 1 (mod 3), and
f(2) ≡ 16 − 28 + 189 ≡ 0 (mod 3). Thus, the roots modulo 3 are 0, 2.
Solutions of f(x) ≡ 0 (mod 3²)
Let us first lift the root x ≡ 0 (mod 3). Since f(0)/3 = 189/3 = 63 and
f′(0) = 0, the congruence f′(0) k ≡ −f(0)/3 (mod 3) is satisfied by
k ≡ 0, 1, 2 (mod 3), and the lifted roots are 0, 3, 6 modulo 9.
For lifting the root x ≡ 2 (mod 3), we calculate f(2)/3 = 177/3 = 59
and f′(2) = 24 − 28 = −4. So the congruence f′(2) k ≡ −f(2)/3 (mod 3), that
is, −4k ≡ −59 (mod 3), has a unique solution k ≡ 2 (mod 3). So there is a
unique lifted root 2 + 2 × 3 = 8.
Therefore, all the roots of f(x) ≡ 0 (mod 3²) are 0, 3, 6, 8.
Solutions of f(x) ≡ 0 (mod 3³)
Let us first lift the root x ≡ 0 (mod 3²). We have f(0)/3² = 189/9 = 21 and
f′(0) = 0. The congruence f′(0) k ≡ −f(0)/3² (mod 3) is satisfied by k = 0, 1, 2,
that is, there are three lifted roots 0, 9, 18.
Next, we lift the root x ≡ 3 (mod 3²). We have f(3)/3² = (2 × 27 − 7 ×
9 + 189)/9 = 6 − 7 + 21 = 20, whereas f′(3) = 6 × 9 − 14 × 3 = 12. Thus,
the congruence f′(3) k ≡ −f(3)/3² (mod 3) has no solutions, that is, the root
3 (mod 3²) does not lift to a root modulo 3³.
For the root x ≡ 6 (mod 3²), we have f(6)/3² = (2 × 216 − 7 × 36 + 189)/9 =
48 − 28 + 21 = 41 and f′(6) = 216 − 84 = 132, so there is no solution for k in
the congruence f′(6) k ≡ −f(6)/3² (mod 3), that is, the root 6 does not lift to a
root modulo 3³.

Finally, consider the lifting of the last root x ≡ 8 (mod 3²). We have
f(8)/3² = (2 × 512 − 7 × 64 + 189)/9 = 765/9 = 85 and f′(8) = 6 × 64 − 14 × 8 =
272. Therefore, the congruence f′(8) k ≡ −f(8)/3² (mod 3), that is, 272k ≡
−85 (mod 3), that is, 2k ≡ 2 (mod 3), has a unique solution k ≡ 1 (mod 3).
That is, the unique lifted root is 8 + 1 × 9 = 17.
To sum up, all solutions modulo 3³ are 0, 9, 17, 18.
Solutions of f (x) ≡ 0 (mod 5)
We evaluate f (x) at x = 0, 1, 2, 3, 4 and discover that f (x) ≡ 0 (mod 5) only
for x = 3, 4.
Solutions of f(x) ≡ 0 (mod 5²)
First, we try to lift the root x ≡ 3 (mod 5). We have f(3)/5 = (2 × 27 − 7 ×
9 + 189)/5 = 180/5 = 36 and f′(3) = 6 × 9 − 14 × 3 = 12. The congruence
f′(3) k ≡ −f(3)/5 (mod 5), that is, 2k ≡ −1 (mod 5), has a unique solution
k ≡ 2 (mod 5), that is, there is a unique lifted root 3 + 2 × 5 = 13.
Next, we investigate the lifting of the root x ≡ 4 (mod 5). We have
f(4)/5 = (2 × 64 − 7 × 16 + 189)/5 = 205/5 = 41 and f′(4) = 6 × 16 − 14 × 4 = 40.
The congruence f′(4) k ≡ −f(4)/5 (mod 5), that is, the congruence 40k ≡
−41 (mod 5), does not have a solution, that is, the root 4 (mod 5) does not
lift to a root modulo 5².
To sum up, the only solution of f(x) ≡ 0 (mod 5²) is x ≡ 13 (mod 25).
Solutions of f (x) ≡ 0 (mod 675)
We finally combine the solutions modulo 27 and 25 using the CRT. Since
25⁻¹ ≡ 13 (mod 27) and 27⁻¹ ≡ 13 (mod 25), all the solutions of f(x) ≡
0 (mod 675) are 25 × 13 × a + 27 × 13 × b (mod 675) for a ∈ {0, 9, 17, 18} and
b ∈ {13}. The solutions turn out to be x ≡ 63, 288, 513, 638 (mod 675). ¤
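The lifting steps of Example 1.41 are mechanical enough to automate. The sketch below is our own Python rendering of the procedure for small moduli (function and variable names are ours):

```python
def lift_roots(f, df, p, e):
    # return the roots of f(x) = 0 (mod p^e) by Hensel lifting
    roots = [x for x in range(p) if f(x) % p == 0]   # roots modulo p
    pe = p
    for _ in range(1, e):
        lifted = []
        for xi in roots:
            # solve f'(xi)*k = -f(xi)/p^eps (mod p)
            rhs = (-(f(xi) // pe)) % p
            c = df(xi) % p
            if c != 0:
                k = (pow(c, -1, p) * rhs) % p
                lifted.append(xi + k * pe)                    # unique lift
            elif rhs == 0:
                lifted.extend(xi + k * pe for k in range(p))  # p lifts
            # otherwise: the root does not lift
        roots, pe = lifted, pe * p
    return sorted(roots)

f  = lambda x: 2*x**3 - 7*x**2 + 189
df = lambda x: 6*x**2 - 14*x
print(lift_roots(f, df, 3, 3))   # [0, 9, 17, 18]
print(lift_roots(f, df, 5, 2))   # [13]
```

The two printed lists match the roots modulo 3³ and 5² found above; combining them by the CRT yields the four roots modulo 675.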

If the factorization of the modulus m is not available, we do not know any


easy way to solve polynomial congruences in general. All we can do is first
factoring m and then applying the above procedure. Even solving quadratic
congruences modulo a composite m is known to be probabilistic polynomial-
time equivalent to the problem of factoring m.

1.6 Quadratic Congruences


A special case of polynomial congruences is of the form ax² + bx + c ≡
0 (mod m). In view of the theory described in the previous section, it suffices
to concentrate only on prime moduli p. The case p = 2 is easy to handle. So we
assume that the modulus is an odd prime p. Since we are going to investigate
(truly) quadratic polynomial congruences, we would assume p ∤ a. In that
case, 2a is invertible modulo p, and the congruence ax² + bx + c ≡ 0 (mod p)
can be rewritten as y² ≡ α (mod p), where y ≡ x + b(2a)⁻¹ (mod p) and
α ≡ (b² − 4ac)(4a²)⁻¹ (mod p). This implies that it suffices to concentrate
only on quadratic congruences of the special form
x² ≡ a (mod p).
Gauss was the first mathematician to study quadratic congruences formally.17
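The completing-the-square reduction above can be verified on a small instance by brute force. The following Python sketch is ours; the particular values p = 19, a = 3, b = 5, c = 7 are an arbitrary choice:

```python
p, a, b, c = 19, 3, 5, 7    # an arbitrary small instance (our choice)

# solutions of a*x^2 + b*x + c = 0 (mod p), found directly
direct = {x for x in range(p) if (a*x*x + b*x + c) % p == 0}

# solutions recovered from the reduced congruence y^2 = alpha (mod p)
alpha = (b*b - 4*a*c) * pow(4*a*a, -1, p) % p
shift = b * pow(2*a, -1, p) % p
reduced = {(y - shift) % p for y in range(p) if (y*y - alpha) % p == 0}

print(direct == reduced)    # True: the two solution sets agree
```

Since y = x + b(2a)⁻¹ is a bijection of Zp, the agreement holds for any choice of a, b, c with p ∤ a.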

1.6.1 Quadratic Residues and Non-Residues


In order to solve the quadratic congruence x2 ≡ a (mod p), we first need
to know whether it is at all solvable. This motivates us to define the following.
Definition 1.42 Let p be an odd prime. An integer a with gcd(a, p) = 1
is called a quadratic residue modulo p if the congruence x2 ≡ a (mod p)
has a solution. An integer a with gcd(a, p) = 1 is called a quadratic non-
residue modulo p if the congruence x2 ≡ a (mod p) does not have a solution.
An integer divisible by p is treated neither as a quadratic residue nor as a
quadratic non-residue modulo p. ⊳
Example 1.43 Consider the prime p = 19. Squaring individual elements of
Z19 gives 0² ≡ 0 (mod 19), 1² ≡ 18² ≡ 1 (mod 19), 2² ≡ 17² ≡ 4 (mod 19),
3² ≡ 16² ≡ 9 (mod 19), 4² ≡ 15² ≡ 16 (mod 19), 5² ≡ 14² ≡ 6 (mod 19),
6² ≡ 13² ≡ 17 (mod 19), 7² ≡ 12² ≡ 11 (mod 19), 8² ≡ 11² ≡ 7 (mod 19),
and 9² ≡ 10² ≡ 5 (mod 19). Thus, the quadratic residues in Z19 are
1, 4, 5, 6, 7, 9, 11, 16, 17. Therefore, the quadratic non-residues modulo 19 are
2, 3, 8, 10, 12, 13, 14, 15, 18. ¤
The above example can be easily generalized to conclude that modulo
an odd prime p, there are exactly (p − 1)/2 quadratic residues and exactly
(p − 1)/2 quadratic non-residues.

1.6.2 Legendre Symbol


Definition 1.44 Let p be an odd prime and a an integer not divisible by p.
The Legendre symbol (a/p) is defined as18
(a/p) = 1 if a is a quadratic residue modulo p,
(a/p) = −1 if a is a quadratic non-residue modulo p.
It is sometimes convenient to take (a/p) = 0 for an integer a divisible by p. ⊳

17 Johann Carl Friedrich Gauss (1777–1855) was a German mathematician celebrated as

one of the most gifted mathematicians of all ages. Gauss is often referred to as the prince
of mathematics and also as the last complete mathematician (in the sense that he was the
last mathematician who was conversant with all branches of contemporary mathematics).
In his famous book Disquisitiones Arithmeticae (written in 1798 and published in 1801),
Gauss introduced the terms quadratic residues and non-residues.
18 Adrien-Marie Legendre (1752–1833) was a French mathematician famous for pioneering

research in several important branches of mathematics.



Some elementary properties of the Legendre symbol are listed now.


Proposition 1.45 Let p be an odd prime, and a, b integers. Then, we have:
(a) If a ≡ b (mod p), then (a/p) = (b/p).
(b) (a/p) = ((a rem p)/p).
(c) (ab/p) = (a/p)(b/p).
(d) (0/p) = 0, (1/p) = 1, and (a²/p) = 1 if gcd(a, p) = 1.
(e) (−1/p) = (−1)^((p−1)/2), that is, (−1/p) = 1 if and only if p ≡ 1 (mod 4).
(f) (2/p) = (−1)^((p²−1)/8), that is, (2/p) = 1 if and only if p ≡ ±1 (mod 8). ⊳

The following theorem leads to an algorithmic solution to the problem of
computing the Legendre symbol (a/p).

Theorem 1.46 [Euler’s criterion] Let p be an odd prime, and a any integer.
Then, (a/p) ≡ a^((p−1)/2) (mod p).
Proof The result is evident for a ≡ 0 (mod p). So consider the case that
gcd(a, p) = 1. Let (a/p) = 1, that is, x² ≡ a (mod p) has a solution, say b. But
then a^((p−1)/2) ≡ b^(p−1) ≡ 1 (mod p) by Fermat’s little theorem.
Zp is a field, since every non-zero element in it is invertible. Every a ∈ Z∗p
satisfies a^(p−1) − 1 ≡ (a^((p−1)/2) − 1)(a^((p−1)/2) + 1) ≡ 0 (mod p), that is,
a^((p−1)/2) ≡ ±1 (mod p). All quadratic residues in Z∗p satisfy a^((p−1)/2) ≡ 1 (mod p). But
the congruence a^((p−1)/2) ≡ 1 (mod p) cannot have more than (p − 1)/2 roots,
so all the quadratic non-residues in Z∗p must satisfy a^((p−1)/2) ≡ −1 (mod p). ⊳
Example 1.47 Let p = 541. We have (41/541) ≡ 41^((541−1)/2) ≡ 1 (mod 541),
that is, 41 is a quadratic residue modulo 541. Also, (51/541) ≡ 51^((541−1)/2) ≡
−1 (mod 541), that is, 51 is a quadratic non-residue modulo 541. ¤
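Euler's criterion reduces the computation of the Legendre symbol to a single modular exponentiation. A Python sketch (the function name is ours; pow is Python's built-in three-argument modular power):

```python
def legendre(a, p):
    # Legendre symbol (a/p) for an odd prime p, via Euler's criterion
    s = pow(a, (p - 1) // 2, p)   # equals 1, p-1, or 0 modulo p
    return -1 if s == p - 1 else s

print(legendre(41, 541))   # 1: 41 is a quadratic residue modulo 541
print(legendre(51, 541))   # -1: 51 is a quadratic non-residue modulo 541
```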
A more efficient algorithm for the computation of (a/p) follows from the
quadratic reciprocity law, which is stated without proof here.19

Theorem 1.48 [The law of quadratic reciprocity] Let p, q be odd primes.
Then, (p/q) = (−1)^((p−1)(q−1)/4) (q/p). ⊳
Example 1.49 Using the quadratic reciprocity law, we compute (51/541) as
(51/541) = (3/541)(17/541)
= (−1)^((3−1)(541−1)/4) (541/3) × (−1)^((17−1)(541−1)/4) (541/17)
= (541/3)(541/17) = (1/3)(14/17) = (14/17) = (2/17)(7/17)
= (−1)^((17²−1)/8) (7/17) = (7/17) = (−1)^((7−1)(17−1)/4) (17/7) = (17/7)
= (3/7) = (−1)^((3−1)(7−1)/4) (7/3) = −(7/3) = −(1/3) = −1.
Thus, 51 is a quadratic non-residue modulo 541. ¤
19 Conjectured by Legendre, the quadratic reciprocity law was first proved by Gauss.
Indeed, Gauss himself published eight proofs of this law. At present, hundreds of proofs of
this law are available in the mathematics literature.

1.6.3 Jacobi Symbol

Calculating (a/p) as in the last example has a drawback. We have to
factor several integers during the process. For example, in the very first
step, we need to factor 51. It would be useful if we could directly apply
the quadratic reciprocity law to (51/541), that is, if we could write (51/541) =
(−1)^((51−1)(541−1)/4) (541/51) = (541/51) = (31/51). However, (a/p) has so far been
defined only for odd primes p, and so we require an extension of the Legendre
symbol to work for non-prime denominators also.20

Definition 1.50 Let b be an odd positive integer having prime factorization
b = p1 p2 · · · pr, where the (odd) primes p1, p2, . . . , pr are not necessarily all
distinct. For an integer a, we define the Jacobi symbol (a/b) as
(a/b) = ∏_{i=1}^{r} (a/pi).
Here, each (a/pi) is the Legendre symbol (extended to include the case pi | a). ⊳
If b is prime, the Jacobi symbol (a/b) is the same as the Legendre symbol
(a/b). However, for composite b, the Jacobi symbol (a/b) has no direct relationship
with the solvability of the congruence x² ≡ a (mod b). If (a/b) = −1, the
above congruence is not solvable modulo at least one prime divisor of b, and
consequently modulo b too. However, the value (a/b) = 1 does not immediately
imply that the congruence x² ≡ a (mod b) is solvable. For example, the
congruences x² ≡ 2 (mod 3) and x² ≡ 2 (mod 5) are both unsolvable, so that
(2/3) = (2/5) = −1. By definition, we then have (2/15) = (2/3)(2/5) = 1, whereas
the congruence x² ≡ 2 (mod 15) is clearly unsolvable.
A loss of connection with the solvability of quadratic congruences is not
a heavy penalty to pay. We instead gain something precious, namely the law
20 The Jacobi symbol was introduced in 1837 by the German mathematician Carl Gustav

Jacob Jacobi (1804–1851).


50 Computational Number Theory

of quadratic reciprocity continues to hold for the generalized Jacobi symbol.


That was precisely the motivation for the generalization. Indeed, the Jacobi
symbol possesses properties identical to those of the Legendre symbol.

Proposition 1.51 For odd positive integers b, b′, and for any integers a, a′,
we have:
(a) (aa′/b) = (a/b)(a′/b).
(b) (a/bb′) = (a/b)(a/b′).
(c) If a ≡ a′ (mod b), then (a/b) = (a′/b).
(d) (a/b) = ((a rem b)/b).
(e) (−1/b) = (−1)^((b−1)/2).
(f) (2/b) = (−1)^((b²−1)/8).
(g) [Law of quadratic reciprocity] (b/b′) = (−1)^((b−1)(b′−1)/4) (b′/b). ⊳
Example 1.52 Let us compute (51/541) without making any factoring attempts.
At some steps, we may have to extract powers of 2, but that is doable
efficiently by bit operations only. (51/541) = (−1)^((51−1)(541−1)/4) (541/51) =
(541/51) = (31/51) = (−1)^((31−1)(51−1)/4) (51/31) = −(51/31) = −(20/31) =
−(2/31)²(5/31) = −(5/31) = −(−1)^((5−1)(31−1)/4) (31/5) = −(31/5) = −(1/5) = −1. ¤
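The manipulations of Example 1.52 — reduce modulo the denominator by property (d), extract powers of 2 by property (f), and flip by the reciprocity law (g) — constitute a complete gcd-like algorithm. The following Python sketch is our own rendering of this standard algorithm (the function name is ours):

```python
def jacobi(a, b):
    # Jacobi symbol (a/b) for an odd positive integer b
    assert b > 0 and b % 2 == 1
    a %= b
    s = 1
    while a != 0:
        while a % 2 == 0:          # property (f): extract factors of 2
            a //= 2
            if b % 8 in (3, 5):
                s = -s
        a, b = b, a                # property (g): quadratic reciprocity
        if a % 4 == 3 and b % 4 == 3:
            s = -s
        a %= b                     # property (d): reduce modulo the denominator
    return s if b == 1 else 0      # gcd(a, b) > 1 gives the value 0

print(jacobi(51, 541), jacobi(2, 15), jacobi(21, 45))   # -1 1 0
```

The running time is that of a Euclidean gcd computation, with no factoring beyond pulling out powers of 2.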
The GP/PARI interpreter computes the Jacobi symbol (a/b) when the call
kronecker(a,b) is made.21 Here are some examples.

gp > kronecker(41,541)
%1 = 1
gp > kronecker(51,541)
%2 = -1
gp > kronecker(2,15)
%3 = 1
gp > kronecker(2,45)
%4 = -1
gp > kronecker(21,45)
%5 = 0

For an odd prime p, the congruence x² ≡ a (mod p) has exactly 1 + (a/p)
solutions. We first compute the Legendre symbol (a/p), and if the congruence is
found to be solvable, the next task is to compute the roots of the congruence.
We postpone the study of root finding until Chapter 3. Also see Exercises 1.58
and 1.59.
21 Kronecker extended the Jacobi symbol to all non-zero integers b, including even and

negative integers (see Exercise 1.65). Leopold Kronecker (1823–1891) was a German mathe-
matician who made significant contributions to number theory and algebra. A famous quote
from him is: Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk.
(The dear God has created the whole numbers, everything else is man’s work.)

1.7 Multiplicative Orders


Let a ∈ Z∗m. Since Z∗m is finite, the elements a, a², a³, . . . modulo m
cannot be all distinct, that is, there exist i, j ∈ N with i > j such that
a^i ≡ a^j (mod m). Since a is invertible modulo m, we have a^(i−j) ≡ 1 (mod m),
that is, there exist positive exponents e for which a^e ≡ 1 (mod m). This
observation leads to the following important concept.

Definition 1.53 Let a ∈ Z∗m. The smallest positive integer e for which a^e ≡
1 (mod m) is called the multiplicative order (or simply the order) of a modulo
m and is denoted by ordm a. If e = ordm a, we also often say that a belongs
to the exponent e modulo m. ⊳

Example 1.54 (1) Take m = 35. We have 8¹ ≡ 8 (mod 35), 8² ≡ 64 ≡
29 (mod 35), 8³ ≡ 29 × 8 ≡ 22 (mod 35), and 8⁴ ≡ 22 × 8 ≡ 1 (mod 35).
Therefore, ord35 8 = 4.
(2) For every m ∈ N, we have ordm 1 = 1.
(3) If p is an odd prime, then ordp a = 2 if and only if a ≡ −1 (mod p).
For a composite m, there may exist more than one element of order 2. For
example, 6, 29, 34 are all the elements of order 2 modulo 35.
(4) The order ordm a is not defined for integers a not coprime to m.
Indeed, if d = gcd(a, m) > 1, then a, a², a³, . . . modulo m are all multiples of
d, and so none of them equals 1 modulo m. ¤

The order ordm a is returned by the call znorder(Mod(a,m)) in GP/PARI.

gp > znorder(Mod(1,35))
%1 = 1
gp > znorder(Mod(2,35))
%2 = 12
gp > znorder(Mod(4,35))
%3 = 6
gp > znorder(Mod(6,35))
%4 = 2
gp > znorder(Mod(7,35))
*** not an element of (Z/nZ)* in order.

1.7.1 Primitive Roots


The order of an integer modulo m satisfies a very important property.

Theorem 1.55 Let a ∈ Z∗m, e = ordm a, and h ∈ Z. Then, a^h ≡ 1 (mod m)


if and only if e|h. In particular, e|φ(m), where φ(m) is the Euler phi function.

Proof Let e|h, that is, h = ke for some k ∈ Z. But then a^h ≡ (a^e)^k ≡ 1^k ≡
1 (mod m). Conversely, let a^h ≡ 1 (mod m). Euclidean division of h by e yields
h = ke + r with 0 ≤ r < e. Since a^e ≡ 1 (mod m), we have a^r ≡ 1 (mod m).
By definition, e is the smallest positive integer with a^e ≡ 1 (mod m), that is,
we must have r = 0, that is, e|h. The fact e|φ(m) follows directly from Euler’s
theorem: a^φ(m) ≡ 1 (mod m). ⊳

Definition 1.56 If ordm a = φ(m) for some a ∈ Z∗m , we call a a primitive


root modulo m.22 ⊳

Primitive roots do not exist for all moduli m. We prove an important fact
about primes in this context.

Theorem 1.57 Every prime p has a primitive root.


Proof Since Zp is a field, a non-zero polynomial of degree d over Zp can
have at most d roots. By Fermat’s little theorem, the polynomial x^(p−1) − 1
has p − 1 roots (all elements of Z∗p). Let d be a divisor of p − 1. We have
x^(p−1) − 1 = (x^d − 1)f(x) for a polynomial f(x) of degree p − 1 − d. Since f(x)
cannot have more than p − 1 − d roots, it follows that there are exactly d roots
of x^d − 1 (and exactly p − 1 − d roots of f(x)) modulo p.
Let p − 1 = p1^e1 · · · pr^er with pairwise distinct primes p1, . . . , pr and with
each ei ∈ N. As argued in the last paragraph, Z∗p contains exactly pi^ei elements
of orders dividing pi^ei, and exactly pi^(ei−1) elements of orders dividing pi^(ei−1).
This implies that Z∗p contains at least one element (in fact, pi^ei − pi^(ei−1) elements)
of order equal to pi^ei. Let ai be any such element. By Exercise 1.66(b), the
element a = a1 · · · ar is of order p1^e1 · · · pr^er = p − 1. ⊳

Primes are not the only moduli to have primitive roots. The following
theorem characterizes all moduli that have primitive roots.

Theorem 1.58 The only positive integers > 1 that have primitive roots are
2, 4, pe , 2pe , where p is any odd prime, and e is any positive integer. ⊳

Example 1.59 (1) Take p = 17. Since φ(p) = p − 1 = 16 = 2⁴, the order
of every element of Z∗p is of the form 2^i for some i ∈ {0, 1, 2, 3, 4}. We have
ord17 1 = 1, so 1 is not a primitive root modulo 17. Also 2¹ ≡ 2 (mod 17),
2² ≡ 4 (mod 17), 2⁴ ≡ 16 (mod 17), and 2⁸ ≡ 1 (mod 17), that is, ord17 2 = 8,
that is, 2 too is not a primitive root of 17. We now investigate powers of 3
modulo 17. Since 3¹ ≡ 3 (mod 17), 3² ≡ 9 (mod 17), 3⁴ ≡ 81 ≡ 13 (mod 17),
3⁸ ≡ 169 ≡ 16 (mod 17), and 3¹⁶ ≡ 1 (mod 17), 3 is a primitive root of 17.
(2) The modulus m = 18 = 2 × 3² is of the form 2p^e for an odd prime
p and so has a primitive root. We have φ(18) = 18 × (1 − 1/2)(1 − 1/3) = 6.
So every element of Z∗18 is of order 1, 2, 3, or 6. We have 5¹ ≡ 5 (mod 18),
5² ≡ 7 (mod 18), 5³ ≡ 17 (mod 18), and 5⁶ ≡ 1 (mod 18). Thus, 5 is a
primitive root modulo 18.
(3) The modulus m = 16 does not have a primitive root, that is, an element
of order φ(m) = 8. One can check that ord16 1 = 1, ord16 7 = ord16 9 =
ord16 15 = 2, and ord16 3 = ord16 5 = ord16 11 = ord16 13 = 4. ¤
22 The term primitive root was coined by Euler. Gauss studied primitive roots in his book
Disquisitiones Arithmeticae. In particular, Gauss was the first to prove Theorem 1.58.

The call znprimroot(m) in the GP/PARI calculator returns a primitive root


modulo its argument m, provided that such a root exists.

gp > znprimroot(47)
%1 = Mod(5, 47)
gp > znprimroot(49)
%2 = Mod(3, 49)
gp > znprimroot(50)
%3 = Mod(27, 50)
gp > znprimroot(51)
*** primitive root does not exist in gener

1.7.2 Computing Orders


We do not know any efficient algorithm for computing ordm a unless the
complete prime factorization of φ(m) is provided. Let φ(m) = pe11 · · · perr with
pairwise distinct primes p1 , . . . , pr and with each ei ∈ N. Since ordm a|φ(m),
we must have ordm a = p1h1 · · · phr r for some hi with 0 6 hi 6 ei (for all i). It
suffices to compute only the exponents h1 , . . . , hr . For computing hi , we raise
other prime divisors pj , j 6= i, to the highest possible exponents ej . Then, we
try hi in the range 0, 1, . . . , ei in order to detect its exact value. The following
algorithm elaborates this idea. We do not use explicit variables storing hi , but
accumulate the product phi i in a variable e to be returned.

Algorithm 1.7: Computing the order ordm a of a ∈ Z∗m

Let φ(m) = p1^e1 · · · pr^er be the prime factorization of φ(m).
Initialize e = 1.
For i = 1, 2, . . . , r {
    Compute b = a^(φ(m)/pi^ei) (mod m).
    While (b ≢ 1 (mod m)) {
        Set b = b^pi (mod m), and e = e pi.
    }
}
Return e.
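Algorithm 1.7 transcribes almost verbatim into Python; in this sketch (ours), the factorization of φ(m) is supplied as a list of (prime, exponent) pairs:

```python
def order(a, m, phi, phi_factors):
    # ord_m(a), given phi = phi(m) and its factorization [(p1, e1), ..., (pr, er)]
    e = 1
    for p, ei in phi_factors:
        b = pow(a, phi // p**ei, m)   # strip the p-part of the exponent
        while b != 1:                 # multiply back factors of p until b becomes 1
            b = pow(b, p, m)
            e *= p
    return e

# phi(35) = 24 = 2^3 * 3; compare with the znorder() outputs above
print(order(8, 35, 24, [(2, 3), (3, 1)]))   # 4
print(order(2, 35, 24, [(2, 3), (3, 1)]))   # 12
```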

1.8 Continued Fractions


John Wallis and Lord William Brouncker, in an attempt to solve a prob-
lem posed by Pierre de Fermat, developed the theory of continued fractions.
However, there is evidence that this theory in some form was known to the
Indian mathematician Bhaskaracharya in the twelfth century A.D.

1.8.1 Finite Continued Fractions


An expression of the form
x0 + 1/(x1 + 1/(x2 + · · · + 1/(xk−1 + 1/xk) · · · ))
with x0, x1, . . . , xk ∈ R, all positive except perhaps x0, is called a (finite)
continued fraction. If all xi are integers, this continued fraction is called simple.
In that case, the integers xi are called the partial quotients of the continued
fraction. By definition, x0 may be positive, negative, or zero, whereas all
subsequent xi must be positive. Let us agree to denote the above continued
fraction by the compact notation ⟨x0, x1, x2, . . . , xk−1, xk⟩.
If we start folding the stair from the bottom, we see that a finite simple
continued fraction ⟨a0, . . . , ak⟩ evaluates to a rational number. For example,
⟨9, 1, 10, 4, 2⟩ represents 1001/101. The converse holds too, that is, given any
rational h/k with k > 0, one can develop a finite simple continued fraction
expansion of h/k. This can be done using the procedure for computing gcd(h, k).

Example 1.60 We plan to compute the simple continued fraction expansion of 1001/101. Repeated Euclidean divisions yield

1001 = 9 × 101 + 92,
 101 = 1 × 92 + 9,
  92 = 10 × 9 + 2,
   9 = 4 × 2 + 1,
   2 = 2 × 1.

Thus, 1001/101 = ⟨1001/101⟩ = ⟨9, 101/92⟩ = ⟨9, 1, 92/9⟩ = ⟨9, 1, 10, 9/2⟩ = ⟨9, 1, 10, 4, 2⟩. ◻
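The repeated Euclidean divisions of this example are easy to automate; a minimal sketch (the function name is my own):

```python
def contfrac_rational(h, k):
    """Partial quotients of h/k (k > 0) obtained by repeated
    Euclidean division, exactly as in the gcd computation."""
    quotients = []
    while k != 0:
        q, r = divmod(h, k)   # h = q*k + r with 0 <= r < k
        quotients.append(q)
        h, k = k, r
    return quotients
```

For example, contfrac_rational(1001, 101) returns [9, 1, 10, 4, 2].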

Let r = ⟨a0, . . . , ak⟩ be a simple continued fraction. If ak > 1, the continued fraction ⟨a0, . . . , ak − 1, 1⟩ also evaluates to r. On the other hand, if ak = 1, the continued fraction ⟨a0, . . . , ak−1 + 1⟩ also evaluates to r. In either case, these are the only two simple continued fractions for r. For example, ⟨9, 1, 10, 4, 2⟩ and ⟨9, 1, 10, 4, 1, 1⟩ are the only simple continued fractions for 1001/101.

1.8.2 Infinite Continued Fractions

Finite continued fractions are, by themselves, not so interesting. Assume that an infinite sequence a0, a1, a2, . . . of integers, all positive except perhaps a0, is given. We want to assign a meaning (and value) to the infinite simple continued fraction ⟨a0, a1, a2, . . .⟩. To this end, we inductively define two infinite sequences hn and kn as follows.

h−2 = 0, h−1 = 1, hn = an hn−1 + hn−2 for n ≥ 0.
k−2 = 1, k−1 = 0, kn = an kn−1 + kn−2 for n ≥ 0.

We also define the rational numbers rn = hn/kn for n ≥ 0. This is allowed, since we have 1 = k0 ≤ k1 < k2 < k3 < · · · < kn < · · · .
Theorem 1.61 With the notations just introduced, we have rn = ha0 , . . . , an i
for every n ∈ N0 . Furthermore, the rational numbers rn satisfy the inequalities:
r0 < r2 < r4 < · · · < r5 < r3 < r1 . The limit ξ = limn→∞ rn exists. For every
m, n ∈ N0 , we have r2m < ξ < r2n+1 . ⊳
This theorem allows us to let the unique real number ξ stand for the infinite continued fraction ⟨a0, a1, a2, . . .⟩, that is,

    ⟨a0, a1, a2, . . .⟩ = lim_{n→∞} ⟨a0, a1, . . . , an⟩ = lim_{n→∞} rn = lim_{n→∞} hn/kn = ξ.
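The recurrences for hn and kn translate directly into code; a sketch (the function name is my own):

```python
def convergents(partial_quotients):
    """Return the convergents h_n/k_n as (h_n, k_n) pairs, using
    h_n = a_n h_{n-1} + h_{n-2} and k_n = a_n k_{n-1} + k_{n-2}."""
    h_prev2, h_prev = 0, 1      # h_{-2}, h_{-1}
    k_prev2, k_prev = 1, 0      # k_{-2}, k_{-1}
    result = []
    for a in partial_quotients:
        h = a * h_prev + h_prev2
        k = a * k_prev + k_prev2
        result.append((h, k))
        h_prev2, h_prev = h_prev, h
        k_prev2, k_prev = k_prev, k
    return result
```

For example, convergents([3, 7, 15, 1]) ends with (355, 113), the convergent 355/113 to π.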

Theorem 1.62 The value of an infinite simple continued fraction is irrational. Moreover, two different infinite simple continued fractions evaluate to different values, that is, if ⟨a0, a1, a2, . . .⟩ = ⟨b0, b1, b2, . . .⟩, then an = bn for all n ∈ N0. ⊳
Definition 1.63 For each n ∈ N0, the rational number rn is called the n-th convergent to the irrational number ξ = ⟨a0, a1, a2, . . .⟩. ⊳
Example 1.64 Let us compute the irrational number ξ with continued fraction expansion ⟨1, 2, 2, 2, . . .⟩. We have ξ = 1 + 1/λ, where λ = ⟨2, 2, 2, . . .⟩. Observe that λ = 2 + 1/λ, so λ² − 2λ − 1 = 0, that is, λ = 1 ± √2. Since λ is positive, we have λ = 1 + √2. Therefore, ξ = 1 + 1/(1 + √2) = 1 + (√2 − 1) = √2. ◻

The converse question is: does every irrational ξ expand to an infinite simple continued fraction? The answer is yes. We inductively generate a0, a1, a2, . . . with ξ = ⟨a0, a1, a2, . . .⟩ as follows. We start by setting ξ0 = ξ and a0 = ⌊ξ0⌋. When ξ0, . . . , ξn and a0, . . . , an are known for some n ≥ 0, we calculate ξn+1 = 1/(ξn − an) and an+1 = ⌊ξn+1⌋. Since ξ is irrational, each ξn is also irrational. In addition, the integers a1, a2, a3, . . . are all positive; only a0 may be positive, negative, or zero.
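The process can be sketched in floating point, though rounding errors eventually corrupt the partial quotients, so only the first several terms are trustworthy:

```python
import math

def contfrac_float(xi, nterms):
    """First nterms partial quotients of xi, computed in floating
    point; reliable only for a few terms due to rounding error."""
    terms = []
    for _ in range(nterms):
        a = math.floor(xi)
        terms.append(a)
        frac = xi - a
        if frac == 0.0:        # xi looks rational at this precision
            break
        xi = 1.0 / frac
    return terms
```

For example, contfrac_float(math.sqrt(2), 6) reproduces the pattern [1, 2, 2, 2, 2, 2].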

Example 1.65 (1) Let us first obtain the infinite simple continued fraction expansion of √2.

ξ0 = √2 = 1.4142135623 . . . ,                              a0 = ⌊ξ0⌋ = 1,
ξ1 = 1/(ξ0 − a0) = 1/(√2 − 1) = 1 + √2 = 2.4142135623 . . . , a1 = ⌊ξ1⌋ = 2,
ξ2 = 1/(ξ1 − a1) = 1/(√2 − 1) = 1 + √2 = 2.4142135623 . . . , a2 = ⌊ξ2⌋ = 2,

and so on. Therefore, √2 = ⟨1, 2, 2, 2, . . .⟩. The first few convergents to √2 are r0 = ⟨1⟩ = 1, r1 = ⟨1, 2⟩ = 3/2 = 1.5, r2 = ⟨1, 2, 2⟩ = 7/5 = 1.4, r3 = ⟨1, 2, 2, 2⟩ = 17/12 = 1.4166666666 . . . , r4 = ⟨1, 2, 2, 2, 2⟩ = 41/29 = 1.4137931034 . . . . It is apparent that the convergents r0, r1, r2, r3, r4, . . . come successively closer to √2.
(2) Let us now develop the infinite simple continued fraction expansion of π = 3.1415926535 . . . .

ξ0 = π = 3.1415926535 . . . ,              a0 = ⌊ξ0⌋ = 3,
ξ1 = 1/(ξ0 − a0) = 7.0625133059 . . . ,    a1 = ⌊ξ1⌋ = 7,
ξ2 = 1/(ξ1 − a1) = 15.996594406 . . . ,    a2 = ⌊ξ2⌋ = 15,
ξ3 = 1/(ξ2 − a2) = 1.0034172310 . . . ,    a3 = ⌊ξ3⌋ = 1,

and so on. Thus, the first few convergents to π are r0 = ⟨3⟩ = 3, r1 = ⟨3, 7⟩ = 22/7 = 3.1428571428 . . . , r2 = ⟨3, 7, 15⟩ = 333/106 = 3.1415094339 . . . , r3 = ⟨3, 7, 15, 1⟩ = 355/113 = 3.1415929203 . . . . Here too, the convergents r0, r1, r2, r3, . . . come successively closer to π. This is indeed true in general. ◻

Lemma 1.66 Let hn/kn, n ∈ N0, be the convergents to the irrational number ξ. Then, |ξ − hn/kn| < 1/(kn kn+1) or, equivalently, |ξ kn − hn| < 1/kn+1, for all n ≥ 0. ⊳

Theorem 1.67 Let hn/kn, n ∈ N0, be the convergents to the irrational number ξ. Then, for all n ≥ 1, we have |ξ kn − hn| < |ξ kn−1 − hn−1|. In particular, |ξ − hn/kn| < |ξ − hn−1/kn−1| for all n ≥ 1. ⊳

The convergents hn/kn to the irrational number ξ are called best possible approximations of ξ in the sense that if a rational a/b is closer to ξ than hn/kn, then the denominator b has to be larger than kn. More precisely, we have:

Theorem 1.68 Let a ∈ Z and b ∈ N with |ξ − a/b| < |ξ − hn/kn| for some n ≥ 1. Then, b > kn. ⊳

The continued fraction of a real number x is returned by the call


contfrac(x) in the GP/PARI calculator. The number of terms returned in the
output depends on the precision of the calculator. The expansion is truncated
after these terms. The user may optionally specify the number of terms that
(s)he desires in the expansion.

gp > contfrac(1001/101)
%1 = [9, 1, 10, 4, 2]
gp > contfrac(sqrt(11))
%2 = [3, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3, 6, 3]
gp > Pi
%3 = 3.141592653589793238462643383
gp > contfrac(Pi)
%4 = [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2,
1, 1, 15, 3]
gp > contfrac(Pi,10)
%5 = [3, 7, 15, 1, 292, 1, 1, 1, 3]
gp > contfrac(Pi,100)
%6 = [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2,
1, 1, 15, 3]

The last two convergents hn/kn and hn−1/kn−1 are returned in the form of the 2 × 2 matrix

    ( hn  hn−1 )
    ( kn  kn−1 )

by the call contfracpnqn(), which accepts a continued fraction expansion as its only input.

gp > contfrac(Pi)
%1 = [3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2,
1, 1, 15, 3]
gp > contfracpnqn(contfrac(Pi))
%2 =
[428224593349304 139755218526789]

[136308121570117 44485467702853]

gp > contfrac(Pi,3)
%3 = [3, 7, 15]
gp > contfracpnqn(contfrac(Pi,3))
%4 =
[333 22]

[106 7]

gp > contfrac(Pi,5)
%5 = [3, 7, 15, 1, 292]
gp > contfracpnqn(contfrac(Pi,5))
%6 =
[103993 355]

[33102 113]

gp > contfracpnqn(contfrac(1001/101))
%7 =
[1001 446]

[101 45]

1.9 Prime Number Theorem and Riemann Hypothesis


Euclid (ca. 300 BC) was seemingly the first to prove that there are infinitely
many primes. Euclid’s proof, given below (in modern terminology), is still an
inspiring and influential piece of reasoning.

Theorem 1.69 There are infinitely many primes.


Proof The assertion is proved by contradiction. Suppose that there are only finitely many primes p1, p2, . . . , pr. The integer n = p1 p2 · · · pr + 1 is divisible by none of the primes p1, p2, . . . , pr. So no prime factor of n (n itself may be prime) is present in our supposedly exhaustive list p1, p2, . . . , pr of primes, a contradiction. ⊳

At first sight, prime numbers appear to be distributed somewhat erratically. There is no formula involving simple functions like polynomials that can generate only prime numbers. There are arbitrarily long gaps in the sequence of primes (for example, (n+1)! + i is composite for all i = 2, 3, . . . , n+1). Still, mathematicians have tried to discern patterns in the distribution of primes.

For a positive real number x, we denote by π(x) the number of primes between 1 and x. There exists no simple formula describing π(x) for all (or almost all) values of x. For about a century, mathematicians tried to establish the following assertion, first conjectured by Legendre in 1797 or 1798.

Theorem 1.70 [Prime number theorem (PNT)] π(x) approaches the quantity x/ln x as x → ∞. Here, the term “approaches” means that the limit lim_{x→∞} π(x)/(x/ln x) is equal to 1. ⊳

Several branches of mathematics (most notably, the study of analytic functions in complex analysis) were enriched by attempts to prove the prime number theorem. The first complete proof of the PNT (based mostly on the ideas of Riemann and Chebyshev) was given independently by the French mathematician Hadamard and by the Belgian mathematician de la Vallée Poussin in 1896. Their proof is regarded as one of the most important achievements of modern mathematics. An elementary proof (that is, a proof not based on results from analysis or algebra) of the theorem was found by Paul Erdős (1949) and Atle Selberg (1950).
Although the formula x/ln x for π(x) is asymptotic, it is a good approximation of π(x) for all values of x. In fact, it can be proved much more easily than the PNT that for all sufficiently large values of x, we have 0.922 < π(x)/(x/ln x) < 1.105. These inequalities indicate that π(x) = Θ(x/ln x), a result as useful to a computational number theorist as the PNT. Proving the PNT, however, has been a landmark in the history of mathematics.
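The quality of the approximation can be checked numerically with a small sieve. For example, π(10⁴) = 1229, while 10⁴/ln 10⁴ ≈ 1086, a ratio of about 1.13. A sketch:

```python
import math

def prime_pi(x):
    """pi(x), the number of primes up to x, by the sieve of Eratosthenes."""
    if x < 2:
        return 0
    sieve = bytearray([1]) * (x + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(x) + 1):
        if sieve[p]:
            # Cross out the multiples of p starting from p*p.
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return sum(sieve)
```

With this, prime_pi(10**4) / (10**4 / math.log(10**4)) evaluates to roughly 1.13.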

A better approximation of π(x) is provided by Gauss's Li function, defined by the logarithmic integral Li(x) = ∫₂ˣ dt/ln t. The quantity Li(x) approaches π(x) as x → ∞. (This was first conjectured by Dirichlet in 1838.)
Although the ratio π(x)/(x/ln x) (or the ratio π(x)/Li(x)) approaches 1 asymptotically, the difference π(x) − x/ln x (or the difference π(x) − Li(x)) is not necessarily zero, nor even convergent to a finite value. However, mathematicians treat the distribution of primes as well-behaved if this difference is not too large. Riemann proposed the following conjecture in 1859.²³

Conjecture 1.71 [Riemann hypothesis (RH)] π(x) − Li(x) = O(√x ln x). ⊳

It has been proved (for example, by de la Vallée Poussin) that π(x) − Li(x) = O((x/ln x) e^(−α√ln x)) for some constant α. However, the tighter bound indicated by the Riemann hypothesis stands unproven even in the twenty-first century.

A generalization of Theorem 1.69 was proved by Dirichlet.²⁴

Theorem 1.72 [Dirichlet’s theorem on primes in arithmetic progression] Let


a, b ∈ N with gcd(a, b) = 1. There exist infinitely many primes in the sequence
a, a + b, a + 2b, a + 3b, . . . . ⊳

Let a, b be as above. We denote by πa,b(x) the number of primes of the form a + kb that are ≤ x. The prime number theorem can be generalized as follows: πa,b(x) approaches x/(φ(b) ln x) as x → ∞. Moreover, the Riemann hypothesis can be extended as follows.

Conjecture 1.73 [Extended Riemann hypothesis (ERH)]

    πa,b(x) = (1/φ(b)) Li(x) + O(√x ln x). ⊳
The ERH has significant implications in computational number theory.
Certain algorithms are known to run in polynomial time only under the as-
sumption that the ERH is true. However, like the RH, this extended version
continues to remain unproven. The RH is indeed a special case (corresponding
to a = b = 1) of the ERH.
As an example of the usefulness of the ERH, let us look at the following
implication of the ERH.

Conjecture 1.74 The smallest positive quadratic non-residue modulo a prime p is < 2 ln² p. ⊳
²³Georg Friedrich Bernhard Riemann (1826–1866) was a German mathematician whose works have had a deep impact on several branches of mathematics, including complex analysis, analytic number theory, and geometry.
²⁴Johann Peter Gustav Lejeune Dirichlet (1805–1859) was a German mathematician who made important contributions to analytic and algebraic number theory.



In Exercise 1.59, a probabilistic polynomial-time algorithm (by Tonelli and Shanks) is described. If Conjecture 1.74 is true, this algorithm can be readily converted to a deterministic polynomial-time algorithm. However, if we do not assume the ERH, the best provable bound on the smallest positive quadratic non-residue modulo p turns out to be O(p^α) for some positive constant α. Consequently, we fail to arrive at a deterministic algorithm that runs in provably polynomial time.

1.10 Running Times of Arithmetic Algorithms


In introductory courses on algorithms, it is often conventional to treat the
size of an integer as a constant. For example, when analyzing the time (and
space) complexities of algorithms that sort arrays of integers, only the size of
the array is taken into consideration.
In number theory, we work with multiple-precision integers. Treating each integer operand as having a constant size is a serious loss of generality. The amount of time taken by arithmetic operations on multiple-precision integers, and even by simple copy and comparison operations, depends heavily on the size of the operands. For example, multiplying two million-bit integers is expected to take considerably longer than multiplying two thousand-bit integers.
In view of this, the input size for an arithmetic algorithm is measured by the total number of bits needed to encode its operands. Of course, the encoding involved must be reasonable. The binary representation of an integer is a reasonable encoding, whereas its unary representation, which is exponential in the size of its binary representation, is not. Henceforth, we take the size of an integer n as log₂ n = lg n, or, ignoring constant factors, as log_e n = ln n, or even as log n to an unspecified (but fixed) base.
When arithmetic modulo an integer m is considered, it is assumed that Zm has the standard representation {0, 1, 2, . . . , m − 1}. Thus, each element of Zm is at most as big as m − 1, that is, has a bit size no larger than lg m. Moreover, since a^φ(m) ≡ 1 (mod m) for any a ∈ Z*_m, an exponentiation of the form a^e (mod m) is carried out under the assumption that e is available modulo φ(m). But φ(m) < m, so e too can be encoded using ≤ lg m bits. To sum up, an algorithm involving only a constant number of inputs from Zm is considered to have an input size of lg m (or ln m or log m).

A polynomial f(x) of degree d has the size needed to encode its d + 1 coefficients. If each coefficient is known to have an upper bound t on its bit length, the size of the polynomial is ≤ (d + 1)t. For example, if f(x) ∈ Zm[x], the size of f(x) is ≤ (d + 1) lg m, or more simply d lg m. Analogously, the size of a k × l matrix with entries from Zm is kl lg m.
Having defined the notion of the size of arithmetic operands, we are now
ready to concentrate on the running times of arithmetic algorithms. First,

we look at the basic arithmetic operations. Under the assumption of standard schoolbook arithmetic, we have the following complexity figures. Running times of faster arithmetic operations (like Karatsuba, Toom-3, or FFT multiplication) are elaborated elsewhere in the text.

Operation                                                   Running time

Copy x = a                                                  O(lg a)
Comparison of a with b                                      O(max(lg a, lg b))
Addition a + b                                              O(max(lg a, lg b))
Subtraction a − b                                           O(max(lg a, lg b))
Multiplication ab                                           O(lg a lg b)
Square a²                                                   O(lg² a)
Euclidean division a quot b and/or a rem b with a ≥ b       O(lg² a)
Euclidean gcd gcd(a, b) with a ≥ b                          O(lg² a)
Extended Euclidean gcd gcd(a, b) = ua + vb with a ≥ b       O(lg² a)
Binary gcd gcd(a, b) with a ≥ b                             O(lg² a)
Extended binary gcd gcd(a, b) = ua + vb with a ≥ b          O(lg² a)

The running times of modular arithmetic operations in Zm are as follows.

Operation                       Running time

Addition a + b (mod m)          O(lg m)
Subtraction a − b (mod m)       O(lg m)
Multiplication ab (mod m)       O(lg² m)
Inverse a⁻¹ (mod m)             O(lg² m)
Exponentiation a^e (mod m)      O(lg³ m)

Finally, the running times of some other important algorithms discussed in this chapter are compiled.

Operation                                       Running time

Chinese remainder theorem (Algorithm 1.6)       O(t lg² M)
Legendre symbol (a/p) with a ∈ Zp               O(lg³ p)
Order ord_m a (Algorithm 1.7)                   O(lg⁴ m)

A derivation of these running times is left to the reader as an easy exercise.



Exercises
1. Describe an algorithm to compare the absolute values of two multiple-precision
integers.
2. Describe an algorithm to compute the product of a multiple-precision integer
with a single-precision integer.
3. Squaring is a form of multiplication where both the operands are the same.
Describe how this fact can be exploited to speed up the schoolbook mul-
tiplication algorithm for multiple-precision integers. What about Karatsuba
multiplication?
4. Describe an efficient algorithm to compute the Euclidean division of a multiple-precision integer by a non-zero single-precision integer.
5. Describe how multiple-precision division by an integer of the form B l ± m (B
is the base, and m is a small integer) can be efficiently implemented.
6. Explain how multiplication and division of multiple-precision integers by pow-
ers of 2 can be implemented efficiently using bit operations.
7. Describe the details of the Toom-4 multiplication method. Choose the evaluation points as k = ∞, 0, ±1, −2, ±1/2.
8. Toom’s multiplication can be adapted to work for unbalanced operands, that
is, when the sizes of the operands vary considerably. Suppose that the number
of digits of a is about two-thirds the number of digits of b. Write a as a
polynomial of degree two, and b as a polynomial of degree three. Describe how
you can compute the product ab in this case using a Toom-like algorithm.
9. Derive Equation (1.1).
10. Verify the following assertions. Here, a, b, c, x, y are arbitrary integers.
(a) a|a.
(b) If a|b and b|c, then a|c.
(c) If a|b and b|a, then a = ±b.
(d) If a|b and a|c, then a|(bx + cy).
(e) If a|(bc) and gcd(a, b) = 1, then a|c.
11. Let p be a prime. If p|(ab), show that p|a or p|b. More generally, show that if
p|(a1 a2 · · · an ), then p|ai for some i ∈ {1, 2, . . . , n}.
12. Suppose that gcd(r0 , r1 ) is computed by the repeated Euclidean division algo-
rithm. Suppose also that r0 > r1 > 0. Let ri+1 denote the remainder obtained
by the i-th division (that is, in the i-th iteration of the Euclidean loop). So
the computation proceeds as gcd(r0 , r1 ) = gcd(r1 , r2 ) = gcd(r2 , r3 ) = · · · with
r0 > r1 > r2 > · · · > rk > rk+1 = 0 for some k > 1.
(a) If the computation of gcd(r0, r1) requires exactly k Euclidean divisions, show that r0 ≥ Fk+2 and r1 ≥ Fk+1. Here, Fn is the n-th Fibonacci number: F0 = 0, F1 = 1, and Fn = Fn−1 + Fn−2 for n ≥ 2.

(b) Modify the Euclidean gcd algorithm slightly so as to ensure that ri ≤ (1/2) ri−1 for i ≥ 2. Here, ri need not be the remainder ri−2 rem ri−1.
(c) Explain the speedup produced by the modified algorithm. Assume that Fn ≈ (1/√5) ρⁿ, where ρ = (1 + √5)/2 = 1.6180339887 . . . is the golden ratio.
13. Modify the binary gcd algorithm (Algorithm 1.3) so that two integers u, v sat-
isfying gcd(a, b) = ua + vb are computed along with gcd(a, b). Your algorithm
should run in quadratic time as the original binary gcd algorithm.
14. Let a, b ∈ N with d = gcd(a, b) = ua + vb for some u, v ∈ Z. Demonstrate that u, v are not unique. Now, assume that (a, b) ≠ (1, 1). Prove that:
(a) If d = 1, then u, v can be chosen to satisfy |u| < b and |v| < a.
(b) In general, u, v can be chosen to satisfy |u| < b/d and |v| < a/d.
15. Define the n-th continuant polynomial Kn (x1 , x2 , . . . , xn ) recursively as:

K0 () = 1,
K1 (x1 ) = x1 ,
Kn(x1, x2, . . . , xn) = xn Kn−1(x1, x2, . . . , xn−1) + Kn−2(x1, x2, . . . , xn−2), n ≥ 2.

(a) Find K2 (x1 , x2 ), K3 (x1 , x2 , x3 ) and K4 (x1 , x2 , x3 , x4 ).


(b) Prove that Kn (x1 , x2 , . . . , xn ) is the sum of all subproducts of x1 x2 · · · xn
with zero or more non-overlapping contiguous pairs xi xi+1 removed.
(c) Conclude that Kn (x1 , x2 , . . . , xn ) = Kn (xn , xn−1 , . . . , x1 ).
(d) Prove that the number of terms in Kn (x1 , x2 , . . . , xn ) is Fn+1 .
(e) Deduce that for all n > 1, the continuant polynomials satisfy the identity

Kn (x1 , . . . , xn )Kn (x2 , . . . , xn+1 ) − Kn+1 (x1 , . . . , xn+1 )Kn−1 (x2 , . . . , xn ) = (−1)n .

16. Consider the extended Euclidean gcd algorithm described in Section 1.2.2.
Suppose that the algorithm terminates after computing rj = 0 (so that rj−1 is
the gcd of r0 = a and r1 = b). Assume that a > b and let d = gcd(a, b). Finally,
let q2 , q3 , . . . , qj be the quotients obtained during the Euclidean divisions.
(a) Show that

|u1| < |u2| ≤ |u3| < |u4| < |u5| < · · · < |uj|, and
|v0| < |v1| ≤ |v2| < |v3| < |v4| < · · · < |vj|.

(b) Prove that

|ui | = Ki−2 (q3 , . . . , qi ) for all i = 2, 3, . . . , j, and


|vi | = Ki−1 (q2 , . . . , qi ) for all i = 1, 2, . . . , j.

(c) Prove that gcd(ui , vi ) = 1 for all i = 0, 1, 2, . . . , j.


(d) Prove that |uj| = b/d and |vj| = a/d.
(e) Prove that the extended gcd algorithm returns the multipliers u, v with |u| ≤ b/d and |v| ≤ a/d, with strict inequalities holding if b ∤ a.
17. Let a1 , a2 , . . . , an be non-zero integers with d = gcd(a1 , a2 , . . . , an ).

(a) Prove that there exist integers u1 , u2 , . . . , un satisfying u1 a1 +u2 a2 +· · ·+


un an = d.
(b) How can you compute u1 , u2 , . . . , un along with d?
18. Assume that a randomly chosen non-zero even integer is divisible by 2^t but not by 2^(t+1) with probability 1/2^t for all t ∈ N. Prove that the average number of iterations of the outer loop of the binary gcd algorithm (Algorithm 1.3) is at most max(lg a, lg b).
19. Prove that the maximum number of iterations of the outer loop of the binary
gcd algorithm (Algorithm 1.3) is at most 1 + max(lg a, lg b). Prove that this
bound is quite tight. (Hint: Establish the tighter (in fact, achievable) upper
bound lg(a + b) on the number of iterations. Use induction.)
(Remark: The inner while loops of the binary gcd algorithm may call for an
effort proportional to lg a + lg b in the entire execution of Algorithm 1.3.)
20. Argue that for two odd integers a, b, either a + b or a − b is a multiple of four.
How can you exploit this observation to modify the binary gcd algorithm?
What is the expected benefit of this modification?
21. Propose how, in an iteration of the binary gcd algorithm with a > b, you can
force the least significant word of a to become zero by subtracting a suitable
multiple of b from a. What (and when) do you gain from this modification?
22. Consider the following variant of the binary gcd algorithm which attempts to
remove one or more most significant bits of one operand in each iteration.25

Algorithm 1.8: Left-shift binary GCD

Assume that the inputs a, b are positive and odd, and that a ≥ b.
While (b ≠ 0) {
    Determine e ∈ N0 such that 2^e b ≤ a < 2^(e+1) b.
    Compute t = min(a − 2^e b, 2^(e+1) b − a).
    Set a = b and b = t.
    If (a < b), swap a with b.
}
Return a.

(a) Prove that Algorithm 1.8 terminates and correctly computes gcd(a, b).
(b) How can you efficiently implement Algorithm 1.8 using bit operations?
(c) Prove that the number of iterations of the while loop is O(lg a + lg b).
(d) Argue that Algorithm 1.8 can be implemented so as to run in O(lg² a) time (where a is the larger input operand).
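For experimenting with Parts (a)–(c), Algorithm 1.8 can be rendered directly in Python; the exponent e is found by comparing bit lengths:

```python
def leftshift_gcd(a, b):
    """Left-shift binary gcd (Algorithm 1.8).
    Inputs a, b are positive and odd, with a >= b."""
    while b != 0:
        # Find e with 2^e * b <= a < 2^(e+1) * b.
        e = a.bit_length() - b.bit_length()
        if (b << e) > a:
            e -= 1
        # Remove the leading bits of a relative to 2^e * b.
        t = min(a - (b << e), (b << (e + 1)) - a)
        a, b = b, t
        if a < b:
            a, b = b, a
    return a
```

For example, leftshift_gcd(693, 147) returns 21.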
23. Let a, b ∈ N with gcd(a, b) = 1. Assume that a ≠ 1 and b ≠ 1.
(a) Prove that any integer n > ab can be expressed as n = sa + tb with integers s, t ≥ 0.
(b) Devise a polynomial-time (in log n) algorithm to compute s, t of Part (a).
25 Jeffrey Shallit and Jonathan Sorenson, Analysis of a left-shift binary gcd algorithm,

Journal of Symbolic Computation, 17, 473–486, 1994.



(c) Determine the running time of your algorithm.


(Remark: The Frobenius coin change problem deals with the determination
of the largest positive integer that cannot be represented as a linear non-
negative integer combination of some given positive integers a1 , a2 , . . . , ak with
gcd(a1 , a2 , . . . , ak ) = 1. For k = 2, this integer is a1 a2 − a1 − a2 .)
24. Let n ∈ N, and p a prime. The multiplicity of p in n, denoted vp(n), is the largest (non-negative) exponent e for which p^e | n. Prove that vp(n!) = Σ_{k∈N} ⌊n/p^k⌋. Conclude that the number of trailing zeros in n! is v5(n!). Propose an algorithm to compute the number of trailing zeros in n!.
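Legendre's formula in this exercise yields an immediate algorithm; a sketch (function names are my own):

```python
def multiplicity_in_factorial(p, n):
    """v_p(n!) = sum over k >= 1 of floor(n / p^k)  (Legendre's formula)."""
    total = 0
    pk = p
    while pk <= n:
        total += n // pk
        pk *= p
    return total

def trailing_zeros_factorial(n):
    """Number of trailing decimal zeros in n!, namely v_5(n!)."""
    return multiplicity_in_factorial(5, n)
```

For example, trailing_zeros_factorial(100) returns 24, since v5(100!) = 20 + 4.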
25. Prove that for any r ∈ N, the product of r consecutive integers is divisible by r!. (Please avoid the argument that the binomial coefficient C(n, r) is an integer.)
26. Algorithm 1.4 is called left-to-right exponentiation, since the bits in the exponent are considered from left to right. Rewrite the square-and-multiply exponentiation algorithm in such a way that the exponent bits are considered from right to left. In other words, if e = (e_{l−1} e_{l−2} . . . e1 e0)₂, then for i = 0, 1, 2, . . . , l − 1 in that order, the i-th iteration should compute a^((e_i e_{i−1} . . . e_1 e_0)₂) (mod m).
27. Algorithm 1.4 can be sped up by constant factors using several tricks, one of which is explained here. Let w be a small integer (typically 2, 4, or 8). One precomputes a^i (mod m) for i = 0, 1, . . . , 2^w − 1. The exponent is broken into chunks, each of size w bits. Inside the square-and-multiply loop, w successive square operations are performed. Subsequently, depending on the current w-bit chunk in the exponent, a single multiplication is carried out. The precomputed table is looked up for this multiplication. Work out the details of this windowed exponentiation algorithm. Argue how this variant speeds up the basic exponentiation algorithm.
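The windowed method described in this exercise can be sketched as follows (for simplicity, the exponent is padded to a whole number of w-bit chunks):

```python
def windowed_pow(a, e, m, w=4):
    """Left-to-right fixed-window modular exponentiation with a
    precomputed table of a^i mod m for i = 0, ..., 2^w - 1."""
    table = [1] * (1 << w)
    for i in range(1, 1 << w):
        table[i] = table[i - 1] * a % m
    bits = e.bit_length()
    nchunks = (bits + w - 1) // w if bits else 1
    result = 1
    # Process w-bit chunks of e from most to least significant.
    for i in range(nchunks - 1, -1, -1):
        for _ in range(w):
            result = result * result % m
        chunk = (e >> (i * w)) & ((1 << w) - 1)
        result = result * table[chunk] % m
    return result
```

The savings come from replacing up to w conditional multiplications per chunk by a single table-lookup multiplication.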
28. Suppose that we want to compute x^r y^s (mod m), where r and s are positive integers of the same bit size. By the repeated square-and-multiply algorithm, one can compute x^r (mod m) and y^s (mod m) independently, and then multiply these two values. Alternatively, one may rewrite the square-and-multiply algorithm using only one loop in which the bits of both the exponents r and s are simultaneously considered. After each square operation, one multiplies by 1, x, y, or xy.
(a) Elaborate the algorithm outlined above. What speedup is this modification expected to produce?
(b) Generalize the concept to the computation of x^r y^s z^t (mod m), and analyze the speedup.
29. Show that the quotient Q and the remainder R computed by the Barrett reduction algorithm (Algorithm 1.5) satisfy q − 2 ≤ Q ≤ q and 0 ≤ R < 3m.

30. For n ∈ N, the integer x = ⌊√n⌋ is called the integer square-root of n. Newton's iteration can be used to obtain x from n. Assume that we want to find a zero of the function f(x). We start with an initial approximation x0 of the zero. Next, we enter a loop in which the approximation xi is modified to a better approximation xi+1. We generate the new approximation as xi+1 = xi − f(xi)/f′(xi). In the specific case of computing integer square-roots, we have f(x) = x² − n, so that xi+1 = (xi + n/xi)/2. Since we plan to perform integer operations only, we generate xi+1 = ⌊(xi + ⌊n/xi⌋)/2⌋. Propose a polynomial-time algorithm for computing the integer square-root x of n using this idea. Suggest how an initial approximation x0 can be obtained in the range ⌊√n⌋ ≤ x0 ≤ 2⌊√n⌋ using bit operations only. Determine a termination criterion for the Newton loop.
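The iteration of this exercise can be sketched as follows, with x0 taken as the power of two 2^⌈b/2⌉ (b the bit length of n), which lies in the stated range:

```python
def integer_sqrt(n):
    """Floor of the square root of n >= 0 by integer Newton iteration."""
    if n < 2:
        return n
    # 2^ceil(b/2) satisfies floor(sqrt(n)) <= x0 <= 2*floor(sqrt(n)).
    x = 1 << ((n.bit_length() + 1) // 2)
    while True:
        y = (x + n // x) // 2
        # Starting above the root, the iterates decrease strictly
        # until they reach floor(sqrt(n)); a non-decrease signals done.
        if y >= x:
            return x
        x = y
```

This matches Python's built-in math.isqrt on all inputs; the y >= x test is the termination criterion asked for.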
31. Let k ≥ 2 be a constant integer. Design a polynomial-time algorithm that uses Newton's iteration for computing the integer k-th root ⌊n^(1/k)⌋ of n ∈ N.
32. Let n ∈ N. Design a polynomial-time algorithm for checking whether n is a perfect power, that is, n = m^k for some m, k ∈ N with k ≥ 2.
33. Assume that you are given a polynomial-time algorithm that, given positive integers n and k, determines whether n is a perfect k-th power (that is, whether n = m^k for some positive integer m). Devise a polynomial-time algorithm that, given two positive integers n1, n2, determines whether there exist positive integers e1, e2 such that n1^e1 = n2^e2 and, if so, computes a pair of such positive integers e1, e2.
34. Let p be a prime and a, b ∈ Z. Prove the following assertions.
(a) C(p, k) ≡ 0 (mod p) for all k = 1, . . . , p − 1, where C(p, k) = p!/(k!(p−k)!) is the binomial coefficient.
(b) (a + b)^p ≡ a^p + b^p (mod p) or, more generally, (a + b)^(p^r) ≡ a^(p^r) + b^(p^r) (mod p) for every r ∈ N.
(c) If a^p ≡ b^p (mod p), then a^p ≡ b^p (mod p²).
35. Let a, b, c be non-zero integers, and d = gcd(a, b).
(a) Prove that the equation ax + by = c is solvable in integer values of x, y if
and only if d | c.
(b) Suppose that d | c, and (s, t) is one solution of the equation of Part (a). Prove that all the solutions of this equation can be given as (s + k(b/d), t − k(a/d)) for k ∈ Z. Describe how one solution (s, t) can be efficiently computed.
(c) Compute all the (integer) solutions of the equation 21x + 15y = 60.
36. Prove that the multivariate linear congruence a1 x1 + a2 x2 + · · · + an xn ≡
b (mod m) is solvable for integer-valued variables x1 , x2 , . . . , xn if and only if
gcd(a1 , a2 , . . . , an , m) | b.
37. Find the simultaneous solution of the congruences: 7x ≡ 8 (mod 9), x ≡
9 (mod 10), and 2x ≡ 3 (mod 11).
38. Compute all the solutions of the following congruences:
(a) x² + x + 1 ≡ 0 (mod 91).
(b) x² + x − 1 ≡ 0 (mod 121).
(c) x² + 5x + 24 ≡ 0 (mod 36).
(d) x^50 ≡ 10 (mod 101).

39. Compute all the simultaneous solutions of the congruences: 5x ≡ 3 (mod 47), and 3x² ≡ 5 (mod 49).
40. Let p be a prime.
(a) Show that

x^(p−1) − 1 ≡ (x − 1)(x − 2) · · · (x − (p − 1)) (mod p), and
x^p − x ≡ x(x − 1)(x − 2) · · · (x − (p − 1)) (mod p),

where f(x) ≡ g(x) (mod p) means that the coefficient of x^i in the polynomial f(x) is congruent modulo p to the coefficient of x^i in g(x) for all i ∈ N0.
(b) [Wilson’s theorem] Prove that (p − 1)! ≡ −1 (mod p).
(c) If m ∈ N is composite and > 4, prove that (m − 1)! ≡ 0 (mod m).
41. [Generalized Euler's theorem] Let m ∈ N, and a any integer (not necessarily coprime to m). Prove that a^m ≡ a^(m−φ(m)) (mod m).
42. Let σ(n) denote the sum of positive integral divisors of n ∈ N. Let n = pq
with two distinct primes p, q. Devise a polynomial-time algorithm to compute
p, q from the knowledge of n and σ(n).
43. (a) Let n = p²q with p, q distinct odd primes, p ∤ (q − 1) and q ∤ (p − 1). Prove that factoring n is polynomial-time equivalent to computing φ(n).
(b) Let n = p²q with p, q odd primes satisfying q = 2p + 1. Argue that one can factor n in polynomial time.
44. (a) Let m1 , m2 be coprime moduli, and let a1 , a2 ∈ Z. By the extended gcd
algorithm, one can compute integers u, v with um1 + vm2 = 1. Prove that x ≡
um1 a2 + vm2 a1 (mod m1 m2 ) is the simultaneous solution of the congruences
x ≡ ai (mod mi ) for i = 1, 2.
(b) Let m1 , m2 , . . . , mt be pairwise coprime moduli, and a1 , a2 , . . . , at ∈ Z.
Write an incremental procedure for the Chinese remainder theorem that starts
with the solution x ≡ a1 (mod m1 ) and then runs a loop, the i-th iteration of
which (for i = 2, 3, . . . , t in that order) computes the simultaneous solution of
x ≡ aj (mod mj ) for j = 1, 2, . . . , i.
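Part (b) can be sketched by folding the moduli in one at a time with the two-modulus formula of Part (a) (function names are my own):

```python
def crt_pair(a1, m1, a2, m2):
    """Solve x = a1 (mod m1), x = a2 (mod m2) for coprime m1, m2,
    via u*m1 + v*m2 = 1 from the extended Euclidean algorithm."""
    def ext_gcd(a, b):
        if b == 0:
            return a, 1, 0
        g, u, v = ext_gcd(b, a % b)
        return g, v, u - (a // b) * v
    g, u, v = ext_gcd(m1, m2)
    assert g == 1, "moduli must be coprime"
    # The formula of Part (a): x = u*m1*a2 + v*m2*a1 (mod m1*m2).
    return (u * m1 * a2 + v * m2 * a1) % (m1 * m2)

def crt(residues, moduli):
    """Incremental CRT over pairwise coprime moduli (Part (b))."""
    x, m = residues[0] % moduli[0], moduli[0]
    for a, mi in zip(residues[1:], moduli[1:]):
        x = crt_pair(x, m, a, mi)
        m *= mi
    return x
```

For example, crt([2, 3, 2], [3, 5, 7]) returns 23.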
45. [Generalized Chinese remainder theorem] Let m1, m2, . . . , mt be t moduli (not necessarily coprime to one another). Prove that the congruences x ≡ ai (mod mi) for i = 1, 2, . . . , t are simultaneously solvable if and only if gcd(mi, mj) | (ai − aj) for every pair (i, j) with i ≠ j. Show also that in this case the solution is unique modulo lcm(m1, m2, . . . , mt).
46. (a) Design an algorithm that, given moduli m1 , m2 and integers a1 , a2 with
gcd(m1 , m2 )|(a1 − a2 ), computes a simultaneous solution of the congruences
x ≡ ai (mod mi ) for i = 1, 2.
(b) Design an algorithm to implement the generalized CRT on t > 2 moduli.
47. [Theoretical foundation of the RSA cryptosystem] Let m = p1 p2 · · · pk be a product of k ≥ 2 distinct primes. Prove that the map Zm → Zm that takes a to a^e (mod m) is a bijection if and only if gcd(e, φ(m)) = 1. Describe the inverse of this exponentiation map.
48. Let m = pq be a product of two distinct known primes p, q. Assume that
q⁻¹ (mod p) is available. Suppose that we want to compute b ≡ a^e (mod m) for
a ∈ Z∗m and 0 ≤ e < φ(m). To that effect, we first compute ep = e rem (p − 1)
and eq = e rem (q − 1), and then the modular exponentiations bp ≡ a^ep (mod p)
and bq ≡ a^eq (mod q). Finally, we compute t ≡ q⁻¹(bp − bq) (mod p).
(a) Prove that b ≡ bq + tq (mod m).
(b) Suppose that p, q are both of bit sizes roughly half of that of m. Explain
how computing b in this method speeds up the exponentiation process. You
may assume classical (that is, schoolbook) arithmetic for the implementation
of products and Euclidean division.
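This CRT exponentiation can be sketched in a few lines of Python (illustrative only; Python 3.8+ is assumed for `pow(q, -1, p)`):

```python
def crt_powmod(a, e, p, q):
    # CRT-based computation of a^e mod pq, as in the exercise:
    # two half-size exponentiations, then a recombination step.
    m = p * q
    bp = pow(a, e % (p - 1), p)
    bq = pow(a, e % (q - 1), q)
    t = (pow(q, -1, p) * (bp - bq)) % p   # t ≡ q^(-1)(bp - bq) (mod p)
    return (bq + t * q) % m
```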
49. Let m ∈ N be odd and composite. Show that if there exists b ∈ Z∗m for
which b^(m−1) ≢ 1 (mod m), then at least half of the elements a ∈ Z∗m satisfy
a^(m−1) ≢ 1 (mod m).
50. Let m ∈ N, m > 1. Prove that the number of solutions of x^(m−1) ≡ 1 (mod m)
is ∏_{p|m} gcd(p − 1, m − 1), where the product is over the set of distinct prime
divisors p of m.
51. A composite number m is called a Carmichael number if a^(m−1) ≡ 1 (mod m)
for every a coprime to m. Show that m is a Carmichael number if and only if
m is square-free and (p − 1) | (m − 1) for every prime p | m.
(a) Let k ∈ N be such that p1 = 6k + 1, p2 = 12k + 1 and p3 = 18k + 1 are
all primes. Show that p1p2p3 is a Carmichael number.
(b) Show that there are no even Carmichael numbers.
(c) Show that a Carmichael number must be the product of at least three
distinct odd primes.
(d) Verify that 561 = 3 × 11 × 17, 41041 = 7 × 11 × 13 × 41, 825265 = 5 × 7 × 17 ×
19 × 73, and 321197185 = 5 × 19 × 23 × 29 × 37 × 137 are Carmichael numbers.
(Remark: These are the smallest Carmichael numbers having three, four, five
and six prime factors, respectively. R. D. Carmichael first established the existence
of Carmichael numbers in 1910, studied their properties, and conjectured that
there are infinitely many of them. This conjecture was proved by Alford,
Granville and Pomerance in 1994. All the Carmichael numbers < 10⁵ are 561,
1105, 1729, 2465, 2821, 6601, 8911, 10585, 15841, 29341, 41041, 46657, 52633,
62745, 63973, and 75361.)
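The square-free characterization above (Korselt's criterion) is easy to test by machine. A Python sketch with naive trial-division factoring (illustrative only):

```python
def is_carmichael(m):
    # Korselt's criterion: m composite, square-free, and
    # (p - 1) | (m - 1) for every prime p dividing m.
    if m < 3 or m % 2 == 0:       # Part (b): no even Carmichael numbers
        return False
    n, p, factors = m, 2, []
    while p * p <= n:
        if n % p == 0:
            factors.append(p)
            n //= p
            if n % p == 0:        # square factor: not square-free
                return False
        else:
            p += 1
    if n > 1:
        factors.append(n)
    if len(factors) < 3:          # Part (c): at least three prime factors
        return False
    return all((m - 1) % (p - 1) == 0 for p in factors)
```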
52. (a) Find all the solutions of the congruence 2x³ + x + 9 ≡ 0 (mod 11).
(b) Find all the solutions of the congruence 2x³ + x + 9 ≡ 0 (mod 121) using
Hensel’s lifting procedure.
53. In Section 1.5.1, we lifted solutions of polynomial congruences of the form
f(x) ≡ 0 (mod p^e) to the solutions of f(x) ≡ 0 (mod p^(e+1)). In this exercise,
we investigate lifting the solutions of f(x) ≡ 0 (mod p^e) to solutions of f(x) ≡
0 (mod p^(2e)), that is, the exponent in the modulus doubles every time (instead
of getting incremented by only 1).
Arithmetic of Integers 69

(a) Let f(x) ∈ Z[x], e ∈ N, and ξ a solution of f(x) ≡ 0 (mod p^e). Write
ξ′ = ξ + kp^e. Show how we can compute all values of k for which ξ′ satisfies
f(ξ′) ≡ 0 (mod p^(2e)).
(b) It is given that the only solution of 2x³ + 4x² + 3 ≡ 0 (mod 25) is
14 (mod 25). Using the lifting procedure of Part (a), compute all the solutions
of 2x³ + 4x² + 3 ≡ 0 (mod 625).
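For Part (a), writing f(ξ + kp^e) ≡ f(ξ) + kp^e f′(ξ) (mod p^(2e)) reduces the task to the linear congruence f(ξ)/p^e + k f′(ξ) ≡ 0 (mod p^e) in k. A Python sketch of this quadratic lifting step (illustrative, not the book's code; coefficients are listed from the leading term down):

```python
from math import gcd

def lift_double(f_coeffs, xi, p, e):
    # Lift a root xi of f mod p^e to all roots xi + k*p^e of f mod p^(2e),
    # using f(xi + k*p^e) ≡ f(xi) + k*p^e*f'(xi) (mod p^(2e)).
    def f(x, mod):
        r = 0
        for c in f_coeffs:                # Horner evaluation
            r = (r * x + c) % mod
        return r
    def fprime(x, mod):
        r, n = 0, len(f_coeffs) - 1
        for i, c in enumerate(f_coeffs[:-1]):
            r = (r + (n - i) * c * pow(x, n - i - 1, mod)) % mod
        return r
    pe, p2e = p**e, p**(2 * e)
    A = f(xi, p2e) // pe                  # f(xi) ≡ A * p^e (mod p^(2e))
    B = fprime(xi, pe)
    # Solve A + B*k ≡ 0 (mod p^e) for k.
    if gcd(B, pe) == 1:                   # simple root: unique lift
        k = (-A * pow(B, -1, pe)) % pe
        return [(xi + k * pe) % p2e]
    return [(xi + k * pe) % p2e for k in range(pe) if (A + B * k) % pe == 0]
```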
54. Find all the solutions of the following congruences:
(a) x² ≡ 71 (mod 713). (Remark: 713 = 23 × 31.)
(b) x² + x + 1 ≡ 0 (mod 437). (Remark: 437 = 19 × 23.)
55. Let m = p1^e1 · · · pr^er be the prime factorization of an odd modulus m. Also,
let a ∈ Z. Prove that the quadratic congruence x² ≡ a (mod m) has exactly
∏_{i=1}^{r} (1 + (a/pi)) solutions modulo m. In particular, if gcd(a, m) = 1, there
are either 0 or 2^r solutions.
56. Imitate the binary gcd algorithm for computing the Jacobi symbol (a/b).
57. Let p be a prime > 3. Prove that 3 is a quadratic residue modulo p if and only
if p ≡ ±1 (mod 12).
58. Let p be an odd prime and a ∈ Z∗p be a quadratic residue modulo p. Prove
the following assertions.
(a) If p ≡ 3 (mod 4), then a modular square-root of a is a^((p+1)/4) (mod p).
(b) If p ≡ 5 (mod 8), then a modular square-root of a is a^((p+3)/8) (mod p) if
a^((p−1)/4) ≡ 1 (mod p), or 2a · (4a)^((p−5)/8) (mod p) if a^((p−1)/4) ≡ −1 (mod p).
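The two formulas translate directly into code. A Python sketch (illustrative; assumes a is indeed a quadratic residue):

```python
def sqrt_mod(a, p):
    # Square roots modulo p for the two easy residue classes of p,
    # following the exponentiation formulas of the exercise.
    a %= p
    if p % 4 == 3:
        return pow(a, (p + 1) // 4, p)
    if p % 8 == 5:
        if pow(a, (p - 1) // 4, p) == 1:
            return pow(a, (p + 3) // 8, p)
        return (2 * a * pow(4 * a, (p - 5) // 8, p)) % p
    raise ValueError("p ≡ 1 (mod 8): use the Tonelli-Shanks algorithm")
```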
59. Let p be an odd prime, and (a/p) = 1. If p ≡ 3 (mod 4) or p ≡ 5 (mod 8), a
modular square-root of a can be obtained by performing some modular expo-
nentiation(s) as described in Exercise 1.58. If p ≡ 1 (mod 8), Algorithm 1.9
can be used to compute a square-root of a modulo p. The algorithm is, how-
ever, valid for any odd prime p.
Algorithm 1.9: Tonelli and Shanks algorithm for computing a
square-root of a ∈ Z∗p modulo a prime p

    If (a/p) = −1, return “failure.”
    Write p − 1 = 2^v q with q odd.
    Find any quadratic non-residue b modulo p.
    Compute g ≡ b^q (mod p), and x ≡ a^((q+1)/2) (mod p).
    While (1) {
        Find the smallest i ∈ {0, 1, . . . , v − 1} with (x²a⁻¹)^(2^i) ≡ 1 (mod p).
        If i is 0, return x.
        Set x ≡ x · g^(2^(v−i−1)) (mod p).
    }

(a) Prove the correctness of Algorithm 1.9.
(b) Under the assumption that a quadratic non-residue b modulo p can be
located by randomly trying O(1) elements of Z∗p, prove that Algorithm 1.9
runs in polynomial time. Justify this assumption.
(c) Convert Algorithm 1.9 to a deterministic polynomial-time algorithm as-
suming that Conjecture 1.74 is true.
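Algorithm 1.9 admits a direct transcription. A Python sketch (illustrative; the non-residue is found by sequential search rather than random trials, and Python 3.8+ is assumed for `pow(a, -1, p)`):

```python
def tonelli_shanks(a, p):
    # Square root of a modulo an odd prime p, per Algorithm 1.9.
    if pow(a, (p - 1) // 2, p) != 1:
        return None                      # (a/p) = -1: failure
    q, v = p - 1, 0
    while q % 2 == 0:                    # write p - 1 = 2^v * q, q odd
        q //= 2
        v += 1
    b = 2
    while pow(b, (p - 1) // 2, p) != p - 1:
        b += 1                           # any quadratic non-residue b
    g = pow(b, q, p)
    x = pow(a, (q + 1) // 2, p)
    a_inv = pow(a, -1, p)
    while True:
        t, i = (x * x * a_inv) % p, 0
        while pow(t, 2**i, p) != 1:
            i += 1                       # smallest i with (x^2 a^-1)^(2^i) = 1
        if i == 0:
            return x
        x = (x * pow(g, 2**(v - i - 1), p)) % p
```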
60. Let p be an odd prime. Prove that the congruence x² ≡ −1 (mod p) is solvable
if and only if p ≡ 1 (mod 4).
61. Let p be a prime.
(a) Let x be an integer not divisible by p. Show that there exist integers a, b
satisfying 0 < |a| < √p, 0 < |b| < √p, and a ≡ bx (mod p).
(b) Show that p can be expressed in the form a² + b² with a, b ∈ Z if and only
if p = 2 or p ≡ 1 (mod 4).
(c) Let m be a positive integer. Prove that m can be expressed as the sum
of two squares if and only if every prime divisor of m of the form 4k + 3 has
even multiplicity in m.
62. Let p be a prime of the form 4k + 1, and let a and b be positive integers with
a odd and a² + b² = p. Show that (a/p) = 1. (Hint: Use quadratic reciprocity.)
63. Let p be a prime.
(a) Prove that the congruence x² ≡ −2 (mod p) is solvable if and only if
p = 2 or p ≡ 1 or 3 (mod 8).
(b) Show that p can be expressed as a² + 2b² with a, b ∈ Z if and only if p = 2
or p ≡ 1 or 3 (mod 8).
64. Let p be a prime.
(a) Prove that the congruence x² ≡ −3 (mod p) is solvable if and only if
p = 3 or p ≡ 1 (mod 3).
(b) Show that p can be expressed as a² + 3b² with a, b ∈ Z if and only if p = 3
or p ≡ 1 (mod 3).
(Remark: For a prime p and an integer constant d, 1 ≤ d < p, a decompo-
sition p = a² + db² can be computed using the Cornacchia algorithm (1908).
One first obtains a square root of −d modulo p. If a square root does not
exist, the equation p = a² + db² does not have a solution. Otherwise, we run
the Euclidean gcd algorithm on p and this square root, and stop the gcd loop
as soon as a remainder a less than √p is obtained. If (p − a²)/d is an integer
square b², then (a, b) is the desired solution (see Algorithm 1.10).²⁶)
65. Kronecker extended Jacobi’s symbol to (a/b) for all integers a, b with b ≠ 0.
Write b = u·p1p2 · · · pt, where u ∈ {1, −1}, and pi are primes (not necessarily
all distinct). Define

(a/b) = (a/u) · ∏_{i=1}^{t} (a/pi).
²⁶A proof for the correctness of the Cornacchia algorithm is not very easy and can be
found, for example, in the paper J. M. Basilla, On the solution of x² + dy² = m, Proc. Japan
Acad., 80, Series A, 40–41, 2004.
Algorithm 1.10: Cornacchia algorithm for solving a² + db² = p

    Compute the Legendre symbol (−d/p).
    If (−d/p) = −1, return “failure.”
    Compute s with s² ≡ −d (mod p).
    If s ≤ p/2, replace s by p − s.
    Set x = p and y = s.
    While (y > ⌊√p⌋) {
        r = x rem y, x = y, y = r.
    }
    a = y.
    Set b = √((p − a²)/d).
    If b is an integer, return (a, b), else return “failure.”
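A Python sketch of Algorithm 1.10 (illustrative; for simplicity the square root of −d is found by brute force here, where a real implementation would use Algorithm 1.9):

```python
from math import isqrt

def cornacchia(d, p):
    # Solve a^2 + d*b^2 = p for a prime p and 1 <= d < p.
    s = next((x for x in range(1, p) if (x * x + d) % p == 0), None)
    if s is None:
        return None                  # -d is a non-residue: failure
    if s <= p // 2:
        s = p - s
    x, y = p, s
    while y > isqrt(p):
        x, y = y, x % y              # Euclidean remainder loop
    a = y
    rem = p - a * a
    if rem % d != 0:
        return None
    b = isqrt(rem // d)
    if a * a + d * b * b == p:       # b must be an exact integer square root
        return (a, b)
    return None
```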
This extension requires the following special cases:

    (a/1) = 1,
    (a/−1) = 1 if a > 0, and −1 if a < 0,
    (a/2) = 0 if a is even, and (2/|a|) if a is odd.
Describe an efficient algorithm to compute the Kronecker symbol.
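One such algorithm combines the special cases above with quadratic reciprocity on the odd parts, in the style of the binary gcd. A Python sketch (illustrative, not a reference implementation):

```python
def kronecker(a, b):
    # Kronecker symbol (a/b) for arbitrary integers a, b.
    if b == 0:
        return 1 if abs(a) == 1 else 0
    if a % 2 == 0 and b % 2 == 0:
        return 0                       # a common factor 2 gives 0
    result = 1
    if b < 0:
        b = -b
        if a < 0:
            result = -1                # (a/-1) = -1 for a < 0
    v = 0
    while b % 2 == 0:                  # peel factors (a/2) off b
        b //= 2
        v += 1
    if v % 2 == 1 and a % 8 in (3, 5):
        result = -result               # (a/2) = -1 iff a ≡ ±3 (mod 8)
    a %= b
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if b % 8 in (3, 5):
                result = -result       # factor (2/b)
        a, b = b, a                    # quadratic reciprocity
        if a % 4 == 3 and b % 4 == 3:
            result = -result
        a %= b
    return result if b == 1 else 0
```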
66. Let h = ordm a, k = ordm b, and l ∈ Z. Prove the following assertions.
(a) ordm(a^l) = h/gcd(h, l).
(b) If gcd(h, k) = 1, then ordm(ab) = hk.
(c) In general, ordm(ab) | lcm(h, k).
(d) There exist m, a, b for which ordm(ab) < lcm(h, k).
67. Let m be a modulus having a primitive root, and a ∈ Z∗m. Prove that a is a
primitive root modulo m if and only if a^(φ(m)/q) ≢ 1 (mod m) for every prime
divisor q of φ(m). Design an algorithm that, given a ∈ Z∗m and the prime
factorization of φ(m), determines whether a is a primitive root modulo m.
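Given φ(m) and its prime factors, the test is one line in Python (a sketch; the argument names are illustrative):

```python
def is_primitive_root(a, m, phi_m, prime_factors_of_phi):
    # a is a primitive root mod m iff a^(phi(m)/q) != 1 (mod m)
    # for every prime q dividing phi(m); gcd(a, m) = 1 is assumed.
    return all(pow(a, phi_m // q, m) != 1 for q in prime_factors_of_phi)
```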
68. Suppose that m is a modulus having a primitive root. Prove that Z∗m contains
exactly φ(φ(m)) primitive roots modulo m. In particular, a prime p has exactly
φ(p − 1) primitive roots.
69. Let g, g′ be two primitive roots modulo an odd prime p. Prove that:
(a) gg′ is not a primitive root modulo p.
(b) g^e (mod p) is a quadratic residue modulo p if and only if e is even.
70. Let p be an odd prime, a ∈ Z∗p, and e ∈ N. Prove that the multiplicative order
of 1 + ap modulo p^e is p^(e−1). (Remark: This result can be used to obtain
primitive roots modulo p^e.)
71. Expand the following irrational numbers as infinite simple continued fractions:
√2 − 1, 1/√3 and √15.
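For the √15 part, the standard integer recurrence for the partial quotients of √n can be sketched in Python (illustrative; the expansions of √2 − 1 and 1/√3 then follow from those of √2 and √3 by simple transformations):

```python
from math import isqrt

def cf_sqrt(n, terms):
    # Partial quotients of the simple continued fraction of sqrt(n),
    # n not a perfect square, via the recurrence
    # m_{k+1} = d_k a_k - m_k,  d_{k+1} = (n - m_{k+1}^2)/d_k,
    # a_{k+1} = floor((a_0 + m_{k+1})/d_{k+1}).
    a0 = isqrt(n)
    out, m, d, a = [a0], 0, 1, a0
    for _ in range(terms - 1):
        m = d * a - m
        d = (n - m * m) // d
        a = (a0 + m) // d
        out.append(a)
    return out
```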
72. Let hn/kn be the convergents to an irrational ξ, and let Fn, n ∈ N₀, denote
the Fibonacci numbers.
(a) Show that kn ≥ Fn+1 for all n ∈ N₀.
(b) Deduce that kn ≥ ⌊(1/√5)((1 + √5)/2)^(n+1)⌋ for all n ∈ N₀. (Remark: This
shows that the denominators in the convergents to an irrational number grow
quite rapidly (at least exponentially) in n.)
(c) Does there exist an irrational ξ for which kn = Fn+1 for all n ∈ N₀?
73. (a) Prove that the continued fraction ⟨a0, a1, . . . , an⟩ equals
Kn+1(a0, a1, . . . , an)/Kn(a1, a2, . . . , an),
where Kn is the n-th continuant polynomial (Exercise 1.15).
(b) Let hn/kn be the n-th convergent to an irrational number ξ with hn, kn
defined as in Section 1.8. Prove that hn = Kn+1(a0, a1, . . . , an) and kn =
Kn(a1, a2, . . . , an) for all n ≥ 0.
(c) Argue that gcd(hn, kn) = 1, that is, the fraction hn/kn is in lowest terms.
74. A real number of the form (a + √b)/c with a, b, c ∈ Z, c ≠ 0, and b ≥ 2 not a
perfect square, is called a quadratic irrational. An infinite simple continued
fraction ⟨a0, a1, a2, . . .⟩ is called periodic if there exist s ∈ N₀ and t ∈ N such
that an+t = an for all n ≥ s. One can rewrite a periodic continued fraction as
⟨a0, . . . , as−1, b0, . . . , bt−1⟩, where the bar over the block of terms b0, . . . , bt−1
indicates that this block is repeated ad infinitum. If s = 0, this continued
fraction can be written as ⟨b0, . . . , bt−1⟩ and is called purely periodic. Show
that a periodic simple continued fraction represents a quadratic irrational.
(Hint: First consider the case of purely periodic continued fractions, and
then adapt to the general case.)
75. Evaluate the periodic continued fractions h1, 2, 3, 4i and h1, 2, 3, 4i.
76. Prove that there are infinitely many solutions in positive integers of both the
equations x² − 2y² = 1 and x² − 2y² = −1. (Hint: Compute hn² − 2kn², where
hn/kn is the n-th convergent to √2.)
77. (a) Compute the infinite simple continued fraction expansion of √3.
(b) For all k ≥ 1, write ak + bk√3 = (2 + √3)^k with ak, bk integers. Prove
that for all n ≥ 0, the (2n + 1)-th convergent of √3 is r2n+1 = an+1/bn+1.
(Remark: ak, bk for k ≥ 1 constitute all the non-zero solutions of the Pell
equation a² − 3b² = 1. Proving this needs tools of algebraic number theory.)
78. (a) Compute the continued fraction expansion of √5.
(b) It is known that all the solutions of the Pell equation x² − 5y² = 1 with
x, y > 0 are of the form x = hn and y = kn, where hn/kn is a convergent
to √5. Find the solution of the Pell equation x² − 5y² = 1 with the smallest
possible y > 0.
(c) Let (a, b) denote the smallest solution obtained in Part (b). Define the
sequence of pairs (xn, yn) of positive integers recursively as follows:

    (x0, y0) = (a, b), and
    (xn, yn) = (axn−1 + 5byn−1, bxn−1 + ayn−1) for n ≥ 1.

Prove that each (xn, yn) is a solution of x² − 5y² = 1. (In particular, there are
infinitely many solutions in positive integers of the equation x² − 5y² = 1.)
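The recursion of Part (c) can be sketched in Python (illustrative; it assumes, as one checks from the convergents of √5, that the smallest positive solution is (a, b) = (9, 4)):

```python
def pell_solutions(count):
    # Generate solutions of x^2 - 5y^2 = 1 from the smallest one (9, 4)
    # via the recursion (x, y) -> (a*x + 5*b*y, b*x + a*y).
    a, b = 9, 4
    x, y = a, b
    out = [(x, y)]
    for _ in range(count - 1):
        x, y = a * x + 5 * b * y, b * x + a * y
        out.append((x, y))
    return out
```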
79. Propose a polynomial-time algorithm that, given t ∈ N, returns a prime of bit-
length t. You may assume that you have at your disposal a polynomial-time
algorithm for proving the primality or otherwise of an integer.
80. By Exercise 1.12(a), the number of iterations in the computation of the
Euclidean gcd of a and b is O(lg max(a, b)). Since each Euclidean division done
in the algorithm can take as much as Θ(lg² max(a, b)) time, the running time
of Euclidean gcd is O(lg³ max(a, b)). This, however, turns out to be a gross
overestimate. Prove that Euclidean gcd runs in O(lg² max(a, b)) time.
Programming Exercises

Use the GP/PARI calculator to solve the following problems.
81. For n ∈ N denote by S7(n) the sum of the digits of n expanded in base 7.
We investigate those primes p for which S7(p) is composite. It turns out that
for small values of p, most of the values S7(p) are also prime. Write a GP/PARI
program that determines all primes p ≤ 10⁶ for which S7(p) is composite. Pro-
vide a theoretical argument justifying the scarcity of small primes p for which
S7(p) is composite.
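The exercise asks for GP/PARI; for illustration, the same search in Python (a sketch, with a small bound). Note that 7 ≡ 1 (mod 6) gives S7(n) ≡ n (mod 6), so for a prime p > 3 the value S7(p) is coprime to 6 and the smallest composite value it can take is 25, which already explains the scarcity:

```python
def digit_sum_base7(n):
    s = 0
    while n:
        s += n % 7
        n //= 7
    return s

def is_prime(n):
    # Naive trial division, adequate for small bounds.
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

def primes_with_composite_S7(bound):
    # Primes p <= bound whose base-7 digit sum S7(p) is composite.
    return [p for p in range(2, bound + 1)
            if is_prime(p) and digit_sum_base7(p) > 1
            and not is_prime(digit_sum_base7(p))]
```

Since a number below 7⁴ = 2401 has at most four base-7 digits, S7(p) ≤ 24 < 25 there, so the search below that bound must come up empty.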
82. Let B be a positive integral bound. Write a GP/PARI program that locates all
pairs a, b of positive integers with 1 ≤ a ≤ b ≤ B, for which (a² + b²)/(ab + 1)
is an integer. Can you detect a pattern in these integer values of the expression
(a² + b²)/(ab + 1)? Try to prove your guess.
83. Let B be a positive integral bound. Write a GP/PARI program that locates all
pairs a, b of positive integers with 1 ≤ a ≤ b ≤ B and ab > 1, for which
(a² + b²)/(ab − 1) is an integer. Can you detect a pattern in these integer
values of the expression (a² + b²)/(ab − 1)? Try to prove your guess.
84. It can be proved that given any a ∈ N, there exists an exponent e ∈ N for
which the decimal expansion of 2^e starts with a (at the most significant end).
For example, if a = 7, the smallest exponent e with this property is e = 46.
Indeed, 2⁴⁶ = 70368744177664. Write a GP/PARI program that, given a, finds
the smallest exponent e with the above property. Using the program, compute
the value of this exponent e for a = 2013.
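In Python rather than GP/PARI, a brute-force sketch of this search (illustrative only):

```python
def smallest_exponent(a):
    # Smallest e >= 1 such that the decimal expansion of 2^e starts
    # with the decimal expansion of a.
    s, e, power = str(a), 1, 2
    while not str(power).startswith(s):
        e += 1
        power *= 2
    return e
```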
Chapter 2

Arithmetic of Finite Fields
2.1 Existence and Uniqueness of Finite Fields
2.2 Representation of Finite Fields
    2.2.1 Polynomial-Basis Representation
    2.2.2 Working with Finite Fields in GP/PARI
    2.2.3 Choice of the Defining Polynomial
2.3 Implementation of Finite Field Arithmetic
    2.3.1 Representation of Elements
    2.3.2 Polynomial Arithmetic
        2.3.2.1 Addition and Subtraction
        2.3.2.2 Multiplication
        2.3.2.3 Comb Methods
        2.3.2.4 Windowed Comb Methods
        2.3.2.5 Modular Reduction
    2.3.3 Polynomial GCD and Inverse
        2.3.3.1 Euclidean Inverse
        2.3.3.2 Binary Inverse
        2.3.3.3 Almost Inverse
2.4 Some Properties of Finite Fields
    2.4.1 Fermat’s Little Theorem for Finite Fields
    2.4.2 Multiplicative Orders of Elements in Finite Fields
    2.4.3 Normal Elements
    2.4.4 Minimal Polynomials
    2.4.5 Implementing Some Functions in GP/PARI
2.5 Alternative Representations of Finite Fields
    2.5.1 Representation with Respect to Arbitrary Bases
    2.5.2 Normal and Optimal Normal Bases
    2.5.3 Discrete-Log Representation
    2.5.4 Representation with Towers of Extensions
2.6 Computing Isomorphisms among Representations
Exercises
A field is a commutative ring with identity, in which every non-zero element is
invertible. More informally, a field F is a set with two commutative operations
(addition and multiplication), in which one can add, subtract, and multiply
any two elements, divide any element by any non-zero element, and the mul-
tiplication operation distributes over addition. It is a convention to disregard
the zero ring (the ring with 0 = 1) as a field.
Common examples of fields are Q (the field of rational numbers), R (the
field of real numbers), and C (the field of complex numbers). On the other
hand, Z (the ring of integers) is not a field, because the only elements of Z
that have inverses (in Z itself) are ±1. The fields Q, R, C are all infinite, since
they contain infinitely many elements.
Definition 2.1 A field containing only finitely many elements is called a finite
field or a Galois field.1 ⊳

Finite fields possess many properties (occasionally counter-intuitive) that
the common infinite fields mentioned above do not. It is important in number
theory and algebra to study these (and other) properties of finite fields. Lidl
and Niederreiter2 provide a comprehensive mathematical treatment of finite
fields. In this chapter, we study some of these mathematical results along with
computational issues associated with them.

The first counter-intuitive notion about finite fields is that they have pos-
itive characteristics as explained below.
If R is a ring with identity 1R = 1, then to every m ∈ Z we associate a
unique element mR of R. If m = 0, we take 0R = 0 (the additive identity). If
m > 0, we take mR = 1 + 1 + · · · + 1 (m times). Finally, if m < 0, we take
mR = −(−m)R . It is customary to denote the element mR simply by m.

Definition 2.2 Let R be a ring with multiplicative identity 1. The smallest
positive integer m for which m = mR = 1 + 1 + · · · + 1 (m times) = 0 is called
the characteristic of R, denoted char R. If no such positive integer m exists,
we take char R = 0. ⊳

The elements 1, 2, 3, . . . of a finite field F cannot be all distinct, that is,


there exist positive integers m1 , m2 with m1 < m2 such that m1 = m2 . But
then, m = m2 −m1 = 0 implying that finite fields have positive characteristics.

Proposition 2.3 The characteristic of a finite field is prime.
Proof Suppose not, that is, char F = m = uv with 1 < u, v < m for some
finite field F . The integers m, u, v are identified with the elements mF , uF , vF
of F . By the distributivity property, we have 0 = mF = uF vF . By definition,
1 Finite fields are called Galois fields after the French mathematician Évariste Galois

(1811–1832). Galois’s seminal work at the age of twenty solved a contemporary open math-
ematical problem that states that univariate polynomial equations cannot, in general, be
solved by radicals unless the degree of the polynomial is less than five. Galois’s work has
multiple ramifications in modern mathematics. In addition to the theory of finite fields,
Galois introduced Galois theory and a formal treatment of group theory. Galois died at
the age of twenty of a bullet injury sustained in a duel. Although Galois’s paper did not
receive immediate acceptance by mathematicians, Joseph Liouville (1809–1882) eventually
understood its importance and was instrumental in publishing the article in the Journal de
Mathématiques Pures et Appliquées in 1846.
2 Rudolf Lidl and Harald Niederreiter, Introduction to finite fields and their applications,

Cambridge University Press, 1994.
uF and vF are non-zero (u, v are smaller than m). It is easy to argue that a field
is an integral domain, that is, the product of two non-zero elements cannot
be 0. Thus, m cannot admit a factorization as assumed above. Moreover, if
m = 1, then F is the zero ring (not a field by definition). ⊳
The simplest examples of finite fields are the rings Zp with p prime. (Every
element of Zp \ {0} is invertible. Other field properties are trivially valid.)
Let F be a finite field of size q and characteristic p. It is easy to verify
that F contains Zp as a subfield. (Imagine the way Z and Q are embedded in
fields like R and C.) Thus, F is an extension (see below) of Zp . From algebra
it follows that F is a finite-dimensional vector space over Zp, that is, q = p^n,
where n is the dimension of F over Zp.

Proposition 2.4 Every finite field is of size p^n for some p ∈ P and n ∈ N. ⊳

The converse of this is also true (although I will not prove it here).

Proposition 2.5 For every p ∈ P and n ∈ N, there exists a finite field with
exactly p^n elements. ⊳
Let F, F′ be two finite fields of the same size q = p^n. Both F and F′
are extensions of Zp . It can be proved (not very easily) that there exists an
isomorphism ϕ : F → F ′ of fields, that fixes the subfield Zp element-wise.
This result implies that any two finite fields of the same size follow the same
arithmetic. In view of this, it is customary to talk about the finite field of size
q (instead of a finite field of size q).
Definition 2.6 The finite field of size q = p^n is denoted by Fq = F_{p^n}. If q
itself is prime (corresponding to n = 1), the field Fq = Fp is called a prime
field. If n > 1, we call Fq an extension field. An alternative notation for Fq is
GF(q) (Galois field of size q). ⊳
For a prime p, the two notations Fp and Zp stand for the same algebraic
object. However, when q = p^n with n > 1, the notations Fq and Zq refer to
two different rings. They exhibit different arithmetic. Fq is a field, and so every
non-zero element of it is invertible. On the other hand, Zq is not a field (nor
even an integral domain). Indeed φ(p^n) = p^(n−1)(p − 1), that is, Zq contains
p^(n−1) − 1 > 0 non-zero non-invertible elements, namely p, 2p, . . . , (p^(n−1) − 1)p.
Throughout the rest of this chapter, we take p to be a prime and q = p^n
for some n ∈ N.

2.2 Representation of Finite Fields

The prime field Fp is the same as Zp , that is, the arithmetic of Fp can be
carried out as the integer arithmetic modulo the prime p. Chapter 1 already
deals with this modular arithmetic. Here, we concentrate on extension fields.

2.2.1 Polynomial-Basis Representation

Let us recall the process of defining C as an extension of R. We know that
the polynomial x2 + 1 with real coefficients has no real roots, and so is an
irreducible polynomial over R. Let us imagine that i is a root of x2 + 1. Since
i ∈/ R, we need to introduce a structure C strictly bigger than R. We want
this bigger structure C to be a field again and contain both R and i.
C must be closed under arithmetic operations. To start with, we restrict to
addition, subtraction and multiplication. A general element of a set containing
R and i, and closed under these three operations must be of the form t(i) for
some polynomial t(x) ∈ R[x]. Since i is a root of x2 + 1, we have i 2 = −1,
i 3 = −i, i 4 = 1, i 5 = i, and so on. This implies that the polynomial t(x) may
have degree > 2, but the element t(i) can be simplified as u + iv for some real
numbers u, v. In short, C must contain all elements of the form u + iv.
However, C would be a field, and so every non-zero element of it must be
invertible. Take an element of ³ the form
´ x³+ iy with´ real numbers x, y not both
x −y
zero. We have (x + iy)−1 = x2 +y 2 + i x2 +y 2 , that is, the inverse of x + iy
can again be represented in the form u + iv. That is, polynomial expressions
t(i) of degrees < 2 suffice for making C a field.
We say that C is obtained by adjoining to R a root i of the irreducible
polynomial x2 + 1, and denote this as C = R(i). An analogous construction
applies to any field. Let F = Fp , and let f (x) ∈ F [x] be an irreducible poly-
nomial of degree n > 2. In order that the construction works, there must exist
such an irreducible polynomial f (x) of degree n. We fail to extend C using the
above construction, since any non-constant polynomial in C[x] has a root in
C (the fundamental theorem of algebra). However, for every p ∈ P and n ∈ N,
there exists (at least) one irreducible polynomial of degree n in Fp [x].
Imagine that θ is a root of f(x) in a (smallest) field K containing F = Fp.
Then, every polynomial expression t(θ) with t(x) ∈ Fp[x] must also reside
in K. We have f(θ) = 0 and deg f = n. Thus, θ^n can be expressed as an
F-linear combination of 1, θ, θ², . . . , θ^(n−1). But then, θ^(n+1) = θ × θ^n can also
be so expressed. More generally, θ^k for all k ≥ n can be expressed as F-linear
combinations of 1, θ, θ², . . . , θ^(n−1). Consequently, even if deg t(x) ≥ n, we can
express t(θ) as a polynomial expression (in θ) of degree < n.
Now, let us take any non-zero element t(θ) of degree < n. Since f (x) is
irreducible of degree n, we have gcd(t(x), f (x)) = 1. Therefore, by Bézout’s
theorem for polynomials, there exist u(x), v(x) ∈ F [x] such that u(x)t(x) +
v(x)f (x) = 1. Since f (θ) = 0, we have u(θ)t(θ) = 1, that is, t(θ)−1 = u(θ). If
u(x) is of degree > n, we reduce u(θ) to a polynomial in θ of degree < n. It
therefore follows that K needs to contain only the polynomial expressions of
the form t(θ) with t(x) ∈ F [x] and with deg t(x) < n.
Let us finally take two polynomials s(θ), t(θ) each of degree < n. The
polynomial r(x) = s(x) − t(x) is also of degree < n. Assume that r(x) ≠ 0 but
r(θ) = 0. But then, θ is a root of both r(x) and f (x). Since deg r(x) < n and
f (x) is irreducible, we have gcd(r(x), f (x)) = 1, that is, θ is a root of 1, an
Arithmetic of Finite Fields 79

absurdity. It follows that r(θ) = 0 if and only if r(x) = 0, that is, s(x) = t(x).
This implies that different polynomials s(x), t(x) of degrees < n correspond
to different elements s(θ), t(θ) of K. Thus, K can be represented by the set
K = {t(θ) | t(x) ∈ F [x], deg t(x) < n}.
A polynomial t(x) of this form has n coefficients (those of 1, x, x², . . . , x^(n−1)),
and each of these coefficients can assume any of the p values from F = Fp.
Consequently, the size of K is p^n, that is, K is a concrete realization of the field
Fq = F_{p^n}. This representation is called the polynomial-basis representation of
Fq over Fp, because each element of K is an Fp-linear combination of the
polynomial basis 1, θ, θ², . . . , θ^(n−1). We denote this as K = F(θ).
Fq is an n-dimensional vector space over Fp. Any set of n elements θ0, θ1,
. . . , θn−1 constitutes an Fp-basis of Fq if and only if these elements are linearly
independent over Fp. The elements 1, θ, θ², . . . , θ^(n−1) form such a basis.
To sum up, an irreducible polynomial f(x) of degree n in Fp[x] is needed to
represent the extension Fq = F_{p^n}. Let s(θ), t(θ) be two elements of Fq, where

    s(x) = a0 + a1x + a2x² + · · · + an−1x^(n−1),
    t(x) = b0 + b1x + b2x² + · · · + bn−1x^(n−1).

Arithmetic operations on these elements are defined as follows.

    s(θ) + t(θ) = (a0 + b0) + (a1 + b1)θ + (a2 + b2)θ² + · · · + (an−1 + bn−1)θ^(n−1),
    s(θ) − t(θ) = (a0 − b0) + (a1 − b1)θ + (a2 − b2)θ² + · · · + (an−1 − bn−1)θ^(n−1),
    s(θ)t(θ) = r(θ), where r(x) = (s(x)t(x)) rem f(x),
    s(θ)⁻¹ = u(θ), where u(x)s(x) + v(x)f(x) = 1 (provided that s(θ) ≠ 0).
Addition and subtraction in this representation of Fq do not require the irre-
ducible polynomial f (x), but multiplication and division do. A more detailed
implementation-level description of these operations follows in Section 2.3.
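The multiplication rule above is easy to prototype. A minimal Python sketch (illustrative only; coefficient lists run from the constant term upward, and f is the monic defining polynomial):

```python
def poly_mul_mod(s, t, f, p):
    # Compute s(θ)t(θ) in Fp(θ) = Fp[x]/(f(x)): multiply the polynomials
    # over Fp, then reduce modulo the defining polynomial f.
    n = len(f) - 1
    prod = [0] * (len(s) + len(t) - 1)
    for i, si in enumerate(s):
        for j, tj in enumerate(t):
            prod[i + j] = (prod[i + j] + si * tj) % p
    # Reduce: for each term c*x^k with k >= n, replace x^n by
    # -(f(x) - x^n), i.e., subtract c*x^(k-n)*f(x).
    for k in range(len(prod) - 1, n - 1, -1):
        c = prod[k]
        prod[k] = 0
        for i in range(n):
            prod[k - n + i] = (prod[k - n + i] - c * f[i]) % p
    return prod[:n] + [0] * (n - len(prod))
```

For instance, in F8 defined by x³ + x² + 1, this yields (θ² + θ)(θ² + θ + 1) = θ² + 1, and in F9 defined by x² + 1, (θ + 2)(2θ + 1) = 2θ.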
Example 2.7 (1) Let us look at the polynomial-basis representation of F4 =
F_{2^2}. The polynomials of degree two in F2[x] are x², x² + x, x² + 1, x² + x + 1.
The first two in this list are clearly reducible. Also, x² + 1 ≡ (x + 1)² (mod 2) is
reducible. The polynomial x² + x + 1 is irreducible. So we take f(x) = x² + x + 1
as the defining polynomial, and represent
F4 = F2(θ) = {a1θ + a0 | a1, a0 ∈ {0, 1}}, where θ² + θ + 1 = 0.
The elements of F4 are, therefore, 0, 1, θ, θ + 1. The addition and multiplication
tables for F4 are given below.

Addition in F4:
    +     | 0     1     θ     θ+1
    0     | 0     1     θ     θ+1
    1     | 1     0     θ+1   θ
    θ     | θ     θ+1   0     1
    θ+1   | θ+1   θ     1     0

Multiplication in F4:
    ×     | 0     1     θ     θ+1
    0     | 0     0     0     0
    1     | 0     1     θ     θ+1
    θ     | 0     θ     θ+1   1
    θ+1   | 0     θ+1   1     θ
Consider the elements θ and θ + 1. Their sum is θ + θ + 1 = 2θ + 1 = 1 modulo
2, whereas their product is θ(θ + 1) = θ² + θ = (θ² + θ + 1) + 1 = 1. In any
ring of characteristic two, subtraction is the same as addition (since −1 = 1).
(2) Let us now represent F8 = F_{2^3}. First, we need an irreducible polyno-
mial in F2[x] of degree three. The polynomials of degree three that split into
linear factors are x³, x²(x + 1) = x³ + x², x(x + 1)² = x(x² + 1) = x³ + x,
and (x + 1)³ = x³ + x² + x + 1. On the other hand, the polynomials of degree
three that factor into one linear factor and one quadratic irreducible factor
are x(x² + x + 1) = x³ + x² + x, and (x + 1)(x² + x + 1) = x³ + 1. This
leaves us with only two irreducible polynomials of degree three: x³ + x + 1 and
x³ + x² + 1. Let us take the defining polynomial f(x) = x³ + x² + 1 so that
F8 = F2(θ) = {a2θ² + a1θ + a0 | a2, a1, a0 ∈ {0, 1}},
where θ³ + θ² + 1 = 0. The addition table for F8 follows now.
    +        | 0        1        θ        θ+1      θ²       θ²+1     θ²+θ     θ²+θ+1
    0        | 0        1        θ        θ+1      θ²       θ²+1     θ²+θ     θ²+θ+1
    1        | 1        0        θ+1      θ        θ²+1     θ²       θ²+θ+1   θ²+θ
    θ        | θ        θ+1      0        1        θ²+θ     θ²+θ+1   θ²       θ²+1
    θ+1      | θ+1      θ        1        0        θ²+θ+1   θ²+θ     θ²+1     θ²
    θ²       | θ²       θ²+1     θ²+θ     θ²+θ+1   0        1        θ        θ+1
    θ²+1     | θ²+1     θ²       θ²+θ+1   θ²+θ     1        0        θ+1      θ
    θ²+θ     | θ²+θ     θ²+θ+1   θ²       θ²+1     θ        θ+1      0        1
    θ²+θ+1   | θ²+θ+1   θ²+θ     θ²+1     θ²       θ+1      θ        1        0
Multiplication in F8 involves multiplying two elements of F8 as polynomials
over F2. If the product has degree three or more, one uses the equation θ³ =
θ² + 1 repeatedly, in order to reduce the product to a polynomial of degree
less than three. This leads to the following multiplication table.
    ×        | 0   1        θ        θ+1      θ²       θ²+1     θ²+θ     θ²+θ+1
    0        | 0   0        0        0        0        0        0        0
    1        | 0   1        θ        θ+1      θ²       θ²+1     θ²+θ     θ²+θ+1
    θ        | 0   θ        θ²       θ²+θ     θ²+1     θ²+θ+1   1        θ+1
    θ+1      | 0   θ+1      θ²+θ     θ²+1     1        θ        θ²+θ+1   θ²
    θ²       | 0   θ²       θ²+1     1        θ²+θ+1   θ+1      θ        θ²+θ
    θ²+1     | 0   θ²+1     θ²+θ+1   θ        θ+1      θ²+θ     θ²       1
    θ²+θ     | 0   θ²+θ     1        θ²+θ+1   θ        θ²       θ+1      θ²+1
    θ²+θ+1   | 0   θ²+θ+1   θ+1      θ²       θ²+θ     1        θ²+1     θ

As a specific example, let us multiply θ² + θ with θ² + θ + 1. The product
of the polynomials over F2 is (θ² + θ)(θ² + θ + 1) = θ⁴ + 2θ³ + 2θ² + θ = θ⁴ + θ.
Now, we use the fact that θ³ + θ² + 1 = 0 in order to obtain θ⁴ + θ =
(θ⁴ + θ³ + θ) + θ³ = θ(θ³ + θ² + 1) + θ³ = θ³ = (θ³ + θ² + 1) + θ² + 1 = θ² + 1.
Let us finally compute (θ² + θ)⁻¹ in this representation of F8. Obviously,
we can locate the desired inverse by looking at the above multiplication table.
However, if q is large, the entire multiplication table for Fq cannot be computed
or stored, and one should resort to computing a Bézout relation involving the
polynomial to be inverted and the defining polynomial. In this example, we
have x(x² + x) + 1 · (x³ + x² + 1) = 1. Substituting x = θ gives (θ² + θ)⁻¹ = θ.
Arithmetic of Finite Fields 81

(3) Rijndael, accepted as the Advanced Encryption Standard (AES), is a
cryptographic cipher whose operations are based on the arithmetic of the field
F256 = F28 represented by the irreducible polynomial x8 + x4 + x3 + x + 1.
(4) Let us now look at a finite field of characteristic larger than two, namely
the field F9 = F32 of characteristic three. Since 2 is a quadratic non-residue
modulo 3, the polynomial x2 − 2 is irreducible in F3 [x]. But −2 ≡ 1 (mod 3),
so we may take f (x) = x2 + 1 as the defining polynomial. In analogy with the
complex numbers, we then have the following representation of F9 .
F9 = F3 (θ) = {a1 θ + a0 | a1 , a0 ∈ {0, 1, 2}}, where θ2 + 1 = 0.
We could have also taken 2(x2 + 1) as the defining polynomial. However, it
often turns out to be convenient to take a monic3 irreducible polynomial as
the defining polynomial. The addition table for F9 is as follows.

0 1 2 θ θ+1 θ+2 2θ 2θ + 1 2θ + 2
0 0 1 2 θ θ+1 θ+2 2θ 2θ + 1 2θ + 2
1 1 2 0 θ+1 θ+2 θ 2θ + 1 2θ + 2 2θ
2 2 0 1 θ+2 θ θ+1 2θ + 2 2θ 2θ + 1
θ θ θ+1 θ+2 2θ 2θ + 1 2θ + 2 0 1 2
θ+1 θ+1 θ+2 θ 2θ + 1 2θ + 2 2θ 1 2 0
θ+2 θ+2 θ θ+1 2θ + 2 2θ 2θ + 1 2 0 1
2θ 2θ 2θ + 1 2θ + 2 0 1 2 θ θ+1 θ+2
2θ + 1 2θ + 1 2θ + 2 2θ 1 2 0 θ+1 θ+2 θ
2θ + 2 2θ + 2 2θ 2θ + 1 2 0 1 θ+2 θ θ+1

The multiplication table for F9 is as follows.

0 1 2 θ θ+1 θ+2 2θ 2θ + 1 2θ + 2
0 0 0 0 0 0 0 0 0 0
1 0 1 2 θ θ+1 θ+2 2θ 2θ + 1 2θ + 2
2 0 2 1 2θ 2θ + 2 2θ + 1 θ θ+2 θ+1
θ 0 θ 2θ 2 θ+2 2θ + 2 1 θ+1 2θ + 1
θ+1 0 θ+1 2θ + 2 θ+2 2θ 1 2θ + 1 2 θ
θ+2 0 θ+2 2θ + 1 2θ + 2 1 θ θ+1 2θ 2
2θ 0 2θ θ 1 2θ + 1 θ+1 2 2θ + 2 θ+2
2θ + 1 0 2θ + 1 θ+2 θ+1 2 2θ 2θ + 2 θ 1
2θ + 2 0 2θ + 2 θ+1 2θ + 1 θ 2 θ+2 1 2θ

As a specific example, consider the product (θ + 2)(2θ + 1) = 2θ2 + 5θ + 2 =
2θ2 + 2θ + 2 = 2(θ2 + 1) + 2θ = 2θ. ¤

2.2.2 Working with Finite Fields in GP/PARI


GP/PARI supports arithmetic over finite fields. The arithmetic of the prime
field Fp is the modular arithmetic of Zp and was discussed earlier. Let us focus
on an extension field Fq = Fpn . This involves modular arithmetic of two types.
First, all polynomial coefficients are reduced modulo p, that is, the coefficient
arithmetic is the modular arithmetic of Zp . Second, the arithmetic of Fq is
the polynomial arithmetic of Zp [x] modulo the defining polynomial f (x).

3 The coefficient of the non-zero term of the highest degree in a non-zero polynomial f (x)
is called the leading coefficient of the polynomial, denoted lc f (x). If lc f (x) = 1, we call
f (x) monic. If f (x) is any non-zero polynomial over a field F with a = lc f (x), multiplying
f (x) by a−1 ∈ F gives a monic polynomial.
82 Computational Number Theory
As an example, let us represent F8 as in Example 2.7(2). First, we fix a
defining polynomial.

gp > f = Mod(1,2)*x^3+Mod(1,2)*x^2+Mod(1,2)
%1 = Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2)

Next, we take two elements of F8 as two polynomials in F2 [x] modulo f (x).

gp > a = Mod(Mod(1,2)*x^2+Mod(1,2)*x, f)
%2 = Mod(Mod(1, 2)*x^2 + Mod(1, 2)*x, Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > b = Mod(Mod(1,2)*x^2+Mod(1,2)*x+Mod(1,2), f)
%3 = Mod(Mod(1, 2)*x^2 + Mod(1, 2)*x + Mod(1, 2), Mod(1, 2)*x^3 + Mod(1, 2)*x^2
+ Mod(1, 2))

Now, we can carry out arithmetic operations on a and b.

gp > a + b
%4 = Mod(Mod(1, 2), Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > a * b
%5 = Mod(Mod(1, 2)*x^2 + Mod(1, 2), Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > a^(-1)
%6 = Mod(Mod(1, 2)*x, Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > a / b
%7 = Mod(Mod(1, 2)*x^2, Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > a^4
%8 = Mod(Mod(1, 2)*x^2 + Mod(1, 2), Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))

The fact that the inverse a−1 is correctly computed can be verified by
invoking the extended gcd function on polynomials.

gp > bezout(Mod(1,2)*x^2+Mod(1,2)*x,f)
%9 = [Mod(1, 2)*x, Mod(1, 2), Mod(1, 2)]

The expressions handled by GP/PARI may appear a bit clumsy. But if one
looks closely at these expressions, the exact structure of the elements becomes
absolutely clear. Our simpler (and more compact) mathematical notations are
meaningful only under the assumption that certain symbols are implicitly
understood from the context (for example, that θ is a root of the defining polynomial f (x)).
Given that GP/PARI provides only a text-based interface and supports a
variety of objects, this explicit and verbose representation seems unavoidable
(perhaps undesirable too). However, if you insist that you do not want to see
the defining polynomial, you can lift() a field element. If, in addition, you
do not want the Mod’s in the coefficients, apply another lift(). But lift()
destroys information, and should be used with caution.

gp > c = lift(a * b)
%10 = Mod(1, 2)*x^2 + Mod(1, 2)
gp > d = lift(c)
%11 = x^2 + 1
gp > c^4
%12 = Mod(1, 2)*x^8 + Mod(1, 2)
gp > d^4
%13 = x^8 + 4*x^6 + 6*x^4 + 4*x^2 + 1
gp > e = a * b
%14 = Mod(Mod(1, 2)*x^2 + Mod(1, 2), Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > e^4
%15 = Mod(Mod(1, 2)*x + Mod(1, 2), Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2))
gp > lift(e^4)
%16 = Mod(1, 2)*x + Mod(1, 2)
gp > lift(lift(e^4))
%17 = x + 1

2.2.3 Choice of the Defining Polynomial


The irreducible polynomial f (x) ∈ Fp [x] used to define the extension Fpn
has a bearing on the running time of the arithmetic routines of Fpn . A random
irreducible polynomial in Fp [x] of a given degree n can be obtained using the
procedure described in Section 3.1. This procedure is expected to produce
dense irreducible polynomials (polynomials with Θ(n) non-zero coefficients).
Multiplication in Fpn is a multiplication in Fp [x] of two polynomials of
degrees < n, followed by reduction modulo f (x). The reduction step involves
a long division of a polynomial of degree ≤ 2n − 2 by a polynomial of degree
n. A straightforward implementation of this division may take Θ(n2 ) time.
If f (x) is sparse (that is, has only a few non-zero coefficients), this reduc-
tion step can be significantly more efficient than the case of a dense f (x),
because for a sparse f (x), only a few coefficients need to be adjusted in each
step of polynomial division. In view of this, irreducible binomials, trinomials,
quadrinomials and pentanomials (that is, polynomials with exactly two, three,
four and five non-zero terms) in Fp [x] are often of importance to us. They lead
to a running time of the order O(n) for the reduction step.
An irreducible binomial in Fp [x] must be of the form xn + a with a ∈ F∗p .
We can characterize all values of a for which this polynomial is irreducible.

Theorem 2.8 The binomial xn + a ∈ Fp [x] is irreducible if and only if both
the following conditions are satisfied:
(1) Every prime factor of n must divide ordp (−a), but not (p − 1)/ ordp (−a).
(2) If n ≡ 0 (mod 4), then p ≡ 1 (mod 4). ⊳

These two conditions are somewhat too restrictive. For example, we do
not have any irreducible binomials of degree 4k over a prime field of size
4l + 3. It is, therefore, advisable to study irreducible trinomials xn + axk + b
with 1 ≤ k ≤ n − 1 and a, b ∈ F∗p . Complete characterizations of irreducible
trinomials over all prime fields Fp are not known. Some partial results are,
however, available. For example, the following result is useful when p ≫ n.

Theorem 2.9 The number of irreducible trinomials in Fp [x] of the form xn +
x + b (with b ∈ F∗p ) is asymptotically equal to p/n. ⊳

This result indicates that after trying O(n) random values of b, we expect
to obtain an irreducible trinomial of the form xn + x + b. The choice k = 1 in
Theorem 2.9 is particularly conducive to efficient implementations. However,
if p is small, there are not many choices for b for the random search to succeed
with high probability. In that case, we need to try with other values of k.
For every finite field Fp and every degree n, an irreducible binomial or
trinomial or quadrinomial may fail to exist. An example (p = 2 and n = 8) is
covered in Exercise 2.5. However, the following conjecture4 is interesting.

Conjecture 2.10 For any finite field Fq with q ≥ 3 and for any n ∈ N, there
exists an irreducible polynomial in Fq [x] with degree n and with at most four
non-zero terms. ⊳

2.3 Implementation of Finite Field Arithmetic


Given GP/PARI functions implementing all elementary operations in finite
fields, writing these functions ourselves may sound like a waste of time. Still,
as we have done with multiple-precision integers, let us devote some time here
to these implementation issues. Impatient readers may skip this section.

2.3.1 Representation of Elements


The standard polynomial-basis representation of Fq = Fpn calls for a defin-
ing irreducible polynomial f (x) ∈ Fp [x] of degree n. This polynomial acts as
the modulus for all field operations. An element α ∈ Fq is a polynomial of
degree < n, and can be represented by its n coefficients, each of which is an
integer available modulo p. One then needs to write polynomial-manipulation
routines to implement modular arithmetic on these polynomials.

4 Joachim von zur Gathen, Irreducible trinomials over finite fields, Mathematics of Computation, 72, 1987–2000, 2003.
The case p = 2 turns out to be practically the most important. In view
of this, I focus mostly on fields of characteristic two (also called binary fields)
in this section. An element of F2 is essentially a bit, so an element of F2n is
an array of n bits. It is preferable to pack multiple bits in a single word. This
promotes compact representation and efficient arithmetic routines. The word
size (in bits) is denoted by w. Two natural choices for w are 32 and 64.

Example 2.11 For the sake of illustration, I take an artificially small w = 8,
that is, eight bits of a field element are packed in a word. Let us represent F219
(with extension degree n = 19) as F2 (θ), where θ is a root of the irreducible
pentanomial f (x) = x19 + x5 + x2 + x + 1. (Incidentally, there is no irreducible
trinomial of degree 19 in F2 [x].)
An element of F219 is a polynomial of the form
a18 θ18 + a17 θ17 + · · · + a1 θ + a0 ,
where each ai ∈ {0, 1}. The bit string a18 a17 . . . a1 a0 represents this element.
These 19 bits do not fit in a single word. We need three words of size w = 8 bits
to store this bit array. This representation is denoted as
a18 a17 a16  a15 a14 a13 a12 a11 a10 a9 a8  a7 a6 a5 a4 a3 a2 a1 a0 ,
with spaces indicating word boundaries. As a concrete example, the element
θ17 + θ12 + θ11 + θ9 + θ7 + θ5 + θ2 + θ + 1
is represented as
010 00011010 10100111.
The leftmost word is not fully used (since n is not a multiple of w). The unused
bits in this word may store any value, but it is safe to pad the leftmost word
with zero bits (equivalently, to give the polynomial leading zero coefficients). ¤

Extension fields of characteristic three have found recent applications (for
example, in pairing calculations). An element of F3n is represented by a poly-
nomial of degree 6 n − 1 with coefficients from {0, 1, 2}. The coefficients are
now three-valued, whereas bit-level representations are more convenient in
digital computers. One possibility is to represent each coefficient by a pair of
bits. Although the natural choices for the coefficients 0, 1, 2 are respectively
00, 01 and 10 (with the bit pattern 11 left undefined), it is not mandatory to
be natural. Kawahara et al.5 show that using the encoding 11, 01 and 10 for
0, 1, 2 respectively is more profitable from the angle of efficient implementation
of the arithmetic of F3n .
5 Yuto Kawahara, Kazumaro Aoki and Tsuyoshi Takagi, Faster implementation of ηT
pairing over GF(3m ) using minimum number of logical instructions for GF(3)-addition,
Pairing, 282–296, 2008.

Therefore, an element
an−1 θn−1 + an−2 θn−2 + · · · + a1 θ + a0
with each ai ∈ {0, 1, 2} being encoded by Kawahara et al.’s scheme can be rep-
resented by the bit string hn−1 ln−1 hn−2 ln−2 . . . h1 l1 h0 l0 of length 2n, where
hi li is the two-bit encoding of ai . It is advisable to separately store the high-
order bits and the low-order bits. That means that the above element is to
be stored as two bit arrays hn−1 hn−2 . . . h1 h0 and ln−1 ln−2 . . . l1 l0 each of size
n. Each of these bit arrays can be packed individually in an array of w-bit
words, as done for binary fields.
Example 2.12 Let us represent F319 as F3 (θ), where θ is a root of f (x) =
x19 + x2 + 2. Consider the element
2θ18 + θ16 + θ13 + 2θ10 + θ6 + θ5 + 2θ2
of F319 . As a sequence of ternary digits, this polynomial can be represented as
2010010020001100200. Under Kawahara et al.’s encoding, the bit representa-
tion of this element is as follows. We take words of size w = 8 bits.
High-order bit array 110 11011111 10011111
Low-order bit array 011 11111011 11111011
Different words are separated by spaces. ¤

2.3.2 Polynomial Arithmetic


Arithmetic in an extension field Fpn under the polynomial-basis representa-
tion is the modular polynomial arithmetic of Fp [x]. The irreducible polynomial
that defines the extension Fpn is used as the modulus in all these operations.

2.3.2.1 Addition and Subtraction


Addition in binary fields is the bit-by-bit XOR operation. Since multiple
coefficients are packed per word, word-level XOR operations add multiple (w)
coefficients simultaneously. This makes the addition operation very efficient.
Example 2.13 We use the representation of F219 as in Example 2.11. Con-
sider the following two operands with the bit-vector representations:
θ18 + θ16 + θ14 + θ12 + θ10 + θ9 + θ8 + θ2 + 1 101 01010111 00000101
θ17 + θ11 + θ9 + θ8 + θ7 + θ6 + θ5 + θ4 + θ3 + 1 010 00001011 11111001
Applying word-level XOR operations on the two arrays gives the array
111 01011100 11111100
which represents the sum of the input operands, that is, the polynomial
θ18 + θ17 + θ16 + θ14 + θ12 + θ11 + θ10 + θ7 + θ6 + θ5 + θ4 + θ3 + θ2 . ¤

Addition in fields of characteristic three is somewhat more involved. We
have adopted Kawahara et al.’s encoding scheme so as to minimize the word-
level bit-wise operations on the high and low arrays of the operands. Let the
input operands be α and β. Denote the high and low bit arrays of α as αh
and αl . Likewise, use the symbols βh and βl for the two arrays of β. The two
arrays γh and γl of the sum γ = α + β need to be computed. Kawahara et
al. show that this is possible with six bit-wise operations only:
γh = (αl XOR βl ) OR ((αh XOR βh ) XOR αl ),
γl = (αh XOR βh ) OR ((αl XOR βl ) XOR αh ).
Kawahara et al. also demonstrate that no encoding scheme can achieve this
task using less than six bit-wise operations (XOR, OR, AND and NOT only).

Example 2.14 Take the following two elements α, β in the representation of
F319 given in Example 2.12.

α = 2θ18 + θ16 + θ13 + 2θ10 + θ6 + θ5 + 2θ2


αh : 110 11011111 10011111
αl : 011 11111011 11111011
β = θ17 + θ14 + 2θ13 + θ10 + θ8 + 2θ3 + 2θ2 + 1
βh : 101 10111010 11111110
βl : 111 11011111 11110011
The six operations are shown now. Temporary arrays τ1 , τ2 , τ3 , τ4 are used.
τ1 = αh XOR βh = 011 01100101 01100001
τ2 = αl XOR βl = 100 00100100 00001000
τ3 = τ1 XOR αl = 000 10011110 10011010
τ4 = τ2 XOR αh = 010 11111011 10010111
γh = τ2 OR τ3 = 100 10111110 10011010
γl = τ1 OR τ4 = 011 11111111 11110111
Thus, γ corresponds to the sequence 211 01000001 01102101 of ternary digits,
that is, to the following polynomial which is clearly α + β modulo 3.

2θ18 + θ17 + θ16 + θ14 + θ8 + θ6 + θ5 + 2θ3 + θ2 + 1. ¤

Subtraction in F2n is the same as addition. For subtraction in F3n , it suffices
to note that α − β = α + (−β), and that the representation of −β is obtained
from that of β by swapping the high- and low-order bit arrays.

2.3.2.2 Multiplication
Multiplication in Fpn involves two basic operations. First, the two operands
are multiplied as polynomials in Fp [x]. The result is a polynomial of degree
≤ 2(n−1). Subsequently, this product is divided by the defining polynomial f (x).
The remainder (a polynomial of degree < n) is the canonical representative
of the product in the field. In what follows, I separately discuss these two
primitive operations. As examples, I concentrate on binary fields only.
The first approach towards multiplying two polynomials of degrees < n is
to initialize the product as the zero polynomial with (formal) degree 2(n − 1).
For each non-zero term bi xi in the second operand, the first operand is multi-
plied by bi , shifted by i positions, and added to the product. For binary fields,
the only non-zero value of bi is 1, so only shifting and adding (XOR) suffice.

Example 2.15 Let us multiply the two elements α and β of Example 2.13.
The exponents i in the non-zero terms θi of β and the corresponding shifted
versions of α are shown below. When we add (XOR) all these shifted values,
we obtain the desired product.

i xi α(x)
0 101 01010111 00000101
3 101010 10111000 00101
4 1010101 01110000 0101
5 10101010 11100000 101
6 1 01010101 11000001 01
7 10 10101011 10000010 1
8 101 01010111 00000101
9 1010 10101110 0000101
11 101010 10111000 00101
17 1010 10101110 0000101
01010 10001000 01100101 00011011 00011101

Storing the product, a polynomial of degree ≤ 36, needs five eight-bit words. ¤

2.3.2.3 Comb Methods


The above multiplication algorithm can be speeded up in a variety of
ways. Here, I explain some tricks for binary fields.6 An important observation
regarding the above shift-and-add algorithm is that the shifts xi α(x) and
xj α(x) differ only by trailing zero words if i ≡ j (mod w). Therefore, the
shifted polynomials xj α(x) need to be computed only for j = 0, 1, 2, . . . , w − 1.
Once a shifted value xj α(x) is computed, it can be used for all non-zero
coefficients bi = 1 with i = j + kw for k = 0, 1, 2, . . . . All we need is to add
xj α(x) starting at the k-th word (from right) of the product. This method is
referred to as the right-to-left comb method.

Example 2.16 In the multiplication of Example 2.15, we need to compute
xj α(x) for j = 0, 1, 2, . . . , 7 only. We have x0 α(x) = α(x), and xj α(x) =
x × (xj−1 α(x)) for j = 1, 2, . . . , 7, that is, xj α(x) is obtained from xj−1 α(x)
by a left-shift operation. In the current example, x0 α(x) is used for i = 0, 8,
x1 α(x) for i = 9, 17, x2 α(x) is not used, x3 α(x) is used for i = 3, 11, x4 α(x)
for i = 4, x5 α(x) for i = 5, x6 α(x) for i = 6, and x7 α(x) for i = 7. ¤

6 Julio López and Ricardo Dahab, High-speed software multiplication in F2m , IndoCrypt,
203–212, 2000.

The left-to-right comb method shifts the product polynomial (instead of the
first operand). This method processes bit positions j = w − 1, w − 2, . . . , 1, 0
in a word, in that sequence. For all words in β with the j-th bit set, α is
added to the product with appropriate word-level shifts. When a particular j
is processed, the product is multiplied by x (left-shifted by one bit), so that it
is aligned at the next (the (j − 1)-st) bit position in the word. The left-to-right
comb method which shifts the product polynomial is expected to be slower
than the right-to-left comb method which shifts the first operand.

Example 2.17 The left-to-right comb method is now illustrated for the mul-
tiplication of Example 2.15. The product polynomial γ is always maintained
as a polynomial of (formal) degree 36. The k-th word of γ is denoted by γk .

α= 101 01010111 00000101


x8 α = 101 01010111 00000101
x16 α = 101 01010111 00000101
j i Operation γ
γ4 γ3 γ2 γ1 γ0
Initialize γ to 0 00000 00000000 00000000 00000000 00000000
7 7 Add α from word 0 00000 00000000 00000101 01010111 00000101
Left-shift γ 00000 00000000 00001010 10101110 00001010
6 6 Add α from word 0 00000 00000000 00001111 11111001 00001111
Left-shift γ 00000 00000000 00011111 11110010 00011110
5 5 Add α from word 0 00000 00000000 00011010 10100101 00011011
Left-shift γ 00000 00000000 00110101 01001010 00110110
4 4 Add α from word 0 00000 00000000 00110000 00011101 00110011
Left-shift γ 00000 00000000 01100000 00111010 01100110
3 3 Add α from word 0 00000 00000000 01100101 01101101 01100011
11 Add α from word 1 00000 00000101 00110010 01101000 01100011
Left-shift γ 00000 00001010 01100100 11010000 11000110
2 Left-shift γ 00000 00010100 11001001 10100001 10001100
1 9 Add α from word 1 00000 00010001 10011110 10100100 10001100
17 Add α from word 2 00101 01000110 10011011 10100100 10001100
Left-shift γ 01010 10001101 00110111 01001001 00011000
0 0 Add α from word 0 01010 10001101 00110010 00011110 00011101
8 Add α from word 1 01010 10001000 01100101 00011011 00011101

The final value of γ is the same polynomial computed in Example 2.15.
For the last bit position j = 0, γ is not left-shifted since no further alignment
of γ is necessary. The shifted polynomials x8 α and x16 α need not be explicitly
computed. They are shown above only for the convenience of the reader. ¤

2.3.2.4 Windowed Comb Methods


The comb methods can be made faster using precomputation and table
lookup. The basic idea is to use a window of some size k (as in the windowed
exponentiation algorithm of Exercise 1.27). The products αδ are precomputed
and stored for all of the 2k binary polynomials δ of degree < k. In the multi-
plication loop, k bits of β are processed at a time. Instead of adding α (or a
shifted version of α) for each one-bit of β, we add the precomputed polynomial
αδ (or its suitably shifted version), where δ is the k-bit chunk read from the
second operand β. A practically good choice for k is four.

Example 2.18 The working of the right-to-left windowed comb method ap-
plied to the multiplication of Example 2.16 is demonstrated now. We take the
window size k = 2. The four products are precomputed as
(00)α(x) = (0x + 0)α(x) = 0000 00000000 00000000
(01)α(x) = (0x + 1)α(x) = 0101 01010111 00000101
(10)α(x) = (1x + 0)α(x) = 1010 10101110 00001010
(11)α(x) = (1x + 1)α(x) = 1111 11111001 00001111
The multiplication loop runs as follows.

i Bits bi+1 bi xi (bi+1 x + bi )α


0 01 0101 01010111 00000101
2 10 101010 10111000 001010
4 11 11111111 10010000 1111
6 11 11 11111110 01000011 11
8 11 1111 11111001 00001111
10 10 101010 10111000 001010
12 00
14 00
16 10 1010 10101110 00001010
18 00
01010 10001000 01100101 00011011 00011101

For the bit pattern 00, no addition needs to be made. In Example 2.15,
2.16 or 2.17, ten XOR operations are necessary, whereas the windowed comb
method needs only seven. Of course, we now have the overhead of precompu-
tation. In general, two is not a good window size (even for larger extensions
than used in these examples). A good tradeoff among the overhead of pre-
computation (and storage), the number of XOR operations, and programming
convenience (the window size k should divide the word size w) is k = 4. ¤

A trouble with the right-to-left windowed comb method is that it requires
many bit-level shifts (by amounts which are multiples of k) of many or all of the
precomputed polynomials. Effectively handling all these shifts is a nuisance.
The solution is to shift the product instead of the precomputed polynomials.
That indicates that we need to convert the left-to-right comb method to the
windowed form. Since k bits are simultaneously processed from β, the product
γ should now be left-shifted by k bits.

Example 2.19 The left-to-right windowed comb method works on the mul-
tiplication of Example 2.15 as follows. We take the window size k = 2 for
our illustration. The four precomputed polynomials are the same as in Exam-
ple 2.18. The main multiplication loop is now unfolded.

j i bi+1 bi Value of variable Op


γ = 00000 00000000 00000000 00000000 00000000 Init
6 6 11 (11)α = 1111 11111001 00001111
γ = 00000 00000000 00001111 11111001 00001111 Add
14 00
22 00
γ = 00000 00000000 00111111 11100100 00111100 Shift
4 4 11 (11)α = 1111 11111001 00001111
γ = 00000 00000000 00110000 00011101 00110011 Add
12 00
20 00
γ = 00000 00000000 11000000 01110100 11001100 Shift
2 2 10 (10)α = 1010 10101110 00001010
γ = 00000 00000000 11001010 11011010 11000110 Add
10 10 x8 (10)α = 1010 10101110 00001010
γ = 00000 00001010 01100100 11010000 11000110 Add
18 00
γ = 00000 00101001 10010011 01000011 00011000 Shift
0 0 01 (01)α = 0101 01010111 00000101
γ = 00000 00101001 10010110 00010100 00011101 Add
8 11 x8 (11)α = 1111 11111001 00001111
γ = 00000 00100110 01101111 00011011 00011101 Add
16 10 x16 (10)α = 1010 10101110 00001010
γ = 01010 10001000 01100101 00011011 00011101 Add

In the above table, the operation “Add” stands for adding a word-level shift
of a precomputed polynomial to γ. These word-level shifts are not computed
explicitly, but are shown here for the reader’s convenience. The “Shift” oper-
ation stands for two-bit left shift of γ. As in Example 2.18, only seven XOR
operations suffice. The number of bit-level shifts of the product (each by k bits)
is always (w/k) − 1 (three in this example), independent of the operands. ¤

Other fast multiplication techniques (like Karatsuba–Ofman multiplication)
can be used. I am not going to discuss them further here. Let me instead
concentrate on the second part of modular multiplication, that is, reduction
modulo the defining polynomial f (x).

2.3.2.5 Modular Reduction


We assume that we have a polynomial γ(x) of degree ≤ 2(n − 1). Our task
is to compute the remainder ρ(x) = γ(x) rem f (x), where deg f (x) = n.
Euclidean division of polynomials keeps on removing terms of degree n or
larger by subtracting suitable multiples of f (x) from γ(x). It is natural
to remove non-zero terms one by one from γ(x) in the decreasing order of their
degrees. Subtraction of a multiple of f (x) may introduce new non-zero terms,
so it is, in general, not easy to eliminate multiple non-zero terms simultane-
ously. For polynomials over F2 , the only non-zero coefficient is 1. In order to
remove the non-zero term xi from γ(x), we need to subtract (that is, add or
XOR) xi−n f (x) from γ(x), where xi−n f (x) can be efficiently computed by
left shifting f (x) by i − n bits. Eventually, γ(x) reduces to a polynomial of
degree < n. This is the desired remainder ρ(x).

Example 2.20 Let us reduce the product γ(x) computed in Examples 2.15–
2.19. Elimination of its non-zero terms of degrees ≥ 19 is illustrated below.

i Intermediate values Operation


γ(x) = 01010 10001000 01100101 00011011 00011101 Init
35 x16 f (x) = 1000 00000000 00100111 Shift
γ(x) = 00010 10001000 01000010 00011011 00011101 Add
33 x14 f (x) = 10 00000000 00001001 11 Shift
γ(x) = 00000 10001000 01001011 11011011 00011101 Add
31 x12 f (x) = 10000000 00000010 0111 Shift
γ(x) = 00000 00001000 01001001 10101011 00011101 Add
27 x8 f (x) = 1000 00000000 00100111 Shift
γ(x) = 00000 00000000 01001001 10001100 00011101 Add
22 x3 f (x) = 1000000 00000001 00111 Shift
γ(x) = 00000 00000000 00001001 10001101 00100101 Add
19 x0 f (x) = 1000 00000000 00100111 Shift
γ(x) = 00000 00000000 00000001 10001101 00000010 Add

Here, “Shift” is the shifted polynomial xi−n f (x), and “Add” is the addition
of this shifted value to γ(x). After six iterations of term cancellation, γ(x)
reduces to a polynomial of degree 16. It follows that the product of α and β
(of Example 2.15) in F219 is θ16 + θ15 + θ11 + θ10 + θ8 + θ. ¤

Modular reduction can be made efficient if the defining polynomial f (x)
is chosen appropriately. The first requirement is that f (x) should have as few
non-zero terms as possible. Irreducible binomials, trinomials, quadrinomials
and pentanomials are very helpful in this regard. Second, the degrees of the
non-zero terms in f (x) (except xn itself) should be as low as possible. In other
words, the largest degree n1 of these terms should be sufficiently smaller than
n. If n − n1 ≥ w (where w is the word size), cancellation of a non-zero term axi
by subtracting axi−n f (x) from γ(x) does not affect other coefficients residing
in the same word of γ(x) storing the coefficient of xi . This means that we can
now cancel an entire word together.
To be more precise, let us concentrate on binary fields, and write f (x) =
xn + f1 (x) with n1 = deg f1 (x) ≤ n − w. We want to cancel the leftmost
non-zero word µ from γ(x). Clearly, µ is a polynomial of degree ≤ w − 1.
If µ is the r-th word in γ, we need to add (XOR) xrw−n µf (x) to γ(x). But
xrw−n µf (x) = xrw µ + xrw−n µf1 (x). The first part xrw µ is precisely the r-th
word of γ, so we can set this word to zero without actually performing the
addition. The condition n1 ≤ n−w indicates that the second part xrw−n µf1 (x)
does not have non-zero terms in the r-th word of γ. Since multiplication by
xrw−n is a left shift, the only non-trivial computation is that of µf1 (x). But
µ has a small degree (≤ w − 1). If f1 (x) too has only a few non-zero terms,
this multiplication can be quite efficient. We can use a comb method for this
multiplication in order to achieve higher efficiency. Since f1 (x) is a polynomial
dependent upon the representation of the field (but not on the operands), the
precomputation for a windowed comb method needs to be done only once, for
all reduction operations in the field. Even eight-bit windows can be feasible
in terms of storage if f1 (x) has only a few non-zero coefficients.

Example 2.21 Let us perform the division of Example 2.20 by word-based
operations. In our representation of F219 , we choose the irreducible polynomial
f (x) = x19 + f1 (x), where f1 (x) = x5 + x2 + x + 1, for which the degree n1 = 5
is sufficiently smaller than n = 19 (our choice for w is eight). Therefore, it
is safe to reduce γ(x) word by word. The calculations are given below. The
words of γ are indexed by the variable r. Since f1 fits in a word, we treat it
as a polynomial of degree seven, so µf1 is of degree 12 and fits in two words.

r µ Intermediate values
γ(x) = 01010 10001000 01100101 00011011 00011101
4 00001010 x13 µf1 (x) = 00000 00101110 110
γ(x) = 00000 10001000 01001011 11011011 00011101
3 10001000 x5 µf1 (x) = 00010 01010111 000
γ(x) = 00000 00000000 01001001 10001100 00011101
2 00001001 µf1 (x) = 00000001 00011111
γ(x) = 00000 00000000 00000001 10001101 00000010

The last iteration (for r = 2) is a bit tricky. This word of γ indicates the
non-zero terms x22 , x19 and x16 . We need to remove the first two of these, but
we cannot remove x16 . Since n = 19, we consider only the coefficients of x19
to x23 . The word is appropriately right shifted to compute µ in this case. ¤

For f1 of special forms, further optimizations can be made. For instance, if
f1 contains only a few non-zero terms with degrees sufficiently separated from
one another, the computation of µf1 does not require a true multiplication.
Word-level shift and XOR operations can subtract xrw−n µf1 from γ.

Example 2.22 NIST recommends7 the representation of F_{2^233} using the
irreducible polynomial f(x) = x^233 + x^74 + 1. As in a real-life implementation,
we now choose w = 64. An element of F_{2^233} fits in four words. Moreover, an
unreduced product γ of two field elements is a polynomial of degree at most
464, and fits in eight words. Let us denote the words of γ as γ_0, γ_1, ..., γ_7. We
need to eliminate γ_7, γ_6, γ_5, γ_4 completely and γ_3 partially. For r = 7, 6, 5, 4
(in that sequence), we need to compute

    x^{rw−n} µf1 = x^{64r−233} γ_r (x^74 + 1)
                 = (x^{64r−159} + x^{64r−233}) γ_r
                 = (x^{64(r−3)+33} + x^{64(r−4)+23}) γ_r
                 = x^{64(r−3)} (x^33 γ_r) + x^{64(r−4)} (x^23 γ_r).

Subtracting (XORing) this quantity from γ is equivalent to the following four
word-level XOR operations:

    γ_{r−3} is XORed with LEFT-SHIFT(γ_r, 33),
    γ_{r−2} is XORed with RIGHT-SHIFT(γ_r, 31),
    γ_{r−4} is XORed with LEFT-SHIFT(γ_r, 23),
    γ_{r−3} is XORed with RIGHT-SHIFT(γ_r, 41).

Removal of the coefficients of x^255 through x^233 in γ_3 can be similarly handled.
The details are left to the reader as Exercise 2.10. ¤
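To make the word-level reduction concrete, the following Python sketch multiplies two elements of F_{2^233} with carry-less multiplication and then reduces the product using exactly the four word-level XOR operations derived above, followed by the partial clearing of γ_3. Python and the helper names (pmul, pmod, reduce233) are ours, used only for illustration; polynomials over F2 are packed into integer bit masks.

```python
import random

def pmul(a, b):
    # carry-less multiplication of F2[x] polynomials packed into ints
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    # remainder of a modulo f in F2[x] (plain bitwise long division)
    df = f.bit_length()
    while a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a

F = (1 << 233) | (1 << 74) | 1       # f(x) = x^233 + x^74 + 1
MASK = (1 << 64) - 1                 # one 64-bit word

def reduce233(g):
    # reduce an unreduced product g (degree <= 464) word by word
    w = [(g >> (64 * i)) & MASK for i in range(8)]
    for r in range(7, 3, -1):            # eliminate words 7, 6, 5, 4
        t = w[r]
        w[r] = 0
        w[r - 3] ^= (t << 33) & MASK     # low half of x^(64(r-3)) (x^33 t)
        w[r - 2] ^= t >> 31              # its overflow into the next word
        w[r - 4] ^= (t << 23) & MASK     # low half of x^(64(r-4)) (x^23 t)
        w[r - 3] ^= t >> 41              # its overflow into the next word
    # partially clear word 3: its bits 41..63 hold x^233 .. x^255
    t = w[3] >> 41
    w[3] &= (1 << 41) - 1
    w[1] ^= t << 10                      # the x^74 t part (bit 74 = word 1, bit 10)
    w[0] ^= t                            # the 1 * t part
    return w[0] | (w[1] << 64) | (w[2] << 128) | (w[3] << 192)

random.seed(7)
a = random.getrandbits(233)              # two field elements, degree <= 232
b = random.getrandbits(233)
g = pmul(a, b)
```

The descending order r = 7, 6, 5, 4 matters: eliminating word r sends contributions only to words r − 2, r − 3 and r − 4, so every word receives all its contributions before its own turn comes.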

Modular reduction involves division by the defining polynomial f (x). In


other situations (like gcd computations), we may have to divide any a(x) by
any non-zero b(x). The standard coefficient-removal procedure continues to
work here too. But a general b(x) does not always enjoy the nice structural
properties of f (x). Consequently, the optimized division procedures that work
for these f (x) values are no longer applicable to a general Euclidean division.

2.3.3 Polynomial GCD and Inverse


Given a non-zero element α in a field, we need to compute the field element
u such that uα = 1. All algorithms in this section compute the extended gcd
of α with the defining polynomial f. Since f is irreducible, and α is non-zero
with degree smaller than n = deg f, we have gcd(α, f) = 1 = uα + vf for some
polynomials u, v. Computing u is of concern to us. We pass α and f as the two
input parameters to the gcd algorithms. We consider only binary fields F_{2^n},
adaptations to other fields F_{p^n} being fairly straightforward. We now revert
to the polynomial notation for field elements (instead of the bit-vector notation).

2.3.3.1 Euclidean Inverse


As with integers, the Euclidean gcd algorithm generates a remainder sequence
initialized as r0 = f and r1 = α. Subsequently, for i = 2, 3, ..., one computes
ri = ri−2 rem ri−1. We maintain two other sequences ui and vi satisfying
ui α + vi f = ri for all i ≥ 0. We initialize u0 = 0, u1 = 1, v0 = 1, v1 = 0
so that the invariance is satisfied for i = 0, 1. If qi = ri−2 quot ri−1, then ri
7 http://csrc.nist.gov/groups/ST/toolkit/documents/dss/NISTReCur.pdf
Arithmetic of Finite Fields 95

can be written as ri = ri−2 − qi ri−1 . We update both the u and v sequences


analogously, that is, ui = ui−2 − qi ui−1 and vi = vi−2 − qi vi−1 . One can easily
verify that these new values continue to satisfy ui α + vi f = ri .
For inverse calculations, it is not necessary to explicitly compute the v
sequence. Even if vi is needed at the end of the gcd loop, we can obtain this
as vi = (ri − ui α)/f . Since each ri and each ui depend on only two previous
terms, we need to store data only from two previous iterations.

Example 2.23 Let us define F_{2^7} by the irreducible polynomial f(x) = x^7 +
x^3 + 1, and compute the inverse of α(x) = x^6 + x^3 + x^2 + x. (Actually,
α = θ^6 + θ^3 + θ^2 + θ, where θ is a root of f. In the current context, we prefer
to use x instead of θ to highlight that we are working in F2[x].) Iterations
of the extended Euclidean gcd algorithm are tabulated below. For clarity, we
continue to use the indexed notation for the sequences ri and ui.
 i   qi          ri                      ui
 0               x^7 + x^3 + 1           0
 1               x^6 + x^3 + x^2 + x     1
 2   x           x^4 + x^2 + 1           x
 3   x^2 + 1     x^3 + x^2 + x + 1       x^3 + x + 1
 4   x + 1       x^2                     x^4 + x^3 + x^2 + x + 1
 5   x + 1       x + 1                   x^5 + x^3 + x
 6   x + 1       1                       x^6 + x^5 + 1

It follows that α^{−1} = x^6 + x^5 + 1. (Actually, α^{−1} = θ^6 + θ^5 + 1.) ¤
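The Euclidean inverse computation can be sketched in a few lines of Python (an illustrative translation, not the book's code; pmul, pmod, pdivmod and euclid_inverse are our names). Polynomials over F2 are packed into integer bit masks, and the loop maintains u_i α + v_i f = r_i without ever storing the v sequence.

```python
def pmul(a, b):
    # carry-less multiplication of F2[x] polynomials packed into ints
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    # remainder of a modulo f in F2[x]
    df = f.bit_length()
    while a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a

def pdivmod(a, b):
    # Euclidean division in F2[x]: a = q*b + r with deg r < deg b
    q, db = 0, b.bit_length()
    while a.bit_length() >= db:
        s = a.bit_length() - db
        q |= 1 << s
        a ^= b << s
    return q, a

def euclid_inverse(alpha, f):
    # extended Euclidean gcd; invariant u_i*alpha + v_i*f = r_i
    r0, r1 = f, alpha                   # r_0 = f, r_1 = alpha
    u0, u1 = 0, 1                       # u_0 = 0, u_1 = 1
    while r1 != 1:
        q, r = pdivmod(r0, r1)          # q_i and r_i = r_{i-2} rem r_{i-1}
        r0, r1 = r1, r
        u0, u1 = u1, u0 ^ pmul(q, u1)   # u_i = u_{i-2} - q_i u_{i-1}
    return u1

f = 0b10001001        # x^7 + x^3 + 1
alpha = 0b1001110     # x^6 + x^3 + x^2 + x
inv = euclid_inverse(alpha, f)
```

Running this on the data of Example 2.23 reproduces the tabulated result x^6 + x^5 + 1.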

2.3.3.2 Binary Inverse


The binary inverse algorithm in F_{2^n} is a direct adaptation of the extended
binary gcd algorithm for integers; now, x plays the role of 2. Algorithm 2.1
describes the binary inverse algorithm. The two polynomials α(x) and f(x)
are fed to this algorithm as input. Algorithm 2.1 maintains the invariance
u1 α + v1 f = r1 ,
u2 α + v2 f = r2 .
Here, r1 , r2 behave like the remainder sequence of Euclidean gcd. The other
sequences u and v are subject to the same transformations as the r sequence.
We do not maintain the v sequence explicitly. This sequence is necessary only
for understanding the correctness of the algorithm.
The trouble now is that when we force r1 (or r2) to be divisible by x, the
polynomial u1 (or u2) need not be divisible by x. Still, we have to extract an
x from u1 (or u2). This is done by mutually adjusting the u and v values.
Suppose x | r1 but x ∤ u1, and we want to cancel x from u1 α + v1 f = r1. Since
x ∤ u1, the constant term in u1 must be 1. Moreover, since f is an irreducible
polynomial, its constant term too must be 1. But then, the constant term

of u1 + f is zero, that is, x | (u1 + f). We rewrite the invariance formula as
(u1 + f)α + (v1 + α)f = r1. Since x divides both r1 and u1 + f (but not f),
x must divide v1 + α too. So we can now cancel x throughout the equation.
Algorithm 2.1: Binary inverse algorithm
Initialize r1 = α, r2 = f , u1 = 1 and u2 = 0.
Repeat {
While (r1 is divisible by x) {
Set r1 = r1 /x.
If (u1 is not divisible by x), set u1 = u1 + f .
Set u1 = u1 /x.
If (r1 = 1), return u1 .
}
While (r2 is divisible by x) {
Set r2 = r2 /x.
If (u2 is not divisible by x), set u2 = u2 + f .
Set u2 = u2 /x.
If (r2 = 1), return u2 .
}
If (deg r1 > deg r2 ) {
Set r1 = r1 + r2 and u1 = u1 + u2 .
} else {
Set r2 = r2 + r1 and u2 = u2 + u1 .
}
}

Example 2.24 Under the representation of F27 by the irreducible polynomial


f (x) = x7 + x3 + 1, we compute α−1 , where α = x6 + x3 + x2 + x. The
computations are listed in the following table (continued on the next page).
 r1                    r2                         u1          u2
 x^6 + x^3 + x^2 + x   x^7 + x^3 + 1              1           0
     Repeatedly remove x from r1. Adjust u1.
 x^5 + x^2 + x + 1     x^7 + x^3 + 1              x^6 + x^2   0
     Set r2 = r2 + r1 and u2 = u2 + u1.
 x^5 + x^2 + x + 1     x^7 + x^5 + x^3 + x^2 + x  x^6 + x^2   x^6 + x^2
     Repeatedly remove x from r2. Adjust u2.
 x^5 + x^2 + x + 1     x^6 + x^4 + x^2 + x + 1    x^6 + x^2   x^5 + x
     Set r2 = r2 + r1 and u2 = u2 + u1.
 x^5 + x^2 + x + 1     x^6 + x^5 + x^4            x^6 + x^2   x^6 + x^5 + x^2 + x
     Repeatedly remove x from r2. Adjust u2.
 x^5 + x^2 + x + 1     x^5 + x^4 + x^3            x^6 + x^2   x^5 + x^4 + x + 1
 x^5 + x^2 + x + 1     x^4 + x^3 + x^2            x^6 + x^2   x^6 + x^4 + x^3 + x^2 + 1
 x^5 + x^2 + x + 1     x^3 + x^2 + x              x^6 + x^2   x^6 + x^5 + x^3 + x
 x^5 + x^2 + x + 1     x^2 + x + 1                x^6 + x^2   x^5 + x^4 + x^2 + 1

 r1     r2            u1                            u2
     Set r1 = r1 + r2 and u1 = u1 + u2.
 x^5    x^2 + x + 1   x^6 + x^5 + x^4 + 1           x^5 + x^4 + x^2 + 1
     Repeatedly remove x from r1. Adjust u1.
 x^4    x^2 + x + 1   x^6 + x^5 + x^4 + x^3 + x^2   x^5 + x^4 + x^2 + 1
 x^3    x^2 + x + 1   x^5 + x^4 + x^3 + x^2 + x     x^5 + x^4 + x^2 + 1
 x^2    x^2 + x + 1   x^4 + x^3 + x^2 + x + 1       x^5 + x^4 + x^2 + 1
 x      x^2 + x + 1   x^6 + x^3 + x + 1             x^5 + x^4 + x^2 + 1
 1      x^2 + x + 1   x^6 + x^5 + 1                 x^5 + x^4 + x^2 + 1
In this example, r1 eventually becomes 1, so the inverse of α is the value
of u1 at that time, that is, x^6 + x^5 + 1. ¤
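Algorithm 2.1 translates almost verbatim into Python (an illustrative sketch; the function names are ours). Divisibility by x is a test of the least significant bit, division by x is a right shift, and polynomial addition is XOR.

```python
def pmul(a, b):
    # carry-less multiplication of F2[x] polynomials packed into ints
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    # remainder of a modulo f in F2[x]
    df = f.bit_length()
    while a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a

def binary_inverse(alpha, f):
    # Algorithm 2.1: bit 0 is the constant term, so "divisible by x"
    # means "least significant bit is 0"
    r1, r2, u1, u2 = alpha, f, 1, 0
    while True:
        while r1 & 1 == 0:          # while x divides r1
            r1 >>= 1                # r1 = r1/x
            if u1 & 1:              # if x does not divide u1 ...
                u1 ^= f             # ... add f (its constant term is 1)
            u1 >>= 1                # u1 = u1/x
            if r1 == 1:
                return u1
        while r2 & 1 == 0:
            r2 >>= 1
            if u2 & 1:
                u2 ^= f
            u2 >>= 1
            if r2 == 1:
                return u2
        if r1.bit_length() > r2.bit_length():   # deg r1 > deg r2
            r1 ^= r2; u1 ^= u2
        else:
            r2 ^= r1; u2 ^= u1

f = 0b10001001          # x^7 + x^3 + 1
alpha = 0b1001110       # x^6 + x^3 + x^2 + x
inv = binary_inverse(alpha, f)
```

On the data of Example 2.24 this returns x^6 + x^5 + 1, matching the table.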
For integers, binary gcd is usually faster than Euclidean gcd, since Euclidean
division is significantly more expensive than addition and shifting. For
polynomials, binary inverse and Euclidean inverse have comparable performance.
Here, Euclidean inverse can be viewed as a sequence of removals of the
most significant terms from one of the remainders. In binary inverse, a term is
removed from the least significant end, and subsequent divisions by x restore
the least significant term back to 1. Both these removal processes use roughly
the same number and types (shift and XOR) of operations.

2.3.3.3 Almost Inverse


The almost inverse algorithm8 is a minor variant of the binary inverse
algorithm. In the binary inverse algorithm, we cancel x from both r1 and
u1 (or from r2 and u2 ) inside the gcd loop. The process involves conditional
adding of f to u1 (or u2 ). In the almost inverse algorithm, we do not extract
the x’s from the u (and v) sequences (but remember how many x’s need to be
extracted). After the loop terminates, we extract all these x’s from u1 or u2 .
More precisely, we now maintain the invariances

    u1 α + v1 f = x^k r1,
    u2 α + v2 f = x^k r2,

for some integer k ≥ 0. The value of k changes with time (and should be
remembered), but must be the same in both the equations at the same point of
time. Suppose that both r1 and r2 have constant terms 1, and deg r1 > deg r2.
In that case, we add the second equation to the first to get

    (u1 + u2)α + (v1 + v2)f = x^k (r1 + r2).

Renaming u1 + u2 as u1, v1 + v2 as v1, and r1 + r2 as r1 gives u1 α + v1 f = x^k r1.
Now, r1 is divisible by x. Let t be the largest exponent for which x^t divides
r1. We extract x^t from r1, and rename r1/x^t as r1 to get

    u1 α + v1 f = x^{k+t} r1.
8 R. Schroeppel, H. Orman, S. O’Malley and O. Spatscheck, Fast key exchange with

elliptic curve systems, CRYPTO, 43–56, 1995.



We do not update u1 and v1 here. However, since the value of k has changed
to k + t, the other equation must be updated to agree with this, that is, the
second equation is transformed as

    (x^t u2)α + (x^t v2)f = x^{k+t} r2.

Renaming x^t u2 as u2 and x^t v2 as v2 restores both the invariances. Algorithm
2.2 implements this idea. We do not need to maintain v1, v2 explicitly.

Algorithm 2.2: Almost inverse algorithm


Initialize r1 = α, r2 = f , u1 = 1, u2 = 0, and k = 0.
Repeat {
if (r1 is divisible by x) {
Let x^t | r1 but x^{t+1} ∤ r1.
Set r1 = r1/x^t, u2 = x^t u2, and k = k + t.
If (r1 = 1), return x^{−k} u1 (mod f).
}
if (r2 is divisible by x) {
Let x^t | r2 but x^{t+1} ∤ r2.
Set r2 = r2/x^t, u1 = x^t u1, and k = k + t.
If (r2 = 1), return x^{−k} u2 (mod f).
}
If (deg r1 > deg r2 ) {
Set r1 = r1 + r2 and u1 = u1 + u2 .
} else {
Set r2 = r2 + r1 and u2 = u2 + u1 .
}
}

We have to extract the required number of x's after the termination of
the loop. Suppose that the loop terminates because of the condition r1 = 1.
In that case, we have u1 α + v1 f = x^k for some k. But then, α^{−1} = x^{−k} u1
modulo f. Likewise, if r2 becomes 1, we need to compute x^{−k} u2 modulo f.
Suppose that x^{−k} u needs to be computed modulo f for some u. One pos-
sibility is to divide u by x as long as possible. When the constant term in
u becomes 1, f is added to it, and the process continues until x is removed
k times. This amounts to doing the same computations as in Algorithm 2.1
(albeit at a different location in the algorithm).
If f is of some special form, the removal process can be made somewhat
more efficient. Suppose that x^l is the non-zero term in f with the smallest
degree > 0. Let h denote the sum of the non-zero terms of u with degrees < l
(terms involving x^0, x^1, ..., x^{l−1}). Since u + hf is divisible by x^l, l occurrences of
x can be simultaneously removed from u + hf. For small l (l = 1 in the
worst case), the removal of x's requires too many iterations. If l is not too
small, many x's are removed per iteration, and we expect the almost inverse
algorithm to run a bit more efficiently than the binary inverse algorithm.

Example 2.25 As in the previous two examples, take f = x^7 + x^3 + 1 and
α = x^6 + x^3 + x^2 + x. The iterations of the main gcd loop are shown first.

 r1                    r2                         u1            u2          k
 x^6 + x^3 + x^2 + x   x^7 + x^3 + 1              1             0           0
     Remove x^1 from r1. Adjust u2.
 x^5 + x^2 + x + 1     x^7 + x^3 + 1              1             0           1
     Set r2 = r2 + r1 and u2 = u2 + u1.
 x^5 + x^2 + x + 1     x^7 + x^5 + x^3 + x^2 + x  1             1           1
     Remove x^1 from r2. Adjust u1.
 x^5 + x^2 + x + 1     x^6 + x^4 + x^2 + x + 1    x             1           2
     Set r2 = r2 + r1 and u2 = u2 + u1.
 x^5 + x^2 + x + 1     x^6 + x^5 + x^4            x             x + 1       2
     Remove x^4 from r2. Adjust u1.
 x^5 + x^2 + x + 1     x^2 + x + 1                x^5           x + 1       6
     Set r1 = r1 + r2 and u1 = u1 + u2.
 x^5                   x^2 + x + 1                x^5 + x + 1   x + 1       6
     Remove x^5 from r1. Adjust u2.
 1                     x^2 + x + 1                x^5 + x + 1   x^6 + x^5   11

The loop terminates because of r1 = 1. At that instant, k = 11, and u1 =
x^5 + x + 1. So we compute x^{−11}(x^5 + x + 1) modulo f. For f(x) = x^7 + x^3 + 1,
we have l = 3, that is, we can remove x^3 in one iteration. In the last iteration,
only x^2 is removed. The removal procedure is illustrated below.

 l   h             u + hf                          (u + hf)/x^l (renamed as u)
                                                   x^5 + x + 1
 3   x + 1         x^8 + x^7 + x^5 + x^4 + x^3     x^5 + x^4 + x^2 + x + 1
 3   x^2 + x + 1   x^9 + x^8 + x^7 + x^3           x^6 + x^5 + x^4 + 1
 3   1             x^7 + x^6 + x^5 + x^4 + x^3     x^4 + x^3 + x^2 + x + 1
 2   x + 1         x^8 + x^7 + x^2                 x^6 + x^5 + 1             ¤
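A Python sketch of Algorithm 2.2 follows (illustrative only; the function names are ours, and the post-processing uses the simple one-bit-at-a-time removal of x rather than the x^l optimization).

```python
def pmul(a, b):
    # carry-less multiplication of F2[x] polynomials packed into ints
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    # remainder of a modulo f in F2[x]
    df = f.bit_length()
    while a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a

def almost_inverse(alpha, f):
    # Algorithm 2.2: invariants u1*a + v1*f = x^k r1, u2*a + v2*f = x^k r2
    r1, r2, u1, u2, k = alpha, f, 1, 0, 0
    while True:
        if r1 & 1 == 0:
            t = (r1 & -r1).bit_length() - 1   # largest t with x^t | r1
            r1 >>= t; u2 <<= t; k += t        # extract x^t; rebalance eq. 2
            if r1 == 1:
                u = u1                        # now u1*alpha + v1*f = x^k
                break
        if r2 & 1 == 0:
            t = (r2 & -r2).bit_length() - 1
            r2 >>= t; u1 <<= t; k += t
            if r2 == 1:
                u = u2
                break
        if r1.bit_length() > r2.bit_length():
            r1 ^= r2; u1 ^= u2
        else:
            r2 ^= r1; u2 ^= u1
    # post-processing: multiply by x^(-k) modulo f, one x at a time
    for _ in range(k):
        if u & 1:
            u ^= f
        u >>= 1
    return u

f = 0b10001001          # x^7 + x^3 + 1
alpha = 0b1001110       # x^6 + x^3 + x^2 + x
inv = almost_inverse(alpha, f)
```

On the data of Example 2.25 the main loop terminates with u1 = x^5 + x + 1 and k = 11, and the post-processing yields x^6 + x^5 + 1.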

2.4 Some Properties of Finite Fields


Some mathematical properties of finite fields that have important bearings
on the implementation of finite-field arithmetic are studied in this section.

2.4.1 Fermat’s Little Theorem for Finite Fields


Theorem 2.26 [Fermat's little theorem for Fq] Let α ∈ Fq. Then, we have
α^q = α. Moreover, if α ≠ 0, then α^{q−1} = 1.

Proof First, take α ≠ 0, and let α1, ..., α_{q−1} be all the elements of F*_q =
Fq \ {0}. Then, αα1, ..., αα_{q−1} is a permutation of α1, ..., α_{q−1}, so
∏_{i=1}^{q−1} αi = ∏_{i=1}^{q−1} (ααi) = α^{q−1} ∏_{i=1}^{q−1} αi. Canceling the common
product yields α^{q−1} = 1, and so α^q = α. For α = 0, we have 0^q = 0. ⊳
A very important consequence of this theorem follows.
Theorem 2.27 The polynomial x^q − x ∈ Fp[x] splits into linear factors over
Fq as x^q − x = ∏_{α∈Fq} (x − α). ⊳

Proposition 2.28 Let q = p^n, and d a positive divisor of n. Then, Fq con-
tains a unique intermediate field of size p^d (a copy of F_{p^d}). Moreover, an
element α ∈ Fq belongs to this intermediate field if and only if α^{p^d} = α.
Proof Consider the set E = {α ∈ Fq | α^{p^d} = α}. It is easy to verify that E
satisfies all the axioms for a field and that E contains exactly p^d elements. ⊳
Example 2.29 Let us represent F64 = F_{2^6} as

    F2(θ) = {a5 θ^5 + a4 θ^4 + a3 θ^3 + a2 θ^2 + a1 θ + a0 | ai ∈ {0, 1}},

where θ^6 + θ + 1 = 0. The element a5 θ^5 + a4 θ^4 + a3 θ^3 + a2 θ^2 + a1 θ + a0 ∈ F64
is abbreviated as a5 a4 a3 a2 a1 a0. The square of this element is calculated as

    (a5 θ^5 + a4 θ^4 + a3 θ^3 + a2 θ^2 + a1 θ + a0)^2
    = a5 θ^10 + a4 θ^8 + a3 θ^6 + a2 θ^4 + a1 θ^2 + a0
    = a5 θ^4(θ + 1) + a4 θ^2(θ + 1) + a3 (θ + 1) + a2 θ^4 + a1 θ^2 + a0
    = a5 θ^5 + (a5 + a2)θ^4 + a4 θ^3 + (a4 + a1)θ^2 + a3 θ + (a3 + a0).

The smallest positive integer t for which α^{2^t} = α is listed for each α ∈ F64.
α t α t α t α t
000000 1 010000 6 100000 6 110000 6
000001 1 010001 6 100001 6 110001 6
000010 6 010010 6 100010 6 110010 6
000011 6 010011 6 100011 6 110011 6
000100 6 010100 6 100100 6 110100 6
000101 6 010101 6 100101 6 110101 6
000110 6 010110 3 100110 6 110110 6
000111 6 010111 3 100111 6 110111 6
001000 6 011000 3 101000 6 111000 6
001001 6 011001 3 101001 6 111001 6
001010 6 011010 6 101010 6 111010 2
001011 6 011011 6 101011 6 111011 2
001100 6 011100 6 101100 6 111100 6
001101 6 011101 6 101101 6 111101 6
001110 3 011110 6 101110 6 111110 6
001111 3 011111 6 101111 6 111111 6

The proper divisors of the extension degree 6 are 1, 2, 3. The unique interme-
diate field of F64 of size 2^1 is {0, 1}. The intermediate field of size 2^2 is

    {0, 1, θ^5 + θ^4 + θ^3 + θ, θ^5 + θ^4 + θ^3 + θ + 1}.

Finally, the intermediate field of size 2^3 is

    {0, 1, θ^3+θ^2+θ, θ^3+θ^2+θ+1, θ^4+θ^2+θ, θ^4+θ^2+θ+1, θ^4+θ^3, θ^4+θ^3+1}. ¤
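The fixed points of the map α ↦ α^{2^d} can be enumerated directly. The following Python sketch (illustrative only; the helper names are ours) recovers the three intermediate fields of F64 listed above, with elements written as the integer bit masks of their coordinate vectors a5 a4 a3 a2 a1 a0.

```python
def pmul(a, b):
    # carry-less multiplication of F2[x] polynomials packed into ints
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    # remainder of a modulo f in F2[x]
    df = f.bit_length()
    while a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a

F = 0b1000011           # x^6 + x + 1, defining F64 = F2(theta)

def fmul(a, b):
    return pmod(pmul(a, b), F)

def frob_iter(a, d):    # a -> a^(2^d), by d repeated squarings
    for _ in range(d):
        a = fmul(a, a)
    return a

# the intermediate field of size 2^d is the fixed set of a -> a^(2^d)
sub2 = [a for a in range(64) if frob_iter(a, 1) == a]
sub4 = [a for a in range(64) if frob_iter(a, 2) == a]
sub8 = [a for a in range(64) if frob_iter(a, 3) == a]
```

The three lists have 2, 4 and 8 elements respectively, as Proposition 2.28 predicts.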

2.4.2 Multiplicative Orders of Elements in Finite Fields


The concept of multiplicative orders modulo p can be generalized for Fq .
Definition 2.30 Let α ∈ Fq, α ≠ 0. The smallest positive integer e for which
α^e = 1 is called the order of α and is denoted by ord α. By Fermat's little
theorem for Fq, we have ord α | (q − 1). If ord α = q − 1, then α is called a
primitive element of Fq.
If Fq = Fp(θ), where θ is a root of the irreducible polynomial f(x) ∈ Fp[x],
and if θ is a primitive element of Fq, we call f(x) a primitive polynomial. ⊳
Theorem 2.31 Every finite field has a primitive element. (In algebraic
terms, the group F∗q is cyclic.)
Proof Follow an argument as in the proof of Theorem 1.57. ⊳
Example 2.32 As in Example 2.29, we continue to represent F64 as F2(θ),
where θ^6 + θ + 1 = 0. The orders of all the elements of F64 are listed now.
α ord α α ord α α ord α α ord α
000000 − 010000 63 100000 63 110000 63
000001 1 010001 21 100001 63 110001 63
000010 63 010010 21 100010 63 110010 63
000011 21 010011 63 100011 63 110011 21
000100 63 010100 9 100100 63 110100 63
000101 21 010101 63 100101 63 110101 63
000110 9 010110 7 100110 63 110110 21
000111 63 010111 7 100111 63 110111 63
001000 21 011000 7 101000 21 111000 63
001001 63 011001 7 101001 63 111001 21
001010 63 011010 9 101010 63 111010 3
001011 9 011011 63 101011 21 111011 3
001100 63 011100 9 101100 63 111100 63
001101 21 011101 63 101101 63 111101 63
001110 7 011110 63 101110 63 111110 21
001111 7 011111 9 101111 63 111111 63
The field Fq has exactly φ(q − 1) primitive elements. For q = 64, this num-
ber is φ(63) = 36. The above table shows that θ itself is a primitive element.
Therefore, the defining polynomial x^6 + x + 1 is a primitive polynomial. ¤
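Orders are equally easy to tabulate by brute force. The following Python sketch (illustrative; helper names ours) counts the φ(63) = 36 primitive elements of F64 and confirms that θ is one of them.

```python
def pmul(a, b):
    # carry-less multiplication of F2[x] polynomials packed into ints
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    # remainder of a modulo f in F2[x]
    df = f.bit_length()
    while a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a

F = 0b1000011           # x^6 + x + 1, defining F64 = F2(theta)

def fmul(a, b):
    return pmod(pmul(a, b), F)

def order(a):
    # smallest e >= 1 with a^e = 1 (a must be non-zero)
    e, b = 1, a
    while b != 1:
        b = fmul(b, a)
        e += 1
    return e

orders = {a: order(a) for a in range(1, 64)}
primitive = sorted(a for a, e in orders.items() if e == 63)
```

Every order divides q − 1 = 63, in agreement with Fermat's little theorem for Fq.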

2.4.3 Normal Elements


Let Fq = Fp (θ), where θ is a root of an irreducible polynomial f (x) ∈
Fp [x] of degree n. Since f (x) has a root θ in Fq , the polynomial is no longer
irreducible in Fq [x]. But what happens to the other n − 1 roots of f (x)?
In order to answer this question, we first review the extension of R by
adjoining a root i of x^2 + 1. The other root of this polynomial is −i, which is
included in C. Thus, the polynomial x^2 + 1 splits into linear factors over C as
x^2 + 1 = (x − i)(x + i). Since the defining polynomial is of degree two and has
one root in the extension, the other root must also reside in that extension.
Let us now extend the field Q by a root θ of the polynomial x^3 − 2. The three
roots of this polynomial are θ0 = 2^{1/3}, θ1 = 2^{1/3} e^{i2π/3}, and θ2 = 2^{1/3} e^{i4π/3}.
Let us take θ = θ0. Since this root is a real number, adjoining it to Q gives a
field contained in R. On the other hand, the roots θ1, θ2 are properly complex
numbers, that is, Q(θ) does not contain θ1, θ2. Indeed, the defining polynomial
factors over the extension as x^3 − 2 = (x − 2^{1/3})(x^2 + 2^{1/3} x + 2^{2/3}), the second
factor being irreducible in this extension.
The above two examples illustrate that an extension may or may not con-
tain all the roots of the defining polynomial. Let us now concentrate on finite
fields. Let us write f (x) explicitly as

f (x) = a0 + a1 x + a2 x2 + · · · + an xn

with each ai ∈ Fp . Exercise 1.34 implies f (x)p = ap0 +ap1 xp +ap2 x2p +· · ·+apn xnp .
By Fermat’s little theorem, api = ai in Fp , and so f (x)p = a0 + a1 xp + a2 x2p +
· · · + an xnp = f (xp ). Putting x = θ yields f (θp ) = f (θ)p = 0p = 0, that is,
θp is again a root of f (x). Moreover, θp ∈ Fq . We can likewise argue that
2 3 2
θp = (θp )p , θp = (θp )p , . . . are roots of f (x) and lie in Fq . One can show
2 n−1
that the roots θ, θp , θp , . . . , θp of f (x) are pairwise distinct and so must
be all the roots of f (x). In other words, f (x) splits into linear factors over Fq :

2 n−1
f (x) = an (x − θ)(x − θp )(x − θp ) · · · (x − θp ).

Definition 2.33 The elements θ^{p^i} for i = 0, 1, 2, . . . , n − 1 are called conju-
gates of θ. (More generally, the roots of an irreducible polynomial over any
field are called conjugates of one another.)
If θ, θ^p, θ^{p^2}, . . . , θ^{p^{n−1}} are linearly independent over Fp, θ is called a normal
element of Fq, and θ, θ^p, θ^{p^2}, . . . , θ^{p^{n−1}} a normal basis of Fq over Fp.
If a normal element θ is also a primitive element of Fq, we call θ a primitive
normal element, and θ, θ^p, θ^{p^2}, . . . , θ^{p^{n−1}} a primitive normal basis. ⊳

Example 2.34 We represent F64 as in Examples 2.29 and 2.32. The elements
θ^{2^i} for 0 ≤ i ≤ 5 are now expressed in the polynomial basis 1, θ, . . . , θ^5.

    θ    = θ
    θ^2  = θ^2
    θ^4  = θ^4
    θ^8  = θ^2 + θ^3
    θ^16 = 1 + θ + θ^4
    θ^32 = 1 + θ^3

In matrix notation, we have:

    [ θ    ]   [ 0 1 0 0 0 0 ] [ 1   ]
    [ θ^2  ]   [ 0 0 1 0 0 0 ] [ θ   ]
    [ θ^4  ] = [ 0 0 0 0 1 0 ] [ θ^2 ]
    [ θ^8  ]   [ 0 0 1 1 0 0 ] [ θ^3 ]
    [ θ^16 ]   [ 1 1 0 0 1 0 ] [ θ^4 ]
    [ θ^32 ]   [ 1 0 0 1 0 0 ] [ θ^5 ]
The last column of the 6 × 6 matrix consists only of zeros, that is, the transforma-
tion matrix is singular, that is, the elements θ^{2^i} for 0 ≤ i ≤ 5 are not linearly
independent. In fact, θ + θ^2 + θ^4 + θ^8 + θ^16 + θ^32 = 0. Therefore, θ is not a
normal element of F64. By Example 2.32, θ is a primitive element of Fq. Thus,
being a primitive element is not sufficient for being a normal element. ¤
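The normality test just demonstrated — build the matrix of the conjugates' coordinates and check its rank over F2 — can be sketched in Python as follows (illustrative; function names ours). In the chosen representation, the bits of an element are exactly its coordinates in the polynomial basis, so the rows of the matrix are simply the integers θ^{2^i}.

```python
def pmul(a, b):
    # carry-less multiplication of F2[x] polynomials packed into ints
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    # remainder of a modulo f in F2[x]
    df = f.bit_length()
    while a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a

F = 0b1000011           # x^6 + x + 1, defining F64 = F2(theta)

def fmul(a, b):
    return pmod(pmul(a, b), F)

def conjugate_rows(a):
    # coordinate rows of a, a^2, a^4, ..., a^32 in the basis 1, theta, ..., theta^5
    rows, b = [], a
    for _ in range(6):
        rows.append(b)
        b = fmul(b, b)
    return rows

def rank_gf2(rows):
    # Gaussian elimination over F2 on bitmask rows
    rows, r = list(rows), 0
    for bit in range(6):
        piv = next((i for i in range(r, 6) if (rows[i] >> bit) & 1), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(6):
            if i != r and (rows[i] >> bit) & 1:
                rows[i] ^= rows[r]
        r += 1
    return r

def is_normal(a):
    return rank_gf2(conjugate_rows(a)) == 6
```

This reproduces the findings of Examples 2.34 and 2.37: θ is not normal, while γ = θ^5 + 1 and δ = θ^5 + θ^4 + θ^3 + 1 are.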

2.4.4 Minimal Polynomials


The above discussion about conjugates and normal elements extends to
any arbitrary element α ∈ Fq. The conjugates of α are α, α^p, α^{p^2}, . . . , α^{p^{t−1}},
where t is the smallest positive integer for which α^{p^t} = α. The polynomial

    fα(x) = (x − α)(x − α^p)(x − α^{p^2}) · · · (x − α^{p^{t−1}})

is an irreducible polynomial in Fp[x]. We call fα(x) the minimal polynomial
of α over Fp, and t is called the degree of α. The element α is a root of a
polynomial g(x) ∈ Fp[x] if and only if fα(x) | g(x) in Fp[x].

Example 2.35 Extend F2 by θ satisfying θ^6 + θ + 1 = 0 to obtain F64.
(1) Let α = θ^5 + θ^4 + θ^3 + θ. By Example 2.29, α is of degree two.
We have α^2 = θ^10 + θ^8 + θ^6 + θ^2 = θ^4(θ + 1) + θ^2(θ + 1) + (θ + 1) + θ^2 =
θ^5 + θ^4 + θ^3 + θ + 1 = α + 1, and α^{2^2} = (α + 1)^2 = α^2 + 1 = (α + 1) + 1 = α. The
minimal polynomial of α is fα(x) = (x + α)(x + α^2) = (x + α)(x + α + 1) =
x^2 + (α + α + 1)x + α(α + 1) = x^2 + x + (α^2 + α) = x^2 + x + (α + 1 + α) = x^2 + x + 1.
(2) By Example 2.29, β = θ^4 + θ^3 is of degree three. We have β^2 = θ^8 + θ^6 =
θ^3 + θ^2 + θ + 1, β^4 = (β^2)^2 = θ^6 + θ^4 + θ^2 + 1 = θ^4 + θ^2 + θ, and β^8 = (β^4)^2 =
θ^8 + θ^4 + θ^2 = θ^2(θ + 1) + θ^4 + θ^2 = θ^4 + θ^3 = β. So the minimal polynomial
of β is (x + β)(x + β^2)(x + β^4) = x^3 + (β + β^2 + β^4)x^2 + (β^3 + β^5 + β^6)x + β^7.
Calculations in F64 show β + β^2 + β^4 = 1, β^3 + β^5 + β^6 = 0, and β^7 = 1, that
is, fβ(x) = x^3 + x^2 + 1. F8 is defined by this polynomial in Example 2.7(2).

(3) The element γ = θ^5 + 1 of degree six has the conjugates

    γ    = θ^5 + 1,
    γ^2  = θ^5 + θ^4 + 1,
    γ^4  = θ^5 + θ^4 + θ^3 + θ^2 + 1,
    γ^8  = θ^5 + θ^3 + θ^2 + θ,
    γ^16 = θ^5 + θ^2 + θ + 1, and
    γ^32 = θ^5 + θ^2 + 1.

(γ^64 = θ^5 + 1 = γ, as expected.) The minimal polynomial of γ is, therefore,
fγ(x) = (x + γ)(x + γ^2)(x + γ^4)(x + γ^8)(x + γ^16)(x + γ^32) = x^6 + x^5 + 1. ¤
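Minimal polynomials can be computed mechanically by multiplying out the linear factors over the conjugates, as the following Python sketch does (illustrative only; names ours). The coefficients of the product, computed in F64, necessarily collapse into F2.

```python
def pmul(a, b):
    # carry-less multiplication of F2[x] polynomials packed into ints
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    # remainder of a modulo f in F2[x]
    df = f.bit_length()
    while a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a

F = 0b1000011           # x^6 + x + 1, defining F64 = F2(theta)

def fmul(a, b):
    return pmod(pmul(a, b), F)

def minpoly(a):
    # collect the distinct conjugates a, a^2, a^4, ... (the cycle closes)
    conj, b = [], a
    while True:
        conj.append(b)
        b = fmul(b, b)
        if b == a:
            break
    # multiply out prod (y + c), coefficients computed in F64
    poly = [1]                            # poly[i] = coefficient of y^i
    for c in conj:
        new = [0] * (len(poly) + 1)
        for i, co in enumerate(poly):
            new[i] ^= fmul(co, c)         # the c * co contribution to y^i
            new[i + 1] ^= co              # the y * co contribution to y^(i+1)
        poly = new
    assert all(co in (0, 1) for co in poly)   # coefficients land in F2
    return sum(co << i for i, co in enumerate(poly))
```

Applied to α, β and γ of Example 2.35, this returns y^2 + y + 1, y^3 + y^2 + 1 and y^6 + y^5 + 1 (polynomials again encoded as bit masks), matching the hand computations and the GP/PARI session of Section 2.4.5.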
An element α ∈ F_{p^n} is called normal if α, α^p, α^{p^2}, . . . , α^{p^{n−1}} are linearly
independent over Fp. If the degree of α is a proper divisor of n, these ele-
ments cannot be linearly independent. So a necessary condition for α to be
normal is that α has degree n. If α is a normal element of Fq, the basis
α, α^p, α^{p^2}, . . . , α^{p^{n−1}} is called a normal basis of F_{p^n} over Fp. If, in addition,
α is primitive, it is called a primitive normal element of F_{p^n}, and the basis
α, α^p, α^{p^2}, . . . , α^{p^{n−1}} is called a primitive normal basis.

Theorem 2.36 For every p ∈ P and n ∈ N, the extension Fpn contains


a normal element.9 Moreover, for every p ∈ P and n ∈ N, there exists a
primitive normal element in Fpn .10 ⊳

Example 2.37 (1) Consider the element γ ∈ F64 of Example 2.35. We have

    [ γ    ]   [ 1 0 0 0 0 1 ] [ 1   ]
    [ γ^2  ]   [ 1 0 0 0 1 1 ] [ θ   ]
    [ γ^4  ] = [ 1 0 1 1 1 1 ] [ θ^2 ]
    [ γ^8  ]   [ 0 1 1 1 0 1 ] [ θ^3 ]
    [ γ^16 ]   [ 1 1 1 0 0 1 ] [ θ^4 ]
    [ γ^32 ]   [ 1 0 1 0 0 1 ] [ θ^5 ]

The 6 × 6 transformation matrix has determinant 1 modulo 2. Therefore, γ
is a normal element of F64, and γ, γ^2, γ^4, γ^8, γ^16, γ^32 constitute a normal basis
of F64 over F2. By Example 2.32, ord γ = 63, that is, γ is also a primitive
element of F64. Therefore, γ is a primitive normal element of F64, and the
basis γ, γ^2, γ^4, γ^8, γ^16, γ^32 is a primitive normal basis of F64 over F2.
(2) The conjugates of δ = θ^5 + θ^4 + θ^3 + 1 are
9 Eisenstein (1850) conjectured that normal bases exist for all finite fields. Kurt Hensel

(1888) first proved this conjecture. Hensel and Ore counted the number of normal elements
in a finite field.
10 The proof that primitive normal bases exist for all finite fields can be found in the

paper: Hendrik W. Lenstra, Jr. and René J. Schoof, Primitive normal bases for finite fields,
Mathematics of Computation, 48, 217–231, 1986.

    δ    = θ^5 + θ^4 + θ^3 + 1,
    δ^2  = θ^5 + θ^4 + θ^3 + θ^2 + θ,
    δ^4  = θ^5 + θ^3 + θ + 1,
    δ^8  = θ^5 + θ^4 + θ^2 + θ,
    δ^16 = θ^5 + θ^3, and
    δ^32 = θ^5 + θ^4 + θ + 1,

so that

    [ δ    ]   [ 1 0 0 1 1 1 ] [ 1   ]
    [ δ^2  ]   [ 0 1 1 1 1 1 ] [ θ   ]
    [ δ^4  ] = [ 1 1 0 1 0 1 ] [ θ^2 ]
    [ δ^8  ]   [ 0 1 1 0 1 1 ] [ θ^3 ]
    [ δ^16 ]   [ 0 0 0 1 0 1 ] [ θ^4 ]
    [ δ^32 ]   [ 1 1 0 0 1 1 ] [ θ^5 ]

The transformation matrix has determinant 1 modulo 2, that is, δ is a nor-
mal element of F64, and δ, δ^2, δ^4, δ^8, δ^16, δ^32 constitute a normal basis of F64
over F2. However, by Example 2.32, ord δ = 21, that is, δ is not a primitive
element of F64, that is, δ is not a primitive normal element of F64, and
δ, δ^2, δ^4, δ^8, δ^16, δ^32 is not a primitive normal basis of F64 over F2. Combin-
ing this observation with Example 2.34, we conclude that being a primitive
element is neither necessary nor sufficient for being a normal element. ¤
The relevant computational question here is how we can efficiently locate
normal elements in a field F_{p^n}. The first and obvious strategy is to keep
picking random elements from F_{p^n} until a normal element is found. Each nor-
mality check involves computing the determinant (or rank) of an n × n matrix
with entries from Fp, as demonstrated in Example 2.37. Another possibility
is to compute the gcd of two polynomials over F_{p^n} (see Exercise 3.42). Such
a random search is efficient, since the density of normal elements in a finite
field is significant. More precisely, a random element of F_{p^n} is normal over
Fp with probability ≥ 1/34 if n ≤ p^4, and with probability ≥ 1/(16 log_p n)
if n > p^4. These density estimates, and also a deterministic polynomial-time
algorithm based on polynomial root finding, can be found in the paper of
Von zur Gathen and Giesbrecht.11 This paper also proposes a randomized
polynomial-time algorithm for finding primitive normal elements.
A more efficient randomized algorithm is based on the following result,
proved by Emil Artin. This result is, however, inappropriate if p is small.
Proposition 2.38 Represent F_{p^n} = Fp(θ), where θ is a root of the monic
irreducible polynomial f(x) ∈ Fp[x] (of degree n). Consider the polynomial

    g(x) = f(x) / ((x − θ) f′(θ)) ∈ F_{p^n}[x],
11 Joachim Von Zur Gathen and Mark Giesbrecht, Constructing normal bases in finite

fields, Journal of Symbolic Computation, 10, 547–570, 1990.



where f′(x) is the formal derivative of f(x). Then, there are at least p − n(n − 1)
elements a in Fp for which g(a) is a normal element of F_{p^n} over Fp. ⊳

It follows that if p > 2n(n − 1), then for a random a ∈ Fp , the element
g(a) ∈ Fpn is normal over Fp with probability at least 1/2. Moreover, in this
case, a random element in Fpn is normal with probability at least 1/2 (as
proved by Gudmund Skovbjerg Frandsen from Aarhus University, Denmark).
Deterministic polynomial-time algorithms are known for locating normal
elements. For example, see Lenstra’s paper cited in Footnote 15 on page 114.

2.4.5 Implementing Some Functions in GP/PARI


Let us now see how we can program the GP/PARI calculator for computing
minimal polynomials and for checking normal elements. We first introduce the
defining polynomial for representing F64 .

gp > f = Mod(1,2)*x^6 + Mod(1,2)*x + Mod(1,2)


%1 = Mod(1, 2)*x^6 + Mod(1, 2)*x + Mod(1, 2)

Next, we define a function for computing the minimal polynomial of an


element of F64 . Since the variable x is already used in the representation of
F64 , we use a separate variable y for the minimal polynomial. The computed
polynomial is lifted twice to remove the moduli f and 2 in the output.

gp > minimalpoly(a) = \
p = y - a; \
b = a * a; \
while (b-a, p *= (y-b); b = b*b); \
lift(lift(p))
gp > minimalpoly(Mod(Mod(0,2),f))
%2 = y
gp > minimalpoly(Mod(Mod(1,2),f))
%3 = y + 1
gp > minimalpoly(Mod(Mod(1,2)*x,f))
%4 = y^6 + y + 1
gp > minimalpoly(Mod(Mod(1,2)*x^5+Mod(1,2)*x^4+Mod(1,2)*x^3+Mod(1,2)*x,f))
%5 = y^2 + y + 1
gp > minimalpoly(Mod(Mod(1,2)*x^4+Mod(1,2)*x^3,f))
%6 = y^3 + y^2 + 1
gp > minimalpoly(Mod(Mod(1,2)*x^5+Mod(1,2),f))
%7 = y^6 + y^5 + 1
gp > minimalpoly(Mod(Mod(1,2)*x^5+Mod(1,2)*x^2+Mod(1,2)*x+Mod(1,2),f))
%8 = y^6 + y^5 + 1

Now, we define a function for checking whether an element of F64 is normal.


We compute the transformation matrix M and subsequently its determinant.

gp > pc(p,i) = polcoeff(p,i)


gp > isnormal(a0) = \
a1 = (a0^2) % f; a2 = (a1^2) % f; a3 = (a2^2) % f; \
a4 = (a3^2) % f; a5 = (a4^2) % f; \
M = Mat([ pc(a0,0),pc(a0,1),pc(a0,2),pc(a0,3),pc(a0,4),pc(a0,5); \
pc(a1,0),pc(a1,1),pc(a1,2),pc(a1,3),pc(a1,4),pc(a1,5); \
pc(a2,0),pc(a2,1),pc(a2,2),pc(a2,3),pc(a2,4),pc(a2,5); \
pc(a3,0),pc(a3,1),pc(a3,2),pc(a3,3),pc(a3,4),pc(a3,5); \
pc(a4,0),pc(a4,1),pc(a4,2),pc(a4,3),pc(a4,4),pc(a4,5); \
pc(a5,0),pc(a5,1),pc(a5,2),pc(a5,3),pc(a5,4),pc(a5,5) ]); \
printp("M = ", lift(M)); print("det(M) = ",matdet(M)); \
if(matdet(M)==Mod(1,2), print("normal");1, print("not normal");0)

We pass a polynomial in F2 [x] of degree less than six as the only argument
of isnormal() to check whether this corresponds to a normal element of F64 .

gp > isnormal(Mod(1,2)*x^5+Mod(1,2)*x^4+Mod(1,2)*x^3+Mod(1,2))
M =
[1 0 0 1 1 1]

[0 1 1 1 1 1]

[1 1 0 1 0 1]

[0 1 1 0 1 1]

[0 0 0 1 0 1]

[1 1 0 0 1 1]

det(M) = Mod(1, 2)
normal
%9 = 1
gp > isnormal(Mod(1,2)*x^5+Mod(1,2)*x)
M =
[0 1 0 0 0 1]

[0 0 1 0 1 1]

[0 0 1 1 0 1]

[1 1 0 0 0 1]

[1 0 1 0 1 1]

[1 0 1 1 0 1]

det(M) = Mod(0, 2)
not normal
%10 = 0

2.5 Alternative Representations of Finite Fields


So far, we have used the polynomial-basis representation of extension fields.
We are now equipped with sophisticated machinery for investigating several
alternative ways of representing extension fields.

2.5.1 Representation with Respect to Arbitrary Bases


F_{p^n} is an n-dimensional vector space over Fp. For a root θ of an irreducible
polynomial in Fp[x] of degree n, the elements 1, θ, θ^2, . . . , θ^{n−1} form an Fp-
basis of F_{p^n}, that is, every element of F_{p^n} can be written as a unique Fp-linear
combination of the basis elements. This representation can be generalized to
any arbitrary Fp-basis of F_{p^n}.
Let θ0, θ1, . . . , θ_{n−1} be n linearly independent elements of F_{p^n}. Then, any
element α ∈ F_{p^n} can be written uniquely as α = a0 θ0 + a1 θ1 + · · · + a_{n−1} θ_{n−1}
with each ai ∈ Fp. Let β = b0 θ0 + b1 θ1 + · · · + b_{n−1} θ_{n−1} be another element of
F_{p^n} in this representation. The sum of these elements can be computed easily
as α + β = (a0 + b0)θ0 + (a1 + b1)θ1 + · · · + (a_{n−1} + b_{n−1})θ_{n−1}, where each
ai + bi stands for an addition in Fp.
Multiplication of the elements α and β encounters some difficulty. We have
αβ = Σ_{i,j} ai bj θi θj. For each pair (i, j), we need to express θi θj in the basis
θ0, θ1, . . . , θ_{n−1}. Suppose θi θj = t_{i,j,0} θ0 + t_{i,j,1} θ1 + · · · + t_{i,j,n−1} θ_{n−1}. Then,

    αβ = (Σ_{i,j} ai bj t_{i,j,0}) θ0 + (Σ_{i,j} ai bj t_{i,j,1}) θ1 + · · · + (Σ_{i,j} ai bj t_{i,j,n−1}) θ_{n−1}.

It is, therefore, preferable to precompute the values t_{i,j,k} for all indices i, j, k
between 0 and n − 1. This requires a storage overhead of O(n^3), which is
reasonable unless n is large. However, a bigger problem in this regard pertains
to the time complexity (Θ(n^3) operations in the field Fp) of expressing αβ in
the basis θ0, θ1, . . . , θ_{n−1}. For the polynomial-basis representation, a product
in F_{p^n} can be computed using only Θ(n^2) operations in Fp.
Example 2.39 Let F8 = F2(θ), where θ^3 + θ + 1 = 0. The elements θ0 = 1,
θ1 = 1 + θ, and θ2 = 1 + θ + θ^2 are evidently linearly independent over F2.
We write θi θj in the basis θ0, θ1, θ2.

    θ0^2 = 1 = θ0,                     θ0 θ1 = θ1 θ0 = 1 + θ = θ1,
    θ1^2 = 1 + θ^2 = θ0 + θ1 + θ2,     θ0 θ2 = θ2 θ0 = 1 + θ + θ^2 = θ2,
    θ2^2 = 1 + θ = θ1,                 θ1 θ2 = θ2 θ1 = θ = θ0 + θ1.

Now, consider the elements α = θ0 + θ2 and β = θ1 + θ2 expressed in the basis
θ0, θ1, θ2. Their sum is α + β = (1 + 0)θ0 + (0 + 1)θ1 + (1 + 1)θ2 = θ0 + θ1,
whereas their product is αβ = (θ0 + θ2)(θ1 + θ2) = θ0 θ1 + θ0 θ2 + θ2 θ1 + θ2^2 =
(θ1) + (θ2) + (θ0 + θ1) + (θ1) = θ0 + θ1 + θ2. ¤
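The table-driven multiplication described above can be sketched in Python (illustrative only; names ours). The table T below is exactly the collection of coefficients t_{i,j,k} for the basis of Example 2.39, and the triple loop is the Θ(n^3) Fp-operation count mentioned in the text.

```python
def pmul(a, b):
    # carry-less multiplication of F2[x] polynomials packed into ints
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    # remainder of a modulo f in F2[x]
    df = f.bit_length()
    while a.bit_length() >= df:
        a ^= f << (a.bit_length() - df)
    return a

F8 = 0b1011                       # x^3 + x + 1
basis = [0b001, 0b011, 0b111]     # theta_0 = 1, theta_1 = 1+x, theta_2 = 1+x+x^2

def mul8(a, b):
    return pmod(pmul(a, b), F8)

def coords(elem):
    # brute-force the unique F2-combination of basis elements equal to elem
    for m in range(8):
        v = 0
        for i in range(3):
            if (m >> i) & 1:
                v ^= basis[i]
        if v == elem:
            return [(m >> i) & 1 for i in range(3)]

# precompute t[i][j][k]: coordinates of theta_i * theta_j  (O(n^3) storage)
T = [[coords(mul8(basis[i], basis[j])) for j in range(3)] for i in range(3)]

def mul_in_basis(a, b):
    # Theta(n^3) F_p-operations: alpha*beta = sum_{i,j} a_i b_j (theta_i theta_j)
    out = [0, 0, 0]
    for i in range(3):
        for j in range(3):
            if a[i] and b[j]:
                for k in range(3):
                    out[k] ^= T[i][j][k]
    return out
```

Multiplying α = θ0 + θ2 and β = θ1 + θ2 in coordinates reproduces the product θ0 + θ1 + θ2 of Example 2.39.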

2.5.2 Normal and Optimal Normal Bases


In general, multiplication is encumbered by a representation of elements of Fpn in an arbitrary basis. There are some special kinds of bases which yield computational benefit. One such example is a normal basis. Let ψ ∈ Fpn be a normal element, so that ψ, ψ^p, ψ^{p^2}, . . . , ψ^{p^{n−1}} constitute an Fp-basis of Fpn, that is, every element α ∈ Fpn can be written uniquely as α = a0ψ + a1ψ^p + a2ψ^{p^2} + · · · + a_{n−1}ψ^{p^{n−1}}. Many applications over Fpn (such as in cryptography) involve exponentiation of elements in Fpn. We expand the exponent e in base p as e = e_{l−1}p^{l−1} + e_{l−2}p^{l−2} + · · · + e1p + e0 with each ei ∈ {0, 1, 2, . . . , p − 1}. We then have α^e = (α^{p^{l−1}})^{e_{l−1}} (α^{p^{l−2}})^{e_{l−2}} · · · (α^p)^{e1} (α)^{e0}. If α is represented using a normal basis as mentioned above, we have α^p = a0^p ψ^p + a1^p ψ^{p^2} + a2^p ψ^{p^3} + · · · + a_{n−1}^p ψ^{p^n} = a_{n−1}ψ + a0ψ^p + a1ψ^{p^2} + · · · + a_{n−2}ψ^{p^{n−1}} (since ai^p = ai for ai ∈ Fp, and ψ^{p^n} = ψ), that is, the representation of α^p in the normal basis can be obtained by cyclically rotating the coefficients a0, a1, a2, . . . , a_{n−1}. Thus, it is extremely easy to compute p-th power exponentiations under the normal-basis representation.
Computing α^e involves some multiplications too. For this, we need to express ψ^{p^i} ψ^{p^j} in the basis ψ, ψ^p, ψ^{p^2}, . . . , ψ^{p^{n−1}} for all i, j between 0 and n − 1. For i ≤ j, we have ψ^{p^i} ψ^{p^j} = (ψψ^{p^{j−i}})^{p^i}, that is, the representation of ψ^{p^i} ψ^{p^j} can be obtained by cyclically rotating the representation of ψψ^{p^{j−i}} by i positions. So it suffices to store data only for ψψ^{p^i} for i = 0, 1, 2, . . . , n − 1.
The complexity of the normal basis ψ, ψ^p, ψ^{p^2}, . . . , ψ^{p^{n−1}} is defined to be the total number of non-zero coefficients in the expansions of ψψ^{p^i} for all i = 0, 1, 2, . . . , n − 1. Normal bases of small complexities are preferred in software and hardware implementations. The minimum possible value of this complexity is 2n − 1. A normal basis with the minimum possible complexity is called an optimal normal basis.12 Unlike normal and primitive normal bases, optimal normal bases do not exist for all values of p and n.
Example 2.40 Let F8 = F2(θ), where θ^3 + θ + 1 = 0. Take ψ = θ + 1. We have ψ^2 = θ^2 + 1 and ψ^4 = θ^2 + θ + 1, that is,

    ( ψ   )   ( 1 1 0 ) ( 1   )
    ( ψ^2 ) = ( 1 0 1 ) ( θ   )
    ( ψ^4 )   ( 1 1 1 ) ( θ^2 )

with the coefficient matrix having determinant 1 modulo 2. Thus, ψ is a normal element of F8. We express ψ0ψi in the basis ψ0, ψ1, ψ2, where ψi = ψ^{2^i}.

    ψ · ψ = θ^2 + 1 = ψ1,    ψ · ψ^2 = θ^2 = ψ0 + ψ2,    ψ · ψ^4 = θ = ψ1 + ψ2.

Consequently, the complexity of the normal basis ψ0, ψ1, ψ2 is 5 = 2 × 3 − 1, that is, the basis is optimal. ¤
12 R. Mullin, I. Onyszchuk, S. Vanstone and R. Wilson, Optimal normal bases in GF(p^n), Discrete Applied Mathematics, 22, 149–161, 1988/89.
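The claims of Example 2.40 are easy to verify mechanically. The sketch below (my own illustration, not the book's code; the helper names mul8 and in_basis are mine) implements F8 as 3-bit polynomials modulo θ^3 + θ + 1, computes ψi = ψ^{2^i} for ψ = θ + 1, expresses every product ψψi in the basis ψ0, ψ1, ψ2 by brute force, and also checks the coefficient-rotation rule for squaring stated above.

```python
from itertools import product

MOD = 0b1011  # theta^3 + theta + 1

def mul8(a, b):
    # Carry-less multiplication of 3-bit polynomials modulo theta^3 + theta + 1.
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= MOD
    return r

psi = 0b011                              # psi = theta + 1
basis = [psi, mul8(psi, psi)]            # psi, psi^2
basis.append(mul8(basis[1], basis[1]))   # psi^4

def in_basis(x):
    # Express x in the basis psi0, psi1, psi2 by exhaustive search over F_2^3.
    for c in product((0, 1), repeat=3):
        if x == (c[0] * basis[0]) ^ (c[1] * basis[1]) ^ (c[2] * basis[2]):
            return c
    raise ValueError("not spanned by the claimed basis")

# Complexity = total number of non-zero coefficients of psi*psi_i, i = 0, 1, 2.
complexity = sum(sum(in_basis(mul8(psi, b))) for b in basis)

def from_coords(c):
    # a0*psi + a1*psi^2 + a2*psi^4 with a_i in {0, 1}.
    return (c[0] * basis[0]) ^ (c[1] * basis[1]) ^ (c[2] * basis[2])

# Squaring must cyclically rotate normal-basis coordinates: (a0,a1,a2) -> (a2,a0,a1).
rotation_ok = all(
    mul8(from_coords(c), from_coords(c)) == from_coords((c[2], c[0], c[1]))
    for c in product((0, 1), repeat=3)
)
```

Here basis comes out as [3, 5, 7] (that is, θ + 1, θ^2 + 1, θ^2 + θ + 1), complexity as 5 = 2·3 − 1 as claimed, and rotation_ok as True.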
110 Computational Number Theory

2.5.3 Discrete-Log Representation
Another interesting representation of a finite field Fq (q may be prime) results from the observation that F∗q contains primitive elements. Take a primitive element γ. Every non-zero α ∈ Fq can be represented as α = γ^i for some unique i in the range 0 ≤ i ≤ q − 2, that is, we represent α by the index i. Multiplication and exponentiation are trivial in this representation, namely, γ^i γ^j = γ^k, where k ≡ i + j (mod q − 1), and (γ^i)^e = γ^l, where l ≡ ie (mod q − 1). Moreover, the inverse of γ^i is γ^{q−1−i} for i > 0.

Carrying out addition and subtraction becomes non-trivial in this representation. For example, given i, j, we need to find out k satisfying γ^i + γ^j = γ^k. We do not know an easy way of computing k from i, j. If q is small, one may precompute and store k for each pair (i, j). Addition is then performed by table lookup. The storage requirement is Θ(q^2), which is prohibitively large except only for small values of q.

A trick can reduce the storage requirement to Θ(q). We precompute and store only the values k for the pairs (0, j). For each j ∈ {0, 1, 2, . . . , q − 2}, we precompute and store zj satisfying γ^{zj} = 1 + γ^j. The quantities zj are called Zech's logarithms or Jacobi's logarithms.13 For i ≤ j, we compute γ^i + γ^j = γ^i(1 + γ^{j−i}) = γ^i γ^{z_{j−i}} = γ^k, where k ≡ i + z_{j−i} (mod q − 1).

Here, we have assumed both the operands γ^i, γ^j to be non-zero and also their sum γ^i + γ^j = γ^k to be non-zero. If one of the operands is zero (or both are zero), one sets γ^i + 0 = 0 + γ^i = γ^i (or 0 + 0 = 0), whereas if γ^i + γ^j = 0 (equivalently, if γ^{j−i} = −1), the Zech logarithm z_{j−i} is not defined. An undefined z_{j−i} thus signals that the sum γ^i + γ^j is zero.
Example 2.41 (1) 3 is a primitive element in the prime field F17. The powers of 3 are given in the following table.

    i             0  1  2  3   4   5  6   7   8   9   10  11  12  13  14  15  16
    3^i (mod 17)  1  3  9  10  13  5  15  11  16  14  8   7   4   12  2   6   1

From this table, the Zech's logarithm table can be computed as follows. For j ∈ {0, 1, 2, . . . , 15}, compute 1 + 3^j (mod 17) and then locate the value zj for which 3^{zj} ≡ 1 + 3^j (mod 17).

    j    0   1   2  3  4  5   6  7   8  9  10  11  12  13  14  15
    zj   14  12  3  7  9  15  8  13  −  6  2   10  5   4   1   11

The Zech logarithm table can be used as follows. Take i = 8 and j = 13. Then, 3^i + 3^j ≡ 3^k (mod 17), where k ≡ i + z_{j−i} ≡ 8 + z5 ≡ 8 + 15 ≡ 23 ≡ 7 (mod 16).
(2) Let F8 = F2(θ), where θ^3 + θ + 1 = 0. Consider the powers of γ = θ.

    i     0  1  2    3      4        5            6
    γ^i   1  θ  θ^2  θ + 1  θ^2 + θ  θ^2 + θ + 1  θ^2 + 1

13 The concept of Zech's logarithms was introduced near the mid-nineteenth century. Reference to Zech's logarithms is found in Jacobi's work.
Zech's logarithms to the base γ = θ are listed next.

    j    0  1  2  3  4  5  6
    zj   −  3  6  1  5  4  2

Let us compute k with γ^k = γ^i + γ^j for i = 2 and j = 5. We have k ≡ i + z_{j−i} ≡ 2 + z3 ≡ 2 + 1 ≡ 3 (mod 7). Indeed, γ^2 + γ^5 = θ^2 + (θ^2 + θ + 1) = θ + 1 = γ^3. ¤
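Building a Zech-logarithm table by the procedure just illustrated takes only a few lines. The Python sketch below (mine, not from the text; the function names are my own) constructs the table for F17 with γ = 3 and performs additions with it; zech_add returns None when the sum is zero.

```python
def zech_table(p, g):
    # dlog[a] = index i with g^i = a (mod p); z[j] is the Zech logarithm of j,
    # i.e., g^(z[j]) = 1 + g^j (mod p), or None when 1 + g^j = 0.
    dlog = {pow(g, i, p): i for i in range(p - 1)}
    return [dlog.get((1 + pow(g, j, p)) % p) for j in range(p - 1)]

def zech_add(i, j, p, z):
    # Compute k with g^i + g^j = g^k via k = i + z[j-i] (mod p-1), taking i <= j.
    i, j = min(i, j), max(i, j)
    zj = z[j - i]
    if zj is None:            # g^(j-i) = -1, so the sum is zero
        return None
    return (i + zj) % (p - 1)

z17 = zech_table(17, 3)
```

For i = 8, j = 13 this gives k = 7, agreeing with 3^8 + 3^13 ≡ 16 + 12 ≡ 11 ≡ 3^7 (mod 17); the entry z17[8] is None because 1 + 3^8 ≡ 0 (mod 17).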

2.5.4 Representation with Towers of Extensions
In certain situations, we use one or more intermediate fields for represent-
ing Fpn . Suppose that n = st with s, t ∈ N. We first represent Fps using one
of the methods discussed earlier. Then, we represent Fpst as an extension of
Fps . Let us first concentrate on polynomial-basis representations. Let f (x) be
an irreducible polynomial in Fp [x] of degree s. Adjoining a root θ of f (x) to
Fp gives a representation of the field Fps . As the next step, we choose an ir-
reducible polynomial g(y) of degree t from Fps [y]. We adjoin a root ψ of g(y)
to Fps in order to obtain a representation of Fpst = Fpn .

Example 2.42 Let us represent F64 using the intermediate field F8. First, represent F8 as the extension of F2 obtained by adjoining a root θ of the irreducible polynomial f(x) = x^3 + x + 1 ∈ F2[x]. Thus, every element of F8 is of the form a2θ^2 + a1θ + a0, and the arithmetic in F8 is the polynomial arithmetic of F2[x] modulo f(x).

Next, consider the polynomial g(y) = y^2 + (θ^2 + 1)y + θ ∈ F8[y]. One can easily verify that g(α) ≠ 0 for all α ∈ F8, that is, g(y) has no root in F8. Since the degree of g(y) is 2, it follows that g(y) is irreducible in F8[y]. Let ψ be a root of g(y), which we adjoin to F8 in order to obtain the extension F64 of F8. Thus, every element of F64 is now a polynomial in ψ of degree < 2, that is, of the form u1ψ + u0, where u0, u1 are elements of F8, that is, polynomials in θ of degrees < 3. The arithmetic of F64 in this representation is the polynomial arithmetic of F8[y] modulo the irreducible polynomial g(y). The coefficients of these polynomials follow the arithmetic of F8, that is, the polynomial arithmetic of F2[x] modulo f(x).
As a specific example, consider the elements α = (θ + 1)ψ + (θ^2) and β = (θ^2 + θ + 1)ψ + (1) in F64. Their sum is

    α + β = [(θ + 1) + (θ^2 + θ + 1)]ψ + [θ^2 + 1] = (θ^2)ψ + (θ^2 + 1),

and their product is

    αβ = [(θ + 1)(θ^2 + θ + 1)]ψ^2 + [(θ + 1)(1) + (θ^2)(θ^2 + θ + 1)]ψ + [(θ^2)(1)]
       = (θ^3 + 1)ψ^2 + (θ^4 + θ^3 + θ^2 + θ + 1)ψ + (θ^2)
       = (θ + 1 + 1)ψ^2 + [θ(θ + 1) + (θ + 1) + θ^2 + θ + 1]ψ + (θ^2)    [since θ^3 + θ + 1 = 0]
       = (θ)ψ^2 + (θ)ψ + (θ^2)
       = (θ)[(θ^2 + 1)ψ + θ] + (θ)ψ + (θ^2)    [since ψ^2 + (θ^2 + 1)ψ + θ = 0]
       = [θ(θ^2 + 1) + θ]ψ + [(θ)(θ) + θ^2]
       = (θ^3)ψ + 0
       = (θ + 1)ψ    [since θ^3 + θ + 1 = 0].
This multiplication involves several modular operations. First, there is reduc-
tion modulo 2; second, there is reduction modulo f (x); and finally, there is
reduction modulo g(y).
One may view F64 in this representation as an F2-vector space with a basis consisting of the six elements θ^i ψ^j for i = 0, 1, 2, and j = 0, 1. ¤
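The tower arithmetic of Example 2.42 can be cross-checked with a short script. In the sketch below (my illustration; the names mul8, add64, and mul64 are not from the book), an element of F8 is a 3-bit integer multiplied modulo θ^3 + θ + 1, and an element of F64 is a pair (u1, u0) standing for u1ψ + u0, reduced with ψ^2 = (θ^2 + 1)ψ + θ.

```python
def mul8(a, b):
    # F_8 multiplication: 3-bit polynomials over F_2 modulo theta^3 + theta + 1.
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011
    return r

def add64(x, y):
    # Coefficient-wise addition in F_64 = F_8(psi).
    return (x[0] ^ y[0], x[1] ^ y[1])

def mul64(x, y):
    # (u1*psi + u0)(v1*psi + v0), reduced with psi^2 = (theta^2+1)*psi + theta.
    (u1, u0), (v1, v0) = x, y
    w = mul8(u1, v1)                                     # coefficient of psi^2
    c1 = mul8(u1, v0) ^ mul8(u0, v1) ^ mul8(w, 0b101)    # 0b101 = theta^2 + 1
    c0 = mul8(u0, v0) ^ mul8(w, 0b010)                   # 0b010 = theta
    return (c1, c0)

# alpha = (theta+1)psi + theta^2 and beta = (theta^2+theta+1)psi + 1.
alpha, beta = (0b011, 0b100), (0b111, 0b001)
```

Here add64(alpha, beta) gives (0b100, 0b101) = (θ^2)ψ + (θ^2 + 1), and mul64(alpha, beta) gives (0b011, 0b000) = (θ + 1)ψ, matching the hand computation above.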
The above construction can be readily generalized to other representations
of finite fields. In general, if we have a way to perform arithmetic in a field F ,
we can also perform the arithmetic in the polynomial ring F [x]. If f (x) ∈ F [x]
is an irreducible polynomial, we can compute remainders of polynomials in
F [x] modulo f (x). This gives us an algorithm to implement the arithmetic of
the field obtained by adjoining a root of f (x) to F .
Example 2.43 Let us represent F8 in the F2-basis consisting of the elements θ0 = 1, θ1 = 1 + θ, and θ2 = 1 + θ + θ^2, where θ^3 + θ + 1 = 0 (see Example 2.39). In this basis, the polynomial g(y) of Example 2.42 is written as g(y) = (θ0)y^2 + (θ0 + θ1 + θ2)y + (θ0 + θ1). We represent F64 in the polynomial basis 1, ψ over F8, where g(ψ) = 0. The elements α, β of Example 2.42 are expressed as

    α = (θ1)ψ + (θ1 + θ2),
    β = (θ2)ψ + (θ0).

The sum of these elements is

    α + β = (θ1 + θ2)ψ + (θ0 + θ1 + θ2),

whereas their product is

    αβ = (θ1θ2)ψ^2 + (θ1θ0 + θ1θ2 + θ2^2)ψ + (θ1θ0 + θ2θ0)
       = [(θ0 + θ1)]ψ^2 + [(θ1) + (θ0 + θ1) + (θ1)]ψ + [(θ1) + (θ2)]
       = (θ0 + θ1)ψ^2 + (θ0 + θ1)ψ + (θ1 + θ2)
       = (θ0 + θ1)[(θ0 + θ1 + θ2)ψ + (θ0 + θ1)] + (θ0 + θ1)ψ + (θ1 + θ2)
       = [(θ0 + θ1)(θ0 + θ1 + θ2) + (θ0 + θ1)]ψ + [(θ0 + θ1)^2 + (θ1 + θ2)]
       = (θ0^2 + θ0θ1 + θ0θ2 + θ1θ0 + θ1^2 + θ1θ2 + θ0 + θ1)ψ + (θ0^2 + θ1^2 + θ1 + θ2)
       = (θ0^2 + θ0θ2 + θ1^2 + θ1θ2 + θ0 + θ1)ψ + (θ0^2 + θ1^2 + θ1 + θ2)
       = [(θ0) + (θ2) + (θ0 + θ1 + θ2) + (θ0 + θ1) + θ0 + θ1]ψ + [(θ0) + (θ0 + θ1 + θ2) + θ1 + θ2]
       = (θ1)ψ.

Since θ1 = θ + 1, this result tallies (it should!) with Example 2.42. ¤
2.6 Computing Isomorphisms among Representations
We have adjoined a root of an irreducible polynomial in order to obtain
an extension field. What happens if we adjoin another root of the same ir-
reducible polynomial? Do we get a different field? Let f (x) ∈ F [x] be an
irreducible polynomial, and let θ, ψ be two roots of f (x). Consider the exten-
sions K = F (θ) and L = F (ψ). The two sets K, L may or may not be the
same. However, there exists an isomorphism between the fields K and L. The
following theorem implies that K and L are algebraically indistinguishable. So
it does not matter which root of f (x) is adjoined to represent the extension.
Theorem 2.44 There exists an isomorphism K → L of fields that fixes each
element of F and that maps θ to ψ. ⊳
Now, let f (x), g(x) ∈ F [x] be two different irreducible polynomials of the
same degree. Let θ be a root of f (x), and ψ a root of g(x). The fields K = F (θ)
and L = F (ψ) may or may not be isomorphic. However, if F is a finite field,
the fields K and L have the same size, and so are isomorphic. Now, we discuss
an algorithm to compute an explicit isomorphism between K and L.
More concretely, let F = Fp , and let both K and L be extensions of Fp
of degree n. Suppose that K is represented in the Fp -basis θ0 , θ1 , θ2 , . . . , θn−1 ,
and L is represented in the Fp -basis ψ0 , ψ1 , ψ2 , . . . , ψn−1 . We plan to compute
an isomorphism µ : K → L that fixes the common subfield Fp element-wise.
It suffices to define µ only for the basis elements θ0 , θ1 , θ2 , . . . , θn−1 . Write
µ(θ0 ) = t0,0 ψ0 + t0,1 ψ1 + t0,2 ψ2 + · · · + t0,n−1 ψn−1 ,
µ(θ1 ) = t1,0 ψ0 + t1,1 ψ1 + t1,2 ψ2 + · · · + t1,n−1 ψn−1 ,
···
µ(θn−1 ) = tn−1,0 ψ0 + tn−1,1 ψ1 + tn−1,2 ψ2 + · · · + tn−1,n−1 ψn−1 ,
where each ti,j ∈ Fp . The n×n matrix T whose (i, j)-th element is ti,j is called
the transformation matrix from the representation K to the representation L
of Fpn . In matrix notation, we have
    (µ(θ0), µ(θ1), . . . , µ(θ_{n−1}))^t = T (ψ0, ψ1, . . . , ψ_{n−1})^t.

Take an element α = a0θ0 + a1θ1 + · · · + a_{n−1}θ_{n−1} ∈ K. By linearity, we have µ(α) = a0µ(θ0) + a1µ(θ1) + · · · + a_{n−1}µ(θ_{n−1}), that is,

    µ(α) = (a0 a1 · · · a_{n−1}) (µ(θ0), µ(θ1), . . . , µ(θ_{n−1}))^t = (a0 a1 · · · a_{n−1}) T (ψ0, ψ1, . . . , ψ_{n−1})^t.
The last expression gives the representation of µ(α) in the basis ψ0, ψ1, ψ2, . . . , ψ_{n−1}. Thus, µ is completely specified by the transformation matrix T.
Let us specialize to polynomial-basis representations of both K and L, that is, θi = θ^i and ψj = ψ^j, where f(θ) = 0 = g(ψ) for some irreducible polynomials f(x), g(x) ∈ Fp[x] of degree n. Since µ is an isomorphism of fields, µ(θ^i) = µ(θ)^i for all i. Therefore, µ is fully specified by the element µ(θ) only. Since 0 = µ(0) = µ(f(θ)) = f(µ(θ)), the element µ(θ) ∈ L must be a root of f(x). All roots of an irreducible polynomial being algebraically indistinguishable, the task of computing µ reduces to computing any root θ′ of f(x) in L, and setting µ(θ) = θ′.
Example 2.45 Consider two representations K, L of F8, where K = F2(θ) with θ^3 + θ + 1 = 0, and L = F2(ψ) with ψ^3 + ψ^2 + 1 = 0. The three roots of f(x) = x^3 + x + 1 in L are ψ + 1, ψ^2 + 1, ψ^2 + ψ. The choice θ′ = µ(θ) = ψ^2 + 1 gives

    µ(1) = 1,
    µ(θ) = ψ^2 + 1,
    µ(θ^2) = µ(θ)^2 = ψ^4 + 1 = ψ(ψ^2 + 1) + 1 = ψ^3 + ψ + 1 = (ψ^2 + 1) + ψ + 1 = ψ^2 + ψ.

Thus, the transformation matrix is

        ( 1 0 0 )
    T = ( 1 0 1 )
        ( 0 1 1 )

Take the elements α = θ + 1 and β = θ^2 + θ + 1 in K. Since (1 1 0) T = (0 0 1) and (1 1 1) T = (0 1 0), we have α′ = µ(α) = ψ^2, and β′ = µ(β) = ψ. We have α + β = θ^2, so µ(α + β) = µ(θ^2) = ψ^2 + ψ = α′ + β′ = µ(α) + µ(β), as expected. Moreover, αβ = θ^3 + 1 = θ, so µ(αβ) = µ(θ) = ψ^2 + 1, whereas α′β′ = ψ^3 = ψ^2 + 1, that is, µ(αβ) = µ(α)µ(β), as expected. ¤
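The whole procedure of Example 2.45 — find a root of f(x) in L, define µ by it, and check that µ respects the arithmetic — fits in a few lines. The following Python sketch (my own, not code from the book; the function names are mine) represents K and L by the moduli x^3 + x + 1 and x^3 + x^2 + 1, with elements as 3-bit integers.

```python
def gf8_mul(a, b, mod):
    # Multiply 3-bit polynomials over F_2 modulo the given cubic `mod`.
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= mod
    return r

K, L = 0b1011, 0b1101          # x^3 + x + 1 and x^3 + x^2 + 1

def f_in_L(a):
    # Evaluate f(x) = x^3 + x + 1 at a, with arithmetic done in L.
    return gf8_mul(gf8_mul(a, a, L), a, L) ^ a ^ 1

roots = sorted(a for a in range(8) if f_in_L(a) == 0)

# mu is determined by mu(theta) = a root of f in L, and mu(theta^i) = mu(theta)^i.
mu_theta = 0b101               # psi^2 + 1, the choice made in the example
mu = [1, mu_theta, gf8_mul(mu_theta, mu_theta, L)]     # images of 1, theta, theta^2

def apply_mu(x):
    # Map a0 + a1*theta + a2*theta^2 to a0*mu(1) + a1*mu(theta) + a2*mu(theta^2).
    r = 0
    for i in range(3):
        if (x >> i) & 1:
            r ^= mu[i]
    return r
```

Here roots evaluates to [3, 5, 6], i.e., ψ + 1, ψ^2 + 1, ψ^2 + ψ, and one can verify that apply_mu(gf8_mul(a, b, K)) equals gf8_mul(apply_mu(a), apply_mu(b), L) for all 64 pairs, so µ is indeed an isomorphism.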

In order to complete the description of the algorithm for the computation of µ, we need to describe how the roots of a polynomial can be computed
in a finite field. We defer this study until Chapter 3. The root-finding algo-
rithms we study there are randomized, and run in polynomial time. There are
deterministic polynomial-time algorithms14 under the assumption that the ex-
tended Riemann hypothesis (ERH) is true. No deterministic polynomial-time
algorithm not based on unproven assumptions is known for polynomial root
finding (in finite fields). However, the problem of computing isomorphisms
between two representations of a finite field can be solved by deterministic
polynomial-time algorithms without resorting to root finding.15 For practical
purposes, our randomized algorithm based on root finding suffices.

14 S. A. Evdokimov, Factorization of solvable polynomials over finite fields and the generalized Riemann hypothesis, Journal of Mathematical Sciences, 59(3), 842–849, 1992. This is a translation of a Russian article published in 1989.
15 Hendrik W. Lenstra, Jr., Finding isomorphisms between finite fields, Mathematics of Computation, 56(193), 329–347, 1991.
Exercises
1. Let F be a field (not necessarily finite), and F [x] the ring of polynomials in
one indeterminate x. Let f(x), g(x) ∈ F[x] with g(x) ≠ 0.
(a) Prove that there exist unique polynomials q(x), r(x) ∈ F [x] satisfying
f (x) = q(x)g(x) + r(x), and either r(x) = 0 or deg r(x) < deg g(x).
(b) Prove that gcd(f (x), g(x)) = gcd(g(x), r(x)). (Remark: If d(x) is a gcd
of f (x) and g(x), then so also is ad(x) for any non-zero a ∈ F . We can adjust
a so that the leading coefficient of ad(x) equals one. This monic gcd is called
the gcd of f (x) and g(x), and is denoted by gcd(f (x), g(x)).)
(c) Prove that there exist polynomials u(x), v(x) ∈ F [x] with the property
gcd(f (x), g(x)) = u(x)f (x) + v(x)g(x).
(d) Prove that if f (x) and g(x) are non-constant, we may choose u(x), v(x)
in such a way that deg u(x) < deg g(x) and deg v(x) < deg f (x).
2. We have seen that the polynomials x^2 + x + 1, x^3 + x + 1 and x^6 + x + 1 are irreducible in F2[x]. Prove or disprove: the polynomial x^n + x + 1 is irreducible in F2[x] for every n ≥ 2.
3. Consider the extension of Q obtained by adjoining a root of the irreducible polynomial x^4 + 1. Derive how x^4 + 1 factors in the extension.
4. (a) List all monic irreducible polynomials of degrees 1, 2, 3, 4 in F2[x].
(b) List all monic irreducible polynomials of degrees 1, 2, 3 in F3[x].
(c) List all monic irreducible polynomials of degrees 1, 2, 3 in F5[x].
5. (a) Verify whether x^8 + x + 1 and x^8 + x^3 + 1 are irreducible in F2[x].
(b) Prove or disprove: There does not exist an irreducible binomial/trinomial/quadrinomial of degree eight in F2[x].
6. (a) Prove that the polynomial f(x) = x^4 + x + 4 is irreducible in F5[x].
(b) Represent F625 = F54 by adjoining a root θ of f(x) to F5, and let α = 2θ^3 + 3θ + 4 and β = θ^2 + 2θ + 3. Compute α + β, α − β, αβ and α/β.
7. (a) Which of the polynomials x^2 ± 7 is irreducible modulo 19? Justify.
(b) Using the irreducible polynomial f(x) of Part (a), represent the field F361 = F192 as F19(θ), where f(θ) = 0. Compute (2θ + 3)^11 in this representation of F361 using left-to-right square-and-multiply exponentiation.
8. Let F2n have a polynomial-basis representation. Store each element of F2n as
an array of w-bit words. Denote the words of α ∈ F2n by α0 , α1 , . . . , αN −1 ,
where N = ⌈n/w⌉. Write pseudocodes for addition, schoolbook multiplication,
left-to-right comb multiplication, modular reduction, and inverse in F2n .
9. Let α = a_{n−1}θ^{n−1} + a_{n−2}θ^{n−2} + · · · + a1θ + a0 ∈ F2n with ai ∈ F2.
(a) Prove that α^2 = a_{n−1}θ^{2(n−1)} + a_{n−2}θ^{2(n−2)} + · · · + a1θ^2 + a0.
(b) How can you efficiently square a polynomial in F2[x] under the bit-vector representation? Argue that squaring is faster (in general) than multiplication.
(c) How can precomputation speed up this squaring algorithm?
10. Explain how the coefficients of x^255 through x^233 in γ3 of Example 2.22 can be eliminated using bit-wise shift and XOR operations.
11. Design efficient reduction algorithms (using bit-wise operations) for the following fields recommended by NIST. Assume a packing of 64 bits in a word.
(a) F21223 defined by x^1223 + x^255 + 1.
(b) F2571 defined by x^571 + x^10 + x^5 + x^2 + 1.
12. Repeat Exercise 2.11 for a packing of 32 bits per word.
13. An obvious way to compute β/α for α, β ∈ F2n, α ≠ 0, is to compute β × α^{−1} which involves one inverse computation and one multiplication. Explain how the multiplication can be avoided altogether by modifying the initialization step of the binary inverse algorithm (Algorithm 2.1).
14. Let α ∈ F∗2n.
(a) Prove that α^{−1} = α^{2^n − 2}.
(b) Use Part (a) and the fact that 2^n − 2 = 2 + 2^2 + 2^3 + · · · + 2^{n−1} to design an algorithm to compute inverses in F2n.
15. We now investigate another way of computing α^{−1} = α^{2^n − 2} for an α ∈ F∗2n.
(a) Suppose that α^{2^k − 1} has been computed for some k ≥ 1. Explain how α^{2^{2k} − 1} and α^{2^{k+1} − 1} can be computed.
(b) Based upon the result of Part (a), devise an algorithm to compute α^{2^n − 2} = (α^{2^{n−1} − 1})^2 from the binary representation of n − 1.
(c) Compare the algorithm of Part (b) with that of Exercise 2.14(b).
16. Let α ∈ F2n. Prove that the equation x^2 = α has a unique solution in F2n.
17. Represent F2n = F2(θ), and let α ∈ F2n.
(a) Prove that √θ = θ^{2^{n−1}}.
(b) How can you express α as A0(θ^2) + θ × A1(θ^2) (for polynomials A0, A1)?
(c) Design an efficient algorithm for computing √α.
18. Let F21223 be defined by the irreducible trinomial f(x) = x^1223 + x^255 + 1. Show that √x = x^612 + x^128 modulo f(x).
19. Let F2n = F2(θ), θ being a root of an irreducible trinomial f(x) = x^n + x^k + 1.
(a) Prove that both n and k cannot be even.
(b) If n and k are both odd, show that √θ = θ^{(n+1)/2} + θ^{(k+1)/2}.
(c) If n is odd and k is even, show that √θ = θ^{−(n−1)/2}(θ^{k/2} + 1).
(d) Derive a similar formula for √θ when n is even and k is odd.
20. Write arithmetic routines (addition, subtraction, schoolbook and comb-based
multiplication, and modular inverse) for the field F3n under Kawahara et al.’s
representation scheme. Pack w bits in a word.
21. Supply an efficient reduction algorithm for modular reduction in the NIST-recommended field F3509 defined by the irreducible polynomial x^509 − x^318 − x^191 + x^127 + 1. Represent elements of F3509 as in Exercise 2.20 with w = 64.
22. Let the field Fpn be represented in the polynomial basis 1, θ, θ^2, . . . , θ^{n−1}. An element a_{n−1}θ^{n−1} + a_{n−2}θ^{n−2} + · · · + a1θ + a0 is identified with the non-negative integer a_{n−1}p^{n−1} + a_{n−2}p^{n−2} + · · · + a1p + a0. Argue that under this identification, the elements of Fpn are represented uniquely as integers between 0 and p^n − 1. Write pseudocodes that add, subtract, multiply and divide two elements of Fpn in this representation.
23. Represent an element of Fpn as an n-tuple of elements of Fp , so an element
can be stored using O(n lg p) bits. Assume that we use schoolbook polynomial
arithmetic. Deduce the running times for addition, subtraction and multipli-
cation in Fpn . Also deduce the running time for inverting an element in F∗pn .
(Assume that Fpn has a polynomial-basis representation.)
24. Show that the p-th power exponentiation in Fpn can be efficiently computed.
25. [Itoh–Tsujii inversion (1988)] Let α ∈ F∗pn, and r = (p^n − 1)/(p − 1) = 1 + p + p^2 + · · · + p^{n−1}.
(a) Prove that α^r ∈ Fp.
(b) How can α^{−1} be efficiently computed by the formula α^{−1} = (α^r)^{−1} α^{r−1}?
26. Consider a finite field Fqn = Fq (θ) with q large and n small. Elements of Fqn
are polynomials of degree n − 1 with coefficients from Fq . Multiplication of
two elements of Fqn first involves a polynomial multiplication over Fq in order
to obtain an intermediate polynomial of degree 2n − 2. This is followed by
reduction modulo the minimal polynomial of θ (over Fq). Schoolbook multiplication requires n^2 Fq-multiplications to compute the intermediate product.
Karatsuba–Ofman multiplication can reduce this number. Since Fq is a large
field, this leads to practical improvements. Let ν denote the number of Fq -
multiplications used to compute the intermediate product. Prove that:
(a) If n = 2, we can take ν = 3.
(b) If n = 3, we can take ν = 6.
(c) If n = 4, we can take ν = 9.
(d) If n = 5, we can take ν = 14.
(e) If n = 6, we can take ν = 18.
27. Prove that every element α ∈ Fpn has a unique p-th root in Fpn. Show that this root is given by p√α = α^{p^{n−1}}.
28. Assume that p is small (like 3 or 5). Generalize the algorithm of Exercise 2.17 to compute the p-th root of an element α ∈ Fpn. (Hint: Represent Fpn as Fp(θ). Precompute (p√θ)^i for i = 0, 1, 2, . . . , p − 1.)
29. Let F3509 be defined by the irreducible pentanomial g(x) = x^509 − x^318 − x^191 + x^127 + 1. Show that x^{1/3} = x^467 + x^361 − x^276 + x^255 + x^170 + x^85, and x^{2/3} = −x^234 + x^128 − x^43 modulo g(x).
30. Find the minimal polynomials of all elements in some representation of F8 .
31. Find the minimal polynomials (over F5 ) of α and β of Exercise 2.6.
32. Represent F16 by adjoining to F2 a root of an irreducible polynomial of degree
four in F2 [x]. Find a primitive element in this representation of F16 . Also find
a normal element in F16 . Check whether this normal element is primitive too.
33. Represent F27 by adjoining to F3 a root of an irreducible polynomial of degree
three in F3 [x]. Find a primitive element in this representation of F27 . Also find
a normal element in F27 . Check whether this normal element is primitive too.
34. Represent F25 by adjoining to F5 a root of an irreducible polynomial of degree two in F5[x]. Find a primitive element in this representation of F25. Also find a normal element in F25. Check whether this normal element is primitive too.
35. Find a primitive element in the field F29 .
36. Represent F4 by adjoining θ to F2, where θ^2 + θ + 1 = 0.
(a) List all monic irreducible polynomials of degrees 1, 2 in F4[x].
(b) Let ψ be a root of an irreducible polynomial of F4[x] of degree 2. Represent F16 as F4(ψ). How can you add and multiply elements in this representation?
(c) Find a primitive element in this representation of F16.
(d) Compute the minimal polynomial of the element (θ + 1)ψ + 1 over F2.
(e) Compute the minimal polynomial of the element (θ + 1)ψ + 1 over F4.
37. Represent F64 = F26 as F2(θ) with θ^6 + θ^3 + 1 = 0.
(a) Find all the conjugates of θ (over F2 as polynomials in θ of degrees < 6).
(b) Prove or disprove: θ is a primitive element of F64.
(c) What is the minimal polynomial of θ^3 over F2?
38. Represent F9 as F3(θ), where θ^2 + θ + 2 = 0.
(a) Find the roots of x^2 + x + 2 in F9.
(b) Find the roots of x^2 + x + 2 in Z9.
(c) Prove that θ is a primitive element of F9.
(d) Prove that the polynomial y^2 − θ is irreducible over F9.
Represent F81 as F9(ψ), where ψ^2 − θ = 0.
(e) Determine whether ψ is a primitive element of F81.
(f) Find the minimal polynomial of ψ over F3.
39. Prove that θ + 1 is a normal element in F32 = F2(θ), where θ^5 + θ^2 + 1 = 0.
40. Prepare the Zech logarithm tables for the fields F25 , F27 , F29 with respect to
some primitive elements.
41. Compute an explicit isomorphism between the two representations of F16 in
Exercises 2.32 and 2.36.
42. Prove that the total number of ordered bases of Fpn over Fp is

    (p^n − 1)(p^n − p)(p^n − p^2) · · · (p^n − p^{n−1}).
43. Let α ∈ Fpn , and fα (x) the minimal polynomial of α over Fp . Prove that the
degree of fα (x) divides n.
44. Let f (x), g(x) be irreducible polynomials in Fp [x] of degrees m and n. Let Fpm
be represented by adjoining a root of f (x) to Fp . Prove that:
(a) If m = n, then g(x) splits over Fpm .
(b) If gcd(m, n) = 1, then g(x) is irreducible in Fpm [x].
45. Let q = p^n, f(x) a polynomial in Fq[x], and f′(x) the (formal) derivative of f(x). Prove that f′(x) = 0 if and only if f(x) = g(x)^p for some g(x) ∈ Fq[x].
46. Let p be an odd prime, n ∈ N, and q = p^n. Prove that for every α ∈ Fq,

    x^q − x = (x + α)((x + α)^{(q−1)/2} − 1)((x + α)^{(q−1)/2} + 1).

47. Let n ∈ N, and q = 2^n. Prove that for every α ∈ Fq,

    x^q + x = ((x + α) + (x + α)^2 + (x + α)^4 + · · · + (x + α)^{2^{n−1}}) ×
              (1 + (x + α) + (x + α)^2 + (x + α)^4 + · · · + (x + α)^{2^{n−1}}).
48. Modify Algorithm 1.7 in order to compute the order of an element α ∈ F∗q. You may assume that the complete prime factorization of q − 1 is available.
49. Let α ∈ F∗pn. Prove that the orders of α, α^p, α^{p^2}, . . . , α^{p^{n−1}} are the same. In particular, all conjugates of a primitive element of Fpn are again primitive.
50. Prove that n | φ(p^n − 1) for every p ∈ P and n ∈ N.
51. Let q − 1 = p1^{e1} · · · pr^{er} be the prime factorization of the size q − 1 of F∗q with each ei ≥ 1. Prove that

    Σ_{α ∈ F∗q} ord α = Π_{i=1}^{r} (pi^{2ei+1} + 1)/(pi + 1).
52. [Euler's criterion for finite fields] Let α ∈ F∗q with q odd. Prove that the equation x^2 = α has a solution in F∗q if and only if α^{(q−1)/2} = 1.
53. [Generalized Euler's criterion] Let α ∈ F∗q, t ∈ N, and d = gcd(t, q − 1). Prove that the equation x^t = α has a solution in F∗q if and only if α^{(q−1)/d} = 1.
54. Prove that for any finite field Fq and for any α ∈ Fq, the equation x^2 + y^2 = α has at least one solution for (x, y) in Fq × Fq.
55. Let γ be a primitive element of Fq, and r ∈ N. Prove that the polynomial x^r − γ has a root in Fq if and only if gcd(r, q − 1) = 1.
56. Prove that the field Q(√2) is not isomorphic to the field Q(√3).
57. Let θ, ψ be two distinct roots of some non-constant irreducible polynomial f(x) ∈ F[x], and let K = F(θ) and L = F(ψ). Give an example where K = L as sets. Give another example where K ≠ L as sets.
58. Let α ∈ Fpn. The trace and norm of α over Fp are defined respectively as

    Tr(α) = α + α^p + α^{p^2} + · · · + α^{p^{n−1}},
    N(α) = α × α^p × α^{p^2} × · · · × α^{p^{n−1}}.

(a) Prove that Tr(α), N(α) ∈ Fp.
(b) Prove that if α ∈ Fp, then Tr(α) = nα and N(α) = α^n.
(c) Prove that Tr(α + β) = Tr(α) + Tr(β) and N(αβ) = N(α) N(β) for all α, β ∈ Fpn. (Trace is additive, and norm is multiplicative.)
(d) Prove that Tr(α) = 0 if and only if α = γ^p − γ for some γ ∈ Fpn.
59. Let α ∈ F2n.
(a) Prove that x^2 + x = α is solvable for x in F2n if and only if Tr(α) = 0.
(b) Let Tr(α) = 0. If n is odd, prove that α^2 + α^{2^3} + α^{2^5} + · · · + α^{2^{n−2}} is a solution of x^2 + x = α. What is the other solution?
(c) Describe a method to solve the general quadratic equation ax^2 + bx + c = 0 with a, b, c ∈ F∗2n (assume that n is odd).
60. Let Fq be a finite field, and let γ ∈ F∗q be a primitive element. For every α ∈ F∗q, there exists a unique x in the range 0 ≤ x ≤ q − 2 such that α = γ^x. Denote this x by indγ α (index of α with respect to γ).
(a) First assume that q is odd. Prove that the equation x^2 = α is solvable in Fq for α ∈ F∗q if and only if indγ α is even.
(b) Now, let q = 2^n. In this case, for every α ∈ Fq, there exists a unique β ∈ Fq such that β^2 = α. In fact, β = α^{2^{n−1}}. Suppose that α, β ∈ F∗q, k = indγ α, and l = indγ β. Express l as an efficiently computable formula in k and q.
61. Let θ0, θ1, . . . , θ_{n−1} be elements of Fpn. The discriminant ∆(θ0, θ1, . . . , θ_{n−1}) of θ0, θ1, . . . , θ_{n−1} is defined as the determinant of the n × n matrix

        ( Tr(θ0θ0)         Tr(θ0θ1)         · · ·   Tr(θ0θ_{n−1})       )
    A = ( Tr(θ1θ0)         Tr(θ1θ1)         · · ·   Tr(θ1θ_{n−1})       )
        ( ...              ...              · · ·   ...                 )
        ( Tr(θ_{n−1}θ0)    Tr(θ_{n−1}θ1)    · · ·   Tr(θ_{n−1}θ_{n−1})  )

(a) Prove that θ0, θ1, . . . , θ_{n−1} constitute a basis of Fpn over Fp if and only if ∆(θ0, θ1, . . . , θ_{n−1}) ≠ 0.
(b) Define the matrix B as

        ( θ0               θ1               · · ·   θ_{n−1}             )
    B = ( θ0^p             θ1^p             · · ·   θ_{n−1}^p           )
        ( ...              ...              · · ·   ...                 )
        ( θ0^{p^{n−1}}     θ1^{p^{n−1}}     · · ·   θ_{n−1}^{p^{n−1}}   )

Prove that B^t B = A, where B^t denotes the transpose of B. Conclude that θ0, θ1, . . . , θ_{n−1} constitute a basis of Fpn over Fp if and only if det B ≠ 0.
(c) Let θ ∈ Fpn. Prove that ∆(1, θ, θ^2, . . . , θ^{n−1}) = Π_{0 ≤ i < j ≤ n−1} (θ^{p^i} − θ^{p^j})^2.

Programming Exercises
62. Write a GP/PARI function for the Euclidean inverse algorithm in F2n .
63. Write a GP/PARI function for the binary inverse algorithm in F2n .
64. Write a GP/PARI function for the almost inverse algorithm in F2n .
65. Write a GP/PARI function for the Euclidean inverse algorithm in Fpn .
66. Write a GP/PARI function for the binary inverse algorithm in Fpn .
67. Write a GP/PARI function for the almost inverse algorithm in Fpn .
68. Generalize the GP/PARI code of Section 2.4 for checking normal elements so as
to work for any extension F2n .
69. Write GP/PARI functions to compute traces and norms of elements in Fpn . Use
these functions to compute the traces and norms of all elements in F64 .
Chapter 3
Arithmetic of Polynomials

3.1 Polynomials over Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122


3.1.1 Polynomial Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
3.1.2 Irreducible Polynomials over Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . 122
3.1.3 Testing Irreducibility of Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
3.1.4 Handling Irreducible Polynomials in GP/PARI . . . . . . . . . . . . . . . . . . 127
3.2 Finding Roots of Polynomials over Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . 128
3.2.1 Algorithm for Fields of Odd Characteristics . . . . . . . . . . . . . . . . . . . . . . 129
3.2.2 Algorithm for Fields of Characteristic Two . . . . . . . . . . . . . . . . . . . . . . . 131
3.2.3 Root Finding with GP/PARI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
3.3 Factoring Polynomials over Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.3.1 Square-Free Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
3.3.2 Distinct-Degree Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
3.3.3 Equal-Degree Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
3.3.4 Factoring Polynomials in GP/PARI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
3.4 Properties of Polynomials with Integer Coefficients . . . . . . . . . . . . . . . . . . . . . . 145
3.4.1 Relation with Polynomials with Rational Coefficients . . . . . . . . . . . . 145
3.4.2 Height, Resultant, and Discriminant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
3.4.3 Hensel Lifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
3.5 Factoring Polynomials with Integer Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . 154
3.5.1 Berlekamp’s Factoring Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
3.5.2 Basis Reduction in Lattices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
3.5.3 Lenstra–Lenstra–Lovász Factoring Algorithm . . . . . . . . . . . . . . . . . . . . 166
3.5.4 Factoring in GP/PARI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171

Polynomials are useful in a variety of mathematical and computational contexts.
We have already used modular polynomial arithmetic to represent finite
fields. In addition to such applications, polynomials themselves constitute an
independent area of study. The two most important computational problems
pertaining to polynomials are finding roots of polynomials and factoring
polynomials. This chapter is an introduction to computations involving polynomials.
The set of all polynomials in one indeterminate (also called variable) x
and with coefficients from a ring A is denoted by A[x]. If A is an integral
domain, then so also is A[x] (and conversely). Polynomial rings over fields
enjoy several nice algebraic properties. For example, the polynomial ring K[x]
over a field K is a unique factorization domain, that is, every non-constant
polynomial in K[x] can be written as a product of irreducible polynomials,
and this factorization is unique up to rearrangement of the factors and up to
multiplication by non-zero elements of K. This unique factorization property

121
122 Computational Number Theory

of K[x] is derived from the fact that K[x] is a Euclidean domain, that is, the
concept of Euclidean division and Euclidean gcd holds in K[x].
We start our study with polynomials over finite fields, that is, with the ring
Fq [x] for some q. Next, we look at the polynomial ring Z[x] over integers. In
some sense, the study of Z[x] is the same as the study of Q[x]. Since Q is a field,
the ring Q[x] is an easier object to study than Z[x]. Of course, Q ⊆ R ⊆ C,
and so a study of Q[x] may benefit from a study of R[x] and C[x]. However,
a study of R[x] and C[x] may lead us too far away from our focus of interest.
Indeed, we cannot represent every real (or complex) number in computers.
Every finite representation of real numbers has to be approximate. Algorithmic
issues pertaining to such approximate representations (like convergence and
numerical stability) are not dealt with in this book.

3.1 Polynomials over Finite Fields


3.1.1 Polynomial Arithmetic
Let Fq be a finite field for some q = p^n with p ∈ P and n ∈ N. We know how
to carry out the arithmetic of the field Fq . Using these computational primitives,
we can implement the arithmetic of Fq [x]. For f (x), g(x) ∈ Fq [x], we can
compute f (x) + g(x), f (x) − g(x), and f (x)g(x) with the coefficient arithmetic
being that of Fq . Fq [x] supports Euclidean division, that is, for g(x) ≠ 0, we
can compute f (x) quot g(x) and f (x) rem g(x). The Euclidean gcd condition
can be stated for polynomials as gcd(f (x), g(x)) = gcd(g(x), f (x) rem g(x))
(provided that g(x) ≠ 0). Finally, a Bézout relation for f (x), g(x) ∈ Fq [x],
not both zero, is of the form gcd(f (x), g(x)) = u(x)f (x) + v(x)g(x) for some
u(x), v(x) ∈ Fq [x]. Solve Exercise 2.1 for learning the details.

3.1.2 Irreducible Polynomials over Finite Fields


Irreducible polynomials play a crucial role in the arithmetic of Fq [x]. We
start with a mathematical study of them. For more details, the reader may
consider Lidl and Niederreiter’s book (Footnote 2 on page 76).
Let Fq = Fp (θ) with f (θ) = 0, where f (x) ∈ Fp [x] is monic and irreducible
of degree n. Let g(x) be another monic irreducible polynomial in Fp [x] of
degree d|n. Let ψ be any root of g(x). Suppose that ψ ∉ Fq . Adjoining ψ
to Fq gives a field K with |K| > q. Every element α ∈ Fq satisfies α^q = α.
Moreover, ψ^q = ψ, since ψ can be used for representing F_{p^d} with d|n. It follows
that the polynomial x^q − x has at least q + 1 roots in K. This is impossible, since
K is a field. Therefore, ψ ∈ Fq , that is, g(x) has a root ψ and so all the roots
ψ, ψ^p, ψ^{p^2}, . . . , ψ^{p^{d−1}} in Fq . Since g(x) = (x − ψ)(x − ψ^p)(x − ψ^{p^2}) · · · (x − ψ^{p^{d−1}}),
we have proved (a part of) the following important result.
Arithmetic of Polynomials 123

Theorem 3.1 The product of all monic irreducible polynomials in Fp [x] of
degrees dividing n is x^{p^n} − x. More generally, for m ∈ N, the product of all
monic irreducible polynomials in Fq [x] of degrees dividing m is x^{q^m} − x. ⊳
Example 3.2 (1) Let p = 2 and n = 6. All positive integral divisors of 6 are
1, 2, 3, 6. The polynomial x^{2^6} − x factors over F2 as

    x^64 + x = x(x + 1)(x^2 + x + 1)(x^3 + x + 1)(x^3 + x^2 + 1) ×
               (x^6 + x + 1)(x^6 + x^5 + 1)(x^6 + x^3 + 1)(x^6 + x^4 + x^2 + x + 1) ×
               (x^6 + x^5 + x^4 + x^2 + 1)(x^6 + x^4 + x^3 + x + 1)(x^6 + x^5 + x^3 + x^2 + 1) ×
               (x^6 + x^5 + x^2 + x + 1)(x^6 + x^5 + x^4 + x + 1).

All the (monic) irreducible polynomials of F2 [x] of degree one are x and x + 1.
The only irreducible polynomial of F2 [x] of degree two is x^2 + x + 1. The
irreducible polynomials of F2 [x] of degree three are x^3 + x + 1 and x^3 + x^2 + 1,
whereas those of degree six are x^6 + x + 1, x^6 + x^5 + 1, x^6 + x^3 + 1, x^6 + x^4 +
x^2 + x + 1, x^6 + x^5 + x^4 + x^2 + 1, x^6 + x^4 + x^3 + x + 1, x^6 + x^5 + x^3 + x^2 + 1,
x^6 + x^5 + x^2 + x + 1, and x^6 + x^5 + x^4 + x + 1.
(2) Let us now see how x^64 + x = x^{4^3} + x factors over F4 , that is, we
take q = 4 and m = 3. We represent F64 = F2 (θ), where θ^6 + θ + 1 = 0. By
Example 2.29, the unique copy of F4 contained in this representation of F64
is {0, 1, ξ, ξ + 1}, where ξ = θ^5 + θ^4 + θ^3 + θ satisfies ξ^2 + ξ + 1 = 0. Since the
extension degree of F64 over F4 is three, x^64 + x factors over F4 into irreducible
polynomials of degrees one and three only. The four irreducible polynomials
in F4 [x] of degree one are x, x + 1, x + ξ, x + ξ + 1. The irreducible polynomial
x^2 + x + 1 of F2 [x] factors in F4 [x] as (x + ξ)(x + ξ + 1). The polynomials
x^3 + x + 1 and x^3 + x^2 + 1 of F2 [x] continue to remain irreducible in F4 [x].
Each of the nine irreducible polynomials in F2 [x] of degree six factors into two
irreducible polynomials in F4 [x] of degree three, as shown below.

    x^6 + x + 1 = [x^3 + x^2 + (ξ + 1)x + ξ] [x^3 + x^2 + ξx + (ξ + 1)],
    x^6 + x^5 + 1 = [x^3 + (ξ + 1)x^2 + ξx + ξ] [x^3 + ξx^2 + (ξ + 1)x + (ξ + 1)],
    x^6 + x^3 + 1 = [x^3 + ξ] [x^3 + (ξ + 1)],
    x^6 + x^4 + x^2 + x + 1 = [x^3 + ξx + 1] [x^3 + (ξ + 1)x + 1],
    x^6 + x^5 + x^4 + x^2 + 1 = [x^3 + ξx^2 + 1] [x^3 + (ξ + 1)x^2 + 1],
    x^6 + x^4 + x^3 + x + 1 = [x^3 + x^2 + x + ξ] [x^3 + x^2 + x + (ξ + 1)],
    x^6 + x^5 + x^3 + x^2 + 1 = [x^3 + ξx^2 + ξx + ξ] [x^3 + (ξ + 1)x^2 + (ξ + 1)x + (ξ + 1)],
    x^6 + x^5 + x^2 + x + 1 = [x^3 + ξx^2 + (ξ + 1)x + ξ] [x^3 + (ξ + 1)x^2 + ξx + (ξ + 1)],
    x^6 + x^5 + x^4 + x + 1 = [x^3 + (ξ + 1)x^2 + x + ξ] [x^3 + ξx^2 + x + (ξ + 1)]. ¤
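The factorization in Example 3.2(1) can be verified mechanically. The following Python sketch (an illustration for this discussion, not part of the text) represents F2 [x] polynomials as integer bitmasks (bit i is the coefficient of x^i), collects all monic irreducible polynomials of degree dividing 6 by trial division, and checks that their product is x^64 + x, as Theorem 3.1 asserts.

```python
# F2[x] polynomials as integer bitmasks: bit i holds the coefficient of x^i.
def pmul(a, b):                      # carry-less (XOR) multiplication
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def prem(a, m):                      # remainder of a divided by m over F2
    dm = m.bit_length() - 1
    while a and a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

def is_irreducible(f):               # trial division up to degree (deg f)/2
    d = f.bit_length() - 1
    return all(prem(f, g) != 0
               for g in range(2, 1 << (d // 2 + 1)) if g.bit_length() >= 2)

# All monic irreducible polynomials over F2 of degree dividing 6.
irred = [f for f in range(2, 1 << 7)
         if 6 % (f.bit_length() - 1) == 0 and is_irreducible(f)]

prod = 1
for f in irred:
    prod = pmul(prod, f)

print(len(irred))                    # 14 irreducible factors in all
print(prod == (1 << 64) | 2)         # product equals x^64 + x: True
```

The 14 factors are exactly those listed in Example 3.2(1): two of degree one, one of degree two, two of degree three, and nine of degree six.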
Theorem 3.1 has many consequences. First, it gives us a formula for
computing the number Nq,m of monic irreducible polynomials in Fq [x] of degree
equal to m. Equating degrees in Theorem 3.1 gives

    p^n = Σ_{d|n} d Np,d ,   or, more generally,   q^m = Σ_{d|m} d Nq,d .

These are still not explicit formulas for Np,n and Nq,m . In order to derive the
explicit formulas, we use an auxiliary result.
Definition 3.3 The Möbius function µ : N → {0, 1, −1} is defined as1

    µ(n) = 1        if n = 1,
           0        if p^2 | n for some p ∈ P,
           (−1)^t   if n is the product of t ∈ N pairwise distinct primes. ⊳

Example 3.4 The following table lists µ(n) for some small values of n.
n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
µ(n) 1 −1 −1 0 −1 1 −1 0 0 1 −1 0 −1 1 1 0
¤
Lemma 3.5 For all n ∈ N, the Möbius function satisfies the identity

    Σ_{d|n} µ(d) = 1 if n = 1, and Σ_{d|n} µ(d) = 0 if n > 1,

where the sum Σ_{d|n} µ(d) extends over all positive integral divisors d of n.
Proof Let n = p1^{e1} · · · pt^{et} be the prime factorization of n > 1 with pairwise
distinct primes p1 , . . . , pt and with each ei ∈ N. A divisor d of n is of the
form d = p1^{r1} · · · pt^{rt} with each ri in the range 0 ≤ ri ≤ ei . If some ri > 1,
then µ(d) = 0 by definition. Therefore, µ(d) is non-zero if and only if each
ri ∈ {0, 1}. But then

    Σ_{d|n} µ(d) = Σ_{(r1 ,...,rt )∈{0,1}^t} (−1)^{r1 +···+rt} = (1 − 1)^t = 0. ⊳

Proposition 3.6 [Möbius inversion formula]
Let f, g : N → R satisfy f (n) = Σ_{d|n} g(d) for all n ∈ N. Then,

    g(n) = Σ_{d|n} µ(d)f (n/d) = Σ_{d|n} µ(n/d)f (d).

Proof We have

    Σ_{d|n} µ(d)f (n/d) = Σ_{d|n} µ(d) Σ_{d′|(n/d)} g(d′) = Σ_{(dd′)|n} µ(d)g(d′)
                        = Σ_{d′|n} g(d′) ( Σ_{d|(n/d′)} µ(d) )
                        = g(n) Σ_{d|1} µ(d) + Σ_{d′|n, d′<n} g(d′) ( Σ_{d|(n/d′)} µ(d) ) = g(n),

where the last equality follows from Lemma 3.5. ⊳


1 August Ferdinand Möbius (1790–1868) was a German mathematician who made deep
contributions to number theory and geometry. In addition to the Möbius function and Möbius
inversion formula, Möbius is also well-known for the Möbius transform and the Möbius strip.

Corollary 3.7 The number of monic irreducible polynomials in Fp [x] of
degree n is

    Np,n = (1/n) Σ_{d|n} µ(d)p^{n/d} = (1/n) Σ_{d|n} µ(n/d)p^d .

The number of monic irreducible polynomials in Fq [x] of degree m is

    Nq,m = (1/m) Σ_{d|m} µ(d)q^{m/d} = (1/m) Σ_{d|m} µ(m/d)q^d . ⊳

Example 3.8 (1) First take p = 2 and n = 6. By the Möbius inversion
formula, we have N2,6 = (1/6)(µ(1)2^6 + µ(2)2^3 + µ(3)2^2 + µ(6)2^1) = (1/6)(64 − 8 −
4 + 2) = 9. These irreducible polynomials are listed in Example 3.2(1).
(2) For q = 4 and m = 3, we have N4,3 = (1/3)(µ(1)4^3 + µ(3)4^1) = (1/3)(64 − 4) =
20. Example 3.2(2) lists all these irreducible polynomials. ¤
A close look at the above formulas for Np,n or Nq,m indicates that the terms
containing p^n or q^m dominate over the other terms, so that Np,n ≈ p^n /n and
Nq,m ≈ q^m /m. There are p^n monic polynomials of degree n in Fp [x]. Under
the assumption that irreducible polynomials are distributed randomly among
these monic polynomials, a randomly chosen monic polynomial of degree n
in Fp [x] is irreducible with probability nearly 1/n. Likewise, a random monic
polynomial in Fq [x] of degree m is irreducible with probability about 1/m.
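The counting formula of Corollary 3.7 is easy to program. The following Python sketch (an illustration written for this discussion; the function names are my own) computes µ(n) by trial division and then evaluates Nq,m directly, reproducing the counts of Example 3.8.

```python
# Count monic irreducible polynomials of degree m over F_q via Corollary 3.7:
#   N_{q,m} = (1/m) * sum over d|m of mu(m/d) * q^d.
def mobius(n):
    if n == 1:
        return 1
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:        # square factor: mu(n) = 0
                return 0
            result = -result
        else:
            p += 1
    if n > 1:                     # one leftover prime factor
        result = -result
    return result

def num_irreducible(q, m):
    total = sum(mobius(m // d) * q**d for d in range(1, m + 1) if m % d == 0)
    return total // m             # the sum is always divisible by m

print(num_irreducible(2, 6))      # 9, as in Example 3.8(1)
print(num_irreducible(4, 3))      # 20, as in Example 3.8(2)
```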

3.1.3 Testing Irreducibility of Polynomials


Let f (x) ∈ Fp [x] be a monic polynomial of degree d > 1. We want to check
whether f (x) is irreducible in Fp [x]. Evidently, f (x) is reducible in Fp [x] if
and only if it has an irreducible factor of some degree r ≤ ⌊d/2⌋. Moreover,
x^{p^r} − x is the product of all monic irreducible polynomials in Fp [x] of degrees
dividing r. Therefore, if gcd(f (x), x^{p^r} − x) ≠ 1 for some r ≤ ⌊d/2⌋, we conclude
that f (x) has one or more irreducible factors of degrees dividing r. On the
other hand, if gcd(f (x), x^{p^r} − x) = 1 for all r = 1, 2, . . . , ⌊d/2⌋, we conclude
that f (x) is irreducible. Algorithm 3.1 implements this idea with Fp replaced
by a general field Fq . Since the polynomial x^{q^r} − x has a large degree, it is
expedient to compute x^{q^r} − x modulo f (x). The correctness of the algorithm
follows from the fact that gcd(f (x), x^{q^r} − x) = gcd(f (x), (x^{q^r} − x) rem f (x)).
Example 3.9 (1) Let us check whether the polynomial f (x) = x^8 + x^3 + 1 ∈
F2 [x] is irreducible. The iterations of Algorithm 3.1 reveal that f (x) has an
irreducible factor of degree three, that is, f (x) is reducible.

    r    x^{2^r} (mod f (x))    gcd(x^{2^r} + x, f (x))
    1    x^2                    1
    2    x^4                    1
    3    x^3 + 1                x^3 + x + 1

Algorithm 3.1: Checking whether f (x) ∈ Fq [x] with d = deg f (x) > 1
is irreducible
Initialize a temporary polynomial t(x) = x.
For (r = 1; r ≤ ⌊d/2⌋; ++r) {
    Set t(x) = t(x)^q (mod f (x)).
    If (gcd(f (x), t(x) − x) ≠ 1), return False (that is, reducible).
}
Return True (that is, irreducible).
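Algorithm 3.1 can be transcribed almost line by line. Here is a Python sketch for a prime field F_p (an illustrative choice of this sketch, not the text's GP/PARI code); polynomials are coefficient lists, lowest degree first, and the helper names are my own.

```python
# A Python sketch of Algorithm 3.1 over a prime field F_p.
def trim(a):                        # drop leading zero coefficients
    while a and a[-1] == 0:
        a.pop()
    return a

def polmul(a, b, p):
    if not a or not b:
        return []
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] + ai * bj) % p
    return trim(r)

def polrem(a, b, p):                # remainder of a divided by b over F_p
    a, inv = trim(list(a)), pow(b[-1], -1, p)
    while a and len(a) >= len(b):
        c, s = a[-1] * inv % p, len(a) - len(b)
        for i, bi in enumerate(b):
            a[s + i] = (a[s + i] - c * bi) % p
        a = trim(a)
    return a

def polgcd(a, b, p):                # monic gcd via Euclidean division
    a, b = trim(list(a)), trim(list(b))
    while b:
        a, b = b, polrem(a, b, p)
    return [c * pow(a[-1], -1, p) % p for c in a] if a else []

def polpowmod(a, e, f, p):          # a(x)^e mod f(x), square-and-multiply
    r, a = [1], polrem(a, f, p)
    while e:
        if e & 1:
            r = polrem(polmul(r, a, p), f, p)
        a = polrem(polmul(a, a, p), f, p)
        e >>= 1
    return r

def is_irreducible(f, p):           # f monic of degree d > 1 over F_p
    d, t = len(f) - 1, [0, 1]       # t starts as the polynomial x
    for _ in range(d // 2):
        t = polpowmod(t, p, f, p)   # t = t^p mod f
        tx = t + [0] * (2 - len(t))
        tx[1] = (tx[1] - 1) % p     # t(x) - x
        if polgcd(f, tx, p) != [1]:
            return False
    return True

print(is_irreducible([1, 0, 0, 1, 0, 0, 0, 0, 1], 2))       # x^8+x^3+1: False
print(is_irreducible([1, 0, 0, 0, 1] + [0] * 10 + [1], 2))  # x^15+x^4+1: True
```

The two calls reproduce Example 3.9(1) and 3.9(3).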

(2) For f (x) = x^12 + x^11 + x^9 + x^8 + x^6 + x^3 + 1 ∈ F2 [x], Algorithm 3.1
reveals that f (x) has two irreducible factors of degree three, and is reducible.

    r    x^{2^r} (mod f (x))    gcd(x^{2^r} + x, f (x))
    1    x^2                    1
    2    x^4                    1
    3    x^8                    x^6 + x^5 + x^4 + x^3 + x^2 + x + 1

(3) Now, take f (x) = x^15 + x^4 + 1 ∈ F2 [x]. The iterations of Algorithm 3.1
proceed as follows, and indicate that x^15 + x^4 + 1 is irreducible in F2 [x].

    r    x^{2^r} (mod f (x))        gcd(x^{2^r} + x, f (x))
    1    x^2                        1
    2    x^4                        1
    3    x^8                        1
    4    x^5 + x                    1
    5    x^10 + x^2                 1
    6    x^9 + x^5 + x^4            1
    7    x^10 + x^8 + x^7 + x^3     1

(4) Let us represent F4 = F2 (θ), where θ^2 + θ + 1 = 0. Take f (x) =
x^6 + θx + θ ∈ F4 [x]. We first compute the Euclidean gcd of x^4 + x with f (x):

    x^6 + θx + θ = x^2 × (x^4 + x) + (x^3 + θx + θ),
    x^4 + x = x × (x^3 + θx + θ) + (θx^2 + (θ + 1)x)
            = x × (x^3 + θx + θ) + θ(x^2 + θx),
    x^3 + θx + θ = (x + θ) × (x^2 + θx) + (x + θ),
    x^2 + θx = x × (x + θ).

Thus, gcd(x^4 + x, f (x)) = x + θ, that is, x^6 + θx + θ is reducible. ¤

A polynomial of degree d in Fq [x] is stored as a list of its d + 1 coefficients.


Each such coefficient is a member of Fq , and can be encoded using O(log q)
bits. Thus, the size of the polynomial is O(d log q). An algorithm involving
this polynomial as input is said to run in polynomial time if its running time
is a polynomial in both d and log q.
Let us deduce the running time of Algorithm 3.1. The loop continues for
a maximum of ⌊d/2⌋ = O(d) times. Each iteration of the loop involves a
modular exponentiation followed by a gcd calculation. The exponentiation is
done modulo f (x), that is, the degrees of all intermediate products are kept
at values < d. The exponent is q, that is, square-and-multiply exponentiation
makes O(log q) iterations only. In short, each exponentiation in Algorithm 3.1
requires O(d^2 log q) field operations. The gcd calculation involves at most d
Euclidean divisions with each division requiring O(d^2) operations in Fq . This
is actually an overestimate: Euclidean gcd requires O(d^2) field operations
only. The arithmetic of Fq can be implemented to run in O(log^2 q) time per
operation (schoolbook arithmetic). To sum up, Algorithm 3.1 runs in time
O(d^3 log^3 q), which is polynomial in both d and log q.

3.1.4 Handling Irreducible Polynomials in GP/PARI


A GP/PARI function to check the irreducibility of f (x) ∈ Fp [x] follows.

gp > \
isirr(f,p) = \
local (t,g); \
t = Mod(1,p) * x; \
for (r=1, floor(poldegree(f)/2), \
t = (t^p)%f; \
g = gcd(t-Mod(1,p)*x,f); \
print(lift(g)); \
if (g-Mod(1,p), print("Not irreducible"); return(0)) \
); \
print("Irreducible"); return(1)
gp > isirr(Mod(1,2)*x^8 + Mod(1,2)*x^3 + Mod(1,2), 2)
1
1
x^3 + x + 1
Not irreducible
%1 = 0
gp > isirr(Mod(1,2)*x^12 + Mod(1,2)*x^11 + Mod(1,2)*x^9 + Mod(1,2)*x^8 + \
Mod(1,2)*x^6 + Mod(1,2)*x^3 + Mod(1,2), 2)
1
1
x^6 + x^5 + x^4 + x^3 + x^2 + x + 1
Not irreducible
%2 = 0
gp > isirr(Mod(1,2)*x^15 + Mod(1,2)*x^4 + Mod(1,2), 2)
1
1
1
1
1
1
1
Irreducible
%3 = 1
gp > isirr(Mod(1,7)*x^15 + Mod(1,7)*x^4 + Mod(1,7), 7)
1
1
1
1
1
4*x^6 + 6*x^3 + x^2 + x + 6
Not irreducible
%4 = 0

In fact, GP/PARI provides a built-in function polisirreducible() for checking
whether its argument is irreducible.

gp > polisirreducible(Mod(1,2)*x^8 + Mod(1,2)*x^3 + Mod(1,2))


%5 = 0
gp > polisirreducible(Mod(1,2)*x^12 + Mod(1,2)*x^11 + Mod(1,2)*x^9 + \
Mod(1,2)*x^8 + Mod(1,2)*x^6 + Mod(1,2)*x^3 + Mod(1,2))
%6 = 0
gp > polisirreducible(Mod(1,2)*x^15 + Mod(1,2)*x^4 + Mod(1,2))
%7 = 1
gp > polisirreducible(Mod(1,7)*x^15 + Mod(1,7)*x^4 + Mod(1,7))
%8 = 0

3.2 Finding Roots of Polynomials over Finite Fields


Theorem 3.1 turns out to be instrumental again for designing a probabilistic
polynomial-time root-finding algorithm. Let the polynomial f (x) ∈ Fq [x]
to be factored have the (unknown) factorization

    f (x) = l1 (x)^{u1} l2 (x)^{u2} · · · ls (x)^{us} g1 (x)^{v1} g2 (x)^{v2} · · · gt (x)^{vt} ,

where li (x) are linear factors, and gj (x) are irreducible factors of degrees larger
than one. Assume that f (x), li (x), gj (x) are all monic. Each linear factor of
f (x) corresponds to a root of f (x) in Fq , whereas an irreducible factor of f (x)
of degree > 1 does not have a root in Fq . By Theorem 3.1, the polynomial
x^q − x is the product of all monic linear polynomials in Fq [x], that is, f̄(x) =
gcd(x^q − x, f (x)) = l1 (x)l2 (x) · · · ls (x), that is, f̄(x) has exactly the same
roots as f (x). We may, therefore, assume that f (x) itself is a product of linear
factors, and is square-free (that is, no factor of f (x) appears more than once).
Given such a polynomial f (x), we plan to find all the roots or equivalently
all the linear factors of f (x). If q is small, we can evaluate f (x) at all elements
of Fq , and output all those α ∈ Fq for which f (α) = 0. This algorithm takes
time proportional to q, and is impractical for large q.
We follow an alternative strategy. We try to split f (x) as f (x) = f1 (x)f2 (x)
in Fq [x]. If deg f1 = 0 or deg f2 = 0, this split is called trivial. A non-trivial
split gives two factors of f (x) of strictly smaller degrees. We subsequently try
to split f1 (x) and f2 (x) non-trivially. This process is repeated until f (x) is
split into linear factors. I now describe a strategy2 to split f (x).

3.2.1 Algorithm for Fields of Odd Characteristics


First, let q be odd, and write x^q − x = x (x^{(q−1)/2} − 1)(x^{(q−1)/2} + 1). If x is
a factor of f (x) (this can be decided from the constant term), we divide f (x) by
x, and assume that zero is not a root of f (x). We then have f (x) = f1 (x)f2 (x),
where f1 (x) = gcd(x^{(q−1)/2} − 1, f (x)), and f2 (x) = gcd(x^{(q−1)/2} + 1, f (x)).
This gives us a non-trivial split of f (x) unless all linear factors of f (x) divide
exactly one of the polynomials x^{(q−1)/2} ± 1. Even if this is not the case, that
is, f1 (x)f2 (x) is a non-trivial split of f (x), this method fails to split f1 (x) and
f2 (x) further, since f1 (x) divides x^{(q−1)/2} − 1, and f2 (x) divides x^{(q−1)/2} + 1.
Take any α ∈ Fq . We have (x + α)^q = x^q + α^q = x^q + α, so (x + α)^q −
(x + α) = x^q − x, that is, irrespective of the choice of α, the polynomial
(x + α)^q − (x + α) is the product of all monic linear factors of Fq [x]. We can write
(x + α)^q − (x + α) = (x + α)((x + α)^{(q−1)/2} − 1)((x + α)^{(q−1)/2} + 1). Assuming
that x + α does not divide f (x), we can write f (x) = f1 (x)f2 (x), where f1 (x) =
gcd((x + α)^{(q−1)/2} − 1, f (x)), and f2 (x) = gcd((x + α)^{(q−1)/2} + 1, f (x)).
Although x^q − x and (x + α)^q − (x + α) have the same roots (all elements of
Fq ), the roots of (x + α)^{(q−1)/2} − 1 are the roots of x^{(q−1)/2} − 1 shifted by α. So
we expect to obtain a non-trivial factor gcd((x + α)^{(q−1)/2} − 1, f (x)) of f (x)
for some α. Indeed, Rabin shows that this gcd is non-trivial with probability
at least 1/2 for a random α. This observation leads to Algorithm 3.2 for finite
fields of odd characteristics. The algorithm assumes that the input polynomial
f (x) is square-free, and is a product of linear factors only. The polynomial
(x + α)^{(q−1)/2} − 1 has large degrees for large values of q. However, we need to
compute the gcd of this polynomial with f (x). Therefore, (x + α)^{(q−1)/2} − 1
may be computed modulo f (x). In that case, all intermediate polynomials
are reduced to polynomials of degrees < deg f , and we get a probabilistic
algorithm with expected running time polynomial in deg f and log q.

Example 3.10 (1) Let us find the roots of f (x) = x^15 + x^12 + 2x^6 + 3x^4 + 6x^3 +
3x + 4 ∈ F73 [x]. We first compute t(x) ≡ x^73 − x ≡ 48x^14 + 4x^13 + 23x^12 + 14x^11 +
43x^10 + 6x^9 + 9x^8 + 72x^7 + 65x^6 + 67x^5 + 33x^4 + 39x^3 + 24x^2 + 30 (mod f (x)),
and replace f (x) by gcd(f (x), t(x)) = x^6 + 17x^5 + 4x^4 + 67x^3 + 17x^2 + 4x + 66.
Therefore, f (x) has six linear factors and so six roots in F73 .

2 This algorithm is first described in: Michael O. Rabin, Probabilistic algorithms in finite
fields, SIAM Journal on Computing, 9(2), 273–280, 1980.


Algorithm 3.2: Finding roots of monic f (x) ∈ Fq [x] with q odd


Assumption: f (x) is a product of distinct linear factors.
If (deg f = 0), return the empty list.
If (deg f = 1), return the negative of the constant term of f (x).
Initialize the flag splitfound to zero.
While (splitfound is zero) {
Choose α ∈ Fq randomly.
Compute g(x) = (x + α)^{(q−1)/2} − 1 (mod f (x)).
Compute f1 (x) = gcd(f (x), g(x)).
if (0 < deg f1 < deg f ) {
Set splitfound to one.
Recursively call Algorithm 3.2 on f1 (x).
Recursively call Algorithm 3.2 on f (x)/f1 (x).
}
}
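Algorithm 3.2 can likewise be sketched in Python for a prime field F_p. The coefficient-list helpers below are illustrative choices of this sketch, not the text's; the final call reproduces the six roots found in Example 3.10(1).

```python
import random

# Sketch of Algorithm 3.2 over F_p. Polynomials are coefficient lists over
# F_p, lowest degree first; f must be monic and a product of distinct
# linear factors, as the algorithm assumes.
def trim(a):
    while a and a[-1] == 0:
        a.pop()
    return a

def polmul(a, b, p):
    if not a or not b:
        return []
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] + ai * bj) % p
    return trim(r)

def poldivmod(a, b, p):
    a, q, inv = trim(list(a)), [0] * len(a), pow(b[-1], -1, p)
    while a and len(a) >= len(b):
        c, s = a[-1] * inv % p, len(a) - len(b)
        q[s] = c
        for i, bi in enumerate(b):
            a[s + i] = (a[s + i] - c * bi) % p
        a = trim(a)
    return trim(q), a

def polgcd(a, b, p):
    a, b = trim(list(a)), trim(list(b))
    while b:
        a, b = b, poldivmod(a, b, p)[1]
    return [c * pow(a[-1], -1, p) % p for c in a] if a else []

def polpowmod(a, e, f, p):
    r, a = [1], poldivmod(a, f, p)[1]
    while e:
        if e & 1:
            r = poldivmod(polmul(r, a, p), f, p)[1]
        a = poldivmod(polmul(a, a, p), f, p)[1]
        e >>= 1
    return r

def roots(f, p):
    d = len(f) - 1
    if d == 0:
        return []
    if d == 1:
        return [(-f[0]) % p]
    while True:                    # keep trying random alpha until f splits
        alpha = random.randrange(p)
        g = polpowmod([alpha, 1], (p - 1) // 2, f, p)
        g = list(g) if g else [0]
        g[0] = (g[0] - 1) % p      # g = (x+alpha)^((p-1)/2) - 1 mod f
        f1 = polgcd(f, g, p)
        if 0 < len(f1) - 1 < d:    # non-trivial split found
            f2 = poldivmod(f, f1, p)[0]
            return sorted(roots(f1, p) + roots(f2, p))

# The degree-6 polynomial from Example 3.10(1):
f = [66, 4, 17, 67, 4, 17, 1]      # x^6+17x^5+4x^4+67x^3+17x^2+4x+66
print(roots(f, 73))                # [9, 30, 32, 65, 67, 72]
```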

We try to split f (x) = x^6 + 17x^5 + 4x^4 + 67x^3 + 17x^2 + 4x + 66. We
pick the element α = 19 ∈ F73 , and compute g(x) ≡ (x + α)^{(p−1)/2} − 1 ≡
(x + 19)^36 − 1 ≡ 21x^5 + 29x^4 + 7x^3 + 46x^2 + 4x + 30 (mod f (x)). We have
f1 (x) = gcd(g(x), f (x)) = x^2 + 44x + 43, and f2 (x) = f (x)/f1 (x) = x^4 +
46x^3 + 54x^2 + 20x + 27. Thus, we are successful in splitting f (x) non-trivially.
We then try to split f1 (x) = x^2 + 44x + 43. For α = 47, we compute g1 (x) ≡
(x + 47)^36 − 1 ≡ 0 (mod f1 (x)), so gcd(g1 (x), f1 (x)) = f1 (x) is a trivial factor
of f1 (x). For α = 57, we compute g1 (x) ≡ (x + 57)^36 − 1 ≡ 71 (mod f1 (x)),
so gcd(g1 (x), f1 (x)) = 1 is again a trivial factor of f1 (x). We then try α = 67,
and compute g1 (x) ≡ (x + 67)^36 − 1 ≡ 66x + 64 (mod f1 (x)). But then,
gcd(g1 (x), f1 (x)) = x + 43 is a non-trivial factor of f1 (x). This factor is linear,
and reveals the root −43 ≡ 30 (mod 73). The cofactor f1 (x)/(x + 43) = x + 1
is also linear, and yields the root −1 ≡ 72 (mod 73).
We now attempt to split f2 (x) = x^4 + 46x^3 + 54x^2 + 20x + 27. Take α = 33,
and compute g2 (x) ≡ (x + 33)^36 − 1 ≡ 3x^3 + 19x^2 + 41x + 64 (mod f2 (x)).
We have f3 (x) = gcd(g2 (x), f2 (x)) = x^3 + 55x^2 + 38x + 70. The cofactor
f2 (x)/f3 (x) = x + 64 is linear, and yields the root −64 ≡ 9 (mod 73).
For splitting f3 (x) = x^3 + 55x^2 + 38x + 70, we take α = 41, compute
g3 (x) ≡ (x + 41)^36 − 1 ≡ 3x^2 + 43x + 4 (mod f3 (x)) and gcd(g3 (x), f3 (x)) =
x + 6. Thus, we get the root −6 ≡ 67 (mod 73).
It remains to split the cofactor f4 (x) = f3 (x)/(x + 6) = x^2 + 49x + 36.
For α = 8, we have g4 (x) ≡ (x + 8)^36 − 1 ≡ 31x + 28 (mod f4 (x)), and
gcd(f4 (x), g4 (x)) = 1, that is, we fail to split f4 (x). For α = 25, we have
g4 (x) ≡ (x + 25)^36 − 1 ≡ 11x + 13, and gcd(g4 (x), f4 (x)) = x + 41, that is, the
root −41 ≡ 32 (mod 73) is discovered. The last root of f4 (x) is obtained from
f4 (x)/(x + 41) = x + 8, and equals −8 ≡ 65 (mod 73).

To sum up, all the roots of f (x) are 9, 30, 32, 65, 67, 72 modulo 73. Indeed,

    x^6 + 17x^5 + 4x^4 + 67x^3 + 17x^2 + 4x + 66
      ≡ (x − 9)(x − 30)(x − 32)(x − 65)(x − 67)(x − 72)
      ≡ (x + 1)(x + 6)(x + 8)(x + 41)(x + 43)(x + 64) (mod 73).

It can be shown that the original polynomial factors in F73 [x] as

    x^15 + x^12 + 2x^6 + 3x^4 + 6x^3 + 3x + 4
      ≡ (x + 1)^2 (x + 6)(x + 8)(x + 41)(x + 43)(x + 64) ×
        (x^8 + 55x^7 + 11x^6 + 37x^5 + 5x^4 + 62x^3 + 62x^2 + 46x + 62) (mod 73),

the last factor being irreducible of degree eight.


(2) Let us now consider the extension field F9 = F3 (θ), where θ^2 + 1 = 0.
We plan to find the roots of f (x) = x^5 + x^4 + θx^3 + x^2 + (θ + 1)x + 2θ ∈ F9 [x].
We first compute g(x) ≡ x^9 − x ≡ x^4 + (2θ + 2)x^3 + (2θ + 1)x^2 + (θ + 1)x +
(θ + 1) (mod f (x)), and replace f (x) by gcd(f (x), g(x)) = x^2 + (θ + 1)x + θ.
Our task is now to split f (x) = x^2 + (θ + 1)x + θ by computing its gcd with
the polynomial (x + α)^{(9−1)/2} − 1 = (x + α)^4 − 1 for randomly chosen α ∈ F9 .
Instead of computing (x + α)^4 − 1, we compute g(x) ≡ (x + α)^4 − 1 (mod f (x)).
We first try α = 1. In this case, g(x) ≡ (x + 1)^4 − 1 ≡ (θ + 1)x + θ (mod f (x)),
which gives gcd(g(x), f (x)) = 1, a trivial factor of f (x).
Next, we try α = θ + 1. We obtain g(x) ≡ (x + θ + 1)^4 − 1 ≡ 0 (mod f (x)),
so that gcd(g(x), f (x)) = x^2 + (θ + 1)x + θ is again a trivial factor of f (x).
Finally, for α = 2θ, we get g(x) ≡ (x + 2θ)^4 − 1 ≡ (θ + 1)x + (θ + 2)
(mod f (x)), for which gcd(g(x), f (x)) = x + θ, that is, −θ = 2θ is a root of
f (x). The cofactor f (x)/(x + θ) = x + 1 gives the other root as −1 = 2.
To sum up, the polynomial x^5 + x^4 + θx^3 + x^2 + (θ + 1)x + 2θ ∈ F9 [x] has
the two roots 2, 2θ in F9 . Indeed, we have the factorization

    x^5 + x^4 + θx^3 + x^2 + (θ + 1)x + 2θ = (x + 1)(x + θ)^2 (x^2 + θx + θ),

the quadratic factor being irreducible in F9 [x]. ¤

3.2.2 Algorithm for Fields of Characteristic Two


For fields Fq of characteristic 2, we cannot factor x^q − x as above, since
(q − 1)/2 is not an integer. If q = 2^n, we instead use the decomposition

    x^q − x = ((x + α) + (x + α)^2 + (x + α)^4 + (x + α)^8 + · · · + (x + α)^{2^{n−1}}) ×
              (1 + (x + α) + (x + α)^2 + (x + α)^4 + (x + α)^8 + · · · + (x + α)^{2^{n−1}})

for any α ∈ Fq . We, therefore, compute g(x) ≡ (x + α) + (x + α)^2 + (x + α)^4 +
(x + α)^8 + · · · + (x + α)^{2^{n−1}} (mod f (x)) and the factor f1 (x) = gcd(g(x), f (x))
of f (x). If 0 < deg f1 < deg f , then f1 (x) and the cofactor f2 (x) = f (x)/f1 (x)
are split recursively. Algorithm 3.3 elaborates this idea, and assumes that the
input polynomial f (x) is square-free and a product of linear factors.

Algorithm 3.3: Finding roots of monic f (x) ∈ F2n [x]


Assumption: f (x) is a product of distinct linear factors.
If (deg f = 0), return the empty list.
If (deg f = 1), return the constant term of f (x).
Initialize the flag splitfound to zero.
While (splitfound is zero) {
Choose α ∈ F2n randomly.
Set t(x) = x + α, and s(x) = t(x).
for i = 1, 2, . . . , n − 1 {
Compute t(x) = t(x)^2 (mod f (x)), and s(x) = s(x) + t(x).
}
Compute f1 (x) = gcd(f (x), s(x)).
if (0 < deg f1 < deg f ) {
Set splitfound to one.
Recursively call Algorithm 3.3 on f1 (x).
Recursively call Algorithm 3.3 on f (x)/f1 (x).
}
}

Example 3.11 Let us represent F16 = F2 (θ), where θ^4 + θ + 1 = 0. We plan
to find the roots of f (x) = x^10 + (θ^3 + 1)x^9 + (θ + 1)x^5 + (θ^3 + θ^2 + 1)x^4 +
θ^3 x^3 + (θ^3 + θ^2 + 1)x^2 + θ^2 x + θ^3 ∈ F16 [x]. We compute h(x) ≡ x^16 + x ≡
(θ^3 + 1)x^9 + (θ^2 + 1)x^8 + (θ^3 + θ^2 + θ + 1)x^7 + (θ + 1)x^6 + (θ + 1)x^5 + θx^4 +
(θ^3 + θ^2 + θ + 1)x^2 + θ^3 x + (θ^3 + θ^2 + 1) (mod f (x)), and replace f (x) by
gcd(f (x), h(x)) = x^2 + (θ^3 + θ^2 + 1)x + (θ^3 + θ^2 + θ + 1).
To find the two roots of f (x), we try to split f (x) = x^2 + (θ^3 + θ^2 + 1)x +
(θ^3 + θ^2 + θ + 1). For α = θ + 1, we have g(x) ≡ (x + α) + (x + α)^2 + (x + α)^4 +
(x + α)^8 ≡ θ^2 x + (θ^3 + θ + 1) (mod f (x)). This produces the split of f (x) into
f1 (x) = gcd(f (x), g(x)) = x + (θ^2 + θ) and f2 (x) = f (x)/f1 (x) = x + (θ^3 + θ + 1),
yielding the roots θ^2 + θ and θ^3 + θ + 1. ¤

Algorithm 3.3 looks attractive, but has problems and cannot be used without
modifications. We discuss this again in Section 3.3.3. See Exercise 3.10 too.

3.2.3 Root Finding with GP/PARI


The GP/PARI interpreter provides the built-in function polrootsmod() for
computing the roots of polynomials over prime fields. The function takes two
arguments: a polynomial f (x) with integer coefficients, and a prime p. The
function returns a column vector consisting of the roots of f (x) modulo p. The
computation of polrootsmod() may fail for a non-prime modulus m. But then,


one may ask GP/PARI to use a naive root-finding algorithm by providing a third
argument with value 1. This is acceptable if m is small, since polrootsmod()
does not encounter an error in this case, but may fail to produce all the roots.
For small m, a better approach is to evaluate the polynomial at all elements
of Zm . For the following snippet, notice that 731 = 17 × 43.

gp > polrootsmod(x^15 + x^12 + 2*x^6 + 3*x^4 + 6*x^3 + 3*x + 4, 73)


%1 = [Mod(9, 73), Mod(30, 73), Mod(32, 73), Mod(65, 73), Mod(67, 73), Mod(72, 73)]~
gp > polrootsmod(x^15 + x^12 + 2*x^6 + 3*x^4 + 6*x^3 + 3*x + 4, 731)
*** impossible inverse modulo: Mod(43, 731).
gp > polrootsmod(x^15 + x^12 + 2*x^6 + 3*x^4 + 6*x^3 + 3*x + 4, 731, 1)
%2 = [Mod(50, 731), Mod(424, 731)]~
gp > polrootsmod(x^2+3*x+2,731)
%3 = [Mod(730, 731)]~
gp > polrootsmod(x^2+3*x+2,731,1)
%4 = [Mod(84, 731), Mod(644, 731)]~
gp > findrootsmod(f,m) = for(x=0, m-1, if((eval(f)%m == 0), print1(x," ")))
gp > findrootsmod(x^2 + 3*x + 2, 731)
84 644 729 730
gp > findrootsmod(x^15 + x^12 + 2*x^6 + 3*x^4 + 6*x^3 + 3*x + 4, 731)
50 424 611 730

3.3 Factoring Polynomials over Finite Fields


The root-finding algorithm described above factors a product of linear
polynomials. This idea can be extended to arrive at an algorithm for factoring
arbitrary polynomials over finite fields. Indeed, the problem of factoring
polynomials over finite fields is conceptually akin to root finding.
Berlekamp’s Q-matrix method3 is the first modern factoring algorithm
(see Exercise 3.30). Cantor and Zassenhaus4 propose a probabilistic algorithm
of a different type. The algorithm I am going to describe is of the Cantor-
Zassenhaus type, and is a simplified version (without the optimizations) of
the algorithm of Von zur Gathen and Shoup.5 Kaltofen and Shoup6 propose
the fastest known variant in the Cantor-Zassenhaus family of algorithms.

3 Elwyn R. Berlekamp, Factoring polynomials over large finite fields, Mathematics of

Computation, 24(111), 713–735, 1970.


4 David G. Cantor and Hans Zassenhaus, A new algorithm for factoring polynomials over

finite fields, Mathematics of Computation, 36(154), 587–592, 1981.


5 Joachim von zur Gathen and Victor Shoup, Computing Frobenius maps and factoring

polynomials, STOC, 97–105, 1992.


6 Eric Kaltofen and Victor Shoup, Subquadratic-time factoring of polynomials over finite

fields, Mathematics of Computation, 67(223), 1179–1197, 1998.



Our factorization algorithm has three stages described individually below.

3.3.1 Square-Free Factorization


Let f (x) ∈ Fq [x] be a non-constant polynomial to be factored. In this
stage, one writes f (x) as a product of square-free polynomials. In order to
explain this stage, we define the formal derivative of

    f (x) = αd x^d + αd−1 x^{d−1} + · · · + α1 x + α0

as the polynomial

    f ′(x) = dαd x^{d−1} + (d − 1)αd−1 x^{d−2} + · · · + 2α2 x + α1 .

Proposition 3.12 The formal derivative f ′(x) of f (x) is 0 if and only if
f (x) = g(x)^p for some g(x) ∈ Fq [x], where p is the characteristic of Fq . The
polynomial f (x)/ gcd(f (x), f ′(x)) is square-free. In particular, f (x) is square-free
if and only if gcd(f (x), f ′(x)) = 1.
Proof For proving the first assertion, let us rewrite f (x) as

    f (x) = α1 x^{e1} + α2 x^{e2} + · · · + αk x^{ek}

with pairwise distinct exponents e1 , e2 , . . . , ek , and with each αi ≠ 0. Then,

    f ′(x) = e1 α1 x^{e1 −1} + e2 α2 x^{e2 −1} + · · · + ek αk x^{ek −1} .

The condition f ′(x) = 0 implies that each ei is a multiple of p (where q = p^n).
Write ei = pεi for i = 1, 2, . . . , k. Since αi ∈ F_{p^n} , we have αi^{p^n} = αi for all i. It
follows that f (x) = g(x)^p , where g(x) = α1^{p^{n−1}} x^{ε1} + α2^{p^{n−1}} x^{ε2} + · · · + αk^{p^{n−1}} x^{εk} .
Conversely, if f (x) = g(x)^p , then f ′(x) = pg(x)^{p−1} g ′(x) = 0.
Let f (x) = αf1 (x)^{s1} f2 (x)^{s2} · · · fl (x)^{sl} be the factorization of f (x) into
monic irreducible factors fi (x) with si ≥ 1 for all i (and with α ∈ F∗q ). Then,

    f ′(x) = αs1 f1 (x)^{s1 −1} f1′ (x)f2 (x)^{s2} · · · fl (x)^{sl} +
             αs2 f1 (x)^{s1} f2 (x)^{s2 −1} f2′ (x) · · · fl (x)^{sl} +
             · · · + αsl f1 (x)^{s1} f2 (x)^{s2} · · · fl (x)^{sl −1} fl′ (x)
           = αf1 (x)^{s1 −1} f2 (x)^{s2 −1} · · · fl (x)^{sl −1} [ s1 f1′ (x)f2 (x)f3 (x) · · · fl (x) +
             s2 f1 (x)f2′ (x)f3 (x) · · · fl (x) + · · · + sl f1 (x)f2 (x) · · · fl−1 (x)fl′ (x) ].

The factor within square brackets is divisible by fi (x) if and only if p|si .
Therefore, gcd(f (x), f ′(x)) = αf1 (x)^{t1} f2 (x)^{t2} · · · fl (x)^{tl} , where each ti = si or
si − 1 according as whether p|si or not. That is, f (x)/ gcd(f (x), f ′(x)) is a
divisor of f1 (x)f2 (x) · · · fl (x) and is, therefore, square-free. ⊳

This proposition shows us a way to compute the square-free factorization of


f (x). We first compute f ′ (x). If f ′ (x) = 0, we compute g(x) ∈ Fq [x] satisfying
Arithmetic of Polynomials 135

f(x) = g(x)^p, and recursively compute the square-free factorization of g(x). If f′(x) ≠ 0, we compute h(x) = gcd(f(x), f′(x)). If h(x) = 1, then f(x) is itself square-free. Otherwise, we output the square-free factor f(x)/h(x) of f(x), and recursively compute the square-free factorization of h(x).

Example 3.13 Let us compute the square-free factorization of f(x) = x^16 + x^8 + x^6 + x^4 + x^2 + 1 ∈ F2[x]. We have f′(x) = 0, so f(x) = g(x)^2, where g(x) = x^8 + x^4 + x^3 + x^2 + x + 1. Now, g′(x) = x^2 + 1 ≠ 0, and h(x) = gcd(g(x), g′(x)) = x^2 + 1, that is, g(x)/h(x) = x^6 + x^4 + x + 1 is square-free. We compute the square-free factorization of h(x) = x^2 + 1. We have h′(x) = 0. Indeed, h(x) = (x + 1)^2, where x + 1, being coprime to its derivative, is square-free. Thus, the square-free factorization of f(x) is

f(x) = (x^6 + x^4 + x + 1)(x^6 + x^4 + x + 1)(x + 1)(x + 1)(x + 1)(x + 1). □
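The recursion just described is easy to run over F2, where a polynomial can be held as an integer bitmask (bit i stores the coefficient of x^i). The following Python sketch is ours, not the text's (helper names such as sqfree are hypothetical), and replays Example 3.13:

```python
def pmod(a, b):
    # remainder of a modulo b, polynomials over F2 as bitmasks
    d = b.bit_length() - 1
    while a.bit_length() > d:
        a ^= b << (a.bit_length() - 1 - d)
    return a

def pgcd(a, b):
    # Euclidean gcd over F2[x]
    while b:
        a, b = b, pmod(a, b)
    return a

def pdiv(a, b):
    # exact quotient a/b over F2[x]
    q, d = 0, b.bit_length() - 1
    while a.bit_length() > d:
        s = a.bit_length() - 1 - d
        q ^= 1 << s
        a ^= b << s
    return q

def deriv(f):
    # formal derivative over F2: only odd-degree terms survive
    d, i = 0, 1
    while (1 << i) <= f:
        if f >> i & 1:
            d ^= 1 << (i - 1)
        i += 2
    return d

def sqrt2(f):
    # if f = g^2 (all exponents even), recover g
    g, i = 0, 0
    while (1 << i) <= f:
        if f >> i & 1:
            g ^= 1 << (i // 2)
        i += 2
    return g

def sqfree(f, mult=1):
    # list of (square-free piece, multiplicity) pairs whose product is f
    if f == 1:
        return []
    d = deriv(f)
    if d == 0:                     # f = g(x)^2
        return sqfree(sqrt2(f), 2 * mult)
    h = pgcd(f, d)
    if h == 1:
        return [(f, mult)]
    return [(pdiv(f, h), mult)] + sqfree(h, mult)
```

Here sqfree(65877), the mask of x^16 + x^8 + x^6 + x^4 + x^2 + 1, returns [(83, 2), (3, 4)], that is, the piece x^6 + x^4 + x + 1 with multiplicity 2 and x + 1 with multiplicity 4, matching the factorization above.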

3.3.2 Distinct-Degree Factorization


In view of square-free factorization, we may assume that the polynomial f(x) ∈ Fq[x] to be factored is monic and square-free. By Theorem 3.1, the polynomial x^{q^r} − x is the product of all monic irreducible polynomials of Fq[x] of degrees dividing r. Therefore, f_r(x) = gcd(f(x), x^{q^r} − x) is the product of all irreducible factors of f(x) of degrees dividing r. If f(x) does not contain factors of degrees smaller than r, then f_r(x) is equal to the product of all irreducible factors of f(x) of degree equal to r. This leads to Algorithm 3.4 for decomposing f(x) into the product f_1(x) f_2(x) · · · f_r(x) · · · . At the r-th iteration, f_r(x) = gcd(f(x), x^{q^r} − x) is computed, and f(x) is replaced by f(x)/f_r(x) so that irreducible factors of degree r are eliminated from f(x). It is convenient to compute the polynomial x^{q^r} − x modulo f(x).

Algorithm 3.4: Distinct-degree factorization of monic square-


free f (x) ∈ Fq [x]
Initialize r = 0 and g(x) = x.
While (f(x) ≠ 1) {
Increment r.
Compute g(x) = g(x)^q (mod f(x)).
Compute f_r(x) = gcd(f(x), g(x) − x).
Output (f_r(x), r).
If (f_r(x) ≠ 1) { Set f(x) = f(x)/f_r(x) and g(x) = g(x) rem f(x). }
}

Example 3.14 (1) Let us compute the distinct-degree factorization of f(x) = x^20 + x^17 + x^15 + x^11 + x^10 + x^9 + x^5 + x^3 + 1 ∈ F2[x]. One can check that gcd(f(x), f′(x)) = 1, that is, f(x) is indeed square-free. We compute f_1(x) = gcd(f(x), x^2 + x) = 1, that is, f(x) has no linear factors. We then compute f_2(x) = gcd(f(x), x^4 + x) = 1, that is, f(x) does not contain any quadratic factors either. Since f_3(x) = gcd(f(x), x^8 + x) = x^6 + x^5 + x^4 + x^3 + x^2 + x + 1, we discover that f(x) has two irreducible cubic factors. We replace f(x) by f(x)/f_3(x) = x^14 + x^13 + x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x^3 + x + 1. Now, we compute x^16 + x ≡ x^11 + x^10 + x^9 + x^8 + x^7 + x^6 + x^5 + x + 1 (mod f(x)), where f(x) is the reduced polynomial of degree 14 mentioned above. Since f_4(x) = gcd(f(x), (x^16 + x) (mod f(x))) = x^4 + x^3 + x^2 + x + 1, f(x) contains a single irreducible factor of degree four, and we replace f(x) by f(x)/f_4(x) = x^10 + x^8 + x^7 + x^5 + x^3 + x^2 + 1. We subsequently compute x^32 + x ≡ 0 (mod f(x)), that is, f_5(x) = gcd(f(x), (x^32 + x) (mod f(x))) = x^10 + x^8 + x^7 + x^5 + x^3 + x^2 + 1, so f(x) has two irreducible factors of degree five. We replace f(x) by f(x)/f_5(x) = 1, and the distinct-degree factorization loop terminates. We have, therefore, obtained the following factorization of f(x):

f_1(x) = 1,
f_2(x) = 1,
f_3(x) = x^6 + x^5 + x^4 + x^3 + x^2 + x + 1,
f_4(x) = x^4 + x^3 + x^2 + x + 1,
f_5(x) = x^10 + x^8 + x^7 + x^5 + x^3 + x^2 + 1.

We need to factor f_3(x) and f_5(x) in order to obtain the complete factorization of f(x). This is accomplished in the next stage.
(2) Now, let f(x) = x^10 + (θ+1)x^9 + (θ+1)x^8 + x^7 + θx^6 + (θ+1)x^4 + θx^3 + θx^2 + (θ+1)x ∈ F4[x], where F4 = F2(θ) with θ^2 + θ + 1 = 0. We first compute f_1(x) = gcd(f(x), x^4 + x) = x^2 + θx, and replace f(x) by f(x)/f_1(x) = x^8 + x^7 + x^6 + (θ+1)x^5 + (θ+1)x^4 + x^3 + x^2 + θ. We then compute g(x) ≡ x^16 + x ≡ x^7 + x^6 + (θ+1)x^5 + θx^4 + θx^3 + x^2 + θx + 1 (mod f(x)) and subsequently f_2(x) = gcd(f(x), g(x)) = x^2 + x + θ. We then replace f(x) by f(x)/f_2(x) = x^6 + (θ+1)x^4 + θx^2 + (θ+1)x + 1. At the next iteration, we compute x^64 + x ≡ 0 (mod f(x)), that is, f_3(x) = gcd(f(x), 0) = x^6 + (θ+1)x^4 + θx^2 + (θ+1)x + 1, and f(x) is replaced by f(x)/f_3(x) = 1. Thus, the distinct-degree factorization of f(x) is as follows:

f_1(x) = x^2 + θx,
f_2(x) = x^2 + x + θ,
f_3(x) = x^6 + (θ+1)x^4 + θx^2 + (θ+1)x + 1.

That is, f(x) has two linear, one quadratic and two cubic factors. □
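Over F2 (q = 2), Algorithm 3.4 reduces to repeated squaring modulo f(x) followed by gcds. A minimal Python sketch (helper names are ours; integers again encode F2-polynomials bitwise, bit i being the coefficient of x^i) reproduces part (1) of the example:

```python
def pmod(a, b):  # remainder over F2[x] (bitmask representation)
    d = b.bit_length() - 1
    while a.bit_length() > d:
        a ^= b << (a.bit_length() - 1 - d)
    return a

def pgcd(a, b):  # Euclidean gcd over F2[x]
    while b:
        a, b = b, pmod(a, b)
    return a

def pdiv(a, b):  # exact quotient over F2[x]
    q, d = 0, b.bit_length() - 1
    while a.bit_length() > d:
        s = a.bit_length() - 1 - d
        q ^= 1 << s
        a ^= b << s
    return q

def sq(g):       # squaring over F2 sends x^i to x^(2i)
    s, i = 0, 0
    while (1 << i) <= g:
        if g >> i & 1:
            s ^= 1 << (2 * i)
        i += 1
    return s

def ddf(f):
    # Algorithm 3.4 with q = 2; returns {r: product of the irreducible
    # factors of degree r}, omitting degrees that contribute nothing
    out, r, g = {}, 0, 0b10           # g(x) = x
    while f != 1:
        r += 1
        g = pmod(sq(g), f)            # g = g^2 mod f
        fr = pgcd(f, g ^ 0b10)        # gcd(f, g - x); note pgcd(f, 0) = f
        if fr != 1:
            out[r] = fr
            f = pdiv(f, fr)
            if f != 1:
                g = pmod(g, f)
    return out
```

Calling ddf(1216041), the mask of the degree-20 polynomial of part (1), returns {3: 127, 4: 31, 5: 1453}: exactly the products f_3, f_4, f_5 found above.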

3.3.3 Equal-Degree Factorization


In the last stage of factorization, we assume that the polynomial f(x) ∈ Fq[x] to be factored is square-free and is a product of monic irreducible factors of the same known degree r. If deg f = d, then f(x) is a product of d/r irreducible factors. Our task is to find all these factors.

In this stage, we use a strategy similar to the root-finding algorithm. Recall that x^{q^r} − x is the product of all monic irreducible polynomials of Fq[x] of degrees dividing r. If we can split x^{q^r} − x non-trivially, computing the gcd of f(x) with a non-trivial factor of x^{q^r} − x may yield a non-trivial factor of f(x). We then recursively factor this factor of f(x) and the corresponding cofactor.

First, let q be odd. For any α ∈ Fq, we have

x^{q^r} − x = (x + α)^{q^r} − (x + α)
            = (x + α) ((x + α)^{(q^r − 1)/2} − 1) ((x + α)^{(q^r − 1)/2} + 1).

So g(x) ≡ (x + α)^{(q^r − 1)/2} − 1 (mod f(x)) and f_1(x) = gcd(f(x), g(x)) are computed. If 0 < deg f_1 < deg f, we recursively split f_1(x) and f(x)/f_1(x).

Algorithm 3.5: Factoring f (x) ∈ Fq [x], a product of irreducible


factors of degree r, where q is odd
If (deg f = 0), return.
If (deg f = r) { Output f(x). Return. }
Initialize the flag splitfound to zero.
While (splitfound is zero) {
Choose α ∈ Fq randomly.
Compute g(x) = (x + α)^{(q^r − 1)/2} − 1 (mod f(x)).
Compute f1 (x) = gcd(f (x), g(x)).
If (0 < deg f1 < deg f ) {
Set splitfound to one.
Recursively call Algorithm 3.5 on f1 (x).
Recursively call Algorithm 3.5 on f (x)/f1 (x).
}
}

Example 3.15 Let us factor f(x) = x^9 + 3x^8 + 3x^7 + 2x^6 + 2x + 2 ∈ F5[x]. It is given that f(x) is the product of three cubic irreducible polynomials of F5[x]. For α = 2 ∈ F5, we have g(x) ≡ (x+2)^{(5^3 − 1)/2} − 1 ≡ (x+2)^62 − 1 ≡ 3x^8 + 3x^7 + x^6 + 2x^5 + 3x^4 + 3x^3 + 3x^2 + x + 4 (mod f(x)), and f_1(x) = gcd(f(x), g(x)) = x^6 + 2x^5 + x^4 + 4x^3 + 2x^2 + x + 1. The cofactor f(x)/f_1(x) = x^3 + x^2 + 2 is an irreducible factor of f(x). The other two factors are obtained from f_1(x).

For α = 0, g_1(x) ≡ x^62 − 1 ≡ 0 (mod f_1(x)), that is, gcd(f_1(x), g_1(x)) = f_1(x) is a trivial factor of f_1(x).

For α = 3, we obtain g_1(x) ≡ (x+3)^62 − 1 ≡ 4x^4 + 4x^2 + 4x (mod f_1(x)), and gcd(f_1(x), g_1(x)) = x^3 + x + 1 is an irreducible factor of f_1(x). The other factor of f_1(x) is f_1(x)/(x^3 + x + 1) = x^3 + 2x^2 + 1. Thus, f(x) factors as

x^9 + 3x^8 + 3x^7 + 2x^6 + 2x + 2 = (x^3 + x + 1)(x^3 + x^2 + 2)(x^3 + 2x^2 + 1). □
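The α = 2 step above can be scripted directly. In this Python sketch (function names are ours; a polynomial over F5 is a list of coefficients, lowest degree first), the power (x+2)^62 mod f(x) is computed by square-and-multiply:

```python
P = 5  # the field F5

def pmul(a, b):
    # product in F_P[x]
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % P
    while r and r[-1] == 0:
        r.pop()
    return r

def pmod(a, b):
    # remainder of a modulo b (b need not be monic)
    a = a[:]
    inv = pow(b[-1], -1, P)
    while len(a) >= len(b):
        c, s = a[-1] * inv % P, len(a) - len(b)
        for i, y in enumerate(b):
            a[s + i] = (a[s + i] - c * y) % P
        while a and a[-1] == 0:
            a.pop()
    return a

def pdiv(a, b):
    # exact quotient of a by b
    a, q = a[:], [0] * (len(a) - len(b) + 1)
    inv = pow(b[-1], -1, P)
    while len(a) >= len(b):
        c, s = a[-1] * inv % P, len(a) - len(b)
        q[s] = c
        for i, y in enumerate(b):
            a[s + i] = (a[s + i] - c * y) % P
        while a and a[-1] == 0:
            a.pop()
    return q

def pgcd(a, b):
    # monic gcd
    while b:
        a, b = b, pmod(a, b)
    inv = pow(a[-1], -1, P)
    return [c * inv % P for c in a]

def ppow(b, e, f):
    # b(x)^e mod f(x) by square and multiply
    r, b = [1], pmod(b, f)
    while e:
        if e & 1:
            r = pmod(pmul(r, b), f)
        b = pmod(pmul(b, b), f)
        e >>= 1
    return r

# Example 3.15: f = x^9 + 3x^8 + 3x^7 + 2x^6 + 2x + 2, alpha = 2, r = 3
f = [2, 2, 0, 0, 0, 0, 2, 3, 3, 1]
g = ppow([2, 1], (5 ** 3 - 1) // 2, f)   # (x+2)^62 mod f
g[0] = (g[0] - 1) % P                    # subtract 1
f1 = pgcd(f, g)
```

Here f1 comes out as [1, 1, 2, 4, 1, 2, 1], that is, x^6 + 2x^5 + x^4 + 4x^3 + 2x^2 + x + 1, and pdiv(f, f1) as x^3 + x^2 + 2, exactly as in the example.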


If q = 2^n, the polynomial x^{q^r} − x = x^{2^{nr}} + x factors for any α ∈ Fq as

x^{2^{nr}} + x = (x + α)^{2^{nr}} + (x + α)
             = ((x + α) + (x + α)^2 + (x + α)^{2^2} + (x + α)^{2^3} + · · · + (x + α)^{2^{nr−1}}) ×
               (1 + (x + α) + (x + α)^2 + (x + α)^{2^2} + (x + α)^{2^3} + · · · + (x + α)^{2^{nr−1}}).

We compute g(x) ≡ (x + α) + (x + α)^2 + (x + α)^{2^2} + (x + α)^{2^3} + · · · + (x + α)^{2^{nr−1}} (mod f(x)) and f_1(x) = gcd(f(x), g(x)). If 0 < deg f_1 < deg f, we recursively split f_1(x) and f(x)/f_1(x). The steps are explained in Algorithm 3.6.

Algorithm 3.6: Factoring f (x) ∈ F2n [x], a product of irreducible


factors of degree r
If (deg f = 0) return.
If (deg f = r) return f (x).
Set the flag splitfound to zero.
While (splitfound is zero) {
Choose α ∈ F2n randomly.
Set t(x) = x + α, and s(x) = t(x).
For i = 1, 2, . . . , nr − 1 {
Set t(x) = t(x)^2 (mod f(x)), and s(x) = s(x) + t(x).
}
Compute f1 (x) = gcd(f (x), s(x)).
If (0 < deg f_1 < deg f) {
Set splitfound to one.
Recursively call Algorithm 3.6 on f1 (x).
Recursively call Algorithm 3.6 on f (x)/f1 (x).
}
}

Example 3.16 Let us represent F4 = F2(θ) with θ^2 + θ + 1 = 0, and factor f(x) = x^6 + θx^5 + (θ+1)x^4 + x^3 + θ ∈ F4[x]. It is given that f(x) is the product of two cubic irreducible polynomials. For α = θ, g(x) ≡ (x+θ) + (x+θ)^2 + (x+θ)^4 + (x+θ)^8 + (x+θ)^16 + (x+θ)^32 ≡ θx^4 + θx^2 + θx + 1 (mod f(x)), and gcd(f(x), g(x)) = x^3 + θx^2 + θx + θ is a factor of f(x). The other factor is f(x)/(x^3 + θx^2 + θx + θ) = x^3 + x + 1. Thus, we get the factorization

x^6 + θx^5 + (θ+1)x^4 + x^3 + θ = (x^3 + x + 1)(x^3 + θx^2 + θx + θ). □
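F4 is small enough to tabulate. The Python sketch below (our own encoding: 0, 1, θ, θ+1 are stored as 0, 1, 2, 3; polynomials are coefficient lists, lowest degree first) replays the trace computation of this example:

```python
MUL = [[0, 0, 0, 0], [0, 1, 2, 3], [0, 2, 3, 1], [0, 3, 1, 2]]  # F4 product
INV = [0, 1, 3, 2]   # multiplicative inverses (INV[0] unused)

def padd(a, b):
    # addition in F4[x] is coefficient-wise XOR
    r = [0] * max(len(a), len(b))
    for i, x in enumerate(a):
        r[i] ^= x
    for i, x in enumerate(b):
        r[i] ^= x
    while r and r[-1] == 0:
        r.pop()
    return r

def pmul(a, b):
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] ^= MUL[x][y]
    while r and r[-1] == 0:
        r.pop()
    return r

def pmod(a, b):
    a = a[:]
    inv = INV[b[-1]]
    while len(a) >= len(b):
        c, s = MUL[a[-1]][inv], len(a) - len(b)
        for i, y in enumerate(b):
            a[s + i] ^= MUL[c][y]
        while a and a[-1] == 0:
            a.pop()
    return a

def pgcd(a, b):
    while b:
        a, b = b, pmod(a, b)
    inv = INV[a[-1]]
    return [MUL[inv][c] for c in a]   # normalize to monic

def pdiv(a, b):
    a, q = a[:], [0] * (len(a) - len(b) + 1)
    inv = INV[b[-1]]
    while len(a) >= len(b):
        c, s = MUL[a[-1]][inv], len(a) - len(b)
        q[s] = c
        for i, y in enumerate(b):
            a[s + i] ^= MUL[c][y]
        while a and a[-1] == 0:
            a.pop()
    return q

def trace_sum(f, alpha, nr):
    # s = t + t^2 + t^4 + ... + t^(2^(nr-1)) mod f, with t = x + alpha
    t = pmod([alpha, 1], f)
    s = t
    for _ in range(nr - 1):
        t = pmod(pmul(t, t), f)
        s = padd(s, t)
    return s

# Example 3.16: f = x^6 + t x^5 + (t+1) x^4 + x^3 + t, alpha = t, nr = 2*3
f = [2, 0, 0, 1, 3, 2, 1]
g = trace_sum(f, 2, 6)
f1 = pgcd(f, g)
```

Here g equals θx^4 + θx^2 + θx + 1, f1 = x^3 + θx^2 + θx + θ, and pdiv(f, f1) = x^3 + x + 1, as in the example.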
A big danger awaits us here. In certain situations, Algorithms 3.5 and 3.6 fail to split f(x) completely into irreducible factors. The most conceivable case is when q is small compared to the number d/r of irreducible factors of f(x). Since there are approximately q^r/r monic irreducible polynomials of degree r in Fq[x], this situation can indeed arise: if q is small compared to d/r, it may so happen that all elements α of Fq are tried in the equal-degree factorization algorithm, but a complete split of f(x) is never achieved.

Example 3.17 (1) Let us try to split f(x) = x^16 + 2x^15 + x^14 + x^13 + 2x^12 + x^11 + x^10 + 2x^6 + x^5 + 2x^4 + 2x^2 + 2x + 1 ∈ F3[x], which is known to be a product of four irreducible factors each of degree four. We try all three values of α ∈ F3 in Algorithm 3.5. The splits obtained are listed below, with g_α(x) ≡ (x + α)^40 − 1 (mod f(x)), f_{α1}(x) = gcd(f(x), g_α(x)), and f_{α2}(x) = f(x)/f_{α1}(x).

α = 0: g_0(x) = 0, f_{01}(x) = f(x), f_{02}(x) = 1.
α = 1: g_1(x) = 2x^13 + x^12 + x^8 + 2x^7 + 2x^6 + 2x^5 + 2x^4 + 2x^3 + 2x + 1,
       f_{11}(x) = x^4 + x^2 + 2x + 1,
       f_{12}(x) = x^12 + 2x^11 + 2x^7 + x^6 + x^5 + x^4 + x^3 + x^2 + 1.
α = 2: g_2(x) = x^14 + 2x^13 + 2x^12 + 2x^11 + 2x^8 + 2x^7 + 2x^5 + x^3 + x^2 + x,
       f_{21}(x) = x^8 + x^7 + 2x^6 + 2x^5 + x^4 + 2x^3 + 2x^2 + x + 1,
       f_{22}(x) = x^8 + x^7 + x^6 + 2x^5 + x^4 + 2x^2 + x + 1.

Thus, we have discovered the irreducible factor f_{11}(x) = x^4 + x^2 + 2x + 1 of f(x). If we plan to split f_{12}(x), we get trivial factorizations for α = 0, 1. For α = 2, we obtain the non-trivial split f_{12}(x) = (x^4 + x^3 + 2x + 1) f_{21}(x), which reveals yet another irreducible factor of f(x).

We cannot split f_{21}(x) non-trivially for any value of α ∈ {0, 1, 2}. On the other hand, α = 1 splits f_{22}(x) into the known factors x^4 + x^2 + 2x + 1 and x^4 + x^3 + 2x + 1. Thus, Algorithm 3.5 discovers only the partial factorization

f(x) = x^16 + 2x^15 + x^14 + x^13 + 2x^12 + x^11 + x^10 + 2x^6 + x^5 + 2x^4 + 2x^2 + 2x + 1
     = (x^4 + x^2 + 2x + 1)(x^4 + x^3 + 2x + 1)(x^8 + x^7 + 2x^6 + 2x^5 + x^4 + 2x^3 + 2x^2 + x + 1),

since it fails to split the factor of degree eight for all values of α ∈ F3.
(2) Let us try to factor f(x) = x^15 + x^7 + x^3 + x + 1 ∈ F2[x] using Algorithm 3.6. It is given that f(x) is the product of three irreducible polynomials each of degree five. We compute g(x) ≡ (x + α) + (x + α)^2 + (x + α)^4 + (x + α)^8 + (x + α)^16 (mod f(x)) for α = 0, 1. For α = 0 we get g(x) = 0, so that gcd(f(x), g(x)) = f(x), whereas for α = 1 we get g(x) = 1, so that gcd(f(x), g(x)) = 1. So Algorithm 3.6 fails to split f(x) at all.
(3) Splitting may fail even in a case where the field can supply more elements than the number of irreducible factors of f(x). By Example 3.14(2), f(x) = x^6 + (θ+1)x^4 + θx^2 + (θ+1)x + 1 ∈ F4[x] is a product of two cubic irreducible factors, where F4 = F2(θ) with θ^2 + θ + 1 = 0. The values of g(x) ≡ (x+α) + (x+α)^2 + (x+α)^4 + (x+α)^8 + (x+α)^16 + (x+α)^32 (mod f(x)) are listed below for all α ∈ F4. No value of α ∈ F4 can split f(x) non-trivially.

α       g(x)    gcd(f(x), g(x))
0       0       f(x)
1       0       f(x)
θ       1       1
θ+1     1       1
□

The root-finding algorithms of Section 3.2 may encounter similar failures, particularly when the underlying field Fq is small. However, an input polynomial cannot have more than q roots in Fq, so failure cases are relatively uncommon. Moreover, we can always evaluate the input polynomial at all elements of Fq, which is feasible (and may even be desirable) precisely when q is small.
We can modify Algorithms 3.5 and 3.6 so that this difficulty is removed. (A similar modification applies to the root-finding algorithms too.) Let u(x) = u_0 + u_1 x + u_2 x^2 + · · · + u_s x^s be a non-constant polynomial in Fq[x]. We have u(x)^{q^r} = u_0 + u_1 x^{q^r} + u_2 x^{2q^r} + · · · + u_s x^{sq^r}, so that u(x)^{q^r} − u(x) = u_1 (x^{q^r} − x) + u_2 (x^{2q^r} − x^2) + · · · + u_s (x^{sq^r} − x^s). Since x^{σq^r} − x^σ is divisible by x^{q^r} − x for all σ ∈ N, the polynomial u(x)^{q^r} − u(x) contains as factors all monic irreducible polynomials of Fq[x] of degree r (together with other irreducible factors, in general). If q is odd, we have

u(x)^{q^r} − u(x) = u(x) (u(x)^{(q^r − 1)/2} − 1) (u(x)^{(q^r − 1)/2} + 1).

Therefore, gcd(f(x), u(x)^{(q^r − 1)/2} − 1) is potentially a non-trivial factor of f(x). On the other hand, if q = 2^n, we have

u(x)^{2^{nr}} + u(x) = (u(x) + u(x)^2 + u(x)^{2^2} + u(x)^{2^3} + · · · + u(x)^{2^{nr−1}}) ×
                       (1 + u(x) + u(x)^2 + u(x)^{2^2} + u(x)^{2^3} + · · · + u(x)^{2^{nr−1}}).

That is, gcd(f(x), u(x) + u(x)^2 + u(x)^{2^2} + u(x)^{2^3} + · · · + u(x)^{2^{nr−1}}) is potentially a non-trivial factor of f(x). Algorithms 3.7 and 3.8 incorporate these modifications. Algorithms 3.5 and 3.6 are indeed special cases of these algorithms, corresponding to u(x) = x + α for α ∈ Fq. A convenient way of choosing the polynomial u(x) in Algorithm 3.8 is discussed in Exercise 3.10.

Algorithm 3.7: Modification of Algorithm 3.5 for equal-degree


factorization of f (x) ∈ Fq [x] with q odd
If (deg f = 0), return.
If (deg f = r), return f (x).
Set the flag splitfound to zero.
While (splitfound is zero) {
Choose random non-constant u(x) ∈ Fq[x] of small degree.
Compute g(x) = u(x)^{(q^r − 1)/2} − 1 (mod f(x)).
Compute f1 (x) = gcd(f (x), g(x)).
If (0 < deg f1 < deg f ) {
Set splitfound to one.
Recursively call Algorithm 3.7 on f1 (x).
Recursively call Algorithm 3.7 on f (x)/f1 (x).
}
}

Algorithm 3.8: Modification of Algorithm 3.6 for equal-degree


factorization of f (x) ∈ F2n [x]
If (deg f = 0), return.
If (deg f = r), return f (x).
Set the flag splitfound to zero.
While (splitfound is zero) {
Choose random non-constant u(x) ∈ F2n [x] of small degree.
Set s(x) = u(x).
For i = 1, 2, . . . , nr − 1 {
Compute u(x) = u(x)2 (mod f (x)), and s(x) = s(x) + u(x).
}
Compute f1 (x) = gcd(f (x), s(x)).
If (0 < deg f1 < deg f ) {
Set splitfound to one.
Recursively call Algorithm 3.8 on f1 (x).
Recursively call Algorithm 3.8 on f (x)/f1 (x).
}
}

Example 3.18 Let us now handle the failed attempts of Example 3.17.
(1) We factor f(x) = x^8 + x^7 + 2x^6 + 2x^5 + x^4 + 2x^3 + 2x^2 + x + 1 ∈ F3[x], which is known to be a product of two irreducible factors of degree four. Choose u(x) = x^2 + x + 2, for which g(x) ≡ u(x)^40 − 1 ≡ 2x^7 + 2x^5 + x^4 + 2x^3 + 2x^2 + 2x + 2 (mod f(x)). This yields the non-trivial factor gcd(f(x), g(x)) = x^4 + x^2 + x + 1. The other factor of f(x) is f(x)/(x^4 + x^2 + x + 1) = x^4 + x^3 + x^2 + 1. Therefore, we have the equal-degree factorization

f(x) = x^8 + x^7 + 2x^6 + 2x^5 + x^4 + 2x^3 + 2x^2 + x + 1
     = (x^4 + x^2 + x + 1)(x^4 + x^3 + x^2 + 1).

(2) Now, we try to factor f(x) = x^15 + x^7 + x^3 + x + 1 ∈ F2[x], which is known to be the product of three irreducible polynomials each of degree five. Choose u(x) = x^3 + 1. This gives g(x) ≡ u(x) + u(x)^2 + u(x)^4 + u(x)^8 + u(x)^16 ≡ x^10 + x^8 + x^6 + x^5 + x^4 + x + 1 (mod f(x)), and we obtain the split of f(x) into f_1(x) = gcd(f(x), g(x)) = x^10 + x^8 + x^6 + x^5 + x^4 + x + 1 and the cofactor f(x)/f_1(x) = x^5 + x^3 + 1, which is already an irreducible factor of f(x).

It now remains to split f_1(x). Polynomials u(x) of degrees 1, 2, 3, 4 fail to split f_1(x). However, the choice u(x) = x^5 + 1 works. We obtain g_1(x) ≡ u(x) + u(x)^2 + u(x)^4 + u(x)^8 + u(x)^16 ≡ x^9 + x^8 + x^7 + x^5 + x^2 + x (mod f_1(x)), and the factors of f_1(x) are revealed as gcd(f_1(x), g_1(x)) = x^5 + x^2 + 1 and f_1(x)/(x^5 + x^2 + 1) = x^5 + x^3 + x^2 + x + 1. To sum up, we have

f(x) = x^15 + x^7 + x^3 + x + 1 = (x^5 + x^2 + 1)(x^5 + x^3 + 1)(x^5 + x^3 + x^2 + x + 1).



(3) We represent F4 = F2(θ) with θ^2 + θ + 1 = 0, and try to factor f(x) = x^6 + (θ+1)x^4 + θx^2 + (θ+1)x + 1 ∈ F4[x], which is a product of two cubic irreducible polynomials. All polynomials u(x) of degrees 1, 2, 3, 4 fail to split f(x) non-trivially. The choice u(x) = x^5 produces a non-trivial split. We have g(x) ≡ u(x) + u(x)^2 + u(x)^4 + u(x)^8 + u(x)^16 + u(x)^32 ≡ θx^5 + x^3 + θx^2 + (θ+1)x + (θ+1) (mod f(x)), which yields the factors gcd(f(x), g(x)) = x^3 + x + 1 and f(x)/(x^3 + x + 1) = x^3 + θx + 1 of f(x). □
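Part (2) above is mechanical to verify over F2 with the usual bitmask representation (bit i is the coefficient of x^i; helper names are ours). The sum u + u^2 + u^4 + · · · + u^{2^{nr−1}} needs only repeated squaring modulo f(x):

```python
def pmod(a, b):
    # remainder over F2[x], polynomials as integer bitmasks
    d = b.bit_length() - 1
    while a.bit_length() > d:
        a ^= b << (a.bit_length() - 1 - d)
    return a

def pgcd(a, b):
    while b:
        a, b = b, pmod(a, b)
    return a

def pdiv(a, b):
    q, d = 0, b.bit_length() - 1
    while a.bit_length() > d:
        s = a.bit_length() - 1 - d
        q ^= 1 << s
        a ^= b << s
    return q

def sq(g):
    # squaring over F2 sends x^i to x^(2i)
    s, i = 0, 0
    while (1 << i) <= g:
        if g >> i & 1:
            s ^= 1 << (2 * i)
        i += 1
    return s

def trace_split(f, u, m):
    # gcd(f, u + u^2 + u^4 + ... + u^(2^(m-1)) mod f)
    t = pmod(u, f)
    s = t
    for _ in range(m - 1):
        t = pmod(sq(t), f)
        s ^= t
    return pgcd(f, s)

f = 0b1000000010001011              # x^15 + x^7 + x^3 + x + 1
f1 = trace_split(f, 0b1001, 5)      # u(x) = x^3 + 1
f2 = pdiv(f, f1)                    # should be x^5 + x^3 + 1
f3 = trace_split(f1, 0b100001, 5)   # u(x) = x^5 + 1 on the degree-10 part
```

With u(x) = x^3 + 1 this isolates x^5 + x^3 + 1 (mask 41), and with u(x) = x^5 + 1 applied to the degree-10 part, the remaining factors x^5 + x^2 + 1 (mask 37) and x^5 + x^3 + x^2 + x + 1 (mask 47) emerge, as above.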

The root-finding and factoring algorithms for polynomials over finite fields, as discussed above, are randomized. The best deterministic algorithms known for these problems have running times fully exponential in log q (the size of the underlying field). The best known deterministic algorithm for factoring a polynomial of degree d in Fq[x] is due to Shoup^7, and is shown by Shparlinski^8 to run in O(q^{1/2} (log q) d^{2+ε}) time, where d^ε stands for a polynomial in log d. Computations over finite fields exploit randomization very effectively.

3.3.4 Factoring Polynomials in GP/PARI


The generic factoring function in GP/PARI is factor(). One may supply a
polynomial over some prime field as its only argument. The function returns
the irreducible factors of the polynomial together with the multiplicity of each
factor. One may alternatively use the function factormod() which takes two
arguments: a polynomial f (x) ∈ Z[x] and a prime modulus p.

gp > factor(Mod(1,2)*x^15 + Mod(1,2)*x^7 + Mod(1,2)*x^3 + Mod(1,2)*x + Mod(1,2))


%1 =
[Mod(1, 2)*x^5 + Mod(1, 2)*x^2 + Mod(1, 2) 1]

[Mod(1, 2)*x^5 + Mod(1, 2)*x^3 + Mod(1, 2) 1]

[Mod(1, 2)*x^5 + Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2)*x + Mod(1, 2) 1]

gp > factor(Mod(1,2)*x^16 + Mod(1,2)*x^8 + Mod(1,2)*x^6 + Mod(1,2)*x^4 + \


Mod(1,2)*x^2 + Mod(1,2))
%2 =
[Mod(1, 2)*x + Mod(1, 2) 6]

[Mod(1, 2)*x^2 + Mod(1, 2)*x + Mod(1, 2) 2]

[Mod(1, 2)*x^3 + Mod(1, 2)*x + Mod(1, 2) 2]

gp > factor(Mod(1,3)*x^16 + Mod(2,3)*x^15 + Mod(1,3)*x^14 + Mod(1,3)*x^13 + \


Mod(2,3)*x^12 + Mod(1,3)*x^11 + Mod(1,3)*x^10 + Mod(2,3)*x^6 + \
Mod(1,3)*x^5 + Mod(2,3)*x^4 + Mod(2,3)*x^2 + Mod(2,3)*x + Mod(1,3))
%3 =

7 Victor Shoup, On the deterministic complexity of factoring polynomials over finite

fields, Information Processing Letters, 33, 261–267, 1990.


8 Igor E. Shparlinski, Computational problems in finite fields, Kluwer, 1992.

[Mod(1, 3)*x^4 + Mod(1, 3)*x^2 + Mod(1, 3)*x + Mod(1, 3) 1]

[Mod(1, 3)*x^4 + Mod(1, 3)*x^2 + Mod(2, 3)*x + Mod(1, 3) 1]

[Mod(1, 3)*x^4 + Mod(1, 3)*x^3 + Mod(2, 3)*x + Mod(1, 3) 1]

[Mod(1, 3)*x^4 + Mod(1, 3)*x^3 + Mod(1, 3)*x^2 + Mod(1, 3) 1]

gp > factormod(x^20 + x^17 + x^15 + x^11 + x^10 + x^9 + x^5 + x^3 + 1, 2)


%4 =
[Mod(1, 2)*x^3 + Mod(1, 2)*x + Mod(1, 2) 1]

[Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2) 1]

[Mod(1, 2)*x^4 + Mod(1, 2)*x^3 + Mod(1, 2)*x^2 + Mod(1, 2)*x + Mod(1, 2) 1]

[Mod(1, 2)*x^5 + Mod(1, 2)*x^2 + Mod(1, 2) 1]

[Mod(1, 2)*x^5 + Mod(1, 2)*x^3 + Mod(1, 2) 1]

gp > factormod(x^9 + 3*x^8 + 3*x^7 + 2*x^6 + 2*x + 2, 5)


%5 =
[Mod(1, 5)*x^3 + Mod(1, 5)*x + Mod(1, 5) 1]

[Mod(1, 5)*x^3 + Mod(1, 5)*x^2 + Mod(2, 5) 1]

[Mod(1, 5)*x^3 + Mod(2, 5)*x^2 + Mod(1, 5) 1]

gp > factormod(x^15 + x^12 + 2*x^6 + 3*x^4 + 6*x^3 + 3*x + 4, 73)


%6 =
[Mod(1, 73)*x + Mod(1, 73) 2]

[Mod(1, 73)*x + Mod(6, 73) 1]

[Mod(1, 73)*x + Mod(8, 73) 1]

[Mod(1, 73)*x + Mod(41, 73) 1]

[Mod(1, 73)*x + Mod(43, 73) 1]

[Mod(1, 73)*x + Mod(64, 73) 1]

[Mod(1, 73)*x^8 + Mod(55, 73)*x^7 + Mod(11, 73)*x^6 + Mod(37, 73)*x^5 + Mod(5, 73)*x^4 + Mod(62, 73)*x^3 + Mod(62, 73)*x^2 + Mod(46, 73)*x + Mod(62, 73) 1]

Both factor() and factormod() allow working in Zm[x] for a composite m. The factorization attempt of GP/PARI may fail in that case. Although it makes sense to find roots of polynomials in Zm[x], factorization in Zm[x] does not make good sense, for Zm[x] is not a unique factorization domain (not even an integral domain). For example, x^2 + 7 ≡ (x + 1)(x + 7) ≡ (x + 3)(x + 5) (mod 8).

gp > factor(Mod(1,99)*x^2 + Mod(20,99)*x + Mod(1,99))


%7 =

[Mod(1, 99)*x + Mod(10, 99) 2]

gp > factormod(x^2 + 20*x + 1, 99)


%8 =
[Mod(1, 99)*x + Mod(10, 99) 2]

gp > factor(Mod(1,100)*x^2 + Mod(20,100)*x + Mod(1,100))


*** impossible inverse modulo: Mod(2, 100).
gp > factormod(x^2 + 20*x + 1,100)
*** impossible inverse modulo: Mod(2, 100).

GP/PARI also provides facilities for factoring polynomials over extension fields Fq = F_{p^n}. The relevant function is factorff(), which takes three arguments: the first argument is the polynomial f(x) ∈ Fq[x] to be factored, the second is the characteristic p of Fq, and the third is the irreducible polynomial ι(θ) used to represent Fq as an extension of Fp. Let us use t in order to denote the element θ (a root of ι) that is adjoined to Fp for representing the extension Fq. We use the variable x for the polynomial f(x) ∈ Fq[x] to be factored.

Here is an example (see Example 3.18(3)). Let us represent F4 = F2(θ), where θ^2 + θ + 1 = 0. The second argument to factorff() will then be 2, and the third argument will be ι(t) = t^2 + t + 1. Suppose that we want to factor f(x) = x^6 + (θ+1)x^4 + θx^2 + (θ+1)x + 1 ∈ F4[x]. Thus, we should pass x^6 + (t+1)*x^4 + t*x^2 + (t+1)*x + 1 as the first argument to factorff().

gp > lift(factorff(x^6 + (t+1)*x^4 + t*x^2 + (t+1)*x + 1, 2, t^2 + t + 1))


%9 =
[Mod(1, 2)*x^3 + Mod(1, 2)*x + Mod(1, 2) 1]

[Mod(1, 2)*x^3 + (Mod(1, 2)*t)*x + Mod(1, 2) 1]

The output of GP/PARI is quite elaborate, so we lift the output of factorff to avoid seeing the modulus ι(t). A second lift suppresses the prime p too. Two other examples follow. The first corresponds to the factorization

x^10 + (θ+1)x^9 + (θ+1)x^8 + x^7 + θx^6 + (θ+1)x^4 + θx^3 + θx^2 + (θ+1)x
   = x(x + θ)(x^2 + x + θ)(x^3 + x + 1)(x^3 + θx + 1) ∈ F4[x]

(see Examples 3.14(2) and 3.18(3)), and the second to

x^15 + (θ+2)x^12 + (2θ+1) = (x + (θ+1))^6 (x^3 + θx + (θ+2))^3 ∈ F9[x],

where F9 = F3(θ) with θ^2 + 1 = 0.

gp > lift(lift(factorff( \
x^10+(t+1)*x^9+(t+1)*x^8+x^7+t*x^6+(t+1)*x^4+t*x^3+t*x^2+(t+1)*x, \
2, t^2+t+1)))

%10 =
[x 1]

[x + t 1]

[x^2 + x + t 1]

[x^3 + x + 1 1]

[x^3 + t*x + 1 1]

gp > lift(lift(factorff(x^15 + (t+2)*x^12 + (2*t+1), 3, t^2+1)))


%11 =
[x + (t + 1) 6]

[x^3 + t*x + (t + 2) 3]

3.4 Properties of Polynomials with Integer Coefficients


In this and the next sections, we study polynomials with integer coeffi-
cients. The basic results of interest are the factorization algorithms of Sec-
tion 3.5. This section develops some prerequisites for understanding them.

3.4.1 Relation with Polynomials with Rational Coefficients


Z is a unique factorization domain (UFD) whose field of fractions is Q.
This means that given a non-zero f (x) ∈ Q[x], we can multiply f (x) by a
suitable non-zero integer a in order to obtain af (x) ∈ Z[x]. Conversely, given
a non-zero polynomial f (x) ∈ Z[x] with leading coefficient a 6= 0, we obtain
the monic polynomial a1 f (x) ∈ Q[x].
A more profound fact is that factoring in Z[x] is essentially the same as in
Q[x]. Proving this fact requires the following concept.

Definition 3.19 Let f(x) = a_0 + a_1 x + · · · + a_d x^d ∈ Z[x] be non-zero. The positive integer gcd(a_0, a_1, . . . , a_d) is called the content of f(x), and is denoted by cont f(x). If cont f(x) = 1, we call f(x) a primitive polynomial.^9 ⊳

Lemma 3.20 Let f (x), g(x) ∈ Z[x] be non-zero. Then, cont(f (x)g(x)) =
(cont f (x))(cont g(x)). In particular, the product of two primitive polynomials
is again primitive.
9 This primitive polynomial has nothing to do with the primitive polynomial of Defini-

tion 2.30. The same term used for describing two different objects may create confusion.
But we have to conform to conventions.
Proof Let f(x) = Σ_{i=0}^{m} a_i x^i and g(x) = Σ_{j=0}^{n} b_j x^j, with a = cont f(x) and b = cont g(x). Write f(x) = a f̄(x) and g(x) = b ḡ(x), where f̄(x), ḡ(x) are primitive polynomials. Since f(x)g(x) = ab f̄(x)ḡ(x), it suffices to show that the product of two primitive polynomials is primitive, and we assume without loss of generality that f(x), g(x) are themselves primitive (that is, a = b = 1). We proceed by contradiction. Assume that f(x)g(x) is not primitive, that is, there exists a prime p dividing cont(f(x)g(x)), that is, p divides every coefficient of f(x)g(x). Since f(x) is primitive, not all coefficients of f(x) are divisible by p. Let s be the smallest non-negative integer for which p ∤ a_s. Analogously, let t be the smallest non-negative integer for which p ∤ b_t. The coefficient of x^{s+t} in f(x)g(x) is a_s b_t + (a_{s−1} b_{t+1} + a_{s−2} b_{t+2} + · · ·) + (a_{s+1} b_{t−1} + a_{s+2} b_{t−2} + · · ·). By the choice of s and t, the prime p divides a_{s−1}, a_{s−2}, . . . , a_0 and b_{t−1}, b_{t−2}, . . . , b_0, so p must divide a_s b_t, a contradiction, since p ∤ a_s and p ∤ b_t. ⊳
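Gauss's lemma invites a numerical spot check. A minimal Python sketch (helper names are ours; coefficients are listed lowest degree first):

```python
from math import gcd
from functools import reduce

def content(f):
    # content of a non-zero integer polynomial [a0, a1, ..., ad]
    return reduce(gcd, (abs(c) for c in f))

def pmul(a, b):
    # product in Z[x]
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] += x * y
    return r

# cont(fg) = cont(f) cont(g) on a sample pair
f, g = [6, -4, 10], [7, 21, 14]
```

Here content(f) = 2, content(g) = 7, and the content of pmul(f, g) is 14, as the lemma predicts.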
Theorem 3.21 Let f(x) ∈ Z[x] be a primitive polynomial. Then, f(x) is irreducible in Z[x] if and only if f(x) is irreducible in Q[x].

Proof The “if” part is obvious. For proving the “only if” part, assume that f(x) = g(x)h(x) is a non-trivial factorization of f(x) in Q[x]. We can write g(x) = a ḡ(x) and h(x) = b h̄(x) with a, b ∈ Q* and with primitive polynomials ḡ(x), h̄(x) ∈ Z[x]. We have f(x) = ab ḡ(x)h̄(x). Since f(x) and ḡ(x)h̄(x) are primitive polynomials in Z[x], we must have ab = ±1, that is, f(x) = (ab ḡ(x))(h̄(x)) is a non-trivial factorization of f(x) in Z[x]. ⊳
A standard way to determine the irreducibility or otherwise of a primitive
polynomial f (x) in Z[x] is to factor f (x) in Z[x] or Q[x]. In Section 3.5,
we will study some algorithms for factoring polynomials in Z[x]. There are
certain special situations, however, when we can confirm the irreducibility of
a polynomial in Z[x] more easily than factoring the polynomial.
Theorem 3.22 [Eisenstein’s criterion] Let f(x) = a_0 + a_1 x + · · · + a_d x^d ∈ Z[x] be a primitive polynomial, and p a prime that divides a_0, a_1, . . . , a_{d−1}, but not a_d. Suppose also that p^2 ∤ a_0. Then, f(x) is irreducible.

Proof Suppose that f(x) = g(x)h(x) is a non-trivial factorization of f(x) in Z[x], where g(x) = Σ_{i=0}^{m} b_i x^i and h(x) = Σ_{j=0}^{n} c_j x^j. Since f(x) is primitive, g(x) and h(x) are primitive too. We have a_0 = b_0 c_0. By hypothesis, p | a_0 but p^2 ∤ a_0, that is, p divides exactly one of b_0 and c_0. Let p | c_0. Since h(x) is primitive, not all c_j are divisible by p. Let t be the smallest positive integer for which p ∤ c_t. We have t ≤ deg h(x) = d − deg g(x) < d, so a_t is divisible by p. But a_t = b_0 c_t + b_1 c_{t−1} + b_2 c_{t−2} + · · · . By the choice of t, all the coefficients c_{t−1}, c_{t−2}, . . . , c_0 are divisible by p. It follows that p | b_0 c_t, but p ∤ b_0 and p ∤ c_t, a contradiction. ⊳
Example 3.23 Let us now prove that f(x) = 1 + x + x^2 + · · · + x^{p−1} ∈ Z[x] is irreducible for every prime p. Evidently, f(x) is irreducible if and only if f(x+1) is. But f(x) = (x^p − 1)/(x − 1), so f(x+1) = ((x+1)^p − 1)/((x+1) − 1) = x^{p−1} + C(p,1) x^{p−2} + C(p,2) x^{p−3} + · · · + C(p, p−1), where C(p,k) denotes the binomial coefficient. This polynomial satisfies Eisenstein’s criterion for the prime p: p divides every C(p,k) with 1 ≤ k ≤ p−1, the constant term is C(p, p−1) = p, which is not divisible by p^2, and the leading coefficient is 1. □
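Eisenstein’s criterion is a direct test on the coefficient list, so the example can be checked in a few lines of Python (the helper eisenstein is our own; coefficients are listed from a_0 up to a_d):

```python
from math import comb

def eisenstein(a, p):
    # a = [a0, a1, ..., ad]; True if p | a0,...,a_{d-1}, p does not
    # divide ad, and p^2 does not divide a0
    return (all(c % p == 0 for c in a[:-1])
            and a[-1] % p != 0
            and a[0] % (p * p) != 0)

# f(x+1) for f = 1 + x + x^2 + x^3 + x^4 and p = 5:
# ((x+1)^5 - 1)/x has coefficient C(5, i+1) at x^i
shifted = [comb(5, i + 1) for i in range(5)]   # [5, 10, 10, 5, 1]
```

The shifted coefficients [5, 10, 10, 5, 1] pass the test for p = 5, while f itself, with coefficients [1, 1, 1, 1, 1], does not.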

3.4.2 Height, Resultant, and Discriminant


In this section, we discuss some auxiliary results pertaining to polynomials.
Unless otherwise stated, we deal with polynomials in C[x]. All results for
such polynomials are evidently valid for polynomials with integral or rational
coefficients. In certain situations, we may generalize the concepts further, and
deal with polynomials over an arbitrary field K.

Definition 3.24 Let f(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_d x^d ∈ C[x] with a_d ≠ 0 (so that deg f(x) = d). Suppose that z_1, z_2, . . . , z_d ∈ C are all the roots of f(x), that is, f(x) = a_d (x − z_1)(x − z_2) · · · (x − z_d). We define the following quantities associated with f(x):

H(f) = height of f(x) = max(|a_0|, |a_1|, . . . , |a_d|),
|f| = Euclidean norm of (a_0, a_1, . . . , a_d) = √(|a_0|^2 + |a_1|^2 + · · · + |a_d|^2),
M(f) = measure of f(x) = |a_d| Π_{i=1}^{d} max(1, |z_i|). ⊳

Now, we specialize to polynomials with integer coefficients. Let f(x) ∈ Z[x] be as in Definition 3.24, g(x) ∈ Z[x] a factor of f(x) with deg g(x) = m, and h(x) = f(x)/g(x). It may initially appear that the coefficients of g(x) can be arbitrarily large (in absolute value), cancellation of coefficients in the product g(x)h(x) leaving only small coefficients in f(x). The following proposition shows that this is not the case: the coefficients of g(x) are bounded in absolute value by a function of the degree and the coefficients of f(x).

Proposition 3.25 With the notations introduced in the last paragraph, we have H(g) ≤ 2^m √(d+1) H(f).

Proof Renaming the roots z_1, z_2, . . . , z_d of f(x), we may assume that the roots of g(x) are z_1, z_2, . . . , z_m, and write g(x) = b_0 + b_1 x + b_2 x^2 + · · · + b_m x^m = b_m (x − z_1)(x − z_2) · · · (x − z_m). For i ∈ {0, 1, 2, . . . , m}, we have b_i = ± b_m Σ z_{j_1} z_{j_2} · · · z_{j_{m−i}}, where the sum runs over all subsets {j_1, j_2, . . . , j_{m−i}} of size m − i of {1, 2, . . . , m}. By the triangle inequality, |b_i| ≤ |b_m| Σ |z_{j_1} z_{j_2} · · · z_{j_{m−i}}|. Since g(x) | f(x) in Z[x], we have b_m | a_d in Z, so |b_m| ≤ |a_d|, and each |b_m| |z_{j_1} z_{j_2} · · · z_{j_{m−i}}| ≤ M(f). There are C(m, m−i) = C(m, i) tuples (j_1, j_2, . . . , j_{m−i}), so |b_i| ≤ C(m, i) M(f). By the inequality of Landau (Exercise 3.37), M(f) ≤ |f| = √(|a_0|^2 + |a_1|^2 + · · · + |a_d|^2) ≤ √((d+1) max(|a_0|^2, |a_1|^2, . . . , |a_d|^2)) = √(d+1) H(f). Consequently, each |b_i| ≤ C(m, i) √(d+1) H(f) ≤ 2^m √(d+1) H(f), so that H(g) = max(|b_0|, |b_1|, . . . , |b_m|) ≤ 2^m √(d+1) H(f) too. ⊳

Definition 3.26 Let K be an arbitrary field, and let

f(x) = a_m x^m + a_{m−1} x^{m−1} + · · · + a_1 x + a_0, and
g(x) = b_n x^n + b_{n−1} x^{n−1} + · · · + b_1 x + b_0

be non-zero polynomials in K[x]. The resultant Res(f(x), g(x)) of f(x) and g(x) is defined to be the determinant of the (m+n) × (m+n) matrix

Syl(f(x), g(x)) =

| a_m  a_{m−1}  · · ·  a_1  a_0   0    0   · · ·   0 |
|  0    a_m   a_{m−1}  · · ·  a_1  a_0   0  · · ·   0 |    (n rows built from the
| · · ·                                                |     coefficients of f, each
|  0   · · ·   0   a_m  a_{m−1}  · · ·  a_1  a_0      |     shifted right by one)
| b_n  b_{n−1}  · · ·  b_1  b_0   0    0   · · ·   0 |
|  0    b_n   b_{n−1}  · · ·  b_1  b_0   0  · · ·   0 |    (m rows built from the
| · · ·                                                |     coefficients of g, each
|  0    0   · · ·   0   b_n  b_{n−1}  · · ·  b_1  b_0 |     shifted right by one)

called the Sylvester matrix^10 of f(x) and g(x), that is, Res(f(x), g(x)) = det Syl(f(x), g(x)). If f(x) = 0 or g(x) = 0, we define Res(f(x), g(x)) = 0. ⊳
Some elementary properties of resultants are listed now.

Proposition 3.27 Let f(x), g(x) ∈ K[x] be as in Definition 3.26.
(1) Res(g(x), f(x)) = (−1)^{mn} Res(f(x), g(x)).
(2) Let m ≥ n, and r(x) = f(x) rem g(x) ≠ 0. Then, Res(f(x), g(x)) = (−1)^{mn} b_n^{m − deg r} Res(g(x), r(x)). In particular, resultants can be computed using the Euclidean gcd algorithm for polynomials.
(3) Let α_1, α_2, . . . , α_m be the roots of f(x), and β_1, β_2, . . . , β_n the roots of g(x) (in some extension of K). Then, we have

Res(f(x), g(x)) = a_m^n Π_{i=1}^{m} g(α_i) = (−1)^{mn} b_n^m Π_{j=1}^{n} f(β_j) = a_m^n b_n^m Π_{i=1}^{m} Π_{j=1}^{n} (α_i − β_j).

In particular, Res(f(x), g(x)) = 0 if and only if f(x) and g(x) have a non-trivial common factor (in K[x]). ⊳
Example 3.28 (1) For K = Q, f(x) = 2x^3 + 1, and g(x) = x^2 − 2x + 3,

Res(f(x), g(x)) = det
| 2   0   0   1   0 |
| 0   2   0   0   1 |
| 1  −2   3   0   0 |
| 0   1  −2   3   0 |
| 0   0   1  −2   3 |
= 89.

We can also compute this resultant by Euclidean gcd:

r(x) = f(x) rem g(x) = 2x − 11,
s(x) = g(x) rem r(x) = 89/4.

Therefore,

Res(f(x), g(x)) = (−1)^{3·2} 1^{3−1} Res(g(x), r(x)) = Res(g(x), r(x)),
Res(g(x), r(x)) = (−1)^{2·1} 2^{2−0} Res(r(x), s(x)) = 4 Res(r(x), s(x)) = 4 × (89/4) = 89.
10 This is named after the English mathematician James Joseph Sylvester (1814–1897).
Arithmetic of Polynomials 149

(2) Take f(x) = x⁴ + x² + 1 and g(x) = x³ + 1 in Q[x]. We have

$$\mathrm{Res}(f(x), g(x)) =
\begin{vmatrix}
1 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 & 1 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 1
\end{vmatrix} = 0.$$

This is expected, since f(x) = (x² + x + 1)(x² − x + 1) and g(x) = (x + 1)(x² − x + 1) have a non-trivial common factor.

(3) Take K = F₈₉, f(x) = 2x³ + 1, and g(x) = x² − 2x + 3 in F₈₉[x]. Part (1) reveals that Res(f(x), g(x)) = 0 in this case. In F₈₉[x], we have

f(x) = (x + 39)(x² + 50x + 8), and g(x) = (x + 39)(x + 48),

that is, f(x) and g(x) share the non-trivial common factor x + 39. ¤
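The computations in Example 3.28 are easy to reproduce mechanically. The following Python sketch (an illustration under the conventions of Definition 3.26, not code from the book; all function names are my own) builds the Sylvester matrix and evaluates its determinant with exact rational arithmetic. A polynomial is a list of coefficients from the leading term down.

```python
from fractions import Fraction

def sylvester(f, g):
    """Sylvester matrix of f (degree m) and g (degree n); coefficient
    lists are given from the leading coefficient down to the constant."""
    m, n = len(f) - 1, len(g) - 1
    size = m + n
    rows = []
    for i in range(n):                  # n shifted rows of f's coefficients
        rows.append([0] * i + f + [0] * (size - m - 1 - i))
    for i in range(m):                  # m shifted rows of g's coefficients
        rows.append([0] * i + g + [0] * (size - n - 1 - i))
    return rows

def det(a):
    """Determinant by Gaussian elimination over Q (exact Fractions)."""
    a = [[Fraction(x) for x in row] for row in a]
    n, sign, d = len(a), 1, Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            a[c], a[p] = a[p], a[c]
            sign = -sign
        d *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return sign * d

def resultant(f, g):
    return det(sylvester(f, g))

# Example 3.28: Res(2x^3 + 1, x^2 - 2x + 3) = 89,
# and Res(x^4 + x^2 + 1, x^3 + 1) = 0.
print(resultant([2, 0, 0, 1], [1, -2, 3]))       # 89
print(resultant([1, 0, 1, 0, 1], [1, 0, 0, 1]))  # 0
```
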

Definition 3.29 Let K be an arbitrary field, and f(x) ∈ K[x] a polynomial of degree d ≥ 1 and with leading coefficient a_d. Let α₁, α₂, ..., α_d be the roots of f(x) (in some extension of K). The discriminant of f(x) is defined as

$$\mathrm{Discr}(f(x)) = a_d^{2d-2} \prod_{i=1}^{d} \prod_{j=i+1}^{d} (\alpha_i - \alpha_j)^2 = (-1)^{d(d-1)/2}\, a_d^{2d-2} \prod_{i=1}^{d} \prod_{\substack{j=1 \\ j \neq i}}^{d} (\alpha_i - \alpha_j).$$

One can check that Discr(f(x)) ∈ K. ⊳

Discriminants are related to resultants in the following way.

Proposition 3.30 Let f(x) be as in Definition 3.29, f′(x) the formal derivative of f(x), and k = deg f′(x). Then,

Discr(f(x)) = (−1)^{d(d−1)/2} a_d^{d−k−2} Res(f(x), f′(x)).

If k = d − 1 (for example, if the characteristic of K is zero), we have

Discr(f(x)) = (−1)^{d(d−1)/2} a_d^{−1} Res(f(x), f′(x)).

For any field K, Discr(f(x)) ≠ 0 if and only if f(x) is square-free.

Proof Use Proposition 3.27(3). ⊳

Example 3.31 (1) Discriminants of polynomials of small degrees have the following explicit formulas. A verification of these formulas is left to the reader (Exercise 3.39). In each case, the leading coefficient should be non-zero.

Discr(a₁x + a₀) = 1,
Discr(a₂x² + a₁x + a₀) = a₁² − 4a₀a₂, and
Discr(a₃x³ + a₂x² + a₁x + a₀) = a₁²a₂² − 4a₀a₂³ − 4a₁³a₃ + 18a₀a₁a₂a₃ − 27a₀²a₃².
(2) Consider the quartic polynomial f(x) = 4x⁴ + 5x − 8 ∈ Q[x]. We have

$$\mathrm{Discr}(f(x)) = (-1)^{4(4-1)/2}\, 4^{-1}\, \mathrm{Res}(4x^4 + 5x - 8,\; 16x^3 + 5)
= \frac{1}{4} \times
\begin{vmatrix}
4 & 0 & 0 & 5 & -8 & 0 & 0 \\
0 & 4 & 0 & 0 & 5 & -8 & 0 \\
0 & 0 & 4 & 0 & 0 & 5 & -8 \\
16 & 0 & 0 & 5 & 0 & 0 & 0 \\
0 & 16 & 0 & 0 & 5 & 0 & 0 \\
0 & 0 & 16 & 0 & 0 & 5 & 0 \\
0 & 0 & 0 & 16 & 0 & 0 & 5
\end{vmatrix}
= (-34634432)/4 = -8658608.$$

(3) We have the prime factorization −8658608 = −(2⁴ × 7 × 97 × 797). Therefore, if f(x) = 4x⁴ + 5x − 8 is treated as a polynomial in F₇[x], we have Discr(f(x)) = 0. We have the factorization f(x) = 4(x + 3)²(x² + x + 6) in F₇[x]. The repeated linear factor justifies why Discr(f(x)) = 0 in this case. ¤

A useful bound on the discriminant of a polynomial f(x) ∈ Z[x] follows.

Proposition 3.32 Let f(x) ∈ Z[x] be of degree d ≥ 1. Then, |Discr(f(x))| ≤ (d + 1)^{2d − 1/2} H(f)^{2d−1}.

Proof Let a_d denote the leading coefficient of f(x). We have

$$\begin{aligned}
|\mathrm{Discr}(f(x))| &= |a_d|^{-1}\, |\mathrm{Res}(f(x), f'(x))| \\
&\leq |\mathrm{Res}(f(x), f'(x))| && [\text{since } |a_d| \geq 1] \\
&\leq |f|^{d-1} |f'|^{d} && [\text{by Exercise 3.43}] \\
&\leq d^d |f|^{2d-1} && [\text{since } |f'| \leq d\,|f|] \\
&\leq d^d (d+1)^{(2d-1)/2} H(f)^{2d-1} && [\text{since } |f| \leq \sqrt{d+1}\; H(f)] \\
&\leq (d+1)^{2d-\frac{1}{2}} H(f)^{2d-1}.
\end{aligned}$$
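As a quick numerical sanity check of this bound (my own illustration, not from the book): for f(x) = 4x⁴ + 5x − 8 of Example 3.31 we have d = 4, H(f) = 8, and |Discr(f(x))| = 8658608, comfortably below (d + 1)^{2d − 1/2} H(f)^{2d−1}.

```python
# Bound of Proposition 3.32 for f(x) = 4x^4 + 5x - 8 (Example 3.31):
# |Discr(f)| = 8658608 must not exceed (d+1)^(2d - 1/2) * H(f)^(2d - 1).
d, height, disc = 4, 8, 8658608
bound = (d + 1) ** (2 * d) / (d + 1) ** 0.5 * height ** (2 * d - 1)
print(disc <= bound)   # True
```
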

In GP/PARI, the function for computing resultants is polresultant(), and
the function for computing discriminants is poldisc(). Some examples follow.

gp > polresultant(2*x^3+1,x^2-2*x+3)
%1 = 89
gp > polresultant(x^4+x^2+1,x^3+1)
%2 = 0
gp > polresultant(Mod(2,89)*x^3+Mod(1,89),Mod(1,89)*x^2-Mod(2,89)*x+Mod(3,89))
%3 = 0
gp > poldisc(a*x^2+b*x+c)
%4 = -4*c*a + b^2
gp > poldisc(a*x^3+b*x^2+c*x+d)
%5 = -27*d^2*a^2 + (18*d*c*b - 4*c^3)*a + (-4*d*b^3 + c^2*b^2)
gp > poldisc(4*x^4+5*x-8)
%6 = -8658608
gp > poldisc(Mod(4,7)*x^4+Mod(5,7)*x-Mod(8,7))
%7 = Mod(0, 7)

3.4.3 Hensel Lifting


In Section 1.5, we have used the concept of Hensel lifting for solving poly-
nomial congruences modulo pn . Here, we use a similar technique for lifting
factorizations of polynomials modulo pn to modulo pn+1 .

Theorem 3.33 [Hensel lifting of polynomial factorization] Let p be a prime, and f(x), g1(x), h1(x) ∈ Z[x] be non-constant polynomials satisfying:
(1) p ∤ Res(g1(x), h1(x)),
(2) g1(x) is monic,
(3) deg g1(x) + deg h1(x) = deg f(x), and
(4) f(x) ≡ g1(x)h1(x) (mod p).
Then, for every n ∈ N, there exist polynomials gn(x), hn(x) ∈ Z[x] such that:
(a) gn(x) ≡ g1(x) (mod p), and hn(x) ≡ h1(x) (mod p),
(b) p ∤ Res(gn(x), hn(x)),
(c) gn(x) is monic, and deg gn(x) = deg g1(x),
(d) deg gn(x) + deg hn(x) = deg f(x), and
(e) f(x) ≡ gn(x)hn(x) (mod p^n).
Proof We proceed by induction on n ∈ N. For n = 1, the properties (a)–(e) reduce to the properties (1)–(4) and so are valid by hypothesis. In order to prove the inductive step, assume that the polynomials gn(x), hn(x) are available for some n ≥ 1. We construct the polynomials gn+1(x) and hn+1(x).

Here, gn(x) and hn(x) are known modulo p^n, that is, the polynomials

gn+1(x) = gn(x) + p^n un(x) and hn+1(x) = hn(x) + p^n vn(x)

in Z[x] satisfy f(x) ≡ gn+1(x)hn+1(x) (mod p^n) for any un(x), vn(x) ∈ Z[x]. Our task is to locate un(x), vn(x) so that f(x) ≡ gn+1(x)hn+1(x) (mod p^{n+1}) also. We seek un(x), vn(x) with deg un(x) < deg g1(x) and deg vn(x) ≤ deg h1(x). We have

f(x) − gn+1(x)hn+1(x) = (f(x) − gn(x)hn(x)) − p^n (vn(x)gn(x) + un(x)hn(x)) − p^{2n} un(x)vn(x).

By the induction hypothesis, f(x) − gn(x)hn(x) = p^n wn(x) for some wn(x) ∈ Z[x] with deg wn(x) ≤ deg f(x). Since n ≥ 1, we have 2n ≥ n + 1. Therefore, the condition f(x) ≡ gn+1(x)hn+1(x) (mod p^{n+1}) implies

vn(x)gn(x) + un(x)hn(x) ≡ wn(x) (mod p).

This gives us a linear system modulo p in the unknown coefficients of un(x) and vn(x). The number of variables in the system is (1 + deg vn(x)) + (1 + deg un(x)) = (1 + deg h1(x)) + deg g1(x) = 1 + deg f(x), which is the same as the number of equations. Moreover, the determinant of the coefficient matrix equals Res(gn(x), hn(x)), and is invertible modulo p by the induction hypothesis. Therefore, there exists a unique solution for the polynomials un(x), vn(x) modulo p. That is, there is a unique lift of the factorization gn(x)hn(x) modulo p^n to a factorization gn+1(x)hn+1(x) modulo p^{n+1}.
It remains to verify that the lifted polynomials gn+1(x), hn+1(x) continue to satisfy the properties (a)–(d). By construction, gn+1(x) ≡ gn(x) (mod p^n), whereas by the induction hypothesis gn(x) ≡ g1(x) (mod p), so gn+1(x) ≡ g1(x) (mod p). Analogously, hn+1(x) ≡ h1(x) (mod p). So Property (a) holds.

By construction, Res(gn+1(x), hn+1(x)) ≡ Res(gn(x), hn(x)) (mod p), and by the induction hypothesis, p ∤ Res(gn(x), hn(x)), that is, Property (b) holds.

Now, deg un(x) < deg g1(x), deg gn(x) = deg g1(x), and gn(x) is monic. It follows that gn+1(x) = gn(x) + p^n un(x) is monic too, with degree equal to deg g1(x). Thus, Property (c) holds.

For proving Property (d), assume that deg hn+1(x) < deg h1(x). But then Property (a) implies that gn+1(x)hn+1(x) reduces modulo p to a polynomial of degree less than that of g1(x)h1(x) (mod p), contradicting (3) and (4). ⊳
Example 3.34 We start with the following values.

f(x) = 35x⁵ − 22x³ + 10x² + 3x − 2 ∈ Z[x],
p = 13,
g1(x) = x² + 2x − 2 ∈ Z[x], and
h1(x) = −4x³ − 5x² + 6x + 1 ∈ Z[x].

Let us first verify that the initial conditions (1)–(4) in Theorem 3.33 are satisfied for these choices. We have Res(g1(x), h1(x)) = 33 = 2 × 13 + 7, that is, Condition (1) holds. Clearly, Conditions (2) and (3) hold. Finally, f(x) − g1(x)h1(x) = 39x⁵ + 13x⁴ − 26x³ − 13x² + 13x = 13 × (3x⁵ + x⁴ − 2x³ − x² + x), that is, Condition (4) is satisfied.
We now lift the factorization g1(x)h1(x) of f(x) modulo p to a factorization g2(x)h2(x) of f(x) modulo p². First, compute w1(x) = (f(x) − g1(x)h1(x))/p = 3x⁵ + x⁴ − 2x³ − x² + x. Then, we attempt to find the polynomials u1(x) = u11 x + u10 and v1(x) = v13 x³ + v12 x² + v11 x + v10 satisfying

v1(x)g1(x) + u1(x)h1(x) ≡ w1(x) (mod p).

Expanding the left side of this congruence and equating the coefficients of x^i, i = 5, 4, 3, 2, 1, 0, from both sides give the linear system

v13 ≡ 3
2v13 + v12 − 4u11 ≡ 1
−2v13 + 2v12 + v11 − 5u11 − 4u10 ≡ −2        (mod p),
−2v12 + 2v11 + v10 + 6u11 − 5u10 ≡ −1
−2v11 + 2v10 + u11 + 6u10 ≡ 1
−2v10 + u10 ≡ 0

that is, the system

$$\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
2 & 1 & 0 & 0 & -4 & 0 \\
-2 & 2 & 1 & 0 & -5 & -4 \\
0 & -2 & 2 & 1 & 6 & -5 \\
0 & 0 & -2 & 2 & 1 & 6 \\
0 & 0 & 0 & -2 & 0 & 1
\end{pmatrix}
\begin{pmatrix} v_{13} \\ v_{12} \\ v_{11} \\ v_{10} \\ u_{11} \\ u_{10} \end{pmatrix}
\equiv
\begin{pmatrix} 3 \\ 1 \\ -2 \\ -1 \\ 1 \\ 0 \end{pmatrix} \pmod{13}.$$

The solution of the system is

(v13 v12 v11 v10 u11 u10)^t ≡ (3 7 12 9 3 5)^t (mod 13),

that is, u1(x) = 3x + 5, and v1(x) = 3x³ + 7x² + 12x + 9. This gives g1(x) + p u1(x) = x² + 41x + 63, and h1(x) + p v1(x) = 35x³ + 86x² + 162x + 118. For a reason to be clarified in Section 3.5.1, we plan to leave each coefficient c of gn(x), hn(x) in the range −p^n/2 < c ≤ p^n/2, that is, in this case in the range −84 ≤ c ≤ 84. We, therefore, take

g2(x) = x² + 41x + 63, and h2(x) = 35x³ − 83x² − 7x − 51.
Let us verify that Properties (a)–(e) of Theorem 3.33 are satisfied by the lifted factorization. Properties (a), (c) and (d) hold obviously. Res(g2(x), h2(x)) = 896640999 = 68972384 × 13 + 7, that is, p ∤ Res(g2, h2) (Property (b)). Finally, f(x) − g2(x)h2(x) = −1352x⁴ + 1183x³ + 5577x² + 2535x + 3211 = 13² × (−8x⁴ + 7x³ + 33x² + 15x + 19) (Property (e)).
Let us now lift this factorization to g3(x)h3(x) modulo p³ = 2197. We have w2(x) = (f(x) − g2(x)h2(x))/p² = −8x⁴ + 7x³ + 33x² + 15x + 19. We seek polynomials u2(x) = u21 x + u20 and v2(x) = v23 x³ + v22 x² + v21 x + v20 with

v2(x)g2(x) + u2(x)h2(x) ≡ w2(x) (mod p).

The resulting linear system is

v23 ≡ 0
41v23 + v22 + 35u21 ≡ −8
63v23 + 41v22 + v21 − 83u21 + 35u20 ≡ 7        (mod p),
63v22 + 41v21 + v20 − 7u21 − 83u20 ≡ 33
63v21 + 41v20 − 51u21 − 7u20 ≡ 15
63v20 − 51u20 ≡ 19

that is,

$$\begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
41 & 1 & 0 & 0 & 35 & 0 \\
63 & 41 & 1 & 0 & -83 & 35 \\
0 & 63 & 41 & 1 & -7 & -83 \\
0 & 0 & 63 & 41 & -51 & -7 \\
0 & 0 & 0 & 63 & 0 & -51
\end{pmatrix}
\begin{pmatrix} v_{23} \\ v_{22} \\ v_{21} \\ v_{20} \\ u_{21} \\ u_{20} \end{pmatrix}
\equiv
\begin{pmatrix} 0 \\ -8 \\ 7 \\ 33 \\ 15 \\ 19 \end{pmatrix} \pmod{13},$$

which has the solution

(v23 v22 v21 v20 u21 u20)^t ≡ (0 0 0 3 2 12)^t (mod 13).

Therefore, u2(x) = 2x + 12 and v2(x) = 3, which yield g2(x) + p² u2(x) = x² + 379x + 2091 and h2(x) + p² v2(x) = 35x³ − 83x² − 7x + 456. We take the coefficients of g3(x) and h3(x) between −p³/2 and p³/2, that is, between −1098 and +1098, obtaining

g3(x) = x² + 379x − 106, and h3(x) = 35x³ − 83x² − 7x + 456.
Since Res(g3(x), h3(x)) = −861479699866 = (−66267669221) × 13 + 7, we have p ∤ Res(g3, h3) (Property (b)). Moreover, f(x) − g3(x)h3(x) = −13182x⁴ + 35152x³ − 6591x² − 173563x + 48334 = 13³ × (−6x⁴ + 16x³ − 3x² − 79x + 22) (Property (e)). Other properties in Theorem 3.33 are evidently satisfied. ¤
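The congruences f(x) ≡ gn(x)hn(x) (mod p^n) obtained in Example 3.34 are easy to double-check by machine. The Python sketch below (an independent verification, not part of the algorithm; helper names are my own) multiplies each pair of lifted factors and confirms that the difference from f(x) vanishes modulo the appropriate power of 13; polynomials are coefficient lists, leading term first.

```python
def poly_mul(f, g):
    """Multiply two polynomials given as coefficient lists, leading first."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def poly_sub(f, g):
    """f - g, aligning the coefficient lists at the constant term."""
    size = max(len(f), len(g))
    f = [0] * (size - len(f)) + list(f)
    g = [0] * (size - len(g)) + list(g)
    return [a - b for a, b in zip(f, g)]

f = [35, 0, -22, 10, 3, -2]                       # 35x^5 - 22x^3 + 10x^2 + 3x - 2
lifts = [
    (13 ** 2, [1, 41, 63], [35, -83, -7, -51]),   # g2, h2 modulo 13^2
    (13 ** 3, [1, 379, -106], [35, -83, -7, 456]),# g3, h3 modulo 13^3
]
for modulus, g, h in lifts:
    diff = poly_sub(f, poly_mul(g, h))
    print(all(c % modulus == 0 for c in diff))    # True, True
```
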

3.5 Factoring Polynomials with Integer Coefficients


3.5.1 Berlekamp’s Factoring Algorithm
We now possess the requisite background to understand Berlekamp's algorithm for factoring polynomials in Z[x]. We first compute a square-free decomposition of the input polynomial f(x) ∈ Z[x]. In short, we compute gcd(f(x), f′(x)), where f′(x) is the formal derivative of f(x). Since the characteristic of Z (or Q) is zero, f′(x) ≠ 0 for a non-constant f(x). The polynomial f(x)/gcd(f(x), f′(x)) is square-free. The cofactor gcd(f(x), f′(x)), unless equal to 1, is recursively subjected to square-free factorization.

We may, therefore, assume that f(x) ∈ Z[x] is itself square-free. To start with, let us consider the special case that f(x) is monic. Later, we will remove this restriction. We compute ∆ = Discr(f(x)), and choose a prime p ∤ ∆. Since f(x) is square-free, ∆ is non-zero and can have at most lg|∆| prime factors. So the smallest prime that does not divide ∆ has to be at most as large as the (1 + lg|∆|)-th prime, that is, of the order O(log|∆| log log|∆|). By Proposition 3.32, |∆| ≤ (d + 1)^{2d − 1/2} H(f)^{2d−1}, where d = deg f, and H(f) is the height of f(x). This implies that we can take p = O(d log(dH(f)) log(d log(dH(f)))). There is no need to factor ∆ to determine a suitable prime p. One may instead try p from a set of small primes (for example, in the sequence 2, 3, 5, 7, 11, ...), until one not dividing ∆ is located.
The motivation behind choosing p with p ∤ ∆ is that the polynomial f(x) continues to remain square-free modulo p. We factor f(x) in F_p[x] as

f(x) = f1(x)f2(x) · · · ft(x) ∈ F_p[x],

where f1, f2, ..., ft are distinct monic irreducible polynomials in F_p[x]. If t = 1, then f(x) is evidently irreducible in Z[x]. So suppose that t ≥ 2.

If f(x) is reducible in Z[x], it admits a factor g(x) of degree ≤ ⌊d/2⌋, and we can write f(x) = g(x)h(x) ∈ Z[x] with g(x) and h(x) monic. The polynomial g(x), even if irreducible in Z[x], need not remain so in F_p[x]. However, the factors of g(x) modulo p must come from the set {f1(x), f2(x), ..., ft(x)}. Therefore, we try all possible subsets {i1, i2, ..., ik} ⊆ {1, 2, ..., t}, k ≤ t, for which

$$1 \leq \sum_{j=1}^{k} \deg f_{i_j}(x) \leq \lfloor d/2 \rfloor.$$

Such a subset {i1, i2, ..., ik} corresponds to the

polynomial g1 (x) = fi1 (x)fi2 (x) · · · fik (x) ∈ Fp [x] which is a potential re-
duction of g(x) modulo p. We then compute h1 (x) = f (x)/g1 (x) in Fp [x].
Since f (x) is square-free modulo p, we have gcd(g1 (x), h1 (x)) = 1, that is,
Res(g1 (x), h1 (x)) is not divisible by p.
We then lift the factorization f (x) ≡ g1 (x)h1 (x) (mod p) to the (unique)
factorization f (x) ≡ gn (x)hn (x) (mod pn ) using Theorem 3.33. The poly-
nomial gn (x) is monic, and is represented so as to have coefficients between
−pn /2 and pn /2. We choose n large enough to satisfy pn /2 > H(g). Then,
gn (x) in this representation can be identified with a polynomial in Z[x].
(Notice that g(x) ∈ Z[x] may have negative coefficients. This is why we
kept the coefficients√ of gn (x) between −pn /2 and pn /2.) Proposition 3.25
gives H(g) 6 √ 2⌊d/2⌋ d + 1 H(f ), that is, we choose the smallest n satisfying
n ⌊d/2⌋
p /2 > 2 d + 1 H(f ) so as to ascertain pn /2 > H(g).
Once gn(x) is computed, we divide f(x) by gn(x) in Z[x] (actually, Q[x]). If r(x) = f(x) rem gn(x) = 0, we have detected a divisor g(x) = gn(x) ∈ Z[x] of f(x). We then recursively factor g(x) and the cofactor h(x) = f(x)/g(x).

Let us finally remove the restriction that f(x) is monic. Let a ∈ Z be the leading coefficient of f(x). We require the reduction of f(x) modulo p to be of degree d = deg f, that is, we require p ∤ a. Factoring f(x) in F_p[x] then gives f(x) ≡ a f1(x)f2(x) · · · ft(x) (mod p) with distinct monic irreducible polynomials f1, f2, ..., ft in F_p[x]. We start with a divisor g1(x) = fi1(x)fi2(x) · · · fik(x) of f(x) in F_p[x] with deg g1 ≤ ⌊d/2⌋, and set h1(x) = f(x)/g1(x) ∈ F_p[x]. Here, g1(x) is monic, and h1(x) has leading coefficient a (mod p). Using Hensel's lifting, we compute gn(x), hn(x) with gn(x) monic and f(x) ≡ gn(x)hn(x) (mod p^n).

A divisor g(x) of f(x) in Z[x] need not be monic. However, the leading coefficient b of g(x) must divide a. Multiplying g(x) by a/b gives the polynomial (a/b)g(x) with leading coefficient equal to a. Moreover, (a/b)g(x) must divide f(x) in Q[x]. Therefore, instead of checking whether gn(x) divides f(x), we now check whether a gn(x) divides f(x) in Q[x]. That is, a gn(x) is now identified with the polynomial (a/b)g(x). Since H((a/b)g(x)) ≤ H(a g(x)) = |a| H(g) ≤ |a| 2^{⌊d/2⌋} √(d+1) H(f) ≤ 2^{⌊d/2⌋} √(d+1) H(f)², we now choose n so that p^n/2 > 2^{⌊d/2⌋} √(d+1) H(f)².
Algorithm 3.9 summarizes all these observations in order to arrive at an
algorithm for factoring polynomials in Z[x].

Example 3.35 Let us try to factor the polynomial

f(x) = 35x⁵ − 22x³ + 10x² + 3x − 2 ∈ Z[x]

of Example 3.34. We have a = 35 and d = 5. We compute Discr(f(x)) = −17245509120, and choose the prime p = 13 dividing neither this discriminant nor a. The smallest exponent n with p^n > 2^{⌊d/2⌋+1} √(d+1) H(f(x))² ≈ 24005 is n = 4. (We have 13⁴ = 28561 > 24005.) In F_p[x], we have the factorization

f(x) ≡ 9(x + 5)(x + 10)(x + 11)(x² + 5) (mod p).

Algorithm 3.9: Berlekamp's algorithm for factoring a non-constant square-free polynomial f(x) ∈ Z[x]

Let a be the leading coefficient of f(x), and d = deg f(x).
Choose a prime p such that p ∤ a and p ∤ Discr(f(x)).
Choose the smallest n ∈ N such that p^n > 2^{⌊d/2⌋+1} √(d+1) H(f(x))².
Factor f(x) ≡ a f1(x)f2(x) · · · ft(x) (mod p),
    where f1, f2, ..., ft ∈ F_p[x] are distinct, monic and irreducible.
If (t = 1), output the irreducible polynomial f(x), and return.
For each factor g1(x) = fi1(x) · · · fik(x) ∈ F_p[x] of f(x) with deg g1 ≤ ⌊d/2⌋ {
    Take h1(x) = f(x)/g1(x) in F_p[x].
    Using Hensel's lifting, compute the factorization
        f(x) ≡ gn(x)hn(x) (mod p^n) with gn(x) monic,
        gn(x) ≡ g1(x) (mod p), and hn(x) ≡ h1(x) (mod p).
    Set ḡn(x) = a gn(x) ∈ Z_{p^n}[x] with each coefficient of ḡn(x)
        lying between −p^n/2 and p^n/2.
    Treat ḡn(x) as a polynomial in Z[x].
    Compute g(x) = ḡn(x)/cont(ḡn).
    Compute r(x) = f(x) rem g(x) in Q[x].
    If (r(x) = 0) {
        Recursively call Algorithm 3.9 on g(x).
        Recursively call Algorithm 3.9 on f(x)/g(x).
        Return.
    }
}
/* All potential divisors of f(x) tried, but no divisor found */
Output the irreducible polynomial f(x).

The search for potential divisors g(x) of f(x) starts by selecting g1(x) from

{x + 5, x + 10, x + 11, (x + 5)(x + 10), (x + 5)(x + 11), (x + 10)(x + 11), x² + 5}.

For every i, 1 ≤ i ≤ n, we maintain the coefficients of gi(x) and hi(x) between −p^i/2 and p^i/2. For example, x + 10 is represented as x − 3, and (x + 5)(x + 10) ≡ x² + 2x + 11 (mod p) as x² + 2x − 2. The computation of g(x) proceeds for different choices of g1(x) as follows.
Choice 1: g1(x) = x + 5 ∈ Z₁₃[x].
We have h1(x) = f(x)/g1(x) = −4x⁴ − 6x³ − 5x² − 4x − 3 ∈ Z_p[x]. Three stages of Hensel lifting produce the following polynomials.

g2(x) = x + 83 ∈ Z_{13²}[x],
h2(x) = 35x⁴ − 32x³ − 70x² + 74x − 55 ∈ Z_{13²}[x],
g3(x) = x + 421 ∈ Z_{13³}[x],
h3(x) = 35x⁴ + 644x³ − 915x² + 750x + 621 ∈ Z_{13³}[x],
g4(x) = x − 1776 ∈ Z_{13⁴}[x],
h4(x) = 35x⁴ + 5038x³ + 7873x² − 12432x − 1576 ∈ Z_{13⁴}[x].

Subsequently, we compute ḡ4(x) = a g4(x) = 35x − 5038 ∈ Z_{13⁴}[x], g(x) = ḡ4(x)/cont(ḡ4(x)) = 35x − 5038 ∈ Z[x], and r(x) = f(x) rem g(x) = 3245470620554884228/1500625 ∈ Q[x]. Since r(x) ≠ 0, g(x) is not a factor of f(x).

Choice 2: g1(x) = x − 3 ∈ Z₁₃[x].
We have the following sequence of computations.

h1(x) = f(x)/g1(x) = −4x⁴ + x³ − 6x² + 5x + 5 ∈ Z₁₃[x],
g2(x) = x − 42 ∈ Z_{13²}[x],
h2(x) = 35x⁴ − 51x³ + 33x² + 44x − 8 ∈ Z_{13²}[x],
g3(x) = x − 42 ∈ Z_{13³}[x],
h3(x) = 35x⁴ − 727x³ + 202x² − 294x + 837 ∈ Z_{13³}[x],
g4(x) = x − 13224 ∈ Z_{13⁴}[x],
h4(x) = 35x⁴ + 5864x³ + 2399x² − 6885x + 5231 ∈ Z_{13⁴}[x],
ḡ4(x) = a g4(x) = 35x − 5864 ∈ Z_{13⁴}[x],
g(x) = ḡ4(x)/cont(ḡ4(x)) = 35x − 5864 ∈ Z[x],
r(x) = f(x) rem g(x) = 6933621169702778694/1500625 ∈ Q[x].

Since r(x) ≠ 0, this factorization attempt is unsuccessful.

Choice 3: g1(x) = x − 2 ∈ Z₁₃[x].

h1(x) = f(x)/g1(x) = −4x⁴ + 5x³ + x² − x + 1 ∈ Z₁₃[x],
g2(x) = x − 41 ∈ Z_{13²}[x],
h2(x) = 35x⁴ + 83x³ + x² + 51x + 66 ∈ Z_{13²}[x],
g3(x) = x − 379 ∈ Z_{13³}[x],
h3(x) = 35x⁴ + 83x³ + 677x² − 456x + 742 ∈ Z_{13³}[x],
g4(x) = x − 13561 ∈ Z_{13⁴}[x],
h4(x) = 35x⁴ − 10902x³ − 10308x² − 9244x − 3652 ∈ Z_{13⁴}[x],
ḡ4(x) = a g4(x) = 35x + 10902 ∈ Z_{13⁴}[x],
g(x) = ḡ4(x)/cont(ḡ4(x)) = 35x + 10902 ∈ Z[x],
r(x) = f(x) rem g(x) = −154002606285781371872/1500625 ∈ Q[x].

Since r(x) ≠ 0, this factorization attempt is unsuccessful too.

Choice 4: g1(x) = (x + 5)(x + 10) = x² + 2x − 2 ∈ Z₁₃[x].
This choice of g1(x) is considered in Example 3.34.

h1(x) = f(x)/g1(x) = −4x³ − 5x² + 6x + 1 ∈ Z₁₃[x],
g2(x) = x² + 41x + 63 ∈ Z_{13²}[x],
h2(x) = 35x³ − 83x² − 7x − 51 ∈ Z_{13²}[x],
g3(x) = x² + 379x − 106 ∈ Z_{13³}[x],
h3(x) = 35x³ − 83x² − 7x + 456 ∈ Z_{13³}[x],
g4(x) = x² + 13561x + 8682 ∈ Z_{13⁴}[x],
h4(x) = 35x³ + 10902x² − 7x + 9244 ∈ Z_{13⁴}[x],
ḡ4(x) = a g4(x) = 35x² − 10902x − 10301 ∈ Z_{13⁴}[x],
g(x) = ḡ4(x)/cont(ḡ4(x)) = 35x² − 10902x − 10301 ∈ Z[x],
r(x) = f(x) rem g(x) = (14254770160420556/42875) x + 13428329145305308/42875 ∈ Q[x].

Since r(x) ≠ 0, this factorization attempt is again unsuccessful.
Choice 5: g1(x) = (x + 5)(x + 11) = x² + 3x + 3 ∈ Z₁₃[x].

h1(x) = f(x)/g1(x) = −4x³ − x² + 6x − 5 ∈ Z₁₃[x],
g2(x) = x² + 42x − 23 ∈ Z_{13²}[x],
h2(x) = 35x³ + 51x² − 7x − 44 ∈ Z_{13²}[x],
g3(x) = x² + 42x + 822 ∈ Z_{13³}[x],
h3(x) = 35x³ + 727x² − 7x + 294 ∈ Z_{13³}[x],
g4(x) = x² + 13224x + 7413 ∈ Z_{13⁴}[x],
h4(x) = 35x³ − 5864x² − 7x + 6885 ∈ Z_{13⁴}[x],
ḡ4(x) = a g4(x) = 35x² + 5864x + 2406 ∈ Z_{13⁴}[x],
g(x) = ḡ4(x)/cont(ḡ4(x)) = 35x² + 5864x + 2406 ∈ Z[x],
r(x) = f(x) rem g(x) = (1173724653532041/42875) x + 482764549856654/42875 ∈ Q[x].

Since r(x) ≠ 0, this factorization attempt is yet again unsuccessful.
Choice 6: g1(x) = (x + 10)(x + 11) = x² − 5x + 6 ∈ Z₁₃[x].

h1(x) = f(x)/g1(x) = −4x³ + 6x² + 6x + 4 ∈ Z₁₃[x],
g2(x) = x² − 83x + 32 ∈ Z_{13²}[x],
h2(x) = 35x³ + 32x² − 7x − 74 ∈ Z_{13²}[x],
g3(x) = x² − 421x + 539 ∈ Z_{13³}[x],
h3(x) = 35x³ − 644x² − 7x − 750 ∈ Z_{13³}[x],
g4(x) = x² + 1776x − 3855 ∈ Z_{13⁴}[x],
h4(x) = 35x³ − 5038x² − 7x + 12432 ∈ Z_{13⁴}[x],
ḡ4(x) = a g4(x) = 35x² + 5038x + 7880 ∈ Z_{13⁴}[x],
g(x) = ḡ4(x)/cont(ḡ4(x)) = 35x² + 5038x + 7880 ∈ Z[x],
r(x) = f(x) rem g(x) = (623273765466781/42875) x + 197140047380562/8575 ∈ Q[x].

Since r(x) ≠ 0, this factorization attempt continues to be unsuccessful.
Choice 7: g1(x) = x² + 5 ∈ Z₁₃[x].

h1(x) = f(x)/g1(x) = −4x³ − 2x − 3 ∈ Z₁₃[x],
g2(x) = x² − 34 ∈ Z_{13²}[x],
h2(x) = 35x³ − 15x + 10 ∈ Z_{13²}[x],
g3(x) = x² − 879 ∈ Z_{13³}[x],
h3(x) = 35x³ − 15x + 10 ∈ Z_{13³}[x],
g4(x) = x² + 5712 ∈ Z_{13⁴}[x],
h4(x) = 35x³ − 15x + 10 ∈ Z_{13⁴}[x],
ḡ4(x) = a g4(x) = 35x² − 7 ∈ Z_{13⁴}[x],
g(x) = ḡ4(x)/cont(ḡ4(x)) = 5x² − 1 ∈ Z[x],
r(x) = f(x) rem g(x) = 0 ∈ Q[x].

We at last have r(x) = 0, that is, g(x) = 5x² − 1 is a factor of f(x). The corresponding cofactor is h(x) = f(x)/g(x) = 7x³ − 3x + 2. We then attempt to factor g(x) and h(x) recursively. The steps are not shown here. Instead, we argue logically. Clearly, g(x) is irreducible in Z[x], since it is irreducible in Z₁₃[x]. The other factor h(x) splits modulo 13 into three linear factors. If h(x) were reducible in Z[x], it would have at least one linear factor. But we have seen above that none of the three linear factors of h(x) modulo 13 lifts to a factor of f(x). Consequently, h(x) too is irreducible in Z[x]. ¤
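The factorization found in Example 3.35 is easily confirmed: both sides of f(x) = (5x² − 1)(7x³ − 3x + 2) are polynomials of degree 5, so it suffices to compare their values at any six or more points. A quick Python check (my own illustration, not from the book):

```python
# Verify f(x) = (5x^2 - 1)(7x^3 - 3x + 2) by comparing values at 7 points;
# two degree-5 polynomials agreeing at more than 5 points are identical.
f = lambda x: 35 * x**5 - 22 * x**3 + 10 * x**2 + 3 * x - 2
g = lambda x: 5 * x**2 - 1
h = lambda x: 7 * x**3 - 3 * x + 2
print(all(f(k) == g(k) * h(k) for k in range(-3, 4)))  # True
```
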
Let us now look at the running time of Berlekamp's algorithm. The input polynomial f(x) is of degree d and height H(f), and can be encoded using O(d log H(f)) bits. Thus, the input size of the algorithm is taken as d log H(f). It is easy to argue that under the given choices of p and n, each trial of computing g(x) runs in time polynomial in d log H(f). However, the algorithm may make many unsuccessful trials. In the worst case, f(x) is irreducible in Z[x], whereas f(x) splits into d linear factors modulo p (there are examples of this and similar worst-case situations). One then has to attempt each of the

$$\sum_{k=1}^{\lfloor d/2 \rfloor} \binom{d}{k} \approx 2^{d-1}$$

subsets of the factors of f(x) in F_p[x]. All these attempts fail to produce a divisor of f(x) in Z[x]. Since the number of trials is an exponential function of d in this case, Berlekamp's algorithm takes exponential running time in the worst case. On an average, the performance of Berlekamp's algorithm is not always as bad. Still, an algorithm that runs in polynomial time even in the worst case is needed.

3.5.2 Basis Reduction in Lattices

In order to arrive at a polynomial-time algorithm for factoring polynomials with integer coefficients, Lenstra, Lenstra and Lovász¹¹ use lattices. In the rest of this chapter, I provide a brief description of this L³ (or LLL) algorithm. The proof of correctness of the algorithm is not difficult, but quite involved, and so is omitted here. Some auxiliary results are covered as exercises.

Let b1, b2, ..., bn be linearly independent vectors in R^n (written as column vectors). The set L of all integer-linear combinations of these vectors is called a lattice, denoted as

$$L = \sum_{i=1}^{n} \mathbb{Z} b_i = \{ r_1 b_1 + r_2 b_2 + \cdots + r_n b_n \mid r_i \in \mathbb{Z} \}.$$

We say that b1, b2, ..., bn constitute a basis of L, and also that L is generated by b1, b2, ..., bn. The determinant of L is defined as

d(L) = |det(b1, b2, ..., bn)|.

Example 3.36 Figure 3.1(a) shows some points in the two-dimensional lattice generated by the vectors b1 = (3, 0)^t and b2 = (1, 3)^t. The area of the shaded region is the determinant of this lattice, which is

$$\left| \det \begin{pmatrix} 3 & 1 \\ 0 & 3 \end{pmatrix} \right| = 9.$$

FIGURE 3.1: A two-dimensional lattice — (a) lattice generated by (3, 0) and (1, 3); (b) the same lattice generated by a different basis. (Figure omitted.)

In Figure 3.1(b), the same lattice is generated by the linearly independent vectors c1 = 2b1 + b2 = (7, 3)^t and c2 = b1 + b2 = (4, 3)^t. Indeed, b1 = c1 − c2 and b2 = −c1 + 2c2, so c1, c2 generate the same lattice as b1, b2. The shaded region in Part (b) has the same area

$$\left| \det \begin{pmatrix} 7 & 4 \\ 3 & 3 \end{pmatrix} \right| = 9$$

as in Part (a).

A lattice can have arbitrarily large basis vectors. For example, the two-dimensional lattice of Figure 3.1 is also generated by d1 = 1000b1 + 1001b2 = (4001, 3003)^t and d2 = 999b1 + 1000b2 = (3997, 3000)^t. ¤

¹¹ Arjen K. Lenstra, Hendrik W. Lenstra, Jr. and László Lovász, Factoring polynomials with rational coefficients, Mathematische Annalen, 261, 515–534, 1982.
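The change of basis in Example 3.36 can be checked numerically: the matrix taking b1, b2 to c1, c2 is integral with determinant ±1, so both pairs generate the same lattice and give the same lattice determinant. A small Python check (my own illustration):

```python
# Example 3.36: b1 = (3,0), b2 = (1,3) and c1 = 2b1 + b2, c2 = b1 + b2
# generate the same lattice; both bases have |det| = 9.
def det2(u, v):
    return u[0] * v[1] - u[1] * v[0]

b1, b2 = (3, 0), (1, 3)
c1 = (2 * b1[0] + b2[0], 2 * b1[1] + b2[1])   # c1 = 2 b1 + b2
c2 = (b1[0] + b2[0], b1[1] + b2[1])           # c2 = b1 + b2
print(c1, c2)                                 # (7, 3) (4, 3)
print(abs(det2(b1, b2)), abs(det2(c1, c2)))   # 9 9
```
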
It is evident that a lattice L generated by b1, b2, ..., bn is also generated by c1, c2, ..., cn if and only if (c1 c2 · · · cn) = T (b1 b2 · · · bn) for an n × n matrix T with integer entries satisfying det T = ±1. It follows that the determinant of L is a property of the lattice itself, and not of bases of L.

Given a lattice L generated by b1, b2, ..., bn, a pertinent question is to find a shortest vector in L, where the length of a vector x = (x1 x2 · · · xn)^t is its standard Euclidean (or L2) norm:

$$|x| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}.$$

This shortest-vector problem turns out to be NP-hard. Lenstra, Lenstra and Lovász propose an approximation algorithm for the problem. Although the approximation ratio is rather large (2^{(n−1)/2}), the algorithm solves many important computational problems in number theory.

The first crucial insight in the problem comes from a simple observation. Example 3.36 demonstrates that the same lattice L can be generated by many bases. The longer the basis vectors are, the more slender is the region corresponding to det(L) for that basis. Moreover, vectors in a basis have a tendency to be simultaneously long or simultaneously short. For solving the shortest-vector problem, we, therefore, plan to construct a basis of short vectors which are as orthogonal to one another as possible.
For the time being, let us treat b1, b2, ..., bn as real vectors. All real-linear combinations of these vectors span the entire R^n. Algorithm 3.10 constructs a basis b1*, b2*, ..., bn* of R^n, with the vectors bi* orthogonal to one another. The basis b1*, b2*, ..., bn* of R^n is called the Gram–Schmidt orthogonalization of b1, b2, ..., bn. The inner product (or dot product) of the two vectors x = (x1 x2 · · · xn)^t and y = (y1 y2 · · · yn)^t is defined as

⟨x, y⟩ = x1y1 + x2y2 + · · · + xnyn ∈ R.

Algorithm 3.10: Gram–Schmidt orthogonalization b1*, b2*, ..., bn* of b1, b2, ..., bn

For i = 1, 2, ..., n, compute
    bi* = bi − Σ_{j=1}^{i−1} μi,j bj*, where μi,j = ⟨bi, bj*⟩ / ⟨bj*, bj*⟩.

The quantity μi,j bj* in Algorithm 3.10 is the component of bi in the direction of the vector bj*. When all these components are removed from bi, the vector bi* becomes orthogonal to the vectors b1*, b2*, ..., b*_{i−1} computed so far.
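Algorithm 3.10 translates directly into code. The following Python sketch (illustrative; the function name is my own) uses exact rational arithmetic so that the μi,j come out as exact fractions, and reproduces the orthogonalization of the basis b1 = (3, 0)^t, b2 = (1, 3)^t from Example 3.36:

```python
from fractions import Fraction

def gram_schmidt(basis):
    """Algorithm 3.10: return (b_star, mu) for a list of integer vectors."""
    b_star, mu = [], {}
    for i, b in enumerate(basis):
        v = [Fraction(x) for x in b]
        for j in range(i):
            dot = sum(Fraction(basis[i][k]) * b_star[j][k]
                      for k in range(len(b)))
            norm2 = sum(c * c for c in b_star[j])
            mu[i, j] = dot / norm2                 # component along b_j*
            v = [vk - mu[i, j] * ck for vk, ck in zip(v, b_star[j])]
        b_star.append(v)
    return b_star, mu

b_star, mu = gram_schmidt([(3, 0), (1, 3)])
print(mu[1, 0])    # 1/3
print(b_star[1])   # [Fraction(0, 1), Fraction(3, 1)]
```
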
The multipliers μi,j are not necessarily integers, so bi* need not belong to the lattice generated by b1, b2, ..., bn. Moreover, if the vectors b1, b2, ..., bn are already orthogonal to one another, we have bi* = bi for all i = 1, 2, ..., n. The notion of near-orthogonality is captured by the following definition.

Definition 3.37 A basis b1, b2, ..., bn of a lattice L (or R^n) is called reduced if its Gram–Schmidt orthogonalization satisfies the following two conditions:

|μi,j| ≤ 1/2 for all i, j with 1 ≤ j < i ≤ n,        (3.1)

|bi* + μi,i−1 b*_{i−1}|² ≥ (3/4) |b*_{i−1}|² for all i with 2 ≤ i ≤ n.        (3.2)

The constant 3/4 in Condition (3.2) can be replaced by any real constant in the open interval (1/4, 1). ⊳

Example 3.38 Consider the two-dimensional lattice L of Example 3.36.

(1) First, consider the basis constituted by b1 = (3, 0)^t and b2 = (1, 3)^t. Its Gram–Schmidt orthogonalization is computed as follows.

b1* = b1 = (3, 0)^t,
μ2,1 = ⟨b2, b1*⟩ / ⟨b1*, b1*⟩ = (1 × 3 + 3 × 0) / (3 × 3 + 0 × 0) = 3/9 = 1/3,
b2* = b2 − μ2,1 b1* = (1, 3)^t − (1/3)(3, 0)^t = (0, 3)^t.

We have |b2* + μ2,1 b1*|² = |b2|² = 1² + 3² = 10, and |b1*|² = 3² + 0² = 9. Since 10 ≥ (3/4) × 9, the basis b1, b2 of L is reduced.

(2) Next, consider the basis constituted by c1 = (7, 3)^t and c2 = (4, 3)^t. For this basis, we have:

c1* = c1 = (7, 3)^t,
μ2,1 = ⟨c2, c1*⟩ / ⟨c1*, c1*⟩ = (4 × 7 + 3 × 3) / (7 × 7 + 3 × 3) = 37/58,
c2* = c2 − μ2,1 c1* = (4, 3)^t − (37/58)(7, 3)^t = (−27/58, 63/58)^t.

Here, |μ2,1| > 1/2, so Condition (3.1) is not satisfied. Moreover, |c2* + μ2,1 c1*|² = |c2|² = 4² + 3² = 25, whereas |c1*|² = 7² + 3² = 58, that is, Condition (3.2) too is not satisfied. The basis c1, c2 is, therefore, not reduced. ¤
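For a two-dimensional basis, the two conditions of Definition 3.37 take a particularly simple form, since b1* = b1 and |b2* + μ2,1 b1*|² = |b2|². The following Python sketch (my own illustration) re-derives the verdicts of Example 3.38:

```python
from fractions import Fraction

def is_reduced_2d(b1, b2):
    """Check Definition 3.37 for a basis (b1, b2) of a 2-dimensional lattice."""
    dot = lambda u, v: sum(Fraction(x) * y for x, y in zip(u, v))
    mu = dot(b2, b1) / dot(b1, b1)               # b1* = b1 in dimension 2
    cond1 = abs(mu) <= Fraction(1, 2)            # Condition (3.1)
    # |b2* + mu b1*|^2 = |b2|^2, so (3.2) reads |b2|^2 >= (3/4) |b1|^2
    cond2 = dot(b2, b2) >= Fraction(3, 4) * dot(b1, b1)
    return cond1 and cond2

print(is_reduced_2d((3, 0), (1, 3)))   # True
print(is_reduced_2d((7, 3), (4, 3)))   # False
```
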

The attractiveness of reduced bases stems from the following fact.


Proposition 3.39 Let b1, b2, ..., bn constitute a reduced basis of a lattice L. Then, for any non-zero vector x in L, we have

|b1|² ≤ 2^{n−1} |x|².

Moreover, for any basis x1, x2, ..., xn of L, we have

|bi|² ≤ 2^{n−1} max(|x1|², |x2|², ..., |xn|²)

for all i = 1, 2, ..., n. ⊳
Now, let us come to the main question of this section: how we can convert a given basis b1, b2, ..., bn of an n-dimensional lattice L to a reduced basis. If the given basis is not reduced, either Condition (3.1) or Condition (3.2) is violated (or both). Algorithm 3.11 repairs the violation of the condition |μk,l| ≤ 1/2 for some k, l satisfying 1 ≤ l < k ≤ n. The correctness of Algorithm 3.11 follows from the formulas for μi,j (see Algorithm 3.10). It is important to note that Algorithm 3.11 does not alter any of the orthogonal vectors bi*.

Algorithm 3.11: Subroutine for handling |μk,l| > 1/2

Let r be the integer nearest to μk,l.
Replace bk by bk − r bl.
Subtract r μl,j from μk,j for j = 1, 2, ..., l − 1.
Subtract r from μk,l.

Handling the violation of the second condition is a bit more involved. Let us
denote |b∗i |2 by Bi . Since the vectors b∗i are pairwise orthogonal, the violation
of Condition (3.2) for some k in the range 2 6 k 6 n can be rephrased as:
3
Bk + µ2k,k−1 Bk−1 < Bk−1 .
4
Now, we swap bk−1 and bk . This replaces the vector b∗k−1 by (the old vector)
b∗k +µk,k−1 b∗k−1 . Moreover, b∗k (and µk,k−1 ) are so updated that the new value
of b∗k + µk,k−1 b∗k−1 equals the old vector b∗k−1 . Consequently, Condition (3.2)
is restored at k. The updating operations are given in Algorithm 3.12.

Algorithm 3.12: Subroutine for handling B_k + µ_{k,k−1}^2 B_{k−1} < (3/4) B_{k−1}

Let µ = µ_{k,k−1}, and B = B_k + µ^2 B_{k−1}.
Update µ_{k,k−1} to the value µ B_{k−1}/B.
Set B_k = B_{k−1} B_k / B, and B_{k−1} = B.
Swap b_{k−1} and b_k.
Swap µ_{k−1,j} and µ_{k,j} for all j = 1, 2, . . . , k − 2.
For i = k + 1, k + 2, . . . , n {
    Compute M = µ_{i,k−1} − µ µ_{i,k}.
    Set µ_{i,k−1} to µ_{i,k} + µ_{k,k−1} M.
    Set µ_{i,k} to M.
}
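This update, too, admits a short sketch, under the same assumptions as before (a minimal Python rendering with 0-based indices, exact rationals, and names of my own choosing):

```python
from fractions import Fraction

def swap_step(b, mu, B, k):
    # Algorithm 3.12 (0-based indices): swap b[k-1] and b[k] and patch the
    # Gram-Schmidt data mu and B = [|b*_i|^2] without recomputing them.
    m = mu[k][k - 1]
    Bnew = B[k] + m * m * B[k - 1]   # squared norm of the new b*_{k-1}
    mu[k][k - 1] = m * B[k - 1] / Bnew
    B[k] = B[k - 1] * B[k] / Bnew
    B[k - 1] = Bnew
    b[k - 1], b[k] = b[k], b[k - 1]
    for j in range(k - 1):           # swap mu[k-1][j] and mu[k][j]
        mu[k - 1][j], mu[k][j] = mu[k][j], mu[k - 1][j]
    for i in range(k + 1, len(b)):   # fix the rows below the swapped pair
        M = mu[i][k - 1] - m * mu[i][k]
        mu[i][k - 1] = mu[i][k] + mu[k][k - 1] * M
        mu[i][k] = M

# Demo: the state reached after one size reduction on the basis (7,3), (4,3).
b = [[Fraction(7), Fraction(3)], [Fraction(-3), Fraction(0)]]
mu = [[], [Fraction(-21, 58)]]
B = [Fraction(58), Fraction(81, 58)]
swap_step(b, mu, B, 1)               # now B == [9, 9] and mu[1][0] == -7/3
```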

The Lenstra–Lenstra–Lovász basis-reduction algorithm (Algorithm 3.13)
uses the above two subroutines. First, the Gram–Schmidt orthogonalization
b*_1, b*_2, . . . , b*_n is computed. Moreover, the squared norms B_i = |b*_i|^2 are com-
puted. After this initialization stage, we do not explicitly require the orthog-
onal vectors b*_i. Only the values B_i suffice for the rest of the algorithm.

Algorithm 3.13: Reduction of the basis b_1, b_2, . . . , b_n of a lattice L

Compute b*_1, b*_2, . . . , b*_n by Algorithm 3.10.
Compute B_i = |b*_i|^2 for i = 1, 2, . . . , n.
Set t = 2.
While (t ≤ n) {
    If (|µ_{t,t−1}| > 1/2), call Algorithm 3.11 with k = t and l = t − 1.
    If (B_t < (3/4 − µ_{t,t−1}^2) B_{t−1}) {
        Call Algorithm 3.12 with k = t.
        If t ≥ 3, set t = t − 1.
    } else {
        For j = t − 2, t − 3, . . . , 1 {
            If (|µ_{t,j}| > 1/2), call Algorithm 3.11 with k = t and l = j.
        }
        Set t = t + 1.
    }
}
Return the reduced basis b_1, b_2, . . . , b_n.
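The whole procedure can be rendered compactly in code. The following minimal Python sketch (my own names, 0-based indices, exact rationals via `fractions`, with the two subroutines inlined) follows the listing above step by step; it is an illustration, not an optimized implementation:

```python
from fractions import Fraction

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def lll(basis):
    """Sketch of Algorithm 3.13 over exact rationals (0-based indices)."""
    b = [[Fraction(x) for x in v] for v in basis]
    n = len(b)
    # Gram-Schmidt (Algorithm 3.10): orthogonal vectors bs, coefficients mu
    mu = [[Fraction(0)] * n for _ in range(n)]
    bs = []
    for i in range(n):
        v = b[i][:]
        for j in range(i):
            mu[i][j] = dot(b[i], bs[j]) / dot(bs[j], bs[j])
            v = [x - mu[i][j] * y for x, y in zip(v, bs[j])]
        bs.append(v)
    B = [dot(v, v) for v in bs]          # squared norms |b*_i|^2

    def reduce_mu(k, l):                 # Algorithm 3.11
        r = round(mu[k][l])
        if r:
            b[k] = [x - r * y for x, y in zip(b[k], b[l])]
            for j in range(l):
                mu[k][j] -= r * mu[l][j]
            mu[k][l] -= r

    t = 1                                # corresponds to t = 2 in the text
    while t < n:
        reduce_mu(t, t - 1)
        if B[t] < (Fraction(3, 4) - mu[t][t - 1] ** 2) * B[t - 1]:
            m = mu[t][t - 1]             # Algorithm 3.12: swap and update
            Bnew = B[t] + m * m * B[t - 1]
            mu[t][t - 1] = m * B[t - 1] / Bnew
            B[t] = B[t - 1] * B[t] / Bnew
            B[t - 1] = Bnew
            b[t - 1], b[t] = b[t], b[t - 1]
            for j in range(t - 1):
                mu[t - 1][j], mu[t][j] = mu[t][j], mu[t - 1][j]
            for i in range(t + 1, n):
                M = mu[i][t - 1] - m * mu[i][t]
                mu[i][t - 1] = mu[i][t] + mu[t][t - 1] * M
                mu[i][t] = M
            if t > 1:                    # do not decrement below the start
                t -= 1
        else:
            for j in range(t - 2, -1, -1):
                reduce_mu(t, j)
            t += 1
    return b
```

On the basis (7, 3), (4, 3) this returns the reduced basis (−3, 0), (1, 3).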

The main (while) loop of Algorithm 3.13 maintains the invariant on t
that Condition (3.1) is satisfied for 1 ≤ j < i ≤ t − 1, and Condition (3.2) is
satisfied for 1 ≤ i ≤ t − 1. Each iteration of the loop attempts to enforce the
conditions at the current value of t. First, the condition on µ_{t,t−1} is checked,
and if this condition is violated, it is repaired by invoking Algorithm 3.11.
This repair has no other side effects.
Next, Condition (3.2) is checked for i = t. If this condition is satisfied, the
other µ_{t,j} values are handled, t is incremented, and the next iteration of the
while loop is started. However, if Condition (3.2) does not hold for i = t, the
vectors b_{t−1} and b_t are swapped, and the relevant updates are carried out by
Algorithm 3.12. Although this restores Condition (3.2) for i = t, the vector
b_{t−1} and consequently all µ_{t−1,j} values change in the process. Therefore, t is
decremented, and the next iteration of the while loop is started. However, if
t = 2, there are no valid values of the form µ_{t−1,j}, so t is not decremented.
When the while loop terminates, we have t = n + 1. By the loop invariant,
both the conditions for a reduced basis are satisfied for all relevant values of
i and j, that is, b_1, b_2, . . . , b_n is now a reduced basis of the given lattice.
In order to establish that the while loop terminates after finitely many
iterations, Lenstra, Lenstra and Lovász define the quantity

    d_i = | det( ⟨b_j, b_k⟩ )_{1≤j,k≤i} |  for i = 0, 1, 2, . . . , n.



It turns out that d_i is the square of the volume of the fundamental region
associated with the i-dimensional lattice generated by b_1, b_2, . . . , b_i, that is,

    d_i = ∏_{j=1}^{i} |b*_j|^2 ≤ ∏_{j=1}^{i} |b_j|^2 ≤ B^i,

where B = max{ |b_k|^2 | 1 ≤ k ≤ n }. In particular, d_0 = 1, and d_n = d(L)^2
(where d(L) is the determinant of the lattice L). It also turns out that

    d_i ≥ (3/4)^{i(i−1)/2} M^i

for all i = 0, 1, 2, . . . , n, where M = min{ |x|^2 | x ∈ L and x ≠ 0 } is a value
which depends only on L (and not on any basis for L). Now, let

    D = d_1 d_2 · · · d_{n−1}.
It follows that D is bounded from below by a positive value determined by
the lattice L, and from above by a positive value associated with a basis of L.
An adjustment of a µ_{k,l} by Algorithm 3.11 does not alter any b*_i, so D
remains unaffected. On the other hand, the swapping of b_{t−1} and b_t by Algo-
rithm 3.12 reduces d_{t−1} (and so D too) by a factor of at least 4/3.
Therefore, if B_init is the initial value of B, the while loop of Algorithm 3.13
goes through at most O(n^2 log B_init) iterations. Since each iteration of the
while loop can be carried out using O(n^2) integer operations, the running
time of the Lenstra–Lenstra–Lovász basis-reduction algorithm is equal to that
of O(n^4 log B_init) integer operations. The integers on which these operations
are carried out have bit lengths O(n log B_init). With schoolbook integer arith-
metic, the running time of Algorithm 3.13 is, therefore, O(n^6 log^3 B_init).

Example 3.40 Let us reduce the basis c_1, c_2 of Figure 3.1(b). Rename these
vectors as b_1 = (7, 3)^t and b_2 = (4, 3)^t. In Example 3.38(2), the Gram–Schmidt
orthogonalization of this basis is computed as

    b*_1 = (7, 3)^t,  µ_{2,1} = 37/58,  b*_2 = (−27/58, 63/58)^t.

This gives the squared norm values as (we have B_1 B_2 = d(L)^2, as expected):

    B_1 = 7^2 + 3^2 = 58,  B_2 = (27/58)^2 + (63/58)^2 = 81/58.
In the first iteration of the while loop of Algorithm 3.13, we have t = 2,
and the condition on µ_{2,1} is violated (we have |µ_{2,1}| > 1/2). The integer closest
to µ_{2,1} = 37/58 is 1. So we replace b_2 by b_2 − b_1 = (−3, 0)^t, and µ_{2,1} by µ_{2,1} − 1 =
−21/58. The values of B_1 and B_2 do not change by this adjustment.
We now have B_2 = 81/58, and (3/4 − µ_{2,1}^2) B_1 = 1041/29, that is, Condition (3.2)
is violated for t = 2. So we invoke Algorithm 3.12. We first compute B =
B_2 + µ_{2,1}^2 B_1 = 9, change µ_{2,1} to µ_{2,1} B_1 / B = −7/3, set B_2 = B_1 B_2 / B = 9 and
B_1 = B = 9. Finally, we swap b_1 and b_2, that is, we now have b_1 = (−3, 0)^t
and b_2 = (7, 3)^t. Since t = 2, we do not decrement t.
In the second iteration of the while loop, we first discover that |µ_{2,1}| = 7/3
is again too large. The integer closest to µ_{2,1} is −2. So we replace b_2 by
b_2 + 2b_1 = (1, 3)^t, and µ_{2,1} by µ_{2,1} + 2 = −1/3.
Since B_2 = 9 > 23/4 = (3/4 − µ_{2,1}^2) B_1, we do not swap b_1 and b_2. Moreover,
there are no µ_{t,j} values to take care of. So t is increased to three, and the
algorithm terminates. The computed reduced basis consists of the vectors
b_1 = (−3, 0)^t and b_2 = (1, 3)^t. Compare this with the basis in Figure 3.1(a). ¤

3.5.3 Lenstra–Lenstra–Lovász Factoring Algorithm


The Lenstra–Lenstra–Lovász (L3 or LLL) algorithm relates poly-
nomial factorization to lattices in a very clever way. Let f(x) ∈ Z[x] be a
polynomial of degree d > 0. For simplicity, assume that f is square-free and
monic. I show how the L3 algorithm is capable of producing a non-trivial split
of f, or more precisely, discovering an irreducible factor g(x) of f(x).
Like Berlekamp's algorithm, the L3 algorithm chooses a prime p not divid-
ing Discr(f), and also a suitable positive integer k. First, f is factored modulo
p, and this factorization is refined to one modulo p^k using Hensel's lifting.
Suppose that this gives us a non-constant polynomial γ(x) ∈ Z[x] satisfying:
(1) γ(x) is monic (in Z[x]).
(2) γ(x) divides f(x) in Z_{p^k}[x].
(3) γ(x) is irreducible in F_p[x].
(4) γ(x)^2 does not divide f(x) in F_p[x].
Let l = deg γ(x). If l = d, the polynomial f(x) is itself irreducible (modulo p
and so in Z[x] too). So we assume that 1 ≤ l ≤ d − 1. The polynomial γ(x)
needs to be known modulo p^k only. So we may assume that its coefficients are
between 0 and p^k − 1 (or between −p^k/2 and p^k/2). In any case, we have

    |γ|^2 ≤ 1 + l p^{2k}.
Lenstra, Lenstra and Lovász prove that the above four conditions uniquely
identify a monic irreducible factor g(x) of f(x) (in Z[x]). It is precisely that
irreducible factor which is divisible by γ(x) in Z_{p^k}[x]. In order to compute this
factor g(x), Lenstra et al. use a lattice. An integer m in the range l ≤ m ≤ d
is chosen to satisfy the following condition (to be justified later):

    p^{kl} > 2^{md/2} (2m choose m)^{d/2} |f|^{m+d}.    (3.3)
Let h(x) ∈ Z[x] be of degree ≤ m, and treat γ(x) as a (monic) polynomial
with integer coefficients. Euclidean division of h(x) by γ(x) gives

    h(x) = (q_{m−l} x^{m−l} + q_{m−l−1} x^{m−l−1} + · · · + q_1 x + q_0) γ(x) +
           (r_{l−1} x^{l−1} + r_{l−2} x^{l−2} + · · · + r_1 x + r_0)

with integers q_i and r_j. The condition γ(x) | h(x) modulo p^k is equivalent to
having each r_j divisible by p^k. Writing r_j = p^k s_j, we get

    h(x) = (q_{m−l} x^{m−l} + q_{m−l−1} x^{m−l−1} + · · · + q_1 x + q_0) γ(x) +
           (s_{l−1} p^k x^{l−1} + s_{l−2} p^k x^{l−2} + · · · + s_1 p^k x + s_0 p^k).

We treat a polynomial as a vector of its coefficients. The polynomials h(x) ∈
Z[x] of degree ≤ m and divisible by γ(x) modulo p^k, therefore, form an (m+1)-
dimensional lattice L generated by the vectors x^i γ(x) for i = 0, 1, 2, . . . , m − l
and by p^k x^j for j = 0, 1, 2, . . . , l − 1. Clearly, the determinant of this lattice is
d(L) = p^{kl}.
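These generating vectors are mechanical to write down. A small Python sketch (a hypothetical helper of my own, storing each polynomial as a coefficient list with the lowest-degree coefficient first):

```python
def factoring_lattice(gamma, pk, m):
    # Basis of the lattice of integer polynomials of degree <= m divisible by
    # gamma(x) modulo p^k.  gamma is the coefficient list of a monic gamma(x)
    # of degree l, lowest degree first; pk is the modulus p^k.
    l = len(gamma) - 1
    dim = m + 1
    rows = []
    for i in range(m - l + 1):                  # x^i * gamma(x), shifted copies
        rows.append([0] * i + list(gamma) + [0] * (dim - i - l - 1))
    for j in range(l):                          # p^k * x^j
        rows.append([0] * j + [pk] + [0] * (dim - j - 1))
    return rows
```

The m − l + 1 shifted copies of γ(x) together with the l vectors p^k x^j give m + 1 generators in all; suitably ordered, the basis matrix is triangular with diagonal entries 1 (m − l + 1 times) and p^k (l times), which is one way to see that d(L) = p^{kl}.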
This lattice L is related to the irreducible factor g(x) of f(x) as follows.

Proposition 3.41 If b ∈ L satisfies

    p^{kl} > |f|^m |b|^d,

then b(x) | g(x) in Z[x]. In particular, gcd(f(x), b(x)) ≠ 1 in this case. ⊳
Proposition 3.41 indicates the importance of short vectors in L. Let b_1, b_2,
. . . , b_{m+1} constitute a reduced basis of L as computed by Algorithm 3.13.^{12}

Proposition 3.42 Suppose that Condition (3.3) is satisfied. Let g(x) ∈ Z[x]
be the desired irreducible factor of f(x). Then,

    deg g(x) ≤ m if and only if |b_1| < (p^{kl}/|f|^m)^{1/d}.

Let t ≥ 1 be the largest integer for which |b_t| < (p^{kl}/|f|^m)^{1/d}. Then, we have
deg g(x) = m + 1 − t and, more importantly, g(x) = gcd(b_1(x), b_2(x), . . . , b_t(x)).
Moreover, in this case, |b_i| < (p^{kl}/|f|^m)^{1/d} for all i = 1, 2, . . . , t. ⊳

Example 3.43 Let us factor

    f(x) = x^5 − 2x^4 + 6x^3 − 5x^2 + 10x − 4

by the L3 algorithm. The discriminant of f is

    Discr(f) = 2634468 = 2^2 × 3 × 59 × 61^2.

^{12} Here, vectors are polynomials too, so I use the notation b_i(x) as well as b_i.

If f (x) is reducible in Z[x], it must have an irreducible factor of degree 6 2. In


order to locate this factor,√a safe (but optimistic) choice is m = 2. Moreover,
d = deg f = 5, and |f | = 12 + 22 + 62 + 52 + 102 + 42 ≈ 13.491. This gives
us an estimate of the right side of Condition (3.3). We choose the prime
p = 229497501943 such that Condition (3.3) is satisfied for k = l = 1. See
below (after this example) how the parameters should actually be chosen. For
this sample demonstration, the above choices suffice.
The polynomial f(x) factors modulo p as

    f(x) ≡ (x + 108272275755)(x + 121225226186)(x + 143510525420) ×
           (x^2 + 85986976523x + 46046039585) (mod p).

Let us take γ(x) = x + 108272275755, so l = 1. Let us also take k = 1. The
values of m for which Condition (3.3) is satisfied are m = 1 and m = 2.
Although f(x) (if reducible in Z[x]) must have an irreducible factor of degree
≤ 2, this factor need not be the desired multiple g(x) of γ(x) in F_p[x]. In
practice, p and k should be so chosen that Condition (3.3) holds for m = d − 1.
We first try with m = 1. We consider the two-dimensional lattice generated
by γ(x) and p, that is, by the vectors (1, 108272275755)^t and (0, 229497501943)^t.
Algorithm 3.13 gives us a reduced basis consisting of the polynomials b_1(x) =
442094x − 122483 and b_2(x) = −335557x − 426148 (equivalently, the vectors
(442094, −122483)^t and (−335557, −426148)^t). We have

    (p^{kl}/|f|^m)^{1/d} ≈ 111.21.

But b_1 and b_2 have much larger L2 norms. So the degree of the desired factor
g(x) of f(x) is larger than one, and the factoring attempt fails for m = 1.
Next, we try with m = 2. We consider the three-dimensional lattice
generated by xγ(x), γ(x) and p, that is, by the vectors (1, 108272275755, 0)^t,
(0, 1, 108272275755)^t and (0, 0, 229497501943)^t. A reduced basis for this lattice
consists of the three polynomials b_1(x) = x^2 − 2x + 4, b_2(x) = 114648x^2 −
122759x − 90039 and b_3(x) = 180082x^2 + 188467x + 49214, that is, the vectors
(1, −2, 4)^t, (114648, −122759, −90039)^t and (180082, 188467, 49214)^t. For m = 2, we have

    (p^{kl}/|f|^m)^{1/d} ≈ 66.09.

Clearly, b_1 has L2 norm less than this, whereas b_2 and b_3 have L2 norms larger
than this. So t = 1 (see Proposition 3.42), that is, deg g(x) = m + 1 − t = 2.
Since t = 1, no gcd calculation is necessary, and g(x) = b_1(x) = x^2 − 2x + 4.

Once the irreducible factor g(x) is discovered, what remains is to factor
f(x)/g(x) = x^3 + 2x − 1. I do not show this factoring attempt here. Indeed,
this cofactor is an irreducible polynomial (in Z[x]). ¤
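The split found in this example is easy to check by multiplying the two factors back together; a quick Python sketch (helper name is mine, coefficients stored lowest degree first):

```python
def polymul(a, b):
    # Multiply two integer polynomials given as coefficient lists,
    # lowest-degree coefficient first.
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

g = [4, -2, 1]       # x^2 - 2x + 4
c = [-1, 2, 0, 1]    # x^3 + 2x - 1
f = polymul(g, c)    # coefficients of x^5 - 2x^4 + 6x^3 - 5x^2 + 10x - 4
```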

Some comments on the L3 factoring algorithm are now in order. First, let
me prescribe a way to fix the parameters. Since the irreducible factor g(x)
may be of degree as large as d − 1, it is preferable to start with m = d − 1.
For this choice, we compute the right side of Condition (3.3). A prime p and a
positive integer k are then chosen to satisfy this condition (perhaps for the most
pessimistic case l = 1). The choice k = 1 is perfectly allowed. Even if some
k > 1 is chosen, lifting the factorization of f(x) modulo p to the factorization
modulo p^k is an easy effort. Factoring f(x) modulo p can also be efficiently
done using the randomized algorithm described in Section 3.3.
If f(x) remains irreducible modulo p, we are done. Otherwise, we choose
any irreducible factor of f(x) modulo p^k as γ(x). Under the assumption that p
does not divide Discr(f), no factor of f modulo p has multiplicity larger than
one. The choice of γ fixes l, and we may again investigate for which values of
m Condition (3.3) holds. We may start with any such value of m. However,
the choice m = d − 1 is always safe, since a value of m smaller than the degree
of g(x) forces us to repeat the basis-reduction process for a larger value of m.
The basic difference between Berlekamp's factoring algorithm and the L3
algorithm is that in Berlekamp's algorithm, we may have to explore an expo-
nential number of combinations of the irreducible factors of f modulo p. By
contrast, the L3 algorithm starts with any one suitable factor of f modulo p
in order to discover one irreducible factor of f. Therefore, the L3 algorithm
achieves a polynomial running time even in the worst case. However, both
these algorithms are based on factoring f modulo p. Although this can be
solved efficiently using randomized algorithms, there is no known polynomial-
time deterministic algorithm for this task.
Lenstra et al. estimate that the L3 algorithm can factor f completely using
only O(d^6 + d^5 log |f| + d^4 log p) arithmetic operations on integers of bit sizes
bounded above by O(d^3 + d^2 log |f| + d log p).

3.5.4 Factoring in GP/PARI


The generic factoring function in GP/PARI is factor().

gp > factor(35*x^5 - 22*x^3 + 10*x^2 + 3*x - 2)


%1 =
[5*x^2 - 1 1]

[7*x^3 - 3*x + 2 1]

gp > factor(Mod(35,13)*x^5-Mod(22,13)*x^3+Mod(10,13)*x^2+Mod(3,13)*x-Mod(2,13))
%2 =
[Mod(1, 13)*x + Mod(5, 13) 1]

[Mod(1, 13)*x + Mod(10, 13) 1]

[Mod(1, 13)*x + Mod(11, 13) 1]

[Mod(1, 13)*x^2 + Mod(5, 13) 1]

GP/PARI supplies the built-in function qflll for lattice-basis reduction. The
initial basis vectors of a lattice should be packed in a matrix, each column
storing one basis vector. The return value is again a matrix, but it does not
hold the reduced basis vectors in a similar format. It is instead a
transformation matrix which, when it post-multiplies the input matrix,
gives the reduced basis vectors. This is demonstrated for the two-dimensional
lattice of Example 3.40 and the three-dimensional lattice of Example 3.43.

gp > M = [ 7, 4; \
3, 3];
gp > T = qflll(M)
%2 =
[-1 -1]

[1 2]

gp > M * T
%3 =
[-3 1]

[0 3]

gp > M = [1,108272275755,0; 0,1,108272275755; 0,0,229497501943];


gp > M = mattranspose(M);
gp > T = qflll(M)
%6 =
[1 114648 180082]

[-108272275757 -12413199870881999 -19497887962323443]

[51080667973 5856296421718242 9198708849654953]

gp > M * T
%7 =
[1 114648 180082]

[-2 -122759 188467]

[4 -90039 49214]
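The M*T convention above is easy to mimic outside GP/PARI; a minimal Python check of the two-dimensional session (row-major matrices, helper name mine):

```python
def matmul(A, B):
    # Row-major product of two integer matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

M = [[7, 4], [3, 3]]      # input basis vectors in the columns
T = [[-1, -1], [1, 2]]    # transformation matrix returned by qflll
R = matmul(M, T)          # columns of R form the reduced basis
```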

Exercises
1. [Multiplicative form of Möbius inversion formula] Let f, g be two functions
of natural numbers satisfying f(n) = ∏_{d|n} g(d) for all n ∈ N. Prove that
g(n) = ∏_{d|n} f(d)^{µ(n/d)} = ∏_{d|n} f(n/d)^{µ(d)} for all n ∈ N.
2. (a) Find an explicit formula for the product of all monic irreducible polyno-
mials of degree n in Fq [x].
(b) Find the product of all monic sextic irreducible polynomials of F2 [x].
(c) Find the product of all monic cubic irreducible polynomials of F4 [x].
3. Which of the following polynomials is/are irreducible in F_2[x]?
(a) x^5 + x^4 + 1.
(b) x^5 + x^4 + x + 1.
(c) x^5 + x^4 + x^2 + x + 1.
4. Which of the following polynomials is/are irreducible in F_3[x]?
(a) x^4 + 2x + 1.
(b) x^4 + 2x + 2.
(c) x^4 + x^2 + 2x + 2.
5. Prove that a polynomial f (x) ∈ Fq [x] of degree two or three is irreducible if
and only if f (x) has no roots in Fq .
6. Argue that the termination criterion for the loop in Algorithm 3.4 may be
changed to deg f(x) ≤ 2r + 1. Modify the algorithm accordingly. Explain how
this modified algorithm may speed up distinct-degree factorization.
7. Establish that the square-free and the distinct-degree factorization algorithms
described in the text run in time polynomial in deg f and log q.
8. Consider the root-finding Algorithm 3.2. Let v_α(x) = (x + α)^{(q−1)/2} − 1,
w_α(x) = (x + α)^{(q−1)/2} + 1, v(x) = v_0(x), and w(x) = w_0(x).
(a) Prove that the roots of v(x) are all the quadratic residues of F*_q, and those
of w(x) are all the quadratic non-residues of F*_q.
(b) Let f(x) ∈ F_q[x] with d = deg f ≥ 2 be a product of distinct linear factors.
Assume that the roots of f(x) are random elements of F_q. Moreover, assume
that the quadratic residues in F*_q are randomly distributed in F*_q. Compute
the probability that the polynomial gcd(f(x), v_α(x)) is a non-trivial factor of
f(x) for a randomly chosen α ∈ F_q.
(c) Deduce that the expected running time of Algorithm 3.2 is polynomial in
d and log q.
9. (a) Generalize Exercise 3.8 in order to compute the probability that a random
α ∈ Fq splits f (x) in two non-trivial factors in Algorithm 3.5. Make reasonable
assumptions as in Exercise 3.8.
(b) Deduce that the expected running time of Algorithm 3.5 is polynomial in
deg f and log q.

10. Consider the root-finding Algorithm 3.3 over F_q, where q = 2^n. Let v(x) =
x + x^2 + x^{2^2} + x^{2^3} + · · · + x^{2^{n−1}}, and w(x) = 1 + v(x). Moreover, let f(x) ∈ F_q[x]
be a product of distinct linear factors, and d = deg f(x).
(a) Prove that v(x) = ∏_{γ∈F_q, Tr(γ)=0} (x + γ), and that w(x) = ∏_{γ∈F_q, Tr(γ)=1} (x + γ), where
Tr(γ) ∈ F_2 is the trace of γ ∈ F_q as defined in Exercise 2.58.
(b) Prove that v(x + α) is equal to v(x) or w(x) for every α ∈ F_q. In particular,
gcd(v(x), f(x)) is a non-trivial factor of f(x) if and only if gcd(v(x + α), f(x))
is a non-trivial factor of f(x) for each α ∈ F_q.
This implies that if v(x) fails to split f(x) non-trivially, then so also fails every
v(x + α). If so, we need to resort to Algorithm 3.8 with r = 1, that is, we
choose u(x) ∈ F_q[x] and compute gcd(v(u(x)), f(x)) with the hope that this
gcd is a non-trivial divisor of f(x). We now propose a way to choose u(x).
(c) Let i ∈ N, and α ∈ F*_q. Prove that v(αx^{2i}) = v(α^{2^{n−1}} x^i)^2.
(d) Take any two polynomials u_1(x), u_2(x) ∈ F_q[x]. Prove that v(u_1(x) +
u_2(x)) = v(u_1(x)) + v(u_2(x)) = w(u_1(x)) + w(u_2(x)).
(e) Suppose that all (non-constant) polynomials u(x) ∈ F_q[x] of degrees < s
fail to split f(x). Take u(x) ∈ F_q[x] of degree s and leading coefficient α. Prove
that u(x) splits f(x) non-trivially if and only if αx^s splits f(x) non-trivially.
(f) Prescribe a strategy to choose the polynomial u(x) in Algorithm 3.8.
11. Generalize the ideas developed in Exercise 3.10 to rewrite Algorithm 3.8 for
equal-degree factorization. More precisely, establish that the above sequence
of choosing u(x) works for equal-degree factorization too.
12. Let q = p^n, r ∈ N, and v(x) = x + x^p + x^{p^2} + · · · + x^{p^{nr−1}}. Prove that
x^{q^r} − x = ∏_{a∈F_p} (v(x) − a). Prove also that v(x) − a = ∏_{γ∈F_{q^r}, Tr(γ)=a} (x − γ)
for each a ∈ F_p.

13. Find all the roots of x^6 + x + 5 in F_17.
14. Find all the roots of x^4 + (θ + 1)x + θ in F_8 = F_2(θ) with θ^3 + θ + 1 = 0.
15. Find all the roots of x^5 + (θ + 1)x + (2θ + 1) in F_9 = F_3(θ) with θ^2 + 1 = 0.
16. Find the square-free factorization of x^9 + x^8 + x^7 + x^6 + x^5 + x^4 + x + 1 ∈ F_2[x].
17. Find the square-free factorization of x^10 + 2x^9 + x^8 + 2x^5 + 2x^4 + 1 ∈ F_3[x].
18. Find the square-free factorization of x^20 + (θ + 1)x^8 + θ ∈ F_8[x], where F_8 =
F_2(θ) with θ^3 + θ + 1 = 0.
19. Find the square-free factorization of x^15 + (2θ + 1)x^12 + (θ + 2) ∈ F_9[x], where
F_9 = F_3(θ) with θ^2 + 1 = 0.
20. Find the distinct-degree factorization of x^8 + x^3 + x^2 + 1 ∈ F_2[x].
21. Find the distinct-degree factorization of x^8 + x^2 + 1 ∈ F_3[x].
22. Find the distinct-degree factorization of x^4 + (θ + 1)x + θ ∈ F_8[x], where
F_8 = F_2(θ) with θ^3 + θ + 1 = 0.
23. Find the distinct-degree factorization of x^5 + (θ + 1)x + (2θ + 1) ∈ F_9[x], where
F_9 = F_3(θ) with θ^2 + 1 = 0.

24. Find the equal-degree factorization of x^4 + 7x + 2 ∈ F_17[x], which is a product
of two quadratic irreducible polynomials.
25. Find the equal-degree factorization of x^6 + 16x^5 + 3x^4 + 16x^3 + 8x^2 + 8x + 14 ∈
F_17[x], which is a product of three quadratic irreducible polynomials.
26. Represent F_8 = F_2(θ) with θ^3 + θ + 1 = 0. Find the equal-degree factorization
of x^4 + (θ + 1)x^2 + θx + (θ^2 + θ + 1) ∈ F_8[x], which is a product of two quadratic
irreducible polynomials.
27. Represent F_9 = F_3(θ) with θ^2 + 1 = 0. Find the equal-degree factorization of
x^4 + θx^2 + x + (θ + 2) ∈ F_9[x], which is a product of two quadratic irreducible
polynomials.
28. [Chinese remainder theorem for polynomials] Let K be a field, and m_1(x),
m_2(x), . . . , m_t(x) be pairwise coprime non-constant polynomials in K[x] with
d_i = deg m_i(x). Prove that given polynomials a_1(x), a_2(x), . . . , a_t(x) ∈ K[x],
there exists a unique polynomial f(x) ∈ K[x] of degree < Σ_{i=1}^{t} d_i such that
f(x) ≡ a_i(x) (mod m_i(x)) for all i = 1, 2, . . . , t.
29. Let f(x) ∈ F_q[x] be a monic non-constant polynomial, and let h(x) ∈ F_q[x]
satisfy h(x)^q ≡ h(x) (mod f(x)). Prove that h(x)^q − h(x) = ∏_{γ∈F_q} (h(x) − γ).
Conclude that f(x) = ∏_{γ∈F_q} gcd(f(x), h(x) − γ).
30. [Berlekamp's Q-matrix factorization] You are given a monic non-constant
square-free polynomial f(x) ∈ F_q[x] with t irreducible factors (not necessarily
of the same degree). Let d = deg f(x).
(a) Prove that there are exactly q^t polynomials of degrees less than d satis-
fying h(x)^q ≡ h(x) (mod f(x)).
(b) In order to determine all these polynomials h(x) of Part (a), write h(x) =
α_0 + α_1 x + α_2 x^2 + · · · + α_{d−1} x^{d−1}. Derive a d × d matrix Q such that the
unknown coefficients α_0, α_1, α_2, . . . , α_{d−1} ∈ F_q can be obtained by solving the
homogeneous linear system Q (α_0 α_1 α_2 · · · α_{d−1})^t = 0 in F_q.
(c) Deduce that the matrix Q has rank d − t and nullity t.
(d) Suppose that t ≥ 2. Let V ≅ F_q^t denote the nullspace of Q. Prove that for
every two irreducible factors f_1(x), f_2(x) of f(x) and for every two distinct
elements γ_1, γ_2 ∈ F_q, there exists an (α_0, α_1, . . . , α_{d−1}) ∈ V such that h(x) ≡
γ_1 (mod f_1(x)) and h(x) ≡ γ_2 (mod f_2(x)), where h(x) = α_0 + α_1 x + · · · +
α_{d−1} x^{d−1}. Moreover, for any basis of V, we can choose distinct γ_1, γ_2 ∈ F_q in
such a way that (α_0, α_1, . . . , α_{d−1}) is a vector of the basis.
(e) Assume that q is small. Propose a deterministic polynomial-time algo-
rithm for factoring f(x) based on the ideas developed in this exercise.
31. Factor x^8 + x^5 + x^4 + x + 1 ∈ F_2[x] using Berlekamp's Q-matrix algorithm.
32. Let f(x) ∈ F_q[x] be as in Exercise 3.30. Evidently, f(x) is irreducible if and
only if t = 1. Describe a polynomial-time algorithm for checking the irre-
ducibility of f(x), based upon the determination of t. You do not need to
assume that q is small. Compare this algorithm with Algorithm 3.1.

33. Using Eisenstein's criterion, prove the irreducibility in Z[x] of:
(a) 7x^2 − 180.
(b) x^23 − 9x^12 + 15.
(c) x^2 + x + 2. (Hint: Replace x by x + 3.)
(d) x^4 + 2x + 7.
34. [Eisenstein's criterion in K[x, y]] Let K be a field. Write f(x, y) ∈ K[x, y]
as f(x, y) = a_d(y)x^d + a_{d−1}(y)x^{d−1} + · · · + a_1(y)x + a_0(y) with a_i(y) ∈ K[y].
Assume that d ≥ 1, and gcd(a_0(y), a_1(y), a_2(y), . . . , a_d(y)) = 1. Suppose that
there exists an irreducible polynomial p(y) in K[y] with the properties that
p(y) ∤ a_d(y), p(y) | a_i(y) for i = 0, 1, . . . , d − 1, and p(y)^2 ∤ a_0(y). Prove that
f(x, y) is irreducible in K[x, y].
35. Prove that x^4 y + xy^4 + 4xy^2 − y^3 + 4x − 2y is irreducible in Q[x, y].
36. Let n ∈ N. A complex number ω satisfying ω^n = 1 is called an n-th root of
unity. An n-th root ω of unity is called primitive if ω^m ≠ 1 for 1 ≤ m < n.
Denote by P(n) the set of all primitive n-th roots of unity. The n-th cyclotomic
polynomial Φ_n(x) is defined as Φ_n(x) = ∏_{ω∈P(n)} (x − ω) ∈ C[x].
(a) Prove that deg Φ_n(x) = φ(n), where φ() is Euler's totient function.
(b) Compute Φ_n(x) for n = 1, 2, 3, 4, 5, 6.
(c) Prove that x^n − 1 = ∏_{d|n} Φ_d(x).
(d) Using Möbius inversion formula, deduce that Φ_n(x) = ∏_{d|n} (x^d − 1)^{µ(n/d)}.
(e) Prove that Φ_p(x) = x^{p−1} + x^{p−2} + · · · + x + 1 for any prime p.
(f) Prove that Φ_{2^n}(x) = x^{2^{n−1}} + 1 for any n ∈ N.
(g) Derive a formula for Φ_{p^n}(x) for an odd prime p and for n ∈ N.
(h) Prove that Φ_n(x) ∈ Z[x].
(i) Prove that Φ_n(x) is irreducible in Z[x].
37. Let f(x) = a_0 + a_1 x + · · · + a_d x^d ∈ C[x] with d ≥ 1, a_d ≠ 0. Prove that:
(a) |(x − z)f(x)| = |(z̄x − 1)f(x)| for any z ∈ C (where z̄ is the complex
conjugate of z).
(b) [Inequality of Gonçalves] M(f)^2 + |a_0 a_d|^2 M(f)^{−2} ≤ |f|^2.
(c) [Inequality of Landau] M(f) ≤ |f|. Moreover, if f(x) is not a monomial,
then M(f) < |f|.
38. Prove Proposition 3.27.
39. Prove the formulas in Example 3.31(1).
40. Let K be a field, a ∈ K, and 0 ≠ f(x), g(x), h(x) ∈ K[x]. Prove that:
(a) Res((x − a)f(x), g(x)) = g(a) Res(f(x), g(x)).
(b) Res(f(x), g(x)h(x)) = Res(f(x), g(x)) × Res(f(x), h(x)).
41. Prove that the non-constant polynomials f(x), g(x) ∈ K[x] (K a field) have a
common non-constant factor if and only if u(x)f(x) + v(x)g(x) = 0 for some
non-zero u(x), v(x) ∈ K[x] with deg u(x) < deg g(x) and deg v(x) < deg f(x).
42. Prove that α ∈ F_{p^n} is a normal element if and only if f(z) = z^n − 1 and
g(z) = αz^{n−1} + α^p z^{n−2} + · · · + α^{p^{n−2}} z + α^{p^{n−1}} are coprime in F_{p^n}[z].

43. [Hadamard's inequality] Let M be an n × n matrix with real entries. Denote
the i-th column of M by b_i. Treat b_i as an n-dimensional vector, and let |b_i|
denote the length of this vector. Prove that |det M| ≤ ∏_{i=1}^{n} |b_i|.
44. Let M be an n × n matrix with integer entries such that each entry of M is
±1. Prove that |det M| ≤ n^{n/2}.
45. Find the square-free factorization of f(x) = x^7 + 6x^6 + 6x^5 − 3x^4 + 27x^3 +
3x^2 − 32x + 12:
(a) in Z[x], and
(b) in F_17[x].
(c) Factor the discriminants of the square-free factors of f found in Part (a).
46. Factor f(x) = 2x^4 + 2x^3 + 7x^2 + x + 3 in Z[x] using Berlekamp's algorithm.
Note that Discr(f) = 64152 = 2^3 × 3^6 × 11. So you may take p = 5.
47. Let p be a prime, and f(x) = x^4 + 1 ∈ F_p[x].
(a) Let p = 2. Describe how f(x) factors in F_2[x].
(b) Let p ≡ 1 (mod 4). Prove that f(x) = (x^2 + α)(x^2 − α) in F_p[x], where
α^2 ≡ −1 (mod p).
(c) Let p ≡ 3 (mod 8). Prove that the Legendre symbol (−2/p) = 1. Let
α^2 ≡ −2 (mod p). Prove that f(x) = (x^2 + αx − 1)(x^2 − αx − 1) in F_p[x].
(d) Let p ≡ 7 (mod 8). Prove that the Legendre symbol (2/p) = 1. Let
α^2 ≡ 2 (mod p). Prove that f(x) = (x^2 + αx + 1)(x^2 − αx + 1) in F_p[x].
48. (a) Prove that for every n ∈ N, n ≥ 2, the polynomial x^{2^n} + 1 is reducible in
F_p[x] for any prime p.
(b) Prove that for every n ∈ N, the polynomial x^{2^n} + 1 is irreducible in Z[x].
49. Prove that the degree of an irreducible polynomial in R[x] is one or two.
50. Let b_1, b_2, . . . , b_n constitute a reduced basis of a lattice L in R^n, and b*_1, b*_2, . . . ,
b*_n its Gram–Schmidt orthogonalization. Prove that:
(a) |b_j|^2 ≤ 2^{i−1} |b*_i|^2 for all i, j with 1 ≤ j ≤ i ≤ n.
(b) d(L) ≤ ∏_{i=1}^{n} |b_i| ≤ 2^{n(n−1)/4} d(L).
(c) |b_1| ≤ 2^{(n−1)/4} d(L)^{1/n}.
51. Prove Proposition 3.39.

Programming Exercises

52. Write a GP/PARI function that computes the number of monic irreducible poly-
nomials of degree m in Fq [x]. (Hint: Use the built-in function moebius().)
53. Write a GP/PARI function that computes the product of all monic irreducible
polynomials of degree m in Fq [x]. (Hint: Exercise 3.2.)
54. Write a GP/PARI function that checks whether a non-constant polynomial
over the prime field F_p is square-free. (Hint: Use the built-in function deriv().)
55. Write a GP/PARI function that computes the square-free factorization of a
monic non-constant polynomial over the prime field F_p.

56. Write a GP/PARI function that computes the distinct-degree factorization of a


monic square-free polynomial in Fp [x], where p is a prime (Algorithm 3.4).
57. Let f (x) ∈ Fp [x] (p > 2 is a prime) be monic with each irreducible factor
having the same known degree r. Write a GP/PARI function that computes the
equal-degree factorization of f (x) (Algorithm 3.7).
58. Let f (x) ∈ F2 [x] be monic with each irreducible factor having the same known
degree r. Write a GP/PARI function that computes the equal-degree factoriza-
tion of f (x) (Algorithm 3.8).
59. Write a GP/PARI program implementing Berlekamp’s Q-matrix factorization
(Exercise 3.30). (Hint: Use the built-in functions polcoeff(), matrix(),
matker() and matsize().)
60. Write a GP/PARI program that implements Hensel lifting of Theorem 3.33.
Chapter 4
Arithmetic of Elliptic Curves

4.1 What Is an Elliptic Curve? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178


4.2 Elliptic-Curve Group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
4.2.1 Handling Elliptic Curves in GP/PARI . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191
4.3 Elliptic Curves over Finite Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194
4.4 Some Theory of Algebraic Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
4.4.1 Affine and Projective Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
4.4.1.1 Affine Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
4.4.1.2 Projective Curves
4.4.2 Polynomial and Rational Functions on Curves
4.4.3 Rational Maps and Endomorphisms on Elliptic Curves
4.4.4 Divisors
4.5 Pairing on Elliptic Curves
4.5.1 Weil Pairing
4.5.2 Miller's Algorithm
4.5.3 Tate Pairing
4.5.4 Non-Rational Homomorphisms
4.5.4.1 Distortion Maps
4.5.4.2 Twists
4.5.5 Pairing-Friendly Curves
4.5.6 Efficient Implementation
4.5.6.1 Windowed Loop in Miller's Algorithm
4.5.6.2 Final Exponentiation
4.5.6.3 Denominator Elimination
4.5.6.4 Loop Reduction
    Eta Pairing
    Ate Pairing
    Twisted Ate Pairing
    Ate_i Pairing
    R-ate Pairing
4.6 Elliptic-Curve Point Counting
4.6.1 A Baby-Step-Giant-Step (BSGS) Method
4.6.1.1 Mestre's Improvement
4.6.2 Schoof's Algorithm
Exercises

The study of elliptic curves is often called arithmetic algebraic geometry. Recent mathematical developments in this area have been motivated to a large extent by attempts to prove Fermat's last theorem, which states that the equation x^n + y^n = z^n does not have non-trivial solutions in integer values of x, y, z for (integers) n ≥ 3. (The trivial solutions correspond to xyz = 0.)1
Purely mathematical objects like elliptic curves are used in a variety of
engineering applications, most notably in the area of public-key cryptography.
Elliptic curves are often preferred to finite fields, because the curves offer a
wide range of groups upon which cryptographic protocols can be built, and also
because keys pertaining to elliptic curves are shorter (than those pertaining to
finite fields), resulting in easier key management. It is, therefore, expedient to
look at the arithmetic of elliptic curves from a computational angle. Moreover,
we require elliptic curves for an integer-factoring algorithm.

4.1 What Is an Elliptic Curve?

Elliptic curves are plane algebraic curves of genus one.2 Cubic and quartic
equations of special forms in two variables3 X, Y are elliptic curves. The Greek

1 In 1637, the amateur French mathematician Pierre de Fermat wrote a note in his per-
sonal copy of Bachet’s Latin translation of the Greek book Arithmetica by Diophantus.
The note translated in English reads like this: “It is impossible to separate a cube into two
cubes, or a fourth power into two fourth powers, or in general, any power higher than the
second into two like powers. I have discovered a truly marvelous proof of this, which this
margin is too narrow to contain.” It is uncertain whether Fermat really discovered a proof.
However, Fermat himself published a proof for the special case n = 4 using a method which
is now known as Fermat’s method of infinite descent.
Given Fermat's proof for n = 4, one proves Fermat's last theorem for any n ≥ 3 if one
supplies a proof for all primes n ≥ 3. Some special cases were proved by Euler (n = 3), by
Dirichlet and Legendre (n = 5) and by Lamé (n = 7). In 1847, Kummer proved Fermat’s last
theorem for all regular primes. However, there exist infinitely many non-regular primes. A
general proof for Fermat’s last theorem has eluded mathematicians for over three centuries.
In the late 1960s, Hellegouarch discovered a connection between elliptic curves and Fer-
mat’s last theorem, which led Gerhard Frey to conclude that if a conjecture known as the
Taniyama–Shimura conjecture for elliptic curves is true, then Fermat’s last theorem holds
too. The British mathematician Andrew Wiles, with the help of his student Richard Taylor,
finally proved Fermat’s last theorem in 1994. Wiles’ proof is based on very sophisticated
mathematics developed in the 20th century, and only a handful of living mathematicians
can claim to have truly understood the entire proof.
It is debatable whether Fermat’s last theorem is really a deep theorem that deserved such
prolonged attention. Nonetheless, myriads of failed attempts to prove this theorem have,
without any shred of doubt, intensely enriched several branches of modern mathematics.
2 Loosely speaking, the genus of a curve is the number of handles in it. Straight lines and

conic sections do not have handles, and have genus zero.


3 Almost everywhere else in this book, lower-case letters x, y, z, . . . are used in polyno-

mials. In this chapter, upper-case letters X, Y, Z, . . . are used as variables in polynomials,


whereas the corresponding lower-case letters are reserved for a (slightly?) different purpose
(see Section 4.4). I hope this notational anomaly would not be a nuisance to the readers.
Arithmetic of Elliptic Curves 179

mathematician Diophantus seems to be the first to study such curves.4 Much


later, Euler, Gauss, Jacobi, Abel and Weierstrass (among many others) studied
these curves in connection with solving elliptic integrals. This is the genesis
of the name elliptic curve. An ellipse is specified by a quadratic equation and
has genus zero, so ellipses are not elliptic curves. Diophantus observed5 that
a quartic equation of the form

    Y^2 = (X − a)(X^3 + bX^2 + cX + d)        (4.1)

can be converted to the cubic equation

    Y^2 = αX^3 + βX^2 + γX + 1                (4.2)

by the substitution of X by (aX + 1)/X and of Y by Y/X^2 (where α = a^3 + a^2 b + ac + d, β = 3a^2 + 2ab + c, and γ = 3a + b). Multiplying both sides of Eqn (4.2) by α^2 and renaming αX and αY as X and Y respectively, we obtain an equation of the form

    Y^2 = X^3 + µX^2 + νX + η.                (4.3)

Such substitutions allow us to concentrate on cubic curves of the following particular form. It turns out that any plane curve of genus one can be converted to this particular form after suitable substitutions of variables.

Definition 4.1 An elliptic curve E over a field K is defined by the cubic equation

    E : Y^2 + a1 XY + a3 Y = X^3 + a2 X^2 + a4 X + a6        (4.4)

with a1, a2, a3, a4, a6 ∈ K. Eqn (4.4) is called the Weierstrass equation6 for the elliptic curve E. In order that E qualifies as an elliptic curve, we require E to contain no points of singularity in the following sense. ⊳

Definition 4.2 Let C : f(X, Y) = 0 be the equation of a (plane) curve defined over some field K. A point of singularity on the curve C is a point (h, k) ∈ K^2 for which f(h, k) = 0, (∂f/∂X)(h, k) = 0, and (∂f/∂Y)(h, k) = 0. A curve is called non-singular or smooth if it contains no points of singularity. An elliptic curve is required to be non-singular. ⊳
4 Diophantine equations are named after Diophantus of Alexandria. These are multi-

variate polynomial equations with integer coefficients, of which integer or rational solutions
are typically investigated.
5 Diophantus studied special cases, whereas for us it is easy to generalize the results

for arbitrary a, b, c, d. Nevertheless, purely algebraic substitutions as illustrated above are


apparently not motivated by any geometric or arithmetic significance. Diophantus seems
to be the first to use algebraic tools in geometry, an approach popularized by European
mathematicians more than one and a half millennia after Diophantus's time.
6 Karl Theodor Wilhelm Weierstrass (1815–1897) was a German mathematician famous

for his contributions to mathematical analysis. The Weierstrass elliptic (or P) function ℘ is
named after him.

Example 4.3 (1) Take K = R. Three singular cubic curves are shown in
Figure 4.1. In each of these three examples, the point of singularity is the origin
(0, 0). If the underlying field is R, we can identify the type of singularity from
the Hessian7 of the curve C : f(X, Y) = 0, defined as

                 [ ∂²f/∂X²    ∂²f/∂X∂Y ]
    Hessian(f) = [                      ]
                 [ ∂²f/∂Y∂X   ∂²f/∂Y²  ]

FIGURE 4.1: Singular cubic curves
(a) A cusp or a spinode: Y^2 = X^3
(b) A loop or a double-point or a crunode: Y^2 = X^3 + X^2
(c) An isolated point or an acnode: Y^2 = X^3 − X^2

FIGURE 4.2: Elliptic curves
(a) Y^2 = X^3 − X
(b) Y^2 = X^3 − X + 1
(c) Y^2 = X^3 + X

A point of singularity P on C is a cusp if and only if Hessian(f) at P has determinant 0. P is a loop if and only if Hessian(f) at P is indefinite (that is, has negative determinant).8 Finally, P is an isolated point if and only if Hessian(f) at P is definite (positive determinant). In each case, the tangent is not (uniquely) defined at the point of singularity.
In each case, the tangent is not (uniquely) defined at the point of singularity.
7 The Hessian matrix is named after the German mathematician Ludwig Otto Hesse

(1811–1874). Hesse’s mathematical contributions are in the area of analytic geometry.


8 An n × n matrix A with real entries is called positive-definite (resp. negative-definite) if for every non-zero column vector v of size n, we have v^T A v > 0 (resp. v^T A v < 0).

(2) Three elliptic curves over R are shown in Figure 4.2. The partial derivatives ∂f/∂X and ∂f/∂Y do not vanish simultaneously at any point on these curves, and so the tangent is defined at every point on these curves. The curve
of Part (a) has two disjoint components in the X-Y plane, whereas the curves
of Parts (b) and (c) have only single components. The bounded component of
the curve in Part (a) is the broken handle. For the other two curves, the handles
are not discernible. For real curves, handles may be broken or invisible, since
R is not algebraically closed. Curves over the field C of complex numbers
have discernible handles. But then a plane curve over C requires four real
dimensions, and is impossible to visualize in our three-dimensional world.
(3) Let us now look at elliptic curves over finite fields. Take K = F17. The curve defined by Y^2 = X^3 + 5X − 1 is not an elliptic curve, because it contains a singularity at the point (2, 0). Indeed, we have X^3 + 5X − 1 ≡ (X + 15)^2 (X + 4) (mod 17). However, the equation Y^2 = X^3 + 5X − 1 defines an elliptic curve over R, since X^3 + 5X − 1 has no multiple roots in R (or C).
The curve E1 : Y^2 = X^3 − 5X + 1 is non-singular over F17 with 15 points:
(0, 1), (0, 16), (2, 4), (2, 13), (3, 8), (3, 9), (5, 4), (5, 13), (6, 0), (10, 4),
(10, 13), (11, 6), (11, 11), (13, 5), (13, 12).
There is only one point on E1 with Y-coordinate equal to 0. We have the factorization X^3 − 5X + 1 ≡ (X + 11)(X^2 + 6X + 14) (mod 17).
The curve E2 : Y^2 = X^3 − 4X + 1 is non-singular over F17 with 24 points:
(0, 1), (0, 16), (1, 7), (1, 10), (2, 1), (2, 16), (3, 4), (3, 13), (4, 7), (4, 10), (5, 2),
(5, 15), (10, 3), (10, 14), (11, 8), (11, 9), (12, 7), (12, 10), (13, 2), (13, 15),
(15, 1), (15, 16), (16, 2), (16, 15).
E2 contains no points with Y-coordinate equal to 0, because X^3 − 4X + 1 is irreducible in F17[X].
The non-singular curve E3 : Y^2 = X^3 − 3X + 1 over F17 contains 19 points:
(0, 1), (0, 16), (1, 4), (1, 13), (3, 6), (3, 11), (4, 6), (4, 11), (5, 3), (5, 14), (7, 0),
(8, 8), (8, 9), (10, 6), (10, 11), (13, 0), (14, 0), (15, 4), (15, 13).
Since X^3 − 3X + 1 ≡ (X + 3)(X + 4)(X + 10) (mod 17), there are three points on E3 with Y-coordinate equal to 0.
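The three point listings can be verified mechanically by brute force: for each X ∈ F17, test which Y satisfy the curve equation. A small illustrative Python sketch (not part of the text; the function name is mine):

```python
# Count the affine F_p-rational points on Y^2 = f(X) over F_p by brute force.
def affine_points(f, p):
    points = []
    for x in range(p):
        for y in range(p):
            if (y * y - f(x)) % p == 0:
                points.append((x, y))
    return points

p = 17
E1 = affine_points(lambda x: x**3 - 5*x + 1, p)  # 15 points, as listed above
E2 = affine_points(lambda x: x**3 - 4*x + 1, p)  # 24 points
E3 = affine_points(lambda x: x**3 - 3*x + 1, p)  # 19 points
```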
(4) Take K = F_{2^n} and a curve C : Y^2 = X^3 + aX^2 + bX + c with a, b, c ∈ K. Write f(X, Y) = Y^2 − (X^3 + aX^2 + bX + c). Then, ∂f/∂X = X^2 + b, and ∂f/∂Y = 0. Every element in F_{2^n} has a unique square root in F_{2^n}. In particular, X^2 + b has the root h = b^(2^(n−1)). Plugging in this value of X in Y^2 = X^3 + aX^2 + bX + c gives a unique solution for Y, namely k = (h^3 + ah^2 + bh + c)^(2^(n−1)). But then, (h, k) is a point of singularity on C. This means that a curve of the form Y^2 = X^3 + aX^2 + bX + c is never an elliptic curve over F_{2^n}. Therefore, we must have non-zero term(s) involving XY and/or Y on the left side of the Weierstrass equation in order to obtain an elliptic curve over F_{2^n}.

As a specific example, represent F8 = F2(θ) with θ^3 + θ + 1 = 0, and consider the curve E : Y^2 + XY = X^3 + X^2 + θ. Let us first see that the curve is non-singular. Write f(X, Y) = Y^2 + XY + X^3 + X^2 + θ. But then, ∂f/∂X = Y + X^2, and ∂f/∂Y = X. The condition ∂f/∂X = ∂f/∂Y = 0 implies X = Y = 0. But (0, 0) is not a point on E. Therefore, E is indeed an elliptic curve over F8. It contains the following nine points:

(0, θ^2 + θ), (1, θ^2), (1, θ^2 + 1), (θ, θ^2), (θ, θ^2 + θ), (θ + 1, θ^2 + 1),
(θ + 1, θ^2 + θ), (θ^2 + θ, 1), (θ^2 + θ, θ^2 + θ + 1).

(5) As our final example, represent K = F9 = F3(θ), where θ^2 + 1 = 0. Consider the curve E : Y^2 = X^3 + X^2 + X + θ defined over F9. Writing f(X, Y) = Y^2 − (X^3 + X^2 + X + θ) gives the condition for singularity to be ∂f/∂X = X − 1 = 0 and ∂f/∂Y = 2Y = 0, that is, X = 1, Y = 0. But the point (1, 0) does not lie on the curve, that is, E is an elliptic curve. It contains the following eight points:

(0, θ + 2), (0, 2θ + 1), (1, θ + 2), (1, 2θ + 1), (θ + 1, θ), (θ + 1, 2θ),
(2θ + 2, 1), (2θ + 2, 2).

E contains no points with Y-coordinate equal to 0, since X^3 + X^2 + X + θ is irreducible in F9[X]. ¤

It is convenient to work with simplified forms of the Weierstrass equation (4.4). The simplification depends upon the characteristic of the underlying field K. If char K ≠ 2, substituting Y by Y − (a1 X + a3)/2 eliminates the terms involving XY and Y, and one can rewrite Eqn (4.4) as

    Y^2 = X^3 + b2 X^2 + b4 X + b6.        (4.5)

If we additionally have char K ≠ 3, we can replace X by (X − 3b2)/36 and Y by Y/216 in order to simplify Eqn (4.5) further as

    Y^2 = X^3 + aX + b.                    (4.6)

If char K = 2, we cannot use the simplified Eqns (4.5) or (4.6). Indeed, we have argued in Example 4.3(4) that curves in these simplified forms are singular over F_{2^n} and so are not elliptic curves at all. If a1 = 0 in Eqn (4.4), we replace X by X + a2 to eliminate the term involving X^2, and obtain

    Y^2 + aY = X^3 + bX + c.               (4.7)

Such a curve is called supersingular. If a1 ≠ 0, we replace X by a1^2 X + a3/a1 and Y by a1^3 Y + (a1^2 a4 + a3^2)/a1^3 in order to obtain the equation

    Y^2 + XY = X^3 + aX^2 + b.             (4.8)

This curve is called non-supersingular or ordinary.



All coordinate transformations made above are invertible. Therefore, there is a one-to-one correspondence between the points on the general curve (4.4) and the points on the curves with the simplified equations. That is, if we work with the simplified equations, we do not risk any loss of generality. The following table summarizes the Weierstrass equations.

    K                Weierstrass equation
    Any field        Y^2 + (a1 X + a3)Y = X^3 + a2 X^2 + a4 X + a6
    char K ≠ 2, 3    Y^2 = X^3 + aX + b
    char K = 2       Y^2 + aY = X^3 + bX + c       (supersingular curve)
                     Y^2 + XY = X^3 + aX^2 + b     (ordinary curve)
    char K = 3       Y^2 = X^3 + aX^2 + bX + c
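The invertibility of these transformations can be checked numerically. The sketch below (illustrative; the coefficients a1, ..., a6 are arbitrary choices of mine) maps every point of a general Weierstrass curve over F17 to the short form Y^2 = X^3 + AX + B via the substitutions above, with A = −27c4 and B = −54c6, where c4 = b2^2 − 24 b4 and c6 = −b2^3 + 36 b2 b4 − 216 b6 are the standard constants (the c6 formula is assumed here beyond the text's own derivation):

```python
# Map points on Y^2 + a1*XY + a3*Y = X^3 + a2*X^2 + a4*X + a6 over F_p (p != 2, 3)
# to points on a short Weierstrass curve Y^2 = X^3 + A*X + B.
p = 17
a1, a2, a3, a4, a6 = 1, 2, 3, 4, 5           # arbitrary illustrative coefficients

b2 = (a1 * a1 + 4 * a2) % p
b4 = (2 * a4 + a1 * a3) % p
b6 = (a3 * a3 + 4 * a6) % p
c4 = (b2 * b2 - 24 * b4) % p
c6 = (-b2**3 + 36 * b2 * b4 - 216 * b6) % p
A, B = (-27 * c4) % p, (-54 * c6) % p        # short form Y^2 = X^3 + A*X + B

inv2 = pow(2, -1, p)

def to_short(x, y):
    eta = (y + (a1 * x + a3) * inv2) % p     # kill the XY and Y terms (Eqn (4.5))
    return ((36 * x + 3 * b2) % p, (216 * eta) % p)  # rescale (Eqn (4.6))

long_pts = [(x, y) for x in range(p) for y in range(p)
            if (y*y + a1*x*y + a3*y - (x**3 + a2*x*x + a4*x + a6)) % p == 0]
short_pts = [to_short(x, y) for (x, y) in long_pts]
```

The point count is preserved, as the correspondence is one-to-one.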

4.2 Elliptic-Curve Group


Let E be an elliptic curve defined over a field K. A point (X, Y) ∈ K^2 that satisfies the equation of the curve is called a finite K-rational point on E.
For a reason that will be clear soon, E is assumed to contain a distinguished
point O called the point at infinity.9 This point lies on E even if K itself is a
finite field. One has to go to the so-called projective space in order to visualize
this point. All (finite) K-rational points on E together with O constitute a
set denoted by EK .10 We provide EK with a binary operation, traditionally
denoted by addition (+), under which EK becomes a commutative group.11
The basic motivation for defining the group EK is illustrated in Figure 4.3.
For simplicity, we assume that the characteristic of the underlying field K is not two, so we can use the equation Y^2 = X^3 + aX^2 + bX + c for E. The
K-rational point on E, then so also is its reflection (h, −k) about the X-axis.
Take two points P = (h1 , k1 ) and Q = (h2 , k2 ) on E. Let L : Y = λX + µ
be the (straight) line passing through these two points. For a moment assume
that P 6= Q and that the line L is not vertical. Substituting Y by λX + µ
in the equation for E gives a cubic equation in X. Obviously, X = h1 and
X = h2 satisfy this cubic equation, and it follows that the third root X = h3
of the equation must also belong to the underlying field K. In other words,
9 The concept of points at infinity was introduced independently by Johannes Kepler

(1571–1630) and Gérard Desargues (1591–1661).


10 Many authors prefer to use the symbol E(K) instead of EK. I prefer EK, because we will soon be introduced to an object denoted as K(E).
11 In 1901, the French mathematician Jules Henri Poincaré (1854–1912) first proved this

group structure.

FIGURE 4.3: Motivating addition of elliptic curve points
(a) P + Q + R = 0
(b) P + P + R = 0
(c) P + Q = 0
(d) P + P = 0

there exists a unique third point R lying on the intersection of E with L. The
group operation will satisfy the condition P + Q + R = 0 in this case.
Part (b) of Figure 4.3 shows the special case P = Q = (h, k). In this
case, the line L : Y = λX + µ is taken to be the tangent to the curve at P .
Substituting λX + µ for Y in the equation for E gives a cubic equation in X,
of which X = h is a double root. The third root identifies a unique intersection
point R of the curve E with L. We take P + P + R = 2P + R = 0.
Now, take two points P = (h, k) and Q = (h, −k) on the curve (Part (c)
of Figure 4.3). The line passing through these two points has the equation
L : X = h. Substituting X by h in the equation for E gives a quadratic
equation in Y , which has the two roots ±k. The line L does not meet the
curve E at any other point in K 2 . We set P + Q = 0 in this case.

A degenerate case of Part (c) is P = Q = (h, 0), and the tangent to E at


the point P is vertical (Part (d) of Figure 4.3). Substituting X by h in the
equation of E now gives us the equation Y 2 = 0 which has a double root at
Y = 0. We set P + P = 2P = 0 in this case.
In view of these observations, we can now state the group law in terms
of the so-called chord-and-tangent rule. The group operation is illustrated in
Figure 4.4. Part (c) illustrates that if P and Q are two distinct points on the
curve having the same X-coordinate, then P + Q = 0, that is, Q = −P, that
is, the reflection of P about the X-axis gives the negative of P . We call Q the
opposite of P (and P the opposite of Q) in this case. In the degenerate case
where the vertical line through P = Q is tangential to the curve E, we have a
point P which is its own opposite (Part (d)). There are at most three points
on the curve, which are opposites of themselves. For K = R, there are either
one or three such points. If K = C (or any algebraically closed field), there
are exactly three such points. Finally, if K is a finite field (of characteristic
not equal to two), there can be zero, one, or three such points.
Now, consider two points P, Q on E. If Q is the opposite of P , we define
P + Q = 0. Otherwise, the line passing through P and Q (or the tangent to
E at P in the case P = Q) is not vertical and meets the curve at a unique
third point R. We have P + Q + R = 0, that is, P + Q = −R, that is, P + Q
is defined to be the opposite of the point R (Parts (a) and (b) of Figure 4.4).
An important question now needs to be answered: What is the additive
identity 0 in this group? It has to be a point on E that should satisfy P + 0 =
0 + P = P for all points P on E. As can be easily argued from the partial
definition of point addition provided so far, no finite point on E satisfies this
property. This is where we bring the point O at infinity in consideration. We
assume that O is an infinitely distant point on E, that lies vertically above
every finite point in the X-Y plane. It is so distant from a finite point P that
joining P to O will produce a vertical line. Imagine that you are standing on
an infinitely long straight road. Near the horizon, the entire road, no matter
how wide, appears to converge to a point.
The question is why does this point lie on E? The direct answer is: for our convenience. The illuminating answer is this: Look at the equation of an elliptic curve over K = R. As X becomes large, we can neglect the terms involving X^2, X^1, and X^0, that is, Y^2 ≈ X^3, that is, Y ≈ X^{3/2}. But then, for any finite point P = (h, k) ∈ R^2, we have lim_{X→∞} (Y − k)/(X − h) = lim_{X→∞} (X^{3/2} − k)/(X − h) = ∞, that is, a point on E infinitely distant from P lies vertically above P.
There must be another point O′ on E infinitely distant from and vertically
below every finite point in the X-Y plane. Why has this point not been taken
into consideration? The direct answer is again: for our convenience. The illu-
minating answer is that if E happened to contain this point also, then some
vertical line would pass through four points on E, two finite points and the
two points O and O′ . This is a bad situation, where a cubic equation meets a
linear equation at more than three points. Moreover, we are going to use O as

FIGURE 4.4: Addition of elliptic curve points
(a) Addition of two points
(b) Doubling of a point
(c) Opposite of a point
(d) Points which are their own opposites

the identity of the elliptic curve group. Thus, we would have −O = O, that
is, O′ and O are treated as the same point on E.
Does it look too imprecise or ad hoc? Perhaps, or perhaps not! The reader
needs to understand projective geometry in order to visualize the point at
infinity. Perfectly rigorous mathematical tools will then establish that there
is exactly one point at infinity on any elliptic curve. More importantly, if K
is any arbitrary field, even a finite field, the point O provably exists on the
curve. We defer this discussion until Section 4.4.
Let us now logically deduce that P + O = O + P = P for any finite point
P on EK . The line L passing through P and O is vertical by the choice of O.
Thus, the third point R where L meets E is the opposite of P , that is, R = −P .
By the chord-and-tangent rule, we then take P + O = O + P = −(−P ) = P .

Finally, what is O + O? Since Y 2 ≈ X 3 on the curve for large values of X,


the curve becomes vertical as X → ∞. Thus, the tangent L to the curve at
O is indeed a vertical line. The X-coordinate on L is infinity, that is, L does
not meet the curve E at any finite point. So the natural choice is O + O = 0,
that is, O + O = O. That is how the identity should behave.
Under the addition operation defined in this fashion, EK becomes a group.
This operation evidently satisfies the properties of closure, identity, inverse,
and commutativity. However, no simple way of proving the associativity of
this addition is known to me. Extremely patient readers may painstakingly
furnish a proof using the explicit formulas for point addition given below. A
proof naturally follows from the theory of divisors, but a complete exposure
to the theory demands serious algebra prerequisites on the part of the readers.
Let us now remove the restriction that the curve is symmetric about the
X-axis. This is needed particularly to address fields of characteristic two. Let
us work with the unsimplified Weierstrass equation E : Y^2 + a1 XY + a3 Y = X^3 + a2 X^2 + a4 X + a6. First, we define the opposite of a point. We take −O = O. For any finite point P = (h, k) on E, the vertical line L passing through P has the equation X = h. Substituting this in the Weierstrass equation gives Y^2 + (a1 h + a3)Y − (h^3 + a2 h^2 + a4 h + a6) = 0. This equation already has a root k. Let the other root be k′. We have k + k′ = −(a1 h + a3), that is, k′ = −(k + a1 h + a3). Therefore, we take

    −P = (h, −(k + a1 h + a3)).

If a1 = a3 = 0, then −P is indeed the reflection of P about the X-axis.


Now, let us find the point P + Q for P, Q ∈ EK. We take P + O = O + P = P and O + O = O. Moreover, if Q = −P, then P + Q = O. So let P = (h1, k1) and Q = (h2, k2) be finite points on E, that are not opposites of one another. In this case, the straight line passing through P and Q (the tangent to E at P if P = Q) is not vertical and has the equation L : Y = λX + µ, where

    λ = (k2 − k1)/(h2 − h1)                                 if P ≠ Q,
    λ = (3h1^2 + 2a2 h1 + a4 − a1 k1)/(2k1 + a1 h1 + a3)    if P = Q,

and µ = k1 − λh1 = k2 − λh2. Substituting Y by λX + µ in the equation for E gives a cubic equation in X, of which h1 and h2 are already two roots. The third root of this equation is

    h3 = λ^2 + a1 λ − a2 − h1 − h2.

That is, the line L meets E at the third point R = (h3, λh3 + µ). We take

    P + Q = −R = (h3, −(λ + a1)h3 − µ − a3).

We have simpler formulas for the coordinates of P + Q if we work with simplified forms of the Weierstrass equation. We concentrate only on the case that P = (h1, k1) and Q = (h2, k2) are finite points that are not opposites of one another. Our plan is to compute the coordinates of P + Q = (h3, k3).
If char K ≠ 2, 3, we use the equation Y^2 = X^3 + aX + b. In this case,

    h3 = λ^2 − h1 − h2,
    k3 = λ(h1 − h3) − k1, where
    λ = (k2 − k1)/(h2 − h1)    if P ≠ Q,
    λ = (3h1^2 + a)/(2k1)      if P = Q.
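These formulas translate directly into code. A hedged Python sketch over a prime field, with None standing for the point at infinity O (the function name is mine; the test values below use the curve E1 : Y^2 = X^3 − 5X + 1 over F17 from Example 4.3(3)):

```python
# Chord-and-tangent addition on Y^2 = X^3 + a*X + b over F_p (p > 3),
# following the char K != 2, 3 formulas; None stands for the point O.
def ec_add(P, Q, a, p):
    if P is None:
        return Q
    if Q is None:
        return P
    h1, k1 = P
    h2, k2 = Q
    if h1 == h2 and (k1 + k2) % p == 0:   # Q = -P (opposite points)
        return None
    if P == Q:
        lam = (3 * h1 * h1 + a) * pow(2 * k1, -1, p) % p   # tangent slope
    else:
        lam = (k2 - k1) * pow(h2 - h1, -1, p) % p          # chord slope
    h3 = (lam * lam - h1 - h2) % p
    k3 = (lam * (h1 - h3) - k1) % p
    return (h3, k3)

p, a = 17, -5                 # E1 : Y^2 = X^3 - 5X + 1 over F_17
P, Q = (2, 4), (13, 5)
```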
If char K = 3, we use the equation Y^2 = X^3 + aX^2 + bX + c, and obtain

    h3 = λ^2 − a − h1 − h2,
    k3 = λ(h1 − h3) − k1, where
    λ = (k2 − k1)/(h2 − h1)    if P ≠ Q,
    λ = (2a h1 + b)/(2k1)      if P = Q.
Finally, let char K = 2. For the supersingular curve Y^2 + aY = X^3 + bX + c, we have

    h3 = ((k1 + k2)/(h1 + h2))^2 + h1 + h2,             if P ≠ Q,
    h3 = (h1^4 + b^2)/a^2,                              if P = Q,
    k3 = ((k1 + k2)/(h1 + h2)) (h1 + h3) + k1 + a,      if P ≠ Q,
    k3 = ((h1^2 + b)/a) (h1 + h3) + k1 + a,             if P = Q,

whereas for the non-supersingular curve Y^2 + XY = X^3 + aX^2 + b, we have

    h3 = ((k1 + k2)/(h1 + h2))^2 + (k1 + k2)/(h1 + h2) + h1 + h2 + a,   if P ≠ Q,
    h3 = h1^2 + b/h1^2,                                                 if P = Q,
    k3 = ((k1 + k2)/(h1 + h2)) (h1 + h3) + h3 + k1,                     if P ≠ Q,
    k3 = h1^2 + (h1 + k1/h1 + 1) h3,                                    if P = Q.

The opposite of a finite point P = (h, k) on E is

    −P = (h, k + a)    for the supersingular curve,
    −P = (h, k + h)    for the non-supersingular curve.
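The characteristic-two formulas can be exercised in code as well. The following illustrative Python sketch (helper names are mine, not from the text) represents F8 = F2(θ), θ^3 + θ + 1 = 0, by 3-bit integers with bit i standing for θ^i, and implements only the non-supersingular rules above; field addition is XOR, and the doubling slope λ = h1 + k1/h1 is used in place of the equivalent closed form h1^2 + b/h1^2:

```python
# F_8 = F_2(theta), theta^3 + theta + 1 = 0; element bits: bit i <-> theta^i.
def gf8_mul(u, v):
    r = 0
    while v:                      # carry-less (XOR-based) multiplication
        if v & 1:
            r ^= u
        u <<= 1
        if u & 0b1000:            # reduce modulo theta^3 + theta + 1
            u ^= 0b1011
        v >>= 1
    return r

def gf8_inv(u):                   # F_8^* is cyclic of order 7, so u^{-1} = u^6
    r = 1
    for _ in range(6):
        r = gf8_mul(r, u)
    return r

def add_ns(P, Q, a):
    """Add finite points on the non-supersingular curve Y^2 + XY = X^3 + aX^2 + b."""
    h1, k1 = P
    h2, k2 = Q
    if P == Q:
        lam = h1 ^ gf8_mul(k1, gf8_inv(h1))            # lambda = h1 + k1/h1
        h3 = gf8_mul(lam, lam) ^ lam ^ a               # = h1^2 + b/h1^2
        k3 = gf8_mul(h1, h1) ^ gf8_mul(lam ^ 1, h3)    # h1^2 + (lambda+1)h3
    else:
        lam = gf8_mul(k1 ^ k2, gf8_inv(h1 ^ h2))
        h3 = gf8_mul(lam, lam) ^ lam ^ h1 ^ h2 ^ a
        k3 = gf8_mul(lam, h1 ^ h3) ^ h3 ^ k1
    return (h3, k3)

# E : Y^2 + XY = X^3 + X^2 + theta over F_8 (a = 1), as in Example 4.3(4).
P = (0b001, 0b100)                # (1, theta^2)
Q = (0b011, 0b101)                # (theta + 1, theta^2 + 1)
```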

Example 4.4 (1) Consider the curve E : Y^2 = X^3 − X + 1 over Q and the points P = (1, −1) and Q = (3, 5) on E. First, we compute the point P + Q. The straight line L passing through P and Q has the equation (Y + 1)/(X − 1) = (5 + 1)/(3 − 1), that is, Y = 3X − 4. Substituting this value of Y in the equation of E, we get (3X − 4)^2 = X^3 − X + 1, that is, X^3 − 9X^2 + 23X − 15 = 0, that is, (X − 1)(X − 3)(X − 5) = 0. Two of the three roots of this equation are already the X-coordinates of P and Q. The third point of intersection of L with E has the X-coordinate 5, and the corresponding Y-coordinate is 3 × 5 − 4 = 11. Therefore, P + Q = −(5, 11) = (5, −11).
We now compute the point 2P. Since Y^2 equals X^3 − X + 1 on the curve, differentiation with respect to X gives 2Y (dY/dX) = 3X^2 − 1, that is, dY/dX = (3X^2 − 1)/(2Y). At the point P = (1, −1), we have dY/dX = (3 − 1)/(−2) = −1. The tangent T to E at P has the equation Y = −X + µ, where µ is obtained by putting the coordinates of P in the equation: µ = −1 + 1 = 0, so the equation of T is Y = −X. Plugging in this value of Y in the equation of E, we get X^2 = X^3 − X + 1, that is, (X − 1)^2 (X + 1) = 0. So the third point of intersection of T with E has X = −1 and is (X, −X) = (−1, 1), that is, 2P = −(−1, 1) = (−1, −1).
Here, for the sake of illustration only, we have computed P + Q and 2P using basic principles of elementary coordinate geometry. In practice, we may straightaway apply the formulas presented earlier.
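The same computation can be replayed with exact rational arithmetic. The following sketch (illustrative; the tangent slope is hard-coded for this particular curve Y^2 = X^3 − X + 1, where a = −1) uses Python's Fraction type:

```python
from fractions import Fraction as Fr

# P + Q and 2P on Y^2 = X^3 - X + 1 over Q via the chord-and-tangent formulas.
def add_q(P, Q):
    (h1, k1), (h2, k2) = P, Q
    if P == Q:
        lam = (3 * h1 * h1 - 1) / (2 * k1)   # tangent slope (3X^2 + a)/(2Y), a = -1
    else:
        lam = (k2 - k1) / (h2 - h1)          # chord slope
    h3 = lam * lam - h1 - h2
    return (h3, lam * (h1 - h3) - k1)

P = (Fr(1), Fr(-1))
Q = (Fr(3), Fr(5))
```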
(2) Take E : Y^2 = X^3 − 5X + 1 defined over F17, P = (2, 4), and Q = (13, 5). The slope of the line joining P and Q is λ ≡ (5 − 4) × (13 − 2)^{−1} ≡ 14 (mod 17). Thus, the X-coordinate of P + Q is λ^2 − 2 − 13 ≡ 11 (mod 17), and its Y-coordinate is λ(2 − 11) − 4 ≡ 6 (mod 17), that is, P + Q = (11, 6).
We now compute 2P. The slope of the tangent to E at P is λ ≡ (3 × 2^2 − 5) × (2 × 4)^{−1} ≡ 3 (mod 17). Thus, the X-coordinate of 2P is λ^2 − 2 × 2 ≡ 5 (mod 17), and its Y-coordinate is λ(2 − 5) − 4 ≡ 4 (mod 17), so 2P = (5, 4).
(3) Represent F8 = F2(θ) with θ^3 + θ + 1 = 0. Take the non-supersingular curve Y^2 + XY = X^3 + X^2 + θ over F8 and the points P = (h1, k1) = (1, θ^2) and Q = (h2, k2) = (θ + 1, θ^2 + 1). The line joining P and Q has slope λ = (k1 + k2)/(h1 + h2) = 1/θ = θ^2 + 1. Let P + Q = (h3, k3). The explicit formulas presented earlier give h3 = λ^2 + λ + h1 + h2 + 1 = (θ^2 + 1)^2 + (θ^2 + 1) + 1 + (θ + 1) + 1 = θ^4 + θ^2 + θ + 1 = θ(θ^3 + θ + 1) + 1 = 1, and k3 = λ(h1 + h3) + h3 + k1 = (θ^2 + 1)(1 + 1) + 1 + θ^2 = θ^2 + 1, that is, P + Q = (1, θ^2 + 1).
Finally, we compute 2P = (h4, k4). Once again we use the explicit formulas given earlier to obtain h4 = h1^2 + b/h1^2 = 1 + θ = θ + 1, and k4 = h1^2 + (h1 + k1/h1 + 1) h4 = 1 + (1 + θ^2 + 1)(θ + 1) = 1 + θ^2(θ + 1) = θ^3 + θ^2 + 1 = (θ^3 + θ + 1) + (θ^2 + θ) = θ^2 + θ, that is, 2P = (θ + 1, θ^2 + θ). ¤

We now study some useful properties of the elliptic-curve group EK . The


order of an element a in an additive group G is defined to be the smallest
positive integer m for which ma = a + a + · · · + a (m times) = 0. We denote
this as ord a = m. If no such positive integer m exists, we take ord a = ∞.

Proposition 4.5 Let K be a field with char K ≠ 2. An elliptic curve E defined over K has at most three points of order two. If K is algebraically closed, E has exactly three such points.
Proof Suppose that 2P = O with O ≠ P = (h, k) ∈ EK. Then P = −P, that is, P is the opposite of itself. Since char K ≠ 2, we may assume that the elliptic curve is symmetric about the X-axis and has the equation Y^2 = X^3 + aX^2 + bX + c. Since E is non-singular, the right side of this equation has no multiple roots. Now, P = (h, k) = −P = (h, −k) implies k = 0, that is, h is a root of X^3 + aX^2 + bX + c. Conversely, for any root h of X^3 + aX^2 + bX + c, the finite point (h, 0) ∈ EK is of order two. ⊳
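Over a small prime field, Proposition 4.5 can be confirmed by listing the roots of the right-hand cubic; the curves of Example 4.3(3) serve as test cases (an illustrative sketch, with a function name of my own):

```python
# Points of order two on Y^2 = f(X) over F_p are exactly the (h, 0)
# with h a root of f modulo p.
def order_two_points(f, p):
    return [(h, 0) for h in range(p) if f(h) % p == 0]

p = 17
t1 = order_two_points(lambda x: x**3 - 5*x + 1, p)   # E1: one such point
t2 = order_two_points(lambda x: x**3 - 4*x + 1, p)   # E2: none (f irreducible)
t3 = order_two_points(lambda x: x**3 - 3*x + 1, p)   # E3: three (f splits)
```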

Clearly, an elliptic curve defined over R or C contains infinitely many points. However, an elliptic curve defined over Q may contain only finitely many points. For example, the curve Y^2 = X^3 + 6 contains no points with rational coordinates, that is, the only point on this curve is O. I now state some interesting facts about elliptic curves over Q.

Theorem 4.6 [Mordell’s theorem]12 The group EQ of an elliptic curve E


defined over Q is finitely generated. ⊳

One can write EQ = Etors ⊕ Efree, where Etors is the subgroup consisting of points of finite orders (the torsion subgroup of EQ), and where Efree ≅ Z^r is the free part of EQ. Mordell's theorem states that the rank r of EQ is finite. It is a popular belief that r can be arbitrarily large. At present (March 2012), however, the largest known rank of an elliptic curve is 28. Elkies in 2006 discovered that the following elliptic curve has rank 28.
Y^2 + XY + Y =
X^3 − X^2 − 20067762415575526585033208209338542750930230312178956502X +
34481611795030556467032985690390720374855944359319180361266008296291939448732243429

Theorem 4.7 [Mazur's theorem]13 The torsion subgroup Etors of EQ is either a cyclic group of order m ∈ {1, 2, 3, . . . , 10, 12}, or isomorphic to Z2 ⊕ Z2m for m ∈ {1, 2, 3, 4}. There are infinitely many examples of elliptic curves over Q for each of these group structures of Etors. ⊳

Theorem 4.8 [Nagell–Lutz theorem]14 Let E : Y^2 = X^3 + aX^2 + bX + c be an elliptic curve with integer coefficients a, b, c, and let O ≠ P ∈ Etors. Then, the coordinates of P are integers. ⊳

There are two very important quantities associated with elliptic curves.
12 This result was conjectured by Poincaré in 1901, and proved in 1922 by the British
mathematician Louis Joel Mordell (1888–1972).
13 This theorem was proved in 1977 by Barry Charles Mazur (1937–) of Harvard University.
14 This theorem was proved independently by the Norwegian mathematician Trygve Nagell
(1895–1988) and the French mathematician Élisabeth Lutz (1914–2008).


Arithmetic of Elliptic Curves 191

Definition 4.9 For a cubic curve E given by Eqn (4.4), define:

d2 = a1² + 4a2
d4 = 2a4 + a1 a3
d6 = a3² + 4a6
d8 = a1² a6 + 4a2 a6 − a1 a3 a4 + a2 a3² − a4²
c4 = d2² − 24d4
∆(E) = −d2² d8 − 8d4³ − 27d6² + 9d2 d4 d6          (4.9)
j(E) = c4³/∆(E), if ∆(E) ≠ 0 .                     (4.10)

∆(E) is called the discriminant of E, and j(E) the j-invariant of E. ⊳
Some important results pertaining to E are now stated.
Theorem 4.10 ∆(E) ≠ 0 if and only if E is smooth. In particular, j(E) is
defined for all elliptic curves. ⊳
Theorem 4.11 Let E and E′ be two elliptic curves defined over K. If the
groups EK and E′K are isomorphic, then j(E) = j(E′ ). Conversely, if j(E) =
j(E′ ), then the groups EK̄ and E′K̄ are isomorphic. ⊳

If char K ≠ 2, 3, and E is given by the special Weierstrass Eqn (4.6), then
∆(E) = −16(4a³ + 27b²). Up to the constant factor 16, this is the same as the
discriminant of the cubic polynomial X³ + aX + b. But ∆(X³ + aX + b) ≠ 0
if and only if X³ + aX + b does not contain multiple roots, a necessary and
sufficient condition for E given by Eqn (4.6) to be smooth.
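The formulas of Definition 4.9 translate directly into code. The following Python sketch (helper name ours, not the book's) computes ∆(E) and j(E) for a general Weierstrass equation, and checks the −16(4a³ + 27b²) special case on the curve Y² = X³ − X + 1, which is also used in the GP sessions of the next subsection.

```python
from fractions import Fraction

def disc_and_j(a1, a2, a3, a4, a6):
    # The quantities d2, d4, d6, d8, c4 of Definition 4.9.
    d2 = a1*a1 + 4*a2
    d4 = 2*a4 + a1*a3
    d6 = a3*a3 + 4*a6
    d8 = a1*a1*a6 + 4*a2*a6 - a1*a3*a4 + a2*a3*a3 - a4*a4
    c4 = d2*d2 - 24*d4
    disc = -d2*d2*d8 - 8*d4**3 - 27*d6*d6 + 9*d2*d4*d6
    j = Fraction(c4**3, disc) if disc != 0 else None
    return disc, j

# For Y^2 = X^3 + aX + b the discriminant collapses to -16(4a^3 + 27b^2).
a, b = -1, 1                         # the curve Y^2 = X^3 - X + 1
disc, j = disc_and_j(0, 0, 0, a, b)
assert disc == -16*(4*a**3 + 27*b**2) == -368
assert j == Fraction(-6912, 23)
```

The values −368 and −6912/23 also appear among the components of the vector that GP's ellinit() prints for this curve.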

4.2.1 Handling Elliptic Curves in GP/PARI

GP/PARI provides extensive facilities to work with elliptic curves. An elliptic
curve is specified by the coefficients a1 , a2 , a3 , a4 , a6 in the general Weierstrass
Eqn (4.4). The coefficients are to be presented as a row vector. One may call
the built-in function ellinit() which accepts this coefficient vector and
returns a vector with 19 components storing some additional information about
the curve. If the input coefficients define a singular curve, an error message is
issued. In the following snippet, the first curve Y² = X³ + X² is singular over
Q, whereas the second curve Y² = X³ − X + 1 is an elliptic curve over Q.

gp > E1 = ellinit([0,1,0,0,0])
*** singular curve in ellinit.
gp > E1 = ellinit([0,0,0,-1,1])
%1 = [0, 0, 0, -1, 1, 0, -2, 4, -1, 48, -864, -368, -6912/23, [-1.32471795724474
6025960908854, 0.6623589786223730129804544272 - 0.5622795120623012438991821449*I
, 0.6623589786223730129804544272 + 0.5622795120623012438991821448*I]~, 4.7070877
61230185561883752116, -2.353543880615092780941876058 + 1.09829152506100512202582
2079*I, -1.209950063079174653559416804 + 0.E-28*I, 0.604975031539587326779708402
0 - 0.9497317195650359122756449983*I, 5.169754595877492840054389119]
192 Computational Number Theory

A finite point (h, k) on an elliptic curve is represented as the row vector
[h,k] without a mention of the specific curve on which this point lies. When
one adds or computes multiples of points on an elliptic curve, the curve must
be specified; otherwise, ordinary vector addition and scalar multiplication are
performed. The function for point addition is elladd(), and that for computing
the multiple of a point is ellpow(). The opposite of a point can be obtained by
computing the (−1)-th multiple of the point. The point at infinity is
represented as [0].

gp > P1 = [1,-1]
%2 = [1, -1]
gp > Q1 = [3,5]
%3 = [3, 5]
gp > P1 + Q1
%4 = [4, 4]
gp > elladd(E1,P1,Q1)
%5 = [5, -11]
gp > 2*P1
%6 = [2, -2]
gp > ellpow(E1,P1,2)
%7 = [-1, -1]
gp > R1 = ellpow(E1,Q1,-1)
%8 = [3, -5]
gp > elladd(E1,Q1,R1)
%9 = [0]

We can work with elliptic curves over finite fields. Here is an example that
illustrates the arithmetic of the curve Y² = X³ − 5X + 1 defined over F17 .

gp > E2 = ellinit([Mod(0,17),Mod(0,17),Mod(0,17),Mod(5,17),Mod(-1,17)])
*** singular curve in ellinit.
gp > E2 = ellinit([Mod(0,17),Mod(0,17),Mod(0,17),Mod(-5,17),Mod(1,17)])
%10 = [Mod(0, 17), Mod(0, 17), Mod(0, 17), Mod(12, 17), Mod(1, 17), Mod(0, 17),
Mod(7, 17), Mod(4, 17), Mod(9, 17), Mod(2, 17), Mod(3, 17), Mod(3, 17), Mod(14,
17), 0, 0, 0, 0, 0, 0]
gp > P2 = [Mod(2,17),Mod(4,17)];
gp > Q2 = [Mod(13,17),Mod(5,17)];
gp > elladd(E2,P2,Q2)
%13 = [Mod(11, 17), Mod(6, 17)]
gp > ellpow(E2,P2,2)
%14 = [Mod(5, 17), Mod(4, 17)]
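The same arithmetic can be reproduced independently. The following Python sketch (function name ours, not PARI's) implements the affine chord-and-tangent formulas for Y² = X³ + aX + b over Fp, with None standing for the point at infinity O, and reproduces both results of the session above.

```python
# Affine point addition on Y^2 = X^3 + aX + b over F_p.
def ec_add(P, Q, a, p):
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = O
    if P == Q:                                        # tangent (doubling) slope
        lam = (3*x1*x1 + a) * pow(2*y1, -1, p) % p
    else:                                             # chord slope
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam*lam - x1 - x2) % p
    return (x3, (lam*(x1 - x3) - y1) % p)

a, p = -5, 17                                         # Y^2 = X^3 - 5X + 1
assert ec_add((2, 4), (13, 5), a, p) == (11, 6)       # matches elladd(E2,P2,Q2)
assert ec_add((2, 4), (2, 4), a, p) == (5, 4)         # matches ellpow(E2,P2,2)
```

The three-argument pow() with exponent −1 (Python 3.8+) supplies the modular inverses needed for the slopes.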

One can work with curves defined over extension fields. For example, we
represent F8 = F2 (θ) with θ³ + θ + 1 = 0, and define the non-supersingular
curve Y² + XY = X³ + X² + θ over F8 .

gp > f = Mod(1,2)*t^3+Mod(1,2)*t+Mod(1,2)
%15 = Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)

gp > a1 = Mod(Mod(1,2),f);
gp > a2 = Mod(Mod(1,2),f);
gp > a3 = a4 = 0;
gp > a6 = Mod(Mod(1,2)*t,f);
gp > E3 = ellinit([a1,a2,a3,a4,a6])
%20 = [Mod(Mod(1, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), Mod(Mod(1, 2), M
od(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), 0, 0, Mod(Mod(1, 2)*t, Mod(1, 2)*t^3 +
Mod(1, 2)*t + Mod(1, 2)), Mod(Mod(1, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)
), Mod(Mod(0, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), 0, Mod(Mod(1, 2)*t,
Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), Mod(Mod(1, 2), Mod(1, 2)*t^3 + Mod(1,
2)*t + Mod(1, 2)), Mod(Mod(1, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), Mod(
Mod(1, 2)*t, Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), Mod(Mod(1, 2)*t^2 + Mod(1
, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), 0, 0, 0, 0, 0, 0]
gp > P3 = [Mod(Mod(1,2),f), Mod(Mod(1,2)*t^2,f)];
gp > Q3 = [Mod(Mod(1,2)*t+Mod(1,2),f), Mod(Mod(1,2)*t^2+Mod(1,2),f)];
gp > elladd(E3,P3,Q3)
%22 = [Mod(Mod(1, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2)), Mod(Mod(1, 2)*t^
2 + Mod(1, 2), Mod(1, 2)*t^3 + Mod(1, 2)*t + Mod(1, 2))]
gp > lift(lift(elladd(E3,P3,Q3)))
%23 = [1, t^2 + 1]
gp > lift(lift(ellpow(E3,P3,2)))
%25 = [t + 1, t^2 + t]

Given an elliptic curve and an X-coordinate, the function ellordinate()
returns a vector of Y -coordinates of points on the curve having the given
X-coordinate. The vector contains 0, 1, or 2 elements.

gp > ellordinate(E1,3)
%26 = [5, -5]
gp > ellordinate(E1,4)
%27 = []
gp > ellordinate(E2,5)
%28 = [Mod(4, 17), Mod(13, 17)]
gp > ellordinate(E2,6)
%29 = [Mod(0, 17)]
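Over a prime field, ellordinate() amounts to listing the square roots of x³ + ax + b; the following brute-force Python check (helper name ours) reproduces the two F17 queries above.

```python
# All y in F_p with y^2 = x^3 - 5x + 1, for the curve E2 of the session.
def ordinates(x, p=17, a=-5, b=1):
    return [y for y in range(p) if (y*y - (x**3 + a*x + b)) % p == 0]

assert ordinates(5) == [4, 13]   # matches ellordinate(E2,5)
assert ordinates(6) == [0]       # matches ellordinate(E2,6)
```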

The function ellorder() computes the additive order of a point on an
elliptic curve defined over Q. If the input point is of infinite order, 0 is
returned. This function is applicable to curves defined over Q only; curves
defined over finite fields are not supported. Connell and Dujella discovered
in 2000 that the curve Y² + XY = X³ − 15745932530829089880X +
24028219957095969426339278400 has rank three. Its torsion subgroup is
isomorphic to Z2 ⊕ Z8 , and consists of the following 16 points:
O, (−4581539664, 2290769832), (−1236230160, 203972501847720),
(2132310660, 12167787556920), (2452514160, 12747996298920),
(9535415580, 860741285907000), (2132310660, −12169919867580),
(−1236230160, −203971265617560), (9535415580, −860750821322580),
(2452514160, −12750448813080), (2346026160, −1173013080),

(1471049760, 63627110794920), (1471049760, −63628581844680),
(3221002560, −82025835631080), (3221002560, 82022614628520),
(8942054015/4, −8942054015/8).
(Argue why the last point in the above list does not contradict the Nagell–
Lutz theorem.) The free part of the elliptic curve group is generated by the
following three independent points.
(2188064030, −7124272297330),
(396546810000/169, 1222553114825160/2197),
(16652415739760/3481, 49537578975823615480/205379).

gp > E4 = ellinit([1,0,0,-15745932530829089880,24028219957095969426339278400])
%30 = [1, 0, 0, -15745932530829089880, 24028219957095969426339278400, 1, -314918
65061658179760, 96112879828383877705357113600, -24793439124139356756716301340277
9136000, 755804761479796314241, -20760382044064624726576831008961, 4358115163151
13821324429157217041184204234956825600000000, 4317465449157134026457008940686688
72327010468357665602062299521/43581151631511382132442915721704118420423495682560
0000000, [2346026160.000000000000000000, 2235513503.749999999999999999, -4581539
664.000000000000000000]~, 0.00008326692542370325455895756925, 0.0000378969084242
9953714081078080*I, 12469.66709441963916544774723, -32053.9134602331409344378049
3*I, 0.000000003155559047555061173737096096]
gp > P4 = [9535415580, -860750821322580];
gp > Q4 = [-4581539664, 2290769832];
gp > R4 = [0];
gp > S4 = [2188064030, -7124272297330];
gp > ellorder(E4,P4)
%35 = 8
gp > ellorder(E4,Q4)
%36 = 2
gp > ellorder(E4,R4)
%37 = 1
gp > ellorder(E4,S4)
%38 = 0

4.3 Elliptic Curves over Finite Fields

Let E be an elliptic curve defined over a finite field Fq . When no confusion
is likely, we use the shorthand notation Eq to stand for the group EFq .
Eq can contain at most q² + 1 points (all pairs in Fq × Fq and the point at
infinity, but this is a loose overestimate), that is, Eq is a finite group, and
every element of Eq has finite order. The following theorem describes the
structure of Eq .

Theorem 4.12 The group Eq is either cyclic or the direct sum Zn1 ⊕ Zn2 of
two cyclic subgroups with n1 , n2 ≥ 2, n2 | n1 , and n2 | (q − 1). ⊳

Let m be the size of Eq . If Eq is cyclic, it contains exactly φ(m) generators.
Let P be such a generator. Then, every point Q on Eq can be represented
uniquely as Q = sP for some s in the range 0 ≤ s < m. If Eq is not cyclic,
then Eq ≅ Zn1 ⊕ Zn2 with m = n1 n2 . That is, Eq contains one point P of
order n1 and one point Q of order n2 such that any point R ∈ Eq can be
uniquely represented as R = sP + tQ with 0 ≤ s < n1 and 0 ≤ t < n2 .

Example 4.13 (1) The elliptic curve E1 : Y² = X³ − 5X + 1 defined over
F17 consists of the following 16 points.

P0 = O,        P1 = (0, 1),     P2 = (0, 16),   P3 = (2, 4),
P4 = (2, 13),  P5 = (3, 8),     P6 = (3, 9),    P7 = (5, 4),
P8 = (5, 13),  P9 = (6, 0),     P10 = (10, 4),  P11 = (10, 13),
P12 = (11, 6), P13 = (11, 11),  P14 = (13, 5),  P15 = (13, 12).

The multiples of the points in (E1 )F17 and their orders are listed below.

P 2P 3P 4P 5P 6P 7P 8P 9P 10P 11P 12P 13P 14P 15P 16P ord P
P0 1
P1 P3 P14 P7 P12 P11 P6 P9 P5 P10 P13 P8 P15 P4 P2 P0 16
P2 P4 P15 P8 P13 P10 P5 P9 P6 P11 P12 P7 P14 P3 P1 P0 16
P3 P7 P11 P9 P10 P8 P4 P0 8
P4 P8 P10 P9 P11 P7 P3 P0 8
P5 P3 P13 P7 P15 P11 P2 P9 P1 P10 P14 P8 P12 P4 P6 P0 16
P6 P4 P12 P8 P14 P10 P1 P9 P2 P11 P15 P7 P13 P3 P5 P0 16
P7 P9 P8 P0 4
P8 P9 P7 P0 4
P9 P0 2
P10 P7 P4 P9 P3 P8 P11 P0 8
P11 P8 P3 P9 P4 P7 P10 P0 8
P12 P10 P2 P7 P5 P4 P14 P9 P15 P3 P6 P8 P1 P11 P13 P0 16
P13 P11 P1 P8 P6 P3 P15 P9 P14 P4 P5 P7 P2 P10 P12 P0 16
P14 P11 P5 P8 P2 P3 P12 P9 P13 P4 P1 P7 P6 P10 P15 P0 16
P15 P10 P6 P7 P1 P4 P13 P9 P12 P3 P2 P8 P5 P11 P14 P0 16

The table demonstrates that the group (E1 )F17 is cyclic. The φ(16) = 8
generators of this group are P1 , P2 , P5 , P6 , P12 , P13 , P14 , P15 .
(2) The elliptic curve E2 : Y² = X³ − 5X + 2 defined over F17 consists of
the following 20 points.

P0 = O,         P1 = (0, 6),    P2 = (0, 11),   P3 = (1, 7),
P4 = (1, 10),   P5 = (2, 0),    P6 = (5, 0),    P7 = (6, 1),
P8 = (6, 16),   P9 = (7, 2),    P10 = (7, 15),  P11 = (8, 7),
P12 = (8, 10),  P13 = (10, 0),  P14 = (12, 2),  P15 = (12, 15),
P16 = (13, 3),  P17 = (13, 14), P18 = (15, 2),  P19 = (15, 15).

The multiples of these points and their orders are tabulated below.

P 2P 3P 4P 5P 6P 7P 8P 9P 10P ord P
P0 1
P1 P4 P18 P8 P13 P7 P19 P3 P2 P0 10
P2 P3 P19 P7 P13 P8 P18 P4 P1 P0 10
P3 P7 P8 P4 P0 5
P4 P8 P7 P3 P0 5
P5 P0 2
P6 P0 2
P7 P4 P3 P8 P0 5
P8 P3 P4 P7 P0 5
P9 P3 P16 P7 P6 P8 P17 P4 P10 P0 10
P10 P4 P17 P8 P6 P7 P16 P3 P9 P0 10
P11 P4 P14 P8 P5 P7 P15 P3 P12 P0 10
P12 P3 P15 P7 P5 P8 P14 P4 P11 P0 10
P13 P0 2
P14 P7 P12 P4 P5 P3 P11 P8 P15 P0 10
P15 P8 P11 P3 P5 P4 P12 P7 P14 P0 10
P16 P8 P10 P3 P6 P4 P9 P7 P17 P0 10
P17 P7 P9 P4 P6 P3 P10 P8 P16 P0 10
P18 P7 P2 P4 P13 P3 P1 P8 P19 P0 10
P19 P8 P1 P3 P13 P4 P2 P7 P18 P0 10

Since (E2 )F17 does not contain a point of order 20, the group (E2 )F17 is not
cyclic. The above table shows that (E2 )F17 ≅ Z10 ⊕ Z2 . We can take the point
P1 as a generator of Z10 and P5 as a generator of Z2 . Every element of (E2 )F17
can be uniquely expressed as sP1 + tP5 with s ∈ {0, 1, 2, . . . , 9} and t ∈ {0, 1}.
The following table lists this representation of all the points of (E2 )F17 .

s=0 s=1 s=2 s=3 s=4 s=5 s=6 s=7 s=8 s=9
t=0 P0 P1 P4 P18 P8 P13 P7 P19 P3 P2
t=1 P5 P10 P15 P17 P12 P6 P11 P16 P14 P9
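The claimed structure can be confirmed by brute force. The following self-contained Python sketch (our names; O is represented as None) re-derives the group order, the absence of a point of order 20, and the Z10 ⊕ Z2 decomposition with generators P1 = (0, 6) and P5 = (2, 0).

```python
# (E2)_{F_17} for Y^2 = X^3 - 5X + 2: 20 points, exponent 10, not cyclic.
p, a, b = 17, -5, 2

def add(P, Q):
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                       # opposite points sum to O
    lam = ((3*x1*x1 + a) * pow(2*y1, -1, p) if P == Q
           else (y2 - y1) * pow(x2 - x1, -1, p)) % p
    x3 = (lam*lam - x1 - x2) % p
    return (x3, (lam*(x1 - x3) - y1) % p)

def mul(n, P):
    R = None
    for _ in range(n):
        R = add(R, P)
    return R

pts = {None} | {(x, y) for x in range(p) for y in range(p)
                if (y*y - x**3 - a*x - b) % p == 0}
assert len(pts) == 20
assert all(mul(10, P) is None for P in pts)          # so no point of order 20
assert {add(mul(s, (0, 6)), mul(t, (2, 0)))
        for s in range(10) for t in range(2)} == pts  # Z_10 (+) Z_2
```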

(3) The ordinary curve E3 : Y² + XY = X³ + θX² + (θ² + 1) defined over
F8 = F2 (θ), where θ³ + θ + 1 = 0, contains the following 12 points.

P0 = O,              P1 = (0, θ + 1),        P2 = (1, θ),
P3 = (1, θ + 1),     P4 = (θ, θ²),           P5 = (θ, θ² + θ),
P6 = (θ + 1, 0),     P7 = (θ + 1, θ + 1),    P8 = (θ², θ),
P9 = (θ², θ² + θ),   P10 = (θ² + θ + 1, θ),  P11 = (θ² + θ + 1, θ² + 1).

The multiples of these points and their orders are listed in the table below.
The table illustrates that the group (E3 )F8 is cyclic. The φ(12) = 4 generators
of this group are the points P2 , P3 , P6 , P7 .

P 2P 3P 4P 5P 6P 7P 8P 9P 10P 11P 12P ord P
P0 1
P1 P0 2
P2 P8 P11 P5 P6 P1 P7 P4 P10 P9 P3 P0 12
P3 P9 P10 P4 P7 P1 P6 P5 P11 P8 P2 P0 12
P4 P5 P0 3
P5 P4 P0 3
P6 P9 P11 P4 P2 P1 P3 P5 P10 P8 P7 P0 12
P7 P8 P10 P5 P3 P1 P2 P4 P11 P9 P6 P0 12
P8 P5 P1 P4 P9 P0 6
P9 P4 P1 P5 P8 P0 6
P10 P1 P11 P0 4
P11 P1 P10 P0 4

(4) As a final example, consider the supersingular curve E4 : Y² + Y =
X³ + X + θ² defined over F8 = F2 (θ) with θ³ + θ + 1 = 0. The group (E4 )F8
contains the following five points.

P0 = O,            P1 = (0, θ² + θ),   P2 = (0, θ² + θ + 1),
P3 = (1, θ² + θ),  P4 = (1, θ² + θ + 1).

Since the size of (E4 )F8 is prime, (E4 )F8 is a cyclic group, and any point on it
except O is a generator of it. ¤
The size of the elliptic curve group Eq = EFq is trivially upper-bounded
by q² + 1. In practice, this size is much smaller than q² + 1. The following
theorem implies that the size of Eq is Θ(q).

Theorem 4.14 [Hasse’s theorem]15 The size of Eq is q + 1 − t, where
−2√q ≤ t ≤ 2√q. ⊳

The integer t in Hasse’s theorem is called the trace of Frobenius16 for the
elliptic curve E defined over Fq . It is an important quantity associated with
the curve. We define several classes of elliptic curves based on the value of t.

Definition 4.15 Let E be an elliptic curve defined over the finite field Fq of
characteristic p, and t the trace of Frobenius for E.
(a) If t = 1, that is, if the size of EFq is q, we call E an anomalous curve.
(b) If p | t, we call E a supersingular curve, whereas if p ∤ t, we call E a
non-supersingular or an ordinary curve. ⊳
Recall that for finite fields of characteristic two, we have earlier defined
supersingular and non-supersingular curves in a different manner. The earlier
definitions turn out to be equivalent to Definition 4.15(b) for the fields F2^n .
We have the following important characterization of supersingular curves.
15 This result was conjectured by the Austrian-American mathematician Emil Artin
(1898–1962), and proved by the German mathematician Helmut Hasse (1898–1979).
16 Ferdinand Georg Frobenius (1849–1917) was a German mathematician well-known for
his contributions to group theory.



Proposition 4.16 An elliptic curve E over Fq and with trace of Frobenius
equal to t is supersingular if and only if t² = 0, q, 2q, 3q, or 4q. In particular,
an elliptic curve defined over Fq = Fp^n with p = char Fq ≠ 2, 3 and with n odd
is supersingular if and only if t = 0. ⊳

Example 4.17 (1) The curve Y² = X³ + X + 3 defined over F17 contains
the following 17 points, and is anomalous.

P0 = O,        P1 = (2, 8),    P2 = (2, 9),    P3 = (3, 4),    P4 = (3, 13),
P5 = (6, 2),   P6 = (6, 15),   P7 = (7, 8),    P8 = (7, 9),    P9 = (8, 8),
P10 = (8, 9),  P11 = (11, 6),  P12 = (11, 11), P13 = (12, 3),  P14 = (12, 14),
P15 = (16, 1), P16 = (16, 16).
Evidently, anomalous curves defined over prime fields admit cyclic groups.
(2) The curve Y² = X³ + X + 1 defined over F17 contains the following
18 points, and is supersingular.

P0 = O,         P1 = (0, 1),    P2 = (0, 16),   P3 = (4, 1),    P4 = (4, 16),
P5 = (6, 6),    P6 = (6, 11),   P7 = (9, 5),    P8 = (9, 12),   P9 = (10, 5),
P10 = (10, 12), P11 = (11, 0),  P12 = (13, 1),  P13 = (13, 16), P14 = (15, 5),
P15 = (15, 12), P16 = (16, 4),  P17 = (16, 13).

(3) The curve E3 of Example 4.13(3), defined over F8 , has trace −3 (not a
multiple of two) and is non-supersingular. The curve E4 of Example 4.13(4),
defined again over F8 , has trace four (a multiple of two), and is supersingular.
(4) The non-supersingular curve Y² + XY = X³ + θX² + θ over F8 = F2 (θ),
θ³ + θ + 1 = 0, contains the following eight points, and is anomalous.

P0 = O,                P1 = (0, θ² + θ),       P2 = (θ², 0),
P3 = (θ², θ²),         P4 = (θ² + 1, θ + 1),   P5 = (θ² + 1, θ² + θ),
P6 = (θ² + θ + 1, 1),  P7 = (θ² + θ + 1, θ² + θ). ¤
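The traces claimed in parts (1) and (2) are easy to confirm by counting points. The following small Python check (ours, not the book's) also verifies that both traces respect the Hasse bound of Theorem 4.14.

```python
import math

# 1 (for O) plus the number of affine solutions of y^2 = x^3 + ax + b mod p.
def count(p, a, b):
    return 1 + sum(1 for x in range(p) for y in range(p)
                   if (y*y - x**3 - a*x - b) % p == 0)

p = 17
t1 = p + 1 - count(p, 1, 3)      # Y^2 = X^3 + X + 3
t2 = p + 1 - count(p, 1, 1)      # Y^2 = X^3 + X + 1
assert t1 == 1                   # anomalous: #E = q
assert t2 == 0                   # t divisible by p: supersingular
assert max(abs(t1), abs(t2)) <= math.isqrt(4*p)   # the Hasse bound
```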

Example 4.76 lists some popular families of supersingular elliptic curves.
Also see Exercise 4.66.
Now follows a useful application of the trace of Frobenius. Let E be an
elliptic curve over Fq with trace of Frobenius equal to t. For any r ∈ N, the
curve E continues to remain an elliptic curve over the extension field Fqr . The
size of the group Eqr = EFqr can be determined from the size of Eq = EFq .

Proposition 4.18 [Weil’s theorem]17 Let the elliptic curve E defined over
Fq have trace t. Let α, β ∈ C satisfy W² − tW + q = (W − α)(W − β). Then,
for every r ∈ N, the size of the group Eqr is q^r + 1 − (α^r + β^r ), that is, the
trace of Frobenius for E over Fqr is α^r + β^r . ⊳
17 André Abraham Weil (1906–1998) was a French mathematician who made profound
contributions in the areas of number theory and algebraic geometry. He was one of the
founding members of the mathematicians’ (mostly French) group Nicolas Bourbaki.

Example 4.19 (1) The group of the curve E1 : Y² = X³ − 5X + 1 over F17
has size 16 (Example 4.13(1)), that is, the trace for this curve is t = 2. We
have W² − 2W + 17 = (W − α)(W − β), where α = 1 + i4 and β = 1 − i4.
Therefore, the size of (E1 )F17² is 17² + 1 − [(1 + i4)² + (1 − i4)²] = 320.
(2) The anomalous curve of Example 4.17(1) has trace t = 1. We have
W² − W + 17 = (W − α)(W − β), where α = (1 + i√67)/2 and β = (1 − i√67)/2.
When the underlying field is F17² , this curve contains 17² + 1 − (α² + β²) = 323
points, and is no longer anomalous.
(3) The supersingular curve E4 of Example 4.13(4) has trace t = 4. We
have W² − 4W + 8 = (W − α)(W − β), where α = 2 + i2 and β = 2 − i2.
The group (E4 )F512 contains 8³ + 1 − (α³ + β³) = 545 points, and is again
supersingular (having trace −32, a multiple of 2). ¤
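Because α + β = t and αβ = q, the power sums s_r = α^r + β^r satisfy the integer recurrence s_r = t·s_{r−1} − q·s_{r−2} with s_0 = 2 and s_1 = t, so all three counts above can be checked without any complex arithmetic. A Python sketch (function name ours):

```python
# Size of E over F_{q^r} from the trace t over F_q, via the power-sum
# recurrence s_r = t*s_{r-1} - q*s_{r-2}, s_0 = 2, s_1 = t.
def count_ext(q, t, r):
    s = [2, t]
    for _ in range(2, r + 1):
        s.append(t*s[-1] - q*s[-2])
    return q**r + 1 - s[r]

assert count_ext(17, 2, 2) == 320    # E1 over F_{17^2}
assert count_ext(17, 1, 2) == 323    # the anomalous curve of Example 4.17(1)
assert count_ext(8, 4, 3) == 545     # E4 over F_{8^3} = F_512
```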

4.4 Some Theory of Algebraic Curves

In this section, an elementary introduction to the theory of plane algebraic
curves is provided. These topics are essentially parts of algebraic geometry. A
treatise on elliptic curves, however, cannot proceed to any reasonable extent
without these mathematical tools. In particular, pairing and point counting on
elliptic curves make heavy use of these tools. I have already mentioned that the
study of elliptic curves is often referred to as arithmetic algebraic geometry.
Studying the relevant algebra and geometry does a number theorist no harm.

The following treatment of algebraic geometry is kept as elementary as
possible. In particular, almost all mathematical proofs are omitted. Readers
willing to learn the mathematical details may start with the expository reports
by Charlap and Robbins,18 and by Charlap and Coley.19

4.4.1 Affine and Projective Curves

For the rest of this chapter, we let K be a field, and K̄ its algebraic closure.
More often than not, we deal with finite fields, that is, we have K = Fq with
p = char K. Mostly for visualization purposes, we would consider K = R.

4.4.1.1 Affine Curves

The two-dimensional plane over K is called the affine plane over K:
K² = {(h, k) | h, k ∈ K}.
18 Leonard S. Charlap and David P. Robbins, An elementary introduction to elliptic
curves, CRD Expository Report 31, 1988.
19 Leonard S. Charlap and Raymond Coley, An elementary introduction to elliptic curves
II, CCR Expository Report 34, 1990.



We address the two field elements h, k for a point P = (h, k) ∈ K² as its
X- and Y -coordinates, respectively. These coordinates are called the affine
coordinates of P , and are unique for any given point. We denote these as
h = X(P ), k = Y (P ).

Definition 4.20 A plane affine curve C over K is defined by a non-zero
polynomial as
C : f (X, Y ) = 0.
A K-rational point on a curve C : f (X, Y ) = 0 is a point P = (h, k) ∈ K²
such that f (P ) = f (h, k) = 0. A K-rational point is also called a finite point
on C. The finite points on C are the solutions of f (X, Y ) = 0 in K² . ⊳
It is customary to consider only irreducible polynomials f (X, Y ) for
defining curves. The reason is that if f (X, Y ) admits a non-trivial factorization
of the form u(X, Y )v(X, Y ) (or w(X, Y )^a for some a ≥ 2), then a study of
f (X, Y ) is equivalent to a study of curves (u, v or w) of lower degree(s).
Example 4.21 (1) Straight lines are defined by the equation aX + bY + c = 0
with at least one of a, b non-zero.
(2) Circles are defined by the equation (X − a)² + (Y − b)² = r².
(3) Conic sections are defined by the equation aX² + bXY + cY² + dX +
eY + f = 0 with at least one of a, b, c non-zero.
(4) Elliptic curves are smooth curves defined by the Weierstrass equation
Y² + (a1 X + a3 )Y = X³ + a2 X² + a4 X + a6 .
(5) A hyperelliptic curve of genus g is a smooth curve defined by an
equation of the form Y² + u(X)Y = v(X), where u(X), v(X) ∈ K[X] with
deg u(X) ≤ g, deg v(X) = 2g + 1, and v(X) monic. If char K ≠ 2, this can be
simplified as Y² = w(X) for a monic w(X) ∈ K[X] of degree 2g + 1.
A parabola is a hyperelliptic curve of genus zero (no handles), whereas an
elliptic curve is a hyperelliptic curve of genus one (one handle only). ¤

4.4.1.2 Projective Curves

By a systematic addition of points at infinity to the affine plane, we obtain
the two-dimensional projective plane over K. To that end, we define an
equivalence relation ∼ on K³ \ {(0, 0, 0)} as (h, k, l) ∼ (h′ , k′ , l′ ) if and only if
h′ = λh, k′ = λk, and l′ = λl for some non-zero λ ∈ K. The equivalence class
of (h, k, l) is denoted by [h, k, l], and the set of all these equivalence classes is
the projective plane P²(K) over K. For a point P = [h, k, l] ∈ P²(K), the field
elements h, k, l are called the projective coordinates20 of P . Projective
coordinates are unique up to simultaneous multiplication by non-zero elements of
K. The three projective coordinates of a point cannot be simultaneously zero.
20 Möbius (1790–1868) introduced the concept of projective or homogeneous coordinates,
thereby pioneering the algebraic treatment of projective geometry.



Figure 4.5 explains the relationship between the affine plane K² and the
projective plane P²(K). The equivalence class P = [h, k, l] is identified with
the line in K³ passing through the origin (0, 0, 0) and the point (h, k, l).

FIGURE 4.5: Points in the projective plane
[The figure shows X-Y -Z space with the plane Z = 1: the line of a finite
point [h, k, l] meets Z = 1 at (h/l, k/l, 1), whereas the line of a point at
infinity [h, k, 0] lies in the X-Y plane and misses Z = 1.]

First, let l ≠ 0. The line in K³ corresponding to P meets the plane Z = 1
at (h/l, k/l, 1). Indeed, P = [h, k, l] = [h/l, k/l, 1]. We identify P with the
finite point (h/l, k/l). Conversely, given any point (h, k) ∈ K², the point
[h, k, 1] ∈ P²(K) corresponds to the line passing through (0, 0, 0) and (h, k, 1).
Thus, the points in K² are in bijection with the points on the plane Z = 1.

Next, take l = 0. The line of [h, k, 0] lies on the X-Y plane, and does not
meet the plane Z = 1 at all. Such lines correspond to the points at infinity
in P²(K). These points are not present in the affine plane. For every slope of
straight lines in the X-Y plane, there exists a unique point at infinity.

We assume that a line passes through all the points at infinity. We call this
the line at infinity. In the projective plane, two lines (parallel or not) meet at
exactly one point. This is indeed a case of Bézout’s theorem discussed later.
We are now ready to take affine curves to the projective plane.
We are now ready to take affine curves to the projective plane.
Definition 4.22 A (multivariate) polynomial is called homogeneous if every
non-zero term in the polynomial has the same degree. The zero polynomial is
considered homogeneous of any degree. ⊳

Example 4.23 The polynomial X³ + 2XY Z − 3Z³ is homogeneous of degree
three. The polynomial X³ + 2XY − 3Z is not homogeneous. ¤

Definition 4.24 Let C : f (X, Y ) = 0 be an affine curve of degree d. The
homogenization of f is defined as f (h) (X, Y, Z) = Z^d f (X/Z, Y /Z). It is a
homogeneous polynomial of degree d. The projective curve corresponding to
C is defined by the equation
C (h) : f (h) (X, Y, Z) = 0.
Let [h, k, l] ∈ P²(K), and λ ∈ K ∗ . Since f (h) is homogeneous of degree d, we
have f (h) (λh, λk, λl) = λ^d f (h) (h, k, l). So f (h) (λh, λk, λl) = 0 if and only if

f (h) (h, k, l) = 0. That is, the zeros of f (h) are not dependent on the choice of
the projective coordinates, and a projective curve is a well-defined concept.
A K-rational point [h, k, l] on C (h) is a solution of f (h) (h, k, l) = 0. The set
of all K-rational points on C (h) is denoted by C (h)K . By an abuse of notation,
we often describe a curve by its affine equation. But when we talk about the
rational points on that curve, we imply all rational points on the corresponding
projective curve. In particular, CK would stand for C (h)K . ⊳

Putting Z = 1 gives f (h) (X, Y, 1) = f (X, Y ). This gives all the finite points
on C (h) , that is, all the points on the affine curve C. If, on the other hand, we
put Z = 0, we get f (h) (X, Y, 0) which is a homogeneous polynomial in X, Y
of degree d. The solutions of f (h) (X, Y, 0) = 0 give all the points at infinity
on C (h) . These points are not present on the affine curve C.

Example 4.25 (1) A straight line in the projective plane has the equation
aX + bY + cZ = 0. Putting Z = 1 gives aX + bY + c = 0, that is, all points on
the corresponding affine line. Putting Z = 0 gives aX + bY = 0. If b ≠ 0, then
Y = −(a/b)X, that is, the line contains only one point at infinity [1, −(a/b), 0].
If b = 0, we have X = 0, that is, [0, 1, 0] is the only point at infinity.
(2) A circle with center at (a, b) and radius r has the projective equation
(X − aZ)² + (Y − bZ)² = r²Z². All finite points on the circle are solutions
obtained by putting Z = 1, that is, all solutions of (X − a)² + (Y − b)² = r².
For obtaining the points at infinity on the circle, we put Z = 0, and obtain
X² + Y² = 0. For K = R, the only solution of this is X = Y = 0. But all
of the three projective coordinates are not allowed to be zero simultaneously,
that is, the circle does not contain any point at infinity. Indeed, a circle does
not have a part extending towards infinity in any direction.
However, for K = C, the equation X² + Y² = 0 implies that Y = ±iX,
that is, there are two points at infinity: [1, i, 0] and [1, −i, 0].

[Plots: the straight line aX + bY + c = 0 with its parallel aX + bY = 0
through the origin; the circle of radius r centered at (a, b); the parabola
Y² = X with its direction at infinity Y² = 0; and the hyperbola X² − Y² = 1
with its asymptotes Y = ±X.]

(3) The parabola Y² = X has projective equation Y² = XZ. Putting
Z = 0 gives Y² = 0, that is, Y = 0, so [1, 0, 0] is the only point at infinity on
this parabola. Since Y² = X, X grows faster than Y . In the limit X → ∞, the
curve becomes horizontal, justifying why its point at infinity satisfies Y = 0.
(4) The hyperbola X² − Y² = 1 has projective equation X² − Y² = Z².
Putting Z = 0 gives X² − Y² = 0, that is, Y = ±X, that is, [1, 1, 0] and
[1, −1, 0] are the two points at infinity on the hyperbola. From a plot of the
hyperbola, we see that the curve asymptotically touches the lines Y = ±X.

[Plots: the elliptic curves Y² = X³ − X + 1 and Y² = X³ − X; both become
vertical (the direction X = 0) as X → ∞.]
(5) The homogenization of an elliptic curve given by the Weierstrass
equation is Y²Z + a1 XY Z + a3 Y Z² = X³ + a2 X²Z + a4 XZ² + a6 Z³. If we put
Z = 0, we get X³ = 0, that is, X = 0, that is, [0, 1, 0] is the only point at
infinity on the elliptic curve. In the limit X → ∞, the curve becomes vertical.

[Plot: a hyperelliptic curve of genus two, Y² = X(X² − 1)(X² − 2), again
with the vertical direction X = 0 at infinity.]

(6) The homogenization of the hyperelliptic curve (see Example 4.21(5)) is
Y²Z^(2g−1) + Z^(2g) u(X/Z)Y = Z^(2g+1) v(X/Z). If g ≥ 1, the only Z-free term in
this equation is X^(2g+1), so the point at infinity on this curve has X = 0, and
is [0, 1, 0]. A hyperelliptic curve of genus ≥ 1 becomes vertical as X → ∞. ¤
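The recipe "substitute Z = 0 into the homogenization" is easy to test numerically. The following Python snippet (the polynomial encodings are ours) evaluates the homogenized equations of this example at the claimed points at infinity.

```python
# f^(h)(X, Y, 0) = 0 picks out the points at infinity [h, k, 0].
circle    = lambda X, Y, Z: X*X + Y*Y - Z*Z              # unit circle
parabola  = lambda X, Y, Z: Y*Y - X*Z                    # Y^2 = X
hyperbola = lambda X, Y, Z: X*X - Y*Y - Z*Z              # X^2 - Y^2 = 1
cubic     = lambda X, Y, Z: Y*Y*Z - X**3 + X*Z*Z - Z**3  # Y^2 = X^3 - X + 1

assert circle(1, 1j, 0) == circle(1, -1j, 0) == 0        # [1, +/-i, 0] over C
assert parabola(1, 0, 0) == 0                            # [1, 0, 0]
assert hyperbola(1, 1, 0) == hyperbola(1, -1, 0) == 0    # [1, +/-1, 0]
assert cubic(0, 1, 0) == 0                               # [0, 1, 0]
```

Python's complex literal 1j plays the role of i, so the two complex points at infinity of the circle can be checked directly.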

Now that we can algebraically describe points at infinity on a curve, it is
necessary to investigate the smoothness of curves at their points at infinity.
Definition 4.2 handles only the finite points.

Definition 4.26 Let f (X, Y ) = 0 be an affine curve, and f (h) (X, Y, Z) = 0
the corresponding projective curve. The curve is smooth at the point P (finite
or infinite) if the three partial derivatives ∂f (h)/∂X, ∂f (h)/∂Y and ∂f (h)/∂Z
do not vanish simultaneously at the point P . A (projective) curve is called
smooth if it is smooth at all points on it (including those at infinity). An
elliptic or hyperelliptic curve is required to be smooth by definition. ⊳

In what follows, I often use affine equations of curves, but talk about the
corresponding projective curves. The points at infinity on these curves cannot
be described by the affine equations and are to be handled separately.

Theorem 4.27 [Bézout’s theorem] An algebraic curve of degree m
intersects an algebraic curve of degree n at exactly mn points. ⊳

As such, the theorem does not appear to be true. A line and a circle must
intersect at two points. While this is the case with some lines and circles, there
are exceptions. For example, a tangent to a circle meets the circle at exactly
one point. But, in this case, the intersection multiplicity is two, that is, we
need to count the points of intersection with proper multiplicities. However,
there are examples where a line does not meet a circle at all. Eliminating one
of the variables X, Y from the equations of a circle and a line gives a quadratic
equation in the other variable. If we try to solve this quadratic equation over
R, we may fail to get a root. If we solve the same equation over C, we always
obtain two roots. These two roots may be the same, implying that this is a
case of tangency, that is, a root of multiplicity two. To sum up, it is necessary
to work in an algebraically closed field for Bézout’s theorem to hold.
But multiplicity and algebraic closure alone do not validate Bézout’s theorem.
Consider the case of two concentric circles X² + Y² = 1 and X² + Y² = 2.
According to Bézout’s theorem, they must intersect at four points. Even if we
allow X, Y to assume complex values, we end up with an absurd conclusion
1 = 2, that is, the circles do not intersect at all. The final thing that is
necessary for Bézout’s theorem to hold is that we must consider projective curves,
and take into account the possibilities of intersections of the curves at their
common points at infinity. By Example 4.25(2), every circle has two points at
infinity over C. These are [1, i, 0] and [1, −i, 0] irrespective of the radius and
center of the circle. In other words, any two circles meet at these points at
infinity. Two concentric circles touch one another at these points at infinity,
so the total number of intersection points is 2 + 2 = 4.
The equation of the circle X² + Y² = a can be written as X² − i²Y² = a,
so as complex curves, a circle is a hyperbola too. If we replace i by a real
number, we get a hyperbola in the real plane. It now enables us to visualize
intersections at the points at infinity. Figure 4.6 illustrates some possibilities.
Parts (a) and (b) demonstrate situations where the two hyperbolas have
only two finite points of intersection. Asymptotically, these two hyperbolas
become parallel, that is, the two hyperbolas have the same points at infinity,
and so the total number of points of intersection is four. Part (c) illustrates the
Arithmetic of Elliptic Curves 205

FIGURE 4.6: Intersection of two hyperbolas

Equation of the solid hyperbola in all four parts: X² − Y² = 1.
Equations of the dashed hyperbolas: (a) (X + 1/3)² − (Y − 4/3)² = 1,
(b) (X − 3)² − Y² = 1, (c) X² − Y² = 1/9, (d) (9/4)X² − Y² = 1.

situation when the two hyperbolas not only have the same points at infinity
but also become tangential to one another at these points at infinity. There
are no finite points of intersection, but each of the two points at infinity is
an intersection point of multiplicity two. This is similar to the case of two
concentric circles. Finally, Part (d) shows a situation where the hyperbolas
have different points at infinity. All the four points of intersection of these
hyperbolas are finite. This is a situation that is not possible for circles. So we
can never see two circles intersecting at four points in the affine plane.

4.4.2 Polynomial and Rational Functions on Curves


Let C : f (X, Y ) = 0 be defined by an irreducible polynomial f (X, Y ) in
K[X, Y ]. For two polynomials G(X, Y ), H(X, Y ) ∈ K[X, Y ] with f |(G − H),
we have G(P ) = H(P ) for every rational point P on C (since f (P ) = 0 on
the curve). Thus, G and H represent the same K-valued function on C. This
motivates us to define the congruence: G(X, Y ) ≡ H(X, Y ) (mod f (X, Y ))
if and only if f |(G − H). Congruence modulo f is an equivalence relation
in K[X, Y ]. Call the equivalence classes of X and Y x and y, respectively. Then, the

equivalence class of a polynomial G(X, Y ) ∈ K[X, Y ] is G(x, y). The set of all
the equivalence classes of K[X, Y ] under congruence modulo f is denoted by

K[C] = {G(x, y) | G(X, Y ) ∈ K[X, Y ]} = K[X, Y ]/⟨f (X, Y )⟩.

Here, ⟨f ⟩ is the ideal in the polynomial ring K[X, Y ] generated by f (X, Y ).

K[C] is called the coordinate ring of C. Since f is irreducible, the ideal ⟨f ⟩
is prime. Consequently, K[C] is an integral domain. The set of fractions of
elements of K[C] (with non-zero denominators) is a field denoted as

K(C) = {G(x, y)/H(x, y) | G(x, y), H(x, y) ∈ K[C], H(x, y) ≠ 0}.

K(C) is called the function field of C.

Example 4.28 (1) For the straight line L : Y = X, we have f (X, Y ) = Y −


X. Since Y ≡ X (mod f (X, Y )), any bivariate polynomial G(X, Y ) ∈ K[X, Y ]
is congruent to some g(X) ∈ K[X] modulo f (substitute Y by X). It follows
that the coordinate ring K[L] is isomorphic, as a ring, to K[X]. Consequently,
K(L) ≅ K(X), where K(X) is the field of rational functions over K:

K(X) = {g(X)/h(X) | g, h ∈ K[X], h ≠ 0}.

It is easy to generalize these results to any arbitrary straight line in the plane.
(2) For the circle C : X² + Y² = 1, we have f (X, Y ) = X² + Y² − 1. Since
X² + Y² − 1 ≡ (X + Y + 1)² (mod 2), and we require f to be irreducible, we
must have char K ≠ 2. Congruence modulo this f gives the coordinate ring
K[C] = {G(x, y) | G(X, Y ) ∈ K[X, Y ]}, where x² + y² − 1 = 0.
Since char K ≠ 2, the elements x, 1 − y, 1 + y are irreducible in K[C],
distinct from one another. But then, x2 = 1 − y 2 = (1 − y)(1 + y) gives two
different factorizations of the same element in K[C]. Therefore, K[C] is not
a unique factorization domain. By contrast, any polynomial ring over a
field is a unique factorization domain. It follows that the coordinate ring of a
circle is not isomorphic to a polynomial ring.
However, the rational map K(C) → K(Z) taking x ↦ (1 − Z²)/(1 + Z²) and
y ↦ 2Z/(1 + Z²) can be easily verified to be an isomorphism of fields. Therefore,
the function field of a circle is isomorphic to the field of univariate rational
functions. ¤
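This is the classical rational parametrization of the circle, and the key computation behind the isomorphism can be sanity-checked with sympy (an assumed tool, not part of the text) by verifying that the images of x and y satisfy the defining relation of C:

```python
from sympy import symbols, simplify

Z = symbols('Z')
x_img = (1 - Z**2) / (1 + Z**2)   # image of x under K(C) -> K(Z)
y_img = 2*Z / (1 + Z**2)          # image of y

# The defining relation x^2 + y^2 - 1 = 0 of K(C) maps to 0 in K(Z):
print(simplify(x_img**2 + y_img**2 - 1))   # 0
```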

Let us now specialize to elliptic (or hyperelliptic) curves. Write the equa-
tion of the curve as C : Y 2 + u(X)Y = v(X) (for elliptic curves, u(X) =
a1 X + a3 , and v(X) = X 3 + a2 X 2 + a4 X + a6 ). We have y 2 = −u(x)y + v(x).
If we take G(x, y) ∈ K[C], then by repeatedly substituting y 2 by the linear
(in y) polynomial −u(x)y + v(x), we can simplify G(x, y) as

G(x, y) = a(x) + yb(x)

for some a(X), b(X) ∈ K[X]. It turns out that such a representation of a
polynomial function on C is unique, that is, every G(x, y) ∈ K[C] corresponds
to unique polynomials a(X) and b(X).

The field K(C) of rational functions on C is a quadratic extension of the


field K(X) obtained by adjoining a root y of the irreducible polynomial
Y 2 + u(X)Y − v(X) ∈ K(X)[Y ].
The other root of this polynomial is −u(X) − y. Therefore, we define the
conjugate of G(x, y) = a(x) + yb(x) as
Ĝ(x, y) = a(x) − (u(x) + y)b(x).
The norm of G is defined as

N(G) = GĜ. (4.11)


Easy calculations show that N(G) = a(x)2 − a(x)b(x)u(x) − v(x)b(x)2 . In
particular, N(G) is a polynomial in x alone.
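This identity is a one-line symbolic computation: expand GĜ and reduce modulo the curve relation y² = −u(x)y + v(x). A sketch with sympy, treating a, b, u, v as opaque symbols standing in for the polynomials a(x), b(x), u(x), v(x):

```python
from sympy import symbols, expand

y, a, b, u, v = symbols('y a b u v')  # a, b, u, v stand for a(x), b(x), u(x), v(x)

G    = a + y*b
Gbar = a - (u + y)*b                   # the conjugate of G
N    = expand(G * Gbar)                # still contains y and y^2 terms
N    = expand(N.subs(y**2, -u*y + v))  # reduce modulo y^2 + u*y - v = 0
print(N)                               # a^2 - a*b*u - v*b^2, free of y
```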
Now, take a rational function R(x, y) = G(x, y)/H(x, y) ∈ K(C). Multi-
plying both the numerator and the denominator by Ĥ simplifies R as
R(x, y) = s(x) + yt(x),
where s(x), t(x) ∈ K(x) are rational functions in x only.
Let C : f (X, Y ) = 0 be a plane (irreducible) curve, and P = (h, k) a
finite point on C. We plan to evaluate a rational function R(x, y) on C at
P . The obvious value of R at P should be R(h, k). The following example
demonstrates that this evaluation process is not so straightforward.

Example 4.29 Consider the unit circle C : X² + Y² − 1 = 0. Take R(x, y) =
(1 − x)/y ∈ K(C), and P = (1, 0). Plugging in the values x = 1 and y = 0 in
R(x, y) gives an expression of the form 0/0, and it appears that R is not defined
at P . However, x² + y² − 1 = 0, so that y² = (1 − x)(1 + x), and R can also be
written as y/(1 + x). Now, if we substitute x = 1 and y = 0, R evaluates to 0. ¤
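The same conclusion can be reached analytically: parametrize the real circle as (cos t, sin t), so that P = (1, 0) corresponds to t = 0, and compute the limit of R along the curve (a sketch using sympy):

```python
from sympy import symbols, cos, sin, limit

t = symbols('t')
R = (1 - cos(t)) / sin(t)     # R restricted to the curve (cos t, sin t)
print(limit(R, t, 0))         # 0, so R is defined at P = (1, 0) with value 0
```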

Definition 4.30 The value of a polynomial function G(x, y) ∈ K[C] at P =


(h, k) is G(P ) = G(h, k) ∈ K. A rational function R(x, y) ∈ K(C) is defined at
P if there is a representation R(x, y) = G(x, y)/H(x, y) for polynomials G, H
with H(h, k) ≠ 0. In that case, the value of R at P is R(P ) = G(P )/H(P ) =
G(h, k)/H(h, k) ∈ K. If R is not defined at P , we take R(P ) = ∞. ⊳

Definition 4.31 P is called a zero of R(x, y) ∈ K(C) if R(P ) = 0, whereas


P is called a pole of R if R(P ) = ∞. ⊳

The notion of the value of a rational function can be extended to the


points at infinity on C. We now define this for elliptic curves only. Neglecting
lower-degree terms in the Weierstrass equation for an elliptic curve C, we
obtain Y² ≈ X³, that is, Y grows like the 3/2-th power of X. In
view of this, we give a weight of two to X, and a weight of three to Y . Let
G(x, y) = a(x) + yb(x) ∈ K[C] with a(x), b(x) ∈ K[x]. With x, y assigned the
above weights, the degree of a(x) is 2 degx (a) (where degx denotes the usual

x-degree), and the degree of yb(x) is 3 + 2 degx (b). The larger of these two
degrees is taken to be the degree of G(x, y), that is,
deg G = max(2 degx (a), 3 + 2 degx (b)).
The leading coefficient of G, denoted lc(G), is that of a or b depending upon
whether 2 degx (a) > 3 + 2 degx (b) or not. The two degrees cannot be equal,
since 2 degx (a) is even, whereas 3 + 2 degx (b) is odd. Now, define the value of
R(x, y) = G(x, y)/H(x, y) ∈ K(C) (with G, H ∈ K[C]) at O as

R(O) = 0             if deg G < deg H,
R(O) = ∞             if deg G > deg H,
R(O) = lc(G)/lc(H)   if deg G = deg H.
The point O is a zero (or pole) of R if R(O) = 0 (or R(O) = ∞).
Let C again be any algebraic curve. Although we are now able to uniquely
define values of rational functions at points on C, the statement of Defini-
tion 4.30 is existential. In particular, nothing in the definition indicates how
we can obtain a good representation G/H of R. We use a bit of algebra to
settle this issue. The set of rational functions on C defined at P is a local ring
with the unique maximal ideal comprising functions that evaluate to zero at
P . This leads to the following valuation of rational functions at P . The notion
of zeros and poles can be made concrete from this.

Theorem 4.32 There exists a rational function UP (x, y) (depending on P )


with the following two properties:
(1) UP (P ) = 0.
(2) Any non-zero rational function R(x, y) ∈ K(C) can be written as R = U_P^d S
with d ∈ Z and with S ∈ K(C) having neither a pole nor a zero at P .
The function UP is called a uniformizer at the point P . The integer d does
not depend upon the choice of the uniformizer. The order of R at P , denoted
ordP (R), is the integer d. ⊳

Theorem 4.33 If ordP (R) = 0, then P is neither a pole nor a zero of R. If


ordP (R) > 0, then P is a zero of R. If ordP (R) < 0, then P is a pole of R. ⊳

Definition 4.34 The multiplicity of a zero or pole of R at P is the absolute


value of the order of R at P , that is, vP (R) = | ordP (R)|. If P is neither a
zero nor a pole of R, we take vP (R) = 0. ⊳

Example 4.35 Consider the real unit circle C : X 2 + Y 2 − 1 = 0, and take


a point P = (h, k) on C. The two conditions in Theorem 4.32 indicate that
UP should have a simple zero at P . If k 6= 0, the vertical line X = h cuts the
circle at P , and we take x − h as the uniformizer at P . On the other hand, if
k = 0, the vertical line X = h touches the circle at P , that is, the multiplicity
of the intersection of C with X = h at (h, 0) is two. So we cannot take x − h as
the uniformizer at (h, 0). However, the horizontal line Y = 0 meets the circle
with multiplicity one, so y can be taken as the uniformizer at (h, 0).

As a specific example, consider the rational function


R(x, y) = G(x, y)/H(x, y) = (25y²x + 25yx² − 30y² − 30yx + 9y − 34x + 30)/(5y² + 8x − 8).

Case 1: P = (3/5, 4/5).
In this case, we can take x − 3/5 as the uniformizer. Let us instead take
5(x − h) = 5x − 3 as the uniformizer. Clearly, multiplication of a uniformizer
by non-zero field elements does not matter. We have G(P ) = H(P ) = 0, so
we need to find an alternative representation of R. We write

G(x, y) = 25y²x + 25yx² − 30y² − 30yx + 9y − 34x + 30
        = 5y²(5x − 3) + y(5x − 3)² + (5x − 3)(3x − 5)
        = (5x − 3)(5y² + y(5x − 3) + 3x − 5).

The function 5y² + y(5x − 3) + 3x − 5 again evaluates to zero at P = (3/5, 4/5).
So we can factor out 5x − 3 further from 5y² + y(5x − 3) + 3x − 5.

G(x, y) = (5x − 3)(5y² + y(5x − 3) + 3x − 5)
        = (5x − 3)(y(5x − 3) + 3x − 5x²)
        = (5x − 3)²(y − x).

Since y − x does not evaluate to zero at P , we are done. The denominator is

H(x, y) = 5y² + 8x − 8 = −5x² + 8x − 3 = (5x − 3)(1 − x).

It therefore follows that


R(x, y) = (5x − 3) · (y − x)/(1 − x)                              (4.12)

with the rational function (y − x)/(1 − x) neither zero nor ∞ at P . Therefore,
(3/5, 4/5) is a zero of R of multiplicity one (a simple zero).
Case 2: P = (1, 0).
Eqn (4.12) indicates that R has a pole at P . But what is the multiplicity
of this pole? Since y (not x − 1) is a uniformizer at P , we rewrite R(x, y) as
R(x, y) = (5x − 3) · (y − x)(1 + x)/y² = y⁻²(5x − 3)(y − x)(1 + x).
The function (5x − 3)(y − x)(1 + x) is neither zero nor ∞ at P , so (1, 0) is a
pole of multiplicity two (a double pole) of R.
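The double pole can also be observed numerically. On the circle, y²/(1 − x) = 1 + x, so y²·R(x, y) = (5x − 3)(y − x)(1 + x), which tends to (5 − 3)(0 − 1)(1 + 1) = −4 as (x, y) → (1, 0); a finite non-zero limit of y²R is exactly what a double pole with uniformizer y means. A quick check in plain Python:

```python
import math

def R(x, y):
    # R = (5x - 3)(y - x)/(1 - x), valid away from x = 1; assumes (x, y) on C
    return (5*x - 3) * (y - x) / (1 - x)

# approach P = (1, 0) along the circle (cos t, sin t) as t -> 0
for t in (1e-2, 1e-3, 1e-4):
    x, y = math.cos(t), math.sin(t)
    print(R(x, y) * y**2)     # tends to -4, a finite non-zero limit
```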
It is instructive to study uniformizers other than those prescribed above.
The line Y = X − 1 meets (but does not touch) the circle at P = (1, 0). So we
may take y − x + 1 as a uniformizer at (1, 0). Since (y − x + 1)² = y² + x² +
1 − 2yx + 2y − 2x = 2 − 2x + 2y − 2yx = 2(1 − x)(1 + y), Eqn (4.12) gives

R(x, y) = (y − x + 1)⁻² · 2(5x − 3)(y − x)(1 + y).

But 2(5x − 3)(y − x)(1 + y) has neither a zero nor a pole at P = (1, 0), so
(1, 0) is again established as a double pole of R.
It is not necessary to take only linear functions as uniformizers. The circle
(X − 1)2 + (Y − 1)2 = 1 meets the circle X 2 + Y 2 = 1 at P = (1, 0) with
multiplicity one. So (x−1)2 +(y −1)2 −1 may also be taken as a uniformizer at
(1, 0). But (x−1)2 +(y−1)2 −1 = x2 +y 2 −1−2(x+y−1) = 2(1−x−y), and so
[(x−1)2 +(y−1)2 −1]2 = 4(1+x2 +y 2 −2x−2y+2xy) = 4(1+1−2x−2y+2xy) =
8(1 − x)(1 − y), and we have the representation
R(x, y) = ((x − 1)² + (y − 1)² − 1)⁻² · 8(5x − 3)(y − x)(1 − y),

which yet again reveals that the multiplicity of the pole of R at (1, 0) is two.
Let us now pretend to take x2 − y 2 − 1 as a uniformizer at P = (1, 0). We
have x2 − y 2 − 1 = 2x2 − 2 − (x2 + y 2 − 1) = 2x2 − 2 = 2(x − 1)(x + 1), so that
R(x, y) = (x² − y² − 1)⁻¹ · (−2(5x − 3)(y − x)(1 + x)).

This seems to reveal that (1, 0) is a simple pole of R. This conclusion is wrong,
because the hyperbola X 2 − Y 2 = 1 touches the circle X 2 + Y 2 = 1 at (1, 0)
(with intersection multiplicity two), that is, x2 − y 2 − 1 is not allowed to be
taken as a uniformizer at (1, 0).
If a curve C ′ meets C at P = (1, 0) with intersection multiplicity larger
than two, R(x, y) cannot at all be expressed in terms of the equation of C ′ . For
instance, take C ′ to be the parabola Y 2 = 2(1 − X) which meets X 2 + Y 2 = 1
at (1, 0) with multiplicity four (argue why). We have y 2 − 2(1 − x) = (x2 +
y 2 − 1) − (x2 − 2x + 1) = −(1 − x)2 , that is,
[R(x, y)]² = (y² − 2(1 − x))⁻¹ · (−(5x − 3)²(y − x)²),

indicating that R(x, y) itself has a pole at (1, 0) of multiplicity half, an absurd
conclusion indeed.
At any rate, (non-tangent) linear functions turn out to be the handiest as
uniformizers, particularly at all finite points on a curve. ¤

Some important results pertaining to poles and zeros are now stated.

Theorem 4.36 Any non-zero rational function has only finitely many zeros
and poles. ⊳

Theorem 4.37 Suppose that the underlying field K is algebraically closed.


If the only poles of a rational function are at the points at infinity, then the
rational function is a polynomial function. ⊳

Theorem 4.38 For a projective curve defined over an algebraically closed


field, the sum of the orders of a non-zero rational function at its zeros and
poles is zero. ⊳

The algebraic closure of K is crucial for Theorem 4.38. If K is not alge-


braically closed, any non-zero rational function continues to have only finitely
many zeros and poles, but the sum of their orders is not necessarily zero.
For elliptic curves C : Y 2 +u(X)Y = v(X), we can supply explicit formulas
for the uniformizer. Let P = (h, k) be a finite point on C.
Definition 4.39 The opposite of P is defined as P̃ = −P = (h, −k −u(h)). P
and P̃ are the only points on C with X-coordinate equal to h. Conventionally,
the opposite of O is taken as O itself.
P is called an ordinary point if P̃ ≠ P , and a special point if P̃ = P . ⊳
Any line passing through a finite point P but not a tangent to C at P can
be taken as a uniformizer UP at P . For example, we may take
UP = x − h   if P is an ordinary point,
UP = y − k   if P is a special point.
A uniformizer at O is x/y.
Theorem 4.32 does not lead to an explicit algorithm for computing orders
of rational functions. For elliptic curves, however, orders can be computed
without explicitly using uniformizers. Let us first consider polynomial func-
tions. Let G(x, y) = a(x) + yb(x) ∈ K[C], P = (h, k) a finite point on C, and
e the largest exponent for which (x − h)^e divides both a(x) and b(x). Write
G(x, y) = (x − h)^e G1 (x, y). If G1 (h, k) ≠ 0, take l = 0, otherwise take l to be
the largest exponent for which (x − h)^l | N(G1 ) (see Eqn (4.11)). We have
ordP (G) = e + l    if P is an ordinary point,
           2e + l   if P is a special point.        (4.13)
The point O needs special attention:
ordO (G) = − max(2 degx a, 3 + 2 degx b). (4.14)
For a rational function R(x, y) = G(x, y)/H(x, y) ∈ K(C) with G, H ∈ K[C],
ordP (R) = ordP (G) − ordP (H) (4.15)
for any point P on C (including the point O).
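Eqns (4.13)–(4.15) translate directly into code. The sketch below (assuming sympy, and specialized to curves of the form y² = v(x), so that u = 0) computes ordP (G) for a non-zero polynomial function G = a(x) + y·b(x) at a finite point; the function name and interface are illustrative, not from the text:

```python
from sympy import symbols, Poly

x = symbols('x')
v = x**3 - x                       # the curve y^2 = x^3 - x of Example 4.40

def ord_at(a_expr, b_expr, h, k):
    """ordP of a non-zero G = a(x) + y*b(x) at P = (h, k), per Eqn (4.13)."""
    a, b = Poly(a_expr, x), Poly(b_expr, x)
    lin = Poly(x - h, x)
    e = 0                          # (x - h)^e divides both a(x) and b(x)
    while (a % lin).is_zero and (b % lin).is_zero:
        a, b = a // lin, b // lin
        e += 1
    if a.eval(h) + k * b.eval(h) != 0:
        l = 0                      # G1(P) != 0
    else:
        N = a**2 - Poly(v, x) * b**2   # norm of G1 with u = 0 (Eqn (4.11))
        l = 0
        while (N % lin).is_zero:
            N = N // lin
            l += 1
    special = (k == 0)             # here P is special iff its y-coordinate is 0
    return 2*e + l if special else e + l

print(ord_at(0, 1, 0, 0))   # ord of y at the special point (0, 0): 1
print(ord_at(x, 0, 0, 0))   # ord of x at (0, 0): e = 1, l = 0, special: 2
```

The two printed values agree with Example 4.40(2) below.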

Example 4.40 Consider the elliptic curve C : Y 2 = X 3 − X defined over C.


(1) Rational functions involving only x are simpler. The rational function
R1 (x, y) = (x − 1)(x + 1)/(x³(x − 2))
has simple zeros at x = ±1, a simple pole at x = 2, and a pole of multiplicity
three at x = 0. The points on C with these X-coordinates are P1 = (0, 0),
P2 = (1, 0), P3 = (−1, 0), P4 = (2, √6), and P5 = (2, −√6). P1 , P2 , P3 are
special points, so ordP1 (R1 ) = −6, ordP2 (R1 ) = ordP3 (R1 ) = 2. P4 and P5
are ordinary points, so ordP4 (R1 ) = ordP5 (R1 ) = −1. Finally, note that R1 →
1/x² as x → ∞. But x has a weight of two, so R1 has a zero of order four at
O. The sum of these orders is −6 + 2 + 2 − 1 − 1 + 4 = 0.
(2) Now, consider the rational function R2 (x, y) = x/y involving y. At
the point P1 = (0, 0), R2 appears to be undefined. But y 2 = x3 − x, so
R2 = y/(x2 − 1) too, and R2 (P1 ) = 0, that is, R2 has a zero at P1 . For the
numerator y, we have e = 0 and l = 1, so Eqn (4.13) gives ordP1 (y) = 1. On
the other hand, the denominator x2 − 1 has neither a zero nor a pole at P1 .
So ordP1 (R2 ) = 1 by Eqn (4.15).
Notice that ordP1 (x) = 2 (by Eqn (4.13), since e = 1, l = 0, and P1 is a
special point), so the representation R2 = x/y too gives ordP1 (R2 ) = 2−1 = 1.
(3) Take the same curve C : Y² = X³ − X defined over F7 . Since 6 is a
quadratic non-residue modulo 7, the points P4 = (2, √6) and P5 = (2, −√6) do
not lie on the curve over F7 . The only zeros and poles of R1 are, therefore, P1 ,
P2 , P3 , and O. The orders of R1 at these points add up to −6 + 2 + 2 + 4 = 2 ≠ 0.
This example illustrates the necessity of assuming algebraic closure of the field
of definition for the elliptic curve. However, C is defined over the algebraic
closure F̄7 of F7 . The field F̄7 contains the square roots of 6, and the problem
associated with F7 is eliminated. ¤

Example 4.41 Let us compute all the zeros and poles of the rational function
R(x, y) = G(x, y)/H(x, y) = (x + y)/(x² + y)
on the elliptic curve E : Y 2 = X 3 + X defined over the field F5 . We handle
the numerator and the denominator of R separately.
Zeros and poles of G(x, y) = x + y : A zero of G satisfies x + y = 0, that is,
y = −x. Since x, y also satisfy the equation of E, we have y 2 = (−x)2 = x3 +x,
that is, x(x2 − x + 1) = 0. The polynomial x2 − x + 1 is irreducible over F5 . Let
θ ∈ F̄5 be the element satisfying θ2 +2 = 0. This element defines the extension
F52 = F5 (θ) in which we have x(x2 − x + 1) = x(x + (θ + 2))(x + (4θ + 2)).
Therefore, the zeros of G(x, y) correspond to x = 0, −(θ + 2), −(4θ + 2), that
is, x = 0, 4θ + 3, θ + 3. Plugging in these values of x in y = −x gives us the
three zeros of G as Q0 = (0, 0), Q1 = (4θ + 3, θ + 2), and Q2 = (θ + 3, 4θ + 2).
In order to compute the multiplicities of these zeros, we write G(x, y) =
a(x) + yb(x) with a(x) = x and b(x) = 1. We have gcd(a(x), b(x)) = 1, so we
compute N(G) = a(x)2 − y 2 b(x)2 = x2 − (x3 + x) = −x(x2 − x + 1). This
indicates that each of the three zeros of G has e = 0 and l = 1, and so has
multiplicity one (Q0 is a special point, whereas Q1 and Q2 are ordinary).
The degree of x + y is three, so G has a pole of multiplicity three at O.
Zeros and poles of H(x, y) = x2 +y : The zeros of H correspond to y = −x2 ,
that is, y 2 = (−x2 )2 = x3 + x, that is, x(x3 − x2 − 1) = 0. The cubic factor
being irreducible, all the zeros of H exist on the curve E defined over F53 .
Let ψ ∈ F̄5 satisfy ψ 3 + ψ + 1 = 0. Then, the zeros of H correspond to
x(x3 − x2 − 1) = x(x + (2ψ 2 + 2ψ + 1))(x + (4ψ 2 + 4))(x + (4ψ 2 + 3ψ + 4)) = 0,

FIGURE 4.7: Zeros and poles of straight lines: (a) a non-vertical line L through
three points P , Q, R of the curve; (b) a non-vertical tangent T at P , meeting the
curve again at Q; (c) a vertical line V through the two points P and Q.

that is, x = 0, 3ψ 2 + 3ψ + 4, ψ 2 + 1, ψ 2 + 2ψ + 1. Since y = −x2 on H, the


corresponding y-values are respectively 0, 4ψ 2 + 2ψ + 3, ψ 2 + 4ψ + 1, 4ψ + 2.
Therefore, the zeros of H are Q0 = (0, 0), Q3 = (3ψ 2 + 3ψ + 4, 4ψ 2 + 2ψ + 3),
Q4 = (ψ 2 + 1, ψ 2 + 4ψ + 1), and Q5 = (ψ 2 + 2ψ + 1, 4ψ + 2).
The multiplicities of these zeros can be computed from the expression
N(H) = x⁴ − y² = x⁴ − (x³ + x) = x(x³ − x² − 1). It follows that each of the
zeros Q0 , Q3 , Q4 , Q5 has multiplicity one.
The degree of x2 + y is four, so H has a pole of multiplicity four at O.
Zeros and poles of R(x, y) = G(x, y)/H(x, y) : We have ordQ0 (R) = 1 − 1 =
0, ordQ1 (R) = 1 − 0 = 1, ordQ2 (R) = 1 − 0 = 1, ordQ3 (R) = 0 − 1 = −1,
ordQ4 (R) = 0 − 1 = −1, ordQ5 (R) = 0 − 1 = −1, and ordO (R) = −3 − (−4) =
1. To sum up, R has simple zeros at Q1 , Q2 and O, and simple poles at Q3 ,
Q4 and Q5 . The smallest extension of F5 containing (the coordinates of) all
these points is F56 = F5 (θ)(ψ). Although Q0 is individually a zero of both G
and H, it is neither a zero nor a pole of R. ¤
Example 4.42 Zeros and poles of straight lines are quite important in the
rest of this chapter. Some examples are shown in Figure 4.7.
In Part (a), the non-vertical line L passes through the three points P, Q, R
on the elliptic curve. These are the only zeros of L on C. We have ordP (L) =
ordQ (L) = ordR (L) = 1, and ordO (L) = −3.
In Part (b), the non-vertical line T is a tangent to the curve at P . The
other point of intersection of T with the curve is Q. We have ordP (T ) = 2,
ordQ (T ) = 1, and ordO (T ) = −3.
The vertical line V of Part (c) passes through only two points P and Q of
the curve. This indicates ordP (V ) = ordQ (V ) = 1, and ordO (V ) = −2. ¤

4.4.3 Rational Maps and Endomorphisms on Elliptic Curves


From this section until the end of this chapter, we concentrate on elliptic
curves only. Some of the results pertaining to elliptic curves can be generalized
to general curves, but I do not plan to be so general. Before starting the
discussion, let me once again review our notations.

Let E be an elliptic curve defined over a field K. K̄ denotes the algebraic


closure of K. Our interests concentrate primarily on finite fields K = Fq
with char K = p. This means that we will remove the restriction that K is
algebraically closed. For any field extension L of K, we denote by EL the
group of L-rational points on the elliptic curve E (which is defined over L
as well). We assume that the point O belongs to all such groups EL . When
L = K̄, we abbreviate EK̄ as E. Thus, E would denote²¹ both the curve and
the set (group) of K̄-rational points on it. For L = Fqk , we use the shorthand
symbol Eqk to stand for the group EFqk . A rational function R on E is a
member of K̄(E). We say that R is defined over L if R has a representation
of the form R(x, y) = G(x, y)/H(x, y) with G(x, y), H(x, y) ∈ L[E].

Definition 4.43 A rational map on E is a function E → E. A rational map α


is specified by two rational functions α1 , α2 ∈ K̄(E) such that for any P ∈ E,
the point α(P ) = α(h, k) = (α1 (h, k), α2 (h, k)) lies again on E. ⊳

Since α(P ) is a point on E, the functions α1 , α2 satisfy the equation for
E, so the pair (α1 , α2 ) is itself a point on the elliptic curve EK̄(E) . Denote the
point at infinity on this curve by O′ . As a rational map, O′ stands for the
constant function O′ (P ) = O for all P ∈ E. For a non-zero rational map α
on EK̄(E) and for a point P on E, either both α1 , α2 are defined at P , or
both are undefined at P . This
is because α1 and α2 satisfy the equation of the curve. In the first case, we
take α(P ) = (α1 (P ), α2 (P )), whereas in the second case, we take α(P ) = O.

Theorem 4.44 For all rational maps α, β on EK̄(E) and for all points P on
E, we have (α + β)(P ) = α(P ) + β(P ). ⊳

This is a non-trivial assertion about rational maps. The sum α + β is


the sum of rational maps on the curve EK̄(E) . On the other hand, α(P ) +
β(P ) is the sum of points on the curve EK̄ . Theorem 4.44 states that these
two additions are mutually compatible. Another important feature of rational
maps (over algebraically closed fields) is the following.

Theorem 4.45 A rational map is either constant or surjective. ⊳

Example 4.46 (1) The zero map O′ : E → E taking P 7→ O is already


discussed as the group identity of EK̄(E) .
(2) The constant map αh,k : E → E taking any point P to a fixed point
(h, k) on E is a generalization of the zero map. This map corresponds to the
two constant rational functions h and k.
(3) The identity map id : E → E, P 7→ P , is non-constant (so surjective).
(4) Fix a point Q ∈ E. The translation map τQ : E → E taking P 7→ P +Q
is again surjective.
21 If E is an elliptic curve defined over a field F , we use E with dual meaning. It is both a
F
geometric object (the curve, so we can say Q is on EF ) and an algebraic object (the group,
so we can say Q is in EF , or Q ∈ EF ). In algebraic geometry, they are same anyway.

(5) Take m ∈ Z. The multiplication-by-m map [m] : E → E takes a point


P on E to its m-th multiple mP . For m ≠ 0, the map is non-constant and,
therefore, surjective. That is, given any point Q on E, we can find a point P
on E such that Q = mP . ¤

For a moment, suppose that K = Fq and (h, k) ∈ E = EK̄ . Since E is


defined over K, the coefficients in the defining equation (like ai in Eqn (4.4))
are members of K. By Fermat’s little theorem for Fq , we conclude that the
point (hq , k q ) lies on the curve E. Of course, if (h, k) is already K-rational,
then (hq , k q ) = (h, k) is K-rational too. But if (h, k) is not K-rational but
K̄-rational, then (hq , k q ) is again K̄-rational.

Definition 4.47 The q-th power Frobenius map is defined as ϕq : E → E


taking (h, k) to (hq , k q ) (where E = EK̄ ). ⊳

Definition 4.48 A rational map α : E → E is called an endomorphism or an


isogeny of E if α is a group homomorphism, that is, α(P + Q) = α(P ) + α(Q)
for all P, Q ∈ E. The set of all endomorphisms of E is denoted by End(E). ⊳

For α, β ∈ End(E) and P ∈ E, define (α + β)(P ) = α(P ) + β(P ). By


Theorem 4.44, this addition is the same as the addition in the elliptic curve
EK̄(E) . Also, define the product of α and β as the composition (α ◦ β)(P ) =
α(β(P )). The set End(E) is a ring under these operations. Its additive identity
is the zero map O′ , and its multiplicative identity is the identity map id.
The translation map τQ is an endomorphism of E only for Q = O. The
multiplication-by-m maps [m] are endomorphisms of E with [m] ≠ [n] for
m ≠ n. The set of all the maps [m] is a subring of End(E), isomorphic to Z.

Definition 4.49 If End(E) contains an endomorphism other than the maps


[m], we call E an elliptic curve with complex multiplication. ⊳

If E is defined over K = Fq , then the q-th power Frobenius map ϕq taking


(h, k) ∈ E to (hq , k q ) ∈ E is an endomorphism of E. We have ϕq ≠ [m] for
any m ∈ Z. It follows that any elliptic curve defined over any finite field is a
curve with complex multiplication.
The notion of rational maps and isogenies can be extended to two different
elliptic curves E, E ′ defined over the same field K.

Definition 4.50 A rational map α : E → E ′ is specified by two rational


functions α1 , α2 ∈ K̄(E) such that for all P ∈ E, the image (α1 (P ), α2 (P )) is
a point on E ′ . A rational map E → E ′ , which is also a group homomorphism, is
called an isogeny of E to E ′ . An isomorphism E → E ′ is a bijective isogeny. ⊳

Exercise 4.50 provides examples of isogenies between elliptic curves defined


over F5 . For elliptic curves, the notion of isomorphism can be characterized
in terms of admissible changes of variables described in Theorem 4.51.

Theorem 4.51 Two elliptic curves E, E ′ defined over K are isomorphic over
K̄ if and only if there exist u, r, s, t ∈ K̄ with u ≠ 0 such that substituting X
by u2 X + r and Y by u3 Y + su2 X + t transforms the equation of E to the
equation of E ′ . ⊳

The substitutions made to derive Eqns (4.5), (4.6), (4.7) and (4.8) from the
original Weierstrass Eqn (4.4) are examples of admissible changes of variables.
For the rest of this section, we concentrate on the multiplication-by-m
endomorphisms. We identify [m] with a pair (gm , hm ) of rational functions.
These rational functions are inductively defined by the chord-and-tangent rule.
Consider an elliptic curve defined by Eqn (4.4).
g1 = x, h1 = y.

g2 = −2x + λ² + a1 λ − a2 ,   h2 = −λ(g2 − x) − a1 g2 − a3 − y,        (4.16)

where λ = (3x² + 2a2 x + a4 − a1 y)/(2y + a1 x + a3 ). Finally, for m ≥ 3, we
recursively define

gm = −gm−1 − x + λ² + a1 λ − a2 ,   hm = −λ(gm − x) − a1 gm − a3 − y,   (4.17)

where λ = (hm−1 − y)/(gm−1 − x). The kernel of the map [m] is denoted by
E[m], that is,
E[m] = {P ∈ E = EK̄ | mP = O}.
Elements of E[m] are called m-torsion points of E. For every m ∈ Z, E[m] is
a subgroup of E.
Theorem 4.52 Let p = char K. If p = 0 or gcd(p, m) = 1, then

E[m] ≅ Zm × Zm ,

and so |E[m]| = m². If gcd(m, n) = 1, then E[mn] ≅ E[m] × E[n]. ⊳
The rational functions gm , hm have poles precisely at the points in E[m].
But they have some zeros also. We plan to investigate polynomials having zeros
precisely at the points of E[m]. Assume that either p = 0 or gcd(p, m) = 1.
Then, E[m] contains exactly m2 points. A rational function ψm whose only
zeros are the m2 points of E[m] is a polynomial by Theorem 4.37. All these
zeros are taken as simple. So ψm must have a pole of multiplicity m2 at O.
The polynomial ψm is unique up to multiplication by non-zero elements of K̄.
If we arrange the leading coefficient of ψm to be m, then ψm becomes unique,
and is called the m-th division polynomial.
The division polynomials are defined recursively as follows.
ψ0 = 0
ψ1 = 1
ψ2 = 2y + a1 x + a3
ψ3 = 3x⁴ + d2 x³ + 3d4 x² + 3d6 x + d8

ψ4 = [2x⁶ + d2 x⁵ + 5d4 x⁴ + 10d6 x³ + 10d8 x² + (d2 d8 − d4 d6 )x + d4 d8 − d6²] ψ2

ψ2m = (ψm+2 ψm−1² − ψm−2 ψm+1²) ψm /ψ2        for m > 2

ψ2m+1 = ψm+2 ψm³ − ψm−1 ψm+1³                 for m ≥ 2.
Here, the coefficients di are as in Definition 4.9. The rational functions gm , hm
can be expressed in terms of the division polynomials as follows.

gm − gn = −ψm+n ψm−n /(ψm² ψn²).

Putting n = 1 gives

gm = x − ψm+1 ψm−1 /ψm².                                           (4.18)

Moreover,

hm = (ψm+2 ψm−1² − ψm−2 ψm+1²)/(2ψ2 ψm³) − (1/2)(a1 gm + a3 )      (4.19)
   = y + ψm+2 ψm−1²/(ψ2 ψm³) + (3x² + 2a2 x + a4 − a1 y) ψm−1 ψm+1 /(ψ2 ψm²).  (4.20)
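For the short Weierstrass form Y² = X³ + AX + B (a1 = a2 = a3 = 0), the division polynomials reduce to the familiar expressions, and Eqn (4.18) can be checked against the chord-and-tangent rule on a concrete point. A sketch over Q with P = (2, 3) on Y² = X³ + 1; the specialized formulas below are the standard short-Weierstrass ones, stated as an assumption since Definition 4.9 is not reproduced in this section:

```python
from fractions import Fraction as F

A, B = 0, 1                  # E : Y^2 = X^3 + 1
x, y = F(2), F(3)            # P = (2, 3) lies on E: 9 = 8 + 1

# Division polynomials specialized to y^2 = x^3 + A x + B:
psi2 = 2*y
psi3 = 3*x**4 + 6*A*x**2 + 12*B*x - A**2
psi4 = 4*y*(x**6 + 5*A*x**4 + 20*B*x**3
            - 5*A**2*x**2 - 4*A*B*x - 8*B**2 - A**3)

# Eqn (4.18) with m = 3: the x-coordinate of 3P
x3 = x - psi2*psi4 / psi3**2
print(x3)                    # -1

# Cross-check by the chord-and-tangent rule: compute 2P, then 2P + P
lam = (3*x**2 + A) / (2*y)   # tangent slope at P
x2 = lam**2 - 2*x            # 2P = (0, 1)
y2 = lam*(x - x2) - y
lam2 = (y2 - y) / (x2 - x)   # chord slope through P and 2P
assert x3 == lam2**2 - x - x2 == F(-1)
```

Both computations give 3P = (−1, 0), which indeed lies on Y² = X³ + 1.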

4.4.4 Divisors
Let ai , i ∈ I, be symbols indexed by I. A finite formal sum of ai , i ∈ I, is
an expression of the form Σi∈I mi ai with mi ∈ Z such that mi = 0 except for
only finitely many i ∈ I. The sum Σi∈I mi ai is formal in the sense that the
symbols ai are not meant to be evaluated. They act as placeholders. Define

Σi∈I mi ai + Σi∈I ni ai = Σi∈I (mi + ni )ai ,   and   −Σi∈I mi ai = Σi∈I (−mi )ai .

Under these definitions, the set of these finite formal sums becomes an Abelian
group called the free Abelian group generated by the symbols ai , i ∈ I.
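A finite formal sum is conveniently modelled as a finite mapping from symbols to non-zero integer multiplicities, with the group operations acting coefficient-wise. A minimal sketch (the representation is illustrative, not from the text):

```python
from collections import Counter

def add(D1, D2):
    """Coefficient-wise sum of two formal sums, dropping zero terms."""
    D = Counter(D1)
    D.update(D2)                      # Counter.update adds multiplicities
    return {a: m for a, m in D.items() if m != 0}

def neg(D):
    return {a: -m for a, m in D.items()}

D1 = {'a1': 2, 'a2': -1}              # the formal sum 2a1 - a2
D2 = {'a1': -2, 'a3': 5}              # the formal sum -2a1 + 5a3
print(add(D1, D2))                    # {'a2': -1, 'a3': 5}
print(add(D1, neg(D1)))               # {} -- the identity (empty sum)
```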
Now, let E be an elliptic curve defined over K. For a moment, let us treat
E as a curve defined over the algebraic closure K̄ of K.

Definition 4.53 A divisor on an elliptic curve E defined over a field K is a


formal sum²² of the rational points on E = EK̄ . ⊳

Let us use the notation D = ΣP ∈E mP [P ] to denote a divisor D. Here,
the symbol [P ] is used to indicate that the sum is formal, that is, the points
P must not be evaluated when enclosed within square brackets.
22 In this sense, a divisor is also called a Weil divisor.
Definition 4.54 The support of a divisor D = ΣP mP [P ], denoted Supp(D),
is the set of points P for which mP ≠ 0. By definition, the support of any
divisor is a finite set. The degree of D is the sum ΣP mP . All divisors on E
form a group denoted by DivK̄ (E) or Div(E). The divisors of degree zero form
a subgroup denoted by Div0K̄ (E) or Div0 (E). ⊳

Definition 4.55 The divisor of a non-zero rational function R ∈ K̄(E) is


Div(R) = ΣP ∈E ordP (R) [P ].

Since any non-zero rational function can have only finitely many zeros and
poles, Div(R) is defined (that is, a finite formal sum) for any R 6= 0.
A principal divisor is the divisor of some rational function. Theorem 4.38
implies that every principal divisor belongs to Div0K̄ (E). The set of all principal
divisors is a subgroup of Div0K̄ (E), denoted by PrinK̄ (E) or Prin(E). ⊳

Principal divisors satisfy the formulas:


Div(R) + Div(S) = Div(RS), and Div(R) − Div(S) = Div(R/S). (4.21)

Definition 4.56 Two divisors D, D′ in DivK̄ (E) are called equivalent if they
differ by a principal divisor. That is, D ∼ D′ if and only if D = D′ + Div(R)
for some R(x, y) ∈ K̄(E). ⊳

Evidently, equivalence of divisors is an equivalence relation on Div(E), and


also on Div0 (E).

Definition 4.57 The quotient group DivK̄ (E)/ PrinK̄ (E) is called the divisor
class group or the Picard group23 of E, denoted PicK̄ (E) or Pic(E). The
quotient group Div0K̄ (E)/ PrinK̄ (E) is called the Jacobian24 of E, denoted
Pic0K̄ (E) or Pic0 (E) or JK̄ (E) or J(E). ⊳

We have defined divisors, principal divisors, and Jacobians with respect to


an algebraically closed field (like K̄). If we focus our attention on a field which
is not algebraically closed, a principal divisor need not always have degree zero
(see Example 4.40(3)). It still makes sense to talk about PicK (E) and JK (E)
for a field K which is not algebraically closed, but then these groups have to
be defined in a different manner (albeit with the help of PicK̄ (E) and JK̄ (E)).
The chord-and-tangent rule allows us to bypass this nitty-gritty.

Example 4.58 (1) Consider the lines given in Figure 4.7. We have
Div(L) = [P ] + [Q] + [R] − 3[O] = ([P ]−[O]) + ([Q]−[O]) + ([R]−[O]),
Div(T ) = 2[P ] + [Q] − 3[O] = 2([P ]−[O]) + ([Q]−[O]), and
Div(V ) = [P ] + [Q] − 2[O] = ([P ]−[O]) + ([Q]−[O]).
23 This is named after the French mathematician Charles Émile Picard (1856–1941).
24 This is named after Carl Gustav Jacob Jacobi (1804–1851).
Arithmetic of Elliptic Curves 219

(2) The rational function R1 of Example 4.40(1) on the complex curve
Y² = X³ − X has the divisor

Div(R1 ) = −6[P1 ] + 2[P2 ] + 2[P3 ] − [P4 ] − [P5 ] + 4[O]
         = −6([P1 ]−[O]) + 2([P2 ]−[O]) + 2([P3 ]−[O]) − ([P4 ]−[O]) − ([P5 ]−[O]).

(3) The rational function R of Example 4.41 on the curve Y² = X³ + X
defined over F5 has the divisor

Div(R) = [Q1 ] + [Q2 ] + [O] − [Q3 ] − [Q4 ] − [Q5 ]
       = ([Q1 ]−[O]) + ([Q2 ]−[O]) − ([Q3 ]−[O]) − ([Q4 ]−[O]) − ([Q5 ]−[O]).
This divisor is not defined over F5 , but over F56 . ¤

For every D ∈ Div0K̄ (E), there exist a unique rational point P and a
rational function R such that D = [P ] − [O] + Div(R). But then D ∼ [P ] − [O]
in Div0K̄ (E). We identify P with the equivalence class of [P ] − [O] in JK̄ (E).
This identification establishes a bijection between the set EK̄ of rational points
on E and the Jacobian JK̄ (E) of E. As Example 4.58(1) suggests, this bijection
also respects the chord-and-tangent rule for addition in E. The motivation for
addition of points in an elliptic-curve group, as described in Figures 4.3 and
4.4, is nothing but a manifestation of this bijection. Moreover, it follows that
the group EK̄ is isomorphic to the Jacobian JK̄ (E).
If K is not algebraically closed, a particular subgroup of JK̄ (E) can be de-
fined to be the Jacobian JK (E) of E over K. Thanks to the chord-and-tangent
rule, we do not need to worry about the exact definition of JK (E). More pre-
cisely, if P, Q are K-rational points of E, the explicit formulas for P + Q, 2P ,
and −P guarantee that these points are defined over K as well. Furthermore,
the chord-and-tangent rule provides explicit computational handles on the
group JK (E). In other words, EK is the equivalent (and computationally ori-
ented) definition of JK (E) (just as E = EK̄ was for JK̄ (E)). This equivalence
proves the following important result.
Theorem 4.59 A divisor D = ∑P mP [P ] ∈ DivK (E) is principal if and
only if
(1) ∑P mP = 0 (integer sum), and
(2) ∑P mP P = O (sum under the chord-and-tangent rule). ⊳

Example 4.60 (1) By Example 4.58(1), we have P + Q + R = O in Part (a),


2P + Q = O in Part (b), and P + Q = O in Part (c) of Figure 4.7.
(2) Example 4.58(2) indicates that −6P1 + 2P2 + 2P3 − P4 − P5 = O on
the complex elliptic curve Y 2 = X 3 − X. This is obvious from the fact that
2P1 = 2P2 = 2P3 = P4 + P5 = O, and from the expression of the rational
function R1 (x, y) in the factored form as presented in Example 4.40(1).
(3) By Example 4.58(3), Q1 + Q2 − Q3 − Q4 − Q5 = O on Y 2 = X 3 + X
defined over F56 . No non-empty proper sub-sum of this sum is equal to O. ¤

Divisors are instrumental not only for defining elliptic-curve groups but
also for proving many results pertaining to elliptic curves. For instance, the
concept of pairing depends heavily on divisors. I now highlight some important
results associated with divisors that are needed in the next section.
Let P, Q be points on EK . By LP,Q we denote the unique (straight) line
passing through P and Q. If P = Q, then LP,Q is taken to be the tangent to
E at the point P . Now, consider the points P, Q, ±R as shown in Figure 4.8.
Here, P + Q = −R, that is, P + Q + R = O.

FIGURE 4.8: Divisors of a line and a vertical line


By Example 4.58, we have


Div(LP,Q ) = [P ] + [Q] + [R] − 3[O], and Div(LR, −R ) = [R] + [−R] − 2[O].
Subtracting the second equation from the first gives
Div(LP, Q /LR, −R ) = [P ] + [Q] − [−R] − [O] = [P ] + [Q] − [P + Q] − [O].
This implies the following two equivalences.
[P ] − [O] ∼ [P + Q] − [Q], and
([P ] − [O]) + ([Q] − [O]) ∼ [P + Q] − [O].
In both cases, the pertinent rational function is LP, Q /LP +Q, −(P +Q) , which
can be easily computed. We can force this rational function to have leading
coefficient one.
Example 4.61 Consider the curve E : Y² = X³ + X + 5 defined over F37 .
Take the points P = (1, 9) and Q = (10, 4) on E. The equation of the line
LP,Q is y = ((4 − 9)/(10 − 1)) x + c = 20x + c, where c ≡ 9 − 20 ≡ 26 (mod 37),
that is, LP,Q : y + 17x + 11 = 0. This line meets the curve at the third
point R = (19, 36), and its opposite is the point −R = (19, −36) = (19, 1).
The vertical line passing through R and −R is LR,−R : x − 19 = 0, that is,
LR,−R : x + 18 = 0. Therefore, we have
    [P ] − [O] = [P + Q] − [Q] + Div((y + 17x + 11)/(x + 18)), and

    ([P ] − [O]) + ([Q] − [O]) = [P + Q] − [O] + Div((y + 17x + 11)/(x + 18)).

The leading term of y + 17x + 11 is y (recall that y has degree three, and x has
degree two), whereas the leading term of x + 18 is x. So both the numerator
and the denominator of the rational function (y + 17x + 11)/(x + 18) are monic. ¤

Let us now try to evaluate a rational function at a divisor.


Definition 4.62 Let D = ∑P nP [P ] be a divisor on E, and let f ∈ K̄(E)
be a non-zero rational function, such that the supports of D and Div(f ) are
disjoint. Define the value of f at D as

    f (D) = ∏P ∈E f (P )^nP = ∏P ∈Supp(D) f (P )^nP . ⊳

Two rational functions f and g have the same divisor if and only if f = cg
for a non-zero constant c ∈ K̄ . In that case, if D has degree zero, then
f (D) = g(D) ∏P c^nP = g(D) c^(∑P nP ) = g(D) c^0 = g(D), that is, the value of
f at a divisor D of degree zero depends only on Div(f ) (rather than on f ).
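This invariance can be checked mechanically. In the sketch below (my own illustration with arbitrary made-up values in F37, not data from the text), the value of f at a degree-zero divisor is unchanged when every f (P ) is scaled by the same constant c:

```python
p = 37

def eval_at_divisor(fvals, D):
    """f(D) = product of f(P)^{n_P} over the support of D (values in F_p*)."""
    r = 1
    for P, n in D.items():
        r = r * pow(fvals[P], n, p) % p   # pow handles negative n for units mod p
    return r

D = {"P1": 2, "P2": -1, "P3": -1}         # a divisor of degree 2 - 1 - 1 = 0
fvals = {"P1": 5, "P2": 8, "P3": 12}      # hypothetical values f(P) in F_37*
c = 29
scaled = {P: c * v % p for P, v in fvals.items()}   # the function cf instead of f

# Degree zero makes the factor c^{sum n_P} = c^0 = 1 disappear:
assert eval_at_divisor(fvals, D) == eval_at_divisor(scaled, D)
```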

Theorem 4.63 [Weil’s reciprocity theorem]25 If f and g are two non-zero


rational functions on E such that Div(f ) and Div(g) have disjoint supports,
then f (Div(g)) = g(Div(f )). ⊳

Example 4.64 For a demonstration of Theorem 4.63, we take the curve of
Example 4.61, f (x, y) = y + 17x + 11, and g(x, y) = (x + 16)/(x + 4). We
know that

Div(f ) = [P1 ] + [P2 ] + [P3 ] − 3[O],

where P1 = (1, 9), P2 = (10, 4), and P3 = (19, 36). On the other hand, g has
a double zero at P4 = (−16, 0) = (21, 0) and simple poles at P5 = (−4, 14) =
(33, 14) and P6 = (−4, −14) = (33, 23). Therefore,

Div(g) = 2[P4 ] − [P5 ] − [P6 ].

Therefore, Div(f ) and Div(g) have disjoint supports. But then,

f (Div(g)) ≡ f (P4 )² f (P5 )⁻¹ f (P6 )⁻¹ ≡ 35² × 31⁻¹ × 3⁻¹ ≡ 8 (mod 37), and
g(Div(f )) ≡ g(P1 )g(P2 )g(P3 )g(O)⁻³ ≡ 33 × 23 × 16 × 1⁻³ ≡ 8 (mod 37).

We have g(O) = 1, since both the numerator and the denominator of g are
monic, and have the same degree (two). ¤
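The equality f (Div(g)) = g(Div(f )) of Example 4.64 can be verified in a few lines of Python. This sketch is mine; the point data and the common value 8 are taken from the example.

```python
p = 37

def f(pt):                      # f(x, y) = y + 17x + 11
    x, y = pt
    return (y + 17 * x + 11) % p

def g(pt):                      # g(x, y) = (x + 16)/(x + 4); g(O) = 1 since both
    if pt is None:              # numerator and denominator are monic of degree two
        return 1
    x, _ = pt
    return (x + 16) * pow(x + 4, -1, p) % p

Div_f = {(1, 9): 1, (10, 4): 1, (19, 36): 1, None: -3}    # [P1]+[P2]+[P3]-3[O]
Div_g = {(21, 0): 2, (33, 14): -1, (33, 23): -1}          # 2[P4]-[P5]-[P6]

def value_at(func, D):
    r = 1
    for P, n in D.items():
        r = r * pow(func(P), n, p) % p
    return r

assert value_at(f, Div_g) == 8       # f(Div(g)) = 35^2 * 31^-1 * 3^-1 = 8 (mod 37)
assert value_at(g, Div_f) == 8       # g(Div(f)) = 33 * 23 * 16 * 1^-3 = 8 (mod 37)
```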

25 This theorem was proved by André Weil in 1942.



4.5 Pairing on Elliptic Curves


We are now going to define maps that accept pairs of elliptic-curve points as
inputs, and output elements of a finite field. These pairing maps are important
from both theoretical and practical angles.
Let K = Fq with p = char K. We may have q = p. Take a positive integer
m coprime to p. The set of all m-th roots of unity in K̄ is denoted by µm .
Since all elements of µm satisfy the polynomial equation X^m − 1 = 0 and
gcd(m, q) = 1, there are exactly m elements in µm . There are finite extensions
of K containing µm . The smallest extension L = Fqk of K = Fq , which
contains the set µm , has the extension degree k = ordm (q) (the multiplicative
order of q modulo m). We call this integer k the embedding degree (with respect
to q and m). In general, the value of k is rather large, namely |k| ≈ |q| in terms
of bit sizes. It is those particular cases, in which k is small, that are important
from a computational perspective.
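The embedding degree is computable by brute force as a multiplicative order. A small sketch of mine (the values q = 43 and m = 11 anticipate Example 4.68 later in this section):

```python
def embedding_degree(q, m):
    """Multiplicative order of q modulo m (assumes gcd(m, q) = 1)."""
    k, t = 1, q % m
    while t != 1:
        t = t * q % m
        k += 1
    return k

# For q = 43 and m = 11: 43 is congruent to -1 modulo 11, so k = 2.
assert embedding_degree(43, 11) == 2
```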
We denote the set of all points in E = EK̄ of orders dividing m by E[m] =
EK̄ [m]. For a field F with K ⊆ F ⊆ K̄, let EF [m] denote those points in E[m],
all of whose coordinates are in F . Since |E[m]| = m², the 2m² coordinates
of the elements of E[m] lie in finite extensions of K. Let L′ be the smallest
extension of K containing the coordinates of all the points of E[m]. In general,
L′ is a field much larger than L. However, think of the following situation which
turns out to be computationally the most relevant one: m is a prime divisor
of the group size |Eq | with m ∤ q and m ∤ q − 1. Then, L′ = L, where L is the
smallest extension of Fq containing µm .

4.5.1 Weil Pairing


Weil pairing is a function

em : E[m] × E[m] → µm

defined as follows. Take P1 , P2 ∈ E[m]. Let D1 be a divisor equivalent to


[P1 ] − [O]. Since mP1 = O, by Theorem 4.59, there exists a rational function
f1 such that Div(f1 ) = mD1 ∼ m[P1 ] − m[O]. Similarly, let D2 be a divisor
equivalent to [P2 ]−[O]. There exists a rational function f2 such that Div(f2 ) =
mD2 ∼ m[P2 ] − m[O]. D1 and D2 are chosen to have disjoint supports. Define

    em (P1 , P2 ) = f1 (D2 )/f2 (D1 ).

We first argue that this definition makes sense. First, note that f1 and f2 are
defined only up to multiplication by non-zero elements of K̄. But we have
already established that the values f1 (D2 ) and f2 (D1 ) are independent of the
choices of these constants, since D1 and D2 are of degree zero.

Second, we show that the value of em (P1 , P2 ) is independent of the choices


of D1 and D2 . To that effect, we take a divisor D1′ = D1 +Div(g) equivalent to
D1 (where g ∈ K̄(E)) and with support disjoint from that of D2 . Call the cor-
responding rational function f1′ . We need to look at the ratio f1′ (D2 )/f2 (D1′ ).
Since mD1′ = mD1 + m Div(g) = Div(f1 ) + Div(g^m ) = Div(f1 g^m ), we can
take f1′ = f1 g^m , and so
    f1′ (D2 )/f2 (D1′ ) = (f1 g^m )(D2 )/f2 (D1 + Div(g))
                       = f1 (D2 ) g^m (D2 ) / (f2 (D1 ) f2 (Div(g)))
                       = f1 (D2 ) g(mD2 ) / (f2 (D1 ) f2 (Div(g)))
                       = f1 (D2 ) g(Div(f2 )) / (f2 (D1 ) f2 (Div(g)))
                       = f1 (D2 ) g(Div(f2 )) / (f2 (D1 ) g(Div(f2 ))) = f1 (D2 )/f2 (D1 ),
where we have used Weil’s reciprocity theorem to conclude that f2 (Div(g)) =
g(Div(f2 )). Analogously, changing D2 to an equivalent divisor D2′ does not
alter the value of em (P1 , P2 ).
Finally, note that em (P1 , P2 ) is an m-th root of unity, since

    em (P1 , P2 )^m = f1 (mD2 )/f2 (mD1 ) = f1 (Div(f2 ))/f2 (Div(f1 )) = 1  (by Weil reciprocity).
The divisors [P1 ] − [O] and [P2 ] − [O] do not have disjoint supports. It is
customary to choose D2 = [P2 ] − [O] and D1 = [P1 + T ] − [T ] for any point
T ∈ E. The point T need not be in E[m]. Indeed, one may choose T randomly
from E. However, in order to ensure that D1 and D2 have disjoint supports,
T must be different from −P1 , P2 , P2 − P1 , and O.
Proposition 4.65 Let P, Q, R be arbitrary points in E[m]. Then, we have:
(1) Bilinearity:
em (P + Q, R) = em (P, R)em (Q, R),
em (P, Q + R) = em (P, Q)em (P, R).
(2) Alternating: em (P, P ) = 1.
(3) Skew symmetry: em (Q, P ) = em (P, Q)−1 .
(4) Non-degeneracy: If P ≠ O, then em (P, Q) ≠ 1 for some Q ∈ E[m].
(5) Compatibility: If S ∈ E[mn] and T ∈ E[n], then emn (S, T ) = en (mS, T ).
(6) Linear dependence: If m is a prime and P ≠ O, then em (P, Q) = 1 if
and only if Q lies in the subgroup of E[m] generated by P (that is, Q = aP
for some integer a). ⊳

4.5.2 Miller’s Algorithm


Miller26 proposes an algorithm similar to the repeated square-and-multiply
algorithm for modular exponentiation or the repeated double-and-add algo-
rithm for computing multiples of points on elliptic curves. Miller’s algorithm
makes use of the following rational functions.
26 Victor Saul Miller, The Weil pairing, and its efficient calculation, Journal of Cryptology, 17, 235–261, 2004.



Definition 4.66 Let P ∈ E, and n ∈ Z. Define the rational function fn,P as


having the divisor
Div(fn, P ) = n[P ] − [nP ] − (n − 1)[O].
The function fn, P is unique up to multiplication by elements of K̄ ∗ . We take
unique monic polynomials for the numerator and the denominator of fn,P . ⊳

The importance of the rational functions fn,P in connection with the Weil
pairing lies in the fact that if P ∈ E[m], then Div(fm, P ) = m[P ] − [mP ] −
(m − 1)[O] = m[P ] − m[O]. Therefore, it suffices to compute fm,P1 and fm,P2
in order to compute em (P1 , P2 ). We can define fn,P inductively as follows.
    f0, P = f1, P = 1,                                                (4.22)

    fn+1, P = fn, P × (LP, nP / L(n+1)P, −(n+1)P )  for n ≥ 1.        (4.23)
Here, LS,T is the straight line through S and T (or the tangent to E at S if
S = T ), and LS,−S is the vertical line through S (and −S). Typically, m in
Weil pairing is chosen to be of nearly the same bit size as q. Therefore, it is
rather impractical to compute fm,P using Eqn (4.23). A divide-and-conquer
approach follows from the following property of fn,P .

Proposition 4.67 For n, n′ ∈ Z, we have

    fn+n′, P = fn, P × fn′, P × (LnP, n′ P / L(n+n′ )P, −(n+n′ )P ).  (4.24)

In particular, for n = n′ , we have

    f2n, P = (fn, P )² × (LnP, nP / L2nP, −2nP ).                     (4.25)
Here, LnP, nP is the line tangent to E at the point nP , and L2nP, −2nP is the
vertical line passing through 2nP . ⊳

Algorithm 4.1: Miller’s algorithm for computing fn,P


Let n = (1ns−1 . . . n1 n0 )2 be the binary representation of n.
Initialize f = 1 and U = P .
For i = s − 1, s − 2, . . . , 1, 0 {
    /* Doubling */
    Update f = f ² × (LU,U / L2U,−2U ) and U = 2U .
    /* Conditional adding */
    If (ni = 1), update f = f × (LU,P / LU +P,−(U +P ) ) and U = U + P .
}
Return f .

Eqn (4.25) in conjunction with Eqn (4.23) gives Algorithm 4.1. The function
fn, P is usually kept in the factored form. It is often not necessary to compute
fn, P explicitly. The value of fn, P at some point Q is only needed. In that
case, the functions LU, U /L2U, −2U and LU, P /LU +P, −(U +P ) are evaluated at Q
before multiplication with f .
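This evaluated-at-Q variant of Algorithm 4.1 is concrete enough to implement directly. The sketch below is my own code, not the book's: it hard-codes the supersingular curve Y² = X³ + 3X over F43 used in the examples of Section 4.5, represents elements of F43(θ), θ² = −1, as pairs (a, b) for a + bθ, and keeps the numerator and denominator of f separate. With P = (1, 2) and Q = (15 + 22θ, 5 + 14θ) it reproduces f11,P (Q) = 15 + 12θ and the reduced Tate pairing value 2 + 13θ of Example 4.70 later in this section.

```python
p = 43                      # base field F_43; curve E: Y^2 = X^3 + 3X (a = 3)

# Elements of F_43(theta), theta^2 = -1, stored as pairs (a, b) for a + b*theta.
def fadd(u, v): return ((u[0] + v[0]) % p, (u[1] + v[1]) % p)
def fsub(u, v): return ((u[0] - v[0]) % p, (u[1] - v[1]) % p)
def fmul(u, v): return ((u[0] * v[0] - u[1] * v[1]) % p,
                        (u[0] * v[1] + u[1] * v[0]) % p)
def finv(u):                # inverse via the conjugate and the norm a^2 + b^2
    n = pow((u[0] * u[0] + u[1] * u[1]) % p, -1, p)
    return (u[0] * n % p, -u[1] * n % p)
def fint(n): return (n % p, 0)

def ec_add(P, Q):           # chord-and-tangent addition; None is the point O
    if P is None: return Q
    if Q is None: return P
    if P[0] == Q[0] and fadd(P[1], Q[1]) == (0, 0): return None
    if P == Q:
        lam = fmul(fadd(fmul(fint(3), fmul(P[0], P[0])), fint(3)),
                   finv(fmul(fint(2), P[1])))
    else:
        lam = fmul(fsub(Q[1], P[1]), finv(fsub(Q[0], P[0])))
    x3 = fsub(fsub(fmul(lam, lam), P[0]), Q[0])
    return (x3, fsub(fmul(lam, fsub(P[0], x3)), P[1]))

def line(S, T, Q):          # value at Q of L_{S,T} (tangent if S = T)
    if S != T and S[0] == T[0]:
        return fsub(Q[0], S[0])            # vertical line through S and -S
    if S == T:
        lam = fmul(fadd(fmul(fint(3), fmul(S[0], S[0])), fint(3)),
                   finv(fmul(fint(2), S[1])))
    else:
        lam = fmul(fsub(T[1], S[1]), finv(fsub(T[0], S[0])))
    return fsub(fsub(Q[1], S[1]), fmul(lam, fsub(Q[0], S[0])))

def vert(W, Q):             # value at Q of the vertical line through W; 1 if W = O
    return (1, 0) if W is None else fsub(Q[0], W[0])

def miller(P, Q, n):        # f_{n,P}(Q) by Algorithm 4.1, separate num/den
    num, den, U = (1, 0), (1, 0), P
    for bit in bin(n)[3:]:                 # bits of n below the leading 1
        num = fmul(fmul(num, num), line(U, U, Q))
        U = ec_add(U, U)
        den = fmul(fmul(den, den), vert(U, Q))
        if bit == '1':
            num = fmul(num, line(U, P, Q))
            U = ec_add(U, P)
            den = fmul(den, vert(U, Q))
    return fmul(num, finv(den))

def fpow(u, e):
    r = (1, 0)
    while e:
        if e & 1: r = fmul(r, u)
        u, e = fmul(u, u), e >> 1
    return r

P = ((1, 0), (2, 0))                       # P = (1, 2)
Q = ((15, 22), (5, 14))                    # Q = (15 + 22*theta, 5 + 14*theta)
f = miller(P, Q, 11)
assert f == (15, 12)                       # f_{11,P}(Q) = 15 + 12*theta
assert fpow(f, (p * p - 1) // 11) == (2, 13)   # reduced Tate pairing 2 + 13*theta
```

The sketch omits the corner case of a tangent at a point with y = 0, which does not arise for a base point of odd prime order.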
We now make the relationship between em (P1 , P2 ) and fn,P more explicit.
We choose a point T ∈ E not equal to ±P1 , −P2 , P2 − P1 , O. We have

    em (P1 , P2 ) = ( fm, P2 (T ) fm, P1 (P2 − T ) ) / ( fm, P1 (−T ) fm, P2 (P1 + T ) ).  (4.26)

Moreover, if P1 ≠ P2 , we also have

    em (P1 , P2 ) = (−1)^m fm, P1 (P2 ) / fm, P2 (P1 ).                                    (4.27)
Eqn (4.27) is typically used when P1 and P2 are linearly independent.
It is unnecessary to make four (or two) separate calls of Algorithm 4.1 for
computing em (P1 , P2 ). All these invocations have n = m, so a single double-
and-add loop suffices. For efficiency, one may avoid the division operations in
Miller’s loop by separately maintaining the numerator and the denominator.
After the loop terminates, a single division is made. Algorithm 4.2 incorporates
these ideas, and is based upon Eqn (4.27). The polynomial functions L−,− are
first evaluated at appropriate points and then multiplied.

Algorithm 4.2: Miller’s algorithm for computing em (P1 , P2 )


If (P1 = P2 ), return 1.
Let m = (1ms−1 . . . m1 m0 )2 be the binary representation of m.
Initialize fnum = fden = 1, U1 = P1 , and U2 = P2 .
For i = s − 1, s − 2, . . . , 1, 0 {
    /* Doubling */
    Update numerator fnum = fnum² × LU1 ,U1 (P2 ) × L2U2 ,−2U2 (P1 ).
    Update denominator fden = fden² × L2U1 ,−2U1 (P2 ) × LU2 ,U2 (P1 ).
    Update U1 = 2U1 and U2 = 2U2 .
/* Conditional adding */
If (mi = 1) {
Update numerator fnum = fnum ×LU1 ,P1 (P2 )×LU2 +P2 ,−(U2 +P2 ) (P1 ).
Update denominator fden = fden ×LU1 +P1 ,−(U1 +P1 ) (P2 )×LU2 ,P2 (P1 ).
Update U1 = U1 + P1 and U2 = U2 + P2 .
}
}
Return (−1)^m fnum /fden .

Example 4.68 Consider the curve E : Y² = X³ + 3X defined over F43 . This
curve is supersingular with |EF43 | = 44. The group EF43 is not cyclic, but of

rank two, that is, isomorphic to Z22 ⊕ Z2 . We choose m = 11. The embedding
degree for this choice is k = 2. This means that we have to work in the field
F43² = F1849 . Since p = 43 is congruent to 3 modulo 4, −1 is a quadratic non-
residue modulo p, and we can represent F43² as F43 (θ) = {a + bθ | a, b ∈ F43 },
where θ² + 1 = 0. The arithmetic of F43² resembles that of C. F∗43² contains
all the 11-th roots of unity. These are 1, 2 + 13θ, 2 + 30θ, 7 + 9θ, 7 + 34θ,
11 + 3θ, 11 + 40θ, 18 + 8θ, 18 + 35θ, 26 + 20θ, and 26 + 23θ.
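These roots of unity are easy to verify with pair arithmetic in F43(θ). A quick check of mine (elements a + bθ are stored as pairs (a, b)):

```python
p = 43

def fmul(u, v):
    """Multiply a + b*theta by c + d*theta, using theta^2 = -1."""
    a, b = u
    c, d = v
    return ((a * c - b * d) % p, (a * d + b * c) % p)

def fpow(u, e):
    r = (1, 0)
    while e:
        if e & 1:
            r = fmul(r, u)
        u = fmul(u, u)
        e >>= 1
    return r

# Each claimed 11-th root of unity satisfies z^11 = 1 in F_43(theta):
for z in [(1, 0), (2, 13), (2, 30), (7, 9), (7, 34), (11, 3),
          (11, 40), (18, 8), (18, 35), (26, 20), (26, 23)]:
    assert fpow(z, 11) == (1, 0)
```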
The group EF43² contains 44² elements, and is isomorphic to Z44 ⊕ Z44 .
Moreover, this group fully contains E[11], which consists of 11² elements and
is isomorphic to Z11 ⊕ Z11 . The points P = (1, 2) and Q = (−1, 2θ) constitute
a set of linearly independent elements of E[11]. Every element of E[11] can
be written as a unique F11 -linear combination of P and Q. For example, the
element 4P + 5Q = (15 + 22θ, 5 + 14θ) is again of order 11.
Let us compute em (P1 , P2 ) by Algorithm 4.2, where P1 = P = (1, 2), and
P2 = 4P +5Q = (15+22θ, 5+14θ). The binary representation of 11 is (1011)2 .
We initialize f = fnum /fden = 1/1, U1 = P1 , and U2 = P2 . Miller’s loop works
as shown in the following table. Here, Λ1 stands for the rational function
LU1 ,U1 /L2U1 ,−2U1 (during doubling) or the function LU1 ,P1 /LU1 +P1 ,−(U1 +P1 )
(during addition), and Λ2 stands for L2U2 ,−2U2 /LU2 ,U2 (during doubling) or
LU2 +P2 ,−(U2 +P2 ) /LU2 ,P2 (during addition).

i  mi  Step  Λ1                  Λ2                                     f                  U1              U2
2  0   Dbl   (y+20x+21)/(x+32)   (x+(36+21θ))/(y+(12+35θ)x+(26+14θ))    (34+37θ)/(28+θ)    2P1 = (11, 26)  2P2 = (7+22θ, 28+7θ)
       Add   Skipped
1  1   Dbl   (y+31x+20)/(x+7)    (x+(2+26θ))/(y+(18+22θ)x+(29+2θ))      (12+15θ)/(25+18θ)  4P1 = (36, 18)  4P2 = (41+17θ, 6+6θ)
       Add   (y+2x+39)/(x+33)    (x+(41+8θ))/(y+(28+9θ)x+(31+9θ))       (25+15θ)/(28+20θ)  5P1 = (10, 16)  5P2 = (2+35θ, 30+18θ)
0  1   Dbl   (y+8x+33)/(x+42)    (x+(28+21θ))/(y+(19+16θ)x+(19+16θ))    (10+22θ)/(12+28θ)  10P1 = (1, 41)  10P2 = (15+22θ, 38+29θ)
       Add   (x+42)/1            1/(x+(28+21θ))                         12θ/(18+32θ)       11P1 = O        11P2 = O

From the table, we obtain

    em (P1 , P2 ) = (−1)^11 × (12θ/(18 + 32θ)) = 26 + 20θ,

which is indeed an 11-th root of unity in F∗43² .


Algorithm 4.2 works perfectly when P1 and P2 are linearly independent.
If they are dependent, the algorithm may encounter an unwelcome situation.
Suppose that we want to compute em (P1 , P2 ) for P1 = (1, 2) and P2 = 3P1 =
(23, 14). Now, Miller’s loop proceeds as follows.

i  mi  Step  Λ1                  Λ2                  f      U1              U2
2  0   Dbl   (y+20x+21)/(x+32)   (x+33)/(y+20x+42)   17/37  2P1 = (11, 26)  2P2 = (10, 27)
       Add   Skipped
1  1   Dbl   (y+31x+20)/(x+7)    (x+42)/(y+35x+10)   0/20   4P1 = (36, 18)  4P2 = (1, 2)
       Add   (y+2x+39)/(x+33)    (x+7)/(y+19x+22)    0/0    5P1 = (10, 16)  5P2 = (36, 18)
0  1   Dbl   (y+8x+33)/(x+42)    (x+20)/(y+3x+3)     0/0    10P1 = (1, 41)  10P2 = (23, 29)
       Add   (x+42)/1            1/(x+20)            0/0    11P1 = O        11P2 = O

During the doubling step in the second iteration, we have U2 = 2P2 . The
vertical line (x + 42 = 0) passing through 2U2 and −2U2 passes through P1 ,
since 2U2 = 4P2 = 12P1 = P1 . So the numerator fnum becomes 0. During
the addition step of the same iteration, we have U2 = 4P2 = P1 . The line
(y + 19x + 22 = 0) passing through U2 and P2 evaluates to 0 at P1 , and so
the denominator fden too becomes 0.
In practice, one works with much larger values of m. If P2 is a random
multiple of P1 , the probability of accidentally hitting upon this linear relation
in one of the Θ(log m) Miller iterations is rather small, and Algorithm 4.2
successfully terminates with high probability. Nonetheless, if the algorithm
fails, we may choose random points T on the curve and use Eqn (4.26) (instead
of Eqn (4.27) on which Algorithm 4.2 is based) until em (P1 , P2 ) is correctly
computed. In any case, Proposition 4.65(6) indicates that in this case we are
going to get em (P1 , P2 ) = 1 (when m is prime). However, checking whether
P1 and P2 are linearly dependent is, in general, not an easy computational
exercise. Although the situation is somewhat better for supersingular curves,
a check for the dependence of P1 and P2 should be avoided. ¤
The current versions of GP/PARI do not provide ready support for Weil (or
other) pairings. However, it is not difficult to implement Miller’s algorithm
using the built-in functions of GP/PARI.

4.5.3 Tate Pairing


Another pairing of elliptic-curve points is called Tate pairing.27 Let E be
an elliptic curve defined over K = Fq with p = char K. We take m ∈ N with
gcd(m, p) = 1. Let k = ordm (q) be the embedding degree, and L = Fqk . Define
EL [m] = {P ∈ EL | mP = O}, and mEL = {mP | P ∈ EL }.
Also, let
(L∗ )m = {a^m | a ∈ L∗ }
27 This is named after the American mathematician John Torrence Tate Jr. (1925–).

be the set of m-th powers in L∗ . Tate pairing is a function


⟨−, −⟩m : EL [m] × (EL /mEL ) → L∗ /(L∗ )m
defined as follows. Let P be a point in EL [m], and Q a point in EL , to be
treated as a point in EL /mEL . Since mP = O, there is a rational function f
with Div(f ) = m[P ] − m[O]. Let D be any divisor equivalent to [Q] − [O] with
disjoint support from Div(f ). It is customary to choose a point T different
from −P, Q, Q − P, O, and take D = [Q + T ] − [T ]. Define
⟨P, Q⟩m = f (D).
Since Div(f ) = m[P ] − m[O] with P ∈ EL , we can choose f as defined over
L. As a result, f (D) ∈ L∗ . Although f is unique only up to multiplication
by elements of L∗ , the value of f (D) is independent of the choice of f , since
D is a divisor of degree zero. Still, the value of f (D), as an element of L∗ , is
not unique because of its dependence on the choice of the divisor D. Let D′
be another divisor equivalent to [Q] − [O]. We can write D′ = D + Div(g) for
some rational function g. Using Weil reciprocity, we obtain
    f (D′ ) = f (D + Div(g)) = f (D)f (Div(g)) = f (D)g(Div(f ))
            = f (D)g(m[P ] − m[O]) = f (D)(g([P ] − [O]))^m ,

that is, f (D′ ) and f (D) differ by a multiplicative factor which is an m-th
power in L∗ . Treating f (D) as an element of L∗ /(L∗ )m makes it unique.
Another way of making the Tate pairing unique is based upon the fact that

    f (D)^((q^k −1)/m) = f (D′ )^((q^k −1)/m) (g([P ] − [O]))^(q^k −1) = f (D′ )^((q^k −1)/m) ,

since a^(q^k −1) = 1 for all a ∈ L∗ = F∗qk . The reduced Tate pairing of P and Q
is defined as

    êm (P, Q) = (⟨P, Q⟩m )^((q^k −1)/m) = f (D)^((q^k −1)/m) .

Raising ⟨P, Q⟩m to the exponent (q^k − 1)/m is called final exponentiation.
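The effect of the final exponentiation can be seen in miniature: two candidate pairing values differing by an m-th power collapse to the same m-th root of unity. A toy sketch of mine in a prime field Fp with m | p − 1 (so that µm ⊆ F∗p ; the particular numbers are arbitrary, not from the text):

```python
p, m = 23, 11                 # m divides p - 1 = 22, so mu_m lies in F_23*
e = (p - 1) // m              # the final exponentiation
f, c = 5, 7                   # a hypothetical pairing value f, and any unit c

g = f * pow(c, m, p) % p      # a second representative: f times an m-th power

# Both representatives give the same reduced value, an m-th root of unity,
# because (c^m)^{(p-1)/m} = c^{p-1} = 1:
assert pow(f, e, p) == pow(g, e, p)
assert pow(pow(f, e, p), m, p) == 1
```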
Tate pairing is related to Weil pairing as

    em (P, Q) = ⟨P, Q⟩m / ⟨Q, P ⟩m ,

where the equality is up to multiplication by elements of (L∗ )m . Tate pairing
shares some (not all) properties of Weil pairing listed in Proposition 4.65.
Proposition 4.69 For appropriate points P, Q, R on E, we have:
(1) Bilinearity:
    ⟨P + Q, R⟩m = ⟨P, R⟩m × ⟨Q, R⟩m ,
    ⟨P, Q + R⟩m = ⟨P, Q⟩m × ⟨P, R⟩m .
(2) Non-degeneracy: For every P ∈ EL [m], P ≠ O, there exists Q for which
⟨P, Q⟩m ≠ 1. For every Q ∉ mEL , there exists P ∈ EL [m] with ⟨P, Q⟩m ≠ 1.

(3) Linear dependence: Let m be a prime divisor of |EK |, and P a generator of
a subgroup G of EK of order m. If k = 1 (that is, L = K), then ⟨P, P ⟩m ≠ 1.
If k > 1, then ⟨P, P ⟩m = 1, and so, by bilinearity, ⟨Q, Q′ ⟩m = 1 for all
Q, Q′ ∈ G. However, if k > 1 and Q ∈ EL is linearly independent of P (that
is, Q ∉ G), then ⟨P, Q⟩m ≠ 1.
All these properties continue to hold for the reduced Tate pairing. ⊳
Miller’s algorithm for computing fn,P can be easily adapted to compute
the Tate pairing of P and Q. We choose a point T ≠ P, −Q, P − Q, O, and
take D = [Q + T ] − [T ]. We have

    ⟨P, Q⟩m = fm, P (Q + T ) / fm, P (T ).                (4.28)

Moreover, if P and Q are linearly independent, then

    ⟨P, Q⟩m = fm, P (Q).                                  (4.29)
Algorithm 4.3 describes the computation of the reduced Tate pairing êm (P, Q)
using a single Miller loop and using separate variables for the numerator and
the denominator. This algorithm is based upon Eqn (4.28).

Algorithm 4.3: Miller’s algorithm for computing the reduced


Tate pairing êm (P, Q)
Let m = (1ms−1 . . . m1 m0 )2 be the binary representation of m.
Initialize fnum = fden = 1, and U = P .
Choose a point T ≠ P, −Q, P − Q, O.
For i = s − 1, s − 2, . . . , 1, 0 {
/* Doubling */
Compute the rational functions LU,U and L2U,−2U .
    Update numerator fnum = fnum² × LU,U (Q + T ) × L2U,−2U (T ).
    Update denominator fden = fden² × L2U,−2U (Q + T ) × LU,U (T ).
Update U = 2U .
/* Conditional adding */
If (mi = 1) {
Compute the rational functions LU,P and LU +P,−(U +P ) .
Update numerator fnum = fnum ×LU,P (Q + T )×LU +P,−(U +P ) (T ).
Update denominator fden = fden ×LU +P,−(U +P ) (Q + T )×LU,P (T ).
Update U = U + P .
}
}
Compute f = fnum /fden .
/* Do the final exponentiation */
Return f ^((q^k −1)/m) .

Algorithm 4.3 is somewhat more efficient than Algorithm 4.2. First, Tate
pairing requires only one point U to be maintained and updated in the loop,

whereas Weil pairing requires two (U1 and U2 ). Second, in the loop of Al-
gorithm 4.3, only one set of rational functions (LU,U and L2U,−2U during
doubling, and LU,P and LU +P,−(U +P ) during addition) needs to be computed
(but evaluated twice). The loop of Algorithm 4.2 requires the computation
of two sets of these functions. To avoid degenerate output, it is a common
practice to take the first point P from Eq and the second point Q from Eqk .
In this setting, the functions fn,P are defined over Fq , whereas the functions
fn,Q are defined over Fqk . This indicates that Miller’s loop for computing
⟨P, Q⟩m is more efficient than that for computing ⟨Q, P ⟩m . Moreover, if P
and Q are known to be linearly independent, we use Eqn (4.29) instead of
Eqn (4.28). This reduces the number of evaluations of the line functions by a
factor of two. As a result, Tate pairing is usually preferred to Weil pairing in
practical applications. The reduced Tate pairing, however, calls for an extra
final exponentiation. If k is not too small, this added overhead may make Tate
pairing less efficient than Weil pairing.

Example 4.70 Let us continue to work with the curve of Example 4.68, and
compute the Tate pairing of P = (1, 2) and Q = (15 + 22θ, 5 + 14θ). (These
points were called P1 and P2 in Example 4.68). Miller’s loop of Algorithm 4.3
proceeds as in the following table. These computations correspond to the
point T = (36 + 12θ, 40 + 31θ) for which Q + T = (19 + 32θ, 24 + 27θ). Here,
we have only one set of points maintained as U (Weil pairing required two:
U1 , U2 ). The updating rational function Λ is LU,U /L2U,−2U for doubling and
LU,P /LU +P,−(U +P ) for addition.

i  mi  Step  Λ                   f                    U
2  0   Dbl   (y+20x+21)/(x+32)   (41+17θ)/(27+27θ)    2P = (11, 26)
       Add   Skipped
1  1   Dbl   (y+31x+20)/(x+7)    (14+31θ)/(15+15θ)    4P = (36, 18)
       Add   (y+2x+39)/(x+33)    (41+36θ)/(37+16θ)    5P = (10, 16)
0  1   Dbl   (y+8x+33)/(x+42)    (36+24θ)/(11+36θ)    10P = (1, 41)
       Add   (x+42)/1            (9+36θ)/(39+16θ)     11P = O

These computations give

    ⟨P, Q⟩m = (9 + 36θ)/(39 + 16θ) = 14 + 4θ.

The corresponding reduced pairing is

    êm (P, Q) = (14 + 4θ)^((43² −1)/11) = 2 + 13θ.
= 2 + 13θ.

The value of ⟨P, Q⟩m depends heavily on the choice of the point T . For
example, the choice T = (34 + 23θ, 9 + 23θ) (another point of order 44) gives

    ⟨P, Q⟩m = 4 + 33θ.

The two values of ⟨P, Q⟩m differ by a factor which is an m-th power in F∗43² :

    (4 + 33θ)/(14 + 4θ) = 9 + 9θ = (4 + 23θ)^m .

However, the final exponentiation gives the same value, that is,

    êm (P, Q) = (4 + 33θ)^((43² −1)/11) = 2 + 13θ.

By Example 4.68, em (P, Q) = 26 + 20θ. Computing ⟨Q, P ⟩m for the choice
T = (11 + 15θ, 38 + 25θ) gives ⟨Q, P ⟩m = 36 + 4θ. We have

    em (P, Q) = (⟨P, Q⟩m / ⟨Q, P ⟩m ) × ξ = ((14 + 4θ)/(36 + 4θ)) × ξ = (8 + 4θ) × ξ,

where ξ = (26 + 20θ)/(8 + 4θ) = 38 + 5θ = (1 + 37θ)^m .
In this example, all multiples of P lie in EF43 whereas Q does not, that is,
P and Q are linearly independent, so we are allowed to use the formula
⟨P, Q⟩m = fm,P (Q). But
then, the line functions are evaluated only at the point Q (instead of at two
points Q + T and T ). The corresponding calculations in the Miller loop are
shown in the following table.

i  mi  Step  Λ                   f                    U
2  0   Dbl   (y+20x+21)/(x+32)   (25+24θ)/(4+22θ)     2P = (11, 26)
       Add   Skipped
1  1   Dbl   (y+31x+20)/(x+7)    (5+23θ)/(22+26θ)     4P = (36, 18)
       Add   (y+2x+39)/(x+33)    (25+14θ)/(11+12θ)    5P = (10, 16)
0  1   Dbl   (y+8x+33)/(x+42)    (13+29θ)/(19+8θ)     10P = (1, 41)
       Add   (x+42)/1            (17+4θ)/(19+8θ)      11P = O

We now get ⟨P, Q⟩m = (17 + 4θ)/(19 + 8θ) = 15 + 12θ. We have seen that
Eqn (4.28) with T = (36 + 12θ, 40 + 31θ) gives ⟨P, Q⟩m = 14 + 4θ. The ratio
of these two values is (15 + 12θ)/(14 + 4θ) = 7θ = (6θ)^m , which is an m-th
power in F∗43² . Now, the reduced pairing is êm (P, Q) = (15 + 12θ)^((43² −1)/11) =
2 + 13θ, which is again the same as that computed using Eqn (4.28). ¤

4.5.4 Non-Rational Homomorphisms


In typical practical applications, one takes a large prime divisor of |Eq | as
m. If k > 1, there is a unique (cyclic) subgroup of Eq of order m. Let G denote
this subgroup. If k = 1, there are two copies of Zm in Eq , and we take any of
these copies as G. The restriction of em or êm to the group G×G is of primary
concern to us. The linear dependence property of Weil pairing indicates that
em (P, Q) = 1 for all P, Q ∈ G. The same property for Tate pairing suggests
that if k > 1, we again get êm (P, Q) = 1 for all P, Q ∈ G. Consequently, the
maps em or êm restricted to G × G are trivial.
There is a way out of this problem. The pairing of linearly independent
points P and Q is a non-trivial value. In order to exploit this property, we first
map one of the points P, Q ∈ G, say the second one, to a point Q′ linearly
independent of P , and then apply the original Weil or Tate pairing on P and
Q′ . However, some care needs to be adopted so as to maintain the property of
bilinearity. Two ways of achieving this are now explained, both of which use
group homomorphisms that are not rational (that is, not defined) over Fq .

4.5.4.1 Distortion Maps


Definition 4.71 Let φ : E[m] → E[m] be an endomorphism of E[m] with
φ(P ) ∉ G for some P ≠ O in G. The map φ is called a distortion map (for
G). The distorted Weil pairing of P, Q ∈ G is defined as em (P, φ(Q)), whereas
the distorted Tate pairing of P, Q ∈ G is defined as ⟨P, φ(Q)⟩m . ⊳

Since φ(P ) is linearly independent of P , we have em (P, φ(P )) ≠ 1 and
⟨P, φ(P )⟩m ≠ 1. On the other hand, since φ is an endomorphism, bilinearity
is preserved. Moreover, we achieve an additional property.

Proposition 4.72 Symmetry: For all P, Q ∈ G, we have em (P, φ(Q)) =
em (Q, φ(P )), and ⟨P, φ(Q)⟩m = ⟨Q, φ(P )⟩m . ⊳

Distortion maps, however, exist only for supersingular curves.

Example 4.73 Consider the curve E : Y² = X³ + 3X of Examples 4.68 and
4.70. For m = (p + 1)/4 = 11, the group E[11] contains 11² points, and is
generated by two linearly independent points P = (1, 2) and Q = (−1, 2θ) =
(42, 2θ). The subgroup G of E[11] generated by P is a subset of EF43 . We want
to define a bilinear pairing on G × G.
For P1 = P = (1, 2) and P2 = 3P1 = (23, 14), Algorithm 4.3 with the
choice T = (12 + 7θ, 5 + 38θ) yields ⟨P1 , P2 ⟩m = 22, and we get the trivial
value êm (P1 , P2 ) = 22^((43² −1)/11) = 1.
A distortion map φ : E[11] → E[11] is fully specified by its images φ(P )
and φ(Q) only, since P and Q generate E[11]. In this example, we may take
φ(1, 2) = (−1, 2θ), and φ(−1, 2θ) = (1, 2). It follows that φ(a, b) = (−a, bθ)
for all (a, b) ∈ EF43 . In particular, φ(P2 ) = φ(23, 14) = (−23, 14θ) = (20, 14θ).

The Tate pairing of P_1 = (1, 2) and φ(P_2) = (20, 14θ) is a non-trivial value. Algorithm 4.3 with T = (37 + 6θ, 14 + 13θ) gives ⟨P_1, φ(P_2)⟩_m = 21 + 2θ, and so ê_m(P_1, φ(P_2)) = (21 + 2θ)^((43^2−1)/11) = 18 + 8θ. Moreover, Algorithm 4.2 for the Weil pairing now gives e_m(P_1, φ(P_2)) = 11 + 3θ, again a non-trivial value.
Let us now compute the pairing of P_2 = (23, 14) and φ(P_1) = (−1, 2θ) = (42, 2θ). The Tate pairing with T = (38 + 21θ, 19 + 11θ) gives ⟨P_2, φ(P_1)⟩_m = 30 + 29θ = (21 + 2θ) × (28θ) = (21 + 2θ) × (10θ)^m, so the two Tate-pairing values agree up to an m-th power. The reduced Tate pairing is ê_m(P_2, φ(P_1)) = (30 + 29θ)^((43^2−1)/11) = 18 + 8θ = ê_m(P_1, φ(P_2)). Finally, the Weil pairing of P_2 and φ(P_1) is e_m(P_2, φ(P_1)) = 11 + 3θ = e_m(P_1, φ(P_2)). Symmetry in the two arguments is thereby demonstrated. □
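The computations of this example can be reproduced mechanically. The following Python sketch (my own illustration, not the book's code) implements F_{43^2}-arithmetic, the curve operations, a basic Miller loop, and the distortion map φ(x, y) = (−x, θy). It evaluates f_{m,P} at the single point φ(Q) instead of at a divisor D as in Algorithm 4.3; this standard simplification changes the Tate-pairing value only by an m-th power, which the final exponentiation removes.

```python
p = 43     # Example 4.73 field
A = 3      # E: Y^2 = X^3 + 3X
m = 11     # prime order of the subgroup G generated by P = (1, 2)

# F_{p^2} = F_p(theta), theta^2 = -1; elements are pairs (u, v) = u + v*theta
def fadd(x, y): return ((x[0] + y[0]) % p, (x[1] + y[1]) % p)
def fsub(x, y): return ((x[0] - y[0]) % p, (x[1] - y[1]) % p)
def fmul(x, y): return ((x[0]*y[0] - x[1]*y[1]) % p, (x[0]*y[1] + x[1]*y[0]) % p)
def finv(x):
    n = pow(x[0]*x[0] + x[1]*x[1], p - 2, p)    # inverse of the norm, in F_p
    return (x[0]*n % p, -x[1]*n % p)
def fpow(x, e):
    r = (1, 0)
    while e:
        if e & 1: r = fmul(r, x)
        x = fmul(x, x); e >>= 1
    return r
ONE = (1, 0)

# elliptic-curve arithmetic over F_{p^2}; the point O is represented by None
def ec_add(S, T):
    if S is None: return T
    if T is None: return S
    (x1, y1), (x2, y2) = S, T
    if x1 == x2 and fadd(y1, y2) == (0, 0): return None
    if S == T: lam = fmul(fadd(fmul((3, 0), fmul(x1, x1)), (A, 0)), finv(fmul((2, 0), y1)))
    else:      lam = fmul(fsub(y2, y1), finv(fsub(x2, x1)))
    x3 = fsub(fsub(fmul(lam, lam), x1), x2)
    return (x3, fsub(fmul(lam, fsub(x1, x3)), y1))

def ec_mul(k, S):
    R = None
    while k:
        if k & 1: R = ec_add(R, S)
        S = ec_add(S, S); k >>= 1
    return R

def line(S, T, Q):      # value at Q of the line through S and T (tangent if S = T)
    (x1, y1), (x2, y2) = S, T
    xq, yq = Q
    if x1 == x2 and fadd(y1, y2) == (0, 0):      # S + T = O: vertical line
        return fsub(xq, x1)
    if S == T: lam = fmul(fadd(fmul((3, 0), fmul(x1, x1)), (A, 0)), finv(fmul((2, 0), y1)))
    else:      lam = fmul(fsub(y2, y1), finv(fsub(x2, x1)))
    return fsub(fsub(yq, y1), fmul(lam, fsub(xq, x1)))

def vert(S, Q):         # vertical line through S, evaluated at Q (1 if S = O)
    return ONE if S is None else fsub(Q[0], S[0])

def miller(S, Q, n):    # f_{n,S}(Q) by the basic double-and-add Miller loop
    f, T = ONE, S
    for bit in bin(n)[3:]:
        f = fmul(fmul(f, f), fmul(line(T, T, Q), finv(vert(ec_add(T, T), Q))))
        T = ec_add(T, T)
        if bit == '1':
            f = fmul(f, fmul(line(T, S, Q), finv(vert(ec_add(T, S), Q))))
            T = ec_add(T, S)
    return f

def reduced_tate(S, Q):  # e^_m(S, Q) = f_{m,S}(Q)^((p^2-1)/m)
    return fpow(miller(S, Q, m), (p*p - 1)//m)

def phi(S):              # distortion map (x, y) -> (-x, theta*y) on F_p-rational points
    return (((-S[0][0]) % p, 0), (0, S[1][0]))

P = ((1, 0), (2, 0))
e1 = reduced_tate(P, phi(P))
assert e1 != ONE and fpow(e1, m) == ONE                     # non-trivial m-th root of unity
assert reduced_tate(ec_mul(2, P), phi(P)) == fpow(e1, 2)    # bilinearity
```

With this simplification, the sketch confirms non-degeneracy and bilinearity of the distorted reduced Tate pairing on the curve of this example.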

4.5.4.2 Twists
Another way of achieving linear independence of P and Q′ is by means of twists, which work even for ordinary curves. Suppose that p ≠ 2, 3, and E is defined by the short Weierstrass equation E : Y^2 = X^3 + aX + b. Further, let d ≥ 2 be an integer, and v ∈ F_q^* a d-th power non-residue. The curve

    E′ : Y^2 = X^3 + v^{4/d} a X + v^{6/d} b

is called a twist of E of degree d. For d = 2 (quadratic twist), E′ is defined over F_q itself. In general, E′ is defined over F_{q^d}. E and E′ are isomorphic over F_{q^d} (but need not be over F_q even when E′ is defined over F_q). An explicit isomorphism is given by the map φ_d : E′ → E taking (r, s) ↦ (v^{−2/d} r, v^{−3/d} s).

Definition 4.74 Let G be a subgroup of order m in E_{q^k}, and G′ a subgroup of order m in E′_{q^k}. For quadratic twists, a natural choice is G ⊆ E_q and G′ ⊆ E′_q. We now pair points P, Q with P ∈ G and Q ∈ G′. The twisted Weil pairing of P and Q is defined as e_m(P, φ_d(Q)), whereas the twisted Tate pairing of P and Q is defined as ⟨P, φ_d(Q)⟩_m. ⊳

The domain of definition of a twisted pairing is G × G′ (not G × G), but that is not a serious issue. Applying φ_d carries the elements of G′ to points of E, on which the original pairing can be applied. The twisted pairing is non-degenerate if φ_d(Q) is linearly independent of P. Since φ_d is a group homomorphism, bilinearity is preserved.

Example 4.75 Let E : Y^2 = X^3 + 3X be the curve of Examples 4.68, 4.70, and 4.73. Since p ≡ 3 (mod 8), two is a quadratic non-residue modulo p. Thus, a quadratic twist of E is E′ : Y^2 = X^3 + 12X. We now define a pairing on G × G′, where G is the subgroup of E[11] generated by P = (1, 2), and G′ is the subgroup of E′[11] generated by Q = (1, 20). Note that G and G′ are of order 11, and completely contained in E_{F_43} and E′_{F_43}, respectively. Since E and E′ are different curves, they have different points. Indeed, the point (1, 2) does not lie on E′, and the point (1, 20) does not lie on E.
234 Computational Number Theory

A square root of two in F_{43^2} is 27θ (where θ^2 + 1 = 0), so the twist isomorphism φ_2 for E, E′ takes (a, b) ∈ G′ to ((27θ)^{−2} a, (27θ)^{−3} b) = (22a, 39θb). Take the points P_1 = P = (1, 2) ∈ G, and P_2 = 3Q = (11, 42) ∈ G′. We have φ_2(P_2) = (27, 4θ), which is in E_{F_{43^2}} but not in E_{F_43}. The twisted pairings of P_1 and P_2 are ⟨P_1, φ_2(P_2)⟩_m = 18 + 8θ (for T = (41 + 36θ, 2 + 38θ)), ê_m(P_1, φ_2(P_2)) = (18 + 8θ)^((43^2−1)/11) = 11 + 40θ, and e_m(P_1, φ_2(P_2)) = 23 + θ. □
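As a quick sanity check, the twist map of this example can be verified numerically. The following Python fragment (an illustration, not from the book) confirms that P_2 = (11, 42) lies on E′, that the map (a, b) ↦ (22a, 39θb) sends it to (27, 4θ), and that this image lies on E over F_{43^2}.

```python
p = 43

def on_curve(x, y, a):                 # Y^2 = X^3 + aX over F_p
    return (y * y - (x ** 3 + a * x)) % p == 0

assert on_curve(1, 2, 3)               # P = (1, 2) lies on E : Y^2 = X^3 + 3X
assert on_curve(11, 42, 12)            # P2 = (11, 42) lies on E': Y^2 = X^3 + 12X

# twist isomorphism (a, b) -> (22a, 39*theta*b); the image is (x, c*theta)
x, c = 22 * 11 % p, 39 * 42 % p
assert (x, c) == (27, 4)               # phi_2(P2) = (27, 4*theta)

# (c*theta)^2 = -c^2 since theta^2 = -1, so the image satisfies E over F_{43^2}
assert (-(c * c) - (x ** 3 + 3 * x)) % p == 0
```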

4.5.5 Pairing-Friendly Curves
Thanks to Miller's algorithm, one can efficiently compute Weil and Tate pairings, provided that the embedding degree k is not too large. For general (for example, randomly chosen) curves, we have |k| ≈ |m| in bit size. A curve is called pairing-friendly if, for a suitable choice of m, the embedding degree k is small, typically k ≤ 12. Only some specific types of curves qualify as pairing-friendly. Nonetheless, many infinite families of curves are known to be pairing-friendly.28

By Hasse's theorem, |E_q| = q + 1 − t for some t with |t| ≤ 2√q. If p = char F_q divides t, we call E supersingular. A non-supersingular curve is called ordinary.
Supersingular curves are known to have small embedding degrees. The only possibilities are k = 1, 2, 3, 4, 6. If F_q is a prime field with q > 5, the only possibility is k = 2. Many infinite families of supersingular curves are known.

Example 4.76 (1) Curves of the form Y^2 + aY = X^3 + bX + c (with a ≠ 0) are supersingular over fields of characteristic two. All supersingular curves over a finite field K of characteristic two have j-invariant equal to 0, and so are isomorphic over K̄. For these curves, k ∈ {1, 2, 3, 4}.
(2) Curves of the form Y^2 = X^3 − X ± 1 are supersingular over fields of characteristic three. The embedding degree is six for these curves.
(3) Take a prime p ≡ 2 (mod 3), and a ∈ F_p^*. The curve Y^2 = X^3 + a defined over F_p is supersingular, and has embedding degree two.
(4) Take a prime p ≡ 3 (mod 4), and a ∈ F_p^*. The curve Y^2 = X^3 + aX defined over F_p is supersingular, and has embedding degree two.
Solve Exercise 4.66 to derive these embedding degrees. □
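Family (4) can be checked directly for the curve Y^2 = X^3 + 3X over F_43 used earlier in this chapter. The short Python fragment below (illustrative, not from the book) counts points with the quadratic character and confirms |E_q| = q + 1 (trace divisible by p, hence supersingular) and embedding degree two for m = 11.

```python
p = 43                                      # p = 3 (mod 4)

def chi(n):                                 # quadratic character of F_p
    n %= p
    if n == 0:
        return 0
    return 1 if pow(n, (p - 1) // 2, p) == 1 else -1

# |E(F_p)| = p + 1 + sum_x chi(x^3 + 3x), counting the point at infinity
N = p + 1 + sum(chi(x ** 3 + 3 * x) for x in range(p))
assert N == p + 1                           # trace t = 0, so p | t: supersingular

m = 11                                      # prime divisor of |E(F_p)| = 44
k = next(j for j in range(1, m) if pow(p, j, m) == 1)
assert k == 2                               # embedding degree two
```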

Locating ordinary curves (particularly, infinite families) with small embedding degrees is a significantly more difficult task. One method is to fix an embedding degree k and a discriminant ∆, and search for (integer-valued) polynomials t(x), m(x), q(x) ∈ Q[x] satisfying the following five conditions:
(1) q(x) = p(x)^n for some n ∈ N and p(x) ∈ Q[x] representing primes.
(2) m(x) is irreducible with a positive leading coefficient.
(3) m(x) | (q(x) + 1 − t(x)).
28 A comprehensive survey on pairing-friendly curves can be found in the article: David Freeman, Michael Scott and Edlyn Teske, A taxonomy of pairing-friendly elliptic curves, Journal of Cryptology, 23(2), 224–280, 2010.

(4) m(x) | Φ_k(t(x) − 1), where Φ_k is the k-th cyclotomic polynomial (see Exercise 3.36).
(5) There are infinitely many integer pairs (x, y) satisfying ∆y^2 = 4q(x) − t(x)^2.
If we are interested in ordinary curves, we additionally require:
(6) gcd(q(x), m(x)) = 1.
For a choice of t(x), m(x), q(x), families of elliptic curves over Fq of size
m, embedding degree k, and discriminant ∆ can be constructed using the
complex multiplication method. If y in Condition (5) can be parametrized by
a polynomial y(x) ∈ Q[x], the family is called complete, otherwise it is called
sparse. Some sparse families of ordinary pairing-friendly curves are:
• MNT (Miyaji–Nakabayashi–Takano) curves29: These are ordinary curves of prime order with embedding degree three, four, or six. Let m > 3 be the (prime) order of an ordinary curve E, t = q + 1 − m the trace of Frobenius, and k the embedding degree of E. The curve E is completely characterized by the following result.
(1) k = 3 if and only if t = −1 ± 6x and q = 12x^2 − 1 for some x ∈ Z.
(2) k = 4 if and only if t = −x or t = x + 1, and q = x^2 + x + 1 for some x ∈ Z.
(3) k = 6 if and only if t = 1 ± 2x and q = 4x^2 + 1 for some x ∈ Z.
• Freeman curves30: These curves have embedding degree ten, and correspond to the choices:
t(x) = 10x^2 + 5x + 3,
m(x) = 25x^4 + 25x^3 + 15x^2 + 5x + 1,
q(x) = 25x^4 + 25x^3 + 25x^2 + 10x + 3.
For this family, we have m(x) = q(x) + 1 − t(x). The discriminant ∆ of
Freeman curves satisfies ∆ ≡ 43 or 67 (mod 120).
Some complete families of ordinary pairing-friendly curves are:
• BN (Barreto–Naehrig) curves31 : These curves have embedding degree
12 and discriminant three, and correspond to the following choices.
t(x) = 6x^2 + 1,
m(x) = 36x^4 + 36x^3 + 18x^2 + 6x + 1,
q(x) = 36x^4 + 36x^3 + 24x^2 + 6x + 1.
29 Atsuko Miyaji, Masaki Nakabayashi and Shunzo Takano, New explicit conditions of elliptic curve traces for FR-reductions, IEICE Transactions on Fundamentals, E84-A(5), 1234–1243, 2001.
30 David Freeman, Constructing pairing-friendly elliptic curves with embedding degree 10, ANTS-VII, 452–465, 2006.
31 Paulo S. L. M. Barreto and Michael Naehrig, Pairing-friendly elliptic curves of prime order, SAC, 319–331, 2006.



For this family too, we have m(x) = q(x) + 1 − t(x).
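The defining conditions can be checked numerically for the BN family. The Python snippet below (my illustration, not the book's code) verifies conditions (3) and (4) for k = 12, where Φ_12(y) = y^4 − y^2 + 1, over a range of x.

```python
def t(x): return 6 * x * x + 1
def m(x): return 36 * x ** 4 + 36 * x ** 3 + 18 * x * x + 6 * x + 1
def q(x): return 36 * x ** 4 + 36 * x ** 3 + 24 * x * x + 6 * x + 1

def phi12(y): return y ** 4 - y * y + 1   # 12th cyclotomic polynomial

for x in range(1, 100):
    assert m(x) == q(x) + 1 - t(x)        # condition (3), with equality
    assert phi12(t(x) - 1) % m(x) == 0    # condition (4) for k = 12
```

At x = 1, for instance, this gives the toy parameters q = 103, m = 97, t = 7.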


• SB (Scott–Barreto) curves32: The family of SB curves with embedding degree k = 6 corresponds to the choices:
t(x) = −4x^2 + 4x + 2,
q(x) = 4x^5 − 8x^4 + 3x^3 − 3x^2 + (17/4)x + 1,
m(x) = 16x^4 − 32x^3 + 12x^2 + 4x + 1.

For a square-free positive ∆ not dividing 27330 = 2 × 3 × 5 × 911, the family (t(∆z^2), m(∆z^2), q(∆z^2)) parametrized by z is a complete family of ordinary curves of embedding degree six and discriminant ∆.
• BLS (Barreto–Lynn–Scott)33 and BW (Brezing–Weng)34 curves: These families are called cyclotomic families, as the construction of these curves is based upon cyclotomic extensions of Q. For example, the choices
t(x) = −x^2 + 1,
m(x) = Φ_{4k}(x),
q(x) = (1/4)(x^{2k+4} + 2x^{2k+2} + x^{2k} + x^4 − 2x^2 + 1)
parametrize a family of BW curves with odd embedding degree k < 1000 and with discriminant ∆ = 1. Many other families of BLS and BW curves are known.

4.5.6 Efficient Implementation


Although the iterations in Miller's loops resemble repeated square-and-multiply algorithms for exponentiation in finite fields, and repeated double-and-add algorithms for point multiplication in elliptic curves (Exercise 4.26), the computational overhead in an iteration of Miller's loop is somewhat more than that for finite-field exponentiation or elliptic-curve point multiplication. In this section, we study some implementation tricks that can significantly speed up Miller's loop for computing Weil and Tate pairings.
Eisenträger, Lauter and Montgomery35 describe methods to speed up the computation when the i-th bit m_i in Miller's loop is 1. In this case, computing
32 Michael Scott and Paulo S. L. M. Barreto, Generating more MNT elliptic curves, Designs, Codes and Cryptography, 38, 209–217, 2006.
33 Paulo S. L. M. Barreto, Ben Lynn and Michael Scott, Constructing elliptic curves with prescribed embedding degrees, SCN, 263–273, 2002.
34 Friederike Brezing and Annegret Weng, Elliptic curves suitable for pairing based cryptography, Designs, Codes and Cryptography, 37, 133–141, 2005.
35 Kirsten Eisenträger, Kristin Lauter and Peter L. Montgomery, Improved Weil and Tate pairings for elliptic and hyperelliptic curves, ANTS, 169–183, 2004.



2U + P as (U + P) + U saves a few field operations (Exercise 4.27). For speeding up the update of f, they propose a second trick of using a parabola.
Another improvement is from Blake, Murty and Xu36, who replace the computation of four lines by that of only two lines for the case m_i = 1 (Exercise 4.61). Although Blake et al. claim that their improvement is useful when most of the bits m_i are 1, the practicality of this improvement is rather evident even for random values of m.

4.5.6.1 Windowed Loop in Miller’s Algorithm

By Exercise 1.27, a repeated square-and-multiply or double-and-add algorithm can benefit from consuming chunks of bits of the exponent or multiplier in each iteration, at the cost of some precomputation. Employing a similar strategy for Miller's algorithm is not straightforward, since the updating formulas for f do not immediately admit efficient generalizations. For example, the formulas given in Exercise 4.62 do not make it obvious how even a two-bit window can be handled effectively.
Blake, Murty and Xu use a separate set of formulas for a two-bit window (see Exercise 4.59). Algorithm 4.4 rewrites Miller's algorithm for the computation of f_{m,P}(Q), and can be readily adapted to handle Miller's algorithms for computing Weil and Tate pairings.
Some further refinements along the lines of Blake et al.'s developments are proposed by Liu, Horng and Chen37, who segment the bitstring of m into patterns of the form (01)^r, 01^r, 01^r0, 0^r, 1^r0, and 1^r. For many such patterns, Liu et al. propose modifications that reduce the number of lines in the updating formulas for f.

4.5.6.2 Final Exponentiation

Although the double-and-add loop for Tate pairing is more efficient than
that for Weil pairing, the added overhead of final exponentiation is unpleasant
for the reduced Tate pairing. Fortunately, we can choose the curve parameters
and tune this stage so as to arrive at an efficient implementation.38
Suppose that the basic field of definition of the elliptic curve E is F_q with p = char F_q. We take m to be a prime dividing |E_q|. If k is the embedding degree for this q and m, the final-exponentiation stage involves an exponent of (q^k − 1)/m. In this stage, we do arithmetic in the extension field F_{q^k}.

36 Ian F. Blake, V. Kumar Murty and Guangwu Xu, Refinements of Miller's algorithm for computing the Weil/Tate pairing, Journal of Algorithms, 58(2), 134–149, 2006.
37 Chao-Liang Liu, Gwoboa Horng and Te-yu Chen, Further refinement of pairing computation based on Miller's algorithm, Applied Mathematics and Computation, 189(1), 395–409, 2007. This is available also at http://eprint.iacr.org/2006/106.
38 Michael Scott, Naomi Benger, Manuel Charlemagne, Luis J. Dominguez Perez and Ezekiel J. Kachisa, On the final exponentiation for calculating pairings on ordinary elliptic curves, Pairing, 78–88, 2009.

Algorithm 4.4: Two-bit windowed loop for computing f_{m,P}(Q)

Take the base-4 representation m = (M_s M_{s−1} ... M_1 M_0)_4, M_s ≠ 0.
Initialize f = 1 and U = P.
If (M_s = 2), set f = L_{P,P}(Q) / L_{2P,−2P}(Q), and U = 2P,
else if (M_s = 3), set f = −[L_{P,P}(Q) × L_{P,−P}(Q)] / L_{2P,P}(−Q), and U = 3P.
For i = s − 1, s − 2, ..., 1, 0, do {
  If (M_i = 0) {
    Set f = −f^4 × [L_{U,U}(Q)^2 / L_{2U,2U}(−Q)], and U = 4U.
  } else if (M_i = 1) {
    Set f = −f^4 × [L_{2U,U}(Q) × L_{4U,P}(Q)] / [L_{4U+P,−(4U+P)}(Q) × L_{2U,2U}(−Q)], and U = 4U + P.
  } else if (M_i = 2) {
    Set f = −f^4 × [L_{2U,U}(Q) × L_{2U,P}(Q)^2] / [L_{2U,−2U}(Q)^2 × L_{2U+P,2U+P}(−Q)], and U = 4U + 2P.
  } else if (M_i = 3) {
    Set f = −f^4 × [L_{2U,U}(Q) × L_{2U,P}(Q)^2 × L_{4U+2P,P}(Q)] / [L_{2U,−2U}(Q)^2 × L_{2U+P,2U+P}(−Q) × L_{4U+3P,−(4U+3P)}(Q)], and U = 4U + 3P.
  }
}
Return f.

Let β_0, β_1, ..., β_{k−1} be an F_q-basis of F_{q^k}. An element α = a_0β_0 + a_1β_1 + ... + a_{k−1}β_{k−1} ∈ F_{q^k} (with a_i ∈ F_q) satisfies α^q = a_0β_0^q + a_1β_1^q + ... + a_{k−1}β_{k−1}^q, since a_i^q = a_i by Fermat's little theorem for F_q. If the quantities β_0^q, β_1^q, ..., β_{k−1}^q are precomputed as linear combinations of the basis elements β_0, β_1, ..., β_{k−1}, simplifying α^q to its representation in this basis is easy. For example, if q = p ≡ 3 (mod 4), k = 2, and we represent F_{p^2} = F_p(θ) with θ^2 + 1 = 0, then (a_0 + a_1θ)^p = a_0 − a_1θ. Thus, exponentiation to the q-th power is much more efficient than general square-and-multiply exponentiation.
Now, suppose that k = 2d is even, and write q^k − 1 = (q^d − 1)(q^d + 1). Since k = ord_m q, we conclude that m does not divide q^d − 1. But m is prime, so it must divide q^d + 1, and the final exponentiation can be carried out as

    f^{(q^k−1)/m} = (f^{q^d−1})^{(q^d+1)/m}.

The inner exponentiation (to the power q^d − 1) involves d q-th-power exponentiations, followed by a multiplication by f^{−1}. Since |(q^d + 1)/m| ≈ (1/2)|(q^k − 1)/m|, this strategy reduces the final-exponentiation time by a factor of about two.
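The saving can be seen on toy parameters. Below is a minimal Python sketch (my illustration), assuming q = 43, m = 11, k = 2 (so d = 1), that computes f^{(q^k−1)/m} both directly and as (f^{q^d−1})^{(q^d+1)/m}.

```python
p, m, d = 43, 11, 1                       # q = p, k = 2d = 2
k = 2 * d

def fmul(x, y):                           # F_{p^2} = F_p(theta), theta^2 = -1
    return ((x[0] * y[0] - x[1] * y[1]) % p,
            (x[0] * y[1] + x[1] * y[0]) % p)

def finv(x):                              # inverse via the norm u^2 + v^2 in F_p
    n = pow(x[0] * x[0] + x[1] * x[1], p - 2, p)
    return (x[0] * n % p, (-x[1]) * n % p)

def fpow(x, e):
    r = (1, 0)
    while e:
        if e & 1:
            r = fmul(r, x)
        x = fmul(x, x)
        e >>= 1
    return r

f = (5, 7)
direct = fpow(f, (p ** k - 1) // m)       # one long exponentiation
inner = fmul(fpow(f, p), finv(f))         # f^(q^d - 1): a Frobenius and an inverse
split = fpow(inner, (p ** d + 1) // m)    # then an exponent of about half the length
assert split == direct
```

Here (q^k − 1)/m = 168, while the split version only needs a conjugation-like Frobenius, one inversion, and the short exponent (q + 1)/m = 4.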

Some more optimizations can be performed on the exponent (q^d + 1)/m. It follows from Exercise 3.36(c) that the cyclotomic polynomial Φ_k(x) divides x^d + 1 in Z[x]. Moreover, m | Φ_k(q). Therefore,

    f^{(q^k−1)/m} = ((f^{q^d−1})^{(q^d+1)/Φ_k(q)})^{Φ_k(q)/m}.

The intermediate exponent (q^d + 1)/Φ_k(q) is a polynomial in q, so we can again exploit the efficiency of computing q-th powers in F_{q^k}. Only the outermost exponent Φ_k(q)/m calls for a general exponentiation algorithm. We have deg Φ_k(x) = φ(k) (the Euler totient function), so this exponent is of size ≈ |q^{φ(k)}/m|. For k = 6, this is two-thirds the size of (q^d + 1)/m.

4.5.6.3 Denominator Elimination


Barreto, Lynn and Scott39 propose a nice idea by means of which the vertical lines L_{2U,−2U} and L_{U+P,−(U+P)} can be eliminated altogether during the computation of the reduced Tate pairing. Let E be an elliptic curve defined over F_q of characteristic > 3, m a prime, and k = ord_m q the embedding degree. Assume that k = 2d is even (we may have d = 1), and that E is given by the short Weierstrass equation Y^2 = X^3 + aX + b for some a, b ∈ F_q.
E is defined over F_{q^d} as well. Let v ∈ F_{q^d}^* be a quadratic non-residue in F_{q^d}. Define the quadratic twist of E over F_{q^d} in the usual way, that is, E′ : Y^2 = X^3 + v^2 aX + v^3 b. Choose a non-zero point P_1 ∈ E_q[m] and a non-zero point P_2 ∈ E′_{q^d}[m]. Let G_1 and G_2 be the groups of order m generated by P_1 and P_2, respectively. We define the reduced Tate pairing on G_1 × G_2 as ê_m(P, φ_2(Q)) for P ∈ G_1 and Q ∈ G_2. Here, φ_2 : E′_{q^d} → E_{q^k} is the twist map (r, s) ↦ (v^{−1} r, (v√v)^{−1} s). Since v ∈ F_{q^d} and G_2 ⊆ E′_{q^d}, it follows that the X-coordinates of all the points φ_2(Q) are in F_{q^d} (although their Y-coordinates may belong to the strictly larger field F_{q^k}).
The vertical lines L_{2U,−2U} and L_{U+P,−(U+P)} are of the form X + c with c ∈ F_q, and evaluate, at points with X-coordinates in F_{q^d}, to elements of F_{q^d}. Let α ∈ F_{q^d}^*. By Fermat's little theorem for F_{q^d}, we have α^{q^d−1} = 1. Since k = 2d = ord_m q, we conclude that q^d − 1 is not divisible by m, and so α^{(q^k−1)/m} = (α^{q^d−1})^{(q^d+1)/m} = 1. This, in turn, implies that the contributions of the values of L_{2U,−2U} and L_{U+P,−(U+P)} at points with X-coordinates in F_{q^d} are absorbed in the final exponentiation. Consequently, it is not necessary to compute the lines L_{2U,−2U} and L_{U+P,−(U+P)}, or to evaluate them. However, in order to make the idea work, we must choose the (random) point T in Algorithm 4.3 from E_{q^d} (or use the formula f_{m,P}(Q) of Eqn (4.29)).
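The absorption of F_{q^d}-values by the final exponentiation is easy to check on toy parameters; the Python fragment below (an illustration, not from the book) uses q = 43, m = 11, k = 2, d = 1.

```python
p, m, d = 43, 11, 1                  # q = p, k = 2d
k = 2 * d
assert (p ** d - 1) % m != 0         # m does not divide q^d - 1

e = (p ** k - 1) // m                # the final exponent (q^k - 1)/m
for alpha in range(1, p):            # every alpha in F_{q^d}^* = F_43^*
    assert pow(alpha, e, p) == 1     # is wiped out by the final exponentiation
```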

39 Paulo S. L. M. Barreto, Ben Lynn and Michael Scott, On the selection of pairing-friendly groups, SAC, 17–25, 2003.



4.5.6.4 Loop Reduction


Loop reduction pertains to the computation of the Tate pairing. In the Miller loop for computing the function f_{m,P}, the number of iterations is (about) log_2 m. A loop-reduction technique computes f_{M,P} for some M, and involves log_2 M iterations. If M is substantially smaller than m, this leads to a significant speedup. For example, if M ≈ √m, the number of iterations in the Miller loop reduces by a factor of nearly two. Evidently, replacing f_{m,P} by f_{M,P} for an arbitrary M is not expected to produce any bilinear pairing at all. We must choose M carefully so as to preserve both bilinearity and non-degeneracy. The idea of loop reduction was introduced by Duursma and Lee.40

Eta Pairing
Barreto et al.41 propose an improvement of the Duursma–Lee construction. Let E be a supersingular elliptic curve defined over K = F_q, m a prime divisor of |E_q|, and k the embedding degree. E being supersingular, there exists a distortion map φ : G → G′ for suitable groups G ⊆ E_q[m] and G′ ⊆ E_{q^k}[m] of order m. The distorted Tate pairing is defined as ⟨P, φ(Q)⟩_m = f_{m,P}(D) for P, Q ∈ G, where D is a divisor equivalent to [φ(Q)] − [O]. For a suitable choice of M, Barreto et al. define the eta pairing of P, Q ∈ G as

    η_M(P, Q) = f_{M,P}(φ(Q)).

For the original Tate pairing, we have mP = O. Now, we remove this requirement: M need not satisfy MP = O. We take M = q − cm for some c ∈ Z such that for every point P ∈ G we have MP = γ(P) for some automorphism γ of E_q. The automorphism γ and the distortion map φ should satisfy the golden condition: γ(φ^q(P)) = φ(P) for all P ∈ E_q. If M^a + 1 = λm for some a ∈ N and λ ∈ Z, then η_M(P, Q) is related to the Tate pairing ⟨P, φ(Q)⟩_m as

    (η_M(P, Q)^{aM^{a−1}})^{(q^k−1)/m} = (⟨P, φ(Q)⟩_m^λ)^{(q^k−1)/m} = ê_m(P, φ(Q))^λ.

Example 4.77 Eta pairing provides a sizeable speedup for elliptic curves over finite fields of characteristics two and three. For fields of larger characteristic, eta pairing is not very useful. Many families of supersingular curves and distortion maps on them can be found in Example 4.76 and Exercise 4.67.
(1) Let E : Y^2 + Y = X^3 + X + a, a ∈ {0, 1}, be a supersingular curve defined over F_{2^r} with odd r. The choice γ = φ^r satisfies the golden condition. In this case, |E_q| = 2^r ± 2^{(r+1)/2} + 1. Suppose that |E_q| is prime, so we take m = |E_q|. We choose M = ∓2^{(r+1)/2} − 1, so for a = 2 we have M^2 + 1 = 2m,
40 Iwan M. Duursma and Hyang-Sook Lee, Tate pairing implementation for hyperelliptic curves y^2 = x^p − x + d, AsiaCrypt, 111–123, 2003.
41 Paulo S. L. M. Barreto, Steven Galbraith, Colm Ó hÉigeartaigh and Michael Scott, Efficient pairing computation on supersingular Abelian varieties, Designs, Codes and Cryptography, 239–271, 2004.

that is, λ = 2. Also, M = q − m, that is, c = 1. Then, η_M(P, Q) = f_{M,P}(φ(Q)) involves a Miller loop twice as efficient as the Miller loop for ⟨P, φ(Q)⟩_m. The embedding degree k is four in this case, so

    ê_m(P, φ(Q)) = ⟨P, φ(Q)⟩_m^{(2^{4r}−1)/m} = η_M(P, Q)^{2M(2^{4r}−1)/m}.

Thus, the reduced Tate pairing of P and φ(Q) can be computed by raising η_M(P, Q) to an appropriate power.
(2) For the supersingular curve E : Y^2 = X^3 − X + b, b = ±1, defined over F_{3^r} with gcd(r, 6) = 1, the choice γ = φ^r satisfies the golden condition. We have m = |E_q| = 3^r ± 3^{(r+1)/2} + 1. Assume that m is prime. Choose M = q − m = ∓3^{(r+1)/2} − 1. For a = 3, we get M^a + 1 = λm with λ = ∓3^{(r+3)/2}. The embedding degree is six in this case. This gives

    ê_m(P, φ(Q))^λ = (⟨P, φ(Q)⟩_m^λ)^{(3^{6r}−1)/m} = η_M(P, Q)^{3M^2(3^{6r}−1)/m}.

So a power of the reduced Tate pairing can be obtained from the eta pairing. The speedup is by a factor of about two, since log_2 M ≈ (1/2) log_2 m. The exponent λ can be removed by another exponentiation, since gcd(λ, 3^{6r} − 1) = 1. □
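The parameter identities behind Example 4.77 are easy to verify mechanically. The Python snippet below (my illustration) checks M^2 + 1 = 2m in characteristic two and M^3 + 1 = ∓3^{(r+3)/2} m in characteristic three for several r; the identities are purely arithmetical, so no primality assumption on m is needed.

```python
# characteristic two: q = 2^r (r odd), m = 2^r + s*2^((r+1)/2) + 1, M = q - m
for r in (3, 5, 7, 9, 11):
    for s in (1, -1):
        q = 2 ** r
        h = 2 ** ((r + 1) // 2)
        m = q + s * h + 1
        M = q - m
        assert M == -s * h - 1
        assert M ** 2 + 1 == 2 * m                            # a = 2, lambda = 2

# characteristic three: q = 3^r with gcd(r, 6) = 1, m = 3^r + s*3^((r+1)/2) + 1
for r in (5, 7, 11, 13):
    for s in (1, -1):
        q = 3 ** r
        h = 3 ** ((r + 1) // 2)
        m = q + s * h + 1
        M = q - m
        assert M ** 3 + 1 == (-s * 3 ** ((r + 3) // 2)) * m   # a = 3, lambda = -s*3^((r+3)/2)
```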

Ate Pairing
Hess et al.42 extend the idea of eta pairing to ordinary curves. Distortion maps do not exist for ordinary curves. Nonetheless, subgroups G ⊆ E_q and G′ ⊆ E_{q^k} of order m can be chosen. Instead of defining a pairing on G × G′, Hess et al. define a pairing on G′ × G. If t is the trace of Frobenius for E at q, they take M = t − 1, and define the ate pairing of Q ∈ G′ and P ∈ G as

    a_M(Q, P) = f_{M,Q}(P).

The ate pairing is related to the Tate pairing as follows. Let N = gcd(M^k − 1, q^k − 1), where k is the embedding degree. Write M^k − 1 = λN, and c = Σ_{i=0}^{k−1} M^{k−1−i} q^i ≡ kq^{k−1} (mod m). Then, we have

    (a_M(Q, P)^c)^{(q^k−1)/N} = (⟨Q, P⟩_m^λ)^{(q^k−1)/m} = ê_m(Q, P)^λ.

It can be shown that a_M is bilinear. Moreover, a_M is non-degenerate if m ∤ λ.



Let us now see how much saving this is with ate pairing. Since |t| ≤ 2√q by Hasse's theorem, we have log_2 M ≤ 1 + (1/2) log_2 m, when log_2 m ≈ log_2 q. Therefore, the number of iterations in Miller's loop gets halved. But since a point Q of E_{q^k} has now become the first argument, all line functions and point arithmetic involve working in F_{q^k}. If k > 1, each iteration of Miller's loop becomes costlier than that for ⟨P, Q⟩_m (with Q as the second argument). In particular, for k ≥ 6, we fail to gain any speedup using ate pairing.
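The congruence c ≡ kq^{k−1} (mod m) is immediate, since M = t − 1 ≡ q (mod m) makes every term of the sum congruent to q^{k−1}. The Python fragment below (illustrative) checks it, together with the embedding degree, on the toy BN parameters q = 103, m = 97, t = 7 obtained from x = 1 in the BN family.

```python
q, m, t, k = 103, 97, 7, 12                 # BN family at x = 1; embedding degree k = 12
M = t - 1
assert M % m == q % m                       # M = t - 1 = q (mod m)
assert min(j for j in range(1, k + 1) if pow(q, j, m) == 1) == k

c = sum(M ** (k - 1 - i) * q ** i for i in range(k))
assert c % m == (k * q ** (k - 1)) % m      # c = k*q^(k-1) (mod m)
```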
42 Florian Hess, Nigel P. Smart and Frederik Vercauteren, The eta pairing revisited, IEEE Transactions on Information Theory, 52(10), 4595–4602, 2006.



Twisted Ate Pairing
In order to define an ate-like pairing on G × G′, Hess et al. use quartic (degree-four) and sextic (degree-six) twists. For a twist of degree d | k,

    f_{M^e,P}(Q)

is defined as the twisted ate pairing of P, Q, where M = t − 1 and e = k/d. For efficiency, M^e should be small compared to m. If the trace t is large, that is, |t| ≈ √q, twisted ate pairing fails to provide any speedup. If t is small, that is, |t| ≈ m^{1/φ(k)}, twisted ate pairing achieves a speedup of two over basic Tate pairing (here, φ is the Euler totient function). Twisted ate pairing in this case is often called an optimal pairing since, under reasonable assumptions, the Miller loop cannot have fewer than Θ(log m / φ(k)) iterations.

Example 4.78 I now illustrate how twists can help in speeding up each Miller iteration. The following example is of a modified ate pairing that uses twists, since twisted ate pairing is defined in a slightly different way.
Take a Barreto–Naehrig (BN) curve E : Y^2 = X^3 + b defined over F_p with a prime p ≡ 1 (mod 6). The embedding degree is k = 12. Define a sextic twist of E with respect to a primitive sixth root ζ of unity. The twisted curve can be written as E′ : µY^2 = νX^3 + b, where µ ∈ F_{p^2}^* is a cubic non-residue, and ν ∈ F_{p^2}^* is a quadratic non-residue. E′ is defined over F_{p^2}, and we take a subgroup G′ of order m in E′_{p^2}. A homomorphism φ_6 that maps G′ into E_{p^{12}} is given by (r, s) ↦ (ν^{1/3} r, µ^{1/2} s). We use the standard ate pairing to define the pairing a_M(φ_6(Q), P) of Q ∈ G′ and P ∈ G. For Q ∈ G′, the point φ_6(Q) is defined over F_{p^{12}}, but not over smaller subfields (in general). Nonetheless, the association of G′ with φ_6(G′) allows us to work in F_{p^2} in some parts of the Miller loop. For example, if Q_1, Q_2 ∈ G′, then φ_6(Q_1) + φ_6(Q_2) = φ_6(Q_1 + Q_2), that is, the point arithmetic in Miller's loop can be carried out in F_{p^2}. □

Ate_i Pairing
Zhao et al.43 propose another optimization of ate pairing. For an integer i in the range 1 ≤ i ≤ k − 1, they take M_i ≡ (t − 1)^i ≡ q^i (mod m), and define

    f_{M_i,Q}(P)

as the ate_i pairing of Q, P. The minimum of M_1, M_2, ..., M_{k−1} yields the shortest Miller loop. Some curves offer choices of i for which the number of iterations in the ate_i Miller loop is optimal (Θ(log m / φ(k))). Ate_i pairings are defined on G′ × G, and have the same performance degradation as ate pairing.
43 Changan Zhao, Fangguo Zhang and Jiwu Huang, A note on the ate pairing, International Journal of Information Security, 7(6), 379–382, 2008.



R-ate Pairing
At present, the best loop-reducing pairing is the one proposed by Lee et al.44 If e and e′ are two pairings, then so also is e(Q, P)^u e′(Q, P)^v for any integers u, v. For A, B, a, b ∈ Z with A = aB + b, Lee et al. define the R-ate pairing as

    R_{A,B}(Q, P) = f_{a,BQ}(P) f_{b,Q}(P) G_{aBQ,bQ}(P),

where G_{U,V} = L_{U,V} / L_{U+V,−(U+V)}. Not every choice of A, B, a, b defines a pairing. If f_{A,Q}(P) and f_{B,Q}(P) define non-degenerate bilinear pairings with ê_m(Q, P)^{λ_1} = f_{A,Q}(P)^{µ_1} and ê_m(Q, P)^{λ_2} = f_{B,Q}(P)^{µ_2} for λ_1, λ_2, µ_1, µ_2 ∈ Z, then R_{A,B}(Q, P) is again a non-degenerate bilinear pairing satisfying

    ê_m(Q, P)^λ = R_{A,B}(Q, P)^µ,

where µ = lcm(µ_1, µ_2) and λ = (µ/µ_1)λ_1 − a(µ/µ_2)λ_2, provided that m ∤ λ. There are several choices for A, B, including q, m, and the integers M_i of the ate_i pairing. If A = M_i and B = m, then R_{A,B}(Q, P) is the ate_i pairing of Q, P.
R-ate pairing makes two invocations of the Miller loop, but a suitable choice of A, B, a, b reduces the total number of Miller iterations compared to the best ate_i pairing. There are examples where the loop reduction can be by a factor of six over Tate pairing (for ate and ate_i pairings, the reduction factor can be at most two). Moreover, R-ate pairing is known to be optimal on certain curves for which no ate_i pairing is optimal. Another useful feature of R-ate pairing is that it can handle both supersingular and ordinary curves.

4.6 Elliptic-Curve Point Counting

Let E be an elliptic curve defined over K = F_q. A pertinent question is to find the size of the group E_K = E_q. This problem has sophisticated algorithmic solutions. If q is an odd prime or a power of two, there exist polynomial-time (in log q) algorithms to compute |E_q|. The first known algorithm of this kind is from Schoof.45 The SEA algorithm (named after Schoof, Elkies and Atkin)46 is an efficient modification of Schoof's algorithm.
Another related question is: given a size s satisfying the Hasse bound (that is, q + 1 − 2√q ≤ s ≤ q + 1 + 2√q), how can one construct an elliptic curve E over F_q with |E_q| = s? As of now, this question does not have efficient algorithmic solutions, except in some very special cases.
44 Eunjeong Lee, Hyang-Sook Lee and Cheol-Min Park, Efficient and generalized pairing computation on Abelian varieties, Cryptology ePrint Archive, Report 2008/040, 2008.
45 René Schoof, Elliptic curves over finite fields and the computation of square roots mod p, Mathematics of Computation, 44(170), 483–494, 1985.
46 René Schoof, Counting points on elliptic curves over finite fields, Journal de Théorie des Nombres de Bordeaux, 7, 219–254, 1995.



4.6.1 A Baby-Step-Giant-Step (BSGS) Method

I first present an exponential-time probabilistic algorithm based on Shanks's baby-step-giant-step paradigm,47 which works for any finite field F_q (q need not be prime) and runs in Õ(q^{1/4}) time. Let E be an elliptic curve over F_q. By Hasse's theorem, the size of E_q lies between q + 1 − 2√q and q + 1 + 2√q. We locate an m in this interval such that ord P | m for some point P ∈ E_q. If ord P has a unique multiple m in the interval, then |E_q| = m. To facilitate the search, we break the interval into about 2q^{1/4} subintervals of about the same size.
Baby Steps: We choose a random non-zero point P ∈ E_q, and compute and store the multiples O, ±P, ±2P, ±3P, ..., ±sP, where s = ⌈q^{1/4}⌉. In fact, it suffices to compute only the points P, 2P, 3P, ..., sP. The opposites of these points can be obtained easily. Some appropriate data structure (like a hash table, a sorted array, or a balanced search tree) should be used to store these 2s + 1 points in order to support efficient searching in the list.
Giant Steps: We compute the points Q = ⌈q + 1 − 2√q⌉ P and R = (2s + 1)P by the repeated double-and-add algorithm. Subsequently, for i = 0, 1, 2, 3, ..., 2s + 1, we compute Q + iR (for i ≥ 0, we have Q + (i + 1)R = (Q + iR) + R). For each i, we check whether Q + iR resides in the list of baby steps, that is, whether Q + iR = jP for some j in the range −s ≤ j ≤ s. If so, we obtain mP = O, where m = ⌈q + 1 − 2√q⌉ + (2s + 1)i − j is a desired multiple of ord P in the Hasse interval.
Example 4.79 Take E : Y^2 = X^3 + 3X + 6 defined over F_p = F_997, and the point P = (234, 425) on E. We have s = ⌈p^{1/4}⌉ = 6. The baby steps are computed first. Next, we prepare for the giant steps by computing Q = ⌈p + 1 − 2√p⌉ P = 935P = (944, 231) and R = (2s + 1)P = 13P = (24, 464).

    Baby steps                          Giant steps
    j    jP          −jP                i    Q + iR        j
    0    O                              0    (944, 231)
    1    (234, 425)  (234, 572)         1    (867, 2)
    2    (201, 828)  (201, 169)         2    (115, 260)
    3    (886, 800)  (886, 197)         3    (527, 565)
    4    (159, 57)   (159, 940)         4    (162, 759)
    5    (18, 914)   (18, 83)           5    (34, 974)
    6    (623, 968)  (623, 29)          6    (549, 677)
                                        7    (643, 806)
                                        8    (159, 940)    −4

We stop making giant steps as soon as we discover i, j with Q + iR = jP. This gives (935 + 8 × 13 − (−4))P = O, that is, 1043P = O, that is, m = 1043.
47 Daniel Shanks, Class number, a theory of factorization and genera, Proceedings of Symposia in Pure Mathematics, 20, 415–440, 1971. The BSGS paradigm is generic enough to be applicable to a variety of computational problems. Its adaptation to point counting and Mestre's improvement are discussed in Schoof's 1995 paper (see Footnote 46). Also see Section 7.1.1 for another adaptation of the BSGS method.

If we complete all the giant steps, we find no other multiple of ord P in the
Hasse interval [935, 1061]. So the size of EF997 is 1043. Since 1043 = 7 × 149
is square-free, this group is cyclic. ¤
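The whole procedure can be sketched in a few lines of Python. The fragment below (my illustration, not the book's code) re-runs Example 4.79 and recovers |E_{F_997}| = 1043.

```python
import math

p, a, b = 997, 3, 6                       # E : Y^2 = X^3 + 3X + 6 over F_997

def ec_add(S, T):                         # affine addition; O is represented by None
    if S is None:
        return T
    if T is None:
        return S
    (x1, y1), (x2, y2) = S, T
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if S == T:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, p - 2, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, p - 2, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(n, S):                         # repeated double-and-add
    R = None
    while n:
        if n & 1:
            R = ec_add(R, S)
        S = ec_add(S, S)
        n >>= 1
    return R

P = (234, 425)
s = math.isqrt(math.isqrt(p)) + 1         # at least ceil(p^(1/4)); s = 6 here

baby = {None: 0}                          # baby steps: jP for -s <= j <= s
T = None
for j in range(1, s + 1):
    T = ec_add(T, P)                      # T = jP
    baby[T] = j
    baby[(T[0], -T[1] % p)] = -j          # -jP comes for free

lo = p + 1 - math.isqrt(4 * p)            # ceiling of p + 1 - 2*sqrt(p) = 935
Q = ec_mul(lo, P)
R = ec_mul(2 * s + 1, P)

candidates = []                           # giant steps: match Q + iR against the baby list
G = Q
for i in range(2 * s + 2):
    j = baby.get(G)
    if j is not None:
        candidates.append(lo + (2 * s + 1) * i - j)
    G = ec_add(G, R)

assert 1043 in candidates                 # the multiple of ord P found in the interval
assert all(ec_mul(c, P) is None for c in candidates)
```

As in the worked example, the match occurs at i = 8, j = −4, giving 935 + 13 × 8 + 4 = 1043.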
The BSGS method may fail to supply a unique answer if ord P has more than one multiple in the Hasse interval. For instance, suppose that we start with 149P as the base point P in Example 4.79. This point has order 7, so every multiple of 7 in the Hasse interval will be supplied as a possible candidate for |E_q|. For this example, the problem can be overcome by repeating the algorithm for different random choices of the base point P. After a few iterations, we expect to find a P with a unique multiple of ord P in the Hasse interval. This is indeed the expected behavior if E_q is a cyclic group.
However, the group Eq need not be cyclic. By Theorem 4.12, we may have
Eq ∼= Zn1 ⊕ Zn2 with n2 | gcd(n1 , q − 1). Every point P ∈ Eq satisfies n1 P = O
(indeed, n1 is the smallest positive integer with this property; we call n1 the
exponent of the group Eq ). If n1 is so small that the Hasse interval contains
two or more multiples of n1 , the BSGS method fails to supply a unique answer,
no matter how many times we run it (with different base points P ).
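The order search described above is easy to prototype. The sketch below is our own illustrative Python (not the book's code): it implements affine short-Weierstrass arithmetic and the BSGS search, and returns every multiple of ord P found in the Hasse interval, so a result set with more than one element is exactly the ambiguity just discussed. The curve Y² = X³ + 3X + 7 over F997 is borrowed from Example 4.84 below; all helper names are assumptions of this sketch.

```python
# Illustrative sketch (not the book's code) of the BSGS order search.
# The curve Y^2 = X^3 + 3X + 7 over F_997 is taken from Example 4.84.
import math
from collections import defaultdict

p, a, b = 997, 3, 7

def ec_add(P, Q):
    """Affine point addition on Y^2 = X^3 + aX + b over F_p (None is O)."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                     # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(n, P):
    """Double-and-add scalar multiple nP, n >= 0."""
    R = None
    while n:
        if n & 1:
            R = ec_add(R, P)
        P = ec_add(P, P)
        n >>= 1
    return R

def bsgs_candidates(P):
    """Every m in the Hasse interval with mP = O; >1 element = ambiguity."""
    low = math.ceil(p + 1 - 2 * math.sqrt(p))
    high = math.floor(p + 1 + 2 * math.sqrt(p))
    s = math.isqrt(math.isqrt(p)) + 1            # s is roughly p^(1/4)
    baby = defaultdict(list)                     # point -> all j with jP = point
    Pj = None
    for j in range(s + 1):                       # baby steps jP and -jP
        baby[Pj].append(j)
        if Pj is not None and Pj[1] != 0:
            baby[(Pj[0], (-Pj[1]) % p)].append(-j)
        Pj = ec_add(Pj, P)
    Q, R = ec_mul(low, P), ec_mul(2 * s + 1, P)
    found, G, i = set(), Q, 0
    while low + i * (2 * s + 1) - s <= high:     # giant steps Q + iR
        if G in baby:
            for j in baby[G]:                    # Q + iR = jP, so mP = O
                m = low + i * (2 * s + 1) - j
                if low <= m <= high:
                    found.add(m)
        G = ec_add(G, R)
        i += 1
    return found

def find_point():
    """Brute-force search for an affine point on the curve (p is tiny)."""
    for x in range(p):
        rhs = (x * x * x + a * x + b) % p
        for y in range(1, p):
            if y * y % p == rhs:
                return (x, y)
```

Since 1053 = 3⁴ × 13 points lie on this particular curve, a base point of small order (an order-3 point exists) would make the returned set contain several multiples of ord P, which is precisely the failure mode described above.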
Example 4.80 The curve E : Y 2 = X 3 +X +161 defined over F1009 contains
1024 points and has the group structure Z64 ⊕ Z16 . The exponent of the group
is 64, which has two multiples 960 and 1024 in the Hasse interval [947, 1073].
Therefore, both 960P = O and 1024P = O for any point P on E. A point P of
order smaller than 64 has other multiples of ord P in the Hasse interval. Trying
several random points on E, we can eliminate these extra candidates, but the
ambiguity between 960 and 1024 cannot be removed. This is demonstrated
below for P = (6, 49). Now, we have s = 6, Q = (947, 339), and R = (947, 670).
Baby steps                              Giant steps
 j      jP           −jP                 i     Q + iR         j
 0      O                                0     (947, 339)
 1   (6, 49)      (6, 960)               1     O              0
 2   (3, 47)      (3, 962)               2     (947, 670)
 3   (552, 596)   (552, 413)             3     (550, 195)
 4   (798, 854)   (798, 155)             4     (588, 583)
 5   (455, 510)   (455, 499)             5     (604, 602)
 6   (413, 641)   (413, 368)             6     (6, 49)        1
                                         7     (717, 583)
                                         8     (855, 172)
                                         9     (756, 1000)
                                        10     (713, 426)
                                        11     (3, 47)        2
                                        12     (842, 264)
                                        13     (374, 133)
For (i, j) = (1, 0), we get m = 947+13−0 = 960, whereas for (i, j) = (6, 1),
we get m = 947+6×13−1 = 1024. The algorithm also outputs (i, j) = (11, 2),
for which m = 947 + 11 × 13 − 2 = 1088. This third value of m is outside the Hasse interval [947, 1073], and can be ignored. It is reported because of the approximation of p^{1/4} by the integer s. Indeed, it suffices here to make giant steps for 0 ≤ i ≤ 10 only. ¤
4.6.1.1 Mestre’s Improvement
J. F. Mestre proposes a way to avoid the problem just mentioned. Let v ∈ F∗q be a quadratic non-residue (this requires q to be odd). Consider the quadratic twist E′ : Y² = X³ + av²X + bv³ of E, which is again defined over Fq. By Exercise 4.68, |Eq| + |Eq′| = 2(q + 1). Instead of computing |Eq|, we may, therefore, calculate |Eq′|. Even if Eq is of low exponent, the chance that Eq′ too is of low exponent is rather small. Indeed, Mestre proves that for prime fields Fp with p > 457, either Ep or Ep′ contains a point of order > 4√p.
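The twist relation can be checked numerically with a naive Θ(q) point count. The sketch below is illustrative only; the curve, the non-residue v = 11, and the resulting twist coefficients are those of Examples 4.80 and 4.81, and the function name is ours.

```python
# Numerical check (naive Theta(q) count, for illustration only) of the
# twist relation, with the curve and non-residue of Examples 4.80/4.81.
p = 1009

def count_points(a, b):
    """1 + #{(x, y) in F_p x F_p : y^2 = x^3 + ax + b}."""
    squares = {}
    for y in range(p):
        r = y * y % p
        squares[r] = squares.get(r, 0) + 1
    return 1 + sum(squares.get((x * x * x + a * x + b) % p, 0)
                   for x in range(p))

a, b, v = 1, 161, 11                     # E : Y^2 = X^3 + X + 161, v from the text
assert pow(v, (p - 1) // 2, p) == p - 1  # Euler's criterion: v is a non-residue
ta, tb = a * v * v % p, b * v ** 3 % p   # twist E' : Y^2 = X^3 + a v^2 X + b v^3
```

Here |E| = 1024 and |E′| = 996, and 1024 + 996 = 2(1009 + 1) as required.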
Example 4.81 As a continuation of Example 4.80, we take the quadratic non-residue v = 11 ∈ F∗1009 to define the quadratic twist E′ : Y² = X³ + 121X + 383 of E. The BSGS method on E′ works as follows. We take the base point P = (859, 587), for which Q = ⌈q + 1 − 2√q⌉P = 947P = (825, 545), and R = (2s + 1)P = 13P = (606, 90).
Baby steps                              Giant steps
 j      jP           −jP                 i     Q + iR         j
 0      O                                0     (825, 545)
 1   (859, 587)   (859, 422)             1     (246, 188)
 2   (282, 63)    (282, 946)             2     (746, 845)
 3   (677, 631)   (677, 378)             3     (179, 447)
 4   (325, 936)   (325, 73)              4     (677, 631)     3
 5   (667, 750)   (667, 259)             5     (465, 757)
 6   (203, 790)   (203, 219)             6     (80, 735)
                                         7     (214, 258)
                                         8     (148, 2)
                                         9     (557, 197)
                                        10     (358, 501)
                                        11     (529, 234)
                                        12     (144, 55)
                                        13     (937, 420)
The computations show that there is a unique value of i for which Q + iR is found in the list of baby steps. This gives the size of Eq′ uniquely as |Eq′| = 947 + 4 × 13 − 3 = 996. Consequently, |Eq| = 2(1009 + 1) − 996 = 1024. ¤
It is easy to verify that the BSGS method makes Θ(q^{1/4}) group operations in Eq; that is, the BSGS method is an exponential-time algorithm, and cannot be used for elliptic-curve point counting except when q is small.
4.6.2 Schoof's Algorithm
René Schoof is the first to propose a polynomial-time algorithm to solve
the elliptic-curve point counting problem. Schoof’s algorithm is particularly
suited to prime fields. In this section, we concentrate on elliptic curves defined
over a prime field Fp . We also assume that the curve is defined by the short
Weierstrass equation E : Y 2 = X 3 + aX + b with a, b ∈ Fp .
By Hasse’s theorem, |Ep | = p + 1 − t, where the trace of Frobenius t at
√ √
p satisfies −2 p 6 t 6 2 p. Schoof’s algorithm computes t modulo many
small primes r, and combines the residues by the Chinese remainder theorem
to a unique integer in the Hasse interval. By Exercise 4.10, |Eq | (mod 2) can
be determined by checking the irreducibility of X 3 + aX + b in Fp [X]. So we
assume that r is an odd prime not equal to p.
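The parity step amounts to a root search, since a cubic over Fp is irreducible exactly when it has no root in Fp. A minimal sketch (the function name is ours):

```python
# Parity of |E_p| (Exercise 4.10): |E_p| is odd iff X^3 + aX + b is
# irreducible in F_p[X]; a cubic is irreducible iff it has no root.
def trace_mod_2(a, b, p):
    """t mod 2 for E : Y^2 = X^3 + aX + b over F_p, p an odd prime."""
    has_root = any((x * x * x + a * x + b) % p == 0 for x in range(p))
    # a root x0 yields the order-two point (x0, 0), making |E_p| even;
    # since |E_p| = p + 1 - t with p + 1 even, t and |E_p| share parity
    n_parity = 0 if has_root else 1
    return (p + 1 - n_parity) % 2
```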
The Frobenius endomorphism (Definition 4.47) ϕ : EF̄p → EF̄p taking (x, y) to (x^p, y^p) fixes EFp point by point,48 and satisfies the identity

    ϕ² − tϕ + p = 0,                                                    (4.30)

that is, for any P = (x, y) ∈ EF̄p, we have

    ϕ(ϕ(P)) − tϕ(P) + pP = (x^{p²}, y^{p²}) − t(x^p, y^p) + p(x, y) = O,    (4.31)

where the addition, the subtraction and the scalar multiplications correspond to the arithmetic of EF̄p. Let tr ≡ t (mod r) and pr ≡ p (mod r) for a small odd prime r not equal to p. We then have

    (x^{p²}, y^{p²}) − tr(x^p, y^p) + pr(x, y) = O                          (4.32)
for all points (x, y) in the group E[r] of r-torsion points on EF̄p (that is, points P with rP = O). By varying tr in the range 0 ≤ tr ≤ r − 1, we find the correct value of tr for which Eqn (4.32) holds identically on E[r]. For each trial value of tr, we compute the left side of Eqn (4.32) symbolically using the addition formula for the curve. There are, however, two problems with this approach.
The first problem is that the left side of Eqn (4.32) evaluates to a pair of rational functions of degrees as high as Θ(p²). Our aim is to arrive at a polynomial-time algorithm (in log p). This problem is solved by using division polynomials (Section 4.4.3). For the reduced Weierstrass equation, we have

ψ0 (x, y) = 0,
ψ1 (x, y) = 1,
ψ2 (x, y) = 2y,
ψ3 (x, y) = 3x4 + 6ax2 + 12bx − a2 ,
ψ4 (x, y) = 4y(x6 + 5ax4 + 20bx3 − 5a2 x2 − 4abx − 8b2 − a3 ),
48 Schoof's algorithm performs symbolic manipulation on the coordinates x, y of points on E, which satisfy y² = x³ + ax + b. Thus, x, y are actually X, Y modulo Y² − (X³ + aX + b). So the use of lower-case x, y here is consistent with the earlier sections in this chapter.
    ψ5(x, y) = 5x¹² + 62ax¹⁰ + 380bx⁹ − 105a²x⁸ + 240abx⁷ +
               (−300a³ − 240b²)x⁶ − 696a²bx⁵ + (−125a⁴ − 1920ab²)x⁴ +
               (−80a³b − 1600b³)x³ + (−50a⁵ − 240a²b²)x² +
               (−100a⁴b − 640ab³)x + (a⁶ − 32a³b² − 256b⁴),

    ψ2m(x, y) = ψm (ψm+2 ψm−1² − ψm−2 ψm+1²) / (2y),
    ψ2m+1(x, y) = ψm+2 ψm³ − ψm+1³ ψm−1.

It is often convenient to define ψ−1(x, y) = −1.
Division polynomials possess the following property.
Theorem 4.82 A point (x, y) ∈ EF̄p is in E[r] if and only if ψr (x, y) = 0. ⊳
Since we are interested in evaluating Eqn (4.32) for points in E[r], it suffices to do so modulo ψr(x, y). But the polynomials ψr(x, y) are in two variables x, y. Since y² = x³ + ax + b, we can simplify ψm(x, y) to either a polynomial in Fp[x] or y times a polynomial in Fp[x]. In particular, we define

    fm(x) = ψm(x, y)      if m is odd,
          = ψm(x, y)/y    if m is even.
The polynomials fm(x) are in one variable x only, with

    deg fm(x) = (m² − 1)/2    if m is odd,
              = (m² − 4)/2    if m is even.
The polynomials fm(x) are also called division polynomials. We continue to have the following characterization of r-torsion points.
Theorem 4.83 A point (x, y) ∈ EF̄p is in E[r] if and only if fr (x) = 0. ⊳
To sum up, we compute Eqn (4.32) modulo y² − (x³ + ax + b) and fr(x). This keeps the degrees of all intermediate polynomials bounded by Θ(r²). Since r is a small prime (indeed, r = O(log p)), the polynomial fr(x) and the left side of Eqn (4.32) can be computed in time polynomial in log p.
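Under the substitution y² = x³ + ax + b, the ψ-recurrences translate directly into recurrences for the univariate fm(x), and these are simple to implement with naive polynomial arithmetic. The following sketch (our own coefficient-list representation and helper names; nothing here is optimized) computes fm over Fp and can be compared term by term with the polynomials f−1, . . . , f8 listed in Example 4.84 below.

```python
# Sketch: univariate division polynomials f_m(x) over F_p for the curve
# Y^2 = X^3 + aX + b, from the psi-recurrences with y^2 replaced by
# F(x) = x^3 + ax + b.  Polynomials are coefficient lists, lowest degree first.
p, a, b = 997, 3, 7                  # the curve of Example 4.84

def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f

def pmul(f, g):
    if not f or not g:
        return []
    r = [0] * (len(f) + len(g) - 1)
    for i, c in enumerate(f):
        for j, d in enumerate(g):
            r[i + j] = (r[i + j] + c * d) % p
    return r

def psub(f, g):
    n = max(len(f), len(g))
    return trim([((f[i] if i < len(f) else 0) -
                  (g[i] if i < len(g) else 0)) % p for i in range(n)])

F = [b, a, 0, 1]                     # F(x) = x^3 + ax + b
F2 = pmul(F, F)
inv2 = pow(2, -1, p)
cache = {                            # base cases f_{-1}, ..., f_4
    -1: [p - 1], 0: [], 1: [1], 2: [2],
    3: trim([(-a * a) % p, 12 * b % p, 6 * a % p, 0, 3]),
    4: trim([(-32 * b * b - 4 * a ** 3) % p, (-16 * a * b) % p,
             (-20 * a * a) % p, 80 * b % p, 20 * a % p, 0, 4]),
}

def divpoly(m):
    """f_m(x) by memoized recursion on the division-polynomial recurrences."""
    if m in cache:
        return cache[m]
    k = m // 2
    cube = lambda f: pmul(f, pmul(f, f))
    if m % 2:                        # f_{2k+1}: where F^2 lands depends on k's parity
        t1 = pmul(divpoly(k + 2), cube(divpoly(k)))
        t2 = pmul(cube(divpoly(k + 1)), divpoly(k - 1))
        r = psub(t1, pmul(F2, t2)) if k % 2 else psub(pmul(F2, t1), t2)
    else:                            # f_{2k} = f_k (f_{k+2} f_{k-1}^2 - f_{k-2} f_{k+1}^2) / 2
        t = psub(pmul(divpoly(k + 2), pmul(divpoly(k - 1), divpoly(k - 1))),
                 pmul(divpoly(k - 2), pmul(divpoly(k + 1), divpoly(k + 1))))
        r = [c * inv2 % p for c in pmul(divpoly(k), t)]
    cache[m] = trim(r)
    return cache[m]
```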
The second problem associated with Schoof’s algorithm is that during the
symbolic computation of Eqn (4.32), we cannot distinguish between the cases
of addition and doubling. Since these two cases use different formulas, we need
to exercise some care so that Schoof’s algorithm works at all. This makes the
algorithm somewhat clumsy. First, we rewrite Eqn (4.32) as
    (x^{p²}, y^{p²}) + π(x, y) = τ(x^p, y^p),                               (4.33)
where we have used π = pr and τ = tr for notational simplicity. Using
Eqns (4.18) and (4.19) of Section 4.4.3, we obtain the symbolic equality:
    (x^{p²}, y^{p²}) + ( x − ψπ−1 ψπ+1 / ψπ² , (ψπ+2 ψπ−1² − ψπ−2 ψπ+1²) / (4y ψπ³) )

        = O                                                        if τ = 0,

        = ( (x − ψτ−1 ψτ+1 / ψτ²)^p ,
            ((ψτ+2 ψτ−1² − ψτ−2 ψτ+1²) / (4y ψτ³))^p )             otherwise.      (4.34)
The left side of Eqn (4.34) is independent of τ. Name the two summands on this side as Q = (x^{p²}, y^{p²}) and R = π(x, y). The addition formula for the left side depends upon which of the following cases holds: Q = R, Q = −R, and Q ≠ ±R. The first two cases can be treated together using the x-coordinates only, that is, Q = ±R if and only if the following condition holds:

    x^{p²} = x − ψπ−1 ψπ+1 / ψπ².
But ψm are polynomials in two variables, so we rewrite this condition as

    x^{p²} = x − fπ−1(x) fπ+1(x) (x³ + ax + b) / fπ(x)²       if π is odd,
           = x − fπ−1(x) fπ+1(x) / (fπ(x)² (x³ + ax + b))     if π is even,
in terms of the univariate division polynomials fm (x). This computation is to
be done modulo fr (x). To this effect, we compute the following gcd
 ³ ´
p2 2 3


 gcd fr (x), (x − x)fπ (x) + fπ−1 (x)fπ+1 (x)(x + ax + b)


 if π is odd,
γ(x) = ³ ´
 2
p 2 3


 gcd fr (x), (x − x)fπ (x)(x + ax + b) + fπ−1 (x)fπ+1 (x)


if π is even,
where the second argument is computed modulo fr (x).
Case 1: γ(x) = 1. This is equivalent to the condition that Q ≠ ±R for all P ∈ E[r]. In particular, τ ≠ 0. In this case, we use the formula for adding two distinct points not opposite of one another, in order to evaluate the left side of Eqn (4.34). Then, we check for which τ ∈ {1, 2, . . . , r − 1} equality holds in Eqn (4.34). By multiplying with suitable polynomials, we can clear all the denominators, and check whether equalities of two pairs of polynomial expressions (one for the x-coordinates and the other for the y-coordinates) hold modulo y² − (x³ + ax + b) and fr(x). I now supply the final y-free formulas without derivations, which use the polynomials fm(x) instead of ψm(x, y).
We first compute two polynomials dependent only on π, modulo fr(x):

    α = (x³ + ax + b)(fπ+2 fπ−1² − fπ−2 fπ+1²) −
        4(x³ + ax + b)^{(p²+1)/2} fπ³                               if π is odd,
      = fπ+2 fπ−1² − fπ−2 fπ+1² −
        4(x³ + ax + b)^{(p²+3)/2} fπ³                               if π is even,
 ³ ´
p2 2 3


 4fπ (x − x )fπ − (x + ax + b)fπ−1 fπ+1 if π is odd,
 ³ 2
β = 4(x3 + ax + b)fπ (x − xp )(x3 + ax + b)fπ2 −

 ´

 fπ−1 fπ+1 if π is even.

In order to check the equality of x-coordinates, we compute two other


polynomials based upon π only:
³


 fπ−1 fπ+1 (x3 + ax + b) −

 ´
 2
fπ2 (xp + xp + x) β 2 (x3 + ax + b) + fπ2 α2 if π is odd,
δ1 = ³ ´
 2
2 p
 f

 π−1 fπ+1 − fπ (x + xp + x)(x3 + ax + b) β 2 +


fπ α (x + ax + b)2
2 2 3
if π is even,
δ2 = fπ2 β 2 (x3 + ax + b).
We check whether the following condition holds for the selected value of τ :
    0 = δ1 fτ^{2p} + δ2 (fτ−1 fτ+1)^p (x³ + ax + b)^p       if τ is odd,        (4.35)
      = δ1 fτ^{2p} (x³ + ax + b)^p + δ2 (fτ−1 fτ+1)^p       if τ is even,

where the equality is modulo the division polynomial fr(x). For checking the equality of y-coordinates, we compute (using π only):
    δ3 = 4(x³ + ax + b)^{(p−1)/2} ×
           [ ( fπ²(2x^{p²} + x) − fπ−1 fπ+1 (x³ + ax + b) ) αβ²(x³ + ax + b) −
             fπ²( α³ + β³(x³ + ax + b)^{(p²+3)/2} ) ]                     if π is odd,
       = 4(x³ + ax + b)^{(p+1)/2} ×
           [ ( fπ²(2x^{p²} + x)(x³ + ax + b) − fπ−1 fπ+1 ) αβ² −
             fπ²( α³ + β³(x³ + ax + b)^{(p²−3)/2} )(x³ + ax + b)² ]       if π is even,

    δ4 = β³ fπ² (x³ + ax + b).
For the selected τ , we check the following equality modulo fr (x):
    0 = δ3 fτ^{3p} − δ4 ( fτ+2 fτ−1² − fτ−2 fτ+1² )^p (x³ + ax + b)^p
                                                            if τ is odd,        (4.36)
      = δ3 fτ^{3p} (x³ + ax + b)^{(3p+1)/2} −
        δ4 ( fτ+2 fτ−1² − fτ−2 fτ+1² )^p (x³ + ax + b)^{(p+1)/2}
                                                            if τ is even.
There exists a unique τ in the range 1 ≤ τ ≤ r − 1 for which Eqns (4.35) and (4.36) hold simultaneously. This is the desired value of tr, that is, the value of t modulo the small prime r.
Case 2: γ(x) ≠ 1, that is, Q = ±R for some P ∈ E[r]. Now, we need to distinguish between the two possibilities: Q = R and Q = −R.
Sub-case 2(a): If Q = −R, then τ = 0.
Sub-case 2(b): If Q = R, Eqn (4.33) gives 2πP = τϕ(P), that is, ϕ(P) = (2π/τ)P (since τ ≢ 0 (mod r) in this case) for some non-zero P ∈ E[r]. Plugging in this value of ϕ(P) in Eqn (4.33) eliminates ϕ as (4π/τ² − 1)P = O for some non-zero P ∈ E[r], that is, 4π/τ² − 1 ≡ 0 (mod r) (since r is a prime), that is, τ² ≡ 4π (mod r). This, in turn, implies that 4π, and so π too, must be quadratic residues modulo r. Let w ∈ Z∗r satisfy w² ≡ π (mod r). Then, τ ≡ ±2w (mod r). But ϕ(P) = (2π/τ)P = (2w²/τ)P (for some P in E[r]), so ϕ(P) = wP if τ ≡ 2w (mod r), and ϕ(P) = −wP if τ ≡ −2w (mod r).
In order to identify the correct sub-case, we first compute the Legendre symbol (π/r). If (π/r) = −1, then τ = 0. Otherwise, we compute a square root w of π (or p) modulo r. We may use a probabilistic algorithm like the Tonelli and Shanks algorithm (Algorithm 1.9) for computing w. Since r = O(log p), we can also find w by successively squaring 1, 2, 3, . . . until w² ≡ π (mod r) holds.
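Because r = O(log p) is tiny, the exhaustive search for w is entirely adequate; a minimal sketch (the function name is ours):

```python
# Exhaustive square root modulo a small odd prime r, as suggested in the
# text: try w = 0, 1, 2, ... in turn.  Returns None for a non-residue.
def sqrt_mod_small_prime(n, r):
    n %= r
    for w in range(r):
        if w * w % r == n:
            return w
    return None
```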
Now, we check whether ϕ(P) = wP or ϕ(P) = −wP for some P ∈ E[r]. In the first case, τ ≡ 2w (mod r), whereas in the second, τ ≡ −2w (mod r). If no such P exists in E[r], we have τ = 0. If we concentrate only on the x-coordinates, we can detect the existence of such a P. As we have done in detecting whether Q = ±R (that is, ϕ²(P) = ±πP), checking the validity of the condition ϕ(P) = ±wP boils down to computing the following gcd:
 ³ ´
p 2 3


 gcd fr (x), (x − x)fw (x) + fw−1 (x)fw+1 (x)(x + ax + b)


 if w is odd,
δ(x) = ³ ´
 p 2 3


 gcd fr (x), (x − x)fw (x)(x + ax + b) + fw−1 (x)fw+1 (x)


if w is even,
where the second argument is computed modulo fr (x). If δ(x) = 1, then τ = 0.
Otherwise, we need to identify which one of the equalities ϕ(P ) = ±wP holds.
For deciding this, we consult the y-coordinates, and compute another gcd:
 ³


 gcd fr (x), 4(x3 + ax + b)(p−1)/2 fw3 (x) −

 ´
 2 2
 fw+2 (x)fw−1 (x) + fw−2 (x)fw+1 (x) if w is odd,
η(x) = ³
3 (p+3)/2 3


 gcd fr (x), 4(x + ax + b) fw (x) −

 ´

 fw+2 (x)f 2 (x) + f
w−1 w−2 (x)f 2 (x)
w+1 if w is even,
where again arithmetic modulo fr (x) is used to evaluate the second argument.
If η(x) = 1, then τ ≡ −2w (mod r), otherwise τ ≡ 2w (mod r).
This completes the determination of τ ≡ tr ≡ t (mod r). We repeat the above process for O(log p) small primes r, whose product is larger than the length 4√p + 1 of the Hasse interval. By the prime number theorem, each such small prime is O(log p). Using CRT, we combine the residues tr for all small primes r to a unique integer in the Hasse interval.
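The concluding combination step can be sketched as follows (function names ours); with the residues of Example 4.84 below (t ≡ 1, 2, 0, 1 modulo 2, 3, 5, 7), it yields t = −55 and |EF997| = 1053.

```python
# Sketch of the concluding step: combine the residues t mod r by CRT and
# lift the result into the Hasse range [-2*sqrt(p), 2*sqrt(p)].
def crt(residues):
    """residues: list of (t_r, r) pairs with pairwise coprime moduli r."""
    M = 1
    for _, r in residues:
        M *= r
    t = 0
    for tr, r in residues:
        Mr = M // r
        t = (t + tr * Mr * pow(Mr, -1, r)) % M   # standard CRT recombination
    return t, M

def trace_of_frobenius(residues, p):
    """Unique t with |t| <= 2*sqrt(p); needs the product of moduli > 4*sqrt(p)."""
    t, M = crt(residues)
    if t * t > 4 * p:            # t exceeds 2*sqrt(p): take the negative lift
        t -= M
    return t
```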
Example 4.84 Consider the curve E : Y² = X³ + 3X + 7 defined over the prime field Fp = F997. The Hasse interval in this case is [935, 1061] with a width of 127. If we compute the trace t modulo the four small primes r = 2, 3, 5, 7, we can uniquely obtain t and consequently |EF997|.
Small prime r = 2: X³ + 3X + 7 is irreducible in F997[X]. Therefore, EF997 does not contain a point of order two, that is, |EF997| and so t are odd.
For working with the other small primes, we need the division polynomials. We use only the univariate versions fm(x). In this example, the polynomials fm(x) are needed only for −1 ≤ m ≤ 8, and are listed below. These polynomials are to be treated as members of F997[x].

f−1 (x) = 996,


f0 (x) = 0,
f1 (x) = 1,
f2 (x) = 2,
f3 (x) = 3x4 + 18x2 + 84x + 988,
f4 (x) = 4x6 + 60x4 + 560x3 + 817x2 + 661x + 318,
f5 (x) = 5x12 + 186x10 + 666x9 + 52x8 + 55x7 + 80x6 + 20x5 +
753x4 + 382x3 + 653x2 + 586x + 760,
f6 (x) = 6x16 + 432x14 + 435x13 + 427x12 + 22x10 + 687x9 +
986x8 + 610x7 + 198x6 + 994x5 + 683x4 + 575x3 +
630x2 + 968x + 704,
f7 (x) = 7x24 + 924x22 + 689x21 + 333x20 + 639x19 + 154x18 +
666x17 + 562x16 + 144x15 + 365x14 + 905x13 + 61x12 +
420x11 + 552x10 + 388x9 + 457x8 + 342x7 + 169x6 +
618x5 + 213x4 + 507x3 + 738x2 + 664x + 223,
f8 (x) = 8x30 + 755x28 + 322x27 + 855x26 + 695x25 + 229x24 +
103x23 + 982x22 + 842x21 + 842x20 + 332x19 + 30x18 +
588x17 + 864x16 + 153x15 + 927x14 + 834x13 + 439x12 +
216x11 + 469x10 + 109x9 + 877x8 + 707x7 + 879x6 +
664x5 + 490x4 + 40x3 + 150x2 + 333x + 608.
Small prime r = 3: We first compute the gcd γ(x) = x + 770 ≠ 1, and we go to Case 2 of Schoof's algorithm. We have π = p rem r = 1 in this case, that is, p is a quadratic residue modulo r. A square root of p modulo r is w = 1. We compute the gcd δ(x) = x + 770, which is not 1 again. In the final step, we compute the gcd η(x) = x + 770, which is still not 1. We, therefore, conclude that t ≡ 2w ≡ 2 (mod 3).
Small prime r = 5 : We get the gcd γ(x) = f5 (x) itself. Since 997, that
is, 2 is a quadratic non-residue modulo 5, we have t ≡ 0 (mod 5) by Case 2.
Small prime r = 7 : In this case, we get γ(x) = 1, that is, we resort to
Case 1 of Schoof’s algorithm. For this r, π = p rem r = 3 is odd. Using the
appropriate formulas, we first calculate
    α ≡ (f5 f2² − f1 f4²)(x³ + ax + b) − 4(x³ + ax + b)^{(p²+1)/2} f3³
      ≡ 824x²³ + 137x²² + 532x²¹ + 425x²⁰ + 727x¹⁹ + 378x¹⁸ +
        669x¹⁷ + 906x¹⁶ + 198x¹⁵ + 305x¹⁴ + 47x¹³ + 65x¹² + 968x¹¹ +
        985x¹⁰ + 262x⁹ + 867x⁸ + 825x⁷ + 373x⁶ + 771x⁵ + 97x⁴ +
        921x³ + 789x² + 382x + 296 (mod f7(x)),

    β ≡ 4f3( (x − x^{p²}) f3² − f2 f4 (x³ + ax + b) )
      ≡ 151x²³ + 501x²² + 879x²¹ + 133x²⁰ + 288x¹⁹ + 451x¹⁸ + 781x¹⁷ +
        479x¹⁶ + 809x¹⁵ + 544x¹⁴ + 35x¹³ + 923x¹² + 983x¹¹ + 171x¹⁰ +
        359x⁹ + 675x⁸ + 633x⁷ + 828x⁶ + 438x⁵ + 955x⁴ + 979x³ +
        367x² + 212x + 195 (mod f7(x)).
Subsequently, we compute the four polynomials δi as:

    δ1 ≡ ( f2 f4 (x³ + ax + b) − f3²(x^{p²} + x^p + x) ) β²(x³ + ax + b) + f3² α²
       ≡ 0 (mod f7(x)),

    δ2 ≡ β²(x³ + ax + b) f3²
       ≡ 499x²³ + 124x²² + 310x²¹ + 236x²⁰ + 74x¹⁹ + 175x¹⁸ + 441x¹⁷ +
         259x¹⁶ + 557x¹⁵ + 14x¹⁴ + 431x¹³ + 339x¹² + 277x¹¹ + 520x¹⁰ +
         386x⁹ + 941x⁸ + 281x⁷ + 230x⁶ + 320x⁵ + 201x⁴ + 540x³ +
         89x² + 791x + 179 (mod f7(x)),

    δ3 ≡ 4(x³ + ax + b)^{(p−1)/2} [ αβ²(x³ + ax + b)( f3²(2x^{p²} + x) −
         f2 f4 (x³ + ax + b) ) − f3²( α³ + β³(x³ + ax + b)^{(p²+3)/2} ) ]
       ≡ 148x²³ + 845x²² + 755x²¹ + 370x²⁰ + 763x¹⁹ + 767x¹⁸ + 338x¹⁷ +
         938x¹⁶ + 719x¹⁵ + 497x¹⁴ + 885x¹³ + 509x¹² + 218x¹¹ + 467x¹⁰ +
         586x⁹ + 822x⁸ + 717x⁷ + 680x⁶ + 257x⁵ + 490x⁴ + 488x³ +
         365x² + 16x + 935 (mod f7(x)),

    δ4 ≡ β³(x³ + ax + b) f3²
       ≡ 446x²³ + 533x²² + 59x²¹ + 324x²⁰ + 148x¹⁹ + 778x¹⁸ + 521x¹⁷ +
         833x¹⁶ + 640x¹⁵ + 908x¹⁴ + 320x¹³ + 608x¹² + 645x¹¹ + 401x¹⁰ +
         369x⁹ + 720x⁸ + 629x⁷ + 923x⁶ + 707x⁵ + 427x⁴ + 156x³ +
         688x² + 19x + 340 (mod f7(x)).
Finally, we try τ = 1, 2, 3, 4, 5, 6. For each odd τ , we compute the polynomials
    Hx(x) ≡ δ1 fτ^{2p} + δ2 (fτ−1 fτ+1)^p (x³ + ax + b)^p (mod f7(x)),
    Hy(x) ≡ δ3 fτ^{3p} − δ4 (x³ + ax + b)^p (fτ+2 fτ−1² − fτ−2 fτ+1²)^p (mod f7(x)),

whereas for each even τ, we compute the same polynomials as

    Hx(x) ≡ δ1 fτ^{2p} (x³ + ax + b)^p + δ2 (fτ−1 fτ+1)^p (mod f7(x)),
    Hy(x) ≡ δ3 fτ^{3p} (x³ + ax + b)^{(3p+1)/2} −
            δ4 (x³ + ax + b)^{(p+1)/2} (fτ+2 fτ−1² − fτ−2 fτ+1²)^p (mod f7(x)).
It turns out that Hx (x) ≡ 0 (mod f7 (x)) for τ = 1 and 6, whereas Hy (x) ≡
0 (mod f7 (x)) only for τ = 1. It follows that t ≡ 1 (mod 7).
To sum up, we have computed the trace t as:
t ≡ 1 (mod 2),
t ≡ 2 (mod 3),
t ≡ 0 (mod 5),
t ≡ 1 (mod 7).
By CRT, t ≡ 155 (mod 210). But t is an integer in the range [−2√p, 2√p], that is, [−63, 63]. Thus, t = −55, that is, |EF997| = 997 + 1 − (−55) = 1053. ¤
It is easy to argue that the running time of Schoof's algorithm is polynomial in log p. Indeed, the most time-consuming steps are the exponentiations of polynomials to exponents of values Θ(p) or Θ(p²). These computations are done modulo fr(x) with deg fr = Θ(r²), and r = O(log p). A careful calculation shows that this algorithm runs in O(log⁸ p) time. Although this is a polynomial expression in log p, the large exponent (eight) makes the algorithm somewhat impractical.
Several modifications of this original algorithm are proposed in the litera-
ture. The SEA (Schoof–Elkies–Atkin) algorithm reduces the exponent to six
by using a suitable divisor of degree O(r) of fr (x) as the modulus. Adaptations
of the SEA algorithm for the fields F2n are also proposed. From a practical
angle, the algorithms in the SEA family are reasonable, and, in feasible time,
can handle fields of sizes as large as 2000 bits. For prime fields, they are the
best algorithms known to date. For fields of characteristic two, more efficient
algorithms are known.
Exercises
1. Prove that the Weierstrass equation of an elliptic curve is irreducible, that is,
the polynomial Y 2 + (a1 X + a3 )Y − (X 3 + a2 X 2 + a4 X + a6 ) with ai ∈ K is
irreducible in K[X, Y ].
2. Prove that an elliptic or hyperelliptic curve is smooth at its point at infinity.
3. Let C : Y² = f(X) be the equation of a cubic curve C over a field K with char K ≠ 2, where f(X) = X³ + aX² + bX + c with a, b, c ∈ K. Prove that C is an elliptic curve (that is, smooth or non-singular) if and only if Discr(f) ≠ 0 (or, equivalently, if and only if f(X) has no multiple roots).
4. Let K be a finite field of characteristic two, and a, b, c ∈ K. Prove that:
(a) The curve Y² + aY = X³ + bX + c is smooth if and only if a ≠ 0.
(b) The curve Y² + XY = X³ + aX² + b is smooth if and only if b ≠ 0.
5. Determine which of the following curves is/are smooth (that is, elliptic curves).
(a) Y 2 = X 3 − X 2 − X + 1 over Q.
(b) Y 2 + 2Y = X 3 + X 2 over Q.
(c) Y 2 + 2XY = X 3 + 1 over Q.
(d) Y 2 + 4XY = X 3 + 4X over Q.
(e) Y 2 + Y = X 3 + 5 over F7 .
(f ) Y 2 + Y = X 3 + 5 over F11 .
6. Let the elliptic curve E : Y 2 = X 3 + 2X + 3 be defined over F7 . Take the
points P = (2, 1) and Q = (3, 6) on E.
(a) Compute the points P + Q, 2P and 3Q on the curve.
(b) Determine the order of P in the elliptic curve group E(F7 ).
(c) Find the number of points on E treated as an elliptic curve over F49 = F7².
7. Let P = (h, k) be a point with 2P = (h′, k′) ≠ O on the elliptic curve Y² = X³ + aX² + bX + c. Verify that

    h′ = (h⁴ − 2bh² − 8ch − 4ac + b²)/(4k²)
       = (h⁴ − 2bh² − 8ch − 4ac + b²)/(4h³ + 4ah² + 4bh + 4c),  and

    k′ = ( h⁶ + 2ah⁵ + 5bh⁴ + 20ch³ + (20ac − 5b²)h² + (8a²c − 2ab² − 4bc)h +
           (4abc − b³ − 8c²) ) / (8k³).
8. Let K be a finite field of characteristic two, and a, b, c ∈ K. Prove that:
(a) The supersingular curve E1 : Y 2 + aY = X 3 + bX + c contains no points
of order two. In particular, the size of (E1 )K is odd.
(b) The ordinary curve E2 : Y 2 + XY = X 3 + aX 2 + b contains exactly one
point of order two. In particular, the size of (E2 )K is even.
9. Let E be an elliptic curve defined over a field K with char K ≠ 2, 3. Prove
that E has at most eight points of order three. If K is algebraically closed,
prove that E has exactly eight points of order three.
10. Consider the elliptic curve E : Y² = f(X) with f(X) = X³ + aX² + bX + c ∈ K[X], defined over a finite field K with char K ≠ 2. Prove that:
(a) The size of the group EK is odd if and only if f (X) is irreducible in K[X].
(b) If f (X) splits in K[X], then EK is not cyclic.
11. Take P = (3, 5) on the elliptic curve E : Y 2 = X 3 − 2 over Q. Compute the
point 2P on E. Use the Nagell–Lutz theorem to establish that P has infinite
order. Conclude that the converse of the Nagell–Lutz theorem is not true.
12. Consider the elliptic curve E : Y 2 = X 3 + 17 over Q. Demonstrate that the
points P1 = (−1, 4), P2 = (−2, 3), P3 = (2, 5), P4 = (4, 9), P5 = (8, 23),
P6 = (43, 282), P7 = (52, 375), and P8 = (5234, 378661) lie on E. Verify that
P3 = P1 + 2P2 , P4 = −P1 − P2 , P5 = −2P2 , P6 = 2P1 + 3P2 , P7 = −P1 + P2
and P8 = 3P1 + 2P2 . (Remark: It is known that the only points on E with
integer coordinates are the points P1 through P8 , and their opposites.)
13. Consider the ordinary curve E : Y² + XY = X³ + X² + (θ + 1) defined over F8 = F2(θ) with θ³ + θ + 1 = 0. Find all the points on EF8. Prepare the group table for EF8.
14. Let p be an odd prime with p ≡ 2 (mod 3), and let a ≢ 0 (mod p). Prove that the elliptic curve Y² = X³ + a defined over Fp contains exactly p + 1 points.
15. Let p be a prime with p ≡ 3 (mod 4), and let a ≢ 0 (mod p). Prove that the elliptic curve Y² = X³ + aX defined over Fp contains exactly p + 1 points.
16. Find the order of the torsion point (3, 8) on the curve E : Y 2 = X 3 −43X +166
defined over Q. Determine the structure of the torsion subgroup of EQ .
17. Let E be an elliptic curve defined over Fq and so over Fq^n for all n ∈ N. Let (h, k) be a finite point on EFq^n. Prove that ϕ(h, k) = (h^q, k^q) is again on EFq^n.
18. Prove that a supersingular elliptic curve cannot be anomalous.
19. Let E be an anomalous elliptic curve defined over Fq . Prove that EFq is cyclic.
20. Let E denote the ordinary elliptic curve Y² + XY = X³ + X² + 1 defined over F2^n for all n ∈ N. Prove that the size of the group EF2^n is
    2^n + 1 − (1/2^{n−1}) [ 1 − C(n, 2)·7 + C(n, 4)·7² − · · · + (−1)^r C(n, 2r)·7^r ]
for all n ∈ N, where r = ⌊n/2⌋ and C(n, k) denotes the binomial coefficient.
21. Let E denote the elliptic curve Y² = X³ + X² + X + 1 defined over F3^n for all n ∈ N. Prove that the size of the group EF3^n is
    3^n + 1 + (−1)^{n−1} × 2 × [ 1 − C(n, 2)·2 + C(n, 4)·2² − · · · + (−1)^r C(n, 2r)·2^r ]
for all n ∈ N, where r = ⌊n/2⌋ and C(n, k) denotes the binomial coefficient.
22. Determine the number of points on the curves Y 2 +Y = X 3 +X and Y 2 +Y =
X 3 + X + 1 over the field F2n . Conclude that for all n ∈ N, these curves are
supersingular.
23. Determine the number of points on the curves Y 2 = X 3 ± X and Y 2 =
X 3 + X ± 1 over the field F3n . Conclude that for all n ∈ N, these curves are
supersingular.
24. Determine the number of points on the curves Y 2 = X 3 − X ± 1 over the field
F3n . Conclude that for all n ∈ N, these curves are supersingular.
25. Let E be a supersingular elliptic curve defined over a prime field Fp with p ≥ 5. Determine the size of EFp^n, and conclude that E remains supersingular over all extensions Fp^n, n ≥ 1.
26. Rewrite the square-and-multiply exponentiation algorithm (Algorithm 1.4)
for computing the multiple of a point on an elliptic curve. (In the context of
elliptic-curve point multiplication, we call this a double-and-add algorithm.)
27. [Eisenträger, Lauter and Montgomery] In the double-and-add elliptic-curve
point-multiplication algorithm, we need to compute points 2P +Q for every 1-
bit in the multiplier. Conventionally, this is done as (P +P )+Q. Assuming that
the curve is given by the short Weierstrass equation, count the field operations
used in computing (P + P ) + Q. Suppose instead that 2P + Q is computed
as (P + Q) + P . Argue that we may avoid computing the Y -coordinate of
P + Q. What saving does the computation of (P + Q) + P produce (over that
of (P + P ) + Q)?
28. Find the points at infinity (over R and C) on the following real curves.
(a) Ellipses of the form X²/a² + Y²/b² = 1.
(b) Hyperbolas of the form X²/a² − Y²/b² = 1.
(c) Hyperbolas of the form XY = a.
29. Let C : f (X, Y ) = 0 be a curve defined by a non-constant irreducible polyno-
mial f (X, Y ) ∈ K[X, Y ]. Let d be deg f (X, Y ), and fd (X, Y ) the sum of all
non-zero terms of degree d in f (X, Y ). Prove that all points at infinity on C
are obtained by solving fd (X, Y ) = 0. Conclude that all the points at infinity
on C can be obtained by solving a univariate polynomial equation over K.
30. [Projective coordinates] Projective coordinates are often used to speed up
elliptic-curve arithmetic. In the projective plane, a finite point (h, k) corresponds to the point [h′, k′, l′] with l′ ≠ 0, h = h′/l′, and k = k′/l′. Let E be
an elliptic curve defined by the special Weierstrass equation Y 2 = X 3 +aX +b,
and the finite points P1 , P2 on E have projective coordinates [h1 , k1 , l1 ] and
[h2 , k2 , l2 ]. Further, let P1 + P2 have projective coordinates [h, k, l], and 2P1
have projective coordinates [h′ , k ′ , l′ ].
(a) Express h, k, l as polynomials in h1 , k1 , l1 , h2 , k2 , l2 .
(b) Express h′ , k ′ , l′ as polynomials in h1 , k1 , l1 .
(c) Show how the double-and-add point-multiplication algorithm (Exercise
4.26) can benefit from the representation of points in projective coordinates.
31. [Mixed coordinates]49 Take the elliptic curve Y 2 = X 3 +aX +b. Suppose that
the point P1 = [h′1 , k1′ , l1′ ] on the curve is available in projective coordinates,
whereas the point P2 = (h2 , k2 ) is available in affine coordinates. Express the
projective coordinates of P1 +P2 as polynomial expressions in h′1 , k1′ , l1′ , h2 , k2 .
What impact does this have on the point-multiplication algorithm?
49 Henri Cohen, Atsuko Miyaji and Takatoshi Ono, Efficient elliptic curve exponentiation using mixed coordinates, AsiaCrypt, 51–65, 1998.
32. [Generalized projective coordinates] Let c, d ∈ N be constant. Define a relation ∼ on K³ \ {(0, 0, 0)} as (h, k, l) ∼ (h′, k′, l′) if and only if h = λ^c h′, k = λ^d k′, and l = λl′ for some non-zero λ ∈ K.
(a) Prove that ∼ is an equivalence relation.
Denote the equivalence class of (h, k, l) as [h, k, l]c,d . (The standard projective
coordinates correspond to c = d = 1.)
(b) Let C : f(X, Y) = 0 be an affine curve defined over K. Replace X by X/Z^c and Y by Y/Z^d in f(X, Y), and clear the denominator by multiplying with the smallest power of Z, in order to obtain the polynomial equation
C (c,d) : f (c,d) (X, Y, Z) = 0. Demonstrate how the finite points on C can be
represented in the generalized projective coordinates [h, k, l]c,d .
(c) How can one obtain the points at infinity on C (c,d) ?
33. [Jacobian coordinates] Take c = 2, d = 3 in Exercise 4.32. Let char K ≠ 2, 3.
(a) Convert E : Y 2 = X 3 + aX + b to the form E (2,3) : f (X, Y, Z) = 0.
(b) What is the point at infinity on E (2,3) ?
(c) What is the opposite of a finite point on E (2,3) ?
(d) Write the point-addition formula for finite points on E (2,3) .
(e) Write the point-doubling formula for finite points on E (2,3) .
34. [Jacobian-affine coordinates] Let P1 , P2 be finite points on E : Y 2 = X 3 +
aX +b with P1 available in Jacobian coordinates, and P2 in affine coordinates.
Derive the Jacobian coordinates of P1 + P2. (Take char K ≠ 2, 3.)
35. Repeat Exercise 4.33 for the ordinary curve E : Y 2 + XY = X 3 + aX 2 + b
defined over F2n .
36. [Chudnovsky coordinates]50 Let char K ≠ 2, 3. Take the elliptic curve E :
Y 2 = X 3 + aX + b over K. In addition to the usual Jacobian coordinates
(X, Y, Z), the Chudnovsky coordinate system stores Z 2 and Z 3 , that is, a
point is represented as (X, Y, Z, Z 2 , Z 3 ).
(a) Describe how point addition in E becomes slightly faster in the Chud-
novsky coordinate system compared to the Jacobian coordinate system.
(b) Compare Chudnovsky and Jacobian coordinates for point doubling in E.
37. [López–Dahab coordinates]51 For the ordinary curve E : Y 2 + XY = X 3 +
aX 2 + b defined over F2n , take c = 1 and d = 2.
(a) What is the point at infinity on E (1,2) ?
(b) What is the opposite of a finite point on E (1,2) ?
(c) Write the point-addition formula for finite points on E (1,2) .
(d) Write the point-doubling formula for finite points on E (1,2) .
38. [LD-affine coordinates] Take K = F2n . Let P1 , P2 be finite points on E :
Y 2 + XY = X 3 + aX 2 + b with P1 available in López–Dahab coordinates, and
P2 in affine coordinates. Derive the López–Dahab coordinates of P1 + P2 .
50 David V. Chudnovsky and Gregory V. Chudnovsky, Sequences of numbers generated by addition in formal groups and new primality and factorization tests, Advances in Applied Mathematics, 7(4), 385–434, 1986.
51 Julio López and Ricardo Dahab, Improved algorithms for elliptic curve arithmetic in GF(2^n), Technical Report IC-98-39, Relatório Técnico, October 1998.
39. Prove that the norm function defined by Eqn (4.11) is multiplicative, that is,
N(G1 G2 ) = N(G1 ) N(G2 ) for all polynomial functions G1 , G2 ∈ K[C].
40. Consider the unit circle C : X² + Y² = 1 as a complex curve. Find all the zeros and poles of the rational function R(x, y) of Example 4.35. Also determine the multiplicities of these zeros and poles. (Hint: Use the factored form given in Eqn (4.12). Argue that 1/x can be taken as a uniformizer at each of the two points at infinity on C.)
41. Consider the real hyperbola H : X² − Y² = 1. Find all the zeros and poles (and their respective multiplicities) of the following rational function on H:

R(x, y) = (2y⁴ − 2y³x − y² + 2yx − 1) / (y² + yx + y + x + 1).

(Hint: Split the numerator and the denominator of R into linear factors.)
42. Repeat Exercise 4.41 treating the hyperbola H as being defined over F5 .
43. Find all the zeros and poles (and their multiplicities) of the rational function x/y on the curve Y² = X³ − X defined over C.
44. Find all the zeros and poles (and their multiplicities) of the function x² + yx on the curve Y² = X³ + X defined over F_3.
45. Find all the zeros and poles (and their multiplicities) of the function 1 + yx on the curve Y² = X³ + X − 1 defined over the algebraic closure F̄_7 of F_7.
46. Prove that the q-th power Frobenius map ϕ_q (Definition 4.47) is an endomorphism of E = E_{F̄_q}.
47. Prove that an admissible change of variables (Theorem 4.51) does not change
the j-invariant (Definition 4.9).
48. Let K be a field of characteristic ≠ 2, 3. Prove that the elliptic curves E : Y² = X³ + aX + b and E′ : Y² = X³ + a′X + b′ defined over K are isomorphic over K̄ if and only if there exists a non-zero u ∈ K̄ such that replacing X by u²X and Y by u³Y converts the equation for E to the equation for E′.
49. (a) Find all isomorphism classes of elliptic curves defined over F_5, where isomorphism is over the algebraic closure F̄_5 of F_5.
(b) Argue that the curves Y² = X³ + 1 and Y² = X³ + 2 are isomorphic over the algebraic closure F̄_5, but not over F_5.
(c) According to Definition 4.50, isomorphism of elliptic curves E and E′ is defined by the existence of bijective bilinear maps, not by the isomorphism of the groups E_K and E′_K (where K is a field over which both E and E′ are defined). As an example, show that the curves Y² = X³ + 1 and Y² = X³ + 2 have isomorphic groups over F_5.
50. Consider the elliptic curves E : Y² = X³ + 4X and E′ : Y² = X³ + 4X + 1, both defined over F_5.
(a) Determine the group structures of E_{F_5} and E′_{F_5}.
(b) Demonstrate that the rational map

φ(x, y) = ( (x² − x + 2)/(x − 1) , (x²y − 2xy − y)/(x² − 2x + 1) )
260 Computational Number Theory

is an isogeny from E_{F_5} to E′_{F_5}.


(c) Demonstrate that the rational map

φ̂(x, y) = ( (x² + 2x + 1)/(x + 2) , (x²y − xy − 2y)/(x² − x − 1) )

is an isogeny from E′_{F_5} to E_{F_5}.
(d) Verify that φ̂ ◦ φ is the multiplication-by-two map of E_{F_5}, and φ ◦ φ̂ is the multiplication-by-two map of E′_{F_5}.
(Remark: It can be proved that a non-trivial isogeny exists between two curves E and E′ defined over F_q if and only if the groups E_{F_q} and E′_{F_q} have the same size. Moreover, for every isogeny φ : E → E′, there exists a unique isogeny φ̂ : E′ → E such that φ̂ ◦ φ is the multiplication-by-m map of E_{F_q}, and φ ◦ φ̂ is the multiplication-by-m map of E′_{F_q}, for some integer m. We call m the degree of the isogeny φ, and φ̂ the dual isogeny for φ. For example, the isogeny of Part (b) is of degree two, with dual given by the isogeny of Part (c). Another isogeny E → E′ is defined by the rational map

ψ(x, y) = ( (x⁴ − x³ + x + 1)/(x³ − x² − x) , (x⁵y + x⁴y + x³y − 2x²y + 2xy − 2y)/(x⁵ + x⁴ + 2x³ − 2x²) ).

The corresponding dual isogeny ψ̂ : E′ → E is

ψ̂(x, y) = ( (x⁴ − x³ + 2)/(x³ − x² + 2) , (x⁵y + 2x⁴y − x³y − x²y − xy)/(x⁵ + 2x⁴ − x³ + x − 1) ).

These isogenies are of degree four.
The curves Y² = X³ + 1 and Y² = X³ + 2 have isomorphic (and so equinumerous) groups over F_5 (Exercise 4.49(c)), so there exists an isogeny between these curves. But no such isogeny can be bijective. If we allow rational maps over F̄_5, a bijective isogeny can be computed between these curves.)
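The claims of Exercise 50 can be probed by brute force over F_5. The following sketch (an illustration, not part of the text's solution) checks that φ carries every finite point of E(F_5) outside its pole at x = 1 to a point of E′(F_5), and that the two curves have equally many finite points.

```python
p = 5
E  = lambda x, y: (y * y - (x ** 3 + 4 * x)) % p == 0       # E : Y^2 = X^3 + 4X
E2 = lambda x, y: (y * y - (x ** 3 + 4 * x + 1)) % p == 0   # E': Y^2 = X^3 + 4X + 1

def inv(z):
    return pow(z, p - 2, p)

def phi(x, y):
    # phi(x, y) = ((x^2 - x + 2)/(x - 1), (x^2 y - 2xy - y)/(x^2 - 2x + 1))
    return ((x * x - x + 2) * inv(x - 1) % p,
            (x * x * y - 2 * x * y - y) * inv(x * x - 2 * x + 1) % p)

pts_E  = [(x, y) for x in range(p) for y in range(p) if E(x, y)]
pts_E2 = [(x, y) for x in range(p) for y in range(p) if E2(x, y)]

# x = 1 is the pole of phi: the 2-torsion point (1, 0) is sent to O.
images = [phi(x, y) for (x, y) in pts_E if x != 1]
ok = all(E2(x, y) for (x, y) in images)
```

Counting the point at infinity on each side, both groups have 8 elements, consistent with the remark that isogenous curves over F_q have equinumerous groups.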
51. Derive Eqns (4.16) and (4.17).
52. Prove that Weil pairing is bilinear, alternating and skew-symmetric.
53. Prove that Tate pairing is bilinear.
54. Derive Eqns (4.22), (4.23) and (4.24).
55. Rewrite Miller's algorithm using Eqn (4.26), for computing e_m(P1, P2).
56. Rewrite Miller's algorithm using Eqn (4.29), for computing ê_m(P, Q).
57. Write an algorithm that, for two input points P, Q on an elliptic curve E, returns the monic polynomial representing the line L_{P,Q} through P and Q. Your algorithm should handle all possible inputs including P = ±Q, and one or both of P, Q being O.
58. Suppose that Q = mQ′ for some Q′ ∈ E_L (see the definition of Tate pairing). Prove that ⟨P, Q⟩_m = 1. This observation leads us to define Tate pairing on E_L/mE_L (in the second argument) instead of on E_L itself.

59. [Blake, Murty and Xu] Let U ∈ E[m] be non-zero, and Q ≠ O, U, 2U, . . . , (m − 1)U. Prove the following assertions.

(a) L_{U,U}(Q) / (L_{U,−U}(Q)² L_{2U,−2U}(Q)) = −1 / L_{U,U}(−Q).

(b) L_{(k+1)U,kU}(Q) / (L_{(k+1)U,−(k+1)U}(Q) L_{(2k+1)U,−(2k+1)U}(Q)) = −L_{kU,−kU}(Q) / L_{(k+1)U,kU}(−Q) for k ∈ Z.

(c) L_{2U,U}(Q) / (L_{2U,−2U}(Q) L_{3U,−3U}(Q)) = −L_{U,−U}(Q) / L_{2U,U}(−Q).
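Identity (a) of Exercise 59 follows from the factorization L_{U,U}(Q) · L_{U,U}(−Q) = −(x_Q − x_U)²(x_Q − x_{2U}), obtained by multiplying the monic tangent line by its reflection. The sketch below checks (a) numerically; the curve y² = x³ + x + 1 over F_101 is an arbitrary illustration, not a curve from the text.

```python
p, a, b = 101, 1, 1      # arbitrary illustrative curve y^2 = x^3 + x + 1 over F_101

def inv(z):
    return pow(z, p - 2, p)

def tangent(U):
    # Monic tangent line L_{U,U}: Q |-> y_Q - (lam*x_Q + nu), tangent at U.
    x, y = U
    lam = (3 * x * x + a) * inv(2 * y) % p
    nu = (y - lam * x) % p
    return lambda Q: (Q[1] - lam * Q[0] - nu) % p

def double(U):
    x, y = U
    lam = (3 * x * x + a) * inv(2 * y) % p
    x2 = (lam * lam - 2 * x) % p
    return (x2, (lam * (x - x2) - y) % p)

pts = [(x, y) for x in range(p) for y in range(p)
       if (y * y - (x ** 3 + a * x + b)) % p == 0 and y != 0]
U = pts[0]
U2 = double(U)
L = tangent(U)

checked = 0
for Q in pts:
    v1 = (Q[0] - U[0]) % p      # vertical line L_{U,-U} evaluated at Q
    v2 = (Q[0] - U2[0]) % p     # vertical line L_{2U,-2U} evaluated at Q
    denom = v1 * v1 * v2 % p
    negQ = (Q[0], (-Q[1]) % p)
    if denom and L(negQ):
        # (a): L_{U,U}(Q) / (L_{U,-U}(Q)^2 L_{2U,-2U}(Q)) = -1 / L_{U,U}(-Q)
        lhs = L(Q) * inv(denom) % p
        rhs = (-inv(L(negQ))) % p
        assert lhs == rhs
        checked += 1
```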
60. Establish the correctness of Algorithm 4.4.
61. [Blake, Murty and Xu] Prove that the loop body of Algorithm 4.1 for the computation of f_{n,P}(Q) can be replaced as follows:

If (n_i = 0), then update f = −f² × (L_{2U,−2U}(Q) / L_{U,U}(−Q)) and U = 2U,
else update f = −f² × (L_{2U,P}(Q) / L_{U,U}(−Q)) and U = 2U + P.

Explain what speedup this modification is expected to produce.


62. Prove that for all n, n′ ∈ N, the functions f_{n,P} in Miller's algorithm satisfy

f_{nn′,P} = f_{n,P}^{n′} f_{n′,nP} = f_{n′,P}^{n} f_{n,n′P}.
63. Define the functions f_{n,P,S} as rational functions having the divisor

Div(f_{n,P,S}) = n[P + S] − n[S] − [nP] + [O].

Suppose that mP = O, so that Div(f_{m,P,S}) = m[P + S] − m[S]. For Weil and Tate pairing, we are interested in S = O only. Nonetheless, it is interesting to study the general functions f_{n,P,S}. Prove that:

(a) f_{0,P,S} = 1, and f_{1,P,S} = L_{P+S,−(P+S)} / L_{P,S}.

(b) For n ∈ N, we have f_{n+1,P,S} = f_{n,P,S} × f_{1,P,S} × L_{nP,P} / L_{(n+1)P,−(n+1)P}.

(c) For n, n′ ∈ N, we have f_{n+n′,P,S} = f_{n,P,S} × f_{n′,P,S} × L_{nP,n′P} / L_{(n+n′)P,−(n+n′)P}.
64. Rewrite Miller's Algorithm 4.1 to compute the functions f_{n,P,S}.
65. Assume that Weil/Tate pairing is restricted to suitable groups of order m.
Prove that Weil/Tate pairing under the distortion map is symmetric about
its two arguments. Does symmetry hold for twisted pairings too?
66. Deduce that the embedding degrees of the following supersingular curves are as mentioned. Recall that for a supersingular curve, the embedding degree must be one of 1, 2, 3, 4, 6. Explicit examples for all these cases are given here. In each case, take m to be a suitably large prime divisor of the size of the elliptic-curve group E_q (where E is defined over F_q).
(a) The curve Y² = X³ + a defined over F_p for an odd prime p ≡ 2 (mod 3) and with a ≢ 0 (mod p) has embedding degree two (see Exercise 4.14).

(b) The curve Y² = X³ + aX defined over F_p for a prime p ≡ 3 (mod 4) and with a ≢ 0 (mod p) has embedding degree two (Exercise 4.15).
(c) The curve Y² + Y = X³ + X + a with a = 0 or 1, defined over F_{2^n} with odd n, has embedding degree four (Exercise 4.22).
(d) The curve Y² = X³ − X + a with a = ±1, defined over F_{3^n} with n divisible by neither 2 nor 3, has embedding degree six (Exercise 4.24).
(e) Let p ≡ 5 (mod 6) be a prime. Let a ∈ F_{p²} be a square but not a cube. It is known that the curve Y² = X³ + a defined over F_{p²} contains exactly p² − p + 1 points. This curve has embedding degree three.
(f) Let E be a supersingular curve defined over a prime field F_p with p > 5. E considered as a curve over F_{p^n} with even n is supersingular (Exercise 4.25), and has embedding degree one.

67. Prove that the following distortion maps are group homomorphisms. (In this exercise, k is not used to denote the embedding degree.)
(a) For the curve Y² = X³ + a defined over F_p for an odd prime p ≡ 2 (mod 3) and with a ≢ 0 (mod p), the map (h, k) ↦ (θh, k), where θ³ = 1.
(b) For the curve Y² = X³ + aX defined over F_p for a prime p ≡ 3 (mod 4) and with a ≢ 0 (mod p), the map (h, k) ↦ (−h, θk), where θ² = −1.
(c) For the curve Y² + Y = X³ + X + a with a = 0 or 1, defined over F_{2^n} with odd n, the map (h, k) ↦ (θh + ζ², k + θζh + ζ), where θ ∈ F_{2²} satisfies θ² + θ + 1 = 0, and ζ ∈ F_{2⁴} satisfies ζ² + θζ + 1 = 0.
(d) For the curve Y² = X³ − X + a with a = ±1, defined over F_{3^n} with n divisible by neither 2 nor 3, the map (h, k) ↦ (ζ − h, θk), where θ ∈ F_{3²} satisfies θ² = −1, and ζ ∈ F_{3³} satisfies ζ³ − ζ − a = 0.
(e) Let p ≡ 5 (mod 6) be a prime, a ∈ F_{p²} a square but not a cube, and let γ ∈ F_{p⁶} satisfy γ³ = a. For the curve Y² = X³ + a defined over F_{p²}, the distortion map is (h, k) ↦ (h^p/(γ a^{(p−2)/3}), k^p/a^{(p−1)/2}).

68. Let E : Y² = X³ + aX + b be an elliptic curve defined over a field F_q of odd characteristic > 5. A quadratic twist of E is defined as E′ : Y² = X³ + v²aX + v³b, where v ∈ F∗_q is a quadratic non-residue.
(a) Show that the j-invariant of E is j(E) = 1728 · 4a³/(4a³ + 27b²).
(b) Conclude that j(E) = j(E′). (Thus, E and E′ are isomorphic over F̄_q.)
(c) Prove that |E_q| + |E′_q| = 2(q + 1).

69. Let E : Y² = X³ + aX + b be an elliptic curve defined over F_p with p > 5 a prime. Prove that the trace of Frobenius at p is

−Σ_{x=0}^{p−1} ( (x³ + ax + b) / p ),

where (c/p) is the Legendre symbol.
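The formula in Exercise 69 can be confirmed by direct counting for a small prime: each x contributes 1 + ((x³ + ax + b)/p) affine points, so |E_p| = p + 1 + Σ, and the trace t = p + 1 − |E_p| equals −Σ. The curve parameters below (p = 13, a = b = 1) are an arbitrary illustration.

```python
p, a, b = 13, 1, 1   # arbitrary illustrative curve (4a^3 + 27b^2 = 31 != 0 mod 13)

def legendre(c):
    # Legendre symbol (c/p) via Euler's criterion, as a value in {-1, 0, 1}.
    c %= p
    if c == 0:
        return 0
    return 1 if pow(c, (p - 1) // 2, p) == 1 else -1

# Count points by enumeration: the point at infinity plus all affine solutions.
N = 1 + sum(1 for x in range(p) for y in range(p)
            if (y * y - (x ** 3 + a * x + b)) % p == 0)
trace = p + 1 - N
char_sum = sum(legendre(x ** 3 + a * x + b) for x in range(p))
```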

70. Edwards curves were proposed by Harold M. Edwards,52 and later modified to suit elliptic-curve cryptography by Bernstein and Lange.53 For finite fields K with char K ≠ 2, an elliptic curve defined over K is equivalent to an Edwards curve over a suitable extension of K (the extension may be K itself). A unified addition formula (no distinction between addition and doubling, and a uniform treatment of all group elements including the identity) makes Edwards curves attractive and efficient alternatives to elliptic curves. An Edwards curve over a non-binary finite field K is defined by the equation

D : X² + Y² = c²(1 + dX²Y²) with 0 ≠ c, d ∈ K and dc⁴ ≠ 1.

Suppose also that d is a quadratic non-residue in K. Define an operation on two finite points P1 = (h1, k1) and P2 = (h2, k2) on D as

P1 + P2 = ( (h1k2 + k1h2) / (c(1 + dh1h2k1k2)) , (k1k2 − h1h2) / (c(1 − dh1h2k1k2)) ).

Let P, P1, P2, P3 be arbitrary finite points on D. Prove that:
(a) P1 + P2 is again a finite point on D.
(b) P1 + P2 = P2 + P1.
(c) P + O = O + P = P, where O is the finite point (0, c).
(d) If P = (h, k) on D, we have P + Q = O, where Q = (−h, k).
The finite points on D constitute an additive Abelian group. Directly proving the associativity of this addition is very painful.
Now, let e = 1 − dc⁴, and define the elliptic curve

E : (1/e) Y² = X³ + (4/e − 2) X² + X.

For a finite point P = (h, k) on D, define

φ(P) = O if P = (0, c);
φ(P) = (0, 0) if P = (0, −c);
φ(P) = ( (c + k)/(c − k) , 2c(c + k)/((c − k)h) ) if h ≠ 0.

(e) Prove that φ maps points on D to points on E.
It turns out that for any two points P, Q on D, we have φ(P + Q) = φ(P) + φ(Q), where the additions on the two sides correspond to the curves D and E, respectively. This correspondence can be proved with involved calculations.
52 Harold M. Edwards, A normal form for elliptic curves, Bulletin of the American Mathematical Society, 44, 393–422, 2007.
53 Daniel J. Bernstein and Tanja Lange, A complete set of addition laws for incomplete Edwards curves, Journal of Number Theory, 131, 858–872, 2011.
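Parts (a)–(d) of Exercise 70 can be exercised mechanically on a small instance. The parameters below (K = F_13, c = 1, d = 2, with 2 a quadratic non-residue modulo 13 and dc⁴ ≠ 1) are arbitrary illustrative choices; for d a non-square the denominators never vanish on curve points, so the law is checked over all pairs of finite points.

```python
p, c, d = 13, 1, 2   # arbitrary parameters: 2 is a quadratic non-residue mod 13

def inv(z):
    return pow(z, p - 2, p)

def on_curve(h, k):
    # D : X^2 + Y^2 = c^2 (1 + d X^2 Y^2)
    return (h * h + k * k - c * c * (1 + d * h * h * k * k)) % p == 0

def add(P1, P2):
    # unified Edwards addition law from the exercise
    h1, k1, h2, k2 = *P1, *P2
    t = d * h1 * h2 * k1 * k2
    return ((h1 * k2 + k1 * h2) * inv(c * (1 + t)) % p,
            (k1 * k2 - h1 * h2) * inv(c * (1 - t)) % p)

pts = [(h, k) for h in range(p) for k in range(p) if on_curve(h, k)]
O = (0, c % p)   # the identity element (0, c)

closure = all(add(P, Q) in pts for P in pts for Q in pts)            # part (a)
commutative = all(add(P, Q) == add(Q, P) for P in pts for Q in pts)  # part (b)
identity = all(add(P, O) == P for P in pts)                          # part (c)
inverse = all(add(P, ((-P[0]) % p, P[1])) == O for P in pts)         # part (d)
```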



Programming Exercises

71. Let p be a small prime. Write a GP/PARI program that, given an elliptic curve E over F_p, finds all the points on E_p = E_{F_p}, calculates the size of E_p, computes the order of each point in E_p, and determines the group structure of E_p.
72. Repeat Exercise 4.71 for elliptic curves over binary fields F_{2^n} for small n.
73. Write a GP/PARI program that, given a small prime p, an elliptic curve E defined over F_p, and an n ∈ N, outputs the size of the group E_{p^n} = E_{F_{p^n}}.
74. Write a GP/PARI function that, given an elliptic curve over a finite field (not
necessarily small), returns a random point on the curve.
75. Write a GP/PARI function that, given points U, V, Q on a curve, computes the
equation of the line passing through U and V , and returns the value of the
function at Q. Assume Q to be a finite point, but handle all cases for U, V .
76. Implement the reduced Tate pairing using the function of Exercise 4.75. Consider a supersingular curve Y² = X³ + aX over a prime field F_p with p ≡ 3 (mod 4) and with m = (p + 1)/4 a prime.
77. Implement the distorted Tate pairing on the curve of Exercise 4.76.
Chapter 5
Primality Testing

5.1 Introduction to Primality Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266


5.1.1 Pratt Certificates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 266
5.1.2 Complexity of Primality Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
5.1.3 Sieve of Eratosthenes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 268
5.1.4 Generating Random Primes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269
5.1.5 Handling Primes in the GP/PARI Calculator . . . . . . . . . . . . . . . . . . . . 270
5.2 Probabilistic Primality Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
5.2.1 Fermat Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 271
5.2.2 Solovay–Strassen Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
5.2.3 Miller–Rabin Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275
5.2.4 Fibonacci Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
5.2.5 Lucas Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 280
5.2.6 Other Probabilistic Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
5.3 Deterministic Primality Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 284
5.3.1 Checking Perfect Powers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
5.3.2 AKS Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 287
5.4 Primality Tests for Numbers of Special Forms . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
5.4.1 Pépin Test for Fermat Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
5.4.2 Lucas–Lehmer Test for Mersenne Numbers . . . . . . . . . . . . . . . . . . . . . . . 290
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293

An integer p > 1 is called prime if its only positive integral divisors are 1 and
p. Equivalently, p is prime if and only if p|ab implies p|a or p|b. An integer
n > 1 is called composite if it is not prime, that is, if n has an integral divisor
u with 1 < u < n. The integer 1 is treated as neither prime nor composite.
One can extend the notion of primality to the set of all integers. The addi-
tive identity 0 and the multiplicative units ±1 are neither prime nor composite.
A non-zero non-unit p ∈ Z is called prime if a factorization p = uv necessarily
implies that either u or v is a unit. Thus, we now have the negative primes
−2, −3, −5, . . . . In this book, an unqualified use of the term prime indicates
positive primes. The set of all (positive) primes is denoted by P.
P is an infinite set (Theorem 1.69). Given n ∈ N, there exists a prime (in
fact, infinitely many primes) larger than n. The asymptotic density of primes
(the prime number theorem) and related results are discussed in Section 1.9.

Theorem 5.1 [Fundamental theorem of arithmetic]1 Any non-zero integer n has a factorization of the form n = u p_1 p_2 · · · p_t, where u = ±1, and p_1, p_2, . . . , p_t are (positive) primes for some t ≥ 0. Moreover, such a factorization is unique up to rearrangement of the prime factors p_1, p_2, . . . , p_t. ⊳
1 Euclid seems to have been the first to provide a complete proof of this theorem.


Considering the possibility of repeated prime factors, one can rewrite the factorization of n as n = u q_1^{e_1} q_2^{e_2} · · · q_r^{e_r}, where q_1, q_2, . . . , q_r are pairwise distinct primes, and e_i ∈ N is the multiplicity of q_i in n, denoted e_i = v_{q_i}(n).2
Problem 5.2 [Fundamental problem of computational number theory] Given a non-zero (usually positive) integer n, compute the decomposition of n into prime factors, that is, compute all the prime divisors p of n together with their respective multiplicities v_p(n). ⊳
Problem 5.2 is also referred to as the integer factorization problem or as
IFP in short. Solving this demands ability to recognize primes as primes.
Problem 5.3 [Primality testing ] Given a positive integer n > 2, determine
whether n is prime or composite. ⊳
The primality testing problem has efficient probabilistic algorithms. The de-
terministic complexity of primality testing too is polynomial-time. On the
contrary, factoring integers appears to be a difficult and challenging computa-
tional problem. In this chapter, we discuss algorithms for testing the primality
of integers. Integer factorization is studied in Chapter 6.

5.1 Introduction to Primality Testing


The straightforward algorithm to prove whether n > 2 is prime is by trial division of n by integers d in the range 2 ≤ d < n. We declare n as a prime if and only if no non-trivial divisor d of n can be located. Evidently, n is composite if and only if it admits a factor d ≤ √n. This implies that we can restrict the search for potential divisors d of n to the range 2 ≤ d ≤ ⌊√n⌋.
Trial division is a fully exponential-time algorithm (in the size log n of n)
for primality testing. No straightforward modification of trial division tends
to reduce this complexity to something faster than exponential. This is why
we do not describe this method further in connection with primality testing.
(If integer factorization is of concern, trial division plays an important role.)
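The √n bound on trial division translates directly into code; a minimal sketch:

```python
def trial_division_is_prime(n):
    # Declare n prime iff no divisor d with 2 <= d <= floor(sqrt(n)) exists.
    if n < 2:
        return False
    d = 2
    while d * d <= n:           # equivalent to d <= floor(sqrt(n))
        if n % d == 0:
            return False        # d is a non-trivial divisor: n is composite
        d += 1
    return True
```

Even with the square-root cutoff, this performs about √n = 2^{(log₂ n)/2} divisions in the worst case, which is exponential in the bit size of n.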

5.1.1 Pratt Certificates


The first significant result about the complexity of primality testing comes
from Pratt3 who proves the existence of polynomial-time verifiable certifi-
cates for primality. This establishes that primality testing is a problem in the
complexity class NP. Pratt certificates are based on the following result.
2 Some authors prefer to state the fundamental theorem only for integers n > 2. But the

case n = 1 is not too problematic, since 1 factors uniquely into the empty product of primes.
3 Vaughan Pratt, Every prime has a succinct certificate, SIAM Journal on Computing,

4, 214–220, 1975.
Primality Testing 267

Theorem 5.4 A positive integer n is prime if and only if there exists an element a ∈ Z∗_n with ord_n a = n − 1, that is, with a^{n−1} ≡ 1 (mod n), and a^{(n−1)/p} ≢ 1 (mod n) for every prime divisor p of n − 1. ⊳
Therefore, the prime factorization of n − 1 together with an element a of order
n − 1 seem to furnish a primality certificate for n. However, there is a subtle
catch here. We need to certify the primality of the prime divisors of n − 1.
Example 5.5 Here is a false certificate about the primality of n = 17343. We claim the prime factorization n − 1 = 2 × 8671 and supply the primitive root a ≡ 163 (mod n). One verifies that a^{n−1} ≡ 1 (mod n), a^{(n−1)/2} ≡ 3853 (mod n), and a^{(n−1)/8671} ≡ 9226 (mod n), thereby wrongly concluding that n = 17343 is prime. However, n = 3² × 41 × 47 is composite. The problem with this certificate is that 8671 = 13 × 23 × 29 is not prime. It is, therefore, necessary to certify every prime divisor of n − 1 as prime. ¤
That looks like a circular requirement. A primality certificate requires other
primality certificates which, in turn, require some more primality certificates,
and so on. But each prime divisor of n − 1 is smaller than n, so this inductive
(not circular) process stops after finitely many steps. Pratt proves that the
total size of a complete certificate for the primality of n is only O(log² n) bits.
Example 5.6 Let us provide a complete primality certificate for n = 1237.
(1) 1237 − 1 = 2² × 3 × 103 with primitive root 2
(2) 2 − 1 = 1 with primitive root 1
(3) 3 − 1 = 2 with primitive root 2
(4) 2 − 1 = 1 with primitive root 1
(5) 103 − 1 = 2 × 3 × 17 with primitive root 5
(6) 2 − 1 = 1 with primitive root 1
(7) 3 − 1 = 2 with primitive root 2
(8) 2 − 1 = 1 with primitive root 1
(9) 17 − 1 = 2⁴ with primitive root 3
(10) 2 − 1 = 1 with primitive root 1
We now verify this certificate. Line (1) supplies the factorization 1237 − 1 = 2² × 3 × 103. Moreover, 2^{1237−1} ≡ 1 (mod 1237), 2^{(1237−1)/2} ≡ 1236 (mod 1237), 2^{(1237−1)/3} ≡ 300 (mod 1237), and 2^{(1237−1)/103} ≡ 385 (mod 1237).
We recursively prove the primality of the divisors 2 (Line (2)), 3 (Lines (3)–(4)) and 103 (Lines (5)–(10)) of 1237 − 1. For example, 103 − 1 = 2 × 3 × 17 with 5^{103−1} ≡ 1 (mod 103), 5^{(103−1)/2} ≡ 102 (mod 103), 5^{(103−1)/3} ≡ 56 (mod 103), and 5^{(103−1)/17} ≡ 72 (mod 103). Thus, 103 is a prime, since 2, 3 and 17 are established as primes in Lines (6), (7)–(8) and (9)–(10), respectively. ¤
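The certificate of Example 5.6 can be checked mechanically. In the sketch below, each certificate line is stored as n ↦ (prime divisors of n − 1, claimed primitive root), and the verification recurses through the certified divisors exactly as in the text, with 2 accepted outright as the base case.

```python
# Pratt certificate for 1237, transcribed from the ten lines of Example 5.6.
cert = {
    1237: ([2, 3, 103], 2),
    103:  ([2, 3, 17], 5),
    17:   ([2], 3),
    3:    ([2], 2),
    2:    ([], 1),
}

def pratt_verify(n):
    if n == 2:
        return True                      # base case
    factors, a = cert[n]
    # the listed primes must actually multiply into n - 1 (with multiplicities)
    m = n - 1
    for q in factors:
        while m % q == 0:
            m //= q
    if m != 1:
        return False
    # a must have order exactly n - 1 modulo n ...
    if pow(a, n - 1, n) != 1:
        return False
    if any(pow(a, (n - 1) // q, n) == 1 for q in factors):
        return False
    # ... and every prime divisor of n - 1 must itself be certified.
    return all(pratt_verify(q) for q in factors)
```

Running pratt_verify(17343) against the false certificate of Example 5.5 would require certifying 8671, which fails.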
One can verify the Pratt certificate for n ∈ P in O(log⁵ n) time. Moreover, a composite number n can be easily disproved to be a prime if a non-trivial divisor of n is supplied. Therefore, primality testing is a problem in the class NP ∩ coNP. It is, thus, expected that primality testing is not NP-Complete, since otherwise we would have NP = coNP, a fact widely believed to be false.

5.1.2 Complexity of Primality Testing


Is primality testing solvable in polynomial time? This question remained
unanswered for quite a long time. Several primality-testing algorithms were
devised, that can be classified in two broad categories: deterministic and prob-
abilistic (or randomized). Some deterministic tests happen to have running
time very close to polynomial (like (log n)^{log log log n}). For other deterministic
tests, rigorous proofs of polynomial worst-case running times were missing.
The probabilistic tests, on the other hand, run in provably polynomial
time, but may yield incorrect answers on some occasions. The probability of
such incorrect answers can be made arbitrarily small by increasing the number
of rounds in the algorithms. Nonetheless, such algorithms cannot be accepted
as definite proofs for primality. Under the assumption of certain unproven
mathematical facts (like the ERH), one can convert some of these probabilistic
algorithms to polynomial-time deterministic algorithms. However, so long as
the underlying mathematical conjectures remain unproven, these converted
algorithms lack solid theoretical basis.
In August 2002, three Indian researchers (Agrawal, Kayal and Saxena) proposed an algorithm that settles the question. Their AKS primality test meets three
important requirements. First, it is a deterministic algorithm. Second, it has
a polynomial worst-case running time. And third, its proof of correctness or
complexity does not rely upon any unproven mathematical assumption. Soon
after its conception, several improvements of the AKS test were proposed.
These new developments in the area of primality testing, although theo-
retically deep, do not seem to have significant practical impacts. In practi-
cal applications like cryptography, significantly faster randomized algorithms
suffice. The failure probability of these randomized algorithms can be made
smaller than the probability of hardware failure. Thus, a deterministic pri-
mality test is expected to give incorrect answers because of hardware failures
more often than a probabilistic algorithm furnishing wrong outputs.

5.1.3 Sieve of Eratosthenes


The Greek mathematician Eratosthenes of Cyrene (circa 276–195 BC) pro-
posed possibly the earliest known sieve algorithm to locate all primes up to a
specified positive integer n. We can use this algorithm to quickly generate a
set of small primes (like the first million to hundred million primes).
One starts by writing the integers 2, 3, 4, 5, . . . , n in a list. One then enters
a loop, each iteration of which discovers a new prime and marks all multiples
of that prime (other than the prime itself) as composite.
Initially, no integer in the list is marked prime or composite. The first
unmarked integer is 2, which is marked as prime. All even integers in the list
larger than 2 are marked as composite. In the second pass, the first unmarked
integer 3 is marked as prime. All multiples of 3 (other than 3) are marked as
composite. In the third pass, the first unmarked integer 5 is marked as prime,

and multiples of 5 (other than itself) are marked composite. This process is repeated until the first unmarked entry exceeds √n. All entries in the list that are marked as prime or that remain unmarked are output as primes.

Example 5.7 The sieve of Eratosthenes finds all primes ≤ n = 50 as follows. Each pass lists the newly discovered prime together with the integers that are first marked as composite in that pass.

Pass 1: 2 is marked as prime; the even integers 4, 6, 8, . . . , 50 are marked as composite.
Pass 2: 3 is marked as prime; 9, 15, 21, 27, 33, 39, 45 are newly marked as composite.
Pass 3: 5 is marked as prime; 25, 35 are newly marked as composite.
Pass 4: 7 is marked as prime; 49 is newly marked as composite.

The passes stop here, since the first unmarked entry 11 is larger than √50. All primes ≤ 50 are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47. ¤
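The sieve transcribes directly into Python. The sketch below starts each marking pass at p² (a standard refinement: smaller multiples of p were already struck in earlier passes).

```python
def eratosthenes(n):
    # mark[i] stays True while i is not yet known to be composite
    mark = [True] * (n + 1)
    mark[0] = mark[1] = False
    p = 2
    while p * p <= n:                      # stop once the next prime exceeds sqrt(n)
        if mark[p]:
            for multiple in range(p * p, n + 1, p):
                mark[multiple] = False     # every multiple of p (other than p) is composite
        p += 1
    return [i for i in range(2, n + 1) if mark[i]]
```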

5.1.4 Generating Random Primes


In many applications, random primes of given bit sizes are required. One
may keep on checking the primality of randomly generated integers of a given
size by one or more probabilistic or deterministic tests. After O(s) iterations,
one expects to get a random prime of the given bit size, since by the prime
number theorem, the fraction of primes among integers of bit size s is Θ(1/s).
An adaptation of the sieve of Eratosthenes reduces the running time of this
search significantly. We start with a random s-bit integer n_0. We concentrate on the integers n_i = n_0 + i in an interval [n_0, n_{l−1}] of length l = Θ(s). It is of no use to subject the even integers in this interval to a primality test. Likewise, we can discard the multiples of 3, the multiples of 5, and so on.
We choose a number t of small primes (typically, the first t primes). We
do not perform trial division of all the integers ni with all these small primes.
We instead use an array of size l with each cell initialized to 1. The i-th cell
corresponds to the integer ni for i = 0, 1, 2, . . . , l − 1. For each small prime p,
we compute the remainder r = n0 rem p which identifies the first multiple of p

in [n0 , nl−1 ]. After this position, every p-th integer in the interval is a multiple
of p. We set to zero the array entries at all these positions.
After all the small primes are considered, we look at those array indices i
that continue to hold the value 1. These correspond to all those integers ni
in [n0 , nl−1 ] that are divisible by neither of the small primes. We subject only
these integers ni to one or more primality test(s). This method is an example
of sieving. Many composite integers are sieved out (eliminated) much more
easily than running primality tests individually on all of them. For each small
prime p, the predominant cost is that of a division (computation of n0 rem p).
Each other multiple of p is located easily (by adding p to the previous multiple
of p). In practice, one may work with 10 to 1000 small primes.
If the length l of the sieving interval is carefully chosen, we expect to locate
a prime among the non-multiples of small primes. However, if we are unlucky
enough to encounter only composite numbers in the interval, we repeat the
process for another random value of n0 . It is necessary to repeat the process
also in the case that n0 is of bit length s, whereas a discovered prime ni = n0 +i
is of bit length larger than s (to be precise, s + 1 for all sufficiently large s). If
n0 is chosen as a random s-bit integer, and if s is not too small, the probability
of such an overflow is negligibly small.
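The interval-sieving idea can be sketched as follows. The interval length 4s, the use of the first 100 small primes, and the fixed RNG seed are illustrative choices, not prescriptions from the text; likewise, for this small demonstration the Fermat screen is backed by trial division, whereas in practice one would instead run more rounds of a probabilistic test such as Miller–Rabin.

```python
import random

def small_primes(t):
    # first t primes via a simple sieve (the 100th prime is 541, so 600 suffices here)
    limit = 600
    mark = [True] * (limit + 1)
    mark[0] = mark[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if mark[p]:
            for m in range(p * p, limit + 1, p):
                mark[m] = False
    return [i for i in range(2, limit + 1) if mark[i]][:t]

def is_prime_slow(n):
    # deterministic confirmation, acceptable for the small sizes used here
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return n > 1

def random_prime(s, seed=1):
    rng = random.Random(seed)            # seeded only for reproducibility
    primes = small_primes(100)
    while True:
        n0 = rng.randrange(1 << (s - 1), 1 << s)
        l = 4 * s                        # interval length Theta(s); the constant is arbitrary
        survives = [True] * l
        for p in primes:
            r = (-n0) % p                # first index i with p | n0 + i
            for i in range(r, l, p):
                if n0 + i != p:
                    survives[i] = False  # sieved out: divisible by the small prime p
        for i in range(l):
            n = n0 + i
            # skip sieved entries and candidates that overflowed the bit size s
            if survives[i] and n.bit_length() == s:
                if pow(2, n - 1, n) == 1 and is_prime_slow(n):
                    return n
        # unlucky interval: no prime found, draw a fresh n0
```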

5.1.5 Handling Primes in the GP/PARI Calculator


Upon start-up, the GP/PARI calculator loads a precalculated list of the
first t primes. One can obtain the i-th prime by the command prime(i) for
i = 1, 2, 3, . . . , t. The call primes(i) returns a vector of the first i primes. If
i > t, the calculator issues a warning message. One can add and remove primes
to this precalculated list by the directives addprimes and removeprimes.
The primality testing function of GP/PARI is isprime. An optional flag may
be supplied to this function in order to indicate the algorithm to use. Consult
the GP/PARI manual for more details.
The call nextprime(x) returns the smallest prime ≥ x, whereas the call precprime(x) returns the largest prime ≤ x. Here, x is allowed to be a real number. For x < 2, the call precprime(x) returns 0.
A random integer a in the range 0 ≤ a < n can be obtained by the call random(n). Therefore, a random prime of bit length ≤ l can be obtained by the call precprime(random(2^l)).

gp > prime(1000)
%1 = 7919
gp > prime(100000)
*** not enough precalculated primes
gp > primes(16)
%2 = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53]
gp > isprime(2^100+277)
%3 = 1
gp > nextprime(2^100)

%4 = 1267650600228229401496703205653
gp > nextprime(2^100)-2^100
%5 = 277
gp > precprime(2^200)
%6 = 1606938044258990275541962092341162602522202993782792835301301
gp > 2^200-precprime(2^200)
%7 = 75
gp > precprime(random(2^10))
%8 = 811

5.2 Probabilistic Primality Testing


In this section, we discuss some probabilistic polynomial-time primality-
testing algorithms. These are No-biased Monte Carlo algorithms. This means
that when the algorithms output No, the answer is correct, whereas the out-
put Yes comes with some chances of error. In order that these algorithms
be useful, the error probabilities should be low. A Yes-biased Monte Carlo
algorithm produces the answer Yes with certainty and the answer No with
some probability of error. A Yes-biased Monte Carlo algorithm for checking
the primality of integers of a particular kind is explored in Exercise 5.21.

5.2.1 Fermat Test


Let n be a prime. By Fermat's little theorem, a^{n−1} ≡ 1 (mod n) for every a ∈ Z∗_n. The converse of this is not true, that is, a^{n−1} ≡ 1 (mod n) for some (or many) a ∈ Z∗_n does not immediately imply that n is a prime. For example, consider n = 17343 and a = 163 (see Example 5.5). Nonetheless, this motivates us to define the following concept, which leads to Algorithm 5.1.
Definition 5.8 Let n ∈ N and a ∈ Z with gcd(a, n) = 1. We call n a pseudoprime (or a Fermat pseudoprime) to the base a if a^{n−1} ≡ 1 (mod n). ⊳

Algorithm 5.1: Fermat test for testing the primality of n

Fix the number t of iterations.
For i = 1, 2, . . . , t {
    Choose a random integer a ∈ {2, 3, . . . , n − 1}.
    If (a^{n−1} ≢ 1 (mod n)) return "n is composite".
}
return "n is prime".

A prime n is a pseudoprime to every base a coprime to n. A composite
integer may also be a pseudoprime to some bases. If a composite integer n is
not a pseudoprime to at least one base a ∈ Z_n^*, then n is not a pseudoprime
to at least half of the bases in Z_n^* (multiplication by a single witness maps
the non-witnesses injectively into the witnesses).
A base to which n is not a pseudoprime is a witness to the compositeness
of n. If any such witness is found, n is declared composite. On the other
hand, if we do not encounter a witness in t iterations, we declare n to be
prime. If n is indeed prime, then no witnesses for n can be found, and the
algorithm correctly declares n prime. However, if n is composite and we fail
to encounter a witness in t iterations, the decision of the algorithm is incorrect.
In this case, suppose that there exists a witness for the compositeness of n.
Then, the probability that n is still declared prime is at most 1/2^t. By choosing t
appropriately, one can reduce this error probability to a very low value.

Example 5.9 (1) The composite integer 891 = 3^4 × 11 is a pseudoprime to
only the 20 bases 1, 80, 82, 161, 163, 244, 323, 325, 404, 406, 485, 487, 566,
568, 647, 728, 730, 809, 811 and 890. We have φ(891) = 540, that is, the
fraction of non-witnesses in Z_891^* is 1/27.
(2) The composite integer 2891 = 7^2 × 59 is a pseudoprime only to the four
bases 1, 589, 2302, 2890.
(3) The composite integer 1891 = 31 × 61 is a pseudoprime to 900 bases, that
is, to exactly half of the bases in Z_1891^* (we have φ(1891) = 30 × 60 = 1800).
(4) There are only 22 composite integers ≤ 10,000 that are pseudoprimes
to the base 2. These composite integers are 341, 561, 645, 1105, 1387, 1729,
1905, 2047, 2465, 2701, 2821, 3277, 4033, 4369, 4371, 4681, 5461, 6601, 7957,
8321, 8481 and 8911. There are 78 composite pseudoprimes ≤ 100,000 and
245 composite pseudoprimes ≤ 1,000,000 to the base 2. ¤
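The counts in this example are easy to reproduce by brute force. The following sketch does so in Python (the book's own sessions use GP/PARI; the function name is illustrative).

```python
from math import gcd

def fermat_base_count(n):
    """Count the non-witnesses: bases a in Z_n^* with a^(n-1) ≡ 1 (mod n)."""
    return sum(1 for a in range(1, n)
               if gcd(a, n) == 1 and pow(a, n - 1, n) == 1)

print(fermat_base_count(891), fermat_base_count(2891), fermat_base_count(1891))
# 20 4 900, matching parts (1)-(3)
```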

There exist composite integers n that are pseudoprimes to every base
coprime to n. Such composite integers are called Carmichael numbers.4 A
Carmichael number n passes the Fermat test irrespective of how many bases
in Z_n^* are chosen. Here follows a characterization of Carmichael numbers.

Theorem 5.10 A positive composite integer n is a Carmichael number if and
only if n is square-free, and (p − 1) | (n − 1) for every prime divisor p of n. ⊳

Example 5.11 The smallest Carmichael number is 561 = 3 × 11 × 17. We
have 561 − 1 = 280 × (3 − 1) = 56 × (11 − 1) = 35 × (17 − 1). All other
Carmichael numbers ≤ 100,000 are:

1105 = 5 × 13 × 17,
1729 = 7 × 13 × 19,
2465 = 5 × 17 × 29,
2821 = 7 × 13 × 31,
6601 = 7 × 23 × 41,
8911 = 7 × 19 × 67,
10585 = 5 × 29 × 73,
15841 = 7 × 31 × 73,
29341 = 13 × 37 × 61,
41041 = 7 × 11 × 13 × 41,
46657 = 13 × 37 × 97,
52633 = 7 × 73 × 103,
62745 = 3 × 5 × 47 × 89,
63973 = 7 × 13 × 19 × 37, and
75361 = 11 × 13 × 17 × 31.

The smallest Carmichael numbers with five and six prime factors are 825265 =
5 × 7 × 17 × 19 × 73 and 321197185 = 5 × 19 × 23 × 29 × 37 × 137. ¤

4 These are named after the American mathematician Robert Daniel Carmichael (1879–1967).
The German mathematician Alwin Reinhold Korselt (1864–1947) first introduced
the concept of Carmichael numbers. Carmichael was the first to discover (in 1910) concrete
examples (like 561).
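Theorem 5.10 makes Carmichael numbers easy to recognize once n is factored. A small Python sketch (illustrative; the book's sessions use GP/PARI) applies Korselt's criterion and recovers the beginning of the list above.

```python
def prime_factorization(n):
    """Trial-division factorization of n as a dictionary {prime: exponent}."""
    factors, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def is_carmichael(n):
    """Korselt's criterion: n composite, square-free, (p - 1) | (n - 1) for every p | n."""
    factors = prime_factorization(n)
    if len(factors) < 2 or any(e > 1 for e in factors.values()):
        return False                      # prime, prime power, or not square-free
    return all((n - 1) % (p - 1) == 0 for p in factors)

print([n for n in range(3, 3000, 2) if is_carmichael(n)])
# [561, 1105, 1729, 2465, 2821]
```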
One can show that a Carmichael number must be odd with at least three
distinct prime factors. Alford et al.5 prove that there exist infinitely many
Carmichael numbers. That is bad news for the Fermat test. We need to look
at its modifications so as to avoid the danger posed by Carmichael numbers.
Here is how the Fermat test can be implemented in GP/PARI. In the function
Fpsp(n,t), the first parameter n is the integer whose primality is to be checked,
and t is the count of random bases to try.

gp > Fpsp(n,t) = \
for (i=1, t, \
a = Mod(2 + random(n-2), n); b = a^(n-1); \
if (b != Mod(1,n), return(0)) \
); \
return(1);
gp > Fpsp(1001,20)
%1 = 0
gp > Fpsp(1009,20)
%2 = 1
gp > p1 = 601; p2 = 1201; p3 = 1801; n = p1 * p2 * p3
%3 = 1299963601
gp > Fpsp(n,20)
%4 = 1

In practical implementations, one may choose single-precision integers (like
the first t small primes) as bases a. The error probability does not seem to
be affected by small bases, but the modular exponentiation a^(n−1) (mod n)
becomes somewhat more efficient than in the case of multiple-precision bases.
5 W. R. Alford, Andrew Granville and Carl Pomerance, There are infinitely many
Carmichael numbers, Annals of Mathematics, 140, 703–722, 1994.



5.2.2 Solovay–Strassen Test


By Euler's criterion, a^((n−1)/2) ≡ (a/n) (mod n) for every odd prime n and for
every base a coprime to n. The converse is again not true, but a probabilistic
primality test6 (Algorithm 5.2) can be based on the following definition.

Definition 5.12 An odd positive integer n is called an Euler pseudoprime
or a Solovay–Strassen pseudoprime to a base a coprime to n if a^((n−1)/2) ≡
(a/n) (mod n), where (a/n) is the Jacobi symbol. ⊳

Every Euler pseudoprime is a Fermat pseudoprime, but not conversely.

Algorithm 5.2: Solovay–Strassen primality test

Fix the number t of iterations.
For i = 1, 2, . . . , t {
    Choose a random integer a ∈ {2, 3, . . . , n − 1}.
    If ((a/n) = 0), return "n is composite". /* gcd(a, n) > 1 */
    If (a^((n−1)/2) ≢ (a/n) (mod n)), return "n is composite".
}
Return "n is prime".

For an odd composite integer n, a base a ∈ Z_n^* satisfying a^((n−1)/2) ≢
(a/n) (mod n) is a witness to the compositeness of n. If no such witness is found
in t iterations, the number n is declared prime. Of course, primes do
not possess such witnesses. However, a good property of the Solovay–Strassen
test is that an odd composite n (even a Carmichael number) has at least
φ(n)/2 witnesses to its compositeness. Thus, the probability of erroneously
declaring a composite n as prime is no more than 1/2^t.
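A Python sketch of Algorithm 5.2 follows (the book's own sessions use GP/PARI; the function names are illustrative). The Jacobi symbol is computed by the standard binary algorithm based on quadratic reciprocity.

```python
import random

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:                 # pull out factors of 2
            a //= 2
            if n % 8 in (3, 5):           # (2/n) = -1 for n ≡ ±3 (mod 8)
                result = -result
        a, n = n, a                       # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0        # gcd(a, n) > 1 gives symbol 0

def solovay_strassen(n, t=20):
    """Return False if n is certainly composite, True if n is probably prime."""
    if n < 2 or n % 2 == 0:
        return n == 2
    for _ in range(t):
        a = random.randrange(2, n)
        j = jacobi(a, n)
        if j == 0 or pow(a, (n - 1) // 2, n) != j % n:   # j % n maps -1 to n-1
            return False
    return True
```

Since at least half of the bases of a Carmichael number such as 561 are Euler witnesses, solovay_strassen(561) returns False with overwhelming probability.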

Example 5.13 (1) The composite integer 891 = 3^4 × 11 is a Fermat pseudoprime
to 20 bases (Example 5.9(1)), but an Euler pseudoprime to only the
following ten of these bases: 1, 82, 161, 163, 404, 487, 728, 730, 809 and 890.
(2) 2891 = 7^2 × 59 is an Euler pseudoprime only to the bases 1 and 2890.
(3) 1891 = 31 × 61 is an Euler pseudoprime to 450 bases only.
(4) Let n_F(n) denote the count of bases to which n is a Fermat pseudoprime,
and n_E(n) the count of bases to which n is an Euler pseudoprime.
Clearly, n_F(n) ≥ n_E(n). These two counts may, however, be equal. For
example, for n = 1681 = 41^2, there are exactly 40 bases to which n is a Fermat
pseudoprime, and n is an Euler pseudoprime to precisely these bases.
(5) The counts n_F(n), n_E(n), together with φ(n), are listed in the following
table for some small Carmichael numbers n. There are no Fermat witnesses to
the compositeness of Carmichael numbers, but there do exist Euler witnesses
to their compositeness.
6 Robert M. Solovay and Volker Strassen, A fast Monte-Carlo test for primality, SIAM
Journal on Computing, 6(1), 84–85, 1977.


n                              φ(n)     n_F(n)   n_E(n)

561 = 3 × 11 × 17              320      320      80
1105 = 5 × 13 × 17             768      768      192
1729 = 7 × 13 × 19             1296     1296     648
2465 = 5 × 17 × 29             1792     1792     896
2821 = 7 × 13 × 31             2160     2160     540
6601 = 7 × 23 × 41             5280     5280     1320
8911 = 7 × 19 × 67             7128     7128     1782
41041 = 7 × 11 × 13 × 41       28800    28800    14400
825265 = 5 × 7 × 17 × 19 × 73  497664   497664   124416
¤

5.2.3 Miller–Rabin Test


Miller7 and Rabin8 propose another variant of the Fermat test, robust
against Carmichael numbers. The basic idea behind this test is that there exist
non-trivial square roots (that is, square roots other than ±1) of 1 modulo an
odd composite n, provided that n is not a power of a prime (Exercise 1.55).
Any Carmichael number has at least six non-trivial square roots of 1.
Suppose that a^(n−1) ≡ 1 (mod n) for some odd integer n. Write n − 1 = 2^s n′
with s ∈ N and with n′ odd. Define b_j ≡ a^(2^j n′) (mod n) for j = 0, 1, 2, . . . , s.
It is given that b_s ≡ 1 (mod n). If b_0 ≡ 1 (mod n), then b_j ≡ 1 (mod n) for all
j = 0, 1, 2, . . . , s. On the other hand, if b_0 ≢ 1 (mod n), there exists (a unique)
j in the range 0 ≤ j < s such that b_j ≢ 1 (mod n) but b_{j+1} ≡ 1 (mod n).
If b_j ≢ −1 (mod n) too, then b_j is a non-trivial square root of 1 modulo n.
Encountering such a non-trivial square root of 1 establishes the compositeness
of n. The following definition uses these notations.

Definition 5.14 An odd composite integer n is called a strong pseudoprime
or a Miller–Rabin pseudoprime to base a ∈ Z_n^* if either b_0 ≡ 1 (mod n) or
b_j ≡ −1 (mod n) for some j ∈ {0, 1, 2, . . . , s − 1}. ⊳

If n is a strong pseudoprime to base a, then n is evidently a Fermat pseudoprime
to base a. It is also true that every strong pseudoprime to base a is
an Euler pseudoprime to the same base a.
The attractiveness of the Miller–Rabin test (Algorithm 5.3) lies in the fact
that for a composite n, the fraction of bases in Z_n^* to which n is a strong
pseudoprime is no more than 1/4. Therefore, for t random bases, the probability
of missing any witness to the compositeness of n is at most 1/4^t = 1/2^(2t).

7 Miller proposed a deterministic primality test which is polynomial-time under the
assumption that the ERH is true: Gary L. Miller, Riemann's hypothesis and tests for primality,
Journal of Computer and System Sciences, 13(3), 300–317, 1976.
8 Rabin proposed the randomized version: Michael O. Rabin, Probabilistic algorithm for
testing primality, Journal of Number Theory, 12(1), 128–138, 1980.



Algorithm 5.3: Miller–Rabin primality test

Fix the number t of iterations.
Write n − 1 = 2^s n′ with s ∈ N and n′ odd.
For i = 1, 2, . . . , t {
    Choose a random integer a ∈ {2, 3, . . . , n − 1}.
    Compute b ≡ a^n′ (mod n).
    If (b ≢ 1 (mod n)) {
        Set j = 0.
        While ((j ≤ s − 2) and (b ≢ −1 (mod n))) {
            Set b = b^2 (mod n). /* We have b_{j+1} ≡ b_j^2 (mod n) */
            If (b ≡ 1 (mod n)), return "n is composite".
            ++j.
        }
        If (b ≢ −1 (mod n)), return "n is composite".
    }
}
Return "n is prime".
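Algorithm 5.3 translates almost line by line into Python (an illustrative sketch; the book's own sessions use GP/PARI).

```python
import random

def miller_rabin(n, t=20):
    """Return False if n is certainly composite, True if n is probably prime."""
    if n < 2 or n % 2 == 0:
        return n == 2
    s, n1 = 0, n - 1                      # write n - 1 = 2^s * n1 with n1 odd
    while n1 % 2 == 0:
        s, n1 = s + 1, n1 // 2
    for _ in range(t):
        a = random.randrange(2, n)
        b = pow(a, n1, n)
        if b in (1, n - 1):
            continue                      # this base is not a witness
        for _ in range(s - 1):
            b = b * b % n
            if b == n - 1:
                break                     # reached -1: not a witness
        else:
            return False                  # witness found: n is composite
    return True
```

For a composite n, the error probability is at most 1/4^t; primes are never rejected.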

Example 5.15 (1) 891 is a strong pseudoprime only to the ten bases 1, 82,
161, 163, 404, 487, 728, 730, 809, 890. These happen to be precisely the bases
to which n is an Euler pseudoprime (Example 5.13).
(2) 2891 is a strong pseudoprime only to the bases 1 and 2890.
(3) 1891 is a strong pseudoprime to 450 bases in Z_1891^*. These are again all
the bases to which n is an Euler pseudoprime.
(4) Let n_S(n) denote the count of bases in Z_n^* to which n is a strong
pseudoprime (also see Example 5.13). We always have n_S(n) ≤ n_E(n). So far
in this example, we had n_S(n) = n_E(n) only. The strict inequality occurs,
for example, for the Carmichael number n = 561. In that case, n_F(n) = 320,
n_E(n) = 80, and n_S(n) = 10. The following table summarizes these counts
(along with φ(n)) for some small Carmichael numbers.

n                              φ(n)     n_F(n)   n_E(n)   n_S(n)
561 = 3 × 11 × 17              320      320      80       10
1105 = 5 × 13 × 17             768      768      192      30
1729 = 7 × 13 × 19             1296     1296     648      162
2465 = 5 × 17 × 29             1792     1792     896      70
2821 = 7 × 13 × 31             2160     2160     540      270
6601 = 7 × 23 × 41             5280     5280     1320     330
8911 = 7 × 19 × 67             7128     7128     1782     1782
41041 = 7 × 11 × 13 × 41       28800    28800    14400    450
825265 = 5 × 7 × 17 × 19 × 73  497664   497664   124416   486
The table indicates that not only is the Miller–Rabin test immune against
Carmichael numbers, but also strong witnesses are often more numerous than
Euler witnesses for the compositeness of an odd composite n. The Miller–
Rabin test is arguably the most commonly used primality test of today. ¤

The most time-consuming step in an iteration of each of the above
probabilistic tests is an exponentiation modulo n to an exponent ≤ n − 1. For all
these tests, the bases a can be chosen as single-precision integers. Nonetheless,
each of these tests runs in O(log^3 n) time.

5.2.4 Fibonacci Test


The Fibonacci numbers F_0, F_1, F_2, . . . are defined recursively as follows.

    F_0 = 0,
    F_1 = 1,
    F_m = F_{m−1} + F_{m−2} for m ≥ 2.

One can supply an explicit formula for F_m. The characteristic polynomial
x^2 − x − 1 of the recurrence has two roots α = (1 + √5)/2 and β = (1 − √5)/2.
We have:

    F_m = (α^m − β^m)/(α − β) = (1/√5) [((1 + √5)/2)^m − ((1 − √5)/2)^m] for all m ≥ 0.   (5.1)

A relevant property of Fibonacci numbers is given in the next theorem.

Theorem 5.16 Let p ∈ P, p ≠ 2, 5. Then, F_{p−(5/p)} ≡ 0 (mod p), where (5/p)
is the Legendre symbol.
Proof First, suppose that (5/p) = 1, that is, 5 has (two) square roots modulo
p, that is, the two roots α, β of x^2 − x − 1 (modulo p) belong to F_p. Since
α, β ≢ 0 (mod p), we have α^(p−1) ≡ 1 (mod p) and β^(p−1) ≡ 1 (mod p), so
F_{p−(5/p)} ≡ F_{p−1} ≡ (α^(p−1) − β^(p−1))/(α − β) ≡ 0 (mod p).
Now, suppose that (5/p) = −1. The roots α, β of x^2 − x − 1 do not belong to
F_p but to F_{p^2}. The p-th power Frobenius map F_{p^2} → F_{p^2} taking θ ↦ θ^p clearly
maps a root of x^2 − x − 1 to a root of the same polynomial. But α^p = α would mean
α ∈ F_p, a contradiction. So α^p = β. Likewise, β^p = α. Consequently, α^(p+1) ≡
β^(p+1) ≡ αβ (mod p), that is, F_{p−(5/p)} ≡ F_{p+1} ≡ (α^(p+1) − β^(p+1))/(α − β) ≡ 0 (mod p). ⊳

This property of Fibonacci numbers leads to the following definition.9

Definition 5.17 Let n ∈ N with gcd(n, 10) = 1. We call n a Fibonacci pseudoprime
if F_{n−(5/n)} ≡ 0 (mod n), where (5/n) is the Jacobi symbol. ⊳

9 The Fibonacci and the Lucas tests are introduced by: Robert Baillie and Samuel S.
Wagstaff, Jr., Lucas pseudoprimes, Mathematics of Computation, 35(152), 1391–1417, 1980.



If n ≠ 2, 5 is prime, we have F_{n−(5/n)} ≡ 0 (mod n). However, some
composite numbers too satisfy this congruence.

Example 5.18 (1) Lehmer proved that there are infinitely many Fibonacci
pseudoprimes. Indeed, the Fibonacci number F_{2p} for every prime p > 5 is a
Fibonacci pseudoprime.
(2) There are only nine composite Fibonacci pseudoprimes ≤ 10,000.
These are 323 = 17 × 19, 377 = 13 × 29, 1891 = 31 × 61, 3827 = 43 × 89,
4181 = 37 × 113, 5777 = 53 × 109, 6601 = 7 × 23 × 41, 6721 = 11 × 13 × 47,
and 8149 = 29 × 281. There are only fifty composite Fibonacci pseudoprimes
≤ 100,000. The smallest composite Fibonacci pseudoprimes with four and five
prime factors are 199801 = 7 × 17 × 23 × 73 and 3348961 = 7 × 11 × 23 × 31 × 61.
(3) There is no known composite integer n ≡ ±2 (mod 5) which is simultaneously
a Fibonacci pseudoprime and a Fermat pseudoprime to base 2. It is an
open question to find such a composite integer or to prove that no such composite
integer exists. There do, however, exist composite integers n ≡ ±1 (mod 5)
which are simultaneously Fibonacci pseudoprimes and Fermat pseudoprimes
to base 2. Two examples are 6601 = 7 × 23 × 41 and 30889 = 17 × 23 × 79. ¤
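The list in part (2) can be verified computationally. The Python sketch below (illustrative; the book develops the same doubling idea in GP/PARI as Algorithm 5.5) uses the identities (5.2), together with the fact that for gcd(n, 5) = 1 the Jacobi symbol (5/n) equals (n/5) by quadratic reciprocity: +1 when n ≡ ±1 (mod 5) and −1 when n ≡ ±2 (mod 5).

```python
def fib_pair_mod(m, n):
    """Return (F_m mod n, F_{m+1} mod n) by the doubling identities (5.2)."""
    if m == 0:
        return (0, 1)
    f, g = fib_pair_mod(m // 2, n)       # (F_k, F_{k+1}) with k = m // 2
    a = f * (2 * g - f) % n              # F_{2k}   = F_k (2 F_{k+1} - F_k)
    b = (f * f + g * g) % n              # F_{2k+1} = F_{k+1}^2 + F_k^2
    return (a, b) if m % 2 == 0 else (b, (a + b) % n)

def is_fib_psp(n):
    """Fibonacci pseudoprimality condition F_{n-(5/n)} ≡ 0 (mod n), gcd(n, 10) = 1."""
    eps = 1 if n % 5 in (1, 4) else -1   # Jacobi symbol (5/n) via reciprocity
    return fib_pair_mod(n - eps, n)[0] == 0

def is_prime(n):                         # naive trial division, enough here
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

comps = [n for n in range(3, 10000) if n % 2 and n % 5
         and not is_prime(n) and is_fib_psp(n)]
print(comps)   # the nine numbers listed in part (2)
```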

Algorithm 5.4: Fibonacci test for n > 5

Compute l = (5/n).
If (l = 0), return "n is composite". /* 5 divides n */
Compute F = F_{n−l} modulo n.
If (F = 0), return "n is prime", else return "n is composite".

A very important question pertaining to this test is the efficient computation
of F_{n−(5/n)}. The straightforward iterative method of sequentially
computing F_0, F_1, F_2, . . . , F_{n−(5/n)} takes time proportional to n, that is,
exponential in log n. In order to avoid this difficulty, we look at the following
identities satisfied by Fibonacci numbers, which hold for all k ∈ N_0.

    F_{2k} = F_k (2F_{k+1} − F_k),
    F_{2k+1} = F_{k+1}^2 + F_k^2,                    (5.2)
    F_{2k+2} = F_{k+1} (F_{k+1} + 2F_k).
Suppose that we want to compute F_m. Consider the binary representation
m = (m_{s−1} m_{s−2} . . . m_0)_2. Denote the subscripts M_i = (m_{s−1} m_{s−2} · · · m_i)_2
for i = 0, 1, 2, . . . , s (where M_s is to be interpreted as 0). We start with the
constant values (F_0, F_1) = (0, 1), a case that corresponds to i = s as explained
below. Subsequently, we run a loop for i = s − 1, s − 2, . . . , 0 (in that sequence)
such that at the end of the i-th iteration of the loop, we have computed
F_{M_i} and F_{M_i + 1}. Computing these two values from the previous values F_{M_{i+1}}
and F_{M_{i+1} + 1} is done by looking at the i-th bit m_i of m. If m_i = 0, then
M_i = 2M_{i+1}, so we use the first two of the identities (5.2) in order to update
(F_{M_{i+1}}, F_{M_{i+1} + 1}) to (F_{2M_{i+1}}, F_{2M_{i+1} + 1}). On the other hand, if m_i = 1, then
M_i = 2M_{i+1} + 1, and we use the last two of the identities (5.2) for updating
(F_{M_{i+1}}, F_{M_{i+1} + 1}) to (F_{2M_{i+1} + 1}, F_{2M_{i+1} + 2}).
This clever adaptation of the repeated square-and-multiply exponentiation
algorithm is presented as Algorithm 5.5. For m = O(n), the for loop in the
algorithm runs s = O(log n) times, with each iteration involving a constant
number of basic arithmetic operations modulo n. Since each such operation
can be done in O(log^2 n) time, Algorithm 5.5 runs in O(log^3 n) time.

Algorithm 5.5: Computing F_m modulo n

Initialize F = 0 and F_next = 1.
For i = s − 1, s − 2, . . . , 1, 0 {
    If (m_i is 0) {
        Compute t ≡ F (2F_next − F) (mod n).
        Compute F_next ≡ F_next^2 + F^2 (mod n).
        Assign F = t.
    } else {
        Compute t ≡ F_next^2 + F^2 (mod n).
        Compute F_next ≡ F_next (F_next + 2F) (mod n).
        Assign F = t.
    }
}
Return F.

Example 5.19 We illustrate the working of Algorithm 5.5 in order to
establish that 323 is a Fibonacci pseudoprime. Since (5/323) = −1, we compute
F_324 (mod 323). We have 324 = (101000100)_2. The modular computation of
F_324 is given in the table below.

i   m_i   M_i                    F_{M_i} (mod 323)                  F_{M_i+1} (mod 323)
9         0                      F_0 = 0                            F_1 = 1
8   1     (1)_2 = 1              F_1 ≡ F_1^2 + F_0^2 ≡ 1            F_2 ≡ F_1(F_1 + 2F_0) ≡ 1
7   0     (10)_2 = 2             F_2 ≡ F_1(2F_2 − F_1) ≡ 1          F_3 ≡ F_2^2 + F_1^2 ≡ 2
6   1     (101)_2 = 5            F_5 ≡ F_3^2 + F_2^2 ≡ 5            F_6 ≡ F_3(F_3 + 2F_2) ≡ 8
5   0     (1010)_2 = 10          F_10 ≡ F_5(2F_6 − F_5) ≡ 55        F_11 ≡ F_6^2 + F_5^2 ≡ 89
4   0     (10100)_2 = 20         F_20 ≡ F_10(2F_11 − F_10) ≡ 305    F_21 ≡ F_11^2 + F_10^2 ≡ 287
3   0     (101000)_2 = 40        F_40 ≡ F_20(2F_21 − F_20) ≡ 3      F_41 ≡ F_21^2 + F_20^2 ≡ 5
2   1     (1010001)_2 = 81       F_81 ≡ F_41^2 + F_40^2 ≡ 34        F_82 ≡ F_41(F_41 + 2F_40) ≡ 55
1   0     (10100010)_2 = 162     F_162 ≡ F_81(2F_82 − F_81) ≡ 0     F_163 ≡ F_82^2 + F_81^2 ≡ 305
0   0     (101000100)_2 = 324    F_324 ≡ F_162(2F_163 − F_162) ≡ 0  F_325 ≡ F_163^2 + F_162^2 ≡ 1

Since F_{323−(5/323)} ≡ 0 (mod 323), 323 is declared as prime. ¤

Here follows a GP/PARI function implementing Algorithm 5.5.



gp > FibMod(m,n) = \
local(i,s,t,F,Fnext); \
s = #binary(m); \
F = Mod(0,n); Fnext = Mod(1,n); \
i = s - 1; \
while (i>=0, \
if (bittest(m,i) == 0, \
t = F * (2 * Fnext - F); \
Fnext = Fnext^2 + F^2; \
F = t \
, \
t = Fnext^2 + F^2; \
Fnext = Fnext * (Fnext + 2 * F); \
F = t \
); \
i--; \
); \
return(F);
gp > FibMod(324,323)
%1 = Mod(0, 323)
gp > F324 = fibonacci(324)
%2 = 23041483585524168262220906489642018075101617466780496790573690289968
gp > F324 % 323
%3 = 0

The Fibonacci test is deterministic. It recognizes primes as primes, but
also certifies certain composite integers as primes, and Fibonacci certificates
do not change, no matter how many times we run the test on a fixed input.
We now look at generalized versions of the Fibonacci test. These generalized
tests have two advantages. First, a concept of variable parameters (like bases)
is introduced so as to make the test probabilistic. Second, the tests are made
more stringent so that fewer composite numbers are certified as primes.

5.2.5 Lucas Test


An obvious generalization of the Fibonacci sequence is the Lucas sequence
U_m = U_m(a, b), characterized by two integer parameters a, b.10

    U_0 = 0,
    U_1 = 1,
    U_m = aU_{m−1} − bU_{m−2} for m ≥ 2.     (5.3)

The characteristic polynomial of this recurrence is x^2 − ax + b with discriminant
∆ = a^2 − 4b. We assume that ∆ is non-zero and not a perfect square. The
two roots α, β of this polynomial are distinct and given by α = (a + √∆)/2 and
β = (a − √∆)/2. The sequence U_m can be expressed explicitly in terms of α, β as

    U_m = U_m(a, b) = (α^m − β^m)/(α − β) for all m ≥ 0.     (5.4)

The generalization of Theorem 5.16 for Lucas sequences is the following.

Theorem 5.20 U_{p−(∆/p)} ≡ 0 (mod p) for a prime p with gcd(p, 2b∆) = 1.
Proof Straightforward modification of the proof of Theorem 5.16. ⊳

Definition 5.21 Let U_m = U_m(a, b) be a Lucas sequence, and ∆ = a^2 − 4b.
An integer n with gcd(n, 2b∆) = 1 is called a Lucas pseudoprime with
parameters (a, b) if U_{n−(∆/n)} ≡ 0 (mod n). ⊳

10 The Fibonacci sequence corresponds to a = 1 and b = −1.

The Lucas pseudoprimality test is given as Algorithm 5.6.


Algorithm 5.6: Lucas test for n with parameters (a, b)

Compute ∆ = a^2 − 4b.
Compute l = (∆/n).
If (l = 0), return "n is composite".
Compute U = U_{n−l} modulo n.
If (U = 0), return "n is prime", else return "n is composite".

We can invoke this test with several parameters (a, b). If any of these
invocations indicates that n is composite, then n is certainly composite. On
the other hand, if all of these invocations certify n as prime, we accept n as
prime. By increasing the number of trials (different parameters a, b), we can
reduce the probability that a composite integer is certified as a prime.
We should now supply an algorithm for an efficient computation of the
value U_{n−(∆/n)} modulo n. To that effect, we introduce a related sequence
V_m = V_m(a, b) as follows.

    V_0 = 2,
    V_1 = a,
    V_m = aV_{m−1} − bV_{m−2} for m ≥ 2.     (5.5)

As above, let α, β be the roots of the characteristic polynomial x^2 − ax + b.
An explicit formula for the sequence V_m is as follows.

    V_m = V_m(a, b) = α^m + β^m for all m ≥ 0.     (5.6)

The sequence U_m can be computed from V_m, V_{m+1} by the simple formula:

    U_m = ∆^(−1) (2V_{m+1} − aV_m) for all m ≥ 0.

Therefore, it suffices to compute V_{n−(∆/n)} and V_{n−(∆/n)+1} for the Lucas test.
This computation can be efficiently done using the doubling formulas:

    V_{2k} = V_k^2 − 2b^k,
    V_{2k+1} = V_k V_{k+1} − ab^k,     (5.7)
    V_{2k+2} = V_{k+1}^2 − 2b^(k+1).

Designing the analog of Algorithm 5.5 for Lucas tests is posed as Exercise 5.17.

Example 5.22 Consider the Lucas sequence U_m(3, 1) with a = 3 and b = 1:

    U_0 = 0,
    U_1 = 1,
    U_m = 3U_{m−1} − U_{m−2} for m ≥ 2.

Thus, U_2 = 3U_1 − U_0 = 3, U_3 = 3U_2 − U_1 = 8, U_4 = 3U_3 − U_2 = 21, and so on.
The discriminant is ∆ = a^2 − 4b = 5, and the roots of the characteristic equation
are α = (3 + √5)/2 and β = (3 − √5)/2, that is,

    U_m = U_m(3, 1) = (1/√5) [((3 + √5)/2)^m − ((3 − √5)/2)^m] for all m ≥ 0.

We show that 21 is a Lucas pseudoprime with parameters (3, 1), that is, U_20 ≡
0 (mod 21) (we have (5/21) = 1). We use the sequence V_m for this computation.
Since b = 1, we have the simplified formulas:

    V_{2k} = V_k^2 − 2,
    V_{2k+1} = V_k V_{k+1} − a = V_k V_{k+1} − 3,
    V_{2k+2} = V_{k+1}^2 − 2.

The computation of V_20 is shown below. Note that 20 = (10100)_2.

i   m_i   M_i              V_{M_i} (mod 21)           V_{M_i+1} (mod 21)
5         0                V_0 ≡ 2                    V_1 ≡ 3
4   1     (1)_2 = 1        V_1 ≡ V_0 V_1 − 3 ≡ 3      V_2 ≡ V_1^2 − 2 ≡ 7
3   0     (10)_2 = 2       V_2 ≡ V_1^2 − 2 ≡ 7        V_3 ≡ V_1 V_2 − 3 ≡ 18
2   1     (101)_2 = 5      V_5 ≡ V_2 V_3 − 3 ≡ 18     V_6 ≡ V_3^2 − 2 ≡ 7
1   0     (1010)_2 = 10    V_10 ≡ V_5^2 − 2 ≡ 7       V_11 ≡ V_5 V_6 − 3 ≡ 18
0   0     (10100)_2 = 20   V_20 ≡ V_10^2 − 2 ≡ 5      V_21 ≡ V_10 V_11 − 3 ≡ 18

Therefore, U_20 ≡ ∆^(−1)(2V_21 − aV_20) ≡ 5^(−1)(2 × 18 − 3 × 5) ≡ 0 (mod 21). ¤
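The table above can be reproduced with the doubling formulas (5.7). A Python sketch follows (illustrative names; designing the corresponding GP/PARI routine is Exercise 5.17, so treat this only as one possible solution under those formulas). U_m is recovered from the V-values as ∆^(−1)(2V_{m+1} − aV_m).

```python
def lucas_V_pair(m, a, b, n):
    """Return (V_m, V_{m+1}) modulo n using the doubling formulas (5.7)."""
    if m == 0:
        return (2 % n, a % n)
    v, w = lucas_V_pair(m // 2, a, b, n)   # (V_k, V_{k+1}) with k = m // 2
    bk = pow(b, m // 2, n)                 # b^k mod n
    if m % 2 == 0:                         # m = 2k
        return ((v * v - 2 * bk) % n, (v * w - a * bk) % n)
    return ((v * w - a * bk) % n, (w * w - 2 * b * bk) % n)   # m = 2k + 1

def lucas_U(m, a, b, n):
    """U_m modulo n, assuming gcd(n, ∆) = 1 where ∆ = a^2 - 4b."""
    delta = a * a - 4 * b
    v, w = lucas_V_pair(m, a, b, n)
    return (2 * w - a * v) * pow(delta, -1, n) % n

print(lucas_V_pair(20, 3, 1, 21), lucas_U(20, 3, 1, 21))
# (5, 18) 0
```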

A stronger Lucas test can be developed along the lines of the Miller–Rabin test. Let p be
an odd prime. Consider the Lucas sequence U_m with parameters a, b. Assume
that gcd(p, 2b∆) = 1, that is, α, β are distinct in F_p or F_{p^2}. This implies that
(∆/p) = ±1, that is, p − (∆/p) is even. Write p − (∆/p) = 2^s t with s, t ∈ N
and with t odd. The condition U_k ≡ 0 (mod p) implies (α/β)^k ≡ 1 (mod p).
Since the only square roots of 1 modulo p are ±1, we have either (α/β)^t ≡
1 (mod p) or (α/β)^(2^j t) ≡ −1 (mod p) for some j ∈ {0, 1, . . . , s − 1}. The
condition (α/β)^t ≡ 1 (mod p) implies U_t ≡ 0 (mod p), whereas the condition
(α/β)^(2^j t) ≡ −1 (mod p) implies V_{2^j t} ≡ 0 (mod p).

Definition 5.23 Let U_m = U_m(a, b) be a Lucas sequence with discriminant
∆ = a^2 − 4b. Let V_m = V_m(a, b) be the corresponding sequence as defined by
the recurrence and initial conditions (5.5). Let n be a (positive) integer with
gcd(n, 2b∆) = 1. We write n − (∆/n) = 2^s t with s, t ∈ N and with t odd. We call
n a strong Lucas pseudoprime with parameters (a, b) if either U_t ≡ 0 (mod n)
or V_{2^j t} ≡ 0 (mod n) for some j ∈ {0, 1, . . . , s − 1}. ⊳

Obviously, every strong Lucas pseudoprime is also a Lucas pseudoprime


(with the same parameters). The converse of this is not true.

Example 5.24 (1) Example 5.22 shows that 21 is a composite Lucas pseudoprime
with parameters (3, 1). In this case, n − (∆/n) = 20 = 2^2 × 5, that is, s = 2
and t = 5. We have U_5 ≡ ∆^(−1)(2V_6 − aV_5) ≡ 5^(−1)(14 − 54) ≡ 13 ≢ 0 (mod 21).
Moreover, V_5 ≡ 18 ≢ 0 (mod 21), and V_10 ≡ 7 ≢ 0 (mod 21). That is, 21 is
not a strong Lucas pseudoprime with parameters (3, 1).
There are exactly 21 composite Lucas pseudoprimes ≤ 10,000 with
parameters (3, 1). These are 21, 323, 329, 377, 451, 861, 1081, 1819, 1891, 2033,
2211, 3653, 3827, 4089, 4181, 5671, 5777, 6601, 6721, 8149 and 8557. Only five
(323, 377, 1891, 4181 and 5777) of these are strong Lucas pseudoprimes.
(2) There is no composite integer ≤ 10^7 which is a strong Lucas
pseudoprime with respect to both the parameters (3, 1) and (4, 1). ¤

Algorithm 5.7: Strong Lucas test for n with parameters (a, b)

Compute ∆ = a^2 − 4b.
Compute l = (∆/n).
If (l = 0), return "n is composite".
Express n − l = 2^s t with s, t ∈ N, t odd.
Compute U = U_t(a, b) modulo n.
If (U = 0), return "n is prime".
Compute V = V_t(a, b) modulo n.
Set j = 0.
While (j < s) {
    If (V = 0), return "n is prime".
    Set V ≡ V^2 − 2b^(2^j t) (mod n).
    ++j.
}
Return "n is composite".

Algorithm 5.7 presents the strong Lucas primality test. We assume general
parameters (a, b). Evidently, the algorithm becomes somewhat neater and
more efficient if we restrict only to parameters of the form (a, 1).
Arnault11 proves an upper bound on the number of pairs (a, b) to which
a composite number n is a strong Lucas pseudoprime. More precisely, Arnault
takes a discriminant ∆ and an odd composite integer n coprime to ∆ (but not
equal to 9). By SL(∆, n), he denotes the number of parameters (a, b) with
0 ≤ a, b < n, gcd(n, b) = 1, a^2 − 4b ≡ ∆ (mod n), and with n being a strong
Lucas pseudoprime with parameters (a, b). Arnault proves that SL(∆, n) ≤ n/2
for all values of ∆ and n. Moreover, if n is not a product of twin primes of a
special form, then SL(∆, n) ≤ (4/15)n.

11 François Arnault, The Rabin–Monier theorem for Lucas pseudoprimes, Mathematics of
Computation, 66(218), 869–881, 1997.
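As a concrete illustration of Algorithm 5.7, here is a self-contained Python sketch (illustrative names; the Jacobi symbol and the V-sequence ladder of (5.7) are inlined so the function stands alone). The check U_t ≡ 0 (mod n) is carried out as 2V_{t+1} − aV_t ≡ 0 (mod n), which avoids inverting ∆.

```python
def strong_lucas(n, a, b):
    """Sketch of Algorithm 5.7 for odd n > 2 with gcd(n, 2b(a^2 - 4b)) = 1."""
    d = a * a - 4 * b
    x, m, l = d % n, n, 1                 # compute the Jacobi symbol l = (d/n)
    while x:
        while x % 2 == 0:
            x //= 2
            if m % 8 in (3, 5):
                l = -l
        x, m = m, x
        if x % 4 == 3 and m % 4 == 3:
            l = -l
        x %= m
    if m != 1:
        return False                      # (d/n) = 0, so gcd(d, n) > 1
    s, t = 0, n - l                       # write n - (d/n) = 2^s * t with t odd
    while t % 2 == 0:
        s, t = s + 1, t // 2

    def v_pair(k):                        # (V_k, V_{k+1}) mod n via (5.7)
        if k == 0:
            return (2 % n, a % n)
        v, w = v_pair(k // 2)
        bk = pow(b, k // 2, n)
        if k % 2 == 0:
            return ((v * v - 2 * bk) % n, (v * w - a * bk) % n)
        return ((v * w - a * bk) % n, (w * w - 2 * b * bk) % n)

    v, w = v_pair(t)
    if (2 * w - a * v) % n == 0:          # U_t ≡ 0 (mod n)
        return True
    bt = pow(b, t, n)
    for _ in range(s):                    # check V_{2^j t} for j = 0, ..., s - 1
        if v == 0:
            return True
        v = (v * v - 2 * bt) % n
        bt = bt * bt % n
    return False

print(strong_lucas(21, 3, 1), strong_lucas(323, 3, 1))
# False True, as in Example 5.24
```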

5.2.6 Other Probabilistic Tests


An extra strong Lucas test is covered in Exercise 5.18. Lehmer pseudoprimes
and strong Lehmer pseudoprimes12 are special types of Lucas and
strong Lucas pseudoprimes. Perrin pseudoprimes13 are based on (generalized)
Perrin sequences14 (see Exercise 5.20).
Grantham introduces the concept of Frobenius pseudoprimes.15 Instead of
working with specific polynomials like x^2 − ax + b, he takes a general monic
polynomial f(x) ∈ Z[x]. The Fermat test to base a is a special case with
f(x) = x − a, whereas the Lucas test with parameters (a, b) is the special
case f(x) = x^2 − ax + b. Moreover, Perrin pseudoprimes are special cases
with deg f(x) = 3. Grantham also defines strong Frobenius pseudoprimes, and
shows that strong (that is, Miller–Rabin) pseudoprimes and strong and extra
strong Lucas pseudoprimes are special cases of strong Frobenius pseudoprimes.
Pomerance, Selfridge and Wagstaff16 have declared an award of $620 for
solving the open problem reported in Example 5.18(3). Grantham, on the
other hand, has declared an award of $6.20 for locating a composite Frobenius
pseudoprime ≡ ±2 (mod 5) with respect to the polynomial x^2 + 5x + 5 (or for
proving that no such pseudoprime exists). Grantham justifies that "the low
monetary figure is a reflection of my financial status at the time of the offer,
not of any lower confidence level." Indeed, he mentions that "I believe that
the two problems are equally challenging."

5.3 Deterministic Primality Testing


The deterministic complexity of primality testing has attracted serious
research attention for quite a period of time. Under the assumption that the
ERH is true, both the Miller–Rabin test and the Solovay–Strassen test can
be derandomized to polynomial-time primality-testing algorithms. Adleman
12 A. Rotkiewicz, On Euler Lehmer pseudoprimes and strong Lehmer pseudoprimes with
parameters L, Q in arithmetic progressions, Mathematics of Computation, 39(159), 239–247, 1982.
13 William Adams and Daniel Shanks, Strong primality tests that are not sufficient,
Mathematics of Computation, 39(159), 255–300, 1982.
14 R. Perrin introduced this sequence in L'Intermédiaire des mathématiciens, Vol. 6, 1899.
15 Jon Grantham, Frobenius pseudoprimes, Mathematics of Computation, 70(234), 873–891, 2001.
16 Richard Kenneth Guy, Unsolved problems in number theory (3rd ed), Springer, 2004.

et al.17 propose a deterministic primality-testing algorithm with running time
O((log n)^(log log log n)). The exponent log log log n grows very slowly with n, but it is
still not a constant: it goes to infinity as n goes to infinity. The elliptic-curve
primality-proving algorithm (ECPP), proposed by Goldwasser and Kilian18
and modified by Atkin and Morain19, runs in polynomial time in practice,
but its running-time analysis is based upon certain heuristic assumptions.
Indeed, ECPP has not been proved to run in polynomial time for all inputs.
In 2002, Agrawal, Kayal and Saxena20 proposed the first deterministic
polynomial-time algorithm for primality testing. The proof of correctness (and
of the running time) of this AKS algorithm is not based on any unprovable fact or
heuristic assumption. Several improvements of the AKS test have been proposed. The
inventors of the test themselves published a revised version with somewhat
reduced running time. Lenstra and Pomerance21 have proposed a more
significant reduction in the running time of the AKS test.

5.3.1 Checking Perfect Powers


I first show that it is computationally easy to determine whether a positive
integer is a perfect power, that is, an integral power a^k of a positive integer a
with exponent k ≥ 2.

Definition 5.25 Let k ∈ N, k ≥ 2. A positive integer n ≥ 2 is called a perfect
k-th power if n = a^k for some positive integer a. For k = 2 and k = 3, we talk
about perfect squares and perfect cubes in this context. We call n a perfect
power if n is a perfect k-th power for some k ≥ 2. ⊳

Algorithm 5.8 determines whether n is a perfect k-th power. It is based on
the Newton–Raphson method.22 We first compute a = ⌊n^(1/k)⌋, and then check
whether n = a^k. For computing the integer k-th root a of n, we essentially
compute a zero of the polynomial f(x) = x^k − n. We start with an initial
approximation a_0 ≥ n^(1/k). A good starting point could be a_0 = 2^⌈l/k⌉ = 2^⌊(l+k−1)/k⌋,
where l is the bit length of n. Subsequently, we refine the approximation by
computing a decreasing sequence a_1, a_2, . . . . For computing a_{i+1} from a_i, we
approximate the curve y = f(x) by the tangent to the curve passing through
17 Leonard M. Adleman, Carl Pomerance and Robert S. Rumely, On distinguishing prime
numbers from composite numbers, Annals of Mathematics, 117(1), 173–206, 1983.
18 Shafi Goldwasser and Joe Kilian, Almost all primes can be quickly certified, STOC,
316–329, 1986.
19 A. O. L. Atkin and François Morain, Elliptic curves and primality proving, Mathematics
of Computation, 61, 29–68, 1993.
20 Manindra Agrawal, Neeraj Kayal and Nitin Saxena, PRIMES is in P, Annals of
Mathematics, 160(2), 781–793, 2004. This article can also be downloaded from the Internet
site: http://www.cse.iitk.ac.in/users/manindra/algebra/primality.pdf. Their first article on this
topic is available at http://www.cse.iitk.ac.in/users/manindra/algebra/primality_original.pdf.
21 Hendrik W. Lenstra, Jr. and Carl Pomerance, Primality testing with Gaussian periods,
available from http://www.math.dartmouth.edu/~carlp/aks041411.pdf, 2011.
22 This numerical method was developed by Sir Isaac Newton (1642–1727) and Joseph
Raphson (1648–1715).
(a_i, f(a_i)). This line meets the x-axis at x = a_{i+1}. Therefore, f′(a_i) =
(0 − f(a_i))/(a_{i+1} − a_i), that is, a_{i+1} = a_i − f(a_i)/f′(a_i). Simple calculations
show that a_{i+1} = ((k − 1)a_i^k + n)/(k a_i^(k−1)). In order to avoid floating-point
calculations, we actually update a_{i+1} = ⌊((k − 1)a_i^k + n)/(k a_i^(k−1))⌋.
.
i

Algorithm 5.8: Checking whether n is a perfect k-th power

Compute the bit length l of n.
Set the initial approximation a = 2^⌈l/k⌉ = 2^⌊(l+k−1)/k⌋.
Repeat until explicitly broken {
    Compute the temporary value t = a^(k−1).
    Compute the next integer approximation b = ⌊((k − 1)at + n)/(kt)⌋.
    If (b ≥ a), break the loop, else set a = b.
}
If n equals a^k, return "True", else return "False".

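As a quick illustration, the integer Newton iteration of Algorithm 5.8 can be sketched in Python (the function names are mine; Python's unbounded integers stand in for the multiple-precision arithmetic the book assumes, and the book's own programming exercises use GP/PARI instead):

```python
def kth_root(n, k):
    """Floor of the k-th root of n >= 1, by the integer Newton
    iteration of Algorithm 5.8."""
    l = n.bit_length()
    a = 1 << -(-l // k)                    # initial approximation 2^ceil(l/k)
    while True:
        t = a ** (k - 1)                   # temporary value a^(k-1)
        b = ((k - 1) * a * t + n) // (k * t)
        if b >= a:                         # the sequence stopped decreasing
            return a
        a = b

def is_kth_power(n, k):
    a = kth_root(n, k)
    return a ** k == n
```

For example, `kth_root(26, 3)` returns 2, and `is_kth_power(27, 3)` returns `True`.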
I now prove the correctness of Algorithm 5.8, that is, a = ⌊n^(1/k)⌋ when
the loop is broken. Since k ≥ 2, the function f(x) = x^k − n is convex²³ to
the right of the real root ξ = n^(1/k). For resolving ambiguities, let us denote
α_{i+1} = ((k − 1)a_i^k + n)/(k·a_i^(k−1)), and a_{i+1} = ⌊α_{i+1}⌋. If a_i > ξ, the convexity of f implies
that ξ < α_{i+1} < a_i. Taking floor gives ⌊ξ⌋ ≤ a_{i+1} < a_i. This means that
the integer approximation stored in a decreases strictly so long as a > ⌊ξ⌋,
and eventually obtains the value a_j = ⌊ξ⌋ for some j. I show that in the next
iteration, the loop is broken after the computation of b = a_{j+1} from a = a_j.
If n is actually a k-th power, ⌊ξ⌋ = ξ. In that case, the next integer
approximation a_{j+1} also equals ⌊ξ⌋ = ξ. Thus, a_{j+1} = a_j, and the loop is
broken. On the other hand, if n is not a perfect k-th power, then ⌊ξ⌋ < ξ, that
is, the current approximation a_j < ξ. Since, in this case, we have f(a_j) < 0
and f′(a_j) > 0, the next real approximation α_{j+1} is larger than a_j, and so its
floor a_{j+1} is ≥ a_j. Thus, the condition b ≥ a is again satisfied.
It is well known from the results of numerical analysis that the Newton–
Raphson method converges quadratically. That is, the Newton–Raphson loop
is executed at most O(log n) times. The exponentiation t = a^(k−1) in each
iteration can be computed in O(log² n log k) time. The rest of an iteration
runs in O(log² n) time. Therefore, Algorithm 5.8 runs in O(log³ n log k) time.
We now vary k. The maximum possible exponent k for which n can be
a perfect k-th power corresponds to the case n = 2^k, that is, to k = lg n =
log n / log 2. Therefore, it suffices to check whether n is a perfect k-th power for
k = 2, 3, . . . , ⌊lg n⌋. In particular, we always have k = O(log n), and checking
whether n is a perfect power finishes in O(log⁴ n log log n) or O˜(log⁴ n) time.²⁴
²³ A real-valued function f is called convex in the real interval [a, b] if f((1 − t)a + tb) ≤ (1 − t)f(a) + tf(b) for every real t ∈ [0, 1].
²⁴ The soft-O notation O˜(t(n)) stands for O(t(n) log^s(t(n))) for some constant s.
Primality Testing 287

In this context, it is worthwhile to mention that there is no need to consider
all possible values of k in the range 2 ≤ k ≤ lg n. One may instead consider
only the prime values of k in this range. If k is composite with a prime divisor
p, then the condition that n is a perfect k-th power implies that n is a perfect
p-th power too. Since lg n is not a large value in practice, one may use a
precomputed table of all primes ≤ lg n. Skipping Algorithm 5.8 for composite
k reduces the running time of the perfect-power-testing algorithm by a factor
of O(log log n) (by the prime number theorem).
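Putting the pieces together, the whole perfect-power test can be sketched in Python as follows (the names `_iroot` and `is_perfect_power` are mine). Only prime exponents k ≤ lg n are tried, with a small sieve marking the composite exponents:

```python
def _iroot(n, k):
    # integer floor k-th root, via the Newton iteration of Algorithm 5.8
    a = 1 << -(-n.bit_length() // k)       # 2^ceil(l/k) >= n^(1/k)
    while True:
        t = a ** (k - 1)
        b = ((k - 1) * a * t + n) // (k * t)
        if b >= a:
            return a
        a = b

def is_perfect_power(n):
    """True iff n = m^k for some integers m and k >= 2.  A perfect k-th
    power with k composite is also a perfect p-th power for any prime
    p dividing k, so only prime exponents k <= lg n need be tried."""
    lg = n.bit_length()
    composite = [False] * (lg + 1)
    for k in range(2, lg + 1):
        if not composite[k]:               # k is prime
            for m in range(2 * k, lg + 1, k):
                composite[m] = True
            if _iroot(n, k) ** k == n:
                return True
    return False
```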

5.3.2 AKS Test

The Agrawal–Kayal–Saxena (AKS) test is based on the following characterization of primes.

Theorem 5.26 An integer n ≥ 2 is prime if and only if n divides the binomial coefficient C(n, k) for all k = 1, 2, . . . , n − 1.
Proof We have C(n, k) = n(n − 1) · · · (n − k + 1)/k!. If n is prime and 1 ≤ k ≤ n − 1, the
numerator of this expression for C(n, k) is divisible by n, whereas the denominator
is not. On the other hand, suppose that n is composite. Let p be any prime
divisor of n, and let v_p(n) = e be the multiplicity of p in n. Take k = p.
Neither of the factors n − 1, n − 2, . . . , n − p + 1 is divisible by p, whereas p!
is divisible by p (but not by p²). Consequently, v_p(C(n, p)) = e − 1, so n ∤ C(n, p). ⊳

Corollary 5.27 Let n be an odd positive integer, and let a be coprime to n.
Then, (x + a)^n ≡ x^n + a (mod n) if and only if n is a prime. ⊳
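Theorem 5.26 is easy to check numerically. The sketch below (function name mine) tests the divisibility condition directly with Python's `math.comb`; this brute-force check is of course exponential, and is meant only to illustrate the characterization:

```python
from math import comb

def divides_all_binomials(n):
    # Theorem 5.26: n is prime iff n | C(n, k) for k = 1, ..., n - 1
    return all(comb(n, k) % n == 0 for k in range(1, n))
```

For instance, `divides_all_binomials(7)` is `True`, while `divides_all_binomials(9)` is `False`, since 9 does not divide C(9, 3) = 84.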

A straightforward use of Theorem 5.26 or Corollary 5.27 calls for the com-
putation of n − 1 binomial coefficients modulo n, leading to an exponential
algorithm for primality testing. This problem can be avoided by taking a poly-
nomial h(x) of small degree and by computing (x + a)^n and x^n + a modulo n
and h(x), that is, we now use the arithmetic of the ring Z_n[x]/⟨h(x)⟩. Let r
denote the degree of h(x) modulo n. All intermediate products are maintained
as polynomials of degrees < r. Consequently, an exponentiation of the form
(x + a)^n or x^n can be computed in O(r² log³ n) time. If r = O(log^k n) for some
constant k, this leads to a polynomial-time test for the primality of n.
Composite integers n too may satisfy (x + a)^n ≡ x^n + a (mod n, h(x)),
so the AKS test might appear to be just another probabilistic primality
test. However, there is a neat way to derandomize this algorithm. In view of
the results in Section 5.3.1, we assume that n is not a perfect power.
The AKS algorithm proceeds in two stages. In the first stage, we take
h(x) = x^r − 1 for some small integer r. We call r suitable in this context if
ord_r(n) > lg² n. A simple argument establishes that for all n ≥ 5,690,034,
a suitable r ≤ ⌈lg⁵ n⌉ exists. An efficient computation of ord_r(n) requires
the factorization of r and φ(r). However, since r is O(lg⁵ n), it is fine to use
an exponential (in lg r) algorithm for obtaining these factorizations. Another
alternative is to compute n, n², n³, . . . modulo r until ord_r(n) is revealed.

Algorithm 5.9: The AKS primality test

If n is a perfect power, return "False".
/* Stage 1 */
For r = ⌈lg² n⌉ + 1, ⌈lg² n⌉ + 2, ⌈lg² n⌉ + 3, . . . {
    Compute the order t = ord_r(n).
    If (t > lg² n), break.
}
For a = 2, 3, . . . , r {
    If (gcd(a, n) > 1), return "False".
}
/* Stage 2 */
For a = 1, 2, 3, . . . , ⌊√φ(r) · lg n⌋ {
    If (x + a)^n ≢ x^n + a (mod n, x^r − 1), return "False".
}
return "True".

In the second stage, one works with the smallest suitable r available from
the first stage. Checking whether (x + a)^n ≡ x^n + a (mod n, x^r − 1) for all
a = 1, 2, . . . , ⌊√φ(r) · lg n⌋ allows one to deterministically conclude about the
primality of n. A proof of the fact that only these values of a suffice is omitted
here. The AKS test given as Algorithm 5.9 assumes that n ≥ 5,690,034.
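A direct, unoptimized Python transcription of Algorithm 5.9 follows (function names are mine). Polynomials of Z_n[x]/⟨x^r − 1⟩ are held as coefficient lists of length r; the perfect-power check of Section 5.3.1 is assumed to have been done separately; and, for simplicity, the search for r starts from 3 rather than ⌈lg² n⌉ + 1 (any r with gcd(r, n) > 1 is skipped and screened by the gcd stage instead). The sketch is practical only for small n:

```python
from math import gcd, log2

def _polymul(f, g, r, n):
    # product in Z_n[x]/(x^r - 1): exponents wrap around modulo r
    h = [0] * r
    for i, fi in enumerate(f):
        if fi:
            for j, gj in enumerate(g):
                h[(i + j) % r] = (h[(i + j) % r] + fi * gj) % n
    return h

def _power_of_x_plus_a(a, e, r, n):
    # (x + a)^e in Z_n[x]/(x^r - 1), by square-and-multiply
    base = [0] * r
    base[0], base[1] = a % n, 1
    result = [0] * r
    result[0] = 1
    while e:
        if e & 1:
            result = _polymul(result, base, r, n)
        base = _polymul(base, base, r, n)
        e >>= 1
    return result

def aks_is_prime(n):
    if n < 2:
        return False
    lg2n = log2(n) ** 2
    # Stage 1: find the smallest r with ord_r(n) > lg^2 n
    r = 2
    while True:
        r += 1
        if gcd(r, n) > 1:
            continue               # ord_r(n) undefined; screened below
        t, m = 1, n % r
        while m != 1:
            m = m * n % r
            t += 1
        if t > lg2n:
            break
    for a in range(2, r + 1):
        g = gcd(a, n)
        if 1 < g < n:
            return False           # a nontrivial factor was found
    if n <= r:
        return True                # n survived trial division by all a <= r
    # Stage 2: check (x + a)^n == x^n + a (mod n, x^r - 1)
    phi_r = sum(1 for a in range(1, r) if gcd(a, r) == 1)
    B = int((phi_r ** 0.5) * log2(n))
    for a in range(1, B + 1):
        lhs = _power_of_x_plus_a(a, n, r, n)
        rhs = [0] * r
        rhs[n % r] = 1
        rhs[0] = (rhs[0] + a) % n
        if lhs != rhs:
            return False
    return True
```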

Example 5.28 (1) Take n = 8,079,493. The search for a suitable r is shown
below. Since lg² n = 526.511 . . . , this search starts from ⌈lg² n⌉ + 1 = 528.
ord_528(n) = 20, ord_529(n) = 506, ord_530(n) = 52, ord_531(n) = 174,
ord_532(n) = 9, ord_533(n) = 60, ord_534(n) = 22, ord_535(n) = 212,
ord_536(n) = 6, ord_537(n) = 89, ord_538(n) = 67, ord_539(n) = 15,
ord_540(n) = 36, ord_541(n) = 540.
Therefore, r = 541 (which is a prime), and φ(r) = 540. One then verifies
that gcd(2, n) = gcd(3, n) = · · · = gcd(541, n) = 1, that is, n has no small
prime factors. One then computes the bound B = ⌊√φ(r) · lg n⌋ = 533, and
checks that the congruence (x + a)^n ≡ x^n + a (mod n, x^r − 1) holds for all
a = 1, 2, . . . , B. For example, (x + 1)^n ≡ x^199 + 1 (mod n, x^541 − 1), and
x^n + 1 ≡ x^(n rem 541) + 1 ≡ x^199 + 1 (mod n, x^541 − 1). So 8,079,493 is prime.
(2) For n = 19,942,739, we have lg² n = 588.031 . . . . We calculate
ord_590(n) = 58, ord_591(n) = 196, ord_592(n) = 36, and ord_593(n) = 592. So,
r = 593 is suitable, and n has no factors ≤ 593. The bound for the second stage
is now B = 590. However, for a = 1, one obtains (x + 1)^n ≡ 9029368x^592 +
919485x^591 + 10987436x^590 + · · · + 9357097x + 17978236 (mod n, x^593 − 1),
whereas x^n + 1 ≡ x^(n rem 593) + 1 ≡ x^149 + 1 (mod n, x^593 − 1). We conclude that
19,942,739 is not prime. Indeed, 19,942,739 = 2,683 × 7,433. ¤

One can easily work out that under schoolbook arithmetic, the AKS al-
gorithm runs in O(lg^16.5 n) time. If one uses fast arithmetic (based on FFT),
this running time drops to O˜(lg^10.5 n). This exponent is quite high compared
to the Miller–Rabin exponent (three). That is, one does not plan to use the
AKS test frequently in practical applications. Lenstra and Pomerance's im-
provement of the AKS test runs in O˜(log⁶ n) time.

5.4 Primality Tests for Numbers of Special Forms


The primality tests described until now work for general integers. For in-
tegers of specific types, more efficient algorithms can be developed.

5.4.1 Pépin Test for Fermat Numbers


A Fermat number f_m is of the form f_m = 2^(2^m) + 1 for some integer m ≥ 0.
In this section, we address the question which of the integers f_m are prime.
One easily checks that if 2^a + 1 is a prime, then a has to be of the form a = 2^m
for some m ≥ 0. However, not all integers of the form f_m = 2^(2^m) + 1 are prime.
It turns out that f_0 = 3, f_1 = 5, f_2 = 17, f_3 = 257, and f_4 = 65537 are
prime. Fermat conjectured that all numbers of the form f_m = 2^(2^m) + 1 are
prime. Possibly, f_5 = 2^(2^5) + 1 = 4294967297 was too large for Fermat to check
by hand. Indeed, f_5 = 641 × 6700417 is not prime. We know no value of m
other than 0, 1, 2, 3, 4, for which f_m is prime. It is an open question whether
the Fermat numbers f_m for m ≥ 5 are all composite.
A deterministic polynomial-time primality test for Fermat numbers can be
developed based on the following result.

Theorem 5.29 [Pépin's test]²⁵ The Fermat number f_m for m ≥ 1 is prime
if and only if 3^((f_m − 1)/2) ≡ −1 (mod f_m).
Proof [if] The condition 3^((f_m − 1)/2) ≡ −1 (mod f_m) implies 3^(f_m − 1) ≡
1 (mod f_m), that is, ord_{f_m}(3) | f_m − 1 = 2^(2^m), that is, ord_{f_m}(3) = 2^h for
some h in the range 1 ≤ h ≤ 2^m. However, if h < 2^m, we cannot have
3^((f_m − 1)/2) ≡ −1 (mod f_m). So ord_{f_m}(3) = 2^(2^m) = f_m − 1, that is, f_m is prime.
[only if] If f_m is prime, we have 3^((f_m − 1)/2) ≡ (3/f_m) (mod f_m) by Euler's
criterion. By the quadratic reciprocity law, (3/f_m) = (−1)^((f_m − 1)(3 − 1)/4) (f_m/3) =
(−1)^(2^(2^m − 1)) (f_m/3) = (f_m/3) = ((2^(2^m) + 1)/3) = (((−1)^(2^m) + 1)/3) = (2/3) = −1. ⊳

Therefore, Pépin's test involves only a modular exponentiation to the ex-
ponent (f_m − 1)/2 = 2^(2^m − 1), that can be computed by square operations only.
25 Jean François Théophile Pépin (1826–1904) was a French mathematician.
Example 5.30 (1) We show that f_3 = 2^(2^3) + 1 = 257 is prime. We need to
compute 3^(2^(2^3 − 1)), that is, 3^(2^7) modulo 257. Repeated squaring gives:
3^(2^0) ≡ 3 (mod 257),
3^(2^1) ≡ (3^(2^0))² ≡ 9 (mod 257),
3^(2^2) ≡ (3^(2^1))² ≡ 81 (mod 257),
3^(2^3) ≡ (3^(2^2))² ≡ 136 (mod 257),
3^(2^4) ≡ (3^(2^3))² ≡ 249 (mod 257),
3^(2^5) ≡ (3^(2^4))² ≡ 64 (mod 257),
3^(2^6) ≡ (3^(2^5))² ≡ 241 (mod 257),
3^(2^7) ≡ (3^(2^6))² ≡ 256 ≡ −1 (mod 257).
(2) Let us compute 3^(2^31) modulo f_5 = 2^(2^5) + 1 = 4294967297.
3^(2^0) ≡ 3 (mod 4294967297),
3^(2^1) ≡ (3^(2^0))² ≡ 9 (mod 4294967297),
3^(2^2) ≡ (3^(2^1))² ≡ 81 (mod 4294967297),
3^(2^3) ≡ (3^(2^2))² ≡ 6561 (mod 4294967297),
3^(2^4) ≡ (3^(2^3))² ≡ 43046721 (mod 4294967297),
3^(2^5) ≡ (3^(2^4))² ≡ 3793201458 (mod 4294967297),
3^(2^6) ≡ (3^(2^5))² ≡ 1461798105 (mod 4294967297),
· · ·
3^(2^30) ≡ (3^(2^29))² ≡ 1676826986 (mod 4294967297),
3^(2^31) ≡ (3^(2^30))² ≡ 10324303 (mod 4294967297).
Since 3^((f_5 − 1)/2) ≢ −1 (mod f_5), we conclude that f_5 is not prime. ¤
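Pépin's test is a one-liner in practice. The sketch below (function name mine) performs the 2^m − 1 squarings of Example 5.30 explicitly:

```python
def pepin(m):
    """Theorem 5.29: f_m = 2^(2^m) + 1, for m >= 1, is prime iff
    3^((f_m - 1)/2) == -1 (mod f_m)."""
    f = (1 << (1 << m)) + 1
    x = 3
    for _ in range((1 << m) - 1):   # (f-1)/2 = 2^(2^m - 1): squarings only
        x = x * x % f
    return x == f - 1
```

`pepin(3)` confirms that 257 is prime, while `pepin(5)` reports f_5 composite, matching Example 5.30.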

5.4.2 Lucas–Lehmer Test for Mersenne Numbers


A Mersenne number²⁶ is of the form M_n = 2^n − 1 for n ∈ N. It is easy
to prove that if M_n is prime, then n has to be prime too. The converse of
this is not true. For example, 2^11 − 1 = 2047 = 23 × 89 is not prime. The
computational question in this context is to find the primes p for which M_p =
2^p − 1 are primes. These primes M_p are called Mersenne primes.
Mersenne primes are useful for a variety of reasons. We do not know an
easily computable formula that generates only prime numbers. Mersenne primes
turn out to be the largest explicitly known primes. The collective Internet
effort called the Great Internet Mersenne Prime Search (GIMPS)²⁷ bags an
award of US$100,000 from the Electronic Frontier Foundation for the first
discoverer of a prime with ten million (or more) digits. This prime happens to
be the 47-th²⁸ Mersenne prime 2^43,112,609 − 1, a prime with 12,978,189 decimal
digits, discovered on August 23, 2008 in the Department of Mathematics,
UCLA. Running a deterministic primality test on such huge numbers is out
of the question. Probabilistic tests, on the other hand, do not furnish iron-clad
proofs for primality and are infeasible too for these numbers. A special test
known as the Lucas–Lehmer test²⁹ is used for deterministically checking the
primality of Mersenne numbers.
²⁶ The French mathematician Marin Mersenne (1588–1648) studied these numbers.
²⁷ Look at the Internet site http://www.mersenne.org/.
primality of Mersenne numbers.
A positive integer n is called a perfect number if it equals the sum of
its proper positive integral divisors. For example, 6 = 1 + 2 + 3 and 28 =
1 + 2 + 4 + 7 + 14 are perfect numbers. It is known that n is an even perfect
number if and only if it is of the form 2p−1 (2p − 1) with Mp = 2p − 1 being
a (Mersenne) prime. Thus, Mersenne primes have one-to-one correspondence
with even perfect numbers. We do not know any odd perfect number. We do
not even know whether an odd perfect number exists.
Theorem 5.31 [Lucas–Lehmer test] The sequence s_i, i ≥ 0, is defined as:
    s_0 = 4,
    s_i = s_{i−1}² − 2 for i ≥ 1.
For p ∈ P, M_p is prime if and only if s_{p−2} ≡ 0 (mod M_p). ⊳
I am not going to prove this theorem here. The theorem implies that we
need to compute the Lucas–Lehmer residue s_{p−2} (mod M_p). The obvious
iterative algorithm of computing s_i from s_{i−1} involves a square operation
followed by reduction modulo M_p. Since 2^p ≡ 1 (mod M_p), we write s_{i−1}² − 2 =
2^p·n_1 + n_0 with 0 ≤ n_0 < 2^p, and obtain s_{i−1}² − 2 ≡ n_1 + n_0 (mod M_p). One can extract n_1, n_0
by bit operations. Thus, reduction modulo M_p can be implemented efficiently.
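The shift-based reduction just described translates directly into Python (function name mine; p is assumed to be an odd prime, as the test requires):

```python
def lucas_lehmer(p):
    """M_p = 2^p - 1 is prime iff s_{p-2} == 0 (mod M_p), where
    s_0 = 4 and s_i = s_{i-1}^2 - 2 (Theorem 5.31)."""
    M = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = s * s
        s = (s >> p) + (s & M)     # fold n1, n0: 2^p == 1 (mod M_p)
        if s >= M:
            s -= M
        s = (s - 2) % M            # complete the step s^2 - 2
    return s == 0
```

`lucas_lehmer(7)` returns `True` and `lucas_lehmer(11)` returns `False`, in agreement with Example 5.32.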
Example 5.32 (1) We prove that M_7 = 2^7 − 1 = 127 is prime. The calcula-
tions are shown below.
i   s_i (mod M_7)
0   4
1   4² − 2 ≡ 14
2   14² − 2 ≡ 194 ≡ 1 × 2^7 + 66 ≡ 1 + 66 ≡ 67
3   67² − 2 ≡ 4487 ≡ 35 × 2^7 + 7 ≡ 35 + 7 ≡ 42
4   42² − 2 ≡ 1762 ≡ 13 × 2^7 + 98 ≡ 13 + 98 ≡ 111
5   111² − 2 ≡ 12319 ≡ 96 × 2^7 + 31 ≡ 96 + 31 ≡ 127 ≡ 0
Since s_{7−2} ≡ 0 (mod M_7), M_7 is prime.
²⁸ This is the 45th Mersenne prime to be discovered. Two smaller Mersenne primes were discovered later. It is not yet settled whether there are more undiscovered Mersenne primes smaller than M_43,112,609.
²⁹ The French mathematician François Édouard Anatole Lucas (1842–1891) introduced this test in 1856. It was later improved in the 1930s by the American mathematician Derrick Henry Lehmer (1905–1991).

(2) We now run the Lucas–Lehmer test on M11 = 211 − 1 = 2047.

i si (mod M11 )
0 4
1 42 − 2 ≡ 14
2 142 − 2 ≡ 194
3 1942 − 2 ≡ 37634 ≡ 18 × 211 + 770 ≡ 18 + 770 ≡ 788
4 7882 − 2 ≡ 620942 ≡ 303 × 211 + 398 ≡ 303 + 398 ≡ 701
5 7012 − 2 ≡ 491399 ≡ 239 × 211 + 1927 ≡ 239 + 1927 ≡ 2166
≡ 1 × 2048 + 118 ≡ 1 + 118 ≡ 119
6 1192 − 2 ≡ 14159 ≡ 6 × 211 + 1871 ≡ 6 + 1871 ≡ 1877
7 18772 − 2 ≡ 3523127 ≡ 1720 × 211 + 567 ≡ 1720 + 567 ≡ 2287
≡ 1 × 211 + 239 ≡ 1 + 239 ≡ 240
8 240 − 2 ≡ 57598 ≡ 28 × 211 + 254 ≡ 28 + 254 ≡ 282
2

9 2822 − 2 ≡ 79522 ≡ 38 × 211 + 1698 ≡ 38 + 1698 ≡ 1736

Since s11−2 6≡ 0 (mod M11 ), M11 is not prime. ¤



Exercises
1. For a positive integer n, the sum of the reciprocals of all primes ≤ n asymptot-
ically approaches ln ln n. Using this fact, derive that the sieve of Eratosthenes
can be implemented to run in O(n ln ln n) time.
2. Modify the sieve of Eratosthenes so that it runs in O(n) time. (Hint: Mark
each composite integer only once.)
3. If both p and 2p + 1 are prime, we call p a Sophie Germain prime³⁰, and 2p + 1
a safe prime. It is conjectured that there are infinitely many Sophie Germain
primes. In this exercise, you are asked to extend the sieve of Section 5.1.4 for
locating the smallest Sophie Germain prime p ≥ n for a given positive integer
n ≫ 1. Sieve over the interval [n, n + M].
(a) Determine a value of M such that there is (at least) one Sophie Germain
prime of the form n + i, 0 ≤ i ≤ M, with high probability. The value of M
should not be unreasonably large.
(b) Describe a sieve to throw away the values of n + i for which either n + i
or 2(n + i) + 1 has a prime divisor less than or equal to the t-th prime. Take
t as a constant (like 100).
(c) Describe the gain in the running time that you achieve using the sieve.
4. Let s and t be bit lengths with s > t.
(a) Describe an efficient algorithm to locate a random s-bit prime p such that
a random prime of bit length t divides p − 1.
(b) Express the expected running time of your algorithm in terms of s, t.
(c) How can you adapt the sieve of Section 5.1.4 in this computation?
5. Let p, q be primes, n = pq, a ∈ Z_n*, and d = gcd(p − 1, q − 1).
(a) Prove that n is a pseudoprime to base a if and only if a^d ≡ 1 (mod n).
(b) Prove that n is a pseudoprime to exactly d² bases in Z_n*.
(c) Let q = 2p − 1. To how many bases in Z_n* is n a pseudoprime?
(d) Repeat Part (c) for the case q = 2p + 1.
6. Let n ∈ N be odd and composite. If n is not a pseudoprime to some base in
Z∗n , prove that n is not a pseudoprime to at least half of the bases in Z∗n .
7. Prove the following properties of any Carmichael number n.
(a) (p − 1)|(n − 1) for every prime divisor p of n.
(b) n is odd.
(c) n is square-free.
(d) n has at least three distinct prime factors.
8. Suppose that 6k + 1, 12k + 1 and 18k + 1 are all prime for some k ∈ N. Prove
that (6k + 1)(12k + 1)(18k + 1) is a Carmichael number. Find two Carmichael
numbers of this form.
30 This is named after the French mathematician Marie-Sophie Germain (1776–1831). The

name safe prime is attributed to the use of these primes in many cryptographic protocols.

9. Prove that for every odd prime r, there exist only finitely many Carmichael
numbers of the form rpq (with p, q primes).
10. Prove that:
(a) Every Euler pseudoprime to base a is also a pseudoprime to base a.
(b) Every strong pseudoprime to base a is also a pseudoprime to base a.
11. Let n be an odd composite integer. Prove that:
(a) There is at least one base a ∈ Z∗n , to which n is not an Euler pseudoprime.
(b) n is not an Euler pseudoprime to at least half of the bases in Z∗n .
12. Prove that if n > 3 is a pseudoprime to base 2, then 2^n − 1 is an Euler
pseudoprime to base 2 and also a strong pseudoprime to base 2.
13. Let p and q = 2p − 1 be primes, and n = pq. Prove that:
(a) n is an Euler pseudoprime to exactly one-fourth of the bases in Z∗n .
(b) If p ≡ 3 (mod 4), then n is a strong pseudoprime to exactly one-fourth of
the bases in Z∗n .
14. Deduce the formulas (5.1), (5.4) and (5.6).
15. Prove that for all integers m ≥ 1 and n ≥ 0, the Fibonacci numbers satisfy
F_{m+n} = F_m F_{n+1} + F_{m−1} F_n.
Deduce the identities (5.2).
16. Prove the doubling formulas (5.7) for Vm defined in Section 5.2.5.
17. Write an analog of Algorithm 5.5 for the computation of Vm (mod n).
18. [Extra strong Lucas pseudoprime] Let U_m = U_m(a, 1) be the Lucas sequence
with parameters a, 1, and V_m = V_m(a, 1) the corresponding V sequence. Take
an odd positive integer n with gcd(n, 2∆a) = 1, where ∆ = a² − 4. We write
n − (∆/n) = 2^s·t with t odd, (∆/n) being the Jacobi symbol. We call n an
extra strong Lucas pseudoprime to base a if either (i) U_t ≡ 0 (mod n) and
V_t ≡ ±2 (mod n), or (ii) V_{2^j·t} ≡ 0 (mod n) for some j ∈ {0, 1, 2, . . . , s − 1}.
Prove that:
(a) If n ∈ P does not divide 2∆, then n is an extra strong Lucas pseudoprime.
(b) An extra strong Lucas pseudoprime is also a strong Lucas pseudoprime.
19. The Lehmer sequence Ū_m with parameters a, b is defined as:
    Ū_0 = 0,
    Ū_1 = 1,
    Ū_m = Ū_{m−1} − b·Ū_{m−2} if m ≥ 2 is even,
    Ū_m = a·Ū_{m−1} − b·Ū_{m−2} if m ≥ 3 is odd.
Let α, β be the roots of x² − √a·x + b.
(a) Prove that Ū_m = (α^m − β^m)/(α² − β²) if m is even, and
Ū_m = (α^m − β^m)/(α − β) if m is odd.
(b) Let ∆ = a − 4b, and n a positive integer with gcd(n, 2a∆) = 1. We call n
a Lehmer pseudoprime with parameters a, b if Ū_{n−(a∆/n)} ≡ 0 (mod n), where
(a∆/n) is the Jacobi symbol. Prove that n is a Lehmer pseudoprime with
parameters a, b if and only if n is a Lucas pseudoprime with parameters a, ab.

20. The Perrin sequence P(n) is defined recursively as:
    P(0) = 3,
    P(1) = 0,
    P(2) = 2,
    P(n) = P(n − 2) + P(n − 3) for n ≥ 3.
(a) Let ω ∈ C be a primitive third root of unity. Verify that for i = 0, 1, 2,
    ρ_i = ω^i·(1/2 + (1/6)√(23/3))^(1/3) + ω^(2i)·(1/2 − (1/6)√(23/3))^(1/3)
satisfy the characteristic equation x³ − x − 1 = 0 of the Perrin sequence.
(b) The real number ρ_0 is called the plastic number. Deduce that P(n) is the
integer nearest to ρ_0^n for all sufficiently large n. (Hint: Show that P(n) =
ρ_0^n + ρ_1^n + ρ_2^n for all n ≥ 0.)
(Remark: It is proved that if n is prime, then n | P(n). Adams and Shanks (see
Footnote 13) discover explicit examples of composite n (like 271441 = 521²)
satisfying n | P(n); these are (composite) Perrin pseudoprimes. Grantham³¹
proves that there are infinitely many composite Perrin pseudoprimes.)
21. An odd prime of the form k·2^r + 1 with r ≥ 1, k odd, and k < 2^r, is called a
Proth prime.³² The first few Proth primes are 3, 5, 13, 17, 41, 97.
(a) Describe an efficient way to recognize whether an odd positive integer
(not necessarily prime) is of the form k·2^r + 1 with r ≥ 1, k odd, and k < 2^r.
Henceforth, we call such an integer a Proth number.
(b) Suppose that a Proth number n = k·2^r + 1 satisfies the condition that
a^((n−1)/2) ≡ −1 (mod n) for some integer a. Prove that n is prime.
(c) Devise a yes-biased probabilistic polynomial-time algorithm to test the
primality of a Proth number.
(d) Discuss how the algorithm of Part (c) can produce an incorrect answer.
Also estimate the probability of this error.
(e) Prove that if the extended Riemann hypothesis (ERH) is true, one can
convert the algorithm of Part (c) to a deterministic polynomial-time algorithm
to test the primality of a Proth number.
22. [Pocklington primality test]³³ Let n be a positive odd integer whose primality
is to be checked. Write n − 1 = uv, where u is a product of small primes, and
v has no small prime divisors. Suppose that the complete prime factorization
of u is known, whereas no prime factor of v is known. Suppose also that for
some integer a, we have a^(n−1) ≡ 1 (mod n), whereas gcd(a^((n−1)/q) − 1, n) = 1 for
all prime divisors q of u.
³¹ Jon Grantham, There are infinitely many Perrin pseudoprimes, Journal of Number Theory, 130(5), 1117–1128, 2010.
³² The Proth test was developed by the French farmer François Proth (1852–1879), a self-taught mathematician.
³³ This test was proposed by the English mathematician Henry Cabourn Pocklington (1870–1952) and Lehmer.
(a) Prove that every prime factor p of n satisfies p ≡ 1 (mod u). (Hint: First,
show that u | ord_p(a).)
(b) Conclude that if u ≥ √n, then n is prime.
(c) Describe a situation when the criterion of Part (b) leads to an efficient
algorithm for determining the primality of n.
23. Suppose that Ay is a yes-biased algorithm for proving the primality of an
integer, and An a no-biased algorithm for the same purpose. Prove or disprove:
By running Ay and An alone, we can deterministically conclude about the
primality of an integer.
24. [Binary search algorithm for finding integer k-th roots] Suppose that we want
to compute a = ⌊n^(1/k)⌋ for positive integers n and k. We start with a lower
bound L and an upper bound U on a. We then run a loop, each iteration
of which computes M = ⌊(L + U)/2⌋, and decides whether a ≥ M or a < M.
Depending on the outcome of this comparison, we refine one of the two bounds
L, U . Complete the description of this algorithm for locating a. Determine a
suitable condition for terminating the loop. Compare the performance of this
method with the Newton–Raphson method discussed in Section 5.3.1, for
computing integer k-th roots.
25. Let s_i be the sequence used in the Lucas–Lehmer test for Mersenne numbers.
Prove that s_i = (2 + √3)^(2^i) + (2 − √3)^(2^i) for all i ≥ 0.
26. Prove that for m ≥ 2, the Fermat number f_m = 2^(2^m) + 1 is prime if and only
if 5^((f_m − 1)/2) ≡ −1 (mod f_m).

Programming Exercises
Write GP/PARI functions to implement the following.
27. Obtaining a random prime of a given bit length l.
28. The Solovay–Strassen test.
29. The Miller–Rabin test.
30. The Fibonacci test (you may use the function FibMod() of Section 5.2.4).
31. The Lucas test.
32. The strong Lucas test.
33. The AKS test.
34. The Pépin test.
35. The Lucas–Lehmer test.
Chapter 6
Integer Factorization

6.1 Trial Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 299


6.2 Pollard’s Rho Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
6.2.1 Floyd’s Variant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
6.2.2 Block GCD Calculation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
6.2.3 Brent’s Variant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
6.3 Pollard’s p – 1 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 306
6.3.1 Large Prime Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309
6.4 Dixon’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
6.5 CFRAC Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
6.6 Quadratic Sieve Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 318
6.6.1 Sieving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
6.6.2 Incomplete Sieving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
6.6.3 Large Prime Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
6.6.4 Multiple-Polynomial Quadratic Sieve Method . . . . . . . . . . . . . . . . . . . . 326
6.7 Cubic Sieve Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
6.8 Elliptic Curve Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
6.9 Number-Field Sieve Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 335
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 340

Now that we are able to quickly recognize primes as primes, it remains to com-
pute the prime factorization of (positive) integers. This is the tougher part
of the story. Research efforts for decades have miserably failed to produce
efficient algorithms for factoring integers. Even randomization does not seem
to help here. Today’s best integer-factoring algorithms run in subexponen-
tial time which, although better than exponential time, makes the factoring
problem practically intractable for input integers of size only thousand bits.
This chapter is an introduction to some integer-factoring algorithms. We
start with a few fully exponential algorithms. These old algorithms run effi-
ciently in certain specific situations, so we need to study them.
Some subexponential algorithms are discussed next. Assume that n is the
(positive) integer to be factored. A subexponential expression in log n is, in
this context, an expression of the form
£ ¤
L(n, ω, c) = exp (c + o(1))(ln n)ω (ln ln n)1−ω ,

where ω is a real number in the open interval (0, 1), and c is a positive real
number. Plugging in ω = 0 in L(n, ω, c) gives a polynomial expression in ln n.
On the other hand, for ω = 1, the expression L(n, ω, c) is fully exponential in
ln n. For 0 < ω < 1, the expression L(n, ω, c) is something between polynomial
and exponential, and is called a subexponential expression in ln n.
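To get a feel for these growth rates, the following Python sketch evaluates L(n, ω, c) with the o(1) term dropped (a simplification of mine, not the book's definition). It illustrates, for instance, that an ω = 1/3 bound with the number-field sieve's constant c = (64/9)^(1/3) ≈ 1.923 is asymptotically smaller than the L(n, 1/2, 1) bounds of the older methods:

```python
from math import exp, log

def L(n, omega, c):
    """L(n, omega, c) with the o(1) term dropped:
    exp(c * (ln n)^omega * (ln ln n)^(1 - omega))."""
    return exp(c * log(n) ** omega * log(log(n)) ** (1 - omega))

# omega = 0 gives a polynomial in ln n; omega = 1 is fully exponential;
# intermediate omega interpolates between the two regimes.
```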


Most modern integer-factoring algorithms have running times of the form


L(n, ω, c). Smaller values of ω lead to expressions closer to polynomial. During
the 1970s and 1980s, several integer-factoring algorithms were designed with ω =
1/2. It was, for a while, apparent that ω = 1/2 is possibly the best exponent
we can achieve. In 1989, the number-field sieve method was proposed. This
algorithm corresponds to ω = 1/3, and turns out to be the fastest (both
theoretically and practically) known algorithm for factoring integers.
A complete understanding of the number-field sieve method calls for math-
ematical background well beyond what this book can handle. We will mostly
study some L(n, 1/2, c) integer-factoring algorithms. We use the special sym-
bol L[n, c] or L_n[c] to stand for L(n, 1/2, c). If n is understood from the con-
text, we will abbreviate this notation further as L[c].
An integer n we wish to factor is first subjected to a primality test. Only
when we are certain that n is composite do we attempt to factor it. In view of
the results of Section 5.3.1, we may also assume that n is not a perfect power.
An algorithm that computes a non-trivial decomposition n = uv with 1 <
u, v < n (or that only supplies a non-trivial factor u of n) can be recursively
used to factor u and v = n/u. So it suffices to compute a non-trivial split of n.
Composite integers of some special forms are often believed to be more
difficult to factor than others. For example, the (original) RSA encryption
algorithm is based on composite moduli of the form n = pq with p, q being
primes of roughly the same bit length. For such composite integers, computing
a non-trivial split (or factor) is equivalent to fully factoring n.
The generic factoring function in GP/PARI is factor(). The function that
specifically handles integers is factorint(). One may supply an optional flag
to factorint() to indicate the user’s choice of the factoring algorithm (see the
online GP/PARI manual). Here follows a sample conversation with the GP/PARI
calculator. The last example illustrates that for about a minute, the calculator
tries to use easy methods for factoring 2^301 − 1. When these easy methods fail,
it pulls out a big gun like the MPQS (multiple-polynomial quadratic sieve).
A new version of GP/PARI, however, completely factors 2^301 − 1 in about two
minutes. An attempt to factor 2^401 − 1 in this version runs without any warning
message, and fails to output the result in fifteen minutes.

gp > factor(2^2^5+1)
%1 =
[641 1]

[6700417 1]

gp > factorint(2^2^5+1)
%2 =
[641 1]

[6700417 1]

gp > #
timer = 1 (on)

gp > factorint(2^101-1)
time = 68 ms.
%3 =
[7432339208719 1]

[341117531003194129 1]

gp > factorint(2^201-1)
time = 1,300 ms.
%4 =
[7 1]

[1609 1]

[22111 1]

[193707721 1]

[761838257287 1]

[87449423397425857942678833145441 1]

gp > factorint(2^201-1,1)
time = 1,540 ms.
%5 =
[7 1]

[1609 1]

[22111 1]

[193707721 1]

[761838257287 1]

[87449423397425857942678833145441 1]

gp > factorint(2^301-1)
*** Warning: MPQS: the factorization of this number will take several hours.
*** user interrupt after 1mn, 34,446 ms.
gp >

6.1 Trial Division


An obvious way to factor
√ a composite n is to divide n by potential divisors
d in the range 2 6 d 6 ⌊ n⌋. A successful trial division by d would also replace
n by n/d. The process continues until n reduces to 1 or a prime factor.
It is easy to see that trial divisions need to be carried out only by
prime numbers. Indeed, if d is a composite divisor of n, then all prime divisors
300 Computational Number Theory

of d divide n and are smaller than d. Therefore, before d is tried, all prime
divisors of d are already factored out from n.
However, one then requires a list of primes ≤ √n. It is often not feasible
to have such a list. On the other hand, checking every potential divisor d for
primality before making a trial division of n by d is a massive investment of
time. A practical trade-off can be obtained using the following idea.¹
After 2 is tried as a potential divisor, there is no point dividing n by
even integers. This curtails the space for potential divisors by a factor of 2.
Analogously, we should not carry out trial division by multiples of 3 (other
than 3 itself) and by multiples of 5 (other than 5 itself). What saving does it
produce? Consider d > 2 × 3 × 5 = 30 with r = d rem 30. If r is not coprime to
30, then d is clearly composite. Moreover, φ(30) = (2−1)×(3−1) × (5−1) = 8,
that is, only 8 (out of 30) values of r may be prime. Thus, trial division may be
skipped for d > 30 unless r = d rem 30 is among 1, 7, 11, 13, 17, 19, 23, 29. This
reduces the search space for potential divisors to about one-fourth. One may,
if one chooses, use other small primes like 7, 11, 13, . . . in this context. But
considering four or more small primes leads to additional bookkeeping, and
produces improvements that are not too dramatic. It appears that considering
only the first three primes 2, 3, 5 is a practically optimal choice.
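The 2-3-5 wheel described above can be sketched in a few lines of Python. This is our illustration, not the book's code; the function name trial_division and its interface are assumptions.

```python
from math import isqrt

# residues modulo 30 that are coprime to 30
WHEEL = (1, 7, 11, 13, 17, 19, 23, 29)

def trial_division(n):
    """Complete factorization of n by trial division with a 2-3-5 wheel."""
    factors = []
    for p in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29):   # primes < 30
        while n % p == 0:
            factors.append(p)
            n //= p
    k = 30
    while k <= isqrt(n):
        for r in WHEEL:
            d = k + r
            while n % d == 0:   # trying a composite d (49, 77, 91, ...) is harmless
                factors.append(d)
                n //= d
        k += 30
    if n > 1:
        factors.append(n)       # the remaining cofactor is prime
    return factors
```

For example, trial_division(4294967297) yields [641, 6700417].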
Example 6.1 Let us factor
n = 3^61 + 1 = 127173474825648610542883299604
by trial division. We first divide n by 2. Since n is even, 2 is indeed a factor
of n. Replacing n by n/2 gives
63586737412824305271441649802
which is again even. So we make another trial division by 2, and reduce n to
31793368706412152635720824901.
A primality test reveals that this reduced n is composite. So we divide this n
by the remaining primes < 30, that is, by 3, 5, 7, 11, 13, 17, 19, 23, 29. It turns
out that n is divisible by neither of these primes.
As potential divisors d > 30 of n, we consider only the values 30k+r for r =
1, 7, 11, 13, 17, 19, 23, 29. That is, we divide n by 31, 37, 41, 43, 47, 49, 53, 59, 61,
67, 71, 73, 77, 79, 83, 89, 91, 97, . . . . Some of these divisors are not prime (like
49, 77, 91), but that does not matter. Eventually, we detect a divisor 367 =
12 × 30 + 7 of n. Clearly, 367 has to be prime. Replacing n by n/367 gives
86630432442539925437931403.
A primality test shows that this reduced n is prime. So there is no need to
carry out trial division further, that is, we have the complete factorization
n = 3^61 + 1 = 2^2 × 367 × 86630432442539925437931403.
1 The trial-division algorithm in this form is presented in: Henri Cohen, A course in
computational algebraic number theory, Graduate Texts in Mathematics, 138, Springer, 1993.

This example illustrates that the method of trial division factors n efficiently
if all (except at most one) prime divisors of n are small. ¤

A complete factorization of n by trial division calls for a worst-case running
time of O˜(√n). This bound is achievable, for example, for RSA moduli of the
form n = pq with bit sizes of the primes p, q being nearly half of that of n. So
trial division is impractical except for small values of n (like n ≤ 10^20).
For factoring larger integers, more sophisticated ideas are needed.
Before employing these sophisticated algorithms, it is worthwhile to divide
n by a set of small primes. That reveals the small factors of n, and may reduce
its size considerably so as to make the sophisticated algorithms run somewhat
faster. In view of this, it will often be assumed that the number to be factored
does not contain small prime divisors.

Example 6.2 Let us use trial division to extract the small prime factors of
n = 3^60 + 1 = 42391158275216203514294433202.
Considering all potential divisors d ≤ 10^4 decomposes n as
n = 3^60 + 1 = 2 × 41 × 241 × 6481 × 330980468807135443441.
Primality tests indicate that the last factor is composite. We use sophisticated
algorithms for factoring this part of n. ¤

In practice, a program for factoring integers would load at startup a pre-


calculated list of small primes. This list may be as large as consisting of
the first ten million primes. It is useful to make trial divisions by all these
primes. If the list is not so large or not available at all, one may divide n by
d ≡ 1, 7, 11, 13, 17, 19, 23, 29 (mod 30), as explained earlier.

6.2 Pollard’s Rho Method


Let n be a composite positive integer which is not a perfect power, and p
a prime divisor of n. We generate a sequence of integers x_0, x_1, x_2, . . . modulo
n. We start with a random x_0 ∈ Z_n, and subsequently obtain x_i = f(x_{i−1})
for i ≥ 1. Here, f is an easily computable function with the property that
x_0, x_1, x_2, . . . behaves like a random sequence in Z_n. The most common choice
is f(x) = x^2 + a (mod n), where a ≠ 0, −2 is an element of Z_n.
Since xi is generated from xi−1 using a deterministic formula, and since
Zn is finite, the sequence x0 , x1 , x2 , . . . must be eventually periodic. We also
consider the sequence x′0 , x′1 , x′2 , . . . , where each x′i is the reduction of xi mod-
ulo (the unknown prime) p. The reduced sequence x′0 , x′1 , x′2 , . . . is periodic
too, since each x′i is an element of the finite set Zp .

Let τ be the (smallest) period of the sequence x_0, x_1, x_2, . . . , and τ′ the
period of the sequence x′_0, x′_1, x′_2, . . . . It is clear that τ′ | τ. If it so happens
that τ′ < τ, then there exist i, j with i < j such that x_i ≢ x_j (mod n) but
x′_i ≡ x′_j (mod p), that is, x_i ≡ x_j (mod p). In that case, d = gcd(x_j − x_i, n) is
a proper divisor of n. On the other hand, if τ′ = τ, then for all i, j with i < j,
gcd(x_j − x_i, n) is either 1 or n.
Pollard’s rho method2 is based on these observations. However, computing
xj − xi for all i, j with i < j is a massive investment of time. Moreover, we
need to store all xi values until a pair i, j with gcd(xj − xi , n) > 1 is located.

6.2.1 Floyd’s Variant


The following proposition leads to a significant saving in time and space.
Proposition 6.3 There exists k ≥ 1 such that x_k ≡ x_{2k} (mod p).
Proof We have x_i ≡ x_j (mod p) for some i, j with i < j. But then x_k ≡
x_{k+s(j−i)} (mod p) for all k ≥ i and for all s ∈ N. So take k to be any multiple
of j − i larger than or equal to i, and adjust s accordingly. ⊳
We compute the sequence dk = gcd(x2k − xk , n) for k = 1, 2, 3, . . . until a
gcd dk > 1 is located. This variant of the Pollard rho method is called Floyd’s
variant,3 and is supplied as Algorithm 6.1.

Algorithm 6.1: Pollard’s rho method (Floyd’s variant)


Initialize x and y to a random element of Zn . /* x = y = x0 */
Repeat until explicitly broken {
Update x = f (x). /* x = xk here */
Update y = f (f (y)). /* y = x2k here */
Compute d = gcd(y − x, n).
If (d > 1), break the loop.
}
Return d.
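Algorithm 6.1 translates almost line for line into Python. The following is a minimal sketch (the function name and defaults are ours), using the common choice f(x) = x^2 + a:

```python
from math import gcd

def rho_floyd(n, x0=123, a=1):
    """Pollard's rho with Floyd's cycle detection (Algorithm 6.1).
    Returns a divisor d > 1 of n (possibly n itself on failure)."""
    f = lambda x: (x * x + a) % n
    x = y = x0                  # x = y = x_0
    while True:
        x = f(x)                # x = x_k
        y = f(f(y))             # y = x_{2k}
        d = gcd(y - x, n)
        if d > 1:
            return d
```

With the parameters of Example 6.4 below, rho_floyd(4294967297) returns the factor 641.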

Example 6.4 Let us try to factor the Fermat number n = f_5 = 2^(2^5) + 1 =
4294967297 by Pollard’s rho method. We take the sequence-generating function
f(x) = x^2 + 1 (mod n). The computations done by Algorithm 6.1 are
illustrated in the following table. We start with the initial term x0 = 123.
The non-trivial factor 641 of n is discovered by Pollard’s rho method. The
corresponding cofactor is 6700417. That is, f5 = 641 × 6700417. Both these
factors are prime, that is, we have completely factored f5 . ¤

2 John M. Pollard, A Monte Carlo method for factorization, BIT Numerical Mathematics,

15(3), 331–334, 1975.


3 Robert W. Floyd, Non-deterministic algorithms, Journal of ACM, 14(4), 636–644, 1967.

Floyd’s paper (1967) presents an algorithm for finding cycles in graphs. Pollard uses this
algorithm in his factoring paper (1975). This explains the apparent anachronism.

k    x_k = f(x_{k−1})    x_{2k−1} = f(x_{2(k−1)})    x_{2k} = f(x_{2k−1})    gcd(x_{2k} − x_k, n)


0 123 − 123 −
1 15130 15130 228916901 1
2 228916901 33139238 3137246933 1
3 33139238 2733858014 4285228964 1
4 3137246933 2251701130 1572082836 1
5 2733858014 1467686897 1705858549 1
6 4285228964 1939628362 4277357175 1
7 2251701130 2578142297 3497150839 1
8 1572082836 962013932 1052924614 1
9 1467686897 2363824544 2370126580 1
10 1705858549 2736085524 4145405717 1
11 1939628362 4082917731 786147859 1
12 4277357175 954961871 3660251575 1
13 2578142297 2240785070 2793469517 1
14 3497150839 2846659272 812871200 1
15 962013932 385091241 3158671825 1
16 1052924614 2852659993 4184934804 1
17 2363824544 1777772295 9559945 1
18 2370126580 4234257460 1072318990 1
19 2736085524 1259270631 2648112086 1
20 4145405717 473166199 504356342 1
21 4082917731 1740566372 2252385287 1
22 786147859 2226234309 3261516148 641
The running time of Algorithm 6.1 is based upon the following result.
Proposition 6.5 Let S be a set of size n. If k ≤ n elements are selected
from S with replacement, the probability p_k of at least one match among the
k chosen elements (that is, at least one element is selected more than once) is

    p_k = 1 − ∏_{i=1}^{k−1} (1 − i/n) ≈ 1 − e^(−k²/(2n)).

We have p_k ≈ 1/2 for k ≈ 1.177 √n. Also, p_k ≈ 0.99 for k ≈ 3.035 √n. ⊳
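The two stated thresholds are easy to check numerically. This is a quick illustration we add; collision_prob computes the exact product from the proposition:

```python
import math

def collision_prob(k, n):
    """Exact probability of at least one repeat among k draws from n values."""
    q = 1.0
    for i in range(1, k):
        q *= 1.0 - i / n
    return 1.0 - q

n = 10**6
for c, target in ((1.177, 0.5), (3.035, 0.99)):
    k = round(c * math.sqrt(n))
    exact = collision_prob(k, n)
    approx = 1.0 - math.exp(-k * k / (2.0 * n))
    # both the exact product and the approximation land close to the target
```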
Let us now look at the reduced sequence x′_0, x′_1, x′_2, . . . modulo p. If we
assume that this sequence is random, then after Θ(√p) iterations, we expect
to obtain a collision, that is, i, j with i < j and x′_i ≡ x′_j (mod p). Therefore,
we have to make an expected number of Θ(√p) gcd calculations. Since n has
a prime divisor ≤ √n, the expected running time of Pollard’s rho method is
O˜(n^(1/4)). Although this is significantly better than the running time O˜(√n)
of trial division, it is still an exponential function in log n. So Pollard’s rho
method cannot be used for factoring large integers.
The running time of Pollard’s rho method is output-sensitive in the sense
that it depends upon the smallest prime factor p of n. If p is small, Pollard’s
rho method may detect p quite fast, namely in O˜(√p) time. On the other
hand, trial division by divisors ≤ p requires O˜(p) running time.

6.2.2 Block GCD Calculation


Several modifications of Pollard’s rho method reduce the running time
by (not-so-small) constant factors. The most expensive step in the loop of
Algorithm 6.1 is the computation of the gcd. We can avoid this computation
in every iteration. Instead, we accumulate the product (x_{2k} − x_k)(x_{2k+2} − x_{k+1})
(x_{2k+4} − x_{k+2}) · · · (x_{2k+2r−2} − x_{k+r−1}) (mod n) for r iterations. We compute
the gcd of this product with n. If all these x_{2k+2i} − x_{k+i} are coprime to n,
then the gcd of the product with n is also 1. On the other hand, if some
gcd(x_{2k+2i} − x_{k+i}, n) > 1, then the gcd d of the above product with n is
larger than 1. If d < n, we have found a non-trivial divisor of n. If d = n, we
compute the individual gcds of x_{2k+2i} − x_{k+i} with n for i = 0, 1, 2, . . . , r − 1
until a gcd larger than 1 is located. This strategy is called block gcd calculation.
In practice, one may use r in the range 20 ≤ r ≤ 50 for optimal performance.
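A sketch of Floyd's variant with block gcd calculation follows (our code, not the book's; the block size r = 32 is one arbitrary choice in the suggested range):

```python
from math import gcd

def rho_block(n, x0=123, a=1, r=32):
    """Pollard's rho (Floyd) computing one gcd per block of r differences."""
    f = lambda x: (x * x + a) % n
    x = y = x0
    while True:
        xs, ys, prod = [], [], 1
        for _ in range(r):
            x = f(x)
            y = f(f(y))
            xs.append(x)
            ys.append(y)
            prod = prod * (y - x) % n   # accumulate the product modulo n
        d = gcd(prod, n)
        if d == 1:
            continue
        if d < n:
            return d
        for xi, yi in zip(xs, ys):      # d == n: redo the gcds one by one
            d = gcd(yi - xi, n)
            if d > 1:
                return d
```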

Algorithm 6.2: Pollard’s rho method (Brent’s variant)

Initialize x to a random element of Z_n. /* Set x = x_0 */
Initialize y to f(x). /* Set y = x_1 */
Set t = 1. /* t stores the value 2^(r−1) */
Compute d = gcd(y − x, n).
while (d equals 1) {
Set x = y. /* Store in x the element x_{2^r − 1} */
For s = 1, 2, . . . , t, set y = f(y). /* Compute x_{(2^r − 1)+s}, no gcd */
For s = 1, 2, . . . , t { /* Compute x_{(2^r − 1)+s} with gcd */
Set y = f(y).
Compute d = gcd(y − x, n).
If (d > 1), break the inner (for) loop.
}
Set t = 2t. /* Prepare for the next iteration */
}
Return d.

6.2.3 Brent’s Variant


Brent’s variant4 reduces the number of gcd computations considerably. As
earlier, τ′ stands for the smallest period of the sequence x′_0, x′_1, x′_2, . . . modulo
p. We first compute d = gcd(x_1 − x_0, n). If this gcd is larger than 1, then τ′ = 1,
and the algorithm terminates. If d = 1, we compute gcd(x_k − x_1, n) for k = 3,
that is, we now check whether τ′ = 2. Then, we compute gcd(x_k − x_3, n)
for k = 6, 7. Here, τ′ is searched among 3, 4. More generally, for r ∈ N,
we compute gcd(x_k − x_{2^r − 1}, n) for 2^r + 2^(r−1) ≤ k ≤ 2^(r+1) − 1. This means
that we search for values of τ′ in the range 2^(r−1) + 1 ≤ τ′ ≤ 2^r. The last
value of x_k in the loop is used in the next iteration for the computation of
gcd(x_k − x_{2^(r+1) − 1}, n). Since we have already investigated the possibilities for
τ′ ≤ 2^r, we start gcd computations from k = (2^(r+1) − 1) + 2^r + 1 = 2^(r+1) + 2^r.
Algorithm 6.2 formalizes these observations.
4 Richard P. Brent, An improved Monte Carlo factorization algorithm, BIT, 20, 176–184,
1980.
Example 6.6 We factor the Fermat number n = f_5 = 2^(2^5) + 1 = 4294967297
by Brent’s variant of Pollard’s rho method. As in Example 6.4, we take f(x) =
x^2 + 1 (mod n), and start the sequence with x_0 = 123. A dash in the last
column of the following table indicates that the gcd is not computed.
r    t = 2^(r−1)    2^r − 1    x_{2^r − 1}    k    x_k    gcd(x_k − x_{2^r − 1}, n)
0 0 0 123 1 15130 1
1 1 1 15130 2 228916901 −
3 33139238 1
2 2 3 33139238 4 3137246933 −
5 2733858014 −
6 4285228964 1
7 2251701130 1
3 4 7 2251701130 8 1572082836 −
9 1467686897 −
10 1705858549 −
11 1939628362 −
12 4277357175 1
13 2578142297 1
14 3497150839 1
15 962013932 1
4 8 15 962013932 16 1052924614 −
17 2363824544 −
18 2370126580 −
19 2736085524 −
20 4145405717 −
21 4082917731 −
22 786147859 −
23 954961871 −
24 3660251575 1
25 2240785070 1
26 2793469517 641
We get the factorization f5 = 641×6700417. Here, only 11 gcds are computed.
Compare this with the number (22) of gcds computed in Example 6.4. ¤
The working of Algorithm 6.2 is justified if the sequence x′_0, x′_1, x′_2, . . . is
totally periodic. In general, the initial non-periodic part does not turn out to
be a big problem, since it suffices to locate a multiple of τ′ after the sequence
gets periodic. The expected running time of Algorithm 6.2 is O˜(n^(1/4)). The
concept of block gcd calculation can be applied to Brent’s variant also.
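Algorithm 6.2 in Python looks as follows. This is a sketch with the same conventions as the Floyd sketch earlier; the names are ours:

```python
from math import gcd

def rho_brent(n, x0=123, a=1):
    """Pollard's rho, Brent's variant (Algorithm 6.2)."""
    f = lambda x: (x * x + a) % n
    x = x0                      # x_0
    y = f(x)                    # x_1
    t = 1                       # t = 2^(r-1)
    d = gcd(y - x, n)
    while d == 1:
        x = y                   # anchor x_{2^r - 1}
        for _ in range(t):      # advance y without gcd computations
            y = f(y)
        for _ in range(t):      # advance y, taking a gcd at each step
            y = f(y)
            d = gcd(y - x, n)
            if d > 1:
                break
        t *= 2
    return d
```

On n = f_5 with x_0 = 123 this returns 641 after computing only 11 gcds, matching Example 6.6.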

6.3 Pollard’s p – 1 Method


Trial division and Pollard’s rho method are effective for factoring n if n
has small prime factor(s). Pollard’s p − 1 method5 is effective if p − 1 has small
(in the following sense) prime factors for some prime divisor p of n. Note that
p itself may be large; what helps is that p − 1 has only small prime factors.

Definition 6.7 Let B ∈ N be a bound. A positive integer n is called B-
power-smooth if whenever q^e | n for q ∈ P and e ∈ N, we have q^e ≤ B. ⊳
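The definition is easy to test by trial division. The following helper is our addition (the name is ours, not the book's):

```python
def is_power_smooth(n, B):
    """Return True if every maximal prime-power divisor q^e of n is <= B."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            qe = 1
            while n % d == 0:   # extract the full power d^e dividing n
                n //= d
                qe *= d
            if qe > B:
                return False
        d += 1
    return n <= B               # any leftover n > 1 is a prime factor
```

For instance, 17907120 = 2^4 × 3^2 × 5 × 7 × 11 × 17 × 19 is 19-power-smooth.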

If p − 1 is B-power-smooth for a prime divisor p of n for a small bound
B, one can extract the prime factor p of n using the following idea. Let a
be any integer coprime to n (and so to p too). By Fermat’s little theorem,
a^(p−1) ≡ 1 (mod p), that is, p | (a^(p−1) − 1). More generally, p | (a^(k(p−1)) − 1) for all
k ∈ N. If, on the other hand, n ∤ (a^(k(p−1)) − 1), then gcd(a^(k(p−1)) − 1, n) is a
non-trivial divisor of n. If p − 1 is B-power-smooth for a known bound B, a
suitable multiple k(p − 1) of p − 1 can be obtained easily as follows.
Let p_1, p_2, . . . , p_t be all the primes ≤ B. For each i = 1, 2, . . . , t, define
e_i = ⌊log B/log p_i⌋. Consider the exponent E = ∏_{i=1}^{t} p_i^(e_i). If p − 1 is B-
power-smooth, then this exponent E is a multiple of p − 1. Note that E may
be quite large. However, there is no need to compute E explicitly. We have
a^E ≡ (· · · ((a^(p_1^(e_1)))^(p_2^(e_2)))^(p_3^(e_3)) · · ·)^(p_t^(e_t)) (mod n). That is, a^E (mod n)
can be obtained by a suitable sequence of exponentiations by small prime powers.
It is also worthwhile to consider the fact that instead of computing a^E (mod n)
at one shot and then computing gcd(a^E − 1, n), one may sequentially compute
a^(p_1^(e_1)), (a^(p_1^(e_1)))^(p_2^(e_2)), ((a^(p_1^(e_1)))^(p_2^(e_2)))^(p_3^(e_3)), and so on.
After each exponentiation by p_i^(e_i), one computes a
gcd. If p − 1 is B′-power-smooth for some B′ < B, this saves exponentiations
by prime powers p_i^(e_i) for primes p_i in the range B′ < p_i ≤ B.
Pollard’s p − 1 method is given as Algorithm 6.3. By the prime number
theorem, there are O(B/log B) primes ≤ B. Since each p_i^(e_i) ≤ B, the
exponentiation a^(p_i^(e_i)) (mod n) can be done in O(log B log^2 n) time. Each gcd d can
be computed in O(log^2 n) time. So Algorithm 6.3 runs in O(B log^2 n) time.
Pollard’s p − 1 method is output-sensitive, and is efficient for small B.
Let us now investigate when Algorithm 6.3 fails to output a non-trivial
factor d of n, that is, the cases when the algorithm returns d = 1 or d = n.
The case d = 1 indicates that p − 1 is not B-power-smooth. We can then repeat
the algorithm with a larger value of B. Of course, B is unknown a priori. So
one may repeat Pollard’s p − 1 method for gradually increasing values of B.
However, considering large values of B (like B > 10^6) is usually not a good
idea. One should employ subexponential algorithms instead.

5 John M. Pollard, Theorems of factorization and primality testing, Proceedings of the

Cambridge Philosophical Society, 76(3), 521–528, 1974.



Algorithm 6.3: Pollard’s p − 1 method

Choose a base a ∈ Z_n^*.
For each small prime q ≤ B {
Set e = ⌊log B/log q⌋.
For i = 1, 2, . . . , e, compute a = a^q (mod n).
Compute d = gcd(a − 1, n).
If (d > 1), break the loop.
}
Return d.
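A direct rendering of Algorithm 6.3 follows (our sketch; primes_upto is a helper we define, and the exponent e is computed with integer arithmetic to avoid floating-point issues in ⌊log B/log q⌋):

```python
from math import gcd

def primes_upto(B):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (B + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(B ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i:: i] = bytearray(len(sieve[i * i:: i]))
    return [i for i in range(2, B + 1) if sieve[i]]

def pollard_p_minus_1(n, B, a=2):
    """Pollard's p - 1 method (Algorithm 6.3); returns the gcd found, or 1."""
    for q in primes_upto(B):
        e, qe = 0, q
        while qe <= B:          # e = floor(log B / log q), exactly
            e += 1
            qe *= q
        for _ in range(e):
            a = pow(a, q, n)
        d = gcd(a - 1, n)
        if d > 1:
            return d
    return 1
```

pollard_p_minus_1(1602242212193303, 32) recovers the factor 17907121 of Example 6.8, while the bound B = 16 returns 1, as in the first attempt there.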

The case d = n is trickier. Suppose that n = pq with distinct primes
p, q. Suppose that both p − 1 and q − 1 are B-power-smooth. In that case,
both the congruences a^E ≡ 1 (mod p) and a^E ≡ 1 (mod q) hold. Therefore,
gcd(a^E − 1, n) = n, that is, we fail to separate p and q. It is, however, not
necessary for q − 1 to be B-power-smooth in order to obtain d = n. If ord_q(a)
is B-power-smooth, then also we may get d = n. For some other base a′,
we may have ord_q(a′) not B-power-smooth. In that case, gcd(a′^E − 1, n) is a
non-trivial factor (that is, p) of n. This argument can be readily extended to
composite integers other than those of the form pq. When we obtain d = n as
the output of Algorithm 6.3, it is worthwhile to run the algorithm for a few
other bases a. If we always obtain d = n, we report failure.
Example 6.8 (1) We factor n = 1602242212193303 by Pollard’s p − 1 method.
For the bound B = 16 and the base a = 2, we have the following sequence of
computations. Here, p_i denotes the i-th prime, and e_i = ⌊log B/log p_i⌋.

i   p_i   e_i   a_old   a_new ≡ a_old^(p_i^(e_i)) (mod n)   gcd(a_new − 1, n)
1 2 4 2 65536 1
2 3 2 65536 50728532231011 1
3 5 1 50728532231011 602671824969697 1
4 7 1 602671824969697 328708173547029 1
5 11 1 328708173547029 265272211830818 1
6 13 1 265272211830818 167535681578625 1
Since the algorithm outputs 1, we try with the increased bound B = 32, and
obtain the following sequence of computations. The base is again 2.
i   p_i   e_i   a_old   a_new ≡ a_old^(p_i^(e_i)) (mod n)   gcd(a_new − 1, n)
1 2 5 2 4294967296 1
2 3 3 4294967296 98065329688549 1
3 5 2 98065329688549 1142911340053463 1
4 7 1 1142911340053463 1220004434814213 1
5 11 1 1220004434814213 1358948392128938 1
6 13 1 1358948392128938 744812946196424 1
7 17 1 744812946196424 781753012202740 1
8 19 1 781753012202740 1512971883283798 17907121

This gives the factorization n = 17907121 × 89475143 with both the factors
prime. Although the bound B is 32, we do not need to raise a to the powers
p_i^(e_i) for p_i = 23, 29, 31. This is how computing the gcd inside the loop helps.
In order to see why Pollard’s p − 1 method works in this example, we note
the factorization of p − 1 and q − 1, where p = 17907121 and q = 89475143.

p − 1 = 2^4 × 3^2 × 5 × 7 × 11 × 17 × 19,
q − 1 = 2 × 19 × 2354609.

So p − 1 is 32-power-smooth (indeed, it is 19-power-smooth), but q − 1 is not.


(2) Let us now try to factor n = 490428787297681 using Pollard’s p − 1
method. We work with the bound B = 24 and the base 2.

i   p_i   e_i   a_old   a_new ≡ a_old^(p_i^(e_i)) (mod n)   gcd(a_new − 1, n)
1 2 4 2 65536 1
2 3 2 65536 190069571010731 1
3 5 1 190069571010731 185747041897072 1
4 7 1 185747041897072 401041553458073 1
5 11 1 401041553458073 31162570081268 1
6 13 1 31162570081268 248632716971464 1
7 17 1 248632716971464 67661917074372 1
8 19 1 67661917074372 1 490428787297681

We fail to separate p from q in this case. So we try other bases. The base 23
works as given in the next table. The bound remains B = 24 as before.
i   p_i   e_i   a_old   a_new ≡ a_old^(p_i^(e_i)) (mod n)   gcd(a_new − 1, n)
1 2 4 23 487183864388533 1
2 3 2 487183864388533 422240241462789 1
3 5 1 422240241462789 64491241974109 1
4 7 1 64491241974109 88891658296507 1
5 11 1 88891658296507 143147690932110 1
6 13 1 143147690932110 244789218562995 1
7 17 1 244789218562995 334411207888980 1
8 19 1 334411207888980 381444508879276 17907121

We obtain n = pq with p = 17907121 and q = 27387361. We have

p − 1 = 2^4 × 3^2 × 5 × 7 × 11 × 17 × 19,
q − 1 = 2^5 × 3^2 × 5 × 7 × 11 × 13 × 19.

This shows that q − 1 is not 24-power-smooth (it is 32-power-smooth). However,
ord_q(2) = 1244880 = 2^4 × 3^2 × 5 × 7 × 13 × 19 is 24-power-smooth, so the
base 2 yields the output d = n. On the other hand, ord_q(23) = 480480 =
2^5 × 3 × 5 × 7 × 11 × 13 is not 24-power-smooth, so the base 23 factors n. ¤

6.3.1 Large Prime Variation


If Algorithm 6.3 outputs d = 1, then p − 1 is not B-power-smooth. However,
it may be the case that p − 1 = uv, where u is B-power-smooth, and v is a
prime > B. In this context, v is called a large prime divisor of p − 1. If v is
not too large, like v ≤ B′ for some bound B′ > B, adding a second stage to
the p − 1 factoring algorithm may factor n non-trivially.
Let p_{t+1}, p_{t+2}, . . . , p_{t′} be all the primes in the range B < p_i ≤ B′. When
Algorithm 6.3 terminates, it has already computed a^E (mod n) with u | E.
We then compute (a^E)^(p_{t+1}), ((a^E)^(p_{t+1}))^(p_{t+2}), and so on (modulo n) until we get
a gcd > 1, or all primes ≤ B′ are exhausted. This method works if v is a
square-free product of large primes between B (exclusive) and B′ (inclusive).
One could have executed Algorithm 6.3 with the larger bound B′ (as B).
But then each e_i would be large (compare ⌊log B′/log p_i⌋ with ⌊log B/log p_i⌋).
Moreover, for all large primes p_i, we take e_i = 1. Therefore, using two different
bounds B, B′ helps. In practice, one may take B ≤ 10^6 and B′ ≤ 10^8.
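The two stages can be sketched together as follows (our code; primes_upto is a sieve helper we define here so that the sketch is self-contained):

```python
from math import gcd

def primes_upto(B):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (B + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(B ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i:: i] = bytearray(len(sieve[i * i:: i]))
    return [i for i in range(2, B + 1) if sieve[i]]

def p_minus_1_two_stage(n, B, B2, a=2):
    """Pollard's p - 1 with stage-one bound B and stage-two bound B2 > B."""
    for q in primes_upto(B):        # stage 1: prime powers q^e <= B
        qe = q
        while qe * q <= B:
            qe *= q
        a = pow(a, qe, n)
        d = gcd(a - 1, n)
        if d > 1:
            return d
    for q in primes_upto(B2):       # stage 2: single large primes in (B, B2]
        if q <= B:
            continue
        a = pow(a, q, n)
        d = gcd(a - 1, n)
        if d > 1:
            return d
    return 1
```

With the data of Example 6.9 below, p_minus_1_two_stage(600735950824741, 20, 100) finds the factor 15495481 at the stage-two prime 43.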

Example 6.9 We attempt to factor n = 600735950824741 using Pollard’s
p − 1 algorithm augmented by the second stage. We use the bounds B = 20
and B′ = 100, and select the base a = 2. The two stages proceed as follows.

Stage 1
i   p_i   e_i   a_old   a_new ≡ a_old^(p_i^(e_i)) (mod n)   gcd(a_new − 1, n)
1 2 4 2 65536 1
2 3 2 65536 250148431288895 1
3 5 1 250148431288895 404777501817913 1
4 7 1 404777501817913 482691043667836 1
5 11 1 482691043667836 309113846434884 1
6 13 1 309113846434884 297529593613895 1
7 17 1 297529593613895 544042973919022 1
8 19 1 544042973919022 358991192319517 1

Stage 2
i   p_i   e_i   a_old   a_new ≡ a_old^(p_i^(e_i)) (mod n)   gcd(a_new − 1, n)
9 23 1 358991192319517 589515560613570 1
10 29 1 589515560613570 111846253267074 1
11 31 1 111846253267074 593264734044925 1
12 37 1 593264734044925 168270169378399 1
13 41 1 168270169378399 285271807182347 1
14 43 1 285271807182347 538018099945609 15495481

This leads to the factorization n = pq with primes p = 15495481 and q =
38768461. Let us see why the second stage works. We have the factorizations:

p − 1 = 2^3 × 3^2 × 5 × 7 × 11 × 13 × 43,
q − 1 = 2^2 × 3 × 5 × 79 × 8179.

In particular, ord_p(2) = 129129 = 3 × 7 × 11 × 13 × 43 is 20-power-smooth
apart from the large prime factor 43. On the contrary, q − 1 and ord_q(2) =
12922820 = 2^2 × 5 × 79 × 8179 have a prime divisor larger than B′ = 100. ¤

6.4 Dixon’s Method


Almost all modern integer-factoring algorithms rely on arriving at a congruence
of the following form (often called a Fermat congruence):

x^2 ≡ y^2 (mod n).

This implies that n | (x − y)(x + y), that is, n = gcd(x − y, n) × gcd(x + y, n)
(assuming that gcd(x, y) = 1). If gcd(x − y, n) is a non-trivial factor of n, we
obtain a non-trivial split of n.

Example 6.10 We have 899 = 30^2 − 1^2 = (30 − 1)(30 + 1), so gcd(30 − 1,
899) = 29 is a non-trivial factor of 899. Moreover, 3 × 833 = 2499 = 50^2 − 1^2 =
(50 − 1)(50 + 1), and gcd(50 − 1, 833) = 49 is a non-trivial factor of 833. ¤

The obvious question now is whether this method always works. Any odd
integer n can be expressed as n = ((n+1)/2)^2 − ((n−1)/2)^2. However, this
gives us only the trivial factorization n = ((n+1)/2 − (n−1)/2) × ((n+1)/2 +
(n−1)/2) = 1 × n.
Since it is easy to verify whether n is a perfect power, we assume that n has
m ≥ 2 distinct prime factors. We may also assume that n contains no small
prime factors. In particular, n is odd. Then, for any y ∈ Z_n^*, the congruence
x^2 ≡ y^2 (mod n) has exactly 2^m solutions for x. The only two trivial solutions
are x ≡ ±y (mod n). Each of the remaining 2^m − 2 solutions yields a
non-trivial split of n. If x and y are random elements satisfying x^2 ≡ y^2 (mod n),
then gcd(x − y, n) is a non-trivial factor of n with probability (2^m − 2)/2^m ≥ 1/2.
This factoring idea works if we can make available a non-trivial congruence
of the form x^2 ≡ y^2 (mod n). The modern subexponential algorithms propose
different ways of obtaining this congruence. We start with a very simple idea.6
We choose a non-zero x ∈ Z_n randomly, and compute a = x^2 rem n (an
integer in {0, 1, 2, . . . , n − 1}). If a is a perfect square, say a = y^2, then x^2 ≡
y^2 (mod n). However, there are only ⌊√n⌋ − 1 non-zero perfect squares in Z_n.
So the probability that a is of the form y^2 is about 1/√n, that is, after trying
O(√n) random values of x, we expect to arrive at the desired congruence
x^2 ≡ y^2 (mod n). This gives an algorithm with exponential running time.
In order to avoid this difficulty, we choose a factor base B consisting of
the first t primes p_1, p_2, . . . , p_t. We choose a random non-zero x ∈ Z_n^*, and
6 John D. Dixon, Asymptotically fast factorization of integers, Mathematics of Computation, 36, 255–260, 1981.



compute a = x^2 rem n. We now do not check whether a is a perfect square,
but we check whether a can be factored completely over B. If so, we have
x^2 ≡ p_1^(α_1) p_2^(α_2) · · · p_t^(α_t) (mod n). Such a congruence is called a relation. If all α_i
are even, the right side of the relation is a perfect square. But, as mentioned
in the last paragraph, such a finding is of very low probability.
What we do instead is to collect many such relations. We do not expect
to obtain a relation for every random value of x. If x^2 rem n does not factor
completely over the factor base B, we discard that value of x. We store only
those values of x for which x^2 rem n factors completely over B.
x_1^2 ≡ p_1^(α_11) p_2^(α_12) · · · p_t^(α_1t) (mod n),
x_2^2 ≡ p_1^(α_21) p_2^(α_22) · · · p_t^(α_2t) (mod n),
· · ·
x_s^2 ≡ p_1^(α_s1) p_2^(α_s2) · · · p_t^(α_st) (mod n).
One must not use a factoring algorithm to verify whether x^2 rem n factors
completely over B. One should perform trial divisions by the primes p_1, p_2, . . . , p_t.
We obtain a relation if and only if trial divisions reduce x^2 rem n to 1.
After many relations are collected, we combine the relations so as to obtain
a congruence of the desired form x^2 ≡ y^2 (mod n). The combination stage
involves s variables β_1, β_2, . . . , β_s. The collected relations yield

(x_1^2)^(β_1) (x_2^2)^(β_2) · · · (x_s^2)^(β_s) ≡ ∏_{j=1}^{t} p_j^(Σ_{i=1}^{s} α_ij β_i) (mod n),

that is,

(x_1^(β_1) x_2^(β_2) · · · x_s^(β_s))^2 ≡ p_1^(γ_1) p_2^(γ_2) · · · p_t^(γ_t) (mod n),

where γ_j = Σ_{i=1}^{s} α_ij β_i for j = 1, 2, . . . , t. The left side of the last congruence
is already a square. We adjust the quantities β_1, β_2, . . . , β_s in such a way that
the right side is also a square, that is, all of γ_1, γ_2, . . . , γ_t are even, that is,
γ_j = 2δ_j for all j = 1, 2, . . . , t. We then have the desired congruence

(x_1^(β_1) x_2^(β_2) · · · x_s^(β_s))^2 ≡ (p_1^(δ_1) p_2^(δ_2) · · · p_t^(δ_t))^2 (mod n).

The condition that all γ_j are even can be expressed as follows.

α_11 β_1 + α_21 β_2 + · · · + α_s1 β_s ≡ 0 (mod 2),
α_12 β_1 + α_22 β_2 + · · · + α_s2 β_s ≡ 0 (mod 2),
· · ·
α_1t β_1 + α_2t β_2 + · · · + α_st β_s ≡ 0 (mod 2).

This is a set of t linear equations in s variables β_1, β_2, . . . , β_s. The obvious
solution β_1 = β_2 = · · · = β_s = 0 leads to the trivial congruence 1^2 ≡ 1^2 (mod n).
If s > t, the system has non-zero solutions. The number of solutions of the
system is 2^(s−r), where r is the rank of the t × s coefficient matrix (Exercises 6.14
and 8.12). Some of the non-trivial solutions are expected to split n. The choices
of t and s are explained later.
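The whole pipeline — collecting relations, reducing the exponent matrix modulo 2, and testing each dependency — fits in a short sketch. This is our illustration (the name dixon and the parameters t and seed are assumptions, not the book's code); it uses bitmask Gaussian elimination over GF(2) and keeps drawing random x until some dependency yields a non-trivial split.

```python
import random
from math import gcd

def dixon(n, t=15, seed=1):
    """Sketch of Dixon's method: factor base of the first t primes,
    random squares, incremental elimination over GF(2)."""
    base, q = [], 2
    while len(base) < t:                 # build the factor base
        if all(q % p for p in base):
            base.append(q)
        q += 1
    rng = random.Random(seed)
    rels = []                            # (x, exponent vector) pairs
    pivots = {}                          # lead bit -> (reduced row, history)
    while True:
        x = rng.randrange(2, n)
        a = x * x % n
        if a == 0:
            continue
        vec = [0] * t
        for i, p in enumerate(base):     # trial-divide x^2 rem n over the base
            while a % p == 0:
                a //= p
                vec[i] += 1
        if a != 1:
            continue                     # not smooth over the base: discard x
        rels.append((x, vec))
        row = sum((vec[j] & 1) << j for j in range(t))
        hist = 1 << (len(rels) - 1)      # which relations this row combines
        while row:
            lead = (row & -row).bit_length() - 1
            if lead not in pivots:
                pivots[lead] = (row, hist)
                break
            prow, phist = pivots[lead]
            row ^= prow
            hist ^= phist
        if row:                          # new pivot: need more relations
            continue
        X, exps = 1, [0] * t             # dependency: all gamma_j are even
        for k in range(len(rels)):
            if (hist >> k) & 1:
                X = X * rels[k][0] % n
                for j in range(t):
                    exps[j] += rels[k][1][j]
        Y = 1
        for j in range(t):
            Y = Y * pow(base[j], exps[j] // 2, n) % n
        for z in (X - Y, X + Y):         # try both gcd(x - y, n) and gcd(x + y, n)
            d = gcd(z, n)
            if 1 < d < n:
                return d
        rels.pop()                       # trivial congruence: drop it, try again
```

On the n = 64349 of Example 6.11 below, this returns one of the two prime factors 229 and 281.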
Example 6.11 We factor n = 64349 by Dixon’s method. We take the factor
base B = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47} (the first 15 primes).
Relations are obtained for the following random values of x. The values of x,
for which x^2 rem n does not split completely over B, are not listed here.
x_1^2 ≡ 26507^2 ≡ 58667 ≡ 7 × 17^2 × 29 (mod n)
x_2^2 ≡ 53523^2 ≡ 22747 ≡ 23^2 × 43 (mod n)
x_3^2 ≡ 34795^2 ≡ 29939 ≡ 7^2 × 13 × 47 (mod n)
x_4^2 ≡ 17688^2 ≡ 506 ≡ 2 × 11 × 23 (mod n)
x_5^2 ≡ 58094^2 ≡ 833 ≡ 7^2 × 17 (mod n)
x_6^2 ≡ 37009^2 ≡ 61965 ≡ 3^6 × 5 × 17 (mod n)
x_7^2 ≡ 15376^2 ≡ 3150 ≡ 2 × 3^2 × 5^2 × 7 (mod n)
x_8^2 ≡ 31414^2 ≡ 47481 ≡ 3 × 7^2 × 17 × 19 (mod n)
x_9^2 ≡ 62491^2 ≡ 41667 ≡ 3 × 17 × 19 × 43 (mod n)
x_10^2 ≡ 46770^2 ≡ 17343 ≡ 3^2 × 41 × 47 (mod n)
x_11^2 ≡ 19274^2 ≡ 299 ≡ 13 × 23 (mod n)
x_12^2 ≡ 4218^2 ≡ 31200 ≡ 2^5 × 3 × 5^2 × 13 (mod n)
x_13^2 ≡ 23203^2 ≡ 35475 ≡ 3 × 5^2 × 11 × 43 (mod n)
x_14^2 ≡ 26911^2 ≡ 18275 ≡ 5^2 × 17 × 43 (mod n)
x_15^2 ≡ 58697^2 ≡ 28000 ≡ 2^5 × 5^3 × 7 (mod n)
x_16^2 ≡ 50089^2 ≡ 4760 ≡ 2^3 × 5 × 7 × 17 (mod n)
x_17^2 ≡ 25505^2 ≡ 984 ≡ 2^3 × 3 × 41 (mod n)
x_18^2 ≡ 26820^2 ≡ 19278 ≡ 2 × 3^4 × 7 × 17 (mod n)
x_19^2 ≡ 18577^2 ≡ 1242 ≡ 2 × 3^3 × 23 (mod n)
x_20^2 ≡ 9407^2 ≡ 11774 ≡ 2 × 7 × 29^2 (mod n)

We have collected 20 relations with the hope that at least one non-trivial
solution of β_1, β_2, . . . , β_20 will lead to a non-trivial decomposition of n. In
matrix notation, the above system of linear congruences can be written as

[ 0 0 0 1 0 0 1 0 0 0 0 5 0 0 5 3 3 1 1 1 ]
[ 0 0 0 0 0 6 2 1 1 2 0 1 1 0 0 0 1 4 3 0 ]
[ 0 0 0 0 0 1 2 0 0 0 0 2 2 2 3 1 0 0 0 0 ]
[ 1 0 2 0 2 0 1 2 0 0 0 0 0 0 1 1 0 1 0 1 ]
[ 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 ]
[ 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 ]
[ 2 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 1 0 0 ]
[ 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 2 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 ]
[ 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 ]
[ 0 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 ]
[ 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ]

· ( β_1 β_2 · · · β_20 )^t ≡ ( 0 0 · · · 0 )^t (mod 2).

Call the coefficient matrix A. Since this is a system modulo 2, it suffices to
know the exponents α_ij modulo 2 only, and the system can be rewritten as

[ 0 0 0 1 0 0 1 0 0 0 0 1 0 0 1 1 1 1 1 1 ]
[ 0 0 0 0 0 0 0 1 1 0 0 1 1 0 0 0 1 0 1 0 ]
[ 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 ]
[ 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 1 0 1 ]
[ 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 ]
[ 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 1 1 0 1 1 0 0 0 0 1 0 1 0 1 0 0 ]
[ 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 ]
[ 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 ]
[ 0 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 ]
[ 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 ]

· ( β_1 β_2 · · · β_20 )^t ≡ ( 0 0 · · · 0 )^t (mod 2).

This reduced coefficient matrix (call it Ā) has rank 11 (modulo 2), that
is, the kernel of the matrix is a 9-dimensional subspace of Z_2^20. A basis of this
kernel is provided by the following vectors.

v_1 = ( 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 )^t,
v_2 = ( 0 1 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 )^t,
v_3 = ( 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 )^t,
v_4 = ( 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 )^t,
v_5 = ( 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 )^t,
v_6 = ( 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 0 0 0 )^t,
v_7 = ( 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 )^t,
v_8 = ( 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 )^t,
v_9 = ( 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 )^t.

There are 2^9 solutions for β1, β2, . . . , β20, obtained by all Z2-linear combi-
nations of v1, v2, . . . , v9. Let us try some random solutions. First, take

( β1 β2 · · · β20 )t = v2 + v3 + v5 + v6 + v9
                     = ( 0 0 1 1 1 1 0 0 0 1 1 0 1 1 0 1 1 0 0 1 )t .
For this solution, we have

e = Aβ = ( 8 10 6 6 2 2 4 0 2 2 0 0 2 2 2 )t .

This gives x ≡ x1^β1 x2^β2 · · · x20^β20 ≡ x3 x4 x5 x6 x10 x11 x13 x14 x16 x17 x20 ≡ 34795 ×
17688 × 58094 × 37009 × 46770 × 19274 × 23203 × 26911 × 50089 × 25505 × 9407 ≡
53886 (mod 64349). On the other hand, the vector e gives y ≡ 2^4 × 3^5 × 5^3 ×
7^3 × 11 × 13 × 17^2 × 23 × 29 × 41 × 43 × 47 ≡ 53886 (mod 64349). Therefore,
gcd(x − y, n) = 64349 = n, that is, the factorization attempt is unsuccessful.
Let us then try

( β1 β2 · · · β20 )t = v3 + v5 + v6 + v7 + v8
                     = ( 0 1 1 0 0 1 0 0 0 1 1 0 0 1 0 1 1 1 1 0 )t

so that

e = Aβ = ( 8 16 4 4 0 2 4 0 4 0 0 0 2 2 2 )t .

Therefore, x ≡ x2 x3 x6 x10 x11 x14 x16 x17 x18 x19 ≡ 53523 × 34795 × 37009 ×
46770 × 19274 × 26911 × 50089 × 25505 × 26820 × 18577 ≡ 58205 (mod 64349),
and y ≡ 2^4 × 3^8 × 5^2 × 7^2 × 13 × 17^2 × 23^2 × 41 × 43 × 47 ≡ 6144 (mod 64349).
In this case, gcd(x − y, n) = 1, and we again fail to split n non-trivially.
As a third attempt, let us try

( β1 β2 · · · β20 )t = v3 + v6 + v7 + v9
                     = ( 0 1 1 0 0 0 0 0 0 1 0 1 0 1 0 0 1 1 0 1 )t ,

for which

e = Aβ = ( 10 8 4 4 0 2 2 0 2 2 0 0 2 2 2 )t .

In this case, x ≡ x2 x3 x10 x12 x14 x17 x18 x20 ≡ 53523 × 34795 × 46770 × 4218 ×
26911 × 25505 × 26820 × 9407 ≡ 10746 (mod 64349). On the other hand,
y ≡ 2^5 × 3^4 × 5^2 × 7^2 × 13 × 17 × 23 × 29 × 41 × 43 × 47 ≡ 57954 (mod 64349). This
gives gcd(x − y, n) = 281, a non-trivial factor of n. The corresponding cofactor
is n/281 = 229. Since both these factors are prime, we get the complete
factorization 64349 = 229 × 281. ¤
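The full pipeline of this example (relation collection, linear algebra modulo 2, and the final gcd computation) can be sketched in Python. This is a toy illustration, not the book's pseudocode: the helper names, the random sampling, and the choice of collecting t + 10 relations are ours.

```python
import math
import random

def exponent_vector(m, primes):
    """Exponent vector of m over the factor base, or None if m is not smooth."""
    e = [0] * len(primes)
    for i, p in enumerate(primes):
        while m % p == 0:
            m //= p
            e[i] += 1
    return e if m == 1 else None

def kernel_basis(mat, ncols):
    """Basis of the kernel of mat (a list of 0/1 rows) over Z_2."""
    m = [row[:] for row in mat]
    pivot_cols, r = [], 0
    for c in range(ncols):
        pr = next((i for i in range(r, len(m)) if m[i][c]), None)
        if pr is None:
            continue
        m[r], m[pr] = m[pr], m[r]
        for i in range(len(m)):
            if i != r and m[i][c]:
                m[i] = [a ^ b for a, b in zip(m[i], m[r])]
        pivot_cols.append(c)
        r += 1
    basis = []
    for free in range(ncols):
        if free in pivot_cols:
            continue
        v = [0] * ncols
        v[free] = 1
        for row, pc in zip(m[:r], pivot_cols):
            v[pc] = row[free]
        basis.append(v)
    return basis

def dixon(n, primes, seed=1):
    """Split n: collect relations x^2 = (smooth) (mod n), solve the exponent
    system modulo 2, and take gcd(x - y, n) for each kernel vector."""
    rng = random.Random(seed)
    t = len(primes)
    relations = []                           # pairs (x, exponent vector)
    while True:
        while len(relations) < t + 10:       # a few more relations than primes
            x = rng.randrange(2, n)
            e = exponent_vector(x * x % n, primes)
            if e is not None:
                relations.append((x, e))
        mat = [[relations[j][1][i] & 1 for j in range(len(relations))]
               for i in range(t)]            # exponents reduced modulo 2
        for beta in kernel_basis(mat, len(relations)):
            chosen = [rel for rel, b in zip(relations, beta) if b]
            x = math.prod(xj for xj, _ in chosen) % n
            half = [sum(e[i] for _, e in chosen) // 2 for i in range(t)]
            y = math.prod(pow(p, h, n) for p, h in zip(primes, half)) % n
            d = math.gcd(x - y, n)
            if 1 < d < n:
                return d
        relations = relations[2:]            # unlucky kernel: refresh relations
```

With n = 64349 and the fifteen primes up to 47 as the factor base, the sketch returns one of the factors 229 or 281.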

We now derive the (optimal) running time of Dixon’s method. If the number
t of primes in the factor base is too large, we have to collect too many
relations to obtain a system with non-zero solutions (we should have s > t).
Moreover, solving the system in that case would be costly. On the contrary, if
t is too small, most of the random values of x ∈ Z∗n will fail to generate rela-
tions, and we have to iterate too many times before we find a desired number
of values of x for which x^2 rem n factors completely over B. For estimating
the best trade-off, we need some results from analytic number theory.

Definition 6.12 Let m ∈ N. An integer x is called m-smooth (or smooth if
m is understood from the context) if all prime factors of x are ≤ m. ⊳

Theorem 6.13 supplies the formula for the density of smooth integers.
Theorem 6.13 Let m, n ∈ N, and u = (ln n)/(ln m). For u → ∞ and u^2 ≤ ln n,
the number of m-smooth integers x in the range 1 ≤ x ≤ n asymptotically
approaches n·u^(−u+o(u)). That is, the density of m-smooth integers between 1
and n is asymptotically u^(−u+o(u)). ⊳

Corollary 6.14 The density of L[β]-smooth positive integers with values of
the order O(n^α) is L[−α/(2β)]. Here, L[c] stands for L(n, 1/2, c), as explained near
the beginning of this chapter.
Proof Integers with values O(n^α) can be taken as ≤ kn^α for some constant
k. The natural logarithm of this is α ln n + ln k ≈ α ln n. On the other hand,
ln L[β] = (β + o(1))√(ln n ln ln n). That is, u ≈ (α/β)√(ln n / ln ln n). Consequently,

u^(−u+o(u)) = exp[(−u + o(u)) ln u] = exp[(−1 + o(1)) u ln u]
            = exp[(−1 + o(1)) (α/β)√(ln n / ln ln n) (ln(α/β) + (1/2) ln ln n − (1/2) ln ln ln n)]
            = exp[(−1 + o(1))(1 + o(1)) (α/(2β)) √(ln n ln ln n)]
            = exp[(−α/(2β) + o(1)) √(ln n ln ln n)] = L[−α/(2β)]. ⊳

We now deduce the best running time of Dixon’s method. Let us choose
the factor base to consist of all primes ≤ L[β]. By the prime number theorem,
t ≈ L[β]/ln L[β], that is, t = L[β] again. The random elements x^2 rem n
are O(n), that is, we put α = 1 in Corollary 6.14. The probability that such
a random element factors completely over B is then L[−1/(2β)], that is, after
an expected number L[1/(2β)] of iterations, we obtain one relation (that is, one
L[β]-smooth value). Each iteration calls for trial divisions by L[β] primes in
the factor base. Finally, we need to generate s > t relations. Therefore, the
total expected time taken by the relation-collection stage is L[β]·L[β]·L[1/(2β)] =
L[2β + 1/(2β)]. The quantity 2β + 1/(2β) is minimized for β = 1/2. For this choice, the
running time of the relation-collection stage is L[2].
The next stage of the algorithm solves a t × s system modulo 2. Since both
s and t are expressions of the form L[1/2], using standard Gaussian elimina-
tion gives a running time of L[3/2]. However, the relations collected lead to
equations that are necessarily sparse, since each smooth value of x^2 rem n can
have only O(log n) prime factors. Such a sparse t × s system can be solved in
O˜(st) time using some special algorithms (see Chapter 8). Thus, the system-
solving stage can be completed in L[2/2] = L[1] time. To sum up, Dixon’s
method runs in subexponential time L[2] = L(n, 1/2, 2).
The constant 2 in L[2] makes Dixon’s method rather slow. The problem
with Dixon’s method is that it generates smoothness candidates as large as
O(n). By using other algorithms (like CFRAC or QSM), we can generate
candidates having values O(√n).

6.5 CFRAC Method


The continued fraction (CFRAC) method7 for factoring n is based upon
the continued-fraction expansion of √n. Let hr/kr denote the r-th convergent
to √n for r = 0, 1, 2, . . . . We have

hr^2 − n kr^2 = (hr^2 − 2√n hr kr + n kr^2) − (2n kr^2 − 2√n hr kr)
             = (√n kr − hr)^2 − 2√n kr (√n kr − hr),

so that

|hr^2 − n kr^2| ≤ |√n kr − hr|^2 + 2√n kr |√n kr − hr|
               < 1/kr+1^2 + 2√n kr/kr+1                  [by Theorem 1.67]
               ≤ 1/kr+1^2 + 2√n (kr+1 − 1)/kr+1 = 2√n − (2√n kr+1 − 1)/kr+1^2
               < 2√n.

Under the assumption that n is not a perfect square, √n is irrational, that is,
hr/kr ≠ √n for all r, that is, 0 < |hr^2 − n kr^2| < 2√n for all r ≥ 1.
The CFRAC method uses a factor base B = {−1, p1, p2, . . . , pt}, where
p1, p2, . . . , pt are all the small primes ≤ L[β] (the choice of β will be discussed
later). One then computes the convergents hr/kr for r = 1, 2, 3, . . . . Let yr =
hr^2 − n kr^2. We have hr^2 ≡ yr (mod n). We check the smoothness of yr by
trial division of yr by primes in B. Since some values of yr are negative, we
also include −1 as an element in the factor base. Every smooth yr gives a
relation. As deduced above, we have |yr| < 2√n, that is, the integers tested
for smoothness are much smaller in this case than in Dixon’s method.
After sufficiently many relations are obtained, they are processed exactly
as in Dixon’s method. I will not discuss the linear-algebra phase further in
connection with the CFRAC method (or other factoring methods).
Let p be a small prime in the factor base. The condition p | yr implies that
hr^2 − n kr^2 ≡ 0 (mod p), that is, (hr kr^−1)^2 ≡ n (mod p), that is, n is a quadratic
residue modulo p. Therefore, we need not include those small primes p in the
factor base, modulo which n is a quadratic non-residue.
Example 6.15 Let us factor n = 596333 by the CFRAC method. We take
B = {−1, 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47}.
However, there is no need to consider those primes modulo which n is a
quadratic non-residue. This gives the reduced factor base
B = {−1, 2, 11, 13, 23, 29, 37, 43}.
7 Michael A. Morrison and John Brillhart, A method of factoring and the factorization
of F7, Mathematics of Computation, 29(129), 183–205, 1975.

Before listing the relations, I highlight some implementation issues. The
denominators kr are not used in the relation hr^2 ≡ yr (mod n). We com-
pute only the numerator sequence hr. We start with h−2 = 0 and h−1 = 1.
Subsequently, for r ≥ 0, we first obtain the r-th coefficient ar in the continued-
fraction expansion of √n. This gives us hr = ar hr−1 + hr−2. The sequence hr
grows quite rapidly with r. However, we need to know the value of hr modulo
n only. We compute yr ≡ hr^2 (mod n). For even r, we take yr as a value
between −(n − 1) and −1. For odd r, we take yr in the range 1 to n − 1.
For computing the coefficients ar, we avoid floating-point arithmetic as
follows. We initialize ξ0 = √n, compute ar = ⌊ξr⌋, and subsequently ξr+1 =
1/(ξr − ar). Here, each ξr is maintained as an expression of the form t1 + t2√n
with rational numbers t1, t2, that is, we work in the ring Q[x]/⟨x^2 − n⟩. Only
for computing ar as ⌊ξr⌋ do we need a floating-point value for √n.
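In code, the iteration can be sketched as follows (the function names are ours). Each ξr in the table below has the form (P + √n)/Q, so the state can be kept as the integer pair (P, Q); with the integer square root for a0, even the floating-point step can be avoided. The classical identity hr^2 − n kr^2 = (−1)^(r+1) Qr+1 then gives yr without ever forming kr.

```python
from math import isqrt

def is_smooth(m, primes):
    """True if |m| factors completely over `primes`."""
    m = abs(m)
    for p in primes:
        while m % p == 0:
            m //= p
    return m == 1

def cfrac_smooth(n, primes, rmax):
    """Return (r, h_r mod n, y_r) for every r <= rmax with y_r smooth,
    where y_r = h_r^2 - n k_r^2.  n must not be a perfect square."""
    a0 = isqrt(n)
    out = []
    h_prev, h = 1, a0 % n            # h_{-1} = 1, h_0 = a0
    y = a0 * a0 - n                  # y_0 = h_0^2 - n (negative)
    if is_smooth(y, primes):
        out.append((0, h, y))
    P, Q = a0, n - a0 * a0           # xi_1 = (P + sqrt(n))/Q
    for r in range(1, rmax + 1):
        a = (a0 + P) // Q            # r-th partial quotient a_r
        h_prev, h = h, (a * h + h_prev) % n
        P = a * Q - P                # advance to xi_{r+1}
        Q = (n - P * P) // Q
        y = Q if r % 2 == 1 else -Q  # y_r = (-1)^(r+1) Q_{r+1}
        if is_smooth(y, primes):
            out.append((r, h, y))
    return out
```

For n = 596333 and the reduced factor base above, this reproduces the four B-smooth values of the table that follows: y2 = −473, y7 = 484, y12 = −52, and y14 = −92.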
The following table demonstrates the CFRAC iterations for 0 ≤ r ≤ 15,
which give four B-smooth values of yr. Running the loop further yields B-
smooth values of yr for r = 20, 23, 24, 27, 28, 39. So we get ten relations, and
the size of the factor base is eight. We then solve the resulting system of linear
congruences modulo 2. This step is not shown here.

r    ξr                ar    hr (mod n)    yr                         B-smooth
0    √n                772   772           −349 = (−1) × 349          No
1    (772 + √n)/349    4     3089          593 = 593                  No
2    (624 + √n)/593    2     6950          −473 = (−1) × 11 × 43      Yes
3    (562 + √n)/473    2     16989         949 = 13 × 73              No
4    (384 + √n)/949    1     23939         −292 = (−1) × 2^2 × 73     No
5    (565 + √n)/292    4     112745        797 = 797                  No
6    (603 + √n)/797    1     136684        −701 = (−1) × 701          No
7    (194 + √n)/701    1     249429        484 = 2^2 × 11^2           Yes
8    (507 + √n)/484    2     39209         −793 = (−1) × 13 × 61      No
9    (461 + √n)/793    1     288638        613 = 613                  No
10   (332 + √n)/613    1     327847        −844 = (−1) × 2^2 × 211    No
11   (281 + √n)/844    1     20152         331 = 331                  No
12   (563 + √n)/331    4     408455        −52 = (−1) × 2^2 × 13      Yes
13   (761 + √n)/52     29    535020        737 = 11 × 67              No
14   (747 + √n)/737    2     285829        −92 = (−1) × 2^2 × 23      Yes
15   (727 + √n)/92     16    337620        449 = 449                  No
¤

Let us now deduce the optimal running time of the CFRAC method. The
smoothness candidates yr are O(√n), and have the probability L[−(1/2)/(2β)] =
L[−1/(4β)] of being L[β]-smooth. We expect to get one L[β]-smooth value of yr
after L[1/(4β)] iterations. Each iteration involves trial divisions by L[β] primes in
the factor base. Finally, we need to collect L[β] relations, so the running time
of the CFRAC method is L[2β + 1/(4β)]. Since 2β + 1/(4β) is minimized for β = 1/(2√2),
the optimal running time of the relation-collection stage is L[√2]. The sparse
system involving L[β] variables can be solved in L[2β] = L[1/√2] time. To sum
up, the running time of the CFRAC method is L[√2].
The CFRAC method can run in parallel, with each instance handling the
continued-fraction expansion of √(sn) for some s ∈ N. For a given s, the quan-
tity yr satisfies the inequality 0 < |yr| < 2√(sn). If s grows, the probability of
yr being smooth decreases, so only small values of s should be used.

6.6 Quadratic Sieve Method


The Quadratic Sieve method (QSM)8 is another subexponential-time
integer-factoring algorithm that generates smoothness candidates of values
O(√n). As a result, its performance is similar to the CFRAC method. How-
ever, trial divisions in the QSM can be replaced by an efficient sieving proce-
dure. This leads to an L[1] running time of the QSM.

Let H = ⌈√n⌉ and J = H^2 − n. If n is not a perfect square, J ≠ 0. Also,
both H and J are of values O(√n). For a small integer c (positive or negative),
(H + c)^2 ≡ H^2 + 2cH + c^2 ≡ J + 2cH + c^2 (mod n). Call T(c) = J + 2cH + c^2.
Since H, J are O(√n) and c is small, we have T(c) = O(√n). We check the
smoothness of T(c) over a set B of small primes (the factor base). We vary
c in the range −M ≤ c ≤ M. The choices of B and M are explained later.
Since c can take negative values, T(c) may be negative. So we add −1 to the
factor base. After all smooth values of T(c) are found for −M ≤ c ≤ M, the
resulting linear system of congruences modulo 2 is solved to split n.
The condition p | T(c) implies (H + c)^2 ≡ n (mod p). Thus, the factor base B
need not contain any small prime modulo which n is a quadratic non-residue.

Example 6.16 We use QSM to factor n = 713057, for which H = ⌈√n⌉ =
845, and J = H^2 − n = 968. All primes < 50, modulo which n is a quadratic
residue, constitute the factor base, that is, B = {−1, 2, 7, 11, 17, 19, 29, 37, 43}.
We choose M = 50. All the values of c in the range −50 ≤ c ≤ 50, for which
T(c) is smooth over B, are listed below.

c      T(c)                                    c     T(c)
−44    −71456 = (−1) × 2^5 × 7 × 11 × 29       −2    −2408 = (−1) × 2^3 × 7 × 43
−22    −35728 = (−1) × 2^4 × 7 × 11 × 29       0     968 = 2^3 × 11^2
−15    −24157 = (−1) × 7^2 × 17 × 29           2     4352 = 2^8 × 17
−14    −22496 = (−1) × 2^5 × 19 × 37           4     7744 = 2^6 × 11^2
−11    −17501 = (−1) × 11 × 37 × 43            26    45584 = 2^4 × 7 × 11 × 37
−9     −14161 = (−1) × 7^2 × 17^2              34    59584 = 2^6 × 7^2 × 19
−4     −5776 = (−1) × 2^4 × 19^2               36    63104 = 2^7 × 17 × 29

8 Carl Pomerance, The quadratic sieve factoring algorithm, Eurocrypt’84, 169–182, 1985.
There are 14 smooth values of T (c), and the size of the factor base is 9. Solving
the resulting system splits n as 761 × 937 (with both the factors prime). ¤
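Before bringing in the sieve, the relation search of this example can be reproduced with plain trial division (a sketch; the function name is ours):

```python
from math import isqrt

def qsm_relations(n, primes, M):
    """All c in [-M, M] with T(c) = J + 2cH + c^2 smooth over `primes`
    (the sign of T(c) is absorbed by the factor-base element -1).
    n must not be a perfect square."""
    H = isqrt(n - 1) + 1              # ceil(sqrt(n)) for non-square n
    J = H * H - n
    hits = []
    for c in range(-M, M + 1):
        T = J + 2 * c * H + c * c     # T(c) = (H + c)^2 - n, never 0 here
        m = abs(T)
        for p in primes:
            while m % p == 0:
                m //= p
        if m == 1:
            hits.append((c, T))
    return hits
```

For n = 713057 with the factor base and M = 50 of Example 6.16, this returns the same fourteen smooth values of T(c).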

Let us now look at the optimal running time of the QSM. Let the factor
base B consist of (−1 and) primes ≤ L[β]. Since the integers T(c) checked for
smoothness over B have values O(√n), the probability of each being smooth
is L[−1/(4β)]. In order that we get L[β] relations, the size of the sieving interval
(or equivalently M) should be L[β + 1/(4β)]. If we use trial division of each T(c)
by all of the L[β] primes in the factor base, we obtain a running time of
L[2β + 1/(4β)]. As we will see shortly, a sieving procedure reduces this running
time by a factor of L[β], that is, the running time of the relation-collection
stage of QSM with sieving is only L[β + 1/(4β)]. This quantity is minimized for
β = 1/2, and we obtain a running time of L[1] for the relation-collection stage
of QSM. The resulting sparse system having L[1/2] variables and L[1/2] equations
can also be solved in the same time.

6.6.1 Sieving
Both the CFRAC method and the QSM generate smoothness candidates
of value O(√n). In the CFRAC method, these values are bounded by 2√n,
whereas for the QSM, we have a bound of nearly 2M√n. The CFRAC method
is, therefore, expected to obtain smooth candidates more frequently than the
QSM. On the other hand, the QSM offers the possibility of sieving, a process
that replaces trial divisions by single-precision subtractions. As a result, the
QSM achieves a better running time than the CFRAC method.
In the QSM, the smoothness of the integers T(c) = J + 2cH + c^2 is checked
for −M ≤ c ≤ M. To that effect, we use an array A indexed by c in the range
−M ≤ c ≤ M. We initialize the array location Ac to an approximate value
of log |T(c)|. We can use only one or two most significant words of T(c) for
this initial value. Indeed, it suffices to know log |T(c)| rounded or truncated
after three places of decimal. If so, we can instead store the integer value
⌊1000 log |T(c)|⌋, and perform only integer operations on the elements of A.
Now, we choose small primes p one by one from the factor base. For each
small positive exponent h, we try to find out all the values of c for which
p^h | T(c). Since J = H^2 − n, this translates to solving (H + c)^2 ≡ n (mod p^h).
For h = 1, we use a root-finding algorithm, whereas for h > 1, the solutions
can be obtained by lifting the solutions modulo p^(h−1). In short, all the solutions
of (H + c)^2 ≡ n (mod p^h) can be obtained in (expected) polynomial time (in
log p and log n).
Let χ be a solution of (H + c)^2 ≡ n (mod p^h). For all c in the range
−M ≤ c ≤ M, we subtract log p from the array element Ac if and only if
c ≡ χ (mod p^h). In other words, we first obtain one solution χ, and then
update Ac for all c = χ ± kp^h with k = 0, 1, 2, . . . . If, for a given c, we have
the multiplicity v = vp(T(c)), then log p is subtracted from Ac exactly v times
(once for each of h = 1, 2, . . . , v).

After all primes p ∈ B and all suitable small exponents h are considered,
we look at the array locations Ac for −M ≤ c ≤ M. If some T(c) is smooth
(over B), then all its prime divisors are eliminated during the subtractions
of log p from the initial value of log |T(c)|. Thus, we should have Ac = 0.
However, since we use only approximate log values, we would get Ac ≈ 0. On
the other hand, a non-smooth T(c) contains a prime factor ≥ pt+1. Therefore,
a quantity at least as large as log pt+1 remains in the array location Ac, that
is, Ac ≫ 0 in this case. In short, the post-sieving values of Ac readily identify
the smooth values of T(c). Once we know that some T(c) is smooth, we use
trial division of that T(c) by the primes in the factor base.
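The whole procedure can be prototyped in a few lines. This is a toy sketch: the roots of (H + c)^2 ≡ n (mod p^h) are found here by brute-force search instead of modular square roots and Hensel lifting, and the cutoff separating Ac ≈ 0 from Ac ≫ 0 is a hand-picked constant.

```python
from math import isqrt, log

def qs_sieve(n, primes, M, scale=1000, threshold=2000):
    """Log-sieve for the QSM: A[c] starts at floor(scale*log|T(c)|), and
    floor(scale*log p) is subtracted once for every prime power p^h dividing
    T(c).  Entries left near 0 flag the smooth T(c).  n must not be a
    perfect square."""
    H = isqrt(n - 1) + 1                     # ceil(sqrt(n))
    J = H * H - n
    T = lambda c: J + 2 * c * H + c * c      # T(c) = (H + c)^2 - n
    A = {c: int(scale * log(abs(T(c)))) for c in range(-M, M + 1)}
    maxT = max(abs(T(c)) for c in range(-M, M + 1))
    for p in primes:
        lp = int(scale * log(p))
        ph = p
        while ph <= maxT:
            # all chi with (H + chi)^2 = n (mod p^h), by brute force
            roots = [x for x in range(ph) if (H + x) * (H + x) % ph == n % ph]
            if not roots:
                break                        # then none mod p^(h+1) either
            for chi in roots:
                for c in range(-M, M + 1):
                    if (c - chi) % ph == 0:  # c = chi (mod p^h)
                        A[c] -= lp
            ph *= p
    return sorted(c for c in A if A[c] < threshold)
```

With the data of the surrounding examples (n = 713057, M = 50, and the eight odd-signless factor-base primes), the fourteen c with smooth T(c) are exactly the entries left below the threshold.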
Example 6.17 We factor n = 713057 by the QSM. As in Example 6.16, take
B = {−1, 2, 7, 11, 17, 19, 29, 37, 43}, and M = 50. Initialize the array entry Ac
to ⌊1000 log |T(c)|⌋ (e is the base of logarithms). Since T(0) = J = 968, set
A0 = ⌊1000 log 968⌋ = 6875. Similarly, T(20) = J + 40H + 400 = 35168, so A20
is set to ⌊1000 log 35168⌋ = 10467, and T(−20) = J − 40H + 400 = −32432,
so A−20 is set to ⌊1000 log 32432⌋ = 10386. The approximate logarithms of
the primes in B are ⌊1000 log 2⌋ = 693, ⌊1000 log 7⌋ = 1945, ⌊1000 log 11⌋ =
2397, ⌊1000 log 17⌋ = 2833, ⌊1000 log 19⌋ = 2944, ⌊1000 log 29⌋ = 3367,
⌊1000 log 37⌋ = 3610, and ⌊1000 log 43⌋ = 3761.
The following table considers all small primes p and all small exponents
h for solving T(c) ≡ 0 (mod p^h). For each solution χ, we consider all values
of c ≡ χ (mod p^h) with −M ≤ c ≤ M, and subtract log p from Ac. For each
prime p, we consider h = 1, 2, 3, . . . in that sequence until a value of h is found,
for which there is no solution of T(c) ≡ 0 (mod p^h) with −M ≤ c ≤ M.
p    p^h    χ    c ≡ χ (mod p^h), −M ≤ c ≤ M
2 2 0 −50, −48, −46, −44, . . . , −2, 0, 2, . . . , 24 , 26, . . . , 44, 46, 48, 50
4 0 −48, −44, −40, −36, . . . , −4, 0, 4, . . . , 24 , . . . , 36, 40, 44, 48
2 −50, −46, −42, −38, . . . , −2, 2, . . . , 26, . . . , 38, 42, 46, 50
8 0 −48, −40, −32, −24, −16, −8, 0, 8, 16, 24 , 32, 40, 48
2 −46, −38, −30, −22, −14, −6, 2, 10, 18, 26, 34, 42, 50
4 −44, −36, −28, −20, −12, −4, 4, 12, 20, 28, 36, 44
6 −50, −42, −34, −26, −18, −10, −2, 6, 14, 22, 30, 38, 46
16 2 −46, −30, −14, 2, 18, 34, 50
4 −44, −28, −12, 4, 20, 36
10 −38, −22, −6, 10, 26, 42
12 −36, −20, −4, 12, 28, 44
32 2 −30, 2, 34
4 −28, 4, 36
18 −46, −14, 18, 50
20 −44, −12, 20
64 2 2
4 4
34 −30, 34
36 −28, 36
p    p^h    χ    c ≡ χ (mod p^h), −M ≤ c ≤ M
2 128 2 2
36 36
66
100 −28
256 2 2
100
130
228 −28
512 130
228
386
484 −28
1024 130
228
642
740
7 7 5 −44, −37, −30, −23, −16, −9, −2, 5, 12, 19, 26, 33, 40, 47
6 −50, −43, −36, −29, −22, −15, −8, −1, 6, 13, 20, 27, 34, 41, 48
49 34 −15, 34
40 −9, 40
343 132
236
11 11 0 −44, −33, −22, −11, 0, 11, 22, 33, 44
4 −40, −29, −18, −7, 4, 15, 26, 37, 48
121 0 0
4 4
1331 242
730
17 17 2 −49, −32, −15, 2, 19, 36
8 −43, −26, −9, 8, 25, 42
289 53
280 −9
4913 53
3170
19 19 5 −33, −14, 5, 24 , 43
15 −42, −23, −4, 15, 34
361 119
357 −4
6859 480
4689
29 29 7 −22, 7, 36
14 −44, −15, 14, 43
841 72
761
p    p^h    χ    c ≡ χ (mod p^h), −M ≤ c ≤ M
37 37 23 −14, 23
26 −48, −11, 26
1369 245
803
43 43 32 −11, 32
41 −45, −2, 41
1849 471
1537

Let us track the array locations A24 and A26 in the above table (the bold
and the boxed entries). We have T(24) = 42104, and T(26) = 45584. So A24 is
initialized to ⌊1000 log 42104⌋ = 10647, and A26 to ⌊1000 log 45584⌋ = 10727.
We subtract ⌊1000 log 2⌋ = 693 thrice from A24, and ⌊1000 log 19⌋ = 2944
once from A24. After the end of the sieving process, A24 stores the value
10647 − 3 × 693 − 2944 = 5624, that is, T(24) is not smooth. In fact, T(24) =
42104 = 2^3 × 19 × 277. The smallest prime larger than those in B is 53, for
which ⌊1000 log 53⌋ = 3970. Thus, if T(c) is not smooth over B, then after the
completion of sieving, Ac would store a value not (much) smaller than 3970.
From A26, ⌊1000 log 2⌋ = 693 is subtracted four times, ⌊1000 log 7⌋ = 1945
once, ⌊1000 log 11⌋ = 2397 once, and ⌊1000 log 37⌋ = 3610 once, leaving the
final value 10727 − 4 × 693 − 1945 − 2397 − 3610 = 3 at that array location.
So T(26) is smooth. Indeed, T(26) = 45584 = 2^4 × 7 × 11 × 37.
This example demonstrates that the final values of Ac for smooth T(c) are
clearly separated from those of Ac for non-smooth T(c). As a result, it is easy
to locate the smooth values after the sieving process even if somewhat crude
approximations are used for representing the logarithms. ¤

Let us now argue that the sieving process runs in L[1] time for the choice
β = 1/2. The primes in the factor base are ≤ L[β] = L[1/2]. On the other
hand, M = L[β + 1/(4β)] = L[1], and so 2M + 1 is also of the form L[1]. First,
we have to initialize all of the array locations Ac. Each location demands
the computation of T(c) (and its approximate logarithm), a task that can be
performed in O(ln^k n) time for some constant k. Since there are L[1] array
locations, the total time for the initialization of A is of the order of

(ln^k n) L[1] = exp[(1 + o(1)) √(ln n ln ln n) + k ln ln n]
             = exp[(1 + o(1) + k √(ln ln n / ln n)) √(ln n ln ln n)],

which is again an expression of the form L[1].
Fix a prime p ∈ B. Since T(c) = O(√n), one needs to consider only
O(log n) values of h. Finding the solutions χ for each value of h takes O(log^l n)
time for a constant l. If we vary h, we spend a total of O(log^(l+1) n) time for
each prime p. Finally, we vary p, and conclude that the total time taken for
computing all the solutions χ is of the order of (log^(l+1) n) L[1/2], which is again
an expression of the form L[1/2].
Now, we derive the total cost of subtracting log values from all array
locations. First, take p = 2. We have to subtract log 2 from each array location
Ac at most O(log n) times. Since M, and so 2M + 1, are expressions of the form
L[1], the total effort of all subtractions for p = 2 is of the order of (log n) L[1],
which is again of the form L[1]. Then, take an odd prime p. Assume that
p ∤ n, and n is a quadratic residue modulo p. In this case, the congruence
T(c) ≡ 0 (mod p^h) has exactly two solutions for each value of h. Moreover,
the values p^h are O(√n). Therefore, the total cost of subtractions for all odd
small primes p and for all small exponents h is of the order of Σp,h 2(2M + 1)/p^h <
2(2M + 1) Σr=1..n 1/r ≈ 2(2M + 1) ln n, which is an expression of the form L[1].
Finally, one has to factor the L[1/2] smooth values of T(c) by trial divisions
by the L[1/2] primes in the factor base. This process too can be completed in
L[1] time. To sum up, the entire relation-collection stage runs in L[1] time.

6.6.2 Incomplete Sieving
There are many practical ways of speeding up the sieving process in the
QSM. Here is one such technique. In the sieving process, we solve T(c) ≡
0 (mod p^h) for all p ∈ B and for all small exponents h. The final value of Ac
equals (a number close to) the logarithm of the unfactored part of T(c).
Suppose that now we work with only the value h = 1. That is, we solve only
T(c) ≡ 0 (mod p) and not modulo higher powers of p. This saves the time for
lifting and also some time for updating Ac. But then, after the sieving process
is over, Ac does not store the correct value of the logarithm of the unfactored
part of T(c). We failed to subtract log p the requisite number of times, and
so even for a smooth T(c) (unless square-free), the array location Ac stores a
significantly positive quantity (in addition to approximation errors).
But is this situation too bad? If p is small (like 2), its logarithm is small
too. So we can tolerate the lack of subtraction of log p a certain number of times.
On the other hand, log p is large for a large member p of B. However, for a
large p, we do not expect many values of T(c) to be divisible by p^h with h ≥ 2.
To sum up, considering only the exponent h = 1 does not tend to leave a
huge value in Ac for a smooth T(c). If T(c) is not smooth, Ac would anyway
store a value at least as large as log pt+1. Therefore, if we relax the criterion
of smallness of the final values of Ac, we expect to recognize smooth values
of T(c) as such. In other words, we suspect T(c) to be smooth if and only if the
residual value stored in Ac is ≤ ξ log pt+1 for some constant ξ. In practical
situations, the values 1.0 ≤ ξ ≤ 2.5 work quite well.
Since we choose ξ ≥ 1, some non-smooth values of T(c) also pass the
selection criterion. Since we do trial division anyway for selected values of T(c),
we would discard the non-smooth values of T(c) that pass the liberal selection
criterion. An optimized choice of ξ more than compensates for the waste of
time on these false candidates by the saving done during the sieving process.
Example 6.18 Let us continue with Examples 6.16 and 6.17. Now, we carry
out incomplete sieving. To start with, let us look at what happens to the array
locations A24 and A26. After the incomplete sieving process terminates, A24
stores the value 10647 − 693 − 2944 = 7010. Since 1000 log pt+1 = 3970, T(24)
is selected as smooth for ξ ≥ 7010/3970 ≈ 1.766. On the other hand, A26 ends
up with the value 10727 − 693 − 1945 − 2397 − 3610 = 2082, which passes the
smoothness test for ξ ≥ 2082/3970 ≈ 0.524.
The following table lists, for several values of ξ, all values of c for which
T(c) passes the liberal smoothness test.

ξ      c selected with smooth T(c)             c selected with non-smooth T(c)
0.5    −15, −11, −2                            None
1.0    −44, −22, −15, −14, −11, −2, 0, 26      None
1.5    −44, −22, −15, −14, −11, −9, −4, −2,    −33, −23, −1, 5, 15, 19, 41, 43
       0, 2, 4, 26, 34, 36
2.0    −44, −22, −15, −14, −11, −9, −4, −2,    −48, −45, −43, −42, −33, −32, −29, −26,
       0, 2, 4, 26, 34, 36                     −23, −18, −16, −8, −7, −1, 1, 5, 6, 7, 8,
                                               11, 12, 14, 15, 19, 20, 22, 23, 24, 25, 32,
                                               33, 41, 42, 43, 48

Evidently, for small values of ξ, only some (not all) smooth values of T(c) pass
the selection criterion. As we increase ξ, more smooth values of T(c) pass the
criterion, and more non-smooth values too pass the criterion. ¤
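The h = 1 variant is the same sieve with the lifting loop removed; the residual array it returns is then compared against ξ log pt+1. A self-contained sketch (our function name; roots modulo p are again found by brute force for these toy sizes):

```python
from math import isqrt, log

def qs_sieve_h1(n, primes, M, scale=1000):
    """Incomplete sieve: floor(scale*log p) is subtracted from A[c] once for
    every base prime p dividing T(c), ignoring higher powers p^h (h >= 2).
    Returns the final residual array A.  n must not be a perfect square."""
    H = isqrt(n - 1) + 1                 # ceil(sqrt(n))
    J = H * H - n
    T = lambda c: J + 2 * c * H + c * c
    A = {c: int(scale * log(abs(T(c)))) for c in range(-M, M + 1)}
    for p in primes:
        lp = int(scale * log(p))
        roots = [x for x in range(p) if (H + x) * (H + x) % p == n % p]
        for chi in roots:
            for c in range(-M, M + 1):
                if (c - chi) % p == 0:   # c = chi (mod p), i.e. p | T(c)
                    A[c] -= lp
    return A
```

For n = 713057 with the parameters of Example 6.18, this leaves the residuals 7010 at A24 and 2082 at A26.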

6.6.3 Large Prime Variation
Suppose that in the relation collection stage of the QSM, we are unable to
obtain sufficiently many relations to force non-zero solutions for β1, β2, . . . , βs
needed to split n. This may have happened because t and/or M were chosen to
be smaller than the optimal values (possibly to speed up the sieving process).
Instead of repeating the entire sieving process with increased values of t and/or
M, it is often useful to look at some specific non-smooth values of T(c).
Write T(c) = uv with u smooth over B and with v having no prime factors
in B. If v is composite, it admits a prime factor ≤ √v, that is, if pt < v < pt^2
(where pt is the largest prime in B), then v is prime. In terms of log values,
if the final value stored in Ac satisfies log pt < Ac < 2 log pt, the non-smooth
part of T(c) is a prime. Conversely, if T(c) is smooth except for a prime factor
q in the range pt < q < pt^2, the residue left in Ac satisfies log pt < Ac < 2 log pt.
Such a prime q is called a large prime in the context of the QSM.
Suppose that a large prime q appears as the non-smooth part in two or
more values of T(c). In that case, we add q to the factor base, and add all
these relations involving q. This results in an increase in the factor-base size
by one, whereas the number of relations increases by at least two. After suffi-
ciently many relations involving large primes are collected, we may have more
relations than the factor-base size, a situation that is needed to split n.
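Collecting the partial relations and grouping them by their large prime is straightforward to script (a sketch with our function name; `bound` caps the admissible large primes):

```python
from math import isqrt
from collections import defaultdict

def large_prime_relations(n, primes, M, bound):
    """Group the T(c) that are smooth except for one large prime q,
    pt < q <= bound; a q occurring twice or more can be promoted into
    the factor base.  Assumes bound < pt^2, so any surviving cofactor
    in (pt, bound] is automatically prime.  n must not be a square."""
    H = isqrt(n - 1) + 1
    J = H * H - n
    pt = max(primes)
    by_q = defaultdict(list)
    for c in range(-M, M + 1):
        m = abs(J + 2 * c * H + c * c)       # |T(c)|
        for p in primes:
            while m % p == 0:
                m //= p
        if pt < m <= bound:                  # one large prime left over
            by_q[m].append(c)
    return {q: cs for q, cs in by_q.items() if len(cs) >= 2}
```

For the data of the next example (n = 713057, M = 50, large primes up to 430), exactly the primes 107, 137, and 281 occur twice.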
Example 6.19 Let me illustrate the concept of large primes in connection
with Examples 6.16 and 6.17. Suppose we want to keep track of all large primes
≤ 430 (ten times the largest prime in the factor base B). The following table
lists all values of c for which T(c) contains such a large prime factor. The final
values stored in Ac are also shown. Note that ⌊1000 log 430⌋ = 6063.

c      Ac      T(c)
−48    5573    −77848 = (−1) × 2^3 × 37 × 263
−33    5550    −53713 = (−1) × 11 × 19 × 257
−32    5948    −52088 = (−1) × 2^3 × 17 × 383
−30    4693    −48832 = (−1) × 2^6 × 7 × 109
−28    4489    −45568 = (−1) × 2^9 × 89
−26    5740    −42296 = (−1) × 2^3 × 17 × 311
−23    5639    −37373 = (−1) × 7 × 19 × 281
−18    5803    −29128 = (−1) × 2^3 × 11 × 331
−8     5408    −12488 = (−1) × 2^3 × 7 × 223
−1     4635    −721 = (−1) × 7 × 103
5      4264    9443 = 7 × 19 × 71
6      5294    11144 = 2^3 × 7 × 199
8      4673    14552 = 2^3 × 17 × 107
12     5253    21392 = 2^4 × 7 × 191
14     4673    24824 = 2^3 × 29 × 107
15     4845    26543 = 11 × 19 × 127
19     5639    33439 = 7 × 17 × 281
20     5057    35168 = 2^5 × 7 × 157
24     5624    42104 = 2^3 × 19 × 277
32     5094    56072 = 2^3 × 43 × 163
40     5189    70168 = 2^3 × 7^2 × 179
41     5477    71939 = 7 × 43 × 239
42     5602    73712 = 2^4 × 17 × 271
43     4920    75487 = 19 × 29 × 137
48     4922    84392 = 2^3 × 7 × 11 × 137

Three large primes have repeated occurrences (see the boxed entries). If
we add these three primes to the factor base of Example 6.16, we obtain
B = {−1, 2, 7, 11, 17, 19, 29, 37, 43, 107, 137, 281} (12 elements) and 14 + 6 = 20
relations, that is, we are now guaranteed to have at least 2^(20−12) = 2^8 solutions.
Compare this with the original situation of 14 relations involving 9 elements
of B, where the guaranteed number of solutions was ≥ 2^(14−9) = 2^5. ¤

Relations involving two or more large primes can be considered. If the
non-smooth part v in T(c) satisfies pt^2 < v < pt^3 (equivalently, 2 log pt < Ac <
3 log pt after the sieving process), then v is either a prime or a product of two
large primes. In the second case, we identify the two prime factors by trial
divisions by large primes. This gives us a relation involving two large primes
p, q. If each of these large primes occurs at least once more in other relations
(alone or with other repeated large primes), we can add p, q to B.
6.6.4 Multiple-Polynomial Quadratic Sieve Method
The multiple-polynomial quadratic sieve method (MPQSM)9 is practically
considered to be the second fastest among the known integer-factoring algo-
rithms. In the original QSM, we consider smooth values of T(c) = J + 2cH + c^2
for −M ≤ c ≤ M. When c = ±M, the quantity |T(c)| reaches the maximum
possible value, approximately equal to 2MH ≈ 2M√n. The MPQSM works
with a more general quadratic polynomial of the form

T(c) = U + 2Vc + Wc^2

with V^2 − UW = n. The coefficients U, V, W are so adjusted that the maximum
value of |T(c)| is somewhat smaller compared to 2M√n. We first choose W as
a prime close to √(2n)/M, modulo which n is a quadratic residue. We take V to be
the smaller square root of n modulo W, so V ≤ (1/2)·√(2n)/M = √n/(√2 M). Finally, we take
U = (V^2 − n)/W ≈ −n/(√(2n)/M) = −√(n/2) M. For these choices, the maximum value of |T(c)|
becomes |T(M)| = |U + 2VM + WM^2| ≈ U + WM^2 ≈
−√(n/2) M + (√(2n)/M) × M^2 = (1/√2) M√n. This value is 2√2 ≈ 2.828 times smaller than
the maximum value 2M√n for the original QSM. Since smaller candidates are
now tested for smoothness, we expect to obtain a larger fraction of smooth
values of T(c) than in the original QSM.
It remains to establish how we can use the generalized polynomial T(c) =
U + 2Vc + Wc^2 in order to obtain a relation. Multiplying the expression for
T(c) by W gives WT(c) = (Wc + V)^2 + (UW − V^2) = (Wc + V)^2 − n, that is,

(Wc + V)^2 ≡ WT(c) (mod n).

Since the prime W occurs on the right side of every relation, we include W
in the factor base B. (For a sufficiently large n, the prime W ≈ √(2n)/M was not
originally included in the set B of small primes.)
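The parameter selection just described can be scripted directly (a sketch; the trial-division primality test and the function names are ours, and are only adequate at toy sizes):

```python
from math import isqrt

def is_prime(m):
    """Trial-division primality test (fine for small m)."""
    return m >= 2 and all(m % d for d in range(2, isqrt(m) + 1))

def mpqs_poly(n, M):
    """Choose T(c) = U + 2Vc + Wc^2 with V^2 - UW = n: W is the smallest
    prime exceeding sqrt(2n)/M modulo which n is a quadratic residue,
    V the smaller square root of n mod W, and U = (V^2 - n)/W."""
    W = isqrt(2 * n) // M
    while True:
        W += 1
        if is_prime(W):
            roots = [x for x in range(1, W) if x * x % W == n % W]
            if roots:                        # n is a QR modulo W
                V = min(roots)
                break
    U = (V * V - n) // W                     # exact: W divides V^2 - n
    assert V * V - U * W == n
    return U, V, W
```

For n = 713057 and M = 50 this yields (U, V, W) = (−24584, 11, 29), the polynomial of Example 6.20 below; for M = 35 it yields (−19264, 17, 37), the alternative discussed at the end of that example.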
Example 6.20 We factor n = 713057 by the MPQSM. In the original QSM
(Example 6.16), we took B = {−1, 2, 7, 11, 17, 19, 29, 37, 43}, and M = 50. Let
us continue to work with these choices.
In order to apply the MPQSM, we first choose a suitable polynomial, that
is, the coefficients U, V, W. We have √(2n)/M ≈ 23.884. The smallest prime larger
than this quantity, modulo which n is a quadratic residue, is 29. Since n is
an artificially small composite integer chosen for the sake of illustration only,
the prime W = 29 happens to be already included in the factor base B, that
is, the inclusion of W in B does not enlarge B. The two square roots of n
modulo W are 11 and 18. We choose the smaller one as V, that is, V = 11.
Finally, we take U = (V^2 − n)/W = −24584. Therefore,

T(c) = −24584 + 22c + 29c^2.
9 Robert D. Silverman, The multiple polynomial quadratic sieve, Mathematics of Computation, 48, 329–339, 1987. The author, however, acknowledges personal communication with Peter L. Montgomery for the idea.
Integer Factorization 327

The following table lists the smooth values of T(c) as c ranges over the interval
−50 ≤ c ≤ 50. The MPQSM yields 19 relations, whereas the original QSM
yields only 14 relations (see Example 6.16).

  c     T(c)                            c    T(c)
−50    46816 = 2⁵×7×11×19               6   −23408 = (−1)×2⁴×7×11×19
−38    16456 = 2³×11²×17               18   −14792 = (−1)×2³×43²
−34     8192 = 2¹³                     20   −12544 = (−1)×2⁸×7²
−32     4408 = 2³×19×29                22   −10064 = (−1)×2⁴×17×37
−29     −833 = (−1)×7²×17              26    −4408 = (−1)×2³×19×29
−28    −2464 = (−1)×2⁵×7×11            27    −2849 = (−1)×7×11×37
−14   −19208 = (−1)×2³×7⁴              28    −1232 = (−1)×2⁴×7×11
−12   −20672 = (−1)×2⁶×17×19           30     2176 = 2⁷×17
−10   −21904 = (−1)×2⁴×37²             45    35131 = 19×43²
 −3   −24389 = (−1)×29³
In order to exploit the reduction of the values of |T(c)| in the MPQSM,
we could have started with a smaller sieving interval, like M = 35. In this
case, we have the parameters U = −19264, V = 17, and W = 37, that is,
T(c) = −19264 + 34c + 37c². Smoothness tests of the values of T(c) for −35 ≤
c ≤ 35 yield 15 smooth values (for c = −35, −26, −24, −23, −18, −16, −8,
−7, 0, 16, 18, 20, 21, 22, 32).
On the other hand, if we kept M = 50 but eliminated the primes 37 and
43 from B, we would have a factor base of size 7. The relations in the above
table that we can now no longer use correspond to c = −10, 18, 22, 27, 45.
But that still leaves us with 14 other relations. □
Example 6.20 illustrates that in the MPQSM, we can start with values of t
and/or M smaller than optimal. We may still hope to obtain sufficiently many
relations to split n. Moreover, we can use different polynomials (for different
choices of W ), and run different instances of the MPQSM in parallel.

6.7 Cubic Sieve Method


The cubic sieve method (CSM) proposed by Reyneri10 achieves a running
time of L[√(2/3)], that is, nearly L[0.8165], and so is asymptotically faster
than the QSM. Suppose that we know a solution of the congruence

x³ ≡ y²z (mod n)

with x³ ≠ y²z as integers, and with x, y, z of absolute values O(n^ξ) for ξ < 1/2.
Heuristic estimates indicate that we expect to have solutions with ξ ≈ 1/3.
10 Unpublished manuscript, first reported in: D. Coppersmith, A. M. Odlyzko, R. Schroeppel, Discrete logarithms in GF(p), Algorithmica, 1(1), 1–15, 1986.



For small integers a, b, c with a + b + c = 0, we have

(x + ay)(x + by)(x + cy) ≡ x³ + (a + b + c)x²y + (ab + ac + bc)xy² + (abc)y³
                         ≡ y²z + (ab + ac + bc)xy² + (abc)y³
                         ≡ y² T(a, b, c) (mod n),

where

T(a, b, c) = z + (ab + ac + bc)x + (abc)y.
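The derivation above can be checked numerically. The sketch below (plain Python) uses the solution x = 241, y = 3, z = −29 of x³ ≡ y²z (mod n) that appears in Example 6.21 below:

```python
# Numerical check of (x+ay)(x+by)(x+cy) ≡ y^2 T(a,b,c) (mod n) when a+b+c = 0,
# for the CSM solution x = 241, y = 3, z = -29 of Example 6.21.
n = 6998891
x, y, z = 241, 3, -29
assert (x**3 - y*y*z) % n == 0            # x^3 ≡ y^2 z (mod n); here x^3 - y^2 z = 2n

def T(a, b, c):
    return z + (a*b + a*c + b*c)*x + (a*b*c)*y

for (a, b, c) in [(-50, -49, 99), (-5, 2, 3), (-1, 0, 1)]:
    assert a + b + c == 0
    lhs = (x + a*y) * (x + b*y) * (x + c*y)
    assert (lhs - y*y*T(a, b, c)) % n == 0

assert T(-5, 2, 3) == -4698               # matches the table in Example 6.21
```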

Since each of x, y, z is O(n^ξ), the value of T(a, b, c) is again O(n^ξ). If ξ ≈ 1/3,
then T(a, b, c) is O(n^(1/3)), that is, asymptotically smaller than O(n^(1/2)), the
size of the values T(c) in the QSM (or MPQSM).
We start with a factor base B, and a bound M = L[√(ξ/2)]. The factor
base consists of all of the t primes ≤ L[√(ξ/2)] along with the 2M + 1 integers
x + ay for −M ≤ a ≤ M. The size of the factor base is then L[√(ξ/2)]. If some
T(a, b, c) factors completely over the t small primes in B, we get a relation

(x + ay)(x + by)(x + cy) ≡ y² p1^α1 p2^α2 · · · pt^αt (mod n).

The probability that a T(a, b, c) is smooth is L[−ξ/(2√(ξ/2))] = L[−√(ξ/2)]. The
number of triples a, b, c in the range −M to M with a + b + c = 0 is O(M²),
that is, L[2√(ξ/2)]. Therefore, all these T(a, b, c) values are expected to produce
L[√(ξ/2)] relations, which is of the same size as the factor base B.
The CSM supports possibilities of sieving. To avoid duplicate generation
of the same relations for different permutations of a, b, c, we force the triples
(a, b, c) to satisfy −M ≤ a ≤ b ≤ c ≤ M. In that case, a is always non-positive, c
is always non-negative, and b varies from max(a, −(M + a)) to −a/2. We perform
a sieving procedure for each value of a ∈ {−M, −M + 1, . . . , 0}. For a fixed
a, the sieving interval is max(a, −(M + a)) ≤ b ≤ −a/2. The details of the
sieving process are left to the reader (Exercise 6.21).
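The normalization just described can be sanity-checked by enumerating the triples both ways (a small Python sketch; the names `fast` and `brute` are mine):

```python
# Enumerate the triples -M <= a <= b <= c <= M with a + b + c = 0 using the
# b-range stated above, and cross-check against brute force.
M = 50
fast = {(a, b, -a - b)
        for a in range(-M, 1)                              # a <= 0
        for b in range(max(a, -(M + a)), (-a) // 2 + 1)}   # b <= -a/2
brute = {(a, b, c)
         for a in range(-M, M + 1)
         for b in range(a, M + 1)
         for c in range(b, M + 1)
         if a + b + c == 0}
assert fast == brute        # same O(M^2) set of normalized triples
```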
The CSM with sieving can be shown to run in time L[2√(ξ/2)]. A sparse
system involving L[√(ξ/2)] variables and L[√(ξ/2)] equations can also be solved
in time L[2√(ξ/2)]. That is, the running time of the CSM is L[2√(ξ/2)] =
L[√(2ξ)]. For ξ = 1/3, this running time is L[√(2/3)], as claimed earlier.

Example 6.21 Let us factor n = 6998891 using the CSM. We need a solution
of x³ ≡ y²z (mod n). For x = 241, y = 3 and z = −29, we have x³ − y²z = 2n.
We take M = 50. The factor base B consists of −1, all primes < 100 (there are
25 of them), and all integers of the form x + ay = 241 + 3a for −50 ≤ a ≤ 50.
The size of the factor base is, therefore, 1 + 25 + 101 = 127. In this case,
we have T(a, b, c) = −29 + 241(ab + ac + bc) + 3abc. If we vary a, b, c with
−50 ≤ a ≤ b ≤ c ≤ 50 and with a + b + c = 0, we obtain 162 smooth values
of T (a, b, c). Some of these smooth values are listed in the following table.

  a    b    c   T(a, b, c)
−50  −49   99   −1043970 = (−1) × 2 × 3 × 5 × 17 × 23 × 89
−50  −39   89    −918390 = (−1) × 2 × 3 × 5 × 11³ × 23
−50  −16   66    −698625 = (−1) × 3⁵ × 5³ × 23
−50   14   36    −556665 = (−1) × 3 × 5 × 17 × 37 × 59
−49  −48   97   −1016334 = (−1) × 2 × 3³ × 11 × 29 × 59
−49  −21   70    −716850 = (−1) × 2 × 3⁵ × 5² × 59
−49   −4   53    −598598 = (−1) × 2 × 7 × 11 × 13² × 23
−49    7   42    −551034 = (−1) × 2 × 3² × 11³ × 23
−49   10   39    −542010 = (−1) × 2 × 3 × 5 × 7 × 29 × 89
−49   18   31    −526218 = (−1) × 2 × 3 × 7 × 11 × 17 × 67
−48  −38   86    −872289 = (−1) × 3⁴ × 11² × 89
−48  −29   77    −771894 = (−1) × 2 × 3² × 19 × 37 × 61
−48  −19   67    −678774 = (−1) × 2 × 3 × 29 × 47 × 83
−48    9   39    −521246 = (−1) × 2 × 11 × 19 × 29 × 43
−48   20   28    −500973 = (−1) × 3 × 11 × 17 × 19 × 47
···
 −5   −4    9     −14190 = (−1) × 2 × 3 × 5 × 11 × 43
 −5   −1    6      −7410 = (−1) × 2 × 3 × 5 × 13 × 19
 −5    2    3      −4698 = (−1) × 2 × 3⁴ × 29
 −4   −3    7      −8694 = (−1) × 2 × 3³ × 7 × 23
 −4   −2    6      −6633 = (−1) × 3² × 11 × 67
 −4    0    4      −3885 = (−1) × 3 × 5 × 7 × 37
 −4    1    3      −3198 = (−1) × 2 × 3 × 13 × 41
 −3    1    2      −1734 = (−1) × 2 × 3 × 17²
 −2   −2    4      −2873 = (−1) × 13² × 17
 −1    0    1       −270 = (−1) × 2 × 3³ × 5
  0    0    0        −29 = (−1) × 29
We solve for β1 , β2 , . . . , β162 from 127 linear congruences modulo 2. These
linear-algebra calculations are not shown here. Since the number of variables is
significantly larger than the number of equations, we expect to find a non-zero
solution for β1 , β2 , . . . , β162 to split n. Indeed, we get n = 293 × 23887. ¤

The CSM has several problems which restrict its use in a general
situation. The biggest problem is that we need a solution of the congruence
x³ ≡ y²z (mod n) with x³ ≠ y²z and with x, y, z as small as possible. No
polynomial-time (nor even subexponential-time) method is known to obtain
such a solution. Only when n is of certain special forms (for example, when n
or a multiple of n is close to a perfect cube) is a solution for x, y, z available naturally.
A second problem of the CSM is that, because of the quadratic and cubic
coefficients (in a, b, c), the values of T(a, b, c) are, in practice, rather large. To
be precise, T(a, b, c) = O(L[3√(ξ/2)] n^ξ). Although this quantity is asymptotically
O(n^(ξ+o(1))), the expected benefits of the CSM do not show up unless n
is quite large. My practical experience with the CSM shows that for integers
of bit sizes > 200, the CSM offers some speedup over the QSM. However,

for these bit sizes, one would possibly prefer to apply the number-field sieve
method. In the special situations where the CSM is readily applicable (as
mentioned in the last paragraph), the special number-field sieve method is
also applicable, and appears to be a strong contender to the CSM.

6.8 Elliptic Curve Method


Lenstra’s elliptic curve method (ECM)11 is a clever adaptation of Pollard’s
p − 1 method. Proposed almost at the same time as the QSM, the ECM has an
output-sensitive running time of Lp(c) = L(p, 1/2, c), where p is the smallest
prime divisor of n. The ECM can, therefore, be efficient if p is small (that
is, much smaller than √n), whereas the QSM is not designed to take direct
advantage of such a situation.
Let n be an odd composite integer which is not a prime power. Let p be
the smallest prime divisor of n. We consider an elliptic curve

E : y² = x³ + ax + b

defined over Fp. Moreover, let P = (h, k) be a non-zero point on Ep = E(Fp).


Since p is not known beforehand, we keep the parameters of E and the co-
ordinates of points in Ep reduced modulo n. We carry out curve arithmetic
modulo n. The canonical projection Zn → Zp lets the arithmetic proceed nat-
urally modulo p as well, albeit behind the curtain. Eventually, we expect the
arithmetic to fail in an attempt to invert some integer r which is non-zero
modulo n, but zero modulo p. The gcd of r and n divulges a non-trivial factor
of n. The ECM is based upon an idea of forcing this failure of computation
in a reasonable (subexponential, to be precise) amount of time.

By Hasse’s theorem, the group Ep contains between p + 1 − 2√p and
p + 1 + 2√p points, that is, |Ep| ≈ p. We assume that for randomly chosen curve
parameters a, b, the size of Ep is a random integer in the Hasse interval. Let the
factor base B consist of all small primes ≤ Lp[1/√2]. The size |Ep| is smooth
over B with a probability of Lp[−1/√2] (by Corollary 6.14). This means that
after trying Lp[1/√2] random curves, we expect to obtain one curve with |Ep|
being B-smooth. Such a curve E can split n with high probability as follows.
Let B = {p1, p2, . . . , pt} (where pi is the i-th prime), ei = ⌊log p/ log pi⌋,
and m = p1^e1 p2^e2 · · · pt^et. Since |Ep| is B-smooth, and the order of the point P ∈ Ep
is a divisor of |Ep|, we must have mP = O. This means that at some stage
during the computation of mP, we must attempt to add two finite points
Q1 = (h1, k1) and Q2 = (h2, k2) satisfying Q1 = −Q2 in the group Ep, that
is, h1 ≡ h2 (mod p) and k1 ≡ −k2 (mod p). But p is unknown, so the
11 Hendrik W. Lenstra Jr., Factoring integers with elliptic curves, Annals of Mathematics, 126(2), 649–673, 1987.



coefficients h1, k1, h2, k2 are kept available modulo n. This, in turn, indicates
that we expect with high probability (see below why) that the modulo-n
representatives of h1 and h2 are different (although they are the same modulo p).
First, assume that Q1 and Q2 are ordinary points, that is, different in Ep.
Then, the computation of Q1 + Q2 modulo n attempts to invert h2 − h1 (see
the addition formulas in Section 4.2). We have p | (h2 − h1). If n ∤ (h2 − h1),
then gcd(h2 − h1, n) is a non-trivial divisor (a multiple of p) of n. If Q1 and
Q2 are special points, Q1 + Q2 = O implies Q1 = Q2. Such a situation arises
when the computation of mP performs a doubling step. Since Q1 is a special
point, we have k1 ≡ 0 (mod p). The doubling procedure involves inverting 2k1
modulo n. If k1 ≢ 0 (mod n), then gcd(k1, n) is a non-trivial factor of n.
Let q be a prime divisor of n, other than p. If p is the smallest prime divisor
of n, we have q > p. The size of the group Eq is nearly q. It is expected that |Ep|
and |Eq| are not B-smooth simultaneously. But then, although mP = O in Ep,
we have mP ≠ O in Eq. Therefore, when Q1 + Q2 equals O in the computation
of mP in Ep, we have Q1 + Q2 ≠ O in Eq, that is, h1 ≡ h2 (mod p), but
h1 ≢ h2 (mod q). By the CRT, h1 ≢ h2 (mod n), that is, gcd(h2 − h1, n) is a
multiple of p, but not of q, and so is a proper divisor of n, larger than 1. The
case 2Q1 = O analogously exhibits a high potential of splitting n.
There are some small problems to solve now in order to make the ECM
work. First, p itself is unknown until it (or a multiple of it) is revealed. But
the determination of the exponents ei requires the knowledge of p. If an upper
bound M on p is known, we take ei = ⌊log M/ log pi⌋. These exponents work
perfectly for the algorithm, since this new m = p1^e1 p2^e2 · · · pt^et continues to remain
a multiple of the order of P in Ep (of course, if |Ep| is B-smooth). If no
non-trivial upper bound on p is known, we can anyway take M = √n.
A second problem is the choice of the curve parameters (a, b) and the point
P = (h, k) on E. If a and b are chosen as random residues modulo n, obtaining
a suitable point P on E becomes problematic. The usual way to locate a point
on an elliptic curve is to choose the x-coordinate h randomly, and subsequently
obtain the y-coordinate k by taking a square root of h3 + ah + b. In our case,
n is composite, and computing square roots modulo n is, in general, a difficult
computational problem—as difficult as factoring n itself.
This problem can be solved in two ways. The first possibility is to freely
choose a random point P = (h, k) and the parameter a. Subsequently, the
parameter b is computed as b ≡ k² − (h³ + ah) (mod n). The compositeness of
n imposes no specific problems in these computations. A second way to avoid
the modular square-root problem is to take a randomly, and set b ≡ c² (mod n)
for some random c. We now choose the point P = (0, c) on the curve E. It
is often suggested to take b = c = 1 (and P = (0, 1)) always, and vary a
is often suggested to take b = c = 1 (and P = (0, 1)) always, and vary a
randomly to generate a random sequence of curves needed by the ECM.
The scalar multiplier m = p1^e1 p2^e2 · · · pt^et can be quite large, since t is already
a subexponential function of p (or n). The explicit computation of m can,
however, be avoided. We instead start with P0 = P, and then compute Pi =
pi^ei Pi−1 for i = 1, 2, . . . , t. Eventually, we obtain Pt = mP.

The curve E : y² = x³ + ax + b is indeed an elliptic curve modulo p if
and only if E is non-singular, that is, p ∤ (4a³ + 27b²). If this condition is not
satisfied, addition in Ep makes no sense. It may, therefore, be advisable to
check whether gcd(4a³ + 27b², n) > 1. If so, we have already located a divisor
of n. If that divisor is less than n, we return it. If it is equal to n, we choose
another random curve. If, on the other hand, gcd(4a³ + 27b², n) = 1, the curve
E is non-singular modulo every prime divisor of n, and we proceed to compute
the desired multiple mP of P. If a and b are randomly chosen, the probability
of obtaining gcd(4a³ + 27b², n) > 1 is extremely low, and one can altogether
avoid computing this gcd. But then, since this gcd computation adds only an
insignificant overhead to the computation of mP, it does not really matter
whether the non-singularity condition is checked or not.
The above observations are summarized as Algorithm 6.4.

Algorithm 6.4: Elliptic Curve Method


Repeat until a non-trivial divisor of n is located:
Choose a, h, k ∈ Zn randomly, and take b ≡ k² − (h³ + ah) (mod n).
E denotes the curve y² = x³ + ax + b, and P the point (h, k).
For i = 1, 2, . . . , t, set P = pi^ei P, where ei = ⌊log M/ log pi⌋.
If the for loop encounters an integer r not invertible mod n {
Compute d = gcd(r, n).
If d 6= n, return d.
If d = n, abort the computation of mP for the current curve.
}
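Algorithm 6.4 can be sketched in a few dozen lines of Python (my own code, not the book's; the point arithmetic follows the usual affine addition formulas of Section 4.2, and any non-invertible denominator is trapped exactly as described above):

```python
# A compact sketch of Algorithm 6.4; all names are mine. A denominator r
# with 1 < gcd(r, n) < n raised during point addition reveals a factor of n.
from math import gcd, isqrt, log
import random

class FactorFound(Exception):
    def __init__(self, d):
        self.d = d

def inv_mod(r, n):
    d = gcd(r % n, n)
    if d > 1:
        raise FactorFound(d)   # d may equal n: the caller then tries a new curve
    return pow(r, -1, n)

def ec_add(P, Q, a, n):
    """Add points of y^2 = x^3 + ax + b over Z_n; None is the point at infinity."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, n) % n
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)

def ec_mul(k, P, a, n):
    R = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, n)
        P = ec_add(P, P, a, n)
        k >>= 1
    return R

def ecm(n, small_primes, M=None, tries=1000, seed=1):
    rng = random.Random(seed)
    M = M or isqrt(n)
    for _ in range(tries):
        a, h, k = (rng.randrange(n) for _ in range(3))
        b = (k * k - h ** 3 - a * h) % n      # so that P = (h, k) lies on the curve
        P = (h, k)
        try:
            for p in small_primes:
                P = ec_mul(p ** int(log(M) / log(p)), P, a, n)
        except FactorFound as ex:
            if ex.d < n:
                return ex.d                   # non-trivial divisor of n
    return None

d = ecm(1074967, [2, 3, 5, 7, 11])            # d is a prime factor of 1074967
```

Applied to n = 1074967 with B = {2, 3, 5, 7, 11} (the data of Example 6.22 below), this returns one of the factors 541 or 1987 after a few random curves.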

Example 6.22 Let us factor n = 1074967 by the ECM. I first demonstrate


an unsuccessful attempt. The choices P = (h, k) = (26, 83) and a = 8 give b ≡
k² − (h³ + ah) ≡ 1064072 (mod n). We have no non-trivial information about
a bound on the smallest prime divisor p of n. So we take the trivial bound
M = ⌊√n⌋ = 1036. We choose the factor base B = {2, 3, 5, 7, 11}. (Since this
is an artificially small example meant for demonstration only, the choice of
t is not dictated by any specific formula.) The following table illustrates the
computation of mP . For each scalar multiplication of a point on the curve,
we use a standard repeated double-and-add algorithm.
i   pi   ei = ⌊log M/ log pi⌋   pi^ei   Pi
0 P0 = (26, 83)
1 2 10 1024 P1 = 1024P0 = (330772, 1003428)
2 3 6 729 P2 = 729P1 = (804084, 683260)
3 5 4 625 P3 = 625P2 = (742854, 1008597)
4 7 3 343 P4 = 343P3 = (926695, 354471)
5 11 2 121 P5 = 121P4 = (730198, 880012)
The table demonstrates that the computation of P5 = mP proceeds without
any problem, so the attempt to split n for the above choices of a, b, h, k fails.

In order to see what happened behind the curtain, let me reveal that n
factors as pq with p = 541 and q = 1987. The curve Ep is cyclic with prime
order 571, and the curve Eq is cyclic of square-free order 1934 = 2×967. Thus,
neither |Ep| nor |Eq| is smooth over the chosen factor base B, and so mP ≠ O
in both Ep and Eq , that is, an accidental discovery of a non-invertible element
modulo n did not happen during the five scalar multiplications. Indeed, the
points Pi on Ep and Eq are listed in the following table.
i Pi (mod n) Pi (mod p) Pi (mod q)
0 (26, 83) (26, 83) (26, 83)
1 (330772, 1003428) (221, 414) (930, 1980)
2 (804084, 683260) (158, 518) (1336, 1719)
3 (742854, 1008597) (61, 173) (1703, 1188)
4 (926695, 354471) (503, 116) (753, 785)
5 (730198, 880012) (389, 346) (969, 1758)
I now illustrate a successful iteration of the ECM. For the choices P =
(h, k) = (81, 82) and a = 3, we have b = 550007. We continue to take M =
1036 and B = {2, 3, 5, 7, 11}. The computation of mP now proceeds as follows.
i   pi   ei = ⌊log M/ log pi⌋   pi^ei   Pi
0 P0 = (81, 82)
1 2 10 1024 P1 = 1024P0 = (843635, 293492)
2 3 6 729 P2 = 729P1 = (630520, 992223)
3 5 4 625 P3 = 625P2 = (519291, 923811)
4 7 3 343 P4 = 343P3 = (988490, 846127)
5 11 2 121 P5 = 121P4 = ?
The following table lists all intermediate points lP4 in the left-to-right double-
and-add point-multiplication algorithm for computing P5 = 121P4 .
Step l lP4
Init 1 (988490, 846127)
Dbl 2 (519843, 375378)
Add 3 (579901, 1068102)
Dbl 6 (113035, 131528)
Add 7 (816990, 616888)
Dbl 14 (137904, 295554)
Add 15 (517276, 110757)
Dbl 30 (683232, 158345)
Dbl 60 (890993, 947226)
Dbl 120 (815911, 801218)
Add 121 Failure
In the last step, an attempt is made to add 120P4 = (815911, 801218) and
P4 = (988490, 846127). These two points are different modulo n. So we try to
invert r = 988490−815911 = 172579. To that effect, we compute the extended
gcd of r and n, and discover that gcd(r, n) = 541 is a non-trivial factor of n.

Let us see what happened behind the curtain to make this attempt successful.
The choices a, b in this attempt give a curve Ep (where p = 541) of order
539 = 7² × 11, whereas Eq (where q = 1987) now has order 1959 = 3 × 653. It
follows that mP = O in Ep, but mP ≠ O in Eq. This is the reason why the
computation of mP is bound to reveal p at some stage. The evolution of the
points Pi is listed below modulo n, p and q.

i Pi (mod n) Pi (mod p) Pi (mod q)


0 (81, 82) (81, 82) (81, 82)
1 (843635, 293492) (216, 270) (1147, 1403)
2 (630520, 992223) (255, 29) (641, 710)
3 (519291, 923811) (472, 324) (684, 1843)
4 (988490, 846127) (83, 3) (951, 1652)
5 ? O (1861, 796)

It is worthwhile to mention here that the above case of having a B-smooth


value of |Ep | and a B-non-smooth value of |Eq | is not the only situation in
which the ECM divulges a non-trivial factor of n. It may so happen that for a
choice of a, b, the group Eq has B-smooth order, whereas the other group Ep
has B-non-smooth order. In this case, the ECM divulges the larger factor q
instead of the smaller one p. For example, for the choices P = (h, k) = (50, 43),
a = 6 and b = 951516, we have |Ep| = 571, whereas |Eq| = 1944 = 2³ × 3⁵. So
the computation of P2 = 729P1 = (729 × 1024)P fails, and reveals the factor
q = 1987. The details are not shown here.
There are other situations for the ECM to divulge a non-trivial factor of
n. Most surprisingly, this may happen even when both |Ep | and |Eq | are non-
smooth over B. For instance, the choices P = (h, k) = (8, 44), a = 1 and
b = 1076383 give |Ep| = 544 = 2⁵ × 17 and |Eq| = 1926 = 2 × 3² × 107,
both non-smooth over B. We compute P1 = 1024P0 = (855840, 602652) and
P2 = 729P1 = (810601, 360117), and then the computation of P3 = 625P2
involves an addition chain in which we attempt to add 18P2 = (75923, 578140)
and P2 = (810601, 360117). P2 has order 17 in Ep , so 18P2 = P2 , that is, we
accidentally try to add P2 to itself. This results in a wrong choice of the
addition formula and a revelation of the factor p = 541 of n.
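The group orders quoted in this example can be confirmed by a brute-force point count over Fp, using |Ep| = p + 1 + Σx χ(x³ + ax + b) with χ the Legendre symbol (a short sketch; the helper names are mine):

```python
# Brute-force point counts behind Example 6.22 (p = 541).
def curve_order(p, a, b):
    def chi(u):                      # Legendre symbol via Euler's criterion
        u %= p
        return 0 if u == 0 else (1 if pow(u, (p - 1) // 2, p) == 1 else -1)
    return p + 1 + sum(chi(x ** 3 + a * x + b) for x in range(p))

def smooth_over(m, primes):
    for q in primes:
        while m % q == 0:
            m //= q
    return m == 1

B = [2, 3, 5, 7, 11]
n1 = curve_order(541, 8, 1064072)    # the failed attempt: order 571, a prime
n2 = curve_order(541, 3, 550007)     # the successful attempt: order 539 = 7^2 * 11
assert (n1, n2) == (571, 539)
assert not smooth_over(n1, B) and smooth_over(n2, B)
```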
An addition chain in a single double-and-add scalar multiplication involves
the computation of only Θ(log n) multiples of the base point. Consequently,
a situation like the awkward addition just mentioned is rather unlikely in a
single scalar multiplication. But then, we make t = Lp[1/√2] scalar multiplications
in each factoring attempt of the ECM, and we make Lp[1/√2] such
attempts before we expect to discover a non-trivial factor of n. Given so many
addition chains, stray lucky incidents like the one illustrated in the last para-
graph need not remain too improbable. This is good news for the ECM, but
creates difficulty in analyzing its realistic performance. In fact, the running-time
analysis given below takes into account only the standard situation: |Ep|
is smooth only for the smallest prime factor p of n. □

For deriving the running time of the ECM, let us first assume that a fairly
accurate bound M on the smallest prime factor p of n is available to us. The
choices of the factor base and of the exponents ei depend upon that. More
precisely, we let B consist of all primes ≤ LM[1/√2]. An integer (the size of
the group Ep) of value O(M) is smooth over B with probability LM[−1/√2],
that is, about LM[1/√2] random curves E need to be tried to obtain one
B-smooth value of |Ep|. For each choice of a curve, we make t = |B| =
LM[1/√2] scalar multiplications. Since each such scalar multiplication can
be completed in O(log³ n) time (a polynomial in log n), the running time
of the ECM is LM[√2]. In the worst case, M is as large as √n, and this
running time becomes Ln[1], which is the same as that of the QSM. However,
if M is significantly smaller than √n, the ECM is capable of demonstrating
superior performance compared to the QSM. For example, if n is known to
be the product of three distinct primes of roughly the same bit size, we can
take M = n^(1/3). In that case, the running time of the ECM is Ln[√(2/3)] ≈
Ln[0.816], the same as the best possible running time of the CSM.
Given our current knowledge of factoring, the hardest nuts to crack are
the products of two primes of nearly the same bit sizes. For such integers, the
MPQSM has been experimentally found by the researchers to be slightly more
efficient than the ECM. The most plausible reason for this is that the ECM
has no natural way of quickly sieving out the bad choices. The ECM is still an
important algorithm, because it can effectively exploit the presence of small
prime divisors—a capability not present at all in the QSM or the MPQSM.

6.9 Number-Field Sieve Method


The most sophisticated and efficient integer-factoring algorithm known to
date is the number-field sieve method (NFSM) proposed for special numbers
by Lenstra, Lenstra, Manasse and Pollard,12 and extended later to work for
all integers by Buhler, Lenstra and Pomerance.13 Understanding the NFSM
calls for exposure to number fields and rings—topics well beyond the scope of
this book. I provide here a very intuitive description of the NFSM.
In order to motivate the development of the NFSM, let us recapitulate
the working of the QSM. We take the polynomial f(x) = x² − J satisfying
f(H) = n. Here, n is the integer to be factored, H = ⌈√n⌉, and J = H² − n.
If J is a perfect square y², we already have a congruence H² ≡ y² (mod n)
capable of splitting n. So assume that J is not a perfect square, that is, f(x)
is an irreducible polynomial in Q[x]. Let θ be a root of f in C. By adjoining
12 Arjen K. Lenstra, Hendrik W. Lenstra, Jr., Mark S. Manasse and John M. Pollard, The

number field sieve, STOC, 564–572, 1990.


13 Joe P. Buhler, Hendrik W. Lenstra, Jr. and Carl Pomerance, Factoring integers with

the number field sieve, Lecture Notes in Mathematics, 1554, Springer, 50–94, 1994.

θ to Q, we get a field K = Q(θ) = {a + bθ | a, b ∈ Q}. This is an example of
a quadratic number field.
Let OK be the set of all elements of K that satisfy monic polynomials with
integer coefficients.14 OK turns out to be a ring and so an integral domain.
For simplicity, assume that OK = {a + bθ | a, b ∈ Z} (this is indeed the case if
J is a square-free integer congruent to 2 or 3 modulo 4). The product of two
elements a + bθ, c + dθ ∈ OK is

(a + bθ)(c + dθ) = ac + (ad + bc)θ + bdθ² = (bdJ + ac) + (ad + bc)θ.
Since f (H) ≡ 0 (mod n), the map OK → Zn taking a + bθ to a + bH is a ring
homomorphism. Applying this homomorphism on the above product gives
(a + bH)(c + dH) ≡ (bdJ + ac) + (ad + bc)H (mod n).
If the right side of this congruence is smooth over a set of small primes, we
obtain a relation of the form

(a + bH)(c + dH) ≡ p1^e1 p2^e2 · · · pt^et (mod n).

The original QSM takes a = c and b = d = 1, so we have

(a + bH)(c + dH) ≡ (H + c)²
≡ (bdJ + ac) + (ad + bc)H ≡ (J + c²) + 2cH ≡ T(c) (mod n).
By varying c, we obtain many relations. Also see Exercise 6.19.
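These congruences are easy to verify numerically. A tiny sketch, using n = 713057, so that H = ⌈√n⌉ = 845 and J = H² − n = 968 by the definitions above (the sample pairs (a, b, c, d) are arbitrary):

```python
# Check (a + bH)(c + dH) ≡ (bdJ + ac) + (ad + bc)H (mod n), which rests
# on H^2 ≡ J (mod n).
from math import isqrt

n = 713057
H = isqrt(n - 1) + 1          # ceiling of sqrt(n) for non-square n
J = H * H - n
assert (H, J) == (845, 968)

for (a, b, c, d) in [(3, 1, -7, 2), (10, 4, 5, 1), (0, 1, 0, 1)]:
    lhs = (a + b * H) * (c + d * H)
    rhs = (b * d * J + a * c) + (a * d + b * c) * H
    assert (lhs - rhs) % n == 0
```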
With a bit of imagination, let us generalize this idea. Let f(x) ∈ Z[x] be
an irreducible polynomial such that f(H) is equal to n (or some multiple of
n) for some H ∈ Z. One possibility is to take an H of bit size about (lg n)^(2/3),
and obtain f from the base-H expansion of n, so that f is a polynomial
of degree d ∼ (lg n)^(1/3). Adjoining a root θ ∈ C of f to Q gives us the field

K = Q(θ) = {a0 + a1θ + a2θ² + · · · + ad−1θ^(d−1) | a0, a1, . . . , ad−1 ∈ Q}.

This is a number field of degree d. Let OK denote the set of elements of K
that satisfy monic irreducible polynomials in Z[x]. OK is an integral domain,
called the order of K or the ring of integers of K. For simplicity, assume that
OK supports unique factorization of elements into products of primes and
units (in OK). If we multiply elements β1, β2, . . . , βk ∈ OK, the product is
again a polynomial in θ of degree ≤ d − 1, that is,

β1β2 · · · βk = α = a0 + a1θ + a2θ² + · · · + ad−1θ^(d−1).

Since f(H) ≡ 0 (mod n), the map taking θ ↦ H naturally extends to a ring
homomorphism Φ : OK → Zn. Applying this map to the last equation gives

Φ(β1)Φ(β2) · · · Φ(βk) ≡ Φ(α) ≡ a0 + a1H + a2H² + · · · + ad−1H^(d−1) (mod n).
14 O is the Gothic O. The letter O is an acronym for (maximal) order of K.

If the right side of this congruence is smooth over a set of small primes, we
get a relation of the form

Φ(β1)Φ(β2) · · · Φ(βk) ≡ p1^e1 p2^e2 · · · pt^et (mod n).

In the QSM, we choose H ≈ √n. In order to arrive at a smaller running time,
the NFSM chooses a subexponential expression in log n as H. Moreover, the
product α should be a polynomial of small degree in θ so that substituting θ
by H in α gives a value much smaller compared to n. A good choice for α is

α = aθ + b

for small coprime integers a, b. But then, how can we express such an element
aθ + b as a product of β1 , β2 , . . . , βk ? This is precisely where the algebraic
properties of OK come to the forefront. We can identify a set of small elements
in OK . More technically, we choose a set P of elements of OK of small prime
norms, and a generating set U of units of OK . For a choice of a, b, we need an
algorithm to factor a + bθ completely (if possible) into a product of elements
from P ∪ U. Indeed, checking the smoothness of a + bθ reduces to checking the
smoothness of the integer (−b)^d f(−a/b). Factorization in number rings OK
is too difficult a topic to be explained here. Example 6.23 supplies a flavor.
To worsen matters, the ring OK may fail to support unique factorization
of elements into products of primes and units. However, all number rings are
so-called Dedekind domains where unique factorization holds at the level of
ideals, and each ideal can be generated by at most two elements. The general
number-field sieve method takes all these issues into account, and yields an
integer-factoring algorithm that runs in L(n, 1/3, (64/9)^(1/3)) time.

Example 6.23 Let us attempt to factor n = 89478491 using the NFSM. In


this example, I demonstrate how a single relation can be generated. The first
task is to choose a suitable polynomial f (x) ∈ Z[x] and a positive integer H
such that f (H) is a small multiple of n. For the given n, the choices f (x) =
x7 + x + 1 and H = 16 work. In fact, we have f (H) = 3n.
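The arithmetic of this example is easy to emulate with coefficient lists. The sketch below (my own helper names) reduces products with θ⁷ = −θ − 1 and checks that the substitution θ ↦ H = 16 is multiplicative modulo n, precisely because f(H) = 3n ≡ 0 (mod n):

```python
# f(x) = x^7 + x + 1, H = 16, f(H) = 3n: reducing theta -> 16 modulo n
# is a ring homomorphism Z[theta] -> Z_n.
n, H = 89478491, 16
f = [1, 1, 0, 0, 0, 0, 0, 1]                  # coefficients of 1, x, ..., x^7
assert sum(c * H ** i for i, c in enumerate(f)) == 3 * n

def mul_mod_f(u, v):
    """Product in Z[theta] of two coefficient lists of degree <= 6."""
    w = [0] * (len(u) + len(v) - 1)
    for i, a in enumerate(u):
        for j, b in enumerate(v):
            w[i + j] += a * b
    while len(w) > 7:                         # theta^d = -theta^(d-6) - theta^(d-7)
        c = w.pop()
        d = len(w)                            # the degree just removed
        w[d - 6] -= c
        w[d - 7] -= c
    return w

def phi(u):                                   # substitute theta -> H, reduce mod n
    return sum(c * H ** i for i, c in enumerate(u)) % n

u = [3, 0, 1, 0, 0, 0, 2]                     # 3 + theta^2 + 2 theta^6
v = [1, 5, 0, 0, 1, 0, 0]                     # 1 + 5 theta + theta^4
assert phi(mul_mod_f(u, v)) == phi(u) * phi(v) % n
```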
The polynomial f(x) = x⁷ + x + 1 is irreducible in Z[x]. Let θ ∈ C be
a root of f(x). All the roots of an irreducible polynomial are algebraically
indistinguishable from one another. So we do not have to identify a specific
complex root of f. Instead, we define the number field K = Q(θ) as the set

K = Q(θ) = {a0 + a1θ + · · · + a6θ⁶ | ai ∈ Q},

and implement the arithmetic of K as the polynomial arithmetic of Q[x] modulo
the defining irreducible polynomial f(x). This also saves us from addressing
the issues associated with floating-point approximations.
The number field K has degree d = 7. The number of real roots of the
polynomial f(x) is r1 = 1 (in order to see why, plot f(x), or use the fact that
the derivative f′(x) = 7x⁶ + 1 is positive for all real x). This real root is an

irrational value −0.79654 . . . , but we do not need to know this value explicitly
or approximately, because we planned for an algebraic representation of K.
The remaining six roots of f (x) are (properly) complex. Since complex roots
of a real polynomial occur in complex-conjugate pairs, the number of pairwise
non-conjugate complex roots of f (x) is r2 = 3. The pair (r1 , r2 ) = (1, 3) is
called the signature of K.
Having defined the number field K, we now need to concentrate on its ring
of integers OK, that is, elements of K whose minimal polynomials are in Z[x]
and are monic. It turns out that all elements of OK can be expressed uniquely
as Z-linear combinations of 1, θ, θ², . . . , θ⁶, that is,

OK = Z[θ] = {a0 + a1θ + · · · + a6θ⁶ | ai ∈ Z}.
Furthermore, this OK supports unique factorization at the level of elements,
so we do not have to worry about factorization at the level of ideals.
The second task is to choose a factor base. The factor base now consists
of two parts: small integer primes p1 , p2 , . . . , pt for checking the smoothness
of the (rational) integers aH + b, and some small elements of OK for checking
the smoothness of the algebraic integers aθ + b. The so-called small elements
of OK are some small primes in OK and some units in OK .
In order to understand how small prime elements are chosen from OK , we
need the concept of norms. Let α = α(θ) be an element of K. Let θ1, θ2, . . . , θd
be all the roots of f(x) in C. The norm of α is defined as

N(α) = α(θ1) α(θ2) · · · α(θd).

It turns out that N(α) ∈ Q. Moreover, if α ∈ OK, then N(α) ∈ Z. If α ∈ Q,
then N(α) = α^d. Norm is a multiplicative function, that is, for all α, β ∈ K,
we have N(αβ) = N(α) N(β). We include an element γ of OK in the factor
base if and only if N(γ) is a small prime.
Let p be a small integer prime. In order to identify all elements of OK of
norm p, we factor f (x) over the finite field Fp . Each linear factor of f (x) in
Fp [x] yields an element of OK of norm p.
Modulo the primes p = 2 and p = 5, the polynomial f (x) remains irre-
ducible. So there are no elements of OK of norm 2 or 5. We say that the
(rational) primes 2 and 5 remain inert in OK .
For the small prime p = 3, we have x^7 + x + 1 ≡ (x + 2)(x^6 + x^5 + x^4 + x^3 + x^2 + x + 2) (mod 3). The linear factor yields an element of norm 3. The other factor gives an element of norm 3^6 , which is not considered for inclusion in the factor base. For the linear factor x + 2, we need to find an element γ3,2 (a polynomial in θ of degree ≤ d − 1 = 6) such that all OK -linear combinations of 3 and θ + 2 can be written as multiples of γ3,2 . Computation of such an element is an involved process. In this example, we have γ3,2 = θ^6 − θ^5 .
For p = 7, we have x^7 + x + 1 ≡ (x + 4)(x^2 + x + 3)(x^2 + x + 4)(x^2 + x + 6) (mod 7). The quadratic factors yield elements of OK of norm 7^2 . The linear factor corresponds to the element γ7,4 = θ^6 − θ^5 + θ^4 of norm 7.
Integer Factorization 339

The small prime p = 11 behaves somewhat differently. We have x^7 + x + 1 ≡ (x + 3)^2 (x^5 + 5x^4 + 5x^3 + 2x^2 + 9x + 5) (mod 11). The linear factor x + 3 has multiplicity more than one. We say that the prime 11 ramifies in OK . Nevertheless, this factor gives the element γ11,3 = θ^5 + 1 of norm 11.
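The behavior of small primes in OK described above (inert, split with a linear factor, ramified) can be observed by brute force for this particular f : a linear factor x − r of f modulo p corresponds to a root r ∈ Fp , and the factor is repeated (the ramified case) precisely when f ′(r) ≡ 0 (mod p) as well. A small illustrative script, not from the text; note that the factors quoted above appear as x + 2 = x − 1 (mod 3), x + 4 = x − 3 (mod 7), and x + 3 = x − 8 (mod 11):

```python
def roots_mod_p(p):
    # Roots of f(x) = x^7 + x + 1 over F_p, found by exhaustive search
    # (fine for tiny p; real implementations factor f over F_p instead).
    return [r for r in range(p) if (pow(r, 7, p) + r + 1) % p == 0]

def is_repeated(r, p):
    # A root r is repeated iff it is also a root of f'(x) = 7x^6 + 1 (mod p).
    return (7 * pow(r, 6, p) + 1) % p == 0

for p in (2, 3, 5, 7, 11):
    rs = roots_mod_p(p)
    print(p, rs, [is_repeated(r, p) for r in rs])
```

The output shows no roots for p = 2 and p = 5 (the inert primes), a simple root for p = 3 and p = 7, and a repeated root for the ramified prime p = 11.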
Now that we have seen some examples of how integer primes behave in the
number ring OK , it remains to understand the units of OK for completing
the construction of the factor base. The group of units of OK is isomorphic to
the group Zs × Z^r for some s ∈ N and r ∈ N0 , that is, a finite cyclic torsion part of order s and a free part of rank r. The torsion units, that is, the units of finite order, are generated by a primitive s-th root ωs of unity. In this example, the only units of OK of finite order are ±1. Besides these (torsion)
elements, we can identify r independent units ξ1 , ξ2 , . . . , ξr of infinite order,
called a set of fundamental units in OK . By Dirichlet’s unit theorem, the rank
r is r1 + r2 − 1, where (r1 , r2 ) is the signature of K. In this example, r1 = 1
and r2 = 3, so we have three fundamental units. These units can be chosen as
ξ1 = θ, ξ2 = θ^4 + θ^2 , and ξ3 = θ^6 − θ^3 + 1. These elements have norms ±1.
To sum up, the factor base needed to factor the algebraic integers aθ + b
consists of the elements γp,cp for small integer primes p, the primitive root ωs
of unity, and the fundamental units ξ1 , ξ2 , . . . , ξr .
I now demonstrate the generation of a relation. Take the element α = 2θ + 1 ∈ OK . The norm of this element is N(α) = (−2)^7 ((−1/2)^7 + (−1/2) + 1) = 1 + 2^6 − 2^7 = −63 = −3^2 × 7. Indeed, α factors in OK as α = ω2 ξ2 ξ3 γ3,2^2 γ7,4 .
This can be written more explicitly as
2θ + 1 = (−1)(θ^4 + θ^2 )(θ^6 − θ^3 + 1)(θ^6 − θ^5 )^2 (θ^6 − θ^5 + θ^4 ),
that is,
2x + 1 ≡ (−1)(x^4 + x^2 )(x^6 − x^3 + 1)(x^6 − x^5 )^2 (x^6 − x^5 + x^4 ) (mod f (x)).
Since f (H) ≡ 0 (mod n), substituting x by H gives
33 ≡ (−1)(H^4 + H^2 )(H^6 − H^3 + 1)(H^6 − H^5 )^2 (H^6 − H^5 + H^4 ) (mod n).

In fact, putting H = 16 lets the right side evaluate to

−4311876232897885183388771627827200 = −48188969043944708269485363 n + 33.

But 33 = 3 × 11 is smooth (over a factor base containing at least the first five rational primes), that is, a relation is generated. □

The NFSM deploys two sieves, one for filtering the smooth integers aH +b,
and the other for filtering the smooth algebraic integers aθ + b. Both these
candidates are subexponential expressions in log n. For QSM, the smoothness
candidates are exponential in log n. This results in the superior asymptotic
performance of the NFSM over the QSM. In practice, this asymptotic superi-
ority shows up for input integers of size at least several hundred bits.
Exercises
1. Let ω and c be constants with 0 < ω < 1 and c > 0, and
    Ln (ω, c) = exp[(c + o(1)) (ln n)^ω (ln ln n)^{1−ω} ].

For a positive constant k, prove that:


(a) The expression (ln n)^k Ln (ω, c) is again of the form Ln (ω, c).
(b) The expression n^{k+o(1)} Ln (ω, c) is again of the form n^{k+o(1)} .
2. Take n ≈ 2^1024 . Evaluate the expressions n^{1/4} , Ln (1/2, 1) and Ln (1/3, 2)
(ignore the o(1) term in subexponential expressions). What do these values
tell about known integer-factoring algorithms?
3. You are given a black box that, given two positive integers n and k, returns in
one unit of time the decision whether n has a factor d in the range 2 ≤ d ≤ k.
Using this black box, devise an algorithm to factor a positive integer n in
polynomial (in log n) time. Deduce the running time of your algorithm.
4. Explain why the functions f (x) = x^2 (mod n) and f (x) = x^2 − 2 (mod n) are
not chosen in Pollard’s rho method for factoring n.
5. Write a pseudocode implementing Floyd’s variant of Pollard’s rho method
with block gcd calculations.
6. In Floyd’s variant of Pollard’s rho method for factoring the integer n, we
compute the values of xk and x2k and then gcd(xk − x2k , n) for k = 1, 2, 3, . . . .
Suppose that we instead choose some r, s ∈ N, and compute xrk+1 and xsk
and subsequently gcd(xrk+1 − xsk , n) for k = 1, 2, 3, . . . .
(a) Deduce a condition relating r, s and the period τ ′ of the cycle such that
this method is guaranteed to detect a cycle of period τ ′ .
(b) Characterize all the pairs (r, s) such that this method is guaranteed to
detect cycles of any period.
7. In this exercise, we describe a variant of the second stage for Pollard’s p − 1
method. Suppose that p − 1 = uv with a B-power-smooth u and with a large
prime v (that is, a prime satisfying B < v < B ′ ). In the first stage, we have
already computed b ≡ a^E (mod n).
(a) If b ≢ 1 (mod n), show that ordp (b) = v.
(b) Set b0 = b, and for i = 1, 2, . . . , t, compute bi = (bi−1)^{li} (mod n), where li is a random integer between 2 and B − 1. Argue that for t = O(√B′ ), we expect to have distinct i, j with bi ≡ bj (mod p).
(c) Describe how such a pair (i, j) is expected to reveal the factor p of n.
(d) Compare this variant of the second stage (in terms of running time and
space requirement) with the variant presented in Section 6.3.1.
8. [Euler’s factorization method ] We say that a positive integer n can be written
as the sum of two squares if n = a^2 + b^2 for some positive integers a, b.
(a) Show that if two odd integers m, n can be written as sums of two squares,
then their product mn can also be so written.
(b) Prove that no n ≡ 3 (mod 4) can be written as a sum of two squares.
(c) Let a square-free composite integer n be a product of (distinct) primes
each congruent to 1 modulo 4. Show that n can be written as a sum of two
squares in (at least) two different ways.
(d) Let n be as in Part (c). Suppose that we know two ways of expressing n
as a sum of two squares. Describe how n can be factored easily.
9. Prove that an integer of the form 4^e (8k + 7) (with e, k ≥ 0) cannot be written as a sum of three squares. (Remark: The converse of this statement is also
true, but is somewhat difficult to prove.)
10. Prove that Carmichael numbers are (probabilistically) easy to factor.
11. (a) Suppose that in Dixon’s method for factoring n, we first choose a non-zero
z ∈ Zn , and generate relations of the form
    xi^2 z ≡ p1^{αi1} p2^{αi2} · · · pt^{αit} (mod n).
Describe how these relations can be combined to arrive at a congruence of the
form x^2 ≡ y^2 (mod n).
(b) Now, choose several small values of z (like 1 ≤ z ≤ M for a small bound M ). Describe how you can still generate a congruence x^2 ≡ y^2 (mod n). What,
if anything, do you gain by using this strategy (over Dixon’s original method)?
12. Dixon’s method for factoring an integer n can be combined with a sieve in
order to reduce its running time to L[3/2]. Instead of choosing random values
of x1 , x2 , . . . , xs in the relations, we first choose a random value of x, and for
−M ≤ c ≤ M , we check the smoothness of the integers (x + c)^2 (mod n) over t
small primes p1 , p2 , . . . , pt . As in Dixon’s original method, we take t = L[1/2].
(a) Determine M for which one expects to get a system of the desired size.
(b) Describe a sieve over the interval [−M, M ] for detecting the smooth values
of (x + c)^2 (mod n).
(c) Deduce how you achieve a running time of L[3/2] using this sieve.
13. Dixon’s method for factoring an integer n can be combined with another
sieving idea. We predetermine a factor base B = {p1 , p2 , . . . , pt }, and a sieving
interval [−M, M ]. Suppose that we compute u ≡ z^2 (mod n) for a randomly
chosen non-zero z ∈ Zn .
(a) Describe a sieve to identify all B-smooth values of u + cn for −M ≤ c ≤ M .
Each such B-smooth value yields a relation of the form
    z^2 ≡ p1^{αi1} p2^{αi2} · · · pt^{αit} (mod n).
(b) How can these relations be combined to get a congruence x^2 ≡ y^2 (mod n)?
(c) Supply optimal choices for t and M . What is the running time of this
variant of Dixon’s method for these choices of t and M ?
(d) Compare the sieve of this exercise with that of Exercise 6.12.
14. Let αij be the exponent of the j-th small prime pj in the i-th relation collected
in Dixon’s method. We find vectors in the null space of the t × s matrix A =
(αji ). In the linear-algebra phase, it suffices to know the value of αij modulo
2. Assume that for a small prime p and a small exponent h, a random square x^2 (mod n) is divisible by p^h with probability 1/p^h (irrespective of whether x^2 (mod n) is B-smooth or not). Calculate the
probability that αij ≡ 1 (mod 2). (This probability would be a function of
the prime pj . This probability calculation applies to other subexponential
factoring methods like CFRAC, QSM and CSM.)
15. [Fermat's factorization method ] Let n be an odd positive composite integer which is not a perfect square, and H = ⌈√n ⌉.
(a) Prove that there exists c ≥ 0 such that (H + c)^2 − n is a perfect square b^2 with H + c ≢ ±b (mod n).
(b) If we keep on trying c = 0, 1, 2, . . . until (H + c)^2 − n is a perfect square, we obtain an algorithm to factor n. What is its worst-case complexity?
(c) Prove that if n has a factor u satisfying √n − u < n^{1/4} , then H^2 − n is itself a perfect square.
16. Suppose that we want to factor n = 3337 using the quadratic sieve method.
(a) Determine H and J, and write the expression for T (c).
(b) Let the factor base B be a suitable subset of {−1, 2, 3, 5, 7, 11}. Find all
B-smooth values of T (c) for −5 ≤ c ≤ 5. You do not have to use a sieve. Find
the smooth values by trial division only.
17. (a) Explain how sieving is carried out in the multiple-polynomial quadratic
sieve method, that is, for T (c) = U + 2V c + W c^2 with V^2 − U W = n.
(b) If the factor base consists of L[1/2] primes and the sieving interval is of
size L[1], deduce that the sieving process can be completed in L[1] time.
18. In the original QSM, we sieve around √n . Let us instead take H = ⌈√(2n) ⌉, and J = H^2 − 2n.
(a) Describe how we can modify the original QSM to work for these values of
H and J. (It suffices to describe how we get a relation in the modified QSM.)
(b) Explain why the modified QSM is poorer than the original QSM. (Hint:
Look at the approximate average value of |T (c)|.)
(c) Despite the objection in Part (b) about the modified QSM, we can exploit it to our advantage. Suppose that we run two sieves: one around √n (the original QSM), and the other around √(2n) (the modified QSM), each on a
sieving interval of length half of that for the original QSM. Justify why this
reduction in the length of the sieving interval is acceptable. Discuss what we
gain by using the dual sieve.
19. In the original QSM, we took T (c) = (H + c)^2 − n = J + 2cH + c^2 . Instead, one may choose c1 , c2 satisfying −M ≤ c1 ≤ c2 ≤ M , and consider T (c1 , c2 ) = (H + c1 )(H + c2 ) − n = J + (c1 + c2 )H + c1 c2 .
(a) Describe how we get a relation in this variant of the QSM.
(b) Prove that if we choose t = L[1/2] primes in the factor base and M =
L[1/2], we expect to obtain the required number of relations.
(c) Describe a sieving procedure for this variant of the QSM.
(d) Argue that this variant can be implemented to run in L[1] time.
(e) What are the advantages and disadvantages of this variant of the QSM
over the original QSM?
20. [Special-q variant of QSM ] In the original QSM, we sieve the quantities
T (c) = (H + c)^2 − n for −M ≤ c ≤ M . For small values of |c|, the values
|T (c)| are small and are likely to be smooth. On the contrary, larger values of
c in the sieving interval yield larger values of |T (c)| resulting in poorer yields
of smooth candidates. In Exercise 6.18, this problem is tackled by using a dual
sieve. The MPQSM is another solution. We now study yet another variant.15
In this exercise, we study this variant for large primes only. See Exercise 6.29
for a potential speedup.
(a) Let q be a large prime (B < q < B^2 ) and c0 a small integer such that
q|T (c0 ). Describe how we can locate such q and c0 relatively easily.
(b) Let Tq (c) = T (c0 + cq)/q. How can you sieve Tq (c) for −M ≤ c ≤ M ?
(c) What do you gain by using this special-q variant of the QSM?
21. Describe a sieve for locating all the smooth values of T (a, b, c) in the CSM.
22. Show that the total number of solutions of the congruence x^3 ≡ y^2 z (mod n) with x^3 ≠ y^2 z is Θ(n^2 ). You may use the formula that ∑_{1≤m≤n} d(m) = Θ(n ln n), where d(m) denotes the number of positive integral divisors of m.
23. Describe a special-q method for the CSM. What do you gain, if anything, by
using this special-q variant of the CSM?
24. Show that in the ECM, we can maintain the multiples of P as pairs of ratio-
nal numbers. Describe what modifications are necessary in the ECM for this
representation. What do you gain from this?
25. [Montgomery ladder ]16 You want to compute nP for a point P on the curve Y^2 = X^3 + aX + b. Let n = (ns−1 ns−2 . . . n1 n0 )2 and Ni = (ns−1 ns−2 . . . ni )2 .
(a) Rewrite the left-to-right double-and-add algorithm so that both Ni P and
(Ni + 1)P are computed in the loop.
(b) Prove that given only the X-coordinates of P1 , P2 and P1 − P2 , we can
compute the X-coordinate of P1 + P2 . Handle the case P1 = P2 too.
(c) What implication does this have in the ECM?
26. How can a second stage (as in Pollard’s p − 1 method) be added to the ECM?
27. Investigate how the integer primes 13, 17, 19, 23 behave in the number ring
OK of Example 6.23.
28. [Lattice sieve] Pollard17 introduces the concept of lattice sieves in connection
with the NFSM. Let B be a bound of small primes in the factor base. One
finds out small coprime pairs a, b such that both a + bm (a rational integer)
and a + bθ (an algebraic integer) are B-smooth. The usual way of sieving fixes
15 James A. Davis and Diane B. Holdridge, Factorization using the quadratic sieve algorithm, Report SAND 83–1346, Sandia National Laboratories, Albuquerque, 1983.
16 Peter L. Montgomery, Speeding the Pollard and elliptic curve methods of factorization, Mathematics of Computation, 48(177), 243–264, 1987.
17 John M. Pollard, The lattice sieve, Lecture Notes in Mathematics, 1554, Springer, 43–49, 1993.
a, and lets b vary over an interval. This is called line sieving. In the rest
of this exercise, we restrict our attention to the rational sieve only.
We use a bound B′ < B. The value k = B′/B lies in the range [0.1, 0.5]. All primes ≤ B′ are called small primes. All primes p in the range B′ < p ≤ B are
called medium primes. Assume that no medium prime divides m. First, fix a
medium prime q, and consider only those pairs (a, b) with a+bm ≡ 0 (mod q).
Sieve using all primes p < q. This sieve is repeated for all medium primes q.
Let us see the effects of this sieving technique.
(a) Let N be the number of (a, b) pairs for which a + bm is checked for
smoothness in the line sieve, and N ′ the same number for the lattice sieve.
Show that N ′/N ≈ log(1/k)/ log B. What is N ′/N for k = 0.25 and B = 10^6 ?
(b) What smooth candidates are missed in the lattice sieve? Find their relative
percentage in the set of smooth integers located in the line sieve, for the values
k = 0.25, B = 10^6 , m = 10^30 , and for b varying in the range 0 ≤ b ≤ 10^6 .
These real-life figures demonstrate that with significantly reduced efforts, one
can obtain most of the relations.
(c) Show that all integer solutions (a, b) of a + bm ≡ 0 (mod q) form a two-
dimensional lattice. Let V1 = (a1 , b1 ) and V2 = (a2 , b2 ) constitute a reduced
basis of this lattice.
(d) A solution (a, b) of a + bm ≡ 0 (mod q) can be written as (a, b) = cV1 +
dV2 = (ca1 + da2 , cb1 + db2 ). Instead of letting a vary from −M to M and b
from 1 to M , Pollard suggests letting c vary from −C to C and d from 1 to D.
This is somewhat ad hoc, since rectangular regions in the (a, b) plane do not,
in general, correspond to rectangular regions in the (c, d) plane. Nonetheless,
this is not a practically bad idea. Describe how sieving can be done in the
chosen rectangle for (c, d).
29. Describe how the idea of using small and medium primes, introduced in Exer-
cise 6.28, can be adapted to the case of the QSM. Also highlight the expected
benefits. Note that this is the special-q variant of the QSM with medium
special primes q instead of large special primes as discussed in Exercise 6.20.

Programming Exercises
Implement the following in GP/PARI.
30. Floyd’s variant of Pollard’s rho method.
31. Brent’s variant of Pollard’s rho method.
32. Pollard’s p − 1 method.
33. The second stage of Pollard’s p − 1 method.
34. Fermat’s factorization method (Exercise 6.15).
35. Dixon’s method.
36. The relation-collection stage of the QSM (use trial division instead of a sieve).
37. The sieve of the QSM.
38. Collecting relations involving large primes from the sieve of Exercise 6.37.
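The exercises above ask for GP/PARI implementations. As a language-neutral illustration, here is a hedged Python sketch of Floyd's variant of Pollard's rho method with block gcd accumulation (cf. Exercises 5 and 30); the iteration function x^2 + c, the block size, and the retry loop over c are illustrative choices, not prescribed by the text:

```python
from math import gcd

def rho_floyd(n, c=1, block=20, max_iters=10**6):
    # Floyd cycle detection: x runs at single speed, y at double speed.
    # Differences x - y are accumulated into a product modulo n, so that
    # only one gcd is computed per block of iterations.
    f = lambda x: (x * x + c) % n
    x = y = 2
    for _ in range(max_iters // block):
        prod, xs, ys = 1, x, y
        for _ in range(block):
            x = f(x)
            y = f(f(y))
            prod = prod * (x - y) % n
        d = gcd(prod, n)
        if d == n:
            # Overshoot: several factors collapsed in one block.
            # Redo the block with one gcd per step.
            x, y = xs, ys
            for _ in range(block):
                x = f(x)
                y = f(f(y))
                d = gcd(x - y, n)
                if d > 1:
                    return d if d < n else None
        elif d > 1:
            return d
    return None

def factor_with_retries(n):
    # If one choice of c fails (the walk collapses modulo n), try another.
    for c in range(1, 20):
        d = rho_floyd(n, c)
        if d:
            return d
    return None

print(factor_with_retries(10403))    # 10403 = 101 * 103
```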
Chapter 7
Discrete Logarithms

7.1 Square-Root Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 347


7.1.1 Shanks’ Baby-Step-Giant-Step (BSGS) Method . . . . . . . . . . . . . . . . . . 348
7.1.2 Pollard’s Rho Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 349
7.1.3 Pollard’s Lambda Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
7.1.4 Pohlig–Hellman Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 351
7.2 Algorithms for Prime Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 352
7.2.1 Basic Index Calculus Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
7.2.2 Linear Sieve Method (LSM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
7.2.2.1 First Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
7.2.2.2 Sieving . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
7.2.2.3 Running Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
7.2.2.4 Second Stage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 359
7.2.3 Residue-List Sieve Method (RLSM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360
7.2.4 Gaussian Integer Method (GIM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363
7.2.5 Cubic Sieve Method (CSM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
7.2.6 Number-Field Sieve Method (NFSM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 369
7.3 Algorithms for Fields of Characteristic Two . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 370
7.3.1 Basic Index Calculus Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
7.3.1.1 A Faster Relation-Collection Strategy . . . . . . . . . . . . . . . . . . . 373
7.3.2 Linear Sieve Method (LSM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
7.3.3 Cubic Sieve Method (CSM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 378
7.3.4 Coppersmith’s Method (CM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 381
7.4 Algorithms for General Extension Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
7.4.1 A Basic Index Calculus Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384
7.4.2 Function-Field Sieve Method (FFSM) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 385
7.5 Algorithms for Elliptic Curves (ECDLP) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
7.5.1 MOV/Frey–Rück Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389

Let G be a finite group1 of size n. To start with, assume that G is cyclic, and
g is a generator of G. Any element a ∈ G can be uniquely expressed as a = g^x for some integer x in the range 0 ≤ x ≤ n − 1. The integer x is called the discrete logarithm or index of a with respect to g, and is denoted by indg a.
Computing x from G, g and a is called the discrete logarithm problem (DLP).
We now remove the assumption that G is cyclic. Let g ∈ G have ord(g) = m, and let H be the subgroup of G generated by g. H is cyclic of order m. We are given an element a ∈ G. If a ∈ H, then a = g^x for some unique integer x in the range 0 ≤ x ≤ m − 1. On the other hand, if a ∉ H, then a cannot be

1 Unless otherwise stated, a group is a commutative (Abelian) group under multiplication.

expressed as a = g^x . The (generalized) discrete logarithm problem (GDLP) refers to the determination of whether a can be expressed as g^x and, if so,
the computation of x. In general, DLP means either the special DLP or the
GDLP. In this book, we mostly study the special variant of the DLP.
Depending upon the group G, the computational complexity of the DLP
in G varies from easy (polynomial-time) to very difficult (exponential-time).
First, consider G to be the additive group Zn , and let g be a generator of Zn . It
is evident that g generates Zn if and only if g ∈ Z∗n . Given a ∈ Zn , we need to
find the unique integer x ∈ {0, 1, . . . , n − 1} such that a ≡ xg (mod n) (this is
the DLP in an additive setting). Since g ∈ Z∗n , we have gcd(g, n) = 1, that is,
ug + vn = 1 for some integers u, v. But then, ug ≡ 1 (mod n), that is, (ua)g ≡
a (mod n), that is, the discrete logarithm of a is x ≡ ua ≡ g^{−1} a (mod n). To
sum up, the discrete logarithm problem is easy in the additive group Zn .
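In Python, the whole computation above is a single modular inversion; the values n = 100, g = 7, a = 23 below are arbitrary illustrative choices (three-argument pow with exponent −1 needs Python 3.8+):

```python
n, g = 100, 7        # g generates (Z_100, +) since gcd(7, 100) = 1
a = 23
# Discrete log in the additive group: solve x*g ≡ a (mod n).
x = (pow(g, -1, n) * a) % n
assert (x * g) % n == a
```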
Next, consider the multiplicative group Z∗n . For simplicity, assume that Z∗n
is cyclic, and let g ∈ Z∗n be a primitive root of n. Any a ∈ Z∗n can be written uniquely as a ≡ g^x (mod n) for an integer x in the range 0 ≤ x ≤ φ(n) − 1. In particular, if n = p is a prime, then 0 ≤ x ≤ p − 2. The determination of x in
this case is apparently not an easy computational problem.
If we generalize the case of Zp to a finite field Fq and take G = F∗q , then we
talk about the finite-field discrete-logarithm problem. For a primitive element g
of F∗q and for a ∈ F∗q , there exists a unique integer x in the range 0 6 x 6 q − 2
such that a = g x . Like the case of Z∗p , the computation of x in this case again
appears to be a difficult computational problem.
Finally, the DLP in the group of rational points on an elliptic curve over
a finite field is called the elliptic-curve discrete-logarithm problem (ECDLP).
In a general group of size n, the discrete logarithm problem can be solved in O˜(√n) time by algorithms referred to as square-root methods. The arithmetic
in the group G may allow us to arrive at faster algorithms. For example, the
availability of the extended gcd algorithm lets us solve the DLP in (Zn , +) in
polynomial time. For Z∗n , or, more generally, for F∗q , we know several subexpo-
nential algorithms. It is a popular belief that the discrete logarithm problem
in the multiplicative group of finite fields is computationally as difficult as the
integer-factoring problem. Some partial results are known to corroborate this
suspected computational equivalence. In practice, many algorithms that we
use to solve the finite-field discrete-logarithm problem are adaptations of the
subexponential algorithms for factoring integers. For elliptic curves, on the
other hand, no algorithms better than the square-root methods are known.
Only when the curve is of some special forms, some better (subexponential
and even polynomial-time) algorithms are known.
Like the integer-factoring problem, the computational difficulty of the DLP
is only apparent. No significant results are known to corroborate the fact that
the DLP cannot be solved faster than that achievable by the known algorithms.
The computational complexity of the DLP in a group G may depend upon
the representation of the elements of G. For example, if G = F∗q , and elements
of G are already represented by their indices with respect to a primitive element, then computing indices with respect to that or any other primitive
element is rather trivial. However, we have argued in Section 2.5 that this
representation is not practical except only for small fields.
A related computational problem is called the Diffie–Hellman problem
(DHP) that came to light after the seminal discovery of public-key cryptog-
raphy by Diffie and Hellman in 1976. Consider a multiplicative group G with
g ∈ G. Suppose that the group elements g^x and g^y are given to us for some unknown indices x and y. Computation of g^{xy} from the knowledge of G, g, g^x and g^y is called the Diffie–Hellman problem in G. Evidently, if the DLP in G is easy to solve, the DHP in G is easy too (g^{xy} = (g^x )^y with y = indg (g^y )).
The converse implication is not clear. It is again only a popular belief that
solving the DHP in G is computationally as difficult as solving the DLP in G.
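The reduction g^{xy} = (g^x)^y noted above is easy to see numerically; the prime, generator, and indices in this snippet are arbitrary illustrative choices:

```python
p, g = 101, 2        # 2 is a primitive root modulo 101
x, y = 37, 53        # secret indices
gx, gy = pow(g, x, p), pow(g, y, p)
# Anyone who can compute the index x from gx (the DLP) recovers the
# Diffie-Hellman value g^(xy) from the public elements alone.
assert pow(gx, y, p) == pow(gy, x, p) == pow(g, x * y, p)
```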
In most of this chapter, I concentrate on algorithms for solving the discrete-
logarithm problem in finite fields. I start with some square-root methods that
are applicable to any group (including elliptic-curve groups). Later, I focus
on two practically important cases: the prime fields Fp , and the binary fields
F2n . Subexponential algorithms, collectively called index calculus methods, are
discussed for these two types of fields. The DLP in extension fields Fpn of odd
characteristics p is less studied, and not many significant results are known for
these fields, particularly when both p and n are allowed to grow indefinitely.
At the end, the elliptic-curve discrete-logarithm problem is briefly addressed.
GP/PARI supports computation of discrete logarithms in prime fields. One
should call znlog(a,g), where g is a primitive element of Z∗p for some prime p.

gp > p = nextprime(10000)
%1 = 10007
gp > g = Mod(5,p)
%2 = Mod(5, 10007)
gp > znorder(g)
%3 = 10006
gp > a = Mod(5678,p)
%4 = Mod(5678, 10007)
gp > znlog(a,g)
%5 = 8620
gp > g^8620
%6 = Mod(5678, 10007)
gp > znlog(Mod(0,p),g)
*** impossible inverse modulo: Mod(0, 10007).

7.1 Square-Root Methods


The square-root methods are DLP algorithms of the dark age, but they
apply to all groups. Unlike factoring integers, we do not have modern algorithms for computing discrete logarithms in all groups. For example, the
fastest known algorithms for solving the ECDLP for general elliptic curves
are the square-root methods. It is, therefore, quite important to understand
the tales from the dark age. In this section, we assume that G is a finite cyclic
multiplicative group of size n, and g ∈ G is a generator of G. We are interested
in computing indg a for some a ∈ G.

7.1.1 Shanks’ Baby-Step-Giant-Step (BSGS) Method


The baby-step-giant-step method refers to a class of algorithms, proposed
originally by the American mathematician Shanks (1917–1996).2 Let m = ⌈√n ⌉. We compute and store g^i for i = 0, 1, 2, . . . , m − 1 as ordered pairs (i, g^i ) sorted with respect to the second element. For j = 0, 1, 2, . . . , we check whether ag^{−jm} occurs as a second element in the table of the pairs (i, g^i ). If, for some particular j, we obtain such an i, then ag^{−jm} = g^i , that is, a = g^{jm+i} , that is, indg a ≡ jm + i (mod n). The determination of j is called the giant step, whereas the determination of i is called the baby step. Since indg a ∈ {0, 1, 2, . . . , n − 1} and n ≤ m^2 , we can always express indg a as jm + i for some i, j ∈ {0, 1, . . . , m − 1}.

Example 7.1 Let me illustrate the BSGS method for the group G = F∗97 with generator g = 23. Since n = |G| = 96, we have m = ⌈√n ⌉ = 10. The table of baby steps contains (i, g^i ) for i = 0, 1, 2, . . . , 9, and is given below. The table is kept sorted with respect to the second element (g^i ).

    i     0   5   8   6   1   7   3   2   9   4
    g^i   1   5  16  18  23  26  42  44  77  93

Let us determine the index of 11 with respect to 23. We first compute g^{−m} ≡ 66 (mod 97). For j = 0, ag^{−jm} ≡ a ≡ 11 (mod 97) is not present in the above table. For j = 1, we have ag^{−jm} ≡ 47 (mod 97), again not present in the above table. Likewise, ag^{−2m} ≡ 95 (mod 97) and ag^{−3m} ≡ 62 (mod 97) are not present in the table. However, ag^{−4m} ≡ 18 (mod 97) exists in the table against i = 6, and we get ind23 (11) ≡ 4 × 10 + 6 ≡ 46 (mod 96). □
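The method (and Example 7.1) can be reproduced with a short script; here a hash table replaces the sorted table and binary search, which does not change the asymptotics. This is an illustrative sketch, not code from the text (pow(g, -m, p) needs Python 3.8+):

```python
from math import isqrt

def bsgs(g, a, p, n):
    # Solve g^x = a in the subgroup of Z_p^* generated by g, where ord(g) = n.
    m = isqrt(n - 1) + 1                         # m = ceil(sqrt(n))
    baby = {pow(g, i, p): i for i in range(m)}   # baby steps (i, g^i)
    gm = pow(g, -m, p)                           # giant-step multiplier g^(-m)
    gamma = a % p
    for j in range(m):
        if gamma in baby:
            return (j * m + baby[gamma]) % n
        gamma = gamma * gm % p
    return None

print(bsgs(23, 11, 97, 96))  # 46, as in Example 7.1
```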

Computing the powers g^i for i = 0, 1, 2, . . . , m − 1 requires a total of O(m) group operations. Sorting the table can be done using O(m log m) comparisons (and movements) of group elements. The giant steps are taken for j = 0, 1, 2, . . . , m − 1 only. The element g^{−m} = g^{n−m} is precomputed using O(log n) multiplication and square operations in G. For j ≥ 2, we obtain g^{−jm} = g^{−(j−1)m} g^{−m} by a single group operation. Since the table of baby steps is kept sorted with respect to g^i , searching whether ag^{−jm} belongs to the table can be accomplished by the binary search algorithm, demanding O(log m) comparisons of group elements for each j. Finally, for a successful finding of
2 Daniel Shanks, Class number, a theory of factorization and genera, Proceedings of the Symposia in Pure Mathematics, 20, 415–440, 1971.


(i, j), computing jm + i involves integer arithmetic of O(log^2 n) total cost. Thus, the running time of the BSGS method is dominated by O˜(m), that is, O˜(√n) group operations. The space complexity is dominated by the storage for the table of baby steps, and is again O˜(m), that is, O˜(√n).

7.1.2 Pollard’s Rho Method


This method is an adaptation of the rho method for factoring integers
(Section 6.2). We generate a random walk in G. We start at a random element
w0 = g^{s0} a^{t0} of G. Subsequently, for i = 1, 2, 3, . . . , we jump to the element wi = g^{si} a^{ti} . The sequence w0 , w1 , w2 , . . . behaves like a random sequence of elements of G. By the birthday paradox, we expect to arrive at a collision wi = wj after O(√n) iterations. In that case, we have g^{si} a^{ti} = g^{sj} a^{tj} , that is, a^{ti −tj} = g^{sj −si} . Since ord(g) = n, we have (ti − tj ) indg a ≡ sj − si (mod n). If gcd(ti − tj , n) = 1, we obtain indg a ≡ (ti − tj )^{−1} (sj − si ) (mod n).
We need to realize a suitable function f : G → G to map wi−1 to wi for
all i ≥ 1. We predetermine a small positive integer r, and map w ∈ G to
an element u ∈ {0, 1, 2, . . . , r − 1}. We also predetermine a set of multipliers
Mj = g^{σj} a^{τj} for j = 0, 1, 2, . . . , r − 1. We set f (w) = w × Mu = w × g^{σu} a^{τu} .
In practice, values of r ≈ 20 work well.
Example 7.2 Let us take G = F∗197 . The element g = 123 ∈ G is a primitive
element. We want to compute the discrete logarithm of a = 111 with respect
to g. Let us agree to take r = 5 and the multipliers
M0 = g^1 a^4 = 71, M1 = g^7 a^5 = 25, M2 = g^4 a^2 = 7,
M3 = g^8 a^5 = 120, M4 = g^8 a^2 = 168.
Given w ∈ G, we take u = w rem r and use the multiplier Mu . We start the
random walk with s0 = 43 and t0 = 24, so that w0 ≡ g^{s0} a^{t0} ≡ 189 (mod 197).
Some subsequent steps of the random walk are shown below.
i wi−1 u Mu si ti wi i wi−1 u Mu si ti wi
1 189 4 168 51 26 35 11 48 3 120 109 67 47
2 35 0 71 52 30 121 12 47 2 7 113 69 132
3 121 1 25 59 35 70 13 132 2 7 117 71 136
4 70 0 71 60 39 45 14 136 1 25 124 76 51
5 45 0 71 61 43 43 15 51 1 25 131 81 93
6 43 3 120 69 48 38 16 93 3 120 139 86 128
7 38 3 120 77 53 29 17 128 3 120 147 91 191
8 29 4 168 85 55 144 18 191 1 25 154 96 47
9 144 4 168 93 57 158 19 47 2 7 158 98 132
10 158 3 120 101 62 48 20 132 2 7 162 100 136
The random walk becomes periodic from i = 11 (with a shortest period of 7).
We have g^s11 a^t11 ≡ g^s18 a^t18 (mod 197), that is, a^(t11−t18) ≡ g^(s18−s11) (mod 197),
that is, a^(−29) ≡ g^45 (mod 197), that is, −29 indg a ≡ 45 (mod 196), that is,
indg a ≡ (−29)^(−1) × 45 ≡ 39 (mod 196). ¤
350 Computational Number Theory

Space-saving variants (Floyd's variant and Brent's variant) of the Pollard
rho method work in the same manner as described in connection with factoring
integers (Section 6.2). However, the concept of block gcd calculation does not
seem to adapt directly to the case of the DLP.

7.1.3 Pollard’s Lambda Method


Pollard’s lambda method is a minor variant of Pollard’s rho method.
The only difference is that now we use two random walks w0, w1, w2, . . . and
w0′, w1′, w2′, . . . in the group G. We use the same update function f to obtain
wi = f(wi−1) and wi′ = f(wi−1′) for i ≥ 1. Let wi = g^si a^ti and wi′ = g^si′ a^ti′.
After the two random walks intersect, they proceed identically, that is, the
two walks together look like the Greek letter λ. This method is also called the
method of wild and tame kangaroos. Two kangaroos make the two random
walks. The tame kangaroo digs a hole at every point it visits, and when the
wild kangaroo reaches the same point (later), it gets trapped in the hole.
If wi = wj′, then g^si a^ti = g^sj′ a^tj′, that is, (ti − tj′) indg a ≡ sj′ − si (mod n).
If ti − tj′ is invertible modulo n, we obtain indg a ≡ (ti − tj′)^(−1) (sj′ − si) (mod n).
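The final arithmetic step is the same for the rho and lambda methods; as a small sketch (hypothetical helper name, not from the text):

```python
def index_from_collision(si, ti, sj, tj, n):
    """Recover ind_g(a) from a collision g^si a^ti = g^sj a^tj in a group
    of order n; requires ti - tj to be invertible modulo n."""
    d = (ti - tj) % n
    # (ti - tj) x = sj - si (mod n)  =>  x = (ti - tj)^(-1) (sj - si) mod n
    return (pow(d, -1, n) * (sj - si)) % n
```

For the intersection w2 = w25′ found in Example 7.3 below, index_from_collision(49, 33, 150, 146, 196) gives 39.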

Example 7.3 As in Example 7.2, we take G = F*_197, g = 123, and a = 111.
We predetermine r = 5, and decide the multipliers

    M0 = g^3 a^3 = 88,   M1 = g^4 a^2 = 7,   M2 = g^8 a^8 = 142,
    M3 = g^4 a^9 = 115,  M4 = g^2 a^0 = 157.
The two random walks are shown in the following table.
i si ti wi s′i t′i wi′ i si ti wi s′i t′i wi′
0 43 24 189 34 42 172 15 102 91 46 108 114 121
1 45 24 123 42 50 193 16 106 93 125 112 116 59
2 49 33 158 46 59 131 17 109 96 165 114 116 4
3 53 42 46 50 61 129 18 112 99 139 116 116 37
4 57 44 125 52 61 159 19 114 99 153 124 124 132
5 60 47 165 54 61 141 20 118 108 62 132 132 29
6 63 50 139 58 63 2 21 126 116 136 134 132 22
7 65 50 153 66 71 87 22 130 118 164 142 140 169
8 69 59 62 74 79 140 23 132 118 138 144 140 135
9 77 67 136 77 82 106 24 136 127 110 147 143 60
10 81 69 164 81 84 151 25 139 130 27 150 146 158
11 83 69 138 85 86 72 26 147 138 91 154 155 46
12 87 78 110 93 94 177 27 151 140 46 158 157 125
13 90 81 27 101 102 115 28 155 142 125 161 160 165
14 98 89 91 104 105 73 29 158 145 165 164 163 139

We get w2 = w25′, so g^49 a^33 ≡ g^150 a^146 (mod 197), that is, (33 − 146) indg a ≡
150 − 49 (mod 196), that is, indg a ≡ (−113)^(−1) × 101 ≡ 39 (mod 196). ¤

7.1.4 Pohlig–Hellman Method


Suppose that the complete prime factorization n = p1^e1 p2^e2 · · · pr^er of n = |G|
is known. The Pohlig–Hellman method3 can compute discrete logarithms in
G in O˜(√max(p1, p2, . . . , pr)) time. If the largest prime divisor of n is small,
then this method turns out to be quite efficient. However, in the worst case,
this maximum prime divisor is n, and the Pohlig–Hellman method takes an
exponential running time O˜(√n).
If x = indg a is known modulo pi^ei for all i = 1, 2, . . . , r, then the CRT
gives the desired value of x modulo n. In view of this, we take p^e to be a
divisor of n with p ∈ P and e ∈ N. We first obtain x (mod p) by solving a DLP
in the subgroup of G of size p. Subsequently, we lift this value to x (mod p^2),
x (mod p^3), and so on. Each lifting involves computing a discrete logarithm in
the subgroup of size p. One uses the BSGS method or Pollard's rho or lambda
method for computing discrete logarithms in the subgroup of size p.
Let γ = g^(n/p^e), and α = a^(n/p^e). Since g is a generator of G, we have
ord γ = p^e. Moreover, a = g^x implies that α = a^(n/p^e) = (g^(n/p^e))^x = γ^x =
γ^(x rem p^e). We plan to compute ξ = x rem p^e. Let us write

    ξ = x0 + x1 p + x2 p^2 + · · · + x(e−1) p^(e−1).

We keep on computing the p-ary digits x0, x1, x2, . . . of ξ one by one. Assume
that x0, x1, . . . , x(i−1) are already computed, and we want to compute xi. Initially,
no xi values are computed, and we plan to compute x0. We treat both
the cases i = 0 and i > 0 identically. We already know

    λ = x0 + x1 p + · · · + x(i−1) p^(i−1)

(λ = 0 if i = 0). Now, α = γ^ξ gives

    α γ^(−λ) = γ^(ξ−λ) = γ^(xi p^i + x(i+1) p^(i+1) + · · · + x(e−1) p^(e−1)).

Exponentiation to the power p^(e−i−1) gives

    (α γ^(−λ))^(p^(e−i−1)) = γ^(xi p^(e−1) + x(i+1) p^e + · · · + x(e−1) p^(2e−i−2)) = γ^(xi p^(e−1)) = (γ^(p^(e−1)))^xi,

since ord γ = p^e kills every term whose exponent contains p^e or higher.
The order of γ^(p^(e−1)) is p, and so xi is the discrete logarithm of (α γ^(−λ))^(p^(e−i−1))
to the base γ^(p^(e−1)). If we rephrase this in terms of g and a, we see that xi is the
discrete logarithm of (a g^(−λ))^(n/p^(i+1)) with respect to the base g^(n/p), that is,

    xi = ind_(g^(n/p)) [ (a g^(−(x0 + x1 p + · · · + x(i−1) p^(i−1))))^(n/p^(i+1)) ]   for i = 0, 1, 2, . . . , e − 1.

These index calculations are done in the subgroup of size p, generated by g^(n/p).
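The digit-by-digit lifting and the CRT combination can be sketched as follows (an illustrative Python sketch, not the book's code: the order-q logarithms are found here by exhaustive search rather than BSGS or rho, and pow(x, -1, m) needs Python 3.8 or later):

```python
def dlog_order_q(base, h, q, p):
    # Discrete log in the order-q subgroup of F_p^* by exhaustive search
    # (replace by BSGS or Pollard rho for a large q).
    for x in range(q):
        if pow(base, x, p) == h:
            return x
    raise ValueError("no logarithm found")

def pohlig_hellman(g, a, p, factors):
    """Compute ind_g(a) in F_p^*, where factors = {q: e} gives the complete
    prime factorization of n = ord(g)."""
    n = 1
    for q, e in factors.items():
        n *= q ** e
    x, m = 0, 1                            # CRT accumulator: answer mod m
    for q, e in factors.items():
        qe = q ** e
        gamma = pow(g, n // qe, p)         # ord(gamma) = q^e
        alpha = pow(a, n // qe, p)         # alpha = gamma^(x rem q^e)
        base = pow(gamma, qe // q, p)      # order-q generator gamma^(q^(e-1))
        xi = 0                             # xi = x rem q^e, digit by digit
        for i in range(e):
            w = (alpha * pow(gamma, (-xi) % qe, p)) % p
            rhs = pow(w, q ** (e - 1 - i), p)
            xi += dlog_order_q(base, rhs, q, p) * q ** i
        t = ((xi - x) * pow(m, -1, qe)) % qe    # fold xi into the CRT answer
        x, m = x + m * t, m * qe
    return x % n
```

pohlig_hellman(123, 111, 197, {2: 2, 7: 2}) reproduces the value 39 obtained in Example 7.4 below.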

Example 7.4 Let us compute x = ind123(111) in G = F*_197 by the Pohlig–
Hellman method. The size of the group F*_197 is n = 196 = 2^2 × 7^2. That is,
3 Stephen Pohlig and Martin Hellman, An improved algorithm for computing logarithms

over GF(p) and its cryptographic significance, IEEE Transactions on Information Theory,
24, 106–110, 1978. This algorithm seems to have been first discovered (but not published)
by Roland Silver, and is often referred to also as the Silver-Pohlig–Hellman method.

we compute x rem 4 = x0 + 2 x1 and x rem 49 = x0′ + 7 x1′. The following table
illustrates these calculations.

                  p = 2                                     p = 7
    i   g^(n/2)   λ   (a g^(−λ))^(n/2^(i+1))   xi     i   g^(n/7)   λ   (a g^(−λ))^(n/7^(i+1))   xi′
    0     196     0            196              1     0     164     0            178              4
    1     196     1            196              1     1     164     4             36              5
These calculations yield x ≡ 1 + 2 × 1 ≡ 3 (mod 4) and x ≡ 4 + 7 × 5 ≡
39 (mod 49). Combining using the CRT gives x ≡ 39 (mod 196). ¤

7.2 Algorithms for Prime Fields


I now present some subexponential algorithms suited specifically to the
group F∗p with p ∈ P. These algorithms are collectively called index calculus
methods (ICM). Known earlier in the works of Kraitchik4 and of Western and
Miller,5 this method was rediscovered after the advent of public-key cryp-
tography. For example, see Adleman’s paper.6 Coppersmith et al.7 present
several improved variants (LSM, GIM, RLSM and CSM). Odlyzko8 surveys
index calculus methods for fields of characteristic two.
The group F*_p is cyclic. We plan to compute the index of a with respect to
a primitive element g of F*_p. We choose a factor base B = {b1, b2, . . . , bt} ⊆ F*_p
such that a reasonable fraction of elements of F*_p can be written as products
of elements of B. We collect relations of the form

    g^α a^β ≡ b1^γ1 b2^γ2 · · · bt^γt (mod p).

This yields

    α + β indg a ≡ γ1 indg(b1) + γ2 indg(b2) + · · · + γt indg(bt) (mod p − 1).

If β is invertible modulo p − 1, this congruence gives us indg a, provided that


the indices indg (bi ) of the elements bi of the factor base B are known to us.
In view of this, an index calculus method proceeds in two stages.
4 M. Kraitchik, Théorie des nombres, Gauthier-Villards, 1922.
5 A. E. Western and J. C. P. Miller, Tables of indices and primitive roots, Cambridge
University Press, xxxvii–xlii, 1968.
6 Leonard Max Adleman, A subexponential algorithm for the discrete logarithm problem
with applications to cryptography, FOCS, 55–60, 1979.
7 Don Coppersmith, Andrew Michael Odlyzko and Richard Schroeppel, Discrete logarithms
in GF(p), Algorithmica, 1(1), 1–15, 1986.
8 Andrew Michael Odlyzko, Discrete logarithms in finite fields and their cryptographic
significance, EuroCrypt 1984, 224–314, 1985.



Stage 1: Determination of indg(bi) for i = 1, 2, . . . , t.

In this stage, we generate relations without involving a (with β = 0), that is,

    γ1 indg(b1) + γ2 indg(b2) + · · · + γt indg(bt) ≡ α (mod p − 1),

a linear congruence modulo p − 1 in the unknown quantities indg(bi). We
collect s such relations for some s > t. We solve the resulting system of linear
congruences, and obtain the desired indices of the elements of the factor base.

Stage 2: Determination of indg a.


In this stage, we generate a single relation with β invertible modulo p − 1.

Different index calculus methods vary in the way the factor base is chosen
and the relations are generated. In the rest of this section, we discuss some
variants. All these variants have running times of the form

    L(p, ω, c) = exp[ (c + o(1)) (ln p)^ω (ln ln p)^(1−ω) ]

for real constant values c and ω with c > 0 and 0 < ω < 1. If ω = 1/2, we
abbreviate L(p, ω, c) as Lp[c], and even as L[c] if p is clear from the context.
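Dropping the o(1) term, the bound can be evaluated numerically to see how it interpolates between polynomial (ω = 0) and fully exponential (ω = 1) growth (illustration only; asymptotically the o(1) term matters):

```python
from math import exp, log

def L(p, w, c):
    """L(p, w, c) with the o(1) term dropped:
    exp(c * (ln p)^w * (ln ln p)^(1-w)).
    w = 0 gives (ln p)^c (polynomial), w = 1 gives p^c (exponential)."""
    return exp(c * log(p) ** w * log(log(p)) ** (1 - w))
```

For p = 10^6, L(p, 1, 1/2) is exactly √p = 1000, while the subexponential value L(p, 1/2, 1) lies strictly between the polynomial and exponential extremes.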

7.2.1 Basic Index Calculus Method


The basic index calculus method can be called an adaptation of Dixon's
method for factoring integers (Section 6.4).9 The factor base consists of t small
primes, that is, B = {p1, p2, . . . , pt}, where pi is the i-th prime. I will later
specify the value of the parameter t so as to optimize the running time of
the algorithm. For random α in the range 1 ≤ α ≤ p − 2, we try to express
g^α (mod p) as a product of the primes in B. If the factorization attempt is
successful, we obtain a relation of the form

    g^α ≡ p1^γ1 p2^γ2 · · · pt^γt (mod p), that is,
    γ1 indg(p1) + γ2 indg(p2) + · · · + γt indg(pt) ≡ α (mod p − 1).

We collect s relations of this form:

    γ11 indg(p1) + γ12 indg(p2) + · · · + γ1t indg(pt) ≡ α1 (mod p − 1),
    γ21 indg(p1) + γ22 indg(p2) + · · · + γ2t indg(pt) ≡ α2 (mod p − 1),
    · · ·
    γs1 indg(p1) + γs2 indg(p2) + · · · + γst indg(pt) ≡ αs (mod p − 1).

9 Historically, the basic index calculus method came earlier than Dixon’s method.

This leads to the linear system of congruences:

    [ γ11  γ12  · · ·  γ1t ] [ indg(p1) ]   [ α1 ]
    [ γ21  γ22  · · ·  γ2t ] [ indg(p2) ]   [ α2 ]
    [  ·    ·   · · ·   ·  ] [    ·     ] ≡ [  ·  ]  (mod p − 1).
    [ γs1  γs2  · · ·  γst ] [ indg(pt) ]   [ αs ]

For s ≫ t (for example, for s > 2t), we expect the s × t coefficient matrix (γij)
to be of full column rank (Exercises 7.10 and 8.12). If so, the indices indg(pi)
are uniquely obtained by solving the system. This completes the first stage.
In the second stage, we choose α ∈ {1, 2, . . . , p − 2} randomly, and try to
express a g^α (mod p) as a product of the primes in the factor base. A successful
factoring attempt gives

    a g^α ≡ p1^γ1 p2^γ2 · · · pt^γt (mod p), that is,
    indg a ≡ −α + γ1 indg(p1) + γ2 indg(p2) + · · · + γt indg(pt) (mod p − 1).

We obtain indg a using the values of indg(pi) computed in the first stage.
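Both stages test B-smoothness by trial division; the relation-collection loop can be sketched as follows (hypothetical helper names, not from the text; the exponents α are random, so the relations found differ from run to run):

```python
import random

def factor_over_base(m, primes):
    """Trial-divide m over the factor base; return the exponent vector,
    or None if m is not smooth over `primes`."""
    exps = []
    for q in primes:
        e = 0
        while m % q == 0:
            m //= q
            e += 1
        exps.append(e)
    return exps if m == 1 else None

def collect_relations(g, p, primes, count):
    """Stage 1 of the basic index calculus method: gather `count` relations
    g^alpha = prod p_i^gamma_i (mod p)."""
    rels = []
    while len(rels) < count:
        alpha = random.randrange(1, p - 1)
        exps = factor_over_base(pow(g, alpha, p), primes)
        if exps is not None:
            rels.append((alpha, exps))
    return rels
```

With p = 821, g = 21 and B = {2, 3, 5, 7, 11} as in Example 7.5 below, every returned pair (α, (γ1, . . . , γ5)) satisfies g^α ≡ 2^γ1 3^γ2 5^γ3 7^γ4 11^γ5 (mod 821).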

Example 7.5 Take p = 821 and g = 21. We intend to compute the discrete
logarithm of a = 237 to the base g by the basic index calculus method. We
take the factor base B = {2, 3, 5, 7, 11} consisting of the first t = 5 primes.
In the first stage, we compute g^j (mod p) for randomly chosen values of j.
After many choices, we come up with the following ten relations.

    g^815 ≡  90 ≡ 2 × 3^2 × 5     (mod 821)
    g^784 ≡ 726 ≡ 2 × 3 × 11^2    (mod 821)
    g^339 ≡ 126 ≡ 2 × 3^2 × 7     (mod 821)
    g^639 ≡ 189 ≡ 3^3 × 7         (mod 821)
    g^280 ≡  88 ≡ 2^3 × 11        (mod 821)
    g^295 ≡ 135 ≡ 3^3 × 5         (mod 821)
    g^793 ≡ 375 ≡ 3 × 5^3         (mod 821)
    g^478 ≡ 315 ≡ 3^2 × 5 × 7     (mod 821)
    g^159 ≡ 105 ≡ 3 × 5 × 7       (mod 821)
    g^635 ≡  75 ≡ 3 × 5^2         (mod 821)
The corresponding system of linear congruences is as follows.

    [ 1 2 1 0 0 ]                  [ 815 ]
    [ 1 1 0 0 2 ]                  [ 784 ]
    [ 1 2 0 1 0 ]  [ indg(2)  ]    [ 339 ]
    [ 0 3 0 1 0 ]  [ indg(3)  ]    [ 639 ]
    [ 3 0 0 0 1 ]  [ indg(5)  ] ≡  [ 280 ]  (mod 820).
    [ 0 3 1 0 0 ]  [ indg(7)  ]    [ 295 ]
    [ 0 1 3 0 0 ]  [ indg(11) ]    [ 793 ]
    [ 0 2 1 1 0 ]                  [ 478 ]
    [ 0 1 1 1 0 ]                  [ 159 ]
    [ 0 1 2 0 0 ]                  [ 635 ]

Solving this system yields the indices of the factor-base elements as


indg (2) ≡ 19 (mod 820), indg (3) ≡ 319 (mod 820),
indg (5) ≡ 158 (mod 820), indg (7) ≡ 502 (mod 820),
indg (11) ≡ 223 (mod 820).
In the second stage, we obtain the relation

    a g^226 ≡ 280 ≡ 2^3 × 5 × 7 (mod 821), that is,
    indg a ≡ −226 + 3 × 19 + 158 + 502 ≡ 491 (mod 820). ¤

To optimize the running time of the basic index calculus method, we resort
to the density estimate of smooth integers, given in Section 6.4. In particular,
we use Corollary 6.14 with n = p.
Let the factor base B consist of all primes ≤ L[η], so t too is of the form
L[η]. For randomly chosen values of α, the elements g^α ∈ F*_p are random
integers between 1 and p − 1, that is, integers of value O(p). The probability
that such a value is smooth with respect to B is L[−1/(2η)]. Therefore, L[1/(2η)]
random choices of α are expected to yield a single relation. We need s > 2t
relations, that is, s is again of the form L[η]. Thus, the total number of random
values of α that need to be tried is L[η + 1/(2η)]. The most significant effort
associated with each choice of α is the attempt to factor g^α. This is carried
out by trial divisions by L[η] primes in the factor base. To sum up, the
relation-collection stage runs in L[2η + 1/(2η)] time. The quantity 2η + 1/(2η)
is minimized for η = 1/2, leading to a running time of L[2] for the
relation-collection stage.
In the linear-algebra stage, a system of L[1/2] linear congruences in L[1/2]
variables is solved. Standard Gaussian elimination requires a time of L[1/2]^3 =
L[3/2] for solving the system. However, since each relation obtained by this
method is necessarily sparse, special sparse system solvers can be employed
to run in only L[1/2]^2 = L[1] time. In any case, the first stage of the basic index
calculus method can be arranged to run in a total time of L[2].
The second stage involves finding a single smooth value of a g^α. We need
to try L[1/(2η)] = L[1] random values of α, with each value requiring L[1/2] trial
divisions. Thus, the total time required for the second stage is L[1 + 1/2] = L[3/2].
The running time of the basic index calculus method is dominated by the
relation-collection phase and is L[2]. The space requirement is L[η], that is,
L[1/2] (assuming that we use a sparse representation of the coefficient matrix).

7.2.2 Linear Sieve Method (LSM)


The basic method generates candidates as large as Θ(p) for smoothness
over a set of small primes. This leads to a running time of L[2], making the
basic method impractical. Schroeppel's linear sieve method is an L[1]-time
algorithm for the computation of discrete logarithms over prime fields. It is an
adaptation of the quadratic sieve method for factoring integers, and generates
smoothness candidates of absolute values O(p^(1/2)). Moreover, we now have the
opportunity to expedite trial division by sieving.

7.2.2.1 First Stage


Let H = ⌈√p⌉, and J = H^2 − p. Both H and J are O(√p). For small
integers c1, c2 (positive or negative), consider the expression (H + c1)(H + c2) ≡
H^2 + (c1 + c2)H + c1c2 ≡ J + (c1 + c2)H + c1c2 (mod p). Let us denote by
T(c1, c2) the integer J + (c1 + c2)H + c1c2. Suppose that for some choice
of c1 and c2, the integer T(c1, c2) factors completely over the first t primes
p1, p2, . . . , pt. This gives us a relation

    (H + c1)(H + c2) ≡ p1^α1 p2^α2 · · · pt^αt (mod p), that is,
    α1 indg(p1) + α2 indg(p2) + · · · + αt indg(pt)
        − indg(H + c1) − indg(H + c2) ≡ 0 (mod p − 1).

This implies that the small primes p1, p2, . . . , pt should be in the factor
base B. Moreover, the integers H + c for some small values of c should also
be included in the factor base. More explicitly, we choose c1 and c2 to vary
between −M and M (the choice of M will be explained later), so the factor
base should also contain the integers H + c for −M ≤ c ≤ M. Finally, note
that for certain values of c1 and c2, we have negative values for T(c1, c2), that
is, −1 should also be included in the factor base. Therefore, we take

    B = {−1} ∪ {p1, p2, . . . , pt} ∪ {H + c | −M ≤ c ≤ M}.

The size of this factor base is n = 2M + t + 2. By letting c1 and c2 vary in the
range −M ≤ c1 ≤ c2 ≤ M, we generate m − 1 relations of the form mentioned
above. We assume that the base g of discrete logarithms is itself a small prime
pi in the factor base. This gives us a free relation: indg(pi) ≡ 1 (mod p − 1).
The resulting system of congruences has a coefficient matrix of size m × n.
We adjust M and t, so that m ≫ n (say, m ≈ 2n). We hope that the coefficient
matrix is of full column rank, and the system furnishes a unique solution for
the indices of the elements of the factor base (Exercises 7.11 and 8.12).

Example 7.6 As an illustration of the linear sieve method, we take p = 719
and g = 11. Thus, H = ⌈√p⌉ = 27, and J = H^2 − p = 10. We take t = 5
and M = 7, that is, the factor base B consists of −1, the first five primes
2, 3, 5, 7, 11, and the integers H + c for −M ≤ c ≤ M, that is, 20, 21, 22, . . . , 34.
The size of the factor base is n = 2M + t + 2 = 21. By considering all pairs
(c1, c2) with −M ≤ c1 ≤ c2 ≤ M, we obtain the following 30 relations.

    c1  c2   T(c1, c2)                          c1  c2   T(c1, c2)
    −7   4    −99 = (−1) × 3^2 × 11             −3   3      1 = 1
    −6   2   −110 = (−1) × 2 × 5 × 11           −3   4     25 = 5^2
    −6   7     −5 = (−1) × 5                    −3   5     49 = 7^2
    −5  −1   −147 = (−1) × 3 × 7^2              −2   0    −44 = (−1) × 2^2 × 11
    −5   0   −125 = (−1) × 5^3                  −2   2      6 = 2 × 3
    −5   2    −81 = (−1) × 3^4                  −2   4     56 = 2^3 × 7
    −5   5    −15 = (−1) × 3 × 5                −2   5     81 = 3^4
    −5   6      7 = 7                           −1   1      9 = 3^2
    −4  −2   −144 = (−1) × 2^4 × 3^2            −1   2     35 = 5 × 7
    −4  −1   −121 = (−1) × 11^2                 −1   7    165 = 3 × 5 × 11
    −4   0    −98 = (−1) × 2 × 7^2               0   0     10 = 2 × 5
    −4   1    −75 = (−1) × 3 × 5^2               0   2     64 = 2^6
    −4   4     −6 = (−1) × 2 × 3                 1   3    121 = 11^2
    −4   6     40 = 2^3 × 5                      2   4    180 = 2^2 × 3^2 × 5
    −4   7     63 = 3^2 × 7                      4   4    242 = 2^1 × 11^2

Moreover, we have a free relation indg(11) = 1. Let C be the 31 × 21 matrix

      [ 1 0 2 0 0 1 −1  0  0  0  0  0  0  0  0  0  0 −1  0  0  0 ]
      [ 1 1 0 1 0 1  0 −1  0  0  0  0  0  0  0 −1  0  0  0  0  0 ]
      [ 1 0 0 1 0 0  0 −1  0  0  0  0  0  0  0  0  0  0  0  0 −1 ]
      [ 1 0 1 0 2 0  0  0 −1  0  0  0 −1  0  0  0  0  0  0  0  0 ]
      [ 1 0 0 3 0 0  0  0 −1  0  0  0  0 −1  0  0  0  0  0  0  0 ]
      [ 1 0 4 0 0 0  0  0 −1  0  0  0  0  0  0 −1  0  0  0  0  0 ]
      [ 1 0 1 1 0 0  0  0 −1  0  0  0  0  0  0  0  0  0 −1  0  0 ]
      [ 0 0 0 0 1 0  0  0 −1  0  0  0  0  0  0  0  0  0  0 −1  0 ]
      [ 1 4 2 0 0 0  0  0  0 −1  0 −1  0  0  0  0  0  0  0  0  0 ]
      [ 1 0 0 0 0 2  0  0  0 −1  0  0 −1  0  0  0  0  0  0  0  0 ]
      [ 1 1 0 0 2 0  0  0  0 −1  0  0  0 −1  0  0  0  0  0  0  0 ]
      [ 1 0 1 2 0 0  0  0  0 −1  0  0  0  0 −1  0  0  0  0  0  0 ]
      [ 1 1 1 0 0 0  0  0  0 −1  0  0  0  0  0  0  0 −1  0  0  0 ]
      [ 0 3 0 1 0 0  0  0  0 −1  0  0  0  0  0  0  0  0  0 −1  0 ]
      [ 0 0 2 0 1 0  0  0  0 −1  0  0  0  0  0  0  0  0  0  0 −1 ]
      [ 0 0 0 0 0 0  0  0  0  0 −1  0  0  0  0  0 −1  0  0  0  0 ]
      [ 0 0 0 2 0 0  0  0  0  0 −1  0  0  0  0  0  0 −1  0  0  0 ]
      [ 0 0 0 0 2 0  0  0  0  0 −1  0  0  0  0  0  0  0 −1  0  0 ]
      [ 1 2 0 0 0 1  0  0  0  0  0 −1  0 −1  0  0  0  0  0  0  0 ]
      [ 0 1 1 0 0 0  0  0  0  0  0 −1  0  0  0 −1  0  0  0  0  0 ]
      [ 0 3 0 0 1 0  0  0  0  0  0 −1  0  0  0  0  0 −1  0  0  0 ]
      [ 0 0 4 0 0 0  0  0  0  0  0 −1  0  0  0  0  0  0 −1  0  0 ]
      [ 0 0 2 0 0 0  0  0  0  0  0  0 −1  0 −1  0  0  0  0  0  0 ]
      [ 0 0 0 1 1 0  0  0  0  0  0  0 −1  0  0 −1  0  0  0  0  0 ]
      [ 0 0 1 1 0 1  0  0  0  0  0  0 −1  0  0  0  0  0  0  0 −1 ]
      [ 0 1 0 1 0 0  0  0  0  0  0  0  0 −2  0  0  0  0  0  0  0 ]
      [ 0 6 0 0 0 0  0  0  0  0  0  0  0 −1  0 −1  0  0  0  0  0 ]
      [ 0 0 0 0 0 2  0  0  0  0  0  0  0  0 −1  0 −1  0  0  0  0 ]
      [ 0 2 2 1 0 0  0  0  0  0  0  0  0  0  0 −1  0 −1  0  0  0 ]
      [ 0 1 0 0 0 2  0  0  0  0  0  0  0  0  0  0  0 −2  0  0  0 ]
      [ 0 0 0 0 0 1  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0 ]

We have generated the following linear system (where xi = ind11(i)):

    C (x−1  x2  x3  x5  x7  x11  x20  x21  · · ·  x34)^t ≡ (0  0  · · ·  0  1)^t (mod 718).

We have the prime factorization p − 1 = 718 = 2 × 359, that is, we need


to solve the system modulo 2 and modulo 359. The coefficient matrix C has
full column rank (that is, 21) modulo 359, whereas it has a column rank of 20
modulo 2. Thus, there are two solutions modulo 2 and one solution modulo
359. Combining by CRT gives two solutions modulo 718. We can easily verify
the correctness of a solution by computing powers of g.
The correct solution modulo 2 turns out to be

    (1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1)^t,

whereas the (unique) solution modulo 359 is

    (0 247 42 5 291 1 140 333 248 344 65 10 17 126 67 279 294 304 158 43 31)^t.

Combining by the CRT gives the final solution:

ind11 (−1) = 359, ind11 (21) = 692, ind11 (28) = 426,


ind11 (2) = 606, ind11 (22) = 607, ind11 (29) = 638,
ind11 (3) = 42, ind11 (23) = 703, ind11 (30) = 294,
ind11 (5) = 364, ind11 (24) = 424, ind11 (31) = 304,
ind11 (7) = 650, ind11 (25) = 10, ind11 (32) = 158,
ind11 (11) = 1, ind11 (26) = 376, ind11 (33) = 43,
ind11 (20) = 140, ind11 (27) = 126, ind11 (34) = 31. ¤

7.2.2.2 Sieving
Let me now explain how sieving can be carried out to generate the relations
of the linear sieve method. First, fix a value of c1 in the range −M ≤ c1 ≤ M,
and allow c2 to vary in the interval [c1, M]. Let q be a small prime in the
factor base (q = pi for 1 ≤ i ≤ t), and h a small positive exponent. We need
to determine all the values of c2 for which q^h | T(c1, c2) (for the fixed choice of
c1). The condition T(c1, c2) ≡ 0 (mod q^h) gives the linear congruence in c2:

    (H + c1) c2 ≡ −(J + c1 H) (mod q^h),

which can be solved by standard techniques (see Section 1.4).
We initialize an array A indexed by c2 to Ac2 = log |T(c1, c2)|. For each
appropriate choice of q and h, we subtract log q from all of the array locations
Ac2, where c2 ∈ [c1, M] satisfies the above congruence. After all the choices for
q and h are considered, we find those array locations Ac2 which store values
close to zero. These correspond to precisely all the smooth values of T(c1, c2)
for the fixed c1. We make trial divisions of these T(c1, c2) values by the small
primes p1, p2, . . . , pt in the factor base.
The details of this sieving process are similar to those discussed in connec-
tion with the quadratic sieve method for factoring integers (Section 6.6), and
are omitted here. Many variants that are applicable to the QSM (like large
prime variation and incomplete sieving) apply to the linear sieve method too.
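The congruence-solving step at the heart of the sieve can be sketched as follows (hypothetical helper, not from the text; it assumes gcd(H + c1, q^h) = 1, the common case, and leaves the non-invertible case to a separate treatment):

```python
from math import gcd

def sieve_positions(c1, H, J, qh, lo, hi):
    """Solve (H + c1) * c2 = -(J + c1*H) (mod q^h) and list the solutions
    c2 in [lo, hi]; these are the positions where q^h divides T(c1, c2)."""
    A = (H + c1) % qh
    if gcd(A, qh) != 1:
        return []                      # non-invertible case handled separately
    root = (-(J + c1 * H) * pow(A, -1, qh)) % qh
    first = lo + (root - lo) % qh      # smallest solution >= lo
    return list(range(first, hi + 1, qh))
```

With the data of Example 7.6 above (H = 27, J = 10), sieve_positions(-7, 27, 10, 9, -7, 7) returns [-5, 4]; indeed T(-7, 4) = -99 is divisible by 9.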

7.2.2.3 Running Time


Let me now prescribe values for t and M so that the linear sieve method
runs in L[1] time. We include all primes ≤ L[1/2] in the factor base, that
is, t is again of the form L[1/2]. The probability that an integer of absolute
value O˜(√p) is smooth with respect to these primes is then L[−1/2] (by
Corollary 6.14). We choose M = L[1/2] also. That is, the size of the factor base is
again L[1/2]. The total number of pairs (c1, c2), for which T(c1, c2) is tested for
smoothness, is Θ(M^2), which is of the order of L[1]. Since each of these values
of T(c1, c2) is O˜(√p), the expected number of relations is L[1] L[−1/2] = L[1/2],
the same as the size of the factor base.
It is easy to argue that for these choices of t and M, the entire sieving
process (for all values of c1) takes a time of L[1]. Straightforward Gaussian
elimination involving L[1/2] equations in L[1/2] variables takes a running time
of L[3/2]. However, the system of congruences generated by the linear sieve
method is necessarily sparse. Employing some efficient sparse system solver
reduces the running time to L[1]. To sum up, the first stage of the linear sieve
method can be arranged to run in L[1] time. The space requirement is L[1/2].

7.2.2.4 Second Stage


The first stage gives us the discrete logarithms of primes ≤ L[1/2]. If we
adopt a strategy similar to the second stage of the basic method (search for
a smooth value of a g^α (mod p) for randomly chosen α), we spend L[3/2] time
to compute each individual discrete logarithm. That is too much. Moreover,
we do not make use of the indices of H + c, available from the first stage.
Using a somewhat trickier technique, we can compute individual logarithms
in only L[1/2] time. This improved method has three steps.

• We choose α randomly, and try to express a g^α (mod p) as a product of
primes ≤ L[2]. We do not sieve using all primes ≤ L[2], because that
list of primes is rather huge, and sieving would call for an unacceptable
running time of L[2]. We instead use the elliptic curve method (Sec-
tion 6.8) to detect the L[2]-smoothness of a g^α (mod p). This factoring
algorithm is sensitive to the smoothness of the integer being factored.
More concretely, it can detect the L[2]-smoothness of an integer of value
O(p) (and completely factor such an L[2]-smooth integer) in L[1/4] time.
Suppose that for some α, we obtain the following factorization:

    a g^α ≡ (p1^α1 p2^α2 · · · pt^αt) (q1^β1 q2^β2 · · · qk^βk) (mod p),

where pi are the primes in the factor base B, and qj are primes ≤ L[2]
not in the factor base. We then have

    indg a ≡ −α + α1 indg(p1) + · · · + αt indg(pt) + β1 indg(q1) + · · · + βk indg(qk) (mod p − 1).
i=1 j=1

By Corollary 6.14, an integer of value O(p) is L[2]-smooth with probability
L[−1/4], that is, L[1/4] choices of α are expected to suffice to find
an L[2]-smooth value of a g^α (mod p). The expected running time for
arriving at the last congruence is, therefore, L[1/4] × L[1/4] = L[1/2].
Since indg(pi) are available from the first stage, we need to compute
the indices of the medium-sized primes q1, q2, . . . , qk. Since k ≤ log n, we
achieve a running time of L[1/2] for the second stage of the linear sieve
method, provided that we can compute each indg(qj) in L[1/2] time.

• Let q be a medium-sized prime. We find an integer y close to √p/q such
that y is L[1/2]-smooth. Since the first stage gives us the indices of all
primes ≤ L[1/2], the index of y can be computed. We use sieving to
obtain such an integer y. Indeed, the value of y is O˜(√p), and so the
probability that y is L[1/2]-smooth is L[−1/2]. This means that sieving
around √p/q over an interval of size L[1/2] suffices.

• Let q be a medium-sized prime, and y an L[1/2]-smooth integer close to
√p/q, as computed above. Consider the integer T′(c) = (H + c)qy − p
for a small value of c. We have T′(c) = O(√p), and so T′(c) is L[1/2]-
smooth with probability L[−1/2], that is, sieving over the L[1/2] values of
c in the range −M ≤ c ≤ M is expected to give an L[1/2]-smooth value

    T′(c) = (H + c)qy − p = p1^γ1 p2^γ2 · · · pt^γt, that is,
    indg q ≡ −indg(H + c) − indg y + γ1 indg(p1) + · · · + γt indg(pt) (mod p − 1).

Example 7.7 Let us continue with the database of indices available from the
first stage described in Example 7.6. Suppose that we want to compute the
index of a = 123 modulo p = 719 with respect to g = 11. We first obtain the
relation a g^161 ≡ 182 ≡ 2 × 7 × 13 (mod 719), that is, indg a ≡ −161 + indg(2) +
indg(7) + indg(13) ≡ −161 + 606 + 650 + indg(13) ≡ 377 + indg(13) (mod 718).
What remains is to compute the index of q = 13.
We look for an 11-smooth integer y close to √p/13 ≈ 2.0626. We take
y = 3. Example 7.6 gives indg(y) = 42. (Since we are working here with an
artificially small p, the value of y turns out to be abnormally small.)
Finally, we find an 11-smooth value of (H + c)qy − p for −7 ≤ c ≤ 7.
For c = 4, we have (H + 4)qy − p = 490 = 2 × 5 × 7^2, that is, indg q ≡
−indg(H + 4) − indg(y) + indg(2) + indg(5) + 2 indg(7) ≡ −304 − 42 + 606 +
364 + 2 × 650 ≡ 488 (mod 718). This gives the desired discrete logarithm
indg(123) ≡ 377 + 488 ≡ 147 (mod 718). ¤

7.2.3 Residue-List Sieve Method (RLSM)


Like the linear sieve method, the residue-list sieve method looks at integers
near √p. However, instead of taking the product of two such integers, the
residue-list sieve method first locates smooth integers near √p. After that,
pairs of located smooth integers are multiplied.
The factor base now consists of −1, and the primes p1, p2, . . . , pt ≤ L[1/2].
Let H = ⌈√p⌉, J = H^2 − p, and M = L[1]. We first locate L[1/2]-smooth
integers of the form H + c with −M ≤ c ≤ M. Since each such candidate is O(√p),
the smoothness probability is L[−1/2], that is, among the L[1] candidates of the
form H + c, we expect to obtain L[1/2] smooth values. Let H + c1 and H + c2
be two such smooth integers. Consider T(c1, c2) = (H + c1)(H + c2) − p =
J + (c1 + c2)H + c1c2. We have T(c1, c2) = O(√p), that is, the smoothness
probability of T(c1, c2) with respect to the factor base B is again L[−1/2]. Since
there are Θ(L[1/2]^2) = L[1] pairs (c1, c2) with both H + c1 and H + c2 smooth,
the expected number of smooth values of T(c1, c2) is L[1/2]. These relations are
solved to compute the indices of the L[1/2] elements of the factor base B.
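The first step, locating the smooth H + c values, can be sketched by plain trial division (illustration only; the actual method sieves instead of trial-dividing every candidate):

```python
def smooth_offsets(H, M, primes):
    """Return the offsets c in [-M, M] for which H + c factors completely
    over `primes` (first step of the residue-list sieve method)."""
    out = []
    for c in range(-M, M + 1):
        m = H + c
        for q in primes:
            while m % q == 0:
                m //= q
        if m == 1:
            out.append(c)
    return out
```

smooth_offsets(30, 7, [2, 3, 5, 7, 11]) reproduces the nine offsets of the left table in Example 7.8 below.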
Example 7.8 Let us compute discrete logarithms modulo the prime p = 863
to the primitive base g = 5. We have H = ⌈√863⌉ = 30, and J = H^2 − p = 37.
Take B = {−1, 2, 3, 5, 7, 11}, and M = 7. The smooth values of H + c with
−M ≤ c ≤ M are shown in the left table below. Combination of these smooth
values yields smooth T(c1, c2) values as shown in the right table below.

     c   H + c                     H + c1   H + c2   T(c1, c2)
    −6   24 = 2^3 × 3                24       36        1
    −5   25 = 5^2                    25       32      −63 = (−1) × 3^2 × 7
    −3   27 = 3^3                    25       35       12 = 2^2 × 3
    −2   28 = 2^2 × 7                27       32        1
     0   30 = 2 × 3 × 5              27       33       28 = 2^2 × 7
     2   32 = 2^5                    28       32       33 = 3 × 11
     3   33 = 3 × 11
     5   35 = 5 × 7
     6   36 = 2^2 × 3^2
The following relations are thus obtained. We use the notation xa to stand
for indg a (where a ∈ B).
(3x2 + x3 ) + (2x2 + 2x3 ) ≡ 0 (mod 862),
(2x5 ) + (5x2 ) ≡ x−1 + 2x3 + x7 (mod 862),
(2x5 ) + (x5 + x7 ) ≡ 2x2 + x3 (mod 862),
(3x3 ) + (5x2 ) ≡ 0 (mod 862),
(3x3 ) + (x3 + x11 ) ≡ 2x2 + x7 (mod 862),
(2x2 + x7 ) + (5x2 ) ≡ x3 + x11 (mod 862).
Moreover, we have the free relation:
x5 ≡ 1 (mod 862).
We also use the fact that x−1 ≡ (p − 1)/2 ≡ 431 (mod 862), since g = 5 is a
primitive root of p. This gives us two solutions of the above congruences:

    (x−1, x2, x3, x5, x7, x11) = (431, 161, 19, 1, 338, 584), and
    (x−1, x2, x3, x5, x7, x11) = (431, 592, 450, 1, 769, 153).

But 5^161 ≡ −2 (mod 863), whereas 5^592 ≡ 2 (mod 863). That is, the second
solution gives the correct values of the indices of the factor-base elements. ¤

The first stage of the residue-list sieve method uses two sieves. The first
one is used to locate all the smooth values of H + c. Since c ranges over L[1]
values between −M and M, this sieve takes a running time of the form L[1].
In the second sieve, one combines pairs of smooth values of H + c obtained
from the first sieve, and identifies the smooth values of T(c1, c2). But H + c
itself ranges over L[1] values (although there are only L[1/2] smooth values
among them). In order that the second sieve too can be completed in L[1] time,
we, therefore, need to adopt some special tricks. For each small prime power
q^h, we maintain a list of smooth H + c values obtained from the first sieve.
This list should be kept sorted with respect to the residues (H + c) rem q^h.
The name residue-list sieve method is attributed to these lists. Since there are
L[1/2] prime powers q^h, and there are L[1/2] smooth values of H + c, the total
storage requirement for all the residue lists is L[1].
For determining the smoothness of T(c1, c2) = (H + c1)(H + c2) − p, one
fixes c1, and lets c2 vary in the interval c1 ≤ c2 ≤ M. For each small prime
power q^h, one calculates (H + c1) rem q^h and p rem q^h. One then solves for the
value(s) of (H + c2) (mod q^h) from the congruence T(c1, c2) ≡ 0 (mod q^h).
For each solution χ, one consults the residue list for q^h to locate all the values
of c2 for which (H + c2) rem q^h = χ. Since the residue list is kept sorted with
respect to the residue values modulo q^h, binary search can quickly identify
the desired values of c2, leading to a running time of L[1] for the second sieve.
The resulting sparse system with L[1/2] congruences in L[1/2] variables can
be solved in L[1] time. The second stage of the residue-list sieve method is
identical to the second stage of the linear sieve method, and can be performed
in L[1/2] time for each individual logarithm. The second stage involves a sieve
in the third step, which calls for the residue lists available from the first stage.
Let us now compare the performance of the linear sieve method with that
of the residue-list sieve method. The residue-list sieve method does not include
any H + c value in the factor base. As a result, the size of the factor base is
smaller than that in the linear sieve method. However, maintaining the residue
lists calls for a storage of size L[1]. This storage is permanent in the sense
that the second stage (individual logarithm calculation) requires these lists.
For the linear sieve method, on the other hand, the permanent storage
requirement is only L[1/2]. Moreover, the (hidden) o(1) term in the exponent
of the running time is higher in the residue-list sieve method than in the linear
sieve method. In view of these difficulties, the residue-list sieve method turns
out to be less practical than the linear sieve method.
Discrete Logarithms 363

7.2.4 Gaussian Integer Method (GIM)


An adaptation of ElGamal's method10 for computing discrete logarithms
in F_{p^2}, the Gaussian integer method is an attractive alternative to the linear
sieve method. It has a running time of L[1] and a space requirement of L[1/2].
Assume that (at least) one of the integers −1, −2, −3, −7, −11, −19, −43,
−67 and −163 is a quadratic residue modulo p. Let −r be such an integer.
(For these values of r, Z[√−r] is a UFD. The algorithm can be made to work
even if the assumption does not hold.) Let s be a square root of −r modulo
p. Consider the ring homomorphism

Φ : Z[√−r] → F_p, that maps a + b√−r ↦ (a + bs) rem p.

By Cornacchia's algorithm (Algorithm 1.10), we compute integers u, v such
that p = u^2 + rv^2. Since s^2 ≡ −r (mod p), either u + vs ≡ 0 (mod p) or
u − vs ≡ 0 (mod p). Suppose that u + vs ≡ 0 (mod p). (One may first compute
u, v and subsequently take s ≡ −v^{−1} u (mod p).) The condition u^2 + rv^2 = p
implies that u, v are O(√p). For small integers c1, c2, the expression

T(c1, c2) = c1 u + c2 v

is O(√p). If we treat c1 u + c2 v as an element of Z[√−r], we can write

c1 u + c2 v = c1 (u + v√−r) + v (c2 − c1 √−r).

Application of the ring homomorphism Φ gives

T(c1, c2) ≡ c1 u + c2 v ≡ c1 (u + vs) + v Φ(c2 − c1 √−r) (mod p).

But u + vs ≡ 0 (mod p), so

T(c1, c2) ≡ v Φ(c2 − c1 √−r) (mod p).
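A minimal Python sketch (using the tiny p = 997 and r = 1 of Example 7.9 below; the brute-force square root is affordable only for such toy moduli) illustrates both Cornacchia's computation of u, v and the congruence just derived:

```python
from math import isqrt

def cornacchia(p, r):
    """Sketch of Cornacchia's algorithm: solve p = u^2 + r*v^2."""
    s = next(x for x in range(1, p) if (x * x + r) % p == 0)  # sqrt of -r mod p
    a, b = p, max(s, p - s)
    while b * b > p:              # Euclid until the remainder drops below sqrt(p)
        a, b = b, a % b
    u = b
    v2, rem = divmod(p - u * u, r)
    v = isqrt(v2)
    assert rem == 0 and v * v == v2
    return u, v

p, r = 997, 1
u, v = cornacchia(p, r)           # 997 = 31^2 + 1*6^2
s = 161                           # square root of -1 mod 997 with u + v*s ≡ 0
assert (s * s + r) % p == 0 and (u + v * s) % p == 0

# T(c1,c2) = c1*u + c2*v ≡ v * Φ(c2 - c1*sqrt(-r)) (mod p),
# where Φ(a + b*sqrt(-r)) = (a + b*s) mod p
for c1, c2 in [(-3, -4), (2, 5), (1, -1)]:
    assert (c1 * u + c2 * v) % p == v * ((c2 - c1 * s) % p) % p
```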
Let B1 = {p1, p2, . . . , pt} be the set of all (rational) primes ≤ L[1/2], and
B2 = {q1, q2, . . . , qt′} the set of all (complex) primes a + b√−r of Z[√−r]
with a^2 + b^2 r ≤ L[1/2]. Suppose that T(c1, c2) factors completely over B1, and
c2 − c1 √−r factors completely over B2, that is, we have

p1^α1 p2^α2 · · · pt^αt ≡ v Φ(q1)^β1 Φ(q2)^β2 · · · Φ(qt′)^βt′ (mod p), that is,

α1 indg(p1) + α2 indg(p2) + · · · + αt indg(pt)
≡ indg(v) + β1 indg(Φ(q1)) + β2 indg(Φ(q2)) + · · · + βt′ indg(Φ(qt′)) (mod p − 1).

This is a relation in the Gaussian integer method.


The factor base B now consists of −1, the t = L[1/2] rational primes p1, p2,
. . . , pt, the images Φ(q1), Φ(q2), . . . , Φ(qt′) of the t′ = L[1/2] complex primes,
and the integer v. The size of B is t + t′ + 2, which is L[1/2]. We let c1, c2 vary in
10 Taher ElGamal, A subexponential-time algorithm for computing discrete logarithms
over GF(p^2), IEEE Transactions on Information Theory, 31, 473–481, 1985.


364 Computational Number Theory

the interval [−M, M] with M = L[1/2]. In order to avoid duplicate relations, we
consider only those pairs with gcd(c1, c2) = 1. There are L[1] such pairs. Since
each T(c1, c2) is O(√p), the probability that it is smooth with respect to the
L[1/2] rational primes is L[−1/2], that is, we expect to get L[1/2] smooth values of
T(c1, c2). On the other hand, the complex number c2 − c1 √−r is smooth over
B2 with high (constant) probability (c2^2 + r c1^2 is L[1], whereas B2 contains all
complex primes a + b√−r with a^2 + r b^2 ≤ L[1/2]). Thus, a significant fraction of
the (c1, c2) pairs for which T(c1, c2) is smooth leads to relations for the Gaussian
integer method, that is, we get L[1/2] relations in L[1/2] variables, as desired.
Example 7.9 Let us compute discrete logarithms modulo the prime p = 997
to the primitive base g = 7. We have p ≡ 1 (mod 4), so −1 is a quadratic
residue modulo p, and we may take r = 1. We then work in the ring Z[i]
of Gaussian integers (this justifies the name of the algorithm). We express
p = u^2 + v^2 with u = 31 and v = 6, and take s ≡ −v^{−1} u ≡ 161 (mod p) as
the modular square root of −1, satisfying u + vs ≡ 0 (mod p).
We take t = 6 small rational primes, that is, B1 = {2, 3, 5, 7, 11, 13}. A
set of pairwise non-associate complex primes a + bi with a^2 + b^2 ≤ 13 is
B2 = {1 + i, 2 + i, 2 − i, 2 + 3i, 2 − 3i}. The factor base is, therefore, given by
B = {−1, 2, 3, 5, 7, 11, 13, v, Φ(1 + i), Φ(2 + i), Φ(2 − i), Φ(2 + 3i), Φ(2 − 3i)}
= {−1, 2, 3, 5, 7, 11, 13, 6, 1 + s, 2 + s, 2 − s, 2 + 3s, 2 − 3s}
= {−1, 2, 3, 5, 7, 11, 13, 6, 162, 163, −159, 485, −481}
= {−1, 2, 3, 5, 7, 11, 13, 6, 162, 163, 838, 485, 516}.
The prime integers 3, 7 and 11 remain prime in Z[i]. We take (c1 , c2 ) pairs with
gcd(c1 , c2 ) = 1, so these primes do not occur in the factorization of c2 − c1 i.
The units in Z[i] are ±1, ±i. We have
indg (Φ(1)) = indg (1) = 0, and
indg (Φ(−1)) = indg (−1) = (p − 1)/2 = 498.
Moreover, Φ(i) = s and Φ(−i) = −s. One of ±s has index (p − 1)/4, and the
other has index 3(p − 1)/4. In this case, we have
indg (Φ(i)) = indg (s) = (p − 1)/4 = 249, and
indg (Φ(−i)) = indg (−s) = 3(p − 1)/4 = 747.
Let us take M = 5, that is, we check all c1 , c2 values between −5 and 5
with gcd(c1 , c2 ) = 1. There are 78 such pairs. For 37 of these pairs, the integer
T (c1 , c2 ) = c1 u + c2 v is smooth with respect to B1 . Among these, 23 yield
smooth values of c2 − c1 i with respect to B2 (see the table on the next page).
Let us now see how such a factorization leads to a relation. Consider c1 =
−3 and c2 = −4. In this case, T(c1, c2) = (−1) × 3^2 × 13, and c2 − c1 i =
i(2 + i)^2. This gives indg(−1) + 2 indg(3) + indg(13) ≡ indg(v) + indg(Φ(i)) +
2 indg(Φ(2 + i)) (mod p − 1), that is, 498 + 2 indg(3) + indg(13) ≡ indg(6) +
249 + 2 indg(163) (mod 996). The reader is urged to convert the other 22
relations and solve the resulting system of congruences. □

c1  c2  T(c1, c2) = c1 u + c2 v    c2 − c1 i
−3  −4  −117 = (−1) × 3^2 × 13     −4 + 3i = (i) × (2 + i)^2
−3  −2  −105 = (−1) × 3 × 5 × 7    −2 + 3i = (−1) × (2 − 3i)
−3  −1  −99 = (−1) × 3^2 × 11      −1 + 3i = (i) × (1 + i) × (2 − i)
−3   2  −81 = (−1) × 3^4            2 + 3i = 2 + 3i
−2  −3  −80 = (−1) × 2^4 × 5       −3 + 2i = (i) × (2 + 3i)
−2   1  −56 = (−1) × 2^3 × 7        1 + 2i = (i) × (2 − i)
−2   3  −44 = (−1) × 2^2 × 11       3 + 2i = (i) × (2 − 3i)
−1  −3  −49 = (−1) × 7^2           −3 + i = (i) × (1 + i) × (2 + i)
−1   1  −25 = (−1) × 5^2            1 + i = 1 + i
−1   3  −13 = (−1) × 13             3 + i = (1 + i) × (2 − i)
−1   5  −1 = (−1)                   5 + i = (−i) × (1 + i) × (2 + 3i)
 0   1   6 = 2 × 3                  1 = 1
 1  −5   1 = 1                     −5 − i = (i) × (1 + i) × (2 + 3i)
 1  −3  13 = 13                    −3 − i = (−1) × (1 + i) × (2 − i)
 1  −1  25 = 5^2                   −1 − i = (−1) × (1 + i)
 1   3  49 = 7^2                    3 − i = (−i) × (1 + i) × (2 + i)
 2  −3  44 = 2^2 × 11              −3 − 2i = (−i) × (2 − 3i)
 2  −1  56 = 2^3 × 7               −1 − 2i = (−i) × (2 − i)
 2   3  80 = 2^4 × 5                3 − 2i = (−i) × (2 + 3i)
 3  −2  81 = 3^4                   −2 − 3i = (−1) × (2 + 3i)
 3   1  99 = 3^2 × 11               1 − 3i = (−i) × (1 + i) × (2 − i)
 3   2  105 = 3 × 5 × 7             2 − 3i = 2 − 3i
 3   4  117 = 3^2 × 13              4 − 3i = (−i) × (2 + i)^2
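Rows of this table are easy to check mechanically. A small verification sketch (Python's built-in complex arithmetic is exact for Gaussian integers this tiny):

```python
u, v = 31, 6                       # from p = 997 = 31^2 + 6^2

# row (c1, c2) = (-3, -4): T = -117 = (-1)*3^2*13 and -4 + 3i = i*(2+i)^2
assert -3 * u + (-4) * v == -117 == (-1) * 3**2 * 13
assert complex(-4, 3) == 1j * (2 + 1j) ** 2

# row (c1, c2) = (3, 4): T = 117 = 3^2*13 and 4 - 3i = (-i)*(2+i)^2
assert 3 * u + 4 * v == 117 == 3**2 * 13
assert complex(4, -3) == -1j * (2 + 1j) ** 2
```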

Sieving to locate smooth T(c1, c2) values is easy. After the sieve terminates,
we throw away smooth values with gcd(c1, c2) > 1. If gcd(c1, c2) = 1 and
T(c1, c2) is smooth, we make trial division of c2 − c1 √−r by the complex
primes in B2. The entire sieving process can be completed in L[1] time.
The resulting sparse system of L[1/2] equations in L[1/2] variables can be
solved in L[1] time. The second stage of the Gaussian integer method is
identical to the second stage of the linear sieve method, and can be performed in
L[1/2] time for each individual logarithm.
The size of the factor base in the Gaussian integer method is t + t′ + 2,
whereas the size of the factor base in the linear sieve method is t + 2M + 2.
Since t′ ≪ M (indeed, t′ is roughly proportional to M/ln M), the Gaussian
integer method gives significantly smaller systems of linear congruences than
the linear sieve method. In addition, the values of T(c1, c2) are somewhat
smaller in the Gaussian integer method than in the linear sieve method (we
have |c1 u + c2 v| ≤ √2 M√p, whereas J + (c1 + c2)H + c1 c2 ≤ 2M√p,
approximately). Finally, unlike the residue-list sieve method, the Gaussian integer
method is not crippled by the necessity of L[1] permanent storage. In view
of these advantages, the Gaussian integer method is practically the most
preferred L[1]-time algorithm for computing discrete logarithms in prime fields.

7.2.5 Cubic Sieve Method (CSM)


Reyneri's cubic sieve method is a faster alternative to the L[1]-time
algorithms discussed so far. Unfortunately, it is not clear how we can apply this
method to a general prime p. In some special cases, this method is naturally
applicable, and has a best running time of L[√(2/3)] (nearly L[0.816]).
Suppose that we know a solution of the congruence

x^3 ≡ y^2 z (mod p)

with x^3 ≠ y^2 z (as integers), and with x, y, z = O(p^ξ) for 1/3 ≤ ξ < 1/2. For
small integers c1, c2, c3 with c1 + c2 + c3 = 0, we have

(x + c1 y)(x + c2 y)(x + c3 y)
≡ x^3 + (c1 + c2 + c3) x^2 y + (c1 c2 + c1 c3 + c2 c3) x y^2 + (c1 c2 c3) y^3
≡ y^2 z + (c1 c2 + c1 c3 + c2 c3) x y^2 + (c1 c2 c3) y^3
≡ y^2 [z + (c1 c2 + c1 c3 + c2 c3) x + (c1 c2 c3) y] (mod p).

Let us denote

T(c1, c2, c3) = z + (c1 c2 + c1 c3 + c2 c3) x + (c1 c2 c3) y
= z − (c1^2 + c1 c2 + c2^2) x − c1 c2 (c1 + c2) y.
For small values of c1, c2, c3, we have T(c1, c2, c3) = O(p^ξ). We attempt to
factor T(c1, c2, c3) over the first t primes. If the factorization attempt is
successful, we get a relation of the form

(x + c1 y)(x + c2 y)(x + c3 y) ≡ y^2 p1^α1 p2^α2 · · · pt^αt (mod p), that is,

indg(x + c1 y) + indg(x + c2 y) + indg(x + c3 y)
≡ indg(y^2) + α1 indg(p1) + α2 indg(p2) + · · · + αt indg(pt) (mod p − 1).

Therefore, we take a factor base of the following form (the choice of t and M
to be explained shortly):

B = {−1} ∪ {p1, p2, . . . , pt} ∪ {y^2} ∪ {x + cy | −M ≤ c ≤ M}.
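The congruence chain above can be spot-checked numerically. The sketch below uses the data of Example 7.10 that follows (p = 895189, x = 139, y = 2, z = 13):

```python
p, x, y, z = 895189, 139, 2, 13
assert x**3 == y**2 * z + 3 * p          # so x^3 ≡ y^2 z (mod p) but x^3 ≠ y^2 z

def T(c1, c2, c3):
    return z + (c1*c2 + c1*c3 + c2*c3) * x + (c1*c2*c3) * y

# check (x+c1*y)(x+c2*y)(x+c3*y) ≡ y^2 * T(c1,c2,c3) (mod p) on a few triples
for c1, c2, c3 in [(-3, 1, 2), (-1, 0, 1), (0, 0, 0)]:
    assert c1 + c2 + c3 == 0
    lhs = (x + c1*y) * (x + c2*y) * (x + c3*y) % p
    assert lhs == y**2 * T(c1, c2, c3) % p
```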

Example 7.10 Let us compute discrete logarithms modulo the prime p =
895189 to the primitive base g = 17. A solution of the congruence x^3 ≡
y^2 z (mod p) is x = 139, y = 2 and z = 13. This solution satisfies x^3 = y^2 z + 3p.
We take t = 25 (all primes < 100), and M = 50. The size of the factor base is
2M + t + 3 = 128. Letting c1, c2, c3 vary in the range −M ≤ c1 ≤ c2 ≤ c3 ≤ M
with c1 + c2 + c3 = 0, we get 196 relations, some of which are shown in the
table on the next page. In addition, we have a free relation

g^1 ≡ 17 (mod p).

This results in a system of 197 linear congruences in 128 variables modulo
p − 1 = 895188. □

c1   c2  c3  T(c1, c2, c3)
−50   1  49  −345576 = (−1) × 2^3 × 3 × 7 × 11^2 × 17
−50  21  29  −323736 = (−1) × 2^3 × 3 × 7 × 41 × 47
−50  23  27  −323268 = (−1) × 2^2 × 3 × 11 × 31 × 79
−49  15  34  −312816 = (−1) × 2^4 × 3 × 7^3 × 19
−48   1  47  −318222 = (−1) × 2 × 3^3 × 71 × 83
−48  22  26  −295647 = (−1) × 3 × 11 × 17^2 × 31
−47   9  38  −291648 = (−1) × 2^6 × 3 × 7^2 × 31
−47  11  36  −289218 = (−1) × 2 × 3 × 19 × 43 × 59
−47  17  30  −284088 = (−1) × 2^3 × 3 × 7 × 19 × 89
···
 −5  −5  10  −9912 = (−1) × 2^3 × 3 × 7 × 59
 −5   2   3  −2688 = (−1) × 2^7 × 3 × 7
 −4  −2   6  −3783 = (−1) × 3 × 13 × 97
 −4   0   4  −2211 = (−1) × 3 × 11 × 67
 −3  −1   4  −1770 = (−1) × 2 × 3 × 5 × 59
 −3   1   2  −972 = (−1) × 2^2 × 3^5
 −2  −1   3  −948 = (−1) × 2^2 × 3 × 79
 −2   1   1  −408 = (−1) × 2^3 × 3 × 17
 −1  −1   2  −400 = (−1) × 2^4 × 5^2
 −1   0   1  −126 = (−1) × 2 × 3^2 × 7
  0   0   0  13 = 13
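A few table rows can be cross-checked against the c1, c2-only form of T(c1, c2, c3) derived earlier; a quick verification sketch:

```python
x, y, z = 139, 2, 13               # the Example 7.10 solution of x^3 ≡ y^2 z (mod p)

def T(c1, c2):
    # T(c1, c2, c3) with c3 = -(c1 + c2) substituted in
    return z - (c1 * c1 + c1 * c2 + c2 * c2) * x - c1 * c2 * (c1 + c2) * y

assert T(-50, 1) == -345576 == -(2**3 * 3 * 7 * 11**2 * 17)
assert T(-3, 1) == -972 == -(2**2 * 3**5)
assert T(0, 0) == 13
```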

Let me now specify the parameters t and M. Suppose that x, y, z are each
O(p^ξ). We take t as the number of primes ≤ L[√(ξ/2)]. By the prime number
theorem, t is again of the form L[√(ξ/2)]. We also take M = L[√(ξ/2)]. There
are Θ(M^2) = L[√(2ξ)] triples (c1, c2, c3) with −M ≤ c1 ≤ c2 ≤ c3 ≤ M and
c1 + c2 + c3 = 0. Each T(c1, c2, c3) is of value O(p^ξ) and has a probability
L[−ξ/(2√(ξ/2))] = L[−√(ξ/2)] of being smooth with respect to the t primes in the
factor base. Thus, the expected number of relations for all choices of (c1, c2, c3)
is L[√(ξ/2)]. The size of the factor base is t + 2M + 3, which is also L[√(ξ/2)].
In order to locate the smooth values of T(c1, c2, c3), we express T(c1, c2, c3)
as a function of c1 and c2 alone. The conditions −M ≤ c1 ≤ c2 ≤ c3 ≤ M and
c1 + c2 + c3 = 0 imply that c1 varies between −M and 0, and c2 varies between
max(c1, −(M + c1)) and −c1/2 for a fixed c1. We fix c1, and let c2 vary in this
allowed range. For a small prime power q^h, we solve T(c1, c2, c3) ≡ 0 (mod q^h)
for c2, taking c1 as constant and c3 = −(c1 + c2). This calls for solving a
quadratic congruence in c2. The details are left to the reader (Exercise 7.17).
The sieving process can be completed in L[2√(ξ/2)] = L[√(2ξ)] time.
Solving the resulting sparse system of L[√(ξ/2)] linear congruences in L[√(ξ/2)]
variables also takes the same time. To sum up, the first stage of the cubic
sieve method can be so implemented as to run in L[√(2ξ)] time. The space
requirement is L[√(ξ/2)]. If ξ = 1/3, the running time is L[√(2/3)] ≈ L[0.816],
and the space requirement is L[√(1/6)] ≈ L[0.408].

The second stage of the cubic sieve method is costlier than those for the
L[1] methods discussed earlier. The trouble now is that the first stage supplies
a smaller database of discrete logarithms (L[√(ξ/2)] compared to L[1/2]). This
indicates that we need to perform more than L[1/2] work in order to compute
individual logarithms. Here is a strategy that runs in L[√(2ξ)] time.

• Express a g^α (mod p) as a product of primes ≤ L[2]. This can be done
in L[1/2] time by the elliptic curve method. It remains to compute the
index of each medium-sized prime q that appears in this factorization.

• Find values of c3 > 0 such that x + c3 y is divisible by q and the cofactor
(x + c3 y)/q is L[√(ξ/2)]-smooth. This can be done by running a sieve over
values of c3 > 0. We need L[√(ξ/2)] such values of c3 for the second stage
to terminate with high probability. As a result, we may have to work
with values of c3 > M, that is, the indices of x + c3 y are not necessarily
available in the database computed in the first stage. If c3 ≤ M, we
obtain the value of indg q; otherwise we proceed to the next step.

• For each c3 > M obtained above, we let c2 vary in the range −c3 − M ≤
c2 ≤ −c3 + M, and set c1 = −(c2 + c3). (We may have c2 < −M, that
is, indg(x + c2 y) need not be available from the first stage.) We run a
sieve over this range to locate a value of c2 for which both x + c2 y and
T(c1, c2, c3) = z + (c1 c2 + c1 c3 + c2 c3) x + (c1 c2 c3) y are L[√(ξ/2)]-smooth.
Since these values are O(p^ξ), each of these is smooth with probability
L[−√(ξ/2)], that is, both are smooth with probability L[−√(2ξ)]. Since
2M + 1 (that is, L[√(ξ/2)]) values of c2 are tried, the probability that a
suitable c2 can be found for a fixed c3 is only L[−√(ξ/2)]. If no suitable
c2 is found, we repeat this step with another value of c3. After L[√(ξ/2)]
trials, a suitable c2 is found with high probability. We then have

(x + c1 y)(x + c2 y)(x + c3 y) ≡ y^2 T(c1, c2, c3) (mod p).

By our choice of c3, the integer x + c3 y is q times an L[√(ξ/2)]-smooth
value. The quantities x + c2 y and T(c1, c2, c3) are both L[√(ξ/2)]-smooth
(by the choice of c2). We forced −M ≤ c1 ≤ M, that is, the index of
x + c1 y is already available from the first stage (so also is indg(y^2)).
Therefore, we obtain the desired index of the medium-sized prime q.

Asymptotically, the cubic sieve method is faster than the L[1]-time
algorithms discussed earlier. However, the quadratic and cubic terms in the
subexponential values c1, c2, c3 make the values of T(c1, c2, c3) rather large
compared to p^ξ. As a result, the theoretically better performance of the cubic
sieve method does not show up unless the bit size of p is rather large. Practical
experience suggests that for bit sizes > 200, the cubic sieve method achieves
some speedup over the L[1] algorithms. On the other hand, the number-field
sieve method takes over for bit sizes > 300. To sum up, the cubic sieve method

seems to be a good choice only for a narrow band (200–300) of bit sizes.
Another problem with the cubic sieve method is that its second stage is as slow as
its first stage, and much slower than the second stages of the L[1] algorithms.
The biggest trouble with the cubic sieve method is that we do not
know how to compute x, y, z, as small as possible, satisfying x^3 ≡ y^2 z (mod p)
and x^3 ≠ y^2 z. A solution is naturally available only for some special primes
p. The general applicability of the cubic sieve method thus remains unclear.

7.2.6 Number-Field Sieve Method (NFSM)


The number-field sieve method11 is the fastest known algorithm for com-
puting discrete logarithms in prime fields. It is an adaptation of the method
with the same name for factoring integers. It can also be viewed as a general-
ization of the Gaussian integer method.
We start with an irreducible (over Q) polynomial f(x) ∈ Z[x] and an
integer m satisfying f(m) ≡ 0 (mod p). We extend Q by adjoining a root
θ of f(x): K = Q(θ). The ring O_K of integers of K is the number ring
to be used in the algorithm. For simplicity, we assume that O_K supports
unique factorization of elements, and that elements of O_K can be written as
polynomials in θ with rational coefficients. The map Φ taking θ ↦ m (mod p)
extends to a ring homomorphism O_K → F_p.
For small coprime integers c1, c2, we consider the two elements T1(c1, c2) =
c1 + c2 θ ∈ O_K and T2(c1, c2) = Φ(T1(c1, c2)) = c1 + c2 m ∈ Z. Suppose that
T1(c1, c2) is smooth with respect to small primes qi of O_K, and T2(c1, c2) is
smooth with respect to small rational primes pj, that is, T1(c1, c2) = ∏_i qi^αi,
and T2(c1, c2) = ∏_j pj^βj. Application of the homomorphism Φ gives

Φ(T1(c1, c2)) ≡ Φ(∏_i qi^αi) ≡ ∏_i Φ(qi)^αi ≡ T2(c1, c2) ≡ ∏_j pj^βj (mod p).

This leads to the relation

∑_i αi indg(Φ(qi)) ≡ ∑_j βj indg(pj) (mod p − 1).

In addition to the rational primes pj, we should, therefore, include Φ(qi) in the
factor base B for small primes qi of O_K. We also consider a set of generators
of the group of units of O_K, and include Φ(u) in B for each such generator
u. I do not go into further detail on the NFSM, but mention only that
this method (under the assumptions mentioned above) runs in approximately
exp((1.526 + o(1))(ln n)^{1/3} (ln ln n)^{2/3}) time.
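As a toy illustration of the map Φ (assuming the simplest choice f(x) = x^2 + 1, which makes the number ring Z[i] and essentially recovers the Gaussian integer method; p = 997 and m = 161 are illustrative values), one can check that Φ respects products, which is all that relation-building uses:

```python
import random

p, m = 997, 161
assert (m * m + 1) % p == 0            # f(m) ≡ 0 (mod p) for f(x) = x^2 + 1

def phi(a, b):
    """Φ(a + bθ) = (a + b*m) mod p, with θ a root of f."""
    return (a + b * m) % p

random.seed(7)
for _ in range(100):
    a1, b1, a2, b2 = (random.randrange(-50, 50) for _ in range(4))
    # product in Z[θ]/(θ^2 + 1): (a1 + b1 θ)(a2 + b2 θ), using θ^2 = -1
    pa, pb = a1 * a2 - b1 * b2, a1 * b2 + a2 * b1
    assert phi(pa, pb) == phi(a1, b1) * phi(a2, b2) % p
```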
Element-wise unique factorization may fail in the ring O_K. If so, we need
to look at unique factorization at the level of ideals. This variant of the NFSM
runs in approximately exp((1.923 + o(1))(ln n)^{1/3} (ln ln n)^{2/3}) time.
11 Daniel M. Gordon, Discrete logarithms in GF(p) using the number field sieve, SIAM
Journal on Discrete Mathematics, 6, 124–138, 1993.



7.3 Algorithms for Fields of Characteristic Two


Like prime fields, the finite fields F_{2^n} of characteristic two (also called
binary fields) find diverse applications. Indeed, arithmetic in binary fields
can be implemented efficiently (compared with arithmetic in general extension
fields F_{p^n}). In this section, we assume that F_{2^n} has the polynomial-basis
representation F_2(θ) with f(θ) = 0, where f(x) ∈ F_2[x] is an irreducible
polynomial of degree n. Moreover, we assume that a primitive element
g(θ) ∈ F*_{2^n} is provided to us. We calculate logarithms of elements of F*_{2^n}
to the base g(θ).

All index calculus methods for F_{2^n} essentially try to factor non-zero
polynomials of F_2[x] into irreducible polynomials of small degrees. The concept of
smoothness of polynomials plays an important role in this context. Before
proceeding to a discussion of the algorithms, we need to specify density estimates
for smooth polynomials (a counterpart of Theorem 6.13).

Theorem 7.11 A non-zero polynomial in F_2[x] of degree k has all its
irreducible factors of degrees ≤ m with probability

p(k, m) = exp((−1 + o(1)) (k/m) ln(k/m)).

This estimate is valid for k → ∞, m → ∞, and k^{1/100} ≤ m ≤ k^{99/100}.


A polynomial h(x) chosen uniformly at random from the set of non-zero
polynomials in F_2[x] of degrees < l has probability 2^k/(2^l − 1) ≈ 2^{−(l−k)} of
having degree exactly k. Therefore, such an h(x) has all irreducible factors of
degrees ≤ m with approximate probability

∑_{k=0}^{l−1} 2^{−(l−k)} p(k, m) ≈ p(l, m) [ (le/m)^{1/m} / (2 − (le/m)^{1/m}) ].

As l → ∞, m → ∞, and l^{1/100} ≤ m ≤ l^{99/100}, this probability is approximately
equal to p(l, m). ⊳

A special case of this estimate is highlighted now.

Corollary 7.12 Let l = αn and m = β√(n ln n) for some positive real
constants α, β with α ≤ 1. Then,

p(l, m) ≈ exp(−(α/(2β)) √(n ln n)) = L[−α/(2β)],

where the (revised) notation L[γ] stands for exp((γ + o(1)) √(n ln n)). ⊳
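The estimate of Theorem 7.11 can be compared with exact counts for small parameters. The sketch below counts m-smooth polynomials of degree exactly k over F_2 by dynamic programming over the Gauss/Möbius count of irreducibles (a verification aid only, not part of any algorithm above):

```python
from math import comb

def mobius(n):
    """Möbius function, by trial division."""
    if n == 1:
        return 1
    result, m = 1, n
    p = 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0            # squared prime factor
            result = -result
        p += 1
    if m > 1:
        result = -result
    return result

def num_irred(d):
    """Number of irreducible polynomials of degree d over GF(2)."""
    return sum(mobius(e) * 2 ** (d // e) for e in range(1, d + 1) if d % e == 0) // d

def smooth_count(k, m):
    """Polynomials of degree exactly k whose irreducible factors all have degree <= m."""
    dp = [1] + [0] * k
    for d in range(1, m + 1):
        I, new = num_irred(d), [0] * (k + 1)
        for j in range(k + 1):
            for mult in range(j // d + 1):
                # multisets of `mult` irreducibles of degree d: C(I+mult-1, mult)
                new[j] += dp[j - mult * d] * comb(I + mult - 1, mult)
        dp = new
    return dp[k]
```

For instance, 9 of the 16 degree-4 polynomials are 2-smooth, whereas the crude estimate exp(−(4/2) ln(4/2)) = 1/4 would suggest 4 of 16; the asymptotic form is accurate only for large k/m.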

7.3.1 Basic Index Calculus Method


We start with a factor base B consisting of the non-constant irreducible
polynomials w(x) ∈ F_2[x] of degrees ≤ m (the choice of m is specified later). In
the first stage, we compute the discrete logarithms of all w(θ) to the base
g(θ). To this end, we raise g(θ) to random exponents α, and try to express
the canonical representatives of g(θ)^α ∈ F_{2^n} as products of polynomials w(θ)
with w(x) ∈ B. If a factoring attempt is successful, we obtain a relation:

g(x)^α ≡ ∏_{w(x)∈B} w(x)^γw (mod f(x)),

or equivalently,

g(θ)^α = ∏_{w(x)∈B} w(θ)^γw ∈ F_{2^n} = F_2(θ).

Taking the discrete logarithm of both sides, we get

α ≡ ∑_{w∈B} γw ind_{g(θ)} w(θ) (mod 2^n − 1).

This is a linear congruence in the variables ind_{g(θ)} w(θ). Let |B| = t. We vary
α, and generate s relations with t ≤ s ≤ 2t. The resulting s × t system is
solved modulo 2^n − 1 to obtain the indices of the elements of the factor base.
The second stage computes individual logarithms using the database
obtained from the first stage. Suppose that we want to compute the index of
a(θ) ∈ F*_{2^n}. We pick a random α, and attempt to decompose a(θ) g(θ)^α into
irreducible factors of degrees ≤ m. A successful factoring attempt gives

a(θ) g(θ)^α = ∏_{w∈B} w(θ)^δw, that is,

ind_{g(θ)} a(θ) ≡ −α + ∑_{w∈B} δw ind_{g(θ)} w(θ) (mod 2^n − 1).

Example 7.13 Let us take n = 17, and represent F_{2^17} = F_2(θ), where θ^17 +
θ^3 + 1 = 0. The size of F*_{2^n} is 2^n − 1 = 131071, which is a prime. Therefore,
every element of F_{2^n} other than 0, 1 is a generator of F*_{2^n}. Let us compute
discrete logarithms to the base g(θ) = θ^7 + θ^5 + θ^3 + θ. We choose m = 4, that
is, the factor base is B = {w1, w2, w3, w4, w5, w6, w7, w8}, where
w1 = θ,
w2 = θ + 1,
w3 = θ^2 + θ + 1,
w4 = θ^3 + θ + 1,
w5 = θ^3 + θ^2 + 1,
w6 = θ^4 + θ + 1,
w7 = θ^4 + θ^3 + θ^2 + θ + 1,
w8 = θ^4 + θ^3 + 1.

In the first stage, we compute g(θ)^α for random α ∈ {1, 2, . . . , 2^n − 2}.
Some values of α, leading to B-smooth values of g(θ)^α, are shown below.

α       g(θ)^α
73162   θ^16 + θ^15 + θ^12 + θ^11 + θ^10 + θ^8 + θ^6 + θ^5 + θ^4 + θ^3 + 1
        = (θ^2 + θ + 1)^3 (θ^3 + θ + 1)^2 (θ^4 + θ + 1)
87648   θ^16 + θ^12 + θ^11 + θ^8 + θ^7 + θ^5 + θ^4 + θ^2 + θ + 1
        = (θ + 1)^2 (θ^2 + θ + 1)^2 (θ^3 + θ + 1)(θ^3 + θ^2 + 1)(θ^4 + θ^3 + 1)
18107   θ^15 + θ^14 + θ^13 + θ^12 + θ^8 + θ^7 + θ^6 + θ^5
        = θ^5 (θ + 1)^4 (θ^3 + θ + 1)(θ^3 + θ^2 + 1)
31589   θ^16 + θ^14 + θ^12 + θ^11 + θ^10 + θ^8 + θ^7 + θ^5 + θ + 1
        = (θ + 1)^7 (θ^2 + θ + 1)^3 (θ^3 + θ + 1)
26426   θ^14 + θ^13 + θ^11 + θ^9 + θ^8 + θ^7 + θ^6 + θ^5 + θ^4 + θ^3 + θ
        = θ (θ^3 + θ + 1)^3 (θ^4 + θ^3 + θ^2 + θ + 1)
74443   θ^15 + θ^14 + θ^13 + θ^11 + θ^10 + θ^6 + θ^4 + θ^3 + θ + 1
        = (θ + 1)(θ^2 + θ + 1)(θ^3 + θ + 1)^3 (θ^3 + θ^2 + 1)
29190   θ^16 + θ^14 + θ^13 + θ^10 + θ^7 + θ^3 + θ
        = θ (θ^2 + θ + 1)(θ^3 + θ^2 + 1)^3 (θ^4 + θ + 1)
109185  θ^16 + θ^15 + θ^14 + θ^13 + θ^11 + θ^9 + θ^8 + θ^7 + θ^6 + θ^2 + θ + 1
        = (θ + 1)^3 (θ^2 + θ + 1)(θ^3 + θ^2 + 1)(θ^4 + θ^3 + 1)(θ^4 + θ^3 + θ^2 + θ + 1)

Taking logarithms of the above relations leads to the following linear system.
We use the notation di = indg(wi).

[ 0 0 3 2 0 1 0 0 ] [ d1 ]   [  73162 ]
[ 0 2 2 1 1 0 0 1 ] [ d2 ]   [  87648 ]
[ 5 4 0 1 1 0 0 0 ] [ d3 ]   [  18107 ]
[ 0 7 3 1 0 0 0 0 ] [ d4 ] ≡ [  31589 ]  (mod 131071).
[ 1 0 0 3 0 0 1 0 ] [ d5 ]   [  26426 ]
[ 0 1 1 3 1 0 0 0 ] [ d6 ]   [  74443 ]
[ 1 0 1 0 3 1 0 0 ] [ d7 ]   [  29190 ]
[ 0 3 1 0 1 0 1 1 ] [ d8 ]   [ 109185 ]
Solving the system gives the indices of the elements of the factor base (all
congruences modulo 131071):

d1 ≡ indg(w1) ≡ indg(θ) ≡ 71571
d2 ≡ indg(w2) ≡ indg(θ + 1) ≡ 31762
d3 ≡ indg(w3) ≡ indg(θ^2 + θ + 1) ≡ 5306
d4 ≡ indg(w4) ≡ indg(θ^3 + θ + 1) ≡ 55479
d5 ≡ indg(w5) ≡ indg(θ^3 + θ^2 + 1) ≡ 2009
d6 ≡ indg(w6) ≡ indg(θ^4 + θ + 1) ≡ 77357
d7 ≡ indg(w7) ≡ indg(θ^4 + θ^3 + θ^2 + θ + 1) ≡ 50560
d8 ≡ indg(w8) ≡ indg(θ^4 + θ^3 + 1) ≡ 87095
In the second stage, we compute the index of a(θ) = θ^15 + θ^7 + 1. For the
choice α = 3316, the element a(θ) g(θ)^α factors completely over B:

a(θ) g(θ)^3316 = θ^16 + θ^14 + θ^9 + θ^6 + θ^4 + θ^3 + θ^2 + 1
= (θ + 1)^3 (θ^2 + θ + 1)^3 (θ^3 + θ^2 + 1)(θ^4 + θ^3 + 1).

This gives

indg a ≡ −3316 + 3 d2 + 3 d3 + d5 + d8
≡ −3316 + 3 × 31762 + 3 × 5306 + 2009 + 87095
≡ 65921 (mod 131071).

One can verify that (θ^7 + θ^5 + θ^3 + θ)^65921 = θ^15 + θ^7 + 1. □
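The closing claim can indeed be verified with a few lines of bit-twiddling. In this sketch (a verification aid, not code from the book), an element of F_{2^17} is an int whose bit i encodes θ^i:

```python
# F_{2^17} = F_2[θ]/(f), with f(θ) = θ^17 + θ^3 + 1.
F, N = (1 << 17) | (1 << 3) | 1, 17

def gf_mul(a, b):
    """Product in F_{2^17}, reducing by f whenever θ^17 appears."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= F
    return r

def gf_pow(a, e):
    """Square-and-multiply exponentiation in F_{2^17}."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

g = (1 << 7) | (1 << 5) | (1 << 3) | (1 << 1)    # g(θ) = θ^7 + θ^5 + θ^3 + θ
a = (1 << 15) | (1 << 7) | 1                      # a(θ) = θ^15 + θ^7 + 1
assert gf_pow(g, 2**17 - 1) == 1                  # order divides 131071 (prime)
assert gf_pow(g, 65921) == a                      # ind_g(a) = 65921, as computed
```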
I now deduce the optimal running time of this basic index calculus method.
In the first stage, α is chosen randomly from {1, 2, . . . , 2^n − 2}, and accordingly,
g^α is a random element of F*_{2^n}, that is, a polynomial of degree O(n). We
take m = c√(n ln n) for some positive real constant c. By Corollary 7.12, the
probability that all irreducible factors of g^α have degrees ≤ m is L[−1/(2c)], that
is, L[1/(2c)] random values of α need to be tried for obtaining a single relation.
By Corollary 3.7, the total number of irreducible polynomials in F_2[x] of
degree k is nearly 2^k/k. Therefore, the size of the factor base is t = |B| ≈
∑_{k=1}^{m} 2^k/k. Evidently, 2^m/m ≤ t ≤ m 2^m, that is, t = exp((ln 2 + o(1)) m).
Putting m = c√(n ln n) gives t = exp((c ln 2 + o(1)) √(n ln n)) = L[c ln 2]. Since
s relations (with t ≤ s ≤ 2t) need to be generated, an expected number of
L[1/(2c) + c ln 2] random values of α need to be tried. Each such trial involves
factoring g(θ)^α. We have polynomial-time (randomized) algorithms for
factoring polynomials over finite fields (trial division by the L[c ln 2] elements of B is
rather costly), so the relation-collection stage runs in L[1/(2c) + c ln 2] time. The
quantity 1/(2c) + c ln 2 is minimized for c = 1/√(2 ln 2), which leads to a running
time of L[√(2 ln 2)] = L[1.1774 . . .] for the relation-collection phase.

The size of the factor base is t = L[√(ln 2 / 2)] = L[0.5887 . . .]. Each relation
contains at most O(m) irreducible polynomials, so the resulting system of
congruences is sparse and can be solved in (essentially) quadratic time, that
is, in time L[√(2 ln 2)], the same as taken by the relation-collection phase.

The second stage involves obtaining a single relation (one smooth value
of a g^α), and can be accomplished in expected time L[1/(2c)] = L[√(ln 2 / 2)] =
L[0.5887 . . .], that is, much faster than the first stage.

7.3.1.1 A Faster Relation-Collection Strategy


Blake et al.12 propose a heuristic trick to speed up the relation-collection
stage. Although their trick does not improve the asymptotic running time,
it significantly reduces the number of iterations for obtaining each relation.
Let me explain this trick in connection with the first stage. An analogous
improvement applies to the second stage too.
12 Ian F. Blake, Ryoh Fuji-Hara, Ronald C. Mullin and Scott A. Vanstone, Computing
logarithms in finite fields of characteristic two, SIAM Journal on Algebraic and Discrete
Methods, 5, 276–285, 1984.

Let us denote h(θ) = g(θ)^α for a randomly chosen α. Instead of making an
attempt to factor h(θ), we first run an extended gcd calculation on f(x) and
h(x) in F_2[x]. The gcd loop maintains the invariant ui(x) f(x) + vi(x) h(x) =
ri(x) for polynomials ui(x), vi(x), ri(x) ∈ F_2[x]. Initially, vi(x) = 1 is of low
degree, and ri(x) = h(x) is of degree O(n). Eventually, the remainder
polynomial ri(x) becomes 1 (low degree), and vi(x) attains a degree O(n). Somewhere
in the middle, we expect to have deg vi(x) ≈ deg ri(x) ≈ n/2. When this
happens, we stop the extended gcd loop. Since f(θ) = 0, we get vi(θ) h(θ) = ri(θ),
that is, h(θ) = vi(θ)^{−1} ri(θ) with both vi and ri of degrees nearly n/2.

We replace the attempt to factor h(θ) by an attempt to factor both vi(θ)
and ri(θ). The probability that h(θ) is smooth with respect to B is about
p(n, m), whereas the probability that both vi(θ) and ri(θ) are smooth with
respect to B is about p(n/2, m)^2. From Theorem 7.11, we have

p(n/2, m)^2 / p(n, m) ≈ exp((1 + o(1)) (n/m) ln 2) ≈ 2^{n/m}.

This means that we expect to obtain smooth values of ri(θ)/vi(θ) about 2^{n/m}
times more often than we expect to find smooth values of h(θ). Although this
factor is absorbed in the o(1) term in the running time L[√(2 ln 2)] of the first
stage, the practical benefit of this trick is clearly noticeable.

Example 7.14 Let us continue with the parameters of Example 7.13. I
demonstrate a situation where h(θ) fails to be smooth, whereas ri(θ)/vi(θ) is
smooth and leads to a relation.

Suppose we choose α = 39864. We obtain h(θ) = g(θ)^α = θ^16 + θ^12 + θ^8 +
θ^7 + θ^6 + θ + 1 = (θ^2 + θ + 1)^2 (θ^12 + θ^10 + θ^8 + θ^3 + θ^2 + θ + 1), that is, h(θ) fails
to factor completely over B (the degree-twelve polynomial being irreducible).
We now compute the extended gcd of f(x) and h(x). Since we are interested
only in the v sequence, there is no need to keep track of the u sequence. The
following table summarizes the iterations of the extended gcd loop, with the
first two rows (i = 0, 1) corresponding to the initialization step.

i   qi           ri                                                   vi
0   −            f(x) = x^17 + x^3 + 1                                0
1   −            h(x) = x^16 + x^12 + x^8 + x^7 + x^6 + x + 1         1
2   x            x^13 + x^9 + x^8 + x^7 + x^3 + x^2 + x + 1           x
3   x^3          x^11 + x^10 + x^8 + x^7 + x^5 + x^4 + x^3 + x + 1    x^4 + 1
4   x^2 + x + 1  x^9 + x^8 + x^7 + x^5 + x^3 + x^2 + x                x^6 + x^5 + x^4 + x^2 + 1
5   x^2 + 1      x^7 + x^5 + x^3 + x^2 + 1                            x^8 + x^7 + x^5 + x^4

We stop the extended gcd loop as soon as deg ri(x) ≤ n/2. In this example,
this happens for i = 5. We have the factorizations:

r5(θ) = θ^7 + θ^5 + θ^3 + θ^2 + 1 = (θ^3 + θ + 1)(θ^4 + θ + 1),
v5(θ) = θ^8 + θ^7 + θ^5 + θ^4 = θ^4 (θ + 1)^2 (θ^2 + θ + 1).

This implies that

g(θ)^39864 = h(θ) = r5(θ)/v5(θ)
= θ^{−4} (θ + 1)^{−2} (θ^2 + θ + 1)^{−1} (θ^3 + θ + 1)(θ^4 + θ + 1).

Taking logarithms to the base g(θ) yields the relation:

−4 d1 − 2 d2 − d3 + d4 + d6 ≡ 39864 (mod 131071).

The above extended-gcd table shows that deg vi(x) + deg ri(x) ≈ n for
all values of i. Therefore, when deg ri(x) ≈ n/2, we have deg vi(x) ≈ n/2
too. This is indeed the expected behavior. However, there is no theoretical
guarantee (or proof) that this behavior is exhibited in all (or most) cases. As
a result, the modification of Blake et al. is only heuristic. □
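The half-gcd computation of this example can be reproduced with int bitmasks for F_2[x] (bit i encodes x^i); a hedged sketch, stopping the Euclid loop at half degree exactly as described above:

```python
def deg(a):
    return a.bit_length() - 1          # deg(0) = -1 by convention

def pmul(a, b):                        # carry-less product in F_2[x]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):                     # polynomial division in F_2[x]
    q = 0
    while deg(a) >= deg(b):
        sh = deg(a) - deg(b)
        q ^= 1 << sh
        a ^= b << sh
    return q, a

def half_gcd(f, h):
    """Extended Euclid on (f, h), stopped once deg r <= deg(f)//2.
    Returns (r, v) with v(x) h(x) ≡ r(x) (mod f(x))."""
    r0, r1, v0, v1 = f, h, 0, 1
    while deg(r1) > deg(f) // 2:
        q, r = pdivmod(r0, r1)
        r0, r1, v0, v1 = r1, r, v1, v0 ^ pmul(q, v1)
    return r1, v1

f = (1 << 17) | (1 << 3) | 1                      # x^17 + x^3 + 1
h = sum(1 << i for i in (16, 12, 8, 7, 6, 1, 0))  # h(x) of Example 7.14
r, v = half_gcd(f, h)
assert r == sum(1 << i for i in (7, 5, 3, 2, 0))  # x^7 + x^5 + x^3 + x^2 + 1
assert v == sum(1 << i for i in (8, 7, 5, 4))     # x^8 + x^7 + x^5 + x^4
assert pdivmod(pmul(v, h), f)[1] == r             # invariant: v h ≡ r (mod f)
```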

7.3.2 Linear Sieve Method (LSM)


The basic index calculus method checks the smoothness of polynomials of
degrees nearly n. Blake et al.’s modification speeds this up by checking pairs
of polynomials for smoothness, each having degree about n/2. The linear sieve
method, an adaptation of the quadratic sieve method for factoring integers or
the linear sieve method for prime fields, checks single polynomials of degrees
about n/2 for smoothness. This gives the linear sieve method a running time
of L[0.8325 . . .], better than L[1.1774 . . .] time taken by the basic method.
We assume that the defining polynomial f(x) is of the special form x^n +
f1(x), where f1(x) is a polynomial of low degree. For the linear sieve method,
it suffices to have f1(x) of degree no more than n/2. Density estimates for
irreducible polynomials in F2[x] suggest that we can expect to find irreducible
polynomials of the form x^n + f1(x) with deg f1(x) as low as O(log n).
Let ν = ⌈n/2⌉. We consider a factor base B with two parts: B1 containing
non-constant irreducible polynomials of F2[x] having degrees ≤ m (the choice
of m to be made precise later), and B2 containing polynomials of the form
x^ν + c(x) with polynomials c(x) ∈ F2[x] of degrees < m (or ≤ m). The size of
B1 is between 2^m/m and m2^m, as deduced earlier. The size of B2 is 2^m (or
2^(m+1) if we allow deg c(x) = m). To sum up, |B| = Õ(2^m).
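These size estimates can be sanity-checked by enumerating B1 directly. The sketch below (an illustration, not the book's code; polynomials are integers with bit i holding the coefficient of x^i) lists the irreducible polynomials of degree ≤ m over F2 by trial division and compares |B1| with the stated bounds.

```python
# A small sketch (not from the book): enumerate the irreducible polynomials
# of F2[x] of degree <= m by trial division, and check |B1| against the
# bounds 2^m/m <= |B1| <= m*2^m stated in the text.

def deg(a):
    return a.bit_length() - 1

def mod2(a, b):
    # Remainder of a modulo b over GF(2).
    while deg(a) >= deg(b):
        a ^= b << (deg(a) - deg(b))
    return a

def irreducibles_up_to(m):
    irr = []
    for p in range(2, 1 << (m + 1)):        # all polynomials of degree 1..m
        # p is irreducible iff no smaller irreducible of degree <= deg(p)/2
        # divides it; smaller irreducibles are found first (smaller integers).
        if all(mod2(p, q) != 0 for q in irr if 2 * deg(q) <= deg(p)):
            irr.append(p)
    return irr

sizes = {m: len(irreducibles_up_to(m)) for m in (4, 7, 8)}
```

The counts 8, 41 and 71 for m = 4, 7, 8 are exactly the factor-base sizes quoted in Examples 7.15, 7.17 and 7.16 below.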
Let us multiply two polynomials x^ν + c1(x) and x^ν + c2(x) from B2:

(x^ν + c1(x))(x^ν + c2(x))
  ≡ x^2ν + (c1(x) + c2(x)) x^ν + c1(x) c2(x)
  ≡ x^ε f1(x) + (c1(x) + c2(x)) x^ν + c1(x) c2(x) (mod f(x)).

Here, ε = 2⌈n/2⌉ − n is 0 or 1 (depending on whether n is even or odd). Let
us call T(c1, c2) = x^ε f1(x) + (c1(x) + c2(x)) x^ν + c1(x) c2(x). If m ≪ n (in fact,
m = O(√(n ln n)), as we see later), T(c1, c2) is of degree slightly larger than
n/2. We try to factor T(c1, c2) completely over the irreducible polynomials of
B1. If the factoring attempt is successful, we get a relation:

(x^ν + c1(x))(x^ν + c2(x)) ≡ x^ε f1(x) + (c1(x) + c2(x)) x^ν + c1(x) c2(x)
                           ≡ ∏_{w∈B1} w(x)^{γw} (mod f(x)),

or, equivalently,

(θ^ν + c1(θ))(θ^ν + c2(θ)) = θ^ε f1(θ) + (c1(θ) + c2(θ)) θ^ν + c1(θ) c2(θ) = ∏_{w∈B1} w(θ)^{γw}.

Taking logarithms, we get

ind_{g(θ)}(θ^ν + c1(θ)) + ind_{g(θ)}(θ^ν + c2(θ)) ≡ Σ_{w∈B1} γw ind_{g(θ)}(w(θ)) (mod 2^n − 1).

This is a linear congruence in the indices of the elements of the factor base
B = B1 ∪ B2. We assume that g(x) itself is an irreducible polynomial of small
degree, that is, g(x) = wk(x) ∈ B1 for some k. This gives us a free relation:
ind_{g(θ)}(wk(θ)) ≡ 1 (mod 2^n − 1).
As c1 and c2 range over all polynomials in B2, many relations are generated.
Let t = |B|. The parameter m should be chosen so that all (c1, c2) pairs lead
to an expected number s of relations satisfying t ≤ s ≤ 2t. The resulting
system of linear congruences is then solved modulo 2^n − 1. This completes the
first stage of the linear sieve method.

Example 7.15  Let us take n = 17, and represent F_{2^17} = F2(θ), where θ^17 +
θ^3 + 1 = 0. Here, f1(x) = x^3 + 1 is of degree much smaller than n/2. Since
|F*_{2^17}| = 131071 is a prime, every element of F*_{2^17}, other than 1, is a generator
of the multiplicative group F*_{2^17}.
We have ν = ⌈n/2⌉ = 9, and ε = 2ν − n = 1. We take m = 4, that is, B1
consists of the eight irreducible polynomials w1, w2, ..., w8 of Example 7.13,
whereas B2 consists of the 2^4 = 16 polynomials x^9 + a3 x^3 + a2 x^2 + a1 x + a0
with each ai ∈ {0, 1}. Let us name these polynomials w_{9+(a3 a2 a1 a0)_2}, that
is, B2 = {w9, w10, ..., w24}. It follows that |B| = |B1| + |B2| = 8 + 16 = 24.
We vary c1 and c2 over all polynomials of B2. In order to avoid repetitions,
we take c1 = wi for i = 9, 10, ..., 24, and, for each i, we take c2 = wj for
j = i, i + 1, ..., 24. Exactly 24 smooth polynomials T(c1, c2) are obtained.
Let us now see how these smooth values of T(c1, c2) lead to linear con-
gruences. As an example, consider the relation (x^9 + x^2 + 1)(x^9 + x^2 + x) ≡
x^2 (x + 1)^2 (x^3 + x + 1)(x^3 + x^2 + 1) (mod f(x)). Substituting x = θ gives
(θ^9 + θ^2 + 1)(θ^9 + θ^2 + θ) = θ^2 (θ + 1)^2 (θ^3 + θ + 1)(θ^3 + θ^2 + 1), that is,
w14 w15 = w1^2 w2^2 w4 w5, that is, 2d1 + 2d2 + d4 + d5 − d14 − d15 ≡ 0 (mod 2^17 − 1),
where di = ind_{g(θ)}(wi(θ)).

c1              c2                  T(c1, c2)
0               0                   x^4 + x = x(x + 1)(x^2 + x + 1)
0               x^3 + 1             x^12 + x^9 + x^4 + x = x(x + 1)^9 (x^2 + x + 1)
1               1                   x^4 + x + 1
x               x                   x^4 + x^2 + x = x(x^3 + x + 1)
x               x^2                 x^11 + x^10 + x^4 + x^3 + x = x(x^2 + x + 1)^3 (x^4 + x + 1)
x               x^3 + 1             x^12 + x^10 + x^9 = x^9 (x^3 + x + 1)
x               x^3 + x^2 + 1       x^12 + x^11 + x^10 + x^9 + x^3
                                      = x^3 (x^2 + x + 1)(x^3 + x^2 + 1)(x^4 + x^3 + x^2 + x + 1)
x + 1           x + 1               x^4 + x^2 + x + 1 = (x + 1)(x^3 + x^2 + 1)
x + 1           x^3 + x^2           x^12 + x^11 + x^10 + x^9 + x^2 + x
                                      = x(x + 1)(x^2 + x + 1)^2 (x^3 + x + 1)^2
x + 1           x^3 + x^2 + x       x^12 + x^11 + x^9 = x^9 (x^3 + x^2 + 1)
x^2             x^2                 x
x^2 + 1         x^2 + 1             x + 1
x^2 + 1         x^2 + x             x^10 + x^9 + x^3 + x^2
                                      = x^2 (x + 1)^2 (x^3 + x + 1)(x^3 + x^2 + 1)
x^2 + x         x^2 + x             x^2 + x = x(x + 1)
x^2 + x         x^2 + x + 1         x^9
x^2 + x         x^3 + x^2 + x + 1   x^12 + x^9 + x^5 + x^4 = x^4 (x + 1)^4 (x^4 + x + 1)
x^2 + x + 1     x^2 + x + 1         x^2 + x + 1
x^2 + x + 1     x^3                 x^12 + x^11 + x^10 + x^9 + x^5 + x^3 + x
                                      = x(x^3 + x + 1)(x^4 + x + 1)(x^4 + x^3 + 1)
x^3 + 1         x^3 + 1             x^6 + x^4 + x + 1 = (x + 1)(x^2 + x + 1)(x^3 + x + 1)
x^3 + x         x^3 + x             x^6 + x^4 + x^2 + x = x(x + 1)(x^4 + x^3 + 1)
x^3 + x         x^3 + x + 1         x^9 + x^6 + x^4 + x^3 + x^2 = x^2 (x^2 + x + 1)^2 (x^3 + x + 1)
x^3 + x         x^3 + x^2           x^11 + x^10 + x^6 + x^5 + x^3 + x
                                      = x(x + 1)^5 (x^2 + x + 1)(x^3 + x^2 + 1)
x^3 + x^2       x^3 + x^2           x^6 + x = x(x + 1)(x^4 + x^3 + x^2 + x + 1)
x^3 + x^2 + x   x^3 + x^2 + x       x^6 + x^2 + x = x(x^2 + x + 1)(x^3 + x^2 + 1)

All the 24 equations generated above are homogeneous, and so admit the
trivial solution di = 0 for all i. In order to make the system non-homogeneous,
we take g(θ) = wk(θ) for some k, 1 ≤ k ≤ 8. For example, if we take g(θ) =
θ^2 + θ + 1, we obtain the free relation d3 ≡ 1 (mod 2^17 − 1).
The resulting 25 × 24 system is now solved to compute all the unknown
indices di for 1 ≤ i ≤ 24. This step is not shown here. □
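The relation search of Example 7.15 is small enough to replay in full. The sketch below (not the book's code; polynomials are integers with bit i holding the coefficient of x^i) enumerates all pairs c1 ≤ c2, computes T(c1, c2) = x^ε f1(x) + (c1 + c2) x^ν + c1 c2, and keeps the pairs whose T is smooth over B1. It finds exactly the 24 smooth values stated in the example.

```python
# Sketch (using the representation described in the text) of the LSM relation
# search of Example 7.15: n = 17, f1 = x^3 + 1, nu = 9, eps = 1, m = 4.

def deg(a): return a.bit_length() - 1

def mul2(a, b):
    # Carry-less (GF(2)) polynomial multiplication.
    r = 0
    while b:
        if b & 1: r ^= a
        a <<= 1; b >>= 1
    return r

def mod2(a, b):
    while deg(a) >= deg(b): a ^= b << (deg(a) - deg(b))
    return a

def div2(a, b):
    q = 0
    while deg(a) >= deg(b):
        s = deg(a) - deg(b); q ^= 1 << s; a ^= b << s
    return q

B1 = [2, 3, 7, 11, 13, 19, 25, 31]   # the 8 irreducibles of degree <= 4

def smooth(t):
    # Trial-divide t by the elements of B1; t is smooth iff only 1 remains.
    for w in B1:
        while t != 1 and mod2(t, w) == 0:
            t = div2(t, w)
    return t == 1

f1, nu, eps = 0b1001, 9, 1
relations = []
for c1 in range(16):
    for c2 in range(c1, 16):          # c1 <= c2 avoids repetitions
        T = (f1 << eps) ^ ((c1 ^ c2) << nu) ^ mul2(c1, c2)
        if smooth(T):
            relations.append((c1, c2, T))
```

The pair (c1, c2) = (x^2 + 1, x^2 + x) produces T = x^10 + x^9 + x^3 + x^2, one of the rows of the table above.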

For deducing the running time of the LSM, we take m = c√(n ln n). The size
of B is then L[c ln 2]. The degree of each T(c1, c2) is nearly n/2, so T(c1, c2)
is smooth with respect to B1 with an approximate probability of L[−1/(4c)] (by
Corollary 7.12). The total number of T(c1, c2) values that are checked for
smoothness is 2^m (2^m + 1)/2 ≈ 2^(2m−1) = L[2c ln 2]. We, therefore, expect to
obtain L[−1/(4c) + 2c ln 2] relations. In order that the number of relations is
between t and 2t, we then require −1/(4c) + 2c ln 2 ≈ c ln 2, that is, c ≈ 1/(2√ln 2).
Each T(c1, c2) can be factored in (probabilistic) polynomial time. There are
about 2^(2m−1) = L[2c ln 2] values of the pair (c1, c2), so the relation-collection
stage can be completed in L[2c ln 2] time. Solving L[c ln 2] sparse congruences

in L[c ln 2] variables requires the same time. To sum up, the first stage of the
linear sieve method for F_2^n runs in L[2c ln 2] = L[√ln 2] = L[0.83255...] time.
For the fields Fp, we analogously need to factor all T(c1, c2) values (see
Section 7.2.2). In that case, the T(c1, c2) are integers, and we do not know any
easy way to factor them. Using trial division by the primes in the factor base
leads to subexponential time for each T(c1, c2). That is why we used sieving
in order to reduce the amortized effort of smoothness checking.
For F_2^n, on the other hand, we know good (polynomial-time) algorithms
(probabilistic ones, but that does not matter) for factoring each polynomial
T(c1, c2). Trial division is a significantly slower strategy, since the factor
base contains a subexponential number of irreducible polynomials of small
degrees. Moreover, sieving is not required to achieve the running time L[√ln 2],
and the name linear sieve method for F_2^n sounds like a bad choice.
However, a kind of polynomial sieving (Exercise 7.26) can be applied to
all the sieve algorithms for F_2^n. These sieves usually do not improve upon the
running time in the L[·] notation, because they affect only the o(1) terms in
the running times. But, in practice, these sieves do have the potential of
significantly speeding up the relation-collection stage.
In the second stage, we need to compute individual logarithms. If we
use a strategy similar to the second stage of the basic method, we spend
L[1/(2c)] = L[√ln 2] = L[0.83255...] time for each individual logarithm. This is
exactly the same as the running time of the first stage. The problem with this
strategy is that we now have a smaller database of indices of small irreducible
polynomials, compared to the basic method (L[1/(2√ln 2)] instead of L[1/√(2 ln 2)]).
Moreover, we fail to exploit the database of indices of the elements of B2.
A strategy similar to the second stage of the linear sieve method for prime
fields (Section 7.2.2) can be adapted to F_2^n (solve Exercise 7.19).

7.3.3 Cubic Sieve Method (CSM)


An adaptation of the cubic sieve method for factoring integers and for com-
puting discrete logarithms in prime fields, the cubic sieve method for F_2^n
generates a set of smoothness candidates T(c1, c2), each of degree about n/3.
The reduction in the degrees of these T(c1, c2) values leads to an improved
running time of L[0.67977...]. The cubic sieve method for factoring integers
or for index calculations in prime fields encounters the problem of solving the
congruence x^3 ≡ y^2 z (mod n) or x^3 ≡ y^2 z (mod p). The cubic sieve method for
computing indices in F_2^n does not suffer from this drawback, and is straight-
away applicable for any extension degree n. The only requirement now is that
the defining polynomial f(x) should be of the form f(x) = x^n + f1(x) with
deg f1(x) ≤ n/3. As argued in connection with the linear sieve method for
F_2^n, such an f(x) exists with high probability, and can be easily detected.
Let us take ν = ⌈n/3⌉, and ε = 3⌈n/3⌉ − n ∈ {0, 1, 2}. The factor base
B now consists of two parts: B1 containing irreducible polynomials in F2[x]
of degrees ≤ m, and B2 containing polynomials of the form x^ν + c(x) with
c(x) ∈ F2[x], and deg c(x) < m. Consider the product:

(x^ν + c1(x))(x^ν + c2(x))(x^ν + c1(x) + c2(x))
  ≡ x^3ν + (c1(x)^2 + c1(x) c2(x) + c2(x)^2) x^ν + c1(x) c2(x)(c1(x) + c2(x))
  ≡ x^ε f1(x) + (c1(x)^2 + c1(x) c2(x) + c2(x)^2) x^ν +
      c1(x) c2(x)(c1(x) + c2(x)) (mod f(x)).

Let us denote the last polynomial expression by T(c1, c2) (or as T(c1, c2, c3)
with c3 = c1 + c2). But deg f1(x) ≤ n/3 and m ≪ n (indeed, m = O(√(n ln n))),
so the degree of T(c1, c2) is slightly larger than n/3. We check whether
T(c1, c2) factors completely over irreducible polynomials in F2[x] of degrees
≤ m (that is, over B1). If so, we get a relation:

(x^ν + c1(x))(x^ν + c2(x))(x^ν + c1(x) + c2(x)) ≡ ∏_{w∈B1} w(x)^{γw} (mod f(x)),

or equivalently,

(θ^ν + c1(θ))(θ^ν + c2(θ))(θ^ν + c1(θ) + c2(θ)) = ∏_{w∈B1} w(θ)^{γw},

which leads to the linear congruence

ind_{g(θ)}(θ^ν + c1(θ)) + ind_{g(θ)}(θ^ν + c2(θ)) + ind_{g(θ)}(θ^ν + c1(θ) + c2(θ))
  ≡ Σ_{w∈B1} γw ind_{g(θ)}(w(θ)) (mod 2^n − 1).

By varying c1 and c2, we collect many such relations. We also assume that
g(θ) = wk(θ) ∈ B1 for some k. This gives us a free relation

ind_{g(θ)}(wk(θ)) ≡ 1 (mod 2^n − 1).

The resulting system is solved to compute the indices of the elements of B.

Example 7.16  It is difficult to illustrate the true essence of the cubic sieve
method for small values of n, as we have done in Examples 7.13 and 7.15. This
is because if n is too small, the expressions c1^2 + c1 c2 + c2^2 and c1 c2 (c1 + c2)
lead to T(c1, c2) values having degrees comparable to n.
We take n = 31, so ν = ⌈n/3⌉ = 11, and ε = 3ν − n = 2. The size of F*_{2^31}
is 2^31 − 1 = 2147483647, which is a prime. Thus, every element of F_{2^31}, other
than 0 and 1, is a generator of the group F*_{2^31}. We represent F_{2^31} as F2(θ),
where θ^31 + θ^3 + 1 = 0, that is, f1(x) = x^3 + 1 is of suitably small degree.
Let us choose m = 8. B1 contains all non-constant irreducible polynomials
of F2[x] of degrees ≤ 8. There are exactly 71 such polynomials. B2 consists of
the 2^8 = 256 polynomials of the form x^11 + c(x) with deg c(x) < 8. The size
of the factor base is, therefore, |B| = |B1| + |B2| = 71 + 256 = 327.
We generate the polynomials T(c1, c2, c3) (with c3 = c1 + c2), and check the
smoothness of these polynomials over B1. The polynomial T(c1, c2, c3) remains
the same for any of the six permutations of the arguments c1, c2, c3. To avoid
repetitions, we force the condition c1(2) ≤ c2(2) ≤ c3(2) before considering
T(c1, c2, c3). There are exactly 11,051 such tuples. Out of them, only 487 tuples
lead to smooth values of T(c1, c2, c3). I am not going to list all these values. Let
us instead look at a single demonstration. Take c1(x) = x^5 + x^3 + x^2 + x + 1, and
c2(x) = x^7 + x^6 + x^4 + x^3 + x^2, so c3(x) = c1(x) + c2(x) = x^7 + x^6 + x^5 + x^4 + x + 1.
Note that c1(2) = 47, c2(2) = 220, and c3(2) = 243, so, for this (c1, c2, c3),
the polynomial T(c1, c2, c3) is computed and factored.

(x^11 + x^5 + x^3 + x^2 + x + 1) × (x^11 + x^7 + x^6 + x^4 + x^3 + x^2) ×
(x^11 + x^7 + x^6 + x^5 + x^4 + x + 1)
  ≡ x^2 (x^3 + 1) + (c1^2 + c1 c2 + c2^2) x^11 + c1 c2 (c1 + c2)
  ≡ x^25 + x^22 + x^20 + x^19 + x^16 + x^15 + x^14 + x^12 + x^10 + x^9 + x^8 +
      x^6 + x^5 + x^4 + x^3
  ≡ x^3 (x^3 + x^2 + 1)(x^5 + x^2 + 1)(x^6 + x + 1) ×
      (x^8 + x^7 + x^6 + x^4 + x^3 + x^2 + 1) (mod f(x)).

Putting x = θ gives

(θ^11 + θ^5 + θ^3 + θ^2 + θ + 1) × (θ^11 + θ^7 + θ^6 + θ^4 + θ^3 + θ^2) ×
(θ^11 + θ^7 + θ^6 + θ^5 + θ^4 + θ + 1)
  = θ^3 (θ^3 + θ^2 + 1)(θ^5 + θ^2 + 1)(θ^6 + θ + 1) ×
      (θ^8 + θ^7 + θ^6 + θ^4 + θ^3 + θ^2 + 1).

If we number the polynomials in B in a fashion analogous to Example 7.15,
this relation can be rewritten as

w119 w292 w315 = w1^3 w5 w9 w15 w67.

With the notation di = ind_{g(θ)}(wi(θ)), we have the linear congruence

3d1 + d5 + d9 + d15 + d67 − d119 − d292 − d315 ≡ 0 (mod 2^31 − 1).

All these relations are homogeneous, so we need to include a free relation. For
instance, taking g(θ) = θ^7 + θ + 1 = w24 gives d24 ≡ 1 (mod 2^31 − 1). □
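The tuple counts in this example can be replayed directly. The sketch below (not the book's code; polynomials are integers with bit i holding the coefficient of x^i) enumerates the tuples with c1(2) ≤ c2(2) ≤ c3(2), computes T(c1, c2), and tests smoothness over the 71 irreducible polynomials of degree ≤ 8. The enumeration yields the 11,051 tuples mentioned above, and the tuple (47, 220, 243) demonstrated in the text is confirmed smooth; the number of smooth tuples found can then be compared with the 487 reported in the text.

```python
# A sketch (not the book's code) of the CSM tuple enumeration of
# Example 7.16: n = 31, f1 = x^3 + 1, nu = 11, eps = 2, m = 8.

def deg(a): return a.bit_length() - 1

def mul2(a, b):
    r = 0
    while b:
        if b & 1: r ^= a
        a <<= 1; b >>= 1
    return r

def mod2(a, b):
    while deg(a) >= deg(b): a ^= b << (deg(a) - deg(b))
    return a

def div2(a, b):
    q = 0
    while deg(a) >= deg(b):
        s = deg(a) - deg(b); q ^= 1 << s; a ^= b << s
    return q

def irreducibles_up_to(m):
    irr = []
    for p in range(2, 1 << (m + 1)):
        if all(mod2(p, q) != 0 for q in irr if 2 * deg(q) <= deg(p)):
            irr.append(p)
    return irr

B1 = irreducibles_up_to(8)                 # 71 polynomials

def smooth(t):
    for w in B1:
        while t != 1 and mod2(t, w) == 0:
            t = div2(t, w)
    return t == 1

def T(c1, c2):
    # x^eps f1 + (c1^2 + c1 c2 + c2^2) x^nu + c1 c2 (c1 + c2), all over GF(2)
    a = mul2(c1, c1) ^ mul2(c1, c2) ^ mul2(c2, c2)
    return (0b1001 << 2) ^ (a << 11) ^ mul2(mul2(c1, c2), c1 ^ c2)

# representatives with c1(2) <= c2(2) <= c3(2), where c3 = c1 + c2
tuples = [(c1, c2) for c1 in range(256) for c2 in range(c1, 256)
          if c2 <= (c1 ^ c2)]
n_smooth = sum(1 for c1, c2 in tuples if smooth(T(c1, c2)))
```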

Let us now deduce the running time of the cubic sieve method for F_2^n.
We choose m = c√(n ln n) for some constant c to be determined. The size of
B is t = L[c ln 2]. Since each T(c1, c2) is a polynomial of degree ≈ n/3, its
probability of being smooth is L[−1/(6c)]. The total number of triples (c1, c2, c3)
with c3 = c1 + c2, distinct with respect to permutations of the elements, is
exactly (2/3) × 4^(m−1) + 2^(m−1) + 1/3 = Θ(2^(2m)) = L[2c ln 2]. Thus, we expect to get
L[−1/(6c) + 2c ln 2] relations. In order to obtain a solvable system (more equations
than variables), we require −1/(6c) + 2c ln 2 ≈ c ln 2, that is, c ≈ 1/√(6 ln 2). This
choice leads to a running time of L[2c ln 2] = L[√(2(ln 2)/3)] for relation col-
lection. Solving a sparse system of L[c ln 2] congruences in as many variables
takes the same time. Thus, the first stage of the cubic sieve method runs in
L[√(2(ln 2)/3)] = L[0.67977...] time.
The second stage of the cubic sieve method for prime fields can be adapted
to work for F_2^n. The details are left to the reader (solve Exercise 7.20).

7.3.4 Coppersmith’s Method (CM)


Coppersmith's method13 runs in exp((c + o(1)) n^(1/3) (ln n)^(2/3)) time, and
is asymptotically faster than all L[c] algorithms for computing discrete loga-
rithms in F_2^n discussed so far. These earlier algorithms generate smoothness
candidates of degrees proportional to n. Speedup (reduction of c in the run-
ning time L[c]) was obtained by decreasing the constant of proportionality.
Coppersmith's method, on the other hand, generates smoothness candidates
of degrees about n^(2/3). This results in an asymptotic improvement in the run-
ning time. Note, however, that although Coppersmith's method has a running
time similar to that achieved by the number-field sieve method for factoring
integers or for computing indices in prime fields, Coppersmith's method is not
an adaptation of the number-field sieve method, nor conversely.
We assume that the defining polynomial f(x) is of the special form f(x) =
x^n + f1(x) with f1(x) having degree O(log n). We argued earlier that such an
irreducible polynomial f(x) can be found with very high probability.
Coppersmith's method requires selecting some initial parameters. First, a
positive integer k is chosen such that 2^k is O((n/ln n)^(1/3)). We set h = ⌊n/2^k⌋ + 1,
so h = Õ(n^(2/3)). Two bounds, m and b, are also chosen. The factor base B
consists of all non-constant irreducible polynomials in F2[x] of degrees ≤ m,
whereas b stands for a bound on the degrees of c1(x) and c2(x) (see below).
A relation in the first stage of Coppersmith's method is generated as fol-
lows. We choose polynomials c1(x), c2(x) of degrees less than b. In order to
avoid duplicate relations, we require gcd(c1(x), c2(x)) = 1. Let

T1(x) = c1(x) x^h + c2(x), and
T2(x) ≡ T1(x)^(2^k) ≡ T1(x^(2^k)) ≡ c1(x^(2^k)) x^(h 2^k) + c2(x^(2^k))
      ≡ c1(x^(2^k)) x^(h 2^k − n) f1(x) + c2(x^(2^k)) (mod f(x)).

The degree of T1(x) is less than b + h, whereas the degree of T2(x) is less than
(b + 1) 2^k. Since 2^k = Õ(n^(1/3)), choosing b = Õ(n^(1/3)) implies that both T1
and T2 are of degrees about n^(2/3) (recall that h = Õ(n^(2/3))).
If both T1 and T2 factor completely over the factor base B, that is, if

T1(x) = ∏_{w∈B} w(x)^{γw}, and T2(x) = ∏_{w∈B} w(x)^{δw},

13 Don Coppersmith, Fast evaluation of logarithms in fields of characteristic two, IEEE

Transactions on Information Theory, 30, 587–594, 1984.


the condition T2(x) ≡ T1(x)^(2^k) (mod f(x)) leads to the congruence

Σ_{w∈B} δw ind_{g(θ)}(w(θ)) ≡ 2^k Σ_{w∈B} γw ind_{g(θ)}(w(θ)) (mod 2^n − 1).

By varying c1 and c2, we generate more relations than variables (the size of
the factor base). Moreover, we assume that g(θ) is itself an irreducible polyno-
mial wj(θ) of small degree, so that we have the free relation ind_{g(θ)}(wj(θ)) ≡
1 (mod 2^n − 1). The resulting system of linear congruences is solved to obtain
the indices of the elements of the factor base.

Example 7.17  As in Example 7.16, let us choose n = 31 and f(x) = x^31 +
x^3 + 1, so that f1(x) = x^3 + 1 has a suitably small degree. We take k = 2,
so h = ⌊n/2^k⌋ + 1 = 8. We choose m = 7, that is, the factor base B consists
of all non-constant irreducible polynomials in F2[x] of degrees ≤ 7. There are
exactly 41 of them, that is, |B| = 41. Finally, we take the bound b = 5.
I now demonstrate how a relation can be obtained. Take c1(x) = x^2 + 1,
and c2(x) = x^4 + x^2 + x. We compute the two smoothness candidates as

T1(x) = c1(x) x^h + c2(x) = x^10 + x^8 + x^4 + x^2 + x,
T2(x) = c1(x^(2^k)) x^(h 2^k − n) f1(x) + c2(x^(2^k)) = x^16 + x^12 + x^9 + x^8 + x,

both of which factor completely into irreducible polynomials of degrees ≤ 7:

T1(x) = x (x^4 + x^3 + x^2 + x + 1)(x^5 + x^4 + x^3 + x^2 + 1),
T2(x) = x (x^4 + x + 1)(x^5 + x^2 + 1)(x^6 + x + 1).

Putting x = θ gives the congruence

d1 + d8 + d9 + d15 ≡ 4 (d1 + d7 + d14) (mod 2^31 − 1),

where di = ind_{g(θ)} wi(θ), and the wi are numbered by a scheme as in Example 7.13.


There are 512 pairs (c1(x), c2(x)) with c1(x) and c2(x) coprime and of
degrees < b = 5. We obtain 55 relations involving d1, d2, ..., d41. If we assume
g(θ) = w28(θ), then we have the free relation d28 ≡ 1 (mod 2^31 − 1). □
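The two smoothness candidates of Example 7.17 are easy to recompute. The sketch below (not the book's code; polynomials are integers with bit i holding the coefficient of x^i) builds T1 and T2 and also verifies the defining congruence T2 ≡ T1^(2^k) (mod f).

```python
# Sketch (not from the book) of the Coppersmith candidates of Example 7.17:
# n = 31, f = x^31 + x^3 + 1, k = 2, h = 8, c1 = x^2 + 1, c2 = x^4 + x^2 + x.

def deg(a): return a.bit_length() - 1

def mul2(a, b):
    r = 0
    while b:
        if b & 1: r ^= a
        a <<= 1; b >>= 1
    return r

def mod2(a, b):
    while deg(a) >= deg(b): a ^= b << (deg(a) - deg(b))
    return a

def sq2(a):
    # Squaring over GF(2) spreads the bits: (sum x^i)^2 = sum x^(2i).
    r, i = 0, 0
    while a:
        if a & 1: r ^= 1 << (2 * i)
        a >>= 1; i += 1
    return r

n, k, h = 31, 2, 8
f1 = 0b1001                        # x^3 + 1
f = (1 << n) ^ f1                  # x^31 + x^3 + 1
c1, c2 = 0b101, 0b10110            # x^2 + 1 and x^4 + x^2 + x

T1 = (c1 << h) ^ c2
# c(x^(2^k)) is c squared k times; here 2^k = 4
c1_4, c2_4 = sq2(sq2(c1)), sq2(sq2(c2))
T2 = mul2(mul2(c1_4, 1 << (h * 4 - n)), f1) ^ c2_4
```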

Let us now look at the running time of Coppersmith's method. The proba-
bility that both T1 and T2 are smooth is about p(b + h, m) p((b + 1) 2^k, m). There
are exactly 2^(2b−1) pairs (c1, c2) for which the polynomials T1 and T2 are com-
puted. So the expected number of relations is 2^(2b−1) p(b + h, m) p((b + 1) 2^k, m).
The factor base contains Õ(2^m) irreducible polynomials. In order to obtain a
system of congruences with slightly more equations than variables, we require

2^(2b−1) p(b + h, m) p((b + 1) 2^k, m) ≈ 2^m.

We take

2^k = α n^(1/3) (ln n)^(−1/3),  m = β n^(1/3) (ln n)^(2/3),  b = γ n^(1/3) (ln n)^(2/3).



The optimal choices β = γ = 22/3 (3 ln 2)−2/3 = 0.97436 . . . and α = γ −1/2 =


1.01306 . . . lead to the running time for the relation-collection phase as:
³ ´
exp (1.35075 . . . + o(1))n1/3 (ln n)2/3 .

A sparse system of O˜(2m ) congruences can be solved in the same time.


The second stage of Coppersmith’s method is somewhat involved. Suppose
that we want to compute the index of a(θ) ∈ F∗2n to the base g(θ). As in the
basic method, we choose random integers α and compute a(θ)g(θ)α . Now,
we have a significantly smaller factor base than in the basic method. So each
a(θ)g(θ)α is smooth with a very small probability, and finding a suitable value
of α would take too much time. We instead demand a(θ)g(θ)α to have all
irreducible factors with degrees 6 n2/3 1/3
¡ (ln n) . The probability of ¢success
per trial is p(n, n (ln n) ) = exp (1.0986 . . . + o(1))n1/3 (ln n)2/3 . Since
2/3 1/3

each trial involves factoring a polynomial of degree < n (a task that can be
finished in probabilistic polynomial time), we find a desired factorization in
the same subexponential time. Suppose that some α gives
r
Y
a(θ)g(θ)α = ui (θ)
i=1

with the degree of each ui (θ) no more than n2/3 (ln n)1/3 . In order to determine
indg(θ) (a(θ)), it suffices to compute indg(θ) (ui (θ)) for all i.
We reduce the computation of each ind_{g(θ)}(ui(θ)) to the computation of
ind_{g(θ)}(u_{i,j}(θ)) for some polynomials u_{i,j} of degrees smaller than that of ui.
This recursive process of replacing the index of a polynomial by the indices of
multiple polynomials of smaller degrees is repeated until we eventually arrive
at polynomials with degrees so reduced that they belong to the factor base.
In order to explain the reduction process, suppose that we want to compute
the index of a polynomial u(θ) of degree d ≤ n^(2/3) (ln n)^(1/3). The procedure is
similar to the first stage of Coppersmith's method. We first choose a positive
integer k satisfying 2^k ≈ √(n/d), and take h = ⌊n/2^k⌋ + 1. For relatively prime
polynomials c1(x), c2(x) of small degrees, we consider the two polynomials

T1(x) = c1(x) x^h + c2(x), and
T2(x) ≡ T1(x)^(2^k) ≡ c1(x^(2^k)) x^(h 2^k − n) f1(x) + c2(x^(2^k)) (mod f(x)).

We choose c1(x) and c2(x) in such a manner that T1(x) is a multiple of u(x).
We choose a degree d′, and want both T1(x)/u(x) and T2(x) to split into
irreducible factors of degrees ≤ d′. For a successful factorization, we have

∏_i vi(x) ≡ [u(x) ∏_j wj(x)]^(2^k) (mod f(x)),

where the vi(x) and wj(x) are polynomials of degrees ≤ d′. But then, ind_{g(θ)}(u(θ))
can be computed if the indices of all vi(θ) and wj(θ) are computed.

It can be shown that we can take d′ ≤ d/1.1. In that case, the depth of
recursion becomes O(log n) (recursion continues until the polynomials of B
are arrived at). Moreover, each reduction gives a branching of no more than n
new index calculations. That is, the total number of intermediate polynomials
created in the process is only n^O(log n) = exp(c (ln n)^2) (for a positive constant
c), and so the running time of the second stage of Coppersmith's method is
dominated by the initial search for a suitable a(θ) g(θ)^α. We argued earlier
that this step runs in exp((1.0986... + o(1)) n^(1/3) (ln n)^(2/3)) time.

7.4 Algorithms for General Extension Fields


Determination of indices in arbitrary fields F_q^n (where q may be a prime
or already a power of a prime, and n > 1) is a relevant computational ques-
tion. Not many algorithms have been studied in this area. It appears that many
subexponential-time algorithms used for prime and characteristic-two fields
can be adapted to the case of F_q^n too. However, the applicability and analy-
sis of these adaptations have not been rigorously studied. In contrast, a
specific adaptation of the number-field sieve method has been proposed and
studied both theoretically and experimentally.

7.4.1 A Basic Index Calculus Method


We represent F_q^n as an extension of F_q by a monic irreducible polynomial
f(x) ∈ F_q[x] of degree n. Suppose that q is not too large, so that it is feasible
to work with a factor base B consisting of all monic irreducible polynomials
in F_q[x] of degrees ≤ D for some bound D ≥ 1. Let us compute indices with
respect to a generator g of F*_q^n. For randomly chosen α ∈ {1, 2, ..., q^n − 2}, we
compute g^α as a polynomial in F_q[x] of degree < n. We attempt to factor this
polynomial completely over B. A successful factoring attempt gives a relation:

g^α ≡ κ ∏_i wi^{βi} (mod f(x)),

where κ ∈ F*_q, and the wi are the (monic) irreducible polynomials in the factor
base B. Taking logarithms to the base g gives

α ≡ ind_g κ + Σ_i βi ind_g wi (mod q^n − 1).

After collecting many relations, we solve the system to compute the unknown
indices ind_g wi of the factor-base elements. See below for how to handle
the unknown quantities ind_g κ for κ ∈ F*_q.
In the second stage, the individual logarithm of h ∈ F*_q^n is computed by
locating one B-smooth value of h g^α.

The problem with this method is that, for large values of q, it may even be
infeasible to work with a factor base B containing only all the linear polyno-
mials of F_q[x] (there are q of them), a situation corresponding to D = 1. A
suitable subset of linear polynomials may then act as the factor base.
Another problem associated with the basic method is that g^α, even if
smooth over B, need not be monic. If its leading coefficient is κ, we need to
include ind_g κ too in the relation. However, this is not a serious problem. If
q is not very large, it is feasible to compute indices in F*_q. More precisely,
g′ = g^((q^n − 1)/(q − 1)) is an element of F*_q, and ind_g κ = ((q^n − 1)/(q − 1)) ind_{g′} κ.
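This reduction of ind_g κ to an index in F*_q can be illustrated in a toy field. The sketch below is entirely illustrative: the field F_9 = F_3(x)/(x^2 + 1) and the generator g = x + 1 are my own choices, not from the book. It computes g′ = g^((q^n − 1)/(q − 1)) and recovers ind_g κ as ((q^n − 1)/(q − 1)) ind_{g′} κ.

```python
# A toy sketch (not from the book) of ind_g(kappa) = ((q^n-1)/(q-1)) * ind_{g'}(kappa)
# in F_9 = F_3(x)/(x^2 + 1), with the (assumed) generator g = x + 1.

q, n = 3, 2

# elements of F_9 as pairs (a0, a1) meaning a0 + a1*x, with x^2 = -1 = 2
def mul(u, v):
    a0, a1 = u
    b0, b1 = v
    # (a0 + a1 x)(b0 + b1 x) = a0 b0 + a1 b1 x^2 + (a0 b1 + a1 b0) x,  x^2 = 2
    return ((a0 * b0 + 2 * a1 * b1) % q, (a0 * b1 + a1 * b0) % q)

def power(u, e):
    r = (1, 0)
    while e:
        if e & 1:
            r = mul(r, u)
        u = mul(u, u)
        e >>= 1
    return r

g = (1, 1)                          # x + 1, a generator of F_9^* (order 8)
s = (q ** n - 1) // (q - 1)         # (q^n - 1)/(q - 1) = 4
gp = power(g, s)                    # g' = g^4 lies in the subfield F_3^*

kappa = (2, 0)                      # kappa = 2, a leading coefficient in F_3^*
# brute-force ind_{g'}(kappa) in the small subgroup F_3^* of order q - 1
e = next(e for e in range(q - 1) if power(gp, e) == kappa)
ind_g_kappa = (s * e) % (q ** n - 1)
```

Here g′ = 2 generates F*_3, ind_{g′}(2) = 1, and so ind_g(2) = 4, which indeed satisfies g^4 = 2.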

7.4.2 Function-Field Sieve Method (FFSM)


Proposed by Adleman14 and improved by Adleman and Huang15 and by
Joux and Lercier16, the function-field sieve method adapts the number-field
sieve method to computing indices in F_q^n. A brief intuitive overview of Adle-
man and Huang's variant is presented here.
We represent F_q^n by a monic irreducible polynomial f(x) ∈ F_q[x] of degree
n. A bivariate polynomial

H(x, y) = Σ_{i=0}^{d} Σ_{j=0}^{d′−1} h_{i,j} y^i x^j ∈ F_q[x, y]

is then chosen. This polynomial should satisfy the following eight conditions.
1. H(x, y) is irreducible in F̄_q[x, y], where F̄_q is the algebraic closure of F_q.
2. H(x, m(x)) is divisible by the defining polynomial f(x) for some (known)
univariate polynomial m(x) ∈ F_q[x].
3. h_{d,d′−1} = 1.
4. h_{d,0} ≠ 0.
5. h_{0,d′−1} ≠ 0.
6. Σ_{i=0}^{d} h_{i,d′−1} y^i ∈ F_q[y] is square-free.
7. Σ_{j=0}^{d′−1} h_{d,j} x^j ∈ F_q[x] is square-free.
8. The size of the Jacobian J_{F_q^n}(H) is coprime to (q^n − 1)/(q − 1).
The curve H(x, y) plays an important role here. The set F_q(H) of all rational
functions on H is called the function field of H (Section 4.4.2). The set of all
integers in F_q(H) is the set F_q[H] of all polynomial functions on H. Because
of Condition 2 above, the map taking y ↦ m(x) (mod f(x)) naturally
extends to a ring homomorphism Φ : F_q[H] → F_q^n.
14 Leonard M. Adleman, The function field sieve, ANTS, 108–121, 1994.
15 Leonard M. Adleman, Ming-Deh A. Huang, Function field sieve method for discrete
logarithms over finite fields, Information and Computation, 151(1-2), 5–16, 1999.
16 Antoine Joux and Reynald Lercier, The function field sieve in the medium prime case,

EuroCrypt, 254–270, 2006.



Adleman and Huang propose the following method to determine the poly-
nomial H(x, y). The defining polynomial f(x) ∈ F_q[x] is monic of degree n.
The y-degree d of H is chosen as a value about n^(1/3). Let d′ = ⌈n/d⌉, and
δ = dd′ − n < d. For any monic polynomial m(x) ∈ F_q[x] of degree d′ ≈ n^(2/3),
we then have

x^δ f(x) = m(x)^d + H_{d−1}(x) m(x)^{d−1} + H_{d−2}(x) m(x)^{d−2} + ··· + H_1(x) m(x) + H_0(x),

where each H_i(x) ∈ F_q[x] is of degree ≤ d′ − 1. Let

H(x, y) = y^d + H_{d−1}(x) y^{d−1} + H_{d−2}(x) y^{d−2} + ··· + H_1(x) y + H_0(x).

By construction, H(x, m(x)) ≡ 0 (mod f(x)). More concretely, we can take
f(x) = x^n + f1(x) with deg f1(x) < n^(2/3), and m(x) = x^{d′}. But then, H(x, y) =
y^d + x^δ f1(x). We vary f1(x) until H(x, y) satisfies the above eight conditions.
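The construction in the special case f(x) = x^n + f1(x), m(x) = x^{d′} can be checked mechanically. The sketch below uses illustrative parameters of my own choosing (q = 2, n = 31, f1 = x^3 + 1, as in the earlier examples; this is not the book's code): it builds H(x, y) = y^d + x^δ f1(x) and verifies that H(x, m(x)) ≡ 0 (mod f(x)).

```python
# Sketch (illustrative parameters, not from the book) of the Adleman-Huang
# construction for q = 2, n = 31, f = x^31 + x^3 + 1:
# d ~ n^(1/3) = 3, d' = ceil(n/d) = 11, delta = d*d' - n = 2, m(x) = x^11,
# giving H(x, y) = y^d + x^delta * f1(x).  We verify H(x, m(x)) = 0 (mod f).

def deg(a): return a.bit_length() - 1

def mul2(a, b):
    r = 0
    while b:
        if b & 1: r ^= a
        a <<= 1; b >>= 1
    return r

def mod2(a, b):
    while deg(a) >= deg(b): a ^= b << (deg(a) - deg(b))
    return a

n, d = 31, 3
dp = -(-n // d)              # d' = ceil(n/d) = 11
delta = d * dp - n           # 2
f1 = 0b1001                  # x^3 + 1
f = (1 << n) ^ f1            # x^31 + x^3 + 1
mx = 1 << dp                 # m(x) = x^11

# H(x, y) = y^3 + x^2 (x^3 + 1); substitute y = m(x):
Hm = mul2(mul2(mx, mx), mx) ^ (f1 << delta)
```

Indeed H(x, x^11) = x^33 + x^5 + x^2 = x^2 f(x), which vanishes modulo f(x).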
Once a suitable polynomial H(x, y) is chosen, we need to choose a factor
base. Let S consist of all monic irreducible polynomials of F_q[x] with degrees no
more than about n^(1/3). For r(x), s(x) ∈ F_q[x] with degrees no more than about
n^(1/3) and with gcd(r(x), s(x)) = 1, we consider two quantities: the polynomial
r(x) m(x) + s(x), and the polynomial function r(x) y + s(x) ∈ F_q(H). We
attempt to factor the polynomial rm + s completely over S, and the
function ry + s completely in F_q(H) over a set of
primes of F_q(H) of small norms. If both factoring attempts are successful,
we get a relation (also called a doubly smooth pair). Factorization in F_q(H)
is too difficult a topic to be elaborated in this book. It suffices for the time
being to note that factoring r(x) y + s(x) essentially boils down to factoring
its norm, which is the univariate polynomial r(x)^d H(x, −s(x)/r(x)) ∈ F_q[x].
Both r(x) m(x) + s(x) and this norm are polynomials of degrees no more than
about n^(2/3). Sieving is carried out to identify the doubly smooth pairs.
Each trial for a relation in the FFSM, therefore, checks the smoothness
of two polynomials of degrees about n^(2/3). This is asymptotically better than
trying to find one smooth polynomial of degree proportional to n (as in the LSM
or CSM). The FFSM achieves a running time of L_{q^n}[1/3, (32/9)^(1/3)].

7.5 Algorithms for Elliptic Curves (ECDLP)


Conventionally, elliptic-curve groups are treated as additive groups. Let E
be an elliptic curve defined over the finite field Fq . The group Eq = EFq of
rational points on the curve E is introduced in Chapter 4. The group Eq need
not be cyclic. Let P ∈ Eq . The multiples λP of P constitute a cyclic subgroup
G of Eq . Given Q ∈ Eq , our task is to determine a λ for which Q = λP ,
provided that Q ∈ G. We call λ the discrete logarithm or the index of Q with
respect to the base P , and denote this as indP Q.
No subexponential algorithms are known to compute indices in elliptic
curves. Ideas inherent in the index calculus methods for finite fields are not
straightaway applicable to elliptic curves. Even when they are, they are likely

to be ineffective and of no practical significance. The square-root methods


of Section 7.1 are the only general-purpose algorithms known to solve the
elliptic-curve discrete logarithm problem.
For some special types of elliptic curves, good algorithms are known. For
anomalous elliptic curves, the SmartASS method is a linear-time probabilis-
tic algorithm proposed by Smart,17 Satoh and Araki,18 and Semaev.19 The
MOV algorithm reduces the problem of computing discrete logarithms in Eq
to the problem of computing indices in the finite field F_{q^k} for some k. For
supersingular curves, the value of k is small (k ≤ 6). The SmartASS method
uses p-adic logarithms, whereas the MOV reduction uses pairing.
Silverman proposes the xedni calculus method20 for computing indices in
elliptic curves. Its essential idea is to lift elliptic curves over prime fields Fp to
elliptic curves over the rationals Q. The xedni calculus method has, however,
been proved to be inefficient, both theoretically and practically.
In what follows, I explain the MOV reduction algorithm. The SmartASS
method and the xedni calculus method are beyond the scope of this book.

7.5.1 MOV/Frey–Rück Reduction

The MOV/Frey–Rück reduction was proposed independently by Menezes,
Okamoto and Vanstone,21 and by Frey and Rück.22 Suppose that E is an
elliptic curve defined over Fq, and we want to compute the index of Q ∈ Eq
with respect to a base P ∈ Eq of prime order m. Let e be a pairing (like the Weil
or reduced Tate pairing) of points in E_{q^k}, where k is the embedding degree.
We take the first argument of e from the subgroup G of Eq generated by P.
In order that e is non-degenerate, we need to take the second argument of e
from a subgroup G′ ≠ G of E_{q^k}. Since m is prime, any non-zero element of
G′ is a generator of G′. We choose a random generator R of G′, and reduce
the computation of ind_P Q to the computation of a discrete logarithm in the
finite field F_{q^k}. This reduction is explained in Algorithm 7.1.

17 Nigel P. Smart, The discrete logarithm problem on elliptic curves of trace one, Journal of Cryptology, 12, 193–196, 1999.
18 T. Satoh and K. Araki, Fermat quotients and the polynomial time discrete log algorithm for anomalous elliptic curves, Commentarii Mathematici Universitatis Sancti Pauli, 47, 81–92, 1998.
19 Igor A. Semaev, Evaluation of discrete logarithms on some elliptic curves, Mathematics of Computation, 67, 353–356, 1998.
20 Joseph H. Silverman, The xedni calculus and the elliptic curve discrete logarithm problem, Designs, Codes and Cryptography, 20, 5–40, 2000.
21 Alfred J. Menezes, T. Okamoto and Scott A. Vanstone, Reducing elliptic curve logarithms to a finite field, IEEE Transactions on Information Theory, 39, 1639–1646, 1993.
22 Gerhard Frey and Hans-Georg Rück, A remark concerning m-divisibility and the discrete logarithm problem in the divisor class group of curves, Mathematics of Computation, 62, 865–874, 1994.
388 Computational Number Theory

Algorithm 7.1: The MOV/Frey–Rück reduction


Let k be the embedding degree for E, q and m.
Let G′ be a subgroup of E_{q^k} of order m, not containing P.
Choose a random non-zero element R from G′.
Let α = e(P, R).
Let β = e(Q, R).
Compute λ = ind_α β in F_{q^k}.
Return λ.

If k > 1, then R can be chosen as follows. There is no need to construct the
group G′ explicitly. We keep on picking random non-zero points R′ ∈ E_{q^k} \ E_q
and setting R = (|E_{q^k}|/m)R′ until we find an R ≠ O. If m is close to q, this
random search succeeds within only a few iterations. Finding random elements
R′ in E_{q^k} and computing the multiple R of R′ can be done in (probabilistic)
polynomial time in k log q.
Let λ = ind_P Q. By the choice of G′ and R, the pairing values α, β ∈
F_{q^k} are non-degenerate. By the bilinearity of e, we have α^λ = e(P, R)^λ =
e(λP, R) = e(Q, R) = β. This establishes the correctness of Algorithm 7.1.
For computing the discrete logarithm ind_α β in the field F_{q^k}, one may
use a subexponential algorithm like those discussed earlier in this chapter.
In order that this computation is feasible, k should be small. Pairing-friendly
curves (like supersingular curves), therefore, admit subexponential solutions
to the ECDLP. For a general curve E, on the other hand, k is rather large (of
the order of q), and Algorithm 7.1 does not yield a subexponential solution.
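The shape of this reduction can be illustrated with a toy stand-in for the pairing: represent the multiples of P and R by their exponents modulo m, and let a bilinear map into F∗_23 play the role of e. Everything below (the simulated `pairing`, the helper `dlog_bsgs`) is illustrative only, not an actual elliptic-curve pairing:

```python
# Toy illustration of the MOV-style reduction. Instead of a real pairing,
# "points" uP and vR are represented by exponents u, v modulo m, and the
# stand-in bilinear map is e(uP, vR) = g^(u*v) in F_23, where g = 2 has
# prime order m = 11 modulo 23. All names here are ad hoc.

m, p, g = 11, 23, 2              # g = 2 has order 11 in F_23*

def pairing(u, v):
    """Stand-in bilinear pairing: e(uP, vR) = g^(u*v) mod p."""
    return pow(g, (u * v) % m, p)

def dlog_bsgs(alpha, beta):
    """Baby-step-giant-step discrete log of beta to base alpha (order m)."""
    s = int(m ** 0.5) + 1                   # ceil(sqrt(m))
    baby = {pow(alpha, j, p): j for j in range(s)}
    giant = pow(alpha, (m - s) % m, p)      # alpha^(-s) = alpha^(m - s)
    gamma = beta % p
    for i in range(s + 1):
        if gamma in baby:
            return (i * s + baby[gamma]) % m
        gamma = (gamma * giant) % p
    return None

lam = 7                          # the unknown index: Q = 7P
r = 5                            # random generator exponent for R
alpha = pairing(1, r)            # alpha = e(P, R)
beta = pairing(lam, r)           # beta = e(Q, R) = e(P, R)^lambda
print(dlog_bsgs(alpha, beta))    # recovers lambda = 7
```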

Example 7.18 Consider the curve E : Y² = X³ + 3X defined over F_43 (see
Example 4.68). We take the base point P = (1, 2), which has order 11, and
want to compute ind_P Q, where Q = (36, 25). The embedding degree in this
case is k = 2. We represent F_{43^2} as F_43(θ), where θ² + 1 = 0. A point of
order 11 outside E_{F_43} is R = (7 + 2θ, 37 + 33θ). Miller’s algorithm for the
Weil pairing (Algorithm 4.2) gives α = e_m(P, R) = 7 + 34θ, and β = e_m(Q, R) =
26 + 20θ. In the field F_{43^2}, we have ind_α β = 7. This gives Q = 7P. One can
easily verify that this is the correct relation between P and Q. ◻
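The claimed relation Q = 7P can be checked directly with a short script using naive affine point arithmetic; this is a sketch written for this one curve, not a general-purpose implementation:

```python
# Verify Example 7.18: on E : Y^2 = X^3 + 3X over F_43, check that 7P = Q
# for P = (1, 2) and Q = (36, 25). Points are (x, y) tuples; None is the
# point at infinity O.

p, a = 43, 3                     # prime field F_43, coefficient a in Y^2 = X^3 + aX

def ec_add(P1, P2):
    if P1 is None: return P2
    if P2 is None: return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # P + (-P) = O
    if P1 == P2:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P):
    R = None                     # double-and-add scalar multiplication
    while k:
        if k & 1: R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

P, Q = (1, 2), (36, 25)
print(ec_mul(7, P))              # (36, 25), confirming Q = 7P
print(ec_mul(11, P))             # None: P indeed has order 11
```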
Discrete Logarithms 389

Exercises
1. Let h ∈ F∗_q have order m (a divisor of q − 1). Prove that for a ∈ F∗_q , the
discrete logarithm ind_h a exists if and only if a^m = 1.
2. Let E be an elliptic curve defined over F_q , and P ∈ E_q a point of order m.
Prove or disprove: For a point Q ∈ E_q , the discrete logarithm ind_P Q exists if
and only if mQ = O.
3. Suppose that g and g′ are two primitive elements of F∗_q . Show that if one can
compute discrete logarithms to the base g in O(f(log q)) time, then one can
also compute discrete logarithms to the base g′ in O(f(log q)) time. (Assume
that f(log q) is a super-polynomial expression in log q.)
4. Suppose that g is a primitive element of a finite field F_q , where q is a power of 2.
Prove that computing ind_g a is polynomial-time equivalent to the computation
of the parity of ind_g a.
5. Explain how the baby-step-giant-step method can be used to compute the
order of an element in a finite group of size n. (Assume that the prime
factorization of n is unknown.)
6. Let G be a finite group, and g ∈ G. Suppose that an element a = g^x is given
together with the knowledge that i ≤ x ≤ j for some known i, j. Let k = j − i + 1.
Describe how the baby-step-giant-step method to determine x can be modified
so as to use storage for only O(√k) group elements and a time of only O˜(√k)
group operations.
7. Let n = pq be the product of two distinct odd primes p, q of the same bit size.
(a) Let g ∈ Z∗_n . Prove that ord_n g divides φ(n)/2.
(b) Conclude that g^{(n+1)/2} ≡ g^x (mod n), where x = (p + q)/2.
(c) Use Exercise 7.6 to determine x. Demonstrate how you can factor n from
the knowledge of x.
(d) Prove that this factoring algorithm runs in O˜(n^{1/4}) time.
8. Let g_1 , g_2 , . . . , g_t , a belong to a finite group G. The multi-dimensional discrete-
logarithm problem is to find integers x_1 , x_2 , . . . , x_t such that a =
g_1^{x_1} g_2^{x_2} · · · g_t^{x_t} (if such integers exist). Some r is given such that a
has the above representation with 0 ≤ x_i < r for all i. Devise a baby-step-giant-step
method to compute x_1 , x_2 , . . . , x_t using only O˜(r^{t/2}) group operations.
9. Discuss how the Pollard rho and lambda methods (illustrated in Examples 7.2
and 7.3 for prime fields) can be modified to work for extension fields F_{p^n}.
10. Let γ_{ij} be the exponent of the small prime p_j in the i-th relation in the basic
index calculus method for a prime field F_p . Assume that a random integer
g^α (mod p) has probability 1/p_j^h of being divisible by p_j^h , where p_j is a small
prime, and h is a small exponent. Determine the probability that γ_{ij} ≠ 0.
11. Let C be the coefficient matrix obtained in the first stage of the LSM for the
prime field F_p . Count the expected number of non-zero entries in the j-th
column of C for the following cases:
(1) j = 0 (the first column, corresponding to ind_g(−1)).
(2) 1 ≤ j ≤ t (the next t columns, corresponding to the variables ind_g p_j ).
(3) t + 1 ≤ j ≤ 2M + t + 1 (the last 2M + 1 columns, corresponding to the
variables ind_g(H + c)).
12. Suppose that in the linear sieve method for computing discrete logarithms in
Fp , we obtain an m × n system of congruences, where n = t + 2M + 2, and
m = 2n. Assume that the T (c1 , c2 ) values behave as random integers (within
a bound). Calculate the expected number of non-zero entries in the m × n
coefficient matrix. You may use the asymptotic formula that, for a positive
real number x, the sum of the reciprocals of the primes ≤ x is approximately
ln ln x + B1 , where B1 = 0.2614972128 . . . is the Mertens constant. (Remark:
The expected number of non-zero entries is significantly smaller than the ob-
vious upper bound O(m log p). However, the assumption that the probability
that a small prime q divides a candidate T (c1 , c2 ) is the same as the probability
that q divides a smooth value of T (c1 , c2 ) is heuristic.)
13. Suppose that we want to adapt the dual sieve of Exercise 6.18 to the linear
sieve method over prime fields. We use two sieves each on a sieving interval
of length half of the sieving length in the original LSM.
(a) Count the total number of candidates checked for smoothness by the dual
sieve. Identify what problem the dual sieve faces.
(b) Explain how you can increase the sieving length of each sieve to avoid the
problem of Part (a). What new problem does this increase introduce?
14. Describe the second stage for the residue-list sieve method for prime fields Fp .
Discuss how the residue lists available from the first stage continue to be used
in the second stage. Note that the second stage must run in L[1/2] time.
15. Describe the second stage for the Gaussian integer method for prime fields Fp .
The running time of the second stage should be L[1/2].
16. Count the number of triples (c_1 , c_2 , c_3 ) (with c_1 + c_2 + c_3 = 0 and
−M ≤ c_1 ≤ c_2 ≤ c_3 ≤ M ) in the cubic sieve method for prime fields F_p .
17. Describe how sieving is done in the cubic sieve method for prime fields, in
order to find all the solutions of the congruence T(c_1 , c_2 , c_3 ) ≡ 0 (mod q^h ),
where q is a small prime in the factor base, and h is a small positive integer.
18. Let N(k, m) denote the number of polynomials in F_2[x] with degree k and
with all irreducible factors of degree ≤ m. Also, let I(i) be the number of
irreducible polynomials in F_2[x] of degree i. Prove that

N(k, m) = Σ_{i=1}^{m} Σ_{r≥1} C(r + I(i) − 1, r) · N(k − ri, i − 1),

where C(a, b) denotes the binomial coefficient "a choose b".

19. Modify the second stage of the linear sieve method for prime fields to work
for the fields F_{2^n}. What is the running time of this modified second stage?
20. Modify the second stage of the cubic sieve method for prime fields to work for
the fields F_{2^n}. What is the running time of this modified second stage?

21. Prove that the number of triples (c_1 , c_2 , c_3 ) in the cubic sieve method for the
field F_{2^n} is (2/3) × 4^{m−1} + 2^{m−1} + 1/3.
22. Propose an adaptation of the residue-list sieve method for the fields F_{2^n}.
23. Extend the concept of large prime variation to the case of the fields F_{2^n}.
24. Prove that there are exactly 2^{2b−1} pairs (c_1(x), c_2(x)) in Coppersmith’s
method with deg c_1 < b, deg c_2 < b, and gcd(c_1 , c_2 ) = 1. Exclude the pair
(0, 1) in your count.
25. A Gray code²³ G_0^{(d)}, G_1^{(d)}, . . . , G_{2^d−1}^{(d)} of dimension d is the
enumeration of all d-bit strings, defined recursively as follows. For d = 1, we have
G_0^{(1)} = 0 and G_1^{(1)} = 1, whereas for d ≥ 2, we have
G_k^{(d)} = 0G_k^{(d−1)} if 0 ≤ k < 2^{d−1}, and G_k^{(d)} = 1G_{2^d−k−1}^{(d−1)} if 2^{d−1} ≤ k < 2^d.
Prove that for 1 ≤ k < 2^d, the bit strings G_{k−1}^{(d)} and G_k^{(d)} differ in exactly one
bit position, given by v_2(k) (the multiplicity of 2 in k).
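As a sanity check (not a proof), the recursion can be unrolled and the claim tested for small d; the helper names `gray` and `v2` below are ad hoc:

```python
# Numerical check of the Gray-code claim in Exercise 25: the recursion
# generates the reflected binary Gray code, and consecutive codewords
# G_{k-1}, G_k differ exactly in bit position v_2(k) (counted from the right).

def gray(d):
    """Build the dimension-d Gray code by the recursive definition."""
    if d == 1:
        return ['0', '1']
    prev = gray(d - 1)
    return ['0' + w for w in prev] + ['1' + w for w in reversed(prev)]

def v2(k):
    """Multiplicity of 2 in k (number of trailing zero bits)."""
    e = 0
    while k % 2 == 0:
        k //= 2
        e += 1
    return e

for d in range(1, 8):
    G = gray(d)
    for k in range(1, 2 ** d):
        # positions where G_{k-1} and G_k differ, bit 0 being the rightmost
        diff = [i for i in range(d) if G[k - 1][d - 1 - i] != G[k][d - 1 - i]]
        assert diff == [v2(k)]
print("claim verified for d = 1, ..., 7")
```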
26. In this exercise, we explore Gordon and McCurley’s polynomial sieving
procedure²⁴ in connection with the linear sieve method for F_{2^n}. Let w(x) be an
irreducible polynomial in B of low degree, h a small positive integer, and
δ = m − 1 − h deg w(x). We find polynomials c_1 , c_2 of degrees < m satisfying
T(c_1 , c_2 ) ≡ x^ǫ f_1(x) + (c_1(x) + c_2(x))x^ν + c_1(x)c_2(x) ≡ 0 (mod w(x)^h ).
Suppose that, for a fixed c_1 , a solution of this congruence is c̄_2(x). Then, all
the solutions for c_2(x) are c_2(x) = c̄_2(x) + u(x)w(x)^h for all polynomials u(x)
of degrees < δ. Describe how the δ-dimensional Gray code can be used to
efficiently step through all these values of c_2(x). The idea is to replace the
product u(x)w(x)^h by more efficient operations. Complete the description of
the sieve, based upon this strategy. Deduce the running time for this sieve.
27. Argue that the basic index calculus method for F_{q^n} (with small q) can be
designed to run in L(q^n , 1/2, c) time.
28. Extend the linear sieve method for characteristic-two fields to compute indices
in F_{q^n}. Assume that q is small.
29. How can sieving be done in the extended linear sieve method of Exercise 7.28?
30. How can you modify Coppersmith’s method in order to compute indices in
non-binary fields of small characteristic (like three or five)?

Programming Exercises
Using the GP/PARI calculator, implement the following.
31. The baby-step-giant-step method for prime finite fields.
32. The Pollard rho method for binary finite fields.
33. The Pollard lambda method for elliptic curves over finite fields.
23 The Gray code is named after the American physicist Frank Gray (1887–1969).
24 Daniel M. Gordon and Kevin S. McCurley, Massively parallel computation of discrete logarithms, CRYPTO, 312–323, 1992.

34. The sieving step for the linear sieve method for prime fields.
35. The sieving step for the linear sieve method for binary finite fields.
36. Relation generation in the basic index calculus method in the field F_{3^n}.
Chapter 8
Large Sparse Linear Systems

8.1 Structured Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395


8.2 Lanczos Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 404
8.3 Wiedemann Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 410
8.4 Block Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
8.4.1 Block Lanczos Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 415
8.4.2 Block Wiedemann Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428

Algorithms for solving large sparse linear systems over the finite rings Z_M
constitute the basic theme of this chapter. Arguably, this is not part of number
theory. However, given the importance of our ability to quickly solve such
systems in connection with factoring and discrete-logarithm algorithms, a book
on computational number theory can hardly afford to ignore this topic.
The sieving phases of factoring and discrete-log algorithms are massively
parallelizable. On the contrary, the linear-algebra phases resist parallelization
efforts, and may turn out to be the bottleneck in practical implementations.
Throughout this chapter, we plan to solve the linear system of congruences:

Ax ≡ b (mod M ), (8.1)

where A is an m × n matrix with elements from Z_M , x an n × 1 vector, b an
m × 1 vector, and M ≥ 2 an integer modulus. In earlier chapters, the letter
n indicated other quantities. In Chapter 6, n was the integer to be factored,
whereas for computing discrete logarithms over F_{2^n}, the extension degree was
n. In this chapter, n (and also m) are subexponential expressions in the input
sizes (log n for factoring, and n for discrete logarithms in F_{2^n}). I hope that
this notational inconsistency across chapters will not be a source of confusion.
In several ways, systems generated by integer-factoring algorithms differ
from those generated by discrete-log algorithms. For factoring, the system is
homogeneous (that is, b = 0), and n > m. Our goal is to obtain a non-zero
solution x to the system. Such a solution belongs to the null space of A, which
has dimension ≥ n − m. If n ≫ m (like n = 2m), we expect to find many such
solutions x that lead to non-trivial Fermat congruences capable of splitting
the input integer. Finally, the modulus is M = 2 for factoring algorithms.
On the contrary, discrete-log algorithms yield non-homogeneous systems
(that is, b ≠ 0). We also desire the systems to be of full (column) rank, so
we arrange to have m ≫ n (like m = 2n). Finally, the modulus M is usually
a large integer (like q − 1 for computing discrete logarithms in Fq ).


In many senses, these basic differences do not matter, both theoretically
and practically. For example, suppose that we want to compute a solution x of
the homogeneous system Ax = 0. For a random x_1 , we compute b = Ax_1 . If
b = 0, we are done. So assume that b ≠ 0. Since A(x + x_1 ) = Ax + Ax_1 = b,
this means that if we can solve the non-homogeneous system Ay = b, a
solution for the corresponding homogeneous system can be obtained as x =
y − x_1 . So, without loss of generality, we assume that b ≠ 0 in Eqn (8.1).
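This transformation can be sketched in a few lines. The tiny 2 × 3 system and the hand-picked null-space vector below are illustrative only; in practice, a real system solver would produce the solution y for us:

```python
# Sketch of the reduction above: a non-zero solution of Ax = 0 (mod M) is
# read off from a solution y of Ay = b, where b = A x1 for a random x1,
# as x = y - x1. We use a tiny 2x3 system modulo M = 5 whose null space
# is spanned by (1, 4, 1).

import random

M = 5
A = [[1, 1, 0],
     [0, 1, 1]]                      # A . (1, 4, 1) = (0, 0) mod 5

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) % M for row in A]

x1 = [random.randrange(M) for _ in range(3)]
b = matvec(A, x1)

# Pretend a system solver returned another solution y of Ay = b; here we
# construct one by hand by adding a null-space vector to x1.
y = [(u + v) % M for u, v in zip(x1, [1, 4, 1])]
assert matvec(A, y) == b             # y solves the non-homogeneous system

x = [(u - v) % M for u, v in zip(y, x1)]
assert matvec(A, x) == [0, 0]        # x = y - x1 solves Ax = 0
print(x)                             # the non-zero null-space vector [1, 4, 1]
```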
Many algorithms we are going to discuss generate a square system

Bx ≡ c (mod M ) (8.2)

by multiplying Eqn (8.1) by A^t (the transpose of A). Here, B = A^t A is an n × n
matrix, and c = A^t b is an n × 1 vector. Clearly, a solution of Eqn (8.1) is a
solution of Eqn (8.2) too. The converse is also true with high probability.
The modulus M has some bearing on the choice of the algorithm for solving
the system. The so-called block methods are specifically suited to the modulus
M = 2. If M ≥ 3, we usually employ the non-block variants. It is worthwhile
to note here that M need not be a prime for every system we obtain from
a discrete-log algorithm. This, in turn, implies that ZM is not necessarily a
field. Let us see how serious the consequences of this are.
First, suppose that a complete prime factorization of the modulus M is
known. We solve the system modulo the prime divisors of M , and then lift the
solutions to appropriate powers of these primes. The lifting procedure is sim-
ilar to Hensel’s lifting described in Section 1.5, and is studied in Exercise 8.1.
In some cases, a partial factorization of M is only available. For discrete loga-
rithms in Fq with q odd, the modulus q − 1 can be written as q − 1 = 2s t with
t odd. It is a good idea to also remove factors of small primes (other than 2)
from M , before applying a system solver for this modulus.
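Recombining the solutions obtained modulo coprime factors of M is an application of the Chinese Remainder Theorem; a minimal sketch for two coprime moduli (the helper name `crt` is ad hoc):

```python
# Sketch of recombining solutions modulo coprime factors of M: given
# x = a1 (mod M1) and x = a2 (mod M2) with gcd(M1, M2) = 1, the Chinese
# Remainder Theorem yields the unique solution modulo M = M1 * M2.

def crt(a1, M1, a2, M2):
    """Combine x = a1 (mod M1), x = a2 (mod M2) for coprime M1, M2."""
    # Write x = a1 + M1 * t with t = (a2 - a1) / M1 (mod M2).
    t = (a2 - a1) * pow(M1, -1, M2) % M2
    return (a1 + M1 * t) % (M1 * M2)

# Example: a system solved modulo 4 and modulo 7 separately (say the
# modulus is M = q - 1 = 28) recombines to one solution modulo 28.
x = crt(3, 4, 5, 7)
print(x)       # 19: indeed 19 = 3 (mod 4) and 19 = 5 (mod 7)
```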
If M is a modulus with unknown factorization, an effort to factor M and
then use the procedure described in the last paragraph is usually too much
of an investment of time. We instead pretend that M is a prime (that is, Z_M
is a field), and apply a system-solver algorithm with this modulus. If, during
the running of this algorithm, an attempt is made to invert a non-zero
non-invertible element, the algorithm fails. But then, we also discover a
non-trivial factorization M = M_1 M_2 . We continue solving the system modulo
both M_1 and M_2 , pretending again that M_1 and M_2 are primes.
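The failure-reveals-a-factor behaviour can be sketched as follows; `invert_or_factor` is an illustrative helper, not the book's code:

```python
# Sketch of the "pretend M is prime" strategy: attempt a modular inversion,
# and when it fails, extract a non-trivial factor of M from the gcd.
from math import gcd

def invert_or_factor(a, M):
    """Return ('inverse', a^-1 mod M), or ('factor', d) with 1 < d < M."""
    d = gcd(a % M, M)
    if d == 1:
        return ('inverse', pow(a, -1, M))
    return ('factor', d)

print(invert_or_factor(5, 91))    # ('inverse', 73): 5 is invertible mod 91
print(invert_or_factor(35, 91))   # ('factor', 7): 91 = 7 * 13 is revealed
```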
A standard algorithm to solve linear systems is Gaussian elimination, which
converts the coefficient matrix A to a (reduced) row echelon form, and then
obtains the values of the variables by backward substitution. For an m × n
system with m = Θ(n), this method takes O(m3 ) (or O(n3 )) time which is
unacceptably high for most factoring and discrete-log algorithms.
A common feature of all the sieving algorithms is that the linear systems
produced by them are necessarily sparse, that is, there are only a few non-zero
entries in each row (or column, but not necessarily both). This sparsity needs to be
effectively exploited to arrive at O˜(m2 ) (or O˜(n2 )) algorithms for completing
the linear-algebra phase. This chapter is an introduction to these sparse system

solvers. I do not make any effort here to explain the standard cubic algorithms
like Gaussian elimination, but straightaway jump to the algorithms suitable
for sparse systems. Structured Gaussian elimination applies to any modulus
M . The standard Lanczos and Wiedemann methods are typically used for odd
moduli M , although they are equally applicable to M = 2. The block versions
of Lanczos and Wiedemann methods are significantly more efficient compared
to their standard versions, but are meant for M = 2 only.
The structure of a typical sparse matrix A from the sieve algorithms merits
a discussion in this context. Each row of A is sparse, but the columns in A have
significant variations in their weights (that is, counts of non-zero elements). A
randomly chosen integer is divisible by a small prime p with probability about
1/p. The column in A corresponding to p is expected to contain about m/p
non-zero entries. If p is small (like p = 2), the column corresponding to p is
quite dense. On the contrary, if p is relatively large (like the millionth prime
15,485,863), the column corresponding to it is expected to be rather sparse.
The columns in A corresponding to factor-base elements other than small
primes (like H + c in the linear sieve method) are expected to contain only a
small constant number of non-zero elements. In view of these observations, we
call some of the columns heavy, and the rest light. More concretely, we may
choose a small positive real constant α (like 1/32), and call a column of A
heavy if its weight is more than αm, light otherwise.
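The heavy/light classification can be sketched as follows, with rows stored as lists of (column, value) pairs in the row-major sparse format mentioned below; the helper name `classify_columns` is ad hoc:

```python
# Sketch of the heavy/light column classification: count each column's
# weight (number of non-zero entries) and mark it heavy when the weight
# exceeds alpha * m, light otherwise.

def classify_columns(rows, n, alpha):
    """rows: list of m sparse rows, each a list of (col, val) pairs.
    Returns a length-n list of 'H'/'L' labels."""
    m = len(rows)
    weight = [0] * n
    for row in rows:
        for col, val in row:
            if val:
                weight[col] += 1
    return ['H' if w > alpha * m else 'L' for w in weight]

# Tiny example: column 0 is dense (weight 3 out of m = 4 rows).
rows = [[(0, 1), (2, 1)], [(0, 1)], [(0, 1), (3, 1)], [(1, 1)]]
print(classify_columns(rows, 4, 1/2))   # ['H', 'L', 'L', 'L']
```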
Usually, each non-zero entry in A is a small positive or negative integer,
even when the modulus M is large. Here, small means absolute values no
larger than a few hundred. It is preferable to represent a negative entry −a
by −a itself, and not by the canonical representative M − a. This practice
ensures that each entry in A can be represented by a single-precision signed
integer. The matrix A is typically not stored in a dense format (except perhaps
for M = 2, in which case multiple coefficients can be packed per word). We
instead store only the non-zero entries in a row-major format.
In the situations where we solve Eqn (8.2) instead of Eqn (8.1), it is often
not advisable to compute B = A^t A explicitly, for B may be significantly
denser than A. It is instead preferable to carry out a multiplication by B as
two multiplications by the sparse matrices A and A^t . While multiplying by
A^t , we often find it handy to have a row-major listing of the non-zero elements
of A^t , or equivalently a column-major listing of A.
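This two-multiplication strategy can be sketched as follows; the helpers assume the row-major sparse format just described, and all function names are ad hoc:

```python
# Sketch of multiplying by B = A^t A without forming B: apply the sparse
# matrix A row by row, then apply A^t by scattering each row's entries
# (a row of A contributes to the output of A^t via its column indices).

def mul_A(rows, x, M):
    """y = A x (mod M) with A in row-major sparse form."""
    return [sum(v * x[c] for c, v in row) % M for row in rows]

def mul_At(rows, y, n, M):
    """z = A^t y (mod M): scatter each row's entries, scaled by y_i."""
    z = [0] * n
    for i, row in enumerate(rows):
        for c, v in row:
            z[c] = (z[c] + v * y[i]) % M
    return z

def mul_B(rows, x, n, M):
    """(A^t A) x computed as two sparse multiplications."""
    return mul_At(rows, mul_A(rows, x, M), n, M)

# Tiny check for A = [[1, 2, 0], [0, 1, 1]] modulo 7.
rows = [[(0, 1), (1, 2)], [(1, 1), (2, 1)]]
print(mul_B(rows, [1, 1, 1], 3, 7))   # [3, 1, 2]
```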

8.1 Structured Gaussian Elimination


Structured Gaussian elimination (SGE) is used to reduce the dimensions of
a sparse matrix by eliminating some of its rows and columns. The reduction in
the size of the matrix may be quite considerable. Indeed, SGE may reduce the
size of a matrix to such an extent that applying standard Gaussian elimination

on the reduced system becomes more practical than applying the quadratic
sparse solvers we describe later in this chapter. This is particularly important
for M = 2, since in this case multiple coefficients can be packed per word
in a natural way, and standard Gaussian elimination can operate on words,
thereby processing multiple coefficients in each operation.
There are some delicate differences in the SGE procedure between the cases
of factorization and discrete-log matrices. Here, I supply a unified treatment
for both m ≥ n and m ≤ n. For dealing with matrices specifically from
factoring algorithms, one may look at the paper by Bender and Canfield.¹
Matrices from discrete-log algorithms are studied by LaMacchia and Odlyzko.²
During the execution of SGE, we call certain columns of A heavy, and
the rest light. An initial settlement of this discrimination may be based upon
the weights (the weight of a row or column of A is the number of non-zero
entries in that row or column) of the columns in comparison with αm for
a predetermined small positive fraction α (like 1/32). Later, when rows and
columns are removed, this quantitative notion of heaviness or lightness may be
violated. The steps of SGE attempt to keep the light columns light, perhaps
at the cost of increasing the weights of the heavy columns.
Step 1: Delete columns of weights zero and one.
(a) Remove all columns of weight zero. These columns correspond to vari-
ables which do not appear in the system at all, and can be discarded altogether.
(b) Remove all columns of weight one and the rows containing these non-
zero entries. Each such column refers to a variable that appears in exactly one
equation. When the values of other variables are available, the value of this
variable can be obtained from the equation (that is, row) being eliminated.
After the completion of Step 1, all columns have weights ≥ 2. Since the
matrix has potentially lost many light columns, it may be desirable to declare
some light columns as heavy. The obvious choices are those having the highest
weights among the light columns. This may be done so as to maintain the
heavy-vs-light discrimination based upon the fraction α. Notice that the value
of m (the number of rows) reduces after every column removal in Step 1(b).
Step 2: Delete rows of weights zero and one.
(a) A row of weight zero stands for the equation 0 = 0, and can be elim-
inated. Although such a row may be absent in the first round of the steps of
SGE, they may appear in later rounds.
(b) Let the row Ri contain only one non-zero entry. Suppose that this
entry corresponds to the variable xj . This row supplies the value of xj (we
assume that the non-zero entry is invertible modulo M ). Substitute this value
of xj in all the equations (rows) where xj occurs. Delete Ri and the column
(light or heavy) corresponding to xj . Repeat for all rows of weight one.
1 Edward A. Bender and E. Rodney Canfield, An approximate probabilistic model for structured Gaussian elimination, Journal of Algorithms, 31(2), 271–290, 1999.
2 Brian A. LaMacchia and Andrew M. Odlyzko, Solving large sparse linear systems over finite fields, CRYPTO, 109–133, 1991.



Step 3: Delete rows of weight one in the light part.


Consider each row Ri whose intersection with all the light columns contains
exactly one non-zero entry. Let this entry correspond to the variable xj . The
row Ri allows xj to be written in terms of some variables corresponding to
heavy columns. In all the equations (rows) where xj occurs, substitute this
value of xj to eliminate xj from these other rows. Then, remove the row Ri
and the column corresponding to the variable xj .
After all the appropriate rows for Steps 2 and 3 (and the corresponding
columns) are removed, it may be necessary to redefine heavy and light columns
as done at the end of Step 1.
Step 4: Delete redundant rows.
If the matrix contains more rows than columns (as is typical in the case of
discrete-log matrices), we may throw away some redundant rows. Rows with
the largest number of non-zero entries in the light columns are good candidates
for removal, because their removal makes the light part even lighter.
After Steps 2–4 are executed, some columns and/or rows may be exposed
as having weights one and zero. It is, therefore, necessary to repeat the round
of above steps until no further reduction is possible.
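Steps 1 and 2 above can be sketched as a pruning loop over GF(2). This toy version only tracks which rows and columns survive; it omits the heavy/light bookkeeping of Steps 3 and 4 and the back-substitution records a real implementation must keep:

```python
# Simplified sketch of SGE Steps 1 and 2 over GF(2): repeatedly delete
# weight-0 and weight-1 columns, and weight-0 and weight-1 rows, until no
# more apply. Each row is a set of column indices holding the 1 entries.

def sge_prune(rows, ncols):
    rows = [set(r) for r in rows]
    live_rows = set(range(len(rows)))
    live_cols = set(range(ncols))
    changed = True
    while changed:
        changed = False
        # Step 1: delete columns of weights zero and one (weight one also
        # removes the single row containing the non-zero entry).
        for c in list(live_cols):
            hits = [r for r in live_rows if c in rows[r]]
            if len(hits) == 0:
                live_cols.discard(c); changed = True
            elif len(hits) == 1:
                live_cols.discard(c); live_rows.discard(hits[0]); changed = True
        # Step 2: delete rows of weights zero and one (weight one also
        # removes the column of its single entry; substitutions omitted).
        for r in list(live_rows):
            w = rows[r] & live_cols
            if len(w) == 0:
                live_rows.discard(r); changed = True
            elif len(w) == 1:
                live_rows.discard(r); live_cols.discard(w.pop()); changed = True
    return sorted(live_rows), sorted(live_cols)

rows = [{0, 1}, {1, 2}, {3}, {0, 1, 2}]
print(sge_prune(rows, 5))   # ([0, 1, 3], [0, 1, 2])
```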

Example 8.1 Let me demonstrate the working of structured Gaussian elimi-


nation on a system of 25 equations in 25 variables. This system is not generated
by any sieve method. The randomly generated entries ensure that the first few
columns are heavy. We start with the following set of equations. All equations
in this example are over F2 , that is, congruences modulo M = 2.

x18 = 1
x1 + x8 + x17 = 0
x1 + x3 + x6 + x11 + x23 = 1
x2 + x5 + x23 = 1
x3 + x6 + x21 + x22 + x23 = 1
x2 + x3 + x13 + x21 = 0
x1 + x2 + x7 = 1
x22 = 1
x1 + x2 + x5 + x6 + x9 + x10 + x21 = 0
x1 + x3 + x14 + x18 = 0
x1 + x2 = 1
x1 + x5 = 1
x3 + x4 + x5 + x16 + x24 = 0
x1 + x3 + x13 + x20 = 1
x1 + x4 + x6 + x13 + x14 + x24 = 0
x2 = 1

x3 + x23 = 0
x4 + x10 + x16 + x20 = 0
x1 + x6 + x11 + x24 = 0
x10 + x11 + x16 + x21 = 1
x1 + x2 + x3 + x4 + x11 + x21 = 0
x1 + x2 + x4 + x17 = 0
x1 = 0
x1 + x4 + x18 = 0
x1 + x2 + x25 = 0
Below, the system is written in the matrix form. Only the reduced matrix A
(initially the 25 × 25 coefficient matrix) and the reduced vector b are shown.
A b
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
H H H H LH LLL L L L L L L L L L L L H L L L L
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
2 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
3 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
4 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
5 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1
6 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0
7 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
9 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
10 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0
11 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
12 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
13 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0
14 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1
15 1 0 0 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0
16 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
17 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
18 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0
19 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
20 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
21 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
22 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
23 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
25 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

The indices of the undeleted rows and columns are shown as row and
column headers. Let us take the heaviness indicator fraction as α = 1/5, that
is, a column is called heavy if and only if it contains at least αm = m/5 non-zero
entries, where m is the current count of rows in A. Heavy and light columns
are marked by H and L respectively.

Round 1

Step 1(a): Columns indexed 12, 15 and 19 have no non-zero entries, that is,
the variables x12 , x15 and x19 appear in no equations. We cannot solve for
these variables from the given system. Eliminating these columns reduces the
system to the following:

A b
1 2 3 4 5 6 7 8 9 10 11 13 14 16 17 18 20 21 22 23 24 25
H H H H LH LLL L L L L L L L L H L L L L
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1
2 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
3 1 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 1
4 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
5 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1
6 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0
7 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1
9 1 1 0 0 1 1 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0
10 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0
11 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
12 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
13 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0
14 1 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1
15 1 0 0 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0
16 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
17 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
18 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0
19 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0
20 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 0 1
21 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0
22 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
23 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
25 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0

Step 1(b): The columns with single non-zero entries are indexed 7, 8, 9, 25.
We delete these columns and the rows containing these non-zero entries (rows
7, 2, 9, 25). We will later make the following substitutions:

x7 = 1 + x1 + x2 , (8.3)
x8 = 0 + x1 + x17 , (8.4)
x9 = 0 + x1 + x2 + x5 + x6 + x10 + x21 , (8.5)
x25 = 0 + x1 + x2 . (8.6)

The system reduces as follows:


A b
1 2 3 4 5 6 10 11 13 14 16 17 18 20 21 22 23 24
H H H H LL L L L L L L L L L L L L
1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
3 1 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 1
4 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1
5 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 1
6 0 1 1 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1
10 1 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
11 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
12 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1
13 0 0 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0
14 1 0 1 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
15 1 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 1 0
16 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
17 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
18 0 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0
19 1 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0
20 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 1
21 1 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0
22 1 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
23 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0

Step 2(a): At this instant, there are no zero rows, so this step is not executed.
Step 2(b): We look at rows with single non-zero entries. Row 1 is the first
case with the sole non-zero entry at Column 18. We get the immediate solution
x18 = 1 (8.7)
The variable x18 occurs in Rows 10 and 24. We substitute the value of x18 in
these equations, and delete Row 1 and Column 18. This step is subsequently
repeated for the following rows of weight one.
Row index Column index Rows adjusted
8 22 5
16 2 4, 6, 11, 21, 22
23 1 3, 10, 11, 12, 14, 15, 19, 21, 22, 24
24 4 13, 15, 18, 21, 22
In the process, we obtain the solutions for the following variables:
x22 = 1 (8.8)
x2 = 1 (8.9)

x1 = 0 (8.10)
x4 = 1 (8.11)
After all these five iterations of Step 2(b), the system reduces to:
A b
3 5 6 10 11 13 14 16 17 20 21 23 24
H LH L H L L L L L H H L
3 1 0 1 0 1 0 0 0 0 0 0 1 0 1
4 0 1 0 0 0 0 0 0 0 0 0 1 0 0
5 1 0 1 0 0 0 0 0 0 0 1 1 0 0
6 1 0 0 0 0 1 0 0 0 0 1 0 0 1
10 1 0 0 0 0 0 1 0 0 0 0 0 0 1
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 1 0 0 0 0 0 0 0 0 0 0 0 1
13 1 1 0 0 0 0 0 1 0 0 0 0 1 1
14 1 0 0 0 0 1 0 0 0 1 0 0 0 1
15 0 0 1 0 0 1 1 0 0 0 0 0 1 1
17 1 0 0 0 0 0 0 0 0 0 0 1 0 0
18 0 0 0 1 0 0 0 1 0 1 0 0 0 1
19 0 0 1 0 1 0 0 0 0 0 0 0 1 0
20 0 0 0 1 1 0 0 1 0 0 1 0 0 1
21 1 0 0 0 1 0 0 0 0 0 1 0 0 0
22 0 0 0 0 0 0 0 0 1 0 0 0 0 0

The columns are relabeled as heavy/light after all the iterations of Step 2(b).
Now, there are 16 rows, so columns with at most three non-zero entries are
light, and those with more than three entries are heavy.
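The relabeling can be sketched in a few lines. The threshold (three, for the 16 rows of this example) is passed in as a parameter, since the general rule depends on the current number of rows; the function name and representation are illustrative.

```python
from collections import Counter

# A minimal sketch of the heavy/light relabeling: rows are sets of column
# indices holding non-zero (mod 2) entries; a column is light iff its weight
# does not exceed the threshold (3 matches the 16-row system of this example).
def classify_columns(rows, threshold):
    weight = Counter(j for cols in rows for j in cols)
    return {j: ("L" if w <= threshold else "H") for j, w in weight.items()}
```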
Step 3: The first row to contain only one non-zero entry in the light columns
is Row 4 with that entry being in Column 5. This gives us the following future
possibility of back substitution:
x5 = 0 + x23 . (8.12)
We substitute this expression for x5 in all other equations where x5 appears.
Here, these other equations are 12 and 13. We effect this substitution by a
subtraction (same as addition modulo 2) of Row 4 from Rows 12 and 13. The
modified system is shown below. Notice that the subtraction of (multiples of)
the row being deleted from other rows may change the weights of some columns
other than the one being deleted. Consequently, it is necessary to relabel each
column (or at least the columns suffering changes) as heavy/light, and restart
the search for rows with single non-zero entries in the new light columns.
402 Computational Number Theory

A b
3 6 10 11 13 14 16 17 20 21 23 24
H H L H H L H L L H H H
3 1 1 0 1 0 0 0 0 0 0 1 0 1
5 1 1 0 0 0 0 0 0 0 1 1 0 0
6 1 0 0 0 1 0 0 0 0 1 0 0 1
10 1 0 0 0 0 1 0 0 0 0 0 0 1
11 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 1 0 1
13 1 0 0 0 0 0 1 0 0 0 1 1 1
14 1 0 0 0 1 0 0 0 1 0 0 0 1
15 0 1 0 0 1 1 0 0 0 0 0 1 1
17 1 0 0 0 0 0 0 0 0 0 1 0 0
18 0 0 1 0 0 0 1 0 1 0 0 0 1
19 0 1 0 1 0 0 0 0 0 0 0 1 0
20 0 0 1 1 0 0 1 0 0 1 0 0 1
21 1 0 0 1 0 0 0 0 0 1 0 0 0
22 0 0 0 0 0 0 0 1 0 0 0 0 0
Subsequently, Step 3 is executed several times. I am not showing these reductions
individually, but record the substitutions carried out in these steps.
x14 = 1 + x3 (8.13)
x20 = 1 + x3 + x13 (8.14)
x10 = 0 + x3 + x13 + x16 (8.15)
x16 = 1 + x3 + x23 + x24 (8.16)
x24 = 0 + x3 + x6 + x13 (8.17)
x17 = 0 (8.18)
After all these steps, the system reduces as follows:
A b
3 6 11 13 21 23
H H H H H H
3 1 1 1 0 0 1 1
5 1 1 0 0 1 1 0
6 1 0 0 1 1 0 1
11 0 0 0 0 0 0 0
12 0 0 0 0 0 1 1
17 1 0 0 0 0 1 0
19 1 0 1 1 0 0 0
20 1 0 1 1 1 0 1
21 1 0 1 0 1 0 0
This completes the first round of Steps 1–3. We prefer to avoid Step 4 in
this example, since we started with a square system, and the number of rows
always remains close to the number of columns.
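The singleton-row eliminations of Step 2(b) used repeatedly above can be sketched as follows. Rows are kept as sets of column indices together with a right-hand-side bit, and substitution modulo 2 is an XOR on the right-hand side; this is an illustrative sketch only, and the heavy/light bookkeeping of Steps 1 and 3 is omitted.

```python
# A minimal sketch of Step 2(b) over GF(2): repeatedly pick a row with a
# single non-zero entry, read off that variable, substitute it into every
# other row (an XOR on the right-hand side), and delete the row and column.
def sge_singletons(rows, rhs):
    solved = {}
    changed = True
    while changed:
        changed = False
        for i, cols in enumerate(rows):
            if len(cols) == 1:
                (j,) = cols
                solved[j] = rhs[i]               # x_j = b_i
                for k, other in enumerate(rows):
                    if k != i and j in other:
                        other.discard(j)         # substitute x_j elsewhere
                        rhs[k] ^= solved[j]
                cols.clear()                     # delete the processed row
                changed = True
    return solved
```

Applied to the pair of equations x2 = 1 and x1 + x2 = 0, for instance, it returns x2 = 1 and then x1 = 1.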

Round 2: Although all the columns are marked heavy now, there still remain
reduction possibilities. So we go through another round of the above steps.
Step 1(a) and 1(b): There are no columns of weight zero or one. So these
steps are not executed in this round.
Step 2(a): Row 11 contains only zero entries, and is deleted from the system.
Step 2(b): Row 12 contains a single non-zero entry giving
x23 = 1. (8.19)
We also adjust Rows 3, 5 and 17 by substituting this value of x23 . This lets
Row 17 have a single non-zero entry at Column 3, so we have another reduction
x3 = 1, (8.20)
which changes Rows 3, 5, 6, 19, 20 and 21. The reduced system now becomes
A b
6 11 13 21
H H H H
3 1 1 0 0 1
5 1 0 0 1 0
6 0 0 1 1 0
19 0 1 1 0 1
20 0 1 1 1 0
21 0 1 0 1 1

Step 3: Since all columns are heavy at this point, this step is not executed.
Another round of SGE does not reduce the system further. So we stop at
this point. The final reduced system consists of the following equations:
x6 + x11 = 1
x6 + x21 = 0
x13 + x21 = 0
x11 + x13 = 1
x11 + x13 + x21 = 0
x11 + x21 = 1
Standard Gaussian elimination on this system gives the unique solution:
x6 = 1 (8.21)
x11 = 0 (8.22)
x13 = 1 (8.23)
x21 = 1 (8.24)
Now, we work backwards to obtain the values of other variables using the
substitution equations generated so far. The way SGE works ensures that
when we use a substitution equation to determine the value of a variable, all
variables appearing on the right side of the equation are already solved. For
example, when we invoke Eqn (8.17) to calculate x24 , we know the values of
x6 , x11 , x13 , x21 , x3 , x23 , x17 . The right side (x3 + x6 + x13 ) contains a subset
of only these variables. After x24 is computed, we use Eqn (8.16) to calculate
x16 (which requires the solution for x24 ). These back substitution steps are
not elaborated here. Only the final solution is given below.
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22 x23 x24 x25
0 1 1 1 1 1 0 0 0 0 0 − 1 0 − 0 0 1 − 1 1 1 1 1 1
Here, I use − to indicate that solutions for these variables cannot be (uniquely)
obtained from the given system. ¤

Structured Gaussian elimination is a heuristic algorithm, and lacks a solid
analytic foundation (Bender and Canfield provide some estimates though). In
practice, structured Gaussian elimination can substantially reduce the size of a
sparse system. For example, LaMacchia and Odlyzko report implementations
of SGE achieving 50–90% size reduction.

8.2 Lanczos Method


The Lanczos3 method is an iterative method for solving linear systems.
Originally proposed for solving equations over R, the Lanczos method can be
adapted4 to work for finite fields. For an n × n system, it takes O˜(n2 ) time.
To start with, consider the system defined over the field R of real numbers:
Ax = b,
where A is an n × n symmetric positive-definite5 matrix, and b ≠ 0. The
Lanczos method starts with an initial direction w0 . Subsequently, it keeps
on generating linearly independent directions w1 , w2 , . . . until a direction ws
is found which is linearly dependent upon w0 , w1 , . . . , ws−1 . Any n + 1 (or
more) vectors in R^n are linearly dependent, so s ≤ n, that is, the Lanczos loop
terminates after at most n iterations. A solution to the original linear system
is obtained as a linear combination of the direction vectors w0 , w1 , . . . , ws−1 .
Two vectors y_1 and y_2 are called A-orthogonal if y_1^t A y_2 = 0. We take
w0 = b. (8.25)
Subsequently, for i = 1, 2, 3, . . . , we generate
3 Cornelius Lanczos (1893–1974) was a Hungarian mathematician and physicist.
4 See the citation of Footnote 2.
5 An n × n matrix A with real entries is called positive-definite if y^tAy > 0 for all
n-dimensional real vectors y ≠ 0.


    w_i = A w_{i−1} − Σ_{j=0}^{i−1} c_{ij} w_j ,  where c_{ij} = (w_j^t A^2 w_{i−1}) / (w_j^t A w_j).          (8.26)

One can prove by induction on i that w_i^t A w_j = 0 for i > j. Moreover, since
A is symmetric, for i < j, we have w_i^t A w_j = (w_j^t A w_i)^t = 0. Therefore,
the direction vectors w_0, w_1, w_2, . . . are A-orthogonal to one another. Now,
w_s is the first vector linearly dependent upon the previous direction vectors,
that is, w_s = Σ_{j=0}^{s−1} a_j w_j. But then, w_s^t A w_s = Σ_{j=0}^{s−1} a_j w_s^t A w_j = 0 by
A-orthogonality. The positive-definiteness of A implies that w_s = 0, that is, the
Lanczos iterations continue until we obtain a direction vector w_s = 0. Take

    x = Σ_{j=0}^{s−1} b_j w_j ,  where b_j = (w_j^t b) / (w_j^t A w_j).          (8.27)

Since x is a linear combination of w_0, w_1, . . . , w_{s−1}, it follows that Ax is a
linear combination of Aw_0, Aw_1, . . . , Aw_{s−1}. By Eqn (8.26), Aw_j is a linear
combination of w0 , w1 , . . . , wj+1 . Therefore, Ax can be expressed as a linear
combination of w0 , w1 , . . . , ws . But ws = 0, and w0 = b, so we can write

    Ax − b = Σ_{j=0}^{s−1} d_j w_j

for some real coefficients d_j. Also, by the A-orthogonality of the direction
vectors and by Eqn (8.27), we have w_j^t A x = b_j w_j^t A w_j = w_j^t b for all j,
0 ≤ j ≤ s − 1. But then, (Ax − b)^t (Ax − b) = Σ_{j=0}^{s−1} d_j w_j^t (Ax − b) = 0. This
implies that Ax − b = 0, that is, x computed as above satisfies Ax = b.
The A-orthogonality of the direction vectors implies that the coefficients
c_{i,j} = 0 in Eqn (8.26) for 0 ≤ j ≤ i − 3, and simplifies this formula as

    w_i = A w_{i−1} − [(Aw_{i−1})^t (Aw_{i−1}) / (w_{i−1}^t (Aw_{i−1}))] w_{i−1}
                    − [(Aw_{i−2})^t (Aw_{i−1}) / (w_{i−2}^t (Aw_{i−2}))] w_{i−2}          (8.28)

for i ≥ 2. We include the computation of w_1 in the initialization step, and
show the Lanczos procedure as Algorithm 8.1. The algorithm makes some
optimizations for implementation, like storing the vector v_i = Aw_{i−1} and the
two scalar quantities α_i = v_i^t v_i and β_i = w_{i−1}^t v_i. Although different values
(vectors and scalars) are shown as suffixed, it suffices to remember these values
only from (at most) two previous iterations.
Algorithm 8.1 works for real symmetric positive-definite matrices A with
b ≠ 0 (for b = 0, it gives the trivial solution x = 0). If we want to adapt this
method to work for a finite field, we encounter several problems. All these
problems can be reasonably solved, and this method can be used to solve
systems like (8.1) of our interest. We now discuss these adaptations in detail.

Algorithm 8.1: Lanczos method for solving Ax = b

/* Initialize for i = 0 */
Set w_0 = b.
/* Initialize for i = 1 */
Compute v_1 = Aw_0, α_1 = v_1^t v_1, and β_1 = w_0^t v_1.
Compute w_1 = v_1 − (α_1/β_1) w_0.
Compute x = (w_0^t b / β_1) w_0.
Set i = 1.
While (w_i ≠ 0) {
    Increment i by 1.
    Compute v_i = Aw_{i−1}, α_i = v_i^t v_i, and β_i = w_{i−1}^t v_i.
    Compute the new direction w_i = v_i − (α_i/β_i) w_{i−1} − (v_i^t v_{i−1}/β_{i−1}) w_{i−2}.
    Update the solution x = x + (w_{i−1}^t b / β_i) w_{i−1}.
}
Return x.
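A direct transcription of Algorithm 8.1 to a prime field F_p (anticipating the adaptation discussed below) might look as follows; division by β_i becomes multiplication by a modular inverse, and the method simply fails if some β_i = 0. The function name and representation are illustrative.

```python
# A sketch of Algorithm 8.1 over F_p (assumptions: A is a symmetric n x n
# list of lists of residues, b a list of residues; the sketch raises an
# error on the breakdown case beta_i = 0 discussed below).
def lanczos_fp(A, b, p):
    n = len(A)
    matvec = lambda w: [sum(A[r][k] * w[k] for k in range(n)) % p for r in range(n)]
    dot = lambda u, v: sum(x * y for x, y in zip(u, v)) % p
    inv = lambda a: pow(a, p - 2, p)            # inverse by Fermat's little theorem
    # Initialization for i = 0 and i = 1.
    w_prev = b[:]                               # w_0 = b
    v = matvec(w_prev)                          # v_1 = A w_0
    alpha, beta = dot(v, v), dot(w_prev, v)     # alpha_1, beta_1
    if beta == 0:
        raise ZeroDivisionError("beta_1 = 0: Lanczos breaks down")
    w = [(v[k] - alpha * inv(beta) * w_prev[k]) % p for k in range(n)]   # w_1
    x = [dot(w_prev, b) * inv(beta) * w_prev[k] % p for k in range(n)]
    v_prev, beta_prev = v, beta
    # Lanczos loop: stop when the current direction w_i is the zero vector.
    while any(w):
        w_prev2, w_prev = w_prev, w
        v = matvec(w_prev)                      # v_i = A w_{i-1}
        alpha, beta = dot(v, v), dot(w_prev, v)
        if beta == 0:
            raise ZeroDivisionError("beta_i = 0: Lanczos breaks down")
        c1 = alpha * inv(beta) % p
        c2 = dot(v, v_prev) * inv(beta_prev) % p
        w = [(v[k] - c1 * w_prev[k] - c2 * w_prev2[k]) % p for k in range(n)]
        x = [(x[k] + dot(w_prev, b) * inv(beta) * w_prev[k]) % p for k in range(n)]
        v_prev, beta_prev = v, beta
    return x
```

On the 4 × 4 system modulo 97 of Example 8.2(1) below, this returns x = (64, 75, 57, 34).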

In general, our coefficient matrix A is not symmetric. It is not even square.
However, instead of solving Ax = b, we can solve (A^tA)x = A^tb. If A is
of close-to-full rank (that is, of rank nearly min(m, n), where A is an m × n
matrix), the two systems are equivalent with high probability. The matrix
A^tA is not only square (n × n), but symmetric too. As mentioned earlier, we
do not compute the matrix A^tA explicitly, since it may be significantly less
sparse than A. In Algorithm 8.1, A is needed during the computation of the
vectors v_i only. We carry this out as A^t(Aw_{i−1}) (instead of as (A^tA)w_{i−1}).
The (modified) vector A^tb (needed during initialization, and updating of the
solution x) may be precomputed once before the loop.
The notion of positive-definiteness is, however, meaningless in the context
of finite fields. For real matrices, this property of A implies that y^tAy = 0 if
and only if y = 0, that is, no non-zero vector can be A-orthogonal to itself,
that is, the computations in Algorithm 8.1 proceed without error, since all β_i
values are non-zero (these are the only quantities by which divisions are made).
When A is a matrix over F_q, a non-zero direction vector w_{i−1} may lead to
the situation β_i = w_{i−1}^t v_i = w_{i−1}^t A w_{i−1} = 0. This problem is rather serious
for small values of q, that is, for q smaller than or comparable to n (there
are at most n iterations, and so at most n values of β_i are encountered in the
entire algorithm). If q ≫ n, the odds of encountering this bad situation
are low, and Algorithm 8.1 is expected to terminate without trouble.
A way to get around this problem is by working in a suitable extension
field F_{q^d} for which q^d ≫ n. The original system (A^tA)x = A^tb leads to
computations that proceed in F_q itself. So we choose a random invertible m × m
diagonal matrix D with entries from F_{q^d}, and solve the equivalent system
DAx = Db, that is, feed the system (DA)^t(DA)x = (DA)^tDb (that is,
the system (A^tD^2A)x = A^tD^2b) to Algorithm 8.1. The introduction of D
does not affect sparsity (A and DA have non-zero entries at exactly the same
locations), but the storage of DA may take more space than storing D and
A individually (since the elements of DA are from a larger field compared to
those of A). So, we keep the modified coefficient matrix A^tD^2A in this factored
form, and compute v_i = A^t(D^2(Aw_{i−1})). Although the final solution belongs
to (F_q)^n, the computation proceeds in F_{q^d}, and is less likely to encounter the
problem associated with the original matrix. Still, if the computation fails, we
choose another random matrix D, and restart from the beginning.

Example 8.2 (1) Consider the 6 × 4 system modulo the prime p = 97:
   
    [  0   0  49  56 ]            [ 41 ]
    [  0  35  12   0 ]  [ x1 ]    [ 11 ]
    [ 77  21   0   3 ]  [ x2 ]    [  9 ]
    [  0  79   0   0 ]  [ x3 ]  = [  8 ]          (8.29)
    [ 26   0  68  24 ]  [ x4 ]    [ 51 ]
    [ 17   0   0  53 ]            [ 77 ]
The coefficient matrix is not square. Left multiplication by the transpose of
this matrix gives the square system:
    
    [  7  65  22  10 ]  [ x1 ]    [ 30 ]
    [ 65  50  32  63 ]  [ x2 ]    [ 42 ]
    [ 22  32  88  11 ]  [ x3 ]  = [ 80 ]          (8.30)
    [ 10  63  11  31 ]  [ x4 ]    [ 62 ]
By an abuse of notation, we let A denote the coefficient matrix of this square
system. In practice, we do not compute A explicitly, but keep it as the product
B^tB, where B is the 6 × 4 coefficient matrix of the original system. We feed
System (8.30) to Algorithm 8.1. The computations made by the algorithm are
summarized in the following table (vectors are shown transposed).

    i    v_i^t                 α_i    β_i    w_i^t                 x^t
    0    −                     −      −     (30, 42, 80, 62)      −
    1    (82, 40, 26, 25)      22     10    (16, 64, 44,  5)      (21, 10, 56, 24)
    2    (52, 46, 22, 78)      39     90    (79, 80, 46, 19)      (86, 76, 65, 14)
    3    (68, 67, 19, 38)      54      9    (58, 85, 49, 69)      (56, 80, 77, 78)
    4    (36, 64, 46, 77)      51     60    ( 0,  0,  0,  0)      (64, 75, 57, 34)

We get w4 = 0, and the Lanczos loop terminates. The value of x after this
iteration gives the solution of System (8.30): x1 = 64, x2 = 75, x3 = 57, x4 =
34. This happens to be a solution of System (8.29) too.

(2) Let us now try to solve a 6 × 4 system modulo 7:


   
    [ 0  0  4  5 ]            [ 2 ]
    [ 0  3  1  0 ]  [ x1 ]    [ 1 ]
    [ 5  2  0  3 ]  [ x2 ]    [ 3 ]
    [ 0  3  0  0 ]  [ x3 ]  = [ 6 ]          (8.31)
    [ 2  0  4  6 ]  [ x4 ]    [ 1 ]
    [ 1  0  0  5 ]            [ 6 ]

Multiplication by the transpose of the coefficient matrix gives the following
square system which we feed to Algorithm 8.1.

    [ 2  3  1  4 ]  [ x1 ]    [ 2 ]
    [ 3  1  3  6 ]  [ x2 ]    [ 6 ]
    [ 1  3  5  2 ]  [ x3 ]  = [ 6 ]          (8.32)
    [ 4  6  2  4 ]  [ x4 ]    [ 6 ]

Since the modulus 7 is small (comparable to n = 4), the Lanczos algorithm
fails on this system, as shown below. We obtain β_2 = 0 even though w_1 ≠ 0.

    i    v_i^t            α_i    β_i    w_i^t            x^t
    0    −                −      −     (2, 6, 6, 6)     −
    1    (3, 3, 6, 3)     0      1     (3, 3, 6, 3)     (0, 0, 0, 0)
    2    (5, 6, 6, 5)     3      0     Failure
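The breakdown just observed can be reproduced directly with a few lines (an illustrative sketch of the failing computation only, not of the full algorithm):

```python
# Reproducing the failure of Example 8.2(2) over F_7: the direction w_1 is
# non-zero, yet beta_2 = w_1^t (A w_1) = 0, so the next division is impossible.
p = 7
A = [[2, 3, 1, 4], [3, 1, 3, 6], [1, 3, 5, 2], [4, 6, 2, 4]]
b = [2, 6, 6, 6]
matvec = lambda w: [sum(A[r][k] * w[k] for k in range(4)) % p for r in range(4)]
dot = lambda u, v: sum(x * y for x, y in zip(u, v)) % p
w0 = b
v1 = matvec(w0)                                       # v_1 = A w_0
c = dot(v1, v1) * pow(dot(w0, v1), p - 2, p) % p      # alpha_1 / beta_1
w1 = [(v1[k] - c * w0[k]) % p for k in range(4)]      # w_1 = (3, 3, 6, 3)
beta2 = dot(w1, matvec(w1))                           # beta_2 = 0 although w_1 != 0
```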

(3) In order to make Algorithm 8.1 succeed on System (8.31), we use the
quadratic extension F_{49} = F_7(θ) with θ^2 + 1 = 0. Since 49 is somewhat large
compared to n = 4, the computations are now expected to terminate without
error. We multiply (8.31) by the 6 × 6 diagonal matrix
 
        [ 6θ    0      0      0      0    0    ]
        [ 0     4θ+3   0      0      0    0    ]
    D = [ 0     0      6θ+1   0      0    0    ]
        [ 0     0      0      3θ+5   0    0    ]
        [ 0     0      0      0      θ    0    ]
        [ 0     0      0      0      0    2θ+6 ]

over F_{49}, and generate the equivalent system

    [ 0      0      3θ     2θ   ]            [ 5θ   ]
    [ 0      5θ+2   4θ+3   0    ]  [ x1 ]    [ 4θ+3 ]
    [ 2θ+5   5θ+2   0      4θ+3 ]  [ x2 ]    [ 4θ+3 ]
    [ 0      2θ+1   0      0    ]  [ x3 ]  = [ 4θ+2 ]          (8.33)
    [ 2θ     0      4θ     6θ   ]  [ x4 ]    [ θ    ]
    [ 2θ+6   0      0      3θ+2 ]            [ 5θ+1 ]
which leads to the corresponding square system:
    
    [ 2θ     θ      6      6θ+1 ]  [ x1 ]    [ 2θ+1 ]
    [ θ      2θ+4   2θ     2θ   ]  [ x2 ]    [ 5θ+1 ]
    [ 6      2θ     3θ+3   5    ]  [ x3 ]  = [ 3θ+2 ]          (8.34)
    [ 6θ+1   2θ     5      θ+4  ]  [ x4 ]    [ 2θ+6 ]
Algorithm 8.1 works on System (8.34) as follows:
    i    v_i^t                         α_i     β_i     w_i^t                         x^t
    0    −                             −       −      (2θ+1, 5θ+1, 3θ+2, 2θ+6)      −
    1    (3θ+4, 4θ+3, 4θ+2, 4θ+4)     5θ+2    6θ+3   (6θ+1, 6θ+3, θ+2, 2θ+4)       (5θ, 4θ+4, θ+1, 3θ+3)
    2    (2θ, θ+2, 5θ+3, 5)           6θ+1    5θ+2   (2θ, 4, 5θ+2, 4θ+3)           (2θ+2, 4, 6θ+3, 6θ)
    3    (1, 4θ+3, 5θ+6, 5θ+6)        4θ+2    6θ+4   (6θ+4, 6, 4θ+3, 3θ+5)         (3θ+1, 2θ+6, 6θ+5, 6θ+3)
    4    (θ, 4θ+4, 4, 2θ)             4θ+4    5θ+3   (0, 0, 0, 0)                  (5, 2, 2, 3)
These computations do not encounter the situation that β_i = 0 for w_{i−1} ≠ 0.
The final solution belongs to F_7:
x_1 = 5, x_2 = 2, x_3 = 2, x_4 = 3. ¤
If A is an m × n matrix with m = Θ(n), each iteration of Algorithm 8.1
involves O(nk) finite-field operations, where k is the maximum number of
non-zero entries in a row of A. Moreover, there are at most n iterations of the
Lanczos loop. Therefore, if k = O(log n), and n is a subexponential expression
in the size of the underlying field (as typically hold in our cases of interest),
the running time of Algorithm 8.1 is O˜(n2 ) (or O˜(m2 )).

8.3 Wiedemann Method


Wiedemann’s method6 for solving linear systems uses linear recurrent
sequences. Let a_0, a_1, a_2, . . . be an infinite sequence of elements from a field K.
The first d terms are supplied as initial conditions, and for all k ≥ d, we have

    a_k = c_{d−1} a_{k−1} + c_{d−2} a_{k−2} + · · · + c_1 a_{k−d+1} + c_0 a_{k−d}          (8.35)

for some constant elements c_0, c_1, . . . , c_{d−1} ∈ K. Given the order d of the
sequence, and 2d terms a_0, a_1, . . . , a_{2d−1} in the sequence, one can determine
c_0, c_1, . . . , c_{d−1} as follows. For every k ≥ d, we can rewrite Eqn (8.35) as
    ( a_{k−1}  a_{k−2}  · · ·  a_{k−d+1}  a_{k−d} ) ( c_{d−1}  c_{d−2}  · · ·  c_1  c_0 )^t = a_k .
Using this relation for k = d, d + 1, . . . , 2d − 1, we write
c   a 
a d−1 d
d−1 ad−2 · · · a1 a0 
 cd−2   ad+1 
 ad ad−1 · · · a2 a1   .   . 
 . .. ..  .  =  . , (8.36)
 .  .   . 
. . ··· . ···    
c1 a2d−2
a2d−2 a2d−3 · · · ad ad−1
c0 a2d−1
that is, the coefficients c0 , c1 , . . . , cd−1 in the recurrence (8.35) can be obtained
by solving a linear system.
This method of computing c0 , c1 , . . . , cd−1 suffers from two problems. First,
under standard matrix arithmetic, the procedure takes O(d^3) time. Faster
matrix arithmetic (like Strassen’s matrix multiplication7) reduces the exponent
below three, but the desired goal of O(d^2) running time cannot be achieved.
The second problem is that the coefficient matrix of Eqn (8.36) may be non-
invertible. This may happen, for example, when a recurrence relation of order
smaller than d also generates the sequence a0 , a1 , a2 , . . . .
There are several ways to solve these problems. The coefficient matrix in
Eqn (8.36) has a particular structure. Each of its rows (except the topmost) is
obtained by right shifting the previous row (and introducing a new term at the
leftmost position). Such a matrix is called a Toeplitz matrix. A Toeplitz system
like (8.36) can be solved in O(d2 ) time using special algorithms. I will come
back to this topic again while studying the block Wiedemann algorithm. Here,
6 Douglas H. Wiedemann, Solving sparse linear equations over finite fields, IEEE
Transactions on Information Theory, 32(1), 54–62, 1986.


7 Volker Strassen, Gaussian elimination is not optimal, Numerische Mathematik, 13,
354–356, 1969.

I explain the Berlekamp–Massey algorithm8 for computing c0 , c1 , . . . , cd−1 .


The generating function for the sequence a_0, a_1, a_2, . . . is

    G(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_{d−1} x^{d−1} + a_d x^d + a_{d+1} x^{d+1} + · · ·
         = (a_0 + a_1 x + a_2 x^2 + · · · + a_{d−1} x^{d−1}) + Σ_{k≥d} a_k x^k
         = (a_0 + a_1 x + a_2 x^2 + · · · + a_{d−1} x^{d−1})
               + Σ_{k≥d} (c_{d−1} a_{k−1} + c_{d−2} a_{k−2} + · · · + c_1 a_{k−d+1} + c_0 a_{k−d}) x^k
         = R(x) + (c_{d−1} x + c_{d−2} x^2 + · · · + c_0 x^d) G(x),

that is,

    C(x)G(x) = R(x),          (8.37)

where R(x) is a polynomial of degree ≤ d − 1, and

    C(x) = 1 − c_{d−1} x − c_{d−2} x^2 − · · · − c_0 x^d.          (8.38)
In order to compute c0 , c1 , . . . , cd−1 , it suffices to compute C(x). To that end,
we use an extended gcd computation as follows. Let
    A(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_{2d−1} x^{2d−1}.
Then, Eqn (8.37) can be rewritten as
    C(x)A(x) + B(x)x^{2d} = R(x)
for some polynomial B(x). These observations lead to Algorithm 8.2 for com-
puting the coefficients c0 , c1 , . . . , cd−1 in Eqn (8.35), given the first 2d terms
a0 , a1 , . . . , a2d−1 in the sequence. The extended gcd computation maintains
an invariant of the form B_1(x)x^{2d} + C_1(x)A(x) = R_1(x). Since the multiplier
B1 (x) is not needed, it is not explicitly computed.

Algorithm 8.2: Berlekamp–Massey algorithm

Let R_0(x) = x^{2d} and R_1(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_{2d−1} x^{2d−1}.
Initialize C_0(x) = 0 and C_1(x) = 1.
While (deg R_1(x) ≥ d) {
    Let Q(x) = R_0(x) quot R_1(x) and R(x) = R_0(x) rem R_1(x).
    Update C(x) = C_0(x) − Q(x)C_1(x).
    Prepare for next iteration:
        R_0(x) = R_1(x), R_1(x) = R(x), C_0(x) = C_1(x) and C_1(x) = C(x).
}
Divide C_1(x) by its constant term.
Recover the coefficients c_0, c_1, . . . , c_{d−1} from C_1(x).

8 Originally, Elwyn R. Berlekamp proposed this algorithm for decoding BCH codes (see

Berlekamp’s 1968 book Algebraic Coding Theory). James Massey (Shift-register synthesis
and BCH decoding, IEEE Transactions on Information Theory, 15(1), 122–127, 1969)
simplified and modified Berlekamp’s algorithm to the form presented in this book.

Example 8.3 (1) Consider the sequence from F_7 defined as

    a_0 = 0, a_1 = 1, a_2 = 2, a_k = 3a_{k−1} + a_{k−3} for k ≥ 3.          (8.39)
The first six terms in this sequence are 0, 1, 2, 6, 5, 3. The extended gcd
computation in Algorithm 8.2 is shown below:
    i    Q_i(x)            R_i(x)                             C_i(x)
    0    −                 x^6                                0
    1    −                 3x^5 + 5x^4 + 6x^3 + 2x^2 + x      1
    2    5x + 1            5x^3 + 6x                          2x + 6
    3    2x^2 + x + 3      3x^2 + 4x                          3x^3 + 2x + 4

We obtain C(x) = 4^{−1}(4 + 2x + 3x^3) = 1 + 4x + 6x^3 = 1 − 3x − x^3, which is
consistent with the recurrence (8.39).
(2) Now, consider the following sequence from F7 :
    a_0 = 0, a_1 = 1, a_2 = 5, a_k = 3a_{k−1} + a_{k−3} for k ≥ 3.          (8.40)
The first six terms in the sequence are 0, 1, 5, 1, 4, 3, and Algorithm 8.2
proceeds as follows:
    i    Q_i(x)      R_i(x)                            C_i(x)
    0    −           x^6                               0
    1    −           3x^5 + 4x^4 + x^3 + 5x^2 + x      1
    2    5x + 5      3x^4 + 5x^3 + 5x^2 + 2x           2x + 2
    3    x + 2       4x                                5x^2 + x + 4

This gives C(x) = 4^{−1}(4 + x + 5x^2) = 1 + 2x + 3x^2 = 1 − 5x − 4x^2. This appears
to contradict the recurrence (8.40). However, notice that (8.40) is not the
recurrence of smallest order to generate this sequence. Indeed, the following
is the recurrence of smallest order for this purpose.

    a_0 = 0, a_1 = 1, a_k = 5a_{k−1} + 4a_{k−2} for k ≥ 2.

Note also that 1 − 5x − 4x^2 divides 1 − 3x − x^3 in F_7[x]. ¤
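Algorithm 8.2 can be transcribed as a short sketch, with polynomials held as ascending coefficient lists over F_p; the helper names are illustrative.

```python
def poly_mul(u, v, p):
    # Product of two polynomials over F_p (ascending coefficient lists).
    w = [0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            w[i + j] = (w[i + j] + ui * vj) % p
    return w

def poly_divmod(u, v, p):
    # Quotient and remainder of u by v over F_p (v has a non-zero leading term).
    u, inv = u[:], pow(v[-1], p - 2, p)
    q = [0] * max(len(u) - len(v) + 1, 1)
    for k in range(len(u) - len(v), -1, -1):
        q[k] = u[k + len(v) - 1] * inv % p
        for j, vj in enumerate(v):
            u[k + j] = (u[k + j] - q[k] * vj) % p
    while len(u) > 1 and u[-1] == 0:
        u.pop()
    return q, u

def berlekamp_massey(a, d, p):
    # a = [a_0, ..., a_{2d-1}]; returns C(x) of Eqn (8.38), normalized so
    # that its constant term is 1, as an ascending coefficient list.
    R0, R1 = [0] * (2 * d) + [1], a[:]          # R_0 = x^{2d}, R_1 = A(x)
    while len(R1) > 1 and R1[-1] == 0:
        R1.pop()
    C0, C1 = [0], [1]
    while len(R1) - 1 >= d:                     # while deg R_1 >= d
        Q, R = poly_divmod(R0, R1, p)
        QC = poly_mul(Q, C1, p)
        C = [(x - y) % p for x, y in zip(C0 + [0] * len(QC), QC + [0] * len(C0))]
        while len(C) > 1 and C[-1] == 0:
            C.pop()
        R0, R1, C0, C1 = R1, R, C1, C
    inv0 = pow(C1[0], p - 2, p)                 # divide by the constant term
    return [c * inv0 % p for c in C1]
```

On the two sequences of Example 8.3 this returns 1 + 4x + 6x^3 and 1 + 2x + 3x^2, respectively, matching the hand computations above.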

It is easy to argue that the Berlekamp–Massey algorithm performs a total
of only O(d^2) basic operations in the field K.
Let us now come to Wiedemann’s algorithm. Let Ax = b be a sparse linear
system (over a field K), where A is a square n × n matrix. A need not be
symmetric (or positive-definite). The characteristic equation of A is
χA (x) = det(xI − A),
where I is the n × n identity matrix. Cayley–Hamilton theorem9 states that A
satisfies χA (x), that is, χA (A) = 0. The set of all polynomials in K[x] satisfied
9 This is named after the British mathematician Arthur Cayley (1821–1895) and the Irish

mathematician and physicist William Rowan Hamilton (1805–1865).



by A is an ideal of K[x]. The monic generator of this ideal, that is, the monic
non-zero polynomial of the smallest degree, which A satisfies, is called the
minimal polynomial of A and denoted as µA (x). Clearly, µA (x)|χA (x) in K[x].
Wiedemann’s algorithm starts by probabilistically determining µA (x). Let

    µ_A(x) = x^d − c_{d−1} x^{d−1} − c_{d−2} x^{d−2} − · · · − c_1 x − c_0 ∈ K[x]          (8.41)

with d = deg µ_A(x) ≤ n. Since µ_A(A) = 0, for any n × 1 non-zero vector v
and for any integer k ≥ d, we have

    A^k v − c_{d−1} A^{k−1} v − c_{d−2} A^{k−2} v − · · · − c_1 A^{k−d+1} v − c_0 A^{k−d} v = 0.          (8.42)

Let v_k be the element of A^k v at some particular position. The sequence v_k,
k ≥ 0, satisfies the recurrence relation

    v_k = c_{d−1} v_{k−1} + c_{d−2} v_{k−2} + · · · + c_1 v_{k−d+1} + c_0 v_{k−d}

for k ≥ d. Using the Berlekamp–Massey algorithm, we compute the polynomial
C(x) of degree d′ ≤ d. But then, x^{d′} C(1/x) (the opposite of C(x); compare
Eqns (8.38) and (8.41)) divides µ_A(x) in K[x]. Trying several such sequences
(corresponding to different positions in A^k v), we obtain many polynomials
x^{d′} C(1/x), whose lcm is expected to be the minimal polynomial µ_A(x).
In order that the Berlekamp–Massey algorithm works correctly in all these
cases, we take the obvious upper bound n for d, and supply 2n vector elements
v0 , v1 , . . . , v2n−1 . This means that we need to compute the 2n matrix-vector
products Ai v for i = 0, 1, 2, . . . , 2n − 1. Since A is a sparse matrix with O˜(n)
non-zero entries, the determination of µA (x) can be completed in O˜(n2 ) time.
For obtaining a solution of Ax = b, we use µ_A(x) as follows. Putting k = d
and v = b in Eqn (8.42) gives

    A(A^{d−1} b − c_{d−1} A^{d−2} b − c_{d−2} A^{d−3} b − · · · − c_1 b) = c_0 b,

that is, if c_0 ≠ 0,

    x = c_0^{−1} (A^{d−1} b − c_{d−1} A^{d−2} b − c_{d−2} A^{d−3} b − · · · − c_1 b)          (8.43)

is a solution of Ax = b. The basic time-consuming task here is the computation
of the d ≤ n matrix-vector products A^i b for i = 0, 1, 2, . . . , d − 1, a task
that can be completed in O˜(n^2) time.
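Given the coefficients c_0, c_1, . . . , c_{d−1} of µ_A(x), Eqn (8.43) is a straightforward accumulation of matrix-vector products. A sketch (the function name and representation are illustrative, and µ_A is assumed to be already known):

```python
# Sketch of Eqn (8.43) over F_p: with mu_A(x) = x^d - c_{d-1}x^{d-1} - ... - c_0,
# c = [c_0, ..., c_{d-1}] and c_0 invertible, the solution is
# x = c_0^{-1} (A^{d-1} b - c_{d-1} A^{d-2} b - ... - c_2 A b - c_1 b).
def wiedemann_solve(A, b, c, p):
    n, d = len(A), len(c)
    matvec = lambda w: [sum(A[r][k] * w[k] for k in range(n)) % p for r in range(n)]
    powers = [b[:]]                       # powers[i] = A^i b
    for _ in range(d - 1):
        powers.append(matvec(powers[-1]))
    acc = powers[d - 1][:]                # start from A^{d-1} b
    for j in range(1, d):                 # subtract c_j A^{j-1} b for j = 1..d-1
        acc = [(acc[k] - c[j] * powers[j - 1][k]) % p for k in range(n)]
    inv0 = pow(c[0], p - 2, p)
    return [inv0 * t % p for t in acc]
```

On the square system (8.45) of Example 8.4 below, with µ_A(x) = x^4 − 3x^3 − 5x^2 − 4 (so c = [4, 0, 5, 3]), this yields x = (6, 1, 1, 3).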

Example 8.4 We solve the following system in F_7 by Wiedemann’s method.

    [ 2  6  0  4 ]            [ 2 ]
    [ 0  4  6  4 ]  [ x1 ]    [ 1 ]
    [ 6  0  0  5 ]  [ x2 ]    [ 2 ]
    [ 4  0  1  0 ]  [ x3 ]  = [ 4 ]          (8.44)
    [ 0  0  4  6 ]  [ x4 ]    [ 1 ]
    [ 0  2  0  2 ]            [ 1 ]

This is not a square system. Multiplication by the transpose of the coefficient
matrix yields a square system on which we apply Wiedemann’s method.

    [ 0  5  4  3 ]  [ x1 ]    [ 4 ]
    [ 5  0  3  2 ]  [ x2 ]    [ 4 ]
    [ 4  3  4  6 ]  [ x3 ]  = [ 0 ]          (8.45)
    [ 3  2  6  6 ]  [ x4 ]    [ 2 ]
 
Let us choose the non-zero vector v = (0, 5, 2, 0)^t, and compute A^i v for
i = 0, 1, 2, . . . , 7. For i ≥ 1, we compute A^i v as A(A^{i−1} v).
    i        0    1    2    3    4    5    6    7

             [0]  [5]  [6]  [4]  [0]  [5]  [4]  [4]
    A^i v    [5]  [6]  [5]  [3]  [5]  [5]  [4]  [0]
             [2]  [2]  [3]  [6]  [6]  [0]  [0]  [3]
             [0]  [1]  [3]  [1]  [4]  [0]  [4]  [2]
First, we apply the Berlekamp–Massey algorithm on the first (topmost)
elements of A^i v. We supply the 8 terms 0, 5, 6, 4, 0, 5, 4, 4 to Algorithm 8.2, and
obtain the output 6x^3 + 2x^2 + 1. But we are interested in the opposite of
this polynomial, that is, M_1(x) = x^3 + 2x + 6. Substituting x by A in M_1 gives

             [ 0  0  0  0 ]
    M_1(A) = [ 0  3  1  3 ]
             [ 0  1  5  1 ]
             [ 0  3  1  3 ]

that is, the minimal polynomial of A is not
computed yet. So, we apply the Berlekamp–Massey algorithm on the second
position of A^i v. Upon an input of the sequence 5, 6, 5, 3, 5, 5, 4, 0, Algorithm 8.2
outputs 3x^4 + 2x^2 + 4x + 1, the opposite of which is M_2(x) = x^4 + 4x^3 + 2x^2 + 3.
Since M_2(x) is of degree four, it must be the minimal polynomial of A (indeed,
M_1(x) divides M_2(x) in F_7[x]). That is, we have computed

    µ_A(x) = x^4 + 4x^3 + 2x^2 + 3 = x^4 − 3x^3 − 5x^2 − 4.
In order to obtain the solution x, we now compute A^i b for i = 0, 1, 2, 3.
Again, we use A^i b = A(A^{i−1} b) for i ≥ 1.
    i        0    1    2    3

             [4]  [5]  [5]  [1]
    A^i b    [4]  [3]  [6]  [2]
             [0]  [5]  [3]  [3]
             [2]  [4]  [5]  [5]
This gives

    x = 4^{−1} [ (1, 2, 3, 5)^t − 3 (5, 6, 3, 5)^t − 5 (5, 3, 5, 4)^t ] = (6, 1, 1, 3)^t. ¤

Although the Lanczos and the Wiedemann methods look different at first
sight, there is a commonality between them. For a non-zero vector b, the
i-th Krylov vector is defined as u_{i−1} = A^{i−1} b. Clearly, there exists s ∈ N such
that u_0, u_1, . . . , u_{s−1} are linearly independent, whereas u_0, u_1, . . . , u_{s−1}, u_s
are linearly dependent. The span of u_0, u_1, . . . , u_{s−1} is called the Krylov space
for A and b. Both the Lanczos and the Wiedemann methods express the solution of
Ax = b as a linear combination of the Krylov vectors. In view of this, these
methods are often called Krylov space methods.

8.4 Block Methods


We now specialize to systems Ax = b modulo 2. Each coefficient in A is
now a bit (0 or 1), and one packs multiple coefficients (like 32 or 64) in a com-
puter word. One operates in blocks, that is, does arithmetic at a word level so
as to process all the coefficients in a word simultaneously. These block methods
run more efficiently than the methods that handle coefficients individually. I
shortly explain the block versions of the Lanczos and the Wiedemann meth-
ods. Among several block adaptations of the Lanczos solver, I concentrate on
Montgomery’s variant.10 Another interesting variant is from Coppersmith.11
The block Wiedemann algorithm is proposed by Coppersmith.12 In this sec-
tion, I assume that A is an n × n matrix over a field K. Our case of interest
is K = F2 , but I will present the equations in a form valid for any K.
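The word-level arithmetic can be illustrated with Python integers used as bit vectors (a sketch; over F_2, adding rows is XOR, so one word operation processes ν coefficients at once):

```python
# Sketch of block arithmetic over F_2: each row of the 0/1 matrix A, and each
# row of a block V of nu right-hand vectors, is packed into one integer
# (bit k of A_rows[r] holds A[r][k]; bit t of V_rows[k] holds entry k of
# vector t). One XOR then adds nu coefficients simultaneously.
def gf2_matmul_packed(A_rows, V_rows, n):
    W = []
    for r in range(n):
        acc = 0
        for k in range(n):
            if (A_rows[r] >> k) & 1:
                acc ^= V_rows[k]      # add row k of V wherever A[r][k] = 1
        W.append(acc)
    return W
```

Multiplying by the packed identity block returns the packed rows of A itself, which gives a quick sanity check.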

8.4.1 Block Lanczos Method


Instead of working on pairwise A-orthogonal vectors w0 , w1 , . . . , ws , we
now work with pairwise A-orthogonal subspaces W0 , W1 , . . . , Ws of K n . For
subspaces V, W of K n , we first define some standard operations.

    V + W = {v + w | v ∈ V and w ∈ W},
    V^t W = {v^t w | v ∈ V and w ∈ W},
    A V = {Av | v ∈ V}.

V and W are called A-orthogonal if V^t A W = {0}, that is, if v^t A w = 0 for all
v ∈ V and w ∈ W. Let the subspace V ⊆ K^n have dimension ν. Fix any basis
v_0, v_1, . . . , v_{ν−1} of V, and consider the n × ν matrix V = (v_0 v_1 · · · v_{ν−1}).
V is called A-invertible if the ν × ν matrix V^t A V is invertible. Since different
10 Peter L. Montgomery, A block Lanczos algorithm for finding dependencies over GF(2),

EuroCrypt, 106–120, 1995.


11 Don Coppersmith, Solving linear equations over GF(2): Block Lanczos algorithm,
Linear Algebra and its Applications, 192, 33–60, 1993.


12 Don Coppersmith, Solving homogeneous linear equations over GF(2) via block
Wiedemann algorithm, Mathematics of Computation, 62(205), 333–350, 1994.



bases of the same subspace are related by invertible transformation matrices,


the notion of A-invertibility does not depend on the choice of the basis of V.
The block Lanczos method assumes that A is a symmetric matrix. The
subspaces W0 , W1 , . . . , Ws−1 generated by the method satisfy the conditions:
Wi is A-invertible for all i = 0, 1, 2, . . . , s − 1,
Wi and Wj are A-orthogonal for i ≠ j, and
AW ⊆ W, where W = W0 + W1 + · · · + Ws−1 .
The iterations stop as soon as we obtain the zero space Ws . For the subspace
Wj , we consider the matrix Wj whose columns constitute a basis of Wj . Then,
the solution of Ax = b is given by
    x = Σ_{j=0}^{s−1} W_j (W_j^t A W_j)^{−1} W_j^t b.

In the original Lanczos method, W_i is the one-dimensional subspace of K^n
generated by w_i. In the block Lanczos method, we generate higher-dimensional
subspaces instead of individual vectors. This helps us in two ways. First, block
operations on words process multiple dimensions simultaneously. Second, the
number s of iterations reduces roughly by a factor of the word size.
It remains to explain how the subspaces Wi or equivalently the matrices
Wi of basis vectors are generated. Let ν denote the word size (like 32 or 64).
We iteratively generate n × ν matrices V0 , V1 , . . . , Vs , and select some columns
from each V_i to obtain W_i. More concretely, we choose ν × ν_i selection matrices
S_i, and obtain W_i = V_i S_i, where S_i dictates which ν_i of the columns of V_i are
to be included in Wi . For instance, let ν = 4, and νi = 3. We take
 
          [ 1  0  0 ]
    S_i = [ 0  0  0 ]
          [ 0  1  0 ]
          [ 0  0  1 ]
if we plan to select the first, third and fourth columns of Vi for inclusion in Wi .
Each column of S_i contains exactly one 1, placed in the row indicating the
column of V_i to be selected. Note that S_i^t S_i is the ν_i × ν_i identity matrix I_{ν_i}.
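This property is easy to check for the sample selection matrix above (a quick sketch):

```python
# Verifying S_i^t S_i = I_3 for the 4 x 3 selection matrix shown above:
# the selected columns of the identity are orthonormal.
S = [[1, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
StS = [[sum(S[r][i] * S[r][j] for r in range(4)) for j in range(3)]
       for i in range(3)]
# StS is the 3 x 3 identity matrix
```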
We start with a randomly chosen n × ν matrix V_0. We plan to keep as many
columns of V_0 in W_0 as possible. The requirement is that W_0 has to be A-invertible.
The selection matrix S_0 is chosen accordingly, and we set W_0 = V_0 S_0.
After this initialization, the Lanczos loop continues for i = 1, 2, 3, . . . . In
the i-th iteration, an n × ν matrix V_i is first computed using the formula

    V_i = A W_{i−1} S_{i−1}^t + V_{i−1} − Σ_{j=0}^{i−1} W_j C_{i,j} ,  where

    C_{i,j} = (W_j^t A W_j)^{−1} W_j^t A (A W_{i−1} S_{i−1}^t + V_{i−1}) for j = 0, 1, . . . , i − 1.

If V_i^t A V_i = 0, the iterations stop.

All W_j need to be A-invertible for the computation of V_i to succeed.
The matrix V_i computed above is A-orthogonal to W_j for j = 0, 1, 2, . . . , i − 1.
However, Vi need not be A-invertible, so we use a selection matrix Si to choose
as many columns from Vi as possible, and form an A-invertible matrix Wi :
Wi = Vi Si . (8.46)
Evidently, Wi remains A-orthogonal to all previous Wj . The A-invertibility of
Wi is required in later iterations.
Like the original Lanczos algorithm, the formula for computing Vi can be
significantly simplified. We can take Ci,j = 0 for j < i − 3, so each updating
formula for Vi uses only three previous Vj matrices (for j = i − 1, i − 2, i − 3).
The simplified formula uses the following intermediate matrices:
$$ W_{i-1}^{\mathrm{inv}} = S_{i-1} \left( W_{i-1}^t A W_{i-1} \right)^{-1} S_{i-1}^t = S_{i-1} \left( S_{i-1}^t V_{i-1}^t A V_{i-1} S_{i-1} \right)^{-1} S_{i-1}^t, \qquad (8.47) $$
$$ D_i = W_{i-1}^{\mathrm{inv}} \left( V_{i-1}^t A^2 V_{i-1} S_{i-1} S_{i-1}^t + V_{i-1}^t A V_{i-1} \right) - I_\nu, \qquad (8.48) $$
$$ E_i = W_{i-2}^{\mathrm{inv}} V_{i-1}^t A V_{i-1} S_{i-1} S_{i-1}^t, \qquad (8.49) $$
$$ F_i = W_{i-3}^{\mathrm{inv}} \left( I_\nu - V_{i-2}^t A V_{i-2} W_{i-2}^{\mathrm{inv}} \right) \left( V_{i-2}^t A^2 V_{i-2} S_{i-2} S_{i-2}^t + V_{i-2}^t A V_{i-2} \right) S_{i-1} S_{i-1}^t. \qquad (8.50) $$
The simplified formula is as follows:
$$ V_i = A V_{i-1} S_{i-1} S_{i-1}^t - V_{i-1} D_i - V_{i-2} E_i - V_{i-3} F_i. \qquad (8.51) $$
This formula is valid for i ≥ 3. For j < 0, we take Vj and Wj^inv as the n × ν
and ν × ν zero matrices, and Sj as the identity matrix Iν . With this convention,
Eqn (8.51) holds for all i ≥ 1. Algorithm 8.3 summarizes the block Lanczos method.

Algorithm 8.3: Block Lanczos method for solving Ax = b

    /* Initialize for i < 0 */
    Set W−2^inv = W−1^inv = 0_{ν×ν}, V−2 = V−1 = 0_{n×ν}, and S−2 = S−1 = Iν .
    /* Initialize for i = 0 */
    Take V0 as a random n × ν matrix.
    Select a maximal A-invertible set of columns of V0 as W0 = V0 S0 .
    Initialize the solution x = 0, and set i = 0.
    /* Lanczos loop for i = 1, 2, 3, . . . , s */
    while (Vi^t A Vi ≠ 0) {
        Increment i by 1.
        Compute W_{i−1}^inv, Di , Ei and Fi using Eqns (8.47)–(8.50).
        Compute Vi using Eqn (8.51).
        Select a maximal A-invertible set of columns of Vi as Wi = Vi Si .
        Add W_{i−1} (W_{i−1}^t A W_{i−1})^{−1} W_{i−1}^t b to the solution vector x.
    }
    Return x.
418 Computational Number Theory

Example 8.5 Let me demonstrate the working of the block Lanczos method
on artificially small parameters. We solve the following symmetric 10 × 10
system modulo two. Let us take the word size as ν = 4.
$$ \begin{pmatrix}
0&1&0&1&0&0&0&1&1&0 \\
1&0&0&1&1&0&1&1&1&0 \\
0&0&0&0&0&0&1&0&0&1 \\
1&1&0&0&1&1&1&1&0&1 \\
0&1&0&1&1&0&1&0&1&1 \\
0&0&0&1&0&1&0&0&0&1 \\
0&1&1&1&1&0&1&0&1&0 \\
1&1&0&1&0&0&0&1&0&0 \\
1&1&0&0&1&0&1&0&1&0 \\
0&0&1&1&1&1&0&0&0&0
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \\ x_8 \\ x_9 \\ x_{10} \end{pmatrix}
=
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}. $$
Initialization for i = −2, −1: We set W−2^inv = W−1^inv = 0_{4×4}, V−2 = V−1 = 0_{10×4}, and S−2 = S−1 = I_4.

Initialization for i = 0: We start with the following randomly generated matrix:
$$ V_0 = \begin{pmatrix}
0&1&0&0 \\ 0&1&1&1 \\ 1&1&1&0 \\ 1&0&1&0 \\ 0&0&0&0 \\ 0&0&1&1 \\ 0&0&0&0 \\ 1&0&0&1 \\ 1&0&1&1 \\ 1&0&0&0
\end{pmatrix}. $$
A set of three A-invertible columns of V0 (the first, second and fourth) is chosen by the selection matrix
$$ S_0 = \begin{pmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&0 \\ 0&0&1 \end{pmatrix}, \quad\text{giving}\quad W_0 = V_0 S_0 = \begin{pmatrix}
0&1&0 \\ 0&1&1 \\ 1&1&0 \\ 1&0&0 \\ 0&0&0 \\ 0&0&1 \\ 0&0&0 \\ 1&0&1 \\ 1&0&1 \\ 1&0&0
\end{pmatrix}. $$
Finally, we initialize the solution vector: x = ( 0 0 0 0 0 0 0 0 0 0 )^t.

Iteration for i = 1: First, we compute
$$ V_0^t A V_0 = \begin{pmatrix} 0&1&1&0 \\ 1&0&1&1 \\ 1&1&0&1 \\ 0&1&1&1 \end{pmatrix}, $$
which is non-zero. So we go inside the loop body, and compute the temporary
matrices W0^inv, D1, E1 and F1 from Eqns (8.47)–(8.50). In particular,
$$ W_0^{\mathrm{inv}} = \begin{pmatrix} 1&1&0&1 \\ 1&0&0&0 \\ 0&0&0&0 \\ 1&0&0&1 \end{pmatrix}, $$
whereas E1 and F1 are the 4 × 4 zero matrix (since W−1^inv and W−2^inv are zero).
Next, we compute V1 from Eqn (8.51). All four columns of V1 constitute an
A-invertible set, so the selection matrix for V1 is S1 = I_4, yielding W1 = V1.
Finally, we update the solution vector to x = ( 1 1 0 1 0 0 0 1 1 1 )^t.

Two more iterations are needed before the block Lanczos algorithm terminates.
Computations in these iterations are shown in the accompanying table. The second
iteration (i = 2) updates the solution vector to
$$ x = ( 1\ 1\ 0\ 1\ 1\ 1\ 1\ 1\ 1\ 0 )^t, $$
whereas the third iteration (i = 3) updates x to
$$ x = ( 0\ 0\ 1\ 1\ 1\ 1\ 0\ 1\ 1\ 0 )^t. $$
The loop terminates after this iteration, so this is the final solution. ¤

Let us now investigate when Algorithm 8.3 may fail because of the lack
of positive-definiteness of the matrix A. Example 8.5 illustrates the situation
that V3 has become 0, and so V3^tAV3 is 0 too. In the earlier iteration (i = 2), V2
is non-zero, but V2^tAV2 is of rank 3 (less than its dimension 4). The selection
matrix S2 ensures that only three vectors are added to the Krylov space.
In general, any Vi offers 2^ν − 1 non-empty choices for the selection matrix
Si . If Vi ≠ 0, any one of these choices that yields an A-invertible Wi suffices.
Typically, ν ≥ 32, whereas we solve systems of size n no larger than a few
hundred million (larger systems are infeasible anyway). Moreover, during
each iteration, multiple vectors are in general added to the Krylov space, that
is, the number of iterations is expected to be substantially smaller than n.
[Table: the quantities V_{i−1}^tAV_{i−1}, W_{i−1}^inv, Di , Ei , Fi , Vi , Si and Wi computed in the iterations i = 2, 3, 4 of Example 8.5.]
Thus, it is highly probable that we can always find a suitable subset of the
columns of a non-zero Vi to form an A-invertible Wi . On the other hand, if
Vi = 0, no selection matrix Si can produce an A-invertible Wi . This means
that although the modulus is small (only 2), there is not much of a need to
work in extension fields for the algorithm to succeed.

8.4.2 Block Wiedemann Method


Suppose that we want to solve an m×n system Ax = b of linear equations
over the field K. If b 6= 0, we introduce a new variable xn+1 , and convert the
original system to the homogeneous form
$$ \begin{pmatrix} A & -b \end{pmatrix} \begin{pmatrix} x \\ x_{n+1} \end{pmatrix} = 0. $$
Any solution of this m × (n + 1) system with xn+1 = 1 gives a solution of the
original system Ax = b. Without loss of generality, we can, therefore, take
b = 0. Moreover, there is no harm in taking m = n (premultiply by A^t), that
is, we concentrate upon a square (n × n) homogeneous system Ax = 0.
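As a small illustration of the homogenization step (with made-up data over F7, and a direct null-space computation in place of an iterative solver; the squaring-up by A^t is skipped here), the sketch below rewrites Ax = b as (A | −b)y = 0 and reads the solution off a null vector whose last coordinate is scaled to 1.

```python
p = 7

def nullspace_mod(M, p):
    # Basis of the null space of M over F_p, via reduced row echelon form.
    M = [row[:] for row in M]
    rows, cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(cols):
        piv = next((i for i in range(r, rows) if M[i][c] % p), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        inv = pow(M[r][c], p - 2, p)
        M[r] = [v * inv % p for v in M[r]]
        for i in range(rows):
            if i != r and M[i][c] % p:
                f = M[i][c]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    basis = []
    for fc in (c for c in range(cols) if c not in pivots):
        v = [0] * cols
        v[fc] = 1
        for pr, pc in enumerate(pivots):
            v[pc] = -M[pr][fc] % p
        basis.append(v)
    return basis

A = [[1, 2], [3, 4], [5, 6]]        # made-up 3 x 2 system over F_7
b = [5, 5, 5]                       # chosen so that x = (2, 5) is a solution
B = [row + [-bi % p] for row, bi in zip(A, b)]   # the matrix ( A  -b )
v = nullspace_mod(B, p)[0]          # here the null space is 1-dimensional
inv_last = pow(v[-1], p - 2, p)     # scale so that x_{n+1} = 1 (v[-1] != 0 here)
x = [vi * inv_last % p for vi in v[:-1]]
assert x == [2, 5]
assert all(sum(a * xi for a, xi in zip(row, x)) % p == bi
           for row, bi in zip(A, b))
```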
In the original Wiedemann algorithm, we compute the minimal polynomial
of the scalar sequence u tAi v for a randomly chosen vector v and for a projec-
tion vector u. In the block Wiedemann method, we take a block of µ vectors
as U and a block of ν vectors as V , that is, U is an n × µ matrix, whereas V
is an n × ν matrix. We typically have µ = ν = the size of a computer word
(like 32 or 64). Take W = AV . Consider the sequence
$$ M_i = U^t A^i W, \quad i \geq 0, $$
of µ × ν matrices. Let d = ⌈n/ν⌉. There exist coefficient vectors c0 , c1 , . . . , cd
of dimension ν × 1 such that the sequence Mi is linearly generated as
$$ M_k c_d + M_{k-1} c_{d-1} + M_{k-2} c_{d-2} + \cdots + M_{k-d+1} c_1 + M_{k-d} c_0 = 0_\mu $$
for all k ≥ d. Let e = ⌈ν(d + 1)/µ⌉. Applying the above recurrence for k = d, d + 1,
. . . , d + e − 1 yields the system
$$ \begin{pmatrix}
M_d & M_{d-1} & M_{d-2} & \cdots & M_1 & M_0 \\
M_{d+1} & M_d & M_{d-1} & \cdots & M_2 & M_1 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
M_{d+e-1} & M_{d+e-2} & M_{d+e-3} & \cdots & M_e & M_{e-1}
\end{pmatrix}
\begin{pmatrix} c_d \\ c_{d-1} \\ c_{d-2} \\ \vdots \\ c_1 \\ c_0 \end{pmatrix} = 0_{\mu e}. \qquad (8.52) $$
This (µe) × ν(d + 1) system is the block analog of Eqn (8.36). For a moment,
assume that a solution for c0 , c1 , . . . , cd is provided to us. With high probability,
the sequence of n × ν matrices A^iW , i ≥ 0, also satisfies the same recurrence
as their µ × ν projections Mi . For all k ≥ d, we then have
$$ A^k W c_d + A^{k-1} W c_{d-1} + A^{k-2} W c_{d-2} + \cdots + A^{k-d+1} W c_1 + A^{k-d} W c_0 = 0_n. $$
Putting k = d and using the fact that W = AV , we get
$$ A \left( A^d V c_d + A^{d-1} V c_{d-1} + A^{d-2} V c_{d-2} + \cdots + A V c_1 + V c_0 \right) = 0_n, $$
that is, a solution of Ax = 0 is given by
$$ x = A^d V c_d + A^{d-1} V c_{d-1} + A^{d-2} V c_{d-2} + \cdots + A V c_1 + V c_0. $$
Let us now see how we can solve System (8.52). Coppersmith (Footnote 12)
proposes a generalization of the Berlekamp–Massey algorithm for linear
sequences generated by vectors. The procedure runs in the desired O(n²) time,
but is somewhat complicated. Here, I explain a conceptually simpler algorithm
from Kaltofen13 , which achieves the same running time. Kaltofen's algorithm
exploits the fact that the coefficient matrix in (8.52) is in the Toeplitz
form with scalars replaced by µ × ν matrix blocks. In practice, System (8.52)
need not be square (in terms of both elements and blocks). Kaltofen carefully
handles square subsystems which may slightly lose the Toeplitz property.
Although these almost-Toeplitz systems can be solved in O(n²) time, I present
a simplified version of the algorithm assuming that µ = ν, so that we have
both e = d + 1 and µe = ν(d + 1). This iterative algorithm is proposed by
Levinson, and modified by Durbin, Trench and Zohar.14
To start with, let us solve a Toeplitz system of scalars, that is, let
$$ T x = b \qquad (8.53) $$
be an n × n system with the (i, j)-th entry of T given by t_{i−j} ∈ K (a function
of i − j). This could be an attempt to solve (8.36) in the original Wiedemann
method.15 Let T^{(i)} be the i × i submatrix sitting at the top left corner of T :
$$ T^{(i)} = \begin{pmatrix}
t_0 & t_{-1} & t_{-2} & \cdots & t_{-i+1} \\
t_1 & t_0 & t_{-1} & \cdots & t_{-i+2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
t_{i-1} & t_{i-2} & t_{i-3} & \cdots & t_0
\end{pmatrix}. $$
Clearly, T = T^{(n)}. Likewise, let b^{(i)} denote the i × 1 vector formed by the
top i elements of b (we have b = b^{(n)}). We iteratively solve the system
$$ T^{(i)} x^{(i)} = b^{(i)} $$
for i = 1, 2, 3, . . . , n. We also keep on computing and using two auxiliary
vectors y^{(i)} and z^{(i)} satisfying
$$ T^{(i)} y^{(i)} = \begin{pmatrix} \epsilon^{(i)} \\ 0_{i-1} \end{pmatrix} \quad\text{and}\quad T^{(i)} z^{(i)} = \begin{pmatrix} 0_{i-1} \\ \epsilon^{(i)} \end{pmatrix} \qquad (8.54) $$
13 Erich Kaltofen, Analysis of Coppersmith's block Wiedemann algorithm for the parallel solution of sparse linear systems, Mathematics of Computation, 64(210), 777–806, 1995.
14 For a nice survey, look at: Bruce R. Musicus, Levinson and fast Choleski algorithms for Toeplitz and almost Toeplitz matrices, Technical Report 538, Research Laboratory of Electronics, Massachusetts Institute of Technology, December 1988.
15 We intend to solve System (8.36) or (8.52), where the variables are denoted by ci . Here, we use xi (or Xi in Example 8.7) for the variables. Likewise for b (or B). This notational inconsistency is motivated by the fact that solving Toeplitz systems is itself of independent interest.
for a suitable choice of the scalar ǫ^{(i)} to be specified later. In this section, I use
the parenthesized superscript (i) to indicate quantities in the i-th iteration.
In an actual implementation, it suffices to remember the quantities from only
the previous iteration. Superscripts are used for logical clarity. Subscripts are
used for matrix and vector elements (like ti ) and dimensions (like 0_{i−1}).
For i = 1, we have t0 x1 = b1 , which immediately gives x^{(1)} = ( t_0^{−1} b_1 ),
provided that t0 ≠ 0. We also take y^{(1)} = z^{(1)} = ( 1 ), that is, ǫ^{(1)} = t0 .
Suppose that T^{(i)} x^{(i)} = b^{(i)} is solved, and we plan to compute a solution
of T^{(i+1)} x^{(i+1)} = b^{(i+1)}. At this stage, the vectors y^{(i)} and z^{(i)} and the scalar
ǫ^{(i)} are known. We write
$$ T^{(i+1)} = \begin{pmatrix} T^{(i)} & \begin{matrix} t_{-i} \\ \vdots \\ t_{-1} \end{matrix} \\ \begin{matrix} t_i & \cdots & t_1 \end{matrix} & t_0 \end{pmatrix} = \begin{pmatrix} t_0 & \begin{matrix} t_{-1} & \cdots & t_{-i} \end{matrix} \\ \begin{matrix} t_1 \\ \vdots \\ t_i \end{matrix} & T^{(i)} \end{pmatrix}. $$
This implies that the following equalities hold:
$$ T^{(i+1)} \begin{pmatrix} y^{(i)} \\ 0 \end{pmatrix} = \begin{pmatrix} \epsilon^{(i)} \\ 0_{i-1} \\ -\epsilon^{(i)} \xi^{(i+1)} \end{pmatrix} \quad\text{and}\quad T^{(i+1)} \begin{pmatrix} 0 \\ z^{(i)} \end{pmatrix} = \begin{pmatrix} -\epsilon^{(i)} \zeta^{(i+1)} \\ 0_{i-1} \\ \epsilon^{(i)} \end{pmatrix}, $$
where
$$ \xi^{(i+1)} = -\frac{1}{\epsilon^{(i)}} \begin{pmatrix} t_i & t_{i-1} & \cdots & t_1 \end{pmatrix} y^{(i)}, \qquad (8.55) $$
$$ \zeta^{(i+1)} = -\frac{1}{\epsilon^{(i)}} \begin{pmatrix} t_{-1} & t_{-2} & \cdots & t_{-i} \end{pmatrix} z^{(i)}. \qquad (8.56) $$
We compute y^{(i+1)} and z^{(i+1)} as linear combinations of (y^{(i)}; 0) and (0; z^{(i)}):
$$ y^{(i+1)} = \begin{pmatrix} y^{(i)} \\ 0 \end{pmatrix} + \xi^{(i+1)} \begin{pmatrix} 0 \\ z^{(i)} \end{pmatrix}, \qquad (8.57) $$
$$ z^{(i+1)} = \begin{pmatrix} 0 \\ z^{(i)} \end{pmatrix} + \zeta^{(i+1)} \begin{pmatrix} y^{(i)} \\ 0 \end{pmatrix}. \qquad (8.58) $$
This requires us to take
$$ \epsilon^{(i+1)} = \epsilon^{(i)} \left( 1 - \xi^{(i+1)} \zeta^{(i+1)} \right). \qquad (8.59) $$
Finally, we update the solution x^{(i)} to x^{(i+1)} by noting that
$$ T^{(i+1)} \begin{pmatrix} x^{(i)} \\ 0 \end{pmatrix} = \begin{pmatrix} b^{(i)} \\ \eta^{(i+1)} \end{pmatrix} = b^{(i+1)} + \frac{\eta^{(i+1)} - b_{i+1}}{\epsilon^{(i+1)}} \, T^{(i+1)} z^{(i+1)}, $$
where
$$ \eta^{(i+1)} = \begin{pmatrix} t_i & t_{i-1} & \cdots & t_1 \end{pmatrix} x^{(i)}, \qquad (8.60) $$
that is,
$$ x^{(i+1)} = \begin{pmatrix} x^{(i)} \\ 0 \end{pmatrix} + \frac{b_{i+1} - \eta^{(i+1)}}{\epsilon^{(i+1)}} \, z^{(i+1)}. \qquad (8.61) $$
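The updating equations (8.55)–(8.61) translate directly into code. The sketch below implements the scalar Levinson iteration over Fp (with p prime, so that ǫ^{(i)} is inverted via Fermat's little theorem). Run on the Toeplitz system of the following example (Eqn (8.62), p = 97), it reproduces the solution stated there.

```python
def levinson_mod(t, b, p):
    # Solve the n x n Toeplitz system T x = b over F_p (p prime), where
    # t[k] = t_k is the value on the k-th diagonal (k = i - j), following
    # Eqns (8.55)-(8.61).  Raises ZeroDivisionError if some leading
    # principal submatrix T^(i) is singular (epsilon^(i) = 0).
    n = len(b)
    inv = lambda a: pow(a % p, p - 2, p)
    eps = t[0] % p                                   # epsilon^(1) = t_0
    if eps == 0:
        raise ZeroDivisionError("T^(1) is singular")
    y, z = [1], [1]                                  # y^(1) = z^(1) = (1)
    x = [inv(t[0]) * b[0] % p]                       # x^(1) = (t_0^{-1} b_1)
    for m in range(1, n):
        # Eqns (8.55), (8.56)
        xi = -inv(eps) * sum(t[m - j] * y[j] for j in range(m)) % p
        zeta = -inv(eps) * sum(t[-(j + 1)] * z[j] for j in range(m)) % p
        # Eqns (8.57), (8.58): y^(i+1) = (y; 0) + xi (0; z), and likewise z
        y, z = ([y[0]] + [(y[j] + xi * z[j - 1]) % p for j in range(1, m)]
                + [xi * z[m - 1] % p],
                [zeta * y[0] % p]
                + [(z[j - 1] + zeta * y[j]) % p for j in range(1, m)]
                + [z[m - 1]])
        eps = eps * (1 - xi * zeta) % p              # Eqn (8.59)
        if eps == 0:
            raise ZeroDivisionError("T^(%d) is singular" % (m + 1))
        eta = sum(t[m - j] * x[j] for j in range(m)) % p        # Eqn (8.60)
        coef = (b[m] - eta) * inv(eps) % p
        x = [(xj + coef * zj) % p for xj, zj in zip(x + [0], z)]  # Eqn (8.61)
    return x

# The Toeplitz system (8.62) of Example 8.6, modulo p = 97.
t = {0: 85, 1: 39, 2: 87, 3: 42, 4: 82, 5: 30,
     -1: 92, -2: 79, -3: 6, -4: 15, -5: 72}
assert levinson_mod(t, [82, 53, 31, 12, 14, 48], 97) == [31, 67, 96, 3, 72, 48]
```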
Example 8.6 Let us solve the following 6 × 6 non-homogeneous Toeplitz
system modulo the prime p = 97:
$$ \begin{pmatrix}
85 & 92 & 79 & 6 & 15 & 72 \\
39 & 85 & 92 & 79 & 6 & 15 \\
87 & 39 & 85 & 92 & 79 & 6 \\
42 & 87 & 39 & 85 & 92 & 79 \\
82 & 42 & 87 & 39 & 85 & 92 \\
30 & 82 & 42 & 87 & 39 & 85
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix}
\equiv
\begin{pmatrix} 82 \\ 53 \\ 31 \\ 12 \\ 14 \\ 48 \end{pmatrix}
\pmod{97}. \qquad (8.62) $$
We start with the initialization
$$ \epsilon^{(1)} = 85, \quad y^{(1)} = z^{(1)} = ( 1 ), \quad x^{(1)} = ( 85^{-1} \times 82 ) = ( 74 ). $$
Subsequently, for i = 2, 3, 4, 5, 6, we run the Levinson iteration. In each iteration,
ξ^{(i+1)} and ζ^{(i+1)} are first computed using Eqns (8.55) and (8.56), and
then ǫ^{(i+1)} is computed using Eqn (8.59). The scalars ξ^{(i+1)} and ζ^{(i+1)} allow
us to compute y^{(i+1)} and z^{(i+1)} from Eqns (8.57) and (8.58). Finally, η^{(i+1)} is
computed by Eqn (8.60), and the updated solution vector x^{(i+1)} is obtained
using Eqn (8.61). [Table: the values of ξ^{(i)}, ζ^{(i)}, ǫ^{(i)}, y^{(i)}, z^{(i)}, η^{(i)} and x^{(i)} for
i = 1, 2, . . . , 6; for instance, ξ^{(2)} = 76, ζ^{(2)} = 40, ǫ^{(2)} = 93, η^{(2)} = 73 and
x^{(2)} = (80, 5)^t.]
These computations lead to the solution x1 = 31, x2 = 67, x3 = 96, x4 = 3,
x5 = 72, and x6 = 48 (where all equalities are modulo 97). ¤

Clearly, Levinson's algorithm terminates after only Θ(n²) field operations
(Exercise 8.7). The algorithm fails if ǫ^{(i)} = 0 for some i. By Exercise 8.8, a
successful termination of the algorithm demands that each T^{(i)} be non-singular.
This is a very restrictive condition, even when T is of full rank, particularly
since our interests focus on systems over small fields K (like F2 ). If K is large
compared to n, this problem is not so serious, probabilistically.
Kaltofen suggests a way to get rid of this problem. For an n × n Toeplitz
matrix T over a field K, Kaltofen considers the matrix T̂ = U T V , where U
is an upper triangular Toeplitz matrix, and V is a lower triangular Toeplitz
matrix. The elements on the main diagonals of U and V are 1. The elements of
U above the main diagonal and the elements of V below the main diagonal are
chosen randomly from a suitably large extension of K of degree s. If T is of rank n,
then all T̂^{(i)}, i = 1, 2, . . . , n, are invertible with probability at least 1 − n(n − 1)/|K|^s.
Exercise 8.9 deals with the case of non-invertible T .
Levinson’s algorithm continues to work even if the individual coefficients ti
of T are replaced by matrix blocks. In that case, other scalar variables in the
algorithm must also be replaced by blocks or vectors of suitable dimensions.
The details are left to the reader (Exercise 8.10). I demonstrate the block
Levinson algorithm by an example.

Example 8.7 Let us again solve System (8.62) of Example 8.6. We take 2 × 2
blocks, and rewrite the system as
$$ \begin{pmatrix} T_0 & T_{-1} & T_{-2} \\ T_1 & T_0 & T_{-1} \\ T_2 & T_1 & T_0 \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ X_3 \end{pmatrix} \equiv \begin{pmatrix} B_1 \\ B_2 \\ B_3 \end{pmatrix} \pmod{97}, $$
where
$$ T_0 = \begin{pmatrix} 85 & 92 \\ 39 & 85 \end{pmatrix}, \quad T_{-1} = \begin{pmatrix} 79 & 6 \\ 92 & 79 \end{pmatrix}, \quad T_{-2} = \begin{pmatrix} 15 & 72 \\ 6 & 15 \end{pmatrix}, \quad T_1 = \begin{pmatrix} 87 & 39 \\ 42 & 87 \end{pmatrix}, \quad T_2 = \begin{pmatrix} 82 & 42 \\ 30 & 82 \end{pmatrix}, $$
$$ X_1 = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad X_2 = \begin{pmatrix} x_3 \\ x_4 \end{pmatrix}, \quad X_3 = \begin{pmatrix} x_5 \\ x_6 \end{pmatrix}, \quad B_1 = \begin{pmatrix} 82 \\ 53 \end{pmatrix}, \quad B_2 = \begin{pmatrix} 31 \\ 12 \end{pmatrix}, \quad B_3 = \begin{pmatrix} 14 \\ 48 \end{pmatrix}. $$
Since matrix multiplication is not commutative in general, we need to use
two potentially different ǫ^{(i)} and ǫ′^{(i)} satisfying the block version of Eqn (8.54):
$$ T^{(i)} Y^{(i)} = \begin{pmatrix} \epsilon^{(i)} \\ 0_{(i-1)\times\nu} \end{pmatrix} \quad\text{and}\quad T^{(i)} Z^{(i)} = \begin{pmatrix} 0_{(i-1)\times\nu} \\ \epsilon'^{(i)} \end{pmatrix}. $$
The corresponding updating equations for ξ^{(i+1)}, ζ^{(i+1)}, ǫ^{(i+1)}, ǫ′^{(i+1)} and
x^{(i+1)} should be adjusted accordingly (solve Exercise 8.10).

Initialization (i = 1): The initial solution is
$$ X^{(1)} = T_0^{-1} B_1 = \begin{pmatrix} 80 \\ 5 \end{pmatrix}. $$
Let us plan to take
$$ Y^{(1)} = Z^{(1)} = I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad\text{so that}\quad \epsilon^{(1)} = \epsilon'^{(1)} = T_0 = \begin{pmatrix} 85 & 92 \\ 39 & 85 \end{pmatrix}. $$

Iteration for i = 2: The sequence of computations goes as follows.
$$ \xi^{(2)} = -\left( \epsilon'^{(1)} \right)^{-1} T_1 Y^{(1)} = \begin{pmatrix} 78 & 31 \\ 63 & 11 \end{pmatrix}, \quad
\zeta^{(2)} = -\left( \epsilon^{(1)} \right)^{-1} T_{-1} Z^{(1)} = \begin{pmatrix} 91 & 64 \\ 69 & 61 \end{pmatrix}, $$
$$ \epsilon^{(2)} = \epsilon^{(1)} ( I_2 - \zeta^{(2)} \xi^{(2)} ) = \begin{pmatrix} 29 & 85 \\ 67 & 23 \end{pmatrix}, \quad
\epsilon'^{(2)} = \epsilon'^{(1)} ( I_2 - \xi^{(2)} \zeta^{(2)} ) = \begin{pmatrix} 23 & 85 \\ 67 & 29 \end{pmatrix}, $$
$$ Y^{(2)} = \begin{pmatrix} Y^{(1)} \\ 0_{2\times 2} \end{pmatrix} + \begin{pmatrix} 0_{2\times 2} \\ Z^{(1)} \end{pmatrix} \xi^{(2)} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 78 & 31 \\ 63 & 11 \end{pmatrix}, \quad
Z^{(2)} = \begin{pmatrix} 0_{2\times 2} \\ Z^{(1)} \end{pmatrix} + \begin{pmatrix} Y^{(1)} \\ 0_{2\times 2} \end{pmatrix} \zeta^{(2)} = \begin{pmatrix} 91 & 64 \\ 69 & 61 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}, $$
$$ \eta^{(2)} = T_1 X^{(1)} = \begin{pmatrix} 74 \\ 12 \end{pmatrix}, \quad
X^{(2)} = \begin{pmatrix} X^{(1)} \\ 0_2 \end{pmatrix} + Z^{(2)} \left( \epsilon'^{(2)} \right)^{-1} ( B_2 - \eta^{(2)} ) = \begin{pmatrix} 80 \\ 70 \\ 13 \\ 77 \end{pmatrix}. $$

Iteration for i = 3: In this iteration, we have the following computations.
$$ \xi^{(3)} = -\left( \epsilon'^{(2)} \right)^{-1} ( T_2 \ \ T_1 ) \, Y^{(2)} = \begin{pmatrix} 61 & 44 \\ 21 & 25 \end{pmatrix}, \quad
\zeta^{(3)} = -\left( \epsilon^{(2)} \right)^{-1} ( T_{-1} \ \ T_{-2} ) \, Z^{(2)} = \begin{pmatrix} 78 & 78 \\ 15 & 32 \end{pmatrix}, $$
$$ \epsilon^{(3)} = \epsilon^{(2)} ( I_2 - \zeta^{(3)} \xi^{(3)} ) = \begin{pmatrix} 41 & 43 \\ 52 & 57 \end{pmatrix}, \quad
\epsilon'^{(3)} = \epsilon'^{(2)} ( I_2 - \xi^{(3)} \zeta^{(3)} ) = \begin{pmatrix} 57 & 43 \\ 52 & 41 \end{pmatrix}, $$
$$ Y^{(3)} = \begin{pmatrix} Y^{(2)} \\ 0_{2\times 2} \end{pmatrix} + \begin{pmatrix} 0_{2\times 2} \\ Z^{(2)} \end{pmatrix} \xi^{(3)} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 86 & 9 \\ 24 & 13 \\ 61 & 44 \\ 21 & 25 \end{pmatrix}, \quad
Z^{(3)} = \begin{pmatrix} 0_{2\times 2} \\ Z^{(2)} \end{pmatrix} + \begin{pmatrix} Y^{(2)} \\ 0_{2\times 2} \end{pmatrix} \zeta^{(3)} = \begin{pmatrix} 78 & 78 \\ 15 & 32 \\ 44 & 59 \\ 7 & 89 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}, $$
$$ \eta^{(3)} = ( T_2 \ \ T_1 ) \, X^{(2)} = \begin{pmatrix} 54 \\ 59 \end{pmatrix}, \quad
X^{(3)} = \begin{pmatrix} X^{(2)} \\ 0_2 \end{pmatrix} + Z^{(3)} \left( \epsilon'^{(3)} \right)^{-1} ( B_3 - \eta^{(3)} ) = \begin{pmatrix} 31 \\ 67 \\ 96 \\ 3 \\ 72 \\ 48 \end{pmatrix}. $$

Note that the solutions X^{(1)}, X^{(2)} and X^{(3)} in the block version are the same
as the solutions x^{(2)}, x^{(4)} and x^{(6)} in the original version (Example 8.6). ¤
Exercises
1. [Lifting] Let p ∈ P and e ∈ N. Describe how you can lift solutions of Ax ≡
b (mod p^e) to solutions of Ax ≡ b (mod p^{e+1}).
2. For linear systems arising out of factorization algorithms, we typically need
multiple solutions of homogeneous systems. Describe how the standard Lanc-
zos method can be tuned to meet this requirement.
3. Repeat Exercise 8.2 for the block Lanczos algorithm. More precisely, show
how a block of solutions to the homogeneous system can be obtained.
4. In order to address the problem of self-orthogonality in the Lanczos algorithm,
we modified the system Ax = b to A^tD²Ax = A^tD²b for a randomly chosen
invertible diagonal matrix D with entries from a suitable extension field. What
is the problem if we plan to solve the system DA^tAx = DA^tb instead?
5. For the block Lanczos method, describe a method to identify the selection
matrix Si of Eqn (8.46).
6. The Wiedemann algorithm chooses elements of A^k v at particular positions.
This amounts to multiplying Eqn (8.42) from the left by suitable projection vectors
u. Generalize this concept to work for any non-zero vectors u.
7. Prove that Levinson's algorithm for solving Eqn (8.53), as presented in the
text, performs a total of about 3n² multiplications and about 3n² additions.
8. Prove that Eqn (8.53) is solvable by Levinson’s iterative algorithm if and only
if the matrices T (i) are invertible for all i = 1, 2, . . . , n.
9. Let T be an n × n Toeplitz matrix with rank r ≤ n. We call T of generic
rank profile if T^{(i)} is invertible for all i = 1, 2, . . . , r. Describe a strategy to let
Levinson's algorithm generate random solutions of a solvable system T x = b,
where T is a Toeplitz matrix of generic rank profile with rank r < n.
10. Write the steps of the block version of Levinson’s algorithm. For simplicity,
assume that each block of T is a square matrix of size ν × ν, and the entire
coefficient matrix T is also square of size n × n with ν | n.
11. Block algorithms are intended to speed up solving linear systems over F2 .
They are also suitable for parallelization. Explain how.
12. Let A = (aij) be an m × n matrix with entries from Fq . Suppose that m ≥ n.
Let r denote the rank of A, and d = n − r the rank deficit (also called the defect)
of A. Denote the j-th column of A by Aj . A non-zero n-tuple (c1 , c2 , . . . , cn) ∈
(Fq)^n, for which $\sum_{j=1}^{n} c_j A_j = 0$, is called a linear dependency of the columns
of A. Let l denote the number of linear dependencies of the columns of A.
(a) Prove that l + 1 = q^d.
(b) Let the entries of A be randomly chosen. Prove that E(r) ≥ n − log_q(E(l) + 1),
where E(X) is the expected value of the random variable X.
(c) How can you compute E(l), given a probability distribution for each aij?
Chapter 9
Public-Key Cryptography

9.1 Public-Key Encryption 433
    9.1.1 RSA Encryption 433
    9.1.2 ElGamal Encryption 436
9.2 Key Agreement 437
9.3 Digital Signatures 438
    9.3.1 RSA Signature 438
    9.3.2 ElGamal Signature 439
    9.3.3 DSA 440
    9.3.4 ECDSA 441
9.4 Entity Authentication 442
    9.4.1 Simple Challenge-Response Schemes 442
    9.4.2 Zero-Knowledge Protocols 444
9.5 Pairing-Based Cryptography 447
    9.5.1 Identity-Based Encryption 449
        9.5.1.1 Boneh–Franklin Identity-Based Encryption 449
    9.5.2 Key Agreement Based on Pairing 452
        9.5.2.1 Sakai–Ohgishi–Kasahara Two-Party Key Agreement 452
        9.5.2.2 Joux Three-Party Key Agreement 453
    9.5.3 Identity-Based Signature 454
        9.5.3.1 Shamir Scheme 454
        9.5.3.2 Paterson Scheme 455
    9.5.4 Boneh–Lynn–Shacham (BLS) Short Signature Scheme 457
Exercises 459

In this chapter, we study some engineering applications of number-theoretic


algorithms. The American mathematician Leonard Eugene Dickson (1874–
1954) commented: Thank God that number theory is unsullied by any
application. This assertion is no longer true. The development of error-correcting
codes in the 1950s and 1960s involved the first serious engineering applications
of finite fields. The advent of public-key cryptography in the late 1970s opened
yet another avenue of application. Almost everything that appears in earlier
provides an introductory exposure to public-key algorithms. This is not a book
on cryptography. Nonetheless, a treatment of some practical applications of
the otherwise theoretical study may be motivating to the readers.
I start with classical algorithms of public-key cryptography, and then dis-
cuss pairing-based protocols, a new branch of public-key cryptography. The
main problems that cryptography deals with are listed below. A study of
breaking cryptographic protocols is referred to as cryptanalysis. Cryptology
refers to the combined study of cryptography and cryptanalysis.

• Message confidentiality: Alice wants to send a private message M


to Bob. Use of a public channel for transferring the message M allows
any eavesdropper to access M , resulting in a loss in the privacy and
confidentiality of the message. In order to avoid this problem, Alice first
transforms M to a ciphertext C = E(M ) by applying an encryption
function E, and sends C (instead of M ) through the public channel. Bob,
upon receiving C, uses a decryption function D to recover M = D(C).
Eve, the eavesdropper, must not be able to generate M from C in feasi-
ble time. This requirement can be met in two ways. First, the functions
E and D may be secretly chosen by Alice and Bob. But then, they need
to set up these functions before any transmission. Every pair of com-
municating parties requires a secret algorithm. Moreover, this strategy
is known to be weak from the angle of information theory. A better ap-
proach is to use the same algorithm for every pair of communicating par-
ties. Confidentiality is achieved by keys. Alice encrypts as C = EK (M ),
and Bob decrypts as M = DK ′ (C). If the decryption key K ′ is not
disclosed to Eve, it should be infeasible for her to generate M from C.

• Key agreement: Although encryption helps Alice and Bob to exchange


private messages over a public channel, they must have a mechanism to
agree upon the keys K and K ′ . Symmetric or secret-key cryptography
deals with the situation that K = K ′ . This common key may be set up
by a private communication between Alice and Bob. Another alternative
is to run a key-agreement or a key-exchange protocol, where Alice and
Bob generate random secrets, and exchange masked versions of these
secrets over a public channel. Combining the personal secret and the
masked secret from the other party, each of Alice and Bob computes a
common value which they later use as the symmetric key K = K ′ . Eve,
from the knowledge of only the masked secrets, cannot compute this
common value in feasible time. In this way, the necessity of a private
communication between Alice and Bob is eliminated.
In an asymmetric or public-key cryptographic system, we have K′ ≠ K.
The encryption key (also called the public key) K is made public (even
to eavesdroppers). Anybody having access to K can encrypt messages
for Bob. However, only Bob can decrypt these messages. The knowledge
of K ′ (called the private key) is vital in the decryption process. Although
K and K ′ are sort of matching keys, it should be infeasible to compute
K ′ from K. This is how an asymmetric cryptosystem derives its secu-
rity. Asymmetric cryptosystems alleviate the need of any key agreement
between Alice and Bob, secretly or publicly.
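The masking idea can be made concrete with the classical Diffie–Hellman exchange over a prime modulus. The sketch below uses a toy-sized Mersenne prime and an arbitrary base; real deployments use carefully chosen groups and much larger parameters.

```python
import secrets

# Toy Diffie-Hellman-style key agreement: each party publishes a masked
# secret g^a mod p, and both derive the same value g^(ab) mod p.
# These parameters are far too small for real security.
p = 2**61 - 1                     # a Mersenne prime modulus
g = 3                             # a public base

a = secrets.randbelow(p - 2) + 1  # Alice's secret
b = secrets.randbelow(p - 2) + 1  # Bob's secret

A = pow(g, a, p)                  # masked secrets, sent over the public channel
B = pow(g, b, p)

k_alice = pow(B, a, p)            # Alice combines her secret with Bob's mask
k_bob = pow(A, b, p)              # Bob does likewise
assert k_alice == k_bob           # the common symmetric key K = K'
```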

• Digital signatures: Like hand-written signatures, a digital signature


binds a digital message (or document) M to an entity (say, Bob). Digital
signature schemes are based upon asymmetric keys. The signing key K ′
is known only to Bob. The verification key K is made public. In order
to sign M , Bob computes a short representative m of M , and uses the


signing key K ′ to generate s = SK ′ (m). The signed message is the pair
(M, s). Anybody (say, Alice) having access to the verification key K can
verify the authenticity of the signed message (M, s) as follows. Alice first
computes the short representative m from M , and applies a verification
function to generate m′ = VK (s). The signature is accepted as authentic
if and only if m′ = m. A digital signature scheme is like a public-key
encryption scheme with the sequence of using the keys K, K ′ reversed.
A signature scheme requires that no party without the knowledge of the
signing key K ′ can generate a verifiable signature s on a message M .
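A minimal instance of this sign-then-verify flow is textbook RSA with tiny made-up parameters (no hashing or padding, nowhere near a secure scheme, but it shows the key roles reversed relative to encryption):

```python
# Textbook RSA signature with toy parameters (insecure; illustration only).
p, q = 61, 53
n = p * q                              # public modulus
e = 17                                 # public verification key K
d = pow(e, -1, (p - 1) * (q - 1))      # private signing key K' (Python 3.8+)

m = 1234 % n                           # short representative of the message M
s = pow(m, d, n)                       # Bob signs: s = S_{K'}(m)
assert pow(s, e, n) == m               # Alice verifies: V_K(s) = m
```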

• Entity authentication: The authenticity of an entity Alice is realized


by Alice’s knowledge of a secret piece of information Σ. In order to prove
her identity to a verifier, Bob, Alice demonstrates her knowledge of Σ
to Bob. Any party not having access to Σ cannot impersonate Alice.
Entity authentication can be achieved by passwords. Bob stores f (Σ),
where f is a one-way function (a function that cannot be easily inverted).
During an authentication session, Alice reveals her password Σ to Bob.
Bob accepts Alice if and only if f (Σ) equals Bob’s stored value. The use
of f is necessitated as a safeguard against impersonation attempts by
parties having access to Bob’s storage. Password-based authentication
is weak, since Alice has to disclose her secret Σ to the verifier Bob.
In a strong authentication scheme, Alice does not reveal Σ directly to
Bob. Instead they run a protocol which allows Alice to succeed (with
high probability) if and only if she knows Σ. Strong authentication
schemes are based upon public-key cryptosystems.
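The password-based (weak) scheme above is easily sketched with a standard one-way hash; a real system would additionally salt and stretch the password (e.g., with PBKDF2 or scrypt).

```python
import hashlib
import hmac

# Bob stores only f(Sigma), the one-way image of Alice's password.
def f(sigma: bytes) -> bytes:
    return hashlib.sha256(sigma).digest()

stored = f(b"correct horse battery staple")      # Bob's database entry

def authenticate(revealed: bytes) -> bool:
    # hmac.compare_digest avoids leaking information through timing.
    return hmac.compare_digest(f(revealed), stored)

assert authenticate(b"correct horse battery staple")
assert not authenticate(b"guess")
```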

• Certification: Public-key protocols use key pairs (K, K ′ ). The encryp-


tion (or verification) key K is made public, whereas the decryption (or
signing) key K ′ is kept secret. While using Bob’s public key K, one
must be certain that K really belongs to Bob. This binding is achieved
by digital certificates. A trusted third party, also called the certification
authority (CA), embeds K along with other identifying information (like
name, address, e-mail ID of Bob) in a certificate Γ. The CA digitally
signs Γ by its signing key. Anybody willing to use Bob’s public key ver-
ifies CA’s signature on the certificate. If the signature is verified, the
identifying information in the certificate is scrutinized. In case all these
pieces of information ensure that they correspond to Bob, the public key
of Bob, embedded in the certificate, is used.

In order to solve the above cryptographic problems, number theory is used.


There are several computational problems (like factoring large composite
integers and computing discrete logarithms in certain groups) which cannot
be solved efficiently by the best existing algorithms (which are subexponential).
Public-key cryptography is based upon the assumption that these problems
are not solvable by polynomial-time algorithms. Although this assumption is


not exactly justified, this is how this technology is developed. Interestingly
enough, this is how this technology must be developed. Computational prob-
lems that have provably high lower bounds appear to be unsuitable in real-
izing cryptographic protocols. Intuitively, these problems are so difficult that
a decryption function with an instrumental role of the private key cannot be
designed. Public-key cryptography turns out to be an intriguing application
that exploits our inability to solve certain computational problems efficiently.
Symmetric ciphers occasionally use number-theoretic tools. For example,
stream ciphers are sometimes implemented using linear feedback shift registers
(LFSRs). The theory of LFSRs is dependent upon properties of polynomials
over the finite field F2 . The Berlekamp–Massey algorithm can be used to crypt-
analyze such stream ciphers. Some nonlinearity is introduced in the output of
LFSRs in order that this attack cannot be mounted. Another sample appli-
cation of finite fields is the use of the field F256 = F28 in the Rijndael cipher
currently adopted as a standard (AES—the advanced encryption standard).
This block cipher exploits the high nonlinearity in the multiplicative inverse
operation of F256. Despite these sporadic uses of number theory, the study of
symmetric ciphers is not classified as applied number theory. By contrast,
number theory is omnipresent in all facets of public-key cryptography. In view
of this, I concentrate only on public-key technology in the rest of this chapter.
Symmetric and asymmetric ciphers are, however, not competitors of one
another. Asymmetric technology makes certain things possible (like realiza-
tion of digital signatures) that cannot be achieved by symmetric techniques.
On the other hand, public-key cryptographic functions are orders of magnitude
slower than symmetric cryptographic functions. In practice, a combination of
symmetric and asymmetric algorithms is used. For example, long messages are
encrypted by symmetric ciphers (block ciphers, typically). The symmetric key
between the communicating parties is established by either a key-agreement
protocol or a public-key encryption of the symmetric key. In cryptography,
both symmetric and asymmetric ciphers are useful and important. Here, our
focus is on number theory, and so on public-key techniques only.
A particular type of functions is often used in public-key cryptosystems,
most notably in signature schemes and pairing-based schemes. These are called
hash functions. A hash function H maps bit strings of any length to bit strings
of a fixed length n. A hash function H suitable for cryptography is required
to have the three following properties.

First preimage resistance Given an n-bit string y, it should, in general,
be difficult to find a string x (of any length) with H(x) = y. Of course,
one may choose certain values of x, and store (x, H(x)) pairs for these
values of x. It is evidently easy to invert H on these values of H(x).
However, since the storage of this table is restricted by availability of
memory (for example, if n = 160, it is infeasible to store a table of size
2^160), the general complexity of inverting H should be high.

Second preimage resistance Given a string x, it should be difficult to find
a different string y such that H(x) = H(y). Such a pair (x, y) is called
a collision for H. Since H maps an infinite domain to a finite range,
collisions must exist for any hash function. However, it should be difficult
to find a y colliding with any given x.

Collision resistance It should be difficult to find any two strings x, y with
H(x) = H(y). Unlike second preimage resistance, both x and y may be
chosen freely here.

The relations between these properties of hash functions are explored in
Exercise 9.1. An obviously needed property for a cryptographic hash function
is that it should be easy to compute. A function provably possessing all these
properties is not known. There are practical constructions of hash functions.
An example is provided by the secure hash family of functions like SHA-1.[1]
For some cryptosystems, we often modify the range of a hash function to some
set A other than {0, 1}^n. The set A could be Z_r or even an elliptic-curve group
over a finite field. A practical hash function (like SHA-1) can often be used
as the basic building block for constructing such modified hash functions.
However, such constructions are not necessarily trivial. In this book, I will
make no attempt to construct hash functions. At present, these constructions
are not really applications of number theory. For us, it suffices to know that
hash functions, secure for practical cryptographic applications, do exist.
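These resistance properties are ultimately a matter of the output length n. As a toy illustration (my own, not from the text), the following Python sketch defines a hypothetical hash H by truncating SHA-256 to 20 bits and then finds a collision by a birthday-style search, which succeeds after only about 2^10 digests; this is exactly why practical output lengths like n = 160 or 256 are needed:

```python
import hashlib
import itertools

def H(x, n=20):
    # Toy hash: SHA-256 truncated to its n most significant bits.
    h = int.from_bytes(hashlib.sha256(x).digest(), "big")
    return h >> (256 - n)

# Birthday-style search: remember every digest seen until one repeats.
seen = {}
for i in itertools.count():
    x = str(i).encode()
    y = H(x)
    if y in seen and seen[y] != x:
        print("collision:", seen[y], x)  # two distinct strings, same 20-bit digest
        break
    seen[y] = x
```

With n = 160 the same table would need roughly 2^80 entries before a repeat is expected, far beyond available memory.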

9.1 Public-Key Encryption


Among a host of public-key encryption algorithms available in the literature,
I pick only two in this section. Historically, these turn out to be the first
two public-key encryption algorithms to appear in the literature.

9.1.1 RSA Encryption


The first public-key encryption and signature algorithm RSA[2] happens to
be the most popular public-key algorithm. This popularity stems from its sim-
plicity and resistance against cryptanalytic studies for over three decades. The
RSA algorithm is dependent upon the difficulty of factoring large composite
integers. It involves the following steps.
[1] Federal Information Processing Standard, http://csrc.nist.gov/publications/fips/fips180-3/fips180-3_final.pdf.
[2] Ronald Linn Rivest, Adi Shamir and Leonard Max Adleman, A method for obtaining digital signatures and public-key cryptosystems, Communications of the ACM, 21(2), 120–126, 1978.

Key generation: The entity willing to receive RSA-encrypted messages
generates two random large primes p and q of nearly the same bit length. In
order to achieve a decent amount of security, this bit length should be at least
512. The product n = pq and the Euler function φ(n) = (p − 1)(q − 1) are
then computed. Finally, an integer e coprime to φ(n) is chosen, and its inverse
d modulo φ(n) is computed by the extended gcd algorithm. The pair (n, e) is
published as the public key, whereas d is kept secret as the private key.
The public key e need not be a random element of Z∗φ(n) . In order to speed
up RSA encryption, it is a good idea to take e as small as possible (but greater
than 1). A possibility is to take as e the smallest prime not dividing φ(n).
I now argue that the public knowledge of n and e does not allow an adver-
sary to compute the secret d. (This problem is often referred to as the RSA
key-inversion problem (RSAKIP).) If the adversary can factor n, she can com-
pute φ(n) = (p − 1)(q − 1) and then d ≡ e^{−1} (mod φ(n)). This does not imply
that an adversary has to factor n to obtain d from the knowledge of n and
e only. However, it turns out that RSAKIP is computationally equivalent to
factoring n, that is, if one can compute d from the knowledge of n and e only,
one can factor n too. A probabilistic algorithm for this is supplied below.
Write ed − 1 = 2^s t with t odd and s ≥ 2 (ed − 1 is a multiple of φ(n) =
(p − 1)(q − 1)). Let a ∈ Z∗n. Since ord_n a divides φ(n), it divides ed − 1 too, so
ord_n a = 2^{s′} t′ with 0 ≤ s′ ≤ s and t′ | t. But then, ord_n(a^t) =
2^{s′} t′ / gcd(2^{s′} t′, t) = 2^{s′}.
Let us look at the multiplicative orders of a^t modulo p and q individually.
Let g be a primitive root modulo p, and a ≡ g^k (mod p). We have ord_p g =
p − 1 = 2^v r with v ≥ 1, and r odd. If k is odd, then ord_p(a) = 2^v r′ for some
r′ | r. If k is even, then ord_p(a) = 2^{v′} r′ for some v′ < v and r′ | r. Consequently,
ord_p(a^t) equals 2^v if k is odd, or 2^{v′} for some v′ < v if k is even. Likewise,
ord_q(a^t) is 2^{w′} for some w′ ≤ w, where w is the multiplicity of 2 in q − 1.
Moreover, w′ = w if and only if ind_h a is odd for some primitive root h of q.
We randomly choose a ∈ {2, 3, . . . , n − 1}. If gcd(a, n) ≠ 1, this gcd is a
non-trivial factor of n. But this has a very low probability. So assume that
a ∈ Z∗n. We compute b ≡ a^t (mod n). We have argued that ord_p b = 2^{v′} and
ord_q b = 2^{w′} for some v′, w′ ∈ {0, 1, 2, . . . , s}. If v′ < w′, then b^{2^{v′}} ≡ 1 (mod p),
whereas b^{2^{v′}} ≢ 1 (mod q), so gcd(b^{2^{v′}} − 1, n) = p. Likewise, if v′ > w′, then
gcd(b^{2^{w′}} − 1, n) = q. In short, if v′ ≠ w′, there exists an s′ ∈ {0, 1, 2, . . . , s − 1}
for which gcd(b^{2^{s′}} − 1, n) is a non-trivial factor of n. We keep on computing
b, b^2, b^4, b^8, . . . , b^{2^{s−1}} modulo n by successive squaring. For each b^{2^{s′}} so
computed, we compute gcd(b^{2^{s′}} − 1, n). If some s′ gives a non-trivial factor of n,
we are done. Otherwise, we choose another random a, and repeat.
Let us now investigate how likely the occurrence of the useful case v′ ≠ w′
is. Exactly half of the elements of Z∗p have v′ = v, and exactly half of the
elements of Z∗q have w′ = w (the quadratic non-residues). If v = w, then a
is useful if it is a quadratic residue modulo p but a non-residue modulo q, or
if it is a quadratic non-residue modulo p but a residue modulo q. The count
of such useful values of a is, therefore, 2 × ((p − 1)/2) × ((q − 1)/2) = φ(n)/2. If v < w,
then a is useful if it is a quadratic non-residue modulo q. If v > w, then a
is useful if it is a quadratic non-residue modulo p. Therefore, at least half of
the elements of Z∗n are capable of factoring n, so a randomly chosen a splits
n non-trivially with probability ≥ 1/2, that is, only a few choices of a suffice.
The RSA key-inversion problem is, therefore, probabilistic polynomial-
time equivalent to factoring n. Coron and May[3] prove that the RSAKIP is
deterministic polynomial-time equivalent to factoring.

Example 9.1 Suppose that Bob publishes the public RSA key:
n = 35394171409,
e = 7.
Somehow, it is leaked to Eve that Bob’s private key is
d = 15168759223.
Let us see how this knowledge helps Eve to factor n. Since ed − 1 = 2^11 ×
51846345, we have s = 11 and t = 51846345. Eve chooses a = 5283679203, and
computes b ≡ a^t ≡ 90953423 (mod n). Subsequently, for s′ = 0, 1, 2, . . . , 10,
Eve computes gcd(b^{2^{s′}} − 1, n). It turns out that for s′ = 0, 1, 2, 3, this gcd is
1, and for s′ = 4, 5, . . . , 10, this gcd is n itself. This indicates that b has the
same order (namely, 2^4) modulo both the prime factors p and q of n.
Eve then tries a = 985439671 for which b ≡ a^t ≡ 12661598494 (mod n).
The gcd of b^{2^{s′}} − 1 with n is now 1 for s′ = 0, 1, 2, 3, it is 132241 for s′ = 4, 5,
and n itself for s′ = 6, 7, 8, 9, 10. As soon as the non-trivial gcd p = 132241 is
obtained (for s′ = 4), Eve stops and computes the cofactor q = n/p = 267649.
The complete gcd trail is shown here to illustrate that Eve is successful in this
attempt, because ord_p b = 2^4, whereas ord_q b = 2^6, in this case. ¤
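Eve's computation can be packaged as a short routine. The sketch below is an illustration in Python (the name factor_rsa is my own choosing); it retries random bases a until some power b^{2^{s′}} separates the two prime factors, exactly as described above:

```python
import math
import random

def factor_rsa(n, e, d):
    # Recover the prime factors of n from a full RSA key pair (n, e, d).
    k = e * d - 1                      # a multiple of phi(n)
    s, t = 0, k
    while t % 2 == 0:                  # write e*d - 1 = 2^s * t with t odd
        s += 1
        t //= 2
    while True:
        a = random.randrange(2, n)
        g = math.gcd(a, n)
        if g != 1:                     # extremely unlikely, but already a factor
            return g, n // g
        b = pow(a, t, n)               # b = a^t mod n
        for _ in range(s):             # b, b^2, b^4, ... by successive squaring
            g = math.gcd(b - 1, n)
            if 1 < g < n:              # orders of b modulo p and q differ here
                return g, n // g
            b = b * b % n

p, q = factor_rsa(35394171409, 7, 15168759223)   # the keys of Example 9.1
print(sorted((p, q)))                            # [132241, 267649]
```

Each random a succeeds with probability at least 1/2, so the outer loop runs only a few times in expectation.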

Encryption: In order to encrypt a message M for Bob, Alice obtains
Bob's public key (n, e). The message M is converted to an element m ∈ Zn
(this procedure is called encoding, and need not be done securely).
Alice computes and sends c ≡ m^e (mod n) to Bob.
Decryption: Upon receiving c, Bob recovers m as m ≡ c^d (mod n) using
his knowledge of the private key d.
To show that this decryption correctly recovers m, write ed − 1 = kφ(n) =
k(p − 1)(q − 1) for some positive integer k. In view of the CRT, it suffices to
show that m^{ed} is congruent to m modulo both p and q. If p | m, then m^{ed} ≡ 0 ≡
m (mod p). On the other hand, if gcd(m, p) = 1, we have m^{p−1} ≡ 1 (mod p)
by Fermat's little theorem, so m^{ed} ≡ m^{ed−1} × m ≡ (m^{p−1})^{k(q−1)} × m ≡
m (mod p). In both the cases, m^{ed} ≡ m (mod p). Likewise, m^{ed} ≡ m (mod q).
[3] Jean-Sébastien Coron and Alexander May, Deterministic polynomial-time equivalence of computing the RSA secret key and factoring, Journal of Cryptology, 20(1), 39–50, 2007.

Let us now investigate the connection between factoring n and RSA de-
cryption. Since the ciphertext c is sent through a public channel, an eaves-
dropper can intercept c. If she can factor n, she computes d ≡ e^{−1} (mod φ(n)),
and subsequently decrypts c as Bob does. However, the converse capability
of the eavesdropper is not mathematically established. This means that it is
not known whether factoring n is necessary to decrypt c, or, in other words,
whether Eve, if she can decrypt c from the knowledge of n and e only, can
thereby also factor n. At present, no algorithm other than factoring n is known (except for some
pathological parameter values) to decrypt RSA-encrypted messages (without
the knowledge of the decryption exponent d). In view of this, we often say
that the security of RSA is based upon the intractability of the integer factor-
ing problem. It, however, remains an open question whether RSA decryption
(knowing n, e only) is equivalent to or easier than factoring n.
Example 9.2 Let us work with the modulus of Example 9.1.[4] Bob chooses
the two primes p = 132241 and q = 267649, and computes n = pq =
35394171409 and φ(n) = (p − 1)(q − 1) = 35393771520. The smallest prime that
does not divide φ(n) is 7, so Bob takes e = 7. Extended gcd of e and φ(n) gives
d ≡ e^{−1} ≡ 15168759223 (mod φ(n)). Bob publishes (n, e) = (35394171409, 7)
as his public key, and keeps the private key d = 15168759223 secret.
For encrypting the message M = "Love", Alice converts M to an element m
of Zn. Standard 7-bit ASCII encoding gives m = 76 × 128^3 + 111 × 128^2 + 118 ×
128 + 101 = 161217381. Alice obtains Bob’s public key (n, e), and encrypts m
as c ≡ m^e ≡ 26448592996 (mod n). This value of c (in some encoding suitable
for transmission) is sent to Bob via a public channel.
Bob decrypts as m ≡ c^d ≡ 161217381 (mod n), so M is decoded from
m correctly as "Love". If any key other than Bob’s decryption exponent d
is used to decrypt c, the output will be different from the message sent. For
example, if Eve uses d′ = 4238432571, she gets m′ ≡ c^{d′} ≡
21292182600 (mod n), whose 7-bit ASCII decoding gives the string "O(sXH". ¤
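The whole of Example 9.2 can be replayed with Python's built-in multiprecision arithmetic; pow(e, -1, phi) (Python 3.8+) plays the role of the extended gcd computation, and the hypothetical helpers encode/decode below implement the 7-bit packing:

```python
def encode(msg):
    # 7-bit ASCII packing, base 128, as in Example 9.2.
    m = 0
    for ch in msg:
        m = m * 128 + ord(ch)
    return m

def decode(m):
    # Inverse of encode: peel off base-128 digits.
    s = ""
    while m:
        s = chr(m % 128) + s
        m //= 128
    return s

p, q = 132241, 267649                # Bob's primes (Example 9.2)
n, phi = p * q, (p - 1) * (q - 1)
e = 7                                # smallest prime not dividing phi(n)
d = pow(e, -1, phi)                  # private exponent via modular inverse

m = encode("Love")                   # 161217381
c = pow(m, e, n)                     # encryption: c = m^e mod n
assert pow(c, d, n) == m             # decryption recovers m
print(d, m, decode(pow(c, d, n)))    # 15168759223 161217381 Love
```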

9.1.2 ElGamal Encryption


ElGamal’s encryption algorithm[5] is based on the Diffie–Hellman problem
(DHP). Let G be a cyclic group with a generator g. The DHP is the problem
of computing g^{ab} from the knowledge of g^a and g^b. If one can compute discrete
logarithms in G, one can solve the DHP in G. In view of this, it is necessary to
take the group G as one in which it is infeasible to compute discrete logarithms.
Key generation: ElGamal’s scheme uses two types of keys. For each
entity, a permanent key pair is used for encrypting all messages, whereas a
[4] The moduli used in the examples of this chapter are artificially small. They are meant
only for illustrating the working of the algorithms. In order to achieve a decent level of
security, practical implementations have to use much larger parameter values.
[5] Taher ElGamal, A public-key cryptosystem and a signature scheme based on discrete
logarithms, IEEE Transactions on Information Theory, 31(4), 469–472, 1985.

temporary or a session key pair is used for each encryption. The session key
must be different in different runs of the encryption algorithm. Both these
key pairs are of the form (d, g^d). Here, d is a random integer between 2 and
|G| − 1, and is used as the private key, whereas g^d is to be used as the public
key. Obtaining an ElGamal private key from the corresponding public key
is the same as computing a discrete logarithm in G, that is, the ElGamal
key-inversion problem is computationally equivalent to the DLP in G.
ElGamal encryption: In order to send a message m ∈ G (M is encoded
as an element in G) to Bob, Alice obtains Bob's public key g^d. Alice generates
a random integer d′ ∈ {2, 3, . . . , |G| − 1} (session private key), and computes
s = g^{d′} (session public key). Alice then masks the message m by the quantity
g^{dd′} as t = m (g^d)^{d′}. The encrypted message (s, t) is sent to Bob.
ElGamal decryption: Upon receiving (s, t) from Alice, Bob recovers
m as m = t s^{−d} using his permanent private key d. The correctness of this
decryption follows from t s^{−d} = m g^{dd′} (g^{d′})^{−d} = m.

An eavesdropper possesses knowledge of G, g, g^d and s = g^{d′}. If she can
compute g^{dd′}, she decrypts the message as m = t (g^{dd′})^{−1}. Conversely, if Eve
can decrypt (s, t), she computes g^{dd′} = t m^{−1}. ElGamal decryption (using
public information only) is, therefore, as difficult as solving the DHP in G.
Example 9.3 The prime p = 35394171431 is chosen by Bob, along with
the generator g = 31 of G = F∗p. These parameters G and g may be used by
multiple (in fact, all) entities in a network. This is in contrast with RSA, where
each entity should use a different n. Bob takes d = 4958743298 as his private
key, and computes and publishes the public key y ≡ 31^d ≡ 628863325 (mod p).
In order to encrypt m = 161217381 (the 7-bit ASCII encoding of "Love"),
Alice first generates a session key d′ = 19254627018, and computes s ≡ g^{d′} ≡
33303060050 (mod p) and t ≡ m y^{d′} ≡ 3056015643 (mod p). Alice sends the
pair (33303060050, 3056015643) to Bob over a public channel.
Bob decrypts the ciphertext (s, t) as m ≡ t s^{−d} ≡ 161217381 (mod p). If
any private key other than d is used, one expects to get a different recovered
message. If Eve uses d = 21375157906, she decrypts (s, t) as m′ ≡ t s^{−d} ≡
24041362599 (mod p), whose 7-bit ASCII decoding is "YGh)’". ¤
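ElGamal encryption and decryption over F∗p can be sketched as follows (a Python illustration using the parameters of Example 9.3; since the session key is drawn afresh on every call, the ciphertext pair differs from run to run while the recovered plaintext does not — pow with a negative exponent needs Python 3.8+):

```python
import random

p, g = 35394171431, 31               # group parameters from Example 9.3
d = 4958743298                       # Bob's permanent private key
y = pow(g, d, p)                     # Bob's public key g^d mod p

def elgamal_encrypt(m, p, g, y):
    # Fresh session private key d'; ciphertext is (s, t) = (g^d', m*y^d').
    d1 = random.randrange(2, p - 1)
    return pow(g, d1, p), m * pow(y, d1, p) % p

def elgamal_decrypt(s, t, p, d):
    # m = t * s^(-d) mod p; pow(s, -d, p) computes the modular inverse power.
    return t * pow(s, -d, p) % p

m = 161217381                        # "Love" encoded as in Example 9.2
s, t = elgamal_encrypt(m, p, g, y)
assert elgamal_decrypt(s, t, p, d) == m
```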

9.2 Key Agreement


The Diffie–Hellman key-agreement protocol[6] is the first published public-
key algorithm. It is based on the Diffie–Hellman problem. Indeed, the ElGamal
encryption scheme is an adaptation of the Diffie–Hellman protocol.
[6] Whitfield Diffie and Martin Edward Hellman, New directions in cryptography, IEEE Transactions on Information Theory, 22, 644–654, 1976.



In order to share a secret, Alice and Bob publicly agree upon a suitable
finite cyclic group G with a generator g. Alice chooses a random integer d ∈
{2, 3, . . . , |G| − 1}, and sends g^d to Bob. Bob, in turn, chooses a random integer
d′ ∈ {2, 3, . . . , |G| − 1}, and sends g^{d′} to Alice. Alice computes g^{dd′} = (g^{d′})^d,
whereas Bob computes g^{dd′} = (g^d)^{d′}. The element g^{dd′} ∈ G is the common
secret exchanged publicly by Alice and Bob. An eavesdropper knows g^d and
g^{d′} only, and cannot compute g^{dd′} if the DHP is infeasible in the group G.

Example 9.4 Alice and Bob publicly decide to use the group G = F∗p, where
p = 35394171431. They also decide the generator g = 31 publicly. Alice
chooses d = 5294364, and sends y ≡ g^d ≡ 21709635652 (mod p) to Bob. Like-
wise, Bob chooses d′ = 92215703, and sends y′ ≡ g^{d′} ≡ 31439131289 (mod p)
to Alice. Alice computes α ≡ y′^d ≡ 10078655355 (mod p), whereas Bob com-
putes β ≡ y^{d′} ≡ 10078655355 (mod p). The shared secret α = β can now be
put to use. For example, Alice and Bob may use the 7-bit ASCII decoding
"%Ep&{" of α = β as the key of a block cipher. The task of an eavesdropper is
to compute this shared secret knowing G, g, y and y′ only. This is equivalent
to solving an instance of a Diffie–Hellman problem in G. ¤
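A minimal sketch of the protocol in Python, with the public parameters of Example 9.4 but freshly random secrets on each run:

```python
import random

p, g = 35394171431, 31                   # public group parameters (Example 9.4)

d = random.randrange(2, p - 1)           # Alice's secret exponent
d1 = random.randrange(2, p - 1)          # Bob's secret exponent d'

y_alice = pow(g, d, p)                   # sent to Bob over the public channel
y_bob = pow(g, d1, p)                    # sent to Alice over the public channel

shared_alice = pow(y_bob, d, p)          # (g^d')^d
shared_bob = pow(y_alice, d1, p)         # (g^d)^d'
assert shared_alice == shared_bob        # both equal g^(d d') mod p
```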

9.3 Digital Signatures


I now introduce four digital signature algorithms. RSA signatures are
adapted from the RSA encryption algorithm, and ElGamal signatures are
adapted from the ElGamal encryption algorithm. DSA (the digital signature
algorithm) and its elliptic-curve variant ECDSA are ElGamal-like signature
schemes that are accepted as federal information processing standards.
In the rest of this section, I assume that M is the message to be signed by
Bob. In what is called a signature scheme with appendix, Bob first generates a
small representative m of M . A hash function is typically used to generate m =
H(M ). The signature generation primitive is applied on the representative m.

9.3.1 RSA Signature


Bob selects RSA keys (n, e) and d as in the case of RSA encryption. Bob
generates the appendix as s ≡ m^d (mod n). The signed message is (M, s). For
verifying Bob's signature, Alice obtains Bob's public key (n, e), and computes
m′ ≡ s^e (mod n). Alice accepts (M, s) if and only if m′ = H(M). RSA
signature is like RSA encryption with the application of the keys reversed.
The correctness and security of RSA signatures are as for RSA encryption.

Example 9.5 Bob generates the primes p = 241537 and q = 382069, and
computes n = pq = 92283800053 and φ(n) = (p − 1)(q − 1) = 92283176448.
The smallest prime not dividing φ(n) is chosen as the public exponent e = 5.
The corresponding private exponent is d ≡ e^{−1} ≡ 55369905869 (mod φ(n)).
Suppose that Bob plans to sign the message m = 1234567890 (maybe
because he wants to donate these many dollars to a charity fund). He generates
the appendix as s ≡ m^d ≡ 85505674365 (mod n).
To verify Bob's signature, Alice obtains his public key e = 5, and computes
m′ ≡ s^e ≡ 1234567890 (mod n). Since m′ = m, the signature is verified.
A forger uses d′ = 2137532490 to generate the signature s′ ≡ m^{d′} ≡
84756771448 (mod n) on m. Verification by Bob's public key gives m′ ≡ s′^e ≡
23986755072 (mod n). Since m′ ≠ m, the forged signature is not verified. ¤
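Signature generation and verification for Example 9.5 reduce to two modular exponentiations (a Python sketch; the forged exponent is the one used by the forger above):

```python
n = 92283800053                      # Bob's modulus (Example 9.5)
e, d = 5, 55369905869                # public and private exponents
m = 1234567890                       # message representative

s = pow(m, d, n)                     # appendix: s = m^d mod n
assert pow(s, e, n) == m             # verification uses the public key alone

s_forged = pow(m, 2137532490, n)     # forger signs with a guessed exponent
assert pow(s_forged, e, n) != m      # verification rejects the forgery
```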

9.3.2 ElGamal Signature


ElGamal’s original signature scheme was proposed for the group G = F∗p ,
where p is a suitably large prime. Although it is possible to generalize this
construction to any (finite) cyclic group G, let us concentrate on the case
G = F∗p only. We assume that a primitive root g modulo p is provided to us.
After fixing p and g, Bob chooses a random d ∈ {2, 3, . . . , p − 2} as his
private key, and publishes y ≡ g^d (mod p) as his public key. For signing (a mes-
sage representative) m, Bob chooses a random session key d′ ∈ {2, 3, . . . , p − 2}
coprime to p − 1. Subsequently, the quantities s ≡ g^{d′} (mod p) and t ≡
(m − ds)d′^{−1} (mod p − 1) are computed. Bob's signature on m is the pair
(s, t). The computation of t requires gcd(d′, p − 1) = 1.
We have m ≡ td′ + sd (mod p − 1), so that g^m ≡ (g^{d′})^t (g^d)^s ≡ s^t y^s (mod p).
Therefore, to verify Bob's signature (s, t), Alice needs to check whether the
congruence g^m ≡ s^t y^s (mod p) holds. This requires using Bob's public key y.
Let us investigate the security of ElGamal’s signature scheme. The knowl-
edge of the permanent public key y and the session public key s does not
reveal the corresponding private keys d and d′ , under the assumption that it
is infeasible to solve the DLP modulo p. Clearly, if d is known to an adver-
sary, she can generate a valid signature on m in exactly the same way as Bob
does (the session secret d′ can be chosen by the adversary). Conversely, sup-
pose that Eve can produce a valid signature (s, t) of Bob on m. The equation
m ≡ td′ + sd (mod p − 1) has Θ(p) solutions in the unknown quantities d, d′
which cannot be uniquely identified unless one of d, d′ is provided to Eve. But
then, d′ can be chosen by the adversary, implying that she can solve for d, that
is, can compute the discrete logarithm of y. This argument intuitively suggests
that the security of the ElGamal signature scheme is based upon the difficulty
of computing discrete logarithms in Fp . (This is not a formal security proof.
Indeed, a silly signature scheme may reveal d to the adversary, and the effort
spent by the adversary in the process may be less than necessary to solve the
DLP in Fp . Hopefully, ElGamal signatures do not leak such silly information.
To the best of my knowledge, the equivalence of forgery of ElGamal signatures
with solving the DLP is not established yet.)

Example 9.6 Let us choose the ElGamal parameters p = 92283800099 and
g = 19. Bob chooses the private key d = 23499347910, and the corresponding
public key is y ≡ g^d ≡ 66075503407 (mod p). Let m = 1234567890 be the
message (representative) to be signed by Bob.
Bob chooses the random session secret d′ = 9213753243, and generates s ≡
g^{d′} ≡ 85536409136 (mod p) and t ≡ (m − ds)d′^{−1} ≡ 22134180366 (mod p − 1).
Bob's signature on m is the pair (85536409136, 22134180366).
A verifier computes g^m ≡ 44505409554 (mod p) and s^t y^s ≡ 44505409554
(mod p). These two quantities being equal, the signature is verified.
A forger generates s by choosing d′ like Bob. Computing t, however, uses d,
which the forger guesses as d = 14762527324. This gives t′ ≡ (m − ds)d′^{−1} ≡
43362818978 (mod p − 1). For the forged signature (s, t′), we have g^m ≡
44505409554 (mod p) as before, whereas s^{t′} y^s ≡ 52275374670 (mod p). These
two quantities are not equal, so the forged signature is not verified. ¤
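The signing and verification congruences can be checked mechanically. The sketch below uses the parameters of Example 9.6 but draws a fresh session secret (coprime to p − 1, as required for inverting d′):

```python
import math
import random

p, g = 92283800099, 19               # parameters from Example 9.6
d = 23499347910                      # Bob's private key
y = pow(g, d, p)                     # public key
m = 1234567890                       # message representative

while True:                          # session secret d', coprime to p - 1
    d1 = random.randrange(2, p - 1)
    if math.gcd(d1, p - 1) == 1:
        break

s = pow(g, d1, p)                    # s = g^d' mod p
t = (m - d * s) * pow(d1, -1, p - 1) % (p - 1)

# Verification: g^m = s^t * y^s (mod p)
assert pow(g, m, p) == pow(s, t, p) * pow(y, s, p) % p
```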

9.3.3 DSA
The digital signature algorithm (DSA)[7] is an efficient variant of ElGamal's
signature scheme. The reduction in the running time arises because DSA
works in a subgroup of F∗p. Let q be a prime divisor of p − 1, having bit length
160 or more. To compute a generator g of the (unique) subgroup G of F∗p of
size q, we choose random h ∈ F∗p and compute g ≡ h^{(p−1)/q} (mod p) until we
have g ≢ 1 (mod p). The parameters for DSA are p, q and g.
Bob sets up a permanent key pair by choosing a random d ∈ {2, 3, . . . , q − 1}
(the private key) and then computing y ≡ g^d (mod p) (the public key).
For signing the message (representative) m, Bob chooses a random session
secret d′ ∈ {2, 3, . . . , q − 1}, and computes s = (g^{d′} (mod p)) (mod q) and
t ≡ (m + ds)d′^{−1} (mod q). Bob's signature on m is the pair (s, t).
For verifying this signature, one computes w ≡ t^{−1} (mod q), u1 ≡ mw
(mod q), u2 ≡ sw (mod q), and v = (g^{u1} y^{u2} (mod p)) (mod q), and accepts if
and only if v = s. The correctness of this procedure is easy to establish.
The basic difference between ElGamal’s scheme and DSA is that all expo-
nents in ElGamal’s scheme are needed modulo p − 1, whereas all exponents in
DSA are needed modulo q. In order that the DLP in Fp is difficult, one needs
to take the bit size of p at least 1024. On the contrary, q may be as small as
only 160 bits long. Thus, exponentiation time in DSA decreases by a factor of
(at least) six. The signature size also decreases by the same factor. The DSA
standard recommends a particular way of generating the primes p and q.

Example 9.7 Choose the prime p = 92283800153, for which we have p − 1 =
2^3 × 21529 × 535811. We take q = 21529. To locate an element g of order q in
F∗p, we take h = 3284762809, which gives g ≡ h^{(p−1)/q} ≡ 34370710159 (mod p).

[7] Federal Information Processing Standard, http://csrc.nist.gov/publications/fips/fips186-3/fips_186-3.pdf.

Bob chooses the permanent key pair as: d = 14723 (the private key) and
y ≡ g^d ≡ 62425452257 (mod p) (the public key).
Let m = 1234567890 be the message (representative) to be signed by
Bob. Bob chooses the session secret d′ = 9372, for which g^{d′} ≡ 58447941827
(mod p). Reduction modulo q of this value gives s = 764. Bob computes t ≡
(m + ds)d′^{−1} ≡ 17681 (mod q). Bob's signature on m is the pair (764, 17681).
To verify this signature, Alice computes w ≡ t^{−1} ≡ 19789 (mod q), u1 ≡
mw ≡ 12049 (mod q), u2 ≡ sw ≡ 5438 (mod q), and v = (g^{u1} y^{u2} (mod p))
(mod q) = 58447941827 (mod q) = 764. Since v = s, the signature is verified.
Let us see how verification fails on the forged signature (764, 8179). We
now have w ≡ t^{−1} ≡ 8739 (mod q), u1 ≡ mw ≡ 7524 (mod q), u2 ≡ sw ≡
2606 (mod q), and v = (g^{u1} y^{u2} (mod p)) (mod q) = 9836368153 (mod q) =
4872. Since v ≠ s, the forged signature is not verified. ¤
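A full DSA sign/verify round with the parameters of Example 9.7 can be sketched as follows (the generator is recomputed from h rather than hardcoded, fresh random keys are drawn, and the rare degenerate cases s = 0 or t = 0 are simply retried):

```python
import random

p, q = 92283800153, 21529            # q is a prime divisor of p - 1 (Example 9.7)
g = pow(3284762809, (p - 1) // q, p) # order-q generator from h, as in the text
assert g != 1

d = random.randrange(2, q)           # permanent private key
y = pow(g, d, p)                     # public key
m = 1234567890                       # message representative

while True:                          # sign; retry the rare s = 0 or t = 0
    d1 = random.randrange(2, q)      # session secret (q is prime, so invertible)
    s = pow(g, d1, p) % q
    t = (m + d * s) * pow(d1, -1, q) % q
    if s and t:
        break

w = pow(t, -1, q)                    # verification
u1, u2 = m * w % q, s * w % q
v = pow(g, u1, p) * pow(y, u2, p) % p % q
assert v == s
```

All exponents here live modulo the 15-bit q rather than modulo p − 1, which is the source of DSA's speedup over plain ElGamal.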

9.3.4 ECDSA
The elliptic-curve digital signature algorithm is essentially DSA adapted
to elliptic-curve groups. The same FIPS document (Footnote 7) that accepts
DSA as a standard includes ECDSA too.
Setting up the domain parameters for ECDSA involves some work. First,
a finite field Fq is chosen, where q is either a prime or a power of 2. We choose
two random elements a, b ∈ Fq, and consider the curve E : y^2 = x^3 + ax + b
if q is prime, or the curve y^2 + xy = x^3 + ax^2 + b if q is a power of 2. In
order to avoid the MOV/Frey–Rück attack, it is necessary to take the curve
as non-supersingular. Let n be a prime divisor of |E_{Fq}|, having bit length ≥ 160,
and let h = |E_{Fq}|/n denote the corresponding cofactor. A random point G in
E_{Fq} of order n is chosen by first selecting a random P on the curve and then
computing G = hP until one G ≠ O is found. In order to determine n and
to check that the curve E is not cryptographically weak (like supersingular
or anomalous), it is necessary to compute the order |E_{Fq}| by a point-counting
algorithm. This is doable in reasonable time, since q is typically restricted
to be no more than about 512 bits in length. The field size q (along with a
representation of Fq ), the elements a, b of Fq defining the curve E, the integers
n, h, and the point G constitute the domain parameters for ECDSA.
The signer (Bob) chooses a random integer d ∈ {2, 3, . . . , n−1} (the private
key), and publishes the elliptic-curve point Y = dG (the public key).
To sign a message M , Bob maps M to a representative m ∈ {0, 1, 2, . . . ,
n − 1}. A random session key d′ ∈ {2, 3, . . . , n − 1} is generated, and the
point S = d′ G is computed. The x-coordinate x(S) of S is reduced modulo
n to generate the first part of the signature: s ≡ x(S) (mod n). In order to
generate the second part, Bob computes t ≡ (m + ds)d′^{−1} (mod n). Bob's
signature on m (or M) is the pair (s, t).
In order to verify this signature, Alice obtains Bob's permanent public
key Y, and computes the following: w ≡ t^{−1} (mod n), u1 ≡ mw (mod n),
u2 ≡ sw (mod n), V = u1 G + u2 Y, and v ≡ x(V) (mod n). The signature is


accepted if and only if v = s. The correctness of this verification procedure is
analogous to that for ElGamal or DSA signatures.

Example 9.8 Take the curve E : Y^2 = X^3 + 3X + 6 defined over the prime
field F997. By Example 4.79, E_{F997} has order 1043 = 7 × 149, so we take n = 149
and h = 7. A random point of order n on E is G = h × (14, 625) = (246, 540).
Bob generates the key pair (d, Y), where d = 73 and Y = dG = (698, 240).
Suppose that the message representative to be signed by Bob is m = 123.
Choosing the session secret as d′ = 107, Bob computes S = d′G = (543, 20),
s = 543 rem 149 = 96, and t ≡ (m + ds)d′^{−1} ≡ 75 (mod n).
To verify the signature (96, 75), Alice computes w ≡ t^{−1} ≡ 2 (mod n),
u1 ≡ mw ≡ 97 (mod n), u2 ≡ sw ≡ 43 (mod n), V = u1 G + u2 Y = (543, 20),
and v = 543 rem 149 = 96. Since v = s, the signature is verified.
The forged signature (s, t′) = (96, 112) leads to the computations: w ≡
t′^{−1} ≡ 4 (mod n), u1 ≡ mw ≡ 45 (mod n), u2 ≡ sw ≡ 86 (mod n), V =
u1 G + u2 Y = (504, 759), and v = 504 rem 149 = 57 ≠ s. ¤
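Reproducing Example 9.8 requires only affine short-Weierstrass point arithmetic over F_997. The sketch below (my own helper names; a general implementation would also handle the characteristic-2 curve form) implements point addition and double-and-add, then runs the sign/verify computation with the example's keys:

```python
p = 997                                  # E: Y^2 = X^3 + 3X + 6 over F_997
a_coef = 3                               # curve coefficient a
O = None                                 # point at infinity

def add(P1, P2):
    # Affine point addition; O is the group identity.
    if P1 is O: return P2
    if P2 is O: return P1
    (x1, y1), (x2, y2) = P1, P2
    if x1 == x2 and (y1 + y2) % p == 0:
        return O                         # P2 = -P1
    if P1 == P2:
        lam = (3 * x1 * x1 + a_coef) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p

def mul(k, pt):
    # Double-and-add scalar multiplication.
    r = O
    while k:
        if k & 1:
            r = add(r, pt)
        pt = add(pt, pt)
        k >>= 1
    return r

n = 149                                  # prime order of the subgroup
G = mul(7, (14, 625))                    # cofactor h = 7 times a random point
assert mul(n, G) is O                    # G indeed has order n

d = 73                                   # Bob's private key (Example 9.8)
Y = mul(d, G)                            # public key
m, d1 = 123, 107                         # message and session secret

S = mul(d1, G)                           # S = d'G
s = S[0] % n                             # s = x(S) mod n
t = (m + d * s) * pow(d1, -1, n) % n     # second signature part

w = pow(t, -1, n)                        # verification, as in the text
V = add(mul(m * w % n, G), mul(s * w % n, Y))
v = V[0] % n
assert (s, t, v) == (96, 75, 96)         # matches Example 9.8
```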

9.4 Entity Authentication


Strong authentication schemes are often referred to as challenge-response
authentication schemes for the following reason. Suppose that Alice wants
to prove to Bob her identity, that is, her knowledge of a secret σ, without
disclosing σ directly to Bob. Bob generates a random challenge for Alice.
Alice can successfully respond to this challenge if she knows σ. Lack of the
knowledge of σ lets an impersonation attempt fail with high probability.

9.4.1 Simple Challenge-Response Schemes


A challenge-response scheme may be based on public-key encryption or dig-
ital signatures. Alice’s knowledge of her private key enables her to successfully
decrypt an encrypted message sent as a challenge by Bob, or to successfully
generate a verifiable signature. These two approaches are elaborated now.
Algorithm 9.1 explains an authentication protocol based on Alice’s ca-
pability to decrypt ciphertext messages. Bob generates a random plaintext
message r, and sends a witness w of r to Alice. The witness ensures that Bob
really possesses the knowledge of r. In order that third parties cannot make a
successful replay of this protocol, it is required that Bob change the string r
in different sessions. The function f should be one-way (not easily invertible)
and collision-resistant (it should be difficult to find two different r, r′ with
f(r) = f(r′)). A cryptographic hash function may be used as f.

Algorithm 9.1: Challenge-response scheme based on encryption


Bob generates a random string r, and computes the witness w = f(r).
Bob encrypts r using Alice’s public key e: c = E_e(r).
Bob sends the witness w and the encrypted message c to Alice.
Alice decrypts c using her private key d: r′ = D_d(c).
If w ≠ f(r′), Alice quits the protocol.
Alice sends the response r′ to Bob.
Bob accepts Alice if and only if r′ = r.

Bob then encrypts r, and sends the challenge c = Ee (r) to Alice. If Alice
knows the corresponding private key, she can decrypt c to recover r. If she
sends r back to Bob, Bob becomes sure of Alice’s capability to decrypt his
challenge. A third party having no knowledge of d can only guess a value of r,
and succeeds only with very low probability.
Before sending the decrypted r′ back to Bob, Alice must check that Bob
is participating honestly in the protocol. If w ≠ f(r′), Alice concludes that
the witness w does not establish Bob’s knowledge of r. The implication could
be that Bob is trying to make Alice decrypt some ciphertext message in order
to obtain the corresponding unknown plaintext message. In such a situation,
Alice should not proceed further with the protocol.

Example 9.9 Let me illustrate the working of Algorithm 9.1 in tandem with
RSA encryption. Suppose that Alice sets up her keys as follows: p = 132241,
q = 267649, n = pq = 35394171409, φ(n) = (p − 1)(q − 1) = 35393771520,
e = 7, d ≡ e^−1 ≡ 15168759223 (mod φ(n)). Alice's knowledge of her private
key d is to be established in an authentication interaction with Bob.
Bob chooses the random element r = 2319486374, and sends its sum of
digits w = 2 + 3 + 1 + 9 + 4 + 8 + 6 + 3 + 7 + 4 = 47 to Alice as a witness of his
knowledge of r. Cryptographically, this is a very bad realization of f , since it is
neither one-way nor collision-resistant. But then, this is only an illustrative
example, not a real-life implementation.
Bob encrypts r as c ≡ r^e ≡ 22927769204 (mod n). Upon receiving the
challenge c, Alice first decrypts it: r′ ≡ c^d ≡ 2319486374 (mod n). Since the
sum of digits in r′ is w = 47, she gains the confidence that Bob really knows
r′ . She sends r′ back to Bob. Finally, since r′ = r, Bob accepts Alice. ¤
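For readers who wish to replay Example 9.9 mechanically, here is a minimal Python sketch of Algorithm 9.1 with exactly those RSA parameters. The variable names are mine, the digit-sum f is the example's deliberately weak witness function, and `pow(e, -1, phi)` needs Python 3.8 or later.

```python
# A replay of Algorithm 9.1 with the RSA parameters of Example 9.9 and the
# same (deliberately weak) digit-sum witness function f.
p, q = 132241, 267649
n = p * q                              # 35394171409
phi = (p - 1) * (q - 1)                # 35393771520
e_key = 7
d = pow(e_key, -1, phi)                # Alice's private key, 15168759223

def f(x):                              # digit sum: weak, illustrative f
    return sum(int(ch) for ch in str(x))

# Bob's side: random plaintext r, witness w, challenge c
r = 2319486374
w = f(r)                               # 47
c = pow(r, e_key, n)

# Alice's side: decrypt, check the witness, send the response
r_prime = pow(c, d, n)
assert f(r_prime) == w                 # witness check: Bob really knows r
assert r_prime == r                    # Bob accepts Alice
```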

Algorithm 9.2 describes a challenge-response scheme that uses Alice's
ability to generate verifiable signatures. The message to be signed is generated
partially by both the parties Alice and Bob. Bob’s contribution c is necessary
to prevent replay attacks by eavesdroppers. If c were absent from m, Eve could,
after intercepting a real interaction between Alice and Bob, always send c′ and
s = Sd (c′ ) to Bob and impersonate Alice. On the other hand, Alice's
contribution c′ in m is necessary to preclude possible attempts by Bob to let
Alice use her private key on messages selected by Bob. The strings c and c′
can be combined in several ways (like concatenation or modular addition).

Algorithm 9.2: Challenge-response scheme based on signatures


Bob sends a random challenge c to Alice.
Alice generates a random string c′ .
Alice combines c and c′ to get a message m = combine(c, c′ ).
Alice generates the signature s = Sd (m) on m.
Alice sends c′ and s simultaneously to Bob.
Bob uses Alice’s verification key to generate m′ = Ve (s).
Bob accepts Alice if and only if m′ = combine(c, c′ ).

Example 9.10 Let us use the same RSA parameters and key pair of Alice as
in Example 9.9. Bob sends the random string c = 21321368768 to Alice. Alice,
in turn, generates the random string c′ = 30687013256, and combines c and
c′ as m ≡ c + c′ ≡ 16614210615 (mod n). Alice then generates her signature
s ≡ m^d ≡ 26379460389 (mod n) on m, and sends c′ and s together to Bob. Bob
uses the RSA verification primitive to get m′ ≡ s^e ≡ 16614210615 (mod n).
Since m′ ≡ c + c′ (mod n), Bob accepts Alice as authentic. ¤
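The signature-based variant of Example 9.10 can be replayed the same way. This sketch reuses the key pair of Example 9.9 and takes combine(c, c′) to be modular addition, as in the example; names are mine.

```python
# A replay of Algorithm 9.2 with the key pair of Example 9.9 and the values
# of Example 9.10, where combine(c, c') is modular addition.
p, q = 132241, 267649
n = p * q
phi = (p - 1) * (q - 1)
e_key = 7
d = pow(e_key, -1, phi)        # Alice's signing key

c = 21321368768                # Bob's random challenge
c1 = 30687013256               # Alice's random contribution c'
m = (c + c1) % n               # combine(c, c') = 16614210615
s = pow(m, d, n)               # Alice's signature on m
m1 = pow(s, e_key, n)          # Bob applies Alice's verification key
assert m1 == (c + c1) % n      # Bob accepts Alice
```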

9.4.2 Zero-Knowledge Protocols


A challenge-response authentication protocol does not reveal Alice’s secrets
straightaway to Bob or any eavesdropper, but may leak some partial infor-
mation on this secret. Upon repeated use, Bob may increasingly acquire the
capability of choosing strategic challenges. A zero-knowledge protocol is a form
of challenge-response authentication scheme that comes with a mathematical
proof that no partial information is leaked to Bob (or any eavesdropper). This
means that all the parties provably continue to remain as ignorant of Alice’s
secret as they were before the protocol started. As a result, the security of the
protocol does not degrade with continued use.
A zero-knowledge protocol typically consists of three stages. Suppose that
Alice (the claimant or the prover) wants to prove her identity to Bob (the
verifier). First, Alice chooses a random commitment and sends a witness of
this commitment to Bob. Bob, in turn, sends a random challenge to Alice.
Finally, Alice sends her response to the challenge back to Bob. If Alice knows
the secret, she can send a valid response during every interaction, whereas an
eavesdropper without the knowledge of Alice’s secret can succeed with only
some limited probability. If this probability is not too small, the protocol may
be repeated a requisite number of times in order to reduce an eavesdropper's
chance of succeeding in all these interactions below a very small value.
In what follows, I explain Fiat and Shamir’s zero-knowledge protocol.8
Algorithm 9.3 details the steps in this protocol.
8 Amos Fiat and Adi Shamir, How to prove yourself: Practical solutions to identification and signature problems, Crypto, 186–194, 1986.



Algorithm 9.3: The Fiat–Shamir zero-knowledge protocol


Setting up of domain parameters:
A trusted third party (TTP) selects two large primes p and q.
The TTP publishes the product n = pq and a small integer k.
Setting up of Alice’s secret:
The TTP selects k secret integers s1 , s2 , . . . , sk ∈ Z∗n .
The TTP computes the squares vi ≡ si² (mod n) for i = 1, 2, . . . , k.
The TTP transfers s1 , s2 , . . . , sk to Alice securely.
The TTP makes v1 , v2 , . . . , vk public.
Authentication of Alice (claimant) by Bob (verifier):
Alice selects a random commitment c ∈ Zn .
Alice sends the witness w ≡ c² (mod n) to Bob.
Bob sends k random bits (challenges) e1 , e2 , . . . , ek to Alice.
Alice sends the response r ≡ c · ∏_{i : ei = 1} si (mod n) to Bob.
Bob accepts Alice if and only if r² ≡ w · ∏_{i : ei = 1} vi (mod n).

In order to see how the Fiat–Shamir protocol works, first take the simple
case k = 1, and drop all subscripts (so vi becomes v, for example).
her identity by demonstrating her knowledge of s, the square v of which is
known publicly. Under the assumption that computing square roots modulo
a large integer n with unknown factorization is infeasible, no entity in the
network (except the TTP) can compute s from the published value v.
In an authentication interaction, Alice first chooses a random commitment
c which she later uses as a random mask for her response. Her commitment to c
is reflected by the witness w ≡ c² (mod n). Again, because of the intractability
of the modular square-root problem, neither Bob nor any eavesdropper can
recover c from w. Now, Bob chooses a random challenge bit e. If e = 0, Alice
sends the commitment c to Bob. If e = 1, Alice sends cs to Bob. In both cases,
Bob squares this response, and checks whether this square equals w (for e = 0)
or wv (for e = 1). In order that the secret s is not revealed to Bob, Alice must
choose different commitments in different authentication interactions.
Let us now see how an adversary, Eve, can impersonate Alice. When Eve
selects a commitment, she is unaware of the challenge that Bob is going to
throw in future. In that case, her ability to supply valid responses to both
the challenge bits is equivalent to her knowledge of s. If Eve does not know
s, she can succeed with a probability of 1/2 as follows. Suppose that Eve
sends w ≡ c² (mod n) to Bob for some c chosen by her. If Bob challenges
with e = 0, she can send the correct response c to Bob. On the other hand,
if Bob’s challenge is e = 1, she cannot send the correct response cs, because

s is unknown to her. Eve may also prepare for sending the correct response
for e = 1 as follows. She chooses a commitment c but sends the improper
commitment c²/v (mod n) to Bob. If Bob challenges with e = 1, she sends
the verifiable response c. On the other hand, if Bob sends e = 0, the verifiable
response would be c/s which is unknown to Eve.
If a value of k > 1 is used, Eve succeeds in an interaction with Bob with
probability 2^−k. This is because Eve's commitment can successfully handle
exactly one of the 2^k different challenges (e1 , e2 , . . . , ek ) from Bob. Example 9.11
illustrates the case k = 2. If 2^−k is not small enough, the protocol is repeated
t times, and the chance that Eve succeeds in all these interactions is 2^−kt. By
choosing k and t suitably, this probability can be made as low as one desires.
A modification of the Fiat–Shamir protocol and a proof of the zero-
knowledge property of the modified protocol are from Feige, Fiat and Shamir.9
I will not deal with the Feige–Fiat–Shamir (FFS) protocol in this book.
Example 9.11 Suppose that the TTP chooses the composite integer n =
148198401661 to be used in the Fiat–Shamir protocol. It turns out that n is
the product of two primes, but I am not disclosing these primes, since the
readers, not being the TTP, are not supposed to know them.
Take k = 2. The TTP chooses the secrets s1 = 18368213879 and s2 =
94357932436 for Alice, and publishes their squares v1 ≡ s1² ≡ 119051447029
(mod n) and v2 ≡ s2² ≡ 100695453825 (mod n).
In an authentication protocol with Bob, Alice chooses the commitment c =
32764862846 (mod n), and sends the witness w ≡ c² ≡ 87868748231 (mod n)
to Bob. The following table shows the responses of Alice for different challenges
from Bob. The square r² and the product w·v1^e1·v2^e2 match in all the cases.

e1  e2  r (mod n)               r² (mod n)    w·v1^e1·v2^e2 (mod n)
0   0   c = 32764862846         87868748231   87868748231
1   0   c·s1 = 50965101270      49026257157   49026257157
0   1   c·s2 = 102354808490     32027759547   32027759547
1   1   c·s1·s2 = 65287381609   57409938890   57409938890
Let us now look at an impersonation attempt by Eve having no knowledge
of Alice’s secrets s1 , s2 and Bob’s challenges e1 , e2 . She can prepare for exactly
one challenge of Bob, say, (0, 1). She randomly chooses c = 32764862846, and
sends the (improper) witness w ≡ c²/v2 ≡ 72251816136 (mod n) to Bob. If
Bob sends the challenge (0, 1), Eve responds by sending r = c = 32764862846
for which r² ≡ 87868748231 (mod n), and wv2 ≡ 87868748231 too, that is, Eve
succeeds. However, if Eve has to succeed for the other challenges (0, 0), (1, 0)
and (1, 1), she needs to send the responses c/s2 , cs1 /s2 and cs1 , respectively,
which she cannot do since she knows neither s1 nor s2 . This illustrates that if
all of the four possible challenges of Bob are equally likely, Eve succeeds with
a probability of 1/4 only. ¤
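The four rows of the table above can be checked mechanically. The following Python sketch replays the Fiat–Shamir interaction of Example 9.11 (k = 2) and verifies Bob's congruence for every challenge; the variable names are mine.

```python
from itertools import product

# A replay of the Fiat-Shamir interaction of Example 9.11 (k = 2), checking
# Bob's verification r^2 = w * v1^e1 * v2^e2 (mod n) for all four challenges.
n = 148198401661
s_keys = [18368213879, 94357932436]        # Alice's secrets s1, s2
v = [pow(s, 2, n) for s in s_keys]         # published squares v1, v2

c = 32764862846                            # Alice's random commitment
w = pow(c, 2, n)                           # witness sent to Bob

for challenge in product((0, 1), repeat=2):    # all challenges (e1, e2)
    r = c
    rhs = w
    for ei, si, vi in zip(challenge, s_keys, v):
        if ei:
            r = r * si % n                 # response r = c * prod of chosen s_i
            rhs = rhs * vi % n             # w * prod of corresponding v_i
    assert pow(r, 2, n) == rhs             # Bob accepts in every case
```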

9 Uriel Feige, Amos Fiat and Adi Shamir, Zero knowledge proofs of identity, Journal of Cryptology, 1, 77–94, 1988.



9.5 Pairing-Based Cryptography


Using bilinear pairings in cryptographic protocols is a relatively new area
in public-key cryptography. This development began with the work of Sakai,
Ohgishi and Kasahara10 , but the paper was written in Japanese and did not
immediately attract worldwide attention. Joux’s three-party key-agreement
protocol11 and Boneh and Franklin’s identity-based encryption scheme12 , pub-
lished shortly afterwards, opened a floodgate in pairing-based cryptography.
Section 4.5 of this book already gives a reasonably detailed mathematical
exposition of certain realizable pairing functions. Here, it suffices to review only
the basic notion of a bilinear map. Let G1 and G2 be additive groups and G3
a multiplicative group, each of prime order r. A bilinear pairing e : G1 × G2 →
G3 is a map satisfying e(aP, bQ) = e(P, Q)^ab for all P ∈ G1 , Q ∈ G2 , and
a, b ∈ Z. For such a map to be useful, we require two conditions. First, e
should be easily computable, and second, e should not be degenerate (that is,
there must exist P ∈ G1 and Q ∈ G2 for which e(P, Q) is not the identity
element of G3 ). Recall that Weil and (reduced) Tate pairings satisfy these
conditions for appropriately chosen elliptic-curve groups G1 and G2 and for
an appropriate subgroup G3 of the multiplicative group of a finite field.
Special attention is given to the case G1 = G2 (call this group G). A
bilinear map e : G × G → G3 can, for example, be obtained from Weil or Tate
pairing in conjunction with distortion maps on supersingular elliptic curves.
Before we jump to computationally difficult problems associated with bi-
linear pairing maps, let us review the Diffie–Hellman problem (introduced
near the beginning of Chapter 7). Let P be a generator of the additive group
G of prime order r. The computational Diffie–Hellman problem (CDHP) in G
is the problem of computing abP from a knowledge of P , aP and bP only. The
decisional Diffie–Hellman problem (DDHP) in G, on the other hand, refers to
deciding, given P, aP, bP, zP , whether z ≡ ab (mod r). In certain groups (like
suitably large elliptic-curve groups), both these problems are computationally
infeasible, at least to the extent we know about the discrete-logarithm problem
in these groups. Of course, this assertion should not be taken to mean that
one has to solve the discrete-logarithm problem to solve these Diffie–Hellman
problems. It is only that at present, no better methods are known.
Difficulties arise in the presence of bilinear maps associated with G. First,
consider the special (but practically important) case that e : G × G → G3 is
a bilinear map. Solving the DDHP in G becomes easy, since z ≡ ab (mod r)
if and only if e(aP, bP ) = e(P, zP ). The CDHP in G is, however, not known

10 R. Sakai, K. Ohgishi and M. Kasahara, Cryptosystems based on pairing, SCIS, 2000.
11 Antoine Joux, A one-round protocol for tripartite Diffie–Hellman, ANTS-4, 385–394, 2004.
12 Dan Boneh and Matthew K. Franklin, Identity based encryption from the Weil pairing, Crypto, 213–229, 2001.



to be easily solvable in the presence of bilinear maps e : G × G → G3 . Indeed,


the CDHP in the presence of bilinear maps is believed to remain as difficult as
it was without such maps. The group G is called a gap Diffie–Hellman (GDH)
group if the DDHP is easy in G, but the CDHP is difficult in G. Note, how-
ever, that the presence of bilinear maps is not necessary (but only sufficient)
for creating GDH groups. The notion of GDH groups readily extends to the
general case of two groups G1 , G2 admitting a bilinear map e : G1 × G2 → G3 .
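The observation that a pairing makes the DDHP easy can be seen concretely with a toy bilinear map. The instantiation below (G = Z_r additive with P = 1, and e(x, y) = g^xy mod p) is my own illustrative stand-in, not a Weil or Tate pairing: it is bilinear and non-degenerate, but trivially insecure since discrete logarithms in Z_r are free. That insecurity is irrelevant to the point being demonstrated.

```python
# A toy bilinear map: G is the additive group Z_r with generator P = 1, and
# e(x, y) = g^(x*y) mod p, where g has order r in F_p^*.  This map is
# bilinear and non-degenerate, but trivially insecure (discrete logs in Z_r
# are free); it serves only to show why a pairing makes the DDHP easy.
p, r, g = 23, 11, 2            # r divides p - 1; 2^11 = 2048 = 1 (mod 23)

def e(x, y):
    return pow(g, (x * y) % r, p)

P = 1
a, b = 4, 9
z_good = a * b % r             # a valid Diffie-Hellman value
z_bad = (a * b + 1) % r        # an invalid one

# DDHP test: z = ab (mod r)  if and only if  e(aP, bP) == e(P, zP)
assert e(a * P % r, b * P % r) == e(P, z_good * P % r)
assert e(a * P % r, b * P % r) != e(P, z_bad * P % r)
```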
The above discussion highlights the need for modifying the notion of Diffie–
Hellman problems in the presence of bilinear maps. The (computational)
bilinear Diffie–Hellman problem (BDHP) in the context of a bilinear map
e : G × G → G3 is to compute e(P, P )^abc given the elements P, aP, bP, cP ∈
G only. The decisional bilinear Diffie–Hellman problem (DBDHP), on
the other hand, refers to the problem of deciding, from a knowledge of
P, aP, bP, cP, zP ∈ G only, whether z ≡ abc (mod r) (or equivalently, whether
e(P, P )^z = e(P, P )^abc ). These two new problems are believed not to be assisted
by the existence of the bilinear map e, that is, if the discrete-logarithm prob-
lem is difficult in G, it is assumed that the BDHP and DBDHP are difficult
too, even in the presence of (efficiently computable) bilinear maps e.
Let us now turn our attention to the more general setting that G1 ≠ G2 ,
and we have a bilinear map e : G1 × G2 → G3 . We continue to assume that
G1 , G2 are additive groups and G3 is a multiplicative group, each of prime
order r. The presence of e is assumed not to make the DDHP in G1 or G2
easier. This is referred to as the external Diffie–Hellman (XDH) assumption.
In the context of bilinear maps e : G1 × G2 → G3 (with G1 ≠ G2 ), we talk
about some other related computational problems which are again believed to
be difficult (given that the discrete-logarithm problem is difficult in G1 and
also in G2 ). Let P and Q be generators of G1 and G2 , respectively. Since e is
non-degenerate, e(P, Q) is not the identity element of G3 . The Co-CDHP is
the problem of computing abQ ∈ G2 , given P, aP ∈ G1 and Q, bQ ∈ G2 only.
The special case b = 1 (computing aQ from P, aP, Q) is also often referred to
by the same name. Indeed, these two variants are computationally equivalent,
since the general case can be solved from the special case by replacing Q by
bQ. Here, it is assumed that b ≢ 0 (mod r) (otherwise the problem is trivial).
Note also that the variant with a = 1 (computing bP from the knowledge of
P, Q, bQ only) is analogously a problem equivalent to the general Co-CDHP.
The Co-DDHP is to decide, given P, aP ∈ G1 and Q, bQ, zQ ∈ G2 only,
whether z ≡ ab (mod r). Again, the special variants corresponding to b = 1
or a = 1 are computationally equivalent to the general problem.
The two bilinear Diffie–Hellman problems in the context of the bilinear
map e : G1 × G2 → G3 are the following. The Co-BDHP is the problem of
computing e(P, Q)^ab from the knowledge of P, aP, bP ∈ G1 and Q ∈ G2 only.
Finally, the co-DBDHP is the problem of deciding whether z ≡ ab (mod r)
from the knowledge of P, aP, bP ∈ G1 , Q ∈ G2 and e(P, Q)^z ∈ G3 only.
I am now going to describe a few cryptographic protocols that exploit the
presence of bilinear maps and that derive their securities from the assumption

that the different versions of the Diffie–Hellman problem discussed above are
difficult to solve for suitable choices of the groups. If the assumption G1 = G2
(= G) simplifies the exposition (but without a loss of security), I will not
hesitate to make this assumption. The reader should, however, keep in mind
that these protocols may often be generalized to the case G1 ≠ G2 .

9.5.1 Identity-Based Encryption


Long before pairings were employed in cryptography, Shamir13 introduced
the notion of identity-based encryption and signature schemes. Al-
though Shamir proposed an identity-based signature scheme in this paper,
he could not provide a concrete realization of an identity-based encryption
scheme. Boneh and Franklin’s work (2001) provided the first such realization,
thereby bringing pairing to the forefront of cryptology research.
In a traditional encryption or signature scheme, the authenticity of an
entity’s public key is established by certificates. This requires a certification
authority (CA) to sign every public key (and other identifying information).
When an entity verifies the CA’s signature on a public key (and the associated
identifying information), (s)he gains the confidence in using the key.
An identity-based scheme also requires a trusted authority (TA) responsi-
ble for generating keys for each entity. In view of this, the TA is also often
called the key-generation center (KGC) or the private-key generator (PKG).
As in the case of certificates, every entity needs to meet the TA privately for
obtaining its keys. However, there is no signature-verification process during
the use of a public key. Instead, Alice can herself generate Bob’s (authentic)
public key from the identity of Bob (like Bob’s e-mail address). No involve-
ment of the TA is necessary at this stage. This facility makes identity-based
schemes much more attractive than certification-based schemes.

9.5.1.1 Boneh–Franklin Identity-Based Encryption


The Boneh–Franklin IBE scheme has four stages discussed one by one
below. We consider the simplified case of pairing given by G1 = G2 = G.
Setup: The TA (or PKG or KGC) first identifies suitable groups G, G3 of
prime order r, a generator P of G, and also a pairing map e : G × G → G3 .
The TA chooses a random master secret key s ∈ Z∗r , and computes Ppub = sP .
Two (cryptographic) hash functions H1 : {0, 1}∗ → G and H2 : G3 → {0, 1}^n
for some suitable n are also chosen. The master secret key s is kept secret by
the TA. The parameters r, G, G3 , e, P, Ppub , n, H1 , H2 are publicly disclosed.
Key generation: Suppose that Bob wants to register to the TA. Let the
identity of Bob be given by his e-mail address bob@p.b.cr. The TA first hashes
the identity of Bob, and then multiplies the hash by the master secret s in
13 Adi Shamir, Identity based cryptosystems and signature schemes, Crypto'84, 47–53, 1985.

order to generate Bob’s decryption key:


PBob = H1 (bob@p.b.cr),
DBob = sPBob .
Here, PBob is the hashed identity of Bob, computable by anybody who knows
Bob’s e-mail address. The decryption key DBob is handed over to Bob (by the
TA) securely, and is to be kept secret.
Encryption: In order to send an n-bit message M to Bob, Alice performs
the following steps.
1. Alice computes Bob’s hashed identity PBob = H1 (bob@p.b.cr) ∈ G.
2. Alice computes g = e(PBob , Ppub ) ∈ G3 .
3. Alice chooses a random element a ∈ Z∗r .
4. Alice computes the ciphertext C = (aP, M ⊕ H2 (g^a)) ∈ G × {0, 1}^n.
Here, H2 (g^a) is used as a mask to hide the message M .
Decryption: Bob decrypts a ciphertext C = (U, V ) ∈ G × {0, 1}^n as
M = V ⊕ H2 (e(DBob , U )).
This process involves use of Bob’s private key DBob .
Let us first establish that this decryption procedure works, that is, we start
with U = aP and V = M ⊕ H2 (g^a). It suffices to show g^a = e(DBob , aP ).
By bilinearity, it follows that e(DBob , aP ) = e(sPBob , aP ) = e(PBob , P )^sa =
e(PBob , sP )^a = e(PBob , Ppub )^a = g^a.
Let us now scrutinize the security of this scheme. Let the discrete logarithm
of PBob to the base P be b. This is unknown to all the parties, but that does
not matter. From public (or intercepted) information, an eavesdropper, Eve,
knows P (a system parameter), U = aP (part of the ciphertext), PBob = bP
(computable by anybody), and Ppub = sP (the public key of the TA). The
mask (before hashing) is g^a = e(PBob , Ppub )^a = e(bP, sP )^a = e(P, P )^abs (by
bilinearity). Eve's ability to decrypt only from public knowledge is equivalent
to computing the mask H2 (e(P, P )^abs) (under cryptographic assumptions
about the hash function H2 ). That is, Eve needs to solve the computational bi-
linear Diffie–Hellman problem (BDHP). Under the assumption that the BDHP
is infeasible for the chosen parameters, the Boneh–Franklin scheme is secure.
Alice knows the session secret a (in addition to P, aP, bP, sP ), so she can
compute the mask e(P, P )^abs = e(bP, sP )^a. Bob knows neither of a, b, s, but
can still compute the mask as e(P, P )^abs = e(s(bP ), aP ) using his knowledge
of DBob = sbP = sPBob .
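The four stages above can be sketched end to end in Python. The pairing here is the same insecure toy stand-in as before (G = Z_r with P = 1, e(x, y) = g^xy mod p), and H1, H2 and every parameter are my illustrative choices, not the curve and hash functions of Example 9.12; only the protocol structure is faithful.

```python
import hashlib

# All four stages of the Boneh-Franklin scheme on a toy pairing G = Z_r
# (P = 1), e(x, y) = g^(x*y) mod p.  The pairing, H1, H2 and all parameters
# are illustrative stand-ins; the map is bilinear but trivially insecure.
p, r, g = 23, 11, 2
P = 1
nbits = 32                             # message length n

def e(x, y):                           # bilinear: e(ax, by) = e(x, y)^(ab)
    return pow(g, (x * y) % r, p)

def H1(identity):                      # hash identities into {1, ..., r-1}
    h = hashlib.sha256(identity.encode()).digest()
    return 1 + int.from_bytes(h, 'big') % (r - 1)

def H2(u):                             # mask derivation G3 -> {0,1}^nbits
    h = hashlib.sha256(str(u).encode()).digest()
    return int.from_bytes(h[:nbits // 8], 'big')

# Setup: master secret s, public key Ppub = sP
s = 7
Ppub = s * P % r

# Key generation for Bob
P_bob = H1('bob@p.b.cr')
D_bob = s * P_bob % r

# Encryption of an nbits-bit message M with session secret a
M = 0b10100110000111110000110010100101
a = 5
U = a * P % r
mask = pow(e(P_bob, Ppub), a, p)       # g^(a*b*s), b being P_bob's discrete log
C = (U, M ^ H2(mask))

# Decryption: e(D_bob, U) = g^(s*b*a) reproduces the same mask
U, V = C
assert V ^ H2(e(D_bob, U)) == M
```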

Example 9.12 In all the examples of pairing-based protocols, I use the
supersingular curve E : y² = x³ + x defined over the prime field Fp , p = 744283.
We have |Ep | = p + 1 = 4r, where r = 186071 is a prime. The point
P = (6, 106202) of order r generates the subgroup G = G1 of Ep .

group for defining a bilinear map, we take Fp2 = Fp (θ) with θ² + 1 = 0. An


element of Fp2 is represented as aθ + b with a, b ∈ Fp . These elements follow
the same arithmetic as do the complex numbers in the standard representa-
tion. The group Ep2 contains the subgroup G2 of order r, generated by the
point Q = (−6, 106202θ) = (744277, 106202θ). Since P and Q are linearly
independent, the Weil pairing er : G1 × G2 → G3 is a non-degenerate bilinear
map, where G3 is the multiplicative group of r-th roots of unity in F∗p2 .
We often use a simplified pairing e : G×G → G3 . Since E is supersingular,
it admits a distortion map ϕ : G1 → G2 given by ϕ(a, b) = (−a, bθ). (In
fact, ϕ(P ) = Q.) In this case, we take the distorted Weil pairing e(P1 , P2 ) =
er (P1 , ϕ(P2 )) as the relevant bilinear map e (with both P1 , P2 ∈ G1 = G).
In the setup phase of the Boneh–Franklin scheme, the TA fixes these pa-
rameters r, G, G3 , e, P . Suppose that the TA chooses the master secret key
s = 314095, for which the public key is Ppub = sP = (246588, 425427). The
TA should also specify n, H1 and H2 . A hash function from {0, 1}∗ to the
group G requires a somewhat involved construction. I will not go into the details
of such a construction, but assume that the hashed public identities of enti-
ties (like PBob ) are provided to us. I, however, make a concrete proposal for
H2 : G3 → {0, 1}n . Each element of G3 is of the form aθ + b with a, b ∈ Fp .
Since p is a 20-bit prime, we will concatenate 20-bit representations of a and
b, and call it H2 (aθ + b). This concatenated value equals 2^20·a + b, so n = 40.
This is certainly not a good hash function at all, but is supplied here just as
a placeholder to demonstrate the working of the Boneh–Franklin scheme.
In the registration phase, Bob’s hashed public identity is obtained as
PBob = (267934, 76182) (we assume that this is computable by anybody know-
ing Bob’s identity). Bob’s secret key is then DBob = sPBob = (505855, 273372).
Suppose that Alice wants to encrypt the 40-bit message M = 2^39 + 2^29 +
2^19 + 2^9 = 550293209600 whose bit-wise representation is

M = 1000000000100000000010000000001000000000.
Alice computes Bob’s hashed public identity as PBob = (267934, 76182) for
which g = e(PBob , Ppub ) = 239214θ + 737818. Now, Alice chooses the random
value a = 60294, and obtains
U = aP = (58577, 21875)
and g^a = 609914θ + 551077. Applying H2 to g^a gives
H2 (g^a) = 609914 × 2^20 + 551077 = 639541733541
= 1001010011100111101010000110100010100101,
where the last expression for H2 (g^a) is its 40-bit binary representation. Using
bit-wise XOR with M gives the second part of the ciphertext as
V = M ⊕ H2 (g^a) = 0001010011000111101000000110101010100101.
Let us now see how the ciphertext (U, V ) is decrypted by Bob. Bob first
computes e(DBob , U ) = 609914θ + 551077. This is the same as g^a computed by

Alice. Therefore, this quantity after hashing by H2 and bit-wise xor-ing with
V recovers the message M that Alice encrypted.

An eavesdropper Eve intercepts (U, V ), and uses a wrong key D′Bob = (215899, 48408) ≠
DBob for decryption. This gives e(D′Bob , U ) = 291901θ + 498758. Application
of H2 to this gives the bit string 0100011101000011110101111001110001000110.
When this is XOR-ed with V , Eve recovers the message
M ′ = 0101001110000100011101111111011011100011 ≠ M . ¤

9.5.2 Key Agreement Based on Pairing


The two pairing-based key-agreement protocols I am going to discuss now
are based on the BDHP. Indeed, the Boneh–Franklin encryption scheme is a
direct adaptation of these key-agreement protocols. This is quite similar to
the fact that the ElGamal encryption scheme is a direct adaptation of the
Diffie–Hellman key-agreement protocol.

9.5.2.1 Sakai–Ohgishi–Kasahara Two-Party Key Agreement


As in the Boneh–Franklin scheme, the TA generates public parameters: a
prime r, groups G, G3 of order r, a pairing map e : G × G → G3 , a generator
P of G, TA’s public key Ppub (= sP , where s is the master secret key), and
a hash function H1 : {0, 1}∗ → G for hashing identities of parties. The bit
length n and the second hash function H2 used by the Boneh–Franklin scheme
are not needed in the Sakai–Ohgishi–Kasahara (SOK) protocol.
The TA hashes an individual party’s identity, and then gives the product of
s with this hashed identity to that party. For instance, Alice (with e-mail ad-
dress alice@p.b.cr) has the public hashed identity PAlice = H1 (alice@p.b.cr),
and receives the private key DAlice = sPAlice from the TA.
Suppose that two registered parties Alice and Bob plan to establish a
shared secret. Alice computes Bob’s hashed identity PBob = H1 (bob@p.b.cr),
and generates the secret SAlice = e(DAlice , PBob ). Likewise, Bob com-
putes Alice’s hashed identity PAlice = H1 (alice@p.b.cr) and subsequently
the secret SBob = e(PAlice , DBob ). We have SAlice = e(DAlice , PBob ) =
e(sPAlice , PBob ) = e(PAlice , PBob )^s = e(PAlice , sPBob ) = e(PAlice , DBob ) =
SBob . This key-agreement protocol is non-interactive in the sense that no
message transmission takes place between the communicating parties.
In order to see how this protocol is related to the BDHP, let a and b be the
(unknown) discrete logarithms of the hashed identities PAlice and PBob to the
base P . All parties can compute aP, bP . Moreover, P and sP = Ppub are public
knowledge. The shared secret between Alice and Bob is e(DAlice , PBob ) =
e(PAlice , PBob )^s = e(aP, bP )^s = e(P, P )^abs. This means that a derivation of
this shared secret from the knowledge of P, aP, bP, sP only is equivalent to
solving an instance of the BDHP. On the other hand, Alice knows DAlice =
sPAlice = saP , so she can compute e(P, P )^abs = e(saP, bP ), whereas Bob
knows DBob = sbP , so he too can compute e(P, P )^abs = e(aP, sbP ).
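The non-interactive character of the SOK protocol is easy to see in code. The sketch below uses the same insecure toy pairing as earlier (G = Z_r, P = 1, e(x, y) = g^xy mod p); the identities, H1 and all parameters are illustrative, not those of Example 9.13.

```python
import hashlib

# The SOK key agreement on an insecure toy pairing (G = Z_r, P = 1,
# e(x, y) = g^(x*y) mod p); identities and parameters are illustrative.
p, r, g = 23, 11, 2

def e(x, y):
    return pow(g, (x * y) % r, p)

def H1(identity):                      # hash identities into {1, ..., r-1}
    h = hashlib.sha256(identity.encode()).digest()
    return 1 + int.from_bytes(h, 'big') % (r - 1)

s = 7                                              # TA's master secret
P_alice, P_bob = H1('alice@p.b.cr'), H1('bob@p.b.cr')
D_alice, D_bob = s * P_alice % r, s * P_bob % r    # private keys from the TA

# No messages are exchanged: each party pairs its own private key with the
# other's hashed identity.
S_alice = e(D_alice, P_bob)
S_bob = e(P_alice, D_bob)
assert S_alice == S_bob == pow(e(P_alice, P_bob), s, p)
```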

Example 9.13 The TA uses the elliptic curve and the distorted Weil pairing
e : G × G → G3 as in Example 9.12. Let the master secret key be s = 592103,
for which the TA’s public key is Ppub = sP = (199703, 717555).
Suppose that Alice’s hashed public identity is PAlice = (523280, 234529), so
Alice’s secret key is DAlice = (360234, 27008). Likewise, if Bob’s hashed iden-
tity is PBob = (267934, 76182), Bob’s secret key is DBob = (621010, 360227).
In the key-agreement phase, Alice and Bob compute each other’s hashed
identity. Subsequently, Alice computes e(DAlice , PBob ) = 238010θ + 137679,
and Bob computes e(PAlice , DBob ) = 238010θ + 137679. Thus, the secret
shared by Alice and Bob is the group element 238010θ + 137679. Since
the distorted Weil pairing is symmetric in its two arguments, we have
e(DAlice , PBob ) = e(PBob , DAlice ) and e(PAlice , DBob ) = e(DBob , PAlice ), so it
is not necessary to decide which party’s keys go in the first argument. ¤

9.5.2.2 Joux Three-Party Key Agreement


Joux’s three-party protocol is conceptually similar to the SOK protocol
with a few differences. First, there is no role played by the TA. The three com-
municating parties decide upon the parameters r, G, G3 , e, P publicly without
involving any trusted party. Joux’s protocol is not identity-based, that is, the
master secret key and the private keys of entities are absent from the protocol.
Finally, the protocol is interactive. The communicating parties need to carry
out one round of message broadcasting before they establish a common secret.
Alice, Bob and Carol individually and secretly generate random elements
a, b, c ∈ Z∗r , respectively. Alice broadcasts aP to Bob and Carol, Bob broad-
casts bP to Alice and Carol, and Carol broadcasts cP to Alice and Bob. After
this transmission, the parties proceed as follows.
1. Alice computes e(bP, cP )^a = e(P, P )^abc,
2. Bob computes e(aP, cP )^b = e(P, P )^abc, and
3. Carol computes e(aP, bP )^c = e(P, P )^abc.
So the shared secret is e(P, P )^abc. A passive eavesdropper Eve, listening to
the communication, gains the knowledge of P, aP, bP, cP only. Her ability to
compute e(P, P )^abc amounts to solving an instance of the BDHP.
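The one-round structure can be sketched as follows, again on the insecure toy pairing (G = Z_r with P = 1, e(x, y) = g^xy mod p); the secrets a, b, c are arbitrary illustrative values, not those of Example 9.14.

```python
# Joux's one-round three-party key agreement on an insecure toy pairing
# (G = Z_r with P = 1, e(x, y) = g^(x*y) mod p); a, b, c are arbitrary.
p, r, g = 23, 11, 2
P = 1

def e(x, y):
    return pow(g, (x * y) % r, p)

a, b, c = 4, 9, 6                      # secrets of Alice, Bob and Carol
aP, bP, cP = a * P % r, b * P % r, c * P % r   # the broadcast values

# After one round of broadcasts, each party raises a pairing of the other
# two broadcast values to its own secret exponent.
K_alice = pow(e(bP, cP), a, p)
K_bob = pow(e(aP, cP), b, p)
K_carol = pow(e(aP, bP), c, p)
assert K_alice == K_bob == K_carol     # the shared secret e(P, P)^(abc)
```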
Example 9.14 Alice, Bob and Carol plan to use the elliptic curve, the point
P , and the distorted Weil pairing e : G × G → G3 as in Example 9.12. The
computations done by Alice, Bob and Carol are summarized below.
Entity  x           xP                     Shared value
Alice   a = 328764  aP = (676324, 250820)  e(bP, cP )^a = 140130θ + 718087
Bob     b = 76532   bP = (182560, 387188)  e(aP, cP )^b = 140130θ + 718087
Carol   c = 127654  cP = (377194, 304569)  e(aP, bP )^c = 140130θ + 718087
Alice broadcasts aP after she computes this point in G. Likewise, Bob broad-
casts bP , and Carol broadcasts cP . At the end of the protocol, the secret
shared by the three parties is the element 140130θ + 718087 ∈ Fp2 . ¤

9.5.3 Identity-Based Signature


An identity-based signature scheme is similar to an identity-based encryp-
tion scheme in the sense that a verifier derives the signer’s public key directly
from the public identity of the signer. The authenticity of this public key is
not to be established by means of certificates.

9.5.3.1 Shamir Scheme


Adi Shamir happens to be the first to propose a concrete realization of an
identity-based signature scheme (Footnote 13). Shamir’s scheme is not based
on pairing but, being the pioneering proposal, is worth mentioning in this context.
Setup: The TA generates two large primes p, q, and computes n = pq and
φ(n) = (p − 1)(q − 1). In addition, an integer e ∈ {2, 3, . . . , φ(n) − 2}, coprime
to φ(n), is chosen by the TA. Although Shamir remarks that e should also
be a large prime, it is not clear whether this is a real necessity. Any e > 2
(prime or not) appears to make the scheme sufficiently secure. The integer
d ≡ e^−1 (mod φ(n)) is also computed by the TA.
The TA publishes (n, e) as public parameters to be used by all the entities
across the network. Furthermore, a public hash function H : {0, 1}∗ → Zn is
fixed a priori. The factorization of n (knowledge of p or q or d or φ(n)) is kept
secret. The TA needs to use this secret knowledge to generate keys.
Key generation: For registering Bob, the TA generates the hashed iden-
tity IBob = H(bob@s.i.b.cr) (Bob is now not in the domain of pairing-based
cryptography!). Bob's secret key DBob ≡ IBob^d (mod n) is secretly computed
and securely handed over to Bob by the TA. Note that IBob ≡ DBob^e (mod n).
Signing: In order to sign a message m ∈ Zn , Bob first generates a random
non-zero x ∈ Zn , and computes
s ≡ x^e (mod n).
Subsequently, Bob uses his private key to generate
t ≡ DBob × x^{H(s,m)} (mod n).
Bob’s signature on m is the pair (s, t).
Verification: Raising the last congruence (for t) to the e-th power gives
t^e ≡ DBob^e × (x^e)^{H(s,m)} ≡ IBob × s^{H(s,m)} (mod n).
That is, the verifier checks whether the congruence t^e ≡ IBob × s^{H(s,m)} (mod n)
holds. Here, Bob's public key IBob is obtained by hashing his identity.
A forger can start with a random non-zero x of her choice, and compute
x^e and x^{H(s,m)} modulo n. But then, the forger's ability to generate the
correct t is equivalent to her knowledge of DBob. Obtaining DBob from IBob
amounts to RSA decryption without knowing the TA's decryption exponent d.
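The four phases above can be exercised end to end with toy parameters. The following Python sketch is only an illustration: the primes are far too small for security, and H is a placeholder hash of my own, not a secure one.

```python
# Minimal sketch of Shamir's identity-based signature scheme
# (toy primes, toy hash; for illustration only).
from math import gcd

# --- TA setup ---
p, q = 1009, 1013
n, phi = p * q, (p - 1) * (q - 1)
e = 5
assert gcd(e, phi) == 1
d = pow(e, -1, phi)               # TA's secret exponent (Python 3.8+)

def H(*args):                     # placeholder hash into Z_n (not secure)
    acc = 0
    for x in args:
        acc = (acc * 31 + x) % n
    return acc

# --- key generation for Bob ---
I_bob = H(98, 111, 98)            # hashed identity (placeholder encoding)
D_bob = pow(I_bob, d, n)          # secret key, computed by the TA
assert pow(D_bob, e, n) == I_bob % n

# --- signing m ---
m, x = 123456, 54321              # message and random session value
s = pow(x, e, n)
t = (D_bob * pow(x, H(s, m), n)) % n

# --- verification: t^e == I_bob * s^H(s,m) (mod n) ---
assert pow(t, e, n) == (I_bob * pow(s, H(s, m), n)) % n
assert pow((t + 1) % n, e, n) != (I_bob * pow(s, H(s, m), n)) % n  # tampering fails
```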
Public-Key Cryptography 455

Example 9.15 The TA chooses the primes p = 142871 and q = 289031, and
computes n = pq = 41294148001 and φ(n) = (p−1)(q−1) = 41293716100. The
prime e = 103319 (not a factor of φ(n)) is chosen, and its inverse d ≡ e^{-1} ≡
35665134679 (mod φ(n)) is computed. The values of e and n are published.
In 7-bit ASCII encoding, the string Bob evaluates to IBob = 2^14 × 66 + 2^7 ×
111 + 98 = 1095650. Let us take this as Bob's public identity. His private key
is generated by the TA as DBob ≡ IBob^d ≡ 32533552181 (mod n).
Bob wants to sign the message m = 1627384950 ∈ Zn. He first chooses
x = 32465921980, and computes s ≡ x^e ≡ 30699940025 (mod n). The hash of
s and m is computed as H(s, m) ≡ sm ≡ 22566668067 (mod n) (this is not a
good hash function, but is used only for illustration). Finally, the second part
of the signature is computed as t ≡ DBob × x^{H(s,m)} ≡ 7434728537 (mod n).
For verifying the signature (s, t), one computes t^e ≡ 22911772376 (mod n)
and IBob × s^{H(s,m)} ≡ 22911772376 (mod n) (IBob is derived from the string
Bob as the TA did). These two quantities are equal, so the signature is verified.
Let (s, t′) be a forged signature, where s = 30699940025 as before (for
the choice x = 32465921980), but t′ = 21352176809 ≠ t. The quantity IBob ×
s^{H(s,m)} ≡ 22911772376 (mod n) remains the same as in the genuine signature,
but t′^e ≡ 9116775652 (mod n) changes, thereby invalidating the signature. □

9.5.3.2 Paterson Scheme


Shortly after the first identity-based signature scheme using pairing was
proposed by Sakai, Ohgishi and Kasahara in 2000, many other similar schemes
appeared in the literature. These schemes differ somewhat with respect to
formal security guarantees and/or efficiency. For pedagogical reasons (close
resemblance to the ElGamal signature scheme), I pick the proposal of Paterson,14
although this scheme is neither the most secure nor the most efficient among
the lot. The four phases of this scheme are now described.
Setup: The TA generates the parameters: a prime r, suitable groups G, G3
of order r, a bilinear map e : G × G → G3 , a generator P of G, a master secret
key s, and the point Ppub = sP ∈ G. Three hash functions H1 : {0, 1}∗ → G,
H2 : {0, 1}∗ → Zr and H3 : G → Zr are also predetermined. The TA publishes
r, G, G3 , e, P, Ppub , H1 , H2 , H3 . The key s is kept secret.
Key generation: For registering Bob, the TA derives the hashed identity
PBob = H1 (bob@p.b.cr), and gives Bob the private key DBob = sPBob securely.
Signing: Let M be the message to be signed by Bob, and m = H2(M).
Bob generates a random d′ ∈ Zr, and computes the signature (S, T) as follows.
Notice that signature generation involves no pairing computation.
S = d′P, and
T = d′^{-1}(mP − H3(S)DBob).
14 Kenny G. Paterson, ID-based signatures from pairings on elliptic curves, Electronics

Letters, 38(18), 1025–1026, 2002.



Verification: Bob's signature (S, T) on M is verified if and only if the
following condition is satisfied (where m = H2(M)):
e(P, P)^m = e(S, T) e(Ppub, PBob)^{H3(S)}.
If (S, T) is a valid signature of Bob on M (with m = H2(M)), we have
mP = d′T + H3(S)DBob = d′T + H3(S)sPBob. By bilinearity, it follows that
e(P, P)^m = e(P, mP) = e(P, d′T + H3(S)sPBob)
          = e(P, d′T) e(P, H3(S)sPBob) = e(d′P, T) e(sP, PBob)^{H3(S)}
          = e(S, T) e(Ppub, PBob)^{H3(S)}.
This establishes the correctness of the verification procedure.
The security of this scheme resembles that of the ElGamal signature
scheme. Signatures in the Paterson scheme are generated much in the same
way as in the ElGamal scheme, and this process involves working in G only.
Verification proceeds in G3 after the application of the bilinear map e.
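Implementing the Weil pairing is beyond a short snippet, but the verification identity can be checked with a stand-in bilinear map. In the sketch below (a toy of my own, not the book's pairing), G is replaced by (Z_r, +) with P = 1, and e(x, y) = g^xy mod q; this map is bilinear but insecure, so the code only demonstrates the algebra behind e(P, P)^m = e(S, T) e(Ppub, PBob)^{H3(S)}.

```python
# Checking Paterson's verification identity with a stand-in bilinear map:
# G = (Z_r, +) with generator P = 1, and e(x, y) = g^(x*y) mod q.
# Insecure (discrete logs are trivial), but the bilinearity algebra is real.
r, q = 101, 607                   # primes with r | q - 1
g = next(pow(a, (q - 1) // r, q) for a in range(2, q)
         if pow(a, (q - 1) // r, q) != 1)

def e(x, y):
    return pow(g, (x * y) % r, q)

P = 1
s = 37                            # TA's master secret
P_pub = s * P % r
P_bob = 64                        # Bob's hashed identity (a "point" in G)
D_bob = s * P_bob % r             # Bob's private key

m = 55                            # m = H2(M), fixed for the demo
d1 = 87                           # session secret d'
S = d1 * P % r
h3 = (S * 7) % r                  # placeholder for H3(S)
T = (pow(d1, -1, r) * ((m * P - h3 * D_bob) % r)) % r

lhs = pow(e(P, P), m, q)                            # e(P, P)^m
rhs = (e(S, T) * pow(e(P_pub, P_bob), h3, q)) % q   # e(S,T) e(Ppub,PBob)^H3(S)
assert lhs == rhs
```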

Example 9.16 Let us continue to use the supersingular curve E, the gen-
erator P of the group G, and the distorted Weil pairing e : G × G → G3
as in Example 9.12. Suppose that the TA's master secret key is s = 219430,
for which the public key is Ppub = sP = (138113, 152726). Suppose also that
Bob's hashed identity is PBob = (267934, 76182), which corresponds to the
private key DBob = sPBob = (334919, 466375).
Bob wants to sign the message M which hashes to m = 12345 ∈ Zr. Bob
chooses the session secret d′ = 87413, and gets S = d′P = (513155, 447898).
For computing the second part T of the signature, Bob needs to use a hash
function H3 : G → Zr. Let us take H3(a, b) = ab (mod r). This is again not a
good hash function but is used here as a placeholder. For this H3, Bob gets
H3(S) = 553526, so T = d′^{-1}(mP − H3(S)DBob) = (487883, 187017).
For signature verification, Alice computes Bob's hashed public identity
PBob = (267934, 76182). Then, Alice computes W1 = e(P, P)^m = 45968θ +
325199, W2 = e(S, T) = 139295θ + 53887, and W3 = e(Ppub, PBob)^{H3(S)} =
61033θ + 645472. Since W1 = W2W3, the signature is verified.
Now, let us see how a forged signature is not verified. A forger can generate
S = d′P = (513155, 447898) for the choice d′ = 87413, as Bob did. However,
since DBob is unknown to the forger, she uses a random T′ = (446698, 456705),
and claims that (S, T′) is the signature of Bob on the message M (or m). For
verifying this forged signature, one computes W1 = e(P, P)^m = 45968θ +
325199 and W3 = e(Ppub, PBob)^{H3(S)} = 61033θ + 645472 as in the case of the
genuine signature. However, we now have W2′ = e(S, T′) = 638462θ + 253684 ≠
W2. This gives W2′W3 = 367570θ + 366935 ≠ W1. □

An advantage of identity-based schemes using pairing over Shamir’s


scheme is a reduction in signature sizes. For standard security, n in Shamir’s
scheme should be a 1024-bit composite integer, and two elements modulo
n (a total of about 2048 bits) constitute the signature. On the other hand,

Paterson’s scheme generates two elliptic-curve points as the signature. With


suitable choices of supersingular curves (like those of embedding degree six),
the total size of a Paterson signature can be smaller than 2048 bits.

9.5.4 Boneh–Lynn–Shacham (BLS) Short Signature Scheme


As an application of pairing in cryptography, I finally present the short
signature scheme proposed by Boneh et al.15 This is a conventional signature
scheme (that is, not identity-based). However, its attractiveness stems from
the fact that compared to other such schemes (like DSA or ECDSA), the BLS
scheme can produce significantly shorter signatures at the same security level.
In order to ensure both short signatures and security comparable to
DSA-like schemes, it is preferable to work with a general pairing
function e : G1 × G2 → G3 with G1 ≠ G2. Supersingular curves supporting
G1 = G2 may fail to give short signatures at the same security level as DSA.
Three groups G1 , G2 , G3 of prime order r, and a bilinear map e : G1 ×G2 →
G3 are chosen as parameters, along with a generator Q of G2 . Bob selects a
random non-zero d ∈ Zr as his private key. His public key is Y = dQ ∈ G2 .
In order to sign a message M , Bob computes a short representative R =
H(M ) ∈ G1 . Bob generates the signature as σ = dR, that is, the signature now
consists of only one element of G1 . With a suitable choice of the underlying
elliptic curve on which the pairing e is defined, this may lead to signatures
as small as about 160 bits (only the x-coordinate of σ and a disambiguating
bit to identify its correct y-coordinate suffice). On the contrary, a DSA or
ECDSA signature requires about 320 bits.
It is interesting to note that the BLS scheme does not require a session
secret (unlike the other conventional schemes described earlier). We now have
two groups G1, G2. The elements Q and Y = dQ in G2 are related in the same
way as are R and σ = dR in G1. This implies that the verification of σ on M
is the same as checking whether the equality e(σ, Q) = e(R, Y) holds (both
sides are equal to e(R, Q)^d), where R is computed as H(M).
A forger, on the other hand, needs to solve an instance of Co-CDHP in
order to generate a valid signature. Signature verification is easy, since the Co-
DDHP is easy for G1 , G2 , whereas forging is difficult, since the Co-CDHP is
difficult. It follows that any pair of gap Diffie–Hellman (GDH) groups G1 , G2
forms a perfect setting for the BLS scheme.
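The verification equality e(σ, Q) = e(R, Y) can likewise be checked with a toy bilinear map of my own. The sketch below uses G1 = G2 = (Z_r, +) and e(x, y) = g^xy mod q — insecure, and contrary to the G1 ≠ G2 recommendation above — purely to show the algebra.

```python
# BLS verification e(sigma, Q) == e(R, Y) with a stand-in bilinear map.
# Here G1 = G2 = (Z_r, +); e(x, y) = g^(x*y) mod q is bilinear but insecure.
r, q = 101, 607
g = next(pow(a, (q - 1) // r, q) for a in range(2, q)
         if pow(a, (q - 1) // r, q) != 1)

def e(x, y):
    return pow(g, (x * y) % r, q)

Q = 1                             # generator of G2
d = 73                            # Bob's private key
Y = d * Q % r                     # public key

R = 42                            # R = H(M), hashed message in G1
sigma = d * R % r                 # the signature: a single group element

assert e(sigma, Q) == e(R, Y)     # both sides equal e(R, Q)^d
assert e((sigma + 1) % r, Q) != e(R, Y)   # a forged signature fails
```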

Example 9.17 We use the supersingular elliptic curve E of Example 9.12.


For BLS signatures, we consider the original Weil pairing
e = er : G1 × G2 → G3 ,
where G1 is the subgroup of Ep of order r generated by the point P =
(6, 106202), and G2 is the subgroup of Ep2 of order r generated by the point
15 Dan Boneh, Ben Lynn and Hovav Shacham, Short signatures from the Weil pairing,

Journal of Cryptology, 17, 297–319, 2004.



Q = ϕ(P ) = (−6, 106202θ) = (744277, 106202θ). Let Bob’s private key be


d = 73302. This gives his public key as Y = dQ = (40987, 640815θ).
Let R = (128039, 742463) be the hashed message that Bob wants to sign.
Bob only computes σ = dR = (626836, 439558).
In order to verify Bob’s signature σ on R, one computes W1 = e(σ, Q) =
521603θ + 230328, and W2 = e(R, Y ) = 521603θ + 230328. Since W1 = W2 ,
the signature is verified.
Let σ′ = (221920, 287578) be a forged signature on the same R. Now,
we get W1′ = e(σ′, Q) = 226963θ + 361018 ≠ W1, whereas W2 = e(R, Y) =
521603θ + 230328 remains the same as for the genuine signature. Since W1′ ≠
W2, the forged signature is not verified. □

Exercises
1. Let H be a hash function.
(a) Prove that if H is collision resistant, then H is second preimage resistant.
(b) Give an example to corroborate that H may be second preimage resistant,
but not collision resistant.
(c) Corroborate by an example that H may be first preimage resistant, but
not second preimage resistant.
(d) Corroborate by an example that H may be second preimage resistant,
but not first preimage resistant.
2. Let M be the message to be signed by a digital signature scheme. Using a
hash function H, one obtains a representative m = H(M ), and the signature
is computed as a function of m and the signer’s private key. Describe how the
three desirable properties of H are required for securing the signature scheme.
3. Prove that for an n-bit hash function H, collisions can be found with high
probability after making about 2^{n/2} evaluations of H on random input strings.
4. For the RSA encryption scheme, different entities are required to use different
primes p, q. Argue why. Given that the RSA modulus is of length t bits (like
t = 1024), estimate the probability that two entities in a network of N entities
accidentally use a common prime p or q. You may assume that each entity
chooses p and q independently and randomly from the set of t/2-bit primes.
5. Let, in the RSA encryption scheme, the ciphertexts corresponding to messages
m1 , m2 ∈ Zn be c1 , c2 for the same recipient (Bob). Argue that the ciphertext
corresponding to m1 m2 (mod n) is c1 c2 (mod n). What problem does this
relation create? How can you remedy this problem?
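The multiplicative relation in this exercise is easy to observe with toy parameters; the sketch below demonstrates it (the usual remedy is randomized padding of the plaintext before exponentiation).

```python
# The multiplicative property of textbook RSA: Enc(m1) * Enc(m2) is a
# valid encryption of m1*m2 (mod n). Toy parameters, for illustration only.
p, q, e = 1009, 1013, 5
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))      # decryption exponent (Python 3.8+)

m1, m2 = 4242, 9999
c1, c2 = pow(m1, e, n), pow(m2, e, n)

c = (c1 * c2) % n                      # Eve multiplies two ciphertexts...
assert pow(c, d, n) == (m1 * m2) % n   # ...and obtains an encryption of m1*m2
```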
6. Let Alice encrypt, using RSA, the same message for e identities sharing the
same public key e (under pairwise coprime moduli). How can Eve identify the
(common) plaintext message by intercepting the e ciphertext messages? Notice
that this situation is possible in practice, since the RSA encryption exponent
is often chosen as a small prime. How can this problem be remedied?
7. Let m ∈ Zn be a message to be encrypted by RSA. Count how many messages
m satisfy the identity m^e ≡ m (mod n). These are precisely the messages
which do not change after encryption.
8. Prove that the RSA decryption algorithm may fail to work for p = q, even if
one correctly takes φ(p²) = p² − p. (If p = q, factoring n = p² is trivial, and
the RSA scheme forfeits its security. Worse still, it does not work at all.)
9. To speed up RSA decryption, let Bob store the primes p, q (in addition to d),
compute c^d modulo both p and q, and combine these two residues by the CRT.
Complete the details of this decryption procedure. What speedup is produced
by this modified decryption procedure (over directly computing c^d (mod n))?
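As a hint of the intended construction, here is a sketch of CRT-based decryption with toy parameters. With schoolbook arithmetic the speedup over computing c^d (mod n) directly is roughly fourfold, since each exponentiation uses a half-length exponent on a half-length modulus.

```python
# CRT-based RSA decryption (the construction asked about in Exercise 9),
# with toy parameters.
p, q, e = 1009, 1013, 5
n, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)
dp, dq = d % (p - 1), d % (q - 1)      # reduced exponents (Fermat)
q_inv = pow(q, -1, p)                  # precomputed CRT coefficient

def decrypt_crt(c):
    mp = pow(c % p, dp, p)             # c^d mod p
    mq = pow(c % q, dq, q)             # c^d mod q
    h = (q_inv * (mp - mq)) % p        # Garner-style recombination
    return (mq + h * q) % n

m = 123456
c = pow(m, e, n)
assert decrypt_crt(c) == m == pow(c, d, n)
```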

10. Establish why a new session secret d′ is required for every invocation of the
ElGamal encryption algorithm.
11. Suppose that in the Diffie–Hellman key-agreement protocol, the group size |G|
has a small prime divisor. Establish how an active adversary (an adversary
who, in addition to intercepting a message, can modify the message and send
the modified message to the recipient) can learn a shared secret between Alice
and Bob. How can you remedy this attack?
12. Suppose that the group G in the Diffie–Hellman key-agreement protocol is
cyclic of size m, whereas g ∈ G has order n with n|m. Let f = m/n be the
cofactor. Suppose that f has a small prime divisor u, and Bob sends g^{d′}h or
h to Alice (but Alice sends g^d to Bob), where h ∈ G is an element of order u.
Suppose that Alice later uses a symmetric cipher to encrypt some message for
Bob using the (shared) secret key computed by Alice. Explain how Bob can
easily obtain d modulo u upon receiving the ciphertext. Explain that using
g^{fdd′} as the shared secret, this problem can be remedied.
13. Let G be a cyclic multiplicative group (like a subgroup of F∗q) with a generator
g. Assume that the DLP is computationally infeasible in G. Suppose that
Alice, Bob and Carol plan to agree upon a common shared secret by the
Burmester–Desmedt protocol16 which works as follows.
1. Alice generates random a, and broadcasts Za = g^a.
2. Bob generates random b, and broadcasts Zb = g^b.
3. Carol generates random c, and broadcasts Zc = g^c.
4. Alice broadcasts Xa = (Zb/Zc)^a.
5. Bob broadcasts Xb = (Zc/Za)^b.
6. Carol broadcasts Xc = (Za/Zb)^c.
7. Alice computes Ka = Zc^{3a} Xa^2 Xb.
8. Bob computes Kb = Za^{3b} Xb^2 Xc.
9. Carol computes Kc = Zb^{3c} Xc^2 Xa.
Prove that Ka = Kb = Kc = g^{ab+bc+ca}.
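The claimed identity is easy to check numerically. The sketch below runs the nine steps in a small subgroup of F_p^* with toy parameters of my choosing.

```python
# Numeric check of the Burmester-Desmedt three-party protocol:
# all three computed keys equal g^(ab+bc+ca). Toy subgroup of F_p^*.
p = 1019                       # small prime, for illustration only
g = 2
a, b, c = 123, 456, 789        # the three parties' random exponents

Za, Zb, Zc = pow(g, a, p), pow(g, b, p), pow(g, c, p)
Xa = (pow(Zb, a, p) * pow(pow(Zc, a, p), -1, p)) % p   # (Zb/Zc)^a
Xb = (pow(Zc, b, p) * pow(pow(Za, b, p), -1, p)) % p   # (Zc/Za)^b
Xc = (pow(Za, c, p) * pow(pow(Zb, c, p), -1, p)) % p   # (Za/Zb)^c

Ka = (pow(Zc, 3 * a, p) * pow(Xa, 2, p) * Xb) % p
Kb = (pow(Za, 3 * b, p) * pow(Xb, 2, p) * Xc) % p
Kc = (pow(Zb, 3 * c, p) * pow(Xc, 2, p) * Xa) % p
assert Ka == Kb == Kc == pow(g, (a*b + b*c + c*a) % (p - 1), p)
```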
14. Assume that Bob uses the same RSA key pair for both encryption and sig-
nature. Suppose also that Alice sends a ciphertext c to Bob, and the corre-
sponding plaintext is m. Finally, assume that Bob is willing to sign a message
in Zn supplied by Eve. Bob only ensures that he does not sign a message (like
c) which has been sent by him as a ciphertext. Describe how Eve can still
arrange a message µ ∈ Zn such that Bob’s signature on µ reveals m to Eve.
15. Show that a situation as described in Exercise 9.5 can happen for RSA signa-
tures too. This is often termed existential forgery of signatures. Explain, in
this context, the role of the hash function H used for computing m = H(M).
16. Describe how existential forgery is possible for ElGamal signatures.
16 Mike Burmester and Yvo Desmedt, A secure and scalable group key exchange system,

Information Processing Letters, 94(3), 137–143, 2005.



17. Explain why a new session key d′ is required during each invocation of the
ElGamal signature-generation procedure.
18. Suppose that for a particular choice of m and d′ in the ElGamal signature
generation procedure, one obtains t = 0. Argue why this situation must be
avoided. (If one gets t = 0, one should choose another random d′, and repeat
the signing procedure until t ≠ 0. For simplicity, this intricacy is not mentioned
in the ElGamal signature generation procedure given in the text.)
19. Show that in the DSA signature generation procedure, it is possible to have
s = 0 or t = 0. Argue why each of these cases must be avoided.
20. Show that for each message m ∈ Zn , there are at least two valid ECDSA
signatures (s, t1 ) and (s, t2 ) of Bob with the same s. Identify a situation where
there are more than two valid ECDSA signatures (s, t) with the same s.
21. In a blind signature scheme, Bob signs a message m without knowing the mes-
sage m itself. Bob is presented a masked version µ of m. For example, Bob
may be a bank, and the original message m may pertain to an electronic coin
belonging to Alice. Since money spending is usually desired to be anonymous,
Bob should not be able to identify Alice’s identity from the coin. However,
Bob’s active involvement (signature on µ) is necessary to generate his sig-
nature on the original message m. Assume that the RSA signature scheme is
used. Describe a method of masking m to generate µ such that Bob’s signature
on m can be easily recovered from his signature on µ.17
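One standard masking, from Chaum's paper cited in the footnote, multiplies m by r^e for a random r coprime to n; the sketch below demonstrates it with toy parameters.

```python
# Chaum's RSA blinding (the construction Exercise 21 asks for): Alice
# masks m with a random r, Bob signs the mask, Alice unblinds.
from math import gcd

p, q, e = 1009, 1013, 5
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))      # Bob's signing key

m = 246813                             # Alice's message (the "coin")
r = 7777                               # Alice's random blinding factor
assert gcd(r, n) == 1
mu = (pow(r, e, n) * m) % n            # blinded message sent to Bob

s_mu = pow(mu, d, n)                   # Bob signs mu, learning nothing of m
s = (s_mu * pow(r, -1, n)) % n         # Alice divides out r

assert s == pow(m, d, n)               # a valid RSA signature on m itself
```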
22. A batch-verification algorithm for signatures s1 , s2 , . . . , sk on k messages (or
message representatives) m1 , m2 , . . . , mk returns “signature verified” if each
si is a valid signature on mi for i = 1, 2, . . . , k. If one or more si is/are not
valid signature(s) on the corresponding message(s) mi , the algorithm should,
in general, return “signature not verified.” A batch-verification algorithm is
useful when its running time on a batch of k signatures is significantly smaller
than the total time needed for k individual verifications.
(a) Suppose that s1 , s2 , . . . , sk are RSA signatures of the same entity on mes-
sages m1 , m2 , . . . , mk . Describe a batch-verification procedure for these k sig-
natures. Establish the speedup produced by your batch-verification algorithm.
Also explain how the algorithm declares a batch of signatures as verified even
when one or more signatures are not individually verifiable.
(b) Repeat Part (a) on k DSA signatures (s1 , t1 ), (s2 , t2 ), . . . , (sk , tk ) from the
same entity on messages m1 , m2 , . . . , mk .
(c) Repeat Part (b) when the k signatures come from k different entities.
(d) What is the problem in adapting the algorithm of Part (b) or (c) to the
batch verification of ECDSA signatures?
23. Describe how a zero-knowledge authentication scheme can be converted to a
signature scheme.
24. Let n = pq be a product of two primes p, q each congruent to 3 modulo 4.
17 David Chaum, Blind signatures for untraceable payments, Crypto, 199–202, 1982.

(a) Prove that every quadratic residue modulo this n has four square roots,
of which exactly one is again a quadratic residue modulo n. Argue that from
the knowledge of p and q, this unique square root can be easily determined.
(b) Suppose that Alice wants to prove her knowledge of the factorization of n
to Bob using the following authentication protocol. Bob generates a random
x ∈ Zn, and sends the challenge c ≡ x^4 (mod n) to Alice. Alice computes the
unique square root r of c which is a square in Zn. Alice sends r to Bob. Bob
accepts Alice's identity if and only if r ≡ x^2 (mod n). Show that Bob can
send a malicious challenge c (not a fourth power) to Alice such that Alice's
response r reveals the factorization of n to Bob.
25. Let e : G × G → G3 be a bilinear map (easily computable). Prove that the
DLP in G is no more difficult than the DLP in G3 .
26. In this exercise, we deal with Boneh and Boyen’s identity-based encryption
scheme.18 Let G, G3 be groups of prime order r, P a generator of G, and
e : G × G → G3 a bilinear map. The master secret key of the TA consists of
two elements s1 , s2 ∈ Z∗r , and the public keys are Y1 = s1 P and Y2 = s2 P .
In the registration phase for Bob, the TA generates a random t ∈ Z∗r , and
computes K = (PBob + s1 + s2 t)−1 P , where PBob ∈ Z∗r is the hashed public
identity of Bob, and where the inverse is computed modulo r. Bob’s private
key is the pair DBob = (t, K).
In order to encrypt a message M ∈ G3 for Bob, Alice generates a random
k ∈ Z∗r, and computes the ciphertext (U, V, W) ∈ G × G × G3, where U =
kPBob P + kY1, V = kY2, and W = M × e(P, P)^k.
Describe how the ciphertext (U, V, W) is decrypted by Bob.
27. Generalize the Sakai–Ohgishi–Kasahara key-agreement protocol to the general
setting of a bilinear map e : G1 × G2 → G3 (with G1 6= G2 , in general).19
28. Okamoto and Okamoto propose a three-party non-interactive key-agreement
scheme.20 The TA sets up a bilinear map e : G × G → G3 with r = |G| = |G3 |.
The TA also chooses a secret polynomial of a suitable degree k:
f(x) = d0 + d1 x + · · · + dk x^k ∈ Zr[x].
For a generator P of G, the TA computes Vi = di P for i = 0, 1, . . . , k. The
TA publishes r, G, G3, e, P, V0, V1, . . . , Vk.
For an entity A with hashed identity PA ∈ Z∗r , the private key is computed
by the TA as DA ≡ f (PA ) (mod r).
Describe how three parties Alice, Bob and Carol with hashed public identities
PA , PB , PC (respectively) can come up with the shared secret e(P, P )DA DB DC .
Comment on the choice of the degree k of f .
18 Dan Boneh and Xavier Boyen, Efficient selective-ID secure identity based encryption

without random oracles, EuroCrypt, 223–238, 2004.


19 Régis Dupont and Andreas Enge, Provably secure non-interactive key distribution based

on pairings, Discrete Applied Mathematics, 154(2), 270–276, 2006.


20 Eiji Okamoto and Takeshi Okamoto, Cryptosystems based on elliptic curve pairing,

Modeling Decisions for Artificial Intelligence—MDAI, 13–23, 2005.



29. Sakai, Ohgishi and Kasahara propose an identity-based signature scheme (in
the same paper where they introduced their key-agreement scheme). The pub-
lic parameters r, G, G3 , e, P, Ppub , and the master secret key s are as in the
SOK key-agreement scheme. Also, let H : {0, 1}∗ → G be a hash function.
Bob has the hashed public identity PBob and the private key DBob = sPBob .
In order to sign a message M , Bob chooses a random d ∈ Zr , and computes
U = dP ∈ G. For h = H(PBob , M, U ), Bob also computes V = DBob +dh ∈ G.
Bob’s signature on M is (U, V ). Describe how this signature can be verified.
30. Cha and Cheon’s identity-based signature scheme21 uses a bilinear map e :
G × G → G3 , and hash functions H1 : {0, 1}∗ → G and H2 : {0, 1}∗ × G → Zr ,
where r = |G| = |G3 |. For the master secret key s, the TA’s public identity is
Ppub = sP , where P is a generator of G. Bob’s hashed (by H1 ) public identity
is PBob , and Bob’s private key is DBob = sPBob . Bob’s signature on a message
M is (U, V ), where, for a randomly chosen t ∈ Zr , Bob computes U = tPBob ,
and V = (t + H2 (M, U ))DBob . Explain how verification of (U, V ) is done in
the Cha–Cheon scheme. Discuss how the security of the Cha–Cheon scheme
is related to the bilinear Diffie–Hellman problem. Compare the efficiency of
the Cha–Cheon scheme with the Paterson scheme.
31. Boneh and Boyen propose short signature schemes (not identity-based).22
These schemes do not use hash functions. In this exercise, we deal with one
such scheme. Let e : G1 × G2 → G3 be a bilinear map with G1 , G2 , G3 hav-
ing prime order r. Let P be a generator of G1 , Q a generator of G2 , and
g = e(P, Q). The public parameters are r, G1 , G2 , G3 , e, P, Q, g. Bob selects a
random d ∈ Z∗r (private key), and makes Y = dQ public. Bob’s signature on
the message m ∈ Zr is σ = (d + m)^{-1} P ∈ G1. Here, (d + m)^{-1} is computed
modulo r, and is taken to be 0 if r|(d + m). Describe a verification procedure
for this scheme. Argue that this verification procedure can be implemented to
be somewhat faster than verification in the BLS scheme.
32. Boneh and Boyen’s scheme presented in Exercise 9.31 is weakly secure in
some sense. In order to make the scheme strongly secure, they propose a
modification which does not yield very short signatures. Indeed, the signature
size is now comparable to that in DSA or ECDSA. For this modified scheme,
the parameters r, G1 , G2 , G3 , e, P, Q, g are chosen as in Exercise 9.31. Two
random elements d1 , d2 ∈ Z∗r are chosen by Bob as his private key, and the
elements Y1 = d1 Q ∈ G2 and Y2 = d2 Q ∈ G2 are made public. In order
to sign a message m ∈ Zr, Bob selects a random t ∈ Zr with t ≢ −(d1 +
m)d2^{-1} (mod r), and computes σ = (d1 + d2 t + m)^{-1} P ∈ G1, where the
inverse is computed modulo r. Bob’s signature on m is the pair (t, σ). Describe
a verification procedure for this scheme. Argue that this verification procedure
can be implemented to be somewhat faster than BLS verification.

21 Jae Choon Cha and Jung Hee Cheon, An identity-based signature from gap Diffie–

Hellman groups, PKC, 18–30, 2003.


22 Dan Boneh and Xavier Boyen, Short signatures without random oracles, Journal of

Cryptology, 21(2), 149–177, 2008.



Programming Exercises
Using GP/PARI, implement the following functions.
33. RSA encryption and decryption.
34. ElGamal encryption and decryption.
35. RSA signature generation and verification.
36. ElGamal signature generation and verification.
37. DSA signature generation and verification.
38. ECDSA signature generation and verification.
Assuming that GP/PARI functions are available for Weil and distorted Weil
pairing on supersingular elliptic curves, implement the following functions.
39. Boneh–Franklin encryption and decryption.
40. Paterson signature generation and verification.
41. BLS short signature generation and verification.
Appendices
Appendix A
Background

A.1 Algorithms and Their Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 467


A.1.1 Order Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
A.1.2 Recursive Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471
A.1.3 Worst-Case and Average Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 475
A.1.4 Complexity Classes P and NP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
A.1.5 Randomized Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
A.2 Discrete Algebraic Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
A.2.1 Functions and Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
A.2.2 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484
A.2.3 Rings and Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 488
A.2.4 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 492
A.2.5 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
A.3 Linear Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
A.3.1 Linear Transformations and Matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
A.3.2 Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497
A.3.3 Inverse and Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 502
A.3.4 Rank and Nullspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506
A.3.5 Characteristic and Minimal Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . 509
A.4 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 510
A.4.1 Random Variables and Probability Distributions . . . . . . . . . . . . . . . . . 511
A.4.2 Birthday Paradox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
A.4.3 Random-Number Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514

In this appendix, I review some background material needed as a prerequisite for


understanding the contents of the earlier chapters. Readers who are already
conversant with these topics may quickly browse through this appendix to
become familiar with my notations and conventions.

A.1 Algorithms and Their Complexity


A basic notion associated with the performance of an algorithm is its
running time. The actual running time of an algorithm depends upon its im-
plementation (in a high-level language like C or Java, or in hardware). But the
implementation alone cannot characterize the running time in any precise unit,
like seconds, machine instructions or CPU cycles, because these measurements
depend heavily on the architecture, the compiler, the version of the compiler,
the run-time conditions (such as the current load of the machine, availability

467

of cache), and so on. It is, therefore, customary to express the running time of
an algorithm (or an implementation of it) in more abstract (yet meaningful)
terms. An algorithm can be viewed as a transformation that converts its input
I to an output O. The size n = |I| of I is usually the parameter, in terms of
which the running time of the algorithm is specified. This specification is good
if it can be expressed as a simple function of n. This leads to the following
order notations. These notations are not invented by computer scientists, but
have been used by mathematicians for ages. Computer scientists have only
adopted them in the context of analyzing algorithms.

A.1.1 Order Notations


Order notations compare the rates of growth of functions. The most basic
definition in this context is given now. While analyzing algorithms, we typi-
cally restrict our attention to positive (or non-negative) real-valued functions
of positive (or non-negative) integers. The argument of such a function is the
input size which must be a non-negative integer. The value of this function is
typically the running time (or space requirement) of an algorithm, and cannot
be negative. In the context of analyzing algorithms, an unqualified use of the
term function indicates a function of this type.

Definition A.1 [Big-O notation] We say that a function f(n) is of the order
of g(n), denoted f(n) = O(g(n)), if there exist a positive real constant c and
a non-negative integer n0 such that f(n) ≤ c g(n) for all n ≥ n0.¹ ⊳

Intuitively, f(n) = O(g(n)) implies that f does not grow faster than g,
up to multiplication by a positive constant value. Moreover, initial patterns
exhibited by f and g (that is, their values for n < n0) are not of concern to
us. The inequality f(n) ≤ c g(n) must hold for all sufficiently large n.

Example A.2 (1) Let f(n) = 4n^3 − 16n^2 + 4n + 25, and g(n) = n^3. We
see that f(n) = 4(n + 1)(n − 2)(n − 3) + 1 > 0 for all integers n ≥ 0. That
f(2.5) = −2.5 < 0 does not matter, since we are not interested in evaluating
f at fractional values of n. We have f(n) ≤ 4n^3 + 4n + 25. But n ≤ n^3 and
1 ≤ n^3 for all n ≥ 1, so f(n) ≤ 4n^3 + 4n^3 + 25n^3 = 33n^3 for all n ≥ 1, that is,
f(n) = O(g(n)). Conversely, g(n) = n^3 = (1/3)(4n^3 − n^3) ≤ (1/3)(4n^3 − n^3 + 4n + 25) ≤
(1/3)(4n^3 − 16n^2 + 4n + 25) for all n ≥ 16, so g(n) = O(f(n)) too.
(2) The example of Part (1) can be generalized. Let f(n) = ad n^d +
a_{d−1} n^{d−1} + · · · + a1 n + a0 be a polynomial with ad > 0. For all sufficiently
large n, the term ad n^d dominates over the other non-zero terms of f(n), and
consequently, f(n) = O(n^d), and n^d = O(f(n)).
(3) Let f(n) = 4n^3 − 16n^2 + 4n + 25 as in Part (1), but g(n) = n^4. For
all n ≥ 1, we have f(n) ≤ 33n^3 ≤ 33n^4 (as in Part (1)), so that f(n) = O(n^4). We prove by
¹In this context, some authors prefer to say that f is big-O of g. For them, f is of the
order of g if and only if f(n) = Θ(g(n)) (Definition A.3).


Background 469

contradiction that g(n) = n^4 is not O(f(n)). Suppose that g(n) = O(f(n)).
This implies that there exist constants c > 0 (real) and n0 ≥ 0 (integer) such
that n^4 ≤ c(4n^3 − 16n^2 + 4n + 25), that is, n^4 − c(4n^3 − 16n^2 + 4n + 25) ≤ 0,
for all n ≥ n0. But (1/4)n^4 > 4cn^3 for n > 16c = r1, (1/4)n^4 > 4cn for
n > (16c)^{1/3} = r2, and (1/4)n^4 > 25c for n > (100c)^{1/4} = r3. For any
integer n > max(n0, r1, r2, r3), we have

    n^4 − c(4n^3 − 16n^2 + 4n + 25)
      = (1/4)n^4 + ((1/4)n^4 − 4cn^3) + 16cn^2 + ((1/4)n^4 − 4cn) + ((1/4)n^4 − 25c)
      ≥ (1/4)n^4 + 16cn^2 > 0,

a contradiction.
More generally, if f (n) is a polynomial of degree d, and g(n) a polynomial
of degree e with 0 ≤ d ≤ e, then f(n) is O(g(n)). If e = d, then g(n) is O(f(n))
too. But if e > d, then g(n) is not O(f (n)). Thus, the degree of a polynomial
function determines its rate of growth under the O( ) notation.
(4) Let f (n) = 2 + sin n, and g(n) = 2 + cos n. Because of the bias 2, the
functions f and g are positive-valued. Evidently, f (n) > g(n) infinitely often,
and also g(n) > f(n) infinitely often. But 1 ≤ f(n) ≤ 3 and 1 ≤ g(n) ≤ 3 for
all n ≥ 0, that is, f(n) ≤ 3g(n) and g(n) ≤ 3f(n) for all n ≥ 0. Thus, f(n)
is O(g(n)) and g(n) is O(f (n)). Indeed, both these functions are of the order
of the constant function 1, and conversely. This is intuitively clear, since f (n)
and g(n) remain confined in the band [1, 3], and do not grow at all.
(5) Let f(n) = n^2 for even n and f(n) = n^3 for odd n, whereas g(n) = n^3
for even n and g(n) = n^2 for odd n. We see
that f grows strictly faster than g for odd values of n, whereas g grows strictly
faster than f for even values of n. This implies that neither f(n) is O(g(n))
nor g(n) is O(f(n)). ¤
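The bounds derived in Part (1) can be checked numerically; a small Python sketch (the function name f is ours, not the book's):

```python
# Verify the big-O bounds of Example A.2(1):
#   f(n) <= 33 n^3     for all n >= 1   (so f(n) = O(n^3)), and
#   n^3 <= (1/3) f(n)  for all n >= 16  (so n^3 = O(f(n))).

def f(n):
    return 4*n**3 - 16*n**2 + 4*n + 25

for n in range(1, 1000):
    assert f(n) <= 33 * n**3
    if n >= 16:
        assert 3 * n**3 <= f(n)
```

A finite check like this cannot prove an asymptotic statement, of course; it only confirms the hand-derived constants c and n0 on a range of inputs.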

The big-O notation of Definition A.1 leads to some other related notations.
Definition A.3 Let f (n) and g(n) be functions.
(1) [Big-Omega notation] If f (n) = O(g(n)), we write g(n) = Ω(f (n)). (Big-
O indicates upper bound, whereas big-Omega indicates lower bound.)
(2) [Big-Theta notation] If f (n) = O(g(n)) and f (n) = Ω(g(n)), we write
f (n) = Θ(g(n)). (In this case, f and g exhibit the same rate of growth
up to multiplication by positive constants.)
(3) [Small-o notation] We say that f(n) = o(g(n)) if for every positive
constant c (however small), there exists n0 ∈ N0 such that f(n) ≤ cg(n)
for all n ≥ n0. (Here, g is an upper bound on f, which is not tight.)
(4) [Small-omega notation] If f (n) = o(g(n)), we write g(n) = ω(f (n)).
(This means that f is a loose lower bound on g.)
All these order notations are called asymptotic, since they compare the growths
of functions for all sufficiently large n. ⊳

Example A.4 (1) For any non-negative integer d and real constant a > 1,
we have n^d = o(a^n). In words, any exponential function asymptotically grows
faster than any polynomial function. Likewise, log^k n = o(n^d) for any positive
k and d, that is, any logarithmic (or poly-logarithmic) function asymptotically
grows more slowly than any polynomial function.
470 Computational Number Theory
(2) Consider the subexponential function L(n) = exp(√(n ln n)). We have
L(n) = o(a^n) for any real constant a > 1. Also, L(n) = ω(n^d) for any integer
constant d > 0. This is why L(n) is called a subexponential function of n.
(3) It may be tempting to conclude that f(n) = o(g(n)) if and only if
f(n) = O(g(n)) but g(n) ≠ O(f(n)). For most functions we encounter during
analysis of algorithms, these two notions of loose upper bounds turn out to
be the same. However, there exists a subtle difference between them. As an
illustration, take f(n) = n for all n ≥ 0, whereas g(n) = n for odd n and
g(n) = n^2 for even n.
We have f(n) = O(g(n)) and g(n) ≠ O(f(n)). But f(n) ≠ o(g(n)), since for
the choice c = 1/2, we cannot find an n0 such that f(n) ≤ cg(n) for all n ≥ n0
(look at the odd values of n). ¤

Some comments about the input size n are in order now. There are several
units in which n can be expressed. For example, n could be the number of bits
in (some reasonable encoding of) the input I. When we deal with an array
of n integers each fitting into a standard 32-bit (or 64-bit) machine word, the
bit size of the array is 32n (or 64n). Since we are interested in asymptotic
formulas with constant factors neglected, it is often convenient to take the
size of the array as n. Physically too, this makes sense, since now the input
size is measured in units of words (rather than bits).
It may be the case that the input size is specified by two or more indepen-
dent parameters. An m × n matrix of integers has the input size mn (in terms
of words, or 32mn in terms of bits). However, m and n can be independent of
one another, and we often express the running time of a matrix-manipulation
algorithm as a function of two arguments m and n (instead of a function of
one argument mn). If m = n, that is, for square (n × n) matrices, the input
size is n2 , but running times are expressed in terms of n, rather than of n2 .
Here, n is not the input size, but a parameter that dictates the input size.
In computational number theory, we deal with large integers which do
not fit in individual machine words. Treating each such integer as having a
constant size is not justifiable. An integer k fits in ⌈log_{2^32} k⌉ 32-bit machine
words. A change in the base of logarithms affects this size by a constant
factor, so we take the size of k as log k. For a polynomial input of degree d
with coefficients modulo a large integer m, the input size is ≤ (d + 1) log m,
since the polynomial contains at most d + 1 non-zero coefficients, and ⌈log m⌉
is an upper bound on the size of each coefficient. We may treat the size of the
polynomial as consisting of two independent parameters d and log m.
The order notations introduced in Definitions A.1 and A.3 neglect constant
factors. Sometimes, it is useful to neglect logarithmic factors too.

Definition A.5 [Soft-O notation] Let f (n) and g(n) be functions. We say
that f(n) = O˜(g(n)) if f(n) = O(g(n) log^t g(n)) for some constant t ≥ 0. ⊳

Example A.6 Some exponential algorithms for factoring m (like the Pollard
rho method) take running times of the form O(m^{1/4} log^t m). We say that the
running time of such an algorithm is O˜(m^{1/4}). The idea is that m^{1/4} grows so
fast (exponentially) with log m that it is not of great importance to look
closely at the polynomial factor log^t m, despite that this factor is non-constant.
By using O˜( ) instead of O( ), we make it clear that we have neglected the fac-
tor log^t m. The only information lost in the soft-O notation is the correct value
of t, but that is insignificant, since irrespective of the value of the constant t,
we have m^{1/4} log^t m = O(m^{1/4+ǫ}) for any ǫ > 0 (however small). ¤

A.1.2 Recursive Algorithms


A recursive algorithm invokes itself while solving a problem. Suppose that
we want to solve a problem on an input of size n. We generate subproblems
on input sizes n1 , n2 , . . . , nk , recursively call the algorithm on each of these
k subproblems, and finally combine the solutions of these subproblems to
obtain the solution of the original problem. Typically, the input size ni of each
subproblem is smaller than the input size n of the original problem. Recursion
stops when the input size of a problem becomes so small that we can solve
this instance of the problem directly, that is, without further recursive calls on
even smaller instances. Such a recursive algorithm is often called a divide-and-
conquer algorithm. If the number k of subproblems and the sizes n1 , n2 , . . . , nk
are appropriately small, this leads to good algorithms. However, if the number
of subproblems is too large and/or the sizes ni are not much reduced compared
to n, the behavior of the divide-and-conquer algorithm may be quite bad (so
this now may become a divide-and-get-conquered algorithm).
Let T (n) be the running time of the recursive algorithm described in the
last paragraph. Suppose that n0 is the size such that we make recursive calls if
and only if n ≥ n0. Then, T(0), T(1), . . . , T(n0 − 1) are some constant values
(the times required to directly solve the problem for these small instances).
On the other hand, for n ≥ n0, we have

T (n) = T (n1 ) + T (n2 ) + · · · + T (nk ) + c(n),

where n1 , n2 , . . . , nk (and perhaps also k) depend upon n, and c(n) is the cost
associated with the generation of the subproblems and with the combination
of the solutions of the k subproblems. Such an expression of a function in terms
of itself (but with different argument values) is called a recurrence relation.
Solving a recurrence relation to obtain a closed-form expression of the function
is an important topic in the theory of algorithms.

Example A.7 The Fibonacci numbers are defined recursively as:

F0 = 0,
F1 = 1,
Fn = Fn−1 + Fn−2 for n ≥ 2.

Consider a recursive algorithm for computing Fn , given the input n. If n = 0


or n = 1, the algorithm immediately returns 0 and 1, respectively. If n > 2, it
recursively calls the algorithm on n − 1 and n − 2, adds the values returned by
these recursive calls, and returns this sum. This is a bad divide-and-conquer
algorithm. In order to see why, let me count the number s(n) of additions
performed by this recursive algorithm.
s(0) = 0,
s(1) = 0,
s(n) = s(n − 1) + s(n − 2) + 1 for n ≥ 2.
We have s(0) = F1 − 1 and s(1) = F2 − 1. For n ≥ 2, we inductively get
s(n) = s(n − 1) + s(n − 2) + 1 = (Fn − 1) + (Fn−1 − 1) + 1 = Fn+1 − 1.
Therefore, s(n) = Fn+1 − 1 for all n ≥ 0. We know that Fn ≈ (1/√5)ρ^n for all large
n, that is, s(n) = Θ(ρ^n), where ρ = (1 + √5)/2 is the golden ratio. This analysis
implies that this recursive algorithm runs in time exponential in n.
The following iterative (that is, non-recursive) algorithm performs much
better. We start with F0 = 0 and F1 = 1. From these two values, we compute
F2 = F1 + F0 . We then compute F3 = F2 + F1 , F4 = F3 + F2 , and so on, until
Fn for the given n is computed. This requires only n − 1 additions.
Since the size of Fn grows linearly with n (because Fn ≈ (1/√5)ρ^n), we cannot
compute Fn in o(n) time. This is precisely what the iterative algorithm does
(not exactly; it takes Θ(n^2) time, since we should consider adding integers of
size proportional to n). However, if Fn needs to be computed modulo m ≈ n
(see Algorithm 5.5), we can finish the computation in O(log^3 n) time. This is
now truly polynomial in the input size, since the size of n is log n. ¤
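Both approaches can be sketched in Python. The names below are ours; fib_mod uses the standard fast-doubling identities F_{2j} = F_j(2F_{j+1} − F_j) and F_{2j+1} = F_j^2 + F_{j+1}^2, which achieve the same O(log n) step count as the matrix method of Algorithm 5.5:

```python
def fib_iter(n):
    """Iterative computation of F_n using n - 1 additions (for n >= 2)."""
    a, b = 0, 1                # (F_0, F_1)
    for _ in range(n):
        a, b = b, a + b        # slide the window to (F_{k+1}, F_{k+2})
    return a

def fib_mod(n, m):
    """F_n mod m by fast doubling: O(log n) levels, each doing a few
    multiplications of residues modulo m."""
    def fd(k):                 # returns (F_k mod m, F_{k+1} mod m)
        if k == 0:
            return (0, 1)
        a, b = fd(k >> 1)                   # (F_j, F_{j+1}) with j = k//2
        c = (a * ((2 * b - a) % m)) % m     # F_{2j}   = F_j (2 F_{j+1} - F_j)
        d = (a * a + b * b) % m             # F_{2j+1} = F_j^2 + F_{j+1}^2
        return (d, (c + d) % m) if k & 1 else (c, d)
    return fd(n)[0]
```

The naive doubly recursive algorithm criticized above is deliberately omitted: even for moderate n it performs Θ(ρ^n) additions.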
The above example demonstrates a bad use of recursion. In many situa-
tions, however, recursion can prove to be rather useful.
Example A.8 Let A be an array of n elements (say, single-precision integers).
Our task is to rearrange the elements in the increasing order (or non-decreasing
order if A contains repetitions of elements). Merge sort is a divide-and-conquer
algorithm to solve this problem. The array is broken in two halves, each of
size about n/2. The two halves are recursively sorted. Finally, these sorted
arrays of sizes (about) n/2 each are merged to a single sorted array of size n.
The splitting of A (the divide step) is trivial. We let the two parts contain
the first n/2 and the last n/2 elements of A, respectively. If n is odd, then the
sizes of the subarrays may be taken as ⌈n/2⌉ and ⌊n/2⌋.
The merging procedure (that is, the combine step) is non-trivial. Let A1
and A2 be two sorted arrays of sizes n1 and n2 , respectively. We plan to merge
the two arrays into a sorted array B of size n = n1 + n2 . We maintain two
indices i1 and i2 for reading from A1 and A2 , respectively. These indices are
initialized to point to the first locations (that is, the smallest elements) in the
respective arrays. We also maintain an index j for writing to B. The index j
is initialized to 1 (assuming that array indices start from 1).

So long as either i1 ≤ n1 or i2 ≤ n2, we carry out the following iteration.
If i1 > n1, the array A1 is exhausted, so we copy the i2-th element of A2 to
the j-th location in B, and increment i2 (by one). The symmetric case i2 > n2
is analogously handled. If i1 ≤ n1 and i2 ≤ n2, we compare the element a1 of
A1 at index i1 with the element a2 of A2 at index i2. If a1 ≤ a2, we copy a1
to the j-th location in B, and increment i1. If a1 > a2, we copy a2 to the j-th
location in B, and increment i2 . After each copy of an element of A1 or A2 to
B, we increment j so that it points to the next available location for writing.
In each iteration of the merging loop, we make a few comparisons, a copy
of an element from A1 or A2 to B, and two index increments, that is, each
iteration takes Θ(1) time. There are exactly n iterations, since each iteration
involves a single writing in B. Therefore, the merging step takes Θ(n) time.
The basis case for merge sort is n = 1 (arrays with only single elements).
In this case, the array is already sorted, and the call simply returns.
Let T(n) denote the time for merge sorting an array of size n. We have:

T (1) = 1,
T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + n for n ≥ 2.

We like to determine T (n) as a closed-form expression in n. But the recurrence


relation involves floor and ceiling functions, making the analysis difficult. For
us, it suffices if we can provide a big-Θ estimate on T (n).
We can get rid of all floor and ceiling expressions when n is a power
of 2, that is, n = 2^t for some t ∈ N0. In this case, we have:

T(n) = T(2^t) = 2T(2^{t−1}) + 2^t
     = 2(2T(2^{t−2}) + 2^{t−1}) + 2^t = 2^2 T(2^{t−2}) + 2 × 2^t
     = 2^2 (2T(2^{t−3}) + 2^{t−2}) + 2 × 2^t = 2^3 T(2^{t−3}) + 3 × 2^t
     = · · ·
     = 2^t T(1) + t · 2^t = 2^t + t · 2^t = (1 + t)2^t = (1 + log2 n)n.

We plan to generalize this result to the claim that T (n) = Θ(n log n) (for all
values of n). Proving this claim rigorously involves some careful considerations.
For any n ≥ 1, we can find a t such that 2^t ≤ n < 2^{t+1}. Since we already
know T(2^t) and T(2^{t+1}), we are tempted to write T(2^t) ≤ T(n) ≤ T(2^{t+1}) in
order to obtain a lower bound and an upper bound on T(n). But the catch
is that we are permitted to write these inequalities provided that T(n) is an
increasing function of n, that is, T(n) ≤ T(n + 1) for all n ≥ 1. I now prove
this property of T(n) by induction on n.
Since T(1) = 1 and T(2) = 2T(1) + 2 = 4, we have T(1) ≤ T(2). For the
inductive step, take n ≥ 2, and assume that T(m) ≤ T(m + 1) for all m < n.
If n = 2m (even), then T(n) = 2T(m) + 2m, whereas T(n + 1) = T(m + 1) +
T(m) + (2m + 1), that is, T(n + 1) − T(n) = T(m + 1) − T(m) + 1 > 0 (by the
induction hypothesis), that is, T(n) ≤ T(n + 1). If n = 2m + 1 (odd), then
T(n) = T(m + 1) + T(m) + (2m + 1), whereas T(n + 1) = 2T(m + 1) + (2m + 2),

that is, T (n + 1) − T (n) = T (m + 1) − T (m) + 1 > 0, again by the induction


hypothesis. Thus, T (n) is an increasing function of n, as required.
We now manipulate the inequalities 2^t ≤ n < 2^{t+1} and T(2^t) ≤ T(n) ≤
T(2^{t+1}). First, T(n) ≤ T(2^{t+1}) = (t + 2)2^{t+1}. Also, n ≥ 2^t, that is, T(n) ≤
2(log2 n + 2)n. On the other hand, T(n) ≥ T(2^t) = (t + 1)2^t. Since n < 2^{t+1},
we have T(n) > (log2 n)n/2. It therefore follows that T(n) = Θ(n log n). ¤
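The procedure described in this example can be sketched in Python (a minimal rendering; the names are ours):

```python
def merge_sort(A):
    """Sort list A by divide and conquer; running time Θ(n log n)."""
    n = len(A)
    if n <= 1:                        # basis case: already sorted
        return A
    A1 = merge_sort(A[: n // 2])      # recursively sort the two halves,
    A2 = merge_sort(A[n // 2 :])      # of sizes ⌊n/2⌋ and ⌈n/2⌉
    # Merge step: Θ(n) time, exactly one write to B per iteration.
    B, i1, i2 = [], 0, 0
    while i1 < len(A1) or i2 < len(A2):
        if i1 == len(A1):             # A1 exhausted: copy from A2
            B.append(A2[i2]); i2 += 1
        elif i2 == len(A2):           # A2 exhausted: copy from A1
            B.append(A1[i1]); i1 += 1
        elif A1[i1] <= A2[i2]:        # copy the smaller front element
            B.append(A1[i1]); i1 += 1
        else:
            B.append(A2[i2]); i2 += 1
    return B
```

For brevity this sketch returns a new list instead of sorting in place; the index bookkeeping of the merge loop is exactly that of the i1, i2, j description above.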

This analysis for merge sort can be easily adapted to a generalized setting
of certain types of divide-and-conquer algorithms. Suppose that an algorithm,
upon an input of size n, creates a > 1 sub-instances, each of size (about) n/b.
Suppose that the total effort associated with the divide and combine steps is
Θ(n^d) for some constant d ≥ 0. The following theorem establishes the (tight)
order of the running time of this algorithm.

Theorem A.9 [Master theorem for divide-and-conquer recurrences] Let T (n)


be an increasing function of n, which satisfies
T(n) = aT(n/b) + Θ(n^d),
whenever n > 1 is a power of b. Let τ = log_b a. Then, we have:
• If τ > d, then T(n) = Θ(n^τ).
• If τ < d, then T(n) = Θ(n^d).
• If τ = d, then T(n) = Θ(n^d log n). ⊳

Informally, τ is the parameter indicating the cost of conquer, whereas d is


the parameter standing for the cost of divide and combine. If τ > d, the cost
of conquer dominates over the divide and combine cost. If τ < d, the divide
and combine cost dominates over the cost of conquer. If τ = d, these two costs
are the same at each level of the recursion, so we multiply this cost by the
(maximum) depth of recursion.
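The case analysis of Theorem A.9 can be packaged in a small helper function (ours, not the book's; the equality τ = d is tested up to a floating-point tolerance):

```python
from math import log

def master(a, b, d):
    """Asymptotic solution of T(n) = a T(n/b) + Θ(n^d) by the master theorem."""
    tau = log(a, b)                  # τ = log_b a: the cost of conquer
    if abs(tau - d) < 1e-12:         # τ = d: multiply the cost by the depth
        return f"Θ(n^{d} log n)"
    if tau > d:                      # conquer cost dominates
        return f"Θ(n^{tau:g})"
    return f"Θ(n^{d})"               # divide-and-combine cost dominates
```

For merge sort, master(2, 2, 1) yields Θ(n^1 log n); for binary search, master(1, 2, 0) yields Θ(n^0 log n), that is, Θ(log n).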

Example A.10 (1) The running time of merge sort corresponds to a = 2,


b = 2 and d = 1, so τ = log2 2 = 1 = d. By the master theorem, this running
time is Θ(n log n).
(2) Let A = (ai , ai+1 , . . . , aj ) be a sorted array of n elements. We want to
determine whether a key x belongs to A, and if so, at which index. The binary
search algorithm for accomplishing this task proceeds as follows. If n = 1 (that
is, i = j), we check whether x = ai , and return i or an invalid index depending
upon the outcome of the comparison (equality or otherwise). If n > 1, we
compute the central index k = ⌊(i + j)/2⌋ in the array, and compare x with
ak. If x ≤ ak, we recursively search for x in the subarray (ai, ai+1, . . . , ak),
otherwise we recursively search for x in the subarray (ak+1, ak+2, . . . , aj). The
correctness of this strategy follows from the fact that A is sorted. Since k is
the central index in A, the recursive call is made on a subarray of size (about)
n/2. Therefore, the running time of binary search can be expressed as
T (n) = T (n/2) + Θ(1).

In the notation of the master theorem, we have a = 1, b = 2 and d = 0, so


τ = log2 1 = 0 = d, and it follows that T (n) = Θ(log n).
Since recursive function calls incur some overhead, it is preferable to imple-
ment binary search as an iterative algorithm using a simple loop. The running-
time analysis continues to hold for the iterative version too. For merge sort,
however, replacing recursion by an iterative program is quite non-trivial. ¤
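An iterative rendering of binary search, as suggested above (a sketch; the names are ours, and 0-based indices are used as in Python):

```python
def binary_search(A, x):
    """Return an index of x in the sorted list A, or -1 if x is absent.
    Θ(log n) iterations, following the recurrence T(n) = T(n/2) + Θ(1)."""
    if not A:
        return -1
    i, j = 0, len(A) - 1
    while i < j:
        k = (i + j) // 2          # central index
        if x <= A[k]:
            j = k                 # search (a_i, ..., a_k)
        else:
            i = k + 1             # search (a_{k+1}, ..., a_j)
    return i if A[i] == x else -1
```

Each iteration halves the interval [i, j], so the loop mirrors the recursive description exactly while avoiding function-call overhead.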

A.1.3 Worst-Case and Average Complexity


So far, we have characterized the running time of an algorithm as a function
of only the input size n. However, the exact running time may change with
what input is fed to the algorithm, even when the size of the input remains
the same. So long as this variation is bounded by some constant factors, the
asymptotic order notations are good enough to capture the running time as a
function of n. However, there exist cases where the exact function of n changes
with the choice of the input (of size n). In that case, it is necessary to give a
more precise connotation to the notion of running time.
The worst-case running time of an algorithm on an input of size n is the
maximum of the running times over all inputs of size n. The average running
time of an algorithm on an input of size n is the average of the running
times, taken over all inputs of size n. Likewise, we may define the best-case
running time of an algorithm. For merge sort, all three running times happen
to be Θ(n log n). Depending upon the input, the constant hidden in the order
notation may vary, but the functional relationship with n is always Θ(n log n).
I shortly describe another popular sorting algorithm for which the worst-case
and the average running times differ in the functional expression.
Clearly, the worst-case running time gives an upper bound on the running
time of an algorithm. On the other hand, the average running time indicates
the expected performance of the algorithm on an average (or random) input,
but provides no guarantee against bad inputs. In view of this, an unqualified
use of the phrase running time stands for the worst-case running time. Average
or expected running times are specifically mentioned to be so.
Example A.11 Quick sort is a recursive algorithm for sorting an array of
elements (say, single-precision integers). Let the input array A have size n.
The algorithm uses an element a in A as a pivot. It makes a partition of the
array to a form LEG, where the subarray L consists of elements of A smaller
than the pivot, E is the subarray consisting of the elements equal to the pivot,
and the subarray G consists of the elements of A greater than the pivot. In
the decomposition LEG, the block E is already in place for the sorting task.
The blocks L and G are recursively quick sorted. When the recursive calls
return, the entire array is sorted. Recursion terminates when an array of size
0 or 1 is passed as argument. Such an array is already sorted.
What remains to complete the description of quick sort is the way to par-
tition A in the form LEG. If we are allowed to use an additional array, there is

little difficulty in partitioning. However, it is not necessary to use an additional


array. By suitable exchanges of elements of A, the LEG decomposition can
be computed. I explain a variant of this in-place partitioning. The description
also illustrates the useful concept of loop invariance.
Let us use the first element of A as the pivot a. We run a loop which main-
tains the invariance that A is always represented in the form LEU G, where
L, E and G are as described earlier. The block U stands for the unprocessed
elements, that is, the elements that have not yet been classified for inclusion
in L, E or G. We maintain three indices i, j, k pointing to the L-E, E-U and
U -G boundaries. The following figure describes this decomposition.

        i            j            k
        ↓            ↓            ↓
  [ L | a · · · E | x · · · U · · · y | G ]

Before entering the partitioning loop, the first element of A belongs to the
block E, whereas all the remaining n − 1 elements belong to the block U . If we
imagine that L is the empty block sitting before E and that G is the empty
block sitting after U , we start with an LEU G decomposition of A.
Inside the partitioning loop, the first element of U (that is, the element x
pointed to by j) is considered for processing. Depending upon the result of
comparison of x with the pivot a, one of the following three actions is taken.
• If x = a, the region E grows by including x. This is effected by incre-
menting the index j (by one).
• If x > a, then x should go to the block G, that is, x may go to the location
indexed by k. But this location already contains an unprocessed element
y. The elements x and y are, therefore, swapped, and k is decremented to
mark the growth of the block G. But j is not altered, since it now points
to the unprocessed element y to be considered in the next iteration.
• If x < a, the element x should join the region L. Since L grows by
one cell, the entire block E should shift by one cell. However, since E
contains only elements with values equal to a, this shift of E can be
more easily implemented by exchanging x with the first element of E,
that is, by exchanging the elements pointed to by i and j. Both i and j
should then be incremented by 1 in order to indicate the advance of the
region E. The other end of U (that is, the index k) is not altered.
Each iteration of this loop processes one element from U . After exactly
n − 1 iterations, U shrinks to a block of size 0, that is, the desired LEG
decomposition of A is achieved. The partitioning procedure takes Θ(n) time,
since each of the n − 1 iterations involves only a constant amount of work.
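The LEU G partitioning loop and the resulting quick sort can be sketched as follows (a minimal Python rendering; the function names are ours):

```python
def partition(A, lo, hi):
    """Three-way in-place partition of A[lo..hi] around the pivot a = A[lo].
    Maintains the invariant L E U G with boundary indices i, j, k, and
    returns (i, k): on exit, A[i..k] is the E block (elements equal to a)."""
    a = A[lo]
    i, j, k = lo, lo + 1, hi           # L-E, E-U and U-G boundaries
    while j <= k:                      # U is non-empty
        x = A[j]                       # first unprocessed element
        if x == a:                     # E grows by including x
            j += 1
        elif x > a:                    # x joins G: swap with last of U
            A[j], A[k] = A[k], A[j]
            k -= 1                     # j unchanged: A[j] is unprocessed
        else:                          # x joins L: shift E right by one
            A[i], A[j] = A[j], A[i]    # swap x with the first element of E
            i += 1
            j += 1
    return i, k

def quick_sort(A, lo=0, hi=None):
    if hi is None:
        hi = len(A) - 1
    if lo >= hi:                       # arrays of size 0 or 1 are sorted
        return
    i, k = partition(A, lo, hi)
    quick_sort(A, lo, i - 1)           # recursively sort L
    quick_sort(A, k + 1, hi)           # recursively sort G
```

Exactly as in the text, each of the hi − lo iterations of the partitioning loop does a constant amount of work, so partitioning takes Θ(n) time.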
Let us now investigate the running time of the quick-sort algorithm. The
original array A contains n elements. Suppose that, after the LEG decom-
position, the block L contains n1 elements, whereas the block G contains n2
elements. Since the pivot a was chosen as an element of the array A, the block

E contains at least one element, that is, n1 + n2 ≤ n − 1. If T(n) denotes the


running time to quick sort an array of n elements, we have:

T (0) = 1,
T (1) = 1,
T(n) = T(n1) + T(n2) + n for n ≥ 2.

A marked difference of this recurrence with the recurrence of merge sort is


that now n1 and n2 depend heavily on the input.
In order to simplify our analysis, let us assume that the elements of A are
distinct from one another, that is, n1 + n2 = n − 1. Intuitively, the algorithm
exhibits the worst-case performance, when the partitioning is very skew, that
is, either n1 = 0 or n2 = 0. In that case, a recursive call is made on an array
of size n − 1, that is, the reduction in the size of the instance is as bad as
possible. If every recursion experiences this situation, we have:

T (n) = T (n − 1) + T (0) + n = T (n − 1) + (n + 1)
= T (n − 2) + n + (n + 1)
= T (n − 3) + (n − 1) + n + (n + 1)
= ···
= T (0) + 2 + 3 + · · · + (n − 1) + n + (n + 1)
= 1 + 2 + 3 + · · · + (n − 1) + n + (n + 1)
= (n + 1)(n + 2)/2,

that is, T(n) = Θ(n^2).


Quick sort exhibits the best performance, when the splitting of A is rather
balanced, that is, n1 ≈ n2 ≈ n/2. If such a situation happens in every recursive
call, the recurrence is quite similar to the recurrence for merge sort, and we
obtain T (n) = Θ(n log n). Therefore, the functional behavior of the running
time of quick sort in the worst case differs from that in the best case.
But then, what is the average behavior of quick sort? There are n! per-
mutations of n elements. The probability that the pivot a is the i-th smallest
element of A is (n − 1)!/n! = 1/n for each i = 1, 2, . . . , n. Therefore, the
expected running time of quick sort satisfies the recurrence relation

    T(n) = (1/n)[T(0) + T(n − 1) + n] + (1/n)[T(1) + T(n − 2) + n] +
           (1/n)[T(2) + T(n − 3) + n] + · · · + (1/n)[T(n − 1) + T(0) + n]
         = (2/n)[T(n − 1) + T(n − 2) + · · · + T(1) + T(0)] + n   for n ≥ 2,

that is,

    nT(n) = 2[T(n − 1) + T(n − 2) + · · · + T(1) + T(0)] + n^2.



For n ≥ 3, we have

    (n − 1)T(n − 1) = 2[T(n − 2) + T(n − 3) + · · · + T(1) + T(0)] + (n − 1)^2.

Therefore, nT(n) − (n − 1)T(n − 1) = 2T(n − 1) + 2n − 1, that is,

    T(n) = ((n + 1)/n) T(n − 1) + (2n − 1)/n   for n ≥ 3.

Unfolding this recurrence gives

    T(n) = ((n + 1)/n) T(n − 1) + (2n − 1)/n
         = ((n + 1)/n) [(n/(n − 1)) T(n − 2) + (2n − 3)/(n − 1)] + (2n − 1)/n
         = ((n + 1)/(n − 1)) T(n − 2) + (n + 1)[(2n − 1)/(n(n + 1)) + (2n − 3)/((n − 1)n)]
         = ((n + 1)/(n − 1)) [((n − 1)/(n − 2)) T(n − 3) + (2n − 5)/(n − 2)] +
           (n + 1)[(2n − 1)/(n(n + 1)) + (2n − 3)/((n − 1)n)]
         = ((n + 1)/(n − 2)) T(n − 3) +
           (n + 1)[(2n − 1)/(n(n + 1)) + (2n − 3)/((n − 1)n) + (2n − 5)/((n − 2)(n − 1))]
         = · · ·
         = ((n + 1)/3) T(2) + (n + 1)[(2n − 1)/(n(n + 1)) + (2n − 3)/((n − 1)n) +
           · · · + 5/(3 × 4)]
         = (4/3)(n + 1) + (n + 1)[(3/(n + 1) − 1/n) + (3/n − 1/(n − 1)) +
           (3/(n − 1) − 1/(n − 2)) + · · · + (3/4 − 1/3)]
         = (4/3)(n + 1) + 2(n + 1)[1/n + 1/(n − 1) + · · · + 1/4] + 3 − (n + 1)/3
         = (n + 1) + 3 + 2(n + 1)[Hn − 1/1 − 1/2 − 1/3]
         = 2(n + 1)Hn − (8/3)(n + 1) + 3.
3
Here, Hn = 1/1 + 1/2 + · · · + 1/n is the n-th harmonic number. We have
ln(n + 1) ≤ Hn ≤ ln n + 1 for n ≥ 1. It therefore follows that the average
running time of quick sort is Θ(n log n). ¤

A.1.4 Complexity Classes P and NP


So far, we have considered the complexity of algorithms. It is interesting
to talk also about the complexity of computational problems, rather than of
specific algorithms for solving them. Let P be a solvable problem, that is,
there exists at least one algorithm to solve P . If we consider all algorithms to
solve P , and take the minimum of the running times of these algorithms (in
an order notation), we obtain the complexity of the problem P .
There are difficulties associated with this notion of complexity of prob-
lems. There may be infinitely many algorithms to solve a particular problem.
Enumerating all these algorithms and taking the minimum of their running
times is not feasible, in general. This limitation can be overcome by providing
lower bounds on the complexity of problems. We prove analytically that any
algorithm to solve some problem P must run in Ω(f (n)) time. If we come
up with an algorithm A to solve P in the worst-case running time O(f (n)),
then A is called an optimal algorithm for P . A classical example is the sorting
problem. It can be proved that under reasonable assumptions, an algorithm
to sort an array of n elements must take Ω(n log n) running time. Since merge
sort takes O(n log n) time, it is an optimal sorting algorithm. We conclude
that the complexity of sorting is n log n (in the sense of the big-Θ notation).
A bigger difficulty attached to the determination of the complexity of prob-
lems is that not all algorithms for solving a problem are known to us. Although
we can supply a lower bound on the complexity of a problem, we may fail
to come up with an optimal algorithm for that problem. For example, the
complexity of multiplying two n × n matrices is certainly Ω(n^2), since any
algorithm for multiplying two such matrices must read the two input matri-
ces, and must output the product. But then, can there be an O(n^2) algorithm
(that is, an optimal algorithm) for matrix multiplication? The answer is not
known to us. The obvious algorithm based on the formula c_ij = Σ_{k=1}^{n} a_ik b_kj
takes Θ(n^3) running time. Strassen discovered an O(n^{log2 7})-time, that is, an
O(n^{2.81})-time, divide-and-conquer algorithm for matrix multiplication. The
best known matrix-multiplication algorithm is from Coppersmith and Wino-
grad, and runs in about O(n^{2.376}) time. There exists no proof that this is
an optimal algorithm, that is, no algorithm can multiply two n × n matrices
faster. The complexity of matrix multiplication remains unknown to this date.
In computational number theory, our current knowledge often prohibits
us from concluding whether there at all exist polynomial-time algorithms for
solving certain problems. Until the disclosure of the AKS test (August 2002),
it was unknown whether the primality-testing problem can at all be solved
in polynomial time. Today, we do not know whether the integer-factoring or
the discrete-logarithm problem can be solved in polynomial time. There is no
solid evidence that they cannot be. We cannot, as well, say that they can be.
The class of all problems that can be solved in polynomial time is called the
complexity class P. All problems that have known polynomial-time algorithms
belong to this class. But that is not all. The class P may contain problems for

which polynomial-time algorithms are still not known. For example, primality
testing has always been in the class P; it is only since August 2002 that we
have known that this problem is indeed in P. This indicates that our understanding
of the boundary of the class P can never be clear. For certain problems, we can
prove superpolynomial lower bounds, so these problems are naturally outside
P. But problems like integer factorization would continue to bother us.
Intuitively, the class P contains precisely those problems that are easily
solvable. Of course, an O(n^100)-time algorithm would be practically as worth-
less as an O(2^n)-time algorithm. Nonetheless, treating easy as synonymous with
polynomial-time is a common perception in computer science.
An introduction to the class NP requires some abstraction. The basic idea
is to imagine algorithms that can guess. Suppose that we want to sort an array
A of n integers. Let there be an algorithm which, upon the input of A and
n, guesses the index i at which the maximum of A resides. The algorithm
then swaps the last element of A with the element at index i. Now that the
maximum of A is in place, we reduce the original problem to that of sorting
an array of n − 1 elements. By repeatedly guessing the maximum, we sort A
in n iterations, provided that each guess made in this process is correct.
There are two intricacies involved here. First, what is the running time of
this algorithm? We assume that each guess can be done in unit time. If so,
the algorithm runs in Θ(n) time. But who will guarantee that the guesses the
algorithm makes are correct? Nobody! One possibility to view this guessing
procedure is to create parallel threads, each handling a guess. In this case,
we talk about the parallel running time of this algorithm. Since the parallel
algorithm must behave gracefully for all input sizes n, there must be an infinite
number of computing elements to allow perfect parallelism among all guesses.
A second way to realize a guess is to make all possible guesses one after another
in a sequential manner. For the sorting example, the first guess involves n
possible indices, the second n−1 indices, the third n−2, and so on. Thus, there
is a total of n! guesses, among which exactly one gives the correct result. Since
n! = ω(2^n), we end up with an exponential-time (sequential) simulation of the
guessing algorithm. A third way of making correct guesses is an availability
of the guesses before the algorithm runs. Such a sequence of correct guesses is
called a certificate. But then, who will supply a certificate? Nobody. We can
only say that if a certificate is provided, we can sort in linear time.
For the sorting problem, there is actually not a huge necessity to guess. We
can compute the index i of the maximum in A in only O(n) time. When guess-
ing is replaced by this computation, we come up with an O(n^2)-time sorting
algorithm, popularly known as selection sort. For some other computational
problems, it is possible to design efficient guessing algorithms, for which there
is no straightforward way to replace guessing by an easy computation.
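The replacement of guessing by an explicit maximum computation can be sketched in a few lines (a minimal illustration; the function name is ours, not from the text):

```python
# Selection sort: the "guess the maximum" step replaced by an O(n) scan,
# giving the O(n^2) deterministic algorithm described in the text.
def selection_sort(a):
    a = list(a)                                 # work on a copy
    for last in range(len(a) - 1, 0, -1):
        # compute (rather than guess) the index of the maximum of a[0..last]
        i = max(range(last + 1), key=lambda j: a[j])
        a[i], a[last] = a[last], a[i]           # put the maximum in place
    return a
```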
As an example, let us compute the discrete logarithm of a ∈ F∗p to a
primitive element g of F∗p (where p ∈ P). We seek an integer x such that
a ≡ g^x (mod p). Suppose that a bound s on the bit size of x is given. Initially,
we take 0 ≤ x ≤ p − 2, so the bit size of p supplies a bound on s. Let us write
x = 2^(s−1) ǫ + x′, where ǫ is the most significant bit of x with respect to the
size constraint s. Let us imagine an algorithm that can guess the bit ǫ. Since
a ≡ g^x ≡ (g^(2^(s−1)))^ǫ g^(x′) (mod p), we can compute b ≡ g^(x′) ≡ a(g^(2^(s−1)))^(−ǫ) (mod p)
in time polynomial in log p, and the problem reduces to computing the discrete
logarithm x′ of b to the base g. Since the bit size of x′ is s − 1, this process of
successively guessing the most significant bits of indices leads to a polynomial-
time algorithm for the discrete-logarithm problem, provided that each guess
can be done in unit time, and that each guess made in this process is correct.
Like the sorting example, this guessing can be realized in three ways: a parallel
implementation on an imaginary machine with infinitely many processing ele-
ments to handle inputs of any size, an exponential-time sequential simulation
(the exhaustive search) of the bit-guessing procedure, and an availability of
the correct guesses (a certificate) from an imaginary source. Evidently, if there
is a polynomial-time algorithm to compute the most significant bit ǫ (from
p, g, s), the discrete-logarithm problem can be solved in polynomial time too.
To this date, no such polynomial-time algorithm for computing ǫ is known.
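The exhaustive-search (sequential) simulation of this bit-guessing procedure can be sketched as follows. The function name and interface are ours; as the text explains, the running time is exponential in s, since both values of each bit may have to be tried:

```python
def dlog_bitguess(g, a, p, s):
    """Exhaustive (exponential-time) sequential simulation of the
    bit-guessing algorithm: try both values of the most significant
    bit of x, reducing to a problem with bit-size bound s - 1."""
    if s == 0:
        return 0 if a % p == 1 else None       # only x = 0 remains
    h = pow(g, 2 ** (s - 1), p)                # g^(2^(s-1)) mod p
    for eps in (0, 1):                         # the guessed bit
        b = a * pow(h, -eps, p) % p            # b = a * (g^(2^(s-1)))^(-eps)
        x_rest = dlog_bitguess(g, b, p, s - 1)
        if x_rest is not None:
            return eps * 2 ** (s - 1) + x_rest
    return None
```

With a certificate (the correct bit at each level), only one branch is followed and the work becomes polynomial in log p.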
A k-way guess can be replaced by ⌈log_2 k⌉ guesses of bits. It is customary
to assume that each bit can be guessed in unit time. Therefore, the above
guessing algorithm for sorting takes O(n log n) time, since each guess involves
choosing one of at most n possibilities. Guessing helps us to reduce the running
time from O(n^2) to O(n log n). But sorting is a problem which already has
O(n log n)-time algorithms (like merge sort) which do not require guessing.
Let A be a guessing algorithm for a problem P . Suppose that the number
of bit guesses made by A on an input of size n is bounded from above by a
polynomial in n. Suppose also that if each bit is correctly guessed, the correct
solution is supplied by A in time polynomial in n. If guessing each bit takes
unit time, A is a polynomial-time guessing algorithm for P . We say that A is a
non-deterministic polynomial-time algorithm for solving P . Such an algorithm
can be equivalently characterized in terms of certificates. A certificate is a
specification of the sequence of correct guesses. Given a certificate, we can
solve the problem P using A in polynomial time. Moreover, since there are only
a polynomial number of guesses made by A, each certificate must be succinct
(that is, of size no larger than a polynomial in n). A non-trivial example of
certificates is provided in Chapter 5 (Pratt certificates for primality).
The class of problems having non-deterministic polynomial-time algo-
rithms is called NP. A non-deterministic algorithm is allowed to make zero
guesses, so P ⊆ NP. Whether NP ⊆ P too is not known. The P = NP prob-
lem is the deepest open problem of computer science. The Clay Mathematics
Institute has declared an award of one million US dollars for solving it.
A.1.5 Randomized Algorithms
For certain computational problems in the complexity class NP, there is
an interesting way to deal with non-determinism. Suppose that a significant
fraction (like 1/2) of all possible guesses for any given input leads to a solution
of the problem. We replace guesses by random choices, and run the polynomial-
time verifier on these choices. If this procedure is repeated a few times, we
hope to arrive at the solution in at least one of these runs.
First, consider decision problems (problems with Yes/No answers). For
instance, the complement of the primality-testing problem, that is, checking
the compositeness of n ∈ N, is in NP, since a non-trivial divisor d of n (a divisor
in the range 2 ≤ d ≤ n − 1) is a succinct certificate for the compositeness of
n. This certificate can be verified easily by carrying out a division of n by d.
But then, how easy is it to guess such a non-trivial divisor of a composite
n? If n is the square of a prime, there exists only one such non-trivial divisor.
A random guess reveals this divisor with a probability of about 1/n, which
is exponentially small in the input size log n. Even when n is not of this
particular form, it has only a few non-trivial divisors, and trying to find one
by chance is like searching for a needle in a haystack.
An idea based upon Fermat’s little theorem leads us to more significant
developments. The theorem states that if p is prime, and a is not a multiple of
p, then a^(p−1) ≡ 1 (mod p). Any a (coprime to n) satisfying a^(n−1) ≢ 1 (mod n) is
a witness (certificate) to the compositeness of n. We know that if a composite
n has at least one witness, then at least half of Z∗n are witnesses too. Therefore,
it makes sense that we randomly choose an element of Zn , and verify whether
our choice is really a witness. If n is composite (and has a witness), then after
only a few random choices, we hope to locate one witness, and become certain
about the fact that n is composite. However, if no witness is located in several
iterations, there are two possibilities: n does not have a witness at all, or
despite n having witnesses, we have been so unlucky that we missed them in
all these iterations. In this case, we declare n as prime with the understanding
that this decision may be wrong, albeit with a small probability.
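A minimal sketch of the Fermat test along these lines (the function name is ours; as noted below, a practical test would use Solovay–Strassen or Miller–Rabin instead):

```python
import random
from math import gcd

def is_probable_prime_fermat(n, iterations=20):
    """No-biased Monte Carlo test: a return value of False ('composite')
    is always correct; True ('probably prime') may rarely be wrong, and
    is systematically wrong for Carmichael numbers."""
    if n < 4:
        return n in (2, 3)
    for _ in range(iterations):
        a = random.randrange(2, n - 1)
        if gcd(a, n) != 1:
            return False                  # a shares a factor with n
        if pow(a, n - 1, n) != 1:
            return False                  # a is a Fermat witness
    return True
```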
The Fermat test exemplifies how randomization helps us to arrive at prac-
tical solutions to computational problems. A Monte Carlo algorithm is a ran-
domized algorithm which always runs fast but may supply a wrong answer
(with low probability). The Fermat test is No-biased, since the answer No
comes with zero probability of error. Yes-biased Monte Carlo algorithms, and
Monte Carlo algorithms with two-sided errors may be conceived of.
The problem with the Fermat test is that it deterministically fails to
find witnesses for Carmichael numbers. The Solovay–Strassen and the Miller–
Rabin tests are designed to get around this problem. There are deterministic
polynomial-time primality tests (like the AKS test), but the randomized tests
are much more efficient and practical than the deterministic tests.
Another type of randomized algorithms needs mention in this context. A
Las Vegas algorithm is a randomized algorithm that always produces the cor-
rect answer, but has a fast expected running time. This means that almost
always we expect a Las Vegas algorithm to terminate fast, but on rare occa-
sions, we may be so unlucky about the random guesses made in the algorithm
that the algorithm fails to supply the (correct) answer for a very long time.
Root-finding algorithms for polynomials over large finite fields (Section 3.2)
are examples of Las Vegas algorithms. Let f (x) ∈ Fq [x] be a polynomial with
q odd. For a random α ∈ Fq , the polynomial gcd((x + α)^((q−1)/2) − 1, f (x)) is
a non-trivial factor of f (x) with probability ≥ 1/2. Therefore, computing this
gcd for a few random values of α is expected to produce a non-trivial split of
f (x). The algorithm is repeated recursively on the two factors of f (x) thus
revealed. If f is of degree d, we need d − 1 splits to obtain all the roots of f .
However, we may be so unlucky that a very huge number of choices for α fails
to produce a non-trivial split of a polynomial. Certainly, such a situation is
rather unlikely, and that is the reason why Las Vegas algorithms are useful in
practice. It is important to note that no deterministic algorithm that runs in
time polynomial in log q is known to solve this root-finding problem.
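The splitting idea can be sketched with naive polynomial arithmetic over F_p (all helper names are ours; the sketch assumes p is an odd prime and f is monic with distinct roots in F_p — in general one would first reduce to this case):

```python
import random

# Polynomials over F_p as coefficient lists, lowest degree first.

def poly_divmod(a, m, p):
    """Quotient and remainder of a by m over F_p (m non-zero)."""
    a, q = a[:], [0] * max(len(a) - len(m) + 1, 0)
    inv = pow(m[-1], -1, p)
    while len(a) >= len(m):
        c = a[-1] * inv % p
        q[len(a) - len(m)] = c
        for i in range(len(m)):
            a[len(a) - len(m) + i] = (a[len(a) - len(m) + i] - c * m[i]) % p
        while a and a[-1] == 0:
            a.pop()
    return q, a

def poly_mul(a, b, p):
    res = [0] * (len(a) + len(b) - 1) if a and b else []
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] = (res[i + j] + x * y) % p
    return res

def poly_gcd(a, b, p):
    while b:
        a, b = b, poly_divmod(a, b, p)[1]
    return [c * pow(a[-1], -1, p) % p for c in a] if a else []   # monic

def poly_pow_mod(base, e, f, p):
    """base^e modulo f over F_p, by square and multiply."""
    result, base = [1], poly_divmod(base, f, p)[1]
    while e:
        if e & 1:
            result = poly_divmod(poly_mul(result, base, p), f, p)[1]
        base = poly_divmod(poly_mul(base, base, p), f, p)[1]
        e >>= 1
    return result

def roots(f, p):
    """All roots of f in F_p (Las Vegas: always correct, fast on average)."""
    if len(f) <= 1:
        return []
    if len(f) == 2:                               # linear: c0 + c1*x
        return [(-f[0]) * pow(f[1], -1, p) % p]
    while True:
        alpha = random.randrange(p)
        h = (poly_pow_mod([alpha, 1], (p - 1) // 2, f, p) or [0])[:]
        h[0] = (h[0] - 1) % p                     # (x+alpha)^((p-1)/2) - 1
        while h and h[-1] == 0:
            h.pop()
        g = poly_gcd(f, h, p)
        if 1 < len(g) < len(f):                   # non-trivial split of f
            return sorted(roots(g, p) + roots(poly_divmod(f, g, p)[0], p))
```

An unlucky run simply draws more values of α; the answer, once produced, is always correct.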
As another example, let us compute a random prime of a given bit length
l. By the prime number theorem, the number of primes < 2^l is about 2^l/(0.693 l).
Therefore, if we randomly try O(l) l-bit integers, we expect with high probabil-
ity that at least one of these candidates is a prime. We subject each candidate
to a polynomial-time (in l) primality test (deterministic or randomized) until
a prime is located. The expected running time of this algorithm is polynomial
in l. But chances remain, however small, that even after trying a large number
of candidates, we fail to encounter a prime.
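A sketch of this procedure, using the Miller–Rabin test mentioned above (function names and the round count are our choices):

```python
import random

def is_probable_prime(n, rounds=25):
    """Miller-Rabin: No-biased Monte Carlo primality test."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13):
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False                  # a witnesses compositeness
    return True

def random_prime(l):
    """Pick random odd l-bit integers until one passes the test;
    expected running time polynomial in l by the prime number theorem."""
    while True:
        n = random.randrange(2 ** (l - 1), 2 ** l) | 1
        if is_probable_prime(n):
            return n
```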
Randomized algorithms (also called probabilistic algorithms) are quite use-
ful in number-theoretic computations. They often are the most practical
among all known algorithms, and sometimes the only known polynomial-time
algorithms. However, there are number-theoretic problems for which even ran-
domization does not help much. For example, the best known algorithms for
factoring integers and for computing discrete logarithms in finite fields have
randomized flavors, and are better than the best deterministic algorithms
known for solving these problems, but the improvement in the running time
is from exponential to subexponential only.
A.2 Discrete Algebraic Structures
Computational number theory often deploys algebraic tools. It is, there-
fore, useful to have an understanding of discrete mathematical structures.

A.2.1 Functions and Operations
A function or map f : A → B is an association of each element a ∈ A with
an element b ∈ B. We say that a maps to b, or b is the image of a under f ,
or f (a) = b. A function f : A → B is called B-valued. The set A is called the
domain of f , and the set B the range of f .
For functions f : A → B and g : B → C, the composition g ◦ f is the
function A → C defined as (g ◦ f )(a) = g(f (a)) for all a ∈ A.
A function f : A → B is called injective or one-one if for any two different
elements a1 , a2 ∈ A, the images f (a1 ) and f (a2 ) are different.
A function f : A → B is called surjective or onto if for every b ∈ B, there
exists (at least) one a ∈ A with b = f (a).
A function f : A → B which is both injective and surjective is called
bijective or a bijection or a one-to-one correspondence between A and B. For
a bijective function f : A → B, the inverse function f −1 : B → A is defined
as: f −1 (b) = a if and only if f (a) = b. Since f is bijective, f −1 (b) is a unique
element of A for every b ∈ B. In this case, we have f ◦ f −1 = idB and
f −1 ◦ f = idA , where idX is the identity function on the set X, that is, the
function X → X that maps every element x ∈ X to itself. Conversely, if
f : A → B and g : B → A satisfy f ◦ g = idB and g ◦ f = idA , then both f
and g are bijections, and g = f −1 and f = g −1 .
A unary operation on a set A is a function A → A. Negation of integers
is, for instance, a unary operator on the set Z of all integers.
A binary operation on a set A is a function A × A → A (where × stands
for the Cartesian product of two sets). If ∗ : A × A → A is a binary operation,
the value ∗(a1 , a2 ) is usually specified in the infix notation, that is, as a1 ∗ a2 .
Addition is, for instance, a binary operation on Z, and we customarily write
a1 + a2 instead of +(a1 , a2 ).
An n-ary operation on a set A is a function A × A × · · · × A → A (n copies of A in the product).

A.2.2 Groups
We now study sets with operations.

Definition A.12 A group is a set G together with a binary operation ⋄ satisfying the following three properties.
(1) a ⋄ (b ⋄ c) = (a ⋄ b) ⋄ c for all a, b, c ∈ G (Associativity).
(2) There exists an element e ∈ G such that e ⋄ a = a ⋄ e = a for all a ∈ G.
The element e is called the identity element of G. The identity element of a
group is unique. If ⋄ is an addition operation, we often write the identity as
0. If ⋄ is a multiplicative operation, we often write the identity as 1.
(3) For every element a ∈ G, there exists an element b ∈ G such that
a ⋄ b = b ⋄ a = e. The element b is called the inverse of a, and is unique for
every element a. If ⋄ is an addition operation, we often write b = −a, whereas
if ⋄ is a multiplication operation, we write b = a−1 .
If, in addition, G satisfies the following property, we call G a commutative
group or an Abelian group.
(4) a ⋄ b = b ⋄ a for all a, b ∈ G. ⊳
Example A.13 (1) The set Z of integers is an Abelian group under addition.
The identity in this group is 0, and the inverse of a is −a. Multiplication is
an associative and commutative operation on Z, and 1 is the multiplicative
identity, but Z is not a group under multiplication, since multiplicative inverses
exist in Z only for the elements ±1.
(2) The set Q of rational numbers is an Abelian group under addition. The
set Q∗ = Q\{0} of non-zero rational numbers is a group under multiplication.
Likewise, R (the set of real numbers) and C (complex numbers) are additive
groups, and their multiplicative groups are R∗ = R \ {0} and C∗ = C \ {0}.
(3) The set A[x] of polynomials over A in one variable x (where A is Z,
Q, R or C) is a group under polynomial addition. Non-zero polynomials do
not form a group under polynomial multiplication, since inverses do not exist
for all elements of A[x] (like x). These results can be generalized to the set
A[x1 , x2 , . . . , xn ] of multivariate polynomials over A.
(4) The set of all m×n matrices (with integer, rational, real or complex en-
tries) is a group under matrix addition. The set of all invertible n × n matrices
with rational, real or complex entries is a group under matrix multiplication. ¤
Existence of inverses in groups leads to the following cancellation laws.
Proposition A.14 Let a, b, c be elements of a group G (with operation ⋄). If
a ⋄ b = a ⋄ c, then b = c. Moreover, if a ⋄ c = b ⋄ c, then a = b. ⊳
Definition A.15 Let G be a group under ⋄ . A subset H of G is called a
subgroup of G if H is also a group under the operation ⋄ inherited from G.
In order that a subset H of G is a subgroup, it suffices that H is closed under
the operation ⋄ and also under taking inverses, or equivalently if a ⋄ b−1 ∈ H
for all a, b ∈ H, where b−1 is the inverse of b in G. ⊳
Example A.16 (1) Z is a subgroup of R (under addition).
(2) Z is a subgroup of Z[x] (under addition). The set of all polynomials
in Z[x] with even constant terms is another subgroup of Z[x].
(3) The set of n × n matrices with determinant 1 is a subgroup of all n × n
invertible matrices (under matrix multiplication). ¤
For the time being, let us concentrate on multiplicative groups, that is,
groups under some multiplication operations. This is done only for notational
convenience. The theory is applicable to groups under any operations.
Definition A.17 Let G be a group, and H a subgroup. For a ∈ G, the set
aH = {ah | h ∈ H} is called a left coset of H in G. Likewise, for a ∈ G, the
set Ha = {ha | h ∈ H} is called a right coset of H in G. ⊳
Proposition A.18 Let G be a group (multiplicatively written), H a subgroup,
and a, b ∈ G. The left cosets aH and bH are in bijection with one another. For
every a, b ∈ G, we have either aH = bH or aH ∩ bH = ∅. We have aH = bH
if and only if a−1 b ∈ H. ⊳
A similar result holds for right cosets too. The last statement should be
modified as: Ha = Hb if and only if ab−1 ∈ H.
Definition A.19 Let H be a subgroup of G. The number of cosets (left or
right, not both) of H in G is called the index of H in G, denoted [G : H]. ⊳
Proposition A.18 says that the left cosets (also the right cosets) form a
partition of G. It therefore follows that:
Corollary A.20 [Lagrange’s theorem] Let G be a finite group, and H a sub-
group. Then, the size of G is an integral multiple of the size of H. Indeed, we
have |G| = [G : H]|H|. ⊳
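These notions are easy to experiment with in a small multiplicative group. The following sketch (names ours) lists the cosets of a subgroup of Z∗15 and checks Lagrange's theorem on it:

```python
from math import gcd

def units_mod(n):
    """The multiplicative group Z_n^* = {a in Z_n : gcd(a, n) = 1}."""
    return {a for a in range(1, n) if gcd(a, n) == 1}

def cosets(G, H, n):
    """Distinct cosets aH of a subgroup H in G, multiplication mod n."""
    return {frozenset(a * h % n for h in H) for a in G}

# G = Z_15^*, H = {1, 4} (4^2 = 16 ≡ 1 (mod 15), so H is a subgroup)
G = units_mod(15)
H = {1, 4}
C = cosets(G, H, 15)
# Lagrange: the cosets partition G, and |G| = [G : H] * |H|
```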
A particular type of subgroup is of special concern to us.
Proposition A.21 For a subgroup H of G, the following conditions are equi-
valent:
(a) aH = Ha for all a ∈ G.
(b) aHa−1 = H for all a ∈ G (where aHa−1 = {aha−1 | h ∈ H}).
(c) aha−1 ∈ H for all a ∈ G and for all h ∈ H. ⊳
Definition A.22 If H satisfies these equivalent conditions, it is called a nor-
mal subgroup of G. Every subgroup of an Abelian group is normal. ⊳
Definition A.23 Let H be a normal subgroup of G, and G/H denote the
set of cosets (left or right) of H in G. Define an operation on G/H as
(aH)(bH) = abH.
It is easy to verify that this is a well-defined binary operation on G/H, and
that G/H is again a group under this operation. H = eH (where e is the
identity in G) is the identity in G/H, and the inverse of aH is a−1 H. We call
G/H the quotient of G with respect to H. ⊳
Example A.24 (1) Take the group Z under addition, n ∈ N, and H =
nZ = {na | a ∈ Z}. Then, H is a subgroup of Z. Since Z is Abelian, H is
normal. We have [Z : H] = n. Indeed, all the cosets of H in Z are a + nZ for
a = 0, 1, 2, . . . , n − 1. We denote the set of these cosets as Zn = Z/nZ.
(2) Let G = Z[x] (additive group), and H the set of polynomials in G
with even constant terms. H is normal in G. In this case, [G : H] = 2. The
quotient group G/H contains only two elements: H and 1 + H. ¤
In many situations, we deal with quotient groups. In general, the elements
of a quotient group are sets. It is convenient to identify some particular element
of each coset as the representative of that coset. When we define the group
operation on the quotient group, we compute the representative of the result
from the representatives standing for the operands. For example, Zn is often
identified as the set {0, 1, 2, . . . , n − 1}, and the addition of Zn is rephrased in
terms of modular addition. Algebraically, an element a ∈ Zn actually stands
for the coset a + nZ = {a + kn | k ∈ Z}. The addition of cosets (a + nZ) +
(b + nZ) = (a + b) + nZ is consistent with addition modulo n.
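The representative convention can be illustrated concretely (a toy sketch with n = 7; names ours — a coset is infinite, so only a finite window of it is generated):

```python
n = 7

def coset(a, window=3):
    """A finite window onto the coset a + nZ (illustration only)."""
    return {a + k * n for k in range(-window, window + 1)}

def add_cosets(a, b):
    """Representative in {0, ..., n-1} of (a + nZ) + (b + nZ)."""
    return (a + b) % n
```

Whatever members x ∈ a + nZ and y ∈ b + nZ we pick, x + y always lands in the same coset, so the operation on representatives is well defined.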
Definition A.25 Let G1 , G2 be (multiplicative) groups. A function f : G1 →
G2 is called a (group) homomorphism if f commutes with the group opera-
tions, that is, f (ab) = f (a)f (b) for all a, b ∈ G1 . A bijective homomorphism
is called an isomorphism. For an isomorphism f : G1 → G2 , the inverse
f −1 : G2 → G1 is again a homomorphism (indeed, an isomorphism too). ⊳

Theorem A.26 [Isomorphism theorem] Let f : G1 → G2 be a group homomorphism. Define the kernel of f as ker f = {a ∈ G1 | f (a) = e2 } (where e2
is the identity of G2 ). Define the image of f as Im f = {b ∈ G2 | b = f (a)
for some a ∈ G1 }. Then, ker f is a normal subgroup of G1 , Im f is a subgroup
of G2 , and G1 / ker f is isomorphic to Im f under the map a ker f 7→ f (a). ⊳

Definition A.27 Let S be a subset of a group G (multiplicative). The set
of all finite products of elements of S and their inverses is a subgroup of G,
called the subgroup generated by S. If G is generated by a single element g,
we call G a cyclic group generated by g, and g a generator of G. ⊳

Example A.28 (1) Z (under addition) is a cyclic group generated by 1.
(2) For every n ∈ N, the additive group Zn is cyclic (and generated by 1).
(3) The subset of Zn , consisting of elements invertible modulo n, is a group
under multiplication modulo n. We denote this group by Z∗n . Z∗n is cyclic if
and only if n = 1, 2, 4, pr , 2pr , where p is any odd prime, and r ∈ N. ¤

Theorem A.29 Let G be a cyclic group. If G is infinite, then G is isomorphic
to the additive group Z. If G is finite, it is isomorphic to the additive group
Zn for some n ∈ N. ⊳

Proposition A.30 Let G be a finite cyclic group of size n. The number of
generators of G is φ(n). ⊳

Example A.31 The size of Z∗n is φ(n). If Z∗n is cyclic, then the number
of generators of Z∗n is φ(φ(n)). In particular, for a prime p, the number of
generators of Z∗p is φ(p − 1). ¤
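This count is easy to verify for a small prime by brute force (helper names ours):

```python
from math import gcd

def order_mod(a, n):
    """Multiplicative order of a modulo n (requires gcd(a, n) = 1)."""
    x, r = a % n, 1
    while x != 1:
        x = x * a % n
        r += 1
    return r

def phi(n):
    """Euler's totient, by direct count (fine for small n)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

p = 13   # Z_13^* is cyclic of size p - 1 = 12
generators = [a for a in range(1, p) if order_mod(a, p) == p - 1]
```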

Proposition A.32 Let G be a finite cyclic group (multiplicative) with identity e, and H a subgroup of G of size m. Then, H is cyclic too, and consists precisely of those elements a of G which satisfy a^m = e. ⊳

Definition A.33 Let G be a group (multiplicative), and a ∈ G. If the elements a, a^2, a^3, . . . are all distinct from one another, we say that a is of
infinite order. In that case, a generates a subgroup of G, isomorphic to Z.
If a is not of infinite order, then the smallest positive integer r for which
a^r = a · a · · · a (r factors) = e (the identity of G) is called the order of a, denoted
ord a. If G is finite, ord a divides the size of G (by Corollary A.20). ⊳
Definition A.34 A group G generated by a finite subset S of G is called
finitely generated. In particular, cyclic groups are finitely generated. ⊳

Example A.35 (1) The Cartesian product Z × Z (under component-wise
addition) is generated by the two elements (0, 1) and (1, 0), but is not cyclic.
More generally, the r-fold Cartesian product Zr (under addition) is generated
by r elements (1, 0, 0, . . . , 0), (0, 1, 0, . . . , 0), . . . , (0, 0, . . . , 0, 1).
(2) The group Z∗8 (under multiplication) is not cyclic, but generated by
−1 and 5. Indeed, for any t ≥ 3, the group Z∗2t is generated by −1 and 5. ¤

Theorem A.36 [Structure theorem for finitely generated Abelian groups] Any
finitely generated Abelian group G is isomorphic to the additive group Zr ×
Zn1 × Zn2 × · · · × Zns for some integers r, s ≥ 0 and n1 , n2 , . . . , ns > 0 with
the property that ni | ni−1 for all i = 2, 3, . . . , s. ⊳

Definition A.37 The integer r in Theorem A.36 is called the rank of the
Abelian group G. (For finite Abelian groups, r = 0.) All elements of G with
finite orders form a subgroup of G, isomorphic to Zn1 × Zn2 × · · · × Zns , called
the torsion subgroup of G. ⊳

A.2.3 Rings and Fields
Rings are sets with two binary operations.

Definition A.38 A ring is a set R with two binary operations + (addition)
and · (multiplication) satisfying the following conditions.
(1) R is an Abelian group under + .
(2) For every a, b, c ∈ R, we have a(bc) = (ab)c (Associative).
(3) There exists an element 1 ∈ R such that 1 · a = a · 1 = a for all a ∈ R.
The element 1 is called the multiplicative identity of R. (In some texts, the
existence of the multiplicative identity is not part of the definition of a ring.
We, however, assume that rings always come with multiplicative identities.)
(4) The operation · distributes over +, that is, for all a, b, c ∈ R, we have
a(b + c) = ab + ac and (a + b)c = ac + bc.
In addition to these requirements, we assume that:
(5) Multiplication in R is commutative, that is, ab = ba for all a, b ∈ R.
(This is again not part of the standard definition of a ring.) ⊳

Definition A.39 A ring in which 0 = 1 is the ring R = {0}. This is called
the zero ring. ⊳

Definition A.40 Let R be a non-zero ring. An element a ∈ R is called a zero
divisor if ab = 0 for some non-zero b ∈ R. Clearly, 0 is a zero divisor in R. If
R contains no non-zero zero divisors, we call R an integral domain. ⊳
In an integral domain, cancellation holds with respect to multiplication,
that is, ab = ac with a 6= 0 implies b = c.

Definition A.41 Let R be a non-zero ring. An element a ∈ R is called a unit
if ab = 1 for some b ∈ R. The set of all units in R is a group under the ring
multiplication. We denote this group by R∗ . If R∗ = R \ {0} (that is, if every
non-zero element of R is a unit), we call R a field. ⊳

Example A.42 (1) Z is an integral domain, but not a field. The only units
of Z are ±1.
(2) Z[i] = {a + ib | a, b ∈ Z} ⊆ C is an integral domain called the ring of
Gaussian integers. The only units of Z[i] are ±1, ±i.
(3) Q, R, C are fields.
(4) Zn under addition and multiplication modulo n is a ring. The units
of Zn constitute the group Z∗n . Zn is a field if and only if n is prime. For a
prime p, the field Zp is also denoted as Fp .
(5) If R is a ring, the set R[x] of all univariate polynomials with coefficients
from R is a ring. If R is an integral domain, so too is R[x]. R[x] is never a
field. We likewise have the ring R[x1 , x2 , . . . , xn ] of multivariate polynomials.
(6) Let R be a ring. The set R[[x]] of (infinite) power series over R is a
ring. If R is an integral domain, so also is R[[x]].
(7) Let R be a field. The set R(x) = {f (x)/g(x) | f (x), g(x) ∈ R[x], g(x) 6=
0} of rational functions over R is again a field.
(8) The set of all n × n matrices (with elements from a field) is a non-
commutative ring. It contains non-zero zero divisors (for n > 1).
(9) The Cartesian product R1 × R2 × · · · × Rn of rings R1 , R2 , . . . , Rn is
again a ring under element-wise addition and multiplication operations. ¤

Proposition A.43 A field is an integral domain. A finite integral domain is
a field. ⊳

Definition A.44 Let R be a (non-zero) ring. If the elements 1, 1 + 1, 1 + 1 + 1, . . . are all distinct from one another, we say that the characteristic of R is zero. Otherwise, the smallest positive integer r for which 1 + 1 + · · · + 1 = 0 (the sum of r copies of 1)
is called the characteristic of R, denoted char R. ⊳

Example A.45 (1) Z, Q, R, C are rings of characteristic zero.
(2) The rings R, R[x], R[x1 , x2 , . . . , xn ] are of the same characteristic.
(3) The characteristic of Zn is n.
(4) If an integral domain has a positive characteristic c, then c is prime. ¤

Like normal subgroups, a particular type of subsets of a ring takes part in
the formation of quotient rings.
Definition A.46 Let R be a ring. A subset I ⊆ R is called an ideal of R if I
is a subgroup of R under addition, and if ra ∈ I for all r ∈ R and a ∈ I. ⊳

Example A.47 (1) All ideals of Z are nZ = {na | a ∈ Z} for n ∈ N0 .
(2) Let S be any subset of a ring R. The set of all finite sums of the form
ra with r ∈ R and a ∈ S is an ideal of R. We say that this ideal is generated
by S. This ideal is called finitely generated if S is a finite set. The ideal nZ of
Z is generated by the single integer n.
(3) The only ideals of a field K are {0} and K. ¤

Definition A.48 An ideal in a ring, generated by a single element, is called
a principal ideal. An integral domain in which every ideal is principal is called
a principal ideal domain or a PID. ⊳

Example A.49 (1) Z is a principal ideal domain.
(2) For a field K, the polynomial ring K[x] is a principal ideal domain.
(3) The polynomial ring K[x, y] over a field K is not a PID. Indeed, the
ideal of K[x, y], generated by {x, y}, is not principal. ¤

Definition A.50 Let R be an integral domain.
(1) Let a, b ∈ R. We say that a divides b (in R) if b = ac for some c ∈ R.
We denote this as a|b.
(2) A non-zero non-unit a of R is called a prime in R if a|(bc) implies a|b
or a|c for all b, c ∈ R.
(3) A non-zero non-unit a of R is called irreducible if any factorization
a = bc of a in R implies that either b or c is a unit. ⊳

Theorem A.51 A prime in an integral domain is irreducible. An irreducible
element in a PID is prime. ⊳

Definition A.52 An integral domain in which every non-zero element can
be uniquely expressed as a product of primes is called a unique factorization
domain or a UFD. Here, uniqueness is up to rearrangement of the prime factors
and up to multiplications of the factors by units. ⊳

Theorem A.53 Every PID is a UFD. In particular, Z and K[x] (where K
is a field) are UFDs. ⊳

Example A.54 Not every integral domain supports unique factorization.
Consider Z[√−5 ] = {a + b√−5 | a, b ∈ Z}. It is easy to see that Z[√−5 ] is a
ring. Being a subset of C, it is an integral domain too. We have 6 = 2 × 3 =
(1 + √−5 )(1 − √−5 ). Here, 2, 3, 1 ± √−5 are irreducible but not prime. ¤

Let us now look at the formation of quotient rings.
Definition A.55 Let I be an ideal in a ring R. Then, I is a (normal) subgroup
of the additive Abelian group R, and so R/I is a group under the addition
law (a + I) + (b + I) = (a + b) + I. We can also define multiplication on R/I
as (a + I)(b + I) = (ab) + I. These two operations make R/I a ring called the
quotient ring of R with respect to I. ⊳

Example A.56 For the ideal nZ = {an | a ∈ Z} of Z generated by n ∈ N,
the quotient ring Z/nZ is denoted by Zn . If the cosets of nZ are represented by
0, 1, 2, . . . , n − 1, this is same as Zn defined in terms of modular operations. ¤

Definition A.57 Let R1 , R2 be rings. A function f : R1 → R2 is called a
(ring) homomorphism if f commutes with the addition and multiplication
operations of the rings, that is, if f (a + b) = f (a) + f (b) and f (ab) = f (a)f (b)
for all a, b ∈ R1 . We additionally require f to map the multiplicative identity
of R1 to the multiplicative identity of R2 .
A bijective homomorphism is called an isomorphism. For an isomorphism
f : R1 → R2 , the inverse f −1 : R2 → R1 is again an isomorphism. ⊳

Theorem A.58 [Isomorphism theorem] Let f : R1 → R2 be a ring homomorphism. The set ker f = {a ∈ R1 | f (a) = 0} is called the kernel of f , and the
set Im f = {b ∈ R2 | b = f (a) for some a ∈ R1 } is called the image of f . Then,
ker f is an ideal of R1 , Im f is a ring (under the operations inherited from
R2 ), and R1 / ker f is isomorphic to Im f under the map a + ker f 7→ f (a). ⊳

Definition A.59 Let I, J be ideals of a ring R. By I + J, we denote the set
of all elements a + b of R with a ∈ I and b ∈ J. I + J is again an ideal of R.
The ideals I and J are called coprime (or relatively prime) if I + J = R.
The set I ∩ J is also an ideal of R, called the intersection of I and J.
The set of all finite sums of the form a1 b1 + a2 b2 + · · · + an bn with n > 0, ai ∈ I and
bi ∈ J is again an ideal of R, called the product IJ of the ideals I and J. ⊳

Theorem A.60 [Chinese remainder theorem] Let I1 , I2 , . . . , In be ideals in a
ring R with Ii + Ij = R for all i, j with i 6= j. The ring R/(I1 ∩ I2 ∩ · · · ∩ In )
is isomorphic to the Cartesian product (R/I1 ) × (R/I2 ) × · · · × (R/In ) under
the map a + (I1 ∩ I2 ∩ · · · ∩ In ) 7→ (a + I1 , a + I2 , . . . , a + In ). ⊳
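For R = Z with the pairwise coprime ideals ni Z, this is the classical Chinese remainder theorem, and the isomorphism and its inverse are directly computable. A small sketch (names ours):

```python
def to_residues(a, moduli):
    """The CRT map a + nZ -> (a + n1*Z, ..., a + nk*Z), n = n1*...*nk."""
    return tuple(a % m for m in moduli)

def from_residues(residues, moduli):
    """Invert the CRT map for pairwise coprime moduli."""
    n = 1
    for m in moduli:
        n *= m
    a = 0
    for r, m in zip(residues, moduli):
        q = n // m                        # product of the other moduli
        a = (a + r * q * pow(q, -1, m)) % n
    return a
```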

Definition A.61 If an ideal I of R contains a unit u, then by definition
1 = u−1 u ∈ I too, so every a ∈ R is in I too (a = a × 1), that is, I = R. This
is why R is called the unit ideal of R. Every ideal which is a proper subset
of R does not contain any unit of R, and is called a proper ideal of R. The
singleton subset {0} in any ring is an ideal called the zero ideal. ⊳

Definition A.62 A proper ideal I of a ring R is called prime if the following
condition holds: Whenever a product ab of elements a, b of R is a member of
I, we have either a ∈ I or b ∈ I (or both). ⊳

Definition A.63 A proper ideal I of a ring R is called maximal if for any
ideal J of R satisfying I ⊆ J ⊆ R, we have either J = I or J = R. ⊳
492 Computational Number Theory

Theorem A.64 An ideal I in a ring R is prime if and only if the quotient ring R/I is an integral domain. An ideal I in a ring R is maximal if and
only if the quotient ring R/I is a field. Since every field is an integral domain,
every maximal ideal in a ring is prime. ⊳
Example A.65 (1) Let n ∈ N0 . The set nZ of all multiples of n is an ideal
of Z. This ideal is prime if and only if n is either zero or a prime.
(2) Let K be a field, and f (x) ∈ K[x]. All the polynomial multiples of
f (x) form an ideal in K[x], which is prime if and only if f (x) is either zero or
an irreducible polynomial in K[x].
(3) In Z and K[x] (where K is a field), a non-zero (proper) ideal is prime
if and only if it is maximal.
(4) A ring R is an integral domain if and only if the zero ideal is prime.
(5) The converse of the last assertion of Theorem A.64 is not true: a prime ideal need not be maximal. In the ring R = Z[x] of
univariate polynomials with integer coefficients, the ideals I = {xf (x) | f (x) ∈
Z[x]} and J = {xf (x) + 2g(x) | f (x), g(x) ∈ Z[x]} are both prime. Indeed,
I consists of all polynomials with zero constant terms, whereas J consists of
all polynomials with even constant terms. I is properly contained in J and
cannot be maximal. J is a maximal ideal in R = Z[x]. The quotient ring R/J
is isomorphic to the field Z2 . On the other hand, the quotient ring R/I is
isomorphic to Z which is an integral domain but not a field. ¤

A.2.4 Vector Spaces


Vector spaces play very important roles in algebra and linear algebra.
Definition A.66 Let K be a field. An additive Abelian group V is called a
vector space over K (or a K-vector space) if there is a scalar multiplication
map · : K × V → V satisfying the following properties:
(1) 1 · x = x for all x ∈ V .
(2) (a + b)x = ax + bx for all a, b ∈ K and x ∈ V .
(3) a(x + y) = ax + ay for all a ∈ K and x, y ∈ V .
(4) a(bx) = (ab)x for all a, b ∈ K and x ∈ V . ⊳
Example A.67 (1) For any n ∈ N, the Cartesian product K n is a K-vector
space under the scalar multiplication a(x1 , x2 , . . . , xn ) = (ax1 , ax2 , . . . , axn ).
In particular, K itself is a K-vector space.
(2) More generally, if V1 , V2 , . . . , Vn are K-vector spaces, then so also is
their Cartesian product V1 × V2 × · · · × Vn .
(3) Let K, L be fields with K ⊆ L. Then, L is a K-vector space with
scalar multiplication defined as the multiplication of L. For example, C is a
vector space over R, and R is a vector space over Q.
(4) The polynomial rings K[x] and K[x1 , x2 , . . . , xn ] for any n ∈ N are
K-vector spaces. ¤
Background 493

Definition A.68 Let V be a vector space over K.


A subset S ⊆ V is (or the elements of S are) called linearly independent
over K if any finite sum of the form a1 x1 + a2 x2 + · · · + an xn with n ∈ N0 ,
ai ∈ K, and xi ∈ S is zero only if a1 = a2 = · · · = an = 0.
A subset S ⊆ V is (or the elements of S are) said to generate V if every
element of V can be written in the form a1 x1 + a2 x2 + · · · + an xn with n ∈ N0 ,
ai ∈ K, and xi ∈ S. If S is finite, V is called finitely generated over K. ⊳
Theorem A.69 For a subset S of a K-vector space V , the following two
conditions are equivalent.
(a) S is a maximal linearly independent (over K) subset of V (that is,
S ∪ {x} is linearly dependent over K for any x ∈ V \ S).
(b) S is a minimal generating subset of V over K (that is, no proper subset
of S generates V as a K-vector space). ⊳
Definition A.70 A subset S of a K-vector space V , satisfying the two equiv-
alent conditions of Theorem A.69, is called a K-basis of V . ⊳
Theorem A.71 Any two K-bases S, T of a K-vector space V are in bijection
with one another. In particular, if S, T are finite, they are of the same size. ⊳
Definition A.72 The size of any K-basis of a K-vector space V is called the
dimension of V over K, denoted by dimK V . If dimK V is finite, V is called a
finite-dimensional vector space over K. ⊳
Example A.73 (1) The dimension of K n over K is n. More generally, the
dimension of V1 × V2 × · · · × Vn is d1 + d2 + · · · + dn , where di is the dimension
of the K-vector space Vi .
(2) The dimension of C over R is 2, since 1, i constitute an R-basis of C.
R is not a finite-dimensional vector space over Q.
(3) For a field K, the polynomial ring K[x] has a basis {1, x, x2 , . . .}. In
particular, K[x] is not finite-dimensional as a K-vector space.
(4) Let x1 , x2 , . . . , xn be Boolean (that is, {0, 1}-valued) variables. The
set V of all Boolean functions of x1 , x2 , . . . , xn is an F2 -vector space. The size of V is 2^(2^n) , so dimF2 V = 2^n . The product (logical AND) y1 y2 · · · yn with each yi ∈ {xi , x̄i } is called a minterm. The 2^n minterms constitute an F2 -
basis of V , that is, each Boolean function can be written as a unique F2 -linear
combination (XOR, same as OR in this context) of the minterms. ¤
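As a small illustrative sketch of Example A.73(4) for n = 2 (all helper names here are ours), one can enumerate the minterm basis and check both that every function has the claimed expansion and that distinct coefficient tuples give distinct functions:

```python
from itertools import product

n = 2
points = list(product([0, 1], repeat=n))          # the 2^n truth-table rows

def minterm(point):
    """The minterm y1 y2 ... yn that is 1 exactly at the given point."""
    return lambda xs: int(tuple(xs) == point)

minterms = [minterm(pt) for pt in points]

# A function equals the XOR of the minterms at which it takes the value 1.
f = lambda xs: xs[0] | xs[1]                      # logical OR, as a test case
coeffs = [f(pt) for pt in points]                 # coordinates in the minterm basis
g = lambda xs: sum(c * m(xs) for c, m in zip(coeffs, minterms)) % 2
assert all(f(pt) == g(pt) for pt in points)

# Distinct coefficient tuples give distinct functions, so the 2^n minterms
# are independent and span all 2^(2^n) Boolean functions.
all_fns = {tuple(sum(c * m(pt) for c, m in zip(cs, minterms)) % 2
                 for pt in points)
           for cs in product([0, 1], repeat=len(minterms))}
print(len(all_fns))  # → 16 = 2^(2^2)
```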
Definition A.74 Let V be a K-vector space. A subset U ⊆ V is called a
vector subspace of V if U is a subgroup of V , and if U is closed under the
scalar multiplication map. If U is a subspace of V , then dimK U ≤ dimK V . ⊳
Definition A.75 Let V1 , V2 be K-vector spaces. A function f : V1 → V2 is
called a vector-space homomorphism or a linear transformation if f (ax+by) = af (x)+bf (y) for all a, b ∈ K and x, y ∈ V1 . A bijective homomorphism is called
an isomorphism. If f : V1 → V2 is a vector-space isomorphism, then the inverse
f −1 : V2 → V1 is again a vector-space isomorphism. ⊳

Theorem A.76 [Isomorphism theorem] Let f : V1 → V2 be a K-linear transformation. The set ker f = {x ∈ V1 | f (x) = 0} is called the kernel of f .
The set Im f = {y ∈ V2 | y = f (x) for some x ∈ V1 } is called the image of
f . Then, ker f is a subspace of V1 , Im f is a subspace of V2 , and V1 / ker f is
isomorphic to Im f under the map x + ker f 7→ f (x). ⊳

Definition A.77 For a K-linear transformation f : V1 → V2 , the dimension dimK (ker f ) is called the nullity of f , and dimK (Im f ) the rank of f . ⊳

Theorem A.78 [Rank-nullity theorem] For a K-linear map f : V1 → V2 , the sum of the rank and the nullity of f is equal to dimK V1 . ⊳

Theorem A.79 Let V, W be K-vector spaces. The set of all K-linear maps
V → W , denoted HomK (V, W ), is a K-vector space with addition defined as
(f +g)(x) = f (x)+g(x) for all x ∈ V , and with scalar multiplication defined as
(af )(x) = af (x) for all a ∈ K and x ∈ V . If m = dimK V and n = dimK W
are finite, then the dimension of HomK (V, W ) as a K-vector space is mn. ⊳

Definition A.80 For a K-vector space V , the K-vector space HomK (V, K) is called the dual space of V . For finite-dimensional V , the K-vector spaces V and HomK (V, K) are isomorphic (and have the same dimension over K). ⊳

A.2.5 Polynomials
Polynomials play a crucial role in the algebra of fields. Let K be a field. Since the polynomial ring K[x] is a PID, irreducible polynomials are the same as prime polynomials in K[x], and are widely used for defining field extensions.
Let f (x) ∈ K[x] be an irreducible polynomial of degree n. The ideal I =
f (x)K[x] = {f (x)a(x) | a(x) ∈ K[x]} generated by f (x) ∈ K[x] plays a role
similar to that played by ideals generated by integer primes. The quotient
ring L = K[x]/I is a field. We have K ⊆ L, so L is a K-vector space. The
dimension of L over K is n = deg f . We call n the degree of the field extension
K ⊆ L, denoted [L : K]. L contains a root α = x + I of f (x). We say
that L is obtained by adjoining the root α of f to K. Other roots of f (x)
may or may not belong to L. Elements of L can be written as polynomials
a0 + a1 α + a2 α2 + · · · + an−1 αn−1 with unique ai ∈ K. This representation of
L also indicates that L has the dimension n as a vector space over K. Indeed,
1, α, α2 , . . . , αn−1 constitute a K-basis of L. We write L = K(α).
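As a small illustrative sketch (our own helper, not from the text), arithmetic in L = K[x]/I amounts to polynomial multiplication followed by reduction using f (α) = 0. For K = Q and f (x) = x³ − 2, the reduction rule is simply to replace α³ by 2:

```python
from fractions import Fraction

def mul(a, b):
    """Multiply a0 + a1*alpha + a2*alpha^2 by b0 + b1*alpha + b2*alpha^2
    in Q[x]/(x^3 - 2), where alpha satisfies alpha^3 = 2."""
    c = [Fraction(0)] * 5
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += Fraction(ai) * Fraction(bj)
    for k in (4, 3):          # reduce: alpha^4 = 2*alpha, alpha^3 = 2
        c[k - 3] += 2 * c[k]
        c[k] = 0
    return c[:3]

alpha = [0, 1, 0]
alpha_sq = mul(alpha, alpha)
assert alpha_sq == [0, 0, 1]
assert mul(alpha, alpha_sq) == [2, 0, 0]   # alpha^3 = 2, as f(alpha) = 0 demands
```

With α = 2^{1/3}, the check α · α² = 2 confirms that the coefficient-list representation reproduces the arithmetic of Q(α).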

Example A.81 (1) The polynomial x² + 1 is irreducible in R[x]. Adjoining a root i of this polynomial to R gives the field C. The other root −i of x² + 1
is also included in C. We have [C : R] = 2.
(2) The polynomial x3 − 2 is irreducible in Q[x]. Let α be a root of x3 − 2
in C. Adjoining α to Q gives the field Q(α) of extension degree three over Q.
Every element of Q(α) is of the form a0 + a1 α + a2 α2 with a0 , a1 , a2 ∈ Q.

There are three possibilities for α, namely, 2^{1/3} , 2^{1/3} ω and 2^{1/3} ω² , where 2^{1/3} is the real cube root of 2, and ω = (−1 + i√3)/2 is a complex cube root of unity.
Adjoining any of these roots gives essentially the same field, that is, there exist
isomorphisms among the fields Q(21/3 ), Q(21/3 ω) and Q(21/3 ω 2 ). Indeed, each
of these fields is isomorphic to the quotient field Q[x]/(x3 − 2)Q[x].
Adjoining a root of x³ − 2 to Q does not add the other two roots of x³ − 2. For α = 2^{1/3} , the extension Q(2^{1/3} ) lies inside R, and cannot contain the two properly complex roots. The polynomial (x − 2^{1/3} ω)(x − 2^{1/3} ω²) = x² + 2^{1/3} x + 2^{2/3} is irreducible in Q(2^{1/3} )[x]. Adjoining a root of this polynomial
to Q(21/3 ) gives a field of extension degree two over Q(21/3 ) and six over Q.
(3) If K is a finite field, adjoining a root of an irreducible polynomial to
K also adds the other roots of the polynomial to the extension. ¤

Theorem A.82 Let f (x) ∈ K[x] be an irreducible polynomial of degree n. There exists a field extension L of K with [L : K] ≤ n! such that all the roots
of f lie in L. ⊳

Theorem A.83 Let K be a field. Then, there exists an extension L of K such that every irreducible polynomial in K[x] has all its roots in L. ⊳

Definition A.84 A smallest (with respect to inclusion) field L satisfying Theorem A.83 is called an algebraic closure of K. Algebraic closures of K
are unique up to field isomorphisms that fix K element-wise. We denote the
algebraic closure of K as K̄. K is called algebraically closed if K̄ = K. ⊳

Example A.85 (1) The field C of complex numbers is algebraically closed (fundamental theorem of algebra).
(2) The algebraic closure of R is C. However, C is not the algebraic closure
of Q, since C contains transcendental numbers like e and π.
(3) Let K = {a1 , a2 , . . . , an } be a finite field of size n. The polynomial
(x − a1 )(x − a2 ) · · · (x − an ) + 1 does not have a root in K, that is, no finite
field is algebraically closed. In particular, the algebraic closure of a finite field
is an infinite field. ¤
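The argument of Example A.85(3) can be checked directly for a small field, say K = F7 (a sketch under the assumption that the field elements are represented as 0, . . . , 6):

```python
# The polynomial (x - a_1)(x - a_2)...(x - a_n) + 1 from Example A.85(3):
# at every a in F7 the product vanishes, so the polynomial evaluates to 1
# everywhere and has no root in F7.
p = 7
has_root = False
for a in range(p):
    value = 1
    for ai in range(p):
        value = value * (a - ai) % p   # the product over all field elements
    if (value + 1) % p == 0:
        has_root = True
print(has_root)  # → False
```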

A.3 Linear Algebra


Vector spaces and linear transformations are the basic objects of study
in linear algebra. We have already studied some properties of these objects.
Here, I concentrate mostly on some computational problems of linear algebra.

A.3.1 Linear Transformations and Matrices


Matrices are compact and handy representations of linear transformations
between finite-dimensional vector spaces. Let V and W be K-vector spaces of
respective dimensions n and m. Choose any K-basis α1 , α2 , . . . , αn of V , and
any K-basis β1 , β2 , . . . , βm of W . Let f : V → W be a linear transformation.
In view of linearity, f is fully specified by the elements f (αj ) for j = 1, 2, . . . , n. For each j = 1, 2, . . . , n, write
f (αj ) = c1,j β1 + c2,j β2 + · · · + cm,j βm
with ci,j ∈ K. The m × n matrix M whose i, j-th entry is ci,j is a compact
representation of the linear map f . For any element
 
\[
\alpha = a_1 \alpha_1 + a_2 \alpha_2 + \cdots + a_n \alpha_n
       = (\,\alpha_1 \ \alpha_2 \ \cdots \ \alpha_n\,)
         \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix},
\]
we have
\[
\begin{aligned}
f(\alpha) &= a_1 f(\alpha_1) + a_2 f(\alpha_2) + \cdots + a_n f(\alpha_n) \\
 &= (\, f(\alpha_1) \ f(\alpha_2) \ \cdots \ f(\alpha_n) \,)
    \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix} \\
 &= (\,\beta_1 \ \beta_2 \ \cdots \ \beta_m\,)
    \begin{pmatrix}
      c_{1,1} & c_{1,2} & \cdots & c_{1,n} \\
      c_{2,1} & c_{2,2} & \cdots & c_{2,n} \\
      \vdots  & \vdots  &        & \vdots  \\
      c_{m,1} & c_{m,2} & \cdots & c_{m,n}
    \end{pmatrix}
    \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}
  = (\,\beta_1 \ \beta_2 \ \cdots \ \beta_m\,)\, M
    \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix},
\end{aligned}
\]
where M = (ci,j ) is the m × n transformation matrix for f . The argument
α of f is specified by the scalars a1 , a2 , . . . , an , whereas the image f (α) =
b1 β1 + b2 β2 + · · · + bm βm is specified by the scalars b1 , b2 , . . . , bm . We have
   
\[
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}
= M \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix},
\]
that is, application of f is equivalent to premultiplication by the matrix M .
Now, take two linear maps f, g ∈ HomK (V, W ) (see Theorem A.79) with
transformation matrices M and N respectively. The linear map f + g has the
transformation matrix M + N , and the scalar product af has the transformation matrix aM . This means that the above description of linear maps in
terms of matrices respects the vector-space operations of HomK (V, W ).
In order to see how matrix multiplication is related to linear maps, take
three K-vector spaces U, V, W of respective dimensions n, m, l. Choose bases
α1 , α2 , . . . , αn of U , β1 , β2 , . . . , βm of V , and γ1 , γ2 , . . . , γl of W . Finally, let
f : U → V and g : V → W be linear maps given by
f (αk ) = c1,k β1 + c2,k β2 + · · · + cm,k βm ,
g(βj ) = d1,j γ1 + d2,j γ2 + · · · + dl,j γl ,
for k = 1, 2, . . . , n and j = 1, 2, . . . , m. The transformation matrices for f and
g are respectively M = (cj,k )m×n and N = (di,j )l×m . But then, we have
(g ◦ f )(αk ) = g(f (αk )) = g(c1,k β1 + c2,k β2 + · · · + cm,k βm )
= c1,k g(β1 ) + c2,k g(β2 ) + · · · + cm,k g(βm )
= c1,k (d1,1 γ1 + d2,1 γ2 + · · · + dl,1 γl ) +
c2,k (d1,2 γ1 + d2,2 γ2 + · · · + dl,2 γl ) +
··· +
cm,k (d1,m γ1 + d2,m γ2 + · · · + dl,m γl )
= (d1,1 c1,k + d1,2 c2,k + · · · + d1,m cm,k )γ1 +
(d2,1 c1,k + d2,2 c2,k + · · · + d2,m cm,k )γ2 +
··· +
(dl,1 c1,k + dl,2 c2,k + · · · + dl,m cm,k )γl
= e1,k γ1 + e2,k γ2 + · · · + el,k γl ,
where ei,k is the (i, k)-th entry of the matrix product N M (an l × n matrix).
This indicates that the transformation matrix for g ◦ f : U → W is the matrix
product N M , that is, matrix products correspond to compositions of linear maps. Put differently, matrix multiplication was defined in this seemingly weird way precisely so that it conforms with composition of linear maps.
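A quick numerical sanity check of this correspondence, with transformation matrices chosen arbitrarily by us and coordinates taken over F7 :

```python
p = 7

def mat_mul(X, Y):
    """Matrix product over F_p."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) % p
             for j in range(len(Y[0]))] for i in range(len(X))]

M = [[1, 2, 3], [4, 5, 6]]      # transformation matrix of f : U -> V (n = 3, m = 2)
N = [[2, 0], [1, 3], [5, 4]]    # transformation matrix of g : V -> W (l = 3)
NM = mat_mul(N, M)              # transformation matrix claimed for g o f

# Applying f and then g to a coordinate vector agrees with applying NM once.
for a in ([1, 0, 0], [0, 1, 0], [0, 0, 1], [3, 1, 4]):
    col = [[x] for x in a]      # coordinate column vector of an element of U
    assert mat_mul(NM, col) == mat_mul(N, mat_mul(M, col))
```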

A.3.2 Gaussian Elimination


Gaussian elimination is a fundamentally useful computational tool needed
for solving a variety of problems in linear algebra. Suppose that we have m
linear equations in n variables x1 , x2 , . . . , xn :
a1,1 x1 + a1,2 x2 + · · · + a1,n xn = b1 ,
a2,1 x1 + a2,2 x2 + · · · + a2,n xn = b2 ,
··· (A.1)
am,1 x1 + am,2 x2 + · · · + am,n xn = bm .
In terms of matrices, we can rewrite this system as
Ax = b, (A.2)

where A is the m × n coefficient matrix (ai,j ), x = (x1 x2 · · · xn ) t is the vector of variables, and b = (b1 b2 · · · bm ) t . Here, the superscript t stands
for matrix transpose. If b = 0, the system is called homogeneous.
In order to solve a system like (A.1), we convert the system (or the matrix
A) to a row echelon form (REF). A matrix M is said to be in the row echelon
form if the following three conditions are satisfied.
1. All zero rows of M are below the non-zero rows of M .
2. The first non-zero entry (called the pivot) in a non-zero row is 1.
3. The pivot of a non-zero row stays in a column to the left of the pivots
of all the following non-zero rows.
For example,
 
1 2 5 3 7
0 0 1 0 5
0 0 0 1 3
0 0 0 0 0
is a matrix in the row echelon form. A matrix M in REF is said to be in the
reduced row echelon form (RREF) if the only non-zero entry in each column
containing a pivot of some row is that pivot itself. For example, the RREF of
the matrix in the above example is
 
1 2 0 0 −27
0 0 1 0 5
0 0 0 1 3
0 0 0 0 0
We can convert any matrix A to REF or RREF using Gaussian elimination.
The REF of A is not unique, but the RREF of A is unique. Let me first explain
how a system (or matrix) can be converted to an REF. Gaussian elimination
involves a sequence of the following elementary row operations:
1. Exchange two rows.
2. Multiply a row by a non-zero element.
3. Subtract a non-zero multiple of a row from another row.
To start with, we process the first row of A. We find the leftmost column
containing at least one non-zero element. If no such column exists, the REF
conversion procedure is over. So let the l-th column be the leftmost column to
contain a non-zero element. We bring such a non-zero element to the (1, l)-th
position in A by a row exchange (unless that position already contained a
non-zero element). By multiplying the first row with the inverse of the element
at position (1, l), we convert the pivot to 1. Subsequently, from each of the
following rows containing a non-zero element at the l-th column, we subtract
a suitable multiple of the first row in order to reduce the element at the l-th column to zero. (Some authors do not impose the condition that a pivot be 1; for them, any non-zero entry is allowed as a pivot. This is not a serious issue anyway.) This completes the processing of the first row. We then
recursively reduce the submatrix of A, obtained by removing the first row
and the first l columns. When processing a system of equations (instead of
its coefficient matrix), we apply the same elementary transformations on the
right sides of the equations.
Example A.86 Consider the following system over F7 :
5x2 + 3x3 + x4 = 6,
6x1 + 5x2 + 2x3 + 5x4 + 6x5 = 2,
2x1 + 5x2 + 5x3 + 2x4 + 6x5 = 6,
2x1 + 2x2 + 6x3 + 2x4 + 3x5 = 0.
The matrix of coefficients (including the right sides) is:

0 5 3 1 0 6
6 5 2 5 6 2
2 5 5 2 6 6
2 2 6 2 3 0

We convert this system to an REF using the following steps.

[Exchanging Row 1 with Row 2]
6 5 2 5 6 2
0 5 3 1 0 6
2 5 5 2 6 6
2 2 6 2 3 0

[Multiplying Row 1 by 6−1 ≡ 6 (mod 7)]
1 2 5 2 1 5
0 5 3 1 0 6
2 5 5 2 6 6
2 2 6 2 3 0

[Subtracting 2 times Row 1 from Row 3]
1 2 5 2 1 5
0 5 3 1 0 6
0 1 2 5 4 3
2 2 6 2 3 0

[Subtracting 2 times Row 1 from Row 4] (Processing of Row 1 over)
1 2 5 2 1 5
0 5 3 1 0 6
0 1 2 5 4 3
0 5 3 5 1 4

[Multiplying Row 2 by 5−1 ≡ 3 (mod 7)]
1 2 5 2 1 5
0 1 2 3 0 4
0 1 2 5 4 3
0 5 3 5 1 4

[Subtracting 1 times Row 2 from Row 3]
1 2 5 2 1 5
0 1 2 3 0 4
0 0 0 2 4 6
0 5 3 5 1 4

[Subtracting 5 times Row 2 from Row 4] (Processing of Row 2 over)
1 2 5 2 1 5
0 1 2 3 0 4
0 0 0 2 4 6
0 0 0 4 1 5

(Column 3 contains no non-zero element, so we proceed to Column 4)
[Multiplying Row 3 by 2−1 ≡ 4 (mod 7)]
1 2 5 2 1 5
0 1 2 3 0 4
0 0 0 1 2 3
0 0 0 4 1 5

[Subtracting 4 times Row 3 from Row 4] (Processing of Row 3 over; Row 4 is zero and needs no processing)
1 2 5 2 1 5
0 1 2 3 0 4
0 0 0 1 2 3
0 0 0 0 0 0

This last matrix is an REF of the original matrix. ¤
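The procedure traced above can be sketched in a few lines of Python for a general prime p (the function name is ours); running it on the system of Example A.86 reproduces the REF just obtained:

```python
def ref_mod_p(rows, p):
    """Reduce a matrix over F_p to a row echelon form by the elementary
    row operations: exchange, scaling a row, subtracting a multiple of a row."""
    A = [[x % p for x in row] for row in rows]
    m, n = len(A), len(A[0])
    r = 0                                   # row currently being processed
    for c in range(n):
        pivot = next((i for i in range(r, m) if A[i][c]), None)
        if pivot is None:
            continue                        # no pivot available in this column
        A[r], A[pivot] = A[pivot], A[r]     # operation 1: exchange rows
        inv = pow(A[r][c], -1, p)           # operation 2: scale the pivot to 1
        A[r] = [x * inv % p for x in A[r]]
        for i in range(r + 1, m):           # operation 3: clear entries below
            t = A[i][c]
            A[i] = [(x - t * y) % p for x, y in zip(A[i], A[r])]
        r += 1
    return A

system = [[0, 5, 3, 1, 0, 6],
          [6, 5, 2, 5, 6, 2],
          [2, 5, 5, 2, 6, 6],
          [2, 2, 6, 2, 3, 0]]
assert ref_mod_p(system, 7) == [[1, 2, 5, 2, 1, 5],
                                [0, 1, 2, 3, 0, 4],
                                [0, 0, 0, 1, 2, 3],
                                [0, 0, 0, 0, 0, 0]]
```

The REF of a matrix is not unique in general, but this sketch picks pivots exactly as in Example A.86 (the first non-zero entry on or below the current row), so it lands on the same REF.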
We can convert a matrix in REF to a matrix in RREF. In the REF-
conversion procedure above, we have subtracted suitable multiples of the cur-
rent row from the rows below the current row. If the same procedure is applied
to rows above the current row, the RREF is obtained. Notice that this addi-
tional task may be done after the REF conversion (as demonstrated in Ex-
ample A.87 below), or during the REF-conversion procedure itself. When the
original matrix is sparse, the second alternative is preferable, since an REF of
the original sparse matrix may be quite dense, so zeroing all non-pivot column
elements while handling a pivot element preserves sparsity.
Example A.87 The following steps convert the last matrix of Example A.86
to the RREF. This matrix is already in REF.
 
[Subtracting 2 times Row 2 from Row 1] (Handling Column 2)
1 0 1 3 1 4
0 1 2 3 0 4
0 0 0 1 2 3
0 0 0 0 0 0

[Subtracting 3 times Row 3 from Row 1] (Handling Column 4)
1 0 1 0 2 2
0 1 2 3 0 4
0 0 0 1 2 3
0 0 0 0 0 0

[Subtracting 3 times Row 3 from Row 2] (Handling Column 4)
1 0 1 0 2 2
0 1 2 0 1 2
0 0 0 1 2 3
0 0 0 0 0 0
In the last matrix, Columns 1, 2 and 4 contain pivot elements, so all non-pivot
entries in these columns are reduced to zero. Columns 3 and 5 do not contain
pivot elements, and are allowed to contain non-zero entries. ¤
Let us now come back to our original problem of solving a linear system
Ax = b. Using the above procedure, we convert (A | b) to an REF (or RREF)
B. If there is a zero row in the A part of B, containing a non-zero element in the b part, the given system is inconsistent, and is not solvable. So we assume
that each row of B is either entirely zero or contains a non-zero element (a
pivot) in the A part. The columns that do not contain a pivot element (in the
A part) correspond to free or independent variables, whereas the columns that
contain pivot elements correspond to dependent variables. Let xi1 , xi2 , . . . , xik
be the free variables, and xj1 , xj2 , . . . , xjl the dependent variables. Suppose
that i1 < i2 < · · · < ik and j1 < j2 < · · · < jl . From the non-zero rows of the
REF, we express each dependent variable as a linear combination of the other
variables. If we plug in values for the free variables xi1 , xi2 , . . . , xik , we can
solve for xjl , xjl−1 , . . . , xj1 in that sequence. This process of obtaining values
for the dependent variables is called back (or backward) substitution. For each
tuple of values for xi1 , xi2 , . . . , xik , we obtain a different solution. If there are
no free variables (that is, if k = 0), the solution is unique.
If the system is reduced to the RREF, then writing xit = xit for t =
1, 2, . . . , k allows us to express the solutions as
x = u + xi1 v1 + xi2 v2 + · · · + xik vk (A.3)
for some constant n × 1 vectors u, v1 , v2 , . . . , vk . For a system reduced to an
REF (but not the RREF), we cannot write the solutions in this form.
Example A.88 (1) From the REF of the system in Example A.86, we see
that x3 and x5 are independent variables, whereas x1 , x2 and x4 are dependent
variables. The REF gives
x4 = 3 − 2x5 = 3 + 5x5 ,
x2 = 4 − (2x3 + 3x4 ) = 4 + 5x3 + 4x4 ,
x1 = 5 − (2x2 + 5x3 + 2x4 + x5 ) = 5 + 5x2 + 2x3 + 5x4 + 6x5 .
For each pair of values for x3 , x5 , we solve for x4 , x2 , x1 by backward substitu-
tion. Since the system is over F7 and has two free variables, there are exactly
7² = 49 solutions.
(2) For the RREF of Example A.87, we express the solutions as
x1 = 2 − (x3 + 2x5 ) = 2 + 6x3 + 5x5 ,
x2 = 2 − (2x3 + x5 ) = 2 + 5x3 + 6x5 ,
x3 = x3 ,
x4 = 3 − 2x5 = 3 + 5x5 ,
x5 = x5 .
These solutions can be written also as
       
\[
x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix}
  = \begin{pmatrix} 2 \\ 2 \\ 0 \\ 3 \\ 0 \end{pmatrix}
  + x_3 \begin{pmatrix} 6 \\ 5 \\ 1 \\ 0 \\ 0 \end{pmatrix}
  + x_5 \begin{pmatrix} 5 \\ 6 \\ 0 \\ 5 \\ 1 \end{pmatrix}.
\]
The two vectors ( 6 5 1 0 0 ) t and ( 5 6 0 5 1 ) t are clearly linearly
independent (look at the positions of the free variables). ¤
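One can confirm Example A.88(2) mechanically: substituting every pair of values of the free variables x3 , x5 into x = u + x3 v1 + x5 v2 yields a solution of the original system (a brute-force check we add for illustration):

```python
p = 7
A = [[0, 5, 3, 1, 0],
     [6, 5, 2, 5, 6],
     [2, 5, 5, 2, 6],
     [2, 2, 6, 2, 3]]
b = [6, 2, 6, 0]
u  = [2, 2, 0, 3, 0]            # the constant vector from the RREF
v1 = [6, 5, 1, 0, 0]            # coefficient vector of the free variable x3
v2 = [5, 6, 0, 5, 1]            # coefficient vector of the free variable x5

count = 0
for x3 in range(p):
    for x5 in range(p):
        x = [(ui + x3 * a1 + x5 * a2) % p for ui, a1, a2 in zip(u, v1, v2)]
        if all(sum(r * xi for r, xi in zip(row, x)) % p == bi
               for row, bi in zip(A, b)):
            count += 1
print(count)  # → 49: every assignment of the free variables gives a solution
```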
The Gaussian-elimination procedure (that is, conversion to REF or RREF
or solving a system in the form (A.3)) makes O(max(m, n)³) elementary operations in the field K.

A.3.3 Inverse and Determinant


Gaussian elimination is used for a variety of purposes other than solving
linear systems. Now, we discuss some such applications of Gaussian elimina-
tion. Let A be a square matrix (of size n × n), that is, a linear transformation
f from an n-dimensional vector space V to an n-dimensional vector space W
(we may have V = W ). If f is an isomorphism (equivalently, if f is invertible
as a function), the inverse transformation f −1 : W → V is represented by a
matrix B satisfying AB = BA = In , where In is the n × n identity matrix.
The matrix B is called the inverse of A, and is denoted as A−1 .
If A (that is, f ) is invertible, any (consistent) system of the form Ax = b
has a unique solution, that is, there are no free variables, that is, an REF
of A contains pivot elements (1, to be precise) at every position of the main
diagonal, that is, the RREF of A is the identity matrix. In order to compute
A−1 , we convert A to In by elementary row operations. If we apply the same
sequence of row operations on In , we get A−1 . If, during the RREF-conversion
procedure, we ever encounter a situation where the i-th row does not contain
a pivot at the i-th column, A is not invertible, that is, A−1 does not exist.
 
Example A.89 Let us compute the inverse of the matrix

A =
0 5 3 6 4
4 3 1 4 5
3 6 2 4 3
4 1 0 3 6
4 6 6 3 6

defined over the field F7 . To this end, we start with (A | I5 ), and reduce A (the left part) to the identity matrix I5 . We also carry out the same sequence of row operations on the right part (initially I5 ), and let it get converted to A−1 . The steps are given below. Unlike in Examples A.86 and A.87, we straightaway reduce A to RREF (instead of first to REF and then to RREF).

(The initial matrix (A | I5 ))
0 5 3 6 4 | 1 0 0 0 0
4 3 1 4 5 | 0 1 0 0 0
3 6 2 4 3 | 0 0 1 0 0
4 1 0 3 6 | 0 0 0 1 0
4 6 6 3 6 | 0 0 0 0 1

[Exchange Row 1 with Row 2]
4 3 1 4 5 | 0 1 0 0 0
0 5 3 6 4 | 1 0 0 0 0
3 6 2 4 3 | 0 0 1 0 0
4 1 0 3 6 | 0 0 0 1 0
4 6 6 3 6 | 0 0 0 0 1

[Multiply Row 1 by 4−1 ≡ 2 (mod 7)]
1 6 2 1 3 | 0 2 0 0 0
0 5 3 6 4 | 1 0 0 0 0
3 6 2 4 3 | 0 0 1 0 0
4 1 0 3 6 | 0 0 0 1 0
4 6 6 3 6 | 0 0 0 0 1

[Subtract 3 × Row 1 from Row 3]
1 6 2 1 3 | 0 2 0 0 0
0 5 3 6 4 | 1 0 0 0 0
0 2 3 1 1 | 0 1 1 0 0
4 1 0 3 6 | 0 0 0 1 0
4 6 6 3 6 | 0 0 0 0 1

[Subtract 4 × Row 1 from Row 4]
1 6 2 1 3 | 0 2 0 0 0
0 5 3 6 4 | 1 0 0 0 0
0 2 3 1 1 | 0 1 1 0 0
0 5 6 6 1 | 0 6 0 1 0
4 6 6 3 6 | 0 0 0 0 1

[Subtract 4 × Row 1 from Row 5]
1 6 2 1 3 | 0 2 0 0 0
0 5 3 6 4 | 1 0 0 0 0
0 2 3 1 1 | 0 1 1 0 0
0 5 6 6 1 | 0 6 0 1 0
0 3 5 6 1 | 0 6 0 0 1

[Multiply Row 2 by 5−1 ≡ 3 (mod 7)]
1 6 2 1 3 | 0 2 0 0 0
0 1 2 4 5 | 3 0 0 0 0
0 2 3 1 1 | 0 1 1 0 0
0 5 6 6 1 | 0 6 0 1 0
0 3 5 6 1 | 0 6 0 0 1

[Subtract 6 × Row 2 from Row 1]
1 0 4 5 1 | 3 2 0 0 0
0 1 2 4 5 | 3 0 0 0 0
0 2 3 1 1 | 0 1 1 0 0
0 5 6 6 1 | 0 6 0 1 0
0 3 5 6 1 | 0 6 0 0 1

[Subtract 2 × Row 2 from Row 3]
1 0 4 5 1 | 3 2 0 0 0
0 1 2 4 5 | 3 0 0 0 0
0 0 6 0 5 | 1 1 1 0 0
0 5 6 6 1 | 0 6 0 1 0
0 3 5 6 1 | 0 6 0 0 1

[Subtract 5 × Row 2 from Row 4]
1 0 4 5 1 | 3 2 0 0 0
0 1 2 4 5 | 3 0 0 0 0
0 0 6 0 5 | 1 1 1 0 0
0 0 3 0 4 | 6 6 0 1 0
0 3 5 6 1 | 0 6 0 0 1

[Subtract 3 × Row 2 from Row 5]
1 0 4 5 1 | 3 2 0 0 0
0 1 2 4 5 | 3 0 0 0 0
0 0 6 0 5 | 1 1 1 0 0
0 0 3 0 4 | 6 6 0 1 0
0 0 6 1 0 | 5 6 0 0 1

[Multiply Row 3 by 6−1 ≡ 6 (mod 7)]
1 0 4 5 1 | 3 2 0 0 0
0 1 2 4 5 | 3 0 0 0 0
0 0 1 0 2 | 6 6 6 0 0
0 0 3 0 4 | 6 6 0 1 0
0 0 6 1 0 | 5 6 0 0 1

[Subtract 4 × Row 3 from Row 1]
1 0 0 5 0 | 0 6 4 0 0
0 1 2 4 5 | 3 0 0 0 0
0 0 1 0 2 | 6 6 6 0 0
0 0 3 0 4 | 6 6 0 1 0
0 0 6 1 0 | 5 6 0 0 1

[Subtract 2 × Row 3 from Row 2]
1 0 0 5 0 | 0 6 4 0 0
0 1 0 4 1 | 5 2 2 0 0
0 0 1 0 2 | 6 6 6 0 0
0 0 3 0 4 | 6 6 0 1 0
0 0 6 1 0 | 5 6 0 0 1

[Subtract 3 × Row 3 from Row 4]
1 0 0 5 0 | 0 6 4 0 0
0 1 0 4 1 | 5 2 2 0 0
0 0 1 0 2 | 6 6 6 0 0
0 0 0 0 5 | 2 2 3 1 0
0 0 6 1 0 | 5 6 0 0 1

[Subtract 6 × Row 3 from Row 5]
1 0 0 5 0 | 0 6 4 0 0
0 1 0 4 1 | 5 2 2 0 0
0 0 1 0 2 | 6 6 6 0 0
0 0 0 0 5 | 2 2 3 1 0
0 0 0 1 2 | 4 5 6 0 1

[Exchange Row 4 with Row 5]
1 0 0 5 0 | 0 6 4 0 0
0 1 0 4 1 | 5 2 2 0 0
0 0 1 0 2 | 6 6 6 0 0
0 0 0 1 2 | 4 5 6 0 1
0 0 0 0 5 | 2 2 3 1 0

[Subtract 5 × Row 4 from Row 1]
1 0 0 0 4 | 1 2 2 0 2
0 1 0 4 1 | 5 2 2 0 0
0 0 1 0 2 | 6 6 6 0 0
0 0 0 1 2 | 4 5 6 0 1
0 0 0 0 5 | 2 2 3 1 0

[Subtract 4 × Row 4 from Row 2]
1 0 0 0 4 | 1 2 2 0 2
0 1 0 0 0 | 3 3 6 0 3
0 0 1 0 2 | 6 6 6 0 0
0 0 0 1 2 | 4 5 6 0 1
0 0 0 0 5 | 2 2 3 1 0

[Multiply Row 5 by 5−1 ≡ 3 (mod 7)]
1 0 0 0 4 | 1 2 2 0 2
0 1 0 0 0 | 3 3 6 0 3
0 0 1 0 2 | 6 6 6 0 0
0 0 0 1 2 | 4 5 6 0 1
0 0 0 0 1 | 6 6 2 3 0

[Subtract 4 × Row 5 from Row 1]
1 0 0 0 0 | 5 6 1 2 2
0 1 0 0 0 | 3 3 6 0 3
0 0 1 0 2 | 6 6 6 0 0
0 0 0 1 2 | 4 5 6 0 1
0 0 0 0 1 | 6 6 2 3 0

[Subtract 2 × Row 5 from Row 3]
1 0 0 0 0 | 5 6 1 2 2
0 1 0 0 0 | 3 3 6 0 3
0 0 1 0 0 | 1 1 2 1 0
0 0 0 1 2 | 4 5 6 0 1
0 0 0 0 1 | 6 6 2 3 0

[Subtract 2 × Row 5 from Row 4]
1 0 0 0 0 | 5 6 1 2 2
0 1 0 0 0 | 3 3 6 0 3
0 0 1 0 0 | 1 1 2 1 0
0 0 0 1 0 | 6 0 2 1 1
0 0 0 0 1 | 6 6 2 3 0

The last matrix is of the form (I5 | A−1 ), that is,

A−1 =
5 6 1 2 2
3 3 6 0 3
1 1 2 1 0
6 0 2 1 1
6 6 2 3 0

One can verify that AA−1 = A−1 A = I5 (modulo 7). ¤
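The (A | I5 ) → (I5 | A−1 ) reduction can be coded directly for any prime modulus (a sketch; the function name is ours). It reproduces the inverse computed above:

```python
def inverse_mod_p(A, p):
    """Invert a square matrix over F_p by reducing (A | I) to (I | A^{-1})."""
    n = len(A)
    M = [[A[i][j] % p for j in range(n)] + [int(i == j) for j in range(n)]
         for i in range(n)]                      # the augmented matrix (A | I)
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c]), None)
        if pivot is None:
            return None                          # A is not invertible
        M[c], M[pivot] = M[pivot], M[c]          # bring a non-zero pivot up
        inv = pow(M[c][c], -1, p)
        M[c] = [x * inv % p for x in M[c]]       # scale the pivot to 1
        for i in range(n):                       # clear the rest of the column
            if i != c and M[i][c]:
                t = M[i][c]
                M[i] = [(x - t * y) % p for x, y in zip(M[i], M[c])]
    return [row[n:] for row in M]                # the right half is A^{-1}

A = [[0, 5, 3, 6, 4],
     [4, 3, 1, 4, 5],
     [3, 6, 2, 4, 3],
     [4, 1, 0, 3, 6],
     [4, 6, 6, 3, 6]]
A_inv = inverse_mod_p(A, 7)
assert A_inv == [[5, 6, 1, 2, 2],
                 [3, 3, 6, 0, 3],
                 [1, 1, 2, 1, 0],
                 [6, 0, 2, 1, 1],
                 [6, 6, 2, 3, 0]]
```

Since the inverse of a matrix is unique, any correct pivoting order must arrive at the same A−1 as the hand computation.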

Definition A.90 The determinant of an n × n matrix A = (ai,j ) is defined as
\[
\det A = \sum_{\pi \in S_n} \operatorname{sign}(\pi) \left[ \prod_{i=1}^{n} a_{i,\pi(i)} \right],
\]

where the sum is over all permutations π of 1, 2, . . . , n, and the sign of π is +1 for even permutations π and −1 for odd permutations π.3 ⊳
3 The set of all permutations of 1, 2, . . . , n is denoted by Sn . Every permutation in Sn
can be obtained from 1, 2, . . . , n by a sequence of transpositions (that is, swapping pairs of
elements). Given a permutation π ∈ Sn , there are many (in fact, infinitely many) transpo-
sition sequences to obtain π from 1, 2, . . . , n, but the parity of the count of transpositions

The determinant of A is an important mathematical property of A. For example, A is invertible if and only if det A ≠ 0.
The determinant of a square matrix A can be computed by Gaussian elim-
ination. We reduce the matrix A to an REF (not necessarily RREF). If this
procedure reveals that some column does not contain a pivot element, A is
not invertible, that is, det A = 0. Otherwise, the REF-conversion procedure
places a non-zero si at A[i][i] for every i = 1, 2, . . . , n. We force A[i][i] = 1 by
dividing the entire i-th row by si (or multiplying by s−1 i ). Finally, let t be the
number of row exchanges done during the REF-conversion procedure. Then,
\[
\det A = (-1)^t \prod_{i=1}^{n} s_i .
\]

Example A.91 Let us compute det A for the matrix A of Example A.89. If
we are interested in computing only the determinant of A, it is not necessary
to convert I5 to A−1 , that is, the REF conversion may be restricted only to
the first five columns. We have s1 = 4, s2 = 5, s3 = 6, s4 = 1, s5 = 5, and
t = 2, that is, det A ≡ (−1)² × 4 × 5 × 6 × 1 × 5 ≡ 5 (mod 7). ¤
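The same pivot-tracking recipe is easy to code over Fp (our own sketch, not from the text); it confirms det A = 5 for this matrix:

```python
def det_mod_p(A, p):
    """Determinant over F_p: track row exchanges (sign flips) and the
    pivots s_1, ..., s_n during REF conversion."""
    M = [[x % p for x in row] for row in A]
    n = len(M)
    det = 1
    for c in range(n):
        pivot = next((i for i in range(c, n) if M[i][c]), None)
        if pivot is None:
            return 0                      # a pivotless column: det A = 0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det                    # each exchange flips the sign
        det = det * M[c][c] % p           # multiply in the pivot s_i
        inv = pow(M[c][c], -1, p)
        M[c] = [x * inv % p for x in M[c]]
        for i in range(c + 1, n):
            t = M[i][c]
            M[i] = [(x - t * y) % p for x, y in zip(M[i], M[c])]
    return det % p

A = [[0, 5, 3, 6, 4],
     [4, 3, 1, 4, 5],
     [3, 6, 2, 4, 3],
     [4, 1, 0, 3, 6],
     [4, 6, 6, 3, 6]]
print(det_mod_p(A, 7))  # → 5, matching Example A.91
```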

The inverse or the determinant of an n × n matrix can be computed using O(n³) elementary operations in the underlying field K.

A.3.4 Rank and Nullspace


Definition A.92 Let A be an m × n matrix. Now, we may have m ≠ n. The rows of A can be treated as n-tuples over K, that is, members of the K-vector space K n . The maximum number of linearly independent (over K) rows of A is called the row rank of A. Likewise, we define the column rank of A as the maximum number of linearly independent columns of A. ⊳

Theorem A.93 For every matrix A, its row rank is the same as its column
rank. We refer to this common value as the rank of A or rank(A). If A is the
matrix of a K-linear map f : V → W (with dimK V = n and dimK W = m),
the rank of A is the same as the rank of f . ⊳

Definition A.94 The nullspace of A is the set of all solutions of the homo-
geneous system Ax = 0. These solutions, treated as n-tuples over K, form a
subspace of K n . The dimension of the nullspace of A is called the nullity of
A, denoted as nullity(A). ⊳

in each such sequence is the same for a given π. We call π an even or an odd permutation
according as whether this parity is even or odd, respectively. It turns out that (for n > 2)
exactly half of the n! permutations in Sn are even, and the rest odd. Sn is a group under
composition. The set An of even permutations in Sn is a subgroup of Sn (of index 2).
For example, for n = 5, consider the following sequence of transpositions: 1, 2, 3, 4, 5 →
1, 5, 3, 4, 2 → 1, 4, 3, 5, 2 → 3, 4, 1, 5, 2 → 2, 4, 1, 5, 3. It follows that 2, 4, 1, 5, 3 is an even
permutation, whereas 3, 4, 1, 5, 2 is an odd permutation of 1, 2, 3, 4, 5.

Theorem A.95 If A is the matrix of a linear map f : V → W (with dimK V = n and dimK W = m), the nullity of A is the same as the nullity of f . ⊳

The rank-nullity theorem for linear maps implies the following:


Corollary A.96 For an m×n matrix A, we have rank(A)+nullity(A) = n. ⊳
Both the rank and a basis of the nullspace of a matrix A can be computed
by Gaussian elimination. We reduce A to an REF. Let k denote the number of
free variables, and l the number of dependent variables. Then, k is the nullity
of A, and l is the rank of A. If A is converted to the RREF, Eqn (A.3) allows
us to write the solutions of Ax = 0 as follows (since we are dealing with a
homogeneous system, we have u = 0 in Eqn (A.3)):
x = xi1 v1 + xi2 v2 + · · · + xik vk ,
where xi1 , xi2 , . . . , xik are the free variables, and v1 , v2 , . . . , vk are linearly
independent vectors constituting a basis of the nullspace of A.
 
Example A.97 Let us compute the rank of the matrix

          0 5 3 1 0
          6 5 2 5 6
    A  =  2 5 5 2 6
          2 2 6 2 3

of Example A.86 over F7 . We have seen that x3 , x5 are free variables, and x1 , x2 , x4
are dependent variables. Therefore, rank(A) = 3 and nullity(A) = 2.
The RREF of A (Example A.87 and Example A.88(2)) indicates that all
solutions of Ax = 0 can be written as

    x = x3 (6 5 1 0 0) t + x5 (5 6 0 5 1) t .

So (6 5 1 0 0) t and (5 6 0 5 1) t form a basis of the nullspace of A. ¤
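The Gaussian-elimination recipe above is easy to turn into code. The following minimal Python sketch (an illustration of ours, not the book's; the helper names are our own) reduces a matrix to RREF modulo a prime p and reads off the rank and a nullspace basis, reproducing Example A.97:

```python
# Minimal sketch of rank/nullspace computation over F_p (p prime).
def rref_mod_p(A, p):
    """Reduce a copy of A to reduced row echelon form mod p.
    Returns (R, pivot_columns)."""
    R = [row[:] for row in A]
    rows, cols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(cols):
        pr = next((i for i in range(r, rows) if R[i][c] % p != 0), None)
        if pr is None:
            continue                      # no pivot here: c is a free column
        R[r], R[pr] = R[pr], R[r]
        inv = pow(R[r][c], p - 2, p)      # inverse by Fermat's little theorem
        R[r] = [x * inv % p for x in R[r]]
        for i in range(rows):
            if i != r and R[i][c] % p != 0:
                f = R[i][c]
                R[i] = [(x - f * y) % p for x, y in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots

def nullspace_basis(A, p):
    """One basis vector per free variable, as in Eqn (A.3) with u = 0."""
    R, pivots = rref_mod_p(A, p)
    cols = len(A[0])
    basis = []
    for free in (c for c in range(cols) if c not in pivots):
        v = [0] * cols
        v[free] = 1
        for r, c in enumerate(pivots):
            v[c] = -R[r][free] % p
        basis.append(v)
    return basis

A = [[0, 5, 3, 1, 0], [6, 5, 2, 5, 6], [2, 5, 5, 2, 6], [2, 2, 6, 2, 3]]
_, pivots = rref_mod_p(A, 7)
print(len(pivots))                 # → 3  (the rank)
print(nullspace_basis(A, 7))       # → [[6, 5, 1, 0, 0], [5, 6, 0, 5, 1]]
```

The two printed basis vectors match (6 5 1 0 0) t and (5 6 0 5 1) t from the example.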

Example A.98 As an application, let us find linear dependencies among the Boolean functions f0 , f1 , . . . , fn−1 in s Boolean variables x0 , x1 , . . . , xs−1 . Let m = 2s , and let T0 , T1 , . . . , Tm−1 be the minterms in the variables x0 , x1 , . . . , xs−1 .
We want to determine all n-tuples (x0 , x1 , . . . , xn−1 ) ∈ Fn2 such that
x0 f0 + x1 f1 + · · · + xn−1 fn−1 = 0, (A.4)
where 0 indicates the zero function. Each fj can be written uniquely as
fj = a0,j T0 + a1,j T1 + · · · + am−1,j Tm−1 , (A.5)
where ai,j is the value of fj for the values of x0 , x1 , . . . , xs−1 given by the
minterm Ti . Substituting Eqn (A.5) in Eqn (A.4) for all j = 0, 1, . . . , n − 1 gives

(a0,0 x0 + a0,1 x1 + · · · + a0,n−1 xn−1 )T0 + (a1,0 x0 + a1,1 x1 + · · · + a1,n−1 xn−1 )T1 + · · · + (am−1,0 x0 + am−1,1 x1 + · · · + am−1,n−1 xn−1 )Tm−1 = 0.

Since T0 , T1 , . . . , Tm−1 are linearly independent over F2 , we have:

a0,0 x0 + a0,1 x1 + · · · + a0,n−1 xn−1 = 0,


a1,0 x0 + a1,1 x1 + · · · + a1,n−1 xn−1 = 0,
···
am−1,0 x0 + am−1,1 x1 + · · · + am−1,n−1 xn−1 = 0.

Let A = (ai,j ), 0 ≤ i ≤ m − 1, 0 ≤ j ≤ n − 1. We need to solve the homogeneous system Ax = 0 over F2 . The solutions of Eqn (A.4) are precisely the vectors in the nullspace of A.
As a numeric example, consider the following ten functions f0 , f1 , . . . , f9 in three Boolean variables x0 , x1 , x2 :

minterm x0 x1 x2 f0 f1 f2 f3 f4 f5 f6 f7 f8 f9
T0 = x̄0 x̄1 x̄2 0 0 0 1 1 1 1 1 0 1 0 0 0
T1 = x̄0 x̄1 x2 0 0 1 0 1 0 1 1 0 1 1 0 1
T2 = x̄0 x1 x̄2 0 1 0 0 1 0 0 0 1 0 1 0 0
T3 = x̄0 x1 x2 0 1 1 1 0 1 0 1 0 0 0 0 0
T4 = x0 x̄1 x̄2 1 0 0 1 0 1 1 1 1 0 0 0 0
T5 = x0 x̄1 x2 1 0 1 1 0 0 0 1 0 1 1 0 0
T6 = x0 x1 x̄2 1 1 0 1 1 0 0 1 1 0 1 0 0
T7 = x0 x1 x2 1 1 1 1 1 1 1 0 0 0 0 1 0

We denote by A the 8 × 10 matrix formed by the values of the functions (the last ten columns in the table above):
 
          1 1 1 1 1 0 1 0 0 0
          0 1 0 1 1 0 1 1 0 1
          0 1 0 0 0 1 0 1 0 0
          1 0 1 0 1 0 0 0 0 0
    A  =  1 0 1 1 1 1 0 0 0 0
          1 0 0 0 1 0 1 1 0 0
          1 1 0 0 1 1 0 1 0 0
          1 1 1 1 0 0 0 0 1 0

The RREF of A is (the steps for calculating this are not shown here):

    1 0 0 0 0 0 0 1 0 1
    0 1 0 0 0 1 0 1 0 0
    0 0 1 0 0 0 0 0 0 0
    0 0 0 1 0 1 0 0 0 0
    0 0 0 0 1 0 0 1 0 1
    0 0 0 0 0 0 1 1 0 0
    0 0 0 0 0 0 0 0 1 1
    0 0 0 0 0 0 0 0 0 0

The free variables are x5 , x7 , x9 , and all the solutions of Ax = 0 are

    x = (x0 x1 . . . x9 ) t = x5 (0 1 0 1 0 1 0 0 0 0) t + x7 (1 1 0 0 1 0 1 1 0 0) t + x9 (1 0 0 0 1 0 0 0 1 1) t

with x5 , x7 , x9 ∈ {0, 1}.
Therefore, three linearly independent linear equations involving f0 , f1 , . . . , f9
are f1 + f3 + f5 = 0, f0 + f1 + f4 + f6 + f7 = 0, and f0 + f4 + f8 + f9 = 0.
All dependencies among these functions can be obtained as F2 -linear
combinations of these independent equations. There are 2³ − 1 = 7 non-zero
linear equations in the given ten functions. ¤
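The dependencies found in Example A.98 are easy to check mechanically. The Python sketch below (our own illustration, not the book's code) computes the rank of A over F2 by elimination on bitmask-encoded rows and verifies the three independent relations:

```python
# Rank over F_2 with one Python integer (bitmask) per row; our helper, not the book's.
def rank_gf2(rows):
    n = len(rows[0])
    R = [int("".join(map(str, r)), 2) for r in rows]   # leftmost column = highest bit
    rank = 0
    for bit in reversed(range(n)):                      # sweep columns left to right
        piv = next((i for i in range(rank, len(R)) if R[i] >> bit & 1), None)
        if piv is None:
            continue
        R[rank], R[piv] = R[piv], R[rank]
        for i in range(len(R)):
            if i != rank and R[i] >> bit & 1:
                R[i] ^= R[rank]                         # row addition mod 2
        rank += 1
    return rank

A = [
    [1,1,1,1,1,0,1,0,0,0],
    [0,1,0,1,1,0,1,1,0,1],
    [0,1,0,0,0,1,0,1,0,0],
    [1,0,1,0,1,0,0,0,0,0],
    [1,0,1,1,1,1,0,0,0,0],
    [1,0,0,0,1,0,1,1,0,0],
    [1,1,0,0,1,1,0,1,0,0],
    [1,1,1,1,0,0,0,0,1,0],
]
print(rank_gf2(A))        # → 7, so the nullity is 10 - 7 = 3
# Verify f1+f3+f5 = 0, f0+f1+f4+f6+f7 = 0, f0+f4+f8+f9 = 0 column-wise:
for cols in ([1, 3, 5], [0, 1, 4, 6, 7], [0, 4, 8, 9]):
    assert all(sum(row[c] for c in cols) % 2 == 0 for row in A)
```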

A.3.5 Characteristic and Minimal Polynomials


Let A be an n × n (square) matrix with elements from a field K. Choose
any n-dimensional vector v. The n + 1 vectors v, Av, A2 v, . . . , An v must be
linearly dependent, that is, there exist c0 , c1 , . . . , cn , not all zero, such that
c0 v + c1 Av + c2 A2 v + · · · + cn An v = 0,
that is, p(A)v = 0, where p(x) = c0 + c1 x + c2 x2 + · · · + cn xn ∈ K[x] is a non-
zero polynomial. If v1 , v2 , . . . , vn constitute a basis of K n , and if pi (A)vi = 0
for non-zero polynomials pi (x) ∈ K[x], i = 1, 2, . . . , n, then q(A)v = 0 for all
v ∈ K n , where q(x) = lcm(p1 (x), p2 (x), . . . , pn (x)). But then, q(A) = 0n×n ,
that is, the matrix A satisfies non-zero polynomial equations over K.
Definition A.99 The monic polynomial of smallest positive degree satisfied by A is called the minimal polynomial µA (x) ∈ K[x] of A. ⊳
It turns out that p(A) = 0 for some p(x) ∈ K[x] if and only if µA (x)|p(x).
One particular polynomial is always satisfied by A:
Definition A.100 The monic polynomial det(xIn − A) of degree n is called
the characteristic polynomial χA (x) of A. ⊳
Theorem A.101 [Cayley–Hamilton theorem] χA (A) = 0. ⊳
Computation of χA (x) is easy, since this amounts to computing the de-
terminant of the matrix xIn − A. Suppose that χA (x) factors over a suitable
extension of K as
χA (x) = (x − ǫ1 )r1 (x − ǫ2 )r2 · · · (x − ǫk )rk

with ǫi ∈ K̄ and ri ∈ N. The minimal polynomial of A must be of the form

µA (x) = (x − ǫ1 )s1 (x − ǫ2 )s2 · · · (x − ǫk )sk

for some si satisfying 0 ≤ si ≤ ri for all i. It turns out that the value si = 0
is not possible, that is, the roots of µA (x) are precisely the roots of χA (x).
 
Example A.102 For the matrix

          5 6 2
    A  =  2 1 3
          1 1 4

defined over F7 , we have

               x+2   1    5
    xI3 − A =   5   x+6   4  .
                6    6   x+3

Therefore,

χA (x) = det(xI3 − A) = x3 + 4x2 + 5x + 2 = (x + 1)2 (x + 2).


Thus, the minimal polynomial of A is either χA (x) itself or (x + 1)(x + 2).
Since (A + I3 )(A + 2I3 ) = 0, we have µA (x) = (x + 1)(x + 2) = x2 + 3x + 2. ¤
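The claim µA (A) = 0 in Example A.102 can be verified numerically. A short Python sketch (an illustration of ours) multiplies out (A + I3 )(A + 2I3 ) over F7:

```python
# Verify (A + I)(A + 2I) = 0 over F_7 for the matrix of Example A.102.
p = 7
A = [[5, 6, 2], [2, 1, 3], [1, 1, 4]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) % p
             for j in range(n)] for i in range(n)]

def madd(X, Y):
    return [[(x + y) % p for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def mscale(c, X):
    return [[c * x % p for x in row] for row in X]

M = matmul(madd(A, I), madd(A, mscale(2, I)))   # mu_A evaluated at A
print(M)   # → [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

Since the product is the zero matrix, µA (x) = (x + 1)(x + 2), as claimed.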

The roots ǫ1 , ǫ2 , . . . , ǫk of χA (x) or µA (x) are called the eigenvalues of A.


These are important quantities associated with A. However, keeping in mind
the current content of this book, I will not discuss eigenvalues further.

A.4 Probability
In a random experiment, there are several possibilities for the output. For
example, if a coin is tossed, the outcome may be either a head (H) or a tail
(T ). If a die is thrown, the outcome is an integer in the range 1–6. The life
of an electric bulb is a positive real number. To every possible outcome of
a random experiment, we associate a quantity called the probability of the
outcome, that quantifies the likelihood of the occurrence of that event.
In general, there is no good way to determine the probabilities of events
in a random experiment. In order to find the probability of obtaining H in
the toss of a coin, we may toss the coin n times, and count the number h of
occurrences of H. Define pn = h/n. The probability of H is taken as the limit
p = limn→∞ pn (provided that the limit exists). The probability of T is then
q = 1 − p. Likewise, we determine the probability of each integer 1, 2, . . . , 6 by
throwing a die an infinite number of times. Computing the probability of the
life of an electric bulb is more complicated. First, there are infinitely many
(even uncountable) possibilities. Second, we need to measure the life of many
bulbs in order to identify a pattern. As the number of bulbs tends to infinity,
the pattern tends to the probability distribution of the life of a bulb.
Evidently, these methods for determining the probabilities of events in a
random experiment are impractical (usually infeasible). To get around this
difficulty, we make certain simplifying assumptions. For example, we assume
that a coin is unbiased. This means that both H and T are equally likely to
occur, that is, each has a probability of 1/2. If we toss an unbiased coin 100
times, we cannot say for certain that we obtain exactly 50 heads and exactly
50 tails, but this would be the most likely event. The probability of each
integer 1, 2, . . . , 6 in the throw of an unbiased die is likewise taken as 1/6,
since each of the six possibilities is equally likely to occur. The life of an
electric bulb is modeled as a non-negative real number with exponentially
decreasing likelihood of survival as the lifetime increases.
These abstract models of probability turn out to be practically useful in a
variety of contexts. In what follows, I assume that the probabilities of events
in a random experiment are provided to us. I focus on how we can manipulate
these probabilities in order to derive results useful to us. I also expect the
reader to have some acquaintance with the basic notions of probability.

A.4.1 Random Variables and Probability Distributions


A random variable x assumes values from the set X of all outcomes of a
random experiment. The set X is called the sample space for x. To the mem-
bers of X, we associate probabilities (idealized, computed or approximated).
If X is a countable set (not necessarily finite), we call the random variable x
discrete. If X is uncountable, we call x a continuous random variable.
Let x be a discrete random variable with sample space X = {x1 , x2 , x3 , . . .}
(the set X may be infinite). The probability distribution of x is the association
of a non-negative real number pi = Pr(x = xi ) to each i. We must have
Σi pi = 1, which implies that each pi lies in the real interval4 [0, 1].
Defining the probability distribution of a continuous random variable x
involves more effort. For simplicity, assume that the sample space X is a subset
of R. We do not assign probabilities to individual elements of X. We instead
supply a non-negative real-valued function f defined on X. The probability
associated with the real interval [a, b] (the interval may be open at one or both
ends) is given by the integral ∫a^b f (x) dx. We may set f (x) = 0 for x ∈ R \ X
so that f is defined on the entire real line. The probability distribution function
f must satisfy ∫−∞^∞ f (x) dx = ∫X f (x) dx = 1.
Example A.103 (1) The random variable C that stands for the outcome of
the toss of a coin assumes two values H and T . If the coin is assumed to be
unbiased, we have Pr(C = H) = Pr(C = T ) = 1/2.
(2) The random variable D that stands for the outcome of the throw of a
die has the sample space {1, 2, 3, 4, 5, 6}. If the die is unbiased, the probability
distribution for D is Pr(D = 1) = Pr(D = 2) = Pr(D = 3) = Pr(D = 4) =
Pr(D = 5) = Pr(D = 6) = 1/6.
4 Let a, b be real numbers with a < b. The open interval (a, b) is the set {x ∈ R | a < x < b},

and the closed interval [a, b] is {x ∈ R | a 6 x 6 b}. We also talk about intervals [a, b) and
(a, b], closed at one end and open at the other.
(3) Consider the random experiment of choosing an element from Zn . The
random variable U standing for this event has sample space Zn . If all elements
are equally likely to occur in the choice, then Pr(x = i) = 1/n for each i ∈ Zn .
(4) [Exponential distribution] The random variable V representing the
life of a bulb is continuous. Its probability distribution function is often modeled
as f (x) = λe−λx for x ≥ 0, where λ is a positive real constant. The probability
that the life of a bulb is between a and b (with a < b) is ∫a^b λe−λx dx =
e−λa − e−λb . For the interval [0, ∞), this integral evaluates to 1. ¤

We can think about more complicated events, as exemplified below.

Example A.104 (1) [Binomial distribution] Let the random variable A
stand for the number of heads in n tosses of an unbiased coin. The sample
space for A is {0, 1, 2, . . . , n}. For an integer k in this set, the probability is
pk = Pr(A = k) = C(n, k)/2n , where C(n, k) denotes the binomial coefficient
n choose k. This is because the n tosses have 2n possible outcomes, and k heads
can occur in C(n, k) ways. Since the coin is unbiased, all of the 2n possible
outcomes are equally likely.

In general, if the probability of head is p and that of tail is q (so p + q = 1),
the probability that A takes the value k is C(n, k) pk q n−k .
(2) [Geometric distribution] Let B denote the number of tosses of a coin
until we obtain a head. The sample space for B is N, which is infinite but
countable. If p and q are the probabilities of head and tail in one toss of the
coin, then Pr(B = k) = q k−1 p. For an unbiased coin, this probability is 1/2k .
(3) [Hypergeometric distribution] An urn contains n1 red and n2 blue
balls, and m balls are randomly taken out of the urn together. The probability
that exactly k red balls are taken out is C(n1 , k) C(n2 , m − k)/C(n1 + n2 , m).
(4) [Uniform distribution] Let X = {x1 , x2 , . . . , xn } be a finite set. An
element is drawn from X. If all the elements have equal likelihood of being
drawn, the probability that xi is drawn is 1/n.
For a real interval [a, b] of finite length l = b − a, we define the continuous
uniform distribution as f (x) = 1/(b − a). The probability associated with a
subinterval [c, d] of [a, b] is (d − c)/(b − a).
We can carry the concept of uniform distribution to higher dimensions too.
For example, let us define the uniform distribution on a unit square by the
function f (x, y) = 1. For a sub-region A of this square, the probability is given
by the double integral ∫∫A dx dy, which evaluates to the area of A.
(5) [Poisson distribution] Suppose that we expect λ occurrences of some
event in one unit of time. The probability that exactly k events of this type
happen in one unit of time is given by λk e−λ /k! for all integers k ≥ 0.
(6) [Normal (or Gaussian) distribution] Many natural quantities (like the
weight or height of people in a group, or the velocity of a particle) flock around a
mean value µ with some deviation characterized by σ. In these cases, the probability
distribution is modeled as fµ,σ (x) = (1/(σ√2π)) e−(x−µ)²/(2σ²) for −∞ < x < ∞.
The probability that x lies within ±σ of µ is ∫µ−σ^µ+σ fµ,σ (x) dx ≈ 0.6827. The
probability for x ∈ [µ − 2σ, µ + 2σ] is about 0.9545, for x ∈ [µ − 3σ, µ + 3σ] is
about 0.9973, and for x ∈ [µ − 6σ, µ + 6σ] is about 1 − 1.973 × 10−9 .
The normal distribution has theoretical importance too. In 1810, Laplace
proved the central limit theorem, which states that the mean of a real-valued
sample of size n tends to follow a normal distribution as n → ∞. More im-
portantly, this result holds irrespective of the original probability distribution
followed by the objects of the sample. ¤
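Several of the discrete distributions above are one-liners in code. The small Python sketch below (ours, purely for illustration) computes binomial and Poisson probabilities and sanity-checks that each distribution sums to 1:

```python
from math import comb, exp, factorial

def binom_pmf(n, k, p):
    """Probability of exactly k heads in n tosses with head-probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when lam are expected per unit time."""
    return lam**k * exp(-lam) / factorial(k)

print(binom_pmf(4, 2, 0.5))        # → 0.375, i.e., C(4, 2)/2^4 = 6/16
# Each distribution sums to (essentially) 1 over its sample space:
assert abs(sum(binom_pmf(10, k, 0.3) for k in range(11)) - 1) < 1e-12
assert abs(sum(poisson_pmf(k, 2.5) for k in range(100)) - 1) < 1e-12
```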

If U and V are two discrete random variables with sample spaces X and
Y (both subsets of R), the random variable U + V has sample space X + Y =
{x + y | x ∈ X, y ∈ Y }. For z ∈ X + Y , the probability that the random
variable U + V assumes the value z is Σx+y=z Pr(U = x, V = y), where the
probability Pr(U = x, V = y) is the joint probability that U = x and V = y.
The random variables U and V are called independent if Pr(U = x, V = y) =
Pr(U = x) × Pr(V = y) for all x ∈ X and y ∈ Y .

The product U V is again a random variable with sample space XY =
{xy | x ∈ X, y ∈ Y }. For z ∈ XY , the probability that the random variable
U V assumes the value z is Σxy=z Pr(U = x, V = y).
The sum and product of continuous random variables can also be defined
(although I do not do it here).
The expectation of a real-valued discrete random variable U with sample
space X is defined as

    E(U ) = Σx∈X x Pr(U = x).

If U is a real-valued continuous random variable with probability distribution
function f (x), the expectation of U is defined as

    E(U ) = ∫−∞^∞ x f (x) dx.

For two random variables U and V , we always have E(U + V ) = E(U ) +


E(V ). If U and V are independent, we have E(U V ) = E(U )E(V ).

A.4.2 Birthday Paradox


A very important result from probability theory, used on multiple occasions
in this book, is the birthday paradox. Let S be a finite set of size n. We keep
on choosing elements from S uniformly randomly. That is, in each draw, each
of the n elements of S is equally likely (has probability 1/n) to be chosen. The
question is: how many elements must be chosen from S in order to obtain a
collision with high probability? More precisely, let xi be the element chosen
from S in the i-th draw. If k elements are drawn (with replacement) from S,
what is the chance (as a function of n and k) of having xi = xj for at least
one pair (i, j) satisfying 1 6 i < j 6 k?
In the worst case, n + 1 elements must be chosen to ensure that there is at
least one collision. However, we expect collisions to occur much earlier, that is,
for much smaller values of k. More precisely, for k ≈ 1.18√n, the probability
of collision is more than 1/2, whereas for k ≈ 3.06√n, the probability is more
than 0.99. In short, Θ(√n) choices suffice to obtain collisions with very high
probability. Assuming that each of the 365 days in the year is equally likely
to be the date of birth of a human, a group of only 23 randomly chosen people
has a chance of at least half of containing a pair with the same birthday. For
a group of 58 randomly chosen people, this probability is at least 0.99.
It is not difficult to derive the probability of collision. The probability that
there is no collision in k draws is n(n − 1)(n − 2) · · · (n − k + 1)/nk , that is,
the probability of at least one collision is

    pcollision (n, k) = 1 − n(n − 1)(n − 2) · · · (n − k + 1)/nk
                      = 1 − (1 − 1/n)(1 − 2/n) · · · (1 − (k − 1)/n).

For a positive real value x much smaller than 1, we have 1 − x ≈ e−x . So long
as k is much smaller than n, we then have

    pcollision (n, k) ≈ 1 − e−(1+2+···+(k−1))/n = 1 − e−k(k−1)/(2n) ≈ 1 − e−k²/(2n) .

Stated differently, we have

    k ≈ √(−2 ln(1 − pcollision (n, k))) × √n .

Plugging in pcollision (n, k) = 1/2 gives k ≈ 1.1774√n, whereas pcollision (n, k) =
0.99 gives k ≈ 3.0349√n.
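The exact and approximate collision probabilities are easy to compare numerically. The following Python sketch (our illustration) reproduces the birthday numbers quoted above:

```python
from math import log, sqrt

def p_collision(n, k):
    """Exact probability of at least one collision in k uniform draws from n."""
    p_no = 1.0
    for i in range(k):
        p_no *= (n - i) / n      # probability the (i+1)-st draw avoids the others
    return 1.0 - p_no

def k_for(n, p):
    """Approximate k with collision probability p, inverting p ≈ 1 - exp(-k²/(2n))."""
    return sqrt(-2.0 * log(1.0 - p)) * sqrt(n)

print(round(p_collision(365, 23), 4))   # → 0.5073: 23 people suffice
print(p_collision(365, 58) > 0.99)      # → True
print(round(k_for(365, 0.5), 1))        # → 22.5 ≈ 1.1774·sqrt(365)
```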

A.4.3 Random-Number Generators


Many number-theoretic algorithms require a source of random numbers,
that is, a random-number generator (RNG). For example, the Miller–Rabin
primality test needs random bases in Z∗n , Dixon’s integer-factoring method
requires randomly generated elements xi ∈ Zn in a search for smooth values
x2i (mod n), it is often needed to
locate random points on an elliptic curve, and so on. In general, the problem
of random-number generation can be simplistically stated as the generation of
sequences of bits b0 b1 b2 . . . such that each bit has probability 1/2 of being zero
and probability 1/2 of being one, and knowledge of the bits b0 , b1 , . . . , bi
does not let one predict the next bit bi+1 . If one clubs multiple bits together, one
can talk about random sequences of integers or even floating-point numbers.
RNGs can be classified into two broad categories. Hardware generators
sample random properties of physical objects to generate random bit streams.
Some example sources are noise in electronic circuits or nature, and quantum-
mechanical phenomena like spin of electrons or polarization of photons. One
may also consider the load of a computing processor and even user inputs, as
sources of random bits. Hardware RNGs are called true RNGs, because they
possess good statistical properties, at least theoretically. But hardware RNGs
are costly and difficult to control, and it is usually impossible to use them to
generate the same random sequence at two different times and/or locations.
Software RNGs offer practical solutions to the random-number generation
problem. They are called pseudorandom number generators (PRNG), since
they use known algorithms to generate sequences of random bits (or integers
or floating-point numbers). A PRNG operates on a seed initialized to a value
s0 . For i = 1, 2, 3, . . . , two functions f and g are used. The next element in the
pseudorandom sequence is generated as xi = f (si ), and the seed is updated to
si+1 = g(si ). In many cases, the seed itself is used as the random number, that
is, xi = si . PRNGs are easy to implement (in both hardware and software),
and are practically usable if their outputs look random.
Most commonly, PRNGs are realized using linear congruential generators
in which the pseudorandom sequence is generated as xi+1 ≡ axi + b (mod m)
for some suitable modulus m, multiplier a, and increment b. The parameters
a, b, m should be carefully chosen so as to avoid pseudorandom sequences of
poor statistical properties. An example of linear congruential generator is the
ANSI-C generator defined by m = 231 , a = 1103515245, and b = 12345. The
seed is initialized to x0 = 12345. This PRNG is known to have many flaws
but can be used in many practical situations.
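A linear congruential generator is a few lines of code. The sketch below (ours) implements the ANSI-C parameters quoted above; the first output from seed 12345 is a commonly cited test value:

```python
class LCG:
    """x_{i+1} = (a*x_i + b) mod m; here the updated seed itself is the output."""
    def __init__(self, a, b, m, seed):
        self.a, self.b, self.m, self.x = a, b, m, seed

    def next(self):
        self.x = (self.a * self.x + self.b) % self.m
        return self.x

# ANSI-C parameters: m = 2^31, a = 1103515245, b = 12345, x0 = 12345
rng = LCG(a=1103515245, b=12345, m=2**31, seed=12345)
print(rng.next())   # → 1406932606
```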
PRNGs with provably good statistical properties are also known. In 1986,
Lenore Blum, Manuel Blum and Michael Shub proposed a PRNG called the
Blum-Blum-Shub or BBS generator. This PRNG uses a modulus m = pq,
where both p and q are suitably large primes congruent to 3 modulo 4. The
sequence generation involves a modular squaring: xi+1 ≡ x2i (mod m). It is
not practical, since modular operations on multiple-precision integers are not
very efficient. Moreover, a statistical drawback of the BBS generator is that
each xi (except perhaps for i = 0) is a quadratic residue modulo m, that
is, not all elements of Z∗m are generated by the BBS generator. It, however,
is proved that if only O(log log m) least significant bits of xi are used in the
output stream, then the output bit sequence is indistinguishable from random,
so long as factoring the modulus m is infeasible. In view of this, the BBS
generator is often used in cryptographic applications.
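A toy version of the BBS generator makes the construction concrete. The parameters in the Python sketch below are artificially small illustrative choices of ours, not secure values; real use demands large primes, emitting only O(log log m) low-order bits per squaring:

```python
# Toy Blum-Blum-Shub sketch; p, q, and the seed are illustrative, NOT secure.
p, q = 499, 547            # small primes, both congruent to 3 modulo 4
m = p * q                  # the Blum modulus
x = 399 * 399 % m          # seed: a quadratic residue coprime to m
bits = []
for _ in range(16):
    x = x * x % m          # x_{i+1} = x_i^2 mod m
    bits.append(x & 1)     # emit only the least significant bit
print(bits)
```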
So far, we have concentrated on generating uniformly random samples (bits
or elements of Zm ). Samples following other probability distributions can also
be generated using a uniform PRNG. The concept of cumulative probability
helps us in this context.
Let U be a random variable with sample space X. For simplicity, suppose
that X = {x1 , x2 , . . . , xn } is finite. Let pi = Pr(U = xi ). We break the
real interval [0, 1) in n disjoint sub-intervals: I1 = [0, p1 ), I2 = [p1 , p1 + p2 ),
I3 = [p1 + p2 , p1 + p2 + p3 ), . . . , In = [p1 + p2 + · · · + pn−1 , 1). We then generate
a uniformly random floating-point number x ∈ [0, 1). There is a unique k such
that Ik contains x. We find out this k (for example, using binary search), and
output xk . It is easy to argue that the output follows the distribution of U .
If the elements of X are ordered as x1 < x2 < x3 < · · · < xn , the key role is
played here by the cumulative probabilities Pr(U 6 xk ) = p1 + p2 + · · · + pk .
In the case of a continuous random variable U , the summation is to be
replaced by integration. As an example, consider the exponential distribution
standing for the lifetime of an electric bulb: f (x) = e−x for all x ≥ 0 (see
Example A.103(4); we have taken λ = 1 for simplicity). In order to generate
a bulb sample with a lifetime following this distribution, we obtain the cumulative
probability distribution F (x) = Pr(U ≤ x) = ∫0^x e−t dt = 1 − e−x .
We generate a uniformly random floating-point value y ∈ [0, 1), and output the x
satisfying y = F (x) = 1 − e−x , that is, x = ln(1/(1 − y)).
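Both constructions, the cumulative sub-intervals for a discrete distribution and the inverted CDF for the exponential, fit in a few lines. The Python sketch below is our illustration of the method just described, not code from the book:

```python
import random
from bisect import bisect_right
from math import log

def sample_discrete(values, probs):
    """Draw from a finite distribution via cumulative sub-intervals of [0, 1)."""
    cum, total = [], 0.0
    for p in probs:
        total += p
        cum.append(total)
    u = random.random()
    # bisect_right finds the unique k with u in I_k (a binary search);
    # min() guards against float round-off in the cumulative sum.
    return values[min(bisect_right(cum, u), len(values) - 1)]

def sample_exponential():
    """lambda = 1 exponential via x = ln(1/(1 - y)), y uniform on [0, 1)."""
    return log(1.0 / (1.0 - random.random()))

random.seed(1)
draws = [sample_discrete("abc", [0.2, 0.5, 0.3]) for _ in range(10_000)]
print(round(draws.count("b") / 10_000, 2))   # close to the target 0.5
mean = sum(sample_exponential() for _ in range(10_000)) / 10_000
print(round(mean, 1))                        # close to the exponential mean 1
```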
Appendix B
Solutions to Selected Exercises

Chapter 1 Arithmetic of Integers 517
Chapter 2 Arithmetic of Finite Fields 528
Chapter 3 Arithmetic of Polynomials 536
Chapter 4 Arithmetic of Elliptic Curves 546
Chapter 5 Primality Testing 557
Chapter 6 Integer Factorization 562
Chapter 7 Discrete Logarithms 568
Chapter 8 Large Sparse Linear Systems 575
Chapter 9 Public-Key Cryptography 578

Chapter 1 Arithmetic of Integers

3. Let a = (as−1 as−2 . . . a1 a0 )B be the multiple-precision integer whose square


needs to be computed.
In the schoolbook multiplication method, we run a doubly nested loop
on i, j. The outer loop variable i runs in the range 0, 1, 2, . . . , s − 1, whereas
for each i, the inner loop variable j runs in the range 0, 1, 2, . . . , i. If i = j,
a2i is computed as the double-precision word (hl)B . The digits h and l are
added (with carry adjustments) to the (2i + 1)-st and the 2i-th words of
the product, respectively. If i 6= j, only one product ai aj is computed as a
double-precision word (h, l), and the digits h and l are added twice each to
the (i + j + 1)-st and the (i + j)-th words of the product, respectively. Here,
the products ai aj and aj ai are computed only once (but added twice). This
saves a significant number of word multiplications compared to the general-purpose
multiplication routine. In practice, one typically achieves a speedup
of 20–30% using this special trick for squaring.
Instead of adding the two words of ai aj twice to the product, one may
compute 2ai aj by left-shifting the product ai aj by one bit. The computation
of 2ai aj may lead to an overflow (in a double-precision space). If so, 1 is added
to the (i + j + 2)-nd word of the product. The other two words of 2ai aj are
added once each to the (i + j + 1)-st and the (i + j)-th words of the product.
For Karatsuba squaring, note that 2A1 A0 = A21 + A20 − (A1 − A0 )2 . This
means that three squares of integers of half sizes are computed recursively,
and combined using one addition and one subtraction.
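The identity 2A1 A0 = A1² + A0² − (A1 − A0)² turns one full-size product into three half-size squarings. A compact Python sketch of this recursion (ours; the cutoff and split point are arbitrary illustrative choices):

```python
def ksquare(a, cutoff=1 << 32):
    """Karatsuba-style squaring: a^2 from three recursive half-size squares."""
    if a < cutoff:
        return a * a                              # base case: direct square
    half = a.bit_length() // 2
    A1, A0 = a >> half, a & ((1 << half) - 1)     # a = A1*2^half + A0
    s1, s0, sd = ksquare(A1), ksquare(A0), ksquare(abs(A1 - A0))
    cross = s1 + s0 - sd                          # = 2*A1*A0 by the identity
    return (s1 << (2 * half)) + (cross << half) + s0

x = 123456789123456789123456789
print(ksquare(x) == x * x)   # → True
```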


5. Let us divide a = (as−1 as−2 . . . a1 a0 )B by b = B l +m. If l > s, then a < b, and


we are done. Otherwise, we multiply m by as−1 to obtain the double-precision
word (hl)B . Notice that for l > 2, the (s − 1)-st word of as−1 B s−1−l b is as−1 ,
so we set the (s − 1 − l)-th word of the quotient to as−1 , set as−1 to zero, and
subtract h and l from the (s−l)-th and the (s−1−l)-th words of a with borrow
adjustments. If the borrow is not finally absorbed, we add B s−1−l b to the
negative difference, and subtract 1 from the (s−1−l)-th word of the quotient.
Now, a is reduced to an integer of the form a′ = (a′s′ −1 a′s′ −2 . . . a′1 a′0 )B with
s′ = s−1. We then repeat the above steps to reduce a′s′ −1 to zero. This process
is continued until a reduces to an integer smaller than b. This reduced value
is the desired remainder.
For dividing a by b = B l − m, the words h and l are added (instead of
subtracted) to the (s − l)-th and the (s − 1 − l)-st words of a with carry
adjustments. In this case, a can never become negative, that is, the overhead
of adding b to an intermediate negative a is absent.
8. Write the operands as

a = A2 R2 + A1 R + A0 ,
b = B1 R + B0 ,

where R is a suitable integral power of the base B. The product is a polynomial


of degree three:

c = C3 R3 + C2 R2 + C1 R + C0 ,

where

C3 = A2 B1 ,
C2 = A2 B0 + A1 B1 ,
C1 = A1 B0 + A0 B1 ,
C0 = A0 B0 .

Instead of computing all the six products Ai Bj , we evaluate c = ab at R =


∞, 0, ±1 to obtain

C3 = c(∞) = A2 B1 ,
C0 = c(0) = A0 B0 ,
C3 + C2 + C1 + C0 = c(1) = (A2 + A1 + A0 )(B1 + B0 ),
−C3 + C2 − C1 + C0 = c(−1) = (A2 − A1 + A0 )(−B1 + B0 ).

The four products on the right side are computed by a good multiplication
algorithm (these products have balanced operands). Since
    
1 0 0 0 C3 c(∞)
 0 0 0 1   C2   c(0) 
   =  ,
1 1 1 1 C1 c(1)
−1 1 −1 1 C0 c(−1)
the coefficients C3 , C2 , C1 , C0 can be expressed in terms of the subproducts as

     C3       1    0    0     0      c(∞)
     C2   =   0   −1   1/2   1/2     c(0)
     C1      −1    0   1/2  −1/2     c(1)
     C0       0    1    0     0      c(−1)

12. (a) For each i in the range 2 ≤ i ≤ k + 1, we have ri−2 = qi ri−1 + ri .
Since ri−2 > ri−1 , we have qi ≥ 1, so ri−2 ≥ ri−1 + ri . Moreover, rk ≠ 0,
that is, rk ≥ 1 = F2 , and rk−1 > rk , that is, rk−1 ≥ 2 = F3 . But then,
rk−2 ≥ rk−1 + rk ≥ F3 + F2 = F4 , rk−3 ≥ rk−2 + rk−1 ≥ F4 + F3 = F5 , and
so on. Proceeding in this way, we can show that r1 ≥ Fk+1 , and r0 ≥ Fk+2 .

(b) Compute ri = ri−2 rem ri−1 , where 0 ≤ ri ≤ ri−1 − 1. If ri > ri−1 /2,
replace ri by ri−1 − ri . The correctness of this variant is based on the fact
that gcd(ri−1 , ri ) = gcd(ri−1 , −ri ) = gcd(ri−1 , ri−1 − ri ).

(c) In the original Euclidean algorithm, we have r1 ≥ Fk+1 ≈ (1/√5)ρk+1 , that
is, k ≤ −1 + log(√5 r1 )/ log ρ. For the modified algorithm, let k′ denote the
number of iterations. We have r1 ≥ 2r2 ≥ 2²r3 ≥ · · · ≥ 2^(k′−1) rk′ ≥ 2^(k′−1) ,
that is, k′ ≤ 1 + log(r1 )/ log 2. Since 2 > ρ, the modified algorithm has
the potential of reducing the number of iterations of the Euclidean loop by a
factor of log 2/ log ρ ≈ 1.440.
13. Let a, b be the two integers of which the gcd is to be computed. For the time
being, assume that a, b are both odd. In the extended binary gcd algorithm,
we keep track of three sequences ri , ui , vi satisfying the invariance

ui a + vi b = ri .

We initialize u0 = 1, v0 = 0, r0 = a, and u1 = 0, v1 = 1, r1 = b, so the


invariance is satisfied for i = 0, 1.
In the binary gcd loop, the smaller of ri−2 , ri−1 is subtracted from the other
to obtain an even integer ri . Suppose that ri−2 > ri−1 , so we compute ri =
ri−2 − ri−1 (the other case can be symmetrically handled). By the invariance
for i − 1, i − 2, we already have

ui−2 a + vi−2 b = ri−2 ,


ui−1 a + vi−1 b = ri−1 .

After subtraction, we should continue to have ui a + vi b = ri , so we set

ui = ui−2 − ui−1 ,
vi = vi−2 − vi−1 .
The next step is more complicated. Now, ri is even, so an appropriate number
of 2’s should be factored out of it so as to make it odd. Here, I describe the
removal of a single factor of 2. Since ui a + vi b = ri , dividing ri by 2 requires
dividing both ui and vi by 2 so that the invariance is maintained. But there
is no guarantee that ui , vi are even. However, since a, b are odd and ri is even,
ui and vi must have the same parity; it cannot happen that exactly one of
them is odd. If both ui , vi are even, we extract a factor of 2 from
both. If both ui , vi are odd, we rewrite the invariance as

(ui + b)a + (vi − a)b = ri .

Now, ui + b and vi − a are even, so a factor of 2 can be extracted from all of


ui + b, vi − a, ri .
Notice that each iteration of the binary gcd loop uses values from only
two previous iterations. So it is not necessary to store the entire sequences
ui , vi , ri . Moreover, it is not necessary to explicitly maintain (and update)
both the u and v sequences. If only the u sequence is maintained, one can
compute v = (r − ua)/b. This is done after the gcd loop terminates.
Let us now remove the restriction that a, b are odd, and write a = 2s a′
and b = 2t b′ with a′ , b′ odd. By the above binary gcd loop, we compute the
extended gcd of a′ , b′ giving

u′ a′ + v ′ b′ = d′ = gcd(a′ , b′ ).

Let r = min(s, t). Then, the gcd d of a, b is

2r u′ a′ + 2r v ′ b′ = u′ (2r a′ ) + v ′ (2r b′ ) = 2r d′ = d.

If s = t = r, we are done. So suppose that s > t (the case s < t can be
symmetrically handled). In that case, we have u′ 2t a′ + v ′ b = d. It is not
guaranteed that u′ itself would supply the remaining s − t factors of 2 at this
stage. For example, this initial u′ may be odd. Still, we proceed as follows.
Suppose that at some point of time, we have arrived at a readjusted Bézout
relation of the form

    u′ 2τ a′ + v ′ b = d

with t ≤ τ < s and with u′ odd. We rewrite this invariance as

    (u′ + b′ )2τ a′ + (v ′ − 2τ −t a′ )b = d.

Now, u′ + b′ is even, so factor(s) of 2 can be extracted from it and absorbed
into 2τ as long as necessary and/or possible.
Like the main binary gcd loop, it suffices to update only one of the u and
v sequences in this final adjustment loop. Moreover, only the value from the
previous iteration is needed for the current iteration to proceed.
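The loop described above can be sketched in Python; this is a minimal illustration (function name and structure are my own, not from the book), maintaining only the u sequence and recovering v at the end, exactly as suggested:

```python
def binary_ext_gcd_odd(a, b):
    """Extended gcd of odd positive integers a, b by the binary method.

    Only the u sequence is maintained; v is recovered at the end as
    v = (d - u*a) // b, as described in the text.
    """
    r0, u0 = a, 1      # invariant: u*a + v*b = r for an implicit v
    r1, u1 = b, 0
    while r1 != 0:
        if r0 < r1:
            r0, u0, r1, u1 = r1, u1, r0, u0
        r0, u0 = r0 - r1, u0 - u1        # r0 is now even (both were odd)
        while r0 != 0 and r0 % 2 == 0:
            if u0 % 2 != 0:              # u, v both odd: shift u by b (v by -a)
                u0 += b
            u0 //= 2                     # the implicit v is halved as well
            r0 //= 2
    d, u = r0, u0
    v = (d - u * a) // b
    return d, u, v
```

Note that the implicit v stays even exactly when u is even, so testing u alone suffices.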
16. (a) We have uk = uk−2 − qk uk−1 with qk ≥ 1, for all k ≥ 2. It is easy to establish that the sequence u2, u3, u4, . . . alternates in sign, and |uk| > qk|uk−1| ≥ |uk−1| (unless uk−2 = 0, which is true only for k = 3). The given inequalities for the v sequence can be analogously established.
(b) We start with u0 = 1, u1 = 0, and obtain u2 = u0 − q2u1 = 1 = K0( ) (the empty continuant), and u3 = u1 − q3u2 = −q3, that is, |u3| = K1(q3). In general, suppose that |ui| = Ki−2(q3, . . . , qi) and |ui+1| = Ki−1(q3, . . . , qi+1). But then,
Solutions to Selected Exercises 521

|ui+2| = |ui| + qi+2|ui+1| = qi+2 Ki−1(q3, . . . , qi+1) + Ki−2(q3, . . . , qi) = Ki(q3, . . . , qi+2). Thus, we have:

|ui| = Ki−2(q3, . . . , qi) for all i ≥ 2.

Analogously, we have:

|vi| = Ki−1(q2, . . . , qi) for all i ≥ 1.
(c) Take n = i − 2, and substitute x1 = q2, x2 = q3, . . . , xn+1 = qi in Exercise 1.15(e) to get |ui|Ki−2(q3, . . . , qi) − |vi|Ki−3(q3, . . . , qi−1) = (−1)^{i−2} for all i ≥ 3. Moreover, since u0 = v1 = u2 = 1, we have:

gcd(ui, vi) = 1 for all i ≥ 0.
(d) If rj = 0, that is, rj−1 = gcd(a, b) = d, we have uj a + vj b = 0, that is, uj (a/d) = −vj (b/d). By Part (c), gcd(uj, vj) = 1. Moreover, gcd(a/d, b/d) = 1. It therefore follows that uj = ±(b/d) and vj = ∓(a/d).
(e) The case b | a is easy to handle. So assume that a > b > 0 and b ∤ a. Then, the gcd loop ends with j ≥ 4. In this case, we have |u3| < |u4| < · · · < |uj−1| < |uj| = b/d and |v2| < |v3| < · · · < |vj−1| < |vj| = a/d, that is,

|uj−1| < b/d  and  |vj−1| < a/d.
21. Write a = (as−1 as−2 . . . a1 a0 )B and b = (bt−1 bt−2 . . . b1 b0 )B , where each ai
and each bj are base-B words. We assume that B is a power of 2, and that a
and b are odd. If so, b0 is an odd integer, and b0^{−1} (mod B) exists. We compute the multiplier µ = b0^{−1} a0 (mod B). The least significant word of a − µb is zero.
If B = 2r , we can remove at least r factors of 2 from a − µb.
Computing the inverse b0^{−1} (mod B) can be finished by a single-precision extended gcd computation. Moreover, the multiplication b0^{−1} a0 (mod B) is
again of single-precision integers. In most CPUs, this is the value returned
by the single-precision multiplication function (the more significant word is
ignored). Therefore, µ can be computed efficiently. The multiplication µb and
the subtraction a − µb are also efficient (taking time proportional to s).
But then, we require (a − µb)/B to have word length (at least) one smaller
than that of a. This is ensured if t < s, that is, if the word length of b is at
least one smaller than that of a. If s = t (we cannot have s < t since a > b),
then µb may be an (s + 1)-word integer, so a − µb is again an (s + 1)-word
negative integer. Ignoring the sign and removing the least significant word of
a − µb, we continue to have an s-word integer.
25. Consider the product x = n(n − 1)(n − 2) · · · (n − r + 1) of r consecutive
integers. If any of the factors n − i of x is zero, we have x = 0 which is a
multiple of r!. If all factors n − i of x are negative, we can write x as (−1)r
times a product of r consecutive positive integers. Therefore, we can assume
without loss of generality that 1 ≤ r ≤ n. But then, x = n!/(n − r)!. For any prime p and any k ∈ N, we have ⌊n/p^k⌋ ≥ ⌊(n − r)/p^k⌋ + ⌊r/p^k⌋. By Exercise 1.24, we conclude that vp(x) ≥ vp(r!).
28. (a) The modified square-and-multiply algorithm is elaborated below.

Let r = (rl−1 rl−2 . . . r1 r0 )2 , and s = (sl−1 sl−2 . . . s1 s0 )2 .


Precompute xy (mod n).
Initialize prod = 1.
For i = l − 1, l − 2, . . . , 1, 0 {
Set prod = prod2 (mod n).
If (ri = 1) and (si = 1), set prod = prod × (xy) (mod n),
else if (ri = 1) and (si = 0), set prod = prod × x (mod n),
else if (ri = 0) and (si = 1), set prod = prod × y (mod n).
}

The modified algorithm reduces the number of square operations to half of


that performed by two independent calls of the repeated square-and-multiply
algorithm. The number of products depends on the bit patterns of the ex-
ponents r and s. For random exponents, about half of the bits are one, that
is, two exponentiations make about l modular multiplications. The modified
algorithm skips the multiplication only when both the bits are 0—an event
having a probability of 1/4 for random exponents. Thus, the expected number
of multiplications done by the modified algorithm is 0.75l. The precomputa-
tion involves only one modular multiplication, and has negligible overhead.
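The modified algorithm translates directly into Python; the following is an illustrative sketch (the function name is mine, and max() of the two bit lengths stands in for the common length l assumed above):

```python
def simul_exp(x, y, r, s, n):
    """Compute (x**r * y**s) % n with a single shared squaring chain."""
    xy = (x * y) % n                      # the one-time precomputation
    l = max(r.bit_length(), s.bit_length())
    prod = 1
    for i in range(l - 1, -1, -1):
        prod = (prod * prod) % n          # one squaring per bit position
        ri, si = (r >> i) & 1, (s >> i) & 1
        if ri and si:
            prod = (prod * xy) % n
        elif ri:
            prod = (prod * x) % n
        elif si:
            prod = (prod * y) % n
    return prod
```

The multiplication is skipped only when both bits are 0, matching the 0.75l expected count derived above.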
41. Let p be a prime divisor of the modulus m, and e the multiplicity of p in m. If p | a, both a^m and a^{m−φ(m)} (and so their difference too) are divisible by p^e, since m − φ(m) ≥ m/p ≥ p^{e−1} ≥ e for all p ≥ 2 and e ≥ 1. On the other hand, if p ∤ a, we have a^{φ(p^e)} ≡ 1 (mod p^e) by the original theorem of Euler. But then, a^{φ(m)} ≡ 1 (mod p^e) (since φ(p^e) | φ(m)), that is, a^m − a^{m−φ(m)} = a^{m−φ(m)}(a^{φ(m)} − 1) is again divisible by p^e.
45. [Only if] Let a ∈ Z be a solution to the t congruences x ≡ ai (mod mi). But then for all i, j with i ≠ j, we have a = ai + ki mi = aj + kj mj for some integers ki, kj, that is, ai − aj = kj mj − ki mi is a multiple of gcd(mi, mj).
[If] We proceed by induction on t. As the base case, we take t = 2, that is,
we look at the two congruences x ≡ a1 (mod m1 ) and x ≡ a2 (mod m2 ) with
d = gcd(m1, m2) dividing a1 − a2. There exist u, v ∈ Z such that um1 + vm2 = d. Consider x = a1 + ((a2 − a1)/d) um1. Since d | (a2 − a1) by hypothesis, (a2 − a1)/d is an integer, so x ≡ a1 (mod m1). Moreover, um1 = d − vm2, so that a1 + ((a2 − a1)/d) um1 ≡ a1 + (a2 − a1) − ((a2 − a1)/d) vm2 ≡ a2 (mod m2). Thus, x = a1 + ((a2 − a1)/d) um1 is a simultaneous solution of the two given congruences.
Now, take t ≥ 3, and assume that the result holds for t − 1 congruences. As in the base case, the first two of the t given congruences are simultaneously solvable. Let a0 be any particular solution of the first two congruences, and m0 = lcm(m1, m2). Any solution of the first two congruences satisfies x ≡ a0 (mod m0) (see the uniqueness proof below). We now look at the t − 1 congruences x ≡ ai (mod mi) for i = 0, 3, 4, . . . , t. Take any i ∈ {3, 4, . . . , t}, and

write gcd(m0 , mi ) = gcd(lcm(m1 , m2 ), mi ) = lcm(gcd(m1 , mi ), gcd(m2 , mi )).


Now, gcd(m1 , mi )|(a1 − ai ) (by hypothesis), a1 − ai = (a1 − a0 ) + (a0 − ai ),
a0 ≡ a1 (mod m1 ) (since a0 is a solution of the first of the initial congru-
ences), and gcd(m1 , mi )|m1 . It follows that gcd(m1 , mi )|(a0 − ai ). Likewise,
gcd(m2 , mi )|(a0 − ai ), and so gcd(m0 , mi ) = lcm(gcd(m1 , mi ), gcd(m2 , mi ))
divides a0 −ai . By induction hypothesis, the t−1 congruences x ≡ ai (mod mi )
for i = 0, 3, 4, . . . , t are simultaneously solvable.
For proving the uniqueness, let a and b be two solutions of the given con-
gruences. For all i, we then have a ≡ ai (mod mi ) and b ≡ ai (mod mi ), that
is, a ≡ b (mod mi ), that is, mi |(a − b). Therefore, lcm(m1 , m2 , . . . , mt )|(a − b).
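The base case of this proof translates directly into code. A sketch in Python (the function names are mine), combining two congruences whose moduli need not be coprime:

```python
def ext_gcd(a, b):
    """Return (d, u, v) with u*a + v*b = d = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    d, u, v = ext_gcd(b, a % b)
    return d, v, u - (a // b) * v

def crt2(a1, m1, a2, m2):
    """Solve x = a1 (mod m1), x = a2 (mod m2); moduli need not be coprime.

    Follows the base case above: x = a1 + ((a2 - a1)/d) * u * m1,
    where u*m1 + v*m2 = d = gcd(m1, m2).
    """
    d, u, v = ext_gcd(m1, m2)
    if (a2 - a1) % d != 0:
        raise ValueError("incompatible congruences")
    lcm = m1 // d * m2
    x = a1 + ((a2 - a1) // d) * u * m1
    return x % lcm, lcm
```

Repeated application of crt2 carries out the induction step, with the combined solution taken modulo the lcm of the moduli.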
47. [If] Since gcd(e, φ(m)) = 1, an extended gcd calculation gives d satisfying
ed ≡ 1 (mod φ(m)). It suffices to show that aed ≡ a (mod m) for all a ∈
Zm (so exponentiation to the d-th power is the inverse of exponentiation
to the e-th power). Take any prime divisor pi of the modulus m. We show
that aed ≡ a (mod pi ). By CRT, it then follows that aed ≡ a (mod m).
If pi | a, both a^{ed} and a are congruent to zero modulo pi. So assume that pi ∤ a. Write ed = tφ(m) + 1 for some integer t. By Fermat’s little theorem, a^{pi−1} ≡ 1 (mod pi). Since φ(m) and so tφ(m) too are multiples of pi − 1, we have a^{tφ(m)} ≡ 1 (mod pi), that is, a^{ed} ≡ a (mod pi).
[Only if] Suppose that s = gcd(e, pi − 1) > 1 for some i ∈ {1, 2, . . . , k}. Let g be a primitive root of pi. Then, g^{(pi−1)/s} ≢ 1 (mod pi), but (g^{(pi−1)/s})^e ≡ (g^{pi−1})^{e/s} ≡ 1 (mod pi). Consider two elements a, b ∈ Zm satisfying a ≡ b ≡ 1 (mod pj) for j ≠ i, but a ≡ g^{(pi−1)/s} (mod pi) and b ≡ 1 (mod pi). By CRT, a ≢ b (mod m), but a^e ≡ b^e ≡ 1 (mod m), that is, the e-th-power-exponentiation map is not bijective. Therefore, gcd(e, φ(pi)) = 1 for all i, that is, gcd(e, φ(m)) = 1.
48. (a) We have bp ≡ b (mod p) and bq ≡ b (mod q), so we have to combine these two values by the CRT. Let β = bq + tq. Then, β ≡ bq ≡ b (mod q). Also, tq ≡ bp − bq (mod p), so β ≡ bq + (bp − bq) ≡ bp ≡ b (mod p). Therefore, β ≡ b (mod pq).
(b) Let s = |m| be the bit size of m. We then have the bit sizes |p| ≈ s/2 and
|q| ≈ s/2. Since modular exponentiation is done in cubic time, computing the
two modular exponentiations to obtain bp and bq takes a total time which is
about 1/4-th of that for computing b ≡ ae (mod m) directly. The remaining
operations in the modified algorithm can be done in O(s2 ) time. Thus, we get
a speed-up of about four.
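The recombination of Part (a) can be sketched in Python (the function name and the use of Python's built-in modular inverse are my choices, not the book's):

```python
def crt_pow(a, e, p, q):
    """Compute a**e mod p*q from the two half-size results bp and bq.

    bp and bq are recombined exactly as in Part (a):
    beta = bq + t*q with t*q = bp - bq (mod p).
    """
    bp = pow(a, e, p)
    bq = pow(a, e, q)
    t = ((bp - bq) * pow(q, -1, p)) % p   # q^(-1) mod p (Python 3.8+)
    return bq + t * q
```

In a real implementation one would also reduce the exponent modulo p − 1 and q − 1 (by Fermat's little theorem), which is where the factor-of-four speedup estimated in Part (b) comes from.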
50. In view of the CRT, it suffices to count the solutions of x^{m−1} ≡ 1 (mod p^e) for a prime divisor p of m with e = vp(m). Since the derivative of x^{m−1} − 1 is non-zero (it is (m − 1)x^{m−2} ≡ −x^{m−2}, as p | m) modulo p, each solution of x^{m−1} ≡ 1 (mod p) lifts uniquely to a solution of x^{m−1} ≡ 1 (mod p^e). Therefore, it suffices to count the solutions of x^{m−1} ≡ 1 (mod p). Let g be a primitive root of p, and g^α a solution of x^{m−1} ≡ 1 (mod p). This implies that (m − 1)α ≡ 0 (mod p − 1) (since ordp g = p − 1). This is a linear congruence in α, and has exactly gcd(p − 1, m − 1) solutions.
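The resulting count (the product of gcd(p − 1, m − 1) over the distinct prime divisors p of m) can be spot-checked by brute force in Python; m = 91 and m = 45 are my illustrative choices:

```python
from math import gcd

def count_roots(m):
    """Brute-force count of solutions of x^(m-1) = 1 (mod m)."""
    return sum(1 for x in range(m) if pow(x, m - 1, m) == 1)

def formula(m, primes):
    """Product of gcd(p-1, m-1) over the distinct prime divisors p of m."""
    result = 1
    for p in primes:
        result *= gcd(p - 1, m - 1)
    return result
```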

53. (a) Let f(x) = ad x^d + a_{d−1} x^{d−1} + · · · + a1 x + a0. The binomial theorem with the substitution x = ξ′ = ξ + kp^e gives

f(ξ′) = ad(ξ + kp^e)^d + a_{d−1}(ξ + kp^e)^{d−1} + · · · + a1(ξ + kp^e) + a0
      = f(ξ) + kp^e f′(ξ) + p^{2e} × t

for some integer t. The condition f(ξ′) ≡ 0 (mod p^{2e}) implies that f(ξ) + kp^e f′(ξ) ≡ 0 (mod p^{2e}), that is, f′(ξ) k ≡ −f(ξ)/p^e (mod p^e). Each solution of this linear congruence modulo p^e gives a lifted root ξ′ of f(x) modulo p^{2e}.
(b) Here, f(x) = 2x³ + 4x² + 3, so f′(x) = 6x² + 8x. For p = 5, e = 2 and ξ = 14, we have f(ξ) = 2 × 14³ + 4 × 14² + 3 = 6275, that is, f(ξ)/25 = 251 ≡ 1 (mod 25). Also, f′(ξ) ≡ 6 × 14² + 8 × 14 ≡ 1288 ≡ 13 (mod 25). Thus, we need to solve 13k ≡ −1 (mod 25). Since 13^{−1} ≡ 2 (mod 25), we have k ≡ −2 ≡ 23 (mod 25). It follows that the only solution of 2x³ + 4x² + 3 ≡ 0 (mod 625) is 14 + 23 × 25 ≡ 589 (mod 625).
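The lifting step of Part (a), and this particular computation, are easy to machine-check; a hedged Python sketch (the function names are mine):

```python
def hensel_lift(f, df, xi, p, e):
    """Lift a root xi of f modulo p^e to a root modulo p^(2e).

    Implements the linear congruence f'(xi)*k = -f(xi)/p^e (mod p^e)
    from Part (a); assumes f'(xi) is invertible modulo p^e.
    """
    pe = p ** e
    k = (-(f(xi) // pe) * pow(df(xi), -1, pe)) % pe
    return xi + k * pe

f = lambda x: 2 * x**3 + 4 * x**2 + 3
df = lambda x: 6 * x**2 + 8 * x
root = hensel_lift(f, df, 14, 5, 2)   # lift the root 14 mod 25 to mod 625
```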
56. Since we can extract powers of two easily from a, we assume that a is odd. For the Jacobi symbol, b is odd too. If a = b, then (a/b) = 0. If a < b, we use the quadratic reciprocity law to write (a/b) in terms of (b/a). So it remains only to analyze the case of (a/b) with a, b odd and a > b. Let α = a − b. We write α = 2^r a′ with r ∈ N and a′ odd. If r is even, then (a/b) = (a′/b), whereas if r is odd, then (a/b) = (2/b)(a′/b) = (−1)^{(b²−1)/8}(a′/b). So, the problem reduces to computing (a′/b) with both a′, b odd.

59. (a) By Euler’s criterion, we have b^{(p−1)/2} ≡ −1 (mod p). By Fermat’s little theorem, we also have b^{p−1} ≡ 1 (mod p). Therefore, the order of g ≡ b^q (mod p) is 2^v. If (a/p) = 1, we have (a^q/p) = (a/p)^q = 1. Moreover, the order of a^q modulo p is a divisor of 2^{v−1}, that is, a^q ≡ g^s (mod p) for some even integer s. We can rewrite this as a^q g^t ≡ 1 (mod p), where t = 2^v − s (if s = 0, we take t = 0). But then, (a^{(q+1)/2} g^{t/2})² ≡ a (mod p), that is, a^{(q+1)/2} g^{t/2} is a square root of a modulo p.
The loop in the Tonelli–Shanks algorithm determines the one bits of t/2. Since t/2 < 2^{v−1}, we can write t/2 = t_{v−2} 2^{v−2} + t_{v−3} 2^{v−3} + · · · + t1·2 + t0 with each ti ∈ {0, 1}. Assume that at the beginning of some iteration of the loop, bits t0, t1, . . . , t_{v−i−2} are determined, and t_{v−i−1} = 1. At that point, x stores the value a^{(q+1)/2} g^{(t_{v−i−2} ... t1 t0)_2} (mod p). (Initially, x ≡ a^{(q+1)/2} (mod p), and no bits ti are determined.) Since (a^{(q+1)/2} g^{t/2})² ≡ a (mod p), we have

x² a^{−1} ≡ (g^{−2^{v−i−1} (t_{v−2} t_{v−3} ... t_{v−i−1})_2})² ≡ g^{−2^{v−i} (t_{v−2} t_{v−3} ... t_{v−i−1})_2} (mod p).

The order of g modulo p is 2^v, and the integer (t_{v−2} t_{v−3} . . . t_{v−i−1})_2 is odd. Therefore, i is smallest among all integers j for which (x² a^{−1})^{2^j} ≡ 1 (mod p). Once this i is detected, we know t_{v−i−1} = 1, and multiply x by g^{2^{v−i−1}}, so x now stores the value a^{(q+1)/2} g^{(t_{v−i−1} t_{v−i−2} ... t1 t0)_2} (mod p). When all the bits

t0, t1, . . . , t_{v−2} are determined, x stores the value a^{(q+1)/2} g^{t/2} (mod p). At this point, we have x² a^{−1} ≡ (x² a^{−1})^{2^0} ≡ 1 (mod p).
(b) There are (p−1)/2 quadratic residues and (p−1)/2 quadratic non-residues
in Z∗p . Therefore, a randomly chosen element in Z∗p is a quadratic non-residue
with probability 1/2. That is, trying an expected constant number of random candidates gives us a non-residue, and locating b is expected to involve only a constant number of Legendre-symbol calculations. The remaining part of the algorithm
involves two modular exponentiations to compute g and the initial value of x.
Moreover, a−1 (mod p) can be precomputed outside the loop. Each iteration
of the loop involves at most v − 1 modular squaring operations to detect
i. This is followed by one modular multiplication. It therefore follows that
Algorithm 1.9 runs in probabilistic polynomial time.
(c) By Conjecture 1.74, the smallest quadratic non-residue modulo p is less than 2 ln² p. Therefore, we may search for b in Algorithm 1.9 deterministically in the sequence 1, 2, 3, . . . until a non-residue is found. The search succeeds in less than 2 ln² p iterations.
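Putting Parts (a)–(c) together, a compact Python version of the Tonelli–Shanks algorithm might look as follows (a sketch, assuming p is an odd prime and a a quadratic residue; the deterministic search for the non-residue b follows Part (c)):

```python
def tonelli_shanks(a, p):
    """Square root of a modulo an odd prime p; a must be a quadratic residue."""
    q, v = p - 1, 0
    while q % 2 == 0:                     # write p - 1 = 2^v * q with q odd
        q //= 2
        v += 1
    b = 2
    while pow(b, (p - 1) // 2, p) != p - 1:
        b += 1                            # smallest quadratic non-residue
    g = pow(b, q, p)                      # element of order 2^v
    x = pow(a, (q + 1) // 2, p)
    a_inv = pow(a, -1, p)
    while True:
        t = (x * x * a_inv) % p
        if t == 1:
            return x
        i, tt = 0, t                      # smallest i with t^(2^i) = 1
        while tt != 1:
            tt = (tt * tt) % p
            i += 1
        x = (x * pow(g, 1 << (v - i - 1), p)) % p
```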
61. (a) Let r = ⌊√p⌋. Since p is not a perfect square, we have r² < p < (r + 1)². Consider the (r + 1)² integers (u + vx) rem p with u, v ∈ {0, 1, 2, . . . , r}. By the pigeon-hole principle, there exist unequal pairs (u1, v1) and (u2, v2) such that u1 + xv1 ≡ u2 + xv2 (mod p). Let a = u1 − u2 and b = v2 − v1. Then, a ≡ bx (mod p). By the choice of u1, u2, v1, v2, we have −r ≤ a ≤ r and −r ≤ b ≤ r with either a ≠ 0 or b ≠ 0. Furthermore, since x ≢ 0 (mod p), both a and b must be non-zero.
(b) For p = 2, take a = b = 1. So, we assume that p is an odd prime. If p ≡ 1 (mod 4), the congruence x² ≡ −1 (mod p) is solvable by Exercise 1.60. Using this value of x in Part (a) gives us a non-zero pair (a, b) satisfying a ≡ bx (mod p), that is, a² ≡ b²x² ≡ −b² (mod p), that is, a² + b² ≡ 0 (mod p). By Part (a), 0 < a² + b² ≤ 2r² < 2p, that is, a² + b² = p.
Finally, take a prime p ≡ 3 (mod 4). If p = a² + b² for some a, b (both must be non-zero), we have (ab^{−1})² ≡ −1 (mod p). But by Exercise 1.60, the congruence x² ≡ −1 (mod p) does not have a solution.
(c) [If] Let m = p1 p2 · · · ps q1^{2e1} q2^{2e2} · · · qt^{2et}, where p1, p2, . . . , ps are all the prime divisors (not necessarily distinct from one another) of m that are not of the form 4k + 3, and where q1, q2, . . . , qt are all the prime divisors (distinct from one another) of m that are of the form 4k + 3. Since (a² + b²)(c² + d²) = (ac + bd)² + (ad − bc)², Part (b) establishes that p1 p2 · · · ps can be expressed as α² + β². Now, take a = α q1^{e1} q2^{e2} · · · qt^{et} and b = β q1^{e1} q2^{e2} · · · qt^{et}.
[Only if] Let m = a² + b² for some integers a, b, and let q ≡ 3 (mod 4) be a prime divisor of m. Since q | (a² + b²), and the congruence x² ≡ −1 (mod q) is not solvable by Exercise 1.60, we must have q | a and q | b. Let e = min(vq(a), vq(b)). Since m = a² + b², we have m/(q^e)² = (a/q^e)² + (b/q^e)², that is, m/(q^e)² is again a sum of two squares. If q divides m/(q^e)², then q divides both a/q^e and b/q^e as before, a contradiction to the choice of e.
70. The result is obvious for e = 1, so take e ≥ 2.
Lemma: For every e ≥ 2, we have (1 + ap)^{p^{e−2}} ≡ 1 + ap^{e−1} (mod p^e).
Proof We proceed by induction on e. For e = 2, both sides of the congruence are equal to the integer 1 + ap. So assume that the given congruence holds for some e ≥ 2. We investigate the value of (1 + ap)^{p^{e−1}} modulo p^{e+1}. By the induction hypothesis, (1 + ap)^{p^{e−2}} = 1 + ap^{e−1} + up^e for some integer u. Raising both sides of this equality to the p-th power gives

(1 + ap)^{p^{e−1}} = (1 + ap^{e−1} + up^e)^p
 = 1 + (p choose 1)(ap^{e−1} + up^e) + (p choose 2)(ap^{e−1} + up^e)² + · · · + (p choose p−1)(ap^{e−1} + up^e)^{p−1} + (ap^{e−1} + up^e)^p
 = 1 + ap^e + p^{e+1} × v

for some integer v (since p is prime and so p | (p choose k) for 1 ≤ k ≤ p − 1, and since the last term in the binomial expansion is divisible by p^{p(e−1)}, in which the exponent p(e−1) ≥ e + 1 for all p ≥ 3 and e ≥ 2). •
Let us now derive the order of 1 + ap modulo p^e. Using the lemma for e + 1 indicates (1 + ap)^{p^{e−1}} ≡ 1 + ap^e (mod p^{e+1}) and, in particular, (1 + ap)^{p^{e−1}} ≡ 1 (mod p^e). Therefore, ord_{p^e}(1 + ap) | p^{e−1}. The lemma also implies that (1 + ap)^{p^{e−2}} ≢ 1 (mod p^e) (for a is coprime to p), that is, ord_{p^e}(1 + ap) ∤ p^{e−2}. We, therefore, have ord_{p^e}(1 + ap) = p^{e−1}.

76. The infinite simple continued fraction expansion of 2 is ha0√ , a1 , a2 , . . .i =
h1, 2i. Let hn /kn = ha0 , a1√
, . . . , an i be the n-th convergent to 2 . But then,
for every n ∈ N,we have | 2 − hknn | < kn k1n+1 = kn (2kn1+kn−1 ) 6 2k12 , that is,
√ √ √ √
n

2kn − 2k1n < hn < 2kn + 2k1n , that is, − 2 + 4k12 < h2n − 2kn2 < 2 + 4k12 .
n √ n
Since kn > 1, it follows that h2n − 2kn2 ∈ {0, 1, −1} for all n ∈ N.√But 2 is
irrational, so we cannot have h2n − 2kn2 = 0. Furthermore, hknn < 2 for even
√ n
−1 if n is even,
n, whereas hknn > 2 for odd n. Consequently, h2n − 2kn2 =
1 if n is odd.

78. (a) We compute 5 = h2, 4, 4, 4, . . .i = h2, 4i as follows:

ξ0 = 5 = 2.236 . . . , a0 = ⌊ξ0 ⌋ = 2
1 1 √
ξ1 = =√ = 5 + 2 = 4.236 . . . , a1 = ⌊ξ1 ⌋ = 4
ξ0 − a0 5−2
1 1 √
ξ2 = =√ = 5 + 2 = 4.236 . . . , a2 = ⌊ξ2 ⌋ = 4
ξ1 − a1 5−2
···

(b) The first convergent is r0 = hk00 = h2i = 2/1, that is, h0 = 2 and k0 = 1.
But h20 − 5k02 = −1. Then, we have r = hk11 = h2, 4i = 2 + 41 = 49 , that is,
h1 = 9 and k1 = 4. We have h21 − 5k12 = 1. Since k0 6 k1 < k2 < k3 < · · · , the
smallest solution is (9, 4).

(c) We proceed by induction on n. For n = 0, (x0 , y0 ) = (a, b) = (9, 4) is


a solution of x2 − 5y 2 = 1 by Part (b). So assume that n > 1, and that
2
x2n−1 − 5yn−1 = 1. But then

x2n − 5yn2 = (axn−1 + 5byn−1 )2 − 5(bxn−1 + ayn−1 )2


= a2 (x2n−1 − 5yn−1
2
) − 5b2 (x2n−1 − 5yn−1
2
)
2 2
= a − 5b = 1.

80. Let r0 , r1 , r2 , . . . , rj be the remainder sequence computed by the Euclidean


gcd algorithm with r0 = a, r1 = b, and rj = 0. We assume that a > b so that r0 > r1 > r2 > · · · > rj−1 > rj. Let di be the bit length of ri. We then have d0 ≥ d1 ≥ d2 ≥ · · · ≥ dj−1 ≥ dj. The crucial observation here is
that if some di is slightly larger than di+1 , then Euclidean division of ri by
ri+1 is quite efficient, but we do not expect to have a huge size reduction in
ri+2 compared to ri . On the other hand, if di is much larger than di+1 , then
Euclidean division of ri by ri+1 takes quite some time, but the size reduction
of ri+2 compared to ri is also substantial.
In order to make this observation more precise, we note that Euclidean
division of an s-bit integer by a t-bit integer (with s > t) takes time roughly
proportional to t(s − t). (In practice, we do word-level operations on multiple-
precision integers, and so the running time of a Euclidean division is roughly
proportional to t′ (s′ − t′ ) where s′ , t′ are the word lengths of the operands.
But since bit lengths are roughly proportional to word lengths, we may talk
about bit lengths only.) Therefore, the running time of the Euclidean gcd loop is roughly proportional to d1(d0 − d1) + d2(d1 − d2) + d3(d2 − d3) + · · · + dj−1(dj−2 − dj−1) ≤ d1[(d0 − d1) + (d1 − d2) + (d2 − d3) + · · · + (dj−2 − dj−1)] = d1(d0 − dj−1) ≤ d0 d1 ≤ d0² = O(lg² a).
81. A GP/PARI code implementing this search is given below.

nprime = 0; nsol = 0;
for (p=2, 10^6, \
if (isprime(p), \
nprime++; \
t = p; s = 0; \
while (t > 0, s += t % 7; t = floor(t / 7)); \
if ((s > 1) && (!isprime(s)), \
nsol++; \
print("p = ", p, ", S7(p) = ", s); \
) \
) \
)
print("Total number of primes less than 10^6 is ", nprime);
print("Total number of primes for which S7(p) is composite is ", nsol);

This code reveals that among 78498 primes p < 106 , only 13596 lead to com-
posite values of S7 (p). Each of these composite values is either 25 or 35.

Let p = a_{k−1} 7^{k−1} + a_{k−2} 7^{k−2} + · · · + a1·7 + a0 be the base-7 representation of p, so S7(p) = a_{k−1} + a_{k−2} + · · · + a1 + a0. One easily sees that S7(p) − p is a multiple of 6 (since 7 ≡ 1 (mod 6)). It suffices to consider the primes p > 7, so p must be of the form 6s ± 1, that is, S7(p) too is of the form 6t ± 1. For p < 10^6, we have k ≤ 8 (where k is the number of 7-ary digits of p), that is, S7(p) ≤ 48. All integers r in the range 1 < r ≤ 48 and of the form 6t ± 1 are 5, 7, 11, 13, 17, 19, 23, 25, 29, 31, 35, 37, 41, 43, 47. All of these except 25 and 35 are prime.
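The key congruence S7(p) ≡ p (mod 6), which drives the whole argument, can be spot-checked in Python (a small illustrative sketch; the function name is mine):

```python
def s7(n):
    """Sum of the base-7 digits of n."""
    s = 0
    while n:
        s += n % 7
        n //= 7
    return s
```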
82. The following GP/PARI function implements the search for integral values of
(a² + b²)/(ab + 1). Since the function is symmetric in its two arguments, we restrict the search to 1 ≤ a ≤ b ≤ B.

getsol (B) = \
for (a=1, B, \
for (b=a, B, \
c = a^2 + b^2; \
d = a*b + 1; \
if (c % d == 0, \
print("a = ", a, ", b = ", b, ", (a^2+b^2)/(ab+1) = ", c / d); \
) \
) \
)

The observation is that whenever (a² + b²)/(ab + 1) is an integer, that integer is a perfect square. For proving this, let a, b be non-negative integers with a ≥ b ≥ 0 such that (a² + b²)/(ab + 1) = n ∈ N. But then, we have

a² − (nb)a + (b² − n) = 0.

This is a quadratic equation in a (treating b as constant). Let the other solution of this equation be a′. We have a + a′ = nb and aa′ = b² − n. The first of these equations implies that a′ = nb − a is an integer. Since (a′² + b²)/(a′b + 1) is positive, a′ cannot be negative. Moreover, a′ = (b² − n)/a ≤ (a² − n)/a < a²/a = a. Thus, we can replace the solution (a, b) of (a² + b²)/(ab + 1) = n by a strictly smaller solution (b, a′) (with the same n). This process cannot continue indefinitely, that is, we must eventually encounter a pair (ā, b̄) for which (ā² + b̄²)/(āb̄ + 1) = n with b̄ = 0. But then, n = ā².
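The descent argument above is constructive; a small Python sketch (my own function, not from the book) carries any solution down to one with b = 0, exposing n as a perfect square:

```python
def vieta_descent(a, b):
    """Descend a solution of (a^2 + b^2)/(a*b + 1) = n to (abar, 0); return (n, abar)."""
    assert (a * a + b * b) % (a * b + 1) == 0
    n = (a * a + b * b) // (a * b + 1)
    if a < b:
        a, b = b, a
    while b > 0:
        a, b = b, n * b - a      # replace (a, b) by the smaller solution (b, a')
    return n, a                  # at this point n == a * a
```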

Chapter 2 Arithmetic of Finite Fields


1. We adopt the convention that the degree of the zero polynomial is −∞. For any
two polynomials f (x), g(x), we then have deg(f (x)g(x)) = deg f (x)+deg g(x).
Moreover, we can include the case r(x) = 0 in the case deg r(x) < deg g(x).
(a) Let m = deg f (x) and n = deg g(x). Since the result is trivial for n = 0
(constant non-zero polynomials g(x)), we assume that n > 1, and proceed by

induction on m. If m < n, we take q(x) = 0 and r(x) = f (x). So consider


m > n, and assume that the result holds for all polynomials f1 (x) of degrees
< m. If a and b are the leading coefficients of f and g, we construct the
polynomial f1 (x) = f (x) − (a/b)xm−n g(x). Clearly, deg f1 (x) < m, and so by
the induction hypothesis, f1 (x) = q1 (x)g(x) + r1 (x) for some polynomials q1
and r1 with deg r1 < deg g. But then, f (x) = (q1 (x)+(a/b)xm−n )g(x)+r1 (x),
that is, we take q(x) = q1 (x) + (a/b)xm−n and r(x) = r1 (x).
In order to prove the uniqueness of the quotient and the remainder polyno-
mials, suppose that f (x) = q(x)g(x) + r(x) = q̄(x)g(x) + r̄(x) with both r and
r̄ having degrees less than deg g. But then, (q(x) − q̄(x))g(x) = r̄(x) − r(x).
If r ≠ r̄, then the right side is a non-zero polynomial of degree less than n, whereas the left side, if non-zero, is a polynomial of degree ≥ n. This contradiction indicates that we must have q = q̄ and r = r̄.
(b) Since r(x) = f (x) − q(x)g(x), any common divisor of f (x) and g(x) di-
vides r(x) and so gcd(g(x), r(x)) too. Likewise, f (x) = q(x)g(x) + r(x) implies
that any common divisor of g(x) and r(x) divides f (x) and so gcd(f (x), g(x))
too. In particular, gcd(f, g)| gcd(g, r) and gcd(g, r)| gcd(f, g). If both these
gcds are taken as monic polynomials, they must be equal.
(c) We follow a procedure similar to the Euclidean gcd of integers. We gener-
ate three sequences ri (x), ui (x), vi (x) maintaining the invariance ui (x)f (x) +
vi (x)g(x) = ri (x) for all i > 0. We initialize the sequences as r0 (x) = f (x),
u0 (x) = 1, v0 (x) = 0, r1 (x) = g(x), u1 (x) = 0, v1 (x) = 1. Subsequently,
for i = 2, 3, 4, . . . , we compute the quotient qi (x) and ri (x) of Euclidean
division of ri−2 (x) by ri−1 (x). We also update the u and v sequences as
ui (x) = ui−2 (x) − qi (x)ui−1 (x) and vi (x) = vi−2 (x) − qi (x)vi−1 (x). The al-
gorithm terminates, since the r sequence consists of polynomials with strictly
decreasing degrees. If j is the smallest index for which rj (x) = 0, then
gcd(f (x), g(x)) = rj−1 (x) = uj−1 (x)f (x) + vj−1 (x)g(x).
(d) Let d(x) = gcd(f (x), g(x)) = u(x)f (x) + v(x)g(x) for some polynomials
u, v. For any polynomial q(x), we have d(x) = (u(x) − q(x)g(x))f (x) + (v(x) +
q(x)f (x))g(x). In particular, we can take q(x) = u(x) quot g(x), and assume
that deg u < deg g in the Bézout relation d = uf + vg. But then, deg vg = deg v + deg g = deg(d − uf) ≤ max(deg d, deg uf) = deg uf = deg u + deg f < deg g + deg f, that is, deg v < deg f.
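The inductive construction in Part (a) is exactly the schoolbook polynomial division loop; a Python sketch over Fp (the representation and names are my own choices):

```python
def poly_divmod(f, g, p):
    """Euclidean division of f by g over F_p.

    Polynomials are coefficient lists with index = degree (constant first).
    Returns (q, r) with f = q*g + r and deg r < deg g.
    """
    def trim(h):
        while len(h) > 1 and h[-1] == 0:
            h.pop()
        return h

    f = trim([c % p for c in f])
    g = trim([c % p for c in g])
    assert g != [0], "division by the zero polynomial"
    q = [0] * max(len(f) - len(g) + 1, 1)
    r = f[:]
    inv = pow(g[-1], -1, p)              # 1/b for the (a/b)*x^(m-n) step
    while len(r) >= len(g) and r != [0]:
        d = len(r) - len(g)              # m - n, the degree gap
        c = (r[-1] * inv) % p
        q[d] = c
        for i, gc in enumerate(g):       # r = r - c * x^d * g
            r[i + d] = (r[i + d] - c * gc) % p
        trim(r)
    return trim(q), r
```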
9. (c) We use a window of size t. For simplicity, t should divide the bit size w of
a word. If w = 32 or 64, natural choices for t are 2, 4, 8. For each t-bit pattern
(at−1 at−2 . . . a1 a0 ), the 2t-bit pattern (0at−1 0at−2 . . . 0a1 0a0 ) is precomputed
and stored in a table of size 2t . In the squaring loop, t bits of the operand are
processed simultaneously. For a t-bit chunk in the operand, the square is read
from the precomputed table and XOR-ed with the output with an appropriate
shift. Note that the precomputed table is an absolutely constant table, that
is, independent of the operand.
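A Python model of this windowed squaring with t = 8 may look as follows (a sketch; the table SPREAD plays the role of the precomputed constant table, and polynomials over F2 are integers whose bits are the coefficients):

```python
# Precomputed once: spread the 8 bits of each byte into the even positions.
SPREAD = [sum(((i >> j) & 1) << (2 * j) for j in range(8)) for i in range(256)]

def gf2_square(a):
    """Square of a polynomial over F_2 (bits of the integer a).

    Squaring over F_2 just interleaves zero bits; each byte of the
    operand is looked up in SPREAD and OR-ed in at twice its shift.
    """
    result, shift = 0, 0
    while a:
        result |= SPREAD[a & 0xFF] << shift
        a >>= 8
        shift += 16
    return result
```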
10. We first extract the coefficients of x^255 through x^233 from γ3:

µ = RIGHT-SHIFT(γ3 , 41).

We then make these bits in γ3 zero as follows:


γ3 is AND-ed with the constant integer 0x1FFFFFFFFFF.
What remains is to add µf1 = µ(x^74 + 1) = x^64(x^10 µ) + µ to γ. Since µ is a 23-bit value, this is done as follows:
γ1 is XOR-ed with LEFT-SHIFT(µ, 10),
γ0 is XOR-ed with µ.

13. Initialize the u sequence as u0 = β and u1 = 0. The rest of the extended


gcd algorithm remains the same. Now, the extended gcd loop maintains the invariance ui β^{−1}α + vi f = ri (where f is the defining polynomial). If rj = 1, we have uj β^{−1}α ≡ 1 (mod f), that is, βα^{−1} = uj.
15. (a) We have α^{2^{2k}−1} = α^{(2^k−1)(2^k+1)} = (α^{2^k−1})^{2^k} α^{2^k−1}. Moreover, α^{2^{2k+1}−1} = α^{2^{2k+1}−2+1} = (α^{2^{2k}−1})² α.
(b) The following algorithm resembles left-to-right exponentiation.

Let n − 1 = (n_{s−1} n_{s−2} . . . n1 n0)_2 with n_{s−1} = 1.
Initialize prod = α and k = 1.
/* Loop for computing α^{2^{n−1}−1} */
For i = s − 2, s − 3, . . . , 2, 1, 0, repeat: {
    /* Here, k = (n_{s−1} n_{s−2} . . . n_{i+1})_2, and prod = α^{2^k−1} */
    Set t = prod. /* Remember α^{2^k−1} */
    For j = 1, 2, . . . , k, set prod = prod². /* prod = (α^{2^k−1})^{2^k} */
    Set prod = prod × t. /* prod = α^{2^{2k}−1} = (α^{2^k−1})^{2^k} α^{2^k−1} */
    Set k = 2k. /* k = (n_{s−1} n_{s−2} . . . n_{i+1} 0)_2 */
    If (ni = 1) { /* (n_{s−1} . . . n_{i+1} ni)_2 = (n_{s−1} . . . n_{i+1} 0)_2 + 1 */
        Set prod = prod² × α and k = k + 1.
    }
}
Return prod².
(c) Let Ni = (n_{s−1} n_{s−2} . . . ni)_2. The number of squares (in the field) performed by the loop is ≤ (N_{s−1} + N_{s−2} + · · · + N1) + (s − 1) = ⌊(n−1)/2^{s−1}⌋ + ⌊(n−1)/2^{s−2}⌋ + · · · + ⌊(n−1)/2⌋ + (s − 1) ≤ (n − 1)(1/2^{s−1} + 1/2^{s−2} + · · · + 1/2) + (s − 1) ≤ (n − 1) + (s − 1) ≤ n + s. The number of field multiplications performed by the loop is ≤ 2s. The algorithm of Exercise 2.14(b), on the other hand, performs about n square and n multiplication operations in the field. Since s ≈ lg n, the current algorithm is expected to be faster than the algorithm of Exercise 2.14(b) (unless n is too small).
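The exponent bookkeeping of the loop in Part (b) can be verified without any field arithmetic by tracking only the exponent of α; the following simulation sketch (my own) checks the invariant prod = α^{2^k−1}:

```python
def itoh_tsujii_exponent(n):
    """Simulate the loop of Part (b), tracking only the exponent of alpha.

    The invariant is prod = alpha^(2^k - 1); the returned value should be
    2^n - 2, the exponent that yields the inverse of alpha in F_{2^n}.
    """
    bits = bin(n - 1)[2:]            # n - 1 = (n_{s-1} ... n_0)_2
    e, k = 1, 1                      # prod = alpha^e with e = 2^k - 1
    for bit in bits[1:]:
        e = e * (2 ** k) + e         # prod = prod^(2^k) * t  ->  alpha^(2^(2k)-1)
        k *= 2
        if bit == '1':
            e = 2 * e + 1            # prod = prod^2 * alpha
            k += 1
    return 2 * e                     # the final "Return prod^2"
```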
24. Let us represent elements of F_{p^n} in a basis β0, β1, . . . , β_{n−1}. Take an element α = a0β0 + a1β1 + a2β2 + · · · + a_{n−1}β_{n−1} with each ai ∈ Fp. We then have ai^p = ai for all i. Therefore, α^p = a0β0^p + a1β1^p + a2β2^p + · · · + a_{n−1}β_{n−1}^p. If we precompute and store each βi^p as an Fp-linear combination of β0, β1, . . . , β_{n−1}, computing α^p can be finished in O(n²) time.

If β0 , β1 , β2 , . . . , βn−1 constitute a normal basis of Fpn over Fp with βi =


i
β p , then we have βip = β(i+1) rem n . Therefore, the p-th power exponentiation
of (a0 , a1 , . . . , an−1 ) is the cyclic shift (an−1 , a0 , a1 , . . . , an−2 ). That is, p-th
power exponentiation with respect to a normal basis is very efficient.
26. (b) We first write the input operands as (a0 + a1θ) + (a2)θ² and (b0 + b1θ) + (b2)θ². The first level of Karatsuba–Ofman multiplication involves computing the three products (a0 + a1θ)(b0 + b1θ), a2b2 and (a0 + a2 + a1θ)(b0 + b2 + b1θ),
of which only one (a2 b2 ) is an Fq -multiplication. Applying a second level of
Karatsuba–Ofman multiplication on (a0 + a1 θ)(b0 + b1 θ) requires three Fq -
multiplications: a0 b0 , a1 b1 , and (a0 + a1 )(b0 + b1 ). Likewise, computing (a0 +
a2 +a1 θ)(b0 +b2 +b1 θ) involves three Fq -multiplications: (a0 +a2 )(b0 +b2 ), a1 b1 ,
and (a0 + a1 + a2 )(b0 + b1 + b2 ). Finally, note that the product a1 b1 appears
in both the second-level Karatsuba–Ofman multiplications, and needs to be
computed only once.
(d) Let us write the input operands as (a0 + a1θ + a2θ²) + (a3 + a4θ)θ³ and (b0 + b1θ + b2θ²) + (b3 + b4θ)θ³. In the first level of Karatsuba–Ofman multiplication, we need the three products (a0 + a1θ + a2θ²)(b0 + b1θ + b2θ²) (requiring six Fq-multiplications by Part (b)), (a3 + a4θ)(b3 + b4θ) (requiring three Fq-multiplications by Part (a)), and ((a0 + a3) + (a1 + a4)θ + a2θ²)((b0 + b3) + (b1 + b4)θ + b2θ²) (requiring six Fq-multiplications again by Part (b)). However, the
Fq -product a2 b2 is commonly required in the first and the third of these three
first-level products, and needs to be computed only once.
36. (a) The monic linear irreducible polynomials over F4 are x, x+1, x+θ, x+θ+1.
The products of any two (including repetition) of these polynomials are the
reducible monic quadratic polynomials—there are ten of them: x², x² + 1, x² + θ + 1, x² + θ, x² + x, x² + θx, x² + (θ+1)x, x² + (θ+1)x + θ, x² + θx + (θ+1), and x² + x + 1. The remaining six monic quadratic polynomials are irreducible: x² + x + θ, x² + x + (θ+1), x² + θx + 1, x² + θx + θ, x² + (θ+1)x + 1, and x² + (θ+1)x + (θ+1).
(b) Let us use the polynomial x² + x + θ to represent F16. That is, F16 = F4(ψ), where ψ² + ψ + θ = 0. Let us take two elements

α = (a3 θ + a2 )ψ + (a1 θ + a0 ),
β = (b3 θ + b2 )ψ + (b1 θ + b0 )

in F16 . The formula for their sum is simple:

α+β = [(a3 + b3 )θ + (a2 + b2 )]ψ + [(a1 + b1 )θ + (a0 + b0 )].

The product involves reduction with respect to both θ and ψ.

αβ = [(a3b3 + a3b2 + a2b3)θ + (a3b3 + a2b2)]ψ² obtained from [(a3θ+a2)(b3θ+b2)]ψ² + [(a3θ+a2)(b1θ+b0) + (a1θ+a0)(b3θ+b2)]ψ + [(a1θ+a0)(b1θ+b0)], expanded as follows:

αβ = [(a3b3+a3b2+a2b3)θ + (a3b3+a2b2)]ψ² + [(a3b1+a3b0+a2b1+a1b3+a1b2+a0b3)θ + (a3b1+a2b0+a1b3+a0b2)]ψ + [(a1b1+a1b0+a0b1)θ + (a1b1+a0b0)]
 = [(a3b3+a3b2+a2b3)θ + (a3b3+a2b2)](ψ+θ) + [(a3b1+a3b0+a2b1+a1b3+a1b2+a0b3)θ + (a3b1+a2b0+a1b3+a0b2)]ψ + [(a1b1+a1b0+a0b1)θ + (a1b1+a0b0)]
 = [(a3b3+a3b2+a3b1+a3b0+a2b3+a2b1+a1b3+a1b2+a0b3)θ + (a3b3+a3b1+a2b2+a2b0+a1b3+a0b2)]ψ + [(a3b3+a3b2+a2b3)θ² + (a3b3+a2b2+a1b1+a1b0+a0b1)θ + (a1b1+a0b0)]
 = [(a3b3+a3b2+a3b1+a3b0+a2b3+a2b1+a1b3+a1b2+a0b3)θ + (a3b3+a3b1+a2b2+a2b0+a1b3+a0b2)]ψ + [(a3b2+a2b3+a2b2+a1b1+a1b0+a0b1)θ + (a3b3+a3b2+a2b3+a1b1+a0b0)]

(c) We have |F∗16| = 15 = 3 × 5, ψ³ = (θ + 1)ψ + θ ≠ 1 and ψ⁵ = θ ≠ 1, so ψ is a primitive element of F16.
(d) We have the following powers of γ = (θ + 1)ψ + 1:

    γ   = (θ + 1)ψ + 1,
    γ^2 = θψ + θ,
    γ^4 = (θ + 1)ψ + θ,
    γ^8 = θψ.

Thus, the minimal polynomial of γ over F2 is (x + γ)(x + γ^2)(x + γ^4)(x + γ^8) =
x^4 + x^3 + x^2 + x + 1.
(e) The minimal polynomial of γ over F4 is (x + γ)(x + γ^4) = (x + (θ + 1)ψ + 1)
(x + (θ + 1)ψ + θ) = x^2 + (θ + 1)x + 1.
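The tower arithmetic above is easy to get wrong by hand, so here is a small Python cross-check (the integer encodings are ours, not the book's): F16 is built as F4(ψ) with ψ^2 + ψ + θ = 0, and we verify the multiplicative orders that Parts (c)–(e) rely on.

```python
THETA = 2                      # θ in a 2-bit encoding of F4: bit i = coefficient of θ^i

def f4_mul(a, b):              # θ^2 = θ + 1
    r = 0
    if b & 1:
        r ^= a
    if b & 2:
        r ^= a << 1
    if r & 4:
        r ^= 4 ^ 3
    return r

def f16_mul(x, y):             # elements (hi, lo) meaning hi·ψ + lo; ψ^2 = ψ + θ
    a1, a0 = x
    b1, b0 = y
    t = f4_mul(a1, b1)                           # coefficient of ψ^2
    hi = t ^ f4_mul(a1, b0) ^ f4_mul(a0, b1)     # ψ^2 contributes ψ ...
    lo = f4_mul(t, THETA) ^ f4_mul(a0, b0)       # ... and θ
    return (hi, lo)

def order(x):                  # multiplicative order; (0, 1) encodes 1
    k, y = 1, x
    while y != (0, 1):
        y = f16_mul(y, x)
        k += 1
    return k

psi = (1, 0)                   # ψ
gamma = (3, 1)                 # γ = (θ + 1)ψ + 1
print(order(psi), order(gamma))    # 15 5
```

ψ has order 15, so it is primitive, and γ has order 5, consistent with its minimal polynomial x^4 + x^3 + x^2 + x + 1 dividing x^5 − 1.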
41. As in Exercise 2.32, we represent F16 = F2 (φ) with φ4 + φ + 1 = 0. The
representation of F16 in Exercise 2.36 is F16 = F2 (θ)(ψ), where θ2 + θ + 1 = 0
and ψ 2 + ψ + θ = 0. We need to compute the change-of-basis matrix from
the polynomial basis (1, φ, φ2 , φ3 ) to the composite basis (1, θ, ψ, θψ). To that
effect, we note that φ satisfies x4 + x + 1 = 0, and obtain a root of this
polynomial in the second representation. Squaring ψ 2 + ψ + θ = 0 gives ψ 4 +
ψ 2 +θ2 = 0, that is, ψ 4 +(ψ 2 +ψ +θ)+ψ +(θ2 +θ) = 0, that is, ψ 4 +ψ +1 = 0.
We consider the linear map µ taking φ to ψ, and obtain:

µ(1) = 1,
µ(φ) = ψ,
µ(φ2 ) = ψ 2 = ψ + θ,
µ(φ3 ) = ψ(ψ + θ) = ψ 2 + ψθ = θ + ψ + ψθ.
Solutions to Selected Exercises 533

Therefore, the change-of-basis matrix is


    T = [ 1 0 0 0 ]
        [ 0 0 1 0 ]
        [ 0 1 1 0 ]
        [ 0 1 1 1 ]

42. We iteratively find elements β0 , β1 , . . . , βn−1 to form an Fp -basis of Fpn . Ini-


tially, any non-zero element of Fpn can be taken as β0 , so the number of choices
is pn − 1. Now, suppose that i linearly independent elements β0 , β1 , . . . , βi−1
are chosen. The number of all possible Fp -linear combinations of these i el-
ements is exactly pi . We choose any βi which is not a linear combination of
β0 , β1 , . . . , βi−1 , that is, the number of choices for βi is exactly pn − pi .
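The resulting count, ∏_{i=0}^{n−1} (p^n − p^i), can be checked by brute force for tiny cases. The Python sketch below (ours, not the book's GP/PARI) enumerates ordered tuples of linearly independent vectors of Fp^n exactly as in the iterative argument above:

```python
import math
from itertools import product

def count_ordered_bases(p, n):
    vecs = list(product(range(p), repeat=n))

    def add(u, v):
        return tuple((a + b) % p for a, b in zip(u, v))

    def smul(c, v):
        return tuple((c * a) % p for a in v)

    count = 0

    def extend(k, span):
        nonlocal count
        if k == n:
            count += 1
            return
        for v in vecs:
            if v not in span:          # β_k must avoid all p^k linear combinations
                new_span = {add(w, smul(c, v)) for w in span for c in range(p)}
                extend(k + 1, new_span)

    extend(0, {(0,) * n})
    return count

def formula(p, n):
    return math.prod(p**n - p**i for i in range(n))

print(count_ordered_bases(2, 2), formula(2, 2))   # 6 6
print(count_ordered_bases(3, 2), formula(3, 2))   # 48 48
```

For p = 2, n = 2 this is just the number of invertible 2 × 2 matrices over F2, namely 6.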
44. Both the parts follow from the following result.
Claim: Let d = gcd(m, n). Then, g decomposes in Fpm [x] into a product of d
irreducible polynomials, each of degree n/d.
Proof Take any root α of g (in an algebraic closure of Fp). The conjugates of α
over Fpm are α, α^(p^m), α^((p^m)^2), . . . , α^((p^m)^(t−1)), where t is the smallest
positive integer for which α^((p^m)^t) = α. On the other hand, deg g = n, and g is
irreducible over Fp, implying that α^(p^k) = α if and only if k is a multiple of n.
Therefore, mt ≡ 0 (mod n). The smallest positive integral solution for t is n/d.
That is, the degree of α over Fpm is exactly n/d. Since this is true for any root
of g, the claim is established. •
53. If α is a t-th power residue, then β^t = α for some β ∈ F*q. But then, α^((q−1)/d) =
(β^t)^((q−1)/d) = (β^(q−1))^(t/d) = 1 by Fermat's little theorem.
    Proving the converse requires more effort. Let γ be a primitive element
in F*q. Then, an element γ^i is a t-th power residue if and only if γ^i = (γ^y)^t
for some y, that is, the congruence ty ≡ i (mod q − 1) is solvable for y,
that is, gcd(t, q − 1) | i. Thus, the values of i ∈ {0, 1, 2, . . . , q − 2} for which γ^i
is a t-th power residue are precisely 0, d, 2d, . . . , ((q−1)/d − 1)d, that is, there are
exactly (q−1)/d t-th power residues in F*q. All these t-th power residues satisfy
x^((q−1)/d) = 1. But then, since x^((q−1)/d) − 1 cannot have more than (q − 1)/d
roots, no t-th power non-residue can satisfy x^((q−1)/d) = 1.
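The solution's criterion is easy to confirm numerically. The Python sketch below (an independent check, not the book's GP/PARI) takes q = 13 and t = 3, so d = gcd(3, 12) = 3, and verifies that exactly (q−1)/d elements are cubes and that they are characterized by a^((q−1)/d) = 1:

```python
from math import gcd

q, t = 13, 3                     # F_13 and cubes; d = gcd(3, 12) = 3
d = gcd(t, q - 1)
residues = sorted({pow(x, t, q) for x in range(1, q)})
print(residues, len(residues) == (q - 1) // d)   # [1, 5, 8, 12] True
assert all(pow(a, (q - 1) // d, q) == 1 for a in residues)
assert all(pow(a, (q - 1) // d, q) != 1 for a in range(1, q) if a not in residues)
```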
54. If q = 2^n, take x = 0 and y = α^(2^(n−1)). So assume that q is odd, and write the
given equation as x^2 = α − y^2. As y ranges over all values in Fq, the quantity
y^2 ranges over a total of (q + 1)/2 values (zero, and all the quadratic residues),
that is, α − y^2 too assumes (q + 1)/2 distinct values. Not all these values can
be quadratic non-residues, since there are only (q − 1)/2 non-residues in Fq.
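The counting argument guarantees a representation for every α; a quick exhaustive Python check (ours) for a small odd q confirms it:

```python
q = 11                            # a small odd prime; the argument needs only counting
squares = {y * y % q for y in range(q)}          # (q + 1)/2 = 6 values
ok = all(any((alpha - s) % q in squares for s in squares) for alpha in range(q))
print(len(squares), ok)           # 6 True
```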
58. (d) If α = γ^p − γ, then by additivity of the trace function, we have Tr(α) =
Tr(γ^p) − Tr(γ) = (γ^p + γ^(p^2) + γ^(p^3) + · · · + γ^(p^n)) − (γ + γ^p + γ^(p^2) + · · · + γ^(p^(n−1))) = 0,
since γ^(p^n) = γ by Fermat's little theorem.
    Conversely, suppose that Tr(α) = 0. It suffices to show that the polynomial
x^p − x − α has at least one root in Fpn. Since x^(p^n) − x is the product of all
monic linear polynomials in Fpn [x], the number of roots of x^p − x − α in Fpn is the
degree of the gcd of x^p − x − α with x^(p^n) − x. In order to compute this gcd,
we compute x^(p^n) − x modulo x^p − x − α. But x^p ≡ x + α (mod x^p − x − α), so

    x^(p^n) − x ≡ (x + α)^(p^(n−1)) − x
                ≡ x^(p^(n−1)) + α^(p^(n−1)) − x
                ≡ (x + α)^(p^(n−2)) + α^(p^(n−1)) − x
                ≡ x^(p^(n−2)) + α^(p^(n−2)) + α^(p^(n−1)) − x
                ≡ · · ·
                ≡ x + α + α^p + α^(p^2) + · · · + α^(p^(n−2)) + α^(p^(n−1)) − x
                ≡ Tr(α)
                ≡ 0 (mod x^p − x − α).

Therefore, gcd(x^(p^n) − x, x^p − x − α) = x^p − x − α, that is, x^p − x − α has p
roots in Fpn, that is, α = γ^p − γ for p distinct elements γ of Fpn.
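For p = 2 the claim says that the trace-zero elements of F16 are exactly the values γ^2 − γ (= γ^2 + γ). The Python sketch below (our own check, using the representation F16 = F2[x]/(x^4 + x + 1) from Exercise 2.32) confirms both the equality of the two sets and their size 2^(n−1) = 8:

```python
def gf16_mul(a, b):               # F_16 = F_2[x]/(x^4 + x + 1), elements as 4-bit ints
    r = 0
    for i in range(4):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(6, 3, -1):     # fold degrees 6..4 back using x^4 = x + 1
        if (r >> i) & 1:
            r ^= (1 << i) ^ (0b11 << (i - 4))
    return r

def trace(x):                     # Tr(x) = x + x^2 + x^4 + x^8
    t, s = x, x
    for _ in range(3):
        s = gf16_mul(s, s)
        t ^= s
    return t

trace_zero = {x for x in range(16) if trace(x) == 0}
image = {gf16_mul(g, g) ^ g for g in range(16)}   # all γ^p − γ with p = 2
print(trace_zero == image, len(image))            # True 8
```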
61. (a) Let θ0 , θ1 , . . . , θn−1 constitute an Fp -basis of Fpn . Let Ai denote the i-th
column of A (for i = 0, 1, 2, . . . , n − 1). Suppose that a0 A0 + a1 A1 + · · · +
an−1 An−1 = 0. Let α = a0 θ0 + a1 θ1 + · · · + an−1 θn−1. Since ai^p = ai for all i,
we then have a0 Tr(θi θ0 ) + a1 Tr(θi θ1 ) + · · · + an−1 Tr(θi θn−1 ) = Tr(θi (a0 θ0 +
a1 θ1 +· · ·+an−1 θn−1 )) = Tr(θi α) = 0 for all i. Since θ0 , θ1 , . . . , θn−1 constitute
a basis of Fpn over Fp, it follows that Tr(βα) = 0 for all β ∈ Fpn. If α ≠ 0,
this in turn implies that Tr(γ) = 0 for all γ ∈ Fpn. But the polynomial
x + x^p + x^(p^2) + · · · + x^(p^(n−1)) can have at most p^(n−1) roots. Therefore, we must
have α = 0. But then, by the linear independence of θ0 , θ1 , . . . , θn−1 , we
conclude that a0 = a1 = · · · = an−1 = 0, that is, the columns of A are linearly
independent, that is, ∆(θ0 , θ1 , . . . , θn−1 ) ≠ 0.
Conversely, if θ0 , θ1 , . . . , θn−1 are linearly dependent, then a0 θ0 + a1 θ1 +
· · · + an−1 θn−1 = 0 for some a0 , a1 , . . . , an−1 ∈ Fp , not all zero. But then, for
all i ∈ {0, 1, 2, . . . , n − 1}, we have a0 θi θ0 + a1 θi θ1 + · · · + an−1 θi θn−1 = 0, that
is, a0 Tr(θi θ0 ) + a1 Tr(θi θ1 ) + · · · + an−1 Tr(θi θn−1 ) = 0, that is, the columns
of A are linearly dependent, that is, ∆(θ0 , θ1 , . . . , θn−1 ) = 0.
(c) Consider the van der Monde matrix

                                 [ 1          1          1          · · ·  1            ]
                                 [ λ0         λ1         λ2         · · ·  λn−1         ]
    V (λ0 , λ1 , . . . , λn−1) = [ λ0^2       λ1^2       λ2^2       · · ·  λn−1^2       ]
                                 [ ...                                                  ]
                                 [ λ0^(n−1)   λ1^(n−1)   λ2^(n−1)   · · ·  λn−1^(n−1)   ]

If λi = λj for some i ≠ j, the determinant of this matrix is 0. It therefore follows that

    det V (λ0 , λ1 , . . . , λn−1) = ± ∏_{0 ≤ i < j ≤ n−1} (λi − λj).

If we take θi = θ^i in Part (b), we see that B^t = V (θ, θ^p, θ^(p^2), . . . , θ^(p^(n−1))).
Finally, det B = det B^t, and det A = (det B)^2.

66. The following GP/PARI function accepts as input the element a(x) that we want
to invert, the characteristic p, and the defining polynomial f (x).

BinInv(a,p,f) = \
    local(r1,r2,u1,u2,c); \
r1 = Mod(1,p) * a; r2 = Mod(1,p) * f; \
u1 = Mod(1,p); u2 = Mod(0,p); \
while (1, \
while(polcoeff(r1,0)==Mod(0,p), \
r1 = r1 / (Mod(1,p) * x); \
if (polcoeff(u1,0) != Mod(0,p), \
u1 = u1 - (polcoeff(u1,0) / polcoeff(f,0)) * f \
); \
u1 = u1 / (Mod(1,p) * x); \
if (poldegree(r1) == 0, return(lift(u1/polcoeff(r1,0)))); \
); \
while(polcoeff(r2,0)==Mod(0,p), \
r2 = r2 / (Mod(1,p) * x); \
if (polcoeff(u2,0) != Mod(0,p), \
                u2 = u2 - (polcoeff(u2,0) / polcoeff(f,0)) * f \
); \
u2 = u2 / (Mod(1,p) * x); \
if (poldegree(r2) == 0, return(lift(u2/polcoeff(r2,0)))); \
); \
if (poldegree(r1) >= poldegree(r2), \
c = polcoeff(r1,0)/polcoeff(r2,0); r1 = r1 - c*r2; u1 = u1 - c*u2, \
c = polcoeff(r2,0)/polcoeff(r1,0); r2 = r2 - c*r1; u2 = u2 - c*u1 \
) \
)

A couple of calls of this function follow.

BinInv(x^6+x^3+x^2+x, 2, x^7+x^3+1)
BinInv(9*x^4+7*x^3+5*x^2+3*x+2, 17, x^5+3*x^2+5)

69. First, we write two functions for computing the trace and the norm of a ∈ Fpn .
The characteristic p and the defining polynomial f are also passed to these
functions. The extension degree n is determined from f .

abstrace(p,f,a) = \
local(n,s,u); \
f = Mod(1,p) * f; \
a = Mod(1,p) * a; \
n = poldegree(f); \
s = u = a; \
for (i=1,n-1, \
u = lift(Mod(u,f)^p); \
s = s + u; \
); \
return(lift(s));

absnorm(p,f,a) = \
local(n,t,u); \
f = Mod(1,p) * f; \
a = Mod(1,p) * a; \
n = poldegree(f); \
t = u = a; \
for (i=1,n-1, \
u = lift(Mod(u,f)^p); \
t = (t * u) % f; \
); \
return(lift(t));

The following statements print the traces and norms of all elements of F64 =
F2 (θ), where θ6 + θ + 1 = 0.

f = x^6 + x + 1;
p = 2;
for (i=0,63, \
a = 0; t = i; \
for (j=0, 5, c = t % 2; a = a + c * x^j; t = floor(t/2)); \
print("a = ", a, ", Tr(a) = ", abstrace(p,f,a), ", N(a) = ", absnorm(p,f,a)) \
)

Chapter 3 Arithmetic of Polynomials


2. (a) Let Iq,n denote the product of all monic irreducible polynomials of degree
n. We have x^(q^n) − x = ∏_{d|n} Iq,d. By the multiplicative form of the Möbius
inversion formula, we have

    Iq,n = ∏_{d|n} (x^(q^d) − x)^µ(n/d) = ∏_{d|n} (x^(q^(n/d)) − x)^µ(d).

(b) For q = 2 and n = 6, we have

    I2,6 = (x^(2^6) + x)^µ(1) (x^(2^3) + x)^µ(2) (x^(2^2) + x)^µ(3) (x^(2^1) + x)^µ(6)
         = (x^64 + x)(x^2 + x) / ((x^8 + x)(x^4 + x))
         = x^54 + x^53 + x^51 + x^50 + x^48 + x^46 + x^45 + x^43 + x^42 + x^33 +
           x^32 + x^30 + x^29 + x^27 + x^25 + x^24 + x^22 + x^21 + x^12 + x^11 +
           x^9 + x^8 + x^6 + x^4 + x^3 + x + 1.
(c) Now, take q = 4 and n = 3 to obtain:

    I4,3 = (x^(4^3) + x)^µ(1) (x^(4^1) + x)^µ(3) = (x^64 + x) / (x^4 + x)
         = x^60 + x^57 + x^54 + x^51 + x^48 + x^45 + x^42 + x^39 + x^36 + x^33 +
           x^30 + x^27 + x^24 + x^21 + x^18 + x^15 + x^12 + x^9 + x^6 + x^3 + 1.
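The exact divisibility in Part (b) can be verified mechanically. The Python sketch below (our cross-check; the bitmask representation of F2[x] is ours) performs the division (x^64 + x)(x^2 + x) / ((x^8 + x)(x^4 + x)) and confirms a zero remainder and the expected degree 54 = 64 + 2 − 8 − 4:

```python
def pmul(a, b):                # product in F_2[x], polynomials as bitmasks
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):             # quotient and remainder in F_2[x]
    db = b.bit_length() - 1
    q = 0
    while a and a.bit_length() - 1 >= db:
        s = a.bit_length() - 1 - db
        q ^= 1 << s
        a ^= b << s
    return q, a

num = pmul((1 << 64) ^ 2, (1 << 2) ^ 2)    # (x^64 + x)(x^2 + x)
den = pmul((1 << 8) ^ 2, (1 << 4) ^ 2)     # (x^8 + x)(x^4 + x)
I26, rem = pdivmod(num, den)
print(rem, I26.bit_length() - 1)           # 0 54: exact division, deg I_{2,6} = 54
```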

8. (a) This follows from Exercise 2.52.


(b) Let Rα = {β − α | β is a quadratic residue in Fq }, and Nα = {β − α | β is
a quadratic non-residue in Fq }. Then, Rα consists of all the roots of vα (x),
and Nα of all the roots of wα (x). Let γ be any root of f . For a random α,
we conclude, under the given reasonable assumptions, that γ = α with prob-
ability 1/q, γ ∈ Rα with probability about 1/2, and γ ∈ Nα with probability
about 1/2 again. The first case (γ = α) is very unlikely, and can be ignored.
Therefore, all the d roots of f are in Rα with probability 1/2^d, and all are in
Nα with probability 1/2^d. Therefore, the probability that gcd(vα(x), f(x)) is
a non-trivial factor of f(x) is 1 − 2 × (1/2^d) = 1 − 1/2^(d−1) ≥ 1/2 for d ≥ 2.
(c) By Part (b), we need to try a constant number of random elements α ∈ Fq
before we expect to split f non-trivially. Each trial involves an exponentiation
modulo f and a gcd calculation of polynomials of degrees ≤ d. Exactly d − 1
splits are necessary to obtain all the roots of f .
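One concrete split of the kind described in Part (b) can be exhibited in a few lines of Python (our own dense-polynomial helpers, not the book's GP/PARI). Take p = 17 and f = (x − 1)(x − 2)(x − 3); for α = 0, gcd(v0(x), f(x)) with v0(x) = x^((p−1)/2) − 1 collects exactly the roots of f that are quadratic residues mod 17, namely 1 and 2:

```python
p = 17
f = [11, 11, 11, 1]          # (x-1)(x-2)(x-3) = x^3 - 6x^2 + 11x - 6 mod 17, low-to-high

def pmod(a, m, p):           # remainder of a modulo a monic polynomial m
    a = a[:]
    dm = len(m) - 1
    for i in range(len(a) - 1, dm - 1, -1):
        c = a[i] % p
        if c:
            for j in range(dm + 1):
                a[i - dm + j] = (a[i - dm + j] - c * m[j]) % p
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def pmulmod(a, b, m, p):
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] + ai * bj) % p
    return pmod(r, m, p)

def pgcd(a, b, p):           # monic gcd by the Euclidean algorithm
    while len(b) > 1 or b[0] != 0:
        inv = pow(b[-1], p - 2, p)
        b = [c * inv % p for c in b]
        a, b = b, pmod(a, b, p)
    return a

v, base, e = [1], [0, 1], (p - 1) // 2      # v = x^((p-1)/2) mod f
while e:
    if e & 1:
        v = pmulmod(v, base, f, p)
    base = pmulmod(base, base, f, p)
    e >>= 1
v[0] = (v[0] - 1) % p                       # v_0(x) = x^((p-1)/2) - 1
g = pgcd(f, v, p)
print(g)        # [2, 14, 1] = x^2 - 3x + 2: the quadratic-residue roots 1 and 2
```

The non-residue root 3 lands in Nα, so the gcd is a proper factor, as the probability estimate predicts.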
9. (a) Assume that q ≫ d/r, and that the quadratic residues are randomly
distributed in F*qr. Since q is odd, ξ ∈ F*qr is a quadratic residue if and only
if all its conjugates ξ^(q^i) are. Moreover, if α ∈ Fq, we have (ξ − α)^(q^i) = ξ^(q^i) − α,
that is, the conjugates of ξ − α are ξ^(q^i) − α.
    For α ∈ Fq, define Rα as the set of all the roots of (x + α)^((q^r − 1)/2) − 1 in
Fqr, and Nα as the set of all the roots of (x + α)^((q^r − 1)/2) + 1 in Fqr. If g(x) is an
irreducible (over Fq) factor of f(x), then all the roots of g(x) belong to Fqr.
Moreover, all these roots are simultaneously present either in Rα or in Nα.
Therefore, g(x) is a factor of gcd((x + α)^((q^r − 1)/2) − 1, f(x)) with probability
1/2, and so the probability of obtaining a non-trivial split of f (for a randomly
chosen α) is 1 − 2 × (1/2^(d/r)), which is ≥ 1/2 for d ≥ 2r.
10. (f ) Let α1 , α2 , α3 , . . . , αq−1 be an ordering of the elements of F∗q . For example,
if F2n = F2 (θ), we can take αi = an−1 θn−1 + an−2 θn−2 + · · · + a1 θ + a0 , where
(an−1 an−2 . . . a1 a0 )2 is the binary representation of i. We can choose u(x) in
Algorithm 3.8 as αi xj , where j increases in the sequence 1, 3, 5, 7, . . . , and
for each j, the index i runs from 1 to q − 1. (That is, we choose u(x) sequen-
tially as α1 x, α2 x, . . . , αq−1 x, α1 x3 , α2 x3 , . . . , αq−1 x3 , α1 x5 , α2 x5 , . . . , αq−1 x5 ,
α1 x7 , . . . until a non-trivial split is obtained.) Moreover, in a recursive call,
the search for u(x) should start from the element in this sequence, that is
next to the polynomial that yielded a non-trivial split of f (x). This method
is effective for small q. For large values of q, one may select u(x) = αxs for
random α ∈ F∗q and for small odd degrees s.
30. (a) In view of the CRT for polynomials (Exercise 3.28), it suffices to show
that h(x)q ≡ h(x) (mod g(x)) has exactly q solutions for h(x) of degrees < δ,
where g(x) is an irreducible factor of f (x) of degree δ. Let us represent Fqδ
by adjoining a root of g(x) to Fq . But then, all solutions for h(x) are those
elements γ of Fqδ that satisfy γ q = γ. These are precisely all the elements
of Fq . In other words, the only solutions of h(x)q ≡ h(x) (mod g(x)) are
h(x) ≡ γ (mod g(x)), where γ ∈ Fq .
(b) For r = 0, 1, 2, . . . , d − 1, write

xrq ≡ βr,0 + βr,1 x + βr,2 x2 + · · · + βr,d−1 xd−1 (mod f (x)).

The condition h(x)q ≡ h(x) (mod f (x)) implies that

α0 + α1 xq + α2 x2q + · · · + αd−1 x(d−1)q


≡ α0 (β0,0 + β0,1 x + β0,2 x2 + · · · + β0,d−1 xd−1 ) +
α1 (β1,0 + β1,1 x + β1,2 x2 + · · · + β1,d−1 xd−1 ) +
α2 (β2,0 + β2,1 x + β2,2 x2 + · · · + β2,d−1 xd−1 ) +
··· +
αd−1 (βd−1,0 + βd−1,1 x + βd−1,2 x2 + · · · + βd−1,d−1 xd−1 )
≡ (α0 β0,0 + α1 β1,0 + α2 β2,0 + · · · + αd−1 βd−1,0 ) +
(α0 β0,1 + α1 β1,1 + α2 β2,1 + · · · + αd−1 βd−1,1 )x +
(α0 β0,2 + α1 β1,2 + α2 β2,2 + · · · + αd−1 βd−1,2 )x2 +
··· +
(α0 β0,d−1 + α1 β1,d−1 + α2 β2,d−1 + · · · + αd−1 βd−1,d−1 )xd−1
≡ α0 + α1 x + α2 x2 + · · · + αd−1 xd−1 (mod f (x)).

Equating coefficients of xi for i = 0, 1, 2, . . . , d − 1 gives the following linear


system over Fq :

α0 β0,0 + α1 β1,0 + α2 β2,0 + · · · + αd−1 βd−1,0 = α0 ,


α0 β0,1 + α1 β1,1 + α2 β2,1 + · · · + αd−1 βd−1,1 = α1 ,
α0 β0,2 + α1 β1,2 + α2 β2,2 + · · · + αd−1 βd−1,2 = α2 ,
···
α0 β0,d−1 + α1 β1,d−1 + α2 β2,d−1 + · · · + αd−1 βd−1,d−1 = αd−1 .

Consider the d × d matrix Q whose (r, s)-th element is βs,r − δr,s for 0 ≤ r ≤ d − 1
and 0 ≤ s ≤ d − 1, where δr,s = 1 if r = s and δr,s = 0 otherwise is the Kronecker
delta. All the solutions for h(x) in h(x)^q ≡ h(x) (mod f(x)) can be obtained by
solving the homogeneous linear system

    Q (α0 , α1 , α2 , . . . , αd−1)^t = (0, 0, 0, . . . , 0)^t.
(c) By Part (a), this system has exactly q t solutions, that is, the nullity of Q
is t, and so its rank is d − t.
(d) There exists a unique solution (modulo f (x)) to the set of congruences:
h(x) ≡ γ1 (mod f1 (x)), h(x) ≡ γ2 (mod f2 (x)), and h(x) ≡ 0 (mod g(x)) for
any other irreducible factor g of f . Since γ1 , γ2 , 0 ∈ Fq , this h(x) belongs to


the nullspace V of Q (by Part (a)).
Let h1 , h2 , . . . , ht constitute an Fq -basis of V (here hi are polynomials iden-
tified with their coefficient vectors). If hi (x) (mod f1 (x)) and hi (x) (mod f2 (x))
are equal for all i = 1, 2, . . . , t, then for any Fq -linear combination h(x) of
h1 , h2 , . . . , ht , the elements h(x) (mod f1 (x)) and h(x) (mod f2 (x)) of Fq are
equal. On the contrary, we have h(x) ∈ V such that h(x) ≡ 0 (mod f1 (x))
and h(x) ≡ 1 (mod f2 (x)). Therefore, for at least one i, the elements γ1 =
hi (x) (mod f1 (x)) and γ2 = hi (x) (mod f2 (x)) must be distinct.
(e) If we choose γ = γ1 as in Part (d), then f1(x) | (hi(x) − γ), but f2(x) ∤
(hi(x) − γ). Therefore, gcd(f(x), hi(x) − γ) is a non-trivial factor of f(x).
However, this γ is not known in advance. If q is small, we try all possible
γ ∈ Fq and all h1 , h2 , . . . , ht until all irreducible factors of f are discovered.

Generate the matrix Q as in Part (b).


Compute the nullity t and a basis h1 , h2 , . . . , ht of the nullspace V of Q.
Repeat until t irreducible factors of f are discovered: {
For i = 1, 2, . . . , t {
For each γ ∈ Fq {
Compute the gcds of reducible factors of f (x) with hi (x) − γ.
Add all the non-trivial factors found to the list of factors of f (x).
Mark the new factors that are irreducible.
}
}
}

31. We first express x^(2r) modulo f(x) = x^8 + x^5 + x^4 + x + 1 for r = 0, 1, 2, . . . , d − 1:

    x^0  ≡ 1,
    x^2  ≡ x^2,
    x^4  ≡ x^4,
    x^8  ≡ x^5 + x^4 + x + 1,
    x^10 ≡ x^7 + x^6 + x^3 + x^2,
    x^12 ≡ x^6 + x^5 + x^2 + 1,
    x^14 ≡ x^7 + x^5 + x^2 + x + 1   (mod x^8 + x^5 + x^4 + x + 1).

Therefore, the matrix Q is

        [ 0 0 0 0 1 0 1 1 ]
        [ 0 1 0 0 1 0 0 1 ]
        [ 0 1 1 0 0 1 1 1 ]
    Q = [ 0 0 0 1 0 1 0 0 ]
        [ 0 0 1 0 0 0 0 0 ]
        [ 0 0 0 0 1 1 1 1 ]
        [ 0 0 0 1 0 1 0 0 ]
        [ 0 0 0 0 0 1 0 0 ]
   
The nullspace of Q is generated by the two vectors (1, 0, 0, 0, 0, 0, 0, 0)^t and
(0, 1, 0, 0, 1, 0, 1, 0)^t, so t = 2, that is, f(x) has two irreducible factors. These
two vectors are identified with the polynomials h1(x) = 1 and h2(x) = x^6 + x^4 + x.
We then calculate gcd(hi(x) − a, f(x)) for i = 1, 2 and a = 0, 1. Since
h1(x) = 1, each gcd(h1(x) − a, f(x)) is either 1 or f(x), that is, there is no
need to compute these gcds. However, h2(x) ≠ 1, and we can successfully split
f(x) for i = 2 and a = 0 as follows.

    f1(x) = gcd(h2(x), f(x)) = x^5 + x^3 + 1,
    f2(x) = f(x)/f1(x) = x^3 + x + 1.
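Both the factorization and the nullspace element h2 can be sanity-checked in a few lines of Python (ours; polynomials over F2 are encoded as bitmasks): f1 f2 should reproduce f, and h2 should satisfy the defining property h2(x)^2 ≡ h2(x) (mod f(x)).

```python
def pmul2(a, b):                  # carry-less product = multiplication in F_2[x]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod2(a, m):                  # remainder in F_2[x]
    dm = m.bit_length() - 1
    while a and a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a

f  = 0b100110011                  # x^8 + x^5 + x^4 + x + 1
f1 = 0b101001                     # x^5 + x^3 + 1
f2 = 0b1011                       # x^3 + x + 1
h2 = 0b1010010                    # x^6 + x^4 + x
print(pmul2(f1, f2) == f, pmod2(pmul2(h2, h2), f) == h2)   # True True
```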

36. (i) Let us write Φn (x) = f (x)g(x) with f non-constant and irreducible in
Z[x]. Let ξ ∈ C be a root of f (x). Choose any prime p that does not divide n.
Assume that ξ p is not a root of f . Since ξ p is a primitive n-th root of unity,
Φn (ξ p ) = f (ξ p )g(ξ p ) = 0. Therefore, f (ξ p ) 6= 0 implies that g(ξ p ) = 0. But
f (x) is the minimal polynomial of ξ, and so f (x)|g(xp ) in Z[x].
Let ā(x) denote the modulo-p reduction of a polynomial a(x) ∈ Z[x]. Since
f (x)|g(xp ) in Z[x], we must have f¯(x)|ḡ(xp ) in Fp [x]. We have ḡ(xp ) = ḡ(x)p
in Fp [x]. Therefore, f¯(x)|ḡ(x)p implies that there exists a common irreducible
factor h̄(x) of f¯(x) and ḡ(x). This, in turn, implies that h̄(x)2 |Φ̄n (x). Moreover,
Φn (x) divides x^n − 1 in Z[x], so Φ̄n (x) divides x^n − 1 in Fp [x], that is, h̄(x)^2
divides x^n − 1 in Fp [x]. The formal derivative of x^n − 1 is nx^(n−1) ≠ 0 in Fp [x]
since p ∤ n. Therefore, gcd(x^n − 1, nx^(n−1)) = 1, that is, x^n − 1 is square-free, a
contradiction to the fact that h̄(x)^2 | (x^n − 1) in Fp [x].
This contradiction proves that ξ p must be a root of f (x). Repeatedly
applying this result proves that for all k with gcd(k, n) = 1, ξ k is again a root
of f (x), that is, all primitive n-th roots of unity are roots of f (x).
38. (1) We can convert Syl(f, g) to Syl(g, f ) by mn number of interchanges of
adjacent rows.
(2) Let us write f(x) = q(x)g(x) + r(x) with deg r < deg g. Write q(x) =
qm−n x^(m−n) + qm−n−1 x^(m−n−1) + · · · + q1 x + q0. Subtract qm−n times the (n+1)-
st row, qm−n−1 times the (n + 2)-nd row, and so on from the first row in order
to convert the first row to the coefficients of r(x) treated as a polynomial
of formal degree m. Likewise, from the second row subtract qm−n times the
(n+2)-nd row, qm−n−1 times the (n+3)-rd row, and so on. This is done for each
of the first n rows. We then make mn interchanges of adjacent rows in order
to bring the last m rows in the first m row positions. This gives us a matrix
of the form

    S = [ T              U         ]
        [ 0_(2n×(m−n))   Syl(g, r) ]

Here, T is an (m − n) × (m − n) upper triangular matrix with each entry in the
main diagonal being equal to bn. Moreover, r is treated as a polynomial of
formal degree n. Therefore,

    Res(f, g) = (−1)^(mn) det S = (−1)^(mn) (det T) Res(g, r) = (−1)^(mn) bn^(m−n) Res(g, r).

(3) Clearly, the last three of the given expressions are equal to each other, so it
suffices to show that Res(f, g) is equal to any of these expressions. We assume
that m ≥ n (if not, use Part (1)). We proceed by induction on the actual
degree of the second argument. If deg g = 0 (that is, g(x) = a0 is a constant),
we have Res(f, g) = a0^m = am^0 ∏_{i=1}^m g(αi). Moreover, we also need to cover the
case g = 0. In this case, we have Res(f, g) = 0 = am^0 ∏_{i=1}^m g(αi). Now, suppose
that n > 0. By Part (2), we have Res(f, g) = (−1)^(mn) bn^(m−n) Res(g, r) with r
treated as a polynomial of formal degree n. But r is a polynomial of actual
degree ≤ n − 1, so the induction assumption is that Res(g, r) = bn^n ∏_{j=1}^n r(βj).
Since f = qg + r and each βj is a root of g, we have r(βj) = f(βj). It follows
that Res(f, g) = (−1)^(mn) bn^(m−n) bn^n ∏_{j=1}^n f(βj) = (−1)^(mn) bn^m ∏_{j=1}^n f(βj).
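A small numeric instance makes the resultant identities concrete. The Python sketch below (our helpers) builds the Sylvester matrix of f = (x − 1)(x − 2) and g = x + 5 and checks that its determinant equals ∏ g(αi) = g(1)g(2) = 42:

```python
from itertools import permutations

def sylvester(f, g):           # coefficient lists, highest degree first
    m, n = len(f) - 1, len(g) - 1
    rows = []
    for i in range(n):         # n shifted copies of f
        rows.append([0] * i + f + [0] * (n - 1 - i))
    for i in range(m):         # m shifted copies of g
        rows.append([0] * i + g + [0] * (m - 1 - i))
    return rows

def det(M):                    # Leibniz expansion; fine for tiny matrices
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = 1
        for i in range(n):
            prod *= M[i][perm[i]]
        total += sign * prod
    return total

f = [1, -3, 2]                 # (x - 1)(x - 2)
g = [1, 5]                     # x + 5, so g(1) * g(2) = 6 * 7 = 42
res = det(sylvester(f, g))
print(res)                     # 42
```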
42. Consider the Sylvester matrix S of f and g. The first n − 1 rows contain the
coefficients of z^n − 1, and the last n rows the coefficients of g(z). We make the
n × (n − 1) block at the bottom left corner of S zero by subtracting suitable
multiples of the first n − 1 rows from the last n rows. For example, from the n-
th row, we subtract α times the first row, α^p times the second row, . . . , α^(p^(n−2))
times the (n − 1)-st row. These row operations do not change the determinant
of S. The converted matrix is now of the form

    T = [ In−1           C ]
        [ 0_(n×(n−1))    D ]

where In−1 is the (n − 1) × (n − 1) identity matrix, and

        [ α^(p^(n−1))   α             α^p           α^(p^2)   · · ·  α^(p^(n−2)) ]
        [ α^(p^(n−2))   α^(p^(n−1))   α             α^p       · · ·  α^(p^(n−3)) ]
    D = [ α^(p^(n−3))   α^(p^(n−2))   α^(p^(n−1))   α         · · ·  α^(p^(n−4)) ]
        [ ...                                                                    ]
        [ α             α^p           α^(p^2)       α^(p^3)   · · ·  α^(p^(n−1)) ]

We have Res(f, g) = det S = det D. By making n(n − 1)/2 exchanges of
adjacent rows, we convert D to the matrix B of Exercise 2.61 with θi = α^(p^i).
Now, α is a normal element in Fpn if and only if det B ≠ 0, that is, det D ≠ 0,
that is, det S ≠ 0, that is, gcd(f, g) = 1.
43. Let b∗1 , b∗2 , . . . , b∗n be the Gram–Schmidt orthogonalization of b1 , b2 , . . . , bn
(Algorithm 3.10). Let M ∗ denote the matrix whose columns are b∗1 , b∗2 , . . . , b∗n.
Since bi = b∗i + Σ_{j=1}^{i−1} µi,j b∗j, we have M = M ∗ A, where

    A = [ 1   µ2,1   µ3,1   µ4,1   · · ·   µn,1 ]
        [ 0   1      µ3,2   µ4,2   · · ·   µn,2 ]
        [ 0   0      1      µ4,3   · · ·   µn,3 ]
        [ ...                                   ]
        [ 0   0      0      0      · · ·   1    ]

But then, det M = (det A)(det M ∗) = det M ∗. Since the vectors b∗i are or-
thogonal to one another, we have |det M ∗| = ∏_{i=1}^n |b∗i|. Let ⟨x, y⟩ denote the
inner product of the vectors x and y. We have

    |b∗i|^2 = ⟨b∗i , b∗i⟩
            = ⟨bi − Σ_{j=1}^{i−1} µi,j b∗j , b∗i⟩
            = ⟨bi , b∗i⟩ − Σ_{j=1}^{i−1} µi,j ⟨b∗j , b∗i⟩
            = ⟨bi , b∗i⟩
            = ⟨bi , bi − Σ_{j=1}^{i−1} µi,j b∗j⟩
            = ⟨bi , bi⟩ − Σ_{j=1}^{i−1} µi,j ⟨bi , b∗j⟩
            = ⟨bi , bi⟩ − Σ_{j=1}^{i−1} ⟨bi , b∗j⟩^2 / ⟨b∗j , b∗j⟩
            = |bi|^2 − Σ_{j=1}^{i−1} (⟨bi , b∗j⟩ / |b∗j|)^2
            ≤ |bi|^2,

that is, |b∗i| ≤ |bi| for all i, and so

    |det M| = |det M ∗| = ∏_{i=1}^n |b∗i| ≤ ∏_{i=1}^n |bi|.
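The final inequality is Hadamard's bound: |det M| is at most the product of the column lengths. A quick numeric illustration in Python (the matrix is a fixed example of our choosing):

```python
import math

M = [[2, 1, 0],
     [1, 3, 1],
     [0, 1, 2]]

def det3(A):
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
          - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
          + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

# Product of the Euclidean lengths of the columns of M
bound = math.prod(math.sqrt(sum(c * c for c in col)) for col in zip(*M))
print(abs(det3(M)), abs(det3(M)) <= bound)   # 8 True
```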

50. (a) By the definition of reduced bases (Eqns (3.1) and (3.2)), we have

    |b∗i|^2 ≥ (3/4 − µ^2_{i,i−1}) |b∗_{i−1}|^2 ≥ (1/2) |b∗_{i−1}|^2.

Applying this result i − j times shows that for 1 ≤ j ≤ i ≤ n, we have

    |b∗j|^2 ≤ 2^(i−j) |b∗i|^2.

Gram–Schmidt orthogonalization gives bj = b∗j + Σ_{k=1}^{j−1} µj,k b∗k. Since the
vectors b∗1 , b∗2 , . . . , b∗n are orthogonal to one another, we then have:

    |bj|^2 = |b∗j|^2 + Σ_{k=1}^{j−1} µ^2_{j,k} |b∗k|^2
           ≤ |b∗j|^2 + (1/4) Σ_{k=1}^{j−1} |b∗k|^2            [by Eqn (3.1)]
           ≤ |b∗j|^2 + (1/4) Σ_{k=1}^{j−1} 2^(j−k) |b∗j|^2    [proved above]
           = (2^(j−2) + 1/2) |b∗j|^2
           ≤ 2^(j−1) |b∗j|^2                                  [since j ≥ 1]
           ≤ 2^(j−1) 2^(i−j) |b∗i|^2                          [proved above]
           = 2^(i−1) |b∗i|^2.

(b) By Hadamard's inequality, d(L) ≤ ∏_{i=1}^n |bi|, and by Part (a), |bi| ≤
2^((i−1)/2) |b∗i|. But b∗1 , b∗2 , . . . , b∗n form an orthogonal basis of L, so d(L) =
∏_{i=1}^n |b∗i|. Thus, ∏_{i=1}^n |bi| ≤ 2^([0+1+2+···+(n−1)]/2) d(L) = 2^(n(n−1)/4) d(L).

(c) By Part (a), |b1| ≤ 2^((i−1)/2) |b∗i| for all i = 1, 2, . . . , n. Therefore, |b1|^n ≤
2^([0+1+2+···+(n−1)]/2) ∏_{i=1}^n |b∗i| = 2^(n(n−1)/4) d(L).
51. We can write x = Σ_{i=1}^m ui bi = Σ_{i=1}^m u∗i b∗i with integers ui and real numbers
u∗i, where m ∈ {1, 2, . . . , n} is the largest index for which um ≠ 0. By the
Gram–Schmidt orthogonalization formula, we must have um = u∗m, so

    |x|^2 ≥ (u∗m)^2 |b∗m|^2 = um^2 |b∗m|^2 ≥ |b∗m|^2.

Exercise 3.50(a) gives |bi|^2 ≤ 2^(m−1) |b∗m|^2 for all i, 1 ≤ i ≤ m. In particular,
for i = 1, we have

    |b1|^2 ≤ 2^(m−1) |b∗m|^2 ≤ 2^(m−1) |x|^2 ≤ 2^(n−1) |x|^2.

Now, take any n linearly independent vectors x1 , x2 , . . . , xn in L. Let mj
be the value of m for xj. As in the last paragraph, we can prove that

    |bi|^2 ≤ 2^(n−1) |xj|^2

for all i, 1 ≤ i ≤ mj. Since x1 , x2 , . . . , xn are linearly independent, we must
have mj = n for at least one j. For this j, we have |bi|^2 ≤ 2^(n−1) |xj|^2 for all i
in the range 1 ≤ i ≤ n.
57. The function EDF() takes four arguments: the polynomial f (x), the prime
modulus p, the degree r of each irreducible factor of f , and a bound B. This
bound dictates the maximum degree of u(x) used in Algorithm 3.7. The choice
B = 1 corresponds to Algorithm 3.5.

EDF(f,p,r,B) = \
local(u,d,i,g,h,e); \
f = Mod(1,p) * f; \
if (poldegree(f) == 0, return; ); \
if (poldegree(f) == r, print("Factor found: ", lift(f)); return; ); \
e = (p^r - 1) / 2; \
while (1, \
u = Mod(0,p); \
d = 1 + random() % B; \
for (i=0, d, u = u + Mod(random(),p) * x^i ); \
u = Mod(u,f); g = u^e; \
h = gcd(lift(g)-Mod(1,p),f); h = h / polcoeff(h,poldegree(h)); \
if ((poldegree(h) > 0) && (poldegree(h) < poldegree(f)), \
EDF(h,p,r,B); \
EDF(f/h,p,r,B); \
return; \
); \
);

EDF(x^8 + x^7 + 2*x^6 + 2*x^5 + x^4 + 2*x^3 + 2*x^2 + x + 1, 3, 4, 2)


EDF(x^6 + 16*x^5 + 3*x^4 + 16*x^3 + 8*x^2 + 8*x + 14, 17, 2, 1)

58. Following the recommendation of Exercise 3.8(f), we choose u(x) for Algo-
rithm 3.8 in the sequence x, x3 , x5 , x7 , . . . . We pass the maximum degree 2d+1
such that x2d+1 has already been tried as u(x). The next recursive call starts
with u(x) = x2d+3 . The outermost call should pass −1 as d.

EDF2(f,r,d) = \
local(u,s,i,h); \
f = Mod(1,2) * f; \
if (poldegree(f) == 0, return); \
if (poldegree(f) == r, print("Factor found: ", lift(f)); return); \
while (1, \
d = d + 2; u = Mod(1,2) * x^d; s = u; \
for (i=1,r-1, u= u^2 % f; s = s + u; ); \
h = gcd(f,s); \
if ((poldegree(h) > 0) && (poldegree(h) < poldegree(f)), \
EDF2(h,r,d); \
EDF2(f/h,r,d); \
return; \
); \
);

EDF2(x^18+x^17+x^15+x^11+x^6+x^5+1, 6, -1)
EDF2(x^20+x^18+x^17+x^16+x^15+x^12+x^10+x^9+x^7+x^3+1, 5, -1)

59. The following GP/PARI functions implement Berlekamp’s Q-matrix factoriza-


tion over a prime field Fp . The first function computes Q, its nullity t, and a
basis of the nullspace of Q. It then calls the second function in order to com-
pute gcd(hi (x) − γ, f ) for i = 1, 2, . . . , t and for γ ∈ Fp . The functions assume
that f (x) is square-free. The computation of the d powers xrp (mod f (x)) can
be optimized further. Two sample calls of BQfactor() follow.

BQfactor(f,p) = \
f = Mod(1,p) * f; d = poldegree(f); \
Q = matrix(d,d); for (i=0,d-1, Q[i+1,1] = Mod(0,p)); \
for (r=1,d-1, \
h = lift(lift(Mod(Mod(1,p)*x,f)^(r*p))); \
for (i=0,d-1, Q[i+1,r+1] = Mod(polcoeff(h,i),p)); \
Q[r+1,r+1] = Q[r+1,r+1] - Mod(1,p); \
); \
V = matker(Q); t = matsize(V)[2]; \
if (t==1, print(lift(f)); return); \
decompose(V,t,f,p,d);

decompose(V,t,f,p,d) = \
d = poldegree(f); \
for (i=1,t, \
h = 0; \
for (j=0,d-1, h = h + Mod(V[j+1,i],p) * x^j); \
for (a=0,p-1, \
f1 = gcd(h-Mod(a,p), f); \
d1 = poldegree(f1); \
f1 = f1 / polcoeff(f1,d1); \
if ((d1 > 0) && (d1 < d), \
if (polisirreducible(f1), \
print(lift(f1)), \
decompose(V,t,f1,p,d); \
); \
f2 = f / f1; \
if (polisirreducible(f2), \
print(lift(f2)), \
decompose(V,t,f2,p,d); \
); \
return; \
); \
); \
);

BQfactor(x^8+x^5+x^4+x+1, 2)
BQfactor(x^8+x^5+x^4+2*x+1, 3)

60. We first write a function polyliftonce() to lift a factorization f = gh modulo


pn to a factorization modulo pn+1 . This function is called multiple times by
polylift() in order to lift the factorization modulo p to the factorization
modulo pn . Sample calls are also supplied to demonstrate the working of
these functions on the data of Example 3.34.

polyliftonce(f,g,h,p,n) = \
    local(q,i,j,w,A,c,b,r,s,t,u,v); \
q = p^n; \
w = (f - g * h) / q; \
r = poldegree(g); s = poldegree(h); t = poldegree(f); \
b = matrix(t+1,1); A = matrix(t+1,t+1); \
for(i=0, t, b[t-i+1,1] = Mod(polcoeff(w,i),p)); \
for (j=0, s, \
for (i=0, r, \
A[i+j+1,j+1] = Mod(polcoeff(g,r-i),p); \
); \
); \
for (j=0, r-1, \
for (i=0, s, \
A[i+j+2,s+j+2] = Mod(polcoeff(h,s-i),p); \
); \
); \
c = A^(-1) * b; \
u = 0; v = 0; \
for (i=0, r-1, u = u + lift(c[s+i+2,1]) * x^(r-i-1)); \
for (i=0, s, v = v + lift(c[i+1,1]) * x^(s-i)); \
return([g+q*u,h+q*v]);

polylift(f,g,h,p,n) = \
local(i,j,q,L); \
q = p; \
for (i=1, n-1, \
L = polyliftonce(f,g,h,p,i); \
q = q * p; \
g = lift(L[1]); \
for (j=0, poldegree(g), if (polcoeff(g,j) > q/2, g = g - q * x^j) ); \
h = lift(L[2]); \
for (j=0, poldegree(h), if (polcoeff(h,j) > q/2, h = h - q * x^j) ); \
); \
return([g,h]);

polylift(35*x^5-22*x^3+10*x^2+3*x-2, x^2+2*x-2, -4*x^3-5*x^2+6*x+1, 13, 1)


polylift(35*x^5-22*x^3+10*x^2+3*x-2, x^2+2*x-2, -4*x^3-5*x^2+6*x+1, 13, 2)
polylift(35*x^5-22*x^3+10*x^2+3*x-2, x^2+2*x-2, -4*x^3-5*x^2+6*x+1, 13, 3)
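Before any lifting can start, the seed factorization must hold modulo p. The Python sketch below (our cross-check, not part of the GP/PARI session) verifies that the data of Example 3.34 used in the calls above does satisfy f ≡ g·h (mod 13):

```python
def polymul(a, b):                # integer polynomials, coefficients low-to-high
    r = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            r[i + j] += ai * bj
    return r

f = [-2, 3, 10, -22, 0, 35]       # 35x^5 - 22x^3 + 10x^2 + 3x - 2
g = [-2, 2, 1]                    # x^2 + 2x - 2
h = [1, 6, -5, -4]                # -4x^3 - 5x^2 + 6x + 1
diff = [fc - ghc for fc, ghc in zip(f, polymul(g, h))]
print(all(c % 13 == 0 for c in diff))   # True: f ≡ gh (mod 13)
```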

Chapter 4 Arithmetic of Elliptic Curves


9. Let us use the equation Y 2 = f (X) for the curve, where f (X) = X 3 + aX 2 +
bX + c. The condition 3P = O is equivalent to the condition 2P = −P .
Therefore, if P = (h, k) is a (finite) point of order three, we have x(2P ) =
x(−P ) = x(P ). By Exercise 4.7, we then have h4 − 2bh2 − 8ch + (b2 − 4ac) =
4h(h3 + ah2 + bh + c), that is,
ψ(h) = 3h4 + 4ah3 + 6bh2 + 12ch + (4ac − b2 ) = 0.
This is a quartic equation in h alone, and can have at most four roots for h. For
each root h, we obtain at most two values of k satisfying k 2 = h3 +ah2 +bh+c.
Thus, there are at most eight points of order three on the curve.
If the field K is algebraically closed, the above quartic equation has exactly
four roots. For each such root, we cannot get k = 0, because a point of the
form (h, 0) is of order two. We also need to argue that the roots of the quartic
equation cannot be repeated. To that effect, we compute the discriminants:
    Discr(f) = −4a^3 c + a^2 b^2 + 18abc − 4b^3 − 27c^2,
    Discr(ψ) = 2^8 × 3^3 × (−16a^6 c^2 + 8a^5 b^2 c − a^4 b^4 + 144a^4 bc^2 − 68a^3 b^3 c − 216a^3 c^3 +
               8a^2 b^5 − 270a^2 b^2 c^2 + 144ab^4 c + 972abc^3 − 16b^6 − 216b^3 c^2 − 729c^4)
             = −2^8 × 3^3 × (Discr(f))^2.

If char K ≠ 2, 3, we conclude that ψ has multiple roots if and only if f has
multiple roots. (Note that ψ(h) = 2f(h)f''(h) − f'(h)^2 and ψ'(h) = 12f(h), so
a common root of ψ and ψ' is also a common root of f and f', and conversely.)
15. If we substitute X ≡ 0 (mod p), we get a unique solution for Y, namely, Y ≡
0 (mod p). Now, substitute a value of X ≢ 0 (mod p). If X^2 + a ≡ 0 (mod p),
then each of the two values ±X gives the unique solution Y ≡ 0 (mod p).
So assume that X^2 + a ≢ 0 (mod p). We have ((X^3 + aX)/p) = (X/p) ((X^2 + a)/p).
Since p ≡ 3 (mod 4), (−X/p) = −(X/p). Moreover, ((X^2 + a)/p) = (((−X)^2 + a)/p).
Therefore, one of the two values ±X yields two solutions for Y, and the other
no solution for Y.
22. The only F2-rational points on Y^2 + Y = X^3 + X are O, (0, 0), (0, 1), (1, 0)
and (1, 1), so the trace of Frobenius at 2 is −2. The roots of W^2 + 2W + 2 = 0
are −1 ± i. Therefore, the size of the group of this curve over F2n is

    S(n) = 2^n + 1 − ((−1 + i)^n + (−1 − i)^n)
         = 2^n + 1 − (−1)^n (√2)^n (e^(−inπ/4) + e^(inπ/4))
         = 2^n + 1 − (−1)^n 2^(n/2 + 1) cos(nπ/4).

On the curve Y^2 + Y = X^3 + X + 1, the only F2-rational point is O, that
is, the trace of Frobenius at 2 is 2. The roots of W^2 − 2W + 2 = 0 are 1 ± i.
As above, we deduce that the size of the elliptic-curve group over F2n is

    T(n) = 2^n + 1 − 2^(n/2 + 1) cos(nπ/4).

The following table lists the values of S(n) and T(n) for different values
of n modulo 8.

    n (mod 8)   S(n)                    T(n)
    0           2^n + 1 − 2^(n/2+1)     2^n + 1 − 2^(n/2+1)
    1           2^n + 1 + 2^((n+1)/2)   2^n + 1 − 2^((n+1)/2)
    2           2^n + 1                 2^n + 1
    3           2^n + 1 − 2^((n+1)/2)   2^n + 1 + 2^((n+1)/2)
    4           2^n + 1 + 2^(n/2+1)     2^n + 1 + 2^(n/2+1)
    5           2^n + 1 − 2^((n+1)/2)   2^n + 1 + 2^((n+1)/2)
    6           2^n + 1                 2^n + 1
    7           2^n + 1 + 2^((n+1)/2)   2^n + 1 − 2^((n+1)/2)

For each n ∈ N, the trace is 0, ±2^(n/2+1) (only if n is even), or ±2^((n+1)/2) (only if
n is odd). The trace is divisible by 2 in all these cases. So over all extensions
F2n, the curves Y^2 + Y = X^3 + X and Y^2 + Y = X^3 + X + 1 are supersingular.
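The first two entries of the sequence S(n) can be confirmed by direct point counting. The Python sketch below (our own encodings, not the book's) counts the points of Y^2 + Y = X^3 + X over F2 and over F4, matching S(1) = 5 and S(2) = 2^2 + 1 = 5:

```python
def f4_mul(a, b):                 # F_4 as 2-bit ints, θ^2 = θ + 1
    r = 0
    if b & 1:
        r ^= a
    if b & 2:
        r ^= a << 1
    if r & 4:
        r ^= 4 ^ 3
    return r

def count_points(elems, mul):     # |E(F_q)| for E: Y^2 + Y = X^3 + X, char 2
    n = 1                         # the point at infinity O
    for x in elems:
        x3 = mul(mul(x, x), x)
        for y in elems:
            if mul(y, y) ^ y == x3 ^ x:
                n += 1
    return n

s1 = count_points(range(2), lambda a, b: a & b)   # over F_2
s2 = count_points(range(4), f4_mul)               # over F_4
print(s1, s2)                                     # 5 5
```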
27. Let us use the simplest form of Weierstrass equation:

E : Y 2 = X 3 + aX + b.

Let all of P = (h1 , k1 ), Q = (h2 , k2 ), P + Q = (h3 , k3 ) and 2P + Q = (h4 , k4 )


be finite points. If we compute P + Q first, we need to compute the slope λ1
of the line passing through P and Q. Next, we need to compute the slope λ2
of the line passing through P and P + Q. We have:
    λ1 = (k2 − k1)/(h2 − h1),  and

    λ2 = (k3 − k1)/(h3 − h1) = (λ1(h1 − h3) − k1 − k1)/(h3 − h1) = −λ1 − 2k1/(h3 − h1).
For computing λ2 (and subsequently h4 and k4 ), we, therefore, do not need
the value of k3 . That is, we can avoid the computation of k3 = λ1 (h1 −h3 )−k1 .
Addition, subtraction and multiplication by two and three being efficient
(linear-time) in the field size, let us concentrate on the times of multiplication
(M ), squaring (S) and inversion (I). Each addition of distinct finite points
takes time (about) 1M +1S +1I, and each doubling of a finite point takes time
1M +2S +1I. Therefore, the computation of (P +P )+Q takes time 2M +3S +
2I. In contrast, the computation of (P + Q) + P takes time 2M + 2S + 2I.
Moreover, if the intermediate y-coordinate is not computed, one multiplication
is saved, and the computation of (P + Q) + P takes time 1M + 2S + 2I.
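The saving can be exercised numerically. The Python sketch below (curve Y² = X³ + 2X + 3 over F₉₇ and the points are hypothetical choices, not from the book) computes 2P + Q as (P + Q) + P without ever forming the intermediate y-coordinate k3; all points are assumed finite, distinct, and non-opposite:

```python
# 2P + Q computed as (P + Q) + P, skipping the y-coordinate k3 of P + Q.
p = 97           # field size (illustrative)

def double_add(P, Q):
    (h1, k1), (h2, k2) = P, Q
    lam1 = (k2 - k1) * pow(h2 - h1, -1, p) % p
    h3 = (lam1 * lam1 - h1 - h2) % p                    # x-coordinate of P + Q
    lam2 = (-lam1 - 2 * k1 * pow(h3 - h1, -1, p)) % p   # slope to P; no k3 used
    h4 = (lam2 * lam2 - h1 - h3) % p
    k4 = (lam2 * (h1 - h4) - k1) % p
    return (h4, k4)

# On Y^2 = X^3 + 2X + 3 over F_97: P = (3, 6) and Q = 2P = (80, 10), so
# 2P + Q = 4P = -P = (3, 91) (P happens to have order 5 on this curve).
assert double_add((3, 6), (80, 10)) == (3, 91)
```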
29. Let us write the equation of the curve as

    C : fd(X, Y) + fd−1(X, Y) + · · · + f1(X, Y) + f0(X, Y) = 0,

where fi(X, Y) is the sum of all non-zero terms of degree i in f(X, Y). The
homogenization of C is then

    C^(h) : fd(X, Y) + Z fd−1(X, Y) + · · · + Z^{d−1} f1(X, Y) + Z^d f0(X, Y) = 0.
In order to find the points at infinity on C^(h), we put Z = 0 and get fd(X, Y) =
0. Let us write this equation as

    cd X^d + cd−1 X^{d−1} Y + cd−2 X^{d−2} Y² + · · · + c1 X Y^{d−1} + c0 Y^d = 0,

where ci ∈ K are not all zero. If c0 is the only non-zero coefficient, we have
Y^d = 0, and the only point at infinity on the curve is [1, 0, 0]. If ci ≠ 0 for
some i > 0, we rewrite this equation as

    cd (X/Y)^d + cd−1 (X/Y)^{d−1} + cd−2 (X/Y)^{d−2} + · · · + c1 (X/Y) + c0 = 0.

This is a univariate equation in X/Y having roots α1, α2, . . . , αt with t ≤ d. For each
root αi, we have X/Y = αi, that is, X = αi Y, that is, [αi, 1, 0] is a point at
infinity on the curve. (Moreover, if cd = 0, then Y divides fd, and [1, 0, 0] is an
additional point at infinity.)
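Over a finite field the roots αi can simply be enumerated. The Python sketch below (the prime p = 11 and the coefficient lists are hypothetical examples, not from the book) finds the points at infinity from the top-degree form:

```python
# Points at infinity of a plane curve over F_p, found by brute force from the
# top-degree form f_d = c_d X^d + c_{d-1} X^{d-1} Y + ... + c_0 Y^d.
p = 11

def points_at_infinity(c):
    """c = [c_d, ..., c_0]. Returns projective points [X, Y, 0] with f_d = 0."""
    d = len(c) - 1
    pts = []
    if c[0] % p == 0:            # no X^d term: Y | f_d, so [1, 0, 0] qualifies
        pts.append((1, 0, 0))
    for alpha in range(p):       # roots of c_d t^d + ... + c_0 with t = X/Y
        if sum(ci * pow(alpha, d - i, p) for i, ci in enumerate(c)) % p == 0:
            pts.append((alpha, 1, 0))
    return pts

# Weierstrass curve: the degree-3 form is -X^3, so only [0, 1, 0] is at infinity.
assert points_at_infinity([-1, 0, 0, 0]) == [(0, 1, 0)]
# Hyperbola X^2 - Y^2 = Z^2: f_2 = X^2 - Y^2 gives [1, 1, 0] and [-1, 1, 0].
assert points_at_infinity([1, 0, -1]) == [(1, 1, 0), (10, 1, 0)]
```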
30. (a) We have

    λ = (k2/l2 − k1/l1) / (h2/l2 − h1/l1) = (k2 l1 − k1 l2) / (h2 l1 − h1 l2).
Solutions to Selected Exercises 549
Therefore,
    h/l = λ² − h1/l1 − h2/l2
        = [ l1 l2 (k2 l1 − k1 l2)² − (h2 l1 − h1 l2)² (h1 l2 + h2 l1) ] / [ l1 l2 (h2 l1 − h1 l2)² ],

and

    k/l = λ (h1/l1 − h/l) − k1/l1.
Substituting the values of λ and h/l gives an explicit expression for k/l. These
expressions are too clumsy. Fortunately, there are many common subexpressions
used in these formulas. Computing these intermediate subexpressions
allows us to obtain h, k, l as follows:
    T1 = k2 l1 − k1 l2,
    T2 = h2 l1 − h1 l2,
    T3 = T2²,
    T4 = T2 T3,
    T5 = l1 l2 T1² − T4 − 2 h1 l2 T3,
    h  = T2 T5,
    k  = T1 (h1 l2 T3 − T5) − k1 l2 T4,
    l  = l1 l2 T4.
These expressions can be further optimized by using temporary variables to
store the values of h1 l2 , k1 l2 and l1 l2 .
(b) We proceed as in the case of addition. Here, I present only the final formulas.
    T1 = 3 h1² + a l1²,
    T2 = k1 l1,
    T3 = h1 k1 T2,
    T4 = T1² − 8 T3,
    T5 = T2²,
    h′ = 2 T2 T4,
    k′ = T1 (4 T3 − T4) − 8 k1² T5,
    l′ = 8 T2 T5.
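As a sanity check, the formulas of parts (a) and (b) can be transcribed to Python and compared with affine arithmetic. The prime p = 97, the coefficient a = 2, and the base point below are hypothetical choices, not from the book:

```python
# Projective addition/doubling of Exercise 4.30 over a small field, checked
# against affine results. Curve: Y^2 = X^3 + aX + 3 over F_97 (illustrative).
p, a = 97, 2

def proj_add(P1, P2):
    (h1, k1, l1), (h2, k2, l2) = P1, P2
    T1 = (k2 * l1 - k1 * l2) % p
    T2 = (h2 * l1 - h1 * l2) % p
    T3 = T2 * T2 % p
    T4 = T2 * T3 % p
    T5 = (l1 * l2 * T1 * T1 - T4 - 2 * h1 * l2 * T3) % p
    return (T2 * T5 % p,
            (T1 * (h1 * l2 * T3 - T5) - k1 * l2 * T4) % p,
            l1 * l2 * T4 % p)

def proj_double(P1):
    h1, k1, l1 = P1
    T1 = (3 * h1 * h1 + a * l1 * l1) % p
    T2 = k1 * l1 % p
    T3 = h1 * k1 * T2 % p
    T4 = (T1 * T1 - 8 * T3) % p
    T5 = T2 * T2 % p
    return (2 * T2 * T4 % p,
            (T1 * (4 * T3 - T4) - 8 * k1 * k1 * T5) % p,
            8 * T2 * T5 % p)

def to_affine(P):
    h, k, l = P
    inv = pow(l, -1, p)
    return (h * inv % p, k * inv % p)

P = (3, 6, 1)                                  # a point on the curve
assert to_affine(proj_double(P)) == (80, 10)   # agrees with affine doubling
assert to_affine(proj_add(P, (80, 10, 1))) == (80, 87)
```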
(c) Computing the affine coordinates requires a division in the field. If this
division operation is much more expensive than multiplication and squaring
in the field, avoiding this operation inside the loop (but doing it only once
after the end of the loop) may speed up the point-multiplication algorithm.
However, projective coordinates increase the number of multiplication (and
squaring) operations substantially. Therefore, it is not clear whether avoiding
one division in the loop can really provide practical benefits. Implementers
report contradictory results in the literature. The practical performance heavily
depends on the library used for the field arithmetic.
31. The second point can be treated as having projective coordinates [h2 , k2 , 1],
that is, l2 = 1. The formulas in Exercise 4.30(a) can be used with l2 = 1. This
saves the three multiplications h1 l2 , k1 l2 and l1 l2 .
In the conditional addition part of the point-multiplication algorithm, the
second summand is always the base point P, which is available in affine
coordinates. Inside the loop, we always keep the sum S in projective coordinates.
However, the computation of S + P benefits from using mixed coordinates.
32. (b) Let [h, k, l]c,d be a finite point on C. We have l ≠ 0, and so [h, k, l]c,d ∼
[h/l^c, k/l^d, 1]c,d. Therefore, we identify [h, k, l]c,d with the point (h/l^c, k/l^d).
Conversely, to a point (h, k) in affine coordinates, we associate the point
[h, k, 1]c,d with projective coordinates. It is easy to verify that these
associations produce a bijection of all finite points on C with all points on C^(c,d) with
non-zero Z-coordinates.
(c) We need to set Z = 0 in order to locate the points at infinity. This gives us
the polynomial g(X, Y) = f^(c,d)(X, Y, 0). In general, g is not a homogeneous
polynomial in the standard sense. However, if we give a weight of c to X and
a weight of d to Y, each non-zero term in g is of the same total weight. Let
X^i Y^j and X^{i′} Y^{j′} have the same weight, that is, ci + dj = ci′ + dj′, that is,
c(i − i′) = d(j′ − j). For the sake of simplicity, let us assume that gcd(c, d) = 1
(the case gcd(c, d) > 1 can be gracefully handled). But then, i ≡ i′ (mod d)
and j ≡ j′ (mod c). In view of this, we proceed as follows.

If X divides g, we get the point [0, 1, 0]c,d at infinity. If Y divides g, we get
the point [1, 0, 0]c,d at infinity. We remove all factors of X and Y from g, and
assume that g is divisible by neither X nor Y. By the argument in the last
paragraph, we can write g(X, Y) = h(X^d, Y^c). Call X′ = X^d and Y′ = Y^c.
Then, h(X′, Y′) is a homogeneous polynomial in X′, Y′. We find all the roots
for X′/Y′ from this equation. Each root α corresponds to a point Oα at infinity
on the curve. Since gcd(c, d) = 1, we have uc + vd = 1 for some integers u, v.
The choices X = α^v and Y = α^{−u} are consistent with X′ = X^d, Y′ = Y^c
and X′/Y′ = α, so we take Oα = [α^v, α^{−u}, 0]c,d. There is a small catch here,
namely, the values u, v in Bézout's relation are not unique. Given any solution
u, v of uc + vd = 1, all solutions are given by (u + rd)c + (v − rc)d = 1 for any
r ∈ Z. But then, [α^{v−rc}, α^{−(u+rd)}, 0]c,d = [α^v/(α^r)^c, α^{−u}/(α^r)^d, 0]c,d = [α^v, α^{−u}, 0]c,d,
that is, the point Oα does not depend on the choice of the pair (u, v).
33. (a) Substituting X by X/Z² and Y by Y/Z³ in the equation of the curve and
multiplying by Z⁶, we obtain

    E^(2,3) : Y² = X³ + aXZ⁴ + bZ⁶.
(b) Put Z = 0 to get X³ − Y² = 0. Now, set X′ = X³ and Y′ = Y² to get
X′ − Y′ = 0. The only root of this equation is X′/Y′ = 1. Therefore, the
point at infinity on E^(2,3) is [1, 1, 0]2,3 (since 1 raised to any integral power is
again 1, we do not need to compute Bézout's relation involving 2, 3).
(c) The point [h, k, l]2,3 (with l ≠ 0) has affine coordinates (h/l², k/l³). The
opposite of this point is (h/l², −k/l³), that is, the point [h/l², −k/l³, 1]2,3 =
[h, −k, l]2,3 = [h, k, −l]2,3.
(d) Let P1 = [h1, k1, l1]2,3 and P2 = [h2, k2, l2]2,3 be two finite points on
E with P1 ≠ ±P2. In order to compute the sum P1 + P2 = [h, k, l]2,3, we
compute the sum of the points (h1/l1², k1/l1³) and (h2/l2², k2/l2³). We proceed
as in Exercise 4.30(a), and obtain the following formulas.
    T1 = h1 l2²,
    T2 = h2 l1²,
    T3 = k1 l2³,
    T4 = k2 l1³,
    T5 = T2 − T1,
    T6 = T4 − T3,
    h  = −T5³ − 2 T1 T5² + T6²,
    k  = −T3 T5³ + T6 (T1 T5² − h),
    l  = l1 l2 T5.
(e) The double of [h, k, l]2,3 is the point [h′, k′, l′]2,3 computed as follows.

    T1 = 4 h k²,
    T2 = 3 h² + a l⁴,
    h′ = −2 T1 + T2²,
    k′ = −8 k⁴ + T2 (T1 − h′),
    l′ = 2 k l.
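These Jacobian-coordinate formulas can likewise be checked against affine arithmetic. The field F₉₇, the coefficient a = 2, and the point below are hypothetical choices, not from the book:

```python
# Jacobian-coordinate addition/doubling of parts (d) and (e), checked over a
# small field. [h, k, l] represents the affine point (h/l^2, k/l^3).
p, a = 97, 2      # curve Y^2 = X^3 + aX + 3 over F_97 (illustrative)

def jac_add(P1, P2):
    (h1, k1, l1), (h2, k2, l2) = P1, P2
    T1 = h1 * l2 * l2 % p
    T2 = h2 * l1 * l1 % p
    T3 = k1 * pow(l2, 3, p) % p
    T4 = k2 * pow(l1, 3, p) % p
    T5 = (T2 - T1) % p
    T6 = (T4 - T3) % p
    h = (-pow(T5, 3, p) - 2 * T1 * T5 * T5 + T6 * T6) % p
    k = (-T3 * pow(T5, 3, p) + T6 * (T1 * T5 * T5 - h)) % p
    return (h, k, l1 * l2 * T5 % p)

def jac_double(P1):
    h1, k1, l1 = P1
    T1 = 4 * h1 * k1 * k1 % p
    T2 = (3 * h1 * h1 + a * pow(l1, 4, p)) % p
    h = (-2 * T1 + T2 * T2) % p
    k = (-8 * pow(k1, 4, p) + T2 * (T1 - h)) % p
    return (h, k, 2 * k1 * l1 % p)

def jac_to_affine(P):
    h, k, l = P
    return (h * pow(l, -2, p) % p, k * pow(l, -3, p) % p)

P = (3, 6, 1)
assert jac_to_affine(jac_double(P)) == (80, 10)         # matches affine 2P
assert jac_to_affine(jac_add(P, (80, 10, 1))) == (80, 87)   # matches affine 3P
```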
41. Since x² − y² = 1 on the curve, we have

    2y⁴ − 2y³x − y² + 2yx − 1
        = (2y⁴ − 2y²) − (2y³x − 2yx) + (y² − 1)
        = (y² − 1)(2y² − 2yx + 1)
        = (y − 1)(y + 1)(y − x)²,

and

    y² + yx + y + x + 1
        = x² + yx + x + y
        = (x + 1)(y + x).

Therefore,

    R(x, y) = (y − 1)(y + 1)(y − x)² / [(x + 1)(y + x)].
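The two factorizations can be confirmed numerically. The Python sketch below (an illustrative check; the prime field F₁₀₁ is a hypothetical stand-in for the base field) verifies them on every affine point of the hyperbola, where both identities must hold because x² = y² + 1 on the curve:

```python
# Check the two on-curve factorizations at every affine point of the
# hyperbola x^2 - y^2 = 1 over F_101 (illustrative field).
p = 101
checked = 0
for x in range(p):
    for y in range(p):
        if (x * x - y * y - 1) % p:
            continue
        assert (2*y**4 - 2*y**3*x - y*y + 2*y*x - 1) % p == \
               ((y - 1) * (y + 1) * (y - x)**2) % p
        assert (y*y + y*x + y + x + 1) % p == ((x + 1) * (y + x)) % p
        checked += 1
assert checked == p - 1   # the hyperbola has p - 1 affine points over F_p
```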
Zeros of y − 1: The line y − 1 = 0 cuts the hyperbola at P1 = (√2, 1) and
P2 = (−√2, 1). We can take y − 1 as a uniformizer at each of these two points,
and conclude that P1 and P2 are simple zeros of R.
Zeros of y + 1: The line y + 1 = 0 cuts the hyperbola at P3 = (√2, −1) and
P4 = (−√2, −1). We can take y + 1 as a uniformizer at each of these two
points, and conclude that P3 and P4 are simple zeros of R.
Zeros of x + 1: The line x + 1 = 0 touches the hyperbola at P5 = (−1, 0). At
this point, the non-tangent line y can be taken as the uniformizer. We have

    R(x, y) = y^{−2} [ (y − 1)(y + 1)(y − x)²(x − 1) / (y + x) ].

Thus, P5 is a pole of R of multiplicity two.
Zeros of (y − x)²: The line y − x = 0 does not meet the curve at any finite
point. However, it touches the curve at O1 = [1, 1, 0], one of its points at
infinity. The vertical line x = ∞ (or 1/x = 0) meets but is not tangential to
the hyperbola at O1. Thus, 1/x can be taken as a uniformizer at O1. But

    R(x, y) = (y − 1)(y + 1)(y² − x²)² / [(x + 1)(y + x)³]
            = (1/x)² [ (y/x − 1/x)(y/x + 1/x)(−1)² / ((1 + 1/x)(y/x + 1)³) ].

At O1, we have y/x = 1 and 1/x = 0. It follows that O1 is a zero of R of
multiplicity two.
Zeros of y + x: The only intersection of the line y + x with the hyperbola is
at O2 = [1, −1, 0] (the second point at infinity on the curve). We can again
take 1/x as the uniformizer at O2. We write

    R(x, y) = (y − 1)(y + 1)(y − x)³ / [(x + 1)(y² − x²)]
            = (1/x)^{−4} [ (y/x − 1/x)(y/x + 1/x)(y/x − 1)³ / ((1 + 1/x)(−1)) ].

But y/x = −1 and 1/x = 0 at O2, so O2 is a pole of R of multiplicity four.
43. For elliptic curves, we prefer to use the explicit formulas given as Eqns (4.13),
(4.14) and (4.15). The given rational function is written as G(x, y)/H(x, y)
with G(x, y) = x and H(x, y) = y.
Finite zeros of G(x, y): The only zero of x is the special point P1 = (0, 0).
We have e = 1 and l = 0, so the multiplicity of P1 is 2e = 2.
Finite zeros of H(x, y): The zeros of y are the special points P1 = (0, 0),
P2 = (1, 0) and P3 = (−1, 0). For each of these points, we have e = 0 and
l = 1, so ordPi (y) = 1 for i = 1, 2, 3.
Zeros and poles of R = G/H: P1 is a zero of R of multiplicity 2 − 1 = 1. P2
and P3 are poles of multiplicity one. Moreover, ordO (G) = −2 and ordO (H) =
−3, so that ordO (R) = −2 + 3 = 1, that is, R has a simple zero at O.
52. Bilinearity: We claim that ([P ] − [O]) + ([Q] − [O]) ∼ [P + Q] − [O]. If
one (or both) of P, Q is/are O, this is certainly true. If P = −Q, the right
side is 0, and the left side is the divisor of the vertical line LP,Q (that is,
a principal divisor, that is, equivalent to 0). If P, Q are finite points that
are not opposites of one another, then let U = P + Q, and observe that
[P] + [Q] + [−U] − 3[O] = Div(L_{P,Q}) and [U] + [−U] − 2[O] = Div(L_{U,−U}).
Therefore, ([P] − [O]) + ([Q] − [O]) = [U] − [O] + Div(L_{P,Q}/L_{U,−U}).
For V = P, Q, P + Q, R, let DV be a divisor equivalent to [V ] − [O], and
let fV be the rational function such that Div(fV ) = mDV . By definition,
em (P + Q, R) = fP +Q (DR )/fR (DP +Q ).
Since ([P ] − [O]) + ([Q] − [O]) ∼ [P + Q] − [O], we have DP + DQ ∼ DP +Q ,
that is, DP +Q = DP + DQ + Div(h) for some rational function h. Moreover,
Div(fP +Q ) = mDP +Q = mDP + mDQ + m Div(h) = Div(fP ) + Div(fQ ) +
Div(hm ) = Div(fP fQ hm ), so we can take fP +Q = fP fQ hm . Therefore,
    em(P + Q, R) = fP(DR) fQ(DR) h(DR)^m / fR(DP + DQ + Div(h))
                 = fP(DR) fQ(DR) h(DR)^m / [fR(DP) fR(DQ) fR(Div(h))].
But h(DR )m = h(mDR ) = h(Div(fR )) = fR (Div(h)) (by Weil’s reciprocity
law), and so em (P + Q, R) = em (P, R)em (Q, R).
One can analogously check that em (P, Q + R) = em (P, Q)em (P, R).
Alternation: Take two divisors D1, D2 with disjoint supports, each equivalent
to [P] − [O]. But then, D1 ∼ D2, that is, D1 = D2 + Div(h) for some
rational function h. Also, choose functions f1, f2 such that Div(f1) = mD1
and Div(f2) = mD2. Since Div(f1) = mD1 = mD2 + m Div(h) = Div(f2) +
Div(h^m) = Div(f2 h^m), we can take f1 = f2 h^m. Therefore,
    em(P, P) = f1(D2)/f2(D1) = (f2 h^m)(D2) / f2(D2 + Div(h))
             = f2(D2) h(Div(f2)) / [f2(D2) f2(Div(h))] = 1.
Skew symmetry: By the previous two properties, we have
1 = em (P + Q, P + Q) = em (P, P )em (P, Q)em (Q, P )em (Q, Q)
= em (P, Q)em (Q, P ).
58. Let Div(f) = m[P] − m[O], D ∼ [Q] − [O] and D′ ∼ [Q′] − [O], where Q = mQ′.
Since Q = mQ′, we have [Q] − [O] ∼ m([Q′] − [O]), that is, D ∼ mD′, that is,
D = mD′ + Div(h) for some rational function h. But then,

    ⟨P, Q⟩m = f(D) = f(mD′ + Div(h)) = f(mD′) f(Div(h))
            = f(D′)^m h(Div(f)) = f(D′)^m h(m[P] − m[O])
            = ( f(D′) h([P] − [O]) )^m.
59. Let us restrict our attention to the simple Weierstrass equation Y² = X³ +
aX + b. Let L : y + µx + λ = 0 be the equation of a non-vertical straight line.
The conjugate of L is L̂ = −y + µx + λ. For a point Q, we have:

    L(−Q) = L̂(Q).

Moreover,

    N(L) = L L̂ = −y² + (µx + λ)² = −(x³ + ux² + vx + w)
for some u, v, w. Suppose that L is the line through the points P1 = (h1 , k1 )
and P2 = (h2 , k2 ) on the curve (we may have P1 = P2 ). The third point of
intersection of L with the curve is (h3 , −k3 ), where P1 +P2 = (h3 , k3 ). Clearly,
h1 , h2 , h3 are roots of N(L), and it follows that
N(L) = −(x − h1 )(x − h2 )(x − h3 ).
In particular, if P1 = P2, we have

    N(L) = −(x − h1)²(x − h3).
In the parts below, we take Q = (h, k).
(a) We have

    L_{U,U}(Q) / [ L_{U,−U}(Q)² L_{2U,−2U}(Q) ]
        = L_{U,U}(Q) / [ (h − x(U))² (h − x(2U)) ]
        = −1 / L̂_{U,U}(Q)
        = −1 / L_{U,U}(−Q).
(b) Let us denote tU = (h_t, y_t). But then,

    L_{(k+1)U,kU}(Q) / [ L_{(k+1)U,−(k+1)U}(Q) L_{(2k+1)U,−(2k+1)U}(Q) ]
        = L_{(k+1)U,kU}(Q) / [ (h − h_{k+1})(h − h_{2k+1}) ]
        = L_{(k+1)U,kU}(Q) L̂_{(k+1)U,kU}(Q) / [ (h − h_{k+1})(h − h_{2k+1}) L̂_{(k+1)U,kU}(Q) ]
        = L_{(k+1)U,kU}(Q) L̂_{(k+1)U,kU}(Q) (h − h_k) / [ (h − h_k)(h − h_{k+1})(h − h_{2k+1}) L̂_{(k+1)U,kU}(Q) ]
        = −(h − h_k) / L̂_{(k+1)U,kU}(Q)
        = −L_{kU,−kU}(Q) / L_{(k+1)U,kU}(−Q).
(c) Put k = 1 in Part (b).
61. For n ≥ 2, let us define the function

    g_{n,P} = f_{n,P} L_{nP,−nP}.

If we can compute g_{n,P}, we can compute f_{n,P}, and conversely. Moreover, if
mP = O (this is usually the case in Weil- and Tate-pairing computations), we
have L_{mP,−mP} = 1, and so g_{m,P} = f_{m,P}, that is, no final adjustment is
necessary to convert the value of g to the value of f. In view of this, we can modify
Miller's algorithm so as to compute g_{n,P}(Q). Write n = (1 n_{s−1} n_{s−2} . . . n1 n0)₂.
We first consume the two leftmost bits of n. If n_{s−1} = 0, we compute

    g_{2,P}(Q) = f_{2,P} L_{2P,−2P}(Q) = L_{P,P}(Q).
On the other hand, if n_{s−1} = 1, we compute

    g_{3,P}(Q) = f_{3,P}(Q) L_{3P,−3P}(Q) = f_{2,P}(Q) L_{2P,P}(Q) = L_{P,P}(Q) L_{2P,P}(Q) / L_{2P,−2P}(Q).

In these two cases, we initialize U to 2P and 3P, respectively.
The Miller loop runs for i = s − 2, s − 3, . . . , 1, 0. In the i-th iteration, we
consume the bit ni , and update the value of the function f (or g) and the point
U appropriately. Suppose that at the beginning of the i-th iteration, we have
computed gk,P and U = kP . If ni = 0, we compute g2k,P and 2kP = 2U in
the current iteration. If ni = 1, we compute g2k+1,P and (2k + 1)P = 2U + P .
We consider these two cases separately.
If ni = 0, we have

    g_{2k,P}(Q) = f_{2k,P}(Q) L_{2kP,−2kP}(Q)
               = f_{k,P}(Q)² L_{kP,kP}(Q)
               = ( g_{k,P}(Q) / L_{kP,−kP}(Q) )² L_{kP,kP}(Q)
               = g_{k,P}(Q)² ( L_{kP,kP}(Q) / L_{kP,−kP}(Q)² )
               = −g_{k,P}(Q)² ( L_{2kP,−2kP}(Q) / L_{kP,kP}(−Q) )   [by Exercise 4.59(a)]
               = −g_{k,P}(Q)² ( L_{2U,−2U}(Q) / L_{U,U}(−Q) ).
If ni = 1, we have

    g_{2k+1,P}(Q) = f_{2k+1,P}(Q) L_{(2k+1)P,−(2k+1)P}(Q)
                 = f_{2k,P}(Q) L_{2kP,P}(Q)
                 = g_{2k,P}(Q) ( L_{2kP,P}(Q) / L_{2kP,−2kP}(Q) )
                 = −g_{k,P}(Q)² ( L_{2U,−2U}(Q) / L_{U,U}(−Q) ) ( L_{2U,P}(Q) / L_{2U,−2U}(Q) )
                 = −g_{k,P}(Q)² ( L_{2U,P}(Q) / L_{U,U}(−Q) ).
Here, each of the two cases ni = 0 and ni = 1 requires the computation of
only two line functions. In contrast, the original Miller loop requires two
or four line-function computations in these two cases, respectively. Therefore,
if many bits in n are 1, the modified loop is expected to be somewhat more
efficient than the original Miller loop.
63. For any two points U, V on the curve, we have:

    [U] + [V] − [U + V] − [O] = Div( L_{U,V} / L_{U+V,−(U+V)} ).
(a) Since Div(f_{0,P,S}) = 0, we can take f_{0,P,S} = 1. Moreover, Div(f_{1,P,S}) =
[P + S] − [S] − [P] + [O] = Div( L_{(P+S),−(P+S)} / L_{P,S} ).
(b) We have Div(f_{n+1,P,S}) − Div(f_{n,P,S}) = [P + S] − [S] − [(n + 1)P] + [nP] =
([P + S] − [P] − [S] + [O]) + ([nP] + [P] − [(n + 1)P] − [O]) = Div(f_{1,P,S}) +
Div( L_{nP,P} / L_{(n+1)P,−(n+1)P} ).
(c) We have Div(f_{n+n′,P,S}) = (n + n′)[P + S] − (n + n′)[S] − [(n + n′)P] + [O] =
(n[P + S] − n[S] − [nP] + [O]) + (n′[P + S] − n′[S] − [n′P] + [O]) + ([nP] + [n′P] −
[(n + n′)P] − [O]) = Div(f_{n,P,S}) + Div(f_{n′,P,S}) + Div( L_{nP,n′P} / L_{(n+n′)P,−(n+n′)P} ).
65. Let Q = aP for some a, where P, Q ∈ Eq are points of order m. For the dis-
torted Weil pairing, we have em (P, φ(Q)) = em (P, φ(aP )) = em (P, aφ(P )) =
em (P, φ(P ))a = em (aP, φ(P )) = em (Q, φ(P )).
The twisted pairing is defined on G×G′ with two different groups G, G′ , so
the question of symmetry is not legitimate in the context of twisted pairings.
75. For simplicity, we assume that the curve is defined over a field of characteristic
≠ 2, 3 by the equation E : Y² = X³ + aX + b. The following program translates
the algorithm of Exercise 4.57 to GP/PARI.
L(U,V,Q) = \
local(l,m); \
if ((U == [0]) && (V == [0]), return(1)); \
if (U == [0], return(Q[1] - V[1])); \
if (V == [0], return(Q[1] - U[1])); \
if ((U[1] == V[1]) && (U[2] == -V[2]), return(Q[1] - U[1])); \
if ((U[1] == V[1]) && (U[2] == V[2]), \
l = (3 * U[1]^2 + E[4]) / (2 * U[2]), \
l = (V[2] - U[2]) / (V[1] - U[1]) \
); \
m = l * U[1] - U[2]; \
return(Q[2] - l * Q[1] + m);
p = 43; L([Mod(1,p),Mod(2,p)],[Mod(1,p),Mod(2,p)],[x,y])
76. We first fix the prime p, a supersingular curve E of the given form, and the
parameters m and k. We also need a representation of the field F_{p^k}. Since
p ≡ 3 (mod 4) and k = 2, we can represent this extension as F_p(θ) with
θ² + 1 = 0. We pass two points for which the reduced Tate pairing needs to
be computed. We invoke the line-function primitive L(U,V,Q) implemented in
Exercise 4.75. We use the formula ⟨P, Q⟩_m = f_{m,P}(Q). If the computation
fails, we conclude that this is a case of degenerate output, and return 1.
p = 43; E = ellinit([0,0,0,Mod(3,p),0]);
m = (p + 1) / 4; k = 2; T = Mod(t, Mod(1,p)*t^2 + Mod(1,p))
Tate(P,Q) = \
local(s,U,V,fnum,fden); \
s = ceil(log(p)/log(2)); while (bittest(m,s)==0, s--); \
fnum = fden = 1; U = P; s--; \
while (s >= 0, \
fnum = fnum^2 * L(U,U,Q); \
U = elladd(E,U,U); \
V = U; if (matsize(U)==[1,2], V[2] = -U[2]); \
fden = fden^2 * L(U,V,Q); \
if (bittest(m,s) == 1, \
fnum = fnum * L(U,P,Q); \
U = elladd(E,U,P); \
V = U; if(matsize(U)==[1,2], V[2] = -U[2]); \
fden = fden * L(U,V,Q); \
); \
s--; \
); \
if ((fnum == 0) && (fden == 0), return(1)); \
return((fnum / fden) ^ ((p^k - 1) / m));
lift(lift(Tate([1, 2], [15 + 22*T, 5 + 14*T])))
Chapter 5 Primality Testing
2. The sieve of Eratosthenes marks many composite integers multiple times. For
example, 30 is marked thrice: as a multiple of 2, of 3, and of 5. To improve the
running time of the sieve, we plan to mark a composite integer x ≤ n only
once, namely, as a proper multiple of its least prime factor p. If x = pf with
the cofactor f, then p ≤ √x ≤ √n, and p ≤ r for any prime factor r of f.
In the following algorithm, the outer loop runs over all possible values of f
(between 2 and ⌊n/2⌋). Let q denote the least prime factor of one value of f
in its range of variation. So long as p ≤ q, the least prime factor of pf is p,
so we mark pf as composite (provided that pf ≤ n). But q is not known in
advance, so we keep trying successive primes p (marking pf for each) until p | f.
At this point, we terminate the iteration for the current cofactor f. The
algorithm requires a (sorted) list p1, p2, . . . , pt of all primes ≤ √n.
Prepare the sorted list p1, p2, . . . , pt of all primes ≤ √n.
Initialize each element of the array P (indexed 1 through n) to true.
For f = 2, 3, 4, . . . , ⌊n/2⌋, repeat: {
For i = 1, 2, . . . , t, repeat: {
If (pi f > n), break the inner for loop.
Mark P [pi f ] as false.
If (pi |f ), break the inner for loop.
}
}
For i = 2, 3, . . . , n, repeat: { If P [i] is true, output i. }
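The same idea is often implemented with the prime list grown on the fly instead of being precomputed. Here is a runnable Python sketch of that variant (illustrative, not a verbatim rendering of the pseudocode above):

```python
# Linear sieve: every composite q*f is struck exactly once, with q the least
# prime factor of q*f. The prime list is grown as f increases, instead of
# precomputing all primes <= sqrt(n).
def linear_sieve(n):
    is_prime = [True] * (n + 1)
    primes = []
    for f in range(2, n + 1):
        if is_prime[f]:
            primes.append(f)
        for q in primes:
            if q * f > n:
                break
            is_prime[q * f] = False
            if f % q == 0:       # q is the least prime factor of f: stop here
                break
    return [i for i in range(2, n + 1) if is_prime[i]]

assert linear_sieve(50) == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
```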
The primes ≤ √n may be obtained recursively. Even the original sieve of
Eratosthenes can generate these primes in O(√n ln ln n) (that is, o(n)) time.
3. (a) By the prime number theorem, the number of primes ≤ x is nearly x/ln x,
that is, a randomly chosen integer ≤ x is prime with probability about 1/ln x.
Under the assumption that x and 2x + 1 behave as random integers, one n + i is
a Sophie Germain prime with probability about 1/[ln(n + M) ln(2(n + M) + 1)],
which is approximately 1/ln² n. Therefore, we should take M = ln² n (or a
small multiple of ln² n).
(b) We use an array A indexed by i in the range 0 ≤ i ≤ M. It is not essential
to know the exact factorizations of n + i. Detecting only that n + i or 2(n + i) + 1
is divisible by some pj suffices to throw away n + i.
In view of this, we initialize each array location Ai to 1. Now, take q = pj
for some j ∈ {1, 2, . . . , t}. The condition q | (n + i) implies i ≡ −n (mod q),
so we set Ai = 0 for all values of i satisfying this congruence. Moreover, for
q ≠ 2, the condition q | [2(n + i) + 1] implies i ≡ −n − 2⁻¹ (mod q), that is,
we set Ai = 0 for all values of i satisfying this second congruence.
After all primes p1, p2, . . . , pt are considered, we check the primality of n + i
and 2(n + i) + 1 only for those i for which we still have Ai = 1.
(c) Let P = p1 p2 · · · pt and Q = p2 p3 · · · pt. The probability that a random
n + i is not divisible by any pj is about φ(P)/P. Likewise, the probability that
a random 2(n + i) + 1 is not divisible by any pj is about φ(Q)/Q. Let us assume
that the two events (divisibility of n + i by pj and divisibility of 2(n + i) + 1 by
pj) are independent. But then, we check the primality of n + i and 2(n + i) + 1
for only about (M + 1) φ(P)φ(Q)/(PQ) values of i. Therefore, the speedup obtained
is close to PQ/(φ(P)φ(Q)). For t = 10, this speedup is about 20; for t = 100, it is
about 64; and for t = 1000, it is about 128. Note that for a suitably chosen t, we may
neglect the sieving time which is O(t + M log t), that is, O(t + (log² n)(log t)).
In contrast, each primality test (like Miller–Rabin) takes O(log³ n) time.
9. Since the three primes r, p, q must be distinct from one another, we take p < q.
Since there are only finitely many pairs (p, q) satisfying p < q < r, we may
assume, without loss of generality, that r < q. Moreover, if p < r < q, then
by the result to be proved, there are only finitely many Carmichael numbers
of the form prq with the smallest factor p < r. Therefore, we assume that
r < p < q, that is, r is the smallest prime factor of n = rpq.
We have n − 1 = pqr − 1 = pq(r − 1) + (pq − 1). Since n is a Carmichael
number, we have (r − 1)|(n − 1), that is, (r − 1)|(pq − 1). A similar result holds
for p and q too, that is, for some positive integers u, v, w, we have:
pq − 1 = u(r − 1),
qr − 1 = v(p − 1),
pr − 1 = w(q − 1).
Since p, q, r are odd primes with r < p < q, we have q − 1 > p, that is,
pr − 1 = w(q − 1) > wp, that is, pr > wp, that is, w < r, that is,
w ≤ r − 1.
Now, qr = 1 + v(p − 1) = r((pr − 1)/w + 1), that is,

    p − 1 = (r − 1)(r + w) / (vw − r²).
Since r − 1, r + w, and p − 1 are positive, vw − r² is positive too. Finally, since
vw − r² is a positive integer, we have

    p − 1 ≤ (r − 1)(r + w) ≤ (r − 1)(2r − 1).
Given r, there are, therefore, only finitely many possibilities for p. For each
such p, we get only finitely many primes q satisfying pr − 1 = w(q − 1).
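The finiteness claim can be explored by machine using Korselt's criterion. The helper below is an illustrative Python sketch, not part of the book's solution:

```python
# Korselt's criterion: n is a Carmichael number iff n is composite, squarefree,
# and (q - 1) | (n - 1) for every prime q | n. Used to confirm small cases.
def is_carmichael(n):
    m, factors, d = n, [], 2
    while d * d <= m:
        if m % d == 0:
            factors.append(d)
            m //= d
            if m % d == 0:       # repeated prime factor: not squarefree
                return False
        else:
            d += 1
    if m > 1:
        factors.append(m)
    return len(factors) > 1 and all((n - 1) % (q - 1) == 0 for q in factors)

# The three smallest Carmichael numbers, each a product of three primes:
# 561 = 3*11*17, 1105 = 5*13*17, 1729 = 7*13*19.
assert [n for n in range(2, 2000) if is_carmichael(n)] == [561, 1105, 1729]
```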
11. (a) In view of Exercise 5.10(a), it suffices to concentrate only on Carmichael
numbers n. We can write n = p1 p2 · · · pr with pairwise distinct odd primes
p1, p2, . . . , pr, r ≥ 3, and with (pi − 1) | (n − 1) for all i = 1, 2, . . . , r. We now
consider two cases.

Case 1: All (n − 1)/(pi − 1) are even.

We choose a base a ∈ Z*n such that $\left(\frac{a}{p_1}\right) = -1$, whereas $\left(\frac{a}{p_i}\right) = +1$ for
i = 2, 3, . . . , r. By the definition of the Jacobi symbol, we have $\left(\frac{a}{n}\right) = -1$.
By Euler's criterion, a^{(p1−1)/2} ≡ −1 (mod p1). Since ((n−1)/2) / ((p1−1)/2) =
(n − 1)/(p1 − 1) is even by hypothesis, we have a^{(n−1)/2} ≡ 1 (mod p1). On the
other hand, for i = 2, 3, . . . , r, we have a^{(pi−1)/2} ≡ 1 (mod pi), that is,
a^{(n−1)/2} ≡ 1 (mod pi). By CRT, we then have a^{(n−1)/2} ≡ 1 (mod n), that is,
a^{(n−1)/2} ≢ $\left(\frac{a}{n}\right)$ (mod n), that is, n is not an Euler pseudoprime to base a.

Case 2: Some (n − 1)/(pi − 1) is odd.

Without loss of generality, assume that (n − 1)/(p1 − 1) is odd. Again take a ∈ Z*n
with $\left(\frac{a}{p_1}\right) = -1$ and $\left(\frac{a}{p_i}\right) = +1$ for i = 2, 3, . . . , r. By the definition of the
Jacobi symbol, we then have $\left(\frac{a}{n}\right) = -1$. On the other hand, by Euler's
criterion, we have a^{(n−1)/2} ≡ −1 (mod p1) and a^{(n−1)/2} ≡ 1 (mod pi) for
i = 2, 3, . . . , r. By CRT, we conclude that a^{(n−1)/2} ≢ ±1 (mod n), that is,
a^{(n−1)/2} ≢ $\left(\frac{a}{n}\right)$ (mod n), that is, n is not an Euler pseudoprime to base a.
(b) Suppose that n is an Euler pseudoprime to the bases a1, a2, . . . , at ∈ Z*n
only. Let a be a base to which n is not an Euler pseudoprime. (Such a base
exists by Part (a).) We have a^{(n−1)/2} ≢ $\left(\frac{a}{n}\right)$ (mod n). On the other hand,
ai^{(n−1)/2} ≡ $\left(\frac{a_i}{n}\right)$ (mod n) for i = 1, 2, . . . , t. It follows that (a ai)^{(n−1)/2} ≡
a^{(n−1)/2} ai^{(n−1)/2} ≢ $\left(\frac{a}{n}\right)\left(\frac{a_i}{n}\right)$ ≡ $\left(\frac{a a_i}{n}\right)$ (mod n), that is, n is not an Euler
pseudoprime to any of the bases a ai, that is, there are at least t bases to
which n is not an Euler pseudoprime.
13. (a) We have n − 1 = pq − 1 = p(2p − 1) − 1 = (p − 1)(2p + 1). Let
α ≡ a^{(n−1)/2} (mod p), and β ≡ a^{(n−1)/2} (mod q). Modulo p, we have
α ≡ (a^{(p−1)/2})^{2p+1} ≡ $\left(\frac{a}{p}\right)^{2p+1}$ ≡ $\left(\frac{a}{p}\right)$ (mod p). In particular, α = ±1.
Since $\left(\frac{a}{n}\right) = \left(\frac{a}{p}\right)\left(\frac{a}{q}\right)$, we conclude that n is an Euler pseudoprime to base a
only if β = ±1.

The determination of β is more involved. If a is a quadratic residue modulo
q, then a^{(q−1)/2} ≡ 1 (mod q). Exactly half of these residues satisfy a^{(q−1)/4} ≡
1 (mod q), and the remaining half a^{(q−1)/4} ≡ −1 (mod q). Since 2p + 1 is odd,
it follows that β ≡ a^{(n−1)/2} ≡ 1 (mod q) for half of the quadratic residues
modulo q and β ≡ a^{(n−1)/2} ≡ −1 (mod q) for the remaining half of the
quadratic residues modulo q. If a is a quadratic non-residue modulo q, then
a^{2p+1} (mod q) is again a quadratic non-residue modulo q (since 2p + 1 is odd).
Therefore, β ≡ a^{(n−1)/2} ≡ (a^{2p+1})^{(q−1)/4} ≢ ±1 (mod q), that is, it suffices to
concentrate only on the case that a is a quadratic residue modulo q.

Let n be an Euler pseudoprime to base a. If α = $\left(\frac{a}{p}\right)$ = 1, we should have
$\left(\frac{a}{q}\right)$ = 1 and β = 1. There are exactly ((p−1)/2)((q−1)/4) = (p−1)(q−1)/8 such bases
in Z*n. For each such base, $\left(\frac{a}{n}\right) = \left(\frac{a}{p}\right)\left(\frac{a}{q}\right)$ = 1 and a^{(n−1)/2} ≡ 1 (mod n). On
the other hand, if α = $\left(\frac{a}{p}\right)$ = −1, we must have $\left(\frac{a}{q}\right)$ = 1 and β = −1. There
are again exactly ((p−1)/2)((q−1)/4) = (p−1)(q−1)/8 such bases in Z*n. For each such
base, we have $\left(\frac{a}{n}\right) = \left(\frac{a}{p}\right)\left(\frac{a}{q}\right)$ = −1 and a^{(n−1)/2} ≡ −1 (mod n).
(b) If p ≡ 3 (mod 4), we have q ≡ 1 (mod 4), that is, n ≡ 3 (mod 4), that
is, (n − 1)/2 is odd. Now, n is a strong pseudoprime to base a if and only if
a^{(n−1)/2} ≡ ±1 (mod n). As shown in Part (a), this condition is the same as
the condition a^{(n−1)/2} ≡ $\left(\frac{a}{n}\right)$ (mod n).
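An exhaustive Python check (with the hypothetical choice p = 7, so q = 13 and n = 91; not part of the book's solution) confirms the count of (p − 1)(q − 1)/4 Euler-pseudoprime bases derived in Part (a):

```python
# Count the bases a in Z_n^* to which n = p(2p - 1) = 91 is an Euler
# pseudoprime, and compare with the count (p-1)(q-1)/4 derived above.
from math import gcd

def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, by quadratic reciprocity."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

p, q = 7, 13
n = p * q
count = sum(1 for a in range(1, n)
            if gcd(a, n) == 1
            and pow(a, (n - 1) // 2, n) == jacobi(a, n) % n)
assert count == (p - 1) * (q - 1) // 4   # 18 bases for n = 91
```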
18. (a) If b = 1, we have αβ = 1, that is, β = α⁻¹. Therefore, (α/β)^t = 1
implies α^{2t} = 1, that is, α^t = ±1 (since Fp and F_{p²} are fields). But then,
α^t = β^t = ±1, and V_t = α^t + β^t = ±2.
21. (b) We prove this by contradiction. Suppose that n is composite. Let p be
the smallest prime divisor of n. Then 3 ≤ p ≤ √n. By the given condition,
a^{(n−1)/2} ≡ −1 ≢ 1 (mod p), whereas a^{n−1} ≡ (−1)² ≡ 1 (mod p), that is,
ord_p a = t2^r for some odd t ≥ 1. But ord_p a | (p − 1), that is, t2^r | (p − 1), that
is, 2^r | (p − 1). So p − 1 ≥ 2^r, that is, p ≥ 2^r + 1 > √(k 2^r) + 1 =
√(n − 1) + 1 > √n, a contradiction to the choice of p.
(c) Assume that the input integer n is already a Proth number.

Repeat the following steps t times: {
    Choose a random base a in the range 1 ≤ a ≤ n − 1.
    If a^{(n−1)/2} ≡ −1 (mod n), return yes.
}
Return no.

The running time of this algorithm is dominated by (at most) t modular
exponentiations. So long as t is a constant (or a polynomial in log n), the
running time of this algorithm is bounded by a polynomial in log n.
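The same algorithm reads as follows in Python (a sketch; the Proth-form precondition on n is assumed, not verified):

```python
# Probabilistic Proth test. The input is assumed to be a Proth number
# n = k*2^r + 1 with k odd and k < 2^r.
import random

def proth_test(n, trials=32):
    """True if a witness certifies n prime; False means probably composite."""
    for _ in range(trials):
        a = random.randrange(1, n)
        if pow(a, (n - 1) // 2, n) == n - 1:   # a^((n-1)/2) = -1 (mod n)
            return True
    return False

random.seed(7)                  # fixed seed only to make the demo reproducible
assert proth_test(13)           # 13 = 3*2^2 + 1 is prime: a witness is found
assert not proth_test(25)       # 25 = 3*2^3 + 1 is composite: no witness exists
```

For a prime Proth number, half of all bases are witnesses, so the failure probability after t trials is 2^{−t}.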
22. (a) Since a^{n−1} ≡ 1 (mod n), we have a^{n−1} ≡ 1 (mod p). Moreover, since p | n,
we have gcd(a^{(n−1)/q} − 1, p) = 1, that is, a^{(n−1)/q} ≢ 1 (mod p). It follows that
ord_p(a) = ut for some t | v. But then, b ≡ a^t (mod p) has order u modulo p.
Since ord_p b divides φ(p) = p − 1, we have u | (p − 1), that is, p ≡ 1 (mod u).
(b) Suppose that n is composite. Take any prime divisor p of n with p ≤ √n.
By Part (a), p ≥ u + 1 > √n, a contradiction. Therefore, n must be prime.
(c) In order to convert the above observations to an efficient algorithm, we
need to clarify two issues.
(1) The integer n − 1 can be written as uv with u, v as above and with
u > √n. We can keep on making trial divisions of n − 1 by small primes
q1 = 2, q2 = 3, q3 = 5, . . . until n − 1 reduces to a value v ≤ √n. If
n − 1 is not expressible in the above form, we terminate the procedure,
and report failure after suitably many small primes are tried.
(2) We need an element a satisfying the two conditions a^{n−1} ≡ 1 (mod n)
and gcd(a^{(n−1)/q} − 1, n) = 1 for all primes q | u. If n is prime, any of the φ(n − 1)
primitive elements modulo n satisfies these conditions. Thus, a suitable
random base a is expected to be available within a few iterations.
27. The adaptation of the sieve of Eratosthenes is implemented by the following
function. Here, k is the number of small primes to be used in the sieve. The
sieve length is taken to be M = 2t. If no primes are found in the interval
[a, a + M − 1] for a randomly chosen a, the function returns −1.
randprime(t,k) = \
local(M,a,A,i,p,r); \
a = 0; \
while (a <= 2^(t-1), a = random(2^t)); \
M = 2 * t; A = vector(M); \
for (i=0, M-1, A[i+1] = 1); \
for (i=1, k, \
p = prime(i); r = a % p; \
if (r != 0, r = p - r); \
while (r < M, A[r+1] = 0; r = r + p ); \
); \
for (i=0, M-1, if (A[i+1] == 1, \
if(isprime(a+i), return(a+i)); \
)); \
return(-1);
32. We write n − l = 2^s t with t odd, where l = $\left(\frac{a^2-4b}{n}\right)$. The following function
computes (and returns) V_t and V_{t+1} modulo n using the doubling formulas.
The power b^t (mod n) is also computed (and returned).
VMod(m,n,a,b) = \
local(V,Vnext,B,i); \
V = Mod(2,n); Vnext = Mod(a,n); B = Mod(1,n); \
i = ceil(log(m)/log(2)) - 1; \
while (i >= 0, \
if (bittest(m,i) == 0, \
Vnext = V * Vnext - a * B; \
V = V^2 - 2 * B; \
B = B^2; \
, \
V = V * Vnext - a * B; \
Vnext = Vnext^2 - 2 * B * b; \
B = B^2 * b; \
); \
i--; \
); \
return([V,Vnext,B]);
Next, we compute U_t from V_t and V_{t+1}. If U_t ≡ 0 (mod n), then n is a strong
Lucas pseudoprime with parameters (a, b). If not, we check whether
V_{2^j t} ≡ 0 (mod n) for some j = 0, 1, 2, . . . , s − 1. If so, n is again a strong Lucas
pseudoprime. If all these checks fail, then n is not a strong Lucas pseudoprime.
strongLucaspsp(n,a,b) = \
local(D,l,U,s,t,V,Vnext,W,B); \
D = a^2 - 4*b; l = kronecker(D,n); \
if (l == 0, return(0)); \
t = n - l; s = 0; \
while (bittest(t,s) == 0, s++); \
t = t / (2^s); \
W = VMod(t,n,a,b); \
V = W[1]; Vnext = W[2]; B = W[3]; \
U = (2*Vnext - a*V)/D; \
if (U == Mod(0,n), return(1)); \
while (s>0, \
if (V == Mod(0,n), return(1)); \
Vnext = V * Vnext - a * B; \
V = V^2 - 2 * B; \
B = B^2; \
s--; \
); \
return(0);
Chapter 6 Integer Factorization
4. To avoid the sequences 1, 1, 1, . . . and −1, −1, −1, . . ., respectively.
7. (a) Let u = p1^{e1} p2^{e2} · · · pt^{et} for small primes p1, p2, . . . , pt (distinct from the large
prime v). The first stage computes b ≡ a^E (mod n), where E = p1^{f1} p2^{f2} · · · pt^{ft}
for some fi ≥ ei. Now, ord_p a | (p − 1), and ord_p b = ord_p a / gcd(ord_p a, E) ∈
{1, v}. Therefore, if b ≢ 1 (mod p), we have ord_p b = v.
Solutions to Selected Exercises 563

(b) We have b^k ≡ 1 (mod p) if and only if v | k. Therefore, we may assume
that b0, b1, b2, . . . constitute a random sequence in the group of order v
generated by b modulo p. By the birthday paradox, we expect bi ≡ bj (mod p) with
i ≠ j after Θ(√v) iterations.
(c) If bi ≡ bj (mod p), but bi ≢ bj (mod n/p), then gcd(bi − bj, n) is a
non-trivial factor of n.
(d) We store the sequence b0, b1, b2, . . . . Whenever a new bi is generated, we
compute gcd(bi − bj, n) for j = 0, 1, 2, . . . , i − 1. Since O(√B′) terms in the
sequence suffice to split n with high probability, the running time is essentially
that of O(√B′) exponentiations and O(B′) gcd calculations. The storage
requirement is that of O(√B′) elements of Zn.

The number of primes between B and B′ is about B′/ln B′ − B/ln B. If B′ ≫ B,
this is about B′/ln B′. The second stage described in Section 6.3.1 makes (at most)
these many exponentiations and these many gcd computations, and calls for
a storage of only a constant number of elements of Zn (in addition to the
absolutely constant list of primes between B and B′).
It follows that the second stage of Section 6.3.1 has running time comparable
with that of the variant discussed in this exercise (the new variant carries
out fewer exponentiations but more gcd calculations than the original variant).
The space requirements of the two variants are not directly comparable
(O(B′/ln B′) large primes against O(√B′) elements of Zn). If n is large, the
original variant is better in terms of storage.
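The two-stage variant of this exercise can be sketched in Python (the book's snippets use GP/PARI; this is an illustrative reimplementation, with the smoothness bound, iteration limit, seed, and the particular pseudorandom walk x ↦ x^e all being our own hypothetical choices):

```python
import math
import random

def p_minus_1_birthday(n, smooth_bound=20, iters=2000, seed=1):
    """Toy sketch of the birthday-paradox second stage.

    Stage 1: b = a^E (mod n), with E a product of small prime powers,
    so that ord_p(b) is 1 or the single large prime v dividing p - 1.
    Stage 2: walk through pseudorandom powers b_i of b and detect a
    collision modulo the unknown prime p via gcd computations.
    """
    rng = random.Random(seed)
    a = rng.randrange(2, n - 1)
    g = math.gcd(a, n)
    if g > 1:
        return g                              # lucky hit
    E = 1
    for q in range(2, smooth_bound + 1):
        if all(q % d for d in range(2, q)):   # q is prime
            qe = q
            while qe * q <= n:
                qe *= q
            E *= qe
    b = pow(a, E, n)
    seen = [b]                                # stored sequence b_0, b_1, ...
    x = b
    for _ in range(iters):
        x = pow(x, rng.randrange(2, n), n)    # next pseudorandom power of b
        for y in seen:                        # gcd(b_i - b_j, n) for j < i
            d = math.gcd(x - y, n)
            if 1 < d < n:
                return d
        seen.append(x)
    return None
```

For example, with n = 607 · 1019 (so 606 = 2 · 3 · 101 and 1018 = 2 · 509), a collision modulo 607 is expected after only Θ(√101) steps.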
8. (d) Let n = a² + b² = c² + d², that is, (a − c)(a + c) = (d − b)(d + b). Assume
that a > c, so that d > b. Let u = gcd(a − c, b − d) and v = gcd(a + c, b + d).
Write a − c = uα, d − b = uβ, a + c = vγ and b + d = vδ for positive integers
α, β, γ, δ. But then, (a − c)(a + c) = (d − b)(d + b) implies uαvγ = uβvδ, that
is, αγ = βδ. Since gcd(α, β) = gcd(γ, δ) = 1, this implies that α = δ and
β = γ, that is, we have

    a − c = uα,
    a + c = vβ,
    d − b = uβ,
    d + b = vα.

This gives a = (uα + vβ)/2 and b = (vα − uβ)/2, that is, n = a² + b² =
(u² + v²)(α² + β²)/4. If α = β, then a = d and b = c, a contradiction. So at
least one of α, β is ≥ 2, that is, α² + β² ≥ 5.
Since n is odd, one of a, b is odd and the other even. Likewise, one of c, d
is odd, and the other even. If a and c are of the same parity, then so also are
b and d, that is, u and v are both even, that is, (u² + v²)/4 is a non-trivial
factor of n. On the other hand, if a, c are of opposite parities, then so also
are b, d, and it follows that u, v, α, β are all odd. But then, (u² + v²)/2 is a
non-trivial factor of n.
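This recovery procedure is easy to exercise numerically. The following Python sketch (an illustration, not from the book; `factor_from_two_squares` is our own helper name) bakes in the parity case analysis above:

```python
from math import gcd

def factor_from_two_squares(n, a, b, c, d):
    """Given two essentially different representations
    n = a^2 + b^2 = c^2 + d^2 of an odd integer n, return a
    non-trivial factor of n (Exercise 8(d))."""
    assert n % 2 == 1 and a*a + b*b == n and c*c + d*d == n
    if a < c:                     # arrange a > c, hence d > b
        a, b, c, d = c, d, a, b
    u = gcd(a - c, d - b)
    v = gcd(a + c, d + b)
    s = u*u + v*v
    # u, v both even: (u^2 + v^2)/4 divides n;
    # u, v both odd:  (u^2 + v^2)/2 divides n.
    return s // 4 if u % 2 == 0 else s // 2
```

For example, 65 = 8² + 1² = 7² + 4² yields the factor 13.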
12. (a) For a randomly chosen x, the integer T(c) = (x + c)² rem n is of value
O(n), and so has a probability of L[−1/(2 × (1/2))] = L[−1] of being L[1/2]-smooth.

That is, L[1] values of c need to be tried in order to obtain a single relation.
Since we require about 2t (which is again L[1/2]) relations, the value of M
should be L[1] × L[1/2] = L[3/2].
(b) Proceed as in the QSM. Let x² = kn + J with J ∈ {0, 1, 2, . . . , n − 1}. We
have (x + c)² ≡ x² + 2xc + c² ≡ kn + J + 2xc + c² ≡ T(c) (mod n), where
T(c) = J + 2xc + c². Use an array A indexed by c in the range −M ≤ c ≤ M.
Initialize Ac = log |T(c)|. For each small prime q and small exponent h, solve
the congruence (x + c)² ≡ kn (mod q^h). For all values of c in the range
−M ≤ c ≤ M that satisfy the above congruence, subtract log q from Ac.
When all q and h values are considered, check which array locations Ac store
values ≈ 0. Perform trial divisions on the corresponding T(c) values.
(c) Follow the analysis of sieving in QSM. Initializing A takes L[3/2] time.
Solving all the congruences (x + c)² ≡ kn (mod q^h) takes L[1/2] time.
Subtraction of all log q values takes L[3/2] time. Trial division of L[1/2] smooth
values by L[1/2] primes takes L[1] time. Finally, the sparse system with L[1/2]
variables and L[1/2] equations can be solved in L[1] time.
18. (c) Let H = ⌈√n⌉ (original QSM) and H′ = ⌈√(2n)⌉ (modified QSM). Let
M be the optimal sieving limit when the original QSM runs alone. In the
case of the dual QSM, we run both the original QSM and the modified QSM
with a sieving limit of M/2. In the original QSM, 2M + 1 candidates (for
−M ≤ c ≤ M) are tried for smoothness. In the dual QSM, there are two sieves
each handling M + 1 candidates (for −M/2 ≤ c ≤ M/2). The total number
of candidates for the dual QSM is, therefore, 2M + 2. In the original QSM,
2M + 1 candidates are expected to supply the requisite number of relations. So
the dual QSM, too, is expected to supply nearly the same number of relations,
provided that the candidates are not much larger in the dual QSM than in
the original QSM. Indeed, we now show that the dual QSM actually reduces
the absolute values of the candidates by a factor larger than 1.
The values of |T(c)| for the original QSM are approximately proportional
to H, whereas those for the modified QSM are roughly proportional to H′ ≈
√2·H. In particular, the average value of |T(c)| for the first sieve is nearly
2H × (M/4) = MH/2 (the sieving interval is M/2 now), and the average value
of |T(c)| for the second sieve is about √2 × 2H × (M/4) ≈ √2·MH/2. The overall
average is, therefore, (1 + √2)MH/4. When the original QSM runs alone, this
average is MH. Consequently, the smoothness candidates in the dual QSM are
smaller than those for the original QSM by a factor of 4/(1 + √2) ≈ 1.657. As
a result, the dual QSM is expected to supply more relations than the original
QSM. Viewed from another angle, we can take slightly smaller values for M
and/or t in the dual QSM than necessary for the original QSM, that is, the
dual QSM is slightly more efficient than the original QSM.
The dual QSM does not consider the larger half of the T(c) values (corresponding
to M/2 < |c| ≤ M) for smoothness tests. It instead runs another
sieve. Although the smoothness candidates in the second sieve are about √2
times larger than the candidates in the original sieve, there is an overall
reduction in the absolute values of T(c) (averaged over the two sieves).

20. (a) Solve the following quadratic congruence for c0:

    c0² + 2c0·H + J ≡ 0 (mod q).

If we run the original sieve in QSM, such a q and a corresponding value c0
of c can be obtained from the sieving array itself (an entry holding a residual
value between log pt and 2 log pt).
(b) We have

    Tq(c) = T(c0 + cq)/q = T(c0)/q + 2c(H + c0) + c²q.

This is an integer since q | T(c0). Since Tq(c) is a quadratic polynomial in c,
sieving can be done as in the original QSM. Moreover, c and c0 are small, so
the maximum value of |Tq(c)| is ≈ 2MH, the same as in the original QSM.
(c) A large prime q available from the (original) QSM is useful only when
it appears in multiple relations. But q > pt is large, so the probability that
it appears as the unfactored part in two (or more) relations among 2M + 1
choices of c is small. The special-q variant allows us to systematically collect
relations in which q is involved. If Tq(c) is pt-smooth for some c, we have
T(c0 + cq) equal to q times this smooth value. We are interested in only a few
such relations, and so we can run the special-q sieve only for a small sieving
interval (of length L[1/2] only, instead of L[1] as in the original sieve).
25. (a) The point-multiplication algorithm is modified as follows. Two variables
S and T are used in order to store Ni P and (Ni + 1)P , respectively.

Initialize S = O and T = P .
For i = s − 1, s − 2, . . . , 1, 0, repeat: {
If (ni = 0) { /* Update (S, T ) to (2S, 2S + P ) = (2S, S + T ) */
Assign T = S + T and S = 2S.
} else { /* Update (S, T ) to (2S + P, 2S + 2P ) = (S + T, 2T ) */
Assign S = S + T and T = 2T .
}
}
Return S.
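The invariant (S, T) = (Ni·P, (Ni + 1)P) maintained by this loop can be illustrated in Python with the group operations abstracted away (an illustrative sketch; `add`, `dbl`, and `zero` are placeholders for the curve arithmetic, and the integer sanity check below stands in for an actual curve):

```python
def ladder_multiply(m, P, add, dbl, zero):
    # Maintain (S, T) = (N_i * P, (N_i + 1) * P), where N_i is the
    # integer formed by the bits of m scanned so far.
    S, T = zero, P
    for i in range(m.bit_length() - 1, -1, -1):
        if (m >> i) & 1 == 0:
            S, T = dbl(S), add(S, T)      # (2S, 2S + P)
        else:
            S, T = add(S, T), dbl(T)      # (2S + P, 2S + 2P)
    return S

# Sanity check in the additive group of integers, where mP is just m*P:
add, dbl = (lambda x, y: x + y), (lambda x: 2 * x)
# ladder_multiply(45, 7, add, dbl, 0) == 45 * 7
```

Note that both branches evaluate the old pair (S, T) before reassigning, exactly as in the pseudocode above.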

(b) Let P1 = (h1, k1), P2 = (h2, k2), P1 + P2 = (h3, k3), and P1 − P2 = (h4, k4).
First, consider the case P1 ≠ P2. The addition formula gives

    (h1 − h2)² h3 = (h1 + h2)(h1h2 + a) + 2b − 2k1k2,
    (h1 − h2)² h4 = (h1 + h2)(h1h2 + a) + 2b + 2k1k2.

Multiplying these two formulas and substituting k1² = h1³ + ah1 + b and k2² =
h2³ + ah2 + b, we get

    h3h4(h1 − h2)² = (h1h2 − a)² − 4b(h1 + h2).



This implies that given h1, h2, h4 alone, one can compute h3. If P1 = P2 (the
case of doubling), we have

    4h3(h1³ + ah1 + b) = (h1² − a)² − 8bh1.

Given h1 alone, we can compute h3.


(c) The point-multiplication algorithm of Part (a) is a special case of Part (b),
where P1 = Ni P and P2 = (Ni + 1)P . Here, P1 − P2 = −P has known X-
coordinate. Therefore, for computing mP , we have the choice of never com-
puting any Y -coordinate. This saves many arithmetic operations modulo n.
An iteration in the original point-multiplication algorithm computes only
a doubling if ni = 0, and a doubling and an addition if ni = 1. An iteration in
the algorithm of Part (a) makes one addition and one doubling irrespective of
the bit value ni . Despite that, ignoring all Y -coordinates is expected to bring
down the running time. Since the X-coordinate alone suffices to detect failure
in the computation of mP , the ECM continues to work.
28. (a) We assume that an integer of the form a + bm is divisible by the medium
prime q with probability 1/q. Therefore,

    N′/N ≈ Σ_{q∈M} 1/q = Σ_{p≤B} 1/p − Σ_{p≤B′} 1/p ≈ ln ln B − ln ln B′
         = ln ln B − ln ln(kB) = ln ln B − ln(ln k + ln B)
         = ln ln B − ln(ln B (1 + ln k/ln B)) = −ln(1 + ln k/ln B)
         ≈ −ln k/ln B = ln(1/k)/ln B.

For B = 10^6 and k = 0.25, this ratio is about 0.1.
(b) Here, we force a smooth value to be divisible by at least one medium
prime. Therefore, the candidates having all prime factors ≤ B′ (only the
small primes) are missed in the lattice sieve.
Treat a + bm ≈ bm as random integers with absolute values no more
than C = bmax·m. Let u = ln C/ln B and u′ = ln C/ln B′. Then, the fraction
of smooth integers that are missed by the lattice sieve is approximately
u′^(−u′)/u^(−u) = u^u/u′^(u′).
For the given numerical values, C = 10^36, u ≈ 6 and u′ ≈ 6.669. Therefore,
the fraction of smooth values missed by the lattice sieve is ≈ u^u/u′^(u′) ≈ 0.149.
The implication is that we now check the smoothness of about 10% of the
line-sieve candidates. Still, we obtain 85% of the smooth values.
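Both numerical estimates are quick to reproduce (an illustrative Python check of the arithmetic above; the book's own computations use GP/PARI):

```python
import math

# Part (a): fraction of a + bm divisible by a medium prime,
# estimated as ln(1/k)/ln B with B = 10^6 and k = 0.25.
B, k = 10**6, 0.25
ratio = math.log(1 / k) / math.log(B)

# Part (b): fraction of smooth values missed by the lattice sieve,
# u^u / u'^u' with C = 10^36 and B' = kB.
C = 10.0**36
u = math.log(C) / math.log(B)
u_dash = math.log(C) / math.log(k * B)
missed = u**u / u_dash**u_dash

# ratio is about 0.10 and missed is about 0.149, matching the text.
```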
(c) If (a, b) and (a′, b′) satisfy a + bm ≡ 0 (mod q) and a′ + b′m ≡ 0 (mod q),
and if r ∈ Z, then (a + a′) + (b + b′)m ≡ 0 (mod q) and (ra) + (rb)m ≡ 0 (mod q).
(d) We need to locate all (a, b) pairs such that a + bm is divisible by q and
(a + bm)/q is smooth over all primes < q. If a + bm ≡ 0 (mod q), we can
write (a, b) = cV1 + dV2 = c(a1 , b1 ) + d(a2 , b2 ) = (ca1 + da2 , cb1 + db2 ), that is,
a = ca1 + da2 and b = cb1 + db2 . Clearly, if a and b are coprime, then c and d
are coprime too. So we concentrate on (c, d) pairs with gcd(c, d) = 1.
We use a two-dimensional array A indexed by c, d in the ranges −C ≤
c ≤ C and 1 ≤ d ≤ D. We initialize A[c, d] to log |c(a1 + b1m) + d(a2 + b2m)|
if gcd(c, d) = 1, or to +∞ if gcd(c, d) > 1. Let p be a prime < q, and h a
small exponent. The condition p^h | (a + bm) is equivalent to the condition that
cu1 + du2 ≡ 0 (mod p^h), where u1 = a1 + b1m and u2 = a2 + b2m correspond
to the basis vectors V1, V2. Both u1 and u2 are divisible by q, but not by p. If
p^h | u1 (but p ∤ u2), then we subtract log p from all the entries in the d-th column
if and only if p^h | d. Likewise, if p^h | u2 (but p ∤ u1), then log p is subtracted from
the entire c-th row if and only if p^h | c. Finally, if gcd(u1, p) = gcd(u2, p) = 1,
then for each row c, a solution for d is obtained from cu1 + du2 ≡ 0 (mod p^h),
and every p^h-th location is sieved in the c-th row.
Pollard suggests another sieving idea. The (c, d) pairs satisfying cu1 + du2 ≡
0 (mod p^h) form a sublattice. Let W1 = (c1, d1) and W2 = (c2, d2) constitute
a reduced basis of this sublattice. The array locations (c, d) = e(c1, d1) +
f(c2, d2) = (ec1 + fc2, ed1 + fd2) are sieved for small coprime e and f.
35. Dixon’s method (both the relation-collection and the linear-algebra phases)
is implemented below. The decisions about t (the size of the factor base) and
s (the number of relations to be collected) are not made by this function.

Dixon(n,s,t) = \
local(i,j,x,y,X,A,C,D,k,v,beta,d,e); \
A = matrix(t,s); C = matrix(t,s); X = vector(s); i = 1; \
while(i <= s, \
X[i] = random(n); y = (X[i]^2) % n; \
for (j=1,t, \
A[j,i] = 0; \
while (y % prime(j) == 0, A[j,i]++; y = y / prime(j)); \
); \
if (y == 1, i++) \
); \
print("Relations collected"); \
for (i=1,t, for(j=1,s, C[i,j] = Mod(A[i,j],2))); \
D = lift(matker(C)); k = matsize(D)[2]; \
print("Nullspace computed: dimension = ", k); \
v = vector(k); beta = vectorv(s); d = 1; \
while ((d==1) || (d==n), \
for (j=1,k, v[j] = random(2)); \
for (i=1,s, \
for(j=1,k, \
beta[i] = beta[i] + v[j] * D[i,j] \
); \
beta[i] = beta[i] % 2; \
); \
e = mattranspose(A * beta); \
x = Mod(1,n); for (i=1,s, if (beta[i], x = x * Mod(X[i],n))); \
y = Mod(1,n); for (j=1,t, y = y * Mod(prime(j),n)^(e[j]/2)); \
d = gcd(lift(x)-lift(y),n); \
print("x = ", lift(x), ", y = ", lift(y), ", d = ", d); \
); \
return(d);

Dixon(64349,20,15)

37. Given a bound b, we first compute the factor base B consisting of −1 and
all primes ≤ b modulo which n is a quadratic residue. We also require the
sieving range [−M, M]. In what follows, we implement the incomplete sieving
strategy. The tolerance bound ξ should also be supplied.

getFactorBase(n,b) = \
local(P,B,p,t,i); \
P = vector(b); p = 2; t = 1; P[t] = -1; \
while (p <= b, \
if (kronecker(n,p) == 1, t++; P[t] = p); \
p = nextprime(p+1); \
); \
B = vector(t); for (i=1,t, B[i] = P[i]); \
return([t,B]);

QSMsieve(n,b,M,xi) = \
local(H,J,tB,t,B,logB,c,c1,c2,A,i,T,X,bnd); \
H = 1 + sqrtint(n); J = H^2 - n; \
tB = getFactorBase(n,b); t = tB[1]; B = tB[2]; tB = 0; \
logB = vector(t); for (i=2,t, logB[i] = floor(1000 * log(B[i]))); \
A = vector(2*M + 1); \
for (c=-M,M, \
T = J + 2*H*c + c^2; if (T < 0, T = -T); \
A[c+M+1] = floor(1000 * log(T)); \
); \
c = -M; if (J % 2 == 0, if (c % 2 == 1, c++), if (c % 2 == 0, c++)); \
while (c < M, A[c+M+1] -= logB[2]; c += 2); \
for (i=3,t, \
c1 = sqrt(Mod(n,B[i])); c2 = B[i] - c1; \
c1 = lift(c1 - Mod(H,B[i])); c2 = lift(c2 - Mod(H,B[i])); \
c = c1; while (c >= -M, A[c+M+1] -= logB[i]; c -= B[i]); \
c = c1 + B[i]; while (c <= M, A[c+M+1] -= logB[i]; c += B[i]); \
c = c2; while (c >= -M, A[c+M+1] -= logB[i]; c -= B[i]); \
c = c2 + B[i]; while (c <= M, A[c+M+1] -= logB[i]; c += B[i]); \
); \
bnd = floor(xi * 1000 * log(nextprime(B[t]))); \
for (c=-M, M, \
if (A[c+M+1] <= bnd, \
T = J + 2*H*c + c^2; \
X = vector(t); if (T < 0, T = -T; X[1] = 1, X[1] = 0); \
for (i=2,t, while(T % B[i] == 0, X[i]++; T = T / B[i])); \
if (T==1, \
print("Relation found for c = ", c, ", X = ", X), \
print("False alarm for c = ", c); \
); \
); \
)

QSMsieve(713057,50,50,1.5);

Chapter 7 Discrete Logarithms


5. Let h ∈ {1, 2, . . . , n} be the order of an element a in a finite group G of size
n, and m = ⌈√n⌉. Write h − 1 = jm + i for some i, j ∈ {0, 1, 2, . . . , m − 1}.
We have e = a^h = a(a^m)^j a^i = (a^m)^j a^(i+1) (where e is the identity in G), that
is, a^(−mj) = a^(i+1). For i = 1, 2, . . . , m, we compute and store the pairs (i, a^i) in
a table sorted with respect to the second component. Then, we precompute
a^(−m), and find the smallest j ∈ {0, 1, 2, . . . , m − 1} for which a^(−mj) = (a^(−m))^j is
the second element of some pair in the precomputed table. If multiple values
of i correspond to the same value as a^(−mj), we take the smallest such i. But
then, h = ord(a) = jm + i for these j and i.
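A direct Python transcription of this order-finding procedure for the group (Z/pZ)* (illustrative; the exercise is stated for an abstract group, and `element_order` is our own name):

```python
from math import isqrt

def element_order(a, p):
    """Order of a modulo a prime p: write h - 1 = j*m + i, so that
    a^{-mj} equals some stored table entry a^t; the matched table
    index t then gives h = j*m + t."""
    n = p - 1                         # group size
    m = isqrt(n) + 1                  # m >= ceil(sqrt(n))
    baby = {}
    x = 1
    for t in range(1, m + 1):         # baby steps: store (t, a^t)
        x = x * a % p
        baby.setdefault(x, t)         # smallest t per value
    step = pow(x, -1, p)              # x = a^m here, so step = a^{-m}
    y = 1                             # giant steps: y = a^{-m*j}
    for j in range(m):
        if y in baby:
            return j * m + baby[y]
        y = y * step % p
    return None                       # unreachable for valid input
```

For instance, 2 is a primitive root modulo 101, so `element_order(2, 101)` returns 100.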
11. Number the columns of the coefficient matrix C by 0, 1, 2, . . . , t + 2M + 1.
Column 0 corresponds to the prime −1, Columns 1 through t to the small
primes p1, p2, . . . , pt, and Columns t + 1 through t + (2M + 1) to the H + c
values for −M ≤ c ≤ M. Suppose also that the last row corresponds to the
free relation indg(pj) = 1 for some j. This row has only one non-zero entry.
We now count the number of non-zero entries in the first m − 1 rows.
(1) The expected number of non-zero entries in Column 0 is (m − 1)/2.
(2) For 1 ≤ j ≤ t, the expected number of non-zero entries in Column j is
(m − 1)/pj, since a randomly chosen integer is divisible by the prime pj with
probability 1/pj.
(3) In the submatrix consisting of the first m − 1 rows and the last 2M + 1
columns, each row has exactly two non-zero entries corresponding to the two
values c1, c2 for a smooth T(c1, c2). Of course, we allow the possibility c1 = c2
during sieving (in which case there is only one non-zero entry in a row), but
this situation occurs with a low probability, and we expect to get at most only
a small constant number of such rows. In view of this, we neglect the effects of
these rows in our count, and conclude that the expected number of non-zero
entries in each of the last 2M + 1 columns is 2(m − 1)/(2M + 1) ≈ m/M.
15. We first express some ag^α as a product of small (≤ L[1/2]) and medium-sized
(≤ L[2]) primes. In order to compute the index of a medium-sized prime q,
we fix some c1 and vary c2 in order to locate a value of c1u + c2v which is
q times an L[1/2]-smooth value. We first locate one particular value γ of c2
for which c1u + c2v is a multiple of q, and then sieve over all c2 = γ + c2′q
with |c2′| ≤ L[1/2]. Since the smoothness candidates are O(√p), this sieve is
expected to produce a few L[1/2]-smooth values of (c1u + c2v)/q. This sieve
runs in L[1/2] time, as required. Among these smooth values just obtained, we
locate one for which c2 − c1√−r ∈ Z[√−r] is smooth over the small complex
primes of norms ≤ L[1/2]. But then, we get a relation of the form

    indg q + indg((c1u + c2v)/q) ≡ indg v + indg(Φ(c2 − c1√−r)) (mod p − 1).

Since (c1u + c2v)/q is smooth over rational primes ≤ L[1/2], and c2 − c1√−r
is smooth over complex primes of norms ≤ L[1/2], we obtain indg q using the
precomputed database of indices of factor-base elements.
18. Take a polynomial f(x) of degree k with all irreducible factors of degrees ≤ m.
Let i ≥ 1 be the largest degree of an irreducible factor of f(x). Write

    f(x) = g(x) u1(x)^e1 u2(x)^e2 · · · us(x)^es,

where each uj(x) is an irreducible polynomial of degree i, and all irreducible
factors of g(x) are of degrees ≤ i − 1. The degree of g(x) is k − ri, where
r = e1 + e2 + · · · + es ∈ N. The number of possibilities for g is N(k − ri, i − 1).
For a given sum r, the total number of choices of s, e1, e2, . . . , es is C(r + I(i) − 1, r)
(since there are I(i) irreducible polynomials in F2[x] of degree i, and the
number of solutions in non-negative integers of x1 + x2 + · · · + x_I(i) = r is the
binomial coefficient C(r + I(i) − 1, r)). If we vary both i and r, we get

    N(k, m) = Σ_{i=1}^{m} Σ_{r≥1} N(k − ri, i − 1) · C(r + I(i) − 1, r).
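The recurrence is easy to put to work. Below is an illustrative Python sketch (the helper names are ours; I(i) is computed with the standard Möbius formula for counting irreducible binary polynomials):

```python
from functools import lru_cache
from math import comb

def moebius(d):
    # Moebius function by trial division
    result, p = 1, 2
    while p * p <= d:
        if d % p == 0:
            d //= p
            if d % p == 0:
                return 0          # square factor
            result = -result
        p += 1
    if d > 1:
        result = -result
    return result

def I(i):
    # number of irreducible binary polynomials of degree i
    return sum(moebius(d) * 2 ** (i // d)
               for d in range(1, i + 1) if i % d == 0) // i

@lru_cache(maxsize=None)
def N(k, m):
    # binary polynomials of degree exactly k whose irreducible factors
    # all have degrees <= m (k = 0 counts the constant polynomial 1)
    if k == 0:
        return 1
    if m == 0:
        return 0
    return sum(N(k - r * i, i - 1) * comb(r + I(i) - 1, r)
               for i in range(1, m + 1)
               for r in range(1, k // i + 1))
```

A sanity check: for m ≥ k the smoothness condition is vacuous, so N(k, k) must equal 2^k, the total number of degree-k binary polynomials.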

19. The second stage of the LSM for F_{2^n} has the following sub-stages.
• For random α ∈ {0, 1, 2, . . . , 2^n − 2}, we try to express ag^α as a product
of irreducible polynomials of degrees ≤ √(n ln n / ln 2). The polynomial ag^α
is of degree ≈ n, and is smooth with probability L[−√(ln 2)/2]. Therefore,
L[√(ln 2)/2] = L[0.416 . . .] random values of α are expected to produce one
smooth ag^α. The irreducible factors of this ag^α with degrees ≤ (1/2)√(n ln n / ln 2)
are in the factor base. So it suffices to compute the indices of all irreducible
factors of ag^α with degrees larger than (1/2)√(n ln n / ln 2) but not larger than
√(n ln n / ln 2). Let u(x) be such an irreducible polynomial. Since each ag^α can
be factored in polynomial time, this sub-stage runs in L[0.416 . . .] time.
• To compute the index of u(x), we find a polynomial v(x) = x^⌈n/2⌉ +
d(x) = u(x)w(x) with all irreducible factors of w(x) having degrees
≤ (1/2)√(n ln n / ln 2). If one multiple of u(x) of the form x^⌈n/2⌉ + d0(x) is located,
we search among the candidates x^⌈n/2⌉ + d0(x) + e(x)u(x) with deg e
as small as possible. We expect to get one such v(x) after L[√(ln 2)/2] =
L[0.416 . . .] trials. Indeed, deg e is expected to be as small as O(log n).
Using polynomial-time factoring algorithms, we finish this sub-stage in
L[0.416 . . .] time. The index of w(x) is computed from the database of
indices available from the first stage. If x^⌈n/2⌉ + d(x) is in the factor base,
we get indg u. This is usually not the case, so we go to the next sub-stage.
• We look at the following polynomial y(x) with deg c(x) ≤ (1/2)√(n ln n / ln 2):

    y(x) ≡ v(x)(x^⌈n/2⌉ + c(x)) ≡ (x^⌈n/2⌉ + d(x))(x^⌈n/2⌉ + c(x))
         ≡ x^ε f1(x) + (c(x) + d(x)) x^⌈n/2⌉ + c(x)d(x) (mod f(x)).

Since both deg c and deg d are much smaller compared to ⌈n/2⌉, we
expect to find one y(x) with all irreducible factors of degrees ≤ (1/2)√(n ln n / ln 2),
after trying all of the L[√(ln 2)/2] polynomials x^⌈n/2⌉ + c(x) in the factor
base. The index of this x^⌈n/2⌉ + c(x) is available from the first stage, so
we obtain indg u. This sub-stage too runs in L[√(ln 2)/2] = L[0.416 . . .] time.

21. To avoid duplicate counting of triples, force the condition c1(2) ≥ c2(2) ≥
c3(2). We have c3(2) = c1(2) XOR c2(2). Let i = deg c1 and j = deg c2. If
i < j, then c1(2) < c2(2), whereas if i > j, then c2(2) < c3(2). Therefore, we
must have i = j. If i = j = −∞, then c1 = c2 = c3 = 0. If i ≥ 0, there are
1 + 2 + 3 + · · · + 2^i = 2^i(2^i + 1)/2 possibilities for choosing c1 and c2 satisfying
c1(2) ≥ c2(2). Each such choice gives c3 satisfying c2(2) ≥ c3(2). Therefore,
the total number of tuples (c1, c2, c3) in the CSM for binary fields is

    1 + Σ_{i=0}^{m−1} 2^i(2^i + 1)/2 = 1 + (1/2) Σ_{i=0}^{m−1} 4^i + (1/2) Σ_{i=0}^{m−1} 2^i
                                     = 1 + (4^m − 1)/6 + (2^m − 1)/2
                                     = (2/3)·4^(m−1) + 2^(m−1) + 1/3.

24. If c1 = 0, then gcd(c1, c2) = c2. This gcd is 1 if and only if c2 = 1. Since
the pair (0, 1) is not to be counted, we may assume that c1 is non-zero. We
proceed by induction on b. If b = 1, the only possibilities for (c1, c2) are (1, 0)
and (1, 1). On the other hand, 2^(2×1−1) = 2, that is, the induction basis holds.
Now, let b ≥ 2. The number of non-zero values of c1 is 2^b − 1, and the
number of values of c2 is 2^b. Among these (2^b − 1)2^b = 2^(2b) − 2^b pairs (c1, c2),
those with non-constant gcds should be discarded. Let d(x) = gcd(c1, c2) have
degree δ ≥ 1. Write c1(x) = d(x)u1(x) and c2(x) = d(x)u2(x) with degrees
of u1 and u2 less than b − δ and with gcd(u1, u2) = 1. But u1 is non-zero, so
by the induction hypothesis, the number of choices for (u1, u2) is 2^(2(b−δ)−1).
The gcd d(x) can be chosen as any one of the 2^δ polynomials of degree δ.
Therefore, the desired number of pairs (c1, c2) is

    2^(2b) − 2^b − Σ_{δ=1}^{b−1} 2^δ × 2^(2(b−δ)−1) = 2^(2b) − 2^b − 2^(2b−1) Σ_{δ=1}^{b−1} 1/2^δ
                                                    = 2^(2b) − 2^b − 2^(2b−1) (1 − 1/2^(b−1)) = 2^(2b−1).
26. Fix c1, and sieve over different values of c2. We use an array A indexed by c2.
A reasonable strategy is to let c2(x) stand for the array index c2(2). To avoid
duplicate generation of relations, we would not use the entire array (of size
2^m). For simplicity, however, we do not impose restrictions on c2 here.
We initialize the sieving array A by setting A_{c2(2)} = deg T(c1, c2). Then, we
take (w(x), h) pairs. The condition w(x)^h | T(c1, c2) gives a linear congruence
in c2 (for the fixed c1). If c̄2(x) is a solution for c2 of this congruence, all
solutions for c2 are c2(x) = c̄2(x) + u(x)w(x)^h for all polynomials u(x) of
degrees < δ = m − h·deg w. For each such solution of c2(x), we subtract deg w
from A_{c2(2)}. At the end of the sieving, we identify all A_{c2} locations storing
zero. These correspond to all the smooth values of T(c1, c2) for the fixed c1.
Computing c2(x) = c̄2(x) + u(x)w(x)^h for all u(x) involves many
multiplications of the form u(x)w(x)^h. Using the δ-dimensional Gray code, we can
avoid these multiplications. We step through the possibilities of u(x) using the
sequence G0, G1, G2, . . . , G_{2^δ−1}. The i-th bit-string Gi = a_{δ−1}a_{δ−2} . . . a1a0 is
identified with the polynomial ui(x) = a_{δ−1}x^(δ−1) + a_{δ−2}x^(δ−2) + · · · + a1x + a0.
We let u(x) vary in the sequence u0, u1, . . . , u_{2^δ−1}. Initially, u0(x) = 0, and
c2(x) = c̄2(x). Subsequently, we have ui(x) = u_{i−1}(x) + x^{v2(i)}. Therefore,
c̄2(x) + ui(x)w(x)^h = (c̄2(x) + u_{i−1}(x)w(x)^h) + x^{v2(i)} w(x)^h. Instead of
multiplying ui(x) with w(x)^h, we can shift w(x)^h (this was precomputed) by v2(i),
and add that shifted value to the previous value of c2(x). Since one
multiplication is usually somewhat slower than the combination of one shift and one
addition, we gain some speedup in the sieving process.
Theoretically, this polynomial sieve runs in the same time as the variant
with trial division. However, since sieving replaces trial divisions by simpler
operations, we expect to get some practical speedup with sieving.
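The Gray-code stepping can be sketched in Python, again encoding binary polynomials as integers (an illustration with our own helper names; `clmul` is the plain carry-less product that the Gray-code walk avoids recomputing):

```python
def clmul(u, w):
    # carry-less (GF(2)[x]) product of two binary polynomials
    r = 0
    while u:
        if u & 1:
            r ^= w
        u >>= 1
        w <<= 1
    return r

def gray_walk(cbar, wh, delta):
    """Enumerate cbar + u(x)*w(x)^h over all 2^delta choices of u(x):
    step i XORs in a single copy of w^h shifted by v2(i), instead of
    computing a full product for each u(x)."""
    c = cbar
    yield c
    for i in range(1, 2 ** delta):
        v2 = (i & -i).bit_length() - 1    # v2(i) = 2-adic valuation of i
        c ^= wh << v2                     # add x^{v2(i)} * w(x)^h
        yield c
```

Stepping through all 2^δ values of u(x) this way visits exactly the set {c̄2 + u·w^h : deg u < δ}, one shift-and-XOR per step.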
32. In the following code, the σi values are stored in the array S, the τi values in T,
and the multipliers in M. The first two of these arrays are randomly generated.
A (pseudorandom) sequence of triples (wi, si, ti) is generated and stored in the
array W. We have wi = g^si a^ti for all i. The array W stores Θ(√(2^n − 1)) triples.
It is with high likelihood that periodicity is detected within these many
iterations. After the generation of the sequence, the array W is sorted with respect to
the first components of the triples. Finally, a pass is made through the sorted
W in order to detect repetitions. If we use an O(m log m)-time algorithm to
sort an array of size m, this implementation runs in Õ(√(2^n − 1)) time.

update(w,r,S,T,M,f) = \
local(n,k,s,t,v); \
n = poldegree(f); \
k = 1 + substpol(lift(w[1]),x,2) % r; \
s = (w[2] + S[k]) % (2^n-1); \
t = (w[3] + T[k]) % (2^n-1); \
v = (w[1] * M[k]) % f; \
return([v,s,t]);

genseq(f,g,a,r) = \
local(n,i,W,s,t,v,asz); \
n = poldegree(f); \
S = vector(r,x,random(2^n-1)); \
T = vector(r,x,random(2^n-1)); \
M = vector(r); \
for(i=1,r, M[i] = lift((g^S[i])*(a^T[i]))); \
asz = ceil(3*sqrt(2^n-1)); W = vector(asz); \
s = random(2^n-1); t = random(2^n-1); v = lift((g^s)*(a^t)); \
W[1] = [v,s,t]; \
for (i=2, asz, W[i] = update(W[i-1],r,S,T,M,f)); \
return(lift(W));

DLPrho(f,g,a,r) = \
local(n,W,asz,i,s,t); \
n = poldegree(f); \
asz = ceil(3*sqrt(2^n-1)); \
W = genseq(f,g,a,r); \
W = sort(W,asz); \
for (i=1,asz-1, \

if (W[i][1] == W[i+1][1], \
s = W[i+1][2] - W[i][2]; \
t = W[i][3] - W[i+1][3]; \
if (gcd(t,2^n-1)==1, return(lift(Mod(s,2^n-1)/Mod(t,2^n-1)))); \
); \
)

f = Mod(1, 2)*x^7 + Mod(1, 2)*x + Mod(1, 2);


g = Mod(Mod(1, 2)*x^3 + Mod(1,2)*x^2, f);
a = Mod(Mod(1, 2)*x^5 + Mod(1, 2)*x^2 + Mod(1, 2), f);
DLPrho(f,g,a,10)

35. A GP/PARI implementation of the LSM for F_{2^n} is somewhat involved. We first
implement a function getFB1(m) to return the list of all (non-constant) irreducible
polynomials of F2[x] with degrees ≤ m.
In the sieving stage, we index array elements by a polynomial c2(x). We
associate the array location c2(2) with c2(x). We need two functions int2poly
and poly2int in order to convert c2(x) to c2(2), and conversely.
The function DLPLSM2n implements incomplete sieving by the irreducible
polynomials in B1 (degrees ≤ m). We restrict the degrees of c(x) in x^ν + c(x) to
less than m. In order to avoid generation of duplicate relations, we let (c1, c2)
pairs satisfy the constraints 0 ≤ c1(2) ≤ c2(2) ≤ 2^m − 1. For each fixed c1(x),
we sieve an array indexed by c2(x) in the range c1(2) ≤ c2(2) ≤ 2^m − 1. The
array is initialized by the degrees of T(c1, c2). Subsequently, for each irreducible
polynomial u(x) ∈ B1, we solve u | T(c1, c2). This is equivalent to solving the
linear congruence (x^ν + c1(x))c2(x) ≡ x^ε f1(x) + x^ν c1(x) (mod u(x)). For a
solution c(x) of c2(x), all solutions are c2(x) = c(x) + u(x)v(x), where v(x)
is any polynomial of degree ≤ m − deg(u) − 1. After all u ∈ B1 are
considered, we look at the array indices where the leftover degrees are ≤ ξm. These
are shortlisted candidates in the LSM smoothness test (for the fixed c1). The
actual smooth values are identified by polynomial factorizations. Gordon and
McCurley’s trick (use of Gray codes) is not implemented.

getFB1(m) = \
local(t,B,B1,F,i,j); \
B = vector(2^(m+1)); t = 0; \
for (i=1,m, \
F = factor(Mod(1,2)*x^(2^i)+x); \
for (j=1,matsize(F)[1], \
if (poldegree(F[j,1]) == i, t++; B[t] = F[j,1]); \
); \
); \
B1 = vector(t); for (i=1,t, B1[i] = B[i]); return(B1);

int2poly(d) = \
local(c,i); \
c = Mod(0,2); i = 0; \
while(d>0, c += Mod(d,2)*x^i; d = floor(d/2); i++); \
return(c);

poly2int(c) = \
local(d,i); \
d = 0; \
for (i=0,poldegree(c), \
if (polcoeff(c,i)==Mod(1,2), d += 2^i); \
); \
return(d);

DLPLSM2n(f,m,xi) = \
local(n,f1,nu,epsilon,B1,t,A,c,c1,c2,d1,d2,T,i,j,u,v,w); \
n = poldegree(f); f1 = f + Mod(1,2) * x^n; \
nu = ceil(n/2); epsilon = 2*nu - n; \
B1 = getFB1(m); t = matsize(B1)[2]; \
A = vector(2^m); \
for (d1=0,2^m-1, \
c1 = int2poly(d1); \
for(d2=0,d1-1, A[d2+1]=-1); \
for(d2=d1,2^m-1, \
c2 = int2poly(d2); \
T = Mod(1,2)*x^epsilon*f1 + (c1+c2)*x^nu + c1*c2; \
A[d2+1] = poldegree(T); \
); \
for (i=1,t, \
u = B1[i]; \
v = Mod(1,2)*x^nu+c1; w = Mod(1,2)*x^epsilon*f1+x^nu*c1; \
if (poldegree(gcd(v,u)) > 0, \
if (poldegree(gcd(w,u)) > 0, \
for(j=d1,2^m-1, A[j+1] -= poldegree(u)); \
); \
, \
c = lift(Mod(w,u)/Mod(v,u)); \
for (i=0, 2^(m-poldegree(u))-1, \
j = poly2int(int2poly(i)*u+c); \
if (j >= d1, A[j+1] -= poldegree(u)); \
); \
); \
); \
for (d2=d1,2^m-1, \
if(A[d2+1] <= xi*m, \
c2 = int2poly(d2); \
print1("c1 = ", lift(c1), ", c2 = ", lift(c2)); \
T = Mod(1,2)*x^epsilon*f1 + (c1+c2)*x^nu + c1*c2; \
if (poldegree(T) > 0, \
T = factor(T); \
if (poldegree(T[matsize(T)[1],1])>m, print1(" false alarm")); \
); \
print(""); \
); \
); \
);

f = Mod(1,2)*x^17 + Mod(1,2)*x^3 + Mod(1,2);


g = Mod(1,2)*x^7 + Mod(1,2)*x^5 + Mod(1,2)*x^3 + Mod(1,2);
DLPLSM2n(f,4,1.5);

Chapter 8 Large Sparse Linear Systems


3. Since we now handle blocks of size ν (like 32 or 64), we can find multiple null-
space vectors for A using a single execution of the Lanczos method. Assume
that A is a symmetric n × n matrix over F2 . We find a random n × ν matrix
Y over F2 , compute B = AY, and solve for an n × ν matrix X satisfying

AX = B.

This essentially means that we are solving ν systems Ax = b simultaneously.


All the formulas for the block Lanczos method can be rephrased in terms of
this block solution. In particular, the final solution is given by
    X = Σ_{j=0}^{s−1} Wj (Wj^t A Wj)^(−1) Wj^t B,

where Wj are determined exactly as in the block Lanczos method presented


in Section 8.4.1. The ν columns of X − Y are random vectors in the null-space
of A. Any F2 -linear combination of these column vectors gives us a solution
of the homogeneous system Ax = 0.
5. For simplicity, let us remove the subscript i. Given an n × ν matrix V, we plan
to compute a selection matrix S for which W = V S consists of a maximal set
of A-invertible columns of V. The matrix A is assumed to be symmetric.
First, notice that W^t A W = S^t Q S, where Q = V^t A V is a symmetric ν × ν
matrix. Let r = Rank(Q). By standard linear-algebra techniques (like conversion
to row echelon form), we determine a set of r linearly independent rows of
Q. The selection matrix S is so constructed as to take the r columns of Q with
the same indices as the rows just chosen. Let Q11 denote the r × r symmetric
submatrix of Q obtained by throwing away all the rows and columns other
than those chosen above.
For proving that Rank(Q) = Rank(Q11), we rearrange Q, if necessary, to
assume that the first r rows and the first r columns of Q are chosen as Q11. We
write

    Q = [ Q11  Q12 ]
        [ Q21  Q22 ],

where Q22 is a symmetric (ν − r) × (ν − r) matrix, and Q12 = Q21^t. Since the
last ν − r columns of Q are linearly dependent upon the first r columns, we have

    [ Q12 ]   [ Q11 ]
    [ Q22 ] = [ Q21 ] T

for an r × (ν − r) matrix T.
Therefore, Q12 = Q11 T and Q22 = Q21 T = Q12^t T = (Q11 T)^t T = T^t Q11 T, so

    Q = [ Ir   0      ] [ Q11  0 ] [ Ir  T      ]
        [ T^t  I_{ν−r} ] [ 0    0 ] [ 0   I_{ν−r} ],

that is, Rank(Q) = Rank(Q11). Finally, note that W^t A W = S^t Q S is the
same as Q11 chosen above.
576 Computational Number Theory

8. Fix any i ∈ {1, 2, 3, . . . , n}. Consider the following i × i matrix formed by the
vectors z^(j) for j = 1, 2, . . . , i:

              [ 1  z1^(2)  z1^(3)  · · ·  z1^(i) ]
              [ 0  1       z2^(3)  · · ·  z2^(i) ]
    Z^(i)  =  [ 0  0       1       · · ·  z3^(i) ].
              [ .  .       .       · · ·  .      ]
              [ 0  0       0       · · ·  1      ]

By Eqn (8.54), we have

                  [ ǫ1  0   0   · · ·  0  ]
                  [ ∗   ǫ2  0   · · ·  0  ]
    T^(i) Z^(i) = [ ∗   ∗   ǫ3  · · ·  0  ],
                  [ .   .   .   · · ·  .  ]
                  [ ∗   ∗   ∗   · · ·  ǫi ]

and, therefore, det T^(i) det Z^(i) = ∏_{j=1}^{i} ǫj. But det Z^(i) = 1, so

    det T^(i) = ∏_{j=1}^{i} ǫj.

Levinson’s algorithm succeeds if and only if all the ǫ_i are non-zero. This is equiv-
alent to the condition that all the matrices T^(i) are invertible.
9. First, consider the homogeneous system T x = 0. Take a random n-dimensional
vector y, and compute u = T y. Denote the columns of T as c_i for i =
1, 2, . . . , n. Also, let y = ( y1 y2 · · · yn )^t. We can write u = y1 c1 + y2 c2 +
· · · + yn cn . By hypothesis, the columns cr+1 , cr+2 , . . . , cn can be written as
linear combinations of the columns c1 , c2 , . . . , cr . Therefore, we can rewrite

u = T y = z1 c1 + z2 c2 + · · · + zr cr = T z,
where z = ( z1 z2 · · · zr 0 0 · · · 0 )^t for some z1 , z2 , . . . , zr . But
then, T (z − y) = 0, that is, z − y is a solution of T x = 0.
Now, consider a non-homogeneous system T x = b. For a randomly chosen
y, set u = T y + b. If T x = b is solvable, u is T times a vector. As in the last
paragraph, we can then find z = ( z1 z2 · · · zr 0 0 · · · 0 )^t with the
bottom n − r entries zero, such that u = T z. But then, T (z − y) = b, that is,
z − y is a solution of T x = b.
To determine z1 , z2 , . . . , zr , note that T (r) z′ = u′ , where u′ and z′ are the
r-dimensional vectors consisting of the top r entries of u and z, respectively.
But T (r) is a Toeplitz matrix. Moreover, each T (i) is given to be invertible for
i = 1, 2, . . . , r. Therefore, Levinson’s algorithm can be used to compute z′ .
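This randomization is easy to illustrate numerically. The Python sketch below (not from the text) builds a rank-2 Toeplitz matrix with entries t_k = 2^k + 3^k, so that every column lies in the span of the first two and the leading minors T^(1), T^(2) are invertible, and then recovers a null-space vector z − y from a random y by solving only the leading 2 × 2 Toeplitz system.

```python
import random

random.seed(7)
n, r = 4, 2
t = {k: 2.0**k + 3.0**k for k in range(-(n - 1), n)}   # t_k = 2^k + 3^k
T = [[t[i - j] for j in range(n)] for i in range(n)]   # rank-2 Toeplitz matrix

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def solve2(M, rhs):
    """Solve a 2x2 linear system by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(rhs[0] * M[1][1] - rhs[1] * M[0][1]) / det,
            (M[0][0] * rhs[1] - M[1][0] * rhs[0]) / det]

# Random y, u = T y; solve T^(r) z' = u' (top r entries), pad z with zeros.
y = [random.uniform(-1, 1) for _ in range(n)]
u = matvec(T, y)
Tr = [row[:r] for row in T[:r]]
z = solve2(Tr, u[:r]) + [0.0] * (n - r)

# z - y is a (numerical) solution of T x = 0.
res = matvec(T, [zi - yi for zi, yi in zip(z, y)])
assert max(abs(v) for v in res) < 1e-8
print("max residual of T(z - y):", max(abs(v) for v in res))
```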
10. Let N = n/ν, that is, we solve an N × N system of blocks. Denote the (i, j)-th
block by T_{i−j}. Let T^(i) denote the i × i block matrix (i is the number of blocks)
at the top left corner of the block Toeplitz matrix T. Also, let B^(i) denote the
subvector of b consisting of the top νi entries. We iteratively compute X^(i),
Y^(i) and Z^(i) satisfying

    T^(i) X^(i) = B^(i),   T^(i) Y^(i) = [ ǫ^(i)         ],   T^(i) Z^(i) = [ 0_{(i−1)ν×ν} ].
                                         [ 0_{(i−1)ν×ν} ]                   [ ǫ′^(i)       ]

Here, X^(i) is a vector with νi entries, whereas Y^(i) and Z^(i) are νi × ν matrices,
and ǫ^(i) and ǫ′^(i) are ν × ν matrices to be chosen later. Initialization goes as:
X^(1) = T_0^{−1} B^(1), Y^(1) = Z^(1) = I_ν, ǫ^(1) = ǫ′^(1) = T_0.

For i = 1, 2, . . . , N − 1, we iteratively proceed as follows. We have

    T^(i+1) [ Y^(i)   ]   [ ǫ^(i)            ]               [ 0_{ν×ν} ]   [ −ǫ^(i) ζ^(i+1) ]
            [ 0_{ν×ν} ] = [ 0_{(i−1)ν×ν}    ]  and  T^(i+1) [ Z^(i)   ] = [ 0_{(i−1)ν×ν}   ],
                          [ −ǫ′^(i) ξ^(i+1) ]                             [ ǫ′^(i)         ]

where

    ξ^(i+1) = −(ǫ′^(i))^{−1} ( T_i  T_{i−1}  · · ·  T_1 ) Y^(i)   and
    ζ^(i+1) = −(ǫ^(i))^{−1} ( T_{−1}  T_{−2}  · · ·  T_{−i} ) Z^(i).

In order to update Y and Z as

    Y^(i+1) = [ Y^(i)   ] + [ 0_{ν×ν} ] ξ^(i+1)   and   Z^(i+1) = [ 0_{ν×ν} ] + [ Y^(i)   ] ζ^(i+1),
              [ 0_{ν×ν} ]   [ Z^(i)   ]                           [ Z^(i)   ]   [ 0_{ν×ν} ]

we need to take

    ǫ^(i+1) = ǫ^(i) (I_ν − ζ^(i+1) ξ^(i+1))   and   ǫ′^(i+1) = ǫ′^(i) (I_ν − ξ^(i+1) ζ^(i+1)).

Finally, we update the solution as

    X^(i+1) = [ X^(i) ] + Z^(i+1) (ǫ′^(i+1))^{−1} (B_{i+1} − η^(i+1)),
              [ 0_ν   ]

where B_{i+1} is the (i + 1)-th ν-block of b, and

    η^(i+1) = ( T_i  T_{i−1}  · · ·  T_1 ) X^(i).

12. (a) Without loss of generality, we may assume that A1 , A2 , . . . , Ar are linearly
independent, and the remaining columns belong to the subspace V generated by
the first r columns. Let (c1 , c2 , . . . , cn ) be a linear dependency of the columns.
We have Σ_{i=r+1}^{n} c_i A_i ∈ V , that is, there is a unique way of expressing
this vector as a linear combination of A1 , A2 , . . . , Ar (these vectors form
a basis of V ). This means that given any cr+1 , cr+2 , . . . , cn in a linear depen-
dency, the coefficients c1 , c2 , . . . , cr are fixed. Moreover, cr+1 , cr+2 , . . . , cn
cannot all be zero in a non-trivial linear dependency, since that would imply that
A1 , A2 , . . . , Ar are linearly dependent.

(b) We have E(l) + 1 = E(q^d) ≥ q^{E(d)} (since the arithmetic mean of a non-
negative-valued random variable is no less than its geometric mean). Therefore,
E(d) ≤ log_q (E(l) + 1). Finally, note that d + r = n (a constant), so
E(r) ≥ n − log_q (E(l) + 1).
(c) Fix a non-zero tuple (c1 , c2 , . . . , cn ) ∈ F_q^n with exactly k non-zero entries.
We first determine the probability that this is a linear dependency of the
columns of A. Let c_{i1} , c_{i2} , . . . , c_{ik} be the non-zero entries in the tuple. Now,
c_{i1} A_{i1} + c_{i2} A_{i2} + · · · + c_{ik} A_{ik} = 0 gives n equations from the n rows, each of
the form c_{i1} a_1 + c_{i2} a_2 + · · · + c_{ik} a_k = 0, where the a_j are matrix entries following
the given probability distribution. If P_k denotes the probability that such a
linear combination of the scalars is zero, then the above linear combination
of the columns is zero with probability P_k^n. Now, a tuple (c1 , c2 , . . . , cn ) ∈ F_q^n
with exactly k non-zero entries can be chosen in (n choose k)(q − 1)^k ways. Finally, we
also vary k to obtain

    E(l) = Σ_{k=1}^{n} (n choose k) (q − 1)^k P_k^n.

Determination of P_k depends upon the probability distribution of the matrix
entries. For a case study, look at: Johannes Blömer, Richard Karp and
Emo Welzl, The rank of sparse random matrices over finite fields, Random
Structures and Algorithms, Vol. 10, 407–419, 1997.
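As a sanity check of the formula (a Python sketch, not from the text): for uniform entries over F2 we have P_k = 1/2 for every k, so E(l) = Σ_{k=1}^{n} (n choose k)(1/2)^n = (2^n − 1)/2^n. A small Monte Carlo experiment over random 4 × 4 matrices reproduces this value (15/16 ≈ 0.94).

```python
import itertools, random

random.seed(42)
n, trials = 4, 20000
nonzero = [c for c in itertools.product((0, 1), repeat=n) if any(c)]

total = 0
for _ in range(trials):
    A = [[random.randrange(2) for _ in range(n)] for _ in range(n)]
    # l = number of non-zero tuples c with c1*A1 + ... + cn*An = 0
    total += sum(1 for c in nonzero
                 if all(sum(A[i][j] * c[j] for j in range(n)) % 2 == 0
                        for i in range(n)))

expected = (2**n - 1) / 2**n          # = sum_k C(n,k) * (1/2)^n for q = 2
estimate = total / trials
print("estimated E(l):", estimate, "predicted:", expected)
assert abs(estimate - expected) < 0.05
```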

Chapter 9 Public-Key Cryptography


1. (d) Let H′ be an n-bit cryptographic hash function. Define an (n + 1)-bit hash
function as

    H(x) = 0x         if x is of bit length n,
           1H′(x)     otherwise.

For half of the (n + 1)-bit strings (those starting with 0), it is easy to find a
preimage. However, it is difficult to locate a second preimage of a hash value h.
This is because if h starts with 0, it does not have a second preimage at all. On
the other hand, if h starts with 1, it is difficult to find a second preimage under H′.
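This construction can be played out concretely. In the Python sketch below (illustrative only: the bit length N = 128 and the use of truncated SHA-256 as H′ are our own choices, not the text's), a preimage of any hash value starting with 0 is simply read off.

```python
import hashlib

N = 128  # bit length of the underlying hash H' (an arbitrary choice here)

def Hprime(x: str) -> str:
    """An N-bit 'cryptographic' hash: SHA-256 truncated to N bits."""
    digest = int.from_bytes(hashlib.sha256(x.encode()).digest(), "big")
    return f"{digest >> (256 - N):0{N}b}"

def H(x: str) -> str:
    """The (N+1)-bit hash built from H'."""
    if len(x) == N and set(x) <= {"0", "1"}:
        return "0" + x                 # identity on N-bit strings
    return "1" + Hprime(x)

h = H("0" * N)                         # a hash value starting with '0'
assert h[0] == "0"
preimage = h[1:]                       # preimages of '0'-values are read off
assert H(preimage) == h
print("trivial preimage found for hash", h[:16] + "...")
```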
5. We have c1 ≡ m1^e (mod n) and c2 ≡ m2^e (mod n), so c1 c2 ≡ (m1 m2)^e (mod n).
Suppose that an adversary Eve wants the plaintext m1 corresponding to
the ciphertext c1. Bob is willing to decrypt ciphertext messages for Eve, pro-
vided that the ciphertext message is not c1 (or a simple function of it, like
±c1^{±1} (mod n)). Eve generates a random m2, and encrypts it with Bob’s pub-
lic key to get c2. Eve asks Bob to decrypt c1 c2 (mod n), which conceals c1 behind
the random mask c2. When Bob presents the corresponding plaintext message
m1 m2, Eve retrieves m1 by multiplying this with m2^{−1} (mod n).
One way to prevent such an attack is to adopt a convention on the structure
of plaintext messages. Since Eve has no control over m1 m2, it is with
high probability that m1 m2 would not satisfy the structural constraints, and
the decryption request is, therefore, rejected by Bob. A possible structural
constraint is to use some parity-check bits in a message. If the number of
such bits is ≥ 100, then a random m1 m2 passes all these parity checks with a
probability of only 1/2^100.
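The attack is easy to demonstrate with toy numbers. The following Python sketch (illustrative parameters only; real moduli are far larger) plays both roles, Eve and the decrypting Bob:

```python
# Toy RSA parameters.
p, q = 61, 53
n = p * q                      # 3233
e = 17
d = pow(e, -1, (p - 1) * (q - 1))

m1 = 65                        # the plaintext Eve is after
c1 = pow(m1, e, n)

# Eve blinds c1 with a random mask m2 (coprime to n).
m2 = 123
c2 = pow(m2, e, n)
blinded = c1 * c2 % n          # looks unrelated to c1, so Bob decrypts it

# Bob decrypts the blinded ciphertext...
m1m2 = pow(blinded, d, n)

# ...and Eve unblinds, recovering m1.
recovered = m1m2 * pow(m2, -1, n) % n
assert recovered == m1
print("Eve recovered m1 =", recovered)
```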
6. We have c_i ≡ m^e (mod n_i) for i = 1, 2, . . . , e. By the CRT, one computes
c (mod n1 n2 · · · ne) such that c ≡ c_i (mod n_i) for all i = 1, 2, . . . , e. Since
m^e < n1 n2 · · · ne, we have c = m^e (as integers), that is, m can be obtained by
taking the integer e-th root of c.
In order to avoid this (and many other) attacks, it is often advisable to
append some random bits to every plaintext being encrypted. This process is
called salting. Only a few random bits dramatically reduce the practicality of
mounting the above attack, even when e is as small as three.
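A toy run of this broadcast attack for e = 3 in Python (the moduli and message are our own illustrative choices):

```python
from math import gcd

def crt(residues, moduli):
    """Chinese Remainder Theorem for pairwise coprime moduli."""
    N = 1
    for ni in moduli:
        N *= ni
    x = 0
    for ci, ni in zip(residues, moduli):
        Ni = N // ni
        x += ci * Ni * pow(Ni, -1, ni)
    return x % N

def icbrt(c):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << ((c.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid**3 <= c:
            lo = mid
        else:
            hi = mid - 1
    return lo

e = 3
moduli = [55, 84, 97]                     # pairwise coprime toy moduli
assert all(gcd(a, b) == 1 for a in moduli for b in moduli if a != b)
m = 13                                    # m^e < 55 * 84 * 97
cts = [pow(m, e, ni) for ni in moduli]    # the e intercepted ciphertexts

c = crt(cts, moduli)                      # c = m^e as an integer
assert c == m**e
assert icbrt(c) == m
print("recovered m =", icbrt(c))
```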
11. Let n = ord g = uv with u small (may be prime). An active eavesdropper Eve
mounts the following attack.

Alice selects d ∈ {2, 3, . . . , n − 1} and sends g^d to Bob.

Eve intercepts g^d, and computes and sends to Bob the element (g^d)^v.

Bob selects d′ ∈ {2, 3, . . . , n − 1} and sends g^{d′} to Alice.

Eve intercepts g^{d′}, and computes and sends to Alice the element (g^{d′})^v.

Alice and Bob compute the shared secret g^{vdd′}.

The order of the shared secret g^{vdd′} divides u and is small. That is, there
are only a few possibilities for the shared secret. If Alice and Bob make a
future communication using the shared secret, the correct possibility can be
worked out by Eve by an exhaustive search.

This attack can be avoided if n = ord g does not have any small divisor.
For example, we choose a suitably large finite field F_q with q − 1 having a prime
factor n of bit size ≥ 160. Although q may be so large that q − 1 cannot be
(completely) factored in feasible time, a knowledge of n alone suffices. In that
case, we work in the subgroup of F_q^* of order n.
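The attack can be simulated with a toy group. In the Python sketch below (parameters are our own choices), p − 1 = 2 · 5003, so Eve forces the shared secret into the subgroup of order u = 2:

```python
import random

random.seed(5)
p = 10007                       # prime; p - 1 = 2 * 5003, with 5003 prime
u, v = 2, 5003                  # small factor u, large factor v of ord(g)

# Find a primitive root g of p, so that n = ord(g) = u * v = p - 1.
g = next(h for h in range(2, p)
         if pow(h, (p - 1) // 2, p) != 1 and pow(h, (p - 1) // 5003, p) != 1)

d = random.randrange(2, p - 1)      # Alice's secret
d2 = random.randrange(2, p - 1)     # Bob's secret

A = pow(g, d, p)                    # Alice's message (intercepted by Eve)
B = pow(g, d2, p)                   # Bob's message (intercepted by Eve)
A_forced = pow(A, v, p)             # Eve forwards (g^d)^v to Bob
B_forced = pow(B, v, p)             # Eve forwards (g^d')^v to Alice

secret_alice = pow(B_forced, d, p)  # g^(v d d')
secret_bob = pow(A_forced, d2, p)   # g^(v d d')
assert secret_alice == secret_bob

# The shared secret has order dividing u, so Eve needs only u guesses.
candidates = {pow(g, v * j, p) for j in range(u)}
assert secret_alice in candidates
print("shared secret", secret_alice, "lies in", sorted(candidates))
```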
16. Let (s, t) be the ElGamal signature on the message representative m, that is,
s ≡ g^{d′} (mod p) (for some d′) and t ≡ (m − ds) d′^{−1} (mod p − 1). Take any
k coprime to p − 1, and let s′ ≡ g^{kd′} ≡ s^k (mod p) and t′ ≡ (s^{k−1} k^{−1}) t ≡
(m s^{k−1} − d s′)(k d′)^{−1} (mod p − 1). Then (s′, t′) is again a valid signature on
the message representative m s^{k−1}.
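The forgery can be checked with toy numbers. In the Python sketch below (parameters are our own choices), the factor s^{k−1} is computed as s′ s^{−1} (mod p − 1):

```python
from math import gcd
import random

random.seed(11)
p = 467                                  # prime; p - 1 = 2 * 233
# Find a primitive root g modulo p.
g = next(h for h in range(2, p)
         if pow(h, (p - 1) // 2, p) != 1 and pow(h, (p - 1) // 233, p) != 1)

d = 127                                  # signer's private key
y = pow(g, d, p)
m = 100                                  # message representative

# Produce a valid signature (s, t); retry until the needed inverses exist.
while True:
    d1 = random.randrange(2, p - 1)
    if gcd(d1, p - 1) != 1:
        continue
    s = pow(g, d1, p)
    if gcd(s, p - 1) == 1:
        break
t = (m - d * s) * pow(d1, -1, p - 1) % (p - 1)

def verify(m, s, t):
    return pow(g, m, p) == pow(y, s, p) * pow(s, t, p) % p

assert verify(m, s, t)

# Forgery: for k coprime to p - 1, (s', t') signs m' = m * s^(k-1) mod (p-1).
k = 3
s2 = pow(s, k, p)
factor = s2 * pow(s, -1, p - 1) % (p - 1)        # "s^(k-1)" = s' / s mod p-1
t2 = factor * pow(k, -1, p - 1) * t % (p - 1)
m2 = m * factor % (p - 1)
assert verify(m2, s2, t2)
print("forged signature", (s2, t2), "verifies on m' =", m2)
```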


20. The ECDSA signature on M with m = H(M) is (s, t), where S = d′G (for a
random d′), s ≡ x(S) (mod n), and t ≡ (m + ds) d′^{−1} (mod n). If we choose
n − d′ in place of d′, we get S′ = −d′G = −S, but x(S′) = x(S), whereas
(m + ds)(n − d′)^{−1} ≡ −(m + ds) d′^{−1} (mod n), that is, (s, n − t) is again a
valid signature on M.
Now, assume that n < q (this may happen even if the cofactor h is 1).
This means that a given s may have multiple representatives in Fq , and each
representative produces two signatures with the same s.
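This malleability can be verified against the toy curve of Example 9.8 (p = 997, y² = x³ + 3x + 6, G = (246, 540), n = 149, d = 73). The Python sketch below re-implements affine curve arithmetic from scratch (the book's own demonstrations use GP/PARI):

```python
# Toy curve from Example 9.8 of the text: y^2 = x^3 + 3x + 6 over F_997,
# base point G = (246, 540) of order n = 149, private key d = 73.
p, a, b = 997, 3, 6
G, n, d = (246, 540), 149, 73

def ec_add(P, Q):
    """Affine point addition; None is the point at infinity."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(k, P):
    R = None
    while k:
        if k & 1: R = ec_add(R, P)
        P = ec_add(P, P)
        k >>= 1
    return R

Y = ec_mul(d, G)                            # public key

m = 123
for d1 in range(2, n):                      # pick a nonce with s, t nonzero
    s = ec_mul(d1, G)[0] % n
    t = (m + d * s) * pow(d1, -1, n) % n
    if s and t:
        break

def verify(m, s, t):
    w = pow(t, -1, n)
    V = ec_add(ec_mul(m * w % n, G), ec_mul(s * w % n, Y))
    return V is not None and V[0] % n == s

assert verify(m, s, t)
assert verify(m, s, (n - t) % n)            # (s, n - t) verifies as well
print("both", (s, t), "and", (s, n - t), "verify")
```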

22. (b) The verification of the i-th DSA signature (s_i, t_i) involves computing w_i ≡
t_i^{−1} (mod q), u_i ≡ m_i w_i (mod q) and v_i ≡ s_i w_i (mod q), and checking whether
s_i ≡ (g^{u_i} y^{v_i} (mod p)) (mod q). Multiplying the k verification equations gives

    ∏_{i=1}^{k} s_i ≡ ( g^{Σ_{i=1}^{k} u_i} y^{Σ_{i=1}^{k} v_i} (mod p) ) (mod q).

This is the equation for the batch verification of the k DSA signatures. The
sums of u_i and of v_i should be computed modulo q.

Modular exponentiations are the most expensive operations in DSA ver-
ification. The number of modular exponentiations drops from 2k (individual
verification) to 2 (batch verification). So the speedup achieved is about k.
It is evident that individual signatures may be faulty, but their product
qualifies for being verified as a batch. For randomly chosen messages and
signatures, the probability of such an occurrence is, however, quite low.
(c) In Part (b), we assumed that y remains constant for all signatures. If not,
we should modify the batch-verification criterion to

    ∏_{i=1}^{k} s_i ≡ ( g^{Σ_{i=1}^{k} u_i} y_1^{v_1} y_2^{v_2} · · · y_k^{v_k} (mod p) ) (mod q).

This requires k + 1 exponentiations, and the speedup is about 2k/(k + 1) ≤ 2.


23. The following signature scheme is derived from a zero-knowledge authentica-
tion scheme. We suppose that Alice wants to sign a message M . We use the
symbol || to represent concatenation of strings.

Alice generates a random commitment t and a witness w to t.


Alice uses a hash function H to generate the challenge c = H(M ||w) herself.
Alice’s signature on M is the pair (w, r), where r is Alice’s response on c.
A verifier checks whether r is consistent with the challenge c = H(M ||w).
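As a concrete instance of this Fiat–Shamir paradigm (the Schnorr scheme, our own choice of example; the text leaves the underlying protocol generic), a Python sketch with toy parameters:

```python
import hashlib, random

random.seed(3)
# Toy Schnorr group: q | p - 1, g of order q.
p, q = 607, 101                       # 607 - 1 = 6 * 101
g = pow(2, (p - 1) // q, p)           # an element of order q
assert g != 1 and pow(g, q, p) == 1

x = random.randrange(1, q)            # Alice's private key
y = pow(g, x, p)                      # Alice's public key

def challenge(M, w):                  # c = H(M || w)
    data = (M + "||" + str(w)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def sign(M):
    t = random.randrange(1, q)        # commitment
    w = pow(g, t, p)                  # witness
    c = challenge(M, w)               # self-generated challenge
    r = (t + c * x) % q               # response
    return (w, r)

def verify(M, sig):
    w, r = sig
    c = challenge(M, w)
    # r is consistent with c iff g^r = w * y^c (mod p)
    return pow(g, r, p) == w * pow(y, c, p) % p

sig = sign("hello")
assert verify("hello", sig)
print("signature", sig, "verifies")
```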

26. For a valid encrypted message (U, V, W), we have U + tV = k P_Bob P + k Y1 +
t k Y2 = k P_Bob P + k s1 P + t k s2 P = k(P_Bob + s1 + t s2)P, that is, e(U + tV, K) =
e(k(P_Bob + s1 + t s2)P, (P_Bob + s1 + t s2)^{−1} P) = e(kP, P) = e(P, P)^k. Therefore,
M is retrieved by Bob as W/e(U + tV, K).
28. Alice computes f_B = Σ_{i=0}^{k} P_B^i V_i and f_C = Σ_{i=0}^{k} P_C^i V_i. We have
f_B = Σ_{i=0}^{k} P_B^i d_i P = (Σ_{i=0}^{k} d_i P_B^i) P = f(P_B) P = D_B P. Likewise,
f_C = D_C P. Therefore, Alice obtains e(f_B, f_C)^{D_A} = e(D_B P, D_C P)^{D_A} =
e(P, P)^{D_A D_B D_C}.
Bob and Carol can analogously compute this shared value.
In order to reconstruct the secret polynomial f(x) uniquely by interpolation,
we need f(P_i) at k + 1 distinct points P_i. If n is the number of entities
participating in the network, and if all of these entities disclose their private
keys to one another, even then the polynomial cannot be uniquely determined,
provided that k ≥ n.

30. Let (U, V ) be a valid signature of Bob on the message M . Then,


V = (t + H2 (M, U ))DBob = s(t + H2 (M, U ))PBob
= s(tPBob + H2 (M, U )PBob ) = s(U + H2 (M, U )PBob ).
Therefore,
e(P, V ) = e(P, s(U + H2 (M, U )PBob )) = e(sP, U + H2 (M, U )PBob ),
that is, the signature is verified if and only if the following condition holds:
e(P, V ) = e(Ppub , U + H2 (M, U )PBob ).
If U + H2 (M, U )PBob = aP , the above condition is equivalent to checking
whether e(P, saP ) = e(sP, aP ). Under the gap Diffie–Hellman assumption, it
is easy to verify a signature. However, generating a signature involves solving
an instance of the (computational) bilinear Diffie–Hellman problem.
Each Cha–Cheon signature generation requires two point multiplications
in G, whereas each Paterson signature generation requires three such point
multiplications. Verification involves two pairing computations in the Cha–
Cheon scheme, and three pairing computations and two field exponentiations
in the Paterson scheme. It follows that the Cha–Cheon scheme is considerably
more efficient than the Paterson scheme.
32. Since g = e(P, Q) = e((d1 + d2 t + m)−1 P, (d1 + d2 t + m)Q) = e(σ, d1 Q +
td2 Q + mQ), the signature is verified if and only if we have
e(σ, Y1 + tY2 + mQ) = g.
This verification procedure involves two point multiplications (and two addi-
tions) in G2 and only one pairing computation. On the contrary, BLS ver-
ification involves two pairing computations, and would be slower than this
Boneh–Boyen verification if one pairing computation takes more time than
two point multiplications.
38. We assume that all ECDSA system parameters are fixed beforehand. The
following functions sign and verify messages given these parameters and the
appropriate keys as arguments.

ECDSAsgn(m,d,E,p,n,G) = \
local(d1,S,s,t); \
d1 = 2 + random(n-2); \
S = ellpow(E,Mod(G,p),d1); \
s = lift(S[1]) % n; \
t = lift(Mod(m + d * s, n) / Mod(d1, n)); \
return([s,t]);

ECDSAvrf(m,S,Y,E,p,n,G) = \
local(w,u1,u2,V,v); \
w = Mod(S[2],n)^(-1); u1 = Mod(m,n) * Mod(w,n); u2 = Mod(S[1],n) * w; \
V = elladd(E,ellpow(E,Mod(G,p),lift(u1)),ellpow(E,Y,lift(u2))); \
v = lift(V[1]) % n; \
if (v == S[1], return(1), return(0));

Use of these functions is now demonstrated for the data in Example 9.8.

p = 997;
E = ellinit([Mod(0,p),Mod(0,p),Mod(0,p),Mod(3,p),Mod(6,p)]);
n = 149;
G = [246,540];
d = 73;
Y = ellpow(E,G,d); print("Y = ",lift(Y));
m = 123;
S = ECDSAsgn(m,d,E,p,n,G); print("S = ", S);
ECDSAvrf(m,S,Y,E,p,n,G)
ECDSAvrf(m,[96,75],Y,E,p,n,G)
ECDSAvrf(m,[96,112],Y,E,p,n,G)

39. In this exercise, we assume that the parameters E, q, k, r, P, Q and the relevant
hash functions are defined globally. If symmetric pairing can be used, we invoke
the distorted Weil pairing dWeil(), otherwise we invoke Weil(). The four stages
of the Boneh–Franklin IBE scheme can be implemented as follows.

BF_TAkeys() = \
local(s,Ppub); \
s = 2 + random(r-2); \
Ppub = ellpow(E,P,s); \
return([s,lift(Ppub)]);

BFreg(s,PBob) = \
local(DBob); \
DBob = ellpow(E,PBob,s); \
return(lift(DBob));

BFenc(M,Ppub,PBob) = \
local(g,a,U,V,h); \
g = dWeil(PBob,Ppub); \
a = 2 + random(r-2);\
U = ellpow(E,P,a);\
h = lift(lift(g^a)); \
V = H2(h); \
V = bitxor(V,M); \
return([U,V]);

BFdec(C,DBob) = \
local(h,M); \
h = H2(lift(lift(dWeil(DBob,C[1])))); \
M = bitxor(C[2],h); \
return(M);