
Institute of Electrical and Electronic Engineering

Fundamental Teaching Department


M’Hamed Bougara University of Boumerdes

Final Year Project Report Presented in Partial
Fulfillment of the Requirements of the Degree of
Bachelor in Electrical Engineering

Title:
FPGA Design and Implementation of Galois
Field Arithmetic Circuits
for Cryptography Applications

Presented by:
Benslimane Mohammed el Bachir, Zorgani Merouane

Supervisor:
Prof. Khouas Abdelhakim
Abstract
The internet has become the dominant medium for virtually all communication
and transactions between individuals, large corporations, and governments.
However, it remains vulnerable to attacks that may expose our information to
unauthorized parties. Cryptography deals with this problem by ensuring that
exchanged information cannot be exposed; the field exploits computationally
expensive mathematical problems to secure communication over the internet.
Elliptic curves (ECs) are one such mathematical tool. Cryptosystems need to
be both fast and secure, otherwise they become susceptible to attack, and we
adopt hardware implementation as a solution to this problem. By implementing
the low-level arithmetic operations in hardware, we lay the foundation for a
safe and efficient EC-based cryptosystem as an application of our work. Our
work achieves an optimized implementation of prime and binary field arithmetic
circuits, and in this report we present a comparison between the two fields in
terms of hardware resources and efficiency.

Acknowledgements
We express our appreciation to those who helped, in any way, in the
completion of this project: our supervisor Prof. Khouas Abdelhakim, our
friends, and our families.

Contents
List of Figures

List of Tables

List of Acronyms

1 Mathematical background
1.1 Elliptic curves
1.2 Modular arithmetic
1.3 Galois fields
1.4 Prime fields
1.5 Binary fields

2 Design
2.1 Prime field arithmetic circuits
2.1.1 Design of the prime field modulo adder
2.1.2 Design of the prime field modulo subtractor
2.1.3 Design of the prime field modulo multiplier
2.1.4 Design of the prime field modulo inverter
2.2 Binary field arithmetic circuits
2.2.1 Design of the binary field adder and subtractor
2.2.2 Design of the binary field multiplier
2.2.3 Design of the binary field inverter

3 Implementation
3.1 Simulation of the prime field multiplier
3.2 Simulation of the prime field inverter
3.3 Simulation of the binary field multiplier
3.4 Simulation of the binary field inverter
3.5 Performance

Conclusion and further work

Bibliography

Appendix
List of Figures
2.1 Top level diagram of the prime field adder
2.2 Proposed hardware implementation of modulo adder
2.3 Proposed hardware implementation of modulo subtractor
2.4 Top level diagram of the prime field multiplication circuit
2.5 Circuit diagram of the prime field multiplier
2.6 Top level diagram of the prime field inversion circuit
2.7 State diagram of the FSM controller
2.8 Conceptual diagram of a modulo inverter
2.9 Circuit diagram for register U datapath
2.10 Circuit diagram for register V datapath
2.11 Circuit diagram for register X datapath
2.12 Circuit diagram for register Y datapath
2.13 Top level diagram of the binary field multiplication circuit
2.14 Top level diagram of the binary field inversion circuit
2.15 Bitwise comparison of u ⊕ v under degree equality and inequality
3.1 Simulation results of the prime field multiplication circuit
3.2 Simulation results of the inversion for inputs a = 45 and p = 103
3.3 Simulation results of the binary multiplication circuit
3.4 Simulation results of the binary inversion circuit
3.5 Graph showing maximum operating frequency vs operand size for prime field multiplier circuit
3.6 Graph showing LUT consumption vs operand size for prime field multiplier circuit

List of Tables
1.1 Addition in GF(7)
1.2 Multiplication in GF(7)
2.1 An example multiplication using Algorithm 2.1.3
2.2 Table of values for u, v, x, and y
3.1 Prime field multiplier performance and used resources
3.2 Binary field multiplier performance and used resources
3.3 Prime field inversion performance and resource usage
3.4 Binary field inversion performance and resource usage

List of Acronyms
• CPU: Central Processing Unit

• AES: Advanced Encryption Standard

• EC: Elliptic Curve

• ECC: Elliptic Curve Cryptography

• FPGA: Field Programmable Gate Array

• FSM: Finite State Machine

• GF: Galois Field

• LUT: Look-Up Table

• RSA: Rivest-Shamir-Adleman

• RTL: Register Transfer Level

• VHDL: Very-High-Speed Integrated Circuit Hardware Description Language

Introduction
Cryptography is the process of shielding information used in communication
such that no party except the sender and receiver is able to read or process
it. Throughout history, several techniques and methods have been utilized
for this purpose. From primitive encryption schemes, such as the Caesar
cipher, to modern algorithms like AES (Advanced Encryption Standard) and
RSA (named after its inventors Ron Rivest, Adi Shamir, and Leonard Adleman),
the aim of cryptography has always been to ensure secure communication
between two parties. In today's computer-dominated world, millions of devices
are exchanging critical data all the time. Each second, thousands of gigabytes
are being passed around. And so, with malevolent hackers peering from every
corner, the need for fast, efficient and secure cryptosystems is as strong as
ever. To tackle the problem of building cryptosystems that are up to this
task, we need to define what a cryptosystem does. A cryptosystem takes the
sender's text (known as plaintext) and produces an unreadable text (known
as ciphertext). The latter is sent over an insecure channel, where it is
susceptible to being intercepted by unwanted third parties, to the receiving
end, where it is decrypted back into the plaintext.
For this system to be of any use at all, only the sender and receiver must
have the ability to decrypt. A cryptosystem achieves this by using algorithms
that enable the sender and receiver to share a piece of information known as
the key, without which decryption is extremely difficult. To ensure the
security of the cryptosystem as a whole, these algorithms are based on
mathematical problems that are computationally expensive and require a great
deal of time to reverse without the key. Our work will deal with one such
mathematical tool. A cryptosystem does not only need to be secure, it needs
to be fast and efficient as well; in fact, this reinforces its security.
Therefore, for many applications an implementation of such algorithms on a
general purpose CPU (Central Processing Unit) is not sufficient. This
motivates the use of custom hardware circuits that are designed and optimized
for these specific tasks. Indeed, our work is centered around implementing
the operations necessary for one of the standard encryption schemes, which
uses a mathematical problem associated with a specific type of curve called
elliptic curves. Defining operations on these curves turns out to produce a
discrete logarithm problem¹, which is at the core of one of the fundamental
key generation algorithms that underpin the realization of secure
communication over the Internet. Our work proposes an optimized
implementation of the arithmetic operations associated with elliptic curves
over both binary and prime fields (these will be explained later). This
report is organized as follows: in chapter 1 we go through an explanation of
the math that is relevant to our work, namely modular arithmetic, Galois
fields, and prime and binary field arithmetic. Then in chapter 2 we present
and explain the algorithms and the designs of the basic arithmetic blocks.
In chapter 3 we inspect the performance of the presented designs and compare
binary field and prime field circuit performance.

¹ The discrete logarithm problem is the problem of finding x such that b^x = a, with a
and b known. In many definitions of this exponentiation operation, it is extremely
difficult to find x.

Chapter 1

Mathematical background
To understand how a cryptosystem permits the communicating parties to
possess the same key without directly transmitting it over the internet, which
is one of the fundamental tasks of a cryptosystem, we first need to entertain
the notion of operations. Think of an operation as a function that takes in
two numbers and outputs a number. The operation of addition takes in 9 and
5 and returns 14. If it were multiplication, the result would be 45. Curiously,
there exists an operation that takes in 9 and 5 and produces 2: grab a watch,
start at the number 9, and move 5 hours ahead. You will land at the number
2. If you input 12 and 39 into this operation, it yields 3, since 12 goes into
12 + 39 = 51 four times, leaving a remainder of 3. This is one example of an
operation alternative to what is typically taught in primary school, and it is
called addition modulo 12.
Cryptosystems make use of operations with certain properties to actualize
the following scheme; Alice, the sender, and Bob, the receiver, agree publicly
on an integer number g. Then each party generates their own secret number
a and b for Alice and Bob respectively. What happens now is, using an
operation denoted ⋆, Alice transmits, not a, but g ⋆ a. And Bob transmits,
not b, but g ⋆ b. This way, Alice can compute (g ⋆ b) ⋆ a from Bob's
transmission, and Bob can compute (g ⋆ a) ⋆ b from Alice's. This results in
only Alice and Bob having the same key g ⋆ a ⋆ b,¹ which they can use to
communicate securely.

¹ Note that in order to make this a joint key for the two parties, we need the
operation ⋆ to satisfy (g ⋆ b) ⋆ a = (g ⋆ a) ⋆ b. For example, (3 × 5) × 7 = (3 × 7) × 5.

An astute reader should notice a problem in this scheme. If g and g ⋆ a
are both public knowledge, then it might be possible for unwanted third
parties to derive a. The same thing goes for Bob's secret number, b. If
g = 428, a = 43 and we make the bad choice of the addition operation for ⋆,
then it’s possible for a spy to derive a from g + a = 471 simply by performing
the inverse operation of addition, subtraction: 471 − g = 43 = a. This renders
the whole cryptosystem pointless. It is this reason that underpins the need
for operations that are hard to reverse: an operation ⋆ such that, given g and
g ⋆ x, it is extremely difficult for anyone to deduce x through any means.
This is precisely the discrete logarithm problem, and elliptic curves provide
exactly that.
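To make the scheme above concrete before moving to elliptic curves, the short
Python sketch below (ours, not part of the original report) instantiates ⋆
with modular exponentiation, which is the classical Diffie-Hellman
construction; the toy numbers are chosen only for readability and offer no
security.

# Toy illustration of the g * a key-agreement scheme using modular
# exponentiation (classical Diffie-Hellman). Parameters are tiny values
# chosen for readability, not from this report and not secure.

p = 23             # public modulus
g = 5              # public base agreed on by Alice and Bob

a = 6              # Alice's secret
b = 15             # Bob's secret

A = pow(g, a, p)   # Alice transmits g "star" a  ->  5^6  mod 23
B = pow(g, b, p)   # Bob transmits   g "star" b  ->  5^15 mod 23

# Each side combines the other's transmission with its own secret.
key_alice = pow(B, a, p)   # (g^b)^a mod 23
key_bob   = pow(A, b, p)   # (g^a)^b mod 23

assert key_alice == key_bob   # both sides now hold the same shared key
print(key_alice)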

1.1 Elliptic curves


Generally, an elliptic curve over R is the set of all points (x, y), with
x, y ∈ R, that satisfy y^2 = x^3 + ax + b, where a, b ∈ R.
Looking at a plot alone doesn’t offer much information into how elliptic
curves are used to do cryptography. The first insight is that, unlike the ex-
amples provided in the previous section, elliptic curve cryptography (ECC)
does not perform math operations on integers. Instead, ECC performs oper-
ations by combining points on an elliptic curve. Without going into detail,
there is a way to add two points on an elliptic curve, P and Q, to produce
another point on the elliptic curve, R = P + Q. We can sensibly add a
point to itself, P + P , and get a point on the elliptic curve still. This latter
operation is often termed point doubling, and it paves the way to define the
operation called point multiplication; this is the multiplication of a point P
by an integer n,² and it is thought of as nP = P + P + ... + P (n times) = T.
It is this operation, point multiplication, that grants us a discrete
logarithm problem.
That is, if I give you P and T = nP , it can be extremely difficult to find
n, which is why it can serve as someone’s private key. Elliptic curve point
multiplication, therefore, enables cryptosystems to implement the algorithm
of key generation. However, it’s worth noting that elliptic curves adopted in
cryptographic applications look nothing like the smooth curve over the reals
described above. The values of x and y with which we work do not belong to
the set of real numbers, R.
Instead, they belong to a special set, equipped with elements and operations
alternative to what we traditionally work with in the real numbers. This is
² In reality, n cannot be any integer: n must be greater than zero and smaller than a
number we call the order of the elliptic curve.

the second insight, and it sets the stage for introducing the mathematical
prerequisites relevant to this work. This will be done in the next sections.

1.2 Modular arithmetic


The importance of studying mathematical operations should now be ev-
ident to the reader. We start by explaining simple operations involved in
what’s known as modular arithmetic, for they constitute the fundamental
building block with which defining elliptic curve operations is made possible.
Modular addition is one such operation, and an example of which has already
been discussed. Namely, addition modulo 12; the addition of numbers on a
clock. However, simply asking the following question can open the door to
developing very productive mathematics; what if clocks were split into, not
the arbitrarily chosen number 12, but any other arbitrary integer n? On a
clock split into 7 units from 0 to 6, where would one land if one starts at 5
and moves 17 units ahead? Again, we invoke the same reasoning: starting
at 0, moving 5 then 17 units ahead results in a total of 22 units, and 7 goes
into 22 three times before landing at the remainder 1. This is called addition
modulo 7. The next definition shortcuts these logical steps into the operation
we call modular addition.
Definition 1. Let a, b be integers, and n a non zero integer. Then a+b mod n
equals the remainder of a + b upon division by n.
Example. • 12 + 12 mod 13 = 24 mod 13 = 11
• (9 − 4) mod 3 = 2
• (3 + 4) mod 7 = 7 mod 7 = 0
Modular multiplication is also defined in a similar fashion;
Definition 2. Let a, b be integers, and n a non zero integer. Then a×b mod n
is the remainder of a × b upon division by n
Example. • 3 × 5 mod 7 = 1
• 3 × 5 mod 5 = 0
• 9 × 7 mod 12 = 3
Note that in the first example, the result is unity. We say that 3 is the
(multiplicative) inverse of 5 in the integers mod 7.

1.3 Galois fields
The mathematical operations we use in cryptography must have a specific
set of properties and constraints. A good demonstration of this statement
is the concept of a field. A field is basically a set equipped with two op-
erations, which we call addition, denoted +, and multiplication, denoted ·,
where these operations ”behave nicely”. For example, both operations must
be commutative. The precise definition of a field is provided in the appendix.
Fields of an infinite number of elements do not constitute much of a
concern in this work. We deal only with the so called finite fields or Galois
fields.
Definition 3. A finite field or a galois field is a field with a finite number of
elements (finite order).
For example, there exists a field with 7 elements. If we represent each
element by a number;
{0, 1, 2, 3, 4, 5, 6}
We can, in fact, use addition modulo 7 and multiplication modulo 7 as the
operations associated with this Galois field. Table 1.1 shows the results of
performing the addition operation on any two elements of the field.

+ 0 1 2 3 4 5 6
0 0 1 2 3 4 5 6
1 1 2 3 4 5 6 0
2 2 3 4 5 6 0 1
3 3 4 5 6 0 1 2
4 4 5 6 0 1 2 3
5 5 6 0 1 2 3 4
6 6 0 1 2 3 4 5

Table 1.1: Addition in GF(7)

Note that every row contains a zero. In a field, any element x has what
we call an additive inverse y, often denoted −x, where x + y = 0.
Table 1.2 describes the multiplication result of every two elements in this
field.

× 0 1 2 3 4 5 6
0 0 0 0 0 0 0 0
1 0 1 2 3 4 5 6
2 0 2 4 6 1 3 5
3 0 3 6 2 5 1 4
4 0 4 1 5 2 6 3
5 0 5 3 1 6 4 2
6 0 6 5 4 3 2 1

Table 1.2: Multiplication in GF(7)


Note that almost every row contains a one. In a field, every non zero
element x has what we call a multiplicative inverse y, where x · y = 1. For
example, the multiplicative inverse of 3 is 5 since 3 · 5 mod 7 = 1. At this
point, two important facts need to be considered in order to proceed further.
The first is the following theorem;
Theorem 4. [5] A field with a finite order m only exists if m is a prime
power, i.e., m = p^n, for some positive integer n and prime integer p. p is
called the characteristic of the finite field.
So there does exist a finite field with order 256 = 2^8, or 7^1, but there is
no finite field with an order of 12 since it is not a prime power. This leads
to the notion of a prime field.
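As a quick cross-check of Tables 1.1 and 1.2, the following Python sketch
(ours, not part of the original report) builds both tables for GF(7) as
arithmetic modulo 7 and lists the additive and multiplicative inverses
mentioned above.

# Reproduce Tables 1.1 and 1.2: addition and multiplication in GF(7),
# realized as arithmetic modulo 7. Illustrative sketch only.

p = 7
elements = range(p)

add_table = [[(x + y) % p for y in elements] for x in elements]
mul_table = [[(x * y) % p for y in elements] for x in elements]

print("addition mod 7:")
for row in add_table:
    print(row)

print("multiplication mod 7:")
for row in mul_table:
    print(row)

# Every element has an additive inverse (each row of the addition table
# contains a 0), and every non-zero element has a multiplicative inverse
# (each non-zero row of the multiplication table contains a 1).
for x in elements:
    neg = next(y for y in elements if (x + y) % p == 0)
    print(f"-{x} = {neg}")
for x in range(1, p):
    inv = next(y for y in elements if (x * y) % p == 1)
    print(f"{x}^-1 = {inv}")   # e.g. 3^-1 = 5, matching the text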

1.4 Prime fields


By letting n = 1, theorem 4 implies that there exist fields with a prime
order. Such fields are so relevant to this work that we give them a
name: prime fields.
Definition 5. A prime field is a field with a prime number of elements, often
denoted GF (p), where p is its order.
The second fact is that arithmetic in a prime field is done modulo p. If
the order were not prime, the addition and multiplication operations on the
elements of the Galois field could not be represented by modular addition
and multiplication.

Definition 6. Let a and b be elements of the prime field GF (p). Their sum,
c, which belongs to GF (p), is then computed as follows: c = a + b mod p
Subtraction is defined in a similar fashion;
Definition 7. Let a and b be elements of the prime field GF (p). Their
difference, d, which belongs to GF (p), is then computed as follows: d =
a − b mod p
Example. The addition of 2 and 6 in GF (7) = {0, 1, 2, 3, 4, 5, 6} yields 1
since 2 + 6 mod 7 = 1.
Subtracting 6 from 4 in GF (7) yields 5 since 4 − 6 mod 7 = 5.
Multiplication is, unsurprisingly, also defined in a similar fashion;
Definition 8. Let a and b be elements of the prime field GF (p). Their
product, m, which belongs to GF (p), is then computed as follows: m = a ×
b mod p
Example. Multiplication of 4 and 9 in GF (11) yields 3; 4 × 9 mod 11 =
36 mod 11 = 3
Naturally, the next operation to consider is the inverse operation of prime
field multiplication; inversion. Essentially, it is the operation that returns the
multiplicative inverse of a number modulo p.
Definition 9. Let a be an element of GF(p). The multiplicative inverse of
a, denoted a^-1, is the element of GF(p) that satisfies a × a^-1 mod p = 1.
Example. The multiplicative inverse of 9 in GF(11) is 5, since 9 × 5 mod
11 = 45 mod 11 = 1.
The smallest Galois field is GF(2) = {0, 1}, which is a prime field. The
addition and multiplication are, of course, done modulo 2, as demonstrated
by the tables below.

addition          multiplication
+ 0 1             × 0 1
0 0 1             0 0 0
1 1 0             1 0 1
Notice that in this field, addition and subtraction are the same operation;
each element is its own inverse. The reader might find it interesting that

addition and multiplication operations in GF (2) are no more than XOR and
AND gates respectively.
Note that finding the multiplicative inverse of an element in GF(p) is not as
straightforward as the other operations in the field.
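Since inversion in GF(p) has no direct one-step formula like the other
operations, a common software shortcut is Fermat's little theorem, which
gives a^-1 = a^(p-2) mod p for a prime p and a ≠ 0. The Python sketch below
(ours, illustrative only, not part of the report's designs) checks
Definition 9 this way on the GF(11) example.

# Multiplicative inverse in GF(p) via Fermat's little theorem:
# for prime p and a != 0, a^(p-2) mod p is the inverse of a.
# Illustrative sketch, not part of the report's designs.

def gfp_inverse_fermat(a: int, p: int) -> int:
    return pow(a, p - 2, p)      # equivalently pow(a, -1, p) on Python 3.8+

p, a = 11, 9
inv = gfp_inverse_fermat(a, p)
print(inv)                       # 5, as in the example above
assert (a * inv) % p == 1        # Definition 9: a * a^-1 mod p = 1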

1.5 Binary fields


In addition to prime fields, another kind of finite fields is prevalent in
cryptography.

Definition 10. A binary field is a Galois field with an order (number of
elements) equal to a power of two: GF(2^m), where m is a positive integer.

Binary fields with m > 1 are often called Extension fields. It’s not hard
to see why binary fields could offer convenient characteristics. For m = 8, for
example, each field element could be represented by one byte.

Since the order is not prime, we cannot use modular arithmetic to handle
the operations of binary fields. How, then, does one go about performing
addition and multiplication on the elements of binary fields? The answer lies
in the way we represent these elements.
Every element in a binary field GF(2^m) can be represented by a polynomial
with binary coefficients. Each polynomial has a degree of at most m − 1,
which amounts to m coefficients for each element. For example, elements in
GF(2^8) have the form
A(x) = a_7 x^7 + ... + a_1 x + a_0
where a_i ∈ GF(2) = {0, 1}.
Addition in binary fields is the simple classical polynomial addition, where
we add corresponding coefficients in GF(2). It is therefore natural to think
of addition (or subtraction) in GF(2^m) as simply a bitwise XOR function.

Definition 11. Let A(x), B(x) be elements of GF(2^m). Their sum (or
difference) is then computed according to
C(x) = A(x) + B(x) = A(x) − B(x) = Σ_{i=0}^{m−1} c_i x^i,
where c_i = a_i + b_i mod 2 = a_i − b_i mod 2.

Example. Let A(x) = x^7 + x^6 + x^4 + 1 and B(x) = x^4 + x^2 + 1 be elements
of GF(2^8). Their sum, C(x), is then:

A(x) = x^7 + x^6 + x^4 + 1
B(x) =             x^4 + x^2 + 1
C(x) = x^7 + x^6       + x^2

Subtraction yields the same result.

Naturally, the next question concerns multiplication in GF(2^m). Multiplying
two polynomials of degree m − 1 generally results in a polynomial of degree
2m − 2, an element not guaranteed to be in GF(2^m). To understand how to
multiply elements of GF(2^m), we shall introduce the concept of irreducible
polynomials.

Definition 12. A polynomial is irreducible over GF (2) if it cannot be factored


into the product of two non constant polynomials over GF (2).

Example. x^2 + x + 1 is irreducible since it is impossible to write it as
(x + a)(x + b) with a, b ∈ GF(2). On the other hand, x^2 + 1 is reducible over
GF(2) since it equals (x + 1)^2 (remember that coefficients add modulo 2, so
x + x = 0).

Irreducible polynomials resemble prime numbers. In a prime field, we mul-


tiply two numbers classically, divide them by p, then consider the remainder
only. In a binary field, we multiply two polynomials classically, divide the
resulting polynomial (of degree 2m − 2 at most) by an irreducible polynomial,
and consider the remainder. The resulting polynomial will always be of degree
less than m. Therefore, we associate each Galois field GF(2^m) with an
irreducible polynomial of degree m. This makes the operation of
multiplication possible.

Definition 13. Let A(x), B(x) ∈ GF(2^m) and let

P(x) = Σ_{i=0}^{m} p_i x^i,   p_i ∈ GF(2)

be an irreducible polynomial. Multiplication of the two elements A(x), B(x)
is performed as

C(x) = A(x) · B(x) mod P(x).

In other words, we find the product A(x) · B(x), then divide it by P (x),
and take the remainder. We consider an example.

Example. Let A(x) = x^2 + 1 and B(x) = x^2 + x + 1 be elements of GF(2^4),
with the irreducible polynomial P(x) = x^4 + x^3 + 1.
We shall perform the multiplication operation of the above elements A and
B. To do so, we first calculate the plain polynomial product A(x)B(x).

A(x)B(x) = (x^2 + 1)(x^2 + x + 1)
         = x^4 + x^3 + x^2 + x^2 + x + 1
         = x^4 + x^3 + x + 1

Now we need to reduce this result modulo P(x):

x^4 + x^3 + x + 1 = P(x) · 1 + x

Therefore, the result is x.

       A      ·       B        =     C
  (x^2 + 1)   · (x^2 + x + 1)  =     x
  (0 1 0 1)   ·   (0 1 1 1)    = (0 0 1 0).
In regards to operations on binary fields, only inversion is left to discuss.

Definition 14. For a given binary field GF(2^m) and its associated irreducible
polynomial P(x), the inverse A^-1(x) of a nonzero element A(x) ∈ GF(2^m)
must satisfy
A(x) · A^-1(x) mod P(x) = 1.

Example 1. For the binary field GF(2^6) with the irreducible polynomial
P(x) = x^6 + x^5 + x^2 + x + 1, the inverse of the element 110101 is 001001,
because
(x^5 + x^4 + x^2 + 1)(x^3 + 1) mod P(x) = 1.

Note, again, that finding the multiplicative inverse of an element in GF(2^m)
is not as straightforward as the other operations in the field.
It turns out that finite field arithmetic finds extensive use in Elliptic Curve
Cryptography (ECC). For this purpose our work deals with the low level
operations involved therein.
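Before moving to the hardware, it may help to see the "multiply classically,
then take the remainder modulo P(x)" recipe in software. The Python sketch
below (ours, not the report's VHDL) encodes polynomials as integers with one
bit per coefficient; it reproduces the GF(2^4) product above and checks the
inverse claimed in Example 1.

# GF(2^m) multiplication done exactly as in Definition 13: schoolbook
# polynomial product over GF(2), then remainder on division by the
# irreducible polynomial. Polynomials are encoded as integers, one bit
# per coefficient. Illustrative sketch, not the report's VHDL.

def poly_mul(a: int, b: int) -> int:
    """Carry-less (GF(2)) product of two binary polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

def poly_mod(a: int, f: int) -> int:
    """Remainder of a on division by f over GF(2)."""
    while a.bit_length() >= f.bit_length():
        a ^= f << (a.bit_length() - f.bit_length())
    return a

def gf2m_mul(a: int, b: int, f: int) -> int:
    return poly_mod(poly_mul(a, b), f)

# Worked example: (x^2 + 1)(x^2 + x + 1) mod (x^4 + x^3 + 1) = x
print(bin(gf2m_mul(0b0101, 0b0111, 0b11001)))   # 0b10, i.e. x

# Example 1: in GF(2^6) with f = x^6 + x^5 + x^2 + x + 1,
# 110101 and 001001 are multiplicative inverses of each other.
assert gf2m_mul(0b110101, 0b001001, 0b1100111) == 1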

Chapter 2

Design
In this chapter, we shall go through the design and implementation of the
basic arithmetic blocks of prime and binary fields. The approach we take is
as follows: We take a look at the algorithm or mathematical formalism of the
operations in question, then derive a circuit that implements it.
These algorithms are sequential in nature; that is, they are a series of
operations that are to be performed in a specific order, with some depending
on others. To enforce this sequential behaviour in hardware, which is
concurrent in nature, we need to follow a specific methodology that leads to
the realization of a circuit that will faithfully follow the algorithm and
yield the expected results.
First of all, the variables in the algorithms are mapped to registers that take
their appropriate values from the datapath; a combinational circuit that out-
puts the register’s next value based on the algorithm’s instructions.
To enforce the order of the operations, we use a finite state machine (FSM)
that will act as the datapath controller.
This way we have circuitry that performs the data manipulation and calcu-
lations for our algorithm’s variables and an FSM that controls them. This is
known as the register transfer level methodology (RTL methodology [1]).

2.1 Prime field arithmetic circuits


This section concerns the design of prime field arithmetic circuits. Namely,
modulo adder, modulo subtractor, modulo multiplier and a modulo inverter.

2.1.1 Design of the prime field modulo adder

Figure 2.1: Top level diagram of the prime field adder

Our circuit will work with inputs that do not exceed the prime modulus.
This offers a simple way to implement modular addition; if a and b belong to
a prime field, then

a + b mod p = a + b         if a + b < p
a + b mod p = a + b − p     if a + b ≥ p        (2.1)

Taking a look at equation 2.1, we see that the result of modular addition is
nothing but a selection between two computed results based on a comparison
of their magnitudes. For this, we just need an adder to compute a + b, a
subtractor to take p off that result, and a multiplexer that selects the
output of the subtractor if a + b ≥ p; otherwise it selects the output of the
adder. The circuit diagram that performs the described operation is shown
in Figure 2.2.

2.1.2 Design of the prime field modulo subtractor


The design of the subtractor closely follows that of the adder, except that,
when a and b belong to GF (p);

a − b mod p = a − b         if a ≥ b
a − b mod p = a − b + p     if a < b        (2.2)

Following a similar logic to the previous design, the modulo subtractor circuit
is therefore derived and is shown in figure 2.3.

Figure 2.2: Proposed hardware implementation of modulo adder

Figure 2.3: Proposed hardware implementation of modulo subtractor
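As a behavioural cross-check of the selection logic of Figures 2.2 and 2.3,
the small Python model below (ours, not the report's VHDL) computes both
candidate results of equations 2.1 and 2.2 and lets a software "multiplexer"
pick one.

# Behavioural model of the modulo adder and subtractor of Figures 2.2
# and 2.3: compute both candidate results, then select one based on the
# comparison. Sketch only, not the report's VHDL.

def mod_add(a: int, b: int, p: int) -> int:
    s = a + b                        # adder output
    return s - p if s >= p else s    # select a + b - p when the sum reaches p

def mod_sub(a: int, b: int, p: int) -> int:
    d = a - b                        # subtractor output (may be negative)
    return d + p if d < 0 else d     # add p back when a < b

p = 7
print(mod_add(2, 6, p))   # 1, as in the GF(7) example of chapter 1
print(mod_sub(4, 6, p))   # 5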

2.1.3 Design of the prime field modulo multiplier

Figure 2.4: Top level diagram of the prime field multiplication circuit

There are many algorithms implementing modular multiplication presented in
the literature. We chose one that directly follows from the basic
shift-and-add algorithm. It makes use of the linearity of the mod operator
and reduces the partial product at each iteration of its execution. The
algorithm's pseudocode is presented below (Algorithm 2.1.3):
Algorithm 2.1.3 Conventional Interleaved Modular Multiplication (Left-to-
Right) [3]

Input: X, Y are n-bit vectors such that X, Y ∈ [1, p − 1], with each bit
given by X(i), Y (i) ∈ {0, 1}.
Output: Z = (X · Y ) mod p

1. Z ← 0

2. For i from n − 1 downto 0 do:

2.1. U ← 2Z
2.2. V ← Y (i) · X
2.3. W ← U + V
2.4. Z ← W mod p

3. Return Z

To compute the mod product, this algorithm follows the basic shift and
add algorithm from usual multiplication (lines 2.1, 2.2 and 2.3) then reduces
the partial product modulo p (line 2.4) and repeats these steps for all multi-
plier bits Y(i) (loop in line 2). The usual shift and add multiplication of two
vectors involves the following:

• A multiplication of the multiplicand by bit y(i).

• Shifting the result.

• Then, adding the shifted result to the previous partial product.

But we are doing multiplication in GF(p), which means our result needs to
be in the range [0, p−1]. Hence the mod reduction in line 2.4, which takes
the remainder of the usual multiplication partial product upon division by
the modulus p. This is possible because the mod operator is linear, that is,
the remainder of the sum is equal to the sum of the remainders.
This guarantees that the partial product will be brought back into the range
[0, p−1] whenever it exceeds it. In fact, it will never exceed the range by
more than 2p. To see that this is indeed valid (and to illustrate the
algorithm), we consider the worst case for a 3-bit multiplication example:
let the modulus p = 7, X = 110 and Y = 110. The results of each step are
presented in Table 2.1.

i      U      V      W      Z
2    000    110    110    110
1   1100    110  10010    100
0   1000    000   1000    001

Table 2.1: An example multiplication using Algorithm 2.1.3

If we track the values of W, we see that it never exceeds the range by more
than 2p. This gives us insight into how we should implement the reduction
in line 2.4. We distinguish 3 cases:
Case 1: W is already in the range.
Case 2: W exceeds the range but W < 2p. In this case we need to subtract p
to bring W back into the range.
Case 3: W ≥ 2p. In this case we need to subtract 2p to bring W back into
the range.
This tells us that the result is a selection between the results of the above
3 cases. To implement this in hardware, we use 2 subtractors to compute
W − p and W − 2p, whose results feed into the multiplexer along with the
unreduced W. To choose which value is fed into Z, the select lines of the
multiplexer are simply the sign bits of the subtractors' outputs, with s1
being the sign bit of W − p and s2 the sign bit of W − 2p. There are again
3 cases:
Case 1: s1s2 = 00, meaning neither subtraction makes the result negative,
which means that we choose W − 2p.
Case 2: s1s2 = 01, meaning that W − 2p is negative while W − p is not, so we
select the latter.
Case 3: s1s2 = 11, both subtractions make the result negative, which means
W is already in the range and so we select W.
A case where s1s2 = 10 is impossible because it would imply that subtracting
p from W makes the result negative while subtracting 2p does not. In
hardware, line 2.1 is simply a shift left operation. Line 2.2 is

done using AND gates as shown in the figure. To index into Y we use a shift
register, take its MSB and feed it into the AND gates. Line 2.3 is simply an
adder. All that remains is to implement the loop in line 2. We use a counter
along with an FSM with two states: an Idle state to initialize the registers
X, Y and Z. The FSM stays in the idle state until the Start input is asserted.
Thereafter it rolls to the ”compute” state where the operations in 2.1, 2.2,
2.3 and 2.4 are carried out concurrently. The FSM keeps entering this state
until the value of the counter reaches 0, then it goes back to the idle state
where the ready output signal is asserted announcing the end of execution.
The circuit is presented in Figure 2.5

Figure 2.5: Circuit diagram of the prime field multiplier
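The following Python model (ours, written only to cross-check the hardware,
not the report's VHDL) mirrors Algorithm 2.1.3 together with the three-way
selection among W, W − p and W − 2p described above; it reproduces the result
of Table 2.1 and the simulation example of chapter 3.

# Software model of Algorithm 2.1.3 (interleaved modular multiplication,
# left to right), with the reduction implemented as the 3-way selection
# among W, W - p and W - 2p described above. Sketch, not the report's VHDL.

def interleaved_mod_mul(x: int, y: int, p: int, n: int) -> int:
    z = 0
    for i in reversed(range(n)):          # process multiplier bits MSB first
        u = z << 1                        # line 2.1: U <- 2Z (left shift)
        v = x if (y >> i) & 1 else 0      # line 2.2: V <- Y(i) * X
        w = u + v                         # line 2.3: W <- U + V
        w_p, w_2p = w - p, w - 2 * p      # the two subtractor outputs
        if w_2p >= 0:                     # s1 s2 = 00
            z = w_2p
        elif w_p >= 0:                    # s1 s2 = 01
            z = w_p
        else:                             # s1 s2 = 11: W already in range
            z = w
    return z

print(interleaved_mod_mul(0b110, 0b110, 7, 3))   # 1  (Table 2.1 example)
print(interleaved_mod_mul(107, 100, 131, 8))     # 89 (simulation example)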

2.1.4 Design of the prime field modulo inverter

Figure 2.6: Top level diagram of the prime field inversion circuit

To find the multiplicative inverse of an integer a modulo p, that is, the
integer x such that a × x mod p = 1, several algorithms have been developed.
These include the Itoh-Tsujii algorithm [4] (which relies on repeated
exponentiation), the extended Euclidean algorithm (which relies on reduction
by division), the binary extended Euclidean algorithm, etc. These algorithms
have been explained extensively in the literature, so we will present only
the last one, the one relevant to our work. Its pseudocode is presented in
Algorithm 2.1.4.
Algorithm 2.1.4 Binary extended Euclidean algorithm for inversion in GF(p)
[2]

Input: A prime p and an integer a ∈ [1, p − 1].


Output: R = a^-1 mod p

1. u ← a, v ← p, x ← 1, y←0

2. While u ̸= 0, do:

2.1. While u is even:


2.1.1. u ← u/2
2.1.2. If x is even then x ← x/2, else x ← (x + p)/2
2.2. While v is even:
2.2.1. v ← v/2
2.2.2. If y is even then y ← y/2, else y ← (y + p)/2
2.3. If u ≥ v then:
2.3.1. u ← u − v
2.3.2. If x > y then x ← x − y, else x ← x + p − y
2.4. Else:
2.4.1. v ← v − u
2.4.2. If y > x then y ← y − x, else y ← y + p − x

3. If u = 1, then R ← x mod p

4. Else if v = 1, then R ← y mod p

5. Return R. At this point, R = a^-1 mod p.

16
The algorithm uses 4 variables x, y, u and v. They are first initialized as
specified in line 1. Then the algorithm enters the main while loop and stays
there until u is reduced to 0. Inside this loop there are 2 consecutive while
loops, followed by a conditional assignment that depends on the values of
u, v, x and y computed in the previous two loops. To better illustrate the
algorithm, we go through an example with inputs a = 4 and p = 7 and compute
the values of the variables at each step. The results are summarized in
Table 2.2; the final value of y, 2, is indeed the inverse of 4 modulo 7.

u v x y
4 7 1 0
2 7 4 0
1 6 2 5
1 2 2 4
0 1 2 2
Table 2.2: Table of values for u, v, x, and y
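A direct software transcription of Algorithm 2.1.4 (ours, for cross-checking
the datapath, not the report's VHDL) reproduces the result of the trace above
and the inversion simulated in chapter 3.

# Software model of Algorithm 2.1.4 (binary extended Euclidean algorithm
# for inversion in GF(p)). Sketch for cross-checking the hardware,
# not the report's VHDL.

def gfp_inverse_euclid(a: int, p: int) -> int:
    u, v, x, y = a, p, 1, 0
    while u != 0:
        while u % 2 == 0:                              # while u is even
            u //= 2
            x = x // 2 if x % 2 == 0 else (x + p) // 2
        while v % 2 == 0:                              # while v is even
            v //= 2
            y = y // 2 if y % 2 == 0 else (y + p) // 2
        if u >= v:
            u -= v
            x = x - y if x > y else x + p - y
        else:
            v -= u
            y = y - x if y > x else y + p - x
    # when the loop exits, v = 1 and y holds the inverse
    return y % p

print(gfp_inverse_euclid(4, 7))     # 2, matching Table 2.2
print(gfp_inverse_euclid(45, 103))  # 87, matching the simulation in chapter 3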

To preserve the order, loop structure and interdependency of the steps our
FSM must do the following: Have a state where only u and x are changed as
specified by lines 2.1.1 and 2.1.2 until u is no longer even. The FSM should
then enter a new state where only v and y this time are changed as specified
by lines 2.2.1 and 2.2.2 until v is no longer even. After these two loops are
executed the FSM should enter a state where the values of U,V,X and Y
are ready and the selective assignment can take place as specified by lines
2.3 through 2.4.2. We label the previously outlined states the "while u even",
"while v even" and "u v comp" states respectively. These come in addition to
the "idle" state, where the registers are initialized, the start input signal
is checked to begin execution, and the ready signal is asserted. This way,
the FSM enforces the nested loop structure of the algorithm and the flow of
execution. The state diagram is shown in Figure 2.7.
To use the FSM, we let it control the values that the registers should take
in each state, by using the state register as the select line of the
multiplexer that feeds the registers from the datapath, as shown in the
conceptual diagram in Figure 2.8.

Figure 2.7: State diagram of the FSM controller

Figure 2.8: Conceptual diagram of a modulo inverter

We now specify the register values and derive the datapath. To do that, let
us track the possible values of each register/variable individually:

The U register: It takes u/2 if the FSM is inside the while-u-even state,
retains its value inside the while-v-even state, and in the u-v-comparison
state it receives u − v if u ≥ v, otherwise it retains its value.
Division by 2 is a shift right operation, so the datapath for this register
simply consists of MUXes for selection, a shifter and a subtractor. The
circuit is shown in Figure 2.9.

Figure 2.9: Circuit diagram for register U datapath

The V register: Similar to the U register, the V register takes v/2 if the
FSM is inside the while-v-even state, retains its value if the FSM is
inside the while-u-even state, and in the u-v-comparison state it takes
v − u if the u ≥ v condition is false, retaining its value otherwise. The
circuit is shown in Figure 2.10.

The X register: In the while-u-even state, the x register takes x/2 if x is
even, or (x + p)/2 otherwise; it retains its value in the while-v-even state.
In the u-v-comparison state it takes x − y if x > y; if not, it receives
x + p − y. The circuit is shown in Figure 2.11.

The Y register: In the while-v-even state, the y register takes y/2 if y is
even, or (y + p)/2 otherwise; in the while-u-even state it retains its value,
and in the u-v-comparison state a selection similar to that of register x
determines its value. The circuit is shown in Figure 2.12.

Figure 2.10: Circuit diagram for register V datapath

Figure 2.11: Circuit diagram for register X datapath

Figure 2.12: Circuit diagram for register Y datapath

2.2 Binary field arithmetic circuits


This section concerns the design of binary field arithmetic circuits; adder,
subtractor, multiplier and inverter over binary fields.

2.2.1 Design of the binary field adder and subtractor


The design of a binary field adder or subtractor (since they are exactly the
same operation) is very simple. This is because the entire adder/subtractor
is nothing more than a set of bitwise XOR gates.

2.2.2 Design of the binary field multiplier


Algorithm 2.2.2 takes in m-bit binary vectors a(z) and b(z) and the
irreducible polynomial f(z), and returns c(z) = a(z)b(z) mod f(z).
Algorithm 2.2.2 Right-to-left shift-and-add field multiplication in GF(2^m)
[2]

Figure 2.13: Top level diagram of the binary field multiplication circuit

Input: Binary polynomials a(z) and b(z) of degree at most m − 1.


Output: c(z) = a(z) · b(z) mod f (z).

1. If a0 = 1 then c ← b; else c ← 0.

2. For i from 1 to m − 1

2.1. b ← b · z mod f (z)


2.2. If ai = 1 then c ← c + b

3. Return c.

In essence, the algorithm shifts b one bit to the left, then accumulates it
into c if a_i = '1'; except that when we shift b, it could exceed the range
of GF(2^m), in which case we reduce it by f(z) (i.e., XOR it with f(z)). This
explains the implementation of the mod operation in line 2.1.
The datapath is therefore apparent. Now, note that the operations inside
the loop can be performed in one clock cycle. We therefore need two states
for the control path: the idle state, where we load the registers with their
initial values, and an operation state, where we perform the necessary
operations (shift, XOR, decrement) based on the relevant conditions. The FSM
simply keeps re-entering the operation state to loop over all the a register
indices. To index into a, we simply shift it right and process its LSB.
For its simplicity, we leave the circuit out.
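For reference, a direct Python transcription of Algorithm 2.2.2 (ours, not
the report's VHDL) is given below, with field elements and f(z) encoded as
integers, one bit per coefficient; it reproduces the GF(2^8) product used in
the simulation of chapter 3.

# Software model of Algorithm 2.2.2 (right-to-left shift-and-add
# multiplication in GF(2^m)). Field elements and f(z) are encoded as
# integers, one bit per coefficient. Sketch, not the report's VHDL.

def gf2m_shift_add_mul(a: int, b: int, f: int, m: int) -> int:
    c = b if a & 1 else 0                 # step 1
    for i in range(1, m):                 # step 2
        b <<= 1                           # step 2.1: b <- b * z
        if b >> m:                        # degree reached m: reduce by f
            b ^= f                        # i.e. b <- b mod f(z)
        if (a >> i) & 1:                  # step 2.2: accumulate into c
            c ^= b
    return c

# GF(2^8) example from the simulation chapter:
# f = z^8 + z^7 + z^6 + z + 1 (111000011), a = 10101010, b = 01010101
print(gf2m_shift_add_mul(0b10101010, 0b01010101, 0b111000011, 8))  # 189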

Figure 2.14: Top level diagram of the binary field inversion circuit

2.2.3 Design of the binary field inverter


Algorithm 2.2.3 makes it possible to find the inverse of any nonzero element
a in a binary field GF(2^m) with an irreducible polynomial f.
Algorithm 2.2.3 Binary algorithm for inversion in GF(2^m) [2]

Input: A nonzero binary polynomial a of degree at most m − 1.


Output: a^-1 mod f

1. u ← a, v←f

2. x ← 1, y←0

3. While u ̸= 1 and v ̸= 1, do:

3.1. While z divides u, do:


3.1.1. u ← u/z
3.1.2. If z divides x, then x ← x/z; else x ← (x + f )/z
3.2. While z divides v, do:
3.2.1. v ← v/z
3.2.2. If z divides y, then y ← y/z; else y ← (y + f )/z
3.3. If deg(u) > deg(v), then:

u ← u + v, x←x+y

Else:
v ← v + u, y ←y+x

4. If u = 1, then return x; else return y

We note that this is very similar to algorithm 2.1.4 for prime field inver-
sion. The difference is that the addition here is just bitwise XORing and that
we are comparing the degrees of the polynomials instead of their magnitudes.
Therefore, the implementation will be very similar.
The FSM is exactly the same in both designs. The register datapaths, here,
will have XOR gates instead of adders. What might be a bit tricky is the
implementation of the comparison in line 3.3. Explicitly computing the
degrees of each polynomial and then performing a comparison is quite
expensive in hardware. To work around this, we reasoned as follows: if two
vectors u and v have the same degree, then u XOR v will be smaller than both
u and v. If, instead, polynomial u has a greater degree than v, then XORing
them must yield a polynomial that is greater than v. This is demonstrated in
Figure 2.15.

Case deg(u) = deg(v):
u     = 0 0 · · · 0 1 x x · · · x
v     = 0 0 · · · 0 1 x x · · · x
u ⊕ v = 0 0 · · · 0 0 x x · · · x     (u ⊕ v is smaller than both u and v)

Case deg(u) > deg(v):
u     = 0 0 · · · 0 1 x x · · · x
v     = 0 0 · · · 0 0 x x · · · x
u ⊕ v = 0 0 · · · 0 1 x x · · · x     (u ⊕ v > v)

Figure 2.15: Bitwise comparison of u⊕v under degree equality and inequality

Hence, to implement line 3.3, we need only check if u XOR v > v.
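A software model of Algorithm 2.2.3 (ours, not the report's VHDL) is given
below; for clarity, this sketch compares degrees directly through bit
lengths, whereas the hardware uses the u ⊕ v comparison described above. It
reproduces the inverse of Example 1.

# Software model of Algorithm 2.2.3 (binary inversion in GF(2^m)).
# Field elements are integers, one bit per coefficient. Degrees are
# compared directly via bit lengths in this sketch. Not the report's VHDL.

def gf2m_inverse(a: int, f: int) -> int:
    u, v = a, f
    x, y = 1, 0
    while u != 1 and v != 1:
        while u & 1 == 0:                               # while z divides u
            u >>= 1
            x = x >> 1 if x & 1 == 0 else (x ^ f) >> 1  # x <- x/z or (x+f)/z
        while v & 1 == 0:                               # while z divides v
            v >>= 1
            y = y >> 1 if y & 1 == 0 else (y ^ f) >> 1
        if u.bit_length() > v.bit_length():             # deg(u) > deg(v)
            u ^= v
            x ^= y
        else:
            v ^= u
            y ^= x
    return x if u == 1 else y

# Example 1 from chapter 1: in GF(2^6) with f = z^6 + z^5 + z^2 + z + 1,
# the inverse of 110101 is 001001, i.e. 9.
print(gf2m_inverse(0b110101, 0b1100111))   # 9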

Chapter 3

Implementation
In this chapter we present the simulation results of the designs presented in
the previous chapter. We also examine and compare the performance and
used resources of both prime and binary field designs.

3.1 Simulation of the prime field multiplier


The design takes m clock cycles to finish execution. The simulation results
of the prime field multiplier design with an example multiplication are shown
in Figure 3.1.

Figure 3.1: Simulation results of the prime field multiplication circuit

The inputs are a = 107 and b = 100, with a prime modulus p = 131 and an
operand size of m + 1 = 8 bits. We can see that upon asserting the start
signal, the output starts to go through different values following the
algorithm (this can easily be verified by carrying out the algorithm by hand)
until, after 8 clock cycles, it converges to the value 89, which is indeed
equal to 107 × 100 mod 131.
The ready output signal is finally asserted, announcing the end of execution.

3.2 Simulation of the prime field inverter


The circuit takes at worst 2m + 1 clock cycles for the output register to
converge to the inverse, where m is the width of the input to be inverted.
An example is shown in Figure 3.2.

Figure 3.2: Simulation results of the inversion for inputs a = 45 and p = 103

We can see that upon asserting the start signal the output starts to go
through different values following the algorithm (this can easily be verified
by carrying out the algorithm by hand, checking the values of registers U, V,
X, Y) until, after 14 clock cycles, it converges to the value 87, which is
indeed the multiplicative inverse of 45 mod 103.
The ready output signal is finally asserted, announcing the end of execution.

3.3 Simulation of the binary field multiplier


The design takes m clock cycles for the output register to converge to the
result, where m is the width of the inputs. Simulation results are shown in
Figure 3.3.
Here we are working in GF(2^8) with the irreducible polynomial
f = 111000011. The inputs are a = 10101010 (170 in decimal) and
b = 01010101 (85 in decimal).

Figure 3.3: Simulation results of the binary multiplication circuit

We can see that upon asserting the start signal, the output starts to go
through different values following the algorithm until, after 6 clock cycles,
it converges to the value 189 (binary 10111101), which is indeed equal to the
product of a and b in this field.
The ready output signal is finally asserted, announcing the end of execution.

3.4 Simulation of the binary field inverter


The circuit takes at worst 2m + 1 clock cycles for the output register to
converge to the inverse, where m is the width of the input to be inverted. An
example is shown in figure 3.4.
The values used follow Example 1, where the finite field is GF(2^6) and the
input to invert is a = 110101 (53 in decimal).

Figure 3.4: Simulation results of the binary inversion circuit

We can see that upon asserting the start signal, the output starts to go
through different values following the algorithm until, after 13 clock
cycles, it converges to the value 9 (binary 001001), which is indeed correct,
as shown in Example 1. The ready output signal is finally asserted,
announcing the end of execution.

3.5 Performance
The designs presented in the previous sections have been implemented us-
ing VHDL to target the Xilinx Artix 7 FPGA.
The performance results and used resources for different NIST recommended
field sizes of both prime and binary field multipliers and inverters are sum-
marized in tables 3.1, 3.2, 3.3, and 3.4 respectively.

Bit-width    LUTs    FMAX (MHz)
192 bits      508       84.44
256 bits      669       41.98
512 bits     1314       20.06

Table 3.1: Prime field multiplier performance and used resources

Bit-width    LUTs    FMAX (MHz)
192 bits      774      100.68
233 bits      924       85.46

Table 3.2: Binary field multiplier performance and used resources

Overall, the binary field multiplier achieves better frequency and consumes
fewer resources. This is due to the simplicity of the algorithm and datapath,
since they only consist of XOR gates and multiplexers, whereas the prime
field multiplier datapath contains adders as well, which introduces more
delay, since adders consume more look-up tables (LUTs).

Bit-width    LUTs    FMAX (MHz)
192 bits     2698       50.46
256 bits     3589       40.08
512 bits     7131       30.97

Table 3.3: Prime field inversion performance and resource usage

Bit-width    LUTs    FMAX (MHz)
150 bits      710       69.18
192 bits      908       68.92
233 bits     1101       58.67

Table 3.4: Binary field inversion performance and resource usage

From Tables 3.3 and 3.4 we can see that, again, the binary field inverter
outperforms its prime field counterpart and consumes fewer resources.
Once again, this is expected, especially since both circuits implement
essentially the same algorithm (Algorithms 2.1.4 and 2.2.3). However, since
the associated binary field operations are much easier to perform (a XOR
compared to an adder), the critical path in the binary field inversion
circuit is shorter. This further confirms the idea that binary fields are
generally more efficient and consume less area.

To visualize how operand size affects frequency and resource consumption, we
present in Figures 3.5 and 3.6, as an example, the prime field multiplier's
maximum frequency and LUT usage as functions of operand size.

Figure 3.5: Graph showing maximum operating frequency vs operand size for
prime field multiplier circuit

The reason the curve slightly changes slope is that, using the synthesis
tools, we can only approximate the maximum operating frequency of the design
by adjusting the timing constraints that the synthesis tool should try to
achieve. For example, if we tighten the timing constraints, the routing tools
may "work harder" to achieve a better maximum frequency, while if we loosen
the constraints, they may achieve only a suboptimal frequency. Therefore,
this is not an exact curve of maximum frequency vs size. However, it reveals
that the maximum frequency of this multiplier design is roughly related to
size through a linear fit.

It should be noted that a full comparison that takes into account other
parameters, such as security and power consumption, is beyond the scope of
our work.

Figure 3.6: Graph showing LUT consumption vs operand size for prime field
multiplier circuit

Conclusion
In our work, overall, we achieved good performance in the individual designs.
We showed that binary field arithmetic circuits generally consume fewer
resources and achieve a better maximum operating frequency. However, it
should be noted that the inversion circuits achieve a significantly lower
maximum frequency than the other designs, which would introduce a bottleneck
if they were integrated within a larger system. This is a limitation of our
work. As future work, we would first make use of optimization techniques such
as pipelining to increase the frequency and avoid bottlenecks, then use our
designs to implement the EC group operations, which would in turn implement
an ECC protocol such as key exchange or digital signature generation and
verification. To make our designs more broadly useful, we could make use of
the processor built into the Virtex 4 FPGA board to obtain a functioning
System on Chip that serves as an ECC core.

References
[1] P. P. Chu. RTL Hardware Design Using VHDL: Coding for Efficiency,
Portability, and Scalability. IEEE Press. Wiley, 2006. ISBN: 9780471786399.
URL: https://books.google.dz/books?id=gVd2yeFHshUC.
[2] Darrel Hankerson, Alfred J. Menezes, and Scott Vanstone. Guide to
Elliptic Curve Cryptography. Springer New York, 2004.
[3] MD. Mainul Islam et al. "Area-Time Efficient Hardware Implementation of
Modular Multiplication for Elliptic Curve Cryptography". In: IEEE Access 8
(2020), pp. 73898-73906. DOI: 10.1109/ACCESS.2020.2988379.
[4] Toshiya Itoh and Shigeo Tsujii. "A fast algorithm for computing
multiplicative inverses in GF(2^m) using normal bases". In: Information and
Computation 78.3 (1988), pp. 171-177. ISSN: 0890-5401.
DOI: 10.1016/0890-5401(88)90024-7.
URL: https://www.sciencedirect.com/science/article/pii/0890540188900247.
[5] T. R. Shemanske. Modern Cryptography and Elliptic Curves. Student
Mathematical Library. American Mathematical Society, 2017.
ISBN: 9781470435820. URL: https://books.google.dz/books?id=TQIvDwAAQBAJ.

Appendix
Definition 15. A field is a set F with two binary operations called addition,
denoted +, and multiplication, denoted ·, satisfying the following field
axioms:

FA0 (Closure under Addition) For all x, y ∈ F, the sum x + y is in F.

FA1 (Closure under Multiplication) For all x, y ∈ F, the product x · y is
in F.

FA2 (Commutativity of Addition) For all x, y ∈ F, x + y = y + x.

FA3 (Associativity of Addition) For all x, y, z ∈ F, (x + y) + z = x + (y + z).

FA4 (Additive Identity) There exists an element 0 ∈ F such that
x + 0 = 0 + x = x for all x ∈ F.

FA5 (Additive Inverses) For every x ∈ F, there exists y ∈ F such that
x + y = y + x = 0. The element y is called the additive inverse of x and is
denoted −x.

FA6 (Commutativity of Multiplication) For all x, y ∈ F, x · y = y · x.

FA7 (Associativity of Multiplication) For all x, y, z ∈ F,
(x · y) · z = x · (y · z).

FA8 (Multiplicative Identity) There exists an element 1 ∈ F such that
x · 1 = 1 · x = x for all x ∈ F.

FA9 (Multiplicative Inverses) For every x ∈ F such that x ≠ 0, there exists
y ∈ F such that x · y = y · x = 1. The element y is called the multiplicative
inverse of x, and is denoted x^-1 or 1/x.

FA10 (Distributivity) For all x, y, z ∈ F, multiplication distributes over
addition: x · (y + z) = x · y + x · z.

FA11 (Distinct Identities) 1 ≠ 0.

