FPGA Implementations of
the ICEBERG Block Cipher
François-Xavier Standaert, Gilles Piret, Gael Rouvroy, Jean-Jacques Quisquater
UCL Crypto Group, Place du Levant, 3, B-1348 Louvain-La-Neuve, Belgium.
e-mail: standaert,piret,rouvroy,quisquater@dice.ucl.ac.be
Abstract: This paper presents FPGA (Field Programmable Gate Array)
implementations of ICEBERG, a block cipher designed for reconfigurable
hardware implementations and presented at FSE 2004. All its components are
involutional and allow very efficient combinations of encryption/decryption.
The proposed implementations also allow changing the key and the
Encrypt/Decrypt (E/D) mode for every plaintext, without any performance
loss. In comparison with other recent block ciphers, the implementation
results of ICEBERG show a significant improvement of hardware efficiency.
Moreover, the key and E/D agility allows considering new encryption modes to
counteract certain side-channel attacks.

I. INTRODUCTION

In October 2000, NIST (National Institute of Standards and Technology)
selected Rijndael as the new Advanced Encryption Standard. The selection
process included performance evaluation on both software and hardware
platforms. However, as implementation versatility was a criterion for the
selection of the AES, it appeared that Rijndael [...]

ICEBERG is a block cipher designed for efficient reconfigurable hardware
implementations. It is based on [...] input lookup tables¹ of FPGAs, and its
key scheduling allows [...]

The paper is structured as follows. Section 2 briefly presents the
specifications of ICEBERG and Section 3 describes our FPGA design
methodology. Section 4 lists the combinatorial cost of the block cipher
components. The implementation results for various architectures are in
Sect. 5 and comparisons with other block ciphers are in Sect. 6. Resistance
against side-channel analysis is briefly discussed in Sect. 7. Finally,
conclusions are in Sect. 8.

II. SPECIFICATIONS

A. Block and Key Size

ICEBERG operates on 64-bit blocks and uses a 128-bit key. It is an
involutional iterative block cipher based on the repetition of 16 identical
key-dependent round functions. In the next subsections, we briefly present
the algorithm. A more detailed description can be found in the original
paper [1].

[Figure: rows of S0 S-boxes alternating with P64 bit permutations.]
Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC’05)
0-7695-2315-3/05 $ 20.00 IEEE
$$
\begin{pmatrix} y_3 \\ y_2 \\ y_1 \\ y_0 \end{pmatrix}
=
\begin{pmatrix}
0 & 1 & 1 & 1 \\
1 & 0 & 1 & 1 \\
1 & 1 & 0 & 1 \\
1 & 1 & 1 & 0
\end{pmatrix}
\times
\begin{pmatrix} x_3 \\ x_2 \\ x_1 \\ x_0 \end{pmatrix}
$$

where every output bit is a ⊕ operation between three input bits. It is
therefore efficiently combined with the key addition inside a single 4-input
LUT.

C. The key schedule

The key scheduling process consists of key expansion and key selection.

The key expansion expands the cipher key $K$ into a sequence of keys
$K^0, K^1, \ldots, K^{16}$. We set the initial key $K^0 = K$. The following
keys are obtained by a keyround function so that
$K^{i+1} = \mathrm{keyround}(K^i)$.

The keyround is pictured in Fig. 2, where we distinguish a conditional shift
layer, bit permutations P128 (i.e. 128-bit wire crossings) and S-boxes S0.
The conditional shift operation depends on a round constant $C$ that is
discussed below.

[Fig. 2. The key round: SHIFT Left/Right, P128, a layer of S0 S-boxes, P128,
SHIFT Left/Right.]

Finally, the key selection first performs a simple compression function that
selects the 64 bits of $K^i$ having odd indices. Thereafter, a 4 × 4 key
selection box is applied in parallel to every 4-bit key block. It performs
the following boolean operation, where $\overline{sel}$ is the complement of
the selection bit (the overline was lost in the second term of each line and
is restored here):

$$
\begin{aligned}
y(0) &= \big((x(0) \oplus x(1) \oplus x(2)) \cdot sel\big) \vee \big((x(0) \oplus x(1)) \cdot \overline{sel}\big) \\
y(1) &= \big((x(1) \oplus x(2)) \cdot sel\big) \vee \big(x(1) \cdot \overline{sel}\big) \\
y(2) &= \big((x(2) \oplus x(3) \oplus x(0)) \cdot sel\big) \vee \big((x(2) \oplus x(3)) \cdot \overline{sel}\big) \\
y(3) &= \big((x(3) \oplus x(0)) \cdot sel\big) \vee \big(x(3) \cdot \overline{sel}\big)
\end{aligned}
$$

Depending on the value of the selection bit $sel$, we obtain the round key
$RK_0^i$ or $RK_1^i$ for round $i$.

D. Encryption/decryption process

The complete cipher consists of an initial round key addition, 15 rounds and
a final transform. Due to the involutional structure of every single
component of ICEBERG, the E/D mode is fixed with the selection bit only:
sel = 1 in encryption and sel = 0 in decryption. In pseudo C, we have:

    ICEBERG(state,cipherkey,sel)
    {
      KeyExpansion(cipherkey,expandedkey[0..16]);
      for (i=0;i<16;i++)
      {
        KeySelection(expandedkey[i],sel,roundkey[i]);
      }
      KeySelection(expandedkey[16],not(sel),roundkey[16]);
      AddRoundKey(state,roundkey[0]);
      for (i=1;i<16;i++)
      {
        Round(state,roundkey[i]);
      }
      NonLinearLayer(state);
      AddRoundKey(state,roundkey[16]);
    }

The round constants are: C = 0 until round 8, C = 1 thereafter. A particular
structure of the expanded key is therefore obtained:

$$
K^0 = K^{16}, \qquad K^1 = K^{15}, \qquad \ldots \qquad (1)
$$

As a consequence, ICEBERG allows encryption and decryption with exactly the
same hardware (only the selection bit has to be changed), and the expanded
key may be derived "on the fly" in both encryption and decryption (the
storage of round keys is not necessary). More details about this particular
structure are available in the FSE 2004 paper [1].

III. DESIGN METHODOLOGY

Present reconfigurable components like FPGAs are usually made of
reconfigurable logic blocks combined with fast access memories (RAM blocks)
and high speed arithmetic circuits [2], [3]. Basic logic blocks of FPGAs
include a 4-input function generator (called lookup table, LUT) and a
storage element. In addition, most FPGA manufacturers provide users with
fast carry logic and particular structures of the logic blocks to
efficiently implement distributed memories, shift registers, etc. A brief
description of these components is given in the Appendix.

As reconfigurable components are divided into logic elements and storage
elements, an efficient implementation results from a good compromise between
the combinatorial logic used, the sequential logic used and the resulting
performance. These observations lead to different definitions of
implementation efficiency:

1) In terms of performance, let the efficiency of a block cipher be the
   ratio Throughput (Mbits/s) / Area (LUTs, RAM blocks).
2) In terms of resources, the efficiency is easily tested by computing the
   ratio Nbr of registers / Nbr of LUTs: it should be close to one.

ICEBERG was designed in order to allow very efficient FPGA implementations
and our architectures are defined in order to maximize these notions of
hardware efficiency. In practice, this results in the pipelining of the
round and keyround functions. Pipelining increases the encryption speed by
processing multiple blocks of data simultaneously. It is achieved by
inserting rows of registers among combinatorial logic. Parts of logic
between two consecutive registers form [...]
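The 4 × 4 key selection box of Section II-C can be sketched in C as follows. This is a minimal illustration and not the authors' code: the function name key_select_box and the packing of x(0)..x(3) into the low four bits of a byte are assumptions made for the example. It also assumes that the second term of each equation is gated by the complement of sel, which is the reading under which the selection bit actually chooses between two round keys.

```c
#include <stdint.h>

/* Hypothetical helper (name and bit packing are assumptions):
 * bit i of `x` holds x(i), bit i of the result holds y(i).
 * `sel` is the selection bit: nonzero selects the first term of each
 * equation, zero selects the term assumed to be gated by NOT sel. */
static uint8_t key_select_box(uint8_t x, int sel)
{
    uint8_t x0 = x & 1u;
    uint8_t x1 = (x >> 1) & 1u;
    uint8_t x2 = (x >> 2) & 1u;
    uint8_t x3 = (x >> 3) & 1u;

    uint8_t y0 = sel ? (uint8_t)(x0 ^ x1 ^ x2) : (uint8_t)(x0 ^ x1);
    uint8_t y1 = sel ? (uint8_t)(x1 ^ x2)      : x1;
    uint8_t y2 = sel ? (uint8_t)(x2 ^ x3 ^ x0) : (uint8_t)(x2 ^ x3);
    uint8_t y3 = sel ? (uint8_t)(x3 ^ x0)      : x3;

    return (uint8_t)(y0 | (y1 << 1) | (y2 << 2) | (y3 << 3));
}
```

For instance, key_select_box(0x1, 1) yields 0xD while key_select_box(0x1, 0) yields 0x1, so the same 4-bit key block gives two different round key nibbles depending on sel.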
[Table fragment, combinatorial cost of the components; units and any
remaining rows are not recoverable:]

Component               Cost
Linear diffusion layer  64
Round                   256
Keyround                384
Selection layer         64

Remark that if the maximum pipeline is not inserted, the shift layers can be
efficiently implemented inside the Virtex [...]

V. IMPLEMENTATION RESULTS

[...] of the key and E/D mode for every plaintext. The area and [...]
implementation with Xilinx ISE 6.1 on the Xilinx Virtex-II technology. The
timing constraints were applied to the inner clock and we used the
input-output (IO) registers embedded into the FPGA IOBs² in order to take
the interface constraints into account. It is important to note that the
limiting factor of our work frequencies was always the input-output
management. As an illustration, the internal clock of the fully pipelined
unrolled implementation without IO registers is close to the maximum
(380 MHz), but if IO registers are considered, it decreases to 297 MHz,
which we believe to be a fair frequency estimate.

A. Unrolled architectures

For high throughput applications, we propose an unrolled implementation with
the 16 rounds implemented, and we applied two pipelining strategies. If a
maximum throughput is required, a full pipe implementation is provided, with
the [...]

[Fig. 3. Unrolled architectures: full pipe and half pipe.]

B. Loop architectures

In applications requiring minimum area, we propose a loop architecture with
only one round implemented. In order to decrease the area requirements, we
only considered the half pipe strategy. In addition to the efficiency
advantages already mentioned, half pipe structures are especially convenient
for loop architectures because they allow the combination of the loop
multiplexer with the round and keyround logic. Our proposal is pictured in
Fig. 4, where we share the initial and final key addition. As for unrolled
architectures, it is possible to use the FPGA RAM blocks to implement the
round S-box. The implementation results for these loop architectures are
provided in Table II.

C. Feedback modes

As soon as a feedback mode is used, pipelining techniques are not relevant
for block cipher implementations. This is due [...]

² IOBs: Input-Output Blocks.
[Fig. 5. Feedback mode: unrolled and loop architectures.]

TABLE III
FEEDBACK MODE RESULTS ON VIRTEX-II.

Type      # of    # of   Latency   Out. every  Freq.  Throughput
          slices  RAMBs  (cycles)  (cycles)    (MHz)  (Mbits/sec)
Unrolled  3174    0      1         1           14     896
Loop      571     0      17        1/16        147    588
RAM       467     4      17        1/16        145    580

[...] provide key and E/D agility: two properties that are never combined in
other block cipher implementations³. The [...]

[...] allows very efficient implementation opportunities and having [...]

[...] [17] that actual computers and microchips leak information correlated
to the data handled. Side-channel attacks based on time, power and
electromagnetic measurements were successfully applied to smart card
implementations of block ciphers. Protecting implementations against
side-channel attacks is usually difficult and expensive. Masking all the
data with random boolean values is suggested in several papers [18], [19]
and the use of small substitution tables allows this to be efficiently
implemented, although it is still an expensive solution (the additional cost
of masking a $2^n$-bit table is another $2^{2n}$-bit table).

³ Except in Triple-DES.
⁴ The ICEBERG S-box memory requirements are: $(2^4 \times 4) \times 6 = 384$
bits. If RAMBs are used, it becomes $2^8 \times 8 = 2048$ bits.
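The throughput column of Table III follows from the 64-bit block size, the clock frequency and the number of cycles between two output blocks. The sketch below reproduces that arithmetic. The function names are hypothetical, and reading the "Out. every" entry 1/16 as one output block every 16 cycles is an assumption about the table's notation.

```c
/* Throughput in Mbits/sec of a 64-bit block cipher core.
 * freq_mhz:         clock frequency in MHz (Table III, "Freq.")
 * cycles_per_block: clock cycles between two consecutive output blocks
 *                   (assumed: 1 for the unrolled design, 16 for the loop
 *                   designs) */
static double throughput_mbits(double freq_mhz, double cycles_per_block)
{
    const double block_bits = 64.0; /* ICEBERG block size */
    return block_bits * freq_mhz / cycles_per_block;
}

/* Performance efficiency in the sense of Section III, using slices as the
 * area unit because that is what Table III reports (an assumption: the
 * definition in the text is stated in LUTs and RAM blocks). */
static double mbits_per_slice(double mbits, unsigned slices)
{
    return mbits / (double)slices;
}
```

With the table's values, throughput_mbits(14, 1) gives 896, throughput_mbits(147, 16) gives 588 and throughput_mbits(145, 16) gives 580, matching the three rows.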
TABLE IV
BASIC FEATURES OF COMPARED BLOCK CIPHERS.
TABLE V
PERFORMANCES OF COMPARED BLOCK CIPHERS.
The key agility provided by ICEBERG (changing the key at every plaintext
block comes for free) also offers interesting opportunities to prevent
certain side-channel attacks by defining new encryption modes where the key
is changed sufficiently often. As most side-channel attacks need to collect
several leakage traces to remove the noise from useful information, changing
the key frequently, even in a well chosen deterministic way (e.g.
LFSR-based), could help to counteract these attacks, or at least make them
more difficult. A thorough analysis of side-channel resistance based on
re-keying techniques deserves further research.

VIII. CONCLUSION

We presented FPGA implementations of ICEBERG, a block cipher designed for
hardware implementations. In terms of area requirements, throughput and
hardware efficiency, ICEBERG compares favorably with most recent block
ciphers. The simplicity of the design is also considerably improved and
allows the fast development of an efficient architecture. In practice, an
unrolled (resp. loop) architecture has a throughput of 17.3 Gbits/sec (resp.
1.0 Gbits/sec), using 4946 FPGA slices (resp. 631 FPGA slices) in the Xilinx
Virtex-II technology. In addition, ICEBERG allows key and E/D agility. These
properties could be used to improve resistance against certain side-channel
attacks, although this last point is left for further research. Due to the
simplicity of its component functions, ICEBERG is also likely to exhibit
excellent implementation results in hardware in general (not only FPGAs).

REFERENCES

[1] F.-X. Standaert, G. Piret, G. Rouvroy, J.-J. Quisquater, J.-D. Legat,
    ICEBERG: an Involutional Cipher Efficient for Block Encryption in
    Reconfigurable Hardware, in the proceedings of FSE 2004, the Fast
    Software Encryption workshop, New Delhi, February 5-7 2004,
    Springer-Verlag.
[2] Xilinx: Virtex 2 FPGAs Data Sheet, http://www.xilinx.com.
[3] Altera: Stratix 1.5V FPGAs Data Sheet, http://www.altera.com.
[4] K. Gaj, P. Chodowiec, Fast Implementation and fair Comparison of the
    Final Candidates for the Advanced Encryption Standard using Field
    Programmable Gate Arrays, in the proceedings of the RSA Security
    Conference - Cryptographer's Track, San Francisco, CA, April 8-12, 2001,
    pp. 84-99.
[5] F.-X. Standaert, G. Rouvroy, J.-J. Quisquater, J.-D. Legat, Efficient
    [...]
[...]
[11] G. Rouvroy, F.-X. Standaert, J.-J. Quisquater, J.-D. Legat, Optimizing
    Cipher FPGA Implementations: DES and Triple DES, in the proceedings of
    FPL 2003, Lecture Notes in Computer Science, vol. 2778, pp. 181-193,
    Springer-Verlag, 2003.
[12] G. Rouvroy, F.-X. Standaert, J.-J. Quisquater, J.-D. Legat, Compact and
    Efficient Encryption/Decryption Module for FPGA Implementation of the
    AES Rijndael Very Well Suited for Small Embedded Applications, in the
    proceedings of ITCC 2004, Las Vegas, April 5-7 2004.
[13] Helion Encryption Cores: http://www.heliontech.com/
[14] P. Chodowiec, K. Gaj, Very Compact FPGA Implementation of the AES, in
    the proceedings of CHES 2003, Lecture Notes in Computer Science,
    vol. 2779, pp. 319-333, Springer-Verlag, 2003.
[15] A.J. Elbirt, W. Yip, B. Chetwynd, C. Paar, An FPGA Implementation and
    Performance Evaluation of the AES Block Cipher Candidate Algorithm
    Finalists, in the proceedings of the Third AES Candidate Conference,
    April 13-14 2000, New York, USA.
[16] B. Weeks, M. Bean, T. Rozylowicz, C. Ficke, Hardware Performance
    Simulations of Round 2 Advanced Encryption Standard Algorithms, NSA
    Final Report on AES candidates, available from
    http://csrc.nist.gov/CryptoToolkit/aes/round2/NSA-AESfinalreport.pdf
[17] P. Kocher, J. Jaffe, B. Jun, Differential Power Analysis, in the
    proceedings of CRYPTO 99, Lecture Notes in Computer Science 1666,
    pp. 388-397, Springer-Verlag.
[18] L. Goubin, J. Patarin, DES and Differential Power Analysis: The
    Duplication Method, in the proceedings of CHES 1999, Lecture Notes in
    Computer Science 1717, pp. 158-172, Springer-Verlag.
[19] S. Chari et al., Towards Sound Approaches to Counteract Power-Analysis
    Attacks, in the proceedings of CRYPTO 1999, Lecture Notes in Computer
    Science 1666, pp. 398-412, Springer-Verlag.
[20] S. Chari, J. Rao, P. Rohatgi, Template Attacks, in the proceedings of
    CHES 2002, Lecture Notes in Computer Science 2523, pp. 13-28,
    Springer-Verlag.

APPENDIX

All the implementation results provided in this paper were obtained using
Xilinx Virtex-II devices [2]. In general, FPGAs may be viewed as a "sea" of
programmable logic gates where both the logic and the routing are user
programmable. This section briefly describes these components.

The main element of the Xilinx Virtex-II devices is the Configurable Logic
Block (CLB) that is made up of two slices, each one divided into two Logic
Cells (LC). An LC includes a 4-input function generator, carry logic and a
storage element. The output from the function generator in each LC drives
both the CLB output and the D input of the flip-flop. Figure 6 shows a
simplified view of a single slice.

Virtex-II function generators are implemented as 4-input LUTs that can also
provide a 16 × 1-bit synchronous RAM or a 16-bit shift register. In
addition, the F5 multiplexer in each slice combines the LUT outputs. This
combination provides a function generator that implements any 5-input
function, a 4:1 multiplexer, a 32 × 1-bit synchronous RAM or selected
functions of up to nine bits. Similarly, the F6 multiplexer combines the
outputs of all four LUTs in the CLB by selecting one of the F5-multiplexer
outputs. Finally, the arithmetic logic includes fast carry chains and
additional logic gates (e.g. XORCY) to improve the efficiency of
adder/multiplier implementations.
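To illustrate how a 4-input LUT absorbs one row of the 4 × 4 diffusion matrix of Section II together with the key addition, the following C sketch models a LUT as a 16-entry truth table. It is a software model under stated assumptions and not a description of the authors' design: the names lut4, XOR4_INIT and diffuse_and_add_key are hypothetical, and the order in which the three data bits and the key bit are wired to the LUT inputs is arbitrary (the XOR function is symmetric).

```c
#include <stdint.h>

/* Model of a 4-input LUT: bit k of `init` is the output for input
 * pattern k (k = 0..15). */
static unsigned lut4(uint16_t init, unsigned in)
{
    return (init >> (in & 0xFu)) & 1u;
}

/* Truth table of a XOR b XOR c XOR d: output is 1 for input patterns
 * with an odd number of bits set. */
#define XOR4_INIT 0x6996u

/* One LUT per output bit: y_i = (XOR of the three x_j with j != i) XOR k_i.
 * Bit i of x, k and the result hold x_i, k_i and y_i respectively. */
static uint8_t diffuse_and_add_key(uint8_t x, uint8_t k)
{
    uint8_t y = 0;
    for (unsigned i = 0; i < 4; i++) {
        unsigned in = 0, pos = 0;
        for (unsigned j = 0; j < 4; j++) {
            if (j != i) {
                in |= ((unsigned)(x >> j) & 1u) << pos; /* data inputs 0..2 */
                pos++;
            }
        }
        in |= ((unsigned)(k >> i) & 1u) << 3;           /* key bit on input 3 */
        y |= (uint8_t)(lut4(XOR4_INIT, in) << i);
    }
    return y;
}
```

With a zero key the function is its own inverse on all 16 inputs, which is the involutional property the cipher relies on: for example, 0x1 maps to 0xE and 0xE maps back to 0x1.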