
Cryptacus 2018 Paper 4


Mixing Layers in Symmetric Crypto

Ko Stoffelen
Digital Security Group, Radboud University,
Nijmegen, The Netherlands
Email: k.stoffelen@cs.ru.nl

Abstract: This paper describes two related contributions to the area of designing lightweight mixing (diffusion) layers for symmetric cryptographic primitives such as permutations and block ciphers. First, we show how existing algorithms can provide cheaper implementations for MDS matrices than previously thought possible, by viewing an MDS matrix over a finite field as a binary matrix and performing global optimization. Then, we define column parity mixers as a generalization of the mixing layer used in KECCAK, and we study their interesting algebraic and diffusion properties. We show that column parity mixers are a suitable alternative to MDS matrices.

This very concise overview is based on two recently published papers [1], [2], which should be consulted for more details.

1. Introduction

Lightweight cryptography has been a major trend in symmetric cryptography in recent years. While it is not always clear when something counts as lightweight, the main goal can be summarized as very efficient cryptography. Here, efficiency ranges from small chip size to low latency and low energy consumption.

In light of this, researchers started to optimize the construction of many parts of permutations and block ciphers. More recently, the focus has shifted to the linear layers, and even more specifically to the implementation of maximum distance separable (MDS) matrices, that is, linear layers with an optimal branch number.

Starting with [3] and followed by a whole series of papers (e.g., [4], [5], [6]), researchers focused on finding MDS constructions that minimize the number of XOR operations needed for their implementation. Considering an n × n MDS matrix over a finite field $\mathbb{F}_{2^k}$ given as $A = (\alpha_{i,j})$, the aim was to choose the elements $\alpha_{i,j}$ such that implementing all of the multiplications $x \mapsto \alpha_{i,j} x$ in parallel becomes as cheap as possible. To compute the full product with A, those partial results have to be added together, which requires an additional number of XORs. So far, researchers have focused on local optimization, taking the cost of combining the parts as a given.

Global optimization of matrix multiplication is another extensively studied line of research. The problem is known to be NP-hard [7] and thus quickly becomes infeasible for increasing matrix dimensions. However, quite a number of heuristic algorithms for finding the shortest linear straight-line program, which corresponds exactly to minimizing the number of XORs, have been proposed in the literature [8], [9], [10].

We take several locally well-optimized MDS matrices from the literature and apply the known algorithms to all of them. This immediately leads to significant improvements: we often obtain an implementation using fewer XOR operations than what was previously considered a fixed cost.

Not all ciphers use (near-)MDS matrices for their diffusion. A different type of mixing layer is found as θ in KECCAK-f, the permutation underlying KECCAK [11]. This mixing layer θ has a branch number of 4 and requires only 2 XORs per bit. Despite the low cost, it appears to have quite good diffusion in combination with the other parts of the round function. In particular, [12] reports proofs of promising upper bounds for the differential probability of differential trails. It appears that θ-like mappings can form mixing layers with a good trade-off between implementation cost and mixing power.

In [2] we present a generalization of the θ mixing layer in KECCAK-f called column parity mixers (CPMs). CPMs operate on two-dimensional arrays, and in their definition the parities computed over the columns play a central role. In Section 4 we provide an elegant description using matrix arithmetic, which allows us to easily derive algebraic and diffusion properties.

We also show that CPMs operating on states with an even number of rows have quite different properties from those operating on an odd number of rows. The former are involutions and are ideally suited for block ciphers and permutations that need an efficient inverse. The latter may have an inverse with a high implementation cost, but also very interesting diffusion properties.

2. Preliminaries

2.1. Matrices

We use I to denote a (square) identity matrix and 0 to denote an all-zero matrix. We assume that the dimensions of these matrices are determined by the context. The transpose of a matrix A is denoted as $A^T$.
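The following minimal Python sketch makes the XOR-count metric from the introduction concrete. It builds the binary representation B(A) (defined below) of the AES MixColumns matrix circ(2, 3, 1, 1) over $\mathbb{F}_{2^8}$, using the reduction polynomial $x^8 + x^4 + x^3 + x + 1$. It then counts the XORs of the naive implementation, in which an output bit that depends on w input bits costs w − 1 XORs. The helper names are hypothetical, and each row of a binary matrix is stored as an integer bitmask.

```python
# Minimal sketch: binary representation B(A) of a matrix over GF(2^8) and the
# XOR count of its naive implementation. All helper names are hypothetical.

AES_POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1


def gf_mul(a, b, poly=AES_POLY, k=8):
    """Multiply a and b in GF(2^k), reducing modulo poly (shift-and-add)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> k:          # degree reached k: reduce
            a ^= poly
    return result


def mul_matrix(alpha, k=8, poly=AES_POLY):
    """T_alpha as k row bitmasks: bit j of row i is set iff output bit i of
    alpha * x depends on input bit j."""
    rows = [0] * k
    for j in range(k):
        col = gf_mul(alpha, 1 << j, poly, k)  # image of the j-th unit vector
        for i in range(k):
            if (col >> i) & 1:
                rows[i] |= 1 << j
    return rows


def binary_representation(A, k=8, poly=AES_POLY):
    """Expand an n x n matrix over GF(2^k) into nk binary rows of nk bits."""
    n = len(A)
    rows = []
    for i in range(n):
        blocks = [mul_matrix(A[i][j], k, poly) for j in range(n)]
        for r in range(k):
            row = 0
            for j in range(n):
                row |= blocks[j][r] << (j * k)
            rows.append(row)
    return rows


def naive_xor_count(rows):
    """An output bit depending on w input bits costs w - 1 XORs."""
    return sum(bin(row).count("1") - 1 for row in rows if row)


MIXCOLUMNS = [[2, 3, 1, 1],
              [1, 2, 3, 1],
              [1, 1, 2, 3],
              [3, 1, 1, 2]]

print(naive_xor_count(binary_representation(MIXCOLUMNS)))  # 152
```

The printed value matches the naive count for AES in Table 1. The heuristics discussed in Section 3 reduce this count by sharing subexpressions between the rows.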
We use $\mathbf{1}_x$ to denote a column vector of x components that are all equal to 1. Consequently, $\mathbf{1}_x^T$ is an all-1 row vector with x components. We use $\mathbf{1}_x^y$ to denote a matrix with x rows and y columns with all components 1. Clearly, $\mathbf{1}_x^y = \mathbf{1}_x \mathbf{1}_y^T$.

The element of a matrix A at row i and column j is denoted by $\alpha_{i,j}$. If $B = A^T$, we have $\beta_{i,j} = \alpha_{j,i}$. The trace of a square matrix is the linear function that simply takes the sum of its diagonal elements. It is denoted by tr(A), so $\mathrm{tr}(A) = \sum_i \alpha_{i,i}$.

The Hamming weight of a vector u or of a matrix A is denoted hw(u) or hw(A) and is defined as the number of nonzero entries in u or A, respectively.

Consider an n × n matrix A with $\alpha_{i,j} \in \mathbb{F}_{2^k}$. Then every multiplication by an element α can be described by a left-multiplication with a matrix $T_\alpha \in \mathbb{F}_2^{k\times k}$. For $1 \le i, j \le n$, we define $B(A) := (T_{\alpha_{i,j}}) \in \mathrm{GL}(k, \mathbb{F}_2)^{n\times n} \subseteq (\mathbb{F}_2^{k\times k})^{n\times n} \cong \mathbb{F}_2^{nk\times nk}$ and call this the binary representation of A.

2.2. MDS Matrices

For a binary vector $v \in \mathbb{F}_2^{nk}$, we define $\mathrm{hw}_k(v) := \mathrm{hw}(v')$, where $v' \in (\mathbb{F}_2^k)^n$ is the vector obtained by partitioning v into groups of k bits. Furthermore, the branch number of a matrix A is defined as $\mathrm{bn}(A) := \min_{u \neq 0} \{\mathrm{hw}(u) + \mathrm{hw}(Au)\}$. For a binary matrix $B \in \mathbb{F}_2^{nk\times nk}$, the branch number for k-bit words is defined as $\mathrm{bn}_k(B) := \min_{u \in \mathbb{F}_2^{nk}\setminus\{0\}} \{\mathrm{hw}_k(u) + \mathrm{hw}_k(Bu)\}$.

In the design of block ciphers, maximum distance separable (MDS) matrices play an important role.

Definition 1. An n × n matrix A is MDS if and only if bn(A) = n + 1.

MDS matrices do not exist for every choice of n and k. The exact parameters for which MDS matrices do or do not exist are investigated in the context of the famous MDS conjecture. For binary matrices, we need to modify Definition 1.

Definition 2. A binary matrix $B \in \mathbb{F}_2^{nk\times nk}$ is MDS for k-bit words if and only if $\mathrm{bn}_k(B) = n + 1$.

MDS matrices are commonly used in the linear layers of block ciphers, due to the wide trail strategy proposed for the AES, see [13]. We typically deal with n × n MDS matrices over $\mathbb{F}_2^k$, or with binary $\mathbb{F}_2^{nk\times nk}$ matrices that are MDS for k-bit words, where k ∈ {4, 8} is the size of the S-box. In either case, when we call a matrix MDS, the value of k will always be clear from the context when not explicitly mentioned.

It is easy to see that, if $A \in \mathbb{F}_{2^k}^{n\times n}$ is MDS, then B(A) is also MDS for k-bit words. On the other hand, there might also exist binary MDS matrices for k-bit words that have no corresponding representation over $\mathbb{F}_2^k$.

3. Global Optimization of MDS Matrices

3.1. Algorithms

Back in 1997, Paar [8] studied how to optimize the arithmetic used by Reed-Solomon encoders. This boils down to reducing the number of XORs that are necessary for a multiplier that operates on matrices A over the field $\mathbb{F}_{2^k}$. Paar described two algorithms that find a local optimum. Here we focus on the second algorithm. Intuitively, the idea is to iteratively eliminate common subexpressions. Let $T_\alpha$ be the multiplication matrix, to be applied to a variable field element $x = (x_1, \ldots, x_k) \in \mathbb{F}_2^k$. The algorithm for computing $T_\alpha x$ finds a pair (i, j), with i ≠ j, where the bitwise AND between columns i and j of $T_\alpha$ has the highest Hamming weight. In other words, it finds a pair $(x_i, x_j)$ that occurs most frequently as a subexpression in the output bits of $T_\alpha x$. When multiple pairs are equally common, all of them are tried recursively. The XOR between $x_i$ and $x_j$ is then computed, and the matrix is updated accordingly, with $x_i + x_j$ as a newly available variable. This is repeated until there are no common subexpressions left. Compared to the naive XOR count, Paar noted an average reduction in the number of XORs of 17.5% for matrices over $\mathbb{F}_{2^4}$ and 40% for matrices over $\mathbb{F}_{2^8}$.

Paar's algorithms lead to so-called cancellation-free programs. This means that for every XOR operation u + v, none of the input bit variables $x_i$ occurs in both u and v. Thus, the possibility that two variables cancel each other out is never taken into consideration, although this may in fact yield a more efficient solution in terms of the total number of XORs. In 2008, Boyar, Matthews, and Peralta [7] showed that cancellation-free techniques often cannot be expected to yield optimal solutions for non-trivial inputs. They also showed that, even under the restriction to cancellation-free programs, the problem of finding an optimal program is NP-complete.

Around 2010, Boyar and Peralta [9] came up with a heuristic that is not cancellation-free and that improved on Paar's algorithms in most scenarios. Their idea was to keep track of a distance vector that contains, for each targeted expression of an output bit, the minimum number of XORs of the already computed intermediate values that are necessary to obtain that target. To decide which values will be added, the pair that minimizes the sum of the new distances is picked. If there is a tie, the pair that maximizes the Euclidean norm of the new distances is chosen. Additionally, if the XOR of two values immediately leads to a targeted output, this can always be done without searching further.

At BFA 2017, an improvement was presented that simultaneously reduces the number of XORs and the depth of the resulting circuit [10].

3.2. Results

Using the heuristic methods described in the previous section, we can easily and significantly reduce the XOR counts for many matrices that have been used in the
literature. The running times for the optimizations are in the range of seconds to minutes. Table 1 summarizes the main results.

TABLE 1. NUMBER OF XORS REQUIRED FOR MATRICES IN CIPHERS.

Cipher | Type | Naive | Literature | PAAR [8] | BP [9]
AES | $(\mathbb{F}_2^8)^{4\times4}$ | 152 | 7 + 96* | 108 | 97
ANUBIS | $(\mathbb{F}_2^8)^{4\times4}$ | 184 | 20 + 96† | 121 | 113
CLEFIA M0 | $(\mathbb{F}_2^8)^{4\times4}$ | 184 | n/a‡ | 121 | 106
CLEFIA M1 | $(\mathbb{F}_2^8)^{4\times4}$ | 208 | n/a‡ | 121 | 111
FOX MU4 | $(\mathbb{F}_2^8)^{4\times4}$ | 219 | n/a‡ | 143 | 137
TWOFISH | $(\mathbb{F}_2^8)^{4\times4}$ | 327 | n/a‡ | 149 | 129
FOX MU8 | $(\mathbb{F}_2^8)^{8\times8}$ | 1257 | n/a‡ | 611 | 594
GRØSTL | $(\mathbb{F}_2^8)^{8\times8}$ | 1112 | 504 + 448† | 493 | 475
KHAZAD | $(\mathbb{F}_2^8)^{8\times8}$ | 1232 | 584 + 448† | 488 | 507
WHIRLPOOL | $(\mathbb{F}_2^8)^{8\times8}$ | 840 | 304 + 448† | 481 | 465
JOLTIK | $(\mathbb{F}_2^4)^{4\times4}$ | 72 | 20 + 48† | 48 | 48
SMALLSCALE AES | $(\mathbb{F}_2^4)^{4\times4}$ | 72 | n/a‡ | 54 | 47
WHIRLWIND M0 | $(\mathbb{F}_2^4)^{8\times8}$ | 488 | 200 + 224† | 218 | 212
WHIRLWIND M1 | $(\mathbb{F}_2^4)^{8\times8}$ | 536 | 200 + 224† | 244 | 235

* Reported by [14].
† Reported by [15].
‡ We are not aware of any reported results for this matrix.

A number of issues arising from this are worth highlighting. First, it turns out that in some cases the n(n − 1)k XORs needed to sum the products for all rows (48 for n = k = 4) are not a valid lower bound. In fact, all the 4 × 4 matrices over GL(4, $\mathbb{F}_2$) that we studied can be implemented in at most 48 XORs in total. Second, the implementation of the MDS matrix used in AES with 97 XORs is, to the best of our knowledge, the most efficient implementation so far, and it improves on the previous implementation of 103 XORs reported by [14]. As a side note, cancellations do occur in this implementation; we thus conjecture that such a low XOR count is not possible with cancellation-free programs.

4. Column Parity Mixers

4.1. Definitions

We consider linear mappings θ that operate on arrays with m rows and n columns.

Definition 3. The column parity of a matrix A is a (row) vector defined as $\mathbf{1}_m^T A$.

In a matrix A, a column x is called even (odd) if the component with index x in $\mathbf{1}_m^T A$ is zero (one).

Definition 4. The expanded column parity of A is a matrix with m rows all equal to the column parity of A, and it is given by $\mathbf{1}_m^m A$.

A column parity mixer (CPM) makes use of a linear transformation operating on the column parity of a matrix, called its parity-folding transformation. We denote the parity-folding transformation by multiplying the column parity with a square matrix Z at the right. We call the n × n matrix Z the parity-folding matrix of θ. We are now ready to define the θ-effect of a matrix A.

Definition 5. The θ-effect of A with respect to Z is a row vector, denoted as $e_Z(A)$ (or just e(A) if Z is clear from the context), and is defined by $e_Z(A) = \mathbf{1}_m^T A Z$.

For a given input A and parity-folding matrix Z, a column x is called unaffected (affected) if the component with index x in $e_Z(A)$ is zero (non-zero). Whether a column is affected or not is fully determined by the column parity of A and column x of the parity-folding matrix Z.

Definition 6. The expanded θ-effect of A with respect to Z is a matrix with m rows all equal to the θ-effect, namely, $E_Z(A) = \mathbf{1}_m^m A Z$.

A CPM θ simply consists of computing the expanded θ-effect of a matrix A and adding it to A.

Definition 7. The column parity mixer θ using parity-folding matrix Z is defined as:

$\theta(A) = A + E_Z(A) = A + \mathbf{1}_m^m A Z$.

Note that a CPM is fully defined by a parity-folding matrix Z and m.

4.2. Group Properties

In this section we list a few algebraic properties of CPMs θ for given dimensions m × n. Proofs and examples are omitted here and can be found in [2].

• Let ψ = θ' ∘ θ be the composition of two CPMs θ and θ' with parity-folding matrices Z and Z', respectively. Then ψ is again a CPM.
  – If m is even, the parity-folding matrix of ψ is Z + Z'.
  – If m is odd, its parity-folding matrix is (Z + I)(Z' + I) + I = Z + Z' + ZZ'.
• The set of all CPMs with m even forms a group under composition that is isomorphic to the abelian group $(\mathbb{Z}_2^{n^2}, +)$.
• CPMs with m even are therefore involutions.
• For m odd, not all column parity mixers are invertible.
• The set of CPMs with m odd and Z + I non-singular forms a group under composition that is isomorphic to the group of binary invertible n × n matrices under multiplication, that is, the general linear group GL(n, 2).

4.3. Diffusion Properties

For MDS matrices, the study of their diffusion properties is largely independent of the other steps in the round function. For example, the proof that every 4-round differential trail in AES has at least 25 active S-boxes only requires that the MixColumns matrix is MDS.
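Unlike an MDS matrix, a CPM can be evaluated directly from Definition 7, so its properties are easy to check numerically on toy parameters. The minimal Python sketch below uses hypothetical helper names and implements θ(A) = A + $\mathbf{1}_m^m A Z$. For n = 5 columns, it checks:

- the involution and composition properties of Section 4.2;
- the inverse formula $Z' = (Z + I)^{-1} + I$ for odd m;
- that a state with zero column parity is left unchanged;
- how far a single active bit spreads.

The circulant Z with ones at offsets ±1 (row weight h = 2, zero diagonal) is an illustrative choice. It is not the parity-folding matrix of θ in KECCAK-f. The second matrix Z2 is arbitrary and deliberately non-circulant.

```python
# Minimal sketch of a column parity mixer (CPM) over GF(2), following
# theta(A) = A + 1_m^m A Z. States are m x n lists of 0/1 ints and Z is an
# n x n list of 0/1 ints. All names here are hypothetical.
import random


def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def mat_add(X, Y):
    """Entry-wise XOR of two equally sized binary matrices."""
    return [[a ^ b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def mat_mul(X, Y):
    """Matrix product over GF(2)."""
    return [[sum(X[i][t] & Y[t][j] for t in range(len(Y))) % 2
             for j in range(len(Y[0]))] for i in range(len(X))]


def gf2_inverse(M):
    """Invert a square binary matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [row[:] + identity(n)[i] for i, row in enumerate(M)]
    for c in range(n):
        pivot = next((r for r in range(c, n) if aug[r][c]), None)
        if pivot is None:
            raise ValueError("matrix is singular over GF(2)")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(n):
            if r != c and aug[r][c]:
                aug[r] = [a ^ b for a, b in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def column_parity(A):
    """1_m^T A: XOR of all rows (Definition 3)."""
    return [sum(col) % 2 for col in zip(*A)]


def cpm(A, Z):
    """theta(A): add the theta-effect 1_m^T A Z to every row (Definitions 5-7)."""
    effect = mat_mul([column_parity(A)], Z)[0]
    return [[a ^ b for a, b in zip(row, effect)] for row in A]


def circulant(n, offsets):
    """Z[y][x] = 1 iff (x - y) mod n is one of the given offsets."""
    return [[int((x - y) % n in offsets) for x in range(n)] for y in range(n)]


n = 5
Z = circulant(n, {1, n - 1})  # toy choice: row weight h = 2, zero diagonal
Z2 = [[int(x > y) for x in range(n)] for y in range(n)]  # arbitrary, non-circulant
I = identity(n)
rng = random.Random(2018)


def random_state(m):
    return [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]


# m even: theta is an involution, and parity-folding matrices add up.
A = random_state(4)
assert cpm(cpm(A, Z), Z) == A
assert cpm(cpm(A, Z), Z2) == cpm(A, mat_add(Z, Z2))

# m odd: theta' o theta has parity-folding matrix (Z + I)(Z' + I) + I,
# and the inverse CPM uses (Z + I)^(-1) + I.
A = random_state(5)
Z_comp = mat_add(mat_mul(mat_add(Z, I), mat_add(Z2, I)), I)
assert cpm(cpm(A, Z), Z2) == cpm(A, Z_comp)
Z_inv = mat_add(gf2_inverse(mat_add(Z, I)), I)
assert cpm(cpm(A, Z), Z_inv) == A

# Zero column parity: two active bits in one column (an orbital) are unchanged.
orbital = [[0] * n for _ in range(5)]
orbital[0][2] = orbital[3][2] = 1
assert cpm(orbital, Z) == orbital

# An isolated bit spreads to 1 + h*m bits when Z has a zero diagonal.
single = [[0] * n for _ in range(5)]
single[1][0] = 1
print(sum(map(sum, cpm(single, Z))))  # 11

# Straightforward XOR schedule: column parities, theta-effect, final addition.
m, h = 5, 2
print(((m - 1) * n + (h - 1) * n + m * n) / (m * n))  # 2.0
```

The last line assumes a straightforward schedule:

- (m − 1)n XORs for the column parities;
- (h − 1)n XORs for the θ-effect, when every column of Z has weight h;
- mn XORs for the final addition.

This gives 2 + (h − 2)/m XORs per bit. For m = 5 and h = 2, that is the 2 XORs per bit quoted for KECCAK-f in the introduction.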
Diffusion in CPMs is more subtle: their properties only become meaningful in the context of a design approach, where the interaction with other steps of the round function must be taken into account. A few properties are shared by all CPMs.

4.3.1. The kernel. The set of states A with column parity equal to zero forms a vector space with dimension n(m − 1). Following [11], we call this the (column parity) kernel. For states A in the kernel the parity is zero, and consequently θ reduces to the identity mapping.

There are states in the kernel with Hamming weight 2, namely all states with a pair of active bits in the same column. Again following [11], we call such a state an orbital. States in the kernel have an even number of active bits per column and, as observed in [12], they can be seen as a collection of orbitals. Due to the existence of single-orbital states, the (Hamming) branch number, both differentially and linearly, of every CPM is at most 4. In fact, an invertible CPM θ with m ≥ 2 and where Z has no all-zero rows or columns has differential and linear branch numbers 4, with the only exception being the case where m = n = 2 and Z = I.

The existence of this kernel implies that, for a permutation that resists linear and differential attacks, the mixing layer needs to be accompanied by a transposition layer that ensures an attacker cannot remain in the kernel indefinitely. However, by selecting this transposition layer and the dimensions carefully, we can show that the kernel does not pose a serious problem.

4.3.2. Propagation of isolated bits. A single-bit difference at the input of θ propagates on average to $1 + |Z|m$ bits, with |Z| the Hamming weight of Z divided by its dimension. For a single-bit difference at the output of θ we have to distinguish between two cases. If m is even, the CPM is an involution and the same properties hold as in the forward direction. For odd m, we have to consider the CPM defined by $Z' = (Z + I)^{-1} + I$. The matrix Z' may have a much higher Hamming weight than Z, as illustrated by θ in KECCAK [11]; even for large states, full diffusion in the backward direction can be immediate.

4.4. Costs

We attempt to compare different types of mixing layers in Table 2. To account for the different sizes, implementation costs are expressed as the number of XORs per bit. While the branch numbers are interesting for diffusion over a single round, it is more important how diffusion propagates over multiple rounds in the context of a full permutation or block cipher. This is studied in detail in [2].

TABLE 2. COMPARISON OF MIXING LAYERS.

Cipher | Type | XORs/bit | Branch nr.
AES | MDS, $(\mathbb{F}_2^8)^{4\times4}$ | 3.03 | 5
JOLTIK | MDS, $(\mathbb{F}_2^4)^{4\times4}$ | 3 | 5
PHOTON | MDS, $(\mathbb{F}_2^4)^{6\times6}$ * | 5 † | 7
PRØST | MDS, $\mathbb{F}_2^{16\times16}$ | 4.5 † | 5
MIDORI | Not MDS ‡, $(\mathbb{F}_2^4)^{4\times4}$ | 1.5 | 4
MINALPHER | Not MDS ‡, $(\mathbb{F}_2^4)^{4\times4}$ | 1.5 | 4
PRINCE | Not MDS, $\mathbb{F}_2^{64\times64}$ | 1.5 | 4
SKINNY | Not MDS, $(\mathbb{F}_2^4)^{4\times4}$ | 0.75 | 2
KECCAK-f | CPM, $\mathbb{F}_2^{5\times5\times w}$ § | 2 | 4
Circulant CPM | CPM, $(\mathbb{F}_2^{l})^{m\times n}$ | $2 + \frac{h-2}{m}$ ‖ | 4

* Dimensions are for $P_{100}$, $P_{144}$, and $P_{196}$.
† Unknown whether it can be computed with fewer XORs.
‡ Can also be considered to be a CPM.
§ Where w ∈ {8, 16, 32, 64}, depending on which KECCAK-f.
‖ Where h is the weight of a row of Z. XORs/bit ≥ 2 − 1/m.

References

[1] T. Kranz, G. Leander, K. Stoffelen, and F. Wiemer, “Shorter linear straight-line programs for MDS matrices,” IACR Trans. Symm. Cryptol., vol. 2017, no. 4, pp. 188–211, 2017.
[2] K. Stoffelen and J. Daemen, “Column parity mixers,” IACR Trans. Symm. Cryptol., vol. 2018, no. 1, pp. 126–159, 2018.
[3] K. Khoo, T. Peyrin, A. Y. Poschmann, and H. Yap, “FOAM: Searching for hardware-optimal SPN structures and components with a fair comparison,” in CHES 2014, ser. LNCS, L. Batina and M. Robshaw, Eds., vol. 8731. Springer, Heidelberg, Sep. 2014, pp. 433–450.
[4] S. M. Sim, K. Khoo, F. E. Oggier, and T. Peyrin, “Lightweight MDS involution matrices,” in FSE 2015, ser. LNCS, G. Leander, Ed., vol. 9054. Springer, Heidelberg, Mar. 2015, pp. 471–493.
[5] M. Liu and S. M. Sim, “Lightweight MDS generalized circulant matrices,” in FSE 2016, ser. LNCS, T. Peyrin, Ed., vol. 9783. Springer, Heidelberg, Mar. 2016, pp. 101–120.
[6] C. Beierle, T. Kranz, and G. Leander, “Lightweight multiplication in GF(2^n) with applications to MDS matrices,” in CRYPTO 2016, Part I, ser. LNCS, M. Robshaw and J. Katz, Eds., vol. 9814. Springer, Heidelberg, Aug. 2016, pp. 625–653.
[7] J. Boyar, P. Matthews, and R. Peralta, “On the shortest linear straight-line program for computing linear forms,” in MFCS 2008, ser. LNCS, vol. 5162, 2008, pp. 168–179.
[8] C. Paar, “Optimized arithmetic for Reed-Solomon encoders,” in ISIT. IEEE, 1997.
[9] J. Boyar and R. Peralta, “A new combinational logic minimization technique with applications to cryptology,” in SEA 2010, ser. LNCS, vol. 6049, 2010, pp. 178–189.
[10] J. Boyar, M. G. Find, and R. Peralta, “Low-depth, low-size circuits for cryptographic applications,” 2017.
[11] G. Bertoni, J. Daemen, M. Peeters, and G. V. Assche, “The KECCAK reference,” January 2011. [Online]. Available: https://keccak.team/keccak.html
[12] S. Mella, J. Daemen, and G. V. Assche, “New techniques for trail bounds and application to differential trails in Keccak,” IACR Trans. Symm. Cryptol., vol. 2017, no. 1, pp. 329–357, 2017.
[13] J. Daemen and V. Rijmen, The Design of Rijndael: AES - The Advanced Encryption Standard, ser. Information Security and Cryptography. Springer, 2002.
[14] J. Jean, A. Moradi, T. Peyrin, and P. Sasdrich, “Bit-sliding: A generic technique for bit-serial implementations of SPN-based primitives - applications to AES, PRESENT and SKINNY,” in CHES 2017, ser. LNCS, W. Fischer and N. Homma, Eds., vol. 10529. Springer, Heidelberg, Sep. 2017, pp. 687–707.
[15] J. Jean, T. Peyrin, S. M. Sim, and J. Tourteaux, “Optimizing implementations of lightweight building blocks,” IACR Trans. Symm. Cryptol., vol. 2017, no. 4, pp. 130–168, 2017.
