Collin, Simon
ABSTRACT
Today, two-factor authentication tokens have become increasingly popular due to the growing number of attacks against classical username-password online authentication. This master thesis aims to assess the security of an open source security token, the Solo key, against some side channel attacks. The Solo key relies on the U2F authentication protocol from the FIDO Alliance to provide a strong second authentication factor. The attack surface found in the authentication protocol is first described. Then a measurement setup relying on EM emanations is mounted. Next, the search for the POI required by the two strategies attempted to break part of the HMAC-SHA256 construction is described. After that, four side channel attacks, three CPA with different leakage models and a template attack, are launched against the HMAC-SHA256 construction. Finally, it is shown that, with the measurement setup and the trace alignment algorithm described, forging the outer hash requires more than 200 000 traces, while the inner hash can be forged in roughly 200 000 traces.
Collin, Simon. Side channel attacks against the Solo key - HMAC-SHA256 scheme. Ecole polytechnique
de Louvain, Université catholique de Louvain, 2020. Prom. : Standaert, François-Xavier. http://
hdl.handle.net/2078.1/thesis:26545
DIAL.mem is the institutional repository for the Master theses of UCLouvain. Usage of this document for profit or commercial purposes is strictly prohibited. The user agrees to respect copyright, in particular text integrity and credit to the author. The full copyright policy is available on the Copyright policy page.
Then I would like to thank my two assistants, Olivier Bronchain and Balazs Udvarhelyi, who accompanied me throughout this whole work by giving precious advice and ideas to explore. They were always available in case of questions and gave me a significant boost when I needed it the most. A particular thanks to Olivier Bronchain for giving me access to his toolboxes.
I would also like to thank Pr. Jean-Didier Legat for agreeing to take part in this master thesis as a jury member.
Next, I would like to thank Charles Momin, who was my hands at the laboratory during the lockdown period that we went through.
Finally, I want to thank my family, Leslie and my friends and roommates for giving me their support and their love during this master thesis and, in general, during my 5 years of study at EPL.
Contents

1 Introduction
2 Theoretical background
  2.1 SHA256
  2.2 HMAC
  2.3 Jitter
  2.4 Signal to noise ratio
  2.5 Rank estimation and key enumeration
  2.6 Side channel attacks
    2.6.1 Correlation power analysis
    2.6.2 Template attack
  2.7 Attacks against SHA256
    2.7.1 Divide and conquer strategy
    2.7.2 Extend and prune strategy
    2.7.3 Comparison of the strategies
4 Experimental results
  4.1 Measurement acquisition setup
    4.1.1 Leakage measurement
    4.1.2 Stimulation of the encryption
  4.2 Search of the points of interest
    4.2.1 Alignment of the traces
    4.2.2 Search of the POI
  4.3 Results of the attacks
    4.3.1 Side channel attacks description
    4.3.2 Divide and conquer strategy: results and comparison of the SCA
    4.3.3 Extend and prune strategy: results of the SCA
5 Conclusion
Bibliography
Chapter 1
Introduction
These tokens, as commercial products, often face strong portability, memory and price requirements that may lead to security issues. Side channel attacks (SCA), which have been an active research field since the first timing attacks performed on different cryptographic implementations [2], are a family of attacks that take advantage of the physical leakage of the targeted device. They make it possible to reduce the complexity of an attack from exponential to linear in the size of the cryptographic secret, at the cost of physical access to the device. Moreover, this kind of attack can succeed without leaving any physical trace on the device, for example by exploiting its electromagnetic emanations. For these reasons, SCA seem to be a natural way to recover the cryptographic secret of hardware authentication tokens.
Hence, one can question the security of these tokens in the light of SCA. In the context of this work, the FIDO second factor authentication protocol implemented in the open source Solo key device will be attacked. While the physical security of the Solo key has already been investigated against a fault injection adversary [3], its SCA resistance remains an open question. Therefore, this work aims to study the security provided by the Solo key against SCA relying on the electromagnetic emanations of the device.
In more detail, the goal is to achieve SCA on this token that allow the adversary to authenticate as the rightful user. The attacker is assumed to gain physical access to the token during a limited amount of time, e.g. by stealing the device lying on a colleague's desk during the weekend. This limited amount of time corresponds to the time before the rightful owner of the device notices its disappearance. This time is a key point as it bounds the power of the attacker. In this situation, it is also required that the attacker does not leave noticeable traces on the device while breaking it. It is also assumed that the attacker has obtained the username-password pairs linked to the second factor authentication thanks to another kind of attack. One more remark is that the Solo key device is not certified according to the FIDO 'Authenticator Certification Level'. This certification assesses the level of difficulty of breaking a device, including by SCA.
The contribution of this work is to assess the security of the open source yet commercialized Solo key, which implements the FIDO U2F authentication protocol, by performing different SCA against an HMAC-SHA256 construction in a real application. This work is organized as follows. In chapter two, the theoretical background necessary to understand this manuscript is described. Then, in chapter three, the FIDO U2F authentication protocol is described and, after a short introduction to the Solo key, the way this protocol is implemented in this device is investigated to find an attack surface. Finally, in chapter four, the whole train of thought behind the different side channel attacks carried out against the Solo key through the two strategies is described and the obtained results are exposed.
Chapter 2
Theoretical background
2.1 SHA256
SHA256 is a hash function published in 2001 and designed by the NSA (National Security Agency). The purpose of this cryptographic tool is to produce a digest that is shorter than the message being hashed, while being easy to compute and uniquely identifying the message, like a fingerprint. Hash functions are used in many applications such as message authentication codes, proof of work in the bitcoin scheme, password storage, file integrity verification, etc. The SHA256 compression function takes as input a 512-bit message block and a 256-bit initial vector (IV), and outputs a 256-bit digest. Each of these inputs is cut into 32-bit words, leading to 16 words for the message and 8 words for the IV, the latter also called in this framework the initial states. The algorithm of the SHA256 hash function is given in Algorithm 1.
First, concerning Algorithm 1, all operations are performed on 32-bit words. The + denotes a 32-bit modular addition. Then the SIG0, SIG1, EP0, EP1, CH and MAJ functions are defined as

• CH(x, y, z) = (x & y) ∧ (∼x & z)
• MAJ(x, y, z) = (x & y) ∧ (x & z) ∧ (y & z)
• EP0(x) = (x >>> 2) ∧ (x >>> 13) ∧ (x >>> 22)
• EP1(x) = (x >>> 6) ∧ (x >>> 11) ∧ (x >>> 25)
• SIG0(x) = (x >>> 7) ∧ (x >>> 18) ∧ (x >> 3)
• SIG1(x) = (x >>> 17) ∧ (x >>> 19) ∧ (x >> 10)

where x, y and z are 32-bit words, x >> b is the b-bit shift to the right performed on x, x >>> b is the b-bit rotation to the right performed on x, & is the bitwise AND operation, ∧ is the bitwise XOR operation and ∼x is the bitwise complement of x. Finally, Ki is a hard-coded table of 64 constant words.
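For concreteness, one round of the compression loop of Algorithm 1 can be sketched in Python as below; this is a plain transcription of the standard round equations (the T1 and T2 expressions reappear in Section 2.7.2), not the code actually running on the Solo key.

    MASK = 0xFFFFFFFF  # all operations are on 32-bit words

    def rotr(x, b):
        """b-bit rotation to the right of a 32-bit word."""
        return ((x >> b) | (x << (32 - b))) & MASK

    def CH(x, y, z):
        return ((x & y) ^ (~x & z)) & MASK

    def MAJ(x, y, z):
        return (x & y) ^ (x & z) ^ (y & z)

    def EP0(x):
        return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)

    def EP1(x):
        return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)

    def sha256_round(state, Mi, Ki):
        """One round of the SHA256 compression; state is the 8-tuple
        (A, B, C, D, E, F, G, H), Mi the expanded message word and Ki
        the round constant."""
        A, B, C, D, E, F, G, H = state
        T1 = (H + EP1(E) + CH(E, F, G) + Ki + Mi) & MASK
        T2 = (EP0(A) + MAJ(A, B, C)) & MASK
        return ((T1 + T2) & MASK, A, B, C, (D + T1) & MASK, E, F, G)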
The SHA256 algorithm also has three properties that are enumerated below.

1. Collision resistance: given s, it is difficult to find x and x' with x ≠ x' such that Hs(x) = Hs(x').

2. Second pre-image resistance: given s, x, it is difficult to find x' such that Hs(x) = Hs(x').

3. Pre-image resistance: given s and y = Hs(x), it is difficult to find x' such that Hs(x') = y.

where s represents the initial vector, x and x' are two different input messages and Hs(x) is the SHA256 hash function.
2.2 HMAC

The hashed message authentication code (HMAC) is a cryptographic tool developed in [9]. The purpose of this tool is to generate a message authentication code (MAC) that cannot be forged, based on a key, an initial vector (IV) and a cryptographic hash function. A MAC is a tag, appended to the message and depending on it, which allows, thanks to the key, authenticated communication. The algorithm of this construction can be observed in Algorithm 2, where ∧ denotes the bitwise XOR and || the concatenation of the messages; opad and ipad are fixed constants.

Algorithm 2 HMAC
Inputs: the key K, the message to be authenticated M and the cryptographic hash function H.
Output: the HMACK(M)
1: return HMACK(M) = H((K ∧ opad) || H((K ∧ ipad) || M))

This function is proved to be secure under some assumptions about the cryptographic hash function used in it; the SHA256 hash function fulfills these assumptions. In the rest of this work, the HMAC algorithm will be used in combination with the SHA256 function and will be called the HMAC-SHA256 structure. A graphical representation of this structure can be seen in Figure 2.1.
In this Figure, one can notice that the structure is separated in two parts, the inner hash and the outer hash. It is also possible to note that k0 and k1 are constant values. A last observation is that somehow recovering the values of k1 and k0 makes the tag forgeable.
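As a reference point, the construction of Algorithm 2 instantiated with SHA256 can be written in a few lines of Python using the standard library; this is a minimal sketch of the generic construction, not the Solo key's firmware code.

    import hashlib

    BLOCK = 64  # SHA256 block size in bytes

    def hmac_sha256(key: bytes, msg: bytes) -> bytes:
        """Algorithm 2 instantiated with SHA256."""
        if len(key) > BLOCK:
            key = hashlib.sha256(key).digest()
        key = key.ljust(BLOCK, b"\x00")
        ipad_block = bytes(b ^ 0x36 for b in key)  # K xor ipad
        opad_block = bytes(b ^ 0x5C for b in key)  # K xor opad
        inner = hashlib.sha256(ipad_block + msg).digest()   # inner hash
        return hashlib.sha256(opad_block + inner).digest()  # outer hash

In this view, k0 and k1 are the SHA256 chaining states obtained right after compressing the first block, (K ∧ ipad) for the inner hash and (K ∧ opad) for the outer hash; an adversary who learns these two constant states can forge tags for arbitrary messages without ever learning K.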
Figure 2.1: Representation of the HMAC-SHA256 structure

2.3 Jitter

The jitter denotes the random variation of a presumably constant periodic signal. The clock of a microcontroller is such a signal. There are multiple measures of the jitter. The long term or absolute jitter denotes the deviation in time with respect to a perfect clock. The cycle-to-cycle jitter is the deviation of one clock cycle with respect to the previous one. The jitter can come from multiple sources: crosstalk with other signals, electromagnetic interference or simply the generation of the clock itself.
This jitter has a strong impact on side channel attacks. Indeed, to perform these attacks with a feasible amount of traces, a very low jitter is required in the measured leakage, i.e. the number of traces needed to achieve an attack increases with the jitter present in the traces. In fact, the designers of a device can intentionally increase the jitter to reduce the potential of SCA; it can thus be seen as a countermeasure. However, algorithms to align the traces have also been studied in the literature, such as in [10]. Hence, it can be seen as a game between the jitter that designers can add without affecting the functionality of the device and the trace alignment algorithms developed by adversaries.
2.4 Signal to noise ratio

The signal to noise ratio (SNR) is computed as

SNR = var(µc) / mean(σc)

for one time sample of the leakage, where µc and σc represent respectively the mean and the variance of the traces linked to a class value. This class value corresponds to a given state value in a cryptographic scheme. Computing the SNR thus requires computing this state value for each trace and sorting the traces into the corresponding classes.
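For illustration, this per-sample SNR can be computed with a short NumPy routine such as the sketch below, where the trace and class arrays are hypothetical placeholders.

    import numpy as np

    def snr(traces: np.ndarray, classes: np.ndarray) -> np.ndarray:
        """Per-sample SNR: variance of the class means divided by the
        mean of the class variances. traces is (N, samples) and
        classes is (N,) holding the state value of each trace."""
        labels = np.unique(classes)
        means = np.stack([traces[classes == v].mean(axis=0) for v in labels])
        variances = np.stack([traces[classes == v].var(axis=0) for v in labels])
        return means.var(axis=0) / variances.mean(axis=0)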
Thanks to the SNR, the number of traces N required to reach a high success rate in recovering the secret can be approximated as

N ≈ c / SNR

where c is an empirical constant value.
Different implementations of SCA relying on power consumption have been proposed, varying in the statistical tools used to recover the secret information or in the modeling of the physical leakage. In the next part of this section, the two SCA used in this work will be described.
If the secret length L is too high, the CPA might be computationally too intensive to perform. However, partial CPA makes it possible by separating the secret K into b blocks of length L/b. Indeed, it reduces the computational cost from O(2^L) to b·O(2^(L/b)). The side effects of applying partial CPA are that it might not be straightforward, as blocks can be related to each other, and that the L − L/b other bits are considered as noise since they are not taken into account in the modeling of the leakage.
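To make the partial CPA concrete, the following NumPy sketch attacks one 8-bit block of a secret entering a modular addition with a known byte, under a Hamming weight leakage assumption; the function name and array shapes are illustrative, not taken from the toolboxes used in this work.

    import numpy as np

    HW = np.array([bin(v).count("1") for v in range(256)])  # Hamming weights

    def cpa_byte(traces: np.ndarray, known: np.ndarray, carry=0) -> int:
        """Partial CPA on an 8-bit block: correlate a Hamming weight model
        of (known + guess + carry) mod 256 against every time sample and
        return the guess with the highest absolute correlation peak."""
        t = traces - traces.mean(axis=0)        # centered traces (N, samples)
        t_norm = np.sqrt((t * t).sum(axis=0))
        best_guess, best_peak = 0, 0.0
        for guess in range(256):
            model = HW[(known + guess + carry) & 0xFF].astype(float)
            m = model - model.mean()
            corr = np.abs(m @ t) / (np.sqrt(m @ m) * t_norm)
            if corr.max() > best_peak:
                best_guess, best_peak = guess, corr.max()
        return best_guess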
2.7 Attacks against SHA256

Two strategies are considered to attack the initial states of a SHA256 implementation. Both strategies will be described and their requirements exposed. Then the strategies will be summarized in a comparative table which intends to highlight the pros and cons of each of them.
2.7.1 Divide and conquer strategy

In the i-th round of SHA256, the updates of the states E and A can be written as

Ei ← Ei_secret + Mi
Ai ← Ai_secret + Mi

where Ei_secret and Ai_secret are respectively the secrets related to the operations that output Ei and Ai. For those secrets to be constant values in the i-th round, as SCA target a constant secret value manipulated together with a known value (Mi here), the messages of the previous rounds must be fixed. Moreover, once a message is fixed to a value in order to attack the next round, its value must remain the same when the following rounds are attacked. For example, when attacking the second round, the value of M1 must be fixed. Then, when attacking the third round, the values of M1 and M2 must be fixed and the value of M1 must remain the same as the one used during the attack on round two. Indeed, the value E1 used in equations 2.1 to 2.8 depends on the fixed message M1; the value E2 used in the same equations depends on the fixed M2 and on E1, since E2_secret is linked to E1; and so on. Therefore, the validity of equations 2.1 to 2.8 requires the attacker to have full control of the first three input messages. Once the first four rounds have been attacked, E1,2,3,4_secret and A1,2,3,4_secret are known, as well as E1,2,3,4 and A1,2,3,4.
This attack takes advantage of the inter-round relationships between the states of SHA256, e.g. H3 is equivalent to G2, to F1 and to E0, where H3 denotes the value of H at the third round and so on. For example, the attack to recover E0 relies on such a relationship: E0 can be obtained since all the other terms of the corresponding equation are known. An overview of the inter-round relationships can be seen in Figure 2.2. In this Figure, the known values, recovered thanks to the SCA, are in red and the values that are recovered using equations 2.1 to 2.8 are in blue.
Figure 2.2: Overview of the inter-round relationships. In red, the values recovered through the SCA. In blue, the values recovered through equations 2.1 to 2.8.
D2 ← E3 − (H2 + EP1(E2) + CH(E2, F2, G2) + K3 + M3)    (2.4)

The disadvantages of this attack are that it requires four different sets of traces, one for each round attacked, and the control of the first three input messages of the SHA256 function.
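As an illustration of how such an inter-round relationship is exploited, the small helper below inverts one round update of the E register following equation 2.4; it reuses EP1, CH and MASK from the earlier SHA256 sketch.

    def invert_round_D(E_next, H, E, F, G, K, M):
        """Equation 2.4: recover the previous D register from recovered
        values, e.g. D2 = E3 - (H2 + EP1(E2) + CH(E2, F2, G2) + K3 + M3)."""
        return (E_next - (H + EP1(E) + CH(E, F, G) + K + M)) & MASK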
4. During the second round, recover the constant F1 = E0 (from the inter-round relationship) by performing SCA on E1 & F1 in the CH(E1, F1, G1) function. This step relies on the second stage of the attack.

9. Finally, H0 and C0 can be deduced thanks to all the previous stages. H0 can be figured out in T1 ← H0 + EP1(E0) + CH(E0, F0, G0) + K1 + M1, and C0 can be found in T2 ← EP0(A0) + MAJ(A0, B0, C0).
Performing all these steps requires only one set of traces and no condition on the input message except its randomness. However, since the targeted states of SHA256 in stages four to seven of the attack strategy are the result of a bitwise AND, it is important to observe that, in the case where the secret K is equal to 0, performing SCA on K & Mi will not bring any information, as the state will be equal to 0 whatever the message is. In other words, the hypothesis of K equal to 0 cannot be assessed.
the outer hash have to be guessed from the input messages of the inner hash and
its initial states. Finally, the rank estimation and key enumeration cryptographic
tools are applicable, by definition, only in the case of the recovery of independent
partial secrets. They are thus unusable in the extend and prune strategy.
Chapter 3
In this chapter, the FIDO U2F authentication protocol implemented in the Solo key will be described. Then the different characteristics of the Solo key will be briefly presented and the implementation of the FIDO U2F protocol on it will be analyzed.
3.1.1 Registration phase
The different steps of the registration phase go as follows:
1. The service sends to the U2F token the concatenation of the challenge param-
eter given by the relying party and the application parameter representing the
service identity requesting the registration. Note that the challenge parameter
is saved by the relying party.
2. The U2F protocol requires proof of the user's presence, for example by pressing a button on the device.
3. The U2F token generates a new private-public key pair and a key handle that allows the token to recognize the generated key pair. The public key is derived from the private key as the coordinates of a point on the NIST (National Institute of Standards and Technology) P-256 elliptic curve.
4. The U2F token saves the private key.
5. The U2F token signs, with the attestation key corresponding to its attestation certificate and thanks to the elliptic curve digital signature algorithm (ECDSA) [15], the concatenation of 0x00 (hexadecimal number representation), the application parameter, the challenge parameter, the key handle and the public key.
6. The token sends to the service the concatenation of 0x05 (legacy reason), the
public key, the key handle length, the key handle, the attestation certificate
and the signature previously computed.
7. The service verifies through the relying party the attestation certificate, which has to be issued by a trusted certification authority, and the signature. The signature can be verified since the message as well as the elliptic curve parameters and the public key, which is contained in the attestation certificate, are known by the relying party.

8. If the conditions are fulfilled, the relying party stores the key handle and the public key and sends a successful registration message to the service.
Figure 3.1: Overview of the registration phase.
2. The service sends to the token the concatenation of a control byte, the challenge parameter, the application parameter, the key handle length and the key handle. The control byte has three possible values: with 0x07, the U2F token only verifies that the key handle and the application parameter correspond and were created by it, and no signature is provided in this case; with 0x03, the presence of the user is required to perform the end of the protocol; with 0x08, the presence of the user is not required to perform the end of the protocol.
3. Depending on the control byte, the U2F token may or may not require proof of the user's presence.

4. The token verifies that the key handle was created by it and corresponds to the application parameter.
5. The token signs with the private key corresponding to this key handle, still with the ECDSA, the concatenation of the application parameter, the user presence byte, which is set to 1 or 0 depending on whether the presence of the user was verified or not, a counter value that is incremented by 1 each time an authentication occurs, and the challenge parameter.
6. The token sends to the service the concatenation of the user byte, the counter
and the signature.
7. The service verifies through the relying party that the signature is correct thanks to the public key saved during the registration. The relying party also verifies the counter value, which is a countermeasure against malicious duplication of the key. Effectively, by verifying that the counter value associated with a key handle is only increasing, a mismatch between a rightful key and its duplicate is likely to be detected. In other words, if the duplicated key uses a counter value lower than the one stored by the relying party, the authentication will be rejected. If, in the opposite way, the duplicated key uses a higher counter value, the authentication will succeed. However, the user of the rightful key might then be rejected at a further authentication if its counter value is lower than the one used by the duplicate. The user thus learns that his authentication device is compromised.

8. If the conditions are fulfilled, the relying party authorizes the service to proceed with the authentication and saves the counter value.
The Solo key is the first open source security key that implements the FIDO U2F protocol as well as the FIDO CTAP2 protocol, another protocol for passwordless, second factor or multi-factor authentication also developed by the FIDO Alliance. There are different kinds of Solo keys, but the ones this work focuses on are the "normal" Solo and the "hacker" Solo. Both of these Solo keys have the same purpose and thus rely on the same hardware and firmware. They diverge in the fact that the hacker Solo's firmware is reprogrammable, i.e. it is possible to load applications not signed by SoloKeys. It is important to note that the implementation of the U2F protocol on the Solo key is not FIDO certified according to the 'Authenticator Certification Level', meaning that hardware attacks on it may exist, as has already been demonstrated by [3].
The token relies on the STM32L432 microcontroller (MCU), running at 48 MHz, to perform its cryptographic operations. This MCU also has a true random number generator that guarantees the randomness needed by those cryptographic operations. The open source repository of the Solo key can be found in [16]. This repository contains the whole hardware and software of the Solo keys, as well as some Python functions. These Python functions are especially useful for programming the hacker Solo key or simulating cryptographic functions. Thanks to this repository, the implementation of the U2F protocol on the Solo key can be analyzed and, more precisely, the way the private key is computed during the registration phase or the authentication phase of the protocol.
During the registration phase, as can be seen in Figure 3.4, the private key derivation consists of two HMAC-SHA256 structures in cascade. The first one is fed with the master key (64 bytes), a secret key that is generated at the first boot of the token and is independent from one token to another, and with the concatenation of a random number generated by the token (32 bytes), the key handle, and the application parameter (32 bytes) received from the service. After that, a second HMAC-SHA256 is fed with the master key as well, and with the result of the first HMAC-SHA256 concatenated with the 32 LSBs (in the context of this work, LSB refers to least significant byte) of the master key. The "Length" represents the sum of the lengths of all previous input parameters of the HMAC construction. During the authentication phase, the private key is computed following the same structure except that the key handle is now a message sent by the service. The verification that the key handle was created by the token and corresponds to the application parameter is made by checking whether or not the private key (re)generated by this scheme was saved, earlier in a registration phase, on the token.
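Under the description above, the derivation can be approximated by the following Python sketch; the function name, argument names and exact message layout (the "Length" field is omitted) are simplifying assumptions, not the firmware's actual code.

    import hashlib
    import hmac

    def derive_private_key(master_key: bytes, nonce: bytes,
                           app_param: bytes) -> bytes:
        """Simplified sketch of the cascaded HMAC-SHA256 derivation:
        a first HMAC over the token nonce and application parameter,
        then a second HMAC over its result concatenated with the 32
        least significant bytes of the master key."""
        inner = hmac.new(master_key, nonce + app_param,
                         hashlib.sha256).digest()
        return hmac.new(master_key, inner + master_key[-32:],
                        hashlib.sha256).digest()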
Since the initialization vector (IV) and the master key are constant, k0 and k1 (see Figure 3.4) are constant and represent an attack surface to recover all private keys. Effectively, by finding those two initial states of the SHA256 construction and with the parameters given by the service during the authentication phase (see Section 3.1.2), it is possible to derive the private key corresponding to that key handle and that application parameter.
Chapter 4
Experimental results
This chapter aims to expose the train of thought of the side channel attacks realized on the Solo key, through the divide and conquer strategy and the extend and prune strategy, to recover k0 and k1, together with the experimental results that accompany it. First the measurement acquisition setup will be described. Then the required trace alignment algorithm will be exposed. Next, the search of the points of interest (POI) for each strategy, necessary to mount the attacks against the HMAC-SHA256 construction present in the Solo key, will be described. Finally, the different side channel attacks used will be explained, launched and, if successful, compared to each other in terms of the number of traces needed to find back the secret. All the background necessary to understand this chapter is available in Chapter two.
The EM probe is placed just above the microcontroller, at the position that magnifies the information contained in the signal. An overview of the measurement setup can be observed in Figure 4.1. This leakage measurement is non-invasive, meaning that, among other benefits, no footprints are left on the targeted device.
Figure 4.1: Overview of the measurement setup. On the left: R&S HZ-15 probe just above the MCU of the Solo key. On the right: matched R&S HZ-16 preamplifier and PicoScope.
Then, another important point is the triggering technique for this measurement setup, in order to measure only the part of the leakage that is relevant to the attack and so decrease the memory needed to store all the leakage measurements necessary to mount the attack. It also suppresses the long term jitter that can be present due to a noisy clock, and thus performs a synchronization of the leakage traces. To achieve this trigger on the hacker Solo key, a modified firmware version is used to send a message with the Universal Asynchronous Receiver Transmitter (UART) communication protocol. This message, containing only zeros, is used as a trigger from the key to the external world. The UART communication protocol was already implemented in the Solo key, and the electric signal is easily recoverable as pins to measure it are available on the surface of the Solo key. These pins can be seen in Figure 4.2.
This triggering technique, applied just before the HMAC construction of the Solo key, yields the leakage measurement in Figure 4.3. In this Figure, five distinct groups related to the five executions of SHA256 inside the HMAC construction (see Figure 3.4) can be observed. It can be noticed that this leakage shape is very similar to the one in [17]. The trace is also very noisy, since all the different spikes that occur at regular intervals are unwanted and are shifted from one trace to the other.
Figure 4.2: Back of the Solo key. The rightmost pin highlighted by the red square corresponds to the UART sending pin. The other highlighted pin is a ground reference voltage.
Figure 4.3: EM leakage observed by triggering at the beginning of the HMAC protocol.
disabled. Thanks to all of these modifications and to the measurement setup, the automated trace recording rate is around 1.5 traces per second.
These changes, allowing an automatic encryption stimulation with both the key and the plaintext (the input message) chosen, are possible thanks to the reprogramming capability of the hacker version of the Solo key. Nonetheless, they are not at all straightforward to apply to the normal version of the Solo key.
The static methods consist in evaluating the offset difference between two traces and then shifting back this offset in one of the traces. The dynamic ones propose a non-linear resampling of the traces to align them with a reference trace. One static algorithm that requires few computations is the cross-correlation alignment.
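A minimal NumPy sketch of such a cross-correlation alignment is given below; the window bounds and maximum shift are hypothetical parameters, and Algorithm 4 used in this work additionally discards the traces whose best correlation remains too low (see Figure 4.4).

    import numpy as np

    def align_trace(trace, reference, start, stop, max_shift):
        """Slide a window of `trace` around the reference pattern located
        at [start:stop], pick the shift with the highest normalized
        correlation and undo it. Assumes start > max_shift."""
        ref = reference[start:stop]
        ref = (ref - ref.mean()) / ref.std()
        best_shift, best_corr = 0, -np.inf
        for s in range(-max_shift, max_shift + 1):
            seg = trace[start + s:stop + s]
            seg = (seg - seg.mean()) / seg.std()
            corr = float(np.dot(ref, seg)) / len(ref)
            if corr > best_corr:
                best_shift, best_corr = s, corr
        return np.roll(trace, -best_shift), best_corr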
Figure 4.4: Steps of the cross-correlation alignment algorithm. The upper graph shows the initial trace set, the middle one shows the remaining traces after discarding and the lower one shows the alignment of those traces.
Figure 4.5: The upper graph corresponds to the reference trace, the middle graph is the mean of 50 000 traces without the alignment algorithm and the lower one is the mean of the traces obtained with the alignment algorithm.
The main drawback of this algorithm is its effective window, as highlighted in Figure 4.5. In this Figure, it can be observed that the mean of the aligned traces has a leakage value similar to the model trace only in a small range, roughly from sample 11 000 to sample 13 000. Outside this range, the mean of the aligned traces falls off with respect to the model trace. A dynamic alignment algorithm, such as the elastic alignment algorithm from [10], could counteract this side effect but would require much more computing power, while one can instead translate the effective window depending on the targeted point of interest. One can also notice in Figure 4.5 that the mean of the non-aligned traces leads to a barely distinguishable signal. This highlights once again the need for the trace alignment algorithm. All following experiments are made with traces aligned by Algorithm 4.
Figure 4.6: SNR value computed with a fixed key and only the least significant byte of the plaintext random. The model trace is shown as a reference.

The SNR statistical tool is used to localize in time a precise state, such as a modular addition or the bitwise AND between two values, or any bijective state related to this particular state. The SNR is said to converge at a state, or at a bijection of it, when a particular peak stands out of the noise level. The POI of a particular state are the points of convergence of this particular state.
The first analysis performed consists in evaluating the SNR of the SHA256 algorithm with fixed input states and a fixed plaintext, except for the least significant byte, which is random. This gives an idea about where the first input word is manipulated in the SHA256 structure. The result of this evaluation is shown in Figure 4.6. Since only one byte is random, a lot of bijective states exist due to the SHA256 structure. This allows guessing the different parts of the SHA256 algorithm. Effectively, one can guess that the moment where there is only noise in the SNR is the moment where the initial states are loaded, since they are not related at all to the first message word. The pattern that can be observed on the left of these loading samples corresponds to the message extension algorithm present at the beginning of SHA256, and the pattern that can be noticed on the right of these samples corresponds to the transformation rounds of SHA256. The difference of amplitude of the SNR on the sides is explained by the end of the effective window of the trace alignment algorithm. The difference of amplitude at the beginning of the rounds of SHA256 is related to the fact that the dependence of the different states of the algorithm on the first LSB of the plaintext increases with the rounds.
Figure 4.7: SNR value of targeted operations needed by the divide and conquer
strategy from round one to round four.
Then, the second analysis consists in targeting the modular addition operations through the four rounds that are necessary to perform the divide and conquer attack strategy (see Section 2.7.1). To suppress a maximum of bijective dependence between two states and thus obtain only the aimed state, the SNR is computed on a set of traces with random initial states and random plaintext. The SNR for each targeted operation is shown in Figure 4.7. The same shape can be observed from one round to the other, meaning that each round leaks more or less the same amount of information. Thanks to this test, it is possible to find the POI to perform the divide and conquer attack. One important thing to note is that the modular addition updating the value of state A delivers less information than the one updating the value of state E. Another remark is that the leftmost POI found are due to bijective dependence.
After that, the same methodology is applied to find the POI of the extend and prune strategy by targeting modular additions and bitwise AND operations through the first and second rounds (as described in Section 2.7.2). However, the eight-bit SNR does not converge at all when targeting a bitwise AND operation with the same number of traces as used when computing the SNR of the modular addition. This non-convergence can be observed in the third subfigure of Figure 4.8. This effect can be explained by the fact that, unlike the modular addition, the output of the bitwise AND is not uniform. For example, as highlighted in Table 4.1, the bitwise AND between two random two-bit variables leads to a probability of 9/16 to output 00, 3/16 to output 01 and 10, and 1/16 to output 11.
In 1  In 2  Out      In 1  In 2  Out
00    00    00       10    00    00
00    01    00       10    01    00
00    10    00       10    10    10
00    11    00       10    11    10
01    00    00       11    00    00
01    01    01       11    01    01
01    10    00       11    10    10
01    11    01       11    11    11

Table 4.1: Output of the two-bit bitwise AND between two variables
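This distribution can be checked exhaustively with a few lines of Python:

    from collections import Counter
    from itertools import product

    # Distribution of a & b over all pairs of 2-bit inputs.
    counts = Counter(a & b for a, b in product(range(4), repeat=2))
    print({format(v, "02b"): f"{c}/16" for v, c in sorted(counts.items())})
    # -> {'00': '9/16', '01': '3/16', '10': '3/16', '11': '1/16'}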
In general, the probability of the all-ones output of the bitwise AND of two random inputs is given by 2^(-2n), n being the size of the field in bits. Hence, the number of traces used to compute the SNR needs to be increased in order to obtain the POI for an eight-bit field. Decreasing the size of the field can also lead to convergence of the SNR, since there are fewer classes to estimate and the non-uniformity is less accentuated. This effect can be observed in Figure 4.8. In this Figure, the SNR is computed using 200 000 traces and targets the bitwise AND operation between A1 and B1, which happens at the beginning of round two. The SNR in the two- and four-bit fields converges while it does not converge in the eight-bit field. One can also notice that the noise floor is higher in the four-bit field than in the two-bit one. The spikes found in the first two subfigures can be used as POI.
Figure 4.8: SNR value computed with 200 000 traces of a targeted AND operation, for field sizes of 2, 4 and 8 bits.
The other POI of this strategy can be searched for using the same method and can be observed in Figure 4.9. This Figure presents the POI of the eight SCA that are necessary to recover the secret. One can notice that the convergence of the different SNR reaches different values, related to the amount of information recovered from the traces for each specific operation.
Figure 4.9: Two-bit SNR value of targeted operations needed by the extend and prune strategy over rounds one and two.
The number of traces needed by the attacks determines the malicious time in possession of the Solo key needed to break it. In order to fix a reasonable amount of this malicious time, a limit of 200 000 traces is set for the online attack, representing more or less a day and a half. First, the different SCA will be described. Then, the results of the divide and conquer strategy will be exposed and interpreted. This strategy is presented first as it is the less tricky one, since it relies only on modular addition operations and the partial secrets are independent of each other. Finally, the results of the extend and prune strategy will be shown. The attacks presented in this section only target the recovery of the inner hash. However, since the outer hash relies on the same cryptographic algorithm computed on the same MCU, there is no reason for the attacks of the extend and prune strategy not to obtain similar results in the recovery of the outer hash.
as the leakage depends on every bit of the secret. Moreover, particular attention needs to be given if an attacked block has a dependence on the other blocks. For example, in a modular addition the carry out of the previously assessed blocks needs to be taken into account in the next partial attack. In the context of this work, eight-bit partial attacks on a modular addition operation and on a bitwise AND operation will be used. In the case of the modular addition operation, the carry in between the different attacked blocks, since it propagates from the LSB to the MSB, will be considered as found by the previous successful partial attack, starting with the LSB partial attack which requires no carry. This assumption will be assessed in the next section.
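This LSB-to-MSB chaining can be sketched as follows, reusing the hypothetical cpa_byte helper from Section 2.6; the per-trace carries are recomputed from the bytes already recovered.

    import numpy as np

    def attack_word(traces, known_bytes):
        """Chain four 8-bit partial CPAs from LSB to MSB on a 32-bit
        modular addition, propagating the per-trace carry computed from
        the already recovered bytes. known_bytes is (N, 4), LSB first."""
        carry = np.zeros(known_bytes.shape[0], dtype=np.int64)
        secret = 0
        for i in range(4):
            k = known_bytes[:, i].astype(np.int64)
            b = cpa_byte(traces, k, carry)   # attack byte i, carries known
            secret |= b << (8 * i)
            carry = (k + b + carry) >> 8     # carry into the next byte
        return secret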
The four different SCA used in this framework are described below. Depending on the strategy, the value of the targeted operation can be either the result of the modular addition between a secret byte, a known byte related to the plaintext and a potential carry coming from the lower-byte addition, or the result of the bitwise AND between a secret byte and a known byte related to the plaintext. For the profiling phase required by some SCA, the secret byte is random and known.
1. A non profiled CPA. The model used in such an attack is the Hamming weight of the value of the targeted operation. The advantage of this attack is that it does not require a model computation; however, it makes the strong assumption that the targeted device leaks according to the Hamming weight. Another advantage of the non profiled CPA is that the POI are not necessary to launch the attack, even if knowing them allows a lower computation time and avoids potentially higher bijective spikes that are linked to another key hypothesis.
2. A linear regression (LR) profiled CPA [18]. This attack works with a precomputed profile that aims to capture any linear relation between the leakage value and the corresponding value of the targeted operation. The linear regression model fits a coefficient vector to minimize the residual sum of squares between the measured leakage value and the one predicted by the linear approximation of the value of the targeted operation. This LR profiled CPA requires a set of profiling traces and the knowledge of the POI, but improves the modeling of the leakage compared to the non profiled CPA and relaxes the assumption about the way the targeted device leaks (see the profiling sketch after this list). The POI used corresponds to one of the SNR spikes observed in Figure 4.7 and in Figure 4.9.
3. An exhaustive CPA with first-order moments [19]. This attack also works with a precomputed profile, but this one aims to capture any relation between the leakage and the corresponding value of the targeted operation. The modeling of the leakage is performed through the first-order statistical moment, meaning that each trace in the profiling set is first sorted into the group of its corresponding value of the targeted operation, and the mean of each group is then computed and used as the profile. This attack also requires POI, which will be the same as the ones used in the LR CPA. The advantage of this method is that it models the leakage with no assumption on its form. However, it may require more profiling traces to converge than the LR profiled CPA.
4. A univariate template attack (see Section 2.6.2). This attack, like the LR and exhaustive CPA, also requires knowledge of the POI.
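For illustration, the two profiled models can be built as in the hedged NumPy sketch below and then substituted for the Hamming weight table of the earlier cpa_byte sketch; all names are placeholders and the leakage array holds the profiling traces at the chosen POI.

    import numpy as np

    def lr_profile(leakage, values):
        """Linear regression profile: one weight per bit of the 8-bit
        intermediate value plus an intercept, fitted by least squares."""
        bits = ((values[:, None] >> np.arange(8)) & 1).astype(float)
        X = np.hstack([bits, np.ones((len(values), 1))])
        beta, *_ = np.linalg.lstsq(X, leakage, rcond=None)
        return np.array([np.append((v >> np.arange(8)) & 1, 1.0) @ beta
                         for v in range(256)])

    def moments_profile(leakage, values):
        """Exhaustive first-order-moment profile: mean leakage per value."""
        return np.array([leakage[values == v].mean() for v in range(256)])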
The main motivation to perform the three different CPA is that, in the case of the modular addition, the correlation between the modeled output of the targeted operation and the leakage, which can be observed in Figure 4.10, is really similar in terms of shape and amplitude in all three cases. Thanks to this figure, one can conclude that the Solo key leaks according to the Hamming weight model or some relation really close to it. The three models will thus be compared in terms of the number of traces needed to recover the secret.
Figure 4.10: Comparison of the correlation coefficient at the POI of the first round for all three models in the case of the divide and conquer strategy. On the left, the output of the aimed modular addition E1; on the right, A1. The corresponding SNR value of the operation is given as reference.
4.3.2 Divide and conquer strategy: results and comparison of the SCA
First, the results of the CPA according to the Hamming weight model are shown in Figure 4.11. The eight secrets that have to be found to recompute the initial states of the SHA256 algorithm are each separated into four partial CPA, one for each byte of the 32-bit secret. The graphs show in blue the evolution of the correlation coefficient of all tested hypotheses with respect to the number of traces. The correct hypothesis is highlighted in red. One can observe that all secret bytes are found in less than 50 000 traces. Another remark is that it seems more difficult to recover the secret corresponding to the A state of SHA256 than the one corresponding to the E state. This is linked to the fact that the SNR peak corresponding to the A state of SHA256 is more or less 3 times smaller than the peak corresponding to the E state.
Secondly, the results corresponding to the linear regression are presented in the same way as those of the non profiled CPA and can be observed in Figure 4.12. The profiled model is obtained with a 200 000 traces set with random initial states and random plaintext. By analyzing the results, one can conclude that they are really close to the ones obtained by the non profiled CPA and that all secrets are also recovered in less than 50 000 traces. The same explanation as in the non profiled CPA holds for the poorer performance in recovering the secret of the A state of SHA256.
Thirdly, the results corresponding to the exhaustive CPA can be seen in Figure 4.13 and are much less convincing than those of the other CPA. The profiling phase is performed with the same traces set as the LR CPA. On the one hand, the results corresponding to the secrets related to the E state are better than in the previous attacks, especially for the E2 secret, where the correlation of the correct hypothesis detaches more than in the previous attacks. On the other hand, the results corresponding to the secrets related to the A state are far poorer than in the previous attacks. Not all the secrets can be recovered in less than 50 000 traces with the exhaustive profiled CPA. This can be explained by the fact that a lower SNR requires more traces to achieve a good profiling and to succeed in the attack phase.
Figure 4.11: Correlation coefficients of all hypotheses for all partial secrets obtained by the non profiled CPA through the divide and conquer strategy, as a function of the number of traces. The correct hypothesis is represented in red.
Figure 4.12: Correlation coefficients of all hypotheses for all partial secrets obtained by the linear profiled CPA through the divide and conquer strategy, as a function of the number of traces. The correct hypothesis is represented in red.
Figure 4.13: Correlation coefficients of all hypotheses for all partial secrets obtained by the exhaustive profiled CPA through the divide and conquer strategy, as a function of the number of traces. The correct hypothesis is represented in red.
In order to compare the different attacks, the rank estimations of the initial states for each of them have been computed as a function of the number of traces used, in Figure 4.14. Unsurprisingly, the Hamming weight model and the LR model give significantly better results than the exhaustive model. Since they lead to similar results, the non profiled CPA should be preferred over the LR CPA as it does not require any profiling phase.
It is thus interesting to take the carry propagation into account and to analyze its impact on the attacks. This is shown in Figure 4.16. In this Figure, it can be noticed that the curve that takes the carry propagation into account is slightly shifted to the right, meaning that the performance of the previous attacks was effectively slightly overestimated. Nevertheless, this overestimation is quite small and does not change the final trend of the rank estimation of the secret. The attack used to produce this figure is the non profiled CPA.
Finally, the rank estimation for another set of random initial states of the SHA256 algorithm is also performed in Figure 4.17. The two curves really follow the same trend, which allows to somewhat validate the attack.
4.3.3 Extend and prune strategy: results of the SCA
First of all, an additional preliminary hypothesis is made due to the specificity of the extend and prune strategy. This strong hypothesis consists in assuming that the previous secrets needed to perform a given stage of the strategy are correctly guessed. This hypothesis will be evaluated in the case of a successful recovery of the initial states of SHA256 by an attack.
Secondly, only the non profiled CPA and the linear profiled CPA will be attempted. Indeed, in the light of their performance during the divide and conquer strategy, they are the attacks with the greatest potential of success. Moreover, this avoids the complexity of computing the exhaustive profile of the bitwise AND operation, since this complexity is related to the convergence of the eight-bit SNR of this operation.
Next, the results of the non profiled CPA are shown in Figure 4.18. These results are quite mixed. D0 and T2 are the only partial secrets that are completely recovered. These two attacks target the same states of SHA256 as the attacks of the divide and conquer strategy. The other results are less convincing, most likely due to a lack of traces. In the case of the non profiled CPA against a bitwise AND operation (E0, F0, A0, B0), the need to increase the number of traces can be due to its intrinsic nature. Indeed, as highlighted in [5], if one takes a key hypothesis K1 and another K2 = K1 ∧ 1, then HW(K1 & Mi) = HW(K2 & Mi) for 50% of the random input messages Mi, and for the other 50% of the Mi the Hamming weight will only differ by 1. The hypotheses will thus have similar correlation coefficients. In the remaining cases, α and B0, the need for more traces can be explained more simply by the fact that the operation targeted in both cases, the update of T1, leaks less than the operations that update the states A and E. This can be seen in Figure 4.9, where the SNR peaks related to the update of T1 are smaller than the ones related to the SNR of E1 and A1.
Figure 4.18: Correlation coefficients of all hypotheses for all partial secrets obtained by the non profiled CPA through the extend and prune strategy, as a function of the number of traces. The correct hypothesis is represented in red.
Finally, the results of the linear profiled CPA can be observed in Figure 4.19. The profiling phase is made with 200 000 traces, the same set of traces as the one used during the divide and conquer strategy. These results are really comparable to the ones obtained with the non profiled CPA. This is explained by a quite accurate profiled model that is close to the Hamming weight, as already highlighted above. The same explanation as the one provided for the non profiled CPA is therefore applicable to the linear profiled CPA. Only one noticeable difference can be observed, concerning G0, which is well recovered here. This is not surprising, however, since the SNR peak of T1 in the second round, even if it is smaller, is not very far from the one concerning A1.
To conclude, the different attacks through the extend and prune strategy do not allow recovering the entire secret using 200 000 traces. The absence of success is likely due to a lack of traces, resulting from small SNR spikes and from the nature of the bitwise AND operation. The failure of this strategy implies a failure to break the entire HMAC-SHA256 construction, since k1 cannot be found, and thus a failure to recover the private key of the Solo key.
Figure 4.19: Correlation coefficients of all hypotheses for all partial secrets obtained by the linear profiled CPA through the extend and prune strategy, as a function of the number of traces. The correct hypothesis is represented in red.
Chapter 5
Conclusion
The goal of this master thesis was to evaluate the resistance of the FIDO U2F protocol implemented in the open source Solo key. To realize this objective, the protocol was first investigated to find an attack surface. From this stage it can be concluded that breaking the HMAC-SHA256 present in the protocol allows the adversary to authenticate as the rightful user. Then the measurement setup needed to acquire the leakage traces necessary to mount the attack was described. After that, the analysis of the leakage was performed, thanks to a trace alignment algorithm, to recover the points of interest hidden in this leakage for the two strategies, the divide and conquer one and the extend and prune one. Finally, different attacks were performed and compared to each other. It results that, in the case of the divide and conquer strategy, the simplest attack, corresponding to a non profiled CPA, is the best and is able to recover the initial states of the inner hash of the HMAC-SHA256 algorithm in 200 000 traces. The results of the extend and prune strategy, the only strategy that can be employed to recover the initial states of the outer hash of HMAC-SHA256, are not satisfying, likely due to a lack of traces to perform the attacks. Hence, the FIDO U2F protocol implemented in the Solo key was not broken in the context of this master thesis.
However, a lot of improvements can be envisioned. The need to increase the number of traces, and so the malicious time in possession of the targeted Solo key, could be reduced by improving the attacks. Indeed, a better synchronization of the traces (through the elastic alignment algorithm [10] for example), a better measurement setup or a preprocessing of the traces such as frequency filtering would hopefully lead to a reduction of the number of traces required by the attacks. The use of more powerful attacks or an increase in the number of traces during the profiling phase could also go in this direction.
Finally, due to the potential feasibility of the described attacks, some countermeasures could be considered. Those countermeasures can take multiple forms. For example, a masked implementation of the SHA256 algorithm [4] would only add some time to the authentication protocol but would require a much larger, up to infeasible, number of traces to break. Indeed, using only two shares would increase the number of traces needed to break the inner hash to the square of 200 000. Or, more simply, protecting the microcontroller with an electromagnetic shield would force the attacker to leave fingerprints on the targeted token, while only increasing the cost of the device.
Bibliography
[3] Colin O'Flynn. MIN()imum failure: EMFI attacks against USB stacks. WOOT'19: Proceedings of the 13th USENIX Conference on Offensive Technologies, page 15, 2019.

[4] Robert McEvoy, Michael Tunstall, Colin C. Murphy, and William P. Marnane. Differential power analysis of HMAC based on SHA-2, and countermeasures. Information Security Applications: 8th International Workshop, WISA 2007, Revised Selected Papers, pages 317-332, 2007.

[5] Sonia Belaïd, Luk Bettale, Emmanuelle Dottax, Laurie Genelle, and Franck Rondepierre. Differential power analysis of HMAC SHA-2 in the Hamming weight model. 2013 International Conference on Security and Cryptography (SECRYPT), 2013.

[6] Pankaj Rohatgi and Mark Marson. NSA Suite B crypto, keys, and side channel attacks. RSA Conference 2013. https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE2KEup.

[7] Catherine H. Gebotys, Brian A. White, and Edgar Mateos. Preaveraging and carry propagate approaches to side-channel analysis of HMAC-SHA256. ACM Transactions on Embedded Computing Systems, Article No. 4, 2016.

[8] David Oswald, Bastian Richter, and Christof Paar. Side-channel attacks on the Yubikey 2 one-time password generator. Research in Attacks, Intrusions, and Defenses: 16th International Symposium, RAID 2013, Rodney Bay, St. Lucia, October 23-25, 2013, Proceedings, pages 204-222, 2013.

[9] Mihir Bellare, Ran Canetti, and Hugo Krawczyk. Keying hash functions for message authentication. Advances in Cryptology, Proceedings of Crypto '96, pages 1-15, 1996.

[10] Jasper G. J. van Woudenberg, Marc F. Witteman, and Bram Bakker. Improving differential power analysis by elastic alignment. CT-RSA'11: Proceedings of the 11th International Conference on Topics in Cryptology, pages 104-119, 2011.

[11] Cezary Glowacz, Vincent Grosso, Romain Poussier, Joachim Schüth, and François-Xavier Standaert. Simpler and more efficient rank estimation for side-channel security assessment. FSE 2015: Fast Software Encryption, pages 117-129, 2015.

[12] Paul Kocher, Joshua Jaffe, and Benjamin Jun. Differential power analysis. CRYPTO '99, Advances in Cryptology, pages 388-397, 1999.

[13] Sampath Srinivas, Dirk Balfanz, Eric Tiffany, and Alexei Czeskis. Universal 2nd Factor (U2F) overview. https://fidoalliance.org/specs/fido-u2f-v1.2-ps-20170411/fido-u2f-overview-v1.2-ps-20170411.html.

[14] Dirk Balfanz, Jakob Ehrensvard, and Juan Lang. FIDO U2F raw message formats. https://fidoalliance.org/specs/fido-u2f-v1.2-ps-20170411/fido-u2f-raw-message-formats-v1.2-ps-20170411.html.

[15] Mehmet Adalier and Antara Teknik. Efficient and secure elliptic curve cryptography implementation of Curve P-256. Workshop on Elliptic Curve Cryptography Standards, 2015.

[17] Pierre-Alain Fouque, Gaëtan Leurent, Denis Réal, and Frédéric Valette. Practical electromagnetic template attack on HMAC. Cryptographic Hardware and Embedded Systems - CHES 2009, pages 60-80, 2009.

[18] Werner Schindler, Kerstin Lemke, and Christof Paar. A stochastic model for differential side channel cryptanalysis. Cryptographic Hardware and Embedded Systems - CHES 2005, 7th International Workshop, pages 30-46, 2005.

[19] Amir Moradi and François-Xavier Standaert. Moments-correlating DPA. TIS '16: Proceedings of the 2016 ACM Workshop on Theory of Implementation Security, pages 5-15, 2016.
UNIVERSITÉ CATHOLIQUE DE LOUVAIN
École polytechnique de Louvain
Rue Archimède, 1 bte L6.11.01, 1348 Louvain-la-Neuve, Belgique | www.uclouvain.be/epl