Sec19 Dessouky
Hardware Bugs
Ghada Dessouky and David Gens, Technische Universität Darmstadt; Patrick Haney and
Garrett Persyn, Texas A&M University; Arun Kanuparthi, Hareesh Khattri, and Jason
M. Fung, Intel Corporation; Ahmad-Reza Sadeghi, Technische Universität Darmstadt;
Jeyavijayan Rajendran, Texas A&M University
https://www.usenix.org/conference/usenixsecurity19/presentation/dessouky
Ghada Dessouky† , David Gens† , Patrick Haney∗ , Garrett Persyn∗ , Arun Kanuparthi◦ ,
Hareesh Khattri◦ , Jason M. Fung◦ , Ahmad-Reza Sadeghi† , Jeyavijayan Rajendran∗
† Technische Universität Darmstadt, Germany. ∗ Texas A&M University, College Station, USA.
◦ Intel Corporation, Hillsboro, OR, USA.
ghada.dessouky@trust.tu-darmstadt.de,david.gens@trust.tu-darmstadt.de,
prh537@tamu.edu,gpersyn@tamu.edu,arun.kanuparthi@intel.com,
hareesh.khattri@intel.com,jason.m.fung@intel.com,
ahmad.sadeghi@trust.tu-darmstadt.de,jv.rajendran@tamu.edu
Abstract

Modern computer systems are becoming faster, more efficient, and increasingly interconnected with each generation. Thus, these platforms grow more complex, with new features continually introducing the possibility of new bugs. Although the semiconductor industry employs a combination of different verification techniques to ensure the security of System-on-Chip (SoC) designs, a growing number of increasingly sophisticated attacks are starting to leverage cross-layer bugs. These attacks leverage subtle interactions between hardware and software, as recently demonstrated through a series of real-world exploits that affected all major hardware vendors.
In this paper, we take a deep dive into microarchitectural security from a hardware designer’s perspective by reviewing state-of-the-art approaches used to detect hardware vulnerabilities at design time. We show that a protection gap currently exists, leaving chip designs vulnerable to software-based attacks that can exploit these hardware vulnerabilities. Inspired by real-world vulnerabilities and insights from our industry collaborator (a leading chip manufacturer), we construct the first representative testbed of real-world software-exploitable RTL bugs based on RISC-V SoCs. Patching these bugs may not always be possible and can potentially result in a product recall. Based on our testbed, we conduct two extensive case studies to analyze the effectiveness of state-of-the-art security verification approaches and identify specific classes of vulnerabilities, which we call HardFails, which these approaches fail to detect. Through our work, we focus the spotlight on specific limitations of these approaches to propel future research in these directions. We envision our RISC-V testbed of RTL bugs providing a rich exploratory ground for future research in hardware security verification and contributing to the open-source hardware landscape.

1 Introduction

The divide between hardware and software security research is starting to take its toll, as we witness increasingly sophisticated attacks that combine software and hardware bugs to exploit computing platforms at runtime [20, 23, 36, 43, 45, 64, 69, 72, 74]. These cross-layer attacks disrupt traditional threat models, which assume either hardware-only or software-only adversaries. Such attacks may provoke physical effects to induce hardware faults or trigger unintended microarchitectural states. They can make these effects visible to software adversaries, enabling them to exploit these hardware vulnerabilities remotely. The affected targets range from low-end embedded devices to complex servers, that are hardened with advanced defenses, such as data-execution prevention, supervisor-mode execution prevention, and control-flow integrity.

Hardware vulnerabilities. Cross-layer attacks circumvent many existing security mechanisms [20, 23, 43, 45, 64, 69, 72, 74], that focus on mitigating attacks exploiting software vulnerabilities. Moreover, hardware-security extensions are not designed to tackle hardware vulnerabilities. Their implementation remains vulnerable to potentially undetected hardware bugs committed at design-time. In fact, deployed extensions such as SGX [31] and TrustZone [3] have been targets of successful cross-layer attacks [69, 72]. Research projects such as Sanctum [18], Sanctuary [8], or Keystone [39] are also not designed to ensure security at the hardware implementation level. Hardware vulnerabilities can occur due to: (a) incorrect or ambiguous security specifications, (b) incorrect design, (c) flawed implementation of the design, or (d) a combination thereof. Hardware implementation bugs are introduced either through human error or faulty translation of the design in gate-level synthesis.
SoC designs are typically implemented at register-transfer level (RTL) by engineers using hardware description languages (HDLs), such as Verilog and VHDL, which are synthesized into a lower-level representation using automated tools. Just like software programmers introduce bugs to the high-level code, hardware engineers may accidentally introduce bugs to the RTL code. While software errors typically cause a crash which triggers various fallback routines to ensure the safety and security of other programs running on the platform, no such safety net exists for hardware bugs. Thus, even mi-
2 | Addresses for L2 memory is out of the specified range. | Native | ✓ | ✓ | ✓ | 43 | 6746 | 3.5×10^13
3 | Processor assigns privilege level of execution incorrectly from CSR. | Native | ✗ | ✓ | ✓ | 2 | 1186 | 2.1×10^96
4 | Register that controls GPIO lock can be written to with software. | Inserted (CVE-2017-18293) | ✓ | ✓ | ✗ | 2 | 1186 | 2.1×10^96
5 | Reset clears the GPIO lock control register. | Inserted (CVE-2017-18293) | ✓ | ✓ | ✗ | 2 | 408 | 1
6 | Incorrect address range for APB allows memory aliasing. | Inserted (CVE-2018-12206 / CVE-2019-6260) | ✓ | ✓ | ✗ | 1 | 110 | 2
8 | Address range overlap between GPIO, SPI, and SoC control peripherals. | Inserted (CVE-2018-12206 / (CVE-2017-5704) | ✓ | ✓ | ✓ | 68 | 14635 | 9.4×10^21
10 | Advanced debug unit only checks 31 of the 32 bits of the password. | Inserted (CVE-2017-18347 / CVE-2017-7564) | ✗ | ✓ | ✗ | 4 | 436 | 16
11 | Able to access debug register when in halt mode. | Native (CVE-2017-18347 / | ✗ | ✓ | ✓ | 2 | 887 | 1
12 | Password check for the debug unit does not reset after successful check. | Inserted (CVE-2017-7564) | ✗ | ✓ | ✓ | 4 | 436 | 16
13 | Faulty decoder state machine logic in RISC-V core results in a hang. | Native | ✗ | ✓ | ✓ | 2 | 1119 | 32
14 | Incomplete case statement in ALU can cause unpredictable behavior. | Native | ✗ | ✓ | ✓ | 2 | 1152 | 4
15 | Faulty logic in the RTC causing inaccurate time calculation for security-critical flows, e.g., DRM. | Native | ✗ | ✓ | ✗ | 1 | 191 | 1
16 | Reset for the advanced debug unit not operational. | Inserted (CVE-2017-18347) | ✗ | ✗ | ✓ | 4 | 436 | 16
19 | Insecure hash function in the cryptography module. | Inserted (CVE-2018-1751) | ✗ | ✗ | ✗ | 24 | 2651 | N/A
20 | Cryptographic key for AES stored in unprotected memory. | Inserted (CVE-2018-8933 / CVE-2014-0881 / CVE-2017-5704) | ✗ | ✗ | ✗ | 57 | 8955 | 1
22 | ROM size is too small preventing execution of security code. | Inserted (CVE-2018-6242 / CVE-2018-15383) | ✗ | ✗ | ✓ | 1 | 751 | N/A
23 | Disabled the ability to activate the security-enhanced core. | Inserted (CVE-2018-12206) | ✗ | ✗ | ✗ | 1 | 282 | N/A
25 | Unprivileged user-space code can write to the privileged CSR. | Inserted (CVE-2018-7522 / CVE-2017-0352) | ✗ | ✗ | ✓ | 1 | 745 | 1
26 | Advanced debug unit password is hard-coded and set on reset. | Inserted (CVE-2018-8870) | ✗ | ✗ | ✓ | 1 | 406 | 16
27 | Secure mode is not required to write to interrupt registers. | Inserted (CVE-2017-0352) | ✗ | ✗ | ✓ | 1 | 303 | 1
30 | Supervisor mode signal of a core is floating preventing the use of SMAP. | Native | ✗ | ✗ | ✓ | 1 | 282 | 1
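To make the effect of bug 10 above concrete (the debug unit comparing only 31 of the 32 password bits), the following is a minimal Python sketch rather than the RTL itself. The names buggy_check and accepted_candidates, the value of SECRET, and the choice that the skipped bit is the most significant one are all assumptions made for illustration.

```python
PASSWORD_BITS = 32
CHECKED_BITS = 31  # the bug: one of the 32 bits is never compared

SECRET = 0xDEADBEEF  # hypothetical password, not taken from the SoC


def buggy_check(candidate: int, secret: int) -> bool:
    """Compare only the low CHECKED_BITS bits.

    Which bit the real checker skips is an assumption; here it is bit 31.
    """
    mask = (1 << CHECKED_BITS) - 1
    return (candidate & mask) == (secret & mask)


def accepted_candidates(secret: int) -> list[int]:
    """Return every 32-bit value that the buggy checker accepts."""
    low = secret & ((1 << CHECKED_BITS) - 1)
    return [low, low | (1 << CHECKED_BITS)]


# Worst-case number of guesses with and without the bug.
full_space = 2 ** PASSWORD_BITS
reduced_space = 2 ** CHECKED_BITS
```

Under these assumptions two distinct 32-bit values unlock the checker, and an exhaustive search needs at most 2^31 guesses instead of 2^32.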
FIGURE 2: Hardware overview of the PULPissimo SoC. Each bug icon indicates the presence of at least one security vulnerability in the module.

TLB Page Fault Timing Side Channel (L-1 & L-2).
While analyzing the Ariane RTL, we noted a timing side-channel leakage with TLB accesses. TLB page faults due to illegal accesses occur in a different number of clock cycles than page faults that occur due to unmapped memory (we contacted the developers and they acknowledged the vulnerability). This timing disparity in the RTL manifests in the microarchitectural behavior of the processor. Thus, it constitutes a software-visible side channel due to the measurable clock-cycle difference in the two cases. Previous work already demonstrated how this can be exploited by user-space adversaries to probe mapped and unmapped pages and to break randomization-based defenses [24, 29].
Timing flow properties cannot be directly expressed by simple properties or modeled by state-of-the-art verification techniques. Moreover, for this vulnerability, we identify at least seven RTL modules that would need to be modeled, analyzed and verified in combination, namely: mmu.sv, nbdcache.sv, tlb.sv instantiations, ptw.sv, load_unit.sv, and store_unit.sv. Besides modeling their complex inter- and intra-modular logic flows (L-1), the timing flows need to be modeled to formally prove the absence of this timing channel leakage, which is not supported by current industry-standard tools (L-2). Hence, the only alternative is to verify this property by manually inspecting and following the clock cycle transitions across the RTL modules, which is highly cumbersome and error-prone. However, the design must still be analyzed to verify that timing side-channel resilience is implemented correctly and bug-free in the RTL. This only becomes far more complex for real-world industry-standard SoCs. We show the RTL hierarchy of the Ariane core in Figure 5 in Appendix A to illustrate its complexity.

Pre-Fetched Cache State Not Rolled Back (L-1 & L-3).
Another issue in Ariane is with the cache state when a system return instruction is executed, where the privilege level of the core is not changed until this instruction is retired. Before retirement, linear fetching (guided by branch prediction) of data and instructions following the unretired system return instruction continues at the current higher system privilege level. Once the instruction is retired, the execution mode of the core is changed to the unprivileged level, but the entries that were pre-fetched into the cache (at the system privilege level) do not get flushed. These shared cache entries are visible to user-space software, thus enabling timing channels between privileged and unprivileged software.
Verifying the implementation of all the flush control signals and their behavior in all different states of the processor requires examining at least eight modules: ariane.sv, controller.sv, frontend.sv, id_stage.sv, icache.sv, fetch_fifo, ariane_pkg.sv, and csr_regfile.sv (see Figure 5). This is complex because it requires identifying and defining all the relevant security properties to be checked across these RTL modules. Since current industry-standard approaches do not support expressive capturing and the verification of cache states, this issue in the RTL can only be found by manual inspection.

Firmware-Configured Memory Ranges (L-4).
In PULPissimo, we added peripherals with injected bugs to reproduce bugs from CVEs. We added an AES encryption/decryption engine whose input key is stored and fetched from memory tightly coupled to the processor. The memory address the key is stored in is unknown, and whether it is within the protected memory range or not is inconclusive by observing the RTL alone. In real-world SoCs, the AES key is stored in programmable fuses. During secure boot, the bootloader/firmware senses the fuses and stores the key to memory-mapped registers. The access control filter is then configured to allow only the AES engine access to these registers, thus protecting this memory range. Because the open-source SoC we used did not contain a fuse infrastructure, the key storage was mimicked to be in a register in the Memory-Mapped I/O (MMIO) space.
Although the information flow of the AES key is defined in hardware, its location is controlled by the firmware. Reasoning on whether the information flow is allowed or not using conventional hardware verification approaches is inconclusive when considering the RTL code in isolation.
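The dependence on firmware can be illustrated with a small Python model. This is only a sketch under stated assumptions: the AccessFilter class, the master names "aes" and "cpu", the address KEY_ADDR, and both firmware configurations are hypothetical and do not come from the PULPissimo sources. The point is that the same hardware model yields a protected or an exposed key depending solely on the range the firmware programs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessFilter:
    """Hypothetical access-control filter with one protected range."""
    base: int            # first protected address (inclusive)
    limit: int           # end of protected range (exclusive)
    allowed_master: str  # the only bus master allowed inside the range

    def permits(self, master: str, addr: int) -> bool:
        in_range = self.base <= addr < self.limit
        # Outside the protected range every master is allowed.
        return (not in_range) or master == self.allowed_master


KEY_ADDR = 0x1A10_0040  # hypothetical MMIO address of the AES key register


def key_is_protected(flt: AccessFilter, key_addr: int = KEY_ADDR) -> bool:
    """The key counts as protected if the AES engine can reach it and the CPU cannot."""
    return flt.permits("aes", key_addr) and not flt.permits("cpu", key_addr)


# Two firmware configurations applied to identical hardware.
good_fw = AccessFilter(base=0x1A10_0000, limit=0x1A10_0100, allowed_master="aes")
bad_fw = AccessFilter(base=0x1A10_0080, limit=0x1A10_0100, allowed_master="aes")

print(key_is_protected(good_fw))  # range covers the key register
print(key_is_protected(bad_fw))   # range misses the key register
```

In this model nothing about the filter logic changes between the two cases, which is why a check that sees only the hardware description cannot decide whether the key is protected.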
and manual inspection. Our first case study covered manual
inspection, simulation and emulation techniques. Thus, we
We thank our anonymous reviewers and shepherd, Stephen Checkoway, for their valuable feedback. The work was supported by the Intel Collaborative Research Institute for Collaborative Autonomous & Resilient Systems (ICRI-CARS), the German Research Foundation (DFG) by CRC 1119 CROSSING P3, and the Office of Naval Research (ONR Award #N00014-18-1-2058). We would also like to acknowledge the co-organizers of Hack@DAC: Dan Holcomb (UMass-Amherst), Siddharth Garg (NYU), and Sourav Sudhir (TAMU), and the sponsors of Hack@DAC: the National Science Foundation (NSF CNS-1749175), NYU CCS, Mentor - a Siemens Business and CROSSING, as well as the participants of Hack@DAC.

[12] Cadence. JasperGold Security Path Verification App. https://www.cadence.com/content/cadence-www/global/en_US/home/tools/system-design-and-verification/formal-and-static-verification/jasper-gold-verification-platform/security-path-verification-app.html, 2018. Last accessed on 09/09/18.
[13] M. Castro, M. Costa, and T. Harris. Securing software by enforcing data-flow integrity. USENIX Symposium on Operating Systems Design and Implementation, pages 147–160, 2006.
[14] D. P. Christopher Celio, Krste Asanovic. The Berkeley Out-of-Order Machine. https://riscv.org/wp-content/uploads/2016/01/Wed1345-RISCV-Workshop-3-BOOM.pdf, 2016.
[15] Cisco. Cisco: Strengthening Cisco Products. https://www.cisco.com/c/en/us/about/security-center/security-programs/secure-development-lifecycle.html, 2017.
[4] R. Armstrong, R. Punnoose, M. Wong, and J. Mayo. Survey of Existing Tools for Formal Verification. Sandia National Laboratories https://prod.sandia.gov/techlib-noauth/access-control.cgi/2014/1420533.pdf, 2014.
[20] D. Evtyushkin, R. Riley, N. C. Abu-Ghazaleh, D. Ponomarev, et al. BranchScope: A New Side-Channel Attack on Directional Branch Predictor. ACM Conference on Architectural Support for Programming Languages and Operating Systems, pages 693–707, 2018.
[57] J. Oberg. Secure Development Lifecycle for Hardware Becomes an Imperative. https://www.eetimes.com/author.asp?section_id=36&doc_id=1332962, 2018.
[58] J. Oberg, W. Hu, A. Irturk, M. Tiwari, T. Sherwood, and R. Kastner. Theoretical Analysis of Gate Level Information Flow Tracking. IEEE/ACM Design Automation Conference, pages 244–247, 2010.
[59] PULP Platform. Ariane. https://github.com/pulp-platform/ariane, 2018.
[75] D. Zhang, Y. Wang, G. E. Suh, and A. C. Myers. A Hardware Design Language for Timing-Sensitive Information-Flow Security. International Conference on Architectural Support for Programming Languages and Operating Systems, pages 503–516, 2015.
[76] T. Zhang and R. B. Lee. New Models of Cache Architectures Characterizing Information Leakage from Cache Side Channels. ACSAC, pages 96–105, 2014.
[69] A. Tang, S. Sethumadhavan, and S. Stolfo. CLKSCREW: exposing the perils of security-oblivious energy management. USENIX Security Symposium, pages 1057–1074, 2017.
[70] M. Tiwari, H. M. Wassel, B. Mazloom, S. Mysore, F. T. Chong, and T. Sherwood. Complete Information Flow Tracking from the Gates Up. ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pages 109–120, 2009.
[71] Tortuga Logic. Verifying Security at the Hardware/Software Boundary. http://www.tortugalogic.com/unison-whitepaper/, 2017.
[72] J. Van Bulck, F. Piessens, and R. Strackx. Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution. USENIX Security Symposium, 2018.
[73] A. Waterman, Y. Lee, D. A. Patterson, and K. Asanovic. The RISC-V Instruction Set Manual. Volume 1: User-Level ISA, Version 2.0. https://content.riscv.org/wp-content/uploads/2017/05/riscv-spec-v2.2.pdf, 2014.

C Details on the PULPissimo Bugs

We present next more detail on some of the RTL bugs used in our investigation.
Bugs in crypto units and incorrect usage: We extended the SoC with a faulty cryptographic unit with a multiplexer to select between AES, SHA-1, MD5, and a temperature sensor. The multiplexer was modified such that a race condition occurs if more than one bit in the status register is enabled, causing unreliable behavior in these security-critical modules. Furthermore, both SHA-1 and MD5 are outdated and broken cryptographic hash functions. Such bugs are not detectable by formal verification, since they occur due to a specification/design issue and not an implementation flaw; they are therefore out of the scope of automated approaches and formal verification methods. The cryptographic key is
[Figure residue: RTL module blocks frontend, id_stage, issue_stage, ex_stage, commit_stage, csr_regfile, perf_counters, controller, std_cache_subsystem]
TABLE 2: Classification of the underlying vulnerabilities of recent microarchitectural attacks by their HardFail properties.
    ...
    aes_1cc aes(
        .clk(0),
        ...
        .rst(1),
        .g_input(b),
        .e_input(a),
        .o(aes_out)
    );

FIGURE 6: Our attack exploits a bug in the implementation of the memory bus of the PULPissimo SoC: by (1) spamming the bus with invalid transactions an adversary can make (4) malicious write requests be set to operational.
Bugs in security modes: We replaced the standard PULP_SECURE parameter in the riscv_cs_registers and riscv_int_controller modules with another constant parameter to permanently disable the security/privilege checks for these two modules. Another bug we inserted is switching the write and read protections for the AXI bus interface, causing erroneous checks for read and write accesses.
Bugs in the JTAG module: We implemented a JTAG password-checker and injected multiple bugs in it, including the password being hardcoded in the password checking file. The password checker also only checks the first 31 bits, which reduces the computational complexity of brute-forcing the password. The password checker does not reset the state of the correctness of the password when an incorrect bit is detected, allowing for repeated partial checks of passwords to end up unlocking the password checker. This is also facilitated by the fact that the index overflows after the user hits bit 31, allowing for an infinite cycling of bit checks.

D Exploiting Hardware Bugs From Software

We now explain how one of our hardware bugs can be exploited in the real world by software. This RTL vulnerability manifests in the following way. When an error signal is generated on the memory bus while the underlying logic is still handling an outstanding transaction, the next signal to be handled will instead be considered operational by the module unconditionally. This lets erroneous memory accesses slip through hardware checks at runtime. Armed with the knowledge about this vulnerability, an adversary can force memory access errors to evade the checks. As shown in Figure 6, the memory bus decoder unit (unit of the memory interconnect) is assumed to have the bug. This causes errors to be ignored under certain conditions (see bug number #7 in Table 1). In the first step (1), the attacker generates a user program (Task A) that registers a dummy signal handler for the segmentation fault (SIGSEGV) access violation. Task A then executes a loop with (2) a faulting memory access to an invalid memory address (e.g., LW x5, 0x0). This will generate an error in the memory subsystem of the processor and issue an invalid memory access interrupt (i.e., 0x0000008C) to the processor. The processor raises this interrupt to the running software (in this case the OS), using the pre-configured interrupt handler routines in software. The interrupt handler in the OS will then forward this as a signal to the faulting task (3), which keeps looping and continuously generating invalid accesses. Meanwhile, the attacker launches a separate Task B, which will then issue a single memory access (4) to a privileged memory location (e.g., LW x6, 0xf77c3000). In this situation, multiple outstanding memory transactions will be generated on the memory bus, all of which but one will be flagged as faulty by the address decoder. An invalid memory access will always precede the single access of Task B. Due to the bug in the memory bus address decoder, (5) the malicious memory access will become operational instead of triggering an error. Thus, the attacker can issue read and write instructions to arbitrary privileged (and unprivileged) memory by forcing the malicious illegal access to be preceded by a faulty access. Using this technique the attacker can eventually leverage this read-write primitive, e.g., (6) to escalate privileges by writing the process control block (PCB_B) for his task to elevate the corresponding process to root. This bug leaves the attacker with access to a root process, gaining control over the entire platform and potentially compromising all the processes running on the system.
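The decoder behavior described in this appendix can be mimicked with a small Python state machine. This is a behavioral sketch under stated assumptions, not the RTL of the memory interconnect: the class BusDecoder, the boundary PRIV_BASE, and the single error_outstanding flag are hypothetical simplifications. Only the two example addresses are taken from the text.

```python
INVALID_ADDR = 0x0        # faulting address used by Task A in the text
PRIV_ADDR = 0xF77C3000    # privileged address targeted by Task B in the text
PRIV_BASE = 0xF0000000    # hypothetical start of the privileged region


class BusDecoder:
    """Toy address decoder; buggy=True models the flaw described above."""

    def __init__(self, buggy: bool) -> None:
        self.buggy = buggy
        self.error_outstanding = False  # set while a faulty request is pending

    def _is_legal(self, addr: int, privileged: bool) -> bool:
        if addr == INVALID_ADDR:
            return False
        if addr >= PRIV_BASE and not privileged:
            return False
        return True

    def request(self, addr: int, privileged: bool = False) -> bool:
        """Return True if the request is treated as operational."""
        if self.buggy and self.error_outstanding:
            # Bug: the request following an error skips the legality check.
            self.error_outstanding = False
            return True
        legal = self._is_legal(addr, privileged)
        self.error_outstanding = not legal
        return legal


def attack(decoder: BusDecoder) -> bool:
    """Task A issues a faulting access, then Task B targets privileged memory."""
    decoder.request(INVALID_ADDR)      # step (2): forced error
    return decoder.request(PRIV_ADDR)  # step (4): unprivileged privileged access
```

In this model the unprivileged access to PRIV_ADDR is rejected by a correct decoder and by a buggy decoder with no pending error, but it is accepted by the buggy decoder when a faulty access immediately precedes it, which matches the ordering the attack relies on.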