ZenHammer: Rowhammer Attacks on AMD Zen-based Platforms (USENIX Security 2024)
Patrick Jattke† Max Wipfli† Flavien Solt Michele Marazzi Matej Bölcskei Kaveh Razavi
ETH Zurich
† Equal contribution first authors
…that this assumption does not hold on our target systems.
Figure 2. (a) Function values for f(x) given by 0x64440100 for same-cluster addresses over the full address range on Z3, showing an uneven distribution between “0” and “1”. (b) After offsetting the physical addresses by 768 MiB before applying the function, the same function’s output looks evenly distributed. (c) This allows us to find the function g(x) defined by 0x44440100 that is constant for the cluster’s addresses across all memory. We color the addresses whose function value changes when applying the offset in green (0 → 1) or blue (1 → 0).
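Concretely, a function like f(x) is the XOR (parity) of the physical address bits selected by its hex mask, and the offset in (b) is the distance between the PCI mapping’s start address and the 4 GiB boundary. A minimal sketch, where the subtraction-based correction for addresses above 4 GiB and the iomem-line parsing are our assumptions based on the “System Address Map” and “Automation” discussion below:

```c
#include <stdint.h>
#include <stdio.h>

/* A DRAM address function given as an XOR bitmask (e.g., 0x64440100)
 * evaluates to the parity of all physical address bits selected by
 * the mask (GCC/Clang builtin). */
static int addr_fn(uint64_t mask, uint64_t paddr) {
    return __builtin_parityll(mask & paddr);
}

/* Assumed offset correction: physical addresses at or above 4 GiB
 * are shifted back by the system-specific offset (768 MiB on Z3)
 * before the function is applied. */
static uint64_t apply_offset(uint64_t paddr, uint64_t offset_bytes) {
    const uint64_t four_gib = 4ULL << 30;
    return paddr >= four_gib ? paddr - offset_bytes : paddr;
}

/* Derive the offset from an iomem line such as
 * "e0000000-f7ffffff : PCI Bus 0000:00" (reading /proc/iomem
 * requires root): offset = 4 GiB - PCI mapping start address.
 * Returns a negative value if the line cannot be parsed. */
static int64_t offset_from_iomem_line(const char *line) {
    unsigned long long start, end;
    if (sscanf(line, "%llx-%llx", &start, &end) != 2)
        return -1;
    return (int64_t)((4ULL << 30) - start);
}
```

For Z2, for example, a PCI mapping starting at 0xe0000000 (3584 MiB) yields an offset of 4096 − 3584 = 512 MiB.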
Figure 3. Remapping of higher address ranges to unused parts of physical memory on Intel and AMD CPUs. The Top of Memory (TOM) is the system’s highest addressable memory location.

Table 2. Primary PCI memory mappings and detected physical address offsets, i.e., the difference between 4 GiB and the PCI mapping’s start address.

System   PCI Range [MiB]   Offset [MiB]
Z+       3072 – 4048       1024
Z2       3584 – 3968       512
Z3       3328 – 4076       768

System Address Map. We now provide an explanation and supporting evidence for the existence of this offset. The physical address space is divided into ranges backed by main memory (i.e., DRAM) and ranges for memory-mapped I/O (MMIO) devices. In particular, PCI(e) devices are commonly mapped just below the 4 GiB boundary to keep 32-bit compatibility, thus masking parts of main memory. Due to DRAM sizes in the order of gigabytes, CPU vendors introduced mechanisms to remap the otherwise inaccessible part of DRAM to a higher address range, as shown in Figure 3. Intel still employs this “OS Invisible Reclaim” mechanism [16, p. 19], but AMD stopped documenting “Memory Hoisting” [2, §2.9.12] with the Zen microarchitecture. Our findings suggest that newer AMD processors shift all physical addresses above 4 GiB by a fixed, system-specific offset. This offset depends on the system’s hardware configuration, e.g., the mainboard and installed PCI(e) devices.

Automation. To avoid having to brute-force the physical address offset, we analyze the system memory map of our target systems to find the location of the primary PCI memory mapping.² For example, as we show in Table 2, the PCI memory range on Z2 starts at 3584 MiB and ends at 3968 MiB. This allows us to precisely calculate the system’s address offset by determining the difference between the PCI mapping’s start address and the 4 GiB boundary, for example, 4096 − 3584 = 512 MiB for Z2. We apply the offset to our physical addresses before brute-forcing the address functions to produce valid functions on all our systems.

² In Linux, the “PCI Bus 0000:00” entry in the (privileged) /proc/iomem file.

4.3 Recovered Address Mappings

We run DARE on all our systems using single- and dual-rank DIMMs. DARE successfully reverse engineers the address functions for all memory configurations on all three systems. For simplicity, we limit our analysis to single-channel, single-DIMM systems with default UEFI settings, as this is sufficient for performing Rowhammer.

To validate our results, we verify the functions’ correctness using a high-bandwidth oscilloscope, similar to previous work [32]. This also allows us to obtain the function labels (i.e., assign functions to the DRAM address components) and clarify the cases where our tool found linear combinations of the actual address functions. We note that this manual step is not required for Rowhammer attacks.

Results. We provide a list of all our reverse-engineered and oscilloscope-validated address functions for three AMD Zen microarchitectures and different memory configurations in Table 3. Note that some physical address bits are above the 1 GiB mark, which explains why DARE uses as many 1 GiB superpages as possible while building same-bank address clusters.

Observation 2. We need access to a memory block larger than 1 GiB to entirely recover all DRAM address mappings.

Discussion. To the best of our knowledge, we are the first to reverse engineer and provide physically validated DRAM address mappings on recent AMD Zen-based systems with consideration of the address offsets. Further, we provide an improved reverse-engineering tool to reproduce and extend our results with more memory configurations as needed.

Row Mapping. DARE, just like DRAMA, does not allow the detection of physical address bits used for DRAM row and column indices. Therefore, before we can experimentally evaluate our address mappings using a Rowhammer attack, we need to extract the row mapping. Based on previous results [9, 42], we assume that the highest available address bits are used
Table 3. Reverse engineered address mappings and offsets for different DRAM configurations. All memory configurations are single-channel,
single-DIMM, with the tuple indicating the DIMM’s geometry (#ranks, #bank groups, #banks per bank group, #rows).
for row indexing, which we verified with our oscilloscope. For example, a 16 GiB device (with 2^16 rows) consists of 2^34 individually addressable bytes, and a row index is described by the bits (a33, a32, …, a18).

4.4 Enabling Exploitation

On our Intel Coffee Lake system, the bank, bank group, and rank bits all fall within the lower 21 bits, i.e., within a transparent huge page (THP). However, we noticed that the address functions on AMD Zen 2 and Zen 3 systems can cover up to bit 34 (see Table 3). This makes exploitation without knowing these bits challenging. Previous methods assume DRAM functions with all addressing bits falling in the lower 21 bits [24], do not take advantage of THPs [25], or color THPs for other purposes such as cache eviction [9]. We now describe how the bank conflict side channel and the reverse-engineered DRAM mappings can be combined to detect consecutive same-bank rows, which is crucial for Rowhammer attacks.

Coloring THPs. We allocate 256 MiB of 2 MiB-aligned memory and turn it into 2 MiB THPs using madvise. We then iterate in steps of 2 MiB over the allocated memory such that the 21 lower bits are always the same. As the upper physical address bits are unknown, we cannot directly apply our recovered address functions. Instead, we use the bank conflict side channel to measure if the current THP conflicts with any other THP we found before. If two THPs conflict, we assign them the same color; otherwise, we assign a new color to the current THP. This approach allows us to assign a color to each THP based on the unknown upper physical address bits.

Detecting same-bank rows. Given that THPs are 2 MiB contiguous memory regions, we know that the lower 21 physical and virtual address bits are the same. Thus, we can group the THPs of the same color and use our recovered address functions on the lower bits to address consecutive same-bank rows. For that, we iterate over the row index bits that fall into the lower 21 bits. As they may overlap with bank address bits, it may require flipping lower (non-overlapping) bits to stay within the same bank. As the values of the DRAM functions for all THPs with the same color are identical, we can use the same THP row offsets for all THPs of the same color. Finally, we validate the row addresses using our bank conflict side channel and discard all THPs where any two rows do not cause bank conflicts.

Results. We measured how long the coloring and detecting same-bank rows take on our Zen 3 system with a dual-rank DIMM (S2 in Table 4). The THP coloring took on average 39.23 s and must be repeated for each attack as the THP allocation in physical memory changes. Detecting same-bank rows for each THP color is a one-time cost that can be precomputed for each system memory configuration and took on average 18 ms.

4.5 Evaluation

In addition to the physical validation of our mappings, we use Rowhammer on our AMD systems with non-uniform hammering patterns [17] to see if we can trigger bit flips, as this requires precise DRAM addressing. Further, we evaluate the recent Half-Double patterns [24].

Threat Model. In our evaluation, we assume that the CPU model of the target machine is known to the attacker and that they have obtained the correct DRAM address mappings, for example, using DARE. We further assume that an unprivileged attacker can execute programs on the victim’s machine but does not know anything more specific about the DRAM devices (e.g., the DRAM chip manufacturer).

Setup. We modify the reference implementation of Blacksmith [17].³ Our changes include adding the address mappings we found previously and other necessary platform changes, such as timing thresholds, as the fuzzer was originally designed for an Intel Coffee Lake system. However, we do not apply any microarchitecture-specific optimizations. For the evaluation, we do six-hour fuzzing runs on both Z2 and Z3 with the ten DDR4 DIMMs listed in Table 4 that we ordered randomly from an online retailer. These DIMMs cover

³ https://github.com/comsec-group/blacksmith
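The THP coloring step described above can be simulated as follows; the bank-conflict oracle here is a stand-in for the real timing side channel (for the simulation it simply compares the hidden upper address bits, which the attacker cannot read directly):

```c
#include <stddef.h>

/* Stand-in for the bank conflict side channel: in the real attack
 * this is a timing measurement on two addresses; here we simulate it
 * with the (attacker-invisible) upper physical address bits of each
 * THP. */
static int bank_conflict(unsigned upper_bits_a, unsigned upper_bits_b) {
    return upper_bits_a == upper_bits_b;
}

/* Assign each 2 MiB THP a color such that conflicting THPs (i.e.,
 * THPs whose DRAM function values on the upper bits agree) share a
 * color. Returns the number of distinct colors assigned. */
static int color_thps(const unsigned upper_bits[], int color[], size_t n) {
    int next_color = 0;
    for (size_t i = 0; i < n; i++) {
        color[i] = -1;
        for (size_t j = 0; j < i; j++) {
            if (bank_conflict(upper_bits[i], upper_bits[j])) {
                color[i] = color[j]; /* conflicts: reuse existing color */
                break;
            }
        }
        if (color[i] == -1)
            color[i] = next_color++; /* no conflict seen: new color */
    }
    return next_color;
}
```

The real implementation replaces `bank_conflict` with the timing measurement on the first rows of the two THPs.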
Table 4. DDR4 UDIMMs used in the evaluation of our AMD Zen-optimized Rowhammer fuzzer. We abbreviate the DRAM vendors Samsung (S), SK Hynix (H), and Micron (M). For each device, we report the number of ranks (RK), bank groups (BG), banks per bank group (BA), and rows (R).

ID   Production Date   Freq. [MHz]   Size [GiB]   Geometry (RK, BG, BA, R)
S0   Q3-2020†          2132          8            (1, 4, 4, 2^16)
S1   Q3-2020†          2132          16           (2, 4, 4, 2^16)
S2   Q2-2020           2666          32           (2, 4, 4, 2^17)
S3   Q4-2017           2400          8            (1, 4, 4, 2^16)
S4   Q3-2020†          2666          8            (1, 4, 4, 2^16)
S5   Q2-2020           2666          16           (2, 4, 4, 2^16)
H0   Q3-2020†          2132          16           (2, 4, 4, 2^16)
H1   Q4-2020           2400          8            (1, 4, 4, 2^16)
M0   Q1-2020           2666          8            (1, 4, 4, 2^16)
M1   Q1-2020           2400          8            (1, 4, 4, 2^16)
† Purchase date used as production date unavailable.

Listing 1. Refresh synchronization routine as used by Blacksmith.

void ref_sync_original(volatile char* rows[2]) {
  while (true) {
    uint64_t start = rdtscp(); /* START TIMER */
    lfence();
    *rows[0]; *rows[1];
    clflushopt(rows[0]); clflushopt(rows[1]);
    uint64_t stop = rdtscp(); /* STOP TIMER */
    lfence();
    if ((stop - start) > THRESHOLD) break;
  }
}

Table 5. Result of running Blacksmith with our address mappings and platform fixes (e.g., thresholds) on AMD Zen 2 and Zen 3 systems, compared to our Intel Coffee Lake baseline. We report for each device the number of patterns found (|P+|) and the number of bit flips over all patterns (|Ffuzz|). We omit devices without any bit flips.

the three major DRAM manufacturers. To allow comparison with Intel, we further run the same code on the same DIMMs on a Coffee Lake (Core i7-8700K) machine.

Results. The result of our evaluation is presented in Table 5. It shows that with our minimal changes, we can trigger bit flips on our Zen 2 system; however, only on 5 of 10 modules. We could not find any patterns on Zen 3. This is much lower than the 8 of 10 modules on the Intel Coffee Lake platform. We further note that the number of patterns found in the worst case (S2) is roughly 50x smaller on Zen 2 (14 patterns) than on Coffee Lake (782 patterns).

We also tested Half-Double [24] patterns on all DDR4 devices with our address mappings and the reference implementation.⁴ As we did not find any bit flips on our devices using these patterns, and Half-Double has not been shown to be exploitable on x86-64 machines, we disregard these patterns in the remainder of this work and base ZenHammer on non-uniform Rowhammer patterns.

Based on our results, we conclude that the common hammering instruction sequence as used by Blacksmith [17] and others [10] encodes implicit assumptions about the underlying Intel microarchitecture. Our results show that this significantly affects Rowhammer’s effectiveness on other platforms, such as the AMD systems targeted in this work. Motivated by this, we investigate the two crucial aspects of hammering, namely, refresh synchronization (Section 5) and the activation rate (Section 6) on AMD systems, and show how ZenHammer can improve them.

5.1 Blacksmith Synchronization

In Listing 1, we present Blacksmith’s synchronization routine, which uses two same-bank rows. This method relies on RDTSCP to capture timestamps, LFENCE to serialize the execution stream, and CLFLUSHOPT to immediately flush accessed rows. It assumes a REF has been detected whenever the timing measurements exceed a predefined threshold.

Evaluation. To detect whether synchronization works properly, we evaluate the time between detected refreshes, both on Z+ and Z3. When refresh commands are correctly detected, we expect the time between them to be around 7.8 µs, i.e., tREFI as specified by the DDR4 standard [18].

Results. The experiment results, each with 10 K iterations, are presented in Figure 4. The median latencies are 7.62 µs for Z+ and 5.37 µs for Z3.⁵ While the data for the Zen+ system

⁴ https://github.com/IAIK/halfdouble
⁵ For a fair comparison with Blacksmith, which uses AsmJit [23] to just-in-time (JIT) compile hammering patterns and their synchronization from x86-64 assembly, we implement all routines using AsmJit. We show equivalent C representations throughout this paper.
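The evaluation above measures the time between detected refreshes. A simplified, platform-independent sketch of that analysis (threshold-based REF detection over a stream of measured access latencies; the function name and interface are ours):

```c
#include <stdint.h>
#include <stddef.h>

/* Scan per-iteration access latencies and record the gaps (in
 * iterations) between detections, i.e., latencies exceeding the
 * threshold and therefore attributed to an interfering REF command.
 * Returns the number of recorded intervals. */
static size_t ref_intervals(const uint64_t latency[], size_t n,
                            uint64_t threshold,
                            size_t intervals[], size_t max_out) {
    size_t count = 0, last = 0;
    int seen = 0;
    for (size_t i = 0; i < n; i++) {
        if (latency[i] > threshold) {
            if (seen && count < max_out)
                intervals[count++] = i - last; /* gap since previous REF */
            last = i;
            seen = 1;
        }
    }
    return count;
}
```

With correct detection, the recorded intervals should correspond to a constant time of roughly 7.8 µs (tREFI); irregular intervals indicate missed or spurious detections, as observed on Zen 3.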
[Figure 4: distributions of measured REF-to-REF intervals for Z+ and Z3 (y-axis: # samples; the expected 7.8 µs tREFI is marked).]

Listing 2. Our continuous, non-repeating refresh synchronization.

Table 6. REF-to-REF interval when using the continuous, non-repeating timing measurement routine (ref_sync_nonrep) for different numbers of rows on Z+ and Z3. We identify as outliers all the values that differ more than 10 % from the median.

         Median [µs]      Outliers [%]
#Rows    Z+      Z3       Z+      Z3
16       2.01    2.62     7.3     24.7
32       1.19    4.41     43.4    71.4
64       7.81    7.77     0.3     0.6
128      7.93    7.85     0.3     0.7
256      7.80    7.71     0.2     0.7
Orig.†   7.62    5.37     1.1     93.4
† The original refresh sync. routine with 2 rows (see Figure 4).

suggests that this method works quite reliably, REFs are often detected too early on Zen 3. This could be because of two reasons: either the refresh detection fails most of the time, or the memory controller schedules REFs opportunistically. The latter is possible because the DDR4 standard [18] only specifies the average time between refresh commands and allows for some flexibility. In the following section, we will show that it is possible to detect the majority of refreshes reliably, as the original refresh synchronization method is inadequate on our AMD platforms.

5.2 Precise and Reliable Synchronization

We analyzed Blacksmith’s refresh synchronization routine as used by ZenHammer to identify possible measurement errors. By looking at the source code (Listing 1), we identified a brief time window, where fencing (lfence) happens, that is not measured: between the stop timestamp and the next iteration’s start timestamp. As the memory controller has some flexibility for scheduling refresh commands, it can happen that a REF sometimes remains undetected if it falls into this untimed gap. Furthermore, the memory controller may schedule the REF commands opportunistically during flush instructions, reducing the accuracy of detecting the REF commands.

Continuous Measurements. To mitigate this issue, we propose a modified refresh synchronization routine with continuous, non-repeating timing measurements: each recorded timestamp serves as both the end time of the current measurement round and the start time of the next. This ensures that all the instructions are included in the timing measurement. To ensure that the memory controller does not opportunistically schedule REF commands during the flush instructions, we avoid flushing during the synchronization phase. We solve this by designing a new method that allows a flexible number of rows and measures the latency of each memory access individually.

Avoiding Cache Hits. To avoid CLFLUSHOPT during synchronization, our code can only access different rows so as not to incur cache hits. To evict the cache lines for the subsequent synchronization phase, we flush the accessed rows after the REF is detected. Our continuous, non-repeating timing measurement routine is presented in Listing 2.

Evaluation. We evaluate our new routine using the same experiment as before. We show the obtained distribution of measured REF-to-REF intervals in Table 6. The results demonstrate that when more than 32 rows are employed in the synchronization, we correctly identify refreshes on all our systems. This means that a sufficient number of unique rows is necessary to cover an entire refresh interval (i.e., 7.8 µs) before falling through the end of the detection loop.

Observation 3. Continuous, non-repeating time measurements strongly improve the reliability of our refresh command detection.

6 Activation Rate

We noticed that the number of tested patterns on the AMD systems is significantly lower than on the Intel Coffee Lake baseline during fuzzing, on average by 45 % (Z2) and 52 % (Z3). As we fuzz for a fixed period (6 h) while hammering each pattern for 5 M activations, this suggests that each individual pattern takes significantly longer to hammer. To investigate this, we measure hammering execution times to compute the average number of activations per refresh interval (ACTs/tREFI) for each pattern. We present the comparison between Z+, Z3, and Coffee Lake in Figure 5. The data shows that the average number of ACTs/tREFI achieved on Z+ (41.9) and Z3 (37.2) is only about half of that on Coffee Lake (76.8). The lower activation rate on the AMD systems has a direct impact on Rowhammer, as discussed next.

Hammer Count Estimation. We now approximate the hammer count (HC) that a victim row is subjected to given these
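A sketch consistent with the description of Listing 2 and the routine name used in Table 6: each timestamp both ends one measurement round and starts the next, every iteration touches a fresh row, and no flushes occur during synchronization. The callback-based timer and row access are our abstraction so the logic is self-contained; the actual routine uses RDTSCP and volatile loads:

```c
#include <stdint.h>
#include <stddef.h>

/* Continuous, non-repeating synchronization (sketch): each timestamp
 * ends the current measurement round and starts the next, so no
 * instruction falls into an untimed gap. Every iteration accesses a
 * different row, avoiding cache hits without CLFLUSHOPT; the caller
 * flushes the accessed rows after the REF has been detected.
 * Returns the row index at which the REF was detected. */
static size_t ref_sync_nonrep(uint64_t (*timestamp)(void),
                              void (*access_row)(size_t),
                              size_t num_rows, uint64_t threshold) {
    uint64_t prev = timestamp();
    for (size_t i = 0;; i = (i + 1) % num_rows) {
        access_row(i);
        uint64_t now = timestamp(); /* ends this round, starts the next */
        if (now - prev > threshold)
            return i; /* latency spike: REF detected */
        prev = now;
    }
}

/* Deterministic fake clock for demonstration: the 6th timestamp jumps
 * by 100 ticks, emulating a REF-induced latency spike. */
static uint64_t demo_time, demo_calls;
static uint64_t demo_clock(void) {
    demo_calls++;
    demo_time += (demo_calls == 6) ? 100 : 10;
    return demo_time;
}
static void demo_access(size_t row) { (void)row; }
```

With 10-tick baseline latencies and a threshold of 50, the simulated spike on the sixth timestamp is detected on the fifth row access.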
Table 7. Heatmap of memory access rates (in ACTs/tREFI) for

Figure 6. Comparison of the four effective scheduling policies (SPnone, SPpair, SPrep, SPfull) grouped by vendors, normalized by #devices per vendor. The dashed areas indicate how often each policy was the best in the number of effective patterns. The percentages per vendor sum up to the total percentage of devices with bit flips.

…scheduling policies (SPs), which are summarized in Table 8. Besides the two simple policies, no fences (SPnone) and fencing after every access (SPfull), we propose four policies that take the pattern’s structure into account: fencing every base period (SPBP) or every half base period (SPBP/2), fencing between aggressor pairs (SPpair), and fencing between repetitions of the same aggressors (SPrep). Some scheduling policies are cache-avoiding, i.e., they strongly order all consecutive accesses to the same aggressor. However, we still consider all policies on all our systems, as previous work has shown that omitting fences can lead to both higher activation rates [8] and more bit flips [43] despite possibly incurring cache hits.

Evaluation. We evaluate the effectiveness of our fence scheduling policies in two ways. To begin with, we build a theoretical model for the amount of ordering provided by different scheduling policies, and contrast this with the hammering speeds obtained with the respective policies on our systems, as described in Appendix C. The results show that SPpair and SPrep can provide significantly higher activation rates when compared to SPfull without allowing significant reordering. To validate our theoretical model against the real world, we perform 6 h fuzzing for each of our ten DIMMs (Table 4). We employ the two proposed policies SPpair and SPrep, and for comparison SPnone and SPfull. As the activation rate experiment (Section 6.1) was inconclusive in determining which memory barrier is optimal, we randomize the fence type between MFENCE and LFENCE.

In Figure 6, we show the results of our experiments. We present how many configurations generated at least one effective hammering pattern per vendor, normalized by the number of DIMMs from that vendor. These results describe which configuration is most widely effective for each DRAM vendor. From the data, we observe that fencing is not strictly required, as SPnone found bit flips on all devices from Samsung on both Zen 2 and Zen 3. However, SPpair is the most effective policy on Zen 2 across most devices (75 %). The same, but less significantly, applies to Zen 3.

Observation 5. For Samsung devices, the scheduling policy SPpair is the most widely applicable (across devices) and most effective (across patterns).

…SPnone and SPrep on half of all devices.

Observation 6. For SK Hynix devices, choosing SPpair works best across different devices.

Lastly, we have not found any effective hammering pattern for Micron devices using SPnone, which indicates that ordering is essential for these chips. This behavior could be explained by the type of deployed in-DRAM mitigation: Rowhammer mitigations that sample rows with non-uniform probabilities are harder to evade if the accesses are uncontrollably reordered.

Observation 7. Preserving ordering in hammer patterns is essential on Micron devices.

As the results show that the best scheduling policy may vary for different devices from the same vendor, we do not incorporate vendor-specific policies in ZenHammer.

7 Evaluation

In this section, we compare ZenHammer, specifically designed for Rowhammer on Zen-based systems, to the baseline established on Intel in Section 4.5. In addition, we assess the impact of our optimizations on the effectiveness of ZenHammer and evaluate the exploitability of the discovered bit flips. We first describe our evaluation setup and methodology (Section 7.1) and then present and discuss the results (Section 7.2). We conclude by applying ZenHammer on DDR5 devices (Section 7.3).

7.1 Setup and Methodology

For our evaluation, we pick the same previously used DDR4 devices (Sections 4 and 6), covering DRAM chips from all three major DRAM manufacturers: Samsung (S), SK Hynix (H), and Micron (M). For establishing the Intel baseline, we used an Intel Core i7-8700K. The AMD Zen 2 and Zen 3 machines are equipped with the CPUs listed in Table 1. All machines use default UEFI settings and device timings.

In line with previous work [10, 17], we evaluate ZenHammer in three stages: (i) fuzzing for 6 h using ZenHammer for each configuration (i.e., fence scheduling policy), (ii) determining the best pattern using a minisweep over all effective patterns by moving the pattern over a physically contiguous 4 MiB of memory, and (iii) sweeping the best pattern found over a physically contiguous 256 MiB memory range to assess the device’s vulnerability level and the bit flips’ exploitability. We note that our approach does not rely on any DRAM device-specific knowledge, as we tested all fence scheduling policies and fence types on each device to determine the optimal per-device configuration (see Section 6).
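A simplified model of how the simple policies differ in fence density; the enum and per-policy counts are our illustration, and the full definitions (including SPBP and SPBP/2) are given in Table 8:

```c
#include <stddef.h>

/* Simplified model of three fence scheduling policies: the number of
 * fences emitted for a hammering pattern of `accesses` aggressor
 * accesses. Fewer fences permit more reordering by the memory
 * controller but allow higher activation rates. */
typedef enum { SP_NONE, SP_PAIR, SP_FULL } sched_policy;

static size_t fences_emitted(sched_policy p, size_t accesses) {
    switch (p) {
    case SP_NONE: return 0;            /* no fences at all */
    case SP_PAIR: return accesses / 2; /* one fence per aggressor pair */
    case SP_FULL: return accesses;     /* fence after every access */
    }
    return 0;
}
```

For a pattern of 64 aggressor accesses, SPpair thus issues half as many fences as SPfull, which is one way to see why it can sustain higher activation rates while still constraining reordering within each pair.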
Table 9. ZenHammer results on AMD Zen 2 and Zen 3 as well as Intel Coffee Lake. For each of our ten devices, we report the best scheduling policy (SPopt) and the number of effective patterns (|P+|) and bit flips (|Ffuzz|) found while fuzzing with the best policy. We also show the number of bit flips found when sweeping the best patterns over a 256 MiB range (|Fswp|).

        Zen 2                              Zen 3                              Coffee Lake
ID    SPopt    |P+|   |Ffuzz|   |Fswp|   SPopt    |P+|   |Ffuzz|   |Fswp|   SPopt   |P+|   |Ffuzz|   |Fswp|
S0    SPrep    51     151       6,945    SPnone   31     124       17,775   SPfull  122    3,502     6,782
S1    SPrep    26     97        1,758    SPpair   25     144       15,613   SPfull  102    1,374     10,106
S2    SPnone   97     1,685     12,893   SPnone   45     471       79,306   SPfull  782    22,339    1,708
S3    SPnone   8      15        2,020    SPpair   1      1         667      SPfull  3      3         0
S4    SPnone   60     182       1,183    SPpair   43     297       13       SPfull  47     654       18,357
S5    SPnone   25     83        1,911    SPpair   26     87        10,741   SPfull  155    4,131     5,860
H0    SPnone   6      13        182      –        0      0         0        –       0      0         0
H1    –        0      0         0        –        0      0         0        SPfull  24     35        0
M0    –        0      0         0        –        0      0         0        –       0      0         0
M1    –        0      0         0        –        0      0         0        SPfull  16     23        2
Table 10. Analysis of the bit flip exploitability found during the sweep over 256 MiB on AMD Zen 2, Zen 3, and Intel Coffee Lake. For each attack, we indicate the number of exploitable bit flips (#Ex.) and the average time to find an exploitable bit flip (Time). We mark DIMMs with a single exploitable bit flip by (*). We omit DIMMs without any exploitable bit flips.

        PTE [36]                              RSA-2048 [34]                            sudo [11]
DIMM    Zen 2       Zen 3       Coffee Lake   Zen 2        Zen 3        Coffee Lake   Zen 2   Zen 3       Coffee Lake
        #Ex. Time   #Ex. Time   #Ex. Time     #Ex. Time    #Ex. Time    #Ex. Time     #Ex. T. #Ex. Time   #Ex. Time
S0      7   6m 4s   7   2m 55s  3    4m 15s   17  2m 47s   37  46s      14  1m 36s    –  –    4  3m 13s   1  *23m 49s
S1      90  9s      1474 2s     846  2s       6   2m 2s    27  30s      21  26s       –  –    1  *6m 50s  1  *1m 20s
S2      641 21s     5326 1s     126  11s      30  2m 16s   170 6s       6   1m 59s    –  –    12 1m 17s   –  –
S3      142 9s      61   32s    –    –        7   2m 21s   –   –        –   –         –  –    –  –        –  –
S4      220 28s     3    23m 52s 2658 1s      7   12m 29s  1   *23m 52s 53  26s       –  –    –  –        4  5m 16s
S5      102 6s      625  2s     330  4s       6   1m 14s   28  33s      11  1m 5s     –  –    2  5m 58s   3  2m 34s
H0      11  53s     –    –      –    –        –   –        –   –        –   –         –  –    –  –        –  –
7.2 Effectiveness and Exploitability

The results of our evaluation are presented in Table 9. We show for each tested platform (AMD Zen 2 and Zen 3, Intel Coffee Lake) and each DDR4 device the number of effective patterns found (|P+|) and the number of bit flips (|Ffuzz|) found during fuzzing with the device’s best fence scheduling policy (SPopt), which we used in all three stages. For Intel Coffee Lake, we assumed the scheduling policy SPfull, which corresponds to the one used by the original Blacksmith fuzzer.

We also show, for the best pattern, the total number of bit flips over the swept 256 MiB of physically contiguous memory (|Fswp|), which we then use to assess the exploitability of three known Rowhammer end-to-end attacks in Table 10.

For the exploitability analysis, we follow prior work [7, 10, 17] and use the Rowhammer attack simulation framework Hammertime [37] to estimate the required time for three previously proposed Rowhammer attacks targeting (i) page table entries (PTEs) to craft an arbitrary memory read/write primitive [36], (ii) RSA-2048 keys to break SSH public-key authentication [34], and (iii) the sudo binary to elevate privileges to the root user [11]. We use the bit flips we found during the sweep with the best pattern to perform the exploitability analysis.

Results. Our results in Table 9 show that our Zen-based platform optimizations have strongly improved the number of devices we can trigger bit flips on: from 5 and 0 devices before any optimizations (see Table 5) to 7 and 6 devices afterward, for Zen 2 and Zen 3, respectively. The number of effective hammering patterns found further increased drastically, in the best case (S2) by roughly six times (from 14 to 97). Moreover, the results on Zen 3, where we had not found any bit flips previously, stress the need for our optimizations to trigger any bit flips on the AMD Zen 3 platform. This shows that the hammering instruction sequence and fence scheduling policy are important when adapting Rowhammer attacks to new platforms.

Nevertheless, we note that there are still strong differences in terms of hammering effectiveness between AMD and Intel. On Intel, four of eight DIMMs have a higher bit flip count in the sweep than the same devices on Zen 2. Interestingly, there is one device (H0) where we could not find any bit flip on Coffee Lake while ZenHammer is successful on Zen 2. Generally, our optimizations seem to be more effective on Zen 3, where the number of bit flips of the best pattern during the sweep is in 5 out of 6 cases higher than on Coffee Lake. In the best case (S2), we find 46x more bit flips on Zen 3 (79,306) than on Coffee Lake (1,708). These results suggest that the effectiveness of a Rowhammer attack does not entirely depend on the activation rate, which is generally higher on Coffee Lake than on Zen 3, but also on enforcing the order of aggressor accesses (i.e., the fencing policy) and CPU-specific memory controller optimizations.
Table 11. Reverse engineered address mappings and offsets for our Zen 4 (Ryzen 7 7700X) system. All memory configurations are single-channel, single-DIMM, with the tuple indicating the DIMM’s geometry (#subchannels, #ranks, #bank groups, #banks per bank group, #rows).
Exploitability Analysis. The larger number of bit flips after our optimizations strongly facilitates exploitation, as we show in Table 10. The PTE attack by Seaborn [35] can be exploited in the best case in around one second on both Zen 3 and Coffee Lake. Due to the lower number of exploitable bit flips on Zen 2, we need in the best case six times as long (6 s) as on the two other systems. There is one device (S3) where exploitation is not possible at all on Coffee Lake due to missing bit flips, but on Zen 2 and Zen 3 we can find exploitable bit flips in 9 s and 32 s, respectively. We note that even if the number of bit flips is very low (e.g., 3 bit flips on S4, Zen 3), we were still able to exploit the system in a practical time (23 m 52 s).

The RSA-2048 key attack [34] is, on 4 of 5 exploitable devices, on average 38 s faster on Zen 3 than on Coffee Lake. Overall, the average time to find an exploitable bit flip is 3 m 52 s, 29 s, and 1 m 6 s for Zen 2, Zen 3, and Coffee Lake, respectively. We note that the device H0, with bit flips only on Zen 2, is not exploitable. Our data shows that even if we find only a very low number of patterns (e.g., 7 patterns for S3), we are still likely to find an exploitable bit flip (2 m 21 s).

Lastly, the sudo binary exploit [11] is the hardest attack as it requires a precise set of bit flips. Given the low number of bit flips on Zen 2, we cannot find any exploitable bit flips for this attack. For the remaining platforms, Zen 3 and Coffee Lake, we find an equal number of exploitable devices (4) and a similar average time to find an exploitable bit flip, 3 m 29 s and 3 m 55 s, respectively, when excluding devices with a single bit flip only. The exploitable devices are those that showed the highest number of bit flips while sweeping on these platforms.

End-to-End Attack’s Practicality. As our exploitability analysis is based on simulation results, we further verified the practicality of the PTE attack by Seaborn and Dullien [36]. Our attack’s implementation is based on the THP coloring technique described in Section 4.4. Moreover, we modified our ZenHammer fuzzer to use THPs, as has been done before for n-sided patterns [9]. This means we distribute aggressors across THPs such that aggressor pairs are placed on the same THP and the pattern is spread across multiple THPs. We successfully verified the attack’s feasibility on device S2. Over ten successful attack runs (i.e., obtaining root privileges), we report an average time of 93 seconds for the end-to-end attack once an exploitable bit flip has been found. This includes the time for THP coloring as reported in Section 4.4.

Discussion. These results show that, using the techniques we discussed in this paper, ZenHammer enables practical Rowhammer exploits on AMD Zen-based platforms for the first time. We also believe that our insights will make it easier to port Rowhammer attacks to newer platforms in the future, such as DDR5 devices, as we show next.

7.3 ZenHammer on DDR5

As part of our evaluation, we tested whether ZenHammer is effective in triggering bit flips on more recent devices (DDR5). We reverse engineered the DRAM address functions of our Zen 4 system (Ryzen 7 7700X) and present the functions in Table 11. As for DDR4, we randomly picked ten DDR5 devices (Table 16 in Appendix D) and repeated the experiment described in Section 6.2 to find the best fence scheduling policy for each device.

We found bit flips on only 1 of 10 tested devices (S1), suggesting that the changes in DDR5, such as improved Rowhammer mitigations, on-die error correction code (ECC), and a higher refresh rate (32 ms), make it harder to trigger bit flips. On S1 with the policy SPnone, we found 109 patterns and 23,110 bit flips during fuzzing. The best pattern triggered 41,995 bit flips during the sweep over 256 MiB of memory. Given the lack of bit flips on 9 of 10 DDR5 devices, more work is needed to better understand the potentially new Rowhammer mitigations and their security guarantees.

8 Related Work

In this section, we discuss differences between DARE and existing tools for reverse engineering DRAM address functions (Section 8.1). Thereafter, we discuss similar and orthogonal approaches used to reverse engineer the DRAM address functions (Section 8.2). Lastly, we summarize previous efforts regarding Rowhammer on pre-Zen AMD systems (Section 8.3).

8.1 Comparison to Existing Tools

In Table 12, we compare our new reverse engineering tool DARE to the open-source tool DRAMA [32] and concurrent work AMDRE [14]. DRAMA was not able to recover the correct DRAM address mappings on our Zen-based systems,
while AMDRE could only partially (up to bit 21) recover the Zen 2 functions due to its limitation to 2 MiB THPs. Our changes enabled us to recover the complete and correct DRAM address mappings in a fast and reliable way. Like DRAMA and AMDRE, our tool requires superuser privileges for the virtual-to-physical address translation. However, an attacker could recover the DRAM address mappings offline, i.e., on another system with the same hardware configuration. We now discuss our improvements over the existing work.

Reliable Timing. The timing routine used in DRAMA does not work reliably on AMD Zen-based systems, leading to many outliers. In AMDRE, the timing routine works mostly reliably, except for the few occasions where the automatic threshold detection fails. We designed an optimized and more reliable timing routine in Section 4.1.

Superpages. During reverse engineering, we use all available 1 GiB superpages, as higher physical address bits (above 1 GiB) are involved in some address mappings. Both DRAMA and AMDRE can be configured to use more memory; however, only with 4 KiB pages and 2 MiB THPs, respectively.

Pairwise Testing. We reduce false positives by measuring pairwise latencies for cluster addresses and removing those conflicting with less than 75% of the cluster, thus creating perfect bank clusters. AMDRE uses a similar technique to remove false positives.

Address Offsets. The functions found by DRAMA and AMDRE are not valid across the whole physical address space. This is caused by the remapping of physical memory above the 4 GiB mark, which introduces a nonlinearity. DARE is the first tool to take this into account by applying a system-specific offset prior to brute forcing the XOR functions.

Strict Validation. DRAMA only requires that candidate functions do not produce the same result across the clusters. Our and AMDRE's condition is stronger, requiring that every function returns the same result on exactly half of all clusters. This condition allows us to filter out many invalid address functions early on while brute forcing the functions.

Table 12. Comparison of DARE with AMDRE and DRAMA. The table shows features and changes made for correctness (Corr.), noise handling (Noise), and performance improvement (Perf.).

                        DARE  AMDRE  DRAMA
  Thresh. Detection
  – Autom. Detection     ✔     ✔      ✔
  – Reliable Timing      ✔     ✔      ✘
  Clustering
  – Superpages           ✔     ✘      ✘
  – Pairwise Testing     ✔     ✔      ✘
  Brute forcing
  – Address Offsets      ✔     ✘      ✘
  – Strict Validation    ✔     ✔      ✘

8.2 Comparison to Other Techniques

The approaches used by existing work to reverse engineer the secret DRAM address mappings can be divided into software-based and hardware-based approaches. Software-based approaches generally require side channels, such as bank conflicts. Hardware-based techniques instead require specialized equipment like a logic analyzer. We compare the existing approaches in Table 13, which we now explain in more detail.

Our comparison considers three categories: requirements, results, and features. For the Requirements, we compare the monetary costs involved (Cst.), whether any special hardware is needed (HW), and whether the method relies on a side channel (SC). In the Results category, we look at how generic (Gen.) the approach is (i.e., if it also works with different memory configurations), the result's completeness (Cpl.) w.r.t. the different DRAM address components, and the result's precision (Prec.), i.e., how reliable the results are. Lastly, the Features category considers whether the approach can obtain labels for the found functions (Lbl.) and analyze the devices' internal row remapping (RR).

Table 13. Comparison of existing software-based (top) and hardware-based (bottom) techniques for recovering DRAM address mappings. Our work uses row buffer conflicts to find the functions and an oscilloscope to verify their validity.

  (1) Row buffer conflict [5, 14, 32, 40, 42, 43]
  (2) Rowhammer [35]
  (3) Perf. counters [15]
  (4) Oscilloscope [32]
  (5) Logic analyzer [31]
  (6) Retention + Temp. [20]

Requirements. Software-based approaches (1)–(3) are cost-effective, essentially free. Oscilloscopes (4) are affordable, while logic analyzers (5) are more expensive. Approach (6) requires an FPGA and special heating equipment. Using Rowhammer bit flips as a side channel (2) requires a vulnerable device, which might be hard to obtain. To the best of our knowledge, only server platforms provide hardware-based performance counters (3) with DRAM-related data. Besides Rowhammer bit flips (2), other side channels used are row buffer conflicts (1) and DRAM retention time (6).

Results. Oscilloscopes (4), logic analyzers (5), and Rowhammer (2) are purely generic and support any DRAM device configuration. Exploiting row buffer conflicts (1) may require tweaking timing thresholds in multi-DIMM/-channel setups. Only logic analyzers (5) can recover all DRAM address components, as the limited number of channels on oscilloscopes (4) may make data filtering for some address component hard or impossible. The retention time approach (6)
cannot recover DRAM address bits requiring multiple DRAM devices. The hardware-based approaches (4)–(6) and performance counters (3) provide high precision, whereas row buffer conflicts (1) require a reliable timing function. Using Rowhammer itself (2) might be imprecise, as mitigations in the memory controller or the devices themselves could disturb the bit flip feedback channel.

Features. All hardware-based approaches (4)–(6) provide information to derive labels for DRAM address mappings. Depending on their availability, performance counters (3) may have separate counters per bank and/or rank, allowing only some labels to be derived. Rowhammer bit flips (2) and DRAM retention (6) are the only techniques that allow reversing the DRAM-internal row remapping.

Relation to Our Work. Similar to previous work, we rely on the row buffer conflict side channel (1) to reverse engineer the DRAM address mappings. However, we are the first to take the address offset into account and to collect addresses from multiple superpages, enabling us to recover the correct mappings on all Zen-based systems. Furthermore, we use an oscilloscope (4), with the same method as in previous work [32], to physically validate our address mappings.

8.3 Rowhammer on AMD

Little attention has been paid to Rowhammer on AMD in the past decade. The original Rowhammer study from 2014 by Kim et al. [22] showed bit flips on Intel and AMD Piledriver. On these older systems, using the same hammering instructions on both platforms was still effective. We demonstrated that this is no longer the case for modern CPUs.

Later, in 2016, a comparative analysis looked into Rowhammer on Intel (Sandy Bridge, Ivy Bridge, and Haswell) and AMD (Piledriver) platforms. It showed that not only is the access rate much lower on AMD (6.1 M/s compared to 11.6 M/s–12.3 M/s), but the number of bit flips observed is also roughly two orders of magnitude larger on Intel (16.1 k–22.9 k) than on AMD (59) [27]. Our findings show a lower number of bit flips on AMD Zen 2 compared to Intel systems, even after our optimizations.

9 Conclusion

We presented ZenHammer, the first successful Rowhammer attacks launched from AMD Zen-based CPUs. To build ZenHammer, we needed to overcome a number of challenges, including the reverse engineering of the DRAM addressing functions by taking physical address offsets into account, a new mechanism for synchronization with refresh commands, and careful scheduling of flushing and fencing instructions to improve the activation throughput of Rowhammer patterns. ZenHammer is capable of flipping bits on 7 and 6 out of our ten DDR4 samples on AMD Zen 2 and Zen 3, respectively, enabling Rowhammer exploits on recent AMD platforms for the first time. We further show Rowhammer bit flips on a DDR5 device for the first time.

Acknowledgments

We thank the anonymous reviewers for their feedback. This research was supported by the Swiss National Science Foundation under NCCR Automation, grant agreement 51NF40_180545, by the Swiss State Secretariat for Education, Research and Innovation under contract number MB22.00057 (ERC-StG PROMISE), and by a Microsoft Swiss JRC grant.

References

[1] PassMark CPU Benchmarks: AMD vs Intel Market Share. URL https://www.cpubenchmark.net/market_share.html.

[2] Advanced Micro Devices. BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors, January 2013. URL https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/programmer-references/42301_15h_Mod_00h-0Fh_BKDG.pdf.

[3] Advanced Micro Devices. AMD64 Architecture Programmer's Manual Volume 4: 128-Bit and 256-Bit Media Instructions, November 2021. URL https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/26568.pdf.

[4] Advanced Micro Devices. AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions, June 2023. URL https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24594.pdf.

[5] Alessandro Barenghi, Luca Breveglieri, Niccolò Izzo, and Gerardo Pelosi. Software-only Reverse Engineering of Physical DRAM Mappings for Rowhammer Attacks. In IVSW '18, pages 19–24, July 2018.

[6] Yaakov Cohen, Kevin Sam Tharayil, Arie Haenel, Daniel Genkin, Angelos D. Keromytis, Yossi Oren, and Yuval Yarom. HammerScope: Observing DRAM Power Consumption Using Rowhammer. In CCS '22, pages 547–561, November 2022.

[7] Lucian Cojocar, Kaveh Razavi, Cristiano Giuffrida, and Herbert Bos. Exploiting Correcting Codes: On the Effectiveness of ECC Memory Against Rowhammer Attacks. In IEEE S&P '19, pages 55–71, May 2019.

[8] Lucian Cojocar, Jeremie Kim, Minesh Patel, Lillian Tsai, Stefan Saroiu, Alec Wolman, and Onur Mutlu. Are We Susceptible to Rowhammer? An End-to-End Methodology for Cloud Providers. In IEEE S&P '20, pages 712–728, May 2020.
[9] Finn de Ridder, Pietro Frigo, Emanuele Vannacci, Herbert Bos, Cristiano Giuffrida, and Kaveh Razavi. SMASH: Synchronized Many-sided Rowhammer Attacks from JavaScript. In USENIX Security '21, pages 1001–1018, August 2021.

[10] Pietro Frigo, Emanuele Vannacc, Hasan Hassan, Victor van der Veen, Onur Mutlu, Cristiano Giuffrida, Herbert Bos, and Kaveh Razavi. TRRespass: Exploiting the Many Sides of Target Row Refresh. In IEEE S&P '20, pages 747–762, May 2020.

[11] Daniel Gruss, Moritz Lipp, Michael Schwarz, Daniel Genkin, Jonas Juffinger, Sioli O'Connell, Wolfgang Schoechl, and Yuval Yarom. Another Flip in the Wall of Rowhammer Defenses. In IEEE S&P '18, pages 245–261, May 2018.

[12] Hasan Hassan, Yahya Can Tugrul, Jeremie S. Kim, Victor van der Veen, Kaveh Razavi, and Onur Mutlu. Uncovering In-DRAM RowHammer Protection Mechanisms: A New Methodology, Custom RowHammer Patterns, and Implications. In MICRO '21, pages 1198–1213, October 2021.

[13] Wei He, Zhi Zhang, Yueqiang Cheng, Wenhao Wang, Wei Song, Yansong Gao, Qifei Zhang, Kang Li, Dongxi Liu, and Surya Nepal. WhistleBlower: A System-level Empirical Study on RowHammer. IEEE Transactions on Computers, pages 1–15, January 2023.

[14] Martin Heckel and Florian Adamsky. Reverse-Engineering Bank Addressing Functions on AMD CPUs. In DRAMSec '23, pages 1–6, June 2023.

[15] Christian Helm, Soramichi Akiyama, and Kenjiro Taura. Reliable Reverse Engineering of Intel DRAM Addressing Using Performance Counters. In MASCOTS '20, pages 1–8, November 2020.

[16] Intel. 12th Generation Intel Core Processors, Datasheet Volume 2 of 2, April 2022. URL https://cdrdv2.intel.com/v1/dl/getContent/655259.

[17] Patrick Jattke, Victor Van Der Veen, Pietro Frigo, Stijn Gunter, and Kaveh Razavi. Blacksmith: Scalable Rowhammering in the Frequency Domain. In IEEE S&P '22, pages 716–734, May 2022.

[18] JEDEC Solid State Technology Association. DDR4 SDRAM, September 2012. URL https://www.jedec.org/sites/default/files/docs/JESD79-4.pdf.

[19] Michael Fahr Jr, Thinh Dang, Hunter Kippen, Jacob Lichtinger, Andrew Kwong, Dana Dachman-Soled, Daniel Genkin, and Alexander Nelson. When Frodo Flips: End-to-End Key Recovery on FrodoKEM via Rowhammer. In CCS '22, pages 979–993, November 2022.

[20] Matthias Jung, Carl C. Rheinländer, Christian Weis, and Norbert Wehn. Reverse Engineering of DRAMs: Row Hammer with Crosshair. In MEMSYS '16, pages 471–476, October 2016.

[21] Jeremie S. Kim, Minesh Patel, A. Giray Yağlıkçı, Hasan Hassan, Roknoddin Azizi, Lois Orosa, and Onur Mutlu. Revisiting RowHammer: An Experimental Analysis of Modern DRAM Devices and Mitigation Techniques. In ISCA '20, pages 638–651, May 2020.

[22] Yoongu Kim, Ross Daly, Jeremie Kim, Chris Fallin, Ji Hye Lee, Donghyuk Lee, Chris Wilkerson, Konrad Lai, and Onur Mutlu. Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors. In ISCA '14, pages 361–372, June 2014.

[23] Petr Kobalicek. AsmJit: Low-Latency Machine Code Generation, 2023. URL https://asmjit.com/.

[24] Andreas Kogler, Jonas Juffinger, Salman Qazi, Yoongu Kim, Moritz Lipp, Nicolas Boichat, Eric Shiu, Mattias Nissler, and Daniel Gruss. Half-Double: Hammering From the Next Row Over. In USENIX Security '22, pages 3807–3824, August 2022.

[25] Andrew Kwong, Daniel Genkin, Daniel Gruss, and Yuval Yarom. RAMBleed: Reading Bits in Memory Without Accessing Them. In IEEE S&P '20, pages 695–711, May 2020.

[26] Zhenrong Lang, Patrick Jattke, Michele Marazzi, and Kaveh Razavi. Blaster: Characterizing the Blast Radius of Rowhammer. In DRAMSec '23, pages 1–7, June 2023.

[27] Mark Lanteigne. A Tale of Two Hammers: A Brief Rowhammer Analysis of AMD vs. Intel. Technical report, Third I/O, May 2016. URL http://www.thirdio.com/rowhammera1.pdf.

[28] Michele Marazzi, Flavien Solt, Patrick Jattke, Kubo Takashi, and Kaveh Razavi. REGA: Scalable Rowhammer Mitigation with Refresh-Generating Activations. In IEEE S&P '23, pages 1684–1701, May 2023.

[29] Koksal Mus, Yarkın Doröz, M. Caner Tol, Kristi Rahman, and Berk Sunar. Jolt: Recovering TLS Signing Keys via Rowhammer Faults. In IEEE S&P '23, pages 1719–1736, May 2023.

[30] Lois Orosa, Ulrich Rührmair, A. Giray Yaglikci, Haocong Luo, Ataberk Olgun, Patrick Jattke, Minesh Patel, Jeremie Kim, Kaveh Razavi, and Onur Mutlu. SpyHammer: Using RowHammer to Remotely Spy on Temperature, October 2022. URL https://arxiv.org/abs/2210.04084.
[31] Minesh Patel, Jeremie S. Kim, and Onur Mutlu. The Reach Profiler (REAPER): Enabling the Mitigation of DRAM Retention Failures via Profiling at Aggressive Conditions. In ISCA '17, pages 255–268, June 2017.

[32] Peter Pessl, Daniel Gruss, Clémentine Maurice, Michael Schwarz, and Stefan Mangard. DRAMA: Exploiting DRAM Addressing for Cross-CPU Attacks. In USENIX Security '16, pages 565–581, August 2016.

[33] Rui Qiao and Mark Seaborn. A New Approach For Rowhammer Attacks. In HOST '16, pages 161–166, May 2016.

[34] Kaveh Razavi, Ben Gras, Erik Bosman, Bart Preneel, Cristiano Giuffrida, and Herbert Bos. Flip Feng Shui: Hammering a Needle in the Software Stack. In USENIX Security '16, pages 1–18, August 2016.

[35] Mark Seaborn. How physical addresses map to rows and banks in DRAM, May 2015. URL https://lackingrhoticity.blogspot.com/2015/05/how-physical-addresses-map-to-rows-and-banks.html.

[36] Mark Seaborn and Thomas Dullien. Exploiting the DRAM rowhammer bug to gain kernel privileges, March 2015. URL https://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html.

[37] Andrei Tatar, Cristiano Giuffrida, Herbert Bos, and Kaveh Razavi. Defeating Software Mitigations Against Rowhammer: A Surgical Precision Hammer. In RAID '18, pages 48–66, September 2018.

[38] M. Caner Tol, Saad Islam, Andrew J. Adiletta, Berk Sunar, and Ziming Zhang. Don't Knock! Rowhammer at the Backdoor of DNN Models. In DSN '23, pages 109–122, June 2023.

[39] Chihiro Tomita, Makoto Takita, Kazuhide Fukushima, Yuto Nakano, Yoshiaki Shiraishi, and Masakatu Morii. Extracting the Secrets of OpenSSL with RAMBleed. Sensors, 22(9):3586, January 2022.

[40] Victor van der Veen, Yanick Fratantonio, Martina Lindorfer, Daniel Gruss, Clementine Maurice, Giovanni Vigna, Herbert Bos, Kaveh Razavi, and Cristiano Giuffrida. Drammer: Deterministic Rowhammer Attacks on Mobile Platforms. In CCS '16, pages 1675–1689, October 2016.

[41] Hari Venugopalan, Kaustav Goswami, Zainul Abi Din, Jason Lowe-Power, Samuel T. King, and Zubair Shafiq. Centauri: Practical Rowhammer Fingerprinting, June 2023. URL https://arxiv.org/abs/2307.00143.

[42] Minghua Wang, Zhi Zhang, Yueqiang Cheng, and Surya Nepal. DRAMDig: A Knowledge-assisted Tool to Uncover DRAM Address Mapping. In DAC '20, July 2020.

[43] Yuan Xiao, Xiaokuan Zhang, Yinqian Zhang, and Radu Teodorescu. One Bit Flips, One Cloud Flops: Cross-VM Row Hammer Attacks and Privilege Escalation. In USENIX Security '16, pages 19–35, August 2016.

[44] Z. Zhang, Y. Cheng, D. Liu, S. Nepal, Z. Wang, and Y. Yarom. PThammer: Cross-User-Kernel-Boundary Rowhammer through Implicit Accesses. In MICRO '20, pages 28–41, October 2020.

Appendices

A Equally-sized Bins in XOR Partition

In Section 4.2, we assumed that the result of any XOR function on a bin of addresses either returns a constant value (i.e., 0 or 1) for all addresses or evenly splits the addresses. We prove this assumption in the following.

Claim. Consider an aligned power-of-two range of addresses A = [m · 2^n, (m + 1) · 2^n − 1] (m, n ∈ N), a XOR function f which is non-constant on A, and the set of addresses B = {a ∈ A | f(a) = 0}. Partitioning the addresses in B using a different, non-constant XOR function g results in two equally-sized bins where g is constant 0 and constant 1, respectively.

Proof. First, we show that the claim holds for one function g1 ≠ f. We construct g1 by extending f to include another previously unused bit in the XOR computation.⁶ We note that adding this new bit leads to a different function result for exactly half of all addresses in B (namely, those where that address bit is set). As the function result was previously constant 0 for all b ∈ B, it must now be equally distributed between 0 and 1, satisfying our claim.

Second, we show that we can successively modify g1 to obtain an arbitrary function g without changing the size of the two bins. To do this, we successively add (or remove) a bit to (or from) the XOR computation in g1 until reaching g. During each of these steps, the function result flips for half of all addresses; since the addresses where the affected bit is set are always split evenly between the two bins, each step keeps the sizes of the two bins equal, satisfying our claim for any function g.

⁶ Alternatively, a bit could be removed from the XOR computation.

B Heatmap of Memory Access Rates

Table 14 shows the same data as Table 7. However, we also show the instruction sequences that were previously excluded due to their throughput either being low (≤ 100 ACTs/tREFI) or very high, indicating cache hits (≥ 1000 ACTs/tREFI). In Table 15, we show the results of the same experiment for the Intel Coffee Lake system.
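The claim proved in Appendix A can also be checked by exhaustion for small ranges. The following Python sketch (our own illustration, not part of the paper's artifact) models each XOR function by its bit mask and verifies, for all pairs of distinct non-constant functions on a 6-bit aligned range, that the second function splits the first function's zero-bin exactly in half:

```python
# Brute-force check of the Appendix A claim: over an aligned power-of-two
# address range, any non-constant XOR function g splits the zero-bin of a
# different non-constant XOR function f into two equally-sized halves.
from itertools import combinations

def xor_fn(mask):
    """XOR-reduce the address bits selected by `mask` into a single bit."""
    return lambda addr: bin(addr & mask).count("1") & 1

def check_claim(n=6):
    addrs = range(2 ** n)       # aligned range [0, 2^n - 1], i.e., m = 0
    masks = range(1, 2 ** n)    # all non-constant XOR functions on n bits
    for f_mask, g_mask in combinations(masks, 2):
        f, g = xor_fn(f_mask), xor_fn(g_mask)
        B = [a for a in addrs if f(a) == 0]      # zero-bin of f
        zeros = sum(1 for a in B if g(a) == 0)   # size of g's "0" sub-bin
        assert zeros == len(B) // 2, (f_mask, g_mask)
    return True
```

Running `check_claim()` exercises all pairs of distinct masks; the even split follows because the XOR functions are linear over GF(2), so any g outside {0, f} is non-constant on the hyperplane defined by f(a) = 0.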
Table 14. Heatmap of memory access rates (in ACTs/tREFI) for all tested instruction sequences and varying numbers of accessed rows on the AMD Z3 system. We abbreviate scatter, fence each by "s.f.e."

Table 15. Heatmap of memory access rates (in ACTs/tREFI) for all tested instruction sequences and varying numbers of accessed rows on the Intel CL system. We abbreviate scatter, fence each by "s.f.e."

Figure 7. Activation rates and possible pattern orderings for non-uniform hammering patterns when using different scheduling policies. The data was collected on Z3 using the MFENCE barrier.

C Modelling Fence Scheduling Policies

In this appendix, we first present a theoretical model for the amount of ordering enforced by a scheduling policy (as described in Section 6.2) based on a simple CPU behavior model. We then evaluate the trade-off provided by different scheduling policies by contrasting the amount of ordering provided with the patterns' hammering speeds.

Computing Pattern Permutations. To analyze the amount of ordering provided by a scheduling policy, we use a model for the processor's memory subsystem which assumes that (a) load requests cannot be reordered around memory barriers, as guaranteed by M/LFENCE [4], and (b) all load requests are served by DRAM, including consecutive ones to the same cache line with flushing in between accesses (Obs. 4). Using this model, we can compute the number of theoretically possible orderings of a hammering pattern.

We assume that patterns are always ordered at their beginning and their end, and we compute the number of permutations for each interval (delineated by memory barriers) individually. For a multiset M containing l different elements with multiplicities m_1, m_2, ..., m_l, the number of permutations is given by the multinomial coefficient

$\binom{m}{m_1, m_2, \ldots, m_l} = \frac{m!}{m_1!\, m_2! \cdots m_l!}.$

To obtain the total number of all permutations, we multiply the numbers for the different intervals.

In practice, it is highly unlikely that memory accesses are reordered over large distances, even if theoretically possible based on ordering semantics. However, as the realistic extent of reordering is unknown, we use this simpler model.

Example. To illustrate, we use an example non-uniform pattern |a1 a2 a1 a2 a3 a4| where fences are shown using vertical bars. The number of possible orderings is computed as $\binom{6}{2,2,1,1} = \frac{6!}{2!\,2!\,1!\,1!} = 180$. When inserting another fence after the fourth access (corresponding to SPpair), we get the pattern |a1 a2 a1 a2 | a3 a4| with $\binom{4}{2,2} \cdot \binom{2}{1,1} = 12$ possible orderings. By inserting a single memory barrier in the middle of the pattern, the number of possible orderings has been reduced drastically.

Ordering vs. Hammering Speed. To explore the trade-off provided by our scheduling policies, we contrast the provided ordering and hammering speeds of 15 K random non-uniform patterns. We implement all proposed scheduling policies (see Table 8) in our fuzzer, hammer the generated patterns using the different policies, and record their activation rates. We then compute the number of pattern permutations using the theoretical model introduced above.⁷

We plot the results for Z3 in Figure 7, where we show, for each policy and each generated pattern, the hammering speed (x-axis) and the number of possible orderings (y-axis). We omit the similar results from Z+, where we also ran this experiment.

Observations. As expected, the scheduling policies differ significantly in the trade-off they provide. SPnone provides very high activation rates, as it allows the most reordering. On the contrary, SPfull allows zero reordering at the expense of low activation rates (37 ACTs/tREFI on average). The pattern-aware policies show two different types of distributions. For SPBP and SPBP/2, the distributions are somewhat similar to SPnone, albeit without the very fast outliers. On the other hand, SPpair and SPrep provide ordering that is nearly as strict as SPfull while allowing faster hammering compared to the latter, with average activation rates increased by 51% (SPpair) and 39% (SPrep), respectively.

Based on these results, we believe SPpair and SPrep could be well suited to reduce the amount of fencing without significantly impacting a pattern's ordering.

⁷ To account for different pattern lengths (L), we use the normalized ordering metric $\tilde{N} := \sqrt[L]{N}$, where N is the number of possible orderings.

D Analyzed DDR5 Devices

In Table 16, we present the list of ten randomly chosen DDR5 UDIMMs covering all three major manufacturers, i.e., Samsung, SK Hynix, and Micron. We report each device's production date, speed, size, and DRAM geometry.

Table 16. DDR5 UDIMMs used in the evaluation of our AMD Zen-optimized Rowhammer fuzzer. We abbreviate the DRAM vendors Samsung (S), SK Hynix (H), and Micron (M). We report for each device the number of subchannels (SC), ranks (RK), bank groups (BG), banks per bank group (BA), and rows (R).

  ID  Production Date  Freq. [MHz]  Size [GiB]  Geometry (SC, RK, BG, BA, R)
  S0  Q4-2021          4800         8           (2, 1, 4, 4, 2^16)
  S1  Q4-2021          4800         16          (2, 1, 8, 4, 2^16)
  S2  Q4-2021          5600         8           (2, 1, 4, 4, 2^16)
  S3  Q4-2021          4800         8           (2, 1, 4, 4, 2^16)
  H0  Q4-2021          4800         8           (2, 1, 4, 4, 2^16)
  M0  Q4-2021          4800         16          (2, 1, 8, 4, 2^16)
  M1  Q4-2021          4800         16          (2, 1, 8, 4, 2^16)
  M2  Q4-2021          4800         16          (2, 1, 8, 4, 2^16)
  M3  Q4-2021          4800         16          (2, 1, 8, 4, 2^16)
  M4  Q4-2021          4800         16          (2, 1, 8, 4, 2^16)
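The permutation counts of the ordering model in Appendix C can be reproduced with a short script. The sketch below is our own illustration (not the paper's artifact); a pattern is represented as a list of fence-delimited intervals of aggressor names:

```python
# Orderings of a fenced hammering pattern (Appendix C model): the total is
# the product of one multinomial coefficient per fence-delimited interval.
from math import factorial, prod

def interval_orderings(interval):
    """Multinomial coefficient m! / (m_1! m_2! ... m_l!) for one interval."""
    counts = {a: interval.count(a) for a in set(interval)}
    return factorial(len(interval)) // prod(factorial(c) for c in counts.values())

def pattern_orderings(intervals):
    """Total number of possible orderings N over all intervals."""
    return prod(interval_orderings(iv) for iv in intervals)

def normalized_ordering(intervals):
    """Normalized metric from footnote 7: N^(1/L) for pattern length L."""
    L = sum(len(iv) for iv in intervals)
    return pattern_orderings(intervals) ** (1 / L)

# Example pattern |a1 a2 a1 a2 a3 a4| from Appendix C:
assert pattern_orderings([["a1", "a2", "a1", "a2", "a3", "a4"]]) == 180
# With an extra fence after the fourth access (SPpair): |a1 a2 a1 a2 | a3 a4|
assert pattern_orderings([["a1", "a2", "a1", "a2"], ["a3", "a4"]]) == 12
```

The two assertions match the worked example: 6!/(2! 2! 1! 1!) = 180 orderings without the middle fence, and 6 · 2 = 12 once the pattern is split into two intervals.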