2022 04 Conference HPCA Mithril
Managed Refresh
Michael Jaemin Kim† Jaehyun Park† Yeonhong Park‡ Wanju Doh§ Namhoon Kim†
Tae Jun Ham‡ Jae W. Lee‡ Jung Ho Ahn†§
Dept. of {Intelligence and Information†, Computer Science and Engineering‡}, Prog. in Artificial Intelligence§
Seoul National University
{michael604, wogus20002, ilil96, wj.doh, sirius0323, taejunham, jaewlee, gajh}@snu.ac.kr
Abstract—Since its public introduction in the mid-2010s, the Row Hammer (RH) phenomenon has drawn significant attention from the research community due to its security implications. Although many RH-protection schemes have been proposed by processor vendors, DRAM manufacturers, and academia, they still have shortcomings. Solutions implemented in the memory controller (MC) incur increasingly higher costs due to their conservative design for the worst case in terms of the number of DRAM banks and RH threshold to support. Meanwhile, DRAM-side implementation either has a limited time margin for RH-protection measures or requires extensive modifications to the standard DRAM interface. Recently, a new command for RH-protection has been introduced in the DDR5/LPDDR5 standards, referred to as refresh management (RFM). RFM enables the separation of the tasks for RH-protection to both MC and DRAM by having the former generate an RFM command at a specific activation frequency and the latter take proper RH-protection measures within a given time window. Although promising, no existing study presents and analyzes RFM-based solutions for RH-protection. In this paper, we propose Mithril, the first RFM interface-compatible, DRAM-MC cooperative RH-protection scheme providing deterministic protection guarantees. Mithril has minimal energy overheads for common use cases without adversarial memory access patterns. We also introduce Mithril+, an optional extension to provide minimal performance overheads at the expense of a tiny modification to the MC, while utilizing existing DRAM commands.

I. INTRODUCTION

Row Hammer (RH) is a critical DRAM reliability and security vulnerability that has troubled the industry for almost a decade. It refers to a phenomenon in which a frequently activated row (aggressor) causes bit flips in its adjacent rows (victims). In particular, RH is incurred when the activation rate exceeds the RH threshold (FlipTH). RH is especially dangerous as it breaks the basic integrity guarantee in the computer system and can be abused in various attack scenarios [1], [18], [13], [46], [12], [55], [59].

The criticality of this problem has motivated many RH-protection solutions. There exist several software-based solutions [4], [8], [55], [20], [31], but these typically incur a high performance cost and have limited coverage (i.e., they are only effective against a specific attack scenario). For these reasons, architectural solutions have emerged as promising alternatives.

One of the important design decisions for an architectural RH-protection scheme is to determine where to implement the proposed solution within the system. In practice, most RH-protection solutions are implemented either in an on-die memory controller (MC) or in a DRAM device. For example, Graphene [43], BlockHammer [56], and PARA [30] have been proposed for implementation on the processor-side MC, whereas TWiCe [32] and industry-oriented RH-protection schemes [40], [15] are implemented in DRAM. Unfortunately, both choices have their own drawbacks.

First, the MC-side implementation needs to provision RH-protection resources for the worst-case scenario, where the expected FlipTH level is very low and the processor is connected to the maximum number of DRAM banks it supports. As a result, this strategy tends to require a large extra area for the counter structures utilized by the RH-protection mechanism. DRAM-side implementations are free from such concerns, as FlipTH of a specific DRAM is more accurately estimated by DRAM vendors, and the resource usage is proportional to the number of DRAM banks because on-DRAM RH-protection schemes are often deployed on a per-bank or per-DIMM basis. However, such on-DRAM protection schemes have interface issues. To secure the time margin for the extra operations for potential RH victim rows, DRAM-side schemes must either request the MC to generate non-standard adjacent row refresh (ARR) commands or perform extra operations during the auto-refresh process (ordinary DRAM operation) in a way transparent to the MC. The former mechanism breaks the abstraction that DRAM is a passive device, whereas the latter [15], referred to as the time-margin-stealing method, is not always possible depending on DRAM characteristics such as the time margin during the auto-refresh process.

Refresh Management (RFM) is a newly added extension for the latest DDR5 and LPDDR5 interfaces [23], [22], allowing the DRAM-side implementation of an RH-protection solution to cooperate smoothly with an MC. An MC sends an RFM command at a specific activation frequency to a target DRAM bank without specifying a target row. The DRAM-side RH-protection scheme exploits the time margin
provided by the RFM command to undertake necessary operations. This cooperation between the MC and DRAM effectively avoids the critical drawbacks of MC- or DRAM-side only implementations.

Despite its promising traits, the applicability of RFM as an RH-protection scheme has not been publicly verified or properly evaluated to the best of our knowledge. A prior probabilistic scheme [30] can be trivially applied. However, prior deterministic (guaranteeing not to exceed FlipTH) schemes cannot be directly applied to the RFM interface. Prior ARR-based schemes reactively issue a command targeting a specific row when the activation count reaches a scheme-specific predefined threshold. However, given its periodicity, the RFM interface is prone to the worst-case scenario where a large number of rows will simultaneously require a preventive refresh in a short time period, unlike the ARR-based schemes. Thus, prior approaches are not compatible with the RFM interface.

In this paper, we propose Mithril, a novel RFM-compatible, deterministic RH-protection scheme that exploits MC and DRAM in a cooperative manner. To avoid the aforementioned concentration of rows to refresh for RH-protection, we utilize a greedy approach when selecting the target row to refresh upon every RFM command. We investigate the effective use of streaming algorithms [38] (Section III) and provide a new mathematical proof through which we guarantee deterministic protection by maintaining the greedy selection scheme (Section IV and Appendix). Finally, we propose 1) a hardware scheme to obviate the need for counter table resets, which were mandatory in prior studies; 2) an algorithmic optimization for energy savings; and 3) an extension to the RFM interface to mitigate the performance overhead by exploiting the memory access patterns of ordinary workloads.

The key contributions of this paper are as follows:
• We propose Mithril, the first RFM-based RH-protection scheme with deterministic safety guarantees, exploiting a modified Counter-based Summary algorithm [37], [36].
• We provide a rigorous mathematical proof of the modified algorithm and the RH safety of Mithril.
• We suggest energy and performance optimization techniques that exploit the memory access patterns of common, non-adversarial workloads.

II. BACKGROUND

A. DRAM Refresh
DRAM stores a single bit in a cell, composed of one capacitor and one access transistor [41]. These cells are organized into rows and columns. A DRAM row, the cells of which share a wordline, is the granularity of the activation (ACT) and precharge (PRE), respectively allowing and disallowing read or write operations on the row. The read and write operations involve accessing a certain number of columns in an activated row. DRAM is composed of multiple banks. Each bank allows independent ACT, PRE, read, and write operations. Multiple banks form a rank, which shares the memory channel with other ranks and the memory controller (MC) at the host side.

Due to the inherent characteristic of a DRAM cell capacitor, by which the stored charge leaks over time, the cell value must be restored periodically [7], [5]. This type of periodic restoration, referred to as an auto-refresh, is initiated at every refresh (REF) command within the tRFC (refresh time) period. Every DRAM row must be refreshed at least once during every refresh window period (tREFW) to be safe from this charge retention problem. In modern DRAM devices (e.g., DDR5 [23]), all rows in a single bank are typically divided into 8,192 groups. A group is refreshed in every time interval tREFI (refresh interval).

B. Row Hammer Phenomenon
Row Hammer (RH) refers to a phenomenon in which repetitive activations of a specific row (aggressor) lead to bit flips in physically nearby rows (victims) [39], [30], [42], [57]. A bit flip is observable when the ACT count reaches a certain RH threshold (FlipTH) without the victim being refreshed inside a tREFW time window. Because two aggressors can simultaneously affect a single victim, FlipTH/2 ACTs on each aggressor can cause a bit flip (double-sided attack). The FlipTH value varies depending on different chips, generations, and/or DRAM manufacturers [29]. The RH problem has worsened following the current scale-down trend of fabrication technology, due to the intensified inter-cell interference. Recent studies [15], [29] reported that FlipTH has been reduced to a mere several thousand ACTs. It has also been observed that non-adjacent rows affect the victim rows when activated frequently, which degrades the effective FlipTH.

C. Classifying Prior RH Mitigation Schemes
As shown in Table I, existing architectural RH-protection schemes can be classified along four important criteria: 1) protection guarantee, 2) type of remedy, 3) implementation location, and 4) tracking mechanism.

1) Protection Guarantee: There exist two different types of RH-protection guarantees, deterministic and probabilistic.

The deterministic guarantee ensures RH-protection by guaranteeing that a victim row is always refreshed before the number of ACTs exceeds FlipTH on its aggressors, either by an extra preventive refresh or the normal auto-refresh. This type utilizes a counter structure to track the aggressor row and deals with it by applying a certain remedy. The main drawback of a deterministic scheme is its higher area overhead due to the large counter structure.

The probabilistic guarantee prevents RH with a certain probability. The probabilistic approach has its strength in the minimal area overhead. However, the performance overhead is exacerbated severely when the target FlipTH level is
lowered or when the number of DRAM devices in the system increases. It does not provide a deterministic protection guarantee, either.

2) Remedies of Prior RH-protection Schemes: Prior works exploited one of two remedies, adjacent row refresh (ARR) or throttling. ARR refers to a type of command that the MC issues to DRAM with an explicit target row address (either aggressor or victim) at a required moment. It triggers an extra preventive refresh on the potential RH victim rows within the time margin provided by the command. This differs from the normal REF command, which is row-

Table I
CATEGORIZATION OF EXISTING ROW HAMMER MITIGATION SCHEMES AND MITHRIL

[Figure 1, panel (a) RFM organization: the RFM logic in the memory controller holds one RAA counter per DRAM bank ([0] to [Last]) and sends RFM commands (tRFM time margin) to the RH-protection scheme of the corresponding DRAM bank. Flowchart fragment: MC scheduler, ACT issue, find bank and RAA counter++.]
Table II
SYMBOLS AND THEIR DESCRIPTIONS USED FOR DRAM REFRESH, RH, AND RFM
Symbol              Description
tREFW               Per-row auto-refresh interval (e.g., 32 ms or 64 ms)
FlipTH              RH threshold
RFMTH               RFM threshold
Preventive refresh  Extra refresh of potential RH victim rows. Executed during an ARR or RFM command, or hidden under auto-refresh.

4) Tracking Mechanism and Streaming Algorithms: Each RH-protection scheme has its own tracking mechanism to identify the aggressor or victim rows with high ACT counts. The tracking mechanism of a probabilistic scheme is often insignificant. However, for a deterministic scheme, it is crucial to choose an effective tracking mechanism to minimize the area overhead of the counter structure. One class of tracking mechanisms is based on streaming algorithms [38], which are most effective when estimating the ACT counts of rows when the counter table size is limited. Multiple prior works [32], [43], [56] explicitly leverage or can be interpreted as based on such streaming algorithms.

Streaming algorithms were first developed in the field of data mining, independently of the RH problem, to analyze fast and dense data streams with limited memory. A certain subset of the algorithms estimates the total number of occurrences per input element. Considering the fact that ACT commands with an address are "streamed" from the MC to DRAM, a subset of the streaming algorithms can be utilized to estimate the ACT count per row address. Thus, they are suitable as an effective tracking mechanism of an RH-protection scheme. They report the approximate number of occurrences for each element (address), referred to as the estimated count, instead of the actual count. Generally, the resolution (or the error) of the algorithm is higher (lower) when more memory is used.

Several other works [49], [48], [26] use the different approach of a grouped counter. They allocate multiple rows to a single counter to reduce the area overhead of the tracking mechanism. They optimize further by dynamically adjusting the allocation or by utilizing the characteristics of DRAM.

D. RFM Interface as a New Remedy
The RFM interface has been newly introduced as an alternative remedy that allows for DRAM-MC cooperation. It is suggested as the primary means of RH-protection by the JEDEC committee [24], [25]. The RH-protection scheme resides on the DRAM side while the MC provides a periodic but DRAM-row-agnostic time margin to the DRAM bank. Periodic here is not based on time but on the number of ACTs over a single DRAM bank. Figure 1 shows an example of a main-memory organization scheme using an RFM interface and RFM issue logic. An MC has a Rolling Accumulated ACT (RAA) counter per bank that keeps track of the number of ACTs on its bank. When the RAA count reaches the RFM threshold (RFMTH) set by the DRAM device, the MC issues an RFM command only to the corresponding bank and resets the RAA counter for the target bank. The larger the RFMTH, the lower the frequency of the RFM command, which reduces the effect on the system performance. At every RFM command issue, the recipient bank receives a time margin (tRFM) during which it is guaranteed not to be disturbed by any other regular operation.

A key difference with regard to the prior ARR command is that RFM is row agnostic and periodic (i.e., it cannot be issued in a bursty way). In a sense, it can be seen as an extension of the time-margin-stealing method. The format of an RFM command is similar to that of a per-bank REF command [23], [22], specifying the bank to which RFM applies but not a certain row. Therefore, it adds minimal complexity to the MC. The symbols related to DRAM refresh, RH, and RFM are summarized in Table II.

III. INVESTIGATING RFM-BASED SCHEMES
RFM as a remedy for RH-protection allows for DRAM-side implementation with MC cooperation, eliminating multiple drawbacks of MC-side-only or DRAM-side-only implementation. First, RFM can minimize the aforementioned over-provisioning of the MC-side-only implementation because it can use an accurate prediction of FlipTH and even set the RFMTH value after testing the manufactured DRAM chip. It also scales according to the number of DRAM devices that are actually attached to the host. Second, RFM also provides a standard interface that a DRAM-side RH-protection scheme can utilize to gain an additional time margin for RH preventive refreshes. The ARR command assumed in many prior works is not supported in the recent DDR interface. RFM is newly being adopted and is now recommended as the primary method for RH-protection [25], [24].

A. Incompatibility of Prior Approaches
Although RFM is promising, prior approaches based on ARR are not effective with it because RFM is vulnerable to the concentration of victim rows that require a preventive refresh. The ARR-based scheme has its own predefined threshold value directly related to the target FlipTH. When its tracking mechanism detects the ACT count of an aggressor row reaching the predefined threshold, it immediately issues an ARR command and executes preventive refreshes to guarantee the deterministic RH safety. For example, Graphene with ARR can provide safety for FlipTH that is
linear in the predefined threshold (red line in Figure 2). Even if the predefined threshold is low, the relationship between the predefined threshold and FlipTH does not change.

[Figure 2: safe FlipTH (K) for RFM-Graphene and ARR-Graphene.]
Figure 2. Ineffectiveness of RFM-Graphene compared to the original ARR-Graphene. Note that the inverse of the predefined threshold indicates a larger table size with a lower resolution.

However, when this ARR-based approach is applied to the RFM interface, there is a limit to FlipTH that is guaranteed to be safe regardless of how low the predefined threshold is set (see Figure 2). With the same prior approach, one scheme could set a predefined threshold and buffer the aggressor rows that reach it. Then, when the subsequent RFM command is issued, the postponed preventive refresh can be executed on the corresponding adjacent victim rows. However, such a scheme is vulnerable when multiple aggressor rows reach the predefined threshold in a short period. For example, when the predefined threshold is 2K and the RFMTH is reasonably set to 64 (see Section VI), the safe FlipTH becomes 20K, not 10K. This occurs because 310 rows can reach 2K in a single tREFW period; thus the last buffered row must wait through (310×64) ACTs.

B. Greedy Selection
To prevent the concentration of victim rows requiring a preventive refresh in an RFM-based scheme, it is necessary to properly select the target row and refresh its victims, even if the ACT count of the row has not reached FlipTH or another predefined threshold. In particular, we propose the use of the greedy selection of a target row upon every RFM command for the RFM-based scheme.

An intuitive method for the proper selection of a row at every RFM command is to greedily choose the row with the highest estimated ACT count based on the tracking mechanism. Also, after choosing the row and refreshing its victims, it is logical to reset or minimize the estimated ACT count of the selected row to assist with the decision at the next RFM command, as the actual ACT count is now 0 after the refresh. Based on this basic principle, we search for the proper tracking mechanism.

C. Counter-based Summary
We choose to use a variant of streaming algorithms for the RFM-based RH-protection scheme. While the grouped counter approach was effective in ARR-based work, it is no longer efficient in RFM (Section III-D). To support the greedy selection policy properly, the streaming algorithm must link the actual ACT count to the lower and upper bound of the estimated ACT count. We explain this in detail with an example.

The Counter-based Summary (CbS) algorithm [37], [36], [2] is a representative streaming algorithm that matches such needs. The CbS algorithm has a table of entries, each holding an address and a counter. When the queried address hits an entry in the table (on-table), the counter in the corresponding entry is incremented by one. When it misses the table (off-table), the address of the entry with the minimum counter value in the table is replaced with the queried address, and the counter of that entry is then incremented by one (see Figure 3). Due to its monotonically increasing nature and swapping, the accumulated counter value above the minimum in the table belongs to the currently written address. In contrast, the ones below the minimum cannot find their source.

[Figure 3: flowchart fragments "Address", "Yes", "Increase the counter".]

On-table address: Estimated Count = Written Counter Value
Off-table address: Estimated Count = Min
Actual Count ≤ Estimated Count    (1)
Estimated Count ≤ Actual Count + Min    (2)

The CbS algorithm reports the estimated (ACT) count of an on-table address with its written counter value, whereas the count of an off-table address is estimated with the minimum value in the entire table. Inequalities (1) and (2) respectively show the lower bound and upper bound of the estimated count in relation to the actual (ACT) count. Min denotes the minimum counter value in the table.

First, based on the lower bound (inequality (1)) of the estimated count, the RH-protection scheme is able to act upon an inaccurate, yet conservatively large ACT value. This allows the scheme to provide deterministic safety [32], [43], [56]. Second, the upper bound (inequality (2)) of the estimated count is also necessary to decrement the estimated count of the greedily selected row at the RFM command, where the actual ACT count is now 0. Without this upper bound, the estimated count cannot be decremented safely. The lossy-counting algorithm used in TWiCe [32] also has both the lower and upper bound of the estimated counts, but is less efficient algorithmically (as is later shown in Figure 6). It causes fewer preventive refreshes at the cost of a higher area overhead. Thus, we choose the CbS algorithm as the basic building block of our tracking mechanism.
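The CbS update rule and the two bounds can be illustrated with a minimal Python sketch. It is not the hardware design of this paper: the class name `CbSTable`, its method names, and the dictionary-based table are assumptions made for illustration. The `greedy_refresh` step lowers the selected counter to the table minimum, which is the choice described later in Section IV-B.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CbSTable:
    """Illustrative Counter-based Summary table (hypothetical names)."""
    n_entries: int
    counts: Dict[int, int] = field(default_factory=dict)  # row address -> counter

    def minimum(self) -> int:
        # An entry that has never been written is treated as a zero counter.
        if len(self.counts) < self.n_entries:
            return 0
        return min(self.counts.values())

    def activate(self, addr: int) -> None:
        """Process one ACT command on row `addr`."""
        if addr in self.counts:                  # on-table: increment
            self.counts[addr] += 1
        elif len(self.counts) < self.n_entries:  # unused entry available
            self.counts[addr] = 1
        else:                                    # off-table: replace a minimum entry
            evicted = min(self.counts, key=self.counts.get)
            inherited = self.counts.pop(evicted)
            self.counts[addr] = inherited + 1    # counter keeps growing monotonically

    def estimate(self, addr: int) -> int:
        """Estimated count: written counter if on-table, else the table minimum."""
        return self.counts.get(addr, self.minimum())

    def greedy_refresh(self) -> Optional[int]:
        """Pick the row with the highest estimate and lower its counter to Min."""
        if not self.counts:
            return None
        target = max(self.counts, key=self.counts.get)
        self.counts[target] = self.minimum()
        return target


table = CbSTable(n_entries=2)
for row in (0xA0, 0xA0, 0xA0, 0xB0, 0xC0):
    table.activate(row)
# 0xB0 was evicted by 0xC0, so its estimate falls back to the table minimum.
print(table.estimate(0xA0), table.estimate(0xB0), table.estimate(0xC0))
```

In this example the evicted row 0xB0 has an actual count of 1 and an estimated count equal to Min, so inequalities (1) and (2) both hold for it.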
There exist other streaming algorithms that only have a lower bound of the estimated count, such as Count-min Sketch [11], but they can only be used in throttling-based works such as BlockHammer [56]. Others that do not have the lower bound, such as Sticky-sampling [35] or Count-sketch [11], cannot provide deterministic safety.

D. Grouped Counter Approach
The grouped counter approach was another type of tracking mechanism in ARR-based works. However, prior works that adopted this methodology are not compatible with or efficient at the RFM interface. CBT [49], [48] is the representative scheme of this type. First, it cannot utilize the RFM opportunities during its tree construction phase. Suppose it chooses to prematurely refresh a group that is not fully split. In such a case, it will have to refresh many rows too conservatively. Second, even after the tree is constructed, a leaf node larger than eight rows will not fit into a single tRFM period, leading to the stacking of refresh loads. CAT-TWO [26], which extends CBT, may guarantee that a leaf is small enough (covering a single row), but only at the cost of a higher area overhead.

E. Probabilistic RFM-based Scheme
An RFM-compatible probabilistic RH-protection scheme (henceforth PARFM) can be built in a manner similar to PARA [30]. Whenever an RFM command arrives, PARFM randomly samples a single aggressor row among the last RFMTH ACTs. PARFM's protection capability depends solely on RFMTH. By adjusting RFMTH properly, PARFM can guarantee probabilistic safety on the target FlipTH. However, as FlipTH decreases, PARFM requires a lower RFMTH than those in deterministic RFM-based schemes to maintain a high safety probability, leading to greater performance and energy overhead. We discuss this further in Section VI.

IV. MITHRIL
Based on the investigation of the RFM-based RH-protection schemes in Section III, we present Mithril, the first RFM-interface-compatible RH-protection scheme providing a deterministic protection guarantee. It exploits a modified CbS algorithm for counter management.

A. Organization
The Mithril logic in each DRAM bank is composed of a counter structure (henceforth the Mithril table), two pointers (MaxPtr and MinPtr), and the control logic (Figure 4). To be more specific, the Mithril table comprises two CAM structures, one storing the row address and the other the ACT count. Each ACT counter is directly related to a single row address. The MaxPtr and MinPtr pointers are also employed as index-pointing registers. The Mithril structure including the CAMs and logic must be equipped in every bank at every DRAM chip (Figure 4).

[Figure 4: block diagram. The memory controller (per-bank RAA counters, DDR PHY) sends ACT and RFM commands to the DRAM chip; the per-bank Mithril logic contains an address CAM, a count CAM, MinPtr, MaxPtr, control logic, and find-max logic, and drives preventive refreshes of the DRAM cells.]
Figure 4. Mithril hardware implementation. An identical Mithril module with the logic and CAM structure is populated per bank, at every DRAM chip. Here, 1, 2, and 3 denote the high-level command flow.

B. Operation
Figure 5 illustrates how Mithril manages the corresponding Mithril table and the two pointers, MaxPtr and MinPtr. The Mithril logic of the corresponding DRAM bank is informed at every ACT command (with an address) or RFM command (without an address). If the Mithril logic receives an ACT command, the count CAM, MaxPtr, and MinPtr are updated. To be more specific, first, Mithril checks if the address table already tracks the activated row address. If so, the associated ACT counter is incremented by one. When the row address misses, the address of the entry indicated by MinPtr is replaced with the requesting row address, and its counter is incremented by one. If affected, MaxPtr and MinPtr are updated at each step to point to the correct maximum and minimum, respectively. Thus far, the operation is identical to that of the original CbS algorithm.

When the Mithril logic instead receives an RFM command, Mithril selects the entry pointed to by MaxPtr (greedy selection). It performs a preventive refresh for the two victim rows associated with this entry, whose row is identified as the prime aggressor candidate. Then, the counter value is decremented to the table's minimum value pointed to by MinPtr. MaxPtr is also updated correspondingly. The new MaxPtr must be found during the RFM time window.

C. Mathematical Proof of Protection Guarantee
Mithril guarantees RH safety by preventing the ACT count of any row from reaching FlipTH by continuing the greedy selection and preventive refresh processes. This contrasts with prior works, which triggered a preventive refresh at the exact hazardous moment where a row reaches a predefined
threshold ACT value. To prove the deterministic safety of Mithril, we first prove that continuously applying greedy selection results in an upper bound on the rate of the estimated ACT count increment during tREFW. That upper bound is defined by an equation with Nentry (the number of Mithril counter entries) and RFMTH, as follows:

Theorem 1. Within any tREFW, an increase in the estimated count for any single row is bounded by M, which is a function of Nentry and RFMTH.

$$M = \sum_{k=1}^{N_{entry}} \frac{RFM_{TH}}{k} + \frac{RFM_{TH}}{N_{entry}} \left( \frac{tREFW \left(1 - \frac{tRFC}{tREFI}\right)}{tRC \times RFM_{TH} + tRFM} - 2 \right)$$

Then, by setting Nentry and RFMTH so that M is less than (FlipTH/2), Mithril can deterministically prevent RH even under double-sided attacks. The detailed proof of Theorem 1 is provided in the Appendix (Section IX).

[Figure 5: example command sequence (ACT 0xA0, ACT 0xE0, RFM) and the resulting Mithril table states.]
Figure 5. Sequence of ACT and RFM commands and the corresponding update of the Mithril table. 1, 2, and 3 correspond to those in Figure 4.

D. Configuring Nentry and RFMTH
There are multiple possible Mithril configurations for a single target FlipTH because both Nentry and RFMTH can change to satisfy M < FlipTH/2. Figure 6 plots (Nentry, RFMTH) pairs that satisfy this condition for various FlipTH values (e.g., 1.5K, 3.125K, ..., 50K). First, a trade-off is depicted between Nentry and RFMTH regardless of FlipTH. A decreased Nentry implies less area usage but results in a lower RFMTH, incurring more performance and energy overhead due to more frequent issuing of RFM commands. This trade-off exists for all instances of FlipTH, but the appearance of the curve differs across various FlipTH values. A scheme similar to Mithril but based on a Lossy-counting algorithm is also shown at FlipTH values of 50K and 25K, which clearly demonstrates that it needs a larger table for a given FlipTH.

[Figure 6: x-axis RFMTH (16 to 416); one line per FlipTH value (1.56K, ..., 12.5K, 25K, 50K with CbS; 25K and 50K with Lossy-Counting).]
Figure 6. Each line denotes the possible configuration of Nentry (represented by the counter table size) and RFMTH that can protect victim rows against RH at the given FlipTH value. The RFM-based scheme built with a Lossy-Counting algorithm is also indicated by the dotted lines.

When FlipTH is sufficiently high (e.g., larger than 12.5K), it is possible to set RFMTH to approximately 256 at a relatively small Nentry. Then, Mithril can achieve RH-protection with relatively low area, performance, and energy overhead. In contrast, when FlipTH is low, maintaining the low performance/energy overhead (i.e., sufficiently large RFMTH) requires a substantially larger Nentry. Overall, this is a trade-off that a DRAM vendor must consider when determining Nentry. The target FlipTH level can be adjusted by tweaking the RFMTH value even if Nentry is fixed. This flexibility can be handy when the scheme must be built based on the predicted FlipTH level and thus a fixed area, as it can avoid excessive performance/energy overhead.

E. Wrapping Mithril Counters
The absolute counter value of the Mithril table can increase in an unbounded manner during its run-time, which complicates the hardware implementation. Prior works solved this issue by periodically resetting the entire table [43], [32] or by using a duplicate counter table in an interleaving fashion [56]; these two strategies lead to a two-fold degradation of the predefined threshold level (from FlipTH/2 to FlipTH/4) and of the area, respectively. However, Mithril can avoid this. Unlike prior approaches, Mithril does not require the absolute value of the estimated count. Instead, we require the difference of the estimated count relative to the minimum estimated count in the Mithril table. Moreover, due to the operational behavior of Mithril, the maximum difference between the MaxPtr and MinPtr counter values is always bounded. Therefore, we adopt a wrapping counter for the Mithril table implementation. If we provision enough bits to express a value larger than the maximum difference in the table, the wrapping counter can always correctly identify the relative size relationship among Mithril table entries. Through this implementation, we acquire a two-fold benefit.
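The wrapping-counter idea can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the paper's circuit: the function names and the 8-bit width are hypothetical, and the comparison is done through each counter's offset from the MinPtr entry, which is one way to recover the relative order when the true spread in the table is smaller than 2^bits.

```python
from typing import Sequence


def offset_from_min(counter: int, min_counter: int, bits: int) -> int:
    """Distance of a wrapped counter above the wrapped minimum, modulo 2**bits."""
    return (counter - min_counter) & ((1 << bits) - 1)


def find_max_index(counters: Sequence[int], min_index: int, bits: int) -> int:
    """Index of the largest entry, judged only by offsets from the minimum entry."""
    min_counter = counters[min_index]
    return max(
        range(len(counters)),
        key=lambda i: offset_from_min(counters[i], min_counter, bits),
    )


# True counts 1000, 1100, and 1010 stored in 8-bit wrapping counters.
BITS = 8
stored = [value % (1 << BITS) for value in (1000, 1100, 1010)]
print(stored)                           # wrapped values no longer sort correctly
print(find_max_index(stored, 0, BITS))  # entry 0 holds the true minimum
```

The stored values alone would rank the third entry highest, whereas the offsets from the minimum entry recover the true order because the spread (100) fits in 8 bits.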
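Returning to Theorem 1 and the configuration trade-off of Section IV-D, the bound M can be evaluated numerically. The following Python sketch assumes the reconstructed form of the equation given with Theorem 1; the function names and the linear search are hypothetical, and the timing values in the usage lines are illustrative placeholders, not parameters taken from this paper or from a datasheet.

```python
from typing import Optional


def mithril_bound(n_entry: int, rfm_th: int, t_refw: float, t_rfc: float,
                  t_refi: float, t_rc: float, t_rfm: float) -> float:
    """Upper bound M on the estimated-count increase of one row within tREFW."""
    harmonic_term = sum(rfm_th / k for k in range(1, n_entry + 1))
    # Number of RFM intervals that fit into the non-refresh part of tREFW.
    rfm_intervals = t_refw * (1 - t_rfc / t_refi) / (t_rc * rfm_th + t_rfm)
    return harmonic_term + (rfm_th / n_entry) * (rfm_intervals - 2)


def smallest_n_entry(flip_th: float, rfm_th: int, max_entries: int = 1 << 16,
                     **timing: float) -> Optional[int]:
    """Smallest table size with M < FlipTH/2, or None if none is found."""
    for n_entry in range(1, max_entries + 1):
        if mithril_bound(n_entry, rfm_th, **timing) < flip_th / 2:
            return n_entry
    return None


# Placeholder timing values in nanoseconds (assumptions for illustration only).
timing = dict(t_refw=32e6, t_rfc=295.0, t_refi=3900.0, t_rc=48.0, t_rfm=195.0)
for rfm_th in (16, 64, 256):
    print(rfm_th, smallest_n_entry(flip_th=6250, rfm_th=rfm_th, **timing))
```

Sweeping RFMTH in this way reproduces the kind of (Nentry, RFMTH) trade-off that Figure 6 describes, although the exact numbers depend entirely on the assumed timing parameters.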
[Figure 7: relative energy (%), showing dynamic energy overhead and additional Nentry versus AdTH (50 to 200), for (FlipTH, RFMTH) = (3.125K, 16) and (6.25K, 64), with multi-programmed and multi-threaded workloads.]

V. ENHANCING MITHRIL FURTHER

A. Adaptive Refresh
Section IV assumed that Mithril performs a preventive refresh for every RFM command. However, if Mithril can successfully distinguish a benign memory access pattern from an RH attack pattern, we can skip some of the RFM commands. We find that the difference between the MaxPtr and the MinPtr count values is an effective identifier of such different patterns. Thus, we propose to perform a preventive refresh only when this difference exceeds a certain threshold (AdTH). This is referred to as an adaptive refresh policy.

The difference between the MaxPtr and the MinPtr count values serves as a decent proxy of possible RH attacks, as a large difference implies a high concentration of memory accesses to a small number of rows. Therefore, if AdTH is set large enough, Mithril with the adaptive refresh policy can effectively filter out the ACT patterns observed in normal workloads. Figure 7 shows the effectiveness of the adaptive refresh policy, nearly eliminating additional energy overhead with benign workloads (see Section VI for the details of the experimental setup).

Among the multiple AdTH values, we can identify that the adaptive refresh policy is effective in the range of 100 to 200 in all cases. We seek the root cause in the interplay of the memory access patterns of ordinary workloads and the DRAM row size. Multithreaded or memory-intensive workloads often exhibit large-object-sweep behavior that

[Figure 8: accessed row and activated row versus time (μs); panel (c) Activate pattern (small time window).]
Figure 8. Example of a large object sweep pattern of lbm in SPEC CPU2017: (a) the memory access pattern in the large time window, (b) magnified to a small window, (c) the activation pattern in the small window.

However, such an effect is minimal unless AdTH is very high. Figure 7 shows a small increase in Nentry, a maximum of 12% at only a very low FlipTH value. Proof of the adjusted bound can be derived from Theorem 1 but is omitted here due to a lack of space.

B. Mithril+
The adaptive refresh policy allows Mithril to skip a preventive refresh even when the RFM command is issued by the memory controller. By doing so, Mithril can reduce energy consumption but not the performance overhead. Regardless of whether a DRAM component actually performs refreshes, the MC will continue to issue RFM commands at every RFMTH ACTs.

Motivated by this limitation, we propose an optional, more invasive extension of Mithril, termed Mithril+, which prevents the MC from issuing unnecessary RFM commands. Mithril+ utilizes the mode register in the DRAM device, which is flagged when the difference between MaxPtr and MinPtr is smaller than the value of AdTH. At every RFMTH ACTs, the MC reads the flag using the JEDEC-standard MRR (Mode Register Read) command, determining whether
results in main-memory accesses (Figure 8(a)). In such a or not to issue the RFM command. With this interface,
case, memory accesses are concentrated on a small number Mithril+ can substantially minimize the performance over-
of rows in a short time period (Figure 8(b)) while being head in the common case of ordinary workloads at the
rather evenly distributed over the entire footprint overall. expense of a modification to the RFM interface.
Although such an access pattern may possess high DRAM
row locality, inter-process/thread conflicts can cause a high C. Non-adjacent Row Hammer
rate of ACT per memory access (Figure 8(c)). Here, the Mithril can follow approaches similar to those in prior
number of concentrated ACTs would be similar to the works [43], [56] with regard to handling a non-adjacent RH
number of streaming RDs/WRs, which would be 128 for an by adjusting the M value and the number of rows required
8KB DRAM row and a 64B cache line size. This matches to execute a preventive refresh. When the range of the RH
the range of the effective adaptive threshold values, although effect is one (double-sided attack, which we have assumed
the exact value must be determined empirically. thus far for Mithril), M smaller than F lipT H /2 is safe.
The adaptive refresh policy causes a slight deterioration However, when the range is broader, M must be smaller
of the bound M (Theorem 1), thus inducing a higher area or than F lipT H /(aggregated RH ef f ect) for non-adjacent
performance cost to ensure the same effect as the baseline. aggressors. Within the range of 3, the aggregated RH effect
8
Table III
A RCHITECTURAL PARAMETERS FOR SIMULATION activated just enough to reach the blacklist threshold. This
effectively throttles benign workloads, especially memory-
Core Configurations (16 cores) intensive types. Each RH attack or adversarial pattern runs
Core 3.6 GHz 4-way OOO cores simultaneously with the 15 other benign workloads.
LLC 16 MB Configurations: We select up to three different Mithril
Memory System Configurations
Module DDR5-4800
and Mithril+ (Nentry , RF MT H ) configurations for each
Channel 2 channels F lipT H , ranging from 50K to 1.5K. Recently observed [29]
Configuration 1 rank; 32 banks per rank F lipT H values are approximately 5K, but 1.5K is reachable
Scheduling BLISS [53] considering the continued scaling of process technology and
Page-Policy Minimalist-open [27] the non-adjacent RH. At high F lipT H values of 50K and
tRFC, tRC, tRFM 295 ns, 48.64 ns, 97.28 ns
25K, RF MT H at fixed to 256 given that Nentry is already
tRCD, tRP, tCL 16.64 ns
low. At the lowest F lipT H of 1.5K, RF MT H is fixed at 32
because a higher RF MT H value results in an overly high
is 3.5 [56], with six victim rows to execute a preventive Nentry . We use a value of 200 for AdT H as the default
refresh. value. For PARFM, RF MT H is fixed to satisfy a failure
probability of 10−15 (a typical consumer memory reliability
VI. E VALUATION target [56], [9], [10], [21], [34], [45]) for 64 banks within a
We evaluate the performance, energy, and area overhead 32ms time period (tREFW) for each F lipT H . The probabil-
of Mithril and Mithril+ in comparison with the RFM- ity degrades if the number of banks to support increases.
interface-compatible PARFM and BlockHammer, as well We reconfigure BlockHammer1 to match our simulation
as the RFM-interface-non-compatible PARA, CBT, TWiCe, environment and our target F lipT H values. For (CBF size,
and Graphene. NBL ) pairs, we used (1K, 17.1K), (1K, 8.6K), (1K, 4.3K),
(2K, 2.1K), (4K, 1.1K), and (8K, 0.49K) for F lipT H
A. Experimental Setup from 50K to 1.5K. Under our system of four banks per
Methodology: The performance overhead is evaluated based thread, the number of ACTs per row easily exceeds 700
on McSimA+ [3]. Table III summarizes the experimental (as opposed to 109 ACTs in the original BlockHammer
setup. We use the normalized aggregate IPC as the per- system with more banks per thread [56]), especially for
formance metric, where the baseline is the aggregate IPC memory-intensive workloads. Because NBL must be lower
without applying any RH-protection scheme for a workload. than F lipT H /2 (750 for a F lipT H value of 1.5K), it is
We count the number of ACTs, PREs, and executed preven- difficult to set an appropriate NBL value that distinguishes
tive refreshes to calculate the dynamic energy dissipation. benign accesses from aggressor accesses and fulfill RH-
First, we synthesize the RTL implementation of the Mithril protection at a F lipT H value of 1.5K while also incurring
module using the TSMC 40 nm standard cell library with minimal performance overhead.
the Synopsys Design Compiler. The area overhead is scaled Other prior schemes not compatible with RFM are also
down to DRAM 20 nm and then again scaled up 10× [14] to configured for a fair comparison with Mithril. TWiCe and
conservatively take the inferior DRAM process into account. Graphene are configured using the equations provided in
The hardware energy consumption of Mithril is also derived each work to be applied to the DDR5 specification. PARA
from the synthesis. is configured to satisfy a failure probability of 10−15 . CBT
Workloads: We use 1) normal, 2) multi-sided RH, and 3) is configured to follow the configuration in the original
BlockHammer-performance-adversarial workloads for eval- work [49], [48].
uation. We use both multi-programmed and multi-threaded
workloads for normal workloads, reporting their geo-mean B. The Overheads of Mithril and Mithril+
values. From SPEC CPU2017, we extract 100M instruction Mithril+ shows nearly zero performance overhead at all
traces [51] and render two different workloads, mix-high and F lipT H levels. The performance of Mithril degrades, with
mix-blend, each of which comprises 16 traces of memory- the amount depending on the target F lipT H and RF MT H
intensive and randomly selected workloads, respectively. We configurations. There exists a performance-area trade-off for
execute 400M instructions in total. We also evaluate three every F lipT H , which is amplified as F lipT H value becomes
different multi-threaded benchmarks (FFT and RADIX from smaller.
SPLASH-2 [44] and PageRank from GAP [6]). 1 BlockHammer uses a pair of interleaved counting bloom filters (CBFs)
We configure a multi-sided RH attack that targets multiple similar to Count-min Sketch algorithm. Each CBF is reset at every CBF
victims [15], [16], typically 32 in total. The adversarial lifetime (tCBF ), which typically matches tREFW. There exists a certain
pattern for BlockHammer in performance is configured to blacklist threshold (NBL ) of ACT that triggers a delay on a certain row
when it is surpassed. The delay time (tDelay ) is calculated as (tCBF −
blacklist specific profiled rows that share the CBF (count- NBL ×tRC)/(F lipT H − NBL ). Thread-level scheduling support is built
ing bloom filter) entry with the benign threads. Each is on top of these to throttle the aggressor thread itself.
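The delay formula in the BlockHammer footnote can be evaluated for the (FlipTH, NBL) pairs listed in the configuration paragraph. The sketch below is only a worked calculation under stated assumptions: tCBF is taken equal to the 32 ms tREFW, tRC is the 48.64 ns value from Table III, "K" is read as a factor of 1000, and the NBL values are paired with the FlipTH values in the order they are listed. The function and variable names are hypothetical.

```python
def blockhammer_delay_ns(flip_th, n_bl, t_cbf_ns=32e6, t_rc_ns=48.64):
    """tDelay = (tCBF - NBL * tRC) / (FlipTH - NBL), in nanoseconds."""
    return (t_cbf_ns - n_bl * t_rc_ns) / (flip_th - n_bl)


# (FlipTH, NBL) pairs, assuming the listed order and K = 1000.
PAIRS = [
    (50_000, 17_100),
    (25_000, 8_600),
    (12_500, 4_300),
    (6_250, 2_100),
    (3_125, 1_100),
    (1_500, 490),
]

if __name__ == "__main__":
    for flip_th, n_bl in PAIRS:
        delay_us = blockhammer_delay_ns(flip_th, n_bl) / 1000
        print(f"FlipTH={flip_th:>6}  NBL={n_bl:>6}  tDelay={delay_us:8.2f} us")
```

Under these assumptions the per-activation delay imposed on a blacklisted row grows by more than an order of magnitude between the highest and the lowest FlipTH, which is one way to see why misclassified benign rows become costly at low FlipTH.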
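The difference in overhead between Mithril and Mithril+ comes from which side skips work under the adaptive refresh policy of Section V. The toy model below is a sketch of that interaction, not the authors' implementation: the table update rule (replacing the minimum entry on a miss) is an assumption made for illustration, empty slots are treated as count 0, and the class and function names are hypothetical. Only the AdTH comparison, the reset of the refreshed entry to the minimum count, and the MC reading a flag before issuing RFM follow the text.

```python
class BankTracker:
    """Toy per-bank counter table with an adaptive refresh threshold."""

    def __init__(self, n_entry, ad_th):
        self.n_entry = n_entry
        self.ad_th = ad_th
        self.counts = {}              # row address -> estimated ACT count
        self.preventive_refreshes = 0

    def _min(self):
        # Assumption: unused slots count as 0 until the table is full.
        full = len(self.counts) == self.n_entry
        return min(self.counts.values()) if full else 0

    def spread(self):
        """Difference between the largest and smallest estimated counts."""
        return max(self.counts.values()) - self._min() if self.counts else 0

    def on_act(self, row):
        if row in self.counts:
            self.counts[row] += 1
        elif len(self.counts) < self.n_entry:
            self.counts[row] = 1
        else:
            # Assumed update rule: the new row takes over the minimum entry.
            evicted = min(self.counts, key=self.counts.get)
            self.counts[row] = self.counts.pop(evicted) + 1

    def skip_flag(self):
        """Mithril+: flag the MC would read with MRR before issuing RFM."""
        return self.spread() < self.ad_th

    def on_rfm(self):
        """Adaptive refresh: act only when the spread exceeds AdTH."""
        if self.spread() > self.ad_th:
            hottest = max(self.counts, key=self.counts.get)
            self.counts[hottest] = self._min()
            self.preventive_refreshes += 1


def run(acts, rfm_th, tracker, plus):
    """Return (RFM commands issued, preventive refreshes executed)."""
    rfm_issued = 0
    for n, row in enumerate(acts, start=1):
        tracker.on_act(row)
        if n % rfm_th == 0:
            if plus and tracker.skip_flag():
                continue              # Mithril+: the MC skips this RFM
            rfm_issued += 1
            tracker.on_rfm()
    return rfm_issued, tracker.preventive_refreshes
```

Running the model on a streaming sweep such as `[i % 1000 for i in range(6400)]` with `BankTracker(16, 200)` and `rfm_th=64`, the plain policy still receives every RFM command but executes no preventive refresh, while the Mithril+ variant issues no RFM command at all; a two-row hammering pattern triggers refreshes in both variants.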
Mithril can support the recently observed FlipTH values of approximately 6.25K [29] with an RFMTH of 128, which results in performance overhead of less than 0.5% and a table size per bank of 1KB. Mithril can also support lower FlipTH values, though at the cost of around 2% of the performance and 4KB of area overhead. The area overhead of Mithril+ is identical to that of Mithril, with only negligible performance overhead.

Figure 9. The relative performance and area overhead of Mithril and Mithril+.

C. Comparison with Other Interface-Compatible Schemes

Figure 10 shows the performance and the energy overhead of the other RFM-interface-compatible schemes, PARFM and BlockHammer, on multiple workloads for FlipTH values ranging from 50K to 1.5K. First, on normal workloads (Figure 10(a)), both Mithril+ and Mithril show small performance degradation of less than 2%, superior to that of both PARFM and BlockHammer. BlockHammer is particularly vulnerable at the low FlipTH of 1.5K because it is prone to misidentifying benign threads and throttling them under such a condition.

Second, under the multi-sided RH attack (Figure 10(b)), BlockHammer exhibits an aggregate IPC that is up to 5% better at higher FlipTH values, but it degrades again at a low FlipTH value. This occurs because when BlockHammer successfully identifies RH attacking threads and throttles them, benign threads can benefit in return. However, this again leads to vulnerabilities during misclassifications when FlipTH is lower than, for instance, 1.5K. The performance of Mithril and PARFM are agnostic with regard to the access patterns.

Lastly, regarding the performance of BlockHammer with an adversarial pattern (Figure 10(c)), the performance of [...] drop in the aggregate IPC. This implies the possibility of a critical performance (not RH) attack on systems equipped with BlockHammer, as its throttling feature works as a double-edged sword depending on how effectively it identifies RH attacking threads.

The energy overhead of Mithril and Mithril+ is less than 0.4%, even when FlipTH is 1.5K. These values are much smaller than that of PARFM and slightly higher than that of BlockHammer (Figure 10(d)). This occurs because the adaptive refresh policy successfully identifies ordinary workloads, skipping many of the RFM commands and not triggering additional preventive refreshes. PARFM shows the energy overhead in cases when every RFM command triggers a preventive refresh. BlockHammer causes only minimal logic energy because it is a throttling-based scheme.

The table size overhead of Mithril is much smaller than that of BlockHammer at all FlipTH levels. Figure 10(e) shows the table size overhead for each scheme. PARFM is omitted due to its negligible overhead, and that of Mithril+ is identical to Mithril. The table size of Mithril is up to 60× and a minimum of 4× smaller than that of BlockHammer at all FlipTH levels. The table size comparison is discussed further in Section VI-E.

Figure 10. (a), (b), and (c): Relative performance at normal workloads, under a multi-sided RH-attack, and BlockHammer-adversarial patterns; (d) Relative dynamic energy at normal workloads; and (e) Area overhead of BlockHammer and Mithril.

D. Comparison with Interface Non-Compatible Schemes

Mithril and Mithril+ also show competitive performance and energy overhead compared to the RFM-non-compatible prior works of PARA, CBT, TWiCe, and Graphene. Under both normal workloads and a multi-sided RH attack situation (Figure 11(a), (b)), Mithril+ shows performance degradation of less than 0.2%, comparable to those of TWiCe, Graphene, or CBT. The performance degradation of Mithril is worse than those of other schemes but is limited to less than 2% even at the low FlipTH of 1.5K. The energy overhead of Mithril is comparable to those of TWiCe and Graphene at less than 1% even when FlipTH is 1.5K (Figure 11(c)).

Figure 11. Relative performance at (a) normal workloads and (b) multi-sided RH-attack; (c): Relative dynamic energy at normal workloads.

E. Table Size Overhead

We report the counter table size of each scheme in units of KB per bank (see Table IV). While MC-side schemes benefit from their use of faster transistors, abundant wiring resources, and a relaxed area budget, the number of total banks is much higher (1,024), and the target FlipTH must be pessimistic. DRAM-side schemes benefit from fewer banks (32) to support per device and more accurate FlipTH values, but they are hindered by slower transistors and a tighter area/wiring budget.

Table IV
PER BANK TABLE SIZE COMPARISON (KB)

Scheme                 50K    25K    12.5K  6.25K  3.125K  1.5K
CBT @ MC               0.47   0.97   2.0    4.12   8.5     17.5
Graphene @ MC          0.14   0.21   0.51   0.99   1.92    3.7
BlockHammer @ MC       3.75   3.5    3.25   6.0    11.0    20.0
TWiCe @ buffer chip    2.79   5.08   9.54   18.27  35.29   71.26
Mithril-256 @ DRAM     0.08   0.17   0.41   1.45   -       -
Mithril-128 @ DRAM     0.07   0.15   0.34   0.84   3.76    -
Mithril-64 @ DRAM      0.07   0.14   0.3    0.68   1.78    -
Mithril-32 @ DRAM      0.06   0.13   0.27   0.57   1.38    4.64

* Mithril-(256/128/64/32) denote different RFMTH values ranging from 256 to 32.

Mithril shows lower or competitive area overhead in terms of the KB per bank, reaching 0.024mm² when FlipTH equals 6.25K. This represents 1% of a single DDR5 chip [28] when multiplied by 32 to cover 32 banks per chip. While both Graphene and Mithril share fundamentally the same CbS algorithm as their tracking mechanism, their table size overhead differs for several reasons. First, as an advantage for Mithril, it does not require a table reset due to the wrapping counter scheme, resulting in a two-fold reduction. Also, the per-entry bit width of the counter CAM is smaller in Mithril because the maximum value is bounded to M (Theorem 1), which is smaller than the Graphene case for the maximum number of ACTs in the tREFW window. At a FlipTH value of 1.5K, we ensure that RFMTH is small to minimize the performance drop, resulting in increased Nentry and area overhead.

VII. RELATED WORK

Row Hammer (RH) on Real Systems: RH has been shown to be able to bypass all system memory protection schemes, allowing adversaries to compromise the confidentiality and integrity of actual systems. In 2015, Google [47] demonstrated that a user-level program could breach the system-level security of a typical PC by exploiting the RH vulnerability of the system. A number of successful attacks followed [47], including those compromising mobile devices [54], [55] and servers [19], [13], [46], thus breaking the authentication process and damaging the entire system, even when a system protects memory locations near sensitive data [59]. Because RH undermines the fundamental principle of memory isolation, it has been regarded as a serious threat, drawing mitigation proposals from software, architecture, and hardware levels.

Architectural Proposals to Mitigate RH: There have been deterministic [50], [26], [32], [43], [56] and probabilistic [30], [52], [58] schemes proposed to mitigate RH attacks at the architecture level. Among these, [58], [52], [50] are susceptible to adversarial DRAM access patterns. TWiCe [32] and CAT-TWO [26] are relatively free from this susceptibility but require an order of magnitude more storage to track aggressor rows compared to Graphene [43]. PARA [30] incurs low performance and energy overhead, whereas it is also extremely area-efficient as it does not require counters to trace aggressor rows. Yet, the protection is probabilistic in nature; even if the probability is quite small, there is a non-zero probability that a victim row will not be refreshed after reaching its RH threshold. BlockHammer [56] uses a throttling approach backed up with thread-level MC scheduling.

VIII. CONCLUSION

Here, we propose Mithril, a DRAM-side, RFM-compatible, efficient scheme that provides deterministic safety against Row Hammer attacks. First, we show that the conventional algorithms and methodologies used in previous architectural RH-prevention schemes are not compatible with the RFM command introduced in the latest DRAM specifications, such as DDR5 and LPDDR5. By mathematically defining the maximum bound of activation count without a refresh in a tREFW time window, we guarantee safety at a specific FlipTH value. The devised adaptive refresh policy decreases the energy overhead by exploiting the row activation patterns of ordinary workloads. Moreover, we proposed Mithril+, which requires a slight modification of the RFM interface. It utilizes the existing DRAM command to skip the sending of RFM commands, which can significantly reduce the performance overhead of Mithril. Our evaluation demonstrates that Mithril achieves a significantly low energy overhead in all cases compared to PARFM, whereas it incurs slightly higher performance overhead. Mithril+ shows not only low energy overhead but also significantly lower performance overhead such that it is comparable to Graphene, a state-of-the-art RH-prevention scheme that does not support RFM.

ACKNOWLEDGMENT

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (2020-0-01300, Development of AI-specific Parallel High-speed Memory Interface, and 2021-0-01343, Artificial Intelligence Graduate School Program (Seoul National University)). Jung Ho Ahn is the corresponding author.

IX. APPENDIX

A. Proof for Theorem 1

Theorem 1. Within any tREFW, an increase in the estimated count for any single row is bounded to M, which is a function of Nentry and RFMTH.
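The bound M of Theorem 1 depends only on Nentry, RFMTH, and DRAM timing parameters, so it can be evaluated numerically. The sketch below is an illustration under stated assumptions rather than the authors' tooling: tRFC, tRC, and tRFM are the Table III values, tREFW is the 32 ms window used in the evaluation, and tREFI = 3.9 µs is an assumed DDR5 value that does not appear in the text. The function names, and the search for the smallest Nentry that satisfies the double-sided condition M < FlipTH/2 from Section V-C, are likewise hypothetical helpers.

```python
import math

# Timing parameters in nanoseconds.
T_REFW = 32e6     # 32 ms refresh window (from the evaluation setup)
T_REFI = 3900.0   # ASSUMPTION: not given in the text
T_RFC = 295.0     # Table III
T_RC = 48.64      # Table III
T_RFM = 97.28     # Table III


def rfm_intervals(rfm_th):
    """W: maximum number of RFM intervals within one tREFW (rounded up)."""
    usable = T_REFW - (T_REFW / T_REFI) * T_RFC
    return math.ceil(usable / (T_RC * rfm_th + T_RFM))


def bound_m(n_entry, rfm_th):
    """M = sum_{k=1..N} RFM_TH/k + (RFM_TH/N) * (intervals - 2)."""
    harmonic = sum(rfm_th / k for k in range(1, n_entry + 1))
    intervals = T_REFW * (1 - T_RFC / T_REFI) / (T_RC * rfm_th + T_RFM)
    return harmonic + (rfm_th / n_entry) * (intervals - 2)


def min_entries(flip_th, rfm_th, limit=1 << 16):
    """Smallest N_entry with M < FlipTH / 2, or None if none is found."""
    for n in range(1, limit):
        if bound_m(n, rfm_th) < flip_th / 2:
            return n
    return None


if __name__ == "__main__":
    print(rfm_intervals(128), min_entries(6250, 128))
```

The first term of M grows only logarithmically with Nentry while the second shrinks as 1/Nentry, which is the area versus RFMTH trade-off discussed in the main text: a smaller RFMTH or a larger table both lower the bound.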
$$M = \sum_{k=1}^{N_{entry}} \frac{RFM_{TH}}{k} + \frac{RFM_{TH}}{N_{entry}} \left( \frac{tREFW \left(1 - \frac{tRFC}{tREFI}\right)}{tRC \times RFM_{TH} + tRFM} - 2 \right)$$

Henceforth, Nentry is replaced by N. Also, W represents the maximum number of RFM intervals (the period between two consecutive RFM commands) within a tREFW. It is computed as follows:

$$W = \left\lceil \frac{tREFW - (tREFW / tREFI) \times tRFC}{tRC \times RFM_{TH} + tRFM} \right\rceil$$

Suppose that $c_j[i]$ is the i-th largest estimated count in the Mithril table at the beginning of the j-th RFM interval ($1 \le i \le N$, $1 \le j \le W$). $c'_j[i]$ is the i-th largest estimated count in the table at the end of the j-th RFM interval. Figure 12 illustrates a prime example of such notations. Then, the following Lemmas hold true:

Figure 12. Example case illustrating how the estimated count is updated between two consecutive RFM intervals. (The markers 1, 2, and 3 in the figure refer to Lemma 1, Lemma 2, and Lemma 3.)

Lemma 1. $c'_j[i] = c_{j+1}[i-1]$

Proof: At the end of each RFM interval, one of the entries with the largest estimated count (i.e., $c'_j[1]$) becomes the target for the RFM refresh, and its estimated count is reset to the minimum count in the table. Thus, the ranks of all other entries are increased by one after the RFM refresh.

Lemma 2. $\sum_{i=1}^{k} c'_j[i] \le \sum_{i=1}^{k} c_j[i] + RFM_{TH}$ for $1 \le k \le N$

Proof: Considering that there are $RFM_{TH}$ ACTs within each RFM interval and $c'_j[i]$ is larger than or equal to $c_j[i]$ for all values of i by definition, the following holds true:

$$\begin{aligned}
\sum_{i=1}^{k} c'_j[i] &= \sum_{i=1}^{N} c'_j[i] - \sum_{i=k+1}^{N} c'_j[i] \\
&= \sum_{i=1}^{N} c_j[i] + RFM_{TH} - \sum_{i=k+1}^{N} c'_j[i] \\
&= \sum_{i=1}^{k} c_j[i] - \sum_{i=k+1}^{N} \left(c'_j[i] - c_j[i]\right) + RFM_{TH} \\
&\le \sum_{i=1}^{k} c_j[i] + RFM_{TH}
\end{aligned}$$

Lemma 3. $\sum_{i=1}^{k} c_j[i] \le \sum_{i=1}^{k} c_{j-1}[i] + RFM_{TH}$ for $1 \le k \le N$

Proof: This is an obvious extension of Lemma 2 because $\sum_{i=1}^{k} c_j[i] \le \sum_{i=1}^{k} c'_{j-1}[i]$. In other words, an RFM refresh always decreases the sum of the top k counter values in the table.

Lemma 4. $\sum_{i=1}^{k} c_j[i] \le \frac{k}{k+1} \left( \sum_{i=1}^{k+1} c_{j-1}[i] + RFM_{TH} \right)$ for $1 \le k \le N-1$

Proof: Using Lemma 1, Lemma 2, and the fact that $c'_{j-1}[1] \ge c'_{j-1}[i]$ for all i, the following holds true:

$$\begin{aligned}
\sum_{i=1}^{k} c_j[i] &= \sum_{i=2}^{k+1} c'_{j-1}[i] \\
&= \sum_{i=1}^{k+1} c'_{j-1}[i] - c'_{j-1}[1] \\
&\le \sum_{i=1}^{k+1} c'_{j-1}[i] - \frac{1}{k+1} \sum_{i=1}^{k+1} c'_{j-1}[i] \\
&= \frac{k}{k+1} \sum_{i=1}^{k+1} c'_{j-1}[i] \\
&\le \frac{k}{k+1} \left( \sum_{i=1}^{k+1} c_{j-1}[i] + RFM_{TH} \right)
\end{aligned}$$

With these Lemmas, we are ready to prove Theorem 1. Proving Theorem 1 is equivalent to proving the following:

$$c'_W[1] - c_1[N] \le M \quad \text{for given } c_1[1], \ldots, c_1[N]$$

This works because any row's estimated count that is increased during the W RFM intervals is obviously less than the difference between the largest estimated count at the end ($c'_W[1]$) and the smallest estimated count at the beginning ($c_1[N]$). Accordingly, we can obtain the upper bound for $c'_W[1]$ as follows:

$$\begin{aligned}
c'_W[1] &\le c_W[1] + RFM_{TH} && (\because \text{Lemma 2}) \\
&\le \frac{1}{2} \left( \sum_{i=1}^{2} c_{W-1}[i] + RFM_{TH} \right) + RFM_{TH} && (\because \text{Lemma 4}) \\
&\le \frac{1}{3} \left( \sum_{i=1}^{3} c_{W-2}[i] + RFM_{TH} \right) + \sum_{k=1}^{2} \frac{RFM_{TH}}{k} && (\because \text{Lemma 4})
\end{aligned}$$

Repeatedly applying Lemma 4 for a total of N − 1 times, we obtain the following inequality:

$$\begin{aligned}
c'_W[1] &\le \frac{1}{N} \left( \sum_{i=1}^{N} c_{W-N+1}[i] + RFM_{TH} \right) + \sum_{k=1}^{N-1} \frac{RFM_{TH}}{k} \\
&= \frac{1}{N} \sum_{i=1}^{N} c_{W-N+1}[i] + \sum_{k=1}^{N} \frac{RFM_{TH}}{k}
\end{aligned}$$

At this point, we can no longer apply Lemma 4 and instead apply Lemma 3 (k = N) W − N times.

$$\begin{aligned}
c'_W[1] &\le \frac{\sum_{i=1}^{N} c_1[i]}{N} + \frac{(W - N)\, RFM_{TH}}{N} + \sum_{k=1}^{N} \frac{RFM_{TH}}{k} \\
&= \frac{\sum_{i=1}^{N} c_1[i]}{N} + M - \frac{N-2}{N} RFM_{TH}
\end{aligned}$$

Earlier, we showed that proving Theorem 1 is equivalent to proving $c'_W[1] - c_1[N] \le M$. With the above equation, proving the following is the only step left to prove Theorem 1:

$$\frac{\sum_{i=1}^{N} c_1[i]}{N} - c_1[N] \le \frac{N-2}{N} RFM_{TH}$$

Here, the left-hand side can be represented as follows.

$$\frac{\sum_{i=1}^{N} c_1[i]}{N} - c_1[N] = \frac{\sum_{i=1}^{N} c_1[i] - N c_1[N]}{N} = \frac{\sum_{i=1}^{N} \left(c_1[i] - c_1[N]\right)}{N}$$

The upper bound of $\sum_{i=1}^{N} (c_j[i] - c_j[N])$ for any j-th RFM interval can be obtained by contradiction. We assume that $\sum_{i=1}^{N} (c_j[i] - c_j[N])$ is maximized when j is m and that the difference between $c_m[1]$ and $c_m[N]$ is greater than $RFM_{TH}$. Then, the following holds:

$$c'_{m-1}[1] - c'_{m-1}[N] \ge c'_{m-1}[2] - c'_{m-1}[N] = c_m[1] - c_m[N] > RFM_{TH} \tag{3}$$

At the end of the (m−1)-th RFM interval, $c'_{m-1}[1]$ is reduced to $c'_{m-1}[N]$ by RFM. Therefore

$$\begin{aligned}
\sum_{i=1}^{N} \left(c_m[i] - c_m[N]\right) &= \sum_{i=1}^{N} \left(c'_{m-1}[i] - c'_{m-1}[N]\right) - \left(c'_{m-1}[1] - c'_{m-1}[N]\right) \\
&= \sum_{i=1}^{N} \left(c_{m-1}[i] - c_{m-1}[N]\right) + RFM_{TH} - \left(c'_{m-1}[1] - c'_{m-1}[N]\right) \\
&< \sum_{i=1}^{N} \left(c_{m-1}[i] - c_{m-1}[N]\right) \quad (\because (3))
\end{aligned}$$

This contradicts the contention that $\sum_{i=1}^{N} (c_j[i] - c_j[N])$ is maximized when j is m. Therefore, if $\sum_{i=1}^{N} (c_j[i] - c_j[N])$ is maximized when j is m, the difference between $c_m[1]$ and $c_m[N]$ is less than or equal to $RFM_{TH}$. Then, we obtain the following inequality:

$$\begin{aligned}
\frac{\sum_{i=1}^{N} \left(c_1[i] - c_1[N]\right)}{N} &\le \frac{\sum_{i=1}^{N} \left(c_m[i] - c_m[N]\right)}{N} \\
&= \frac{\sum_{i=1}^{N-2} \left(c_m[i] - c_m[N]\right)}{N} \quad (\because c_m[N-1] = c_m[N]) \\
&\le \frac{\sum_{i=1}^{N-2} \left(c_m[1] - c_m[N]\right)}{N} \\
&\le \frac{(N-2)\, RFM_{TH}}{N}
\end{aligned}$$

REFERENCES

[1] M. T. Aga, Z. B. Aweke, and T. Austin, “When Good Protections Go Bad: Exploiting Anti-DoS Measures to Accelerate Rowhammer Attacks,” in IEEE International Symposium on Hardware Oriented Security and Trust (HOST), 2017.
[2] P. K. Agarwal, G. Cormode, Z. Huang, J. M. Phillips, Z. Wei, and K. Yi, “Mergeable Summaries,” ACM Transactions on Database Systems (TODS), vol. 38, no. 4, 2013.
[3] J. Ahn, S. Li, S. O, and N. P. Jouppi, “McSimA+: A Manycore Simulator with Application-level+ Simulation and Detailed Microarchitecture Modeling,” in ISPASS, 2013.
[4] Z. B. Aweke, S. F. Yitbarek, R. Qiao, R. Das, M. Hicks, Y. Oren, and T. Austin, “ANVIL: Software-Based Protection Against Next-Generation Rowhammer Attacks,” in ASPLOS, 2016.
[5] R. Balasubramonian, Innovations in the Memory System. Morgan & Claypool Publishers, 2019.
[6] S. Beamer, K. Asanovic, and D. A. Patterson, “The GAP Benchmark Suite,” CoRR, vol. abs/1508.03619, 2015.
[7] I. Bhati, M. Chang, Z. Chishti, S. Lu, and B. Jacob, “DRAM Refresh Mechanisms, Penalties, and Trade-Offs,” IEEE Transactions on Computers, vol. 65, no. 1, 2016.
[8] F. Brasser, L. Davi, D. Gens, C. Liebchen, and A. R. Sadeghi, “CAn’T Touch This: Software-only Mitigation Against Rowhammer Attacks Targeting Kernel Memory,” in 26th USENIX Conference on Security Symposium, 2017.
[9] Y. Cai, S. Ghose, E. F. Haratsch, Y. Luo, and O. Mutlu, “Error Characterization, Mitigation, and Recovery in Flash-memory-based Solid-state Drives,” Proceedings of the IEEE, vol. 105, no. 9, 2017.
[10] Y. Cai, E. F. Haratsch, O. Mutlu, and K. Mai, “Error Patterns in MLC NAND Flash Memory: Measurement, Characterization, and Analysis,” in Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2012.
[11] M. Charikar, K. Chen, and M. Farach-Colton, “Finding Frequent Items in Data Streams,” in Proceedings of the 29th International Colloquium on Automata, Languages and Programming, 2002.
[12] L. Cojocar, J. Kim, M. Patel, L. Tsai, S. Saroiu, A. Wolman, and O. Mutlu, “Are We Susceptible to Rowhammer? An End-to-End Methodology for Cloud Providers,” in IEEE Symposium on Security and Privacy (S&P), 2020.
[13] L. Cojocar, K. Razavi, C. Giuffrida, and H. Bos, “Exploiting Correcting Codes: On the Effectiveness of ECC Memory Against Rowhammer Attacks,” in IEEE Symposium on Security and Privacy (S&P), 2019.
[14] F. Devaux, “The True Processing in Memory Accelerator,” in IEEE Hot Chips 31 Symposium. IEEE Computer Society, 2019.
[15] P. Frigo, E. Vannacci, H. Hassan, V. van der Veen, O. Mutlu, C. Giuffrida, H. Bos, and K. Razavi, “TRRespass: Exploiting the Many Sides of Target Row Refresh,” in IEEE Symposium on Security and Privacy (S&P), 2020.
[16] ““Half-Double”: Next-Row-Over Assisted Rowhammer,” https://github.com/google/hammer-kit/blob/main/20210525_half_double.pdf, Google, 2021.
[17] Z. Greenfield and L. Tomer, “Throttling Support for Row-hammer Counters,” U.S. Patent 9251885, Feb. 2016.
[18] D. Gruss, M. Lipp, M. Schwarz, D. Genkin, J. Juffinger, S. O’Connell, W. Schoechl, and Y. Yarom, “Another Flip in the Wall of Rowhammer Defenses,” in IEEE Symposium on Security and Privacy (S&P), 2017.
[19] D. Gruss, C. Maurice, and S. Mangard, “Rowhammer.js: A Remote Software-Induced Fault Attack in JavaScript,” in Detection of Intrusions and Malware, and Vulnerability Assessment, 2016.
[20] G. Irazoqui, T. Eisenbarth, and B. Sunar, “MASCAT: Preventing Microarchitectural Attacks Before Distribution,” in Proceedings of the 8th ACM Conference on Data and Application Security and Privacy, 2018.
[21] JEDEC, “Failure Mechanisms and Models for Semiconductor Devices,” 2019.
[22] JEDEC, “LPDDR5 Standard JESD209-5,” 2019.
[23] JEDEC, “DDR5 SDRAM,” 2020.
[24] JEDEC, “Near-Term DRAM Level RowHammer Mitigation,” 2021.
[25] JEDEC, “System Level RowHammer Mitigation,” 2021.
[26] I. Kang, E. Lee, and J. Ahn, “CAT-TWO: Counter-Based Adaptive Tree, Time Window Optimized for DRAM Row-Hammer Prevention,” IEEE Access, vol. 8, 2020.
[27] D. Kaseridis, J. Stuecheli, and L. K. John, “Minimalist Open-page: A DRAM Page-mode Scheduling Policy for the Many-core Era,” in MICRO, 2011.
[28] D. Kim, M. Park, S. Jang, J.-Y. Song, H. Chi, G. Choi, S. Choi, J. Kim, C. Kim, K. Kim, K. Koo, S. Song, Y. Kim, D. U. Lee, J. Lee, D. Kim, K. Kwon, M. Han, B. Choi, H. Kim, S. Ku, Y. Kim, J. Kim, S. Kim, Y. Seo, S. Oh, D. Im, H. Kim, J. Choi, J. Chung, C. Lee, Y. Lee, J.-H. Cho, J. Chun, and J. Oh, “A 1.1V 1ynm 6.4 Gb/s/pin 16Gb DDR5 SDRAM with a Phase-Rotator-Based DLL, High-Speed SerDes and RX/TX Equalization Scheme,” in IEEE International Solid-State Circuits Conference (ISSCC), 2019, pp. 380–382.
[29] J. Kim, M. Patel, A. G. Yaglikçi, H. Hassan, R. Azizi, L. Orosa, and O. Mutlu, “Revisiting RowHammer: An Experimental Analysis of Modern DRAM Devices and Mitigation Techniques,” in ISCA, 2020.
[30] Y. Kim, R. Daly, J. Kim, C. Fallin, J. H. Lee, D. Lee, C. Wilkerson, K. Lai, and O. Mutlu, “Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors,” in ISCA, 2014.
[31] R. K. Konoth, M. Oliverio, A. Tatar, D. Andriesse, H. Bos, C. Giuffrida, and K. Razavi, “ZebRAM: Comprehensive and Compatible Software Protection Against Rowhammer Attacks,” in 13th USENIX Symposium on Operating Systems Design and Implementation, 2018.
[32] E. Lee, I. Kang, S. Lee, G. E. Suh, and J. Ahn, “TWiCe: Preventing Row-hammering by Exploiting Time Window Counters,” in ISCA, 2019.
[33] E. Lee, S. Lee, G. E. Suh, and J. Ahn, “TWiCe: Time Window Counter Based Row Refresh to Prevent Row-Hammering,” IEEE Computer Architecture Letters, vol. 17, no. 1, 2018.
[34] Y. Luo, S. Ghose, Y. Cai, E. F. Haratsch, and O. Mutlu, “Enabling Accurate and Practical Online Flash Channel Modeling for Modern MLC NAND Flash Memory,” IEEE Journal on Selected Areas in Communications, vol. 34, no. 9, 2016.
[35] G. S. Manku and R. Motwani, “Approximate Frequency Counts over Data Streams,” in Proceedings of the 28th International Conference on Very Large Data Bases, 2002.
[36] A. Metwally, D. Agrawal, and A. El Abbadi, “Efficient Computation of Frequent and Top-k Elements in Data Streams,” in Proceedings of the 10th International Conference on Database Theory, 2005.
[37] J. Misra and D. Gries, “Finding Repeated Elements,” Science of Computer Programming, vol. 2, no. 2, 1982.
[38] S. Muthukrishnan, Data Streams: Algorithms and Applications. Now Publishers Inc., 2005.
[39] O. Mutlu and J. S. Kim, “RowHammer: A Retrospective,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 39, no. 8, 2019.
[40] B. Nale and C. E. Cox, “Refresh Command Control for Host Assist of Row Hammer Mitigation,” U.S. Patent 10950288B2, Mar. 2021.
[41] S. O, Y. H. Son, N. S. Kim, and J. Ahn, “Row-buffer Decoupling: A Case for Low-latency DRAM Microarchitecture,” in ISCA, 2014.
[42] K. Park, C. Lim, D. Yun, and S. Baeg, “Experiments and Root Cause Analysis for Active-precharge Hammering Fault in
[44] PARSEC Group, “A Memo on Exploration of SPLASH-2 Input Sets,” in Princeton University, 2011.
[45] M. Patel, J. S. Kim, and O. Mutlu, “The Reach Profiler (REAPER): Enabling the Mitigation of DRAM Retention Failures via Profiling at Aggressive Conditions,” in ISCA, 2017.
[46] K. Razavi, B. Gras, E. Bosman, B. Preneel, C. Giuffrida, and H. Bos, “Flip Feng Shui: Hammering a Needle in the Software Stack,” in 25th USENIX Conference on Security Symposium, 2016.
[47] M. Seaborn and T. Dullien, “Exploiting the DRAM Rowhammer Bug to Gain Kernel Privileges,” https://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html, 2015.
[48] S. M. Seyedzadeh, A. K. Jones, and R. Melhem, “Counter-Based Tree Structure for Row Hammering Mitigation in DRAM,” IEEE Computer Architecture Letters, vol. 16, no. 1, 2017.
[49] S. M. Seyedzadeh, A. K. Jones, and R. Melhem, “Mitigating Wordline Crosstalk using Adaptive Trees of Counters,” in ISCA, 2018.
[50] S. M. Seyedzadeh, A. K. Jones, and R. Melhem, “Mitigating Wordline Crosstalk Using Adaptive Trees of Counters,” in ISCA, 2018.
[51] T. Sherwood, E. Perelman, G. Hamerly, and B. Calder, “Automatically Characterizing Large Scale Program Behavior,” in ASPLOS, 2002.
[52] M. Son, H. Park, J. Ahn, and S. Yoo, “Making DRAM Stronger Against Row Hammering,” in Proceedings of the 54th Annual Design Automation Conference, 2017.
[53] L. Subramanian, D. Lee, V. Seshadri, H. Rastogi, and O. Mutlu, “BLISS: Balancing Performance, Fairness and Complexity in Memory Access Scheduling,” IEEE Transactions on Parallel and Distributed Systems, vol. 27, no. 10, 2016.
[54] V. van der Veen, Y. Fratantonio, M. Lindorfer, D. Gruss, C. Maurice, G. Vigna, H. Bos, K. Razavi, and C. Giuffrida, “Drammer: Deterministic Rowhammer Attacks on Mobile Platforms,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016.
[55] V. van der Veen, M. Lindorfer, Y. Fratantonio, H. Pillai, G. Vigna, C. Kruegel, H. Bos, and K. Razavi, “GuardiON: Practical Mitigation of DMA-Based Rowhammer Attacks on ARM,” in 15th International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), 2018.
[56] A. G. Yaglikci, M. Patel, J. Kim, R. AziziBarzoki, J. Park, H. Hassan, A. Olgun, L. Orosa, K. Kanellopoulos, T. Shahroodi, S. Ghose, and O. Mutlu, “BlockHammer: Preventing RowHammer at Low Cost by Blacklisting Rapidly-Accessed DRAM Rows,” in HPCA, 2021.
[57] T. Yang and X. Lin, “Trap-Assisted DRAM Row Hammer Effect,” IEEE Electron Device Letters, vol. 40, no. 3, 2019.
[58] J. M. You and J.-S. Yang, “MRLoc: Mitigating Row-
DDR3 SDRAM under 3x nm Technology,” Microelectronics hammering Based on Memory Locality,” in Proceedings of
Reliability, vol. 57, 2016. the 56th Annual Design Automation Conference, 2019.
[43] Y. Park, W. Kwon, E. Lee, T. J. Ham, J. Ahn, and J. W. Lee, [59] Z. Zhang, Y. Cheng, D. Liu, S. Nepal, Z. Wang,
“Graphene: Strong yet Lightweight Row Hammer Protection,” and Y. Yarom, “PThammer: Cross-User-Kernel-Boundary
in MICRO, 2020. Rowhammer through Implicit Accesses,” in MICRO, 2020.