RL-based Cache Replacement: A Modern Interpretation
ABSTRACT Belady's algorithm is widely known as an optimal cache replacement policy. It has been the foundation of numerous recent studies on cache replacement policies, and most studies assume it to be an upper limit. Despite its widespread adoption, we discovered opportunities to unlock additional headroom by addressing cache access types and implementing cache bypass. In this study, we propose Stormbird, a cache replacement policy that synergistically integrates extensions of Belady's algorithm with the power of reinforcement learning. Reinforcement learning is well suited to the cache replacement problem owing to its ability to interact dynamically with the environment, adapt to changing access patterns, and maximize cumulative rewards. Stormbird utilizes several features selected from the reinforcement learning model to enhance instructions-per-cycle (IPC) efficiency while maintaining a low hardware overhead. Furthermore, it considers cache access types and integrates dynamic set dueling techniques to improve cache performance. For a 2 MB last-level cache per core, Stormbird achieves an average IPC improvement of 0.13% over the previous state-of-the-art on a single-core system and 0.02% on a four-core system while simultaneously reducing hardware overhead by 62.5%. Stormbird incurs a low hardware overhead of only 10.5 KB for a 2 MB last-level cache and can be implemented without using program counter values.
INDEX TERMS Computer architecture, caches, reinforcement learning, replacement policy, Belady’s
algorithm, set dueling, cache access type.
FIGURE 1. Comparison of the improved dead-block-reduction rate over LRU of Belady's algorithm and Belady+ on SPEC CPU 2017 and Streaming of Cloudsuite (16-way, 2 MB LLC)
placement policy using Belady+ to construct an RL simulation framework. The input features obtained from RL with significant weights were refined to mitigate the hardware overhead.
The remainder of this paper is organized as follows. Section 2 provides an overview of related work on cache replacement policies. Section 3 presents the methodology used in our research, encompassing the RL simulation model for Stormbird. Section 4 provides an in-depth exposition of Stormbird and elucidates its mechanisms and functionalities. Section 5 presents the evaluation results and analysis, comparing Stormbird's performance with that of other cache replacement policies. Finally, Section 6 concludes the study and outlines potential directions for future work.

TABLE 1. Comparison of cache replacement policies.

Policies \ Features  | Compatible with traditional cache mechanism | Standalone | Considering access type
Hawkeye [3]          | ✗ | ✓ | ✓
Mockingjay [4]       | ✗ | ✓ | ✓
SHiP [12]            | ✗ | ✓ | ✗
SHiP++ [13]          | ✗ | ✓ | ✓
RLR [6]              | ✓ | ✓ | ✓
Glider [11]          | ✗ | ✓ | ✗
EHC [5]              | ✓ | ✗ | ✗
ReD [15]             | ✓ | ✗ | ✗
MRP [16]             | ✗ | ✓ | ✗
Stormbird (ours)     | ✓ | ✓ | ✓
II. RELATED WORK
Several innovative policies have exploited program counter (PC)-based signatures and predictors to boost the efficiency of cache replacement. The signature-based hit prediction (SHiP) replacement policy proposed by Wu et al. [12] focused on predicting the re-reference characteristics of cache lines using a signature history counter table (SHCT). Similarly, the SHiP++ method [13] strengthened the original SHiP approach and introduced several optimizations, including more effective handling of writeback accesses and prefetch-aware re-reference prediction value (RRPV) updates. Moreover, the Hawkeye replacement policy [3] utilized a PC-based predictor to enhance cache management efficiency by emulating Belady's algorithm to classify cache lines as cache-friendly or cache-averse. On a cache miss, Hawkeye prioritized evicting cache-averse lines, utilizing sampling sets and a counter-based structure to track the usefulness of cache lines, leading to elevated cache management efficiency.
Various strategies, including the Hawkeye policy, use different methodologies to simulate Belady's algorithm for cache replacement. The Mockingjay replacement policy [4] introduced an ETA-based policy that mimicked Belady's algorithm while considering cache access types. The expected hit count (EHC) policy [5], developed by Vakil-Ghahani et al., proposed a hit-count-based victim selection procedure intended to bridge the gap between the traditional LRU policy and Belady's MIN policy. The authors observed a strong correlation between a cache block's expected number of hits and the reciprocal of its reuse distance, and their victim selection procedure can be implemented on top of existing replacement policies (e.g., DRRIP [14]).
ML has been investigated in the cache replacement field using several approaches. For example, the reinforcement learning replacement (RLR) policy [6] employed RL to isolate critical features for cache replacement. RLR implements a replacement policy without directly integrating an ML framework into the cache architecture. The LSTM learning model was also investigated for cache replacement, as shown in the Glider policy by Shi et al. [11]. Building on the Hawkeye policy, Glider leveraged long short-term memory (LSTM) learning models in offline environments to refine the precision of existing hardware predictors.
Certain policies targeted unique aspects of cache replacement. ReD [15], a block selection policy developed by Diaz et al., was designed to determine the eligibility of a block from the main memory for insertion into the LLC based on its expected reuse behavior. The multi-perspective reuse prediction (MRP) method [16], proposed by Jimenez et al., forecasted the future reuse of cache blocks employing diverse features to optimize cache management.
Table 1 summarizes the comparison of the cache replacement policies. Many of the aforementioned policies utilized PC values to bolster their accuracy. However, this approach requires extra logic, wiring, and energy consumption and incurs a substantial organizational expense. When a replacement policy incorporates PC values, it is incompatible with data prefetchers. Additionally, the miss status holding register (MSHR) and pipeline design need to encompass these PC values [17].
Addressing the cache access type can improve the ability of a replacement policy to handle various sequences of cache accesses effectively, as different access types can have unique behavior patterns and impacts on cache performance. Many ML-based policies have been established on the foundation of the conventional Belady's algorithm. To improve the overall performance of the replacement policy, we opted to use Belady+ instead of Belady's algorithm.

III. CACHE REPLACEMENT SIMULATION WITH REINFORCEMENT LEARNING
RL is particularly suited to the cache replacement policy for several reasons. The cache replacement policy is essentially a problem of making the best decision under uncertain conditions. RL excels in such problems because it is designed to learn from the environment and make the most beneficial decisions over time [18]. In addition, RL focuses on maximizing long-term rewards rather than immediate gains. This aligns with the goal of cache replacement policies, which aim to minimize the long-term miss rate rather than simply avoiding immediate cache misses. Constructing a neural network directly in hardware is undesirable owing to power,
Algorithm 1 Reward function for determining victim selection (Rvictim)
Input: Actionvictim, action of victim selection by agent; RDaccess, reuse distance of current access; RDn, reuse distance of way n in corresponding set;
Output: rvictim, reward of victim selection;
1: Beladyvictim ← way with the max(RDN)
2: if Actionvictim is Beladyvictim then
3:   rvictim ← 1
4: else if RD of Actionvictim < RDaccess then
5:   rvictim ← −1
6: else
7:   rvictim ← 0
8: end if
9: return rvictim

Algorithm 2 Reward function for determining bypass decision (Rbypass)
Input: Actionbypass, action of bypass by agent; RDaccess, reuse distance of current access; RDN, reuse distance of way N in corresponding set;
Output: rbypass, reward of bypass decision;
1: if RDaccess > max(RDN) then
2:   if Actionbypass is Bypass then
3:     rbypass ← 3
4:   else
5:     rbypass ← −2
6:   end if
7: else
8:   if Actionbypass is Bypass then
9:     rbypass ← −1
10:  else
11:    rbypass ← 0
12:  end if
13: end if
14: return rbypass

tenance. However, we observed a correlation between certain tag bits and the agent's replacement decisions. We also experimented with additional features such as the accessed-tag-bits sequence and prefetch usefulness counters. These features were excluded from the RL model, taking into account factors such as computational overhead, implementation complexity at the replacement policy, and marginal contribution to performance enhancement. Implementing a policy based on impactful segments indicated by the agent's responses in various scenarios provides reliable performance while considering the hardware overhead.

3) Agent
The agent shown in Fig. 3 evaluates the current state vector and generates an output vector corresponding to an n-way set-associative LLC. The output vector signifies the merit of evicting each way and of bypassing the current access. We use a deep Q-network (DQN) [19] with a multi-layer perceptron (MLP) containing a single hidden layer with a tanh activation function. The network has 323 input neurons and 256 neurons in the hidden layer. Agentvictim has 16 output neurons corresponding to the 16-way LLC, and Agentbypass has 2 output neurons for the bypass decision. On every LLC miss, the agent selects the victim way based on the output of the network. We use the ϵ-greedy algorithm [19] to avoid overfitting, with an initial ϵ of 0.5 that decays by 0.001 every 1024 steps down to 0.01.

4) Reward
To approximate the behavior of Belady+ in Stormbird, we crafted two reward functions: one for the selection of the victim way (Rvictim, 4-a in Fig. 3) and the other for determining bypass decisions (Rbypass, 4-b in Fig. 3).
Rvictim provides a positive reward only when the agent selects the same victim way as Belady's algorithm would. In contrast, if the reuse distance of the way chosen by the agent is shorter than the reuse distance of the current access, Rvictim provides a negative reward. This is because the selected way could have been reused sooner in the future but is instead being evicted by the current access, which will be reused later. In other situations, a neutral reward is provided.
To balance positive and negative rewards, we set the magnitudes of both positive and negative rewards equal. Furthermore, considering the clarity of decision boundaries and computational efficiency, we designated the reward values as integers. The bypass decision made by Belady+ occurs less often than victim selection, which arises on every cache miss. This infrequency makes learning the bypass decision more challenging than the victim selection, necessitating a more sophisticated determination of the reward value for Rbypass. The agent can recognize the importance of an action by allocating a larger reward to more essential actions or outcomes, enhancing learning efficacy.
Therefore, if the bypass action is selected by both the agent and Belady+ (i.e., if the reuse distance of the cache access exceeds that of every way within the cache set, the incoming data is determined to be bypassed), a reward of large magnitude (+3) is provided to encourage the agent to learn this decision strongly. In cases where only Belady+ selects the bypass action and the agent does not, a negative reward (−2) is provided as a strong penalty for missing this behavior. Additionally, a minor negative reward (−1) is assigned for the agent's unilateral decision to bypass, mitigating unnecessary bypass actions.
The rationale for assigning different magnitudes to the three distinct reward scenarios is deliberately structured so that the aggregate sum of rewards equals zero. This equilibrium is intended to balance the reward system, ensuring that the agent's learning process is neither excessively penalized nor unjustifiably rewarded, thus facilitating an unbiased adaptation to the Belady+ algorithm's behavior.
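To make the agent and reward definitions above concrete, the following is a minimal Python sketch of the simulation-side model. It is an illustration under stated assumptions, not the authors' implementation: PyTorch is assumed only as a convenient way to express the 323-256-{16, 2} MLP with a tanh hidden layer, and all function and variable names are hypothetical.

# Illustrative sketch (not the authors' code): DQN agents and Belady+-based rewards.
import random
import torch
import torch.nn as nn

N_FEATURES, N_HIDDEN, N_WAYS = 323, 256, 16   # sizes taken from the text above

def make_agent(n_outputs: int) -> nn.Module:
    # 323-input MLP with one 256-neuron tanh hidden layer
    # (Agentvictim: 16 outputs, Agentbypass: 2 outputs).
    return nn.Sequential(nn.Linear(N_FEATURES, N_HIDDEN), nn.Tanh(),
                         nn.Linear(N_HIDDEN, n_outputs))

def epsilon_greedy(q_values: torch.Tensor, step: int) -> int:
    # Epsilon starts at 0.5 and decays by 0.001 every 1024 steps, down to 0.01.
    epsilon = max(0.01, 0.5 - 0.001 * (step // 1024))
    if random.random() < epsilon:
        return random.randrange(q_values.numel())
    return int(torch.argmax(q_values).item())

def reward_victim(action_victim: int, rd_access: int, rd_ways: list) -> int:
    # Algorithm 1: reward agreement with the Belady+ victim
    # (the way with the largest reuse distance).
    belady_victim = max(range(len(rd_ways)), key=lambda w: rd_ways[w])
    if action_victim == belady_victim:
        return 1                     # agreed with Belady+
    if rd_ways[action_victim] < rd_access:
        return -1                    # evicted a line reused sooner than the incoming access
    return 0                         # neutral otherwise

def reward_bypass(action_is_bypass: bool, rd_access: int, rd_ways: list) -> int:
    # Algorithm 2: reward bypassing exactly when Belady+ would bypass.
    if rd_access > max(rd_ways):     # incoming line is reused later than every resident line
        return 3 if action_is_bypass else -2
    return -1 if action_is_bypass else 0

On every simulated LLC miss, the victim agent's Q-values would be fed through epsilon_greedy to pick a way, and the corresponding reward would be computed from the recorded reuse distances.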
FIGURE 5. The learning process of 602.gcc over 100 epochs (Agentvictim)
FIGURE 7. Weight heatmap for victim selection
5) Training
In alignment with numerous RL implementations, experience replay [19] plays a pivotal role in training RL networks. This method uses a batch of transactions randomly sampled from the replay buffer. Through a series of experiments, we found it essential to assign a distinct replay buffer size to each benchmark. Empirical analysis revealed that a replay buffer size equivalent to 1/10 of the total LLC access count yielded the best results during training.
For each benchmark, 100 training epochs were applied, enabling thorough training and evaluation of the RL model. Fig. 5 and 6 illustrate the evolution of feature weights over epochs. As the learning progresses, a distinction in the weight significance of each feature emerges, highlighting the adaptation of the model to the importance of each feature for cache replacement.
The feature weights were computed based on the trained RL network. Fig. 7 presents a weight heatmap for selecting the victim way using the Belady+ algorithm, and Fig. 8 shows a weight heatmap for bypass learning based on the same algorithm. Given that the learning directions required for victim selection and bypass differ, we conducted separate training for both cases and derived their respective weight heatmaps. Weights were extracted from each heatmap, and the geometric mean of the weights was computed and subsequently normalized individually for each benchmark. This process ensures that the relative importance of each feature is accurately represented across various benchmarks, accommodating the diverse nature of the workloads. The experiments encompassed all benchmark types from SPEC CPU 2017 to guarantee fair feature selection across a broad spectrum of workloads.

B. RL WEIGHT HEATMAP ANALYSIS
Fig. 7 and 8 illustrate the learning from the Belady+ replacement policy, indicating the significance of each feature.
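As a rough sketch of the training procedure described above (experience replay over recorded LLC-miss transitions, a per-benchmark replay buffer sized to roughly 1/10 of the LLC accesses, and 100 epochs), the offline loop might look as follows. This continues the illustrative Python/PyTorch sketch from the previous subsection; the optimizer, batch size, and discount factor are hypothetical placeholders rather than the authors' configuration.

# Illustrative offline training loop (assumptions, not the authors' code).
import random
from collections import deque
import torch
import torch.nn as nn

def train_agent(agent, transitions, llc_access_count,
                epochs=100, batch_size=32, gamma=0.9, lr=1e-3):
    # transitions: recorded (state, action, reward, next_state) tuples, one per LLC miss.
    # The replay buffer holds ~1/10 of the total LLC accesses, as reported in the text;
    # batch_size, gamma, and lr are hypothetical hyperparameters.
    buffer = deque(maxlen=max(1, llc_access_count // 10))
    optimizer = torch.optim.Adam(agent.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for t in transitions:
            buffer.append(t)
            if len(buffer) < batch_size:
                continue
            batch = random.sample(buffer, batch_size)          # experience replay
            states = torch.stack([b[0] for b in batch])
            actions = torch.tensor([b[1] for b in batch])
            rewards = torch.tensor([b[2] for b in batch], dtype=torch.float32)
            next_states = torch.stack([b[3] for b in batch])
            q = agent(states).gather(1, actions.unsqueeze(1)).squeeze(1)
            with torch.no_grad():                              # simple one-step Q target
                target = rewards + gamma * agent(next_states).max(dim=1).values
            loss = loss_fn(q, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

def feature_importance(agent):
    # Approximate per-feature weight significance from the first linear layer; an
    # illustrative stand-in for the heatmap-based weight extraction described above.
    return agent[0].weight.abs().mean(dim=0)   # one value per input feature

Agentvictim and Agentbypass would each be trained with their own transitions and reward function, matching the separate victim and bypass training described above.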
FIGURE 9. LLC hit rate comparison (16-way, 2 MB LLC, w/o prefetcher)
FIGURE 10. IPC comparisons across benchmarks using a cache access type-aware eviction scheme

TABLE 3. IPC comparison of cache replacement policies.

Benchmarks \ IPC | LRU   | Hawkeye | Mockingjay
605.mcf          | 0.291 | 0.347   | 0.387
619.lbm          | 0.498 | 0.458   | 0.467
620.omnetpp      | 0.363 | 0.370   | 0.373
621.wrf          | 0.935 | 0.960   | 0.958
625.x264         | 1.837 | 1.865   | 1.867
628.pop2         | 1.441 | 1.450   | 1.458
654.roms         | 0.551 | 0.584   | 0.593

Since bypass occurs less frequently than victim selection, features do not stand out as distinctly in the bypass heatmap as in the victim-selection heatmap. Examining the heatmaps reveals several features as pivotal: set-access-count-since-last-miss (SALM), tag-bit-subset-array (TBSA), line-recency, and access-type. The following discussion delves into the importance of these features.
A higher SALM count implies that the set is frequently hit, indicating more temporal locality for the LLC in the corresponding set. This information could influence the aggressiveness of the replacement policy, with a more assertive eviction strategy in sets with a low SALM.
TBSA captures the spatial locality of the LLC and provides a compact yet informative representation of the cache lines. The details of the TBSA are described in Section 4.
Line-recency represents the principle of LRU, particularly the cache access sequence. However, from Belady's perspective, the MRU line is likely to be evicted because the MRU line typically has the longest reuse distance in the access sequences. The sequence of cache line accesses itself underscores the importance of victim selection.
Access-type plays a pivotal role in the efficacy of cache replacement policies, given that different access types entail varying latencies and miss penalties. A comparison of the hit rate and IPC results of LRU, Hawkeye, and Mockingjay based on the cache access type is presented in Fig. 9 and Table 3, respectively. In the benchmark exhibiting the largest difference between the load hit rate and the writeback hit rate, 654.roms, writeback hits account for only 26.49∼27.46% in the latter two policies, indicating a substantial deviation from the LRU. The overall LLC hit rate is higher in LRU than in the other cases, but IPC improves under the other two policies except for 619.lbm.
Fig. 10 shows IPC performance with a simple cache access type-aware eviction scheme added to the basic LRU policy. The data indicate that a writeback-aware eviction technique yields the most substantial improvement, with IPC enhancements reaching up to 1.88%. This result suggests that techniques considering the cache access type can improve replacement policy performance, especially for writeback accesses.
Fig. 9 and 10 indicate that a writeback miss has a minor impact on IPC because it occurs when data are being written back to memory from the cache rather than when data are being fetched for processing. Stormbird incorporates this observation during the hardware implementation stage, enabling the development of a more IPC-effective cache replacement policy that assigns different priority values depending on the cache access type.
Our RL framework, which emphasizes the above four features, aligns well with the traditional principles of cache management, underscoring the efficacy of our RL-based approach for cache replacement. Each feature exploits the cache's current state and potential future behavior, enabling the model to make informed decisions regarding which cache lines to evict.

IV. PROPOSED CACHE REPLACEMENT POLICY: STORMBIRD
Building on these simulation observations, we introduce a cache replacement policy called Stormbird. It is designed to effectively exploit temporal and spatial localities, discern between different access types, and manage cache contention adaptively. Stormbird incorporates a victim selection and a bypass policy, reflecting our observations acquired through the RL simulation. Resource efficiency has been a primary consideration in the development of Stormbird, designing at
FIGURE 18. IPC over LRU for different LLC replacement policies (SPEC CPU 2017)
FIGURE 19. IPC over LRU for different LLC replacement policies (SPEC CPU 2006)
which represents the current access information.
The writeback prior evict policy (TAB) operates analogously to its predecessor but introduces flexibility in access type priority through Ptype. Specifically, Ptype is set to 2 if the current access type is writeback; otherwise, it assumes a value of 1. This policy focuses on prioritizing the eviction of writebacks.
The prefetch prior bypass policy (TAB) parallels the functioning of the tag age-based policy. However, before identifying the victim way, the current access is bypassed if its access type is identified as a prefetch. Consequently, prefetch data are bypassed to the L2 rather than to the LLC. This policy mitigates the potential cache pollution caused by non-reused prefetch data.
After completing the warm-up phase, Stormbird elects the policy with the highest IPC performance among the three policies. Stormbird aspires to enhance performance across a diverse range of benchmarks by employing fine-grained set dueling techniques.
Table 4 illustrates the hardware overhead of each policy for a 16-way 2 MB LLC. Stormbird outperforms the competitors in this aspect, requiring the least hardware overhead of 10.5 KB. Further details on Stormbird's hardware budget are listed in Table 5. Conversely, Glider, which is built on an LSTM model, incurs a higher hardware overhead. Similarly, SHiP, Hawkeye, and Mockingjay, which are predictor-based policies, also demonstrate relatively larger hardware overheads.

V. EVALUATION
A. METHODOLOGY
We evaluate Stormbird using the ChampSim [24] simulator initially introduced during the CRC2 [7]. Our evaluation
TABLE 8. Maximum Simpoints of benchmarks
TABLE 9. Benchmark mixes from the SPEC CPU 2017 multicore experiment
TABLE 10. Benchmark mixes from the SPEC CPU 2006 multicore experiment
behavior of the mcf benchmark has a large range of addresses and pattern complexities. This result suggests avenues for the future refinement of adaptive cache management techniques.
Fig. 20 shows the performance of the different LLC replacement policies on a 4-core setup using the Cloudsuite benchmark suite. On average, all six policies outperform the LRU policy. Stormbird stands at 100.28%, which is competitive with the other policies. In the streaming benchmark, Stormbird's IPC of 100.56% demonstrates its efficacy, showing a marked improvement over the other policies. Stormbird chooses the TAB policy in the streaming benchmark, implying that the streaming benchmark does not efficiently utilize the prefetched data.
Fig. 21 and 22 show the multicore evaluation of IPC over LRU on the SPEC CPU benchmark suites. Stormbird achieves 99.46% and 100.13%, respectively. Given that Stormbird prioritizes the traits of the current access within the LLC, the irregular access patterns arising from varied workloads hindered Stormbird's performance. PC-based policies are noticeable within the multicore extension. A closer examination of the benchmarks, especially those in which Mockingjay stood out, revealed that a limited set of PC values was accessed more intensively. This observation demonstrates the effective operation of the PC-based reuse distance predictor in the multicore extension.
Table 7 summarizes the overall IPC performance for the single-core and 4-core evaluations. Overall, although all policies showcase competitive performance, specific strengths and weaknesses emerge depending on the benchmark and core configuration. This analysis underscores the importance of a tailored approach for choosing cache replacement strategies contingent on the specific behavior of workloads.

VI. CONCLUSION
In this study, we present a replacement policy built upon RL, extracting innovative features for optimal replacement algorithm behavior. As the computer architecture landscape evolves with ever-growing cache sizes, the importance of reducing hardware overhead cannot be overstated. We focused on reducing the HW overhead of the replacement policy. One of the pivotal findings of our experiments was the distinction between the IPC and the hit rate. An increase in the hit rate does not invariably lead to a corresponding rise in IPC. This differentiation is accentuated when the access types vary. Therefore, it is necessary to strengthen the handling of access types in the replacement policy. Beyond traditional set dueling, we integrated an approach based on SALM to discern the competitiveness of LLC sets. Additionally, we ensure the application of policies tailored to specific benchmark characteristics. Our proposed replacement policy not only demonstrates comparable performance to existing policies but also achieves a significant reduction in HW overhead.

APPENDIX
Table 8 includes a detailed list of the specific Simpoints used in our benchmarks. Tables 9 and 10 show the components of the benchmark mixes used in the multicore experiments. This addition provides fellow researchers with the necessary information to replicate our experiments, thereby improving the objectivity and reproducibility of the experimental results.

REFERENCES
[1] S. Kumar and P. K. Singh, "An overview of modern cache memory and performance analysis of replacement policies," in 2016 IEEE International Conference on Engineering and Technology (ICETECH). IEEE, 2016, pp. 210–214.
[2] L. A. Belady, "A study of replacement algorithms for a virtual-storage computer," IBM Systems Journal, vol. 5, no. 2, pp. 78–101, 1966.
[3] A. Jain and C. Lin, "Back to the future: Leveraging Belady's algorithm for improved cache replacement," ACM SIGARCH Computer Architecture News, vol. 44, no. 3, pp. 78–89, 2016.
[4] I. Shah, A. Jain, and C. Lin, "Effective mimicry of Belady's min policy," in 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2022, pp. 558–572.
[5] A. Vakil-Ghahani, S. Mahdizadeh-Shahri, M.-R. Lotfi-Namin, M. Bakhshalipour, P. Lotfi-Kamran, and H. Sarbazi-Azad, "Cache replacement policy based on expected hit count," IEEE Computer Architecture Letters, vol. 17, no. 1, pp. 64–67, 2017.
[6] S. Sethumurugan, J. Yin, and J. Sartori, "Designing a cost-effective cache replacement policy using machine learning," in International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2021, pp. 291–303.
[7] "The 2nd cache replacement championship," https://crc2.ece.tamu.edu/, 2017.
[8] A. M. Krause, P. C. Santos, and P. O. Navaux, "Avoiding unnecessary caching with history-based preemptive bypassing," in 2022 IEEE 34th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). IEEE, 2022, pp. 71–80.
[9] Y. Zeng and X. Guo, "Long short term memory based hardware prefetcher: a case study," in Proceedings of the International Symposium on Memory Systems, 2017, pp. 305–311.
[10] D. A. Jiménez and C. Lin, "Dynamic branch prediction with perceptrons," in Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture. IEEE, 2001, pp. 197–206.
[11] Z. Shi, X. Huang, A. Jain, and C. Lin, "Applying deep learning to the cache replacement problem," in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019, pp. 413–425.
[12] C.-J. Wu, A. Jaleel, W. Hasenplaugh, M. Martonosi, S. C. Steely Jr, and J. Emer, "SHiP: Signature-based hit predictor for high performance caching," in Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, 2011, pp. 430–441.
[13] V. Young, C.-C. Chou, A. Jaleel, and M. Qureshi, "SHiP++: Enhancing signature-based hit predictor for improved cache performance," in The 2nd Cache Replacement Championship (CRC-2 Workshop in ISCA 2017), 2017.
[14] A. Jaleel, K. B. Theobald, S. C. Steely Jr, and J. Emer, "High performance cache replacement using re-reference interval prediction (RRIP)," ACM SIGARCH Computer Architecture News, vol. 38, no. 3, pp. 60–71, 2010.
[15] J. Díaz Maag, P. E. Ibáñez Marín, T. Monreal Arnal, V. Viñals Yúfera, and J. M. Llaberia Griñó, "ReD: A policy based on reuse detection for demanding block selection in last-level caches," in The Second Cache Replacement Championship: Workshop Schedule, 2017, pp. 1–4.
[16] D. A. Jiménez and E. Teran, "Multiperspective reuse prediction," in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, 2017, pp. 436–448.
[17] J. Kim, E. Teran, P. V. Gratz, D. A. Jiménez, S. H. Pugsley, and C. Wilkerson, "Kill the program counter: Reconstructing program behavior in the processor cache hierarchy," ACM SIGPLAN Notices, vol. 52, no. 4, pp. 737–749, 2017.
[18] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 2018.
[19] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[20] M. K. Qureshi, A. Jaleel, Y. N. Patt, S. C. Steely, and J. Emer, "Adaptive insertion policies for high performance caching," ACM SIGARCH Computer Architecture News, vol. 35, no. 2, pp. 381–391, 2007.
[21] A. Jaleel, W. Hasenplaugh, M. Qureshi, J. Sebot, S. Steely Jr, and J. Emer, "Adaptive insertion policies for managing shared caches," in Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, 2008, pp. 208–219.
[22] F. Hameed, L. Bauer, and J. Henkel, "Reducing inter-core cache contention with an adaptive bank mapping policy in DRAM cache," in 2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), 2013, pp. 1–8.
[23] P. Zhang, A. Srivastava, A. V. Nori, R. Kannan, and V. K. Prasanna, "Fine-grained address segmentation for attention-based variable-degree prefetching," in Proceedings of the 19th ACM International Conference on Computing Frontiers. Association for Computing Machinery, 2022, pp. 103–112.
[24] N. Gober, G. Chacon, L. Wang, P. V. Gratz, D. A. Jimenez, E. Teran, S. Pugsley, and J. Kim, "The championship simulator: Architectural simulation for education and competition," arXiv preprint arXiv:2210.14324, 2022.
[25] J. Bucek, K.-D. Lange, and J. v. Kistowski, "SPEC CPU2017: Next-generation compute benchmark," in Companion of the 2018 ACM/SPEC International Conference on Performance Engineering, 2018, pp. 41–42.
[26] J. L. Henning, "SPEC CPU2006 benchmark descriptions," ACM SIGARCH Computer Architecture News, vol. 34, no. 4, pp. 1–17, 2006.
[27] M. Ferdman, A. Adileh, O. Kocberber, S. Volos, M. Alisafaee, D. Jevdjic, C. Kaynak, A. D. Popescu, A. Ailamaki, and B. Falsafi, "Clearing the clouds: a study of emerging scale-out workloads on modern hardware," ACM SIGPLAN Notices, vol. 47, no. 4, pp. 37–48, 2012.
[28] E. Perelman, G. Hamerly, M. Van Biesbrouck, T. Sherwood, and B. Calder, "Using SimPoint for accurate and efficient simulation," ACM SIGMETRICS Performance Evaluation Review, vol. 31, no. 1, pp. 318–319, 2003.

TAE HEE HAN (Senior Member, IEEE) received the B.S., M.S., and Ph.D. degrees in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 1992, 1994, and 1999, respectively. From 1999 to 2006, he was with the Telecom R&D center of Samsung Electronics, where he developed 3G wireless, mobile TV, and mobile WiMax handset chipsets. Since March 2008, he has been with Sungkyunkwan University, Suwon, Korea, as a Professor. From 2011 to 2013, he served as a full-time advisor on System ICs for the Korean Government. His current research interests include SoC/Chiplet architectures for AI, advanced memory architecture, network-on-chip, and system-level design methodologies.