


ACM Transactions on Architecture and Code Optimization, Volume 20
Volume 20, Number 1, March 2023
- Thomas Luinaud, J. M. Pierre Langlois, Yvon Savaria: Symbolic Analysis for Data Plane Programs Specialization. 1:1-1:21
- Nilesh Rajendra Shah, Ashitabh Misra, Antoine Miné, Rakesh Venkat, Ramakrishna Upadrasta: BullsEye: Scalable and Accurate Approximation Framework for Cache Miss Calculation. 2:1-2:28
- Mitali Soni, Asmita Pal, Joshua San Miguel: As-Is Approximate Computing. 3:1-3:26
- Parth Shah, Ranjal Gautham Shenoy, Vaidyanathan Srinivasan, Pradip Bose, Alper Buyuktosunoglu: TokenSmart: Distributed, Scalable Power Management in the Many-core Era. 4:1-4:26
- Zhangyu Chen, Yu Hua, Luochangqi Ding, Bo Ding, Pengfei Zuo, Xue Liu: Lock-Free High-performance Hashing for Persistent Memory via PM-aware Holistic Optimization. 5:1-5:26
- Aristeidis Mastoras, Sotiris Anagnostidis, Albert-Jan Nicholas Yzelman: Design and Implementation for Nonblocking Execution in GraphBLAS: Tradeoffs and Performance. 6:1-6:23
- Yemao Xu, Dezun Dong, Dongsheng Wang, Shi Xu, Enda Yu, Weixia Xu, Xiangke Liao: SSD-SGD: Communication Sparsification for Distributed Deep Learning Training. 7:1-7:25
- Ataberk Olgun, Juan Gómez-Luna, Konstantinos Kanellopoulos, Behzad Salami, Hasan Hassan, Oguz Ergin, Onur Mutlu: PiDRAM: A Holistic End-to-end FPGA-based Framework for Processing-in-DRAM. 8:1-8:31
- Christos Sakalis, Stefanos Kaxiras, Magnus Själander: Delay-on-Squash: Stopping Microarchitectural Replay Attacks in Their Tracks. 9:1-9:24
- Yi Liang, Shaokang Zeng, Lei Wang: Quantifying Resource Contention of Co-located Workloads with the System-level Entropy. 10:1-10:25
- Suyeon Hur, Seongmin Na, Dongup Kwon, Joonsung Kim, Andrew Boutros, Eriko Nurvitadhi, Jangwoo Kim: A Fast and Flexible FPGA-based Accelerator for Natural Language Processing Neural Networks. 11:1-11:24
- Ashish Gondimalla, Jianqiao Liu, Mithuna Thottethodi, T. N. Vijaykumar: Occam: Optimal Data Reuse for Convolutional Neural Networks. 12:1-12:25
- Bo Peng, Yaozu Dong, Jianguo Yao, Fengguang Wu, Haibing Guan: FlexHM: A Practical System for Heterogeneous Memory with Flexible and Efficient Performance Optimizations. 13:1-13:26
- Qiang Zhang, Lei Xu, Baowen Xu: RegCPython: A Register-based Python Interpreter for Better Performance. 14:1-14:25
- Hai Jin, Zhuo He, Weizhong Qiang: SpecTerminator: Blocking Speculative Side Channels Based on Instruction Classes on RISC-V. 15:1-15:26
- Tuowen Zhao, Tobi Popoola, Mary W. Hall, Catherine Olschanowsky, Michelle Strout: Polyhedral Specification and Code Generation of Sparse Tensor Contraction with Co-iteration. 16:1-16:26
- Manuela Schuler, Richard Membarth, Philipp Slusallek: XEngine: Optimal Tensor Rematerialization for Neural Networks in Heterogeneous Environments. 17:1-17:25
- Ivan Korostelev, Joao P. L. de Carvalho, José E. Moreira, José Nelson Amaral: YaConv: Convolution with Low Cache Footprint. 18:1-18:18
- Furkan Eris, Marcia S. Louis, Kubra Eris, José Luis Abellán Miguel, Ajay Joshi: Puppeteer: A Random Forest Based Manager for Hardware Prefetchers Across the Memory Hierarchy. 19:1-19:25
Volume 20, Number 2, June 2023
- Nicolas Tollenaere, Guillaume Iooss, Stéphane Pouget, Hugo Brunie, Christophe Guillon, Albert Cohen, P. Sadayappan, Fabrice Rastello: Autotuning Convolutions Is Easier Than You Think. 20:1-20:24
- Victor Perez, Lukas Sommer, Victor Lomüller, Kumudha Narasimhan, Mehdi Goli: User-driven Online Kernel Fusion for SYCL. 21:1-21:25
- Vinicius Espindola, Luciano G. Zago, Hervé Yviquel, Guido Araujo: Source Matching and Rewriting for MLIR Using String-Based Automata. 22:1-22:26
- Wenjing Ma, Fangfang Liu, Daokun Chen, Qinglin Lu, Yi Hu, Hongsen Wang, Xinhui Yuan: An Optimized Framework for Matrix Factorization on the New Sunway Many-core Platform. 23:1-23:24
- Sarabjeet Singh, Neelam Surana, Kailash Prasad, Pranjali Jain, Joycee Mekie, Manu Awasthi: HyGain: High-performance, Energy-efficient Hybrid Gain Cell-based Cache Hierarchy. 24:1-24:20
- Chandra Sekhar Mummidi, Sandip Kundu: ACTION: Adaptive Cache Block Migration in Distributed Cache Architectures. 25:1-25:19
- Qiaoyi Liu, Jeff Setter, Dillon Huff, Maxwell Strange, Kathleen Feng, Mark Horowitz, Priyanka Raina, Fredrik Kjolstad: Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators. 26:1-26:26
- Ahmet Caner Yüzügüler, Canberk Sönmez, Mario Drumond, Yunho Oh, Babak Falsafi, Pascal Frossard: Scale-out Systolic Arrays. 27:1-27:25
- Francesco Minervini, Oscar Palomar, Osman S. Unsal, Enrico Reggiani, Josue V. Quiroga, Joan Marimon, Carlos Rojas, Roger Figueras, Abraham Ruiz, Alberto González, Jonnatan Mendoza, Iván Vargas, César Hernández, Joan Cabre, Lina Khoirunisya, Mustapha Bouhali, Julian Pavon, Francesc Moll, Mauro Olivieri, Mario Kovac, Mate Kovac, Leon Dragic, Mateo Valero, Adrián Cristal: Vitruvius+: An Area-Efficient RISC-V Decoupled Vector Coprocessor for High Performance Computing Applications. 28:1-28:25
- Hadjer Benmeziane, Hamza Ouarnoughi, Kaoutar El Maghraoui, Smaïl Niar: Multi-objective Hardware-aware Neural Architecture Search with Pareto Rank-preserving Surrogate Models. 29:1-29:21
- Dongwei Chen, Dong Tong, Chun Yang, Jiangfang Yi, Xu Cheng: FlexPointer: Fast Address Translation Based on Range TLB and Tagged Pointers. 30:1-30:24
- Jingwen Du, Fang Wang, Dan Feng, Changchen Gan, Yuchao Cao, Xiaomin Zou, Fan Li: Fast One-Sided RDMA-Based State Machine Replication for Disaggregated Memory. 31:1-31:25
Volume 20, Number 3, September 2023
- Abdul Rasheed Sahni, Hamza Omar, Usman Ali, Omer Khan: ASM: An Adaptive Secure Multicore for Co-located Mutually Distrusting Processes. 32:1-32:24
- Sooraj Puthoor, Mikko H. Lipasti: Turn-based Spatiotemporal Coherence for GPUs. 33:1-33:27
- Ruobing Chen, Haosen Shi, Jinping Wu, Yusen Li, Xiaoguang Liu, Gang Wang: Jointly Optimizing Job Assignment and Resource Partitioning for Improving System Throughput in Cloud Datacenters. 34:1-34:24
- Gokul Subramanian Ravi, Tushar Krishna, Mikko H. Lipasti: TNT: A Modular Approach to Traversing Physically Heterogeneous NOCs at Bare-wire Latency. 35:1-35:25
- Weizhi Xu, Yintai Sun, Shengyu Fan, Hui Yu, Xin Fu: Accelerating Convolutional Neural Network by Exploiting Sparsity on GPUs. 36:1-36:26
- Jin Zhao, Yu Zhang, Ligang He, Qikun Li, Xiang Zhang, Xinyu Jiang, Hui Yu, Xiaofei Liao, Hai Jin, Lin Gu, Haikun Liu, Bingsheng He, Ji Zhang, Xianzheng Song, Lin Wang, Jun Zhou: GraphTune: An Efficient Dependency-Aware Substrate to Alleviate Irregularity in Concurrent Graph Processing. 37:1-37:24
- Yufeng Zhou, Alan L. Cox, Sandhya Dwarkadas, Xiaowan Dong: The Impact of Page Size and Microarchitecture on Instruction Address Translation Overhead. 38:1-38:25
- Benjamin Reber, Matthew Gould, Alexander H. Kneipp, Fangzhou Liu, Ian Prechtl, Chen Ding, Linlin Chen, Dorin Patru: Cache Programming for Scientific Loops Using Leases. 39:1-39:25
- Xinfeng Xie, Peng Gu, Yufei Ding, Dimin Niu, Hongzhong Zheng, Yuan Xie: MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing. 40:1-40:26
- Alexander Krolik, Clark Verbrugge, Laurie J. Hendren: rNdN: Fast Query Compilation for NVIDIA GPUs. 41:1-41:25
- Jiazhi Jiang, Zijiang Huang, Dan Huang, Jiangsu Du, Lin Chen, Ziguang Chen, Yutong Lu: Hierarchical Model Parallelism for Optimizing Inference on Many-core Processor via Decoupled 3D-CNN Structure. 42:1-42:21
- Yuwen Zhao, Fangfang Liu, Wenjing Ma, Huiyuan Li, Yuanchi Peng, Cui Wang: MFFT: A GPU Accelerated Highly Efficient Mixed-Precision Large-Scale FFT Framework. 43:1-43:23
- Muhammad Waqar Azhar, Madhavan Manivannan, Per Stenström: Approx-RM: Reducing Energy on Heterogeneous Multicore Processors under Accuracy and Timing Constraints. 44:1-44:25
- Dong Huang, Dan Feng, Qiankun Liu, Bo Ding, Wei Zhao, Xueliang Wei, Wei Tong: SplitZNS: Towards an Efficient LSM-Tree on Zoned Namespace SSDs. 45:1-45:26
Volume 20, Number 4, December 2023
- Jiangsu Du, Jiazhi Jiang, Jiang Zheng, Hongbin Zhang, Dan Huang, Yutong Lu: Improving Computation and Memory Efficiency for Real-world Transformer Inference on GPUs. 46:1-46:22
- Hai Jin, Bo Lei, Haikun Liu, Xiaofei Liao, Zhuohui Duan, Chencheng Ye, Yu Zhang: A Compilation Tool for Computation Offloading in ReRAM-based CIM Architectures. 47:1-47:25
- Christian Menard, Marten Lohstroh, Soroush Bateni, Matthew Chorlian, Arthur Deng, Peter Donovan, Clément Fournier, Shaokai Lin, Felix Suchert, Tassilo Tanneberger, Hokeun Kim, Jerónimo Castrillón, Edward A. Lee: High-performance Deterministic Concurrency Using Lingua Franca. 48:1-48:29
- Donglei Wu, Weihao Yang, Xiangyu Zou, Wen Xia, Shiyi Li, Zhenbo Hu, Weizhe Zhang, Binxing Fang: Smart-DNN+: A Memory-efficient Neural Networks Compression Framework for the Model Inference. 49:1-49:24
- Syed Salauddin Mohammad Tariq, Lance Menard, Pengfei Su, Probir Roy: MicroProf: Code-level Attribution of Unnecessary Data Transfer in Microservice Applications. 50:1-50:26
- Shiyi Li, Qiang Cao, Shenggang Wan, Wen Xia, Changsheng Xie: gPPM: A Generalized Matrix Operation and Parallel Algorithm to Accelerate the Encoding/Decoding Process of Erasure Codes. 51:1-51:25
- Petros Anastasiadis, Nikela Papadopoulou, Georgios I. Goumas, Nectarios Koziris, Dennis Hoppe, Li Zhong: PARALiA: A Performance Aware Runtime for Auto-tuning Linear Algebra on Heterogeneous Systems. 52:1-52:25
- Hui Yu, Yu Zhang, Jin Zhao, Yujian Liao, Zhiying Huang, Donghao He, Lin Gu, Hai Jin, Xiaofei Liao, Haikun Liu, Bingsheng He, Jianhui Yue: RACE: An Efficient Redundancy-aware Accelerator for Dynamic Graph Neural Network. 53:1-53:26
- Victor Ferrari, Rafael Cardoso Fernandes Sousa, Márcio Machado Pereira, Joao P. L. de Carvalho, José Nelson Amaral, José E. Moreira, Guido Araujo: Advancing Direct Convolution Using Convolution Slicing Optimization and ISA Extensions. 54:1-54:26
- Bowen He, Xiao Zheng, Yuan Chen, Weinan Li, Yajin Zhou, Xin Long, Pengcheng Zhang, Xiaowei Lu, Linquan Jiang, Qiang Liu, Dennis Cai, Xiantao Zhang: DxPU: Large-scale Disaggregated GPU Pools in the Datacenter. 55:1-55:23
- Shiqing Zhang, Mahmood Naderan-Tahan, Magnus Jahre, Lieven Eeckhout: Characterizing Multi-Chip GPU Data Sharing. 56:1-56:24
- Jens Domke, Emil Vatai, Balazs Gerofi, Yuetsu Kodama, Mohamed Wahib, Artur Podobas, Sparsh Mittal, Miquel Pericàs, Lingqi Zhang, Peng Chen, Aleksandr Drozd, Satoshi Matsuoka: At the Locus of Performance: Quantifying the Effects of Copious 3D-Stacked Cache on HPC Workloads. 57:1-57:26
- Satya Jaswanth Badri, Mukesh Saini, Neeraj Goel: Mapi-Pro: An Energy Efficient Memory Mapping Technique for Intermittent Computing. 58:1-58:25
- Miao Yu, Tingting Xiang, Venkata Pavan Kumar Miriyala, Trevor E. Carlson: Multiply-and-Fire: An Event-Driven Sparse Neural Network Accelerator. 59:1-59:26
- Ziaul Choudhury, Anish Gulati, Suresh Purini: FlowPix: Accelerating Image Processing Pipelines on an FPGA Overlay using a Domain Specific Compiler. 60:1-60:25
- Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael Fontella Katopodis, Leandro Santiago de Araújo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. França, Maurício Breternitz, Lizy K. John: ULEEN: A Novel Architecture for Ultra-low-energy Edge Neural Networks. 61:1-61:24
- Jia Wei, Xingjun Zhang, Longxiang Wang, Zheng Wei: Fastensor: Optimise the Tensor I/O Path from SSD to GPU for Deep Learning Training. 62:1-62:25
