


ICPP 2021: Virtual Event / Lemont (near Chicago), IL, USA
- Xian-He Sun, Sameer Shende, Laxmikant V. Kalé, Yong Chen:
ICPP 2021: 50th International Conference on Parallel Processing, Lemont, IL, USA, August 9 - 12, 2021. ACM 2021, ISBN 978-1-4503-9068-2
Best Paper Candidates
- Hanfeng Liu, Zeyi Wen, Wei Cai:
FastPSO: Towards Efficient Swarm Intelligence Algorithm on GPUs. 1:1-1:10
- Tanmoy Sen, Haiying Shen:
Context-aware Data Operation Strategies in Edge Systems for High Application Performance. 2:1-2:10
- Yang Yang, Qiang Cao, Jie Yao, Yuanyuan Dong, Weikang Kong:
SPMFS: A Scalable Persistent Memory File System on Optane Persistent Memory. 3:1-3:10
- Lucas Leandro Nesi, Arnaud Legrand, Lucas Mello Schnorr:
Exploiting system level heterogeneity to improve the performance of a GeoStatistics multi-phase task-based application. 4:1-4:10
Memory Systems and NVM
- Shizhi Jiang, Yiwei Ci, Qiusong Yang, Mingshu Li:
Matryoshka: A Coalesced Delta Sequence Prefetcher. 5:1-5:11
- Jingwen Du, Fang Wang, Dan Feng, Weiguang Li, Fan Li:
Fast and Consistent Remote Direct Access to Non-volatile Memory. 6:1-6:11
- Mengya Lei, Fang Wang, Dan Feng, Fan Li, Xueliang Wei:
Crash-Consistency-Aware Encryption for Non-Volatile Memories. 7:1-7:10
- Bagus Hanindhito, Ruihao Li, Dimitrios Gourounas, Arash Fathi, Karan Govil, Dimitar Trenev, Andreas Gerstlauer, Lizy Kurian John:
Wave-PIM: Accelerating Wave Simulation Using Processing-in-Memory. 8:1-8:11
GPU Computing and Task-based Programming Models
- Yan-Hao Chen, Fei Hua, Yuwei Jin, Eddy Z. Zhang:
BGPQ: A Heap-Based Priority Queue Design for GPUs. 9:1-9:10
- Antoni Navarro Muñoz, Arthur Francisco Lorenzon, Eduard Ayguadé Parra, Vicenç Beltran Querol:
Combining Dynamic Concurrency Throttling with Voltage and Frequency Scaling on Task-based Programming Models. 10:1-10:11
- Seiya Kozakai, Noriyuki Fujimoto, Koichi Wada:
Efficient GPU-Implementation for Integer Sorting Based on Histogram and Prefix-Sums. 11:1-11:11
- Martin Koppehel, Tobias Groth, Sven Groppe, Thilo Pionteck:
CuART - a CUDA-based, scalable Radix-Tree lookup and update engine. 12:1-12:10
Resource Management and Infrastructure
- Jinyu Yu, Dan Feng, Wei Tong, Pengze Lv, Yufei Xiong:
CERES: Container-Based Elastic Resource Management System for Mixed Workloads. 13:1-13:10
- Jananie Jarachanthan, Li Chen, Fei Xu, Bo Li:
AMPS-Inf: Automatic Model Partitioning for Serverless Inference with Cost Efficiency. 14:1-14:12
- Hongyan Li, Hang Lu, Jiawen Huang, Wenxu Wang, Mingzhe Zhang, Wei Chen, Liang Chang, Xiaowei Li:
BitX: Empower Versatile Inference with Hardware Runtime Pruning. 15:1-15:12
- Longfang Zhou, Xiaorong Zhang, Wenxiang Yang, Yongguo Han, Fang Wang, Yadong Wu, Jie Yu:
PREP: Predicting Job Runtime with Job Running Path on Supercomputers. 16:1-16:10
Storage Systems and Parallel I/O
- Liangfeng Cheng, Yuchong Hu, Zhaokang Ke, Zhongjie Wu:
Coupling Right-Provisioned Cold Storage Data Centers with Deduplication. 17:1-17:11
- Yang Zhou, Fang Wang, Dan Feng:
ASLDP: An Active Semi-supervised Learning method for Disk Failure Prediction. 18:1-18:11
- Hai Zhou, Dan Feng, Yuchong Hu:
Multi-level Forwarding and Scheduling Repair Technique in Heterogeneous Network for Erasure-coded Clusters. 19:1-19:11
- Haiwei Deng, Ranhao Jia, Chentao Wu:
A Graph-Assisted Out-of-Place Update Scheme for Erasure Coded Storage Systems. 20:1-20:10
Scheduling Algorithms and Optimizations
- Yubin Duan, Jie Wu:
Joint Optimization of DNN Partition and Scheduling for Mobile Cloud Computing. 21:1-21:10
- Ali Eker, David Timmerman, Barry Williams, Kenneth Chiu, Dmitry Ponomarev:
GVT-Guided Demand-Driven Scheduling in Parallel Discrete Event Simulation. 22:1-22:10
- Lucas Perotin, Hongyang Sun, Padma Raghavan:
Multi-Resource List Scheduling of Moldable Parallel Jobs under Precedence Constraints. 23:1-23:10
- YuAng Chen, Yeh-Ching Chung:
HiPa: Hierarchical Partitioning for Fast PageRank on NUMA Multicore Systems. 24:1-24:10
GPU-Accelerated Applications
- Robin Kobus, André Müller, Daniel Jünger, Christian Hundt, Bertil Schmidt:
MetaCache-GPU: Ultra-Fast Metagenomic Classification. 25:1-25:11
- Zonghao Feng, Qiong Luo:
Accelerating Sequence-to-Graph Alignment on Heterogeneous Processors. 26:1-26:10
- Ricardo Nobre, Aleksandar Ilic, Sergio Santander-Jiménez, Leonel Sousa:
Fourth-Order Exhaustive Epistasis Detection for the xPU Era. 27:1-27:10
- Junsong Wang, Xiaofan Zhang, Yubo Li, Yonghua Lin:
Exploring HW/SW Co-Optimizations for Accelerating Large-scale Texture Identification on Distributed GPUs. 28:1-28:10
Performance Modeling and Evaluation
- Guillem López-Paradís, Adrià Armejach, Miquel Moretó:
gem5 + rtl: A Framework to Enable RTL Models Inside a Full-System Simulator. 29:1-29:11
- Abdullah Alperen, Md. Afibuzzaman, Fazlay Rabbi, M. Yusuf Özkaya, Ümit V. Çatalyürek, Hasan Metin Aktulga:
An Evaluation of Task-Parallel Frameworks for Sparse Solvers on Multicore and Manycore CPU Architectures. 30:1-30:11
- Alexandre Denis, Emmanuel Jeannot, Philippe Swartvagher:
Interferences between Communications and Computations in Distributed HPC Systems. 31:1-31:11
- Junyao Yang, Yuchen Wang, Zhenlin Wang:
Efficient Modeling of Random Sampling-Based LRU. 32:1-32:11
Parallelization and Code Generation
- Qiang Fu, H. Howie Huang:
Automatic Generation of High-Performance Inference Kernels for Graph Neural Networks on Multi-Core Systems. 33:1-33:11
- Mingzhen Li, Yi Liu, Hailong Yang, Yongmin Hu, Qingxiao Sun, Bangduo Chen, Xin You, Xiaoyan Liu, Zhongzhi Luan, Depei Qian:
Automatic Code Generation and Optimization of Large-scale Stencil Computation on Many-core Processors. 34:1-34:12
- Jan-Patrick Lehr, Christian H. Bischof, Florian Dewald, Heiko Mantel, Mohammad Norouzi, Felix Wolf:
Tool-Supported Mini-App Extraction to Facilitate Program Analysis and Parallelization. 35:1-35:10
- Hannah Cartier, James Dinan, D. Brian Larkins:
Optimizing Work Stealing Communication with Structured Atomic Operations. 36:1-36:10
Applications with Machine Learning
- Haoyu Wang, Haiying Shen, Jiechao Gao, Kevin Zheng, Xiaoying Li:
Multi-Agent Reinforcement Learning based Distributed Renewable Energy Matching for Datacenters. 37:1-37:10
- Sijiang Fan, Jiawei Fei, Xiaowei Guo, Canqun Yang, Alistair Revell:
CNN+LSTM Accelerated Turbulent Flow Simulation with Link-Wise Artificial Compressibility Method. 38:1-38:10
- Garvit Goel, Atharva Gondhalekar, Jingyuan Qi, Zhicheng Zhang, Guohua Cao, Wu Feng:
ComputeCOVID19+: Accelerating COVID-19 Diagnosis and Monitoring via High-Performance Deep Learning on CT Images. 39:1-39:11
- Aymen Al Saadi, Dario Alfè, Yadu N. Babuji, Agastya Bhati, Ben Blaiszik, Alexander Brace, Thomas S. Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Peter V. Coveney, Ian T. Foster, Tom Gibbs, Shantenu Jha, Kristopher Keipert, Dieter Kranzlmüller, Thorsten Kurth, Hyungro Lee, Zhuozhao Li, Heng Ma, Gerald Mathias, André Merzky, Alexander Partin, Arvind Ramanathan, Ashka Shah, Abraham C. Stern, Rick Stevens, Li Tan, Mikhail Titov, Anda Trifan, Aristeidis Tsaris, Matteo Turilli, Huub J. J. Van Dam, Shunzhou Wan, David Wifling, Junqi Yin:
IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads. 40:1-40:12
Graph Computing
- Ruiqi Tang, Ziyi Zhao, Kailun Wang, Xiaoli Gong, Jin Zhang, Wenwen Wang, Pen-Chung Yew:
Ascetic: Enhancing Cross-Iterations Data Efficiency in Out-of-Memory Graph Processing on GPUs. 41:1-41:10
- Mohsen Koohi Esfahani, Peter Kilpatrick, Hans Vandierendonck:
Exploiting in-Hub Temporal Locality in SpMV-based Graph Processing. 42:1-42:10
- Huashan Yu, Xiaolin Wang, Yingwei Luo:
An Edge-Fencing Strategy for Optimizing SSSP Computations on Large-Scale Graphs. 43:1-43:11
- Lin Zhu, Qiang-Sheng Hua, Hai Jin:
Communication Avoiding All-Pairs Shortest Paths Algorithm for Sparse Graphs. 44:1-44:10
Storage Software and Optimizations
- Qiliang Li, Min Lyu, Liangliang Xu, Yinlong Xu, Wei Wang:
Fast Reconstruction for Large Disk Enclosures Based on RAID2.0. 45:1-45:10
- Jun Li, Minjun Li, Zhigang Cai, François Trahay, Mohamed Wahib, Balazs Gerofi, Zhiming Liu, Min Huang, Jianwei Liao:
Intra-page Cache Update in SLC-mode with Partial Programming in High Density SSDs. 46:1-46:10
- Junhao Zhu, Kaixin Huang, Xiaomin Zou, Chenglong Huang, Nuo Xu, Liang Fang:
HDNH: a read-efficient and write-optimized hashing scheme for hybrid DRAM-NVM memory. 47:1-47:10
- Jing Hu, Jianxi Chen, Yifeng Zhu, Qing Yang, Zhouxuan Peng, Ya Yu:
Parallel Multi-split Extendible Hashing for Persistent Memory. 48:1-48:10
Algorithms and Applications
- Zitong Li, Qiming Fang, Grey Ballard:
Parallel Tucker Decomposition with Numerically Accurate SVD. 49:1-49:11
- Nikita Mishin, Daniil Berezun, Alexander Tiskin:
Efficient Parallel Algorithms for String Comparison. 50:1-50:10
- Zhuoran Ji, Cho-Li Wang:
Accelerating DBSCAN Algorithm with AI Chips for Large Datasets. 51:1-51:11
- Runtian Ren, Xueyan Tang:
Generalized Skyline Interval Coloring and Dynamic Geometric Bin Packing Problems. 52:1-52:10
Linear Algebra Algorithms
- Chenhao Xie, Jieyang Chen, Jesun Firoz, Jiajia Li, Shuaiwen Leon Song, Kevin J. Barker, Mark Raugas, Ang Li:
Fast and Scalable Sparse Triangular Solver for Multi-GPU Based HPC Architectures. 53:1-53:11
- Christoph Klein, Robert Strzodka:
Tridiagonal GPU Solver with Scaled Partial Pivoting at Maximum Bandwidth. 54:1-54:10
- Yuan Tang, Weiguo Gao:
Processor-Aware Cache-Oblivious Algorithms✱. 55:1-55:10
- Viviana Arrigoni, Filippo Maggioli, Annalisa Massini, Emanuele Rodolà:
Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose. 56:1-56:12
Data Analytics Systems and Runtime
- Yijie Shen, Jin Xiong, Dejun Jiang:
Using Vectorized Execution to Improve SQL Query Performance on Spark. 57:1-57:11
- Bowen Yu, Huanqi Cao, Tianyi Shan, Haojie Wang, Xiongchao Tang, Wenguang Chen:
Sparker: Efficient Reduction for More Scalable Machine Learning with Spark. 58:1-58:11
- Qianwen Ye, Wuji Liu, Chase Q. Wu:
NoStop: A Novel Configuration Optimization Scheme for Spark Streaming. 59:1-59:10
- Md. Muhib Khan, Weikuan Yu:
ROBOTune: High-Dimensional Configuration Tuning for Cluster-Based Data Analytics. 60:1-60:10
Applications and Performance
- Hanpei Wu, Tongliang Deng, Yanliang Zou, Shu Yin, Si Chen, Tao Xie:
ADA: An Application-Conscious Data Acquirer for Visual Molecular Dynamics. 61:1-61:9
- Kun Qiu, Harry Chang, Yang Hong, Wenjun Zhu, Xiang Wang, Baoqian Li:
Teddy: An Efficient SIMD-based Literal Matching Engine for Scalable Deep Packet Inspection. 62:1-62:11
- Jianda Wang, Yang Hu:
Enabling Efficient SIMD Acceleration for Virtual Radio Access Network. 63:1-63:10
- Marquita Ellis, Aydin Buluç, Katherine A. Yelick:
Scaling Generalized N-Body Problems, A Case Study from Genomics. 64:1-64:9
Networking and Routing
- Yang Shi, Mei Wen:
sRouting: Towards a Better Flow Size Estimation Performance through Routing and Sketch Configuration. 65:1-65:11
- Yiran Zhang, Kun Qian, Fengyuan Ren:
Receiver-Driven Congestion Control for InfiniBand. 66:1-66:10
- En Wang, Dongming Luan, Yongjian Yang, Zihe Wang, Pengmin Dong, Dawei Li, Wenbin Liu, Jie Wu:
Distributed Game-Theoretical Route Navigation for Vehicular Crowdsensing. 67:1-67:11
- Sen Liu, Xiang Lin, Zehua Guo, Yi Wang, Mohamed Adel Serhani, Yang Xu:
Optimizing Flow Completion Time via Adaptive Buffer Management in Data Center Networks. 68:1-68:10
Machine Learning and Acceleration
- Zhenwei Zhang, Qiang Qi, Ruitao Shang, Li Chen, Fei Xu:
Prophet: Speeding up Distributed DNN Training with Predictable Communication Scheduling. 69:1-69:11
- Dongsheng Li, Dan Huang, Zhiguang Chen, Yutong Lu:
Optimizing Massively Parallel Winograd Convolution on ARM Processor. 70:1-70:12
- Xiangyu Ye, Zhiquan Lai, Shengwei Li, Lei Cai, Ding Sun, Linbo Qiao, Dongsheng Li:
Hippie: A Data-Paralleled Pipeline Approach to Improve Memory-Efficiency and Scalability for Large DNN Training. 71:1-71:10
- Hao Lan, Li Chen, Baochun Li:
Accelerated Device Placement Optimization with Contrastive Learning. 72:1-72:10
Data Structures and Applications
- Haosen Wen, Wentao Cai, Mingzhe Du, Louis Jenkins, Benjamin Valpey, Michael L. Scott:
A Fast, General System for Buffered Persistent Data Structures. 73:1-73:11
- Zhengming Yi, Yiping Yao, Kai Chen:
A Universal Construction to implement Concurrent Data Structure for NUMA-muticore. 74:1-74:11
- Hui Zeng, Tongqing Zhou, Yeting Guo, Zhiping Cai, Fang Liu:
FedCav: Contribution-aware Model Aggregation on Distributed Heterogeneous Data in Federated Learning. 75:1-75:10
- Yizhi Huang, Yanlong Yin, Yan Liu, Shuibing He, Yang Bai, Renfa Li:
A Novel Multi-CPU/GPU Collaborative Computing Framework for SGD-based Matrix Factorization. 76:1-76:12
Performance Optimization
- Xiang Fei, Youhui Zhang:
Regu2D: Accelerating Vectorization of SpMV on Intel Processors through 2D-partitioning and Regular Arrangement. 77:1-77:11
- Daichi Mukunoki, Katsuhisa Ozaki, Takeshi Ogita, Toshiyuki Imamura:
Accurate Matrix Multiplication on Binary128 Format Accelerated by Ozaki Scheme. 78:1-78:11
- Enda Yu, Dezun Dong, Yemao Xu, Shuo Ouyang, Xiangke Liao:
CD-SGD: Distributed Stochastic Gradient Descent with Compression and Delay Compensation. 79:1-79:10
- Shaoshuai Zhang, Panruo Wu:
Recursion Brings Speedup to Out-of-Core TensorCore-based Linear Algebra Algorithms: A Case Study of Classic Gram-Schmidt QR Factorization. 80:1-80:11
Machine Learning Algorithms
- Guangli Li, Zhen Jia, Xiaobing Feng, Yida Wang:
LoWino: Towards Efficient Low-Precision Winograd Convolutions on Modern CPUs. 81:1-81:11
- Liang Gao, Li Li, Yingwen Chen, Wenli Zheng, Chengzhong Xu, Ming Xu:
FIFL: A Fair Incentive Mechanism for Federated Learning. 82:1-82:10
- Shulai Zhang, Zirui Li, Quan Chen, Wenli Zheng, Jingwen Leng, Minyi Guo:
Dubhe: Towards Data Unbiasedness with Homomorphic Encryption in Federated Learning Client Selection. 83:1-83:10
- Junhong Liu, Dongxu Yang, Junjie Lai:
Optimizing Winograd-Based Convolution with Tensor Cores. 84:1-84:10
Virtualization and Stream Processing
- Yuewen Wu, Heng Wu, Yuanjia Xu, Yi Hu, Wenbo Zhang, Hua Zhong, Tao Huang:
Best VM Selection for Big Data Applications across Multiple Frameworks by Transfer Learning. 85:1-85:11
- Lulu Yao, Yongkun Li, Jiawei Li, Weijie Wu, Yinlong Xu:
Progressive Memory Adjustment with Performance Guarantee in Virtualized Systems. 86:1-86:11
- Stijn Schildermans, Kris Aerts, Jianchen Shan, Xiaoning Ding:
Paratick: Reducing Timer Overhead in Virtual Machines. 87:1-87:10
- Huiyao Mei, Hanhua Chen, Hai Jin, Qiang-Sheng Hua, Bing Bing Zhou:
Efficient Complete Event Trend Detection over High-Velocity Streams. 88:1-88:12
