
Improve the Performance of Parallel Reduction on General-Purpose Graphics Processor Units Using Prediction Models

Published: 29 August 2023

Abstract

When executing a kernel function on a general-purpose graphics processing unit (GPGPU), selecting an appropriate configuration setting is critical for optimal performance, because configuration settings determine how GPGPU resources are allocated and utilized during kernel execution [1]. However, testing every possible configuration setting to find an optimal one is time-consuming and costly. To address this challenge, we propose a prediction mechanism that suggests a configuration setting under which the kernel function completes its operation with minimal execution time. We first filter the candidate settings according to the amount of data and the mandatory and optional parameters, and then calculate the occupancy of three critical GPGPU resources: warps, registers, and shared memory. Configuration settings whose average resource occupancy falls below a user-defined threshold are eliminated. The remaining settings tend to perform better; we execute the kernel function under each of them and record the resulting execution times. Finally, we use these configuration settings and their execution times as training data to build a prediction model with the logistic regression (LR) algorithm. At runtime, once the amount of data to be processed is known, the model recommends a configuration setting with better performance. Our experiments confirm that the proposed mechanism improves kernel-function execution performance more effectively than other mechanisms, and it can be applied to other kernel functions as well.
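
As an illustration of the mechanism the abstract describes, the sketch below walks through the three stages in Python: occupancy-based filtering of candidate settings, benchmarking the survivors, and training a logistic-regression recommender. Everything hardware-specific is an assumption, not a value from the paper: the per-SM resource limits, the register and shared-memory figures, and the measure_kernel_time() hook (a toy cost model standing in for real kernel timings) are hypothetical placeholders.

from sklearn.linear_model import LogisticRegression
import numpy as np

MAX_WARPS_PER_SM = 48       # assumed per-SM limits; real values come from
MAX_REGS_PER_SM = 65536     # the device properties (cudaGetDeviceProperties)
MAX_SMEM_PER_SM = 102400    # bytes
WARP_SIZE = 32

def avg_occupancy(tpb, regs_per_thread, smem_per_block):
    # Average occupancy over the three critical resources (warp, register,
    # shared memory), each as the fraction of the per-SM limit consumed by
    # the blocks that can be resident at once.
    warps_per_block = -(-tpb // WARP_SIZE)               # ceiling division
    resident = max(1, min(
        MAX_WARPS_PER_SM // warps_per_block,
        MAX_REGS_PER_SM // (regs_per_thread * tpb),
        MAX_SMEM_PER_SM // smem_per_block))
    warp_occ = resident * warps_per_block / MAX_WARPS_PER_SM
    reg_occ = resident * regs_per_thread * tpb / MAX_REGS_PER_SM
    smem_occ = resident * smem_per_block / MAX_SMEM_PER_SM
    return (warp_occ + reg_occ + smem_occ) / 3.0

def measure_kernel_time(n, tpb):
    # Placeholder benchmark hook: a toy cost model so the sketch runs without
    # a GPU. A real implementation would launch the reduction kernel (e.g.,
    # via CuPy or PyCUDA) and return the measured execution time.
    blocks = -(-n // (tpb * 2))           # each thread reduces two elements
    return blocks * 1e-6 + 5e-3 / tpb     # fake launch cost + per-thread work

REGS_PER_THREAD = 24   # hypothetical; nvcc --ptxas-options=-v reports this
THRESHOLD = 0.5        # user-defined minimum average occupancy

# Stage 1: eliminate configuration settings whose average resource occupancy
# falls below the user-defined threshold (shared memory: one float per thread).
candidates = [tpb for tpb in (64, 128, 256, 512, 1024)
              if avg_occupancy(tpb, REGS_PER_THREAD, tpb * 4) >= THRESHOLD]

# Stage 2: execute the kernel under each surviving setting, record execution
# times, and label the fastest setting per data size as the positive class.
X, y = [], []
for n in (2**18, 2**20, 2**22, 2**24):
    times = {tpb: measure_kernel_time(n, tpb) for tpb in candidates}
    best = min(times, key=times.get)
    for tpb in candidates:
        X.append([np.log2(n), tpb])
        y.append(1 if tpb == best else 0)

# Stage 3: train the logistic regression (LR) model; at runtime it scores all
# candidates for the incoming data size and recommends the most likely winner.
model = LogisticRegression(max_iter=1000).fit(X, y)

def recommend(n):
    probs = model.predict_proba([[np.log2(n), tpb] for tpb in candidates])[:, 1]
    return candidates[int(np.argmax(probs))]

print(recommend(2**21))   # e.g., 512 threads per block under this toy model

In a real deployment, measure_kernel_time() would time the actual reduction kernel and the resource limits would be queried from the device; the structure of the pipeline is unchanged.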

References

[1] NVIDIA. CUDA Toolkit Documentation v11.3.0. https://docs.nvidia.com/cuda/index.html, 2021.
[2] Miroslav Kubat. An Introduction to Machine Learning. Springer, 2017, pp. 43--62.
[3] Thanasekhar Balaiah and Ranjani Parthasarathi. 2020. Autotuning of configuration for program execution in GPUs. Concurrency and Computation: Practice and Experience 32, 9 (2020), e5635.
[4] Yalın Baştanlar and Mustafa Özuysal. 2014. Introduction to machine learning. miRNomics: MicroRNA Biology and Computational Analysis (2014), 105--128.
[5] Ben van Werkhoven. 2019. Kernel Tuner: A search-optimizing GPU code auto-tuner. Future Generation Computer Systems 90 (2019), 347--358.


    Information

    Published In

    RACS '23: Proceedings of the 2023 International Conference on Research in Adaptive and Convergent Systems
    August 2023
    251 pages
    ISBN: 9798400702280
    DOI: 10.1145/3599957
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 August 2023


    Author Tags

    1. Configuration Setting
    2. Execution Time
    3. Kernel Function
    4. Logistic Regression
    5. Occupancy

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    RACS '23

    Acceptance Rates

    Overall acceptance rate: 393 of 1,581 submissions (25%)


    Bibliometrics & Citations

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 19
    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 18 Feb 2025
