
Improve the Performance of Parallel Reduction on General-Purpose Graphics Processor Units Using Prediction Models

Published: 29 August 2023

Abstract

When executing a kernel function on a general-purpose graphics processing unit (GPGPU), selecting an appropriate configuration setting is critical for optimal performance, because configuration settings determine how GPGPU resources are allocated and utilized during kernel execution [1]. However, testing every possible configuration setting to find an optimal one is time-consuming and costly. To address this challenge, we propose a prediction mechanism that suggests a configuration setting under which the kernel function completes its operation with minimal execution time. We first filter the candidate settings according to the amount of data and the mandatory and optional parameters, and then calculate the occupancy of three critical GPGPU resources: warps, registers, and shared memory. Configuration settings whose average resource occupancy falls below a user-defined threshold are eliminated. The remaining settings tend to perform better; we execute the kernel function under each of them and record the resulting execution times. Finally, we use these configuration settings and their execution times as training data to build a prediction model with the logistic regression (LR) algorithm. At runtime, once the amount of data to be processed is known, the model recommends a configuration setting with better performance. Our experiments confirm that the proposed mechanism improves kernel-function execution performance more effectively than other mechanisms, and it can be applied to other kernel functions as well.
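
As an illustration of the mechanism the abstract describes, the sketch below walks through the three stages in Python: occupancy-based filtering of candidate settings, benchmarking the survivors, and training a logistic-regression recommender. Everything hardware-specific is an assumption, not a value from the paper: the per-SM resource limits, the register and shared-memory figures, and the measure_kernel_time() hook (a toy cost model standing in for real kernel timings) are hypothetical placeholders.

from sklearn.linear_model import LogisticRegression
import numpy as np

MAX_WARPS_PER_SM = 48       # assumed per-SM limits; real values come from
MAX_REGS_PER_SM = 65536     # the device properties (cudaGetDeviceProperties)
MAX_SMEM_PER_SM = 102400    # bytes
WARP_SIZE = 32

def avg_occupancy(tpb, regs_per_thread, smem_per_block):
    # Average occupancy over the three critical resources (warp, register,
    # shared memory), each as the fraction of the per-SM limit consumed by
    # the blocks that can be resident at once.
    warps_per_block = -(-tpb // WARP_SIZE)               # ceiling division
    resident = max(1, min(
        MAX_WARPS_PER_SM // warps_per_block,
        MAX_REGS_PER_SM // (regs_per_thread * tpb),
        MAX_SMEM_PER_SM // smem_per_block))
    warp_occ = resident * warps_per_block / MAX_WARPS_PER_SM
    reg_occ = resident * regs_per_thread * tpb / MAX_REGS_PER_SM
    smem_occ = resident * smem_per_block / MAX_SMEM_PER_SM
    return (warp_occ + reg_occ + smem_occ) / 3.0

def measure_kernel_time(n, tpb):
    # Placeholder benchmark hook: a toy cost model so the sketch runs without
    # a GPU. A real implementation would launch the reduction kernel (e.g.,
    # via CuPy or PyCUDA) and return the measured execution time.
    blocks = -(-n // (tpb * 2))           # each thread reduces two elements
    return blocks * 1e-6 + 5e-3 / tpb     # fake launch cost + per-thread work

REGS_PER_THREAD = 24   # hypothetical; nvcc --ptxas-options=-v reports this
THRESHOLD = 0.5        # user-defined minimum average occupancy

# Stage 1: eliminate configuration settings whose average resource occupancy
# falls below the user-defined threshold (shared memory: one float per thread).
candidates = [tpb for tpb in (64, 128, 256, 512, 1024)
              if avg_occupancy(tpb, REGS_PER_THREAD, tpb * 4) >= THRESHOLD]

# Stage 2: execute the kernel under each surviving setting, record execution
# times, and label the fastest setting per data size as the positive class.
X, y = [], []
for n in (2**18, 2**20, 2**22, 2**24):
    times = {tpb: measure_kernel_time(n, tpb) for tpb in candidates}
    best = min(times, key=times.get)
    for tpb in candidates:
        X.append([np.log2(n), tpb])
        y.append(1 if tpb == best else 0)

# Stage 3: train the logistic regression (LR) model; at runtime it scores all
# candidates for the incoming data size and recommends the most likely winner.
model = LogisticRegression(max_iter=1000).fit(X, y)

def recommend(n):
    probs = model.predict_proba([[np.log2(n), tpb] for tpb in candidates])[:, 1]
    return candidates[int(np.argmax(probs))]

print(recommend(2**21))   # e.g., 512 threads per block under this toy model

In a real deployment, measure_kernel_time() would time the actual reduction kernel and the resource limits would be queried from the device; the structure of the pipeline is unchanged.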

References

[1] NVIDIA. CUDA Toolkit Documentation v11.3.0. https://docs.nvidia.com/cuda/index.html, 2021.
[2] Miroslav Kubat. An Introduction to Machine Learning. Springer, 2017, pp. 43--62.
[3] Thanasekhar Balaiah and Ranjani Parthasarathi. 2020. Autotuning of configuration for program execution in GPUs. Concurrency and Computation: Practice and Experience 32, 9 (2020), e5635.
[4] Yalın Baştanlar and Mustafa Özuysal. 2014. Introduction to machine learning. miRNomics: MicroRNA Biology and Computational Analysis (2014), 105--128.
[5] Ben van Werkhoven. 2019. Kernel Tuner: A search-optimizing GPU code auto-tuner. Future Generation Computer Systems 90 (2019), 347--358.


    Information

    Published In

    RACS '23: Proceedings of the 2023 International Conference on Research in Adaptive and Convergent Systems
    August 2023
    251 pages
    ISBN: 9798400702280
    DOI: 10.1145/3599957
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 August 2023


    Author Tags

    1. Configuration Setting
    2. Execution Time
    3. Kernel Function
    4. Logistic Regression
    5. Occupancy

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    RACS '23

    Acceptance Rates

    Overall acceptance rate: 393 of 1,581 submissions (25%)


    Bibliometrics & Citations

    Article Metrics

    • Total Citations: 0
    • Total Downloads: 19
    • Downloads (last 12 months): 7
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 18 Feb 2025
