Efficient cuDNN-Compatible Convolution-Pooling on the GPU

Suita, Shunsuke; Nishimura, Takahiro; Tokura, Hiroki; Nakano, Koji; Ito, Yasuaki; Kasagi, Akihiko; Tabaru, Tsuguchika

doi:10.1007/978-3-030-43222-5_5

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12044)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

The main contribution of this paper is an efficient GPU implementation of convolution-pooling, in which a pooling operation follows a multiple convolution. Since multiple convolution and pooling operations are performed alternately in the earlier stages of many Convolutional Neural Networks (CNNs), accelerating the convolution-pooling is very important. Our new GPU implementation uses two techniques: (1) convolution interchange with direct sum, and (2) conversion to matrix multiplication. These techniques reduce both the computational cost and the memory access cost. Furthermore, the convolution interchange is converted to matrix multiplication, which cuBLAS can compute very efficiently. Experimental results using a Tesla V100 GPU show that our new cuDNN-compatible GPU implementation of the convolution-pooling is at least 1.34 times faster than performing the multiple convolution followed by the pooling with cuDNN, the most popular library of primitives for implementing CNNs on the GPU.
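
The two techniques named in the abstract can be illustrated with a small CPU-side sketch. It is not the authors' GPU implementation, and it rests on assumptions the abstract does not state: the pooling is taken to be average pooling over non-overlapping p x p windows, there is a single input channel, and "direct sum" is read as summing every p x p window of the input once before convolving. Under those assumptions, linearity lets the pooling move in front of the convolution, so each filter is applied only at stride p, and the strided convolution can then be written as a product of a patch matrix and a filter matrix (the role cuBLAS plays in the paper). All function names below are hypothetical.

```python
# Sketch under stated assumptions: average pooling, non-overlapping p x p
# windows, one input channel. Not the paper's GPU code.

def conv2d(x, f, stride=1):
    """Valid 2D convolution (cross-correlation form) of x with a k x k filter f."""
    k = len(f)
    rows = range(0, len(x) - k + 1, stride)
    cols = range(0, len(x[0]) - k + 1, stride)
    return [[sum(f[u][v] * x[r + u][c + v] for u in range(k) for v in range(k))
             for c in cols] for r in rows]

def avg_pool(y, p):
    """Average pooling over non-overlapping p x p windows."""
    return [[sum(y[r + a][c + b] for a in range(p) for b in range(p)) / (p * p)
             for c in range(0, len(y[0]) - p + 1, p)]
            for r in range(0, len(y) - p + 1, p)]

def conv_pool_direct(x, filters, p):
    """Baseline: convolve with every filter, then pool every result."""
    return [avg_pool(conv2d(x, f), p) for f in filters]

def conv_pool_interchanged(x, filters, p):
    """Sum p x p windows once, then apply all filters at stride p as a matrix product."""
    # Window sums at stride 1; computed once and shared by all filters.
    s = conv2d(x, [[1] * p for _ in range(p)])
    k = len(filters[0])
    rows = range(0, len(s) - k + 1, p)
    cols = range(0, len(s[0]) - k + 1, p)
    # Patch matrix: one row of k*k window sums per output position.
    patches = [[s[r + u][c + v] for u in range(k) for v in range(k)]
               for r in rows for c in cols]
    # Filter matrix: one flattened filter per row.
    weights = [[w for row in f for w in row] for f in filters]
    out_w = len(cols)
    result = []
    for wv in weights:
        # One column of the (patches x weights^T) product, scaled to an average.
        flat = [sum(a * b for a, b in zip(patch, wv)) / (p * p) for patch in patches]
        result.append([flat[i:i + out_w] for i in range(0, len(flat), out_w)])
    return result

# Usage: both orders give the same pooled feature maps on integer test data.
x = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]
filters = [[[1, 0, -1], [2, 0, -2], [1, 0, -1]],
           [[0, 1, 0], [1, -4, 1], [0, 1, 0]]]
assert conv_pool_direct(x, filters, 2) == conv_pool_interchanged(x, filters, 2)
```

In this sketch the baseline evaluates each filter at every convolution output position, while the interchanged version evaluates it only at every p-th position in each dimension, at the price of one pass of window sums that is shared across filters. Whether the paper's kernels are organized this way is not stated in the abstract.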

Author information

Authors and Affiliations

Department of Information Engineering, Hiroshima University, 1-4-1 Kagamiyama, Higashi-Hiroshima, 739-8527, Japan
Shunsuke Suita, Takahiro Nishimura, Hiroki Tokura, Koji Nakano & Yasuaki Ito
Fujitsu Laboratories Ltd., 4-1-1 Kamikodanaka, Nakahara-ku, Kawasaki, Kanagawa, 211-8588, Japan
Akihiko Kasagi & Tsuguchika Tabaru

Corresponding author

Correspondence to Koji Nakano.

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Southern California, Marina del Rey, CA, USA
Ewa Deelman
University of Tennessee, Knoxville, TN, USA
Jack Dongarra
Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski

About this paper

Cite this paper

Suita, S. et al. (2020). Efficient cuDNN-Compatible Convolution-Pooling on the GPU. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol 12044. Springer, Cham. https://doi.org/10.1007/978-3-030-43222-5_5

DOI: https://doi.org/10.1007/978-3-030-43222-5_5
Published: 19 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43221-8
Online ISBN: 978-3-030-43222-5
eBook Packages: Computer Science, Computer Science (R0)
