Improving Performance of Floating Point Division on GPU and MIC

Huang, Kun; Chen, Yifeng

doi:10.1007/978-3-319-27122-4_48


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9529)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing


Abstract

Floating-point computing capability is an important concern in high-performance scientific and engineering computing. Although it is a fundamental operation, floating-point division (or reciprocal) has long been much less efficient than addition and multiplication. Architectures such as GPU and MIC do not even provide a division instruction at the instruction level. This paper proposes a fast approximation algorithm that estimates the quotient of floating-point numbers in IEEE 754 format using existing instructions; in most cases the result is accurate enough for practical computing. Like most iterative numerical algorithms, it consists of a predicting step and an iterating step. The predicting step exploits the properties of the IEEE 754 format to compute an estimate with only one integer subtraction instruction. The iterating step improves the accuracy through fast iterations in about ten instructions. The new algorithm is extremely easy to implement and performs well in practical experiments.
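The two steps described in the abstract can be illustrated with a minimal single-precision sketch in C. It is not the authors' implementation: the constant `RECIP_MAGIC`, the function names, and the choice of Newton-Raphson for the iterating step are assumptions made for illustration, and the input is assumed to be positive, normal, finite, and smaller than 2^126. The constant is simply twice the bit pattern of 1.0f, which makes the subtraction negate the exponent and roughly mirror the mantissa.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a float as an unsigned 32-bit integer. */
uint32_t float_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret an unsigned 32-bit integer as a float. */
float bits_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Assumed constant (not taken from the paper): 2 * bits(1.0f). */
#define RECIP_MAGIC 0x7F000000u

/* Predicting step: a single integer subtraction on the IEEE 754 bits.
 * Assumes x is positive, normal, finite, and smaller than 2^126.
 * With this constant the estimate is exact for powers of two and
 * at most 12.5% too large otherwise. */
float recip_estimate(float x) {
    return bits_float(RECIP_MAGIC - float_bits(x));
}

/* Iterating step (assumed here to be Newton-Raphson):
 * y <- y * (2 - x * y), which squares the relative error each pass. */
float approx_recip(float x, int iterations) {
    float y = recip_estimate(x);
    for (int i = 0; i < iterations; ++i) {
        y = y * (2.0f - x * y);
    }
    return y;
}

/* Division expressed as multiplication by the approximate reciprocal. */
float approx_div(float a, float b, int iterations) {
    return a * approx_recip(b, iterations);
}
```

Because each pass squares the relative error, three iterations from a 12.5% worst-case estimate already approach single-precision rounding level in this sketch. A more carefully tuned constant could start closer to the true reciprocal and may need fewer iterations.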



Acknowledgments

We thank the anonymous reviewers for their comments and suggestions on the submitted version of this paper. Special thanks go to the members of the Parallel Software Group of EECS, Peking University, for their suggestions.

This research is supported by the National HTRD 863 Plan under Grants No. 2012AA010902, 2012AA010903; and NSFC Grants No. 61170053, 61432018, 61379048.

Author information

Authors and Affiliations

Department of Computer Science, School of EECS, Peking University, Beijing, 100871, China
Kun Huang & Yifeng Chen

Authors

Kun Huang
Yifeng Chen

Corresponding author

Correspondence to Kun Huang.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Guojun Wang
The University of Sydney, Sydney, New South Wales, Australia
Albert Zomaya
University of Murcia, Murcia, Murcia, Spain
Gregorio Martinez
Hunan University, Changsha, China
Kenli Li

About this paper

Cite this paper

Huang, K., Chen, Y. (2015). Improving Performance of Floating Point Division on GPU and MIC. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9529. Springer, Cham. https://doi.org/10.1007/978-3-319-27122-4_48

DOI: https://doi.org/10.1007/978-3-319-27122-4_48
Published: 16 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27121-7
Online ISBN: 978-3-319-27122-4
eBook Packages: Computer Science, Computer Science (R0)
