Abstract
This study addresses the deployment of vision Transformer models on edge devices by proposing an FPGA-based hardware-software co-acceleration scheme that bridges the gap between the computational demands of deep neural network models and the limited hardware resources available at the edge. We adopt a structured hybrid pruning method: a channel pruning strategy first automatically estimates the impact of each layer's parameters on the model output and prunes channels accordingly, after which Top-k pruning is applied to the weight matrices retained from the first stage. The method is hardware-friendly and effectively reduces model complexity. To exploit the resulting sparsity, we design a dedicated sparse matrix multiplication optimization module. Experimental results show that the scheme achieves 11.52x and 1.56x higher throughput than traditional CPU and GPU platforms, respectively, while maintaining model accuracy. Future work will explore further trade-offs between model complexity and performance, adaptive pruning mechanisms, and deployment in lower-power hardware environments.
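The two-stage hybrid pruning described in the abstract (channel pruning followed by Top-k pruning of the retained weights) can be sketched roughly as below. This is an illustrative NumPy sketch, not the authors' implementation: the channel-importance criterion (per-channel L1 norm), the function names, and the row-wise Top-k granularity are all assumptions made for the example.

```python
import numpy as np

def channel_prune(weight, keep_ratio):
    """Stage 1: keep only the channels (rows) with the largest L1 norm,
    as a stand-in for the paper's automatic per-layer importance estimate."""
    norms = np.abs(weight).sum(axis=1)
    k = max(1, int(len(norms) * keep_ratio))
    keep = np.sort(np.argsort(norms)[-k:])  # indices of the k most important channels
    return weight[keep, :]

def topk_prune(weight, k):
    """Stage 2: zero all but the k largest-magnitude entries in each row,
    yielding a fixed per-row sparsity that is friendly to FPGA dataflow."""
    pruned = np.zeros_like(weight)
    idx = np.argsort(np.abs(weight), axis=1)[:, -k:]
    rows = np.arange(weight.shape[0])[:, None]
    pruned[rows, idx] = weight[rows, idx]
    return pruned
```

A fixed number of nonzeros per row (rather than a global magnitude threshold) is the kind of structured sparsity a hardware matrix-multiplication unit can exploit, since every row then has the same compute and storage footprint.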
Acknowledgments
This paper is supported by the National Key Research and Development Program of China (No. 2021ZD0113304), the National Natural Science Foundation of China (U23A20316), the General Program of the Natural Science Foundation of China (NSFC) (Grant Nos. 62072346 and 61972293), the Joint Laboratory on Credit Technology, the Hubei Higher Education Excellent Youth Science and Technology Innovation Team Project (No. T2022060), and the Wuhan City College Research Science Project (2023CYYYJJ01).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Liu, F., Li, H., Chen, Z., Hu, W., He, Y., Wang, F. (2024). ViT Hybrid Channel Fit Pruning Algorithm for Co-optimization of Hardware and Software for Edge Device. In: Cao, C., Chen, H., Zhao, L., Arshad, J., Asyhari, T., Wang, Y. (eds) Knowledge Science, Engineering and Management. KSEM 2024. Lecture Notes in Computer Science(), vol 14885. Springer, Singapore. https://doi.org/10.1007/978-981-97-5495-3_25
Print ISBN: 978-981-97-5494-6
Online ISBN: 978-981-97-5495-3