
ViT Hybrid Channel Fit Pruning Algorithm for Co-optimization of Hardware and Software for Edge Device

  • Conference paper
  • First Online:
Knowledge Science, Engineering and Management (KSEM 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14885)


Abstract

This study addresses the deployment of vision Transformer (ViT) models on edge computing devices by proposing an FPGA-based hardware-software co-acceleration scheme, aimed at closing the significant gap between the computational demands of deep neural network models and the hardware resources that edge devices can supply. We use a structured hybrid pruning method: a channel pruning strategy first automatically identifies the impact of each layer's parameters on the model's results and prunes channels accordingly, and a Top-k pruning is then applied to the weight matrices retained after the first stage. The method is hardware-friendly and effectively reduces model complexity. To accommodate sparse matrix computation, a dedicated matrix multiplication optimization module is designed in this study. Experimental results show that the scheme achieves 11.52- and 1.56-times improvements in throughput compared to traditional CPU and GPU platforms, respectively, while maintaining model accuracy. Future research will focus on further trade-offs between model complexity and performance, introducing adaptive mechanisms, and applications in lower-power hardware environments.
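The two-stage hybrid pruning described above can be sketched as follows. The paper's actual saliency criteria and pruning ratios are not given in this excerpt, so this sketch assumes simple L2-norm channel scoring followed by per-row magnitude-based Top-k selection; `channel_prune`, `topk_prune`, and the toy weight matrix are illustrative names, not the authors' implementation.

```python
import numpy as np

def channel_prune(weight, keep_ratio):
    """Stage 1: keep the output channels (rows) with the largest L2 norms."""
    norms = np.linalg.norm(weight, axis=1)
    n_keep = max(1, int(round(keep_ratio * weight.shape[0])))
    keep = np.sort(np.argsort(norms)[-n_keep:])  # indices of kept channels, in order
    return weight[keep], keep

def topk_prune(weight, k):
    """Stage 2: zero out all but the k largest-magnitude entries in each row."""
    pruned = np.zeros_like(weight)
    idx = np.argsort(np.abs(weight), axis=1)[:, -k:]
    rows = np.arange(weight.shape[0])[:, None]
    pruned[rows, idx] = weight[rows, idx]
    return pruned

# Toy layer: 4 output channels x 6 weights; two channels have negligible norms.
w = np.array([
    [0.90, -0.80, 0.70, 0.10, 0.00, 0.20],
    [0.01,  0.02, -0.01, 0.00, 0.03, 0.01],   # low-norm channel
    [0.50,  0.40, -0.60, 0.30, 0.20, 0.10],
    [0.02,  0.01, 0.00, -0.02, 0.01, 0.00],   # low-norm channel
])
kept, keep_idx = channel_prune(w, keep_ratio=0.5)  # drops the two weak channels
sparse = topk_prune(kept, k=3)                     # each surviving row keeps 3 weights
```

The resulting structured-plus-fine-grained sparsity is what makes a dedicated sparse matrix multiplication module worthwhile on FPGA: whole channels vanish (shrinking the matrix), and the surviving rows have a fixed nonzero budget that hardware can exploit.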



Acknowledgments

This paper is supported by the National Key Research and Development Program of China (No. 2021ZD0113304), the National Natural Science Foundation of China (U23A20316), the General Program of the Natural Science Foundation of China (NSFC) (Grant Nos. 62072346 and 61972293), the Joint Laboratory on Credit Technology, the Hubei Higher Education Excellent Youth Science and Technology Innovation Team Project (No. T2022060), and the Wuhan City College Research Science Project (2023CYYYJJ01).


Corresponding author

Correspondence to Fei Wang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Liu, F., Li, H., Chen, Z., Hu, W., He, Y., Wang, F. (2024). ViT Hybrid Channel Fit Pruning Algorithm for Co-optimization of Hardware and Software for Edge Device. In: Cao, C., Chen, H., Zhao, L., Arshad, J., Asyhari, T., Wang, Y. (eds) Knowledge Science, Engineering and Management. KSEM 2024. Lecture Notes in Computer Science, vol 14885. Springer, Singapore. https://doi.org/10.1007/978-981-97-5495-3_25

  • DOI: https://doi.org/10.1007/978-981-97-5495-3_25

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-5494-6

  • Online ISBN: 978-981-97-5495-3

  • eBook Packages: Computer Science (R0)
