Abstract
This study addresses the deployment of vision Transformer models on edge devices by proposing an FPGA-based hardware-software co-acceleration scheme that bridges the gap between the computational demands of deep neural network models and the limited hardware resources available at the edge. We adopt a structured hybrid pruning method: a channel pruning strategy first automatically estimates the impact of each layer's parameters on the model output and prunes channels accordingly, after which Top-k pruning is applied to the weight matrices retained from the first stage. The method is hardware-friendly and effectively reduces model complexity. To exploit the resulting sparsity, we design a dedicated sparse matrix multiplication optimization module. Experimental results show that the scheme achieves 11.52x and 1.56x higher throughput than traditional CPU and GPU platforms, respectively, while maintaining model accuracy. Future work will explore further trade-offs between model complexity and performance, adaptive pruning mechanisms, and deployment in lower-power hardware environments.
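The two-stage hybrid pruning described in the abstract (channel pruning followed by Top-k pruning of the retained weights) can be sketched roughly as below. This is an illustrative NumPy sketch, not the authors' implementation: the channel-importance criterion (per-channel L1 norm), the function names, and the row-wise Top-k granularity are all assumptions made for the example.

```python
import numpy as np

def channel_prune(weight, keep_ratio):
    """Stage 1: keep only the channels (rows) with the largest L1 norm,
    as a stand-in for the paper's automatic per-layer importance estimate."""
    norms = np.abs(weight).sum(axis=1)
    k = max(1, int(len(norms) * keep_ratio))
    keep = np.sort(np.argsort(norms)[-k:])  # indices of the k most important channels
    return weight[keep, :]

def topk_prune(weight, k):
    """Stage 2: zero all but the k largest-magnitude entries in each row,
    yielding a fixed per-row sparsity that is friendly to FPGA dataflow."""
    pruned = np.zeros_like(weight)
    idx = np.argsort(np.abs(weight), axis=1)[:, -k:]
    rows = np.arange(weight.shape[0])[:, None]
    pruned[rows, idx] = weight[rows, idx]
    return pruned
```

A fixed number of nonzeros per row (rather than a global magnitude threshold) is the kind of structured sparsity a hardware matrix-multiplication unit can exploit, since every row then has the same compute and storage footprint.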
Acknowledgments
This paper is supported by the National Key Research and Development Program of China (No. 2021ZD0113304), the National Natural Science Foundation of China (U23A20316), the General Program of the Natural Science Foundation of China (NSFC) (Grant Nos. 62072346 and 61972293), the Joint Laboratory on Credit Technology, the Hubei Higher Education Excellent Youth Science and Technology Innovation Team Project (No. T2022060), and the Wuhan City College Research Science Project (2023CYYYJJ01).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Liu, F., Li, H., Chen, Z., Hu, W., He, Y., Wang, F. (2024). ViT Hybrid Channel Fit Pruning Algorithm for Co-optimization of Hardware and Software for Edge Device. In: Cao, C., Chen, H., Zhao, L., Arshad, J., Asyhari, T., Wang, Y. (eds) Knowledge Science, Engineering and Management. KSEM 2024. Lecture Notes in Computer Science(), vol 14885. Springer, Singapore. https://doi.org/10.1007/978-981-97-5495-3_25
Print ISBN: 978-981-97-5494-6
Online ISBN: 978-981-97-5495-3