December 18, 2024

Low-Bit Mixed-Precision Quantization and Acceleration of CNN for FPGA Deployment

Abstract

The deployment of intelligent networks on hardware devices for real-time applications is gaining popularity in both academia and industry. However, on-chip resources and power budgets are usually limited, which makes quantization a crucial step because it reduces the computational footprint. To this end, mixed-precision bit-width allocation for weights is an effective way to reduce the overall memory footprint while maximizing model accuracy. It can generally be divided into two schemes: per-layer quantization and per-channel quantization. The latter has a large search space, which makes optimal solutions hard to obtain, so most current research focuses on the former. Additionally, there is almost no research targeting the design and optimization of FPGA accelerator structures for per-channel quantization. Motivated by these considerations, this paper first proposes a mixed-precision bit allocation method, called Hierarchical Bit Programming (HBP), which reduces the magnitude of the search space by applying group optimization along the channel dimension and consequently reduces the computational complexity of the solving process. A loop optimization strategy is then presented based on the quantization scheme, and models are established to evaluate FPGA performance and resource requirements, enabling accelerator performance bottlenecks and optimization boundaries to be evaluated and analyzed in the early phase of system design. Based on the optimization results, a hardware accelerator design structure is presented. Several mainstream CNN models are used for evaluation, and on-board tests are conducted on the Zynq MPSoC XCZU15EG FPGA platform. The experimental results show that the HBP method achieves an improvement of more than 2% in accuracy compared with other related methods. Compared with CPU and GPU, the proposed FPGA accelerator yields energy-efficiency gains of 28.8%, 46.2%, 31.0%, and 35.9% on VGG-16, ResNet18, ResNet34, and ResNet50, respectively, and the processing latency can be 25% lower than state-of-the-art methods.
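
The distinction the abstract draws between per-layer and per-channel quantization, and the effect of grouping channels on the size of the search space, can be illustrated with a minimal sketch. This is not the paper's HBP method: the symmetric uniform quantizer, the function names, and the toy weights below are all assumptions chosen for illustration only.

```python
# Illustrative sketch only: a generic symmetric uniform quantizer,
# NOT the HBP algorithm from the paper. All names here are hypothetical.

def quantize_dequantize(values, bits):
    """Quantize floats to signed `bits`-bit integers (bits >= 2), then map back."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def squared_error(original, approx):
    return sum((a - b) ** 2 for a, b in zip(original, approx))

def per_layer_error(channels, bits):
    """One bit-width and one scale shared by every channel in the layer."""
    flat = [w for ch in channels for w in ch]
    return squared_error(flat, quantize_dequantize(flat, bits))

def per_channel_error(channels, bits_per_channel):
    """Each channel gets its own scale and its own bit-width."""
    return sum(
        squared_error(ch, quantize_dequantize(ch, b))
        for ch, b in zip(channels, bits_per_channel)
    )

def weight_memory_bits(channels, bits_per_channel):
    """Total storage for the quantized weights, in bits."""
    return sum(len(ch) * b for ch, b in zip(channels, bits_per_channel))

def search_space_size(num_bit_choices, num_units):
    """Number of possible bit-width assignments when each unit picks one choice."""
    return num_bit_choices ** num_units

# Toy layer with two output channels of very different magnitude.
layer = [[0.01, -0.02, 0.015, 0.005], [1.0, -2.0, 1.5, 0.5]]

print(per_layer_error(layer, 4))          # shared scale crushes the small channel
print(per_channel_error(layer, [4, 4]))   # per-channel scales preserve it
print(weight_memory_bits(layer, [2, 4]))  # mixed precision: 4*2 + 4*4 bits

# With 3 candidate bit-widths, 64 independent channels versus 8 channel groups:
print(search_space_size(3, 64))
print(search_space_size(3, 8))            # 6561
```

In this toy setting, per-channel scales give a lower reconstruction error than a single per-layer scale, while assigning bit-widths per group of channels rather than per channel shrinks the number of candidate assignments exponentially. How HBP actually forms groups and solves the allocation is not described in the abstract.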

Authors

Jianrong Wang

Zhijun He

Hongbo Zhao

Journals

IEEE Transactions on Emerging Topics in Computational Intelligence

Institutions

Beihang University

China Academy of Space Technology
