May 16, 2026
Open Access

Hessian-guided mixed-precision quantization of vision transformers under hardware constraints

Abstract

Vision transformers (ViTs) have attracted increasing attention in visual tasks owing to their strong global modeling capability. Compared with conventional convolutional neural networks, however, ViTs typically have substantially more parameters and higher computational complexity, which makes deployment on resource-constrained hardware platforms challenging. Model quantization is an effective compression technique that can substantially reduce inference overhead, but existing methods often struggle to balance model accuracy against hardware deployment efficiency. To address this issue, this paper proposes a mixed-precision quantization framework for ViTs under hardware constraints. First, according to the distribution characteristics of the activation values at each layer, we apply either uniform or non-uniform quantization, which improves how well the quantized representation fits each layer's data. To guide the allocation of quantization bit widths, we then introduce a sensitivity metric based on the trace of the Hessian matrix, which evaluates each layer's sensitivity to quantization by measuring how quantization perturbations affect the loss. On this basis, mixed-precision bit-width allocation is formulated as an optimal binary mask search under hardware constraints and solved with integer linear programming, so that the resource limits of actual hardware deployment are accounted for explicitly. Extensive experiments on benchmark datasets such as ImageNet and COCO show that the proposed method achieves an effective balance among model accuracy, model size, and inference latency, yielding lower model size and computational cost than existing methods while maintaining competitive accuracy.
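The pipeline summarized in the abstract can be illustrated with a toy sketch. It is not the authors' implementation, and several pieces are assumptions: the symmetric max-abs uniform quantizer, the log2 quantizer standing in for the unspecified non-uniform scheme, the HAWQ-V2-style sensitivity 0.5 · (Tr(H) / n) · ‖Q(W) − W‖² (a second-order Taylor term with the gradient taken as zero and H approximated by its average diagonal), a hardware constraint modeled only as total weight bits, and all function names (`uniform_quantize`, `log2_quantize`, `hutchinson_trace`, `layer_sensitivity`, `allocate_bits`). Hessians are given as small explicit matrices rather than obtained through Hessian-vector products on a real model, and the binary-mask search is solved by exhaustive enumeration, which is exact for a handful of layers but stands in for an integer linear programming solver.

```python
import itertools
import math
import random


def uniform_quantize(x, bits):
    """Symmetric uniform quantizer with a max-abs scale (assumed choice)."""
    max_abs = max(abs(v) for v in x) or 1.0
    levels = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit, 127 for 8-bit
    scale = max_abs / levels
    return [max(-levels, min(levels, round(v / scale))) * scale for v in x]


def log2_quantize(x, bits):
    """Non-uniform log2 quantizer for values in [0, 1], such as post-softmax
    attention scores (an assumed example of a non-uniform scheme)."""
    max_exp = 2 ** bits - 1
    out = []
    for v in x:
        if v <= 0.0:
            out.append(0.0)
            continue
        exp = min(max_exp, max(0, round(-math.log2(v))))
        out.append(2.0 ** -exp)
    return out


def hutchinson_trace(hessian, num_samples=64, seed=0):
    """Estimate Tr(H) as the mean of v^T H v over Rademacher vectors v.
    H is an explicit matrix here; on a real model H v would come from a
    Hessian-vector product computed with automatic differentiation."""
    rng = random.Random(seed)
    n = len(hessian)
    total = 0.0
    for _ in range(num_samples):
        v = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        hv = [sum(hessian[i][j] * v[j] for j in range(n)) for i in range(n)]
        total += sum(vi * hvi for vi, hvi in zip(v, hv))
    return total / num_samples


def layer_sensitivity(weights, hessian, bits):
    """Approximate loss increase 0.5 * (Tr(H) / n) * ||Q(W) - W||^2."""
    avg_trace = hutchinson_trace(hessian) / len(weights)
    quantized = uniform_quantize(weights, bits)
    perturbation = sum((q - w) ** 2 for q, w in zip(quantized, weights))
    return 0.5 * avg_trace * perturbation


def allocate_bits(layers, candidate_bits, budget_bits):
    """Pick one bit width per layer (a one-hot binary mask per layer) that
    minimizes total sensitivity subject to a model-size budget in bits.
    Exhaustive enumeration stands in for an ILP solver."""
    table = [[layer_sensitivity(w, h, b) for b in candidate_bits] for w, h in layers]
    best_cost, best_choice = math.inf, None
    for choice in itertools.product(range(len(candidate_bits)), repeat=len(layers)):
        size = sum(len(layers[idx][0]) * candidate_bits[k] for idx, k in enumerate(choice))
        if size > budget_bits:
            continue  # violates the (toy) hardware memory constraint
        cost = sum(table[idx][k] for idx, k in enumerate(choice))
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    if best_choice is None:
        raise ValueError("no bit-width assignment fits the budget")
    return [candidate_bits[k] for k in best_choice], best_cost


if __name__ == "__main__":
    w = [0.9, -0.35, 0.12, -0.6]
    sharp = [[10.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    flat = [[0.1 if i == j else 0.0 for j in range(4)] for i in range(4)]
    bits, cost = allocate_bits([(w, sharp), (w, flat)], (4, 8), budget_bits=48)
    print(bits)  # [8, 4]
```

With a 48-bit budget, the two layers cannot both use 8 bits, so the layer with the larger Hessian trace keeps 8 bits and the flatter layer drops to 4. Enumeration grows as |B|^L for L layers and |B| candidate bit widths, which is why a solver-based integer linear programming formulation is preferable at realistic model sizes; latency or compute constraints would enter as additional linear constraints on the same binary mask.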

Authors

Weihong He

Ruifeng Rao

Yuli Fu

Journals

Scientific Reports

Institutions

South China University of Technology

Nanfang Hospital
