Vision transformers (ViTs) have attracted increasing attention in visual tasks due to their strong global modeling capability. Compared with conventional convolutional neural networks, however, ViTs typically involve substantially more parameters and higher computational complexity, which makes deployment on resource-constrained hardware platforms difficult. Model quantization is an effective compression technique that can significantly reduce inference overhead, but existing methods often struggle to balance model accuracy against hardware deployment efficiency. To address this issue, this paper proposes a mixed-precision quantization framework for ViTs under hardware constraints. First, we apply either uniform or non-uniform quantization to each layer according to the distribution of its activation values, which improves the adaptability of the quantized representation. Second, to guide the allocation of quantization bit widths, we introduce a sensitivity metric based on the trace of the Hessian matrix, which evaluates each layer's sensitivity to quantization by measuring the impact of quantization perturbations on the loss. Third, we formulate mixed-precision bit-width allocation as an optimal binary mask search problem under hardware constraints and solve it with integer linear programming, so that the resource constraints of actual hardware deployment are accounted for explicitly. Extensive experiments on benchmark datasets such as ImageNet and COCO demonstrate that the proposed method balances model accuracy, model size, and inference latency, achieving a smaller model size and lower computational cost than existing methods while maintaining competitive accuracy.
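The bit-width allocation step described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the function name `allocate_bits`, the cost proxy (per-layer sensitivity multiplied by a quantization-noise term that scales as 2^(-2b)), and the model-size budget are all assumptions made for illustration, and an exhaustive search stands in for the integer linear programming solver. Each candidate assignment corresponds to a one-hot binary mask per layer over the allowed bit widths.

```python
from itertools import product


def allocate_bits(sensitivity, num_params, bit_choices, size_budget_bits):
    """Choose one bit width per layer under a model-size budget.

    sensitivity      -- per-layer scores (e.g. a Hessian-trace estimate); assumed given
    num_params       -- per-layer parameter counts
    bit_choices      -- allowed bit widths, e.g. (4, 8)
    size_budget_bits -- hypothetical hardware constraint on total weight bits

    Returns (assignment, cost), or (None, inf) if no assignment fits the budget.
    """
    best_cost, best_assign = float("inf"), None
    # Enumerate every per-layer choice; an ILP solver would search the same
    # space of one-hot masks without visiting every candidate.
    for assign in product(bit_choices, repeat=len(sensitivity)):
        size = sum(n * b for n, b in zip(num_params, assign))
        if size > size_budget_bits:
            continue  # violates the size constraint
        # Assumed cost proxy: sensitivity times noise power ~ 2^(-2b) = 4^(-b).
        cost = sum(s * 4.0 ** (-b) for s, b in zip(sensitivity, assign))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_assign, best_cost


if __name__ == "__main__":
    # Three equally sized layers; the budget allows only one layer at 8 bits.
    assign, cost = allocate_bits(
        sensitivity=[10.0, 1.0, 0.1],
        num_params=[100, 100, 100],
        bit_choices=(4, 8),
        size_budget_bits=1600,
    )
    print(assign, cost)
```

In this toy setting the single 8-bit slot goes to the layer with the highest sensitivity score, which is the behaviour a sensitivity-guided allocation is meant to produce. Exhaustive search grows exponentially with the number of layers, so a real ViT would need an actual ILP solver.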
Weihong He
Ruifeng Rao
Yuli Fu
Scientific Reports
South China University of Technology
Nanfang Hospital
He et al. studied this question.
www.synapsesocial.com/papers/6a08093ca487c87a6a40b248 — DOI: https://doi.org/10.1038/s41598-026-53062-w