March 3, 2026 | Open Access

Mixed-precision quantization techniques for energy-efficient DNN inference

Key Points

Mixed-precision quantization substantially lowers the bitwidths assigned across the model, improving computational efficiency.
Two quantization-aware training (QAT) methods were implemented, showing that accuracy can potentially be maintained while resource demands are lowered.
The analysis shows that neural networks can be deployed at reduced bitwidth, supporting their feasibility in practical applications.
These techniques can enable energy-efficient inference, which is crucial for large-scale deployment of deep learning models.
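
To make the link between bitwidth assignments and resource demands concrete, the following is a minimal sketch of how the weight memory of a mixed-precision model could be compared with a 32-bit baseline. The layer names, parameter counts, and bitwidths are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def weight_memory_bits(param_counts, bit_assignment):
    """Total number of bits needed to store all weights under a per-layer bitwidth assignment."""
    return sum(param_counts[name] * bit_assignment[name] for name in param_counts)


def average_bitwidth(param_counts, bit_assignment):
    """Parameter-weighted average bitwidth across all layers."""
    total_params = sum(param_counts.values())
    return weight_memory_bits(param_counts, bit_assignment) / total_params


# Hypothetical model: layer name -> number of weights (illustrative values only).
param_counts = {"conv1": 1_000, "conv2": 4_000, "fc": 5_000}

# Hypothetical mixed-precision assignment versus a uniform 32-bit baseline.
mixed = {"conv1": 8, "conv2": 4, "fc": 2}
fp32 = {name: 32 for name in param_counts}

mixed_bits = weight_memory_bits(param_counts, mixed)
fp32_bits = weight_memory_bits(param_counts, fp32)
compression_ratio = fp32_bits / mixed_bits

print(f"average bitwidth: {average_bitwidth(param_counts, mixed):.2f}")
print(f"weight memory compression vs. 32-bit: {compression_ratio:.2f}x")
```

The sketch counts weight storage only. Energy and latency also depend on activation bitwidths and on hardware support for low-precision arithmetic, which this example does not model.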

Abstract

In this project, we aimed to improve the computational efficiency and deployment feasibility of neural networks through mixed-precision quantization. We implemented two quantization-aware training (QAT) methods. Our results demonstrated substantially lower bitwidth assignments across the model while maintaining accuracy comparable to full-precision models.
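
The abstract does not specify which two QAT methods were used. As a generic illustration of the operation that QAT methods commonly build on, the sketch below applies symmetric uniform "fake quantization" (quantize, then dequantize) to each layer's weights at a per-layer bitwidth. The function name, layer names, weights, and bitwidths are assumptions made for this example, not the authors' implementation.

```python
def fake_quantize(weights, bits):
    """Quantize-dequantize a list of floats on a signed symmetric uniform grid.

    The returned values are still floats, but they lie on at most
    2**bits - 1 distinct levels, which mimics low-precision storage.
    """
    if bits < 2:
        raise ValueError("bits must be >= 2 for a signed symmetric grid")
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)                # nothing to scale
    scale = max_abs / qmax                  # step size of the grid
    out = []
    for w in weights:
        q = round(w / scale)                # nearest integer level
        q = max(-qmax, min(qmax, q))        # clamp to the representable range
        out.append(q * scale)               # map back to a float
    return out


# Hypothetical layers and per-layer bitwidths (mixed precision).
layer_weights = {
    "conv1": [0.50, -0.25, 0.10, -1.00],
    "fc": [0.03, -0.70, 0.45, 0.20],
}
layer_bits = {"conv1": 8, "fc": 4}

quantized = {
    name: fake_quantize(w, layer_bits[name]) for name, w in layer_weights.items()
}
for name, values in quantized.items():
    print(name, layer_bits[name], [round(v, 4) for v in values])
```

In a typical QAT setup this operation runs in the forward pass during training, and the gradient of the non-differentiable rounding step is approximated, often with a straight-through estimator. That part is omitted here to keep the sketch free of framework dependencies.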

Authors

Omar Lahyani

Journals

QRU Quaderns de Recerca en Urbanisme
