Deep neural networks (DNNs) have become foundational to modern applications, yet their substantial computational and memory demands pose major obstacles to energy-efficient inference. Moreover, rapidly growing parameter footprints and structural diversity further amplify data movement, increasing both energy consumption and latency. To address these issues, we propose S-TRAC, a novel accelerator that dynamically adjusts sparsity thresholds through algorithm-hardware co-design to enable efficient DNN inference. At the algorithm level, we employ a static sparse-dense storage format and a dynamic bit-processing scheme to skip non-contributing bits without sacrificing weight precision. At the hardware level, we introduce a column-wise processing-element array with LUT-based shift-accumulate multiplication and a global partial-sum accumulator to sustain energy-efficient execution. To drive this co-design, we propose a RISC-V extension that coordinates the read, arrangement, multiplication, accumulation, and write stages for end-to-end accelerator execution. Experimental results show that S-TRAC increases effective sparsity by an average factor of 8.4× across the evaluated DNN models, enabling substantial memory savings. The S-TRAC design achieves 11.16× higher energy efficiency and 37.03× higher hardware efficiency than state-of-the-art solutions.
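To make the idea of skipping non-contributing bits concrete, the following is a minimal software sketch of shift-accumulate multiplication that performs one shift-and-add per set bit of the weight and does no work for zero bits. It is an illustration under stated assumptions, not the S-TRAC datapath: the function names (`nonzero_bit_positions`, `shift_accumulate_mul`, `sparse_dot`), the sign-magnitude treatment of weights, and the 8-bit default width are all hypothetical choices, and the LUT, storage format, and threshold adjustment described above are not modeled.

```python
def nonzero_bit_positions(weight: int, bits: int = 8) -> list[int]:
    """Return the positions of the set bits in |weight| (assumed sign-magnitude)."""
    magnitude = abs(weight)
    if magnitude >= (1 << bits):
        raise ValueError("weight magnitude does not fit in the given bit width")
    return [i for i in range(bits) if (magnitude >> i) & 1]


def shift_accumulate_mul(activation: int, weight: int, bits: int = 8) -> tuple[int, int]:
    """Multiply by adding one shifted copy of the activation per set weight bit.

    Returns (product, number of shift-add steps actually performed).
    Zero bits of the weight contribute nothing and are skipped entirely.
    """
    positions = nonzero_bit_positions(weight, bits)
    acc = 0
    for p in positions:
        acc += activation << p  # one shift-add per contributing bit
    if weight < 0:
        acc = -acc  # restore the sign of the weight
    return acc, len(positions)


def sparse_dot(activations, weights, bits: int = 8) -> tuple[int, int]:
    """Dot product built from shift_accumulate_mul; also counts total shift-adds."""
    total, steps = 0, 0
    for a, w in zip(activations, weights):
        product, n = shift_accumulate_mul(a, w, bits)
        total += product  # stands in for a partial-sum accumulator
        steps += n
    return total, steps


if __name__ == "__main__":
    print(shift_accumulate_mul(7, 5))          # weight 5 = 0b101 has two set bits
    print(sparse_dot([1, 2, 3], [0, 4, -1]))   # the zero weight costs no steps
```

In this sketch the step count grows with the number of set bits rather than the bit width, which is the property bit-level sparsity exploits: the full weight value is preserved, and only the zero bits are skipped.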
Li et al. (Tue,) studied this question.