The proliferation of AI applications across diverse domains has driven the evolution of AI accelerators toward higher performance and energy efficiency. This paper addresses the critical challenge of task scheduling in AI accelerators by introducing a fully hardware-managed scheduling system. Our approach uses Operator Completion Status Registers (OCSRs) and a novel computation-scheduling instruction set to minimize software overhead and maximize execution parallelism. The co-designed hardware-software solution comprises: (1) a dedicated hardware scheduling unit with a complete instruction pipeline, (2) a compiler that maps operators to scheduling instructions while managing OCSR allocation, and (3) a lightweight runtime for efficient task dispatch. Experimental results show that the system reduces scheduling latency and improves overall throughput, achieving an average performance gain of approximately 30% across multiple CNN models at an area overhead of 7.41%. The proposed architecture establishes a new paradigm for high-efficiency AI accelerator design.
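The abstract does not specify how OCSRs or the scheduling instructions are encoded, so the following is only a toy software model of the general idea: a completion flag per operator gates when dependent operators may issue. Every name here (`SchedInstr`, `allocate_ocsrs`, `run`, the one-register-per-operator allocation, and the example graph) is a hypothetical assumption for illustration, not the authors' design.

```python
from dataclasses import dataclass


@dataclass
class SchedInstr:
    """Hypothetical scheduling instruction (not the paper's encoding)."""
    op: str          # operator name
    wait_on: tuple   # OCSR indices that must be set before this op may issue
    sets: int        # OCSR index that this op sets when it completes


def allocate_ocsrs(graph):
    """Toy 'compiler' pass: assign one OCSR index per operator (an assumption)."""
    index = {op: i for i, op in enumerate(graph)}
    return [SchedInstr(op, tuple(index[d] for d in deps), index[op])
            for op, deps in graph.items()]


def run(instrs):
    """Toy 'hardware' loop: each wave issues every op whose awaited OCSRs are set."""
    ocsr = [False] * len(instrs)
    pending = list(instrs)
    waves = []
    while pending:
        # Readiness is evaluated against the flags as they stood before this wave.
        ready = [i for i in pending if all(ocsr[r] for r in i.wait_on)]
        if not ready:
            raise RuntimeError("dependency cycle: no operator can issue")
        for i in ready:
            ocsr[i.sets] = True  # mark completion
        pending = [i for i in pending if i not in ready]
        waves.append([i.op for i in ready])
    return waves


# Hypothetical operator graph: two branches after conv1, joined by add.
graph = {
    "conv1": [],
    "conv2a": ["conv1"],
    "conv2b": ["conv1"],
    "add": ["conv2a", "conv2b"],
}
waves = run(allocate_ocsrs(graph))
```

In this model the two independent branches issue in the same wave without any software intervention, which is the kind of parallelism the abstract attributes to hardware-managed scheduling. A real design would presumably reuse a limited pool of registers rather than allocate one per operator.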
Libo Cheng
Liang Yang
Jian Shao
Journal of King Saud University - Computer and Information Sciences
China Electronics Technology Group Corporation
Cheng et al. studied this question.
www.synapsesocial.com/papers/69a7681bbadf0bb9e87e39da — DOI: https://doi.org/10.1007/s44443-026-00513-z