March 3, 2026 | Open Access

A fully hardware-managed scheduling architecture for AI accelerators

Key Points

The system reduces scheduling latency and improves performance by about 30% on average across multiple CNN models.
The architecture uses Operator Completion Status Registers (OCSRs) and a computation-scheduling instruction set to reduce software overhead.
Experimental validation shows an area overhead of only 7.41%.
A dedicated hardware scheduling unit and a lightweight runtime work together to streamline task dispatch.

Abstract

The proliferation of AI applications across diverse domains has driven the evolution of AI accelerators toward higher performance and energy efficiency. This paper addresses the critical challenge of task scheduling in AI accelerators by introducing a fully hardware-managed scheduling system. Our approach leverages Operator Completion Status Registers (OCSRs) and a novel computation-scheduling instruction set to minimize software overhead and maximize execution parallelism. The co-designed hardware-software solution comprises: (1) a dedicated hardware scheduling unit with a complete instruction pipeline, (2) a compiler that maps operators to scheduling instructions while managing OCSR allocation, and (3) a lightweight runtime for efficient task dispatch. Experimental results demonstrate that our system significantly reduces scheduling latency and improves overall throughput, achieving an average performance gain of approximately 30% across multiple CNN models while maintaining minimal area overhead of only 7.41%. The proposed architecture establishes a new paradigm for high-efficiency AI accelerator design.
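
To make the role of completion status registers more concrete, the following is a minimal software sketch of dependency tracking through such registers. It is an illustration under stated assumptions, not the authors' design: the abstract does not specify the OCSR semantics or the instruction set, so the `Op` fields, the `simulate` function, the latencies, and the number of execution units are all hypothetical. The sketch assumes each operator may wait on a set of registers and sets one register when it completes, and that a scheduler issues any operator whose awaited registers are all set.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Op:
    """Hypothetical scheduling record; field names are illustrative only."""
    name: str
    latency: int                      # assumed cycles the operator runs
    waits: Tuple[int, ...] = ()       # register indices that must be set before issue
    signals: Optional[int] = None     # register index set when the operator completes


def simulate(ops, num_units=2, num_regs=8):
    """Toy cycle-by-cycle model: returns {operator name: issue cycle}."""
    regs = [False] * num_regs         # completion status registers, all clear
    pending = list(ops)
    running = []                      # (finish_cycle, op) pairs
    issued = {}
    cycle = 0
    while pending or running:
        # Retire operators that have finished and set their status register.
        still_running = []
        for finish, op in running:
            if finish <= cycle:
                if op.signals is not None:
                    regs[op.signals] = True
            else:
                still_running.append((finish, op))
        running = still_running

        # Issue every ready operator while an execution unit is free.
        for op in list(pending):
            if len(running) >= num_units:
                break
            if all(regs[i] for i in op.waits):
                pending.remove(op)
                running.append((cycle + op.latency, op))
                issued[op.name] = cycle

        # Nothing is running, so no register can change: remaining ops never issue.
        if pending and not running:
            raise RuntimeError("unsatisfiable register dependency")
        cycle += 1
    return issued


# Two independent convolutions feed an element-wise add.
example_ops = [
    Op("conv1", latency=3, signals=0),
    Op("conv2", latency=2, signals=1),
    Op("add", latency=1, waits=(0, 1), signals=2),
]
schedule = simulate(example_ops)
print(schedule)
```

In this toy model the two convolutions issue together and the add issues only once both of their registers are set, with no host-side polling in the loop. That is the kind of overlap the abstract attributes to hardware-managed scheduling, although the real pipeline and register allocation are handled by the paper's scheduling unit and compiler.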

Authors

Libo Cheng

Liang Yang

Jian Shao

Journals

Journal of King Saud University - Computer and Information Sciences

Institutions

China Electronics Technology Group Corporation
