What question did this study set out to answer?

The study aims to develop a scheduling framework that optimizes GPU resource utilization while ensuring real-time performance for DNN applications.

April 12, 2026

FlePo: GPU Multi-task Scheduling Optimization Framework for Dynamic Scenes

Key Points

Developed FlePo, a GPU scheduling framework.
Implemented Adaptive Padding Dispatch for dynamic task scheduling.
Introduced Dynamic Parallel Fusion for kernel fusion and computational isolation.
Evaluated performance on NVIDIA Tesla V100 and AMD MI50 GPUs using offline profiling and online adaptation.
Achieved up to a 50% increase in throughput.
Maintained real-time task overhead below 2%.
Improved scheduling efficiency for dynamic workloads.

Abstract

Deep Neural Networks (DNNs) are widely used in intelligent applications, driving increasing computational demands on GPUs. However, modern GPU multitasking scheduling algorithms fail to effectively balance real-time task performance and resource utilization, especially under dynamic workloads with highly variable DNN computational demands. The complex and workload-dependent execution times of DNN kernels often lead to inefficient resource allocation, degraded system throughput, and missed real-time constraints. To address these challenges, we propose FlePo (Flexible Parallel Orchestrator), a GPU multitasking scheduling framework designed to optimize resource utilization and maintain real-time task performance within acceptable limits for soft real-time systems. FlePo integrates two key techniques: Adaptive Padding Dispatch (APD), which dynamically schedules best-effort tasks while leveraging the predictable execution characteristics of DNN kernels to maintain real-time predictability; and Dynamic Parallel Fusion (DPF), which employs kernel fusion to create computational isolation, reducing interference in parallel job execution. By combining offline profiling with online adaptation, FlePo efficiently responds to workload variations. We evaluate FlePo on two heterogeneous GPU platforms, NVIDIA Tesla V100 and AMD MI50, achieving up to a 50% increase in throughput while keeping real-time overhead below 2%. Our work enhances GPU multitasking in dynamic environments, with potential applications in autonomous driving, smart homes, and intelligent healthcare.
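The abstract does not give APD's algorithm, so the following is only a minimal sketch of the general idea it describes: using profiled kernel times to fit best-effort work into the idle slack left by a real-time task. The names `Kernel`, `fill_slack`, and `margin`, the greedy policy, and all numeric values are assumptions made for illustration and are not taken from FlePo.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    est_ms: float  # execution time from offline profiling (hypothetical value)


def fill_slack(slack_ms, best_effort, margin=0.1):
    """Greedily pick best-effort kernels whose profiled time fits in the slack.

    `margin` is an assumed safety buffer: that fraction of the slack is left
    unused to absorb prediction error. Returns (chosen, remaining).
    """
    budget = slack_ms * (1.0 - margin)
    chosen, remaining = [], []
    for kernel in best_effort:
        if kernel.est_ms <= budget:
            chosen.append(kernel)
            budget -= kernel.est_ms
        else:
            remaining.append(kernel)
    return chosen, remaining


# Hypothetical queue of best-effort kernels and a 10 ms idle window.
queue = [Kernel("a", 4.0), Kernel("b", 7.0), Kernel("c", 3.0), Kernel("d", 2.5)]
chosen, remaining = fill_slack(10.0, queue)
print([k.name for k in chosen], [k.name for k in remaining])
```

A greedy first-fit pass is used here only because it is short; a real scheduler would also need to handle online adaptation when measured times drift from the profiled ones.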

Authors

Huanghuang Liang

Xin Yang

Rui Ge

Journals

ACM Transactions on Autonomous and Adaptive Systems

Institutions

Wuhan University

University of Macau
