Deep Neural Networks (DNNs) are widely used in intelligent applications, driving increasing computational demands on GPUs. However, modern GPU multitasking scheduling algorithms fail to effectively balance real-time task performance and resource utilization, especially under dynamic workloads with highly variable DNN computational demands. The complex and workload-dependent execution times of DNN kernels often lead to inefficient resource allocation, degraded system throughput, and missed real-time constraints. To address these challenges, we propose FlePo (Flexible Parallel Orchestrator), a GPU multitasking scheduling framework designed to optimize resource utilization and maintain real-time task performance within acceptable limits for soft real-time systems. FlePo integrates two key techniques: Adaptive Padding Dispatch (APD), which dynamically schedules best-effort tasks while leveraging the predictable execution characteristics of DNN kernels to maintain real-time predictability; and Dynamic Parallel Fusion (DPF), which employs kernel fusion to create computational isolation, reducing interference in parallel job execution. By combining offline profiling with online adaptation, FlePo efficiently responds to workload variations. We evaluate FlePo on two heterogeneous GPU platforms, NVIDIA Tesla V100 and AMD MI50, achieving up to a 50% increase in throughput while keeping real-time overhead below 2%. Our work enhances GPU multitasking in dynamic environments, with potential applications in autonomous driving, smart homes, and intelligent healthcare.
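The abstract does not give FlePo's algorithms, so the following is only a minimal sketch of the general idea behind padding idle GPU time with best-effort work, not the authors' APD implementation. It assumes that each best-effort kernel has an offline-profiled execution time and that the scheduler knows the length of the idle gap before the next real-time kernel. The names `Kernel`, `pad_idle_gap`, and `safety_margin` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    profiled_ms: float  # offline-profiled execution time (illustrative value)


def pad_idle_gap(gap_ms, best_effort_queue, safety_margin=0.1):
    """Greedily choose best-effort kernels that fit into an idle gap.

    Assumption: a fraction of the gap (safety_margin) is held back to
    absorb profiling error, so the next real-time kernel is not delayed.
    Returns (kernels to dispatch now, kernels left in the queue).
    """
    budget = gap_ms * (1.0 - safety_margin)
    chosen, remaining = [], []
    for kernel in best_effort_queue:
        if kernel.profiled_ms <= budget:
            chosen.append(kernel)
            budget -= kernel.profiled_ms
        else:
            remaining.append(kernel)
    return chosen, remaining


queue = [Kernel("conv", 3.0), Kernel("gemm", 5.0), Kernel("relu", 1.0)]
chosen, remaining = pad_idle_gap(5.0, queue)
print([k.name for k in chosen], [k.name for k in remaining])
```

A greedy first-fit pass is used here purely for simplicity; a real scheduler would also need online correction when measured times drift from the profile, which is the role the abstract attributes to online adaptation.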
Huanghuang Liang
Xin Yang
Rui Ge
ACM Transactions on Autonomous and Adaptive Systems
Wuhan University
University of Macau
Liang et al. studied this question.
www.synapsesocial.com/papers/69db36e64fe01fead37c4ebd — DOI: https://doi.org/10.1145/3807775