This paper presents a demonstration of mLoRA, a system for parallel and efficient fine-tuning of Large Language Models (LLMs) using Low-Rank Adaptation (LoRA). mLoRA introduces two core components: LoRAPP, a zero-bubble pipeline parallelism mechanism that exploits the independence of LoRA adapters to maximize utilization across multiple GPUs, and BatchLoRA, a custom operator that consolidates multiple LoRA tasks into batched matrix operations to reduce kernel launch overhead. The system also includes a memory-aware task scheduler for efficient resource allocation. Demonstrated on database-related tasks including Text2SQL and LLM-based data preprocessing (LLM4DP), mLoRA trains 30–45% faster than existing parallel methods and has been deployed in production at AntGroup. This demo paper was submitted to the PVLDB 2025 Demo Track and serves as a companion to the full research paper accepted at VLDB 2025.
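The batching idea behind BatchLoRA can be illustrated with a small, dependency-free sketch. This is not mLoRA's operator or API: the function names, the list-of-rows matrix layout, and the `scale` parameter are all assumptions made for illustration. It relies only on the standard LoRA forward pass, y = xW + scale * (xA)B, and shows how inputs from several tasks that share the frozen weight W can go through a single base multiplication instead of one per task. For simplicity the per-adapter low-rank products stay in a Python loop here; a fused GPU operator would presumably batch those as well.

```python
# Illustrative sketch only; names and layout are hypothetical, not mLoRA's API.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def add_scaled(base, delta, scale):
    """Return base + scale * delta, element by element."""
    return [[p + scale * q for p, q in zip(b_row, d_row)]
            for b_row, d_row in zip(base, delta)]


def lora_forward_separate(x, w, a, b, scale):
    """One task on its own: y = x W + scale * (x A) B."""
    return add_scaled(matmul(x, w), matmul(matmul(x, a), b), scale)


def lora_forward_batched(task_inputs, w, adapters, scale):
    """Several tasks sharing W: one base matmul over the stacked inputs."""
    # Stack every task's rows so the shared weight is applied once.
    stacked = [row for x in task_inputs for row in x]
    base = matmul(stacked, w)

    outputs, start = [], 0
    for x, (a, b) in zip(task_inputs, adapters):
        rows = base[start:start + len(x)]     # this task's slice of the result
        delta = matmul(matmul(x, a), b)       # task-specific low-rank update
        outputs.append(add_scaled(rows, delta, scale))
        start += len(x)
    return outputs


if __name__ == "__main__":
    w = [[1, 0], [0, 1]]                      # shared frozen weight (2x2)
    x1, x2 = [[1, 2]], [[3, 4], [5, 6]]       # two tasks, different batch sizes
    adapters = [([[1], [0]], [[0, 1]]),       # rank-1 (A, B) pair for task 1
                ([[0], [1]], [[1, 0]])]       # rank-1 (A, B) pair for task 2
    print(lora_forward_batched([x1, x2], w, adapters, scale=2.0))
```

The batched path returns the same per-task outputs as calling `lora_forward_separate` once per task. Only the number of multiplications against the shared weight changes, which is the kind of saving the abstract attributes to reduced kernel launch overhead.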
Zelong Huang
Zhengmao Ye
Salma Filali
Cornell University
Sichuan University
The University of Texas at Arlington
Huang et al. studied this question.
www.synapsesocial.com/papers/69a67f12f353c071a6f0ae55 — DOI: https://doi.org/10.5281/zenodo.18827405