April 25, 2026 · Open Access

H-LoRA: Rethinking Rank Selection for Controllable Knowledge Retention in Edge AI

Key Points

This work aims to develop an approach for retaining the knowledge of compact AI models when they are specialized and deployed in edge environments.
Introduced H-LoRA (High-Rank LoRA), a high-rank adaptation method for edge-deployable models.
Conducted experiments on compact models (0.12B Minimind and Qwen-0.5B) across three domains: Human Resources, Medical, and Mathematics.
Evaluated knowledge retention and task performance on 29,647 samples spanning these domains.
H-LoRA achieved 90% topic retention, compared with 1% for traditional SFT.
Matched SFT-level precision (99.81%) while training only 20.35% of the model's parameters.
Reduced the over-the-air (OTA) update size from 1.4 GB to 96 MB (≈93% smaller), making frequent deployment more practical.
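The headline percentages in the Key Points follow from simple arithmetic, which the short sketch below reproduces. It assumes that 1.4 GB means 1,400 MB (decimal units; the binary convention is shown alongside) and that full SFT trains 100% of the parameters; neither convention is stated explicitly in the summary.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (1.0 - after / before)


def efficiency_ratio(full_fraction: float, adapted_fraction: float) -> float:
    """How many times fewer parameters are trained than in full fine-tuning."""
    return full_fraction / adapted_fraction


# Headline figures from the Key Points. Treating 1.4 GB as 1,400 MB (decimal
# units) is an assumption; the summary does not say which convention it uses.
ota_reduction = percent_reduction(1400.0, 96.0)
ota_reduction_binary = percent_reduction(1.4 * 1024, 96.0)  # if 1 GB = 1,024 MB
retention_gain_pp = 90 - 1  # topic retention gain, in percentage points
param_ratio = efficiency_ratio(100.0, 20.35)  # assumes full SFT trains 100% of parameters

print(f"OTA update shrinks by {ota_reduction:.1f}% ({ota_reduction_binary:.1f}% in binary units)")
print(f"Topic retention gain: {retention_gain_pp} percentage points")
print(f"Trainable parameters: {param_ratio:.1f}x fewer than full SFT")
```

Both unit conventions round to the ≈93% quoted, and 100 / 20.35 ≈ 4.9 is consistent with the "5×" parameter-efficiency figure in the abstract.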

Abstract

The deployment of specialized language models in resource-constrained edge environments (≤1B parameters, ≤2 GB memory, ≤100 ms latency) faces a critical challenge: Supervised Fine-Tuning (SFT) achieves domain expertise but suffers from irreversible catastrophic forgetting, while traditional Low-Rank Adaptation (LoRA) with conservative ranks (r ≤ 64) often underperforms because it lacks sufficient adaptation capacity. This work introduces H-LoRA (High-Rank LoRA) for edge-deployable models and establishes a fundamental distinction between destructive forgetting and controllable knowledge retention. Through comprehensive experiments on compact models (0.12B Minimind and Qwen-0.5B) across three domains (Human Resources, Medical, Mathematics) using 29,647 samples, we demonstrate that although both SFT and H-LoRA degrade general capabilities, they differ fundamentally: SFT completely destroys the original knowledge structure (1% topic retention), whereas H-LoRA preserves it (90% topic retention, an 89-percentage-point improvement), enabling post-deployment capability recovery. H-LoRA combines simplified scaling with a deliberately high rank of approximately two-thirds of the model's hidden dimension (r = 512 for d = 768), achieving SFT-level domain performance (99.81% precision) while training only 20.35% of the parameters (roughly 5× greater parameter efficiency than SFT) and generalizing robustly across domains (93.5 ± 6.8% average precision). In addition, H-LoRA reduces the over-the-air (OTA) update size from 1.4 GB to 96 MB (≈93% smaller), making frequent deployment of specialized models practical in bandwidth-limited edge environments. Beyond demonstrating effectiveness, this work establishes the first comprehensive framework for characterizing specialization-retention trade-offs in parameter-efficient fine-tuning, providing practical guidance for method selection in real-world deployments.
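The abstract's central mechanism, a frozen base model plus a high-rank adapter that can be detached to recover the original behavior, can be illustrated with a generic LoRA layer. The sketch below is not the authors' implementation: `choose_rank`, `lora_param_count`, and `LoRALinear` are hypothetical names, rounding the rank to a multiple of 64 is an assumed convenience, and the "simplified scaling" is assumed here to be a constant factor independent of the rank, since the abstract does not define it. Plain Python is used so the example runs without extra dependencies.

```python
import random


def choose_rank(hidden_dim: int, fraction: float = 2 / 3, multiple: int = 64) -> int:
    """Pick a rank near `fraction * hidden_dim`, rounded to a multiple of `multiple`.

    The 2/3 fraction follows the abstract (r = 512 for d = 768); rounding to a
    multiple of 64 is a hypothetical convenience, not a rule stated by the authors.
    """
    rounded = round(fraction * hidden_dim / multiple) * multiple
    return max(multiple, rounded)


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in one adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


def matvec(matrix, vector):
    """Plain-Python matrix-vector product (each row of `matrix` dotted with `vector`)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen base weight W0 plus a detachable low-rank update scale * B @ A."""

    def __init__(self, weight, rank, scale=1.0, seed=0):
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = [row[:] for row in weight]  # W0: copied once, never modified
        self.A = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]  # zero-init: adapter starts as a no-op
        # Hypothetical "simplified scaling": a constant factor that does not
        # depend on the rank (standard LoRA scales by alpha / rank instead).
        self.scale = scale
        self.enabled = True

    def forward(self, x):
        base = matvec(self.weight, x)
        if not self.enabled:  # adapter detached: output of the original layer
            return base
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


d = 6
rank = choose_rank(d, multiple=1)  # two-thirds of 6 gives rank 4
W0 = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
layer = LoRALinear(W0, rank)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

before = layer.forward(x)
layer.B = [[0.1] * rank for _ in range(d)]  # stand-in for a trained update to B
adapted = layer.forward(x)
layer.enabled = False
recovered = layer.forward(x)

print(choose_rank(768), lora_param_count(768, 768, choose_rank(768)))
print(before == matvec(W0, x), adapted != before, recovered == matvec(W0, x))
```

Because only `A` and `B` change during adaptation, switching the adapter off returns exactly the base layer's output, which is the property that makes retention "controllable" in this generic picture; an OTA update in this setup would likewise need to ship only the adapter matrices. Note that at r = 512 and d = 768 a single adapter on a square 768 × 768 projection holds 786,432 parameters versus 589,824 in the projection itself, so the model-wide 20.35% figure depends on which modules carry adapters, which the summary does not specify.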

Authors

Darren Chai Xin Lun

Lim Tong Ming

Journal

Computers, Materials & Continua
