Low-rank adaptation (LoRA) is a predominant parameter-efficient finetuning method for adapting large language models (LLMs) to downstream tasks. Meanwhile, Compute-in-Memory (CIM) architectures demonstrate superior energy efficiency due to their array-level parallel in-memory computing designs. In this paper, we propose deploying LoRA-finetuned LLMs on a hybrid CIM architecture (i.e., pretrained weights onto energy-efficient Resistive Random-Access Memory (RRAM) and LoRA branches onto noise-free Static Random-Access Memory (SRAM)), reducing the energy cost to about 3% of that of the Nvidia A100 GPU. However, the inherent noise of RRAM perturbs the stored weights and degrades performance. To address this issue, we design a novel Hardware-aware Low-rank Adaptation (HaLoRA) method. The key insight is to train a LoRA branch that is robust to such noise and then deploy it on noise-free SRAM; the extra cost is negligible since LoRA parameters are far fewer than the pretrained weights (e.g., 0.15% for the LLaMA-3.2 1B model). To improve robustness to the noise, we theoretically analyze the gap between the optimization trajectories of the LoRA branch under ideal and noisy conditions, and further design an extra loss to minimize the upper bound of this gap. As a result, we enjoy both energy efficiency and accuracy during inference. Experiments finetuning the Qwen and LLaMA series demonstrate the effectiveness of HaLoRA across multiple reasoning tasks, achieving up to a 22.7-point improvement in average score while maintaining robustness across various noise types and noise levels.
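The hybrid deployment described above can be illustrated with a minimal PyTorch sketch (not the authors' implementation): the frozen pretrained weight, mapped to RRAM, is perturbed by a simulated noise model, while the trainable LoRA branch, mapped to SRAM, remains noise-free. The Gaussian noise model and all hyperparameters below are illustrative assumptions.

```python
import torch

class NoisyLoRALinear(torch.nn.Module):
    """Linear layer modeling hybrid CIM deployment: noisy frozen weight
    (RRAM) plus a noise-free trainable low-rank branch (SRAM)."""

    def __init__(self, in_features, out_features, rank=8, noise_std=0.02):
        super().__init__()
        # Frozen pretrained weight, mapped to RRAM in the hybrid design.
        self.weight = torch.nn.Parameter(
            torch.randn(out_features, in_features), requires_grad=False)
        # Trainable low-rank factors, mapped to noise-free SRAM.
        self.A = torch.nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(out_features, rank))
        # Assumed additive Gaussian noise model for RRAM (illustrative).
        self.noise_std = noise_std

    def forward(self, x):
        # Noise perturbs only the RRAM-stored pretrained weight.
        noisy_w = self.weight + torch.randn_like(self.weight) * self.noise_std
        # LoRA branch x @ A^T @ B^T is computed noise-free.
        return x @ noisy_w.T + x @ self.A.T @ self.B.T

layer = NoisyLoRALinear(16, 16)
y = layer(torch.randn(4, 16))
print(y.shape)  # torch.Size([4, 16])
```

Training such a layer under injected noise is one way to probe the robustness gap the paper analyzes; HaLoRA's specific loss term on the optimization-trajectory gap is beyond this sketch.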
Taiqiang Wu
Chenchen Ding
Wei Zhou
ACM Transactions on Design Automation of Electronic Systems
University of Hong Kong
Tsinghua University
DOI: https://doi.org/10.1145/3801559