March 2, 2023 · Open Access

Parameter-efficient fine-tuning of large-scale pre-trained language models

Abstract

With the prevalence of pre-trained language models (PLMs) and the pre-training–fine-tuning paradigm, it has been repeatedly shown that larger models tend to yield better performance. However, as PLMs scale up, fine-tuning and storing all the parameters becomes prohibitively costly and eventually practically infeasible. This necessitates a new branch of research focusing on the parameter-efficient adaptation of PLMs, which optimizes a small portion of the model parameters while keeping the rest fixed, drastically cutting down computation and storage costs. In general, this line of work demonstrates that large-scale models can be effectively stimulated by optimizing only a few parameters. Although the designs vary, here we discuss and analyse these approaches under a more consistent and accessible term, ‘delta-tuning’, where ‘delta’, a mathematical notation often used to denote changes, is borrowed to refer to the portion of parameters that are ‘changed’ during training. We formally describe the problem and propose a unified categorization criterion for existing delta-tuning methods to explore their correlations and differences. We also discuss the theoretical principles underlying the effectiveness of delta-tuning and interpret them from the perspectives of optimization and optimal control. Furthermore, we provide a holistic empirical study on over 100 natural language processing tasks and investigate various aspects of delta-tuning. Through this comprehensive study and analysis, our research demonstrates the theoretical and practical properties of delta-tuning in the adaptation of PLMs.
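
The core idea in the abstract, optimizing a small set of parameters while the rest stay fixed, can be illustrated with a toy sketch. This is not the authors' code or any specific delta-tuning method from the paper. It assumes a tiny linear model in place of a PLM and uses a single additive bias offset as the 'delta'. All names and values (`frozen_weights`, `frozen_bias`, `delta_tune`, the +3.0 target shift) are hypothetical and chosen only for illustration.

```python
"""Toy sketch of delta-tuning: train a small delta, keep base parameters frozen."""
import random


def predict(weights, bias, delta_bias, x):
    # Base model output plus the trainable offset (the "delta").
    return sum(w * xi for w, xi in zip(weights, x)) + bias + delta_bias


def delta_tune(weights, bias, data, lr=0.1, steps=200):
    """Fit only delta_bias by gradient descent on mean squared error.

    weights and bias are read but never updated, mirroring the
    "keep the rest fixed" part of the description.
    """
    delta_bias = 0.0
    for _ in range(steps):
        grad = 0.0
        for x, y in data:
            err = predict(weights, bias, delta_bias, x) - y
            grad += 2.0 * err  # d(err^2)/d(delta_bias)
        delta_bias -= lr * grad / len(data)
    return delta_bias


# Hypothetical "pre-trained" parameters; these stay frozen throughout.
frozen_weights = [0.5, -1.0, 2.0]
frozen_bias = 0.1

# Hypothetical downstream task: the base model's outputs shifted by +3.0.
rng = random.Random(0)
inputs = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(20)]
data = [(x, predict(frozen_weights, frozen_bias, 0.0, x) + 3.0) for x in inputs]

delta_bias = delta_tune(frozen_weights, frozen_bias, data)

trainable = 1  # only delta_bias is optimized
total = len(frozen_weights) + 1 + trainable
print(f"learned delta: {delta_bias:.4f}")
print(f"trainable parameters: {trainable} of {total}")
```

In this sketch only one value needs to be stored per downstream task, which is the storage saving the abstract refers to. Real delta-tuning methods operate on models with far more parameters and use more expressive deltas than a single offset.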

Authors

Ning Ding

Yujia Qin

Guang Yang

Journals

Nature Machine Intelligence

Institutions

Tsinghua University

Tsinghua–Berkeley Shenzhen Institute

Beijing Academy of Artificial Intelligence

Cite this study

Ding et al. (2023) studied this question.

www.synapsesocial.com/papers/69d8a176183921ebcaae2fe7 — DOI: https://doi.org/10.1038/s42256-023-00626-4
