July 9, 2024 · Open Access

Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models


Abstract

As language models have scaled both their parameter counts and pretraining dataset sizes, the computational cost of pretraining has become intractable for all but the most well-resourced teams. This rising cost makes it ever more important to be able to reuse a model after pretraining has completed, allowing its abilities to improve further without training from scratch. In this work, we detail a set of guidelines for designing effective data distributions and learning rate schedules for continued pretraining of language models. Applying these findings in a continued pretraining run on top of a well-trained 15B-parameter model, we show a 9% improvement in average model accuracy over the baseline of continuing training on the pretraining set. The resulting recipe provides a practical starting point for developing language models through reuse rather than retraining.
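The abstract names two design knobs for continued pretraining, the data distribution and the learning rate schedule, but does not state their specific values. Purely as an illustration of how those knobs can be parameterized, the sketch below assumes a common linear-warmup plus cosine-decay schedule and a hypothetical two-phase data blend that switches at a chosen step. This is not necessarily the schedule or blend the authors recommend; the function names, blend names, switch step, and learning rates are all hypothetical.

```python
import math


def cosine_lr(step: int, total_steps: int, max_lr: float, min_lr: float,
              warmup_steps: int = 0) -> float:
    """Hypothetical schedule: linear warmup to max_lr, then cosine decay to min_lr."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup: reaches max_lr at the last warmup step.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, (step - warmup_steps) / decay_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def blend_for_step(step: int, switch_step: int,
                   first_blend: dict, second_blend: dict) -> dict:
    """Hypothetical two-phase data distribution: sampling weights per data source."""
    for blend in (first_blend, second_blend):
        if abs(sum(blend.values()) - 1.0) > 1e-9:
            raise ValueError("blend weights must sum to 1")
    return first_blend if step < switch_step else second_blend


if __name__ == "__main__":
    total = 1000
    # Illustrative placeholder blends; source names and weights are made up.
    phase_1 = {"general_web": 0.7, "code": 0.2, "books": 0.1}
    phase_2 = {"general_web": 0.4, "code": 0.2, "qa": 0.4}
    for s in (0, 250, 500, 750, 999):
        lr = cosine_lr(s, total, max_lr=3e-4, min_lr=3e-6)
        print(s, f"{lr:.2e}", blend_for_step(s, switch_step=500,
                                             first_blend=phase_1,
                                             second_blend=phase_2))
```

Keeping the schedule and the blend as separate pure functions of the step makes it easy to tie the data switch to a point in the learning rate decay, or to vary either knob independently in ablations.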


Cite this study

Parmar et al. (2024). Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models.

www.synapsesocial.com/papers/68e60e42b6db6435875a11d3 — DOI: https://doi.org/10.48550/arxiv.2407.07263

Authors

Jupinder Parmar

Sanjev Satheesh

Mostofa Patwary
