We introduce LLaMA, a collection of foundation language models ranging from 7B to 65B parameters. We train our models on trillions of tokens, and show that it is possible to train state-of-the-art models using publicly available datasets exclusively, without resorting to proprietary and inaccessible datasets. In particular, LLaMA-13B outperforms GPT-3 (175B) on most benchmarks, and LLaMA-65B is competitive with the best models, Chinchilla-70B and PaLM-540B. We release all our models to the research community.
Hugo Touvron, Thibaut Lavril, Gautier Izacard, et al.
Touvron et al. (2023). LLaMA: Open and Efficient Foundation Language Models. DOI: https://doi.org/10.48550/arXiv.2302.13971