April 2, 2024 · Open Access

Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations

Abstract

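Large language models keep growing in size, and because larger models train faster, this trend is expected to continue. However, the increase in model size will severely affect inference cost. Model compression is therefore important for retaining the performance of larger models while reducing the cost of running them. In this paper, we explore model compression methods and empirically demonstrate that simply skipping the latter attention sublayers of Transformer LLMs is an effective compression method, because these layers are redundant yet very computationally expensive. On Llama 2 7B, we observed a 21% speedup in one-token generation, while performance on several common benchmarks, surprisingly and unexpectedly, improved.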
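The idea of skipping the latter attention sublayers can be illustrated with a toy sketch. This is not the paper's implementation: the single-head attention with identity projections, the tanh feed-forward, the omission of layer normalization, and the names `forward`, `skip_attention_from`, and `stats` are all assumptions made for illustration. The sketch also assumes that a skipped sublayer simply lets the residual stream pass through unchanged.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(xs):
    """Toy single-head causal attention with identity Q/K/V projections."""
    d = len(xs[0])
    out = []
    for t, q in enumerate(xs):
        # Each position attends only to itself and earlier positions.
        scores = [
            sum(a * b for a, b in zip(q, xs[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * xs[s][i] for s, w in enumerate(weights)) for i in range(d)]
        )
    return out


def mlp(xs, scale):
    """Toy position-wise feed-forward sublayer: a scaled tanh."""
    return [[scale * math.tanh(v) for v in x] for x in xs]


def add(xs, ys):
    """Element-wise residual addition of two sequences of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def forward(xs, num_layers, skip_attention_from=None, stats=None):
    """Run a stack of blocks; layers with index >= skip_attention_from
    skip the attention sublayer and keep only the feed-forward sublayer."""
    for layer in range(num_layers):
        skip = skip_attention_from is not None and layer >= skip_attention_from
        if not skip:
            xs = add(xs, causal_self_attention(xs))
            if stats is not None:
                stats["attention_calls"] += 1
        xs = add(xs, mlp(xs, scale=0.1))
    return xs


# Hypothetical usage: 8 layers, attention skipped in the last 2.
random.seed(0)
tokens = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
full_stats = {"attention_calls": 0}
pruned_stats = {"attention_calls": 0}
full = forward(tokens, 8, stats=full_stats)
pruned = forward(tokens, 8, skip_attention_from=6, stats=pruned_stats)
print(full_stats, pruned_stats)
```

In this toy, the pruned run executes fewer attention sublayers while the output keeps the same shape. Whether accuracy holds up on a real model is the empirical question the abstract addresses, and this sketch says nothing about it.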

Cite this study

Georgy Tyukin (Tue,) studied this question.

www.synapsesocial.com/papers/68e70d86b6db64358768697d — DOI: https://doi.org/10.48550/arxiv.2404.05741

Also consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

Model Compression and Efficient Inference for Large Language Models: A Survey · 2024 · 14 citations
Designing Large Foundation Models for Efficient Training and Inference: A Survey · 2024 · 5 citations
Efficient Training and Inference: Techniques for Large Language Models Using Llama · 2024 · 20 citations
Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks · 2024 · 5 citations
