May 27, 2024Open Access

Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention

Puntos clave

Los puntos clave no están disponibles para este artículo en este momento.

Resumen

We present Lightning Attention, the first linear attention implementation that maintains a constant training speed for various sequence lengths under fixed memory consumption. Due to the issue with cumulative summation operations (cumsum), previous linear attention implementations cannot achieve their theoretical advantage in a casual setting. However, this issue can be effectively solved by utilizing different attention calculation strategies to compute the different parts of attention. Specifically, we split the attention calculation into intra-blocks and inter-blocks and use conventional attention computation for intra-blocks and linear attention kernel tricks for inter-blocks. This eliminates the need for cumsum in the linear attention calculation. Furthermore, a tiling technique is adopted through both forward and backward procedures to take full advantage of the GPU hardware. To enhance accuracy while preserving efficacy, we introduce TransNormerLLM (TNL), a new architecture that is tailored to our lightning attention. We conduct rigorous testing on standard and self-collected datasets with varying model sizes and sequence lengths. TNL is notably more efficient than other language models. In addition, benchmark results indicate that TNL performs on par with state-of-the-art LLMs utilizing conventional transformer structures. The source code is released at github.com/OpenNLPLab/TransnormerLLM.

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Discussion

Authors

Zhen Qin

Weigao Sun

Dong Li

Actions

References and Citations

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention

Puntos clave

Resumen

Citation Network

Connected Papers

Discussion

Authors

Actions

References and Citations

Citation Network

Connected Papers

Discussion

Cite this study

Also consider