Motivated by the factorization inherent in the original fast multipole method and the improved fast Gauss transform, we introduce a factorable form of attention that operates efficiently in high dimensions. This approach reduces the computational and memory complexity of the attention mechanism in transformers from O(N²) to O(N), where N is the number of tokens. Unlike previous attempts, our work presents a linearly scaled attention mechanism that maintains the full representation of the attention matrix without resorting to sparsification and incorporates the all-to-all relationship between tokens. We explore the properties of our new attention metric and test it in various standard settings. Results indicate that our attention mechanism performs robustly and holds significant promise for diverse applications where self-attention is used.
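The abstract does not spell out the factorization, so what follows is only a minimal sketch of the general idea, not the authors' implementation. It assumes the softmax kernel exp(q·k) is replaced by its second-order Taylor polynomial f(s) = 1 + s + s²/2. That choice is an assumption made here, and it is always positive because f(s) = ((s + 1)² + 1)/2. Since (q·k)² = (q⊗q)·(k⊗k), each term splits into a query part and a key part. The key-side sums can then be accumulated once over all tokens, so the N × N matrix is never formed, yet every query still interacts with every key. The function names `quadratic_attention` and `factorized_attention` are hypothetical.

```python
import numpy as np


def quadratic_attention(q, k, v):
    """Reference O(N^2) attention using the kernel f(s) = 1 + s + s^2/2 (assumed)."""
    s = q @ k.T                        # (N, N) full score matrix
    w = 1.0 + s + 0.5 * s**2           # second-order Taylor polynomial of exp(s), always > 0
    return (w @ v) / w.sum(axis=1, keepdims=True)


def factorized_attention(q, k, v):
    """Same output as quadratic_attention without forming the N x N matrix.

    Cost is O(N * d^2 * d_v): linear in N, at the price of a d^2 feature size.
    """
    n, d = q.shape
    m = k.shape[0]
    # Order-0 term: sum_j v_j and sum_j 1
    v0 = v.sum(axis=0)                                   # (d_v,)
    n0 = float(m)
    # Order-1 term: sum_j k_j v_j^T and sum_j k_j
    v1 = k.T @ v                                         # (d, d_v)
    n1 = k.sum(axis=0)                                   # (d,)
    # Order-2 term via outer products: (q.k)^2 = (q (x) q) . (k (x) k)
    kk = np.einsum("ja,jb->jab", k, k).reshape(m, d * d)  # (M, d^2)
    qq = np.einsum("ia,ib->iab", q, q).reshape(n, d * d)  # (N, d^2)
    v2 = kk.T @ v                                        # (d^2, d_v)
    n2 = kk.sum(axis=0)                                  # (d^2,)
    # Combine the per-query parts with the pre-summed key parts
    num = v0[None, :] + q @ v1 + 0.5 * (qq @ v2)         # (N, d_v)
    den = n0 + q @ n1 + 0.5 * (qq @ n2)                  # (N,)
    return num / den[:, None]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((128, 8))
    k = rng.standard_normal((128, 8))
    v = rng.standard_normal((128, 4))
    # Both paths should agree up to floating-point error
    print(np.allclose(quadratic_attention(q, k, v), factorized_attention(q, k, v)))
```

The design trade-off is that the per-token feature size grows from d to 1 + d + d². This pays off when N is much larger than d², which is the long-sequence regime the abstract targets.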
Gerami et al. (Mon,) studied this question.
www.synapsesocial.com/papers/68e79844b6db643587708c59 — DOI: https://doi.org/10.48550/arxiv.2402.07901
Armin Gerami
Monte Hoover
Pranav S. Dulepet