June 16, 2024 | Open Access

Breaking the Attention Bottleneck

Abstract

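Attention-based transformers have become the standard architecture in many deep learning fields, primarily because they can model long-range dependencies and handle variable-length input sequences. However, the attention mechanism, with its quadratic complexity, is a significant bottleneck in the transformer architecture. This algorithm is only unidirectional in the decoder and converges to a static pattern in over-parameterized decoder-only models. I address this issue by developing a generative function as a replacement for attention or activation. It retains the autoregressive character by comparing each token with the previous one. In my test setting with nanoGPT, this yields a lower loss with a smaller model. The loss drops further when an average context vector is incorporated. This concept of attention replacement is distributed under the GNU AGPL v3 license at https://gitlab.com/Bachstelze/causalgeneration.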
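To make the idea concrete, here is a minimal sketch of a causal mixing step that compares each token only with its predecessor and adds a running-average context vector. The abstract does not give the exact form of the generative function, so everything here is an assumption for illustration: the function names, the elementwise product used as the "comparison", the zero vector used for the first position, the reading of "average context vector" as a causal running mean, and the additive combination. It is not the author's implementation; the linked repository is the reference for that. The code uses plain Python lists to stay dependency-free.

```python
def previous_token_mix(x):
    """Compare each token vector with its predecessor.

    ASSUMPTION: the comparison is an elementwise product; the abstract
    does not specify the actual function.
    x: list of T token vectors, each a list of D floats.
    """
    dim = len(x[0])
    # Shift the sequence right by one; the first token gets a zero vector.
    prev = [[0.0] * dim] + x[:-1]
    return [[a * b for a, b in zip(tok, p)] for tok, p in zip(x, prev)]


def causal_mean_context(x):
    """Running mean over positions 0..t, so position t never sees later tokens.

    ASSUMPTION: one possible reading of an "average context vector".
    """
    sums = [0.0] * len(x[0])
    out = []
    for t, tok in enumerate(x, start=1):
        sums = [s + v for s, v in zip(sums, tok)]
        out.append([s / t for s in sums])
    return out


def mix_with_context(x):
    """Add the causal mean context to the previous-token comparison (hypothetical combination)."""
    mixed = previous_token_mix(x)
    ctx = causal_mean_context(x)
    return [[m + c for m, c in zip(mrow, crow)] for mrow, crow in zip(mixed, ctx)]


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    for row in mix_with_context(tokens):
        print(row)
```

Both helpers make a single pass over the sequence, so the cost grows linearly with the sequence length T, whereas full self-attention scores all T × T token pairs. Because position t depends only on positions 0..t, changing a later token leaves earlier outputs unchanged, which is the autoregressive property the abstract refers to.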

Cite this study

Kalle Hilsenbek (Sun,) studied this question.

www.synapsesocial.com/papers/68e64883b6db6435875d9e17 — DOI: https://doi.org/10.48550/arxiv.2406.10906
