April 23, 2024 · Open Access

Multi-Head Mixture-of-Experts

Abstract

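Sparse Mixtures of Experts (SMoE) scale model capacity without a significant increase in training and inference cost, but they exhibit two problems: (1) low expert activation, where only a small subset of experts is activated for optimization, and (2) a lack of fine-grained analytical capability for the multiple semantic concepts within an individual token. We propose Multi-Head Mixture-of-Experts (MH-MoE), which uses a multi-head mechanism to split each token into multiple sub-tokens. These sub-tokens are assigned to and processed by a diverse set of experts in parallel, then seamlessly reintegrated into the original token form. The multi-head mechanism lets the model collectively attend to information from various representation spaces within different experts while significantly increasing expert activation, which deepens context understanding and alleviates overfitting. Moreover, MH-MoE is straightforward to implement and is decoupled from other SMoE optimization methods, so it can be easily integrated with other SMoE models to improve performance. Extensive experiments across three tasks (English-focused language modeling, multilingual language modeling, and masked multi-modality modeling) demonstrate the effectiveness of MH-MoE.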
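
The split, route, and merge steps described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the top-1 dot-product router, the `expert_keys` vectors, the scaling "experts", and all function names are hypothetical stand-ins chosen for illustration, and any learned projections or load-balancing terms a real model would use are omitted.

```python
import random

def split_heads(token, num_heads):
    """Split one token vector of length d into num_heads sub-tokens of length d // num_heads."""
    d = len(token)
    if d % num_heads != 0:
        raise ValueError("token dimension must be divisible by num_heads")
    size = d // num_heads
    return [token[i * size:(i + 1) * size] for i in range(num_heads)]

def merge_heads(sub_tokens):
    """Concatenate sub-tokens back into one vector of the original length."""
    return [x for sub in sub_tokens for x in sub]

def route_top1(sub_token, expert_keys):
    """Assumed router: pick the expert whose key has the largest dot product with the sub-token."""
    scores = [sum(a * b for a, b in zip(sub_token, key)) for key in expert_keys]
    return max(range(len(scores)), key=scores.__getitem__)

def mh_moe_layer(tokens, num_heads, expert_keys, experts):
    """Route every sub-token independently, then restore the original token shape."""
    outputs, used = [], set()
    for token in tokens:
        processed = []
        for sub in split_heads(token, num_heads):
            idx = route_top1(sub, expert_keys)
            used.add(idx)  # track which experts were activated
            processed.append(experts[idx](sub))
        outputs.append(merge_heads(processed))
    return outputs, used

def make_expert(scale):
    """Toy expert (an assumption): scales its input; a real expert would be a feed-forward network."""
    return lambda sub: [scale * x for x in sub]

rng = random.Random(0)
d_model, num_heads, num_experts = 8, 4, 6
sub_dim = d_model // num_heads
expert_keys = [[rng.gauss(0, 1) for _ in range(sub_dim)] for _ in range(num_experts)]
experts = [make_expert(1.0 + 0.1 * i) for i in range(num_experts)]
tokens = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(5)]

outputs, used = mh_moe_layer(tokens, num_heads, expert_keys, experts)
print(f"{len(used)} of {num_experts} experts activated; output dim = {len(outputs[0])}")
```

Because each token contributes `num_heads` routing decisions instead of one, a batch of tokens can reach more experts than it would with one decision per token, which is the intuition behind the higher expert activation claimed above.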

Authors

Xun Wu

Shaohan Huang

Wenhui Wang
