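The memory requirements of Mixture-of-Experts (MoE) large language models (LLMs) often exceed GPU memory capacity, which forces costly movement of parameters from secondary memory to the GPU for expert computation. In this work, we present Mixture of Near-Data Experts (MoNDE), a near-data computing solution that efficiently supports MoE LLM inference. MoNDE reduces the volume of MoE parameter movement by transferring only the hot (active) experts to the GPU and computing the remaining cold experts inside the host memory device. By replacing transfers of large expert parameters with transfers of small activations, MoNDE makes communication for MoE inference considerably more efficient, yielding substantial speedups over existing parameter offloading frameworks in both encoder and decoder operations.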
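The trade-off described in the abstract, moving small activations instead of large expert parameters, can be illustrated with a rough byte-count sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names, the two-matrix feed-forward expert shape, the 2-byte element size, and the simple "pick the cheaper transfer" rule. MoNDE's actual hot/cold selection policy may differ.

```python
# Illustrative sketch only: names, expert shape, and the decision rule
# are assumptions, not the policy used by MoNDE.

def expert_param_bytes(d_model, d_ff, bytes_per_elem=2):
    """Bytes needed to move one expert's weights to the GPU.

    Assumes a feed-forward expert with two weight matrices:
    (d_model x d_ff) and (d_ff x d_model).
    """
    return 2 * d_model * d_ff * bytes_per_elem


def activation_bytes(num_tokens, d_model, bytes_per_elem=2):
    """Bytes moved when the expert runs near the data instead.

    Counts the input activations sent to the memory device plus the
    output activations returned, each of shape (num_tokens x d_model).
    """
    return 2 * num_tokens * d_model * bytes_per_elem


def plan_transfers(tokens_per_expert, d_model, d_ff, bytes_per_elem=2):
    """Pick, per expert, whichever option moves fewer bytes.

    Returns (plan, total_bytes), where plan maps expert id to a
    (placement, bytes_moved) tuple. Experts with no routed tokens
    are skipped because they need no computation.
    """
    plan = {}
    total = 0
    for expert_id, num_tokens in tokens_per_expert.items():
        if num_tokens == 0:
            continue
        p = expert_param_bytes(d_model, d_ff, bytes_per_elem)
        a = activation_bytes(num_tokens, d_model, bytes_per_elem)
        if a < p:
            # Cold expert: cheaper to ship activations than weights.
            plan[expert_id] = ("near_data", a)
            total += a
        else:
            # Hot expert: many tokens, so moving the weights pays off.
            plan[expert_id] = ("gpu", p)
            total += p
    return plan, total


if __name__ == "__main__":
    # Hypothetical routing: expert 0 is hot, expert 1 is cold, expert 2 is idle.
    plan, total = plan_transfers({0: 5000, 1: 10, 2: 0}, d_model=1024, d_ff=4096)
    for expert_id, (placement, moved) in plan.items():
        print(expert_id, placement, moved)
    print("total bytes moved:", total)
```

Under these assumptions the break-even point is reached when the number of routed tokens equals `d_ff`, so an expert that receives only a handful of tokens is much cheaper to run near the data than to copy to the GPU.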
Tae Hyun Kim
Kwanseok Choi
Y.H. Cho
Kim et al. (Wed.) studied this question.
www.synapsesocial.com/papers/68e67f72b6db643587608fdf — DOI: https://doi.org/10.48550/arxiv.2405.18832