November 1, 2022

DeepSpeed-Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

Abstract

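The landscape of transformer model inference is increasingly diverse in model size, model characteristics, latency and throughput requirements, and hardware requirements. Designing a general-purpose inference system that handles this diversity is challenging. DeepSpeed-Inference addresses these challenges with two solutions. The first is a multi-GPU inference solution that minimizes latency while maximizing throughput for both dense and sparse transformers when the model fits in aggregate GPU memory. The second is a heterogeneous inference solution that uses CPU, NVMe, and GPU memory together to enable high-throughput inference of models larger than aggregate GPU memory. Compared with the state of the art, DeepSpeed-Inference reduces latency by 6.4× and improves throughput by 1.5×. By leveraging hundreds of GPUs, it enables trillion-parameter-scale inference under real-time latency constraints, an unprecedented scale for inference. It can also run inference on models 25× larger than GPU-only solutions can, while delivering a high throughput of 84 TFLOPS (over 50% of A6000 peak performance).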
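As a rough illustration of the split between the two solutions described above, the following back-of-the-envelope sketch checks whether a model's weights fit in aggregate GPU memory. It is a minimal sketch under stated assumptions, not DeepSpeed's API or its actual scheduling logic. The `Cluster` type, the function names, the FP16 assumption of 2 bytes per parameter, and the decimal-GB convention are all hypothetical. The sketch also ignores activations, the KV cache, and other runtime memory.

```python
import math
from dataclasses import dataclass

# Assumption: weights are stored in FP16, i.e. 2 bytes per parameter.
BYTES_PER_PARAM_FP16 = 2


@dataclass
class Cluster:
    """Hypothetical description of a GPU node or cluster."""
    num_gpus: int
    gpu_mem_gb: float  # memory per GPU, in decimal GB (1e9 bytes)


def weight_footprint_gb(num_params: float, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate weight memory in GB; ignores activations and KV cache."""
    return num_params * bytes_per_param / 1e9


def min_gpus_for_weights(num_params: float, gpu_mem_gb: float) -> int:
    """Smallest GPU count whose combined memory can hold the weights alone."""
    return math.ceil(weight_footprint_gb(num_params) / gpu_mem_gb)


def choose_mode(num_params: float, cluster: Cluster) -> str:
    """Hypothetical dispatcher mirroring the abstract's two cases.

    'multi-gpu'     -> weights fit in aggregate GPU memory
    'heterogeneous' -> weights exceed it, so CPU/NVMe memory must also be used
    """
    aggregate_gb = cluster.num_gpus * cluster.gpu_mem_gb
    if weight_footprint_gb(num_params) <= aggregate_gb:
        return "multi-gpu"
    return "heterogeneous"


if __name__ == "__main__":
    print(min_gpus_for_weights(1e12, 80))
    print(choose_mode(175e9, Cluster(num_gpus=1, gpu_mem_gb=48)))
    print(choose_mode(175e9, Cluster(num_gpus=8, gpu_mem_gb=48)))
```

Under these assumptions, a trillion-parameter model in FP16 needs about 2,000 GB for its weights alone, which means at least 25 GPUs with 80 GB each. Real deployments need more memory than this to hold activations and to meet latency targets.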

Authors

Reza Yazdani Aminabadi

Samyam Rajbhandari

Ammar Ahmad Awan

Institutions

Microsoft (United States)

Cite this study

Aminabadi et al. (2022) studied this problem.

www.synapsesocial.com/papers/6a08b5e4ad370a6b44de498f — DOI: https://doi.org/10.1109/sc41404.2022.00051
