February 18, 2024 · Open Access

Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding

Abstract

This research aims to accelerate the inference of large language models (LLMs) with billions of parameters. We propose Smart Parallel Auto-Correct dEcoding (SPACE), an approach designed to achieve lossless acceleration of LLMs. By integrating semi-autoregressive inference with speculative decoding, SPACE enables autoregressive LLMs to parallelize token generation and verification. This is achieved through a specialized semi-autoregressive supervised fine-tuning process that equips existing LLMs with the ability to predict multiple tokens simultaneously. In addition, an auto-correct decoding algorithm generates and verifies token sequences within a single model invocation. In extensive experiments on a range of LLMs, SPACE achieved inference speedups of 2.7x to 4.0x on HumanEval-X while maintaining output quality.
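
The abstract combines two ingredients: a model that proposes several tokens at once, and a decoding loop that verifies the previous proposal and drafts the next one in the same model invocation. The Python sketch below is a toy illustration of that generate-and-verify loop under greedy decoding, not the authors' implementation. `greedy_next` and `draft_guess` are hypothetical stand-ins for the fine-tuned LLM, the draft length `k` is an arbitrary choice, and each `model_call` only simulates what SPACE performs in a single forward pass. The acceptance rule used here (keep the longest draft prefix that matches the model's greedy choices, then add one model token) is the standard greedy speculative-decoding rule; the paper's exact procedure may differ.

```python
from typing import List, Tuple

VOCAB = 17  # toy vocabulary size (illustrative assumption)


def greedy_next(seq: List[int]) -> int:
    """Hypothetical stand-in for the LLM's greedy (argmax) next-token choice."""
    return (3 * seq[-1] + 1) % VOCAB


def draft_guess(seq: List[int], k: int) -> List[int]:
    """Hypothetical stand-in for semi-autoregressive multi-token prediction:
    proposes k tokens at once, but is deliberately wrong whenever the true
    token is a multiple of 5."""
    out, ctx = [], list(seq)
    for _ in range(k):
        tok = greedy_next(ctx)
        ctx.append(tok)
        out.append((tok + 1) % VOCAB if tok % 5 == 0 else tok)
    return out


def model_call(seq: List[int], draft: List[int], k: int) -> Tuple[List[int], List[int]]:
    """One simulated invocation: verify the previous draft, then propose a new one."""
    accepted: List[int] = []
    for tok in draft:
        if tok != greedy_next(seq + accepted):
            break  # first mismatch: discard the rest of the draft
        accepted.append(tok)
    # The model always contributes one token of its own: a correction after a
    # mismatch, or the next token when the whole draft was accepted.
    accepted.append(greedy_next(seq + accepted))
    new_draft = draft_guess(seq + accepted, k)
    return accepted, new_draft


def auto_correct_decode(prompt: List[int], max_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Generate max_new tokens; return them with the number of model calls used."""
    seq, draft, calls = list(prompt), [], 0
    while len(seq) - len(prompt) < max_new:
        accepted, draft = model_call(seq, draft, k)
        seq.extend(accepted)
        calls += 1
    return seq[len(prompt):len(prompt) + max_new], calls


def plain_greedy(prompt: List[int], max_new: int) -> List[int]:
    """Reference: ordinary one-token-per-call autoregressive decoding."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(greedy_next(seq))
    return seq[len(prompt):]


if __name__ == "__main__":
    fast, calls = auto_correct_decode([2], max_new=32, k=4)
    assert fast == plain_greedy([2], max_new=32)  # lossless under greedy decoding
    print(f"{calls} model calls instead of 32")
```

Every kept token equals the greedy choice given everything before it, so the output matches plain autoregressive decoding token for token. The speedup comes only from accepting several drafted tokens per call, so it depends on how often the drafts are correct.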

Cite this study

Yi et al. (2024). Generation Meets Verification: Accelerating Large Language Model Inference with Smart Parallel Auto-Correct Decoding.

www.synapsesocial.com/papers/68e78b99b6db6435876fdc61 — DOI: https://doi.org/10.48550/arxiv.2402.11809

Authors

Hanling Yi

Feng‐Huei Lin

Hongbin Li
