Large language models (LLMs) have advanced rapidly, emerging as versatile tools across fields thanks to their exceptional language understanding, generation, and reasoning capabilities. However, performing LLM inference at the network edge remains challenging due to their large memory and compute demands. This survey outlines the challenges specific to LLM edge inference and provides a comprehensive overview of recent progress, covering system architectures, model optimization and deployment, and resource management and scheduling. By synthesizing state-of-the-art techniques and mapping future directions, this survey aims to unlock the potential of LLMs in resource-constrained edge environments.
Zhixiong Chen
Bingjie Zhu
Jiangzhou Wang
ACM Computing Surveys
Nanyang Technological University
Queen Mary University of London
Kyung Hee University
Chen et al. studied this question.
www.synapsesocial.com/papers/69edacdb4a46254e215b49d0 — DOI: https://doi.org/10.1145/3809166