July 15, 2025 · Open Access

Integrating Large Language Models into Robotic Autonomy: A Review of Motion, Voice, and Training Pipelines

Key Points

Large language models improve robotic autonomy across locomotion, navigation, manipulation, and voice-based interaction.
TrustNavGPT reduces the word error rate to 5.7% for voice commands, enhancing navigation under noisy conditions.
The integration of multi-modal data in frameworks like MapGPT supports robust planning and real-time execution.
Best practices identified in this review aim to bridge the gap between simulation training and real-world robotic deployment.

Abstract

This survey provides a comprehensive review of the integration of large language models (LLMs) into autonomous robotic systems, organized around four key pillars: locomotion, navigation, manipulation, and voice-based interaction. We examine how LLMs enhance robotic autonomy by translating high-level natural language commands into low-level control signals, supporting semantic planning and enabling adaptive execution. Systems like SayTap improve gait stability through LLM-generated contact patterns, while TrustNavGPT achieves a 5.7% word error rate (WER) under noisy voice-guided conditions by modeling user uncertainty. Frameworks such as MapGPT, LLM-Planner, and 3D-LOTUS++ integrate multi-modal data—including vision, speech, and proprioception—for robust planning and real-time recovery. We also highlight the use of physics-informed neural networks (PINNs) to model object deformation and support precision in contact-rich manipulation tasks. To bridge the gap between simulation and real-world deployment, we synthesize best practices from benchmark datasets (e.g., RH20T, Open X-Embodiment) and training pipelines designed for one-shot imitation learning and cross-embodiment generalization. Additionally, we analyze deployment trade-offs across cloud, edge, and hybrid architectures, emphasizing latency, scalability, and privacy. The survey concludes with a multi-dimensional taxonomy and cross-domain synthesis, offering design insights and future directions for building intelligent, human-aligned robotic systems powered by LLMs.
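For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognized text, divided by the number of reference words. The abstract does not state how TrustNavGPT's WER was measured, so the function name `word_error_rate`, the lowercase-and-split normalization, and the sample commands are assumptions made purely for illustration. Under this definition, a 5.7% WER corresponds to about 57 word-level errors per 1,000 reference words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference length.

    Illustrative only; normalization here is a simple lowercase + whitespace split,
    which is an assumption and not the procedure used in the reviewed work.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words consumed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical voice command in which the recognizer drops one of six words.
example_wer = word_error_rate("go to the kitchen and stop", "go to kitchen and stop")
```

Because the denominator is the reference length, insertions can push the value above 1.0, so WER is best read as an error count per reference word rather than a bounded percentage.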

Cite this study

Liu et al. (2025) studied this question.

www.synapsesocial.com/papers/689a02bce6551bb0af8cc7bd — DOI: https://doi.org/10.3390/ai6070158

Authors

Yutong Liu

Qingquan Sun

Dhruvi Rajeshkumar Kapadia

Institutions

California State University, San Bernardino
