Large Language Models (LLMs) open new opportunities for adaptive automation in production systems by enabling robots to interpret human instructions and generate context-aware actions. In contrast to conventional robot programming, which requires expert knowledge and frequent reconfiguration, LLM-based control promises greater flexibility and easier interaction between humans and machines. However, generic LLMs still face major challenges when applied to manufacturing environments, as they lack grounding in real-world perception and may produce infeasible or unsafe actions. This paper presents a laboratory demonstrator that evaluates how different prompting strategies affect the performance of an LLM-controlled pick-and-place robot. The study systematically compares zero-shot and multimodal few-shot prompting, where visual examples such as annotated video frames and image captions are integrated into the LLM input. A dedicated evaluation model with metrics for plan success, action success, and plan optimality is used to quantify system behavior. The experimental results demonstrate that multimodal few-shot prompting significantly improves planning accuracy, robustness, and adaptability compared to a zero-shot baseline. These findings illustrate the potential of LLM-driven control for future intelligent production systems that combine semantic reasoning, multimodal perception, and human-interpretable automation.
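The abstract names three metrics (plan success, action success, plan optimality) without defining them. As a rough illustration only, the sketch below shows one plausible way such metrics could be computed from logged trials. The `Trial` record, its fields, and all three formulas are assumptions made for this example; they are not taken from the paper, whose evaluation model may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    """Hypothetical log record for one pick-and-place task (not from the paper)."""
    goal_reached: bool       # did the executed plan achieve the task goal?
    actions_attempted: int   # number of robot actions that were executed
    actions_succeeded: int   # number of those actions that succeeded
    plan_length: int         # number of steps in the LLM-generated plan
    optimal_length: int      # assumed known minimal number of steps


def plan_success_rate(trials: List[Trial]) -> float:
    """Assumed definition: fraction of trials whose plan reached the goal."""
    if not trials:
        return 0.0
    return sum(t.goal_reached for t in trials) / len(trials)


def action_success_rate(trials: List[Trial]) -> float:
    """Assumed definition: succeeded actions over attempted actions, pooled."""
    attempted = sum(t.actions_attempted for t in trials)
    if attempted == 0:
        return 0.0
    return sum(t.actions_succeeded for t in trials) / attempted


def plan_optimality(trials: List[Trial]) -> float:
    """Assumed definition: mean of optimal_length / plan_length over successful trials."""
    ratios = [
        t.optimal_length / t.plan_length
        for t in trials
        if t.goal_reached and t.plan_length > 0
    ]
    if not ratios:
        return 0.0
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up example data, purely to exercise the functions.
    demo = [
        Trial(True, 4, 4, 4, 4),
        Trial(False, 5, 3, 5, 4),
        Trial(True, 6, 6, 6, 3),
    ]
    print(plan_success_rate(demo), action_success_rate(demo), plan_optimality(demo))
```

Under these assumed definitions, comparing a zero-shot run with a multimodal few-shot run would amount to computing the three values for each set of logged trials and comparing them side by side.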
Dominik Koch
Jakob Wolber
Zhuo Shi
Koch et al. studied this question.
www.synapsesocial.com/papers/6a06b83de7dec685947aab35 — DOI: https://doi.org/10.5445/ir/1000193181