March 3, 2026 | Open Access

LLM4ATS: Applying Large Language Models for Auto-Testing Scripts in Automobiles

Key Points

The LLM4ATS framework generates automated test scripts from natural language descriptions and improves generation quality over zero-shot and few-shot baselines.
With LLM4ATS, GPT-4 achieved a script pass rate of 91%, up from 42% in zero-shot mode; pass rates also improved for every other model tested.
Integration of rule-guided mechanisms ensures rigorous syntax validation and semantic compliance checks.
Expert evaluations confirm the generated scripts meet industry standards for correctness and readability.

Abstract

This paper introduces LLM4ATS, a framework that integrates large language models (LLMs), retrieval-augmented generation (RAG), and closed-loop verification to automatically generate highly reliable automotive automated test scripts from natural language descriptions. ATS scripts have a complex linguistic structure, follow strict rules, and depend strongly on the in-vehicle communication database. To address these challenges, LLM4ATS employs fine-grained line-level generation and a rule-guided iterative refinement mechanism. The framework first enriches the prompt context by using RAG to retrieve relevant information from purpose-built syntax and case knowledge bases. Each generated script line then passes through a two-stage validator: syntax validation, followed by semantic compliance checks of signal paths and value domains against the communication database. Any error triggers structured feedback that drives iterative refinement by the LLM until the script is fully compliant. The framework was evaluated on real ATS datasets with GPT-3.5, GPT-4, Qwen2.5-7B, and Qwen2.5-72B-Instruct. Experimental results show that, compared with zero-shot and few-shot baselines, LLM4ATS significantly improves generation quality and pass rates across all models. Notably, GPT-4, the strongest model, achieved a script pass rate of 91% with LLM4ATS, up from 42% in zero-shot mode, and the generated scripts were functionally validated on a specified in-vehicle hardware platform (the Chery Fengyun T28 dashboard). Expert manual evaluations also confirmed the strong performance of the generated scripts in correctness, readability, and compliance with industry standards.
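
The line-level generate, validate, and refine loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The script grammar (`SET`, `CHECK`, `WAIT`), the signal names, the toy communication database `COMM_DB`, and the functions `validate_line`, `generate_line`, and `stub_generator` are all hypothetical, chosen only to show the control flow. The RAG retrieval step is omitted, and the LLM call is replaced by a stub that returns canned attempts.

```python
import re
from typing import Callable, Optional

# Hypothetical line grammar (an assumption, not the real ATS syntax):
#   SET <signal> = <int> | CHECK <signal> == <int> | WAIT <ms>
SET_RE = re.compile(r"^SET\s+([A-Za-z_][\w.]*)\s*=\s*(-?\d+)$")
CHECK_RE = re.compile(r"^CHECK\s+([A-Za-z_][\w.]*)\s*==\s*(-?\d+)$")
WAIT_RE = re.compile(r"^WAIT\s+(\d+)$")

# Toy stand-in for the communication database: signal path -> (min, max).
COMM_DB = {"Cluster.SpeedDisplay": (0, 260), "Body.TurnLeft": (0, 1)}


def validate_line(line: str, db: dict) -> Optional[str]:
    """Two-stage check. Returns None if compliant, else a feedback message."""
    line = line.strip()
    # Stage 1: syntax validation against the hypothetical grammar.
    if WAIT_RE.match(line):
        return None
    m = SET_RE.match(line) or CHECK_RE.match(line)
    if m is None:
        return f"syntax: cannot parse {line!r}"
    # Stage 2: semantic compliance (signal path, then value domain).
    signal, value = m.group(1), int(m.group(2))
    if signal not in db:
        return f"semantic: unknown signal path {signal!r}"
    lo, hi = db[signal]
    if not lo <= value <= hi:
        return f"semantic: value {value} outside [{lo}, {hi}] for {signal!r}"
    return None


def generate_line(step: str, generate: Callable[[str, list], str],
                  db: dict, max_rounds: int = 3) -> str:
    """Ask the generator for one line, feeding back errors until it passes."""
    feedback: list = []
    for _ in range(max_rounds):
        candidate = generate(step, feedback)
        error = validate_line(candidate, db)
        if error is None:
            return candidate.strip()
        feedback.append(error)  # structured feedback for the next attempt
    raise ValueError(f"no compliant line after {max_rounds} rounds: {feedback}")


def stub_generator(step: str, feedback: list) -> str:
    """Stand-in for an LLM call: one canned attempt per feedback round."""
    attempts = [
        "SET Cluster.Speed = 300",         # unknown signal path
        "SET Cluster.SpeedDisplay = 300",  # value outside the domain
        "SET Cluster.SpeedDisplay = 120",  # compliant
    ]
    return attempts[min(len(feedback), len(attempts) - 1)]


if __name__ == "__main__":
    print(generate_line("set displayed speed to 120", stub_generator, COMM_DB))
```

In this sketch the validator returns a message rather than raising, so the same string can be appended to the feedback list and passed back to the generator on the next round. Bounding the loop with `max_rounds` is a design assumption; the abstract states only that refinement continues until the script is compliant.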

Authors

Zeyuan Li

Wei Li

Yuezhao Liu

Journals

Big Data and Cognitive Computing

SHILAP Revista de lepidopterología

Institutions

South China University of Technology

Nanchang University

Guangzhou Experimental Station
