March 3, 2026 · Open Access

Aiding Software Root Cause Analysis with Large Language Models: Evaluation of the Effectiveness of Fine-tuned T5, GPT, and RAG in Handling Customer Fault Reports

Key Points

T5 outperforms GPT-2 in lexical and structural fidelity to reference resolutions, suggesting it is well suited to reproducing the precise technical phrasing that root cause analysis requires.
T5 reaches a BLEU-4 score of 0.1810 versus 0.1210 for GPT-2, a notable gap in n-gram overlap with the reference resolutions.
Three models (T5, GPT-2, and RAG) were fine-tuned and evaluated on a curated dataset of real-world fault descriptions and resolutions using BLEU-4, ROUGE, and BERT-based semantic similarity metrics.
RAG achieves the highest semantic similarity (BERT score: 0.7715), so combining T5's precision in technical phrasing with RAG's contextual understanding may lead to more accurate and relevant root cause analysis assistance tools.
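
The BLEU-4 figures above measure n-gram overlap between a generated resolution and a reference resolution. They do not measure efficiency. As a rough illustration, the sketch below computes unsmoothed sentence-level BLEU-4 using the standard definition: clipped n-gram precisions for n = 1 to 4, their geometric mean, and a brevity penalty. It assumes lowercase whitespace tokenization and a single reference. The study's exact configuration (tokenizer, smoothing, sentence- vs. corpus-level averaging) is not stated here, so this code will not necessarily reproduce its numbers. The function names and the example fault-resolution strings are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all contiguous n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu4(candidate: str, reference: str) -> float:
    """Unsmoothed sentence-level BLEU-4 against a single reference.

    Assumption: simple lowercase whitespace tokenization.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped ("modified") precision: each candidate n-gram is credited
        # at most as many times as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision makes BLEU 0
        log_precision_sum += math.log(overlap / total)

    # Brevity penalty: only candidates shorter than or equal to the reference
    # take this branch; equal length yields exp(0) = 1.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Uniform weights of 1/4 give the geometric mean of the four precisions.
    return bp * math.exp(log_precision_sum / 4)


# Hypothetical example, not taken from the study's dataset.
reference = "restart the config service after clearing the stale cache entry"
candidate = "restart the config service after clearing the cache"
print(round(sentence_bleu4(candidate, reference), 4))  # prints 0.6771
```

In this example the candidate matches every reference unigram. Longer n-grams lose matches around the missing word "stale", and the brevity penalty (8 candidate tokens vs. 10 reference tokens) lowers the score further. For numbers that can be compared across studies, a maintained implementation such as sacreBLEU is preferable to a hand-rolled version.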

Abstract

Software systems generate a substantial number of fault reports during pre-deployment customer testing, making manual root cause analysis (RCA) both time-consuming and error-prone. This study explores the use of large language models (LLMs)—specifically T5, GPT-2, and a retrieval-augmented generation (RAG) model—to automate and enhance the RCA process in a domain-specific software engineering setting. Using a curated dataset of real-world fault descriptions and resolutions, the models were fine-tuned and evaluated using BLEU-4, ROUGE, and BERT-based semantic similarity metrics. Results indicate that T5 outperforms GPT-2 in lexical and structural fidelity (e.g., BLEU-4: 0.1810 vs. 0.1210), while RAG achieves the highest semantic similarity (BERT score: 0.7715). These findings suggest that combining T5’s precision in technical phrasing with RAG’s contextual understanding may offer a promising direction for developing intelligent RCA assistance tools that improve both accuracy and relevance in software fault diagnosis. Future work will focus on hybrid model optimization and user-centered system integration for real-world engineering workflows.
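
The abstract reports ROUGE without naming a variant. Assuming ROUGE-L, which scores the longest common subsequence (LCS) of tokens, here is a minimal sketch of its F1 form: LCS-based precision and recall combined by their harmonic mean. It assumes lowercase whitespace tokenization with no stemming. The function names and example strings are hypothetical and are not the study's evaluation code.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    # prev[j] is the LCS length of the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 between a candidate and one reference (assumed variant)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical example, not taken from the study's dataset.
reference = "restart the config service after clearing the stale cache entry"
candidate = "restart the config service after clearing the cache"
print(round(rouge_l_f1(candidate, reference), 4))  # prints 0.8889
```

ROUGE-L rewards in-order matches even when they are not contiguous. All 8 candidate tokens therefore fall in the LCS, giving precision 1.0, recall 0.8 and F1 = 8/9. A BERT-based similarity score goes a step further and compares contextual embeddings rather than surface tokens, so paraphrases with little word overlap can still score well. That requires a pretrained encoder and is not sketched here.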

Authors

SHIJUN FENG