March 3, 2026 · Open Access

Using LLMs to Extract UML Class Diagrams from Java and Python Programs: An Empirical Study

Key Points

The study finds that large language models can effectively generate UML class diagrams from code.
Accuracy and F1 scores are notably higher for Python programs than for Java programs, indicating that performance depends on the source language.
Five large language models are evaluated on generating UML class diagrams: StarCoder2, LLaMA, CodeLlama, Mistral, and DeepSeek.
DeepSeek and Mistral outperform the other models, while LLaMA scores lowest on every metric.

Abstract

In this paper, we present a comprehensive study of the capabilities of five large language models (LLMs), namely StarCoder2, LLaMA, CodeLlama, Mistral, and DeepSeek, for abstracting UML class diagrams from code. Our aim is to give researchers and developers insight into the capabilities and limitations of different LLMs when used in a model-driven reverse engineering process. We evaluate the LLMs by prompting them to generate UML class diagrams for both Java and Python programs, focusing on accuracy, consistency, and F1 score. Our findings show that all LLMs achieve higher accuracy and F1 scores for Python than for Java. DeepSeek and Mistral perform best overall, while LLaMA performs worst on every metric for both languages.
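The abstract does not say how the F1 score is computed. As a minimal sketch, assuming a generated diagram is compared against a reference diagram element by element, each diagram can be treated as a set of hypothetical element tuples such as `("class", "Order")` or `("attribute", "Order", "total")`, and F1 is the harmonic mean of precision and recall over those sets. The element encoding, the function name `precision_recall_f1`, and the toy diagrams below are illustrative assumptions, not the authors' method or data.

```python
from typing import Set, Tuple

# Hypothetical element encoding (an assumption, not the study's format):
# ("class", name), ("attribute", owner, name), ("association", source, target)
Element = Tuple[str, ...]


def precision_recall_f1(predicted: Set[Element], expected: Set[Element]) -> Tuple[float, float, float]:
    """Compare a generated diagram with a reference diagram as sets of elements."""
    true_positives = len(predicted & expected)  # elements the model got exactly right
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


if __name__ == "__main__":
    # Toy reference diagram and a hypothetical LLM output (not results from the study).
    reference = {
        ("class", "Order"),
        ("class", "Customer"),
        ("attribute", "Order", "total"),
        ("association", "Customer", "Order"),
    }
    generated = {
        ("class", "Order"),
        ("class", "Customer"),
        ("attribute", "Order", "id"),
    }
    p, r, f = precision_recall_f1(generated, reference)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

With two of three generated elements correct and two of four reference elements recovered, this toy case gives precision 0.67, recall 0.50, and F1 0.57. Exact set matching is a strict choice; a real evaluation might instead allow partial credit for, say, an attribute with the right name but the wrong type.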

Authors

Hanan Siala (ORCID 0009-0003-4693-8707)

King's College London

Kevin Lano (ORCID 0000-0002-9706-1410)
