What question did this study set out to answer?

The aim is to improve the performance of Arabic LLMs on diacritized text through a structured knowledge graph and prompting strategy.

May 26, 2026 | Open Access

ADL-KG: Diacritic-Aware Knowledge Graph Prompting for Arabic LLM Question Answering

Key Points

Developed the Arabic Diacritic Lexical Knowledge Graph (ADL-KG) linking diacritized and undiacritized forms.
Introduced Diacritic-Aware Knowledge Graph Prompting (DA-KGP) to integrate linguistic features into LLM inputs.
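Evaluated on the Arabic Reading Comprehension Dataset under zero-shot and few-shot question answering across AraGPT2-base, BLOOMZ-560M, SILMA-v1, and LLaMA 3.1-8B.
DA-KGP mitigates the performance degradation caused by fully diacritized prompts, improving average BERTScore-F1 by 5.96 points for AraGPT2-base.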
SILMA-v1 achieves 21.57 BLEU and 81.31% BERTScore-F1 in the KG-enhanced two-shot setup.
LLaMA 3.1-8B reaches the highest overall performance with 82.54% BERTScore-F1 under KG-enhanced prompting.

Abstract

Arabic’s complex morphological system and the optional use of short vowels (tashkīl) introduce substantial lexical ambiguity, posing significant challenges for Large Language Models (LLMs). While diacritics enhance linguistic precision, LLMs trained predominantly on undiacritized corpora often exhibit performance degradation when processing fully diacritized inputs due to representation shifts and tokenization inconsistencies. To address this limitation, we propose the Arabic Diacritic Lexical Knowledge Graph (ADL-KG), a structured framework that links diacritized and undiacritized forms through integrated lexical, morphological, and semantic knowledge. Building upon this resource, we introduce Diacritic-Aware Knowledge Graph Prompting (DA-KGP), a prompt augmentation strategy that injects explicit linguistic features into LLM inputs to facilitate robust interpretation of diacritized Arabic text. The framework is evaluated on the Arabic Reading Comprehension Dataset under zero-shot and few-shot question answering across AraGPT2-base, BLOOMZ-560M, SILMA-v1, and LLaMA 3.1-8B. Performance is assessed using Exact Match, BLEU, ROUGE-1, and BERTScore-F1. Experimental results show that fully diacritized prompts significantly degrade baseline performance, whereas DA-KGP consistently mitigates this effect by improving semantic alignment across diverse architectures. For AraGPT2-base, KG augmentation improves average BERTScore-F1 by +5.96 points. SILMA-v1 achieves the strongest lexical improvements, reaching 21.57 BLEU and 81.31% BERTScore-F1 in the KG-enhanced two-shot configuration. LLaMA 3.1-8B achieves the highest overall semantic performance with 82.54% BERTScore-F1 under KG-enhanced prompting, while BLOOMZ-560M also demonstrates statistically significant semantic gains through structured augmentation. 
These findings demonstrate that morphologically informed prompting and structured lexical grounding provide an effective and parameter-efficient strategy for improving the robustness and semantic fidelity of Arabic LLMs under fully diacritized input conditions.
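As a rough illustration of the idea described above, the following Python sketch links a diacritized token to its undiacritized form, looks it up in a small lexical table, and prepends the retrieved features to a question-answering prompt. It is a minimal sketch under stated assumptions, not the authors' implementation: the `LexEntry` fields, the `TOY_KG` table, the `lookup` fallback rule, and the prompt layout are all hypothetical, since the abstract does not specify the ADL-KG schema or the DA-KGP template.

```python
from dataclasses import dataclass

# Arabic short-vowel marks (tashkil): U+064B..U+0652, plus superscript alef U+0670.
TASHKIL = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}

KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # kataba, "he wrote"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"         # kutub, "books"
KTB = "\u0643\u062A\u0628"                       # shared undiacritized form


def strip_diacritics(text: str) -> str:
    """Return text with tashkil marks removed; base letters are untouched."""
    return "".join(ch for ch in text if ch not in TASHKIL)


@dataclass(frozen=True)
class LexEntry:
    """Hypothetical node layout; the real ADL-KG schema is not given in the abstract."""
    diacritized: str
    pos: str
    gloss: str


# Toy stand-in for the knowledge graph: undiacritized form -> candidate readings.
TOY_KG = {
    KTB: [
        LexEntry(KATABA, "verb", "he wrote"),
        LexEntry(KUTUB, "noun", "books"),
    ]
}


def lookup(token: str, kg: dict) -> list:
    """Prefer entries matching the diacritized token exactly; otherwise
    fall back to every candidate listed under the undiacritized form."""
    candidates = kg.get(strip_diacritics(token), [])
    exact = [e for e in candidates if e.diacritized == token]
    return exact or candidates


def build_prompt(context: str, question: str, kg: dict) -> str:
    """Prepend lexical hints for known tokens to a plain QA prompt (assumed layout)."""
    hints = []
    for token in context.split():
        for e in lookup(token, kg):
            hints.append(f"- {token} -> {strip_diacritics(token)} ({e.pos}; {e.gloss})")
    parts = []
    if hints:
        parts.append("Lexical hints:\n" + "\n".join(hints))
    parts += [f"Context: {context}", f"Question: {question}", "Answer:"]
    return "\n".join(parts)
```

In this sketch a fully diacritized token resolves to a single reading, while an undiacritized or partially diacritized token returns all candidates, which is one simple way to expose the ambiguity to the model rather than resolve it silently.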
