Arabic’s complex morphological system and the optional marking of short vowels with diacritics (tashkīl) introduce substantial lexical ambiguity, posing significant challenges for Large Language Models (LLMs). While diacritics enhance linguistic precision, LLMs trained predominantly on undiacritized corpora often exhibit performance degradation when processing fully diacritized inputs, owing to representation shifts and tokenization inconsistencies. To address this limitation, we propose the Arabic Diacritic Lexical Knowledge Graph (ADL-KG), a structured framework that links diacritized and undiacritized forms through integrated lexical, morphological, and semantic knowledge. Building upon this resource, we introduce Diacritic-Aware Knowledge Graph Prompting (DA-KGP), a prompt augmentation strategy that injects explicit linguistic features into LLM inputs to facilitate robust interpretation of diacritized Arabic text. The framework is evaluated on the Arabic Reading Comprehension Dataset under zero-shot and few-shot question answering with AraGPT2-base, BLOOMZ-560M, SILMA-v1, and LLaMA 3.1-8B. Performance is assessed using Exact Match, BLEU, ROUGE-1, and BERTScore-F1. Experimental results show that fully diacritized prompts significantly degrade baseline performance, whereas DA-KGP consistently mitigates this effect by improving semantic alignment across diverse architectures. For AraGPT2-base, KG augmentation improves average BERTScore-F1 by +5.96 points. SILMA-v1 achieves the strongest lexical improvements, reaching 21.57 BLEU and 81.31% BERTScore-F1 in the KG-enhanced two-shot configuration. LLaMA 3.1-8B achieves the highest overall semantic performance with 82.54% BERTScore-F1 under KG-enhanced prompting, while BLOOMZ-560M also demonstrates statistically significant semantic gains through structured augmentation.
These findings demonstrate that morphologically informed prompting and structured lexical grounding provide an effective and parameter-efficient strategy for improving the robustness and semantic fidelity of Arabic LLMs under fully diacritized input conditions.
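To make the idea of linking diacritized and undiacritized forms concrete, the following is a minimal illustrative sketch, not the ADL-KG or DA-KGP implementation itself. The toy graph `TOY_KG`, its fields (`pos`, `gloss`), the function names, and the prompt layout are all hypothetical assumptions introduced here for illustration; only the Unicode ranges for the tashkīl marks are standard.

```python
import re

# Arabic tashkil marks: fathatan..sukun (U+064B-U+0652) plus superscript alef (U+0670).
TASHKIL = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Map a diacritized string to its undiacritized surface form."""
    return TASHKIL.sub("", text)


# Hypothetical toy knowledge graph keyed by diacritized form.
# Both entries share the undiacritized form "\u0643\u062A\u0628" (k-t-b).
TOY_KG = {
    "\u0643\u064E\u062A\u064E\u0628\u064E": {"pos": "verb", "gloss": "he wrote"},
    "\u0643\u064F\u062A\u064F\u0628": {"pos": "noun", "gloss": "books"},
}


def augment_prompt(context: str, question: str, kg: dict = TOY_KG) -> str:
    """Prepend lexical hints for every context token found in the graph.

    The prompt layout is an assumption made for this sketch.
    """
    notes = []
    for token in context.split():
        entry = kg.get(token)
        if entry:
            plain = strip_diacritics(token)
            notes.append(f"{token} -> {plain} ({entry['pos']}: {entry['gloss']})")
    hints = "\n".join(notes)
    return (
        f"Lexical hints:\n{hints}\n\n"
        f"Context: {context}\nQuestion: {question}\nAnswer:"
    )
```

In this sketch, two diacritized words that collapse to the same undiacritized string receive distinct hints, which is the kind of disambiguating signal a knowledge-graph-augmented prompt is meant to supply to a model trained mostly on undiacritized text.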
Ayat et al. (Sun,) studied this question.