Indonesian court decisions contain rich legal knowledge about how judges interpret statutes, assess evidence, and determine sentences, yet much of this knowledge remains hidden in semi-structured, inconsistent documents. Rule-based systems fail on lengthy narratives, machine learning models overfit to noise, and large language models risk factual errors. This study introduces a hybrid framework that combines rule-based extraction for structured sections with a BERT-based pipeline for narrative text. The framework was developed on a dataset of 9,109 narcotics and corruption cases, from which seven legal experts manually annotated 200 decisions. A relevance filter cleaned the text before entity extraction with a fine-tuned LegalBERT model. The entity extraction pipeline achieved an average F1-score of 84.39% across both domains (86.1% for corruption cases and 82.3% for narcotics cases), and the resulting knowledge graph enabled sentence length prediction with 85% accuracy, a 56.9-percentage-point improvement over full-text baselines, which reached only 26–28% despite 95% training accuracy. This performance gap suggests that structured graph representations may better capture legally meaningful patterns that unstructured text alone tends to miss. Our study introduces what we believe is the first legal knowledge graph of Indonesian court decisions.
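The hybrid split described in the abstract can be illustrated with a minimal sketch: a rule for a structured header field, and a relevance filter followed by an entity tagger for narrative sentences. Everything named here is an assumption made for illustration and is not taken from the paper: the `CASE_NUMBER` pattern, the function names, and the toy `toy_relevant` and `toy_tagger` callables, which merely stand in for the trained relevance filter and the fine-tuned LegalBERT model.

```python
import re
from typing import Callable, Dict, List, Tuple

Entity = Tuple[str, str]  # (surface text, label)

# Hypothetical header rule; real decisions may format this line differently.
CASE_NUMBER = re.compile(r"Nomor\s*:?\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE)


def extract_structured(text: str) -> Dict[str, str]:
    """Rule-based path: pull a field out of a structured section."""
    match = CASE_NUMBER.search(text)
    return {"case_number": match.group(1)} if match else {}


def extract_narrative(
    sentences: List[str],
    is_relevant: Callable[[str], bool],
    tagger: Callable[[str], List[Entity]],
) -> List[Entity]:
    """Model-based path: drop irrelevant sentences, then tag the rest."""
    entities: List[Entity] = []
    for sentence in sentences:
        if is_relevant(sentence):
            entities.extend(tagger(sentence))
    return entities


def toy_relevant(sentence: str) -> bool:
    # Stand-in for a learned relevance filter (assumption).
    return "terdakwa" in sentence.lower()


def toy_tagger(sentence: str) -> List[Entity]:
    # Stand-in for a fine-tuned token classifier (assumption).
    tokens = [word.strip(".,") for word in sentence.split()]
    return [(token, "SUBSTANCE") for token in tokens if token.lower() == "sabu"]


header = "PUTUSAN\nNomor 12/Pid.Sus/2020/PN Smg\n"
narrative = ["Terdakwa membawa sabu.", "Sidang dibuka untuk umum."]

fields = extract_structured(header)
entities = extract_narrative(narrative, toy_relevant, toy_tagger)
```

In a real pipeline the two callables would wrap trained models, and the extracted fields and entities would become nodes and edges of the knowledge graph; the sketch only shows how the two extraction paths can be kept separate.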
Hairurahman et al. studied this question.
www.synapsesocial.com/papers/69df2a4be4eeef8a2a6af855 — DOI: https://doi.org/10.1007/s10506-026-09507-8
Hairurahman Hairurahman
Arif Perdana
Derry Wijaya
Artificial Intelligence and Law
Monash University
State University of Semarang
Indonesian Pediatric Society