Automatically identifying the structural components of Indian court judgments is critical for effective legal document analysis, but it remains challenging because of complex legal language, class imbalance, and limited annotated data. This paper proposes a multiclass structural classification framework for Indian legal judgments, built on a dataset of nearly 6,500 judgment segments from Indian Kanoon that were manually annotated into 15 structural categories. A domain-specific lexicon of 5,000 legal n-grams is used to construct features for the TF-IDF representation. We evaluate statistical representations (TF-IDF), dimensionality reduction (PCA), data augmentation, and contextual embeddings from transformer-based models across multiple machine learning and deep learning classifiers. The best-performing models are then combined using interpolation-based fusion. Experimental results show that a fused Legal-BERT and Indian Legal-BERT model performs best, reaching 84% accuracy, 84% weighted F1-score, and 80% macro recall without data augmentation or manual feature engineering. The performance gains are validated with a paired Wilcoxon signed-rank test (p < 0.05), indicating statistically significant and consistent improvements across structural classes. Finally, explainability tools are used to identify and interpret the tokens that most strongly influence the model's decisions.
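The abstract does not state the exact fusion formula, so the following is only a minimal sketch of one common form of interpolation-based fusion, assuming each model outputs a per-class probability vector and the fused score is the linear interpolation λ·p_A + (1 − λ)·p_B, with λ chosen by grid search on held-out data. The function names (`interpolate_probs`, `fused_predictions`, `tune_lambda`) and the toy probability vectors are hypothetical and stand in for real Legal-BERT and Indian Legal-BERT outputs.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch: linear interpolation of two models' class probabilities.
# Assumes both models score the same ordered set of structural classes.


def interpolate_probs(p_a: Sequence[float], p_b: Sequence[float], lam: float) -> List[float]:
    """Return lam * p_a + (1 - lam) * p_b for one segment."""
    if len(p_a) != len(p_b):
        raise ValueError("probability vectors must have the same length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_a, p_b)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def fused_predictions(probs_a, probs_b, lam: float) -> List[int]:
    """Predicted class index per segment after interpolation."""
    return [argmax(interpolate_probs(pa, pb, lam)) for pa, pb in zip(probs_a, probs_b)]


def accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def tune_lambda(probs_a, probs_b, labels, steps: int = 10) -> Tuple[float, float]:
    """Grid-search lam over {0, 1/steps, ..., 1}; return (best_lam, best_accuracy)."""
    best_lam, best_acc = 0.0, -1.0
    for k in range(steps + 1):
        lam = k / steps
        acc = accuracy(fused_predictions(probs_a, probs_b, lam), labels)
        if acc > best_acc:  # strict '>' keeps the smallest lam among ties
            best_lam, best_acc = lam, acc
    return best_lam, best_acc


# Toy validation data: 2 segments, 3 classes (hypothetical values).
probs_a = [[0.60, 0.40, 0.0], [0.55, 0.45, 0.0]]  # stand-in for model A
probs_b = [[0.45, 0.55, 0.0], [0.20, 0.80, 0.0]]  # stand-in for model B
labels = [0, 1]

best_lam, best_acc = tune_lambda(probs_a, probs_b, labels)
print(best_lam, best_acc)
```

On this toy data each model alone gets one of the two segments right, while an intermediate λ gets both right, which is the kind of complementarity that motivates fusing two encoders. In practice λ would be tuned on a validation split and the chosen metric could be macro recall or weighted F1 instead of accuracy.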
Garg et al. studied this question.