What question did this study set out to answer?

This research aims to improve the classification of knowledge points in exam questions by incorporating spatial layout information.

April 4, 2026 | Open Access

Heterogeneous Layout-Aware Cross-Modal Knowledge Point Classification for Exam Questions

Key Points

Proposed a heterogeneous layout-aware cross-modal framework for classification.
Developed independent text and layout encoders to extract semantic content and spatial configurations.
Designed a layout-aware enhancing module with cross-modal blocks for bidirectional fusion of features.
Introduced a dynamic router for adapting to question-specific knowledge distributions.
Achieved 91.56% accuracy for coarse-grained classification on the proposed QType-EDU dataset.
Attained 80.58% accuracy for fine-grained classification on the same dataset.
Recorded an overall F1-score of 91.39%, surpassing all baseline models.
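
The bidirectional fusion named in the key points can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product cross-attention with a residual connection, no learned projections, and mean pooling followed by concatenation. The function names (cross_attend, fuse) and the toy feature vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Each query vector attends over the context vectors.

    Assumption: no learned projections. The output is the query plus the
    attention-weighted average of the context (a residual connection).
    """
    dim = len(queries[0])
    enhanced = []
    for q in queries:
        # Scaled dot-product score between this query and every context vector.
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        mixed = [sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)]
        enhanced.append([qi + mi for qi, mi in zip(q, mixed)])
    return enhanced

def mean_pool(rows):
    """Average a list of equal-length vectors into one vector."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fuse(text_feats, layout_feats):
    """Run both attention directions, then pool and concatenate."""
    text_enh = cross_attend(text_feats, layout_feats)    # text queries layout
    layout_enh = cross_attend(layout_feats, text_feats)  # layout queries text
    return mean_pool(text_enh) + mean_pool(layout_enh)

# Toy inputs: three text tokens and two layout regions, each of dimension 2.
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
layout_feats = [[0.5, 0.5], [1.0, 0.0]]
fused = fuse(text_feats, layout_feats)
print(len(fused))
```

The fused vector has twice the feature dimension because the two pooled streams are concatenated. A real model would add learned query, key, and value projections and would likely use a different fusion step.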

Abstract

As new exam question types continue to emerge, accurate classification of knowledge points is crucial for intelligent exam analysis. Existing methods focus on text or text–image fusion but largely ignore spatial layout. To address this limitation, we propose a heterogeneous layout-aware cross-modal framework for knowledge point classification. The architecture begins with an encoding module in which independent text and layout encoders extract semantic content and spatial configurations, respectively. We then design a layout-aware enhancing module consisting of two parallel cross-modal blocks: a Layout-Aware Text-Enhancing block and a Context-Aware Layout-Enhancing block. This module fuses text and layout features bidirectionally and generates a comprehensive representation that integrates both semantic and spatial information. Furthermore, a dynamic router with top-k expert selection adapts to question-specific knowledge distributions and focuses on core knowledge points for precise classification. Experimental results on the proposed QType-EDU dataset demonstrate that our method effectively integrates text and layout information and significantly improves performance. The approach achieves 91.56% accuracy for coarse-grained classification and 80.58% for fine-grained classification, with an overall F1-score of 91.39%, surpassing all baseline models.
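
Top-k expert selection, as mentioned in the abstract, can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's router: the gate logits are hard-coded here (in a trained model they would come from a learned gating layer), the selected weights are renormalized with a softmax over the chosen logits only, and the experts are hypothetical toy functions that scale their input.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k):
    """Return {expert_index: weight} for the k highest-scoring experts.

    Weights are a softmax over the selected logits only, so they sum to 1.
    """
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return dict(zip(chosen, weights))

def mixture_output(features, experts, gate_logits, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    routing = route_top_k(gate_logits, k)
    out = [0.0] * len(features)
    for idx, weight in routing.items():
        expert_out = experts[idx](features)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out

# Hypothetical toy experts: each one simply scales the input vector.
experts = [
    lambda x: [1.0 * v for v in x],
    lambda x: [2.0 * v for v in x],
    lambda x: [3.0 * v for v in x],
    lambda x: [4.0 * v for v in x],
]
gate_logits = [0.1, 2.0, -1.0, 2.0]  # assumed values for illustration
routing = route_top_k(gate_logits, k=2)
mixed = mixture_output([1.0, 1.0], experts, gate_logits, k=2)
print(sorted(routing))
print(mixed)
```

Only the selected experts are evaluated, which is what lets a router of this kind concentrate capacity on the few experts most relevant to a given question.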
