Purpose The automated classification of construction accident reports is often hampered by persistent challenges such as low accuracy, data scarcity, class imbalance, and inadequate semantic representation. To overcome these challenges, this study develops a resource-efficient, high-performing ensemble framework that enhances safety information management, enables timely risk identification, and supports data-driven decision-making in construction safety.
Design/methodology/approach A construction accident report dataset was processed using random oversampling to address severe class imbalance. Four large language models (DeBERTa-v3-Large, T5-Large, OPT, and Ettin-Encoder-1B) were adapted through parameter-efficient fine-tuning (PEFT) using the Low-Rank Adaptation (LoRA) method. Individual model predictions were aggregated via three ensemble techniques (hard voting, soft voting, and stacking), with stacking selected as the final integration strategy. The proposed ensemble framework was benchmarked against the individual fine-tuned models and three conventional baseline classifiers.
Findings The ensemble models based on hard voting and stacking achieved the highest macro-average F1 score of 0.9377, surpassing all soft voting configurations as well as every individual LLM. In particular, the hard voting ensemble combining OPT, Ettin-Encoder-1B, and T5-Large and the stacking model using OPT and T5-Large demonstrated superior and stable performance across most accident subcategories. Both ensembles significantly outperformed single-model baselines, especially in the accurate identification of minority classes. Random oversampling effectively reduced class imbalance, while parameter-efficient fine-tuning maintained strong performance despite limited computational resources.
Originality/value This study presents a novel integration of parameter-efficient fine-tuned large language models with ensemble learning for construction accident report classification. It introduces an effective, resource-efficient methodology that simultaneously tackles data imbalance and semantic complexity in safety-related text analysis. The framework offers practical value by enabling accurate, automated analysis of accident reports, reducing dependence on labor-intensive manual coding, and facilitating proactive safety management, hazard mitigation, and regulatory compliance in the construction industry.
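For readers unfamiliar with the building blocks named in the abstract, the following is a minimal, standard-library sketch of random oversampling, hard voting, soft voting, and the macro-average F1 score. It is illustrative only and is not the authors' implementation: the function names, the tie-breaking rule in hard voting, and the category labels in the usage note are assumptions, and LoRA fine-tuning and stacking are omitted because they require model-specific tooling.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def hard_vote(predictions):
    """predictions: one list of predicted labels per model.
    Returns the majority label per sample. Ties go to the label
    predicted by the earliest model in the list (an assumption)."""
    voted = []
    for sample_preds in zip(*predictions):
        counts = Counter(sample_preds)
        best = max(counts.values())
        voted.append(next(p for p in sample_preds if counts[p] == best))
    return voted


def soft_vote(probabilities, classes):
    """probabilities: per model, a list of per-sample probability
    vectors aligned with `classes`. Averages the vectors across
    models and returns the class with the highest mean probability."""
    voted = []
    for sample_probs in zip(*probabilities):
        mean = [sum(col) / len(col) for col in zip(*sample_probs)]
        voted.append(classes[mean.index(max(mean))])
    return voted


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so minority classes
    count as much as majority classes."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

As a usage illustration with hypothetical labels, `hard_vote([["fall", "fire"], ["fall", "struck"], ["fire", "struck"]])` picks the label most models agree on for each report, while `soft_vote` averages class probabilities before choosing. Because the macro average weights every class equally, it is sensitive to errors on rare accident categories, which is why it suits imbalanced data.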
Yì Wáng
Jing Li
Shujie Wu
Engineering Construction & Architectural Management
The University of Adelaide
Qingdao University of Science and Technology
Bond University
Wáng et al. studied this question.
www.synapsesocial.com/papers/69fd7f4fbfa21ec5bbf07cf7 — DOI: https://doi.org/10.1108/ecam-12-2025-2035