What question did this study set out to answer?

The study aims to develop an efficient framework for automating the classification of construction accident reports.

May 8, 2026

Classifying construction accident reports with parameter-efficient fine-tuned LLMs and ensemble methods

Key Points

The study aims to develop an efficient framework for automating the classification of construction accident reports.
Processed a construction accident report dataset using random oversampling to address severe class imbalance.
Adapted four large language models through parameter-efficient fine-tuning using Low-Rank Adaptation (LoRA).
Aggregated individual model predictions using hard voting, soft voting, and stacking techniques.
Achieved the highest macro-average F1 score of 0.9377 with the hard voting and stacking ensembles.
Hard voting ensemble with OPT, Ettin-Encoder-1B, and T5-Large outperformed individual models in identifying minority classes.
Random oversampling effectively reduced class imbalance, while parameter-efficient fine-tuning maintained strong performance despite limited computational resources.
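The difference between hard and soft voting mentioned in the key points can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tie-breaking rule, and the accident category labels are assumptions chosen for illustration only.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the label predicted by the most models for one report.

    `predictions` holds one predicted label per model. Ties are broken in
    favour of the model listed first (an assumption, not from the study).
    """
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label


def soft_vote(probabilities):
    """Average per-model class probabilities and return the best class index.

    `probabilities` holds one probability vector per model, all of the
    same length (one entry per class).
    """
    n_models = len(probabilities)
    n_classes = len(probabilities[0])
    mean = [sum(p[c] for p in probabilities) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


# Hypothetical labels: two of three models agree, so the majority wins.
label = hard_vote(["fall", "struck_by", "fall"])

# Hypothetical probabilities for two classes from three models.
# Two models lean weakly toward class 0, one leans strongly toward class 1.
probs = [[0.55, 0.45], [0.55, 0.45], [0.10, 0.90]]
hard_choice = hard_vote([max(range(2), key=lambda c: p[c]) for p in probs])
soft_choice = soft_vote(probs)
```

In this toy case the two rules disagree: hard voting picks class 0 because two models favour it, while soft voting picks class 1 because the third model's confidence dominates the average. That sensitivity to confidence is one reason the two strategies can rank differently on the same set of models.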

Abstract

Purpose The automated classification of construction accident reports is hampered by persistent challenges such as low accuracy, data scarcity, class imbalance, and inadequate semantic representation. To overcome these challenges, this study aims to develop a resource-efficient, high-performing ensemble framework that enhances safety information management, enables timely risk identification, and supports data-driven decision-making in construction safety.
Design/methodology/approach A construction accident report dataset was processed using random oversampling to address severe class imbalance. Four large language models (DeBERTa-v3-Large, T5-Large, OPT, and Ettin-Encoder-1B) were adapted through parameter-efficient fine-tuning (PEFT) using the Low-Rank Adaptation (LoRA) method. Individual model predictions were aggregated via three ensemble techniques: hard voting, soft voting, and stacking, with stacking selected as the final integration strategy. The proposed ensemble framework was benchmarked against the individual fine-tuned models and three conventional baseline classifiers.
Findings The ensemble models based on hard voting and stacking achieved the highest macro-average F1 score of 0.9377, surpassing all soft voting configurations as well as every individual LLM. In particular, the hard voting ensemble combining OPT, Ettin-Encoder-1B, and T5-Large and the stacking model using OPT and T5-Large demonstrated superior and stable performance across most accident subcategories. Both ensembles significantly outperformed single-model baselines, especially in the accurate identification of minority classes. Random oversampling effectively reduced class imbalance, while parameter-efficient fine-tuning maintained strong performance despite limited computational resources.
Originality/value This study presents a novel integration of parameter-efficient fine-tuned large language models with ensemble learning for construction accident report classification. It introduces an effective, resource-efficient methodology that simultaneously tackles data imbalance and semantic complexity in safety-related text analysis. The framework offers practical value by enabling accurate, automated analysis of accident reports, reducing dependence on labor-intensive manual coding, and facilitating proactive safety management, hazard mitigation, and regulatory compliance in the construction industry.
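Two building blocks named in the abstract, random oversampling and the macro-average F1 score, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study's code: the function names, the fixed seed, and the toy labels are hypothetical, and the abstract does not say which data split was oversampled (common practice is to oversample the training split only).

```python
import random
from collections import defaultdict


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class reports until every class
    has as many examples as the largest class."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    by_class = defaultdict(list)
    for text, label in zip(texts, labels):
        by_class[label].append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count as much
    as majority classes."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: three reports of class "x", one of class "y".
balanced_texts, balanced_labels = random_oversample(
    ["a", "b", "c", "d"], ["x", "x", "x", "y"]
)
score = macro_f1(["x", "x", "y"], ["x", "y", "y"])
```

Because the macro average weights every class equally, a model that ignores rare accident categories is penalized even when its overall accuracy looks high, which is why this metric suits imbalanced report data.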

Authors

Yì Wáng

Jing Li

Shujie Wu

Journals

Engineering Construction & Architectural Management

Institutions

The University of Adelaide

Qingdao University of Science and Technology

Bond University