Background: Construction sites generate large volumes of textual safety data, yet inconsistent terminology and mixed-language expressions (MLEs) reduce the reliability of analysis. Korean safety violation and warning reports (SVWRs), a localized form of safety observation reports, are often written with irregular spacing, abbreviations, and hybrid vocabulary, which hinders their systematic use in data-driven safety management.

Objective: This study aims to develop and validate a domain-specific text normalization framework that improves the linguistic consistency and analytical reliability of SVWRs.

Methods: A dataset of 64,999 SVWRs collected from 39 construction sites in South Korea was analyzed. A rule- and dictionary-based normalization pipeline was designed to unify fragmented terms and standardize MLEs. Topic modeling was conducted with symmetric priors and eight topics aligned with national safety categories.

Results: Normalization increased topic-model coherence from 0.412 to 0.497 (a 20.6% relative improvement), clarifying risk structures across categories such as falls, electrical hazards, and fire prevention. It also revealed co-occurring risk patterns previously obscured by inconsistent language use, demonstrating that linguistic preprocessing is crucial for reliable text-based safety analytics.

Conclusions: The proposed framework enhances both methodological reliability and practical applicability by converting fragmented field reports into standardized, analyzable data. Its dictionary-based architecture can be extended to other agglutinative languages and to multilingual settings, supporting scalable, data-driven safety management in the construction industry.
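To make the idea of rule- and dictionary-based normalization concrete, the following is a minimal Python sketch, not the authors' pipeline. The dictionary entries, the `TERM_DICT` name, and the `normalize` function are hypothetical illustrations; the abstract does not list the actual rules or vocabulary used in the study.

```python
import re

# Hypothetical variant -> canonical mappings (illustrative only; the study's
# actual dictionary is not given in the abstract).
TERM_DICT = {
    "안전 모": "안전모",      # irregular spacing in "safety helmet"
    "헬멧": "안전모",         # loanword variant mapped to the same canonical term
    "safety belt": "안전대",  # mixed-language expression (English inside Korean text)
    "안전벨트": "안전대",     # hybrid loanword variant
}

# Try longer variants first so a long phrase wins over any shorter substring.
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(TERM_DICT, key=len, reverse=True)),
    flags=re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Collapse whitespace (rule step), then replace known variants (dictionary step)."""
    text = re.sub(r"\s+", " ", text).strip()
    # Keys are stored lowercase, so lowercase the match before the lookup.
    return _PATTERN.sub(lambda m: TERM_DICT[m.group(0).lower()], text)


if __name__ == "__main__":
    for report in ["안전  모 미착용", "Safety Belt 미체결", "헬멧 미착용"]:
        print(normalize(report))
```

In this sketch, differently written reports about the same hazard collapse onto one canonical term before any topic model sees them, which is the mechanism the abstract credits for the coherence gain.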
Kyung-Su Kang
Sang‐Min Lee
Han-Guk Ryu
Korea Aerospace University
Sahmyook University
Kang et al. studied this question.
www.synapsesocial.com/papers/69ba422e4e9516ffd37a238f — DOI: https://doi.org/10.1177/10519815261426996