Maintenance work orders (MWOs) contain essential but unstructured reports describing equipment failures and repairs. In safety-critical environments such as healthcare, the inability to extract this information automatically and reliably limits the use of NLP models in risk assessments. This study examines the use of artificial intelligence (AI) as a trustworthy assistant for enhancing qualitative insights in predictive maintenance pipelines by detecting failure-related reports and extracting the corresponding evidence from clinical maintenance documents. Using more than 44,000 maintenance reports from BD Alaris infusion pumps, the study evaluates annotation accuracy, characterizes extraction behavior, and measures how errors in automated information extraction propagate into downstream representations of device failures. A novel set-theoretic evaluation framework distinguishes multiple forms of NLP prediction behavior, including accurate extraction, partial extraction, confabulation, and hallucination, each of which influences the integrity of maintenance datasets. Monte Carlo simulation further demonstrates that classification errors and incorrect extraction of failure-related text systematically inflate the apparent randomness of equipment degradation and distort the representation of device reliability. This study contributes three methodological advances. First, it formalizes information extraction from maintenance text as a set-theoretic alignment task, establishing a generalizable framework for evaluating the factual grounding of AI-generated annotations. Second, it introduces a regression-based sensitivity model that quantifies how annotation errors influence failure-time estimates, enabling post-deployment verification and continuous monitoring of AI-assisted maintenance pipelines. Third, it provides a practical pathway for integrating NLP annotations into existing asset management workflows without overhauling technician reporting practices.
Thus, it strengthens the operational safety of AI-assisted predictive maintenance systems.

• Developed NLP models for information extraction in predictive maintenance.
• Introduced a set-theoretic framework for characterizing NLP performance in data annotation.
• Quantified effects of annotation bias on risk and reliability modeling.
• Conducted sensitivity analysis of failure estimates to annotation errors.
• Discussed implementation pathways for deployment in operational environments.
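
The set-theoretic framing of extraction behavior mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' framework: the function names, the whitespace tokenization, the extra "missed" label, and the exact rule assigned to each category (in particular the line drawn between confabulation and hallucination) are assumptions made only for illustration. The sketch compares three token sets: the source report, the annotated (gold) evidence, and the model's extracted evidence.

```python
def tokens(text: str) -> set[str]:
    """Lowercase whitespace tokenization (an illustrative simplification)."""
    return set(text.lower().split())


def classify_extraction(report: str, gold: str, predicted: str) -> str:
    """Label one extraction by set relations between source, gold, and prediction.

    The category rules below are assumptions for illustration only;
    they are not taken from the paper.
    """
    s, g, p = tokens(report), tokens(gold), tokens(predicted)
    if not p <= s:
        # Some predicted tokens never appear in the source report.
        return "hallucination"
    if p == g:
        # Prediction matches the annotated evidence exactly.
        return "accurate"
    if not p:
        # Nothing was extracted although evidence was annotated.
        return "missed"
    if p < g:
        # Prediction is a proper, non-empty subset of the evidence.
        return "partial"
    # Grounded in the report, but includes text outside the annotated evidence.
    return "confabulation"


# Hypothetical report text, invented for this example.
report = "pump alarm sounded replaced door latch and tested ok"
gold = "replaced door latch"
for predicted in ["replaced door latch", "door latch", "replaced battery", "pump alarm sounded"]:
    print(f"{predicted!r}: {classify_extraction(report, gold, predicted)}")
```

Working with sets keeps each category a single subset or equality test, which makes it easy to audit, but it discards token order and repetition; a span-based comparison would be needed where position matters.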
Shobanke et al. (Thu,) studied this question.