Abstract

Objective
Home healthcare (HHC) clinical notes contain critical infection indicators that clinicians need in structured “indicator + context” pairs. Data sparsity and limited computing resources hinder automated extraction in decentralized HHC settings. This study developed and evaluated a resource-efficient pipeline using instruction-tuned, moderate-sized large language models (LLMs) to address these barriers. To address the data sparsity challenge, we also assessed the impact of a targeted LLM-based data augmentation strategy.

Materials and Methods
An expert-defined schema of 26 infection indicator categories was developed. We expanded the training set using a 3-stage workflow: targeted annotation, context mutation, and synthetic generation. We adapted 2 moderate-sized models (Gemma-12B and Qwen-14B) via Quantized Low-Rank Adaptation (QLoRA). We compared them to a larger-sized, prompted model and a smaller-sized, fully fine-tuned LLM. We evaluated all models on a held-out test set using partial micro-averaged F1 score, output reliability metrics, and qualitative error analysis.

Results
Instruction-tuned moderate-sized LLMs outperformed both baselines. The top-performing model, augmented Gemma-12B, achieved a partial micro-averaged F1 score of 0.879. LLM-based data augmentation enhanced overall performance, improving the identification of rare indicators and the interpretation of negations. The best model maintained a partial F1 score above 0.750 across all indicator categories. It also showed high format adherence, confirming its ability to generate reliable structured outputs.

Discussion
Instruction-tuning moderate-sized LLMs with QLoRA and targeted data augmentation enables high-accuracy extraction of infection indicators from HHC notes.

Conclusion
This resource-efficient pipeline provides a scalable foundation for automated infection surveillance in healthcare settings with limited resources.
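The abstract reports a partial micro-averaged F1 score but does not say how a partial match is scored. The sketch below is one plausible reading, not the authors' evaluation code. It assumes each extraction is a (category, text) pair, that a prediction matches a gold pair when the category is identical and the texts share at least one token, and that counts are pooled over all notes before precision and recall are computed (micro-averaging). The function names, the overlap rule, and the example category labels are all hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[str, str]  # (indicator category, extracted text); hypothetical layout


def texts_overlap(pred_text: str, gold_text: str) -> bool:
    """Assumed partial-match rule: the two texts share at least one token."""
    pred_tokens = set(pred_text.lower().split())
    gold_tokens = set(gold_text.lower().split())
    return bool(pred_tokens & gold_tokens)


def partial_micro_f1(gold_notes: List[List[Pair]],
                     pred_notes: List[List[Pair]]) -> Tuple[float, float, float]:
    """Pool TP/FP/FN over all notes, then compute precision, recall, and F1."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_notes, pred_notes):
        unmatched = list(gold)  # each gold pair can be matched at most once
        for category, text in pred:
            hit = next((g for g in unmatched
                        if g[0] == category and texts_overlap(text, g[1])), None)
            if hit is None:
                fp += 1
            else:
                tp += 1
                unmatched.remove(hit)
        fn += len(unmatched)  # gold pairs no prediction matched
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data with made-up category labels, for illustration only.
gold_notes = [
    [("fever", "temp 101.2 F"), ("wound_drainage", "purulent drainage from incision")],
    [("cough", "denies cough")],
]
pred_notes = [
    [("fever", "temp 101.2"), ("erythema", "redness")],
    [("cough", "denies cough")],
]
precision, recall, f1 = partial_micro_f1(gold_notes, pred_notes)
```

In this toy example two of three predictions match and two of three gold pairs are recovered. Because micro-averaging pools counts across categories, frequent indicators dominate the score, which may be why the abstract also reports a per-category floor.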
Zidu Xu
Jiyoun Song
Shengli Zhou
Journal of the American Medical Informatics Association
Columbia University
University of Minnesota
Icahn School of Medicine at Mount Sinai
Xu et al. studied this question.
www.synapsesocial.com/papers/69df2a99e4eeef8a2a6afad6 — DOI: https://doi.org/10.1093/jamia/ocag040