April 15, 2026

Automating infection indicator extraction in home healthcare through instruction-tuned large language models

Key Points

The study aimed to automate the extraction of crucial infection indicators from home healthcare notes using instruction-tuned models.
Developed an expert-defined schema of 26 infection indicator categories.
Expanded the training data through targeted annotation, context mutation, and synthetic generation.
Adapted 2 moderate-sized models (Gemma-12B and Qwen-14B) via Quantized Low-Rank Adaptation (QLoRA).
Compared them with a larger prompted model and a smaller fully fine-tuned model on a held-out test set.
Instruction-tuned moderate-sized models outperformed both baselines.
The best model, augmented Gemma-12B, achieved a partial micro-averaged F1 score of 0.879.
Data augmentation improved identification of rare indicators and interpretation of negations.
The best model maintained a partial F1 score above 0.750 across all indicator categories.
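
The partial micro-averaged F1 score reported above can be illustrated with a minimal sketch. The page does not state the study's exact partial-match rule, so the rule used here (same category and at least one shared token) is an assumption, and the category names `fever` and `wound` are hypothetical placeholders rather than entries from the 26-category schema.

```python
from collections import Counter


def partial_match(pred, gold):
    """Assumed rule (not taken from the study): same category, and the
    two text spans share at least one lowercase token."""
    if pred[0] != gold[0]:
        return False
    return bool(set(pred[1].lower().split()) & set(gold[1].lower().split()))


def count_matches(preds, golds):
    """Greedy one-to-one matching within one note.
    Returns per-category TP, FP, and FN counters."""
    tp, fp, fn = Counter(), Counter(), Counter()
    unmatched = list(golds)
    for p in preds:
        hit = next((g for g in unmatched if partial_match(p, g)), None)
        if hit is not None:
            unmatched.remove(hit)
            tp[p[0]] += 1
        else:
            fp[p[0]] += 1
    for g in unmatched:
        fn[g[0]] += 1
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 from raw counts; 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(notes):
    """notes: iterable of (predictions, gold) pairs, each a list of
    (category, text) tuples. Returns (micro_f1, per_category_f1)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for preds, golds in notes:
        t, p, n = count_matches(preds, golds)
        tp.update(t)
        fp.update(p)
        fn.update(n)
    categories = set(tp) | set(fp) | set(fn)
    per_category = {c: f1(tp[c], fp[c], fn[c]) for c in categories}
    # Micro-averaging pools the counts over all categories before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, per_category


# Hypothetical example: one note with two predictions and two gold pairs.
example_notes = [(
    [("fever", "low grade fever"), ("wound", "redness")],
    [("fever", "fever"), ("wound", "purulent drainage")],
)]
micro_f1, per_category_f1 = evaluate(example_notes)
```

Because micro-averaging pools counts, frequent categories dominate the overall score, which is why a per-category breakdown is a useful complement for rare indicators.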

Abstract

Objective: Home healthcare (HHC) clinical notes contain critical infection indicators that clinicians need in structured “indicator + context” pairs. Data sparsity and limited computing resources hinder automated extraction in decentralized HHC settings. This study developed and evaluated a resource-efficient pipeline using instruction-tuned, moderate-sized large language models (LLMs) to address these barriers. To address the data sparsity challenge, we also assessed the impact of a targeted LLM-based data augmentation strategy.

Materials and Methods: An expert-defined schema of 26 infection indicator categories was developed. We expanded the training set using a 3-stage workflow: targeted annotation, context mutation, and synthetic generation. We adapted 2 moderate-sized models (Gemma-12B and Qwen-14B) via Quantized Low-Rank Adaptation (QLoRA). We compared them to a larger-sized, prompted model and a smaller-sized, fully fine-tuned LLM. We evaluated all models on a held-out test set using partial micro-averaged F1 score, output reliability metrics, and qualitative error analysis.

Results: Instruction-tuned moderate-sized LLMs outperformed both baselines. The top-performing model, augmented Gemma-12B, achieved a partial micro-averaged F1 score of 0.879. LLM-based data augmentation enhanced overall performance, improving the identification of rare indicators and the interpretation of negations. The best model maintained a partial F1 score above 0.750 across all indicator categories. It also showed high format adherence, confirming its ability to generate reliable structured outputs.

Discussion: Instruction-tuning moderate-sized LLMs with QLoRA and targeted data augmentation enables high-accuracy extraction of infection indicators from HHC notes.

Conclusion: This resource-efficient pipeline provides a scalable foundation for automated infection surveillance in healthcare settings with limited resources.
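
As an illustration of what a format adherence metric for structured “indicator + context” output might look like, here is a minimal sketch. It assumes the model emits a JSON list of objects with the keys `indicator` and `context`. That output format and those field names are hypothetical, since the abstract does not specify how the pairs are serialized or how adherence was measured.

```python
import json

# Hypothetical field names; the abstract does not give the real output schema.
REQUIRED_KEYS = {"indicator", "context"}


def is_well_formed(raw_output):
    """True if the text parses as a JSON list whose items are objects
    holding exactly the required keys, each with a string value."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    if not isinstance(parsed, list):
        return False
    return all(
        isinstance(item, dict)
        and set(item) == REQUIRED_KEYS
        and all(isinstance(item[key], str) for key in REQUIRED_KEYS)
        for item in parsed
    )


def format_adherence(raw_outputs):
    """Fraction of model outputs that satisfy the assumed format."""
    outputs = list(raw_outputs)
    if not outputs:
        return 0.0
    return sum(is_well_formed(o) for o in outputs) / len(outputs)


# Hypothetical model outputs: two valid, two invalid.
sample_outputs = [
    '[{"indicator": "fever", "context": "patient denies fever"}]',
    "[]",                   # valid: a note with no indicators
    "Fever was noted.",     # invalid: free text, not JSON
    '{"indicator": "x"}',   # invalid: not a list, and a key is missing
]
adherence = format_adherence(sample_outputs)
```

An empty list counts as well formed here so that notes without any infection indicator are not penalized; a stricter check could also validate the indicator value against the category schema.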

Authors

Zidu Xu

Jiyoun Song

Shengli Zhou

Journals

Journal of the American Medical Informatics Association

Institutions

Columbia University

University of Minnesota

Icahn School of Medicine at Mount Sinai
