What question did this study set out to answer?

To create a machine learning model that automates the interpretation of chest X-rays for TB diagnosis using a small dataset.

February 9, 2026Open Access

PaliGemma-CXR: a multi-task multimodal model for tuberculosis chest X-ray interpretation

Puntos clave

To create a machine learning model that automates the interpretation of chest X-rays for TB diagnosis using a small dataset.
Developed a multimodal dataset for TB diagnosis, segmentation, VQA, object detection, and report generation.
Finetuned the PaliGemma model on five tasks simultaneously.
Used annotated chest X-ray images for training.
Achieved 90.32% accuracy on TB diagnosis.
Attained 98.95% accuracy on close-ended VQA.
Reported a BLEU score of 41.3 for report generation.
Achieved mean average precisions (mAP) of 19.4 for object detection and 16.0 for segmentation.

Resumen

Abstract Background Uganda has a high incidence of tuberculosis (TB), and chest X-rays are widely used for diagnosis. However, interpreting chest X-rays requires radiologists, who are in shortage in Uganda. Machine learning has shown potential to automate this process but requires a large dataset of annotated chest X-ray images. We developed a multi-task, multimodal machine learning model that effectively utilizes a small dataset of X-ray images. Methods Starting with a dataset of chest X-ray images annotated with labels for TB diagnosis and segmentation tasks, we curated a multimodal dataset with labels for Visual Question Answering (VQA), object detection and report generation. We developed PaliGemma-CXR, by finetuning PaliGemma, a foundation vision-language model jointly on all five tasks. Results PaliGemma-CXR achieved competitive performance across all tasks and outperformed the same model architecture when trained separately on individual tasks. Specifically, it attained 90.32% accuracy on TB diagnosis, 98.95% accuracy on close-ended VQA, a BLEU score of 41.3 for report generation, and mean average precisions (mAP) of 19.4 and 16.0 for object detection and segmentation, respectively. Conclusion Our multi-task model outperforms task-specific approaches, making it a more effective and easier-to-deploy solution in low-resource clinical settings where a single model can perform multiple essential tasks.

Me gusta

Guardar

Ver artículo completo