This study develops a multimodal machine learning framework for pneumonia detection from lung sound recordings, addressing the challenge of timely and affordable diagnosis in resource-limited settings. We designed a progressive semi-supervised learning model that processes lung sound signals in three forms: time-series data, spectrogram representations, and wavelet transforms. Contrastive and diversity losses were introduced during progressive training to improve generalization and reduce overfitting when labeled data are limited. The proposed framework achieved state-of-the-art performance, with an accuracy of 97.85% and an F1-score of 97.80%, outperforming existing unimodal and multimodal benchmarks. The approach shows strong potential as a reliable, efficient, noninvasive screening tool for pneumonia, combining robust performance with a minimal computational footprint.
Jobayer et al. studied this question.
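To make the three input forms named in the abstract concrete, the following is a minimal sketch of how a single recording might be turned into a time-series view, a spectrogram view, and a wavelet view. It is not the authors' pipeline: the synthetic signal, the 4 kHz sampling rate, the frame and hop sizes, the choice of the Haar wavelet, and every function name (`synth_lung_sound`, `spectrogram`, `haar_dwt`, `three_views`) are assumptions made purely for illustration.

```python
import numpy as np

def synth_lung_sound(sr=4000, seconds=1.0, seed=0):
    """Hypothetical stand-in for a recording: a 200 Hz tone plus noise."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(sr * seconds)) / sr
    return np.sin(2 * np.pi * 200.0 * t) + 0.1 * rng.standard_normal(t.size)

def spectrogram(x, frame_len=256, hop=128):
    """Magnitude spectrogram from a Hann-windowed short-time FFT."""
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Transpose so rows are frequency bins and columns are time frames.
    return np.abs(np.fft.rfft(frames, axis=1)).T

def haar_dwt(x, levels=3):
    """Multi-level Haar wavelet transform (assumed wavelet choice).

    Returns [approx_L, detail_L, ..., detail_1], coarsest first.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            approx = approx[:-1]  # drop a trailing sample if length is odd
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    return [approx] + details[::-1]

def three_views(x):
    """Bundle the three representations a multimodal model could consume."""
    return {
        "time_series": x,
        "spectrogram": spectrogram(x),
        "wavelet": haar_dwt(x),
    }

signal = synth_lung_sound()
views = three_views(signal)
```

In a setup like this, each view would typically feed its own encoder branch before fusion; how the branches are built, fused, and trained with the contrastive and diversity losses is not specified in the abstract and is left out of the sketch.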