Background
The health sector faces challenges in analysing unstructured medical transcription data, particularly in identifying semantic similarities between clinical question pairs for information retrieval. A major challenge is that it is not feasible to obtain sufficiently large and representative data for specialised machine learning models due to privacy policies. Data augmentation could help alleviate these challenges and, therefore, requires investigation.

Methods
This study investigated two models, Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) and Artificial Neural Network (ANN), for semantic text similarity classification. An annotated dataset of 3,048 medical question pairs from Hugging Face was employed. The LSTM-RNN model was developed using an Embedding layer and an LSTM layer, whereas the ANN model was developed without these layers. Experiments were conducted in TensorFlow for 100 epochs with an 80:20 train-validation split.

Results
Six performance metrics were recorded: Accuracy, Binary Cross-Entropy Loss, Area Under the Curve (AUC), Precision, Recall, and F1-Score. The augmented LSTM-RNN significantly outperformed other configurations, achieving a validation accuracy of 95.31%, a loss of 0.2931, an AUC of 97.39%, a precision of 94.69%, a recall of 96.21%, and an F1-score of 95.44%. Without augmentation, the LSTM-RNN validation accuracy dropped to 58.03%. The augmented ANN achieved a validation accuracy of 82.87%, while the non-augmented ANN struggled with 50.49% accuracy.

Conclusions
The inclusion of LSTM and Embedding layers allowed the LSTM-RNN to capture contextual dependencies that the ANN could not. The results demonstrate that data augmentation is important for achieving high-performance metrics in clinical text analysis, where data is limited.
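The F1-score listed in the results is conventionally the harmonic mean of precision and recall. The short sketch below recomputes it from the reported precision and recall of the augmented LSTM-RNN as a consistency check. It assumes the standard definition F1 = 2PR / (P + R); the abstract does not state how the authors computed the metric, and the function and variable names are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Validation figures reported for the augmented LSTM-RNN (rounded in the abstract).
reported_precision = 0.9469
reported_recall = 0.9621
reported_f1 = 0.9544

# Assumption: the standard F1 definition applies to the reported numbers.
computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1 = {computed_f1:.4f}, reported F1 = {reported_f1:.4f}")
```

Under that assumption the recomputed value agrees with the reported 95.44% to within rounding, so the three figures are mutually consistent.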
Daniel A Folorunso
Adio T Akinwale
Alaba O Adejimi
Cureus Journal of Computer Science.
Folorunso et al. studied this question.
www.synapsesocial.com/papers/69f2a4da8c0f03fd67763fc3 — DOI: https://doi.org/10.7759/s44389-026-00062-6