Breast cancer remains one of the most significant global health problems, owing to its complexity and the difficulty of finding clinically relevant mutations as early as possible. Improving diagnosis, prognosis, and individualized therapy depends greatly on accurately identifying the mutations present in patients at the time of diagnosis. Next-Generation Sequencing (NGS) technology has enabled rapid, large-scale investigation of genomic variation, improving our understanding of mutation profiles and mutational classes. In parallel, machine learning (ML) and deep learning (DL) algorithms have seen major growth as tools for analyzing large volumes of high-dimensional genomic data, providing improved prediction accuracy. This paper reviews recent studies (2020 through 2026) on how NGS data and ML have been used to develop and refine methods for identifying mutations in breast cancer patients. The review also describes the end-to-end bioinformatics workflow, from data quality evaluation through preprocessing, sequence alignment, variant calling, and annotation. Particular emphasis is placed on approaches to feature extraction and on the application of different predictive models (Support Vector Machines, Random Forests, and Deep Neural Networks), covering both their practical use and their limitations. The review highlights several persistent challenges. In particular, bioinformatics processing and ML modeling are often treated as separate processes, resulting in fragmented workflows.
Additionally, few studies analyze control datasets alongside cancerous datasets to enable more comprehensive comparisons between healthy and tumor samples when characterizing mutations. Dataset variance, the lack of automated workflows, and low model interpretability further limit the clinical usefulness of these prediction models. To address these findings, the authors outline future work that includes unifying automated prediction pipelines, implementing comparisons to genomes, and improving feature engineering with biologically significant features. They also suggest incorporating explainable AI into these predictive models. Overall, this survey provides a basis for making future mutation prediction methods more interpretable, higher-throughput, and more accurate in support of precision medicine for breast cancer. From an application perspective, closing the gap between biological knowledge and prediction methods is required.
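As an illustration of the tumor-versus-control comparison and the biologically motivated feature engineering discussed above, the following is a minimal sketch and not a method taken from any of the surveyed studies. It assumes variants have already been called and reduced to `(chrom, pos, ref, alt)` tuples; the toy records, the class labels, and the helper names (`mutation_class`, `tumor_specific`, `feature_vector`) are all hypothetical.

```python
from collections import Counter

# Hypothetical toy variant records (chrom, pos, ref, alt); not real patient data.
TUMOR = [
    ("chr17", 100, "C", "T"),
    ("chr17", 250, "G", "A"),
    ("chr13", 75, "A", "C"),
    ("chr13", 90, "AT", "A"),
]
CONTROL = [
    ("chr17", 100, "C", "T"),
    ("chr13", 40, "G", "GA"),
]

PURINES = {"A", "G"}
# Assumed, illustrative ordering of mutation classes in the feature vector.
CLASSES = ("transition", "transversion", "insertion", "deletion", "mnv")


def mutation_class(ref, alt):
    """Assign a coarse mutational class to a single ref/alt allele pair."""
    if len(ref) == 1 and len(alt) == 1:
        # Purine<->purine or pyrimidine<->pyrimidine is a transition.
        same_family = (ref in PURINES) == (alt in PURINES)
        return "transition" if same_family else "transversion"
    if len(ref) > len(alt):
        return "deletion"
    if len(ref) < len(alt):
        return "insertion"
    return "mnv"  # equal-length, multi-base substitution


def tumor_specific(tumor, control):
    """Keep tumor variants that are absent from the matched control."""
    control_set = set(control)
    return [v for v in tumor if v not in control_set]


def feature_vector(variants, classes=CLASSES):
    """Count variants per mutational class, in a fixed class order."""
    counts = Counter(mutation_class(ref, alt) for _, _, ref, alt in variants)
    return [counts.get(c, 0) for c in classes]


if __name__ == "__main__":
    specific = tumor_specific(TUMOR, CONTROL)
    print(dict(zip(CLASSES, feature_vector(specific))))
```

A vector of this kind could serve as one input row to a classifier such as a Random Forest. In practice, the features would be derived from annotated variant calls rather than hand-written tuples.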
Sisodia et al. studied this question.