What type of study is this?

This is a literature review.

What question did this study set out to answer?

This literature survey aims to evaluate the role of machine learning in mutation prediction for breast cancer using next-generation sequencing data.

May 6, 2026 | Open Access

A Comprehensive Literature Survey on Machine Learning-Based Mutation Prediction in Breast Cancer Using Next-Generation Sequencing Data

Key Points

Literature review of studies from 2020 to 2026 on ML techniques in breast cancer mutation prediction.
Assessment of bioinformatic workflows including data quality, preprocessing, and variant calling.
Evaluation of various predictive models like Support Vector Machines and Deep Neural Networks.
Machine learning and NGS enhance mutation prediction accuracy in breast cancer.
Identified challenges in workflows, including dataset variance and model interpretability.
Recommendations for future work include using explainable AI and automated prediction pipelines.

Abstract

Breast cancer remains one of the most significant global health problems, owing to its complexity and the difficulty of identifying clinically relevant mutations early. Improvements in diagnosis, prognosis, and individualized therapy depend heavily on accurately identifying the mutations a patient carries at the time of diagnosis. Next-generation sequencing (NGS) enables rapid, large-scale investigation of genomic variation, which improves our understanding of mutation profiles and mutation classes. In parallel, machine learning (ML) and deep learning (DL) algorithms are increasingly used to analyze large, high-dimensional genomic datasets, with improved prediction accuracy. This paper reviews recent studies (2020 through 2026) on how NGS data and ML have been used to develop and refine methods for identifying mutations in breast cancer patients. It describes the end-to-end bioinformatic workflow, from data quality evaluation through preprocessing, sequence alignment, variant calling, and annotation. Particular emphasis is placed on approaches to feature extraction and on predictive models (e.g., Support Vector Machines, Random Forests, and Deep Neural Networks), including their practical use and their limitations.

The review highlights several persistent challenges. Bioinformatics processing and ML modeling are often treated as separate processes, resulting in fragmented workflows. Few studies analyze control datasets alongside cancer datasets, which limits comparisons between healthy and tumor samples when characterizing mutations. Dataset variance, a lack of automated workflows, and low model interpretability further limit the clinical usefulness of these prediction models. To address these findings, the authors outline future work: integrating automated prediction pipelines, implementing genome comparisons, and improving feature engineering with biologically significant features. They also suggest adopting explainable AI in these predictive models. Overall, the survey offers a basis for making future mutation prediction methods more interpretable, higher-throughput, and more accurate in support of precision medicine for breast cancer. From an application perspective, this requires closing the gap between biological knowledge and prediction methods.
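
As a rough illustration of the feature-extraction step and the tumor-versus-control comparison the abstract mentions, the sketch below turns annotated variant calls into a binary gene-level feature matrix and compares per-gene mutation frequency between the two groups. It is not taken from the paper: the sample IDs, the variant calls, the labels, and the helper names `build_feature_matrix` and `mutation_frequency` are all invented for illustration, and a real workflow would read calls from annotated VCF files rather than a hard-coded list.

```python
from collections import defaultdict

# Toy variant calls, invented for illustration: (sample_id, gene) pairs as
# they might look after variant calling and annotation.
VARIANT_CALLS = [
    ("tumor_1", "TP53"),
    ("tumor_1", "PIK3CA"),
    ("tumor_2", "TP53"),
    ("tumor_3", "BRCA1"),
    ("control_1", "PIK3CA"),
]

# Hypothetical labels: 1 = tumor sample, 0 = healthy control.
LABELS = {"tumor_1": 1, "tumor_2": 1, "tumor_3": 1, "control_1": 0, "control_2": 0}


def build_feature_matrix(calls, samples):
    """Return (genes, rows): one binary 'gene is mutated' vector per sample."""
    genes = sorted({gene for _, gene in calls})
    mutated = defaultdict(set)
    for sample, gene in calls:
        mutated[sample].add(gene)
    # Samples with no calls (e.g. control_2) get an all-zero row.
    rows = {s: [1 if g in mutated[s] else 0 for g in genes] for s in samples}
    return genes, rows


def mutation_frequency(genes, rows, labels, label):
    """Fraction of samples with the given label that carry a mutation in each gene."""
    group = [rows[s] for s in rows if labels[s] == label]
    return {g: sum(r[i] for r in group) / len(group) for i, g in enumerate(genes)}


genes, rows = build_feature_matrix(VARIANT_CALLS, sorted(LABELS))
tumor_freq = mutation_frequency(genes, rows, LABELS, 1)
control_freq = mutation_frequency(genes, rows, LABELS, 0)

for gene in genes:
    print(f"{gene}: tumor={tumor_freq[gene]:.2f} control={control_freq[gene]:.2f}")
```

The rows produced here could serve as input to a classifier such as a Support Vector Machine or Random Forest. The sketch stops before that step because the abstract does not specify a particular model configuration.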

Cite This Study

Sisodia et al. studied this question.

synapsesocial.com/papers/69faa2b504f884e66b5334bf
https://doi.org/10.5281/zenodo.20021196
