What question did this study set out to answer?

The aim is to discover new diagnostic biomarkers for breast cancer using machine learning techniques.

April 10, 2026 (Open Access)

Leveraging Machine Learning To Discover Novel Diagnostic Biomarkers for Breast Cancer

Key Points

The aim is to discover new diagnostic biomarkers for breast cancer using machine learning techniques.
Utilized machine learning algorithms in R to analyze breast cancer datasets from the GEO database.
Screened differentially expressed genes to select key feature genes using a machine learning feature selection model.
Validated gene expression through quantitative polymerase chain reaction (qPCR), Western blots, and immunohistochemistry (IHC).
Identified two diagnostic gene candidates, S100P and COL10A1, consistently expressed at higher levels in breast cancer tissues compared to normal controls.
Both biomarkers showed significantly higher expression in breast cancer tissues across all experimental validations (qPCR, Western blot, and IHC).
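
The screening step in the points above can be illustrated with a minimal sketch. The study was carried out in R and this page does not state the screening criteria, so everything here is an assumption made for illustration: the use of Python, the function names `log2_fold_change` and `screen_degs`, the pseudocount, the fold-change threshold, and the toy expression values. A real analysis would also apply a statistical test with multiple-testing correction.

```python
import math
from statistics import mean


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """Log2 ratio of mean tumor expression to mean normal expression.

    The pseudocount (an assumed value) avoids division by zero.
    """
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))


def screen_degs(expression, min_abs_log2fc=1.0):
    """Keep genes whose absolute log2 fold change meets the threshold.

    `expression` maps a gene name to a (tumor_values, normal_values) pair.
    Returns a dict of gene name -> log2 fold change.
    """
    selected = {}
    for gene, (tumor, normal) in expression.items():
        lfc = log2_fold_change(tumor, normal)
        if abs(lfc) >= min_abs_log2fc:
            selected[gene] = lfc
    return selected


# Made-up toy values with placeholder gene names; not data from the study.
toy_expression = {
    "GENE_UP": ([15.0, 17.0, 16.0], [3.0, 3.0, 3.0]),
    "GENE_FLAT": ([5.0, 5.0, 5.0], [5.0, 5.0, 5.0]),
    "GENE_DOWN": ([1.0, 1.0, 1.0], [7.0, 7.0, 7.0]),
}

degs = screen_degs(toy_expression)
for gene, lfc in sorted(degs.items()):
    print(f"{gene}: log2FC = {lfc:.2f}")
```

In this toy input, only the up-regulated and down-regulated placeholder genes pass the threshold; the unchanged gene is dropped before any feature selection would run.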

Abstract

Breast cancer remains the most significant contributor to morbidity and mortality among women in China. Despite advances in imaging and molecular testing, few reliable biomarkers exist for early detection and disease characterization. Identifying new marker genes related to breast carcinogenesis could greatly improve diagnostic accuracy and potentially influence treatment decisions. In this study, machine learning algorithms were implemented in the R programming environment to evaluate three publicly available breast cancer datasets from the Gene Expression Omnibus (GEO) database. We screened differentially expressed genes and then selected the best feature genes using a machine learning-based feature selection model. Finally, we experimentally validated these genes using quantitative polymerase chain reaction (qPCR), Western blots, and immunohistochemistry (IHC). By intersecting the top 10 signature genes from each dataset, we identified two consistent diagnostic gene candidates: S100P and COL10A1. Both genes showed significantly higher expression in breast cancer tissues than in normal controls across all experimental validations. Our results suggest that S100P and COL10A1 may be appropriate as adjunct molecular biomarkers for improved early and accurate breast cancer diagnosis and could be especially helpful in cases with indeterminate morphological features to improve detection rates and decrease cancer related.
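
The intersection step described in the abstract (taking the top-ranked signature genes from each dataset and keeping only those shared by all of them) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the study used R, the helper names `top_n` and `consistent_genes` are hypothetical, and the importance scores and gene names are made-up placeholders.

```python
def top_n(scores, n=10):
    """Return the n gene names with the highest importance scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in ranked[:n]]


def consistent_genes(per_dataset_scores, n=10):
    """Intersect the top-n gene lists from every dataset.

    `per_dataset_scores` is a list of dicts, one per dataset, each mapping
    a gene name to a feature-importance score from some selection model.
    """
    top_sets = [set(top_n(scores, n)) for scores in per_dataset_scores]
    return sorted(set.intersection(*top_sets))


# Made-up placeholder scores for three hypothetical datasets.
dataset_1 = {"GENE_A": 0.90, "GENE_B": 0.80, "GENE_C": 0.10}
dataset_2 = {"GENE_B": 0.70, "GENE_A": 0.60, "GENE_D": 0.50}
dataset_3 = {"GENE_A": 0.95, "GENE_E": 0.90, "GENE_B": 0.20}

# n=2 keeps the toy example small; the abstract describes using the top 10.
shared = consistent_genes([dataset_1, dataset_2, dataset_3], n=2)
print(shared)
```

Requiring a gene to appear in every dataset's top list is a strict criterion: a gene that ranks highly in two datasets but not the third is excluded, which favours candidates that are consistent across cohorts.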

Authors

Xin Chen

Wenyang Pang

Yuanfan Wang

Journals

International Journal of Computational Intelligence Systems

Institutions

Taizhou Municipal Hospital
