Abstract

Breast cancer is a heterogeneous disease with distinct clinical and molecular biomarkers used to determine treatment options. The three main clinical subtypes of the disease are Estrogen Receptor positive (ER+)/HER2-negative, Human Epidermal Growth Factor Receptor 2 positive (HER2+), and Triple Negative Breast Cancer (TNBC). Breast cancer can also be stratified by gene expression into four distinct molecular subtypes: Basal-like, HER2-enriched, Luminal A, and Luminal B. These subtypes are typically assigned by applying the PAM50 molecular subtyping predictor to bulk-tumor gene expression data. Advances in single-cell sequencing now allow researchers to distinguish cell types within tumors and focus on cancerous cells. While PAM50 remains an important biomarker, the algorithm is optimized for bulk-tumor gene expression, and its performance drops precipitously when applied to individual cells from scRNA-seq data. An accurate predictor of the molecular subtypes of cancer cells at single-cell resolution could allow further exploration of the heterogeneity of breast cancers and their microenvironments. Previously, we developed a single-cell version of PAM50 called scSubtype; however, it was trained using only 2-3 samples per molecular subtype. Here, we build upon this foundation by greatly expanding our training sample set to 151 breast cancer tumors with matched bulk and single-cell RNA-seq. We leveraged this new large-scale dataset to develop scSubtype2.0, an updated cancer cell intrinsic molecular subtype predictor at single-cell resolution. We filtered for robust samples of each molecular subtype that have matching bulk and single-cell-derived pseudo-bulk PAM50 calls, high silhouette width, and high cancer cell content. The final training data comprise 53 tumors encompassing 118,188 tumor cells, with at least 10 tumors per molecular subtype.
We performed differential expression analyses to identify single-cell subtype-defining genes (LumA: 60 genes, LumB: 129 genes, HER2-enriched: 231 genes, Basal-like: 271 genes) and used them as signatures to assign each tumor cell to its highest-scoring subtype. In every training set tumor, at least 80% of cell calls matched the corresponding bulk PAM50 subtype, suggesting that our gene lists encapsulate the full breadth of each molecular subtype. To objectively evaluate the new model’s performance and our gene lists, we crafted synthetic, homogeneous tumors of each subtype from a previously annotated test data set. We show that scSubtype2.0 outputs the correct subtype of the crafted tumors with 91% accuracy and outperforms its predecessor’s classifications. The algorithm continues to be improved and will soon include predictors of additional tumor cell states, including the claudin-low/mesenchymal subtype and a hypoxic state. Overall, we believe scSubtype2.0 is a robust and accurate predictor of molecular subtypes for individual cancer cells and a novel measure of intra-tumor heterogeneity.

Citation Format: Alexander V. Lobanov, Hani Jieun Kim, Sehrish Kanwal, Kate Harvey, John Reeves, Marcel Batten, Beata Kiedik, Daniel L. Roden, Mun N. Hui, Kym Pham Stewart, Oliver Hofmann, Sandra O’Toole, Elgene Lim, Sean M. Grimmond, Alexander Swarbrick, Charles M. Perou. scSubtype2.0: Predictor of breast cancer molecular subtypes at single cell resolution [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2026; Part 1 (Regular Abstracts); 2026 Apr 17-22; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2026;86(7 Suppl):Abstract nr 44.
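The per-cell assignment described in the abstract (score each tumor cell against each subtype's gene signature, then call the highest-scoring subtype) might be sketched as follows. This is an illustrative sketch only, not the scSubtype2.0 implementation: the abstract does not state how signature scores are computed, so the mean expression of signature genes is used here as an assumed stand-in, and the function names `assign_subtypes` and `fraction_matching`, the gene names, and the toy matrix are all hypothetical.

```python
import numpy as np


def assign_subtypes(expr, genes, signatures):
    """Assign each cell to the subtype whose signature scores highest.

    expr:        cells x genes array of normalized expression values
    genes:       list of gene names, one per column of expr
    signatures:  dict mapping subtype name -> list of signature genes
    ASSUMPTION: a signature score is the mean expression of its genes;
    the abstract does not specify the scoring function.
    """
    gene_index = {g: i for i, g in enumerate(genes)}
    subtypes = list(signatures)
    scores = np.empty((expr.shape[0], len(subtypes)))
    for j, subtype in enumerate(subtypes):
        # Use only signature genes that are present in the expression matrix.
        cols = [gene_index[g] for g in signatures[subtype] if g in gene_index]
        if not cols:
            raise ValueError(f"no signature genes found for {subtype}")
        scores[:, j] = expr[:, cols].mean(axis=1)
    # Highest-scoring subtype per cell.
    calls = [subtypes[k] for k in scores.argmax(axis=1)]
    return calls, scores


def fraction_matching(calls, bulk_subtype):
    """Fraction of a tumor's cell calls that agree with its bulk subtype."""
    return sum(c == bulk_subtype for c in calls) / len(calls)


# Toy example with hypothetical genes and two hypothetical signatures.
toy_genes = ["g1", "g2", "g3", "g4"]
toy_signatures = {"LumA": ["g1", "g2"], "Basal-like": ["g3", "g4"]}
toy_expr = np.array([
    [5.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 4.0],
    [3.0, 3.0, 1.0, 1.0],
])
toy_calls, toy_scores = assign_subtypes(toy_expr, toy_genes, toy_signatures)
toy_agreement = fraction_matching(toy_calls, "LumA")
```

The second helper mirrors the agreement check reported above, where the share of a tumor's cell calls matching its bulk PAM50 subtype is compared against a threshold such as 80%.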
www.synapsesocial.com/papers/69d1fe07a79560c99a0a481e — DOI: https://doi.org/10.1158/1538-7445.am2026-44
Alexander V Lobanov
Ho Kim
Sehrish Kanwal
Cancer Research
UNC Lineberger Comprehensive Cancer Center
Garvan Institute of Medical Research
Centre for Cancer Biology