Abstract Transcription factors (TFs) regulate gene expression by binding to specific DNA sites on the genome, making accurate TF binding site prediction critical for understanding gene regulation and downstream phenotypes. Almost all current deep learning methods use only DNA-related information to predict TF binding sites, ignoring the fact that TFs with different protein sequences and structures recognize distinct DNA patterns. Ignoring TF information not only limits prediction accuracy but also prevents these methods from generalizing to new TFs that are absent from the training data. Here, we present TransBind, a protein-aware deep learning architecture that improves TF binding site prediction by integrating DNA sequence information with protein embeddings that contain both sequence and structural information and are derived from a protein language model pretrained on DNA-binding proteins. Through cross-attention, a TF embedding selectively attends to genomic regions according to its unique binding properties. Evaluated on data from 690 ChIP-seq experiments spanning 161 TFs across 91 human cell types, TransBind achieves an AUROC of 0.9508 and an AUPR of 0.3741, an 11.8% relative AUPR improvement over state-of-the-art methods including TBiNet, EPBDXDNABERT-2, DanQ, and DeepSEA. The model outperformed existing methods in 98% of TF–cell type combinations. It also recovered 160 known TF binding motifs in the JASPAR database, supporting the biological interpretability of the model. Moreover, the approach enables label-zero-shot prediction for unseen TFs, demonstrating its potential to generalize to new, poorly characterized TFs. The source code of TransBind is available at https://github.com/jianlin-cheng/TransBind. The version used in this work is archived at https://doi.org/10.5281/zenodo.19462292.
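The cross-attention step mentioned in the abstract can be illustrated with a minimal sketch. This is not the TransBind implementation: the abstract does not specify layer sizes, learned projections, or the number of attention heads, so the code below assumes a single head, no learned projection matrices, and that the TF embedding acts as the query while per-position DNA features act as keys and values. All names (`cross_attention`, `tf_embedding`, `dna_features`) and the toy numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(tf_query, dna_keys, dna_values):
    """Single-head scaled dot-product cross-attention (illustrative only).

    tf_query   : list[float], one TF embedding vector (assumed to be the query)
    dna_keys   : list[list[float]], one key vector per genomic position
    dna_values : list[list[float]], one value vector per genomic position
    Returns (weights, context): attention weight per position and the
    weighted sum of the value vectors.
    """
    dim = len(tf_query)
    # Similarity between the TF query and each genomic position.
    scores = [
        sum(q * k for q, k in zip(tf_query, key)) / math.sqrt(dim)
        for key in dna_keys
    ]
    weights = softmax(scores)
    # Weighted average of the DNA value vectors.
    value_dim = len(dna_values[0])
    context = [
        sum(w * value[j] for w, value in zip(weights, dna_values))
        for j in range(value_dim)
    ]
    return weights, context


# Hypothetical toy inputs: one 4-dimensional TF embedding and three
# genomic positions, each with a 4-dimensional feature vector.
tf_embedding = [1.0, 0.0, 1.0, 0.0]
dna_features = [
    [0.9, 0.1, 0.8, 0.0],  # position 0: resembles the TF embedding
    [0.0, 1.0, 0.0, 1.0],  # position 1: orthogonal to it
    [0.1, 0.0, 0.2, 0.1],  # position 2: weak match
]

weights, context = cross_attention(tf_embedding, dna_features, dna_features)
print("attention weights per position:", weights)
print("TF-conditioned DNA summary:", context)
```

In this toy setting the position whose features best match the TF embedding receives the largest weight, which is the sense in which a TF embedding can "selectively attend" to genomic regions. A practical model would presumably apply learned query, key, and value projections and use a deep learning framework rather than plain Python lists.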
Shreya Basnet
Jianlin Cheng
NAR Genomics and Bioinformatics
University of Missouri
University of Missouri Health System
Basnet et al. studied this question.
synapsesocial.com/papers/69fd7e79bfa21ec5bbf06a97 — DOI: https://doi.org/10.1093/nargab/lqag047