Chromatin looping, which facilitates the three-dimensional (3D) organization of the genome, is essential for the regulation of gene expression. This process relies on the interplay of numerous transcription factors (TFs), particularly CCCTC-binding factor (CTCF) and cohesin, whose dynamic binding patterns orchestrate loop formation. Current computational methods for predicting CTCF-mediated chromatin loops struggle to make genome-wide predictions, primarily because of the extreme imbalance between positive and negative samples in training datasets. Existing DNA-sequence-based models also often fail to capture the complex dynamics of TF binding and the regulatory code behind chromatin looping. To address these challenges, we present TF-loop, a novel TF regulatory language framework for predicting chromatin loops. The framework treats TF sequences, defined by the binding positions and orientations of five key TFs, as a structured "TF language." Using a BERT model, TF-loop decodes the latent linguistic patterns embedded in these sequences, enabling accurate prediction of chromatin loops. Comparative analysis with state-of-the-art models shows that TF-loop significantly improves prediction accuracy across diverse cell types, even on highly imbalanced datasets. These results highlight the potential of TF-loop to offer a new perspective on decoding the 3D structure of chromatin with natural language processing techniques.
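To make the "TF language" idea concrete, the sketch below shows one way binding sites could be turned into an ordered token "sentence" and encoded as fixed-length integer ids, the kind of input a BERT-style encoder consumes. It is a minimal illustration under stated assumptions, not TF-loop's actual preprocessing: the window bounds, the `TF+`/`TF-` word format, the `[CLS]`/`[SEP]`/`[PAD]`/`[UNK]` special tokens, and the helper names `BindingSite`, `build_vocab`, `tf_sentence` and `encode` are all hypothetical. Apart from CTCF and cohesin, the five TF names are placeholders because the abstract does not list them. The BERT model itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical record for one TF binding site; field names are illustrative.
@dataclass(frozen=True)
class BindingSite:
    tf: str        # TF name, e.g. "CTCF"
    position: int  # genomic coordinate of the binding site
    strand: str    # motif orientation: "+" or "-"

# Assumed BERT-style special tokens (not taken from the paper).
SPECIAL_TOKENS = ["[PAD]", "[CLS]", "[SEP]", "[UNK]"]

def build_vocab(tfs: List[str]) -> Dict[str, int]:
    """Give every special token and every (TF, orientation) 'word' an integer id."""
    words = SPECIAL_TOKENS + [f"{tf}{s}" for tf in tfs for s in "+-"]
    return {w: i for i, w in enumerate(words)}

def tf_sentence(sites: List[BindingSite], start: int, end: int) -> List[str]:
    """Turn the sites inside [start, end) into TF words ordered by genomic position."""
    inside = sorted((s for s in sites if start <= s.position < end),
                    key=lambda s: s.position)
    return ["[CLS]"] + [f"{s.tf}{s.strand}" for s in inside] + ["[SEP]"]

def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, truncate (keeping a final [SEP]) and pad to max_len (>= 2)."""
    ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    if len(ids) > max_len:
        ids = ids[: max_len - 1] + [vocab["[SEP]"]]
    return ids + [vocab["[PAD]"]] * (max_len - len(ids))

if __name__ == "__main__":
    # "TF3".."TF5" are placeholders for the remaining key TFs.
    vocab = build_vocab(["CTCF", "COHESIN", "TF3", "TF4", "TF5"])
    sites = [
        BindingSite("CTCF", 1000, "+"),
        BindingSite("COHESIN", 1200, "+"),
        BindingSite("TF3", 5000, "-"),
        BindingSite("CTCF", 9000, "-"),
        BindingSite("CTCF", 20000, "+"),  # outside the window, so it is dropped
    ]
    tokens = tf_sentence(sites, 0, 10000)
    print(tokens)  # ['[CLS]', 'CTCF+', 'COHESIN+', 'TF3-', 'CTCF-', '[SEP]']
    print(encode(tokens, vocab, 8))  # [1, 4, 6, 9, 5, 2, 0, 0]
```

Orientation is folded into each word, so `CTCF+` and `CTCF-` are distinct vocabulary entries. This keeps the orientation information that the abstract names as part of a TF sequence, and a sequence model can then learn patterns over word order rather than raw coordinates.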
Yi-Xuan Qi
Haixia Zhang
Hao-Xiang Tang
Briefings in Bioinformatics
University of Electronic Science and Technology of China
Murray State University
Sichuan Academy of Medical Sciences & Sichuan Provincial People's Hospital
Qi et al. studied this question.
www.synapsesocial.com/papers/69e07cc02f7e8953b7cbddea — DOI: https://doi.org/10.1093/bib/bbag162