Accurate protein function prediction is fundamental to advancing drug discovery, precision medicine, and the understanding of complex biological systems. While the Gene Ontology (GO) provides a standardized framework for protein annotation, a critical challenge persists: the imbalance between low-specificity and high-specificity GO terms. This imbalance creates blind spots in our understanding of protein function landscapes, particularly in clinically relevant pathways. We present ProGO-PSL, a novel large graph architecture designed to resolve this imbalance. ProGO-PSL simultaneously leverages explicit domain identifiers from InterPro and implicit evolutionary context from multiple sequence alignments, fusing these complementary data sources within an imbalance learning framework. Our model consistently outperforms state-of-the-art methods by 5-15% across all specificity levels, on both the benchmark dataset and an independent test set, demonstrating robust generalization. Furthermore, ProGO-PSL generates interpretable representations that clarify relationships between low- and high-specificity GO terms, enabling a more complete functional characterization of the proteome. This work accelerates the identification of therapeutic targets in previously uncharacterized biological pathways.
Shao et al. (Wed,) studied this question.
www.synapsesocial.com/papers/69e1cf1b5cdc762e9d85817e — DOI: https://doi.org/10.1101/gr.280816.125
JiangYi Shao
Shutao Chen
Ziwen Wang
Genome Research
Beijing Institute of Technology