What question did this study set out to answer?

The aim is to improve protein function prediction by addressing the specificity imbalance in gene ontology annotations.

April 17, 2026

Balancing gene ontology annotation specificity in protein function prediction based on the protein sequence large graph

Key Points

The aim is to improve protein function prediction by addressing the specificity imbalance in gene ontology annotations.
Developed ProGO-PSL, a large graph architecture for protein function prediction.
Utilized explicit domain identifiers from InterPro and implicit evolutionary context from Multiple Sequence Alignments.
Implemented an imbalance learning framework to fuse complementary data sources.
ProGO-PSL outperformed existing methods by 5-15% across all specificity levels.
Demonstrated robust generalization on benchmark and independent test sets.
Produced interpretable representations clarifying relationships between GO terms.

Abstract

Accurate protein function prediction is fundamental to advancing drug discovery, precision medicine, and the understanding of complex biological systems. While gene ontology (GO) provides a standardized framework for protein annotation, a critical challenge persists: the imbalance between low-specificity and high-specificity GO terms. This imbalance creates blind spots in our understanding of protein function landscapes, particularly in clinically relevant pathways. We present ProGO-PSL, a novel large graph architecture designed to resolve this imbalance. ProGO-PSL simultaneously leverages explicit domain identifiers from InterPro and implicit evolutionary context from multiple sequence alignments (MSAs), fusing these complementary data sources within an imbalance learning framework. Our model consistently outperforms state-of-the-art methods by 5-15% across all specificity levels, on both the benchmark dataset and the independent test set, demonstrating robust generalization. Furthermore, ProGO-PSL generates interpretable representations that clarify relationships between low- and high-specificity GO terms, enabling a more complete functional characterization of the proteome. This work accelerates the identification of therapeutic targets in previously uncharacterized biological pathways.
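To make the notion of annotation specificity concrete, the sketch below computes one common proxy for it: the information content of a GO term, derived from how often the term is annotated once annotations have been propagated to ancestor terms. This is an illustration only. The abstract does not state how ProGO-PSL defines specificity levels, and the toy ontology, protein identifiers, and function names here are hypothetical.

```python
import math

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "binding": [],
    "ion binding": ["binding"],
    "zinc ion binding": ["ion binding"],
}

# Hypothetical annotations: protein -> directly annotated terms.
ANNOTATIONS = {
    "P1": {"zinc ion binding"},
    "P2": {"ion binding"},
    "P3": {"binding"},
    "P4": {"binding"},
}


def ancestors(term, parents):
    """Return the term together with all of its ancestors."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def propagate(annotations, parents):
    """A protein annotated with a term also carries every ancestor of that term."""
    propagated = {}
    for protein, terms in annotations.items():
        full = set()
        for term in terms:
            full |= ancestors(term, parents)
        propagated[protein] = full
    return propagated


def information_content(propagated):
    """IC(term) = log2(N / count): rarer terms score higher (more specific)."""
    n = len(propagated)
    counts = {}
    for terms in propagated.values():
        for term in terms:
            counts[term] = counts.get(term, 0) + 1
    return {term: math.log2(n / count) for term, count in counts.items()}


ic = information_content(propagate(ANNOTATIONS, PARENTS))
for term in sorted(ic, key=ic.get):
    print(f"{term}: {ic[term]:.1f}")
```

In this toy data the general term is carried by every protein and scores zero, while the most specific term is carried by one protein in four and scores highest. Scores of this kind could be used to group terms into specificity levels when evaluating a predictor, though whether the study does so is not stated here.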

Cite this study

Shao et al. studied this question.

www.synapsesocial.com/papers/69e1cf1b5cdc762e9d85817e — DOI: https://doi.org/10.1101/gr.280816.125

Authors

JiangYi Shao

Shutao Chen

Ziwen Wang

Journals

Genome Research

Institutions

Beijing Institute of Technology
