What question did this study set out to answer?

To improve the accuracy of predicting protein-protein interactions using pseudo-dimers derived from monomeric proteins.

June 3, 2026 · Open Access

Improving protein and protein interactions using pseudo-dimers derived from monomeric proteins

Key Points

Introduced a pre-training method called split and merge proxy (SMP) utilizing monomeric proteins.
Constructed pseudo-dimers by splitting monomeric data into pseudo-receptors and pseudo-ligands.
Fine-tuned models on real protein dimer datasets after pre-training with SMP.
Models pre-trained with SMP showed improved accuracy on multiple benchmarks compared to strong baselines.
SMP produced more accurate structure predictions than AlphaFold-Multimer and AlphaFold3 on several CASP15 dimer targets.
Enhanced generalization across various protein interaction applications.

Abstract

Accurately predicting protein-protein interactions (PPIs) in dimeric complexes remains a fundamental challenge in computational biology. Although existing PPI prediction models, such as AlphaFold-Multimer (AF-Multimer) and AlphaFold3 (AF3), have achieved impressive performance, their accuracy remains limited by the scarcity of protein dimer structures, which are expensive and labor-intensive to collect. Here, we introduce a simple yet effective pre-training method, termed split and merge proxy (SMP), that for the first time leverages abundant monomeric proteins to simulate various PPI tasks. Specifically, SMP constructs pseudo-dimers by splitting monomer data into two subunits, referred to as pseudo-receptors and pseudo-ligands, and trains models to merge them back by predicting their pseudo-interactions (e.g., contact or docking). This proxy task enables large-scale pre-training without additional data-collection cost. Models pre-trained with SMP and subsequently fine-tuned on real protein dimer datasets demonstrate consistently improved accuracy and generalization across multiple benchmarks, surpassing strong baselines. Notably, SMP delivers more accurate structure predictions than both AF-Multimer and AF3 on several CASP15 dimer targets. Our findings highlight SMP as a scalable strategy for harnessing monomeric data to advance protein complex modeling, providing insights into the linkage between monomers and multimers.

Accurate prediction of protein-protein interactions is limited by the scarcity of high-quality complex structures. Here, the authors introduce SMP, a strategy that leverages pseudo-dimers derived from monomers to improve accuracy and generalization across diverse protein interaction applications.
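The split-and-merge idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the single contiguous cut point, the use of C-alpha coordinates only, the 8 Å contact threshold, and the function names `make_pseudo_dimer` and `pseudo_contact_map` are all assumptions chosen for illustration. The sketch splits one monomer into a pseudo-receptor and a pseudo-ligand, then derives the inter-subunit contact labels that a model could be trained to predict.

```python
import numpy as np

def make_pseudo_dimer(coords, cut):
    """Split one chain's C-alpha coordinates at index `cut`.

    Hypothetical split rule: a single contiguous cut. The paper may
    choose split points differently.
    """
    coords = np.asarray(coords, dtype=float)
    if not 0 < cut < len(coords):
        raise ValueError("cut must leave both pseudo-subunits non-empty")
    return coords[:cut], coords[cut:]

def pseudo_contact_map(receptor, ligand, threshold=8.0):
    """Binary contact labels between the two pseudo-subunits.

    Entry (i, j) is 1 when receptor residue i and ligand residue j lie
    within `threshold` angstroms (assumed cutoff), else 0.
    """
    diff = receptor[:, None, :] - ligand[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < threshold).astype(np.int8)

# Toy monomer: six residues spaced 3.8 angstroms apart on a straight line.
monomer = np.array([[3.8 * i, 0.0, 0.0] for i in range(6)])
receptor, ligand = make_pseudo_dimer(monomer, cut=3)
labels = pseudo_contact_map(receptor, ligand)
```

In this toy case only the residues nearest the cut are in contact, so the labels carry the same kind of interface signal a real dimer contact map would. Because the labels come directly from the monomer's own structure, no additional experimental data is needed.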
