What question did this study set out to answer?

The aim is to optimize machine learning models for predicting protein-protein interactions based on noncovalent interactions.

March 4, 2026 | Open Access

Machine Learning Methods for Protein–Protein Interaction Prediction Based on Noncovalent Interactions

Key Points

The study aimed to optimize machine learning models for predicting protein–protein interactions (PPIs) from noncovalent interactions.
Noncovalent interaction data were generated from 44848 PDB files from the RCSB-PDB database.
Twenty-five machine learning algorithms were benchmarked with default parameters.
Top performers underwent hyperparameter optimization and feature selection.
Optimized models were ensembled with stacking and voting classifiers.
SHAP analysis was used to examine the contributions of noncovalent interactions.
ETsO achieved the best performance across all eight metrics, exceeding 0.9 on all except Specificity and MCC.
Three stacking models and ETsOFS followed closely in performance.
SHAP analysis indicated that PPIs depend on synergistic effects among multiple noncovalent interactions.
After feature selection, the models differed in which interactions appeared among their top 20 polynomial features.
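The workflow summarized in these points (benchmark with defaults, tune the top performer, then ensemble) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the study's code: the data are synthetic, only three candidate algorithms stand in for the twenty-five, and the grids, metric, and names such as `candidates` and `grids` are hypothetical choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for a table of noncovalent interaction features (assumption).
X, y = make_classification(n_samples=400, n_features=8, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: benchmark candidates with (near-)default parameters using cross-validation.
# max_iter is raised for logistic regression only to avoid convergence warnings.
candidates = {
    "extra_trees": ExtraTreesClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "log_reg": LogisticRegression(max_iter=1000),
}
baseline = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="balanced_accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(baseline, key=baseline.get)

# Step 2: tune the top performer over a small, illustrative grid.
grids = {
    "extra_trees": {"n_estimators": [50, 100], "max_depth": [None, 8]},
    "random_forest": {"n_estimators": [50, 100], "max_depth": [None, 8]},
    "log_reg": {"C": [0.1, 1.0, 10.0]},
}
search = GridSearchCV(candidates[best_name], grids[best_name], cv=3, scoring="balanced_accuracy")
search.fit(X_train, y_train)
tuned = search.best_estimator_

# Step 3: combine the tuned model with the other candidates by stacking and soft voting.
base = [(name, model) for name, model in candidates.items() if name != best_name]
base.append((best_name, tuned))
stack = StackingClassifier(
    estimators=base, final_estimator=LogisticRegression(max_iter=1000), cv=5
)
vote = VotingClassifier(estimators=base, voting="soft")

# Evaluate each model on the held-out split (accuracy via .score).
scores = {}
for name, model in {"tuned": tuned, "stacking": stack, "voting": vote}.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)

print("cross-validated baselines:", baseline)
print("held-out accuracy:", scores)
```

Selecting the model to tune by cross-validation on the training split keeps the test split untouched until the final comparison, which is one common way to avoid optimistic estimates.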

Abstract

Given the pivotal role of noncovalent interactions in protein–protein interactions (PPIs), uncovering the hidden patterns in interaction data has become essential for deciphering and evaluating PPIs. In the current study, different types of noncovalent interaction data were generated from 44848 PDB files collected from the RCSB-PDB database. On these data, twenty-five machine learning algorithms were benchmarked using default parameters, and the top performers were selected for hyperparameter optimization. The optimized models then underwent feature selection (FS) and were ensembled via stacking and voting classifiers before comprehensive performance evaluation on test data. In total, 12 optimized models were built to evaluate the relationship between PPIs and noncovalent interactions. Among them, ETsO achieved the best performance across all eight metrics (all > 0.9 except Specificity and MCC, which were < 0.9), followed closely by the three stacking models (SMet487, SMse375 and SMdt415) and ETsOFS. SHAP analysis was used to elucidate the contribution of noncovalent interactions to PPIs and indicated that PPIs depend inherently on synergistic effects among multiple noncovalent interactions. Further feature analysis revealed a notable divergence in feature usage among the three models after FS, with different interactions appearing at varying frequencies among the top 20 polynomial features. The current study provides new practical tools for PPI prediction and offers valuable insights into the molecular determinants of protein recognition.
