Given the pivotal role of noncovalent interactions in protein–protein interactions (PPIs), exploring the hidden patterns underlying interaction data has become essential for deciphering and evaluating PPIs. In the current study, different types of noncovalent interaction data were generated from 44,848 PDB files collected from the RCSB-PDB database. On these data, twenty-five machine learning algorithms were benchmarked using default parameters, and the top performers were selected for hyperparameter optimization. The optimized models then underwent feature selection (FS) and were ensembled via stacking and voting classifiers before comprehensive performance evaluation on test data. In total, 12 models were built to evaluate the relationship between PPIs and noncovalent interactions. Among them, ETsO achieved the best performance across all eight metrics (all above 0.9 except specificity and MCC), followed closely by the three stacking models (SMet487, SMse375, and SMdt415) and ETsOFS. SHAP analysis was used to elucidate the contribution of noncovalent interactions to PPIs and indicated that PPIs depend inherently on synergistic effects among multiple noncovalent interactions. Further feature analysis indicated a notable divergence in feature usage among the three models after FS, with different interactions appearing at varying frequencies among the top 20 polynomial features. The current study provides new practical tools for PPI prediction and valuable insights into the molecular determinants of protein recognition.
Feng et al. (Mon,) studied this question.
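The abstract names specificity and MCC (Matthews correlation coefficient) as the two metrics that fell below 0.9. As a minimal sketch of how these two quantities are conventionally computed from a binary confusion matrix, the following uses only the standard textbook definitions. The label convention (1 = interacting, 0 = non-interacting), the function names, and the toy labels are illustrative assumptions and are not taken from the study.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels.

    Assumed convention (not from the study): 1 = interacting, 0 = non-interacting.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def specificity(tp, tn, fp, fn):
    """True-negative rate: TN / (TN + FP); 0.0 if there are no negatives."""
    denom = tn + fp
    return tn / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels, purely illustrative (hypothetical data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
print("specificity:", specificity(*counts))
print("MCC:", mcc(*counts))
```

Because MCC combines all four confusion-matrix cells, it tends to be lower than accuracy-style metrics when either class is misclassified, which is one plausible reason it can trail the other scores.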