Pro-inflammatory peptides are key immune signaling molecules that contribute to vaccine and immunotherapy development. Machine learning enables rapid, accurate, and high-throughput prediction of these peptides, complementing traditional experimental approaches. This study tests the hypothesis that a peptide’s pro-inflammatory potential can be predicted either from its physicochemical properties using amino acid–based methods or from k-mer sequence patterns using bag-of-words methods. Models were implemented in the Orange Data Mining platform and evaluated on a peptide dataset from PIP-EL. The amino acid–based methods, which employed the tree-based algorithms Random Forest and Gradient Boosting, achieved area under the ROC curve (AUC) values of 0.965 and 0.955, respectively. The k-mer methods (k = 5) using Logistic Regression and Neural Networks both achieved an AUC of 0.980, and AUCs remained consistently above 0.95 for k = 3–8. For these models, classification accuracy, F1-score, precision, and recall ranged from 0.91 to 0.93, and Matthews correlation coefficients (MCC) ranged from 0.82 to 0.86. These results demonstrate that properly configured machine learning models can effectively predict pro-inflammatory peptides computationally. While the findings are broadly consistent with previous studies, direct performance comparisons should be interpreted cautiously because of differences in algorithms, underlying hypotheses, therapeutic targets, and datasets. Overall, this study evaluates two machine learning approaches and presents reproducible models with strong performance metrics, which can help inform future peptide-based wet-lab therapeutic research.
Yanling Lin studied this question.
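To make the k-mer bag-of-words idea and the MCC metric concrete, the following is a minimal Python sketch. It is not the study's Orange workflow: the function names `kmer_counts`, `vectorize`, and `mcc`, the two example peptides, and the confusion-matrix counts are all hypothetical and chosen only for illustration. It assumes overlapping k-mers and raw counts as features; the study's exact featurization settings are not stated in the abstract.

```python
from collections import Counter
from math import sqrt


def kmer_counts(sequence, k=5):
    """Count overlapping k-mers in one peptide sequence (its 'bag of words')."""
    seq = sequence.upper()
    # A sequence shorter than k yields an empty bag.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def vectorize(sequences, k=5):
    """Build a shared, sorted k-mer vocabulary and one count vector per sequence."""
    bags = [kmer_counts(s, k) for s in sequences]
    vocab = sorted(set().union(*bags))
    matrix = [[bag[kmer] for kmer in vocab] for bag in bags]
    return vocab, matrix


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally reported as 0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical peptides, not taken from the PIP-EL dataset.
vocab, matrix = vectorize(["ACDEFGH", "CDEFGHK"], k=5)
print(vocab)   # ['ACDEF', 'CDEFG', 'DEFGH', 'EFGHK']
print(matrix)  # [[1, 1, 1, 0], [0, 1, 1, 1]]

# Made-up counts for illustration only; not results from the study.
print(round(mcc(tp=46, tn=46, fp=4, fn=4), 2))  # 0.84
```

Each row of `matrix` could then be passed to a classifier such as logistic regression. With realistic datasets the vocabulary grows quickly as k increases, so a sparse representation would usually be preferable to the dense lists used here.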