What question did this study set out to answer?

This research aims to predict anti-cancer peptides using machine learning classification techniques.

February 14, 2026 | Open Access

Using Supervised Machine Learning Classification Techniques to Predict Anti-Cancer Peptides

Key Points

This research aims to predict anti-cancer peptides using machine learning classification techniques.
Applied several machine learning algorithms including SVMs, Decision Trees, and Gradient Boosting.
Conducted an 80-20 training-testing split and used 10-fold cross-validation to assess model performance.
Addressed class imbalances and utilized amino acid properties for feature extraction.
Gradient Boosting achieved the highest accuracy of 90.4%, contrary to the expectation that SVM would perform best.
Excluding SVM, the models reached a prediction accuracy of about 84.4% overall.
SVM's lower performance was linked to the data featurization techniques used.
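
The evaluation protocol in the key points (an 80-20 training-testing split plus 10-fold cross-validation) can be sketched with the Python standard library alone. This is a minimal illustration, not the study's code: the function names, the sample count of 500, the fixed seed, and the choice to run cross-validation on the training portion only are all assumptions, since the summary does not specify them.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (fit, validate) index lists for k-fold cross-validation."""
    # Deal the indices into k folds, round-robin, so sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, validate in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validate


# Hypothetical dataset size; the summary does not report the real one.
train_idx, test_idx = train_test_split_indices(n_samples=500)
folds = list(k_fold_indices(train_idx, k=10))
```

Each fold's validation set is held out exactly once, so every training sample contributes to one validation score. The class-imbalance correction mentioned above is not shown here because the summary does not say which method was used.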

Abstract

In a world where cancer is a leading cause of death and available treatments are neither curative nor targeted, there is an unmet need to accelerate the development of new therapies. Anti-Cancer Peptides (ACPs) represent a possible solution because they are drawn to the unique characteristics of cancer cells and offer a low-toxicity, targeted alternative. However, identifying such peptides experimentally can be arduous, which calls for a computational method of screening candidates before they reach the lab. This study explored the prediction of peptides with anti-cancer properties using several machine learning algorithms: Support Vector Machines (SVMs), Decision Trees, Logistic Regression, Gradient Boosting, and Neural Networks. SVMs were hypothesized to perform best due to their complex nature, and the models were expected to perform well overall. After correcting class imbalances, featurizing the peptides using amino acid properties, applying an 80-20 training-testing split, and using 10-fold cross-validation, the models excluding SVM showed a prediction accuracy of about 84.4%. Gradient Boosting performed particularly well, with 90.4% accuracy. These results refute the hypothesis that SVM would perform best but support the prediction that machine learning would be effective overall in ACP prediction. Analysis of SVM performance suggested that its poor results most likely stemmed from the data featurization techniques employed. Ultimately, the results of this study show the potential of machine learning in the identification and development of ACPs for future cancer treatment.
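
To make "featurization using amino acid properties" concrete, the sketch below turns a peptide sequence into a fixed-length numeric vector. The abstract does not list the properties actually used, so the choices here are assumptions made for illustration: amino acid composition, mean hydropathy on the Kyte-Doolittle scale, a simple net charge count (K and R as +1, D and E as -1), and sequence length. The function name and example peptide are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charges near neutral pH (an assumption;
# histidine is treated as neutral here).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def featurize(peptide):
    """Return 23 features: 20 composition fractions, mean hydropathy,
    net charge, and sequence length."""
    seq = peptide.upper()
    if not seq or any(aa not in KYTE_DOOLITTLE for aa in seq):
        raise ValueError("expected a non-empty sequence of standard amino acids")
    n = len(seq)
    composition = [seq.count(aa) / n for aa in AMINO_ACIDS]
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in seq) / n
    net_charge = sum(CHARGE.get(aa, 0) for aa in seq)
    return composition + [mean_hydropathy, float(net_charge), float(n)]


# Illustrative sequence only; it is not taken from the study's dataset.
features = featurize("KKLLKK")
```

Vectors like this can be fed to any of the classifiers named in the abstract. Features on very different scales (fractions between 0 and 1 next to raw lengths) generally need standardizing before use with an SVM, which is one plausible way featurization choices could affect SVM results, though the abstract does not say what the specific cause was.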
