Cancer is a leading cause of death, and available treatments are often neither curative nor targeted, leaving an unmet need for faster development of new therapies. Anti-Cancer Peptides (ACPs) represent a possible solution because they are drawn to the unique characteristics of cancer cells and offer a targeted, low-toxicity alternative. However, identifying such peptides experimentally is arduous, which motivates screening candidates computationally before they reach the lab. This study explored the prediction of peptides with anti-cancer properties using several machine learning algorithms: Support Vector Machines (SVMs), Decision Trees, Logistic Regression, Gradient Boosting, and Neural Networks. SVMs were hypothesized to achieve the highest accuracy because of their complex nature, and all models were expected to perform well overall. The pipeline corrected class imbalances, featurized peptides using amino acid properties, used an 80-20 training-testing split, and applied 10-fold cross-validation. Under this setup, the models other than SVM showed a prediction accuracy of about 84.4%, with Gradient Boosting performing particularly well at 90.4% accuracy. These results refute the hypothesis that SVM would perform best but support the prediction that machine learning would be effective overall for ACP prediction. Analysis of the SVM results indicated that its poor performance most likely resulted from the data featurization techniques employed. Ultimately, the results of this study show the potential of machine learning in the identification and development of ACPs for future cancer treatment.
Aysha Jafar studied this question.
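The data preparation steps named in the abstract (class balancing, featurization from amino acid properties, an 80-20 split, and 10-fold cross-validation) can be illustrated with a minimal standard-library sketch. It is not the study's actual code: the specific features (amino acid composition, mean Kyte-Doolittle hydropathy, a simple net charge), the use of random undersampling for balancing, and all function names are assumptions chosen for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale (assumed feature; the study's properties may differ).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charges near neutral pH (histidine ignored).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def featurize(peptide):
    """Return 20 composition fractions, mean hydropathy, and net charge."""
    n = len(peptide)
    composition = [peptide.count(aa) / n for aa in AMINO_ACIDS]
    mean_hydropathy = sum(HYDROPATHY[aa] for aa in peptide) / n
    net_charge = sum(CHARGE.get(aa, 0) for aa in peptide)
    return composition + [mean_hydropathy, float(net_charge)]


def undersample(samples, labels, rng):
    """Balance classes by randomly dropping samples from the larger classes."""
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = min(len(group) for group in by_class.values())
    pairs = []
    for label, group in by_class.items():
        pairs.extend((s, label) for s in rng.sample(group, target))
    rng.shuffle(pairs)
    return pairs


def train_test_split(pairs, test_fraction=0.2):
    """Split already-shuffled (sample, label) pairs into train and test lists."""
    cut = int(round(len(pairs) * (1 - test_fraction)))
    return pairs[:cut], pairs[cut:]


def k_fold_indices(n, k=10):
    """Yield (train_idx, val_idx) index lists for k near-equal contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy sequences and labels (1 = ACP, 0 = non-ACP), not real data.
    peptides = ["KKLLKWLK", "GLFDIVKK", "AAGGSSTT", "DEDEGGAA", "PPGGSSNN", "TTSSGGAA"]
    labels = [1, 1, 0, 0, 0, 0]
    balanced = undersample([featurize(p) for p in peptides], labels, rng)
    train, test = train_test_split(balanced)
    print(len(balanced), len(train), len(test))
```

The resulting feature vectors and fold indices would then be passed to whichever classifiers are being compared. Undersampling was chosen here only because it keeps the sketch short and avoids duplicating samples across the train and test sets.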