What question did this study set out to answer?

To develop a machine learning model for classifying peptides based on their self-assembly characteristics.

February 14, 2026
Open Access

Predicting Peptide Self-Assembly Using Machine Learning

Key Points

Created a supervised machine learning model to classify peptides as self-assembling or non-self-assembling
Used a dataset of 42,532 peptide sequences labeled for self-assembly
Extracted features from peptide sequences using k-mer-based representations
Evaluated multiple classification algorithms, including logistic regression, random forest, support vector machine, and neural networks
Validated all models with k-fold cross-validation
Non-linear models outperformed linear classifiers in predicting peptide self-assembly
The study highlights complex, non-linear relationships between sequence and assembly behavior
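The Key Points do not state the k-mer length or how k-mer counts are normalized. The following minimal sketch shows how a peptide could be turned into a fixed-length feature vector. It assumes overlapping k-mers over the 20 standard amino acids, k = 2 by default, and counts normalized to frequencies. `kmer_vocabulary` and `kmer_features` are hypothetical helpers for illustration and are not part of the study's Orange3 workflow.

```python
from itertools import product

# Assumption: peptides use the 20 standard one-letter amino-acid codes;
# windows containing any other letter are skipped.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_vocabulary(k: int) -> list[str]:
    """All 20**k possible k-mers in a fixed order, so feature columns line up across peptides."""
    return ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]


def kmer_features(sequence: str, k: int = 2) -> list[float]:
    """Hypothetical helper: normalized counts of the overlapping k-mers in one peptide."""
    vocab = kmer_vocabulary(k)
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0.0] * len(vocab)
    seq = sequence.upper()
    total = 0
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in index:  # ignore windows with non-standard residues
            counts[index[kmer]] += 1
            total += 1
    if total:
        # Assumption: frequencies rather than raw counts, so peptide length does not dominate.
        counts = [c / total for c in counts]
    return counts
```

For example, `FFKLVFF` has six overlapping 2-mers (FF, FK, KL, LV, VF, FF), so its 400-entry vector holds 2/6 in the FF column and 1/6 in each of the other four observed columns. The fixed vocabulary ordering ensures every peptide maps to the same columns, which any tabular classifier requires.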

Abstract

Peptide self-assembly has both biological and technological applications, including the design of biomaterials and nanostructures. However, computationally predicting whether a given amino acid sequence will self-assemble remains difficult because the relationship between peptide sequence and structure is non-linear. In this study, a supervised machine learning model was created to classify peptides as either self-assembling (positive, 1) or non-self-assembling (negative, 0) based on features derived from their amino acid sequences. The dataset contained 42,532 peptide sequences, each labeled positive or negative for self-assembly. Features were extracted using a k-mer-based representation of each sequence, and several classification algorithms were trained and evaluated in the Orange3 visual programming environment. Both linear and non-linear classifiers were used, including logistic regression, random forest, support vector machine, and neural networks. All models were validated with k-fold cross-validation and assessed using standard performance metrics. The results show that non-linear models outperform linear ones on this task. This finding supports the view that self-assembly behavior depends on peptide sequence in complex, non-linear ways.
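The abstract does not report the number of folds or which performance metrics were used. The sketch below shows stratified k-fold cross-validation in plain Python. It assumes k = 5, takes accuracy, precision, recall, and F1 as the "standard" metrics, and accepts any training function. `train_1nn` is a hypothetical 1-nearest-neighbor stand-in for a non-linear model. It is not one of the study's classifiers, which were run through Orange3.

```python
import random
from statistics import mean
from typing import Callable, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], int]          # a fitted model: features -> 0/1 label
Trainer = Callable[[list, list], Model]  # fits a model on (features, labels)


def stratified_kfold(labels: Sequence[int], k: int = 5, seed: int = 0) -> list[tuple[list[int], list[int]]]:
    """Split indices into k folds so each fold keeps roughly the overall 0/1 ratio."""
    rng = random.Random(seed)
    folds: list[list[int]] = [[] for _ in range(k)]
    position = 0
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for i in members:  # deal each class round-robin across the folds
            folds[position % k].append(i)
            position += 1
    splits = []
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        splits.append((train, sorted(test)))
    return splits


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1, treating label 1 (self-assembling) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision, "recall": recall, "f1": f1}


def cross_validate(X: Sequence[Vector], y: Sequence[int], train: Trainer,
                   k: int = 5, seed: int = 0) -> dict[str, float]:
    """Train on k-1 folds, score on the held-out fold, and average each metric over the k folds."""
    per_fold = []
    for train_idx, test_idx in stratified_kfold(y, k, seed):
        model = train([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [model(X[i]) for i in test_idx]
        per_fold.append(binary_metrics([y[i] for i in test_idx], preds))
    return {name: mean(fold[name] for fold in per_fold) for name in per_fold[0]}


def train_1nn(X_train: list, y_train: list) -> Model:
    """Hypothetical non-linear stand-in: 1-nearest-neighbor with squared Euclidean distance."""
    def predict(x: Vector) -> int:
        nearest = min(range(len(X_train)),
                      key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
        return y_train[nearest]
    return predict
```

As a usage example on synthetic XOR-style data, where no straight line separates the classes, `cross_validate([(0, 0), (0, 1), (1, 0), (1, 1)] * 5, [0, 1, 1, 0] * 5, train_1nn)` scores 1.0 on every metric. A linear trainer such as logistic regression could be passed through the same `train` argument. That would reproduce, on toy data, the kind of linear versus non-linear comparison the abstract describes.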
