What question did this study set out to answer?

The aim is to assess a new deep learning model for classifying Brassica species based on genomic codon usage patterns.

April 13, 2026

Open Access

GGAR: gradient guided adaptive regularization enhances deep learning classification of Brassica species using codon usage bias

Key Points

Developed a Gradient Guided Adaptive Regularization (GGAR) Multilayer Perceptron (MLP) model.
Compared GGAR against five other MLP approaches and a traditional 1D-CNN.
Utilized 10-fold cross-validation with various hyperparameter configurations.
Measured performance using accuracy, precision, recall, F1-score, and Matthews Correlation Coefficient (MCC).
GGAR outperformed existing models, achieving near-perfect accuracy and F1-score at low learning rates.
Statistical analysis confirmed GGAR's superior performance compared to traditional models (p < 0.001).
Fixed L1 performed well at higher learning rates, but GGAR excelled in low-learning-rate regimes.
Training took longer for GGAR but yielded higher accuracy than the faster models.
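
To make the MCC metric listed above concrete for a four-class problem, here is a minimal sketch of the multiclass Matthews Correlation Coefficient in plain Python. It is not the authors' evaluation code; the function name `multiclass_mcc` and the example labels are assumptions chosen for illustration.

```python
from collections import Counter
from math import sqrt


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews Correlation Coefficient from two label sequences.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    s = len(y_true)                                  # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly classified samples
    t_k = Counter(y_true)                            # true count per class
    p_k = Counter(y_pred)                            # predicted count per class
    classes = set(t_k) | set(p_k)

    numerator = c * s - sum(p_k[k] * t_k[k] for k in classes)
    denominator = sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    # By convention, return 0.0 when the denominator vanishes
    # (for example, when every sample is assigned to one class).
    return numerator / denominator if denominator else 0.0


# Example with made-up labels: one B. napus sample is misclassified as B. rapa.
y_true = ["juncea", "napus", "napus", "oleracea", "rapa", "rapa"]
y_pred = ["juncea", "napus", "rapa", "oleracea", "rapa", "rapa"]
score = multiclass_mcc(y_true, y_pred)
```

A score of 1 means every sample was classified correctly, values near 0 indicate chance-level agreement, and negative values indicate systematic disagreement.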

Abstract

This study presents a comprehensive assessment of deep learning models for classifying four Brassica species (Brassica juncea, Brassica napus, Brassica oleracea, and Brassica rapa) based on codon usage frequency patterns mined from their genome-wide coding sequences (CDS). We compared a novel Gradient Guided Adaptive Regularized (GGAR) Multilayer Perceptron (MLP) model against five penalized MLP approaches (Adaptive, Elastic Net, Fixed L1, Fixed L2, and the baseline MLP) and one traditional 1D-CNN model across multiple hyperparameter configurations (learning rates: 0.01, 0.001, 0.0001; batch sizes: 32, 64, 128, 256). The models were evaluated using 10-fold cross-validation, with performance metrics including accuracy, precision, recall, F1-score, and Matthews Correlation Coefficient (MCC). The results show that GGAR consistently outperformed the existing models at the low learning rate of 0.0001 with batch sizes of 32, 64, and 128, attaining near-perfect accuracy, recall, MCC, and F1-score (approximately 1). Statistical validation via Kruskal–Wallis and ANOVA tests confirmed GGAR’s superiority (p < 0.001) over the comparative models and the traditional CNN model in all evaluation scenarios. Notably, Fixed L1 and the CNN excelled at the higher learning rates of 0.01 and 0.001, while GGAR dominated in fine-tuned, low-learning-rate regimes, indicating its effectiveness in handling indirect genomic patterns. Analysis of training durations showed that Fixed L1 was computationally efficient, completing in 5.90–91.52 min, whereas GGAR required 6.38–124.78 min but achieved higher accuracy. The MLP baseline performed competitively but less consistently, and Elastic Net and Fixed L2 showed clear speed-versus-precision tradeoffs. The CNN also performed exceptionally well but ran very slowly, taking 99.49–179.25 min.
These results highlight the significance of adaptive regularization in genomic classification, with GGAR proving particularly effective for precise species classification. This study offers practical guidance for selecting deep learning models in bioinformatics, stressing how regularization approaches and hyperparameter tuning influence model performance.
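
As an illustration of the kind of input feature the abstract describes, the sketch below turns a single coding sequence into a 64-dimensional vector of relative codon frequencies. The abstract does not specify the exact feature definition or preprocessing, so the function name `codon_frequencies`, the fixed codon ordering, and the choice to skip codons containing ambiguous bases are all assumptions.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every sequence maps to the same feature layout.
CODONS = ["".join(bases) for bases in product("ACGT", repeat=3)]


def codon_frequencies(cds):
    """Return relative frequencies of the 64 codons in one coding sequence.

    Hypothetical helper for illustration; not the authors' pipeline.
    """
    cds = cds.upper().replace("U", "T")       # accept RNA input as well
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Only unambiguous codons count toward the total (e.g. "ANG" is skipped).
    total = sum(counts[codon] for codon in CODONS)
    if total == 0:
        return [0.0] * len(CODONS)
    return [counts[codon] / total for codon in CODONS]


# Example: a toy sequence with codons ATG, GCC, GCC, TAA.
features = codon_frequencies("ATGGCCGCCTAA")
```

A vector like `features`, computed per sequence or aggregated per genome, could serve as one input row for an MLP or a 1D-CNN classifier.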


Shahzad et al. studied this question.

synapsesocial.com/papers/69dc892e3afacbeac03eae75 https://doi.org/10.1186/s12859-026-06440-0
