What question did this study set out to answer?

The study evaluates supervised machine learning pipelines for detecting DDoS attacks and assesses how practical they are to deploy in production.

April 15, 2026 | Open Access

An Evaluation of Supervised Machine Learning Pipelines for the Identification of Distributed Denial-of-Service Attacks Using Conventional and Computational Performance Metrics

Key Points

Evaluated 210 pipelines combining five classifiers, three feature selectors, two hyperparameter tuners and seven train–test splits
Used the CICDDoS2019 dataset for training and testing
Employed composite scoring and statistical testing for pipeline selection
Implemented grid search for hyperparameter tuning and recursive feature elimination for feature selection
Identified the Decision Tree model with Recursive Feature Elimination (20 features selected), Grid Search tuning and a 90-10 train–test split as the champion pipeline
Achieved a Matthews correlation coefficient (MCC) of 0.993±0.024
Demonstrated a training time of 0.194±0.001 seconds and an inference time of 0.000998±0.00008 seconds
Showed average memory usage of 15,167±322 kilobytes during operation
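
To make the champion configuration more concrete, the following is a minimal sketch of a Decision Tree pipeline with scaling, Recursive Feature Elimination (20 features) and grid search on a 90-10 split. It is not the authors' code: the synthetic data from `make_classification` stands in for CICDDoS2019, and the parameter grid, the RFE `step`, the cross-validation fold count and the random seeds are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CICDDoS2019 features (assumption: not the real data).
X, y = make_classification(
    n_samples=1000, n_features=40, n_informative=10, random_state=0
)

# 90-10 train-test split, matching the reported champion configuration.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, stratify=y, random_state=0
)

# Scaling and feature selection live inside the pipeline, so they are
# re-fitted on each cross-validation fold and never see held-out data.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", RFE(DecisionTreeClassifier(random_state=0),
                   n_features_to_select=20, step=5)),
    ("clf", DecisionTreeClassifier(random_state=0)),
])

# Hypothetical grid; the source does not list the values searched.
param_grid = {
    "clf__max_depth": [3, 5, None],
    "clf__min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    pipeline, param_grid, scoring=make_scorer(matthews_corrcoef), cv=3
)
search.fit(X_train, y_train)

mcc = matthews_corrcoef(y_test, search.predict(X_test))
n_selected = int(search.best_estimator_.named_steps["select"].support_.sum())
print(f"best params: {search.best_params_}")
print(f"test MCC: {mcc:.3f}, features kept: {n_selected}")
```

Keeping the selector inside the `Pipeline` is a design choice worth noting: selecting features before cross-validation would leak information from the validation folds into the selection step.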

Abstract

Distributed denial-of-service (DDoS) attacks are a type of denial-of-service (DoS) attack in which the targeted server, service or network is overloaded with malicious traffic from many different sources, with the aim of making the target inaccessible to legitimate users. They continue to pose a significant threat to the availability and integrity of organisational digital assets. While many studies have shown that machine learning models can detect such attacks with high predictive accuracy, they often fail to evaluate the practicality of deploying such models to production. This study aims to address this gap by evaluating a large number of pipelines based on five popular supervised classifiers for detecting DDoS attacks using the CICDDoS2019 dataset. The study employs a methodology that combines manual feature removal with automated encoding, scaling and feature selection integrated within pipelines. A total of 210 pipelines, formed from five classifiers, three feature selectors, two hyperparameter tuners and seven train–test splits, were initially evaluated. Pipeline performance was assessed using both conventional and computational performance metrics. To identify the champion pipeline, a two-step approach was employed: composite scoring for shortlisting, followed by statistical testing using Friedman and post hoc Nemenyi tests. The champion pipeline was a Decision Tree coupled with Recursive Feature Elimination (with 20 features selected) and Grid Search hyperparameter tuning, using a 90-10 train–test split. It achieved the best balance of predictive capability and computational overhead, with an MCC of 0.993±0.024, training time of 0.194±0.001 s, inference time of 0.000998±0.00008 s, CPU time of 0.194±0.008 s and average memory usage of 15,167±322 kilobytes across training and inference.
The findings highlight the importance of a holistic and more nuanced approach when selecting a champion pipeline that is not only effective but also feasible for deployment in resource-constrained environments.
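
As an illustration of reporting conventional and computational metrics side by side, the sketch below computes MCC from confusion-matrix counts and measures wall-clock time, CPU time and memory for a single call, using only the Python standard library. It makes several assumptions that are not from the study: `threshold_detector` and its toy traffic rates are hypothetical stand-ins for a trained classifier, and `tracemalloc` reports peak Python-level allocations, which is not the same quantity as the average memory usage the authors report.

```python
import math
import time
import tracemalloc


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def profile(fn, *args):
    """Run fn(*args); return (result, wall seconds, CPU seconds, peak KiB)."""
    tracemalloc.start()
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    result = fn(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, wall, cpu, peak_bytes / 1024


def threshold_detector(rates, threshold=1000):
    """Hypothetical detector: flag a flow as an attack above a packet rate."""
    return [1 if rate > threshold else 0 for rate in rates]


# Toy data (assumption): packets per second and ground-truth labels.
rates = [120, 80, 5000, 7200, 950, 1500, 60, 9000]
labels = [0, 0, 1, 1, 0, 0, 0, 1]

preds, wall_s, cpu_s, peak_kib = profile(threshold_detector, rates)

pairs = list(zip(preds, labels))
tp = sum(p == 1 and t == 1 for p, t in pairs)
tn = sum(p == 0 and t == 0 for p, t in pairs)
fp = sum(p == 1 and t == 0 for p, t in pairs)
fn = sum(p == 0 and t == 1 for p, t in pairs)

score = mcc(tp, tn, fp, fn)
print(f"MCC={score:.3f} wall={wall_s:.6f}s cpu={cpu_s:.6f}s peak={peak_kib:.1f} KiB")
```

Repeating such measurements over several runs and reporting the mean and spread, as the study does with its ± values, gives a fairer picture than a single timing.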

Authors

Adrian Kwiecien

Waddah Saeed

Journals

Mathematical and Computational Applications

Institutions

De Montfort University
