March 3, 2026

Effort estimation in software development projects using supervised machine learning techniques

Key Points

Tree-based models performed best; Random Forest in particular reached an accuracy of 80% when classifying projects into effort categories.
Models were evaluated with the F1-score, MCC, ROC-AUC, Gini index, and accuracy, with standard deviations used to assess stability.
The CRISP-DM methodology structured the analysis, allowing for rigorous data cleaning and variable selection processes.
Reliable predictive models depend on systematized, high-quality data, which points to the need for better-curated effort datasets.

Abstract

This paper presents the results of a research project focused on effort estimation in software development using supervised machine learning techniques. To structure the analysis process, the CRISP-DM methodology was adopted, given that it is recognized for its comprehensive approach and wide acceptance in data mining. The study was based on a dataset provided by the International Software Benchmarking Standards Group (ISBSG), to which rigorous cleaning, transformation, and variable selection procedures were applied. Four effort categories were defined, and key variables for their classification were identified, including the functional size of the software, team productivity, programming language, and the implementation platform. Eight predictive models were developed using representative supervised learning algorithms: AdaBoost, Decision Trees, Random Forests, SVM, Multilayer Perceptron, KNN, Naive Bayes, and Logistic Regression. Their evaluation was carried out using metrics such as the F1-score, MCC, ROC-AUC, Gini index, accuracy, and standard deviations to assess performance and stability. The results show that tree-based models, particularly Random Forest, offer superior performance, achieving an accuracy of 80%. It is concluded that having systematized and high-quality data is fundamental for building reliable predictive models. As future work, the study proposes examining additional ensemble configurations, incorporating new algorithms, and using updated versions of the ISBSG repository.
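
The evaluation metrics named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code: the labels and fold scores below are made-up toy values, the function names are hypothetical, and the Gini index is assumed here to be the common ROC-based definition (Gini = 2 * AUC - 1), which the abstract does not confirm.

```python
from collections import Counter
from math import sqrt
from statistics import mean, stdev


def accuracy(y_true, y_pred):
    """Fraction of projects whose effort category was predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-category F1 scores."""
    scores = []
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def multiclass_mcc(y_true, y_pred):
    """Matthews correlation coefficient in its multiclass form."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(v * v for v in true_counts.values())
    cov_pp = n * n - sum(v * v for v in pred_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


def gini_from_auc(auc):
    """Assumed definition: Gini coefficient derived from ROC-AUC."""
    return 2 * auc - 1


def stability(fold_scores):
    """Mean and sample standard deviation of a metric across folds."""
    return mean(fold_scores), stdev(fold_scores)


# Toy data: four effort categories encoded as 0..3 (hypothetical values).
y_true = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
y_pred = [0, 0, 1, 1, 1, 2, 2, 3, 3, 2]

print("accuracy:", accuracy(y_true, y_pred))
print("macro F1:", macro_f1(y_true, y_pred))
print("MCC:", multiclass_mcc(y_true, y_pred))
print("fold mean/std:", stability([0.78, 0.80, 0.82]))
```

Reporting the standard deviation alongside the mean, as in `stability`, is one way to compare how consistently each of the eight models performs across resampled splits rather than relying on a single score.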

Authors

Jesus Getial Barragan

Ricardo Timarán Pereira

David W. Ramilo
