This paper presents the results of a research project on effort estimation in software development using supervised machine learning techniques. The analysis followed the CRISP-DM methodology, chosen for its comprehensive approach and wide acceptance in data mining. The study used a dataset provided by the International Software Benchmarking Standards Group (ISBSG), to which rigorous cleaning, transformation, and variable selection procedures were applied. Four effort categories were defined, and the key variables for classifying projects into them were identified, including the functional size of the software, team productivity, programming language, and implementation platform. Eight predictive models were built with representative supervised learning algorithms: AdaBoost, Decision Trees, Random Forests, Support Vector Machines (SVM), Multilayer Perceptron, k-Nearest Neighbors (KNN), Naive Bayes, and Logistic Regression. Performance was evaluated with the F1-score, Matthews correlation coefficient (MCC), ROC-AUC, Gini index, and accuracy, and stability with the standard deviations of these metrics. The results show that tree-based models, particularly Random Forest, perform best, with Random Forest reaching an accuracy of 80%. The study concludes that systematized, high-quality data is fundamental for building reliable predictive models. As future work, it proposes examining additional ensemble configurations, incorporating new algorithms, and using updated versions of the ISBSG repository.
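The abstract names the evaluation metrics but not how they were computed. The following is a minimal sketch that makes several assumptions, none of which the abstract confirms. It assumes macro averaging over the four effort classes, the multiclass MCC formulation used by scikit-learn, and one-vs-rest ROC-AUC. It also assumes the common convention Gini = 2·AUC − 1, and that stability is measured as the sample standard deviation of per-fold scores from cross-validation. The class labels (`low`, `medium`, `high`, `very_high`) and the toy fold are hypothetical illustrations, not ISBSG data or the authors' code.

```python
import math
from statistics import mean, stdev

# Hypothetical labels for the four effort categories; the abstract does not name them.
CLASSES = ["low", "medium", "high", "very_high"]


def accuracy(y_true, y_pred):
    """Fraction of projects whose predicted effort class matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mcc(y_true, y_pred, classes):
    """Multiclass Matthews correlation coefficient computed from class totals."""
    s = len(y_true)                                       # number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))       # correctly classified
    t_k = [sum(t == k for t in y_true) for k in classes]  # true count per class
    p_k = [sum(p == k for p in y_pred) for k in classes]  # predicted count per class
    cov = c * s - sum(pk * tk for pk, tk in zip(p_k, t_k))
    denom = math.sqrt((s * s - sum(pk * pk for pk in p_k))
                      * (s * s - sum(tk * tk for tk in t_k)))
    return cov / denom if denom else 0.0


def binary_auc(labels, scores):
    """ROC-AUC as the probability that a positive outscores a negative (ties count 0.5).

    Assumes both positives and negatives are present, e.g. via stratified folds.
    """
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def ovr_macro_auc(y_true, proba, classes):
    """One-vs-rest ROC-AUC averaged over classes; proba rows follow `classes` order."""
    aucs = []
    for j, c in enumerate(classes):
        labels = [t == c for t in y_true]
        scores = [row[j] for row in proba]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


def gini(auc):
    """Gini index under the common convention Gini = 2 * AUC - 1."""
    return 2 * auc - 1


def summarize(fold_scores):
    """Mean and sample standard deviation of a metric across CV folds."""
    return mean(fold_scores), stdev(fold_scores)


# Toy fold (hypothetical, not ISBSG data): true classes, predictions, probabilities.
y_true = ["low", "low", "medium", "medium", "high", "high", "very_high", "very_high"]
y_pred = ["low", "medium", "medium", "medium", "high", "very_high", "very_high", "very_high"]
proba = [[0.7 if c == p else 0.1 for c in CLASSES] for p in y_pred]

auc = ovr_macro_auc(y_true, proba, CLASSES)
print(round(accuracy(y_true, y_pred), 3),           # 0.75
      round(macro_f1(y_true, y_pred, CLASSES), 3),  # 0.733
      round(mcc(y_true, y_pred, CLASSES), 3),       # 0.696
      round(auc, 3), round(gini(auc), 3))           # 0.833 0.667
```

A cross-validated comparison of the eight classifiers could compute each metric per fold in this way. Passing each metric's per-fold scores to `summarize` would then give the mean and the standard deviation used to judge stability.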
Jesus Getial Barragan
Ricardo Timarán Pereira
David W. Ramilo
Barragan et al. studied this question.