What question did this study set out to answer?

The aim is to predict hypothyroidism earlier using machine learning algorithms and appropriate feature selection techniques.

January 22, 2026 | Open Access

Early Prediction of Hypothyroidism using Chi-Square Feature Selection and SMOTE-ENN with Randomized Search CV Compared to Grid-Based Search

Key Points

Utilized machine learning algorithms with data pre-processing and feature selection.
Handled class imbalance using SMOTE-ENN to balance the dataset.
Optimized model hyperparameters using Random Search Cross-Validation.
Evaluated multiple classifiers: Decision Tree, Random Forest, and K-Nearest Neighbor.
Achieved an accuracy of 98.81% in predicting hypothyroidism.
Implemented ensemble learning strategies to enhance predictive performance.
Demonstrated effectiveness through metrics such as precision, recall, and F1-score.
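
The last two key points, voting-based ensembling and evaluation with precision, recall, and F1-score, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the helper names `hard_vote` and `binary_metrics` are hypothetical, the predictions are made-up toy values rather than results from the study, and the source does not say whether hard or soft voting was used.

```python
from collections import Counter


def hard_vote(predictions_per_model):
    """Return the majority label per sample across several models' predictions."""
    voted = []
    for sample_preds in zip(*predictions_per_model):
        # most_common(1) yields [(label, count)] for the most frequent label
        voted.append(Counter(sample_preds).most_common(1)[0][0])
    return voted


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy data (illustrative only): 1 = hypothyroid, 0 = not hypothyroid
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
dt_pred = [1, 1, 0, 0, 0, 1, 0, 0]   # stand-in for a Decision Tree
rf_pred = [1, 1, 1, 0, 0, 0, 0, 0]   # stand-in for a Random Forest
knn_pred = [1, 0, 0, 0, 1, 1, 0, 0]  # stand-in for K-Nearest Neighbor

ensemble_pred = hard_vote([dt_pred, rf_pred, knn_pred])
metrics = binary_metrics(y_true, ensemble_pred)
print(ensemble_pred)
print(metrics)
```

With three voters and two classes a tie cannot occur, which is one reason an odd number of base models is a common choice for majority voting.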

Abstract

Objective: To predict hypothyroid disorder at an earlier stage using machine learning algorithms.

Method: Early diagnosis of hypothyroidism was addressed through a machine learning framework incorporating data pre-processing, handling of the imbalanced dataset, selection of relevant features, data splitting, and model optimization. The imbalanced dataset was balanced using SMOTE-ENN to reduce the effect of class imbalance and improve learning on minority-class samples. Feature selection was performed using a filter-based chi-square test to identify the most relevant features, thereby improving model performance and reducing overfitting. Random Search Cross-Validation (Randomized Search CV) was used to train and optimize a set of machine learning classifiers, namely Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbor (KNN), and to determine their optimal hyperparameters.

Findings: An ensemble learning strategy using voting was implemented to improve predictive performance. The effectiveness of each model was measured using precision, recall, F1-score, and accuracy, with a reported accuracy of 98.81%. The experimental results indicate that the proposed method substantially improves classification and can serve as a reliable tool for the early detection of hypothyroidism in clinical decision support systems.

Novelty: Random Search provides a well-defined approach to hyperparameter tuning for earlier and faster prediction of hypothyroidism, offering computational efficiency and robust performance, and serving as a stepping stone toward more advanced optimization strategies.

Keywords: Hypothyroidism, SMOTE-ENN, Pre-processing, Feature selection, Performance metrics
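
The filter-based chi-square feature selection mentioned in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the Pearson chi-square statistic of independence between a categorical feature and the class label, the helper names `chi_square_score` and `select_top_k` are hypothetical, and the column names and values are made-up toy data.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and the class label."""
    n = len(feature)
    observed = Counter(zip(feature, labels))  # joint counts per (value, class) cell
    feature_totals = Counter(feature)         # row totals
    label_totals = Counter(labels)            # column totals
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            # Expected count if feature and label were independent
            expected = f_count * l_count / n
            diff = observed[(f_value, l_value)] - expected
            score += diff * diff / expected
    return score


def select_top_k(columns, labels, k):
    """Rank features by chi-square score (highest first) and keep the top k names."""
    scores = {name: chi_square_score(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy data (illustrative only): 1 = hypothyroid, 0 = not hypothyroid
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "feature_a": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly associated with the label
    "feature_b": [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
    "feature_c": [1, 1, 1, 0, 0, 0, 0, 1],  # partially associated
}

top_features, scores = select_top_k(columns, labels, k=2)
print(top_features)
print(scores)
```

A higher score means the feature's distribution differs more across classes, so a filter method keeps the top-ranked features before any classifier is trained. Library implementations exist (for example `SelectKBest` with `chi2` in scikit-learn), though they may compute the statistic differently, and the abstract does not state which tooling the authors used.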

Cite This Study

Chitra et al. studied this question.

synapsesocial.com/papers/6971be50642b1836717e2f2b https://doi.org/10.17485/ijst/v19i1.1928
