Abstract. Hepatitis, a severe liver infection caused by several distinct viral strains, poses a significant diagnostic challenge because the symptoms of its types overlap. Early and accurate classification of hepatitis types (A, B, C, D, and E) is crucial for effective treatment and for preventing complications. This study introduces a hybrid ensemble architecture that combines stacking and soft voting for multi-class hepatitis diagnosis from symptom-only inputs. The research uses a validated Kaggle dataset of 4,920 samples and 133 columns: 132 symptom attributes and one target variable. The dataset is split 80% for training and 20% for testing so that generalization is measured on held-out samples. Several machine learning models are evaluated: Decision Trees, Random Forest, Support Vector Machines, Gradient Boosting, Logistic Regression, Multinomial Naïve Bayes, and K-Nearest Neighbors. Among these, the ensemble techniques perform best, in particular stacking with Decision Tree and Logistic Regression as meta-models and a Voting Classifier using soft voting, reaching classification accuracies of up to 0.99. This non-invasive, symptom-based approach improves diagnostic precision, especially when differentiating conditions with similar symptoms, such as jaundice and cholestasis, and offers a scalable, cost-effective tool for early hepatitis detection in resource-limited settings.
Deotale et al. studied this question.
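The abstract names the ingredients of the hybrid ensemble (seven base learners, stacking with Logistic Regression and Decision Tree meta-models, and a soft-voting classifier) but not how they are wired together. The following is a minimal scikit-learn sketch of one plausible arrangement, not the authors' implementation. Several parts are assumptions made for illustration: the synthetic binary symptom matrix standing in for the Kaggle dataset, the helper names `make_synthetic_symptoms`, `build_hybrid_ensemble`, and `evaluate`, all hyperparameters, and the choice to soft-vote over the two stacked models.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def make_synthetic_symptoms(n_samples=500, n_symptoms=132, n_classes=5, seed=0):
    """Hypothetical stand-in for the dataset: binary symptom flags, 5 classes."""
    rng = np.random.default_rng(seed)
    # Each class has its own probability of presenting each symptom.
    profiles = rng.uniform(0.05, 0.6, size=(n_classes, n_symptoms))
    y = rng.integers(0, n_classes, size=n_samples)
    X = (rng.random((n_samples, n_symptoms)) < profiles[y]).astype(int)
    return X, y


def build_hybrid_ensemble(seed=0):
    """Assumed wiring: two stacked models combined by soft voting."""
    base = [
        ("dt", DecisionTreeClassifier(random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        # Without probability=True, stacking falls back to decision_function.
        ("svm", SVC()),
        ("gb", GradientBoostingClassifier(n_estimators=20, random_state=seed)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("mnb", MultinomialNB()),
        ("knn", KNeighborsClassifier()),
    ]
    # Stacking: meta-models are trained on cross-validated base predictions.
    stack_lr = StackingClassifier(
        estimators=base, final_estimator=LogisticRegression(max_iter=1000), cv=3
    )
    stack_dt = StackingClassifier(
        estimators=base, final_estimator=DecisionTreeClassifier(random_state=seed), cv=3
    )
    # Soft voting: average the class probabilities of the two stacked models.
    return VotingClassifier(
        estimators=[("stack_lr", stack_lr), ("stack_dt", stack_dt)], voting="soft"
    )


def evaluate(model, X, y, seed=0):
    """Fit on an 80/20 stratified split; return test accuracy and probabilities."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)
    return accuracy_score(y_test, model.predict(X_test)), proba


if __name__ == "__main__":
    X, y = make_synthetic_symptoms()
    accuracy, _ = evaluate(build_hybrid_ensemble(), X, y)
    print(f"held-out accuracy on synthetic data: {accuracy:.3f}")
```

Accuracy on this synthetic data says nothing about the reported 0.99. The sketch only shows how stacking and soft voting can be composed; to apply it to real data, replace `make_synthetic_symptoms` with a loader for the symptom table.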