The rapid integration of artificial intelligence (AI) into educational systems is transforming how student performance is analysed and how educational policies are informed by large-scale data. Within this context, machine learning techniques are increasingly used to identify patterns associated with academic success and educational inequality. However, the use of predictive algorithms in education also raises important questions regarding transparency, fairness, and potential algorithmic bias. This study examines the predictive performance and fairness implications of machine learning models used to identify academically resilient students using data from the Programme for International Student Assessment (PISA) 2022. The analysis is based on a dataset containing more than 600,000 student observations across multiple national education systems. Academic resilience is operationalised following the OECD framework: a student is classed as resilient when they belong to the lowest quartile of the socioeconomic status index (ESCS) within their country while achieving mathematics performance (PV1MATH) in the top quartile. A predictive framework incorporating six supervised learning algorithms (Logistic Regression, Random Forest, Gradient Boosting, XGBoost, LightGBM, and CatBoost) is implemented. The modelling pipeline includes data preprocessing, missing value imputation, class imbalance correction using SMOTE, and model evaluation through multiple classification metrics, including accuracy, F1-score, and the area under the ROC curve (AUC). In addition, fairness diagnostics are conducted to examine potential disparities in prediction outcomes across gender groups, while feature importance analysis and SHAP-based explanations are used to interpret the contribution of key predictors. The results indicate that ensemble-based models achieve the highest predictive performance, particularly those based on gradient boosting techniques.
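The resilience definition above can be sketched in a few lines of pandas. This is an illustrative sketch rather than the authors' code: the function name `label_resilient` and the country column name `CNT` are assumptions, the top quartile of PV1MATH is assumed to be computed within each country (the abstract states this explicitly only for ESCS), and survey weights are ignored.

```python
import pandas as pd


def label_resilient(df, country_col="CNT", ses_col="ESCS", score_col="PV1MATH"):
    """Return a 0/1 Series flagging students who are in the bottom ESCS
    quartile and the top PV1MATH quartile of their own country.

    Assumptions (not confirmed by the source): column names, within-country
    score quartiles, and unweighted quantiles.
    """
    by_country = df.groupby(country_col)
    # 25th percentile of ESCS within each country, broadcast to every row.
    ses_q1 = by_country[ses_col].transform(lambda s: s.quantile(0.25))
    # 75th percentile of the maths score within each country.
    score_q3 = by_country[score_col].transform(lambda s: s.quantile(0.75))
    # Rows with missing ESCS or score compare as False and are labelled 0.
    resilient = (df[ses_col] <= ses_q1) & (df[score_col] >= score_q3)
    return resilient.astype(int)


if __name__ == "__main__":
    # Tiny made-up table, purely for illustration.
    toy = pd.DataFrame({
        "CNT": ["A"] * 4 + ["B"] * 4,
        "ESCS": [-2.0, -1.0, 0.0, 1.0, -1.0, 0.0, 1.0, 2.0],
        "PV1MATH": [600, 400, 450, 500, 300, 350, 400, 700],
    })
    toy["resilient"] = label_resilient(toy)
    print(toy)
```

Because only a small share of students satisfy both conditions, a label built this way is heavily imbalanced, which is consistent with the study's use of SMOTE before model fitting.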
The analysis also reveals that socioeconomic status, migration background, and school repetition are the most influential predictors of academic resilience. Although gender displays relatively low predictive importance, measurable differences in positive prediction rates across gender groups point to potential algorithmic disparities. These findings highlight the importance of integrating fairness evaluation, transparency, and interpretability into educational data science workflows. The study contributes to ongoing discussions on the responsible use of artificial intelligence in education by emphasising the need for governance frameworks capable of ensuring that algorithmic systems support equity-oriented educational policies.
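The gender comparison described above, differences in positive prediction rates across groups, can be illustrated with a short standard-library sketch. The function names are hypothetical and the data is made up. The gap computed here is often called the demographic parity difference, but the abstract does not name the exact fairness metric the study uses.

```python
from collections import defaultdict


def positive_rate_by_group(y_pred, groups):
    """Share of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(y_pred, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def positive_rate_gap(y_pred, groups):
    """Largest minus smallest positive prediction rate across groups.

    A value of 0 means every group is predicted 'resilient' at the same rate.
    """
    rates = positive_rate_by_group(y_pred, groups)
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Hypothetical predictions for eight students, for illustration only.
    y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
    gender = ["F", "F", "F", "F", "M", "M", "M", "M"]
    print(positive_rate_by_group(y_pred, gender))
    print(positive_rate_gap(y_pred, gender))
```

A non-zero gap does not by itself establish unfair treatment, since base rates of resilience may differ between groups. It indicates where closer inspection of the model is warranted.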
Trejo-Macotela et al. studied this question.
www.synapsesocial.com/papers/69db37f94fe01fead37c6174 — DOI: https://doi.org/10.3390/educsci16040605
Francisco R. Trejo-Macotela
Mayra Fabiola González-Peralta
Gregoria C. Godinez-Flores
Education Sciences
Universidad Politécnica de Pachuca