ABSTRACT Machine learning for water quality prediction has attracted considerable interest and has achieved satisfactory results in previous research. However, research in data-scarce environments remains limited. The Chao Phraya River is the major river of Thailand, and the Pollution Control Department (PCD) monitors water quality in the country's major rivers four times per year. There are 19 water quality monitoring stations along the river, six of which are in its lower part. This study compares the performance of two machine learning models, XGBoost and Random Forest, in predicting two non-optically active water quality parameters, nitrate nitrogen (NO3-N) and total phosphorus (TP), in the lower part of the river using remote sensing data and ground measurement data. The results indicate that XGBoost outperformed Random Forest in predicting both target variables, NO3-N (R² = 0.75) and TP (R² = 0.61), using an 80/20 train-test split. Despite the limited sample size, the models were able to extract meaningful patterns, though performance varied across cross-validation folds. These findings highlight the potential of machine learning for water quality prediction in data-scarce environments while emphasizing that more data are required to improve model robustness. This study provides a transferable workflow and contributes an initial step toward water quality prediction in data-limited conditions.
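The evaluation protocol named in the abstract (R² on an 80/20 train-test split, plus R² across cross-validation folds) can be illustrated with a minimal sketch. This is not the study's code: the data are synthetic, the sample size of 40 and the five folds are assumptions, and a one-variable least-squares line stands in for XGBoost and Random Forest so that the example needs only the Python standard library. All function names are hypothetical.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Least-squares fit of y = a + b*x (a stand-in model, not XGBoost/RF)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def evaluate(x, y, train_idx, test_idx):
    """Fit on the training indices and return R² on the test indices."""
    a, b = fit_line([x[i] for i in train_idx], [y[i] for i in train_idx])
    preds = [a + b * x[i] for i in test_idx]
    return r2_score([y[i] for i in test_idx], preds)


def holdout_and_cv(x, y, k=5, seed=0):
    """Return (R² on an 80/20 split, list of R² over k folds)."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    holdout = evaluate(x, y, idx[:cut], idx[cut:])
    cv_scores = []
    for f in range(k):
        test_fold = idx[f::k]  # every k-th shuffled index forms one fold
        held_out = set(test_fold)
        train = [j for j in idx if j not in held_out]
        cv_scores.append(evaluate(x, y, train, test_fold))
    return holdout, cv_scores


# Synthetic data (assumption): 40 samples, linear signal plus Gaussian noise.
rng = random.Random(42)
x = [rng.uniform(0.0, 1.0) for _ in range(40)]
y = [2.0 * xi + rng.gauss(0.0, 0.3) for xi in x]

holdout_r2, cv_r2 = holdout_and_cv(x, y)
print(f"80/20 hold-out R2: {holdout_r2:.2f}")
print("Fold R2 values:", [round(s, 2) for s in cv_r2])
print(f"Spread across folds (std): {statistics.stdev(cv_r2):.2f}")
```

With so few samples, each test fold contains only a handful of points, so the fold-level R² values can differ noticeably from the single hold-out score. Reporting the spread across folds alongside the hold-out value gives a more honest picture of model robustness.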
Mon et al. (Sat,) studied this question.