What does this research mean for the field?

XGBoost outperforms Random Forest in predicting the non-optically active water quality parameters NO3-N and TP in the Chao Phraya River using remote sensing data. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

The research aims to assess the effectiveness of machine learning models in predicting specific water quality parameters in a data-scarce environment.

March 3, 2026 · Open Access

Non-optically active water quality parameters prediction by using remote sensing and machine learning

Key points

Utilized two machine learning models: XGBoost and Random Forest.
Performed predictions on non-optically active water quality parameters, NO3-N and TP.
Employed remote sensing data combined with ground measurement data.
Conducted an 80/20 train-test split for model evaluation.
XGBoost outperformed Random Forest in predicting NO3-N (R² = 0.75) and TP (R² = 0.61).
Despite a limited sample size, the models identified meaningful patterns.
Model performance varied across cross-validation folds, indicating some instability.
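
The evaluation steps listed above (an 80/20 hold-out split, R² scoring, and a check of fold-to-fold variability) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, a one-feature least-squares line stands in for XGBoost and Random Forest so that the example needs only the Python standard library, and the function names, the seed values, and the choice of five folds are all hypothetical.

```python
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in for a real model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def evaluate(train, test):
    """Fit on the training rows and return R^2 on the test rows."""
    a, b = fit_line([x for x, _ in train], [y for _, y in train])
    preds = [a * x + b for x, _ in test]
    return r2_score([y for _, y in test], preds)


def holdout_r2(data, test_fraction=0.2, seed=0):
    """Shuffle once, hold out test_fraction of the rows, score on them."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    n_test = max(2, round(len(rows) * test_fraction))
    return evaluate(rows[n_test:], rows[:n_test])


def kfold_r2(data, k=5, seed=0):
    """Return one R^2 per fold; their spread indicates model stability."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(evaluate(train, folds[i]))
    return scores


# Synthetic (feature, target) pairs: a small sample, as in a data-scarce setting.
rng = random.Random(42)
data = []
for _ in range(40):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 0.5 * x + rng.gauss(0.0, 1.0)))

fold_scores = kfold_r2(data)
print("hold-out R^2:", round(holdout_r2(data), 3))
print("fold R^2 mean:", round(statistics.fmean(fold_scores), 3))
print("fold R^2 stdev:", round(statistics.stdev(fold_scores), 3))
```

With only a few dozen samples, each fold holds a handful of points, so a large standard deviation across folds is expected. That is the kind of instability the last key point refers to.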

Abstract

Machine learning for water quality prediction has attracted major interest and has achieved satisfactory results in previous research, but a gap remains in research on data-scarce environments. The Chao Phraya River is the major river of Thailand, and the Pollution Control Department (PCD) monitors water quality in the major rivers four times per year. There are 19 water quality monitoring stations along the river, including six in its lower part. This study investigates the performance of two machine learning models in predicting the non-optically active water quality parameters NO3-N and TP in the lower part of the river, using remote sensing data and ground measurement data. The results indicate that XGBoost outperformed Random Forest in predicting the target variables, NO3-N (R² = 0.75) and TP (R² = 0.61), using an 80/20 train-test split. Despite the limited sample size, the models were able to extract meaningful patterns, though performance remained variable across cross-validation folds. These findings highlight the potential of machine learning for water quality prediction in data-scarce environments while emphasizing that more data are required to improve model robustness. This study provides a transferable workflow and contributes an initial step toward water quality prediction in data-limited conditions.
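
To help read the reported scores, the following assumes the standard definition of the coefficient of determination (the abstract does not state which variant was used). Here $y_i$ are the observed values in the test set, $\hat{y}_i$ the model predictions, and $\bar{y}$ the mean of the observed values:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n} \left( y_i - \bar{y} \right)^2}
```

Under that assumption, a score of 0.75 for NO3-N means the model's squared error on held-out samples is 25% of the squared error obtained by always predicting the mean, and a score of 0.61 for TP corresponds to 39%.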

Cite This Study

Mon et al. (Sat,) studied this question.

synapsesocial.com/papers/69a67f4af353c071a6f0b269 https://doi.org/10.2166/ws.2026.125
