What question did this study set out to answer?

The study evaluates machine learning techniques for estimating surface NO₂ concentrations from GEMS and TROPOMI satellite data.

February 17, 2026 · Open Access

Machine Learning-Based Estimation of Surface NO₂ Concentrations over China: A Comparative Analysis of Geostationary (GEMS) and Polar-Orbiting (TROPOMI) Satellite Data

Key Points

Evaluated machine learning techniques for estimating surface NO₂ concentrations from GEMS and TROPOMI satellite data.
Used geostationary GEMS and polar-orbiting TROPOMI satellite data over China from 2022.
Trained four tree-based machine learning models: Random Forest, XGBoost, CatBoost, and LightGBM.
Integrated satellite vertical-column densities with multi-source meteorological and ancillary data.
Conducted a controlled experiment with matched sample sizes to isolate the effect of data volume on model performance.
CatBoost achieved the highest accuracy, with R² values of 0.842 for GEMS and 0.765 for TROPOMI.
GEMS-based models consistently outperformed TROPOMI-based models across all evaluated metrics.
GEMS estimates captured sharper concentration gradients and localized emission hotspots, whereas TROPOMI produced smoother fields.
Only GEMS enabled reconstruction of detailed diurnal patterns and near-real-time tracking of pollution episodes.

Abstract

High-accuracy spatiotemporal monitoring of surface nitrogen dioxide (NO₂) concentrations is essential for air quality management. This study evaluates machine learning-based estimates of near-surface NO₂ concentrations over China in 2022, using data from the geostationary GEMS instrument and the polar-orbiting TROPOMI instrument. Four tree-based models (Random Forest, XGBoost, CatBoost, and LightGBM) were trained by integrating satellite vertical-column densities with multi-source meteorological and ancillary data. CatBoost achieved the highest accuracy, with an R² of 0.842 for GEMS and 0.765 for TROPOMI, alongside the lowest RMSE and MAE. Models trained on GEMS data consistently outperformed TROPOMI-based models across all metrics. This advantage is attributed primarily to the substantially larger training sample enabled by GEMS’s high temporal resolution, as confirmed by a controlled experiment that used matched sample sizes to isolate the effect of data volume. Spatially, GEMS estimates captured sharper concentration gradients and localized emission hotspots, while TROPOMI produced smoother fields. Temporally, only GEMS allowed the reconstruction of detailed diurnal patterns and near-real-time tracking of pollution episodes. These results confirm that geostationary satellite data, when combined with machine learning, add significant value for high-frequency air quality monitoring and analysis.
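
The abstract reports R², RMSE, and MAE and describes a controlled experiment with matched sample sizes. A minimal sketch of both ideas follows. It assumes that R² denotes the coefficient of determination (the abstract does not say which definition was used) and that the sample sizes were matched by simple random subsampling, which may differ from the study's actual procedure. The function names `regression_metrics` and `match_sample_size` are hypothetical and do not come from the paper.

```python
import math
import random


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, MAE) for paired observed and predicted values.

    Assumption: R2 is the coefficient of determination, 1 - SS_res / SS_tot.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observations are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(o - p) for o, p in zip(observed, predicted)) / n
    return r2, rmse, mae


def match_sample_size(larger, smaller, seed=0):
    """Randomly subsample `larger` (without replacement) to len(smaller).

    Assumption: plain random subsampling; the study may have matched
    samples differently (for example by station or by overpass time).
    """
    rng = random.Random(seed)
    return rng.sample(larger, len(smaller))


# Illustrative values only, not data from the study.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 39.0]
r2, rmse, mae = regression_metrics(obs, pred)
```

Training on the subsampled larger dataset and on the smaller dataset with the same model and features would then show how much of a performance gap remains once data volume is held equal.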

Authors

Yijin Ma

Yi Wang

Jun Wang
