Abstract

Introduction
Low-cost particulate matter sensors (LCPMS) require calibration to provide reliable PM₂.₅ measurements. We propose a data-analytic framework for evaluating machine learning (ML) calibration approaches for fine particulate matter (PM₂.₅) across the full range of environmental and temporal conditions relevant to a geographic location, demonstrated through a long-term co-location study in Vishakhapatnam, India.

Methods
We compared Random Forest (RF) and eXtreme Gradient Boosting (XGB) models against baseline linear regression (LR) models while systematically incorporating temperature (T), relative humidity (RH), hour of day (HD), and month of year (MY) as predictor variables. A hybrid ensemble LR model combining the best-performing ML models was also explored. Overall model performance was assessed against the performance metrics recommended by the United States Environmental Protection Agency (USEPA) for low-cost PM₂.₅ sensors. In addition, the developed models were evaluated across environmental (T and RH) and temporal (HD and MY) scales by binning the respective variables. Categorical accuracy across air quality index (AQI) categories was also examined.

Results and Discussion
The best ML model reduced RMSE by 58% relative to the baseline LR model, and the hybrid model performed better still, with a 63% reduction in RMSE relative to the baseline. The hybrid model exhibited the lowest errors across most environmental and temporal conditions while satisfying the USEPA performance criteria.

Conclusions
The hybrid ensemble approach mitigates the effects of environmental and temporal variability on measurement accuracy, improving PM₂.₅ quantification in diverse field conditions. Our framework provides a robust, computationally efficient approach that is sensor-model agnostic and adaptable to various target pollutants and calibration methodologies.
Wathore et al. (Sun,) studied this question.
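The binned evaluation described in the Methods can be illustrated with a minimal sketch. It assumes paired reference and calibrated PM₂.₅ values together with one covariate (for example RH). The function names, bin edges, and toy numbers below are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import math
from collections import defaultdict


def rmse(reference, predicted):
    """Root-mean-square error between paired reference and calibrated values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def binned_rmse(reference, predicted, covariate, edges):
    """RMSE within each half-open bin [edges[i], edges[i+1]) of a covariate.

    Samples falling outside every bin are ignored; empty bins are omitted.
    """
    groups = defaultdict(lambda: ([], []))
    bins = list(zip(edges[:-1], edges[1:]))
    for r, p, c in zip(reference, predicted, covariate):
        for lo, hi in bins:
            if lo <= c < hi:
                groups[(lo, hi)][0].append(r)
                groups[(lo, hi)][1].append(p)
                break
    return {b: rmse(refs, preds) for b, (refs, preds) in groups.items()}


def rmse_reduction_pct(baseline_rmse, model_rmse):
    """Percentage reduction in RMSE of a model relative to a baseline."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Toy data (hypothetical, for illustration only).
reference = [10.0, 20.0, 30.0, 40.0]   # reference-grade PM2.5
calibrated = [12.0, 18.0, 33.0, 37.0]  # calibrated low-cost sensor PM2.5
rh = [35.0, 45.0, 75.0, 85.0]          # relative humidity, %
rh_edges = [0, 50, 100]                # assumed bin edges

per_bin = binned_rmse(reference, calibrated, rh, rh_edges)
for (lo, hi), err in sorted(per_bin.items()):
    print(f"RH in [{lo}, {hi}): RMSE = {err:.2f}")
```

The same helper can be applied to T, HD, or MY by swapping the covariate and the bin edges, which allows models to be compared condition by condition rather than only on an overall score.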