What question did this study set out to answer?

This research aims to improve the calibration of low-cost PM2.5 sensors using machine learning models.

February 12, 2026 · Open Access

Exploring Environmental and Temporal Performances of Machine Learning Models for Calibration of a Low-Cost PM2.5 Sensor

Key Points

This research aims to improve the calibration of low-cost PM2.5 sensors using machine learning models.
Evaluated two machine learning models: Random Forest (RF) and eXtreme Gradient Boosting (XGB).
Incorporated environmental variables (temperature and relative humidity) and temporal variables (hour of day and month of year).
Compared the ML models against baseline linear regression models and assessed all of them against the performance metrics recommended by USEPA.
The best machine learning model reduced root mean square error (RMSE) by 58% compared to linear regression.
A hybrid ensemble model achieved a 63% reduction in RMSE relative to the same baseline, with the lowest errors across most environmental and temporal conditions.
The hybrid model met USEPA performance criteria across diverse environmental and temporal conditions.
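
The RMSE reductions quoted above are relative differences between a model's RMSE and the baseline's. A minimal Python sketch of that arithmetic follows. The function names and all numbers are hypothetical illustrations, not code or data from the study.

```python
import math


def rmse(predicted, reference):
    """Root mean square error between paired predictions and reference values."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def percent_reduction(baseline_rmse, model_rmse):
    """Relative RMSE reduction of a model versus a baseline, in percent."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Hypothetical PM2.5 values in µg/m³ (assumed for illustration only).
reference = [10.0, 20.0, 30.0, 40.0]
baseline_pred = [14.0, 16.0, 34.0, 36.0]  # every prediction is off by 4
model_pred = [11.0, 19.0, 31.0, 39.0]     # every prediction is off by 1

baseline = rmse(baseline_pred, reference)
model = rmse(model_pred, reference)
reduction = percent_reduction(baseline, model)
print(f"baseline RMSE={baseline:.2f}, model RMSE={model:.2f}, reduction={reduction:.0f}%")
```

With these made-up inputs the baseline RMSE is 4 and the model RMSE is 1, so the reduction is 75%. The study's 58% and 63% figures come from the same kind of ratio applied to its own data.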

Abstract

Introduction: Low-cost particulate matter sensors (LCPMS) require calibration for reliable PM2.5 measurements. We propose a novel data-analytic framework to evaluate machine learning (ML) calibration approaches for fine particulate matter (PM2.5) across the complete environmental and temporal scales relevant to the geographic location via a long-term co-location study in Vishakhapatnam, India.

Methods: We compared Random Forest (RF) and eXtreme Gradient Boosting (XGB) models against baseline linear regression (LR) models while systematically incorporating temperature (T), relative humidity (RH), hour of day (HD), and month of year (MY) variables. A hybrid ensemble LR model combining the best-performing ML models was also explored. Overall model performance was assessed against the United States Environmental Protection Agency (USEPA) recommended performance metrics for low-cost PM2.5 sensors. In addition, the developed models were evaluated across environmental (T and RH) and temporal (HD and MY) scales by binning the respective variables. Categorical accuracies across air quality index (AQI) categories were also explored.

Results and Discussion: The best ML model reduced RMSE by 58% compared to the baseline LR model; the hybrid model performed better, with a 63% reduction in RMSE compared to the baseline. The hybrid model exhibited the lowest errors across most environmental and temporal conditions while satisfying the USEPA performance criteria.

Conclusions: The hybrid ensemble approach mitigates environmental and temporal variability effects on measurement accuracy, improving PM2.5 quantification in diverse field conditions. Our framework provides a robust, computationally efficient approach that is sensor-model agnostic and adaptable to various target pollutants and calibration methodologies.
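
The binned evaluation described in the abstract can be illustrated with a short Python sketch: group paired measurements by an environmental variable (here relative humidity), compute RMSE within each bin, and compare each bin against a target. The bin edges, the records, and the 7 µg/m³ RMSE target are all assumptions made for this example. The abstract gives neither the bins nor the numeric criteria, so check these against the full paper and the USEPA guidance.

```python
import math
from collections import defaultdict


def binned_rmse(records, bin_edges):
    """Return RMSE per bin for (bin_value, predicted, reference) records.

    bin_edges is an ascending list; bin i covers [bin_edges[i], bin_edges[i + 1]).
    Records whose bin_value falls outside the edges are skipped.
    """
    squared_errors = defaultdict(list)
    for value, predicted, reference in records:
        for low, high in zip(bin_edges, bin_edges[1:]):
            if low <= value < high:
                squared_errors[(low, high)].append((predicted - reference) ** 2)
                break
    return {b: math.sqrt(sum(e) / len(e)) for b, e in squared_errors.items()}


# Hypothetical records: (relative humidity in %, calibrated PM2.5, reference PM2.5).
records = [
    (35.0, 22.0, 25.0),
    (45.0, 28.0, 25.0),
    (65.0, 40.0, 50.0),
    (75.0, 60.0, 50.0),
]
rh_edges = [20, 60, 100]  # illustrative bin edges, not the study's

RMSE_TARGET = 7.0  # µg/m³; assumed threshold, not stated in the abstract

per_bin = binned_rmse(records, rh_edges)
meets_target = {b: value <= RMSE_TARGET for b, value in per_bin.items()}
for b in sorted(per_bin):
    print(b, round(per_bin[b], 2), "meets target" if meets_target[b] else "exceeds target")
```

The same function works for temperature, hour of day, or month of year by changing the first element of each record and the bin edges. Reporting one RMSE per bin, not a single overall value, shows whether a calibration that looks acceptable on average degrades under specific conditions, such as high humidity.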
