Commercial pesticides are indispensable for modern agriculture but may exert unintended toxicity toward non-target soil fauna, notably earthworms. Here, we present a machine learning-based QSAR (ML-QSAR) framework that integrates molecular structure descriptors, soil physicochemical parameters, and formulation types to predict commercial pesticide toxicity under real soil conditions. The models require four inputs: a SMILES string (used to compute Mordred descriptors, RDKit descriptors, and Morgan fingerprints), soil organic matter content, soil pH, and the commercial pesticide formulation type. The output is the toxicity of the commercial pesticide to earthworms, expressed as LC50 (the concentration lethal to 50% of the population). A combined dataset of 608 proprietary and 339 publicly available data points was used to train and validate 30 binary classification models, generated by pairing ten machine learning algorithms with two descriptor sets and one molecular fingerprint. The Gradient Boosting Decision Tree model with RDKit descriptors (GBDT-RDKit) achieved the best predictive performance (accuracy = 0.83, precision = 0.77, recall = 0.80, F1-score = 0.79, MCC = 0.64, AUROC = 0.88), with an applicability domain defined by a maximum Tanimoto similarity > 0.512. SHapley Additive exPlanations (SHAP) analysis quantified the relative contributions of molecular and environmental features, highlighting the dominant influence of chemical structure, soil parameters, and formulation types. Three-class classifiers were also developed to prioritize highly toxic pesticides. Overall, our high-performance ML-QSAR approach offers a rapid and cost-effective surrogate for assessing commercial pesticide impacts on earthworms in soil environments.
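The applicability-domain criterion stated above (maximum Tanimoto similarity > 0.512) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each fingerprint is represented as a set of on-bit indices and that similarity is taken against the training-set compounds; the function names, the toy fingerprints, and the convention for two empty fingerprints are hypothetical. Only the 0.512 cutoff comes from the text.

```python
from typing import Iterable, Set

# Cutoff reported in the abstract; everything else here is illustrative.
AD_THRESHOLD = 0.512


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def max_similarity(query: Set[int], training: Iterable[Set[int]]) -> float:
    """Highest Tanimoto similarity between the query and any training fingerprint."""
    return max((tanimoto(query, fp) for fp in training), default=0.0)


def in_domain(query: Set[int], training: Iterable[Set[int]],
              threshold: float = AD_THRESHOLD) -> bool:
    """True when the query's nearest training neighbour exceeds the cutoff."""
    return max_similarity(query, training) > threshold


# Toy fingerprints (hypothetical bit indices, not real Morgan fingerprints).
training_fps = [{1, 2, 3, 4}, {10, 11, 12}]
similar_query = {1, 2, 3, 5}   # shares 3 of 5 distinct bits with the first entry
distant_query = {1, 10, 20}    # shares at most 1 bit with any entry

print(in_domain(similar_query, training_fps))
print(in_domain(distant_query, training_fps))
```

In practice the bit sets would be derived from Morgan fingerprints computed from each SMILES string, and a prediction for a query compound falling outside the domain would be flagged as less reliable rather than discarded silently.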
Li et al. (Sun,) studied this question.