Abstract

The presence of active pharmaceutical ingredients (APIs) in aquatic environments calls for greater attention to the risks these substances may pose, particularly with respect to chronic toxicity. The No Observed Effect Concentration (NOEC) is a commonly used endpoint for summarizing chronic toxicity in ecological risk assessment and ecotoxicology. A new approach is proposed here for predicting the NOEC in fish, as determined by the Organisation for Economic Co-operation and Development (OECD) Test Guideline 210 (Fish, Early-life Stage Toxicity Test), using a molecular questionnaire applied to a data set of more than 200 substances. Molecular questioning surveys molecules represented as Simplified Molecular Input Line Entry System (SMILES) strings for the presence of individual features extracted from those strings. The Las Vegas algorithm was used to split the data rationally into training and validation sets. Optimal descriptors for a one-parameter linear regression were calculated with the Monte Carlo method, using the index of ideality of correlation (IIC). The coefficient of determination is approximately 0.7 for both the training set and the validation set. Several machine learning algorithms were also applied, but they performed poorly. The model presented here compares favourably with previously published models for the same endpoint and chemicals. Since APIs are complex molecules that can have multiple effects, particularly on human health, these results are of particular interest.
This question has been studied by Toropov et al.
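The workflow summarized in the abstract can be sketched in a few dozen lines. Several details are assumptions, not the authors' implementation: the feature extractor treats each SMILES character as a feature, with `Cl` and `Br` kept as single tokens. The split uses a hypothetical acceptance criterion, namely that every validation feature must also occur in the training set. The Monte Carlo objective is the IIC alone. The IIC follows the usual definition, IIC = r · min(MAE⁻, MAE⁺) / max(MAE⁻, MAE⁺), where MAE⁻ and MAE⁺ are the mean absolute residuals (observed minus predicted) for negative and non-negative residuals. The data in `DATA` are invented toy values, not NOEC measurements.

```python
import random
from statistics import mean

# Hypothetical toy data (SMILES, endpoint value); values are invented for illustration only.
DATA = [("CCO", 1.2), ("CCCl", 2.1), ("c1ccccc1", 2.8), ("c1ccccc1Cl", 3.5),
        ("CC(=O)O", 0.9), ("CCN", 1.1), ("c1ccc(Br)cc1", 3.9), ("CCCCO", 1.8),
        ("OCCO", 0.5), ("c1ccccc1O", 2.4)]

def smiles_features(smiles):
    """Assumed feature set: single SMILES characters, with 'Cl' and 'Br' as one token."""
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens

def dcw(features, weights):
    # Optimal descriptor: sum of the correlation weights of the features present
    # (features never seen in training contribute 0).
    return sum(weights.get(f, 0.0) for f in features)

def fit_line(x, y):
    # Ordinary least squares for the one-parameter model y = c0 + c1 * x.
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return my, 0.0
    return my - (c1 := sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx) * mx, c1

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx, syy = sum((a - mx) ** 2 for a in x), sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5

def iic(observed, predicted):
    # IIC = r * min(MAE-, MAE+) / max(MAE-, MAE+), residual = observed - predicted.
    resid = [o - p for o, p in zip(observed, predicted)]
    neg = [abs(d) for d in resid if d < 0]
    pos = [abs(d) for d in resid if d >= 0]
    if not neg or not pos:
        return 0.0
    m_neg, m_pos = mean(neg), mean(pos)
    return pearson_r(observed, predicted) * min(m_neg, m_pos) / max(m_neg, m_pos)

def las_vegas_split(data, frac_train=0.7, n_tries=100, seed=0):
    """Las Vegas-style split: draw random splits until one passes the (assumed) acceptance
    criterion; after n_tries, fall back to the split with the best feature coverage."""
    rng = random.Random(seed)
    n_train = max(1, round(frac_train * len(data)))
    best, best_cov = None, -1.0
    for _ in range(n_tries):
        shuffled = data[:]
        rng.shuffle(shuffled)
        train, valid = shuffled[:n_train], shuffled[n_train:]
        seen = {f for s, _ in train for f in smiles_features(s)}
        vfeats = [f for s, _ in valid for f in smiles_features(s)]
        cov = sum(f in seen for f in vfeats) / len(vfeats) if vfeats else 1.0
        if cov > best_cov:
            best, best_cov = (train, valid), cov
        if cov == 1.0:
            break
    return best

def monte_carlo(train, n_epochs=200, step=0.5, seed=0):
    """Perturb one correlation weight at a time; keep the change only if the training IIC improves."""
    rng = random.Random(seed)
    feats = [smiles_features(s) for s, _ in train]
    y = [v for _, v in train]
    vocab = sorted({f for fs in feats for f in fs})
    weights = {f: rng.uniform(-1.0, 1.0) for f in vocab}

    def score(w):
        x = [dcw(fs, w) for fs in feats]
        c0, c1 = fit_line(x, y)
        return iic(y, [c0 + c1 * v for v in x])

    best = score(weights)
    for _ in range(n_epochs):
        for f in vocab:
            old = weights[f]
            weights[f] = old + rng.uniform(-step, step)
            new = score(weights)
            if new > best:
                best = new
            else:
                weights[f] = old  # reject the move
    return weights, best
```

A typical run calls `las_vegas_split(DATA)`, passes the training part to `monte_carlo`, refits `fit_line` on the training descriptors, and then scores the validation molecules with the same weights and coefficients. Greedy single-weight perturbation is only one simple choice of Monte Carlo search. The published method may use a different objective or schedule.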