Cocrystallization can be used to tune key drug properties, such as aqueous solubility, without altering molecular structure; however, the space of possible coformers is enormous and design rules are empirical. We present a Bayesian optimization framework that couples Gaussian process (GP) classification and regression to accelerate cocrystal discovery and solubility enhancement. Starting from 6338 literature-derived binary coformer pairs, we engineered vector fingerprints for cocrystal prediction that combine 2D structural information (fragment and MQN fingerprints) with low-cost shape and polarity descriptors. A GP classifier, trained on an actively constructed training set of ∼1000 coformer pairs selected by uncertainty sampling, achieves up to 94% accuracy and a Matthews correlation coefficient of 0.79 on a test set of >5000 unseen pairs. Property-driven coformer selection was formulated as a Bayesian optimization problem, using a machine learning model as a surrogate for aqueous solubility and Tanimoto similarity to guide campaigns across several discovery scenarios. In simulations, the framework rapidly identifies highly soluble cocrystals, typically recovering top-5 candidates after fewer than 10 evaluations. Finally, we validate the workflow experimentally with 12 pharmaceutical and pharmaceutical-like compounds, discovering two new cocrystals, resveratrol + praziquantel and purin-6-amine + thiazole-4-carboxylic acid, with markedly enhanced aqueous solubility. These results demonstrate a practical, data-efficient route to Bayesian optimization for cocrystal design.
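Two of the building blocks named in the abstract, Tanimoto similarity between fingerprints and uncertainty sampling over classifier outputs, can be illustrated with a minimal sketch. This is not the authors' implementation: the fingerprints are represented here as plain sets of on-bit indices, the predicted probabilities are made-up placeholder values standing in for a GP classifier's output, and the function names `tanimoto` and `most_uncertain` are hypothetical.

```python
from typing import Dict, List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # assumed convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def most_uncertain(probs: Dict[str, float], k: int) -> List[str]:
    """Return the k candidates whose predicted probability is closest to 0.5.

    For a binary classifier, a probability near 0.5 means the model is least
    sure of the label, so these are the pairs uncertainty sampling would
    label next.
    """
    return sorted(probs, key=lambda name: abs(probs[name] - 0.5))[:k]


# Illustrative fingerprints (hypothetical bit indices, not real molecules).
fp_a = {1, 4, 7, 9}
fp_b = {4, 7, 12}
similarity = tanimoto(fp_a, fp_b)  # 2 shared bits / 5 bits in the union

# Placeholder cocrystal probabilities for four unlabeled coformer pairs.
predicted = {"pair_A": 0.95, "pair_B": 0.52, "pair_C": 0.10, "pair_D": 0.40}
next_to_label = most_uncertain(predicted, k=2)
```

In an active-learning loop of the kind the abstract describes, the selected pairs would be labeled, added to the training set, and the classifier refit before the next selection round; those steps are omitted here.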
Appiah et al. studied this question.
www.synapsesocial.com/papers/69ba425c4e9516ffd37a289d — DOI: https://doi.org/10.1021/acs.cgd.5c01417
Samuel A. Appiah
Matthew A. McDonald
Crystal Growth & Design
Drexel University