Cocrystallization is a pivotal strategy for optimizing the physicochemical properties of functional molecular materials, yet its development is bottlenecked by time-consuming and labor-intensive experimental screening. Machine learning (ML) has emerged as a paradigm for high-throughput virtual screening, but the black-box nature of existing models and their reliance on global molecular descriptors limit their utility in rational cocrystal design. Herein, we constructed a large-scale cocrystal data set of 7700 samples. Molecular structures were encoded as numerical vectors using Morgan fingerprints, which capture fine-grained information about local chemical environments and thereby retain the functional-group and substructure features that are critical for cocrystal assembly. Five ML models (MF-KNN, MF-RF, MF-XGBoost, MF-SVM, and MF-ANN) were developed, and their optimal hyperparameter combinations were determined via systematic tuning. The MF-ANN model achieved state-of-the-art performance on the independent test set, with an accuracy of 97.16% and an F1-score of 98.35%, and remained robust under 10-fold, 5-fold, and 3-fold cross-validation. To address the black-box challenge, we applied the SHapley Additive exPlanations (SHAP) tool to cocrystal prediction: at the global level, we identified universal key substructures that drive cocrystal formation, while at the local level, we correlated abstract model outputs with specific molecular interactions (e.g., hydrogen bonding, π-π stacking). This work provides a high-performance, interpretable ML framework for rapid cocrystal screening and establishes a direct link between model predictions and chemical mechanisms, offering actionable guidance for the rational design of functional cocrystals in pharmaceuticals, energetic materials, and optoelectronics.
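The idea behind the SHAP attribution mentioned above can be illustrated with a minimal sketch. It computes exact Shapley values by enumerating every feature coalition for a toy scoring function over four binary fingerprint bits. Everything here is an assumption made for illustration: the function names (shapley_values, toy_score), the four-bit input, the all-zero baseline, and the scoring weights are hypothetical and are not taken from the paper. The SHAP library itself typically relies on approximations rather than this brute-force enumeration, which is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample.

    Features absent from a coalition are replaced by their baseline value.
    The cost grows exponentially with the number of features.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_score(bits):
    # Hypothetical model: bits 0 and 1 only matter together (think of a
    # donor/acceptor pair), bit 2 contributes on its own, bit 3 is ignored.
    return 0.6 * bits[0] * bits[1] + 0.2 * bits[2]


x = [1, 1, 1, 0]          # hypothetical fingerprint bits for one sample
baseline = [0, 0, 0, 0]   # assumed reference input: no substructure present
phi = shapley_values(toy_score, x, baseline)

for index, value in enumerate(phi):
    print(f"bit {index}: contribution {value:+.3f}")
```

In this toy case the interaction term is split evenly between the two bits that produce it, and the contributions sum to the difference between the score for the sample and the score for the baseline. That additivity is what allows per-substructure contributions to be read as a decomposition of a single prediction.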
Yukun Liu
Yanfei Liu
Ziang Du
The Journal of Physical Chemistry A
PLA Rocket Force University of Engineering
Liu et al. studied this question.
www.synapsesocial.com/papers/69e1cf1b5cdc762e9d85812e — DOI: https://doi.org/10.1021/acs.jpca.6c00365