What question did this study set out to answer?

This research investigates how different factors influence the performance of pocket-conditioned generative models in drug discovery.

June 1, 2026 | Open Access

Assessing the factors influencing the quality of pocket-conditioned 3D generative models

Key Points

Trained generative models on internal crystallography data from a large pharmaceutical company.
Examined the impact of including hydrogens and pretraining on unconditional data on model performance.
Evaluated various training settings for generative quality of ligands.
Models trained on internal data showed improved generative quality compared to traditional PDB-based models.
Inclusion of hydrogens positively influenced model performance, enhancing binding predictions.
Pretraining on unconditional data also contributed to better ligand quality across diverse scenarios.

Abstract

Structure-based drug discovery (SBDD) aims to identify novel molecules that bind to therapeutic protein targets. The vast chemical space and the limitations of traditional approaches make this task challenging. Recent advances in AI generative models, such as flow matching, can produce novel, pocket-conditioned molecular structures directly in three-dimensional space. However, most pocket-conditioned models in the literature are trained on structures derived from the Protein Data Bank (PDB), which contains structures of varying quality and with inconsistent annotation. Moreover, the PDB is enriched with cofactors and natural products, and therefore represents real-world SBDD scenarios poorly. The relatively limited number of ligand series within the same pockets also hinders the model’s ability to learn protein-ligand interactions effectively. Here, for the first time, we report the results of training pocket-conditioned generative models on internal crystallography data from a large pharmaceutical company. We also investigate other key determinants of model performance, such as the inclusion of hydrogens and pretraining on unconditional data. We evaluate how each factor affects the generative quality of the ligands across the diverse training settings. Our results provide practical guidelines for the development of more effective 3D generative models for SBDD and highlight key directions for future research toward reliable, pocket-aware molecular design.

Cite This Study

Wang et al. studied this question.

synapsesocial.com/papers/6a1d236002fbce9130638fdb
https://doi.org/10.1186/s13321-026-01230-5
