Molecular docking is indispensable across computer-aided discovery. However, its conclusions often hinge more on modeling choices than on software brand or nominal score. In this review, we argue that docking should be treated explicitly as conditional modeling whose interpretability depends on structural provenance, ligand-state definition, search-space design, and validation under deployment-relevant conditions. This framework is intended as a practical reference for evaluating docking rigor across both academic and applied workflows. We highlight recurrent failure modes (cross-target score comparisons under non-comparable states, over-reading scores as affinities, under-modeled solvation and flexibility, and uncritical use of predicted structures) and show how AI both exacerbates and mitigates these risks. We then propose best practices for modern validation (self-docking as necessary but insufficient; cross-docking, decoys, apo/predicted structures, and out-of-distribution tests as essential complements) and offer a concise FAIR reporting checklist that enables reuse and audit. Looking forward, we contend that the most valuable advances are those that improve deployment-relevant reliability, pose plausibility, screening enrichment, and robustness across receptor uncertainty, rather than tool novelty alone. This review reframes docking success from "obtaining a pose and a score" to earning confidence through transparent workflows and evaluation aligned with real use. In this context, the term "AI-driven" does not imply replacing physics-based docking, but rather expanding the workflow landscape in which classical and machine learning approaches coexist. This review therefore treats docking as a unified decision framework spanning both paradigms, with emphasis on how validation, generalization, and reproducibility requirements evolve in AI-assisted workflows.
Kittelson et al. (Mon,) studied this question.