The growing number and size of DNA-encoded libraries (DELs), together with the vast space of possible DEL designs, demand interpretable and scalable criteria for selecting which libraries to construct and screen against a given target. An ideal target-focused DEL combines strong similarity to a collection of active reference compounds with high intra-DEL diversity. Chemography with Generative Topographic Mapping (GTM) has been shown to be a promising approach for selecting DELs, offering both intuitive visualization and fast quantitative analysis that scales to thousands of DEL designs. This is achieved by representing each library by a "stand-alone" (SA) vector; comparing such vectors avoids costly pairwise inter-molecular similarity calculations. However, the extent to which SA approaches in general, and GTM-derived SA metrics in particular, recover DELs that are reference-proximal and chemically diverse, as evaluated by conventional compound pair-matching (CP) metrics in the initial descriptor space, remains insufficiently characterized. In this article, we compare chemical-library similarity based on Morgan count fingerprints with GTM-derived metrics, using 100 diverse DEL subsets and a reference set of compounds from ChEMBL tested against cyclin-dependent kinase 2 (CDK2). GTM-based SA metrics provide robust approximations of the "gold standard" CP metrics in molecular descriptor space for DEL selection: Spearman rank correlations fall in the 0.6-0.7 range. Our results demonstrate that GTM helps to identify the DELs that best span the reference space according to the same "gold standard" metrics: SA GTM-driven library rankings achieve enrichment factors at 5% (EF5%) of 4-12 (that is, in recovering the "gold standard" top libraries among the 5% best ranked by GTM), always picking 2 of the top 3 libraries.
The accompanying two-dimensional landscapes make intra- and interlibrary diversity visually accessible, supporting rapid, interpretable screening of alternative DEL designs. Collectively, these results position GTM as an efficient tool for chemical-library similarity assessment and target-focused DEL selection.
Plyer et al. (Sun,) studied this question.
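
The two agreement measures quoted in the abstract, the Spearman rank correlation and the enrichment factor at 5%, can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the article: the scores are synthetic toy values, higher scores are taken to mean better libraries, and the "gold standard" top libraries are assumed to be the best 5% by the CP score, so that EF5% is the fraction of those recovered in the SA top 5% divided by the fraction expected at random. The function names `spearman` and `enrichment_factor` are hypothetical helpers, not part of any GTM software.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def enrichment_factor(sa_scores, cp_scores, fraction=0.05):
    """EF at `fraction`, assuming the CP top `fraction` defines the hits."""
    n = len(sa_scores)
    k = max(1, round(n * fraction))
    top_sa = set(sorted(range(n), key=lambda i: sa_scores[i], reverse=True)[:k])
    top_cp = set(sorted(range(n), key=lambda i: cp_scores[i], reverse=True)[:k])
    hits = len(top_sa & top_cp)
    # Equivalent to (hits / k) / (k / n): observed hit rate over random rate.
    return hits * n / (k * k)


# Toy data (assumption): 100 libraries, CP score equals the library index.
cp = [float(i) for i in range(100)]
# SA scores agree with CP except for three swapped pairs of libraries.
sa = list(cp)
for a, b in [(97, 10), (96, 11), (95, 12)]:
    sa[a], sa[b] = sa[b], sa[a]

rho = spearman(sa, cp)
ef5 = enrichment_factor(sa, cp)
print(f"Spearman rho = {rho:.2f}, EF5% = {ef5:.1f}")
```

Under this assumed definition with 100 libraries, each "gold standard" top-5 library recovered in the SA top 5 contributes 4 to EF5%, so the toy case with two recovered libraries gives an EF5% of 8.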