
April 21, 2026 · Open Access

Topological Signal in Learned Odor Embeddings Under Baseline and Utility Controls

Key Points

The study investigates whether topological signals in odor embeddings are reproducible and useful for molecular representation analysis.
Used persistent homology to analyze the Principal Odor Map (POM) across multiple datasets.
Used repeated subsampling and matched null models to evaluate the strength and robustness of the first-homology (H1) signal.
Performed a utility analysis of odor-label prediction using local topology features.
POM showed robust topological signal above matched nulls on every dataset, with top-1 signal-to-null ratios between 1.41 and 1.68.
Paper-matched Morgan bit fingerprints showed signal at least as strong as POM and often stronger, with top-1 ratios of up to 4.02 on the non-overlap subset.
Topology features gave modest, target-dependent gains in odor-label prediction, and those gains depended on the representation used.
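The H1 signal-to-null comparison summarized in these points can be sketched in a few lines. This is a minimal, self-contained illustration under stated assumptions, not the study's code. It computes Vietoris-Rips H1 persistence by standard boundary-matrix reduction in NumPy rather than calling Ripser, and it reads "top-1" as the length of the single most persistent H1 bar. Because the exact matched null models are not described here, it draws a hypothetical null of Gaussian points matched to each subsample's per-coordinate mean and standard deviation. The function names `rips_h1_persistence`, `top1_persistence`, and `signal_to_null_ratios` are illustrative.

```python
import itertools

import numpy as np


def rips_h1_persistence(points):
    """(birth, death) pairs of H1 for the Vietoris-Rips filtration, Z/2 coefficients."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    dist = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    # Each simplex enters at the largest pairwise distance among its vertices.
    simplices = [(0.0, 0, (i,)) for i in range(n)]
    simplices += [(dist[i, j], 1, (i, j)) for i, j in itertools.combinations(range(n), 2)]
    simplices += [(max(dist[i, j], dist[i, k], dist[j, k]), 2, (i, j, k))
                  for i, j, k in itertools.combinations(range(n), 3)]
    # Faces must precede cofaces: sort by filtration value, then by dimension.
    simplices.sort(key=lambda s: (s[0], s[1]))
    index = {verts: pos for pos, (_, _, verts) in enumerate(simplices)}
    pivot_owner, reduced, pairs = {}, {}, []
    for col, (value, dim, verts) in enumerate(simplices):
        if dim == 0:
            continue
        # Boundary over Z/2 is the set of codimension-1 faces.
        column = {index[verts[:k] + verts[k + 1:]] for k in range(dim + 1)}
        # Column reduction: add earlier reduced columns while the pivot collides.
        while column and max(column) in pivot_owner:
            column ^= reduced[pivot_owner[max(column)]]
        if column:
            low = max(column)
            pivot_owner[low] = col
            reduced[col] = column
            # A triangle whose pivot is an edge closes an H1 bar born at that edge.
            if dim == 2 and value > simplices[low][0]:
                pairs.append((simplices[low][0], value))
    return pairs


def top1_persistence(points):
    """Length of the longest H1 bar (0.0 if there is none)."""
    return max((death - birth for birth, death in rips_h1_persistence(points)), default=0.0)


def signal_to_null_ratios(embeddings, subsample_size=30, n_repeats=5, seed=0):
    """Per-repeat ratio of top-1 H1 persistence, real subsample vs. hypothetical null.

    ASSUMPTION: the null is i.i.d. Gaussian with the subsample's per-coordinate
    mean and standard deviation; the study's matched nulls may differ.
    """
    X = np.asarray(embeddings, dtype=float)
    rng = np.random.default_rng(seed)
    ratios = []
    for _ in range(n_repeats):
        sub = X[rng.choice(len(X), size=subsample_size, replace=False)]
        null = rng.normal(sub.mean(axis=0), sub.std(axis=0), size=sub.shape)
        real_top, null_top = top1_persistence(sub), top1_persistence(null)
        # Floor the denominator so a null with no H1 bar cannot divide by zero.
        ratios.append(real_top / max(null_top, np.finfo(float).eps))
    return ratios
```

The reduction enumerates every triangle, so cost grows roughly with the cube of the subsample size. For real 256-dimensional POM embeddings, a dedicated library is the practical choice. With Ripser, for example, `ripser(X, maxdim=1)["dgms"][1]` returns the H1 diagram.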

Abstract

This study audits whether first-homology (H1) persistent topology in the Principal Odor Map (POM) is reproducible, representation-specific, and practically useful. Persistent homology has become a tempting way to assign geometric meaning to learned molecular representations. However, the presence of topological signal does not by itself show that a representation captures uniquely informative structure.

We evaluate H1 signal in fixed OpenPOM embeddings on three datasets, using repeated subsampling and matched null models. The datasets are a curated 4,983-row GoodScents/Leffingwell (GS/LF) table, a broader 5,862-row GS/LF table, and a 1,600-molecule non-overlap subset. Across repeated direct subsamples, POM showed robust signal above matched nulls on all datasets, with top-1 signal-to-null ratios of 1.41–1.68 on the curated table and 1.42–1.56 on the non-overlap subset. The same Euclidean result stayed above null for all 10 released OpenPOM ensemble checkpoints (mean 1.52, range 1.42–1.68). Paper-matched Morgan bit fingerprints were at least as strong and often stronger, reaching direct top-1 ratios of 2.44–2.93 on the curated table and 3.02–4.02 on the non-overlap subset. Landmark distance-matrix analyses led to the same cautious conclusion: POM's H1 signal is real, but robust topology is not uniquely favorable to POM relative to strong chemical baselines.

One interpretive detail runs through the whole study. POM is a 256-dimensional dense representation, whereas the strongest fingerprint baselines are 2,048-dimensional sparse encodings of explicit substructure content. Because of that asymmetry, the comparison is not a clean scoreboard. A compressed learned space that retains robust topology under this bottleneck is noteworthy. Still, the fact that a sparse, higher-dimensional fingerprint shows stronger raw signal does not automatically mean it contains more odor-relevant structure.
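The landmark analyses and the dense-versus-sparse asymmetry described above both depend on which distances reach the persistence computation. The sketch below assumes greedy maxmin (farthest-point) landmark selection and three candidate metrics. Euclidean and cosine distance serve dense vectors such as 256-dimensional POM embeddings, and Tanimoto (Jaccard) distance serves 0/1 fingerprint bits. The abstract does not name the landmark scheme or the fingerprint metric, so these choices are assumptions, as are the hypothetical helpers `distances` and `maxmin_landmarks`.

```python
import numpy as np


def distances(X, Y, metric="euclidean"):
    """Distance matrix between rows of X and rows of Y (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if metric == "euclidean":
        # Broadcasting is affordable when Y is small (one row or a landmark set).
        return np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
    if metric == "cosine":
        Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
        Yn = Y / np.maximum(np.linalg.norm(Y, axis=1, keepdims=True), 1e-12)
        return np.clip(1.0 - Xn @ Yn.T, 0.0, 2.0)
    if metric == "tanimoto":
        # Jaccard/Tanimoto distance for 0/1 bit vectors such as Morgan fingerprints.
        inter = X @ Y.T
        union = X.sum(axis=1)[:, None] + Y.sum(axis=1)[None, :] - inter
        sim = np.divide(inter, union, out=np.ones_like(inter), where=union > 0)
        return 1.0 - sim
    raise ValueError(f"unknown metric: {metric}")


def maxmin_landmarks(X, n_landmarks, metric="euclidean", seed=0):
    """Greedy farthest-point landmarks and their pairwise distance matrix (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    if not 1 <= n_landmarks <= len(X):
        raise ValueError("n_landmarks must be between 1 and the number of points")
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(X)))]
    # Distance from every point to its nearest landmark chosen so far.
    nearest = distances(X, X[chosen], metric)[:, 0]
    while len(chosen) < n_landmarks:
        nxt = int(np.argmax(nearest))  # the point worst covered by current landmarks
        chosen.append(nxt)
        nearest = np.minimum(nearest, distances(X, X[[nxt]], metric)[:, 0])
    idx = np.array(chosen)
    return idx, distances(X[idx], X[idx], metric)
```

You can pass the returned landmark matrix to any persistent-homology routine that accepts a precomputed distance matrix. With Ripser, for example, the call is `ripser(D, distance_matrix=True)`. Tanimoto distance ignores bits that are absent in both molecules, so it behaves quite differently from Euclidean or cosine distance on dense vectors.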
Taken together, the results are better read as evidence that compressed learned odor spaces can preserve nontrivial topology than as evidence of topological superiority for POM.

A utility analysis tested whether local topology features add explanatory value beyond local geometry for neighborhood-level odor-label prediction. Gains were modest and target-dependent. The largest POM improvement was ΔR² = +0.048 (neighbor-label entropy, curated table, cosine metric), but non-POM representations sometimes matched or exceeded POM on the non-overlap subset. Topology can add utility, but not universally and not uniquely for POM.

The contribution is analytical rather than mechanistic: a reproducible comparison pipeline, stress-tested across datasets and checkpoints, with explicit statements of what the evidence does and does not justify. The work supports topological data analysis as a useful audit of learned odor representations. It also argues against strong claims that current odor embeddings exhibit uniquely informative topology.

Supports: reproducible H1 signal in POM across datasets; Euclidean stability across all 10 OpenPOM ensemble checkpoints; agreement between landmark-based and direct analyses; modest utility gains from topology features in some settings.

Does not necessarily support: topological uniqueness of POM relative to sparse chemical baselines; interpreting detected loops as perceptual dimensions; broad practical utility for molecular design; a clean separation of "odor-relevant structure preserved under compression" from "raw combinatorial structure retained by sparse encodings."

This Zenodo archive contains the manuscript PDF, the analysis code (github.com/Obiohagwu/odor-topology, frozen at submission), and the CSV/JSON analysis reports from which the figures are generated. The companion preprint is on ChemRxiv.
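The utility analysis fits two models of a neighborhood-level target. One uses local geometry features alone, and the other adds local topology features. The reported result is the difference in R² between them. The sketch below is minimal and rests on two assumptions. First, it fits ordinary least squares and scores out-of-fold predictions on a shared K-fold split. Second, it takes neighbor-label entropy to mean the Shannon entropy of pooled odor-label counts over each molecule's nearest neighbors. The source does not give the regressor, the feature definitions, or the validation scheme, and `neighbor_label_entropy`, `cv_r2`, and `delta_r2` are hypothetical names.

```python
import numpy as np


def neighbor_label_entropy(labels, neighbors):
    """Shannon entropy (nats) of pooled odor-label counts over each row's neighbors."""
    labels = np.asarray(labels, dtype=float)  # (n_molecules, n_labels) 0/1 matrix
    out = np.zeros(len(neighbors))
    for i, nbr in enumerate(neighbors):
        counts = labels[np.asarray(nbr)].sum(axis=0)
        total = counts.sum()
        if total > 0:
            p = counts[counts > 0] / total
            out[i] = -(p * np.log(p)).sum()
    return out


def cv_r2(features, y, n_folds=5, seed=0):
    """Out-of-fold R^2 of an ordinary-least-squares fit with an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), features])
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), n_folds)
    pred = np.empty_like(y)
    for test in folds:
        train = np.setdiff1d(np.arange(len(y)), test)
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        pred[test] = X[test] @ coef
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


def delta_r2(geometry, topology, y, n_folds=5, seed=0):
    """R^2 gain from adding topology features to geometry features, on identical folds."""
    base = cv_r2(geometry, y, n_folds, seed)
    full = cv_r2(np.column_stack([geometry, topology]), y, n_folds, seed)
    return full - base
```

Both models are scored on the same folds, so the comparison is paired. ΔR² therefore reflects the added features rather than a different split. Out-of-fold scoring also means an uninformative topology feature yields a ΔR² near zero rather than a guaranteed in-sample gain.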
Keywords: persistent homology, topological data analysis, Principal Odor Map, POM, OpenPOM, Morgan fingerprints, molecular representations, chemosensory machine learning, representation auditing, Ripser.


Authors

Micheal Chukwuemeka Ohagwu
