Continuous Approximate Nearest Neighbor Search (ANNS) over real-time vector data streams is an increasingly critical yet underexplored problem. In open-world settings, where data distributions shift, noise accumulates, and concurrent access is common, existing ANNS algorithms struggle to balance ingestion latency, retrieval quality, and update efficiency, because they were originally designed for static or simplified streaming scenarios. Benchmarks such as ANN-Benchmarks and Big-ANN-Benchmarks have standardized evaluation in static or large-scale settings, but they fail to capture the nuanced, high-churn dynamics of real-world streams. We introduce CANDOR-Bench (Continuous Approximate Nearest neighbor search under Dynamic Open-woRld Streams), a benchmarking framework built on Big-ANN-Benchmarks to evaluate in-memory ANNS under dynamic, open-world conditions. CANDOR-Bench supports high-frequency ingestion (up to hundreds of thousands of vectors per second), adaptive drift modeling (including modality shifts), stochastic noise injection, and concurrent query-update execution, all without requiring modifications to algorithm code. Across 12 datasets and 19 representative ANNS algorithms, our evaluation reveals that no single ANNS algorithm consistently delivers high recall, throughput, and update efficiency across dynamic open-world scenarios, which challenges assumptions drawn from static benchmarks. This variability reflects deeper trade-offs inherent to streaming settings. For example, smaller update batches improve data freshness but can introduce higher insertion overhead and reduce accuracy. We further observe that throughput in concurrent settings is often constrained by insertion overhead rather than query latency, which highlights a mismatch between streaming workloads and designs originally tuned for offline construction.
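To make the streaming setup concrete, the sketch below interleaves batched inserts and queries and scores each query by recall@k against exact ground truth over everything ingested so far. It is a minimal illustration and not CANDOR-Bench's code or API. `LaggingIndex`, `run_stream`, and all of their parameters are hypothetical. The "index" is a brute-force stand-in whose inserts become searchable only after `flush()`, which is an assumed way to model update lag. Drift is modeled as a cluster center that moves each batch, and noise injection adds Gaussian perturbations to each inserted vector. Both are simplifying assumptions rather than the benchmark's drift or noise models.

```python
import math
import random

def exact_knn(points, q, k):
    """Ground truth: ids of the k (id, vector) pairs closest to q under L2."""
    ranked = sorted(points, key=lambda p: math.dist(p[1], q))
    return [pid for pid, _ in ranked[:k]]

class LaggingIndex:
    """Hypothetical stand-in for a streaming ANNS index. Inserted vectors sit
    in a pending buffer and become searchable only after flush()."""

    def __init__(self):
        self.visible = []  # (id, vector) pairs that queries can reach
        self.pending = []  # inserted but not yet searchable

    def insert(self, pid, vec):
        self.pending.append((pid, vec))

    def flush(self):
        self.visible.extend(self.pending)
        self.pending.clear()

    def search(self, q, k):
        # Exact search over the visible set; misses come only from update lag.
        return exact_knn(self.visible, q, k)

def run_stream(n_batches, batch_size, flush_every, k=5, dim=8,
               noise_std=0.05, drift=0.1, seed=0):
    """Interleave batched inserts with one query per batch; return mean recall@k."""
    rng = random.Random(seed)
    index, ingested, recalls = LaggingIndex(), [], []
    next_id = 0
    for b in range(n_batches):
        center = [drift * b] * dim  # data distribution drifts batch by batch
        for _ in range(batch_size):
            clean = [c + rng.gauss(0, 1) for c in center]
            vec = [x + rng.gauss(0, noise_std) for x in clean]  # noise injection
            index.insert(next_id, vec)
            ingested.append((next_id, vec))
            next_id += 1
        if (b + 1) % flush_every == 0:
            index.flush()  # make pending inserts searchable
        q = [c + rng.gauss(0, 1) for c in center]  # query from the current distribution
        truth = exact_knn(ingested, q, k)
        found = index.search(q, k)
        recalls.append(len(set(found) & set(truth)) / k)
    return sum(recalls) / len(recalls)
```

With `flush_every=1`, every batch is searchable before the next query, so `run_stream(10, 20, flush_every=1)` yields a mean recall of 1.0. Raising `flush_every` leaves recent, drifted vectors invisible to queries and lowers recall. This is a toy analogue of the freshness side of the batch-size trade-off. It does not model insertion cost or concurrency.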
Mingqi Wang
Huazhong University of Science and Technology
Junyao Dong
Nanyang Technological University
Zhuoyan Wu
National University of Singapore
Proceedings of the ACM on Management of Data
synapsesocial.com/papers/69d893896c1944d70ce04886 — DOI: https://doi.org/10.1145/3786630