Vector databases have become a cornerstone of modern data science and AI applications, powering recommendation systems, semantic search, retrieval-augmented generation, and more. This paper focuses on vector index merging, particularly HNSW merging, which combines two or more vector indexes into one. Merging is a key operation in vector databases, with many use cases in vector index construction and vector index updates. Although a few early approaches address the problem, index merging remains slow. In this work, we propose HNSW-Merger, a new algorithm for merging two or more HNSW indexes that fully exploits the proximity information in the existing indexes. It is a novel two-stage, search-based algorithm that relies on forward HNSW search and lazy backward direct-connect to efficiently connect potential edges. HNSW-Merger is optimized for multi-core parallelism and memory efficiency, and it supports efficient merging of multiple indexes. Extensive experiments show that HNSW-Merger merges significantly faster than prior approaches while maintaining similar or even higher index quality.
Chenzhe Jin
Yunan Zhang
Jiayi Liu
Proceedings of the ACM on Management of Data
Purdue University West Lafayette
Jin et al. studied this question.
www.synapsesocial.com/papers/69d8946e6c1944d70ce05607 — DOI: https://doi.org/10.1145/3786645