We present a garbage collection (GC) algorithm for multi-version concurrency control (MVCC) in database systems that support hierarchical copy-on-write branching. Traditional MVCC GC systems use single-dimensional retention policies (time-based or snapshot-based) and are unaware of branch hierarchies. When a parent branch garbage-collects old versions, child branches that depend on those versions through inheritance lose data access, causing query failures. Our algorithm computes a multidimensional GC horizon for each branch by taking the minimum log sequence number (LSN) across three dimensions: (1) the branch’s own retention period, (2) the creation LSNs of all descendant branches (protecting inherited data), and (3) active transaction snapshots across the branch lineage. We further integrate this branch-aware GC policy with LSM-tree compaction by filtering version records during SSTable merge operations. The algorithm achieves 100% correctness (zero data loss in multi-branchscenarios) with 10–30% storage savings versus a conservative “never delete” approach. We describe the architecture, algorithms, and implementation in HeliosDB, with per-branch observability via 10 Prometheus metrics. No prior database system combines hierarchical branch visibility with MVCC garbage collection and LSM compaction integration.
Building similarity graph...
Analyzing shared references across papers
Loading...
Daniel Moya Vaca (Thu,) studied this question.
www.synapsesocial.com/papers/69c772818bbfbc51511e2fed — DOI: https://doi.org/10.5281/zenodo.19242033
Daniel Moya Vaca
Building similarity graph...
Analyzing shared references across papers
Loading...