What question did this study set out to answer?

The aim is to develop a robust method for estimating the benefits of database indexing despite noisy labels in training data.

April 10, 2026 | Open Access

RIB: Robust Learning-based Index Benefit Estimation

Key Points

Introduced a context-aware encoder using a bidirectional graph neural network (GNN)
Developed a probabilistic prediction model based on fully parameterized quantile regression (FPQR)
Conducted extensive experiments using multiple benchmarks to evaluate performance
RIB significantly improves estimation accuracy compared to existing index benefit estimators
Demonstrated enhanced end-to-end index recommendation quality
Reduced impact of both epistemic and aleatoric noise in the estimation process

Abstract

Index benefit estimation is arguably the most important step in index tuning. Existing index tuners developed for commercial and open-source database systems typically rely on the "what-if" API for this purpose. That API depends on the query optimizer's cost estimation module and can be inaccurate for various reasons, such as cardinality estimation errors. Recent work has proposed learning-based index benefit estimators to replace the what-if API. Although learned index benefit estimators show better accuracy, they require large amounts of query execution telemetry as training data, which often contains noisy or conflicting labels that can confuse the trained ML model. There are two types of such label noise: epistemic noise, which is caused by the limitations of the query plan encodings employed by existing learned estimators, and aleatoric noise, which is caused by the inherent dynamics of the query execution environment, driven by unpredictable runtime factors such as buffer pool utilization, resource contention, and sometimes adaptive query execution. In this paper, we propose RIB to mitigate the impact of noisy labels and thereby improve the robustness of learned index benefit estimators. RIB introduces two new techniques to address the challenges posed by epistemic and aleatoric label noise: (1) a context-aware encoder based on a bidirectional graph neural network (GNN) and (2) a probabilistic prediction model based on fully parameterized quantile regression (FPQR). Compared to existing work, the GNN-based encoder captures more contextual information about index-optimizable operations, such as structural changes in query plans before and after utilizing indexes, thereby reducing the chance of epistemic noise.
Moreover, unlike existing work that uses a point estimate for index benefit estimation, the FPQR-based predictor considers the entire distribution of likely index benefits and provides more robust estimates by aggregating over all quantiles of the distribution, thereby reducing the impact of aleatoric noise. Extensive experiments on multiple benchmarks demonstrate that RIB significantly outperforms state-of-the-art index benefit estimators in terms of both estimation accuracy and end-to-end index recommendation quality.
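The quantile-based idea in the abstract can be illustrated with a small sketch. The abstract does not give RIB's loss function or aggregation rule, so everything below is an assumption: it uses the standard pinball (quantile) loss, fits each quantile by brute force over the observed labels instead of a neural network, and aggregates with a weighted average over quantile intervals. The function names `pinball_loss`, `fit_quantile`, and `aggregate`, and the sample labels, are hypothetical.

```python
from typing import Sequence


def pinball_loss(y_true: float, y_pred: float, tau: float) -> float:
    """Standard quantile (pinball) loss at level tau in (0, 1)."""
    diff = y_true - y_pred
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return max(tau * diff, (tau - 1.0) * diff)


def fit_quantile(labels: Sequence[float], tau: float) -> float:
    """Pick the observed label that minimizes total pinball loss at level tau.

    Stand-in for a learned predictor (assumption): a real model would
    output this value from an encoded query plan.
    """
    return min(labels, key=lambda c: sum(pinball_loss(y, c, tau) for y in labels))


def aggregate(boundaries: Sequence[float], quantiles: Sequence[float]) -> float:
    """Weighted average of quantile values (assumed aggregation rule).

    boundaries are fractions 0 = t_0 < ... < t_N = 1, and quantiles[i] is
    the value representing the interval [t_i, t_{i+1}].
    """
    if len(boundaries) != len(quantiles) + 1:
        raise ValueError("need exactly one more boundary than quantile values")
    return sum(
        (boundaries[i + 1] - boundaries[i]) * q for i, q in enumerate(quantiles)
    )


if __name__ == "__main__":
    # Hypothetical measured benefits for one (query, index) pair across
    # repeated runs; the last run is disturbed by runtime noise.
    labels = [0.30, 0.32, 0.31, 0.29, 0.90]

    # The tau = 0.5 fit is the median, which the outlier does not move.
    print("median fit:", fit_quantile(labels, 0.5))

    boundaries = [0.0, 0.25, 0.5, 0.75, 1.0]
    midpoints = [(a + b) / 2 for a, b in zip(boundaries, boundaries[1:])]
    quantiles = [fit_quantile(labels, m) for m in midpoints]
    print("aggregated estimate:", aggregate(boundaries, quantiles))
```

With evenly spaced fractions, as here, the aggregate approximates the mean of the predicted distribution. How the fractions are chosen and how the quantiles are weighted in RIB is not stated in the abstract, so this sketch should not be read as the authors' method.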

Authors

Sifan Chen

Chenning Wu

Yu Jing

Journals

Proceedings of the ACM on Management of Data

Institutions

Microsoft (United States)

Fudan University
