Index benefit estimation is arguably the most important step in index tuning. Existing index tuners developed for commercial and open-source database systems typically rely on the "what-if" API for this purpose, which depends on the query optimizer's cost estimation module and can be inaccurate for various reasons, such as cardinality estimation errors. Recent work has proposed learning-based index benefit estimators to replace the what-if API. Although learned index benefit estimators are more accurate, they require large amounts of query execution telemetry as training data, which often contains noisy or conflicting labels that can confuse the trained ML model. There are two types of such label noise: epistemic noise, which is caused by the limitations of the query plan encodings employed by existing learned estimators, and aleatoric noise, which is caused by the inherent dynamics of the query execution environment, driven by unpredictable runtime factors such as buffer pool utilization, resource contention, and sometimes adaptive query execution. In this paper, we propose RIB to mitigate the impact of noisy labels and thereby improve the robustness of learned index benefit estimators. RIB introduces two new techniques to address the challenges posed by epistemic and aleatoric label noise: (1) a context-aware encoder based on a bidirectional graph neural network (GNN) and (2) a probabilistic prediction model based on fully parameterized quantile regression (FPQR). Compared to existing work, the GNN-based encoder captures more contextual information about index-optimizable operations, such as structural changes in query plans before and after utilizing indexes, thereby reducing the chance of epistemic noise.
Moreover, unlike existing work that uses a point estimate for index benefit estimation, the FPQR-based predictor considers the entire distribution of likely index benefits and provides more robust estimates by aggregating over all quantiles of the distribution, thereby reducing the impact of aleatoric noise. Extensive experiments on multiple benchmarks demonstrate that RIB significantly outperforms state-of-the-art index benefit estimators in terms of both estimation accuracy and end-to-end index recommendation quality.
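To make the idea of aggregating over quantiles concrete, the following is a minimal sketch and not RIB's actual implementation, whose details the abstract does not give. It assumes a model has already predicted one benefit value per quantile interval, and it shows the standard pinball loss used to train quantile predictors together with a width-weighted sum that collapses the predicted quantiles into a single estimate. The function names `pinball_loss` and `aggregate_quantiles`, and the example numbers, are hypothetical.

```python
from typing import Sequence


def pinball_loss(y_true: float, y_pred: float, tau: float) -> float:
    """Quantile (pinball) loss for one prediction at quantile level tau."""
    diff = y_true - y_pred
    # Under-prediction is weighted by tau, over-prediction by (1 - tau).
    return max(tau * diff, (tau - 1.0) * diff)


def aggregate_quantiles(boundaries: Sequence[float],
                        values: Sequence[float]) -> float:
    """Approximate the expected benefit from predicted quantile values.

    boundaries: quantile fractions 0 = t_0 < t_1 < ... < t_N = 1.
    values[i]:  predicted benefit for the interval [t_i, t_{i+1}].
    (Hypothetical interface, assumed for illustration only.)
    """
    if len(boundaries) != len(values) + 1:
        raise ValueError("need exactly one value per quantile interval")
    if boundaries[0] != 0.0 or boundaries[-1] != 1.0:
        raise ValueError("boundaries must start at 0 and end at 1")

    total = 0.0
    for i, value in enumerate(values):
        width = boundaries[i + 1] - boundaries[i]
        if width <= 0.0:
            raise ValueError("boundaries must be strictly increasing")
        # Each quantile value contributes in proportion to its probability mass.
        total += width * value
    return total


# Illustrative numbers only: four equal-width quantile intervals.
boundaries = [0.0, 0.25, 0.5, 0.75, 1.0]
predicted_benefits = [0.10, 0.20, 0.30, 0.40]
estimate = aggregate_quantiles(boundaries, predicted_benefits)
```

Because every interval contributes only in proportion to its width, a single extreme quantile value shifts the aggregate less than it would shift a point estimate trained directly on one noisy label, which is the intuition behind using the whole distribution.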
Sifan Chen
Chenning Wu
Yu Jing
Proceedings of the ACM on Management of Data
Microsoft (United States)
Fudan University
Chen et al. (Thu,) studied this question.
www.synapsesocial.com/papers/69d8940c6c1944d70ce04f2d — DOI: https://doi.org/10.1145/3786691