Large language models (LLMs) have achieved strong performance on natural language to SQL (NL2SQL) tasks, but their practical effectiveness depends on tuning a complex pipeline of interacting components. Real-world deployments must navigate a critical trade-off between execution accuracy and monetary cost, yet prior work, focused primarily on maximizing accuracy, has largely overlooked cost. Navigating this trade-off is non-trivial: the best choices of components (e.g., LLM, prompting strategy, schema linking) are interdependent and highly sensitive to the target database schema. The result is a challenging, schema-aware configuration tuning problem that lacks a systematic solution. We present PRISM, a framework that systematically identifies high-accuracy, cost-efficient NL2SQL configurations tailored to each schema. Adopting an optimize-then-deploy strategy, PRISM first runs cost-aware Bayesian Optimization in an offline phase to efficiently explore the configuration space and curate a pool of high-performing pipelines. In an online phase, it deploys these configurations either as a single, cost-effective candidate or as an ensemble that maximizes accuracy. Experiments on the BIRD benchmark show that PRISM achieves 69.48% execution accuracy in the single-candidate setting, 2.34% higher than the strongest baseline at 92% lower cost. In the ensemble setting, PRISM raises accuracy further to 74.9%.
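The abstract does not specify PRISM's acquisition function, pool-selection rule, or ensemble aggregation. The minimal Python sketch below illustrates the optimize-then-deploy idea under our own assumptions, not the authors' method:

- **Offline scoring.** Expected improvement divided by per-query cost serves as a cost-aware acquisition score. This is a standard cost-aware Bayesian Optimization heuristic.
- **Curated pool.** A Pareto filter over (accuracy, cost) forms the pool of configurations.
- **Online deployment.** The single-candidate mode picks the most accurate pooled configuration within a budget. The ensemble mode uses majority voting over execution results.

`PipelineConfig`, `expected_improvement_per_cost`, `pareto_pool`, `pick_single`, and `ensemble_vote` are all hypothetical names.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical record of one NL2SQL pipeline evaluated offline."""
    name: str
    accuracy: float  # execution accuracy on a held-out split, in [0, 1]
    cost: float      # assumed monetary cost per query (e.g., USD)


def expected_improvement_per_cost(mu: float, sigma: float, best: float, cost: float) -> float:
    """Assumed cost-aware acquisition: expected improvement over `best` per unit cost.

    `mu` and `sigma` are a surrogate model's predicted mean and standard
    deviation of accuracy for an untried configuration.
    """
    if sigma <= 0.0:
        # Deterministic prediction: improvement is known exactly.
        return max(mu - best, 0.0) / cost
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    ei = (mu - best) * cdf + sigma * pdf
    return ei / cost


def pareto_pool(configs):
    """Keep configurations not dominated on (higher accuracy, lower cost), cheapest first."""
    pool = []
    for c in configs:
        dominated = any(
            o.accuracy >= c.accuracy and o.cost <= c.cost
            and (o.accuracy > c.accuracy or o.cost < c.cost)
            for o in configs
        )
        if not dominated:
            pool.append(c)
    return sorted(pool, key=lambda c: c.cost)


def pick_single(pool, budget):
    """Single-candidate mode: most accurate pooled config within the per-query budget."""
    affordable = [c for c in pool if c.cost <= budget]
    return max(affordable, key=lambda c: c.accuracy) if affordable else None


def ensemble_vote(results):
    """Assumed ensemble rule: majority vote over hashable execution results (e.g., tuples of rows)."""
    winner, _count = Counter(results).most_common(1)[0]
    return winner
```

For example, given configurations A (0.60, 0.002), B (0.66, 0.010), C (0.64, 0.020), and D (0.70, 0.050), the filter drops C because B is both more accurate and cheaper. A per-query budget of 0.02 then selects B.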
Gaurav Tarlok Kakkar
Yeounoh Chung
Fatma Özcan
Proceedings of the ACM on Management of Data
Georgia Institute of Technology
Google (United States)
DOI: https://doi.org/10.1145/3786679