Modern databases commonly use the LIKE predicate to search text data. However, when the search pattern contains wildcards, existing index structures can degrade to a worst-case cost linear in the table size, resulting in poor performance. Traditional index structures such as B+-trees cannot efficiently handle patterns with wildcards at both ends (e.g., '%abc%'). Recent advances in language models offer a promising solution: these models can decode complex LIKE patterns into a small set of candidate values, which are then verified via hash-table lookups in time independent of the dataset size, greatly improving efficiency. However, integrating large language models (LLMs) into databases faces challenges such as high latency, large storage requirements, and sensitivity to data distribution drift. To address these issues, we propose SMILE, a Small language Model Integrated LIKE Engine that learns column-local character distributions with a small but carefully designed set of parameters. SMILE acts as a neural translator that converts complex LIKE patterns into their corresponding result sets. Our approach achieves asymptotic complexity improvements while preserving SQL LIKE semantics. We conduct a comprehensive evaluation across diverse datasets to validate the efficacy of our approach. With a parameter count five orders of magnitude smaller than that of LLMs, SMILE achieves strong LIKE decoding efficiency and quality. Specifically, SMILE attains high recall while accelerating LIKE queries by three orders of magnitude compared with LLMs and sequential scans, by 1.8 to 41.6 times compared with trigram indexes, and by two orders of magnitude compared with B+-trees. Moreover, SMILE is robust against potential drifts in data and query distributions.
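To make the candidate-generation-then-verification idea concrete, here is a minimal Python sketch. It is not SMILE itself: the language model is replaced by a hypothetical `propose` callable (any function mapping a LIKE pattern to candidate strings), and the names `like_to_regex`, `like_scan`, and `like_generate_and_verify` are illustrative assumptions. The sketch also assumes backslash as the LIKE escape character (PostgreSQL's default). A sequential scan tests every row, whereas the verification step costs one hash-set lookup plus one pattern match per candidate, so its cost tracks the number of candidates rather than the table size.

```python
import re
from typing import AbstractSet, Callable, Iterable, Set


def like_to_regex(pattern: str, escape: str = "\\") -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into a regex used with fullmatch.

    '%' matches any sequence of characters (possibly empty), '_' matches
    exactly one character, and the escape character makes the next
    character literal (assumed backslash, as in PostgreSQL).
    """
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped literal
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    # DOTALL lets '%' and '_' also match newline characters.
    return re.compile("".join(parts), re.DOTALL)


def like_scan(column: Iterable[str], pattern: str) -> Set[str]:
    """Baseline: sequential scan, linear in the number of rows."""
    rx = like_to_regex(pattern)
    return {v for v in column if rx.fullmatch(v)}


def like_generate_and_verify(
    value_set: AbstractSet[str],
    pattern: str,
    propose: Callable[[str], Iterable[str]],
) -> Set[str]:
    """Keep only proposed candidates that exist in the column (hash lookup)
    and genuinely satisfy the LIKE pattern, so no false positives survive."""
    rx = like_to_regex(pattern)
    return {c for c in propose(pattern) if c in value_set and rx.fullmatch(c)}


if __name__ == "__main__":
    column = ["smile", "simile", "mile", "smiley", "tile"]
    value_set = set(column)  # hash table over distinct column values, built once

    def toy_proposer(pattern: str) -> list:
        # Hypothetical stand-in for a learned decoder: some guesses are
        # absent from the column ("smiles") or do not match ("tile").
        return ["smile", "mile", "smiles", "tile"]

    exact = like_scan(column, "%mile%")
    approx = like_generate_and_verify(value_set, "%mile%", toy_proposer)
    print(sorted(exact))              # ['mile', 'simile', 'smile', 'smiley']
    print(sorted(approx))             # ['mile', 'smile']
    print(len(approx) / len(exact))   # recall: 0.5
```

Because every candidate is re-checked against both the column and the pattern, the result is always a subset of the exact LIKE answer; any qualifying value the proposer fails to emit is simply missing, which is why recall is the natural quality measure for this kind of decoder.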
Y. X. Li
Dong Wang
Zixuan Wang
Proceedings of the ACM on Management of Data
Harbin Institute of Technology
Chinese University of Hong Kong, Shenzhen
www.synapsesocial.com/papers/69d894526c1944d70ce054cf
DOI: https://doi.org/10.1145/3786703