April 10, 2026 | Open Access

The Case For Language Model Approximated LIKE Predicate

Key Points

The research aims to improve database performance for text searches using the LIKE predicate by integrating language models.
Developed SMILE, a Small language Model Integrated LIKE Engine.
Decoded complex LIKE patterns into a small set of candidate values.
Verified candidates with hash table lookups in dataset-size-invariant time.
Conducted evaluation across diverse datasets to assess performance.
Accelerated LIKE by 3 orders of magnitude over large language models and sequential scans, 1.8 to 41.6 times over trigram indexes, and 2 orders of magnitude over B+-trees.
Implemented SMILE with a parameter size 5 orders of magnitude smaller than large language models.
Showed high recall and robustness against data and query distribution drifts.

Abstract

Modern databases often use the LIKE predicate to search text data. However, when the search condition is interrupted by wildcards, existing index structures can degrade to a worst-case complexity linear in the table size, resulting in poor performance. Traditional methods, such as B+-trees, fail to handle wildcards at both ends efficiently. Recent advances in language models offer a promising solution. These models can decode complex LIKE patterns into a small set of candidate values, which are then verified in dataset-size-invariant time via hash table lookups, greatly improving efficiency. However, integrating large language models (LLMs) into databases faces challenges such as high latency, large storage requirements, and sensitivity to data distribution drifts. To address these issues, we propose SMILE, a Small language Model Integrated LIKE Engine that learns column-local character distributions with a compact set of parameters. SMILE acts as a neural translator that converts complex LIKE patterns into their corresponding result sets. Our approach achieves asymptotic complexity improvements while preserving SQL LIKE logic. We conduct a comprehensive evaluation across diverse datasets to validate the efficacy of our approach. Our compact SMILE, with a parameter size 5 orders of magnitude smaller than large language models, achieves strong LIKE decoding efficiency and quality. Specifically, SMILE obtains high recall while accelerating LIKE by 3 orders of magnitude compared to large language models and sequential scans, running 1.8 to 41.6 times faster than trigram indexes and 2 orders of magnitude faster than B+-trees. Moreover, our model demonstrates robustness against potential data and query distribution drifts.
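
The decode-then-verify idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the candidate list is hard-coded to stand in for the output of a language model, and the helper names (like_to_regex, sequential_scan, build_index, verify_candidates) are hypothetical. The sketch assumes case-sensitive LIKE with only the % and _ wildcards and no ESCAPE clause.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern (% and _ only, no ESCAPE) into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def sequential_scan(column, pattern):
    """Baseline: test every row, so the cost grows with the table size."""
    rx = like_to_regex(pattern)
    return [value for value in column if rx.fullmatch(value)]


def build_index(column):
    """Hash index mapping each distinct value to the row ids holding it."""
    index = {}
    for row_id, value in enumerate(column):
        index.setdefault(value, []).append(row_id)
    return index


def verify_candidates(index, pattern, candidates):
    """Keep candidates that exist in the column and satisfy the pattern.

    The work depends on the number of candidates, not on the table size.
    """
    rx = like_to_regex(pattern)
    return {c: index[c] for c in candidates if c in index and rx.fullmatch(c)}


column = ["database", "data lake", "metadata", "dataset", "update"]
index = build_index(column)

# Assumption: a stand-in for what a decoder might propose for '%data%'.
candidates = ["database", "metadata", "datum", "update", "dataset"]
matches = verify_candidates(index, "%data%", candidates)
exact = sequential_scan(column, "%data%")
```

In this toy run the verification step discards "datum" (not in the column) and "update" (in the column but not a match), so no false positives survive. The value "data lake" is found by the sequential scan but was never proposed as a candidate, which is why a decode-based approach is described in terms of recall rather than exactness.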

Authors

Y. X. Li

Dong Wang

Zixuan Wang

Journals

Proceedings of the ACM on Management of Data

Institutions

Harbin Institute of Technology

Chinese University of Hong Kong, Shenzhen
