LZ4 is a compression algorithm widely adopted in database systems, yet, surprisingly, it offers no support for querying compressed data directly. Existing systems rely on full decompression for query processing, which increases query latency. Moreover, the LZ4 compression algorithm suffers from issues such as long match dependency, which reduces both compression effectiveness and the efficiency of querying compressed data. In this paper, (1) we propose LZV, a compression algorithm that employs a variable-length hash mechanism to identify maximal matches, significantly improving the compression ratio and reducing the overhead of querying compressed data. (2) We propose a compressed index search method that leverages an auxiliary structure to query compressed data in LZ4 format directly and efficiently. (3) Leveraging the ordered, quasi-uniformly spaced nature of the key column, we introduce a compressed key search method that integrates prediction with binary search to efficiently retrieve the index corresponding to a given key. Finally, we implement the compressed-data query pipeline in Apache TsFile, an open-source KV storage system. Experimental results show that our approach significantly improves compression effectiveness and query efficiency.
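The abstract does not specify how prediction is combined with binary search in the compressed key search method. The following is a minimal Python sketch of the general idea under stated assumptions, not the authors' method: keys are strictly increasing integers (for example, timestamps) already available in memory, the prediction is a linear interpolation between the first and last keys, and `window` is a hypothetical tuning parameter bounding how far around the predicted position the first binary search looks.

```python
from bisect import bisect_left
from typing import Sequence


def predict_then_search(keys: Sequence[int], target: int, window: int = 8) -> int:
    """Return the index of `target` in strictly increasing `keys`, or -1.

    Hypothetical sketch: a linear prediction (assuming quasi-uniform key
    spacing) picks a candidate position; a binary search is first confined
    to a small window around it, falling back to the whole array on a miss.
    """
    n = len(keys)
    if n == 0:
        return -1
    lo_key, hi_key = keys[0], keys[-1]
    if target < lo_key or target > hi_key:
        return -1  # outside the key range, cannot be present
    if hi_key == lo_key:
        return 0  # single distinct key and target equals it
    # Predict the position from the average stride between keys.
    guess = (target - lo_key) * (n - 1) // (hi_key - lo_key)
    lo = max(0, guess - window)
    hi = min(n, guess + window + 1)
    if keys[lo] <= target <= keys[hi - 1]:
        # Prediction was close: binary search only inside the window.
        i = bisect_left(keys, target, lo, hi)
    else:
        # Prediction missed (spacing not uniform here): search everything.
        i = bisect_left(keys, target)
    return i if i < n and keys[i] == target else -1


# Quasi-uniform keys: stride 10 with a small deterministic jitter.
keys = [t * 10 + (t % 3) for t in range(100)]
print(predict_then_search(keys, 41))   # 4
print(predict_then_search(keys, 42))   # -1
```

When spacing is close to uniform, the predicted window usually contains the target, so the binary search touches only a few entries instead of the full column; the fallback keeps the result correct when the uniformity assumption fails locally.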
Liu et al. (Tsinghua University) studied this question.
www.synapsesocial.com/papers/69d894326c1944d70ce0525c — DOI: https://doi.org/10.1145/3786660
Zhiheng Liu
Shaoxu Song
Proceedings of the ACM on Management of Data
Tsinghua University