What question did this study set out to answer?

The aim is to enhance the LZ4 compression algorithm for better query processing without full decompression.

April 10, 2026 · Open Access

Improving LZ4 for Effective Compression and Efficient Query

Key Points

Developed LZV, a compression algorithm using a variable-length hash mechanism.
Introduced a compressed index search method for direct querying of compressed data in LZ4 format.
Created a compressed key search method integrating prediction and binary search for efficient index retrieval.
Implemented a compressed-data query pipeline in Apache TsFile.
Achieved a significant improvement in compression effectiveness compared to standard LZ4.
Reduced query latency by allowing direct querying of compressed data.
Demonstrated enhanced efficiency in retrieving keys using the new search methods.
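The key search point above pairs a position prediction with binary search. A minimal sketch of that general idea follows, assuming sorted integer keys held in a plain Python list. It is not the paper's method, which operates on compressed keys; the function name `predict_then_search` and the exponential widening step are illustrative assumptions.

```python
from bisect import bisect_left


def predict_then_search(keys, target):
    """Return the index of `target` in sorted integer `keys`, or -1 if absent.

    Hypothetical helper: predicts a position by linear interpolation,
    widens a window around the guess, then binary-searches the window.
    """
    n = len(keys)
    if n == 0 or target < keys[0] or target > keys[-1]:
        return -1

    # Prediction: for quasi-uniformly spaced keys, the index grows
    # roughly linearly with the key value.
    span = keys[-1] - keys[0]
    guess = 0 if span == 0 else (target - keys[0]) * (n - 1) // span

    # Widen the window exponentially until it brackets the target.
    lo = hi = guess
    step = 1
    while lo > 0 and keys[lo] > target:
        lo = max(0, lo - step)
        step *= 2
    step = 1
    while hi < n - 1 and keys[hi] < target:
        hi = min(n - 1, hi + step)
        step *= 2

    # Binary search restricted to the bracketed window.
    idx = bisect_left(keys, target, lo, hi + 1)
    return idx if idx < n and keys[idx] == target else -1


# Timestamps with a small jitter around a fixed interval (made-up data).
keys = [1000 + 10 * i + (i % 3) for i in range(1000)]
```

When the spacing is close to uniform, the guess lands near the true position and the window stays small, so the binary search touches only a few entries.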

Abstract

LZ4 is a compression algorithm widely adopted in database systems, yet it surprisingly offers no support for querying compressed data directly. Existing systems rely on full decompression for query processing, which increases query latency. Moreover, LZ4 suffers from issues such as long match dependency, which reduces both compression effectiveness and the efficiency of compressed-data query processing. In this paper, (1) we propose LZV, a compression algorithm that employs a variable-length hash mechanism to identify maximal matches, significantly improving the compression ratio and reducing the compressed-data query overhead. (2) We propose a compressed index search method that leverages an auxiliary structure to query compressed data in LZ4 format directly and efficiently. (3) Leveraging the ordered, quasi-uniformly spaced nature of the key column, we introduce a compressed key search method that integrates prediction with binary search to efficiently retrieve the index corresponding to a given key. Finally, we implement the compressed-data query pipeline in Apache TsFile, an open-source KV storage system. Experimental results show that our approach significantly improves compression effectiveness and query efficiency.
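To make the idea of querying without full decompression concrete, here is a toy sketch, not the paper's algorithm and not the real LZ4 byte format. It assumes a simplified stream in which each sequence is a tuple `(literals, match_offset, match_length)`, and a hypothetical auxiliary array `starts` holding the output position where each sequence begins. A lookup finds the covering sequence by binary search and follows match back-references until it reaches a literal; the number of hops illustrates why long match dependency makes such lookups more expensive.

```python
from bisect import bisect_right


def decompress(seqs):
    """Reference decoder for the toy format (full decompression)."""
    out = bytearray()
    for literals, offset, length in seqs:
        out += literals
        for _ in range(length):
            out.append(out[-offset])  # copy from `offset` bytes back
    return bytes(out)


def build_starts(seqs):
    """Auxiliary structure (assumed): output start position per sequence."""
    starts, pos = [], 0
    for literals, _offset, length in seqs:
        starts.append(pos)
        pos += len(literals) + length
    return starts, pos


def byte_at(seqs, starts, i):
    """Return output byte i without materializing the whole output."""
    while True:
        s = bisect_right(starts, i) - 1   # sequence covering position i
        literals, offset, _length = seqs[s]
        rel = i - starts[s]
        if rel < len(literals):
            return literals[rel]          # reached a literal byte
        i -= offset                       # match byte: follow the back-reference


# "abc" followed by a 6-byte match at offset 3, then the literal "d".
seqs = [(b"abc", 3, 6), (b"d", 0, 0)]
starts, total = build_starts(seqs)
print(chr(byte_at(seqs, starts, 7)))  # b
```

In this example, position 7 is resolved in two hops (7 to 4 to 1) before a literal is found; longer dependency chains mean more hops per lookup.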

Cite this study

Liu et al. studied this question.

www.synapsesocial.com/papers/69d894326c1944d70ce0525c — DOI: https://doi.org/10.1145/3786660

Also consider

Synapse has enriched 4 closely related papers on similar questions. Consider them for comparative context:

LeCo: Lightweight Compression via Learning Serial Correlations · 2024 · 14 citations
Gorilla · 2015 · 278 citations
A universal algorithm for sequential data compression · 1977 · 5,455 citations
Column-stores vs. row-stores · 2008 · 424 citations

Authors

Zhiheng Liu

Shaoxu Song

Journals

Proceedings of the ACM on Management of Data

Institutions

Tsinghua University
