What type of study is this?

This is a quantitative study.

September 29, 2025 | Open Access

NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache

Key points

NSNQuant achieves up to 3x higher throughput than full-precision baselines by compressing the KV cache.
The technique combines a double normalization approach with a Hadamard transform to enable calibration-free VQ.
Because it does not rely on a calibration dataset, the method generalizes well for low-bit vector quantization across various settings.
Extensive experiments show that NSNQuant outperforms prior methods in both 1-bit and 2-bit quantization.

Abstract

Large Language Model (LLM) inference is typically memory-intensive, especially when processing large batch sizes and long sequences, due to the large size of the key-value (KV) cache. Vector Quantization (VQ) has recently been adopted to alleviate this issue, but we find that the existing approach is susceptible to distribution shift due to its reliance on calibration datasets. To address this limitation, we introduce NSNQuant, a calibration-free VQ technique designed for low-bit compression of the KV cache. By applying a three-step transformation, 1) a token-wise normalization (Normalize), 2) a channel-wise centering (Shift), and 3) a second token-wise normalization (Normalize), together with a Hadamard transform, NSNQuant effectively aligns the token distribution with the standard normal distribution. This alignment enables robust, calibration-free vector quantization using a single reusable codebook. Extensive experiments show that NSNQuant consistently outperforms prior methods in both 1-bit and 2-bit settings, offering strong generalization and up to 3x throughput gain over full-precision baselines.
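The Normalize-Shift-Normalize steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`hadamard`, `token_normalize`, `nsn_forward`, `nsn_inverse`), the choice to rescale each token to L2 norm sqrt(d), the `eps` guard, and the placement of the Hadamard transform after the three steps are all assumptions made for illustration. The codebook lookup itself is not shown.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Sylvester-Hadamard matrix; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    H = np.ones((1, 1))
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def token_normalize(x, eps=1e-8):
    """Rescale each row (token) to L2 norm ~sqrt(d) (assumed convention)."""
    d = x.shape[-1]
    scale = np.linalg.norm(x, axis=-1, keepdims=True) / np.sqrt(d) + eps
    return x / scale, scale

def nsn_forward(x):
    """x has shape (tokens, channels). Returns transformed data and stats."""
    x1, s1 = token_normalize(x)            # 1) Normalize (token-wise)
    mu = x1.mean(axis=0, keepdims=True)    # 2) Shift (channel-wise centering)
    x2, s2 = token_normalize(x1 - mu)      # 3) Normalize (token-wise) again
    # Hadamard rotation along the channel axis. Because this matrix is
    # orthonormal, it preserves token norms and commutes with centering,
    # so applying it before or after the three steps gives the same result.
    y = x2 @ hadamard(x.shape[-1])
    return y, (s1, mu, s2)

def nsn_inverse(y, stats):
    """Undo the rotation, second scale, shift, and first scale, in that order."""
    s1, mu, s2 = stats
    H = hadamard(y.shape[-1])
    return ((y @ H.T) * s2 + mu) * s1
```

In this sketch, `y` is what a vector quantizer would encode, while the per-token scales and the channel mean in `stats` are the side information needed to map dequantized vectors back to the original space.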

Cite this study

Son et al. studied this question.

www.synapsesocial.com/papers/68da58d8c1728099cfd10fe3 — DOI: https://doi.org/10.48550/arxiv.2505.18231

Authors

Donghyun Son

Euntae Choi

Sungjoo Yoo
