May 21, 2026 · Open Access

A lightweight perceptual-guided VQVAE for high-fidelity image compression

Key Points

This research aims to develop a lightweight image compression model that maintains high fidelity while being efficient.
Introduced HiRes-VQ, a variant of VQ-VAE-2 featuring an asymmetric encoder-decoder architecture.
Implemented a multi-scale perceptual alignment loss that jointly optimizes pixel accuracy, semantic feature consistency, and style statistics.
Conducted ablation experiments to analyze the impact of the dual-path decoder and perceptual loss.
With only 3.21M parameters, HiRes-VQ achieved 18%–40% fidelity gains over similar-sized baselines on FFHQ-256 and ImageNet-256, across both pixel-level and semantic-level metrics.
Demonstrated a better quality-efficiency trade-off than high-complexity models such as VQGAN and OptVQ.
Confirmed the complementary roles of the dual-path decoder and perceptual loss in enhancing image quality.
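
The quantization step that VQ-VAE-style models build on can be sketched as a nearest-codebook lookup. The snippet below is a minimal pure-Python illustration, not the HiRes-VQ implementation: the `quantize` helper, the toy codebook, and the 2-D latent vectors are all hypothetical and chosen only to make the idea concrete.

```python
# Minimal sketch of vector quantization (nearest-codebook lookup).
# Hypothetical illustration only: not the HiRes-VQ implementation.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry.

    Returns (indices, quantized_vectors). Only the indices need to be
    stored; a decoder can look the vectors up again from the codebook.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]
    return indices, [codebook[i] for i in indices]

# Toy codebook with four 2-D entries (values chosen for illustration).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
latents = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.7]]

indices, quantized = quantize(latents, codebook)
print(indices)
```

With a codebook of K entries, each stored index costs at most ceil(log2 K) bits before any entropy coding, which is where the compression comes from.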

Abstract

Low-bit-rate image compression faces a persistent quality-efficiency dilemma: lightweight models such as VQ-VAE produce perceptually degraded reconstructions, while high-quality alternatives like VQGAN and diffusion models incur prohibitive computational costs. To bridge this gap, we propose HiRes-VQ, a lightweight perceptual-guided VQ-VAE that achieves high-fidelity reconstruction without sacrificing efficiency. Built upon VQ-VAE-2’s hierarchical quantization, HiRes-VQ introduces two key innovations: (1) an asymmetric encoder-decoder architecture, where the encoder hierarchically extracts semantic features at multiple spatial scales and the decoder reconstructs low-frequency structures and high-frequency textures through separate frequency-domain pathways, together ensuring pixel-level fidelity; and (2) a multi-scale perceptual alignment loss that jointly optimizes pixel accuracy, semantic feature consistency, and style statistics, enabling perceptual-quality gains without compromising structural metrics. With only 3.21M parameters, HiRes-VQ achieves 18%–40% fidelity gains over similar-sized baselines on FFHQ-256 and ImageNet-256 across both pixel-level and semantic-level metrics, while surpassing high-complexity models such as VQGAN and OptVQ in the quality-efficiency trade-off. Ablation experiments confirm that the dual-path decoder and the perceptual loss serve complementary roles, together enabling significant improvements in both pixel-level fidelity and semantic perceptual quality. These results demonstrate that HiRes-VQ effectively resolves the quality-efficiency dilemma, offering a practical solution for resource-constrained deployment.
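
One way to read the loss described above is as a weighted sum of a pixel term, a per-scale feature term, and a per-scale style term. The sketch below assumes that style statistics are summarized by Gram matrices, a common choice that the abstract does not confirm. The weights `w_pixel`, `w_feat`, and `w_style`, the function names, and the toy feature maps are hypothetical. In practice the feature maps would come from some feature extractor, which is left out here because the abstract does not name one.

```python
# Hedged sketch of a loss combining pixel, feature, and style terms.
# Weights, names, and Gram-matrix style statistics are assumptions,
# not the formulation used in the paper.

def mse(a, b):
    """Mean squared error between two equal-length flat lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def gram(features):
    """Gram matrix of a C x N feature map, flattened row by row.

    Entry (i, j) is the mean product of channels i and j.
    """
    n = len(features[0])
    return [
        sum(p * q for p, q in zip(ci, cj)) / n
        for ci in features
        for cj in features
    ]

def flatten(features):
    """Concatenate all channels of a C x N feature map into one list."""
    return [v for channel in features for v in channel]

def perceptual_alignment_loss(x, y, feats_x, feats_y,
                              w_pixel=1.0, w_feat=0.1, w_style=0.01):
    """x, y: flat pixel lists. feats_x, feats_y: one C x N map per scale."""
    loss = w_pixel * mse(x, y)
    for fx, fy in zip(feats_x, feats_y):
        loss += w_feat * mse(flatten(fx), flatten(fy))   # feature consistency
        loss += w_style * mse(gram(fx), gram(fy))        # style statistics
    return loss

# Toy inputs: 4 pixels and two scales of 2-channel features.
x = [0.0, 0.5, 1.0, 0.5]
feats = [[[1.0, 2.0], [0.0, 1.0]], [[0.5], [0.25]]]

identical = perceptual_alignment_loss(x, x, feats, feats)
shifted = perceptual_alignment_loss(x, [v + 0.1 for v in x], feats, feats)
print(identical, shifted)
```

Because every term is a squared error, the loss is zero for a perfect reconstruction and grows as the pixels, features, or channel correlations drift apart. The relative weights decide how much perceptual quality is traded against pixel accuracy.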
