What question did this study set out to answer?

The aim is to enhance night-time image quality assessment by effectively combining global and local distortion perceptions.

May 25, 2026 | Open Access

HCA-Mamba: a hierarchical cross-attention framework combining Vision Mamba and CNN for night-time image quality assessment

Key Points

The study aims to enhance night-time image quality assessment by effectively combining global and local distortion perception.
The proposed HCA-Mamba framework combines Vision Mamba (ViM) for global context modeling with a CNN for local distortion extraction.
A multi-scale distortion extractor (MSDE) compacts local degradation cues into distortion-aware tokens.
A local distortion injection module (LDIM) injects these tokens into the ViM layers through gated cross-attention across network depths.
The method shows improved performance on the NNID and EHNQ benchmarks and generalizes effectively to NPHD.
Ablation studies confirm that MSDE and LDIM significantly enhance sensitivity to night-time distortions.

Abstract

Night-time image quality assessment (NTIQA) requires jointly modeling global illumination consistency and spatially localized distortions, such as noise, glare, and halo artifacts. Existing CNN-based methods are effective at capturing local degradations but are limited in modeling long-range contextual dependencies, whereas Transformer-based approaches improve global perception at the cost of higher computational complexity. To address this challenge, we propose HCA-Mamba, a hybrid framework that combines Vision Mamba (ViM)-based global context modeling with CNN-based local distortion perception for NTIQA. Specifically, a ViM backbone captures long-range dependencies and global luminance structure, while a parallel CNN branch extracts multi-scale local degradation cues and compacts them into distortion-aware tokens through a multi-scale distortion extractor (MSDE). These tokens are then progressively injected into successive ViM layers via a local distortion injection module (LDIM), which performs gated cross-attention to enable stable interaction between global and local representations across network depths. The fused representation is finally mapped to a perceptual quality score by a regression head. Experiments on the NNID and EHNQ benchmarks, covering both intra-dataset and cross-dataset generalization settings, together with an additional cross-dataset evaluation on NPHD, demonstrate the effectiveness and generalization potential of the proposed method. A controlled synthetic distortion severity study evaluates the model's response to progressively intensified night-time degradations. Extensive ablation studies further verify the effectiveness of MSDE and LDIM in enhancing sensitivity to night-time distortions.
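To make the token injection idea concrete, the following is a minimal NumPy sketch of gated cross-attention between global tokens and distortion tokens. It is an illustration under stated assumptions, not the authors' implementation: the average-pooling function `pool_to_tokens` is a hypothetical stand-in for MSDE, the scalar `tanh` gate is one common gating choice that the abstract does not specify, and all shapes, names, and random weights are invented for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pool_to_tokens(feature_map, grid):
    """Average-pool a (C, H, W) feature map into grid*grid tokens of size C.

    Hypothetical stand-in for MSDE: the paper's extractor is not described
    in enough detail here to reproduce it.
    """
    c, h, w = feature_map.shape
    assert h % grid == 0 and w % grid == 0
    pooled = feature_map.reshape(c, grid, h // grid, grid, w // grid).mean(axis=(2, 4))
    return pooled.reshape(c, grid * grid).T  # shape: (grid*grid, C)


def gated_cross_attention(global_tokens, distortion_tokens, w_q, w_k, w_v, gate):
    """Global tokens query the distortion tokens; a gate scales the update.

    Assumption: a scalar tanh gate, so gate = 0 leaves the global tokens
    unchanged and the injection strength can grow smoothly from there.
    """
    q = global_tokens @ w_q          # (N, D) queries from the global branch
    k = distortion_tokens @ w_k      # (M, D) keys from the local branch
    v = distortion_tokens @ w_v      # (M, D) values from the local branch
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (N, M)
    return global_tokens + np.tanh(gate) * (attn @ v)


rng = np.random.default_rng(0)
dim = 8
global_tokens = rng.standard_normal((16, dim))

# Two CNN feature maps at different resolutions (illustrative shapes only).
feature_maps = [rng.standard_normal((dim, 8, 8)), rng.standard_normal((dim, 4, 4))]
distortion_tokens = np.concatenate([pool_to_tokens(f, 2) for f in feature_maps], axis=0)

w_q, w_k, w_v = (0.1 * rng.standard_normal((dim, dim)) for _ in range(3))

out_closed = gated_cross_attention(global_tokens, distortion_tokens, w_q, w_k, w_v, gate=0.0)
out_open = gated_cross_attention(global_tokens, distortion_tokens, w_q, w_k, w_v, gate=1.0)
```

With the gate at zero the global tokens pass through unchanged, which is one plausible reading of the "stable interaction" the abstract describes; a nonzero gate mixes in the attended distortion tokens. In a layered model this step would be repeated at several depths with separate weights and gates.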
