Night-time image quality assessment (NTIQA) requires jointly modeling global illumination consistency and spatially localized distortions such as noise, glare, and halo artifacts. Existing CNN-based methods capture local degradations effectively but are limited in modeling long-range contextual dependencies, whereas Transformer-based approaches improve global perception at the cost of higher computational complexity. To address this trade-off, we propose HCA-Mamba, a hybrid framework that combines Vision Mamba (ViM)-based global context modeling with CNN-based local distortion perception for NTIQA. Specifically, a ViM backbone captures long-range dependencies and global luminance structure, while a parallel CNN branch extracts multi-scale local degradation cues and compacts them into distortion-aware tokens through a multi-scale distortion extractor (MSDE). These tokens are then progressively injected into successive ViM layers via a local distortion injection module (LDIM), which performs gated cross-attention to enable stable interaction between global and local representations across network depths. A regression head finally maps the fused representation to a perceptual quality score. Experiments on the NNID and EHNQ benchmarks, together with additional cross-dataset evaluation on NPHD, cover both intra-dataset and cross-dataset generalization settings and demonstrate the effectiveness and generalization potential of the proposed method. A controlled synthetic distortion severity study evaluates the model's response to progressively intensified night-time degradations, and extensive ablation studies verify that MSDE and LDIM enhance sensitivity to night-time distortions.
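The gated cross-attention that LDIM is described as performing can be illustrated with a minimal sketch. This is not the HCA-Mamba implementation: the abstract does not specify the gate form, the projections, or the number of heads, so everything below is an assumption made for illustration. The function name `gated_cross_attention`, the scalar `gate` passed through `tanh`, the single attention head, and the omission of learned query, key, and value projections are all hypothetical simplifications. Global tokens act as queries and distortion-aware tokens act as keys and values; the attended context is added back to the global tokens, scaled by the gate.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def gated_cross_attention(global_tokens, distortion_tokens, gate):
    """Hypothetical single-head gated cross-attention (illustrative only).

    global_tokens:     N x d list of lists, used as queries.
    distortion_tokens: M x d list of lists, used as keys and values.
    gate:              scalar; tanh(gate) scales the injected context.
                       Learned projections are omitted for brevity.
    """
    d = len(global_tokens[0])
    scale = 1.0 / math.sqrt(d)
    g = math.tanh(gate)
    fused = []
    for q in global_tokens:
        # Scaled dot-product scores of this query against every distortion token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k))
                  for k in distortion_tokens]
        weights = softmax(scores)
        # Attention-weighted sum of the distortion tokens.
        context = [sum(w * v[j] for w, v in zip(weights, distortion_tokens))
                   for j in range(d)]
        # Gated residual injection into the global token.
        fused.append([qi + g * ci for qi, ci in zip(q, context)])
    return fused


# Toy inputs (made up for the example): 3 global tokens, 2 distortion tokens, d = 2.
global_tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
distortion_tokens = [[2.0, 0.0], [0.0, 2.0]]

closed = gated_cross_attention(global_tokens, distortion_tokens, gate=0.0)
opened = gated_cross_attention(global_tokens, distortion_tokens, gate=1.0)
```

With `gate=0.0` the global tokens pass through unchanged, which is one common way a gate can keep early interaction between the two branches stable; whether HCA-Mamba initializes or parameterizes its gate this way is not stated in the abstract.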
Zhang et al. (Sat,) studied this question.