Abstract Swin transformer-based methods have achieved impressive performance in image super-resolution (SR) due to their ability to effectively model long-range spatial dependencies. However, their core component, window-based self-attention (WSA), introduces considerable computational overhead, which limits their applicability on resource-constrained devices. To address this issue, we propose a Swin-style shifted pooling cross-aggregation network (SPCAN) for image SR, which achieves high computational efficiency while maintaining excellent reconstruction quality. Specifically, we adopt max pooling-based downsampling as a lightweight alternative to WSA for extracting low-frequency features and introduce a shifted pooling mechanism that emulates the shifted window strategy of Swin transformers within a convolutional neural network (CNN) framework. This mechanism is embedded within a cross-aggregation module to facilitate efficient inter-region feature interaction. Moreover, we generalize the pooling operation from square to rectangular regions to enhance the model’s ability to capture spatial dependencies across different orientations. Extensive experiments on public SR benchmarks demonstrate that the proposed method achieves competitive reconstruction accuracy while offering significantly better efficiency than existing state-of-the-art methods. The source code and pretrained models are available at https://github.com/hms-source/SPCAN.
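To make the shifted pooling idea concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes that "shifted" means a cyclic roll of the feature map by half a window before non-overlapping max pooling (by analogy with Swin's shifted windows), and that rectangular pooling simply uses a window whose height and width differ. The function names `cyclic_shift`, `max_pool`, and `shifted_max_pool` are hypothetical and do not come from the SPCAN code.

```python
from typing import List

Grid = List[List[float]]


def cyclic_shift(x: Grid, dy: int, dx: int) -> Grid:
    """Roll rows by dy and columns by dx with wrap-around (assumed shift rule)."""
    h, w = len(x), len(x[0])
    return [[x[(i + dy) % h][(j + dx) % w] for j in range(w)] for i in range(h)]


def max_pool(x: Grid, kh: int, kw: int) -> Grid:
    """Non-overlapping max pooling over kh x kw windows (rectangular if kh != kw)."""
    h, w = len(x), len(x[0])
    return [
        [
            max(x[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(0, w - kw + 1, kw)
        ]
        for i in range(0, h - kh + 1, kh)
    ]


def shifted_max_pool(x: Grid, kh: int, kw: int) -> Grid:
    """Shift by half a window, then pool, so windows straddle the old boundaries."""
    return max_pool(cyclic_shift(x, kh // 2, kw // 2), kh, kw)


# A 4 x 4 toy feature map holding the values 0..15 in row-major order.
feature = [[r * 4 + c for c in range(4)] for r in range(4)]

regular = max_pool(feature, 2, 2)          # square windows, no shift
shifted = shifted_max_pool(feature, 2, 2)  # same windows after a half-window shift
horizontal = max_pool(feature, 1, 4)       # wide rectangular windows
vertical = max_pool(feature, 4, 1)         # tall rectangular windows
```

In this toy setting the regular and shifted results differ because each shifted window covers pixels that belonged to four different regular windows, which is the kind of inter-region interaction the abstract attributes to the mechanism. The horizontal and vertical variants summarize whole rows and whole columns respectively, illustrating how rectangular regions respond to different orientations.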
Rui He
Zhenyang Zhu
Xiaoyang Mao
The Visual Computer
He et al. (Sun,) studied this question.
www.synapsesocial.com/papers/69bf89a9f665edcd009e97ed — DOI: https://doi.org/10.1007/s00371-026-04357-6