Image super-resolution (SR) is a computer vision task that reconstructs high-resolution (HR) images from low-resolution (LR) inputs. Current Transformer-based SR architectures typically stack their modules sequentially, so useful features extracted by intermediate layers are often lost before they reach the output. In addition, these models extract features through local windows and are therefore prone to losing global information. To prevent the loss of valuable features without increasing model complexity, we propose a novel model, Weighted Hierarchy Aggregation (WHA). The core of WHA is the Hierarchy Aggregation Block (HAB), which builds on residual and dense connections by introducing a learned weighting mechanism that adaptively emphasizes the most valuable features from each layer in the final output. Furthermore, to address the tendency of local window self-attention to lose global features, we propose a Global Residual Self-Attention Block (GRSAB), which mitigates global information loss by introducing two-dimensional channel attention. WHA not only accelerates network training but also captures both local and global features, significantly improving super-resolution performance. Extensive comparative experiments and ablation studies demonstrate the high efficiency of HAB, especially in lightweight SR tasks. In the ×4 super-resolution task, our model achieves a PSNR of 26.88 dB on the Urban100 benchmark, a result that demonstrates its efficacy in comparison with models of similar architecture.
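The learned weighting described for HAB can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the weights are parameterized or normalized, so the softmax normalization, the flat-list feature representation, and the names `softmax` and `weighted_hierarchy_aggregate` are all assumptions made for illustration only.

```python
import math


def softmax(logits):
    """Turn raw per-layer scores into positive weights that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_hierarchy_aggregate(block_input, layer_outputs, logits):
    """Fuse the outputs of several layers with per-layer weights.

    block_input   -- features entering the block (flat list of floats)
    layer_outputs -- one feature list per layer, same length as block_input
    logits        -- one score per layer; learnable in a real model,
                     fixed numbers here (assumed parameterization)
    """
    if len(layer_outputs) != len(logits):
        raise ValueError("need exactly one logit per layer output")
    weights = softmax(logits)
    fused = list(block_input)  # residual path: start from the block input
    for weight, features in zip(weights, layer_outputs):
        for i, value in enumerate(features):
            fused[i] += weight * value  # weighted contribution of this layer
    return fused


# Toy usage: two layers with equal scores contribute equally.
x = [1.0, 2.0]
layers = [[1.0, 0.0], [0.0, 1.0]]
print(weighted_hierarchy_aggregate(x, layers, [0.0, 0.0]))
```

In a trained network the logits would be updated by gradient descent, so layers whose features help reconstruction most would receive larger weights; here they are fixed only to keep the example self-contained.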
Tang et al. (Wed,) studied this question.