April 15, 2026 | Open Access

Swin-UNet: A Unified Transformer–CNN Framework for Multi-Organ Medical Image Segmentation

Key Points

Aims to improve medical image segmentation by addressing the limitations of the standard Vision Transformer (ViT) modules used in hybrid networks such as TransUNet.
Introduces Swin-UNet framework combining Swin Transformer and U-Net concepts.
Uses shifted-window self-attention in the encoder for local-global feature learning.
Employs residual convolutional paths and multi-scale patch embeddings in the decoder for improved reconstruction and scale robustness.
Evaluated on the Synapse multi-organ CT dataset.
Achieves competitive Dice scores on multi-organ segmentation.
Achieves lower Hausdorff distances than U-Net and TransUNet.
Balances computational efficiency with segmentation accuracy.
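Shifted-window self-attention comes from the Swin Transformer: attention is computed inside non-overlapping M×M windows, and alternate blocks cyclically shift the feature map by about M/2 so that information can flow across window borders. The NumPy sketch below illustrates that mechanism under simplifying assumptions and is not the authors' implementation. The function names are hypothetical, attention is single-head with identity Q/K/V projections, the relative position bias, LayerNorm, MLP and residual connections are omitted, and the 8×8 map with M = 4 and shift s = 2 is chosen only for illustration.

```python
import numpy as np


def window_partition(x: np.ndarray, m: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping m x m windows -> (num_windows, m*m, C)."""
    h, w, c = x.shape
    assert h % m == 0 and w % m == 0, "H and W must be multiples of the window size"
    x = x.reshape(h // m, m, w // m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, m * m, c)


def window_reverse(windows: np.ndarray, m: int, h: int, w: int) -> np.ndarray:
    """Inverse of window_partition: reassemble windows into an (H, W, C) map."""
    c = windows.shape[-1]
    x = windows.reshape(h // m, w // m, m, m, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(h, w, c)


def shifted_windows(x: np.ndarray, m: int, s: int) -> np.ndarray:
    """Cyclically shift the map up-left by s, then partition into windows."""
    return window_partition(np.roll(x, shift=(-s, -s), axis=(0, 1)), m)


def merge_shifted_windows(windows: np.ndarray, m: int, s: int, h: int, w: int) -> np.ndarray:
    """Undo shifted_windows: reassemble, then roll back down-right by s."""
    return np.roll(window_reverse(windows, m, h, w), shift=(s, s), axis=(0, 1))


def shifted_window_mask(h: int, w: int, m: int, s: int) -> np.ndarray:
    """(num_windows, m*m, m*m) boolean mask, True where attention is allowed.

    Assumes 0 < s < m. After the roll, border windows contain tokens from
    opposite edges of the map; a token may only attend to tokens that come
    from the same original region.
    """
    region = np.zeros((h, w, 1), dtype=np.int64)
    label = 0
    for hs in (slice(0, -m), slice(-m, -s), slice(-s, None)):
        for ws in (slice(0, -m), slice(-m, -s), slice(-s, None)):
            region[hs, ws, :] = label
            label += 1
    ids = window_partition(region, m)[..., 0]  # (num_windows, m*m) region ids
    return ids[:, :, None] == ids[:, None, :]


def window_attention(windows: np.ndarray, mask=None) -> np.ndarray:
    """Single-head self-attention within each window (identity Q/K/V projections)."""
    d = windows.shape[-1]
    scores = windows @ windows.transpose(0, 2, 1) / np.sqrt(d)  # (num_windows, N, N)
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)  # blocked pairs get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ windows


if __name__ == "__main__":
    h = w = 8
    m, s = 4, 2  # window size and shift (s = m // 2)
    feat = np.random.default_rng(0).standard_normal((h, w, 16))
    # Simplified regular block: attend within fixed windows.
    out = window_reverse(window_attention(window_partition(feat, m)), m, h, w)
    # Simplified shifted block: roll, attend with the mask, then roll back.
    mask = shifted_window_mask(h, w, m, s)
    out = merge_shifted_windows(window_attention(shifted_windows(out, m, s), mask), m, s, h, w)
    print(out.shape)  # (8, 8, 16)
```

The mask is what keeps the shifted configuration local: after the roll, a window at the map's edge mixes tokens from opposite sides of the image, and masking restricts each token to the sub-region it was originally contiguous with.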

Abstract

Transformer-based architectures have shown significant promise in medical image segmentation because they can model long-range contextual relationships. However, the standard Vision Transformer (ViT) modules used in hybrid networks such as TransUNet struggle to represent both fine-grained and coarse features effectively. To address this limitation, this paper introduces Swin-UNet, a hybrid framework that combines a hierarchical Swin Transformer encoder with a U-Net-inspired decoder. The encoder uses shifted-window self-attention for efficient local-global feature learning, while the decoder integrates residual convolutional paths and multi-scale patch embeddings for improved reconstruction and scale robustness. On the Synapse multi-organ CT dataset, the model achieves competitive Dice scores and lower Hausdorff distances relative to U-Net and TransUNet, highlighting its potential as a robust and generalizable approach to medical image segmentation. These results suggest that Swin-UNet effectively balances computational efficiency with segmentation accuracy, offering a strong foundation for future medical imaging applications.
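Dice score and Hausdorff distance are the two metrics the abstract reports. The sketch below shows one common way to compute them for binary masks with NumPy; it is an illustration, not the paper's evaluation code. Assumptions: the helper names are hypothetical, two empty masks score a Dice of 1.0, surfaces are found with face-connected neighbours, and the brute-force pairwise distance matrix is only practical for small masks (toolkits usually rely on a distance transform). Setting `percentile=95` gives an HD95-style variant that pools both directed distance sets; some toolkits instead take the maximum of the two directed percentiles, and the abstract does not say which variant was used.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice = 2|P & G| / (|P| + |G|) for two binary masks."""
    p, g = pred.astype(bool), gt.astype(bool)
    denom = p.sum() + g.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(2.0 * np.logical_and(p, g).sum() / denom)


def mean_dice(pred_labels: np.ndarray, gt_labels: np.ndarray, labels) -> float:
    """Average per-organ Dice over the given label ids (pass organ labels only, not background)."""
    return float(np.mean([dice(pred_labels == k, gt_labels == k) for k in labels]))


def surface(mask: np.ndarray) -> np.ndarray:
    """Boundary voxels: foreground voxels with at least one face-neighbour outside the mask."""
    m = mask.astype(bool)
    p = np.pad(m, 1, constant_values=False)
    core = tuple(slice(1, -1) for _ in range(m.ndim))
    interior = p[core].copy()
    for ax in range(m.ndim):
        for off in (0, 2):  # neighbour before (0) and after (2) along this axis
            sl = list(core)
            sl[ax] = slice(off, off + m.shape[ax])
            interior &= p[tuple(sl)]
    return m & ~interior


def hausdorff(pred: np.ndarray, gt: np.ndarray, percentile: float = 100.0, spacing=None) -> float:
    """Symmetric surface Hausdorff distance.

    percentile=100 gives the classic maximum; lower values take that
    percentile of the pooled directed surface distances.
    """
    spacing = np.ones(pred.ndim) if spacing is None else np.asarray(spacing, dtype=float)
    a = np.argwhere(surface(pred)) * spacing  # physical coordinates of surface voxels
    b = np.argwhere(surface(gt)) * spacing
    if len(a) == 0 or len(b) == 0:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    # Brute-force pairwise distances: fine for small examples, O(|A||B|) memory.
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1))
    d_ab = d.min(axis=1)  # each predicted surface point to its nearest reference point
    d_ba = d.min(axis=0)  # each reference surface point to its nearest predicted point
    return float(np.percentile(np.concatenate([d_ab, d_ba]), percentile))


if __name__ == "__main__":
    pred = np.zeros((10, 10), dtype=bool)
    pred[2:6, 2:6] = True
    gt = np.zeros((10, 10), dtype=bool)
    gt[2:6, 3:7] = True  # same square shifted one column to the right
    print(dice(pred, gt))  # 0.75
    print(hausdorff(pred, gt))  # 1.0
```

Dice rewards volumetric overlap, while the Hausdorff distance is driven by the worst boundary error, which is why the two are usually reported together.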

Authors

Xiaodong Li

Institutions

Lanzhou University
