
April 15, 2026 · Open Access

Swin-UNet: A Unified Transformer–CNN Framework for Multi-Organ Medical Image Segmentation

Key Points

The aim is to enhance medical image segmentation by overcoming limitations of standard transformer architectures.
Introduces Swin-UNet framework combining Swin Transformer and U-Net concepts.
Uses shifted-window self-attention in the encoder for local-global feature learning.
Employs residual convolutional paths and multi-scale patch embeddings in the decoder for robustness and reconstruction.
Evaluated on the Synapse multi-organ CT dataset.
Achieves competitive Dice scores in segmentation tasks.
Shows lower Hausdorff distances compared to U-Net and TransUNet.
Highlights a balance between computational efficiency and accuracy.
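The key points compare methods by Dice score and Hausdorff distance. Below is a minimal pure-Python sketch of both metrics on 2D binary masks. It assumes foreground pixels are compared as point sets using Euclidean distance in pixel units. This summary does not state the paper's exact evaluation protocol, for example whether it reports a 95th-percentile Hausdorff variant or averages per organ over 3D volumes. The helper names `foreground`, `dice`, and `hausdorff` are illustrative and do not come from the paper.

```python
import math
from typing import Iterable, List, Set, Tuple

Mask = List[List[int]]  # 2D binary mask: 1 = organ, 0 = background
Point = Tuple[int, int]

def foreground(mask: Mask) -> Set[Point]:
    """Return the (row, col) coordinates of all foreground pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}

def dice(pred: Mask, target: Mask) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|); defined as 1.0 when both masks are empty."""
    p, t = foreground(pred), foreground(target)
    if not p and not t:
        return 1.0
    return 2 * len(p & t) / (len(p) + len(t))

def directed_hausdorff(a: Iterable[Point], b: Set[Point]) -> float:
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(math.dist(x, y) for y in b) for x in a)

def hausdorff(pred: Mask, target: Mask) -> float:
    """Symmetric Hausdorff distance (in pixels) between the two foreground sets."""
    p, t = foreground(pred), foreground(target)
    if not p or not t:
        raise ValueError("Hausdorff distance is undefined for an empty mask")
    return max(directed_hausdorff(p, t), directed_hausdorff(t, p))

# Toy example: the prediction misses one organ pixel and adds one stray pixel.
target = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
pred = [[0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1]]

print(dice(pred, target))                 # 0.75
print(round(hausdorff(pred, target), 3))  # 1.414
```

The two metrics capture different failure modes. Dice rewards overall overlap, so the single stray pixel barely lowers it. The Hausdorff distance is driven by the worst-placed boundary point, so that same stray pixel sets its value. This is why lower Hausdorff distances are reported separately from Dice.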

Abstract

Transformer-based architectures have shown significant promise in medical image segmentation owing to their ability to model long-range contextual relationships. However, the standard Vision Transformer (ViT) modules used in hybrid networks such as TransUNet struggle to represent both fine-grained and coarse features effectively. To overcome this limitation, this paper introduces Swin-UNet, a hybrid framework that combines the hierarchical Swin Transformer encoder with a U-Net-inspired decoder. The encoder uses shifted-window self-attention for efficient local-global feature learning, while the decoder integrates residual convolutional paths and multi-scale patch embeddings for improved reconstruction and scale robustness. Evaluated on the Synapse multi-organ CT dataset, the model achieves competitive Dice scores and lower Hausdorff distances than U-Net and TransUNet, highlighting its potential as a robust and generalizable approach to medical image segmentation. These results suggest that Swin-UNet effectively balances computational efficiency with segmentation accuracy, offering a strong foundation for future medical imaging applications.
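The abstract credits the encoder's shifted-window self-attention for combining local and global feature learning. Below is a minimal pure-Python sketch of the general mechanism introduced with the Swin Transformer. Tokens on an H x W grid are split into non-overlapping m x m windows, and attention is computed only inside each window. In alternate layers the grid is first cyclically shifted by m // 2, so the new windows straddle the borders of the old ones and information flows between neighbouring windows. This is not the paper's implementation. It assumes Q = K = V (no learned projections), a single head, and no relative position bias. It also omits the attention mask and reverse shift that real Swin blocks apply. The function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]
Grid = List[List[Vec]]  # H x W grid of C-dimensional token features

def cyclic_shift(grid: Grid, shift: int) -> Grid:
    """Roll the grid up and left by `shift` tokens, wrapping around the edges."""
    h, w = len(grid), len(grid[0])
    return [[grid[(r + shift) % h][(c + shift) % w] for c in range(w)] for r in range(h)]

def partition(grid: Grid, m: int) -> List[List[Vec]]:
    """Split the grid into non-overlapping m x m windows, each flattened to m*m tokens."""
    h, w = len(grid), len(grid[0])
    assert h % m == 0 and w % m == 0, "grid must be divisible by the window size"
    return [
        [grid[r][c] for r in range(wr, wr + m) for c in range(wc, wc + m)]
        for wr in range(0, h, m)
        for wc in range(0, w, m)
    ]

def attention(tokens: List[Vec]) -> List[Vec]:
    """Single-head scaled dot-product self-attention within one window (Q = K = V = tokens)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(wt * v[i] for wt, v in zip(weights, tokens)) / total for i in range(d)])
    return out

def window_attention(grid: Grid, m: int, shifted: bool) -> List[List[Vec]]:
    """Attend within regular windows, or within windows of a grid shifted by m // 2.
    Simplification: no attention mask, so wrapped-around tokens may attend to each other."""
    shift = m // 2 if shifted else 0
    return [attention(win) for win in partition(cyclic_shift(grid, shift), m)]

# 4 x 4 grid whose 1-D "feature" is just the token index, to make windows visible.
grid = [[[float(4 * r + c)] for c in range(4)] for r in range(4)]
ids = lambda win: [int(t[0]) for t in win]

print([ids(w) for w in partition(grid, 2)])
# [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
print([ids(w) for w in partition(cyclic_shift(grid, 1), 2)])
# [[5, 6, 9, 10], [7, 4, 11, 8], [13, 14, 1, 2], [15, 12, 3, 0]]
```

In the shifted layout, the first window [5, 6, 9, 10] draws one token from each of the four original windows, which is how shifting links them. Windows such as [7, 4, 11, 8] also pair tokens from opposite image edges because of the wrap-around. Swin-style implementations mask those pairs in the attention scores, a step this sketch leaves out for brevity.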


Author: Xiaodong Li

Source: www.synapsesocial.com/papers/69df2c88e4eeef8a2a6b1bba
DOI: https://doi.org/10.1051/itmconf/20268401003/pdf
