This paper proposes MTL-Swin-Unet, a multi-task learning method for image classification. The method uses transformers to perform classification and semantic segmentation jointly. To address spurious-correlation problems, it enhances the image representation used for classification with two additional image representations: one obtained by semantic segmentation and one obtained by image reconstruction. In our experiments, when the test data included slices from the same patient (no covariance shift setting), the proposed method outperformed Swin Transformer by 4.8% in precision. When the test data did not include slices from the same patient (covariance shift setting), it outperformed Swin Transformer by 11.2% in recall.
Hirata et al. (Sat,) studied this question.
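As a rough illustration of how one training objective can combine the classification, segmentation, and reconstruction tasks, the sketch below sums three per-task losses with scalar weights. It is not taken from the paper: the choice of losses (softmax cross-entropy, per-pixel cross-entropy, mean squared error), the weights `w_cls`, `w_seg`, `w_rec`, and all function names are assumptions made for this example.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (logits: list of floats)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def pixel_cross_entropy(pixel_logits, pixel_labels):
    """Mean cross-entropy over pixels of a flattened segmentation map."""
    losses = [cross_entropy(z, y) for z, y in zip(pixel_logits, pixel_labels)]
    return sum(losses) / len(losses)


def mse(pred, target):
    """Mean squared error between a flattened reconstruction and the input."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(cls_logits, cls_label, seg_logits, seg_labels,
                   recon, image, w_cls=1.0, w_seg=1.0, w_rec=1.0):
    """Weighted sum of the three task losses.

    The weights are hypothetical hyperparameters, not values from the paper.
    """
    l_cls = cross_entropy(cls_logits, cls_label)
    l_seg = pixel_cross_entropy(seg_logits, seg_labels)
    l_rec = mse(recon, image)
    return w_cls * l_cls + w_seg * l_seg + w_rec * l_rec


# Toy usage: 2 classes, a 1-pixel segmentation map, a 1-pixel image.
loss = multitask_loss([0.0, 0.0], 0, [[0.0, 0.0]], [1], [0.5], [0.5])
```

In a real model the three outputs would come from task-specific heads on a shared encoder, so that gradients from the segmentation and reconstruction losses also shape the representation used for classification.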