Abstract The advent of wide-field surveys promises a revolution in our understanding of the Universe, yet the accumulation of heterogeneous, unlabeled image datasets presents a formidable computational challenge. Conventional deep learning approaches rely heavily on labeled data, which is often unavailable or inconsistent across different telescopes. To overcome this bottleneck, we present a foundational image model based on masked autoencoders tailored for the unique properties of astronomical data. We introduce a standardized benchmark encompassing both pretraining and diverse downstream tasks, optimizing the model architecture to capture features in sparse, noisy environments. Our evaluations reveal that this self-supervised approach yields significant performance gains over supervised baselines in galaxy classification, object detection, and redshift estimation. Most notably, the models demonstrate robust transferability across different observational instruments, effectively mitigating the domain shift problem. These results provide a scalable solution for the automated analysis of petabyte-scale datasets, a critical requirement for the success of the Chinese Space Station Telescope and other future observatories.
Lv et al. (Thu,) studied this question.