Summary

The precise and comprehensive diagnosis of complex brain disorders relies on non-invasive computed tomography (CT) and magnetic resonance imaging (MRI) in conjunction with multi-modal clinical information. Here, we present Brainfound, a multi-modal foundation model for brain medical imaging that integrates image-text contrastive learning with a diffusion-based generative framework. The model was pre-trained on more than 3 million brain CT slices and 7 million brain MRI slices paired with clinical reports. In multi-center evaluations, Brainfound achieves state-of-the-art performance across seven tasks: brain disease diagnosis, lesion segmentation, MRI enhancement, cross-modality translation, automatic report generation, zero-shot disease classification, and human-AI dialogue. It substantially outperforms leading models in automated report generation and clinical question answering for brain imaging, and its performance approaches that of expert physicians. These findings highlight the potential of Brainfound for accelerating diagnosis, supporting treatment decisions, and advancing human-in-the-loop brain health care.
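The abstract does not spell out Brainfound's contrastive objective or how zero-shot classification is performed. As an illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE loss over paired slice and report embeddings, plus zero-shot prediction by cosine similarity against class-prompt embeddings. The function names `symmetric_infonce` and `zero_shot_predict` and the temperature value of 0.07 are hypothetical choices, not taken from the paper.

```python
import numpy as np


def _l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def symmetric_infonce(image_emb: np.ndarray, text_emb: np.ndarray,
                      temperature: float = 0.07) -> float:
    """Hypothetical CLIP-style contrastive loss (assumption, not Brainfound's code).

    Row i of image_emb is assumed to pair with row i of text_emb,
    e.g. an image slice and its clinical report.
    """
    logits = _l2_normalize(image_emb) @ _l2_normalize(text_emb).T / temperature
    labels = np.arange(logits.shape[0])  # matching pairs lie on the diagonal

    def cross_entropy(l: np.ndarray) -> float:
        # Numerically stable log-softmax along each row.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return float(-log_probs[labels, labels].mean())

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))


def zero_shot_predict(image_emb: np.ndarray, class_text_emb: np.ndarray) -> np.ndarray:
    """Hypothetical zero-shot classifier: pick the class prompt with highest cosine similarity."""
    sims = _l2_normalize(image_emb) @ _l2_normalize(class_text_emb).T
    return np.argmax(sims, axis=1)
```

With this kind of objective, correctly paired embeddings yield a lower loss than mismatched ones, which is what pushes each image toward its own report in the shared embedding space. Zero-shot classification then reuses that space by embedding one text prompt per disease class.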
Zhang et al. studied this question.