Diabetic Retinopathy (DR), a leading cause of preventable blindness worldwide, underscores the urgent need for robust AI-driven diagnostic tools. Although various deep learning models for retinal imaging have emerged, their evaluation remains constrained by the limited publicly available datasets, which lack both large-scale coverage and fine-grained annotations and therefore compromise reliable assessment of model generalizability. To bridge this gap, we introduce a comprehensive multimodal dataset that includes three key retinal imaging modalities: color fundus photography (CFP), optical coherence tomography (OCT), and ultrawide-field fundus imaging (UWF). Unprecedented in scale and modality diversity, our dataset provides detailed lesion-level annotations and severity grades for DR and Diabetic Macular Edema (DME). We benchmark a range of fundus foundation models and large vision-language models on this dataset, revealing critical performance gaps and domain-specific challenges. By unifying large-scale multimodal data with precisely annotated clinical labels, our work establishes a foundational benchmark to drive advances in AI reliability and real-world clinical utility.
Tang et al. (Tue,) studied this question.