What question did this study set out to answer?

To create a large multimodal retinal image dataset that includes detailed annotations for diabetic retinopathy and diabetic macular edema.

March 13, 2026 | Open Access

A multimodal retinal image dataset for diabetic retinopathy detection using foundation models

Key Points

Aimed to create a large multimodal retinal image dataset with detailed annotations for diabetic retinopathy and diabetic macular edema.
Introduced a dataset containing color fundus photography, optical coherence tomography, and ultrawide-field images.
Provided detailed lesion-level annotations and severity grades.
Evaluated various foundation models and vision-language models on this dataset.
Revealed significant performance gaps in current models when assessed on the dataset.
Identified specific challenges related to model generalizability and domain application.

Abstract

Diabetic Retinopathy (DR), a leading cause of preventable blindness worldwide, underscores the urgent need for robust AI-driven diagnostic tools. Although various deep learning models for retinal imaging have emerged, their evaluation remains constrained by the limited number of publicly available datasets, which lack both large-scale coverage and fine-grained annotations. This compromises reliable assessment of model generalizability. To bridge this gap, we introduce a comprehensive multimodal dataset that includes three key retinal imaging modalities: color fundus photography (CFP), optical coherence tomography (OCT), and ultrawide-field fundus imaging (UWF). Our dataset, unprecedented in scale and modality diversity, provides detailed lesion-level annotations and severity grades for DR and Diabetic Macular Edema (DME). We benchmark a range of fundus foundation models and large vision-language models on this dataset, revealing critical performance gaps and domain-specific challenges. By unifying large-scale multimodal data with precisely annotated clinical labels, our work establishes a foundational benchmark to drive advances in AI reliability and real-world clinical utility.

