Multimodal large language models (MLLMs) can automatically analyze clinical video, but evidence from full esophagogastroduodenoscopy (EGD) examinations and the impact of on-screen computer-aided detection/diagnosis (CAD) overlays on MLLM behavior remain unclear. We tested whether an MLLM can produce clinically adequate EGD reports and whether a CAD overlay changes its performance. In this retrospective pilot study, five archived diagnostic EGD procedures from five patients were available as full-length videos. We analyzed these five complete EGD videos with Gemini 2.5 Pro in paired versions: (1) the clean video (Clean-Video) and (2) the same video with a CAD overlay (Overlay-Video). Five blinded endoscopists rated report adequacy in three domains (completeness, visualization, and lesion characteristics). MLLM accuracy for landmarks/lesions was further assessed by two blinded expert endoscopists using a time-window rule: a model detection counted as correct if it occurred within ±2 s of the expert-annotated timestamp. Across the five raters, MLLM completeness was judged adequate in 56.0% (14/25 ratings) with Clean-Video versus 48.0% (12/25 ratings) with Overlay-Video (p = 0.500). Visualization adequacy was identical in both conditions (36.0%, 9/25 ratings for both; p = 1.000), as was adequacy of lesion characteristics (16.0%, 4/25 ratings for both; p = 1.00). For landmark agreement, the overall accuracy of the MLLM with Clean-Video versus Overlay-Video was 0.55 (95% CI 0.43-0.67) versus 0.33 (95% CI 0.23-0.46), p = 0.029; sensitivity was 0.53 (95% CI 0.40-0.66) versus 0.35 (95% CI 0.24-0.49), p = 0.122; and specificity was 0.67 (95% CI 0.35-0.88) versus 0.22 (95% CI 0.06-0.55), p = 0.125. In this pilot study, Gemini 2.5 Pro demonstrated inadequate performance for clinical EGD reporting. These hypothesis-generating findings suggest that substantial optimization and larger-scale validation are required before deployment.
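The time-window rule described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' scoring code: the function name `match_detections`, the one-to-one nearest-match strategy, the inclusive ±2 s boundary, and the timestamps in the usage example are all assumptions made for illustration only.

```python
from typing import List, Tuple


def match_detections(
    model_times: List[float],
    expert_times: List[float],
    tolerance_s: float = 2.0,
) -> Tuple[List[Tuple[float, float]], List[float], List[float]]:
    """Match model detections to expert annotations within +/- tolerance_s.

    Assumptions (not stated in the abstract): each model detection can match
    at most one expert annotation, the nearest candidate is preferred, and
    the window boundary is inclusive.
    Returns (hits, missed_expert_times, unmatched_model_times).
    """
    remaining = sorted(model_times)
    hits: List[Tuple[float, float]] = []
    misses: List[float] = []
    for expert_t in sorted(expert_times):
        # Model detections that fall inside the window around this annotation.
        candidates = [m for m in remaining if abs(m - expert_t) <= tolerance_s]
        if candidates:
            best = min(candidates, key=lambda m: abs(m - expert_t))
            remaining.remove(best)  # consume the detection so it is not reused
            hits.append((expert_t, best))
        else:
            misses.append(expert_t)
    return hits, misses, remaining


# Hypothetical timestamps in seconds, for illustration only.
hits, misses, unmatched = match_detections(
    model_times=[10.5, 31.0, 58.0],
    expert_times=[12.0, 30.0, 45.0],
)
print(hits, misses, unmatched)
```

Under these assumptions, matched pairs count as correct detections, missed expert annotations as false negatives, and unmatched model detections as candidate false positives. The study's actual handling of ties and repeated detections is not described in the abstract.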
Davide Massimi
Luca Di Stefano
Tommy Rizkala
Digestive Endoscopy
KU Leuven
Université de Montréal
University of Oslo
Massimi et al. (Sun,) studied this question.
www.synapsesocial.com/papers/69b25be596eeacc4fceca4b4 — DOI: https://doi.org/10.1111/den.70134