What question did this study set out to answer?

The aim is to assess the diagnostic accuracy of various multimodal large language models in differentiating between epileptic and functional seizures using smartphone videos.

April 10, 2026 · Open Access

Diagnostic accuracy of multimodal large language models in differentiating epileptic from functional seizures in smartphone recorded videos

Key Points

Analyzed 24 smartphone-recorded videos (19 epileptic and 5 functional events) from 15 patients.
Compared four successive multimodal large language models: Gemini 1.5 Pro, Gemini 2.0 Flash, Gemini 2.5 Flash, and Gemini 2.5 Pro.
Used video-electroencephalography monitoring as the gold standard for seizure classification.
Diagnostic accuracy was 33.3% for Gemini 1.5 Pro and 25.0% for Gemini 2.0 Flash, peaking at 54.2% for both Gemini 2.5 models.
In exploratory pairwise comparisons, Gemini 2.5 Pro was more accurate than Gemini 1.5 Pro (p = 0.01) and Gemini 2.0 Flash (p = 0.003).
For the Gemini 2.5 models, diagnoses were more often correct for videos focused on the upper body/face (80.0%–90.0%) than for whole-body views (28.6%–35.7%).
Models reported high confidence scores (median 8.0–9.0) that did not correlate with correctness.

Abstract

Differentiating epileptic from functional seizures is a clinical challenge; while smartphone videos can aid diagnosis, they often require expert review, causing delays. We evaluated the accuracy of four successive multimodal large language models (LLMs), Gemini 1.5 Pro, 2.0 Flash, 2.5 Flash, and 2.5 Pro, in differentiating seizure types from smartphone videos without clinical context. In this prospective diagnostic study at a tertiary epilepsy center, 24 videos from 15 patients were analyzed, with video-electroencephalography monitoring as the gold standard. Of the 24 events (19 epileptic, 5 functional), diagnostic accuracy improved with successive models: Gemini 1.5 Pro (33.3%), Gemini 2.0 Flash (25.0%), and both Gemini 2.5 Flash and Pro (54.2%). In exploratory pairwise comparisons, Gemini 2.5 Pro showed higher accuracy than Gemini 1.5 Pro (p = 0.01) and Gemini 2.0 Flash (p = 0.003). Performance was influenced by video features; for example, diagnosis was more accurate for the Gemini 2.5 models when videos focused on the upper body/face (80.0%–90.0%) compared to a whole-body view (28.6%–35.7%). All models reported high confidence scores (median 8.0–9.0) that were poorly aligned and did not correlate with correctness. Successive LLMs show improved yet modest accuracy for seizure classification from video alone, highlighting the need for domain-specific fine-tuning before clinical implementation.
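The accuracies in the abstract are reported as percentages of 24 videos, so each one can be mapped back to a whole number of correctly classified videos. The following is a minimal sketch of that arithmetic, not part of the study's analysis; the helper name `implied_count` is hypothetical, and the only inputs are the percentages and the denominator quoted in the abstract.

```python
# Percentages and the 24-video denominator are taken from the abstract.
# The helper below is an illustrative assumption, not the authors' code.
REPORTED_ACCURACY = {
    "Gemini 1.5 Pro": 33.3,
    "Gemini 2.0 Flash": 25.0,
    "Gemini 2.5 Flash": 54.2,
    "Gemini 2.5 Pro": 54.2,
}
N_VIDEOS = 24


def implied_count(percent: float, n: int) -> int:
    """Return the unique k such that 100*k/n rounds to `percent` at one decimal."""
    matches = [
        k for k in range(n + 1)
        if abs(round(100 * k / n, 1) - percent) < 1e-9
    ]
    if len(matches) != 1:
        raise ValueError(f"{percent}% of {n} does not map to a unique count")
    return matches[0]


# Map each model to its implied number of correctly classified videos.
counts = {
    model: implied_count(pct, N_VIDEOS)
    for model, pct in REPORTED_ACCURACY.items()
}

for model, k in counts.items():
    print(f"{model}: {k}/{N_VIDEOS} correct")
```

Under these assumptions the reported figures correspond to 8, 6, 13, and 13 correct classifications out of 24, respectively.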

Authors

Anshum Patel

Sai Krishna Vallamchetla

Adrian Safa

Journals

Scientific Reports

Institutions

Mayo Clinic in Arizona

Mayo Clinic in Florida

Jacksonville College
