What is the clinical evidence from this study?

Study design: Other. Population: Cardiovascular diseases (ECG interpretation). Intervention: PULSE (Multimodal Large Language Model) vs. Proprietary MLLMs (GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet) and open-source MLLMs. Primary outcome: Performance on ECGBench (Accuracy, AUC, F1, Report Score).

What question did this study set out to answer?

The aim is to improve electrocardiogram image interpretation using multimodal large language models.

March 18, 2026

Open Access

Teaching multimodal LLMs to comprehend 12-lead electrocardiographic images

Key Result

PULSE, a multimodal large language model trained on over one million ECG images, outperformed general-purpose MLLMs by 21% to 33% in average accuracy across diverse ECG interpretation tasks.

Key Points

The aim is to improve electrocardiogram image interpretation using multimodal large language models.
Introduction of a large-scale ECG image instruction-tuning dataset with over one million samples.
Development of an open-source multimodal large language model for ECG imagery.
Creation of a human expert-developed benchmark for evaluating ECG interpretation across multiple datasets.
The new model outperforms general-purpose multimodal large language models by 21% to 33% in average accuracy.
Evaluation on the expert-developed benchmark indicates substantially better ECG image interpretation than general-purpose models achieve.

Structured PICO

Does PULSE, a multimodal large language model trained on ECG images, improve ECG interpretation accuracy compared to general-purpose MLLMs?

Population

Over one million synthesized and real-world 12-lead ECG images covering diverse tasks including feature recognition, rhythm analysis, morphology assessment, and clinical report generation.

Intervention

PULSE, a fully open-source 7B multimodal large language model (MLLM) trained on the ECGInstruct dataset.

Comparator

General-purpose proprietary MLLMs (e.g., GPT-4o, Gemini 1.5 Pro, Claude 3.5 Sonnet), open-source MLLMs (e.g., LLaVA), and domain-specific signal-based methods.

Outcome

Performance on ECG interpretation tasks measured by Macro AUC, Macro F1, Hamming Loss, accuracy, and Report Perfect Score.
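Two of the metrics named above, Macro F1 and Hamming Loss, can be illustrated on multi-label predictions. The following is a minimal sketch using the standard definitions of these metrics, not the study's evaluation code. The function names, the three-label setup, and the toy labels are assumptions made for illustration only.

```python
from typing import List

# Each row is one ECG; each column is one binary label (hypothetical labels,
# e.g. three diagnostic statements). 1 = label present, 0 = label absent.
Labels = List[List[int]]


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of individual label slots predicted incorrectly (lower is better)."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def macro_f1(y_true: Labels, y_pred: Labels) -> float:
    """Unweighted mean of per-label F1 scores (higher is better)."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # Convention assumed here: a label with no positives and no
        # predicted positives scores 0.0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Toy data: four ECGs, three labels (illustrative values, not study data).
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

print("Hamming loss:", round(hamming_loss(y_true, y_pred), 4))
print("Macro F1:", round(macro_f1(y_true, y_pred), 4))
```

Macro averaging gives every label equal weight regardless of how often it occurs, so rare conditions affect the score as much as common ones.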

PULSE, a novel open-source multimodal large language model trained on over one million ECG images, establishes a new state-of-the-art for automated ECG image interpretation, significantly outperforming general-purpose models.

Limitations

Original reports in training datasets are not always fully narrative and may consist of structured key points or multilingual entries
Lack of richer clinician-authored narrative reports for training
Need for prospective testing in hospital environments to bridge the gap between controlled evaluation and clinical practice
Risk of incorrect or overconfident outputs and potential dataset biases
Complex and open-ended tasks remain challenging and demand stronger reasoning and instruction-following capabilities

Abstract

Electrocardiograms (ECGs) are essential, non-invasive diagnostic tools for assessing cardiac conditions. Existing methods often have limited generalizability, focus on narrow condition sets, and rely on raw physiological signals, which may be unavailable in resource-limited settings where only printed or digital ECG images are accessible. Recent advances in multimodal large language models (MLLMs) offer new opportunities, yet ECG image interpretation remains challenging due to the lack of instruction-tuning data and standardized benchmarks. To address these gaps, we introduce ECGInstruct, the first large-scale ECG image instruction-tuning dataset with over one million samples, covering diverse tasks including feature recognition, rhythm analysis, morphology assessment, and clinical report generation. We develop PULSE, a fully open-source MLLM for ECG image interpretation trained on ECGInstruct. We further curate ECGBench, a human expert-developed benchmark spanning four core ECG interpretation tasks across nine datasets, incorporating both synthesized and real-world ECG images to enable clinically realistic evaluation. Our experiments demonstrate that PULSE establishes a new state of the art, outperforming general-purpose MLLMs by 21% to 33% in average accuracy. These results highlight the potential of PULSE to improve ECG image interpretation in clinical practice. All code, data and models are available at https://aimedlab.github.io/PULSE/.

Authors

Ruoqi Liu

Yuelin Bai

Xiang Yue

Journals

npj Digital Medicine

Institutions

Carnegie Mellon University

The Ohio State University

Shenzhen Institutes of Advanced Technology
