What question did this study set out to answer?

The aim is to create a synthetic data generation pipeline to improve long-context audio reasoning evaluation and training.

April 10, 2026 · Open Access

Generating Synthetic Doctor-Patient Conversations for Long-form Audio Summarization

Key Points

Developed a persona-driven dialogue generation system.
Synthesized multi-speaker audio incorporating pauses, overlaps, room acoustics, and sound events.
Produced LLM-based reference SOAP notes from the generated conversations.
Released 8,800 synthetic conversations with 1.3k hours of corresponding audio and reference notes.
In evaluations of current open-weight systems, cascaded approaches substantially outperformed end-to-end models.
The generated data serves as both a training resource and a controlled evaluation environment for audio reasoning tasks.
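The pauses and overlaps mentioned in the key points can be pictured as gaps between consecutive turns on a shared timeline: a positive gap is a pause and a negative gap is an overlap. The following is a minimal sketch of that idea only, not the authors' implementation. The names `Turn`, `PlacedTurn`, and `place_turns`, as well as the probabilities and duration limits, are assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    duration: float  # seconds of synthesized speech for this turn


@dataclass
class PlacedTurn:
    speaker: str
    start: float
    end: float


def place_turns(turns, rng, overlap_prob=0.2, max_overlap=0.5, max_pause=1.0):
    """Assign start and end times to turns on one timeline.

    All parameter values are illustrative assumptions, not values from the paper.
    A negative gap starts a turn before the previous one ends (overlap);
    a positive gap leaves silence between turns (pause).
    """
    placed = []
    prev_start = 0.0  # start time of the previous turn
    prev_end = 0.0    # end time of the previous turn
    for i, turn in enumerate(turns):
        if i == 0:
            gap = 0.0
        elif rng.random() < overlap_prob:
            gap = -rng.uniform(0.0, max_overlap)
        else:
            gap = rng.uniform(0.0, max_pause)
        # Never start before the previous turn started, even for short turns.
        start = max(prev_start, prev_end + gap)
        end = start + turn.duration
        placed.append(PlacedTurn(turn.speaker, start, end))
        prev_start, prev_end = start, end
    return placed


turns = [Turn("doctor", 2.0), Turn("patient", 1.0), Turn("doctor", 0.3), Turn("patient", 4.0)]
for p in place_turns(turns, random.Random(0)):
    print(f"{p.speaker:8s} {p.start:6.2f} -> {p.end:6.2f}")
```

Passing a seeded `random.Random` keeps the layout reproducible, which matters when the same conversation must be regenerated for a controlled evaluation.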

Abstract

Long-context audio reasoning is underserved in both training data and evaluation. Existing benchmarks target short-context tasks, and the open-ended generation tasks most relevant to long-context reasoning pose well-known challenges for automatic evaluation. We propose a synthetic data generation pipeline designed to serve both as a training resource and as a controlled evaluation environment, and instantiate it for first-visit doctor-patient conversations with SOAP note generation as the task. The pipeline has three stages, built entirely on open-weight models: (1) persona-driven dialogue generation; (2) multi-speaker audio synthesis with overlap/pause modeling, room acoustics, and sound events; and (3) LLM-based reference SOAP note production. We release 8,800 synthetic conversations with 1.3k hours of corresponding audio and reference notes. Evaluating current open-weight systems, we find that cascaded approaches still substantially outperform end-to-end models.
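The three stages described in the abstract can be read as a simple composition of functions. The sketch below shows only that structure, under stated assumptions: `Sample`, `run_pipeline`, and the `toy_*` stand-ins are hypothetical names, the stand-ins replace the open-weight LLM and speech synthesis models with trivial placeholders, and the reference note is assumed to be written from the dialogue text rather than from the audio.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Dialogue = List[Tuple[str, str]]  # (speaker, utterance) pairs


@dataclass
class Sample:
    dialogue: Dialogue
    audio: bytes
    soap_note: str


def run_pipeline(
    persona: Dict[str, str],
    generate_dialogue: Callable[[Dict[str, str]], Dialogue],
    synthesize_audio: Callable[[Dialogue], bytes],
    write_soap_note: Callable[[Dialogue], str],
) -> Sample:
    dialogue = generate_dialogue(persona)   # stage 1: persona-driven dialogue
    audio = synthesize_audio(dialogue)      # stage 2: multi-speaker audio
    soap_note = write_soap_note(dialogue)   # stage 3: reference SOAP note
    return Sample(dialogue, audio, soap_note)


# Hypothetical stand-ins so the sketch runs without any models.
def toy_dialogue(persona: Dict[str, str]) -> Dialogue:
    return [
        ("doctor", "What brings you in today?"),
        ("patient", f"I have had {persona['complaint']} for a week."),
    ]


def toy_audio(dialogue: Dialogue) -> bytes:
    # Placeholder bytes (one per character of text), not real audio.
    return b"\x00" * sum(len(text) for _, text in dialogue)


def toy_soap(dialogue: Dialogue) -> str:
    # SOAP = Subjective, Objective, Assessment, Plan; only S is filled here.
    patient_lines = [text for speaker, text in dialogue if speaker == "patient"]
    return "S: " + " ".join(patient_lines) + "\nO:\nA:\nP:"


sample = run_pipeline({"complaint": "a dry cough"}, toy_dialogue, toy_audio, toy_soap)
print(sample.soap_note)
```

Keeping each stage behind a plain callable makes it possible to swap one model without touching the others, which is one plausible way to keep such a pipeline usable for both training data and controlled evaluation.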

Authors

Yanis Labrak

David Grünert

Séverin Baroudi

Institutions

Johns Hopkins University

University of Pittsburgh

The Ohio State University
