What type of study is this?

This is a qualitative study.

What question did this study set out to answer?

The research aims to evaluate how different prompts affect the performance of a large language model in psychiatric OSCE scenarios.

March 30, 2026 | Open Access

Large language model used to simulate psychiatric OSCE scenarios: a medical student perspective

Key Points

The research aims to evaluate how different prompts affect the performance of a large language model in psychiatric OSCE scenarios.
The study presented four psychiatric OSCE cases to GPT-4o mini under four prompting conditions.
Prompts ranged from a standard clinical prompt to a context-enhanced prompt and two variations containing irrelevant or distracting information.
Responses were scored with a standardised, structured rubric and analysed thematically.
GPT-4o mini produced clinically relevant responses under standard and context-enhanced prompts.
Performance declined as irrelevant information was introduced.
Quantitative scores dropped significantly across conditions, and qualitative analysis showed reduced coherence, increased verbosity, and difficulty prioritising clinical content.

Abstract

AIM: This study investigates how varying prompt conditions influence the quality and clinical coherence of responses generated by a large language model (GPT-4o mini) in simulated psychiatric OSCE scenarios.

METHODS: Four psychiatric OSCE cases were presented to GPT-4o mini under four conditions with increasing detail: a standard clinical prompt, a context-enhanced prompt, and two variation prompts incorporating irrelevant or distracting information. For each case, GPT-4o mini was asked to perform the key OSCE tasks: history-taking, risk assessment, explanation, and management. Responses were scored using a standardised, structured rubric and analysed thematically.

RESULTS: GPT-4o mini generated clinically relevant responses under standard and context-enhanced prompts. However, performance declined as irrelevant information was introduced. Quantitative scores dropped significantly across the different conditions, and qualitative analysis revealed reduced coherence, increased verbosity, and difficulty prioritising clinical content.

CONCLUSIONS: LLMs like GPT-4o mini can generate useful responses when provided with clear and concise prompt instructions. However, in this study, clinical accuracy and coherence deteriorated in the presence of distracting or ambiguous input. This highlights the need for critical evaluation and unambiguous literacy when using LLMs in medical education.
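The design described in the methods (four cases crossed with four prompt conditions, each response scored against a rubric) can be sketched as a small grid. This is only an illustrative sketch, not the authors' code: the case labels, condition labels, the `build_trials` and `mean_score_by_condition` helpers, and the single numeric `score` field are all hypothetical, since the abstract gives neither the prompt text nor the rubric scale.

```python
from itertools import product

# Hypothetical labels: the abstract does not name the cases or the prompts.
CASES = ["case_1", "case_2", "case_3", "case_4"]
CONDITIONS = [
    "standard",
    "context_enhanced",
    "distractor_variation_1",
    "distractor_variation_2",
]
# The four OSCE tasks listed in the abstract.
TASKS = ["history-taking", "risk assessment", "explanation", "management"]


def build_trials(cases, conditions):
    """Return one trial per (case, condition) pair, each covering all tasks."""
    return [
        {"case": case, "condition": condition, "tasks": list(TASKS)}
        for case, condition in product(cases, conditions)
    ]


def mean_score_by_condition(scored_trials):
    """Average a numeric 'score' field per condition (assumed rubric total)."""
    grouped = {}
    for trial in scored_trials:
        grouped.setdefault(trial["condition"], []).append(trial["score"])
    return {cond: sum(vals) / len(vals) for cond, vals in grouped.items()}


trials = build_trials(CASES, CONDITIONS)
print(len(trials))  # 4 cases x 4 conditions = 16 trials
```

Once each trial carries a rubric score from a human rater, `mean_score_by_condition` gives the per-condition averages that a comparison across prompt conditions would start from.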
