This study explores whether Large Language Models (LLMs) can analyze brief trauma narratives in ways that align with self-reported trauma-related symptoms and with probable Post-Traumatic Stress Disorder (PTSD) based on a recommended cut-off. We investigate the ability of Gemini 1.5 Pro and GPT-4o to score 1,000 trauma narratives for PTSD symptom severity, comparing several prompting strategies. Prompts ranged from basic instructions requesting a binary or graded (5-point) inference based solely on the narrative, to more complex prompts incorporating demographic information (age, gender), time since the traumatic event, and extended instructions detailing PTSD symptomatology. Accuracy of inferences was evaluated using Pearson correlations and Area Under the Curve (AUC) metrics computed between LLM inferences and self-reported measures. Results showed small-to-moderate positive correlations across all prompting strategies, with correlations as high as r = .42 with self-reported symptoms. Graded inferences yielded stronger correlations with self-reported symptoms than binary inferences. The AUC for probable PTSD peaked at 0.713 using Gemini 1.5 Pro, with a sensitivity of 0.72 and a specificity of 0.66 in detecting PTSD symptoms. Of note, providing extended instructions on PTSD symptoms did not reliably improve performance beyond basic demographic and temporal context. The performance of LLMs was comparable to that achieved by traditional machine learning methods trained on large datasets, without requiring extensive training data; however, the moderate size of the associations underscores the need for continued refinement. Findings are limited by the use of brief, self-selected English-language web narratives and self-reported symptoms, and should be considered preliminary pending validation in clinically verified samples.
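As an illustration of the evaluation described in the abstract, the following is a minimal sketch of how a Pearson correlation, an AUC, and sensitivity and specificity can be computed between graded LLM inferences and self-reported scores. It is not the authors' code. The toy data, the cut-off value of 33, the rating threshold of 3, and all function names are assumptions made purely for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one
    (ties count as half), i.e. the Mann-Whitney formulation."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity when scores >= threshold are flagged positive."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Toy data, invented for illustration only (not from the study).
llm_ratings = [1, 2, 2, 3, 4, 5, 3, 1]             # graded 5-point LLM inferences
symptom_scores = [10, 25, 40, 30, 50, 60, 20, 15]  # self-reported symptom totals
CUTOFF = 33                                        # hypothetical cut-off for probable PTSD
probable_ptsd = [1 if s >= CUTOFF else 0 for s in symptom_scores]

r = pearson_r(llm_ratings, symptom_scores)
area = auc(llm_ratings, probable_ptsd)
sens, spec = sensitivity_specificity(llm_ratings, probable_ptsd, threshold=3)
print(f"r = {r:.2f}, AUC = {area:.3f}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

The correlation uses the continuous symptom score, whereas the AUC and the sensitivity and specificity use the dichotomized outcome, which mirrors the distinction the abstract draws between symptom severity and probable PTSD.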
• PTSD risk can be inferred from brief descriptions of traumatic experiences
• The performance of LLMs is comparable to traditional machine learning methods
• Adding detailed instructions on PTSD did not enhance model performance
• Results suggest that extensive training data might not be needed for text mining
D. Marengo
C.M. Hoeboer
M. Olff
Journal of Anxiety Disorders
University of Turin
Amsterdam University Medical Centers
Arq Psychotrauma Expert Group
Marengo et al. studied this question.
synapsesocial.com/papers/69b5ff6e83145bc643d1be35 — DOI: https://doi.org/10.1016/j.janxdis.2026.103151