What question did this study set out to answer?

This research aims to determine if large language models can analyze trauma narratives to identify symptoms of PTSD.

March 15, 2026 | Open Access

Investigating the Use of Large Language Models for the Detection of Trauma-Related Symptoms: An Exploratory Study Using Trauma Narratives

Key Points

This research aims to determine if large language models can analyze trauma narratives to identify symptoms of PTSD.
Analyzed 1000 trauma narratives using Gemini 1.5 Pro and GPT-4o.
Compared basic binary/graded inference prompts and complex prompts with demographic details.
Evaluated accuracy with Pearson correlations and Area Under the Curve metrics.
Found small-to-moderate positive correlations between LLM inferences and self-reported PTSD symptoms.
Graded inferences showed stronger correlations than binary inferences.
AUC for probable PTSD peaked at 0.713 with Gemini 1.5 Pro.
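
The contrast between basic and complex prompts described above can be sketched as a small prompt builder. This is only an illustration: the function `build_prompt`, its parameters, and all prompt wording are hypothetical and are not taken from the study, whose actual prompts are not reproduced on this page.

```python
from typing import Optional


def build_prompt(
    narrative: str,
    graded: bool = True,
    age: Optional[int] = None,
    gender: Optional[str] = None,
    time_since_event: Optional[str] = None,
    symptom_guide: Optional[str] = None,
) -> str:
    """Assemble a prompt string. All wording is hypothetical, not the study's."""
    if graded:
        # Graded (5-point) inference
        task = (
            "Rate the severity of PTSD symptoms suggested by the narrative "
            "on a scale from 1 to 5. Answer with a single number."
        )
    else:
        # Binary inference
        task = (
            "Does the narrative suggest PTSD symptoms? "
            "Answer 1 for yes or 0 for no."
        )
    parts = [task]

    # Optional extended instructions on PTSD symptomatology
    if symptom_guide:
        parts.append("Background on PTSD symptoms:\n" + symptom_guide)

    # Optional demographic and temporal context
    context = []
    if age is not None:
        context.append(f"Age: {age}")
    if gender is not None:
        context.append(f"Gender: {gender}")
    if time_since_event is not None:
        context.append(f"Time since event: {time_since_event}")
    if context:
        parts.append("Context:\n" + "\n".join(context))

    parts.append("Narrative:\n" + narrative)
    return "\n\n".join(parts)


# Basic prompt: binary inference from the narrative alone
basic_binary = build_prompt("Example narrative text.", graded=False)

# Complex prompt: graded inference with context and extended instructions
complex_graded = build_prompt(
    "Example narrative text.",
    age=34,
    gender="female",
    time_since_event="2 years",
    symptom_guide="Placeholder symptom description.",
)
```

Keeping each optional block behind its own argument makes it straightforward to vary one element at a time when comparing prompting strategies.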

Abstract

This study explores whether Large Language Models (LLMs) can analyze brief trauma narratives in ways that align with self-reported trauma-related symptoms and probable Post-Traumatic Stress Disorder (PTSD) based on a recommended cut-off. We investigate the ability of Gemini 1.5 Pro and GPT-4o to score 1000 trauma narratives for severity of PTSD symptoms, comparing various prompting strategies. Prompts ranged from basic instructions requesting a binary or graded (5-point) inference based solely on the narrative, to more complex prompts incorporating demographic information (age, gender), time since the traumatic event, and extended instructions detailing PTSD symptomatology. Accuracy of inferences was evaluated using Pearson correlations and Area Under the Curve (AUC) metrics computed between LLM inferences and self-reported measures. Results showed small-to-moderate positive correlations across all prompting strategies, with correlations as high as r = .42 with self-reported symptoms. Graded inferences yielded stronger correlations with self-reported symptoms than binary inferences. The AUC for probable PTSD peaked at 0.713 using Gemini 1.5 Pro, with a sensitivity of 0.72 and a specificity of 0.66. Of note, providing extended instructions on PTSD symptoms did not reliably improve performance beyond basic demographic and temporal context. The performance of LLMs was comparable to that achieved by traditional machine learning methods trained on large datasets, but without requiring extensive training data; however, the moderate size of the associations underscores the need for continued refinement. Findings are limited by brief, self-selected English-language web narratives and self-reported symptoms, and should be considered preliminary pending validation in clinically verified samples.
• PTSD risk can be inferred from brief descriptions of traumatic experiences
• The performance of LLMs is comparable to traditional machine learning methods
• Adding detailed instructions on PTSD did not enhance model performance
• Results suggest that extensive training data might not be needed for text mining
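
The evaluation metrics named in the abstract (Pearson correlation, AUC, sensitivity, and specificity) can be illustrated with a minimal, dependency-free sketch. The data below are invented toy values, the cut-off of 30 and the rating threshold of 3 are arbitrary placeholders, and the helper names are hypothetical; none of this reproduces the study's data, cut-off, or code.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case, counting ties as one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Classify score >= threshold as positive and compare with labels."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (invented): graded LLM ratings and self-reported symptom scores
llm_ratings = [1, 2, 2, 3, 4, 5, 3, 1]
self_report = [5, 20, 12, 30, 41, 60, 28, 8]

# Placeholder cut-off for "probable PTSD"; not the study's cut-off
CUTOFF = 30
probable_ptsd = [1 if score >= CUTOFF else 0 for score in self_report]

r = pearson_r(llm_ratings, self_report)
auc = roc_auc(probable_ptsd, llm_ratings)
sens, spec = sensitivity_specificity(probable_ptsd, llm_ratings, threshold=3)
```

The correlation uses the continuous self-report score, whereas AUC, sensitivity, and specificity require first dichotomizing that score at a cut-off, which is why the abstract reports the two kinds of result separately.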

Authors

D. Marengo

C.M. Hoeboer

M. Olff

Journals

Journal of Anxiety Disorders

Institutions

University of Turin

Amsterdam University Medical Centers

Arq Psychotrauma Expert Group
