Advances in speech technology and Natural Language Processing (NLP) have shown that speech is a promising source of data for detecting features of psychosis. These technologies can potentially detect subtle speech aberrations that often go unnoticed by clinicians and family members. However, research in this area is hindered by a significant limitation: a lack of sufficient and appropriate speech corpora from psychosis patients, especially datasets containing naturalistic speech that reflects typical clinical interactions. This scarcity limits the development, testing, and generalization of new computational methods for psychosis prediction. To address this gap, our new dataset offers naturalistic speech samples collected using the semi-structured DISCOURSE protocol. This resource includes both raw audio recordings and transcribed speech, in English, from individuals participating in an early-stage psychosis treatment program (<5 years of illness) and from demographically matched healthy controls. In addition to speech data, the dataset provides comprehensive clinical, cognitive, and demographic information for each participant. Importantly, the DISCOURSE protocol and clinical assessments were repeated after a 12-month follow-up to assess stability and change in speech, symptom burden and functional status. As the inaugural dataset released by the DISCOURSE consortium, this resource marks the beginning of a series of harmonized data collection efforts across multiple countries and languages. This multi-site, multi-language approach enables validation of findings in diverse psychosis populations, allowing researchers to address questions that cannot be resolved at individual research sites. Transcripts were extracted from conversations lasting between 15 and 35 minutes in total.
The data herein can be used to analyze acoustic, semantic, syntactic and pragmatic measures related to psychosis, and to understand the nature of the communication difficulties faced by patients. We expect this dataset to be useful for future investigations into the clinical utility of speech data in assessing thought disorder and psychosis-related symptoms.
Brian Cho
Estée Balles
Michael Mackinley
Data in Brief
McGill University
Western University
Douglas Mental Health University Institute
www.synapsesocial.com/papers/69a75a62c6e9836116a201ff — DOI: https://doi.org/10.1016/j.dib.2026.112517