Most research on automatic speech analysis (ASA) has focused on acoustic features, while the potential of linguistic markers remains underexplored, particularly in clinically diagnosed, non-English-speaking populations. This study evaluated the integration of acoustic and linguistic markers for detecting depression in a Spanish-speaking clinical sample. The sample comprised 151 participants: 80 patients with major depressive disorder (MDD) or persistent depressive disorder (PDD) recruited from the Psychiatry Department of Vall d'Hebron University Hospital, and 71 healthy controls. Participants answered 11 open-ended questions about depressive symptoms and well-being via a web-based platform. Linguistic features were extracted, together with acoustic features spanning four categories: prosodic, cepstral, spectral, and Teager Energy Operator (TEO)-based. Group comparisons and logistic regressions were performed to assess the predictive value of acoustic and linguistic features. Machine learning was then used to compare acoustic, linguistic, and ensemble classification models, the latter combining both feature sets. Among acoustic features, TEO-based and cepstral features showed the strongest predictive power. Greater use of verbs, reduced use of nouns and past-tense verbs, a smaller vocabulary, and increased use of shorter words and sentences remained strong predictors of depression after adjusting for covariates. The linguistic model outperformed the acoustic model (AUC = 0.86 vs. 0.79), while the ensemble model achieved comparable overall performance (AUC = 0.86), with slightly higher accuracy (0.84) and specificity (0.93). Integrating linguistic features into automated speech analysis shows promise for depression detection. With further validation and refinement, brief speech-based assessments could support early depression detection in primary care.
• Combined acoustic and linguistic features for depression detection in a clinically diagnosed, Spanish-speaking sample.
• Among acoustic features, TEO-based and cepstral features showed the strongest predictive power.
• The linguistic model outperformed the acoustic model (AUC = 0.86 vs. 0.79).
• An ensemble model combining both acoustic and linguistic features showed the highest accuracy (0.84) and specificity (0.93).
• Age-stratified ensemble analyses revealed optimal performance for individuals aged ≤45 years (AUC = 0.90).
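The Teager Energy Operator behind the TEO-based features has a simple discrete form, ψ[n] = x[n]² − x[n−1]·x[n+1]. The abstract does not describe the study's feature pipeline, so the Python below is only a minimal sketch of the operator itself, plus a hypothetical frame-level summary (`teo_frame_features`, the mean and standard deviation of ψ). Both function names and the choice of summary statistics are illustrative assumptions.

```python
import math
import statistics
from typing import Dict, List, Sequence


def teager_energy(x: Sequence[float]) -> List[float]:
    """Discrete Teager Energy Operator: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    The first and last samples lack a neighbour on one side, so the
    output has len(x) - 2 values.
    """
    if len(x) < 3:
        raise ValueError("TEO needs at least 3 samples")
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def teo_frame_features(frame: Sequence[float]) -> Dict[str, float]:
    """Hypothetical frame-level summary: mean and spread of the TEO profile."""
    psi = teager_energy(frame)
    return {"teo_mean": statistics.fmean(psi), "teo_std": statistics.pstdev(psi)}


if __name__ == "__main__":
    # Pure tone x[n] = A*cos(omega*n): the TEO is constant at A**2 * sin(omega)**2.
    amplitude, omega = 2.0, 0.3
    tone = [amplitude * math.cos(omega * n) for n in range(200)]
    feats = teo_frame_features(tone)
    print(feats["teo_mean"], amplitude ** 2 * math.sin(omega) ** 2)
```

For a pure tone A·cos(Ωn), ψ[n] equals A²·sin²(Ω) at every sample, which is why the operator is said to reflect both amplitude and frequency. Some published TEO-based speech features first apply band-pass filtering to the signal; that step is omitted here.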
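The reported AUC values measure how well each model ranks participants with depression above controls. The Python sketch below computes AUC from its rank definition and fuses two models by soft voting, a weighted average of their predicted probabilities. Soft voting and the `soft_vote` helper are assumptions for illustration, since the abstract does not state how the study's ensemble combined the acoustic and linguistic models. The scores are toy values, not study data.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive (label 1) scores
    higher than a randomly chosen negative (label 0); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def soft_vote(prob_a: Sequence[float], prob_b: Sequence[float],
              weight_a: float = 0.5) -> List[float]:
    """Hypothetical late fusion: weighted average of two models' probabilities."""
    if len(prob_a) != len(prob_b):
        raise ValueError("probability lists must have the same length")
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(prob_a, prob_b)]


if __name__ == "__main__":
    # Toy example: 1 = depression, 0 = control (made-up values).
    labels = [1, 1, 1, 0, 0, 0]
    acoustic = [0.9, 0.4, 0.65, 0.5, 0.2, 0.7]
    linguistic = [0.8, 0.7, 0.3, 0.4, 0.1, 0.2]
    fused = soft_vote(acoustic, linguistic)
    print(round(roc_auc(labels, acoustic), 3))    # 0.667
    print(round(roc_auc(labels, linguistic), 3))  # 0.889
    print(round(roc_auc(labels, fused), 3))       # 1.0
```

With equal weights, the fused scores happen to separate the toy classes perfectly. On real data, fusion gains are not guaranteed: in this study the ensemble's AUC (0.86) matched that of the linguistic model.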
Patricia Laura Maran
Peru Gabirondo
Alexandra Vlaic
Journal of Affective Disorders
Universitat Autònoma de Barcelona
Vall d'Hebron Hospital Universitari
Centro de Investigación Biomédica en Red de Salud Mental
DOI: https://doi.org/10.1016/j.jad.2026.121563