March 3, 2026

Deep Sound Synthesis Matched to Brain Activity Recapitulates Preferential Responses to Speech and Music

Key Points

Synthetic sounds derived from cortical activity evoke responses similar to those evoked by natural speech and music.
Significant activation occurs in the auditory cortex for sounds even when they are acoustically dissimilar to natural stimuli.
This study combines neuroimaging, deep neural networks, and psychophysical testing to explore sound categorization processes.
Findings imply that human auditory processing relies on complex internal representations when interpreting sounds.

Abstract

The human auditory system extracts meaning from sounds in the environment by transforming acoustic input signals into semantic categories, such as speech and music. Although distinct acoustic features give rise to these categorical percepts and to preferential responses in spatially segregated regions in the auditory cortex, the nature of the internal representations underlying this transformation remains poorly understood. Here, we combined neuroimaging, a deep neural network (DNN), brain-based sound synthesis, and psychophysical testing in human participants of either sex to investigate the internal sound features encoded in speech- and music-selective regions of the auditory cortex and their functional role in sound categorization. We found that sounds synthesized from cortical activity patterns, though acoustically dissimilar to natural speech and music sounds, nonetheless elicited similar categorical cortical and behavioral responses. These results suggest that the auditory cortex relies on internal, abstracted representations of category structure that are not reducible to the natural acoustic properties of speech and music. Our findings provide new insights into intermediate sound features, as captured by DNNs, that may support categorization in the human auditory system.

Significance Statement

Speech and music are two uniquely human sound categories. While their distinctive acoustic features and semantic attributes have been studied extensively, how the human auditory system differentially transforms the acoustics into meaning remains largely unknown. In this study, we used a deep neural network (DNN) to synthesize sounds from activity patterns in the auditory cortex. We found that these synthetic sounds, though acoustically dissimilar to natural speech and music, elicit similar categorical cortical and behavioral responses. Our findings indicate that categorization in the human auditory system relies on internal representations of category, as captured by DNNs, that are irreducible to the natural acoustic features of speech and music.

Cite this study

Xing et al. studied this question.

www.synapsesocial.com/papers/69a75ccdc6e9836116a25faa — DOI: https://doi.org/10.1523/jneurosci.1651-25.2025

Authors

Lidongsheng Xing

Elia Formisano

Lars Riecke

Journals

Journal of Neuroscience

Institutions

Maastricht University

Maastricht University Medical Centre
