Voice cloning and speech synthesis tools have become widely accessible in recent years, raising genuine concerns about their misuse in fraud, misinformation, and identity theft. Detecting such fabricated audio is no longer an academic curiosity but a pressing societal need. This work introduces a lightweight yet effective detection framework that listens for subtle inconsistencies in synthetic speech by combining two complementary audio representations, Mel-Frequency Cepstral Coefficients (MFCC) and the Mel spectrogram, and feeding their fused form into a hybrid deep learning pipeline. The pipeline first applies a one-dimensional Convolutional Neural Network (CNN) to spot local spectral irregularities, then passes the output through a Bidirectional Long Short-Term Memory (BiLSTM) layer to track how those irregularities evolve over time in both directions, and finally uses a self-attention layer to highlight the most telling moments in the recording. When tested on the ASVspoof 2019 Logical Access benchmark, which pits real speech against nineteen different synthetic systems, the proposed model records an accuracy of 96.8%, a precision of 96.2%, a recall of 97.1%, and an F1-score of 96.6%. The whole system is wrapped in a Streamlit web interface that returns a verdict in under two seconds on ordinary laptop hardware, showing that strong protection against audio deepfakes does not demand expensive infrastructure.
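The four reported figures are related through the usual confusion-matrix definitions. The snippet below is a minimal sketch of those definitions, not the authors' evaluation code. The function names and the example counts are hypothetical, and treating spoofed speech as the positive class is an assumption, since the abstract does not say which class is positive. The final line checks that the reported F1-score agrees with the harmonic mean of the reported precision and recall.

```python
def harmonic_mean(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts.

    Assumption: the positive class is spoofed speech. The abstract does not
    state this, so swap the roles if bona fide speech is the positive class.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": harmonic_mean(precision, recall),
    }


# Hypothetical counts, for illustration only (not taken from the paper).
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
print(metrics)

# Consistency check using the precision and recall reported in the abstract.
reported_f1 = harmonic_mean(96.2, 97.1)
print(round(reported_f1, 1))  # 96.6, matching the reported F1-score
```

Because F1 depends only on precision and recall, this check needs no access to the underlying predictions. Accuracy cannot be cross-checked the same way without the class balance of the evaluation set.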
Subathra R
Thiru Selvam T
Prem Kumar R
Government College of Science
www.synapsesocial.com/papers/69df2abce4eeef8a2a6afba8 — DOI: https://doi.org/10.56975/ijnrd.v11i4.323054