Does a hybrid CNN-ViT deep learning framework improve epileptic seizure detection from EEG recordings compared to existing machine learning models?
Electroencephalogram (EEG) recordings from the Bangalore EEG Epilepsy Dataset (BEED)
Hybrid deep learning framework integrating convolutional neural networks (CNNs) with vision transformer (ViT) architectures
State-of-the-art CNN, long short-term memory (LSTM), attention-based, ensemble learning, contrastive learning models, and existing pre-trained CNN and transformer-based models
Epileptic seizure detection performance (accuracy, precision, recall, F1-score, and AUROC); surrogate outcome
A novel hybrid CNN-ViT deep learning framework demonstrates high accuracy (98.3%) and robust generalization for automated epileptic seizure detection from EEG recordings.
Epileptic seizure detection from electroencephalogram (EEG) recordings remains a critical challenge in clinical neurology due to the highly non-linear, multichannel, and subject-dependent nature of EEG signals. Traditional machine learning methods and recent deep learning architectures have shown promise; however, their ability to generalize across diverse subjects and to capture both localized and global seizure dynamics remains limited. This study introduces a novel hybrid deep learning framework that integrates convolutional neural networks (CNNs) with vision transformer (ViT) architectures to achieve robust, end-to-end epileptic seizure detection using the Bangalore EEG Epilepsy Dataset (BEED). The CNN component extracts fine-grained temporal features, while the ViT encoder models long-range temporal and inter-channel dependencies through multi-head self-attention. Comprehensive preprocessing, physiologically meaningful data augmentation, and rigorous subject-independent evaluation were employed to ensure clinical reliability. Experimental results demonstrate that the proposed CNN-ViT model significantly outperforms state-of-the-art CNN, long short-term memory (LSTM), attention-based, ensemble learning, and contrastive learning models when evaluated on BEED. The model achieved an accuracy of 98.3%, precision and recall of 98.2% each, an F1-score of 98.2%, and an area under the receiver operating characteristic curve (AUROC) of 0.991, indicating superior discriminative capability and generalization across subjects. Comparative analysis further confirms that existing pre-trained CNN and transformer-based models exhibit notably lower performance on BEED due to their limited capacity to capture global spatial-temporal EEG patterns.
These findings highlight the effectiveness of combining localized convolutional feature extraction with global transformer-based attention, offering a robust and scalable solution for automated seizure detection and paving the way for real-time clinical deployment and intelligent EEG monitoring systems.
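The abstract mentions subject-independent evaluation and reports accuracy, precision, recall, F1-score, and AUROC, but it does not describe how either was implemented. The following is a minimal Python sketch of what such an evaluation could look like, not the authors' code. It assumes binary labels (1 = seizure, 0 = non-seizure) and a per-sample subject identifier. The function names `subject_independent_split`, `binary_metrics`, and `auroc` are hypothetical and chosen only for illustration.

```python
import random


def subject_independent_split(subject_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no subject appears in both sets.

    Assumption: subject_ids[i] is the subject that sample i belongs to.
    """
    subjects = sorted(set(subject_ids))
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Hold out whole subjects, at least one.
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train_idx = [i for i, s in enumerate(subject_ids) if s not in test_subjects]
    test_idx = [i for i, s in enumerate(subject_ids) if s in test_subjects]
    return train_idx, test_idx


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = seizure)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auroc(y_true, scores):
    """AUROC as the probability that a random positive sample scores
    higher than a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data only; these are not values from BEED.
    subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3", "s4", "s4", "s5", "s5"]
    train_idx, test_idx = subject_independent_split(subject_ids)
    print(len(train_idx), len(test_idx))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
    print(auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Holding out whole subjects rather than individual EEG segments keeps segments from the same person out of both training and test sets, which is the usual way to estimate generalization to unseen subjects. The pairwise AUROC here is quadratic in the number of samples, so a rank-based implementation would be preferable on a full dataset.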
Heba et al. (Thu,) studied this question.
www.synapsesocial.com/papers/69cd7b275652765b073a8eb9 — DOI: https://doi.org/10.57197/jdr-2026-0812
Khaled Mahmoud Heba
Abbas Hassan Abbas Atya