Abstract

Introduction
Sleep scoring is essential for clinical sleep diagnostics, and recent advances in deep learning have accelerated and standardized this process. Transformer-based models such as SleepTransformer have achieved state-of-the-art performance, and our previous work, FlexSleepTransformer, further improved scoring accuracy and cross-dataset generalizability by fusing information from multiple polysomnography (PSG) channels. However, these methods rely on fixed channel encodings that do not capture the spatial arrangement of PSG electrodes, which limits their ability to model inter-channel relationships. To address this limitation, we introduced a learnable 2D PSG channel encoding that explicitly represents spatial structure and integrated it into FlexSleepTransformer, which improved performance across multiple datasets.

Methods
A total of 543 subjects from seven independently acquired datasets were included. Within each dataset, subject-level 5-fold cross-validation was performed so that no subject's recordings appeared in both the training and test folds, preventing data leakage. The baseline model followed the two-level sequence-to-sequence SleepTransformer architecture, which processes intra-epoch information and inter-epoch temporal context in a manner consistent with human scoring guidelines. Because the datasets differed in their PSG channel configurations, the model needed a way to identify each channel's spatial origin. Conventional fixed channel encodings distinguish individual channels but do not capture their spatial relationships. We therefore introduced a learnable 2D PSG channel encoding that allows the model to learn spatially informed representations of signals from different brain regions. Three models were evaluated on all datasets: (1) no channel encoding, (2) fixed channel encoding, and (3) the proposed learnable 2D channel encoding.
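The three encoding variants compared above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the FlexSleepTransformer implementation: the abstract does not describe the exact form of either encoding, so the sinusoidal channel-index code used for the fixed variant, the factorized row/column lookup tables used for the learnable 2D variant, and the electrode grid coordinates in `ELECTRODE_GRID` are all hypothetical choices. The class and function names are likewise illustrative.

```python
import numpy as np

# HYPOTHETICAL (row, col) grid positions for a few 10-20 EEG electrodes.
# The abstract does not specify a layout; these coordinates are illustrative only.
ELECTRODE_GRID = {
    "F3": (1, 1), "F4": (1, 3),
    "C3": (2, 1), "C4": (2, 3),
    "O1": (4, 1), "O2": (4, 3),
}


class Learnable2DChannelEncoding:
    """Assumed factorized 2D encoding: enc(ch) = row_table[r] + col_table[c].

    row_table and col_table are the learnable parameters; in a real model they
    would be updated by backpropagation together with the Transformer weights.
    """

    def __init__(self, n_rows, n_cols, d_model, seed=0):
        rng = np.random.default_rng(seed)
        self.row_table = rng.normal(0.0, 0.02, size=(n_rows, d_model))
        self.col_table = rng.normal(0.0, 0.02, size=(n_cols, d_model))

    def __call__(self, channel_names):
        # Look up each channel's grid position, independent of its list index.
        rows = np.array([ELECTRODE_GRID[ch][0] for ch in channel_names])
        cols = np.array([ELECTRODE_GRID[ch][1] for ch in channel_names])
        return self.row_table[rows] + self.col_table[cols]  # (n_channels, d_model)


def fixed_channel_encoding(n_channels, d_model):
    """Assumed fixed variant: sinusoidal code of the channel *index* (no spatial info)."""
    pos = np.arange(n_channels)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))


def add_channel_encoding(tokens, enc):
    """Broadcast-add a per-channel encoding to every token of that channel.

    tokens: (n_channels, n_tokens, d_model); enc: (n_channels, d_model).
    """
    return tokens + enc[:, None, :]


if __name__ == "__main__":
    d_model = 8
    channels = ["F3", "C3", "O1"]
    tokens = np.zeros((len(channels), 30, d_model))  # 30 tokens per channel (arbitrary)

    enc2d = Learnable2DChannelEncoding(n_rows=5, n_cols=5, d_model=d_model)
    x_none = tokens  # variant (1): no channel encoding
    x_fixed = add_channel_encoding(tokens, fixed_channel_encoding(len(channels), d_model))
    x_2d = add_channel_encoding(tokens, enc2d(channels))
    print(x_none.shape, x_fixed.shape, x_2d.shape)
```

With this design, `C3` receives the same encoding whether a dataset records `["F3", "C3", "O1"]` or `["C3", "C4"]`, whereas the index-based fixed code gives `C3` a different vector in each montage because its position in the channel list changes. Electrodes that share a grid row or column also share a parameter vector, which is one simple way to tie the encoding to scalp geometry.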
Results
Across all seven datasets, the proposed learnable 2D channel encoding achieved the highest average accuracy (82.84% ± 2.08%), outperforming both the no-encoding model (81.6% ± 1.79%) and the fixed-encoding model (82.35% ± 1.53%). Statistical comparisons further showed that the learnable encoding significantly outperformed the no-encoding model on six of the seven datasets and outperformed the fixed-encoding model on four, indicating a consistent advantage across diverse data sources.

Conclusion
We introduced a learnable 2D PSG channel encoding for Transformer-based sleep staging and incorporated it into FlexSleepTransformer. Across seven datasets, it yielded consistent and significant performance improvements, suggesting that the approach is promising for deployment in clinical sleep-scoring workflows.

Support
None
Guo et al. (Fri,) studied this question.