Signal peptides play essential roles in protein secretion and localization, and identifying them accurately is critical for understanding protein synthesis, transport, and functional regulation. However, severe class imbalance in signal peptide data sets causes recognition performance for minor classes to lag substantially behind that for major classes. Here, we propose a structure-aware multimodal signal peptide prediction network (SaSPNet), which adds structural information as a second modality to conventional sequence modeling and uses a graph convolutional network (GCN)-based structure encoder to learn structural representations of signal peptides for both signal peptide type prediction and cleavage-site prediction. On the USPNet data set, SaSPNet significantly improves prediction performance for minor signal peptide classes, achieving gains of more than 10% over existing methods on key minor-class metrics. Feature visualization and explainability analyses show that the structure encoder learns more discriminative structural patterns for minor signal peptides, revealing the mechanism by which the structural modality enhances model performance. In addition, comparative analyses using three-dimensional structures generated by different structure prediction models demonstrate that SaSPNet is robust to variations in structural data quality. We further construct SP-MinorEval, an independent test set specifically for minor signal peptides; evaluations on this set show that SaSPNet maintains strong performance across domains. SaSPNet thus provides an effective tool for minor-class signal peptide prediction, studies of protein secretion mechanisms, and functional protein discovery.
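The abstract does not say how the structure encoder turns a predicted 3D structure into a graph. The following is a minimal sketch under stated assumptions and is not SaSPNet's actual encoder. It assumes that each residue is a node, that two residues are connected when their (assumed) C-alpha atoms lie within 8 Å, and that one standard Kipf–Welling GCN layer, ReLU(D^-1/2 (A + I) D^-1/2 H W), is followed by mean pooling into a single structural embedding. The function names `contact_graph`, `gcn_layer`, and `mean_pool` and the 8 Å cutoff are hypothetical choices made for illustration.

```python
import math


def contact_graph(coords, cutoff=8.0):
    """Hypothetical graph construction: residues i != j are connected when
    their assumed C-alpha coordinates lie within `cutoff` angstroms."""
    n = len(coords)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_layer(adj, feats, weight):
    """One Kipf-Welling GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj:    N x N adjacency matrix without self-loops
    feats:  N x F node (residue) features H
    weight: F x F' projection matrix W
    """
    n = len(adj)
    # Add self-loops so each residue keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    n_in = len(feats[0])
    # Aggregate normalized neighbor features: N x F.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n)) for f in range(n_in)]
           for i in range(n)]
    # Linear projection followed by ReLU: N x F'.
    n_out = len(weight[0])
    return [[max(0.0, sum(agg[i][f] * weight[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]


def mean_pool(node_feats):
    """Average residue embeddings into one structure-level vector."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]


if __name__ == "__main__":
    # Toy 4-residue chain with consecutive residues about 3.8 angstroms apart.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    adj = contact_graph(coords)
    h = gcn_layer(adj, [[1.0]] * 4, [[1.0]])
    print(mean_pool(h))
```

In a full model, the pooled structural vector (or the per-residue outputs, which cleavage-site prediction would presumably need) would be combined with sequence-model features. How SaSPNet performs that fusion is not described here.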
Yang et al. studied this question.