With the advancement of deep learning, polyp segmentation in endoscopic images has made remarkable progress. However, clinical polyps often exhibit variable morphology, blurred boundaries, and low contrast with the intestinal mucosa, which hinders accurate lesion localization and edge delineation. Moreover, low light, luminal distortion, and mucosal folds make identification even harder, leading to frequent false and missed detections in computer-aided diagnosis. To address these challenges, we propose Waveformer, a segmentation network that jointly models local and global features. Concretely, the encoder employs parallel CNN and Transformer branches to extract fine-grained detail and global context, improving the completeness and discriminative power of the representation. The decoder integrates a wavelet-based frequency decomposition unit (WFDU), a camouflage identification module (CIM), and an information fusion layer (IFL). Together, these modules enhance edge responses and semantic aggregation across scales, significantly strengthening the framework's boundary modeling and lesion discernment. On the CVC-ClinicDB and Kvasir-SEG datasets, Waveformer achieves Dice Similarity Coefficients (DSC) of 95.60% and 94.11%, respectively, outperforming fourteen state-of-the-art (SOTA) methods. Cross-dataset evaluations further support its generalization ability, with DSC scores of 81.0% and 79.2%, respectively.
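For readers unfamiliar with the reported metric, the DSC between a predicted mask P and a ground-truth mask G is 2|P ∩ G| / (|P| + |G|). The following is a minimal sketch of that standard definition for binary masks, not the evaluation code used in this work. The function name `dice_coefficient`, the toy masks, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
import numpy as np


def dice_coefficient(pred_mask, true_mask):
    """Dice Similarity Coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred_mask).astype(bool)
    true = np.asarray(true_mask).astype(bool)
    if pred.shape != true.shape:
        raise ValueError("pred_mask and true_mask must have the same shape")

    # |P| + |G|: total number of foreground pixels in both masks.
    denom = int(pred.sum()) + int(true.sum())
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0

    # |P ∩ G|: pixels marked as polyp in both masks.
    intersection = int(np.logical_and(pred, true).sum())
    return 2.0 * intersection / denom


# Toy example (hypothetical masks): 3 predicted pixels, 3 true pixels, 2 shared.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
true = np.array([[1, 0, 0],
                 [0, 1, 1]])
print(f"DSC = {dice_coefficient(pred, true):.4f}")
```

In practice the score would typically be computed per image on thresholded network outputs and then averaged over the test set; the exact averaging protocol behind the figures above is not specified here.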
LIANG et al. (Thu,) studied this question.