What question did this study set out to answer?

This research aims to develop a CNN–Transformer model designed specifically for acne image classification, addressing task-specific challenges such as indistinct lesion boundaries, subtle inter-class variations, and facial interference factors.

April 22, 2026 · Open Access

AcneFormer: A Lesion-Aware and Noise-Robust CNN–Transformer for Acne Image Classification

Key Points

The proposed AcneFormer architecture combines CNN and Transformer components.
Three modules are introduced: Lesion Cue Enhancement (LCE) to highlight discriminative multi-scale spatial patterns, Cross-Layer Feature Transmission (CLFT) to improve cross-layer information flow, and Differential Semantic Denoising (DSD) to suppress irrelevant responses.
The model was evaluated in extensive experiments against several strong baselines.
AcneFormer outperformed these baselines on acne image classification.
Ablation and external lesion-annotated analyses indicate that LCE mainly improves lesion-sensitive localization and class-balanced recognition.
The same analyses indicate that CLFT expands valid cross-depth lesion evidence and DSD suppresses off-lesion semantic responses.

Abstract

Convolutional neural networks (CNNs) have been widely used for acne image classification because they capture the local texture of skin lesions effectively. However, the locality of convolution operations limits their ability to model long-range dependencies. Vision Transformer (ViT) methods address this issue to some extent, but their high computational complexity and reliance on large-scale pre-training present challenges. Although hybrid CNN–Transformer architectures alleviate this conflict to some extent, acne images present task-specific challenges, including indistinct lesion boundaries, subtle inter-class variations, and various facial interference factors. In this paper, we propose AcneFormer, a lesion-aware and noise-robust CNN–Transformer architecture for acne image classification. We introduce three modules tailored to acne tasks: a Lesion Cue Enhancement (LCE) module to highlight discriminative multi-scale spatial patterns, a Cross-Layer Feature Transmission (CLFT) module to enhance cross-layer information flow in Transformers, and a Differential Semantic Denoising (DSD) module to suppress irrelevant responses during deep feature interaction. Extensive experiments show that AcneFormer outperforms several strong baselines. Ablation and external lesion-annotated analyses further show a consistent pattern: LCE mainly improves lesion-sensitive localization and class-balanced recognition, CLFT expands valid cross-depth lesion evidence, and DSD suppresses off-lesion semantic responses.
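
The abstract's contrast between the locality of convolution and the long-range modeling of Transformers can be illustrated with a toy example. The sketch below is not AcneFormer or any of its modules; it is a minimal 1D illustration under stated assumptions: a fixed 3-tap kernel, single-head self-attention with identity projections, and hypothetical helper names (`conv1d_same`, `self_attention_1d`, `changed_positions`). It perturbs one input element and reports which output positions respond.

```python
import math

def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding; output length equals input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half  # input index covered by this kernel tap
            if 0 <= j < len(x):
                acc += w * x[j]
        out.append(acc)
    return out

def self_attention_1d(x):
    """Toy single-head self-attention over scalars.

    Assumption: identity projections, so query = key = value = x.
    """
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out

def changed_positions(fn, x, index, delta=1.0):
    """Return the output indices that change when x[index] is perturbed by delta."""
    base = fn(x)
    perturbed = list(x)
    perturbed[index] += delta
    new = fn(perturbed)
    return [i for i, (a, b) in enumerate(zip(base, new)) if abs(a - b) > 1e-12]

signal = [0.1, 0.4, 0.2, 0.9, 0.3, 0.7, 0.5, 0.6]

# A 3-tap convolution only propagates the change to the immediate neighborhood.
conv_reach = changed_positions(lambda s: conv1d_same(s, [0.25, 0.5, 0.25]), signal, 0)

# Self-attention mixes every position with every other position in one layer.
attn_reach = changed_positions(self_attention_1d, signal, 0)

print("convolution reach:", conv_reach)
print("attention reach:  ", attn_reach)
```

In this toy setting a single convolution layer spreads a change at position 0 only to its neighbor, while one attention layer spreads it to every position. That global mixing is also why attention cost grows quadratically with the number of tokens, the computational burden the abstract attributes to ViT methods.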

Cite This Study

Zhou et al. studied this question.

synapsesocial.com/papers/69e865926e0dea528ddea155 https://doi.org/10.3390/s26082533
