Introduction Accurate recognition of plant pests and diseases under field conditions remains challenging because of complex symptom morphology, environmental variability, and limited annotated data. Although deep learning has been widely adopted for image-based diagnosis, existing studies are often model-centric and evaluated under heterogeneous experimental settings, which makes it difficult to derive paradigm-level insights into data efficiency, robustness, and suitability for practical deployment.
Methods In this study, we systematically compare three vision modeling paradigms for classifying images of pests and diseases across multiple crops: Convolutional Neural Networks (CNNs), Vision Transformers, and hybrid State Space Model-based architectures (MambaVision). Using a unified and reproducible experimental framework, we benchmark representative models across multiple training regimes, diverse crop species, and symptom types that reflect realistic agricultural scenarios.
Results The results reveal clear paradigm-level differences. CNN-based models perform competitively on diseases dominated by localized lesion textures but show limited robustness for symptoms that require global spatial interpretation. Transformer-based models benefit from global dependency modeling yet become less stable under small-sample conditions. In contrast, hybrid MambaVision-based models consistently demonstrate superior data efficiency and robustness: they retain approximately 60–80% accuracy under extreme data scarcity (1% of training samples) and achieve stable, high F1-scores across symptom types that require joint modeling of fine-grained textures and long-range spatial distribution. Furthermore, performance–efficiency analysis shows that hybrid MambaVision-based models achieve a more favorable trade-off between accuracy and computational cost than CNN-based and Transformer-based models, supporting deployment under practical resource constraints.
Discussion Overall, this study provides pathology-oriented and deployment-aware insights into how architectural inductive bias interacts with symptom morphology and data availability, highlighting hybrid MambaVision-based models as a robust and effective solution for real-world plant pest and disease recognition.
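To make the "1% training samples" regime and the F1-based evaluation concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' protocol: the abstract does not say how the subset was drawn or how F1 was averaged, so per-class (stratified) sampling and macro averaging are assumed here, and the names `stratified_subsample` and `macro_f1` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices keeping roughly `fraction` of each class.

    Assumption: at least one sample per class is kept, so that no class
    disappears in the extreme low-data regime.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    kept = []
    for indices in by_class.values():
        k = max(1, round(len(indices) * fraction))
        kept.extend(rng.sample(indices, k))
    return sorted(kept)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro averaging, assumed)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical label list standing in for a real training set.
labels = ["rust"] * 300 + ["blight"] * 200 + ["healthy"] * 50
subset = stratified_subsample(labels, fraction=0.01)
```

Macro averaging weights every symptom class equally, which is one common choice when class sizes are imbalanced; a different averaging scheme would change the reported numbers.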
Hu et al. (Fri,) studied this question.
www.synapsesocial.com/papers/69e7132bcb99343efc98ce20 — DOI: https://doi.org/10.3389/fpls.2026.1807927
Liya Hu
Bowen Shi
Shiqi Hu
Frontiers in Plant Science
SHILAP Revista de lepidopterología
Shandong University of Science and Technology
Shandong Academy of Agricultural Sciences