What question did this study set out to answer?

The study addresses size-based shortcut learning in the classification of Ziziphus seeds, with a focus on improving generalization across datasets.

June 4, 2026 · Open Access

CNN-Based Classification of Ziziphus Seeds with Focal Loss for Overcoming Size-Based Shortcut Learning

Key Points

The study targets size-based shortcut learning in the classification of Ziziphus seeds, with the goal of better generalization across datasets.
Models were trained on an internal dataset and tested on an external dataset, comparing focal loss against cross-entropy models trained on background-removed and size-normalized data.
Performance was measured as classification accuracy on the internal and external test sets for each preprocessing configuration.
Grad-CAM++ and loss analyses were used to examine which features the models attended to.
Without any preprocessing, the focal loss model reached 90.88 ± 2.71% accuracy on the external dataset.
Focal loss narrowed the internal–external generalization gap from 16.18 to 8.11 percentage points.
Size-normalized models reached 84.17 ± 10.15% (upsizing) and 88.94 ± 6.76% (downsizing) on the external dataset, against 82.08 ± 10.97% for the cross-entropy model trained on background-removed data, which indicates that suppressing the size shortcut helps.
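
The gap figures above are differences between internal and external accuracy, in percentage points. The sketch below only works through that arithmetic. The function names are hypothetical, and the internal accuracies are back-calculated rather than reported; it also assumes that the 16.18-point gap belongs to the cross-entropy model with 82.08% external accuracy mentioned in the abstract.

```python
def generalization_gap(internal_acc: float, external_acc: float) -> float:
    """Internal minus external accuracy, in percentage points."""
    return internal_acc - external_acc


def implied_internal_acc(external_acc: float, gap: float) -> float:
    """Back-calculate the internal accuracy implied by a reported gap."""
    return external_acc + gap


# Reported: focal loss reaches 90.88% externally with an 8.11-point gap.
focal_internal = implied_internal_acc(90.88, 8.11)

# Assumption: the 16.18-point gap pairs with the 82.08% cross-entropy result.
ce_internal = implied_internal_acc(82.08, 16.18)

# Fraction of the original gap that focal loss removes.
relative_reduction = (16.18 - 8.11) / 16.18

print(f"implied internal accuracy (focal loss): {focal_internal:.2f}%")
print(f"implied internal accuracy (cross-entropy): {ce_internal:.2f}%")
print(f"relative gap reduction: {relative_reduction:.1%}")
```

Under these assumptions both implied internal accuracies sit above 98%, which agrees with the abstract's statement about the internal test set, and the gap is roughly halved.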

Abstract

Herbal medicines represent a significant global market, yet food safety remains threatened by counterfeit products morphologically resembling authentic samples. Models trained on limited datasets are prone to shortcut learning, relying on superficial features rather than intrinsic morphological characteristics. This study identified size-based shortcut learning as a critical factor degrading the classification of Ziziphus jujuba Mill. var. spinosa and its counterfeit Ziziphus mauritiana Lam., and demonstrated that focal loss alone can effectively mitigate this issue. Models trained on the internal dataset were evaluated on an external dataset acquired with the Herb-X. On the internal test set, all configurations achieved high classification accuracies (≥98%), thereby obscuring meaningful differences in external generalization. However, consistent performance degradation was observed on the external dataset. The cross-entropy model trained on background-removed data dropped to 82.08 ± 10.97%, while size-normalized models recovered to 84.17 ± 10.15% (upsizing) and 88.94 ± 6.76% (downsizing), confirming that suppressing size shortcuts improves external generalization. The focal loss model, without any preprocessing, achieved 90.88 ± 2.71%, reducing the internal–external generalization gap from 16.18 to 8.11 percentage points. Grad-CAM++ and loss analyses confirmed that the focal loss model attended to intrinsic morphological features rather than object size. This study provides a practical, preprocessing-free approach for reliable herbal-medicine authentication in field conditions.
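
For readers unfamiliar with focal loss, the following is a minimal sketch of its general form for a single sample, not the authors' implementation. The abstract does not give the hyperparameters used, so `gamma = 2.0` and `alpha = 1.0` are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def cross_entropy(p_true: float) -> float:
    """Cross-entropy for one sample, given the probability of the true class."""
    return -math.log(p_true)


def focal_loss(p_true: float, gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss: cross-entropy scaled by alpha * (1 - p_true) ** gamma.

    gamma and alpha are illustrative assumptions, not values from the study.
    """
    return alpha * (1.0 - p_true) ** gamma * cross_entropy(p_true)


# An "easy" sample (confidently correct) and a "hard" one (uncertain).
for p in (0.9, 0.3):
    ce = cross_entropy(p)
    fl = focal_loss(p)
    print(f"p_true={p}: CE={ce:.4f}, focal={fl:.4f}, weight={fl / ce:.2f}")
```

With `gamma = 2`, the easy sample keeps about 1% of its cross-entropy loss while the hard sample keeps about 49%, so training is dominated by samples the model has not yet learned. Setting `gamma = 0` recovers plain cross-entropy.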

Authors

Y M Park

Kyung Hee University

Dae-Hyun Jung

Kyung Hee University

Journals

Biosensors

Institutions

Kyung Hee University
