Food image localization and recognition on edge devices is a core task in food computing, enabling convenient dietary monitoring and efficient health management. However, food localization and recognition presents significant challenges due to inherent intra-class variability, inter-class similarity, and non-rigid characteristics. To address these challenges, we propose YOLO-Multi Feature Fusion, a novel multi-feature fusion model for food image localization and recognition. Building upon the YOLOv5 framework, YOLO-Multi Feature Fusion integrates several key components: the Ghost Bottleneck from the lightweight GhostNet, a newly designed Multi-Scale Feature Bottleneck, a Bidirectional Vision Transformer, and an Information Cross-Exchange module. These modules enable the model to comprehensively capture and fuse complex feature information from food images while simultaneously reducing both model parameters and computational load. Extensive evaluations on benchmark datasets (UEC Food100, UEC Food256, and ZSFooD) demonstrate that YOLO-Multi Feature Fusion outperforms existing lightweight detectors. Compared to YOLOv5, YOLO-Multi Feature Fusion achieves mAP improvements of 3.0%, 3.0%, and 0.3% on these datasets, respectively, with parameter reductions of 5.7M, 4.7M, and 4.9M, and computational load reductions of 44.6 GFLOPs, 42.0 GFLOPs, and 42.0 GFLOPs. The source code will be released upon the formal publication of the paper.
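As a rough illustration of why a Ghost-style block can lower the parameter count, the sketch below compares the weight count of a standard convolution with that of a Ghost module as described for GhostNet: a primary convolution produces a fraction of the output channels, and cheap depthwise filters generate the rest. The function names, the channel sizes, the ratio of 2, and the 3x3 cheap kernel are illustrative assumptions, not the configuration used in YOLO-Multi Feature Fusion.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in: int, c_out: int, k: int, ratio: int = 2, cheap_k: int = 3) -> int:
    """Weights in a Ghost-style module (bias ignored).

    A primary k x k convolution yields c_out / ratio intrinsic feature maps;
    depthwise cheap_k x cheap_k filters then derive (ratio - 1) extra maps
    from each intrinsic map.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


# Hypothetical layer sizes, chosen only for illustration.
standard = conv_params(128, 256, 3)
ghost = ghost_params(128, 256, 3)
print(standard, ghost, round(standard / ghost, 2))
```

With a ratio of 2 the Ghost-style layer needs roughly half the weights of the standard layer, which is the general mechanism behind the kind of parameter savings the abstract reports; the actual savings depend on the full architecture.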
Xiangyi Zhu
Yancun Yang
Pindan Cao
Zhu et al. studied this question.
www.synapsesocial.com/papers/69e1cf7b5cdc762e9d858597 — DOI: https://doi.org/10.1109/jbhi.2026.3683800