What does this research mean for the field?

The proposed YOLO-Multi Feature Fusion model improves food image localization and recognition accuracy while substantially reducing parameters and computational load compared with YOLOv5, and it outperforms existing lightweight detectors. Novelty: methodological. Consensus alignment: neutral.

April 17, 2026

Lightweight Food Localization and Recognition via Multi-Branch Feature Learning and Enhanced Aggregation.

Key Points

The aim is to improve food image localization and recognition on edge devices for dietary monitoring.
Developed the YOLO-Multi Feature Fusion model based on the YOLOv5 framework.
Integrated Ghost Bottleneck, Multi-Scale Feature Bottleneck, Bidirectional Vision Transformer, and Information Cross-Exchange module.
Evaluated performance using benchmark datasets: UEC Food100, UEC Food256, and ZSFooD.
Achieved mAP improvements over YOLOv5 of 3.0%, 3.0%, and 0.3% on UEC Food100, UEC Food256, and ZSFooD, respectively.
Reduced model parameters by 5.7M, 4.7M, and 4.9M, respectively.
Lowered computational load by 44.6, 42.0, and 42.0 GFLOPs, respectively.

Abstract

Food image localization and recognition on edge devices is a core task in food computing, enabling convenient dietary monitoring and efficient health management. However, food localization and recognition presents significant challenges due to inherent intra-class variability, inter-class similarity, and non-rigid characteristics. To address these challenges, we propose YOLO-Multi Feature Fusion, a novel multi-feature fusion model for food image localization and recognition. Building upon the YOLOv5 framework, YOLO-Multi Feature Fusion integrates several key components: the Ghost Bottleneck from the lightweight GhostNet, a newly designed Multi-Scale Feature Bottleneck, a Bidirectional Vision Transformer, and an Information Cross-Exchange module. These modules enable the model to comprehensively capture and fuse complex feature information from food images while simultaneously reducing both model parameters and computational load. Extensive evaluations on benchmark datasets (UEC Food100, UEC Food256, and ZSFooD) demonstrate that YOLO-Multi Feature Fusion outperforms existing lightweight detectors. Compared to YOLOv5, YOLO-Multi Feature Fusion achieves mAP improvements of 3.0%, 3.0%, and 0.3% on these datasets, respectively, with parameter reductions of 5.7M, 4.7M, and 4.9M, and computational load reductions of 44.6 GFLOPs, 42.0 GFLOPs, and 42.0 GFLOPs. The source code will be released upon the formal publication of the paper.
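Part of the parameter and GFLOP savings plausibly comes from the Ghost Bottleneck, which GhostNet builds from Ghost modules. The arithmetic below is a minimal sketch under the standard GhostNet formulation, not the authors' code (which is not yet released): a primary convolution produces c_out / s "intrinsic" feature maps, and cheap depthwise d x d operations generate the remaining (s - 1) "ghost" maps per intrinsic map. The function names, the channel sizes (128 in, 128 out), the 3 x 3 kernels, the ratio s = 2, and the 40 x 40 output size are all hypothetical values chosen for illustration; biases and batch-norm parameters are ignored.

```python
# Sketch: weight count of a standard convolution vs. a GhostNet-style Ghost module.
# Assumptions (illustrative only): no bias, no batch norm, c_out divisible by s.

def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of an ordinary k x k convolution."""
    return c_in * c_out * k * k


def ghost_module_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Weights of a Ghost module: primary conv plus cheap depthwise 'ghost' ops."""
    if c_out % s != 0:
        raise ValueError("this sketch assumes c_out is divisible by s")
    m = c_out // s                 # intrinsic maps produced by the primary conv
    primary = c_in * m * k * k     # ordinary k x k convolution to m channels
    cheap = m * (s - 1) * d * d    # (s - 1) depthwise d x d filters per intrinsic map
    return primary + cheap


def conv_macs(params: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates: each weight is used once per output position."""
    return params * h_out * w_out


if __name__ == "__main__":
    std = standard_conv_params(128, 128, 3)
    ghost = ghost_module_params(128, 128, 3, s=2, d=3)
    print(std, ghost, round(std / ghost, 2))  # 147456 74304 1.98
    print(conv_macs(std, 40, 40), conv_macs(ghost, 40, 40))
```

With s = 2 the Ghost module needs roughly half the weights and multiply-accumulates of the plain convolution; in general the compression ratio approaches s when d is close to k and s is much smaller than c_in.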

Authors

Xiangyi Zhu

Yancun Yang

Pindan Cao