In recent years, damage to humans and crops caused by bears and other vermin has become increasingly serious across Japan. Smart agricultural monitoring systems have shown some promise, but they remain limited by species-specific design, high cost, and poor adaptability. This study developed and tested a zero-shot vermin detection system built on a multimodal large language model. A total of 1,073 images were collected with cameras installed at three locations in Nanae-cho, Hokkaido, Japan, between May and September 2025. Of these, 22 images showed target animals: 12 bears, 9 deer, and 1 crow. In a comparative evaluation of GPT-4o, LLaVA, YOLO-World, and Grounding DINO, GPT-4o detected every animal image in our preliminary deployment (recall = 1.00), although it also produced 17 false detections in images without animals.
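The counts reported above are enough to work out a few image-level metrics for GPT-4o. The sketch below is illustrative arithmetic only, not the authors' evaluation code. It assumes one detection decision per image and that all 17 false detections fall among the images without target animals. The variable names are hypothetical, and the precision and false positive rate it derives are not figures stated in the abstract.

```python
# Counts taken from the abstract.
total_images = 1073      # all images collected
positive_images = 22     # images showing a target animal (12 bears, 9 deer, 1 crow)
false_positives = 17     # false detections on images without animals
recall = 1.00            # reported recall for GPT-4o

# Assumption: one decision per image, so recall = TP / positive_images.
true_positives = round(recall * positive_images)
false_negatives = positive_images - true_positives

# Images without a target animal, and how many of them were correctly rejected.
negative_images = total_images - positive_images
true_negatives = negative_images - false_positives

# Derived metrics (not reported in the abstract).
precision = true_positives / (true_positives + false_positives)
false_positive_rate = false_positives / negative_images

print(f"TP={true_positives} FN={false_negatives} "
      f"FP={false_positives} TN={true_negatives}")
print(f"precision={precision:.3f}")
print(f"false positive rate={false_positive_rate:.3f}")
```

Under these assumptions, roughly 56% of the images flagged by GPT-4o would contain a target animal, while under 2% of the animal-free images would trigger a false detection. This shows how few positives the dataset contains relative to negatives.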
Koki Sato
Katsuma Akamatsu
Hayato Miura
Journal of Robotics and Mechatronics
Future University Hakodate
www.synapsesocial.com/papers/69e713b4cb99343efc98d1ca — DOI: https://doi.org/10.20965/jrm.2026.p0460