February 22, 2024

Zero-shot Object Navigation with Vision-Language Foundation Models Reasoning

Abstract

This research introduces a novel method for zero-shot object navigation, enabling agents to navigate unexplored environments. Traditional methods often fail in new settings because they depend on large navigation datasets for training. Our approach instead uses Large Vision Language Models (LVLMs) to help agents understand and move through unfamiliar visual environments without prior experience. The process first uses a pretrained LVLM for object detection to build a semantic map, then uses the LVLM again to predict the likely location of the target object. Our experiments on the RoboTHOR benchmark show improved performance, with a 1.8% increase in both Success Rate and Success weighted by Path Length (SPL) compared to the existing best method, ESC.
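The two metrics named in the abstract can be illustrated with a short sketch. It assumes the commonly used definition of SPL, in which each successful episode contributes the ratio of the shortest-path length to the larger of the shortest-path length and the path actually taken, averaged over all episodes. The function names and the episode values below are hypothetical and are not taken from the paper or its experiments.

```python
from typing import Sequence


def success_rate(successes: Sequence[bool]) -> float:
    """Fraction of episodes in which the agent reached the target."""
    if not successes:
        raise ValueError("at least one episode is required")
    return sum(successes) / len(successes)


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Assumes every shortest-path length is strictly positive.
    """
    if not (len(successes) == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")
    if not successes:
        raise ValueError("at least one episode is required")

    total = 0.0
    for ok, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        if shortest <= 0:
            raise ValueError("shortest-path lengths must be positive")
        if ok:
            # A failed episode contributes 0; a successful one is
            # penalised in proportion to how much longer its path was.
            total += shortest / max(taken, shortest)
    return total / len(successes)


# Hypothetical episodes: (success, shortest path in m, path taken in m)
episodes = [(True, 2.0, 4.0), (False, 3.0, 5.0), (True, 4.0, 4.0)]
ok, shortest, taken = zip(*episodes)
sr_value = success_rate(ok)
spl_value = spl(ok, shortest, taken)
```

Because SPL multiplies success by a path-efficiency ratio that never exceeds 1, it can never be larger than Success Rate on the same set of episodes.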

Cite this study

Yuan et al. (2024) studied this question.

www.synapsesocial.com/papers/68e781e8b6db6435876f4b8d — DOI: https://doi.org/10.1109/icara60736.2024.10553173

Authors

Shuaihang Yuan

Muhammad Shafique

Mohamed Baghdadi

Institutions

New York University

Centre for Artificial Intelligence and Robotics
