May 2, 2026 | Open Access

Knowledge Graph-Driven Reinforcement Learning for Zero-Shot Vision-Language Navigation

Key Points

This research aims to enhance zero-shot performance in Vision-Language Navigation by using a knowledge graph-driven reinforcement learning approach.
Proposed a novel approach built on a knowledge graph that is updated dynamically during real-time interaction with the environment.
Implemented a Chain-of-Thought prompting mechanism for multi-hop reasoning.
Designed an end-to-end optimized reinforcement learning framework incorporating multi-modal features and a composite reward function.
Significantly increased navigation success rates in zero-shot settings, indicating stronger generalization.
Demonstrated robust performance, particularly in identifying unseen object categories and navigating complex scene layouts.
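
The key points mention a knowledge graph that is updated during interaction and multi-hop reasoning over it, but this page does not describe either mechanism in detail. The following is a minimal Python sketch under stated assumptions: nodes are object categories, edge weights in (0, 1] score how strongly two categories co-occur, each new co-observation nudges a weight toward 1 with a moving average, and a plain max-product graph search stands in for the Chain-of-Thought prompting the paper actually uses. The class `SceneKnowledgeGraph`, its methods, the update rule, and the example categories are all hypothetical.

```python
import heapq
from collections import defaultdict


class SceneKnowledgeGraph:
    """Hypothetical object-relation graph (illustration only, not the paper's design).

    Nodes are object categories; an undirected edge weight in (0, 1]
    estimates how strongly two categories tend to appear together.
    """

    def __init__(self, priors=None):
        self.weights = defaultdict(dict)
        # Assumed external semantic priors: {(a, b): weight}.
        for (a, b), w in (priors or {}).items():
            self.set_edge(a, b, w)

    def set_edge(self, a, b, w):
        # Store the weight symmetrically so lookups work in both directions.
        self.weights[a][b] = w
        self.weights[b][a] = w

    def observe(self, visible, rate=0.2):
        # Assumed online update: every pair of co-visible categories moves
        # toward weight 1.0 by an exponential moving average step.
        objs = sorted(set(visible))
        for i, a in enumerate(objs):
            for b in objs[i + 1:]:
                old = self.weights[a].get(b, 0.0)
                self.set_edge(a, b, old + rate * (1.0 - old))

    def best_path(self, start, target, max_hops=3):
        # Multi-hop search for the path with the highest product of edge
        # weights using at most max_hops edges. Because every weight is <= 1,
        # extending a path never raises its score, so the first time the
        # target is popped from the max-heap its score is the best one.
        heap = [(-1.0, start, [start])]
        while heap:
            neg_score, node, path = heapq.heappop(heap)
            if node == target:
                return -neg_score, path
            if len(path) - 1 >= max_hops:
                continue  # hop budget used up on this branch
            for nxt, w in self.weights[node].items():
                if nxt not in path:  # keep paths simple (no cycles)
                    heapq.heappush(heap, (neg_score * w, nxt, path + [nxt]))
        return 0.0, []  # target unreachable within max_hops
```

With priors such as mug/coffee_machine 0.8, coffee_machine/counter 0.9, mug/sink 0.5 and sink/counter 0.6, `best_path("counter", "mug")` returns the route `["counter", "coffee_machine", "mug"]` with a score of about 0.72, which an agent could read as "look near the coffee machine first". The exhaustive best-first search is only reasonable for small graphs like this one.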

Abstract

To address the limitations of zero-shot generalization in Vision-Language Navigation (VLN), this paper proposes a novel knowledge graph-driven reinforcement learning approach. Our method constructs a hierarchical, dynamically updated knowledge graph online during the agent’s real-time interaction with the environment, seamlessly aligning external semantic priors with continuous visual perception. By leveraging a Chain-of-Thought (CoT) prompting mechanism, the agent performs multi-hop reasoning to precisely locate target objects. Furthermore, we design an end-to-end optimized reinforcement learning framework that fuses multi-modal features and employs a task-oriented composite reward function. Extensive experiments in the AI2-THOR simulation environment demonstrate that the proposed method significantly improves navigation success rates in zero-shot settings. The results validate its robust generalization capabilities, particularly for unseen object categories and complex scene layouts.
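
The abstract names a task-oriented composite reward function but does not list its terms on this page. The sketch below is a minimal Python example assuming a shaping scheme often used in object-navigation work: a success bonus, a small per-step penalty, a term for progress toward the target, and a collision penalty. The term set, the names `RewardWeights` and `composite_reward`, and every coefficient are hypothetical, not values from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical coefficients for illustration only; not the paper's values.
    success: float = 5.0          # bonus when the episode ends at the target
    step_penalty: float = -0.01   # small cost on every step to discourage wandering
    progress: float = 1.0         # scale for the reduction in distance to the target
    collision: float = -0.1       # penalty when a move is blocked


def composite_reward(prev_dist: float, curr_dist: float, success: bool,
                     collided: bool, weights: Optional[RewardWeights] = None) -> float:
    """Sum of an assumed set of reward terms for a single step."""
    w = weights if weights is not None else RewardWeights()
    reward = w.step_penalty
    # Positive when the agent moved closer to the target, negative when it moved away.
    reward += w.progress * (prev_dist - curr_dist)
    if collided:
        reward += w.collision
    if success:
        reward += w.success
    return reward
```

Making the dataclass frozen means one default `RewardWeights` instance can be shared safely across calls; a different weighting is passed explicitly, for example `composite_reward(2.0, 1.5, False, False, RewardWeights(success=10.0))`.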

Authors

Ye Zhang

Yandong Zhao

He Liu

Journals

Mathematics

Institutions

Taiyuan University of Science and Technology
