February 19, 2024 (Open Access)

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models

Abstract

A primary hurdle of autonomous driving in urban environments is understanding complex and long-tail scenarios, such as challenging road conditions and delicate human behaviors. We introduce DriveVLM, an autonomous driving system leveraging Vision-Language Models (VLMs) for enhanced scene understanding and planning capabilities. DriveVLM integrates a unique combination of chain-of-thought (CoT) modules for scene description, scene analysis, and hierarchical planning. Furthermore, recognizing the limitations of VLMs in spatial reasoning and heavy computational requirements, we propose DriveVLM-Dual, a hybrid system that synergizes the strengths of DriveVLM with the traditional autonomous driving pipeline. DriveVLM-Dual achieves robust spatial understanding and real-time inference speed. Extensive experiments on both the nuScenes dataset and our SUP-AD dataset demonstrate the effectiveness of DriveVLM and the enhanced performance of DriveVLM-Dual, surpassing existing methods in complex and unpredictable driving conditions.
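The abstract names three chained stages (scene description, scene analysis, hierarchical planning) but gives no implementation detail. As a rough illustration only, the chaining idea can be sketched as below, assuming each stage is a text prompt whose output feeds the next. `run_cot_pipeline`, `PlanningResult`, `fake_vlm`, and the prompt wording are all hypothetical names invented for this sketch, not the authors' interfaces, and the canned replies are made-up placeholder text rather than model output.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PlanningResult:
    description: str   # output of the scene-description stage
    analysis: str      # output of the scene-analysis stage
    plan: List[str]    # output of the planning stage, one step per entry


def run_cot_pipeline(images: List[str], vlm: Callable[[str], str]) -> PlanningResult:
    """Chain three prompts so each stage conditions on the previous output.

    `vlm` is any callable mapping a prompt string to a reply string
    (an assumption of this sketch; a real VLM would also consume pixels).
    """
    description = vlm(f"Describe the driving scene in these frames: {images}")
    analysis = vlm(f"Given this description, analyze the critical objects: {description}")
    plan_text = vlm(f"Given this analysis, propose a driving plan, one step per line: {analysis}")
    plan = [step.strip() for step in plan_text.splitlines() if step.strip()]
    return PlanningResult(description, analysis, plan)


def fake_vlm(prompt: str) -> str:
    """Hypothetical stand-in for a real model: canned replies keyed on the prompt."""
    if prompt.startswith("Describe"):
        return "Rainy urban road; a cyclist is merging from the right."
    if prompt.startswith("Given this description"):
        return "The cyclist may cut into the ego lane."
    return "slow down\nyield to cyclist\nresume speed"


result = run_cot_pipeline(["front.jpg"], fake_vlm)
print(result.plan)
```

Passing the model in as a plain callable keeps the sketch runnable without any model dependency; swapping `fake_vlm` for a real client would not change the chaining logic.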

Authors

Xiaoyu Tian

Junru Gu

Bailin Li
