A primary hurdle for autonomous driving in urban environments is understanding complex and long-tail scenarios, such as challenging road conditions and delicate human behaviors. We introduce DriveVLM, an autonomous driving system that leverages Vision-Language Models (VLMs) for enhanced scene understanding and planning. DriveVLM integrates chain-of-thought (CoT) modules for scene description, scene analysis, and hierarchical planning. Furthermore, recognizing the limitations of VLMs in spatial reasoning and their heavy computational requirements, we propose DriveVLM-Dual, a hybrid system that combines the strengths of DriveVLM with the traditional autonomous driving pipeline. DriveVLM-Dual achieves robust spatial understanding and real-time inference speed. Extensive experiments on both the nuScenes dataset and our SUP-AD dataset demonstrate the effectiveness of DriveVLM and the enhanced performance of DriveVLM-Dual, surpassing existing methods in complex and unpredictable driving conditions.
Xiaoyu Tian
Junru Gu
Bailin Li
Tian et al. studied this question.
www.synapsesocial.com/papers/68e78822b6db6435876fb14f — DOI: https://doi.org/10.48550/arxiv.2402.12289