May 17, 2026 · Open Access

Found-RL: Foundation model-enhanced reinforcement learning via asynchronous VLM feedback for autonomous driving

Abstract

Reinforcement Learning (RL) has emerged as a dominant paradigm for end-to-end autonomous driving (AD) with real-time inference. However, RL typically suffers from sample inefficiency and a lack of semantic interpretability in complex scenarios. Foundation Models, particularly Vision-Language Models (VLMs), can mitigate these limitations because they offer rich, context-aware knowledge. Yet deploying such computationally intensive models within high-frequency, multi-environment RL training loops is severely hindered by prohibitive inference latency and the absence of unified integration platforms. To bridge this gap, we present Found-RL, a platform that leverages foundation models to efficiently enhance RL for AD. A core innovation of the platform is its asynchronous batch inference framework, which decouples heavy VLM reasoning from the simulation loop. This design resolves latency bottlenecks and supports real-time or near-real-time RL learning from VLM feedback. Using the platform, we introduce diverse supervision mechanisms to address domain-specific challenges. First, we implement Value-Margin Regularization (VMR) and Advantage-Weighted Action Guidance (AWAG) to distill expert-like VLM action suggestions into the RL policy. Furthermore, for dense supervision, we adopt high-throughput CLIP for reward shaping. We mitigate CLIP's dynamic blindness and probability dilution via Conditional Contrastive Action Alignment, which conditions prompts on discretized speed/command and yields a normalized, margin-based bonus from context-specific action-anchor scoring. Found-RL delivers an end-to-end pipeline for integrating fine-tuned VLMs with modular support, and shows that a lightweight RL model with millions of parameters can achieve performance close to that of billion-parameter VLMs while sustaining real-time inference (~500 FPS).
Code, data, and models will be publicly available at https://github.com/ys-qu/found-rl.
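
The asynchronous batch inference idea can be illustrated with a minimal Python sketch. The abstract does not describe Found-RL's actual implementation, so everything below is an assumption: the class `AsyncBatchInferencer`, its parameters `max_batch` and `max_wait_s`, and the stub `slow_vlm` (a stand-in for a real VLM) are hypothetical. The sketch shows environment loops enqueueing observations without blocking, a background thread grouping pending requests into one batched model call, and each environment polling for its most recent feedback once it becomes available.

```python
import queue
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple


class AsyncBatchInferencer:
    """Hypothetical sketch: decouple a slow, batched model from env stepping.

    Environments submit observations without blocking; a background thread
    groups pending requests into batches, runs one model call per batch,
    and stores the newest output for each environment.
    """

    def __init__(self, model_fn: Callable[[List[str]], List[str]],
                 max_batch: int = 8, max_wait_s: float = 0.01) -> None:
        self.model_fn = model_fn      # stand-in for batched VLM inference
        self.max_batch = max_batch    # upper bound on requests per model call
        self.max_wait_s = max_wait_s  # how long the worker waits for a first request
        self.requests: "queue.Queue[Tuple[int, str]]" = queue.Queue()
        self.latest: Dict[int, str] = {}  # env_id -> most recent feedback
        self.lock = threading.Lock()
        self._stop = threading.Event()
        self.worker = threading.Thread(target=self._loop, daemon=True)
        self.worker.start()

    def submit(self, env_id: int, obs: str) -> None:
        """Enqueue an observation and return immediately (never blocks)."""
        self.requests.put((env_id, obs))

    def poll(self, env_id: int) -> Optional[str]:
        """Return the newest feedback for env_id, or None if none has arrived."""
        with self.lock:
            return self.latest.get(env_id)

    def _loop(self) -> None:
        while not self._stop.is_set():
            try:
                batch = [self.requests.get(timeout=self.max_wait_s)]
            except queue.Empty:
                continue
            # Drain whatever else is already pending, up to max_batch.
            while len(batch) < self.max_batch:
                try:
                    batch.append(self.requests.get_nowait())
                except queue.Empty:
                    break
            env_ids = [env_id for env_id, _ in batch]
            outputs = self.model_fn([obs for _, obs in batch])
            with self.lock:
                for env_id, out in zip(env_ids, outputs):
                    self.latest[env_id] = out  # overwrite stale feedback

    def close(self) -> None:
        """Stop the background worker and wait for it to exit."""
        self._stop.set()
        self.worker.join()


def slow_vlm(batch: List[str]) -> List[str]:
    """Hypothetical stub: sleeps to mimic VLM latency, then echoes suggestions."""
    time.sleep(0.05)
    return [f"suggest:{obs}" for obs in batch]


if __name__ == "__main__":
    inferencer = AsyncBatchInferencer(slow_vlm)
    for step in range(3):
        for env_id in range(4):
            inferencer.submit(env_id, f"env{env_id}-frame{step}")
            # The env loop continues at once; this may print None early on.
            print(step, env_id, inferencer.poll(env_id))
    time.sleep(0.5)
    print(inferencer.poll(0))  # newest suggestion for env 0 once processed
    inferencer.close()
```

Keeping only the newest result per environment is one possible design choice (stale suggestions are overwritten rather than queued). A real system might instead tag feedback with the simulation step it refers to, or serve the model from a separate process or GPU server rather than a Python thread.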

Cite this study

Qu et al. (2026), Communications in Transportation Research.

www.synapsesocial.com/papers/6a095bdd7880e6d24efe1c3f

DOI: https://doi.org/10.26599/commtr.2026.9640027

Authors

Yansong Qu

Zihao Sheng

Zilin Huang

Journal

Communications in Transportation Research
