Approaches based on imitation learning (IL) have shown promise for trajectory planning in autonomous valet parking (AVP). However, their application faces two key challenges: the complexity of low-speed vehicle–obstacle interactions, and the feature degradation that arises when multimodal driving behaviors, such as cruising and parking, are integrated within a unified model. To address these challenges, we propose a dual-branch Transformer architecture with a conditional attention mechanism tailored to IL-based AVP trajectory planning. In the interaction feature extraction phase, we introduce a vehicle–obstacle interaction module that uses conditional attention to model both behavior-level and trajectory-level interactions. This design improves the representation of the distinct driving behaviors involved in AVP tasks. In addition, spatial semantics (e.g., target pose and behavior type) and spatiotemporal interaction cues are processed in parallel feedforward branches, which mitigates the feature degradation observed in traditional single-branch architectures. We evaluate our method on the real-world Dragon Lake Parking (DLP) dataset and compare it against state-of-the-art approaches. Experimental results show that our model produces high-quality trajectories in both dynamic cruising and precise parking scenarios, outperforming the baseline methods on all evaluation metrics.
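To make the two architectural ideas concrete, the following is a minimal sketch in plain Python and not the implementation evaluated here. It assumes that conditional attention can be approximated by shifting the ego query with a behavior embedding before scaled dot-product attention over obstacle features, and that the dual-branch design amounts to two separate feedforward layers whose outputs are concatenated. All names (`conditional_attention`, `dual_branch`, `BEHAVIOR_EMBED`), the dimensions, and the weights are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conditional_attention(query, condition, keys, values):
    # Assumption: the condition (behavior embedding) is added to the query,
    # so the same obstacles are weighted differently per behavior type.
    q = [x + c for x, c in zip(query, condition)]
    scale = math.sqrt(len(q))
    weights = softmax([dot(q, k) / scale for k in keys])
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights

def feedforward(x, weight, bias):
    # One linear layer followed by ReLU; weight is a list of rows.
    return [max(0.0, dot(row, x) + b) for row, b in zip(weight, bias)]

def dual_branch(semantic, interaction, ff_sem, ff_int):
    # Each feature group has its own feedforward branch; outputs are concatenated.
    return feedforward(semantic, *ff_sem) + feedforward(interaction, *ff_int)

# Hypothetical toy inputs: two obstacles and 2-D features.
BEHAVIOR_EMBED = {"cruise": [1.0, 0.0], "park": [0.0, 1.0]}
ego_query = [0.5, 0.5]
obstacle_keys = [[2.0, 0.0], [0.0, 2.0]]
obstacle_values = [[1.0, 0.0], [0.0, 1.0]]

context, weights = conditional_attention(
    ego_query, BEHAVIOR_EMBED["park"], obstacle_keys, obstacle_values
)

# Spatial semantics: target pose (x, y, heading) plus the behavior embedding.
semantic = [4.0, -2.0, 1.57] + BEHAVIOR_EMBED["park"]
ff_sem = ([[1.0, 0.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0, 0.0]], [0.0, 0.0])
ff_int = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

fused = dual_branch(semantic, context, ff_sem, ff_int)
print(weights, fused)
```

In this toy setup, switching the condition from `"park"` to `"cruise"` moves the attention weight from the second obstacle to the first. Keeping the semantic and interaction features in separate branches until the final concatenation means neither feature group passes through the other's layer.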
Li et al. studied this question.