Trajectory prediction plays a crucial role in autonomous driving by forecasting the future trajectories of agents to support safe and efficient decision-making. Most existing methods adopt an encoder–decoder architecture, in which a scene encoder extracts contextual representations from agents’ historical trajectories and lane segments, and they have achieved remarkable success. However, this architecture remains fundamentally constrained by a blind encoder: the scene encoder extracts contextual information without foresight of the candidate proposals, so proposal-irrelevant context pollutes the encoded semantics and degrades prediction performance. To address this deficiency, we propose ProFocus, a proactive encoding framework that restructures the trajectory prediction architecture around an anticipatory feedback loop. ProFocus generates candidate proposals in its early-stage layers and uses them as attentional priors to dynamically modulate the scene encoding process. In addition, to improve the information flow within the attention mechanism and reduce interference from irrelevant context in the attention distributions, we introduce spatio-temporal focal attention (STFA). STFA implements a relation-conditioned sharpening operator through a softmax controlled by spatio-temporal relations, adaptively recalibrating the attention distribution according to these relational dependencies. Comprehensive evaluations on the Argoverse 1 and INTERACTION datasets show that ProFocus attains competitive performance in miss rate (MR), minimum average displacement error (minADE), and minimum final displacement error (minFDE), while maintaining real-time inference at 16 ms on an RTX 3090. Our ablation studies show that ProFocus reduces MR, minFDE, and minADE by 2.80%, 2.52%, and 1.41%, respectively, relative to the baseline. Qualitative visualizations further corroborate that ProFocus performs robustly in diverse traffic scenarios.
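For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how MR, minADE, and minFDE are commonly computed for multi-modal predictions. It is not taken from the ProFocus evaluation code. The function names, the input layout (K candidate trajectories as lists of (x, y) points), and the 2.0 m miss threshold are assumptions for illustration; the threshold follows the usual Argoverse 1 convention. Benchmarks also differ on whether minADE is the smallest ADE over all candidates or the ADE of the candidate with the smallest endpoint error; this sketch uses the former.

```python
import math


def displacement_metrics(candidates, ground_truth, miss_threshold=2.0):
    """Return (minADE, minFDE, missed) for one agent.

    candidates: K predicted trajectories, each a list of (x, y) points.
    ground_truth: the observed future trajectory, same length as each candidate.
    miss_threshold: assumed endpoint-error threshold in meters.
    """
    def dist(p, q):
        # Euclidean distance between two 2D points
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Average displacement error of each candidate over all timesteps
    ades = [
        sum(dist(p, q) for p, q in zip(traj, ground_truth)) / len(ground_truth)
        for traj in candidates
    ]
    # Final displacement error of each candidate (endpoint only)
    fdes = [dist(traj[-1], ground_truth[-1]) for traj in candidates]

    min_ade = min(ades)
    min_fde = min(fdes)
    # The agent counts as a miss if no candidate endpoint is within the threshold
    missed = min_fde > miss_threshold
    return min_ade, min_fde, missed


def miss_rate(scenarios, miss_threshold=2.0):
    """Fraction of (candidates, ground_truth) pairs counted as misses."""
    misses = sum(
        displacement_metrics(c, g, miss_threshold)[2] for c, g in scenarios
    )
    return misses / len(scenarios)
```

Under these definitions, lower is better for all three metrics, which is why the ablation results are reported as reductions relative to the baseline.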
Liu et al. (Tue,) studied this question.