June 4, 2026 | Open Access

A Proposal-Aware Proactive Encoding Framework for Trajectory Prediction in Autonomous Driving

Key Points

This work aims to enhance trajectory prediction for autonomous vehicles by addressing limitations in existing encoder-decoder models.
Developed ProFocus, a proactive encoding framework that utilizes anticipatory feedback loops.
Introduced spatio-temporal focal attention (STFA) to optimize information flow and reduce irrelevant context.
Evaluated performance on the Argoverse 1 and INTERACTION datasets.
In ablation studies, ProFocus reduced miss rate (MR) by 2.80%, minimum final displacement error (minFDE) by 2.52%, and minimum average displacement error (minADE) by 1.41% relative to the baseline.
Achieved real-time inference at 16 ms on an RTX 3090.
Demonstrated robust performance in diverse traffic scenarios through qualitative visualizations.

Abstract

Trajectory prediction plays a crucial role in autonomous driving by forecasting the future trajectories of agents to support safe and efficient decision-making. Most existing methods adopt an encoder–decoder architecture, in which a scene encoder extracts contextual representations from agents' historical trajectories and lane segments, and they have achieved remarkable success. However, this architecture remains fundamentally constrained by its blind encoder: the scene encoder extracts contextual information without foresight, so proposal-irrelevant context causes significant semantic pollution and degrades prediction performance. To address this deficiency, we propose ProFocus, a proactive encoding framework that reformulates the trajectory prediction architecture around an anticipatory feedback loop. ProFocus generates potential proposals in its early-stage layers and uses them as attentional priors to dynamically modulate the scene encoding process. In addition, to optimize the information flow within the attention mechanism and reduce interference from irrelevant context in the attention distributions, we introduce spatio-temporal focal attention (STFA). STFA implements a relation-conditioned sharpening operator through a softmax controlled by spatio-temporal relations, adaptively recalibrating the attention distribution according to the relevant dependencies. Comprehensive evaluations on the Argoverse 1 and INTERACTION datasets show that ProFocus attains competitive performance in miss rate (MR), minimum average displacement error (minADE), and minimum final displacement error (minFDE), while maintaining real-time inference at 16 ms on an RTX 3090. Ablation studies show that ProFocus reduces MR, minFDE, and minADE by 2.80%, 2.52%, and 1.41%, respectively, relative to the baseline. Furthermore, qualitative visualizations corroborate that ProFocus performs robustly in diverse traffic scenarios.
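
For readers unfamiliar with the three metrics named in the abstract, the following is a minimal sketch of how they are commonly computed for multi-proposal predictors. It is not taken from the study: the function names, the data layout, and the 2.0 m miss threshold are illustrative assumptions, and benchmark toolkits differ in detail (some, for example, report the ADE of the proposal with the smallest endpoint error instead of the smallest ADE over all proposals).

```python
import math


def displacement_metrics(proposals, ground_truth, miss_threshold=2.0):
    """Return (minADE, minFDE, missed) for one agent.

    proposals: list of K candidate trajectories, each a list of (x, y) points.
    ground_truth: list of (x, y) points, same length as each proposal.
    miss_threshold: endpoint error in metres above which the agent counts
        as missed (assumed value, not taken from the study).
    """
    ades, fdes = [], []
    for trajectory in proposals:
        # Per-timestep Euclidean distance to the ground-truth position.
        errors = [math.dist(p, g) for p, g in zip(trajectory, ground_truth)]
        ades.append(sum(errors) / len(errors))  # average displacement error
        fdes.append(errors[-1])                 # final displacement error
    min_fde = min(fdes)
    # The agent is "missed" if even the best endpoint is beyond the threshold.
    return min(ades), min_fde, min_fde > miss_threshold


def miss_rate(samples, miss_threshold=2.0):
    """Fraction of agents missed; samples is a list of (proposals, ground_truth)."""
    missed = [displacement_metrics(p, g, miss_threshold)[2] for p, g in samples]
    return sum(missed) / len(missed)


if __name__ == "__main__":
    truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    wide = [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]
    close = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
    print(displacement_metrics([wide, close], truth))
```

Taking the minimum over the K proposals rewards a model for having at least one good candidate, which is why these metrics suit multi-modal predictors. A relative reduction such as the reported 2.80% in MR would then be computed as (baseline − new) / baseline.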
