In this paper, we present TriDet, a one-stage framework for temporal action detection. Existing methods often suffer from imprecise boundary predictions because action boundaries in videos are ambiguous. To alleviate this problem, we propose a novel Trident-head that models the action boundary via an estimated relative probability distribution around the boundary. In the feature pyramid of TriDet, we propose an efficient Scalable-Granularity Perception (SGP) layer that mitigates the rank loss problem of self-attention in video features and aggregates information across different temporal granularities. Benefiting from the Trident-head and the SGP-based feature pyramid, TriDet achieves state-of-the-art performance on three challenging benchmarks (THUMOS14, HACS and EPIC-KITCHEN 100) at a lower computational cost than previous methods. For example, TriDet reaches an average mAP of 69.3% on THUMOS14, outperforming the previous best by 2.5% with only 74.6% of its latency. The code is available at https://github.com/dingfengshi/TriDet.
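To make the idea of locating a boundary from a relative probability distribution more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical set of per-bin scores (`bin_logits`) for positions neighbouring a time step, normalises them with a softmax, and takes the expected bin index as the boundary offset. The function names, the number of bins, and how the scores would be produced are all assumptions for illustration; the abstract does not specify them.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(bin_logits):
    """Expected boundary offset, in bins, under the softmax distribution.

    bin_logits[b] is a hypothetical score that the boundary lies b bins
    away from the current time step (an assumption for illustration).
    """
    probs = softmax(bin_logits)
    return sum(b * p for b, p in enumerate(probs))


# Equal scores over 4 bins give a uniform distribution, so the expected
# offset is the mean bin index.
uniform_offset = expected_offset([0.0, 0.0, 0.0, 0.0])

# A sharply peaked score at bin 2 pulls the expected offset towards 2.
peaked_offset = expected_offset([-100.0, -100.0, 10.0, -100.0])
```

Taking an expectation rather than an argmax yields a fractional offset, which is one way a distribution over neighbouring positions can express uncertainty about an ambiguous boundary.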
Shi et al. studied this question.
www.synapsesocial.com/papers/6a006017ef8139f8ff778dfb — DOI: https://doi.org/10.1109/cvpr52729.2023.01808
Dingfeng Shi
Yujie Zhong
Qiong Cao
Beihang University