With the development of Embodied AI, Robotics, and Augmented Reality, videos captured from the first-person point of view, known as egocentric videos, are attracting growing interest in the Computer Vision and Robotics communities. Learning a proper representation of egocentric videos can benefit diverse downstream tasks such as action forecasting and human-object interaction, which in turn benefit robotic planning. However, current works mostly focus on learning temporal or topological information for egocentric video representations, while activity patterns, which reveal the behavioral regularities or intentions of people or robots more explicitly, are not carefully considered. In this paper, we propose a novel framework, Pattern4Ego, that learns representations of egocentric videos using cross-video activity patterns. This framework achieves state-of-the-art performance on two representative egocentric video tasks: long-term action anticipation and context-based environment affordance.
Ruihai Wu
Yourong Zhang
Yu Qi
Peking University
Northeastern University
Wu et al. (Thu,) studied this question.
www.synapsesocial.com/papers/68e67cb4b6db6435876064d9 — DOI: https://doi.org/10.1145/3652583.3658010