May 30, 2024 · Open Access

Pattern4Ego: Learning Egocentric Video Representation Using Cross-video Activity Patterns

Abstract

With the development of Embodied AI, Robotics, and Augmented Reality, videos captured from the first-person point of view, also known as egocentric videos, are attracting growing interest in the Computer Vision and Robotics communities. Learning a proper representation of egocentric videos can benefit diverse downstream tasks such as action forecasting and human-object interaction, which in turn benefit robotic planning. However, existing work mostly focuses on learning temporal or topological information for egocentric video representations, while activity patterns, which reveal the behavioral regularities or intentions of people or robots more explicitly, are not carefully considered. In this paper, we propose a novel framework, Pattern4Ego, that learns representations of egocentric videos using cross-video activity patterns. The framework achieves state-of-the-art performance on two representative egocentric video tasks: long-term action anticipation and context-based environment affordance.

Authors

Ruihai Wu

Yourong Zhang

Yu Qi

Institutions

Peking University

Northeastern University
