What question did this study set out to answer?

This research aims to improve human-object interaction detection by leveraging hierarchical contextual relationships.

April 15, 2026 | Open Access

Exploring Hierarchical Tuple-Based Contextual Correlations for Human-Object Interaction Detection

Key Points

Developed the HTCCL model to capture multi-level contextual relationships.
Decomposed interactions into entity, action, and event levels.
Used a heterogeneous graph network and multi-branch transformer architecture.
Employed Contrastive Language-Image Pre-training (CLIP) to embed interaction cues into queries.
Achieved state-of-the-art performance in human-object interaction detection.
Demonstrated superior effectiveness in complex scenes with multiple relationships.
Showed significant improvements on standard benchmarks.

Abstract

Human-Object Interaction (HOI) detection is a challenging task in computer vision, particularly in complex scenes involving multiple humans and interactions. In this paper, we propose the Hierarchical Tuple-based Contextual Correlations Learning (HTCCL) model, which aims to enhance HOI detection by systematically capturing multi-level contextual relationships. Our approach decomposes an interaction into three hierarchical levels: entity, action, and event. We introduce a heterogeneous graph network with a multi-branch Transformer architecture, where human and object entities are treated as distinct nodes, facilitating fine-grained relational reasoning. Furthermore, we leverage the Contrastive Language-Image Pre-training (CLIP) model to embed interaction cues into queries, which are subsequently refined through local and global contextual aggregation modules. The proposed model effectively integrates contextual information across these levels, improving its ability to detect complex interactions within diverse scenes. Our extensive evaluations on standard benchmarks demonstrate that HTCCL achieves state-of-the-art performance in HOI detection, particularly in scenarios with high relational complexity.
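
As a rough illustration of the three-level decomposition described in the abstract, the minimal Python sketch below represents one interaction as a tuple of two typed entities and an action, and exposes its entity, action, and event views. The names `Entity`, `Interaction`, `levels`, and `candidate_pairs` are hypothetical and are not taken from the HTCCL paper. The actual model operates on learned Transformer queries and CLIP embeddings rather than plain strings, so this only shows the data layout, not the method itself.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Entity:
    """A detected node; humans and objects are kept as distinct node types."""
    kind: str    # "human" or "object" (hypothetical type tags)
    label: str   # class name, e.g. "person" or "bicycle"
    box: tuple   # bounding box as (x1, y1, x2, y2)


@dataclass(frozen=True)
class Interaction:
    """One human-action-object tuple."""
    human: Entity
    action: str
    obj: Entity

    def levels(self):
        """Return the entity-, action-, and event-level views of the tuple."""
        return {
            "entity": (self.human.label, self.obj.label),
            "action": self.action,
            "event": f"{self.human.label} {self.action} {self.obj.label}",
        }


def candidate_pairs(entities):
    """Pair every human node with every object node (edges of a typed graph)."""
    humans = [e for e in entities if e.kind == "human"]
    objects = [e for e in entities if e.kind == "object"]
    return list(product(humans, objects))


# Usage: two people and one bicycle give two candidate human-object pairs.
rider = Entity("human", "person", (0, 0, 10, 10))
bystander = Entity("human", "person", (20, 0, 30, 10))
bicycle = Entity("object", "bicycle", (5, 5, 15, 15))

for human, obj in candidate_pairs([rider, bystander, bicycle]):
    print(human.label, "->", obj.label)

print(Interaction(rider, "ride", bicycle).levels())
```

In a scene with several humans and objects, the number of candidate pairs grows as the product of the two counts, which is one way to read the abstract's reference to high relational complexity.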

Authors

Xin Hu

Ke Qin

Tao He

Journals

Tsinghua Science & Technology
