Scene Graph Generation (SGG) aims to extract visual entities and their semantic relationships from images, providing a structured representation for scene understanding. Current models often suffer from insufficient multi-modal feature fusion and from the imbalanced predicate distribution of the training data, which leads to biased predictions. To address these issues, we propose ReBalance-HCA, a unified framework that combines Hybrid Co-Attention Networks (HCA) with Predicate Reweighting (PR). HCA enhances intra-modal features and aligns cross-modal semantics, while PR dynamically adjusts the predicate distribution by modeling inter-predicate correlations. Extensive experiments on the Visual Genome and OpenImages datasets demonstrate that ReBalance-HCA achieves competitive mean Recall@K (mR@K) scores compared with recent state-of-the-art methods across SGG sub-tasks. Our code and datasets are available at: https://github.com/LinusLing/ReBalance-HCA .
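The mR@K metric is the mean, over predicate classes, of per-predicate Recall@K. Because of this averaging, rare predicates count as much as frequent ones, which is why the metric is sensitive to predicate imbalance. The following is a minimal Python sketch of the computation under simplifying assumptions. Triplets are matched by exact equality of (subject index, predicate, object index), as when ground-truth boxes are given, and no IoU-based box matching is performed. The function name `mean_recall_at_k` and the `Triplet` type are hypothetical and are not taken from the ReBalance-HCA code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical triplet type: (subject index, predicate label, object index).
Triplet = Tuple[int, str, int]


def mean_recall_at_k(
    gt_per_image: Sequence[List[Triplet]],
    ranked_preds_per_image: Sequence[List[Triplet]],
    k: int,
) -> float:
    """Sketch of mean Recall@K (mR@K) for scene graph predictions.

    Assumption: triplets match by exact equality (indices refer to given
    boxes); real SGG benchmarks also match boxes by IoU, omitted here.
    """
    # For each predicate, collect one recall value per image that contains it.
    per_pred_recalls: Dict[str, List[float]] = defaultdict(list)
    for gts, preds in zip(gt_per_image, ranked_preds_per_image):
        top_k = set(preds[:k])  # keep only the K highest-ranked predictions
        hits: Dict[str, int] = defaultdict(int)
        totals: Dict[str, int] = defaultdict(int)
        for triplet in gts:
            predicate = triplet[1]
            totals[predicate] += 1
            if triplet in top_k:
                hits[predicate] += 1
        for predicate, total in totals.items():
            per_pred_recalls[predicate].append(hits[predicate] / total)
    if not per_pred_recalls:
        return 0.0
    # Average over images for each predicate, then over predicate classes.
    class_recalls = [sum(r) / len(r) for r in per_pred_recalls.values()]
    return sum(class_recalls) / len(class_recalls)
```

Consider an image whose ground truth contains three "on" triplets and one "has" triplet. A ranking that recovers every "on" triplet but misses "has" scores a plain Recall@4 of 0.75 but an mR@4 of only 0.5. Methods that counter predicate imbalance are therefore what raise mR@K.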
Ling et al. studied this question.