Modeling object state changes (OSCs) plays a critical role in video understanding, as it involves localizing the stages of state transitions within temporal sequences. However, existing methods face two key challenges in open-world scenarios. First, dataset bias creates a significant imbalance between background and causal scenes, which leads models to rely on irrelevant features and degrades their predictive capability. Second, existing methods generalize poorly to unseen objects. They typically focus on a single state change of a specific object, which prevents them from understanding the state changes of unseen objects in a generalized way, as humans do. To address these challenges, we first introduce a structural causal model (SCM) to formalize the OSC task, which explicitly defines the confounding effect of dataset bias and the lack of generalization. Guided by this SCM, we propose CCI-Net, a causal counterfactual inference-based neural network for video OSC. CCI-Net employs a causal inference network for backdoor adjustment to effectively eliminate confounders. In addition, it integrates counterfactual inference to enhance understanding in open-world scenarios. Specifically, CCI-Net comprises two key components: the backdoor scene classifier (BSC) and the counterfactual module (CM). The BSC controls potential confounders and mitigates spurious correlations. The CM enhances generalization to unseen objects and their state changes by constructing counterfactual scenes during training. Furthermore, we design two loss functions, one for causal scenes and one for counterfactual scenes, to optimize the learning process. Experimental results on three benchmark datasets demonstrate that, compared with existing methods, CCI-Net significantly improves both precision and generalization in open-world scenarios.
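For readers unfamiliar with backdoor adjustment, the following is a minimal sketch of the general technique on a toy discrete example. It is not CCI-Net's implementation, which the abstract does not detail. The variable roles (Z as a background-scene confounder, X as an observed cue, Y as a state-change label) and all probability values are assumptions chosen purely for illustration. The adjustment replaces the biased weighting P(z | x) with the marginal P(z), giving P(Y | do(X=x)) = sum over z of P(Y | x, z) P(z).

```python
# Toy illustration of backdoor adjustment (NOT CCI-Net's implementation).
# Z: confounder (e.g. background scene), X: observed cue, Y: state-change label.
# All probabilities are made-up numbers used only for illustration.

p_z = {0: 0.8, 1: 0.2}            # P(Z=z): imbalanced backgrounds
p_x1_given_z = {0: 0.9, 1: 0.1}   # P(X=1 | Z=z): cue correlates with background
p_y1_given_xz = {                 # P(Y=1 | X=x, Z=z)
    (0, 0): 0.5, (0, 1): 0.1,
    (1, 0): 0.7, (1, 1): 0.3,
}


def p_x_given_z(x, z):
    """P(X=x | Z=z) for binary X."""
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observational(x):
    """P(Y=1 | X=x) = sum_z P(Y=1 | x, z) * P(z | x)."""
    joint = {z: p_x_given_z(x, z) * p_z[z] for z in p_z}  # P(x, z)
    p_x = sum(joint.values())                             # P(x)
    return sum(p_y1_given_xz[(x, z)] * joint[z] / p_x for z in p_z)


def backdoor_adjusted(x):
    """P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) * P(z)."""
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


if __name__ == "__main__":
    obs_gap = observational(1) - observational(0)
    adj_gap = backdoor_adjusted(1) - backdoor_adjusted(0)
    print(f"observational gap: {obs_gap:.3f}")
    print(f"adjusted gap:      {adj_gap:.3f}")
```

In this toy setting the observational gap between X=1 and X=0 is larger than the adjusted gap, because the cue co-occurs with the dominant background. This is the kind of spurious correlation that adjusting for the confounder is meant to remove.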
Zhichao Wang
Shucheng Huang
Mingxing Li
Wang et al. (Tue,) studied this question.
www.synapsesocial.com/papers/69e1ce3b5cdc762e9d857534 — DOI: https://doi.org/10.1109/tnnls.2026.3678945