Vision graph neural networks (vision GNNs) offer a powerful alternative to CNNs and Transformers for modeling complex visual relationships. However, they still face two challenges: the high computational cost of repeatedly constructing global k-NN graphs, and the misalignment of rigid patch tokenization with object boundaries. We propose DiRAViG, which replaces fixed patches with boundary-aligned region tokens produced by a differentiable, end-to-end assignment, and propagates features over a fixed, sparse one-hop spatial contact graph using few-step diffusion. A bidirectional pixel–region pathway aggregates pixel features into regions and projects them back to the image grid, preserving fine details and stabilizing training. On ImageNet-1K, DiRAViG-S achieves 78.7% Top-1 accuracy at 1.5 GMACs, and DiRAViG-M reaches 81.5% at 4.2 GMACs. Compared to Pyramid ViG-S (∼4.6 GMACs) and ViHGNN-S (∼ GMACs), DiRAViG-M offers a better accuracy-efficiency trade-off. These results demonstrate that DiRAViG offers a scalable, boundary-aware solution for efficient vision analysis.
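The pixel-to-region-to-pixel pathway described above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the distance-based softmax assignment, the grid-shaped contact graph, the row-normalized averaging used as the diffusion step, and every name and parameter (`soft_assign`, `grid_contact_graph`, `diffuse`, `pixel_region_block`, `temperature`, `alpha`, `steps`) are assumptions chosen only to make the data flow concrete.

```python
import numpy as np


def soft_assign(pixels, centers, temperature=1.0):
    """Soft (differentiable) pixel-to-region assignment.

    Assumption: assignment is a softmax over negative squared distances
    to hypothetical region centers; the paper's actual rule may differ.
    pixels: (N, C), centers: (K, C) -> weights: (N, K), rows sum to 1.
    """
    d2 = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    logits = -d2 / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)


def grid_contact_graph(rows, cols):
    """Fixed one-hop contact graph, sketched here as a 4-neighbour grid."""
    k = rows * cols
    adj = np.zeros((k, k))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:  # right neighbour
                adj[i, i + 1] = adj[i + 1, i] = 1.0
            if r + 1 < rows:  # neighbour below
                adj[i, i + cols] = adj[i + cols, i] = 1.0
    return adj


def diffuse(region_feats, adj, steps=3, alpha=0.5):
    """A few steps of neighbour averaging on the fixed graph (assumed form)."""
    deg = adj.sum(axis=1, keepdims=True)
    p = adj / np.maximum(deg, 1.0)  # row-normalized transition matrix
    for _ in range(steps):
        region_feats = (1.0 - alpha) * region_feats + alpha * (p @ region_feats)
    return region_feats


def pixel_region_block(pixels, centers, adj, steps=3, alpha=0.5):
    """Aggregate pixels into regions, diffuse, then project back to pixels."""
    a = soft_assign(pixels, centers)                    # (N, K)
    mass = a.sum(axis=0)[:, None]                       # (K, 1)
    regions = (a.T @ pixels) / np.maximum(mass, 1e-8)   # weighted region means
    regions = diffuse(regions, adj, steps=steps, alpha=alpha)
    return pixels + a @ regions, a                      # residual projection


rng = np.random.default_rng(0)
pixels = rng.normal(size=(64, 8))    # 64 pixels with 8 channels (toy data)
centers = rng.normal(size=(6, 8))    # 6 hypothetical region centers
adj = grid_contact_graph(2, 3)       # 6 regions laid out as a 2 x 3 grid
out, assignment = pixel_region_block(pixels, centers, adj)
print(out.shape, assignment.shape)
```

Because the graph is built once and each region touches only its one-hop neighbours, each diffusion step in this sketch costs time proportional to the number of edges when stored sparsely, in contrast to rebuilding a global k-NN graph at every layer.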
Zhenghao Li
Huadong Zheng
Lin Qian
Li et al. (Sat,) studied this question.