Millimeter-wave (mmWave) radar point clouds offer a privacy-preserving solution for Human Activity Recognition (HAR), but their inherent sparsity and noise limit single-modal performance. While multimodal fusion mitigates this issue, existing methods often suffer from severe negative transfer during visual degradation and incur high computational costs, unsuitable for edge devices. To address these challenges, we propose Tac-Mamba, a lightweight cross-modal state space model. First, we introduce a topology-guided distillation scheme that uses a Spatial Mamba teacher to extract structural priors from visual skeletons. These priors are then explicitly distilled into a Point Transformer v3 (PTv3) radar student with a modality dropout strategy. We also developed a Trust-Aware Cross-Modal Attention (TACMA) module to prevent negative transfer. It evaluates the reliability of visual features through a SiLU-activated cross-modal bilinear interaction, smoothly degrading to a pure radar-driven fallback projection when visual inputs are corrupted. Finally, a Lightweight Temporal Mamba Block (LTMB) with a Zero-Parameter Cross-Gating (ZPCG) mechanism captures long-range kinematic dependencies with linear complexity. Experiments on the public MM-Fi dataset under strict cross-environment protocols demonstrate that Tac-Mamba achieves competitive accuracies of 95.37% (multimodal) and 87.54% (radar-only) with only 0.86M parameters and 1.89 ms inference latency. These results highlight the model’s exceptional robustness to modality missingness and its feasibility for edge deployment.
Building similarity graph...
Analyzing shared references across papers
Loading...
Haiyi Wu
K. K. Zhao
Wei Yao
Electronics
University of Chinese Academy of Sciences
Shanghai Institute of Microsystem and Information Technology
Building similarity graph...
Analyzing shared references across papers
Loading...
Wu et al. (Tue,) studied this question.
www.synapsesocial.com/papers/69d893a86c1944d70ce04a76 — DOI: https://doi.org/10.3390/electronics15071535