Smart city construction is shifting from infrastructure digitization toward emotionally aware service paradigms. However, existing emotion recognition algorithms are limited in accuracy, real-time performance, and environmental adaptability when deployed in complex urban scenarios. This study proposes an enhanced multimodal emotion recognition algorithm integrated within a digital twin framework. The approach uses an adaptive weighted feature fusion mechanism that dynamically adjusts the contribution weights of the video, audio, and text modalities, coupled with a multi-head spatiotemporal attention network that jointly models the temporal evolution and spatial distribution of emotional states. In addition, a context-aware module based on graph convolution and gated recurrent units is built to improve environmental robustness. In experiments, the model reaches 86.3% recognition accuracy with an inference latency of 15.2 ms, and its compact 68.4 M-parameter architecture runs at 23 FPS on edge computing platforms such as Jetson devices. The system was validated on a comprehensive dataset of 520,000 urban samples. Compared with mainstream methods including MemoCMT and GASMER, it improves inference speed by 64.1% and reduces parameter count by 48.2% while maintaining competitive accuracy, and performance remains above 78% under adverse environmental conditions such as strong noise and low illumination. Ablation experiments confirm the contribution of each component: adaptive fusion, attention mechanisms, and spatiotemporal modeling yield performance gains of 3.2%, 2.8%, and 1.4%, respectively, supported by statistical significance testing.
These research findings provide efficient and reliable technical foundations for constructing emotion-driven smart city service systems, offering broad application prospects in urban spatial management, public safety monitoring, intelligent traffic scheduling, and related domains.
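The abstract does not specify how the adaptive weighted feature fusion computes its modality weights. As a minimal illustration of the general idea only, the sketch below assumes a softmax gate: each modality's feature vector is scored against a gating vector, the scores are normalized into weights that sum to one, and the fused feature is the weighted sum. The names `adaptive_fusion` and `gate_vectors`, and the choice of a softmax gate, are assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(features, gate_vectors):
    """Fuse per-modality features with input-dependent weights.

    features:     dict mapping modality name -> feature vector (equal lengths)
    gate_vectors: dict mapping modality name -> gating vector; these stand in
                  for learned parameters and are hypothetical
    Returns (weights, fused), where weights maps modality -> weight and
    fused is the weighted sum of the feature vectors.
    """
    names = sorted(features)
    # One relevance score per modality: dot product of feature and gate.
    scores = [
        sum(f * g for f, g in zip(features[name], gate_vectors[name]))
        for name in names
    ]
    weights = softmax(scores)
    dim = len(features[names[0]])
    fused = [
        sum(w * features[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


if __name__ == "__main__":
    feats = {"video": [1.0, 0.0], "audio": [0.0, 1.0], "text": [1.0, 1.0]}
    gates = {"video": [5.0, 0.0], "audio": [0.0, 0.0], "text": [0.0, 0.0]}
    weights, fused = adaptive_fusion(feats, gates)
    print(weights)
    print(fused)
```

Because the weights depend on the current input, a modality whose features score poorly against its gate (for example, audio in strong noise) would contribute less to the fused representation. In a trained model the gating parameters would be learned jointly with the rest of the network rather than set by hand as here.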
Cai et al. (Sun,) studied this question.