Abstract
Millimeter wave (mmWave) communication systems use beamforming and large antenna arrays to achieve high data rates by steering signals into narrow beams, which reduces interference and improves transmission efficiency. Efficient beamforming requires real-time beam adjustment as user positions and the environment change, but traditional methods that rely on frequent beam measurements incur significant overhead in dynamic environments. AI/ML approaches that exploit sensor data and historical information can make beam prediction and tracking more efficient. Building on this, we propose a multimodal beam tracking model for unmanned aerial vehicle (UAV) communication that integrates image and GPS data to predict UAV movement and optimize beam tracking. The model employs ResNet-SE blocks for feature extraction, CAformer blocks for multimodal data fusion, and a long short-term memory (LSTM) network to capture sequential historical features. Experimental results show that the proposed model outperforms single-modal methods, achieving a 24.4% improvement in Top-1 beam accuracy.
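Top-1 beam accuracy is the fraction of samples for which the highest-scoring predicted beam is the ground-truth best beam; Top-k relaxes this to "the true beam is among the k highest scores." The sketch below shows how such a metric is commonly computed. It assumes the model emits one score per codebook beam; the function name `top_k_beam_accuracy` and the toy scores are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Sequence


def top_k_beam_accuracy(
    scores: Sequence[Sequence[float]], true_beams: Sequence[int], k: int = 1
) -> float:
    """Fraction of samples whose true beam index is among the k highest-scoring beams.

    Hypothetical helper: assumes scores[i][b] is the model's score for beam b on sample i.
    """
    if len(scores) != len(true_beams):
        raise ValueError("scores and true_beams must have the same length")
    if not scores:
        raise ValueError("need at least one sample")
    hits = 0
    for row, target in zip(scores, true_beams):
        # Rank beam indices by predicted score, highest first (ties keep index order).
        ranked = sorted(range(len(row)), key=lambda b: row[b], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(scores)


if __name__ == "__main__":
    # Toy example: 3 samples, a 4-beam codebook (made-up numbers).
    scores = [[0.1, 0.7, 0.1, 0.1], [0.5, 0.2, 0.2, 0.1], [0.3, 0.3, 0.4, 0.0]]
    true_beams = [1, 1, 0]
    print(top_k_beam_accuracy(scores, true_beams, k=1))  # 1 of 3 samples correct
    print(top_k_beam_accuracy(scores, true_beams, k=2))  # all 3 samples correct
```

Because a percentage "improvement" in accuracy can be read as either absolute (percentage points) or relative, reporting both the baseline and the proposed model's Top-k values makes such comparisons unambiguous.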
Yeo et al. studied this question.