Abstract
Post-event residential building damage assessment supports reconnaissance, screening, and recovery planning after major hazard events. Recent hurricanes have generated large volumes of satellite, aerial, and street-level imagery, so the central challenge is no longer data scarcity but the integration of heterogeneous visual and contextual information into consistent, reviewable damage classifications. This study presents a multimodal deep-learning framework, the Multimodal Swin Transformer (MMST), which combines street-view imagery with structured building and hazard attributes to classify post-hurricane residential building damage. The model is evaluated on a curated dataset derived from extensive Structural Extreme Events Reconnaissance (StEER) field reconnaissance following Hurricane Ian (2022); the dataset reflects human interpretation, quality control, and selective sampling across impacted communities. Results show that integrating visual features with contextual attributes such as building age, building value, and wind speed improves classification performance relative to image-only baselines, with the multimodal model reaching an accuracy of 92.67%. Attention-based visualizations further allow post-hoc inspection of the image regions the model weighted most heavily, supporting qualitative review of model behavior rather than physical interpretation. The proposed MMST is intended as a decision-support tool that augments reconnaissance workflows and helps sustain the continuity of critical community infrastructure in future hurricanes.
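The abstract does not say how the image and the structured attributes are combined inside MMST. To illustrate the general idea of multimodal fusion only, the following is a minimal sketch that assumes simple late fusion: an image embedding (a placeholder list standing in for Swin Transformer output) is concatenated with z-scored building and hazard attributes, then passed to a linear classifier head with a softmax. The function names `standardize`, `fuse`, and `classify`, the attribute values, the normalization statistics, and the three damage classes are all hypothetical and are not taken from the study.

```python
# Hypothetical sketch of late fusion by concatenation; NOT the MMST architecture.
import math
from typing import List, Sequence


def standardize(values: Sequence[float], means: Sequence[float],
                stds: Sequence[float]) -> List[float]:
    """Z-score each structured attribute so it is on a scale comparable to image features."""
    return [(v - m) / s for v, m, s in zip(values, means, stds)]


def fuse(image_embedding: Sequence[float], tabular: Sequence[float]) -> List[float]:
    """Concatenate visual and contextual features into a single feature vector."""
    return list(image_embedding) + list(tabular)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features: Sequence[float], weights: Sequence[Sequence[float]],
             biases: Sequence[float]) -> List[float]:
    """Linear head with one weight row per damage class, returning class probabilities."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Placeholder image embedding (a real model would produce a much longer vector).
image_emb = [0.2, -0.1, 0.5]

# Hypothetical attributes: year built, building value (USD), wind speed (assumed mph).
attrs = standardize([1985.0, 250000.0, 120.0],
                    means=[1980.0, 200000.0, 110.0],
                    stds=[20.0, 100000.0, 20.0])
features = fuse(image_emb, attrs)

# Toy weights for three hypothetical damage classes (e.g. none/minor, moderate, severe).
weights = [[0.1] * 6, [0.0] * 6, [-0.1] * 6]
biases = [0.0, 0.0, 0.0]
probs = classify(features, weights, biases)
predicted_class = probs.index(max(probs))
```

Concatenation is the simplest fusion choice and is used here only because it makes the contribution of the contextual attributes explicit; attention-based or gated fusion are common alternatives, and the abstract does not indicate which approach MMST uses.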
Zhang et al. (Wed,) studied this question.