Wildfire detection and localization in aerial imagery are critical for rapid response and damage mitigation. Autonomous aerial vehicles (AAVs) enable large-area monitoring but face real-time processing challenges due to limited onboard computational and sensor resources. This work introduces a cross-modal knowledge distillation framework for edge-deployed AAVs. A teacher network trained only on thermal images transfers semantic and spatial representations to an optical-image-based student network during offline training on paired thermal and optical images. During deployment, the student uses only optical images, reducing reliance on multi-sensor payloads while maintaining high detection accuracy. The student model incorporates dual classification heads: an image-level head for fire-free vs. fire-impacted scenes, and a patch-level head for flame vs. no-flame discrimination. This patch-level strategy provides effective fire localization while avoiding the computational overhead of segmentation, making it practical for resource-constrained deployment. Evaluated on an aerial wildfire dataset, the framework achieves 90.97% patch-level accuracy, with false alarm and missed detection rates of 8.82% and 14.78%, respectively. The lightweight student model requires only 2.99 GFLOPs, has an inference time of 0.004 s, and generates patch-level probability heatmaps for fire region localization. Unlike conventional unimodal systems, this approach leverages thermal-to-optical knowledge transfer to deliver high accuracy, low latency, and precise localization under edge-computing constraints. The code and dataset will be released at https://github.com/medh132/cmkd .
Mishra et al. studied this question.
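The training objective and the patch-level localization described above can be illustrated with a minimal sketch. The abstract does not specify the loss, so everything here is an assumption for illustration: the feature-matching term (mean squared error between student and teacher features), the weight `alpha`, the 0.5 threshold, and all function names are hypothetical and are not taken from the released code.

```python
import math

def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one prediction, clamped for numerical safety."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

def feature_mse(student_feat, teacher_feat):
    """Assumed distillation term: MSE between optical-student and
    thermal-teacher feature vectors of equal length."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)

def student_loss(image_logit, image_label, patch_logits, patch_labels,
                 student_feat, teacher_feat, alpha=0.5):
    """Hypothetical combined objective: image-level head + patch-level head
    + alpha * feature distillation. The weighting is an assumption."""
    image_term = bce(sigmoid(image_logit), image_label)
    flat_logits = [z for row in patch_logits for z in row]
    flat_labels = [y for row in patch_labels for y in row]
    patch_term = sum(bce(sigmoid(z), y)
                     for z, y in zip(flat_logits, flat_labels)) / len(flat_logits)
    return image_term + patch_term + alpha * feature_mse(student_feat, teacher_feat)

def patch_heatmap(patch_logits):
    """Turn a grid of patch logits into a grid of flame probabilities."""
    return [[sigmoid(z) for z in row] for row in patch_logits]

def flame_patches(heatmap, threshold=0.5):
    """Return (row, col) indices of patches at or above the threshold."""
    return [(r, c) for r, row in enumerate(heatmap)
            for c, p in enumerate(row) if p >= threshold]
```

At deployment only `patch_heatmap` and `flame_patches` would be needed, since the teacher features and the distillation term appear only in the offline training loss. This mirrors the claim that the student runs on optical images alone.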