What question did this study set out to answer?

This research aims to enhance wildfire detection and localization using autonomous aerial vehicles with limited resources.

March 26, 2026 | Open Access

Cross-modal distillation for real-time wildfire detection and localization in edge-deployed aerial vehicles

Key Points

Developed a cross-modal knowledge distillation framework for aerial monitoring
Utilized a teacher network trained on thermal images and a student network on optical images
Implemented dual classification heads for efficient fire detection
Evaluated the framework on a comprehensive aerial wildfire dataset
Achieved 90.97% patch-level accuracy for fire localization
Reported false alarm and missed detection rates of 8.82% and 14.78%, respectively
Student model requires 2.99 GFLOPS with an inference time of 0.004 s
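
The three patch-level figures above can be related through a confusion matrix. The sketch below is illustrative only: it assumes the false alarm rate is the share of no-flame patches flagged as flame and the missed detection rate is the share of flame patches that go unflagged. The study may define these rates differently, and the function name `patch_metrics` is hypothetical.

```python
def patch_metrics(predicted, actual):
    """Return patch-level accuracy, false alarm rate, and missed detection rate.

    predicted, actual: equal-length sequences of 0 (no flame) or 1 (flame).
    The rate definitions are assumptions, not taken from the study.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)  # flame, flagged
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)  # no flame, not flagged
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)  # no flame, flagged
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)  # flame, not flagged

    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Assumed: false alarms measured against all no-flame patches.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Assumed: misses measured against all flame patches.
        "missed_detection_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Toy example with eight patches (made-up labels, not study data).
predicted = [1, 1, 0, 0, 1, 0, 0, 0]
actual = [1, 0, 0, 0, 1, 1, 0, 0]
metrics = patch_metrics(predicted, actual)
```

Under these definitions the two error rates use different denominators, so they need not sum to the overall error rate (100% minus accuracy).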

Abstract

Wildfire detection and localization in aerial imagery are critical for rapid response and damage mitigation. Autonomous aerial vehicles (AAVs) enable large-area monitoring but face real-time processing challenges due to limited onboard computational and sensor resources. This work introduces a cross-modal knowledge distillation framework for edge-deployed AAVs. A teacher network trained only on thermal images transfers semantic and spatial representations to an optical-image-based student network during offline training on paired thermal and optical images. During deployment, the student uses only optical images, reducing reliance on multi-sensor payloads while maintaining high detection accuracy. The student model incorporates dual classification heads: an image-level head for fire-free vs. fire-impacted scenes, and a patch-level head for flame vs. no-flame discrimination. This patch-level strategy provides effective fire localization while avoiding the computational overhead of segmentation, making it practical for resource-constrained deployment. Evaluated on an aerial wildfire dataset, the framework achieves 90.97% patch-level accuracy, with false alarm and missed detection rates of 8.82% and 14.78%, respectively. The lightweight student model requires only 2.99 GFLOPS with an inference time of 0.004 s and generates patch-level probability heatmaps for fire region localization. Unlike conventional unimodal systems, this approach leverages thermal-to-optical knowledge transfer to deliver high accuracy, low latency, and precise localization under edge-computing constraints. The code and dataset will be released at https://github.com/medh132/cmkd.
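
The abstract does not specify the training objective, so the following is only a generic sketch of how a thermal teacher could supervise an optical student on a paired sample. It assumes a common recipe: a hard-label cross-entropy term, a temperature-softened KL term on the logits, and a mean-squared-error term on intermediate features. The function names, the weights `alpha` and `beta`, and the `temperature` value are all hypothetical and are not taken from the study.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cross_entropy(logits, label):
    """Cross-entropy of the softmax over logits against an integer label."""
    return -math.log(softmax(logits)[label] + 1e-12)


def feature_mse(student_feat, teacher_feat):
    """Mean squared error between two equal-length feature vectors."""
    diffs = [(s - t) ** 2 for s, t in zip(student_feat, teacher_feat)]
    return sum(diffs) / len(diffs)


def distillation_loss(student_logits, teacher_logits, label,
                      student_feat, teacher_feat,
                      temperature=2.0, alpha=0.5, beta=0.1):
    """Assumed combined loss for one thermal/optical pair (illustrative only)."""
    # Hard-label term: the student (optical input) against the ground truth.
    hard = cross_entropy(student_logits, label)
    # Soft-label term: match the teacher's (thermal input) softened outputs.
    soft = kl_divergence(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature)) * temperature ** 2
    # Feature term: pull student features toward the teacher's features.
    feat = feature_mse(student_feat, teacher_feat)
    return (1 - alpha) * hard + alpha * soft + beta * feat


# Made-up values: a student that agrees with the teacher vs. one that does not.
loss_agree = distillation_loss([2.0, 0.0], [2.0, 0.0], 0, [0.1, 0.2], [0.1, 0.2])
loss_differ = distillation_loss([0.0, 2.0], [2.0, 0.0], 0, [0.1, 0.2], [0.5, 0.9])
```

In this sketch the teacher only contributes during training. At deployment the student would run on optical input alone, which matches the single-sensor setup the abstract describes.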
