What does this research mean for the field?

The proposed LMM-guided distillation framework improves object detection performance on both common and rare categories in power operations while maintaining real-time throughput on edge hardware. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

The aim is to enhance real-time object detection in power-grid operations using a lightweight model while preserving performance with rare objects.

March 8, 2026 · Open Access

LMM-guided knowledge distillation for power operation object detection in cloud-edge environment

Key Points

The aim is to enhance real-time object detection in power-grid operations using a lightweight model while preserving performance with rare objects.
Developed an LMM-guided distillation framework for transferring semantics from a large teacher model to a lightweight student model.
Used feature distillation to align student features with teacher region embeddings.
Employed prompt-aware logit distillation to match student logits to the teacher's smoothed prompt distribution.
Implemented vision-language contrastive alignment to associate student regions with the correct prompt embeddings.
Achieved consistent gains in detection performance for both common and rare object categories.
Maintained real-time throughput on edge hardware for effective deployment.
Demonstrated practical application of a cloud-to-edge pipeline for safety monitoring.
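
One plausible way to read the three transfer terms listed above is as auxiliary losses added to the student's detection loss. The weighted sum below is only a sketch: the source does not state how the terms are combined, and the symbols and weights $\lambda_1, \lambda_2, \lambda_3$ are hypothetical, introduced here for illustration.

```latex
\mathcal{L}_{\text{student}}
  = \mathcal{L}_{\text{det}}
  + \lambda_{1}\,\mathcal{L}_{\text{feat}}
  + \lambda_{2}\,\mathcal{L}_{\text{logit}}
  + \lambda_{3}\,\mathcal{L}_{\text{align}}
```

Here $\mathcal{L}_{\text{det}}$ stands for the standard detection objective, $\mathcal{L}_{\text{feat}}$ for feature distillation, $\mathcal{L}_{\text{logit}}$ for prompt-aware logit distillation, and $\mathcal{L}_{\text{align}}$ for vision-language contrastive alignment.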

Abstract

Power-grid field operations demand real-time visual monitoring to verify personal protective equipment and tool usage under a large depth of field. Conventional real-time detectors are efficient but closed-vocabulary, so they struggle with rare or unseen objects. Large multimodal models (LMMs) offer open-vocabulary understanding guided by prompts, yet they are too heavy for edge deployment. To address these challenges, we propose an LMM-guided distillation framework that transfers prompt-grounded semantics from a large teacher to a lightweight YOLO-style student. The teacher, queried with an expanded prompt set, produces pseudo labels and region–text embeddings. The student is trained with a standard detection objective and three semantic transfers. First, feature distillation aligns student features to teacher region embeddings via a linear projector. Second, prompt-aware logit distillation matches student logits to the teacher's temperature-smoothed prompt distribution. Third, vision–language contrastive alignment ties projected student regions to the correct prompt embedding. Experiments on two benchmark datasets indicate consistent gains on both common and rare categories while retaining real-time throughput on edge hardware, demonstrating a practical cloud-to-edge pipeline for safety monitoring.
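
To make the three semantic transfers concrete, the following is a minimal pure-Python sketch of each term for a single region. The abstract names the mechanisms but not their exact form, so the choices here are assumptions: mean squared error for feature distillation, KL divergence for logit distillation, and an InfoNCE-style cross-entropy over cosine similarities for contrastive alignment. All function names, temperatures, and toy values are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a smoother distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def project(features, weight):
    """Linear projector: maps a student feature vector into the teacher embedding space."""
    return [sum(w * f for w, f in zip(row, features)) for row in weight]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def feature_distillation_loss(student_feat, teacher_emb, weight):
    """Assumed form: MSE between the projected student feature and the teacher region embedding."""
    projected = project(student_feat, weight)
    return sum((p - t) ** 2 for p, t in zip(projected, teacher_emb)) / len(teacher_emb)


def logit_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Assumed form: KL(teacher || student) over prompts, both softened by the same temperature."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def contrastive_alignment_loss(projected_region, prompt_embs, target, tau=0.07):
    """Assumed form: cross-entropy of scaled cosine similarities against the correct prompt."""
    sims = [cosine(projected_region, e) for e in prompt_embs]
    probs = softmax(sims, tau)
    return -math.log(probs[target])


if __name__ == "__main__":
    # Toy example: 3-d student feature, 2-d teacher space, two prompts.
    weight = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]  # hypothetical projector weights
    student_feat = [1.0, 0.2, 0.8]
    teacher_emb = [1.0, 0.1]
    prompt_embs = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical prompt embeddings

    print(feature_distillation_loss(student_feat, teacher_emb, weight))
    print(logit_distillation_loss([2.0, 0.5], [3.0, 0.1]))
    print(contrastive_alignment_loss(project(student_feat, weight), prompt_embs, target=0))
```

Each loss is zero or near its minimum when the student already agrees with the teacher: identical logits give a KL term of zero, and a projected region that points at the correct prompt embedding gives a small alignment loss.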

Authors

Bingyang Li

Xiangyang Zhang

Lin Li

Journals

Journal of Cloud Computing: Advances, Systems and Applications
