What question did this study set out to answer?

The survey aims to explore recent advancements in few-shot learning techniques applicable to video and 3D object detection.

February 25, 2026

Few-Shot Learning in Video and 3D Object Detection: A Survey

Key Points

Systematic survey of few-shot, semi-supervised, sparsely-supervised, and weakly-supervised approaches.
Analysis of techniques like tube proposals, temporal matching networks, and motion-guided methods for video detection.
Investigation of uncertainty-aware methods, geometric learning, and multimodal fusion for 3D detection.
Focus on foundation models and vision-language model integration across applications.
Recent video detection architectures report substantial gains, with average precision improving from 33 to 48 in few-shot scenarios.
Sparsely-supervised techniques for 3D object detection achieve competitive performance using only 2% of annotations.
The survey highlights the potential of data-efficient learning to minimize annotation requirements in real-world applications.

Abstract

Few-shot learning (FSL) and data-efficient learning paradigms enable object detection models to recognize novel classes from minimally annotated examples, addressing expensive data-labeling challenges. This systematic survey examines recent advances in few-shot, semi-supervised, sparsely-supervised, and weakly-supervised approaches for video and 3D object detection, focusing on developments through foundation models and vision-language model integration. For video object detection, techniques including tube proposals, temporal matching networks, motion-guided approaches, and temporal consistency-based semi-supervised methods utilize spatiotemporal relationships for efficient novel class adaptation, with recent architectures achieving substantial gains from 33 to 48 average precision in few-shot scenarios. For 3D object detection, specialized approaches address point cloud sparsity and texture limitations through uncertainty-aware methods, geometric learning, and multimodal fusion, with sparsely-supervised techniques achieving competitive performance using only 2% of annotations, enabling practical deployment in autonomous driving and robotics. The survey analyzes methodological advances including meta-learning, transfer learning, pseudo-label generation, contrastive instance mining, and foundation model integration across applications spanning autonomous driving, surveillance, robotics, industrial control, and medical imaging. By examining developments across multiple supervision paradigms, this work highlights data-efficient learning’s potential for minimizing annotation requirements and enabling robust real-world deployment across temporal, spatial, and multimodal domains.
