Open-vocabulary object detection is a computer vision task that aims to recognize and localize objects from a wide, open-ended set of categories in images. Unlike traditional closed-set detectors, open-vocabulary methods can handle diverse object categories, which makes them suitable for real-time applications in dynamic environments. Existing methods typically achieve zero-shot detection by fusing image and text features. However, when semantic discrepancies exist between text and images, predictions become biased and the semantic guidance loses effectiveness. To address these issues, we propose a universal open-vocabulary object detection method that leverages foundation models to provide fine-grained semantic guidance for the detection process. We design a multi-level adaptive scene perception algorithm that captures subtle features of target objects in complex scenes, enabling precise separation of foreground and background. Additionally, we introduce the Text-KAN (T-KAN) model, which integrates textual descriptions with image features. By employing learnable activation functions, it removes the dependence on linear weight matrices, enhances text interpretability, corrects semantic biases, and achieves precise fine-grained alignment between images and text. We evaluate the proposed method on existing open-vocabulary benchmarks, with experiments on the COCO and LVIS datasets. The results demonstrate significant performance gains in detecting novel categories, highlighting the method’s strong generalization capabilities. This work provides valuable insights and references for advancing the field of open-vocabulary object detection.
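The abstract does not describe T-KAN's internals, so the following is only an illustrative sketch of the two general ideas it names: a KAN-style layer in which every input-to-output edge carries its own learnable one-dimensional function instead of a single linear weight, and a cosine-similarity match between a region feature and the transformed text embeddings. The Gaussian basis functions, the class name KanStyleLayer, the helper classify_region, and all toy vectors are assumptions made for illustration, not the authors' implementation.

```python
import math


def rbf_basis(x, centers, width):
    """Gaussian basis values of a scalar x at fixed grid centers (assumed basis)."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]


class KanStyleLayer:
    """Hypothetical KAN-style layer: y_j = sum_i phi_ij(x_i).

    Each phi_ij is a weighted sum of Gaussian basis functions, and the
    weights are the learnable parameters. This stands in for the
    "linear matrix followed by a fixed activation" of an MLP layer.
    """

    def __init__(self, in_dim, out_dim, centers, width, coeffs):
        self.in_dim = in_dim
        self.out_dim = out_dim
        self.centers = centers
        self.width = width
        # coeffs[j][i][k]: weight of basis k on the edge from input i to output j
        self.coeffs = coeffs

    def forward(self, x):
        # Evaluate the shared basis once per input coordinate.
        basis = [rbf_basis(xi, self.centers, self.width) for xi in x]
        out = []
        for j in range(self.out_dim):
            total = 0.0
            for i in range(self.in_dim):
                total += sum(c * b for c, b in zip(self.coeffs[j][i], basis[i]))
            out.append(total)
        return out


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def classify_region(region_feat, text_feats, layer):
    """Pick the label whose transformed text embedding best matches the region."""
    scores = {
        label: cosine(region_feat, layer.forward(t))
        for label, t in text_feats.items()
    }
    return max(scores, key=scores.get), scores


# Toy usage with made-up 2-D embeddings and hand-set coefficients.
centers = [-1.0, 0.0, 1.0]
coeffs = [
    [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]],  # output 0 depends only on input 0
    [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],  # output 1 depends only on input 1
]
layer = KanStyleLayer(2, 2, centers, 1.0, coeffs)
label, scores = classify_region(
    [1.0, 0.0], {"cat": [0.0, 5.0], "dog": [5.0, 0.0]}, layer
)
print(label, scores)
```

In a trained model the coefficients would be learned jointly with the detector; here they are fixed by hand only to show that the per-edge functions can reshape the text embedding nonlinearly before it is compared with image features.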
Jing Wang
Yonghua Cao
Zhanqiang Huo
Multimedia Systems
Henan Polytechnic University
Wang et al. studied this question.
www.synapsesocial.com/papers/69a7601bc6e9836116a2c8b3 — DOI: https://doi.org/10.1007/s00530-026-02211-2