March 3, 2026 | Open Access

Towards universal object detection: a fine-grained perspective on open-vocabulary object detection

Key Points

The proposed method demonstrates significant performance improvements in detecting novel object categories in dynamic environments.
Integrating features from the Text-KAN (T-KAN) model is reported to yield a notable increase in detection precision.
Evaluation on open-vocabulary benchmarks, specifically the COCO and LVIS datasets, highlights the method's strong generalization abilities.
The method may enable advances in open-vocabulary object detection, though further validation across diverse datasets is needed.

Abstract

Open-vocabulary object detection is a computer vision task that aims to recognize and localize a broad range of objects in images. Unlike traditional detectors, it can handle diverse object categories and is suitable for real-time applications in dynamic environments. Existing methods typically achieve zero-shot detection by fusing image and text information. However, when semantic discrepancies exist between text and images, predictions become biased, which diminishes the effectiveness of semantic guidance. To address this issue, we propose a universal open-vocabulary object detection method that leverages foundation models to provide fine-grained semantic guidance for the detection process. We design a multi-level adaptive scene perception algorithm that captures subtle features of target objects in complex scenes, enabling precise separation of foreground and background. Additionally, we introduce the Text-KAN (T-KAN) model, which integrates textual descriptions with image features. By employing learnable activation functions, it removes the dependence on linear weight matrices, enhances text interpretability, corrects semantic biases, and achieves fine-grained alignment between images and text. We evaluate the proposed method on existing open-vocabulary benchmarks, with experiments on the COCO and LVIS datasets. The results demonstrate significant performance gains in detecting novel categories, highlighting the method’s strong generalization capabilities. This work provides insights and a reference point for advancing the field of open-vocabulary object detection.
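
The abstract describes T-KAN only at a high level, so the following is a minimal sketch of the general idea rather than the authors' model: a KAN-style layer in which every input-output edge carries its own learnable one-dimensional function instead of a single linear weight, applied to a text embedding before it is scored against region features. The names `KANLayer`, `rbf_basis`, and `cosine` are hypothetical, the Gaussian basis is an assumed simplification of the spline bases often used in KAN layers, and the embeddings are made-up toy values. No training loop is shown.

```python
import math
import random


def rbf_basis(x, centers, width):
    """Evaluate one Gaussian bump per center at the scalar x."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]


class KANLayer:
    """KAN-style layer: y_j = sum_i phi_ij(x_i).

    Each edge function phi_ij is a weighted sum of fixed Gaussian bumps.
    The weights in `coef` are the parameters that would be trained.
    This is an illustrative assumption, not the T-KAN architecture.
    """

    def __init__(self, in_dim, out_dim, num_basis=5, seed=0):
        rng = random.Random(seed)
        self.in_dim = in_dim
        self.out_dim = out_dim
        # Basis centers evenly spaced on [-1, 1] (assumed input range).
        self.centers = [-1 + 2 * k / (num_basis - 1) for k in range(num_basis)]
        self.width = 2 / (num_basis - 1)
        # coef[j][i][k]: weight of basis k in the edge function phi_ij.
        self.coef = [
            [[rng.gauss(0, 0.1) for _ in range(num_basis)] for _ in range(in_dim)]
            for _ in range(out_dim)
        ]

    def forward(self, x):
        if len(x) != self.in_dim:
            raise ValueError("input length must equal in_dim")
        # Evaluate the shared basis once per input coordinate.
        basis = [rbf_basis(xi, self.centers, self.width) for xi in x]
        return [
            sum(
                c * b
                for i in range(self.in_dim)
                for c, b in zip(self.coef[j][i], basis[i])
            )
            for j in range(self.out_dim)
        ]


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


# Toy usage: project a 4-d text embedding into a 3-d region-feature space,
# then score it against two candidate regions.
text_embedding = [0.2, -0.5, 0.9, 0.1]
region_features = [[0.3, -0.1, 0.8], [-0.6, 0.4, 0.0]]

layer = KANLayer(in_dim=4, out_dim=3)
projected_text = layer.forward(text_embedding)
scores = [cosine(projected_text, region) for region in region_features]
```

In a real detector the coefficients would be learned jointly with the rest of the network, and the similarity scores would feed the classification head. Here the coefficients are random, so the scores carry no meaning beyond showing the data flow.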

Authors

Jing Wang

Yonghua Cao

Zhanqiang Huo

Journals

Multimedia Systems

Institutions

Henan Polytechnic University
