What question did this study set out to answer?

Explore enhancements in object detection by incorporating pseudo-depth features alongside traditional RGB inputs.

March 28, 2026Open Access

Pseudo-depth-based deep neural network model for object detection

Puntos clave

Explore enhancements in object detection by incorporating pseudo-depth features alongside traditional RGB inputs.
Utilized a monocular depth estimation model to extract pseudo-depth features from images.
Combined pseudo-depth and RGB features for neural network training.
Performed tests on M $$^3$$ FD and COCO datasets to evaluate detection performance.
Achieved an increase in detection metric mAP 50 by 3.8 percentage points on the M $$^3$$ FD dataset.
Achieved an increase in detection metric mAP 50 by 8.0 percentage points on the COCO dataset.
Demonstrated potential for easy integration into existing machine learning models.

Resumen

Abstract Current machine learning methods only utilize the three-channel color features of optical images for computer visual tasks. However, the optical images only explicitly present information of RGB color and two-dimensional planar shape, where the third-dimensional spatial features are not fully exploited. This limitation restricts the potential improvement in recognition performance. To address this issue, we propose a detection scheme to enhance model’s detection capabilities based on four independent features by combining the pseudo-depth and the RGB features without adding any additional hardware sensors. The monocular depth estimation model is first used as a virtual depth sensor to extract the pseudo-depth features from input optical images. Then the fused Depth-RGB features are fed into the neural network model for object detection training and inference to enhance capability for extracting spatial features. Experiments show that the proposed method has improved the detection metric mAP ₅₀ by 3. 8 and 8. 0 percentage points on the public M ³ FD and COCO datasets, respectively. Notably, the scheme can be easily embedded into any machine learning models to definitely improve the detection performance.

Me gusta

Guardar

Ver artículo completo