Abstract

Current machine learning methods use only the three-channel color features of optical images for computer vision tasks. However, optical images explicitly present only RGB color and two-dimensional planar shape, so features along the third spatial dimension are not fully exploited. This limitation restricts potential improvements in recognition performance. To address this issue, we propose a detection scheme that enhances a model's detection capability by using four independent features, combining pseudo-depth with the RGB features without adding any hardware sensors. A monocular depth estimation model is first used as a virtual depth sensor to extract pseudo-depth features from the input optical images. The fused Depth-RGB features are then fed into the neural network model for object detection training and inference, which strengthens its ability to extract spatial features. Experiments show that the proposed method improves the detection metric mAP₅₀ by 3.8 and 8.0 percentage points on the public M³FD and COCO datasets, respectively. Notably, the scheme can be easily embedded into any machine learning model to improve detection performance.
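As a rough illustration of the fusion step, the sketch below stacks a pseudo-depth map onto an RGB image to form a four-channel input. It is only a minimal sketch under stated assumptions: the abstract does not specify how the features are fused, so channel-wise concatenation, the depth-last channel order, and the min-max normalization are all assumptions. The function names `placeholder_depth` and `fuse_depth_rgb` are hypothetical, and `placeholder_depth` is a stand-in that returns a synthetic gradient rather than the output of a real monocular depth estimation model.

```python
import numpy as np


def placeholder_depth(rgb: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a monocular depth estimator.

    Returns a synthetic top-to-bottom gradient of shape (H, W).
    A real pipeline would call a trained depth model here instead.
    """
    h, w, _ = rgb.shape
    column = np.linspace(0.0, 1.0, h, dtype=np.float32)[:, None]
    return np.tile(column, (1, w))


def fuse_depth_rgb(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    """Concatenate an RGB image and a depth map into an (H, W, 4) array.

    Assumptions (not taken from the source): RGB is uint8 in [0, 255],
    depth is min-max normalized to [0, 1], and depth is the last channel.
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if depth.shape != rgb.shape[:2]:
        raise ValueError("depth must have shape (H, W) matching rgb")

    rgb_f = rgb.astype(np.float32) / 255.0

    # Min-max normalize the depth map; a constant map becomes all zeros.
    d_min, d_max = float(depth.min()), float(depth.max())
    span = d_max - d_min
    if span > 0:
        depth_n = (depth.astype(np.float32) - d_min) / span
    else:
        depth_n = np.zeros(depth.shape, dtype=np.float32)

    return np.concatenate([rgb_f, depth_n[..., None]], axis=2)


if __name__ == "__main__":
    image = np.zeros((4, 6, 3), dtype=np.uint8)
    fused = fuse_depth_rgb(image, placeholder_depth(image))
    print(fused.shape)
```

A detector that consumes such an input would need its first layer to accept four channels instead of three; how the original work handles this is not stated in the abstract.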
Li et al. (Thu,) studied this question.