Accurate measurement of the spatial positions of targets observed by a fixed camera is critical in remote sensing applications. Visual spatial positioning methods that rely solely on images are susceptible to adverse factors such as inaccurate camera calibration, imprecise target detection in images, and incorrect feature point selection. Complementary to images, ubiquitous Global Navigation Satellite System (GNSS) data can provide the spatial positions of targets, but most of these data come from low-cost GNSS receivers with significant positioning noise. To fuse these two valuable but flawed positioning measurements and improve the accuracy and stability of spatial positioning, we propose a deep learning multi-modal spatial positioning method that fuses sequential uncalibrated video images with low-cost GNSS data. First, a self-supervised cascade denoising auto-encoder (SCDAE) architecture is built to make the auto-encoder robust to noise in the raw inputs. Then, based on the SCDAE and Bayesian optimal estimation, a Bayesian self-supervised multi-modal fusion positioning method, SCDAE-MFP, is presented to achieve accurate and stable spatial positioning through self-supervised manifold learning. Specifically, to provide visual self-supervision to SCDAE-MFP, a visual position denoising auto-encoder module based on dual unsupervised learning is proposed. Extensive experiments on public datasets showed that SCDAE-MFP outperformed five classical and state-of-the-art baseline methods, reducing positioning errors by an average of 56.79%.
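As a rough illustration of what "Bayesian optimal estimation" means when combining two noisy position sources, the sketch below fuses a visual position estimate with a GNSS fix, assuming both are independent Gaussian measurements of the same 2-D position with known covariances. It uses the textbook product-of-Gaussians (inverse-covariance weighting) rule; it is not the SCDAE-MFP method itself, which learns the fusion with self-supervised denoising auto-encoders. The function name `fuse_gaussian` and all numeric values are hypothetical.

```python
import numpy as np


def fuse_gaussian(mu_a, cov_a, mu_b, cov_b):
    """Fuse two independent Gaussian estimates of the same position.

    Assumption (not from the paper): both measurements are unbiased,
    Gaussian, and independent. Under that model the posterior is the
    normalized product of the two Gaussians, whose mean weights each
    input by its inverse covariance (information matrix).
    """
    info_a = np.linalg.inv(cov_a)          # information of source A
    info_b = np.linalg.inv(cov_b)          # information of source B
    cov_f = np.linalg.inv(info_a + info_b) # fused covariance
    mu_f = cov_f @ (info_a @ mu_a + info_b @ mu_b)  # fused mean
    return mu_f, cov_f


# Hypothetical inputs, in metres: a fairly precise visual estimate
# and a noisy low-cost GNSS fix (3 m standard deviation per axis).
visual_pos = np.array([10.0, 5.0])
visual_cov = np.diag([1.0, 1.0])
gnss_pos = np.array([13.0, 2.0])
gnss_cov = np.diag([9.0, 9.0])

fused_pos, fused_cov = fuse_gaussian(visual_pos, visual_cov, gnss_pos, gnss_cov)
print(fused_pos, np.diag(fused_cov))
```

With these hypothetical inputs the fused position is about (10.3, 4.7) m with a variance of 0.9 m² per axis: the result lies close to the more precise visual estimate, and its variance is smaller than that of either input. A learned fusion such as the one described above is motivated by cases where these Gaussian, known-covariance assumptions do not hold.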
Xiaofei Zeng
Ruliang He
Seunghyo Han
Remote Sensing
Sichuan University
Zeng et al. studied this question.
www.synapsesocial.com/papers/69df2b49e4eeef8a2a6b0418 — DOI: https://doi.org/10.3390/rs18081161