Worldwide visual geo-localization aims to predict the geographic coordinates of an image's capture location from visual content alone. The task is difficult because of the vast scale of the Earth’s surface and the pervasive visual ambiguity between distant regions. Existing approaches each have limitations: retrieval-based methods demand massive geo-tagged databases and scale poorly; alignment-based models lack interpretability and are vulnerable to visually similar scenes; and large vision-language models (LVLMs) offer semantic reasoning but suffer from hallucination. A natural solution is retrieval-augmented generation (RAG), yet we observe that directly injecting retrieved candidates as context causes severe context poisoning. To address this, we propose HybridGeo, a dual-stream late-fusion framework that decouples retrieval from reasoning. A retrieval stream applies continuous alignment with spatial–semantic clustering to produce stable regional anchors; a reasoning stream performs context-free Chain-of-Thought inference to yield an independent coordinate estimate. The two streams are fused only at the decision stage by a spatial-consistency module, which applies weighted averaging when the two estimates agree and confidence-based arbitration when they conflict. Experiments on Im2GPS3k show that HybridGeo achieves 73.89% Country@750km accuracy, outperforming the retrieval baseline by 7.27% and 8.23%, and surpassing both VLM-only and RAG baselines. These results demonstrate that late fusion effectively avoids context poisoning while enabling complementary benefits from both streams.
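The decision-stage fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`haversine_km`, `weighted_mean`, `late_fuse`), the scalar confidence inputs, the use of great-circle distance as the agreement test, and the 750 km agreement threshold are all assumptions made for illustration. The abstract does not specify how agreement is measured or how the weights are derived.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def _to_unit(p):
    """Convert (lat, lon) in degrees to a 3-D unit vector."""
    lat, lon = map(math.radians, p)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def weighted_mean(a, wa, b, wb):
    """Weighted average of two points, computed on unit vectors so that
    points on either side of the antimeridian average correctly."""
    x, y, z = (wa * p + wb * q for p, q in zip(_to_unit(a), _to_unit(b)))
    lat = math.atan2(z, math.hypot(x, y))
    lon = math.atan2(y, x)
    return (math.degrees(lat), math.degrees(lon))


def late_fuse(retrieval, retrieval_conf, reasoning, reasoning_conf, agree_km=750.0):
    """Hypothetical late fusion of two independent coordinate estimates.

    agree_km is an assumed threshold, not a value taken from the paper.
    """
    if haversine_km(retrieval, reasoning) <= agree_km:
        # Agreement: blend the two estimates, weighted by confidence.
        return weighted_mean(retrieval, retrieval_conf, reasoning, reasoning_conf)
    # Conflict: arbitrate by keeping the more confident stream's estimate.
    return retrieval if retrieval_conf >= reasoning_conf else reasoning
```

Under these assumptions, two nearby estimates are blended into a single point between them, while two estimates on different continents are not averaged; the more confident stream's estimate is returned unchanged. Because each stream produces its estimate independently, neither can contaminate the other's input, which is the property the abstract attributes to late fusion.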
Tang et al. studied this question.
www.synapsesocial.com/papers/69db375f4fe01fead37c564b — DOI: https://doi.org/10.3390/ijgi15040163
Yong Tang
Jianhua Gong
Yi Li
ISPRS International Journal of Geo-Information
Chinese Academy of Sciences
University of Chinese Academy of Sciences
Aerospace Information Research Institute