May 20, 2026 · Open Access

Semantic Visual Simultaneous Localization and Mapping: A survey on state of the art, challenges, and future directions

Abstract

Semantic Visual Simultaneous Localization and Mapping (Semantic vSLAM) is a critical area of research in robotics and computer vision. It focuses on simultaneously localizing a robotic system and associating semantic information with the map, in order to construct the most accurate and comprehensive model of the surrounding environment. Since the first foundational work on Semantic vSLAM appeared more than two decades ago, the field has attracted increasing attention across various scientific communities. Despite its significance, the field lacks comprehensive surveys encompassing recent advances and persistent challenges. In response, this study provides a thorough examination of the state of the art in Semantic vSLAM techniques, with the aim of illuminating current trends and key obstacles. Beginning with an in-depth exploration of the evolution of visual SLAM, this study outlines its strengths and unique characteristics while also critically assessing previous survey literature. Subsequently, a unified problem formulation and evaluation of the modular solution framework is proposed, which decomposes the problem into discrete stages: visual localization, semantic feature extraction, mapping, data association, and loop closure optimization. Moreover, this study investigates alternative methodologies such as deep learning and the use of large language models, alongside a review of relevant research on contemporary SLAM datasets. Concluding with a discussion of potential future research directions, this study serves as a comprehensive resource for researchers seeking to navigate the complex landscape of Semantic vSLAM.

• Provides a comprehensive, structured review of the entire Semantic vSLAM pipeline, covering semantic extraction, semantic localization, semantic mapping, semantic data association, and semantic loop closure optimization.
• Presents a detailed analysis of two transformative, state-of-the-art paradigms: Continuous Representations (e.g., NeRFs, 3D Gaussian Splatting) and Foundation Models (e.g., VLMs, LLMs).
• Includes a comprehensive comparison table of over 40 state-of-the-art Semantic vSLAM systems, evaluating them across 13 distinct technical criteria.
• Extends the review to advanced frontiers, including Multi-Robot Semantic vSLAM and the critical challenges of building lifelong mapping systems.
• Identifies and discusses key open research questions, providing a clear roadmap for future innovation in robust, intelligent robotic perception.

Authors

Thanh Nguyen Canh

Haolan Zhang

Xiem HoangVan

Journals

Robotics and Autonomous Systems

Institutions

Hanyang University

Japan Advanced Institute of Science and Technology

Anyang University
