Semantic Visual Simultaneous Localization and Mapping (Semantic vSLAM) is a critical area of research in robotics and computer vision, focusing on the simultaneous localization of robotic systems and the association of semantic information to construct the most accurate and comprehensive model of the surrounding environment. Since the first foundational work on Semantic vSLAM appeared more than two decades ago, the field has attracted increasing attention across various scientific communities. Despite its significance, the field lacks comprehensive surveys encompassing recent advances and persistent challenges. In response, this study provides a thorough examination of state-of-the-art Semantic vSLAM techniques, with the aim of illuminating current trends and key obstacles. Beginning with an in-depth exploration of the evolution of visual SLAM, this study outlines its strengths and unique characteristics while also critically assessing previous survey literature. Subsequently, a unified problem formulation and an evaluation of the modular solution framework are proposed, decomposing the problem into discrete stages: visual localization, semantic feature extraction, mapping, data association, and loop closure optimization. Moreover, this study investigates alternative methodologies such as deep learning and the use of large language models, alongside a review of relevant research on contemporary SLAM datasets. Concluding with a discussion of potential future research directions, this study serves as a comprehensive resource for researchers seeking to navigate the complex landscape of Semantic vSLAM.

• Provides a comprehensive, structured review of the entire Semantic vSLAM pipeline, from semantic extraction through semantic localization, semantic mapping, and semantic data association to semantic loop closure optimization.
• Presents a detailed analysis of two transformative, state-of-the-art paradigms: Continuous Representations (e.g., NeRFs, 3D Gaussian Splatting) and Foundation Models (e.g., VLMs, LLMs).
• Includes a comprehensive comparison table of over 40 state-of-the-art Semantic vSLAM systems, evaluating them across 13 distinct technical criteria.
• Extends the review to advanced frontiers, including Multi-Robot Semantic vSLAM and the critical challenges of building lifelong mapping systems.
• Identifies and discusses key open research questions, providing a clear roadmap for future innovation in robust, intelligent robotic perception.
Thanh Nguyen Canh
Haolan Zhang
Xiem HoangVan
Robotics and Autonomous Systems
Hanyang University
Japan Advanced Institute of Science and Technology
Anyang University
Canh et al. studied this question.
www.synapsesocial.com/papers/6a0d4e9df03e14405aa99cd7 — DOI: https://doi.org/10.1016/j.robot.2026.105535