What does this research mean for the field?

SVLGaussian, a method that combines multimodal large language models with 3D Gaussian splatting, enables real-time, open-vocabulary 3D semantic querying from a single viewpoint. Novelty: methodological. Consensus alignment: neutral.

May 30, 2026 · Open Access

SVLGaussian: Single View Language Gaussian Splatting

Abstract

Open-vocabulary 3D querying based on 3D Gaussian splatting (3DGS) shows great promise for giving AI systems accurate 3D query capabilities. These methods typically rely on pre-captured multi-view images to enable natural-language interaction with 3D scenes. In practice, however, when embodied AI encounters an unexplored scene, it is difficult to obtain observations from different viewpoints beforehand. This challenge highlights the importance of natural-language-driven 3D scene querying from a single, current viewpoint. This paper proposes single view language Gaussian splatting (SVLGaussian) for a novel task: open-vocabulary 3D querying from a single input view. By leveraging multi-round inference of multimodal large language models, SVLGaussian efficiently generates pixel-level semantic probabilities and rapidly embeds them into a 3D Gaussian field, enabling real-time language-guided semantic querying. To evaluate the model, we annotated three datasets: LERF-OVS and 3D-OVS, which are tailored to open-vocabulary 3D querying, and RE10K, which is adapted for single-view 3D reconstruction. Both quantitative and qualitative results show that our method effectively supports open-vocabulary 3D querying from a single view.
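The abstract says that pixel-level semantic probabilities are embedded into a 3D Gaussian field for real-time querying, but it does not describe how a query is rendered. As a hedged illustration only, the sketch below assumes that each Gaussian stores a probability vector over the query labels and that these vectors are rendered with the standard 3DGS front-to-back alpha compositing used for color, p(x) = sum_i alpha_i T_i p_i with T_i = prod_{j<i} (1 - alpha_j). The function names `composite_semantics` and `is_queried` and the fixed threshold rule are hypothetical and are not taken from SVLGaussian.

```python
from typing import List, Sequence


def composite_semantics(
    alphas: Sequence[float],
    probs: Sequence[Sequence[float]],
) -> List[float]:
    """Alpha-composite per-Gaussian semantic probabilities along one pixel ray.

    Assumption (hypothetical representation, not the paper's):
      alphas[i] -- effective opacity of the i-th Gaussian at this pixel,
                   with Gaussians sorted nearest-first.
      probs[i]  -- that Gaussian's probability vector over K query labels.
    Mirrors standard 3DGS color rendering: weight_i = alpha_i * T_i.
    """
    if len(alphas) != len(probs):
        raise ValueError("alphas and probs must have the same length")
    num_labels = len(probs[0]) if probs else 0
    out = [0.0] * num_labels
    transmittance = 1.0  # T_i = prod_{j<i} (1 - alpha_j), starts at 1
    for alpha, p in zip(alphas, probs):
        if len(p) != num_labels:
            raise ValueError("all probability vectors must have the same length")
        weight = alpha * transmittance
        for k in range(num_labels):
            out[k] += weight * p[k]
        transmittance *= 1.0 - alpha
    # Not renormalized: the leftover transmittance is mass that falls on background.
    return out


def is_queried(pixel_probs: Sequence[float], label_index: int, threshold: float = 0.5) -> bool:
    """Hypothetical query rule: the pixel matches if the label's probability passes a threshold."""
    return pixel_probs[label_index] >= threshold


if __name__ == "__main__":
    # Two Gaussians on one ray; labels are e.g. ["chair", "table"] (illustrative only).
    pixel = composite_semantics([0.6, 0.5], [[0.9, 0.1], [0.2, 0.8]])
    print(pixel, is_queried(pixel, 0))
```

Here the nearer Gaussian (opacity 0.6) dominates, so label 0 receives 0.6 × 0.9 + 0.5 × 0.4 × 0.2 = 0.58 and passes the 0.5 threshold. Rendering probabilities through the same compositing path as color is what makes per-pixel querying as cheap as ordinary 3DGS rendering, which is one plausible reading of the "real-time" claim.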
