What question did this study set out to answer?

The aim is to enhance LiDAR-based 3D object detection by addressing vertical geometric information loss.

March 15, 2026 | Open Access

SV-TransFusion for LiDAR 3D object detection with Sparse Voxel–Query Interaction

Key Points

The aim is to enhance LiDAR-based 3D object detection by addressing vertical geometric information loss.
Introduced the SV-TransFusion framework.
Developed the Sparse Voxel-Query Interaction (SVQI) module, which lets object queries attend directly to sparse, non-empty 3D voxels to retrieve fine-grained height and structural information.
Implemented a Query-based Contrastive Denoising (QCD) strategy to accelerate convergence and improve training stability.
Achieved state-of-the-art performance on the nuScenes dataset.
Significantly improved detection accuracy over baseline methods.
Maintained moderate computational overhead.
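
To make the denoising idea more concrete, the sketch below builds noise-corrupted queries from ground-truth box centers. The study summary only states that noise-corrupted queries are introduced during training. The positive/negative split, the noise radii, the background label convention, and the function name `make_denoising_queries` are all assumptions borrowed from common contrastive denoising setups in DETR-style detectors, not the paper's actual recipe.

```python
import numpy as np

def make_denoising_queries(gt_centers, gt_labels, num_classes,
                           pos_scale=0.2, neg_scale=1.0, seed=0):
    """Hypothetical helper: build positive and negative noised queries.

    Positives are shifted by less than pos_scale and keep their ground-truth
    label. Negatives are shifted by pos_scale..neg_scale and are assigned the
    background label (index num_classes). All scales are illustrative.
    """
    rng = np.random.default_rng(seed)
    gt_centers = np.asarray(gt_centers, dtype=float)
    gt_labels = np.asarray(gt_labels)
    n = len(gt_centers)

    def offsets(r_min, r_max):
        # Random unit direction per box, scaled by a radius in [r_min, r_max).
        direction = rng.normal(size=gt_centers.shape)
        direction /= np.linalg.norm(direction, axis=1, keepdims=True)
        radius = rng.uniform(r_min, r_max, size=(n, 1))
        return radius * direction

    positives = gt_centers + offsets(0.0, pos_scale)
    negatives = gt_centers + offsets(pos_scale, neg_scale)

    query_centers = np.concatenate([positives, negatives], axis=0)
    # Targets are known by construction: positives map to their source box,
    # negatives map to "no object" (-1).
    target_labels = np.concatenate([gt_labels, np.full(n, num_classes)])
    target_index = np.concatenate([np.arange(n), np.full(n, -1)])
    return query_centers, target_labels, target_index

# Two hypothetical ground-truth boxes (x, y, z centers) with class ids 0 and 3.
gt_centers = np.array([[10.0, 2.0, 0.5], [-4.0, 7.5, 1.0]])
gt_labels = np.array([0, 3])
queries, labels, matched = make_denoising_queries(gt_centers, gt_labels, num_classes=10)
```

In setups of this kind the noised queries are used only during training, alongside the regular learnable queries. Because each noised query already carries its target, its loss can be computed without going through bipartite matching.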

Abstract

LiDAR-based 3D object detection has witnessed significant progress with the introduction of Transformer architectures. Currently, Bird’s-Eye-View (BEV) based methods, such as TransFusion, dominate the field by flattening 3D voxel features into 2D representations for efficient query processing. However, this projection inevitably leads to the loss of crucial vertical geometric information, resulting in suboptimal performance for objects with complex height profiles or in occluded scenarios. In this paper, we present SV-TransFusion, a novel framework designed to mitigate this limitation by re-establishing the connection between object queries and raw 3D structural data. Our approach incorporates two primary innovations. First, we propose the Sparse Voxel-Query Interaction (SVQI) module. Instead of relying solely on compressed BEV features, SVQI allows learnable queries to directly attend to the sparse, non-empty 3D voxels from the backbone, effectively retrieving fine-grained height and structural information. Second, to accelerate convergence and enhance training stability, we introduce a Query-based Contrastive Denoising (QCD) strategy. This mechanism aids the bipartite matching process by introducing noise-corrupted queries during training, thereby enabling the model to learn more robust feature representations. Extensive experiments on the nuScenes dataset demonstrate that SV-TransFusion achieves state-of-the-art performance, significantly outperforming baseline methods in detection accuracy with a moderate computational overhead.
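
The contrast the abstract draws between BEV flattening and direct voxel attention can be illustrated with a small NumPy sketch. It first collapses sparse voxels into a BEV map by max-pooling over height, then lets queries cross-attend to the same sparse voxels with a position term that includes z. The toy data, the linear position embedding `w_pos`, the single attention head, and the function names are assumptions made for illustration. This is not the SVQI module as implemented in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sparse voxels: integer (z, y, x) indices of non-empty cells
# and an 8-dimensional feature per voxel. The first two voxels sit at
# different heights above the same BEV cell (y=2, x=3).
coords = np.array([[0, 2, 3], [4, 2, 3], [1, 5, 5]])
feats = rng.normal(size=(3, 8))

def bev_flatten(coords, feats, grid_hw):
    """Max-pool voxel features over z into a dense (H, W, C) BEV map."""
    h, w = grid_hw
    bev = np.full((h, w, feats.shape[1]), -np.inf)
    for (_, y, x), f in zip(coords, feats):
        bev[y, x] = np.maximum(bev[y, x], f)
    bev[np.isinf(bev)] = 0.0  # empty cells become zero vectors
    return bev

def voxel_query_attention(queries, coords, feats, w_pos):
    """Single-head cross-attention from Q queries to N sparse voxels."""
    # Keys carry a position term built from (z, y, x), so height is visible.
    keys = feats + coords.astype(float) @ w_pos             # (N, C)
    scores = queries @ keys.T / np.sqrt(queries.shape[1])   # (Q, N)
    scores = scores - scores.max(axis=1, keepdims=True)     # stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ feats, weights

bev = bev_flatten(coords, feats, grid_hw=(8, 8))
# After flattening, cell (2, 3) holds one vector; the two heights are merged.

queries = rng.normal(size=(2, 8))        # stand-in for learnable object queries
w_pos = rng.normal(size=(3, 8)) * 0.1    # hypothetical linear position embedding
out, weights = voxel_query_attention(queries, coords, feats, w_pos)
```

Here `weights` has one column per voxel, so a query can weight the z=0 and z=4 voxels differently, which is not possible once they have been merged into a single BEV cell. Attending only to non-empty voxels keeps the key set small relative to a dense 3D grid.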

Authors

Tianli Shi

Journals

Scientific Reports

Institutions

China Electronics Technology Group Corporation
