June 12, 2024 (Open Access)

UnO: Unsupervised Occupancy Fields for Perception and Forecasting

Abstract

Perceiving the world and forecasting its future state is a critical task for self-driving. Supervised approaches leverage annotated object labels to learn a model of the world -- traditionally with object detections and trajectory predictions, or temporal bird's-eye-view (BEV) occupancy fields. However, these annotations are expensive and typically limited to a set of predefined categories that do not cover everything we might encounter on the road. Instead, we learn to perceive and forecast a continuous 4D (spatio-temporal) occupancy field with self-supervision from LiDAR data. This unsupervised world model can be easily and effectively transferred to downstream tasks. We tackle point cloud forecasting by adding a lightweight learned renderer and achieve state-of-the-art performance in Argoverse 2, nuScenes, and KITTI. To further showcase its transferability, we fine-tune our model for BEV semantic occupancy forecasting and show that it outperforms the fully supervised state-of-the-art, especially when labeled data is scarce. Finally, when compared to prior state-of-the-art on spatio-temporal geometric occupancy prediction, our 4D world model achieves a much higher recall of objects from classes relevant to self-driving.
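
The abstract does not spell out how LiDAR data supervises the occupancy field, so the following is only a minimal sketch of one plausible reading: space along a LiDAR ray before its return is treated as free, and a thin region just behind the return is treated as occupied. The function name `ray_occupancy_labels`, the sample counts, and the `buffer_m` depth are hypothetical choices for illustration and are not taken from the paper.

```python
import random


def ray_occupancy_labels(origin, hit, t, n_free=4, n_occ=2, buffer_m=0.1, rng=None):
    """Build pseudo-labels for one LiDAR ray (illustrative assumption only).

    Returns a list of ((x, y, z, t), label) pairs, where label 0 marks a
    query point assumed free and label 1 marks one assumed occupied.
    """
    rng = rng or random.Random(0)

    # Unit direction from the sensor origin to the LiDAR return.
    direction = [h - o for h, o in zip(hit, origin)]
    length = sum(d * d for d in direction) ** 0.5
    if length == 0.0:
        raise ValueError("hit must differ from origin")
    unit = [d / length for d in direction]

    def point_at(dist):
        # Spatio-temporal query point at `dist` meters along the ray.
        return tuple(o + dist * u for o, u in zip(origin, unit)) + (t,)

    samples = []
    # The beam travelled unobstructed up to the return: label as free.
    for _ in range(n_free):
        samples.append((point_at(rng.random() * length), 0))
    # Something reflected the beam: label a thin shell behind the return as occupied.
    for _ in range(n_occ):
        samples.append((point_at(length + rng.random() * buffer_m), 1))
    return samples


samples = ray_occupancy_labels(origin=(0.0, 0.0, 0.0), hit=(10.0, 0.0, 0.0), t=0.5)
```

Under these assumptions, pairs like these could train a model that maps a continuous query `(x, y, z, t)` to an occupancy probability, with no object annotations involved.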

Authors

Ben Agro

Quinlan Sykora

Sergio Casas
