What type of study is this?

This is an experimental study.

October 19, 2025

Open Access

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

Key Points

State-of-the-art MLLMs struggle with real-world spatial-temporal understanding, particularly in tasks requiring precise distance estimation.
Extensive experiments in STI-Bench highlight the limitations of current MLLMs in predicting object motion and appearance.
End-to-end applications in embodied AI and autonomous driving demand improved spatial-temporal intelligence from MLLMs.
The benchmark tasks across various scenarios expose critical gaps in spatial-temporal performance of MLLMs.

Abstract

The use of Multimodal Large Language Models (MLLMs) as an end-to-end solution for Embodied AI and Autonomous Driving has become a prevailing trend. While MLLMs have been extensively studied for visual semantic understanding tasks, their ability to perform precise and quantitative spatial-temporal understanding in real-world applications remains largely unexamined, leading to uncertain prospects. To evaluate models' Spatial-Temporal Intelligence, we introduce STI-Bench, a benchmark designed to evaluate MLLMs' spatial-temporal understanding through challenging tasks such as estimating and predicting the appearance, pose, displacement, and motion of objects. Our benchmark encompasses a wide range of robot and vehicle operations across desktop, indoor, and outdoor scenarios. Extensive experiments reveal that state-of-the-art MLLMs still struggle with real-world spatial-temporal understanding, especially in tasks requiring precise distance estimation and motion analysis.

Cite this study

Li et al. studied this question.

www.synapsesocial.com/papers/68f4b10d3d9d770bbc696d56 — DOI: https://doi.org/10.48550/arxiv.2503.23765

Authors

Yun Li

Yiming Zhang

Tao Lin
