Mobile Augmented Reality (AR) applications demand high-quality, real-time visual prediction, including pixel-level depth and semantics, to enable immersive and context-aware user experiences. Recently, Vision Foundation Models (VFMs) have offered strong generalization capabilities on diverse and unseen data, supporting scalable mobile AR experiences. However, deploying VFMs on mobile devices is challenging due to computational limitations, particularly in maintaining both prediction accuracy and real-time performance. In this article, we present ARIA, the first system that enables on-device inference acceleration of a VFM. ARIA exploits the heterogeneity of mobile processors through a parallel and selective inference scheme: full-frame prediction is periodically offloaded to a highly parallel processor such as the GPU, while low-latency updates on dynamic regions are conducted via a specialized accelerator such as the NPU. Implemented and evaluated on mobile devices, ARIA achieved significant improvements in accuracy and deadline success rate in real-world mobile AR scenarios.
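The parallel and selective scheme described above can be sketched as a simple scheduler: every few frames a full-frame pass runs on the GPU, while the frames in between receive low-latency partial updates on the NPU. This is a minimal illustrative sketch only; the function names (`run_gpu_full_frame`, `run_npu_region`), the offload period, and the result format are assumptions, not ARIA's actual implementation or API.

```python
# Hypothetical sketch of a parallel/selective inference scheduler.
# All names and the offload period are illustrative assumptions.

PERIOD = 4  # assumed: offload a full-frame VFM pass every PERIOD frames


def run_gpu_full_frame(frame):
    # Stand-in for a full-frame VFM prediction on a highly parallel
    # processor (GPU): slower, but covers the entire frame.
    return {"kind": "full", "frame": frame}


def run_npu_region(frame, region):
    # Stand-in for a low-latency update restricted to a dynamic region,
    # executed on a specialized accelerator (NPU).
    return {"kind": "partial", "frame": frame, "region": region}


def schedule(frames, dynamic_regions):
    """Dispatch each frame: periodic full-frame GPU passes, with
    per-frame NPU updates on dynamic regions in between."""
    results = []
    for i, frame in enumerate(frames):
        if i % PERIOD == 0:
            results.append(run_gpu_full_frame(frame))
        else:
            results.append(run_npu_region(frame, dynamic_regions.get(i)))
    return results
```

With eight frames and a period of four, frames 0 and 4 get full-frame predictions and the remaining six get partial updates, which is the intuition behind trading periodic accuracy refreshes for sustained low latency.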
Authors: Jeho Lee, C.R. Jung, Gunjoong Kim
Affiliations: Uppsala University; Yonsei University
Venue: GetMobile Mobile Computing and Communications
DOI: https://doi.org/10.1145/3793236.3793246