Abstract Video Object Segmentation (VOS) is a key component in computer vision applications, including surveillance, autonomous driving, and robotics. However, existing VOS models often struggle to generalize to new videos with complex, topologically transforming deformable objects (e.g., cooking, assembling, state change), degraded environments, and long video sequences, resulting in tracking drift, low recall, and memory saturation. We developed the Multiple object VOS and tracking Smart Memory architecture (MuSMem), a generalizable approach that incorporates three key innovations: (i) fusing SAM with High-Quality masks alongside appearance-based candidate selection to refine coarse segmentation masks, resulting in improved object boundaries; (ii) a dynamic smart memory that manages a history of key frames based on a novel information-preserving gain, combined with relevance and freshness spatio-temporal criteria; and (iii) exploring the use of monocular depth maps for occlusion robustness. MuSMem significantly reduces memory usage, reduces drift, tracks complex object topological changes, and improves long-term prediction performance. MuSMem can be integrated with Vision-Language Models (VLMs) for zero-shot generalization to unseen visual domains. Experiments on VOS benchmark datasets show that MuSMem ranks first on VOTSt-2024, Long Video Dataset, and LVOS, and second on VOTS-2024, demonstrating the best generalizability and state-of-the-art performance across single-, multi-, and complex VOS tasks.
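To make the smart-memory idea in (ii) concrete, the following is a minimal illustrative sketch of a bounded key-frame memory that evicts the lowest-scoring frame. It is not the MuSMem implementation: the abstract does not define the information-preserving gain, so a simple novelty term (one minus the highest cosine similarity to the other stored frames) stands in for it here. The names `KeyFrame`, `SmartMemory`, `novelty`, the equal weights, and the decay constant `tau` are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class KeyFrame:
    frame_idx: int            # position of the frame in the video
    feature: Sequence[float]  # hypothetical appearance embedding


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 for zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def novelty(frame: KeyFrame, others: List[KeyFrame]) -> float:
    """Assumed stand-in for an information gain: low if a near-duplicate is stored."""
    if not others:
        return 1.0
    return 1.0 - max(cosine(frame.feature, o.feature) for o in others)


class SmartMemory:
    """Bounded key-frame store; evicts the frame with the lowest combined score."""

    def __init__(self, capacity: int, w_gain: float = 1.0,
                 w_rel: float = 1.0, w_fresh: float = 1.0, tau: float = 50.0):
        self.capacity = capacity
        self.w_gain, self.w_rel, self.w_fresh = w_gain, w_rel, w_fresh
        self.tau = tau  # assumed decay constant (in frames) for freshness
        self.frames: List[KeyFrame] = []

    def add(self, frame: KeyFrame, query_feature: Sequence[float]) -> None:
        self.frames.append(frame)
        if len(self.frames) <= self.capacity:
            return
        now = frame.frame_idx

        def score(f: KeyFrame) -> float:
            others = [o for o in self.frames if o is not f]
            gain = novelty(f, others)                         # information term
            rel = cosine(f.feature, query_feature)            # relevance to current frame
            fresh = math.exp(-(now - f.frame_idx) / self.tau)  # recency
            return self.w_gain * gain + self.w_rel * rel + self.w_fresh * fresh

        worst = min(self.frames, key=score)
        self.frames = [f for f in self.frames if f is not worst]
```

Under these assumptions, a frame is dropped when it is redundant with other stored frames, dissimilar to the current query, and old. A real system would likely use learned features and tuned weights rather than the equal weighting used here.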
Elham Soltani Kazemi
Imad Eddine Toubal
Gani Rahmon
International Journal of Computer Vision
University of Missouri
Kazemi et al. (Tue,) studied this question.
www.synapsesocial.com/papers/69d8940c6c1944d70ce05002 — DOI: https://doi.org/10.1007/s11263-026-02742-1