This paper presents a systematic review and technical reexamination of recent developments in embodied AI. We synthesize the dominant paradigms, representative architectures, and pivotal advances across five core modules: (1) perception and understanding, (2) reasoning and decision-making, (3) control and action, (4) modeling and learning (e.g., vision-language-action (VLA) models and world models (WM)), and (5) data and simulation. This analysis provides a structured, up-to-date overview of embodied AI technology. Building on this foundation, we explore the transition from embodied AI to a new paradigm of deep human-AI collaboration. Anchored in the original concept of cobodied AI, we adopt a human-centered technical perspective to systematically investigate this paradigm shift. Specifically, we discuss breakthroughs in critical dimensions such as perceptual alignment, collaborative decision-making, action guidance, and bidirectional mutual learning. These efforts aim to realize three core characteristics: human-centered egocentric grounding, dual-mode cognitive integration, and physical co-embodiment. To our knowledge, this work is the first systematic construction of a technical framework and implementation roadmap for realizing cobodied AI. The framework builds on advances in embodied AI but fundamentally reorients intelligence around the human body and intent, and it is intended as a foundational reference for future research in this domain.
Feng et al. (Wed,) studied this question.