As a nascent research frontier, Chain-of-Thought (CoT) reasoning and its multimodal applications in large language models (LLMs) still lack standardized concepts, methodologies, and a holistic research framework. To address these gaps, we conduct an in-depth analysis of the core processes involved and review more than 40 authoritative references. We identify three pivotal areas: the efficient integration of multimodal data features, the optimization and enhancement of CoT and logical reasoning capabilities, and the practical implementation of multimodal LLMs. We also summarize the cutting-edge advances, future trends, and significant challenges in this area. We hope this survey will help beginners quickly build a foundational understanding of the field, clarify the research methodology and workflow, and concentrate their efforts on core algorithmic design. We also expect it to attract broader participation from researchers working on CoT reasoning for multimodal LLMs and to provide useful references and guidance for their work.
Lei Shi
Hongqi Han
Jia Luo
Frontiers in artificial intelligence and applications
Beijing University of Technology
Guilin University of Electronic Technology
Institute of Scientific and Technical Information of China
www.synapsesocial.com/papers/68dc12cc8a7d58c25ebb0c10 — DOI: https://doi.org/10.3233/faia250726