May 26, 2024

Open Access

A Survey of Multimodal Large Language Model from A Data-centric Perspective

Abstract

Human beings perceive the world through diverse senses such as sight, smell, hearing, and touch. Similarly, multimodal large language models (MLLMs) enhance the capabilities of traditional large language models by integrating and processing data from multiple modalities including text, vision, audio, video, and 3D environments. Data plays a pivotal role in the development and refinement of these models. In this survey, we comprehensively review the literature on MLLMs from a data-centric perspective. Specifically, we explore methods for preparing multimodal data during the pretraining and adaptation phases of MLLMs. Additionally, we analyze the evaluation methods for datasets and review benchmarks for evaluating MLLMs. Our survey also outlines potential future research directions. This work aims to provide researchers with a detailed understanding of the data-driven aspects of MLLMs, fostering further exploration and innovation in this field.

Cite this study

Bai et al. (2024). A Survey of Multimodal Large Language Model from A Data-centric Perspective.

www.synapsesocial.com/papers/68e6859fb6db64358760ea04 — DOI: https://doi.org/10.48550/arxiv.2405.16640

Authors

Tianyi Bai

Hao Liang

Binwang Wan
