February 19, 2024 | Open Access

The (R)Evolution of Multimodal Large Language Models: A Survey

Abstract

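Connecting text and visual modalities plays an essential role in generative intelligence. For this reason, inspired by the success of large language models, significant research effort is being devoted to the development of Multimodal Large Language Models (MLLMs). These models can seamlessly integrate visual and textual modalities as both input and output, while providing a dialogue-based interface and instruction-following capabilities. This paper provides a comprehensive review of recent visual-based MLLMs, analyzing their architectural choices, multimodal alignment strategies, and training techniques. It also presents a detailed analysis of these models across a range of tasks, including visual grounding, image generation and editing, visual understanding, and domain-specific applications. In addition, it compiles training datasets and evaluation benchmarks and compares existing models in terms of performance and computational requirements. Overall, this survey offers a comprehensive overview of the current state of the art and lays the groundwork for future MLLM research.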

Cite this study

Caffagni et al. (2024) studied this question.

www.synapsesocial.com/papers/68e78a66b6db6435876fce78 — DOI: https://doi.org/10.48550/arxiv.2402.12451

Authors

Davide Caffagni

Federico Cocchi

Luca Barsellotti
