April 15, 2024 · Open Access

Context-aware chatbot using MLLMs for Cultural Heritage

Abstract

Multi-modal Large Language Models (MLLMs) are currently an extremely active research topic in the multimedia and computer vision communities, and have had a significant impact on visual analysis and text generation tasks. MLLMs are well suited to integrated understanding and analysis of complex cross-modal data (i.e., text-image) and to text generation with chat abilities. Almost all MLLMs focus on aligning image features with textual features for downstream text generation tasks, including detailed image description, visual question answering, story and poem generation, and phrase grounding. However, in visual question answering, existing MLLMs may fail to answer correctly questions that depend heavily on the context of an image, in contrast to questions about its visual aspects. Moreover, generating metadata (context) for an image with present-day MLLMs is a hard task because of the hallucinating tendency of the underlying Large Language Models (LLMs), and adequate contextual information cannot be derived directly from the image alone.

Cite this study

Rachabatuni et al. (2024) studied this question.

www.synapsesocial.com/papers/68e6f047b6db64358766afbd — DOI: https://doi.org/10.1145/3625468.3652193

Authors

Pavan Kartheek Rachabatuni

Filippo Principi

Paolo Mazzanti

Institutions

University of Florence
