In this paper, we explore the application of large language models (LLMs) and their extension, large vision-language models (LVLMs), to remote sensing (RS) image analysis. We emphasize their multi-tasking potential, focusing on image captioning and visual question answering (VQA). Specifically, we introduce an improved version of the Large Language and Vision Assistant Model (LLaVA), adapted for RS imagery through a low-rank adaptation approach. To evaluate the model's performance, we create the RS-instructions dataset, a benchmark that integrates four diverse single-task datasets related to captioning and VQA. The experimental results confirm the model’s effectiveness, marking a step toward the development of efficient multi-task models for RS image analysis.
Yakoub Bazi
Laila Bashmal
Mohamad Mahmoud Al Rahhal
Remote Sensing
University of Trento
King Saud University
Bazi et al. studied this question.
www.synapsesocial.com/papers/68e6de6eb6db64358765a56e — DOI: https://doi.org/10.3390/rs16091477