Multi-hop Question Answering (MHQA) advances Natural Language Processing by pushing models to combine information from multiple sources in a series of reasoning steps. Despite substantial advancements in MHQA for English, resources for evaluating Large Language Models (LLMs) in Portuguese remain scarce. To address this gap, we introduce a publicly available Portuguese translation of the HotpotQA dataset, a well-established English MHQA benchmark. We systematically evaluate several variants of the Llama multilingual LLM across both the original and translated datasets, analyzing performance variations by language. Our findings demonstrate that multilingual models consistently perform better in English than in Portuguese, though this gap narrows with increased model size. Additionally, we show the impact of fine-tuning on improving MHQA performance in Portuguese. This study provides valuable insights into optimizing LLMs for multilingual contexts and contributes a relevant benchmark for Portuguese-language MHQA research.
Mucciaccia et al. studied this question.
www.synapsesocial.com/papers/68ebe3d6becc64ad52fdaee7 — DOI: https://doi.org/10.5753/jbcs.2025.5801
Sérgio S. Mucciaccia
Thiago M. Paixão
Filipe Mutz
Journal of the Brazilian Computer Society