What type of study is this?

This is a quantitative study.

October 12, 2025 · Open Access

Pt-HotpotQA: Evaluating Multi-Hop Question Answering on Original and Portuguese-translated Datasets Using LLMs

Key Points

Multilingual models consistently perform better in English than in Portuguese, highlighting language-specific challenges.
Fine-tuning large language models improves multi-hop question answering performance in Portuguese.
The evaluation uses the HotpotQA benchmark in both its original English form and a new, publicly available Portuguese translation.
The performance gap between English and Portuguese narrows as model size increases.

Abstract

Multi-hop Question Answering (MHQA) advances Natural Language Processing by pushing models to combine information from multiple sources in a series of reasoning steps. Despite substantial advancements in MHQA for English, resources for evaluating Large Language Models (LLMs) in Portuguese remain scarce. To address this gap, we introduce a publicly available Portuguese translation of the HotpotQA dataset, a well-established English MHQA benchmark. We systematically evaluate several variants of the Llama multilingual LLM across both the original and translated datasets, analyzing performance variations by language. Our findings demonstrate that multilingual models consistently perform better in English than in Portuguese, though this gap narrows with increased model size. Additionally, we show the impact of fine-tuning on improving MHQA performance in Portuguese. This study provides valuable insights into optimizing LLMs for multilingual contexts and contributes a relevant benchmark for Portuguese-language MHQA research.
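As a rough illustration of the per-language comparison described above, the sketch below scores predicted answers against gold answers and averages the results by language. It assumes exact match and token-level F1, the answer metrics commonly reported for HotpotQA; the abstract does not state which metrics the authors used. The function names, the record layout, and the simplified normalization (lowercasing and ASCII punctuation removal only, with no article stripping) are hypothetical choices made for this example.

```python
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop ASCII punctuation, and split on whitespace (simplified)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred, ref = normalize(prediction), normalize(gold)
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_by_language(records):
    """Average EM and F1 per language.

    records: iterable of (language, prediction, gold) tuples (assumed layout).
    """
    totals = {}
    for lang, pred, gold in records:
        em_sum, f1_sum, n = totals.get(lang, (0.0, 0.0, 0))
        totals[lang] = (
            em_sum + exact_match(pred, gold),
            f1_sum + token_f1(pred, gold),
            n + 1,
        )
    return {
        lang: {"em": em_sum / n, "f1": f1_sum / n}
        for lang, (em_sum, f1_sum, n) in totals.items()
    }


# Toy records for illustration only; these are not results from the paper.
records = [
    ("en", "Paris", "paris"),
    ("pt", "a cidade de Paris", "Paris"),
]
scores = score_by_language(records)
print(scores)
```

Comparing the "en" and "pt" entries of the returned dictionary for each model size is one simple way to track how the language gap changes with scale.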

Cite this study

Mucciaccia et al. (2025) studied this question.

www.synapsesocial.com/papers/68ebe3d6becc64ad52fdaee7 — DOI: https://doi.org/10.5753/jbcs.2025.5801

Also consider

Synapse has enriched 4 closely related papers on similar questions. Consider them for comparative context:

Medical knowledge graph question answering for drug‐drug interaction prediction based on multi‐hop machine reading comprehension · 2024 · 7 citations
MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text · 2013 · 666 citations
Semantic Parsing on Freebase from Question-Answer Pairs · 2013 · 1,586 citations
MultiWOZ-PT Um Conjunto de Diálogos Orientados a Tarefas em Português · 2024 · 1 citation

Authors

Sérgio S. Mucciaccia

Thiago M. Paixão

Filipe Mutz

Journals

Journal of the Brazilian Computer Society
