What question did this study set out to answer?

To enhance table question answering by accurately extracting data from complex table structures using a hybrid retrieval method.

April 1, 2026

Open Access

WhiteME at U4 Shared Task: Hybrid Retrieval with Table-Structured Clues for Economic Table QA

Key Points

Developed a cell extraction method to automate table header identification.
Integrated a language model with TF-IDF for computing similarities between questions and table cells.
Employed contrastive learning to train the language model using a dataset of question-header pairs.
Evaluated the approach on the TQA dataset from the U4 shared task at the NTCIR-18 conference.
Achieved an accuracy of 74.6% in table question answering.
Outperformed existing LLMs such as GPT-4o mini, which reached an accuracy of 63.9%.
Demonstrated that focusing on header relationships through hybrid retrieval effectively resolves structural uncertainties.
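
The contrastive training step on question-header pairs could be sketched as follows. This is a minimal illustration under assumptions: the summary does not name the loss, so an InfoNCE-style objective with in-batch negatives is assumed, and the function name `info_nce_loss` and the `temperature` value are hypothetical. The input is a similarity matrix between a batch of questions and a batch of headers, where question i is paired with header i.

```python
import math


def info_nce_loss(sim, temperature=0.05):
    """Mean InfoNCE loss for a square similarity matrix (assumed objective).

    sim[i][j] is the similarity between question i and header j.
    The positive pair for question i is header i; every other header
    in the batch acts as a negative.
    """
    total = 0.0
    for i, row in enumerate(sim):
        logits = [s / temperature for s in row]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the positive header.
        total += log_denom - logits[i]
    return total / len(sim)


# Matched pairs on the diagonal give a lower loss than mismatched pairs.
aligned = [[0.9, 0.1], [0.2, 0.8]]
shuffled = [[0.1, 0.9], [0.8, 0.2]]
print(info_nce_loss(aligned) < info_nce_loss(shuffled))
```

Minimizing a loss of this kind pulls each question toward its own header and pushes it away from the other headers in the batch, which is the general effect contrastive learning is used for here.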

Abstract

Large Language Models (LLMs) have recently gained increasing attention in Table Question Answering (TQA), particularly for extracting data from tables in documents. However, feeding an entire table to an LLM as long text often leads to incorrect answers, because most LLMs cannot inherently capture complex table structures. In this paper, we propose a cell extraction method for TQA that requires no manual header identification, even for tables with complex headers. Our approach estimates table headers by computing similarities between a given question and individual cells via a hybrid retrieval mechanism that integrates a language model and TF-IDF. We then select as the answer the cells at the intersection of the most relevant row and column. Furthermore, the language model is trained with contrastive learning on a small dataset of question-header pairs to enhance performance. We evaluated our approach on the TQA dataset from the shared task "Unifying, Understanding, and Utilizing Unstructured Data in Financial Reports" (U4) held at the NTCIR-18 conference, in which our team (WhiteME) participated. The experimental results show that our pipeline achieves an accuracy of 74.6%, outperforming existing LLMs such as GPT-4o mini (63.9%). In summary, we found that focusing on header relationships through our hybrid retrieval strategy effectively addresses structural uncertainties in complex tables.
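
The retrieval idea in the abstract can be illustrated with a small sketch. It is not the authors' pipeline: `toy_embed` (character-bigram counts) is a placeholder for the trained language model, the mixing weight `alpha` is hypothetical, and the rule for turning the two best-matching cells into a row and a column assumes that headers sit above and to the left of the data cells. All function names are illustrative.

```python
import math
import re
from collections import Counter


def tfidf_vectors(texts):
    """Sparse TF-IDF vectors (term -> weight) with smoothed IDF."""
    docs = [re.findall(r"\w+", t.lower()) for t in texts]
    df = Counter(tok for doc in docs for tok in set(doc))
    n = len(docs)
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return [{tok: tf * idf[tok] for tok, tf in Counter(doc).items()} for doc in docs]


def toy_embed(text):
    """Placeholder for the language-model encoder (assumption): bigram counts."""
    s = text.lower()
    return dict(Counter(s[i:i + 2] for i in range(len(s) - 1)))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def score_cells(question, table, alpha=0.5):
    """Hybrid score per cell: alpha * dense cosine + (1 - alpha) * TF-IDF cosine."""
    coords = [(r, c) for r, row in enumerate(table) for c in range(len(row))]
    texts = [table[r][c] for r, c in coords]
    sparse = tfidf_vectors([question] + texts)
    q_dense = toy_embed(question)
    return {
        rc: alpha * cosine(q_dense, toy_embed(text)) + (1 - alpha) * cosine(sparse[0], vec)
        for rc, text, vec in zip(coords, texts, sparse[1:])
    }


def extract_answer(question, table, alpha=0.5):
    """Treat the two best-matching cells as header clues and return their intersection."""
    scores = score_cells(question, table, alpha)
    first = max(scores, key=scores.get)
    # Second clue: the best-scoring cell in a different row and a different column.
    rest = {rc: s for rc, s in scores.items() if rc[0] != first[0] and rc[1] != first[1]}
    second = max(rest, key=rest.get)
    # Assumption: headers lie above and left of the data, so the answer is at
    # the larger row index and the larger column index of the two clues.
    return table[max(first[0], second[0])][max(first[1], second[1])]


table = [
    ["Item", "2022", "2023"],
    ["Net sales", "100", "120"],
    ["Operating income", "10", "15"],
]
print(extract_answer("What was net sales in 2023?", table))  # prints 120
```

In this toy table the question matches "Net sales" and "2023" most strongly, so the sketch returns the cell where that row and that column meet. A real system would replace `toy_embed` with a trained sentence encoder and would need a more careful rule for nested or multi-level headers.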

Cite This Study

Tanaka et al. studied this question.

synapsesocial.com/papers/69cd7a2b5652765b073a70c2
https://doi.org/10.20736/0002002093
