Large Language Models (LLMs) have recently gained increasing attention in Table Question Answering (TQA), particularly for extracting data from tables in documents. However, feeding an entire table to an LLM as long text often leads to incorrect answers because most LLMs cannot inherently capture complex table structures. In this paper, we propose a cell extraction method for TQA that requires no manual header identification, even for tables with complex headers. Our approach estimates table headers by computing similarities between a given question and individual cells via a hybrid retrieval mechanism that integrates a language model and TF-IDF. We then select as the answer the cells at the intersection of the most relevant row and column. Furthermore, the language model is trained using contrastive learning on a small dataset of question-header pairs to enhance performance. We evaluated our approach on the TQA dataset from the shared task "Unifying, Understanding, and Utilizing Unstructured Data in Financial Reports" (U4) held at the NTCIR-18 conference, in which our team (WhiteME) participated. The experimental results show that our pipeline achieves an accuracy of 74.6%, outperforming existing LLMs such as GPT-4o mini (63.9%). In summary, we found that focusing on header relationships through our hybrid retrieval strategy effectively addresses structural uncertainties in complex tables.
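The header-scoring and intersection steps described above can be illustrated with a minimal sketch. It is not the pipeline evaluated in this paper: the function names, the weight `alpha`, the toy table, and the assumption that row and column headers are already separated are all hypothetical, and a character n-gram similarity stands in for the contrastively trained language model, which is not reproduced here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def tfidf_vectors(docs):
    # One sparse TF-IDF vector per document, using a smoothed IDF.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]


def char_ngram_vector(text, n=3):
    # ASSUMPTION: placeholder for a language-model embedding.
    # A real system would encode the text with a trained model instead.
    s = f" {text.lower()} "
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def hybrid_scores(question, headers, alpha=0.5):
    # Weighted sum of the stand-in "dense" score and the TF-IDF score.
    # alpha is a hypothetical mixing weight, not a value from the paper.
    vecs = tfidf_vectors(headers + [question])
    q_sparse = vecs[-1]
    q_dense = char_ngram_vector(question)
    return [
        alpha * cosine(char_ngram_vector(h), q_dense)
        + (1 - alpha) * cosine(v, q_sparse)
        for h, v in zip(headers, vecs)
    ]


def extract_cell(question, row_headers, col_headers, cells, alpha=0.5):
    # Pick the best-scoring row and column, then return their intersection.
    row_scores = hybrid_scores(question, row_headers, alpha)
    col_scores = hybrid_scores(question, col_headers, alpha)
    r = max(range(len(row_headers)), key=row_scores.__getitem__)
    c = max(range(len(col_headers)), key=col_scores.__getitem__)
    return cells[r][c]


# Toy financial table (made-up values, for illustration only).
row_headers = ["Net sales", "Operating income", "Total assets"]
col_headers = ["FY2022", "FY2023"]
cells = [["100", "120"], ["10", "15"], ["500", "550"]]

answer = extract_cell("What was the operating income in FY2023?",
                      row_headers, col_headers, cells)
print(answer)
```

In this sketch the two similarity signals are simply averaged; how the actual system combines them, and how it handles multi-level headers, is not specified in the abstract.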
Tanaka et al. (Fri,) studied this question.