April 2, 2024 | Open Access

Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models -- A Survey

Abstract

Large language models (LLMs) have recently shown impressive performance on tasks involving reasoning, leading to a lively debate on whether these models possess reasoning capabilities similar to humans. However, despite these successes, the depth of LLMs' reasoning abilities remains uncertain. This uncertainty partly stems from the predominant focus on task performance, measured through shallow accuracy metrics, rather than a thorough investigation of the models' reasoning behavior. This paper seeks to address this gap by providing a comprehensive review of studies that go beyond task accuracy, offering deeper insights into the models' reasoning processes. Furthermore, we survey prevalent methodologies to evaluate the reasoning behavior of LLMs, emphasizing current trends and efforts towards more nuanced reasoning analyses. Our review suggests that LLMs tend to rely on surface-level patterns and correlations in their training data, rather than on genuine reasoning abilities. Additionally, we identify the need for further research that delineates the key differences between human and LLM-based reasoning. Through this survey, we aim to shed light on the complex reasoning processes within LLMs.

Cite this study

Mondorf et al. (2024) studied this question.

www.synapsesocial.com/papers/68e70b2bb6db64358768478f — DOI: https://doi.org/10.48550/arxiv.2404.01869

Authors

Philipp Mondorf

Barbara Plank
