April 10, 2026 · Open Access

Beyond Math: Stories as a Testbed for Memorization-Constrained Reasoning in LLMs

Key Points

The aim is to analyze how Large Language Models (LLMs) reason under varying degrees of memory access, with a focus on story understanding.
Evaluated three LLMs: GPT-4o, LLaMA3.3-70B, and DeepSeek V3.
Proposed a two-tier framework with an Inductive (prompt-guided) Setting and a Restrictive Setting that limits verbatim memory access.
Evaluated the models on six character-centric story understanding benchmarks.
Observed accuracy drops of up to 45.2% under the Restrictive Setting, revealing strong dependence on surface recall.
The Inductive Setting maintained performance, indicating that prompting can steer LLMs toward memorization-constrained reasoning.

Abstract

Memorization has been shown to greatly inflate Large Language Models' (LLMs) performance on domains such as math and logic, where success should primarily rely on applying generalizable reasoning rules. In many real-world applications, however, memorization is not meant to be eliminated but selectively constrained—for example, in story understanding, where background knowledge must be integrated with narrative context. Drawing on the cognitive science distinction between “verbatim” (exact recall) and “gist” (semantic abstraction) memorization, we propose a two-tier framework for analyzing how LLMs reason under different degrees of memory access. The Inductive (prompt-guided) Setting softly steers models to reason through selective, context-relevant recall, while the Restrictive Setting imposes stronger constraints by limiting verbatim memory access. Evaluating GPT-4o, LLaMA3.3-70B, and DeepSeek V3 on six character-centric story understanding benchmarks, we find up to a 45.2% accuracy drop under the Restrictive Setting, revealing strong dependence on surface recall. By contrast, the Inductive Setting maintains performance, indicating that prompting can align LLMs toward memorization-constrained reasoning.

Authors

Yuxuan Jiang

Francis Ferraro