What question did this study set out to answer?

The study asks whether large language models can generate clinically realistic profiles of cognitive decline, and whether the simulated deficits reflect architectural constraints rather than superficial role-playing.

February 14, 2026 · Open Access

Architectural Constraints in LLM-Simulated Cognitive Decline: In Silico Dissociation of Memory Deficits and Generative Language as Candidate Digital Biomarkers

Key Points

The study asks whether large language models can generate clinically realistic profiles of cognitive decline, and whether the simulated deficits reflect architectural constraints rather than superficial role-playing.
Synthetic cohorts were generated to represent healthy aging, mild cognitive impairment, and Alzheimer's disease.
Cognitive profiles were assessed with a conversational neuropsychological battery covering episodic memory, verbal fluency, narrative production, orientation, naming, and comprehension.
Prompt context was systematically manipulated in Alzheimer's disease subjects to separate robust deficits from manipulable ones.
Significant cognitive gradients were observed across multiple domains (p < 0.001), consistent with realistic profiles of cognitive decline.
AD subjects showed notable impairments in episodic memory and increased cognitive intrusions.
Generative tasks were highly sensitive to prompt manipulation, while some memory tasks remained invariant to it.

Abstract

This study examined whether large language models (LLMs) can generate clinically realistic profiles of cognitive decline and whether simulated deficits reflect architectural constraints rather than superficial role-playing artifacts. Using GPT-4o-mini, we generated synthetic cohorts (n = 10 per group) representing healthy aging, mild cognitive impairment (MCI), and Alzheimer’s disease (AD), assessed through a conversational neuropsychological battery covering episodic memory, verbal fluency, narrative production, orientation, naming, and comprehension. Experiment 1 tested whether synthetic subjects exhibited graded cognitive profiles consistent with clinical progression (Control > MCI > AD). Experiment 2 systematically manipulated prompt context in AD subjects (short, rich biographical, and few-shot prompts) to dissociate robust from manipulable deficits. Significant cognitive gradients emerged (p < 0.001) [...] (p > 0.05), whereas generative tasks (narrative length, verbal fluency) showed high sensitivity (F > 100, p < 0.001). Rich biographical prompts paradoxically increased memory intrusions by 343%, indicating semantic interference rather than cognitive rescue. These results demonstrate that LLMs can serve as in silico test benches for exploring candidate digital biomarkers and clinical training protocols, while highlighting architectural constraints that may inform computational hypotheses about memory and language processing.
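
To make the reported statistics easier to read, here is a minimal Python sketch of two quantities the abstract mentions: an F statistic comparing several groups, and a percentage increase. It assumes the comparison across groups or prompt conditions is a one-way ANOVA, which the abstract does not state explicitly. The function names and the toy scores are hypothetical illustrations, not the authors' code or data.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic for a list of samples (one list per group).

    Assumption: a plain one-way design; the paper may use a different test.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variability of group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def percent_increase(baseline, new):
    """Relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


# HYPOTHETICAL toy scores for three groups; not taken from the study.
toy_groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat = one_way_f(toy_groups)
```

Under this reading, a large F (such as the F > 100 reported for generative tasks) means the differences between group means dwarf the spread within each group. A 343% increase means the new value is about 4.43 times the baseline, since `percent_increase(1.0, 4.43)` is approximately 343.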

Cite This Study

Pérez-Elvira et al. studied this question.

synapsesocial.com/papers/6990113f2ccff479cfe57b80 https://doi.org/10.3390/ai7020069