Large language models face a fundamental limitation: context windows are finite, but reasoning is not. Current approaches (summarization at capacity limits, sliding windows, retrieval-augmented generation) treat memory as a storage problem. I propose a different approach that draws on cognitive science to build a bounded, self-managing context architecture. Cognitive Context Management (CCM) is inspired by neuroscience research on working memory, consolidation, and retrieval. It implements a four-tier architecture that: (1) separates working memory from long-term storage; (2) triggers compaction on conclusions, not capacity; (3) uses relevance-based displacement; and (4) enables cue-based retrieval. A working implementation, tested end-to-end with real LLM calls, validates all four mechanisms in real time across diverse topics (health advice, software debugging, travel planning). Retroactive analysis of three real conversations (58K-240K tokens, 43 effort phases) shows working memory holding roughly constant at about 4K tokens, i.e. O(1) in conversation length, with a 93-98% token reduction, compared with the O(n) linear growth of traditional approaches.
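To make the four mechanisms listed above more concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not name the four tiers, so the sketch models only two of them (a bounded working memory and a long-term archive). The class `BoundedContext`, its methods, the whitespace-based `count_tokens`, the numeric relevance scores, and the word-overlap retrieval are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


@dataclass
class Item:
    effort: str             # hypothetical label for the effort/phase the text belongs to
    text: str
    relevance: float = 1.0  # hypothetical score; higher means more worth keeping


@dataclass
class BoundedContext:
    budget: int                                   # max tokens allowed in working memory
    working: list = field(default_factory=list)   # what would be sent in the prompt
    archive: dict = field(default_factory=dict)   # long-term store, keyed by effort

    def size(self) -> int:
        return sum(count_tokens(i.text) for i in self.working)

    def add(self, effort: str, text: str, relevance: float = 1.0) -> None:
        self.working.append(Item(effort, text, relevance))
        self._displace()

    def conclude(self, effort: str, summary: str, relevance: float = 1.0) -> None:
        """Compaction triggered by a conclusion rather than by hitting capacity."""
        raw = [i for i in self.working if i.effort == effort]
        self.archive.setdefault(effort, []).extend(i.text for i in raw)
        self.working = [i for i in self.working if i.effort != effort]
        self.working.append(Item(effort, summary, relevance))
        self._displace()

    def _displace(self) -> None:
        """Relevance-based displacement: move the least relevant item to the archive."""
        # A single item larger than the budget is left in place.
        while self.size() > self.budget and len(self.working) > 1:
            victim = min(self.working, key=lambda i: i.relevance)
            self.working.remove(victim)
            self.archive.setdefault(victim.effort, []).append(victim.text)

    def retrieve(self, cue: str) -> list:
        """Cue-based retrieval: archived texts that share at least one word with the cue."""
        cue_words = set(cue.lower().split())
        return [t for texts in self.archive.values() for t in texts
                if cue_words & set(t.lower().split())]
```

In this sketch, calling `conclude("debug", summary)` replaces every raw message of the `debug` effort with one summary item, so working memory shrinks when a line of reasoning finishes instead of when the window fills. `retrieve("parser")` then pulls the archived raw messages back by keyword. A real system would presumably use a proper tokenizer and semantic similarity in place of word overlap.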
Alexander Zanfir
Alexander Zanfir (Tue,) studied this question.
www.synapsesocial.com/papers/699fe33695ddcd3a253e6db6 — DOI: https://doi.org/10.5281/zenodo.18752095