Large Language Models face a fundamental limitation: context windows are finite, but reasoning is not. Current approaches (summarization at capacity limits, sliding windows, retrieval-augmented generation) treat memory as a storage problem. I propose a different approach that draws on cognitive science to build a bounded, self-managing context architecture. Cognitive Context Management (CCM) is inspired by neuroscience research on working memory, consolidation, and retrieval. It implements a four-tier architecture that (1) separates working memory from long-term storage; (2) triggers compaction when an effort reaches a conclusion, not when capacity runs out; (3) displaces items from working memory by relevance; and (4) enables cue-based retrieval. A working implementation, tested end-to-end with real LLM calls, validates all four mechanisms in real time across diverse topics (health advice, software debugging, travel planning). Retroactive analysis of three real conversations (58K-240K tokens, 43 effort phases) demonstrates O(1) working memory (approximately 4K tokens, constant) with 93-98% token reduction, compared with the O(n) linear growth of traditional approaches.
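The four mechanisms listed in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not name the four tiers, so the sketch models only two stores, and every name in it (`ContextManager`, `Effort`, `conclude`, `retrieve`, `max_items`) is hypothetical. Word overlap stands in for whatever relevance measure CCM actually uses, and the summary that an LLM would produce is passed in as a plain string.

```python
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Crude lowercase word set; a stand-in for a real relevance model."""
    return {w.strip(".,:;!?()").lower() for w in text.split()} - {""}


@dataclass
class Effort:
    """One phase of a conversation devoted to a single topic (hypothetical type)."""
    topic: str
    turns: list[str] = field(default_factory=list)


@dataclass
class ContextManager:
    max_items: int = 3  # assumed bound on summaries held in working memory
    working: list[tuple[str, str]] = field(default_factory=list)   # (topic, summary)
    long_term: dict[str, list[str]] = field(default_factory=dict)  # topic -> raw turns

    def conclude(self, effort: Effort, summary: str, cue: str) -> None:
        """Compact when an effort concludes, not when a token limit is hit."""
        # Raw turns move to long-term storage; only the summary stays in working memory.
        self.long_term[effort.topic] = list(effort.turns)
        self.working.append((effort.topic, summary))
        if len(self.working) > self.max_items:
            self._displace(cue)

    def _displace(self, cue: str) -> None:
        """Evict the summary least relevant to the current cue, not the oldest one."""
        cue_words = tokens(cue)
        least = min(
            self.working,
            key=lambda item: len(tokens(item[0] + " " + item[1]) & cue_words),
        )
        self.working.remove(least)

    def retrieve(self, cue: str) -> list[str]:
        """Cue-based retrieval: raw turns of the stored topic that best matches the cue."""
        cue_words = tokens(cue)
        best = max(self.long_term, key=lambda t: len(tokens(t) & cue_words), default=None)
        if best is None or not (tokens(best) & cue_words):
            return []
        return self.long_term[best]


cm = ContextManager(max_items=2)
cm.conclude(Effort("sleep health", ["q1", "a1"]), "advised a regular sleep schedule", cue="python bug")
cm.conclude(Effort("python debugging", ["q2", "a2"]), "fixed an off-by-one bug", cue="python bug")
cm.conclude(Effort("travel planning", ["q3"]), "booked a Lisbon itinerary", cue="python bug in loop")
print([topic for topic, _ in cm.working])
print(cm.retrieve("sleep problems"))
```

In this sketch working memory never holds more than `max_items` summaries regardless of how many efforts conclude, which is the sense in which its size stays constant while the long-term store grows. The displaced "sleep health" summary is gone from working memory, but its raw turns remain retrievable by a matching cue.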
Alexander Zanfir (Tue,) studied this question.
www.synapsesocial.com/papers/699fe33695ddcd3a253e6db6 — DOI: https://doi.org/10.5281/zenodo.18752095
Alexander Zanfir