What question did this study set out to answer?

The aim is to develop a more effective memory architecture for large language models to improve continuity and knowledge access.

April 7, 2026 | Open Access

Beyond the Context Window: STAR Framework for Scalable Persistent Memory in Large Language Models

Key Points

Introduced STAR, a hierarchical memory architecture decoupling storage from reasoning context.
Utilized a file-based storage system indexed by semantic tags.
Enabled on-demand retrieval of relevant content into a designated zone.
Reported up to 39.1 million tokens of accessible knowledge for a 128K-context Gemma 4 E4B model equipped with STAR.
Showed approximately 20 times the accessible knowledge of a 2 million token flat-context model (39.1 million versus 2 million tokens).
Designed to be model-agnostic, hardware-scalable, and deployable on existing LLM infrastructure without retraining.

Abstract

Large language models (LLMs) are fundamentally constrained by their context windows — the maximum amount of information that can be held in active working memory during a single inference session. When context limits are reached, systems compress or discard earlier content, destroying continuity and accumulated knowledge. This paper introduces STAR (Structured Tree with Active Retrieval), a hierarchical memory architecture that decouples knowledge storage from active reasoning context. STAR maintains a persistent file-based storage system indexed by lightweight semantic tags that permanently occupy a small reserved portion of the active context window. When relevant content is needed, it is retrieved on-demand into a dedicated retrieval zone, used, and returned to storage with updates applied. This architecture enables models with constrained context windows to access knowledge stores orders of magnitude larger than their native context size. A Gemma 4 E4B model with a 128K token context window, equipped with STAR, can access up to 39.1 million tokens of organized persistent knowledge — approximately 20 times the accessible knowledge of a 2 million token flat-context model. STAR is model-agnostic, hardware-scalable, and deployable on existing LLM infrastructure without retraining.
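
The store, retrieve, and write-back cycle described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `StarMemorySketch`, the `retrieval_budget` parameter, the in-memory dict standing in for file-based storage, and whitespace token counting are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StarMemorySketch:
    """Toy model: tags stay resident, entry bodies live in storage (hypothetical design)."""

    retrieval_budget: int  # assumed cap on tokens loaded into the retrieval zone
    storage: dict = field(default_factory=dict)    # stands in for the file-based store
    tag_index: dict = field(default_factory=dict)  # tag -> entry names; the always-resident index

    def store(self, name, text, tags):
        """Save an entry and register it under each semantic tag."""
        self.storage[name] = text
        for tag in tags:
            self.tag_index.setdefault(tag, set()).add(name)

    def retrieve(self, tag):
        """Copy entries matching a tag into a retrieval zone, stopping at the budget."""
        zone, used = {}, 0
        for name in sorted(self.tag_index.get(tag, ())):
            cost = len(self.storage[name].split())  # crude whitespace token count
            if used + cost > self.retrieval_budget:
                break
            zone[name] = self.storage[name]
            used += cost
        return zone

    def write_back(self, zone):
        """Return entries, including any updates made while in use, to storage."""
        self.storage.update(zone)


memory = StarMemorySketch(retrieval_budget=8)
memory.store("project_notes", "deadline moved to Friday", ["project"])
memory.store("project_log", "kickoff held on Monday morning", ["project"])
memory.store("recipes", "add salt last", ["cooking"])

zone = memory.retrieve("project")          # only what fits the budget is loaded
zone["project_log"] += " (minutes filed)"  # content is updated while in the zone
memory.write_back(zone)                    # updates persist back to storage
```

In this sketch only the tag index needs to stay resident, so the amount of stored text can grow independently of the retrieval budget, which is the decoupling the abstract describes.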

Authors

Joshua Knoechelma

Institutions

Chronos Technology (United Kingdom)