Large language models (LLMs) are fundamentally constrained by their context windows: the maximum amount of information that can be held in active working memory during a single inference session. When context limits are reached, systems compress or discard earlier content, destroying continuity and accumulated knowledge. This paper introduces STAR (Structured Tree with Active Retrieval), a hierarchical memory architecture that decouples knowledge storage from active reasoning context. STAR maintains a persistent file-based storage system indexed by lightweight semantic tags that permanently occupy a small reserved portion of the active context window. When relevant content is needed, it is retrieved on demand into a dedicated retrieval zone, used, and returned to storage with updates applied. This architecture enables models with constrained context windows to access knowledge stores orders of magnitude larger than their native context size. A Gemma 4 E4B model with a 128K-token context window, equipped with STAR, can access up to 39.1 million tokens of organized persistent knowledge, approximately 20 times the accessible knowledge of a 2-million-token flat-context model. STAR is model-agnostic, hardware-scalable, and deployable on existing LLM infrastructure without retraining.
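The store, retrieve, and write-back cycle described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `TagIndexedStore`, the JSON file layout, and the exact-match tag lookup are all assumptions made for illustration, and the abstract does not specify how tags are formed or matched.

```python
import json
import tempfile
from pathlib import Path


class TagIndexedStore:
    """Hypothetical sketch of file-backed storage with a small tag index."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        # tag -> file name. This small mapping stands in for the index that
        # stays resident in the reserved portion of the context window.
        self.index = {}

    def store(self, tag, text):
        """Write content to disk and register (or reuse) its tag entry."""
        name = self.index.get(tag, f"{len(self.index):06d}.json")
        payload = json.dumps({"tag": tag, "text": text})
        (self.root / name).write_text(payload, encoding="utf-8")
        self.index[tag] = name

    def retrieve(self, tag):
        """Load one entry from disk into the (simulated) retrieval zone."""
        raw = (self.root / self.index[tag]).read_text(encoding="utf-8")
        return json.loads(raw)["text"]

    def write_back(self, tag, updated_text):
        """Return updated content to storage under the same tag."""
        self.store(tag, updated_text)


def demo():
    """Run one retrieve / use / write-back cycle in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        store = TagIndexedStore(tmp)
        store.store("project/goals", "Ship v1.")
        working = store.retrieve("project/goals")  # pulled in on demand
        store.write_back("project/goals", working + " Then plan v2.")
        return store.retrieve("project/goals"), len(store.index)


if __name__ == "__main__":
    print(demo())
```

Only the tag index is held in memory here; the stored text lives on disk until a tag is requested. Taking the abstract's figures at face value, 39.1 million tokens is roughly 300 times a 128K-token window and about 19.5 times a 2-million-token one, which is where the "approximately 20 times" comparison comes from.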
Joshua Knoechelma
Chronos Technology (United Kingdom)
Joshua Knoechelma studied this question.
www.synapsesocial.com/papers/69d49fe5b33cc4c35a228517 — DOI: https://doi.org/10.5281/zenodo.19430942