MemVault: A Three-Layer Memory Management Architecture for Cost-Optimized LLM Applications

Large Language Models (LLMs) deployed in production environments face two compounding challenges: statelessness across conversation sessions and escalating inference costs at scale. This paper presents MemVault, a full-stack system implementing a three-layer persistent memory architecture designed to address both limitations. The system organizes conversational context into Working Memory (Redis, session-scoped), Episodic Memory (PostgreSQL, LLM-summarized), and Long-Term Persona Memory (ChromaDB, vector-embedded). A background MemScheduler autonomously promotes memory segments across layers based on recency and importance scoring. Additionally, a query complexity classifier dynamically routes inference requests to appropriately sized language models, reducing overall inference costs. Empirical evaluation demonstrates significant reductions in token overhead and improved retrieval efficiency compared to naive full-context approaches. MemVault provides a scalable, cost-efficient, and production-ready architecture for long-term conversational AI systems.
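The abstract does not give the scoring formula or the classifier design, so the following is only a minimal sketch of how recency-and-importance promotion and complexity-based routing might look. Everything named here is hypothetical and not taken from the paper: `MemorySegment`, `promotion_score`, `select_for_promotion`, `route_model`, the exponential half-life decay, the equal weights, the 0.6 threshold, the keyword heuristic, and the `small-model`/`large-model` labels. It also assumes that recent, important segments score higher. The storage layers (Redis, PostgreSQL, ChromaDB) are left out so the example runs with the standard library alone.

```python
import re
from dataclasses import dataclass


@dataclass
class MemorySegment:
    text: str
    importance: float   # assumed to lie in [0, 1]; how it is computed is not specified
    age_seconds: float  # time since the segment was last used


def promotion_score(seg, half_life_s=3600.0, w_recency=0.5, w_importance=0.5):
    """Blend recency and importance into one score (hypothetical formula)."""
    # Recency halves every `half_life_s` seconds: 1.0 when fresh, 0.5 after one half-life.
    recency = 0.5 ** (seg.age_seconds / half_life_s)
    return w_recency * recency + w_importance * seg.importance


def select_for_promotion(segments, threshold=0.6):
    """Return the segments whose score meets the (assumed) promotion threshold."""
    return [s for s in segments if promotion_score(s) >= threshold]


# Words treated as signs of a complex query; a stand-in for a real classifier.
COMPLEX_MARKERS = {"why", "compare", "explain", "analyze", "derive"}


def route_model(query, long_query_words=40):
    """Pick a model tier from a crude complexity heuristic (placeholder labels)."""
    words = re.findall(r"[a-z']+", query.lower())
    is_complex = len(words) >= long_query_words or bool(COMPLEX_MARKERS & set(words))
    return "large-model" if is_complex else "small-model"


if __name__ == "__main__":
    fresh = MemorySegment("User prefers metric units", importance=1.0, age_seconds=0.0)
    stale = MemorySegment("Small talk about weather", importance=0.0, age_seconds=3600.0)
    print([s.text for s in select_for_promotion([fresh, stale])])
    print(route_model("Compare Redis and PostgreSQL for this workload"))
```

In a real deployment the scheduler would run this selection periodically against the working-memory store and write the promoted segments to the next layer. The keyword heuristic would likely be replaced by a trained classifier.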
R Suriya Anand (Sat,) studied this question.
www.synapsesocial.com/papers/699ba08472792ae9fd8703b4 — DOI: https://doi.org/10.5281/zenodo.18725218
R Suriya Anand
Ramakrishna Mission Vidyamandira