Version 3: corrects cross-system comparisons from v1/v2. See §0 Changelog in the PDF.

This technical report describes Agent Brain, a biologically inspired memory system for autonomous AI agents. In contrast to stateless Large Language Model interactions, Agent Brain provides persistent, weighted, and self-organizing memory that emulates human cognitive processes: perception, storage, retrieval, consolidation, and forgetting. The system integrates eleven successive layers: a Perception Gate, a Deduplication Guard, typed memory storage (episodic/semantic/procedural), Named Entity Recognition (flair/ner-german-large, F1 92.31%), a Knowledge Graph, LLM-based Query Expansion, Hybrid Search via Reciprocal Rank Fusion, Cross-Encoder Re-Ranking, an implicit Feedback Loop based on the Free Spaced Repetition Scheduler (FSRS), a nightly five-phase Dream Cycle, and complete Workspace Isolation with Row-Level Security.

Evaluation on LongMemEval-M. On the public weaviate/longmemeval-m-cleaned benchmark (500 QA pairs across 510 multi-turn workspaces, GPT-4o judge), Agent Brain achieves 71.7% accuracy without consolidation and 69.8% with the Dream Cycle enabled. Our own pgvector-only control reaches 72.2–73.9%, which we report transparently as a gap of up to 2.2 pp (73.9% versus 71.7%) relative to our hybrid pipeline on quiz-style questions. To our knowledge these are the first published numbers on the m-cleaned variant; peer numbers from Zep, Mem0, LangMem, and OpenAI Memory exist only on the LongMemEval-S variant and are therefore not directly comparable. §15 discusses what is and is not known about cross-system ranking on this benchmark.

What changed vs v2: v2 contained two errors that are corrected here. First, the "Zep 63.8%" figure was the baseline row of Rasmussen et al. 2025 Table 2, not Zep itself (Zep's reported score on LongMemEval-S with gpt-4o-mini is 71.2%). Second, cross-system comparisons mixed LongMemEval-S peer numbers with our LongMemEval-M result. v3 removes the "state-of-the-art" claim, reports 71.7% as a single-system self-report on a clearly specified variant, and explicitly abstains from cross-system ranking until peers are re-evaluated on m-cleaned under identical judging.

The system has been in production use since early 2026 for Swiss property management (Immobilienbewirtschaftung) with over 5,000 memories, 10,000 entities, and eight specialized agents.

Reproducibility: All evaluation scripts, ingestion code, and judge configurations are released under the MIT license at github.com/AgentBrainHQ/agentbrain-benchmarks.
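For readers unfamiliar with the Hybrid Search layer named in the abstract, Reciprocal Rank Fusion can be sketched in a few lines. This is a minimal illustration, not Agent Brain's implementation: the abstract does not state which retrievers are fused or which smoothing constant is used, so the two input rankings, the memory IDs, and the value k = 60 (the constant commonly used in the RRF literature) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked ID lists into one list, best first.

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed default, not a value
    taken from the report.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for deterministic output.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


# Hypothetical memory IDs returned by two retrievers for the same query.
vector_hits = ["m3", "m1", "m7"]   # e.g. dense (embedding) search
keyword_hits = ["m1", "m9", "m3"]  # e.g. lexical (full-text) search

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Because the fusion uses only rank positions, it needs no score normalization between retrievers; items that appear high in both lists (here m1 and m3) rise above items found by only one.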
Theshoth Sritharan
Goldman Sachs (United States)
Theshoth Sritharan (Tue,) studied this question.
www.synapsesocial.com/papers/69e9bb6285696592c86ed206 — DOI: https://doi.org/10.5281/zenodo.19673132