Version 3: corrects cross-system comparisons from v1/v2. See §0 Changelog in the PDF.

This technical report describes Agent Brain, a biologically inspired memory system for autonomous AI agents. In contrast to stateless Large Language Model interactions, Agent Brain provides persistent, weighted, and self-organizing memory that emulates human cognitive processes: perception, storage, retrieval, consolidation, and forgetting. The system integrates eleven successive layers: a Perception Gate, a Deduplication Guard, typed memory storage (episodic/semantic/procedural), Named Entity Recognition (flair/ner-german-large, F1 92.31%), a Knowledge Graph, LLM-based Query Expansion, Hybrid Search via Reciprocal Rank Fusion, Cross-Encoder Re-Ranking, an implicit Feedback Loop based on the Free Spaced Repetition Scheduler (FSRS), a nightly five-phase Dream Cycle, and complete Workspace Isolation with Row-Level Security.

Evaluation on LongMemEval-M. On the public weaviate/longmemeval-m-cleaned benchmark (500 QA pairs across 510 multi-turn workspaces, GPT-4o judge), Agent Brain achieves 71.7% accuracy without consolidation and 69.8% with the Dream Cycle enabled. Our own pgvector-only control reaches 72.2–73.9%, which we report transparently as a 2.2 pp gap versus our hybrid pipeline on quiz-style questions. To our knowledge these are the first published numbers on the m-cleaned variant; peer numbers from Zep, Mem0, LangMem, and OpenAI Memory exist only on the LongMemEval-S variant and are therefore not directly comparable. §15 discusses what is and is not known about cross-system ranking on this benchmark.

What changed vs v2: v2 contained two errors that are corrected here. First, the "Zep 63.8%" figure was the baseline row of Rasmussen et al. 2025 Table 2, not Zep itself; Zep’s reported score on LongMemEval-S with gpt-4o-mini is 71.2%. Second, cross-system comparisons mixed LongMemEval-S peer numbers with our LongMemEval-M result. v3 removes the "state-of-the-art" claim, reports 71.7% as a single-system self-report on a clearly specified variant, and explicitly abstains from cross-system ranking until peers are re-evaluated on m-cleaned under identical judging.

The system has been in production use since early 2026 for Swiss property management (Immobilienbewirtschaftung) with over 5,000 memories, 10,000 entities, and eight specialized agents.

Reproducibility: All evaluation scripts, ingestion code, and judge configurations are released under the MIT license at github.com/AgentBrainHQ/agentbrain-benchmarks.
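
For readers unfamiliar with the Hybrid Search layer named above, the following is a minimal sketch of Reciprocal Rank Fusion in its commonly published form, where each result list contributes 1 / (k + rank) to a document's fused score. It is not taken from the Agent Brain code base: the function name, the memory IDs, and the constant k = 60 (a conventional default) are assumptions for illustration only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list (best first).

    rankings: iterable of lists, each ordered from most to least relevant.
    k: smoothing constant; 60 is a conventional default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from a vector search and a keyword search.
vector_hits = ["m1", "m2", "m3"]
keyword_hits = ["m3", "m1", "m4"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Items that appear in both lists accumulate two contributions and therefore rise above items found by only one retriever, which is the property that makes rank-based fusion attractive when the underlying scores are not on a common scale.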
Theshoth Sritharan (Tue,) studied this question.
www.synapsesocial.com/papers/69e9bb6285696592c86ed206 — DOI: https://doi.org/10.5281/zenodo.19673132
Theshoth Sritharan
Goldman Sachs (United States)