This paper introduces an evaluation framework for long-term, user-centric external memory systems for AI agents. Using a synthetic persona and a diverse corpus of 500 chatbot-style conversations, we derive 59 fact-based queries with objective acceptance criteria and measure factual recall across four systems: mem0, ChatGPT Memory, Ontbo/Light, and Ontbo v2. Results show that Ontbo v2 reaches 93.2% recall, outperforming the other approaches, and a stratified analysis highlights how bounded or lossy memory strategies degrade as conversation history grows. The paper details the dataset construction, querying protocol, and evaluation methodology to support reproducible, privacy-preserving benchmarking of agent memory and personalization.
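The factual-recall metric and the stratified analysis can be illustrated with a minimal sketch. This is not the authors' harness. The `Query` type, the keyword-based `accepts` criterion, the stratum labels, and the `toy_answer` stand-in for a memory system are all hypothetical, because the paper's concrete acceptance criteria and stratification scheme are not given here. The sketch assumes that each query is scored once as pass/fail and that recall is the fraction of queries that pass.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Query:
    """Hypothetical fact-based query with an objective acceptance criterion."""
    question: str
    required_terms: List[str]  # assumed criterion: every term must appear in the answer
    stratum: str               # hypothetical bucket, e.g. by conversation-history length


def accepts(query: Query, answer: str) -> bool:
    """Pass/fail check: every required term appears in the answer, case-insensitively."""
    text = answer.lower()
    return all(term.lower() in text for term in query.required_terms)


def recall(queries: List[Query], answer_fn: Callable[[str], str]) -> float:
    """Fraction of queries whose answer meets its acceptance criterion."""
    if not queries:
        return 0.0
    hits = sum(accepts(q, answer_fn(q.question)) for q in queries)
    return hits / len(queries)


def stratified_recall(queries: List[Query],
                      answer_fn: Callable[[str], str]) -> Dict[str, float]:
    """Recall computed separately for each stratum, keyed in sorted order."""
    groups: Dict[str, List[Query]] = defaultdict(list)
    for q in queries:
        groups[q.stratum].append(q)
    return {s: recall(qs, answer_fn) for s, qs in sorted(groups.items())}


def toy_answer(question: str) -> str:
    """Hypothetical stand-in for a memory system answering from stored facts."""
    canned = {
        "What city does the user live in?": "You mentioned you live in Lyon.",
        "What is the user's dog called?": "I don't have that information.",
    }
    return canned.get(question, "")


if __name__ == "__main__":
    queries = [
        Query("What city does the user live in?", ["Lyon"], "short"),
        Query("What is the user's dog called?", ["Biscuit"], "long"),
    ]
    print(recall(queries, toy_answer))             # 0.5
    print(stratified_recall(queries, toy_answer))  # {'long': 0.0, 'short': 1.0}
```

Under the pass/fail assumption, a recall of 93.2% over 59 queries would correspond to 55 accepted answers (55/59 ≈ 0.932). This figure is inferred from the arithmetic and is not reported as a count in the abstract. Grouping by stratum shows how a lossy memory strategy can look acceptable in aggregate while failing on the buckets with longer histories.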
Aubry et al. (Tue,) studied this question.
www.synapsesocial.com/papers/69843583f1d9ada3c1fb4550 — DOI: https://doi.org/10.5281/zenodo.18471369
Stéphane Aubry
Luca Pelissero-Witoslawski
Athénaïs Oslati
Asociación Psicoanalítica de Buenos Aires