Memory Archive: A Memory-Grounded Training Paradigm for Computer Use Agents

This paper introduces the Memory Archive training paradigm, an end-to-end data architecture and training pipeline that addresses the structural failures of standard Computer Use Agent (CUA) training. Currently, most CUA systems rely on behavioural cloning followed by outcome-supervised RL, leading to intent blindness and a severe representational mismatch between training and deployment formats.

The central thesis of this paradigm is Format Consistency. The system centers on a compiled task guide called 'memory.md': a structured document containing step-by-step procedural reasoning, execution commands, and visual state references. The architecture threads this single artifact through four critical stages of the agent lifecycle:

Pre-Training (Format Internalization): The base model learns the grammar of GUI actuation events and step-level multimodal alignment.

Supervised Fine-Tuning (SFT): The model is trained with retrieved memories in context, treating actuation artifacts ('CommandEvent' JSONs) as first-class training targets alongside reasoning.

Post-Training (Memory Adherence RL): Training uses Group Relative Policy Optimization (GRPO) driven by a novel three-component reward function (Step Alignment, Visual Grounding, and Outcome Consistency) and a VLM-generated Process Reward Model (PRM).

Inference-Time Retrieval: A two-stage retrieval stack (Bi-encoder HNSW + Cross-encoder) dynamically pulls relevant memories. The agent tracks execution deviation and autonomously compiles new 'memory.md' files upon task success, endogenously growing its own training corpus.

Furthermore, the paradigm introduces a mechanism for in-training evaluation via self-generated memories, allowing researchers to detect overfitting, underfitting, and context-awareness without relying on static external benchmarks.
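The two-stage retrieval idea (a cheap first pass that shortlists candidates, followed by a more careful reranker) can be illustrated with a small, dependency-free sketch. This is not the paper's implementation: a brute-force cosine search over bag-of-words vectors stands in for the bi-encoder with an HNSW index, and a token-overlap score stands in for the cross-encoder. The function names, the parameters `k_candidates` and `k_final`, and the sample memories are all hypothetical.

```python
import math
from collections import Counter

def embed(text):
    # Toy stand-in for a bi-encoder: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rerank_score(query, doc):
    # Toy stand-in for a cross-encoder: Jaccard overlap of token sets.
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0

def retrieve(query, memories, k_candidates=3, k_final=1):
    # Stage 1: shortlist by vector similarity (brute force here, where a
    # real system would query an approximate index such as HNSW).
    q_vec = embed(query)
    candidates = sorted(
        memories, key=lambda m: cosine(q_vec, embed(m)), reverse=True
    )[:k_candidates]
    # Stage 2: rerank only the shortlist with the more expensive scorer.
    return sorted(
        candidates, key=lambda m: rerank_score(query, m), reverse=True
    )[:k_final]

memories = [
    "export a spreadsheet to pdf",
    "rename a file in the file manager",
    "send an email with an attachment",
]
best = retrieve("rename file", memories)
```

The point of the split is cost: the first stage scores every stored memory cheaply, so the expensive scorer only ever sees `k_candidates` items.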
This document provides full mathematical formulations, data construction specifications, algorithm details, and hyperparameter guidance for implementing the architecture.
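As a rough illustration of how a three-component reward could feed GRPO, the sketch below combines the three named components into a scalar and then computes group-relative advantages by normalising each rollout's reward against the mean and standard deviation of its group, which is the standard GRPO advantage. The weighted-sum combination, the weight values, and every name in the code are assumptions made for illustration; the abstract does not state the paper's actual formulation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class RewardParts:
    # Per-rollout component scores, assumed here to lie in [0, 1].
    step_alignment: float
    visual_grounding: float
    outcome_consistency: float

# Hypothetical weights; the paper's actual values are not given in the abstract.
WEIGHTS = (0.4, 0.3, 0.3)

def combined_reward(parts, weights=WEIGHTS):
    # Assumed combination rule: a simple weighted sum of the three components.
    w_step, w_visual, w_outcome = weights
    return (
        w_step * parts.step_alignment
        + w_visual * parts.visual_grounding
        + w_outcome * parts.outcome_consistency
    )

def group_relative_advantages(rewards, eps=1e-8):
    # GRPO-style advantage: each reward is centred on the group mean and
    # scaled by the group standard deviation (eps avoids division by zero).
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# One group of rollouts sampled for the same task prompt.
group = [
    RewardParts(1.0, 1.0, 1.0),
    RewardParts(0.0, 0.0, 0.0),
    RewardParts(0.5, 0.5, 0.5),
]
rewards = [combined_reward(p) for p in group]
advantages = group_relative_advantages(rewards)
```

Because advantages are computed relative to the group, a rollout is rewarded for being better than its siblings on the same task rather than for its absolute score, which removes the need for a separate learned value baseline.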
Kartik A
Kartik A (Thu,) studied this question.
www.synapsesocial.com/papers/6a080a11a487c87a6a40be37 — DOI: https://doi.org/10.5281/zenodo.20176599