We present a systematic methodology for deriving safe action spaces in systems where an AI agent interacts with entities that have hard safety boundaries — physical or digital. The methodology, termed constraint-convergence, proceeds in four phases: enumerate all participants in the interaction and classify their state variables by observability, hard limits, and controllability; evaluate which variables can support hardware-enforceable (or system-enforceable) safety gates; analyze pairwise and multi-body interaction effects that create second-order dangers not captured by individual gates; and compile the results into a decision-reaction matrix that simultaneously defines the system's operational capabilities and its safety specification. The methodology's central claim is that safe interaction interfaces should be derived from constraints rather than designed from requirements. When applied systematically, the methodology produces two classes of output: the safe action space (actions for which all relevant safety variables are gatable) and an auditable exclusion record (actions removed because critical variables are unobservable, measurement latency exceeds harm onset, or emergency response does not produce a safe state). These exclusions are systematic outputs of the evaluation procedure rather than ad hoc designer assumptions — though the input classifications (particularly observability) require domain expertise and are subject to revision as sensor technology evolves. The methodology was developed through application to a near-worst-case physical domain — adversarial human-robot combat training — where it independently derived safety exclusions matching those found through manual domain analysis, while additionally identifying four safety gaps that manual analysis missed. We present application sketches demonstrating transferability to surgical robotics, collaborative manufacturing, physical rehabilitation, and AI agent interaction in digital systems. 
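The compilation step that separates the safe action space from the exclusion record can be made concrete. Below is a minimal sketch that assumes each classified state variable carries an observability flag, a measurement latency, and a harm-onset time, and that each action records whether its emergency response reaches a safe state. All class, field, and function names (`SafetyVariable`, `Action`, `compile_matrix`, and so on) and all numbers are hypothetical illustrations, not the paper's data model. The sketch covers only per-variable gatability and the three exclusion criteria named above. It does not model the pairwise and multi-body interaction analysis.

```python
from dataclasses import dataclass, field

# Hypothetical data model: names and fields are illustrative, not from the paper.
@dataclass(frozen=True)
class SafetyVariable:
    name: str
    observable: bool                # can the system measure this variable at all?
    measurement_latency_ms: float   # time needed to obtain a reading
    harm_onset_ms: float            # time from hazard onset to harm

@dataclass(frozen=True)
class Action:
    name: str
    variables: tuple                # SafetyVariables relevant to this action
    emergency_response_safe: bool   # does the emergency reaction reach a safe state?

@dataclass
class DecisionReactionMatrix:
    safe_actions: list = field(default_factory=list)
    exclusions: dict = field(default_factory=dict)  # action name -> list of reasons

def exclusion_reasons(action):
    """Return every reason an action must be excluded (empty list if it is gatable)."""
    reasons = []
    for v in action.variables:
        if not v.observable:
            reasons.append(f"{v.name}: unobservable")
        elif v.measurement_latency_ms > v.harm_onset_ms:
            reasons.append(
                f"{v.name}: latency {v.measurement_latency_ms} ms "
                f"exceeds harm onset {v.harm_onset_ms} ms"
            )
    if not action.emergency_response_safe:
        reasons.append("emergency response does not produce a safe state")
    return reasons

def compile_matrix(actions):
    """Split actions into the safe action space and an auditable exclusion record."""
    matrix = DecisionReactionMatrix()
    for action in actions:
        reasons = exclusion_reasons(action)
        if reasons:
            matrix.exclusions[action.name] = reasons
        else:
            matrix.safe_actions.append(action.name)
    return matrix

# Illustrative inputs only; the numbers are invented for this example.
force = SafetyVariable("contact_force", True, 2.0, 50.0)
strain = SafetyVariable("internal_tissue_strain", False, 0.0, 100.0)
velocity = SafetyVariable("impact_velocity", True, 30.0, 10.0)

matrix = compile_matrix([
    Action("slow_push", (force,), True),
    Action("joint_load", (force, strain), True),
    Action("fast_strike", (velocity,), True),
    Action("sustained_hold", (force,), False),
])
print(matrix.safe_actions)
for name, reasons in matrix.exclusions.items():
    print(name, reasons)
```

Because every exclusion carries the specific criterion that triggered it, the resulting record can be re-audited when an input classification changes, for example when a new sensor makes a previously unobservable variable observable.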
The digital domain extension generalizes the methodology beyond physical force to any AI agent operating near irreversible consequences — data destruction, privacy exposure, financial commitment, and published communication.
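One possible reading of the digital extension maps the physical exclusion criteria onto agent actions. Observability becomes whether the gate can inspect the state an action would change. Latency versus harm onset becomes whether the check completes before the effect commits. A safe emergency response becomes whether the action can be aborted before the consequence becomes irreversible. The sketch below encodes that mapping. The mapping itself, the `Consequence`, `AgentAction`, and `gate` names, and the example actions are all assumptions for illustration. Only the four consequence classes come from the text.

```python
from dataclasses import dataclass
from enum import Enum

class Consequence(Enum):
    """The four irreversible-consequence classes named in the text, plus NONE."""
    NONE = "none"
    DATA_DESTRUCTION = "data destruction"
    PRIVACY_EXPOSURE = "privacy exposure"
    FINANCIAL_COMMITMENT = "financial commitment"
    PUBLISHED_COMMUNICATION = "published communication"

# Hypothetical digital analogues of the physical gate criteria.
@dataclass(frozen=True)
class AgentAction:
    name: str
    consequence: Consequence
    target_observable: bool     # can the gate inspect the state that would change?
    check_before_commit: bool   # does the check finish before the effect commits?
    abort_before_commit: bool   # can the action be halted before it is irreversible?

def gate(action):
    """Return (allowed, reasons); actions with no irreversible consequence pass."""
    if action.consequence is Consequence.NONE:
        return True, []
    reasons = []
    if not action.target_observable:
        reasons.append("affected state is unobservable")
    if not action.check_before_commit:
        reasons.append("check cannot complete before the effect commits")
    if not action.abort_before_commit:
        reasons.append("no abort path before the consequence is irreversible")
    return not reasons, reasons

# Illustrative actions only.
delete_staged = AgentAction("delete_staged_rows", Consequence.DATA_DESTRUCTION, True, True, True)
send_post = AgentAction("send_post", Consequence.PUBLISHED_COMMUNICATION, True, True, False)
read_file = AgentAction("read_file", Consequence.NONE, False, False, False)

for a in (delete_staged, send_post, read_file):
    print(a.name, gate(a))
```

In this sketch, a published post is excluded even when its content is fully inspectable, because nothing can recall it once sent. This mirrors the physical case, where an action is excluded if its emergency response cannot return the system to a safe state.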
Fabio-Eric Rempel (Mon,) studied this question.