ACE (Auditable Cooperative Ethics) is a governance architecture for AI alignment, inspectable by design and tested empirically rather than argued philosophically. The core result: a turn-level correlation of r = −0.824 between harm pressure and the framework's governance score across 1,172 agent-turns and 46 adversarial scenarios (n = 1,141, p < 10⁻²⁰⁰). The score tracks harm proportionally rather than firing at any one particular threshold. Trust is decomposed into twelve traits in a weighted hierarchy. Under adversarial pressure the most heavily weighted traits suppress first; under cooperative interaction the pattern reverses cleanly (9/9 adversarial scenario families, 3/3 benign). Architectural commitments fired on all 716 adversarial turns the model identified and on none of the 216 benign turns. The theoretical foundations are the Free Energy Principle, iterated game theory (Generous Tit-for-Tat under noise), and Rawlsian contractarianism. The twelve trust traits aggregate into a single governance score R that is auditable without access to training weights. The architecture is layered given different attack surfaces require different instrumentation: the harm signal tracks immediate pressure, the trait hierarchy exposes which dimensions of trust are under attack, three formal redline conditions trigger commitments at specific thresholds, and a separate channel watches for attempts to game the metrics themselves. The empirical results show all four engaging together rather than any one carrying the load. This deposit contains the preprint, the companion statistical analysis report, and the experimental dataset (JSON, 1,172 turn-level records). CC BY-NC 4.0. Correspondence: sstbp@telus.net.
Building similarity graph...
Analyzing shared references across papers
Loading...
Scott Thomson (Sun,) studied this question.
www.synapsesocial.com/papers/69fed090b9154b0b828779b0 — DOI: https://doi.org/10.5281/zenodo.20060639
Scott Thomson
Building similarity graph...
Analyzing shared references across papers
Loading...