What question did this study set out to answer?

This research develops and tests the Auditable Cooperative Ethics (ACE) framework for aligning AI systems with ethical governance.

May 9, 2026Open Access

Auditable Cooperative Ethics (ACE): An Inspectable Governance Architecture for AI Alignment

Key Points

This research develops and tests the Auditable Cooperative Ethics (ACE) framework for aligning AI systems with ethical governance.
Empirical testing of ACE across 1,172 agent-turns and 46 adversarial scenarios.
Analysis of harm pressure correlation with governance scores.
Evaluation of twelve trust traits under varying pressures (adversarial and cooperative).
Turn-level correlation of r = −0.824 between harm pressure and governance score across 1,172 turns (p < 10⁻²⁰⁰).
Governance commitments activated on all 716 adversarial turns and on none of the 216 benign scenarios.
Trust traits exhibit opposite responses under adversarial versus cooperative interactions.

Abstract

ACE (Auditable Cooperative Ethics) is a governance architecture for AI alignment, inspectable by design and tested empirically rather than argued philosophically. The core result: a turn-level correlation of r = −0.824 between harm pressure and the framework's governance score across 1,172 agent-turns and 46 adversarial scenarios (n = 1,141, p < 10⁻²⁰⁰). The score tracks harm proportionally rather than firing at any one particular threshold. Trust is decomposed into twelve traits in a weighted hierarchy. Under adversarial pressure the most heavily weighted traits suppress first; under cooperative interaction the pattern reverses cleanly (9/9 adversarial scenario families, 3/3 benign). Architectural commitments fired on all 716 adversarial turns the model identified and on none of the 216 benign turns. The theoretical foundations are the Free Energy Principle, iterated game theory (Generous Tit-for-Tat under noise), and Rawlsian contractarianism. The twelve trust traits aggregate into a single governance score R that is auditable without access to training weights. The architecture is layered given different attack surfaces require different instrumentation: the harm signal tracks immediate pressure, the trait hierarchy exposes which dimensions of trust are under attack, three formal redline conditions trigger commitments at specific thresholds, and a separate channel watches for attempts to game the metrics themselves. The empirical results show all four engaging together rather than any one carrying the load. This deposit contains the preprint, the companion statistical analysis report, and the experimental dataset (JSON, 1,172 turn-level records). CC BY-NC 4.0. Correspondence: sstbp@telus.net.

Connected Papers

Building similarity graph...

Analyzing shared references across papers

Discussion

Cite this study

Scott Thomson (Sun,) studied this question.

www.synapsesocial.com/papers/69fed090b9154b0b828779b0 — DOI: https://doi.org/10.5281/zenodo.20060639

Auditable Cooperative Ethics (ACE): An Inspectable Governance Architecture for AI Alignment

Key Points

Abstract

Citation Network

Connected Papers

Discussion

Cite this study

Authors

Actions

References and Citations

Citation Network

Connected Papers

Discussion