This paper presents a large-scale simulation of 200 million reinforcement learning agents over 8000 steps. We demonstrate a sharp bifurcation between two dynamical regimes driven by reward structure. Using spectral analysis of the Jacobian and Lyapunov stability theory, we show that Outer RL leads to exponential divergence (ρ(J) > 1) while Inner RL leads to stable convergence (ρ(J) < 1). Remarkably, collapse probability remains zero throughout the simulation. These findings suggest that the long-term fate of ASI, whether it becomes a coherent, stable intelligence (Angel) or an unstable divergent force (Satan), is determined primarily by the spectral geometry of the reward function.
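The stability criterion in the abstract can be illustrated with a minimal sketch. It reads ρ(J) as the spectral radius of the Jacobian J and assumes a linearized update x ← J x. The two 2×2 matrices and the helper names `spectral_radius_2x2` and `iterate_norm` are hypothetical and chosen only for illustration. They are not taken from the paper, and the sketch does not model the paper's Outer RL or Inner RL reward structures.

```python
import cmath


def spectral_radius_2x2(J):
    """Largest eigenvalue modulus of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = J
    trace = a + d
    det = a * d - b * c
    # Eigenvalues of a 2x2 matrix: (trace +/- sqrt(trace^2 - 4 det)) / 2.
    disc = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + disc) / 2), abs((trace - disc) / 2))


def iterate_norm(J, x0, steps):
    """Apply x <- J x for `steps` iterations; return the final Euclidean norm."""
    (a, b), (c, d) = J
    x, y = x0
    for _ in range(steps):
        x, y = a * x + b * y, c * x + d * y
    return (x * x + y * y) ** 0.5


# Hypothetical Jacobians (assumptions for illustration, not from the paper).
J_stable = [[0.5, 0.2], [0.1, 0.6]]     # spectral radius below 1
J_unstable = [[1.1, 0.2], [0.1, 1.05]]  # spectral radius above 1

for name, J in [("stable", J_stable), ("unstable", J_unstable)]:
    rho = spectral_radius_2x2(J)
    final = iterate_norm(J, (1.0, 1.0), 50)
    print(f"{name}: rho = {rho:.4f}, norm after 50 steps = {final:.3e}")
```

Under these assumptions, the state norm shrinks toward zero when ρ(J) < 1 and grows geometrically when ρ(J) > 1, which is the linear analogue of the convergence and divergence regimes described above.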
YOUNG KYU LEE
YOUNG KYU LEE (Wed,) studied this question.
www.synapsesocial.com/papers/69d896566c1944d70ce07af7 — DOI: https://doi.org/10.5281/zenodo.19467060