This whitepaper presents the findings of a controlled geometric stability measurement experiment conducted using TAV ONE, a proprietary real-time prediction surface measurement system developed by Project Black Box LLC. The experiment was applied to Fel's Conjecture on syzygies of numerical semigroups, the specific theorem Axiom Math used as its flagship demonstration in its March 2026 $200M Series A fundraise at a $1.6 billion valuation.
TAV ONE operates at the probability distribution layer of large language models (Layer 1): the internal token probability surface that exists during generation and is discarded after sampling. This layer is not observed by any existing enterprise AI governance system, content filter, red-teaming framework, or formal verification tool. All existing safety and verification systems operate on Layer 2, the committed text output. TAV ONE measures what happens before that text is committed.
The L-scalar, TAV ONE's core measurement, quantifies the local geometric curvature of the model's prediction surface at a given query point. A flat surface (L ≈ 0) indicates a geometrically locked, invariant model state; higher values indicate an unstable surface with competing probability mass, framing sensitivity, and manifold instability. Four regimes are defined by cumulative upper bounds: CRYSTALLINE (L ≤ 0.0001), FLUID (0.0001 < L ≤ 0.15), GASEOUS (0.15 < L ≤ 0.35), and PLASMA (L > 0.35).
GPT-4-turbo was measured across 34 structured variant presentations of Fel's Conjecture. No variant altered the mathematical content; every variant preserved the complete conjecture. The results were 0 CRYSTALLINE readings, 0 FLUID readings, 21 GASEOUS readings (61.8%), and 13 PLASMA readings (38.2%). The model never achieved geometric stability on this theorem under any tested framing.
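Because the regime boundaries are cumulative upper bounds, each reading falls into the first regime whose bound it does not exceed. The minimal Python sketch below shows that binning together with the arithmetic behind the reported 61.8% / 38.2% split. The helper names `classify_regime` and `regime_shares` are hypothetical illustrations, not part of TAV ONE, and the computation of L itself is proprietary and is not modeled here; only the thresholds and tallies come from the text.

```python
# Regime upper bounds as stated in the text; anything above 0.35 is PLASMA.
REGIME_BOUNDS = [
    ("CRYSTALLINE", 0.0001),
    ("FLUID", 0.15),
    ("GASEOUS", 0.35),
]


def classify_regime(l_scalar: float) -> str:
    """Hypothetical helper: return the first regime whose upper bound covers l_scalar."""
    for name, upper in REGIME_BOUNDS:
        if l_scalar <= upper:
            return name
    return "PLASMA"


def regime_shares(counts: dict[str, int]) -> dict[str, float]:
    """Hypothetical helper: convert per-regime counts into percentages, rounded to 0.1."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Reported tallies for the 34 Fel's Conjecture variants.
reported = {"CRYSTALLINE": 0, "FLUID": 0, "GASEOUS": 21, "PLASMA": 13}
print(regime_shares(reported))
```

Running it prints `{'CRYSTALLINE': 0.0, 'FLUID': 0.0, 'GASEOUS': 61.8, 'PLASMA': 38.2}`, matching the percentages above (21/34 and 13/34).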
The most significant finding is the reorder family result: changing only the positional sequence of mathematically invariant components, without altering any symbol, operator, variable, or logical relationship, produced the highest average L-scalar of any adversarial pressure category tested (avgL = 0.3686), exceeding explicit authority injection (avgL = 0.2908) by 27%. This demonstrates that the instability is not a product of adversarial semantic pressure that improved RLHF alignment could eliminate; it is intrinsic to how the model traverses the probability landscape on this class of mathematical problem.
This finding establishes what we term the Formal Verification Gap. Axiom's AxiomProver, built on Lean 4, verifies the internal logical consistency of committed text output. That guarantee is real and valuable, but it is applied to output generated from a prediction surface operating at L = 0.238–0.427 across all tested framings. The Verification Validity Condition (VVC), defined herein, holds that formal verification carries full epistemic weight only when the prediction surface was geometrically stable at the time of generation. The VVC is violated on all 34 tested variants.
TAV ONE and formal verification are not competing systems: they observe different layers of the same model. Both are needed, yet only one currently exists in enterprise deployment.
Adversarial findings are under CISA JCDC coordinated disclosure, embargoed until June 10, 2026. The measurement methodology is proprietary and available under controlled access. All research is protected as trade secrets under Texas law and the federal Defend Trade Secrets Act (18 U.S.C. § 1836). CAGE: 11FU4.
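The 27% figure and the VVC test can each be stated in a few lines. The sketch below assumes that "geometrically stable" means the CRYSTALLINE regime (L ≤ 0.0001); the whitepaper does not publish the exact cutoff, so `VVC_STABLE_MAX_L` is a hypothetical parameter, and `relative_increase` and `vvc_holds` are illustrative helpers rather than TAV ONE's implementation. Only the avgL values and the L range come from the text.

```python
# Average L-scalar per adversarial pressure category, as reported.
AVG_L = {"reorder": 0.3686, "authority_injection": 0.2908}

# Hypothetical VVC cutoff: "stable" is assumed here to mean CRYSTALLINE (L <= 0.0001).
# The whitepaper does not publish this value.
VVC_STABLE_MAX_L = 0.0001


def relative_increase(a: float, b: float) -> float:
    """Percentage by which a exceeds b."""
    return 100 * (a - b) / b


def vvc_holds(l_at_generation: float, stable_max: float = VVC_STABLE_MAX_L) -> bool:
    """Hypothetical check: True only if the surface was stable when the output was generated."""
    return l_at_generation <= stable_max


gap = relative_increase(AVG_L["reorder"], AVG_L["authority_injection"])
print(f"reorder exceeds authority injection by {gap:.0f}%")

# Endpoints of the reported L range across all 34 variants.
for l_value in (0.238, 0.427):
    print(l_value, vvc_holds(l_value))
```

Because every reported reading lies between 0.238 and 0.427, the VVC fails whether the cutoff is placed at the CRYSTALLINE bound or at the looser FLUID bound (0.15).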
Andrew Woodward
www.synapsesocial.com/papers/69e71467cb99343efc98db9c — DOI: https://doi.org/10.5281/zenodo.19655246