Headline. The quantization cliff first reported in Paper 7 (Whitmer 2026g) is real. It is universal across transformer and state-space architectures, present in real trained weights, present in gate-level hardware arithmetic, and supported by a Welch t-test (t = 633.74, p = 2.84×10⁻¹⁵, Cohen’s d = 400.81). The cliff is not, however, at a fixed bit count. It sits at the precision where the quantization scheme’s level allocation can no longer represent the critical features of the weight distribution. Under symmetric uniform quantization the cliff falls at INT8→INT4. Under NF4 (Gaussian-quantile) it falls at INT4→INT3. Under Lloyd-Max the per-matrix cliff lies below INT3, but end-to-end propagation breaks at INT4 because of layer-wise error accumulation. The minimum viable inference specification is therefore not “N-bit integer” but “N-bit with distribution-aware level allocation, validated end-to-end.” The verification was built across seven rounds of escalating adversarial pressure (§1.4), with a thesis pivot in Round 5, where the original cliff hypothesis was refined into a level-allocation framework.
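To make the phrase "level allocation" concrete, the following is a minimal sketch, not the paper's code, that compares per-matrix reconstruction error for a symmetric uniform grid and a Lloyd-Max grid at the same bit width. Everything in it is an assumption made for illustration: the weights are synthetic unit Gaussians rather than trained weights, the helper names (`uniform_levels`, `nearest`, `mse`, `lloyd_max`) are hypothetical, the uniform grid uses 2^bits − 1 levels scaled to the absolute maximum, and the Lloyd-Max grid is obtained by plain Lloyd iterations started from the uniform grid. It measures per-matrix error only, so it cannot show the end-to-end accumulation effect described above.

```python
import bisect
import random

def uniform_levels(weights, bits):
    """Symmetric uniform grid (assumed convention): 2**bits - 1 evenly
    spaced levels on [-absmax, absmax]."""
    absmax = max(abs(w) for w in weights)
    half = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8
    return [absmax * k / half for k in range(-half, half + 1)]

def nearest(levels, w):
    """Return the entry of the sorted list `levels` closest to w."""
    i = bisect.bisect_left(levels, w)
    candidates = levels[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda q: abs(q - w))

def mse(levels, weights):
    """Mean squared error of nearest-level quantization."""
    return sum((w - nearest(levels, w)) ** 2 for w in weights) / len(weights)

def lloyd_max(weights, levels, iters=30):
    """Lloyd iterations: assign each weight to its nearest level, then
    move every non-empty level to the mean of its assigned weights."""
    levels = sorted(levels)
    for _ in range(iters):
        sums = {}
        for w in weights:
            q = nearest(levels, w)
            s, n = sums.get(q, (0.0, 0))
            sums[q] = (s + w, n + 1)
        # Levels that received no weights are left where they are.
        levels = sorted(
            sums[q][0] / sums[q][1] if q in sums else q for q in levels
        )
    return levels

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(5000)]  # synthetic stand-in

results = {}
for bits in (8, 4, 3):
    grid = uniform_levels(weights, bits)
    adapted = lloyd_max(weights, grid)
    results[bits] = (mse(grid, weights), mse(adapted, weights))
    print(f"INT{bits}: uniform MSE={results[bits][0]:.5f}  "
          f"Lloyd-Max MSE={results[bits][1]:.5f}")
```

Because each Lloyd iteration cannot increase the error of the grid it starts from, the Lloyd-Max column is never worse than the uniform column at the same bit width. How large the gap is, and where it translates into a task-level cliff, depends on the real weight distribution and on end-to-end validation, which this sketch does not attempt.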
Grant R. Whitmer studied this question.