What question did this study set out to answer?

The research aims to determine the nature of serial decoding throughput convergence across various models.

April 22, 2026 | Open Access

The Throughput Basin Origin: Four Orthogonal Experiments on Whether Serial Decoding Convergence Is Architectural, Thermodynamic, or Data-Driven

Key Points

The research aims to determine the nature of serial decoding throughput convergence across various models.
Conducted four original experiments and five follow-up experiments on models at 92M and 1.2B parameters.
Utilized a unified evaluation harness assessing performance metrics in terms of bits per source byte.
Refined and validated a new equation relating throughput to source entropy and structural depth.
Confirmed serial decoding throughput converges to approximately 1 bit per source byte across multiple models.
Determined the refined equation BPT ≈ source_entropy − f(structural_depth) and verified it at effect sizes up to Cohen’s d = 400.81.
Identified that the choice of tokenizer significantly influences throughput measurements, advocating for the use of bits per source byte.
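The tokenizer point above can be made concrete with a minimal sketch. It assumes the model's total negative log-likelihood over a text is available in nats; the function names and the numbers are hypothetical and are not taken from the study's evaluation harness. It also idealizes by holding the total code length fixed across tokenizers, which a real model would not do exactly.

```python
import math

def bits_per_token(total_nll_nats: float, n_tokens: int) -> float:
    """Total code length in bits divided by the number of tokens."""
    return total_nll_nats / math.log(2) / n_tokens

def bits_per_source_byte(total_nll_nats: float, text: str) -> float:
    """Total code length in bits divided by the UTF-8 byte length of the source."""
    n_bytes = len(text.encode("utf-8"))
    return total_nll_nats / math.log(2) / n_bytes

# Hypothetical example: 1000 source bytes encoded in 4000 bits in total.
text = "a" * 1000
total_nll_nats = 4000 * math.log(2)

# Two hypothetical tokenizers segment the same text differently.
bpt_small_vocab = bits_per_token(total_nll_nats, n_tokens=1000)  # one token per byte
bpt_large_vocab = bits_per_token(total_nll_nats, n_tokens=250)   # four bytes per token

# The per-byte figure does not depend on how the text was segmented.
bpb = bits_per_source_byte(total_nll_nats, text)
```

Under these assumptions the per-token figure changes by a factor of four between the two segmentations, while the per-byte figure stays the same, which is why a byte-normalized metric is easier to compare across experiments.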

Abstract

Headline. The serial-decoding throughput basin observed across nine information-processing systems in Papers 1–6 is data-driven, not architectural or thermodynamic. The corrected basin width is ~1 bit per source byte (the historical τ ≈ 4.16 was bits per BPE token, a tokenizer-packaging artifact, proven empirically in §3.7). Across nine falsification experiments at 92M and 1.2B parameters, no architectural, thermodynamic, intrinsic-compression, loss-function, or scale-dependent hypothesis fires. The data-driven hypothesis survives, refined into the equation BPT ≈ source_entropy − f(structural_depth), and is verified at effect sizes up to Cohen’s d = 400.81 (§3.12, p = 2.84×10^−15). All blocking items from the internal adversarial review are resolved; cross-architecture generalization, non-Latin scripts, and alternative quantizers remain open and are scoped under Paper 7.1.

Abstract. Papers 1–6 established that serial decoding throughput converges to τ ≈ 4.16 ± 0.19 bits per event across architectures, datasets, and scales (Whitmer 2026a–f). Paper 6 proposed the inherited-constraint hypothesis: AI models converge on ~4 BPT not because silicon demands it, but because biology authored the training data. We report four original experiments plus five follow-up experiments resolving all blocking items and extending the result to 1.2 billion parameters.

Under a unified evaluation harness, SYN-8 achieves 9.06 BPT (8.0 under random-offset training; BPSS*=8.61±0.12), all exceeding twice the language basin. A 1.2B-parameter model trained from scratch on SYN-8 extracts 8.0 bits per source byte, identical to the 92M model, confirming the basin tracks data entropy across a 14× parameter range. Three intermediate entropy corpora (SYN-5/6/7) show perfect linear tracking from H=5 through H=8, with no architectural attractor near 4 bits. A paired repeated-measures architecture comparison reveals a small transformer disadvantage (+0.14 BPT, p=0.029), not a basin-creating ceiling. INT4→INT3 is a universal quantization cliff across eight models. Silicon sits 10^15.7–10^18.8× above Landauer (corrected for context-length artifact).

The key new finding: PCFG-8 data (8-bit entropy with hierarchical grammar) achieves 6.59 BPT, between SYN-8 (~8–9) and natural language (~4). Three loss functions all converge near source entropy, confirming the basin is not a cross-entropy artifact. The refined equation: BPT ≈ source_entropy − f(structural_depth).

A critical methodological finding: BPT is experimentally proven to be tokenizer-dependent. The same model on the same data produces BPT=8.0 or BPT=3.8 depending solely on tokenizer vocabulary size. Bits per source byte is the correct tokenizer-independent metric and should replace BPT in cross-experiment comparisons. A comprehensive re-measurement of τ across 9 models (Pythia 70M–1.4B, GPT-2 small–XL) and 2 corpora (WikiText-2, LAMBADA) gives τ ≈ 0.85–1.30 bits per source byte on WikiText-2 and 1.15–1.57 on LAMBADA, not the 4.16 bits per token previously reported. The basin is real; the number is ~1 bit/byte, scaling with model capacity.
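For readers unfamiliar with effect sizes of the magnitude quoted above, the following is a minimal sketch of Cohen's d using the pooled sample standard deviation for two independent groups. Which variant of d the study used is not stated here, so the pooled form is an assumption, and the per-run values are invented purely for illustration.

```python
import statistics

def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

# Hypothetical per-run BPT values (not data from the study).
high_entropy_runs = [8.01, 8.00, 7.99]
structured_runs = [6.60, 6.59, 6.58]

d = cohens_d(high_entropy_runs, structured_runs)  # about 141 for these made-up values
```

The sketch shows how a very large d can arise: when run-to-run variation is tiny relative to the gap between group means, the ratio grows into the hundreds even for a modest absolute difference.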

Authors

Grant Lavell Whitmer III

Institutions

Alstom (United States)
