Headline. The serial-decoding throughput basin observed across nine information- processing systems in Papers 1–6 is data-driven, not architectural or thermody- namic. The corrected basin width is ~1 bit per source byte (the historical τ ≈ 4. 16 was bits per BPE token — a tokenizer-packaging artifact, proven empirically in §3. 7). Across nine falsification experiments at 92M and 1. 2B parameters, no archi- tectural, thermodynamic, intrinsic-compression, loss-function, or scale-dependent hy- pothesis fires. The data-driven hypothesis survives, refined into the equation BPT ≈ sourceₑntropy − f (structuraldepth), and is verified at effect sizes up to Cohen’s d = 400. 81 (§3. 12, p = 2. 84×10−15). All blocking items from the internal adversarial review are resolved; cross-architecture generalization, non-Latin scripts, and alter- native quantizers remain open and are scoped under Paper 7. 1. Abstract. Papers 1–6 established that serial decoding throughput converges to τ ≈ 4. 16 ± 0. 19 bits per event across architectures, datasets, and scales (Whitmer 2026a– f). Paper 6 proposed the inherited-constraint hypothesis: AI models converge on ~4 BPT not because silicon demands it, but because biology authored the training data. We report four original experiments plus five follow-up experiments resolving all blocking items and extending the result to 1. 2 billion parameters. Under a unified evaluation harness, SYN-8 achieves 9. 06 BPT (8. 0 under random-offset training; BPSS*=8. 61±0. 12) —all exceeding twice the language basin. A 1. 2B-parameter model trained from scratch on SYN-8 extracts 8. 0 bits per source byte, identical to the 92M model, confirming the basin tracks data entropy across a 14× param- eter range. Three intermediate entropy corpora (SYN-5/6/7) show perfect linear tracking from H=5 through H=8, with no architectural attractor near 4 bits. A paired repeated-measures architecture comparison reveals a small transformer disadvantage (+0. 14 BPT, p=0. 029), not a basin-creating ceiling. INT4→INT3 is a universal quantization cliff across eight models. Silicon sits 1015. 7 –1018. 8 × above Landauer (corrected for context-length artifact). The key new finding: PCFG-8 data (8-bit entropy with hierarchical grammar) achieves 6. 59 BPT—between SYN-8 (~8–9) and natural language (~4). Three loss functions all converge near source entropy, confirming the basin is not a cross-entropy artifact. The refined equation: BPT ≈ sourceₑntropy − f (structuraldepth). A critical methodological finding: BPT is experimentally proven to be tokenizer- dependent. The same model on the same data produces BPT=8. 0 or BPT=3. 8 depending solely on tokenizer vocabulary size. Bits per source byte is the cor- rect tokenizer-independent metric and should replace BPT in cross-experiment comparisons. A comprehensive re-measurement of τ across 9 models (Pythia 70M–1. 4B, GPT-2 small–XL) and 2 corpora (WikiText-2, LAMBADA) gives τ ≈ 0. 85–1. 30 bits per source byte on WikiText-2 and 1. 15–1. 57 on LAMBADA — not the 4. 16 bits per token pre- viously reported. The basin is real; the number is ~1 bit/byte, scaling with model capacity.
Building similarity graph...
Analyzing shared references across papers
Loading...
Grant Lavell Whitmer III
Alstom (United States)
Building similarity graph...
Analyzing shared references across papers
Loading...
Grant Lavell Whitmer III (Mon,) studied this question.
www.synapsesocial.com/papers/69e864c46e0dea528dde97b4 — DOI: https://doi.org/10.5281/zenodo.19672654