Language models trained on human text face an average per-token surprise of ~4.4 bits (measured as bits per token from the cross-entropy loss, averaged across four models). This value, established before any comparison to biological systems, coincides with the serial decoding throughput basin (~4.16 ± 0.19 bits) independently measured across the ribosome (4.39 bits), phoneme discrimination (4.2 bits), and neural working memory (3.1 bits) (Whitmer, 2026a–e). Shannon (1951) independently estimated English entropy at ~1 bit per character (~5 bits per word), convergent evidence from 75 years ago. A companion paper showed that silicon AI has sub-linear energy scaling (α_capacity = 0.937), ruling out a thermodynamic cost basin. We propose that AI inherits its throughput from the biological systems that generated its training data: brains constrained to ~4–5 bits per cognitive event produce language calibrated to that capacity. Seven-corpus experiments confirm that (1) destroying word order more than doubles per-token surprise, from ~4.4 to ~10.8 bits, with syntax contributing ~3.3 bits (paired difference p < 0.01 across models); (2) the Zipf distribution is identical in original and shuffled text (α = −0.843, R² = 0.992 for both), showing that word-frequency statistics alone cannot account for the difference; (3) softmax sampling at T = 1.0 produces output entropy of ~5.4 bits, coinciding with the basin; and (4) exploiting structure costs ~20% more energy per token. The throughput basin thus constrains AI indirectly, through language that evolved to match biological cognition.
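The quantities above rest on a few standard computations, sketched here in Python under stated assumptions. Cross-entropy loss is assumed to be reported in nats (natural log, the usual training-framework convention), so dividing by ln 2 gives bits per token. Shuffling words destroys order but leaves every word count unchanged, so any rank-frequency (Zipf) fit on the shuffled text is identical to the original. Output entropy at softmax temperature T is computed directly from logits. The function names, the whitespace tokenization, the ordinary least-squares Zipf fit, and the toy sentence are hypothetical choices for illustration, not the author's pipeline.

```python
import math
import random
from collections import Counter


def nats_to_bits(loss_nats: float) -> float:
    """Convert a mean cross-entropy loss in nats (natural log) to bits per token."""
    return loss_nats / math.log(2)


def shuffle_words(text: str, seed: int = 0) -> str:
    """Destroy word order while keeping every word, so all word counts are unchanged."""
    words = text.split()  # hypothetical whitespace tokenization
    random.Random(seed).shuffle(words)  # seeded, in-place shuffle
    return " ".join(words)


def zipf_slope(text: str) -> float:
    """Least-squares slope of log(frequency) vs. log(rank); needs >= 2 distinct words."""
    freqs = sorted(Counter(text.split()).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def softmax_entropy_bits(logits: list[float], temperature: float = 1.0) -> float:
    """Shannon entropy, in bits, of softmax(logits / temperature)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return -sum((e / total) * math.log2(e / total) for e in exps if e > 0)


if __name__ == "__main__":
    print(round(nats_to_bits(3.05), 2))  # illustrative loss value -> 4.4
    text = "the cat sat on the mat and the dog sat on the cat"
    shuffled = shuffle_words(text, seed=1)
    print(Counter(text.split()) == Counter(shuffled.split()))  # True
    print(zipf_slope(text) == zipf_slope(shuffled))  # True: same counts, same fit
    print(softmax_entropy_bits([0.0] * 32))  # uniform over 32 tokens -> 5.0
```

Dividing by ln 2 is the only step needed to move between the nats reported by most training losses and the bits used throughout this abstract. The shuffle check mirrors point (2): a rank-frequency fit sees only word counts, which shuffling leaves untouched, so any surprise difference between original and shuffled text must come from word order rather than word statistics.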
Grant Lavell Whitmer III studied this question.
www.synapsesocial.com/papers/69d49f6bb33cc4c35a227e17 — DOI: https://doi.org/10.5281/zenodo.19432911
Grant Lavell Whitmer III
Alstom (United States)