This deposit contains the full reproducibility artifact for the paper "No Free Signal: A Negative Result for Substrate-Evolution Around Fixed LLMs in an Embodied Multi-Agent Population": the manuscript (PDF, DOCX, Markdown), the experimental harness, the analysis code, and the canonical fitness dataset. We test whether a fixed, non-fine-tuned large language model (Amazon Nova Lite) can become adaptively useful in a 25-creature embodied multi-agent population when selection pressure acts on the heritable communication substrate around the model (production bias, perception attention, emission gating) rather than on the model's own weights. The seven-arm design isolates the contributions of substrate evolution, LLM presence, LLM context-sensitivity, and emission shape, with matched controls including a mute baseline, a no-emitter baseline, a frozen-substrate-with-LLM control, a scrambled-LLM ablation, a replay-randomized LLM ablation, and a no-LLM uniform-noise emitter at approximately matched cadence. Each arm is run on 20 paired seeds for 15,000 ticks, yielding 140 controlled runs.
The substrate-evolution hypothesis is not supported by these data. The full-stack treatment did not outperform the frozen-substrate LLM baseline, the mute control, the replay-randomized LLM, the scrambled LLM, or the cadence-targeted random-emitter control. Among the four evolvable emission-bearing arms (D, E, F, G), population AUC was statistically indistinguishable; the cleanest internal contrast (F vs G: same substrate, differing only in emission-source identity) was effectively a coin flip, with Cohen's d ≈ 0.00. No LLM-vs-control comparison crosses the 95% threshold under a paired-seed bootstrap. The substrate-only no-emitter arm (C) is descriptively lowest, but the C-vs-emission contrasts are not statistically resolved at n=20.
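As an illustration of the kind of paired-seed comparison described above, the following is a minimal sketch, not the deposit's actual analysis code. It assumes each arm is summarized as one population-AUC value per seed, with index `i` referring to the same seed in both arms. The function names, the percentile-bootstrap method, the resample count, and the paired form of Cohen's d (mean difference divided by the standard deviation of the differences) are all assumptions made for this example.

```python
import random
import statistics


def paired_bootstrap_ci(a, b, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean paired difference a[i] - b[i].

    Resampling is done over seeds (pairs), so the pairing is preserved.
    Returns (mean_difference, ci_low, ci_high).
    """
    if len(a) != len(b):
        raise ValueError("both arms must cover the same seeds")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Each bootstrap replicate is the mean of n differences drawn with replacement.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(diffs), lo, hi


def cohens_d_paired(a, b):
    """Paired Cohen's d (assumed variant): mean difference / SD of differences."""
    diffs = [x - y for x, y in zip(a, b)]
    sd = statistics.stdev(diffs)
    return statistics.fmean(diffs) / sd if sd > 0 else 0.0
```

Under this sketch, a contrast would count as resolved only if the interval returned by `paired_bootstrap_ci` excludes zero; an interval straddling zero together with a d near 0.00 corresponds to the "coin flip" outcome reported for F vs G.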
A complementary behavioral receiver-response analysis on a separately instrumented run shows that emission *source* does affect local receiver behavior even when fitness outcomes are equal: on the same metric, random-noise emissions produce more flee-like movement after a heard event than LLM-shaped emissions, though the effects are modest in magnitude (~0.1–0.2 predator-distance units per heard event). Population-level fitness can therefore hide behavioral discrimination between fitness-equivalent emission sources.
Methodological contribution: matched-noise and semantics-broken controls reveal whether claimed LLM-agent fitness gains arise from model intelligence or from persistent signaling channels coupled to adaptive scaffolding. Receiver-response analysis split by self-heard vs non-self emissions disentangles social communication from self-feedback. We argue that LLM-agent comparisons should be supplemented with (a) approximately cadence-matched noise controls, (b) per-event receiver-response analysis, and (c) explicit reporting of effect-size magnitudes alongside null-hypothesis tests.
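A per-event receiver-response analysis of the kind recommended in (b) might be sketched as follows. This is a hypothetical illustration and not the harness's instrumentation: the `HeardEvent` record, its field names, the source labels, and the choice of "change in receiver-to-predator distance" as the flee-like metric are assumptions introduced for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class HeardEvent:
    emitter_id: int
    receiver_id: int
    source: str         # emission source label, e.g. "llm" or "noise" (hypothetical)
    dist_before: float  # receiver-to-predator distance when the emission is heard
    dist_after: float   # the same distance a fixed number of ticks later


def receiver_response(events):
    """Mean change in predator distance per heard event.

    Results are keyed by (source, self_heard), so self-feedback
    (emitter == receiver) is kept apart from social communication.
    """
    groups = defaultdict(list)
    for e in events:
        self_heard = e.emitter_id == e.receiver_id
        groups[(e.source, self_heard)].append(e.dist_after - e.dist_before)
    return {key: fmean(vals) for key, vals in groups.items()}


# Usage with made-up events: a positive value means the receiver
# ended up farther from the predator after hearing the emission.
example = [
    HeardEvent(1, 2, "noise", 2.0, 2.5),
    HeardEvent(3, 2, "noise", 4.0, 4.25),
    HeardEvent(1, 1, "llm", 1.0, 1.0),
]
summary = receiver_response(example)
```

Comparing the `(source, False)` entries across emission sources addresses social response, while the `(source, True)` entries isolate self-feedback.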
Sterling Morrison (Wed,) studied this question.