What regularities does the weight skeleton of a neural network exhibit during training? Following the framework of the Neural Percolation Model (NPM), and drawing on the established use of Weibull distributions and their shape parameter k for predicting percolation breakthrough in porous media, this paper quantifies the structural connectivity of neural network weight skeletons via the Weibull shape parameter k of the distribution of absolute weight values. We find that, after sufficient training, 10 models from 5 independent families (Pythia, OLMo, Qwen, Mistral, LLaMA-3) all converge to a terminal k within the narrow band [1.13, 1.19], lying 2–7% below the Gaussian baseline (k = 1.205). This phenomenon holds across different initialization strategies, optimizers, training datasets, and parameter counts spanning roughly 100× (70M to 8B). A body-vs-tail ablation across 7 of these models confirms that the convergence is a property of the central 80–90% of the weight distribution and is masked when the full distribution is fitted. We further propose NPM-dk (the rate of change of k) as a training-dynamics monitor that exhibits a three-phase structure (skeleton formation → anchor → bifurcation), whose timescales align with the break-even point and the rewinding point. The complete three-phase structure is verified across 4 Pythia scales with dense early-step coverage (70M, 160M, 410M, 1B). The three independently retrievable checkpoints of Pythia-2.8B (steps 40k, 80k, and 143k) additionally fall within the predicted Phase 2/3 window, with Phase 2/3 amplitudes decreasing with depth, consistent with deeper models exhibiting more stable anchors. We also present exploratory observations of NPM-d²k (the second derivative of k) across the 4 Pythia scales, noting its potential for future characterization of training dynamics.
Keywords: weight skeleton, percolation criticality, training dynamics, Weibull distribution, overparameterization
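As a rough illustration of the central quantity, the sketch below estimates a Weibull shape parameter k from the absolute values of a list of weights. The abstract does not state which fitting procedure the paper uses, so the maximum-likelihood estimator here is an assumption, as are the function name `weibull_shape_mle` and the synthetic weights. Its output need not reproduce the reported values (for example the Gaussian baseline k = 1.205), which may depend on the paper's own estimator and on fitting only the central part of the distribution.

```python
import math
import random


def weibull_shape_mle(values, lo=0.05, hi=20.0, iters=40):
    """Maximum-likelihood estimate of the Weibull shape k for |values|.

    Hypothetical helper: the estimator used in the paper is not specified
    in the abstract, so plain MLE is an assumption made for illustration.
    """
    xs = [abs(v) for v in values if v != 0.0]
    if len(xs) < 2:
        raise ValueError("need at least two non-zero values")

    # The shape parameter is invariant to rescaling, so normalise by the
    # maximum; every log is then <= 0 and exp(k * log) cannot overflow.
    m = max(xs)
    logs = [math.log(x / m) for x in xs]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Profile-likelihood equation for k; its root is the MLE:
        #   sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0
        num = 0.0
        den = 0.0
        for lx in logs:
            w = math.exp(k * lx)
            num += w * lx
            den += w
        return num / den - 1.0 / k - mean_log

    # score(k) is increasing in k, so bisection finds the single root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def synthetic_weights(n, shape, scale=0.02, seed=0):
    """Signed synthetic 'weights' whose magnitudes are Weibull-distributed."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) * rng.weibullvariate(scale, shape)
            for _ in range(n)]


if __name__ == "__main__":
    weights = synthetic_weights(20000, shape=1.16)
    print("estimated k:", round(weibull_shape_mle(weights), 3))
```

In practice the input would be the flattened weights of one checkpoint. Repeating the fit across checkpoints gives the k trajectory from which a rate of change such as NPM-dk could be computed.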
Tiexin Ding
Tiexin Ding (Sun,) studied this question.
www.synapsesocial.com/papers/69e71423cb99343efc98d90a — DOI: https://doi.org/10.5281/zenodo.19652706