Transformer inference runs every input through all N layers regardless of difficulty. Early-exit methods reduce this cost by inserting classifiers at intermediate layers and emitting a prediction once confidence is sufficient. Existing methods, entropy thresholding and the patience mechanism (Zhou et al., 2020), use static signals: either the current confidence level or whether the prediction has recently changed. Neither tracks whether confidence is improving relative to an adaptive expected trajectory. We propose the stochastic power metric P(t) = E(t) × W(t) as the exit criterion, where E(t) measures actual confidence relative to an adaptive expected confidence and W(t) is the EWMA of whether E(t) exceeded 1.0 at recent layers. This is structurally identical to the Leaky Integrate-and-Fire neuron model (Cantrell, 2026): the model fires, i.e., exits, when accumulated confidence evidence crosses a threshold. In a simulation study calibrated to the BERT-base architecture (12 layers, 600 inputs across four difficulty tiers), the power metric achieves 55.9% compute savings with 99.7% accuracy preservation, compared with 14.6% savings (confidence threshold) and 52.6% savings (patience), both at 100% accuracy. Critically, the power metric is the only method that correctly scales layer allocation with input difficulty: easy inputs exit after an average of 3.6 layers, medium after 5.2, and hard after 6.3. These findings are preliminary; validation on real BERT/GPT models with trained exit classifiers is the required next step.

Keywords: early exit, adaptive computation, transformer inference, patience mechanism, entropy threshold, power metric, LIF neuron, difficulty-aware allocation, BERT
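The exit rule described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the expected-confidence trajectory, the EWMA smoothing factor `alpha`, and the exit threshold `theta` are assumed placeholders, since the abstract does not specify them.

```python
def power_metric_exit(confidences, expected, alpha=0.3, theta=0.8):
    """Sketch of a power-metric early-exit rule: exit at the first layer
    where P(t) = E(t) * W(t) crosses a threshold.

    confidences: per-layer exit-classifier confidence for one input
    expected:    per-layer expected confidence trajectory (assumed to come
                 from calibration; placeholder here)
    alpha:       EWMA smoothing factor for W(t) (assumed value)
    theta:       exit threshold on P(t) (assumed value)
    Returns the 0-indexed layer at which the model exits.
    """
    w = 0.0
    for t, (c, exp_c) in enumerate(zip(confidences, expected)):
        e = c / exp_c                    # E(t): actual vs. expected confidence
        # W(t): EWMA of the indicator "E(t) exceeded 1.0"
        w = alpha * (1.0 if e > 1.0 else 0.0) + (1.0 - alpha) * w
        p = e * w                        # P(t) = E(t) * W(t)
        if p >= theta:
            return t                     # fire: accumulated evidence crossed theta
    return len(confidences) - 1         # no early exit: run all layers
```

An "easy" input whose confidence quickly outruns the expected trajectory exits within the first few layers, while a "hard" input whose confidence stays below expectation runs all layers, matching the difficulty-scaled allocation the abstract reports.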
Cole Cantrell
www.synapsesocial.com/papers/69f1545d879cb923c4944798 — DOI: https://doi.org/10.5281/zenodo.19803061