Abstract: Large Language Models (LLMs) have driven remarkable advances in automated code generation and reasoning, yet their deployment remains tethered to expensive hardware. Smaller models (1–7B parameters) can run on consumer-grade CPUs, but suffer from high error rates on complex tasks, leading to wasteful token regeneration and energy overhead. We propose HAIL (Human-Augmented Inference for Lightweight Models), a formal mathematical framework that introduces the human pair-programmer as an explicit variable in the energy-cost equation of LLM inference. HAIL models how human task decomposition reduces the effective error rate ε through a decay function δ(H) = (1 − H)^γ, where H ∈ [0, 1] quantifies the level of human intervention and γ captures orchestration efficacy. We further introduce Quality-per-Dollar-Hour (QDH), a composite metric that measures output quality per unit of hardware cost and wall-clock time. We present six testable predictions and a complete experimental protocol for empirical validation on consumer hardware. To our knowledge, this is the first framework that unifies human-in-the-loop interaction, LLM energy consumption, and task decomposition into a single formal model.

Corresponding author: Felipe Cardoso (Carzo), Independent Developer & Researcher, Rio de Janeiro, Brazil. Email: felipe@carzo.com.br. Web: https://carzo.tech. ORCID: https://orcid.org/0009-0005-0429-8785
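As a rough illustration of the quantities named in the abstract, the sketch below evaluates the decay function δ(H) = (1 − H)^γ in Python. Only δ(H) itself is taken from the abstract. The multiplicative form of the effective error rate (ε · δ(H)), the QDH formula (quality divided by cost times hours), and all function and parameter names are assumptions made for illustration; the paper's own definitions may differ.

```python
def decay(h: float, gamma: float) -> float:
    """Decay function from the abstract: delta(H) = (1 - H) ** gamma."""
    if not 0.0 <= h <= 1.0:
        raise ValueError("H must lie in [0, 1]")
    return (1.0 - h) ** gamma


def effective_error_rate(base_error: float, h: float, gamma: float) -> float:
    """ASSUMPTION: the base error rate is scaled multiplicatively by delta(H).

    The abstract only says that delta(H) reduces the effective error rate;
    the exact functional form used here is hypothetical.
    """
    return base_error * decay(h, gamma)


def qdh(quality: float, hardware_cost_usd: float, wall_clock_hours: float) -> float:
    """ASSUMPTION: QDH = quality / (hardware cost in dollars * wall-clock hours).

    This reading is inferred from the metric's name, not from a stated formula.
    """
    if hardware_cost_usd <= 0 or wall_clock_hours <= 0:
        raise ValueError("cost and time must be positive")
    return quality / (hardware_cost_usd * wall_clock_hours)


if __name__ == "__main__":
    # No intervention (H = 0) leaves the error rate unchanged; H = 1 drives it to zero.
    for h in (0.0, 0.5, 1.0):
        print(h, decay(h, gamma=2.0), effective_error_rate(0.4, h, gamma=2.0))
```

Under these assumptions, a larger γ makes the same level of human intervention H cut the error rate more sharply, which matches the abstract's description of γ as orchestration efficacy.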
Felipe Cardoso
Felipe Cardoso (Mon,) studied this question.
www.synapsesocial.com/papers/69d8948f6c1944d70ce0583c — DOI: https://doi.org/10.5281/zenodo.19446268