As the computational demand of Large Language Models (LLMs) surges, minimizing the carbon footprint of inference has become a critical challenge. Classical schedulers optimize for throughput but often neglect the spatial and temporal variance of grid carbon intensity. This paper presents a Hybrid Quantum-Classical (HQC) framework that uses the Quantum Approximate Optimization Algorithm (QAOA) to solve the layer-to-hardware mapping problem with the explicit objective of minimizing gCO₂e emissions. We benchmark the QAOA optimizer against classical Brute Force and Genetic Algorithm baselines across static, dynamic, and noisy environments. Our results show that QAOA achieves near-perfect optimality (gap < 10⁻⁵) and adapts to 24-hour grid fluctuations, realizing a simulated carbon saving of 23.76 gCO₂e. However, the study also reveals a "Simulation Wall" at N = 15 layers, where classical simulation of the quantum circuit becomes computationally prohibitive, whereas Genetic Algorithms remain fast at the cost of theoretical guarantees. We conclude that QAOA represents a scalable, robust pathway for green AI, provided the optimizer is migrated from simulation to physical Quantum Processing Units (QPUs).
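To make the layer-to-hardware mapping objective concrete, the following is a minimal sketch of a brute-force baseline of the kind the abstract benchmarks against. It is not the paper's implementation, and it does not show QAOA itself. The cost model (per-layer energy in kWh multiplied by per-device grid carbon intensity in gCO₂e/kWh), the per-device capacity constraint, the function names, and all numeric values are assumptions introduced here for illustration only.

```python
from itertools import product


def emissions(assignment, energy_kwh, intensity):
    """Total gCO2e of an assignment (assumed cost model: energy x intensity)."""
    return sum(
        energy_kwh[layer][dev] * intensity[dev]
        for layer, dev in enumerate(assignment)
    )


def brute_force_mapping(energy_kwh, intensity, capacity):
    """Enumerate every layer-to-device assignment and keep the cheapest feasible one.

    energy_kwh[l][d]: assumed energy of layer l on device d (kWh)
    intensity[d]:     assumed grid carbon intensity at device d (gCO2e/kWh)
    capacity[d]:      assumed maximum number of layers device d can host
    """
    n_layers = len(energy_kwh)
    n_devices = len(intensity)
    best, best_cost = None, float("inf")
    # The search space has n_devices ** n_layers candidates, so this only
    # works for small instances.
    for assignment in product(range(n_devices), repeat=n_layers):
        if any(assignment.count(d) > capacity[d] for d in range(n_devices)):
            continue  # violates the hypothetical capacity constraint
        cost = emissions(assignment, energy_kwh, intensity)
        if cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


def optimality_gap(candidate_cost, optimal_cost):
    """Relative gap of a candidate solution to the brute-force optimum."""
    return (candidate_cost - optimal_cost) / optimal_cost


# Hypothetical instance: 3 layers, 2 devices in regions with different grids.
energy_kwh = [[0.010, 0.012], [0.020, 0.025], [0.015, 0.015]]
intensity = [400.0, 100.0]  # device 1 sits on the cleaner grid
capacity = [2, 2]           # neither device can host all three layers

best_assignment, best_cost = brute_force_mapping(energy_kwh, intensity, capacity)
print(best_assignment, best_cost)
```

Because the enumeration grows exponentially with the number of layers, a baseline like this is only useful as a ground truth for small instances, which is what makes an optimality gap against it meaningful. In a dynamic setting, `intensity` would presumably be re-sampled per time step and the mapping re-solved.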
Assil KHELIFI studied this question.