This study develops reinforcement learning (RL) as a mechanism to align production planning with the principles of a circular economy. It focuses on a manufacturing line that produces pet-care products. The developed RL environment captures key resource constraints, material reuse, and waste flows. A Proximal Policy Optimization (PPO) agent learns to optimize real-time decisions, trading off production throughput against environmental impacts. Its reward function explicitly favors outcomes such as waste minimization and increased packaging reuse. Experimental findings indicate that the agent quickly adjusts to shifting demand, reduces surplus materials, and steadily raises circularity scores throughout episodes. Thus, the framework serves as a flexible, data-driven solution that industrial engineers can deploy when designing greener production workflows. In broader terms, the work illustrates that RL can be embedded in operational systems, advancing the shift to circular manufacturing.
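To make the throughput-versus-impact trade-off concrete, the following is a minimal sketch of how a reward of this kind could be shaped. It is not the authors' reward function: the `StepOutcome` fields, the `circular_reward` name, and the weight values are all hypothetical, chosen only to illustrate a scalar reward that pays for output, penalizes waste, and adds a bonus for packaging reuse.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical summary of one planning step (not from the paper)."""
    units_produced: float    # throughput achieved in this step
    waste_kg: float          # material discarded in this step
    packaging_reused: float  # packaging units recovered and reused
    packaging_total: float   # packaging units consumed overall


def circular_reward(outcome: StepOutcome,
                    w_throughput: float = 1.0,
                    w_waste: float = 0.5,
                    w_reuse: float = 2.0) -> float:
    """Throughput term minus a waste penalty plus a packaging-reuse bonus.

    The weights are illustrative assumptions; in practice they would be
    tuned to the line being modeled.
    """
    # Share of packaging that was reused; defined as 0 when none was consumed.
    if outcome.packaging_total > 0:
        reuse_ratio = outcome.packaging_reused / outcome.packaging_total
    else:
        reuse_ratio = 0.0
    return (w_throughput * outcome.units_produced
            - w_waste * outcome.waste_kg
            + w_reuse * reuse_ratio)


# Two hypothetical steps: slightly lower output with far less waste scores higher.
wasteful = StepOutcome(units_produced=100, waste_kg=40,
                       packaging_reused=10, packaging_total=100)
circular = StepOutcome(units_produced=95, waste_kg=10,
                       packaging_reused=60, packaging_total=100)
print(circular_reward(wasteful), circular_reward(circular))
```

Under these assumed weights, a PPO agent maximizing the return would be nudged toward plans that give up a small amount of throughput in exchange for less waste and more reuse, which is the kind of behavior the abstract reports.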
Matias Mauricio Davila Alarcon
Hendro Wicaksono
Procedia Computer Science
Constructor University
Alarcon et al. (Thu,) studied this question.
www.synapsesocial.com/papers/69c37ba2b34aaaeb1a67e466 — DOI: https://doi.org/10.1016/j.procs.2026.02.225