As the world approaches critical environmental and societal limits, long-term sustainable development stands as one of the most significant challenges of our time. This project explores how policy can shape a better future by simulating scenarios that optimize human well-being and include a sustainability perspective. A reinforcement learning agent selects policy controls that maximize two reward functions, based on future predictions from the PyWorld3 system dynamics model, a modified Python implementation of the original World3 model introduced in the book Limits to Growth. Modified versions of the Human Development Index (HDI) and the Human Sustainable Development Index (HSDI) are used as rewards. The agent controls two model parameters, one for consumption and one for capital, using a roll-out algorithm with an offline-trained neural network as state evaluator. The results show that optimizing for the HSDI reward leads to more sustainable outcomes, maintaining higher population levels and delaying societal collapse, whereas the HDI reward depletes resources more rapidly. These findings suggest that HSDI’s emphasis on sustainable resource use aligns better with the dynamics of the World3 model, highlighting the importance of integrating environmental considerations into long-term policy planning.
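The roll-out idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_step` is a two-variable stand-in and not the PyWorld3 dynamics, `toy_reward` is not the modified HDI or HSDI, `toy_evaluator` replaces the offline-trained neural network with a simple heuristic, and the control levels and horizon are arbitrary.

```python
from itertools import product

# Hypothetical discrete settings for the two controls (consumption, capital).
CONTROL_LEVELS = (0.8, 1.0, 1.2)


def toy_step(state, action):
    """Stand-in for one simulation step; NOT the PyWorld3 dynamics."""
    resources, _ = state
    consumption, capital = action
    # Higher consumption and capital use deplete resources faster.
    resources = max(0.0, resources - 0.05 * consumption * capital)
    # Welfare follows consumption but is capped by the remaining resources.
    welfare = min(0.5 * consumption, resources)
    return (resources, welfare)


def toy_reward(state):
    """Stand-in reward based on welfare only (not the paper's HDI/HSDI)."""
    return state[1]


def toy_evaluator(state):
    """Stand-in for the offline-trained network: values remaining resources."""
    return state[0]


def rollout_action(state, horizon=5):
    """Return the control pair with the best simulated return plus terminal value."""
    best_action, best_value = None, float("-inf")
    for action in product(CONTROL_LEVELS, repeat=2):
        s, total = state, 0.0
        # Simulate forward while holding the candidate controls fixed.
        for _ in range(horizon):
            s = toy_step(s, action)
            total += toy_reward(s)
        # The evaluator estimates the value beyond the simulated horizon.
        value = total + toy_evaluator(s)
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```

Calling `rollout_action((1.0, 0.0))` returns one pair of control levels for the current state. In this sketch the terminal evaluator is what lets a short simulated horizon account for longer-term consequences, which is the role the abstract assigns to the neural network.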
Linnéa Ericsson Bäckvall
Emil Johansson
Bäckvall et al. (Wed,) studied this question.