March 3, 2026 · Open Access

Modeling the World – Optimal Policies with Reinforcement Learning in World3

Key Points

Optimizing for the Human Sustainable Development Index (HSDI) leads to higher population levels and delays societal collapse.
A reinforcement learning agent selects policy controls in the simulation, guided by two reward functions.
HSDI prioritizes sustainable resource use better than the Human Development Index (HDI), improving outcomes in the World3 model.
Findings highlight the need to integrate environmental considerations into long-term policy planning.

Abstract

As the world approaches critical environmental and societal limits, long-term sustainable development stands as one of the most significant challenges of our time. This project explores how policy can shape a better future by simulating scenarios that optimize human well-being while including a sustainability perspective. To this end, a reinforcement learning agent is implemented to select policy controls that maximize two reward functions based on future predictions from the PyWorld3 system dynamics model, a modified Python implementation of the original World3 model introduced in the book The Limits to Growth. Modified versions of the Human Development Index (HDI) and the Human Sustainable Development Index (HSDI) are used as rewards. The agent controls two model parameters, one for consumption and one for capital, using a rollout algorithm with an offline-trained neural network as state evaluator. The results show that optimizing for the HSDI reward leads to more sustainable outcomes, maintaining higher population levels and delaying societal collapse, whereas the HDI reward depletes resources more rapidly. These findings suggest that HSDI's emphasis on sustainable resource use aligns better with the dynamics of the World3 model, highlighting the importance of integrating environmental considerations into long-term policy planning.
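
The rollout scheme mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the study: the two-variable `State`, the toy `step` dynamics (which stand in for a PyWorld3 step and do not call that library), the discrete control levels in `ACTIONS`, the fixed `BASE_ACTION`, and the hand-written `evaluate` function, which only takes the place of the offline-trained neural network.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class State:
    resources: float  # remaining nonrenewable resources (toy units)
    capital: float    # industrial capital (toy units)


# Hypothetical discrete levels for the two controlled parameters:
# (consumption fraction, investment fraction).
ACTIONS = tuple(product((0.3, 0.5, 0.7), (0.1, 0.2, 0.3)))
BASE_ACTION = (0.5, 0.2)  # fixed base policy followed during the rollout


def step(state, action):
    """Toy transition standing in for one model step; returns (next_state, reward)."""
    consumption, investment = action
    # Output falls as resources become scarce.
    output = state.capital * state.resources / (state.resources + 50.0)
    next_state = State(
        resources=max(0.0, state.resources - 0.1 * output),
        capital=0.95 * state.capital + investment * output,
    )
    consumed = consumption * output
    return next_state, consumed / (1.0 + consumed)  # bounded welfare proxy


def evaluate(state):
    """Placeholder for the offline-trained network: scores a terminal state."""
    return state.resources / 100.0


def rollout_action(state, horizon=10):
    """Return the action with the highest simulated return plus terminal value."""
    best_action, best_value = None, float("-inf")
    for action in ACTIONS:
        # Apply the candidate action once, then follow the base policy.
        current, total = step(state, action)
        for _ in range(horizon - 1):
            current, reward = step(current, BASE_ACTION)
            total += reward
        # Score the state reached at the end of the rollout.
        total += evaluate(current)
        if total > best_value:
            best_action, best_value = action, total
    return best_action
```

Calling `rollout_action(State(resources=100.0, capital=10.0))` returns one `(consumption, investment)` pair from `ACTIONS`. Swapping the reward returned by `step` is the point at which an HDI-like or HSDI-like objective would enter such a loop.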

Authors

Linnéa Ericsson Bäckvall

Emil Johansson
