Reinforcement Learning (RL) in games has gained significant momentum in recent years, enabling the creation of diverse agent behaviors that can enrich a player's experience. However, deploying RL agents in production settings still presents two key challenges: (1) crafting effective reward functions typically requires RL expertise, and (2) once a game's content or mechanics change, previously tuned rewards often need to be reworked from scratch. This thesis proposes a method for automating reward tuning with Large Language Models (LLMs), starting from a high-level user prompt that describes the desired agent behavior. We present two approaches: a zero-shot pipeline, in which the LLM proposes an initial reward configuration without any feedback, and a self-correcting loop that iteratively refines the reward weights based on relevant performance metrics. We evaluate both methods on four tasks across two environments: navigation and racing. Our results show that LLMs can generate reward configurations that produce meaningful agent behavior even without iteration, and that the feedback-driven loop consistently improves performance over time. In the racing task, for instance, the success rate rose from ∼9% to 74% after a single feedback iteration and eventually exceeded 80%. This is competitive with the human-expert baseline of 94%, and it matches that baseline's average completion time. With these results, this work takes a first step toward automating reward design in RL, enabling game designers to integrate RL agents with less reliance on specialists.
António Afonso studied this question.