March 3, 2026

An Adaptive Environment Generator for Effective Decision Region Enlargement in Deep Reinforcement Learning

Key Points

The adaptive environment generator (AEG) tailors the difficulty of training environments to the agent's learning progress in order to enlarge the agent's effective decision region.
Simulation results show that AEG-based training speeds up learning in the early phases and generates more challenging environments in the late stages.
The method fits a Gaussian mixture model (GMM) to a database of environmental parameters selected by environmental difficulty, which is defined from the agent's long-term rewards during training.
Across multiple environments, the resulting agent covers a broader effective decision-making region than baseline methods, which may improve generalization in reinforcement learning tasks.

Abstract

Deep reinforcement learning (DRL) has shown great potential in many fields due to its powerful decision-making ability. To give agents sufficient generalization capability, domain randomization is applied during the initialization phase of training in parameterizable environments. However, under the commonly adopted uniform random sampling strategy, the agent obtains inefficient samples from suboptimal environments in the late training stage, which limits the enlargement of its effective decision region. To address this issue, we first define environmental difficulty based on the long-term rewards of the agent during training. We then propose an adaptive environment generator (AEG) based on the Gaussian mixture model (GMM), which dynamically generates training environments whose difficulty is tailored to the agent's learning progression. The generator maintains a database of environmental parameters selected by environmental difficulty and fits a GMM to the data in the database. During the environment initialization stage of each training episode, the AEG generates environmental parameters probabilistically, sampling from either the GMM or a uniform random distribution, which ensures both appropriate difficulty and sufficient exploration capability. Simulation results demonstrate that AEG-based training expedites learning in the early phases while generating more challenging environments in the late stages. Comprehensive evaluations across multiple environments validate the general applicability of AEG and demonstrate that the resulting agent achieves broader coverage of effective decision-making regions than baseline methods.
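
The per-episode sampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a GMM that has already been fitted elsewhere, diagonal covariances, and clipping of GMM samples to the valid parameter range. The names `sample_gmm`, `generate_env_params`, and `p_gmm`, as well as all numeric values, are hypothetical and chosen only for illustration.

```python
import random


def sample_gmm(weights, means, stds, rng):
    """Draw one parameter vector from a diagonal-covariance Gaussian mixture.

    Assumption: the mixture is already fitted; fitting is not shown here.
    """
    # Pick a mixture component according to the mixture weights.
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Sample each parameter dimension independently from that component.
    return [rng.gauss(m, s) for m, s in zip(means[k], stds[k])]


def generate_env_params(gmm, bounds, p_gmm, rng):
    """Return (params, source) for one training episode.

    With probability p_gmm the parameters come from the GMM (difficulty-aware
    sampling); otherwise they come from a uniform distribution over `bounds`
    (exploration). If no GMM is available yet, fall back to uniform sampling.
    """
    if gmm is not None and rng.random() < p_gmm:
        raw = sample_gmm(gmm["weights"], gmm["means"], gmm["stds"], rng)
        # Assumption: out-of-range samples are clipped to the parameter bounds.
        clipped = [min(max(x, lo), hi) for x, (lo, hi) in zip(raw, bounds)]
        return clipped, "gmm"
    return [rng.uniform(lo, hi) for lo, hi in bounds], "uniform"


# Hypothetical two-parameter environment and a hand-written two-component GMM.
rng = random.Random(0)
bounds = [(0.0, 1.0), (5.0, 10.0)]
gmm = {
    "weights": [0.7, 0.3],
    "means": [[0.2, 6.0], [0.8, 9.0]],
    "stds": [[0.05, 0.5], [0.05, 0.5]],
}
params, source = generate_env_params(gmm, bounds, p_gmm=0.8, rng=rng)
```

In this sketch, `p_gmm` trades off difficulty-matched environments against uniform exploration; how the paper sets or schedules that probability is not stated in the abstract.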
