Deep reinforcement learning (DRL) has shown great potential in many fields owing to its powerful decision-making ability. To give agents sufficient generalization capability, domain randomization is commonly applied when initializing parameterizable training environments. However, the widely adopted uniform random sampling strategy yields inefficient samples from suboptimal environments in the late stage of training, which limits the expansion of the agent's effective decision region. To address this issue, we first define environmental difficulty based on the agent's long-term rewards during training. We then propose an adaptive environment generator (AEG) based on a Gaussian mixture model (GMM), which dynamically generates training environments whose difficulty is matched to the agent's learning progress. The generator maintains a database of environmental parameters based on environmental difficulty and fits a GMM to the data in that database. At the environment initialization stage of each training episode, the AEG samples environmental parameters either from the GMM or from a uniform random distribution, with the choice made probabilistically, which ensures both appropriate difficulty and sufficient exploration. Simulation results show that AEG-based training accelerates learning in the early phase and generates more challenging environments in the late stage. Evaluations across multiple environments support the general applicability of the AEG and show that the resulting agent covers a broader effective decision-making region than baseline methods.
Yin et al. studied this question.
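The sampling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `AdaptiveEnvGenerator`, the difficulty band `target`, the mixing probability `p_gmm`, the database `capacity`, and the diagonal-covariance EM fit are all assumptions made for illustration, and the abstract does not specify how difficulty is computed from long-term rewards, so the caller is assumed to supply a difficulty value in [0, 1].

```python
import math
import random


def _log_pdf(x, mean, var):
    """Log-density of a diagonal Gaussian at point x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def fit_diag_gmm(data, k, rng, n_iter=30, var_floor=1e-4):
    """Fit a k-component diagonal-covariance GMM with plain EM."""
    n, dim = len(data), len(data[0])
    mu = [sum(x[d] for x in data) / n for d in range(dim)]
    var0 = [max(var_floor, sum((x[d] - mu[d]) ** 2 for x in data) / n)
            for d in range(dim)]
    means = [list(x) for x in rng.sample(data, k)]  # init means from data
    variances = [list(var0) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability.
        resp = []
        for x in data:
            logs = [math.log(w) + _log_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            ps = [math.exp(l - top) for l in logs]
            total = sum(ps)
            resp.append([p / total for p in ps])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                        for d in range(dim)]
            variances[j] = [
                max(var_floor,
                    sum(r[j] * (x[d] - means[j][d]) ** 2
                        for r, x in zip(resp, data)) / nj)
                for d in range(dim)]
    return weights, means, variances


class AdaptiveEnvGenerator:
    """Hypothetical AEG sketch: GMM sampling mixed with uniform sampling."""

    def __init__(self, bounds, n_components=2, p_gmm=0.7,
                 target=(0.3, 0.7), min_samples=20, capacity=500, seed=0):
        self.bounds = bounds          # [(low, high)] per environment parameter
        self.k = n_components
        self.p_gmm = p_gmm            # assumed probability of using the GMM
        self.target = target          # assumed difficulty band to store
        self.min_samples = min_samples
        self.capacity = capacity      # assumed database size limit
        self.rng = random.Random(seed)
        self.database = []
        self.gmm = None

    def record(self, params, difficulty):
        """Store parameters whose difficulty falls inside the target band."""
        lo, hi = self.target
        if lo <= difficulty <= hi:
            self.database.append(tuple(params))
            self.database = self.database[-self.capacity:]

    def fit(self):
        """Refit the GMM; returns False if the database is still too small."""
        if len(self.database) < max(self.min_samples, self.k):
            return False
        self.gmm = fit_diag_gmm(self.database, self.k, self.rng)
        return True

    def sample(self):
        """Draw parameters from the GMM or, otherwise, uniformly."""
        if self.gmm is not None and self.rng.random() < self.p_gmm:
            weights, means, variances = self.gmm
            j = self.rng.choices(range(self.k), weights=weights)[0]
            return tuple(
                min(hi, max(lo, self.rng.gauss(m, math.sqrt(v))))
                for (lo, hi), m, v in zip(self.bounds, means[j], variances[j]))
        return tuple(self.rng.uniform(lo, hi) for lo, hi in self.bounds)
```

In a training loop under these assumptions, each episode would call `sample()` to initialize the environment, then `record()` with the parameters and the measured difficulty, and `fit()` periodically. Keeping the uniform branch active after the GMM is fitted is what preserves exploration of parameter regions the database has not yet covered.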