Achieving stable bipedal locomotion for humanoid robots remains a central challenge in reinforcement learning (RL), in which the design of reward functions is pivotal but non-trivial. This paper proposes a three-tier statistical reward shaping framework to optimize bipedal gait learning. First, training outcomes are diagnostically monitored using forward distance, fall rate, and posture score. Pearson correlation and regression analyses are then employed to identify trade-offs and isolate the direct effects of reward components. Finally, targeted parameter sweeps enable directionally guided optimization, substantially reducing heuristic parameter tuning while refining a reward function for the H1 robot in Isaac Lab. Experimental results demonstrate clear improvements over the baseline. The optimized policy reduces convergence time by 14% and increases forward distance by 186%. Stability is markedly enhanced, with fall rate decreasing from 75% to 2% and active locomotion efficiency nearly doubling (0.339 to 0.678). These results validate a reproducible, data-driven framework for reward design, highlighting the importance of principled statistical analysis in complex RL-based humanoid locomotion.
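The correlation and regression stage described above can be illustrated with a minimal sketch. It assumes each training run logs its reward weights alongside the three monitored outcomes. The weight names (`w_velocity`, `w_upright`, `w_energy`), the helper functions, and all numbers are hypothetical placeholders generated synthetically; none of them are taken from the paper, and the authors' actual reward terms and analysis code may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs = 200

# Hypothetical reward weights, one row per training run (names are illustrative).
weight_names = ["w_velocity", "w_upright", "w_energy"]
weights = rng.uniform(0.0, 1.0, size=(n_runs, 3))

# Synthetic outcomes standing in for logged metrics. NOT data from the paper.
noise = rng.normal(0.0, 0.05, size=(n_runs, 3))
forward_distance = 2.0 * weights[:, 0] - 0.5 * weights[:, 2] + noise[:, 0]
fall_rate = 0.6 * weights[:, 0] - 0.8 * weights[:, 1] + noise[:, 1]
posture_score = 0.9 * weights[:, 1] + noise[:, 2]

outcome_names = ["forward_distance", "fall_rate", "posture_score"]
outcomes = np.column_stack([forward_distance, fall_rate, posture_score])


def pearson_matrix(columns: np.ndarray) -> np.ndarray:
    """Pairwise Pearson correlation between the columns of a 2-D array."""
    return np.corrcoef(columns, rowvar=False)


def ols_coefficients(features: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Ordinary least squares fit; returns [intercept, slope_1, ..., slope_k]."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


# Step 1: correlations between outcomes expose trade-offs
# (e.g. distance gained at the cost of more falls).
outcome_corr = pearson_matrix(outcomes)

# Step 2: regressing each outcome on all weights jointly separates the
# direct effect of each reward component from the others.
effects = {
    name: ols_coefficients(weights, outcomes[:, i])[1:]
    for i, name in enumerate(outcome_names)
}

for name, slopes in effects.items():
    summary = ", ".join(f"{w}={s:+.2f}" for w, s in zip(weight_names, slopes))
    print(f"{name}: {summary}")
```

In this sketch, the sign of each regression slope suggests the direction in which a weight could be swept in a follow-up parameter search, which is the role the abstract assigns to the third tier.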
Shuhan Yan
Chuan Chen
Xinliang Zhou
Electronics
Nanyang Technological University
Beijing Jiaotong University
Yan et al. (Fri,) studied this question.
www.synapsesocial.com/papers/69b6068883145bc643d1c67f — DOI: https://doi.org/10.3390/electronics15061203