Mobile robot navigation remains challenging when fast convergence, collision avoidance and deployability must be satisfied simultaneously. This paper extends the original Q-learning with Artificial Potential Field (QAPF) paradigm with three coordinated mechanisms. Together they yield a reported-horizon convergence reduction of approximately four orders of magnitude (from ∼3×10⁶ episodes to ∼200 to 230 episodes under the present protocol) and an internal-ablation collision-rate reduction of approximately one order of magnitude (6.2% to 0.3%), and they open a new capability frontier covering dynamic obstacles, multi-robot coordination, energy-aware velocity modulation and embedded-deployable inference timing. The first mechanism is a potential-based reward-shaping schedule whose unclipped fixed-weight form follows the policy-invariant shaping theorem, while the implemented clipped and time-varying form is used as an empirically stable approximation. The convergence comparison against the ∼3×10⁶ episodes reported for the original QAPF formulation is protocol-dependent and is not claimed as a controlled one-to-one runtime speedup. The second mechanism is a discrete Control Barrier Function (CBF)-inspired action filter with per-episode visit memory. The filter is inspired by the continuous-time CBF literature but does not carry a forward-invariance proof; it is used as an empirical safety mechanism rather than as a formal CBF in the continuous-time sense. It reduces the held-out collision rate from 6.2% for QAPF alone to 0.3% while maintaining 93.8% task completion; this collision-rate comparison is internal to the QAPF ablation because the prior QAPF reference does not report a comparable held-out collision metric.
The third mechanism is a set of extensions to dynamic obstacles, two-robot cooperative navigation under a centralized scheme (with an explicit O(N²) scaling-cost analysis and three decentralization strategies for fleets beyond the small-N regime), curriculum learning and energy-aware velocity modulation. The revised evaluation adds disturbance robustness tests, empirical timeout/stagnation detection for unreachable-goal cases, i7 reference inference timing with projected embedded-device latencies, multi-axis generalization over obstacle density and grid size, scalability analysis for centralized multi-robot coordination and a scope comparison against A* and RRT*. Across 30 independent seeds on held-out static maps, adaptive QAPF achieves 94.5±2.1% success, while QAPF+CBF achieves 93.8±2.3% success with 0.3±0.4% collisions. Under a separate finite robustness suite, QAPF+CBF retains 85.0±4.1% success in the combined disturbance regime. The timing study indicates that all methods comfortably exceed the 20 Hz real-time threshold on the measured i7 reference platform and on all projected embedded-device equivalents. The results show that APF-guided tabular reinforcement learning, when paired with a discrete safety filter and a clarified energy and robustness analysis, can provide a lightweight, safety-oriented navigation policy for grid-based mobile-robot settings.
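As a rough illustration of the first two mechanisms named in the abstract, the sketch below shows the standard unclipped potential-based shaping term F(s, s') = γΦ(s') − Φ(s) and a simple discrete action filter on a grid. It is a minimal sketch under stated assumptions, not the authors' implementation: the potential (negative Manhattan distance to the goal), the four-action set, the filter rule (reject moves that leave the grid or enter an obstacle cell) and all function names are hypothetical. The clipped, time-varying schedule and the per-episode visit memory described in the abstract are omitted.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]

# Assumed four-connected action set (row delta, column delta).
ACTIONS: Dict[str, Cell] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def potential(cell: Cell, goal: Cell) -> float:
    """Hypothetical attractive potential: negative Manhattan distance to the goal."""
    return -float(abs(cell[0] - goal[0]) + abs(cell[1] - goal[1]))


def shaped_reward(r: float, s: Cell, s_next: Cell, goal: Cell,
                  gamma: float = 0.95) -> float:
    """Environment reward plus the unclipped shaping term gamma*Phi(s') - Phi(s)."""
    return r + gamma * potential(s_next, goal) - potential(s, goal)


def safe_actions(s: Cell, obstacles: Set[Cell], size: int) -> List[str]:
    """Assumed filter rule: keep actions whose successor cell is inside the
    size-by-size grid and is not an obstacle."""
    allowed = []
    for name, (dr, dc) in ACTIONS.items():
        nxt = (s[0] + dr, s[1] + dc)
        in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
        if in_bounds and nxt not in obstacles:
            allowed.append(name)
    return allowed


def select_action(q_row: Dict[str, float], allowed: List[str]) -> str:
    """Greedy action over the filtered set; unseen actions default to Q = 0."""
    if not allowed:
        raise ValueError("no admissible action in this state")
    return max(allowed, key=lambda a: q_row.get(a, 0.0))
```

In this sketch the filter runs before action selection, so a high Q-value for an inadmissible move cannot be chosen; whether the paper applies its filter before or after selection is not stated in the abstract.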
Elizabeth Isaac
Asha J. George
Iacovos Ioannou
Electronics
University of Cyprus
Jain University
Koneru Lakshmaiah Education Foundation
Isaac et al. (Sun,) studied this question.
www.synapsesocial.com/papers/69fadaab03f892aec9b1e65d — DOI: https://doi.org/10.3390/electronics15091945