April 17, 2026 · Open Access

Decision‐Stability and Regret Diagnostics for Reinforcement Learning Based Handover in Vehicular Mobility

Key Points

The aim is to enhance handover stability in vehicular networks using a regret-aware evaluation framework for reinforcement learning strategies.
Developed a regret-aware evaluation framework focusing on decision stability.
Implemented an actor-critic policy that manages handovers based on mobility, channel quality, and base station resource states.
Evaluated the policy using DriveNetSim to simulate vehicular and wireless dynamics.
Stable base station selection under fast mobility conditions was achieved.
The learned policy matched the oracle's median serving base station index while maintaining a comparable median received signal strength indicator (RSSI).
The nonzero regret event rate ranged from about 0.05 to 0.30 and averaged approximately 0.16, indicating that most decisions were close to the oracle reference.
Wireless delay remained below approximately 120 ms and handover delay below 2 ms, consistent with stable association behavior.

Abstract

Vehicular Base Station (BS) Handover (HO) operates under fast mobility and dynamic channel and workload conditions. These factors make stable association difficult, particularly in dense deployments where frequent HOs introduce overhead and delay. Learning‐based HO improves adaptability, yet its evaluation commonly depends on average performance metrics, which obscure unstable decision behavior. This paper introduces a regret‐aware, decision‐stability‐oriented evaluation framework for Reinforcement Learning (RL) based HO. The framework shifts the focus from aggregate reward to per‐decision reliability by using regret diagnostics relative to an oracle reference. An actor–critic HO policy is developed that observes mobility, channel quality, and BS resource states. The policy is trained to optimize long‐term utility while implicitly suppressing unnecessary HOs. The framework is evaluated on DriveNetSim, which records mobility dynamics, wireless propagation, and BS resource states. Decision quality is evaluated using regret indicators and delay decomposition. Results show stable BS selection under fast mobility, with the learned policy matching the oracle median serving BS index while maintaining a comparable median received signal strength indicator (RSSI) of approximately dBm versus dBm for the oracle. The nonzero regret event rate stays between about 0.05 and 0.30, with an average of approximately 0.16, indicating that most decisions remain close to the oracle reference. Delay analysis further shows that wireless delay remains below approximately 120 ms and HO delay remains below 2 ms, confirming stable association behavior under dynamic vehicular conditions.
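To make the regret diagnostics concrete, here is a minimal Python sketch of how a per‐decision regret and a nonzero regret event rate could be computed. The abstract does not give the paper's exact regret definition, so this sketch assumes regret is the oracle's utility minus the policy's utility at each decision, floored at zero. The function names, the tolerance `tol`, and the toy utility values are all hypothetical. They do not come from the study or from DriveNetSim.

```python
def per_decision_regret(oracle_utility, policy_utility):
    """Assumed definition: regret = oracle utility - policy utility, floored at 0."""
    if len(oracle_utility) != len(policy_utility):
        raise ValueError("utility sequences must have the same length")
    return [max(0.0, o - p) for o, p in zip(oracle_utility, policy_utility)]


def regret_event_rate(regrets, tol=1e-9):
    """Fraction of decisions whose regret exceeds tol (a 'nonzero regret event')."""
    if not regrets:
        raise ValueError("need at least one decision")
    return sum(r > tol for r in regrets) / len(regrets)


# Toy, made-up utilities for five handover decisions (not data from the paper).
oracle = [1.0, 0.8, 0.9, 1.0, 0.7]
policy = [1.0, 0.8, 0.6, 1.0, 0.7]

regrets = per_decision_regret(oracle, policy)
rate = regret_event_rate(regrets)
print(f"nonzero regret event rate: {rate:.2f}")
```

In this toy run, one of the five decisions deviates from the oracle, so the event rate is 0.20. An event rate counts how often the policy departs from the oracle, which an average reward would hide, but it does not show how large each departure is. The per‐decision regret values carry that information.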
