ABSTRACT Vehicular Base Station (BS) Handover (HO) operates under fast mobility and dynamic channel and workload conditions. These factors make stable association difficult, particularly in dense deployments where frequent HOs introduce overhead and delay. Learning-based HO improves adaptability, yet its evaluation commonly depends on average performance metrics, which obscure unstable decision behavior. This paper introduces a regret-aware, decision-stability-oriented evaluation framework for Reinforcement Learning (RL) based HO. The framework shifts the focus from aggregate reward to per-decision reliability by using regret diagnostics relative to an oracle reference. An actor–critic HO policy is developed that observes mobility, channel quality, and BS resource states. The policy is trained to optimize long-term utility while implicitly suppressing unnecessary HOs. The framework is evaluated on DriveNetSim, which records mobility dynamics, wireless propagation, and BS resource states. Decision quality is evaluated using regret indicators and delay decomposition. Results show stable BS selection under fast mobility, with the learned policy matching the oracle median serving BS index while maintaining a comparable median received signal strength indicator (RSSI) of approximately dBm versus dBm for the oracle. The nonzero regret event rate stays between about 0.05 and 0.30, with an average of approximately 0.16, indicating that most decisions remain close to the oracle reference. Delay analysis further shows that wireless delay remains below approximately 120 ms and HO delay remains below 2 ms, confirming stable association behavior under dynamic vehicular conditions.
Badshah et al. studied this question.
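The regret diagnostics mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything here is an assumption: per-decision regret is taken as the gap between the best available utility at a step (the oracle choice) and the utility of the BS the policy actually selected, and the nonzero regret event rate is the fraction of steps where that gap exceeds a small tolerance. The function names, the `utilities` table, and the `tol` parameter are hypothetical and are not taken from the paper or from DriveNetSim.

```python
from typing import List, Sequence


def per_decision_regret(utilities: Sequence[Sequence[float]],
                        chosen: Sequence[int]) -> List[float]:
    """Regret at each step: oracle (best) utility minus utility of the chosen BS.

    utilities[t][b] is an assumed utility of serving BS b at step t;
    chosen[t] is the BS index selected by the policy at step t.
    """
    if len(utilities) != len(chosen):
        raise ValueError("utilities and chosen must have the same length")
    return [max(u_t) - u_t[b_t] for u_t, b_t in zip(utilities, chosen)]


def regret_event_rate(regrets: Sequence[float], tol: float = 1e-9) -> float:
    """Fraction of decisions whose regret exceeds tol (assumed definition)."""
    if not regrets:
        raise ValueError("regrets must be non-empty")
    return sum(r > tol for r in regrets) / len(regrets)


def handover_count(chosen: Sequence[int]) -> int:
    """Number of steps at which the serving BS index changes."""
    return sum(a != b for a, b in zip(chosen, chosen[1:]))


# Toy trace: 5 decision steps, 3 candidate BSs (illustrative values only).
utilities = [
    [0.9, 0.4, 0.1],
    [0.8, 0.5, 0.2],
    [0.5, 0.7, 0.3],  # the oracle would already switch to BS 1 here
    [0.2, 0.9, 0.4],
    [0.1, 0.6, 0.8],
]
chosen = [0, 0, 0, 1, 2]  # the policy switches one step later than the oracle

regrets = per_decision_regret(utilities, chosen)
print("per-decision regret:", regrets)
print("nonzero regret event rate:", regret_event_rate(regrets))
print("handovers:", handover_count(chosen))
```

In this toy trace the policy incurs regret only at the step where it delays a handover, which is the kind of per-decision deviation that an average reward would smooth over.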