This paper addresses motion control of an autonomous underwater vehicle (AUV) that follows a leader in a leader–follower scheme using visual information about the leader’s position. The leader is assumed to carry a system of light markers with known geometry, and the follower determines its relative position from onboard camera data, without a hydroacoustic communication channel or direct exchange of navigation information. The control law is synthesized by reinforcement learning with the Proximal Policy Optimization algorithm. The policy is trained in a simulation environment that accounts for the agent’s dynamic model in the horizontal plane and for observation noise. A structure for the state space, action space, and reward function is proposed, aimed at minimizing the error in relative position and orientation. Bayesian optimization of the reward function weights reduces the RMS tracking error from 0.24 m to 0.09 m and shows that heading regulation has a significantly stronger impact on stability than position penalties. Simulation results, tests in the Webots environment, and experiments on MiddleAUV-class devices confirm the feasibility and scalability of the approach. A single trained policy is shown to maintain a stable formation when the number of follower agents and the initial conditions change, without additional retraining.
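The abstract does not give the exact form of the reward, so the following is only a minimal sketch of what a weighted reward over relative position error and heading error might look like, together with the RMS tracking error used as the evaluation metric. The function names, the linear penalty form, and the weights `w_pos` and `w_head` are assumptions made for illustration, not the authors' formulation.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def reward(rel_pos, desired_pos, heading_err, w_pos=1.0, w_head=1.0):
    """Hypothetical per-step reward: a weighted negative penalty.

    rel_pos, desired_pos: (x, y) of the leader relative to the follower, in metres.
    heading_err: heading error in radians (wrapped before use).
    w_pos, w_head: illustrative weights of the kind a Bayesian optimizer would tune.
    """
    dx = rel_pos[0] - desired_pos[0]
    dy = rel_pos[1] - desired_pos[1]
    pos_err = math.hypot(dx, dy)
    head_err = abs(wrap_angle(heading_err))
    return -(w_pos * pos_err + w_head * head_err)


def rms_error(errors):
    """Root-mean-square of a sequence of tracking errors, in metres."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

In a sketch like this, an outer optimization loop would propose values for `w_pos` and `w_head`, train a policy with each candidate pair, and score it by `rms_error` over evaluation episodes.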
Norenko et al. (Tue,) studied this question.
www.synapsesocial.com/papers/69e07d8f2f7e8953b7cbe8b9 — DOI: https://doi.org/10.3390/drones10040282
Evgenii Norenko
Vadim Kramar
Aleksey Kabanov
Drones
Marine Hydrophysical Institute