What question did this study set out to answer?

This research aims to develop a method for controlling the movement of an autonomous underwater vehicle (AUV) follower based on visual cues from a leader AUV.

April 16, 2026 · Open Access

Method for Controlling the Movement of an AUV Follower Based on Visual Information About the Position of the AUV Leader Using Reinforcement Learning Methods

Key Points

Implemented a leader–follower scheme for AUV navigation without hydroacoustic communication.
Trained the control policy with Proximal Policy Optimization (PPO) in a simulated environment.
Defined the state space, the action space, and a reward function aimed at minimizing relative position and heading errors.
Used Bayesian optimization to tune the reward function weights.
Reduced the RMS tracking error from 0.24 m to 0.09 m through this optimization.
Found that heading regulation affects stability significantly more than the position penalties do.
Demonstrated that a single trained policy maintains a stable formation across varying numbers of followers and initial conditions without retraining.
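The reward design summarized above can be illustrated with a minimal Python sketch. It assumes a quadratic penalty on the follower's position error relative to a desired slot behind the leader, a quadratic penalty on heading error, and a small control-effort term. The class `RewardWeights`, the functions `reward`, `wrap_angle`, and `rms_tracking_error`, and all default weight values are hypothetical and are not taken from the paper. The RMS function shows the standard way a tracking-error figure such as 0.24 m or 0.09 m is computed from per-step position errors.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class RewardWeights:
    # Hypothetical default weights; in the study, coefficients of this kind
    # are what Bayesian optimization tunes.
    w_pos: float = 1.0    # penalty on squared position error (m^2)
    w_head: float = 1.0   # penalty on squared heading error (rad^2)
    w_act: float = 0.01   # small penalty on control effort


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def reward(ex: float, ey: float, e_psi: float,
           action: Sequence[float], w: RewardWeights) -> float:
    """Negative weighted quadratic cost for one time step.

    ex, ey : follower position error relative to its desired slot (m)
    e_psi  : heading error relative to the leader (rad)
    action : control command issued by the policy at this step
    """
    pos_cost = ex * ex + ey * ey
    head_cost = wrap_angle(e_psi) ** 2          # 2*pi-equivalent headings cost nothing
    effort_cost = sum(u * u for u in action)
    return -(w.w_pos * pos_cost + w.w_head * head_cost + w.w_act * effort_cost)


def rms_tracking_error(errors: Sequence[Tuple[float, float]]) -> float:
    """Root-mean-square Euclidean position error over an episode (m)."""
    if not errors:
        raise ValueError("need at least one error sample")
    mean_sq = sum(ex * ex + ey * ey for ex, ey in errors) / len(errors)
    return math.sqrt(mean_sq)
```

For example, `rms_tracking_error([(0.3, 0.4), (0.0, 0.0)])` returns the square root of 0.125, about 0.354 m. In a sketch like this, the ratio `w_head / w_pos` is the kind of trade-off a weight search would explore; the study reports that the heading term matters more for stability than the position penalties.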

Abstract

This paper considers the problem of controlling the motion of an autonomous underwater vehicle (AUV) that follows a leader in a leader–follower scheme, using visual information about the leader's position. The leader is assumed to carry a set of light markers with known geometry, and the follower determines its relative position from onboard camera data, without a hydroacoustic communication channel or any direct exchange of navigation information. The control law is synthesized with a reinforcement learning method based on the Proximal Policy Optimization (PPO) algorithm. The policy is trained in a simulation environment that accounts for the agent's dynamic model in the horizontal plane and for observation noise. A structure for the state space, action space, and reward function is proposed, aimed at minimizing errors in relative position and orientation. Bayesian optimization of the reward function weights reduces the RMS tracking error from 0.24 m to 0.09 m and shows that heading regulation has a significantly stronger impact on stability than the position penalties. Results of modeling, testing in the Webots environment, and experiments on MiddleAUV-class vehicles confirm the feasibility and scalability of the approach. A single trained policy is shown to maintain a stable formation without retraining when the number of follower agents and the initial conditions change.
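The abstract states that the follower recovers the leader's relative position from camera images of light markers with known geometry. As an illustration only, the Python sketch below uses a simplified two-marker pinhole-camera model: range follows from similar triangles, and the lateral offset from back-projecting the midpoint between the markers. The function names, parameters, and the two-marker, small-relative-yaw assumption are hypothetical; the paper's actual estimator and marker layout may differ. `add_pixel_noise` shows one simple way to inject observation noise during simulated training.

```python
import math
import random
from typing import Tuple


def relative_position_from_markers(u_left: float, u_right: float,
                                   f_px: float, cx_px: float,
                                   baseline_m: float) -> Tuple[float, float, float]:
    """Estimate the leader's position in the follower's camera frame.

    Simplified two-marker pinhole model (an assumption, not the paper's
    estimator): the two lights are a known baseline apart, lie at the same
    depth, and the baseline is roughly perpendicular to the optical axis.

    u_left, u_right : horizontal pixel coordinates of the two marker blobs
    f_px            : focal length in pixels
    cx_px           : principal-point column in pixels
    baseline_m      : known distance between the markers (m)

    Returns (forward range z, lateral offset x, bearing) in m, m, rad.
    """
    disparity = abs(u_right - u_left)
    if disparity == 0.0:
        raise ValueError("markers coincide in the image; range is undefined")
    z = f_px * baseline_m / disparity            # similar triangles: d = f * D / z
    u_mid = 0.5 * (u_left + u_right)             # image column of the baseline centre
    x = z * (u_mid - cx_px) / f_px               # invert u = f * x / z + cx
    bearing = math.atan2(u_mid - cx_px, f_px)    # angle off the optical axis
    return z, x, bearing


def add_pixel_noise(u: float, sigma_px: float, rng: random.Random) -> float:
    """Corrupt a marker measurement with zero-mean Gaussian pixel noise."""
    return u + rng.gauss(0.0, sigma_px)
```

For example, with `f_px=800`, `cx_px=320`, a 0.5 m baseline, and markers imaged at columns 300 and 400, the estimate is a range of 4.0 m and a lateral offset of 0.15 m. Because range is inversely proportional to pixel disparity, a fixed pixel error produces a range error that grows roughly with the square of the distance, which is one reason to train a policy under observation noise.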

Cite this study

Norenko et al. (2026) studied this question.

www.synapsesocial.com/papers/69e07d8f2f7e8953b7cbe8b9 (DOI: https://doi.org/10.3390/drones10040282)

Authors

Evgenii Norenko

Vadim Kramar

Aleksey Kabanov

Journals

Drones

Institutions

Marine Hydrophysical Institute
