Experimental deep reinforcement learning (DRL) control of a turbulent boundary layer is conducted for the first time at Re_ = 1196, with the aim of reducing friction drag. Two hot films, an impinging plasma jet actuator array and two wall hot wires serve as the state detector, flow disturber and reward evaluator, respectively. The control law, parametrised by a radial basis function network, is executed in real time on a field programmable gate array and optimised using a classical value-based algorithm, the deep Q-network (DQN). Results show that DRL needs only 30 s of training to obtain a closed-loop control law with satisfactory drag-reduction performance. This is a significant gain in experimental efficiency over open-loop control, for which only fine-tuned periodic forcing reduces the friction drag. Proper setting of the hyper-parameters is crucial in DRL; in particular, the reward time delay and the control frequency must match the convection time scale and the characteristic frequency of the turbulent boundary layer, respectively. The optimal DRL setting achieves a 6.7 % relative drag reduction, almost three times that of the best open-loop control (2.3 %). Physically, plasma actuation induces alternating low-speed and high-speed zones that confine the sidewise motion of turbulent streaks. The final control law optimised by DRL can be simplified to a threshold control that fires the plasma actuator after perceiving both a streak burst event and a long-lasting high-speed zone. The control benefits are attributed to an increased occurrence probability of high-reward states and an elevated mean reward in the different clusters.
Fang et al. (Thu,) studied this question.
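To make the training loop described in the abstract more concrete, the sketch below trains a Q-function parametrised by Gaussian radial basis functions with a DQN-style update (experience replay plus a periodically synchronised target copy) and a delayed reward. It is a minimal illustration under stated assumptions, not the authors' implementation: the two-component state, the 3×3 grid of centres, `SIGMA`, the synthetic `toy_reward`, the delays and all learning parameters are hypothetical, and the linear RBF read-out stands in for whatever network was actually deployed on the FPGA. The `true_delay` and `assumed_delay` parameters (also hypothetical) show why the reward time delay has to match the physical convection time: the agent credits each observed reward to the action taken `assumed_delay` control periods earlier.

```python
import math
import random
from collections import deque

# Hypothetical setup, not the authors' code: the state is two normalised
# hot-film readings in [0, 1]; action 0 = plasma off, action 1 = fire.
CENTERS = [(x, y) for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)]
SIGMA = 0.35
N_ACTIONS = 2


def rbf_features(state):
    """Gaussian radial basis functions evaluated at every centre."""
    return [math.exp(-((state[0] - cx) ** 2 + (state[1] - cy) ** 2) / (2 * SIGMA ** 2))
            for cx, cy in CENTERS]


def q_values(weights, state):
    """Q(s, a) = sum_j w[a][j] * phi_j(s): one linear RBF read-out per action."""
    phi = rbf_features(state)
    return [sum(w * p for w, p in zip(weights[a], phi)) for a in range(N_ACTIONS)]


def greedy_action(weights, state):
    """Action with the largest Q-value (ties go to action 0)."""
    q = q_values(weights, state)
    return max(range(N_ACTIONS), key=lambda a: q[a])


def toy_reward(state, action):
    """Synthetic stand-in for the hot-wire reward: firing pays off only when s[0] > 0.5."""
    if action == 0:
        return 0.0
    return 1.0 if state[0] > 0.5 else -1.0


def train(steps=5000, true_delay=3, assumed_delay=3, gamma=0.5, lr=0.05,
          epsilon=0.3, batch=8, target_sync=100, seed=0):
    rng = random.Random(seed)
    weights = [[0.0] * len(CENTERS) for _ in range(N_ACTIONS)]
    target = [row[:] for row in weights]   # frozen target copy, as in DQN
    replay = deque(maxlen=2000)            # experience replay buffer
    history = []                           # (s, a, s') for every control period
    state = (rng.random(), rng.random())
    for t in range(steps):                 # one iteration = one control period
        if rng.random() < epsilon:         # epsilon-greedy exploration
            action = rng.randrange(N_ACTIONS)
        else:
            action = greedy_action(weights, state)
        next_state = (rng.random(), rng.random())
        history.append((state, action, next_state))
        # The sensor sees the effect of an action true_delay periods later;
        # the agent credits that reward to the action assumed_delay periods back.
        if t >= max(true_delay, assumed_delay):
            s_true, a_true, _ = history[t - true_delay]
            reward = toy_reward(s_true, a_true)
            s, a, s_next = history[t - assumed_delay]
            replay.append((s, a, reward, s_next))
        if len(replay) >= batch:
            for _ in range(batch):         # minibatch sampled with replacement
                s, a, r, s_next = replay[rng.randrange(len(replay))]
                td_target = r + gamma * max(q_values(target, s_next))
                phi = rbf_features(s)
                td_error = td_target - sum(w * p for w, p in zip(weights[a], phi))
                for j, p in enumerate(phi):
                    weights[a][j] += lr * td_error * p   # semi-gradient Q-learning step
        if t % target_sync == 0:
            target = [row[:] for row in weights]
        state = next_state
    return weights


weights = train()
print(greedy_action(weights, (0.9, 0.5)), greedy_action(weights, (0.1, 0.5)))
```

With matching delays, the greedy policy in this toy problem should fire for high `s[0]` and stay off for low `s[0]`, so it collapses into a threshold rule; this follows from how `toy_reward` was constructed and only loosely mirrors the threshold control reported above. Setting `assumed_delay` different from `true_delay` pairs each reward with the wrong action, and the learned preference is then no longer reliable.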