What question did this study set out to answer?

March 2, 2026

Multi objective real-time optimization and adaptive control strategy for catalytic cracking unit based on deep reinforcement learning

Key Points

The aim is to develop a real-time adaptive control strategy for catalytic cracking units using deep reinforcement learning.
Developed a framework integrating proximal policy optimization (PPO) with an adaptive fuzzy PID controller (AFPC).
Collected real-time sensor data including reactor temperature, feed rate, and product quality.
Used a reward signal tied to control stability, product quality, and energy efficiency.
Achieved rise and settling times of 27–31 seconds in simulation, outperforming traditional control methods.
Improved predictive accuracy with RMSE of 0.092, MAE of 0.011, and MSE of 0.0025.
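The key points describe a single reward built from three objectives. The study does not say how they are combined, so the sketch below assumes a weighted-sum scalarization. The weights, the squared-error stability term, and the argument names are all hypothetical illustrations, not the authors' formulation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the study does not publish its reward coefficients.
    stability: float = 1.0
    quality: float = 1.0
    energy: float = 0.5


def multi_objective_reward(temp_error: float,
                           quality_error: float,
                           energy_use: float,
                           weights: RewardWeights = RewardWeights()) -> float:
    """Weighted-sum reward (an assumed scalarization, not the paper's formula).

    temp_error    -- reactor temperature deviation from its setpoint (stability)
    quality_error -- deviation of a product-quality measurement from its target
    energy_use    -- normalized energy consumption over the control step
    """
    stability_term = -weights.stability * temp_error ** 2  # large deviations cost quadratically
    quality_term = -weights.quality * abs(quality_error)
    energy_term = -weights.energy * energy_use
    return stability_term + quality_term + energy_term


# Usage: a step with a 2-unit temperature error, 0.5 quality offset, and unit energy use.
print(multi_objective_reward(2.0, 0.5, 1.0))  # -5.0
```

Every term is a penalty, so the agent maximizes reward by driving all three quantities toward zero; changing the (hypothetical) weights shifts the trade-off between tight temperature control, product quality, and energy cost.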

Abstract

Catalytic cracking units (CCUs) are central to modern refineries because they convert heavy hydrocarbons into lighter, higher-value products such as gasoline. Operating them means handling highly complex, nonlinear dynamics, conflicting goals (for example, maximizing yield while minimizing energy consumption and emissions), and constantly changing process conditions. Traditional control strategies such as PID (proportional-integral-derivative) control and model-based methods are not adaptive real-time controllers and tend to require human intervention, which limits their effectiveness in dynamic environments. To address these challenges, this research introduces a framework for multi-objective real-time optimization and adaptive control of CCUs based on deep reinforcement learning (DRL). The proposed PPO-AFPC (proximal policy optimization with an adaptive fuzzy PID controller) combines PPO, a DRL algorithm that is stable in continuous action spaces, with an adaptive fuzzy PID controller for more accurate output control. Real-time sensor data from the CCU, including reactor temperature, feed rate, and product quality, are fed to the PPO agent to compute actions; these actions are passed to the AFPC, which dynamically adjusts its fuzzy rules and PID parameters. The policy improves continuously through reward signals tied to control stability, product quality, and energy efficiency. Simulation results show that the PPO-AFPC model achieves a rise time and settling time of 27–31 s, a substantial improvement in dynamic response over traditional methods.
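The hand-off from the PPO agent to the AFPC can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-set fuzzy partition of |error|, the rule table, the mapping of a four-component action onto gain and fuzzy-range scale factors, and the first-order plant are all hypothetical, and a neutral placeholder stands in for a trained PPO actor.

```python
import numpy as np

# Hypothetical rule table: rows are fuzzy sets for |error| (small, medium, large),
# columns are multipliers applied to the (Kp, Ki, Kd) base gains.
RULES = np.array([
    [0.8, 1.5, 1.0],  # small error: softer P, stronger I to remove offset
    [1.0, 1.0, 1.0],  # medium error: keep base gains
    [1.5, 0.5, 1.2],  # large error: stronger P and D, weaker I to limit windup
])


def memberships(x: float) -> np.ndarray:
    """Membership degrees of a normalized |error| in (small, medium, large).

    The three triangular sets partition [0, 1], so the degrees always sum to 1.
    """
    x = min(max(x, 0.0), 1.0)
    return np.array([max(0.0, 1.0 - 2.0 * x),
                     1.0 - abs(2.0 * x - 1.0),
                     max(0.0, 2.0 * x - 1.0)])


class AdaptiveFuzzyPID:
    def __init__(self, kp: float, ki: float, kd: float, e_max: float, dt: float):
        self.base_gains = np.array([kp, ki, kd], dtype=float)
        self.e_max = e_max  # error magnitude treated as "fully large"
        self.dt = dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, setpoint: float, measurement: float, action) -> float:
        # action[:3] rescales the base PID gains and action[3] rescales the fuzzy
        # error range; each component is clipped to [-1, 1], i.e. a factor in [0.5, 2].
        a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)
        base = self.base_gains * 2.0 ** a[:3]
        e_max = self.e_max * 2.0 ** a[3]

        error = setpoint - measurement
        # Weighted-average defuzzification (membership weights already sum to 1).
        kp, ki, kd = base * (memberships(abs(error) / e_max) @ RULES)

        self.integral += error * self.dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return kp * error + ki * self.integral + kd * derivative


def simulate(policy, steps: int = 2000, dt: float = 0.1, tau: float = 5.0, gain: float = 1.0):
    """Closed loop on a hypothetical first-order plant tau*dy/dt = -y + gain*u (Euler)."""
    ctrl = AdaptiveFuzzyPID(kp=2.0, ki=0.5, kd=0.1, e_max=1.0, dt=dt)
    setpoint, y = 1.0, 0.0
    trajectory = []
    for _ in range(steps):
        action = policy(np.array([setpoint, y]))  # observation: setpoint and output
        u = ctrl.step(setpoint, y, action)
        y += dt / tau * (-y + gain * u)
        trajectory.append(y)
    return np.array(trajectory)


def neutral_policy(obs: np.ndarray) -> np.ndarray:
    """Placeholder for a trained PPO actor: a zero action leaves the base gains unchanged."""
    return np.zeros(4)


y = simulate(neutral_policy)
print(f"final output: {y[-1]:.3f}")
```

Swapping `neutral_policy` for a trained actor would let the agent retune the gains and the fuzzy range at every step. The exponential mapping `2.0 ** a` is one design choice among many; it keeps every gain positive whatever action the policy emits.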
In addition, predictive accuracy improved, with an RMSE of 0.092, an MAE of 0.011, and an MSE of 0.0025, indicating more precise, stable, and efficient process control. Overall, this hybrid approach shows potential as a solution for real-time adaptive optimization of complex industrial processes.
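The dynamic-response and accuracy figures above rest on standard definitions. The sketch below computes them with NumPy, assuming the common 10–90% rise-time and ±2% settling-band conventions (the study does not state which conventions it used) and a rising step response toward a positive final value. When MSE, RMSE, and MAE are computed on the same residuals, RMSE is the square root of MSE and is never smaller than MAE.

```python
import numpy as np


def error_metrics(y_true, y_pred) -> dict:
    """MSE, RMSE, and MAE of predictions against reference values."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mse = float(np.mean(err ** 2))
    return {"MSE": mse, "RMSE": float(np.sqrt(mse)), "MAE": float(np.mean(np.abs(err)))}


def rise_time(t, y, final, lo=0.1, hi=0.9) -> float:
    """Time from the first crossing of lo*final to the first crossing of hi*final.

    Assumes a rising response toward a positive final value.
    """
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    if not np.any(y >= hi * final):
        return float("inf")  # the response never reached the upper threshold
    t_lo = t[np.argmax(y >= lo * final)]  # argmax on a boolean array gives the first True
    t_hi = t[np.argmax(y >= hi * final)]
    return float(t_hi - t_lo)


def settling_time(t, y, final, band=0.02) -> float:
    """First sample time after which y stays within +/- band * |final| of final."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    outside = np.abs(y - final) > band * abs(final)
    if not outside.any():
        return float(t[0])
    last = np.flatnonzero(outside)[-1]  # last sample outside the band
    return float(t[last + 1]) if last + 1 < len(t) else float("inf")


# Usage: ideal first-order step response with a hypothetical 10 s time constant.
t = np.linspace(0.0, 100.0, 10001)
y = 1.0 - np.exp(-t / 10.0)
print(rise_time(t, y, 1.0))      # analytic value: 10*ln(9), about 21.97 s
print(settling_time(t, y, 1.0))  # analytic value: 10*ln(50), about 39.12 s
print(error_metrics([0.0, 0.0], [3.0, 4.0]))  # MSE = 12.5, RMSE about 3.536, MAE = 3.5
```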


Cite This Study

Zhang et al. studied this question.

synapsesocial.com/papers/69a52df3f1e85e5c73bf1332 https://doi.org/10.1515/cppm-2025-0212
