Traditional auditing methods suffer from delayed responses, low resource utilization, and high misjudgment risks in dynamic business environments. This paper proposes an enterprise intelligent auditing process optimization and automated decision-making method integrating Reinforcement Learning (RL). The approach models auditing decisions as a Markov Decision Process (MDP), constructs a multi-objective reward function encompassing risk, cost, and efficiency, and designs a policy learning algorithm based on Deep Q-Network (DQN) to achieve dynamic allocation of auditing resources and path optimization. The system framework comprises a data perception layer, RL decision core layer, and application interaction layer, supporting real-time feedback and expert intervention to form a closed-loop mechanism of “continuous monitoring and self-evolution.” Experiments using real enterprise supply chain data demonstrate that this method significantly outperforms rule-based systems and supervised learning models in terms of precision (52.7%) and F1-Score (63.0%), while reducing audit resource consumption to 5.1%. This validates its advantages in balancing accuracy, efficiency, and cost. The research provides a practical technical pathway for the intelligent transformation of auditing, empowering enterprises to build proactive, efficient, and adaptive risk control systems.
Cai Li (Sun,) studied this question.