Safety-Optimized Reinforcement Learning via Multi-Objective Policy Optimization | Synapse