Indirect Online Preference Optimization via Reinforcement Learning | Synapse