Aligning Large Vision-Language Models by Deep Reinforcement Learning and Direct Preference Optimization | Synapse