Latent variable modeling in multi-agent reinforcement learning via expectation-maximization for UAV-based wildlife protection | Synapse