Policy learning under constraint: Maximizing a primary outcome while controlling an adverse event