Localizing Events in Videos with Multimodal Queries | Synapse