Audio-visual training for improved grounding in video-text LLMs | Synapse