Towards Scene-Aware Video-to-Spatial Audio Generation