Cross-modal spatio-temporal fusion weakly supervised video anomaly detection based on large-scale vision-language models | Synapse