Cross-modal learning with multi-modal model for video action recognition based on adaptive weight training | Synapse