Unsupervised multimodal emotion-unified representation learning with dual-level language-driven cross-modal emotion alignment | Synapse