Text guided cross attentive multimodal learning with visual feature modulation for automated skin lesion detection | Synapse