Do Multimodal Large Language Models and Humans Ground Language Similarly? | Synapse