Multimodal large language models and physics visual tasks: comparative analysis of performance and costs | Synapse