Evaluating large language models in medical applications: a survey | Synapse