How well do vision models understand tasks with multiple labels? | Synapse