Pixels to Phrases: Evolution of Vision Language Models | Synapse