In this paper, we explore LLMs' ability to solve problems that require physical reasoning in situated environments. We construct a simple simulated environment and demonstrate cases where, in a zero-shot setting, both text and multimodal LLMs display atomic world knowledge about various objects but fail to compose this knowledge into correct solutions for an object manipulation and placement task. We also use BLIP, a vision-language model trained with more sophisticated cross-modal attention, to identify cases involving objects' physical properties that this model fails to ground. Finally, we present a procedure for discovering the relevant properties of objects in the environment and propose a method to distill this knowledge back into the LLM.
Ghaffari et al. studied this question.
www.synapsesocial.com/papers/68e694bdb6db64358761b660 — DOI: https://doi.org/10.1609/aaaiss.v3i1.31189
Sadaf Ghaffari
Nikhil Krishnaswamy
Proceedings of the AAAI Symposium Series
Colorado State University