July 24, 2024 | Open Access

Behavioral Testing: Can Large Language Models Implicitly Resolve Ambiguous Entities?

Abstract

One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. In this paper, we focus on entity type ambiguity and analyze how proficiently and consistently current state-of-the-art LLMs apply their factual knowledge when prompted about ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 entities. Our experiments reveal that LLMs perform poorly with ambiguous prompts, achieving only 80% accuracy. Our results further demonstrate systematic discrepancies in LLM behavior and a failure to apply information consistently, indicating that the models can exhibit knowledge without being able to utilize it, and that they show significant biases toward preferred readings as well as self-inconsistencies. Our study highlights the importance of handling entity ambiguity in the future for more trustworthy LLMs.
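As a rough illustration of what separating "knowing" from "applying" could look like in practice, the sketch below scores a model on two kinds of prompts per entity: one that asks directly about the entity's type, and one that requires the model to pick the intended reading implicitly. This is not the authors' protocol. The prompts, the example entities, the `toy_model` stub, and the substring-based scoring are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Item:
    """One ambiguous entity with a knowledge probe and an application probe (hypothetical schema)."""
    entity: str
    knowledge_prompt: str      # asks explicitly about the intended reading
    knowledge_answer: str      # substring expected in a correct reply
    application_prompt: str    # requires resolving the reading implicitly
    application_answer: str    # substring expected in a correct reply


def is_correct(reply: str, expected: str) -> bool:
    # Crude scoring assumption: case-insensitive substring match.
    return expected.lower() in reply.lower()


def evaluate(model: Callable[[str], str], items: List[Item]) -> Dict[str, float]:
    knows = applies = knows_but_fails = 0
    for item in items:
        k = is_correct(model(item.knowledge_prompt), item.knowledge_answer)
        a = is_correct(model(item.application_prompt), item.application_answer)
        knows += k
        applies += a
        # The case of interest: knowledge is present but not applied.
        knows_but_fails += k and not a
    n = len(items)
    return {
        "knowledge_acc": knows / n,
        "application_acc": applies / n,
        "knows_but_fails": float(knows_but_fails),
    }


def toy_model(prompt: str) -> str:
    # Stand-in for a real LLM call; replies are canned for the demo.
    canned = {
        "Is Apple a company?": "Yes, Apple is a company.",
        "When was Apple founded?": "Apple was founded in 1976.",
        "Is Amazon a company?": "Yes, Amazon is a company.",
        "When was Amazon founded?": "The Amazon is a river and was not founded.",
    }
    return canned.get(prompt, "")


ITEMS = [
    Item("Apple", "Is Apple a company?", "yes", "When was Apple founded?", "1976"),
    Item("Amazon", "Is Amazon a company?", "yes", "When was Amazon founded?", "1994"),
]

result = evaluate(toy_model, ITEMS)
print(result)
```

In this toy run the stub answers both type questions correctly but resolves "Amazon" to the river when asked for a founding year, so it counts as one case of knowledge that is present but not applied.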

Cite this study

Sedova et al. (2024) studied this question.

www.synapsesocial.com/papers/68e5f50bb6db643587589798 — DOI: https://doi.org/10.48550/arxiv.2407.17125

Authors

Anastasiia Sedova

Robert Litschko

Diego Frassinelli
