March 24, 2024 · Open Access

Automated Assessment of Fidelity and Interpretability: An Evaluation Framework for Large Language Models’ Explanations (Student Abstract)

Abstract

As Large Language Models (LLMs) become more prevalent in various fields, it is crucial to rigorously assess the quality of their explanations. Our research introduces a task-agnostic framework for evaluating free-text rationales, drawing on insights from both linguistics and machine learning. We evaluate two dimensions of explainability: fidelity and interpretability. For fidelity, we propose methods suitable for proprietary LLMs where direct introspection of internal features is unattainable. For interpretability, we use language models instead of human evaluators, addressing concerns about subjectivity and scalability in evaluations. We apply our framework to evaluate GPT-3.5 and the impact of prompts on the quality of its explanations. In conclusion, our framework streamlines the evaluation of explanations from LLMs, promoting the development of safer models.

Cite this study

Kuo et al. (2024) studied this question.

www.synapsesocial.com/papers/68e72a6ab6db6435876a3f6e — DOI: https://doi.org/10.1609/aaai.v38i21.30470

Authors

Mu-Tien Kuo

Chih-Chung Hsueh

Richard Tzong‐Han Tsai

Institutions

National Central University

Research Center for Humanities and Social Sciences, Academia Sinica
