We investigate the performance of Large Language Model (LLM)-based zero-shot stance detection on tweets. Using FlanT5-XXL, an instruction-tuned open-source LLM, with the SemEval 2016 Tasks 6A, 6B, and P-Stance datasets, we analyze how its performance varies under different prompts and decoding strategies, as well as potential model biases. We show that the zero-shot approach can match or outperform state-of-the-art methods, including fine-tuned models. Additionally, we provide practical insights into its performance, including sensitivity to instructions and prompts, decoding strategies, prompt perplexity, and the role of negations and oppositions. We ensure that the LLM has not been trained on test datasets and identify a positivity bias that may partially explain performance differences across decoding strategies. Finally, we conduct a qualitative analysis of cases where the LLM consistently fails, uncovering questionable ground truth labels and an overconfidence in assigning a stance when none exists. In sum, we provide an in-depth case study of using an LLM for a stance detection task, which can serve as a guide for practitioners seeking to leverage LLMs for similar tasks and use cases.
Rachith Aiyappa
Indiana University Bloomington
Shruthi Senthilmani
Indiana University Bloomington
Jisun An
Indiana University Bloomington
PeerJ Computer Science
University of Virginia
Indiana University Bloomington
Aiyappa et al. studied this question.
synapsesocial.com/papers/699011602ccff479cfe580ec — DOI: https://doi.org/10.7717/peerj-cs.3540