Tourism researchers rely heavily on self-report data. The validity of their insights depends on the reliability of their measures, yet conventional test–retest reliability assessment (1) overlooks item-level stability by relying on aggregate scale coefficients that can mask unstable questions; and (2) ignores how response options affect stability. We introduce the Item-Level Stability Protocol, which evaluates item-level test–retest reliability across answer options. We demonstrate its value using two-wave longitudinal survey data (N = 3193). Results show that test–retest reliabilities can be substantially increased; in an internal validation experiment, Pearson correlations were on average 0.10 higher. The new protocol is paradigm-agnostic, complements existing psychometric methods, and is simple to implement. It helps tourism scholars and practitioners identify optimal survey questions and response options for increased reliability.

• Tourism research relies on high-quality self-report data for insights.
• Current test–retest reliability methods have two key limitations.
• A new protocol assesses item-level reliability and answer option effects.
• It can be implemented easily with a small longitudinal sample.
• It helps optimise items and answer options to improve reliability.
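To make the contrast between aggregate and item-level coefficients concrete, the following is a minimal sketch, not the protocol itself. It assumes two waves of responses from the same respondents in the same order, and computes a Pearson test–retest correlation per item alongside the correlation of summed scale scores. The item names (`q1`, `q2`), the toy responses, and the function names are hypothetical and chosen only for illustration; comparing answer options would, under the same assumptions, amount to repeating the item-level step for each response format.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant responses")
    return cov / sqrt(vx * vy)


def item_level_retest(wave1, wave2):
    """Test-retest correlation per item.

    wave1 and wave2 map an item name to a list of responses,
    with respondents in the same order in both waves.
    """
    return {item: pearson(wave1[item], wave2[item])
            for item in wave1 if item in wave2}


def scale_level_retest(wave1, wave2):
    """Test-retest correlation of summed scale scores (the aggregate view)."""
    items = [item for item in wave1 if item in wave2]
    n = len(wave1[items[0]])
    total1 = [sum(wave1[i][k] for i in items) for k in range(n)]
    total2 = [sum(wave2[i][k] for i in items) for k in range(n)]
    return pearson(total1, total2)


# Hypothetical toy data: five respondents, two items, two waves.
wave1 = {"q1": [1, 2, 3, 4, 5], "q2": [1, 2, 3, 4, 5]}
wave2 = {"q1": [1, 2, 3, 4, 5], "q2": [2, 1, 4, 3, 5]}

item_r = item_level_retest(wave1, wave2)
scale_r = scale_level_retest(wave1, wave2)

for item, r in item_r.items():
    print(f"{item}: r = {r:.2f}")
print(f"scale: r = {scale_r:.2f}")
```

In this toy example the summed-scale coefficient is higher than the coefficient of the less stable item, which is the masking effect that motivates looking at each item separately.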
Xiang et al. (Sat,) studied this question.