What question did this study set out to answer?

This study investigates how reproducibility affects real users of information retrieval (IR) systems.

May 10, 2026

Now that your System has been Reproduced, What does this Mean for the Users?

Key Points

Evaluated and compared the reproducibility of an information retrieval (IR) system both offline and online.
Generated a set of reproduced systems with varying parameters from a reference system, then selected 6 with different degrees of offline reproducibility.
Ran a between-subjects online experiment with 280 participants and collected clicks to evaluate online reproducibility.
Users do not perceive moderate variations in a system's degree of reproducibility.
Variations become relevant to users as the difference from the original system increases.
Clicks simulated with a trained click model give results that are not consistent with the user study, suggesting that better click models are needed to evaluate online reproducibility.

Abstract

Reproducibility lies at the basis of the empirical method: a novel approach will be widely adopted if its experimental results can be validated and reproduced by the community. Previous work on reproducibility in Information Retrieval (IR) has mainly addressed the reproducibility and replicability of offline experiments, with a few exceptions that replicate user studies. To the best of our knowledge, no previous work has investigated how reproducibility affects real users. In this paper, we do that by evaluating and comparing the reproducibility of an IR system both offline and online. We consider a reference system and generate a constellation of reproduced systems with varying parameters. We select 6 systems with different degrees of offline reproducibility. We then run a between-subjects online experiment with 280 participants and collect clicks to evaluate online reproducibility. Results show that real users do not perceive moderate variations in the degree of reproducibility of systems, while these variations become relevant when the difference from the original system increases. Furthermore, we trained a click model to evaluate online reproducibility with simulated clicks. Results are not consistent with those from the user study, suggesting that better click models are needed to evaluate online reproducibility. Our data and source code are publicly available: https://github.com/angelogeninatti/reproducibilityLogs
