Data augmentation is an effective technique for tackling data sparsity in sequential recommendation (SR). Existing methods generate new data during model training to improve performance. However, deploying them on a backbone model requires retraining, architecture modification, or additional modules and learnable parameters. These processes are time-consuming and costly for well-trained models, especially as model and data scales grow. In this work, we explore test-time augmentation (TTA) for SR, which augments the input sequences during inference and then fuses the model's predictions on them to improve final accuracy. This avoids the significant overhead associated with training-time augmentation. We first experimentally examine the potential of existing augmentation operators for TTA and find that the Substitute and Mask operators consistently achieve better performance. Further analysis reveals that these two operators retain the original sequential pattern while adding appropriate perturbations. Moreover, randomly selecting the augmentation positions creates suitable augmented samples from both semantic and temporal perspectives. Meanwhile, we find that a fixed operation ratio limits the diversity of the augmented data, and that TTA may impair the model's performance on long sequences. In addition, the two operators still suffer, respectively, from time-consuming similarity-based item selection and interference from mask tokens. Based on this analysis and these limitations, we present TNoise and TMask. The former injects uniform noise into the representation, avoiding the computational overhead of item selection. The latter either blocks mask tokens from participating in model calculations (TMask-B) or directly removes the interactions that would otherwise have been replaced with mask tokens (TMask-R). Further, we sample the augmentation ratio from a uniform distribution to improve data diversity. For short sequences, we introduce a sequence smoothing and lengthening method based on inter-item interpolation. For long sequences, we set a length threshold to avoid the negative effects of TTA. Comprehensive experiments demonstrate the effectiveness, efficiency, and generalizability of our method.
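To make the test-time procedure concrete, the following is a minimal sketch of a TMask-R-style pipeline under several assumptions that the abstract does not fix: predictions are fused by simple averaging, the original sequence's prediction is included in the fusion, and the names `tmask_r`, `tta_predict`, `toy_model`, `n_aug`, `max_ratio`, and `max_len` (with their default values) are hypothetical. The backbone is treated as a black-box callable that maps an item-ID sequence to a list of item scores; `toy_model` is only a stand-in so the example runs. It is an illustration of the idea, not the authors' implementation.

```python
import random
from typing import Callable, List, Sequence


def tmask_r(seq: Sequence[int], ratio: float, rng: random.Random) -> List[int]:
    """Remove a random subset of interactions, keeping the remaining order.

    Mirrors the TMask-R idea of deleting interactions instead of
    replacing them with mask tokens. At least one item is always kept.
    """
    n_drop = max(0, min(int(len(seq) * ratio), len(seq) - 1))
    drop = set(rng.sample(range(len(seq)), n_drop))
    return [item for i, item in enumerate(seq) if i not in drop]


def tta_predict(
    model: Callable[[Sequence[int]], List[float]],
    seq: Sequence[int],
    n_aug: int = 5,          # hypothetical number of augmented views
    max_ratio: float = 0.5,  # hypothetical upper bound of the uniform ratio
    max_len: int = 50,       # hypothetical length threshold for long sequences
    seed: int = 0,
) -> List[float]:
    """Average the model's scores over the original and augmented sequences."""
    rng = random.Random(seed)
    base = list(model(seq))
    if len(seq) > max_len:
        # Long sequences skip TTA, following the threshold idea in the text.
        return base
    all_scores = [base]
    for _ in range(n_aug):
        ratio = rng.uniform(0.0, max_ratio)  # ratio sampled per augmented view
        all_scores.append(list(model(tmask_r(seq, ratio, rng))))
    # Fusion by element-wise mean (an assumption; other fusions are possible).
    return [sum(col) / len(all_scores) for col in zip(*all_scores)]


def toy_model(seq: Sequence[int], n_items: int = 5) -> List[float]:
    """Stand-in backbone: scores each item by its frequency in the sequence."""
    scores = [0.0] * n_items
    for item in seq:
        scores[item] += 1.0 / len(seq)
    return scores


if __name__ == "__main__":
    print(tta_predict(toy_model, [0, 1, 2, 1, 3], n_aug=4, seed=1))
```

Because the backbone is only called, never modified or retrained, a wrapper of this kind can in principle sit in front of any already-trained SR model; the cost is `n_aug` extra forward passes per request.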
Dang et al. (Thu,) studied this question.