Deep neural networks (DNNs) dominate text classification but incur high computational costs and generalize poorly in data-scarce or out-of-distribution (OOD) settings. Conversely, non-parametric methods such as compression-based classifiers offer robustness but suffer from prohibitive inference latency because they rely on exhaustive pairwise comparisons against the training set. To bridge this gap, this study proposes EGTJ, a training-free framework built on a novel retrieval-augmented compression architecture. Unlike prior work that applies similarity metrics in isolation, EGTJ uses an inverted-index pre-filtering mechanism to dynamically constrain the comparison scope, reducing algorithmic complexity from linear to constant time with respect to the training set size. Furthermore, a tri-metric fusion strategy integrates information-theoretic (gzip), lexical (TF-IDF), and structural (Jaccard) similarities to mitigate the inherent biases of the individual metrics. Experimental results across five in-distribution and four OOD datasets show that EGTJ achieves higher accuracy than all baseline methods, notably outperforming BERT by over 30% in 5-shot OOD scenarios, while reducing inference latency by orders of magnitude compared with standard compression-based approaches. These findings present EGTJ as a scalable, high-performance alternative for resource-constrained NLP that addresses the scalability bottleneck of non-parametric classification.
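The two ideas named above, inverted-index pre-filtering and fusion of gzip, TF-IDF, and Jaccard similarities, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The class name `SketchClassifier`, the equal fusion weights, the candidate cap `max_candidates`, the whitespace tokenizer, and the score-weighted k-nearest-neighbour vote are all assumptions made for illustration, since the abstract does not specify them.

```python
import gzip
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def ncd(a, b):
    """Normalized compression distance computed with gzip."""
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)


def jaccard(a_tokens, b_tokens):
    a, b = set(a_tokens), set(b_tokens)
    return len(a & b) / len(a | b) if (a | b) else 0.0


class SketchClassifier:
    """Hypothetical training-free classifier: pre-filter, then fuse three scores."""

    def __init__(self, texts, labels, weights=(1 / 3, 1 / 3, 1 / 3),
                 max_candidates=20, k=3):
        self.texts, self.labels = texts, labels
        self.weights = weights              # assumed equal weights
        self.max_candidates = max_candidates
        self.k = k
        self.tokens = [tokenize(t) for t in texts]
        # Inverted index: token -> ids of training documents containing it.
        self.index = defaultdict(set)
        for i, toks in enumerate(self.tokens):
            for tok in set(toks):
                self.index[tok].add(i)
        n = len(texts)
        self.idf = {tok: math.log((1 + n) / (1 + len(ids))) + 1.0
                    for tok, ids in self.index.items()}
        self.vecs = [self._tfidf(toks) for toks in self.tokens]

    def _tfidf(self, toks):
        # L2-normalized TF-IDF vector stored as a sparse dict.
        vec = {t: c * self.idf.get(t, 0.0) for t, c in Counter(toks).items()}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {t: v / norm for t, v in vec.items()} if norm else {}

    def candidates(self, toks):
        # Keep only the training documents sharing the most tokens with the query.
        hits = Counter()
        for tok in set(toks):
            for i in self.index.get(tok, ()):
                hits[i] += 1
        return [i for i, _ in hits.most_common(self.max_candidates)]

    def predict(self, text):
        toks = tokenize(text)
        cand = self.candidates(toks)
        if not cand:
            # No token overlap at all: fall back to the majority label.
            return Counter(self.labels).most_common(1)[0][0]
        qvec = self._tfidf(toks)
        w_gzip, w_tfidf, w_jac = self.weights
        scored = []
        for i in cand:
            s_gzip = 1.0 - ncd(text, self.texts[i])
            s_tfidf = sum(v * self.vecs[i].get(t, 0.0) for t, v in qvec.items())
            s_jac = jaccard(toks, self.tokens[i])
            scored.append((w_gzip * s_gzip + w_tfidf * s_tfidf + w_jac * s_jac, i))
        scored.sort(reverse=True)
        votes = Counter()
        for score, i in scored[: self.k]:
            votes[self.labels[i]] += score
        return votes.most_common(1)[0][0]


# Toy usage with made-up data.
train_texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the central bank raised interest rates",
    "stock markets fell after the earnings report",
]
train_labels = ["sports", "sports", "finance", "finance"]
clf = SketchClassifier(train_texts, train_labels)
print(clf.predict("the goalkeeper saved a goal in the match"))
```

In this sketch, the cap on candidates bounds the number of gzip comparisons per query regardless of how many training documents exist. The index lookup itself still depends on posting-list lengths, so how the constant-time behaviour is obtained in practice depends on details the abstract does not give.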
Lv et al. (Fri,) studied this question.