What does this research mean for the field?

EGTJ achieves higher text-classification accuracy than all baseline methods while reducing inference latency by orders of magnitude compared with standard compression-based methods, particularly in resource-limited environments. Novelty: novel finding. Consensus alignment: challenges consensus.

What question did this study set out to answer?

The study proposes EGTJ, a training-free framework for improving text classification efficiency in resource-limited scenarios.

February 28, 2026
Open Access

EGTJ: An Unsupervised and Non-Parametric Approach for Efficient Text Classification Under Resource-Limited Environments

Key Points

The study proposes EGTJ, a training-free framework for efficient text classification in resource-limited scenarios.
Developed a training-free, retrieval-augmented compression architecture.
Implemented an inverted-index pre-filtering mechanism that constrains the comparison scope for each query, reducing the number of pairwise comparisons.
Introduced a tri-metric fusion strategy that combines information-theoretic (gzip), lexical (TF-IDF), and structural (Jaccard) similarities.
EGTJ achieved over 30% higher accuracy than BERT in 5-shot out-of-distribution scenarios.
Reduced inference latency by orders of magnitude compared with standard compression-based methods.
Outperformed all baseline methods on five in-distribution and four out-of-distribution datasets.
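
The inverted-index pre-filtering idea from the key points can be illustrated with a minimal Python sketch. It assumes lowercase whitespace tokenization and ranks training documents by how many distinct query tokens they share; the function names (build_inverted_index, prefilter), the shared-token scoring, and the max_candidates cap are hypothetical and are not taken from the EGTJ paper, which does not specify its retrieval scoring in this section.

```python
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase whitespace tokenization; EGTJ's tokenizer is not specified here.
    return text.lower().split()


def build_inverted_index(train_texts: list[str]) -> dict[str, set[int]]:
    """Map each token to the set of training-document ids that contain it."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(train_texts):
        for token in set(tokenize(text)):
            index[token].add(doc_id)
    return dict(index)


def prefilter(query: str, index: dict[str, set[int]], max_candidates: int = 50) -> list[int]:
    """Return up to max_candidates training ids ranked by shared-token count (hypothetical scoring)."""
    overlap: Counter[int] = Counter()
    for token in set(tokenize(query)):
        for doc_id in index.get(token, set()):
            overlap[doc_id] += 1
    # Sort by descending overlap, then ascending id, so ties are deterministic.
    ranked = sorted(overlap.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:max_candidates]]


train_texts = ["the cat sat", "dogs bark loudly", "the cat purrs"]
index = build_inverted_index(train_texts)
print(prefilter("a cat sat here", index, max_candidates=2))  # [0, 2]
```

Because prefilter returns at most max_candidates ids, the number of expensive similarity computations per query in this sketch no longer grows with the training set, although scanning the posting lists of very common tokens still does; removing stop words is one common way to keep those lists short.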

Abstract

Deep neural networks (DNNs) dominate text classification but suffer from high computational costs and poor generalization in data-scarce or out-of-distribution (OOD) environments. Conversely, non-parametric methods such as compression-based classifiers offer robustness but incur prohibitive inference latency because they rely on exhaustive pairwise comparisons. To bridge this gap, this study proposes EGTJ, a training-free framework built on a novel retrieval-augmented compression architecture. EGTJ uses an inverted-index pre-filtering mechanism to dynamically constrain the comparison scope, effectively reducing the complexity of the comparison stage from linear to constant time with respect to training-set size. Furthermore, unlike prior work that applies similarity metrics in isolation, EGTJ introduces a tri-metric fusion strategy that integrates information-theoretic (gzip), lexical (TF-IDF), and structural (Jaccard) similarities to mitigate the inherent biases of individual metrics. Experimental results on five in-distribution and four OOD datasets show that EGTJ achieves higher accuracy than all baseline methods, notably outperforming BERT by over 30% in 5-shot OOD scenarios, while reducing inference latency by orders of magnitude compared with standard compression-based approaches. These findings position EGTJ as a scalable, high-performance alternative for resource-constrained NLP that effectively addresses the scalability bottleneck of non-parametric classification.
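
The tri-metric fusion described in the abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: gzip similarity is taken as 1 minus the normalized compression distance (NCD) used by earlier gzip-based classifiers, TF-IDF uses a smoothed IDF computed over the training texts, the three scores are combined with equal weights and no rescaling, and the label is chosen by a k-nearest-neighbour majority vote. The names fused_similarity and classify, the weights, and k are hypothetical; candidate_ids stands in for the output of an inverted-index pre-filter and is passed explicitly here.

```python
import gzip
import math
from collections import Counter


def ncd(a: str, b: str) -> float:
    """Normalized compression distance from gzip-compressed lengths."""
    ca = len(gzip.compress(a.encode("utf-8")))
    cb = len(gzip.compress(b.encode("utf-8")))
    cab = len(gzip.compress((a + " " + b).encode("utf-8")))
    return (cab - min(ca, cb)) / max(ca, cb)


def jaccard(a: set[str], b: set[str]) -> float:
    """Structural similarity: overlap of the two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def compute_idf(docs: list[list[str]]) -> dict[str, float]:
    """Smoothed IDF, log((1 + n) / (1 + df)) + 1, over tokenized training texts."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf_cosine(q: list[str], d: list[str], idf: dict[str, float]) -> float:
    """Lexical similarity: cosine of TF-IDF vectors (tokens unseen in training get weight 0)."""
    qv = {t: c * idf.get(t, 0.0) for t, c in Counter(q).items()}
    dv = {t: c * idf.get(t, 0.0) for t, c in Counter(d).items()}
    dot = sum(w * dv.get(t, 0.0) for t, w in qv.items())
    norm = math.sqrt(sum(w * w for w in qv.values())) * math.sqrt(sum(w * w for w in dv.values()))
    return dot / norm if norm else 0.0


def fused_similarity(query: str, doc: str, idf: dict[str, float],
                     weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical equal-weight fusion of gzip, TF-IDF, and Jaccard similarities."""
    q_tok, d_tok = query.lower().split(), doc.lower().split()
    s_gzip = 1.0 - ncd(query, doc)
    s_tfidf = tfidf_cosine(q_tok, d_tok, idf)
    s_jac = jaccard(set(q_tok), set(d_tok))
    w_g, w_t, w_j = weights
    return w_g * s_gzip + w_t * s_tfidf + w_j * s_jac


def classify(query: str, train_texts: list[str], train_labels: list[str],
             candidate_ids: list[int], idf: dict[str, float], k: int = 3) -> str:
    """k-NN majority vote computed over the candidate ids only."""
    scored = sorted(((fused_similarity(query, train_texts[i], idf), i) for i in candidate_ids),
                    reverse=True)
    top_labels = [train_labels[i] for _, i in scored[:k]]
    # Ties go to the label seen first, i.e. the one attached to the highest-scoring neighbour.
    return Counter(top_labels).most_common(1)[0][0]


train_texts = ["the stock market rallied as shares rose",
               "shares fell on the stock exchange today",
               "the team won the football match",
               "the striker scored in the football final"]
train_labels = ["finance", "finance", "sport", "sport"]
idf = compute_idf([t.lower().split() for t in train_texts])
query = "football team scored a late goal in the match"
print(classify(query, train_texts, train_labels, [0, 1, 2, 3], idf))  # sport
```

Equal weights are only the simplest choice: 1 minus NCD, TF-IDF cosine, and Jaccard live on different scales, so a practical implementation might normalize or tune them, and the weighting EGTJ actually uses is not stated in this section.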