Retrieval-Augmented Generation (RAG) grounds generation in externally retrieved evidence and is widely used to mitigate hallucinations in large language models (LLMs). In real-world settings, however, evidence is often long-form, which makes it difficult to encode global semantics and salient details jointly; as a result, retrieval tends to favor topical similarity over factual consistency. To address this issue, we propose TM-RAG, which couples a Transformer with Mamba and introduces a CAGF dynamic feature fusion module to enhance long-range dependency modeling and global semantic representation. We further design a multi-level contrastive learning objective, comprising sentence-level, slot-level, and token-level masked-recovery contrastive learning, to strengthen global semantic alignment and fine-grained factual modeling. Experiments show that TM-RAG yields consistent improvements on the Chinese Zuo Zongtang historical dataset and on the HotpotQA, MuSiQue, and SQuAD benchmarks. On the Zuo Zongtang dataset, generation F1 increases from 0.5376 to 0.551 and BLEU from 0.6491 to 0.6634, supporting the effectiveness of the proposed method.
Hu et al. studied this question.
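To make the sentence-level part of the contrastive objective more concrete, the following is a minimal sketch of an in-batch InfoNCE loss over query and passage embeddings. The abstract does not state the exact loss form, so the use of InfoNCE, the cosine similarity, the temperature value, and the function names `cosine` and `info_nce` are all assumptions made for illustration and are not taken from TM-RAG itself.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(queries, passages, temperature=0.1):
    """Mean in-batch InfoNCE loss (illustrative, not the paper's stated loss).

    passages[i] is treated as the positive for queries[i]; every other
    passage in the batch acts as a negative. The temperature is a
    hypothetical default.
    """
    losses = []
    for i, query in enumerate(queries):
        logits = [cosine(query, p) / temperature for p in passages]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability of the positive passage.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: each query matches the passage at the same index.
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(queries, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce(queries, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the loss is lower when each query is paired with its matching passage than when the pairs are swapped, which is the behavior a sentence-level alignment term is meant to reward. Slot-level and token-level terms would need their own positive and negative definitions, which the abstract does not specify.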