Large language models (LLMs) have substantially advanced multilingual translation, yet their computational cost limits deployment in resource-constrained scenarios and latency-critical applications. Traditional Transformer-based neural machine translation (NMT) systems remain valuable in these settings. Among existing approaches to improving Transformer efficiency, linearization techniques that reduce the time complexity to O(N) show advantages only for very long texts. To address this gap, we propose a lightweight architecture, Gated Transformer with Shallow Decoder (GTSD), designed specifically for low-cost, short-text translation. The proposed method employs a gating mechanism to fuse attention and the feed-forward network (FFN), optimizes the cross-attention that becomes redundant when multi-head attention is reduced to single-head attention, and adopts a deep-encoder, shallow-decoder architecture. Furthermore, the proposed method supports simple cost reduction with minimal loss in accuracy. Finally, a series of experiments on the WMT14 en-de and WMT14 en-fr datasets demonstrates that the proposed approach attains a strong efficiency–performance trade-off.
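The claim that O(N) attention helps only on very long texts can be illustrated with a rough operation count. The sketch below is not taken from the paper: it assumes a standard Transformer layer with model width `d_model` and FFN width `4 * d_model`, counts only multiply-accumulates in the matrix products, and ignores softmax, biases, and normalization. The function name `attention_share` is hypothetical.

```python
def attention_share(n_tokens, d_model, d_ff=None):
    """Fraction of one layer's multiply-accumulates spent on the
    quadratic part of self-attention (QK^T and weights @ V).

    Assumption: a standard Transformer layer, not the GTSD layer.
    """
    if d_ff is None:
        d_ff = 4 * d_model  # common default; an assumption here

    # Q, K, V, and output projections: four (n x d) @ (d x d) products.
    projections = 4 * n_tokens * d_model ** 2
    # Two FFN matrices: (n x d) @ (d x d_ff) and (n x d_ff) @ (d_ff x d).
    ffn = 2 * n_tokens * d_model * d_ff
    # QK^T and weights @ V, summed over all heads: the only O(N^2) term.
    quadratic = 2 * n_tokens ** 2 * d_model

    return quadratic / (projections + ffn + quadratic)


if __name__ == "__main__":
    for n in (32, 512, 4096):
        share = attention_share(n, d_model=512)
        print(f"N={n:5d}  quadratic share of layer cost: {share:.1%}")
```

Under these assumptions the share simplifies to N / (6 · d_model + N). For a 32-token sentence at d_model = 512 that is about 1% of the layer's cost, so replacing the quadratic term with a linear one leaves almost all of the cost in place; the term only dominates once N reaches several thousand tokens.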
Li et al. studied this question.
www.synapsesocial.com/papers/69eefc6dfede9185760d36fa — DOI: https://doi.org/10.1038/s41598-026-49583-z
Fangfang Li
Fengjing Yin
Shirui Deng
Scientific Reports
Central South University
National University of Defense Technology
Changsha University