Contemporary machine translation (MT) systems achieve high semantic accuracy but systematically neglect emotional fidelity, a dimension critical for cross-cultural literary communication. From a cognitive computation standpoint, this neglect is not merely a stylistic deficiency: affective signals modulate memory consolidation, attention allocation, and communicative intent, which makes emotional fidelity a core rather than peripheral dimension of translation quality. Existing evaluation metrics (BLEU, METEOR, BERTScore, and COMET) do not explicitly model the preservation of affective content. In their standard configurations they are also reference-dependent, requiring high-quality human translations that are costly to produce and often unavailable. To address these limitations, this study introduces the Emotion Preservation Score (EMOS), a theoretically grounded, reference-free, multidimensional evaluation framework rooted in cognitive computation principles. EMOS quantifies emotional fidelity in MT by directly comparing source and target emotion vectors, without recourse to reference translations. Grounded in Ekman's taxonomy of six basic emotions augmented with a neutrality dimension, EMOS integrates three complementary metrics derived from cognitive theory. The Vector Similarity Score (VSS) quantifies the distributional similarity between seven-dimensional emotion vectors. The Label Match Rate (LMR) evaluates preservation of the dominant emotion category, consistent with the primacy effect in emotional memory. The Emotional Diversity Ratio (EDR) assesses the retention of emotional complexity via Shannon entropy, capturing the higher-order aesthetic processing most susceptible to cross-cultural degradation. The metric weights (0.50 for VSS, 0.35 for LMR, and 0.15 for EDR) were optimised empirically through correlation analysis against human quality judgements on 500 parallel segments.
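To make the composition of the score concrete, the following is a minimal sketch of how a weighted EMOS-style composite could be computed from per-segment emotion vectors. It is not the authors' implementation. Only the seven-dimensional layout (six Ekman emotions plus neutrality) and the 0.50/0.35/0.15 weights come from the text. The following are all illustrative assumptions: cosine similarity as the VSS measure, exact arg-max agreement for LMR, and a symmetric min/max entropy ratio for EDR. The label order and helper names (`emos`, `entropy_ratio`, `dominant`) are likewise hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical label order: Ekman's six basic emotions plus neutrality.
EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral")

# Weights for VSS, LMR and EDR as reported in the abstract.
WEIGHTS = (0.50, 0.35, 0.15)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; an assumed stand-in for the VSS measure."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits, after normalising p to sum to 1."""
    total = sum(p)
    if total <= 0.0:
        return 0.0
    return -sum((x / total) * math.log2(x / total) for x in p if x > 0.0)


def dominant(v: Sequence[float]) -> int:
    """Index of the highest-scoring emotion (the first index wins ties)."""
    return max(range(len(v)), key=lambda i: v[i])


def entropy_ratio(src: Sequence[float], tgt: Sequence[float]) -> float:
    """Assumed EDR variant: min/max entropy ratio, penalising loss or gain."""
    h_s, h_t = shannon_entropy(src), shannon_entropy(tgt)
    if max(h_s, h_t) == 0.0:
        return 1.0  # both single-emotion: complexity trivially preserved
    return min(h_s, h_t) / max(h_s, h_t)


def emos(source_vecs: List[Sequence[float]],
         target_vecs: List[Sequence[float]],
         weights=WEIGHTS) -> Dict[str, float]:
    """Corpus-level VSS, LMR, EDR and their weighted sum (hypothetical helper)."""
    if len(source_vecs) != len(target_vecs) or not source_vecs:
        raise ValueError("need equal-length, non-empty lists of emotion vectors")
    n = len(source_vecs)
    pairs = list(zip(source_vecs, target_vecs))
    vss = sum(cosine(s, t) for s, t in pairs) / n
    lmr = sum(dominant(s) == dominant(t) for s, t in pairs) / n
    edr = sum(entropy_ratio(s, t) for s, t in pairs) / n
    w_vss, w_lmr, w_edr = weights
    return {"VSS": vss, "LMR": lmr, "EDR": edr,
            "EMOS": w_vss * vss + w_lmr * lmr + w_edr * edr}


if __name__ == "__main__":
    # Toy vectors for one source/target segment pair (illustrative values).
    src = [[0.05, 0.0, 0.1, 0.6, 0.15, 0.1, 0.0]]
    tgt = [[0.0, 0.0, 0.1, 0.5, 0.3, 0.1, 0.0]]
    print(emos(src, tgt))
```

Because the weights sum to 1 and each component lies in [0, 1] for non-negative vectors, the composite also stays in [0, 1]. Under this sketch, a translation that flips the dominant emotion can still receive partial credit through VSS and EDR. The symmetric entropy ratio is one of several plausible choices. A one-sided ratio capped at 1 would penalise only lost complexity.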
The framework was validated on the CCL-SEL corpus (19,999 classical Chinese–English sentence pairs from nine canonical works), with translations produced by DeepL, Google Translate, and GPT-4o. All three systems achieved good emotional fidelity (EMOS > 0.75), with GPT-4o performing best (EMOS = 0.780). The EMOS components aligned more strongly with human judgements (VSS: r = 0.79; LMR: r = 0.76) than traditional metrics did. EMOS also detected critical emotional distortions that are invisible to standard evaluation approaches. These results establish emotional fidelity as a distinct, quantifiable, and reference-free dimension of MT quality assessment, with direct implications for cognitively plausible translation systems.
Zhou et al. (Mon,) studied this question.