
June 3, 2026 · Open Access

EMOS: A Multidimensional Emotional Fidelity Evaluation Framework for Machine Translation Quality Assessment

Key Points

The research aims to evaluate emotional fidelity in machine translation, addressing deficiencies in current assessment metrics.
Introduced the Emotion Preservation Score (EMOS) as a reference-free evaluation framework.
Integrated metrics derived from cognitive theory: Vector Similarity Score, Label Match Rate, and Emotional Diversity Ratio.
Validated the framework on the CCL-SEL corpus with translations by DeepL, Google Translate, and GPT-4o.
All systems achieved good emotional fidelity with EMOS scores above 0.75.
GPT-4o had the highest emotional fidelity score at EMOS = 0.780.
The EMOS components correlated strongly with human judgements (VSS: r = 0.79; LMR: r = 0.76), and EMOS detected emotional distortions missed by traditional metrics.

Abstract

Contemporary machine translation (MT) systems have achieved high semantic accuracy but systematically neglect emotional fidelity, a dimension critical for cross-cultural literary communication. From a cognitive computation standpoint, this neglect is not merely a stylistic deficiency: affective signals modulate memory consolidation, attention allocation, and communicative intent, rendering emotional fidelity a core rather than peripheral dimension of translation quality. Existing evaluation metrics (BLEU, METEOR, BERTScore, and COMET) lack explicit modelling of affective content preservation and are fundamentally reference-dependent, requiring high-quality human translations that are costly and often unavailable. To address these limitations, this study introduces the Emotion Preservation Score (EMOS), a theoretically grounded, reference-free, multidimensional evaluation framework rooted in cognitive computation principles, which quantifies emotional fidelity in MT by directly comparing source and target emotion vectors without recourse to reference translations. Grounded in Ekman’s taxonomy of six basic emotions augmented with a neutrality dimension, EMOS integrates three complementary metrics derived from cognitive theory: Vector Similarity Score (VSS), which quantifies distributional similarity between seven-dimensional emotion vectors; Label Match Rate (LMR), which evaluates dominant emotional category preservation consistent with the primacy effect in emotional memory; and Emotional Diversity Ratio (EDR), which assesses emotional complexity retention via Shannon entropy, capturing the higher-order aesthetic processing most susceptible to cross-cultural degradation. Empirical weights (0.50, 0.35, and 0.15) were optimised through correlation analysis against human quality judgements on 500 parallel segments.
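The three component metrics and their weighted combination can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the authors' implementation. The abstract does not say how emotion vectors are produced, so they are taken as given (one seven-dimensional vector per sentence). It also does not say which similarity function VSS uses (cosine similarity is assumed here), how EDR converts two entropies into a ratio (a symmetric min/max ratio is assumed, so both flattening and exaggerating emotional diversity are penalised), or which weight belongs to which metric (0.50, 0.35, and 0.15 are assumed to follow the listed order VSS, LMR, EDR). The dimension names in `EMOTIONS` and all function names are hypothetical.

```python
import math
from collections.abc import Sequence

# Hypothetical dimension order: Ekman's six basic emotions plus neutrality.
EMOTIONS = ("anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral")

# Assumption: the reported weights map onto VSS, LMR, EDR in the order listed.
WEIGHTS = {"vss": 0.50, "lmr": 0.35, "edr": 0.15}


def vss(src: Sequence[float], tgt: Sequence[float]) -> float:
    """Cosine similarity of two 7-dimensional emotion vectors (assumed measure)."""
    dot = sum(s * t for s, t in zip(src, tgt))
    norm = math.sqrt(sum(s * s for s in src)) * math.sqrt(sum(t * t for t in tgt))
    return dot / norm if norm > 0 else 0.0


def dominant(vec: Sequence[float]) -> str:
    """Dominant emotion label; ties resolve to the earlier dimension."""
    return EMOTIONS[max(range(len(vec)), key=lambda i: vec[i])]


def entropy(vec: Sequence[float]) -> float:
    """Shannon entropy in bits, after normalising the vector to sum to 1."""
    total = sum(vec)
    return -sum(p / total * math.log2(p / total) for p in vec if p > 0)


def edr(src: Sequence[float], tgt: Sequence[float]) -> float:
    """Symmetric entropy ratio in [0, 1]; 1.0 means equal emotional diversity."""
    hs, ht = entropy(src), entropy(tgt)
    if max(hs, ht) == 0:
        return 1.0  # both vectors put all their mass on a single emotion
    return min(hs, ht) / max(hs, ht)


def emos(pairs: Sequence[tuple[Sequence[float], Sequence[float]]]) -> dict[str, float]:
    """Average each component over (source, target) pairs, then take the weighted sum."""
    n = len(pairs)
    scores = {
        "vss": sum(vss(s, t) for s, t in pairs) / n,
        # Per segment the label match is 0 or 1, so its mean is the match rate.
        "lmr": sum(dominant(s) == dominant(t) for s, t in pairs) / n,
        "edr": sum(edr(s, t) for s, t in pairs) / n,
    }
    scores["emos"] = sum(w * scores[k] for k, w in WEIGHTS.items())
    return scores


# Toy vectors for illustration only (not data from the study).
pairs = [
    ([0.05, 0.0, 0.05, 0.7, 0.1, 0.05, 0.05], [0.05, 0.0, 0.05, 0.6, 0.2, 0.05, 0.05]),
    ([0.1, 0.0, 0.6, 0.0, 0.2, 0.0, 0.1], [0.0, 0.0, 0.2, 0.0, 0.3, 0.0, 0.5]),
]
print(emos(pairs))
```

In this sketch each component is averaged over the corpus before weighting. In the second toy pair the dominant emotion shifts from fear to neutral, which lowers LMR even though the vectors still share some probability mass. That pattern illustrates why the framework pairs a distributional measure (VSS) with a categorical one (LMR).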
The framework was validated on the CCL-SEL corpus (19,999 classical Chinese–English sentence pairs from nine canonical works), as translated by DeepL, Google Translate, and GPT-4o. All three systems achieved good emotional fidelity (EMOS > 0.75), with GPT-4o demonstrating superior performance (EMOS = 0.780). EMOS exhibited stronger alignment with human judgements (VSS: r = 0.79; LMR: r = 0.76) than traditional metrics and successfully detected critical emotional distortions invisible to standard evaluation approaches, thereby establishing emotional fidelity as a distinct, quantifiable, and reference-free dimension of MT quality assessment with direct implications for cognitively plausible translation systems.
