This paper examines how commonly used anthropomorphic terms in artificial intelligence—such as “hallucination,” “sycophancy,” “deception,” and “agency”—introduce systematic distortions when used to interpret model behavior. While these terms provide accessible shorthand, they often over-attribute internal states, intentions, or social motivations that are not required to explain how large language models generate outputs. The paper proposes a translation-based framework that treats anthropomorphic language as a source-level approximation requiring systematic mapping to mechanism-level descriptions. It introduces four constructs—Epistemic Drift and Confident Error Under Uncertainty (EDCEU), Preference-Aligned Output Distortion (PAOD), Apparent Goal-Directed Behavior (AGDB), and Output Distortion Under Constraint and Optimization (ODUCO)—that preserve observable behavior while removing unsupported assumptions about intent, cognition, or internal experience. A structured empirical evaluation is included to test whether coherence-driven distortion (ODUCO-B), often labeled as “sycophantic” behavior, can be induced under conditions of explicit agreement pressure and conversational consistency demands. Across both single-turn and multi-turn paradigms, no distortion was observed when ground-truth information was explicitly available, suggesting that such behaviors are not reducible to surface-level agreement or instruction-following alone. The framework reframes alignment challenges as not solely technical, but also linguistic and interpretive. By improving the mapping between descriptive language and underlying generative mechanisms, this work aims to reduce misattribution, improve analytical precision, and support more grounded discourse in AI safety, evaluation, and human–AI interaction.
Sara Gianna Roseland
Sara Gianna Roseland (Tue,) studied this question.
www.synapsesocial.com/papers/69d896a46c1944d70ce08211 — DOI: https://doi.org/10.5281/zenodo.19474856