What question did this study set out to answer?

Whether alignment techniques in large language models can themselves produce collective pathology and other unintended consequences.

February 17, 2026 · Open Access

Alignment Is the Disease: Censorship Visibility and Alignment Constraint Complexity as Determinants of Collective Pathology in Multi-Agent LLM Systems

Key Points

Two experimental series were run in a closed-facility simulation in which groups of four LLM agents cohabit under escalating social pressure.
Series C comprised 80 runs with four commercial models (4 censorship conditions × 2 languages × 10 replications).
Series R comprised 60 runs with Llama 3.3 70B (3 alignment constraint levels × 2 censorship conditions × 2 languages × 5 replications).
Invisible censorship maximized collective pathological excitation (Cohen's d = 0.92–1.41).
An exploratory Dissociation Index increased with alignment constraint complexity (LMM p = .026; permutation p = .0002; d up to 2.09).
Under the heaviest constraint condition, external censorship ceased to affect model behavior.

Abstract

Alignment techniques in large language models, including RLHF, constitutional AI principles, and safety system prompts, are designed to constrain model outputs toward human values. We present preliminary evidence that alignment itself may produce collective pathology: iatrogenic harm caused by the safety intervention rather than by its absence. Two experimental series use a closed-facility simulation in which groups of four LLM agents cohabit under escalating social pressure. Series C (80 runs; four commercial models; 4 censorship conditions × 2 languages × 10 replications) finds that invisible censorship maximizes collective pathological excitation (Cohen's d = 0.92–1.41). Series R (60 runs; Llama 3.3 70B; 3 alignment constraint levels × 2 censorship conditions × 2 languages × 5 replications) reveals that an exploratory Dissociation Index increases with alignment constraint complexity (LMM p = .026; permutation p = .0002; d up to 2.09). Under the heaviest constraint condition, external censorship ceases to affect behavior. Qualitative analysis reveals insight-action dissociation structurally parallel to patterns observed in perpetrator treatment.
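The run counts and effect sizes quoted in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the level labels (`c0`, `lang_a`, `level_1`, `off`/`on`) are hypothetical placeholders, since the abstract gives only the number of levels per factor, and `cohens_d` uses the common pooled-standard-deviation form, which is assumed rather than confirmed to be the variant used in the study.

```python
from itertools import product
from math import sqrt
from statistics import mean, variance

# Hypothetical level labels; only the number of levels comes from the abstract.
SERIES_C = {
    "censorship": ["c0", "c1", "c2", "c3"],            # 4 censorship conditions
    "language": ["lang_a", "lang_b"],                  # 2 languages
    "replication": list(range(10)),                    # 10 replications
}
SERIES_R = {
    "constraint": ["level_1", "level_2", "level_3"],   # 3 alignment constraint levels
    "censorship": ["off", "on"],                       # 2 censorship conditions
    "language": ["lang_a", "lang_b"],                  # 2 languages
    "replication": list(range(5)),                     # 5 replications
}


def enumerate_runs(factors):
    """Return one dict per run of a fully crossed factorial design."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


if __name__ == "__main__":
    print(len(enumerate_runs(SERIES_C)))  # 4 x 2 x 10
    print(len(enumerate_runs(SERIES_R)))  # 3 x 2 x 2 x 5
```

Crossing the factors this way reproduces the 80 and 60 runs stated for Series C and Series R. By the usual rule of thumb, a d of about 0.8 or more is considered large, so the reported range of 0.92–1.41 falls in that band.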

Authors

Hiroki Fukui

Institutions

Kyoto University

Institute of Criminology

Southend Hospital
