What does this research mean for the field?

Entity-level defense mechanisms improve AI decision protection, achieving a pass rate of 97.3% ± 1.2% under entity-weighted DA. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

This research aims to evaluate the effectiveness of advanced entity-level defense mechanisms for improving AI decision protection.

February 28, 2026 | Open Access

Beyond Distribution: Empirical Validation of Entity-Level Defense in Multi-Model AI Decision Protection

Key Points

This research aims to evaluate the effectiveness of advanced entity-level defense mechanisms for improving AI decision protection.
Ran experiments with 50 queries across 7 frontier AI models over 3 runs with a fixed random seed (seed=42).
Introduced Semantic Surrogate (entity replacement with plausible fiction) and entity-weighted DA.
Evaluated the L2+L3 defense and tested significance with McNemar's exact test.
Achieved a text-based DA pass rate of 93.3% ± 1.2% with L2+L3 defense (p < 0.001).
Entity-weighted DA reached a pass rate of 97.3% ± 1.2%.
Naive redaction acted as an anti-defense, with a pass rate of only 54.7% (−20.0 pp vs. baseline).
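
The key points contrast naive redaction with Semantic Surrogate, described as entity replacement with plausible fiction. The following is a minimal sketch of that contrast only, not the paper's implementation: the surrogate table, the function names, and the example query are all hypothetical, and the sketch assumes entities have already been identified.

```python
import re

# Hypothetical mapping from real entities to plausible fictional stand-ins.
# How the paper actually selects surrogates is not stated in this summary.
SURROGATES = {
    "Acme Corp": "Northwind Ltd",
    "Jane Doe": "Maria Keller",
}


def naive_redact(text: str, entities: list[str]) -> str:
    """Replace every known entity with a fixed placeholder."""
    for entity in entities:
        text = text.replace(entity, "[REDACTED]")
    return text


def surrogate_replace(text: str, table: dict[str, str]) -> str:
    """Replace every known entity with its fictional surrogate."""
    # Match longer names first so overlapping entity names are handled sanely.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: table[m.group(0)], text)


query = "Should Acme Corp promote Jane Doe to CFO?"
redacted = naive_redact(query, list(SURROGATES))
surrogate = surrogate_replace(query, SURROGATES)
print(redacted)
print(surrogate)
```

The redacted query visibly marks where information was removed, while the surrogate query remains a well-formed question about fictional entities.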

Abstract

The Distribution Hypothesis (Chang, 2026) established that controlled fragment allocation — not fragmentation alone — determines AI decision protection, achieving 81.3% ± 3.1% pass rate under collaborative multi-model reconstruction attacks. This paper presents empirical evidence from 50 queries across 7 frontier AI models over 3 runs with fixed random seed (seed=42) addressing two open questions: whether additional defense layers can push protection above 90%, and whether text similarity accurately measures protection when defense mechanisms operate at the entity level. We introduce Semantic Surrogate — entity replacement with plausible fiction — and entity-weighted DA. Under L2+L3 defense, text-based DA pass rate improves to 93.3% ± 1.2% (McNemar's exact test, p < 0.001), with entity recovery dropping to 0.023 ± 0.006. Entity-weighted DA reaches 97.3% ± 1.2%. Ablation baselines reveal that naive redaction is an anti-defense (54.7%, −20.0pp vs baseline), while response utility analysis shows Semantic Surrogate is the only method satisfying both defense and utility feasibility constraints. We identify domain vocabulary leakage as a boundary condition requiring behavioral-layer defense (MSBA). Version 2.4.
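
The abstract reports significance using McNemar's exact test, which compares two conditions on paired pass/fail outcomes by looking only at the discordant pairs. A minimal sketch of the standard two-sided exact test follows, assuming each query and run is scored under both conditions; the discordant counts in the usage line are hypothetical and are not taken from the paper.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: pairs that pass under condition A but fail under condition B
    c: pairs that fail under condition A but pass under condition B
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each discordant pair is equally likely to
    # fall either way, so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, chosen only to illustrate the call.
p_value = mcnemar_exact(b=2, c=20)
print(f"p = {p_value:.6f}")
```

Concordant pairs (both conditions pass, or both fail) do not enter the statistic, which is why the function takes only the two discordant counts.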

Authors

Yuchia Chang
