What question did this study set out to answer?

The study investigates how context compression fails in large language models and how those failures affect task validity.

February 28, 2026 · Open Access

Validity Mirage: Context Compression Failure Modes in LLMs

Key Points

Development of a tropical semiring algebra for measuring context health under compression
Empirical validation across five large language model architectures
Evaluation with over 11,400 boundary instances and 4,200 streaming trials
Testing against 13 real incident graphs
Naive context compression can keep surface-level answers correct while the model silently answers the wrong task
Structurally guarded retention policies eliminate pivot drift where recency-based baselines fail
The proposed algebra measures context health significantly better than recency-based approaches

Abstract

This archive presents five working papers on context compression failure modes in large language models. The central finding is the validity mirage: naive context compression can preserve surface-level answer correctness while silently substituting the governing hypothesis, causing a model to answer confidently about the wrong task. We develop a tropical semiring algebra (max-plus over ℝ ∪ {−∞}) for measuring context health under compression, and show that structurally guarded retention policies eliminate pivot drift where recency-based baselines fail completely. Empirical validation spans five open-weight model architectures (Llama 3.1 8B, Mistral 7B v0.3, Gemma 2 9B, Phi-3 Medium 14B, Qwen 2.5 14B) across 11,400+ boundary instances and 4,200+ streaming trials, with additional testing against 13 real incident graphs (12 NTSB aviation investigations and the Knight Capital 2012 trading failure). A production MCP server implementation is available separately.

Included papers:
Paper 00: Continuous Control and Structural Regularization in Multi-Agent Narrative Extraction
Paper 01: Absorbing States in Greedy Search
Paper 02: Streaming Oscillation Traps in Endogenous-Pivot Sequential Extraction
Paper 03: The Validity Mirage: Context Algebra for Endogenous Semantics under Memory Compression
Paper I: Tropical Algebra of Endogenous-Pivot Semantics

Reproducible validation artifacts and benchmark outputs are included in the results/ directory. All papers are working paper first drafts distributed under CC-BY 4.0.
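For readers unfamiliar with the max-plus (tropical) semiring named in the abstract, the following is a minimal sketch of the standard algebra only: tropical addition is max, tropical multiplication is ordinary addition, and −∞ serves as the additive identity. The function names (t_add, t_mul, t_matmul) and the toy weight matrix are illustrative assumptions; this is not the papers' context-health measure, which the abstract does not specify.

```python
import math

# Additive identity of the max-plus semiring over R ∪ {-inf}.
NEG_INF = -math.inf
# Multiplicative identity of the max-plus semiring.
ONE = 0.0


def t_add(a: float, b: float) -> float:
    """Tropical addition: the larger of the two values."""
    return max(a, b)


def t_mul(a: float, b: float) -> float:
    """Tropical multiplication: ordinary addition (-inf absorbs)."""
    return a + b


def t_matmul(A, B):
    """Max-plus matrix product: entry (i, j) is max over k of A[i][k] + B[k][j]."""
    inner = len(B)
    cols = len(B[0])
    return [
        [max(t_mul(row[k], B[k][j]) for k in range(inner)) for j in range(cols)]
        for row in A
    ]


# Hypothetical toy example: A[i][j] is the weight of edge i -> j,
# or NEG_INF when there is no edge.
A = [
    [NEG_INF, 2.0, NEG_INF],
    [NEG_INF, NEG_INF, 3.0],
    [1.0, NEG_INF, NEG_INF],
]

# In max-plus, squaring A gives the best total weight over two-step paths.
A2 = t_matmul(A, A)
```

In this toy case the only two-step path from node 0 to node 2 goes through node 1, so the corresponding entry of A2 is 2.0 + 3.0, while pairs with no two-step path stay at −∞.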
