This preprint presents empirical evidence of four related vulnerabilities in large language model systems that combine to produce a novel threat class — the Structural Metadata Reconstruction Attack (SMRA).

## Discovery Context

I discovered the vulnerability while benchmarking two specification-querying architectures: a deterministic MCP-based navigator (described in the predecessor paper, DOI: 10.5281/zenodo.18944351) and a standard context-stuffing (naive RAG) approach. The anomaly was first observed and characterized across the full Anthropic model spectrum (Haiku, Sonnet, Opus), from the smallest to the largest model, because these were the models integrated into the benchmarking pipeline. Anthropic was the discovery platform, not the target: the choice was driven by tooling availability, not vendor selection. Cross-vendor reproduction with entry-level models from OpenAI and Google subsequently confirmed that the mechanism is systemic (see Cross-Vendor Reproduction below). At the time of writing, detailed characterization of flagship models from other vendors is underway.

The naive baselines exhibited anomalous fabrication patterns that could not be explained by standard hallucination models: WHY-type and conditional (WHEN-type) queries produced the most aggressive and structurally coherent fabrications, while HOW and WHAT queries showed markedly lower fabrication rates. As the sole author of the target specification (~700 pages, written over one year, unpublished), I have complete knowledge of every section's content and was therefore uniquely positioned to recognize that the LLM outputs, while structurally faithful, terminologically authentic, and superficially authoritative, systematically inverted the specification's deliberate departures from industry conventions.
A parallel verification confirmed that the specification's original coinages are absent from the CS literature (Google Scholar, ACM DL, IEEE Xplore, arXiv), ensuring that every fabricated claim originates from the model's training priors projected onto the document's table of contents, not from memorized source text.

## Four Findings

**Finding 1 — Structural Metadata Reconstruction Attack (SMRA).** When an LLM receives a document's table of contents (TOC) without body text, it systematically reconstructs plausible but fabricated content by projecting training knowledge onto structural metadata. In a controlled experiment using a proprietary specification containing original coinages absent from any training corpus, three Claude models (Haiku, Sonnet, Opus), spanning the full capability range from entry-level to flagship, independently achieve 0% grounded accuracy on out-of-scope questions while producing output that uses the author's terminology, cites real section numbers, and reads as authoritative. Cross-vendor reproduction with GPT-4o-mini (OpenAI) and Gemini 2.0 Flash (Google), both entry-level models, confirms the mechanism is systemic across all major LLM providers, not specific to any single vendor or model tier.

**Finding 2 — Confidence–Capability Inversion (CCI).** Stronger models are not merely wrong; they are more dangerously wrong. Under structural metadata leakage, Opus produces zero honest refusals across 20 questions of which 18 require absent information, while Haiku refuses 9 times. Each step up the capability ladder produces proportionally less detectable fabrication with fewer epistemic signals.

**Finding 3 — RAG Scope Mismatch.** The trigger condition, metadata scope exceeding content scope, is not an exotic scenario but the default architecture of most RAG systems. Standard practice (include document TOC + section summaries for "context") creates exactly the fabrication surface demonstrated in Findings 1 and 2.
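The scoring logic behind Findings 1 and 2 can be sketched as follows. This is an illustrative harness, not the paper's actual code: the label scheme, function names, and the reading of CRR as "honest refusals under exploit conditions over questions requiring absent information" are assumptions (the formal CRR definition is in Annex G).

```python
# Illustrative scoring sketch for Findings 1 and 2. Labels, names, and
# the CRR reading are assumptions, not the paper's actual harness.
GROUNDED, FABRICATED, REFUSAL = "grounded", "fabricated", "refusal"

def grounded_accuracy(labels: list[str]) -> float:
    """Fraction of answers actually supported by the document body."""
    return labels.count(GROUNDED) / len(labels)

def honest_refusals(labels: list[str]) -> int:
    """Count of answers that admit the requested information is absent."""
    return labels.count(REFUSAL)

def calibration_retention_rate(refusals_exploit: int, refusals_required: int) -> float:
    """One plausible reading of CRR: the share of questions requiring
    absent information that still draw an honest refusal under TOC-only
    (exploit) conditions. The paper's formal definition may differ."""
    return refusals_exploit / refusals_required if refusals_required else 0.0

# Toy data mirroring the reported pattern: 18 of 20 questions require
# absent information; the flagship model fabricates all 18, while the
# weakest model refuses 9 of them.
opus = [FABRICATED] * 18 + [GROUNDED] * 2
haiku = [FABRICATED] * 9 + [REFUSAL] * 9 + [GROUNDED] * 2

print(grounded_accuracy(opus))                                 # 0.1 on this toy data
print(calibration_retention_rate(honest_refusals(opus), 18))   # 0.0
print(calibration_retention_rate(honest_refusals(haiku), 18))  # 0.5
```

On this toy data the flagship model retains none of its calibration while the weaker model retains half, the shape of the CCI gradient reported above.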
**Finding 4 — Scope Displacement as Content Extraction.** A question about absent content does not merely trigger fabrication; it acts as an extraction query that reorganizes real content from loaded sections into a derivative document the author never wrote. Even without TOC leakage, the question itself is sufficient to extract and restructure loaded content into a form optimized for the questioner's purpose. This transforms hallucination from an accuracy problem into unauthorized intelligence gathering.

## Cross-Vendor Reproduction

The SMRA mechanism was first characterized across the full Anthropic model lineup (3 models, entry-level to flagship) and subsequently reproduced with entry-level models from two additional vendors (6 models total). The Anthropic lineup served as the discovery platform because it was integrated into the benchmarking pipeline; the cross-vendor step confirms vendor independence.

| Vendor | Models | Model tier | Fabrication confirmed | Convergence pattern |
|---|---|---|---|---|
| Anthropic | Haiku, Sonnet, Opus | Entry → flagship (full spectrum) | Yes (0% grounded accuracy across all three) | Intra-vendor convergence; CCI gradient |
| OpenAI | GPT-4o-mini | Entry-level | Yes (45% fabrication rate) | Converges with Gemini on wrong industry defaults |
| Google | Gemini 2.0 Flash | Entry-level | Yes (35% fabrication, 30% honest) | Best calibration but still fabricates systematically |

**Key convergence:** when the specification deliberately departs from industry conventions (e.g., no implicit conversions, nominal typing, fixed-width encoding), models from all three vendors converge on the same wrong answer: the training-data default from C#/Java/Protobuf.

**Note on scope:** both cross-vendor models are entry-level. Since entry-level Anthropic models already exhibit full SMRA susceptibility and there is no architectural reason to expect flagship models from other vendors to be immune (§11, Limitation 5), detailed testing of flagship models from OpenAI and Google is in progress at the time of publication.
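The cross-vendor convergence can be quantified with a simple agreement metric: the fraction of questions on which every model returns the same (normalized) answer. This is an illustrative sketch, the model names and answers are toy data, and the actual convergence scoring used in Annex I may differ.

```python
# Illustrative convergence metric; Annex I's actual scoring may differ.
def convergence_score(answers_by_model: dict[str, list[str]]) -> float:
    """Fraction of questions on which all models give the same
    normalized answer (full cross-vendor agreement)."""
    models = list(answers_by_model)
    n_questions = len(answers_by_model[models[0]])
    agree = sum(
        1 for q in range(n_questions)
        if len({answers_by_model[m][q] for m in models}) == 1
    )
    return agree / n_questions

# Toy data: where the specification departs from convention, models
# tend to converge on the training-data default (e.g. "implicit
# conversions allowed"), not on the document's actual rule.
answers = {
    "claude-haiku":     ["implicit", "implicit", "fixed-width"],
    "gpt-4o-mini":      ["implicit", "nominal",  "fixed-width"],
    "gemini-2.0-flash": ["implicit", "implicit", "fixed-width"],
}
print(convergence_score(answers))  # 2 of 3 questions fully converge
```

High convergence on *wrong* answers is the diagnostic signal: independent models agreeing contradicts the "random hallucination" explanation and points at a shared training prior.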
## Mechanism: The Two-Key Cipher

The reconstruction mechanism is formalized as a two-key cipher:

- **Key 1 (TOC)** — provides structural scaffolding: section numbers, heading text, hierarchical organization
- **Key 2 (Training corpus)** — provides domain content: standard CS patterns, common PL conventions

Neither key alone enables reconstruction. Together, they produce confident, section-cited, terminologically authentic fabrications that would pass casual review by a non-specialist. The mechanism is architecturally inevitable: multi-head attention over near-complete domain coverage in training data means that 7–10% of structural information suffices for full content reconstruction.

## Quantitative Contributions

- **Calibration Retention Rate (CRR)** — measures how much epistemic calibration a model retains under metadata leakage (Opus: 0%, Haiku: 47%)
- **SMRA-score** — per-question metric combining fabrication detection, source attribution, and epistemic signal presence
- **Information-theoretic quantification** — formal analysis of the reconstruction threshold as a function of heading informativeness and training-corpus coverage
- **Fabrication taxonomy (Annex C)** — five categories of structural metadata fabrication, with examples

## Implications

- **RAG system design:** >80% of production RAG deployments use the vulnerable architecture (metadata scope > content scope).
- **Data classification:** existing frameworks (GDPR, HIPAA, PCI DSS, ISO 27001, NIST SP 800-53, SOC 2, DTSA, EU Directive 2016/943) classify sensitivity by content — a TOC contains no PII, so it is "non-sensitive." SMRA invalidates this: structural metadata from a confidential source inherits that source's confidentiality, because a language model can reconstruct the protected content from the metadata alone. Organizations must reclassify structural metadata as sensitive data.
- **Regulatory blind spot:** neither the EU AI Act nor US Executive Order 14110 (revoked 20 January 2025) addresses context-design-driven vulnerabilities.
- **Model evaluation:** standard "helpfulness" and "coherence" metrics reward confident fabrication; SMRA-affected outputs score highly on both.
- **Intellectual property exposure:** any structured document with descriptive headings becomes vulnerable when its outline is accessible alongside an LLM.

## Mitigation

A single architectural fix — grounded retrieval via an MCP Index Server (a Model Context Protocol server with deterministic, index-based navigation) — eliminates all three vulnerabilities. The weakest model (Haiku) achieves 100% accuracy under grounded retrieval, compared to 0% under structural metadata leakage. Architecture beats parameters. Deterministic retrieval infrastructure (weighted indexes, tier-based extraction, algorithmic reading plans) also provides an enforceable control point for sensitive data: unlike probabilistic RAG, where metadata is injected into context and the model decides what to do with it, deterministic retrieval makes the scope boundary structurally auditable.
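The mitigation's core property, that metadata scope never exceeds content scope, can be expressed as a retrieval-time invariant. The sketch below is a minimal illustration of that invariant, assuming a simple heading-to-body index; the class and function names are hypothetical and do not reflect the MCP Index Server's actual API.

```python
# Minimal sketch of the scope-alignment invariant: a retriever may only
# surface metadata (headings) for sections whose body text is actually
# loaded into context. Names are illustrative, not the paper's API.

class ScopeMismatchError(Exception):
    """Raised instead of emitting a heading without its body text."""

def build_context(index: dict[str, str], requested_sections: list[str]) -> str:
    """Return a context block where metadata scope == content scope.

    `index` maps section heading -> body text. A section absent from
    the index triggers a deterministic refusal rather than appearing
    as a bare heading, removing the SMRA fabrication surface
    described in Finding 3.
    """
    parts = []
    for section in requested_sections:
        body = index.get(section)
        if body is None:
            raise ScopeMismatchError(f"section not retrievable: {section!r}")
        parts.append(f"## {section}\n{body}")
    return "\n\n".join(parts)

index = {"4.2 Typing": "Nominal typing only; no implicit conversions."}
print(build_context(index, ["4.2 Typing"]))
```

The design point is that the refusal happens in deterministic infrastructure, before the model sees anything, which is what makes the scope boundary auditable rather than dependent on model behavior.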
## Practitioner Protocol

Annex H provides a complete testing protocol for assessing RAG deployments against SMRA:

- Calibration baseline → exploit comparison methodology
- Token analysis and honest-refusal tracking
- Decision thresholds for remediation
- Scope-alignment implementation patterns (Annex F)

## Supplementary Materials

- Annex A–D: claim classification definitions, per-question token analysis, fabrication taxonomy, SMRA attack algorithm
- Annex E: author-coined term verification (10 terms, 4 search engines, 0 matches)
- Annex F: RAG scope-alignment implementation patterns (3 remediation architectures)
- Annex G: CCI formal definition and severity scale
- Annex H: SMRA testing methodology for practitioners
- Annex I: canary word cluster projection — 7 semantic clusters extracted from 60 cross-vendor runs, convergence scoring, and cross-model escalation projections (3.25× amplification factor)

## Companion Data

All benchmark data supporting this paper are included:

- Raw answer dumps covering 20 questions across 6 models and 4 conditions (340 runs)
- Calibration baselines (mini-TOC control) and exploit runs (full-TOC)
- Cross-vendor comparison matrix
- Token usage and timing data per question per model
- The 20 evaluation questions targeting out-of-scope sp
Yurii Chudinov
www.synapsesocial.com/papers/69b4fb9db39f7826a300beca — DOI: https://doi.org/10.5281/zenodo.18980853