What question did this study set out to answer?

The aim is to ensure that AI-generated operations are both structurally valid and semantically justified before execution.

June 1, 2026 · Open Access

The Grounding Gate: Admissibility and Replay Guarantees for AI-Driven Research

Key Points

The aim is to ensure that AI-generated operations are both structurally valid and semantically justified before execution.
Developed a grounding gate for mandatory admissibility checks between AI proposals and execution.
Evaluated 30 prompts across 5 categories to measure how often valid-but-unwarranted capabilities are executed.
Measured execution efficiency and fault localization using an 8-layer execution hash.
Grounded pipeline reduced valid-but-unwarranted capabilities from 23.3% to 10.0% (Fisher exact p = 0.027).
Completely eliminated unwarranted capabilities on undiscoverable prompts (100% to 0%).
All repeated executions yielded bit-identical hashes across 50 runs, with grounding overhead below 14 ms.
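
To make the layered-hash idea in the key points above concrete, here is a minimal sketch of a per-layer execution hash with fault localization. It is not the study's implementation: the layer names (`data`, `config`, `code`), the use of SHA-256 over canonical JSON, and the function names are all assumptions chosen for illustration, and only three layers are shown rather than the study's eight.

```python
import hashlib
import json


def layer_digest(payload):
    """Hash one layer's payload via a canonical JSON encoding (assumed scheme)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def execution_hash(layers):
    """Return (per-layer digests, combined digest) for an ordered mapping of layers."""
    per_layer = {name: layer_digest(payload) for name, payload in layers.items()}
    joined = "".join(f"{name}:{digest};" for name, digest in per_layer.items())
    combined = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return per_layer, combined


def first_divergent_layer(per_layer_a, per_layer_b):
    """Name the first layer whose digest differs, or None if all layers match."""
    for name, digest in per_layer_a.items():
        if per_layer_b.get(name) != digest:
            return name
    return None


# Hypothetical layers; the study's actual eight layers are not listed here.
run_a = {"data": {"rows": 100}, "config": {"window": 20}, "code": {"version": "1.0"}}
run_b = {"data": {"rows": 100}, "config": {"window": 30}, "code": {"version": "1.0"}}

digests_a, total_a = execution_hash(run_a)
digests_b, total_b = execution_hash(run_b)
print(first_divergent_layer(digests_a, digests_b))
```

Because each layer is hashed separately, two runs with different combined hashes can be compared layer by layer to find where they diverge, without re-executing either run.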

Abstract

AI systems that generate computational pipelines from natural language may propose operations that are structurally valid but semantically unwarranted—the operation exists, but the user’s request does not justify it. Schema validation catches malformed proposals; it does not catch valid-but-wrong ones. We present a grounding gate: a mandatory admissibility boundary between AI-proposed operations and deterministic execution. The system discovers which capabilities match the user’s terms by querying a live registry (236 capabilities), and a deterministic grounding function verifies that every name in the proposal has evidence in the discovery result. Names lacking evidence are rejected before execution, even if they name real capabilities. Admitted proposals execute deterministically, producing an 8-layer execution hash that decomposes end-to-end provenance into distinct semantic layers for fault localization without re-execution. The capability registry separates description from identity: three layers (semantic, algebraic, implementation) determine a capability’s hash, while discovery metadata (aliases, tags) does not, allowing the registry to improve discoverability without invalidating prior execution hashes. We evaluate on 30 prompts across 5 categories (4 strategy families, 9 metrics, 36 valid combinations). An unconstrained pipeline executes valid-but-unwarranted capabilities at 23.3%; the grounded pipeline reduces this to 10.0% (Fisher exact p = 0.027), eliminating them entirely on undiscoverable prompts (100%→0%). On adversarial prompts exploiting discovery-alias gaps, the grounded pipeline has a higher failure rate—a tradeoff inherent to constraining the admissible set. Repeated executions produce bit-identical hashes across all 50 runs (5 configurations × 10 repetitions). Grounding overhead is under 14 ms.
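
The admissibility check described in the abstract can be sketched roughly as follows. This is an illustrative approximation under stated assumptions, not the authors' code: the registry layout, the capability names (`sharpe_ratio`, `max_drawdown`), the exact-match discovery rule, and the names `discover`, `ground`, and `GroundingResult` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroundingResult:
    admitted: bool
    ungrounded: tuple  # proposed names with no evidence in the discovery result


def discover(terms, registry):
    """Return capability names whose name, aliases, or tags match a user term.

    Exact, case-insensitive matching is an assumption made for this sketch.
    """
    wanted = {term.lower() for term in terms}
    found = set()
    for name, meta in registry.items():
        labels = {name.lower()}
        labels.update(alias.lower() for alias in meta.get("aliases", ()))
        labels.update(tag.lower() for tag in meta.get("tags", ()))
        if labels & wanted:
            found.add(name)
    return found


def ground(proposal, discovered):
    """Deterministically reject any proposed name that discovery did not return."""
    ungrounded = tuple(name for name in proposal if name not in discovered)
    return GroundingResult(admitted=not ungrounded, ungrounded=ungrounded)


# Hypothetical two-entry registry; the study's live registry holds 236 capabilities.
registry = {
    "sharpe_ratio": {"aliases": ["sharpe"], "tags": ["risk-adjusted"]},
    "max_drawdown": {"aliases": ["drawdown"], "tags": ["loss"]},
}

discovered = discover(["sharpe"], registry)
result = ground(["sharpe_ratio", "max_drawdown"], discovered)
print(result.admitted, result.ungrounded)
```

In this sketch `max_drawdown` is a real registry entry, yet the proposal is rejected because the user's terms supplied no evidence for it. The same mechanism also shows the tradeoff the abstract mentions: a request phrased with a term that matches no alias or tag would leave a warranted capability undiscovered and therefore inadmissible.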
