April 1, 2026 | Open Access

Heterogeneous Computational Replication for Trustworthy Natural-Language Data Querying

Key Points

The research aims to make natural-language data querying more trustworthy by addressing a shortcoming of existing verification signals: they are usually generated within the same model family or along the same computational path as the original answer.
It proposes heterogeneous computational replication as an architectural principle for natural-language data querying.
The same semantically normalised question is dispatched to multiple agents, each generating and executing a query through a distinct computational path.
A semantic manifest aligns all agents on business terminology, schema constraints, and domain definitions before query generation.
A structured taxonomy classifies disagreement types and assigns each a distinct resolution path.
The architecture is designed to reduce correlated failure modes by not relying on a single model-to-query pipeline.
Quantitative benchmarking of silent failure rate is outlined as future work.
Cross-agent disagreement is framed as a first-class signal of computational failure.

Abstract

Large language models are increasingly used to translate natural-language questions into executable data queries. While these systems have improved substantially through schema linking, retrieval augmentation, execution feedback, and self-correction, a structural trust gap remains: most verification signals are generated within the same model family or along the same computational path as the original answer. This creates correlated failure modes, especially in high-stakes settings where plausible but incorrect outputs are more dangerous than explicit failures.

We propose heterogeneous computational replication as an architectural principle for natural-language data querying. Instead of relying on a single model-to-query pipeline, our approach dispatches the same semantically normalised question to multiple agents, each generating and executing a query through a different computational path. Candidate execution systems include SQL against a relational database, PySpark over a distributed compute layer, R through an in-memory analytical engine, and DAX against a Power BI VertiPaq semantic model. The architecture is engine-agnostic – the selection criterion is that chosen systems must have sufficiently distinct computational lineage. A comparator node collects all agent outputs, scores confidence based on the degree of agreement, and where agents diverge, classifies the type of disagreement to guide resolution. The agents are not statistically independent in the strict sense but are designed to reduce correlated failure modes across heterogeneous execution paths – a meaningful weaker claim that remains practically useful.

We further introduce a semantic manifest, a structured JSON contract dispatched to all agents before query generation, that pre-resolves business terminology, schema constraints, and domain definitions. This pre-dispatch alignment transforms disagreement from a noise signal into a diagnostic one: when all agents operate from identical semantic premises, cross-agent divergence indicates computational failure rather than interpretive ambiguity.

The soul of this architecture is epistemic honesty. A single-agent system that returns a wrong answer is indistinguishable from one that returns a right answer – both produce a number, and both look correct. Our system surfaces uncertainty rather than hiding it. When agents disagree, the system does not guess which is right – it flags that confidence is low and escalates for human review. We frame cross-agent disagreement as a first-class signal: not a failure of the system, but the system working as intended. A structured taxonomy of disagreement types with distinct resolution paths makes this signal actionable rather than merely informative. This work is a conceptual and architectural proposal; empirical validation through a benchmarking protocol designed to measure silent failure rate is outlined as future work.
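The flow described in the abstract (one manifest, several agents, one comparator) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The manifest fields, the `dispatch` and `compare` function names, the status labels, the numeric tolerance, and the stub agents with their return values are all hypothetical and chosen only for illustration. The paper's disagreement taxonomy is not reproduced here, so the sketch only reports which engines dissent.

```python
import json
from math import isclose

# Hypothetical semantic manifest: every field name below is an assumption,
# not the contract defined by the paper.
MANIFEST = {
    "question": "What was net revenue in Q1 2025?",
    "terms": {"net revenue": "gross sales minus returns and discounts"},
    "schema": {"table": "sales", "date_column": "order_date"},
    "filters": {"order_date": ["2025-01-01", "2025-03-31"]},
}


def dispatch(manifest, agents):
    """Send the same serialised manifest to every agent; return engine -> value."""
    payload = json.dumps(manifest, sort_keys=True)
    return {engine: run(payload) for engine, run in agents.items()}


def compare(outputs, rel_tol=1e-9):
    """Cluster numerically equal answers and score confidence by agreement."""
    if not outputs:
        raise ValueError("no agent outputs to compare")
    clusters = []  # list of (representative value, [engines that returned it])
    for engine, value in outputs.items():
        for representative, members in clusters:
            if isclose(value, representative, rel_tol=rel_tol):
                members.append(engine)
                break
        else:
            clusters.append((value, [engine]))
    representative, members = max(clusters, key=lambda c: len(c[1]))
    agreed = len(clusters) == 1
    return {
        # On any divergence the sketch escalates instead of picking a winner.
        "status": "agree" if agreed else "escalate",
        "confidence": len(members) / len(outputs),
        "answer": representative if agreed else None,
        "dissenting": sorted(e for e in outputs if e not in members),
    }


# Stub agents standing in for real SQL, PySpark, R, and DAX execution paths.
agents = {
    "sql": lambda payload: 1250.0,
    "pyspark": lambda payload: 1250.0,
    "r": lambda payload: 1250.0,
    "dax": lambda payload: 1310.5,
}

report = compare(dispatch(MANIFEST, agents))
print(report)
```

With these stub values, three of four engines agree, so the sketch reports a confidence of 0.75, withholds an answer, and names `dax` as the dissenting engine for human review. Using the share of agents in the largest agreeing cluster as the confidence score is one simple choice among many; the source does not specify a scoring formula.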
