Large language models are increasingly used to translate natural-language questions into executable data queries. While these systems have improved substantially through schema linking, retrieval augmentation, execution feedback, and self-correction, a structural trust gap remains: most verification signals are generated within the same model family or along the same computational path as the original answer. This creates correlated failure modes, which are especially harmful in high-stakes settings where plausible but incorrect outputs are more dangerous than explicit failures.

We propose heterogeneous computational replication as an architectural principle for natural-language data querying. Instead of relying on a single model-to-query pipeline, our approach dispatches the same semantically normalised question to multiple agents, each of which generates and executes a query through a different computational path. Candidate execution systems include SQL against a relational database, PySpark over a distributed compute layer, R through an in-memory analytical engine, and DAX against a Power BI VertiPaq semantic model. The architecture is engine-agnostic; the selection criterion is that the chosen systems have sufficiently distinct computational lineage. A comparator node collects all agent outputs, scores confidence by the degree of agreement, and, where agents diverge, classifies the type of disagreement to guide resolution. The agents are not statistically independent in the strict sense; rather, they are designed to reduce correlated failure modes across heterogeneous execution paths. This is a weaker claim than independence, but one that remains practically useful.

We further introduce a semantic manifest, a structured JSON contract dispatched to all agents before query generation, which pre-resolves business terminology, schema constraints, and domain definitions. This pre-dispatch alignment turns disagreement from a noise signal into a diagnostic one: when all agents operate from identical semantic premises, cross-agent divergence indicates computational failure rather than interpretive ambiguity.

The central commitment of this architecture is epistemic honesty. A single-agent system that returns a wrong answer is indistinguishable from one that returns a right answer: both produce a number, and both look correct. Our system surfaces uncertainty rather than hiding it. When agents disagree, the system does not guess which one is right; it flags low confidence and escalates for human review. We frame cross-agent disagreement as a first-class signal: not a failure of the system, but the system working as intended. A structured taxonomy of disagreement types, each with a distinct resolution path, makes this signal actionable rather than merely informative. This work is a conceptual and architectural proposal; empirical validation through a benchmarking protocol designed to measure the silent failure rate is outlined as future work.
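The dispatch-and-compare flow described above can be illustrated with a minimal Python sketch. Everything concrete here is an assumption for illustration only: the manifest fields, the four-way disagreement taxonomy, the tolerances, and the `dispatch`/`compare` helpers are hypothetical, and the agents are stub callables standing in for SQL, PySpark, R, and DAX engines that return a single scalar. Confidence is taken to be the share of agents in the largest agreeing group, and any disagreement is escalated rather than resolved by guessing.

```python
import json
import math
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

# Hypothetical semantic manifest; field names are illustrative assumptions,
# not a schema specified by the proposal.
MANIFEST = json.loads("""
{
  "question": "What was net revenue in Q1 2024?",
  "definitions": {"net_revenue": "SUM(invoice_amount) - SUM(refund_amount)"},
  "constraints": {"currency": "EUR", "exclude": ["test_accounts"]},
  "time_window": {"start": "2024-01-01", "end": "2024-03-31"}
}
""")


class Disagreement(Enum):
    # Hypothetical disagreement taxonomy, for illustration only.
    NONE = "none"
    EXECUTION_FAILURE = "execution_failure"  # an agent returned no result
    PRECISION_DRIFT = "precision_drift"      # all values close, not identical
    SINGLE_OUTLIER = "single_outlier"        # exactly one agent diverges
    NO_CONSENSUS = "no_consensus"            # no majority agreement


@dataclass
class Verdict:
    confidence: float          # share of agents in the largest agreeing group
    disagreement: Disagreement
    escalate: bool             # True means route to human review


Agent = Callable[[dict], Optional[float]]


def dispatch(agents: dict[str, Agent], manifest: dict) -> dict[str, Optional[float]]:
    """Send the identical manifest to every agent; an exception counts as no result."""
    results: dict[str, Optional[float]] = {}
    for name, agent in agents.items():
        try:
            results[name] = agent(manifest)
        except Exception:
            results[name] = None
    return results


def compare(results: dict[str, Optional[float]],
            rel_tol: float = 1e-9, drift_tol: float = 1e-3) -> Verdict:
    """Score confidence by agreement and classify any divergence."""
    if not results:
        raise ValueError("no agent results to compare")
    n = len(results)
    values = [v for v in results.values() if v is not None]
    # Size of the largest group of values that agree with one anchor value.
    largest = max(
        (sum(math.isclose(v, w, rel_tol=rel_tol) for w in values) for v in values),
        default=0,
    )
    confidence = largest / n
    if largest == n:
        kind = Disagreement.NONE
    elif len(values) < n:
        kind = Disagreement.EXECUTION_FAILURE
    elif all(math.isclose(v, w, rel_tol=drift_tol) for v in values for w in values):
        kind = Disagreement.PRECISION_DRIFT
    elif largest == n - 1:
        kind = Disagreement.SINGLE_OUTLIER
    else:
        kind = Disagreement.NO_CONSENSUS
    # Any disagreement is flagged and escalated instead of picking a winner.
    return Verdict(confidence, kind, escalate=kind is not Disagreement.NONE)


# Usage with stub agents standing in for the four engines (hypothetical values).
agents = {
    "sql": lambda m: 1_250_000.0,
    "pyspark": lambda m: 1_250_000.0,
    "r": lambda m: 1_250_000.0,
    "dax": lambda m: 1_190_000.0,
}
verdict = compare(dispatch(agents, MANIFEST))
print(verdict.confidence, verdict.disagreement.value, verdict.escalate)
# prints: 0.75 single_outlier True
```

Checking for execution failures before numeric drift is one possible ordering, chosen here because a missing result is arguably more diagnostic than a small numerical gap; a real comparator would also need to reconcile tabular results rather than single scalars.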
Pratik Kumar (Mon,) studied this question.