Large language model orchestration systems face a fundamental design tension: optimizing for deterministic reliability suppresses the exploratory capacity essential for scientific discovery, while permitting unconstrained generation undermines trustworthiness. We present a dual-mode cognitive architecture that resolves this tension through two complementary operating modes within a single governance framework. The Deliberation Mode employs multi-model cross-validation to eliminate hallucination, achieving 202/202 on a 50-task agentic benchmark spanning code construction, information retrieval, and multi-step reasoning (Tao, 2026b). The Exploration Mode employs multi-model cross-stimulation to channel hallucination into structured hypothesis generation, guided by cross-skill resonance detection that flags convergent extrapolations across independent models. We validate the architecture through a three-round human-in-the-loop falsification experiment. An initial resonant hypothesis, that tropical mixed volume predicts neural network generalization error, was decisively refuted. However, the iterative falsification process produced three findings: architecture-dependent bias factors, a dimension-dependent phase transition at critical dimension d∗ ≈ 34, and a low-dimensional anomaly at d = 40 where complexity metrics decouple from generalization. In a Phase 4 experiment on two frontier mathematical problems, the system autonomously deployed 15 specialized Skills across 7+ heterogeneous models, executing 77 tool calls over 74 minutes, and produced structured falsifiable hypotheses, 6 of 8 of which were subsequently confirmed computationally. Execution trace analysis revealed emergent inter-Skill coordination: models autonomously discovered and incorporated files written by other models, forming a self-organized information network under minimal planning constraints, a phenomenon we term ordered behavior under minimal constraints.
Unlike conventional agent frameworks that treat LLMs as interchangeable backends for predefined roles, our architecture treats each LLM as a distinct Skill with its own capability profile, enabling stronger cognitive diversity and more meaningful resonance detection. These results demonstrate that the value of controlled exploration lies not in generating correct answers, but in opening investigative pathways that purely deterministic systems would never pursue.
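The two modes can be illustrated with a minimal sketch; this is not the authors' implementation. It assumes each Skill is a plain function from prompt to text (the `Skill` type, `deliberate`, `explore`, `quorum`, and `min_models` are all hypothetical names), and it uses exact matching on normalized strings as a stand-in for whatever agreement and resonance measures the paper actually uses.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical: a model wrapped as a function from prompt to text.
Skill = Callable[[str], str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different outputs match."""
    return " ".join(text.lower().split())


def deliberate(skills: Dict[str, Skill], prompt: str,
               quorum: float = 1.0) -> Optional[str]:
    """Deliberation-style cross-validation (illustrative assumption).

    Returns the most common answer only if at least `quorum` of the
    skills agree on it; otherwise returns None (no trusted answer).
    """
    answers = [normalize(skill(prompt)) for skill in skills.values()]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes / len(answers) >= quorum else None


def explore(skills: Dict[str, Skill], prompt: str,
            min_models: int = 2) -> Dict[str, List[str]]:
    """Exploration-style resonance detection (illustrative assumption).

    Treats each non-empty output line as one hypothesis and keeps those
    proposed independently by at least `min_models` distinct skills.
    """
    supporters: Dict[str, set] = {}
    for name, skill in skills.items():
        for line in skill(prompt).splitlines():
            hypothesis = normalize(line)
            if hypothesis:
                supporters.setdefault(hypothesis, set()).add(name)
    return {h: sorted(names) for h, names in supporters.items()
            if len(names) >= min_models}
```

In this sketch the same set of Skills serves both modes: `deliberate` discards anything the models disagree on, while `explore` keeps speculative output but surfaces only the hypotheses on which independent models converge. A real system would likely replace exact string matching with a semantic similarity measure.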
Tao Rui (Sat,) studied this question.
www.synapsesocial.com/papers/69dc89473afacbeac03eb0a6 — DOI: https://doi.org/10.5281/zenodo.19505278
Tao Rui