This preprint presents a deterministic navigation architecture that enables unmodified frontier LLMs to achieve 97–100% accuracy on complex specification queries without fine-tuning, retraining, or embedding-based retrieval.

Core Idea

Large Language Models fail on large normative specifications not because they lack intelligence, but because they lack deterministic navigation. A 500-page ISO-grade document has deep hierarchical numbering, normative modality (MUST/SHOULD/MAY), cross-references across independent chapters, and no linear reading path. Instead of scaling context windows or using generic RAG, this work encodes the expert's mental map as a formal navigation algorithm.

Architecture

A 3-pass ontology compiler ingests a raw specification and produces 14 machine-readable indices. At runtime, an MCP (Model Context Protocol) server uses these indices to answer queries via a deterministic pipeline:

- Keyword enrichment (T&D expansion → co-occurrence → waterfall confirmation)
- Index search with ontological routing (WHAT/WHY/HOW/WHEN)
- Weighted reading plan construction
- Chain resolution → tier-weighted extraction
- Token budget management with auto-expansion

Key Results

| Metric | MCP System | Baseline (no indices) |
|---|---|---|
| Accuracy | 100% (246 questions, 5 runs each) | ~85% with high variance |
| Spec tokens per query | ~2–4k | 25–30k (agentic) / 178k (full-context) |
| Tool calls per query | 1–2 | 15–20 |
| Wall time | ~3s | 25–35s |
| Token reduction | 7× vs agentic, 15× vs full-context | n/a |

Tested on Claude 3.5 Sonnet through Claude Sonnet 4 and reproduced on GPT-4. No fine-tuning, just structured tooling.

Evaluation Corpus

The evaluation corpus, the E.L.I.A. specification (~500 pages), is guaranteed not to appear in any LLM training corpus: it is a private, unpublished language specification developed as part of this research. This eliminates data contamination risk entirely and ensures that every correct answer is attributable to the navigation architecture, not memorized training data.
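The runtime pipeline described under Architecture can be illustrated with a minimal TypeScript sketch (TypeScript being the language of the companion server). This is not the author's implementation: every name here (`routeQuery`, `enrich`, `buildPlan`, `applyBudget`), the index entries, the route bonus, and the budget-doubling policy are assumptions chosen for illustration. Co-occurrence, waterfall confirmation, chain resolution, and tier-weighted extraction are omitted.

```typescript
// Hypothetical sketch: names, weights, and data are illustrative assumptions.
type Route = "WHAT" | "WHY" | "HOW" | "WHEN";

interface IndexEntry {
  section: string;    // hierarchical section number (made up here)
  keywords: string[]; // terms the section is indexed under
  route: Route;       // ontological category the section answers
  tokens: number;     // approximate token cost of reading the section
}

interface PlanItem { section: string; weight: number; tokens: number; }

// Route a query by its leading interrogative (assumed rule; default is WHAT).
function routeQuery(query: string): Route {
  const q = query.trim().toLowerCase();
  if (/^why\b/.test(q)) return "WHY";
  if (/^how\b/.test(q)) return "HOW";
  if (/^when\b/.test(q)) return "WHEN";
  return "WHAT";
}

// Expand keywords through a synonym table, a stand-in for keyword enrichment.
function enrich(keywords: string[], synonyms: Record<string, string[]>): string[] {
  const out: string[] = [];
  const add = (term: string): void => {
    const t = term.toLowerCase();
    if (out.indexOf(t) === -1) out.push(t);
  };
  for (const k of keywords) {
    add(k);
    for (const s of synonyms[k.toLowerCase()] ?? []) add(s);
  }
  return out.sort();
}

// Build a weighted reading plan: one point per keyword hit, plus an
// arbitrary bonus of 2 when the section's route matches the query's route.
function buildPlan(entries: IndexEntry[], keywords: string[], route: Route): PlanItem[] {
  const plan: PlanItem[] = [];
  for (const e of entries) {
    const hits = e.keywords.filter((k) => keywords.indexOf(k.toLowerCase()) !== -1).length;
    if (hits === 0) continue;
    plan.push({ section: e.section, weight: hits + (e.route === route ? 2 : 0), tokens: e.tokens });
  }
  // Deterministic order: weight descending, then section string ascending (lexical).
  return plan.sort((a, b) =>
    b.weight - a.weight || (a.section < b.section ? -1 : a.section > b.section ? 1 : 0));
}

// Apply a token budget. Assumed auto-expansion policy: double the budget
// (up to maxBudget) until the top-weighted section fits, then fill greedily.
function applyBudget(plan: PlanItem[], budget: number, maxBudget: number):
    { selected: PlanItem[]; budget: number } {
  let limit = Math.max(1, budget);
  const first = plan.length > 0 ? plan[0] : undefined;
  while (first !== undefined && first.tokens > limit && limit < maxBudget) {
    limit = Math.min(limit * 2, maxBudget);
  }
  const selected: PlanItem[] = [];
  let used = 0;
  for (const item of plan) {
    if (used + item.tokens <= limit) {
      selected.push(item);
      used += item.tokens;
    }
  }
  return { selected, budget: limit };
}

// Usage with a made-up three-entry index.
const demoIndex: IndexEntry[] = [
  { section: "3.1", keywords: ["type", "record"], route: "WHAT", tokens: 900 },
  { section: "5.4", keywords: ["record", "serialization"], route: "HOW", tokens: 2600 },
  { section: "7.2", keywords: ["lifetime"], route: "WHEN", tokens: 700 },
];
const demoRoute = routeQuery("How is a record serialized?");
const demoTerms = enrich(["record", "serialized"], { serialized: ["serialization"] });
const demoPlan = buildPlan(demoIndex, demoTerms, demoRoute);
const demoResult = applyBudget(demoPlan, 2000, 8000);
console.log(demoRoute, demoPlan.map((p) => p.section), demoResult.budget);
```

Because every step is a pure function with a fixed tie-break, the same query and index always yield the same reading plan, which is the property the abstract calls deterministic navigation. In this toy run the top-weighted section does not fit the initial 2,000-token budget, so the budget is expanded once before selection.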
Benchmark

- 246 evaluation questions across 4 categories: Factual (82), Cross-referential (74), Philosophical/Abstract (45), Normative (45)
- 36-question hard subset targeting non-obvious, cross-referential, and philosophical queries
- 10-question conceptual set covering architectural, semantic, and design aspects
- 20-question demo set (Factual + Cross-referential) runnable against the open-source demo

All benchmark questions, raw answer dumps, scoring tables, and the agentic benchmark runner are included in the companion repository.

Supplementary Materials

Three annexes accompany this preprint:

- Annex A (Trace Dumps): Three real MCP debug traces showing the full deterministic pipeline for WHAT, HOW, and HOW-Code queries
- Annex B (Test Methodology): Complete evaluation protocol, covering 4 evaluation modes, a 3-axis scoring rubric (accuracy × completeness × precision), a 4-artifact per-question format, 3 models tested, and 800+ evaluation artifacts
- Annex C (Incremental Ablation): Layer-by-layer system build-up showing accuracy progression from 70% to 99+%

Companion Repository

The open-source companion repository contains:

- MCP Spec-Reader server (TypeScript) with the full deterministic pipeline
- 14 pre-built JSON index files
- Index build scripts (compiler)
- Agentic benchmark runner
- All benchmark question files (246 + 36 + 10 + 20)
- Two representative specification sections for demo evaluation
- Agent configuration files (specification structure, patterns, ontological mappings)

Repository: chudinovuv-arhiveX-1-demo

Patent Status

Patent pending. Provisional patent applications covering the navigation architecture, index compilation method, and auto-regressive compiler described in this paper have been filed.

Citation

Chudinov, Y. (2026). Skill Without Training: Deterministic Knowledge Navigation for Large Language Models over Structured Documents. Preprint.
Yurii Chudinov
Yurii Chudinov (Tue,) studied this question.
www.synapsesocial.com/papers/69b25afb96eeacc4fcec93d2 — DOI: https://doi.org/10.5281/zenodo.18944351