What question did this study set out to answer?

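The study asks how to reduce the misclassification of LLM outputs caused by the rule–semantic labeling gap: rule-based evaluators reject outputs that satisfy a task's communicative intent but fail the surface form a rule requires.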

April 10, 2026 | Open Access

Shadow Semantic Review: Closing the Rule–Semantic Gap in LLM Evaluators via Named Failure Motif Injection

Key Points

The study aims to address the misclassification of LLM outputs due to the rule–semantic labeling gap.
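Rule-based evaluation of LLM outputs assigns labels that disagree with the semantic correctness of those outputs.
The authors describe a pattern, Shadow Semantic Review, in which a second LLM pass reviews rule-assigned labels, disagreements are classified into named failure motifs, and the motif names are injected as generation constraints in the next prompt cycle.
The pattern is instantiated in a deployed financial signal evaluation system built around a local 7-billion-parameter LLM.
A single motif class, thematic mismatch, accounts for 78.6% of rule–semantic disagreements, which indicates the gap is systematic rather than random.
These are early observations from 14 shadow-reviewed rows; a 90-day longitudinal ablation study is still active.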

Abstract

Rule-based evaluators applied to large language model (LLM) outputs systematically misclassify outputs that satisfy the communicative intent of a task while failing the surface form required by the rule. We identify this as the rule–semantic labeling gap and describe a lightweight pattern, Shadow Semantic Review, that addresses it through three steps: (1) a second LLM pass that reviews rule-assigned labels for semantic correctness, (2) classification of disagreements into a named failure motif taxonomy, and (3) forward injection of those motif names as generation constraints in the next prompt cycle. We instantiate this pattern in a deployed financial signal evaluation system where a local 7-billion-parameter LLM generates structured market analyses that are evaluated by a deterministic rule-based harness and audited by a shadow LLM reviewer. Early observations from 14 shadow-reviewed rows reveal that a single motif class (thematic mismatch) accounts for 78.6% of rule–semantic disagreements, indicating the gap is systematic rather than random. We release the pattern and motif taxonomy as prior art; a 90-day longitudinal ablation study comparing pre- and post-injection recurrence rates is active, with a full results paper to follow.
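The three steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `ReviewedRow` record, the function names, the `format_drift` motif, the `min_share` threshold, and the wording of the injected constraint are all hypothetical. Only "thematic mismatch" is named in the source, and the shadow reviewer's labels are taken as given rather than produced by an actual LLM call.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Hypothetical motif identifiers; the source names only "thematic mismatch".
THEMATIC_MISMATCH = "thematic_mismatch"
FORMAT_DRIFT = "format_drift"  # illustrative placeholder, not from the source


@dataclass
class ReviewedRow:
    row_id: int
    rule_label: str               # label assigned by the deterministic harness
    shadow_label: str             # label assigned by the shadow LLM reviewer
    motif: Optional[str] = None   # named failure motif, set on disagreement


def disagreements(rows):
    """Step 1: keep rows where the shadow reviewer disagrees with the rule label."""
    return [r for r in rows if r.rule_label != r.shadow_label]


def motif_shares(rows):
    """Step 2: share of each named motif among the disagreeing rows."""
    counts = Counter(r.motif for r in disagreements(rows) if r.motif)
    total = sum(counts.values())
    return {m: n / total for m, n in counts.items()} if total else {}


def inject_motifs(base_prompt, shares, min_share=0.2):
    """Step 3: append frequent motif names as constraints for the next cycle."""
    frequent = sorted(m for m, s in shares.items() if s >= min_share)
    if not frequent:
        return base_prompt
    lines = [f"- Avoid the known failure motif: {m}" for m in frequent]
    return base_prompt + "\n\nConstraints:\n" + "\n".join(lines)


if __name__ == "__main__":
    # Made-up rows for illustration only; these are not the study's data.
    demo = [
        ReviewedRow(1, "fail", "pass", THEMATIC_MISMATCH),
        ReviewedRow(2, "fail", "pass", THEMATIC_MISMATCH),
        ReviewedRow(3, "pass", "fail", FORMAT_DRIFT),
        ReviewedRow(4, "pass", "pass"),
    ]
    print(inject_motifs("Analyze the signal.", motif_shares(demo), min_share=0.5))
```

Keeping the motif share as an explicit number makes the planned pre- and post-injection comparison straightforward: the same `motif_shares` call can be run on rows collected before and after the constraint is added to the prompt.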

Authors

Sami Allan Kaurila
