What question did this study set out to answer?

The aim is to develop a method for detecting code hallucinations in domain-specific code generation large language models using type stability signals.

April 10, 2026 | Open Access

Type-Stability-Aware Hallucination Detection for Domain-Specific Code Generation LLMs

Key Points

Proposed a four-axis evaluation framework including stability and coherence metrics.
Developed a Quality Gate system providing COMMIT/REPAIR/HALT verdicts.
Implemented a phased generation protocol focusing on structured outputs.
Introduced a definition-aware attention mechanism for improved context management.
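Observed a tendency toward positive correlation between type-stability scores and hallucination rates in preliminary experiments.
Preliminary findings suggest that injecting knowledge of the target language's type system into the model architecture may improve reliability without increasing model scale.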
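
The Quality Gate and self-repair loop named in the key points can be pictured with a minimal sketch. This is not the paper's implementation: the thresholds, the "worst axis decides" rule, the assumption that every axis is scored in [0, 1] with higher meaning better, and the `generate`, `repair`, and `score` callables are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple


class Verdict(Enum):
    COMMIT = "COMMIT"
    REPAIR = "REPAIR"
    HALT = "HALT"


@dataclass(frozen=True)
class AxisScores:
    # Assumption: each axis is in [0, 1] and higher is better.
    stability: float
    boundary_compliance: float
    hallucination: float  # assumption: 1.0 means no hallucination signal
    coherence: float


# Hypothetical thresholds, not taken from the paper.
COMMIT_THRESHOLD = 0.8
HALT_THRESHOLD = 0.3


def quality_gate(scores: AxisScores) -> Verdict:
    """Map four axis scores to a ternary verdict using the weakest axis."""
    worst = min(
        scores.stability,
        scores.boundary_compliance,
        scores.hallucination,
        scores.coherence,
    )
    if worst >= COMMIT_THRESHOLD:
        return Verdict.COMMIT
    if worst < HALT_THRESHOLD:
        return Verdict.HALT
    return Verdict.REPAIR


def generate_with_repair(
    generate: Callable[[], str],
    repair: Callable[[str], str],
    score: Callable[[str], AxisScores],
    max_repairs: int = 2,
) -> Tuple[str, Verdict]:
    """Generate once, then repair up to max_repairs times while the gate says REPAIR."""
    code = generate()
    for attempt in range(max_repairs + 1):
        verdict = quality_gate(score(code))
        if verdict is not Verdict.REPAIR:
            return code, verdict
        if attempt < max_repairs:
            code = repair(code)
    # Repair budget exhausted without reaching COMMIT: give up.
    return code, Verdict.HALT
```

Bounding the number of repairs keeps the loop from cycling indefinitely on output that never reaches the commit threshold. Treating an exhausted budget as HALT is one possible design choice among several.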

Abstract

Code generation by large language models (LLMs) has advanced rapidly toward practical deployment, yet detecting and suppressing code hallucinations (generated code that appears syntactically valid but is semantically incorrect) remains an open challenge. This paper proposes a method that uses type-stability signals intrinsic to Julia's type system as an endogenous indicator of hallucination. The approach comprises: (1) a four-axis evaluation framework (stability, boundary compliance, hallucination, coherence) driven by an internal type-stability prediction head; (2) a Quality Gate that renders ternary COMMIT/REPAIR/HALT verdicts with a self-repair loop; (3) a phased generation protocol that constrains generation to the order struct definitions → function signatures → implementations; and (4) a definition-aware attention mechanism that maintains distance-independent attention to definition sites. With these components integrated into a 66M-parameter domain-specific Transformer, a tendency toward positive correlation between type-stability scores and hallucination rates is observed. These preliminary findings raise the possibility that directly injecting domain knowledge of the target language's type system into the model architecture may contribute to Compute-Efficient Reliability without relying on massive model scale.
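
The ordering constraint of the phased generation protocol (struct definitions, then function signatures, then implementations) can be illustrated with a small checker. This is a sketch under stated assumptions, not the paper's mechanism: the `Phase` enum and `first_phase_violation` function are hypothetical names, the generated units are assumed to be already labeled with their phase, and skipping a phase is assumed to be allowed as long as generation never moves backwards.

```python
from enum import IntEnum
from typing import Iterable, Optional


class Phase(IntEnum):
    # Order follows the abstract: structs, then signatures, then bodies.
    STRUCT_DEFINITIONS = 1
    FUNCTION_SIGNATURES = 2
    IMPLEMENTATIONS = 3


def first_phase_violation(phases: Iterable[Phase]) -> Optional[int]:
    """Return the index of the first unit emitted out of phase order, or None."""
    current = Phase.STRUCT_DEFINITIONS
    for index, phase in enumerate(phases):
        if phase < current:
            # Generation moved back to an earlier phase.
            return index
        current = phase
    return None


# Hypothetical labeled output: a struct definition appears after a signature.
labels = [
    Phase.STRUCT_DEFINITIONS,
    Phase.FUNCTION_SIGNATURES,
    Phase.STRUCT_DEFINITIONS,
]
violation_index = first_phase_violation(labels)
```

In this example `violation_index` is 2, the position of the late struct definition. A constrained decoder would prevent such a unit from being emitted in the first place, whereas this checker only detects it after the fact.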


Authors

Mitsuro Matsuta
