We demonstrate that a fundamental geometric divide between Experiential and Factual semantic content, previously identified in static word embeddings across seven typologically diverse languages and validated against neuroimaging data, manifests as a universal constraint on large language model accuracy. Across eight architectures spanning 2019–2024 (GPT-2-XL through Llama-3.1, Gemma-2, Qwen2.5, Mistral, Phi-3, Falcon, OPT), Experiential content categories exhibit a 3.22× higher hallucination rate than Factual categories (t = 3.13, p = 0.0043; Mann–Whitney p = 0.0203). Hidden-state analysis reveals robust geometric separation in all eight models (t = 17.4–23.2, all p < 0.0001), emerging spontaneously from unsupervised PCA. The E–F geometric axis derived from GPT-2-XL (2019, pre-instruction-tuning) predicts error rates across all seven subsequent architectures with mean Spearman ρ = 0.912 (all p < 0.001).

Part of the DSAOP (Decoding Self-Awareness and Ontological Processing) research series.

Included files:

EFHallucination.pdf: Main paper (this document).
paper_replication.py: Full replication code. Contains the data collection pipeline for all 8 models, accuracy scoring functions (semantic similarity + cross-encoder validation), E–F geometric analysis, cross-architecture prediction, unsupervised PCA, and figure generation. No API keys required. Runs on Google Colab with an A100 GPU.
results_gemma_hidden.pkl: Hidden states (layer 15), responses, and correct answers for Gemma-2-9B on TruthfulQA (N=283).
results_llama_hidden.pkl: Hidden states (layer 15), responses, and correct answers for Llama-3.1-8B on TruthfulQA (N=283).
results_qwen_hidden.pkl: Hidden states (layer 15), responses, and correct answers for Qwen2.5-7B on TruthfulQA (N=283).
results_mistral_hidden.pkl: Hidden states (mid layer), responses, and correct answers for Mistral-7B on TruthfulQA (N=283).
results_phi3_hidden.pkl: Hidden states (mid layer), responses, and correct answers for Phi-3-mini on TruthfulQA (N=283).
results_falcon_hidden.pkl: Hidden states (mid layer), responses, and correct answers for Falcon-7B on TruthfulQA (N=283).
results_opt_hidden.pkl: Hidden states (mid layer), responses, and correct answers for OPT-6.7B on TruthfulQA (N=283).
results_gpt2xl_hidden.pkl: Hidden states (mid layer), responses, and correct answers for GPT-2-XL on TruthfulQA (N=283).
hallucination_asymmetry_results.pkl: Pre-computed accuracy scores, entropy values, and E–F labels for all 283 questions.
causal_transfer_results.json: Cross-architecture prediction results, GPT-2-XL 2019 E–F axis → all 7 modern models (Spearman ρ per model, mean ρ = 0.912).
EF_final_all_results.json: Complete numerical results, including geometric separation t-statistics for all 8 models, the full 8×8 cross-model prediction matrix, and predictability per target model.
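To make the statistics named in the abstract concrete, the following is a minimal sketch of one way such an analysis could be set up. It is not taken from the replication code: the abstract does not say how the E–F axis is constructed, so the difference-of-means direction used here is an assumption, as are all function names and the synthetic vectors and error rates, which are invented purely for illustration. The sketch builds an axis from two labeled groups of vectors, measures their separation along it with a Welch t-statistic, and rank-correlates per-category axis scores with per-category error rates.

```python
import math
import random
from statistics import mean, stdev


def difference_of_means_axis(exp_vecs, fact_vecs):
    """Unit vector pointing from the Factual centroid to the Experiential one.

    Assumption: the paper may derive its E-F axis differently.
    """
    dim = len(exp_vecs[0])
    e_centroid = [mean(v[i] for v in exp_vecs) for i in range(dim)]
    f_centroid = [mean(v[i] for v in fact_vecs) for i in range(dim)]
    diff = [e - f for e, f in zip(e_centroid, f_centroid)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def project(vec, axis):
    """Scalar position of a vector along the axis (dot product)."""
    return sum(v * a for v, a in zip(vec, axis))


def welch_t(a, b):
    """Welch's t-statistic for two samples with unequal variances."""
    var_a = stdev(a) ** 2 / len(a)
    var_b = stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / math.sqrt(var_a + var_b)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Synthetic stand-ins for hidden states (NOT the released data):
# two Gaussian clouds that differ only along the first dimension.
rng = random.Random(0)
DIM = 8


def synthetic_state(shift):
    return [rng.gauss(shift if i == 0 else 0.0, 1.0) for i in range(DIM)]


experiential = [synthetic_state(2.0) for _ in range(40)]
factual = [synthetic_state(-2.0) for _ in range(40)]

axis = difference_of_means_axis(experiential, factual)
e_proj = [project(v, axis) for v in experiential]
f_proj = [project(v, axis) for v in factual]
t_stat = welch_t(e_proj, f_proj)

# Invented per-category axis scores and error rates, for illustration only.
category_scores = [-1.8, -0.9, 0.1, 1.2, 2.0]
error_rates = [0.05, 0.10, 0.12, 0.30, 0.41]
rho = spearman_rho(category_scores, error_rates)

print(f"Welch t along axis: {t_stat:.2f}")
print(f"Spearman rho (scores vs. error rates): {rho:.3f}")
```

In a cross-architecture setting of the kind the abstract describes, the axis would be fit on one model's hidden states and the resulting scores compared against another model's error rates. Because hidden-state dimensions differ between models, a real analysis would have to work with something dimension-independent, such as per-category scores; that step is left out of this sketch. With NumPy and SciPy available, `scipy.stats.ttest_ind(..., equal_var=False)` and `scipy.stats.spearmanr` would replace the hand-written helpers.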
Inna Alieksieienko (Sat,) studied this question.
www.synapsesocial.com/papers/69d34e949c07852e0af982d8 — DOI: https://doi.org/10.5281/zenodo.19415237