What question did this study set out to answer?

The study asks whether small, structurally constrained neural networks can outperform larger unconstrained models in both task performance and interpretability.

May 2, 2026 · Open Access

Project GlassBox: Structure Over Scale in Neural Reasoning — An 81-Phase Campaign on Architectural Transparency, Antifragile Adaptation, and the AGI Horizon

Key Points

The author ran a systematic 81-phase experimental campaign on neural architectures, using ARC-AGI as the benchmark.
A 77K-parameter Graph Neural Network with Pointer attention was compared against a 1.45M-parameter Transformer baseline.
Test-time gradient adaptation with geometric data augmentation raised accuracy to 87.4%.
Multi-step latent graph dynamics reached the campaign's peak accuracy of 90.8%.
The 77K-parameter model outperformed the 1.45M-parameter baseline (56.8% vs 43.9% full-match accuracy) with roughly 19× fewer parameters.
Attribution coverage reached 82.8%, about 3.3× the 25% coverage reported for large language models.
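The headline ratios in the key points can be re-derived from the raw figures quoted in the abstract. The following is a minimal arithmetic check, not code from the project; the variable names are illustrative, and the parameter counts are taken as the rounded 77K and 1.45M values.

```python
# Figures quoted in the abstract; the variable names are illustrative only.
glassbox_params = 77_000          # "77K-parameter" GNN (rounded)
transformer_params = 1_450_000    # "1.45M-parameter" Transformer (rounded)
glassbox_accuracy = 56.8          # full-match accuracy, percent
transformer_accuracy = 43.9       # full-match accuracy, percent
glassbox_attribution = 82.8       # attribution coverage, percent
llm_attribution = 25.0            # coverage reported for large language models

# How many times larger the baseline is than the GlassBox Agent.
param_ratio = transformer_params / glassbox_params

# How many times higher the attribution coverage is.
attribution_ratio = glassbox_attribution / llm_attribution

# Accuracy gap before any test-time adaptation, in percentage points.
accuracy_gap = glassbox_accuracy - transformer_accuracy

print(f"parameter ratio:   {param_ratio:.1f}x")
print(f"attribution ratio: {attribution_ratio:.1f}x")
print(f"accuracy gap:      {accuracy_gap:.1f} points")
```

With the rounded inputs the parameter ratio comes out at about 18.8, which the source reports as 19×, and the attribution ratio at about 3.3.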

Abstract

Project GlassBox is a systematic 81-phase experimental campaign demonstrating that small, structurally constrained neural architectures can simultaneously achieve higher task performance and greater interpretability than large unconstrained models. Using ARC-AGI as a benchmark for abstract visual reasoning, a 77K-parameter Graph Neural Network with Pointer attention (the "GlassBox Agent") outperforms a 1.45M-parameter Transformer baseline (56.8% vs 43.9% full-match accuracy). With test-time gradient adaptation and geometric data augmentation, accuracy reaches 87.4%. In v3, the Ultimate Configuration (L2 ablation at 20%, adaptation LR of 0.1, and Model Soup inference with K=5) achieves 88.9% accuracy with a 2.0% standard deviation across 3 seeds. Latent graph dynamics with multi-step reasoning in hidden space achieve 90.8%, the campaign's peak accuracy.

What's new in v3:
Mechanistic Anatomy (Phase 67): Linear probes prove that GNN L1 encodes low-level features (color: 90%) while L2 specializes in high-level rules (operation: 78%), which explains why L2 ablation triggers optimal super-recovery.
Zero-Shot Rule Synthesis (Phase 68): TTT recovers 50% accuracy on completely novel operations unseen during training, proving on-the-fly rule creation rather than mere memorization.
Ultimate Configuration (Phase 75): L2 ablation at 20% + LR 0.1 + Model Soup K=5 yields an 88.9% mean, the campaign's most reliable multi-seed configuration.
Latent Graph Dynamics (Phase 79): Multi-step reasoning in latent space achieves 90.8%, matching the campaign's peak without a DSL bottleneck.
Prior Knowledge Dominance (Phase 72): Handcrafted BFS outperforms learned Slot Attention by 27× (62.1% vs 2.3%), proving that human prior knowledge is a decisive advantage in low-data regimes.
Continual Self-Play (Phase 78): Experience replay eliminates catastrophic forgetting, enabling stable self-improvement (+1.1% per iteration).
Five new summary figures: 81-phase journey timeline, innovation waterfall, breakthrough map, structure vs scale evidence, and layer anatomy visualization.

Key Results:
Structure > Scale: 77K structured parameters outperform 1.45M unstructured parameters (19× smaller, higher accuracy).
Hydra Self-Repair: First quantitative characterization of neural self-repair. After 50% of the model's neurons are destroyed, few-shot adaptation recovers 95.8% of original performance.
82.8% Attribution: Full causal path tracing for 82.8% of predictions, 3.3× the 25% attribution coverage reported for large language models.
Ablation as Variance Regularizer: Gradient-based ablation at 12–15% reduces seed-dependent variance by 4–5×, turning ablation from a performance booster into a reliability mechanism.
Ultimate Configuration: 88.9% with L2 ablation + high LR + Model Soup (multi-seed validated).
Latent Reasoning Peak: 90.8% via multi-step latent graph dynamics.

Source code: https://github.com/hafufu-stack/glassbox

Acknowledgments

This research was conducted entirely independently, without institutional affiliation or corporate funding. The author currently faces financial constraints that make it increasingly difficult to maintain subscriptions to AI services essential for this line of research. To sustain and improve the quality of future work, the author is actively seeking community sponsorship. Details are available at https://github.com/sponsors/hafufu-stack.
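Two ingredients named in the abstract, geometric data augmentation for test-time adaptation and Model Soup inference, can be illustrated with a short sketch. This is not the project's code. It assumes that "geometric augmentation" means the eight rotations and reflections of a grid and that the soup is a uniform average of K adapted checkpoints; the abstract specifies neither. All function names are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]  # an ARC-style grid of color indices


def rotate90(grid: Grid) -> Grid:
    """Rotate a grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror a grid left to right."""
    return [row[::-1] for row in grid]


def dihedral_views(grid: Grid) -> List[Grid]:
    """Return the 8 rotation/reflection views of a grid, identity first."""
    views: List[Grid] = []
    current = [list(row) for row in grid]
    for _ in range(4):
        views.append(current)
        views.append(flip_horizontal(current))
        current = rotate90(current)
    return views


def augment_pairs(pairs: List[Tuple[Grid, Grid]]) -> List[Tuple[Grid, Grid]]:
    """Apply each transform to the input and output of a demo pair together."""
    augmented: List[Tuple[Grid, Grid]] = []
    for inp, out in pairs:
        # Views are generated in the same order, so index i of the input
        # and index i of the output share the same transform.
        augmented.extend(zip(dihedral_views(inp), dihedral_views(out)))
    return augmented


def uniform_soup(checkpoints: List[Dict[str, List[float]]]) -> Dict[str, List[float]]:
    """Average K checkpoints parameter by parameter (assumed uniform soup)."""
    k = len(checkpoints)
    return {
        name: [sum(vals) / k for vals in zip(*(c[name] for c in checkpoints))]
        for name in checkpoints[0]
    }


# Hypothetical usage: one demonstration pair becomes eight training pairs,
# and two toy checkpoints are averaged into a single set of weights.
demo = [([[1, 2], [3, 4]], [[4, 3], [2, 1]])]
train_set = augment_pairs(demo)
soup = uniform_soup([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}])
```

In an actual test-time adaptation loop, the augmented pairs would serve as the few-shot training set for a handful of gradient steps on each task, and the averaged weights would be used for the final prediction. How the project selects its K=5 checkpoints is not stated in the abstract.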

Authors

Hiroto Funasaki
