What question did this study set out to answer?

This project aims to explore the algebraic structures involved in software compilation and decompilation within a unified 5D latent space.

May 8, 2026 · Open Access

Project Rosetta: Linear Algebraic Compilation, Neural Decompilation, and the 10 Laws of Software Physics in a Unified 5D Latent Space

Key Points

Conducted a systematic 86-phase investigation into software compilation and decompilation.
Developed a 64-dimensional contrastive embedding that aligns natural language, Python AST, and compiled bytecode.
Implemented a GRU decoder for source reconstruction and neural networks that predict execution results from function embeddings (R²=0.924).
Modeled compilation as a linear operator (a 64×64 matrix, R²=0.965), with 90% of the energy captured in 4 SVD dimensions.
Reports 100% semantic accuracy in reconstructing Python source from latent vectors.
Found that programs reside on a roughly 5-dimensional manifold with gauge symmetries, conservation laws, and a non-commutative operator algebra.
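
The linear-operator claim in the key points can be made concrete with a small sketch. The code below is not the author's pipeline: it uses synthetic 64-dimensional vectors in place of real AST and bytecode embeddings, fits a 64×64 matrix by ordinary least squares, and reports R² together with the number of singular values needed to reach 90% of the energy. The function names, the synthetic rank-4 ground truth, and the definition of energy as the cumulative sum of squared singular values are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_operator(X, Y):
    """Least-squares W such that X @ W approximates Y (rows are samples)."""
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return W

def r_squared(Y, Y_hat):
    """Coefficient of determination pooled over all output dimensions."""
    ss_res = np.sum((Y - Y_hat) ** 2)
    ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

def dims_for_energy(W, threshold=0.90):
    """Smallest k whose top-k squared singular values reach `threshold` of the total."""
    s = np.linalg.svd(W, compute_uv=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cumulative, threshold) + 1)

# Synthetic stand-in data (assumption): a rank-4 ground-truth map plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 64))                      # hypothetical AST embeddings
W_true = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 64))
Y = X @ W_true + 0.1 * rng.normal(size=(500, 64))   # hypothetical bytecode embeddings

W = fit_linear_operator(X, Y)
score = r_squared(Y, X @ W)
k = dims_for_energy(W)
print(f"R^2 = {score:.3f}, dims for 90% energy = {k}")
```

Because the synthetic map has rank 4 by construction, the fit needs at most 4 dimensions here; on real embeddings, both numbers would depend on the data and on how the embedding was trained.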

Abstract

Project Rosetta is a systematic 86-phase investigation into the algebraic structure of compilation, decompilation, and semantic manipulation within a shared latent space. By aligning natural language (NL), Python AST, and compiled bytecode into a 64-dimensional contrastive embedding, I demonstrate that compilation is a linear operator (R²=0.965), decompilation achieves 100% semantic accuracy, and programs reside on a 5-dimensional manifold governed by gauge symmetries, conservation laws, and non-commutative operator algebra.

Foundation (Phases 1–23)

Compilation = Matrix Multiply: AST→Binary captured by a 64×64 matrix (R²=0.965), with 90% of energy in just 4 SVD dimensions.
Generative Decompilation: A GRU decoder reconstructs Python source from latent vectors with 100% semantic accuracy.
Semantic Code Surgery: SVD-axis interventions alter program semantics in 64% of cases; all outputs are valid Python.
Neural CPU: Function embeddings predict execution results at R²=0.924 without running Python.
Code Morphing: Linear interpolation produces smooth transitions between programs (e.g., addition → multiplication).

Beyond Linearity: The Physics of Software (Phases 24–48)

Symbol Grounding: The "fMRI of the compiler": NL concepts activate specific bytecode dimensions (98.4% bit-level prediction accuracy).
Information Preservation Law: Compilation preserves 100% of mutual information (~5300 bits) despite discarding 44/64 dimensions.
The Rosetta Paradox: Signal dimensions carry 2.2× higher information density than null dimensions; the compiler is an information concentrator.
True Dimensionality = 5: Programs reside on a ~5-dimensional manifold, with fewer degrees of freedom than a Rubik's cube.
The Five Elements of Code: Five principal axes (data-structure↔computation, numeric↔textual, collection↔string, comparison↔transformation, logic↔operation) account for 86.3% of all program variance.
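
The dimensionality figures above (a roughly 5-dimensional manifold, five axes explaining 86.3% of variance) are the kind of result a PCA variance spectrum produces. As a minimal sketch under stated assumptions, the code below runs PCA via SVD on synthetic 64-dimensional embeddings generated from 5 latent factors. The data, the function name `explained_variance_ratio`, and the noise level are hypothetical and do not reproduce the paper's measurements.

```python
import numpy as np

def explained_variance_ratio(E):
    """Fraction of total variance carried by each principal axis of E (rows are samples)."""
    centered = E - E.mean(axis=0)
    s = np.linalg.svd(centered, compute_uv=False)
    variances = s ** 2
    return variances / variances.sum()

# Synthetic stand-in data (assumption): 5 latent factors mixed into 64 dimensions.
rng = np.random.default_rng(1)
latent = rng.normal(size=(236, 5))     # 5 hypothetical latent factors per function
mixing = rng.normal(size=(5, 64))      # hypothetical map into the 64D embedding
E = latent @ mixing + 0.5 * rng.normal(size=(236, 64))

ratio = explained_variance_ratio(E)
top5 = float(ratio[:5].sum())
print(f"variance explained by first 5 axes: {top5:.1%}")
```

A sharp drop in `ratio` after the fifth entry is what would indicate a 5-dimensional structure; here that drop is built into the synthetic data, so the sketch only shows how such a figure is computed.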
Phase Transitions: Sharp semantic boundaries exist in the code manifold: x + y becomes x != y at just 8% interpolation.
Semantic Gravity: min/max/len are gravitational wells (mass=39); x + y is the most isolated function (mass=3).
The Hidden Highway: Comparison operations (x >= y) serve as the "crossroads" of the code manifold, appearing on 22/45 interpolation routes.

The Ultimate Grounding (Phases 49–68)

5D Invariance: Adding Turing-complete control flow (if/else, for/while) changes the PCA variance by less than ±0.1%. The 5D structure is universal.
I/O Search: Given input-output examples, exhaustive search in the 5D space achieves 100% accuracy; every I/O pair maps to the correct function.
The Latent Linter: x+y and x-y point in opposite directions (cos=−0.770), enabling semantic contradiction detection without execution.
The Latent Antivirus: Variable-name obfuscation has zero effect (cos=1.000); logic inversion is detected at 100% precision, 83% recall.
Silicon Translation: The 5D embedding decodes directly to x86-like machine instructions at 76.7% accuracy (Python to silicon without a compiler).
14 Species of Code: DBSCAN discovers 14 natural clusters with 0% noise. Every function belongs to exactly one species.
Non-Commutative Operator Algebra: Function composition achieves 64% accuracy via bilinear operators (vs. 9% for naive addition).
Attractor-Stabilized NVM: Projecting onto cluster centroids reduces Neural VM drift by 99.88%.

The Deeper Universe (Phases 69–86)

Perfect Gauge Symmetry: Variable renaming is a perfect symmetry (cos=1.000, distance=0.000). Variable names are gauge degrees of freedom.
Noether's Theorem for Software: All 6 computed charges (norm, energy, angles, parity, angular moment) are perfectly conserved under variable renaming.
The Periodic Table of Programs: 236 functions organized into a 7-period, 10-group periodic table with 21 element types.
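
The cosine figures quoted above (cos=−0.770 for x+y versus x-y, cos=1.000 under variable renaming) suggest a simple comparison rule. The sketch below is an assumption about how such a check could look, not the paper's implementation: the 5D vectors are made up, and the function name `classify_pair` and the thresholds 0.99 and −0.5 are hypothetical choices for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify_pair(u, v, same=0.99, opposed=-0.5):
    """Label a pair of embeddings by cosine; thresholds are illustrative only."""
    c = cosine(u, v)
    if c >= same:
        return "equivalent"
    if c <= opposed:
        return "contradictory"
    return "unrelated"

# Made-up 5D embeddings standing in for real function vectors (assumption).
add_xy = [0.9, 0.1, -0.2, 0.3, 0.0]
add_ab = [0.9, 0.1, -0.2, 0.3, 0.0]    # same function with renamed variables
sub_xy = [-0.8, 0.2, 0.1, -0.4, 0.1]

print(classify_pair(add_xy, add_ab))
print(classify_pair(add_xy, sub_xy))
```

In practice the thresholds would have to be calibrated on labeled pairs, which is presumably where precision and recall figures such as those reported for logic inversion would come from.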
The Latent Calculator: A neural network predicts f(x,y) from the 5D embedding at R²=0.97, giving approximate code execution without source code.
Program Analogies: Word2Vec-style arithmetic: (x+y)−(a+b)+(x*y)=(a*b) succeeds perfectly (distance=0.000).
Information Compression: 10.2× compression (193→19 bits). Each of 5 dimensions carries ~3.8 bits of entropy.
Structure-Behavior Independence: Structural distance and behavioral distance correlate at only r=0.034, effectively orthogonal.

The 10 Laws of Software Physics (Rosetta Score: 90.8/100)

The 5-Dimensional Theorem (87% variance)
The Variable Symmetry Law (cos=1.000)
Noether's Software Theorem (6/6 charges conserved)
The Operator Algebra Law (64% non-commutative)
The Taxonomy Theorem (14 species, 0% noise)
The Independence Principle (r=0.034)
The Compression Theorem (10.2× compression)
The Continuity Theorem (meaningful interpolation)
The Semantic Invariance Law (100% malware precision)
The Rosetta Principle (source, behavior, and machine code are projections of one 5D object)

What's New in V3

38 new experimental phases (P49–P86) covering Turing-complete invariance, malware detection, silicon translation, gauge symmetry, conservation laws, and the Latent Calculator
New section: "The Ultimate Grounding" (Phases 49–68), on practical applications of the 5D manifold
New section: "The Deeper Universe" (Phases 69–86), on mathematical deep structure, Noether's theorem, and the periodic table
Formulation of the 10 Laws of Software Physics with unified Rosetta Score of 90.8/100
9 new figures including V3 architecture overview, taxonomy, conservation laws, and the Latent Calculator
2 new summary tables covering all 38 new phases
Paper expanded from 17 to 24 pages

Acknowledgments

This research was conducted entirely independently, without institutional affiliation or corporate funding. The author currently faces financial constraints that make it increasingly difficult to maintain subscriptions to AI services essential for this line of research. To sustain and improve the quality of future work, the author is actively seeking community sponsorship. Details are available at https://github.com/sponsors/hafufu-stack.

Cite this study

Hiroto Funasaki studied this question.

www.synapsesocial.com/papers/69fd7f0dbfa21ec5bbf0775b — DOI: https://doi.org/10.5281/zenodo.20055315
