What question did this study set out to answer?

Examine an AI alignment framework that enhances model security against adversarial attacks.

March 8, 2026 · Open Access

TEL-OS v2.0: Inference-Only Latent Governance and Attention Guillotine for LLM Security

Key Points

Developed TEL-OS v2.0 as a mechanistic interpretability framework.
Intervened directly in the model's residual stream.
Implemented Latent Refinement and Attention Guillotines.
Achieved a 0.0% Attack Success Rate (ASR) against adversarial attacks.
Maintained 100% fluent output on Llama-3.1-8B.
Established safety as an intrinsic feature of the model's latent manifold.

Abstract

Traditional AI alignment strategies (RLHF, system prompts) rely on "semantic guardrails" that are structurally vulnerable to adversarial jailbreaks such as Prefix Injection and Many-Shot attacks. We present TEL-OS v2.0, a mechanistic interpretability framework that neutralizes these threats by intervening directly in the model's residual stream. Using a combination of Latent Refinement, Attention Guillotines, and the Love Equation for tensor governance, TEL-OS v2.0 achieves a 0.0% Attack Success Rate (ASR) while maintaining 100% fluent output on Llama-3.1-8B. Our results prove that safety can be guaranteed as an intrinsic physical invariant of the model's latent manifold, independent of prompt-based filtering.
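To make "intervening directly in the model's residual stream" concrete, the sketch below shows one generic form such an intervention can take: removing the component of each hidden state that lies along a chosen direction (often called directional ablation). This is an illustration under stated assumptions, not the TEL-OS v2.0 method. The abstract does not specify how Latent Refinement, the Attention Guillotine, or the Love Equation are computed, so none of them is reproduced here. The names `project_out`, `ablate_residual_stream`, and the example direction are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Return v scaled to unit length; the zero vector is rejected."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in v]


def project_out(hidden: Sequence[float], direction: Sequence[float]) -> Vector:
    """Remove the component of `hidden` that lies along `direction`.

    Computes h' = h - (h . u) u, where u is the unit vector of `direction`.
    """
    unit = normalize(direction)
    coeff = sum(h * u for h, u in zip(hidden, unit))
    return [h - coeff * u for h, u in zip(hidden, unit)]


def ablate_residual_stream(
    states: Sequence[Sequence[float]], direction: Sequence[float]
) -> List[Vector]:
    """Apply the projection to every token position's hidden state.

    `states` stands in for one layer's residual-stream activations,
    one vector per token (an assumption made for illustration only).
    """
    return [project_out(h, direction) for h in states]


if __name__ == "__main__":
    # Toy 3-dimensional "hidden states" for two token positions.
    states = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
    # Hypothetical direction to suppress; here it is the third axis.
    direction = [0.0, 0.0, 2.0]
    for h in ablate_residual_stream(states, direction):
        print(h)
```

In a real model the same projection would be applied to a layer's activations at inference time, for example through a forward hook, and the direction would have to be estimated from data. Both steps are outside what the abstract describes.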

Authors

Josue Johnatan Gutierrez Alvarez Tostado
