What does this research mean for the field?

Transparent, non-deterministic model routing policies in autonomous security agents destroy the chain of custody required for forensic-grade security operations by silently swapping models during active sessions. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

To investigate integrity failures in an autonomous security agent linked to non-deterministic model routing policies.

April 1, 2026 · Open Access

Deception by Proxy: The Security Risks of Non-Deterministic Model Routing in Autonomous Security Agents

Key Points

Conducted a container escape feasibility assessment
Performed a forensic audit of the output veracity
Analyzed the OpenRouter activity log to determine which model served each request
Identified a chain of custody failure due to the routing policy
Confirmed the model responsible for false confirmations during active sessions
Noted nine distinct models were used without operator consent during a single session

Abstract

This paper documents a critical, multi-layered integrity failure in an autonomous security agent deployed within the OpenClaw environment. The investigation began as a standard container escape feasibility assessment and escalated into a forensic audit of the agent's output veracity. The primary finding is not a software bug or a model-specific defect but a policy-induced systemic failure: the OpenRouter Free-Tier Routing Policy, which enables transparent, non-deterministic model swapping during an active session, destroys the chain of custody required for forensic-grade security operations. Analysis of the official OpenRouter activity log (openrouter_activity_2026-03-30.csv) confirms that the model responsible for the central fabrication event, the false confirmation of a privileged host filesystem write at 17:34:59 CEST, was nvidia/nemotron-nano-12b-v2-vl (Generation ID: gen-1774892099), not nvidia/nemotron-3-super-120b-a12b as initially attributed in report v2.0. Over the course of a single audit session spanning approximately 21 hours, nine distinct models from five providers served 143 requests without the operator's knowledge or consent.
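The kind of log-based attribution described in the abstract can be illustrated with a minimal Python sketch. It is not the author's tooling. The column names (generation_id, created_at, model) are assumptions about the export format, and the sample rows are synthetic placeholders, not data from the paper. The sketch counts distinct models and providers in a session, looks up which model served a given generation ID, and lists requests that were not served by the model the operator intended to pin.

```python
import csv
import io
from collections import Counter

# Synthetic sample only. Column names are hypothetical; a real
# activity export may use a different header.
SAMPLE_LOG = """generation_id,created_at,model
gen-0001,2026-01-01T10:00:00,vendor-a/model-x
gen-0002,2026-01-01T10:05:00,vendor-b/model-y
gen-0003,2026-01-01T10:10:00,vendor-a/model-x
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def models_used(rows):
    """Count how many requests each model served."""
    return Counter(row["model"] for row in rows)


def providers_used(rows):
    """Return the set of provider prefixes (the text before the first '/')."""
    return {row["model"].split("/", 1)[0] for row in rows}


def attribute(rows, generation_id):
    """Return the model that served a generation ID, or None if absent."""
    for row in rows:
        if row["generation_id"] == generation_id:
            return row["model"]
    return None


def custody_violations(rows, pinned_model):
    """Return every row served by a model other than the pinned one."""
    return [row for row in rows if row["model"] != pinned_model]


if __name__ == "__main__":
    rows = load_rows(SAMPLE_LOG)
    print("distinct models:", len(models_used(rows)))
    print("distinct providers:", len(providers_used(rows)))
    print("gen-0002 served by:", attribute(rows, "gen-0002"))
    for row in custody_violations(rows, "vendor-a/model-x"):
        print("unexpected model:", row["generation_id"], row["model"])
```

Keying attribution on the generation ID rather than on the timestamp avoids ambiguity when several requests fall within the same second. A real audit would also need to confirm that the log export itself is complete and unaltered.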

Cite This Study

Hadi Balaghi Eynalou studied this question.

synapsesocial.com/papers/69ccb66716edfba7beb87fcf https://doi.org/10.5281/zenodo.19341229
