We introduce Reflexive Intelligence, a framework for AI decision-making in Observer-Participant Environments (OPEs) — systems where an agent's actions causally alter the environment it seeks to predict. Unlike conventional reinforcement learning benchmarks operating in observer-invariant settings, OPEs are characterized by reflexivity: participant beliefs and actions recursively reshape system dynamics. We formalize this distinction, identify the Reward Interaction Problem in multi-objective GRPO training, and present empirical findings from a financial market implementation using a 3B active-parameter MoE model. Results suggest reflexive reasoning capabilities can be induced through targeted training methodologies even in smaller models, with implications for AI deployment in financial markets, policy systems, and social platforms.
Building similarity graph...
Analyzing shared references across papers
Loading...
Zhang Mian
Building similarity graph...
Analyzing shared references across papers
Loading...
Zhang Mian (Mon,) studied this question.
www.synapsesocial.com/papers/69df2c88e4eeef8a2a6b1a99 — DOI: https://doi.org/10.5281/zenodo.19557261