What type of study is this?

This is an experimental study.

October 17, 2025 | Open Access

Aligning Agent Policies with Preferences: Human-Centered Interpretable Reinforcement Learning

Key points

Aligning interpretable policies with human preferences during training, rather than only after it, produces policies that better match what users want.
The framework interleaves preference learning with an evolutionary algorithm: updated preference estimates guide policy generation, and newly generated policies are used to query users.
Representing policies as feature vectors over a finite set of meaningful attributes makes preference estimation tractable across an infinite policy space.
Filtering out policies that are dominated in all dimensions reduces the number of user queries needed.

Abstract

An unaddressed challenge in interpretable reinforcement learning (RL) is to enable AI agents to integrate preference feedback into the policy generation process. Existing methods collect feedback only after training is complete, neglecting opportunities to inform the learning process. To address this gap, we propose a novel framework to align interpretable policies with human feedback during training. Our framework interleaves preference learning with an evolutionary algorithm, using updated preference estimates to guide the generation of better-aligned policies, and using newly generated policies to query users to refine the preference model. Evolutionary algorithms enable the exploration of the full space of policies; however, it is intractable to maintain separate preference estimates (such as win rates or utility values) for each individual policy in this infinite space. To handle this challenge, we propose to represent policies as feature vectors consisting of a finite set of meaningful attributes. For example, among a set of policies with similar performance, some may be more intuitive or more amenable to human intervention. To maximize the value of each user query, we employ a novel filtering technique to avoid presenting policies that are dominated in all dimensions, as repeated selections of clearly superior policies provide little information. We validate our method with experiments on synthetic preference data on two RL environments. We show that it produces RL policies that are not only better aligned with user preferences but also more efficient in the number of user queries.
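The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each policy is summarized by a feature vector in which higher values are better on every attribute, and it uses standard Pareto dominance (at least as good everywhere, strictly better somewhere) as a stand-in for "dominated in all dimensions". The names `dominates`, `non_dominated`, and `informative_pairs` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

FeatureVector = Sequence[float]  # assumed: one score per attribute, higher is better


def dominates(a: FeatureVector, b: FeatureVector) -> bool:
    """Return True if `a` is at least as good as `b` on every attribute
    and strictly better on at least one (Pareto dominance, an assumption)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(candidates: List[FeatureVector]) -> List[FeatureVector]:
    """Keep only candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


def informative_pairs(
    candidates: List[FeatureVector],
) -> List[Tuple[FeatureVector, FeatureVector]]:
    """Pairs worth showing to a user: both members survive the filter,
    so neither is clearly superior and the answer reveals a trade-off."""
    return list(combinations(non_dominated(candidates), 2))


# Hypothetical feature vectors: (task performance, intuitiveness)
policies = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9)]
front = non_dominated(policies)
pairs = informative_pairs(policies)
```

Here `(0.4, 0.4)` is dropped because `(0.5, 0.5)` beats it on both attributes, so a query comparing those two would tell the preference model little. The three remaining policies each trade one attribute for the other, which is where a user's answer is informative.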

Cite this study

Milani et al. studied this question.

www.synapsesocial.com/papers/68f19f20de32064e504ddbbc — DOI: https://doi.org/10.1609/aies.v8i2.36668

Authors

Stephanie Milani

Zhicheng Zhang

Nicholay Topin

Institutions

Carnegie Mellon University

Rutgers, The State University of New Jersey

Rutgers Sexual and Reproductive Health and Rights
