An unaddressed challenge in interpretable reinforcement learning (RL) is to enable AI agents to integrate preference feedback into the policy generation process. Existing methods collect feedback only after training is complete, neglecting opportunities to inform the learning process. To address this gap, we propose a novel framework to align interpretable policies with human feedback during training. Our framework interleaves preference learning with an evolutionary algorithm, using updated preference estimates to guide the generation of better-aligned policies, and using newly generated policies to query users and refine the preference model. Evolutionary algorithms enable the exploration of the full space of policies; however, it is intractable to maintain separate preference estimates (such as win rates or utility values) for each individual policy in this infinite space. To handle this challenge, we propose to represent policies as feature vectors consisting of a finite set of meaningful attributes. For example, among a set of policies with similar performance, some may be more intuitive or more amenable to human intervention. To maximize the value of each user query, we employ a novel filtering technique to avoid presenting policies that are dominated in all dimensions, as repeated selections of clearly superior policies provide little information. We validate our method with experiments using synthetic preference data in two RL environments. We show that it produces RL policies that are not only better aligned with user preferences but also more efficient in the number of user queries.
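The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each policy is summarized by a tuple of attribute scores where higher is better, and it uses standard Pareto dominance as a stand-in for "dominated in all dimensions". The names `dominates`, `non_dominated`, and the example scores are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every attribute
    and strictly better on at least one (assumption: higher is better)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(features: List[Sequence[float]]) -> List[int]:
    """Return indices of feature vectors that no other vector dominates.
    Only these candidates would be worth showing to a user in a query."""
    return [
        i
        for i, f in enumerate(features)
        if not any(dominates(g, f) for j, g in enumerate(features) if j != i)
    ]


# Hypothetical policies scored on two attributes, e.g. (performance, intuitiveness).
candidates = [
    (0.9, 0.2),
    (0.8, 0.1),  # worse than the first policy on both attributes
    (0.5, 0.7),
    (0.4, 0.9),
]

kept = non_dominated(candidates)
print(kept)
```

In this sketch, comparing the first two candidates would tell the preference model little, because the first is better on every attribute; the remaining three represent real trade-offs, so a user's choice among them is informative.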
Milani et al. (Wed,) studied this question.
www.synapsesocial.com/papers/68f19f20de32064e504ddbbc — DOI: https://doi.org/10.1609/aies.v8i2.36668
Stephanie Milani
Zhicheng Zhang
Nicholay Topin
Carnegie Mellon University
Rutgers, The State University of New Jersey
Rutgers Sexual and Reproductive Health and Rights