Sequential decision-making under multiple objective functions involves two problems: exhaustively searching for Pareto-optimal policies, and selecting a policy from the resulting set of Pareto-optimal policies based on the decision maker’s preferences. This paper focuses on the latter problem. To select a policy that reflects the decision maker’s preferences, it is necessary to order these policies. This is problematic because the decision maker’s preferences are generally tacit knowledge, and it is difficult to order the policies quantitatively. For this reason, conventional methods have mainly elicited preferences through dialogue with decision makers and through one-to-one comparisons. In contrast, this paper proposes a method based on inverse reinforcement learning that estimates the weight of each objective from the decision-making sequence. The estimated weights can be used to quantitatively evaluate the Pareto-optimal policies from the viewpoint of the decision maker’s preferences. We applied the proposed method to a multi-objective reinforcement learning benchmark problem and verified its effectiveness as a method for eliciting the weight of each objective function.
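To illustrate how estimated weights could be used to order Pareto-optimal policies, the following is a minimal sketch that assumes a linear (weighted-sum) scalarization of each policy's expected return vector. The abstract does not state the scalarization used in the paper, so this is an assumption. The policy names, return vectors, weight values, and the functions `scalarize` and `rank_policies` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def scalarize(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of a policy's per-objective returns (assumed linear scalarization)."""
    if len(returns) != len(weights):
        raise ValueError("returns and weights must have the same length")
    return sum(w * r for w, r in zip(weights, returns))


def rank_policies(
    policy_returns: Dict[str, Sequence[float]],
    weights: Sequence[float],
) -> List[str]:
    """Order policy names from most to least preferred under the given weights."""
    return sorted(
        policy_returns,
        key=lambda name: scalarize(policy_returns[name], weights),
        reverse=True,
    )


# Hypothetical, mutually non-dominated return vectors: (objective 1, objective 2).
policy_returns = {
    "A": (1.0, -1.0),
    "B": (8.0, -5.0),
    "C": (20.0, -12.0),
}

# Hypothetical weights standing in for the output of a weight-estimation step.
estimated_weights = (0.3, 0.7)

ranking = rank_policies(policy_returns, estimated_weights)
print(ranking)
```

Under this assumption, changing the weights changes the ordering: a decision maker who weights the second objective heavily prefers policy A, while one who weights the first objective heavily prefers policy C.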
Akiko Ikenaga
Sachiyo Arai
Journal of Advanced Computational Intelligence and Intelligent Informatics
Chiba University
Ikenaga et al. studied this question.
www.synapsesocial.com/papers/68e73626b6db6435876afb4a — DOI: https://doi.org/10.20965/jaciii.2024.p0393