What question did this study set out to answer?

The central aim is to enhance recommendation performance by optimizing multiple, often conflicting, objectives simultaneously.

February 14, 2026

Deep Pareto Reinforcement Learning for Multi-Objective Recommender Systems

Key Points

The study aims to improve recommendation performance by optimizing multiple, often conflicting, objectives simultaneously.
It proposes Deep Pareto Reinforcement Learning (DeepPRL), a method that models the relationships between recommendation objectives.
DeepPRL captures consumers' personalized and contextual preferences for each objective.
It optimizes both short-term and long-term recommendation performance.
Across four offline experiments, it achieves significant Pareto-dominance over state-of-the-art baselines.
In a controlled experiment on Alibaba's video streaming platform, it significantly improved three conflicting business objectives over the latest production system.
These results demonstrate a tangible economic impact in practice.

Abstract

Optimizing multiple objectives simultaneously is an important task for recommendation platforms to improve their performance. However, this task is particularly challenging since the relationships between different objectives are heterogeneous across different consumers and dynamically fluctuate according to different contexts, resulting in a Pareto-frontier in the result of recommendations, where the improvement of any objective comes at the cost of others. Existing multi-objective recommender systems do not systematically consider such dynamic relationships; instead, they balance between these objectives in a static and uniform manner, resulting in only suboptimal recommendation performance. In this paper, we propose a Deep Pareto Reinforcement Learning (DeepPRL) method, where we (1) comprehensively model the complex relationships between multiple recommendation objectives; (2) effectively capture personalized and contextual consumer preferences for each objective; (3) optimize both the short-term and the long-term recommendation performance. As a result, our method achieves significant Pareto-dominance over the state-of-the-art baselines across four offline experiments. Furthermore, we conducted a controlled experiment on Alibaba's video streaming platform, where our method simultaneously improved three conflicting business objectives significantly over the latest production system, demonstrating its tangible economic impact in practice.
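
To make the notions of Pareto-dominance and a Pareto frontier used in the abstract concrete, here is a minimal sketch of the standard textbook definitions. It is not the DeepPRL method itself. The function names, the assumption that every objective is maximized, and the score values are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b`.

    Assumes every objective is to be maximized: `a` must be at least as
    good as `b` on all objectives and strictly better on at least one.
    """
    if len(a) != len(b):
        raise ValueError("objective vectors must have the same length")
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_frontier(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that no other point dominates (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up scores of four candidate recommendation policies on three
# unnamed objectives; these numbers do not come from the paper.
candidates = [
    (0.30, 0.50, 0.20),
    (0.25, 0.45, 0.15),  # worse than the first policy on every objective
    (0.40, 0.30, 0.25),
    (0.20, 0.60, 0.10),
]

frontier = pareto_frontier(candidates)
print(frontier)
```

In this toy data the second policy is dominated by the first, while the other three trade off against each other: improving one objective means giving up another, which is the situation the abstract describes. A method that "achieves Pareto-dominance over a baseline" is one whose objective vector satisfies `dominates(method, baseline)` in this sense.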

Cite this study

Li and Tuzhilin studied this question.

www.synapsesocial.com/papers/699011932ccff479cfe58666 — DOI: https://doi.org/10.25300/misq/2025/19488

Authors

Pan Li

Alexander Tuzhilin

Journals

MIS Quarterly

Institutions

New York University

Atlanta Technical College
