August 19, 2024 · Open Access

Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning

Abstract

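Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning foundation models with human values and preferences. However, current RLHF techniques cannot account for the natural variation in individual preferences across a diverse population. When such differences arise, traditional RLHF frameworks simply average over them, which yields inaccurate reward estimates and poor performance for individual subgroups. To address the need for pluralistic alignment, we develop a class of multimodal RLHF methods. Our techniques are based on a latent-variable formulation: they infer a novel user-specific latent variable and learn reward models and policies conditioned on that latent, without requiring additional user-specific data. Although the approach is conceptually simple, we show that in practice this form of reward modeling requires careful algorithmic choices in model architecture and reward scaling. To validate the method empirically, we first show that it can counter underspecification in simulated control problems by inferring and optimizing user-specific reward functions. We then run experiments on pluralistic language datasets that represent diverse user preferences and demonstrate improved reward-function accuracy. We also show the benefits of this probabilistic framework for measuring uncertainty and for actively learning user preferences. This work makes it possible to learn from diverse populations of users with differing preferences, an important challenge that arises naturally in areas ranging from robot learning to foundation-model alignment.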
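As a toy illustration of why averaging fails (not the authors' implementation), the sketch below assumes a single pair of responses A and B, a Bradley-Terry preference model, and a known discrete user label standing in for the user-specific latent z. The abstract describes inferring that latent rather than observing it, and the group names and label counts here are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def fit_reward_gap(labels: list[int]) -> float:
    """Bradley-Terry MLE of r(A) - r(B) for one fixed pair of responses.

    With a single free parameter (the reward gap), the likelihood is maximized
    where sigmoid(gap) equals the empirical fraction of labels preferring A.
    Requires 0 < fraction < 1 so that the gap is finite.
    """
    p = sum(labels) / len(labels)
    return logit(p)


# Hypothetical annotations of one (A, B) pair: 1 = prefers A, 0 = prefers B.
# Assumption: the user latent z is a known group label, not an inferred variable.
annotations = {
    "z0": [1] * 9 + [0] * 1,  # this group mostly prefers A
    "z1": [1] * 1 + [0] * 9,  # this group mostly prefers B
}

# Unimodal reward model: pool all users and fit one shared reward gap.
pooled = [y for labels in annotations.values() for y in labels]
pooled_prob = sigmoid(fit_reward_gap(pooled))

# Latent-conditioned reward model: fit one reward gap per user latent z.
conditioned_prob = {z: sigmoid(fit_reward_gap(labels)) for z, labels in annotations.items()}

print(f"pooled P(A > B)  = {pooled_prob:.2f}")  # prints 0.50
for z, p in conditioned_prob.items():
    print(f"P(A > B | z={z}) = {p:.2f}")  # prints 0.90 for z0, 0.10 for z1
```

Pooling the two groups gives P(A > B) = 0.50, a reward that matches neither group, while conditioning on z recovers 0.90 and 0.10. This is the averaging failure the abstract describes, reduced to a single comparison.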

Cite this study

Poddar et al. (2024). Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning.

www.synapsesocial.com/papers/68e5bd3ab6db643587554f5a — DOI: https://doi.org/10.48550/arxiv.2408.10075

Also consider

Synapse lists 5 closely related papers on similar research questions. Consider them for comparative context:

RLHF from Heterogeneous Feedback via Personalization and Preference Aggregation· 2024 · 2 citations
Provable Multi-Party Reinforcement Learning with Diverse Human Feedback· 2024 · 1 citation
Disentangling Multi-view Representations via Curriculum Learning with Learnable Prior· 2024 · 6 citations
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning· 2025
Multi-turn Reinforcement Learning from Preference Human Feedback

Authors

Sriyash Poddar

Yanming Wan

Hamish Ivison
