April 19, 2024 | Open Access

Mapping Social Choice Theory to RLHF

Abstract

Recent work on the limitations of using reinforcement learning from human feedback (RLHF) to incorporate human preferences into model behavior often raises social choice theory as a reference point. Social choice theory's analysis of settings such as voting mechanisms provides technical infrastructure that can inform how to aggregate human preferences amid disagreement. We analyze the problem settings of social choice and RLHF, identify key differences between them, and discuss how these differences may affect the RLHF interpretation of well-known technical results in social choice.

Cite this study

Dai and Fleisig (2024) studied this question.

www.synapsesocial.com/papers/68e6e75fb6db643587662e32 — DOI: https://doi.org/10.48550/arxiv.2404.13038

Authors

Jessica Dai

Eve Fleisig
