July 5, 2024 · Open Access

Are Large Language Models Strategic Decision Makers? A Study of Performance and Bias in Two-Player Non-Zero-Sum Games

Abstract

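Large language models (LLMs) are increasingly used in real-world settings, yet their strategic abilities remain largely unexplored. Game theory provides a good framework for evaluating the decision-making abilities of LLMs in interactions with other agents. Prior studies have shown that LLMs can solve such tasks with carefully curated prompts, but they fail when the problem setting or the prompt changes. In this work, we investigate LLM behavior in two strategic games, Stag Hunt and Prisoner's Dilemma, and analyze how performance varies under different settings and prompts. The results show that the tested state-of-the-art LLMs exhibit at least one of the following three systematic biases: (1) positional bias, (2) payoff bias, and (3) behavioral bias. We then observe that LLM performance drops when the game configuration is misaligned with these biases. Performance is measured by whether the model selects the correct action, meaning the one that agrees with the preferred behaviors of both players. Alignment refers to whether the LLM's bias agrees with the correct action. For example, GPT-4o's average performance drops by 34% when misaligned. In addition, the current trend of "bigger and newer is better" does not hold in this setting: GPT-4o (the current best-performing LLM) suffers the largest performance drop. Finally, we note that although chain-of-thought prompting reduces the effect of the biases in most models, it is still far from solving the problem at a fundamental level.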
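For readers unfamiliar with the two games named in the abstract, the following is a minimal sketch of how they differ structurally. The payoff numbers are textbook-style illustrative values chosen for this example, not the values used in the paper, and the helper names `pure_nash_equilibria` and `accuracy` are hypothetical. The paper's actual prompts, payoffs, and scoring procedure are not given on this page.

```python
from itertools import product

# Illustrative payoffs only: (row player, column player).
# These are assumed values, NOT the payoffs used in the paper.
STAG_HUNT = {
    ("stag", "stag"): (4, 4),
    ("stag", "hare"): (0, 3),
    ("hare", "stag"): (3, 0),
    ("hare", "hare"): (2, 2),
}
PRISONERS_DILEMMA = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}


def pure_nash_equilibria(game):
    """Return action pairs where neither player gains by deviating alone."""
    rows = sorted({a for a, _ in game})
    cols = sorted({b for _, b in game})
    equilibria = []
    for a, b in product(rows, cols):
        # Row player cannot improve by switching rows while the column is fixed.
        row_ok = all(game[(a, b)][0] >= game[(alt, b)][0] for alt in rows)
        # Column player cannot improve by switching columns while the row is fixed.
        col_ok = all(game[(a, b)][1] >= game[(a, alt)][1] for alt in cols)
        if row_ok and col_ok:
            equilibria.append((a, b))
    return equilibria


def accuracy(choices, correct):
    """Fraction of model choices that match the designated correct action."""
    return sum(choice == correct for choice in choices) / len(choices)


if __name__ == "__main__":
    print(pure_nash_equilibria(STAG_HUNT))
    print(pure_nash_equilibria(PRISONERS_DILEMMA))
```

With these assumed payoffs, Stag Hunt has two pure equilibria (both hunt stag, or both hunt hare) while Prisoner's Dilemma has only mutual defection. In a setup like the one the abstract describes, the `accuracy` helper could be computed separately over aligned and misaligned game configurations to compare the two conditions.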

Cite this study

Herr et al. (Fri,) studied this question.

www.synapsesocial.com/papers/68e614bab6db6435875a7bfd — DOI: https://doi.org/10.48550/arxiv.2407.04467

Also consider

Synapse has enriched 5 closely related papers on similar clinical questions. Consider them for comparative context:

Playing games with Large language models: Randomness and strategy · 2025
How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments · 2024 · 8 citations
Strategic behavior of large language models and the role of game structure versus contextual framing · 2024 · 20 citations
Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis · 2024 · 31 citations

Authors

Nathan Herr

Fernando Acero

Roberta Raileanu
