What question did this study set out to answer?

The aim is to develop a method for systematically selecting high-difficulty questions using language models and tournament structures.

March 13, 2026

An LLM-Based Automatic Selection of High-Difficulty Questions Using the Swiss Tournament Format

Key Points

Aims to systematically select high-difficulty questions with a language model and a tournament structure, for use as few-shot chain-of-thought (CoT) exemplars
Uses a large language model to make pairwise difficulty comparisons between questions
Combines these pairwise comparisons with the Swiss tournament format to identify the hardest questions
Evaluated on 1,319 items from GSM8K, a math word-problem benchmark
Improves accuracy by 2.12 percentage points over random selection
Improves accuracy by 1.36 percentage points over uncertainty-based selection
Improves accuracy by 10.16 percentage points over direct difficulty rating

Abstract

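With the advance of large language models, in-context learning has drawn attention as a representative way of using language models, and a variety of prompting techniques have been studied as a result. In particular, few-shot CoT (Chain-of-Thought), which elicits explicit reasoning steps, is known to maximize reasoning performance with only a handful of exemplars, but its performance varies with how those exemplars are composed. Prior work has selected the questions used to build exemplars mainly by criteria such as diversity or uncertainty, whereas research on difficulty-based question selection remains comparatively scarce. This study therefore proposes a new methodology that combines pairwise difficulty comparison using a language model with a Swiss tournament structure to systematically select high-difficulty questions and build few-shot CoT exemplars from them. To evaluate the proposed methodology, experiments were conducted on 1,319 items from GSM8K, a math word-problem benchmark. The proposed methodology improved accuracy by 2.12%p, 1.36%p, and 10.16%p over random selection, uncertainty-based selection, and direct difficulty rating, respectively.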
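The combination of pairwise difficulty comparison and Swiss pairing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `swiss_select`, `harder`, and `length_stub` are hypothetical, the comparator here is a trivial stand-in for what would be an LLM call in the paper's setting, and the pairing rule is simplified (no rematch avoidance, no tie-breaks, a bye scores nothing).

```python
from typing import Callable, List


def swiss_select(
    questions: List[str],
    harder: Callable[[str, str], bool],
    rounds: int,
    top_k: int,
) -> List[str]:
    """Return the top_k questions by number of pairwise 'harder' wins.

    harder(a, b) should return True when a is judged harder than b.
    In the paper's setting this judgment would come from a language model;
    here it is an arbitrary callable (assumption for illustration).
    """
    scores = {i: 0 for i in range(len(questions))}
    order = list(range(len(questions)))
    for _ in range(rounds):
        # Swiss pairing: sort by current score so that questions with
        # similar win counts are compared against each other.
        order.sort(key=lambda i: -scores[i])
        # Pair neighbours; with an odd count the last question sits out.
        for a, b in zip(order[0::2], order[1::2]):
            winner = a if harder(questions[a], questions[b]) else b
            scores[winner] += 1
    ranked = sorted(range(len(questions)), key=lambda i: -scores[i])
    return [questions[i] for i in ranked[:top_k]]


def length_stub(a: str, b: str) -> bool:
    """Hypothetical stand-in comparator: the longer question counts as harder."""
    return len(a) > len(b)
```

With n questions, each round costs about n/2 comparisons, so a Swiss schedule of roughly log2(n) rounds needs far fewer model calls than comparing every pair, which is one plausible reason for choosing this format.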

Authors

Jooeun Lee

Minseob Song

Namgyu Kim

Journals

Journal of the Korea Society of Computer and Information
