What question did this study set out to answer?

The study aims to make detection of AI-generated text more robust by addressing a weakness of existing detection methods: their vulnerability to paraphrasing and other changes in expression.

February 28, 2026

Generated Text Detection Using Representation Learning with Token Prediction Model

Key Points

Existing zero-shot methods evaluate the text under examination with multiple large language models (LLMs)
Token prediction results are compared to compute the negative probability curvature
Because these methods only predict token probabilities, they are vulnerable to paraphrasing and other changes in expression
The proposed method applies representation learning to the token prediction model
The paper examines the applicability of the proposed method to detecting AI-generated text

Abstract

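As large language models (LLMs) have been commercialized in earnest in the form of conversational chatbots such as ChatGPT, a number of adverse social effects have emerged. Representative examples are cheating, in which people present AI-generated content as their own work, and the creation and distribution of fake news containing false information. To prevent the misuse of AI-generated text, various methods for distinguishing it have been proposed. The most widely used at present is a zero-shot method that evaluates the text under examination with several LLMs and compares their token prediction results to compute the negative probability curvature. Learning-based methods that replicate and use the distribution of the target model were proposed later, so that detection can be adapted more easily to different models. However, because these existing methods simply predict the probability of each token, they are vulnerable to changes in sentence expression such as paraphrasing. To address this problem, this paper proposes a method that applies representation learning to the prediction model and examines its applicability to generated text detection.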
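The zero-shot idea mentioned in the abstract can be illustrated with a minimal sketch of a perturbation-based curvature score. This is not the paper's method or code. It assumes a toy setup: a smoothed unigram table stands in for an LLM's token probabilities, random token replacement stands in for a proper perturbation model, and every name (`make_unigram_scorer`, `perturb`, `curvature_score`) is hypothetical. The score is the log-probability of the text minus the mean log-probability of its perturbed copies; a clearly positive value means the text sits near a local maximum of the model's probability, which is the signal this family of detectors relies on.

```python
import math
import random
from typing import Callable, List


def make_unigram_scorer(corpus: List[str]) -> Callable[[List[str]], float]:
    """Build a toy log-probability function (assumption: stands in for an LLM)."""
    counts = {}
    for tok in corpus:
        counts[tok] = counts.get(tok, 0) + 1
    total = len(corpus)
    vocab_size = len(counts) + 1  # one extra slot for unseen tokens

    def log_prob(tokens: List[str]) -> float:
        # Add-one smoothed unigram log-probability of the whole sequence.
        return sum(
            math.log((counts.get(t, 0) + 1) / (total + vocab_size))
            for t in tokens
        )

    return log_prob


def perturb(tokens: List[str], vocab: List[str], rate: float,
            rng: random.Random) -> List[str]:
    """Replace each token with a random vocabulary token with probability `rate`."""
    return [rng.choice(vocab) if rng.random() < rate else t for t in tokens]


def curvature_score(tokens: List[str],
                    log_prob: Callable[[List[str]], float],
                    vocab: List[str],
                    n_perturb: int = 50,
                    rate: float = 0.3,
                    seed: int = 0) -> float:
    """Log-probability of the text minus the mean over perturbed copies."""
    rng = random.Random(seed)
    base = log_prob(tokens)
    perturbed = [
        log_prob(perturb(tokens, vocab, rate, rng)) for _ in range(n_perturb)
    ]
    return base - sum(perturbed) / n_perturb


# Toy data: "the" is very likely under the model, "quark" is very unlikely.
corpus = ["the"] * 50 + ["cat"] * 5 + ["sat"] * 5 + ["zebra", "quark"]
vocab = sorted(set(corpus))
scorer = make_unigram_scorer(corpus)

likely_text = ["the"] * 10      # sits at a probability peak of the toy model
unlikely_text = ["quark"] * 10  # sits in a probability valley

print(curvature_score(likely_text, scorer, vocab))
print(curvature_score(unlikely_text, scorer, vocab))
```

Because the score depends only on token probabilities, rewording a text changes the tokens being scored and can shift the result, which is the kind of sensitivity to paraphrasing that the abstract identifies.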

Authors

Yeon-Seung Choo

Daejeon University

Yong-Suk Park

Journals

The Journal of Korean Institute of Communications and Information Sciences
