What type of study is this?

This is a quantitative study.

September 12, 2025

Open Access

Automating Plan Evaluation Using Agentic Large Language Models

Key Points

LLMs generally perform comparably with human evaluators, highlighting their potential for automating plan evaluation tasks.
A multi-agent approach reduces common machine errors by over 50 percent.
Leveraging LLMs alongside human oversight facilitates effective content analysis, balancing AI efficiency and human domain knowledge.
The findings point toward a shift to automated evaluation systems for content analysis, with LLM tools operating under human oversight.

Abstract

Manual plan evaluation faces reliability and scalability challenges. This research benchmarks human evaluations against a large language model (LLM) using a multi-agent approach and/or retrieval-augmented generation (RAG) to automate complex content analysis tasks. We find that LLMs generally perform comparably with humans, with most errors arising from overimplication and limited domain knowledge. The multi-agent approach substantially enhances the LLM’s performance, reducing common machine errors by over 50 percent. Integrating such LLM tools with human oversight will likely become the new norm for content analysis, and this study demonstrates how to leverage artificial intelligence’s (AI) efficiency and precision alongside humans’ contextual understanding and domain expertise.
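The abstract does not say which agreement metric the authors used to benchmark LLM output against human evaluations. As a rough illustration only, the sketch below assumes that each plan item receives a categorical code from a human coder and from the LLM, and compares the two with percent agreement and Cohen's kappa. It also shows how a relative error reduction such as "over 50 percent" would be computed. All function names, codes, and error counts here are hypothetical and are not taken from the study.

```python
from collections import Counter


def percent_agreement(human, machine):
    """Share of items on which the two coders assigned the same code."""
    if len(human) != len(machine) or not human:
        raise ValueError("code lists must be non-empty and of equal length")
    matches = sum(h == m for h, m in zip(human, machine))
    return matches / len(human)


def cohens_kappa(human, machine):
    """Chance-corrected agreement between two coders (Cohen's kappa)."""
    n = len(human)
    observed = percent_agreement(human, machine)
    human_counts = Counter(human)
    machine_counts = Counter(machine)
    labels = set(human_counts) | set(machine_counts)
    # Agreement expected by chance, from each coder's marginal frequencies.
    expected = sum(
        (human_counts[label] / n) * (machine_counts[label] / n)
        for label in labels
    )
    if expected == 1.0:
        # Both coders used a single identical code for every item.
        return 1.0
    return (observed - expected) / (1 - expected)


def error_reduction(baseline_errors, improved_errors):
    """Relative reduction in error count between two configurations."""
    if baseline_errors <= 0:
        raise ValueError("baseline_errors must be positive")
    return (baseline_errors - improved_errors) / baseline_errors


# Hypothetical binary codes (1 = item present in the plan, 0 = absent).
human_codes = [1, 1, 0, 0, 1, 0, 1, 0]
llm_codes = [1, 1, 0, 1, 1, 0, 0, 0]

print(percent_agreement(human_codes, llm_codes))
print(cohens_kappa(human_codes, llm_codes))
# Hypothetical error counts for a single-agent and a multi-agent run.
print(error_reduction(20, 9))
```

Percent agreement is easy to read but ignores agreement that would occur by chance, which is why a chance-corrected statistic is shown alongside it.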

Authors

Xinyu Fu

Chaosu Li

Journals

Journal of Planning Education and Research

Institutions

University of Hong Kong

Texas A&M University

Hong Kong University of Science and Technology
